由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Database版 - Query help
相关主题
Cascade delete, update 的问题(MS SQL)sybase是用那个程序来执行sql query呢
SQL Server query 一问w2k下除了ms的JDBC驱动外,还有其它的吗?
帮忙解释下这个查询,有关NULL的请教一个SQL 的问题
请教一个sql问题To 版主: about similar ID...
Suffix tree and the matrix together?刚回来有点忙,先发50个吧。NY 版 IDs Only
请教个excel function问题(整理一些数据)请教一个题 string similarity
INSERT or UPDATE, which is faster?跳槽风险
Re: 用Servlet显示数据库里的数据,分页的? (答案在这里)请教:similarity measure between surfaces
相关话题的讨论汇总
话题: ids话题: pairwise话题: subset话题: 75%话题: between
进入Database版参与讨论
1 (共1页)
M***7
发帖数: 2420
1
Hi there,
The table is like
ID1 ID2 similarity
1 2 95%
1 3 80%
...
1 10000 60%
2 3 70%
...
Suppose there are 10000 distinct IDs, and the table stores all pairwise
similarities. Now I want to retrieve a subset of IDs and make sure that the
pairwise similairity between every two IDs in the subset is in a specific
range (e.g. 70~85%).
Anyone help me out. Thanks.
i****a
发帖数: 36252
2
don't really understand your requirement. can you give an example
by just the surface meaning, I think you are just looking for anything in 70
to 80%?
select *
from table
where similarity is between 75 and 80

【在 M***7 的大作中提到】
: Hi there,
: The table is like
: ID1 ID2 similarity
: 1 2 95%
: 1 3 80%
: ...
: 1 10000 60%
: 2 3 70%
: ...
: Suppose there are 10000 distinct IDs, and the table stores all pairwise

M***7
发帖数: 2420
3
I want a subset of IDs, like
-------------------------
ID
1
3
4
.....
------------------------
that having all pairwise similarities between 70~85%.
The way you suggested is not working since it eliminates certain possible
pairwise comparison between IDs in the subset.

70

【在 i****a 的大作中提到】
: don't really understand your requirement. can you give an example
: by just the surface meaning, I think you are just looking for anything in 70
: to 80%?
: select *
: from table
: where similarity is between 75 and 80

t****n
发帖数: 263
4
There might be more than 1 subsets that could satisfy your requirement.
for example, if there are only 2 rows in the table.
1 2 75%
3 4 75%
either 1&2 or 3&4 can satisfy your requirement. which one of them do you
want?

【在 M***7 的大作中提到】
: Hi there,
: The table is like
: ID1 ID2 similarity
: 1 2 95%
: 1 3 80%
: ...
: 1 10000 60%
: 2 3 70%
: ...
: Suppose there are 10000 distinct IDs, and the table stores all pairwise

i****a
发帖数: 36252
5
what's the similarity between 1 and 4, and between 3 and 4?
not listed in your raw data table

【在 M***7 的大作中提到】
: I want a subset of IDs, like
: -------------------------
: ID
: 1
: 3
: 4
: .....
: ------------------------
: that having all pairwise similarities between 70~85%.
: The way you suggested is not working since it eliminates certain possible

M***7
发帖数: 2420
6
First, the original table contains all possible pairwise similarities.
It is possible to get more than 1 subset. So I would want to get a whole subset, for example, 2000 out of
10000 while all possible pairwise similarities in the 2000 IDs are in the range, then use some other constraints to trim it down.
Thanks

【在 t****n 的大作中提到】
: There might be more than 1 subsets that could satisfy your requirement.
: for example, if there are only 2 rows in the table.
: 1 2 75%
: 3 4 75%
: either 1&2 or 3&4 can satisfy your requirement. which one of them do you
: want?

t****n
发帖数: 263
7
After give this more thought. I think there is a deeper problem in this.
consider the following example
1 2 75%
3 1 75%
3 2 75%
4 1 75%
4 2 75%
5 1 75%
5 2 75%
4 5 75%
Then (1, 2, 3) or (1, 2, 4, 5) are both OK, which one do you want?
My instinct tells me that this might be a NP problem. You should think about
it more.

【在 t****n 的大作中提到】
: There might be more than 1 subsets that could satisfy your requirement.
: for example, if there are only 2 rows in the table.
: 1 2 75%
: 3 4 75%
: either 1&2 or 3&4 can satisfy your requirement. which one of them do you
: want?

M***7
发帖数: 2420
8
That's true. I have not think about it. Thanks a lot

【在 t****n 的大作中提到】
: After give this more thought. I think there is a deeper problem in this.
: consider the following example
: 1 2 75%
: 3 1 75%
: 3 2 75%
: 4 1 75%
: 4 2 75%
: 5 1 75%
: 5 2 75%
: 4 5 75%

t****n
发帖数: 263
9
You can add all missing pairs in my example with similarity 0%, it doen't
change anything. Think, bro.

subset, for example, 2000 out of
range, then use some other constraints to trim it down.

【在 M***7 的大作中提到】
: First, the original table contains all possible pairwise similarities.
: It is possible to get more than 1 subset. So I would want to get a whole subset, for example, 2000 out of
: 10000 while all possible pairwise similarities in the 2000 IDs are in the range, then use some other constraints to trim it down.
: Thanks

p*********a
发帖数: 61
10
Think each ID as a node. Connect an edge between any two nodes if their
similarity is between the specified range. Then the problem is to find a
clique. Evidently, there are usually more than one cliques in the graph. If
you want to find the maximum clique, the problem is NP-hard. It is even "
harder" than other NP problems, a.k.a. fixed-parameter intractable. In other
words, there is no good approximation algorithm.
M***7
发帖数: 2420
11
Thanks a lot. I realized that after put in more thought.

If
other

【在 p*********a 的大作中提到】
: Think each ID as a node. Connect an edge between any two nodes if their
: similarity is between the specified range. Then the problem is to find a
: clique. Evidently, there are usually more than one cliques in the graph. If
: you want to find the maximum clique, the problem is NP-hard. It is even "
: harder" than other NP problems, a.k.a. fixed-parameter intractable. In other
: words, there is no good approximation algorithm.

1 (共1页)
进入Database版参与讨论
相关主题
请教:similarity measure between surfacesSuffix tree and the matrix together?
看看你会不会出错!(Correlation factor vs Similarity)请教个excel function问题(整理一些数据)
[合集] 看看你会不会出错!(Correlation factor vs Similarity)INSERT or UPDATE, which is faster?
2009年12月9号-最新版务声明Re: 用Servlet显示数据库里的数据,分页的? (答案在这里)
Cascade delete, update 的问题(MS SQL)sybase是用那个程序来执行sql query呢
SQL Server query 一问w2k下除了ms的JDBC驱动外,还有其它的吗?
帮忙解释下这个查询,有关NULL的请教一个SQL 的问题
请教一个sql问题To 版主: about similar ID...
相关话题的讨论汇总
话题: ids话题: pairwise话题: subset话题: 75%话题: between