Query help - Database board - 未名 Archive

This page is an excerpt and archive of the corresponding thread on 未名空间.

M***7  posts: 2420  #1

Hi there,

The table is like

ID1  ID2    similarity
1    2      95%
1    3      80%
...
1    10000  60%
2    3      70%
...

Suppose there are 10000 distinct IDs, and the table stores all pairwise similarities. Now I want to retrieve a subset of IDs such that the pairwise similarity between every two IDs in the subset is in a specific range (e.g. 70~85%). Can anyone help me out? Thanks.

i****a  posts: 36252  #2
In reply to M***7 (#1):

I don't really understand your requirement. Can you give an example? Going by just the surface meaning, I think you are just looking for anything in the 70 to 85% range?

select *
from table
where similarity between 70 and 85

M***7  posts: 2420  #3
In reply to i****a (#2):

I want a subset of IDs, like

-------------------------
ID
1
3
4
.....
-------------------------

in which all pairwise similarities are between 70~85%. The query you suggested does not work, because it filters individual rows: it says nothing about whether every possible pair of IDs within the chosen subset is in range.

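The difference between filtering rows and validating a subset can be seen in a small Python sketch. The first three rows reuse the values from the original post; the rows involving ID 4 are made up for illustration, and the names `in_range_pairs` and `is_valid_subset` are hypothetical helpers, not anything from the thread.

```python
from itertools import combinations

# (id1, id2, similarity in percent). Rows with ID 4 are invented sample data.
ROWS = [
    (1, 2, 95), (1, 3, 80), (2, 3, 70),
    (1, 4, 75), (2, 4, 60), (3, 4, 72),
]

def in_range_pairs(rows, lo, hi):
    """Row filter, like: WHERE similarity BETWEEN lo AND hi."""
    return {frozenset((a, b)) for a, b, sim in rows if lo <= sim <= hi}

def is_valid_subset(ids, rows, lo, hi):
    """True only if every pair of IDs in `ids` has an in-range similarity."""
    ok = in_range_pairs(rows, lo, hi)
    return all(frozenset(pair) in ok for pair in combinations(ids, 2))

pairs = in_range_pairs(ROWS, 70, 85)
ids_seen = sorted({i for p in pairs for i in p})

print(ids_seen)                                  # every ID that shows up in some in-range row
print(is_valid_subset(ids_seen, ROWS, 70, 85))   # False: pair (1, 2) is 95%
print(is_valid_subset([1, 3, 4], ROWS, 70, 85))  # True: 80%, 75%, 72%
```

In this sample, every ID appears in at least one in-range row, yet the IDs taken together do not form a valid subset. The row filter is only a first step.
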
t****n  posts: 263  #4
In reply to M***7 (#1):

There might be more than one subset that satisfies your requirement. For example, if there are only 2 rows in the table:

1 2 75%
3 4 75%

Either 1&2 or 3&4 satisfies your requirement. Which one of them do you want?

i****a  posts: 36252  #5
In reply to M***7 (#3):

What's the similarity between 1 and 4, and between 3 and 4? They are not listed in your raw data table.

M***7  posts: 2420  #6
In reply to t****n (#4):

First, the original table contains all possible pairwise similarities. It is possible to get more than one subset. So I would want to get a whole subset first, for example 2000 out of 10000 IDs where all possible pairwise similarities among the 2000 IDs are in the range, and then use some other constraints to trim it down. Thanks

t****n  posts: 263  #7
In reply to t****n (#4):

After giving this more thought, I think there is a deeper problem here. Consider the following example:

1 2 75%
3 1 75%
3 2 75%
4 1 75%
4 2 75%
5 1 75%
5 2 75%
4 5 75%

Then (1, 2, 3) and (1, 2, 4, 5) are both OK. Which one do you want? My instinct tells me that this might be an NP-hard problem. You should think about it more.

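The two answers in the example above can be reproduced by listing every subset that cannot be extended any further. The sketch below uses the basic Bron-Kerbosch recursion without pivoting, which is a technique chosen here for illustration and not something proposed in the thread; the function name `maximal_cliques` is hypothetical. Its running time can grow exponentially, so it is only meant for small inputs like this one.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph (basic Bron-Kerbosch)."""
    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already explored
        if not p and not x:
            yield sorted(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())

# The eight in-range pairs from the example above.
EDGES = [(1, 2), (3, 1), (3, 2), (4, 1), (4, 2), (5, 1), (5, 2), (4, 5)]

adj = {}
for a, b in EDGES:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

cliques = sorted(maximal_cliques(adj))
print(cliques)  # [[1, 2, 3], [1, 2, 4, 5]]
```

Both subsets are maximal, and neither contains the other, so "the" subset is not well defined without an extra criterion such as size.
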
M***7  posts: 2420  #8
In reply to t****n (#7):

That's true. I had not thought about that. Thanks a lot

t****n  posts: 263  #9
In reply to M***7 (#6):

You can add all the missing pairs to my example with similarity 0%; it doesn't change anything. Think, bro.

p*********a  posts: 61  #10

Think of each ID as a node. Connect an edge between any two nodes if their similarity is within the specified range. Then the problem is to find a clique. Evidently, there is usually more than one clique in the graph. If you want to find the maximum clique, the problem is NP-hard. It is even "harder" than many other NP-hard problems: it is believed to be fixed-parameter intractable when parameterized by clique size, and there is no known good approximation algorithm for it.

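The graph construction described above can be sketched in a few lines of Python. Since finding the maximum clique is NP-hard, the sketch settles for a simple greedy heuristic that returns one maximal clique, which may be smaller than the largest one. The heuristic, the names `build_graph` and `greedy_clique`, and the assumption that IDs are integers are all illustrative choices, not something from the thread. The sample rows are t****n's example with the two missing pairs added at 0%.

```python
def build_graph(rows, lo, hi):
    """One node per ID; an edge wherever lo <= similarity <= hi."""
    adj = {}
    for a, b, sim in rows:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if lo <= sim <= hi:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def greedy_clique(adj):
    """Grow one maximal (not necessarily maximum) clique greedily."""
    clique = []
    candidates = set(adj)
    while candidates:
        # Prefer the candidate with the most neighbours among the remaining
        # candidates; break ties by smaller ID (assumes integer IDs).
        v = max(candidates, key=lambda u: (len(adj[u] & candidates), -u))
        clique.append(v)
        candidates &= adj[v]  # keep only IDs compatible with everything chosen
    return sorted(clique)

ROWS = [
    (1, 2, 75), (3, 1, 75), (3, 2, 75), (4, 1, 75), (4, 2, 75),
    (5, 1, 75), (5, 2, 75), (4, 5, 75), (3, 4, 0), (3, 5, 0),
]

graph = build_graph(ROWS, 70, 85)
result = greedy_clique(graph)
print(result)  # [1, 2, 4, 5]
```

On 10000 IDs the full table has roughly 50 million pairs, so building the adjacency sets is feasible, but the greedy result comes with no guarantee on how close it is to the largest valid subset.
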
M***7  posts: 2420  #11
In reply to p*********a (#10):

Thanks a lot. I realized that after putting in more thought.
