M***7 发帖数: 2420 | 1 Hi there,
The table is like
ID1 ID2 similarity
1 2 95%
1 3 80%
...
1 10000 60%
2 3 70%
...
Suppose there are 10000 distinct IDs, and the table stores all pairwise
similarities. Now I want to retrieve a subset of IDs and make sure that the
pairwise similairity between every two IDs in the subset is in a specific
range (e.g. 70~85%).
Anyone help me out. Thanks. |
i****a 发帖数: 36252 | 2 don't really understand your requirement. can you give an example
by just the surface meaning, I think you are just looking for anything in 70
to 80%?
select *
from table
where similarity is between 75 and 80
【在 M***7 的大作中提到】 : Hi there, : The table is like : ID1 ID2 similarity : 1 2 95% : 1 3 80% : ... : 1 10000 60% : 2 3 70% : ... : Suppose there are 10000 distinct IDs, and the table stores all pairwise
|
M***7 发帖数: 2420 | 3 I want a subset of IDs, like
-------------------------
ID
1
3
4
.....
------------------------
that having all pairwise similarities between 70~85%.
The way you suggested is not working since it eliminates certain possible
pairwise comparison between IDs in the subset.
70
【在 i****a 的大作中提到】 : don't really understand your requirement. can you give an example : by just the surface meaning, I think you are just looking for anything in 70 : to 80%? : select * : from table : where similarity is between 75 and 80
|
t****n 发帖数: 263 | 4 There might be more than 1 subsets that could satisfy your requirement.
for example, if there are only 2 rows in the table.
1 2 75%
3 4 75%
either 1&2 or 3&4 can satisfy your requirement. which one of them do you
want?
【在 M***7 的大作中提到】 : Hi there, : The table is like : ID1 ID2 similarity : 1 2 95% : 1 3 80% : ... : 1 10000 60% : 2 3 70% : ... : Suppose there are 10000 distinct IDs, and the table stores all pairwise
|
i****a 发帖数: 36252 | 5 what's the similarity between 1 and 4, and between 3 and 4?
not listed in your raw data table
【在 M***7 的大作中提到】 : I want a subset of IDs, like : ------------------------- : ID : 1 : 3 : 4 : ..... : ------------------------ : that having all pairwise similarities between 70~85%. : The way you suggested is not working since it eliminates certain possible
|
M***7 发帖数: 2420 | 6 First, the original table contains all possible pairwise similarities.
It is possible to get more than 1 subset. So I would want to get a whole subset, for example, 2000 out of
10000 while all possible pairwise similarities in the 2000 IDs are in the range, then use some other constraints to trim it down.
Thanks
【在 t****n 的大作中提到】 : There might be more than 1 subsets that could satisfy your requirement. : for example, if there are only 2 rows in the table. : 1 2 75% : 3 4 75% : either 1&2 or 3&4 can satisfy your requirement. which one of them do you : want?
|
t****n 发帖数: 263 | 7 After give this more thought. I think there is a deeper problem in this.
consider the following example
1 2 75%
3 1 75%
3 2 75%
4 1 75%
4 2 75%
5 1 75%
5 2 75%
4 5 75%
Then (1, 2, 3) or (1, 2, 4, 5) are both OK, which one do you want?
My instinct tells me that this might be a NP problem. You should think about
it more.
【在 t****n 的大作中提到】 : There might be more than 1 subsets that could satisfy your requirement. : for example, if there are only 2 rows in the table. : 1 2 75% : 3 4 75% : either 1&2 or 3&4 can satisfy your requirement. which one of them do you : want?
|
M***7 发帖数: 2420 | 8 That's true. I have not think about it. Thanks a lot
【在 t****n 的大作中提到】 : After give this more thought. I think there is a deeper problem in this. : consider the following example : 1 2 75% : 3 1 75% : 3 2 75% : 4 1 75% : 4 2 75% : 5 1 75% : 5 2 75% : 4 5 75%
|
t****n 发帖数: 263 | 9 You can add all missing pairs in my example with similarity 0%, it doen't
change anything. Think, bro.
subset, for example, 2000 out of
range, then use some other constraints to trim it down.
【在 M***7 的大作中提到】 : First, the original table contains all possible pairwise similarities. : It is possible to get more than 1 subset. So I would want to get a whole subset, for example, 2000 out of : 10000 while all possible pairwise similarities in the 2000 IDs are in the range, then use some other constraints to trim it down. : Thanks
|
p*********a 发帖数: 61 | 10 Think each ID as a node. Connect an edge between any two nodes if their
similarity is between the specified range. Then the problem is to find a
clique. Evidently, there are usually more than one cliques in the graph. If
you want to find the maximum clique, the problem is NP-hard. It is even "
harder" than other NP problems, a.k.a. fixed-parameter intractable. In other
words, there is no good approximation algorithm. |
M***7 发帖数: 2420 | 11 Thanks a lot. I realized that after put in more thought.
If
other
【在 p*********a 的大作中提到】 : Think each ID as a node. Connect an edge between any two nodes if their : similarity is between the specified range. Then the problem is to find a : clique. Evidently, there are usually more than one cliques in the graph. If : you want to find the maximum clique, the problem is NP-hard. It is even " : harder" than other NP problems, a.k.a. fixed-parameter intractable. In other : words, there is no good approximation algorithm.
|