m***t 发帖数: 254 | 1 write the program for displaying the ten most frequent words in a file such
that your program should be efficient in all complexity measures.
I think hash table with counting will be a natual solution, but not sure if
it is a best fit for this problem. |
m******k 发帖数: 43 | 2 hashtable or b+ tree/trie like dictinonary data structure etc only provide
fast search/index, so still need additional data structure on the frequnecy
order statistics. A min-heap is ok. or you can use another b+ tree to store
the frequency as key
such
if
【在 m***t 的大作中提到】 : write the program for displaying the ten most frequent words in a file such : that your program should be efficient in all complexity measures. : I think hash table with counting will be a natual solution, but not sure if : it is a best fit for this problem.
|
m***t 发帖数: 254 | 3 I guess u mean max heap to store frequency, right?
frequnecy
store
【在 m******k 的大作中提到】 : hashtable or b+ tree/trie like dictinonary data structure etc only provide : fast search/index, so still need additional data structure on the frequnecy : order statistics. A min-heap is ok. or you can use another b+ tree to store : the frequency as key : : such : if
|
m******k 发帖数: 43 | 4 I mean a min-heap of size 10 store top 10 most frequence words on the fly.
because we need to compare other candidates with the least frequent one of
the top 10 to determine whether do a replcaement. Other operation on the min
-heap includes increase_key if anyone of the current top 10 get hit again.
of course, a max-heap also works, but it seems you need to store the entire
N elements to determine the top 10.
I guess u mean max heap to store frequency, right?
frequnecy
store
【在 m***t 的大作中提到】 : I guess u mean max heap to store frequency, right? : : frequnecy : store
|
m***t 发帖数: 254 | 5 //blush. That is absolutely right.
min
entire
【在 m******k 的大作中提到】 : I mean a min-heap of size 10 store top 10 most frequence words on the fly. : because we need to compare other candidates with the least frequent one of : the top 10 to determine whether do a replcaement. Other operation on the min : -heap includes increase_key if anyone of the current top 10 get hit again. : of course, a max-heap also works, but it seems you need to store the entire : N elements to determine the top 10. : : I guess u mean max heap to store frequency, right? : frequnecy : store
|