S******t 发帖数: 151 | 1 It is easy to come up with an $O(SL)$ algorithm for this problem but what is
good about this problem is that it actually has a linear time solution:
Let’s first make some notations clear:
1) |S| -> The length of S.
2) |L| -> The number of words in the list $L$.
3) $len_L$ -> The length of each string in $L$.
First we use an Aho-Corasick automaton to find out the occurrences of all
patterns. The time complexity of this step is $O(|S| + |L| * len_L + z)$
where $z$ is the number of output matches. ... 阅读全帖 |
|
S******t 发帖数: 151 | 2 我对于这个问题的理解是最好用Aho-Corasick Algorithm而不是用Suffix Tree,Aho-
Corasick就是做多串匹配用的,实现复杂度也低很多。
http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matchi
Suffix Tree的线性构造算法是非常复杂的,我的印象里面要在300行以上(以Ukkonen算
法为例),而非线性的构造算法则毫无意义,还不如直接brute force的去匹配(比如说
CC150上的算法)。
Suffix Array的实现稍微简单一些,但是Linear Time的构造算法仍然并不好理解,我
的印象里三分法的实现复杂度应该在30-50行,而且几乎是千篇一律的模板,倍增法的
实现虽然好写一些但是复杂度就不是O(n)而是O(nlogn)的,不知道最近是不是有更好的
实现方式了,还请各位不吝指教。。。 |
|
b**********r 发帖数: 91 | 3 Aho-corasick string pattern search |
|
y*****i 发帖数: 141 | 4 自己顶一下。
KMP不适合多次查询的情况,Aho-Corasick适合查前缀而不是任意位置的子串,所以只
能是suffix tree吧? |
|
|
|
l***i 发帖数: 1309 | 7 maxflow
mincost flow
assignment problem and hungarian algorithm
Knuth-Morris-Pratt
Strassens' matrix multiplication
Bellman-Ford
Aho–Corasick
其实我是开玩笑的,以上应该都不会考。 |
|
M*******a 发帖数: 1633 | 8 所谓牛逼就是平时挺都不大听到的算法,用更用不到的算法,比如什么
max flow/min cut
bellman-ford
bi-partite matching
boyer-moore
red-black tree
rabin-karp
aho-corasick
min-hash
其他
看看概率多少 |
|
r**a 发帖数: 31 | 9 Aho-Corasick,让我现场手写,当时我就给跪了 |
|
|
m*******g 发帖数: 410 | 11 这题如果面试用Aho-Corasick 解,是是不是代码太长了,实际会这么考吗? |
|
D***e 发帖数: 247 | 12 How large is your hashtable. Bloom filer is not good for large pattern
dictionaries, and it is also computation intensive.
I recommend using Aho-Corasick to build a DFA and run your strings through
it. Or Knuth-Morris-Pratt and Boyer-Moore all have multi-pattern algorithms,
and you can look them up. |
|