x******r Posts: 367 | 1 Hello,
There is a lot of material out there on web crawler design. Could someone
recommend one or two websites or documents that cover good web crawler design?
Ideally the material would be suitable for interview preparation. Thanks. |
r******g Posts: 138 | 2 cc150 has answer |
x******r Posts: 367 | 3 Thank you for your reply.
Are you talking about this question?
If you were designing a web crawler, how would you avoid getting into
infinite loops?
That is only a small part of the design. Is there any more comprehensive
material? Thanks.
[Quoting r******g]: cc150 has answer
|
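For reference, the infinite-loop question quoted above is usually handled by remembering which URLs have already been queued. Below is a minimal sketch of that idea, not the cc150 answer itself. The names `canonicalize`, `crawl`, `FAKE_WEB`, and `fake_get_links` are hypothetical, the depth limit is an assumed extra safeguard, and the in-memory link table stands in for real HTTP fetching.

```python
from collections import deque
from urllib.parse import urldefrag, urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Normalize a URL so trivially different spellings compare equal."""
    url, _fragment = urldefrag(url)      # drop "#section" fragments
    parts = urlsplit(url)
    path = parts.path or "/"             # treat "" and "/" as the same page
    # Simplification: lowercases the whole netloc, including any userinfo.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def crawl(seed, get_links, max_depth=3):
    """Breadth-first crawl that never enqueues the same canonical URL twice."""
    start = canonicalize(seed)
    visited = {start}                    # every URL ever placed on the frontier
    frontier = deque([(start, 0)])
    order = []
    while frontier:
        url, depth = frontier.popleft()
        order.append(url)
        if depth == max_depth:           # assumed safeguard against endless link chains
            continue
        for link in get_links(url):
            link = canonicalize(link)
            if link not in visited:      # a cycle leads back to a visited URL and is skipped
                visited.add(link)
                frontier.append((link, depth + 1))
    return order


# Hypothetical in-memory "web" containing a cycle: / -> /b -> /
FAKE_WEB = {
    "http://example.com/": ["http://example.com/b", "http://EXAMPLE.com/#top"],
    "http://example.com/b": ["http://example.com/", "http://example.com/c"],
    "http://example.com/c": [],
}


def fake_get_links(url):
    """Stand-in for fetching a page and extracting its links."""
    return FAKE_WEB.get(url, [])


if __name__ == "__main__":
    print(crawl("http://example.com", fake_get_links))
```

The visited set only catches exact URL repeats after normalization. Sites that generate endless distinct URLs (calendars, session IDs) would need something more, which is why the sketch also caps the depth.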
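s**x Posts: 7506 | 4 I don't fully understand it either. From what I've read, the crawl seems to start from a few seed sites and run on several machines,
and apparently a shared server is needed to store the sites that have already been visited, so they aren't crawled again. I'm not clear on the details. |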
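The setup described in the post above (seed sites, several machines, one shared record of visited sites) can be sketched roughly as follows. This is a single-process simulation under stated assumptions, not a real distributed system: `SharedVisitedStore` is a hypothetical stand-in for the shared server, `worker_for` assumes URLs are assigned to machines by hashing the host name, and `get_links` is a caller-supplied function that replaces real page fetching.

```python
import hashlib
from collections import deque
from urllib.parse import urlsplit


class SharedVisitedStore:
    """Stand-in for the shared server; a real one might be a database or key-value store."""

    def __init__(self):
        self._seen = set()

    def add_if_new(self, url: str) -> bool:
        """Record url and return True only the first time it is seen."""
        if url in self._seen:
            return False
        self._seen.add(url)
        return True


def worker_for(url: str, num_workers: int) -> int:
    """Pick a worker by hashing the host, so one host always maps to one machine."""
    host = urlsplit(url).netloc.lower()
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def distribute(seeds, get_links, num_workers=3):
    """Simulate num_workers machines crawling from the seeds; return URLs crawled per worker."""
    store = SharedVisitedStore()
    queues = [deque() for _ in range(num_workers)]
    crawled = [[] for _ in range(num_workers)]
    for url in seeds:
        if store.add_if_new(url):
            queues[worker_for(url, num_workers)].append(url)
    # Taking one URL per worker per round imitates machines running side by side.
    while any(queues):
        for wid, queue in enumerate(queues):
            if not queue:
                continue
            url = queue.popleft()
            crawled[wid].append(url)
            for link in get_links(url):
                # The shared store is consulted before queueing, so no URL is crawled twice.
                if store.add_if_new(link):
                    queues[worker_for(link, num_workers)].append(link)
    return crawled
```

Hashing by host is one common choice because it keeps all requests to a given site on one machine, which makes per-site rate limiting easier. Other partitioning schemes are possible.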
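x******r Posts: 367 | 5 Right.
So I am asking for a more complete description of web crawler design.
Thanks.
[Quoting s**x]: I don't fully understand it either. From what I've read, the crawl seems to start from a few seed sites and run on several machines, and apparently a shared server is needed to store the sites that have already been visited, so they aren't crawled again. I'm not clear on the details.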
|
x******r Posts: 367 | 6 Re
[Quoting x******r]: Right. So I am asking for a more complete description of web crawler design. Thanks.
|
x******r Posts: 367 | 7 Re
[Quoting x******r]: Right. So I am asking for a more complete description of web crawler design. Thanks.
|
w****k Posts: 6244 | 8 Look at the design of Scrapy.
It should be enough to handle an interview.
[Quoting x******r]: Hello, There is a lot of material out there on web crawler design. Could someone recommend one or two websites or documents that cover good web crawler design? Ideally the material would be suitable for interview preparation. Thanks.
|
x******r Posts: 367 | 9 Thank you.
[Quoting w****k]: Look at the design of Scrapy. It should be enough to handle an interview.
|