anybody doing Lucene/Solr? - Java board - MITBBS archive


c******n (posts: 4965) #1
I'm new, so I posted the following question on the mailing list but haven't got an answer. Maybe someone here could help? Thanks!
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201104.mbox/browser

g**********y (posts: 14569) #2
What's your question? The link points to the April archive.

[Quoting c******n:]
: I'm new , so having the following question on the mailing list,
: haven't got an answer, maybe someone here could help? thanks!
: http://mail-archives.apache.org/mod_mbox/lucene-java-user/201104.mbox/browser

c******n (posts: 4965) #3
Thanks, I didn't realize the link shows differently. Here it is:

########################################################

I'm new to Lucene and search engines, and have been struggling with these questions recently. I'd appreciate it a lot if you could shed some light on this.

Let's say I do a query on

dog greyhound

Note that I did not quote them, i.e. this is not a phrase search. What happens under the hood?

Which term does Lucene use to look up the inverted index? I read somewhere that Lucene uses the term with the higher IDF (i.e. the more distinguishing term), in this case "greyhound", but what about "dog"? Does Lucene traverse down the doclist of "dog" at all? If I provide multiple terms in my query, how does Lucene generally decide how many doclists to travel down?

I read that Lucene uses a combination of the "binary model" and VSM. It then seems that in the above case, it finds the full doclist of "dog" and that of "greyhound" (the binary model part), then finds the common docs from the two doclists, then orders them by score (the VSM part). Is it true that the FULL doclists are fetched first? Or is some pruning done on the individual doclists? I see the talk at http://www.slideshare.net/abial/eurocon2010 that covers pruning and tiered search, but is this the default behavior of Lucene?

How are the doclists sorted? (By IDF? Sorry, I'm just beginning to sift through a lot of docs online and somehow got this impression, but can't form a precise conclusion.)

Also, generally, could you please point me to some good articles on how Lucene and search engines work? I've read the "anatomy of a search engine" paper (Google, Sergey Brin & Larry Page), "Introduction to Information Retrieval" (Manning et al.), and "Lucene in Action".

Thanks
Yang

[Quoting g**********y:]
: what's your question? The link points to April archive.

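The model described in this post (fetch both doclists, keep the common documents, then rank) can be sketched in plain Java. This is a toy illustration only, not Lucene's code or Lucene's actual algorithm: the class `DoclistSketch`, its methods, and the four sample documents are made up for the example, the doclists are assumed to be sorted by document id, and `idf` is the simplified textbook form log(N / df).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model only: hypothetical names, not Lucene classes or Lucene's algorithm.
class DoclistSketch {
    // Builds term -> doclist, with document ids in ascending order
    // (the ordering is an assumption of this sketch).
    static Map<String, int[]> buildIndex(String[] docs) {
        Map<String, TreeSet<Integer>> tmp = new HashMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            for (String term : docs[docId].toLowerCase().split("\\s+")) {
                tmp.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        Map<String, int[]> index = new HashMap<>();
        for (Map.Entry<String, TreeSet<Integer>> e : tmp.entrySet()) {
            index.put(e.getKey(),
                      e.getValue().stream().mapToInt(Integer::intValue).toArray());
        }
        return index;
    }

    // "Binary model" step: documents present in both doclists,
    // found with a two-pointer merge over the sorted arrays.
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) { out.add(a[i]); i++; j++; }
            else if (a[i] < b[j]) { i++; }
            else { j++; }
        }
        return out;
    }

    // Simplified inverse document frequency: rarer terms get a larger weight.
    static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / docFreq);
    }

    public static void main(String[] args) {
        String[] docs = {
            "dog food",
            "greyhound dog racing",
            "dog park",
            "greyhound bus"
        };
        Map<String, int[]> index = buildIndex(docs);
        int[] dog = index.get("dog");
        int[] greyhound = index.get("greyhound");
        System.out.println("docs with both terms: " + intersect(dog, greyhound));
        System.out.println("idf(dog) = " + idf(docs.length, dog.length));
        System.out.println("idf(greyhound) = " + idf(docs.length, greyhound.length));
    }
}
```

In this sample data "greyhound" appears in fewer documents than "dog", so it gets the larger IDF, which matches the "more distinguishing term" intuition in the question. Whether a real engine reads both doclists in full or prunes them is exactly the open question above; the sketch reads both.
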
g**********y (posts: 14569) #4
Sorry, I just use Lucene as a search engine in our product. I didn't dive into how it works.
I did read some documents and code from the Lucene project out of curiosity. My impression is: it is a C-style Java program, painful to read and use.
Maybe you can contact the developers directly for technical details.

[Quoting c******n:]
: thanks, I didn't realize the link shows differently.... here it is:
: ########################################################
: I'm new to lucene/search engine , and have been struggling with these
: questions recently.
: I'd appreciate a lot of you could shed some light on this.
: let's say I do a query on
: dog greyhound
: note that I did not quote them, i.e. this is not a phrase search.
: what happens under the hood ?
: which term does Lucene use to look up the inverted Index ?

i**e (posts: 6810) #5
I don't know much about the internals of Lucene.
With Solr, it's possible to specify the default operator as OR or AND. I think you were talking more about the OR case.
Optionally, when AND gives you a very small number of results, you could run an OR query to enrich the results.

[Quoting c******n:]
: thanks, I didn't realize the link shows differently.... here it is:
: ########################################################
: I'm new to lucene/search engine , and have been struggling with these
: questions recently.
: I'd appreciate a lot of you could shed some light on this.
: let's say I do a query on
: dog greyhound
: note that I did not quote them, i.e. this is not a phrase search.
: what happens under the hood ?
: which term does Lucene use to look up the inverted Index ?

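The AND-then-OR fallback mentioned in this reply could look roughly like the following. It is plain Java over sets of document ids and does not use any Solr or Lucene API; the class `OperatorFallbackSketch`, its method names, and the `minHits` threshold are hypothetical. In Solr itself the operator is normally chosen through configuration or the `q.op` request parameter, and the fallback would be a second query issued by the application.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: hypothetical names, not Solr or Lucene API.
class OperatorFallbackSketch {
    // AND: keep only documents that appear in every doclist.
    static Set<Integer> matchAll(List<Set<Integer>> doclists) {
        if (doclists.isEmpty()) {
            return new TreeSet<>();
        }
        Set<Integer> hits = new TreeSet<>(doclists.get(0));
        for (Set<Integer> doclist : doclists) {
            hits.retainAll(doclist);
        }
        return hits;
    }

    // OR: keep documents that appear in at least one doclist.
    static Set<Integer> matchAny(List<Set<Integer>> doclists) {
        Set<Integer> hits = new TreeSet<>();
        for (Set<Integer> doclist : doclists) {
            hits.addAll(doclist);
        }
        return hits;
    }

    // Try AND first; if it yields fewer than minHits documents, widen to OR.
    static Set<Integer> search(List<Set<Integer>> doclists, int minHits) {
        Set<Integer> strict = matchAll(doclists);
        return strict.size() >= minHits ? strict : matchAny(doclists);
    }

    public static void main(String[] args) {
        // Doclists for two query terms, e.g. "dog" and "greyhound".
        List<Set<Integer>> doclists = List.of(Set.of(0, 1, 2), Set.of(1, 3));
        System.out.println(search(doclists, 1)); // prints [1]
        System.out.println(search(doclists, 2)); // prints [0, 1, 2, 3]
    }
}
```

With a threshold of 1 the single AND hit is enough; raising the threshold to 2 makes the sketch fall back to the OR result.
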
c******n (posts: 4965) #6
Exactly!! I read it and it was all "abstract class". If it were Java-style, most of it would have been written as interfaces + baseImpl.

[Quoting g**********y:]
: Sorry, I just use Lucene as a search engine in our product. I didn't dive
: into how it works.
: I did read some documents and code from Lucene project for curiosity. My
: impression is: it is a C-style Java program, painful to read and use.
: Maybe you can directly contact the developers for technical details.

c******n (posts: 4965) #7
BTW, have you built Solr in Eclipse?
Its directory layout is not very standard; I had to tweak it manually a lot to make it work.

[Quoting g**********y:]
: Sorry, I just use Lucene as a search engine in our product. I didn't dive
: into how it works.
: I did read some documents and code from Lucene project for curiosity. My
: impression is: it is a C-style Java program, painful to read and use.
: Maybe you can directly contact the developers for technical details.

g**********y (posts: 14569) #8
No, I haven't used Solr at work.

[Quoting c******n:]
: btw, have you built solr in eclipse?
: its directory organization is not very standardized, I had to manually
: tweak it a lot to make it work

q*****9 (posts: 85) #9
You should use an analyzer or some kind of tokenizer; no one searches against the index like that.

[Quoting c******n:]
: I'm new , so having the following question on the mailing list,
: haven't got an answer, maybe someone here could help? thanks!
: http://mail-archives.apache.org/mod_mbox/lucene-java-user/201104.mbox/browser
