m**********e Posts: 14 | 1 Go read these three papers ..
As I recall, PageRank takes a random-walk view,
which is still different from Kleinberg's web-structure view ... |
|
i******y Posts: 70 | 2 The high-level idea of PageRank / social networks / collaborative filtering
and their applications at Google / YouTube, MySpace / Amazon.
P2P and eMule, Kazaa, BitTorrent
scientific computing and rockets
computer games, graphics, animation
... |
|
h****r Posts: 2056 | 3 There are two kinds of patents, utility models and inventions; the PageRank one probably falls into the former category. |
|
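D*******a Posts: 3688 | 4 The key to Google's rise was AdSense.
PageRank had been around for a long time; they tried seven business models in succession before one finally worked.
My entrepreneurship instructor said technology is not the most important thing; what matters is business execution. |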
|
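c****r Posts: 185 | 5 No wonder the Turing Award is nowhere near as famous as the Nobel Prize.
The authors of PageRank surely count as influential, but Google's founders probably don't care much for this kind of award anyway.
|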
|
N**D Posts: 10322 | 6 SVD can be used to compute eigenvalues.
There are so many places using SVD or eigenvalues: mechanics, signal
processing, chemistry, psychology, medicine, PageRank as used in Google's search
engine (at least as one way to compute it), etc.
And Huffman coding is also very simple.
Isn't SVD too simple? It doesn't feel like it is of much use
|
|
|
N**D Posts: 10322 | 8 Did it talk about Google's algorithm?
I'd be super impressed if it is more than PageRank 101 |
|
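l******e Posts: 470 | 9 Didn't I already give a link?
I don't work in this area; I've just randomly attended a few talks and read a few papers.
Take the small-world phenomenon: one line of research uses it to explain phenomena in social networks,
while another uses it to help design algorithms, such as decentralized search, which routes using local/rough information.
This phenomenon has many intrinsic connections with the structure of many P2P systems, such as Chord.
I also heard a talk on how information spreads in a social network, and at which nodes to broadcast
so that the information spreads most widely. There are also models of disease spreading.
The structure of social networks also has many connections to PageRank.
And you can mine many interesting relationships out of a social network; Facebook seems to use a lot of this.
Last time the head of Yahoo Research visited our school, he said they need quite a few people working on
social networks and game theory. At least from what I've heard, all of this is interesting and meaningful.
Calling on an authority who specializes in this direction to say a few words, haha. |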
|
c*******h Posts: 1096 | 10 Could you give a link on the relationship between network structure and PageRank?
|
|
|
v****s Posts: 1112 | 12 So that means
www.aaa.com/bbb
and
www.aaa.com
are actually two independent nodes?
Because in theory these shouldn't be (don't need to be) separate nodes; one contains the other... |
|
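A small sketch of the distinction raised in post 12, assuming the link graph can be keyed either per page or per host. The helper names `page_node` and `host_node` and the example link list are hypothetical, introduced only for illustration; nothing here claims to describe how any real search engine chooses its nodes.

```python
from urllib.parse import urlsplit

def _split(url):
    # urlsplit only recognises the host when the string has a scheme or a leading "//".
    return urlsplit(url if "//" in url else "//" + url)

def page_node(url):
    """Page-level key: host plus path, so every page is its own node."""
    parts = _split(url)
    return parts.netloc.lower() + parts.path.rstrip("/")

def host_node(url):
    """Host-level key: all pages of one site collapse into a single node."""
    return _split(url).netloc.lower()

# Hypothetical links (source, target), used only for illustration.
links = [("www.aaa.com", "www.aaa.com/bbb"), ("www.aaa.com/bbb", "www.ccc.com")]

page_edges = {(page_node(s), page_node(t)) for s, t in links}
# At host level, links inside the same site become self-loops and are dropped.
host_edges = {(host_node(s), host_node(t)) for s, t in links
              if host_node(s) != host_node(t)}
```

At page level the two URLs from the post are distinct nodes joined by an edge; at host level they merge and only the cross-site link remains. Which granularity to use is a modelling choice.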
|
v****s Posts: 1112 | 14 ML is kind of general and also fundamental.
I don't know whether there are many search engine jobs, but I'd guess not many? After all, joining a company
just to work on PageRank or HITS isn't very likely, is it? Haha
|
|
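a****9 Posts: 418 | 15 I actually just went and read these two papers.
If I understand them correctly (this is not my field),
neither of the proposed strategies is based on traffic.
Say you want to determine the rank of a page A.
Li computes relevance to the keywords from the descriptions of all the links pointing to A.
Page models the Internet as one enormous Markov chain:
each web page is a node, and each link
is a state transition. PageRank is simply the stationary distribution of this Markov chain.
What the two papers share is that both determine rank from backlinks rather than from the page's own keywords,
which guards against keyword spamming.
The difference is that Google's model is more elegant, and it can also reflect how much links from different sites
differ in importance, so gaming the rank is harder, unless you can get very prominent sites to link to you as well.
|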
|
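The Markov-chain reading in post 15 can be sketched with power iteration. This is a minimal illustration, not the method of either paper: the damping factor of 0.85, the uniform handling of pages without outgoing links, and the four-page toy graph are all assumptions added here.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate the stationary distribution of a random surfer on `links`.

    `links` maps each page to the list of pages it links to.
    `damping` (an assumed value) is the probability of following a link
    rather than jumping to a uniformly random page.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by pages without outgoing links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Toy graph, purely illustrative: C is linked to by A, B and D.
toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy)
```

In this toy graph C ends up with the highest score and D, which nobody links to, with the lowest. A also scores well despite having a single backlink, because that link comes from the highly ranked C, which is the "importance of the linking site" effect the post describes.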
c******n Posts: 4965 | 16 If I search
cat dog
inside the engine, two doclists are returned, each sorted by docId, one for
"dog" and one for "cat"; then the lists are merged, and PageRank is applied
over the intersection of the two sets. Is this correct? My question is:
since both doclists can be very long, and their intersection could
also be very long, while each query may end up using only 100 of the
intersection docIds, do we have to sort unnecessarily
each time?
how is this solved exactly in goo... |
|
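One textbook way to avoid the full sort asked about in post 16 is to intersect the two docId-sorted lists with a linear merge and then keep only the best k results with a bounded heap. The sketch below assumes a precomputed per-document score (a stand-in for PageRank or any other ranking signal); the doclists and scores are made up, and this is not a description of what Google actually does.

```python
import heapq

def intersect_sorted(a, b):
    """Intersect two docId lists that are each sorted ascending, in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def top_k(doc_ids, score, k):
    """Return the k highest-scoring docIds without sorting the whole list."""
    return heapq.nlargest(k, doc_ids, key=lambda d: score[d])

# Made-up doclists and scores, for illustration only.
cat_docs = [1, 3, 4, 7, 9]
dog_docs = [2, 3, 7, 8, 9]
score = {3: 0.2, 7: 0.9, 9: 0.5}

both = intersect_sorted(cat_docs, dog_docs)   # [3, 7, 9]
best = top_k(both, score, 2)                  # [7, 9]
```

With N matching documents, `heapq.nlargest` costs roughly O(N log k) instead of O(N log N), which matters when only the first 100 or so results are shown.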
|
c****e Posts: 1453 | 18 It's enough to put the emails in a file. Search is not rocket science. It could be
as simple as an inverted index. Suppose you have emails A, B and C as
A:
To: John Body:Hello
B:
To: Sam Body:Hello
C:
To: Lily Body:Hi
Then you get:
Hello->A,B
Hi->C
When you search "hello", it can find emails A and B.
Web search is much trickier, because of ranking. Imagine the A,B list
were a million web pages; you have to show them in order. PageRank is only
a small part of that. In reality, thousands of signals are... |
|
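The inverted index from post 18 can be sketched in a few lines. The emails are the three from the post; the dictionary layout, the lower-casing of words, and the function names `build_index` and `search` are assumptions made for this illustration.

```python
from collections import defaultdict

# The three emails from the post, stored as a plain dictionary (layout is an assumption).
emails = {
    "A": {"to": "John", "body": "Hello"},
    "B": {"to": "Sam", "body": "Hello"},
    "C": {"to": "Lily", "body": "Hi"},
}

def build_index(emails):
    """Map each lower-cased body word to the set of email ids containing it."""
    index = defaultdict(set)
    for email_id, mail in emails.items():
        for word in mail["body"].lower().split():
            index[word].add(email_id)
    return index

def search(index, query):
    """Return the ids of emails whose body contains the query word."""
    return sorted(index.get(query.lower(), set()))

index = build_index(emails)
hits = search(index, "hello")   # ['A', 'B']
```

Lookup is a single dictionary access, which is why plain email search needs little more than this; the hard part the post points to, ordering a million hits, is a separate ranking problem.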
d**********3 Posts: 1186 | 20 Hello All Members,
We will hold SQL PASS Career Talk on Jun 9th.
(Jun 9, 2015 9:00 PM - 10:30 PM CST)
Please register for How to Start the Career as Data Scientist on Jun 9, 2015
9:00 PM CDT at:
https://attendee.gotowebinar.com/register/5891367287069457410
After registering, you will receive a confirmation email containing
information about joining the webinar.
------------------------------------------------------
About this Session,
How to Start the Career as Data Scientist
In this talk, Yong,... |
|
p*u Posts: 2454 | 21 Google had support from two famous venture capital firms, Sequoia Capital and
Kleiner Perkins Caufield & Byers. At that time search engines were virgin
soil, and PageRank is a good algorithm. Be the first to eat the crab. |
|
|
|
Z****e Posts: 2999 | 24 Maybe open-source PageRank, too? Hehe |
|
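c****e Posts: 1453 | 25 In search, most of the machines go to serving; crawling and indexing are the smaller share. Of course, the number of serving machines
depends on traffic.
Ranking has also advanced a lot; you couldn't get by on PageRank alone. Comparing 2004 with now, entities are one
big advance, and both Bing and Google are investing in that direction. Individual segments have also progressed a lot, such as movies,
restaurants, and celebrities.
Search for any celebrity on Jike and on Baidu and you'll see the gap.
|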
|
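d******k Posts: 4295 | 26 A layman's question: is PageRank still the core of Google search?
And how is deep learning doing these days? Is it a gimmick, or has something really practical come out of it? |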
|
|
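w***g Posts: 5958 | 28 Over the past couple of days I've been posting on the Faculty board and wrestling with the concept of big data, and it suddenly dawned on me
why big data took off in industry:
Many patterns become obvious once the data volume is large; however dumb the algorithm, with a huge amount of data its results
become very good. Developing a brilliant algorithm is expensive and risky, whereas acquiring data is easy for many companies.
So a big-data wave appears, with everyone counting on data to be their lifesaver.
Take Google, the origin of big data: PageRank is really just a front; what actually optimizes Google's search results is
the general public. A result that gets clicked more moves up in rank. Dumb enough as an algorithm, right? Google Translate also uses
big-data techniques. Whether an algorithm is dumb is relative to system complexity. Take linear regression: with small data
you can write it in three to five lines of Matlab. With large data you need Hadoop or Spark, and the platform supporting the computation is far, far more
complex than the actual algorithm.
But is that really the truth? Anyone who has done machine learning knows that the marginal benefit of more data
diminishes, and often diminishes exponentially. Gaining 10 percentage points may require thousands,
tens of thousands, or even hundreds of millions of times more data. It feels a bit like the Foolish Old Man moving the mountains.
There is an even darker truth behind the drive for big data: raising the barrier to ent... |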
|
z****e Posts: 54598 | 29 February 22, 2015 Nicole Hemsoth
If you haven’t heard of Flink until now, get ready for the deluge. As one
of a stream of Apache incubator-to-top-level projects turned commercial
effort, the data processing engine’s promise is to deliver near-real time
handling of data analytics in a much faster, more condensed, and memory-
aware way than Hadoop or its in-memory predecessor, Spark, could do.
What really captured our attention, however, was the claim by Data Artisans,
the company behind Flin... |
|
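h*i Posts: 3446 | 30 The problem BP (backpropagation) sets out to solve is actually simple: how to change an already-learned structure in response to new data
without completely forgetting what was learned before. So it uses differentiation to propagate the output error backward bit by bit
and fine-tune the existing structure. In principle, this is very similar to the computation of PageRank used in web search. My point in saying this
is that BP is just one approach; there may be many other ways to achieve the same thing. BP is certainly not the
best approach. Why? 1. It is too slow, and therefore needs lots and lots of data; living organisms don't need that
much data. 2. The structure of biological nervous systems does not support BP. |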
|
A******y Posts: 2041 | 31 JBC is probably no better than its I.F. suggests. Still, it is number 2 by
PageRank, which works in real life, i.e. Google. |
|
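d***a Posts: 13752 | 32 Larry Page and Sergey Brin too.
The Turing Award Sareh mentioned is a bit hard to predict; I think the PageRank algorithm is an applied algorithm,
not basic research, and its main contribution is in engineering applications. |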
|
a****y Posts: 1035 | 34 arXiv + PageRank?
Isn't that just the algorithm a BBS or blog uses to pick its top-ten headlines? |
|
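c********e Posts: 598 | 35
Bioinformatics is a "chicken rib" field (little value, yet hard to give up); lacking motivation is common. If you develop a new algorithm, can it make money
the way the PageRank algorithm did? |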
|
|
m******i Posts: 834 | 37 [Nano Science and Technology World News] According to our latest calculations over nanotechnology websites worldwide, based on Alexa alone,
our forum has entered the global top 10 in a very short time. Google currently indexes 57,000 of our pages,
and Baidu more than 40,000.
Here is how Alexa ranks the leading nanotechnology portals (as per August 3,
2007):
Portal             Traffic Rank   Google collection   Google PageRank
Nanotech-now.com   167,926        29,400              7
Nanowerk.com       181,471        31,000              7
Foresight.org      248,263        N.A. |
|
B*G Posts: 13438 | 38 mitbbs has a fairly high PageRank on Google; buddy, this picture of yours will turn up in searches. |
|
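p*****i Posts: 2019 | 39 Have you ever used Google?
Do you know how simple the idea behind Google is?
links form a graph-->markov chain -->svd-->singular values as pageRanks
Go back and read the paper Larry Page wrote; you'll find it looks just like most of the junk out there.
But it succeeded.
The optical fiber paper by Charles Kao, in IEEE Transactions, looks just like most of the junk too.
But it succeeded.
Without research in science and engineering, humanity would make no progress. |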
|
h***t Posts: 2540 | 40 PageRank was first published at WWW. |
|
|
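m***r Posts: 359 | 42 http://ml.memect.com/weekly/2015-03-15/
## 2015-03-15 Sunday, full edition: 28 items
NICAR15 session: Hands-on with machine learning @爱可可-爱生活
"Seven Concurrency Models in Seven Weeks" @睡眼惺忪的小叶先森
Two R language tutorials @2gua
Classic paper: Fast unfolding of communities in large networks @爱可可-爱生活
Ranking World Cup teams with PageRank @爱可可-爱生活
## 2015-03-14 Saturday, full edition: 28 items
LINE model for large-scale network dimensionality reduction: paper and source code released @chuckpku
A summary of probabilistic graphical model basics @数盟社区
Sum-Product Networks @爱可可-爱生活
IJCV survey: discrete energy minimization in graphical models @赵家平USC
Stanford's neural-network-based natural language dependency parser @爱可可-爱生活
## 2015-03-13 Friday, full edition: 32 items
Multi-intelligence natural language processing @刘知远THU
Recommended authors on computer-aided translation @青鸟leo翻译之声
FaceNet face recognition @爱可可... |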
|
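m****g Posts: 530 | 44 Recently two associate editors of the Journal of Biological Chemistry (JBC), Vincent C. Hascall and Richard W Hanson,
wrote an article, "JBC on Journal Ranking". The article is very interesting: it throws a bit of cold water
on the current trend in science of everything revolving around the impact factor,
and it also voices the discontent of JBC, a century-old journal.
Journal rankings today mostly use the impact factor adopted by ISI (Institute for Scientific
Information), a company headquartered in Philadelphia, USA. The concept was first
proposed by Eugene Garfield; the original intent was to rank the articles published in science by publication count and citation rate,
so that the company could manage the enormous volume of scientific information. But the "impact factor" later became hugely influential,
which Eugene Garfield had not anticipated at the outset. Some universities treat the number of ISI articles
as an important marker of a scientist's caliber, and it can even decide a researcher's fate.
In a 1995 speech, Eugene Garfield said (roughly): "In 1955, when I began to adopt the impact factor,
I did not imagine that it would one day provoke such great controversy. Like nuclear
energy, the impact factor is a double-edged sword, |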
|
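a****o Posts: 1786 | 45 JBC is still the journal with the highest total citation count in biochemistry.
I too have had the honor of being rejected by it |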
|
|
|
|