x**g posts: 807 | 1 Has anybody used Revolution R? If so, do you think it is a solution for
overcoming R's limited-memory problem?
Thank you. | d*l posts: 400 | 2 Well, I listened to their talk about RHadoop at a conference last year, and
then tried RHadoop (https://github.com/RevolutionAnalytics/RHadoop/wiki)
a couple of months ago.
My impression is that this project is at a very early stage of development
and not very useful at the moment. It is not transparent to the programmer,
and I guess it would take many people many years to port a good-sized pool
of R packages to Map/Reduce; it is not there yet. Basically, it is just an
abstraction layer on top of Hadoop streaming, and I find that using plain
Hadoop streaming with R is actually easier, more straightforward, and more
flexible.
I am interested in knowing others' opinions.
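For reference, "plain Hadoop streaming with R" boils down to an ordinary script that reads stdin and writes tab-separated key/value pairs to stdout. A minimal word-count mapper sketch follows; the file names, input/output paths, and the jar location in the comments are all hypothetical, and the submission command will vary with your Hadoop installation:

```r
#!/usr/bin/env Rscript
# mapper.R -- minimal word-count mapper for Hadoop streaming (illustrative).
# Hadoop streaming feeds input lines on stdin and collects tab-separated
# key/value pairs from stdout; no RHadoop layer is involved.

con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1)) > 0) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  for (w in words[nzchar(words)]) {
    cat(w, "\t1\n", sep = "")   # emit: word <TAB> 1
  }
}
close(con)

# Submitted with something like (paths are illustrative):
# hadoop jar hadoop-streaming.jar \
#   -input /data/in -output /data/out \
#   -mapper mapper.R -reducer reducer.R \
#   -file mapper.R -file reducer.R
```

A matching reducer.R would read the sorted key/value stream from stdin and sum the counts per word; Hadoop handles the shuffle/sort between the two stages.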
| s*********e posts: 1051 | 3 Agreed. Sooner or later they will face a copyright lawsuit.
| S******y posts: 1123 | 4 Interesting topic :-)
Many people expect that something will come along that lets a user simply
plug in R or SAS, run all existing functions/packages/procedures on
Hadoop-scale data, and thereby "solve" the ultimate data-size problem.
Unfortunately, no such thing exists. To achieve that, somebody would have to
rewrite virtually every R package and every SAS/STAT procedure, since most of
their underlying code/algorithms are simply not map-reduce compatible.
That is industry-scale development work.
What Revolution R has achieved is a small piece of the aforementioned
endeavor: they rewrote a few R packages with drastically different
implementations from free R. The goal is to avoid loading everything into
memory up front and instead process data in chunks, while resembling the
familiar R user interface as much as possible.
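As an illustration of that chunked-processing design, Revolution R's RevoScaleR package exposes `rx*` functions that look like their base-R counterparts but stream over an on-disk `.xdf` file in blocks of rows. A rough sketch, assuming the proprietary RevoScaleR package is installed; the file names and variables are hypothetical, and exact arguments may differ between RevoScaleR versions:

```r
library(RevoScaleR)  # proprietary; ships with Revolution R Enterprise

# Convert a large CSV into the chunked on-disk XDF format --
# only one block of rows is held in memory at a time.
rxImport(inData = "big_sales.csv", outFile = "big_sales.xdf",
         rowsPerRead = 500000)

# Fit a linear model by streaming over the XDF chunks.
# The formula interface deliberately mirrors base R's lm().
fit <- rxLinMod(revenue ~ price + region, data = "big_sales.xdf")
summary(fit)
```

The point of the design is that the user-facing code stays close to idiomatic R while the external-memory algorithm underneath makes multiple streaming passes over the data instead of requiring it all in RAM.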
I have used SAS and R for many years in the financial, pharmaceutical, and
insurance industries, but recent years have witnessed explosive growth of
data. In one of my recent projects, for an e-commerce company, we have 10GB
of data coming in every day, and Hadoop has become the de facto platform for
us. I have been using RevoR since 2009 and Mahout since 2011. I like the ease
and simplicity of RevoR, but I also believe Mahout is very promising (maybe
the best shot so far) for the analytical side of the big-data problem.
Just my 2 cents.
Happy Holidays to everyone! | n*****3 posts: 1584 | 5 Nice, thanks for sharing with us.
May I ask: what if you want some algorithm that is not part of Mahout? Do
you write the algorithm from scratch? Would that be easy to do in the Mahout
environment?
| S******y posts: 1123 | 6 It probably requires quite a bit of work.
If you can come up with something like that, you could probably contribute it
to Mahout, and also publish your work in academic/industry journal(s) :-)