h******e Posts: 1791 | 1 For a very large dataset, say on the order of ten million observations, should I use R or SAS? |
T*******I Posts: 5138 | 2 Definitely SAS if you are familiar with it.
[Quoting h******e]: For a very large dataset, say on the order of ten million observations, should I use R or SAS?
|
s*********e Posts: 1051 | 3 It really depends on your OS. If you are using a 64-bit OS with enough memory,
R should be able to handle the data you mentioned. |
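A rough way to check what "enough memory" means here is a back-of-the-envelope size estimate. The sketch below is only arithmetic, written in Python for convenience; it assumes an all-numeric table stored as 8-byte doubles (which is how R stores numeric values), and the column count of 20 is a made-up figure for illustration, not something stated in the thread.

```python
def estimate_bytes(n_rows, n_cols, bytes_per_value=8):
    """Rough in-memory size of an all-numeric table, assuming 8-byte doubles."""
    return n_rows * n_cols * bytes_per_value


n_rows = 10_000_000  # on the order of ten million observations, as in the question
n_cols = 20          # assumed column count, purely illustrative

size_bytes = estimate_bytes(n_rows, n_cols)
size_gib = size_bytes / 2**30  # convert bytes to GiB

print(f"Estimated size: {size_gib:.2f} GiB")
```

An analysis often needs several times this figure, because intermediate copies of the data are made while it runs, so the estimate is a lower bound rather than a memory budget.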
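b*******x Posts: 100 | 4 R has packages for handling large data, such as
bigmemory and foreach, plus other packages for parallel computing.
Alternatively, you could choose
Revolution R.
I predict that within two years it will beat SAS in commercial applications, or at least be on an equal footing.
Look at how popular Ubuntu is now and you can imagine the future of Revo R.
Finally, R is at least a proper programming language.
[Quoting h******e]: For a very large dataset, say on the order of ten million observations, should I use R or SAS?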
|
l*********s Posts: 5409 | 5 The maximum size of the datasets R can handle is limited by the capacity of
physical memory, right?
[Quoting s*********e]: It really depends on your OS. If you are using a 64-bit OS with enough memory, R should be able to handle the data you mentioned.
|
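l*********s Posts: 5409 | 6 Do you have any hands-on experience with these R packages?
[Quoting b*******x]: R has packages for handling large data, such as bigmemory and foreach, plus other packages for parallel computing. : Alternatively, you could choose Revolution R. : I predict that within two years it will beat SAS in commercial applications, or at least be on an equal footing. : Look at how popular Ubuntu is now and you can imagine the future of Revo R. : Finally, R is at least a proper programming language.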
|
b*******x Posts: 100 | 7 I know that the bigmemory package can get around this limit.
[Quoting l*********s]: The maximum size of the datasets R can handle is limited by the capacity of physical memory, right?
|
b*******x Posts: 100 | 8 I have tried bigmemory and foreach before. Unless the data set is really big
and some computation is really slowing down your code, those packages won't
speed things up much. If you have multiple cores on your computer,
or access to a cluster, that is a different story.
I haven't tried Revolution R yet. It looks like they have already released a
version for Ubuntu. Anyway, given the purpose of the project, I
have high expectations for this version of R over the next few years.
[Quoting l*********s]: Do you have any hands-on experience with these R packages?
|
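h******e Posts: 1791 | 9 That's good news. Once SAS has a competitor, its license fee won't be so outrageous.
[Quoting b*******x]: R has packages for handling large data, such as bigmemory and foreach, plus other packages for parallel computing. : Alternatively, you could choose Revolution R. : I predict that within two years it will beat SAS in commercial applications, or at least be on an equal footing. : Look at how popular Ubuntu is now and you can imagine the future of Revo R. : Finally, R is at least a proper programming language.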
|
l*********s Posts: 5409 | 10 That is very encouraging. I will definitely have to bother our administrator to let
me give it a try. :-)
Speed is probably not a big concern; SAS is not very fast at dealing with large
data sets either.
[Quoting b*******x]: I have tried bigmemory and foreach before. Unless the data set is really big and some computation is really slowing down your code, those packages won't speed things up much. If you have multiple cores on your computer, or access to a cluster, that is a different story. : I haven't tried Revolution R yet. It looks like they have already released a version for Ubuntu. Anyway, given the purpose of the project, I have high expectations for this version of R over the next few years.
|
c*******o Posts: 8869 | 11 I happened to attend a workshop taught by one of the authors of bigmemory. I
recall he mentioned that the package only works for all-numeric matrices,
not mixed-type data frames... Maybe I am mistaken somehow...
[Quoting b*******x]: I know that the bigmemory package can get around this limit.
|
n***i Posts: 15 | 12 I think you are right.
[Quoting c*******o]: I happened to attend a workshop taught by one of the authors of bigmemory. I recall he mentioned that the package only works for all-numeric matrices, not mixed-type data frames... Maybe I am mistaken somehow...
|
n***i Posts: 15 | 13 One thing I don't like about bigmemory is that you can only create a sub big.matrix
from contiguous rows/columns of an existing big.matrix. Maybe I am wrong.
[Quoting c*******o]: I happened to attend a workshop taught by one of the authors of bigmemory. I recall he mentioned that the package only works for all-numeric matrices, not mixed-type data frames... Maybe I am mistaken somehow...
|