Collected discussions about 5000x32 - 话题女王


x*******i
Posts: 10

From thread: Statistics board - k means clustering number

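Can anyone recommend an algorithm for choosing the number of clusters?
I have always used the gap statistic.
But now that the data are larger (5000x32), it runs extremely slowly in R and keeps running out of memory. My job has been killed on the server several times.
I'd like to pair a k-means program written in C with a gap statistic, but I don't use C much and I'm rather lazy, so ideally someone has a ready-made one I could borrow.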
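For reference, here is a minimal sketch of what "k-means plus a gap statistic" involves, written in Python rather than C for readability. It is not the poster's code or any R package's implementation. It follows the gap statistic of Tibshirani, Walther and Hastie (2001): W_k is the pooled within-cluster sum of squared distances to the cluster means, each reference set is drawn uniformly over every feature's observed range, and the chosen K is the smallest k with Gap(k) >= Gap(k+1) - s_{k+1}. The k-means++ seeding, the restart count `n_start`, `max_iter`, and all function names are assumptions of this sketch.

```python
import math
import random


def _sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _kmeanspp_init(data, k, rng):
    """k-means++ seeding: each new center is drawn with probability proportional to D(x)^2."""
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        d2 = [min(_sqdist(p, c) for c in centers) for p in data]
        if sum(d2) == 0:  # every point already coincides with a center
            centers.append(list(rng.choice(data)))
        else:
            centers.append(list(rng.choices(data, weights=d2)[0]))
    return centers


def kmeans_wk(data, k, rng, n_start=5, max_iter=100):
    """Lloyd's k-means. Returns W_k, the pooled within-cluster sum of squared
    distances to the cluster means, minimized over n_start restarts."""
    best = math.inf
    for _start in range(n_start):
        centers = _kmeanspp_init(data, k, rng)
        labels = None
        for _it in range(max_iter):
            # Assignment step: each point joins its nearest center.
            new_labels = [min(range(k), key=lambda c: _sqdist(p, centers[c])) for p in data]
            if new_labels == labels:
                break  # assignments stopped changing
            labels = new_labels
            # Update step: move each center to the mean of its members.
            for c in range(k):
                members = [p for p, lab in zip(data, labels) if lab == c]
                if members:  # an empty cluster keeps its previous center
                    centers[c] = [sum(col) / len(members) for col in zip(*members)]
        wk = sum(_sqdist(p, centers[lab]) for p, lab in zip(data, labels))
        best = min(best, wk)
    return best


def gap_statistic(data, k_max, n_ref=10, seed=0):
    """Gap statistic with uniform reference sets over each feature's observed range.
    Returns (k_hat, gaps, s) where gaps[k-1] = Gap(k) and s[k-1] = s_k."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*data)]
    highs = [max(col) for col in zip(*data)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_wk = math.log(kmeans_wk(data, k, rng))
        ref_logs = []
        for _b in range(n_ref):
            # Each reference matrix is built, clustered, then replaced on the next
            # pass, so only one is held at a time.
            ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _row in data]
            ref_logs.append(math.log(kmeans_wk(ref, k, rng)))
        mean_ref = sum(ref_logs) / n_ref
        sd = math.sqrt(sum((x - mean_ref) ** 2 for x in ref_logs) / n_ref)
        gaps.append(mean_ref - log_wk)
        s.append(sd * math.sqrt(1 + 1 / n_ref))
    # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; fall back to k_max.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps, s
    return k_max, gaps, s
```

Calling `gap_statistic(X, k_max=10, n_ref=30)` on the 5000x32 data would mirror the 30 reference draws mentioned later in the thread, though pure Python will be slow at that size; the loop structure is what a C version would need to reproduce.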

g********r
Posts: 8017

From thread: Statistics board - k means clustering number

5000x32 isn't big. How could it be extremely slow and also run out of memory? How old is that server?

x*******i
Posts: 10

From thread: Statistics board - k means clustering number

No, it can't.
The problem is that for each K, it has to compare against uniform random data generated to match the 5000x32 matrix and estimate the dispersion. You know, the number of cycles can't be small for this nonparametric procedure; I've already reduced it to 30.
For a large dataset and a given K, k-means itself (the cluster library) is slow in R even for a single run.
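The cost described here can be counted directly: for each K the gap statistic fits k-means once on the data and once on each of the B reference sets. A minimal sketch of that count, assuming K ranges over 1..`k_max` and each fit takes a fixed number of Lloyd iterations (both hypothetical; the thread only gives B = 30, and the function names are illustrative):

```python
def gap_kmeans_fits(k_max, n_ref, n_start=1):
    """k-means fits needed for K = 1..k_max: one on the data plus one per
    reference set for every K, times the number of restarts."""
    return k_max * (1 + n_ref) * n_start


def lloyd_distance_evals(n, k_max, n_ref, iters, n_start=1):
    """Point-to-center distance evaluations across all those fits, assuming
    `iters` Lloyd iterations per fit; one iteration compares n points with k centers."""
    per_dataset = sum(n * k * iters for k in range(1, k_max + 1))
    return per_dataset * (1 + n_ref) * n_start
```

With the hypothetical values K = 1..10 and 20 iterations per fit, `gap_kmeans_fits(10, 30)` is 310 fits and `lloyd_distance_evals(5000, 10, 30, 20)` is 170,500,000 point-to-center distances, or about 5.5 × 10^9 multiply-adds in 32 dimensions. Restarts multiply both numbers.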


g********r
Posts: 8017

From thread: Statistics board - k means clustering number

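The stats package also has a kmeans function, and it doesn't seem slow.
That is what I meant about memory: a 5000x32 matrix takes up only a few tens of MB. If each randomly generated matrix is deleted once it has been used, and there is no memory leak, it shouldn't need much memory at all.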
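The memory side is simple arithmetic. A minimal sketch, assuming dense matrices of 8-byte doubles (R's default numeric type) and ignoring per-object overhead; the function names are illustrative:

```python
def matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Raw storage for a dense matrix of 8-byte doubles, excluding object headers."""
    return n_rows * n_cols * bytes_per_value


def gap_reference_bytes(n_rows, n_cols, n_ref, keep_all, bytes_per_value=8):
    """Raw bytes for the data plus reference matrices: all n_ref held at once if
    keep_all, otherwise one at a time (each discarded after its W_k is computed)."""
    held = n_ref if keep_all else 1
    return (1 + held) * matrix_bytes(n_rows, n_cols, bytes_per_value)
```

`matrix_bytes(5000, 32)` is 1,280,000 bytes, about 1.3 MB. Keeping the data plus all 30 reference matrices at once comes to about 39.7 MB, while generating and discarding them one at a time needs about 2.6 MB, which supports the point that the job should not need much memory.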


#	Board	Posts (threads)
-	All boards	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)
