由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Mathematics版 - count unique values in file with 1 million rows (转载)
相关主题
Help on random integer generator请教:如何证明Green's function的唯一性
Predict values of vectors generated by black box functions如何验证有限元等得到的数值解的可靠性?
请教:algebra VS. sigma algebra忘光了,谁来帮我解决这个比较白的问题?
有用singular软件或者搞invariant polynomial theory的吗?问个线性代数的问题
几个难度很大的极限问题(征解)Existence and uniqueness of ground states
请教:g(v)=v unique solution 的证明求教PDE Maxwell Equations数学问题
ask a linear algebra question请问一个关于Mathematica数值解的问题
不好意思,再问一个线性代数的问题,我不太理解题目朱熹平临门一脚证明了庞加莱猜想 --要申请2016年度国家科学奖
相关话题的讨论汇总
话题: excel话题: values话题: unique话题: a5389579话题: count
进入Mathematics版参与讨论
1 (共1页)
l******9
发帖数: 579
1
【 以下文字转载自 Statistics 讨论区 】
发信人: light009 (light009), 信区: Statistics
标 题: count unique values in file with 1 million rows
发信站: BBS 未名空间站 (Sat Apr 12 16:11:22 2014, 美东)
I need to count the unique values in one column in EXCEL 2010.
The worksheet has 1 million rows and 10 columns. All cell values are string
or numbers.
I used the solution at Count unique values in a column in Excel
=SUMPRODUCT((A2:A1000000<>"")/COUNTIF(A2:A100000,A2:A1000000&""))
But, it runs so long time that the EXCEL is almost frozen. And, it generates
25 processes in Win 7.
Are there more efficient ways to do it ?
Also, in the column, all values have for format of
AX_Y
here, A is a character, X is an integer, Y is an integer from 1 to 10.
For example, A5389579_10
I need to cut off the part after (including) undersocre. for the example,
A5389579
This is what I need to count as unique values in all cells in one column.
For example, A5389579_10
A1543848_6
A5389579_8
here, the unique value has 2 after removing the part after underscore.
How to do it in EXCEL VBA and R (if no efficient solution for EXCEL) ?
Thanks !
w**a
发帖数: 1024
2
MATLAB 有个命令叫unique 吧
S*********N
发帖数: 6151
3

string
generates
你可以先做一下字串操作,然后Count

【在 l******9 的大作中提到】
: 【 以下文字转载自 Statistics 讨论区 】
: 发信人: light009 (light009), 信区: Statistics
: 标 题: count unique values in file with 1 million rows
: 发信站: BBS 未名空间站 (Sat Apr 12 16:11:22 2014, 美东)
: I need to count the unique values in one column in EXCEL 2010.
: The worksheet has 1 million rows and 10 columns. All cell values are string
: or numbers.
: I used the solution at Count unique values in a column in Excel
: =SUMPRODUCT((A2:A1000000<>"")/COUNTIF(A2:A100000,A2:A1000000&""))
: But, it runs so long time that the EXCEL is almost frozen. And, it generates

s*****V
发帖数: 21731
4
1 million row在计算机面前算毛,直接数就得了。
s****b
发帖数: 2039
5
can use programming, such as java:
make a 2D array, put excel sheet into the array.
for each value, find _ and select substring before _, put into a new array.
then for the new array, start with first value, use for loop, from i=1,if
find any value == this
value, delete that value, shift array contents one step up,reduce the
dimension by 1, increase i by 1, do it again. when i reaches 1million, put
this value into first position of a new 1D array, delete this value.
do it again with the second value ...
...
until you do it with the last value (the final dimension).
the new 1D array will be all the unique values.
set up a counter and a++ each time you put a value into the 1D array. the
final number a is the number of all unique values.
Don't know if this is an efficient idea, and don't know if java will freeze
or not with that large memory requirement, ha ha.
***
if you don't mind, you can also just select that column, paste to a new
sheet, then your programming will just be from 1D array to 1D array.
or you can import that excel into SAS, only select that column, export to a
textfile and save it. I guess SAS will not freeze.

string
generates

【在 l******9 的大作中提到】
: 【 以下文字转载自 Statistics 讨论区 】
: 发信人: light009 (light009), 信区: Statistics
: 标 题: count unique values in file with 1 million rows
: 发信站: BBS 未名空间站 (Sat Apr 12 16:11:22 2014, 美东)
: I need to count the unique values in one column in EXCEL 2010.
: The worksheet has 1 million rows and 10 columns. All cell values are string
: or numbers.
: I used the solution at Count unique values in a column in Excel
: =SUMPRODUCT((A2:A1000000<>"")/COUNTIF(A2:A100000,A2:A1000000&""))
: But, it runs so long time that the EXCEL is almost frozen. And, it generates

1 (共1页)
进入Mathematics版参与讨论
相关主题
朱熹平临门一脚证明了庞加莱猜想 --要申请2016年度国家科学奖几个难度很大的极限问题(征解)
急!急!急!倾我所有寻求解题帮助请教:g(v)=v unique solution 的证明
matrix inverseask a linear algebra question
10个包子征解法。。。 (转载)不好意思,再问一个线性代数的问题,我不太理解题目
Help on random integer generator请教:如何证明Green's function的唯一性
Predict values of vectors generated by black box functions如何验证有限元等得到的数值解的可靠性?
请教:algebra VS. sigma algebra忘光了,谁来帮我解决这个比较白的问题?
有用singular软件或者搞invariant polynomial theory的吗?问个线性代数的问题
相关话题的讨论汇总
话题: excel话题: values话题: unique话题: a5389579话题: count