l******9 发帖数: 579 | 1 【 以下文字转载自 Statistics 讨论区 】
发信人: light009 (light009), 信区: Statistics
标 题: count unique values in file with 1 million rows
发信站: BBS 未名空间站 (Sat Apr 12 16:11:22 2014, 美东)
I need to count the unique values in one column in EXCEL 2010.
The worksheet has 1 million rows and 10 columns. All cell values are string
or numbers.
I used the solution at Count unique values in a column in Excel
=SUMPRODUCT((A2:A1000000<>"")/COUNTIF(A2:A100000,A2:A1000000&""))
But, it runs so long time that the EXCEL is almost frozen. And, it generates
25 processes in Win 7.
Are there more efficient ways to do it ?
Also, in the column, all values have for format of
AX_Y
here, A is a character, X is an integer, Y is an integer from 1 to 10.
For example, A5389579_10
I need to cut off the part after (including) undersocre. for the example,
A5389579
This is what I need to count as unique values in all cells in one column.
For example, A5389579_10
A1543848_6
A5389579_8
here, the unique value has 2 after removing the part after underscore.
How to do it in EXCEL VBA and R (if no efficient solution for EXCEL) ?
Thanks ! | w**a 发帖数: 1024 | | S*********N 发帖数: 6151 | 3
string
generates
你可以先做一下字串操作,然后Count
【在 l******9 的大作中提到】 : 【 以下文字转载自 Statistics 讨论区 】 : 发信人: light009 (light009), 信区: Statistics : 标 题: count unique values in file with 1 million rows : 发信站: BBS 未名空间站 (Sat Apr 12 16:11:22 2014, 美东) : I need to count the unique values in one column in EXCEL 2010. : The worksheet has 1 million rows and 10 columns. All cell values are string : or numbers. : I used the solution at Count unique values in a column in Excel : =SUMPRODUCT((A2:A1000000<>"")/COUNTIF(A2:A100000,A2:A1000000&"")) : But, it runs so long time that the EXCEL is almost frozen. And, it generates
| s*****V 发帖数: 21731 | 4 1 million row在计算机面前算毛,直接数就得了。 | s****b 发帖数: 2039 | 5 can use programming, such as java:
make a 2D array, put excel sheet into the array.
for each value, find _ and select substring before _, put into a new array.
then for the new array, start with first value, use for loop, from i=1,if
find any value == this
value, delete that value, shift array contents one step up,reduce the
dimension by 1, increase i by 1, do it again. when i reaches 1million, put
this value into first position of a new 1D array, delete this value.
do it again with the second value ...
...
until you do it with the last value (the final dimension).
the new 1D array will be all the unique values.
set up a counter and a++ each time you put a value into the 1D array. the
final number a is the number of all unique values.
Don't know if this is an efficient idea, and don't know if java will freeze
or not with that large memory requirement, ha ha.
***
if you don't mind, you can also just select that column, paste to a new
sheet, then your programming will just be from 1D array to 1D array.
or you can import that excel into SAS, only select that column, export to a
textfile and save it. I guess SAS will not freeze.
string
generates
【在 l******9 的大作中提到】 : 【 以下文字转载自 Statistics 讨论区 】 : 发信人: light009 (light009), 信区: Statistics : 标 题: count unique values in file with 1 million rows : 发信站: BBS 未名空间站 (Sat Apr 12 16:11:22 2014, 美东) : I need to count the unique values in one column in EXCEL 2010. : The worksheet has 1 million rows and 10 columns. All cell values are string : or numbers. : I used the solution at Count unique values in a column in Excel : =SUMPRODUCT((A2:A1000000<>"")/COUNTIF(A2:A100000,A2:A1000000&"")) : But, it runs so long time that the EXCEL is almost frozen. And, it generates
|
|