Page 3 - Collected discussions about PCA - 话题女王 (Topic Queen)

c***y
Posts: 615

From thread: Biology board - PCA explained variance too low, what should I do?

270 samples (6 replicates x 5 treatments x 9 time points), metagenomic data,
phylogeny features: 850
PCA results: PC1, ~12%; PC2, ~4%; PC3, ~1%
How should I interpret this result? How reliable is it?
Thanks a lot!!

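k*****2
Posts: 135

From thread: Biology board - PCA explained variance too low, what should I do?

12% does look pretty low. But I don't know what you are trying to predict, so maybe it's fine after all.
Suggestion 1. Use random features (for example, your current data after shuffling) and see how much you can predict. If your results are significantly better than the random prediction, the result is acceptable.
Suggestion 2. Before PCA, use CV to remove features that contribute little, because with many features the noise will be relatively large.
Suggestion 3. If you want to do classification, consider other classification methods. There are many kinds, each suited to different types of data.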

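k*****2
Posts: 135

From thread: Biology board - PCA explained variance too low, what should I do?

If you randomly shuffle the treatment data to get a pseudo dataset and PCA still explains ~12% of the variance, the model is pretty bad (no predictive power at all). If the shuffled data explain only a small fraction, say you do 1000 shuffles and all of them come out far below 12%, then your result is OK.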
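
Here is a minimal NumPy sketch of the shuffling idea, applied to the PCA itself rather than to a predictor. PCA never sees the treatment labels. So this version shuffles each feature column independently, which is my assumption here and the usual form of a PCA permutation test. That breaks the correlation between features and keeps each feature's distribution. The test then asks how often shuffled data reach the observed PC1 share. The data are simulated, and `pc1_share` and `pc1_permutation_test` are hypothetical names.

```python
import numpy as np

def pc1_share(X):
    """Fraction of total variance captured by PC1 of column-centered X."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    return s[0] ** 2 / np.sum(s ** 2)

def pc1_permutation_test(X, n_perm=1000, seed=0):
    """Shuffle every column independently (breaking between-feature
    correlation) and compare the observed PC1 share with the null."""
    rng = np.random.default_rng(seed)
    observed = pc1_share(X)
    null = np.empty(n_perm)
    for i in range(n_perm):
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null[i] = pc1_share(Xp)
    # add-one correction so the p-value is never exactly 0
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, null, p_value

# Simulated example: 270 samples, 50 features sharing one latent factor
rng = np.random.default_rng(1)
latent = rng.normal(size=(270, 1))
X = 0.5 * latent @ rng.normal(size=(1, 50)) + rng.normal(size=(270, 50))
obs, null, p = pc1_permutation_test(X, n_perm=200)
print(f"observed PC1 = {obs:.1%}, largest null PC1 = {null.max():.1%}, p = {p:.3f}")
```

For the original 270 x 850 data, the observed 12% would be compared with the shuffled PC1 shares in the same way.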

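s******s
Posts: 13035

From thread: Biology board - PCA explained variance too low, what should I do?

Your PC1 explains 12%. I don't know why you think that's low. Compared to what?
The variance explained by your PCs drops off quickly. I don't work on metagenomics, but statistically this means the first two PCs are quite reliable.
You want to use the PCs for prediction or clustering. I know many people do that, and the results are easy to interpret, which pure biologists like. But this thing is really just for looking at the data. If you are serious, you put the data directly into a model. PCA depends heavily on the distance function: different data transformations and different distance measures can give completely different results. It's a fast & dirty method and really not worth digging into too deeply.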

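c***y
Posts: 615

From thread: Biology board - PCA explained variance too low, what should I do?

Can you recommend some model-based analysis methods?
Personally, I also find PCA unsatisfying much of the time.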

c***y
Posts: 615

From thread: Biology board - PCA explained variance too low, what should I do?

My understanding is that PCA is unsupervised to begin with, so it's already random...

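c***y
Posts: 615

From thread: Biology board - PCA explained variance too low, what should I do?

Thank you very much for your replies.
I read up carefully on the PCA permutation test. My earlier understanding was indeed wrong. I'll go try this test.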

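s******s
Posts: 13035

From thread: Biology board - Which is better, t-SNE or PCA? Why?

PCA is essentially an eigenanalysis of a Euclidean-distance similarity matrix, so it naturally captures the similarity between points.
Think about it carefully: retaining sample variance and retaining dissimilarity mean pretty much the same thing.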
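
A small NumPy check of this point on simulated data. PCA scores of centered data match classical (Torgerson) multidimensional scaling of the squared Euclidean distance matrix, up to the sign of each axis. Double-centering the squared distances recovers the inner-product matrix Xc Xc^T. So keeping the directions of largest variance is the same as best preserving Euclidean distances. This equivalence holds for Euclidean distance only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                  # 20 simulated points in 5 dimensions
Xc = X - X.mean(axis=0)

# PCA scores from the SVD of the centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = U * s                            # same as Xc @ Vt.T

# Classical MDS: double-center the squared Euclidean distance matrix
D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
n = X.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D2 @ J                         # equals Xc @ Xc.T
evals, evecs = np.linalg.eigh(B)
order = np.argsort(evals)[::-1][:5]
mds_coords = evecs[:, order] * np.sqrt(evals[order])

# Identical up to the sign of each axis
signs = np.sign(np.sum(pca_scores * mds_coords, axis=0))
print(np.allclose(pca_scores, mds_coords * signs))   # True
```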

c********e
Posts: 598

From thread: Biology board - Which is better, t-SNE or PCA? Why?

Retaining dissimilarity (PCA) is different from retaining similarity (t-SNE).

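A*****n
Posts: 243

From thread: Biology board - Which is better, t-SNE or PCA? Why?

Use whichever one gives the nicer-looking plot :)
Even when you plot with t-SNE, it's best to reduce dimensionality with PCA first and look at the data structure along the way.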

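A*****n
Posts: 243

From thread: Biology board - Which is better, t-SNE or PCA? Why?

When the data are very high-dimensional, for example 30,000+ genes, and especially when some features are collinear, PCA or SVD helps reduce the dimensionality to a reasonable range, say below 50 or 100. This helps a lot with computing the distance matrix, and it also reduces the influence of noise on the distances. At that point you can take a first look at the data with PC1 to 3 and see whether there is any structure. Only if the total variance of PC1 to 3 is really too small should you consider t-SNE for further dimensionality reduction. Of course, for data like CyTOF, where the dimensionality is already fairly low and the genes differ a lot from one another, it doesn't matter much.
With any nonlinear transformation, if you don't choose suitable parameters, the results can easily be misleading and show artificial structure. But many people don't like hearing negative conclusions like this.
http://distill.pub/2016/misread-tsne/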
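
A NumPy sketch of the preprocessing suggested above, with simulated data and hypothetical sizes (100 samples x 3000 features). The data are projected onto the top 50 PCs via SVD. Euclidean distances are then computed in the reduced space, which is what a downstream t-SNE or clustering step would use. Keeping all components reproduces the original distances exactly. Keeping 50 mostly drops low-variance noise directions.

```python
import numpy as np

def pca_reduce(X, k):
    """Scores of column-centered X on its top-k principal components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def pairwise_dist(Z):
    """Euclidean distance matrix between the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * Z @ Z.T
    D = np.sqrt(np.clip(D2, 0, None))   # clip tiny negatives from rounding
    np.fill_diagonal(D, 0.0)
    return D

rng = np.random.default_rng(0)
n, p = 100, 3000
# 5 real latent dimensions plus small noise in every feature
X = rng.normal(size=(n, 5)) @ rng.normal(size=(5, p)) + 0.1 * rng.normal(size=(n, p))

D_full = pairwise_dist(X)
D_50 = pairwise_dist(pca_reduce(X, 50))
rel_err = np.abs(D_full - D_50).max() / D_full.max()
print(f"max relative distance change with 50 PCs: {rel_err:.2e}")
```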

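s******s
Posts: 13035

From thread: Biology board - Which is better, t-SNE or PCA? Why?

A question: is there any memory-efficient software for computing PCA?
I have a 10k x 500k data matrix. Computing it in R used up all 200 GB of my memory, so right now I can only compute it by subsampling, which is annoying.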

c********e
Posts: 598

From thread: Biology board - Which is better, t-SNE or PCA? Why?

Python scikit-learn has "Incremental PCA". It's difficult to avoid subsampling.
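
Another route is sketched here in NumPy under stated assumptions. It is not what scikit-learn does internally. When there are far fewer samples than features (10k x 500k above), stream the feature columns in blocks and accumulate the n x n Gram matrix of the centered data, then eigendecompose it. For n = 10,000 the Gram matrix holds 10,000 x 10,000 doubles, about 800 MB, and the 500k columns never need to be in memory at once. `load_block` is a hypothetical loader, for example one that reads column slices from disk.

```python
import numpy as np

def gram_pca_chunked(load_block, n, p, k, block=10_000):
    """Top-k PCA scores for an n x p matrix (n << p) built from column blocks.
    load_block(j0, j1) is a hypothetical loader returning columns j0:j1
    as an n x (j1 - j0) array."""
    G = np.zeros((n, n))
    for j0 in range(0, p, block):
        j1 = min(j0 + block, p)
        B = load_block(j0, j1)
        B = B - B.mean(axis=0)          # center each feature (column)
        G += B @ B.T                     # accumulate Xc @ Xc.T block by block
    evals, evecs = np.linalg.eigh(G)     # ascending order
    idx = np.argsort(evals)[::-1][:k]
    scores = evecs[:, idx] * np.sqrt(np.clip(evals[idx], 0, None))
    explained = evals[idx] / evals.sum() # share of total variance per PC
    return scores, explained

# Small check against a direct SVD (scores agree up to the sign of each PC)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2000))
scores, explained = gram_pca_chunked(lambda a, b: X[:, a:b], 50, 2000, k=3, block=300)
Xc = X - X.mean(axis=0)
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(np.abs(scores), np.abs(U[:, :3] * s[:3])))
```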

a****g
Posts: 6

From thread: Computation board - where to find C++ codes for PCA and CA?

PCA: principal component analysis;
CA: correspondence analysis.
I have tried searching on Google, but the code I found doesn't work.
Does anybody have reliable C/C++ code for them, or is there a good website where I can find it?
Thanks a lot!

a****g
Posts: 6

From thread: Computation board - where to find C++ codes for PCA and CA?

My labmate has this book, but I can't find PCA in it. Could you tell me which chapter it is in?
(I also found the book's website: "http://lib-www.lanl.gov/numerical/bookcpdf.html")
Thanks a lot!

a****g
Posts: 6

From thread: Computation board - where to find C++ codes for PCA and CA?

Thank you very much, Tarzan. I guess there is no direct code or function for PCA in that book, but it is very easy to implement using its eigenvalue routines.
I have transformed my data in Matlab, which was also very easy.
Thanks again!

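c*****i
Posts: 120

From thread: Computation board - Asking about PCA

Does anyone know a good PCA algorithm that can handle very high-dimensional data (say, several million dimensions)?
If you know good materials or code, could you recommend them?
Thanks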

a*******a
Posts: 33

From thread: Computation board - Asking about PCA

If the number of data points is small (~10000), it's possible.
Let each column of the matrix X be one data point, and center X. Then the eigenvectors of X^T X, each scaled by the square root of its eigenvalue, give the projections of the data points onto the PCA vectors.
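
A NumPy check of the trick described above, on simulated data with hypothetical sizes (1000 dimensions, 30 points, columns as data points as in the post). After centering, the eigenvectors V of the small N x N matrix X^T X, scaled by the square roots of their eigenvalues, are exactly the projections U^T X of the points onto the top principal directions U. U itself can be recovered as X V / sqrt(eigenvalue).

```python
import numpy as np

rng = np.random.default_rng(0)
p, N = 1000, 30                            # 1000 dimensions, 30 data points
X = rng.normal(size=(p, N))                # each column is one data point
X = X - X.mean(axis=1, keepdims=True)      # center every dimension across points

# Solve the small N x N eigenproblem instead of the p x p one
evals, V = np.linalg.eigh(X.T @ X)
idx = np.argsort(evals)[::-1][:5]          # top 5 components
proj = (V[:, idx] * np.sqrt(evals[idx])).T # 5 x N projections of the points

# PCA directions in the original p-dimensional space, if they are needed
U = X @ V[:, idx] / np.sqrt(evals[idx])    # p x 5, orthonormal columns

print(np.allclose(U.T @ U, np.eye(5)))     # True
print(np.allclose(proj, U.T @ X))          # True
```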

l*********1
Posts: 351

From thread: Computation board - A question about PCA in Matlab, thanks

http://www.mathworks.com/access/helpdesk/help/toolbox/stats/index.html?/access/helpdesk/help/toolbox/stats/brkgqnt.html&http://www.mathworks.com/cgi-bin/texis/webinator/search/?db=MSS&prox=page&rorder=750&rprox=750&rdfreq=500&rwfreq=500&rlead=250&sufs=0&order=r&is_summary_on=1&ResultCount=10&query=Principal+Component+Analysis+&submitButtonName=Search
Working through the PCA example on that page, I got:
c3 = coefs(:,1:3)
c3 =
0.2064 0.2178 -0.6900
0.3565 0.2506 -0.2082
0.4602 -0.2995 -0.0073
0.2813

r****y
Posts: 1437

From thread: Computation board - A question about PCA in Matlab, thanks

For PCA, the sign (+/-) is totally arbitrary. Just keep the scores and loadings consistent.
Think of it this way: take the point (1 1 1) in 3-dimensional space.
In 3-D coordinates, if you let the basis be
(1 0 0) (0 1 0) (0 0 1) (your EOF components),
your projection coefficients would be
1 1 1
If you let the basis be
(-1 0 0) (0 1 0) (0 0 1),
then they are
-1 1 1
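
A short NumPy illustration of this reply on simulated data. Flipping the sign of one component in both the loadings (the `coefs` columns in the question) and the scores leaves the reconstruction of the centered data unchanged. That is why different software can report opposite signs for the same PC. The variable names are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

loadings = Vt.T                  # columns are principal axes
scores = Xc @ loadings           # coordinates of each sample on those axes

# Flip the sign of component 1 in BOTH the loadings and the scores
flip = np.array([-1.0, 1.0, 1.0, 1.0])
loadings2, scores2 = loadings * flip, scores * flip

print(np.allclose(scores @ loadings.T, Xc))    # True
print(np.allclose(scores2 @ loadings2.T, Xc))  # True: same data, opposite sign for PC1
```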

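s********k
Posts: 6180

From thread: EE board - Looking for Matlab code for Kernel PCA (repost)

[The following text is reposted from the CS board]
From: silverhawk (silverhawk), Board: CS
Title: Looking for Matlab code for Kernel PCA
Site: BBS 未名空间站 (Wed Apr 14 16:50:14 2010, US Eastern)
I googled several, but none of them worked well. I'd expect such a widely used method to have plenty of available code. Could any experts here share code or links they know of? Just the basic RBF or polynomial kernels would be fine. Thanks.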

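x******i
Posts: 172

From thread: Mathematics board - How do I run a MANOVA using component 1 and component 2 from PCA results?

I have a question about doing a MANOVA.
I used PCA to find what goes into component 1 and component 2.
Next I want to use component 1 and component 2 as the dependent variables, before vs. after surgery as the independent variable, and BMI as the covariate.
I looked up online how to do a MANOVA in SPSS and roughly know the steps, but I don't know which values of component 1 and component 2 to copy into the MANOVA columns. Has anyone done a similar analysis? Thanks!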

t********y
Posts: 166

From thread: Mathematics board - How do I run a MANOVA using component 1 and component 2 from PCA results?

Go ask on the Statistics board.
Hardly anyone here even knows what PCA is.

l******r
Posts: 18699

From thread: Mathematics board - How do I run a MANOVA using component 1 and component 2 from PCA results?

What's PCA? I remember some OR loser telling me once, but I forgot.

x******i
Posts: 172

From thread: MedicalCareer board - Question on how to run a MANOVA with component 1 and component 2 from PCA results

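c****y
Posts: 3592

From thread: Quant board - A PCA question, very confused

I have N days of data for P stocks: data = n*p, allocation = n*1.
Then I do PCA and get J principal components, n*j, with a new allocation = j*1.
The question is: why is
J*new_allocation - data*allocation not equal to 0??
J = eigenvector*data
In theory there should be no difference, right?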

m******l
Posts: 97

From thread: Quant board - Matlab PCA question

PCA scores = X * coeff (with X centered column by column first)
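
A NumPy sketch of what is probably happening in the question above, using simulated returns and hypothetical sizes. With data as an n x p matrix, the scores are the centered data times the eigenvector matrix, as the reply says, and not eigenvector*data. The portfolio weights are p x 1, and their PC-space counterpart is V^T w. The two portfolio return series agree exactly only when all p components are kept. Keeping J < p drops the part of the portfolio that lies on the discarded components.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, J = 250, 6, 3                        # hypothetical: 250 days, 6 stocks, keep 3 PCs
data = rng.normal(0.001, 0.02, size=(n, p))  # simulated daily returns, n x p
w = np.full(p, 1.0 / p)                    # allocation: one weight per stock (p x 1)

Xc = data - data.mean(axis=0)              # center each stock's returns
evals, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
V = V[:, np.argsort(evals)[::-1]]          # eigenvectors as columns, largest first

scores = Xc @ V                            # PC return series: data times eigenvectors
w_pc = V.T @ w                             # the same allocation expressed on the PCs

full = scores @ w_pc                       # all p components kept
trunc = scores[:, :J] @ w_pc[:J]           # only the first J components kept

print(np.allclose(full, Xc @ w))           # True, because V is orthogonal
print(np.abs(trunc - Xc @ w).max())        # not 0: the dropped PCs carry part of the portfolio
```

If the raw, uncentered data are used on one side and centered data on the other, the two series also differ by the constant mean(data) @ w.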

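s*****n
Posts: 2174

From thread: Statistics board - Question about PCA

Before doing PCA you need to normalize the matrix, at least by centering each column.
princomp does the centering automatically (and also scales if you use cor = TRUE).
If you compute it yourself from the eigenvalues, you have to do this step yourself first.
The scale() function will do it.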
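
A NumPy sketch comparing the two kinds of preprocessing, using simulated columns on very different scales. Covariance-based PCA on centered data corresponds to R's `princomp()` default. Standardizing the columns (what `scale()` does by default) and then taking the covariance is the same as PCA on the correlation matrix, which corresponds to `princomp(cor = TRUE)`. Note that `princomp` uses divisor n rather than n - 1 for its covariance. That rescales the eigenvalues but not their proportions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])  # very different scales

# Covariance-based PCA: center only
Xc = X - X.mean(axis=0)
ev_cov = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]

# Correlation-based PCA: center and scale each column (like R's scale())
Z = Xc / X.std(axis=0, ddof=1)
ev_std = np.sort(np.linalg.eigvalsh(np.cov(Z, rowvar=False)))[::-1]
ev_cor = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

print(np.allclose(ev_std, ev_cor))       # True: same eigenvalues
print((ev_cov / ev_cov.sum()).round(3))  # PC1 dominated by the largest-scale column
print((ev_cor / ev_cor.sum()).round(3))  # much more even
```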

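n****8
Posts: 37

From thread: Statistics board - How do I test whether two eigenvalues in PCA are equal? Thanks!

I mean a hypothesis test.
The computed values will certainly never be exactly equal. But if the true values are equal, the PCA result is meaningless because it is too unstable.
Thanks for any pointers.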
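
One classical answer is Anderson's (1963) large-sample likelihood-ratio test for equality of adjacent population eigenvalues. It assumes multivariate normal data. The NumPy sketch below implements it for two adjacent eigenvalues under those assumptions. Textbooks differ on whether the multiplier is n or n - 1 and on small-sample corrections, so check the exact form against a reference such as Jolliffe's book Principal Component Analysis before relying on the p-value. For two eigenvalues the statistic is compared with a chi-square distribution on 2 degrees of freedom, whose upper tail probability is exactly exp(-x/2). The function name is hypothetical.

```python
import numpy as np

def two_eigenvalue_test(X, i):
    """Large-sample LR test that the i-th and (i+1)-th largest population
    eigenvalues of cov(X) are equal (after Anderson, 1963).
    Assumes i.i.d. multivariate normal rows. Returns (statistic, p_value)."""
    n = X.shape[0]
    l = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
    pair = l[i:i + 2]
    # -n * sum(log l_j) + n * r * log(mean l_j), with r = 2 eigenvalues
    stat = n * (2 * np.log(pair.mean()) - np.sum(np.log(pair)))
    # chi-square with r(r+1)/2 - 1 = 2 df: its survival function is exp(-x/2)
    p_value = float(np.exp(-stat / 2))
    return float(stat), p_value

rng = np.random.default_rng(0)
equal = rng.normal(size=(500, 3)) * np.array([3.0, 3.0, 1.0])     # true variances 9, 9, 1
distinct = rng.normal(size=(500, 3)) * np.array([3.0, 2.0, 1.0])  # true variances 9, 4, 1
print(two_eigenvalue_test(equal, 0))     # p-value usually not small
print(two_eigenvalue_test(distinct, 0))  # p-value essentially 0
```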

p********0
Posts: 186

From thread: Statistics board - how to Use PCA to get eignen vector and eigen value

Hi,
I have 300 observations of two-dimensional data, X1(a1, b1), X2(a2, b2), ..., Xn(an, bn).
How do I use PCA to get the eigenvectors and eigenvalues?
Do I need to compute the covariance matrix first? E(a) = average(a) and E(b) = average(b).
Is E(X1) = E(X2) = ... = E(Xn)?
How do I get the covariance matrix Cov(Xi, Xj)?
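
A NumPy sketch answering this question with simulated data. The mean and covariance used to generate the data are arbitrary assumptions. The observations X1, ..., Xn share the same means E(a) and E(b), which are estimated by averaging over all 300 rows. The covariance matrix PCA needs is the 2 x 2 matrix between the variables a and b, not a matrix Cov(Xi, Xj) between observations. Its eigenvectors are the principal directions, and its eigenvalues are the variances along them.

```python
import numpy as np

rng = np.random.default_rng(0)
# 300 simulated observations X1..X300 of (a, b), one per row
data = rng.multivariate_normal([5.0, -2.0], [[4.0, 1.5], [1.5, 1.0]], size=300)
a, b = data[:, 0], data[:, 1]
n = len(a)

# E(a) and E(b) are estimated once, from all 300 observations
ca, cb = a - a.mean(), b - b.mean()

# 2 x 2 covariance matrix between the variables a and b
cov = np.array([[ca @ ca, ca @ cb],
                [cb @ ca, cb @ cb]]) / (n - 1)

evals, evecs = np.linalg.eigh(cov)            # eigh returns ascending eigenvalues
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]  # columns of evecs are eigenvectors

print(evals)            # variances along the two principal directions
print(evecs[:, 0])      # direction of largest variance
```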

s********k
Posts: 6180

From thread: Statistics board - Looking for Matlab code for Kernel PCA (repost)

l*********s
Posts: 5409

From thread: Statistics board - what's the difference of PCA and FA?

http://psych.wisc.edu/henriques/pca.html

s*r
Posts: 2757

From thread: Statistics board - Where can I find SAS code for PCA?

There should be a PROC for PCA (PROC PRINCOMP).

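a*****3
Posts: 601

From thread: Statistics board - Where can I find SAS code for PCA?

Couldn't you use something like "two-step clustering"? Using eigenvalues, SAS ends up outputting a "tree"? But I never figured out what this stuff is good for, and I'm too lazy to google it. How about 30 weibi (forum currency) to anyone who explains the difference between PCA and cluster analysis?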

l**********9
Posts: 148

From thread: Statistics board - Where can I find SAS code for PCA?

I think there is a PROC for PCA in SAS... you can try the help documentation.
Waiting for an expert to answer the question by the 5th reply.

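d*********k
Posts: 1239

From thread: Statistics board - What should I do in PCA when P is larger than N?

What should I do in PCA when P is larger than N, i.e., the large P, small N problem?
R's princomp() just throws an error.
Thanks!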

w*****n
Posts: 375

From thread: Statistics board - What should I do in PCA when P is larger than N?

Try "sparse PCA".

B****n
Posts: 11290

From thread: Statistics board - What should I do in PCA when P is larger than N?

When doing PCA, you have to estimate the variance-covariance matrix. If p >> n, it is impossible to get a good estimate unless your variance-covariance matrix has some special structure.

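d******e
Posts: 7844

From thread: Statistics board - What should I do in PCA when P is larger than N?

Introducing extra regularization is a different matter.
In practice, PCA is very often applied directly when p >> n, and its performance is not particularly bad. Textbooks and papers say that divergent spectra cause severe performance degradation, but that offers little practical guidance for real applications.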
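
A NumPy sketch of the p >> n case with simulated data and hypothetical sizes. R's `princomp()` refuses to run when there are fewer units than variables. `prcomp()` works in that case because it takes a singular value decomposition of the centered data instead of eigendecomposing the covariance matrix. The sketch below uses the same idea, never forms the p x p covariance, and shows that at most n - 1 components carry any variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 500                              # large p, small n
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)

# Thin SVD of the n x p centered data; the p x p covariance is never formed
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)   # U: n x n, Vt: n x p
variances = s ** 2 / (n - 1)                # nonzero eigenvalues of the sample covariance
scores = U * s                              # principal component scores (n x n)
loadings = Vt.T                             # principal axes (p x n)

n_nonzero = int(np.sum(variances > 1e-10 * variances[0]))
print(n_nonzero)                            # 19, i.e. n - 1
```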

f******y
Posts: 2971

From thread: Statistics board - PCA and linear regression

Suppose two random variables, X and Y, both have very small means.
I can get the slope by linear regression, lm(Y~X);
I can also do PCA:
data = data.frame(X=X, Y=Y);
princomp(data);
I expected the slope of the first PC vector to be very close to the slope given by linear regression. I tried it in R, and the results are very different.
Can anyone explain?
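
A NumPy sketch of why the two slopes differ, using simulated data with true slope 0.5 and unit noise. `lm(Y ~ X)` minimizes vertical squared residuals. The first principal axis minimizes perpendicular squared distances, which is orthogonal (total least squares) regression. The two agree only when the noise is small relative to the spread of X. When Y has noticeable noise, the PC1 line is steeper than the OLS line, and unlike OLS it also changes if X or Y is rescaled.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)          # true slope 0.5, unit noise in y

# OLS slope, as in lm(Y ~ X): minimizes vertical squared residuals
xc, yc = x - x.mean(), y - y.mean()
ols_slope = (xc @ yc) / (xc @ xc)

# First principal axis: minimizes perpendicular squared distances
evals, evecs = np.linalg.eigh(np.cov(np.vstack([x, y])))
v1 = evecs[:, np.argmax(evals)]
pc1_slope = v1[1] / v1[0]

print(f"OLS slope {ols_slope:.2f}, PC1 slope {pc1_slope:.2f}")
```

With the population covariance [[1, 0.5], [0.5, 1.25]] used here, the PC1 slope works out to about 1.28, far from the regression slope of 0.5.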

t**c
Posts: 539

From thread: Statistics board - PCA and linear regression

Could someone explain the relationship between PCA and regression?

m**c
Posts: 199

From thread: Statistics board - A PCA question

How do I choose the number of PCs? Besides the basic rule sum(lambda_i)/total var > 90%, are there other methods? Thanks!
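
A NumPy sketch of three common stopping rules, applied to a made-up eigenvalue list. The first is the cumulative threshold itself. The second is a Kaiser-type rule that keeps eigenvalues above the average, which means above 1 for a correlation matrix. The third is the broken-stick rule, which keeps leading components whose variance share beats what a randomly broken stick would give. Scree plots and cross-validation are other options. None of these rules is authoritative, and they often disagree, as they do here. The function name is hypothetical.

```python
import numpy as np

def n_components(evals, threshold=0.90):
    """Number of PCs to keep under three common rules."""
    lam = np.sort(np.asarray(evals, dtype=float))[::-1]
    p = lam.size
    prop = lam / lam.sum()

    # 1. smallest k whose cumulative proportion reaches the threshold
    k_cum = int(np.searchsorted(np.cumsum(prop), threshold) + 1)

    # 2. Kaiser-type rule: keep eigenvalues above the average eigenvalue
    k_kaiser = int(np.sum(lam > lam.mean()))

    # 3. broken stick: expected share of the k-th piece is (1/p) * sum_{i=k}^{p} 1/i;
    #    keep leading components until one fails to beat it
    b = np.array([np.sum(1.0 / np.arange(k, p + 1)) / p for k in range(1, p + 1)])
    beats = prop > b
    k_stick = int(p if beats.all() else np.argmin(beats))

    return k_cum, k_kaiser, k_stick

print(n_components([5.0, 2.0, 1.0, 0.5, 0.3, 0.2]))   # (4, 2, 1)
```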

n******v
Posts: 2184

From thread: Statistics board - Question about applying PCA to time series

Many variables are measured at each time point. How do I account for temporal correlation when doing PCA?

t**********y
Posts: 374

From thread: Statistics board - PCA (principle component analysis) analysis

I have 10 sets of data, and each contains some gene counts. If I want to use PCA to cluster the similar data sets, should I use the raw counts or the correlation coefficients?
Thanks.

t**********y
Posts: 374

From thread: Statistics board - PCA (principle component analysis) analysis

I actually tried hclust. I just wasn't patient enough to wait for the results...
Other people have started suggesting PCA...

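c******x
Posts: 350

From thread: Statistics board - Question about PCA analysis

Suppose x1 and x2 are jointly Gaussian with covariance [1 x; x 2], where x is nonzero.
I want to express x1 and x2 as:
x1 = a11*G1 + a12*G2
x2 = a21*G1 + a22*G2
where G1 and G2 are two independent univariate Gaussians.
I don't know how to get the four a coefficients, or these two Gaussians, from PCA.
Thanks a lot.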
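
A NumPy sketch of the PCA route. It assumes zero means and uses an arbitrary example value x = 0.8. The matrix is a valid covariance only when |x| < sqrt(2). Write the eigendecomposition as cov = V diag(lam) V^T. Then A = V diag(sqrt(lam)) satisfies A A^T = cov, so with G1 and G2 independent standard normals, (x1, x2) = A (G1, G2) has the required covariance. G1 and G2 are the standardized principal components. The answer is not unique: A Q works for any orthogonal Q, and the Cholesky factor is another common choice. PCA picks the factor whose columns are orthogonal.

```python
import numpy as np

x = 0.8                                    # example value; need 0 < |x| < sqrt(2)
Sigma = np.array([[1.0, x], [x, 2.0]])

lam, V = np.linalg.eigh(Sigma)             # Sigma = V diag(lam) V^T, ascending
order = np.argsort(lam)[::-1]              # put the first principal component first
lam, V = lam[order], V[:, order]
A = V * np.sqrt(lam)                       # A = V diag(sqrt(lam)) = [[a11, a12], [a21, a22]]

print(A)
print(np.allclose(A @ A.T, Sigma))         # True

# Simulation check: G1, G2 independent N(0, 1)
rng = np.random.default_rng(0)
G = rng.standard_normal(size=(2, 100_000))
samples = A @ G                            # row 0 = x1, row 1 = x2
print(np.cov(samples).round(2))            # close to Sigma

# Going back: G = diag(1/sqrt(lam)) V^T (x1, x2), the standardized PC scores
G_back = (V.T @ samples) / np.sqrt(lam)[:, None]
```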

k***n
Posts: 997

From thread: Statistics board - How to use PCA to capture pairwise covariance (with missing va

Do stock returns often have zero averages, so that there's no need to center them before applying PCA?
What's Y in the model?

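s****b
Posts: 2039

From thread: Statistics board - Plotting PCA

Does this mean that with 4 or more PCs, PCA lives in a high-dimensional space and can't be plotted?
Are you using PROC PRINCOMP or PROC FACTOR? Why must there be exactly 2 PCs?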

E**********e
Posts: 1736

From thread: Statistics board - Can PCA be used on a mixture of continuous and categorical variables?

As far as I know, PCA makes more sense for continuous variables. How would you compute a covariance matrix for a categorical one?

S*********1
Posts: 105

From thread: Statistics board - Can PCA be used on a mixture of continuous and categorical variables?

You can convert the categorical variables into dummy variables and then apply PCA.
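
A NumPy sketch of the dummy-variable approach with simulated data. The variables and levels are made up. Each level becomes a 0/1 column, and all columns are standardized so the dummies and the continuous variables are on comparable scales. A full set of dummies always sums to 1, so one direction has zero variance after centering. Dropping one level avoids that. Methods designed for mixed data, such as factor analysis of mixed data (FAMD), are an alternative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.normal(40, 10, size=n)                # continuous (made up)
income = rng.normal(50, 15, size=n)             # continuous (made up)
group = rng.choice(["A", "B", "C"], size=n)     # categorical with 3 levels

# Dummy (one-hot) coding: one 0/1 column per level
levels, codes = np.unique(group, return_inverse=True)
dummies = np.eye(len(levels))[codes]            # n x 3

# Standardize every column so no variable dominates by scale alone
X = np.column_stack([age, income, dummies])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)
print(explained.round(3))   # last share is 0: the three dummies always sum to 1
```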
