cross validation and best model question - CS board - 未名存档 (archive)


m*p (posts: 1331)	1 If I use K-fold CV, I end up with K models. Which of them should I use in practice? The one with the highest accuracy or AUC?
D***r (posts: 7511)	2 (re: m*p) This is fairly subjective. Usually people just go by accuracy.
s*w (posts: 729)	3 (re: m*p) In my understanding, k-fold cross-validation is used to estimate the generalization error, so you do not simply pick the best model from a single fold. Given a set of parameters for your model and an evaluation criterion, k-fold CV tells you what score each parameter set achieves. Because the folds come from a random partition of the data, it is better to repeat k-fold CV several times and report the mean and standard deviation of the results across runs, where each run contributes a single number.
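A minimal sketch of the repeated k-fold idea from post 3, using only the Python standard library. Everything here is an assumption made for illustration: the toy 1-D data, the nearest-centroid stand-in classifier, and the function names (`make_toy_data`, `repeated_cv`, and so on) do not come from the thread. Each run reshuffles the data, yields one mean accuracy, and the runs are then summarized by mean and standard deviation.

```python
import random
import statistics

def make_toy_data(n_per_class=50, seed=42):
    """Hypothetical 1-D data: class 0 around 0.0, class 1 around 3.0."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(3.0, 1.0), 1) for _ in range(n_per_class)]
    return data

def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def centroid_classifier(train):
    """Toy stand-in model: predict the class whose mean is closest."""
    means = {}
    for label in {y for _, y in train}:
        xs = [x for x, y in train if y == label]
        means[label] = sum(xs) / len(xs)
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

def cv_accuracy(data, k, rng):
    """One k-fold run: the mean held-out accuracy over the k folds."""
    scores = []
    for held_out in k_fold_indices(len(data), k, rng):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        predict = centroid_classifier(train)
        correct = sum(predict(data[i][0]) == data[i][1] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

def repeated_cv(data, k=5, repeats=10, seed=0):
    """Repeat k-fold CV with fresh random partitions; one number per run."""
    rng = random.Random(seed)
    runs = [cv_accuracy(data, k, rng) for _ in range(repeats)]
    return statistics.mean(runs), statistics.stdev(runs), runs

if __name__ == "__main__":
    mean_acc, std_acc, _ = repeated_cv(make_toy_data())
    print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

The standard deviation across runs gives a rough sense of how much the estimate depends on the particular random partition, which is the point post 3 makes.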
p*********e (posts: 303)	4 (re: m*p) You do not pick one model/parameter per fold. You pick the one model that performs best on average across all folds, and then retrain that model on all of the data.
m*p (posts: 1331)	5 (re: p*********e) Thanks, guys. One way is to average the outputs of the K models; that's easy to understand. What do you mean by "pick the one model that performs best on average across all folds"? Could you please explain it using a linear SVM as an example? Thanks!
p*********e (posts: 303)	6 (re: m*p) Suppose you want to find the best value of the linear SVM parameter C among the candidates C1 to Ck. First split the data into N parts. Each time, hold out 1 part for testing and train on the remaining N-1 parts; there are N different ways to do this. For each split, train a linear SVM for every candidate value of C and test it on the held-out part to get an accuracy. Each candidate value therefore gets N accuracies, one per split. Choose the value with the highest mean accuracy as C, then retrain on all N parts of the data.
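The procedure in post 6 can be sketched as follows. This is an illustration under stated assumptions, not the method of any particular library: the toy data, the candidate grid, and all function names are hypothetical, and `train_linear_svm` is a deliberately simple fixed-step subgradient-descent trainer for the soft-margin objective rather than a production solver such as libsvm.

```python
import random

def make_toy_data(n_per_class=40, seed=1):
    """Hypothetical 2-D data: two Gaussian blobs labelled +1 and -1."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y

def train_linear_svm(X, y, C, epochs=200, lr=0.1):
    """Subgradient descent on ||w||^2 / (2*C*n) + mean hinge loss.

    This is the usual objective 0.5*||w||^2 + C * sum(hinge) divided
    by C*n, so it has the same minimizer. The fixed step size is an
    assumption that suits the small grid used here.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [wj / (C * n) for wj in w]  # regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # point violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def mean_cv_accuracy(X, y, C, folds):
    """Train on N-1 parts, test on the held-out part, average over N."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        w, b = train_linear_svm([X[i] for i in train],
                                [y[i] for i in train], C)
        correct = sum(predict(w, b, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

def select_C(X, y, candidates, n_folds=5, seed=0):
    """Pick the C with the best mean accuracy, then refit on all data."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]  # shared by every C
    mean_acc = {C: mean_cv_accuracy(X, y, C, folds) for C in candidates}
    best_C = max(mean_acc, key=mean_acc.get)
    return best_C, mean_acc, train_linear_svm(X, y, best_C)

if __name__ == "__main__":
    X, y = make_toy_data()
    best_C, mean_acc, model = select_C(X, y, [0.01, 0.1, 1.0, 10.0, 100.0])
    print(best_C, mean_acc)
```

Note that the N models trained during cross-validation are discarded. Only the chosen value of C is kept, and the model that gets deployed is the one refit on all of the data. Reusing the same folds for every candidate keeps the comparison between values of C fair.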
m*p (posts: 1331)	7 (re: p*********e) I see. That's the grid search in libsvm... How about ensemble classifiers like AdaBoost? There is no simple parameter like C. Thanks!
