Any algorithm experts here? - Fishing board

G***G
Posts: 16778

I have tried random forest, support vector machine, and neural network models
for prediction and classification.
I have a training set and a testing set.
The accuracy on the training set is good (almost 100%),
but the accuracy on the testing set is always bad (60% at best).
How can I improve it?

b*****2
Posts: 47

How did you adjust your parameters? Did you try k-fold cross-validation?
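
A minimal sketch of the k-fold cross-validation idea raised above, in plain Python. The helper names (`k_fold_indices`, `cross_val_accuracy`) and the toy 1-nearest-neighbor model are assumptions chosen for illustration; nothing here comes from the thread or from any particular library. A 1-nearest-neighbor model memorizes its training data, so it tends to show the same symptom as in the question: perfect training accuracy that does not carry over to held-out folds.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    if k < 2 or k > n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Mean accuracy on the held-out fold, averaged over k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        # The model only ever sees the training part of each split.
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


def fit_1nn(X_train, y_train):
    """Toy model (assumption): 1-nearest neighbor just stores the data."""
    return list(zip(X_train, y_train))


def predict_1nn(model, x):
    """Return the label of the stored point closest to x."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda pair: sq_dist(pair[0], x))[1]


if __name__ == "__main__":
    # Random features with random labels: there is nothing real to learn.
    rng = random.Random(0)
    X = [[rng.random(), rng.random()] for _ in range(200)]
    y = [rng.choice([0, 1]) for _ in range(200)]

    model = fit_1nn(X, y)
    train_acc = sum(predict_1nn(model, x) == t for x, t in zip(X, y)) / len(X)
    cv_acc = cross_val_accuracy(X, y, fit_1nn, predict_1nn, k=5)
    print("training accuracy:", train_acc)
    print("5-fold CV accuracy:", cv_acc)
```

The cross-validated number, not the training number, is the one to watch while adjusting parameters. The separate testing set should still be kept aside for a final check.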

c*********o
Posts: 8367

You overfit the training sample, which is why it is not working on the testing set.
There is no way to get 100% unless you have some variable directly linked
to the target. Get rid of that variable and try again.

[In reply to G***G:]

: random forest, Support Vector Machine, neural network
: prediction and classification
: I have training set and testing set.
: the accuracy for the training set is good (almost 100%)
: but the accuracy for the testing set is always bad (highest is 60%)
: how to improve it?
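
One way to look for a variable that is "directly linked to the target", as suggested in this reply, is to ask how well each feature predicts the label entirely on its own. The sketch below is an illustration under assumptions: the function names and the 0.99 threshold are hypothetical, and the check only makes sense for discrete (or binned) features, because a column of all-distinct values, such as a record ID, scores 1.0 trivially.

```python
from collections import Counter, defaultdict


def single_feature_accuracy(column, labels):
    """Accuracy of guessing each label from one feature alone.

    Each distinct feature value is mapped to the majority label
    observed for that value.
    """
    by_value = defaultdict(Counter)
    for value, label in zip(column, labels):
        by_value[value][label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(labels)


def suspicious_features(rows, labels, threshold=0.99):
    """Indices of columns that by themselves nearly determine the label."""
    flagged = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if single_feature_accuracy(column, labels) >= threshold:
            flagged.append(j)
    return flagged


if __name__ == "__main__":
    # Hypothetical toy data: column 2 is a disguised copy of the label.
    rows = [
        (0, "x", "pos"), (1, "x", "neg"), (0, "y", "pos"),
        (1, "y", "neg"), (0, "x", "neg"), (1, "y", "pos"),
    ]
    labels = ["pos", "neg", "pos", "neg", "neg", "pos"]
    print(suspicious_features(rows, labels))
```

A flagged column is only a candidate for inspection. Whether it is a genuine predictor or a leak of the target has to be decided from how the data was collected.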

r********n
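Posts: 6979

How come every model you picked is one that overfits so easily?
Try logistic regression,
or, as the poster above said, use cross-validation or bootstrapping.
Also,
if your model is high-dimensional, you should do feature selection first.

[In reply to G***G]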
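
As a rough illustration of the feature selection step mentioned in this reply, the sketch below ranks numeric features by the absolute Pearson correlation with a 0/1 label and keeps the top `k`. This simple filter is an assumption picked for brevity, not something the poster specified, and the names `select_top_k` and `project` are hypothetical. The selection should be computed on the training data only and then applied unchanged to the testing data; otherwise the test accuracy is biased upward.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(var_x * var_y)


def select_top_k(X_train, y_train, k):
    """Indices of the k features most correlated with the label."""
    scores = []
    for j in range(len(X_train[0])):
        column = [row[j] for row in X_train]
        scores.append((abs(pearson(column, y_train)), j))
    # Highest score first; ties are broken by the lower column index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:k])


def project(X, keep):
    """Keep only the selected columns, in the given order."""
    return [[row[j] for j in keep] for row in X]


if __name__ == "__main__":
    # Hypothetical toy data: only column 0 tracks the label.
    X_train = [[0.0, 5.0, 1.0], [1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 0.0]]
    y_train = [0, 0, 1, 1]
    keep = select_top_k(X_train, y_train, k=1)
    print(keep, project(X_train, keep))
```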

d*****g
Posts: 4364

Typical overfitting.
You may want to add a penalty (regularization) term to the objective function.

[In reply to G***G]
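
To make the idea of penalizing the objective function concrete, here is a minimal sketch of logistic regression trained by gradient descent with an L2 penalty, so that the quantity being minimized is the mean log loss plus `(lam / 2) * ||w||^2`. The choice of an L2 penalty, the function names, and the default values of `lam`, `lr`, and `epochs` are assumptions for illustration; the reply does not say which penalty to use.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def penalized_loss(w, b, X, y, lam):
    """Mean log loss plus (lam / 2) * ||w||^2. The bias b is not penalized."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X) + 0.5 * lam * sum(wj * wj for wj in w)


def fit_logistic_l2(X, y, lam=0.1, lr=0.1, epochs=500):
    """Batch gradient descent on the penalized loss above."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The lam * wj term is the gradient of the penalty; it shrinks w.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Hypothetical toy data: one feature, perfectly separable labels.
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 1, 1]
    for lam in (0.0, 1.0):
        w, b = fit_logistic_l2(X, y, lam=lam)
        print("lam =", lam, "w =", w, "b =", b)
```

A larger `lam` pulls the weights toward zero, trading some training accuracy for a smoother model. A reasonable value is usually picked by cross-validation rather than by looking at the testing set.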

T*****u
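Posts: 7103

goog, please be more specific. It could be overfitting or a spurious (pseudo)
correlation; how to avoid it depends on the details. Are you using canned software
or code you wrote yourself? What kind of data, what physical model, how were the
training set and validation set collected, what were the training conditions, and so on?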

G***G
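Posts: 16778

In general, which algorithm gives the highest prediction accuracy: SVM, RF, or neural network?
I am using Weka.

[In reply to T*****u:]

: goog, please be more specific. It could be overfitting or a spurious (pseudo)
: correlation; how to avoid it depends on the details. Are you using canned software
: or code you wrote yourself? What kind of data, what physical model, how were the
: training set and validation set collected, what were the training conditions, and so on?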

T*****u
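Posts: 7103

I am not sure; all I can say is that there is no free lunch. Since you are using Weka,
have a look at Data Mining: Practical Machine Learning Tools and Techniques.

[In reply to G***G:]

: In general, which algorithm gives the highest prediction accuracy: SVM, RF, or neural network?
: I am using Weka.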

w*******9
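Posts: 1433

All the effort in data mining goes into overcoming overfitting, so, Goog, your question
is general enough to be equivalent to "how do I do data mining?", which cannot be answered
in a couple of sentences. The models you mention are all quite complex, so there will
basically not be much difference among them. The result mainly depends on feature
creation/selection, and that can only be handled manually, case by case.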

s****0
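Posts: 2532

So many algorithm experts here. Good heavens, why didn't I ask sooner? I have a massive
amount of neural signals recorded from monkey brains that need processing, and I have been
looking everywhere for collaborators without any luck.
If you are interested in collaborating, send me a message.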

r********n
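Posts: 6979

Weka is quite good.
Just using its nominal settings gives decent results most of the time.
But I suspect your problem is too high-dimensional.
These models are all notorious for overfitting easily.

[In reply to G***G:]

: In general, which algorithm gives the highest prediction accuracy: SVM, RF, or neural network?
: I am using Weka.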

r********n
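Posts: 6979

Are you at Duke, JHU, or UCLA?

[In reply to s****0:]

: So many algorithm experts here. Good heavens, why didn't I ask sooner? I have a massive
: amount of neural signals recorded from monkey brains that need processing, and I have been
: looking everywhere for collaborators without any luck.
: If you are interested in collaborating, send me a message.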

g****s
Posts: 1755

lol, I never expected to see a topic like this here. Looks like algorithm work and fishing do have some overlap!

G***G
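Posts: 16778

What are nominal settings?

[In reply to r********n:]

: Weka is quite good.
: Just using its nominal settings gives decent results most of the time.
: But I suspect your problem is too high-dimensional.
: These models are all notorious for overfitting easily.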

r********n
Posts: 6979

It just means the default settings.

[In reply to G***G:]

: What are nominal settings?

s***e
Posts: 5242

Are you at a university? What is your general area?

[In reply to r********n:]

: It just means the default settings.

r********n
Posts: 6979

Not at a university.
I work in machine learning...

[In reply to s***e:]

: Are you at a university? What is your general area?
