A paper says large batch sizes lead to overfitting - Programming board - 未名存档

This page is an excerpted archive of the corresponding thread on 未名空间.


x****u (posts: 44466), #1:
https://arxiv.org/abs/1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

L****8 (posts: 3938), #2, in reply to x****u:
"For all experiments, we used 10% of the training data as batch size for the large-batch experiments and 256 data points for small-batch experiments."
A batch_size of 512 shouldn't count as large.
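
To put the two quoted settings side by side, here is a rough sketch of how many parameter updates each gives per epoch. The training-set size of 50,000 is an assumption chosen only for illustration, and `steps_per_epoch` is a hypothetical helper, not something from the paper.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of parameter updates in one pass over the data."""
    return math.ceil(num_examples / batch_size)

# Assumption for illustration only: a training set of 50,000 examples.
num_train = 50_000
large_batch = num_train // 10  # 10% of the training data, as in the quoted setup
small_batch = 256              # small-batch setting from the quoted setup

for name, bs in (("large", large_batch), ("small", small_batch)):
    print(f"{name}-batch: batch_size={bs}, updates per epoch={steps_per_epoch(num_train, bs)}")
```

Under that assumed dataset size, the large-batch setting is roughly ten times bigger than a batch size of 512 and takes far fewer updates per epoch than the small-batch setting.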

x****u (posts: 44466), #3, in reply to L****8:
The paper is mainly about the trend. My feeling is that the larger the batch size, the stronger the learning capacity. But datasets are put together by hand, so a model that learns too well tends to pick up their systematic biases.

w***g (posts: 5958), #4, in reply to L****8:
Yesterday I came across two more papers/posts, though I didn't read them closely. One says a large batch size tends to converge prematurely. The other says that late in training you should increase the batch size rather than reduce the learning rate.
Batch sizes in the thousands presumably mean running on 8 GPUs.
I just barely managed to put together a machine with 2 Titan Xp cards, and it's still not very stable. Life in a small shop isn't easy.
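
The "increase the batch size instead of reducing the learning rate" idea can be sketched as two schedules. This is only an illustration of the idea as described in the post, not the cited work's actual procedure. It assumes the common heuristic that SGD gradient noise scales roughly with learning_rate / batch_size. The function names, milestones, and all numeric values are hypothetical.

```python
import math

def lr_decay_schedule(epoch, base_lr=0.1, base_batch=128, milestones=(30, 60), factor=5):
    """Conventional step decay: divide the learning rate by `factor` at each milestone."""
    k = sum(epoch >= m for m in milestones)  # number of milestones already passed
    return base_lr / factor**k, base_batch

def batch_growth_schedule(epoch, base_lr=0.1, base_batch=128, milestones=(30, 60), factor=5):
    """Alternative: keep the learning rate fixed and multiply the batch size by `factor`."""
    k = sum(epoch >= m for m in milestones)
    return base_lr, base_batch * factor**k

for epoch in (0, 30, 60):
    lr_a, bs_a = lr_decay_schedule(epoch)
    lr_b, bs_b = batch_growth_schedule(epoch)
    # Both schedules shrink the ratio lr / batch_size by the same factor (assumed noise proxy).
    assert math.isclose(lr_a / bs_a, lr_b / bs_b)
    print(f"epoch {epoch:2d}: decay lr={lr_a:.4f}, bs={bs_a} | grow lr={lr_b:.4f}, bs={bs_b}")
```

With these hypothetical settings the second schedule ends at a batch size of 3200, which is the "several thousand" range that would be hard to fit on one or two cards.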

g*******u (posts: 3948), #5, in reply to L****8:
Of your two settings, 10% of the training data and 256, which one does better?

l*******m (posts: 1096), #6, in reply to w***g:
100-300 GPUs, haha

x****u (posts: 44466), #7, in reply to w***g:
Bitcoin prices have been poor lately, so you could rent idle capacity from miners.

w*****r (posts: 197), #8, in reply to w***g:
You're overthinking it. They can run the experiments on CPUs.
