v*****r posts: 2325 | 1 Spark beginner trying out the buzz tech.
Input: a 200GB uncompressed data file stored in HDFS.
37 worker nodes, each with 24 cores.
Using Java MapReduce: 6-8 minutes.
Using Spark: 37 minutes, as two 18-minute stages.
"Lightning-fast cluster computing, 100x faster" ???!!!!
Big bulls, please advise!
# sortMapper sorts the values for each key, then does some iteration over the
# grouped values
text = sc.textFile(input, 1776)  # 24 * 37 * 2 partitions
text.map(mapper) \
    .filter(lambda x: x is not None) \
    .groupByKey() \
    .map(sortMapper) \
    .filter(lambda x: x[1] != []) \
    .saveAsTextFile(output)
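The groupByKey() step above is usually the culprit in jobs like this: it shuffles every value across the network and materializes the whole group per key before sortMapper can run. For sort-values-per-key workloads, the usual alternative is the "secondary sort" pattern (in Spark, repartitionAndSortWithinPartitions with a composite key), which lets the shuffle itself do the sorting. A pure-Python sketch of the idea, not actual Spark API code:

```python
# Pure-Python sketch contrasting the two approaches (toy data, not Spark).

records = [("a", 3), ("a", 1), ("b", 2), ("a", 2), ("b", 1)]

# groupByKey-style: collect ALL values per key into memory, then sort each
# group. In Spark this means shuffling every value and holding whole groups
# as Python objects on one executor.
groups = {}
for k, v in records:
    groups.setdefault(k, []).append(v)
grouped_sorted = {k: sorted(vs) for k, vs in groups.items()}

# Secondary-sort-style: sort composite (key, value) pairs once. Values for
# each key then arrive already ordered, and no per-key list is materialized.
# In Spark, this sort happens inside the shuffle machinery.
stream = sorted((k, v) for k, v in records)

print(grouped_sorted)  # {'a': [1, 2, 3], 'b': [1, 2]}
print(stream)          # [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2)]
```

Whether this helps depends on how sortMapper iterates the groups, so treat it as a direction to test, not a guaranteed fix.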
sc.textFile and saveAsTextFile are both very slow.
Configuration as follows:
conf = SparkConf() \
    .set("spark.executor.memory", "24g") \
    .set("spark.driver.memory", "16g") \
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") | N********n posts: 8363 | 2 It's "lightning fast" only when it's in-memory, otherwise there's
no magic here. | w********m posts: 1137 | 3 Did it shuffle?
【Quoting v*****r's original post】
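The shuffle question is pointed: the two 18-minute stages in the original post are exactly what a single shuffle boundary produces. Spark pipelines narrow transformations (map, filter) into one stage, while a wide one (groupByKey) forces a shuffle and starts a new stage. A toy model of that stage-splitting rule (names NARROW/WIDE and count_stages are illustrative, not a Spark API):

```python
# Toy model of Spark's stage splitting: narrow transformations pipeline
# together; each wide (shuffle) transformation starts a new stage.

NARROW = {"map", "filter"}
WIDE = {"groupByKey", "reduceByKey", "repartition", "sortByKey"}

def count_stages(ops):
    """Count stages for a linear chain of transformations."""
    stages = 1
    for op in ops:
        if op in WIDE:
            stages += 1  # shuffle boundary: new stage begins here
    return stages

# The pipeline from the original post: map -> filter -> groupByKey -> map -> filter
print(count_stages(["map", "filter", "groupByKey", "map", "filter"]))  # 2
```

In a real job you can confirm where the shuffle falls by looking at the stage DAG in the Spark web UI, or by printing rdd.toDebugString() before the action.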
| b********l posts: 84 | 4 No. Worker memory, 24g x 37, is four times the data size.
【Quoting w********m: Did it shuffle?】
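The memory arithmetic checks out, though "4x the raw data" can still be tight: in PySpark, grouped values live as Python objects, which commonly inflate to several times the on-disk text size, so spilling is still plausible. The claim itself:

```python
# Checking the memory claim above: 37 executors x 24 GB vs 200 GB input.
total_mem_gb = 37 * 24
print(total_mem_gb)               # 888
print(total_mem_gb / 200)         # 4.44, roughly 4x the uncompressed input
```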
| b********l posts: 84 | 5 Even doing disk reads and writes, it shouldn't be slower than Java MR, right?
【Quoting N********n: It's "lightning fast" only when it's in-memory.】