All topics - Topic: udf
r*****d
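Posts: 346
1
From thread: DataSciences board - Pig UDF written in Python
The problem is finally solved.
The cause was that my existing jython was an old version, 2.5.1+.
I downloaded the latest version, 2.5.3,
then pointed both the register jython.jar line and PIG_CLASSPATH at the new one, and it worked.
The old version can run a python script on its own,
but it does not work with register 'my_udf.py' using jython as myfuncs;
because it is incompatible with the Pig version I am using.
That's all.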
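For reference, a minimal sketch of what a file like my_udf.py might contain. The function name and schema are made up for illustration. When Pig loads the file through Jython it is expected to supply the outputSchema decorator, so the fallback below only matters when testing the file outside Pig.

```python
# my_udf.py: a hypothetical Jython UDF file (name and schema are illustrative).

try:
    outputSchema  # expected to be provided by Pig's Jython script engine
except NameError:
    # Fallback so the file can also be run and tested outside Pig.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("word:chararray")
def to_upper(word):
    # Pig passes null values as None; hand them back unchanged.
    if word is None:
        return None
    return word.upper()
```

After the register statement quoted in the post, the function would be referenced from Pig as myfuncs.to_upper(...).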
r*****d
Posts: 346
2
From thread: DataSciences board - Pig UDF written in Python
Another question: what should I do when I need to import a python module?
For example, import json.
When I run the pig script I get:
ImportError: no module named json
The json module is there.
Suppose the python script is named test.py.
Running python test.py works fine,
but running jython test.py also gives ImportError: no module named json
Thanks a lot!
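One likely cause: the json module joined the Python standard library in 2.6, and Jython 2.5.x follows Python 2.5, so import json can fail under jython even though python finds it. A hedged sketch of a common workaround follows. It assumes a pure-Python simplejson package has been placed where Jython can import it; that availability and the function name parse_record are assumptions, not something stated in the thread.

```python
try:
    import json  # in the standard library from Python 2.6 onward
except ImportError:
    # Assumption: a pure-Python simplejson package is on Jython's path.
    import simplejson as json


def parse_record(line):
    # Return the decoded JSON value, or None for blank or malformed input.
    if line is None or not line.strip():
        return None
    try:
        return json.loads(line)
    except ValueError:
        return None
```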
l*******m
Posts: 1096
3
From thread: DataSciences board - Pig UDF written in Python
You need to test it in jython.
BTW, using Java is much, much better with Hadoop. JSON parsing in Java is
10 times faster than in Python.
r*****d
Posts: 346
4
From thread: DataSciences board - Pig UDF written in Python
More details please? Thank you very much.
l*******m
Posts: 1096
5
From thread: DataSciences board - Pig UDF written in Python
If you are just reading JSON in Pig, use elephant bird's JSON loader; no Java coding needed.
Download the following jars (google for their jar downloads; don't build the whole repository, it is not easy):
register /path/to/json-simple.jar;
register /path/to/elephant-bird-core.jar;
register /path/to/elephant-bird-pig.jar;
A complete example:
https://github.com/kevinweil/elephant-bird/blob/master/examples/src/main/pig/nested_json_get_distinct_items_from_nested_array.pig
j*******g
Posts: 331
6
From thread: DataSciences board - Pig UDF written in Python
Another thing I can think of: you should probably make sure all data nodes
have this package installed. Did you test on every data node you are going to use?
r*****d
Posts: 346
7
From thread: DataSciences board - Pig UDF written in Python
Hopefully it is not this complicated. I treat the data nodes as an encapsulation.
j*******g
Posts: 331
8
From thread: DataSciences board - Pig UDF written in Python
"Data node" might be inaccurate, but look at this:
http://stackoverflow.com/questions/7831649/how-do-i-make-hadoop
t******g
Posts: 2253
9
From thread: DataSciences board - Pig UDF written in Python
Is the OP HH?
r*****d
Posts: 346
10
From thread: DataSciences board - Pig UDF written in Python
Thanks, everyone! I will send out baozi later.
t*********u
Posts: 26311
11
Still using pig+UDF?
t*********u
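Posts: 26311
12
I just want to see whether there are slightly more complex examples written entirely in Java,
to help with understanding and learning.
PS
Is Java or Python the popular choice for UDFs?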
w******k
Posts: 299
13
Hadoop for data storage. Hive as query interface with mapper/reducer written
in perl/python. Hive UDF was written in Java. Final classifier was written
in Java.
t*********u
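Posts: 26311
14
So at the level of the raw data,
can Eval or Filter type UDFs be used in FOREACH GENERATE?
That would amount to running a map operation directly on the raw data.
My understanding is that this runs locally, right?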
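As a rough mental model (plain Python, not Pig itself): an Eval UDF used in FOREACH ... GENERATE is applied once per record, much like map, while a Filter UDF is used in a FILTER ... BY statement and decides which records survive. The records and function names below are invented for illustration. Where the work actually executes depends on how Pig is launched (local mode or on the cluster), not on the UDF.

```python
def clean_name(name):
    # Eval-style function: one input value in, one output value out.
    return name.strip().lower()


def is_adult(age):
    # Filter-style function: returns a boolean for each record.
    return age >= 18


records = [(" Alice ", 30), ("BOB", 15), ("Carol", 18)]

# Roughly what FILTER ... BY is_adult(age) would keep.
kept = [r for r in records if is_adult(r[1])]

# Roughly what FOREACH ... GENERATE clean_name(name), age would produce.
projected = [(clean_name(name), age) for (name, age) in kept]
```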
s*******d
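Posts: 132
15
From thread: DataSciences board - Is anyone taking UW's data science course?
It feels like the course does not cover much, yet the assignments ask for quite a lot.
The twitter python assignment is just an application of hash tables.
The sql assignment is not hard either, but things like conditional queries were not really covered, and there was not even one udf example.
Doing the assignments comes down to google.
Is everyone choosing Amazon or kaggle?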
c***u
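Posts: 4107
16
From thread: DataSciences board - Can pig handle iterative problems?
I have been teaching myself recently. Can pig handle iterative problems, such as matrix update problems?
Take nonnegative matrix factorization: given a nonnegative matrix N, factor it into
two matrices A and B so that |N-A*B| is as small as possible.
The standard algorithm: first generate two random matrices A and B. Then fix A and use A and N to update B by some rule;
then fix B and use B and N to update A; keep alternating like this until |N-A*B| is small enough.
Can pig or hive solve a problem like this?
(This is not a research question; people published plenty of papers on it with mapreduce+java 5 or 6 years ago.)
Of course, I mean without writing any extra UDFs.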
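The post does not say which update rule it has in mind. The sketch below assumes the Lee and Seung multiplicative updates, a common choice for NMF, and runs on a single machine in plain Python only to show the shape of the alternating loop. All function names are illustrative, and this is not a Pig or Hive solution.

```python
import random


def matmul(X, Y):
    # Plain-Python matrix product of two lists of rows.
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def frob_error(N, A, B):
    # Frobenius norm of N - A*B.
    AB = matmul(A, B)
    return sum((n - p) ** 2
               for n_row, p_row in zip(N, AB)
               for n, p in zip(n_row, p_row)) ** 0.5


def scale(X, num, den, eps=1e-9):
    # Element-wise X * num / den; eps guards against division by zero.
    return [[x * n / (d + eps) for x, n, d in zip(xr, nr, dr)]
            for xr, nr, dr in zip(X, num, den)]


def nmf(N, rank, iters=200, tol=1e-6, seed=0):
    rng = random.Random(seed)
    rows, cols = len(N), len(N[0])
    # Start from random nonnegative A (rows x rank) and B (rank x cols).
    A = [[rng.random() for _ in range(rank)] for _ in range(rows)]
    B = [[rng.random() for _ in range(cols)] for _ in range(rank)]
    for _ in range(iters):
        # Fix A, update B from A and N.
        At = transpose(A)
        B = scale(B, matmul(At, N), matmul(matmul(At, A), B))
        # Fix B, update A from B and N.
        Bt = transpose(B)
        A = scale(A, matmul(N, Bt), matmul(A, matmul(B, Bt)))
        if frob_error(N, A, B) < tol:
            break
    return A, B
```

Each pass through the loop needs several matrix products, which is why a job-per-step system pays its startup cost many times over on this kind of problem.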
D**u
Posts: 288
17
From thread: DataSciences board - Can pig handle iterative problems?
Pig itself has no support for iteration, but if you really want to use Pig,
you can embed the pig script into a python (jython) program to do it
iteratively.
Check this for example:
http://thedatachef.blogspot.com/2013/11/linear-regression-with-
After all, this is not best practice, since for every iteration an M/R job is
spawned, and that is 2 sec wasted, and usually your algorithm runs for
hundreds of iterations. So, just use Spark. Spark now supports both Scala and
Python pretty much equally wel...
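A hedged sketch of what such a driver could look like. The iterate helper is generic Python. make_pig_step relies on Pig's embedding API (Pig.compile, bind, runSingle) and only works under Jython with Pig on the classpath. The parameter names input and output, the path scheme, and the read_error callable are all assumptions made for illustration.

```python
def iterate(run_step, state, max_iter=100, tol=1e-6):
    # Call run_step(state) -> (new_state, error) until the error drops
    # below tol or max_iter rounds have been run.
    error = None
    for i in range(max_iter):
        state, error = run_step(state)
        if error < tol:
            return state, error, i + 1
    return state, error, max_iter


def make_pig_step(script_text, read_error):
    # Only usable under Jython with Pig available.
    # read_error is a hypothetical callable that returns the error value
    # stored by the job that just finished.
    from org.apache.pig.scripting import Pig
    compiled = Pig.compile(script_text)

    def run_step(state):
        # state is (base_path, round_number); $input and $output in the
        # script text are placeholder parameter names.
        base, k = state
        in_path = "%s_%d" % (base, k)
        out_path = "%s_%d" % (base, k + 1)
        stats = compiled.bind({"input": in_path, "output": out_path}).runSingle()
        if not stats.isSuccessful():
            raise RuntimeError("Pig job failed in round %d" % k)
        return (base, k + 1), read_error(out_path)

    return run_step
```

Every call to run_step launches at least one job, which is the per-iteration overhead the post warns about.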
a*****d
Posts: 18
18
From thread: DataSciences board - data scientist position
My current company wants to expand its data science team. It is a pre-IPO big data company (a fairly promising
one). The team mainly
does consulting services, and the projects are quite interesting. The boss is white and a very nice person.
Key responsibilities include:
Help customers understand and evaluate data science use-cases appropriate
for their business
Collaborate with customer teams to formulate the problem, recommend a
solution approach and design a data architecture
Create a prototype in R, Python, Java or similar stack to demonstrate the
results of various algorithmic approaches and evaluate their performance
Work ...
t*********u
Posts: 26311
20
j*****n
Posts: 1545
21
What does this have to do with nodes? Whatever belongs in the mapper runs in the mapper, and whatever belongs in the reducer runs in the reducer.
j********p
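Posts: 9680
22
Hive seems to be designed for distributed computing, so multiple nodes running at the same time is to be expected,
but as a user you do not need to worry about which ones are running.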
l******n
Posts: 9344
23
From thread: DataSciences board - san bruno ds position
Minimum Qualifications:
- PhD in Computer Science, Statistics or related field; OR a Master’s
degree or equivalent in Computer Science, Statistics or related field and 2
years of related experience.
- Knowledge of machine learning, information retrieval, data mining,
statistics, NLP or related field.
- Programming skills in one of the following languages: Java, Scala, C/C++.
- Knowledge of one of the scripting languages such as Python or Perl.
- Experience analyzing and interpreting the resu...
G***n
Posts: 877
24
From thread: DataSciences board - Question about HIVE load CSV
Hive itself is not mature either; a lot of things require a UDF.
w*******y
Posts: 60932
25
Chip.de is having an Easter promotion and giving away copies of Daemon Tools
Pro.
The installer comes with the serial number pre-loaded. To get a copy, go to
Link:
http://www.chip.de/downloads/DAEMON-Tools-Pro-Advanced-Vollvers
Click "Zum Download", then click "Download-Server CHIP Online" and wait a
few seconds; the download should start automatically.
Features include:
Create .iso, .mds/.mdf and .mdx images of CD, DVD, Blu-ray discs
Protect images with password
Make or edit images with...
q*z
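Posts: 13362
26
From thread: _Xiyu board - One more question.
Of course not. This question is quite complicated,
and different bluray players behave differently.
The most basic method is to use udf 2.5
and burn the files directly, but that does not guarantee
that every bluray player can play the disc, because
some players have restrictions on folders or file names.
To ensure compatibility, it is best to burn with software that supports avchd master,
such as nero.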