【Python-pandas】How do I change the format of some elements in a dataframe? - DataSciences board - 未名存档

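This page is an excerpt and archive of the corresponding post on 未名空间. Posts less than a week old show at most 50 characters; posts older than a week show 500 characters.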


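D******6 posts: 841	1 A question for the experts on this board. I use pandas to load raw data in csv or xlsx/xlsm (Excel) format into a dataframe with read_csv or read_excel. Because the raw data mixes strings and numbers, everything in the dataframe ends up stored as str. I need to convert the numeric values from str to float while leaving the text values as str. In other words, I need to convert certain rows, certain columns, or parts of certain rows or columns of the two-dimensional dataframe from str to float. How can this be done? Pointers are appreciated, ideally with a couple of lines of key code. Thanks!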
i**********a posts: 1402	2

    def convert(x):
        try:
            return float(x)
        except:
            return x

    df = df.applymap(convert)
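The question also asks about converting only some rows or columns. A minimal sketch of how the same idea might be applied to a whole dataframe or to a single column follows. The sample data and the names `elementwise`, `converted`, and `partial` are assumptions for illustration. The `except` clause is narrowed here to the two errors `float()` raises for bad input, and `DataFrame.map` is used where it exists because `applymap` is deprecated since pandas 2.1.

```python
import pandas as pd


def convert(x):
    """Return float(x) when x parses as a number, otherwise return x unchanged."""
    try:
        return float(x)
    except (TypeError, ValueError):
        # TypeError: x is None or another object float() cannot take.
        # ValueError: x is a string such as 'abc' or '12-'.
        return x


# Hypothetical sample data in which every cell was read in as text.
df = pd.DataFrame({"name": ["a1", "b2"], "value": ["12.3", "-0.08e+3"]})

# DataFrame.map replaced DataFrame.applymap in pandas 2.1; use whichever exists.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
converted = elementwise(convert)

# Convert a single column and leave the rest of the dataframe untouched.
partial = df.copy()
partial["value"] = partial["value"].map(convert)

print(converted)
print(partial.dtypes)
```

Keep in mind that `float()` also accepts strings such as 'nan' and 'inf', as well as values with surrounding whitespace, so cells containing those would be converted too.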
g*****g posts: 390	5 Beautiful and succinct code! I did something else, shown below, which seems to work too:

    import re
    f = lambda x: float(x) if re.sub(r'(^-|\.|e\+|e\-)', '', x).isdigit() else x
    df = df.applymap(f)

Note: the lambda f above is what decides whether a string x is an "int/float" string or a plain "str" string.

Test:

    l = ['123', '12.3', '-12.3', '--1.2', '1a', 'a1', '-1.a', '12-', '-0.08e+3', '1.2e-3']
    for item in l:
        print('Original: %10s' % repr(item), '\t', '==>', type(f(item)), f(item))

Output:

    Original:      '123'   ==> <class 'float'> 123.0
    Original:     '12.3'   ==> <class 'float'> 12.3
    Original:    '-12.3'   ==> <class 'float'> -12.3
    Original:    '--1.2'   ==> <class 'str'> --1.2
    Original:       '1a'   ==> <class 'str'> 1a
    Original:       'a1'   ==> <class 'str'> a1
    Original:     '-1.a'   ==> <class 'str'> -1.a
    Original:      '12-'   ==> <class 'str'> 12-
    Original: '-0.08e+3'   ==> <class 'float'> -80.0
    Original:   '1.2e-3'   ==> <class 'float'> 0.0012

Explaining the pattern: r'(^-|\.|e\+|e\-)' matches
1) "^-", a minus sign for a negative number, only when the "-" is at the beginning (a "-" anywhere else leaves the value a str string),
2) the decimal point "\.", and
3) scientific notation, "e\+" or "e\-",
and replaces each match with '' (nothing). If .isdigit() is then True, the string can be converted to float; if not, it stays a str string.

Note that the thousands separator "," is not handled yet. It is better to deal with it before this step (either while reading the csv or with a .str.replace), because the builtin float() will raise ValueError on it.
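The regex check and `float()` do not always agree, so it may help to compare them on a few edge cases. The sketch below does that and also shows one possible way to strip the thousands separator before converting. The names `looks_numeric` and `to_float_or_str` and the sample strings are hypothetical and not from the thread.

```python
import re

# Pattern from the post above: strip a leading '-', dots, and 'e+' / 'e-'.
PATTERN = re.compile(r"(^-|\.|e\+|e\-)")


def looks_numeric(s):
    """Regex-based check described in the post (expects a str)."""
    return PATTERN.sub("", s).isdigit()


def to_float_or_str(s, thousands=","):
    """Hypothetical helper: drop thousands separators, then let float() decide."""
    try:
        return float(s.replace(thousands, ""))
    except ValueError:
        # Return the original string, separators included.
        return s


samples = ["1.2.3", "1e5", "+12.3", "1,234.5", "12-"]
for s in samples:
    print(repr(s), looks_numeric(s), repr(to_float_or_str(s)))
```

Here '1.2.3' passes the regex check although `float()` rejects it, so the lambda would raise ValueError on such a cell, while '1e5' and '+12.3' fail the check although `float()` accepts them. Stripping commas blindly has its own cost: a cell like '1,2' would become 12.0, so this step only makes sense for columns known to use "," as a thousands separator.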
