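D******6 Posts: 841 | 1
A question for the experts on this board: I use pandas, the fantastic Python package, to load raw data in csv or xlsx/xlsm (Excel) format into a DataFrame with read_csv or read_excel. Because the raw data mixes strings and numbers, everything in the DataFrame ends up automatically typed as str. I need to convert the numeric strings to float while leaving the text cells as str. In other words, I need a str-to-float conversion for certain rows, certain columns, or parts of certain rows or columns of the two-dimensional DataFrame. How can this be done? I would appreciate any pointers, ideally with a couple of key lines of code as a demonstration. Thanks!
| i**********a Posts: 1402 | 2
def convert(x):
    # Try to parse the cell as a float; keep the original value if that fails.
    try:
        return float(x)
    except (ValueError, TypeError):
        return x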
df = df.applymap(convert)
| g*****g Posts: 390 | 3
Beautiful and succinct code!
I did something different, shown below, which seems to work too:
import re
f = lambda x: float(x) if re.sub(r'(^-|\.|e\+|e\-)', '', x).isdigit() else x
df = df.applymap(f)
Note: the line below tests whether a string x is a numeric (int/float) string or a plain text string:
f = lambda x: float(x) if re.sub(r'(^-|\.|e\+|e\-)', '', x).isdigit() else x
Test:
l = ['123', '12.3', '-12.3', '--1.2', '1a', 'a1', '-1.a', '12-', '-0.08e+3',
     '1.2e-3']
for item in l:
    print('Original: %10s' % repr(item), '\t', '==>', type(f(item)), f(item))
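Output:
Original:      '123'    ==> <class 'float'> 123.0
Original:     '12.3'    ==> <class 'float'> 12.3
Original:    '-12.3'    ==> <class 'float'> -12.3
Original:    '--1.2'    ==> <class 'str'> --1.2
Original:       '1a'    ==> <class 'str'> 1a
Original:       'a1'    ==> <class 'str'> a1
Original:     '-1.a'    ==> <class 'str'> -1.a
Original:      '12-'    ==> <class 'str'> 12-
Original: '-0.08e+3'    ==> <class 'float'> -80.0
Original:   '1.2e-3'    ==> <class 'float'> 0.0012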
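A side note on the test above: the list does not include inputs such as '1.2.3' or '1e5'. With the substitution-based check, '1.2.3' reduces to '123', passes .isdigit(), and then makes float() raise ValueError, while '1e5' (an exponent without a sign) is left as a string even though float() accepts it. One possible way around this, sketched here under the assumption that only plain decimal and exponent notation should count as numeric, is to match the whole string against a single pattern. FLOAT_RE and to_float_if_numeric are hypothetical names, not part of the posts above.

```python
import re

# Hypothetical pattern: optional sign, digits with an optional decimal point
# (or a leading-dot fraction such as ".5"), then an optional exponent.
FLOAT_RE = re.compile(r'[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?')

def to_float_if_numeric(x):
    """Return float(x) when the whole string looks like a number, else x unchanged."""
    if isinstance(x, str) and FLOAT_RE.fullmatch(x.strip()):
        return float(x)
    return x  # text strings and non-string cells pass through untouched

for item in ['123', '-0.08e+3', '1e5', '1.2.3', '--1.2', '12-', 'a1']:
    print(repr(item), '==>', repr(to_float_if_numeric(item)))
```

Because the pattern has to match the entire string, strings such as 'nan' or 'inf', which float() would accept, also stay as text under this sketch; whether that is desirable depends on the data.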
Explaining the pattern r'(^-|\.|e\+|e\-)': it matches
1) "^-" (the minus sign of a negative number, only when "-" is at the very beginning; a string with "-" anywhere else is treated as text),
2) the decimal point "\.", and
3) scientific notation, "e\+" or "e\-",
and replaces each match with '' (nothing). If .isdigit() is then True, the string is taken to be convertible to float; if not, it is left as a text string. Note that the thousands separator "," is not handled yet. It is better dealt with before this step (either while reading the csv or with a .str.replace), because the builtin float() raises ValueError on such strings. |
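The thousands-separator point can be illustrated with a small sketch. It assumes that "," only ever appears in numeric cells as a thousands separator, and it strips the commas only for the conversion attempt, so text cells that happen to contain a comma are returned unchanged. to_float_or_keep and its thousands parameter are hypothetical names introduced for this example.

```python
def to_float_or_keep(value, thousands=","):
    """Return float(value) if it parses once `thousands` is stripped, else value unchanged."""
    if not isinstance(value, str):
        return value  # e.g. NaN or an already-numeric cell
    try:
        return float(value.replace(thousands, ""))
    except ValueError:
        return value  # text cell: keep the original string, commas included

cells = ["1,234.5", "12,345,678", "-0.08e+3", "Smith, John", "1a"]
converted = [to_float_or_keep(c) for c in cells]
print(converted)
# → [1234.5, 12345678.0, -80.0, 'Smith, John', '1a']
```

On a DataFrame of strings this could be applied cell by cell, for example with `df = df.apply(lambda col: col.map(to_float_or_keep))`; pandas 2.1 and later deprecate `DataFrame.applymap` in favor of `DataFrame.map`. Two caveats of this sketch: a cell like "1,2,3" would become 123.0, and float() also accepts strings such as "nan" and "inf", so those would be converted too.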