v*********9 Posts: 1 | 1 Python beginner here.
I need to split a column whose values have two parts.
One part looks like
01jan2019_order_1977663:
or
877920_jan_19799"
For patterns like these, what is a good way to write the regular expression?
I tried *_*_*" and *_*_*:
but neither works.
Thanks!
| j****w Posts: 11 | 2 "877920_jan_19799".split("_")
or
re.split("_", "877920_jan_19799")  # requires: import re
【In reply to v*********9's post】
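On the *_*_*" attempt in the first post: in a regular expression, * is a quantifier ("repeat the previous item zero or more times"), not a shell-style wildcard, so a pattern that starts with * has nothing to repeat. The sketch below shows that failure and one possible regex spelling of "three underscore-separated parts followed by : or a double quote". The name prefix and the exact character classes are assumptions made for illustration, not something from the thread.

```python
import re

# Glob-style attempt from the first post: the leading "*" has nothing
# to repeat, so the re module refuses to compile it.
try:
    re.compile('*_*_*"')
    glob_style_ok = True
except re.error:
    glob_style_ok = False

# Illustrative regex equivalent (assumed, not from the thread):
# part_part_part followed by either ":" or a double quote.
prefix = re.compile(r'^[^_]+_[^_]+_[^_:"]+[:"]')

for s in ['01jan2019_order_1977663:', '877920_jan_19799"']:
    print(s, '->', bool(prefix.match(s)))
```

Using [^_]+ for each part rather than .* keeps every part from swallowing an underscore, which makes the match easier to reason about.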
| v*********9 Posts: 1 | 3 Oops, I didn't explain the problem clearly.
For example, one column holds customer comments, but the values look like this:
01JAN2019_order_3879940"I like this product, but the parkage is broken"
already replace an order and sent to customer
01mar2019_SAP_3879940:the parkage is broken, all things are mess
01JAN2019_order_3879940-3778"wrong color, I order golden, but comes yellow"
contacted customer to refund
01JAN2019_order_3879940
01JAN2019_dfegf_3879940"I like this product, but the parkage is broken"
already replace an order and sent to customer
01JAN2019_order_3879940:"I like this product, but the parkage is broken"
already replace an order and sent to customer
01JAN2019_order_3879940_"I like this product, but the parkage is broken"
already replace an order and sent to customer
it feels mold inside
color is not right" contacted with customer
What I want is to strip the leading part that is not the customer's comment. I tried a few things and none of them worked.
| H**********f Posts: 2978 | 4 It sounds like you are already working as a DS or DA in e-commerce. In that case I'd suggest properly learning the common string functions and
regular expressions. There isn't much to it, about a day's work; otherwise this kind of task will keep coming up and you will have to keep asking.
【In reply to v*********9's post】
| j****w Posts: 11 | 5 import re
re.split('[0-9a-zA-Z]+_[a-zA-Z]+_[0-9]+_?', '01JAN2019_order_3879940_I like this product, but the parkage is broken"')[1]
【In reply to v*********9's post】
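The split-based one-liner above covers the row it was written for, but indexing with [1] raises IndexError on rows that have no prefix, and the sample data also contains a "-3778" suffix and a ":" separator. As a rough sketch only, the same idea can be written with re.sub anchored at the start of the string. The name PREFIX and the optional "-digits" and "[_:]" pieces are assumptions added here to fit the sample rows, not part of the original answer.

```python
import re

# Hypothetical extension of the pattern above: anchored at the start,
# with an optional "-digits" suffix and an optional "_" or ":" separator.
PREFIX = re.compile(r'^[0-9a-zA-Z]+_[a-zA-Z]+_[0-9]+(?:-[0-9]+)?[_:]?')

rows = [
    '01JAN2019_order_3879940"I like this product, but the parkage is broken"',
    '01mar2019_SAP_3879940:the parkage is broken, all things are mess',
    '01JAN2019_order_3879940-3778"wrong color, I order golden, but comes yellow"',
    '01JAN2019_order_3879940',
    'already replace an order and sent to customer',
]

# Rows without a matching prefix pass through unchanged instead of raising.
cleaned = [PREFIX.sub('', row, count=1) for row in rows]

for before, after in zip(rows, cleaned):
    print(repr(before), '->', repr(after))
```

Rows that consist only of the prefix come back as empty strings, so they can be filtered out afterwards if they are not wanted.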
| m******n Posts: 453 | | g*****g Posts: 390 | 7 I'm learning Python too, so here is my practice attempt:
"any_mon" can be used as a capturing group (it is not one right now) to capture the date of the
feedback, if needed.
import re
text = '01JAN2019_order_3879940:"I like this product, but the parkage is broken"'
any_mon = "(?:Jan|Feb|Mar)"  # non-capturing group; only three months are listed
pattern = r'\d+{}\d+_.+_\d+[:"]'.format(any_mon)
res = re.split(pattern, text, flags=re.I)  # re.I: match JAN/jan/Jan alike
if len(res) == 1:
    print('No Split')
else:
    print(res[1])  # output: "I like this product, but the parkage is broken"
| c*****m Posts: 1160 | 15 Looking at your examples, the rules come down to:
If a line contains a double quote, delete everything before the double quote;
if a line contains a colon, delete the colon and everything before it;
if a line has neither a double quote nor a colon, check whether it contains an underscore; if it does, delete the whole line.
That is just three Python statements, and it is enough to clean up the examples you posted.
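The three rules in the last post can be sketched roughly as follows. The function name strip_prefix is made up for illustration, and reading the rules as an ordered if/elif chain (the first rule that applies wins) is an assumption; the post does not spell out the order.

```python
def strip_prefix(line: str) -> str:
    """Apply the three rules in order; the first rule that applies wins."""
    if '"' in line:
        # Rule 1: drop everything before the first double quote.
        return line[line.index('"'):]
    if ':' in line:
        # Rule 2: drop the first colon and everything before it.
        return line.split(':', 1)[1]
    if '_' in line:
        # Rule 3: no quote, no colon, but an underscore: prefix-only line.
        return ''
    return line


samples = [
    '01JAN2019_order_3879940"I like this product, but the parkage is broken"',
    '01mar2019_SAP_3879940:the parkage is broken, all things are mess',
    '01JAN2019_order_3879940',
    'already replace an order and sent to customer',
]

for s in samples:
    print(repr(strip_prefix(s)))
```

One caveat with this reading: a continuation line such as color is not right" contacted with customer contains only a closing quote, so rule 1 would cut off the text before it. Multi-line comments like that one would probably need extra handling.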