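I'm new to Python and recently started scraping data. Some values in the JSON string contain a colon that I need to remove. My code is below.
a and b are both strings whose values contain a colon.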
import re
a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"
result = re.sub('^(?:Title|cmp|cmpesc):.+(\:)','', a)
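When I run it, only Customer Experience + Innovation (CX+I) Intern Brands' is left, and everything before it is deleted. What I want is to delete only the colon after Intern (the colon after Title must stay).
How should I change it?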
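Why the whole prefix disappears: the greedy .+ runs to the end of the string and backtracks only as far as the last colon, so the match covers everything from the start up to "Intern:", and re.sub replaces all of it with ''. A minimal check using the question's own pattern (written as a raw string, which does not change its meaning):

```python
import re

a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"

# The pattern from the question
pattern = r'^(?:Title|cmp|cmpesc):.+(:)'

m = re.match(pattern, a)
# The whole match runs from "Title" up to and including the colon after "Intern"
print(m.group(0))                   # Title:'Intern:
# Replacing that whole match with '' leaves only the tail (with a leading space)
print(repr(re.sub(pattern, '', a)))  # " Customer Experience + Innovation (CX+I) Intern Brands'"
```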
大家讲道理 2017-04-18 10:32:40
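import re
# Raw strings are required: in a normal string literal '\1' is the control character chr(1), not a back-reference
result = re.sub(r'^((?:Title|cmp|cmpesc):)(.+):(.*)',
                r'\1\2\3',
                "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'")
print(result)  # Title:'Intern Customer Experience + Innovation (CX+I) Intern Brands'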
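The pattern above removes at most one colon per string: the greedy (.+) backtracks only as far as the final colon, so on b only the colon in the second value would go. A sketch that removes the colons inside every value, assuming each value is wrapped in single quotes with no escaped or nested single quotes (strip_colons_in_values is a hypothetical name), passes a function as the replacement to re.sub:

```python
import re

a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"

def strip_colons_in_values(s):
    # Assumption: '[^']*' matches exactly one quoted value, i.e. values never
    # contain a single quote. Colons outside the quotes (after Title, cmp,
    # cmpesc) are not part of any match, so they are left untouched.
    return re.sub(r"'[^']*'", lambda m: m.group(0).replace(':', ''), s)

print(strip_colons_in_values(a))  # Title:'Intern Customer Experience + Innovation (CX+I) Intern Brands'
print(strip_colons_in_values(b))  # cmp:'Adecco USA',cmpesc:'Adecco USA'
```

Matching the quoted values rather than the keys avoids having to list Title, cmp and cmpesc at all.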
PHPz 2017-04-18 10:32:40
In this case:
''.join(re.split(r'(?<!Title)(?<!cmp)(?<!cmpesc):', a))  # one fixed-width look-behind per keyword
That does it.
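A caution about writing the look-behind as [Title|cmp|cmpesc]: square brackets form a character class, which matches a single character from the set {T, i, t, l, e, |, c, m, p, s}, not one of the three words. It only happens to work on a and b. The sketch below uses a hypothetical string whose value ends in "s" before the colon to show the difference. It uses one look-behind per keyword because Python's re module requires each look-behind to be fixed width, so (?<!Title|cmp|cmpesc) is rejected.

```python
import re

s = "Title:'Sales: East'"  # hypothetical input, not from the question

# Character class: "not preceded by any ONE of the characters T,i,t,l,e,|,c,m,p,s"
char_class = r'(?<![Title|cmp|cmpesc]):'
# Separate fixed-width look-behinds: "not preceded by the word Title, cmp or cmpesc"
per_word = r'(?<!Title)(?<!cmp)(?<!cmpesc):'

print(''.join(re.split(char_class, s)))  # Title:'Sales: East'  (colon kept: 's' is in the set)
print(''.join(re.split(per_word, s)))    # Title:'Sales East'
```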
高洛峰 2017-04-18 10:32:40
There's no need to remove the colon; just turn the string into a dictionary:
>>> a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
>>> b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"
>>> dict([s.split(':',1) for s in a.split(',')])
{'Title': "'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"}
>>> dict([s.split(':',1) for s in b.split(',')])
{'cmp': "'Adecco: USA'", 'cmpesc': "'Adecco: USA'"}
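Splitting on ',' assumes no value contains a comma. On a hypothetical value such as 'Adecco, Inc: USA' it would produce a bogus " Inc" key. A sketch that extracts the pairs with re.findall instead, assuming keys are word characters and each value is single-quoted with no single quote inside (parse_pairs is a hypothetical name):

```python
import re

def parse_pairs(s):
    # Assumption: key = one or more word characters, value = '...' with no
    # single quote inside. Commas and colons inside a value are kept as-is.
    return dict(re.findall(r"(\w+):'([^']*)'", s))

c = "cmp:'Adecco, Inc: USA',cmpesc:'Adecco: USA'"  # hypothetical input with a comma in a value
print(parse_pairs(c))  # {'cmp': 'Adecco, Inc: USA', 'cmpesc': 'Adecco: USA'}
```

Because the quotes are matched by the pattern rather than stripped with replace("'", ""), an apostrophe-free value comes back without its surrounding quotes automatically.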
Written as a function:
a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"
def fn(x):
    # Drop the single quotes, split the pairs on ',', then split each pair on its first ':' only
    return dict(s.split(':', 1) for s in x.replace("'", "").split(','))
print(fn(a))
print(fn(b))
# {'Title': 'Intern: Customer Experience + Innovation (CX+I) Intern Brands'}
# {'cmp': 'Adecco: USA', 'cmpesc': 'Adecco: USA'}