
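json - Removing colons from a string with a regular expression in Python

I'm new to Python and have recently been trying to scrape data. The values in a JSON string contain colons, and I need to remove them. My code is below.
Both a and b are strings whose values can contain colons.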

import re
a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"
result = re.sub('^(?:Title|cmp|cmpesc):.+(\:)','', a)

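Running this code leaves only " Customer Experience + Innovation (CX+I) Intern Brands'"; everything before it is deleted. What I want is to remove only the colon after Intern (the colon after Title must be kept).
How should I change the code?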

黄舟 · 2839 days ago · 1049

All replies (4)

  • 大家讲道理 2017-04-18 10:32:40

    import re
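    # The colon after the key sits outside the group so it applies to every key.
    # The greedy (.+) backtracks to the last colon, which is the one dropped.
    result = re.sub(r'^(Title|cmp|cmpesc):(.+):(.*)',
                    r'\1:\2\3',
                    "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'")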
    
    print(result) # Title:'Intern Customer Experience + Innovation (CX+I) Intern Brands'
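
The substitution above handles one key and removes only the last colon in the string, so it does not cover a string like b, which holds several pairs. A minimal sketch of one way to extend it, assuming every value is wrapped in single quotes and contains no single quote itself (the names `PAIR` and `strip_value_colons` are made up for illustration):

```python
import re

# Hypothetical helper: matches key:'value' pairs, assuming single-quoted
# values that contain no single quote of their own.
PAIR = re.compile(r"(\w+):'([^']*)'")

def strip_value_colons(text):
    """Remove colons inside each quoted value, keeping the key's colon."""
    def _clean(match):
        key, value = match.group(1), match.group(2)
        return "{}:'{}'".format(key, value.replace(":", ""))
    return PAIR.sub(_clean, text)

print(strip_value_colons("cmp:'Adecco: USA',cmpesc:'Adecco: USA'"))
```

Because the key and its colon are rebuilt from the captured groups, the list of key names does not have to be spelled out in the pattern.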

  • PHPz 2017-04-18 10:32:40

    In this case:

    # One negative lookbehind per key; [Title|cmp|cmpesc] would be a character class.
    ''.join(re.split(r'(?<!Title)(?<!cmp)(?<!cmpesc):', a))

    That’s it
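
One pitfall with lookbehind patterns of this kind: square brackets define a character class, so a group like `[Title|cmp|cmpesc]` matches any single one of those characters instead of the whole words. The sketch below compares that form with one negative lookbehind per key, using a made-up input whose value happens to end in one of those letters:

```python
import re

# Hypothetical input: the value "Sales" ends in "s", a letter that also
# appears in the key names.
s = "Title:'Sales: Manager'"

# Character class: the colon is kept whenever the single character before
# it is one of T, i, t, l, e, |, c, m, p, s.
by_class = re.sub(r"(?<![Title|cmp|cmpesc]):", "", s)

# One lookbehind per key: the colon is kept only directly after a key name.
by_keys = re.sub(r"(?<!Title)(?<!cmp)(?<!cmpesc):", "", s)

print(by_class)
print(by_keys)
```

With the character class the colon after Sales survives, while the chained lookbehinds remove it. The chained form would still keep a colon after a value that ends in one of the key names.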

  • 巴扎黑 2017-04-18 10:32:40

    Sure enough, I misread the question.

  • 高洛峰 2017-04-18 10:32:40

    No need to remove the colon, just turn it into a dictionary~

    >>> a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'";\
    b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"
    >>> dict([s.split(':',1) for s in a.split(',')])
    {'Title': "'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"}
    >>> dict([s.split(':',1) for s in b.split(',')])
    {'cmpesc': "'Adecco: USA'", 'cmp': "'Adecco: USA'"}
    >>> 

    Written as a function:

    a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
    b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"
    
    def fn(x):
        return dict((s.split(':',1) for s in x.replace("'","").split(',')))
    
    print(fn(a))
    print(fn(b))
    
    # {'Title': 'Intern: Customer Experience + Innovation (CX+I) Intern Brands'}
    # {'cmp': 'Adecco: USA', 'cmpesc': 'Adecco: USA'}
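
Splitting on commas works for the two sample strings, but it would cut a value in two if the value itself contained a comma. A hedged alternative, assuming values are always single-quoted and contain no single quote, is to pull the pairs out with a regular expression (`PAIR` and `parse_pairs` are illustrative names, not part of the answer above):

```python
import re

# Hypothetical helper: assumes key:'value' pairs with single-quoted values
# that contain no single quote of their own.
PAIR = re.compile(r"(\w+):'([^']*)'")

def parse_pairs(text):
    """Build a dict from key:'value' pairs; commas and colons in values are kept."""
    return dict(PAIR.findall(text))

print(parse_pairs("cmp:'Adecco: USA',cmpesc:'Adecco: USA'"))
print(parse_pairs("Title:'Intern, CX: Brands'"))
```

`findall` returns one (key, value) tuple per match, so `dict` can consume the result directly.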
    
