Python: replace all quotation marks inside tags with a regex

<p class="red" id='123' onclick="do()">
  <h1>"哈哈"</h1>
  <a href="1" title="123"></a>
</p>

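For example, given this markup, I want to replace every double quote and single quote inside the tags <> with "aaa", while leaving quotation marks everywhere else untouched. How should I write this? The result should be: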

<p class=aaaredaaa id=aaa123aaa onclick=aaado()aaa>
  <h1>"哈哈"</h1>
  <a href=aaa1aaa title=aaa123aaa></a>
</p>
PHP中文网PHP中文网2889 days ago414

Replies (2)

  • 巴扎黑

    巴扎黑2017-04-18 09:18:45

    First match each complete tag <...>, then replace every ' and " inside the matched text:

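    import re
    
    ss = '''
    <p class="red" id='123' onclick="do()">
      <h1>"哈哈"</h1>
      <a href="1" title="123"></a>
    </p>
    '''
    
    # Compiled once: matches a single quote or a double quote.
    QUOTE = re.compile('[\'"]')
    
    def quoterepl(matchobj):
        # matchobj.group(0) is one whole tag; replace every quote inside it.
        return QUOTE.sub('aaa', matchobj.group(0))
    
    # Only the text between < and > is ever passed to quoterepl.
    # print(...) with one argument works on both Python 2 and Python 3.
    print(re.sub(r'<[^<>]+?>', quoterepl, ss))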

  • 伊谢尔伦

    伊谢尔伦2017-04-18 09:18:45

    Update: I now understand the question. I originally read it as replacing the text inside the quotation marks, but the goal is to replace the quotation marks themselves.

    <([^<>]*)['"]([^<>]*)>

    Since only the quotation marks change and everything else stays the same, we capture the text around the quotation mark rather than the quotation mark itself, and then replace the match with

    <\1aaa\2>

    This replaces the quotation mark with aaa. Note that each pass replaces only one quotation mark per tag, because the pattern consumes the whole tag, so the substitution would have to be repeated until nothing changes. I suggest matching the contents of each <...> tag first:

    <([^<>]*=[^<>]*)>

    Then, within each matched tag, match

    ['"]

    and replace it with

    aaa
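
    Both ideas in this answer can be sketched in Python as follows. This is only an illustration: the function names replace_quotes_looped and replace_quotes_two_step are made up for this example, the replacement text is hard-coded to aaa, and the sample string is a shortened, ASCII-only version of the markup in the question. The looped version terminates only because aaa itself contains no quotation marks.

```python
import re

# Patterns taken from the answer above.
ONE_QUOTE_IN_TAG = re.compile(r'''<([^<>]*)['"]([^<>]*)>''')
TAG_WITH_ATTR = re.compile(r'<([^<>]*=[^<>]*)>')
QUOTE = re.compile(r'''['"]''')

def replace_quotes_looped(html):
    # Hypothetical helper: each pass rewrites one quote per tag,
    # so repeat until a pass changes nothing.
    while True:
        html, count = ONE_QUOTE_IN_TAG.subn(r'<\1aaa\2>', html)
        if count == 0:
            return html

def replace_quotes_two_step(html):
    # Hypothetical helper: find each tag that contains '=', then
    # replace every quote inside that tag in a single pass.
    return TAG_WITH_ATTR.sub(lambda m: QUOTE.sub('aaa', m.group(0)), html)

sample = '<p class="red" id=\'123\'>\n  <h1>"hi"</h1>\n</p>'
print(replace_quotes_two_step(sample))
# Both approaches should agree on this sample.
print(replace_quotes_looped(sample) == replace_quotes_two_step(sample))
```

    The two-step version does the work in one pass over the text, which is why it is the one suggested here. The two versions can differ on tags that contain quotation marks but no = sign, since the second pattern skips such tags.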

    (The following is my original, incorrect answer.)
    For example:

    class="(.*?)"

    This matches class=" followed by as few characters as possible up to the next double quote, and captures those characters.
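
    A small sketch of what "as few characters as possible" means in practice, comparing the lazy .*? with the greedy .* on a made-up tag (the tag string and variable names are assumptions for illustration only):

```python
import re

tag = '<p class="red" id="123">'

# Lazy: stops at the first closing double quote after class=".
lazy = re.search(r'class="(.*?)"', tag).group(1)

# Greedy: runs on to the last double quote in the string.
greedy = re.search(r'class="(.*)"', tag).group(1)

print(lazy)    # red
print(greedy)  # red" id="123
```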
