python - How to automatically unescape HTML entities such as '&lt;abc&gt;' under Python 3?

I am new to Python. While using the Scrapy crawler I ran into HTML entities (escaped special characters), so I searched on Baidu and found this approach:

import HTMLParser  # Python 2 module name
html_parser = HTMLParser.HTMLParser()
s = '&lt;abc&gt;&nbsp;'
s = html_parser.unescape(s)

Running it produces this error:
import markupbase
ImportError: No module named 'markupbase'


With the help of translation software, I read the official HTMLParser documentation and found a second method:

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_data(self, data):
        print(data)
        return data

parser = MyHTMLParser()
s = '&lt;abc&gt;&nbsp;'
ss = parser.feed(s)

The second method runs and prints the unescaped text. The problem is that the `return data` statement seems to have no effect: `ss` does not receive the value.


Is there a way to unescape the entities in just a few lines of code? If not, how can I get a return value out of the second method?

typecho · 2774 days ago · 1071

All replies (1)

  • 某草草

    某草草  2017-06-12 09:29:01

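    import html

    # html.unescape() (Python 3.4+) replaces HTMLParser().unescape(),
    # which was deprecated in 3.4 and removed in Python 3.9.
    s = '&lt;abc&gt;&nbsp;'
    txt = html.unescape(s)
    print(txt)
    # Result: <abc> followed by a non-breaking space (U+00A0)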
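
The question also asks how to get a value back from the `HTMLParser` subclass. `feed()` does not pass on whatever `handle_data()` returns, so one option is to store the text on the parser object and read it afterwards. The following is only a sketch under that assumption; `TextCollector`, `get_text` and `unescape_with_parser` are hypothetical names chosen for illustration, not part of the standard library.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects the text passed to handle_data()."""

    def __init__(self):
        # convert_charrefs=True (the default since Python 3.5) makes the
        # parser decode entities such as &lt; before calling handle_data().
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # feed() ignores the return value of this callback,
        # so keep the text on the instance instead.
        self.chunks.append(data)

    def get_text(self):
        return "".join(self.chunks)


def unescape_with_parser(s):
    parser = TextCollector()
    parser.feed(s)
    parser.close()  # flush any text the parser is still buffering
    return parser.get_text()


result = unescape_with_parser("&lt;abc&gt;&nbsp;")
print(repr(result))  # '<abc>\xa0'
```

If the goal is only to decode entities, a single call to `html.unescape()` is shorter; a collecting parser like this is mainly useful when the input also contains real tags that should be stripped.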