python - Chinese characters are garbled after a page scraped with selenium and PhantomJS is saved as .html?

Question

The saved HTML document contains garbled Chinese: <meta name="keywords" content="鈽呯敤閽㈢惔璇犻噴鍛ㄦ澃浼︹櫔鏃犱笌浼︽瘮涓嶉€濈粡鍏革紝姊︽兂瀹禯eDragon锛岄挗鐞达紝缁忓吀锛岃交闊充箰"> Code: {code...} 1. Using...

黄舟 · Answer

Try this:

# Explicit UTF-8 avoids the platform default encoding
with open("xxx.html", "w", encoding="utf-8") as f:
    f.write(browser.page_source)
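To see why the explicit encoding matters, here is a minimal sketch that needs no browser. The `page_source` string is a hypothetical stand-in for `browser.page_source`, and `save_html` is an illustrative helper name, not part of selenium. It writes the text as UTF-8, reads it back correctly, and then decodes the same bytes as GBK to reproduce the kind of garbling shown in the question.

```python
import os
import tempfile


def save_html(source: str, path: str) -> None:
    """Write page source as UTF-8 regardless of the platform default encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(source)


# Hypothetical stand-in for browser.page_source
page_source = '<meta name="keywords" content="钢琴，经典，轻音乐">'

path = os.path.join(tempfile.mkdtemp(), "source.html")
save_html(page_source, path)

# Reading with the same encoding restores the original text
with open(path, encoding="utf-8") as f:
    restored = f.read()

# Decoding the same UTF-8 bytes as GBK yields garbled characters
with open(path, "rb") as f:
    garbled = f.read().decode("gbk", errors="replace")

print(restored)
print(garbled)
```

If the saved file still looks wrong in an editor or browser, it may be worth checking which encoding that viewer assumes, since the bytes on disk can be correct while the display is not.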

高洛峰 · Answer

print(browser.page_source,file=open('C:/Users/welwel/Desktop/source.html','w'))

高洛峰 · Answer

My goal was to scrape the comments and titles of songs. I originally planned to save each page and then parse it with regular expressions. I then found that the odd-numbered pages had normal Chinese while the even-numbered pages were garbled (I wanted to scrape 50 pages), and on a later run it was the other way around. This suggests a bug on Win7; I don't have Linux installed, so I couldn't compare. Extracting the required elements with XPath instead,
eg:
ele_com = browser.find_element_by_xpath("//p[@class='cnt f-brk']")
returns normal data, still running from cmd.
So if you want to scrape data, use the tools the module provides rather than working around them.
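As a rough, browser-free illustration of the same idea, the sketch below applies the same class predicate to a small hand-written snippet using the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset. This is a stand-in, not selenium: the `sample` markup is hypothetical and must be well-formed XML, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a rendered comment list
sample = (
    '<div>'
    '<p class="cnt f-brk">经典，轻音乐</p>'
    '<p class="other">ignored</p>'
    '<p class="cnt f-brk">second comment</p>'
    '</div>'
)

root = ET.fromstring(sample)

# Same attribute predicate as the XPath above, in ElementTree's XPath subset
comments = [p.text for p in root.findall(".//p[@class='cnt f-brk']")]

for text in comments:
    print(text)
```

Working with the extracted strings keeps the text as Python `str` objects, so no file encoding is involved until you choose to write them out.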
