How to Resolve 'TypeError: can't use a string pattern on a bytes-like object in re.findall()' When Extracting Text from Web Pages?

Mary-Kate Olsen
2024-11-25 02:41:11

TypeError: Using a String Pattern on a Bytes-Like Object in re.findall()

When extracting text with regular expressions in Python, you may encounter the error "TypeError: can't use a string pattern on a bytes-like object in re.findall()". The error occurs when a regex built from a string (str) pattern is applied to a bytes-like object. This is common when working with web pages, because response.read() on the object returned by urllib.request.urlopen() returns bytes, not str.
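As a minimal sketch of the failure, the snippet below applies a str pattern to a bytes value. The html_bytes sample is a made-up stand-in for a downloaded page, not real output from any site, and the exact wording of the exception message may vary between Python versions.

```python
import re

# Hypothetical stand-in for the bytes returned by response.read()
html_bytes = b"<html><head><title>Example</title></head></html>"

# A str pattern (note: no b prefix on the literal)
pattern = re.compile(r"<title>(.+?)</title>")

try:
    pattern.findall(html_bytes)  # str pattern applied to bytes
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```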

To resolve this issue, decode the bytes-like object into a string before applying the regex search, as in the following code:

import urllib.request
import re

url = "http://www.google.com"
regex = r'<title>(.+?)</title>'  # non-greedy match of the text between the title tags
pattern = re.compile(regex)

with urllib.request.urlopen(url) as response:
    html = response.read().decode('utf-8')  # Decode the bytes-like object into a str

title = re.findall(pattern, html)
print(title)

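Calling .decode('utf-8') on the bytes returned by response.read() converts them into a Unicode string, so html is a str that the string regex pattern can search. This allows the code to extract the web page title. Note that this assumes the page is encoded as UTF-8; if the page uses another encoding, pass that encoding to decode() instead, otherwise decoding may raise UnicodeDecodeError.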
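Two variations on this fix can be sketched as follows. The helper names titles_from_bytes and titles_from_text and the sample page are hypothetical, introduced only for illustration. The first keeps the data as bytes and uses a bytes pattern instead; the second decodes with a caller-supplied charset and falls back to UTF-8 when none is given.

```python
import re
from typing import List, Optional

# The same pattern, once as str and once as bytes (rb prefix)
TITLE_STR = re.compile(r"<title>(.+?)</title>", re.IGNORECASE | re.DOTALL)
TITLE_BYTES = re.compile(rb"<title>(.+?)</title>", re.IGNORECASE | re.DOTALL)

def titles_from_bytes(raw: bytes) -> List[bytes]:
    """Search the raw bytes directly; the matches are bytes too."""
    return TITLE_BYTES.findall(raw)

def titles_from_text(raw: bytes, charset: Optional[str] = None) -> List[str]:
    """Decode first (UTF-8 when no charset is given), then search the str."""
    text = raw.decode(charset or "utf-8", errors="replace")
    return TITLE_STR.findall(text)

# Hypothetical sample standing in for response.read()
sample = "<html><head><title>Café</title></head></html>".encode("utf-8")
print(titles_from_bytes(sample))
print(titles_from_text(sample))
```

With urllib, the charset argument could be taken from response.headers.get_content_charset(), which returns None when the server does not declare one. Matching on bytes avoids decoding entirely, but the results then still need decoding before they can be used as text.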
