Why Does `UnicodeDecodeError: Invalid Continuation Byte` Occur with UTF-8, But Not Latin-1?
Troubleshooting UnicodeDecodeError: Invalid Continuation Byte
When you encounter the error "UnicodeDecodeError: 'utf8' codec can't decode byte invalid continuation byte," the first step is to identify the underlying cause. In this instance, the byte string being decoded contains the byte \xe9, which was produced by a single-byte encoding such as Latin-1 rather than by UTF-8.
The byte \xe9 is not how UTF-8 represents the letter "é"; UTF-8 encodes that letter as the two bytes \xc3\xa9. In UTF-8, \xe9 (binary 11101001) is the lead byte of a three-byte sequence and must be followed by two continuation bytes of the form 10xxxxxx. Here the next byte is an ordinary space, so the "utf-8" decoder reports an invalid continuation byte.
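The byte-level reasoning can be checked with a short Python 3 sketch. It assumes the input is the byte string b"a test of \xe9 char"; the variable names are illustrative only.

```python
# Assumed input: the byte string discussed above, with the stray byte at index 10.
data = b"a test of \xe9 char"

# 0xE9 is 0b11101001: the 1110 prefix announces a three-byte UTF-8 sequence.
lead_bits = format(data[10], "08b")

# The following byte is a space (0x20), which does not match 10xxxxxx.
next_bits = format(data[11], "08b")

reason = None
position = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    reason = exc.reason    # short description of the failure
    position = exc.start   # index of the offending byte

# For comparison, a genuine UTF-8 encoding of "é" takes two bytes.
utf8_e_acute = "\u00e9".encode("utf-8")

print(lead_bits, next_bits, reason, position, utf8_e_acute)
```

The exception attributes reason and start are often more convenient than parsing the error message when logging or reporting bad input.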
Why Does It Succeed with the "Latin-1" Codec?
The "latin-1" codec, also known as ISO-8859-1, is a single-byte encoding: every byte from \x00 to \xff maps directly to the Unicode code point with the same value, so there are no lead or continuation bytes. It maps the byte \xe9 to the character "é" (U+00E9).
Therefore, when using the "latin-1" codec, the decoder interprets the byte \xe9 as "é" and returns the string "a test of é char" without an error. Because Latin-1 accepts every byte value, a successful decode does not by itself prove that the data was really Latin-1.
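A minimal Python 3 sketch of this behavior, again assuming the same example byte string; the round trip over all 256 byte values illustrates why the "latin-1" decoder cannot raise this error.

```python
# Assumed input: the same example byte string.
data = b"a test of \xe9 char"

# Latin-1 maps each byte to the code point with the same numeric value.
text = data.decode("latin-1")
code_point = ord(text[10])  # 0xE9, i.e. U+00E9 "é"

# Every possible byte value decodes, and encoding restores the original bytes.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("latin-1").encode("latin-1")

print(text, hex(code_point), round_trip == all_bytes)
```

The flip side is that Latin-1 will also "successfully" decode bytes that were written in some other encoding, producing wrong characters instead of an error.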
Solution to the Issue
To resolve the "UnicodeDecodeError" for the original string, decode it with the codec that matches how the bytes were actually written. Aliases such as "u8" or "utf8" refer to the same UTF-8 codec and fail in exactly the same way. Since the byte \xe9 here is Latin-1 data, use the "latin-1" decoder:
v = o.decode("latin-1")
Alternatively, if the data is supposed to be UTF-8, fix it at the source so that "é" is stored as its two-byte UTF-8 sequence:
o = b"a test of \xc3\xa9 char"
By using the appropriate decoder or encoding, the string can be successfully decoded without encountering the "UnicodeDecodeError: invalid continuation byte" error.
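When the encoding of incoming data is not known in advance, one common approach is to try UTF-8 first and fall back to Latin-1. The sketch below is only an illustration: decode_with_fallback is a hypothetical helper, not a standard library function, and the candidate list is an assumption that should reflect where the data actually comes from.

```python
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: return raw decoded with the first codec that works."""
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError("none of the candidate encodings could decode the data")


# Latin-1 bytes fail as UTF-8 and are picked up by the fallback.
from_latin1 = decode_with_fallback(b"a test of \xe9 char")

# Valid UTF-8 bytes are decoded by the first candidate.
from_utf8 = decode_with_fallback(b"a test of \xc3\xa9 char")

# If losing the character is acceptable, errors="replace" substitutes U+FFFD.
lossy = b"a test of \xe9 char".decode("utf-8", errors="replace")

print(from_latin1, from_utf8, lossy)
```

Trying UTF-8 first is deliberate: it rejects most non-UTF-8 input, whereas Latin-1 accepts anything and therefore only makes sense as the last candidate.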