How to Handle UTF8 Encoding in Python When Reading CSV Files?

Mary-Kate Olsen
2024-11-02 14:10:30

Reading a UTF8 CSV File with Python

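CSV files, commonly used for data exchange, often contain accented characters, and UTF-8 is the usual encoding for preserving them. In Python 2, however, the csv module does not accept Unicode input: csv.reader works on byte strings, so any decoding to Unicode has to happen after the rows are parsed.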

Problem

Reading a UTF-8 CSV file that contains accented French or Spanish characters raised the following exception, even though the code tried to handle the encoding:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 68: ordinal not in range(128)
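
The byte 0xc3 in the message is the first byte of a two-byte UTF-8 sequence, which the ASCII codec cannot interpret. The following is a minimal sketch of that failure, written for Python 3 as an assumption (the article's own code targets Python 2, where the ASCII decode happens implicitly rather than through an explicit call). The variable names are illustrative only.

```python
# 'é' encoded as UTF-8 is the two bytes 0xC3 0xA9.
raw = 'é'.encode('utf-8')

try:
    # Decoding UTF-8 bytes with the ASCII codec fails on the first
    # byte above 0x7F, just like the exception in the article.
    raw.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding the same bytes with the matching codec succeeds.
text = raw.decode('utf-8')
```

Here `error_message` holds the text of the exception, and `text` holds the original character again, which shows that the data itself is fine and only the codec was wrong.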

Solution

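The fix starts with the direction of the encode method: it converts Unicode strings into byte strings, not the other way around. If it is called on data that is already a byte string, Python 2 first decodes that data implicitly with the ASCII codec, which produces the error above. For reading UTF-8 text files in general, the codecs module, and codecs.open in particular, is the better tool. The csv module, however, expects UTF-8 byte strings, which is exactly what a plain open already returns, so the code can be simplified to parse the bytes first and decode each cell afterwards: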

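<code class="python">import csv

def unicode_csv_reader(utf8_data, dialect=csv.excel, **kwargs):
    # Python 2: csv.reader parses the UTF-8 byte strings as-is.
    csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
    for row in csv_reader:
        # Decode each cell from UTF-8 bytes to a unicode object.
        yield [unicode(cell, 'utf-8') for cell in row]

filename = 'da.csv'
# Open in binary mode so the csv module receives the raw bytes.
with open(filename, 'rb') as f:
    for field1, field2, field3 in unicode_csv_reader(f):
        print field1, field2, field3</code>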
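
The code above is Python 2 (it relies on the unicode built-in and the print statement). For comparison, here is a rough Python 3 sketch of the same task, offered as an assumption about how one might port it rather than as part of the original solution. In Python 3 the decoding is done by open itself through its encoding parameter, so csv.reader already yields str values. The helper name `read_utf8_csv` and the sample data are hypothetical.

```python
import csv
import os
import tempfile

def read_utf8_csv(path):
    """Return all rows of a UTF-8 encoded CSV file as lists of str."""
    # newline='' leaves line-ending handling to the csv module,
    # as the csv documentation recommends.
    with open(path, encoding='utf-8', newline='') as f:
        return list(csv.reader(f))

# Write a small sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', encoding='utf-8', newline='',
                                 suffix='.csv', delete=False) as tmp:
    tmp.write('nom,ville,pays\r\n'
              'René,Orléans,France\r\n'
              'José,Málaga,España\r\n')
    sample_path = tmp.name

rows = read_utf8_csv(sample_path)
os.remove(sample_path)

print(len(rows), 'rows read')
```

Each entry in `rows` is a list of three strings with the accented characters intact, so no per-cell decoding step is needed.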

Note

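If the input is not UTF-8 (for example, ISO-8859-1), transcode each line to UTF-8 before passing it to the reader, replacing the placeholder codec name below with the actual source encoding: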

<code class="python">line.decode('whateverweirdcodec').encode('utf-8')</code>

However, this is often unnecessary: in Python 2, csv.reader can parse ISO-8859-* encoded byte strings directly, provided each cell is then decoded with that same codec instead of 'utf-8'.
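
As an illustration of the transcoding step, the sketch below converts one ISO-8859-1 encoded line to UTF-8. It is written for Python 3 as an assumption, and the helper name `to_utf8` and the sample line are hypothetical.

```python
def to_utf8(raw, source_encoding):
    """Transcode a byte string from source_encoding to UTF-8."""
    # decode: bytes -> str, then encode: str -> UTF-8 bytes.
    return raw.decode(source_encoding).encode('utf-8')

# A sample line as it would appear in an ISO-8859-1 file:
# each accented character occupies a single byte.
latin1_line = 'café,Orléans\n'.encode('iso-8859-1')

utf8_line = to_utf8(latin1_line, 'iso-8859-1')
```

The result is two bytes longer than the input, because each of the two accented characters takes two bytes in UTF-8 instead of one.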
