Home  >  Article  >  Backend Development  >  How to Read UTF8 CSV Files with Accented Characters in Python?

How to Read UTF8 CSV Files with Accented Characters in Python?

Patricia Arquette
Patricia ArquetteOriginal
2024-11-02 21:57:30179browse

How to Read UTF8 CSV Files with Accented Characters in Python?

Reading a UTF8 CSV File with Python

Reading CSV files containing accented characters can be challenging in Python due to its limited ASCII support. This article explores a solution to this problem, addressing the "UnicodeDecodeError" encountered when attempting to read such files.

Unicode-Friendly CSV Reader

To handle accented characters, we need a CSV reader that supports Unicode encoding. The following code modifies the standard CSV reader:

<code class="python">def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # Decode UTF-8 data into Unicode strings
    csv_reader = csv.reader(unicode_csv_data, dialect=dialect, **kwargs)
    for row in csv_reader:
        yield [unicode(cell, 'utf-8') for cell in row]</code>

UTF-8 Encoder

The original solution incorrectly applied encoding to a byte-string instead of a Unicode string. The below code corrects this mistake:

<code class="python">import csv

def unicode_csv_reader(utf8_data, dialect=csv.excel, **kwargs):
    csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
    for row in csv_reader:
        yield [unicode(cell, 'utf-8') for cell in row]</code>

Example Usage

Now we can confidently read UTF8-encoded CSV files as follows:

<code class="python">filename = 'output.csv'
reader = unicode_csv_reader(open(filename))

# Iterate through the rows, fields
for field1, field2, field3 in reader:
    print field1, field2, field3 </code>

Note on Input Data Encoding

Remember that the provided solution assumes the input data is already in UTF8 encoding. If this is not the case, you can use the decode method to convert it to UTF8 before passing it to the CSV reader.

The above is the detailed content of How to Read UTF8 CSV Files with Accented Characters in Python?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn