How to Read Data Directly from a URL Using Pandas?

DDD
2024-11-04 10:40:30

The Read-All-URL Conundrum

One common task in data analysis is loading data from a URL. Pandas, a popular Python library for data manipulation, provides a read_csv function that reads CSV data from a file path, a URL, or a file-like object. However, downloading the file yourself and passing the raw response bytes to read_csv results in an error.

Understanding the Error

To demonstrate this error, consider the following example:

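<code class="python">import pandas as pd
import requests

# Note: this is the GitHub "blob" page (HTML), not the raw CSV file
url = "https://github.com/cs109/2014_data/blob/master/countries.csv"
s = requests.get(url).content  # bytes, not a path or file-like object
c = pd.read_csv(s)  # raises an error</code>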

This code downloads the content at the given URL with the requests library and passes the result straight to read_csv. The .content attribute is a bytes object, not a file path or a file-like object, so read_csv raises an error:

Expected file path name or file-like object, got <class 'bytes'> type
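The failure can be reproduced without any network access. The following is a minimal sketch, assuming a small made-up CSV held in a bytes literal as a stand-in for requests.get(url).content. The exact exception class and message depend on the installed pandas version, so the sketch catches several candidates rather than a single one.

```python
import pandas as pd

# Stand-in for requests.get(url).content (made-up data, no network access).
raw = b"Country,Region\nAlgeria,AFRICA\n"

error = None
try:
    pd.read_csv(raw)  # bytes are neither a path nor a file-like object
except (TypeError, ValueError, OSError) as exc:
    # The exception class and message vary between pandas versions.
    error = exc

print(type(error).__name__, error)
```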

Resolving the Issue

To resolve this error, we need to pass read_csv something it can read from: a path, a URL, or a file-like object. The bytes returned by requests are none of these, but they can be decoded to a string and wrapped in an in-memory text buffer with io.StringIO. The URL also needs to point at the raw file on raw.githubusercontent.com, because the github.com/.../blob/... address returns the HTML page that displays the file rather than the CSV itself:

<code class="python">import io
import pandas as pd
import requests

url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
s = requests.get(url).content  # raw bytes of the CSV file
c = pd.read_csv(io.StringIO(s.decode("utf-8")))  # decode, then wrap as a text buffer</code>

Decoding the bytes as "utf-8" yields the CSV text, and io.StringIO exposes that text through the file-like interface that read_csv expects, so the data loads successfully.
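As a minimal, network-free sketch of turning downloaded bytes into something read_csv accepts, the example below uses a made-up bytes literal in place of the real response body. It shows two options from the standard library's io module: decoding to text and wrapping in io.StringIO, or wrapping the bytes in io.BytesIO and letting pandas decode them via the encoding argument. The sample rows are assumptions for illustration only.

```python
import io

import pandas as pd

# Stand-in for requests.get(url).content: a small CSV held as raw bytes.
# The rows here are made up for illustration.
raw = b"Country,Region\nAlgeria,AFRICA\nAngola,AFRICA\n"

# Option 1: decode to str and wrap in an in-memory text buffer.
df_text = pd.read_csv(io.StringIO(raw.decode("utf-8")))

# Option 2: wrap the bytes in a binary buffer and let pandas decode them.
df_bytes = pd.read_csv(io.BytesIO(raw), encoding="utf-8")

print(df_text.equals(df_bytes))
```

Either buffer works because both expose a read method, which is what makes an object file-like from the point of view of read_csv.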

Improved Simplicity with Pandas 0.19.2

Pandas 0.19.2 and later versions offer a simpler solution, since read_csv can read directly from a URL:

<code class="python">import pandas as pd

url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c = pd.read_csv(url)</code>

This eliminates the need for additional operations such as retrieving the content and decoding it, making the process more straightforward.
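Note that the failing example requests a github.com/.../blob/... address, which is the HTML page that displays the file, while the working examples use raw.githubusercontent.com, which serves the file itself. The sketch below shows one way to derive the second form from the first. The helper name github_blob_to_raw is hypothetical and not part of pandas or requests, and it assumes the URL follows the owner/repo/blob/branch/path layout seen in this article.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(url: str) -> str:
    """Hypothetical helper: map a github.com 'blob' page URL to its raw-file URL."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<branch>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url  # not a blob page URL; leave it unchanged
    owner, repo, _, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])


blob_url = "https://github.com/cs109/2014_data/blob/master/countries.csv"
raw_url = github_blob_to_raw(blob_url)
print(raw_url)
```

The resulting raw_url can then be passed to pd.read_csv as in the example above, which requires network access.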
