Home > Article > Backend Development > Python's eight data import methods, have you mastered them?
In most cases, NumPy or Pandas will be used to import data, so before starting, execute:
import numpy as np import pandas as pd
Many times you don’t know much about some function methods. At this time, Python provides some help information to quickly use Python objects.
np.info(np.ndarray.dtype)
help(pd.read_csv)
filename = 'demo.txt' file = open(filename, mode='r') # 打开文件进行读取 text = file.read() # 读取文件的内容 print(file.closed) # 检查文件是否关闭 file.close() # 关闭文件 print(text)
Use context manager -- with
with open('demo.txt', 'r') as file: print(file.readline()) # 一行一行读取 print(file.readline()) print(file.readline())
Numpy’s built-in functions process data at the C language level.
Flat file is a file containing records without relative relationship structure. (Excel, CSV and Tab delimiter files are supported)
Files with one data type
The string used to separate values skips the first two lines. Read the type of the resulting array in the first and third columns.
filename = 'mnist.txt' data = np.loadtxt(filename, delimiter=',', skiprows=2, usecols=[0,2], dtype=str)
Two hard requirements:
filename = 'titanic.csv' data = np.genfromtxt(filename, delimiter=',', names=True, dtype=None)##Use Pandas to read Flat files
filename = 'demo.csv' data = pd.read_csv(filename, nrows=5,# 要读取的文件的行数 header=None,# 作为列名的行号 sep='t', # 分隔符使用 comment='#',# 分隔注释的字符 na_values=[""]) # 可以识别为NA/NaN的字符串2. Excel SpreadsheetExcelFile() in Pandas is a very convenient and fast class in pandas for reading excel table files, especially for excel containing multiple sheets. It is very convenient to manipulate files.
file = 'demo.xlsx' data = pd.ExcelFile(file) df_sheet2 = data.parse(sheet_name='1960-1966', skiprows=[0], names=['Country', 'AAM: War(2002)']) df_sheet1 = pd.read_excel(data, sheet_name=0, parse_cols=[0], skiprows=[0], names=['Country'])Use the sheet_names property to get the name of the worksheet to be read.
data.sheet_names3. SAS fileSAS (Statistical Analysis System) is a modular and integrated large-scale application software system. The file it saves, sas, is a statistical analysis file.
from sas7bdat import SAS7BDAT with SAS7BDAT('demo.sas7bdat') as file: df_sas = file.to_data_frame()4. Stata fileStata is a complete and integrated statistical software that provides its users with data analysis, data management and professional chart drawing. The saved file is a Stata file with the .dta extension.
data = pd.read_stata('demo.dta')5. Pickled filesAlmost all data types in python (lists, dictionaries, sets, classes, etc.) can be serialized using pickle. Python's pickle module implements basic data sequencing and deserialization. Through the serialization operation of the pickle module, we can save the object information running in the program to a file and store it permanently; through the deserialization operation of the pickle module, we can create the object saved by the last program from the file.
import pickle with open('pickled_demo.pkl', 'rb') as file: pickled_data = pickle.load(file) # 下载被打开被读取到的数据The corresponding operation is to write the method pickle.dump(). 6. HDF5 fileHDF5 file is a common cross-platform data storage file that can store different types of images and digital data and can be transferred on different types of machines. There are also function libraries that uniformly handle this file format. HDF5 files generally have .h5 or .hdf5 as the suffix, and special software is required to open the content of the preview file.
import h5py filename = 'H-H1_LOSC_4_v1-815411200-4096.hdf5' data = h5py.File(filename, 'r')7. Matlab fileIt is a file with the suffix .mat in which matlab stores the data in its workspace.
import scipy.io filename = 'workspace.mat' mat = scipy.io.loadmat(filename)8. Relational database
from sqlalchemy import create_engine engine = create_engine('sqlite://Northwind.sqlite')Use the table_names() method to obtain a list of table names
table_names = engine.table_names()1. Directly query the relational database
con = engine.connect() rs = con.execute("SELECT * FROM Orders") df = pd.DataFrame(rs.fetchall()) df.columns = rs.keys() con.close()Use the context manager -- with
with engine.connect() as con: rs = con.execute("SELECT OrderID FROM Orders") df = pd.DataFrame(rs.fetchmany(size=5)) df.columns = rs.keys()2. Use Pandas to query the relationship Type database
df = pd.read_sql_query("SELECT * FROM Orders", engine)Data explorationAfter the data is imported, the data will be initially explored, such as checking the data type, data size, length and other basic information. Here is a brief summary. 1, NumPy Arrays
data_array.dtype# 数组元素的数据类型 data_array.shape# 阵列尺寸 len(data_array) # 数组的长度2, Pandas DataFrames
df.head()# 返回DataFrames前几行(默认5行) df.tail()# 返回DataFrames最后几行(默认5行) df.index # 返回DataFrames索引 df.columns # 返回DataFrames列名 df.info()# 返回DataFrames基本信息 data_array = data.values # 将DataFrames转换为NumPy数组
The above is the detailed content of Python's eight data import methods, have you mastered them?. For more information, please follow other related articles on the PHP Chinese website!