A brief analysis of Python data processing
This article summarizes the key points of data processing in Python, for readers who are interested in the topic.
Numpy and Pandas are two libraries commonly used for data processing in Python. Their performance-critical code is implemented in C (Pandas also relies on Cython), so operations on them run fast. Matplotlib is a Python plotting library that turns the processed data into charts. I had only come across their syntax before and had never studied or summarized them systematically, so this blog post summarizes the APIs of these three libraries.
Below is a brief introduction to each of the three libraries and how they differ:
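Numpy: provides n-dimensional arrays; often used to generate data and perform numerical operations on it
Pandas: built on top of Numpy; adds labeled data structures (Series and DataFrame) for working with tabular data
Matplotlib: a powerful plotting library for Python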
Numpy
For a quick start with Numpy, refer to the Numpy tutorial.
Numpy array attributes
ndarray.ndim: number of dimensions (axes)
ndarray.shape: size along each dimension, e.g. (3, 5) for 3 rows and 5 columns
ndarray.size: total number of elements
ndarray.dtype: element type
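A quick sketch of these attributes on a small array. The array built with np.arange(15).reshape(3, 5) is only an illustrative example, not something from the original post.

```python
import numpy as np

# A 3 x 5 array holding the integers 0..14
a = np.arange(15).reshape(3, 5)

print(a.ndim)   # 2 (two axes: rows and columns)
print(a.shape)  # (3, 5)
print(a.size)   # 15 elements in total
print(a.dtype)  # a platform-dependent integer type, e.g. int64
```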
Numpy array creation
array(object, dtype=None): create an array from a Python list or tuple
zeros(shape, dtype=float): create an array filled with 0
ones(shape, dtype=None): create an array filled with 1
empty(shape, dtype=float): create an array without initializing its values
arange([start, ]stop, [step, ]dtype=None): create values with a fixed step in the half-open interval [start, stop)
linspace(start, stop, num=50, dtype=None): create num evenly spaced values between start and stop (stop is included by default)
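The creation functions listed above can be tried side by side. This is a minimal sketch; the shapes and values are arbitrary examples chosen for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # from a nested Python list -> shape (2, 3)
z = np.zeros((2, 3))                   # 2 x 3 array of 0.0 (default dtype float64)
o = np.ones((2, 3), dtype=np.int16)    # 2 x 3 array of 1 with an explicit dtype
e = np.empty((2, 3))                   # uninitialized: contents are arbitrary
r = np.arange(10, 30, 5)               # step 5, stop excluded -> [10 15 20 25]
lin = np.linspace(0, 2, 9)             # 9 values from 0.0 to 2.0, spaced 0.25 apart

print(r)
print(lin)
```

The practical difference between arange and linspace: arange takes a step size and excludes stop, while linspace takes a number of points and includes stop, which avoids floating-point surprises when the step is fractional.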
Numpy operations (all applied element by element)
Add, subtract: a + b, a - b
Multiply: b*2, 10*np.sin(a)
Power: b**2
Comparison: a < b, which returns a boolean array
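A small sketch of these element-wise operations. The arrays a and b and the threshold 35 are illustrative values only.

```python
import numpy as np

a = np.array([20, 30, 40, 50])
b = np.arange(4)             # [0 1 2 3]

print(a + b)                 # [20 31 42 53]  element-wise addition
print(a - b)                 # [20 29 38 47]  element-wise subtraction
print(b * 2)                 # [0 2 4 6]      every element times a scalar
print(b ** 2)                # [0 1 4 9]      element-wise power
print(10 * np.sin(a))        # a universal function applied to each element
print(a < 35)                # [ True  True False False]  boolean mask
```

Note that * on two arrays is also element-wise; the matrix product is written a @ b or np.dot(a, b).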
Pandas
Pandas missing data
Drop rows that contain any missing value: df.dropna(how='any')
Fill missing values (here with 5): df.fillna(value=5)
Check which values are NaN (returns a boolean mask): pd.isna(df1)
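A minimal sketch of the three missing-data calls on a toy DataFrame; the column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# A small illustrative DataFrame with two missing values (NaN)
df1 = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

mask = pd.isna(df1)               # True where a value is missing
dropped = df1.dropna(how="any")   # keep only rows that have no NaN at all
filled = df1.fillna(value=5)      # replace every NaN with 5

print(mask)
print(dropped)   # only row 0 survives
print(filled)
```

All three return new objects and leave df1 unchanged, so assign the result if you want to keep it.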
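Pandas merging data
pd.concat([df1, df2, df3], axis=0): concatenate DataFrames along the row axis
pd.merge(left, right, on='key'): join two DataFrames SQL-style on the key column
df.append(s, ignore_index=True): append rows (deprecated in pandas 1.4 and removed in 2.0; use pd.concat instead)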
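A sketch of concat and merge, plus the pd.concat replacement for the removed DataFrame.append. The DataFrames and the column names key, x, lval and rval are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
df2 = pd.DataFrame({"key": ["c"], "x": [3]})

# Stack rows from several DataFrames (axis=0); ignore_index renumbers 0..n-1
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# SQL-style join: rows are matched on equal values in the "key" column
left = pd.DataFrame({"key": ["a", "b"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["a", "b"], "rval": [10, 20]})
merged = pd.merge(left, right, on="key")

# Appending one row without DataFrame.append (gone in pandas 2.x):
# wrap the row in a one-row DataFrame and concatenate
row = pd.DataFrame([{"key": "d", "x": 4}])
appended = pd.concat([stacked, row], ignore_index=True)

print(stacked)
print(merged)
print(appended)
```

Calling pd.concat once on a list of pieces is also much cheaper than growing a DataFrame row by row inside a loop.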
Pandas import and export
df.to_csv('foo.csv'): save to a CSV file
pd.read_csv('foo.csv'): read from a CSV file
df.to_excel('foo.xlsx', sheet_name='Sheet1'): save to an Excel file
pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA']): read from an Excel file
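A minimal CSV round trip, written to a temporary directory so no stray foo.csv is left behind. CSV is used here because the Excel functions additionally need an engine package such as openpyxl installed; the data is illustrative only.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.csv")
    # index=False: don't write the row index as an extra column;
    # otherwise read_csv would bring it back as "Unnamed: 0"
    df.to_csv(path, index=False)
    restored = pd.read_csv(path)

print(restored)
```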
Matplotlib
Here we only introduce the simplest way to plot:
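import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate 1000 random data points
data = pd.Series(np.random.randn(1000), index=np.arange(1000))
# Accumulate the data (cumulative sum) so the trend is easier to see;
# cumsum() returns a new Series, so the result must be assigned back
data = data.cumsum()
# pandas objects can be plotted directly
data.plot()
plt.show()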
The above is the detailed content of A brief analysis of Python data processing. For more information, please follow other related articles on the PHP Chinese website!