
This article covers data processing and visualization in Python, including a first look at NumPy, the use of the Matplotlib package, and the visual display of data statistics. I hope it will be helpful to everyone.

In-depth understanding of Python data processing and visualization

1. Preliminary use of NumPy

A table is a common way to present data, but a machine cannot work with it directly, so we need to change its form.
The representation commonly used in machine learning is a data matrix.
Looking at the table (its five rows appear in the code below), we find two types of attributes: numeric and Boolean. We now build a model to describe the table:

# Convert the data to a matrix
import numpy as np

data = np.mat([[1,200,105,3,False],[2,165,80,2,False],[3,184.5,120,2,False],
              [4,116,70.8,1,False],[5,270,150,4,True]])
row = 0
for line in data:
    row += 1
print(row)
print(data.size)
print(data)

The import statement brings in NumPy under the alias np. The next statement uses NumPy's mat() function to create the data matrix, and row is a variable introduced to count the rows.
The size attribute is the total number of elements, which is 25 for this 5*5 table. Printing data directly displays the matrix itself.
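As a hedged aside, the same table can also be held in a regular `np.array`, which the NumPy documentation recommends over the matrix class. The sketch below assumes only the five rows shown above. It reads the row and column counts from `shape` instead of counting in a loop, and it shows that the Boolean column ends up stored as floats, because a NumPy array has a single dtype.

```python
import numpy as np

# Same five rows as above, stored as a regular 2-D array instead of np.mat.
data = np.array([[1, 200, 105, 3, False],
                 [2, 165, 80, 2, False],
                 [3, 184.5, 120, 2, False],
                 [4, 116, 70.8, 1, False],
                 [5, 270, 150, 4, True]])

n_rows, n_cols = data.shape   # number of rows and number of columns
print(n_rows, n_cols)         # 5 5
print(data.size)              # 25, the total number of elements
print(data.dtype)             # float64: True/False were converted to 1.0/0.0
print(data[:, 1])             # the second column (housing prices)
```

With a regular array, `data[:, 1]` returns the whole price column in one step, so no loop is needed to collect it.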

2. Use of Matplotlib package – graphical data processing

Let's look at the table above again. The second column holds the housing prices. The differences between them are hard to see from the numbers alone, so we plot them (the usual way to study numerical differences and anomalies is to plot the distribution of the data):

import numpy as np
import scipy.stats as stats
import pylab

data = np.mat([[1,200,105,3,False],[2,165,80,2,False],[3,184.5,120,2,False],
              [4,116,70.8,1,False],[5,270,150,4,True]])
coll = []
for row in data:
    coll.append(row[0,1])
stats.probplot(coll, plot=pylab)
pylab.show()

Running this code produces a probability plot, in which the differences between the prices are easy to see.

A coordinate chart shows the specific values of the data along its rows and columns.
We can of course display the data as a coordinate chart as well.

3. Deep learning theoretical method – similarity calculation (can be skipped)

There are many ways to calculate similarity. We choose the two most commonly used: similarity based on Euclidean distance and cosine similarity.

1. Similarity calculation based on Euclidean distance

Euclidean distance is the straight-line distance between two points in space. The formula is familiar even if the name is not. For two points in three-dimensional space:
$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$
Now let's look at a practical application.
Take a table of the ratings that three users gave to a set of items. Let d12 denote the Euclidean distance between the ratings of user 1 and user 2, and d13 the distance between the ratings of user 1 and user 3, both computed with the formula above.
In that table d12 is smaller than d13, so user 2 is more similar to user 1 than user 3 is (the smaller the distance, the greater the similarity).
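The comparison of d12 and d13 can be sketched in a few lines. The ratings below are hypothetical values made up for illustration and are not the ones from the article's table. `euclidean_distance` is a helper name introduced here.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length rating vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical ratings of three items by three users (illustration only).
user1 = [1, 1, 2]
user2 = [2, 1, 3]
user3 = [5, 4, 1]

d12 = euclidean_distance(user1, user2)
d13 = euclidean_distance(user1, user3)
print(d12, d13)

# The smaller distance marks the more similar user.
print("user 2 is closer to user 1" if d12 < d13 else "user 3 is closer to user 1")
```

The same function works for any number of rated items, since it sums the squared differences over every coordinate.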

2. Similarity calculation based on cosine angle

Cosine similarity starts from the angle between two rating vectors rather than the distance between them:
$\cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|}$
For the same ratings, the angle between user 1 and user 2 is smaller than the angle between user 1 and user 3, so user 2 is again more similar to user 1 than user 3 is (the more similar two targets are, the smaller the angle between their vectors).
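A minimal sketch of the cosine calculation follows. It uses hypothetical ratings chosen for illustration, not the article's table, and `cosine_similarity` is a helper name introduced here. A cosine closer to 1 means a smaller angle and therefore a higher similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical ratings of three items by three users (illustration only).
user1 = [1, 1, 2]
user2 = [2, 1, 3]
user3 = [5, 4, 1]

cos12 = cosine_similarity(user1, user2)
cos13 = cosine_similarity(user1, user3)

# Convert each cosine to an angle in degrees; clamp to [-1, 1] to guard
# against tiny floating-point overshoot before calling acos.
angle12 = math.degrees(math.acos(max(-1.0, min(1.0, cos12))))
angle13 = math.degrees(math.acos(max(-1.0, min(1.0, cos13))))
print(cos12, angle12)
print(cos13, angle13)
```

Unlike Euclidean distance, the cosine ignores vector length: a user who rates everything twice as high as another has a cosine of 1 with that user.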

4. Visual display of data statistics (taking precipitation in Bozhou City as an example)

Quartiles of data

Quartiles are a kind of quantile in statistics: the data are sorted from smallest to largest and divided into four equal parts, and the values at the three dividing points are the quartiles.
First quartile (Q1), also called the lower quartile;
Second quartile (Q2), also called the median;
Third quartile (Q3), also called the upper quartile;

The difference between the third quartile and the first quartile is called the interquartile range (IQR).

If n is the number of items, then:
Position of Q1 = (n+1)*0.25
Position of Q2 = (n+1)*0.50
Position of Q3 = (n+1)*0.75
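The position formulas above can be turned into a small calculation. The sketch below makes two assumptions that the text does not state: positions are 1-based in the sorted data, and a fractional position is resolved by linear interpolation between the two neighbouring values. The data values are made up for illustration, and the function names are introduced here.

```python
def quartile_positions(n):
    """1-based positions of Q1, Q2 and Q3 among n sorted items."""
    return tuple((n + 1) * p for p in (0.25, 0.50, 0.75))

def value_at(sorted_data, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    n = len(sorted_data)
    position = min(max(position, 1), n)   # keep the position inside the data
    lower = int(position)                 # item just below the position
    frac = position - lower               # how far to move toward the next item
    upper = min(lower + 1, n)
    low_val, high_val = sorted_data[lower - 1], sorted_data[upper - 1]
    return low_val + frac * (high_val - low_val)

# Hypothetical values, for illustration only.
data = sorted([36, 7, 40, 15, 41, 39])

positions = quartile_positions(len(data))
q1, q2, q3 = (value_at(data, p) for p in positions)
iqr = q3 - q1

print(positions)    # (1.75, 3.5, 5.25)
print(q1, q2, q3)   # 13.0 37.5 40.25
print(iqr)          # 27.25
```

Other conventions for locating quartiles exist, so a library may return slightly different values for the same data depending on the method it uses.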

Quartile example:
If you need the rain.csv file used here, message me for it. It holds the monthly precipitation of Bozhou City from 2010 to 2019.

from pylab import *
import pandas as pd
import matplotlib.pyplot as plot

filepath = ("C:\\Users\\AWAITXM\\Desktop\\rain.csv")  # "C:\Users\AWAITXM\Desktop\rain.csv"
dataFile = pd.read_csv(filepath)

# Summary statistics for every column (count, mean, std, min, quartiles, max)
summary = dataFile.describe()
print(summary)

# Draw one box per column of the file
array = dataFile.iloc[:,:].values
boxplot(array)
plot.xlabel("year")
plot.ylabel("rain")
show()

The plot output (a box plot) and the pandas summary printed by describe() show the range over which the data fluctuate quite clearly.
Precipitation differs greatly between months: August has the most, while January to April and October to December have the least.

So how can the month-to-month rise and fall of precipitation be compared?

from pylab import *
import pandas as pd
import matplotlib.pyplot as plot

filepath = ("C:\\Users\\AWAITXM\\Desktop\\rain.csv")  # "C:\Users\AWAITXM\Desktop\rain.csv"
dataFile = pd.read_csv(filepath)
summary = dataFile.describe()
minRings = -1
maxRings = 99
nrows = 11
for i in range(nrows):
    # Columns 1 to 12 of row i
    dataRow = dataFile.iloc[i,1:13]
    # Scale the value in column 12 with the fixed bounds above to pick a line color
    labelColor = ( (dataFile.iloc[i,12] - minRings ) / (maxRings - minRings) )
    dataRow.plot(color = plot.cm.RdYlBu(labelColor),alpha = 0.5)
plot.xlabel("Attribute")
plot.ylabel(("Score"))
show()

The resulting figure shows that precipitation does not rise or fall in a regular pattern from month to month.

So is the precipitation of different months correlated?

from pylab import *
import pandas as pd
import matplotlib.pyplot as plot

filepath = ("C:\\Users\\AWAITXM\\Desktop\\rain.csv")  # "C:\Users\AWAITXM\Desktop\rain.csv"
dataFile = pd.read_csv(filepath)
summary = dataFile.describe()
corMat = pd.DataFrame(dataFile.iloc[1:20,1:20].corr())
plot.pcolor(corMat)
plot.show()

In the resulting figure the colors are distributed very evenly, which indicates that there is not much correlation, so the precipitation of each month can be regarded as independent.
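To make the correlation matrix less of a black box: by default, pandas `DataFrame.corr()` computes the Pearson correlation coefficient for each pair of columns. The sketch below computes that coefficient for a single pair by hand. The three series are hypothetical values chosen for illustration, not data from rain.csv, and `pearson` is a helper name introduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical series, for illustration only.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]   # rises in step with a
c = [5, 1, 4, 2, 3]    # no clear pattern relative to a

r_ab = pearson(a, b)
r_ac = pearson(a, c)
print(r_ab)   # close to 1: strong positive correlation
print(r_ac)   # close to -0.3: weak negative correlation
```

A coefficient near 0 means no linear relationship. That is weaker than full independence, so the heat map is best read as evidence that the months are not linearly related.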

That's all for today. See you next time! I hope this article is helpful to you as well.


Statement
This article is reproduced from CSDN. If there is any infringement, please contact admin@php.cn to have it removed.