


Tips for PCA (principal component analysis) dimensionality reduction in Python
PCA (Principal Component Analysis) is a widely used dimensionality reduction technique. Applying PCA to a dataset can reveal its intrinsic structure and provide a more compact, effective representation for subsequent analysis and modeling.
Below we introduce some techniques for using PCA in Python.
- How to normalize data
Before performing PCA, you first need to standardize the data. This is because PCA finds the directions of maximum variance, so features measured on large scales would dominate the principal components regardless of how informative they are; standardization puts every feature on the same footing.
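A small illustration of why this matters, using synthetic data (not from the article): when one feature has a much larger scale, PCA without standardization attributes almost all the variance to that single feature, while after standardization both features contribute comparably.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Two uncorrelated features: one in [0, 1], the other in [0, 1000]
data = np.column_stack([rng.random(200), rng.random(200) * 1000])

# Without scaling, the large-scale feature dominates the first component
ratio_raw = PCA(n_components=2).fit(data).explained_variance_ratio_

# After standardization, the variance is split roughly evenly
data_std = StandardScaler().fit_transform(data)
ratio_std = PCA(n_components=2).fit(data_std).explained_variance_ratio_

print(ratio_raw)  # first component explains nearly all the variance
print(ratio_std)  # both components explain a comparable share
```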
Python offers many ways to normalize data. The most common is the StandardScaler class from the sklearn library, which standardizes each feature to a mean of 0 and a variance of 1:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
data_std = scaler.fit_transform(data)
```

data_std now holds the standardized dataset.
- Using PCA for dimensionality reduction
Reducing the dimensionality of data with PCA takes only a few lines, since a PCA class is already integrated in the sklearn library. We only need to set the number of principal components to keep when instantiating it. For example, the following code reduces the data to 2 principal components:

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
data_pca = pca.fit_transform(data_std)
```

data_pca holds the new data after PCA dimensionality reduction.
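As a quick sanity check on what the transform does, the reduced data can also be mapped back to the original feature space with PCA's inverse_transform; the reconstruction differs from the input by exactly the variance that was discarded. A minimal sketch with hypothetical random data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))           # hypothetical 4-feature dataset
data_std = StandardScaler().fit_transform(data)

pca = PCA(n_components=2)
data_pca = pca.fit_transform(data_std)     # reduced representation

# Map the 2-D representation back to the original 4-D feature space
data_back = pca.inverse_transform(data_pca)

print(data_pca.shape)   # (100, 2)
print(data_back.shape)  # (100, 4)
```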
- How to choose the number of principal components after dimensionality reduction
When using PCA in practice, we need to choose an appropriate number of principal components: too few loses information, too many defeats the purpose of dimensionality reduction. A common way to judge is to plot the cumulative explained variance ratio.
The cumulative explained variance ratio is the fraction of the total variance captured by the first n principal components. For example:

```python
import numpy as np
from sklearn.decomposition import PCA

pca = PCA()            # keep all components
pca.fit(data_std)
cum_var_exp = np.cumsum(pca.explained_variance_ratio_)
```
By plotting the cumulative explained variance ratio against the number of components, we can see how much variance is captured as components are added from 1 upward and pick a reasonable cutoff. The code is as follows:

```python
import matplotlib.pyplot as plt

n = len(cum_var_exp)
plt.bar(range(1, n + 1), pca.explained_variance_ratio_,
        alpha=0.5, align='center', label='Individual')
plt.step(range(1, n + 1), cum_var_exp, where='mid', label='Cumulative')
plt.ylabel('Explained variance ratio')
plt.xlabel('Principal components')
plt.legend()
plt.show()
```
In the figure, the step line represents the cumulative explained variance ratio, the x-axis the number of principal components, and the y-axis the fraction of variance explained. If the cumulative ratio of the first two components is already close to 1, keeping two components is enough for most analysis tasks.
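As an alternative to reading the plot by eye, scikit-learn's PCA also accepts a float between 0 and 1 for n_components; it then keeps the smallest number of components whose cumulative explained variance reaches that threshold. A small sketch with synthetic data (5 features, two of which are exact copies, so 3 components capture everything):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(300, 3))
# Hypothetical dataset: 3 independent features plus duplicates of two of them
data_std = np.column_stack([base, base[:, 0], base[:, 1]])

# Keep as many components as needed to explain 95% of the variance
pca = PCA(n_components=0.95)
pca.fit(data_std)

print(pca.n_components_)                          # components actually kept
print(np.cumsum(pca.explained_variance_ratio_))   # cumulative ratios
```

Because the duplicated features add no new variance directions, the first two components cover only about 80% of the variance and the third is needed to pass the 95% threshold.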
- How to visualize the data after PCA dimensionality reduction
Finally, we can use matplotlib's scatter function to visualize the data after PCA dimensionality reduction. For example, the following code assumes the data has been reduced from the original 4 dimensions to 2 through PCA and that labels is a per-sample array of class labels (e.g. digits 0-9), and plots each class in its own color:

```python
import matplotlib.pyplot as plt
import numpy as np

x = data_pca[:, 0]
y = data_pca[:, 1]
labels = np.asarray(labels)  # one class label per sample
colors = ['b', 'g', 'r', 'c', 'm', 'y', 'k', 'pink', 'brown', 'orange']

for i, label in enumerate(np.unique(labels)):
    mask = labels == label
    plt.scatter(x[mask], y[mask], c=colors[i], label=str(label), alpha=0.7)

plt.legend()
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()
```
The colors and legend entries correspond to the class labels in the original data. Visualizing the reduced data in this way helps us better understand its structure and characteristics.
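Putting the steps above together, here is a self-contained sketch of the whole pipeline on scikit-learn's built-in Iris dataset (chosen for illustration because it has exactly 4 features; the article does not specify its own dataset):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

iris = load_iris()
data, target = iris.data, iris.target        # 150 samples, 4 features, 3 classes

# 1. Standardize, 2. reduce to 2 principal components
data_std = StandardScaler().fit_transform(data)
pca = PCA(n_components=2)
data_pca = pca.fit_transform(data_std)

# 3. Scatter plot, one color per class
for label, color in zip(np.unique(target), ["b", "g", "r"]):
    mask = target == label
    plt.scatter(data_pca[mask, 0], data_pca[mask, 1],
                c=color, label=iris.target_names[label], alpha=0.7)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.legend()
plt.savefig("iris_pca.png")

print(data_pca.shape)                        # (150, 2)
print(pca.explained_variance_ratio_.sum())   # roughly 0.96 for Iris
```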
In short, PCA lets us reduce the dimensionality of data and thereby better understand its structure and characteristics. With Python's sklearn and matplotlib libraries, we can apply and visualize PCA very conveniently.

