
How to write PCA principal component analysis algorithm in Python?

WBOY
2023-09-20 10:34:46


PCA (Principal Component Analysis) is a widely used unsupervised learning algorithm that reduces the dimensionality of data while retaining as much of its variance as possible, which makes the data easier to understand and analyze. In this article, we write PCA in Python with NumPy and walk through a concrete code example.

The steps of PCA are as follows:

  1. Standardize the data: Subtract the mean of each feature and divide by its standard deviation, so that every feature has zero mean and unit variance and contributes equally to the result.
  2. Calculate the covariance matrix: The covariance matrix measures how pairs of features vary together. Compute it from the standardized data.
  3. Calculate the eigenvalues and eigenvectors: Performing an eigenvalue decomposition on the covariance matrix yields the eigenvalues and their corresponding eigenvectors.
  4. Select the principal components: Sort the eigenvalues in descending order and keep the eigenvectors of the covariance matrix that correspond to the k largest ones. These eigenvectors are the principal components.
  5. Transform the data: Project the standardized data onto the selected principal components to obtain the new low-dimensional representation.
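
The five steps above can be written compactly as follows. This is a sketch under the usual conventions; the symbols ($n$ samples, $d$ features, $\mu_j$ and $\sigma_j$ for the mean and standard deviation of feature $j$, and the matrices $Z$, $C$, $W_k$, $Y$) are notation introduced here for illustration and do not appear in the original article.

```latex
\begin{aligned}
z_{ij} &= \frac{x_{ij} - \mu_j}{\sigma_j}
  && \text{(1) standardize feature } j \\
C &= \frac{1}{n-1}\, Z^{\top} Z
  && \text{(2) covariance matrix of the standardized data } Z \\
C\, v_i &= \lambda_i v_i, \quad \lambda_1 \ge \dots \ge \lambda_d \ge 0
  && \text{(3) eigendecomposition} \\
W_k &= [\, v_1 \;\cdots\; v_k \,]
  && \text{(4) keep the eigenvectors of the } k \text{ largest eigenvalues} \\
Y &= Z\, W_k
  && \text{(5) project onto } k \text{ dimensions}
\end{aligned}
```

The factor $1/(n-1)$ matches the default behavior of np.cov, and the formula for $C$ relies on the columns of $Z$ having zero mean after step 1.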

Code example:

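import numpy as np

def pca(X, k):
    # 1. Standardize the data (zero mean, unit variance per feature).
    #    Assumes no feature is constant, otherwise np.std returns 0.
    X_normalized = (X - np.mean(X, axis=0)) / np.std(X, axis=0)

    # 2. Compute the covariance matrix (np.cov expects features in rows, hence .T)
    covariance_matrix = np.cov(X_normalized.T)

    # 3. Compute the eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eig(covariance_matrix)

    # 4. Select the principal components
    eig_indices = np.argsort(eigenvalues)[::-1]  # eigenvalue indices, largest first
    top_k_eig_indices = eig_indices[:k]  # indices of the k largest eigenvalues

    # Eigenvectors are stored as columns, so index along axis 1
    top_k_eigenvectors = eigenvectors[:, top_k_eig_indices]

    # 5. Transform the data by projecting onto the selected eigenvectors
    transformed_data = np.dot(X_normalized, top_k_eigenvectors)

    return transformed_data

# Sample data: 4 samples, 2 features
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# Use PCA to reduce the data to 1 dimension
k = 1
transformed_data = pca(X, k)

print(transformed_data)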

In the above code, we first standardize the data with np.mean and np.std. Then we use np.cov to calculate the covariance matrix. Next, np.linalg.eig performs the eigenvalue decomposition of the covariance matrix, giving the eigenvalues and eigenvectors. We sort the eigenvalues in descending order and select the eigenvectors corresponding to the k largest ones. Finally, we multiply the standardized data by the selected eigenvectors to get the transformed data.
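
Step 4 picks components by eigenvalue size, and a common way to decide how many to keep is to look at the share of total variance each eigenvalue accounts for. The sketch below illustrates that idea; `explained_variance_ratio` is a hypothetical helper name introduced here, not part of the original code, and it uses np.linalg.eigvalsh (a NumPy routine for symmetric matrices) instead of np.linalg.eig.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance carried by each principal component.

    Hypothetical helper for illustration; it repeats steps 1 to 3 of pca().
    """
    X_normalized = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
    covariance_matrix = np.cov(X_normalized.T)
    # eigvalsh assumes a symmetric matrix and returns real eigenvalues
    # in ascending order, so reverse them to get largest first
    eigenvalues = np.linalg.eigvalsh(covariance_matrix)[::-1]
    # Clip tiny negative values caused by floating-point round-off
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    return eigenvalues / eigenvalues.sum()

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
ratios = explained_variance_ratio(X)
print(ratios)
```

For the sample data used in this article the ratios should come out very close to 1 and 0, because the second feature is just the first feature plus one, so a single component carries essentially all of the variance.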

The sample data is a simple 2-dimensional dataset with four samples. We reduce it to 1 dimension and print the transformed data.

Running the above code produces the following output (the overall sign may be flipped, because an eigenvector is only determined up to sign):

[[-1.8973666 ]
 [-0.63245553]
 [ 0.63245553]
 [ 1.8973666 ]]

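This result shows that the data has been projected onto a 1-dimensional space. Because the two features in this example are perfectly correlated, the first principal component retains all of the variance.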
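
One way to sanity-check a hand-written PCA is to compare it against a second route to the same projection. The sketch below is an illustration rather than part of the original article: it recomputes the 1-dimensional projection using np.linalg.eigh and again using the singular value decomposition (np.linalg.svd) of the standardized data, then checks that the two agree up to sign and that the variance along the first component equals the largest eigenvalue. All variable names are chosen here for the example.

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized data, as in pca()

# Route 1: eigendecomposition of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Z.T))
top = eigenvectors[:, np.argmax(eigenvalues)]  # eigenvector of the largest eigenvalue
projected_eig = Z @ top

# Route 2: singular value decomposition of the standardized data;
# the rows of Vt are the principal directions, strongest first
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
projected_svd = Z @ Vt[0]

# Eigenvectors are only defined up to sign, so compare absolute values
same_up_to_sign = np.allclose(np.abs(projected_eig), np.abs(projected_svd))
# The sample variance along the first component equals the largest eigenvalue
variance_matches = np.isclose(projected_eig.var(ddof=1), eigenvalues.max())
print(same_up_to_sign, variance_matches)
```

Both checks should print True for this dataset. The `ddof=1` argument matters: np.cov divides by n - 1 by default, so the variance of the projection has to use the same divisor to match the eigenvalue.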

Through this example, you have seen how to write the PCA principal component analysis algorithm in Python using NumPy functions such as np.mean, np.std, np.cov, and np.linalg.eig. I hope this article helps you better understand the principles and implementation of PCA and apply it in your own data analysis and machine learning tasks.

