


How to convert Scikit-learn's IRIS dataset into a dataset with only two features in Python?
Iris, a multivariate flower dataset, is one of the most widely used datasets bundled with scikit-learn. It contains 150 instances, 50 for each of three iris species (Iris setosa, Iris versicolor and Iris virginica), and describes each flower with four measurements of the sepal and petal: sepal length (cm), sepal width (cm), petal length (cm) and petal width (cm).
We can use Principal Component Analysis (PCA) to transform the IRIS dataset into a new feature space with 2 features.
Steps
We can convert the IRIS dataset into a 2-feature dataset using PCA in Python by following the steps given below -
Step 1 - Import the necessary packages from scikit-learn. We need the datasets and decomposition modules.
Step 2 - Load the IRIS dataset.
Step 3 - Print details about the dataset.
Step 4 - Initialize Principal Component Analysis (PCA) with two components and fit it to the data with the fit() function.
Step 5 - Transform the dataset into the new 2-feature space.
Example
In the example below, we will transform the scikit-learn IRIS plant dataset into 2 features via PCA using the above steps.
# Importing the necessary packages
from sklearn import datasets
from sklearn import decomposition

# Load iris plant dataset
iris = datasets.load_iris()

# Print details about the dataset
print('Features names : '+str(iris.feature_names))
print('\n')
print('Features size : '+str(iris.data.shape))
print('\n')
print('Target names : '+str(iris.target_names))
print('\n')

X_iris, Y_iris = iris.data, iris.target

# Initialize PCA and fit the data
pca_2 = decomposition.PCA(n_components=2)
pca_2.fit(X_iris)

# Transforming iris data to new dimensions (with 2 features)
X_iris_pca2 = pca_2.transform(X_iris)

# Printing new dataset
print('New Dataset size after transformations: ', X_iris_pca2.shape)
Output
It will produce the following output -
Features names : ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Features size : (150, 4)
Target names : ['setosa' 'versicolor' 'virginica']
New Dataset size after transformations: (150, 2)
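A common next step is to visualize the two new components to see how well they separate the three species. The snippet below is a minimal sketch, assuming matplotlib is installed and that X_iris_pca2 and Y_iris from the example above are still in scope; it is not part of the original tutorial.
# Minimal visualization sketch (assumes matplotlib is available)
import matplotlib.pyplot as plt

# Scatter the two principal components, coloured by species label
plt.scatter(X_iris_pca2[:, 0], X_iris_pca2[:, 1], c=Y_iris)
plt.xlabel('First principal component')
plt.ylabel('Second principal component')
plt.title('IRIS data projected onto 2 PCA components')
plt.show()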
How to convert the Iris dataset into a 3-feature dataset?
We can transform the Iris dataset into a new feature space with 3 features using the same statistical method, Principal Component Analysis (PCA). PCA linearly projects the data onto a new set of axes chosen by analyzing the variance of the original features.
The main idea behind PCA is to find the directions along which the data varies most and to build new features from them. This gives us a dataset with fewer features that still retains most of the information (variance) of the original dataset.
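The examples in this article apply PCA directly to the raw measurements, which works here because all four Iris features are in centimetres. When features are on different scales, it is common to standardize them first. The snippet below is a small, optional sketch (not part of the original tutorial) that chains StandardScaler and PCA with scikit-learn's Pipeline; the variable names are illustrative.
# Optional sketch: standardize the features before PCA (assumption, not the tutorial's method)
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

# Standardize each feature, then project onto 3 principal components
pca_pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=3)),
])
X_iris_scaled_pca = pca_pipeline.fit_transform(iris.data)
print('Shape after scaling + PCA :', X_iris_scaled_pca.shape)  # (150, 3)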
Example
In the example below, we will use PCA to transform the scikit-learn Iris plant dataset (initialized with 3 components).
# Importing the necessary packages
from sklearn import datasets
from sklearn import decomposition

# Load iris plant dataset
iris = datasets.load_iris()

# Print details about the dataset
print('Features names : '+str(iris.feature_names))
print('\n')
print('Features size : '+str(iris.data.shape))
print('\n')
print('Target names : '+str(iris.target_names))
print('\n')
print('Target size : '+str(iris.target.shape))

X_iris, Y_iris = iris.data, iris.target

# Initialize PCA and fit the data
pca_3 = decomposition.PCA(n_components=3)
pca_3.fit(X_iris)

# Transforming iris data to new dimensions (with 3 features)
X_iris_pca3 = pca_3.transform(X_iris)

# Printing new dataset
print('New Dataset size after transformations : ', X_iris_pca3.shape)
print('\n')

# Getting the direction of maximum variance in data
print("Components : ", pca_3.components_)
print('\n')

# Getting the amount of variance explained by each component
print("Explained Variance:", pca_3.explained_variance_)
print('\n')

# Getting the percentage of variance explained by each component
print("Explained Variance Ratio:", pca_3.explained_variance_ratio_)
print('\n')

# Getting the singular values for each component
print("Singular Values :", pca_3.singular_values_)
print('\n')

# Getting estimated noise covariance
print("Noise Variance :", pca_3.noise_variance_)
Output
It will produce the following output -
Features names : ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Features size : (150, 4)
Target names : ['setosa' 'versicolor' 'virginica']
Target size : (150,)
New Dataset size after transformations : (150, 3)
Components : [[ 0.36138659 -0.08452251  0.85667061  0.3582892 ]
 [ 0.65658877  0.73016143 -0.17337266 -0.07548102]
 [-0.58202985  0.59791083  0.07623608  0.54583143]]
Explained Variance: [4.22824171 0.24267075 0.0782095 ]
Explained Variance Ratio: [0.92461872 0.05306648 0.01710261]
Singular Values : [25.09996044  6.01314738  3.41368064]
Noise Variance : 0.02383509297344944
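To check how much of the original information the 3-component projection keeps, you can sum the explained variance ratios printed above, or map the reduced data back to the original space with inverse_transform() and measure the reconstruction error. The snippet below is a short sketch that assumes pca_3, X_iris and X_iris_pca3 from the example above are still in scope and uses NumPy, which scikit-learn already depends on.
import numpy as np

# Fraction of the original variance retained by the 3 components
print('Variance retained :', pca_3.explained_variance_ratio_.sum())  # about 0.99 for this data

# Map the 3-feature data back into the original 4-feature space
X_iris_reconstructed = pca_3.inverse_transform(X_iris_pca3)

# Mean squared reconstruction error (a small value means little information was lost)
print('Reconstruction MSE :', np.mean((X_iris - X_iris_reconstructed) ** 2))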