How to convert Scikit-learn's IRIS dataset into a dataset with only two features in Python?-Python Tutorial-php.cn

Home

Backend Development

Python Tutorial

How to convert Scikit-learn's IRIS dataset into a dataset with only two features in Python?

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Aug 30, 2023 pm 09:49 PM

data setConvertfeature

How to convert Scikit-learns IRIS dataset into a dataset with only two features in Python?

Iris, a multivariate flower dataset, is one of the most useful Python scikit-learn datasets. It is divided into 3 categories of 50 instances each and contains measurements of the sepal and petal parts of three species of iris (Iris mountaina, Iris virginia and Iris variegated). Apart from this, the Iris dataset contains 50 instances of each of the three species and consists of four features, namely sepal_length (cm), sepal_width (cm), petal_length (cm), petal_width (cm).

We can use Principal Component Analysis (PCA) to transform the IRIS dataset into a new feature space with 2 features.

step

We can convert the IRIS dataset into a 2-feature dataset using PCA in Python by following the steps given below -

Step 1 - First, import the necessary packages from scikit-learn. We need to import the dataset and decomposition package.

Steps 2 - Load the IRIS dataset.

Steps 3 - Print detailed information about the dataset.

Steps 4 - Initialize Principal Component Analysis (PCA) and apply the fit() function to fit the data. p>

Step 5 - Convert the dataset into a new dimension, a 2-feature dataset.

Example

In the example below, we will transform the scikit-learn IRIS plant dataset into 2 features via PCA using the above steps.

# Importing the necessary packages
from sklearn import datasets
from sklearn import decomposition

# Load iris plant dataset
iris = datasets.load_iris()

# Print details about the dataset
print('Features names : '+str(iris.feature_names))
print('\n')
print('Features size : '+str(iris.data.shape))
print('\n')
print('Target names : '+str(iris.target_names))
print('\n')
X_iris, Y_iris = iris.data, iris.target

# Initialize PCA and fit the data
pca_2 = decomposition.PCA(n_components=2)
pca_2.fit(X_iris)

# Transforming iris data to new dimensions(with 2 features)
X_iris_pca2 = pca_2.transform(X_iris)

# Printing new dataset
print('New Dataset size after transformations: ', X_iris_pca2.shape)

Output

It will produce the following output -

Features names : ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Features size : (150, 4)

Target names : ['setosa' 'versicolor' 'virginica']

New Dataset size after transformations: (150, 2)

How to convert Iris dataset to 3-feature dataset?

We can transform the Iris dataset into a new feature space with 3 features using a statistical method called Principal Component Analysis (PCA). PCA essentially linearly projects the data into a new feature space by analyzing the features of the original data set.

The main concept behind PCA is to select the "main" features of the data and build features based on them. It will give us a new dataset that is smaller in size but has the same information as the original dataset.

Example

In the example below, we will use PCA to transform the scikit-learn Iris plant dataset (initialized with 3 components).

# Importing the necessary packages
from sklearn import datasets
from sklearn import decomposition

# Load iris plant dataset
iris = datasets.load_iris()

# Print details about the dataset
print('Features names : '+str(iris.feature_names))
print('\n')
print('Features size : '+str(iris.data.shape))
print('\n')
print('Target names : '+str(iris.target_names))
print('\n')
print('Target size : '+str(iris.target.shape))
X_iris, Y_iris = iris.data, iris.target

# Initialize PCA and fit the data
pca_3 = decomposition.PCA(n_components=3)
pca_3.fit(X_iris)

# Transforming iris data to new dimensions(with 2 features)
X_iris_pca3 = pca_3.transform(X_iris)

# Printing new dataset
print('New Dataset size after transformations : ', X_iris_pca3.shape)
print('\n')

# Getting the direction of maximum variance in data
print("Components : ", pca_3.components_)
print('\n')

# Getting the amount of variance explained by each component
print("Explained Variance:",pca_3.explained_variance_)
print('\n')

# Getting the percentage of variance explained by each component
print("Explained Variance Ratio:",pca_3.explained_variance_ratio_)
print('\n')

# Getting the singular values for each component
print("Singular Values :",pca_3.singular_values_)
print('\n')

# Getting estimated noise covariance
print("Noise Variance :",pca_3.noise_variance_)

Output

It will produce the following output -

Features names : ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Features size : (150, 4)

Target names : ['setosa' 'versicolor' 'virginica']

Target size : (150,)
New Dataset size after transformations : (150, 3)

Components : [[ 0.36138659 -0.08452251 0.85667061 0.3582892 ]
[ 0.65658877 0.73016143 -0.17337266 -0.07548102]
[-0.58202985 0.59791083 0.07623608 0.54583143]]

Explained Variance: [4.22824171 0.24267075 0.0782095 ]

Explained Variance Ratio: [0.92461872 0.05306648 0.01710261]

Singular Values : [25.09996044 6.01314738 3.41368064]

Noise Variance : 0.02383509297344944

The above is the detailed content of How to convert Scikit-learn's IRIS dataset into a dataset with only two features in Python?. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:tutorialspoint. If there is any infringement, please contact admin@php.cn delete

Python and Time: Making the Most of Your Study TimeApr 14, 2025 am 12:02 AM

To maximize the efficiency of learning Python in a limited time, you can use Python's datetime, time, and schedule modules. 1. The datetime module is used to record and plan learning time. 2. The time module helps to set study and rest time. 3. The schedule module automatically arranges weekly learning tasks.

Python: Games, GUIs, and MoreApr 13, 2025 am 12:14 AM

Python excels in gaming and GUI development. 1) Game development uses Pygame, providing drawing, audio and other functions, which are suitable for creating 2D games. 2) GUI development can choose Tkinter or PyQt. Tkinter is simple and easy to use, PyQt has rich functions and is suitable for professional development.

Python vs. C : Applications and Use Cases ComparedApr 12, 2025 am 12:01 AM

Python is suitable for data science, web development and automation tasks, while C is suitable for system programming, game development and embedded systems. Python is known for its simplicity and powerful ecosystem, while C is known for its high performance and underlying control capabilities.

The 2-Hour Python Plan: A Realistic ApproachApr 11, 2025 am 12:04 AM

You can learn basic programming concepts and skills of Python within 2 hours. 1. Learn variables and data types, 2. Master control flow (conditional statements and loops), 3. Understand the definition and use of functions, 4. Quickly get started with Python programming through simple examples and code snippets.

Python: Exploring Its Primary ApplicationsApr 10, 2025 am 09:41 AM

Python is widely used in the fields of web development, data science, machine learning, automation and scripting. 1) In web development, Django and Flask frameworks simplify the development process. 2) In the fields of data science and machine learning, NumPy, Pandas, Scikit-learn and TensorFlow libraries provide strong support. 3) In terms of automation and scripting, Python is suitable for tasks such as automated testing and system management.

How Much Python Can You Learn in 2 Hours?Apr 09, 2025 pm 04:33 PM

You can learn the basics of Python within two hours. 1. Learn variables and data types, 2. Master control structures such as if statements and loops, 3. Understand the definition and use of functions. These will help you start writing simple Python programs.

How to teach computer novice programming basics in project and problem-driven methods within 10 hours?Apr 02, 2025 am 07:18 AM

How to teach computer novice programming basics within 10 hours? If you only have 10 hours to teach computer novice some programming knowledge, what would you choose to teach...

How to avoid being detected by the browser when using Fiddler Everywhere for man-in-the-middle reading?Apr 02, 2025 am 07:15 AM

How to avoid being detected when using FiddlerEverywhere for man-in-the-middle readings When you use FiddlerEverywhere...

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Best Graphic Settings

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Assassin's Creed Shadows: Seashell Riddle Solution

2 weeks agoByDDD

R.E.P.O. How to Fix Audio if You Can't Hear Anyone

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

WWE 2K25: How To Unlock Everything In MyRise

1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

ZendStudio 13.5.1 Mac

Powerful PHP integrated development environment

Hot Topics

Where is the login entrance for gmail email?

7501

CakePHP Tutorial

1377

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers