


Detailed explanation of the t-SNE algorithm and its Python implementation
t-distributed stochastic neighbor embedding (t-SNE) is an unsupervised machine learning algorithm for visualization. It is a nonlinear dimensionality reduction technique that models the similarity between data points as conditional probabilities and tries to minimize the difference between these probabilities (or similarities) in the high- and low-dimensional spaces, so that the low-dimensional map represents the data points as faithfully as possible.
t-SNE is therefore well suited to embedding high-dimensional data in a two- or three-dimensional space for visualization. Note that t-SNE measures the similarity between two points in the low-dimensional space with a heavy-tailed Student-t distribution rather than a Gaussian distribution, which helps alleviate both the crowding problem and the optimization problems. t-SNE is also relatively robust to outliers.
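To illustrate why a heavy-tailed distribution helps with crowding, the following minimal sketch compares an unnormalized Gaussian kernel with an unnormalized Student-t kernel (one degree of freedom) at a few distances. The function names, the value sigma = 1 and the sample distances are assumptions chosen for illustration only.

```python
import numpy as np


def gaussian_kernel(d, sigma=1.0):
    """Unnormalized Gaussian similarity for a distance d (sigma is an assumed bandwidth)."""
    return np.exp(-d**2 / (2.0 * sigma**2))


def student_t_kernel(d):
    """Unnormalized Student-t similarity with one degree of freedom for a distance d."""
    return 1.0 / (1.0 + d**2)


# Hypothetical distances between two points in the low-dimensional map.
distances = np.array([0.5, 1.0, 2.0, 4.0])

for d, g, t in zip(distances, gaussian_kernel(distances), student_t_kernel(distances)):
    print(f"distance={d:.1f}  gaussian={g:.4f}  student_t={t:.4f}")
```

At larger distances the Student-t value decays far more slowly than the Gaussian value, so moderately dissimilar points can sit further apart in the map without being penalized as heavily.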
t-SNE algorithm steps
1. Compute the pairwise similarities between nearby points in the high-dimensional space.
2. Based on these pairwise similarities, map each point in the high-dimensional space to a point in a low-dimensional map.
3. Use gradient descent on the Kullback-Leibler divergence (KL divergence) to find the low-dimensional representation that minimizes the mismatch between the high- and low-dimensional probability distributions.
4. Use a Student-t distribution to compute the similarity between two points in the low-dimensional space.
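The quantities named in these steps can be sketched in plain NumPy. This is a simplified illustration rather than a full implementation: it assumes a single fixed bandwidth sigma for every point (real t-SNE tunes one bandwidth per point to match a target perplexity), it symmetrizes the conditional probabilities into joint probabilities as the standard formulation does, and it only evaluates the KL divergence instead of running gradient descent. All function names are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff**2, axis=-1)


def high_dim_affinities(X, sigma=1.0):
    """Step 1: Gaussian similarities in the high-dimensional space (fixed sigma is an assumption)."""
    logits = -pairwise_sq_dists(X) / (2.0 * sigma**2)
    np.fill_diagonal(logits, -np.inf)            # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    cond = np.exp(logits)
    cond /= cond.sum(axis=1, keepdims=True)      # row i holds p_{j|i}
    return (cond + cond.T) / (2.0 * X.shape[0])  # symmetrized joint p_{ij}


def low_dim_affinities(Y):
    """Step 4: Student-t similarities (one degree of freedom) in the low-dimensional map."""
    num = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(num, 0.0)
    return num / num.sum()


def kl_divergence(P, Q, eps=1e-12):
    """Step 3: the cost KL(P || Q) that gradient descent would minimize."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))  # 20 hypothetical points in 10 dimensions
Y = rng.normal(size=(20, 2))   # step 2: a random initial 2-D map

P = high_dim_affinities(X)
Q = low_dim_affinities(Y)
print("KL divergence of the random initial map:", kl_divergence(P, Q))
```

An optimizer would then move the rows of Y to lower this value; the scikit-learn TSNE class used later in this article handles that part.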
Python code to implement t-SNE on the MNIST data set
Import modules
# Import the necessary modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler
Read data
# Read the data using pandas
df = pd.read_csv('mnist_train.csv')

# Print the first four rows of df
print(df.head(4))

# Save the labels in the variable labels
labels = df['label']

# Drop the label column and store the pixel data in data
data = df.drop("label", axis=1)
Data preprocessing
# Data preprocessing: standardize the data
standardized_data = StandardScaler().fit_transform(data)
print(standardized_data.shape)
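As a small check of what the standardization step does, the sketch below compares StandardScaler with a manual computation on a tiny made-up array. The array values are invented for illustration; the all-zero first column stands in for pixels that never change (such as image borders), which StandardScaler leaves at zero instead of dividing by zero.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Tiny made-up feature matrix; column 0 is constant, like an always-black pixel.
X = np.array([[0.0, 10.0, 5.0],
              [0.0, 20.0, 5.0],
              [0.0, 30.0, 8.0]])

scaled = StandardScaler().fit_transform(X)

# Manual equivalent: subtract the column mean, divide by the population std.
mean = X.mean(axis=0)
std = X.std(axis=0)
std[std == 0.0] = 1.0  # constant columns are only centered, not rescaled
manual = (X - mean) / std

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0))
```

Each non-constant column ends up with zero mean and unit variance, so no single pixel dominates the distance computations that t-SNE relies on.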
Run t-SNE and plot the result
# t-SNE
# Pick the first 1000 points, since t-SNE takes a long time on 15K points
data_1000 = standardized_data[0:1000, :]
labels_1000 = labels[0:1000]

# Configure the parameters:
# number of components = 2
# perplexity = 30 (default)
# learning rate and maximum number of iterations are left at their defaults
# (200 and 1000 in older scikit-learn releases)
model = TSNE(n_components=2, random_state=0)

tsne_data = model.fit_transform(data_1000)

# Create a new data frame that helps us plot the result
tsne_data = np.vstack((tsne_data.T, labels_1000)).T
tsne_df = pd.DataFrame(data=tsne_data, columns=("Dim_1", "Dim_2", "label"))

# Plot the t-SNE result
# (seaborn 0.9 and later use height; older releases used size)
sn.FacetGrid(tsne_df, hue="label", height=6).map(
    plt.scatter, 'Dim_1', 'Dim_2').add_legend()
plt.show()
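For readers who do not have mnist_train.csv at hand, the following sketch runs the same pipeline on the small 8x8 digits dataset bundled with scikit-learn. The dataset, the 300-sample subset and the explicit perplexity value are substitutions made here for a quick, self-contained run; they are not part of the walkthrough above.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Stand-in for MNIST: 8x8 digit images shipped with scikit-learn (no download needed).
digits = load_digits()
X = StandardScaler().fit_transform(digits.data[:300])
y = digits.target[:300]

# Same configuration as above, with the perplexity written out explicitly.
model = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = model.fit_transform(X)

print(embedding.shape)        # (300, 2)
print(model.kl_divergence_)   # final value of the KL divergence cost
```

The embedding and y arrays can be stacked into a data frame and plotted with the same FacetGrid code used for MNIST. The kl_divergence_ attribute reports the cost that gradient descent ended on, which can be a rough way to compare runs with different settings.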