
A Primer on Machine Learning with Python

Christopher Nolan | Original: 2025-02-10 15:54:09

Over the past decade, machine learning has moved from scientific research labs into everyday web and mobile applications. Machine learning enables your applications to perform tasks that were previously difficult to program, such as detecting objects and faces in images, detecting spam and hate speech, and generating smart replies for email and messaging applications.

However, machine learning is fundamentally different from classical programming. In this article, you will learn the basics of machine learning and create a simple model that predicts flower species from flower measurements.

Key Points

Machine learning has moved from scientific research labs into everyday web and mobile applications, enabling them to perform tasks that were previously difficult to program.
Machine learning relies on experience: instead of providing rules to a model, you train it with examples. Machine learning algorithms fall into different categories, each suited to specific problems: supervised learning, unsupervised learning, and reinforcement learning.
Python has become a popular language for machine learning thanks to its simplicity, readability and broad ecosystem of libraries and frameworks such as Scikit-learn, TensorFlow, and PyTorch. A basic understanding of Python programming, of libraries such as NumPy, Pandas, and Matplotlib, and of statistics and probability is a prerequisite.
Implementing a machine learning model involves defining the problem, collecting data, splitting the dataset into training and test sets, building the model, and evaluating its performance. Techniques such as cross-validation and train/test splitting, along with metrics such as accuracy, precision, recall and F1 score, can be used to verify the model's performance.

How does machine learning work?

Classic programming relies on well-defined problems that can be broken down into classes, functions, and if-else statements. Machine learning, on the other hand, develops its behavior from experience. Instead of providing rules to machine learning models, you train them with examples.
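To make the contrast concrete, here is a minimal sketch using the Iris data introduced later in this article. A hand-written rule hard-codes a petal-length cutoff for spotting setosa flowers, while a one-level DecisionTreeClassifier (our choice of model for this illustration) finds a cutoff on its own from labeled examples. The 2.5 cm value in the rule is an assumption picked for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
petal_length = iris.data[:, [2]]            # petal length column, kept 2D for scikit-learn
is_setosa = (iris.target == 0).astype(int)  # 1 for setosa, 0 for the other species


# Classic programming: a person writes the rule (2.5 cm is an illustrative cutoff)
def rule_based(length_cm):
    return 1 if length_cm < 2.5 else 0


rule_predictions = np.array([rule_based(x) for x in petal_length[:, 0]])
rule_accuracy = np.mean(rule_predictions == is_setosa)

# Machine learning: a one-split decision tree finds its own cutoff from labeled examples
tree = DecisionTreeClassifier(max_depth=1, random_state=0)
tree.fit(petal_length, is_setosa)
learned_cutoff = tree.tree_.threshold[0]  # threshold used at the root split

print("rule accuracy:", rule_accuracy)
print("learned cutoff (cm):", round(learned_cutoff, 2))
print("model accuracy:", tree.score(petal_length, is_setosa))
```

Both approaches end up with a similar rule here; the difference is that the tree derived its cutoff from the data, so retraining on new examples would update it without anyone rewriting code.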

There are different categories of machine learning algorithms, each of which can solve specific problems.

Supervised Learning

Supervised learning is suitable for problems where you want to map input data to a known outcome. A common feature of all supervised learning problems is the existence of ground truth that can be used to test the model, such as labeled images or historical sales data.

Supervised learning models can solve regression or classification problems. A regression model predicts a quantity (e.g. the number of goods sold or the price of a stock), while a classification model attempts to determine the category of the input data (e.g. cat/dog/fish/bird, fraud/non-fraud).
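The difference shows up directly in Scikit-learn: a regressor's predict method returns numbers, while a classifier's predict method returns class labels. This sketch uses two tiny made-up datasets (a linear relationship y = 2x + 1 and a threshold at x = 5) purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.arange(10, dtype=float).reshape(-1, 1)  # one input feature: 0, 1, ..., 9

# Regression: the target is a quantity (made-up relationship y = 2x + 1)
y_quantity = 2 * X.ravel() + 1
regressor = LinearRegression().fit(X, y_quantity)
quantity_pred = regressor.predict([[12.0]])
print(quantity_pred)  # close to 25

# Classification: the target is a category (made-up rule: class 1 when x >= 5)
y_category = (X.ravel() >= 5).astype(int)
classifier = LogisticRegression().fit(X, y_category)
category_pred = classifier.predict([[1.0], [9.0]])
print(category_pred)  # class labels: [0 1]
```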

Image classification, face detection, stock price prediction and sales prediction are examples of problems that supervised learning can solve.

Some popular supervised learning algorithms include linear regression and logistic regression, support vector machines, decision trees and artificial neural networks.

Unsupervised learning

Unsupervised learning is suitable for problems where you have data but no outcomes, and you are looking for patterns. For example, you might want to group the data into segments based on similarity. This is called clustering. Alternatively, you may want to detect malicious network traffic that deviates from the normal activity of your business. This is called anomaly detection, another unsupervised learning task. Unsupervised learning can also be used for dimensionality reduction, a technique that simplifies machine learning tasks by reducing the number of features, for example by removing irrelevant ones.

Some popular unsupervised learning algorithms include k-means clustering and principal component analysis (PCA).
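Both algorithms are available in Scikit-learn. The sketch below applies them to the Iris measurements used later in this article while ignoring the species labels, which is what makes it unsupervised. The choice of 3 clusters and 2 components is ours, for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # measurements only; the species labels are not used

# Clustering: group the 150 flowers into 3 clusters of similar measurements
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)
print(cluster_labels[:10])  # cluster index assigned to the first 10 flowers

# Dimensionality reduction: compress the 4 measurements into 2 components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)                           # (150, 2)
print(pca.explained_variance_ratio_.sum())  # share of the variance the 2 components keep
```

The cluster numbers are arbitrary names: k-means does not know which cluster corresponds to which species.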

Reinforcement Learning

Reinforcement learning is a branch of machine learning in which agents try to achieve their goals by interacting with their environment. Reinforcement learning involves actions, states and rewards. An untrained reinforcement learning agent starts by taking random actions. Each action changes the state of the environment. If the agent reaches a desired state, it receives a reward. The agent tries to find the sequence of actions and states that yields the most reward.

Reinforcement learning is used in recommendation systems, robotics, and game-playing bots such as Google's AlphaGo and AlphaStar.
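The loop of actions, states and rewards can be sketched with tabular Q-learning, one classic reinforcement learning algorithm (our choice; the article does not name one). The environment is a hypothetical five-cell corridor: the agent starts in cell 0, can step left or right, and earns a reward of 1 only when it reaches cell 4. The learning rate (alpha), discount factor (gamma) and exploration rate (epsilon) are illustrative values.

```python
import random

N_STATES = 5      # corridor cells 0..4; cell 4 is the goal
MOVES = (-1, +1)  # action 0 = step left, action 1 = step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + MOVES[action], 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True  # reward only on reaching the goal
    return next_state, 0.0, False


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore at random sometimes (and whenever both actions look equal), else exploit
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted value of the best next action
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
policy = ["left" if q[s][0] > q[s][1] else "right" for s in range(N_STATES - 1)]
print(policy)  # ['right', 'right', 'right', 'right']
```

The agent starts out wandering randomly, but the reward gradually propagates backward through the table, and the greedy policy ends up moving right in every cell, the shortest path to the reward.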

Setting up Python environment

In this article, we will focus on supervised learning, as it is the most popular branch of machine learning and its results are easier to evaluate. We will use Python because it has many features and libraries that support machine learning applications. However, the general concepts apply to any programming language with similar libraries.

(If you are not familiar with Python, freeCodeCamp provides a great crash course to get you started.)

One of the Python libraries most commonly used in data science and machine learning is Scikit-learn, which provides implementations of popular machine learning algorithms. Scikit-learn is not part of a standard Python installation, so you have to install it yourself.

Python comes preinstalled on many macOS and Linux systems. To install the Scikit-learn library, type the following command in a terminal window:

<code>pip install scikit-learn</code>

Or for Python 3:

<code>python3 -m pip install scikit-learn</code>

On Microsoft Windows, you must first install Python. You can download the latest Python 3 installer for Windows from the official website. After Python is installed, type the following command in a Command Prompt window:

<code>python -m pip install scikit-learn</code>

Alternatively, you can install the Anaconda distribution, which includes a standalone Python 3 installation along with Scikit-learn and many other libraries for data science and machine learning, such as NumPy, SciPy and Matplotlib. You can find installation instructions for the free version of Anaconda on its official website.
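Whichever route you choose, you can confirm the installation by importing the libraries from Python and printing their versions. Note that the package is installed as scikit-learn but imported as sklearn.

```python
# The package is installed as "scikit-learn" but imported as "sklearn"
import numpy
import scipy
import sklearn

print("NumPy:", numpy.__version__)
print("SciPy:", scipy.__version__)
print("scikit-learn:", sklearn.__version__)
```

If any of these imports raises ModuleNotFoundError, the corresponding library is not installed for the Python interpreter you are running.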

Step 1: Define the problem

The first step in every machine learning project is to understand the problem you want to solve. Defining the problem will help you determine the type of data you need to collect and give you an idea of which machine learning algorithm to use.

In our example, we want to create a model that predicts the species of a flower based on the length and width of its petals and sepals.

This is a supervised classification problem. We need to collect a list of measurements of different flower specimens and their corresponding species. We will then use this data to train and test a machine learning model that can map measurements to species.

Step 2: Collect data

One of the trickiest parts of machine learning is collecting the data to train your model. You must find a source that can provide enough data to train the model. You also need to verify the quality of your data, make sure it represents the different situations the model will handle, and avoid collecting data that contains hidden biases.

Luckily, Scikit-learn contains several toy datasets that you can use to try out different machine learning algorithms. The Iris dataset happens to contain exactly the data our problem requires, so we just need to load it from the library.

The following code loads the Iris dataset:

<code>from sklearn.datasets import load_iris

iris = load_iris()
</code>

The Iris dataset contains 150 observations, each with four measurements (iris.data) and the target flower species (iris.target). You can see the names of the data columns in iris.feature_names:

<code>print(iris.feature_names)
'''
['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']
'''
</code>

iris.target contains a numerical index (0-2) for each of the three flower species in the dataset. The names of the species can be found in iris.target_names:

<code>print(iris.target_names)
'''
['setosa' 'versicolor' 'virginica']
'''
</code>
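Before training, it is worth running the checks this step recommends: how many examples there are, whether any values are missing, and whether the species are balanced. A minimal sketch with NumPy:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.data.shape)            # (150, 4): 150 flowers, 4 measurements each
print(np.isnan(iris.data).any())  # False: no missing measurements
print(np.bincount(iris.target))   # [50 50 50]: the three species are balanced
print(iris.data[0], "->", iris.target_names[iris.target[0]])  # first example and its label
```

With a real-world dataset, these same checks often reveal missing values or heavily imbalanced classes that must be handled before training.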

Step 3: Split the dataset

Before starting training, you must split the data into a training set and a test set. You will use the training set to train a machine learning model and use the test set to verify its accuracy.

This is done to ensure that your model does not overfit the training data. Overfitting occurs when your machine learning model performs well on training examples but not on unseen data. Overfitting may be caused by choosing the wrong machine learning algorithm, misconfiguring the model, poor training data, or too few training examples.
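Overfitting is easy to provoke on purpose. In this sketch, the labels are random coin flips with no relationship to the features, so there is nothing real to learn. An unrestricted DecisionTreeClassifier (chosen here because it can memorize its training set) still scores perfectly on the training half and only around chance level on the held-out half. The data is synthetic and made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))     # synthetic features
y = rng.integers(0, 2, size=400)  # labels are coin flips, unrelated to X

X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# With no depth limit, the tree keeps splitting until it memorizes every training example
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"training accuracy: {train_acc:.2f}")  # 1.00
print(f"test accuracy: {test_acc:.2f}")       # near 0.5, i.e. chance level
```

Only the test set exposes the problem, which is exactly why the data must be split before training.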

Depending on the type of problem you are solving and the amount of data you have, you must decide how much of the data to assign to the test set. When you have a lot of data (tens of thousands of examples or more), a test set of around 1% of the samples can be enough to evaluate your model. For the Iris dataset, which contains 150 records in total, we will use a 75-25 split.

Scikit-learn has a train_test_split function that splits the dataset into a training dataset and a test dataset:

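<code>from sklearn.model_selection import train_test_split

# 75-25 split, stratified by species; random_state=42 is an arbitrary fixed seed for repeatability
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, stratify=iris.target, random_state=42)
</code>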

train_test_split takes the data and target arrays and returns two pairs of arrays: one for training (X_train and y_train) and one for testing (X_test and y_test). The test_size parameter sets the fraction of the data assigned to the test set (between 0 and 1). The stratify parameter ensures that the training and test sets contain the same proportion of samples from each class. The random_state parameter appears in many Scikit-learn functions; it controls the random number generator so that results are reproducible.
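You can check what test_size and stratify did by counting the examples in each split. Assuming a call with test_size=0.25 and stratify=iris.target (random_state=42 is our assumption), a 25% test set of 150 records holds 38 examples, because Scikit-learn rounds the test size up, which leaves 112 for training. Stratification keeps each species close to one third of both sets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, stratify=iris.target, random_state=42)

print(len(X_train), len(X_test))  # 112 38
print(np.bincount(y_train))       # 37 or 38 flowers of each species
print(np.bincount(y_test))        # 12 or 13 flowers of each species
```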

Step 4: Build the model

Now that our data is ready, we can create a machine learning model and train it on the training set. Many different machine learning algorithms can solve the classification problem we are dealing with. In our case, we will use the logistic regression algorithm, which is fast and well suited to simple classification problems with a small number of dimensions.

Scikit-learn's LogisticRegression class implements this algorithm. After instantiating it, we train it on our training set (X_train and y_train) by calling its fit method. This adjusts the model's parameters to find the mapping between the measurements and the flower species.

<code>from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(X_train, y_train)
</code>
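To see what fit actually learned, you can inspect the fitted parameters. For a three-class problem, LogisticRegression stores one row of weights per species in coef_ and one bias per species in intercept_, and predict_proba turns the resulting scores into probabilities with a softmax. The sketch below recomputes those probabilities by hand. It assumes Scikit-learn 0.22 or later, where the default lbfgs solver fits a multinomial model for multiclass data, and it repeats the Step 3 split (random_state=42 assumed) so it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, stratify=iris.target, random_state=42)

# max_iter raised from the default (our choice) to avoid possible convergence warnings
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)

print(lr.coef_.shape)       # (3, 4): one weight per (species, measurement) pair
print(lr.intercept_.shape)  # (3,): one bias per species

# Recompute the probabilities by hand: a linear score per species, then a softmax
scores = X_test @ lr.coef_.T + lr.intercept_
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))  # subtract max for stability
manual_proba = exp_scores / exp_scores.sum(axis=1, keepdims=True)

print(np.allclose(manual_proba, lr.predict_proba(X_test)))  # True
```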

Step 5: Evaluate the model

Now that we have trained the model, we want to measure its accuracy. The LogisticRegression class has a score method that returns the accuracy of the model. First, we will measure the accuracy of the model on the training data:

<code>lr.score(X_train, y_train)</code>

This returns approximately 0.97, which means the model correctly predicts 97% of the training examples. That is pretty good, considering that we only have about 37 training examples per species.

Next, we will check the accuracy of the model on the test set:

<code>lr.score(X_test, y_test)</code>

This returns approximately 0.95, slightly below the training accuracy, which is natural because the model has never seen these examples before. By collecting more data or trying another machine learning algorithm (such as support vector machines), we may be able to further improve the model's accuracy and narrow the gap between training and test performance.
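Accuracy alone can hide problems, for example when one class is much rarer than the others. The sketch below confirms that score is plain accuracy and then prints per-species precision, recall and F1 score with Scikit-learn's classification_report. It repeats the split so it runs on its own; the random_state=42 value is our assumption, so your exact numbers may differ from the ones quoted above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, f1_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, stratify=iris.target, random_state=42)
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred = lr.predict(X_test)

# score() is plain accuracy: the fraction of test examples predicted correctly
manual_accuracy = np.mean(y_pred == y_test)
print(manual_accuracy, accuracy_score(y_test, y_pred), lr.score(X_test, y_test))

# Per-species precision, recall and F1 score, plus averages over the species
print(classification_report(y_test, y_pred, target_names=iris.target_names))
macro_f1 = f1_score(y_test, y_pred, average="macro")
```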

Finally, we want to see how to use the trained model on a new example. The LogisticRegression class has a predict method that takes an array of observations as input and returns the predicted categories. For our flower classifier, each observation is a list of four measurements (sepal length, sepal width, petal length, petal width), and the model returns an integer representing the species of each flower:

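<code># Four measurements in cm: sepal length, sepal width, petal length, petal width
# (illustrative values for one new flower)
lr.predict([[5.0, 3.4, 1.5, 0.2]])
</code>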
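Besides the predicted index, predict_proba shows how confident the model is, and iris.target_names turns the index into a species name. In this sketch, the model is trained on the full dataset for brevity, and the measurements are hypothetical values typical of a setosa flower.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
lr = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

# Hypothetical new flower: sepal length, sepal width, petal length, petal width (cm)
new_flower = np.array([[5.0, 3.4, 1.5, 0.2]])

class_index = lr.predict(new_flower)[0]          # integer class index (0, 1 or 2)
probabilities = lr.predict_proba(new_flower)[0]  # one probability per species

print(iris.target_names[class_index])
for name, p in zip(iris.target_names, probabilities):
    print(f"{name}: {p:.3f}")
```

The predicted class is simply the species with the highest probability, so a low top probability is a useful signal that the input is unusual.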

Congratulations! You have created your first machine learning model. We can now integrate it into an app that takes measurements from the user and returns the flower species:

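<code># Ask the user for the four measurements, then print the predicted species
measurements = [
    float(input("Enter sepal length (cm): ")),
    float(input("Enter sepal width (cm): ")),
    float(input("Enter petal length (cm): ")),
    float(input("Enter petal width (cm): ")),
]
species_index = lr.predict([measurements])[0]
print("Predicted species:", iris.target_names[species_index])
</code>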
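A real app also needs to cope with bad input. One possible design, sketched below, separates parsing from prediction so each part can be tested and reused, for example by a web endpoint later. The function names parse_measurements and predict_species are hypothetical, and the comma-separated input format is our assumption.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

FEATURES = ("sepal length", "sepal width", "petal length", "petal width")


def parse_measurements(text):
    """Convert text such as '5.0, 3.4, 1.5, 0.2' into a list of four positive floats."""
    try:
        values = [float(part) for part in text.split(",")]
    except ValueError:
        raise ValueError("measurements must be numbers separated by commas") from None
    if len(values) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values: {', '.join(FEATURES)}")
    if any(v <= 0 for v in values):
        raise ValueError("measurements must be positive lengths in cm")
    return values


def predict_species(model, target_names, measurements):
    """Return the species name predicted for one flower."""
    return str(target_names[model.predict([measurements])[0]])


iris = load_iris()
lr = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

# In the interactive app, the text would come from input(...)
species = predict_species(lr, iris.target_names, parse_measurements("5.0, 3.4, 1.5, 0.2"))
print(species)
```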

We hope this is your first step toward mastering machine learning. From here, you can explore other machine learning algorithms, deepen your understanding of the fundamental concepts, and move on to more advanced topics such as neural networks and deep learning. With some study and practice, you will be able to create remarkable applications that detect objects in images, process voice commands, and hold conversations with users.

Frequently Asked Questions (FAQs) about Machine Learning with Python

What are the prerequisites for learning to use Python for machine learning?

To start learning to use Python for machine learning, you need a basic understanding of Python programming. It is also beneficial to be familiar with libraries like NumPy, Pandas, and Matplotlib. Furthermore, a basic understanding of statistics and probability is crucial because they form the core of machine learning algorithms.

How does Python compare to other languages for machine learning?

Python is one of the most popular machine learning languages due to its simplicity and readability. It has a wide range of libraries and frameworks such as Scikit-learn, TensorFlow, and PyTorch that simplify the development of machine learning models. Other languages like R and Java are also used in machine learning, but Python’s extensive ecosystem makes it the first choice for many.

What common machine learning algorithms can I implement using Python?

Python's Scikit-learn library provides implementations of various machine learning algorithms. Some commonly used algorithms include linear regression, logistic regression, decision trees, random forests, support vector machines, and k-nearest neighbors. For deep learning, you can use libraries like TensorFlow and PyTorch.
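Because every Scikit-learn estimator exposes the same fit, predict and score methods, trying several of these algorithms takes only a loop. A sketch on the Iris split used in this article (random_state=42 assumed, default hyperparameters otherwise); treat the resulting numbers as a rough comparison on one small split rather than a benchmark.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, stratify=iris.target, random_state=42)

# Default hyperparameters, except where noted
models = {
    "logistic regression": LogisticRegression(max_iter=1000),  # higher iteration cap
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "support vector machine": SVC(),
    "k-nearest neighbors": KNeighborsClassifier(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # same interface for every algorithm
    results[name] = model.score(X_test, y_test)
    print(f"{name}: {results[name]:.3f}")
```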

How can I verify the performance of my Python machine learning model?

You can use techniques such as cross-validation and train/test splitting to verify the performance of your model; Python's Scikit-learn library provides functions for both. You can also use metrics such as accuracy, precision, recall and F1 score for classification problems, and mean squared error or R-squared for regression problems.
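For example, cross_val_score from sklearn.model_selection trains and evaluates a model on several different splits (folds) of the data, which gives a steadier estimate than a single train/test split. This sketch uses 5 folds, an illustrative choice, and reports both accuracy and macro-averaged F1 score.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = load_iris()
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train on 4/5 of the data, test on the remaining 1/5, five times
acc_scores = cross_val_score(model, iris.data, iris.target, cv=5)
f1_scores = cross_val_score(model, iris.data, iris.target, cv=5, scoring="f1_macro")

print("accuracy per fold:", acc_scores.round(3))
print(f"mean accuracy: {acc_scores.mean():.3f}, mean macro F1: {f1_scores.mean():.3f}")
```

The spread of the per-fold scores gives a rough sense of how much a single train/test split could have over- or understated performance.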

Can I use Python for supervised and unsupervised learning?

Yes, Python supports both supervised and unsupervised learning. Libraries such as Scikit-learn can be used to implement supervised learning algorithms for regression and classification. For unsupervised learning, you can use clustering algorithms such as k-means, hierarchical clustering, and DBSCAN.
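Hierarchical clustering and DBSCAN are available as AgglomerativeClustering and DBSCAN in sklearn.cluster. A minimal sketch on the Iris measurements: the features are standardized first because both methods depend on distances, and the eps and min_samples values for DBSCAN are illustrative, not tuned.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Standardize so every measurement contributes equally to the distances
X = StandardScaler().fit_transform(load_iris().data)

# Hierarchical (agglomerative) clustering: repeatedly merge the closest groups until 3 remain
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# DBSCAN: grows clusters from dense regions and chooses the number of clusters itself
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print("hierarchical clusters:", sorted(set(hier_labels.tolist())))
print("DBSCAN clusters (-1 = noise):", sorted(set(dbscan_labels.tolist())))
```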

How can I deal with overfitting in machine learning models?

Techniques such as regularization, early stopping and dropout (in neural networks) can be used to handle overfitting. You can also use ensemble methods such as bagging and boosting to reduce overfitting.
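In Scikit-learn's LogisticRegression, regularization is controlled by the parameter C, the inverse of the regularization strength: smaller values of C penalize large weights more. This sketch fits the model with three illustrative values of C and prints the size (Euclidean norm) of the learned weights, which shrinks as regularization grows stronger.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()

norms = {}
for C in (0.01, 1.0, 100.0):
    # Smaller C means a stronger L2 penalty on the weights (L2 is the default penalty)
    model = LogisticRegression(C=C, max_iter=5000).fit(iris.data, iris.target)
    norms[C] = np.linalg.norm(model.coef_)
    print(f"C={C}: weight norm {norms[C]:.3f}")
```

Smaller weights make the model less sensitive to small changes in the input, which is how regularization reduces overfitting; in practice, C is usually chosen with cross-validation.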

What is the role of data preprocessing in machine learning using Python?

Data preprocessing is a key step in machine learning. It includes cleaning the data, handling missing values, encoding categorical variables, and scaling features. Python provides libraries such as Pandas and Scikit-learn that make data preprocessing efficient.
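A minimal sketch of three of these steps with Scikit-learn: filling a missing value and scaling numeric features inside a Pipeline, and one-hot encoding a categorical column. The small arrays are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numeric features with one missing value (np.nan); the values are made up
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [3.0, 600.0]])

# Fill missing values with the column mean, then scale each column to mean 0 and variance 1
numeric_prep = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
X_clean = numeric_prep.fit_transform(X)
print(X_clean.round(3))

# Encode a categorical column as one binary column per category
colors = np.array([["red"], ["green"], ["red"]])
encoder = OneHotEncoder()
encoded = encoder.fit_transform(colors).toarray()  # fit_transform returns a sparse matrix by default
print(encoder.categories_)  # categories are sorted: green, red
print(encoded)
```

Putting the steps in a Pipeline means the exact same transformations, learned from the training data, are applied to test data and new inputs.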

How can I use Python to visualize the performance of machine learning models?

You can use libraries such as Matplotlib and Seaborn to visualize the performance of your model. These libraries provide functions to plot graphs such as confusion matrices, ROC curves, and learning curves.
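The numbers behind a confusion matrix come from sklearn.metrics.confusion_matrix: rows are the true species and columns the predicted ones, so correct predictions sit on the diagonal. The sketch prints the matrix as text so it runs without a display (random_state=42 split assumed). With Matplotlib installed, ConfusionMatrixDisplay in sklearn.metrics can draw it, as shown in the comment.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, stratify=iris.target, random_state=42)
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

cm = confusion_matrix(y_test, lr.predict(X_test))
print(cm)  # rows: true species, columns: predicted species

# With Matplotlib installed, scikit-learn can draw the same matrix:
# from sklearn.metrics import ConfusionMatrixDisplay
# ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=iris.target_names).plot()
```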

Can I use Python for Natural Language Processing (NLP)?

Yes, Python provides libraries such as NLTK and spaCy for natural language processing. These libraries provide functions for tokenization, part-of-speech tagging, named entity recognition, and sentiment analysis.

How can I deploy a machine learning model built with Python?

You can use web frameworks such as Flask or Django to deploy machine learning models. For large-scale deployments, you can use cloud platforms such as AWS, Google Cloud, or Azure. They provide services for model deployment, scaling and monitoring.
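Whatever framework serves the model, a common first step is to save the trained model to a file so the web application can load it once at startup instead of retraining. A minimal sketch with Python's built-in pickle module; the file name iris_model.pkl is hypothetical, and a temporary directory keeps the example self-contained. Only load pickle files you trust, because unpickling can execute arbitrary code, and load the model with the same Scikit-learn version that saved it.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
lr = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "iris_model.pkl")  # hypothetical file name

    # Training side: save the fitted model
    with open(model_path, "wb") as f:
        pickle.dump(lr, f)

    # Serving side (e.g. at web app startup): load it back
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

same_predictions = np.array_equal(restored.predict(iris.data), lr.predict(iris.data))
print(same_predictions)  # True
```

A Flask or Django view would then call restored.predict on the measurements sent with each request; that part depends on the framework and is not shown here.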
