


Introduction
A machine learning model is essentially a set of rules or mechanisms used to make predictions or find patterns in data. To put it very simply (at the risk of oversimplification), even a trendline calculated with the least squares method in Excel is a model. However, models used in real applications are rarely that simple; they often rely on more complex equations and algorithms.
In this post, I’m going to build a very simple machine learning model and release it as a simple web app to get a feel for the process.
Here, I’ll focus only on the process, not the ML model itself. Also, I’ll use Streamlit and Streamlit Community Cloud to easily release Python web applications.
TL;DR:
Using scikit-learn, a popular Python library for machine learning, you can quickly train a model on your data with just a few lines of code for simple tasks. The model can then be saved as a reusable file with joblib. This saved model can be imported/loaded like a regular Python library in a web application, allowing the app to make predictions using the trained model!
App URL: https://yh-machine-learning.streamlit.app/
GitHub: https://github.com/yoshan0921/yh-machine-learning.git
Technology Stack
- Python
- Streamlit: For creating the web application interface.
- scikit-learn: For loading and using the pre-trained Random Forest model.
- NumPy & Pandas: For data manipulation and processing.
- Matplotlib & Seaborn: For generating visualizations.
What I Made
This app allows you to examine predictions made by a random forest model trained on the Palmer Penguins dataset. (See the end of this article for more details on the training data.)
Specifically, the model predicts a penguin's species based on a variety of features, including island, bill length, bill depth, flipper length, body mass, and sex. Users can navigate the app to see how different features affect the model's predictions.
Prediction Screen
Training Data/Visualization Screen
Development Step1 - Creating the Model
Step1.1 Import Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import joblib
pandas is a Python library specialized in data manipulation and analysis. It supports data loading, preprocessing, and structuring using DataFrames, preparing data for machine learning models.
sklearn (scikit-learn) is a comprehensive Python library for machine learning that provides tools for training and evaluating models. In this post, I will build a model using an ensemble learning method called Random Forest.
joblib is a Python library that helps save and load Python objects, like machine learning models, in a very efficient way.
Step1.2 Read Data
df = pd.read_csv("./dataset/penguins_cleaned.csv")
X_raw = df.drop("species", axis=1)
y_raw = df.species
Load the dataset (training data) and separate it into features (X) and target variables (y).
Step1.3 Encode the Category Variables
encode = ["island", "sex"]
X_encoded = pd.get_dummies(X_raw, columns=encode)
target_mapper = {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}
y_encoded = y_raw.apply(lambda x: target_mapper[x])
The categorical variables are converted into a numerical format using one-hot encoding (X_encoded). For example, if “island” contains the categories “Biscoe”, “Dream”, and “Torgersen”, a new column is created for each (island_Biscoe, island_Dream, island_Torgersen). The same is done for sex. If the original data is “Biscoe,” the island_Biscoe column will be set to 1 and the others to 0.
The target variable species is mapped to numerical values (y_encoded).
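As a concrete illustration, here is a minimal, self-contained sketch of what get_dummies and the target mapping produce. The two-row sample and its values are made up purely for illustration:

```python
import pandas as pd

# Hypothetical two-row sample (values made up for illustration)
sample = pd.DataFrame({
    "island": ["Biscoe", "Dream"],
    "bill_length_mm": [39.1, 46.5],
    "sex": ["male", "female"],
})

# One-hot encode the categorical columns, as in Step 1.3
encoded = pd.get_dummies(sample, columns=["island", "sex"])
print(sorted(encoded.columns))
# ['bill_length_mm', 'island_Biscoe', 'island_Dream', 'sex_female', 'sex_male']

# Map the target labels to integers with the same mapper as above
target_mapper = {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}
labels = pd.Series(["Adelie", "Gentoo"]).apply(lambda x: target_mapper[x])
print(list(labels))  # [0, 2]
```

Note that the original "island" and "sex" columns are replaced by one dummy column per category, while numeric columns such as bill_length_mm pass through unchanged.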
Step1.4 Split the Dataset
x_train, x_test, y_train, y_test = train_test_split(
    X_encoded, y_encoded, test_size=0.3, random_state=1
)
To evaluate the model, its performance must be measured on data that was not used for training. Here, test_size=0.3 holds out 30% of the data for testing; a 7:3 train/test split is a widely used convention in machine learning.
Step1.5 Train a Random Forest Model
clf = RandomForestClassifier()
clf.fit(x_train, y_train)
The fit method is used to train the model. x_train holds the training data for the explanatory variables, and y_train holds the corresponding target values. After calling this method, clf holds the model trained on that data.
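Although accuracy_score was imported in Step 1.1, the evaluation step itself is worth showing. Here is a minimal sketch of how the held-out test set can be scored; the iris dataset is used as a stand-in here only so the snippet is self-contained without the penguin CSV:

```python
from sklearn.datasets import load_iris  # stand-in for the penguin data
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

clf = RandomForestClassifier(random_state=1)
clf.fit(x_train, y_train)

# Fraction of correct predictions on data the model never saw during training
accuracy = accuracy_score(y_test, clf.predict(x_test))
print(round(accuracy, 2))
```

The same two lines (predict on x_test, then accuracy_score) apply unchanged to the penguin model.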
Step1.6 Save the Model
joblib.dump(clf, "penguin_classifier_model.pkl")
joblib.dump() is a function for saving Python objects in binary format. By saving the model in this format, the model can be loaded from a file and used as-is without having to be trained again.
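A quick way to convince yourself the round trip works is to dump a trained model, load it back, and check that the predictions match. The toy data below is made up for illustration and is not the penguin features:

```python
import joblib
from sklearn.ensemble import RandomForestClassifier

# Toy training data, just to have a fitted model to save
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 1, 0]
clf = RandomForestClassifier(random_state=0).fit(X, y)

joblib.dump(clf, "toy_model.pkl")        # serialize the model to disk
restored = joblib.load("toy_model.pkl")  # deserialize it without retraining

# The restored model behaves exactly like the original
print(list(restored.predict(X)) == list(clf.predict(X)))  # True
```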
Sample Code
Development Step2 - Building the Web App and Integrating the Model
Step2.1 Import Libraries
import streamlit as st
import numpy as np
import pandas as pd
import joblib
streamlit is a Python library that makes it easy to create and share custom web applications for machine learning and data science projects.
numpy is a fundamental Python library for numerical computing. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.
Step2.2 Retrieve and encode input data
data = {
    "island": island,
    "bill_length_mm": bill_length_mm,
    "bill_depth_mm": bill_depth_mm,
    "flipper_length_mm": flipper_length_mm,
    "body_mass_g": body_mass_g,
    "sex": sex,
}
input_df = pd.DataFrame(data, index=[0])
encode = ["island", "sex"]
input_encoded_df = pd.get_dummies(input_df, columns=encode)
Input values are retrieved from the input form created by Streamlit, and categorical variables are encoded using the same rules as when the model was created. Note that the column order must also match the order used during training; if it differs, an error will occur when running a prediction with the model. Also be aware that get_dummies on a single row only creates dummy columns for the categories actually present in that row, so any dummy columns missing from the input need to be added with a value of 0.
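One way to guarantee the columns match is to reindex the encoded input against the training-time columns. The train_columns list below is hypothetical; in practice you would save the real column order (e.g. X_encoded.columns from Step 1.3) alongside the model:

```python
import pandas as pd

# Hypothetical training-time column order; in practice, save X_encoded.columns with the model
train_columns = [
    "bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g",
    "island_Biscoe", "island_Dream", "island_Torgersen",
    "sex_female", "sex_male",
]

# A single row of (made-up) form input
input_df = pd.DataFrame([{
    "island": "Biscoe", "bill_length_mm": 39.1, "bill_depth_mm": 18.7,
    "flipper_length_mm": 181.0, "body_mass_g": 3750.0, "sex": "male",
}])

# get_dummies on one row only creates columns for the categories present...
encoded = pd.get_dummies(input_df, columns=["island", "sex"])
# ...so reindex adds the missing dummy columns as 0 and fixes the order
aligned = encoded.reindex(columns=train_columns, fill_value=0)
print(list(aligned.columns) == train_columns)  # True
```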
Step2.3 Load the Model
clf = joblib.load("penguin_classifier_model.pkl")
"penguin_classifier_model.pkl" is the file where the previously saved model is stored. This file contains a trained RandomForestClassifier in binary format. Running this code loads the model into clf, allowing you to use it for predictions and evaluations on new data.
Step2.4 Perform prediction
prediction = clf.predict(input_encoded_df) prediction_proba = clf.predict_proba(input_encoded_df)
clf.predict(input_encoded_df): Uses the trained model to predict the class for the new encoded input data, storing the result in prediction.
clf.predict_proba(input_encoded_df): Calculates the probability for each class, storing the results in prediction_proba.
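Since the model outputs the encoded class index rather than a name, the app also needs the inverse of target_mapper from Step 1.3 to display a readable species. A small sketch (the prediction value here is hypothetical):

```python
# The same mapping used in Step 1.3, inverted to decode predictions
target_mapper = {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}
inverse_mapper = {v: k for k, v in target_mapper.items()}

prediction = [2]  # hypothetical output of clf.predict(input_encoded_df)
print(inverse_mapper[prediction[0]])  # Gentoo
```

Similarly, the columns of prediction_proba correspond to the classes in clf.classes_ order, so the same mapping labels each probability.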
Sample Code
Step3. Deploy
You can publish your developed application on the Internet by accessing Streamlit Community Cloud (https://streamlit.io/cloud) and specifying the URL of the GitHub repository.
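Streamlit Community Cloud installs Python dependencies from a requirements.txt at the repository root, so the repo should include one. A hypothetical minimal example for this stack (unpinned versions shown for brevity; pinning is safer for reproducible deploys):

```text
streamlit
scikit-learn
pandas
numpy
joblib
matplotlib
seaborn
```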
About Data Set
Artwork by @allison_horst (https://github.com/allisonhorst)
The model is trained using the Palmer Penguins dataset, a widely recognized dataset for practicing machine learning techniques. This dataset provides information on three penguin species (Adelie, Chinstrap, and Gentoo) from the Palmer Archipelago in Antarctica. Key features include:
- Species: The species of the penguin (Adelie, Chinstrap, Gentoo).
- Island: The specific island where the penguin was observed (Biscoe, Dream, Torgersen).
- Bill Length: The length of the penguin's bill (mm).
- Bill Depth: The depth of the penguin's bill (mm).
- Flipper Length: The length of the penguin's flipper (mm).
- Body Mass: The mass of the penguin (g).
- Sex: The sex of the penguin (male or female).
This dataset is sourced from Kaggle, and it can be accessed here. The diversity in features makes it an excellent choice for building a classification model and understanding the importance of each feature in species prediction.