Home >Backend Development >Python Tutorial >Multivariable linear regression example in Python

Multivariable linear regression example in Python

王林
王林Original
2023-06-10 13:57:151454browse

In the field of machine learning, linear regression is a commonly used method. Multivariable linear regression is a method that can predict the linear relationship between one or more independent variables and a dependent variable. It is usually used to predict market trends such as real estate and stock prices.

Python is a very popular programming language that is easy to learn, write and debug. In Python, multivariable linear regression models can be easily implemented using the Scikit-learn library.

In this article, we will introduce multivariable linear regression in Python through an example of housing price prediction.

Import libraries and data

In order to implement the multivariable linear regression model, we need to import some Python libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

The data used here comes from Boston on the Kaggle website Real estate data set. We can use the read_csv function in the Pandas library to read data from the data file:

data = pd.read_csv('Boston.csv')

Exploration and visualization of data

Before building the model, we should explore the data and understand each Distribution of features and relationships between features.

We can use the describe function and corr function in the Pandas library to understand the feature distribution of the data and the correlation between features. Among them, the corr function returns the correlation coefficient matrix between each feature and other features.

print(data.describe())
print(data.corr())

We can also use the Matplotlib library to visualize the data. For example, draw a scatter plot between two features:

plt.scatter(data['RM'], data['Price'])
plt.title('House Price vs Number of Rooms')
plt.xlabel('Number of Rooms')
plt.ylabel('Price')
plt.show()

Data preprocessing and model training

In order to train a multivariate linear regression model, we need to split the data into two parts: training set and test set. We can use the train_test_split function in the Scikit-learn library to randomly divide the data set into a training set and a test set:

X = data.drop(['Price'], axis=1)
y = data['Price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

Next, we can use the LinearRegression function in the Scikit-learn library to initialize multivariable linear regression model and use the training set to fit the model:

model = LinearRegression()
model.fit(X_train, y_train)

Evaluation and prediction of the model

To evaluate the performance of the model, we can use the test set to predict house prices and use the average variance ( Mean Squared Error (MSE) to measure the difference between predicted results and actual results. We can use the mean_squared_error function in the Scikit-learn library to calculate the average variance of the prediction results:

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(mse)

If the value of the average variance is smaller, it means that the model prediction results are more accurate.

Finally, we can use the model to predict new house prices. For example, we want to predict the price of a house that has 6 rooms, 2 bathrooms, is 4.5 miles from a business district, etc. We can input the values ​​of these features into the model and use the model’s predict function to predict the price of the house:

new_data = np.array([[6, 2, 4.5, 0, 0.4, 6, 79, 6.1, 5, 331, 17, 385, 11.3]])
new_prediction = model.predict(new_data)
print(new_prediction)

The predicted price of this house is approximately $239,000.

Summary

In this article, we introduced how to implement a multivariable linear regression model in Python using the LinearRegression function of the Scikit-learn library. We use the Boston real estate data set as an example to illustrate the steps of data import, exploration, visualization, preprocessing, and model training, evaluation, and prediction. I hope this article can help readers better master the multivariable linear regression method in Python.

The above is the detailed content of Multivariable linear regression example in Python. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn