How to build a simple recommendation system in Python
Recommendation systems are designed to help people discover and select items that may be of interest to them. Python provides a wealth of libraries and tools that can help us build a simple but effective recommendation system. This article will introduce how to use Python to build a user-based collaborative filtering recommendation system and provide specific code examples.
Collaborative filtering is a common algorithm for recommendation systems. It infers similarities between users based on users' behavioral history data, and then uses these similarities to predict and recommend items. We will use the MovieLens dataset, which contains a set of user ratings of movies. First, we need to install the required libraries:
pip install pandas scikit-learn
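Before touching real data, the idea can be illustrated on a toy example. The snippet below is only a minimal sketch under stated assumptions, not the full implementation: the `toy_ratings` data and the helper names `user_similarity` and `predict_rating` are made up for illustration. The similarity score, 1 divided by 1 plus the root-mean-square rating difference over co-rated movies, mirrors the formula used in the code later in this article.

```python
# Toy ratings (hypothetical data): user -> {movie: rating}
toy_ratings = {
    'alice': {'m1': 5.0, 'm2': 3.0},
    'bob':   {'m1': 5.0, 'm2': 3.0, 'm3': 4.0},
    'carol': {'m1': 1.0, 'm2': 5.0, 'm3': 2.0},
}

def user_similarity(ratings, user1, user2):
    """Return 1 / (1 + RMS difference) over co-rated movies, or None if there are none."""
    common = [m for m in ratings[user1] if m in ratings[user2]]
    if not common:
        return None
    mean_sq = sum((ratings[user1][m] - ratings[user2][m]) ** 2 for m in common) / len(common)
    return 1 / (1 + mean_sq ** 0.5)

def predict_rating(ratings, user, movie):
    """Similarity-weighted average of the other users' ratings for `movie`."""
    weighted_sum = 0.0
    sum_of_weights = 0.0
    for other in ratings:
        if other == user or movie not in ratings[other]:
            continue
        score = user_similarity(ratings, user, other)
        if score is None:
            continue
        weighted_sum += score * ratings[other][movie]
        sum_of_weights += score
    # No usable neighbour means no prediction
    return weighted_sum / sum_of_weights if sum_of_weights > 0 else None

# alice and bob agree on every co-rated movie, so their similarity is 1.0
print(user_similarity(toy_ratings, 'alice', 'bob'))
# alice's predicted rating for m3 leans toward bob's 4.0 rather than carol's 2.0
print(predict_rating(toy_ratings, 'alice', 'm3'))
```

Because bob rates exactly like alice while carol disagrees strongly, bob's opinion dominates the weighted average. That is the whole intuition behind user-based collaborative filtering.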
Next, we will import the required libraries and load the MovieLens dataset:
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset
data = pd.read_csv('ratings.csv')
The three columns of the dataset, userId, movieId and rating, hold the user ID, movie ID and rating respectively. Next, we split the data set into a training set and a test set:
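train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)
Now we can build the recommendation system. As the similarity measure between two users we use a distance-based score computed over the movies both have rated: 1 / (1 + root-mean-square difference of their ratings). Identical ratings give a score of 1, and larger disagreements push the score toward 0. We will create two dictionaries to store the similarity scores of users and movies: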
# Helper: map each user to a {movieId: rating} dictionary for fast lookups
def ratings_by_user(df):
    ratings = dict()
    for user, movie, rating in zip(df['userId'], df['movieId'], df['rating']):
        ratings.setdefault(user, dict())[movie] = rating
    return ratings

# Create an empty similarity dictionary for every user
def calculate_similarity(train_data):
    similarity = dict()
    for user in train_data['userId'].unique():
        similarity[user] = dict()
    return similarity

# Compute the similarity score between every pair of users
def calculate_similarity_score(train_data, similarity):
    ratings = ratings_by_user(train_data)
    for user1 in similarity.keys():
        for user2 in similarity.keys():
            if user1 != user2:
                num_ratings = 0
                sum_of_squares = 0
                for movie, rating1 in ratings[user1].items():
                    if movie in ratings[user2]:
                        num_ratings += 1
                        sum_of_squares += (rating1 - ratings[user2][movie]) ** 2
                # Users with no movie in common get no score (avoids division by zero)
                if num_ratings > 0:
                    similarity[user1][user2] = 1 / (1 + (sum_of_squares / num_ratings) ** 0.5)
    return similarity

# Compute a simple similarity score between movies
def calculate_movie_similarity_score(train_data, similarity):
    ratings = ratings_by_user(train_data)
    movie_similarity = dict()
    for user in similarity.keys():
        for movie in ratings[user]:
            if movie not in movie_similarity:
                movie_similarity[movie] = dict()
            for other_movie in ratings[user]:
                if movie != other_movie:
                    # Simple proxy (an assumption of this example): count the users who rated both movies
                    movie_similarity[movie][other_movie] = movie_similarity[movie].get(other_movie, 0) + 1
    return movie_similarity

# Build the recommendation system
# (movie_similarity is accepted for completeness; the user-based prediction below does not use it)
def build_recommendation_system(train_data, similarity, movie_similarity):
    ratings = ratings_by_user(train_data)
    recommendations = dict()
    for user in ratings:
        weighted_sum = dict()
        sum_of_weights = dict()
        for other_user, score in similarity[user].items():
            for movie, rating in ratings[other_user].items():
                # Only predict ratings for movies the user has not rated yet
                if movie not in ratings[user]:
                    weighted_sum[movie] = weighted_sum.get(movie, 0) + score * rating
                    sum_of_weights[movie] = sum_of_weights.get(movie, 0) + score
        # Similarity-weighted average of the other users' ratings
        recommendations[user] = {movie: weighted_sum[movie] / sum_of_weights[movie] for movie in weighted_sum}
    return recommendations

# Compute the evaluation metric (RMSE over all test ratings that have a prediction)
def calculate_metrics(recommendations, test_data):
    num_predictions = 0
    sum_of_squared_error = 0
    for user, movie, actual_rating in zip(test_data['userId'], test_data['movieId'], test_data['rating']):
        if user in recommendations and movie in recommendations[user]:
            predicted_rating = recommendations[user][movie]
            sum_of_squared_error += (predicted_rating - actual_rating) ** 2
            num_predictions += 1
    if num_predictions == 0:
        return float('nan')
    rmse = (sum_of_squared_error / num_predictions) ** 0.5
    return rmse

# Create the user similarity dictionary
similarity = calculate_similarity(train_data)
# Compute the similarity scores between users
similarity = calculate_similarity_score(train_data, similarity)
# Compute the similarity scores between movies
movie_similarity = calculate_movie_similarity_score(train_data, similarity)
# Build the recommendation system
recommendations = build_recommendation_system(train_data, similarity, movie_similarity)
# Compute the evaluation metric
rmse = calculate_metrics(recommendations, test_data)
Finally, we can output the results and evaluation metrics of the recommendation system:
print(recommendations)
print('RMSE:', rmse)
With the above code, we have built a user-based collaborative filtering recommendation system in Python and calculated its evaluation metric. Of course, this is just a simple example; real recommendation systems need more sophisticated algorithms and larger data sets to produce more accurate recommendations.
To summarize, Python provides powerful libraries and tools for building recommendation systems. We can use collaborative filtering to infer similarities between users and make recommendations based on those similarities. I hope this article helps readers understand how to build a simple recommendation system in Python and provides some ideas for further exploring the field of recommendation systems.
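Printing the whole `recommendations` dictionary quickly becomes unwieldy, so it is often more useful to show only the highest-scoring movies for a given user. A minimal sketch, assuming `recommendations` has the user -> {movie: predicted rating} shape built above; the helper name `top_n` and the `sample` data are hypothetical and exist only for illustration:

```python
import heapq

def top_n(recommendations, user, n=5):
    """Return the n (movie, predicted_rating) pairs with the highest predictions."""
    predictions = recommendations.get(user, {})
    return heapq.nlargest(n, predictions.items(), key=lambda item: item[1])

# Hypothetical predictions in the same shape as `recommendations`
sample = {1: {10: 3.2, 20: 4.7, 30: 4.1}, 2: {}}

print(top_n(sample, 1, n=2))   # [(20, 4.7), (30, 4.1)]
print(top_n(sample, 2))        # []
```

With the real data, a call such as `top_n(recommendations, some_user_id)` would list that user's five best predictions instead of every unseen movie.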