
Naive Bayes examples in Python

王林 · Original · 2023-06-09

Python is a simple, easy-to-learn programming language with a rich set of scientific computing libraries and data processing tools. The Naive Bayes algorithm, a classic machine learning method, is widely used in Python. This article introduces the usage and steps of Naive Bayes in Python through examples.

  1. Introduction to Naive Bayes

The Naive Bayes algorithm is a classification algorithm based on Bayes' theorem. Its core idea is to use the characteristics of a known training data set to infer the class of new data. In practical applications, the Naive Bayes algorithm is often used in scenarios such as text classification, spam filtering, and sentiment analysis.

The defining characteristic of the Naive Bayes algorithm is its assumption that the features are independent of one another given the class. This assumption rarely holds in real situations, which is why the algorithm is called "naive". Despite this, Naive Bayes still performs well on problems such as short text classification.
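In symbols, this is a brief sketch of the standard formulation (not tied to any particular library): Bayes' theorem combined with the independence assumption gives

P(C | x1, x2, ..., xn) ∝ P(C) · P(x1 | C) · P(x2 | C) · ... · P(xn | C)

where C is a class and x1, ..., xn are the feature values. The classifier predicts the class C that makes the right-hand side largest, with the individual probabilities estimated from the training data.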

  2. Using the Naive Bayes classifier

In Python, the steps for using a Naive Bayes classifier can be summarized as follows:

2.1 Prepare data

First, you need to prepare the training data and the test data to be classified. The data can be text, images, audio, and so on, but it must be converted into a form the computer can work with. In text classification problems, the text usually needs to be converted into a vector representation.
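As a small illustration of what "vector representation" means (the two example sentences here are made up), TfidfVectorizer turns a list of texts into a numeric matrix with one row per text and one column per word in the learned vocabulary:

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ['the match was exciting', 'the new phone was released']
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)            # sparse matrix: one row per document
print(vectorizer.get_feature_names_out())     # the vocabulary learned from the documents
print(X.shape)                                # (2, number of distinct words)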

2.2 Train the model

Next, you need to use the training data set to build the Naive Bayes classifier. The sklearn library provides three commonly used Naive Bayes classifiers (a minimal usage sketch follows the list):

  • GaussianNB: suitable for continuous (real-valued) features.
  • BernoulliNB: suitable for binary (boolean) features.
  • MultinomialNB: suitable for discrete count features, such as word counts in text.
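All three classifiers share the same fit/predict interface. Here is a minimal sketch of that interface using GaussianNB on the iris dataset bundled with sklearn (the dataset is chosen only for illustration because its features are continuous; MultinomialNB and BernoulliNB are used the same way on suitable data):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load a small continuous-feature dataset and hold out 20% for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = GaussianNB()
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))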

Taking text classification as an example, you can use the TfidfVectorizer class provided by the sklearn library to convert the text into a vector representation, and use the MultinomialNB classifier for training.

2.3 Test the model

After training is completed, the test data set is used to evaluate the performance of the model. The test set should be independent of the training set: data the model was trained on must not be used during testing. You can use the accuracy_score function provided by the sklearn library to calculate the accuracy of the model.
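As a quick illustration with made-up label lists, accuracy_score simply returns the fraction of predictions that match the true labels:

from sklearn.metrics import accuracy_score

y_true = ['体育', '科技', '科技', '体育']
y_pred = ['体育', '科技', '体育', '体育']
print(accuracy_score(y_true, y_pred))  # 0.75: three of the four predictions are correct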

  3. Example: Text classification based on Naive Bayes

To demonstrate the practical application of the Naive Bayes classifier, this article takes text classification based on Naive Bayes as an example.

3.1 Prepare data

First, find two text data sets on the Internet, "Sports News" and "Science and Technology News"; each data set contains 1,000 texts. Put the two data sets into separate folders so that the texts are labeled "Sports" (体育) and "Technology" (科技) respectively.

3.2 Use the sklearn library for classification

Next, use the naive Bayes classifier provided by the sklearn library for classification.

(1) Import related libraries

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
import os

(2) Read the text data and its labels

def read_files(path):
    # Walk the data directory and collect each text together with its label.
    # The label is inferred from the folder name in the file path:
    # '体育' = Sports, '科技' = Technology.
    # Assumes every file lives under one of those two folders.
    text_list = []
    label_list = []
    for root, dirs, files in os.walk(path):
        for file in files:
            file_path = os.path.join(root, file)
            with open(file_path, 'r', encoding='utf-8') as f:
                text = f.read()
                text_list.append(text)
                if '体育' in file_path:
                    label_list.append('体育')
                elif '科技' in file_path:
                    label_list.append('科技')
    return text_list, label_list

(3) Convert text into vector representation

def text_vectorizer(text_list):
    # Convert the raw texts into a TF-IDF feature matrix; the fitted vectorizer
    # is returned so the same vocabulary can be reused at prediction time.
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(text_list)
    return X, vectorizer

(4) Train the model and return the accuracy

def train(text_list, label_list):
    # Vectorize the texts and hold out 20% of the data as a test set
    X, vectorizer = text_vectorizer(text_list)
    y = label_list
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    # Fit a multinomial Naive Bayes classifier and evaluate it on the held-out test set
    clf = MultinomialNB()
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    return clf, vectorizer, acc

(5) Classify new text with the model

def predict(clf, vectorizer, text):
    # transform expects an iterable of documents, so wrap the single text in a list
    X = vectorizer.transform([text])
    y_pred = clf.predict(X)
    return y_pred[0]
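The article does not show how these functions are wired together, so here is a minimal driver under the assumption that the data described in section 3.1 lives in a './data' directory (the directory name and the sample text are illustrative):

# Hypothetical directory containing the '体育' and '科技' subfolders
data_dir = './data'

text_list, label_list = read_files(data_dir)
clf, vectorizer, acc = train(text_list, label_list)
print('Accuracy on the held-out test set:', acc)

# Classify a new, unseen piece of text
new_text = 'iPhone 12 is finally released!'
print('Predicted category:', predict(clf, vectorizer, new_text))  # expected: '科技' (Technology)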

3.3 Result analysis

Running the above code, the classifier achieves an accuracy of 0.955. For actual classification, you only need to pass the text to be classified to the predict function, which returns its category. For example, the text "iPhone 12 is finally released!" returns the "Technology" (科技) category.

  4. Summary

As a simple and effective classification algorithm, Naive Bayes is widely used in Python. This article has introduced the methods and steps for using a Naive Bayes classifier, and used text classification based on Naive Bayes as an example to demonstrate the classifier in practice. In real applications, operations such as data preprocessing and feature selection are also needed to improve the accuracy of the classifier.

