search
HomeBackend DevelopmentPython Tutorialpython implementation of naive bayes algorithm

Advantages and disadvantages of the algorithm

Advantages: Still effective when there is less data, can handle multi-category problems

Disadvantages: Sensitive to the preparation method of input data

Applicable data type: nominal data

Algorithm idea:

Naive Bayes

For example, if we want to determine whether an email is spam, then what we know is the distribution of words in the email, then we also need to know: how many occurrences of certain words in spam emails, then we can Obtained using Bayes' theorem.

An assumption in the Naive Bayes classifier is that each feature is equally important

Bayesian classification is the general term for a class of classification algorithms. This type of algorithm is based on Bayes’ theorem, so it is collectively called Bayesian. Classification.

Function

loadDataSet()

Create a data set. The data set here is a sentence composed of words that have been split, which represents user comments on a forum. Label 1 means that this is a curse

createVocabList(dataSet )

Find out how many words there are in these sentences to determine the size of our word vector

setOfWords2Vec(vocabList, inputSet)

Convert the sentences into vectors based on the words in them. The Bernoulli model is used here, that is Only consider whether the word exists

bagOfWords2VecMN(vocabList, inputSet)

This is another model that converts sentences into vectors, a polynomial model, considering the number of occurrences of a certain word

trainNB0(trainMatrix, trainCatergory)

calculation P(i) and P(w[i]|C[1]) and P(w[i]|C[0]), there are two tricks here. One is that the starting numerator and denominator are not all initialized to 0 in order to To prevent the probability of one of them from being 0 and causing the whole to be 0, the other is to use the multiplication logarithm later to prevent the result from being 0 due to accuracy issues

classifyNB(vec2Classify, p0Vec, p1Vec, pClass1)

Calculate this vector according to the Bayesian formula. Which of the two sets has a higher probability

#coding=utf-8
from numpy import *
def loadDataSet():
    postingList=[['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                 ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                 ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                 ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                 ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                 ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    classVec = [0,1,0,1,0,1]    #1 is abusive, 0 not
    return postingList,classVec
#创建一个带有所有单词的列表
def createVocabList(dataSet):
    vocabSet = set([])
    for document in dataSet:
        vocabSet = vocabSet | set(document)
    return list(vocabSet)
     
def setOfWords2Vec(vocabList, inputSet):
    retVocabList = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            retVocabList[vocabList.index(word)] = 1
        else:
            print 'word ',word ,'not in dict'
    return retVocabList
#另一种模型    
def bagOfWords2VecMN(vocabList, inputSet):
    returnVec = [0]*len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] += 1
    return returnVec
def trainNB0(trainMatrix,trainCatergory):
    numTrainDoc = len(trainMatrix)
    numWords = len(trainMatrix[0])
    pAbusive = sum(trainCatergory)/float(numTrainDoc)
    #防止多个概率的成绩当中的一个为0
    p0Num = ones(numWords)
    p1Num = ones(numWords)
    p0Denom = 2.0
    p1Denom = 2.0
    for i in range(numTrainDoc):
        if trainCatergory[i] == 1:
            p1Num +=trainMatrix[i]
            p1Denom += sum(trainMatrix[i])
        else:
            p0Num +=trainMatrix[i]
            p0Denom += sum(trainMatrix[i])
    p1Vect = log(p1Num/p1Denom)#处于精度的考虑,否则很可能到限归零
    p0Vect = log(p0Num/p0Denom)
    return p0Vect,p1Vect,pAbusive
     
def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
    p1 = sum(vec2Classify * p1Vec) + log(pClass1)    #element-wise mult
    p0 = sum(vec2Classify * p0Vec) + log(1.0 - pClass1)
    if p1 > p0:
        return 1
    else: 
        return 0
         
def testingNB():
    listOPosts,listClasses = loadDataSet()
    myVocabList = createVocabList(listOPosts)
    trainMat=[]
    for postinDoc in listOPosts:
        trainMat.append(setOfWords2Vec(myVocabList, postinDoc))
    p0V,p1V,pAb = trainNB0(array(trainMat),array(listClasses))
    testEntry = ['love', 'my', 'dalmation']
    thisDoc = array(setOfWords2Vec(myVocabList, testEntry))
    print testEntry,'classified as: ',classifyNB(thisDoc,p0V,p1V,pAb)
    testEntry = ['stupid', 'garbage']
    thisDoc = array(setOfWords2Vec(myVocabList, testEntry))
    print testEntry,'classified as: ',classifyNB(thisDoc,p0V,p1V,pAb)
     
     
def main():
    testingNB()
     
if __name__ == '__main__':
    main()


Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
How do you create multi-dimensional arrays using NumPy?How do you create multi-dimensional arrays using NumPy?Apr 29, 2025 am 12:27 AM

Create multi-dimensional arrays with NumPy can be achieved through the following steps: 1) Use the numpy.array() function to create an array, such as np.array([[1,2,3],[4,5,6]]) to create a 2D array; 2) Use np.zeros(), np.ones(), np.random.random() and other functions to create an array filled with specific values; 3) Understand the shape and size properties of the array to ensure that the length of the sub-array is consistent and avoid errors; 4) Use the np.reshape() function to change the shape of the array; 5) Pay attention to memory usage to ensure that the code is clear and efficient.

Explain the concept of 'broadcasting' in NumPy arrays.Explain the concept of 'broadcasting' in NumPy arrays.Apr 29, 2025 am 12:23 AM

BroadcastinginNumPyisamethodtoperformoperationsonarraysofdifferentshapesbyautomaticallyaligningthem.Itsimplifiescode,enhancesreadability,andboostsperformance.Here'showitworks:1)Smallerarraysarepaddedwithonestomatchdimensions.2)Compatibledimensionsare

Is Tuple Comprehension possible in Python? If yes, how and if not why?Is Tuple Comprehension possible in Python? If yes, how and if not why?Apr 28, 2025 pm 04:34 PM

Article discusses impossibility of tuple comprehension in Python due to syntax ambiguity. Alternatives like using tuple() with generator expressions are suggested for creating tuples efficiently.(159 characters)

What are Modules and Packages in Python?What are Modules and Packages in Python?Apr 28, 2025 pm 04:33 PM

The article explains modules and packages in Python, their differences, and usage. Modules are single files, while packages are directories with an __init__.py file, organizing related modules hierarchically.

What is docstring in Python?What is docstring in Python?Apr 28, 2025 pm 04:30 PM

Article discusses docstrings in Python, their usage, and benefits. Main issue: importance of docstrings for code documentation and accessibility.

What is a lambda function?What is a lambda function?Apr 28, 2025 pm 04:28 PM

Article discusses lambda functions, their differences from regular functions, and their utility in programming scenarios. Not all languages support them.

What is a break, continue and pass in Python?What is a break, continue and pass in Python?Apr 28, 2025 pm 04:26 PM

Article discusses break, continue, and pass in Python, explaining their roles in controlling loop execution and program flow.

What is a pass in Python?What is a pass in Python?Apr 28, 2025 pm 04:25 PM

The article discusses the 'pass' statement in Python, a null operation used as a placeholder in code structures like functions and classes, allowing for future implementation without syntax errors.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

WebStorm Mac version

WebStorm Mac version

Useful JavaScript development tools

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

VSCode Windows 64-bit Download

VSCode Windows 64-bit Download

A free and powerful IDE editor launched by Microsoft

DVWA

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software