How to count word frequency in text in Python

爱喝马黛茶的安东尼
2019-06-20 17:05:30

When reading an article or a novel, you may want to know which word appears most often in the text and how many times it appears. Python can do this with a few lines of code. You can also take it a step further and guess who the protagonist is from whose name appears most often in the novel, or spot a character's catchphrase from the sentence that recurs most. Interesting, isn't it? Give it a try.

Idea:

First, extract every character and put it in a list;

then filter out the punctuation marks;

finally, use a dictionary to accumulate the frequency of each character.
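
The three steps can be tried on a short in-memory string before touching a file. This is only a minimal sketch: the sample text and the `punctuation` set are made up for illustration, and whitespace is dropped together with punctuation.

```python
# Hypothetical sample text; any string works here.
sample = "你好。世界。你好!"
punctuation = "。!"  # assumed set of marks to ignore

# Step 1: put every character into a list.
chars = list(sample)

# Step 2: drop punctuation and whitespace.
kept = [c for c in chars if c not in punctuation and not c.isspace()]

# Step 3: accumulate the counts in a dictionary.
counts = {}
for c in kept:
    counts[c] = counts.get(c, 0) + 1

print(counts)
```

Using `counts.get(c, 0) + 1` avoids a separate check for whether the character is already in the dictionary.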

Take the novel Youth (芳华) as an example:

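# coding: utf-8
word_lst = []
word_dict = {}
exclude_str = ",。!?、()【】<>《》=:+-*—“”…"
# Assumes both files are UTF-8 encoded; change the encoding if yours differ
with open("芳华.txt", "r", encoding="utf-8") as fileIn, \
     open("芳华字频.txt", "w", encoding="utf-8") as fileOut:
    # Add every character to the list
    for line in fileIn:
        for char in line:
            word_lst.append(char)
    # Use a dictionary to count how often each character occurs
    for char in word_lst:
        # Skip punctuation and whitespace (strip() turns whitespace into "")
        if char not in exclude_str and char.strip():
            if char not in word_dict:
                word_dict[char] = 1
            else:
                word_dict[char] += 1
    # Sort
    #   x[1] sorts by frequency; x[0] would sort by character
    lstWords = sorted(word_dict.items(), key=lambda x: x[1], reverse=True)

    # Print the results (top 100) and write them to the output file
    print('字符\t字频')  # column headers: "character" and "frequency"
    print('=============')
    for e in lstWords[:100]:
        print('%s\t%d' % e)
        fileOut.write('%s, %d\n' % e)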
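
The counting and sorting in the listing above can also be expressed with `collections.Counter` from the standard library. The sketch below is an alternative, not the article's method: the helper name `char_frequency` and the sample sentence are assumptions made for illustration, and the excluded marks are copied from `exclude_str`.

```python
from collections import Counter

# Same punctuation set as exclude_str in the listing above.
EXCLUDE = ",。!?、()【】<>《》=:+-*—“”…"

def char_frequency(text, exclude=EXCLUDE):
    """Count each character in text, skipping whitespace and excluded marks."""
    return Counter(c for c in text if not c.isspace() and c not in exclude)

# Hypothetical sample sentence, not taken from the novel.
freq = char_frequency("他说。她说。他笑了!\n")
for char, count in freq.most_common(3):
    print("%s\t%d" % (char, count))
```

`most_common(n)` returns the `n` highest counts already sorted, so the explicit `sorted(...)` call and the slice are no longer needed. To process a whole file, the text could be read with `fileIn.read()` and passed to the function.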

Output (the 100 most frequent characters):

字符    字频
=============
的    3641
一    1834
了    1748
是    1506
不    1267
我    1229
她    1156
他    985
小    962
个    921
人    866
在    853
刘    745
丁    728
那    723
上    705
来    698
峰    691
们    684
就    667
说    577
有    572
到    564
这    562
里    537
儿    520
嫚    499
子    494
都    492
着    491
大    482
么    462
出    460
看    441
也    415
得    404
下    383
时    367
还    366
女    349
地    340
头    331
好    327
没    326
去    321
过    320
老    317
跟    311
你    309
把    307
对    303
年    301
会    300
生    291
为    289
发    289
要    281
何    280
亲    273
后    272
给    267
和    266
天    265
家    259
手    251
长    251
想    249
多    242
自    241
开    240
当    236
兵    235
样    232
郝    230
可    228
起    225
被    224
成    216
十    215
什    215
以    209
事    209
从    209
点    208
能    203
两    203
回    202
门    201
所    195
淑    188
雯    188
只    188
心    184
身    184
让    179
道    179
母    174
做    173
话    173
最    172
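
Characters such as 刘, 峰, 丁 and 嫚 rank high in the list above, which hints that they belong to names, but single characters cannot show which ones go together. One possible extension, in the spirit of guessing the protagonist, is to count adjacent pairs of characters instead. The sketch below is an assumption-laden illustration: `bigram_frequency` is a hypothetical helper, the sample sentence is invented, and the default set of excluded marks is deliberately short.

```python
from collections import Counter

def bigram_frequency(text, exclude="。!?、"):
    """Count adjacent character pairs that contain no whitespace or excluded marks."""
    pairs = Counter()
    for first, second in zip(text, text[1:]):
        if all(not c.isspace() and c not in exclude for c in (first, second)):
            pairs[first + second] += 1
    return pairs

# Hypothetical sample sentence, not taken from the novel.
sample = "小明来了。小明走了。小红也来了。"
pairs = bigram_frequency(sample)
print(pairs.most_common(3))
```

Pairs that span a punctuation mark are skipped, so a pair never joins the end of one sentence to the start of the next. Frequent pairs are only candidates for names; common two-character words will show up as well.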
