When reading an article, or even a novel, we may want to know which word appears most often and how many times it appears. How do we find out? Python can do this job in a few lines of code. With a small extension, you can even guess who the protagonist is from whose name appears most often, or spot a character's catchphrase from the sentence that repeats most. It's an interesting exercise, so let's try it. (For Chinese text, as in the example here, the code counts individual characters rather than multi-character words.)
Idea:
first, extract every character of the text and put it into a list;
then, filter out the punctuation marks (and whitespace);
finally, use a dictionary to accumulate how many times each character appears, and sort the entries by count.
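The three steps can be sketched as a small function that works on a string instead of a file. This is a minimal illustration, not the article's script; the function name `char_frequency` and the default exclude string (the article's punctuation plus full-width equivalents) are assumptions.

```python
def char_frequency(text, exclude=",，。!！?？、()（）【】<>《》=:：+-*—“”…"):
    """Count how many times each character occurs in text (hypothetical helper)."""
    # Step 1: extract every character and put it in a list
    chars = list(text)
    # Step 2: filter out punctuation marks and whitespace (spaces, newlines, ...)
    kept = [ch for ch in chars if ch not in exclude and not ch.isspace()]
    # Step 3: accumulate each character's frequency in a dictionary
    freq = {}
    for ch in kept:
        freq[ch] = freq.get(ch, 0) + 1
    return freq


if __name__ == "__main__":
    print(char_frequency("我是我，你是你。"))
```

For example, `char_frequency("我是我，你是你。")` gives each of 我, 是 and 你 a count of 2, since the full-width comma and period are filtered out.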
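Take the novel Youth (芳华) as an example. The script reads the text from 芳华.txt and writes the 100 most frequent characters to 芳华字频.txt: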
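```python
# coding: utf-8
word_lst = []
word_dict = {}
# Punctuation marks to ignore (both ASCII and full-width forms)
exclude_str = ",，。!！?？、()（）【】<>《》=:：+-*—“”…"

with open("芳华.txt", "r", encoding="utf-8") as fileIn, \
     open("芳华字频.txt", "w", encoding="utf-8") as fileOut:
    # Add every character to the list
    for line in fileIn:
        for char in line:
            word_lst.append(char)

    # Use a dictionary to count how many times each character appears
    for char in word_lst:
        # Skip punctuation and whitespace (spaces, newlines, tabs, ...)
        if char in exclude_str or char.isspace():
            continue
        if char not in word_dict:
            word_dict[char] = 1
        else:
            word_dict[char] += 1

    # Sort: key x[1] sorts by frequency; use x[0] to sort by character instead
    lstWords = sorted(word_dict.items(), key=lambda x: x[1], reverse=True)

    # Print and save the top 100 results (字符 = character, 字频 = frequency)
    print('字符\t字频')
    print('=============')
    for e in lstWords[:100]:
        print('%s\t%d' % e)
        fileOut.write('%s, %d\n' % e)
```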
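The same count can be written more compactly with `collections.Counter`. This sketch is an alternative, not the article's code: instead of a hand-written punctuation string, it assumes that only Unicode letters and digits should be counted and uses `unicodedata.category` to drop everything else (punctuation, symbols, whitespace). The function names are illustrative.

```python
import unicodedata
from collections import Counter


def is_countable(ch):
    """Keep letters (CJK characters are category "Lo") and digits; drop
    punctuation, symbols, whitespace and control characters such as newlines."""
    return unicodedata.category(ch)[0] in ("L", "N")


def top_chars(text, n=100):
    """Return the n most frequent countable characters as (char, count) pairs."""
    counts = Counter(ch for ch in text if is_countable(ch))
    # most_common sorts by count, highest first; ties keep first-seen order
    return counts.most_common(n)


def top_chars_in_file(path, n=100, encoding="utf-8"):
    """Read a whole text file (assumed UTF-8 by default) and rank its characters."""
    with open(path, "r", encoding=encoding) as f:
        return top_chars(f.read(), n)


if __name__ == "__main__":
    # Tiny inline sample; for the novel, call top_chars_in_file("芳华.txt") instead
    for ch, count in top_chars("《芳华》：我说，我们。", 3):
        print("%s\t%d" % (ch, count))
```

The category rule also drops punctuation that a hard-coded string can miss (such as the full-width semicolon ；), while still counting Latin letters and digits, just as the original script does.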
Output results (the 100 most frequent characters and their counts; 字符 = character, 字频 = frequency). The script prints one entry per line; they are laid out ten per row here to save space:

```text
的 3641   一 1834   了 1748   是 1506   不 1267   我 1229   她 1156   他  985   小  962   个  921
人  866   在  853   刘  745   丁  728   那  723   上  705   来  698   峰  691   们  684   就  667
说  577   有  572   到  564   这  562   里  537   儿  520   嫚  499   子  494   都  492   着  491
大  482   么  462   出  460   看  441   也  415   得  404   下  383   时  367   还  366   女  349
地  340   头  331   好  327   没  326   去  321   过  320   老  317   跟  311   你  309   把  307
对  303   年  301   会  300   生  291   为  289   发  289   要  281   何  280   亲  273   后  272
给  267   和  266   天  265   家  259   手  251   长  251   想  249   多  242   自  241   开  240
当  236   兵  235   样  232   郝  230   可  228   起  225   被  224   成  216   十  215   什  215
以  209   事  209   从  209   点  208   能  203   两  203   回  202   门  201   所  195   淑  188
雯  188   只  188   心  184   身  184   让  179   道  179   母  174   做  173   话  173   最  172
```
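Guessing the protagonist, as suggested in the introduction, needs multi-character counting: in the output above, 刘 and 峰 both rank high, which hints at a name such as 刘峰. A minimal sketch, assuming you already know (or guess) the candidate names; the names list and the function names are illustrative. `str.count` counts whole names, and splitting at sentence-ending punctuation gives a rough way to spot a repeated line or catchphrase.

```python
import re
from collections import Counter


def count_names(text, names):
    """Count non-overlapping occurrences of each (multi-character) name."""
    return {name: text.count(name) for name in names}


def most_common_sentences(text, n=5):
    """Split text at sentence-ending punctuation and count repeated sentences."""
    # Split at full-width 。！？ and ASCII .!?
    pieces = re.split(r"[。！？.!?]", text)
    # Trim whitespace and stray quotation marks left over from dialogue
    sentences = [p.strip(" \t\r\n“”‘’\"'") for p in pieces]
    return Counter(s for s in sentences if s).most_common(n)


if __name__ == "__main__":
    sample = "刘峰来了。何小嫚看着刘峰。好的！好的！"
    # The names list is a hypothetical example, not taken from the article
    print(count_names(sample, ["刘峰", "何小嫚"]))
    print(most_common_sentences(sample, 1))
```

On this sample, 刘峰 is counted twice and 何小嫚 once, and the most repeated sentence is 好的.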
The above is the full content of "How to count word frequency in text in Python". For more, see the related articles on the PHP Chinese website.
