Counting Word Frequency in a File Using Python


Jennifer Aniston

Mar 06, 2025, 11:59 AM

This tutorial shows how to quickly determine the main topic of a document by analyzing its word frequencies with Python. Counting word occurrences by hand is tedious; this automated approach simplifies the process.

We will use a sample text file, test.txt (download it, but don't peek!), to illustrate. The goal is to guess the subject of the tutorial it contains from its word frequencies.

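Understanding regular expressions

This process uses regular expressions (regex). If you are unfamiliar with them, a regular expression is a sequence of characters that defines a search pattern for string matching (as in "find and replace"). For a deeper dive, see a dedicated regular expression tutorial.

Building the program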

Reading the file:

The program starts by reading the text file into a string and converting it to lowercase:

# The with block closes the file once it has been read
with open('test.txt', 'r') as document_text:
    text_string = document_text.read().lower()

Regular expression:

A regular expression then keeps only words made of 3 to 15 lowercase letters:
```
match_pattern = re.findall(r'\b[a-z]{3,15}\b', text_string)
```
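
To see what this pattern keeps and what it drops, it can be tried on a short string. This is a minimal sketch; the sample sentence is made up for illustration and is not taken from test.txt:

```python
import re

# Hypothetical sample sentence (not from test.txt)
sample = "An ox and a Python script count words, but supercalifragilisticexpialidocious is too long."

# Same pattern as in the tutorial: whole words of 3 to 15 lowercase letters
words = re.findall(r'\b[a-z]{3,15}\b', sample.lower())
print(words)
```

Words shorter than three letters (an, ox, a, is) are dropped, and so is the overlong word, because the `\b` anchors on both sides require the match to cover a whole word rather than a 15-letter slice of it.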
Word frequency:

A dictionary tracks how many times each word occurs:

frequency = {}
for word in match_pattern:
    count = frequency.get(word, 0)
    frequency[word] = count + 1
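
The effect of `frequency.get(word, 0)` is easiest to see on a tiny input. The sketch below runs the same loop on a hypothetical list that stands in for `match_pattern`:

```python
# Hypothetical list standing in for match_pattern
match_pattern = ['python', 'file', 'python', 'word', 'file', 'python']

frequency = {}
for word in match_pattern:
    # get() returns the current count, or 0 if the word has not been seen yet
    count = frequency.get(word, 0)
    frequency[word] = count + 1

print(frequency)
```

Because `get()` supplies a default of 0, the loop needs no separate check for words it meets for the first time.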

Output:

The program then prints each word with its frequency:

frequency_list = frequency.keys()
for word in frequency_list:
    print(word, frequency[word])

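Complete program

The complete script is shown below. Running it prints the word-frequency list; the most frequent words hint at the topic of the original tutorial.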

import re

frequency = {}
# Read the whole file and lowercase it so that counting is case-insensitive
with open('test.txt', 'r') as document_text:
    text_string = document_text.read().lower()
# Keep only words made of 3 to 15 letters
match_pattern = re.findall(r'\b[a-z]{3,15}\b', text_string)

# Tally each word; get() returns 0 the first time a word is seen
for word in match_pattern:
    count = frequency.get(word, 0)
    frequency[word] = count + 1

for word in frequency:
    print(word, frequency[word])

Handling large text files

For large files, sorting the frequency dictionary makes it easier to find the most frequent words. The script below prints a sorted list in which the most frequent words appear first:

import re

frequency = {}
# Example input: dracula.txt
with open('dracula.txt', 'r') as document_text:
    text_string = document_text.read().lower()
match_pattern = re.findall(r'\b[a-z]{3,15}\b', text_string)

for word in match_pattern:
    count = frequency.get(word, 0)
    frequency[word] = count + 1

# Sort by count, highest first (dicts keep insertion order in Python 3.7+)
most_frequent = dict(sorted(frequency.items(), key=lambda elem: elem[1], reverse=True))

for word, count in most_frequent.items():
    print(word, count)
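
When only the top few words matter, the standard library's `collections.Counter` is one alternative to sorting the whole dictionary. This is a sketch of that alternative, not the approach used in the script above; the sample text is hypothetical and stands in for the contents of a large file:

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the contents of a large file
text_string = "the bat and the wolf and the bat and the night"
match_pattern = re.findall(r'\b[a-z]{3,15}\b', text_string)

# Counter tallies the words; most_common(3) returns the three highest counts
top_words = Counter(match_pattern).most_common(3)
for word, count in top_words:
    print(word, count)
```

Here `most_common(3)` returns a list of `(word, count)` pairs, so the long tail of rare words is never printed.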

Excluding common words

Very common words tend to dominate the list. Filtering them out, for example with a blacklist, gives a more focused analysis.
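
One way to exclude common words is to check each word against a blacklist while counting. The sketch below assumes a small hypothetical blacklist and a short sample string in place of a file; both are illustrative and should be adapted to the text being analyzed:

```python
import re

# Hypothetical blacklist of common words; extend it for your own text
blacklist = {'the', 'and', 'was', 'that', 'with'}

# Hypothetical sample text standing in for the file contents
text_string = "The count was in the castle, and the castle was dark with the night."
match_pattern = re.findall(r'\b[a-z]{3,15}\b', text_string.lower())

frequency = {}
for word in match_pattern:
    if word in blacklist:
        continue  # skip common words so they do not crowd the result
    frequency[word] = frequency.get(word, 0) + 1

# Print the remaining words, most frequent first
for word, count in sorted(frequency.items(), key=lambda elem: elem[1], reverse=True):
    print(word, count)
```

A set is used for the blacklist so that each membership check stays fast even when the list of excluded words grows.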

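This extended Python script provides a way to analyze a text by word frequency and identify its key topics. Remember to adapt the blacklist and the word-length criteria (3 to 15 letters here) to your specific needs.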
