
Python Counter: How to Use collections.Counter

Reposted by 王林: 2023-05-08 13:34:07 (1194 views)

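1. Introduction

The Counter tool provides fast, convenient tallies. Counter is a dict subclass for counting hashable objects. It is a collection in which elements are stored as dictionary keys and their counts as dictionary values. Counts may be any integer value, including zero and negative numbers. The Counter class is similar to bags or multisets in other languages. Put simply, it counts how often things occur, so let's look at a few examples.
Examples: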

# Find the ten most common words
from collections import Counter
import re
text = 'remove an existing key one level down remove an existing key one level down'
words = re.findall(r'\w+', text)
Counter(words).most_common(10)
[('remove', 2), ('an', 2), ('existing', 2), ('key', 2), ('one', 2), ('level', 2), ('down', 2)]


# Count the words in a list
cnt = Counter()
for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
    cnt[word] += 1
cnt
Counter({'blue': 3, 'red': 2, 'green': 1})


# The loop above is a little verbose; passing the list straight to Counter is simpler
L = ['red', 'blue', 'red', 'green', 'blue', 'blue']
Counter(L)
Counter({'blue': 3, 'red': 2, 'green': 1})

Elements are counted from an iterable or initialized from another mapping (or counter):

from collections import Counter

# Count the characters of a string
Counter('gallahad')
Counter({'a': 3, 'l': 2, 'g': 1, 'h': 1, 'd': 1})

# Initialize from a dict
Counter({'red': 4, 'blue': 2})
Counter({'red': 4, 'blue': 2})

# Initialize from keyword arguments
Counter(cats=4, dogs=8)
Counter({'dogs': 8, 'cats': 4})

# Count the elements of a list
Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
Counter({'blue': 3, 'red': 2, 'green': 1})
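Two properties mentioned in the introduction are easy to miss in the examples above: looking up an element that was never counted returns 0 instead of raising KeyError, and counts are ordinary integers that may drop to zero or below. A minimal sketch (the name `inventory` is purely illustrative):

```python
from collections import Counter

inventory = Counter(['apple', 'pear', 'apple'])

# A missing element reports a count of 0 rather than raising KeyError.
missing = inventory['banana']

# The lookup above does not create an entry for the missing element.
has_banana_key = 'banana' in inventory

# Counts are plain integers, so zero and negative values are allowed.
inventory['pear'] -= 3
pear_count = inventory['pear']

print(missing, has_banana_key, pear_count)
```

This is the main practical difference from a plain dict, where `inventory['banana']` would raise KeyError.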

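2. Basic operations

1. Counting how often each element occurs in an iterable

1.1 Applying Counter to a list or string

There are two ways to use it: call Counter directly inside an expression, or create an instance and keep it in a variable. If you need the result more than once, the second is clearly tidier, because every Counter method can then be called on the instance. The same pattern works for any other iterable.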

# Import the class first
from collections import Counter
# Applied to a list
list_01 = [1,9,9,5,0,8,0,9]  # sample digits (birthday of 陈珂, GNZ48)
print(Counter(list_01))  #Counter({9: 3, 0: 2, 1: 1, 5: 1, 8: 1})

# Applied to a string
temp = Counter('abcdeabcdabcaba')
print(temp)  #Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1})
# These are the two styles: direct use, and use through an instance, which is tidier when the result is reused

1.2 Printing the result

# Check the type
print( type(temp) ) #<class 'collections.Counter'>

# Convert to a dict and print it (Python 3.7+ keeps insertion order)
print( dict(temp) ) #{'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1}

# Each pair is an (element, count) tuple; num is the index from enumerate and is not used
for num, pair in enumerate(dict(temp).items()):
    print(pair)
"""
('a', 5)
('b', 4)
('c', 3)
('d', 2)
('e', 1)
"""

1.3 Printing with the built-in items() method

This is clearly more convenient than converting to a dict first:

print(temp.items()) #dict_items([('a', 5), ('b', 4), ('c', 3), ('d', 2), ('e', 1)])

for item in temp.items():
    print(item)
"""
('a', 5)
('b', 4)
('c', 3)
('d', 2)
('e', 1)
"""

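2. most_common(): the most frequent elements

most_common() returns a list of the n most common elements and their counts, ordered from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered by first occurrence. It is often used to find the top word frequencies: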

# Find the most frequent elements in a sequence

from collections import Counter

list_01 = [1,9,9,5,0,8,0,9]
temp = Counter(list_01)

# The single most frequent element
print(temp.most_common(1))   #[(9, 3)]  the element 9 occurs 3 times
print(temp.most_common(2)) #[(9, 3), (0, 2)]  the two most frequent elements

# With no n given, every element is listed
print(temp.most_common())  #[(9, 3), (0, 2), (1, 1), (5, 1), (8, 1)]

Counter('abracadabra').most_common(3)
[('a', 5), ('b', 2), ('r', 2)]

Counter('abracadabra').most_common(5)
[('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
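The tie-breaking rule described above (equal counts keep first-occurrence order) can be checked with a small sketch. It assumes Python 3.7 or later, where dicts preserve insertion order; the `votes` data is made up for illustration:

```python
from collections import Counter

votes = Counter(['tea', 'coffee', 'tea', 'coffee', 'water'])

# 'tea' and 'coffee' are tied at 2; 'tea' was seen first, so it is listed first.
top_two = votes.most_common(2)

# Omitting n (or passing None) returns every element, most common first.
everything = votes.most_common()

print(top_two)
print(everything)
```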

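3. elements() and sorted()

Description: elements() returns an iterator that repeats each element as many times as its count. Elements are returned in the order they were first encountered. If an element's count is less than 1, elements() ignores it.
Example: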

c = Counter(a=4, b=2, c=0, d=-2)
list(c.elements())
['a', 'a', 'a', 'a', 'b', 'b']

sorted(c.elements())
['a', 'a', 'a', 'a', 'b', 'b']

c = Counter(a=4, b=2, c=0, d=5)
list(c.elements())
['a', 'a', 'a', 'a', 'b', 'b', 'd', 'd', 'd', 'd', 'd']

from collections import Counter

c = Counter('ABCABCCC')
print(c.elements()) #<itertools.chain object at 0x0000027D94126860>

# Convert the iterator to a list
print(list(c.elements())) #['A', 'A', 'B', 'B', 'C', 'C', 'C', 'C']

# Or sort the elements
print(sorted(c.elements()))  #['A', 'A', 'B', 'B', 'C', 'C', 'C', 'C']

# Calling sorted() on the Counter itself lists the unique elements
# For example
print( sorted(c) ) #['A', 'B', 'C']

Example from the official documentation:

# Knuth's example for prime factors of 1836:  2**2 * 3**3 * 17**1
prime_factors = Counter({2: 2, 3: 3, 17: 1})
product = 1
for factor in prime_factors.elements():  # loop over factors
    product *= factor  # and multiply them
print(product)  #1836
#1836 = 2*2*3*3*3*17
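Because elements() skips counts below 1, expanding a Counter and counting the result again recovers only the positive counts. The sketch below illustrates this round trip; the name `counts` is illustrative, and the comparison with unary `+` relies on the documented behaviour that `+counter` drops zero and negative counts:

```python
from collections import Counter

counts = Counter(a=2, b=1, c=0, d=-1)

# elements() skips 'c' and 'd' because their counts are below 1.
expanded = list(counts.elements())

# Re-counting the expanded elements recovers only the positive counts,
# which is the same result as the unary + operator.
rebuilt = Counter(expanded)

print(expanded)
print(rebuilt == +counts)
```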

4. subtract(): subtraction that keeps zero and negative counts

subtract() subtracts elements from an iterable or from another mapping (or counter). Both inputs and outputs may be zero or negative.

c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)
c.subtract(d)
c
Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})

# Subtract one each of a, b, c and d
str0 = Counter('aabbccdde')
str0
Counter({'a': 2, 'b': 2, 'c': 2, 'd': 2, 'e': 1})

str0.subtract('abcd')
str0
Counter({'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1})

subtract_test01 = Counter("AAB")
subtract_test01.subtract("BCC")
print(subtract_test01)  #Counter({'A': 2, 'B': 0, 'C': -2})

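Counts can drop to zero here, and the result may contain zeros and negative numbers:

subtract_test02 = Counter("which")
subtract_test02.subtract("witch")  # subtract the elements of another iterable
subtract_test02.subtract(Counter("watch"))  # a Counter works as the argument too

# Check the result
print( subtract_test02["h"] )  # 0: "which" has two, minus one from "witch", minus one from "watch"
print( subtract_test02["w"] )  #-1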
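It helps to see subtract() next to the `-` operator, since the two treat non-positive results differently. A minimal sketch with made-up `stock` and `order` data:

```python
from collections import Counter

stock = Counter(apples=3, pears=1)
order = Counter(apples=1, pears=2)

# The - operator returns a new Counter and drops counts of zero or less.
with_operator = stock - order

# subtract() modifies the Counter in place and keeps the negative count.
with_method = Counter(stock)   # copy, so stock itself stays unchanged
with_method.subtract(order)

print(with_operator)
print(with_method)
```

Use subtract() when a shortfall (here, one pear too few) is information you want to keep, and `-` when only what remains matters.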

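5. Dictionary methods

The usual dictionary methods are available on Counter objects, except for two that behave differently from dict:

fromkeys(iterable): this class method is not implemented for Counter objects.
update([iterable-or-mapping]): counts elements from an iterable, or adds counts from another mapping (or counter). Unlike dict.update(), counts are added rather than replaced. The iterable is also expected to be a sequence of elements, not a sequence of (key, value) pairs.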
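The two differences listed above can be seen side by side in a short sketch. The names `tally` and `plain` are illustrative only:

```python
from collections import Counter

tally = Counter(a=1)

# update() with a mapping adds to existing counts instead of replacing them.
tally.update({'a': 2, 'b': 1})

# update() with an iterable counts each element once per occurrence.
tally.update('ab')

# dict.update() on a plain dict replaces the value, for comparison.
plain = {'a': 1}
plain.update({'a': 2})

# fromkeys() is deliberately unsupported on Counter.
try:
    Counter.fromkeys('ab')
    fromkeys_supported = True
except NotImplementedError:
    fromkeys_supported = False

print(tally, plain, fromkeys_supported)
```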

sum(c.values())                 # total of all counts
c.clear()                       # reset all counts
list(c)                         # list unique elements
set(c)                          # convert to a set
dict(c)                         # convert to a regular dictionary
c.items()                       # convert to a list of (elem, cnt) pairs
Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
c.most_common()[:-n-1:-1]       # n least common elements
+c                              # remove zero and negative counts
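most_common() ranks from most to least frequent, so the n least common elements come from reading the full ranking backwards with a slice, as in the official documentation's list of common patterns. A small sketch using `letters` as an illustrative name:

```python
from collections import Counter

letters = Counter('abracadabra')   # a: 5, b: 2, r: 2, c: 1, d: 1
n = 2

# Take the full ranking and read it backwards for the n rarest elements.
least_common = letters.most_common()[:-n-1:-1]

print(least_common)
```

Note that `most_common(n)` on its own always returns the n most common elements, never the least common.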

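6. Mathematical operations

Counter provides several mathematical operations for combining Counter objects into multisets (counters whose counts are all greater than zero). Addition and subtraction combine counters by adding or subtracting the counts of corresponding elements. Intersection and union return the minimum and maximum of the corresponding counts. Each operation accepts inputs with signed counts, but the output excludes any result whose count is zero or less.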

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)
c + d                       # add two counters together:  c[x] + d[x]
Counter({'a': 4, 'b': 3})
c - d                       # subtract (keeping only positive counts)
Counter({'a': 2})
c & d                       # intersection:  min(c[x], d[x])
Counter({'a': 1, 'b': 1})
c | d                       # union:  max(c[x], d[x])
Counter({'a': 3, 'b': 2})

print(Counter('AAB') + Counter('BCC'))
#Counter({'A': 2, 'B': 2, 'C': 2})
print(Counter("AAB")-Counter("BCC"))
#Counter({'A': 2})

"AND" and "OR" operations (intersection and union):

print(Counter('AAB') & Counter('BBCC'))
#Counter({'B': 1})

print(Counter('AAB') | Counter('BBCC'))
#Counter({'A': 2, 'B': 2, 'C': 2})

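Unary plus and minus are shortcuts for adding the counter to an empty counter or subtracting it from an empty counter. In effect each count is multiplied by +1 or -1, and, as with the binary operators, counts that end up zero or less are left out of the output: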

c = Counter(a=2, b=-4)
+c
Counter({'a': 2})
-c
Counter({'b': 4})
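Because subtraction drops non-positive counts, an empty result from `a - b` means every count in `a` is covered by `b`. The sketch below uses that to test whether one multiset fits inside another. The function name `can_build` and the word examples are hypothetical, chosen only for illustration:

```python
from collections import Counter

def can_build(word, letters):
    """Return True if `word` can be spelled using the characters in `letters`."""
    needed = Counter(word)
    available = Counter(letters)
    # Subtraction keeps only positive counts, so anything left over
    # is a letter that `word` needs more often than `letters` supplies.
    shortfall = needed - available
    return not shortfall

print(can_build('noon', 'onion'))
print(can_build('moon', 'onion'))
```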

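As an application, the following function computes a weighted similarity between two texts:

from collections import Counter

def str_sim(str_0, str_1, topn):
    # Keep only the topn most frequent characters of each string
    topn = int(topn)
    collect0 = Counter(dict(Counter(str_0).most_common(topn)))
    collect1 = Counter(dict(Counter(str_1).most_common(topn)))
    intersection = collect0 & collect1   # min of the two counts per character
    union = collect0 | collect1          # max of the two counts per character
    # Shared counts divided by combined counts
    sim = float(sum(intersection.values())) / float(sum(union.values()))
    return sim

str_0 = '定位手机定位汽车定位GPS定位人定位位置查询'
str_1 = '导航定位手机定位汽车定位GPS定位人定位位置查询'

str_sim(str_0, str_1, 5)
0.75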
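The ratio computed by str_sim (total of the intersection over total of the union) is a multiset form of the Jaccard index. One edge case is worth noting: if both inputs are empty, the union total is 0 and the division raises ZeroDivisionError. Below is a sketch of the same ratio without the topn cut-off and with a guard. The function name `multiset_jaccard` and the choice to return 0.0 for two empty inputs are assumptions, not part of the original article:

```python
from collections import Counter

def multiset_jaccard(items_a, items_b):
    """Ratio of shared counts to combined counts for two iterables."""
    a, b = Counter(items_a), Counter(items_b)
    shared = sum((a & b).values())     # sum of min(a[x], b[x])
    combined = sum((a | b).values())   # sum of max(a[x], b[x])
    if combined == 0:                  # both inputs were empty
        return 0.0
    return shared / combined

print(multiset_jaccard('aab', 'abb'))
print(multiset_jaccard('', ''))
```

Passing lists of words instead of strings compares texts at word level rather than character level.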

7. Totals, keys() and values()

from collections import Counter

c = Counter('ABCABCCC')
print(sum(c.values()))  # 8  total of all counts

print(c.keys())  #dict_keys(['A', 'B', 'C'])
print(c.values())  #dict_values([2, 2, 4])

8. Looking up the count of a single element

from collections import Counter
c = Counter('ABBCC')
# Look up the count of one specific element
print(c["A"])  #1

9. Adding elements

for elem in 'ADD':  # update counts from an iterable
    c[elem] += 1
print(c.most_common())  #[('A', 2), ('B', 2), ('C', 2), ('D', 2)]
# "A" went up by one and two new "D"s were added

10. Deleting elements (del)

del c["D"]
print(c.most_common())  #[('A', 2), ('B', 2), ('C', 2)]
del c["C"]
print(c.most_common())  #[('A', 2), ('B', 2)]
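del is the way to remove an entry entirely; setting a count to zero leaves the key in place. A minimal sketch of the difference, using a separate illustrative variable `letters` so the running example is not disturbed:

```python
from collections import Counter

letters = Counter('AABB')

# Setting a count to zero keeps the key in the Counter.
letters['A'] = 0
still_present = 'A' in letters

# del removes the entry entirely.
del letters['A']
present_after_del = 'A' in letters

print(still_present, present_after_del, letters['A'])
```

After the deletion, `letters['A']` still evaluates to 0, because missing elements always report a count of zero.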

11. Updating with update()

d = Counter("CCDD")
c.update(d)
print(c.most_common())  #[('A', 2), ('B', 2), ('C', 2), ('D', 2)]

12. Clearing with clear()

c.clear()
print(c)  #Counter()

3. Summary

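Counter is a dict subclass used mainly to count how often objects occur.

Commonly used methods:

elements(): returns an iterator that repeats each element as many times as its count; elements with a count below 1 are ignored
most_common([n]): returns a list of the n most frequent elements and their counts
subtract([iterable-or-mapping]): subtracts elements from an iterable or mapping; inputs and outputs may be zero or negative, unlike the - operator, which drops counts of zero or less
update([iterable-or-mapping]): counts elements from an iterable, or adds counts from another mapping (or counter)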

Examples:

# Count how often each character occurs
>>> import collections
>>> collections.Counter('hello world')
Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})
# Count words
>>> collections.Counter('hello world hello world hello nihao'.split())
Counter({'hello': 3, 'world': 2, 'nihao': 1})

Common usage:

>>> c = collections.Counter('hello world hello world hello nihao'.split())
>>> c
Counter({'hello': 3, 'world': 2, 'nihao': 1})
# Get the count for a given object; c.get('hello') works as well
>>> c['hello']
3
>>> c = collections.Counter('hello world hello world hello nihao'.split())
# List the elements
>>> list(c.elements())
['hello', 'hello', 'hello', 'world', 'world', 'nihao']
# Add counts; c.update(d) does the same in place
>>> c = collections.Counter('hello world hello world hello nihao'.split())
>>> d = collections.Counter('hello world'.split())
>>> c
Counter({'hello': 3, 'world': 2, 'nihao': 1})
>>> d
Counter({'hello': 1, 'world': 1})
>>> c + d
Counter({'hello': 4, 'world': 3, 'nihao': 1})
# Subtract counts; c.subtract(d) does so in place and keeps zero or negative counts
>>> c - d
Counter({'hello': 2, 'world': 1, 'nihao': 1})
# Clear
>>> c.clear()
>>> c
Counter()


Notice:

This article was reposted from yisu.com. If it infringes any rights, contact admin@php.cn.
