How do you efficiently group data in Python based on a specific key, and what are the different methods available for this task?
In Python, grouping data by a key means collecting items that share a common attribute into the same bucket. The standard library offers several ways to do this, including collections.defaultdict, itertools.groupby, and collections.OrderedDict. Let's explore how each one works.
Consider a list of (value, type) pairs that we want to group by type. One option is the collections.defaultdict class, a dict subclass that calls a factory function (here, list) to create a value the first time a missing key is accessed. This lets us append to a key without first checking whether it exists, and the whole input is grouped in a single pass.
<code class="python">from collections import defaultdict

# Note: this name shadows the built-in input(); it is kept because the later snippets reuse it.
input = [
    ('11013331', 'KAT'),
    ('9085267', 'NOT'),
    ('5238761', 'ETH'),
    ('5349618', 'ETH'),
    ('11788544', 'NOT'),
    ('962142', 'ETH'),
    ('7795297', 'ETH'),
    ('7341464', 'ETH'),
    ('9843236', 'KAT'),
    ('5594916', 'ETH'),
    ('1550003', 'ETH'),
]

# Each missing key starts out as an empty list.
res = defaultdict(list)
for v, k in input:
    res[k].append(v)

print([{'type': k, 'items': v} for k, v in res.items()])</code>
Output (Python 3.7+, where groups appear in the order their keys were first seen):
[{'type': 'KAT', 'items': ['11013331', '9843236']}, {'type': 'NOT', 'items': ['9085267', '11788544']}, {'type': 'ETH', 'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003']}]
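To make the "missing keys are automatically initialized" behavior concrete, here is a minimal sketch that contrasts a plain dict with a defaultdict. The variable names (plain, groups, missing_key) are illustrative only and do not come from the article.

```python
from collections import defaultdict

# A plain dict raises KeyError when we append to a key that does not exist yet.
plain = {}
try:
    plain["ETH"].append("5238761")
except KeyError as exc:
    missing_key = exc.args[0]  # the key that was not found

# A defaultdict calls its factory (list) to create the missing value instead.
groups = defaultdict(list)
groups["ETH"].append("5238761")
groups["ETH"].append("5349618")

# Caveat: merely reading a missing key with [] also inserts it.
_ = groups["KAT"]

# .get() does not trigger the factory, so it leaves the mapping unchanged.
not_value = groups.get("NOT")
```

One consequence worth keeping in mind: checking membership with `in` or using `.get()` is the safer way to probe a defaultdict, because indexing with `[]` silently creates empty groups.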
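Another approach uses itertools.groupby. This function only groups consecutive elements that share the same key, so the input must first be sorted by that key; otherwise the same key can appear in several separate groups. The sort makes this approach O(n log n), compared with a single O(n) pass for defaultdict, and the groups come out in sorted key order.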
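<code class="python">import itertools
from operator import itemgetter

# Sort by the type (second element) so equal keys become adjacent.
sorted_input = sorted(input, key=itemgetter(1))

# groupby yields (key, iterator over the consecutive items with that key).
groups = itertools.groupby(sorted_input, key=itemgetter(1))

print([{'type': k, 'items': [x[0] for x in v]} for k, v in groups])</code>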
Output (Python 3.7+, groups in sorted key order):
[{'type': 'ETH', 'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003']}, {'type': 'KAT', 'items': ['11013331', '9843236']}, {'type': 'NOT', 'items': ['9085267', '11788544']}]
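The need for sorting is easy to overlook, so the following sketch shows what itertools.groupby does with unsorted input. The three-pair list named pairs is made-up sample data for illustration, not the article's dataset.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: 'KAT' appears twice, but not consecutively.
pairs = [('1', 'KAT'), ('2', 'NOT'), ('3', 'KAT')]

# Without sorting, each run of equal keys becomes its own group,
# so 'KAT' shows up twice.
unsorted_keys = [k for k, _ in groupby(pairs, key=itemgetter(1))]

# After sorting by the same key, every key appears exactly once.
sorted_pairs = sorted(pairs, key=itemgetter(1))
sorted_groups = {
    k: [value for value, _ in group]
    for k, group in groupby(sorted_pairs, key=itemgetter(1))
}
```

Here unsorted_keys is ['KAT', 'NOT', 'KAT'], while sorted_groups is {'KAT': ['1', '3'], 'NOT': ['2']}. Because sorted() is stable, the values inside each group keep their original relative order.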
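Before Python 3.7, the language did not guarantee that dictionaries preserve insertion order (CPython 3.6 did so only as an implementation detail). On those versions, collections.OrderedDict can be used to keep the groups in the order their keys were first seen.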
<code class="python">from collections import OrderedDict

res = OrderedDict()
for v, k in input:
    if k in res:
        res[k].append(v)   # key already seen: extend its group
    else:
        res[k] = [v]       # first occurrence: start a new group

print([{'type': k, 'items': v} for k, v in res.items()])</code>
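However, in Python 3.7 and later, regular dictionaries are guaranteed to preserve insertion order, so OrderedDict is unnecessary for this purpose. A plain dict or a defaultdict already keeps groups in first-seen order.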
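On Python 3.7+, the same order-preserving grouping can be sketched with a regular dict and dict.setdefault. The helper name group_pairs and its sample input are hypothetical, chosen here only to illustrate the idea.

```python
def group_pairs(pairs):
    """Group (value, key) pairs into a dict of key -> list of values.

    Relies on Python 3.7+ dicts preserving insertion order, so the
    groups appear in the order each key was first seen.
    """
    groups = {}
    for value, key in pairs:
        # setdefault returns the existing list, or inserts and returns a new one.
        groups.setdefault(key, []).append(value)
    return groups


# Hypothetical sample data in the same (value, type) shape as the article's list.
sample = [('11013331', 'KAT'), ('9085267', 'NOT'), ('9843236', 'KAT')]
result = group_pairs(sample)
```

With this sample, result is {'KAT': ['11013331', '9843236'], 'NOT': ['9085267']}. Returning a plain dict also avoids the defaultdict side effect of creating keys on lookup.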