How to Use Pandas GroupBy to Calculate Group-Wise Statistics in Python

Barbara Streisand

Dec 21, 2024, 09:18 PM

Calculating Group-Wise Statistics with Pandas GroupBy

Introduction

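When working with data, you often need to compute and compare statistics across groups. Pandas, a widely used Python library for data manipulation, provides GroupBy functionality that makes these operations straightforward.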

Getting Row Counts per Group

The simplest way to get the row count for each group is the .size() method. It returns a Series of counts indexed by the group keys:

df.groupby(['col1','col2']).size()
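As a minimal sketch of what this call returns, assume a small made-up DataFrame (the values below are illustrative and not from the original article):

```python
import pandas as pd

# Hypothetical sample data; the values are illustrative only
df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'B', 'B'],
    'col2': ['x', 'x', 'y', 'x', 'x'],
    'col3': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# size() counts the rows in each (col1, col2) group and returns a Series
# whose index is a MultiIndex built from the group keys
group_sizes = df.groupby(['col1', 'col2']).size()
print(group_sizes)
```

Individual counts can then be looked up by key, for example group_sizes.loc[('A', 'x')].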

To get the counts in tabular form (that is, as a DataFrame with a "counts" column):

df.groupby(['col1', 'col2']).size().reset_index(name='counts')
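A short sketch of the tabular form, again assuming made-up sample data: reset_index(name='counts') turns the group keys back into ordinary columns and names the count column.

```python
import pandas as pd

# Hypothetical sample data; the values are illustrative only
df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'B', 'B'],
    'col2': ['x', 'x', 'y', 'x', 'x'],
})

# The group keys become regular columns and the sizes land in 'counts'
counts_df = df.groupby(['col1', 'col2']).size().reset_index(name='counts')
print(counts_df)
```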

Calculating Multiple Group Statistics

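To compute several statistics at once, pass a dictionary to the .agg() method. The keys name the columns to aggregate, and each value is a list of the aggregations to apply to that column (for example 'mean', 'median', and 'count'):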

df.groupby(['col1', 'col2']).agg({
    'col3': ['mean', 'count'],
    'col4': ['median', 'min', 'count']
})
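The result of a dictionary-based .agg() call has two-level (nested) column labels. The sketch below, which assumes made-up sample data, shows what those labels look like, how to select one statistic, and one possible way to flatten them; the underscore naming scheme is an illustrative choice, not something the original prescribes.

```python
import pandas as pd

# Hypothetical sample data; the values are illustrative only
df = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'B'],
    'col2': ['x', 'x', 'y', 'y'],
    'col3': [1.0, 3.0, 5.0, 7.0],
    'col4': [10, 20, 30, 50],
})

stats = df.groupby(['col1', 'col2']).agg({
    'col3': ['mean', 'count'],
    'col4': ['median', 'min', 'count'],
})

# Columns form a two-level MultiIndex: (source column, aggregation)
print(stats.columns.tolist())

# Select a single statistic with a tuple key
col3_mean = stats[('col3', 'mean')]

# Optionally flatten the labels into single strings such as 'col3_mean'
flat = stats.copy()
flat.columns = ['_'.join(pair) for pair in flat.columns]
print(flat)
```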

Customizing the Output

For finer control over the output, compute each aggregation separately and join the results:

# Build the GroupBy object once and reuse it for every aggregation
gb = df.groupby(['col1', 'col2'])
counts = gb.size().to_frame(name='counts')
result = (
    counts
    .join(gb.agg({'col3': 'mean'}).rename(columns={'col3': 'col3_mean'}))
    .join(gb.agg({'col4': 'median'}).rename(columns={'col4': 'col4_median'}))
    .join(gb.agg({'col4': 'min'}).rename(columns={'col4': 'col4_min'}))
    .reset_index()
)

This produces a more structured DataFrame whose column labels are flat rather than nested.
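The following is a self-contained sketch of the join approach on made-up sample data. For comparison it also shows pandas' named aggregation (keyword = (column, aggregation), available since pandas 0.25), an alternative the original does not mention but which yields the same flat column layout in a single call.

```python
import pandas as pd

# Hypothetical sample data; the values are illustrative only
df = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'B'],
    'col2': ['x', 'x', 'y', 'y'],
    'col3': [1.0, 3.0, 5.0, 7.0],
    'col4': [10, 20, 30, 50],
})

gb = df.groupby(['col1', 'col2'])

# Join approach: one flat, explicitly named column per statistic
joined = (
    gb.size().to_frame(name='counts')
    .join(gb.agg({'col3': 'mean'}).rename(columns={'col3': 'col3_mean'}))
    .join(gb.agg({'col4': 'median'}).rename(columns={'col4': 'col4_median'}))
    .join(gb.agg({'col4': 'min'}).rename(columns={'col4': 'col4_min'}))
    .reset_index()
)

# Named aggregation: each keyword is an output column name,
# each value is a (source column, aggregation) pair
named = gb.agg(
    counts=('col3', 'size'),
    col3_mean=('col3', 'mean'),
    col4_median=('col4', 'median'),
    col4_min=('col4', 'min'),
).reset_index()

print(joined)
print(named)
```

Which form to prefer is a matter of taste: the join chain keeps each statistic independent, while named aggregation is more compact.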

Footnote

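In the examples above, null values can cause the row counts behind different statistics to disagree: .size() counts every row in a group, whereas 'count' and statistics such as 'mean' and 'median' consider only non-null values. Keep null values in mind when interpreting group-wise statistics.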
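A minimal sketch of this discrepancy, assuming a made-up DataFrame with missing values in col3:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with nulls; the values are illustrative only
df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'B'],
    'col3': [1.0, np.nan, 3.0, np.nan],
})

gb = df.groupby('col1')
summary = pd.DataFrame({
    'size': gb.size(),            # every row in the group, nulls included
    'count': gb['col3'].count(),  # non-null values only
    'mean': gb['col3'].mean(),    # computed over non-null values only
})
print(summary)
```

Here group A has a size of 3 but a count of 2, and group B has a size of 1, a count of 0, and a mean of NaN.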
