如何使用 Z 分數有效檢測和排除 Pandas DataFrame 中的異常值？-Python教學-PHP中文網

首頁

後端開發

Python教學

如何使用 Z 分數有效檢測和排除 Pandas DataFrame 中的異常值？

Mary-Kate Olsen

Dec 01, 2024 am 04:54 AM

How to Effectively Detect and Exclude Outliers in Pandas DataFrames Using Z-scores?

Pandas DataFrame 中的離群值偵測與排除

使用資料集時，辨識並處理離群值至關重要，因為它們可能會影響分析和結果結果。在 pandas 中，可以使用優雅且高效的方法來實現基於特定列值的異常值檢測和排除。

理解問題

給定一個包含多個列的 pandas DataFrame ，某些行可能在特定列中包含異常值，表示為「Vol」。任務是過濾 DataFrame 並排除「Vol」列值顯著偏離平均值的行。

解決方案使用scipy.stats.zscore

來實現這個，我們可以利用scipy.stats.zscore 函數：

import pandas as pd
import numpy as np
from scipy import stats

# Calculate Z-scores for the specified column
z_scores = stats.zscore(df['Vol'])

# Define a threshold for outlier detection (e.g., 3 standard deviations)
threshold = 3

# Create a mask to identify rows with outlier values
mask = np.abs(z_scores) <p></p>這個解決方案提供一種根據指定列值檢測和排除異常值的有效方法。透過使用 Z 分數，我們可以量化各個值與平均值的偏差，並應用閾值來識別異常值。產生的 outlier_filtered_df 將僅包含「Vol」值在指定範圍內的行。

以上是如何使用 Z 分數有效檢測和排除 Pandas DataFrame 中的異常值？的詳細內容。更多資訊請關注PHP中文網其他相關文章！

陳述

本文內容由網友自願投稿，版權歸原作者所有。本站不承擔相應的法律責任。如發現涉嫌抄襲或侵權的內容，請聯絡admin@php.cn

陣列的同質性質如何影響性能？Apr 25, 2025 am 12:13 AM

數組的同質性對性能的影響是雙重的：1)同質性允許編譯器優化內存訪問，提高性能；2)但限制了類型多樣性，可能導致效率低下。總之，選擇合適的數據結構至關重要。

Numpy數組與使用數組模塊創建的數組有何不同？Apr 24, 2025 pm 03:53 PM

numpyArraysareAreBetterFornumericalialoperations andmulti-demensionaldata，而learthearrayModuleSutableforbasic，內存效率段

Numpy數組的使用與使用Python中的數組模塊陣列相比如何？Apr 24, 2025 pm 03:49 PM

numpyArraySareAreBetterForHeAvyNumericalComputing，而lelethearRayModulesiutable-usemoblemory-connerage-inderabledsswithSimpleDatateTypes.1）NumpyArsofferVerverVerverVerverVersAtility andPerformanceForlargedForlargedAtatasetSetsAtsAndAtasEndCompleXoper.2）

CTYPES模塊與Python中的數組有何關係？Apr 24, 2025 pm 03:45 PM

ctypesallowscreatingingangandmanipulatingc-stylarraysinpython.1）usectypestoInterfacewithClibrariesForperfermance.2）createc-stylec-stylec-stylarraysfornumericalcomputations.3）passarraystocfunctions foreforfunctionsforeffortions.however.however，However，HoweverofiousofmemoryManageManiverage，Pressiveo，Pressivero

在Python的上下文中定義'數組”和'列表”。Apr 24, 2025 pm 03:41 PM

Inpython，一個“列表” isaversatile，mutableSequencethatCanholdMixedDatateTypes，而“陣列” isamorememory-sepersequeSequeSequeSequeSequeRingequiringElements.1）列表

Python列表是可變還是不變的？那Python陣列呢？Apr 24, 2025 pm 03:37 PM

pythonlistsandArraysareBothable.1）列表Sareflexibleandsupportereceneousdatabutarelessmory-Memory-Empefficity.2）ArraysareMoremoremoremoreMemoremorememorememorememoremorememogeneSdatabutlesserversEversementime，defteringcorcttypecrecttypececeDepeceDyusagetoagetoavoavoiDerrors。