
Python and Pandas code examples for time series feature extraction

王林
Reposted on 2023-04-12 17:43:08

Extract meaningful features from time series data using Pandas and Python, including moving averages, autocorrelation, and Fourier transforms.

Preface

Time series analysis is a powerful tool for understanding and predicting trends in industries such as finance, economics, and healthcare. Feature extraction is a key step in this process: it converts raw data into meaningful features that can be used to train models for prediction and analysis. In this article, we will explore time series feature extraction techniques using Python and Pandas.

Before we delve into feature extraction, let’s briefly review time series data. Time series data is a sequence of data points indexed in time order. Examples of time series data include stock prices, temperature measurements, and traffic data. Time series data can be univariate or multivariate. Univariate time series data has only one variable, while multivariate time series data has multiple variables.
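A minimal sketch with made-up temperature and humidity values: in Pandas, a univariate time series is commonly held in a Series and a multivariate one in a DataFrame, both indexed by a DatetimeIndex.

```python
import pandas as pd

# Daily timestamps used as the time index (illustrative values only)
idx = pd.date_range("2022-01-01", periods=4, freq="D")

# Univariate: one variable observed over time -> a Series
temperature = pd.Series([21.5, 22.0, 19.8, 20.4], index=idx, name="temperature")

# Multivariate: several variables sharing the same time index -> a DataFrame
weather = pd.DataFrame(
    {
        "temperature": [21.5, 22.0, 19.8, 20.4],
        "humidity": [0.55, 0.60, 0.72, 0.65],
    },
    index=idx,
)

print(temperature)
print(weather)
```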


There are various feature extraction techniques that can be used for time series analysis. In this article, we will cover the following techniques:

  • Resampling
  • Moving Average
  • Exponential Smoothing
  • Autocorrelation
  • Fourier Transform

1. Resampling

Resampling changes the frequency of a time series. Downsampling to a lower frequency aggregates the observations in each period, which can smooth out noise. Pandas provides the resample() method for both upsampling (to a higher frequency) and downsampling (to a lower frequency). Here is an example of downsampling a time series to daily frequency:

import pandas as pd

# create a time series with minute frequency
# ('min' is the minute alias; the older 'T' alias is deprecated in recent pandas)
ts = pd.Series([1, 2, 3, 4, 5], index=pd.date_range('2022-01-01', periods=5, freq='min'))

# downsample to daily frequency, summing the values in each day
daily_ts = ts.resample('D').sum()

print(daily_ts)

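In the above example, we create a time series with minute frequency and then downsample it to daily frequency with resample('D'), summing the values in each day. Since all five minutes fall on 2022-01-01, the result is a single value, 15.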
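The text mentions upsampling too, but the example only downsamples. The sketch below, on hypothetical minute-level data, downsamples with several aggregates at once (each can serve as a feature) and then upsamples back to a finer grid. Linear interpolation is just one way to fill the new timestamps; forward filling is another.

```python
import pandas as pd

# Hypothetical minute-level readings covering one hour: values 0, 1, ..., 59
idx = pd.date_range("2022-01-01", periods=60, freq="min")
ts = pd.Series(range(60), index=idx, dtype="float64")

# Downsample to 15-minute bins; each aggregate becomes a candidate feature
quarter = ts.resample("15min").agg(["mean", "min", "max"])
print(quarter)

# Upsample the 15-minute means to 5-minute steps:
# asfreq() inserts NaN at the new timestamps, interpolate() fills them linearly
# (use .ffill() instead to carry the last known value forward)
five_min = quarter["mean"].resample("5min").asfreq().interpolate()
print(five_min)
```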


2. Moving Average

Moving average smooths time series data by averaging over a rolling window. It can help remove noise and reveal trends in the data. Pandas provides the rolling() method to create rolling windows over a time series; calling mean() on the result gives the moving average. Here is an example:

import pandas as pd

# create a time series
ts = pd.Series([1, 2, 3, 4, 5])

# calculate the rolling mean with a window size of 3
rolling_mean = ts.rolling(window=3).mean()

print(rolling_mean)

We create a time series and then use the rolling() method to calculate a moving average with a window size of 3.


The first two values are NaN because the window does not yet contain 3 observations (by default, min_periods equals the window size). If needed, you can fill them with the fillna() method.
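Two common ways to handle those leading NaN values are sketched below on the same toy series: lowering min_periods so that early windows use whatever data is available, or filling afterwards with fillna(). Filling with 0 is only an illustrative choice; the right fill value depends on the model. Other window statistics, such as a rolling standard deviation, are computed the same way.

```python
import pandas as pd

ts = pd.Series([1, 2, 3, 4, 5])

# Option 1: min_periods=1 lets the first windows average what is available
partial = ts.rolling(window=3, min_periods=1).mean()
# -> 1.0, 1.5, 2.0, 3.0, 4.0

# Option 2: keep the default and fill the leading NaNs afterwards (here with 0)
filled = ts.rolling(window=3).mean().fillna(0)
# -> 0.0, 0.0, 2.0, 3.0, 4.0

# Other window statistics work the same way, e.g. a rolling (sample) standard deviation
rolling_std = ts.rolling(window=3).std()

print(partial)
print(filled)
print(rolling_std)
```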

3. Exponential Smoothing

Exponential smoothing smooths time series data by giving more weight to recent values. It can help remove noise and reveal trends in the data. Pandas provides the ewm() method for calculating exponentially weighted moving averages.

import pandas as pd

ts = pd.Series([1, 2, 3, 4, 5])

# exponentially weighted moving average with smoothing factor alpha = 0.5
print(ts.ewm(alpha=0.5).mean())


In the above example, we create a time series and then use the ewm() method to calculate an exponential moving average with a smoothing factor of 0.5.

ewm() has many parameters; here are the main ones.

com: specifies the decay in terms of center of mass, $\alpha = 1/(1 + com)$, for $com \geq 0$.

span: specifies the decay in terms of span, $\alpha = 2/(span + 1)$, for $span \geq 1$.

halflife: specifies the decay in terms of half-life, $\alpha = 1 - \exp(-\ln(2)/halflife)$, for $halflife > 0$.

alpha: specifies the smoothing factor $\alpha$ directly, $0 < \alpha \leq 1$.

All four parameters determine the smoothing factor α: com, span, and halflife derive it from another quantity, while alpha sets it directly. Exactly one of them must be provided. In the example above, we set alpha=0.5 directly.

min_periods: the minimum number of observations in the window required to produce a value. Defaults to 0.
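A quick check, assuming a reasonably recent pandas version: the settings com=1, span=3, and halflife=1 all imply α = 0.5, so they should produce the same result as alpha=0.5, while passing two of them at once should be rejected.

```python
import numpy as np
import pandas as pd

ts = pd.Series([1, 2, 3, 4, 5], dtype="float64")

# Each setting below implies the same smoothing factor alpha = 0.5:
#   com=1      -> alpha = 1 / (1 + 1)         = 0.5
#   span=3     -> alpha = 2 / (3 + 1)         = 0.5
#   halflife=1 -> alpha = 1 - exp(-ln(2) / 1) = 0.5
results = {
    "alpha": ts.ewm(alpha=0.5).mean(),
    "com": ts.ewm(com=1).mean(),
    "span": ts.ewm(span=3).mean(),
    "halflife": ts.ewm(halflife=1).mean(),
}
for name, series in results.items():
    # allclose tolerates rounding from the halflife -> alpha conversion
    print(name, np.allclose(series, results["alpha"]))

# Passing more than one of these parameters is rejected
try:
    ts.ewm(alpha=0.5, span=3)
    raised = False
except ValueError:
    raised = True
print("mixing alpha and span raises ValueError:", raised)
```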

adjust: whether to divide by a decaying adjustment factor in the early periods to account for the imbalance in relative weights. Defaults to True.

When adjust=True, the formula is:

$$y_t = \frac{x_t + (1-\alpha)x_{t-1} + (1-\alpha)^2 x_{t-2} + \dots + (1-\alpha)^t x_0}{1 + (1-\alpha) + (1-\alpha)^2 + \dots + (1-\alpha)^t}$$

When adjust=False, the formula is:

$$y_0 = x_0, \qquad y_t = (1-\alpha)\,y_{t-1} + \alpha x_t$$
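To make the two weighting schemes concrete, the sketch below computes both by hand with NumPy on the toy series from the earlier example (alpha = 0.5) and checks them against pandas. It illustrates the formulas only; it is not how pandas implements ewm() internally.

```python
import numpy as np
import pandas as pd

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
alpha = 0.5

# adjust=True: weighted average of x_0..x_t with weight (1 - alpha)**(t - j) on x_j
adjusted = []
for t in range(len(x)):
    weights = (1 - alpha) ** np.arange(t, -1, -1)  # weights for x_0 (oldest) ... x_t (newest)
    adjusted.append(np.sum(weights * x[: t + 1]) / np.sum(weights))

# adjust=False: recursive form y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t
recursive = [x[0]]
for value in x[1:]:
    recursive.append((1 - alpha) * recursive[-1] + alpha * value)

# Compare the hand-computed values with pandas
ts = pd.Series(x)
print(np.allclose(adjusted, ts.ewm(alpha=alpha, adjust=True).mean()))
print(np.allclose(recursive, ts.ewm(alpha=alpha, adjust=False).mean()))
```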

4. Autocorrelation

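Autocorrelation measures the correlation between a time series and a lagged version of itself. It can identify repeating patterns in the data. Pandas provides the autocorr() method to calculate autocorrelation.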

import pandas as pd

# create a time series
ts = pd.Series([1, 2, 3, 4, 5])

# calculate the autocorrelation with a lag of 1
autocorr = ts.autocorr(lag=1)

print(autocorr)
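The toy series above gives an autocorrelation of 1 (up to rounding), since a straight line is perfectly correlated with its own shifted copy, so it says little about repeating patterns. The sketch below instead uses a synthetic sine wave with a period of 12 steps (an assumption for illustration) and computes autocorr() at several lags: the value is close to 1 at lag 12 (one full period) and close to -1 at lag 6 (half a period). Each lag can be kept as a separate feature; the feature names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic signal with a period of 12 steps, repeated 5 times
t = np.arange(60)
ts = pd.Series(np.sin(2 * np.pi * t / 12))

# Autocorrelation at several lags, collected as candidate features
acf_features = {f"autocorr_lag_{lag}": ts.autocorr(lag=lag) for lag in (1, 3, 6, 12)}

for name, value in acf_features.items():
    print(f"{name}: {value:.3f}")
```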


5. Fourier Transform

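The Fourier transform converts time series data from the time domain to the frequency domain. It can identify periodic patterns in the data. We can use NumPy's np.fft.fft() function to compute the fast Fourier transform of a time series.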

import pandas as pd
import numpy as np

# create a time series
ts = pd.Series([1, 2, 3, 4, 5])

# calculate the Fourier transform
fft = pd.Series(np.fft.fft(ts).real)

print(fft)


Here we show only the real part; the full FFT result is complex-valued.
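The real part alone is hard to interpret. A common alternative, sketched here on a synthetic signal with cycles of 12 and 4 samples (assumed for illustration), is to take the magnitude of np.fft.rfft(), which keeps only the non-negative frequencies of a real-valued input, and to map each bin to a frequency with np.fft.rfftfreq(). The strongest non-zero frequency then gives the dominant period, which can be used as a feature.

```python
import numpy as np
import pandas as pd

# Synthetic signal: a period-12 cycle plus a weaker period-4 cycle, 96 samples
t = np.arange(96)
ts = pd.Series(np.sin(2 * np.pi * t / 12) + 0.5 * np.sin(2 * np.pi * t / 4))

# rfft returns only the non-negative frequencies of a real-valued input
spectrum = np.fft.rfft(ts.to_numpy())
freqs = np.fft.rfftfreq(len(ts), d=1.0)  # in cycles per sample
magnitude = np.abs(spectrum)

# Skip index 0 (the mean, or DC component) and pick the strongest frequency
peak = int(np.argmax(magnitude[1:])) + 1
dominant_period = 1 / freqs[peak]
print(f"dominant period: {dominant_period:.1f} samples")
```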

Summary

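In this article, we introduced several time series feature extraction techniques using Python and Pandas. These techniques help transform raw time series data into meaningful features for analysis and forecasting. When training machine learning models, these features can be fed to the model as additional inputs, which can improve its predictive power.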
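As a rough sketch of that last step, the features below are computed from a hypothetical random-walk series and collected into one DataFrame next to the target. Shifting by one step before computing each feature is a design choice assumed here, so that a row at time t only uses values observed before t; the warm-up rows with incomplete windows are then dropped. The column names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series (a random walk) standing in for real data
rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=60, freq="D")
y = pd.Series(rng.normal(size=60).cumsum(), index=idx, name="y")

# Each column is one feature; shift(1) keeps every feature strictly in the past
past = y.shift(1)
features = pd.DataFrame(
    {
        "lag_1": past,
        "rolling_mean_7": past.rolling(window=7).mean(),
        "ewm_span_7": past.ewm(span=7).mean(),
    },
    index=idx,
)

# Attach the target and drop warm-up rows where the rolling window is incomplete
dataset = features.join(y).dropna()
print(dataset.head())
```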



Statement:
This article is reproduced from 51cto.com.