Extract large amounts of time series features with a small amount of code

王林 | Reposted: 2023-04-25 14:40:08 | 1333 views

Traditional machine learning algorithms do not capture the temporal order of time series data on their own. Data scientists therefore perform feature engineering to condense the important characteristics of each series into a set of summary metrics. Generating a large number of time series features and then picking out the relevant ones by hand is time-consuming and tedious.

A few lines of Python code can extract hundreds of time series features

Python's tsfresh package can generate hundreds of standard features for time series data. This article looks at how to use the tsfresh package.

tsfresh is an open source package that can generate hundreds of time series features. Features generated by tsfresh can be used for classification, forecasting, and outlier detection.

The tsfresh package provides various functions for performing feature engineering on time series data, including:

Feature generation
Feature selection
Big data compatibility

Installing tsfresh is simple. It can be installed with either pip or conda:

pip install -U tsfresh
# or
conda install -c conda-forge tsfresh

1. Feature generation

The tsfresh package provides an automatic feature generation API that can generate more than 750 features from a single time series variable. The generated features cover a wide range, including:

Descriptive statistics (mean, maximum, correlation, etc.)
Physics-based nonlinearity and complexity metrics
Digital signal processing related functions
Historical compression features

Using the tsfresh.extract_features() function, 789 features can be generated from multiple domains for 1 time series variable.

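import pandas as pd
from tsfresh import extract_features

# Read the time series data; keep 'date' as a regular column so it can be used for sorting
df = pd.read_excel("train.xlsx", parse_dates=["date"])

# Automated feature generation.
# NOTE: the column_id value was lost in the original listing; "id" is an assumed
# name for the column that identifies which series each row belongs to.
features = extract_features(df, column_id="id", column_sort="date")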

Because there are too many features to list here, see the official documentation for a detailed description of each one.

2. Feature selection

The tsfresh package also provides a feature selection implementation based on hypothesis testing, which identifies the features that are relevant to the target variable. To limit the number of irrelevant features, tsfresh includes the FRESH algorithm (FRESH stands for feature extraction based on scalable hypothesis tests).
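To make the idea of selection by hypothesis testing more concrete: as far as the tsfresh documentation describes it, each feature receives a p-value from a significance test against the target, and the p-values are then filtered with the Benjamini-Yekutieli procedure to control the share of irrelevant features that slip through. The following is a minimal pure-Python sketch of that filtering step only, not tsfresh's own code. The function name, the feature names, and the p-values are all made up for illustration.

```python
def benjamini_yekutieli(p_values, fdr_level=0.05):
    """Return a list of booleans: True where the feature is kept as relevant."""
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic correction term that keeps the procedure valid under dependence.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at or below its threshold.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr_level / (m * c_m):
            largest_k = rank
    # Keep every feature ranked at or before that position.
    keep = [False] * m
    for idx in order[:largest_k]:
        keep[idx] = True
    return keep


# Hypothetical per-feature p-values (illustrative numbers, not real results).
p_values = {"f_mean": 0.001, "f_noise": 0.8, "f_trend": 0.02, "f_other": 0.5}
mask = benjamini_yekutieli(list(p_values.values()))
relevant = [name for name, kept in zip(p_values, mask) if kept]
print(relevant)  # prints ['f_mean']
```

Note that "f_trend" is dropped even though its p-value is below 0.05: with four features tested at once, the corrected threshold at its rank is stricter than the raw level.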

Feature selection is available to users through the tsfresh.select_features() function.

3. Compatible with big data

For large amounts of time series data, tsfresh also provides APIs to scale feature generation/extraction and feature selection:

Multi-core processing: By default, tsfresh parallelizes feature generation/extraction and feature selection across multiple cores.
Distributed framework: tsfresh also implements its own distribution framework to spread feature calculations across multiple machines and speed up computation.
Spark and Dask compatible: tsfresh can also use Spark or Dask to process very large data.

In summary, tsfresh can generate and select relevant time series features in a few lines of Python code. It automatically extracts and selects more than 750 field-tested features from multiple domains of time-based data samples, which saves data scientists much of the time otherwise spent on feature engineering.

Time series data can also be quite large. tsfresh runs in parallel across multiple cores and supports Dask and Spark to process data samples that are too large for a single machine.


Statement: This article is reproduced from 51cto.com. In case of any infringement, please contact admin@php.cn for removal.
