How to group data by time interval in Python Pandas?

PHPz
2023-08-29 14:13:02

Data analysis is becoming an important part of every industry. Many organizations rely heavily on data to make strategic decisions, predict trends, and understand consumer behavior. In this environment, Python's Pandas library is a powerful tool that provides a wide range of functionality for manipulating, analyzing, and visualizing data. One of these features is grouping data by time intervals.

This article focuses on how to use Pandas to group data by time intervals. We'll cover the syntax, a step-by-step algorithm, and two approaches, each with a fully executable code example.

Syntax

The method we will focus on is Pandas's groupby() function combined with pd.Grouper, which bins rows by time much as resampling does. The syntax is as follows:

df.groupby(pd.Grouper(key='date', freq='min')).sum()

In this syntax:

  • df − Your DataFrame.

  • groupby(pd.Grouper()) − Function for grouping data.

  • key − The column you want to group by. Here, it's the 'date' column.

  • freq − Frequency of the interval. ('min' stands for minutes, 'h' stands for hours, 'D' stands for days, etc. The older aliases 'T' and 'H' are deprecated as of pandas 2.2.)

  • sum() − Aggregation function.
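The same pattern works with other frequencies and aggregation functions. The following is a minimal sketch, assuming hypothetical sample data with one reading every 10 minutes; the DataFrame and its values are illustrative only.

```python
import pandas as pd

# Hypothetical sample data: one reading every 10 minutes for 2 hours
df = pd.DataFrame({
    'date': pd.date_range(start='2022-01-01', periods=12, freq='10min'),
    'value': range(12),
})

# Same syntax as above, with an hourly frequency and mean() as the aggregation
hourly_mean = df.groupby(pd.Grouper(key='date', freq='h')).mean()

print(hourly_mean)
```

Each hour contains six readings, so the result has two rows, one per hour, holding the average of the readings in that hour.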

Algorithm

This is a step-by-step algorithm for grouping data by time intervals −

  • Import the necessary libraries, namely Pandas.

  • Load or create your DataFrame.

  • Convert the date column to a datetime object, if it is not already converted.

  • Use pd.Grouper to apply the groupby() function on the date column, using the desired frequency.

  • Apply an aggregate function such as sum() or mean().

  • Print or store the results.
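The steps above can be sketched end to end as follows. This is only an illustration, assuming a small hypothetical DataFrame named sales whose dates start out as plain strings, so the conversion step actually does something.

```python
# Step 1: import the library
import pandas as pd

# Step 2: create a DataFrame (hypothetical data; dates are plain strings)
sales = pd.DataFrame({
    'date': ['2022-01-01 09:00', '2022-01-01 17:30',
             '2022-01-02 10:15', '2022-01-02 11:45'],
    'value': [10, 20, 30, 40],
})

# Step 3: convert the date column to datetime
sales['date'] = pd.to_datetime(sales['date'])

# Steps 4 and 5: group by day with pd.Grouper, then aggregate with sum()
daily_totals = sales.groupby(pd.Grouper(key='date', freq='D')).sum()

# Step 6: print the result (30 for 2022-01-01, 70 for 2022-01-02)
print(daily_totals)
```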

Approaches

We will consider two different approaches −

Method 1: Group by daily frequency

In this example, we create a DataFrame containing a series of hourly dates and values. We then group the data by daily frequency and sum the values for each day.

Example

# Import pandas
import pandas as pd

# Create a dataframe
df = pd.DataFrame({
   'date': pd.date_range(start='1/1/2022', periods=100, freq='h'),
   'value': range(100)
})

# Convert 'date' to datetime object, if not already
df['date'] = pd.to_datetime(df['date'])

# Group by daily frequency
daily_df = df.groupby(pd.Grouper(key='date', freq='D')).sum()

print(daily_df)

Output

            value
date             
2022-01-01    276
2022-01-02    852
2022-01-03   1428
2022-01-04   2004
2022-01-05    390

Explanation

Importing the Pandas library is required for any data manipulation work, and it is the first thing this code does. The next step is to construct a DataFrame with pd.DataFrame(). The DataFrame has two columns, 'date' and 'value'. The pd.date_range() function creates a range of 100 hourly timestamps for the 'date' column, while the 'value' column holds the integers 0 through 99.

Although the 'date' column already holds datetime values here, we still call pd.to_datetime() to guarantee the conversion. This step is critical because grouping by time interval only works when the column has a datetime dtype.
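One way to confirm whether a column needs converting is to check its dtype before and after the call. The sketch below assumes a hypothetical DataFrame named raw whose dates are stored as strings, and uses the is_datetime64_any_dtype helper from pandas.api.types.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical data: the 'date' column holds strings, not datetimes
raw = pd.DataFrame({
    'date': ['2022-01-01 00:00', '2022-01-01 01:00'],
    'value': [1, 2],
})

# Before conversion the column is not a datetime dtype
before = is_datetime64_any_dtype(raw['date'])

# Convert, then check again
raw['date'] = pd.to_datetime(raw['date'])
after = is_datetime64_any_dtype(raw['date'])

print(before, after)
```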

After this, to group the data by daily ('D') frequency, we use the groupby() function combined with the pd.Grouper() function. After grouping, we use the sum() function to combine all 'value' elements belonging to the same day into a single total.

Finally, the grouped DataFrame is printed, showing the total of each day's values.
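Pandas also offers resample(), which bins rows by time in a similar way. The sketch below is a comparison under the same sample data, not part of the original example; it uses the on parameter of DataFrame.resample() to name the datetime column and checks that both routes give the same daily totals.

```python
import pandas as pd

# Same sample data as in the example above: 100 hourly timestamps
df = pd.DataFrame({
    'date': pd.date_range(start='2022-01-01', periods=100, freq='h'),
    'value': range(100),
})

# Route 1: groupby() with pd.Grouper
grouped = df.groupby(pd.Grouper(key='date', freq='D')).sum()

# Route 2: resample() on the same column
resampled = df.resample('D', on='date').sum()

# Compare the daily totals produced by the two routes
same_totals = grouped['value'].tolist() == resampled['value'].tolist()
print(same_totals)
```

pd.Grouper is the more convenient choice when the time grouping needs to be combined with other groupby() keys, while resample() is a compact option for a purely time-based grouping.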

Method 2: Group by custom frequency, such as 15 minute intervals

Example

# Import pandas
import pandas as pd

# Create a dataframe
df = pd.DataFrame({
   'date': pd.date_range(start='1/1/2022', periods=100, freq='min'),
   'value': range(100)
})

# Convert 'date' to datetime object, if not already
df['date'] = pd.to_datetime(df['date'])

# Group by 15-minute frequency
custom_df = df.groupby(pd.Grouper(key='date', freq='15min')).sum()

print(custom_df)

Output

                     value
date                      
2022-01-01 00:00:00    105
2022-01-01 00:15:00    330
2022-01-01 00:30:00    555
2022-01-01 00:45:00    780
2022-01-01 01:00:00   1005
2022-01-01 01:15:00   1230
2022-01-01 01:30:00    945

Explanation

The second approach starts by importing the Pandas library, as in the first, and then creates a DataFrame. This DataFrame is the same as in the previous example; the only difference is that the 'date' column now contains timestamps at one-minute intervals.

The 'date' column must hold datetime values for the grouping to work properly, and the pd.to_datetime() function ensures that it does.

Here, we use the pd.Grouper() function inside the groupby() method to group the data with a custom frequency of 15 minutes. To aggregate the 'value' entries for each 15-minute interval, we use the sum() function, just as in the first approach.

The code finishes by printing the new grouped DataFrame, which shows the sum of the 'value' column for each 15-minute interval.
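If more than one statistic is needed per interval, the grouped column can be passed to agg() with a list of function names. This is a sketch of one possible extension using the same sample data; the variable name summary is illustrative.

```python
import pandas as pd

# Same sample data as in the example above: 100 one-minute timestamps
df = pd.DataFrame({
    'date': pd.date_range(start='2022-01-01', periods=100, freq='min'),
    'value': range(100),
})

# Compute the sum, mean and row count of 'value' for each 15-minute interval
summary = (
    df.groupby(pd.Grouper(key='date', freq='15min'))['value']
      .agg(['sum', 'mean', 'count'])
)

print(summary)
```

The result has one column per statistic. The count column makes it easy to spot a partial interval, such as the last one here, which holds only 10 rows instead of 15.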

Conclusion

The powerful features of Pandas include various data operations, one of which is grouping data by time intervals. By using the groupby() function in conjunction with pd.Grouper, we can effectively segment data based on daily frequencies or custom frequencies, enabling efficient and flexible data analysis.

The ability to group data by time intervals enables analysts and businesses to extract meaningful insights from the data. Whether it's calculating the total sales per day, getting the average temperature per hour, or counting website hits every 15 minutes, grouping data by time intervals allows us to better understand trends, patterns, and outliers in the data over time.

Remember, Python’s Pandas library is a powerful data analysis tool. Learning how to use its features, such as the groupby method, can help you become a more efficient and proficient data analyst or data scientist.

Statement:
This article is reproduced from tutorialspoint.com. If there is any infringement, please contact admin@php.cn to have it deleted.