How to group data by time interval in Python Pandas?
Data analysis is becoming an increasingly important part of every industry. Many organizations rely heavily on data to make strategic decisions, predict trends, and understand consumer behavior. In this environment, Python's Pandas library stands out as a powerful tool, providing a wide range of functionality to manipulate, analyze, and visualize data. One of these features is grouping data by time intervals.
This article will focus on how to use Pandas to group data by time intervals. We'll explore the syntax, a simple step-by-step algorithm, two different approaches, and two complete, runnable examples based on them.
The method we will focus on is Pandas's groupby() function combined with pd.Grouper, which bins rows into fixed time intervals much like resampling does. The syntax is as follows:
df.groupby(pd.Grouper(key='date', freq='min')).sum()
In syntax:
df − Your DataFrame.
groupby(pd.Grouper()) − groupby() splits the rows into groups, and pd.Grouper() defines those groups as time intervals over a column.
key − The column you want to group by. Here, it's the 'date' column.
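freq − Frequency of the interval, given as an offset alias: 'min' for minutes, 'h' for hours, 'D' for days, and so on. Older code often uses 'T' and 'H', which recent pandas releases (2.2 and later) deprecate in favor of 'min' and 'h'.
sum() − Aggregation function applied to each group.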
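Grouping with pd.Grouper(freq=...) bins the data the same way DataFrame.resample() does when the dates are in the index. A minimal sketch, using small illustrative hourly data (not from the article), that compares the two:

```python
import pandas as pd

# Illustrative data: 48 hourly readings with the values 0-47
df = pd.DataFrame({
    'date': pd.date_range(start='2022-01-01', periods=48, freq='h'),
    'value': range(48),
})

# groupby() with pd.Grouper bins the 'date' column into days
grouped = df.groupby(pd.Grouper(key='date', freq='D')).sum()

# resample() produces the same daily bins once 'date' is the index
resampled = df.set_index('date').resample('D').sum()

print(grouped)
print(grouped.equals(resampled))  # True: both hold identical daily totals
```

groupby() with pd.Grouper is handy when you want to keep 'date' as an ordinary column or combine the time bins with other grouping keys; resample() is the shorter spelling when the dates already form the index.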
Here is a step-by-step algorithm for grouping data by time intervals −
Import the necessary libraries, namely Pandas.
Load or create your DataFrame.
Call groupby() with pd.Grouper, pointing it at the date column and the desired frequency.
Apply an aggregation function such as sum() or mean().
Print or store the results.
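The steps above can be sketched as one small function. The helper name group_by_interval and its parameters are hypothetical, chosen only for illustration; the data is made up:

```python
import pandas as pd

def group_by_interval(df, key, freq, agg='sum'):
    """Hypothetical helper: group df into freq-sized bins of the key column.

    agg is any aggregation name pandas accepts, e.g. 'sum' or 'mean'.
    """
    out = df.copy()
    # Make sure the key column has a datetime type before grouping
    out[key] = pd.to_datetime(out[key])
    # Bin by frequency (step 3), then aggregate each bin (step 4)
    return out.groupby(pd.Grouper(key=key, freq=freq)).agg(agg)

# Step 2: create a small DataFrame (illustrative values, one every 12 hours)
df = pd.DataFrame({
    'date': pd.date_range('2022-01-01', periods=6, freq='12h'),
    'value': [1, 3, 5, 7, 9, 11],
})

# Step 5: print (or store) the results
daily_mean = group_by_interval(df, 'date', 'D', 'mean')
print(daily_mean)
```

Each day holds two readings, so the daily means are 2.0, 6.0 and 10.0.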
We will consider two different approaches − grouping by a daily frequency, and grouping by a custom 15-minute frequency.
In the first approach, we create a DataFrame containing a series of hourly timestamps and values. We then group the data by daily frequency and sum the values for each day.
# Import pandas
import pandas as pd
# Create a DataFrame with 100 hourly timestamps and the values 0-99
df = pd.DataFrame({
    'date': pd.date_range(start='1/1/2022', periods=100, freq='h'),  # 'h' = hourly ('H' is deprecated in pandas 2.2+)
    'value': range(100)
})
# Convert 'date' to datetime, if it is not already
df['date'] = pd.to_datetime(df['date'])
# Group by daily frequency and sum each day's values
daily_df = df.groupby(pd.Grouper(key='date', freq='D')).sum()
print(daily_df)
            value
date
2022-01-01    276
2022-01-02    852
2022-01-03   1428
2022-01-04   2004
2022-01-05    390
Importing the Pandas library is required for any data-manipulation work, so it is the first thing this code does. Next, pd.DataFrame() constructs a DataFrame with two columns, 'date' and 'value'. The pd.date_range() function fills the 'date' column with 100 hourly timestamps, while the 'value' column holds the integers 0 through 99.
The 'date' column already holds datetime values here, but we still call pd.to_datetime() to make sure it does. This step is critical because grouping with pd.Grouper and a frequency only works when the column has a datetime data type.
Next, to group the data by daily ('D') frequency, we use the groupby() function combined with pd.Grouper(). After grouping, the sum() function adds up all 'value' entries that belong to the same day.
Finally, the grouped DataFrame is printed, showing the total of each day's values.
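The first example aggregates with sum() only, while step 4 of the algorithm also mentions mean() and other aggregations. A minimal sketch, on the same kind of hourly data (100 values, 0 to 99), that computes several statistics per day in one call with agg():

```python
import pandas as pd

# 100 hourly timestamps starting 2022-01-01 with the values 0-99
df = pd.DataFrame({
    'date': pd.date_range(start='2022-01-01', periods=100, freq='h'),
    'value': range(100),
})

# agg() applies several aggregation functions to each daily group at once
daily_stats = (
    df.groupby(pd.Grouper(key='date', freq='D'))['value']
      .agg(['sum', 'mean', 'count'])
)
print(daily_stats)
```

The count column makes it easy to spot partial bins: the last day only has 4 hourly readings, so its total is not directly comparable with the full days.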
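In the second approach, we create a DataFrame of minute-level timestamps and group the data into custom 15-minute intervals.
# Import pandas
import pandas as pd
# Create a DataFrame with 100 one-minute timestamps and the values 0-99
df = pd.DataFrame({
    'date': pd.date_range(start='1/1/2022', periods=100, freq='min'),  # 'min' = minutes ('T' is deprecated in pandas 2.2+)
    'value': range(100)
})
# Convert 'date' to datetime, if it is not already
df['date'] = pd.to_datetime(df['date'])
# Group by 15-minute frequency and sum the values in each interval
custom_df = df.groupby(pd.Grouper(key='date', freq='15min')).sum()
print(custom_df)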
                     value
date
2022-01-01 00:00:00    105
2022-01-01 00:15:00    330
2022-01-01 00:30:00    555
2022-01-01 00:45:00    780
2022-01-01 01:00:00   1005
2022-01-01 01:15:00   1230
2022-01-01 01:30:00    945
The second approach also starts by importing Pandas and creating a DataFrame. The DataFrame has the same structure as in the first example; the only difference is that the 'date' column now contains timestamps at one-minute intervals.
As before, the 'date' column must have a datetime type for the grouping to work, and pd.to_datetime() ensures that it does.
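In both examples the 'date' column comes from pd.date_range(), so pd.to_datetime() changes nothing; the conversion matters when dates arrive as strings, for example from a CSV file. A minimal sketch with made-up string dates, showing the TypeError pandas raises without the conversion and the result after it:

```python
import pandas as pd

# Dates stored as strings, as they might be after reading a CSV (illustrative data)
df = pd.DataFrame({
    'date': ['2022-01-01 08:00', '2022-01-01 17:30', '2022-01-02 09:15'],
    'value': [10, 20, 30],
})

raised = False
try:
    # Fails: 'date' holds strings, so there is nothing to bin by time
    df.groupby(pd.Grouper(key='date', freq='D')).sum()
except TypeError as err:
    raised = True
    print('TypeError:', err)

# After converting to datetime, the daily grouping works
df['date'] = pd.to_datetime(df['date'])
daily = df.groupby(pd.Grouper(key='date', freq='D')).sum()
print(daily)
```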
This time, we pass pd.Grouper() to groupby() with a custom frequency of 15 minutes ('15min', written '15T' in older pandas code). To aggregate the 'value' entries within each 15-minute interval, we use sum(), just as in the first approach.
Finally, the code prints the new grouped DataFrame, showing the sum of the 'value' column for each 15-minute interval.
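In the second example the data starts exactly at midnight, so the 15-minute bins begin at 00:00. By default pd.Grouper aligns bins to the start of the day (origin='start_day'), not to the first timestamp; the origin parameter, available in pandas 1.1 and later, changes that. A minimal sketch with illustrative data that starts at 00:07:

```python
import pandas as pd

# Illustrative data: 20 one-minute readings starting at 00:07, off the 15-minute grid
df = pd.DataFrame({
    'date': pd.date_range(start='2022-01-01 00:07', periods=20, freq='min'),
    'value': range(20),
})

# Default origin='start_day': bins are aligned to midnight (00:00, 00:15, ...)
by_midnight = df.groupby(pd.Grouper(key='date', freq='15min')).sum()

# origin='start': bins are aligned to the first timestamp (00:07, 00:22, ...)
by_first = df.groupby(pd.Grouper(key='date', freq='15min', origin='start')).sum()

print(by_midnight)
print(by_first)
```

With the default alignment the first bin only covers 00:07 to 00:14, so its total (28) comes from 8 readings instead of 15; aligning to the first timestamp gives full 15-minute bins from the start.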
Grouping data by time intervals is one of the many data operations Pandas supports. By using the groupby() function together with pd.Grouper, we can segment data by a daily frequency or a custom one such as 15 minutes, which makes time-based analysis efficient and flexible.
The ability to group data by time intervals enables analysts and businesses to extract meaningful insights from their data. Whether it's calculating total sales per day, the average temperature per hour, or the number of website hits every 15 minutes, grouping data by time intervals helps us understand trends, patterns, and outliers over time.
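Two of these use cases can be sketched with the same groupby() and pd.Grouper pattern. The hit log and temperature readings below are made-up data, and the column names 'timestamp', 'time' and 'temp_c' are hypothetical:

```python
import pandas as pd

# Hypothetical website hit log: one row per request timestamp
hits = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2022-01-01 09:01', '2022-01-01 09:07', '2022-01-01 09:14',
        '2022-01-01 09:20', '2022-01-01 09:38',
    ]),
})

# Count hits in each 15-minute window; size() counts the rows in each group
hits_per_15min = hits.groupby(pd.Grouper(key='timestamp', freq='15min')).size()
print(hits_per_15min)

# Hypothetical temperature readings every 30 minutes
temps = pd.DataFrame({
    'time': pd.date_range('2022-01-01 00:00', periods=4, freq='30min'),
    'temp_c': [10.0, 12.0, 15.0, 17.0],
})

# Average temperature per hour
hourly_avg = temps.groupby(pd.Grouper(key='time', freq='h'))['temp_c'].mean()
print(hourly_avg)
```

Counting uses size() rather than sum() because the hit log has no numeric column to add up; each row is one hit.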
Remember, Python’s Pandas library is a powerful data analysis tool. Learning how to use its features, such as the groupby method, can help you become a more efficient and proficient data analyst or data scientist.