
Get and store time series data with Python

WBOY
2023-04-11 19:30:32


Translator | Bugatti

Reviewer | Sun Shujuan

This tutorial shows how to use Python to get time series data from the OpenWeatherMap API and convert it into a Pandas DataFrame. We then use the InfluxDB Python client to write this data to InfluxDB, a time series data platform.

We convert the JSON response from the API call into a Pandas DataFrame because this is the easiest way to write the data to InfluxDB with the Python client. InfluxDB is a purpose-built time series database, designed to meet the demanding ingest requirements of time series data.

Requirements

This tutorial was written on macOS with Python 3 installed via Homebrew. We recommend installing additional tools such as virtualenv, pyenv, or conda-env to simplify managing Python and the client installation. The full requirements are:

```txt
influxdb-client==1.30.0
pandas==1.4.3
requests>=2.27.1
```

This tutorial also assumes that you have created a Free Tier InfluxDB Cloud account or are running InfluxDB OSS, and that you have:

  • Created a bucket. You can think of a bucket as a database: it is the highest level of data organization in InfluxDB.
  • Created a token.

Finally, this tutorial requires that you have created an OpenWeatherMap account and an API key (token).
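The full script below hard-codes the InfluxDB and OpenWeatherMap tokens as string placeholders. One common alternative, sketched here under the assumption that you export the values as environment variables with the hypothetical names shown, is to read them at runtime so the tokens never end up in source control.

```python
import os


def load_settings(environ=os.environ):
    """Read connection settings from environment variables.

    The variable names are hypothetical; pick whatever names suit your setup.
    """
    required = ["INFLUXDB_URL", "INFLUXDB_TOKEN", "INFLUXDB_ORG", "OPENWEATHERMAP_TOKEN"]
    missing = [name for name in required if not environ.get(name)]
    if missing:
        # Fail early with a clear message instead of sending empty credentials.
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    settings = {name: environ[name] for name in required}
    # The bucket name is optional and falls back to the one used in this tutorial.
    settings["INFLUXDB_BUCKET"] = environ.get("INFLUXDB_BUCKET", "OpenWeather")
    return settings


# Example usage, once the variables are exported in your shell:
# settings = load_settings()
# url, token = settings["INFLUXDB_URL"], settings["INFLUXDB_TOKEN"]
```

Passing a plain dict instead of `os.environ` also makes the function easy to test.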

Request weather data

First, we need to request the data. We use the requests library to fetch hourly weather data for a specified latitude and longitude from the OpenWeatherMap API.

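```python
import requests

# Get time series data from the OpenWeatherMap API
# (openWeather_url, lat/lon and token are set in the full script below)
params = {'lat': openWeatherMap_lat, 'lon': openWeatherMap_lon,
          'exclude': "minutely,daily", 'appid': openWeatherMap_token}
r = requests.get(openWeather_url, params=params).json()
hourly = r['hourly']  # list of hourly records
```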
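When you pass a `params` dict, requests encodes it into the query string of the GET URL. The sketch below uses the standard library's `urlencode` to show the URL that such a dict turns into; for simple string values like these the result should match what requests sends. `build_onecall_url` and the `YOUR_API_KEY` placeholder are hypothetical names for illustration.

```python
from urllib.parse import urlencode


def build_onecall_url(base_url, lat, lon, appid, exclude="minutely,daily"):
    """Return the GET URL that a params dict like the one above encodes to."""
    params = {"lat": lat, "lon": lon, "exclude": exclude, "appid": appid}
    # urlencode percent-encodes reserved characters, e.g. the comma becomes %2C.
    return f"{base_url}?{urlencode(params)}"


request_url = build_onecall_url(
    "https://api.openweathermap.org/data/2.5/onecall", "33.44", "-94.04", "YOUR_API_KEY"
)
print(request_url)
# https://api.openweathermap.org/data/2.5/onecall?lat=33.44&lon=-94.04&exclude=minutely%2Cdaily&appid=YOUR_API_KEY
```

Printing the URL this way can help when debugging a request, as long as you do not log a real API key.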

Convert data into Pandas DataFrame

Next, convert the JSON data into a Pandas DataFrame. We also convert the timestamp from a Unix timestamp with second precision to a datetime object, because the InfluxDB client's DataFrame write method, which we use in the next step, expects the timestamp column to hold datetime values. Finally, we drop the columns that we don't want written to InfluxDB.

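```python
import pandas as pd

# Convert data to a Pandas DataFrame and convert the timestamp to a datetime object
df = pd.json_normalize(hourly)                 # one row per hourly record
df = df.drop(columns=['weather', 'pop'])       # columns we don't want in InfluxDB
df['dt'] = pd.to_datetime(df['dt'], unit='s')  # Unix seconds -> datetime64
print(df.head())
```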
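To see what `json_normalize`, `drop`, and `to_datetime` do without calling the API, here is a sketch on a small, hypothetical sample shaped like the `hourly` list (the keys follow the columns used above; the values are invented). It also shows that a nested object such as `rain` is flattened into a dotted column, while the list-valued `weather` column is left as a column of lists, which does not map neatly onto an InfluxDB field value.

```python
import pandas as pd

# Hypothetical sample shaped like two entries of the API's "hourly" list.
hourly = [
    {"dt": 1660000000, "temp": 301.2, "humidity": 60,
     "weather": [{"id": 800, "main": "Clear"}], "pop": 0.0},
    {"dt": 1660003600, "temp": 300.5, "humidity": 63,
     "weather": [{"id": 500, "main": "Rain"}], "pop": 0.4,
     "rain": {"1h": 0.25}},
]

df = pd.json_normalize(hourly)            # nested dicts become dotted columns, e.g. "rain.1h"
df = df.drop(columns=["weather", "pop"])  # drop the list-valued and unwanted columns
df["dt"] = pd.to_datetime(df["dt"], unit="s")

print(df.dtypes)
print(df)
```

Rows that lack a nested key (here, the first record has no `rain`) get `NaN` in the flattened column.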

Writing Pandas DataFrame to InfluxDB

Now create an instance of the InfluxDB Python client and write the DataFrame to InfluxDB, specifying a measurement name. A measurement groups data within a bucket; you can think of it as the second-highest level of data organization in InfluxDB, after buckets.

You can also use the data_frame_tag_columns parameter to specify which columns should be written as tags.

Since we did not specify any columns as tags, all of the remaining columns (everything except the timestamp column) are written as fields in InfluxDB. Tags store metadata about your time series data and can be used to query subsets of the data more efficiently. Fields are where the actual time series values are stored. This document (https://docs.influxdata.com/influxdb/cloud/reference/key-concepts/?utm_source=vendor&utm_medium=referral&utm_campaign=2022-07_spnsr-ctn_obtaining-storing-ts-pything_tns) goes into more detail about these data concepts.

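```python
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS

# Write data to InfluxDB (url, token, org and bucket are set in the full script below)
with InfluxDBClient(url=url, token=token, org=org) as client:
    client.write_api(write_options=SYNCHRONOUS).write(
        bucket=bucket,
        record=df,  # the DataFrame from the previous step
        data_frame_measurement_name="weather",
        data_frame_timestamp_column="dt",
    )
```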
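Under the hood, the client converts each DataFrame row into InfluxDB line protocol before sending it. The function below is a simplified, hypothetical stand-in for that conversion (not the client's own implementation). It shows how a column listed the way `data_frame_tag_columns` would list it ends up in the tag set, while every other non-timestamp column becomes a field. The `location` tag is invented for illustration.

```python
from datetime import datetime, timezone


def to_line_protocol(measurement, row, tag_columns, timestamp_column):
    """Serialize one dict-like row to a single line of InfluxDB line protocol.

    Simplified sketch: it does not escape spaces, commas or equals signs and
    assumes a timezone-aware datetime with whole-second precision.
    """
    tags = "".join(f",{col}={row[col]}" for col in tag_columns)
    fields = []
    for key, value in row.items():
        if key == timestamp_column or key in tag_columns:
            continue  # the timestamp and tag columns are not stored as fields
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            fields.append(f"{key}={'true' if value else 'false'}")
        elif isinstance(value, int):
            fields.append(f"{key}={value}i")  # integer fields take an "i" suffix
        elif isinstance(value, float):
            fields.append(f"{key}={value}")
        else:
            fields.append(f'{key}="{value}"')  # string fields are double-quoted
    # Line protocol timestamps default to nanoseconds since the Unix epoch.
    ts_ns = int(row[timestamp_column].timestamp()) * 1_000_000_000
    return f"{measurement}{tags} {','.join(fields)} {ts_ns}"


# Hypothetical row: "location" is an invented tag column.
row = {
    "dt": datetime(2022, 8, 8, 23, 6, 40, tzinfo=timezone.utc),
    "location": "site_a",
    "temp": 301.2,
    "humidity": 60,
}
line = to_line_protocol("weather", row, tag_columns=["location"], timestamp_column="dt")
print(line)
# weather,location=site_a temp=301.2,humidity=60i 1660000000000000000
```

Without any tag columns, as in this tutorial, the line would contain only the measurement, the fields, and the timestamp.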

Full script

To review, here is the complete script. It takes the following steps:

1. Import the library.

2. Collect the following:

  • InfluxDB Bucket
  • InfluxDB Organization
  • InfluxDB Token
  • InfluxDB URL
  • OpenWeatherMap URL
  • OpenWeatherMap Token
3. Create the request.

4. Convert JSON response into Pandas DataFrame.

5. Delete any columns that you do not want to write to InfluxDB.

6. Convert timestamp column from Unix time to Pandas datetime object.

7. Create an instance of the InfluxDB Python client.

8. Write the DataFrame, specifying the measurement name and timestamp column.

```python
import requests
import pandas as pd
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS

# InfluxDB settings (fill in your own values)
bucket = "OpenWeather"
org = ""    # or the email you used to create your Free Tier InfluxDB Cloud account
token = ""  # InfluxDB API token
url = ""    # for example, https://us-west-2-1.aws.cloud2.influxdata.com/

# OpenWeatherMap settings
openWeatherMap_token = ""
openWeatherMap_lat = "33.44"
openWeatherMap_lon = "-94.04"
openWeather_url = "https://api.openweathermap.org/data/2.5/onecall"

# Get time series data from the OpenWeatherMap API
params = {'lat': openWeatherMap_lat, 'lon': openWeatherMap_lon,
          'exclude': "minutely,daily", 'appid': openWeatherMap_token}
r = requests.get(openWeather_url, params=params).json()
hourly = r['hourly']

# Convert data to a Pandas DataFrame and convert the timestamp to a datetime object
df = pd.json_normalize(hourly)
df = df.drop(columns=['weather', 'pop'])
df['dt'] = pd.to_datetime(df['dt'], unit='s')
print(df.head())

# Write data to InfluxDB
with InfluxDBClient(url=url, token=token, org=org) as client:
    client.write_api(write_options=SYNCHRONOUS).write(
        bucket=bucket,
        record=df,
        data_frame_measurement_name="weather",
        data_frame_timestamp_column="dt",
    )
```

Query data

Now that we have written the data to InfluxDB, we can use the InfluxDB UI to query it. Navigate to the Data Explorer (in the left navigation bar). In the Query Builder, select the data and the time range you want to visualize, then click Submit.


Figure 1. Default materialized view of the weather data. InfluxDB automatically aggregates time series data so that new users don't accidentally query too much data and cause timeouts.
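Downsampling here means grouping points into fixed time windows and replacing each window with one aggregate value. The standard-library sketch below is a rough, hypothetical analogue of a Flux `aggregateWindow(every: 3h, fn: mean)` step, not InfluxDB's implementation. One difference: this sketch labels each window with its start time, whereas `aggregateWindow()` by default stamps each result with the window's stop time.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def downsample_mean(points, every):
    """Average (timestamp, value) pairs into fixed, epoch-aligned windows.

    Expects timezone-aware datetimes. Each window is labelled with its start
    time, and windows without any points are skipped.
    """
    step = every.total_seconds()
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in points:
        start = ts.timestamp() // step * step  # floor to the window start
        sums[start] += value
        counts[start] += 1
    return [
        (datetime.fromtimestamp(start, tz=timezone.utc), sums[start] / counts[start])
        for start in sorted(sums)
    ]


# Hypothetical hourly readings: values 1..6 from 00:00 to 05:00 UTC.
base = datetime(2022, 8, 8, tzinfo=timezone.utc)
hourly_points = [(base + timedelta(hours=h), float(h + 1)) for h in range(6)]
for window_start, mean in downsample_mean(hourly_points, timedelta(hours=3)):
    print(window_start.isoformat(), mean)
# 2022-08-08T00:00:00+00:00 2.0
# 2022-08-08T03:00:00+00:00 5.0
```

Six raw points collapse into two, which is why a downsampled chart can hide short spikes that the raw data contains.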

Pro Tip: When you query data with the Query Builder, InfluxDB automatically downsamples the data. To query the raw data, open the Script Editor to view the underlying Flux query. Flux is InfluxDB's native query and scripting language, which you can use to analyze your time series data and create predictions from it. Comment out or delete the line containing the aggregateWindow() function to see the raw data.


Figure 2. Navigate to the Script Editor and comment out or delete the aggregateWindow() function to view the raw weather data.
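To compare raw and downsampled results from code instead of the UI, one option is to generate the Flux query yourself. The helper below is hypothetical: it only builds query text similar in shape to what the Script Editor shows, and it adds the `aggregateWindow()` step only when a window length is requested, so omitting `every` is the code equivalent of commenting that line out.

```python
def build_flux_query(bucket, measurement, start="-1d", every=None):
    """Return a Flux query for one measurement.

    With every=None no aggregateWindow() step is added, so the query returns
    raw points; otherwise the data is downsampled with a mean per window.
    """
    lines = [
        f'from(bucket: "{bucket}")',
        f"  |> range(start: {start})",
        f'  |> filter(fn: (r) => r._measurement == "{measurement}")',
    ]
    if every is not None:
        lines.append(f"  |> aggregateWindow(every: {every}, fn: mean, createEmpty: false)")
    return "\n".join(lines)


raw_query = build_flux_query("OpenWeather", "weather")
downsampled_query = build_flux_query("OpenWeather", "weather", every="1h")
print(raw_query)
```

With a running InfluxDB instance you could then pass either string to the client, for example `client.query_api().query_data_frame(raw_query)`, which returns the Flux results as a Pandas DataFrame; that call is not executed here.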

Conclusion

I hope this article has helped you make the most of the InfluxDB Python client library to obtain time series data and store it in InfluxDB. If you want to learn more about using the Python client library to query data from InfluxDB, I recommend this article (https://thenewstack.io/getting-started-with-python-and-influxdb/). It's also worth mentioning that you can use Flux to get data from the OpenWeatherMap API and store it in InfluxDB. If you use InfluxDB Cloud, the Flux script can be hosted and executed periodically, giving you a reliable stream of weather data fed into your instance. To learn more about using Flux to obtain weather data on a user-defined schedule, read this article (https://www.influxdata.com/blog/tldr-influxdb-tech-tips-handling-json-objects-mapping-arrays/?utm_source=vendor&utm_medium=referral&utm_campaign=2022-07_spnsr-ctn_obtaining-storing-ts-pything_tns).


Statement:
This article is reproduced from 51cto.com. In case of infringement, please contact admin@php.cn to have it removed.