
Quantile regression for time series probabilistic forecasting

王林
2024-05-07 17:04:01

Quantile regression provides prediction intervals with quantified probabilities. It is a statistical technique for modeling the relationship between predictor variables and a response variable, especially when the conditional distribution of the response variable is of interest. Unlike traditional regression methods, quantile regression estimates conditional quantiles of the response variable rather than its conditional mean.

Figure (A): Quantile regression

Concept of Quantile Regression

Quantile regression is a modeling method that estimates the linear relationship between a set of regressors X and the quantiles of the explained variable Y.

Conventional regression models study the relationship between the explained variable and the explanatory variables, together with the distribution of the errors. Median regression and quantile regression are two common regression models; quantile regression was introduced by Koenker and Bassett (1978).

The ordinary least squares estimator is obtained by minimizing the sum of squared residuals. The quantile regression estimator is instead obtained by minimizing an asymmetrically weighted sum of absolute residuals. In the special case of the median, the weights are symmetric and the result is the least absolute deviations (LAD) estimator.
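To make the objective concrete, here is a minimal sketch of the weighted absolute loss used in quantile regression (often called the pinball loss), written in plain Python. The names `pinball_loss` and `best_constant` and the synthetic data are illustrative assumptions, not part of the original article. The sketch searches for the constant that minimizes the loss and checks that, at level 0.5, it is the sample median (the LAD case).

```python
import random

def pinball_loss(y_true, y_pred, tau):
    """Average quantile (pinball) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff >= 0) is weighted by tau, over-prediction by 1 - tau.
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)

random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(501)]  # synthetic data, assumption

def best_constant(tau):
    # The loss is piecewise linear in the constant, so a minimizer
    # can always be found among the observed values.
    return min(y, key=lambda c: pinball_loss(y, [c] * len(y), tau))

median_fit = best_constant(0.5)  # coincides with the sample median
q90_fit = best_constant(0.9)     # roughly 90% of the data lie at or below it
print(median_fit, q90_fit)
```

With tau = 0.5 both sides are weighted equally, which is why the median case reduces to least absolute deviations.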

Advantages of Quantile Regression

Quantile regression describes the full conditional distribution of the explained variable. It analyzes not only the conditional expectation (mean) of the explained variable, but also how the explanatory variables affect its median and other quantiles. The coefficient estimates at different quantiles are often different, which means the explanatory variables affect different quantiles of the explained variable in different ways.

Compared with the least squares method, median regression is more robust to outliers. Quantile regression also does not require strong assumptions about the error term, so for non-normal distributions the median regression coefficients are more reliable, and the quantile regression coefficient estimates are likewise more robust.
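The robustness claim can be illustrated with the simplest possible model, one with only an intercept. In that case the least squares estimate is the sample mean and the least absolute deviations estimate is the sample median. The numbers below are made up for illustration.

```python
import statistics

clean = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 10.2, 9.8]
contaminated = clean + [100.0]  # add one extreme outlier

# Intercept-only least squares estimate = mean; intercept-only LAD estimate = median.
mean_shift = abs(statistics.mean(contaminated) - statistics.mean(clean))
median_shift = abs(statistics.median(contaminated) - statistics.median(clean))

print(f"mean shifts by {mean_shift:.2f}, median shifts by {median_shift:.2f}")
```

A single outlier moves the mean by about 10 units while the median stays where it was.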

What are the advantages of quantile regression over Monte Carlo simulation? First, quantile regression directly estimates the conditional quantiles of the response variable given the predictors. Rather than producing a large number of possible outcomes, as a Monte Carlo simulation does, it provides an estimate of a specific quantile of the distribution of the response variable. This is particularly useful for understanding different levels of forecast uncertainty, such as quintiles, quartiles, or extreme quantiles. Second, quantile regression is a model-based way to estimate prediction uncertainty: it uses observed data to estimate the relationship between variables and makes predictions based on that relationship. In contrast, Monte Carlo simulation relies on specifying probability distributions for the input variables and generating results by random sampling.
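For contrast, the Monte Carlo route might look like the sketch below. The key point is that the distribution has to be specified up front; the Normal(100, 15) demand distribution here is purely an assumption made for illustration, and the quantiles are read off the simulated outcomes afterwards.

```python
import random
import statistics

random.seed(42)

# Monte Carlo: ASSUME demand ~ Normal(mean=100, sd=15) and simulate many outcomes.
simulated = [random.gauss(100.0, 15.0) for _ in range(100_000)]

# statistics.quantiles with n=20 returns the 5%, 10%, ..., 95% cut points.
cuts = statistics.quantiles(simulated, n=20)
mc_q05, mc_q95 = cuts[0], cuts[-1]

print(f"simulated 5% quantile: {mc_q05:.1f}, 95% quantile: {mc_q95:.1f}")
```

Quantile regression skips the distributional assumption and the sampling step, and fits the quantiles to the observed data directly.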

NeuralProphet provides two statistical techniques: (1) quantile regression and (2) conformal quantile regression. Conformal quantile regression adds a calibration step on top of quantile regression so that the prediction intervals are consistent with the distribution of the observed data. In this article, we use NeuralProphet's quantile regression module to make quantile forecasts.
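To give a feel for what a calibration step can do, here is a minimal sketch of split-conformal calibration of a quantile interval in plain Python. It is not NeuralProphet's implementation; the helper name `conformal_adjustment` and the calibration numbers are assumptions made for illustration.

```python
import math

def conformal_adjustment(y_cal, lower_cal, upper_cal, alpha):
    """Margin to add to both ends of a quantile interval, from a calibration set."""
    # Conformity score: how far each actual falls outside its interval
    # (negative when the actual is inside).
    scores = sorted(max(lo - y, y - hi)
                    for y, lo, hi in zip(y_cal, lower_cal, upper_cal))
    n = len(scores)
    # Finite-sample corrected rank, capped at n as a simplification.
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[rank - 1]

# Hypothetical calibration data: actuals and an interval that is too narrow.
y_cal = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0, 18.0, 13.0, 16.0, 10.0]
lower_cal = [11.0] * 10
upper_cal = [15.0] * 10

margin = conformal_adjustment(y_cal, lower_cal, upper_cal, alpha=0.2)
calibrated = [(lo - margin, hi + margin) for lo, hi in zip(lower_cal, upper_cal)]
covered = sum(lo <= y <= hi for y, (lo, hi) in zip(y_cal, calibrated)) / len(y_cal)

print(margin, covered)
```

The margin is learned from held-out data, so an interval that was too narrow gets widened until it reaches the target coverage on the calibration set.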

Environment setup

Install NeuralProphet.

!pip install neuralprophet
!pip uninstall -y numpy
!pip install git+https://github.com/ourownstory/neural_prophet.git numpy==1.23.5
Import the required libraries.

%matplotlib inline
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import logging
import warnings
logging.getLogger('prophet').setLevel(logging.ERROR)
warnings.filterwarnings("ignore")
Dataset

We use a bike-sharing dataset. It is a multivariate dataset that contains daily rental demand as well as weather fields such as temperature and wind speed.

data = pd.read_csv('/bike_sharing_daily.csv')
data.tail()

Figure (B): Bike-sharing data

Plot the daily rental counts. Demand increased in the second year and follows a seasonal pattern.

# convert string to datetime64
data["ds"] = pd.to_datetime(data["dteday"])
# create line plot of daily rental counts
plt.plot(data['ds'], data["cnt"])
plt.xlabel("date")
plt.ylabel("Count")
plt.show()

Figure (C): Daily demand for bicycle rental

Do the minimal data preparation needed for modeling. NeuralProphet requires the column names ds and y, the same as Prophet.

df = data[['ds','cnt']]
df.columns = ['ds','y']

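Build the quantile regression model

We build the quantile regression directly in NeuralProphet. Suppose we need the 5th, 10th, 50th, 90th, and 95th percentiles. We specify quantile_list = [0.05, 0.1, 0.5, 0.9, 0.95] and pass it through the parameter quantiles = quantile_list.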

from neuralprophet import NeuralProphet, set_log_level
quantile_list = [0.05, 0.1, 0.5, 0.9, 0.95]
# Model and prediction
m = NeuralProphet(
    quantiles=quantile_list,
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
)
m = m.add_country_holidays("US")
m.set_plotting_backend("matplotlib")  # Use matplotlib
df_train, df_test = m.split_df(df, valid_p=0.2)
metrics = m.fit(df_train, validation_df=df_test, progress="bar")
metrics.tail()
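As a quick check on what a quantile list implies, the sketch below pairs each lower quantile with its mirror image and reports the nominal coverage of the resulting interval. The helper `nominal_intervals` is a hypothetical name introduced for illustration; it is not part of NeuralProphet.

```python
quantile_list = [0.05, 0.1, 0.5, 0.9, 0.95]

def nominal_intervals(quantiles):
    """Return (lower, upper, nominal coverage) for each symmetric quantile pair."""
    qs = sorted(quantiles)
    pairs = []
    for lo in qs:
        hi = round(1 - lo, 10)
        # Keep only lower quantiles whose mirror image is also in the list.
        if lo < 0.5 and any(abs(hi - q) < 1e-9 for q in qs):
            pairs.append((lo, hi, round(hi - lo, 10)))
    return pairs

intervals = nominal_intervals(quantile_list)
print(intervals)  # [(0.05, 0.95, 0.9), (0.1, 0.9, 0.8)]
```

So the 5% and 95% quantiles bound a nominal 90% interval, and the 10% and 90% quantiles bound a nominal 80% interval.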

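Quantile regression forecasting

We use .make_future_dataframe() to create a new dataframe for forecasting, as in Prophet, on which NeuralProphet is based. The parameter n_historic_predictions is set to 100, so only the last 100 historical data points are included. If it is set to True, the entire history is included. We set periods=50 to forecast 50 future data points.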

future = m.make_future_dataframe(df, periods=50, n_historic_predictions=100)  # , n_historic_predictions=1)
# Perform prediction with the trained models
forecast = m.predict(df=future)
forecast.tail(60)
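As a rough mental model of what the call above assembles, the sketch below keeps the last 100 historical rows and appends 50 future dates with unknown targets. This is not NeuralProphet's implementation; `future_frame` is a hypothetical helper, and the daily history is synthetic.

```python
from datetime import date, timedelta

def future_frame(history, periods, n_historic):
    """Keep the last n_historic (ds, y) rows and append `periods` daily rows with y=None."""
    kept = history[-n_historic:] if n_historic else []
    last_day = history[-1][0]
    future = [(last_day + timedelta(days=i), None) for i in range(1, periods + 1)]
    return kept + future

# Hypothetical daily history covering 731 days
start = date(2011, 1, 1)
history = [(start + timedelta(days=i), float(i)) for i in range(731)]

frame = future_frame(history, periods=50, n_historic=100)
print(len(frame), frame[0][0], frame[-1][0])
```

The model then fills in predictions for both the retained history and the future rows.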

The forecast results are stored in the dataframe forecast.

Figure (D): Forecasts

The dataframe above contains all the data elements needed for plotting.

m.plot(forecast, plotting_backend="plotly-static")  # or plotting_backend="matplotlib"

The prediction intervals are given by the quantile values!

Figure (E): Quantile forecasts
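One way to sanity-check such intervals is to measure empirical coverage, meaning the share of actual values that land between the lower and upper quantile forecasts. The sketch below uses made-up numbers for illustration; they are not results from the bike-sharing model, and `empirical_coverage` is a hypothetical helper.

```python
def empirical_coverage(actuals, lowers, uppers):
    """Fraction of actual values that fall inside [lower, upper]."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(actuals, lowers, uppers))
    return inside / len(actuals)

# Hypothetical held-out actuals with 5% and 95% quantile forecasts
actuals = [4500, 5200, 3900, 6100, 4800, 5000, 7200, 4300, 5600, 4100]
q05 = [3800, 4200, 3500, 4900, 4000, 4100, 5000, 3600, 4500, 3700]
q95 = [5600, 6300, 5200, 7000, 5900, 6000, 7000, 5400, 6600, 5500]

coverage = empirical_coverage(actuals, q05, q95)
print(coverage)  # 0.9
```

If the empirical coverage on held-out data is far from the nominal 90% of a 5% to 95% interval, the quantile forecasts are miscalibrated.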

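The difference between prediction intervals and confidence intervals

Prediction intervals and confidence intervals are both helpful because they quantify uncertainty, but their goals, computation, and applications differ. Below I use regression to explain the difference. In Figure (F), linear regression is plotted on the left and quantile regression on the right.

Figure (F): The difference between confidence intervals and prediction intervals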

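First, their goals differ:

  • Linear regression aims to find a line such that the predicted values are as close as possible to the conditional mean of the dependent variable, given the values of the independent variables.
  • Quantile regression aims to provide a range for future observations at a given confidence level. It estimates the relationship between the independent variables and different quantiles of the conditional distribution of the dependent variable.

Second, they are computed differently:

  • In linear regression, the confidence interval is an interval estimate for the coefficients of the independent variables, usually obtained with ordinary least squares (OLS), which minimizes the sum of squared distances from the data points to the line. Changes in the coefficients affect the predicted conditional mean of Y.
  • In quantile regression, you choose the quantiles of the dependent variable for which to estimate the regression coefficients, usually by minimizing a weighted sum of absolute deviations rather than by OLS.

Third, they are applied differently:

  • In linear regression, the predicted conditional mean has a 95% confidence interval. The confidence interval is narrower because it covers the conditional mean rather than the whole range of outcomes.
  • In quantile regression, a future observation falls within the 95% prediction interval with 95% probability.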
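The difference in width can be seen in a small numerical sketch. It uses a synthetic sample (an assumption for illustration), a normal-approximation confidence interval for the mean, and empirical quantiles as a simple prediction interval for a new observation.

```python
import random
import statistics

random.seed(1)
sample = [random.gauss(50.0, 10.0) for _ in range(400)]  # synthetic data

mean = statistics.mean(sample)
sd = statistics.stdev(sample)
n = len(sample)

# 95% confidence interval for the MEAN (normal approximation): shrinks as n grows.
half_width = 1.96 * sd / n ** 0.5
conf_int = (mean - half_width, mean + half_width)

# 95% prediction interval for a NEW OBSERVATION from empirical quantiles.
cuts = statistics.quantiles(sample, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
pred_int = (cuts[0], cuts[-1])

ci_width = conf_int[1] - conf_int[0]
pi_width = pred_int[1] - pred_int[0]
print(f"confidence interval width: {ci_width:.2f}, prediction interval width: {pi_width:.2f}")
```

The confidence interval describes uncertainty about the mean and tightens with more data, while the prediction interval describes the spread of individual outcomes and stays wide.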

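Final thoughts

This article introduced the concept of prediction intervals from quantile regression and showed how to generate them with NeuralProphet. We also highlighted the difference between prediction intervals and confidence intervals, which is a frequent source of confusion in business applications. A later article will explore another important technique for quantifying prediction uncertainty, conformalized quantile regression (CQR).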


Source: This article is reproduced from 51cto.com.