Multivariate time series forecasting: independent forecasting or joint forecasting?
Today I'm introducing a paper published by NTU in April this year. It discusses how independent prediction (channel independent) and joint prediction (channel dependent) differ in effectiveness for multivariate time series forecasting, the reasons behind those differences, and methods for optimizing them.
Paper title: The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting
Download address: https://arxiv.org/pdf/2304.05206v1.pdf
In multivariate time series forecasting, there are two ways to handle the variable (channel) dimension. The first is independent prediction (channel independent, CI), which treats the multivariate series as several univariate series and models each variable separately. The second is joint prediction (channel dependent, CD), which models all variables together and takes the relationships between them into account. The difference between the two is shown in the figure below.
Each approach has its own characteristics. The CI method looks at only one variable at a time, so the model is simpler, but its ceiling is also lower: because it ignores the relationships between the sequences, it loses some key information. The CD method uses more complete information, but the model is also more complex.
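To make the CI/CD distinction concrete, here is a minimal pure-Python sketch assuming a simple linear forecaster (the paper's comparisons use linear models, but these function names, shapes and weights are illustrative assumptions, not the paper's code). The CI version applies one weight matrix to each channel on its own; sharing that matrix across channels is one common CI variant, and per-channel weights are also possible. The CD version flattens all channels into one vector, so every output can draw on every channel.

```python
from typing import List

Matrix = List[List[float]]


def ci_linear_forecast(history: Matrix, weights: Matrix) -> Matrix:
    """Channel-independent: forecast each channel from its own history only.

    history: (channels, lookback); weights: (horizon, lookback), shared by all channels.
    Returns (channels, horizon).
    """
    return [
        [sum(w * x for w, x in zip(row, channel)) for row in weights]
        for channel in history
    ]


def cd_linear_forecast(history: Matrix, weights: Matrix) -> Matrix:
    """Channel-dependent: one linear map over the flattened history of all channels.

    history: (channels, lookback); weights: (channels * horizon, channels * lookback).
    Returns (channels, horizon).
    """
    channels = len(history)
    horizon = len(weights) // channels
    flat = [x for channel in history for x in channel]  # concatenate all channels
    out = [sum(w * x for w, x in zip(row, flat)) for row in weights]
    # reshape the flat output back into one forecast per channel
    return [out[c * horizon:(c + 1) * horizon] for c in range(channels)]
```

For example, with `history = [[1, 2, 3], [10, 20, 30]]`, the CI weights `[[0, 0, 1]]` simply repeat each channel's last value and return `[[3], [30]]`, while the CD weights `[[0, 0, 1, 0, 0, 0.5], [0, 0, 0, 0, 0, 1]]` let channel 0's forecast borrow from channel 1 and return `[[18.0], [30]]`.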
The paper first runs detailed comparative experiments, using linear models to compare the CI and CD methods on multiple datasets and determine which works better. A main conclusion of these experiments is that the CI method performs better on most tasks and is more stable. As the figure below shows, CI's MAE, MSE and other metrics are lower than CD's on almost every dataset, and its performance also fluctuates less.
The experimental results below also show that, compared with CD, CI improves performance for most prediction window lengths and datasets.
Why is the CI method better and more stable than CD in practice? The paper gives a theoretical analysis whose core conclusion is that real data often exhibits distribution drift, and the CI method helps alleviate this problem and improves model generalization. The figure below shows the ACF (autocorrelation function, which reflects the relationship between the future sequence and the historical sequence) for the training set and test set of each dataset. Distribution drift is widespread across the datasets: the ACF of the training set differs from that of the test set, meaning the relationship between history and future is not the same in the two splits.
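A simple way to check a series for this kind of drift yourself is to compute the sample ACF separately on a training split and a test split and compare them. The sketch below uses the standard sample ACF; the paper's exact procedure may differ, and the function names, the 0.7 split ratio and the 24-lag default are arbitrary assumptions for illustration.

```python
from typing import List


def acf(series: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelation at lags 1..max_lag (assumes the series is not constant)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)  # lag-0 autocovariance (times n)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def acf_drift(series: List[float], split_ratio: float = 0.7, max_lag: int = 24) -> float:
    """Largest absolute gap between the train-split ACF and the test-split ACF."""
    split = int(len(series) * split_ratio)
    train_acf = acf(series[:split], max_lag)
    test_acf = acf(series[split:], max_lag)
    return max(abs(a - b) for a, b in zip(train_acf, test_acf))
```

As a toy example, a series that alternates between 1 and -1 for 100 steps and then follows a steady upward trend for another 100 steps has a lag-1 ACF of -0.99 in the first half and 0.97 in the second, so `acf_drift(series, split_ratio=0.5, max_lag=1)` is about 1.96: the history-to-future relationship learned on the first half would not transfer to the second.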
The paper shows theoretically that CI is effective at mitigating distribution drift. The choice between CI and CD is a trade-off between model capacity and model robustness: the CD model is more complex, but it is also more sensitive to distribution shift. This mirrors the familiar relationship between model capacity and generalization: the more complex the model, the more closely it fits the training samples, but the worse it tends to generalize, so once the distributions of the training and test sets differ substantially, its performance degrades.
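One rough way to see the capacity gap is to count weights. Assuming a CI linear model that shares a single lookback-to-horizon matrix across all channels, and a CD linear model that maps the whole flattened history to the whole flattened forecast (both are illustrative assumptions; biases are ignored), the CD model has channels² times as many weights:

```python
from typing import Tuple


def linear_param_counts(channels: int, lookback: int, horizon: int) -> Tuple[int, int]:
    """Weight counts (CI, CD) for the two hypothetical linear forecasters."""
    ci = lookback * horizon  # one (horizon x lookback) matrix shared by every channel
    cd = (channels * lookback) * (channels * horizon)  # fully mixed flattened map
    return ci, cd
```

With hypothetical sizes of 7 channels, a lookback of 336 and a horizon of 96, `linear_param_counts(7, 336, 96)` gives 32,256 CI weights versus 1,580,544 CD weights, a factor of 49.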
To address these weaknesses of CD modeling, the paper proposes several methods that make CD models more robust.
Regularization: add a regularization loss. The historical sequence, with its most recent observation subtracted, is fed to the model as input, and a smoothness constraint keeps the prediction from deviating too far from that most recent observation, so the forecasts are flatter;
Low-rank decomposition: factor the fully connected weight matrix into the product of two low-rank matrices, which reduces model capacity, alleviates overfitting, and improves model robustness;
Loss function: MAE is used instead of MSE to reduce the model's sensitivity to outliers;
Historical input sequence length: for the CD model, a longer input history may actually reduce performance, because the longer the history, the more susceptible the model is to distribution shift. For the CI model, increasing the history length improves prediction performance relatively steadily.
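The three CD-side remedies above (residual input with regularization, low-rank decomposition, and an MAE loss) can be sketched together in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the residual step subtracts each channel's last value and adds it back after the linear map, the regularizer is assumed to be a plain L2 (ridge) penalty on the weights, all function names are hypothetical, and training is omitted.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def linear(weights: Matrix, x: List[float]) -> List[float]:
    """Matrix-vector product: weights @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# 1) Residual input plus an L2 penalty (assumed form of the regularization).
def residual_forecast(history: Matrix, weights: Matrix, horizon: int) -> Matrix:
    """CD linear forecast on inputs shifted by each channel's last observation.

    history: (channels, lookback); weights: (channels * horizon, channels * lookback).
    """
    last = [ch[-1] for ch in history]
    shifted = [x - last[c] for c, ch in enumerate(history) for x in ch]
    out = linear(weights, shifted)
    # add the last observation back: zero weights mean "repeat the last value"
    return [[last[c] + out[c * horizon + h] for h in range(horizon)]
            for c in range(len(history))]


def l2_penalty(weights: Matrix, lam: float) -> float:
    """Ridge term: shrinking weights pulls forecasts toward the last observation."""
    return lam * sum(w * w for row in weights for w in row)


# 2) Low-rank factorization: W = U @ V, with U (out, rank) and V (rank, in).
def low_rank_forward(u: Matrix, v: Matrix, x: List[float]) -> List[float]:
    """Compute (U @ V) @ x as U @ (V @ x) without materializing W."""
    return linear(u, linear(v, x))


def param_counts(out_dim: int, in_dim: int, rank: int) -> Tuple[int, int]:
    """(full-matrix weights, low-rank weights), biases ignored."""
    return out_dim * in_dim, rank * (out_dim + in_dim)


# 3) MAE vs MSE: MSE squares the errors, so one outlier dominates it.
def mae(pred: List[float], target: List[float]) -> float:
    return sum(abs(p - y) for p, y in zip(pred, target)) / len(target)


def mse(pred: List[float], target: List[float]) -> float:
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(target)


if __name__ == "__main__":
    history = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
    zero_w = [[0.0] * 6 for _ in range(4)]  # (2 channels * horizon 2, 2 channels * lookback 3)
    print(residual_forecast(history, zero_w, horizon=2))  # [[3.0, 3.0], [30.0, 30.0]]
    print(param_counts(out_dim=672, in_dim=2352, rank=16))  # (1580544, 48384)
    pred, target = [1, 1, 1, 1, 10], [0, 0, 0, 0, 0]
    print(mae(pred, target), mse(pred, target))  # 2.8 20.8
```

With all-zero weights the residual forecaster reduces to repeating the last observation, which is why a penalty that shrinks the weights also keeps forecasts close to the most recent value. A rank-16 factorization of a hypothetical 672 × 2352 CD weight matrix (7 channels, lookback 336, horizon 96) needs 48,384 weights instead of 1,580,544. In the loss example, one outlier among five unit errors raises MSE from 1 to 20.8 but MAE only to 2.8.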
The paper tests these improvements to the CD model on multiple datasets and obtains a fairly consistent improvement over plain CD, indicating that the methods noticeably improve the robustness of multivariate time series forecasting. The paper also reports experimental results on how factors such as low-rank decomposition, history window length, and loss function type affect performance.