Home > Article > Backend Development > Introducing the use of python's statsmodels module to fit ARIMA models
Related free learning recommendations: python video tutorial
Import necessary packages and modules
from scipy import statsimport pandas as pdimport matplotlib.pyplot as pltimport statsmodels.api as smfrom statsmodels.tsa.arima.model import ARIMAfrom statsmodels.graphics.tsaplots import plot_predict plt.rcParams['font.sans-serif']=['simhei']#用于正常显示中文标签plt.rcParams['axes.unicode_minus']=False#用于正常显示负号
1. Read the data and draw the graph
data=pd.read_csv('数据/客运量.csv',index_col=0)data.index = pd.Index(sm.tsa.datetools.dates_from_range('1949', '2008'))#将时间列改为专门时间格式,方便后期操作data.plot(figsize=(12,8),marker='o',color='black',ylabel='客运量')#画图
#The passenger flow time series data used in this article: https://download.csdn.net/download/weixin_45590329 /14143811
#The time series line chart is as shown below. Obviously the data has an increasing trend, and the preliminary judgment is that the data is not stable
2. Stationarity test
sm.tsa.adfuller(data,regression='c')sm.tsa.adfuller(data,regression='nc')sm.tsa.adfuller(data,regression='ct')
is carried out Three forms of ADF unit root tests, as shown in some results, found that the sequence is not stationary
3. Perform first-order difference processing on the data
diff=data.diff(1)diff.dropna(inplace=True)diff.plot(figsize=(12,8),marker='o',color='black')#画图
Make a data Line chart after first-order difference, preliminary judgment is that it is stationary
4. Conduct stationarity test on the first-order difference data
sm.tsa.adfuller(diff,regression='c')sm.tsa.adfuller(diff,regression='nc')sm.tsa.adfuller(diff,regression='ct')
As shown in the figure, it shows that the sequence is stationary
5. Determine the order of ARIMA (p, d, q)
fig = plt.figure(figsize=(12,8))ax1 = fig.add_subplot(211)fig = sm.graphics.tsa.plot_acf(diff.values.squeeze(), lags=12, ax=ax1)#自相关系数图1阶截尾,决定MA(1)ax2 = fig.add_subplot(212)fig = sm.graphics.tsa.plot_pacf(diff, lags=12, ax=ax2)#偏相关系数图1阶截尾,决定AR(1)
According to the autocorrelation coefficient map ACF and the partial autocorrelation coefficient map PACF, determine the original data as ARIMA ( 1,1,1) Model
6. Parameter estimation
model = ARIMA(data, order=(1, 1, 1)).fit()#拟合模型model.summary()#统计信息汇总#系数检验params=model.params#系数tvalues=model.tvalues#系数t值bse=model.bse#系数标准误pvalues=model.pvalues#系数p值#绘制残差序列折线图resid=model.resid#残差序列fig = plt.figure(figsize=(12,8))ax = fig.add_subplot(111)ax = model.resid.plot(ax=ax)#计算模型拟合值fit=model.predict(exog=data[['TLHYL']])
7. Model test
#8.1.检验序列自相关sm.stats.durbin_watson(model.resid.values)#DW检验:靠近2——正常;靠近0——正自相关;靠近4——负自相关#8.2.AIC和BIC准则model.aic#模型的AIC值model.bic#模型的BIC值#8.3.残差序列正态性检验stats.normaltest(resid)#检验序列残差是否为正态分布#最终检验结果显示无法拒绝原假设,说明残差序列为正态分布,模型拟合良好#8.4.绘制残差序列自相关图和偏自相关图fig = plt.figure(figsize=(12,8))ax1 = fig.add_subplot(211)fig = sm.graphics.tsa.plot_acf(resid.values.squeeze(), lags=12, ax=ax1)ax2 = fig.add_subplot(212)fig = sm.graphics.tsa.plot_pacf(resid, lags=12, ax=ax2)#如果两图都零阶截尾,这说明模型拟合良好
8. Prediction
#预测至2016年的数据。由于ARIMA模型有两个参数,至少需要包含两个初始数据,因此从2006年开始预测predict = model.predict('2006', '2016', dynamic=True)print(predict)#画预测图及置信区间图fig, ax = plt.subplots(figsize=(10,8))fig = plot_predict(model, start='2002', end='2006', ax=ax)legend = ax.legend(loc='upper left')
For a large number of free learning recommendations, please visitpython tutorial(Video)
The above is the detailed content of Introducing the use of python's statsmodels module to fit ARIMA models. For more information, please follow other related articles on the PHP Chinese website!