2010 Corn Prices: A Study of U.S. Corn Price Cycles, 1970-2010, and Future Price Forecasts

Background and Research Questions

In the United States, the corn price is one of the most important indicators of the agricultural economy. Studying its history reveals how agricultural product prices have changed over these decades and whether unexpected factors caused abnormal swings in American agriculture. Modeling the historical data is also an important way to forecast future prices. Agricultural prices are generally thought to follow pronounced seasonal cycles, and ignoring this seasonal factor would conceal important information and make our forecasts imprecise or even biased. Our second problem is therefore to determine whether such a cycle exists and, if it does, what its period is and how it affects the corn price.

Key Questions:
• Are there any significant cycles in the corn price?
• What time series model should we develop, and what does it forecast for the next two years?
To address these questions, we first plot the data to observe its features and any possible periodicity, which guides the construction of a time series model. A spectral density estimate (the periodogram) then addresses possible cycles in the data. Finally, we forecast the corn price for the next 24 months and assess the accuracy of the model.

Preliminary Exploration

The data were obtained from http://www.macrotrends.net. Because the raw data are daily prices, we first convert them to monthly average prices in R using the xts and zoo packages. The plot below shows the monthly average corn price in the United States from 1970 to 2010:
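The conversion code itself is not included in the original. As a minimal sketch, base R's `aggregate()` can stand in for xts/zoo: label each day with its calendar month and average within each month. The data frame `daily` and its `date`/`price` columns are hypothetical, and the prices are synthetic.

```r
# Hypothetical daily input: one year of synthetic prices (not the macrotrends data)
set.seed(1)
dates <- seq(as.Date("1970-01-01"), as.Date("1970-12-31"), by = "day")
daily <- data.frame(date = dates,
                    price = 1.2 + cumsum(rnorm(length(dates), sd = 0.01)))

# Label every day with its calendar month ("YYYY-MM") ...
daily$month <- format(daily$date, "%Y-%m")

# ... and average the prices within each month
monthly <- aggregate(price ~ month, data = daily, FUN = mean)
head(monthly)
```

With xts loaded, `apply.monthly(x, mean)` does the same job directly on an xts object.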

    # plot the monthly average price series (default x-axis ticks suppressed)
    plot(dat.month$price, type = "l", xaxt = "n", xlab = "1970.01 to 2010.12", ylab = "Price", main = "Monthly corn price from 1970 to 2010")


The time series plot of the monthly corn price from 1970 to 2010 shows a strongly autocorrelated series with pronounced oscillations. To analyze it further, we next plot the ACF and PACF:

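    library(astsa)  # provides acf2(), mvspec(), sarima() and sarima.for()
    invisible(acf2(dat.month$price))  # sample ACF and PACF of the monthly series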


As the ACF shows, the monthly corn price behaves like a short-memory, covariance-stationary series, so no ordinary differencing appears necessary, and an AR(12) component seems a reasonable choice. However, the ACF also shows a clear cyclical pattern, so the seasonal factor cannot be ignored.
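A cyclical ACF is the signature of a periodic component. The base-R sketch below uses synthetic data (an assumed period-4 cosine plus white noise, not the corn series) and `acf()` in place of `astsa::acf2()`: the sample ACF is near 0 at odd lags, near -1 at lag 2 and near +1 at lag 4, and the pattern repeats every four lags.

```r
# Synthetic illustration (not corn data): a period-4 cycle plus white noise
set.seed(42)
n <- 480                                   # 40 years of monthly observations
t <- 1:n
x <- 10 * cos(2 * pi * t / 4) + rnorm(n)   # cycle with a period of 4 months

# Sample ACF from base R (astsa::acf2 plots the ACF and PACF together)
r <- acf(x, lag.max = 24, plot = FALSE)

# acf() stores lag 0 first, so dropping it leaves lags 1..24
rho <- r$acf[-1]
round(rho[1:8], 2)
```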

Searching for Cycles

We use astsa::mvspec to plot the periodogram.

    # demean the series, then plot its raw periodogram
    corn.price.spectrum <- dat.month$price - mean(dat.month$price)
    price.per <- mvspec(corn.price.spectrum, log = "no")
    maxf <- which.max(price.per$spec)    # 136 in the original run
    freq1 <- price.per$freq[maxf]        # 0.272 cycles per month
    abline(v = freq1, lty = 2, col = "red")
    per1 <- 1 / freq1                    # 3.676471 months
    text(freq1, 900, substitute(omega == 0.272))


The periodogram is highest at frequency 0.272, so the dominant cycle in the monthly corn price has a period of $\frac{1}{0.272} \approx 3.68 \approx 4$ months. The confidence interval for the spectrum at this frequency is [0.3050298, 44.44375]. Although wide, this interval contains the periodogram values at only one or two other frequencies, none of which we believe is another peak, so we consider this peak significant. We therefore try a SARIMA model with S = 4.
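The peak search and its confidence interval can be sketched in base R. This assumes a synthetic period-4 series, uses `spec.pgram()` in place of `astsa::mvspec()`, and applies the standard chi-square interval for a spectral ordinate, $\left[\frac{2 I(\omega)}{\chi^2_2(1-\alpha/2)}, \frac{2 I(\omega)}{\chi^2_2(\alpha/2)}\right]$ with 2 degrees of freedom for a raw periodogram. The helper `spec_ci()` is hypothetical.

```r
# Synthetic series with a period-4 cycle (an assumption for illustration)
set.seed(7)
n <- 480
x <- 3 * cos(2 * pi * (1:n) / 4) + rnorm(n, sd = 2)

# Raw periodogram (no taper, no detrending) of the demeaned series
pg <- spec.pgram(x - mean(x), taper = 0, detrend = FALSE, plot = FALSE)
k <- which.max(pg$spec)
peak_freq <- pg$freq[k]          # cycles per observation (frequency 1 here)
peak_period <- 1 / peak_freq     # period in observations (months)

# Hypothetical helper: chi-square confidence interval for one spectral ordinate
spec_ci <- function(spec_value, df = 2, level = 0.95) {
  alpha <- 1 - level
  c(lower = df * spec_value / qchisq(1 - alpha / 2, df),
    upper = df * spec_value / qchisq(alpha / 2, df))
}
ci <- spec_ci(pg$spec[k], df = 2)
```

For df = 2 the two limits always differ by a factor of `qchisq(0.975, 2) / qchisq(0.025, 2)`, about 145.7, which matches the ratio of the interval reported above; back-solving that interval gives a periodogram value near 1.125 (`spec_ci(1.125218)` returns roughly [0.305, 44.44]). This suggests, but does not confirm, that the reported interval is a 95% interval for a raw periodogram ordinate.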

Modeling

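First, we take a seasonal difference at lag 4 (S = 4). The ACF of the seasonally differenced series suggests that one further (ordinary) difference is needed:

    d.diff <- na.omit(diff(dat.month$price, lag = 4))  # seasonal difference, S = 4
    invisible(acf2(d.diff))                            # ACF/PACF after seasonal differencing
    d.d.diff <- diff(d.diff)                           # one further ordinary difference
    invisible(acf2(na.omit(d.d.diff)))                 # ACF/PACF after both differences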
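A toy example shows what each differencing step removes. The series is hypothetical, chosen so the arithmetic is exact: `diff(x, lag = 4)` cancels any pattern that repeats every four months, and a further `diff()` removes the constant step left by a linear trend.

```r
# Hypothetical series: a fixed period-4 pattern on top of a linear trend
x <- rep(c(1, 5, 2, 7), times = 10) + 0.5 * (1:40)

# Seasonal difference at lag S = 4: x[t] - x[t - 4]
d4 <- diff(x, lag = 4)

# A further ordinary difference: d4[t] - d4[t - 1]
d4d1 <- diff(d4)

unique(d4)     # 2: the pattern cancels, leaving the trend step 0.5 * 4
unique(d4d1)   # 0: the constant step differences away
```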

After the second difference, the ACF looks appropriate. The PACF decays exponentially, while the ACF cuts off after seasonal lag 1s or 2s, so we consider two candidate models, $\text{SARIMA}(12,0,0)\times(0,1,1)_4$ and $\text{SARIMA}(12,0,0)\times(0,1,2)_4$, and compare them on several criteria.
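For readers without astsa, the two candidates can be fitted with base R's `stats::arima()`, which (to our understanding) is what `astsa::sarima()` calls internally. The data below are synthetic (an assumed seasonal random walk of period 4 driven by AR(1) noise). One difference to note: `arima()` drops the mean for differenced models, whereas `sarima()` adds a drift "constant" when exactly one difference is taken, which is why the coefficient table discussed below lists a constant.

```r
# Synthetic data (assumption): y[t] = y[t - 4] + w[t], with w an AR(1) series
set.seed(2010)
w <- arima.sim(model = list(ar = 0.6), n = 480)
y <- stats::filter(w, filter = c(0, 0, 0, 1), method = "recursive")

# SARIMA(12, 0, 0) x (0, 1, 1)_4 and SARIMA(12, 0, 0) x (0, 1, 2)_4
fit_q1 <- arima(y, order = c(12, 0, 0),
                seasonal = list(order = c(0, 1, 1), period = 4))
fit_q2 <- arima(y, order = c(12, 0, 0),
                seasonal = list(order = c(0, 1, 2), period = 4))

names(coef(fit_q1))   # ar1 ... ar12 and sma1 (no mean term after differencing)
```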

    # fit both candidates; sarima() also draws the residual diagnostics, incl. Ljung-Box p-values
    g011 <- sarima(dat.month$price, p = 12, d = 0, q = 0, P = 0, D = 1, Q = 1, S = 4, details = FALSE)
    g012 <- sarima(dat.month$price, p = 12, d = 0, q = 0, P = 0, D = 1, Q = 2, S = 4, details = FALSE)
    # compare the two models: smaller AICc and BIC are better
    g011$AICc < g012$AICc
    ## [1] TRUE
    g011$BIC < g012$BIC
    ## [1] TRUE

First, the Ljung-Box test favors $\text{SARIMA}(12,0,0)\times(0,1,1)_4$: nearly all of its Ljung-Box p-values lie above the significance threshold, so there is little evidence of autocorrelation left in its residuals, whereas only 5 of the p-values for $\text{SARIMA}(12,0,0)\times(0,1,2)_4$ do. The $(0,1,1)_4$ model therefore adequately captures both the short-term monthly autocorrelation and the quarterly cycle. Second, both AICc and BIC are smaller for $\text{SARIMA}(12,0,0)\times(0,1,1)_4$ than for $\text{SARIMA}(12,0,0)\times(0,1,2)_4$. We therefore choose $\text{SARIMA}(12,0,0)\times(0,1,1)_4$ as the final model.

The estimated coefficients for $\text{SARIMA}(12,0,0)\times(0,1,1)_4$ are shown in the table below. The p-values for ar5 and the constant are not significant at the α = 0.1 level, so these two terms may need to be reconsidered.
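The three checks used above (information criteria, Ljung-Box p-values and coefficient p-values) can be sketched with base R on the same kind of synthetic data. AICc here follows the standard small-sample correction $\mathrm{AIC} + \frac{2k(k+1)}{n-k-1}$; astsa reports its own scaled versions, so only the rankings, not the raw numbers, should be compared with the values from `sarima()`. The helper `aicc()` is hypothetical, and the coefficient p-values use a normal approximation.

```r
# Synthetic data and candidate fits (assumptions, as in a seasonal random walk of period 4)
set.seed(2010)
w <- arima.sim(model = list(ar = 0.6), n = 480)
y <- stats::filter(w, filter = c(0, 0, 0, 1), method = "recursive")
fits <- list(
  Q1 = arima(y, order = c(12, 0, 0), seasonal = list(order = c(0, 1, 1), period = 4)),
  Q2 = arima(y, order = c(12, 0, 0), seasonal = list(order = c(0, 1, 2), period = 4))
)

# Hypothetical helper: small-sample corrected AIC; k counts sigma^2 as a parameter
aicc <- function(fit) {
  k <- length(coef(fit)) + 1
  n <- fit$nobs                      # observations used after differencing
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}
ic <- sapply(fits, function(f) c(AIC = AIC(f), AICc = aicc(f), BIC = BIC(f)))

# Ljung-Box test on the residuals; fitdf subtracts the estimated ARMA coefficients.
# Large p-values mean no evidence of remaining autocorrelation.
lb_p <- sapply(fits, function(f)
  Box.test(residuals(f), lag = 24, type = "Ljung-Box",
           fitdf = length(coef(f)))$p.value)

# Wald z-tests for the coefficients of the Q = 1 model
est <- coef(fits$Q1)
se <- sqrt(diag(fits$Q1$var.coef))
p_coef <- 2 * pnorm(-abs(est / se))
round(p_coef, 3)
```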


Forecasting with the Model

Using $\text{SARIMA}(12,0,0)\times(0,1,1)_4$, we forecast the corn price for the next 24 months. Forecast error bounds are plotted along with the predictions.

    # 24-month forecast from the chosen model; z holds the forecasts ($pred) and standard errors ($se)
    z <- sarima.for(dat.month$price, n.ahead = 24, p = 12, d = 0, q = 0, P = 0, D = 1, Q = 1, S = 4)
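In base R the same kind of forecast comes from `predict()` on an `arima()` fit, which returns point forecasts and their standard errors. This sketch again uses assumed synthetic data and draws approximate 95% bounds as ±1.96 standard errors; the bands drawn by `sarima.for()` may use different multiples.

```r
# Synthetic data and the chosen model structure (assumptions for illustration)
set.seed(2010)
w <- arima.sim(model = list(ar = 0.6), n = 480)
y <- stats::filter(w, filter = c(0, 0, 0, 1), method = "recursive")
fit <- arima(y, order = c(12, 0, 0),
             seasonal = list(order = c(0, 1, 1), period = 4))

# 24-step-ahead point forecasts and their standard errors
fc <- predict(fit, n.ahead = 24)

# Approximate 95% forecast bounds
upper <- fc$pred + 1.96 * fc$se
lower <- fc$pred - 1.96 * fc$se

plot(fc$pred, ylim = range(lower, upper), ylab = "Forecast")
lines(upper, lty = 2)
lines(lower, lty = 2)
```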


Compared with the actual prices, the forecast accuracy reaches 89%.
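The source does not say how the 89% figure was computed. One common definition, assumed here purely for illustration, is 100 × (1 − MAPE), where MAPE is the mean absolute percentage error of the forecasts against the realized prices over the forecast horizon. The function name and the numbers below are hypothetical.

```r
# Hypothetical accuracy measure: 100 * (1 - MAPE); the source does not define its metric
forecast_accuracy <- function(actual, forecast) {
  stopifnot(length(actual) == length(forecast), all(actual != 0))
  mape <- mean(abs((actual - forecast) / actual))  # mean absolute percentage error
  100 * (1 - mape)
}

forecast_accuracy(actual = c(100, 200), forecast = c(110, 180))  # 90
```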
