Advanced Statistics with R code: 90% done, just check it and write a summary

 


I have the problem below that needs to be completed. I have included weekly GOOGLE time series data from the past few years; the analysis must be done on this data. I have also included sample code and analysis. Please do a similar analysis with thorough answers. It will be easiest to copy the sample code and modify it.

The analysis should be submitted as a report consisting of two sections:

1. A non-technical summary not longer than one typewritten page, describing the conclusions of your statistical analysis to a non-technical audience. It should be intelligible to a person who does not know regression analysis. Suppose you are talking to your boss who does not know statistics or to a friend who is not familiar with statistical terminology.

2. A technical summary not exceeding 5 pages, with details of your statistical analysis. This section is intended for a statistically literate audience and must be written in a clear organized fashion. For instance, you can organize the report into subsections such as


a. Exploratory analysis of the data.

b. Model fitting.

c. Residual analysis and model diagnostics.

d. Forecast analysis

e. Analysis of the results and discussion.

You should include appropriate output and graphs from your  R programs in your document in a presentable format. Further details may be relegated to an appendix that may contain the R code, some graphs, computer output or supplementary information about the field of study.

It is possible that you may not find a satisfactory model that fits adequately your data. Sometimes a data set may admit more than one satisfactory answer; sometimes there may be none. If the statistical analysis shows that no time series models are suitable for your data set, please mention what approaches you have tried and what was unsatisfactory about them. If there is more than one suitable model, mention the pros and cons and compare their performance in forecasting future observations.

================================================================ The assignment should resemble Homework 4 and include the pieces of Homework 5 and the sample analysis. It should be in a clean format like the homework and include the R code at the end. Since this is stock price data, a plain ARIMA process can't be used, and it should (I think) require GARCH volatility modeling. Take a look at the S&P analysis for further clarification.
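Before the data listing below, here is a minimal sketch of how the GOOGLE series could be read in and converted to weekly log returns, the usual starting point for GARCH modeling. The file name google.csv is hypothetical, and dropping the handful of weeks with missing prices is my assumption, not part of the assignment materials.

# Sketch: load the weekly GOOGLE prices listed below and compute log returns.
# "google.csv" is a hypothetical file name holding the date/price columns below.
library(tseries)
myd = read.table("google.csv", header = TRUE, sep = ",")
myd = myd[!is.na(myd$price), ]   # assumption: drop weeks with missing prices
price = myd$price
ret = diff(log(price))           # weekly log returns
plot(ret, type = "l", main = "GOOGLE weekly log returns")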


date price
11/1/2004 169.35
11/8/2004 182
11/15/2004 169.4
11/22/2004 179.39
11/29/2004 180.4
12/6/2004 171.65
12/13/2004 180.08
12/20/2004 187.9
12/27/2004 192.79
1/3/2005 193.85
1/10/2005 199.97
1/18/2005 188.28
1/24/2005 190.34
1/31/2005 204.36
2/7/2005 187.4
2/14/2005 197.95
2/22/2005 185
2/28/2005 185.9
3/7/2005 177.8
3/14/2005 180.04
3/21/2005 179.25
3/28/2005
4/4/2005 192.05
4/11/2005
4/18/2005 215.81
4/25/2005 220
5/2/2005 228.02
5/9/2005 229.24
5/16/2005 241.61
5/23/2005 266
5/31/2005 280
6/6/2005 282.5
6/13/2005 280.3
6/20/2005 297.25
6/27/2005 291.25
7/5/2005 296.23
7/11/2005 301.19
7/18/2005 302.4
7/25/2005 287.76
8/1/2005 292.35
8/8/2005 289.72
8/15/2005
8/22/2005 283.58
8/29/2005 288.45
9/6/2005 299.09
9/12/2005 300.2
9/19/2005 315.36
9/26/2005 316.46
10/3/2005 312.99
10/10/2005 296.14
10/17/2005 339.9
10/24/2005 358.17
10/31/2005 390
11/7/2005
11/14/2005 400.21
11/21/2005 428.62
11/28/2005 417.7
12/5/2005 409.2
12/12/2005 430.15
12/19/2005 430.93
12/27/2005 414.86
1/3/2006 465.66
1/9/2006 466.25
1/17/2006 399.46
1/23/2006 433.49
1/30/2006 381.55
2/6/2006 362.61
2/13/2006 368.75
2/21/2006 377.4
2/27/2006 378.18
3/6/2006 337.5
3/13/2006 339.79
3/20/2006 365.8
3/27/2006
4/3/2006 406.16
4/10/2006 402.16
4/17/2006 437.1
4/24/2006 417.94
5/1/2006 394.3
5/8/2006 374.13
5/15/2006 370.02
5/22/2006 381.35
5/30/2006 379.44
6/5/2006 386.57
6/12/2006 390.7
6/19/2006 404.86
6/26/2006 419.33
7/3/2006 420.45
7/10/2006 403.5
7/17/2006 390.11
7/24/2006 388.12
7/31/2006 373.85
8/7/2006 368.5
8/14/2006 383.36
8/21/2006 373.26
8/28/2006 378.6
9/5/2006 377.85
9/11/2006 409.88
9/18/2006 403.78
9/25/2006 401.9
10/2/2006 420.5
10/9/2006 427.3
10/16/2006 459.67
10/23/2006 475.2
10/30/2006 471.8
11/6/2006 473.55
11/13/2006 498.79
11/20/2006 505
11/27/2006 480.8
12/4/2006 484.11
12/11/2006 480.3
12/18/2006 455.58
12/26/2006 460
1/3/2007 487.19
1/8/2007
1/16/2007 489.75
1/22/2007 495.84
1/29/2007 481.5
2/5/2007 461.89
2/12/2007 469.94
2/20/2007 470.62
2/26/2007 438.68
3/5/2007 452.96
3/12/2007 440.85
3/19/2007 461.83
3/26/2007 458.16
4/2/2007 471.51
4/9/2007 466.29
4/16/2007 482.48
4/23/2007 479.01
4/30/2007 471.12
5/7/2007 466.74
5/14/2007 470.3
5/21/2007 483.52
5/29/2007 500.4
6/4/2007 515
6/11/2007 505.89
6/18/2007 524.98
6/25/2007 522.7
7/2/2007 539.4
7/9/2007 552.16
7/16/2007 520.12
7/23/2007 511.89
7/30/2007 503
8/6/2007 515.75
8/13/2007 500.04
8/20/2007
8/27/2007 515.25
9/4/2007 519.35
9/10/2007 528.75
9/17/2007 560
9/24/2007 567
10/1/2007 594.05
10/8/2007 637.39
10/15/2007 644.71
10/22/2007 674.6
10/29/2007 711.25
11/5/2007 663.97
11/12/2007 633.63
11/19/2007 676.7
11/26/2007 693
12/3/2007 714.87
12/10/2007 689.96
12/17/2007 696.69
12/24/2007 702.53
12/31/2007 657
1/7/2008 638.25
1/14/2008 600.25
1/22/2008 566.4
1/28/2008 515.9
2/4/2008 516.69
2/11/2008 529.64
2/19/2008 507.8
2/25/2008 471.18
3/3/2008 433.35
3/10/2008 437.92
3/17/2008 433.55
3/24/2008 438.08
3/31/2008 471.09
4/7/2008 457.45
4/14/2008 539.41
4/21/2008 544.06
4/28/2008 581.29
5/5/2008 573
5/12/2008 580
5/19/2008 544.62
5/27/2008 585.8
6/2/2008
6/9/2008 571.51
6/16/2008 546.43
6/23/2008 528.07
6/30/2008 537
7/7/2008 533.8
7/14/2008 481.32
7/21/2008 491.98
7/28/2008 467.86
8/4/2008 495.01
8/11/2008 510.15
8/18/2008 490.59
8/25/2008 463.29
9/2/2008 444.25
9/8/2008 437.66
9/15/2008 449.15
9/22/2008 431.04
9/29/2008 386.91
10/6/2008 332
10/13/2008 372.5
10/20/2008 339.29
10/27/2008 359.36
11/3/2008 331.14
11/10/2008 310.02
11/17/2008 262.43
11/24/2008 292.96
12/1/2008 283.99
12/8/2008 315.76
12/15/2008 310.17
12/22/2008 300.36
12/29/2008 321.32
1/5/2009 315.07
1/12/2009 299.67
1/20/2009 324.7
1/26/2009 338.53
2/2/2009 371.28
2/9/2009 357.68
2/17/2009 346.45
2/23/2009 337.99
3/2/2009 308.57
3/9/2009 324.42
3/16/2009 330.16
3/23/2009 347.7
3/30/2009 369.78
4/6/2009
4/13/2009 392.24
4/20/2009 389.49
4/27/2009 393.69
5/4/2009 407.33
5/11/2009
5/18/2009 393.5
5/26/2009 417.23
6/1/2009 444.32
6/8/2009 424.84
6/15/2009 420.09
6/22/2009 425.32
6/29/2009 408.49
7/6/2009 414.4
7/13/2009 430.25
7/20/2009 446.72
7/27/2009 443.05
8/3/2009 457.1
8/10/2009
8/17/2009 465.24
8/24/2009 464.75
8/31/2009 461.3
9/8/2009 472.14
9/14/2009 491.46
9/21/2009 492.48
9/28/2009 484.58
10/5/2009 516.25
10/12/2009 549.85
10/19/2009 553.69
10/26/2009 536.12
11/2/2009 551.1
11/9/2009 572.05
11/16/2009 569.96
11/23/2009 579.76
11/30/2009 585.01
12/7/2009 590
12/14/2009 596.42
12/21/2009 618.48
12/28/2009 619.98
1/4/2010 602.02
1/11/2010
1/19/2010 550.01
1/25/2010 529.94
2/1/2010 531.29
2/8/2010 533.12
2/16/2010 540.76
2/22/2010 526.8
3/1/2010 564.21
3/8/2010 579.54
3/15/2010
3/22/2010 562.69
3/29/2010 568.8
4/5/2010 566.22
4/12/2010 550.15
4/19/2010 544.99
4/26/2010 525.7
5/3/2010 493.14
5/10/2010 507.53
5/17/2010 472.05
5/24/2010 485.63
6/1/2010 498.72
6/7/2010 488.5
6/14/2010 500.03
6/21/2010 472.68
6/28/2010 436.55
7/6/2010 467.49
7/12/2010 459.61
7/19/2010 490.06
7/26/2010 484.85
8/2/2010 500.22
8/9/2010 486.35
8/16/2010 462.02
8/23/2010 458.83
8/30/2010
9/7/2010 476.14
9/13/2010 490.15
9/20/2010 527.29
9/27/2010 525.62
10/4/2010 536.35
10/11/2010 601.45
10/18/2010 612.53
10/25/2010 613.7
11/1/2010 625.08
11/8/2010 603.29
11/15/2010 590.8
11/22/2010
11/29/2010
12/6/2010 592.21
12/13/2010
12/20/2010 604.23
12/27/2010 593.97
1/3/2011 616.44
1/10/2011 624.18
1/18/2011 611.83
1/24/2011 600.99
1/31/2011 610.98
2/7/2011 624.5
2/14/2011 630.08
2/22/2011 610.04
2/28/2011 600.62
3/7/2011 576.71
3/14/2011 561.06
3/21/2011 579.74
3/28/2011 591.8
4/4/2011 578.16
4/11/2011 530.7
4/18/2011 525.1
4/25/2011 544.1
5/2/2011 535.3
5/9/2011 529.55
5/16/2011 524.03
5/23/2011 520.9
5/31/2011 523.08
6/6/2011 509.51
6/13/2011 485.02
6/20/2011 474.88
6/27/2011 521.03
7/5/2011 531.99
7/11/2011 597.62
7/18/2011 618.23
7/25/2011 603.69
8/1/2011 579.04
8/8/2011 563
8/15/2011 490.92
8/22/2011 526.86
8/29/2011 524.84
9/6/2011 524.85
9/12/2011 546.68
9/19/2011 525.51
9/26/2011 515.04
10/3/2011 515.12
10/10/2011 591.68
10/17/2011 590.49
10/24/2011 600.14
10/31/2011 596.14
11/7/2011 608.35
11/14/2011 594.88
11/21/2011
11/28/2011 620.36
12/5/2011 627.42
12/12/2011 625.96
12/19/2011 633.14
12/27/2011 645.9
1/3/2012 650.02
1/9/2012 624.99
1/17/2012 585.99
1/23/2012 579.98
1/30/2012 596.33
2/6/2012 605.91
2/13/2012 604.64
2/21/2012 609.9
2/27/2012 621.25
3/5/2012
3/12/2012 625.04
3/19/2012 642
3/26/2012 641.24
4/2/2012 632.32
4/9/2012 624.6
4/16/2012 596.06
4/23/2012 614.98
4/30/2012 596.97
5/7/2012 605.23
5/14/2012 600.4
5/21/2012 591.53
5/29/2012 570.98
6/4/2012 580.45
6/11/2012 564.51
6/18/2012 571.48
6/25/2012 580.07
7/2/2012 585.98
7/9/2012 576.52
7/16/2012 610.82
7/23/2012 634.96
7/30/2012 641.33
8/6/2012
8/13/2012 677.14
8/20/2012 678.63
8/27/2012 685.09
9/4/2012 706.15
9/10/2012 709.68
9/17/2012 733.99
9/24/2012 754.5
10/1/2012 767.65
10/8/2012 744.75
10/15/2012 681.79
10/22/2012 675.15
10/31/2012 687.92
11/5/2012 663.03
11/12/2012 647.18
11/19/2012 667.97
11/26/2012 698.37
12/3/2012 684.21
12/10/2012 701.96
12/17/2012 715.63
12/24/2012 700.01
12/31/2012 737.97
1/7/2013 739.99
1/14/2013 704.51
1/22/2013 753.67
1/28/2013 775.6
2/4/2013 785.37
2/11/2013 792.89
2/19/2013 799.71
2/25/2013 806.19
3/4/2013 831.52
3/11/2013 814.3
3/18/2013 810.31
3/25/2013 794.19
4/1/2013 783.05
4/8/2013 790.05
4/15/2013 799.87
4/22/2013 801.42
4/29/2013 845.72
5/6/2013 880.23
5/13/2013 909.18
5/20/2013 873.32
5/28/2013 871.22
6/3/2013 879.73
6/10/2013 875.04
6/17/2013 880.93
6/24/2013 880.37
7/1/2013 893.49
7/8/2013 923
7/15/2013 896.6
7/22/2013 885.35
7/29/2013 906.57
8/5/2013 890.41
8/12/2013 856.91
8/19/2013 870.21
8/26/2013 846.9
9/3/2013 879.58
9/9/2013 889.07
9/16/2013 903.11
9/23/2013 876.39
9/30/2013 872.35
10/7/2013 871.99
10/14/2013 1011.41
10/21/2013 1015.2
10/28/2013 1027.04

CSC425– Time series analysis and forecasting

Homework 4 – Solutions

Problem 1 [just for fun!] – NO CREDIT

I noticed some students had questions regarding the SARIMA notation. Write the model expression for

the following SARIMA models for the {Xt} time series.

Example: SARIMA(2,0,0)(1,0,0)[12]

The notation indicates a regular AR(2) model and a seasonal AR(1)[12] model

(1 - φ1B - φ2B^2)(1 - Φ1B^12) Xt = at

a) SARIMA(2,1,0)(1,1,0)[4]

b) SARIMA(1,0,1)(0,0,1)[12]

c) SARIMA(0,1,3)(0,1,1)[4]

In general SARIMA(p,d,q)(P,D,Q)[s] where the first set (p,d,q) describes the regular

ARIMA(p,d,q) model, and the second set (P,D,Q)[s] describes the order of the seasonal

component.

a) The first set (2,1,0) identifies a first difference + AR(2) model. The second set (1,1,0) indicates the seasonal component, with a seasonal differencing and a seasonal AR(1) model of period 4.

Thus

SARIMA(2,1,0)(1,1,0)[4]

(1 - ϕ1B - ϕ2B^2)(1 - Φ1B^4)(1 - B)(1 - B^4) Xt = at

b) The first set (1,0,1) indicates an ARMA(1,1) model. The second set (0,0,1) is for the seasonal component, indicating a seasonal MA(1) model of period 12. Thus

SARIMA(1,0,1)(0,0,1)[12]

(1 - ϕ1B) Xt = (1 - Θ1B^12)(1 - θ1B) at

c) The first set (0,1,3) indicates an ARIMA(0,1,3) model, i.e. first difference + MA(3) model. The second set (0,1,1) indicates a seasonal differencing and a seasonal MA(1) model of period 4.

Thus
SARIMA(0,1,3)(0,1,1)[4]

(1 - B)(1 - B^4) Xt = (1 - Θ1B^4)(1 - θ1B - θ2B^2 - θ3B^3) at
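As a quick check of the notation, the three models above can be fitted in R with the arima() function; this is a hedged sketch on a placeholder series x (simulated noise), not part of the original solution.

# Sketch: fitting the Problem 1 SARIMA models with stats::arima().
# x is a placeholder series; replace it with real data of the right frequency.
x = rnorm(200)
fit_a = arima(x, order=c(2,1,0), seasonal=list(order=c(1,1,0), period=4))
fit_b = arima(x, order=c(1,0,1), seasonal=list(order=c(0,0,1), period=12))
fit_c = arima(x, order=c(0,1,3), seasonal=list(order=c(0,1,1), period=4))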

Problem 2 [15pts]

Consider the quarterly earnings per share (EPS) of Costco from the last quarter of 2001 to the third

quarter of 2013 fiscal year. The data were obtained from the Nasdaq website and are saved in the

costco.csv file.

a) Analyze the distribution of the quarterly earnings. Are the data normally distributed? Provide
appropriate statistics, tests and graphs to support your conclusions.

The average quarterly earnings are $0.66 per share with a standard deviation of $0.27. The distribution of eps has a positive skew, as shown by the histogram and normal quantile plot (points under the right tail diverge from the diagonal). The Jarque-Bera normality test rejects the hypothesis of normality for the earnings data at the 5% level. (See SAS / R output.)

The UNIVARIATE Procedure

Variable: eps

Moments

N 48 Sum Weights 48

Mean 0.66270833 Sum Observations 31.81

Std Deviation 0.26970228 Variance 0.07273932

Skewness 0.95652585 Kurtosis 0.80234438

Uncorrected SS 24.4995 Corrected SS 3.41874792

Coeff Variation 40.6969796 Std Error Mean 0.03892817

Results of the Jarque-Bera test on normality

Obs   Skewness   Kurtosis   Jarque-Bera statistic   P-value for Jarque-Bera test
1     0.95653    0.80234    8.60705                 0.013521
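The same statistics are available in R; a minimal sketch using the fBasics package already loaded in the solution code:

# Sketch: distribution summary and Jarque-Bera normality test on EPS in R
library(fBasics)
eps = myd$eps
basicStats(eps)                  # mean, std deviation, skewness, kurtosis
normalTest(eps, method = "jb")   # Jarque-Bera test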

b) Apply the log transformation to stabilize the variance, and analyze the variable yt = log(eps) at time t.

The distribution of log(eps) is closer to the normal distribution, with skewness and kurtosis statistics close to 0, as also shown by the histogram and normal quantile plot. The Jarque-Bera test is not significant (p-value > 0.05), indicating that the hypothesis of a normal distribution for log(eps) cannot be rejected.
Moments

N 48 Sum Weights 48

Mean -0.4891602 Sum Observations -23.479688

Std Deviation 0.39954911 Variance 0.15963949

Skewness 0.0058175 Kurtosis -0.398239

Uncorrected SS 18.9883843 Corrected SS 7.50305604

Coeff Variation -81.68063 Std Error Mean 0.05766995

Results of the Jarque-Bera test on normality

Obs   Skewness      Kurtosis   Jarque-Bera statistic   P-value for Jarque-Bera test
1     0.005817502   -0.39824   0.31746                 0.85323

c) Is there any evidence of a seasonal effect in the log(eps) data? Explain your answer.
The time plots for Costco quarterly earnings and for log(earnings) show quarterly seasonality, with higher earnings in the first and third quarters and lower earnings in the second and fourth quarters. We can observe a slight dip in earnings in 2009, but the recent economic crisis didn't seem to affect Costco earnings much, as earnings increased shortly after the dip. The log(eps) time plot shows that variability is more constant over time.

[Figure: Costco earnings: time plot of quarterly EPS vs. date, 2000 to 2014]

d) Compute the Dickey-Fuller test for the quarterly earnings for p=1, 3, and 5. Write down the test hypotheses and the test statistics, and analyze the results of the test.

The Dickey-Fuller test is expressed as H0: φ1 = 1 vs Ha: φ1 < 1, where the null hypothesis indicates unit-root non-stationarity.

The Dickey-Fuller test shows that the earnings time series is unit-root non-stationary. The test p-values for lags 1, 3 and 5 are all larger than 0.10, therefore the null hypothesis of non-stationarity cannot be rejected.

The ARIMA Procedure

Augmented Dickey-Fuller Unit Root Tests

Type Lags Rho Pr < Rho Tau Pr < Tau F Pr > F

Zero Mean 1 1.1538 0.9282 1.37 0.9554

3 1.2386 0.9381 4.43 0.9999

5 1.1075 0.9213 1.91 0.9852

Single Mean 1 -0.7726 0.9018 -0.28 0.9193 1.22 0.7642

3 1.8677 0.9962 2.06 0.9998 10.02 0.0010

5 0.4869 0.9736 0.24 0.9722 1.86 0.6076

Trend 1 -20.9931 0.0306 -2.77 0.2161 4.22 0.3620

3 0.1628 0.9958 0.06 0.9958 2.31 0.7221

5 93.4268 0.9999 -2.07 0.5472 2.53 0.6813
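For reference, a sketch of the corresponding test in R (fUnitRoots); using type "c" (constant, no trend) for the raw EPS series is my assumption, and the SAS output above also reports the zero-mean and trend variants:

# Sketch: augmented Dickey-Fuller tests on eps for lags 1, 3 and 5
library(fUnitRoots)
for (p in c(1, 3, 5)) print(adfTest(myd$eps, lags = p, type = "c"))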

e) Take the first difference of the log(eps) series, and analyze its autocorrelation function. Does the differenced time series show strong seasonal patterns? Explain.

The ACF of the first difference of yt = log(eps), i.e. yt - yt-1, shows larger values at lags that are multiples of four, which is consistent with the seasonal behavior seen in the time plot. The differenced series is still not stationary, since it shows seasonality.

[Figure: Time plot of Costco log(EPS) (xlog vs. date), 2000 to 2014]

f) If you de-trend and de-seasonalize the Costco log(eps) data, do you obtain a stationary time series?
Use appropriate statistical techniques and graphs to analyze stationarity.

The de-trended and de-seasonalized time series is stationary, as indicated by the Dickey-Fuller test (p-values are small, so the hypothesis of non-stationarity can be rejected) and by the small ACF values. After differencing the time series, we can assume a zero-mean model for the data. Note that the de-seasonalized TS has zero mean (use the "nc" type in R).

The ARIMA Procedure

Augmented Dickey-Fuller Unit Root Tests

Type Lags Rho Pr < Rho Tau Pr < Tau F Pr > F

Zero Mean 1 -39.9069 <.0001 -4.17 0.0001

3 85.9684 0.9999 -5.29 <.0001

5 28.6760 0.9999 -4.61 <.0001

Single Mean 1 -39.7843 0.0003 -4.10 0.0024 8.49 0.0010

3 85.8868 0.9999 -5.19 0.0002 13.58 0.0010

5 28.7344 0.9999 -4.51 0.0009 10.30 0.0010

Trend 1 -39.6292 <.0001 -4.04 0.0149 8.22 0.0184

3 86.3878 0.9999 -5.10 0.0009 13.11 0.0010

5 28.3836 0.9999 -4.48 0.0053 10.04 0.0010
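Equivalently in R, a sketch of the ADF test on the differenced and seasonally differenced series with the no-constant option mentioned above:

# Sketch: ADF test for the twice-differenced log(eps);
# type "nc" (no constant) matches the zero-mean assumption above
library(fUnitRoots)
sdy = diff(diff(log(myd$eps)), 4)
adfTest(sdy, lags = 3, type = "nc")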

g) Apply an airline model SARIMA(0,1,1)(0,1,1)4 to estimate the log(eps).

a. Write down the estimated expression of the fitted model
(1 - B)(1 - B^4) Xt = (1 - 0.339B)(1 - 0.669B^4) at   (SAS output)

The ARIMA Procedure

Maximum Likelihood Estimation

Standard Approx

Parameter Estimate Error t Value Pr > |t| Lag

MA1,1 0.33862 0.14912 2.27 0.0232 1

MA2,1 0.66864 0.13492 4.96 <.0001 4

Variance Estimate 0.012082

Std Error Estimate 0.109917

AIC -63.4007

SBC -59.8783

Number of Residuals 43

b. Are all model coefficients significant?
All coefficients of the SARIMA model are significant (p-value < 0.05).

c. Analyze the residuals to check the model validity
The autocorrelation analysis of the residuals shows that the ACF at lag 5 is significantly larger than zero. This is also reflected in the Ljung-Box test, whose p-value at lags up to 6 (0.0509) is borderline. These results indicate that there is some source of serial correlation that is still unexplained by the model. Since only one ACF value is large, this is not indicative of a seasonal trend. The solution is to add a lag 5 parameter in the MA component of the SARIMA model.

Autocorrelation Check of Residuals

To Chi- Pr >

Lag Square DF ChiSq --------------------Autocorrelations--------------------

6 9.44 4 0.0509 -0.013 0.104 -0.198 -0.129 -0.346 -0.049

12 12.75 10 0.2380 0.002 0.184 0.019 0.153 -0.000 0.023

18 20.68 16 0.1911 -0.185 0.066 -0.179 -0.065 -0.044 0.189

24 21.88 22 0.4673 -0.014 -0.055 -0.066 0.054 -0.053 -0.009
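The fix described above is implemented in the solution's R code with the fixed argument of arima(); a minimal sketch:

# Sketch: MA component with lags 1 and 5 only (lags 2-4 constrained to zero)
# plus the seasonal MA(1) term; the fixed vector is ordered ma1..ma5, sma1
m_mod = arima(y, order=c(0,1,5),
              seasonal=list(order=c(0,1,1), period=4),
              method="ML", fixed=c(NA, 0, 0, 0, NA, NA))
m_mod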

h) The multiplicative airline model has some limitations; how can you modify it to be a better fit for the data? (If you are not sure how to do this, post a message on the discussion forum and start a discussion with other students!)

a. Write down the estimated expression of the fitted model
a. Write down the estimated expression of the fitted model

(1 - B)(1 - B^4) Xt = (1 - 0.7235B^4)(1 - 0.4103B - 0.3726B^5) at   (SAS output)

The ARIMA Procedure
Maximum Likelihood Estimation
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag

MA1,1 0.41029 0.16632 2.47 0.0136 1

MA1,2 0.37262 0.16359 2.28 0.0227 5

MA2,1 0.72350 0.13731 5.27 <.0001 4

Variance Estimate 0.010536

Std Error Estimate 0.102645

AIC -66.3638

SBC -61.0802

Number of Residuals 43

b. Are all model coefficients significant?
All coefficients are significantly different from zero (p-values < 0.05).

c. Analyze the residuals to check the model validity
The residuals are white noise. The Ljung-Box test for lags up to 24 is not significant, indicating that the model is able to capture the time dependencies in the data.

To Chi- Pr >

Lag Square DF ChiSq --------------------Autocorrelations--------------------

6 3.77 3 0.2870 -0.059 0.091 -0.202 -0.134 -0.078 0.028

12 5.94 9 0.7457 0.026 0.140 -0.069 0.078 -0.055 0.062

18 13.34 15 0.5759 -0.109 0.071 -0.205 -0.086 -0.066 0.182

24 14.27 21 0.8577 0.005 -0.042 -0.079 0.018 -0.033 -0.032

i) Which model is a better fit for the data? Explain what evidence you use to select the best model.

The SARIMA(0,1,5)(0,1,1)[4] model with MA terms at lags 1 and 5 provides a better description of the dynamic behavior of the Costco log earnings time series. Its coefficients are significantly different from zero, and its residuals are white noise and normally distributed. This second model also has the lowest SBC/BIC value.

j) Compute forecasts for Costco quarterly EPS for the next 4 quarters using the selected model in h).
The table below reports forecasts in terms of EPS values. The forecasts of log(eps) computed by the SARIMA model were transformed back to the original EPS scale using the exp() function.

Obs date OBSERVED FORECAST STD L95 U95 RESIDUAL x

49 10/01/2013 . 0.97486 0.10264 0.79302 1.18584 . .

50 01/01/2014 . 1.15743 0.11916 0.90987 1.45159 . .

51 04/01/2014 . 1.00514 0.13366 0.76662 1.29454 . .

52 07/01/2014 . 1.51769 0.14672 1.12620 2.00170 . .
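In R the same back-transformation looks like the sketch below; the lognormal mean correction exp(forecast + variance/2) mirrors the SAS data step at the end of this solution. It uses the m_mod fit from the sketch above (the solution's own m1 works identically).

# Sketch: back-transform log-scale forecasts to the EPS scale
f = forecast(m_mod, h = 4)
exp(f$mean)                        # naive back-transform (median forecast)
exp(f$mean + f$model$sigma2 / 2)   # mean forecast with lognormal correction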

R code:
# Analysis of Costco quarterly earnings per share (EPS) from the last
# quarter of 2001 to the third quarter of 2013.
# Quarterly seasonality is expected in the earnings data.

#1. LOAD LIBRARIES
library(tseries)
library(fBasics)
library(forecast)

#2. IMPORT DATA
# Load data into the data frame "myd"
myd=read.table("costco.csv", header=T, sep=',')

# create time series object
x=myd$eps
xts=ts(myd$eps, start=c(2001,4), frequency=4)

# create time plot
plot(xts, type='l')

# TRANSFORMATIONS
# natural log
y=log(x)
yts=log(xts)
plot(yts, type='l')
# add circles on data points in the time plot
points(yts)

# ACF ANALYSIS
# one plot per page
par(mfcol=c(1,1))
# ACF plot for log eps
# as.vector() transforms the time series to a regular numeric vector
acf(as.vector(y), lag.max=24, main="ACF of log eps")

# APPLYING DIFFERENCING TO DATA
# compute regular differences
dy=diff(y)
acf(as.vector(dy), lag.max=16, main="ACF of differenced log eps")
# compute seasonal difference for quarterly data (s=4)
sdy=diff(dy,4)
acf(as.vector(sdy), lag.max=16, main="ACF of seasonally differenced log eps")
pacf(sdy)

# FIT SARIMA MODEL
# try automated order selection;
# since the TS is not stationary, set stationary = FALSE
auto.arima(y, trace=T, stationary=FALSE)

# fit multiplicative seasonal model ARIMA(1,0,0)x(1,1,0)[4]
m1=arima(y, order=c(1,0,0), seasonal=list(order=c(1,1,0), period=4),
         method="ML")
# use the arimaPrint() function in the arimaPrint.R file
# (save arimaPrint.R in the work directory)
source("arimaPrint.R")
arimaPrint(m1)
acf(m1$residuals)
qqnorm(m1$residuals)
qqline(m1$residuals)
hist(m1$residuals)

# Since the seasonal AR(1) coefficient is not significant, remove it;
# fit the airline seasonal model ARIMA(0,1,1)x(0,1,1)[4]
m1=arima(y, order=c(0,1,1), seasonal=list(order=c(0,1,1), period=4),
         method="ML")
m1

# fit seasonal model ARIMA(0,1,5)x(0,1,1)[4] with only MA lags 1 and 5
m1=arima(y, order=c(0,1,5), seasonal=list(order=c(0,1,1), period=4),
         method="ML", fixed=c(NA, 0, 0, 0, NA, NA))
m1
arimaPrint(m1, c(1,5,6))

# One plot per page
par(mfcol=c(1,1))
acf(m1$residuals)
# "approximated" Ljung-Box test on residuals
Box.test(m1$residuals, 6, "Ljung-Box", fitdf=3)
Box.test(m1$residuals, 12, "Ljung-Box", fitdf=3)

# compute predictions for log data up to 8 steps ahead
f1=forecast(m1, h=8)
names(f1)
plot(f1)

# predictions in the original scale
exp(f1$mean)

# create forecast plot for the original variable:
# join data with forecasts
s1=c(x, exp(f1$mean))
# lower limit of the 95% interval
lcl=c(x, exp(f1$lower[,2]))
# upper limit of the 95% interval
ucl=c(x, exp(f1$upper[,2]))
# forecast plot
plot(s1, type='l', col='red')
lines(1:length(s1), ucl, lty=2)
lines(1:length(s1), lcl, lty=2)

SAS code
title1 'Costco earnings';
proc import datafile="costco.csv" out=myd replace;
getnames=yes;
delimiter=',';
run;

proc print;
run;

data myd;
set myd;
xlog=log(eps);
run;

*analyze distribution of log(eps);
proc univariate data=myd;
var xlog;
histogram/normal;
qqplot / normal(mu=est sigma=est);
output out=stats kurtosis=kurtosis skewness=skewness N=ntot;
run;

/* steps to compute the Jarque-Bera test */
/* and print the test results to the OUTPUT window */
data computation;
set stats;
label pv_kur = "P-value for kurtosis test";
*skewness statistic;
skew_test = skewness/sqrt(6/Ntot);
*kurtosis statistic;
kurt_test = kurtosis/sqrt(24/Ntot);
*Jarque-Bera statistic;
jb = skew_test*skew_test + kurt_test*kurt_test;
*JB test p-value;
pv_jb = 1-cdf('CHISQUARE', jb, 2);
label pv_jb = "P-value for Jarque-Bera test"
      jb = "Jarque-Bera statistic";
run;

/* Print out the results of the tests */
title "Results of the Jarque-Bera test on normality";
proc print data=computation label;
var skewness kurtosis jb pv_jb;
run;

*plot timeplots for all transformed variables;
symbol1 i=join v=dot;
proc gplot data=myd;
title1 "Time plot of Costco EPS";
plot eps * date = 1;
title1 "Time plot of Costco log(EPS)";
plot xlog * date = 1;
run;

proc arima data=myd;
identify var=eps nlag=24 stationarity=(adf=(1 3 5));
identify var=xlog(1) nlag=24;
identify var=xlog(1,4) nlag=20 stationarity=(adf=(1 3 5)) minic;
identify var=xlog(1,4) nlag=20 esacf;
run;

*airline model SARIMA(0,1,1)(0,1,1)[4];
estimate q=(1)(4) noconstant method=ml;
*modified model SARIMA(0,1,5)(0,1,1)[4] with MA lags 1 and 5;
estimate q=(1 5)(4) noconstant method=ml;
outlier maxnum=3 alpha=0.01;
run;

forecast out=b lead=12 id=date interval=quarter;
quit;

/*
The following statements retransform the forecast values to get
forecasts in the original scale.
*/
data c;
set b;
x = exp(xlog);
forecast = exp(forecast + std*std/2);
l95 = exp(l95);
u95 = exp(u95);
run;

proc print data=c;
run;

Problem 3 [12 points] (review Week 5 notes)

Global temperatures are rising. Analyze the time series of the Land-Ocean Temperature Index in degrees Celsius published by NASA-GISS (Goddard Institute for Space Studies) and saved in the dataset globtemp.csv. The temperature index measures temperature anomalies computed relative to the base period 1951-1980. The reason to work with anomalies rather than absolute temperatures is that absolute temperature varies markedly over short distances, while monthly or annual temperature anomalies are representative of a much larger region (from the GISS website).
a) Plot the temperature index time series and its ACFs (20 lags). Analyze trends and patterns

shown by the data.

The time plot shows that global temperatures have steadily increased since 1950. The ACF plot shows that the Global Temperature Index series is non-stationary (the ACF decays very slowly to zero, which is not surprising given the upward time trend shown by the time plot).

b) Analyze if the series is stationary using both the ACF function and the Dickey Fuller test
The DF tests computed for AR(3) and AR(5) models show that the null hypothesis of unit-root non-stationarity cannot be rejected (p-values are > 0.1). (NOTE: for this dataset, it would be acceptable to use the DF test either for a model with intercept (constant) but no time trend, or for a model with intercept (constant) and a time trend. Results are equivalent.)

> # Dickey-Fuller test for time series with time trend
> library(fUnitRoots)
> adfTest(index, lags=3, type=c("ct"))

Title:

Augmented Dickey-Fuller Test

Test Results:

PARAMETER:

Lag Order: 3

STATISTIC:

Dickey-Fuller: -2.4227

P VALUE:

0.4007

> adfTest(index, lags=5, type=c("ct"))

Title:
Augmented Dickey-Fuller Test

Test Results:
PARAMETER:

Lag Order: 5

STATISTIC:

Dickey-Fuller: -1.9472

P VALUE:

0.5984

In SAS

Augmented Dickey-Fuller Unit Root Tests

Type Lags Rho Pr < Rho Tau Pr < Tau F Pr > F

Zero Mean 1 -3.1286 0.2229 -0.97 0.2967

3 0.3095 0.7561 0.16 0.7308

5 0.6287 0.8363 0.36 0.7864

Single Mean 1 -2.9490 0.6593 -0.91 0.7827 0.74 0.8811

3 0.5894 0.9790 0.31 0.9780 1.13 0.7831

5 1.0265 0.9887 0.63 0.9900 1.84 0.6012

Trend 1 -26.5191 0.0132 -3.75 0.0225 7.46 0.0207

3 -11.2587 0.3399 -2.42 0.3659 4.15 0.3489

5 -8.1422 0.5607 -1.95 0.6245 3.26 0.5268

c) Use order selection methods such as EACF or BIC to identify the order of the "best" AR/MA/ARMA model for the global temperature index time series. Note that since the TS is not stationary, you should use an ARIMA(p,1,q) model.

Since the TS is unit-root non-stationary, only ARIMA(p,1,q) models will be evaluated.

In R: the EACF method suggests an ARIMA(1,1,2). The auto.arima() function selects the ARIMA(0,1,1) model, which exhibits the lowest BIC value.

In SAS: the EACF method suggests an ARIMA(0,1,1) model and the BIC criterion selects the ARIMA(3,1,0).

> #EACF method
> library(TSA)
> #use numeric object not ts object. coredata() function retrieves numeric vector
> d1_ts=diff(indexts)
> eacf(coredata(d1_ts))

AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x o o x o o o o x o o o x o
1 x o o o o o o o o o o o o o

2 x o x o o o o o o o o o x o
3 o x x o o o o o o o o o o o
4 o x x x o o o o o o o o o o
5 x o x x x o o o o o o o o o
6 x o o o x o o o o o o o o o
7 x o o o x x o o o o o o o o

> # BIC criterion
> # use numeric object not ts object
> auto.arima(index, max.p = 5, max.q = 5, stationary = F, ic = c("bic"),
    trace = TRUE, allowdrift = F)

ARIMA(2,1,2) : 982.1547
ARIMA(0,1,0) : 990.837
ARIMA(1,1,0) : 985.1276
ARIMA(0,1,1) : 972.8737
ARIMA(1,1,1) : 976.1241
ARIMA(0,1,2) : 973.7256
ARIMA(1,1,2) : 979.3824

Best model: ARIMA(0,1,1)

Series: index
ARIMA(0,1,1)

Coefficients:
ma1
-0.5398
s.e. 0.0838

sigma^2 estimated as 89.26: log likelihood=-483.91
AIC=971.82 AICc=971.92 BIC=977.59

In SAS
ESACF Probability Values

Lags MA 0 MA 1 MA 2 MA 3 MA 4 MA 5

AR 0 0.0011 0.0635 0.4021 0.0552 0.1309 0.8882

AR 1 <.0001 0.7810 0.3314 0.3416 0.1542 0.5998

AR 2 <.0001 0.6174 0.0018 0.7358 0.1896 0.8318

AR 3 0.4231 0.0002 0.0001 0.1756 0.6531 0.7922

AR 4 0.2412 <.0001 0.0018 0.0597 0.3335 0.6631

AR 5 <.0001 0.4323 0.0527 0.0004 0.0120 0.9065

The ARIMA Procedure

ARMA(p+d,q) Tentative Order Selection Tests
---ESACF---
p+d  q
0    1
3    3
(5% Significance Level)

Name of Variable = temp

Minimum Information Criterion

Lags MA 0 MA 1 MA 2 MA 3 MA 4 MA 5

AR 0 4.588787 4.487776 4.468956 4.495152 4.505573 4.533958

AR 1 4.541124 4.523001 4.480694 4.48462 4.503237 4.540091

AR 2 4.499424 4.522147 4.516266 4.503816 4.537365 4.574281

AR 3 4.441992 4.477332 4.494755 4.527052 4.558184 4.586585

AR 4 4.477766 4.514051 4.522759 4.556779 4.577564 4.612352

AR 5 4.484678 4.521373 4.55314 4.590039 4.610067 4.637321

Error series model: AR(10)

Minimum Table Value: BIC(3,0) = 4.441992

d) Fit the selected ARIMA model, and write down the result.
The ARIMA(1,1,2) model selected by the EACF method in R has non-significant parameters.

The ARIMA(3,1,0) selected by BIC method in SAS is a possible model but has higher BIC

value (details of analysis are omitted). The best model is the ARIMA(0,1,1) selected by the BIC

criterion in R and EACF in SAS.

The fitted ARIMA(0,1,1) model for the global temperature index Xt can be written as:

(1-B)Xt = (1-0.5398B)at

> m2= Arima(index, order=c(0,1,1), method='ML')
> m2

Series: index
ARIMA(0,1,1)

Coefficients:
ma1
-0.5398
s.e. 0.0838

sigma^2 estimated as 89.26: log likelihood=-483.91
AIC=971.82 AICc=971.92 BIC=977.59
> arimaPrint(m2)

[1] “T-test table for model coefficients”
estimates std error
ma1 “-0.5397611” “0.08376795”
p-values
ma1 “1.167279e-10”

e) Use residual analysis to check the model.
The Ljung-Box test on the residuals shows that the hypothesis of independence cannot be rejected, so we can assume that the residuals are white noise. The normal quantile plot shows that the distribution of the residuals is close to the normal distribution.

> Box.test(m2$residuals, lag=6, type='Ljung-Box', fitdf=1)

Box-Ljung test

data: m2$residuals
X-squared = 9.5895, df = 5, p-value = 0.08774

> Box.test(m2$residuals, lag=12, type='Ljung-Box', fitdf=1)

Box-Ljung test

data: m2$residuals
X-squared = 16.3747, df = 11, p-value = 0.1278

f) Compute 1-step to 4-step ahead forecasts of the fitted model with origin at the end of the data,
i.e. 2012. Write down the forecasts and their standard errors.

> f=forecast.Arima(m2, h=4)
> plot(f, include=50)

> f
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
134 58.1793 46.07184 70.28676 39.66253 76.69606
135 58.1793 44.85108 71.50751 37.79555 78.56305
136 58.1793 43.73311 72.62548 36.08577 80.27283
137 58.1793 42.69566 73.66293 34.49912 81.85947

g) EXTRA CREDIT [2pts]: apply the backtesting procedure to compare the following two models: ARIMA(0,1,1) and ARIMA(0,1,2). Use 113 cases (85% of the data) for training and 20 cases (15%) for testing.

The procedure shows that the RMSE value of the ARIMA(0,1,1) is 9.680, while the RMSE value for the ARIMA(0,1,2) is 9.688. The two values are very close, showing that the two models are equivalent in terms of predictive power. However, the ARIMA(0,1,1) is slightly better.

> backtest(m2,index,113,1)
[1] "RMSE of out-of-sample forecasts"
[1] 9.680175
[1] "Mean absolute error of out-of-sample forecasts"
[1] 7.925004
> m3=arima(index, order=c(0,1,2))
> backtest(m3,index,113,1)

[1] “RMSE of out-of-sample forecasts”
[1] 9.688697
[1] “Mean absolute error of out-of-sample forecasts”
[1] 7.932232
> m3
Series: x
ARIMA(0,1,2)

Coefficients:
ma1 ma2
-0.4287 -0.1591
s.e. 0.0843 0.0764

sigma^2 estimated as 86.63: log likelihood=-481.94
AIC=967.88 AICc=968.07 BIC=976.53
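The backtest() function used above comes from the course-provided backtest.R script. For readers without that file, here is a minimal rolling-origin, one-step-ahead backtest sketch that reproduces the same idea (refit on an expanding sample, score on the held-out tail); it is an illustration under those assumptions, not the course implementation.

# Sketch: rolling one-step backtest for an ARIMA model; "orig" is the last
# training observation (113 here, i.e. 85% of the data)
simple_backtest = function(x, order, orig) {
  n = length(x)
  err = numeric(n - orig)
  for (i in orig:(n - 1)) {
    fit = arima(x[1:i], order = order, method = "ML")
    err[i - orig + 1] = x[i + 1] - predict(fit, n.ahead = 1)$pred[1]
  }
  c(RMSE = sqrt(mean(err^2)), MAE = mean(abs(err)))
}
simple_backtest(index, c(0,1,1), 113)
simple_backtest(index, c(0,1,2), 113)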

In SAS: the backtest macro needs to be modified to fit ARIMA(p,1,q) models by changing the IDENTIFY statement in the PROC ARIMA step of the BACKTEST macro as follows:

identify var=&VAR(1) noprint;

/*——— backtesting ————–*/

%backtest(trainpct=85, dataset=myd, var=temp, q=1, interval=year);

%backtest(trainpct=85, dataset=myd, var=temp, q=2, interval=year);

Backtest results for myd

Model: q=(1)

Obs _TYPE_ _FREQ_ mafe msfe rmsfe

1 0 20 7.71428 87.0438 9.32973

Backtest results for myd

Model: q=2

Obs _TYPE_ _FREQ_ mafe msfe rmsfe

1 0 20 7.60769 85.3367 9.23779

R code

# Analysis of the global temperature index

#1. LOAD LIBRARIES
library(tseries)
library(fBasics)
library(forecast)

# Create a time series
library(zoo)
data=read.table("globtemp.csv", header=T, sep=',')
index=data[,2]
indexts=zoo(data[,2], as.Date(as.character(data[,1]), format="%Y"))

# plot the ts and its ACF
plot(indexts, type='l', xlab='time', ylab='temperature index')

# Analysis of distribution
par(mfcol=c(1,1))
hist(index, xlab="Distribution of GT Index", prob=TRUE, main="Histogram")
# add approximating normal density curve
xfit<-seq(min(index), max(index), length=40)
yfit<-dnorm(xfit, mean=mean(index), sd=sd(index))
lines(xfit, yfit, col="blue", lwd=2)

# CREATE NORMAL PROBABILITY PLOT
qqnorm(index)
qqline(index, col=2)

acf(coredata(indexts), plot=T, lag=20)

# Dickey-Fuller test
library(fUnitRoots)
adfTest(index, lags=3, type=c("ct"))
adfTest(index, lags=5, type=c("ct"))
adfTest(index, lags=3, type=c("c"))
adfTest(index, lags=5, type=c("c"))

# Ljung-Box test
Box.test(index, lag=6, type='Ljung')
Box.test(index, lag=12, type='Ljung')

# EACF method
library(TSA)
# use a numeric object, not a ts object; coredata() retrieves the numeric vector
d1_ts=diff(indexts)
eacf(coredata(d1_ts))

# BIC criterion
# use a numeric object, not a ts object
auto.arima(index, max.p=5, max.q=5, stationary=F, ic=c("bic"),
           trace=TRUE, allowdrift=F)

# MODEL FITTING
# MODEL 1: EACF suggests ARIMA(1,1,2), i.e. p=1 and q=2
m1= arima(index, order=c(1,1,2), method='ML')
m1
source("arimaPrint.R")
arimaPrint(m1)

# RESIDUAL ANALYSIS (fitdf = number of estimated ARMA coefficients = 3)
Box.test(m1$residuals, lag=6, type='Ljung-Box', fitdf=3)
Box.test(m1$residuals, lag=12, type='Ljung-Box', fitdf=3)
acf(m1$residuals)
qqnorm(m1$residuals)
qqline(m1$residuals)
hist(m1$residuals)

# MODEL 2 SELECTED BY auto.arima
# FIT AN ARIMA(0,1,1) MODEL
m2= Arima(index, order=c(0,1,1), method='ML')
m2
arimaPrint(m2)

# RESIDUAL ANALYSIS
Box.test(m2$residuals, lag=6, type='Ljung-Box', fitdf=1)
Box.test(m2$residuals, lag=12, type='Ljung-Box', fitdf=1)
acf(m2$residuals)

# Analysis of the distribution of residuals
par(mfcol=c(1,1))
hist(m2$residuals, xlab="Distribution of residuals", prob=TRUE, main="Histogram")
# add approximating normal density curve
xfit<-seq(min(m2$residuals), max(m2$residuals), length=40)
yfit<-dnorm(xfit, mean=mean(m2$residuals), sd=sd(m2$residuals))
lines(xfit, yfit, col="blue", lwd=2)

# CREATE NORMAL PROBABILITY PLOT
qqnorm(m2$residuals)
qqline(m2$residuals, col=2)

# forecast and plot
pred=predict(m2, n.ahead=4, se.fit=T)
pred
f=forecast.Arima(m2, h=4)
f
plot(f, include=50)

# Backtesting
source("backtest.R")
backtest(m2, index, 113, 1)
m3=arima(index, order=c(0,1,2))
backtest(m3, index, 113, 1)

SAS code:
%macro backtest(TRAINPCT=80, DATASET=, VAR=, P=, Q=, D=, DATE=date,

INTERVAL=month);

* ————————————————————————-

;

* T I M E S E R I E S B A C K T E S T M A C R O

* ————————————————————————-
;

* DePaul CSC425, Winter 2013, Dr. Raffaella Settimi

* Macro written by Bill Qualls, First Analytics

* ————————————————————————-
;

* E X P L A N A T I O N O F P A R A M E T E R S

* (Order of variables is insignificant)

* ————————————————————————-
;

* TRAINPCT .. Percent of dataset to be used for training.

* So, (100 – TRAINPCT)% will be used for evaluation.

* Example: TRAINPCT=80

* DATASET .. Time series dataset. Libname optional, defaults to Work.

* Example: DATASET=Work.Unemp

* VAR .. Name of time series variable.

* Example: VAR=ratechg

* P .. Specified for AR models. Omit otherwise.

* Example: P=(1 3 6)

* Q .. Specified for MA models. Omit otherwise.

* Example: Q=(1 3 6)

* DATE .. Name of date variable. Defaults to date.

* Example: DATE=date

* INTERVAL .. Date interval. Defaults to month.

* Example: INTERVAL=day

* ————————————————————————-
;

* S A M P L E U S A G E

* %backtest(trainpct=80, dataset=work.unemp, var=ratedif, p=(1),

interval=day);

* ————————————————————————-
;

* How many records are in the dataset? ;

data _null_;

call symput('NRECS', trim(left(nrecs)));

set &DATASET nobs=nrecs;

stop;

run;

* Determine which ones are exclusively for training based on TRAINPCT ;

%let SIZE_OF_ROLLING_WINDOW = %sysfunc(round(&NRECS * &TRAINPCT / 100));

* create a working copy of dataset with observation number ;

* as a variable for use in a where clause with proc arima. ;

* also add a placeholder for the predicted value. ;

data Work._MY_COPY_ (keep = &VAR &DATE _OBS_ _PRED_);

set &DATASET;

_OBS_ = _N_;

_PRED_ = .;

run;

* turn off log — too lengthy ;

filename junk dummy;

proc printto log=junk print=junk;

run;

* Will build the model once for each record used in evaluation. ;

* Each time I will predict one record forward. ;

%let MODELS_TO_BE_BUILT = %sysevalf(&NRECS - &SIZE_OF_ROLLING_WINDOW);

%do i = 1 %to &MODELS_TO_BE_BUILT;

* Model using SIZE_OF_ROLLING_WINDOW records, and make one prediction ;

proc arima data=Work._MY_COPY_ plots=none;

where _OBS_ ge &i
and _OBS_ le (&i + &SIZE_OF_ROLLING_WINDOW - 1);

identify var=&VAR(1) noprint;

estimate

%if (&P ne ) %then %do; p=&P

%end;

%if (&Q ne ) %then %do; q=&Q %end;

method=ml noprint;

forecast lead=1 id=&DATE interval=&INTERVAL out=Work._MY_RESULTS_

noprint;

run;

* get the predicted value (in the last record) as a macro variable ;

data _null_;

p = nrecs;

set Work._MY_RESULTS_ point=p nobs=nrecs;

call symput("FORECAST", forecast);

stop;

run;

* move that prediction to its place in the output file ;

proc sql noprint;

update Work._MY_COPY_

set _PRED_ = &FORECAST

where _OBS_ = &i + &SIZE_OF_ROLLING_WINDOW;

quit;

run;

* show progress so far ;

%if (%sysfunc(mod(&i, 5)) = 0) %then %do;

* print on;

proc printto log=log print=print;

run;

%put Finished &i iterations;

* print off again;

proc printto log=junk print=junk;

run;

%end;

%end;

* turn print back on ;

proc printto log=log print=print;
run;

* calculate prediction error;

data Work._MY_COPY_;

set Work._MY_COPY_;

Predicted_Error = (&VAR - _PRED_);

Predicted_Error_Squared = (&VAR - _PRED_)**2;

absresidual = abs(Predicted_Error);

run;

* compute and report the mean square forecast error;

%if (&P eq ) %then %let PP = ; %else %let PP = p=&P;

%if (&Q eq ) %then %let QQ = ; %else %let QQ = q=&Q;

title "Backtest results for &DATASET";

title2 "Model: &PP &QQ";

proc summary data=Work._MY_COPY_;

where _OBS_ > &SIZE_OF_ROLLING_WINDOW;

var Predicted_Error absresidual;

output out=outm mean(absresidual)=mafe mean(Predicted_Error_Squared)=msfe;

run;

data outm;

set outm;

rmsfe=sqrt(msfe);

run;

proc print data=outm;

run;

/*

proc means data=Work._MY_COPY_ N MEAN STDDEV CLM T PROBT;

where _OBS_ > &SIZE_OF_ROLLING_WINDOW;

var Predicted_Error_Squared;

run;

title;

*/

%mend backtest;

/*————————————————-*/

proc import datafile="globtemp.csv" out=myd replace;
delimiter=',';
getnames=yes;
run;

proc print data=myd;

run;

* produced time plots;

goptions reset=global;

symbol interpol = join value=star;

proc gplot data=myd;

plot temp*year / haxis = 1880 to 2012 by 10;

run;
quit;

goptions reset = global;

proc univariate data=myd;

var temp;

histogram / normal;

qqplot / normal(mu=est sigma=est);
run;

* compute Dickey-Fuller unit-root tests;

proc arima data=myd;

identify var = temp stationarity = (adf=(1 3 5));

identify var = temp(1) stationarity = (adf=(1 3 5));

run;

proc arima data=myd;

identify var = temp esacf q=(0:5) p=(0:5) nlag =20;

identify var = temp minic q=(0:5) p=(0:5) nlag =20;

run; quit;

/* fit models from step above */

* fit ARIMA(p,1,q) models for unemployment rates;

* note that model intercept is set =0 using the noconstant option;

proc arima data=myd;

identify var = temp(1) noprint;

estimate p=0 q=1 method=ml plot noconstant;

estimate p=3 q=0 method=ml plot noconstant;

run;
quit;

proc arima data=myd;

identify var=temp(1) noprint;

estimate p=0 q=1 method=ml plot noconstant;

forecast back=0 lead=4 printall out=result id=year interval=day;

run;

quit;

/* Analysis of residuals*/

/* Residual plot: residual vs time*/

goptions reset=global;

symbol i=join;

proc gplot data=result;

plot residual*year / haxis = 1880 to 2012 by 10;

run;
quit;

/*analysis of distribution of residuals, to check

for normality */

goptions reset=global;

proc univariate data=result;

var residual;

histogram/normal (mu=est sigma=est);

qqplot/normal(mu=est sigma=est);

run;

goptions reset=global;

symbol1 interpol =join value=star color=red w=2;

symbol2 interpol = join color=black;

symbol3 interpol = join value=none color = blue;

symbol4 interpol = join value=none color = blue;

legend1 label=none

shape=symbol(4,2)

position=(top center inside)

mode=share;

proc gplot data=result;

plot forecast*year temp*year u95*year l95*year / overlay

haxis = 1880 to 2022 by 10 legend=legend1;

run;
/*——— backtesting ————–*/

%backtest(trainpct=85, dataset=myd, var=temp, q=(1), interval=year);

%backtest(trainpct=85, dataset=myd, var=temp, q= 2, interval=year);

S&P 500 analysis

Fitting an ARMA(0,0)-GARCH(1,1) model with Gaussian errors

> # Fitting an ARMA(0,0)-GARCH(1,1) model using the rugarch package
> # log returns show weak serial autocorrelations in the mean and an ARCH effect
> # Fitting a GARCH(1,1) model
> # Use ugarchspec() function to specify the model
> garch11.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)),
    mean.model=list(armaOrder=c(0,0)))
> # estimate model
> garch11.fit=ugarchfit(spec=garch11.spec, data=ret)
> garch11.fit

*———————————*

* GARCH Model Fit *

*———————————*

Conditional Variance Dynamics

———————————–

GARCH Model : sGARCH(1,1)

Mean Model : ARFIMA(0,0,0)

Distribution : norm

Optimal Parameters

————————————

Estimate Std. Error t value Pr(>|t|)

mu 0.006814 0.001798 3.7889 0.000151

omega 0.000090 0.000049 1.8302 0.067224

alpha1 0.118390 0.031384 3.7723 0.000162

beta1 0.844396 0.034387 24.5556 0.000000

The fitted ARMA(0,0)-GARCH(1,1) model with Gaussian errors can be written as

rt = 0.0068 + at,   at = σt et

σ2t = 0.00009 + 0.1184 a2t-1 + 0.8444 σ2t-1

Robust Standard Errors:

Estimate Std. Error t value Pr(>|t|)

mu 0.006814 0.001914 3.5592 0.000372

omega 0.000090 0.000070 1.2858 0.198502

alpha1 0.118390 0.041165 2.8760 0.004028

beta1 0.844396 0.040113 21.0503 0.000000

LogLikelihood : 828.5681

Information Criteria

————————————

Akaike -3.4286

Bayes -3.3938

Shibata -3.4287

Hannan-Quinn -3.4149

Q-Statistics on Standardized Residuals

————————————

statistic p-value

Lag[1] 0.5261 0.4682

Lag[p+q+1][1] 0.5261 0.4682

Lag[p+q+5][5] 5.9001 0.3161

d.o.f=0

H0 : No serial correlation

Q-Statistics on Standardized Squared Residuals

————————————
statistic p-value

Lag[1] 0.2657 0.6062

Lag[p+q+1][3] 0.8252 0.3637

Lag[p+q+5][7] 3.0889 0.6863

d.o.f=2

ARCH LM Tests

————————————

Statistic DoF P-Value

ARCH Lag[2] 0.5084 2 0.7755

ARCH Lag[5] 0.9231 5 0.9685

ARCH Lag[10] 6.3220 10 0.7875

Nyblom stability test

————————————

Joint Statistic: 1.3254

Individual Statistics:

mu 0.07929

omega 0.20176

alpha1 0.02459

beta1 0.09300

Asymptotic Critical Values (10% 5% 1%)

Joint Statistic: 1.07 1.24 1.6

Individual Statistic: 0.35 0.47 0.75

Sign Bias Test

————————————

t-value prob sig

Sign Bias 2.1382 0.0330120 **

Negative Sign Bias 0.5723 0.5673614

Positive Sign Bias 1.3403 0.1807809

Joint Effect 17.4223 0.0005786 ***

Adjusted Pearson Goodness-of-Fit Test:

————————————

group statistic p-value(g-1)

1 20 43.12 0.0012496

2 30 53.82 0.0033902

3 40 75.76 0.0003804

4 50 79.60 0.0037117

Note on the output above: the Ljung-Box test for serial correlation is computed on the standardized residuals, and the Ljung-Box test for an ARCH/GARCH effect is computed on the squared standardized residuals. The Adjusted Pearson statistic is a goodness-of-fit test for the distribution of the error term: the null hypothesis states that the assumed distribution for the error terms in the model is adequate, so a small p-value (< α) indicates that the null hypothesis can be rejected and the distribution assumption is not adequate.

> # create selection list of plots for the garch(1,1) fit
> plot(garch11.fit)
> # to display all subplots on one page
> plot(garch11.fit, which="all")

Residual analysis of the GARCH model shows that the model fits the data adequately. The Ljung-Box test (Q-statistic) on the residuals is not significant, showing that the hypothesis of no correlation in the residuals cannot be rejected. Similarly, the Ljung-Box test (Q-statistic) on the squared standardized residuals is not significant, suggesting that the residuals show no remaining ARCH/GARCH effect. The Adjusted Pearson goodness-of-fit test, however, is significant, indicating that the normal distribution assumed for the error term is not appropriate, as also shown by the QQ plot of the residuals.

Fitting an ARMA(0,0)-GARCH(1,1) model with t-distributed errors


> # use Student-t innovations
> # specify model using functions in the rugarch package
> # Fit ARMA(0,0)-GARCH(1,1) model with t-distribution
> garch11.t.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)),
    mean.model=list(armaOrder=c(0,0)), distribution.model = "std")
> # estimate model
> garch11.t.fit=ugarchfit(spec=garch11.t.spec, data=ret)
> garch11.t.fit

*———————————*
* GARCH Model Fit *
*———————————*
Conditional Variance Dynamics
———————————–
GARCH Model : sGARCH(1,1)
Mean Model : ARFIMA(0,0,0)

Distribution : std

Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)

mu 0.008012 0.001746 4.5892 0.000004

omega 0.000122 0.000068 1.7901 0.073444

alpha1 0.123683 0.038919 3.1780 0.001483

beta1 0.821471 0.050706 16.2008 0.000000

shape 6.891814 1.978494 3.4834 0.000495

The fitted ARMA(0,0)-GARCH(1,1) model with t-distributed errors can be written as

rt = 0.0080 + at,   at = σt et

σ2t = 0.00012 + 0.1237 a2t-1 + 0.8215 σ2t-1

Error term {et} has t-distribution with 7 degrees of freedom

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)

mu 0.008012 0.001817 4.4090 0.000010

omega 0.000122 0.000071 1.7216 0.085136

alpha1 0.123683 0.038112 3.2453 0.001173

beta1 0.821471 0.041476 19.8061 0.000000

shape 6.891814 1.976630 3.4866 0.000489

LogLikelihood : 839.2684

Information Criteria

————————————

Akaike -3.4689

Bayes -3.4255

Shibata -3.4691

Hannan-Quinn -3.4518

Q-Statistics on Standardized Residuals

————————————
statistic p-value

Lag[1] 0.4662 0.4947

Lag[p+q+1][1] 0.4662 0.4947

Lag[p+q+5][5] 5.6618 0.3405

d.o.f=0
H0 : No serial correlation

Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value

Lag[1] 0.2474 0.6189

Lag[p+q+1][3] 0.8569 0.3546

Lag[p+q+5][7] 2.9570 0.7066

d.o.f=2

ARCH LM Tests
————————————
Statistic DoF P-Value

ARCH Lag[2] 0.5959 2 0.7423

ARCH Lag[5] 0.9740 5 0.9646

ARCH Lag[10] 6.0871 10 0.8079

Nyblom stability test
————————————

Joint Statistic: 1.396

Individual Statistics:

mu 0.17184

omega 0.23451

alpha1 0.04303

beta1 0.10209

shape 0.07244

Asymptotic Critical Values (10% 5% 1%)

Joint Statistic: 1.28 1.47 1.88

Individual Statistic: 0.35 0.47 0.75

Sign Bias Test
————————————

t-value prob sig

Sign Bias 2.121 0.0343991 **

Negative Sign Bias 0.549 0.5832924

Positive Sign Bias 1.206 0.2284569

Joint Effect 16.352 0.0009603 ***

Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)

1 20 22.66 0.25269

2 30 44.47 0.03312

3 40 39.50 0.44759

4 50 43.64 0.68967

Residual analysis of the GARCH model with t-distributed error terms shows that the model fits the data adequately. The Ljung-Box test (Q-statistic) on the residuals is not significant, showing that the hypothesis of no correlation in the residuals cannot be rejected. Similarly, the Ljung-Box test (Q-statistic) on the squared standardized residuals is not significant, suggesting that the residuals show no remaining ARCH/GARCH effect. The Adjusted Pearson goodness-of-fit test is not significant, indicating that the distribution of the error terms can be described by a t-distribution with 7 degrees of freedom, as also shown by the QQ plot of the residuals (created using plot(garch11.t.fit, which=9)).
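Since this fit is adequate, volatility forecasts can be produced directly from it; a short sketch using rugarch (the 4-step horizon is my choice for illustration):

# Sketch: 4-step-ahead forecasts from the fitted t-GARCH model
garch11.t.fcst = ugarchforecast(garch11.t.fit, n.ahead = 4)
garch11.t.fcst
sigma(garch11.t.fcst)    # forecast conditional volatilities
fitted(garch11.t.fcst)   # forecast conditional means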

Fitting an ARMA(0,0)-EGARCH(1,1) model

The EGARCH model fitted by the rugarch package has a slightly different form than the model in the textbook.
Here is the general expression of the EGARCH(1,1) model:

rt = μt + at,   at = σt et

ln(σ2t) = ω + α1 et-1 + γ1(|et-1| - E(|et-1|)) + β1 ln(σ2t-1)
where µt can follow an ARMA process, but most typically it will be constant. Gamma1 (γ1) is the leverage
parameter. So if gamma1 in output is significant, then we can conclude that the volatility process has an
asymmetric behavior.

Fitting an ARMA(0,0)-EGARCH(1,1) model with Gaussian distribution (similar to SAS example)

> #Fit ARMA(0,0)-eGARCH(1,1) model with Gaussian distribution
> egarch11.spec=ugarchspec(variance.model=list(model = "eGARCH",
    garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)))
> # estimate model
> egarch11.fit=ugarchfit(spec=egarch11.spec, data=ret)
> egarch11.fit

*———————————*
* GARCH Model Fit *
*———————————*

Conditional Variance Dynamics
———————————–

GARCH Model : eGARCH(1,1)

Mean Model : ARFIMA(0,0,0)
Distribution : norm

Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)

mu 0.005937 0.001860 3.1915 0.001415

omega -0.610497 0.263901 -2.3134 0.020703

alpha1 -0.111069 0.042614 -2.6064 0.009150

beta1 0.902462 0.041695 21.6442 0.000000

gamma1 0.209405 0.050542 4.1432 0.000034

Using the R output above, the ARMA(0,0)-EGARCH(1,1) model can be written as follows.

Fitted model:

rt = 0.0059 + at,   at = σt et

ln(σ2t) = -0.610 - 0.111 et-1 + 0.209(|et-1| - E(|et-1|)) + 0.9024 ln(σ2t-1)

Note that since et has a Gaussian distribution, E(|et|) = sqrt(2/pi) = 0.7979, or approximately 0.80 (see page 143 in the textbook). Thus we can rewrite the expression above as

ln(σ2t) = -0.610 + 0.9024 ln(σ2t-1) + (-0.111 + 0.209) et-1 - 0.209(0.7979)   if et-1 >= 0
ln(σ2t) = -0.610 + 0.9024 ln(σ2t-1) + (-0.111 - 0.209) et-1 - 0.209(0.7979)   if et-1 < 0

And after some algebra, we can write:

ln(σ2t) = -0.7767 + 0.9024 ln(σ2t-1) + 0.098 et-1   if et-1 >= 0
ln(σ2t) = -0.7767 + 0.9024 ln(σ2t-1) - 0.320 et-1   if et-1 < 0

Taking the antilog transformation we have

σ2t = σ2t-1^0.9024 exp(-0.7767) exp(0.098 et-1)    if et-1 >= 0
σ2t = σ2t-1^0.9024 exp(-0.7767) exp(-0.320 et-1)   if et-1 < 0

For a standardized shock with magnitude 2 (i.e., two standard deviations), we have

σ2t(et-1 = -2) / σ2t(et-1 = +2) = exp(-0.320 x (-2)) / exp(0.098 x 2) = exp(0.444) = 1.56

Therefore, the impact of a negative shock of size two standard deviations is about 56% higher than the impact of a positive shock of the same size.
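A two-line numeric check of this ratio in R, with the parameter values copied from the output above:

# Sketch: asymmetry ratio for a two-standard-deviation shock
alpha1 = -0.111; gamma1 = 0.209
exp((alpha1 - gamma1)*(-2)) / exp((alpha1 + gamma1)*2)   # approx 1.56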

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)

mu 0.005937 0.001958 3.0328 0.002423

omega -0.610497 0.364615 -1.6744 0.094060

alpha1 -0.111069 0.066343 -1.6742 0.094096

beta1 0.902462 0.057336 15.7400 0.000000

gamma1 0.209405 0.048127 4.3511 0.000014

LogLikelihood : 835.9936

Information Criteria
————————————

Akaike -3.4553

Bayes -3.4119

Shibata -3.4555

Hannan-Quinn -3.4382

Q-Statistics on Standardized Residuals
————————————
statistic p-value

Lag[1] 0.3351 0.5627

Lag[p+q+1][1] 0.3351 0.5627

Lag[p+q+5][5] 5.1570 0.3970

d.o.f=0
H0 : No serial correlation

Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value

Lag[1] 0.7262 0.3941

Lag[p+q+1][3] 1.0203 0.3124

Lag[p+q+5][7] 2.8738 0.7194

d.o.f=2

ARCH LM Tests
————————————
Statistic DoF P-Value

ARCH Lag[2] 1.018 2 0.6010

ARCH Lag[5] 1.165 5 0.9482

ARCH Lag[10] 6.026 10 0.8131

Nyblom stability test
————————————

Joint Statistic: 1.5861

Individual Statistics:

mu 0.1829

omega 0.1224

alpha1 0.1527

beta1 0.1301

gamma1 1.0125

Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.28 1.47 1.88
Individual Statistic: 0.35 0.47 0.75

Sign Bias Test
————————————

t-value prob sig

Sign Bias 1.973 0.04902 **

Negative Sign Bias 1.165 0.24478

Positive Sign Bias 1.041 0.29840

Joint Effect 11.259 0.01041 **

Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)

1 20 22.49 0.2604

2 30 35.86 0.1777

3 40 50.31 0.1060

4 50 55.07 0.2558

Residual analysis of the EGARCH model shows that the model fits the data adequately: the residuals are white noise and show no ARCH effect. The goodness-of-fit test supports the choice of a Gaussian distribution for the error term, although the qqplot shows that the error distribution has a thicker left tail than the normal distribution. We will fit the t-distribution to check if that is a better fit for the behavior of extreme values.

Fitting an ARMA(0,0)-EGARCH(1,1) model with t-distribution

> #Fit ARMA(0,0)-eGARCH(1,1) model with t-distribution
> egarch11.t.spec=ugarchspec(variance.model=list(model = "eGARCH",
    garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)),
    distribution.model = "std")
> # estimate model
> egarch11.t.fit=ugarchfit(spec=egarch11.t.spec, data=ret)
> egarch11.t.fit

*———————————*
* GARCH Model Fit *
*———————————*

Conditional Variance Dynamics
———————————–
GARCH Model : eGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : std

Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)

mu 0.007093 0.001757 4.0369 0.000054

omega -0.677075 0.246000 -2.7523 0.005917

alpha1 -0.145367 0.045752 -3.1773 0.001487

beta1 0.893975 0.038715 23.0912 0.000000

gamma1 0.202717 0.055934 3.6242 0.000290

shape 7.926664 2.405188 3.2957 0.000982

Using the R output above, the ARMA(0,0)-EGARCH(1,1) model can be written as follows.

Fitted model:

rt = 0.007 + at,   at = σt et

ln(σ2t) = -0.677 - 0.145 et-1 + 0.203(|et-1| - E(|et-1|)) + 0.894 ln(σ2t-1)

with a t-distribution with 8 degrees of freedom (nearest integer to the shape value).

Note that since et has a t-distribution, E(|et|) = 2 sqrt(ν-2) Γ[(ν+1)/2] / ((ν-1) Γ(ν/2) sqrt(π)) (see page 143 in the book), where ν = d.f. of the t-distribution (denoted by shape in the R output).

Note that since gamma1 is significant, the volatility has an asymmetric behavior. Therefore a negative shock has a
stronger impact on the volatility compared to a positive shock of the same size.
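For reference, a one-line numeric evaluation of this expectation in R for ν = 8 (the rounded shape estimate above):

# Sketch: E|e_t| for a standardized t-distribution with nu = 8 d.f.
nu = 8
2*sqrt(nu-2)*gamma((nu+1)/2) / ((nu-1)*gamma(nu/2)*sqrt(pi))   # approx 0.77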

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)

mu 0.007093 0.001888 3.7574 0.000172

omega -0.677075 0.212770 -3.1822 0.001462

alpha1 -0.145367 0.042936 -3.3857 0.000710

beta1 0.893975 0.034112 26.2071 0.000000

gamma1 0.202717 0.045692 4.4366 0.000009

shape 7.926664 2.720128 2.9141 0.003567

LogLikelihood : 846.92

Information Criteria

————————————

Akaike -3.4965

Bayes -3.4445

Shibata -3.4969

Hannan-Quinn -3.4761

Q-Statistics on Standardized Residuals
————————————
statistic p-value

Lag[1] 0.1872 0.6653

Lag[p+q+1][1] 0.1872 0.6653

Lag[p+q+5][5] 4.7776 0.4436

d.o.f=0
H0 : No serial correlation

Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value

Lag[1] 0.6769 0.4107

Lag[p+q+1][3] 0.9526 0.3291

Lag[p+q+5][7] 2.5747 0.7652

d.o.f=2

ARCH LM Tests
————————————
Statistic DoF P-Value

ARCH Lag[2] 0.9725 2 0.6149

ARCH Lag[5] 1.0934 5 0.9547

ARCH Lag[10] 5.4934 10 0.8559

Nyblom stability test
————————————

Joint Statistic: 1.6436

Individual Statistics:

mu 0.20752

omega 0.12081

alpha1 0.15261

beta1 0.13304

gamma1 0.93741

shape 0.08736

Asymptotic Critical Values (10% 5% 1%)

Joint Statistic: 1.49 1.68 2.12

Individual Statistic: 0.35 0.47 0.75

Sign Bias Test
————————————

t-value prob sig

Sign Bias 1.7639 0.07840 *

Negative Sign Bias 1.2413 0.21510

Positive Sign Bias 0.9747 0.33022

Joint Effect 8.8291 0.03165 **

Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)

1 20 20.66 0.35571

2 30 39.23 0.09736

3 40 54.47 0.05098

4 50 66.51 0.04860

Residual analysis of the EGARCH model shows that the model fits the data adequately – the residuals are white noise and show no ARCH effect. However, the goodness-of-fit test suggests that the t-distribution is not a good choice for the error terms (some test p-values fall at or below 0.05, rejecting the null hypothesis that the error term follows a t-distribution). The qq-plot shows that the error distribution has thicker tails than the t-distribution. (Further analysis shows that the ged distribution does a better job of representing the distribution of extreme values.)

Fitting an ARMA(0,0)-GJRGARCH(1,1) model or TGARCH model

The GJRGARCH(1,1) model fitted by the rugarch package has the following expression

rt = µt + at,  at = σt·et

σt² = ω + (α1 + γ1·Nt-1)·at-1² + β1·σt-1²
where µt can follow an ARMA process, but most typically it will be constant. Nt-1 is the indicator variable such that Nt-1 = 1 when at-1 (the shock at time t-1) is negative, and Nt-1 = 0 otherwise. Gamma1 (γ1) is the leverage parameter, so if gamma1 in the output is significant, we can conclude that the volatility process has an asymmetric behavior.
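To make the recursion concrete, here is an illustrative sketch (the coefficient arguments are placeholders, not fitted values):

# one step of the GJR-GARCH(1,1) variance recursion described above
gjr.sigma2 <- function(a.prev, sigma2.prev, omega, alpha1, gamma1, beta1) {
  N.prev <- as.numeric(a.prev < 0)   # indicator of a negative shock
  omega + (alpha1 + gamma1 * N.prev) * a.prev^2 + beta1 * sigma2.prev
}
# a negative shock of the same size raises sigma^2 by an extra gamma1 * a.prev^2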

> #Fit ARMA(0,0)-TGARCH(1,1) model with t-distribution
> gjrgarch11.t.spec=ugarchspec(variance.model=list(model = "gjrGARCH",
    garchOrder=c(1,1)),
    mean.model=list(armaOrder=c(0,0)), distribution.model = "std")
> #estimate model
> gjrgarch11.t.fit=ugarchfit(spec=gjrgarch11.t.spec, data=ret)
Warning message:
In .makefitmodel(garchmodel = "gjrGARCH", f = .gjrgarchLLH, T = T, :
  NaNs produced
> gjrgarch11.t.fit

*———————————*
* GARCH Model Fit *
*———————————*

Conditional Variance Dynamics
———————————–

GARCH Model : gjrGARCH(1,1)

Mean Model : ARFIMA(0,0,0)
Distribution : std

Optimal Parameters
————————————

Estimate Std. Error t value Pr(>|t|)

mu 0.007202 0.001766 4.077739 0.000045

omega 0.000212 0.000094 2.262826 0.023646

alpha1 0.000001 0.001394 0.000791 0.999369

beta1 0.781111 0.065122 11.994488 0.000000

gamma1 0.215386 0.069611 3.094141 0.001974

shape 7.179979 2.031526 3.534279 0.000409

Using the R output above, the ARMA(0,0)-TGARCH(1,1) model can be written as follows. Note that the arch(1) coefficient is effectively zero (alpha1 ≈ 0.000001, p ≈ 0.999) and can be removed from the model.

rt = 0.0072 + at,  at = σt·et

σt² = 0.00021 + (0.0000 + 0.215·Nt-1)·at-1² + 0.781·σt-1²

where the error term is assumed to have a t-distribution with 7 degrees of freedom.

or, dropping the zero arch(1) term,

rt = 0.0072 + at,  at = σt·et

σt² = 0.00021 + 0.215·Nt-1·at-1² + 0.781·σt-1²

where Nt-1 is an indicator variable for negative innovations:

Nt-1 = 1 if at-1 < 0,  Nt-1 = 0 otherwise

For a standardized shock with magnitude 2 (i.e., two standard deviations), we have

σt²(et-1 = -2) / σt²(et-1 = +2)
  = [0.00021 + (0.0000 + 0.215)·(4 × 0.002) + 0.781 × 0.002] / [0.00021 + 0.0000·(4 × 0.002) + 0.781 × 0.002]
  ≈ 1.966

where the value of at-1² is computed as at-1² = σt-1²·et-1². Since σt-1² in the expression above is unknown, we set σt-1² equal to the sample variance of the S&P500 index returns (that is, σt-1² = 0.002).

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)

mu 0.007202 0.001921 3.749767 0.000177

omega 0.000212 0.000112 1.902791 0.057068

alpha1 0.000001 0.000054 0.020238 0.983853

beta1 0.781111 0.063980 12.208691 0.000000

gamma1 0.215386 0.056979 3.780097 0.000157

shape 7.179979 2.274951 3.156102 0.001599

LogLikelihood : 844.6336

Information Criteria
————————————

Akaike -3.4870

Bayes -3.4350

Shibata -3.4873

Hannan-Quinn -3.4666

Q-Statistics on Standardized Residuals
————————————
statistic p-value

Lag[1] 0.1348 0.7135

Lag[p+q+1][1] 0.1348 0.7135

Lag[p+q+5][5] 4.7791 0.4434

d.o.f=0
H0 : No serial correlation

Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value

Lag[1] 0.7412 0.3893

Lag[p+q+1][3] 1.0145 0.3138

Lag[p+q+5][7] 2.5887 0.7631

d.o.f=2

ARCH LM Tests
————————————
Statistic DoF P-Value

ARCH Lag[2] 1.015 2 0.6020

ARCH Lag[5] 1.239 5 0.9411

ARCH Lag[10] 5.120 10 0.8830

Nyblom stability test
————————————

Joint Statistic: 2.2942

Individual Statistics:

mu 0.18155

omega 0.27666

alpha1 0.05641

beta1 0.20413

gamma1 0.03198

shape 0.06486

Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.49 1.68 2.12
Individual Statistic: 0.35 0.47 0.75

Sign Bias Test
————————————
t-value prob sig

Sign Bias 2.117 0.03474 **

Negative Sign Bias 1.408 0.15987

Positive Sign Bias 0.766 0.44408

Joint Effect 10.173 0.01715 **

Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)

1 20 19.83 0.4048

2 30 29.37 0.4457

3 40 37.17 0.5535

4 50 52.78 0.3300

Elapsed time : 0.458046

> plot(gjrgarch11.t.fit, which="all")

Residual analysis of TGARCH model shows that the model fits the data adequately – residuals are white noise and
show no ARCH effect. The goodness of fit test also shows that the t-distribution is adequate.

Apply information criteria for model selection

> # MODEL COMPARISON
> # compare information criteria
> model.list = list(garch11 = garch11.fit, garch11.t = garch11.t.fit,
+   egarch11 = egarch11.t.fit,
+   gjrgarch11 = gjrgarch11.t.fit)
> info.mat = sapply(model.list, infocriteria)
> rownames(info.mat) = rownames(infocriteria(garch11.fit))
> info.mat

garch11 garch11.t egarch11 gjrgarch11

Akaike -3.428558 -3.468892 -3.480644 -3.487042

Bayes -3.393831 -3.425483 -3.428554 -3.434952

Shibata -3.428694 -3.469105 -3.480950 -3.487348

Hannan-Quinn -3.414909 -3.451830 -3.460170 -3.466568

The best model according to the model selection criteria is the GJR-GARCH(1,1) model with t-distribution.

R CODE:
# Analysis of daily S&P500 index

#

library(fBasics)

library(tseries)

library(rugarch)

# import data in R
myd = read.table('sp500_feb1970_Feb2010.txt', header=T)

# create time series object
rts = ts(myd$return, start = c(1970, 1), frequency=12)

# create a simple numeric object

ret =myd$return;

# CREATE TIME PLOT

plot(rts)

# Plots ACF function of vector data

acf(ret)

# Plot ACF of squared data to check for non-linear dependence

acf(ret^2)

# Computes Ljung-Box test on squared returns to test non-linear independence at lags 6 and 12
Box.test(ret^2,lag=6,type='Ljung')
Box.test(ret^2,lag=12,type='Ljung')

# Computes Ljung-Box test on absolute returns to test non-linear independence at lags 6 and 12
Box.test(abs(ret),lag=6,type='Ljung')
Box.test(abs(ret),lag=12,type='Ljung')

# FITTING AN GARCH(1,1) MODEL WITH GAUSSIAN DISTRIBUTION

# Use ugarchspec() function to specify model

garch11.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)),

mean.model=list(armaOrder=c(0,0)))
#estimate model
garch11.fit=ugarchfit(spec=garch11.spec, data=ret)
garch11.fit

#persistence = alpha1+beta1

persistence(garch11.fit)

#half-life: ln(0.5)/ln(alpha1+beta1)

halflife(garch11.fit)

#create selection list of plots for garch(1,1) fit
plot(garch11.fit)
#to display all subplots on one page
plot(garch11.fit, which="all")

#FIT ARMA(0,0)-GARCH(1,1) MODEL WITH T-DISTRIBUTION

# specify model using functions in rugarch package

garch11.t.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)),
  mean.model=list(armaOrder=c(0,0)), distribution.model = "std")
#estimate model
garch11.t.fit=ugarchfit(spec=garch11.t.spec, data=ret)
garch11.t.fit

plot(garch11.t.fit)

#FIT ARMA(0,0)-EGARCH(1,1) MODEL WITH GAUSSIAN DISTRIBUTION

egarch11.spec=ugarchspec(variance.model=list(model = "eGARCH", garchOrder=c(1,1)),
  mean.model=list(armaOrder=c(0,0)))
#estimate model
egarch11.fit=ugarchfit(spec=egarch11.spec, data=ret)
egarch11.fit

plot(egarch11.fit, which="all")

#FIT ARMA(0,0)-EGARCH(1,1) MODEL WITH GED DISTRIBUTION
# (the t-distribution fit reported above used distribution.model = "std")

egarch11.t.spec=ugarchspec(variance.model=list(model = "eGARCH", garchOrder=c(1,1)),
  mean.model=list(armaOrder=c(0,0)), distribution.model = "ged")

#estimate model
egarch11.t.fit=ugarchfit(spec=egarch11.t.spec, data=ret)
egarch11.t.fit

plot(egarch11.t.fit, which="all")

# compute expected value E(|e|)

shape=coef(egarch11.t.fit)[6]

exp.abse=(2*sqrt(shape-2)*gamma((shape+1)/2))/((shape-1)*gamma(shape/2)*sqrt(pi))

#FIT ARMA(0,0)-TGARCH(1,1) MODEL WITH T-DISTRIBUTION

gjrgarch11.t.spec=ugarchspec(variance.model=list(model = "gjrGARCH",
  garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)), distribution.model = "std")

#estimate model
gjrgarch11.t.fit=ugarchfit(spec=gjrgarch11.t.spec, data=ret)
gjrgarch11.t.fit
plot(gjrgarch11.t.fit, which="all")

#FIT ARMA(0,0)-IGARCH(1,1) MODEL WITH T-DISTRIBUTION

igarch11.t.spec=ugarchspec(variance.model=list(model = "iGARCH", garchOrder=c(1,1)),
  mean.model=list(armaOrder=c(0,0)), distribution.model = "std")

#estimate model
igarch11.t.fit=ugarchfit(spec=igarch11.t.spec, data=ret)
igarch11.t.fit

plot(igarch11.t.fit, which="all")

# MODEL COMPARISON
# compare information criteria
model.list = list(garch11 = garch11.fit, garch11.t = garch11.t.fit,

egarch11 = egarch11.t.fit,

gjrgarch11 = gjrgarch11.t.fit)

info.mat = sapply(model.list, infocriteria)
rownames(info.mat) = rownames(infocriteria(garch11.fit))
info.mat

# RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST

# EVALUATION STATISTICS

garch11.fit = ugarchfit(spec=garch11.spec, data=ret, out.sample=100)

garch11.t.fit = ugarchfit(spec=garch11.t.spec, data=ret, out.sample=100)

egarch11.t.fit = ugarchfit(egarch11.t.spec, data=ret, out.sample=100)

tgarch11.t.fit = ugarchfit(spec=gjrgarch11.t.spec, data=ret, out.sample=100)

# COMPUTE 100 1-STEP AHEAD ROLLING FORECASTS W/O RE-ESTIMATING

garch11.fcst = ugarchforecast(garch11.fit, n.roll=100, n.ahead=1)

garch11.t.fcst = ugarchforecast(garch11.t.fit, n.roll=100, n.ahead=1)

egarch11.t.fcst = ugarchforecast(egarch11.t.fit, n.roll=100, n.ahead=1)

tgarch11.t.fcst = ugarchforecast(tgarch11.t.fit, n.roll=100, n.ahead=1)

# COMPUTE FORECAST EVALUATION STATISTICS USING FPM() FUNCTION

fcst.list = list(garch11=garch11.fcst, garch11.t=garch11.t.fcst,

egarch11.t=egarch11.t.fcst,

tgarch11.t=tgarch11.t.fcst)

fpm.mat = sapply(fcst.list, fpm)
fpm.mat
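For reference, the quantities fpm() reports can also be computed by hand; a rough sketch for one model, assuming garch11.fcst and ret as defined above (the alignment of forecasts with realized returns is illustrative):

# forecast evaluation statistics, hand-rolled
f <- as.numeric(fitted(garch11.fcst))     # 1-step-ahead mean forecasts
actual <- tail(ret, length(f))            # realized returns they target (illustrative)
mse <- mean((actual - f)^2)
mae <- mean(abs(actual - f))
dac <- mean(sign(actual) == sign(f))      # directional accuracy
c(MSE = mse, MAE = mae, DAC = dac)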

CSC425 – Time series analysis and forecasting

Homework 5 – not to be submitted

The goal of this assignment is to provide students with some practical training on GARCH/EGARCH/TGARCH models for the analysis of the volatility of a stock return.

Solutions will be posted on Thursday November 7th, 2013

Reading assignment:

1. Read Chapter 4 in short book and Chapter 3 in long book on volatility models
2. Review course documents posted under week 7 and 8.

Problem

Use the datafile nordstrom_w_00_13.csv that contains the Nordstrom (JWN) stock weekly prices from January 2000 to October 2013. The data file contains dates (date) and weekly prices (price). You can also use the code and the analysis of the S&P500 returns used in the week 7 and 8 lectures as your reference for the analysis of this data. Analyze the Nordstrom stock log returns following the steps below.

1. Compute log returns, and analyze their time plot.

Moments

N 693 Sum Weights 693

Mean 0.00268299 Sum Observations 1.85931229

Std Deviation 0.06068019 Variance 0.00368209

Skewness -0.2804891 Kurtosis 7.55351876

Uncorrected SS 2.55299156 Corrected SS 2.54800304

Coeff Variation 2261.66262 Std Error Mean 0.00230505

The time plot of returns shows that Nordstrom stock returns exhibit periods of high volatility at various times, and consistently high volatility from 2007 to 2010. Most returns are between +/- 10%. Sample moments show that the distribution of log returns is roughly symmetric with very fat tails (kurtosis = 7.55).

(Figure: time plot of Nordstrom weekly log returns)

2. Is there evidence of serial correlations in the log returns? Use autocorrelations and 5%
significance level to answer the question.

Log returns are not serially correlated, as shown by the Ljung-Box test with p-values > 0.05 and the autocorrelation plot.

Name of Variable = return

Mean of Working Series 0.002683

Standard Deviation 0.060636

Number of Observations 693

Autocorrelation Check for White Noise

To Chi- Pr >

Lag Square DF ChiSq ——————–Autocorrelations——————–

6 6.16 6 0.4050 -0.009 0.011 -0.048 -0.009 0.078 -0.006

12 9.08 12 0.6957 -0.042 -0.038 -0.019 0.011 -0.015 0.016

18 26.17 18 0.0960 0.033 -0.026 0.088 -0.032 0.101 -0.056

24 31.58 24 0.1377 0.029 -0.037 0.025 -0.042 -0.026 -0.048

30 33.84 30 0.2871 -0.029 -0.003 -0.037 0.023 0.011 -0.016

3. Is there evidence of ARCH effects in the log returns? Use appropriate tests at 5%
significance level to answer this question.

The analysis below shows a strong ARCH effect: the squared returns are strongly correlated. The Ljung-Box tests on the squared returns are highly significant with p-values < 0.001, and the autocorrelation plot shows large autocorrelations for the first 10 lags.

Name of Variable = returnsq

Mean of Working Series 0.003684

Standard Deviation 0.011302

Number of Observations 693

Autocorrelation Check for White Noise

To Chi- Pr >

Lag Square DF ChiSq ——————–Autocorrelations——————–

6 331.56 6 <.0001 0.534 0.281 0.123 0.101 0.167 0.240

12 407.05 12 <.0001 0.246 0.187 0.053 0.036 0.032 0.082

18 469.70 18 <.0001 0.069 0.072 0.128 0.127 0.156 0.145

24 494.90 24 <.0001 0.149 0.090 0.050 0.035 0.014 0.031

30 501.01 30 <.0001 0.033 0.064 0.051 0.023 0.010 -0.004

4. Fit a GARCH(1,1) model for the log returns using a t-distribution for the error terms. Perform model checking (analyze if the residuals are white noise, if the squared residuals are white noise, and check if the t-distribution is a good fit for the data) and write down the fitted model.

The GARCH model with t-distribution is an adequate model for the volatility of the returns. All parameters are significant and the squared residuals are white noise (the Ljung-Box tests on squared residuals are not significant).

The fitted GARCH model can be expressed as follows:

rt = 0.005 + at,  at = σt·et

σt² = 0.1483·at-1² + 0.836·σt-1²

(the intercept ARCH0 can be considered equal to zero)

where the error term et has a t-distribution with 1/0.185 ≈ 5 degrees of freedom.

The AUTOREG Procedure

GARCH Estimates

SSE 2.55069233 Observations 693

MSE 0.00368 Uncond Var 0.00531549

Log Likelihood 1102.73026 Total R-Square .

SBC -2172.7554 AIC -2195.4605

MAE 0.04086373 AICC -2195.3732

MAPE 113.334007 HQC -2186.6796

Normality Test 126.0152

Pr > ChiSq <.0001

Parameter Estimates

Standard Approx

Variable DF Estimate Error t Value Pr > |t|

Intercept 1 0.004653 0.001570 2.96 0.0030

ARCH0 1 0.0000848 0.0000429 1.98 0.0482

ARCH1 1 0.1483 0.0369 4.01 <.0001

GARCH1 1 0.8358 0.0359 23.30 <.0001

TDFI 1 0.1851 0.0448 4.14 <.0001

Inverse of t DF
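As a quick check on the SAS output above, the reported unconditional variance follows directly from the estimates:

# unconditional variance implied by the GARCH(1,1) fit:
# arch0 / (1 - arch1 - garch1)
0.0000848 / (1 - 0.1483 - 0.8358)   # about 0.00533, close to "Uncond Var 0.00531549"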

R output

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)

mu 0.004729 0.001465 3.2281 0.001246
omega 0.000086 0.000050 1.7302 0.083590
alpha1 0.145749 0.052318 2.7858 0.005340
beta1 0.836471 0.051967 16.0962 0.000000
shape 5.345569 1.009241 5.2966 0.000000

5. Fit an EGARCH(1,1) model for the Nordstrom log returns using a normal distribution for the error terms. Perform model checking and write down the fitted model.

The EGARCH model is adequate, although the Gaussian distribution for the error terms is not sufficient to describe the most extreme events. No ARCH effects remain in the residuals. The leverage parameter "theta" is significant, showing that the volatility of Nordstrom stock returns is affected more heavily by negative shocks.

The fitted model can be written as follows:

rt = 0.0012 + at,  at = σt·et

ln σt² = -0.194 + 0.213·g(et-1) + 0.966·ln σt-1²

g(et-1) = -0.47·et-1 + [|et-1| - E|et-1|]

The AUTOREG Procedure

Exponential GARCH Estimates

SSE 2.54945444 Observations 693

MSE 0.00368 Uncond Var .

Log Likelihood 1093.55049 Total R-Square .

(Figure: t-distribution QQ plot for residuals of GARCH model)

SBC -2154.3958 AIC -2177.101

MAE 0.04095052 AICC -2177.0136

MAPE 101.222881 HQC -2168.32

Normality Test 71.9939

Pr > ChiSq <.0001

Parameter Estimates

Standard Approx

Variable DF Estimate Error t Value Pr > |t|

Intercept 1 0.001236 0.001752 0.71 0.4806

EARCH0 1 -0.1940 0.0636 -3.05 0.0023

EARCH1 1 0.2133 0.0316 6.75 <.0001

EGARCH1 1 0.9662 0.0106 91.41 <.0001

THETA 1 -0.4748 0.1121 -4.23 <.0001

R output

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.001271 0.001774 0.71655 0.473654
omega -0.169047 0.110169 -1.53443 0.124924
alpha1 -0.095670 0.035615 -2.68620 0.007227
beta1 0.970507 0.018820 51.56682 0.000000
gamma1 0.194542 0.059579 3.26526 0.001094

In R the model is written as

rt = 0.0012 + at,  at = σt·et

ln σt² = -0.169 - 0.095·et-1 + 0.194·(|et-1| - E|et-1|) + 0.970·ln σt-1²

with Gaussian error terms (no shape parameter appears in the output).

Note that the leverage coefficients of et-1 in SAS and R are equivalent.
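The equivalence can be checked numerically (a sketch: in the SAS parametrization the sign term is EARCH1·THETA, while R estimates alpha1 and gamma1 separately, so THETA ≈ alpha1/gamma1):

# implied theta from the R estimates above
-0.095670 / 0.194542   # about -0.49, close to the SAS THETA of -0.47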

6. Fit an EGARCH(1,1) model for the Nordstrom log returns using a t-distribution for the error terms. Perform model checking and write down the fitted model.

The EGARCH model with t-distributed errors is a good model for the data. No ARCH effects remain in the residuals. The leverage parameter "theta" is significant and negative, showing that the volatility of Nordstrom stock returns is affected more heavily by negative shocks.

The fitted model can be written as follows:

rt = 0.0032 + at,  at = σt·et

ln σt² = -0.191 + 0.158·g(et-1) + 0.974·ln σt-1²

g(et-1) = -0.55·et-1 + [|et-1| - E|et-1|]

et has a t-distribution with 6 degrees of freedom.

The MODEL Procedure

Nonlinear Liklhood Summary of Residual Errors

DF DF Adj

Equation Model Error SSE MSE Root MSE R-Square R-Sq

7 686 2.5482 0.00371 0.0609 -0.0001 -0.0088

nresid.y 686 1028.4 1.4991 1.2244

Nonlinear Liklhood Parameter Estimates

Approx Approx

Parameter Estimate Std Err t Value Pr > |t|

intercept 0.003273 0.00160 2.05 0.0410

earch0 -0.19123 0.0932 -2.05 0.0405

earch1 0.158378 0.0490 3.23 0.0013

egarch1 0.974881 0.0139 70.01 <.0001

theta -0.55013 0.1969 -2.79 0.0053

df 5.993367 1.3083 4.58 <.0001 R output

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.003374 0.001552 2.1743 0.029681
omega -0.137932 0.098212 -1.4044 0.160191
alpha1 -0.104435 0.034086 -3.0638 0.002185
beta1 0.977603 0.016446 59.4433 0.000000
gamma1 0.180592 0.075474 2.3928 0.016722
shape 5.783774 1.169335 4.9462 0.000001

In R the model is written as

rt = 0.0033 + at,  at = σt·et

ln σt² = -0.138 - 0.104·et-1 + 0.180·(|et-1| - E|et-1|) + 0.978·ln σt-1²

where the error term has a t-distribution with 6 degrees of freedom.

Note that the leverage coefficients of et-1 in SAS and R are equivalent.

Name of Variable = resid2

Mean of Working Series 1.483942

Standard Deviation 2.819517

Number of Observations 693
Autocorrelation Check for White Noise

To Chi- Pr >
Lag Square DF ChiSq ——————–Autocorrelations——————–

6 5.11 6 0.5299 0.026 0.020 -0.044 -0.061 0.024 -0.004

12 13.61 12 0.3261 0.025 0.088 0.014 -0.041 0.017 -0.039

18 19.28 18 0.3748 0.043 -0.032 -0.038 -0.028 -0.049 0.020

24 23.55 24 0.4873 0.004 0.028 -0.025 -0.051 -0.039 -0.021

7. Find a GJR-TGARCH(1,1) model for the Nordstrom log returns using a t-distribution for the innovations. Perform model checking and write down the fitted model.

Similar to the AR(0)-EGARCH(1,1) model, the AR(0)-GJR(1,1) model is also a good model for the data. No ARCH effects remain in the residuals. The leverage parameter is significant and positive, showing that the volatility of Nordstrom stock returns is affected more heavily by negative shocks.

The fitted model can be written as follows:

rt = 0.0036 + at,  at = σt·et

σt² = (0.034 + 0.13·Nt-1)·at-1² + 0.800·σt-1²

with Nt-1 = 1 if at-1 < 0, and Nt-1 = 0 otherwise.

et has a t-distribution with 5 degrees of freedom.

The MODEL Procedure
Nonlinear Liklhood Parameter Estimates
Approx Approx
Parameter Estimate Std Err t Value Pr > |t|

intercept 0.003589 0.00157 2.28 0.0226

df 5.028263 0.9762 5.15 <.0001

arch0 0.000095 0.000044 2.18 0.0297

arch1 0.034154 0.0259 1.32 0.1869

garch1 0.800444 0.0584 13.71 <.0001

gamma 0.129939 0.0496 2.62 0.0090

R output

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.003880 0.001519 2.5535 0.010665
omega 0.000083 0.000053 1.5798 0.114152
alpha1 0.039211 0.032406 1.2100 0.226283
beta1 0.858372 0.058370 14.7058 0.000000
gamma1 0.153341 0.074560 2.0566 0.039723
shape 5.767186 1.177356 4.8984 0.000001

The fitted model can be written as follows:

rt = 0.0039 + at,  at = σt·et

σt² = (0.039 + 0.15·Nt-1)·at-1² + 0.858·σt-1²

with Nt-1 = 1 if at-1 < 0, and Nt-1 = 0 otherwise.

et has a t-distribution with 6 degrees of freedom.

8. Is the leverage effect significant at the 5% level?

The leverage parameters in both the EGARCH and the GJR models are significant, with values indicating that volatility reacts more heavily to negative shocks.

9. What model provides the best fit for the data? Explain.
Since the leverage parameters in the AR(0)-EGARCH(1,1) and AR(0)-GJR(1,1) models are significant, and the residuals are white noise with no ARCH effect, we can conclude that both models provide a good explanation of the volatility behavior of Nordstrom stock returns. Either model can be chosen; there is no clear winner. (Note that in SAS, PROC MODEL does not compute the BIC criterion – backtesting can be used to compare models.)

In R the EGARCH(1,1) and the GJR-GARCH(1,1) models have very similar BIC values:

             garch11.t   egarch11    gjrgarch11
Akaike       -3.172122   -3.186143   -3.183976
Bayes        -3.139358   -3.146827   -3.144660

10. Use the selected model to compute up to 5-step-ahead forecasts of the simple returns and their volatility.

The following predictions are based on the AR(0)-EGARCH(1,1) model. sigma is the predicted conditional standard deviation (volatility), and series is the predicted return (note that the predicted returns equal the sample mean return, since the returns are white noise).

R output for forecasts
sigma series
2013-10-29 0.02732 0.003374
2013-10-30 0.02764 0.003374
2013-10-31 0.02796 0.003374
2013-11-01 0.02827 0.003374
2013-11-04 0.02858 0.003374
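A minimal sketch of how such a table is produced with rugarch, assuming the fitted EGARCH object is named egarch11.t.fit:

# 5-step-ahead forecasts of return and volatility
fcst <- ugarchforecast(egarch11.t.fit, n.ahead = 5)
sigma(fcst)    # predicted volatility (conditional std. deviation)
fitted(fcst)   # predicted returns (constant at the estimated mean)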

SAS code

*import data from file;

proc import datafile=’nordstrom_w_00_13.csv’ out=myd replace;

(Figure: t-distribution QQ plot of residuals for GJRGARCH model)

EXPLORATORY ANALYSIS

nobs 469.000000

NAs 0.000000

Minimum -0.166652

Maximum 0.164808

1. Quartile -0.022391

3. Quartile 0.029000

Mean 0.003843

Median 0.005614

Sum 1.802469

SE Mean 0.002079

LCL Mean -0.000242

UCL Mean 0.007929

Variance 0.002027

Stdev 0.045027

Skewness -0.040829

Kurtosis 1.569306

Title:
Jarque – Bera Normalality Test
Test Results:
STATISTIC:
X-squared: 49.4632
P VALUE:
Asymptotic p Value: 1.816e-11

The distribution of log returns is not normal: skewness is close to 0, but the excess kurtosis of about 1.56 is well above the normal value of 0, indicating moderately fat tails. The Jarque-Bera test is significant (p-value < 0.05), so the hypothesis of a normal distribution for the log returns is rejected.

The following plot is the time series plot of the weekly price of Google shares, which shows an increasing trend.

(Figure: Time series plot of Google prices)

The time plot of returns shows that Google stock returns exhibit periods of high volatility at various times. Most returns are between +/- 10%, though at several points returns reach +/- 15% with higher volatility. Volatility does not die down quickly; it persists for a while after a volatile period.

To check the null hypothesis of non-stationarity, the Dickey-Fuller test is expressed as H0: φ1 = 1 vs. Ha: φ1 ≠ 1, where the null hypothesis indicates unit-root non-stationarity.

The Dickey-Fuller test shows that the returns series is stationary (no unit root). The test p-value at lag 7 is less than 0.05, therefore the null hypothesis of non-stationarity can be rejected.

Augmented Dickey-Fuller Test

Test Results:
PARAMETER: Lag Order: 7
STATISTIC: Dickey-Fuller: -7.1331
P VALUE: 0.01
alternative hypothesis: stationary

ACF TESTS

The ACF plots below show that the Google stock returns are not correlated, indicating a constant mean model for rt. Both the squared returns series and the absolute returns series show large autocorrelations. We can conclude that the log returns process has a strong non-linear dependence.

Ljung-Box tests for ARCH effects at lags 6, 12 and 18 for squared returns and absolute returns:

Box-Ljung test
data: coredata(rets^2)
X-squared = 48.3339, df = 6, p-value = 1.013e-08

Box-Ljung test
data: coredata(rets^2)
X-squared = 71.9976, df = 12, p-value = 1.352e-10

Box-Ljung test
data: coredata(rets^2)
X-squared = 98.7945, df = 18, p-value = 3.68e-13

The Ljung-Box tests confirm that the squared returns are autocorrelated: the analysis shows a strong ARCH effect. The tests are highly significant with p-values < 0.005, and the autocorrelation plot shows large autocorrelations for the first 18 lags.

Plot of PACF: there is no significant correlation up to lag 13, which suggests that no AR terms are needed for the mean equation.

MODEL FITTING

*---------------------------------*
* GARCH Model Fit *
*---------------------------------*

Conditional Variance Dynamics
-----------------------------------
GARCH Model : sGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : std

Optimal Parameters
------------------------------------
Estimate Std. Error t value Pr(>|t|)
mu 0.004432 0.001778 2.4924 0.012689
omega 0.000103 0.000053 1.9574 0.050295
alpha1 0.094169 0.032716 2.8784 0.003997
beta1 0.856326 0.044635 19.1850 0.000000
shape 6.797338 2.000925 3.3971 0.000681

Using the output above, the fitted ARMA(0,0)-GARCH(1,1) model for the log returns, with a t-distribution for the error terms, can be written as:

rt = 0.0044 + at,  at = σt·et

σt² = 0.000103 + 0.094·at-1² + 0.856·σt-1²

where et has a t-distribution with 7 degrees of freedom (nearest integer to the shape estimate).
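A short sketch of the persistence and volatility half-life implied by these estimates (rugarch's persistence() and halflife() helpers return the same quantities for the fitted object):

# persistence and half-life of volatility shocks for the GARCH(1,1) fit
alpha1 <- 0.094169; beta1 <- 0.856326     # estimates from the output above
alpha1 + beta1                            # persistence: about 0.95
log(0.5) / log(alpha1 + beta1)            # half-life: about 13.6 weeks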
Residual Analysis
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.0339 0.8539
Lag[p+q+1][1] 0.0339 0.8539
Lag[p+q+5][5] 2.4766 0.7800
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 0.1274 0.7212
Lag[p+q+1][3] 1.4245 0.2327
Lag[p+q+5][7] 3.4346 0.6333
d.o.f=2
The above output shows no evidence of autocorrelation in the residuals: they behave like white noise. There is also no evidence of serial correlation in the squared residuals.
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 8465 0
2 30 13148 0
3 40 17834 0
4 50 22521 0
Goodness-of-fit test: the assumed error distribution is rejected (p-values are essentially 0).
To fit ARMA(0,0)-eGARCH(1,1) model with Gaussian distribution:

Conditional Variance Dynamics
———————————–
GARCH Model : eGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : norm
Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)
mu 0.004814 0.001837 2.6205 0.008780
omega -0.427600 0.165861 -2.5781 0.009936
alpha1 -0.086690 0.033886 -2.5583 0.010518
beta1 0.931657 0.026273 35.4604 0.000000
gamma1 0.177187 0.048248 3.6724 0.000240
Fitted model:

rt = 0.0048 + at,  at = σt·et

ln(σt²) = -0.428 - 0.087·et-1 + 0.177·(|et-1| - E|et-1|) + 0.932·ln(σt-1²)
Residual Analysis:
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.03655 0.8484
Lag[p+q+1][1] 0.03655 0.8484
Lag[p+q+5][5] 2.73813 0.7403
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 0.2255 0.6348
Lag[p+q+1][3] 2.6812 0.1015
Lag[p+q+5][7] 4.1362 0.5300
d.o.f=2
ARCH LM Tests
————————————
Statistic DoF P-Value
ARCH Lag[2] 2.224 2 0.3290
ARCH Lag[5] 4.682 5 0.4559
ARCH Lag[10] 7.053 10 0.7204
The above output shows no evidence of autocorrelation in the residuals: they behave like white noise. There is also no evidence of serial correlation in the squared residuals.
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 26.65 0.1131
2 30 32.04 0.3179
3 40 43.67 0.2798
4 50 47.74 0.5243
The above output shows that the normal distribution is an acceptable fit for the error terms, as all p-values are greater than 0.05.
To fit an ARMA(0,0)-TGARCH(2,1) (gjrGARCH(2,1)) model with a Gaussian distribution:
Conditional Variance Dynamics
———————————–
GARCH Model : gjrGARCH(2,1)
Mean Model : ARFIMA(0,0,0)
Distribution : norm
Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)
mu 0.004928 0.001860 2.649926 0.008051
omega 0.000120 0.000053 2.257334 0.023987
alpha1 0.000000 0.000877 0.000011 0.999991
alpha2 0.072854 0.037335 1.951388 0.051011
beta1 0.828559 0.053752 15.414339 0.000000
gamma1 0.397469 0.144242 2.755564 0.005859
gamma2 -0.315459 0.131445 -2.399922 0.016399
Fitted model:

rt = 0.0049 + at,  at = σt·et

σt² = 0.00012 + (0.000 + 0.397·Nt-1)·at-1² + (0.073 - 0.315·Nt-2)·at-2² + 0.829·σt-1²
Residual Analysis:
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.2311 0.6307
Lag[p+q+1][1] 0.2311 0.6307
Lag[p+q+5][5] 2.9412 0.7090
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 2.137 0.14380
Lag[p+q+1][4] 4.191 0.04063
Lag[p+q+5][8] 4.680 0.45620
d.o.f=3
ARCH LM Tests
————————————
Statistic DoF P-Value
ARCH Lag[2] 2.461 2 0.2921
ARCH Lag[5] 4.904 5 0.4277
ARCH Lag[10] 7.140 10 0.7122
The above output shows no evidence of autocorrelation in the residuals: they behave like white noise. There is also no evidence of serial correlation in the squared residuals.
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 20.00 0.3947
2 30 32.43 0.3014
3 40 30.87 0.8203
4 50 54.56 0.2714
The above output shows that the normal distribution is an acceptable fit for the error terms, as all p-values are greater than 0.05.
BEST FIT FOR THE MODEL:
garch11 egarch11 gjrgarch11
Akaike -3.479720 -3.455538 -3.474405
Bayes -3.435471 -3.411288 -3.412455
Shibata -3.479944 -3.455762 -3.474841
Hannan-Quinn -3.462310 -3.438127 -3.450030
From the above output, the AR(0)-GARCH(1,1) model in fact has the smallest (most negative) values on all of the fit criteria, with the GJR-GARCH model a close second; by information criteria alone the GARCH(1,1) fit is preferred.

Forecast Analysis:

RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST and EVALUATE STATISTICS for 5-step-ahead forecasts.

      garch11       egarch11       tgarch21

MSE   0.000950199   0.0009498531   0.0009499473

MAE   0.02257449    0.0225407      0.02255243

DAC   0.6           0.6            0.6

Since the MSE is smallest for the eGARCH model (by a very small margin), we should consider this model for forecasting.

CODE:
#Tomas Georgakopoulos CSC 425 Final Project
setwd("C:/Course/CSC425")
# Analysis of Google weekly returns from
library(forecast)
library(TSA)
# import data in R and compute log returns
# import libraries for TS analysis
library(zoo)
library(tseries)
myd = read.table('Weekly-GOOG-TSDATA.csv', header=T, sep=',')
pricets = zoo(myd$price, as.Date(as.character(myd$date), format=c("%m/%d/%Y")))
#log return time series
rets = log(pricets/lag(pricets, -1))
# strip off the dates and just create a simple numeric object
ret = coredata(rets);
#compute statistics
library(fBasics)
basicStats(rets)
#HISTOGRAM
par(mfcol=c(1,2))
hist(rets, xlab="Weekly Returns", prob=TRUE, main="Histogram")
#Add approximating normal density curve
xfit<-seq(min(rets,na.rm=TRUE),max(rets,na.rm=TRUE),length=40)
yfit<-dnorm(xfit,mean=mean(rets,na.rm=TRUE),sd=sd(rets,na.rm=TRUE))
lines(xfit, yfit, col="blue", lwd=2)

#CREATE NORMAL PROBABILITY PLOT
qqnorm(rets)
qqline(rets, col = 'red', lwd=2)

# creates time plot of log returns
par(mfrow=c(1,1))
plot(rets)

#Perform Jarque-Bera normality test.
normalTest(rets,method=c('jb'))

#SKEWNESS TEST
skew_test=skewness(rets)/sqrt(6/length(rets))
skew_test
print("P-value = ")
2*(1-pnorm(abs(skew_test)))

#FAT-TAIL TEST
k_stat = kurtosis(rets)/sqrt(24/length(rets))
print("Kurtosis test statistic")
k_stat
print("P-value = ")
2*(1-pnorm(abs(k_stat)))

#COMPUTE DICKEY-FULLER TEST
library(fUnitRoots)
adfTest(rets, lags=7, type=c("c"))

# Computes Ljung-Box tests on squared returns to test non-linear independence at lags 6, 12 and 18
Box.test(coredata(rets^2),lag=6,type='Ljung')
Box.test(coredata(rets^2),lag=12,type='Ljung')
Box.test(coredata(rets^2),lag=18,type='Ljung')

# Computes Ljung-Box tests on absolute returns to test non-linear independence at lags 6, 12 and 18
Box.test(abs(coredata(rets)),lag=6,type='Ljung')
Box.test(abs(coredata(rets)),lag=12,type='Ljung')
Box.test(abs(coredata(rets)),lag=18,type='Ljung')

# Plots ACF of returns, squared returns and absolute returns (ARCH effect check)
par(mfrow=c(3,1))
acf(ret)
acf(ret^2)
acf(abs(ret))

# Time plots of returns, squared returns and absolute returns
par(mfrow=c(3,1))
plot(rets, type='l')
plot(rets^2,type='l')
plot(abs(rets),type='l')
par(mfrow=c(1,1))

# plots PACF of squared returns to identify order of AR model
pacf(coredata(rets),lag=30)

#GARCH Models
library(rugarch)

#Fit AR(0,0)-GARCH(1,1) model with t-distribution
garch11.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)),
  mean.model=list(armaOrder=c(0,0)), distribution.model = "std")
#estimate model
garch11.fit=ugarchfit(spec=garch11.spec, data=rets)
garch11.fit

#Fit AR(0,0)-eGARCH(1,1) model with Gaussian distribution
egarch11.spec=ugarchspec(variance.model=list(model = "eGARCH", garchOrder=c(1,1)),
  mean.model=list(armaOrder=c(0,0)))
#estimate model
egarch11.fit=ugarchfit(spec=egarch11.spec, data=rets)
egarch11.fit

#Fit AR(0,0)-TGARCH(2,1) model with normal distribution
gjrgarch11.spec=ugarchspec(variance.model=list(model = "gjrGARCH", garchOrder=c(2,1)),
  mean.model=list(armaOrder=c(0,0)), distribution.model = "norm")
#estimate model
gjrgarch11.fit=ugarchfit(spec=gjrgarch11.spec, data=rets)
gjrgarch11.fit

# compare information criteria
model.list = list(garch11 = garch11.fit, egarch11 = egarch11.fit, gjrgarch11 = gjrgarch11.fit)
info.mat = sapply(model.list, infocriteria)
rownames(info.mat) = rownames(infocriteria(garch11.fit))
info.mat

# RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST
# EVALUATION STATISTICS
garch11.fit = ugarchfit(spec=garch11.spec, data=rets, out.sample=100)
egarch11.fit = ugarchfit(egarch11.spec, data=rets, out.sample=100)
tgarch11.fit = ugarchfit(spec=gjrgarch11.spec, data=rets, out.sample=100)

garch11.fcst = ugarchforecast(garch11.fit, n.roll=100, n.ahead=1)
egarch11.fcst = ugarchforecast(egarch11.fit, n.roll=100, n.ahead=1)
tgarch11.fcst = ugarchforecast(tgarch11.fit, n.roll=100, n.ahead=1)

fcst.list = list(garch11=garch11.fcst, egarch11=egarch11.fcst, tgarch21=tgarch11.fcst)
fpm.mat = sapply(fcst.list, fpm)
fpm.mat

The average weekly gross return of Google shares is 1.005 (about 0.5% per week) with a standard deviation of 0.045. The distribution of returns is more or less symmetric but has a high peak. The normality test (Jarque-Bera test) rejects the hypothesis of normality for the returns at the 5% level.

Jarque Bera Test

data: returns

X-squared = 54.1642, df = 2, p-value = 1.731e-12

The distribution of log returns is also not close to normal: skewness is near 0, but the excess kurtosis of about 1.5 is far from the normal value of 0. The Jarque-Bera test is significant (p-value < 0.05), indicating that the hypothesis of a normal distribution for the log returns can be rejected.

Jarque Bera Test

data: rets

X-squared = 49.4632, df = 2, p-value = 1.816e-11
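As a rough cross-check, the Jarque-Bera statistic can be recomputed from the sample moments reported in the exploratory analysis (small differences from the printed value come from the exact moment estimators used):

# Jarque-Bera statistic from sample moments: JB = n/6 * (S^2 + K^2/4)
n <- 469                      # number of weekly log returns
S <- -0.040829                # sample skewness
K <- 1.569306                 # sample excess kurtosis
(n / 6) * (S^2 + K^2 / 4)     # about 48.3, in line with the reported 49.46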

The following plot is the time series plot of the weekly price of Google shares, which shows an increasing trend.

(Figure: Time series plot of Google prices)

The time plot of returns shows that Google stock returns exhibit periods of high volatility at various times. Most returns are between +/- 10%. Sample moments show that the distribution of log returns is roughly symmetric with moderately fat tails (excess kurtosis = 1.56).


To check the null hypothesis of non-stationarity:

The Dickey-Fuller test is expressed as H0: φ1 = 1 vs. Ha: φ1 ≠ 1, where the null hypothesis indicates unit-root non-stationarity.

The Dickey-Fuller test shows that the returns series is stationary (no unit root). The test p-value at lag 7 is less than 0.05, therefore the null hypothesis of non-stationarity can be rejected.

Augmented Dickey-Fuller Test

data: rets

Dickey-Fuller = -7.1264, Lag order = 7, p-value = 0.01

alternative hypothesis: stationary


To check for serial correlations in the log returns:

The log returns are not serially correlated, as shown by the Box-Pierce test with p-value > 0.05 and by the autocorrelation plot.

Box-Pierce test

data: coredata(rets)

X-squared = 0.686, df = 1, p-value = 0.4075


To look for evidence of ARCH effects in the log returns:

The analysis below shows a strong ARCH effect: the squared returns are strongly correlated. The Box-Pierce test on the squared returns is highly significant (p-value < 0.005), and the autocorrelation plots show large autocorrelations over the first 15 lags.

Box-Pierce test

data: coredata(rets^2)

X-squared = 8.1483, df = 1, p-value = 0.00431

Plot of PACF: there is no significant correlation up to lag 10, which suggests that no AR terms are needed.


To fit an ARMA(0,0)-GARCH(2,1) model for the log returns using a normal distribution for the error terms:

Fitted model:

Residual Analysis:

Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
group statistic p-value(g-1)

1 20 28.78 0.06948

2 30 38.57 0.11019

3 40 46.39 0.19380

4 50 51.15 0.38929

The above output shows that the normal distribution is an acceptable fit for the error terms, as all p-values are greater than 0.05.


To fit ARMA(0,0)-eGARCH(1,1) model with Gaussian distribution:

Fitted Model is:

Residual Analysis:
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)

1 20 28.95 0.06673

2 30 27.82 0.52739

3 40 46.91 0.18000

4 50 50.72 0.40546

The above output shows that the normal distribution is an acceptable fit for the error terms, as all p-values are greater than 0.05.

To fit an ARMA(0,0)-TGARCH(2,1) model with a normal distribution:

Fitted Model:

Residual Analysis:
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)

1 20 18.29 0.5030

2 30 31.28 0.3525

3 40 29.17 0.8742

4 50 51.36 0.3813

The above output shows that the normal distribution is an acceptable fit for the error terms, as all p-values are greater than 0.05.

BEST FIT FOR THE MODEL:

             garch21     egarch11    gjrgarch21

Akaike       -3.436225   -3.451379   -3.463469

Bayes        -3.391975   -3.407129   -3.401519

Shibata      -3.436449   -3.451603   -3.463905

Hannan-Quinn -3.418814   -3.433968   -3.439094

From the above output, the AR(0)-EGARCH(1,1) and AR(0)-GJR(2,1) models have nearly equal (smallest) values on the fit criteria and can both be considered the best model: the GJR model wins on Akaike, Shibata and Hannan-Quinn, while the EGARCH model wins on Bayes.


Forecast Analysis:

RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST and EVALUATE STATISTICS for 5-step-ahead forecasts.

      garch21        egarch11       tgarch21

MSE   0.0009497851   0.0009497741   0.0009499554

MAE   0.0224897      0.0225261      0.0225533

DAC   0.6            0.6            0.6

Since the MSE is smallest for the eGARCH model (by a very small margin), we should consider this model for forecasting.

CODE:

library(e1071)

library(tseries)

library(xts)

library(rugarch)

price = scan("clipboard")

pricets=ts(price,frequency=52,start=c(2004,45))

returns=pricets/lag(pricets, -1)

summary(price)

skewness(price)

kurtosis(price)

sd(price)

summary(returns)

skewness(returns)

kurtosis(returns)

sd(returns)

#Normality check

hist(returns, prob=TRUE)

lines(density(returns, adjust=2), type="l")

jarque.bera.test(returns)

#log return time series

rets = log(pricets/lag(pricets, -1))

#Normality check

hist(rets, prob=TRUE)

lines(density(rets, adjust=2), type="l")

jarque.bera.test(rets)

plot.ts(pricets)

# strip off the dates and just create a simple numeric object (require:xts)

ret = coredata(rets);

# creates time plot of log returns

plot(rets)

summary(rets)

skewness(rets)

kurtosis(rets)

sd(returns)

#ADF test for checking null hypothesis of non stationarity

adf.test(rets)

# Computes Ljung-Box test on returns

Box.test( coredata(rets))

# Computes Ljung-Box test on squared returns to test non-linear independence

Box.test(coredata(rets^2))

# Computes Ljung-Box test on absolute returns to test non-linear independence

Box.test(abs(coredata(rets)))

par(mfrow=c(3,1))

# Plots ACF function of vector data

acf(ret)

# Plot ACF of squared returns to check for ARCH effect

acf(ret^2)

# Plot ACF of absolute returns to check for ARCH effect

acf(abs(ret))

#plot returns, squared returns and abs(returns)

par(mfrow=c(3,1))
# time plot of returns
plot(rets, type='l')

# time plot of squared returns to check for ARCH effect
plot(rets^2,type='l')

# time plot of absolute returns to check for ARCH effect
plot(abs(rets),type='l')

par(mfrow=c(1,1))

# plots PACF of squared returns to identify order of AR model

pacf(coredata(rets),lag=10)

#specify model using functions in rugarch package

#Fit ARMA(0,0)-GARCH(2,1) model

garch21.spec=ugarchspec(variance.model=list(garchOrder=c(2,1)), mean.model=list(armaOrder=c(0,0)))

#estimate model

garch21.fit=ugarchfit(spec=garch21.spec, data=rets)

garch21.fit

#Fit ARMA(0,0)-eGARCH(1,1) model with Gaussian distribution

egarch11.spec=ugarchspec(variance.model=list(model = "eGARCH",
  garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)))

#estimate model

egarch11.fit=ugarchfit(spec=egarch11.spec, data=ret)

egarch11.fit

#Fit ARMA(0,0)-TGARCH(2,1) model with normal distribution

gjrgarch21.spec=ugarchspec(variance.model=list(model = "gjrGARCH",
  garchOrder=c(2,1)), mean.model=list(armaOrder=c(0,0)), distribution.model = "norm")

#estimate model

gjrgarch21.fit=ugarchfit(spec=gjrgarch21.spec, data=ret)

gjrgarch21.fit

# compare information criteria

model.list = list(garch21 = garch21.fit,
  egarch11 = egarch11.fit,
  gjrgarch21 = gjrgarch21.fit)

info.mat = sapply(model.list, infocriteria)

rownames(info.mat) = rownames(infocriteria(garch21.fit))

info.mat

# RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST

# EVALUATION STATISTICS

garch21.fit = ugarchfit(spec=garch21.spec, data=ret, out.sample=100)

egarch11.fit = ugarchfit(egarch11.spec, data=ret, out.sample=100)

tgarch21.fit = ugarchfit(spec=gjrgarch21.spec, data=ret, out.sample=100)

garch21.fcst = ugarchforecast(garch21.fit, n.roll=100, n.ahead=5)

egarch11.fcst = ugarchforecast(egarch11.fit, n.roll=100, n.ahead=5)

tgarch21.fcst = ugarchforecast(tgarch21.fit, n.roll=100, n.ahead=5)

fcst.list = list(garch21=garch21.fcst,
  egarch11=egarch11.fcst,
  tgarch21=tgarch21.fcst)

fpm.mat = sapply(fcst.list, fpm)

fpm.mat
