Advanced Statistics with R code: 90% done, just check it and write a summary

 


I have the problem below that needs to be completed. I have included weekly GOOGLE time series data from the past few years; the analysis must be done on this data. I have also included sample code and analysis. Please do a similar analysis with thorough answers. It will be easiest to copy the sample code and modify it.

The analysis should be submitted as a report consisting of two sections:

1. A non-technical summary not longer than one typewritten page, describing the conclusions of your statistical analysis to a non-technical audience. It should be intelligible to a person who does not know regression analysis. Suppose you are talking to your boss who does not know statistics or to a friend who is not familiar with statistical terminology.

2. A technical summary not exceeding 5 pages, with details of your statistical analysis. This section is intended for a statistically literate audience and must be written in a clear organized fashion. For instance, you can organize the report into subsections such as


a. Exploratory analysis of the data.

b. Model fitting.

c. Residual analysis and model diagnostics.

d. Forecast analysis

e. Analysis of the results and discussion.

You should include appropriate output and graphs from your  R programs in your document in a presentable format. Further details may be relegated to an appendix that may contain the R code, some graphs, computer output or supplementary information about the field of study.

It is possible that you may not find a satisfactory model that fits adequately your data. Sometimes a data set may admit more than one satisfactory answer; sometimes there may be none. If the statistical analysis shows that no time series models are suitable for your data set, please mention what approaches you have tried and what was unsatisfactory about them. If there is more than one suitable model, mention the pros and cons and compare their performance in forecasting future observations.

================================================================ The assignment should resemble Homework 4 and include the pieces of Homework 5 and the sample analysis. It should be in a clean format like the homework and include the R code at the end. Since this is stock price data, a plain ARIMA process can't be used, and it should (I think) require GARCH volatility modeling. Take a look at the S&P analysis for further clarification.
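Before the data listing below, here is a minimal sketch of how the GOOGLE series could be read in and converted to weekly log returns, the usual starting point for GARCH modeling. The file name google.csv is hypothetical, and dropping the handful of weeks with missing prices is my assumption, not part of the assignment materials.

# Sketch: load the weekly GOOGLE prices listed below and compute log returns.
# "google.csv" is a hypothetical file name holding the date/price columns below.
library(tseries)
myd = read.table("google.csv", header = TRUE, sep = ",")
myd = myd[!is.na(myd$price), ]   # assumption: drop weeks with missing prices
price = myd$price
ret = diff(log(price))           # weekly log returns
plot(ret, type = "l", main = "GOOGLE weekly log returns")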


date price
11/1/2004 169.35
11/8/2004 182
11/15/2004 169.4
11/22/2004 179.39
11/29/2004 180.4
12/6/2004 171.65
12/13/2004 180.08
12/20/2004 187.9
12/27/2004 192.79
1/3/2005 193.85
1/10/2005 199.97
1/18/2005 188.28
1/24/2005 190.34
1/31/2005 204.36
2/7/2005 187.4
2/14/2005 197.95
2/22/2005 185
2/28/2005 185.9
3/7/2005 177.8
3/14/2005 180.04
3/21/2005 179.25
3/28/2005
4/4/2005 192.05
4/11/2005
4/18/2005 215.81
4/25/2005 220
5/2/2005 228.02
5/9/2005 229.24
5/16/2005 241.61
5/23/2005 266
5/31/2005 280
6/6/2005 282.5
6/13/2005 280.3
6/20/2005 297.25
6/27/2005 291.25
7/5/2005 296.23
7/11/2005 301.19
7/18/2005 302.4
7/25/2005 287.76
8/1/2005 292.35
8/8/2005 289.72
8/15/2005
8/22/2005 283.58
8/29/2005 288.45
9/6/2005 299.09
9/12/2005 300.2
9/19/2005 315.36
9/26/2005 316.46
10/3/2005 312.99
10/10/2005 296.14
10/17/2005 339.9
10/24/2005 358.17
10/31/2005 390
11/7/2005
11/14/2005 400.21
11/21/2005 428.62
11/28/2005 417.7
12/5/2005 409.2
12/12/2005 430.15
12/19/2005 430.93
12/27/2005 414.86
1/3/2006 465.66
1/9/2006 466.25
1/17/2006 399.46
1/23/2006 433.49
1/30/2006 381.55
2/6/2006 362.61
2/13/2006 368.75
2/21/2006 377.4
2/27/2006 378.18
3/6/2006 337.5
3/13/2006 339.79
3/20/2006 365.8
3/27/2006
4/3/2006 406.16
4/10/2006 402.16
4/17/2006 437.1
4/24/2006 417.94
5/1/2006 394.3
5/8/2006 374.13
5/15/2006 370.02
5/22/2006 381.35
5/30/2006 379.44
6/5/2006 386.57
6/12/2006 390.7
6/19/2006 404.86
6/26/2006 419.33
7/3/2006 420.45
7/10/2006 403.5
7/17/2006 390.11
7/24/2006 388.12
7/31/2006 373.85
8/7/2006 368.5
8/14/2006 383.36
8/21/2006 373.26
8/28/2006 378.6
9/5/2006 377.85
9/11/2006 409.88
9/18/2006 403.78
9/25/2006 401.9
10/2/2006 420.5
10/9/2006 427.3
10/16/2006 459.67
10/23/2006 475.2
10/30/2006 471.8
11/6/2006 473.55
11/13/2006 498.79
11/20/2006 505
11/27/2006 480.8
12/4/2006 484.11
12/11/2006 480.3
12/18/2006 455.58
12/26/2006 460
1/3/2007 487.19
1/8/2007
1/16/2007 489.75
1/22/2007 495.84
1/29/2007 481.5
2/5/2007 461.89
2/12/2007 469.94
2/20/2007 470.62
2/26/2007 438.68
3/5/2007 452.96
3/12/2007 440.85
3/19/2007 461.83
3/26/2007 458.16
4/2/2007 471.51
4/9/2007 466.29
4/16/2007 482.48
4/23/2007 479.01
4/30/2007 471.12
5/7/2007 466.74
5/14/2007 470.3
5/21/2007 483.52
5/29/2007 500.4
6/4/2007 515
6/11/2007 505.89
6/18/2007 524.98
6/25/2007 522.7
7/2/2007 539.4
7/9/2007 552.16
7/16/2007 520.12
7/23/2007 511.89
7/30/2007 503
8/6/2007 515.75
8/13/2007 500.04
8/20/2007
8/27/2007 515.25
9/4/2007 519.35
9/10/2007 528.75
9/17/2007 560
9/24/2007 567
10/1/2007 594.05
10/8/2007 637.39
10/15/2007 644.71
10/22/2007 674.6
10/29/2007 711.25
11/5/2007 663.97
11/12/2007 633.63
11/19/2007 676.7
11/26/2007 693
12/3/2007 714.87
12/10/2007 689.96
12/17/2007 696.69
12/24/2007 702.53
12/31/2007 657
1/7/2008 638.25
1/14/2008 600.25
1/22/2008 566.4
1/28/2008 515.9
2/4/2008 516.69
2/11/2008 529.64
2/19/2008 507.8
2/25/2008 471.18
3/3/2008 433.35
3/10/2008 437.92
3/17/2008 433.55
3/24/2008 438.08
3/31/2008 471.09
4/7/2008 457.45
4/14/2008 539.41
4/21/2008 544.06
4/28/2008 581.29
5/5/2008 573
5/12/2008 580
5/19/2008 544.62
5/27/2008 585.8
6/2/2008
6/9/2008 571.51
6/16/2008 546.43
6/23/2008 528.07
6/30/2008 537
7/7/2008 533.8
7/14/2008 481.32
7/21/2008 491.98
7/28/2008 467.86
8/4/2008 495.01
8/11/2008 510.15
8/18/2008 490.59
8/25/2008 463.29
9/2/2008 444.25
9/8/2008 437.66
9/15/2008 449.15
9/22/2008 431.04
9/29/2008 386.91
10/6/2008 332
10/13/2008 372.5
10/20/2008 339.29
10/27/2008 359.36
11/3/2008 331.14
11/10/2008 310.02
11/17/2008 262.43
11/24/2008 292.96
12/1/2008 283.99
12/8/2008 315.76
12/15/2008 310.17
12/22/2008 300.36
12/29/2008 321.32
1/5/2009 315.07
1/12/2009 299.67
1/20/2009 324.7
1/26/2009 338.53
2/2/2009 371.28
2/9/2009 357.68
2/17/2009 346.45
2/23/2009 337.99
3/2/2009 308.57
3/9/2009 324.42
3/16/2009 330.16
3/23/2009 347.7
3/30/2009 369.78
4/6/2009
4/13/2009 392.24
4/20/2009 389.49
4/27/2009 393.69
5/4/2009 407.33
5/11/2009
5/18/2009 393.5
5/26/2009 417.23
6/1/2009 444.32
6/8/2009 424.84
6/15/2009 420.09
6/22/2009 425.32
6/29/2009 408.49
7/6/2009 414.4
7/13/2009 430.25
7/20/2009 446.72
7/27/2009 443.05
8/3/2009 457.1
8/10/2009
8/17/2009 465.24
8/24/2009 464.75
8/31/2009 461.3
9/8/2009 472.14
9/14/2009 491.46
9/21/2009 492.48
9/28/2009 484.58
10/5/2009 516.25
10/12/2009 549.85
10/19/2009 553.69
10/26/2009 536.12
11/2/2009 551.1
11/9/2009 572.05
11/16/2009 569.96
11/23/2009 579.76
11/30/2009 585.01
12/7/2009 590
12/14/2009 596.42
12/21/2009 618.48
12/28/2009 619.98
1/4/2010 602.02
1/11/2010
1/19/2010 550.01
1/25/2010 529.94
2/1/2010 531.29
2/8/2010 533.12
2/16/2010 540.76
2/22/2010 526.8
3/1/2010 564.21
3/8/2010 579.54
3/15/2010
3/22/2010 562.69
3/29/2010 568.8
4/5/2010 566.22
4/12/2010 550.15
4/19/2010 544.99
4/26/2010 525.7
5/3/2010 493.14
5/10/2010 507.53
5/17/2010 472.05
5/24/2010 485.63
6/1/2010 498.72
6/7/2010 488.5
6/14/2010 500.03
6/21/2010 472.68
6/28/2010 436.55
7/6/2010 467.49
7/12/2010 459.61
7/19/2010 490.06
7/26/2010 484.85
8/2/2010 500.22
8/9/2010 486.35
8/16/2010 462.02
8/23/2010 458.83
8/30/2010
9/7/2010 476.14
9/13/2010 490.15
9/20/2010 527.29
9/27/2010 525.62
10/4/2010 536.35
10/11/2010 601.45
10/18/2010 612.53
10/25/2010 613.7
11/1/2010 625.08
11/8/2010 603.29
11/15/2010 590.8
11/22/2010
11/29/2010
12/6/2010 592.21
12/13/2010
12/20/2010 604.23
12/27/2010 593.97
1/3/2011 616.44
1/10/2011 624.18
1/18/2011 611.83
1/24/2011 600.99
1/31/2011 610.98
2/7/2011 624.5
2/14/2011 630.08
2/22/2011 610.04
2/28/2011 600.62
3/7/2011 576.71
3/14/2011 561.06
3/21/2011 579.74
3/28/2011 591.8
4/4/2011 578.16
4/11/2011 530.7
4/18/2011 525.1
4/25/2011 544.1
5/2/2011 535.3
5/9/2011 529.55
5/16/2011 524.03
5/23/2011 520.9
5/31/2011 523.08
6/6/2011 509.51
6/13/2011 485.02
6/20/2011 474.88
6/27/2011 521.03
7/5/2011 531.99
7/11/2011 597.62
7/18/2011 618.23
7/25/2011 603.69
8/1/2011 579.04
8/8/2011 563
8/15/2011 490.92
8/22/2011 526.86
8/29/2011 524.84
9/6/2011 524.85
9/12/2011 546.68
9/19/2011 525.51
9/26/2011 515.04
10/3/2011 515.12
10/10/2011 591.68
10/17/2011 590.49
10/24/2011 600.14
10/31/2011 596.14
11/7/2011 608.35
11/14/2011 594.88
11/21/2011
11/28/2011 620.36
12/5/2011 627.42
12/12/2011 625.96
12/19/2011 633.14
12/27/2011 645.9
1/3/2012 650.02
1/9/2012 624.99
1/17/2012 585.99
1/23/2012 579.98
1/30/2012 596.33
2/6/2012 605.91
2/13/2012 604.64
2/21/2012 609.9
2/27/2012 621.25
3/5/2012
3/12/2012 625.04
3/19/2012 642
3/26/2012 641.24
4/2/2012 632.32
4/9/2012 624.6
4/16/2012 596.06
4/23/2012 614.98
4/30/2012 596.97
5/7/2012 605.23
5/14/2012 600.4
5/21/2012 591.53
5/29/2012 570.98
6/4/2012 580.45
6/11/2012 564.51
6/18/2012 571.48
6/25/2012 580.07
7/2/2012 585.98
7/9/2012 576.52
7/16/2012 610.82
7/23/2012 634.96
7/30/2012 641.33
8/6/2012
8/13/2012 677.14
8/20/2012 678.63
8/27/2012 685.09
9/4/2012 706.15
9/10/2012 709.68
9/17/2012 733.99
9/24/2012 754.5
10/1/2012 767.65
10/8/2012 744.75
10/15/2012 681.79
10/22/2012 675.15
10/31/2012 687.92
11/5/2012 663.03
11/12/2012 647.18
11/19/2012 667.97
11/26/2012 698.37
12/3/2012 684.21
12/10/2012 701.96
12/17/2012 715.63
12/24/2012 700.01
12/31/2012 737.97
1/7/2013 739.99
1/14/2013 704.51
1/22/2013 753.67
1/28/2013 775.6
2/4/2013 785.37
2/11/2013 792.89
2/19/2013 799.71
2/25/2013 806.19
3/4/2013 831.52
3/11/2013 814.3
3/18/2013 810.31
3/25/2013 794.19
4/1/2013 783.05
4/8/2013 790.05
4/15/2013 799.87
4/22/2013 801.42
4/29/2013 845.72
5/6/2013 880.23
5/13/2013 909.18
5/20/2013 873.32
5/28/2013 871.22
6/3/2013 879.73
6/10/2013 875.04
6/17/2013 880.93
6/24/2013 880.37
7/1/2013 893.49
7/8/2013 923
7/15/2013 896.6
7/22/2013 885.35
7/29/2013 906.57
8/5/2013 890.41
8/12/2013 856.91
8/19/2013 870.21
8/26/2013 846.9
9/3/2013 879.58
9/9/2013 889.07
9/16/2013 903.11
9/23/2013 876.39
9/30/2013 872.35
10/7/2013 871.99
10/14/2013 1011.41
10/21/2013 1015.2
10/28/2013 1027.04

CSC425– Time series analysis and forecasting

Homework 4 – Solutions

Problem 1 [just for fun!] – NO CREDIT

I noticed some students had questions regarding the SARIMA notation. Write the model expression for

the following SARIMA models for the {Xt} time series.

Example: SARIMA(2,0,0)(1,0,0)[12]

The notation indicates a regular AR(2) model and a seasonal AR(1)[12] model

(1 - φ1B - φ2B^2)(1 - Φ1B^12) Xt = at

a) SARIMA(2,1,0)(1,1,0)[4]

b) SARIMA(1,0,1)(0,0,1)[12]

c) SARIMA(0,1,3)(0,1,1)[4]

In general SARIMA(p,d,q)(P,D,Q)[s] where the first set (p,d,q) describes the regular

ARIMA(p,d,q) model, and the second set (P,D,Q)[s] describes the order of the seasonal

component.

a) The first set (2,1,0) identifies a first difference + AR(2) model. The second set (1,1,0) indicates the seasonal component, with a seasonal differencing and a seasonal AR(1) model of period 4.

Thus

SARIMA(2,1,0)(1,1,0)[4]

(1 - ϕ1B - ϕ2B^2)(1 - Φ1B^4)(1 - B)(1 - B^4) Xt = at

b) The first set (1,0,1) indicates an ARMA(1,1) model. The second set (0,0,1) is for the seasonal component, indicating a seasonal MA(1) model of period 12. Thus

SARIMA(1,0,1)(0,0,1)[12]

(1 - ϕ1B) Xt = (1 - Θ1B^12)(1 - θ1B) at

c) The first set (0,1,3) indicates an ARIMA(0,1,3) model, i.e. first difference + MA(3) model. The second set (0,1,1) indicates a seasonal differencing and a seasonal MA(1) model of period 4.

Thus
SARIMA(0,1,3)(0,1,1)[4]

(1 - B)(1 - B^4) Xt = (1 - Θ1B^4)(1 - θ1B - θ2B^2 - θ3B^3) at
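As a quick check of the notation, the three models above can be fitted in R with the arima() function; this is a hedged sketch on a placeholder series x (simulated noise), not part of the original solution.

# Sketch: fitting the Problem 1 SARIMA models with stats::arima().
# x is a placeholder series; replace it with real data of the right frequency.
x = rnorm(200)
fit_a = arima(x, order=c(2,1,0), seasonal=list(order=c(1,1,0), period=4))
fit_b = arima(x, order=c(1,0,1), seasonal=list(order=c(0,0,1), period=12))
fit_c = arima(x, order=c(0,1,3), seasonal=list(order=c(0,1,1), period=4))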

Problem 2 [15pts]

Consider the quarterly earnings per share (EPS) of Costco from the last quarter of 2001 to the third

quarter of 2013 fiscal year. The data were obtained from the Nasdaq website and are saved in the

costco.csv file.

a) Analyze the distribution of the quarterly earnings. Are the data normally distributed? Provide
appropriate statistics, tests and graphs to support your conclusions.

The average quarterly earnings are $0.66 per share with a standard deviation of $0.27. The distribution of eps has a positive skew, as shown by the histogram and normal quantile plot (points under the right tail diverge from the diagonal). The Jarque-Bera normality test rejects the hypothesis of normality for the earnings data at the 5% level. (See SAS / R output.)

The UNIVARIATE Procedure

Variable: eps

Moments

N 48 Sum Weights 48

Mean 0.66270833 Sum Observations 31.81

Std Deviation 0.26970228 Variance 0.07273932

Skewness 0.95652585 Kurtosis 0.80234438

Uncorrected SS 24.4995 Corrected SS 3.41874792

Coeff Variation 40.6969796 Std Error Mean 0.03892817

Results of the Jarque-Bera test on normality

Obs   Skewness   Kurtosis   Jarque-Bera statistic   P-value for Jarque-Bera test
1     0.95653    0.80234    8.60705                 0.013521
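The same statistics are available in R; a minimal sketch using the fBasics package already loaded in the solution code:

# Sketch: distribution summary and Jarque-Bera normality test on EPS in R
library(fBasics)
eps = myd$eps
basicStats(eps)                  # mean, std deviation, skewness, kurtosis
normalTest(eps, method = "jb")   # Jarque-Bera test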

b) Apply the log transformation to stabilize the variance, and analyze the variable yt = log(eps) at time t.

The distribution of log(eps) is closer to the normal distribution, with skewness and kurtosis statistics close to 0, as also shown by the histogram and normal quantile plot. The Jarque-Bera test is not significant (p-value > 0.05), indicating that the hypothesis of a normal distribution for log(eps) cannot be rejected.
Moments

N 48 Sum Weights 48

Mean -0.4891602 Sum Observations -23.479688

Std Deviation 0.39954911 Variance 0.15963949

Skewness 0.0058175 Kurtosis -0.398239

Uncorrected SS 18.9883843 Corrected SS 7.50305604

Coeff Variation -81.68063 Std Error Mean 0.05766995

Results of the Jarque-Bera test on normality

Obs   Skewness      Kurtosis   Jarque-Bera statistic   P-value for Jarque-Bera test
1     0.005817502   -0.39824   0.31746                 0.85323

c) Is there any evidence of a seasonal effect in the log(eps) data? Explain your answer.
The time plots for Costco quarterly earnings and for log(earnings) show quarterly seasonality, with higher earnings in the first and third quarters and lower earnings in the second and fourth quarters. We can observe a slight dip in earnings in 2009, but the recent economic crisis didn't seem to affect Costco earnings much, as earnings increased shortly after the dip. The log(eps) time plot shows that variability is more constant over time.

[Figure: Costco earnings: time plot of quarterly EPS vs. date, 2000 to 2014]

d) Compute the Dickey-Fuller test for the quarterly earnings for p=1, 3, and 5. Write down the test hypotheses and the test statistics, and analyze the results of the test.

The Dickey-Fuller test is expressed as H0: φ1 = 1 vs Ha: φ1 < 1, where the null hypothesis indicates unit-root non-stationarity.

The Dickey-Fuller test shows that the earnings time series is unit-root non-stationary. The test p-values for lags 1, 3 and 5 are all larger than 0.10, therefore the null hypothesis of non-stationarity cannot be rejected.

The ARIMA Procedure

Augmented Dickey-Fuller Unit Root Tests

Type Lags Rho Pr < Rho Tau Pr < Tau F Pr > F

Zero Mean 1 1.1538 0.9282 1.37 0.9554

3 1.2386 0.9381 4.43 0.9999

5 1.1075 0.9213 1.91 0.9852

Single Mean 1 -0.7726 0.9018 -0.28 0.9193 1.22 0.7642

3 1.8677 0.9962 2.06 0.9998 10.02 0.0010

5 0.4869 0.9736 0.24 0.9722 1.86 0.6076

Trend 1 -20.9931 0.0306 -2.77 0.2161 4.22 0.3620

3 0.1628 0.9958 0.06 0.9958 2.31 0.7221

5 93.4268 0.9999 -2.07 0.5472 2.53 0.6813
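For reference, a sketch of the corresponding test in R (fUnitRoots); using type "c" (constant, no trend) for the raw EPS series is my assumption, and the SAS output above also reports the zero-mean and trend variants:

# Sketch: augmented Dickey-Fuller tests on eps for lags 1, 3 and 5
library(fUnitRoots)
for (p in c(1, 3, 5)) print(adfTest(myd$eps, lags = p, type = "c"))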

e) Take the first difference of the log(eps) series, and analyze its autocorrelation function. Does the differenced time series show strong seasonal patterns? Explain.

The ACF of the first difference of yt = log(eps), i.e. yt - yt-1, shows larger values at lags that are multiples of four, which is consistent with the seasonal behavior seen in the time plot. The differenced series is still not stationary, since it shows seasonality.

[Figure: Time plot of Costco log(EPS) (xlog vs. date), 2000 to 2014]

f) If you de-trend and de-seasonalize the Costco log(eps) data, do you obtain a stationary time series?
Use appropriate statistical techniques and graphs to analyze stationarity.

The de-trended and de-seasonalized time series is stationary, as indicated by the Dickey-Fuller test (p-values are small, so the hypothesis of non-stationarity can be rejected) and by the small ACF values. After differencing the time series, we can assume a zero-mean model for the data. Note that the de-seasonalized TS has zero mean (use the "nc" type in R).

The ARIMA Procedure

Augmented Dickey-Fuller Unit Root Tests

Type Lags Rho Pr < Rho Tau Pr < Tau F Pr > F

Zero Mean 1 -39.9069 <.0001 -4.17 0.0001

3 85.9684 0.9999 -5.29 <.0001

5 28.6760 0.9999 -4.61 <.0001

Single Mean 1 -39.7843 0.0003 -4.10 0.0024 8.49 0.0010

3 85.8868 0.9999 -5.19 0.0002 13.58 0.0010

5 28.7344 0.9999 -4.51 0.0009 10.30 0.0010

Trend 1 -39.6292 <.0001 -4.04 0.0149 8.22 0.0184

3 86.3878 0.9999 -5.10 0.0009 13.11 0.0010

5 28.3836 0.9999 -4.48 0.0053 10.04 0.0010
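Equivalently in R, a sketch of the ADF test on the differenced and seasonally differenced series with the no-constant option mentioned above:

# Sketch: ADF test for the twice-differenced log(eps);
# type "nc" (no constant) matches the zero-mean assumption above
library(fUnitRoots)
sdy = diff(diff(log(myd$eps)), 4)
adfTest(sdy, lags = 3, type = "nc")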

g) Apply an airline model SARIMA(0,1,1)(0,1,1)4 to estimate the log(eps).

a. Write down the estimated expression of the fitted model
(1 - B)(1 - B^4) Xt = (1 - 0.339B)(1 - 0.669B^4) at   (SAS output)

The ARIMA Procedure

Maximum Likelihood Estimation

Standard Approx

Parameter Estimate Error t Value Pr > |t| Lag

MA1,1 0.33862 0.14912 2.27 0.0232 1

MA2,1 0.66864 0.13492 4.96 <.0001 4

Variance Estimate 0.012082

Std Error Estimate 0.109917

AIC -63.4007

SBC -59.8783

Number of Residuals 43

b. Are all model coefficients significant?
All coefficients of the SARIMA model are significant (p-value < 0.05).

c. Analyze the residuals to check the model validity
The autocorrelation analysis of the residuals shows that the ACF at lag 5 is significantly larger than zero. This is also reflected in the Ljung-Box test, whose p-value at lags up to 6 (0.0509) is borderline. These results indicate that there is some source of serial correlation that is still unexplained by the model. Since only one ACF value is large, this is not indicative of a seasonal trend. The solution is to add a lag 5 parameter in the MA component of the SARIMA model.

Autocorrelation Check of Residuals

To Chi- Pr >

Lag Square DF ChiSq --------------------Autocorrelations--------------------

6 9.44 4 0.0509 -0.013 0.104 -0.198 -0.129 -0.346 -0.049

12 12.75 10 0.2380 0.002 0.184 0.019 0.153 -0.000 0.023

18 20.68 16 0.1911 -0.185 0.066 -0.179 -0.065 -0.044 0.189

24 21.88 22 0.4673 -0.014 -0.055 -0.066 0.054 -0.053 -0.009
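The fix described above is implemented in the solution's R code with the fixed argument of arima(); a minimal sketch:

# Sketch: MA component with lags 1 and 5 only (lags 2-4 constrained to zero)
# plus the seasonal MA(1) term; the fixed vector is ordered ma1..ma5, sma1
m_mod = arima(y, order=c(0,1,5),
              seasonal=list(order=c(0,1,1), period=4),
              method="ML", fixed=c(NA, 0, 0, 0, NA, NA))
m_mod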

h) The multiplicative airline model has some limitations; how can you modify it to be a better fit for the data? (If you are not sure how to do this, post a message on the discussion forum and start a discussion with other students!)

a. Write down the estimated expression of the fitted model
a. Write down the estimated expression of the fitted model

(1 - B)(1 - B^4) Xt = (1 - 0.7235B^4)(1 - 0.4103B - 0.3726B^5) at   (SAS output)

The ARIMA Procedure
Maximum Likelihood Estimation
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag

MA1,1 0.41029 0.16632 2.47 0.0136 1

MA1,2 0.37262 0.16359 2.28 0.0227 5

MA2,1 0.72350 0.13731 5.27 <.0001 4

Variance Estimate 0.010536

Std Error Estimate 0.102645

AIC -66.3638

SBC -61.0802

Number of Residuals 43

b. Are all model coefficients significant?
All coefficients are significantly different from zero (p-values < 0.05).

c. Analyze the residuals to check the model validity
The residuals are white noise. The Ljung-Box test for lags up to 24 is not significant, indicating that the model is able to capture the time dependencies in the data.

To Chi- Pr >

Lag Square DF ChiSq --------------------Autocorrelations--------------------

6 3.77 3 0.2870 -0.059 0.091 -0.202 -0.134 -0.078 0.028

12 5.94 9 0.7457 0.026 0.140 -0.069 0.078 -0.055 0.062

18 13.34 15 0.5759 -0.109 0.071 -0.205 -0.086 -0.066 0.182

24 14.27 21 0.8577 0.005 -0.042 -0.079 0.018 -0.033 -0.032

i) Which model is a better fit for the data? Explain what evidence you use to select the best model.

The SARIMA(0,1,5)(0,1,1)[4] model with MA terms at lags 1 and 5 provides a better description of the dynamic behavior of the Costco log earnings time series. Its coefficients are significantly different from zero, and its residuals are white noise and normally distributed. This second model also has the lowest SBC/BIC value.

j) Compute forecasts for Costco quarterly EPS for the next 4 quarters using the selected model in h).
The table below reports forecasts in terms of EPS values. The forecasts of log(eps) computed by the SARIMA model were transformed back to the original EPS scale using the exp() function.

Obs date OBSERVED FORECAST STD L95 U95 RESIDUAL x

49 10/01/2013 . 0.97486 0.10264 0.79302 1.18584 . .

50 01/01/2014 . 1.15743 0.11916 0.90987 1.45159 . .

51 04/01/2014 . 1.00514 0.13366 0.76662 1.29454 . .

52 07/01/2014 . 1.51769 0.14672 1.12620 2.00170 . .
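In R the same back-transformation looks like the sketch below; the lognormal mean correction exp(forecast + variance/2) mirrors the SAS data step at the end of this solution. It uses the m_mod fit from the sketch above (the solution's own m1 works identically).

# Sketch: back-transform log-scale forecasts to the EPS scale
f = forecast(m_mod, h = 4)
exp(f$mean)                        # naive back-transform (median forecast)
exp(f$mean + f$model$sigma2 / 2)   # mean forecast with lognormal correction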

R code:
# Analysis of Costco quarterly earnings per share (EPS) from the last
# quarter of 2001 to the third quarter of 2013.
# Quarterly seasonality is expected in the earnings data.

#1. LOAD LIBRARIES
library(tseries)
library(fBasics)
library(forecast)

#2. IMPORT DATA
# Load data into the data frame "myd"
myd=read.table("costco.csv", header=T, sep=',')

# create time series object
x=myd$eps
xts=ts(myd$eps, start=c(2001,4), frequency=4)

# create time plot
plot(xts, type='l')

# TRANSFORMATIONS
# natural log
y=log(x)
yts=log(xts)
plot(yts, type='l')
# add circles on data points in the time plot
points(yts)

# ACF ANALYSIS
# one plot per page
par(mfcol=c(1,1))
# ACF plot for log eps
# as.vector() transforms the time series to a regular numeric vector
acf(as.vector(y), lag.max=24, main="ACF of log eps")

# APPLYING DIFFERENCING TO DATA
# compute regular differences
dy=diff(y)
acf(as.vector(dy), lag.max=16, main="ACF of differenced log eps")
# compute seasonal difference for quarterly data (s=4)
sdy=diff(dy,4)
acf(as.vector(sdy), lag.max=16, main="ACF of seasonally differenced log eps")
pacf(sdy)

# FIT SARIMA MODEL
# try automated order selection;
# since the TS is not stationary, set stationary = FALSE
auto.arima(y, trace=T, stationary=FALSE)

# fit multiplicative seasonal model ARIMA(1,0,0)x(1,1,0)[4]
m1=arima(y, order=c(1,0,0), seasonal=list(order=c(1,1,0), period=4),
         method="ML")
# use the arimaPrint() function in the arimaPrint.R file
# (save arimaPrint.R in the work directory)
source("arimaPrint.R")
arimaPrint(m1)
acf(m1$residuals)
qqnorm(m1$residuals)
qqline(m1$residuals)
hist(m1$residuals)

# Since the seasonal AR(1) coefficient is not significant, remove it;
# fit the airline seasonal model ARIMA(0,1,1)x(0,1,1)[4]
m1=arima(y, order=c(0,1,1), seasonal=list(order=c(0,1,1), period=4),
         method="ML")
m1

# fit seasonal model ARIMA(0,1,5)x(0,1,1)[4] with only MA lags 1 and 5
m1=arima(y, order=c(0,1,5), seasonal=list(order=c(0,1,1), period=4),
         method="ML", fixed=c(NA, 0, 0, 0, NA, NA))
m1
arimaPrint(m1, c(1,5,6))

# One plot per page
par(mfcol=c(1,1))
acf(m1$residuals)
# "approximated" Ljung-Box test on residuals
Box.test(m1$residuals, 6, "Ljung-Box", fitdf=3)
Box.test(m1$residuals, 12, "Ljung-Box", fitdf=3)

# compute predictions for log data up to 8 steps ahead
f1=forecast(m1, h=8)
names(f1)
plot(f1)

# predictions in the original scale
exp(f1$mean)

# create forecast plot for the original variable:
# join data with forecasts
s1=c(x, exp(f1$mean))
# lower limit of the 95% interval
lcl=c(x, exp(f1$lower[,2]))
# upper limit of the 95% interval
ucl=c(x, exp(f1$upper[,2]))
# forecast plot
plot(s1, type='l', col='red')
lines(1:length(s1), ucl, lty=2)
lines(1:length(s1), lcl, lty=2)

SAS code
title1 'Costco earnings';
proc import datafile="costco.csv" out=myd replace;
getnames=yes;
delimiter=',';
run;

proc print;
run;

data myd;
set myd;
xlog=log(eps);
run;

*analyze distribution of log(eps);
proc univariate data=myd;
var xlog;
histogram/normal;
qqplot / normal(mu=est sigma=est);
output out=stats kurtosis=kurtosis skewness=skewness N=ntot;
run;

/* steps to compute the Jarque-Bera test */
/* and print the test results to the OUTPUT window */
data computation;
set stats;
label pv_kur = "P-value for kurtosis test";
*skewness statistic;
skew_test = skewness/sqrt(6/Ntot);
*kurtosis statistic;
kurt_test = kurtosis/sqrt(24/Ntot);
*Jarque-Bera statistic;
jb = skew_test*skew_test + kurt_test*kurt_test;
*JB test p-value;
pv_jb = 1-cdf('CHISQUARE', jb, 2);
label pv_jb = "P-value for Jarque-Bera test"
      jb = "Jarque-Bera statistic";
run;

/* Print out the results of the tests */
title "Results of the Jarque-Bera test on normality";
proc print data=computation label;
var skewness kurtosis jb pv_jb;
run;

*plot timeplots for all transformed variables;
symbol1 i=join v=dot;
proc gplot data=myd;
title1 "Time plot of Costco EPS";
plot eps * date = 1;
title1 "Time plot of Costco log(EPS)";
plot xlog * date = 1;
run;

proc arima data=myd;
identify var=eps nlag=24 stationarity=(adf=(1 3 5));
identify var=xlog(1) nlag=24;
identify var=xlog(1,4) nlag=20 stationarity=(adf=(1 3 5)) minic;
identify var=xlog(1,4) nlag=20 esacf;
run;

*airline model SARIMA(0,1,1)(0,1,1)[4];
estimate q=(1)(4) noconstant method=ml;
*modified model SARIMA(0,1,5)(0,1,1)[4] with MA lags 1 and 5;
estimate q=(1 5)(4) noconstant method=ml;
outlier maxnum=3 alpha=0.01;
run;

forecast out=b lead=12 id=date interval=quarter;
quit;

/*
The following statements retransform the forecast values to get
forecasts in the original scale.
*/
data c;
set b;
x = exp(xlog);
forecast = exp(forecast + std*std/2);
l95 = exp(l95);
u95 = exp(u95);
run;

proc print data=c;
run;

Problem 3 [12 points] (review Week 5 notes)

Global temperatures are rising. Analyze the time series of the Land-Ocean Temperature Index in degrees Celsius published by NASA-GISS (Goddard Institute for Space Studies) and saved in the dataset globtemp.csv. The temperature index measures temperature anomalies computed relative to the base period 1951-1980. The reason to work with anomalies rather than absolute temperatures is that absolute temperature varies markedly over short distances, while monthly or annual temperature anomalies are representative of a much larger region (from the GISS website).
a) Plot the temperature index time series and its ACFs (20 lags). Analyze trends and patterns

shown by the data.

The time plot shows that global temperatures have steadily increased since 1950. The ACF plot shows that the Global Temperature Index series is non-stationary (the ACF decays very slowly to zero, which is not surprising given the upward time trend shown by the time plot).

b) Analyze if the series is stationary using both the ACF function and the Dickey Fuller test
The DF tests computed for AR(3) and AR(5) models show that the null hypothesis of unit-root non-stationarity cannot be rejected (p-values are > 0.1). (NOTE: for this dataset, it would be acceptable to use the DF test either for a model with intercept (constant) but no time trend, or for a model with intercept (constant) and a time trend. Results are equivalent.)

> # Dickey-Fuller test for time series with time trend
> library(fUnitRoots)
> adfTest(index, lags=3, type=c("ct"))

Title:

Augmented Dickey-Fuller Test

Test Results:

PARAMETER:

Lag Order: 3

STATISTIC:

Dickey-Fuller: -2.4227

P VALUE:

0.4007

> adfTest(index, lags=5, type=c("ct"))

Title:
Augmented Dickey-Fuller Test

Test Results:
PARAMETER:

Lag Order: 5

STATISTIC:

Dickey-Fuller: -1.9472

P VALUE:

0.5984

In SAS

Augmented Dickey-Fuller Unit Root Tests

Type Lags Rho Pr < Rho Tau Pr < Tau F Pr > F

Zero Mean 1 -3.1286 0.2229 -0.97 0.2967

3 0.3095 0.7561 0.16 0.7308

5 0.6287 0.8363 0.36 0.7864

Single Mean 1 -2.9490 0.6593 -0.91 0.7827 0.74 0.8811

3 0.5894 0.9790 0.31 0.9780 1.13 0.7831

5 1.0265 0.9887 0.63 0.9900 1.84 0.6012

Trend 1 -26.5191 0.0132 -3.75 0.0225 7.46 0.0207

3 -11.2587 0.3399 -2.42 0.3659 4.15 0.3489

5 -8.1422 0.5607 -1.95 0.6245 3.26 0.5268

c) Use order selection methods such as EACF or BIC to identify the order of the "best" AR/MA/ARMA model for the global temperature index time series. Note that since the TS is not stationary, you should use an ARIMA(p,1,q) model.

Since the TS is unit-root non-stationary, only ARIMA(p,1,q) models will be evaluated.

In R: the EACF method suggests an ARIMA(1,1,2). The auto.arima() function selects the ARIMA(0,1,1) model, which exhibits the lowest BIC value.

In SAS: the EACF method suggests an ARIMA(0,1,1) model and the BIC criterion selects the ARIMA(3,1,0).

> #EACF method
> library(TSA)
> #use numeric object not ts object. coredata() function retrieves numeric vector
> d1_ts=diff(indexts)
> eacf(coredata(d1_ts))

AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x o o x o o o o x o o o x o
1 x o o o o o o o o o o o o o

2 x o x o o o o o o o o o x o
3 o x x o o o o o o o o o o o
4 o x x x o o o o o o o o o o
5 x o x x x o o o o o o o o o
6 x o o o x o o o o o o o o o
7 x o o o x x o o o o o o o o

> # BIC criterion
> # use numeric object not ts object
> auto.arima(index, max.p = 5, max.q = 5, stationary = F, ic = c("bic"),
    trace = TRUE, allowdrift = F)

ARIMA(2,1,2) : 982.1547
ARIMA(0,1,0) : 990.837
ARIMA(1,1,0) : 985.1276
ARIMA(0,1,1) : 972.8737
ARIMA(1,1,1) : 976.1241
ARIMA(0,1,2) : 973.7256
ARIMA(1,1,2) : 979.3824

Best model: ARIMA(0,1,1)

Series: index
ARIMA(0,1,1)

Coefficients:
ma1
-0.5398
s.e. 0.0838

sigma^2 estimated as 89.26: log likelihood=-483.91
AIC=971.82 AICc=971.92 BIC=977.59

In SAS
ESACF Probability Values

Lags MA 0 MA 1 MA 2 MA 3 MA 4 MA 5

AR 0 0.0011 0.0635 0.4021 0.0552 0.1309 0.8882

AR 1 <.0001 0.7810 0.3314 0.3416 0.1542 0.5998

AR 2 <.0001 0.6174 0.0018 0.7358 0.1896 0.8318

AR 3 0.4231 0.0002 0.0001 0.1756 0.6531 0.7922

AR 4 0.2412 <.0001 0.0018 0.0597 0.3335 0.6631

AR 5 <.0001 0.4323 0.0527 0.0004 0.0120 0.9065

The ARIMA Procedure

ARMA(p+d,q) Tentative Order Selection Tests
---ESACF---
p+d  q
0    1
3    3
(5% Significance Level)

Name of Variable = temp

Minimum Information Criterion

Lags MA 0 MA 1 MA 2 MA 3 MA 4 MA 5

AR 0 4.588787 4.487776 4.468956 4.495152 4.505573 4.533958

AR 1 4.541124 4.523001 4.480694 4.48462 4.503237 4.540091

AR 2 4.499424 4.522147 4.516266 4.503816 4.537365 4.574281

AR 3 4.441992 4.477332 4.494755 4.527052 4.558184 4.586585

AR 4 4.477766 4.514051 4.522759 4.556779 4.577564 4.612352

AR 5 4.484678 4.521373 4.55314 4.590039 4.610067 4.637321

Error series model: AR(10)

Minimum Table Value: BIC(3,0) = 4.441992

d) Fit the selected ARIMA model, and write down the result.
The ARIMA(1,1,2) model selected by the EACF method in R has non-significant parameters.

The ARIMA(3,1,0) selected by BIC method in SAS is a possible model but has higher BIC

value (details of analysis are omitted). The best model is the ARIMA(0,1,1) selected by the BIC

criterion in R and EACF in SAS.

The fitted ARIMA(0,1,1) model for the global temperature index Xt can be written as:

(1-B)Xt = (1-0.5398B)at

> m2= Arima(index, order=c(0,1,1), method='ML')
> m2

Series: index
ARIMA(0,1,1)

Coefficients:
ma1
-0.5398
s.e. 0.0838

sigma^2 estimated as 89.26: log likelihood=-483.91
AIC=971.82 AICc=971.92 BIC=977.59
> arimaPrint(m2)

[1] “T-test table for model coefficients”
estimates std error
ma1 “-0.5397611” “0.08376795”
p-values
ma1 “1.167279e-10”

e) Use residual analysis to check the model.
The Ljung-Box test on the residuals shows that the hypothesis of independence cannot be rejected, so we can assume that the residuals are white noise. The normal quantile plot shows that the distribution of the residuals is close to the normal distribution.

> Box.test(m2$residuals, lag=6, type='Ljung-Box', fitdf=1)

Box-Ljung test

data: m2$residuals
X-squared = 9.5895, df = 5, p-value = 0.08774

> Box.test(m2$residuals, lag=12, type='Ljung-Box', fitdf=1)

Box-Ljung test

data: m2$residuals
X-squared = 16.3747, df = 11, p-value = 0.1278

f) Compute 1-step to 4-step ahead forecasts of the fitted model with origin at the end of the data,
i.e. 2012. Write down the forecasts and their standard errors.

> f=forecast.Arima(m2, h=4)
> plot(f, include=50)

> f
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
134 58.1793 46.07184 70.28676 39.66253 76.69606
135 58.1793 44.85108 71.50751 37.79555 78.56305
136 58.1793 43.73311 72.62548 36.08577 80.27283
137 58.1793 42.69566 73.66293 34.49912 81.85947

g) EXTRA CREDIT [2pts]: apply the backtesting procedure to compare the following two models: ARIMA(0,1,1) and ARIMA(0,1,2). Use 113 cases (85% of the data) for training and 20 cases (15%) for testing.

The procedure shows that the RMSE value of the ARIMA(0,1,1) is 9.680, while the RMSE value for the ARIMA(0,1,2) is 9.688. The two values are very close, showing that the two models are equivalent in terms of predictive power. However, the ARIMA(0,1,1) is slightly better.

> backtest(m2,index,113,1)
[1] "RMSE of out-of-sample forecasts"
[1] 9.680175
[1] "Mean absolute error of out-of-sample forecasts"
[1] 7.925004
> m3=arima(index, order=c(0,1,2))
> backtest(m3,index,113,1)

[1] “RMSE of out-of-sample forecasts”
[1] 9.688697
[1] “Mean absolute error of out-of-sample forecasts”
[1] 7.932232
> m3
Series: x
ARIMA(0,1,2)

Coefficients:
ma1 ma2
-0.4287 -0.1591
s.e. 0.0843 0.0764

sigma^2 estimated as 86.63: log likelihood=-481.94
AIC=967.88 AICc=968.07 BIC=976.53
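The backtest() function used above comes from the course-provided backtest.R script. For readers without that file, here is a minimal rolling-origin, one-step-ahead backtest sketch that reproduces the same idea (refit on an expanding sample, score on the held-out tail); it is an illustration under those assumptions, not the course implementation.

# Sketch: rolling one-step backtest for an ARIMA model; "orig" is the last
# training observation (113 here, i.e. 85% of the data)
simple_backtest = function(x, order, orig) {
  n = length(x)
  err = numeric(n - orig)
  for (i in orig:(n - 1)) {
    fit = arima(x[1:i], order = order, method = "ML")
    err[i - orig + 1] = x[i + 1] - predict(fit, n.ahead = 1)$pred[1]
  }
  c(RMSE = sqrt(mean(err^2)), MAE = mean(abs(err)))
}
simple_backtest(index, c(0,1,1), 113)
simple_backtest(index, c(0,1,2), 113)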

In SAS: the backtest macro needs to be modified to fit ARIMA(p,1,q) models by changing the IDENTIFY statement in the PROC ARIMA step of the BACKTEST macro as follows:

identify var=&VAR(1) noprint;

/*——— backtesting ————–*/

%backtest(trainpct=85, dataset=myd, var=temp, q=1, interval=year);

%backtest(trainpct=85, dataset=myd, var=temp, q=2, interval=year);

Backtest results for myd

Model: q=(1)

Obs _TYPE_ _FREQ_ mafe msfe rmsfe

1 0 20 7.71428 87.0438 9.32973

Backtest results for myd

Model: q=2

Obs _TYPE_ _FREQ_ mafe msfe rmsfe

1 0 20 7.60769 85.3367 9.23779

R code

# Analysis of the global temperature index

#1. LOAD LIBRARIES
library(tseries)
library(fBasics)
library(forecast)

# Create a time series
library(zoo)
data=read.table("globtemp.csv", header=T, sep=',')
index=data[,2]
indexts=zoo(data[,2], as.Date(as.character(data[,1]), format="%Y"))

# plot the ts and its ACF
plot(indexts, type='l', xlab='time', ylab='temperature index')

# Analysis of distribution
par(mfcol=c(1,1))
hist(index, xlab="Distribution of GT Index", prob=TRUE, main="Histogram")
# add approximating normal density curve
xfit<-seq(min(index), max(index), length=40)
yfit<-dnorm(xfit, mean=mean(index), sd=sd(index))
lines(xfit, yfit, col="blue", lwd=2)

# CREATE NORMAL PROBABILITY PLOT
qqnorm(index)
qqline(index, col=2)

acf(coredata(indexts), plot=T, lag=20)

# Dickey-Fuller test
library(fUnitRoots)
adfTest(index, lags=3, type=c("ct"))
adfTest(index, lags=5, type=c("ct"))
adfTest(index, lags=3, type=c("c"))
adfTest(index, lags=5, type=c("c"))

# Ljung-Box test
Box.test(index, lag=6, type='Ljung')
Box.test(index, lag=12, type='Ljung')

# EACF method
library(TSA)
# use a numeric object, not a ts object; coredata() retrieves the numeric vector
d1_ts=diff(indexts)
eacf(coredata(d1_ts))

# BIC criterion
# use a numeric object, not a ts object
auto.arima(index, max.p=5, max.q=5, stationary=F, ic=c("bic"),
           trace=TRUE, allowdrift=F)

# MODEL FITTING
# MODEL 1: EACF suggests ARIMA(1,1,2), i.e. p=1 and q=2
m1= arima(index, order=c(1,1,2), method='ML')
m1
source("arimaPrint.R")
arimaPrint(m1)

# RESIDUAL ANALYSIS (fitdf = number of estimated ARMA coefficients = 3)
Box.test(m1$residuals, lag=6, type='Ljung-Box', fitdf=3)
Box.test(m1$residuals, lag=12, type='Ljung-Box', fitdf=3)
acf(m1$residuals)
qqnorm(m1$residuals)
qqline(m1$residuals)
hist(m1$residuals)

# MODEL 2 SELECTED BY auto.arima
# FIT AN ARIMA(0,1,1) MODEL
m2= Arima(index, order=c(0,1,1), method='ML')
m2
arimaPrint(m2)

# RESIDUAL ANALYSIS
Box.test(m2$residuals, lag=6, type='Ljung-Box', fitdf=1)
Box.test(m2$residuals, lag=12, type='Ljung-Box', fitdf=1)
acf(m2$residuals)

# Analysis of the distribution of residuals
par(mfcol=c(1,1))
hist(m2$residuals, xlab="Distribution of residuals", prob=TRUE, main="Histogram")
# add approximating normal density curve
xfit<-seq(min(m2$residuals), max(m2$residuals), length=40)
yfit<-dnorm(xfit, mean=mean(m2$residuals), sd=sd(m2$residuals))
lines(xfit, yfit, col="blue", lwd=2)

# CREATE NORMAL PROBABILITY PLOT
qqnorm(m2$residuals)
qqline(m2$residuals, col=2)

# forecast and plot
pred=predict(m2, n.ahead=4, se.fit=T)
pred
f=forecast.Arima(m2, h=4)
f
plot(f, include=50)

# Backtesting
source("backtest.R")
backtest(m2, index, 113, 1)
m3=arima(index, order=c(0,1,2))
backtest(m3, index, 113, 1)

SAS code:
%macro backtest(TRAINPCT=80, DATASET=, VAR=, P=, Q=, D=, DATE=date,

INTERVAL=month);

* ————————————————————————-

;

* T I M E S E R I E S B A C K T E S T M A C R O

* ————————————————————————-
;

* DePaul CSC425, Winter 2013, Dr. Raffaella Settimi

* Macro written by Bill Qualls, First Analytics

* ————————————————————————-
;

* E X P L A N A T I O N O F P A R A M E T E R S

* (Order of variables is insignificant)

* ————————————————————————-
;

* TRAINPCT .. Percent of dataset to be used for training.

* So, (100 – TRAINPCT)% will be used for evaluation.

* Example: TRAINPCT=80

* DATASET .. Time series dataset. Libname optional, defaults to Work.

* Example: DATASET=Work.Unemp

* VAR .. Name of time series variable.

* Example: VAR=ratechg

* P .. Specified for AR models. Omit otherwise.

* Example: P=(1 3 6)

* Q .. Specified for MA models. Omit otherwise.

* Example: Q=(1 3 6)

* DATE .. Name of date variable. Defaults to date.

* Example: DATE=date

* INTERVAL .. Date interval. Defaults to month.

* Example: INTERVAL=day

* ————————————————————————-
;

* S A M P L E U S A G E

* %backtest(trainpct=80, dataset=work.unemp, var=ratedif, p=(1),

interval=day);

* ————————————————————————-
;

* How many records are in the dataset? ;

data _null_;

call symput('NRECS', trim(left(nrecs)));

set &DATASET nobs=nrecs;

stop;

run;

* Determine which ones are exclusively for training based on TRAINPCT ;

%let SIZE_OF_ROLLING_WINDOW = %sysfunc(round(&NRECS * &TRAINPCT / 100));

* create a working copy of dataset with observation number ;

* as a variable for use in a where clause with proc arima. ;

* also add a placeholder for the predicted value. ;

data Work._MY_COPY_ (keep = &VAR &DATE _OBS_ _PRED_);

set &DATASET;

_OBS_ = _N_;

_PRED_ = .;

run;

* turn off log — too lengthy ;

filename junk dummy;

proc printto log=junk print=junk;

run;

* Will build the model once for each record used in evaluation. ;

* Each time I will predict one record forward. ;

%let MODELS_TO_BE_BUILT = %sysevalf(&NRECS - &SIZE_OF_ROLLING_WINDOW);

%do i = 1 %to &MODELS_TO_BE_BUILT;

* Model using SIZE_OF_ROLLING_WINDOW records, and make one prediction ;

proc arima data=Work._MY_COPY_ plots=none;

where _OBS_ ge &i
and _OBS_ le (&i + &SIZE_OF_ROLLING_WINDOW - 1);

identify var=&VAR(1) noprint;

estimate

%if (&P ne ) %then %do; p=&P

%end;

%if (&Q ne ) %then %do; q=&Q %end;

method=ml noprint;

forecast lead=1 id=&DATE interval=&INTERVAL out=Work._MY_RESULTS_

noprint;

run;

* get the predicted value (in the last record) as a macro variable ;

data _null_;

p = nrecs;

set Work._MY_RESULTS_ point=p nobs=nrecs;

call symput("FORECAST", forecast);

stop;

run;

* move that prediction to its place in the output file ;

proc sql noprint;

update Work._MY_COPY_

set _PRED_ = &FORECAST

where _OBS_ = &i + &SIZE_OF_ROLLING_WINDOW;

quit;

run;

* show progress so far ;

%if (%sysfunc(mod(&i, 5)) = 0) %then %do;

* print on;

proc printto log=log print=print;

run;

%put Finished &i iterations;

* print off again;

proc printto log=junk print=junk;

run;

%end;

%end;

* turn print back on ;

proc printto log=log print=print;
run;

* calculate prediction error;

data Work._MY_COPY_;

set Work._MY_COPY_;

Predicted_Error = (&VAR - _PRED_);

Predicted_Error_Squared = (&VAR - _PRED_)**2;

absresidual = abs(Predicted_Error);

run;

* compute and report the mean square forecast error;

%if (&P eq ) %then %let PP = ; %else %let PP = p=&P;

%if (&Q eq ) %then %let QQ = ; %else %let QQ = q=&Q;

title "Backtest results for &DATASET";

title2 "Model: &PP &QQ";

proc summary data=Work._MY_COPY_;

where _OBS_ > &SIZE_OF_ROLLING_WINDOW;

var Predicted_Error absresidual;

output out=outm mean(absresidual)=mafe mean(Predicted_Error_Squared)=msfe;

run;

data outm;

set outm;

rmsfe=sqrt(msfe);

run;

proc print data=outm;

run;

/*

proc means data=Work._MY_COPY_ N MEAN STDDEV CLM T PROBT;

where _OBS_ > &SIZE_OF_ROLLING_WINDOW;

var Predicted_Error_Squared;

run;

title;

*/

%mend backtest;

/*————————————————-*/

proc import datafile="globtemp.csv" out=myd replace;
delimiter=',';
getnames=yes;
run;

proc print data=myd;

run;

* produced time plots;

goptions reset=global;

symbol interpol = join value=star;

proc gplot data=myd;

plot temp*year / haxis = 1880 to 2012 by 10;

run;
quit;

goptions reset = global;

proc univariate data=myd;

var temp;

histogram / normal;

qqplot / normal(mu=est sigma=est);
run;

* compute Dickey-Fuller unit-root tests;

proc arima data=myd;

identify var = temp stationarity = (adf=(1 3 5));

identify var = temp(1) stationarity = (adf=(1 3 5));

run;

proc arima data=myd;

identify var = temp esacf q=(0:5) p=(0:5) nlag =20;

identify var = temp minic q=(0:5) p=(0:5) nlag =20;

run; quit;

/* fit models from step above */

* fit ARIMA(p,1,q) models for unemployment rates;

* note that model intercept is set =0 using the noconstant option;

proc arima data=myd;

identify var = temp(1) noprint;

estimate p=0 q=1 method=ml plot noconstant;

estimate p=3 q=0 method=ml plot noconstant;

run;
quit;

proc arima data=myd;

identify var=temp(1) noprint;

estimate p=0 q=1 method=ml plot noconstant;

forecast back=0 lead=4 printall out=result id=year interval=day;

run;

quit;

/* Analysis of residuals*/

/* Residual plot: residual vs time*/

goptions reset=global;

symbol i=join;

proc gplot data=result;

plot residual*year / haxis = 1880 to 2012 by 10;

run;
quit;

/*analysis of distribution of residuals, to check

for normality */

goptions reset=global;

proc univariate data=result;

var residual;

histogram/normal (mu=est sigma=est);

qqplot/normal(mu=est sigma=est);

run;

goptions reset=global;

symbol1 interpol =join value=star color=red w=2;

symbol2 interpol = join color=black;

symbol3 interpol = join value=none color = blue;

symbol4 interpol = join value=none color = blue;

legend1 label=none

shape=symbol(4,2)

position=(top center inside)

mode=share;

proc gplot data=result;

plot forecast*year temp*year u95*year l95*year / overlay

haxis = 1880 to 2022 by 10 legend=legend1;

run;
/*——— backtesting ————–*/

%backtest(trainpct=85, dataset=myd, var=temp, q=(1), interval=year);

%backtest(trainpct=85, dataset=myd, var=temp, q= 2, interval=year);

S&P 500 analysis

Fitting an ARMA(0,0)-GARCH(1,1) model with Gaussian errors

> # Fitting an ARMA(0,0)-GARCH(1,1) model using the rugarch package
> # log returns show weak serial autocorrelations in the mean and an ARCH effect
> # Fitting a GARCH(1,1) model
> # Use ugarchspec() function to specify the model
> garch11.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)),
    mean.model=list(armaOrder=c(0,0)))
> # estimate model
> garch11.fit=ugarchfit(spec=garch11.spec, data=ret)
> garch11.fit

*———————————*

* GARCH Model Fit *

*———————————*

Conditional Variance Dynamics

———————————–

GARCH Model : sGARCH(1,1)

Mean Model : ARFIMA(0,0,0)

Distribution : norm

Optimal Parameters

————————————

Estimate Std. Error t value Pr(>|t|)

mu 0.006814 0.001798 3.7889 0.000151

omega 0.000090 0.000049 1.8302 0.067224

alpha1 0.118390 0.031384 3.7723 0.000162

beta1 0.844396 0.034387 24.5556 0.000000

The fitted ARMA(0,0)-GARCH(1,1) model with Gaussian errors can be written as

rt = 0.0068 + at,   at = σt et

σ2t = 0.00009 + 0.1184 a2t-1 + 0.8444 σ2t-1

Robust Standard Errors:

Estimate Std. Error t value Pr(>|t|)

mu 0.006814 0.001914 3.5592 0.000372

omega 0.000090 0.000070 1.2858 0.198502

alpha1 0.118390 0.041165 2.8760 0.004028

beta1 0.844396 0.040113 21.0503 0.000000

LogLikelihood : 828.5681

Information Criteria

————————————

Akaike -3.4286

Bayes -3.3938

Shibata -3.4287

Hannan-Quinn -3.4149

Q-Statistics on Standardized Residuals

————————————

statistic p-value

Lag[1] 0.5261 0.4682

Lag[p+q+1][1] 0.5261 0.4682

Lag[p+q+5][5] 5.9001 0.3161

d.o.f=0

H0 : No serial correlation

Q-Statistics on Standardized Squared Residuals

————————————
statistic p-value

Lag[1] 0.2657 0.6062

Lag[p+q+1][3] 0.8252 0.3637

Lag[p+q+5][7] 3.0889 0.6863

d.o.f=2

ARCH LM Tests

————————————

Statistic DoF P-Value

ARCH Lag[2] 0.5084 2 0.7755

ARCH Lag[5] 0.9231 5 0.9685

ARCH Lag[10] 6.3220 10 0.7875

Nyblom stability test

————————————

Joint Statistic: 1.3254

Individual Statistics:

mu 0.07929

omega 0.20176

alpha1 0.02459

beta1 0.09300

Asymptotic Critical Values (10% 5% 1%)

Joint Statistic: 1.07 1.24 1.6

Individual Statistic: 0.35 0.47 0.75

Sign Bias Test

————————————

t-value prob sig

Sign Bias 2.1382 0.0330120 **

Negative Sign Bias 0.5723 0.5673614

Positive Sign Bias 1.3403 0.1807809

Joint Effect 17.4223 0.0005786 ***

Adjusted Pearson Goodness-of-Fit Test:

————————————

group statistic p-value(g-1)

1 20 43.12 0.0012496

2 30 53.82 0.0033902

3 40 75.76 0.0003804

4 50 79.60 0.0037117

Note on the output above: the Ljung-Box test for serial correlation is computed on the standardized residuals, and the Ljung-Box test for an ARCH/GARCH effect is computed on the squared standardized residuals. The Adjusted Pearson statistic is a goodness-of-fit test for the distribution of the error term: the null hypothesis states that the assumed distribution for the error terms in the model is adequate, so a small p-value (< α) indicates that the null hypothesis can be rejected and the distribution assumption is not adequate.

> # create selection list of plots for the garch(1,1) fit
> plot(garch11.fit)
> # to display all subplots on one page
> plot(garch11.fit, which="all")

Residual analysis of the GARCH model shows that the model fits the data adequately. The Ljung-Box test (Q-statistic) on the residuals is not significant, showing that the hypothesis of no correlation in the residuals cannot be rejected. Similarly, the Ljung-Box test (Q-statistic) on the squared standardized residuals is not significant, suggesting that the residuals show no remaining ARCH/GARCH effect. The Adjusted Pearson goodness-of-fit test, however, is significant, indicating that the normal distribution assumed for the error term is not appropriate, as also shown by the QQ plot of the residuals.

Fitting an ARMA(0,0)-GARCH(1,1) model with t-distributed errors


> # use Student-t innovations
> # specify model using functions in the rugarch package
> # Fit ARMA(0,0)-GARCH(1,1) model with t-distribution
> garch11.t.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)),
    mean.model=list(armaOrder=c(0,0)), distribution.model = "std")
> # estimate model
> garch11.t.fit=ugarchfit(spec=garch11.t.spec, data=ret)
> garch11.t.fit

*———————————*
* GARCH Model Fit *
*———————————*
Conditional Variance Dynamics
———————————–
GARCH Model : sGARCH(1,1)
Mean Model : ARFIMA(0,0,0)

Distribution : std

Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)

mu 0.008012 0.001746 4.5892 0.000004

omega 0.000122 0.000068 1.7901 0.073444

alpha1 0.123683 0.038919 3.1780 0.001483

beta1 0.821471 0.050706 16.2008 0.000000

shape 6.891814 1.978494 3.4834 0.000495

The fitted ARMA(0,0)-GARCH(1,1) model with t-distributed errors can be written as

rt = 0.0080 + at,   at = σt et

σ2t = 0.00012 + 0.1237 a2t-1 + 0.8215 σ2t-1

Error term {et} has t-distribution with 7 degrees of freedom

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)

mu 0.008012 0.001817 4.4090 0.000010

omega 0.000122 0.000071 1.7216 0.085136

alpha1 0.123683 0.038112 3.2453 0.001173

beta1 0.821471 0.041476 19.8061 0.000000

shape 6.891814 1.976630 3.4866 0.000489

LogLikelihood : 839.2684

Information Criteria

————————————

Akaike -3.4689

Bayes -3.4255

Shibata -3.4691

Hannan-Quinn -3.4518

Q-Statistics on Standardized Residuals

————————————
statistic p-value

Lag[1] 0.4662 0.4947

Lag[p+q+1][1] 0.4662 0.4947

Lag[p+q+5][5] 5.6618 0.3405

d.o.f=0
H0 : No serial correlation

Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value

Lag[1] 0.2474 0.6189

Lag[p+q+1][3] 0.8569 0.3546

Lag[p+q+5][7] 2.9570 0.7066

d.o.f=2

ARCH LM Tests
————————————
Statistic DoF P-Value

ARCH Lag[2] 0.5959 2 0.7423

ARCH Lag[5] 0.9740 5 0.9646

ARCH Lag[10] 6.0871 10 0.8079

Nyblom stability test
————————————

Joint Statistic: 1.396

Individual Statistics:

mu 0.17184

omega 0.23451

alpha1 0.04303

beta1 0.10209

shape 0.07244

Asymptotic Critical Values (10% 5% 1%)

Joint Statistic: 1.28 1.47 1.88

Individual Statistic: 0.35 0.47 0.75

Sign Bias Test
————————————

t-value prob sig

Sign Bias 2.121 0.0343991 **

Negative Sign Bias 0.549 0.5832924

Positive Sign Bias 1.206 0.2284569

Joint Effect 16.352 0.0009603 ***

Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)

1 20 22.66 0.25269

2 30 44.47 0.03312

3 40 39.50 0.44759

4 50 43.64 0.68967

Residual analysis of the GARCH model with t-distributed error terms shows that the model fits the data adequately. The Ljung-Box test (Q-statistic) on the residuals is not significant, showing that the hypothesis of no correlation in the residuals cannot be rejected. Similarly, the Ljung-Box test (Q-statistic) on the squared standardized residuals is not significant, suggesting that the residuals show no remaining ARCH/GARCH effect. The Adjusted Pearson goodness-of-fit test is not significant, indicating that the distribution of the error terms can be described by a t-distribution with 7 degrees of freedom, as also shown by the QQ plot of the residuals (created using plot(garch11.t.fit, which=9)).
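Since this fit is adequate, volatility forecasts can be produced directly from it; a short sketch using rugarch (the 4-step horizon is my choice for illustration):

# Sketch: 4-step-ahead forecasts from the fitted t-GARCH model
garch11.t.fcst = ugarchforecast(garch11.t.fit, n.ahead = 4)
garch11.t.fcst
sigma(garch11.t.fcst)    # forecast conditional volatilities
fitted(garch11.t.fcst)   # forecast conditional means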

Fitting an ARMA(0,0)-EGARCH(1,1) model

The EGARCH model fitted by the rugarch package has a slightly different form than the model in the textbook.
Here is the general expression of the EGARCH(1,1) model:

rt = μt + at,   at = σt et

ln(σ2t) = ω + α1 et-1 + γ1(|et-1| - E(|et-1|)) + β1 ln(σ2t-1)
where µt can follow an ARMA process, but most typically it will be constant. Gamma1 (γ1) is the leverage
parameter. So if gamma1 in output is significant, then we can conclude that the volatility process has an
asymmetric behavior.

Fitting an ARMA(0,0)-EGARCH(1,1) model with Gaussian distribution (similar to SAS example)

> #Fit ARMA(0,0)-eGARCH(1,1) model with Gaussian distribution
> egarch11.spec=ugarchspec(variance.model=list(model = "eGARCH",
    garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)))
> # estimate model
> egarch11.fit=ugarchfit(spec=egarch11.spec, data=ret)
> egarch11.fit

*———————————*
* GARCH Model Fit *
*———————————*

Conditional Variance Dynamics
———————————–

GARCH Model : eGARCH(1,1)

Mean Model : ARFIMA(0,0,0)
Distribution : norm

Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)

mu 0.005937 0.001860 3.1915 0.001415

omega -0.610497 0.263901 -2.3134 0.020703

alpha1 -0.111069 0.042614 -2.6064 0.009150

beta1 0.902462 0.041695 21.6442 0.000000

gamma1 0.209405 0.050542 4.1432 0.000034

Using the R output above, the ARMA(0,0)-EGARCH(1,1) model can be written as follows.

Fitted model:

rt = 0.0059 + at,   at = σt et

ln(σ2t) = -0.610 - 0.111 et-1 + 0.209(|et-1| - E(|et-1|)) + 0.9024 ln(σ2t-1)

Note that since et has a Gaussian distribution, E(|et|) = sqrt(2/pi) = 0.7979, or approximately 0.80 (see page 143 in the textbook). Thus we can rewrite the expression above as

ln(σ2t) = -0.610 + 0.9024 ln(σ2t-1) + (-0.111 + 0.209) et-1 - 0.209(0.7979)   if et-1 >= 0
ln(σ2t) = -0.610 + 0.9024 ln(σ2t-1) + (-0.111 - 0.209) et-1 - 0.209(0.7979)   if et-1 < 0

And after some algebra, we can write:

ln(σ2t) = -0.7767 + 0.9024 ln(σ2t-1) + 0.098 et-1   if et-1 >= 0
ln(σ2t) = -0.7767 + 0.9024 ln(σ2t-1) - 0.320 et-1   if et-1 < 0

Taking the antilog transformation we have

σ2t = σ2t-1^0.9024 exp(-0.7767) exp(0.098 et-1)    if et-1 >= 0
σ2t = σ2t-1^0.9024 exp(-0.7767) exp(-0.320 et-1)   if et-1 < 0

For a standardized shock with magnitude 2 (i.e., two standard deviations), we have

σ2t(et-1 = -2) / σ2t(et-1 = +2) = exp(-0.320 x (-2)) / exp(0.098 x 2) = exp(0.444) = 1.56

Therefore, the impact of a negative shock of size two standard deviations is about 56% higher than the impact of a positive shock of the same size.
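A two-line numeric check of this ratio in R, with the parameter values copied from the output above:

# Sketch: asymmetry ratio for a two-standard-deviation shock
alpha1 = -0.111; gamma1 = 0.209
exp((alpha1 - gamma1)*(-2)) / exp((alpha1 + gamma1)*2)   # approx 1.56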

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)

mu 0.005937 0.001958 3.0328 0.002423

omega -0.610497 0.364615 -1.6744 0.094060

alpha1 -0.111069 0.066343 -1.6742 0.094096

beta1 0.902462 0.057336 15.7400 0.000000

gamma1 0.209405 0.048127 4.3511 0.000014

LogLikelihood : 835.9936

Information Criteria
————————————

Akaike -3.4553

Bayes -3.4119

Shibata -3.4555

Hannan-Quinn -3.4382

Q-Statistics on Standardized Residuals
————————————
statistic p-value

Lag[1] 0.3351 0.5627

Lag[p+q+1][1] 0.3351 0.5627

Lag[p+q+5][5] 5.1570 0.3970

d.o.f=0
H0 : No serial correlation

Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value

Lag[1] 0.7262 0.3941

Lag[p+q+1][3] 1.0203 0.3124

Lag[p+q+5][7] 2.8738 0.7194

d.o.f=2

ARCH LM Tests
————————————
Statistic DoF P-Value

ARCH Lag[2] 1.018 2 0.6010

ARCH Lag[5] 1.165 5 0.9482

ARCH Lag[10] 6.026 10 0.8131

Nyblom stability test
————————————

Joint Statistic: 1.5861

Individual Statistics:

mu 0.1829

omega 0.1224

alpha1 0.1527

beta1 0.1301

gamma1 1.0125

Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.28 1.47 1.88
Individual Statistic: 0.35 0.47 0.75

Sign Bias Test
————————————

t-value prob sig

Sign Bias 1.973 0.04902 **

Negative Sign Bias 1.165 0.24478

Positive Sign Bias 1.041 0.29840

Joint Effect 11.259 0.01041 **

Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)

1 20 22.49 0.2604

2 30 35.86 0.1777

3 40 50.31 0.1060

4 50 55.07 0.2558

Residual analysis of the EGARCH model shows that the model fits the data adequately: the residuals are white noise and show no ARCH effect. The goodness-of-fit test supports the choice of a Gaussian distribution for the error term, although the qqplot shows that the error distribution has a thicker left tail than the normal distribution. We will fit the t-distribution to check if that is a better fit for the behavior of extreme values.

Fitting an ARMA(0,0)-EGARCH(1,1) model with t-distribution

> #Fit ARMA(0,0)-eGARCH(1,1) model with t-distribution
> egarch11.t.spec=ugarchspec(variance.model=list(model = "eGARCH",
    garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)),
    distribution.model = "std")
> # estimate model
> egarch11.t.fit=ugarchfit(spec=egarch11.t.spec, data=ret)
> egarch11.t.fit

*———————————*
* GARCH Model Fit *
*———————————*

Conditional Variance Dynamics
———————————–
GARCH Model : eGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : std

Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)

mu 0.007093 0.001757 4.0369 0.000054

omega -0.677075 0.246000 -2.7523 0.005917

alpha1 -0.145367 0.045752 -3.1773 0.001487

beta1 0.893975 0.038715 23.0912 0.000000

gamma1 0.202717 0.055934 3.6242 0.000290

shape 7.926664 2.405188 3.2957 0.000982

Using the R output above, the ARMA(0,0)-EGARCH(1,1) model can be written as follows.

Fitted model:

rt = 0.007 + at,   at = σt et

ln(σ2t) = -0.677 - 0.145 et-1 + 0.203(|et-1| - E(|et-1|)) + 0.894 ln(σ2t-1)

with a t-distribution with 8 degrees of freedom (nearest integer to the shape value).

Note that since et has a t-distribution, E(|et|) = 2 sqrt(ν-2) Γ[(ν+1)/2] / ((ν-1) Γ(ν/2) sqrt(π)) (see page 143 in the book), where ν = d.f. of the t-distribution (denoted by shape in the R output).

Note that since gamma1 is significant, the volatility has an asymmetric behavior. Therefore a negative shock has a
stronger impact on the volatility compared to a positive shock of the same size.
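For reference, a one-line numeric evaluation of this expectation in R for ν = 8 (the rounded shape estimate above):

# Sketch: E|e_t| for a standardized t-distribution with nu = 8 d.f.
nu = 8
2*sqrt(nu-2)*gamma((nu+1)/2) / ((nu-1)*gamma(nu/2)*sqrt(pi))   # approx 0.77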

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)

mu 0.007093 0.001888 3.7574 0.000172

omega -0.677075 0.212770 -3.1822 0.001462

alpha1 -0.145367 0.042936 -3.3857 0.000710

beta1 0.893975 0.034112 26.2071 0.000000

gamma1 0.202717 0.045692 4.4366 0.000009

shape 7.926664 2.720128 2.9141 0.003567

LogLikelihood : 846.92

Information Criteria

————————————

Akaike -3.4965

Bayes -3.4445

Shibata -3.4969

Hannan-Quinn -3.4761

Q-Statistics on Standardized Residuals
————————————
statistic p-value

Lag[1] 0.1872 0.6653

Lag[p+q+1][1] 0.1872 0.6653

Lag[p+q+5][5] 4.7776 0.4436

d.o.f=0
H0 : No serial correlation

Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value

Lag[1] 0.6769 0.4107

Lag[p+q+1][3] 0.9526 0.3291

Lag[p+q+5][7] 2.5747 0.7652

d.o.f=2

ARCH LM Tests
————————————
Statistic DoF P-Value

ARCH Lag[2] 0.9725 2 0.6149

ARCH Lag[5] 1.0934 5 0.9547

ARCH Lag[10] 5.4934 10 0.8559

Nyblom stability test
————————————

Joint Statistic: 1.6436

Individual Statistics:

mu 0.20752

omega 0.12081

alpha1 0.15261

beta1 0.13304

gamma1 0.93741

shape 0.08736

Asymptotic Critical Values (10% 5% 1%)

Joint Statistic: 1.49 1.68 2.12

Individual Statistic: 0.35 0.47 0.75

Sign Bias Test
————————————

t-value prob sig

Sign Bias 1.7639 0.07840 *

Negative Sign Bias 1.2413 0.21510

Positive Sign Bias 0.9747 0.33022

Joint Effect 8.8291 0.03165 **

Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)

1 20 20.66 0.35571

2 30 39.23 0.09736

3 40 54.47 0.05098

4 50 66.51 0.04860

Residual analysis of the EGARCH model shows that the model fits the data adequately – the residuals are white noise and show no ARCH effect. However, the goodness-of-fit test suggests that the t-distribution is not a good choice for the error terms (some test p-values fall at or below 0.05, rejecting the null hypothesis that the error term follows a t-distribution). The qq-plot shows that the error distribution has thicker tails than the t-distribution. (Further analysis shows that the ged distribution does a better job of representing the distribution of extreme values.)

Fitting an ARMA(0,0)-GJRGARCH(1,1) model or TGARCH model

The GJRGARCH(1,1) model fitted by the rugarch package has the following expression

rt = µt + at,  at = σt·et

σt² = ω + (α1 + γ1·Nt-1)·at-1² + β1·σt-1²
where µt can follow an ARMA process, but most typically it will be constant. Nt-1 is the indicator variable such that Nt-1 = 1 when at-1 (the shock at time t-1) is negative, and Nt-1 = 0 otherwise. Gamma1 (γ1) is the leverage parameter, so if gamma1 in the output is significant, we can conclude that the volatility process has an asymmetric behavior.
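To make the recursion concrete, here is an illustrative sketch (the coefficient arguments are placeholders, not fitted values):

# one step of the GJR-GARCH(1,1) variance recursion described above
gjr.sigma2 <- function(a.prev, sigma2.prev, omega, alpha1, gamma1, beta1) {
  N.prev <- as.numeric(a.prev < 0)   # indicator of a negative shock
  omega + (alpha1 + gamma1 * N.prev) * a.prev^2 + beta1 * sigma2.prev
}
# a negative shock of the same size raises sigma^2 by an extra gamma1 * a.prev^2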

> #Fit ARMA(0,0)-TGARCH(1,1) model with t-distribution
> gjrgarch11.t.spec=ugarchspec(variance.model=list(model = "gjrGARCH",
    garchOrder=c(1,1)),
    mean.model=list(armaOrder=c(0,0)), distribution.model = "std")
> #estimate model
> gjrgarch11.t.fit=ugarchfit(spec=gjrgarch11.t.spec, data=ret)
Warning message:
In .makefitmodel(garchmodel = "gjrGARCH", f = .gjrgarchLLH, T = T, :
  NaNs produced
> gjrgarch11.t.fit

*———————————*
* GARCH Model Fit *
*———————————*

Conditional Variance Dynamics
———————————–

GARCH Model : gjrGARCH(1,1)

Mean Model : ARFIMA(0,0,0)
Distribution : std

Optimal Parameters
————————————

Estimate Std. Error t value Pr(>|t|)

mu 0.007202 0.001766 4.077739 0.000045

omega 0.000212 0.000094 2.262826 0.023646

alpha1 0.000001 0.001394 0.000791 0.999369

beta1 0.781111 0.065122 11.994488 0.000000

gamma1 0.215386 0.069611 3.094141 0.001974

shape 7.179979 2.031526 3.534279 0.000409

Using the R output above, the ARMA(0,0)-TGARCH(1,1) model can be written as follows. Note that the arch(1) coefficient is effectively zero (alpha1 ≈ 0.000001, p ≈ 0.999) and can be removed from the model.

rt = 0.0072 + at,  at = σt·et

σt² = 0.00021 + (0.0000 + 0.215·Nt-1)·at-1² + 0.781·σt-1²

where the error term is assumed to have a t-distribution with 7 degrees of freedom.

or, dropping the zero arch(1) term,

rt = 0.0072 + at,  at = σt·et

σt² = 0.00021 + 0.215·Nt-1·at-1² + 0.781·σt-1²

where Nt-1 is an indicator variable for negative innovations:

Nt-1 = 1 if at-1 < 0,  Nt-1 = 0 otherwise

For a standardized shock with magnitude 2 (i.e., two standard deviations), we have

σt²(et-1 = -2) / σt²(et-1 = +2)
  = [0.00021 + (0.0000 + 0.215)·(4 × 0.002) + 0.781 × 0.002] / [0.00021 + 0.0000·(4 × 0.002) + 0.781 × 0.002]
  ≈ 1.966

where the value of at-1² is computed as at-1² = σt-1²·et-1². Since σt-1² in the expression above is unknown, we set σt-1² equal to the sample variance of the S&P500 index returns (that is, σt-1² = 0.002).

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)

mu 0.007202 0.001921 3.749767 0.000177

omega 0.000212 0.000112 1.902791 0.057068

alpha1 0.000001 0.000054 0.020238 0.983853

beta1 0.781111 0.063980 12.208691 0.000000

gamma1 0.215386 0.056979 3.780097 0.000157

shape 7.179979 2.274951 3.156102 0.001599

LogLikelihood : 844.6336

Information Criteria
————————————

Akaike -3.4870

Bayes -3.4350

Shibata -3.4873

Hannan-Quinn -3.4666

Q-Statistics on Standardized Residuals
————————————
statistic p-value

Lag[1] 0.1348 0.7135

Lag[p+q+1][1] 0.1348 0.7135

Lag[p+q+5][5] 4.7791 0.4434

d.o.f=0
H0 : No serial correlation

Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value

Lag[1] 0.7412 0.3893

Lag[p+q+1][3] 1.0145 0.3138

Lag[p+q+5][7] 2.5887 0.7631

d.o.f=2

ARCH LM Tests
————————————
Statistic DoF P-Value

ARCH Lag[2] 1.015 2 0.6020

ARCH Lag[5] 1.239 5 0.9411

ARCH Lag[10] 5.120 10 0.8830

Nyblom stability test
————————————

Joint Statistic: 2.2942

Individual Statistics:

mu 0.18155

omega 0.27666

alpha1 0.05641

beta1 0.20413

gamma1 0.03198

shape 0.06486

Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.49 1.68 2.12
Individual Statistic: 0.35 0.47 0.75

Sign Bias Test
————————————
t-value prob sig

Sign Bias 2.117 0.03474 **

Negative Sign Bias 1.408 0.15987

Positive Sign Bias 0.766 0.44408

Joint Effect 10.173 0.01715 **

Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)

1 20 19.83 0.4048

2 30 29.37 0.4457

3 40 37.17 0.5535

4 50 52.78 0.3300

Elapsed time : 0.458046

> plot(gjrgarch11.t.fit, which="all")

Residual analysis of TGARCH model shows that the model fits the data adequately – residuals are white noise and
show no ARCH effect. The goodness of fit test also shows that the t-distribution is adequate.

Apply information criteria for model selection

> # MODEL COMPARISON
> # compare information criteria
> model.list = list(garch11 = garch11.fit, garch11.t = garch11.t.fit,
+   egarch11 = egarch11.t.fit,
+   gjrgarch11 = gjrgarch11.t.fit)
> info.mat = sapply(model.list, infocriteria)
> rownames(info.mat) = rownames(infocriteria(garch11.fit))
> info.mat

garch11 garch11.t egarch11 gjrgarch11

Akaike -3.428558 -3.468892 -3.480644 -3.487042

Bayes -3.393831 -3.425483 -3.428554 -3.434952

Shibata -3.428694 -3.469105 -3.480950 -3.487348

Hannan-Quinn -3.414909 -3.451830 -3.460170 -3.466568

The best model according to the model selection criteria is the GJR-GARCH(1,1) model with t-distribution.

R CODE:
# Analysis of daily S&P500 index

#

library(fBasics)

library(tseries)

library(rugarch)

# import data in R
myd = read.table('sp500_feb1970_Feb2010.txt', header=T)

# create time series object
rts = ts(myd$return, start = c(1970, 1), frequency=12)

# create a simple numeric object

ret =myd$return;

# CREATE TIME PLOT

plot(rts)

# Plots ACF function of vector data

acf(ret)

# Plot ACF of squared data to check for non-linear dependence

acf(ret^2)

# Computes Ljung-Box test on squared returns to test non-linear independence at lags 6 and 12
Box.test(ret^2,lag=6,type='Ljung')
Box.test(ret^2,lag=12,type='Ljung')

# Computes Ljung-Box test on absolute returns to test non-linear independence at lags 6 and 12
Box.test(abs(ret),lag=6,type='Ljung')
Box.test(abs(ret),lag=12,type='Ljung')

# FITTING AN GARCH(1,1) MODEL WITH GAUSSIAN DISTRIBUTION

# Use ugarchspec() function to specify model

garch11.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)),

mean.model=list(armaOrder=c(0,0)))
#estimate model
garch11.fit=ugarchfit(spec=garch11.spec, data=ret)
garch11.fit

#persistence = alpha1+beta1

persistence(garch11.fit)

#half-life: ln(0.5)/ln(alpha1+beta1)

halflife(garch11.fit)

#create selection list of plots for garch(1,1) fit
plot(garch11.fit)
#to display all subplots on one page
plot(garch11.fit, which="all")

#FIT ARMA(0,0)-GARCH(1,1) MODEL WITH T-DISTRIBUTION

# specify model using functions in rugarch package

garch11.t.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)),
  mean.model=list(armaOrder=c(0,0)), distribution.model = "std")
#estimate model
garch11.t.fit=ugarchfit(spec=garch11.t.spec, data=ret)
garch11.t.fit

plot(garch11.t.fit)

#FIT ARMA(0,0)-EGARCH(1,1) MODEL WITH GAUSSIAN DISTRIBUTION

egarch11.spec=ugarchspec(variance.model=list(model = "eGARCH", garchOrder=c(1,1)),
  mean.model=list(armaOrder=c(0,0)))
#estimate model
egarch11.fit=ugarchfit(spec=egarch11.spec, data=ret)
egarch11.fit

plot(egarch11.fit, which="all")

#FIT ARMA(0,0)-EGARCH(1,1) MODEL WITH GED DISTRIBUTION
# (the t-distribution fit reported above used distribution.model = "std")

egarch11.t.spec=ugarchspec(variance.model=list(model = "eGARCH", garchOrder=c(1,1)),
  mean.model=list(armaOrder=c(0,0)), distribution.model = "ged")

#estimate model
egarch11.t.fit=ugarchfit(spec=egarch11.t.spec, data=ret)
egarch11.t.fit

plot(egarch11.t.fit, which="all")

# compute expected value E(|e|)

shape=coef(egarch11.t.fit)[6]

exp.abse=(2*sqrt(shape-2)*gamma((shape+1)/2))/((shape-1)*gamma(shape/2)*sqrt(pi))

#FIT ARMA(0,0)-TGARCH(1,1) MODEL WITH T-DISTRIBUTION

gjrgarch11.t.spec=ugarchspec(variance.model=list(model = "gjrGARCH",
  garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)), distribution.model = "std")

#estimate model
gjrgarch11.t.fit=ugarchfit(spec=gjrgarch11.t.spec, data=ret)
gjrgarch11.t.fit
plot(gjrgarch11.t.fit, which="all")

#FIT ARMA(0,0)-IGARCH(1,1) MODEL WITH T-DISTRIBUTION

igarch11.t.spec=ugarchspec(variance.model=list(model = "iGARCH", garchOrder=c(1,1)),
  mean.model=list(armaOrder=c(0,0)), distribution.model = "std")

#estimate model
igarch11.t.fit=ugarchfit(spec=igarch11.t.spec, data=ret)
igarch11.t.fit

plot(igarch11.t.fit, which="all")

# MODEL COMPARISON
# compare information criteria
model.list = list(garch11 = garch11.fit, garch11.t = garch11.t.fit,

egarch11 = egarch11.t.fit,

gjrgarch11 = gjrgarch11.t.fit)

info.mat = sapply(model.list, infocriteria)
rownames(info.mat) = rownames(infocriteria(garch11.fit))
info.mat

# RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST

# EVALUATION STATISTICS

garch11.fit = ugarchfit(spec=garch11.spec, data=ret, out.sample=100)

garch11.t.fit = ugarchfit(spec=garch11.t.spec, data=ret, out.sample=100)

egarch11.t.fit = ugarchfit(egarch11.t.spec, data=ret, out.sample=100)

tgarch11.t.fit = ugarchfit(spec=gjrgarch11.t.spec, data=ret, out.sample=100)

# COMPUTE 100 1-STEP AHEAD ROLLING FORECASTS W/O RE-ESTIMATING

garch11.fcst = ugarchforecast(garch11.fit, n.roll=100, n.ahead=1)

garch11.t.fcst = ugarchforecast(garch11.t.fit, n.roll=100, n.ahead=1)

egarch11.t.fcst = ugarchforecast(egarch11.t.fit, n.roll=100, n.ahead=1)

tgarch11.t.fcst = ugarchforecast(tgarch11.t.fit, n.roll=100, n.ahead=1)

# COMPUTE FORECAST EVALUATION STATISTICS USING FPM() FUNCTION

fcst.list = list(garch11=garch11.fcst, garch11.t=garch11.t.fcst,

egarch11.t=egarch11.t.fcst,

tgarch11.t=tgarch11.t.fcst)

fpm.mat = sapply(fcst.list, fpm)
fpm.mat
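For reference, the quantities fpm() reports can also be computed by hand; a rough sketch for one model, assuming garch11.fcst and ret as defined above (the alignment of forecasts with realized returns is illustrative):

# forecast evaluation statistics, hand-rolled
f <- as.numeric(fitted(garch11.fcst))     # 1-step-ahead mean forecasts
actual <- tail(ret, length(f))            # realized returns they target (illustrative)
mse <- mean((actual - f)^2)
mae <- mean(abs(actual - f))
dac <- mean(sign(actual) == sign(f))      # directional accuracy
c(MSE = mse, MAE = mae, DAC = dac)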

CSC425 – Time series analysis and forecasting

Homework 5 – not to be submitted

The goal of this assignment is to provide students with some practical training on GARCH/EGARCH/TGARCH models for the analysis of the volatility of a stock return.

Solutions will be posted on Thursday November 7th, 2013

Reading assignment:

1. Read Chapter 4 in short book and Chapter 3 in long book on volatility models
2. Review course documents posted under week 7 and 8.

Problem

Use the datafile nordstrom_w_00_13.csv that contains the Nordstrom (JWN) stock weekly prices from January 2000 to October 2013. The data file contains dates (date) and weekly prices (price). You can also use the code and the analysis of the S&P500 returns used in the week 7 and 8 lectures as your reference for the analysis of this data. Analyze the Nordstrom stock log returns following the steps below.

1. Compute log returns, and analyze their time plot.

Moments

N 693 Sum Weights 693

Mean 0.00268299 Sum Observations 1.85931229

Std Deviation 0.06068019 Variance 0.00368209

Skewness -0.2804891 Kurtosis 7.55351876

Uncorrected SS 2.55299156 Corrected SS 2.54800304

Coeff Variation 2261.66262 Std Error Mean 0.00230505

The time plot of returns shows that Nordstrom stock returns exhibit periods of high volatility at various times, and consistently high volatility from 2007 to 2010. Most returns are between +/- 10%. Sample moments show that the distribution of log returns is roughly symmetric with very fat tails (kurtosis = 7.55).

(Figure: time plot of Nordstrom weekly log returns)

2. Is there evidence of serial correlations in the log returns? Use autocorrelations and 5%
significance level to answer the question.

Log returns are not serially correlated, as shown by the Ljung-Box test with p-values > 0.05 and the autocorrelation plot.

Name of Variable = return

Mean of Working Series 0.002683

Standard Deviation 0.060636

Number of Observations 693

Autocorrelation Check for White Noise

To Chi- Pr >

Lag Square DF ChiSq ——————–Autocorrelations——————–

6 6.16 6 0.4050 -0.009 0.011 -0.048 -0.009 0.078 -0.006

12 9.08 12 0.6957 -0.042 -0.038 -0.019 0.011 -0.015 0.016

18 26.17 18 0.0960 0.033 -0.026 0.088 -0.032 0.101 -0.056

24 31.58 24 0.1377 0.029 -0.037 0.025 -0.042 -0.026 -0.048

30 33.84 30 0.2871 -0.029 -0.003 -0.037 0.023 0.011 -0.016

3. Is there evidence of ARCH effects in the log returns? Use appropriate tests at 5%
significance level to answer this question.

The analysis below shows a strong ARCH effect: the squared returns are strongly correlated. The Ljung-Box tests on the squared returns are highly significant with p-values < 0.001, and the autocorrelation plot shows large autocorrelations for the first 10 lags.

Name of Variable = returnsq

Mean of Working Series 0.003684

Standard Deviation 0.011302

Number of Observations 693

Autocorrelation Check for White Noise

To Chi- Pr >

Lag Square DF ChiSq ——————–Autocorrelations——————–

6 331.56 6 <.0001 0.534 0.281 0.123 0.101 0.167 0.240

12 407.05 12 <.0001 0.246 0.187 0.053 0.036 0.032 0.082

18 469.70 18 <.0001 0.069 0.072 0.128 0.127 0.156 0.145

24 494.90 24 <.0001 0.149 0.090 0.050 0.035 0.014 0.031

30 501.01 30 <.0001 0.033 0.064 0.051 0.023 0.010 -0.004

4. Fit a GARCH(1,1) model for the log returns using a t-distribution for the error terms. Perform model checking (analyze if the residuals are white noise, if the squared residuals are white noise, and check if the t-distribution is a good fit for the data) and write down the fitted model.

The GARCH model with t-distribution is an adequate model for the volatility of the returns. All parameters are significant and the squared residuals are white noise (the Ljung-Box tests on squared residuals are not significant).

The fitted GARCH model can be expressed as follows:

rt = 0.005 + at,  at = σt·et

σt² = 0.1483·at-1² + 0.836·σt-1²

(the intercept ARCH0 can be considered equal to zero)

where the error term et has a t-distribution with 1/0.185 ≈ 5 degrees of freedom.

The AUTOREG Procedure

GARCH Estimates

SSE 2.55069233 Observations 693

MSE 0.00368 Uncond Var 0.00531549

Log Likelihood 1102.73026 Total R-Square .

SBC -2172.7554 AIC -2195.4605

MAE 0.04086373 AICC -2195.3732

MAPE 113.334007 HQC -2186.6796

Normality Test 126.0152

Pr > ChiSq <.0001

Parameter Estimates

Standard Approx

Variable DF Estimate Error t Value Pr > |t|

Intercept 1 0.004653 0.001570 2.96 0.0030

ARCH0 1 0.0000848 0.0000429 1.98 0.0482

ARCH1 1 0.1483 0.0369 4.01 <.0001

GARCH1 1 0.8358 0.0359 23.30 <.0001

TDFI 1 0.1851 0.0448 4.14 <.0001

Inverse of t DF
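As a quick check on the SAS output above, the reported unconditional variance follows directly from the estimates:

# unconditional variance implied by the GARCH(1,1) fit:
# arch0 / (1 - arch1 - garch1)
0.0000848 / (1 - 0.1483 - 0.8358)   # about 0.00533, close to "Uncond Var 0.00531549"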

R output

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)

mu 0.004729 0.001465 3.2281 0.001246
omega 0.000086 0.000050 1.7302 0.083590
alpha1 0.145749 0.052318 2.7858 0.005340
beta1 0.836471 0.051967 16.0962 0.000000
shape 5.345569 1.009241 5.2966 0.000000

5. Fit an EGARCH(1,1) model for the Nordstrom log returns using a normal distribution for the error terms. Perform model checking and write down the fitted model.

The EGARCH model is adequate, although the Gaussian distribution for the error terms is not sufficient to describe the most extreme events. No ARCH effects remain in the residuals. The leverage parameter "theta" is significant, showing that the volatility of Nordstrom stock returns is affected more heavily by negative shocks.

The fitted model can be written as follows:

rt = 0.0012 + at,  at = σt·et

ln σt² = -0.194 + 0.213·g(et-1) + 0.966·ln σt-1²

g(et-1) = -0.47·et-1 + [|et-1| - E|et-1|]

The AUTOREG Procedure

Exponential GARCH Estimates

SSE 2.54945444 Observations 693

MSE 0.00368 Uncond Var .

Log Likelihood 1093.55049 Total R-Square .

(Figure: t-distribution QQ plot for residuals of GARCH model)

SBC -2154.3958 AIC -2177.101

MAE 0.04095052 AICC -2177.0136

MAPE 101.222881 HQC -2168.32

Normality Test 71.9939

Pr > ChiSq <.0001

Parameter Estimates

Standard Approx

Variable DF Estimate Error t Value Pr > |t|

Intercept 1 0.001236 0.001752 0.71 0.4806

EARCH0 1 -0.1940 0.0636 -3.05 0.0023

EARCH1 1 0.2133 0.0316 6.75 <.0001

EGARCH1 1 0.9662 0.0106 91.41 <.0001

THETA 1 -0.4748 0.1121 -4.23 <.0001

R output

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.001271 0.001774 0.71655 0.473654
omega -0.169047 0.110169 -1.53443 0.124924
alpha1 -0.095670 0.035615 -2.68620 0.007227
beta1 0.970507 0.018820 51.56682 0.000000
gamma1 0.194542 0.059579 3.26526 0.001094

In R the model is written as

rt = 0.0012 + at,  at = σt·et

ln σt² = -0.169 - 0.095·et-1 + 0.194·(|et-1| - E|et-1|) + 0.970·ln σt-1²

with Gaussian error terms (no shape parameter appears in the output).

Note that the leverage coefficients of et-1 in SAS and R are equivalent.
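The equivalence can be checked numerically (a sketch: in the SAS parametrization the sign term is EARCH1·THETA, while R estimates alpha1 and gamma1 separately, so THETA ≈ alpha1/gamma1):

# implied theta from the R estimates above
-0.095670 / 0.194542   # about -0.49, close to the SAS THETA of -0.47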

6. Fit an EGARCH(1,1) model for the Nordstrom log returns using a t-distribution for the error terms. Perform model checking and write down the fitted model.

The EGARCH model with t-distributed errors is a good model for the data. No ARCH effects remain in the residuals. The leverage parameter "theta" is significant and negative, showing that the volatility of Nordstrom stock returns is affected more heavily by negative shocks.

The fitted model can be written as follows:

rt = 0.0032 + at,  at = σt·et

ln σt² = -0.191 + 0.158·g(et-1) + 0.974·ln σt-1²

g(et-1) = -0.55·et-1 + [|et-1| - E|et-1|]

et has a t-distribution with 6 degrees of freedom.

The MODEL Procedure

Nonlinear Liklhood Summary of Residual Errors

DF DF Adj

Equation Model Error SSE MSE Root MSE R-Square R-Sq

7 686 2.5482 0.00371 0.0609 -0.0001 -0.0088

nresid.y 686 1028.4 1.4991 1.2244

Nonlinear Liklhood Parameter Estimates

Approx Approx

Parameter Estimate Std Err t Value Pr > |t|

intercept 0.003273 0.00160 2.05 0.0410

earch0 -0.19123 0.0932 -2.05 0.0405

earch1 0.158378 0.0490 3.23 0.0013

egarch1 0.974881 0.0139 70.01 <.0001

theta -0.55013 0.1969 -2.79 0.0053

df 5.993367 1.3083 4.58 <.0001 R output

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.003374 0.001552 2.1743 0.029681
omega -0.137932 0.098212 -1.4044 0.160191
alpha1 -0.104435 0.034086 -3.0638 0.002185
beta1 0.977603 0.016446 59.4433 0.000000
gamma1 0.180592 0.075474 2.3928 0.016722
shape 5.783774 1.169335 4.9462 0.000001

In R the model is written as

rt = 0.0033 + at,  at = σt·et

ln σt² = -0.138 - 0.104·et-1 + 0.180·(|et-1| - E|et-1|) + 0.978·ln σt-1²

where the error term has a t-distribution with 6 degrees of freedom.

Note that the leverage coefficients of et-1 in SAS and R are equivalent.

Name of Variable = resid2

Mean of Working Series 1.483942

Standard Deviation 2.819517

Number of Observations 693
Autocorrelation Check for White Noise

To Chi- Pr >
Lag Square DF ChiSq ——————–Autocorrelations——————–

6 5.11 6 0.5299 0.026 0.020 -0.044 -0.061 0.024 -0.004

12 13.61 12 0.3261 0.025 0.088 0.014 -0.041 0.017 -0.039

18 19.28 18 0.3748 0.043 -0.032 -0.038 -0.028 -0.049 0.020

24 23.55 24 0.4873 0.004 0.028 -0.025 -0.051 -0.039 -0.021

7. Find a GJR-TGARCH(1,1) model for the Nordstrom log returns using a t-distribution for the innovations. Perform model checking and write down the fitted model.

Similar to the AR(0)-EGARCH(1,1) model, the AR(0)-GJR(1,1) model is also a good model for the data. No ARCH effects remain in the residuals. The leverage parameter is significant and positive, showing that the volatility of Nordstrom stock returns is affected more heavily by negative shocks.

The fitted model can be written as follows:

rt = 0.0036 + at,  at = σt·et

σt² = (0.034 + 0.13·Nt-1)·at-1² + 0.800·σt-1²

with Nt-1 = 1 if at-1 < 0, and Nt-1 = 0 otherwise.

et has a t-distribution with 5 degrees of freedom.

The MODEL Procedure
Nonlinear Liklhood Parameter Estimates
Approx Approx
Parameter Estimate Std Err t Value Pr > |t|

intercept 0.003589 0.00157 2.28 0.0226

df 5.028263 0.9762 5.15 <.0001

arch0 0.000095 0.000044 2.18 0.0297

arch1 0.034154 0.0259 1.32 0.1869

garch1 0.800444 0.0584 13.71 <.0001

gamma 0.129939 0.0496 2.62 0.0090

R output

Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.003880 0.001519 2.5535 0.010665
omega 0.000083 0.000053 1.5798 0.114152
alpha1 0.039211 0.032406 1.2100 0.226283
beta1 0.858372 0.058370 14.7058 0.000000
gamma1 0.153341 0.074560 2.0566 0.039723
shape 5.767186 1.177356 4.8984 0.000001

The fitted model can be written as follows:

rt = 0.0039 + at,  at = σt·et

σt² = (0.039 + 0.15·Nt-1)·at-1² + 0.858·σt-1²

with Nt-1 = 1 if at-1 < 0, and Nt-1 = 0 otherwise.

et has a t-distribution with 6 degrees of freedom.

8. Is the leverage effect significant at the 5% level?

The leverage parameters in both the EGARCH and the GJR models are significant, with values indicating that volatility reacts more heavily to negative shocks.

9. What model provides the best fit for the data? Explain.
Since the leverage parameters in the AR(0)-EGARCH(1,1) and AR(0)-GJR(1,1) models are significant, and the residuals are white noise with no ARCH effect, we can conclude that both models provide a good explanation of the volatility behavior of Nordstrom stock returns. Either model can be chosen; there is no clear winner. (Note that in SAS, PROC MODEL does not compute the BIC criterion – backtesting can be used to compare models.)

In R the EGARCH(1,1) and the GJR-GARCH(1,1) models have very similar BIC values:

             garch11.t   egarch11    gjrgarch11
Akaike       -3.172122   -3.186143   -3.183976
Bayes        -3.139358   -3.146827   -3.144660

10. Use the selected model to compute up to 5-step-ahead forecasts of the simple returns and their volatility.

The following predictions are based on the AR(0)-EGARCH(1,1) model. sigma is the predicted conditional standard deviation (volatility), and series is the predicted return (note that the predicted returns equal the sample mean return, since the returns are white noise).

R output for forecasts
sigma series
2013-10-29 0.02732 0.003374
2013-10-30 0.02764 0.003374
2013-10-31 0.02796 0.003374
2013-11-01 0.02827 0.003374
2013-11-04 0.02858 0.003374
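A minimal sketch of how such a table is produced with rugarch, assuming the fitted EGARCH object is named egarch11.t.fit:

# 5-step-ahead forecasts of return and volatility
fcst <- ugarchforecast(egarch11.t.fit, n.ahead = 5)
sigma(fcst)    # predicted volatility (conditional std. deviation)
fitted(fcst)   # predicted returns (constant at the estimated mean)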

SAS code

*import data from file;

proc import datafile=’nordstrom_w_00_13.csv’ out=myd replace;

(Figure: t-distribution QQ plot of residuals for GJRGARCH model)

EXPLORATORY ANALYSIS

nobs 469.000000

NAs 0.000000

Minimum -0.166652

Maximum 0.164808

1. Quartile -0.022391

3. Quartile 0.029000

Mean 0.003843

Median 0.005614

Sum 1.802469

SE Mean 0.002079

LCL Mean -0.000242

UCL Mean 0.007929

Variance 0.002027

Stdev 0.045027

Skewness -0.040829

Kurtosis 1.569306

Title:
Jarque – Bera Normalality Test
Test Results:
STATISTIC:
X-squared: 49.4632
P VALUE:
Asymptotic p Value: 1.816e-11

The distribution of log returns is not normal: skewness is close to 0, but the excess kurtosis of about 1.56 is well above the normal value of 0, indicating moderately fat tails. The Jarque-Bera test is significant (p-value < 0.05), so the hypothesis of a normal distribution for the log returns is rejected.

The following plot is the time series plot of the weekly price of Google shares, which shows an increasing trend.

(Figure: Time series plot of Google prices)

The time plot of returns shows that Google stock returns exhibit periods of high volatility at various times. Most returns are between +/- 10%, though at several points returns reach +/- 15% with higher volatility. Volatility does not die down quickly; it persists for a while after a volatile period.

To check the null hypothesis of non-stationarity, the Dickey-Fuller test is expressed as H0: φ1 = 1 vs. Ha: φ1 ≠ 1, where the null hypothesis indicates unit-root non-stationarity.

The Dickey-Fuller test shows that the returns series is stationary (no unit root). The test p-value at lag 7 is less than 0.05, therefore the null hypothesis of non-stationarity can be rejected.

Augmented Dickey-Fuller Test

Test Results:
PARAMETER: Lag Order: 7
STATISTIC: Dickey-Fuller: -7.1331
P VALUE: 0.01
alternative hypothesis: stationary

ACF TESTS

The ACF plots below show that the Google stock returns are not correlated, indicating a constant mean model for rt. Both the squared returns series and the absolute returns series show large autocorrelations. We can conclude that the log returns process has a strong non-linear dependence.

Ljung-Box tests for ARCH effects at lags 6, 12 and 18 for squared returns and absolute returns:

Box-Ljung test
data: coredata(rets^2)
X-squared = 48.3339, df = 6, p-value = 1.013e-08

Box-Ljung test
data: coredata(rets^2)
X-squared = 71.9976, df = 12, p-value = 1.352e-10

Box-Ljung test
data: coredata(rets^2)
X-squared = 98.7945, df = 18, p-value = 3.68e-13

The Ljung-Box tests confirm that the squared returns are autocorrelated: the analysis shows a strong ARCH effect. The tests are highly significant with p-values < 0.005, and the autocorrelation plot shows large autocorrelations for the first 18 lags.

Plot of PACF: there is no significant correlation up to lag 13, which suggests that no AR terms are needed for the mean equation.

MODEL FITTING

*---------------------------------*
* GARCH Model Fit *
*---------------------------------*

Conditional Variance Dynamics
-----------------------------------
GARCH Model : sGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : std

Optimal Parameters
------------------------------------
Estimate Std. Error t value Pr(>|t|)
mu 0.004432 0.001778 2.4924 0.012689
omega 0.000103 0.000053 1.9574 0.050295
alpha1 0.094169 0.032716 2.8784 0.003997
beta1 0.856326 0.044635 19.1850 0.000000
shape 6.797338 2.000925 3.3971 0.000681

Using the output above, the fitted ARMA(0,0)-GARCH(1,1) model for the log returns, with a t-distribution for the error terms, can be written as:

rt = 0.0044 + at,  at = σt·et

σt² = 0.000103 + 0.094·at-1² + 0.856·σt-1²

where et has a t-distribution with 7 degrees of freedom (nearest integer to the shape estimate).
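A short sketch of the persistence and volatility half-life implied by these estimates (rugarch's persistence() and halflife() helpers return the same quantities for the fitted object):

# persistence and half-life of volatility shocks for the GARCH(1,1) fit
alpha1 <- 0.094169; beta1 <- 0.856326     # estimates from the output above
alpha1 + beta1                            # persistence: about 0.95
log(0.5) / log(alpha1 + beta1)            # half-life: about 13.6 weeks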
Residual Analysis
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.0339 0.8539
Lag[p+q+1][1] 0.0339 0.8539
Lag[p+q+5][5] 2.4766 0.7800
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 0.1274 0.7212
Lag[p+q+1][3] 1.4245 0.2327
Lag[p+q+5][7] 3.4346 0.6333
d.o.f=2
The above output shows no evidence of autocorrelation in the residuals: they behave like white noise. There is also no evidence of serial correlation in the squared residuals.
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 8465 0
2 30 13148 0
3 40 17834 0
4 50 22521 0
Goodness-of-fit test: the assumed error distribution is rejected (p-values are essentially 0).
To fit ARMA(0,0)-eGARCH(1,1) model with Gaussian distribution:

Conditional Variance Dynamics
———————————–
GARCH Model : eGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : norm
Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)
mu 0.004814 0.001837 2.6205 0.008780
omega -0.427600 0.165861 -2.5781 0.009936
alpha1 -0.086690 0.033886 -2.5583 0.010518
beta1 0.931657 0.026273 35.4604 0.000000
gamma1 0.177187 0.048248 3.6724 0.000240
Fitted model:

rt = 0.0048 + at,  at = σt·et

ln(σt²) = -0.428 - 0.087·et-1 + 0.177·(|et-1| - E|et-1|) + 0.932·ln(σt-1²)
Residual Analysis:
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.03655 0.8484
Lag[p+q+1][1] 0.03655 0.8484
Lag[p+q+5][5] 2.73813 0.7403
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 0.2255 0.6348
Lag[p+q+1][3] 2.6812 0.1015
Lag[p+q+5][7] 4.1362 0.5300
d.o.f=2
ARCH LM Tests
————————————
Statistic DoF P-Value
ARCH Lag[2] 2.224 2 0.3290
ARCH Lag[5] 4.682 5 0.4559
ARCH Lag[10] 7.053 10 0.7204
The above output shows no evidence of autocorrelation in the residuals: they behave like white noise. There is also no evidence of serial correlation in the squared residuals.
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 26.65 0.1131
2 30 32.04 0.3179
3 40 43.67 0.2798
4 50 47.74 0.5243
The above output shows that the normal distribution is an acceptable fit for the error terms, as all p-values are greater than 0.05.
To fit an ARMA(0,0)-TGARCH(2,1) (gjrGARCH(2,1)) model with a Gaussian distribution:
Conditional Variance Dynamics
———————————–
GARCH Model : gjrGARCH(2,1)
Mean Model : ARFIMA(0,0,0)
Distribution : norm
Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)
mu 0.004928 0.001860 2.649926 0.008051
omega 0.000120 0.000053 2.257334 0.023987
alpha1 0.000000 0.000877 0.000011 0.999991
alpha2 0.072854 0.037335 1.951388 0.051011
beta1 0.828559 0.053752 15.414339 0.000000
gamma1 0.397469 0.144242 2.755564 0.005859
gamma2 -0.315459 0.131445 -2.399922 0.016399
Fitted model:

rt = 0.0049 + at,  at = σt·et

σt² = 0.00012 + (0.000 + 0.397·Nt-1)·at-1² + (0.073 - 0.315·Nt-2)·at-2² + 0.829·σt-1²
Residual Analysis:
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.2311 0.6307
Lag[p+q+1][1] 0.2311 0.6307
Lag[p+q+5][5] 2.9412 0.7090
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 2.137 0.14380
Lag[p+q+1][4] 4.191 0.04063
Lag[p+q+5][8] 4.680 0.45620
d.o.f=3
ARCH LM Tests
————————————
Statistic DoF P-Value
ARCH Lag[2] 2.461 2 0.2921
ARCH Lag[5] 4.904 5 0.4277
ARCH Lag[10] 7.140 10 0.7122
The above output shows no evidence of autocorrelation in the residuals: they behave like white noise. There is also no evidence of serial correlation in the squared residuals.
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 20.00 0.3947
2 30 32.43 0.3014
3 40 30.87 0.8203
4 50 54.56 0.2714
The above output shows that the normal distribution is an acceptable fit for the error terms, as all p-values are greater than 0.05.
BEST FIT FOR THE MODEL:
garch11 egarch11 gjrgarch11
Akaike -3.479720 -3.455538 -3.474405
Bayes -3.435471 -3.411288 -3.412455
Shibata -3.479944 -3.455762 -3.474841
Hannan-Quinn -3.462310 -3.438127 -3.450030
From the above output, the AR(0)-GARCH(1,1) model in fact has the smallest (most negative) values on all of the fit criteria, with the GJR-GARCH model a close second; by information criteria alone the GARCH(1,1) fit is preferred.

Forecast Analysis:

RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST and EVALUATE STATISTICS for 5-step-ahead forecasts.

      garch11       egarch11       tgarch21

MSE   0.000950199   0.0009498531   0.0009499473

MAE   0.02257449    0.0225407      0.02255243

DAC   0.6           0.6            0.6

Since the MSE is smallest for the eGARCH model (by a very small margin), we should consider this model for forecasting.

CODE:
#Tomas Georgakopoulos CSC 425 Final Project
setwd("C:/Course/CSC425")
# Analysis of Google weekly returns from
library(forecast)
library(TSA)
# import data in R and compute log returns
# import libraries for TS analysis
library(zoo)
library(tseries)
myd = read.table('Weekly-GOOG-TSDATA.csv', header=T, sep=',')
pricets = zoo(myd$price, as.Date(as.character(myd$date), format=c("%m/%d/%Y")))
#log return time series
rets = log(pricets/lag(pricets, -1))
# strip off the dates and just create a simple numeric object
ret = coredata(rets);
#compute statistics
library(fBasics)
basicStats(rets)
#HISTOGRAM
par(mfcol=c(1,2))
hist(rets, xlab="Weekly Returns", prob=TRUE, main="Histogram")
#Add approximating normal density curve
xfit<-seq(min(rets,na.rm=TRUE),max(rets,na.rm=TRUE),length=40)
yfit<-dnorm(xfit,mean=mean(rets,na.rm=TRUE),sd=sd(rets,na.rm=TRUE))
lines(xfit, yfit, col="blue", lwd=2)

#CREATE NORMAL PROBABILITY PLOT
qqnorm(rets)
qqline(rets, col = 'red', lwd=2)

# creates time plot of log returns
par(mfrow=c(1,1))
plot(rets)

#Perform Jarque-Bera normality test.
normalTest(rets,method=c('jb'))

#SKEWNESS TEST
skew_test=skewness(rets)/sqrt(6/length(rets))
skew_test
print("P-value = ")
2*(1-pnorm(abs(skew_test)))

#FAT-TAIL TEST
k_stat = kurtosis(rets)/sqrt(24/length(rets))
print("Kurtosis test statistic")
k_stat
print("P-value = ")
2*(1-pnorm(abs(k_stat)))

#COMPUTE DICKEY-FULLER TEST
library(fUnitRoots)
adfTest(rets, lags=7, type=c("c"))

# Computes Ljung-Box tests on squared returns to test non-linear independence at lags 6, 12 and 18
Box.test(coredata(rets^2),lag=6,type='Ljung')
Box.test(coredata(rets^2),lag=12,type='Ljung')
Box.test(coredata(rets^2),lag=18,type='Ljung')

# Computes Ljung-Box tests on absolute returns to test non-linear independence at lags 6, 12 and 18
Box.test(abs(coredata(rets)),lag=6,type='Ljung')
Box.test(abs(coredata(rets)),lag=12,type='Ljung')
Box.test(abs(coredata(rets)),lag=18,type='Ljung')

# Plots ACF of returns, squared returns and absolute returns (ARCH effect check)
par(mfrow=c(3,1))
acf(ret)
acf(ret^2)
acf(abs(ret))

# Time plots of returns, squared returns and absolute returns
par(mfrow=c(3,1))
plot(rets, type='l')
plot(rets^2,type='l')
plot(abs(rets),type='l')
par(mfrow=c(1,1))

# plots PACF of squared returns to identify order of AR model
pacf(coredata(rets),lag=30)

#GARCH Models
library(rugarch)

#Fit AR(0,0)-GARCH(1,1) model with t-distribution
garch11.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)),
  mean.model=list(armaOrder=c(0,0)), distribution.model = "std")
#estimate model
garch11.fit=ugarchfit(spec=garch11.spec, data=rets)
garch11.fit

#Fit AR(0,0)-eGARCH(1,1) model with Gaussian distribution
egarch11.spec=ugarchspec(variance.model=list(model = "eGARCH", garchOrder=c(1,1)),
  mean.model=list(armaOrder=c(0,0)))
#estimate model
egarch11.fit=ugarchfit(spec=egarch11.spec, data=rets)
egarch11.fit

#Fit AR(0,0)-TGARCH(2,1) model with normal distribution
gjrgarch11.spec=ugarchspec(variance.model=list(model = "gjrGARCH", garchOrder=c(2,1)),
  mean.model=list(armaOrder=c(0,0)), distribution.model = "norm")
#estimate model
gjrgarch11.fit=ugarchfit(spec=gjrgarch11.spec, data=rets)
gjrgarch11.fit

# compare information criteria
model.list = list(garch11 = garch11.fit, egarch11 = egarch11.fit, gjrgarch11 = gjrgarch11.fit)
info.mat = sapply(model.list, infocriteria)
rownames(info.mat) = rownames(infocriteria(garch11.fit))
info.mat

# RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST
# EVALUATION STATISTICS
garch11.fit = ugarchfit(spec=garch11.spec, data=rets, out.sample=100)
egarch11.fit = ugarchfit(egarch11.spec, data=rets, out.sample=100)
tgarch11.fit = ugarchfit(spec=gjrgarch11.spec, data=rets, out.sample=100)

garch11.fcst = ugarchforecast(garch11.fit, n.roll=100, n.ahead=1)
egarch11.fcst = ugarchforecast(egarch11.fit, n.roll=100, n.ahead=1)
tgarch11.fcst = ugarchforecast(tgarch11.fit, n.roll=100, n.ahead=1)

fcst.list = list(garch11=garch11.fcst, egarch11=egarch11.fcst, tgarch21=tgarch11.fcst)
fpm.mat = sapply(fcst.list, fpm)
fpm.mat

The average weekly gross return of Google shares is 1.005 (about 0.5% per week) with a standard deviation of 0.045. The distribution of returns is more or less symmetric but has a high peak. The normality test (Jarque-Bera test) rejects the hypothesis of normality for the returns at the 5% level.

Jarque Bera Test

data: returns

X-squared = 54.1642, df = 2, p-value = 1.731e-12

The distribution of log returns is also not close to normal: skewness is near 0, but the excess kurtosis of about 1.5 is far from the normal value of 0. The Jarque-Bera test is significant (p-value < 0.05), indicating that the hypothesis of a normal distribution for the log returns can be rejected.

Jarque Bera Test

data: rets

X-squared = 49.4632, df = 2, p-value = 1.816e-11
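As a rough cross-check, the Jarque-Bera statistic can be recomputed from the sample moments reported in the exploratory analysis (small differences from the printed value come from the exact moment estimators used):

# Jarque-Bera statistic from sample moments: JB = n/6 * (S^2 + K^2/4)
n <- 469                      # number of weekly log returns
S <- -0.040829                # sample skewness
K <- 1.569306                 # sample excess kurtosis
(n / 6) * (S^2 + K^2 / 4)     # about 48.3, in line with the reported 49.46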

The following plot is the time series plot of the weekly price of Google shares, which shows an increasing trend.

(Figure: Time series plot of Google prices)

The time plot of returns shows that Google stock returns exhibit periods of high volatility at various times. Most returns are between +/- 10%. Sample moments show that the distribution of log returns is roughly symmetric with moderately fat tails (excess kurtosis = 1.56).


To check the null hypothesis of non-stationarity:

The Dickey-Fuller test is expressed as H0: φ1 = 1 vs. Ha: φ1 ≠ 1, where the null hypothesis indicates unit-root non-stationarity.

The Dickey-Fuller test shows that the returns series is stationary (no unit root). The test p-value at lag 7 is less than 0.05, therefore the null hypothesis of non-stationarity can be rejected.

Augmented Dickey-Fuller Test

data: rets

Dickey-Fuller = -7.1264, Lag order = 7, p-value = 0.01

alternative hypothesis: stationary


To check for serial correlations in the log returns:

The log returns are not serially correlated, as shown by the Box-Pierce test with p-value > 0.05 and by the autocorrelation plot.

Box-Pierce test

data: coredata(rets)

X-squared = 0.686, df = 1, p-value = 0.4075


To look for evidence of ARCH effects in the log returns:

The analysis below shows a strong ARCH effect: the squared returns are strongly correlated. The Box-Pierce test on the squared returns is highly significant (p-value < 0.005), and the autocorrelation plots show large autocorrelations over the first 15 lags.

Box-Pierce test

data: coredata(rets^2)

X-squared = 8.1483, df = 1, p-value = 0.00431

Plot of PACF: there is no significant correlation up to lag 10, which suggests that no AR terms are needed.


To fit an ARMA(0,0)-GARCH(2,1) model for the log returns using a normal distribution for the error terms:

Fitted model:

Residual Analysis:

Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
group statistic p-value(g-1)

1 20 28.78 0.06948

2 30 38.57 0.11019

3 40 46.39 0.19380

4 50 51.15 0.38929

The above output shows that the normal distribution is an acceptable fit for the error terms, as all p-values are greater than 0.05.


To fit ARMA(0,0)-eGARCH(1,1) model with Gaussian distribution:

Fitted Model is:

Residual Analysis:
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)

1 20 28.95 0.06673

2 30 27.82 0.52739

3 40 46.91 0.18000

4 50 50.72 0.40546

The above output shows that the normal distribution is an acceptable fit for the error terms, as all p-values are greater than 0.05.

To fit an ARMA(0,0)-TGARCH(2,1) model with a normal distribution:

Fitted Model:

Residual Analysis:
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)

1 20 18.29 0.5030

2 30 31.28 0.3525

3 40 29.17 0.8742

4 50 51.36 0.3813

The above output shows that the normal distribution is an acceptable fit for the error terms, as all p-values are greater than 0.05.

BEST FIT FOR THE MODEL:

             garch21     egarch11    gjrgarch21

Akaike       -3.436225   -3.451379   -3.463469

Bayes        -3.391975   -3.407129   -3.401519

Shibata      -3.436449   -3.451603   -3.463905

Hannan-Quinn -3.418814   -3.433968   -3.439094

From the above output, the AR(0)-EGARCH(1,1) and AR(0)-GJR(2,1) models have nearly equal (smallest) values on the fit criteria and can both be considered the best model: the GJR model wins on Akaike, Shibata and Hannan-Quinn, while the EGARCH model wins on Bayes.


Forecast Analysis:

RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST and EVALUATE STATISTICS for 5-step-ahead forecasts.

      garch21        egarch11       tgarch21

MSE   0.0009497851   0.0009497741   0.0009499554

MAE   0.0224897      0.0225261      0.0225533

DAC   0.6            0.6            0.6

Since the MSE is smallest for the eGARCH model (by a very small margin), we should consider this model for forecasting.

CODE:

library(e1071)

library(tseries)

library(xts)

library(rugarch)

price = scan("clipboard")

pricets=ts(price,frequency=52,start=c(2004,45))

returns=pricets/lag(pricets, -1)

summary(price)

skewness(price)

kurtosis(price)

sd(price)

summary(returns)

skewness(returns)

kurtosis(returns)

sd(returns)

#Normality check

hist(returns, prob=TRUE)

lines(density(returns, adjust=2), type="l")

jarque.bera.test(returns)

#log return time series

rets = log(pricets/lag(pricets, -1))

#Normality check

hist(rets, prob=TRUE)

lines(density(rets, adjust=2), type="l")

jarque.bera.test(rets)

plot.ts(pricets)

# strip off the dates and just create a simple numeric object (require:xts)

ret = coredata(rets);

# creates time plot of log returns

plot(rets)

summary(rets)

skewness(rets)

kurtosis(rets)

sd(returns)

#ADF test for checking null hypothesis of non stationarity

adf.test(rets)

# Computes Ljung-Box test on returns

Box.test( coredata(rets))

# Computes Ljung-Box test on squared returns to test non-linear independence

Box.test(coredata(rets^2))

# Computes Ljung-Box test on absolute returns to test non-linear independence

Box.test(abs(coredata(rets)))

par(mfrow=c(3,1))

# Plots ACF function of vector data

acf(ret)

# Plot ACF of squared returns to check for ARCH effect

acf(ret^2)

# Plot ACF of absolute returns to check for ARCH effect

acf(abs(ret))

#plot returns, squared returns and abs(returns)

par(mfrow=c(3,1))
# time plot of returns
plot(rets, type='l')

# time plot of squared returns to check for ARCH effect
plot(rets^2,type='l')

# time plot of absolute returns to check for ARCH effect
plot(abs(rets),type='l')

par(mfrow=c(1,1))

# plots PACF of squared returns to identify order of AR model

pacf(coredata(rets),lag=10)

#specify model using functions in rugarch package

#Fit ARMA(0,0)-GARCH(2,1) model

garch21.spec=ugarchspec(variance.model=list(garchOrder=c(2,1)), mean.model=list(armaOrder=c(0,0)))

#estimate model

garch21.fit=ugarchfit(spec=garch21.spec, data=rets)

garch21.fit

#Fit ARMA(0,0)-eGARCH(1,1) model with Gaussian distribution

egarch11.spec=ugarchspec(variance.model=list(model = "eGARCH",
  garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)))

#estimate model

egarch11.fit=ugarchfit(spec=egarch11.spec, data=ret)

egarch11.fit

#Fit ARMA(0,0)-TGARCH(2,1) model with normal distribution

gjrgarch21.spec=ugarchspec(variance.model=list(model = "gjrGARCH",
  garchOrder=c(2,1)), mean.model=list(armaOrder=c(0,0)), distribution.model = "norm")

#estimate model

gjrgarch21.fit=ugarchfit(spec=gjrgarch21.spec, data=ret)

gjrgarch21.fit

# compare information criteria

model.list = list(garch21 = garch21.fit,
  egarch11 = egarch11.fit,
  gjrgarch21 = gjrgarch21.fit)

info.mat = sapply(model.list, infocriteria)

rownames(info.mat) = rownames(infocriteria(garch21.fit))

info.mat

# RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST

# EVALUATION STATISTICS

garch21.fit = ugarchfit(spec=garch21.spec, data=ret, out.sample=100)

egarch11.fit = ugarchfit(egarch11.spec, data=ret, out.sample=100)

tgarch21.fit = ugarchfit(spec=gjrgarch21.spec, data=ret, out.sample=100)

garch21.fcst = ugarchforecast(garch21.fit, n.roll=100, n.ahead=5)

egarch11.fcst = ugarchforecast(egarch11.fit, n.roll=100, n.ahead=5)

tgarch21.fcst = ugarchforecast(tgarch21.fit, n.roll=100, n.ahead=5)

fcst.list = list(garch21=garch21.fcst,
  egarch11=egarch11.fcst,
  tgarch21=tgarch21.fcst)

fpm.mat = sapply(fcst.list, fpm)

fpm.mat
