I have the problem below that needs to be completed. I have included Weekly GOOGLE time series data from the past few years. The analysis must be done on this data. I have also included sample code and analysis. Please do the analysis similar to this with thorough answers. It will be easier if you just copy and paste the sample code and modify it.
The analysis should be submitted as a report consisting of two sections:
1. A non-technical summary not longer than one typewritten page, describing the conclusions of your statistical analysis to a non-technical audience. It should be intelligible to a person who does not know regression analysis. Suppose you are talking to your boss who does not know statistics or to a friend who is not familiar with statistical terminology.
2. A technical summary not exceeding 5 pages, with details of your statistical analysis. This section is intended for a statistically literate audience and must be written in a clear organized fashion. For instance, you can organize the report into subsections such as
a. Exploratory analysis of the data.
b. Model fitting.
c. Residual analysis and model diagnostics.
d. Forecast analysis
e. Analysis of the results and discussion.
You should include appropriate output and graphs from your R programs in your document in a presentable format. Further details may be relegated to an appendix that may contain the R code, some graphs, computer output or supplementary information about the field of study.
It is possible that you may not find a satisfactory model that fits adequately your data. Sometimes a data set may admit more than one satisfactory answer; sometimes there may be none. If the statistical analysis shows that no time series models are suitable for your data set, please mention what approaches you have tried and what was unsatisfactory about them. If there is more than one suitable model, mention the pros and cons and compare their performance in forecasting future observations.
================================================================
The assignment should resemble Homework 4 and include the pieces of Homework 5 and the sample analysis. It should be in a clean format like the homework and include the R code at the end of it. Since this is stock price data, fitting an ARIMA model to the raw prices won't be adequate; the analysis should (I think) work with log returns and use GARCH volatility modeling. Take a look at the S&P 500 analysis for further clarification.
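As a concrete starting point, here is a minimal sketch of how the pasted series could be read in and prepared (assumptions: the data are saved as google.csv with columns date and price, the blank weeks are simply dropped, and the rugarch package used in the S&P 500 analysis below is installed):

library(rugarch)
# read the weekly GOOGLE prices (assumed saved as google.csv: date, price)
myd <- read.table("google.csv", header=TRUE, sep=",")
myd <- myd[!is.na(myd$price), ]   # drop the weeks with missing prices
ret <- diff(log(myd$price))       # weekly log returns
# GARCH(1,1) with constant mean, mirroring the S&P 500 example below
spec <- ugarchspec(variance.model=list(garchOrder=c(1,1)),
                   mean.model=list(armaOrder=c(0,0)))
fit <- ugarchfit(spec=spec, data=ret)
fit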
date | price
11/1/2004 | 169.35 | |||
11/8/2004 | 182 | |||
11/15/2004 | 169.4 | |||
11/22/2004 | 179.39 | |||
11/29/2004 | 180.4 | |||
12/6/2004 | 171.65 | |||
12/13/2004 | 180.08 | |||
12/20/2004 | 187.9 | |||
12/27/2004 | 192.79 | |||
1/3/2005 | 193.85 | |||
1/10/2005 | 199.97 | |||
1/18/2005 | 188.28 | |||
1/24/2005 | 190.34 | |||
1/31/2005 | 204.36 | |||
2/7/2005 | 187.4 | |||
2/14/2005 | 197.95 | |||
2/22/2005 | 185 | |||
2/28/2005 | 185.9 | |||
3/7/2005 | 177.8 | |||
3/14/2005 | 180.04 | |||
3/21/2005 | 179.25 | |||
3/28/2005 | ||||
4/4/2005 | 192.05 | |||
4/11/2005 | ||||
4/18/2005 | 215.81 | |||
4/25/2005 | 220 | |||
5/2/2005 | 228.02 | |||
5/9/2005 | 229.24 | |||
5/16/2005 | 241.61 | |||
5/23/2005 | 266 | |||
5/31/2005 | 280 | |||
6/6/2005 | 282.5 | |||
6/13/2005 | 280.3 | |||
6/20/2005 | 297.25 | |||
6/27/2005 | 291.25 | |||
7/5/2005 | 296.23 | |||
7/11/2005 | 301.19 | |||
7/18/2005 | 302.4 | |||
7/25/2005 | 287.76 | |||
8/1/2005 | 292.35 | |||
8/8/2005 | 289.72 | |||
8/15/2005 | ||||
8/22/2005 | 283.58 | |||
8/29/2005 | 288.45 | |||
9/6/2005 | 299.09 | |||
9/12/2005 | 300.2 | |||
9/19/2005 | 315.36 | |||
9/26/2005 | 316.46 | |||
10/3/2005 | 312.99 | |||
10/10/2005 | 296.14 | |||
10/17/2005 | 339.9 | |||
10/24/2005 | 358.17 | |||
10/31/2005 | 390 | |||
11/7/2005 | ||||
11/14/2005 | 400.21 | |||
11/21/2005 | 428.62 | |||
11/28/2005 | 417.7 | |||
12/5/2005 | 409.2 | |||
12/12/2005 | 430.15 | |||
12/19/2005 | 430.93 | |||
12/27/2005 | 414.86 | |||
1/3/2006 | 465.66 | |||
1/9/2006 | 466.25 | |||
1/17/2006 | 399.46 | |||
1/23/2006 | 433.49 | |||
1/30/2006 | 381.55 | |||
2/6/2006 | 362.61 | |||
2/13/2006 | 368.75 | |||
2/21/2006 | 377.4 | |||
2/27/2006 | 378.18 | |||
3/6/2006 | 337.5 | |||
3/13/2006 | 339.79 | |||
3/20/2006 | 365.8 | |||
3/27/2006 | ||||
4/3/2006 | 406.16 | |||
4/10/2006 | 402.16 | |||
4/17/2006 | 437.1 | |||
4/24/2006 | 417.94 | |||
5/1/2006 | 394.3 | |||
5/8/2006 | 374.13 | |||
5/15/2006 | 370.02 | |||
5/22/2006 | 381.35 | |||
5/30/2006 | 379.44 | |||
6/5/2006 | 386.57 | |||
6/12/2006 | 390.7 | |||
6/19/2006 | 404.86 | |||
6/26/2006 | 419.33 | |||
7/3/2006 | 420.45 | |||
7/10/2006 | 403.5 | |||
7/17/2006 | 390.11 | |||
7/24/2006 | 388.12 | |||
7/31/2006 | 373.85 | |||
8/7/2006 | 368.5 | |||
8/14/2006 | 383.36 | |||
8/21/2006 | 373.26 | |||
8/28/2006 | 378.6 | |||
9/5/2006 | 377.85 | |||
9/11/2006 | 409.88 | |||
9/18/2006 | 403.78 | |||
9/25/2006 | 401.9 | |||
10/2/2006 | 420.5 | |||
10/9/2006 | 427.3 | |||
10/16/2006 | 459.67 | |||
10/23/2006 | 475.2 | |||
10/30/2006 | 471.8 | |||
11/6/2006 | 473.55 | |||
11/13/2006 | 498.79 | |||
11/20/2006 | 505 | |||
11/27/2006 | 480.8 | |||
12/4/2006 | 484.11 | |||
12/11/2006 | 480.3 | |||
12/18/2006 | 455.58 | |||
12/26/2006 | 460 | |||
1/3/2007 | 487.19 | |||
1/8/2007 | ||||
1/16/2007 | 489.75 | |||
1/22/2007 | 495.84 | |||
1/29/2007 | 481.5 | |||
2/5/2007 | 461.89 | |||
2/12/2007 | 469.94 | |||
2/20/2007 | 470.62 | |||
2/26/2007 | 438.68 | |||
3/5/2007 | 452.96 | |||
3/12/2007 | 440.85 | |||
3/19/2007 | 461.83 | |||
3/26/2007 | 458.16 | |||
4/2/2007 | 471.51 | |||
4/9/2007 | 466.29 | |||
4/16/2007 | 482.48 | |||
4/23/2007 | 479.01 | |||
4/30/2007 | 471.12 | |||
5/7/2007 | 466.74 | |||
5/14/2007 | 470.3 | |||
5/21/2007 | 483.52 | |||
5/29/2007 | 500.4 | |||
6/4/2007 | 515 | |||
6/11/2007 | 505.89 | |||
6/18/2007 | 524.98 | |||
6/25/2007 | 522.7 | |||
7/2/2007 | 539.4 | |||
7/9/2007 | 552.16 | |||
7/16/2007 | 520.12 | |||
7/23/2007 | 511.89 | |||
7/30/2007 | 503 | |||
8/6/2007 | 515.75 | |||
8/13/2007 | 500.04 | |||
8/20/2007 | ||||
8/27/2007 | 515.25 | |||
9/4/2007 | 519.35 | |||
9/10/2007 | 528.75 | |||
9/17/2007 | 560 | |||
9/24/2007 | 567 | |||
10/1/2007 | 594.05 | |||
10/8/2007 | 637.39 | |||
10/15/2007 | 644.71 | |||
10/22/2007 | 674.6 | |||
10/29/2007 | 711.25 | |||
11/5/2007 | 663.97 | |||
11/12/2007 | 633.63 | |||
11/19/2007 | 676.7 | |||
11/26/2007 | 693 | |||
12/3/2007 | 714.87 | |||
12/10/2007 | 689.96 | |||
12/17/2007 | 696.69 | |||
12/24/2007 | 702.53 | |||
12/31/2007 | 657 | |||
1/7/2008 | 638.25 | |||
1/14/2008 | 600.25 | |||
1/22/2008 | 566.4 | |||
1/28/2008 | 515.9 | |||
2/4/2008 | 516.69 | |||
2/11/2008 | 529.64 | |||
2/19/2008 | 507.8 | |||
2/25/2008 | 471.18 | |||
3/3/2008 | 433.35 | |||
3/10/2008 | 437.92 | |||
3/17/2008 | 433.55 | |||
3/24/2008 | 438.08 | |||
3/31/2008 | 471.09 | |||
4/7/2008 | 457.45 | |||
4/14/2008 | 539.41 | |||
4/21/2008 | 544.06 | |||
4/28/2008 | 581.29 | |||
5/5/2008 | 573 | |||
5/12/2008 | 580 | |||
5/19/2008 | 544.62 | |||
5/27/2008 | 585.8 | |||
6/2/2008 | ||||
6/9/2008 | 571.51 | |||
6/16/2008 | 546.43 | |||
6/23/2008 | 528.07 | |||
6/30/2008 | 537 | |||
7/7/2008 | 533.8 | |||
7/14/2008 | 481.32 | |||
7/21/2008 | 491.98 | |||
7/28/2008 | 467.86 | |||
8/4/2008 | 495.01 | |||
8/11/2008 | 510.15 | |||
8/18/2008 | 490.59 | |||
8/25/2008 | 463.29 | |||
9/2/2008 | 444.25 | |||
9/8/2008 | 437.66 | |||
9/15/2008 | 449.15 | |||
9/22/2008 | 431.04 | |||
9/29/2008 | 386.91 | |||
10/6/2008 | 332 | |||
10/13/2008 | 372.5 | |||
10/20/2008 | 339.29 | |||
10/27/2008 | 359.36 | |||
11/3/2008 | 331.14 | |||
11/10/2008 | 310.02 | |||
11/17/2008 | 262.43 | |||
11/24/2008 | 292.96 | |||
12/1/2008 | 283.99 | |||
12/8/2008 | 315.76 | |||
12/15/2008 | 310.17 | |||
12/22/2008 | 300.36 | |||
12/29/2008 | 321.32 | |||
1/5/2009 | 315.07 | |||
1/12/2009 | 299.67 | |||
1/20/2009 | 324.7 | |||
1/26/2009 | 338.53 | |||
2/2/2009 | 371.28 | |||
2/9/2009 | 357.68 | |||
2/17/2009 | 346.45 | |||
2/23/2009 | 337.99 | |||
3/2/2009 | 308.57 | |||
3/9/2009 | 324.42 | |||
3/16/2009 | 330.16 | |||
3/23/2009 | 347.7 | |||
3/30/2009 | 369.78 | |||
4/6/2009 | ||||
4/13/2009 | 392.24 | |||
4/20/2009 | 389.49 | |||
4/27/2009 | 393.69 | |||
5/4/2009 | 407.33 | |||
5/11/2009 | ||||
5/18/2009 | 393.5 | |||
5/26/2009 | 417.23 | |||
6/1/2009 | 444.32 | |||
6/8/2009 | 424.84 | |||
6/15/2009 | 420.09 | |||
6/22/2009 | 425.32 | |||
6/29/2009 | 408.49 | |||
7/6/2009 | 414.4 | |||
7/13/2009 | 430.25 | |||
7/20/2009 | 446.72 | |||
7/27/2009 | 443.05 | |||
8/3/2009 | 457.1 | |||
8/10/2009 | ||||
8/17/2009 | 465.24 | |||
8/24/2009 | 464.75 | |||
8/31/2009 | 461.3 | |||
9/8/2009 | 472.14 | |||
9/14/2009 | 491.46 | |||
9/21/2009 | 492.48 | |||
9/28/2009 | 484.58 | |||
10/5/2009 | 516.25 | |||
10/12/2009 | 549.85 | |||
10/19/2009 | 553.69 | |||
10/26/2009 | 536.12 | |||
11/2/2009 | 551.1 | |||
11/9/2009 | 572.05 | |||
11/16/2009 | 569.96 | |||
11/23/2009 | 579.76 | |||
11/30/2009 | 585.01 | |||
12/7/2009 | 590 | |||
12/14/2009 | 596.42 | |||
12/21/2009 | 618.48 | |||
12/28/2009 | 619.98 | |||
1/4/2010 | 602.02 | |||
1/11/2010 | ||||
1/19/2010 | 550.01 | |||
1/25/2010 | 529.94 | |||
2/1/2010 | 531.29 | |||
2/8/2010 | 533.12 | |||
2/16/2010 | 540.76 | |||
2/22/2010 | 526.8 | |||
3/1/2010 | 564.21 | |||
3/8/2010 | 579.54 | |||
3/15/2010 | ||||
3/22/2010 | 562.69 | |||
3/29/2010 | 568.8 | |||
4/5/2010 | 566.22 | |||
4/12/2010 | 550.15 | |||
4/19/2010 | 544.99 | |||
4/26/2010 | 525.7 | |||
5/3/2010 | 493.14 | |||
5/10/2010 | 507.53 | |||
5/17/2010 | 472.05 | |||
5/24/2010 | 485.63 | |||
6/1/2010 | 498.72 | |||
6/7/2010 | 488.5 | |||
6/14/2010 | 500.03 | |||
6/21/2010 | 472.68 | |||
6/28/2010 | 436.55 | |||
7/6/2010 | 467.49 | |||
7/12/2010 | 459.61 | |||
7/19/2010 | 490.06 | |||
7/26/2010 | 484.85 | |||
8/2/2010 | 500.22 | |||
8/9/2010 | 486.35 | |||
8/16/2010 | 462.02 | |||
8/23/2010 | 458.83 | |||
8/30/2010 | ||||
9/7/2010 | 476.14 | |||
9/13/2010 | 490.15 | |||
9/20/2010 | 527.29 | |||
9/27/2010 | 525.62 | |||
10/4/2010 | 536.35 | |||
10/11/2010 | 601.45 | |||
10/18/2010 | 612.53 | |||
10/25/2010 | 613.7 | |||
11/1/2010 | 625.08 | |||
11/8/2010 | 603.29 | |||
11/15/2010 | 590.8 | |||
11/22/2010 | ||||
11/29/2010 | ||||
12/6/2010 | 592.21 | |||
12/13/2010 | ||||
12/20/2010 | 604.23 | |||
12/27/2010 | 593.97 | |||
1/3/2011 | 616.44 | |||
1/10/2011 | 624.18 | |||
1/18/2011 | 611.83 | |||
1/24/2011 | 600.99 | |||
1/31/2011 | 610.98 | |||
2/7/2011 | 624.5 | |||
2/14/2011 | 630.08 | |||
2/22/2011 | 610.04 | |||
2/28/2011 | 600.62 | |||
3/7/2011 | 576.71 | |||
3/14/2011 | 561.06 | |||
3/21/2011 | 579.74 | |||
3/28/2011 | 591.8 | |||
4/4/2011 | 578.16 | |||
4/11/2011 | 530.7 | |||
4/18/2011 | 525.1 | |||
4/25/2011 | 544.1 | |||
5/2/2011 | 535.3 | |||
5/9/2011 | 529.55 | |||
5/16/2011 | 524.03 | |||
5/23/2011 | 520.9 | |||
5/31/2011 | 523.08 | |||
6/6/2011 | 509.51 | |||
6/13/2011 | 485.02 | |||
6/20/2011 | 474.88 | |||
6/27/2011 | 521.03 | |||
7/5/2011 | 531.99 | |||
7/11/2011 | 597.62 | |||
7/18/2011 | 618.23 | |||
7/25/2011 | 603.69 | |||
8/1/2011 | 579.04 | |||
8/8/2011 | 563 | |||
8/15/2011 | 490.92 | |||
8/22/2011 | 526.86 | |||
8/29/2011 | 524.84 | |||
9/6/2011 | 524.85 | |||
9/12/2011 | 546.68 | |||
9/19/2011 | 525.51 | |||
9/26/2011 | 515.04 | |||
10/3/2011 | 515.12 | |||
10/10/2011 | 591.68 | |||
10/17/2011 | 590.49 | |||
10/24/2011 | 600.14 | |||
10/31/2011 | 596.14 | |||
11/7/2011 | 608.35 | |||
11/14/2011 | 594.88 | |||
11/21/2011 | ||||
11/28/2011 | 620.36 | |||
12/5/2011 | 627.42 | |||
12/12/2011 | 625.96 | |||
12/19/2011 | 633.14 | |||
12/27/2011 | 645.9 | |||
1/3/2012 | 650.02 | |||
1/9/2012 | 624.99 | |||
1/17/2012 | 585.99 | |||
1/23/2012 | 579.98 | |||
1/30/2012 | 596.33 | |||
2/6/2012 | 605.91 | |||
2/13/2012 | 604.64 | |||
2/21/2012 | 609.9 | |||
2/27/2012 | 621.25 | |||
3/5/2012 | ||||
3/12/2012 | 625.04 | |||
3/19/2012 | 642 | |||
3/26/2012 | 641.24 | |||
4/2/2012 | 632.32 | |||
4/9/2012 | 624.6 | |||
4/16/2012 | 596.06 | |||
4/23/2012 | 614.98 | |||
4/30/2012 | 596.97 | |||
5/7/2012 | 605.23 | |||
5/14/2012 | 600.4 | |||
5/21/2012 | 591.53 | |||
5/29/2012 | 570.98 | |||
6/4/2012 | 580.45 | |||
6/11/2012 | 564.51 | |||
6/18/2012 | 571.48 | |||
6/25/2012 | 580.07 | |||
7/2/2012 | 585.98 | |||
7/9/2012 | 576.52 | |||
7/16/2012 | 610.82 | |||
7/23/2012 | 634.96 | |||
7/30/2012 | 641.33 | |||
8/6/2012 | ||||
8/13/2012 | 677.14 | |||
8/20/2012 | 678.63 | |||
8/27/2012 | 685.09 | |||
9/4/2012 | 706.15 | |||
9/10/2012 | 709.68 | |||
9/17/2012 | 733.99 | |||
9/24/2012 | 754.5 | |||
10/1/2012 | 767.65 | |||
10/8/2012 | 744.75 | |||
10/15/2012 | 681.79 | |||
10/22/2012 | 675.15 | |||
10/31/2012 | 687.92 | |||
11/5/2012 | 663.03 | |||
11/12/2012 | 647.18 | |||
11/19/2012 | 667.97 | |||
11/26/2012 | 698.37 | |||
12/3/2012 | 684.21 | |||
12/10/2012 | 701.96 | |||
12/17/2012 | 715.63 | |||
12/24/2012 | 700.01 | |||
12/31/2012 | 737.97 | |||
1/7/2013 | 739.99 | |||
1/14/2013 | 704.51 | |||
1/22/2013 | 753.67 | |||
1/28/2013 | 775.6 | |||
2/4/2013 | 785.37 | |||
2/11/2013 | 792.89 | |||
2/19/2013 | 799.71 | |||
2/25/2013 | 806.19 | |||
3/4/2013 | 831.52 | |||
3/11/2013 | 814.3 | |||
3/18/2013 | 810.31 | |||
3/25/2013 | 794.19 | |||
4/1/2013 | 783.05 | |||
4/8/2013 | 790.05 | |||
4/15/2013 | 799.87 | |||
4/22/2013 | 801.42 | |||
4/29/2013 | 845.72 | |||
5/6/2013 | 880.23 | |||
5/13/2013 | 909.18 | |||
5/20/2013 | 873.32 | |||
5/28/2013 | 871.22 | |||
6/3/2013 | 879.73 | |||
6/10/2013 | 875.04 | |||
6/17/2013 | 880.93 | |||
6/24/2013 | 880.37 | |||
7/1/2013 | 893.49 | |||
7/8/2013 | 923 | |||
7/15/2013 | 896.6 | |||
7/22/2013 | 885.35 | |||
7/29/2013 | 906.57 | |||
8/5/2013 | 890.41 | |||
8/12/2013 | 856.91 | |||
8/19/2013 | 870.21 | |||
8/26/2013 | 846.9 | |||
9/3/2013 | 879.58 | |||
9/9/2013 | 889.07 | |||
9/16/2013 | 903.11 | |||
9/23/2013 | 876.39 | |||
9/30/2013 | 872.35 | |||
10/7/2013 | 871.99 | |||
10/14/2013 | 1011.41 | |||
10/21/2013 | 1015.2 | |||
10/28/2013 | 1027.04 |
CSC425– Time series analysis and forecasting
Homework 4 – Solutions
Problem 1 [just for fun!] – NO CREDIT
I noticed some students had questions regarding the SARIMA notation. Write the model expression for
the following SARIMA models for the {Xt} time series.
Example: SARIMA(2,0,0)(1,0,0)[12]
The notation indicates a regular AR(2) model and a seasonal AR(1)[12] model:
(1 - φ1B - φ2B^2)(1 - Φ1B^12)Xt = at
a) SARIMA(2,1,0)(1,1,0)[4]
b) SARIMA(1,0,1)(0,0,1)[12]
c) SARIMA(0,1,3)(0,1,1)[4]
In general SARIMA(p,d,q)(P,D,Q)[s] where the first set (p,d,q) describes the regular
ARIMA(p,d,q) model, and the second set (P,D,Q)[s] describes the order of the seasonal
component.
a) The first set (2,1,0) identifies a first difference + AR(2) model. The second set (1,1,0) indicates the seasonal component, with a seasonal differencing and a seasonal AR(1) model with period 4. Thus
SARIMA(2,1,0)(1,1,0)[4]:
(1 - φ1B - φ2B^2)(1 - Φ1B^4)(1 - B)(1 - B^4)Xt = at
b) The first set (1,0,1) indicates an ARMA(1,1) model. The second set (0,0,1) is for the seasonal component, indicating a seasonal MA(1) model with period 12. Thus
SARIMA(1,0,1)(0,0,1)[12]:
(1 - φ1B)Xt = (1 - Θ1B^12)(1 - θ1B)at
c) The first set (0,1,3) indicates an ARIMA(0,1,3) model, i.e. first difference + MA(3) model. The second set (0,1,1) indicates a seasonal differencing and a seasonal MA(1) model with period 4. Thus
SARIMA(0,1,3)(0,1,1)[4]:
(1 - B)(1 - B^4)Xt = (1 - Θ1B^4)(1 - θ1B - θ2B^2 - θ3B^3)at
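For reference, these three models map directly onto R's arima() interface; a sketch, where xt stands for a generic series observed at the stated seasonal period:

# a) SARIMA(2,1,0)(1,1,0)[4]
ma <- arima(xt, order=c(2,1,0), seasonal=list(order=c(1,1,0), period=4))
# b) SARIMA(1,0,1)(0,0,1)[12]
mb <- arima(xt, order=c(1,0,1), seasonal=list(order=c(0,0,1), period=12))
# c) SARIMA(0,1,3)(0,1,1)[4]
mc <- arima(xt, order=c(0,1,3), seasonal=list(order=c(0,1,1), period=4))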
Problem 2 [15pts]
Consider the quarterly earnings per share (EPS) of Costco from the last quarter of 2001 to the third
quarter of 2013 fiscal year. The data were obtained from the Nasdaq website and are saved in the
costco.csv file.
a) Analyze the distribution of the quarterly earnings. Are the data normally distributed? Provide
appropriate statistics, tests and graphs to support your conclusions.
The average quarterly earnings are $0.66 per share with a standard deviation of $0.27. The distribution of eps has a slight positive skew, as shown by the histogram and normal quantile plot (points under the right tail diverge from the diagonal). The normality test rejects the hypothesis of normality for the earnings data at the 5% level. (See SAS / R output.)
The UNIVARIATE Procedure
Variable: eps
Moments
N 48 Sum Weights 48
Mean 0.66270833 Sum Observations 31.81
Std Deviation 0.26970228 Variance 0.07273932
Skewness 0.95652585 Kurtosis 0.80234438
Uncorrected SS 24.4995 Corrected SS 3.41874792
Coeff Variation 40.6969796 Std Error Mean 0.03892817
Results of Jarque and Bera test on normality
Jarque & P-value for
skewness, kurtosis, Bera Jarque &
Obs eps eps statistic Bera test
1 0.95653 0.80234 8.60705 0.013521
b) Apply the log transformation to stabilize the variance and analyze the variable yt = log(eps) at time t.
The distribution of log(eps) is closer to the normal distribution, with skewness and kurtosis statistics close to 0, as also shown by the histogram and normal quantile plot. The Jarque-Bera test is not significant (p-value > 0.05), indicating that the hypothesis of a normal distribution for log(eps) cannot be rejected.
Moments
N 48 Sum Weights 48
Mean -0.4891602 Sum Observations -23.479688
Std Deviation 0.39954911 Variance 0.15963949
Skewness 0.0058175 Kurtosis -0.398239
Uncorrected SS 18.9883843 Corrected SS 7.50305604
Coeff Variation -81.68063 Std Error Mean 0.05766995
Results of Jarque and Bera test on normality
Jarque & P-value for
skewness, kurtosis, Bera Jarque &
Obs xlog xlog statistic Bera test
1 .005817502 -0.39824 0.31746 0.85323
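The same normality checks can be reproduced in R with the fBasics package already used in the R code below (a sketch; eps is the earnings column of costco.csv):

library(fBasics)
eps <- read.table("costco.csv", header=TRUE, sep=",")$eps
basicStats(eps)            # mean, sd, skewness, kurtosis
jarqueberaTest(eps)        # Jarque-Bera test on raw earnings
jarqueberaTest(log(eps))   # Jarque-Bera test on log earnings
qqnorm(log(eps)); qqline(log(eps))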
c) Is there any evidence of a seasonal effect in the log(eps) data? Explain your answer.
The time plots for Costco quarterly earnings and for log(earnings) show quarterly seasonality, with higher earnings in the first and third quarters and lower earnings in the second and fourth quarters. We can observe a slight dip in earnings in 2009, but the recent economic crisis didn't seem to affect Costco earnings much, as earnings increased shortly after the dip. The log(eps) time plot shows that variability is more constant over time.
[Figure: Costco earnings. Time plot of quarterly EPS vs. date, 2000 to 2014.]
d) Compute the Dickey-Fuller test for the quarterly earnings for p=1, 3, and 5. Write down the test hypotheses and the test statistics, and analyze the results of the test.
The Dickey-Fuller test is expressed as H0: φ1 = 1 vs Ha: φ1 < 1, where the null hypothesis indicates unit-root non-stationarity.
The Dickey-Fuller test shows that the earnings time series is unit-root non-stationary. The test p-values for lags 1, 3 and 5 are all larger than 0.1, therefore the null hypothesis of non-stationarity cannot be rejected.
The ARIMA Procedure
Augmented Dickey-Fuller Unit Root Tests
Type Lags Rho Pr < Rho Tau Pr < Tau F Pr > F
Zero Mean 1 1.1538 0.9282 1.37 0.9554
3 1.2386 0.9381 4.43 0.9999
5 1.1075 0.9213 1.91 0.9852
Single Mean 1 -0.7726 0.9018 -0.28 0.9193 1.22 0.7642
3 1.8677 0.9962 2.06 0.9998 10.02 0.0010
5 0.4869 0.9736 0.24 0.9722 1.86 0.6076
Trend 1 -20.9931 0.0306 -2.77 0.2161 4.22 0.3620
3 0.1628 0.9958 0.06 0.9958 2.31 0.7221
5 93.4268 0.9999 -2.07 0.5472 2.53 0.6813
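The corresponding test in R uses the fUnitRoots package employed later in this document (a sketch; eps is the earnings column read from costco.csv as above, and type="c" fits the single-mean case):

library(fUnitRoots)
for (p in c(1, 3, 5)) print(adfTest(eps, lags=p, type="c"))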
e) Take the first difference of log(eps) series, and analyze its autocorrelation function. Does the
differenced time series show strong seasonal patterns? Explain.
The ACF of the first difference of yt = log(eps), say yt - yt-1, shows larger values at lags that are multiples of four, which is consistent with the seasonal behavior seen in the time plot. The TS is not stationary since it shows seasonality.
[Figure: Time plot of Costco log(EPS) (xlog) vs. date, 2000 to 2014.]
f) If you de-trend and de-seasonalize the Costco log(eps) data, do you obtain a stationary time series? Use appropriate statistical techniques and graphs to analyze stationarity.
The de-trended and de-seasonalized time series is stationary, as indicated by the Dickey-Fuller test (p-values are small, so the hypothesis of non-stationarity can be rejected) and by the small ACF values. After differencing the time series, we can assume a zero-mean model for the data. Note that the de-seasonalized TS has zero mean (use type="nc" in R).
The ARIMA Procedure
Augmented Dickey-Fuller Unit Root Tests
Type Lags Rho Pr < Rho Tau Pr < Tau F Pr > F
Zero Mean 1 -39.9069 <.0001 -4.17 0.0001
3 85.9684 0.9999 -5.29 <.0001
5 28.6760 0.9999 -4.61 <.0001
Single Mean 1 -39.7843 0.0003 -4.10 0.0024 8.49 0.0010
3 85.8868 0.9999 -5.19 0.0002 13.58 0.0010
5 28.7344 0.9999 -4.51 0.0009 10.30 0.0010
Trend 1 -39.6292 <.0001 -4.04 0.0149 8.22 0.0184
3 86.3878 0.9999 -5.10 0.0009 13.11 0.0010
5 28.3836 0.9999 -4.48 0.0053 10.04 0.0010
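In R, the corresponding check on the differenced series can be written as follows (a sketch; sdy is the regular plus seasonal difference of log(eps), and type="nc" matches the zero-mean case noted above):

y <- log(eps)
sdy <- diff(diff(y), 4)           # first difference, then seasonal (lag-4) difference
adfTest(sdy, lags=3, type="nc")   # zero-mean ADF test
acf(as.vector(sdy), lag.max=16)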
g) Apply an airline model SARIMA(0,1,1)(0,1,1)[4] to estimate log(eps).
a. Write down the estimated expression of the fitted model
(1 - B)(1 - B^4)Xt = (1 - 0.339B)(1 - 0.669B^4)at (SAS output)
The ARIMA Procedure
Maximum Likelihood Estimation
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
MA1,1 0.33862 0.14912 2.27 0.0232 1
MA2,1 0.66864 0.13492 4.96 <.0001 4
Variance Estimate 0.012082
Std Error Estimate 0.109917
AIC -63.4007
SBC -59.8783
Number of Residuals 43
b. Are all model coefficients significant?
All coefficients of the SARIMA model are significant (p-value < 0.05).
c. Analyze the residuals to check the model validity
The autocorrelation analysis of the residuals shows that the ACF at lag 5 is significantly larger than zero. This is also reflected in the Ljung-Box test, whose p-value at lag 6 is borderline (0.0509). These results indicate that there is some source of serial correlation that is still unexplained by the model. Since only one ACF value is large, this is not indicative of a seasonal trend. The solution is to add a lag-5 parameter to the MA component of the SARIMA model.
Autocorrelation Check of Residuals
To Chi- Pr >
Lag Square DF ChiSq ——————–Autocorrelations——————–
6 9.44 4 0.0509 -0.013 0.104 -0.198 -0.129 -0.346 -0.049
12 12.75 10 0.2380 0.002 0.184 0.019 0.153 -0.000 0.023
18 20.68 16 0.1911 -0.185 0.066 -0.179 -0.065 -0.044 0.189
24 21.88 22 0.4673 -0.014 -0.055 -0.066 0.054 -0.053 -0.009
h) The multiplicative airline model has some limitations; how can you modify it to be a better fit for the data? (If you are not sure how to do this, post a message on the discussion forum, and start a discussion with other students!)
a. Write down the estimated expression of the fitted model
(1 - B)(1 - B^4)Xt = (1 - 0.7235B^4)(1 - 0.4103B - 0.3726B^5)at (SAS output)
The ARIMA Procedure
Maximum Likelihood Estimation
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
MA1,1 0.41029 0.16632 2.47 0.0136 1
MA1,2 0.37262 0.16359 2.28 0.0227 5
MA2,1 0.72350 0.13731 5.27 <.0001 4
Variance Estimate 0.010536
Std Error Estimate 0.102645
AIC -66.3638
SBC -61.0802
Number of Residuals 43
b. Are all model coefficients significant?
All coefficients are significantly different from zero.
c. Analyze the residuals to check the model validity
The residuals are white noise. The Ljung-Box test for lags up to 24 is not significant, indicating that the model is able to capture the time dependencies in the data.
To Chi- Pr >
Lag Square DF ChiSq ——————–Autocorrelations——————–
6 3.77 3 0.2870 -0.059 0.091 -0.202 -0.134 -0.078 0.028
12 5.94 9 0.7457 0.026 0.140 -0.069 0.078 -0.055 0.062
18 13.34 15 0.5759 -0.109 0.071 -0.205 -0.086 -0.066 0.182
24 14.27 21 0.8577 0.005 -0.042 -0.079 0.018 -0.033 -0.032
i) Which model is a better fit for the data? Explain what evidence you use to select the best model.
The SARIMA(0,1,5)(0,1,1)[4] model (with MA lags 1 and 5) provides a better description of the dynamic behavior of the Costco log-earnings time series. Its coefficients are significantly different from zero, and its residuals are white noise and normally distributed. The second model also has the lowest AIC/SBC values.
j) Compute forecasts for Costco quarterly EPS for the next 4 quarters using the selected model in h).
The table below reports forecasts in terms of EPS values. The forecasts of log(eps) computed by
SARIMA model were transformed to EPS original scale using the exp() function.
Obs date OBSERVED FORECAST STD L95 U95 RESIDUAL x
49 10/01/2013 . 0.97486 0.10264 0.79302 1.18584 . .
50 01/01/2014 . 1.15743 0.11916 0.90987 1.45159 . .
51 04/01/2014 . 1.00514 0.13366 0.76662 1.29454 . .
52 07/01/2014 . 1.51769 0.14672 1.12620 2.00170 . .
R code:
# Analysis of Costco quarterly earnings per share (EPS),
# last quarter of fiscal 2001 to third quarter of fiscal 2013.
# Quarterly seasonality is expected in earnings data.
#1. LOAD LIBRARIES
library(tseries)
library(fBasics)
library(forecast)
#2. IMPORT DATA
# Load data into the data frame "myd"
myd=read.table("costco.csv", header=T, sep=',')
# create time series object
x=myd$eps
xts=ts(myd$eps, start=c(2001,4), frequency=4)
# create time plot
plot(xts, type='l')
# TRANSFORMATIONS
# natural log
y=log(x)
yts=log(xts)
plot(yts, type='l')
# add circles on data points in the time plot
points(yts)
# ACF ANALYSIS
# one plot per page
par(mfcol=c(1,1))
# ACF plot for log(eps)
# as.vector() transforms time series to regular numerical vector
acf(as.vector(y), lag.max=24, main="ACF of log(eps)")
# APPLYING DIFFERENCING TO DATA
# compute regular differences
dy=diff(y)
# create acf plot
acf(as.vector(dy), lag.max=16, main="ACF of differenced log(eps)")
# compute seasonal difference for quarterly data (s=4)
sdy=diff(dy,4)
# create acf plot
acf(as.vector(sdy), lag.max=16, main="ACF of seasonally differenced log(eps)")
pacf(sdy)
# FIT SARIMA MODEL
library(forecast)
# try automated order selection
# since the TS is non-stationary, stationary = FALSE
auto.arima(y, trace=T, stationary=FALSE)
# fit multiplicative seasonal model ARIMA(1,0,0)x(1,1,0)[4]
m1=arima(y, order=c(1,0,0), seasonal=list(order=c(1,1,0), period=4), method="ML")
source("arimaPrint.R")
# use arimaPrint() function in arimaPrint.R file
# save arimaPrint.R file in work directory
arimaPrint(m1)
acf(m1$residuals)
qqnorm(m1$residuals)
qqline(m1$residuals)
hist(m1$residuals)
# Since the seasonal AR(1) coefficient is not significant, remove it from the model;
# fit airline seasonal model ARIMA(0,1,1)x(0,1,1)[4]
m1=arima(y, order=c(0,1,1), seasonal=list(order=c(0,1,1), period=4), method="ML")
m1
# fit seasonal model ARIMA(0,1,5)x(0,1,1)[4] with only MA lags 1 and 5
m1=arima(y, order=c(0,1,5), seasonal=list(order=c(0,1,1), period=4),
method="ML", fixed=c(NA,0,0,0,NA,NA))
m1
# use arimaPrint() function in arimaPrint.R file
arimaPrint(m1, c(1,5,6))
# One plot per page
par(mfcol=c(1,1))
acf(m1$residuals)
# "approximated" Ljung-Box test on residuals
Box.test(m1$residuals, 6, "Ljung-Box", fitdf=3)
Box.test(m1$residuals, 12, "Ljung-Box", fitdf=3)
# compute predictions for log data up to 8 steps ahead
f1=forecast(m1, h=8)
names(f1)
plot(f1)
# predictions in original scale
exp(f1$mean)
# create forecast plot for the original variable
# join data with forecasts
s1=c(x, exp(f1$mean))
# s1=ts(s1, frequency=4, start=c(2001,4))
# lower limit of 95% interval
lcl=c(x, exp(f1$lower[,2]))
# upper limit of 95% interval
ucl=c(x, exp(f1$upper[,2]))
# Forecast plot
plot(s1, type='l', col='red')
lines(1:length(s1), ucl, lty=2)
lines(1:length(s1), lcl, lty=2)
SAS code
title1 'Costco earnings';
proc import datafile="costco.csv" out=myd replace;
getnames=yes;
delimiter=',';
run;
proc print;
run;
data myd;
set myd;
xlog=log(eps);
run;
*analyze distribution of log(eps);
proc univariate data=myd;
var xlog;
histogram/normal;
qqplot / normal(mu=est sigma=est);
output out=stats kurtosis=kurtosis skewness=skewness N=ntot;
run;
/* steps to compute the Jarque-Bera test and print the results to the OUTPUT window */
data computation;
set stats;
label pv_kur = "P-value for kurtosis test";
*skewness statistic;
skew_test = skewness/sqrt(6/Ntot);
*kurtosis statistic;
kurt_test = kurtosis/sqrt(24/Ntot);
*Jarque-Bera statistic;
jb = skew_test*skew_test+kurt_test*kurt_test;
*JB test p-value;
pv_jb = 1-cdf('CHISQUARE', jb, 2);
label pv_jb = "P-value for Jarque & Bera test"
      jb = "Jarque & Bera statistic";
/* Print out results of tests */
Title "Results of Jarque and Bera test on normality";
proc print data=computation label;
var skewness kurtosis jb pv_jb;
run;
*plot time plots for all transformed variables;
symbol1 i=join v=dot;
proc gplot data=myd;
title1 "Time plot of Costco EPS";
plot eps * date = 1;
title1 "Time plot of Costco log(EPS)";
plot xlog * date = 1;
run;
proc arima data=myd;
identify var=eps nlag=24 stationarity=(adf=(1 3 5));
identify var=xlog(1) nlag=24;
identify var=xlog(1,4) nlag=20 stationarity=(adf=(1 3 5)) minic;
identify var=xlog(1,4) nlag=20 esacf;
run;
*airline model SARIMA(0,1,1)(0,1,1)[4];
estimate q=(1)(4) noconstant method=ml;
*modified model SARIMA(0,1,5)(0,1,1)[4] with MA lags 1 and 5;
estimate q=(1 5)(4) noconstant method=ml;
outlier maxnum=3 alpha=0.01;
run;
forecast out=b lead=12 id=date interval=quarter;
quit;
/* Retransform the forecast values to get forecasts in the original scale. */
data c;
set b;
x = exp(xlog);
forecast = exp(forecast + std*std/2);
l95 = exp(l95);
u95 = exp(u95);
run;
proc print data=c;
run;
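The same retransformation can be done in R; a sketch, assuming m1 is the fitted SARIMA model for log(eps) from the R code above (exp(pred + se^2/2) is the lognormal mean correction used in the SAS step):

pr <- predict(m1, n.ahead=4, se.fit=TRUE)
eps.fc <- exp(pr$pred + pr$se^2/2)   # bias-corrected point forecasts in EPS scale
lcl <- exp(pr$pred - 1.96*pr$se)     # lower 95% limit, original scale
ucl <- exp(pr$pred + 1.96*pr$se)     # upper 95% limit, original scale
cbind(eps.fc, lcl, ucl)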
Problem 3 [12 points] (review Week 5 notes)
Global temperatures are rising. Analyze the time series of Land-Ocean Temperature Index in degrees
Celsius published at the NASA-GISS (Goddard Institute for Space Studies) and saved in dataset
globtemp.csv. The temperature index measures temperature anomalies that are computed relative
to the base period 1951-1980. The reason to work with anomalies, rather than absolute
temperature is that absolute temperature varies markedly in short distances, while monthly or
annual temperature anomalies are representative of a much larger region (from GISS website).
a) Plot the temperature index time series and its ACFs (20 lags). Analyze trends and patterns
shown by the data.
The time plot shows that global temperatures have steadily increased since 1950. The ACF plot shows that the Global Temperature Index TS is non-stationary (the ACF decays very slowly to zero, which is not surprising given the upward trend shown by the time plot).
b) Analyze whether the series is stationary using both the ACF function and the Dickey-Fuller test.
The DF tests computed for AR(3) and AR(5) models show that the null hypothesis of unit-root non-stationarity cannot be rejected (p-values are > 0.1). (NOTE: for this dataset, it would be acceptable to use the DF test either for a model with intercept (constant) but no time trend, or for a model with intercept (constant) and a time trend. Results are equivalent.)
> # Dickey Fuller test for time series with time trend
> library(fUnitRoots)
> adfTest(index, lags=3, type=c("ct"))
Title:
Augmented Dickey-Fuller Test
Test Results:
PARAMETER:
Lag Order: 3
STATISTIC:
Dickey-Fuller: -2.4227
P VALUE:
0.4007
> adfTest(index, lags=5, type=c("ct"))
Title:
Augmented Dickey-Fuller Test
Test Results:
PARAMETER:
Lag Order: 5
STATISTIC:
Dickey-Fuller: -1.9472
P VALUE:
0.5984
In SAS
Augmented Dickey-Fuller Unit Root Tests
Type Lags Rho Pr < Rho Tau Pr < Tau F Pr > F
Zero Mean 1 -3.1286 0.2229 -0.97 0.2967
3 0.3095 0.7561 0.16 0.7308
5 0.6287 0.8363 0.36 0.7864
Single Mean 1 -2.9490 0.6593 -0.91 0.7827 0.74 0.8811
3 0.5894 0.9790 0.31 0.9780 1.13 0.7831
5 1.0265 0.9887 0.63 0.9900 1.84 0.6012
Trend 1 -26.5191 0.0132 -3.75 0.0225 7.46 0.0207
3 -11.2587 0.3399 -2.42 0.3659 4.15 0.3489
5 -8.1422 0.5607 -1.95 0.6245 3.26 0.5268
c) Use order selection methods such as EACF or BIC to identify the order of the "best" AR/MA/ARMA model for the global temperature index time series. Note that since the TS is not stationary, you should use an ARIMA(p,1,q) model.
Since the TS is unit-root non-stationary, only ARIMA(p,1,q) models will be evaluated.
In R: The EACF method suggests an ARIMA(1,1,2). The auto.arima() function selects the
ARIMA(0,1,1) model that exhibits the lowest BIC value.
In SAS: the EACF method suggests an ARIMA(0,1,1) model and the BIC model selects the
ARIMA(3,1,0)
> # EACF method
> library(TSA)
> # use numeric object not ts object; coredata() retrieves the numeric vector
> d1_ts=diff(indexts)
> eacf(coredata(d1_ts))
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x o o x o o o o x o o o x o
1 x o o o o o o o o o o o o o
2 x o x o o o o o o o o o x o
3 o x x o o o o o o o o o o o
4 o x x x o o o o o o o o o o
5 x o x x x o o o o o o o o o
6 x o o o x o o o o o o o o o
7 x o o o x x o o o o o o o o
> # BIC criterion
> # use numeric object not ts object
> auto.arima(index, max.p = 5, max.q = 5, stationary = F, ic = c("bic"), trace=TRUE, allowdrift=F)
ARIMA(2,1,2) : 982.1547
ARIMA(0,1,0) : 990.837
ARIMA(1,1,0) : 985.1276
ARIMA(0,1,1) : 972.8737
ARIMA(1,1,1) : 976.1241
ARIMA(0,1,2) : 973.7256
ARIMA(1,1,2) : 979.3824
Best model: ARIMA(0,1,1)
Series: index
ARIMA(0,1,1)
Coefficients:
ma1
-0.5398
s.e. 0.0838
sigma^2 estimated as 89.26: log likelihood=-483.91
AIC=971.82 AICc=971.92 BIC=977.59
In SAS
ESACF Probability Values
Lags MA 0 MA 1 MA 2 MA 3 MA 4 MA 5
AR 0 0.0011 0.0635 0.4021 0.0552 0.1309 0.8882
AR 1 <.0001 0.7810 0.3314 0.3416 0.1542 0.5998
AR 2 <.0001 0.6174 0.0018 0.7358 0.1896 0.8318
AR 3 0.4231 0.0002 0.0001 0.1756 0.6531 0.7922
AR 4 0.2412 <.0001 0.0018 0.0597 0.3335 0.6631
AR 5 <.0001 0.4323 0.0527 0.0004 0.0120 0.9065
The ARIMA Procedure
ARMA(p+d,q)
Tentative
Order
Selection
Tests
—ESACF—
p+d q
0 1
3 3
(5% Significance Level)
Name of Variable = temp
Minimum Information Criterion
Lags MA 0 MA 1 MA 2 MA 3 MA 4 MA 5
AR 0 4.588787 4.487776 4.468956 4.495152 4.505573 4.533958
AR 1 4.541124 4.523001 4.480694 4.48462 4.503237 4.540091
AR 2 4.499424 4.522147 4.516266 4.503816 4.537365 4.574281
AR 3 4.441992 4.477332 4.494755 4.527052 4.558184 4.586585
AR 4 4.477766 4.514051 4.522759 4.556779 4.577564 4.612352
AR 5 4.484678 4.521373 4.55314 4.590039 4.610067 4.637321
Error series model: AR(10)
Minimum Table Value: BIC(3,0) = 4.441992
d) Fit the selected ARIMA model, and write down the result.
The ARIMA(1,1,2) model selected by the EACF method in R has non-significant parameters.
The ARIMA(3,1,0) selected by BIC method in SAS is a possible model but has higher BIC
value (details of analysis are omitted). The best model is the ARIMA(0,1,1) selected by the BIC
criterion in R and EACF in SAS.
The fitted ARIMA(0,1,1) model for the global temperature index Xt can be written as:
(1-B)Xt = (1-0.5398B)at
> m2 = Arima(index, order=c(0,1,1), method='ML')
> m2
Series: index
ARIMA(0,1,1)
Coefficients:
ma1
-0.5398
s.e. 0.0838
sigma^2 estimated as 89.26: log likelihood=-483.91
AIC=971.82 AICc=971.92 BIC=977.59
> arimaPrint(m2)
[1] “T-test table for model coefficients”
estimates std error
ma1 “-0.5397611” “0.08376795”
p-values
ma1 “1.167279e-10”
e) Use residual analysis to check the model.
The Ljung-Box test on the residuals shows that the hypothesis of independence cannot be rejected, so we can assume that the residuals are white noise. The normal quantile plot shows that the distribution of the residuals is close to the normal distribution.
> Box.test(m2$residuals, lag=6, type='Ljung-Box', fitdf=1)
Box-Ljung test
data: m2$residuals
X-squared = 9.5895, df = 5, p-value = 0.08774
> Box.test(m2$residuals, lag=12, type='Ljung-Box', fitdf=1)
Box-Ljung test
data: m2$residuals
X-squared = 16.3747, df = 11, p-value = 0.1278
f) Compute 1-step to 4-step ahead forecasts of the fitted model with origin at the end of the data,
i.e. 2012. Write down the forecasts and their standard errors.
> f = forecast.Arima(m2, h=4)
> plot(f, include=50)
> f
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
134 58.1793 46.07184 70.28676 39.66253 76.69606
135 58.1793 44.85108 71.50751 37.79555 78.56305
136 58.1793 43.73311 72.62548 36.08577 80.27283
137 58.1793 42.69566 73.66293 34.49912 81.85947
g) EXTRA CREDIT [2pts]: apply the backtesting procedure to compare the following two models: ARIMA(0,1,1) and ARIMA(0,1,2). Use 113 cases (85% of the data) for training and 20 cases (15%) for testing.
The procedure shows that the RMSE of the ARIMA(0,1,1) is 9.680, while the RMSE of the ARIMA(0,1,2) is 9.688. The two values are very close, showing that the two models are equivalent in terms of predictive power. However, the ARIMA(0,1,1) is slightly better.
> backtest(m2, index, 113, 1)
[1] "RMSE of out-of-sample forecasts"
[1] 9.680175
[1] "Mean absolute error of out-of-sample forecasts"
[1] 7.925004
> m3=arima(index, order=c(0,1,2))
> backtest(m3, index, 113, 1)
[1] "RMSE of out-of-sample forecasts"
[1] 9.688697
[1] "Mean absolute error of out-of-sample forecasts"
[1] 7.932232
> m3
Series: x
ARIMA(0,1,2)
Coefficients:
ma1 ma2
-0.4287 -0.1591
s.e. 0.0843 0.0764
sigma^2 estimated as 86.63: log likelihood=-481.94
AIC=967.88 AICc=968.07 BIC=976.53
In SAS: the backtest macro needs to be modified to fit ARIMA(p,1,q) models by changing the IDENTIFY statement in the PROC ARIMA step of the BACKTEST macro as follows:
identify var=&VAR(1) noprint;
/*--------- backtesting --------------*/
%backtest(trainpct=85, dataset=myd, var=temp, q=1, interval=year);
%backtest(trainpct=85, dataset=myd, var=temp, q=2, interval=year);
Backtest results for myd
Model: q=(1)
Obs _TYPE_ _FREQ_ mafe msfe rmsfe
1 0 20 7.71428 87.0438 9.32973
Backtest results for myd
Model: q=2
Obs _TYPE_ _FREQ_ mafe msfe rmsfe
1 0 20 7.60769 85.3367 9.23779
R code
# Analysis of global temperature index
#1. LOAD LIBRARIES
library(tseries)
library(fBasics)
library(forecast)
# Create a time series
library(zoo)
data=read.table("globtemp.csv", header=T, sep=',')
index=data[,2]
indexts=zoo(data[,2], as.Date(as.character(data[,1]), format="%Y"))
# plot ts
plot(indexts, type='l', xlab='time', ylab='temperature index')
# Analysis of distribution
par(mfcol=c(1,1))
hist(index, xlab="Distribution of GT Index", prob=TRUE, main="Histogram")
# add approximating normal density curve
xfit<-seq(min(index), max(index), length=40)
yfit<-dnorm(xfit, mean=mean(index), sd=sd(index))
lines(xfit, yfit, col="blue", lwd=2)
# CREATE NORMAL PROBABILITY PLOT
qqnorm(index)
qqline(index, col=2)
acf(coredata(indexts), plot=T, lag=20)
# Dickey-Fuller test
library(fUnitRoots)
adfTest(index, lags=3, type=c("ct"))
adfTest(index, lags=5, type=c("ct"))
adfTest(index, lags=3, type=c("c"))
adfTest(index, lags=5, type=c("c"))
# Ljung-Box test
Box.test(index, lag=6, type='Ljung')
Box.test(index, lag=12, type='Ljung')
# EACF method
library(TSA)
# use numeric object not ts object; coredata() retrieves the numeric vector
d1_ts=diff(indexts)
eacf(coredata(d1_ts))
# BIC criterion
# use numeric object not ts object
auto.arima(index, max.p=5, max.q=5, stationary=F, ic=c("bic"), trace=TRUE, allowdrift=F)
# MODEL FITTING
# MODEL 1: EACF suggests ARIMA(1,1,2), i.e. p=1 and q=2
m1=arima(index, order=c(1,1,2), method='ML')
m1
source("arimaPrint.R")
arimaPrint(m1)
# RESIDUAL ANALYSIS (fitdf = number of estimated ARMA coefficients = 3)
Box.test(m1$residuals, lag=6, type='Ljung-Box', fitdf=3)
Box.test(m1$residuals, lag=12, type='Ljung-Box', fitdf=3)
acf(m1$residuals)
qqnorm(m1$residuals)
qqline(m1$residuals)
hist(m1$residuals)
# MODEL 2 SELECTED BY auto.arima: ARIMA(0,1,1)
m2=Arima(index, order=c(0,1,1), method='ML')
m2
arimaPrint(m2)
# RESIDUAL ANALYSIS
Box.test(m2$residuals, lag=6, type='Ljung-Box', fitdf=1)
Box.test(m2$residuals, lag=12, type='Ljung-Box', fitdf=1)
acf(m2$residuals)
# Analysis of distribution of residuals
par(mfcol=c(1,1))
hist(m2$residuals, xlab="Distribution of residuals", prob=TRUE, main="Histogram")
# add approximating normal density curve
xfit<-seq(min(m2$residuals), max(m2$residuals), length=40)
yfit<-dnorm(xfit, mean=mean(m2$residuals), sd=sd(m2$residuals))
lines(xfit, yfit, col="blue", lwd=2)
# CREATE NORMAL PROBABILITY PLOT
qqnorm(m2$residuals)
qqline(m2$residuals, col=2)
# forecast and plot
pred=predict(m2, n.ahead=4, se.fit=T)
pred
f=forecast.Arima(m2, h=4)
f
plot(f, include=50)
# Backtesting
source("backtest.R")
backtest(m2, index, 113, 1)
m3=arima(index, order=c(0,1,2))
backtest(m3, index, 113, 1)
SAS code:
%macro backtest(TRAINPCT=80, DATASET=, VAR=, P=, Q=, D=, DATE=date, INTERVAL=month);
* ------------------------------------------------------------------------- ;
* T I M E   S E R I E S   B A C K T E S T   M A C R O
* ------------------------------------------------------------------------- ;
* DePaul CSC425, Winter 2013, Dr. Raffaella Settimi
* Macro written by Bill Qualls, First Analytics
* ------------------------------------------------------------------------- ;
* E X P L A N A T I O N   O F   P A R A M E T E R S
* (Order of variables is insignificant)
* ------------------------------------------------------------------------- ;
* TRAINPCT .. Percent of dataset to be used for training.
*             So, (100 - TRAINPCT)% will be used for evaluation.
*             Example: TRAINPCT=80
* DATASET  .. Time series dataset. Libname optional, defaults to Work.
*             Example: DATASET=Work.Unemp
* VAR      .. Name of time series variable.
*             Example: VAR=ratechg
* P        .. Specified for AR models. Omit otherwise.
*             Example: P=(1 3 6)
* Q        .. Specified for MA models. Omit otherwise.
*             Example: Q=(1 3 6)
* DATE     .. Name of date variable. Defaults to date.
*             Example: DATE=date
* INTERVAL .. Date interval. Defaults to month.
*             Example: INTERVAL=day
* ------------------------------------------------------------------------- ;
* S A M P L E   U S A G E
* %backtest(trainpct=80, dataset=work.unemp, var=ratedif, p=(1), interval=day);
* ------------------------------------------------------------------------- ;
* How many records are in the dataset? ;
data _null_;
call symput('NRECS', trim(left(nrecs)));
set &DATASET nobs=nrecs;
stop;
run;
* Determine which ones are exclusively for training based on TRAINPCT ;
%let SIZE_OF_ROLLING_WINDOW = %sysfunc(round(&NRECS * &TRAINPCT / 100));
* create a working copy of dataset with observation number ;
* as a variable for use in a where clause with proc arima. ;
* also add a placeholder for the predicted value. ;
data Work._MY_COPY_ (keep = &VAR &DATE _OBS_ _PRED_);
set &DATASET;
_OBS_ = _N_;
_PRED_ = .;
run;
* turn off log — too lengthy ;
filename junk dummy;
proc printto log=junk print=junk;
run;
* Will build the model once for each record used in evaluation. ;
* Each time I will predict one record forward. ;
%let MODELS_TO_BE_BUILT = %sysevalf(&NRECS - &SIZE_OF_ROLLING_WINDOW);
%do i = 1 %to &MODELS_TO_BE_BUILT;
* Model using SIZE_OF_ROLLING_WINDOW records, and make one prediction ;
proc arima data=Work._MY_COPY_ plots=none;
where _OBS_ ge &i
and _OBS_ le (&i + &SIZE_OF_ROLLING_WINDOW - 1);
identify var=&VAR(1) noprint;
estimate
%if (&P ne ) %then %do; p=&P %end;
%if (&Q ne ) %then %do; q=&Q %end;
method=ml noprint;
forecast lead=1 id=&DATE interval=&INTERVAL out=Work._MY_RESULTS_ noprint;
run;
* get the predicted value (in the last record) as a macro variable ;
data _null_;
p = nrecs;
set Work._MY_RESULTS_ point=p nobs=nrecs;
call symput("FORECAST", forecast);
stop;
run;
* move that prediction to its place in the output file ;
proc sql noprint;
update Work._MY_COPY_
set _PRED_ = &FORECAST
where _OBS_ = &i + &SIZE_OF_ROLLING_WINDOW;
quit;
run;
* show progress so far ;
%if (%sysfunc(mod(&i, 5)) = 0) %then %do;
* print on;
proc printto log=log print=print;
run;
%put Finished &i iterations;
* print off again;
proc printto log=junk print=junk;
run;
%end;
%end;
* turn print back on ;
proc printto log=log print=print;
run;
* calculate prediction error;
data Work._MY_COPY_;
set Work._MY_COPY_;
Predicted_Error = (&VAR - _PRED_);
Predicted_Error_Squared = (&VAR - _PRED_)**2;
absresidual = abs(Predicted_Error);
run;
* compute and report the mean square forecast error;
%if (&P eq ) %then %let PP = ; %else %let PP = p=&P;
%if (&Q eq ) %then %let QQ = ; %else %let QQ = q=&Q;
title “Backtest results for &DATASET”;
title2 “Model: &PP &QQ”;
proc summary data=Work._MY_COPY_;
where _OBS_ > &SIZE_OF_ROLLING_WINDOW;
var Predicted_Error absresidual;
output out=outm mean(absresidual)=mafe mean(Predicted_Error_Squared)=msfe;
run;
data outm;
set outm;
rmsfe=sqrt(msfe);
run;
proc print data=outm;
run;
/*
proc means data=Work._MY_COPY_ N MEAN STDDEV CLM T PROBT;
where _OBS_ > &SIZE_OF_ROLLING_WINDOW;
var Predicted_Error_Squared;
run;
title;
*/
%mend backtest;
/*-------------------------------------------------*/
proc import datafile="globtemp.csv" out=myd replace;
delimiter=',';
getnames=yes;
run;
proc print data=myd;
run;
* produced time plots;
goptions reset=global;
symbol interpol = join value=star;
proc gplot data=myd;
plot temp*year / haxis = 1880 to 2012 by 10;
run;
quit;
goptions reset = global;
proc univariate data=myd;
var temp;
histogram / normal;
qqplot / normal(mu=est sigma=est);
run;
* compute Dickey-Fuller unit-root tests;
proc arima data=myd;
identify var = temp stationarity = (adf=(1 3 5));
identify var = temp(1) stationarity = (adf=(1 3 5));
run;
proc arima data=myd;
identify var = temp esacf q=(0:5) p=(0:5) nlag =20;
identify var = temp minic q=(0:5) p=(0:5) nlag =20;
run; quit;
/* fit models from step above */
* fit ARIMA(p,1,q) models for unemployment rates;
* note that model intercept is set =0 using the noconstant option;
proc arima data=myd;
identify var = temp(1) noprint;
estimate p=0 q=1 method=ml plot noconstant;
estimate p=3 q=0 method=ml plot noconstant;
run;
quit;
proc arima data=myd;
identify var=temp(1) noprint;
estimate p=0 q=1 method=ml plot noconstant;
forecast back=0 lead=4 printall out=result id=year interval=day;
run;
quit;
/* Analysis of residuals*/
/* Residual plot: residual vs time*/
goptions reset=global;
symbol i=join;
proc gplot data=result;
plot residual*year / haxis = 1880 to 2012 by 10;
run;
quit;
/*analysis of distribution of residuals, to check
for normality */
goptions reset=global;
proc univariate data=result;
var residual;
histogram/normal (mu=est sigma=est);
qqplot/normal(mu=est sigma=est);
run;
goptions reset=global;
symbol1 interpol =join value=star color=red w=2;
symbol2 interpol = join color=black;
symbol3 interpol = join value=none color = blue;
symbol4 interpol = join value=none color = blue;
legend1 label=none
shape=symbol(4,2)
position=(top center inside)
mode=share;
proc gplot data=result;
plot forecast*year temp*year u95*year l95*year / overlay
haxis = 1880 to 2022 by 10 legend=legend1;
run;
/*--------- backtesting --------------*/
%backtest(trainpct=85, dataset=myd, var=temp, q=(1), interval=year);
%backtest(trainpct=85, dataset=myd, var=temp, q= 2, interval=year);
S&P 500 analysis
Fitting an ARMA(0,0)-GARCH(1,1) model with Gaussian errors
> # Fitting an ARMA(0,0)-GARCH(1,1) model using the rugarch package
> # log returns show weak serial autocorrelations in the mean and an ARCH effect
> # Use ugarchspec() function to specify the model
> garch11.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)))
> # estimate model
> garch11.fit=ugarchfit(spec=garch11.spec, data=ret)
> garch11.fit
*———————————*
* GARCH Model Fit *
*———————————*
Conditional Variance Dynamics
———————————–
GARCH Model : sGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : norm
Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)
mu 0.006814 0.001798 3.7889 0.000151
omega 0.000090 0.000049 1.8302 0.067224
alpha1 0.118390 0.031384 3.7723 0.000162
beta1 0.844396 0.034387 24.5556 0.000000
The fitted ARMA(0,0)-GARCH(1,1) model with Gaussian errors can be written as
rt = 0.0068 + at,  at = σt et
σ2t = 0.00009 + 0.1184 a2t-1 + 0.8444 σ2t-1
where {et} are iid N(0,1) innovations.
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.006814 0.001914 3.5592 0.000372
omega 0.000090 0.000070 1.2858 0.198502
alpha1 0.118390 0.041165 2.8760 0.004028
beta1 0.844396 0.040113 21.0503 0.000000
LogLikelihood : 828.5681
Information Criteria
————————————
Akaike -3.4286
Bayes -3.3938
Shibata -3.4287
Hannan-Quinn -3.4149
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.5261 0.4682
Lag[p+q+1][1] 0.5261 0.4682
Lag[p+q+5][5] 5.9001 0.3161
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 0.2657 0.6062
Lag[p+q+1][3] 0.8252 0.3637
Lag[p+q+5][7] 3.0889 0.6863
d.o.f=2
ARCH LM Tests
————————————
Statistic DoF P-Value
ARCH Lag[2] 0.5084 2 0.7755
ARCH Lag[5] 0.9231 5 0.9685
ARCH Lag[10] 6.3220 10 0.7875
Nyblom stability test
————————————
Joint Statistic: 1.3254
Individual Statistics:
mu 0.07929
omega 0.20176
alpha1 0.02459
beta1 0.09300
Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.07 1.24 1.6
Individual Statistic: 0.35 0.47 0.75
Sign Bias Test
————————————
t-value prob sig
Sign Bias 2.1382 0.0330120 **
Negative Sign Bias 0.5723 0.5673614
Positive Sign Bias 1.3403 0.1807809
Joint Effect 17.4223 0.0005786 ***
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 43.12 0.0012496
2 30 53.82 0.0033902
3 40 75.76 0.0003804
4 50 79.60 0.0037117
Note: the Q-statistics above are the Ljung-Box test for serial correlation computed on the residuals, and the Ljung-Box test for an ARCH/GARCH effect computed on the squared residuals.
The Adjusted Pearson goodness-of-fit test checks the distribution of the error term. The null hypothesis states that the distribution assumed for the error terms in the model is adequate. Thus a small p-value (< α) indicates that the null hypothesis can be rejected and the distribution assumption is not adequate.
> # create selection list of plots for garch(1,1) fit
> plot(garch11.fit)
> # to display all subplots on one page
> plot(garch11.fit, which="all")
Residual analysis of the GARCH model shows that the model fits the data adequately. The Ljung-Box test (Q-statistic) on the residuals is not significant, showing that the hypothesis of no correlation for the residuals cannot be rejected. Similarly, the Ljung-Box test (Q-statistic) on the squared standardized residuals is not significant, suggesting that the residuals show no remaining ARCH/GARCH effect. The Adjusted Pearson goodness-of-fit test is significant, indicating that the normal distribution assumed for the error term is not appropriate, as also shown by the QQ plot of the residuals.
Fitting an ARMA(0,0)-GARCH(1,1) model with t-distributed errors
> # use Student-t innovations
> # specify model using functions in rugarch package
> # Fit ARMA(0,0)-GARCH(1,1) model with t-distribution
> garch11.t.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)), distribution.model="std")
> # estimate model
> garch11.t.fit=ugarchfit(spec=garch11.t.spec, data=ret)
> garch11.t.fit
*———————————*
* GARCH Model Fit *
*———————————*
Conditional Variance Dynamics
———————————–
GARCH Model : sGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : std
Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)
mu 0.008012 0.001746 4.5892 0.000004
omega 0.000122 0.000068 1.7901 0.073444
alpha1 0.123683 0.038919 3.1780 0.001483
beta1 0.821471 0.050706 16.2008 0.000000
shape 6.891814 1.978494 3.4834 0.000495
The fitted ARMA(0,0)-GARCH(1,1) model with t-distributed errors can be written as
rt = 0.0080 + at,  at = σt et
σ2t = 0.00012 + 0.1237 a2t-1 + 0.8215 σ2t-1
The error term {et} has a standardized t-distribution with 7 degrees of freedom (the nearest integer to the estimated shape parameter 6.89).
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.008012 0.001817 4.4090 0.000010
omega 0.000122 0.000071 1.7216 0.085136
alpha1 0.123683 0.038112 3.2453 0.001173
beta1 0.821471 0.041476 19.8061 0.000000
shape 6.891814 1.976630 3.4866 0.000489
LogLikelihood : 839.2684
Information Criteria
————————————
Akaike -3.4689
Bayes -3.4255
Shibata -3.4691
Hannan-Quinn -3.4518
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.4662 0.4947
Lag[p+q+1][1] 0.4662 0.4947
Lag[p+q+5][5] 5.6618 0.3405
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 0.2474 0.6189
Lag[p+q+1][3] 0.8569 0.3546
Lag[p+q+5][7] 2.9570 0.7066
d.o.f=2
ARCH LM Tests
————————————
Statistic DoF P-Value
ARCH Lag[2] 0.5959 2 0.7423
ARCH Lag[5] 0.9740 5 0.9646
ARCH Lag[10] 6.0871 10 0.8079
Nyblom stability test
————————————
Joint Statistic: 1.396
Individual Statistics:
mu 0.17184
omega 0.23451
alpha1 0.04303
beta1 0.10209
shape 0.07244
Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.28 1.47 1.88
Individual Statistic: 0.35 0.47 0.75
Sign Bias Test
————————————
t-value prob sig
Sign Bias 2.121 0.0343991 **
Negative Sign Bias 0.549 0.5832924
Positive Sign Bias 1.206 0.2284569
Joint Effect 16.352 0.0009603 ***
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 22.66 0.25269
2 30 44.47 0.03312
3 40 39.50 0.44759
4 50 43.64 0.68967
Residual analysis of the GARCH model with t-distributed error terms shows that the model fits the data adequately. The Ljung-Box test (Q-statistic) on the residuals is not significant, showing that the hypothesis of no correlation for the residuals cannot be rejected. Similarly, the Ljung-Box test (Q-statistic) on the squared standardized residuals is not significant, suggesting that the residuals show no remaining ARCH/GARCH effect. The Adjusted Pearson goodness-of-fit test is not significant, indicating that the distribution of the error terms can be described by a t-distribution with 7 degrees of freedom, as also shown by the QQ plot of the residuals (created using plot(garch11.t.fit, which=9)).
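For the forecast section of the report, volatility and mean forecasts can be obtained from the fitted object with ugarchforecast() (a sketch using the t-distribution fit above):

fc <- ugarchforecast(garch11.t.fit, n.ahead=4)
fc           # forecasts of the conditional mean and sigma
sigma(fc)    # forecast volatility path
fitted(fc)   # forecast conditional mean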
Fitting an ARMA(0,0)-EGARCH(1,1) model
The EGARCH model fitted by the rugarch package has a slightly different form than the model in the textbook.
Here is the general expression of the EGARCH(1,1) model:
rt = µt + at,  at = σt et
ln(σ2t) = ω + α1 et-1 + γ1(|et-1| - E|et-1|) + β1 ln(σ2t-1)
where µt can follow an ARMA process, but most typically it will be constant. Gamma1 (γ1) is the leverage
parameter. So if gamma1 in output is significant, then we can conclude that the volatility process has an
asymmetric behavior.
Fitting an ARMA(0,0)-EGARCH(1,1) model with Gaussian distribution (similar to SAS example)
> # Fit ARMA(0,0)-eGARCH(1,1) model with Gaussian distribution
> egarch11.spec=ugarchspec(variance.model=list(model="eGARCH", garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)))
> # estimate model
> egarch11.fit=ugarchfit(spec=egarch11.spec, data=ret)
> egarch11.fit
*———————————*
* GARCH Model Fit *
*———————————*
Conditional Variance Dynamics
———————————–
GARCH Model : eGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : norm
Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)
mu 0.005937 0.001860 3.1915 0.001415
omega -0.610497 0.263901 -2.3134 0.020703
alpha1 -0.111069 0.042614 -2.6064 0.009150
beta1 0.902462 0.041695 21.6442 0.000000
gamma1 0.209405 0.050542 4.1432 0.000034
Using the R output above, the ARMA(0,0)-EGARCH(1,1) model can be written as follows.
Fitted model:
rt = 0.0059 + at, at=σtet
ln(σ2t) = -0.610 - 0.111 et-1 + 0.209(|et-1| - E|et-1|) + 0.9024 ln(σ2t-1)
Note that since et has Gaussian distribution, the E(|et|)=sqrt(2/pi) = 0.7979 or approx 0.80 (see page 143 in
textbook). Thus we can rewrite the expression above as
ln(σ2t) = -0.610 + 0.9024 ln(σ2t-1) + (-0.111 + 0.209) et-1 - 0.209(0.7979),  if et-1 ≥ 0
ln(σ2t) = -0.610 + 0.9024 ln(σ2t-1) + (-0.111 - 0.209) et-1 - 0.209(0.7979),  if et-1 < 0
And after some algebra, we can write:
ln(σ2t) = -0.7767 + 0.9024 ln(σ2t-1) + 0.098 et-1,  if et-1 ≥ 0
ln(σ2t) = -0.7767 + 0.9024 ln(σ2t-1) - 0.320 et-1,  if et-1 < 0
Taking the antilog transformation we have
σ2t = exp(-0.7767) (σ2t-1)^0.9024 exp(0.098 et-1),  if et-1 ≥ 0
σ2t = exp(-0.7767) (σ2t-1)^0.9024 exp(-0.320 et-1),  if et-1 < 0
For a standardized shock with magnitude 2 (i.e. two standard deviations), we have
σ2t(et-1 = -2) / σ2t(et-1 = +2) = exp(0.320 × 2) / exp(0.098 × 2) = exp(0.444) ≈ 1.56
Therefore, the impact of a negative shock of size two standard deviations is about 56% higher than the impact of a positive shock of the same size.
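As a quick numerical check, the 1.56 ratio follows directly from the fitted coefficients (a minimal sketch):

alpha1 <- -0.111069; gamma1 <- 0.209405
pos <- (alpha1 + gamma1) * 2      # log-variance impact of a +2 sd shock
neg <- (alpha1 - gamma1) * (-2)   # log-variance impact of a -2 sd shock
exp(neg - pos)                    # ratio of impacts, approximately 1.56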
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.005937 0.001958 3.0328 0.002423
omega -0.610497 0.364615 -1.6744 0.094060
alpha1 -0.111069 0.066343 -1.6742 0.094096
beta1 0.902462 0.057336 15.7400 0.000000
gamma1 0.209405 0.048127 4.3511 0.000014
LogLikelihood : 835.9936
Information Criteria
————————————
Akaike -3.4553
Bayes -3.4119
Shibata -3.4555
Hannan-Quinn -3.4382
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.3351 0.5627
Lag[p+q+1][1] 0.3351 0.5627
Lag[p+q+5][5] 5.1570 0.3970
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 0.7262 0.3941
Lag[p+q+1][3] 1.0203 0.3124
Lag[p+q+5][7] 2.8738 0.7194
d.o.f=2
ARCH LM Tests
————————————
Statistic DoF P-Value
ARCH Lag[2] 1.018 2 0.6010
ARCH Lag[5] 1.165 5 0.9482
ARCH Lag[10] 6.026 10 0.8131
Nyblom stability test
————————————
Joint Statistic: 1.5861
Individual Statistics:
mu 0.1829
omega 0.1224
alpha1 0.1527
beta1 0.1301
gamma1 1.0125
Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.28 1.47 1.88
Individual Statistic: 0.35 0.47 0.75
Sign Bias Test
————————————
t-value prob sig
Sign Bias 1.973 0.04902 **
Negative Sign Bias 1.165 0.24478
Positive Sign Bias 1.041 0.29840
Joint Effect 11.259 0.01041 **
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 22.49 0.2604
2 30 35.86 0.1777
3 40 50.31 0.1060
4 50 55.07 0.2558
Residual analysis of the EGARCH model shows that the model fits the data adequately: the residuals are white noise and show no ARCH effect. The goodness-of-fit test supports the choice of a Gaussian distribution for the error term, although the QQ plot shows that the error distribution has a thicker left tail than the normal distribution. We will fit the t-distribution to check whether it better captures the behavior of extreme values.
Fitting an ARMA(0,0)-EGARCH(1,1) model with t-distribution
> # Fit ARMA(0,0)-eGARCH(1,1) model with t-distribution
> egarch11.t.spec=ugarchspec(variance.model=list(model="eGARCH", garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)), distribution.model="std")
> # estimate model
> egarch11.t.fit=ugarchfit(spec=egarch11.t.spec, data=ret)
> egarch11.t.fit
*---------------------------------*
* GARCH Model Fit *
*---------------------------------*
Conditional Variance Dynamics
———————————–
GARCH Model : eGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : std
Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)
mu 0.007093 0.001757 4.0369 0.000054
omega -0.677075 0.246000 -2.7523 0.005917
alpha1 -0.145367 0.045752 -3.1773 0.001487
beta1 0.893975 0.038715 23.0912 0.000000
gamma1 0.202717 0.055934 3.6242 0.000290
shape 7.926664 2.405188 3.2957 0.000982
Using the R output above, the ARMA(0,0)-EGARCH(1,1) model can be written as follows.
Fitted model:
rt = 0.007 + at, at=σtet
ln(σ2t) = -0.677 - 0.145 et-1 + 0.203(|et-1| - E(|et-1|)) + 0.894 ln(σ2t-1)
with t-distribution with 8 degrees of freedom (nearest integer to shape value)
Note that since et has a t-distribution, E(|et|) = 2√(ν-2) Γ((ν+1)/2) / ((ν-1) Γ(ν/2) √π) (see page 143 in book), where ν = d.f. of the t-distribution (denoted by shape in the R output).
Note that since gamma1 is significant, the volatility has an asymmetric behavior. Therefore a negative shock has a
stronger impact on the volatility compared to a positive shock of the same size.
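For the fitted shape value this expectation can be evaluated in R; this is the same computation as the exp.abse line in the R code at the end of the report:
# E(|e|) for a standardized t random variable with nu = shape d.f.
shape = 7.926664
exp.abse = (2*sqrt(shape-2)*gamma((shape+1)/2))/((shape-1)*gamma(shape/2)*sqrt(pi))
exp.abse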
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.007093 0.001888 3.7574 0.000172
omega -0.677075 0.212770 -3.1822 0.001462
alpha1 -0.145367 0.042936 -3.3857 0.000710
beta1 0.893975 0.034112 26.2071 0.000000
gamma1 0.202717 0.045692 4.4366 0.000009
shape 7.926664 2.720128 2.9141 0.003567
LogLikelihood : 846.92
Information Criteria
————————————
Akaike -3.4965
Bayes -3.4445
Shibata -3.4969
Hannan-Quinn -3.4761
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.1872 0.6653
Lag[p+q+1][1] 0.1872 0.6653
Lag[p+q+5][5] 4.7776 0.4436
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 0.6769 0.4107
Lag[p+q+1][3] 0.9526 0.3291
Lag[p+q+5][7] 2.5747 0.7652
d.o.f=2
ARCH LM Tests
————————————
Statistic DoF P-Value
ARCH Lag[2] 0.9725 2 0.6149
ARCH Lag[5] 1.0934 5 0.9547
ARCH Lag[10] 5.4934 10 0.8559
Nyblom stability test
————————————
Joint Statistic: 1.6436
Individual Statistics:
mu 0.20752
omega 0.12081
alpha1 0.15261
beta1 0.13304
gamma1 0.93741
shape 0.08736
Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.49 1.68 2.12
Individual Statistic: 0.35 0.47 0.75
Sign Bias Test
————————————
t-value prob sig
Sign Bias 1.7639 0.07840 *
Negative Sign Bias 1.2413 0.21510
Positive Sign Bias 0.9747 0.33022
Joint Effect 8.8291 0.03165 **
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 20.66 0.35571
2 30 39.23 0.09736
3 40 54.47 0.05098
4 50 66.51 0.04860
Residual analysis of the EGARCH model shows that the model fits the data adequately – residuals are white noise and
show no ARCH effect. However, the goodness of fit test shows that the t-distribution is not a good choice for the
error terms (the test p-values are small and reject the null hypothesis that the error term has a t-distribution). The qqplot
shows that the error distribution has thicker tails than the t-distribution. (Further analysis shows that the ged
distribution does a better job representing the distribution of extreme values.)
Fitting an ARMA(0,0)-GJRGARCH(1,1) model or TGARCH model
The GJRGARCH(1,1) model fitted by the rugarch package has the following expression
rt = µt + at,   at = σtet
σ2t = ω + (α1 + γ1 Nt-1) a2t-1 + β1 σ2t-1
where µt can follow an ARMA process, but most typically it will be constant. Nt-1 is the indicator variable s.t.
Nt-1= 1 when at-1 (shock at time t-1) is negative, and Nt-1 = 0 otherwise. Gamma1 (γ1) is the leverage parameter. So
if gamma1 in output is significant, then we can conclude that the volatility process has an asymmetric behavior.
> #Fit ARMA(0,0)-TGARCH(1,1) model with t-distribution
> gjrgarch11.t.spec=ugarchspec(variance.model=list(model = "gjrGARCH",
garchOrder=c(1,1)),
mean.model=list(armaOrder=c(0,0)), distribution.model = "std")
> #estimate model
> gjrgarch11.t.fit=ugarchfit(spec=gjrgarch11.t.spec, data=ret)
Warning message:
In .makefitmodel(garchmodel = "gjrGARCH", f = .gjrgarchLLH, T = T, :
NaNs produced
> gjrgarch11.t.fit
*---------------------------------*
* GARCH Model Fit *
*---------------------------------*
Conditional Variance Dynamics
———————————–
GARCH Model : gjrGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : std
Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)
mu 0.007202 0.001766 4.077739 0.000045
omega 0.000212 0.000094 2.262826 0.023646
alpha1 0.000001 0.001394 0.000791 0.999369
beta1 0.781111 0.065122 11.994488 0.000000
gamma1 0.215386 0.069611 3.094141 0.001974
shape 7.179979 2.031526 3.534279 0.000409
Using the R output above, the ARMA(0,0)-TGARCH(1,1) model can be written as follows. Note that the arch(1)
coefficient is zero and we remove it from the model.
rt = 0.0072 + at,   at = σtet
σ2t = 0.00021 + (0.0000 + 0.215 Nt-1) a2t-1 + 0.781 σ2t-1
Where error term is assumed to have t-distribution with 7 degrees of freedom.
or
rt = 0.0072 + at,   at = σtet
σ2t = 0.00021 + 0.215 Nt-1 a2t-1 + 0.781 σ2t-1
Where Nt-1 is an indicator variable for negative innovations such that
Nt-1 = 1 if at-1 < 0
Nt-1 = 0 if at-1 ≥ 0
For a standardized shock with magnitude 2, (i.e. two standard deviations), we have
σ2t(et-1 = -2) / σ2t(et-1 = 2)
= [0.00021 + (0.0000 + 0.215)(4 × 0.002) + 0.781(0.002)] / [0.00021 + (0.0000)(4 × 0.002) + 0.781(0.002)]
= 1.966
where the value of a2t-1 is computed as a2t-1 = σ2t-1 e2t-1. Since σ2t-1 in the expression above is unknown, we set σ2t-1 equal to the sample variance of the S&P500 index returns (that is, σ2t-1 = 0.002).
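As a quick numerical check of this ratio in R (a sketch; the coefficient values are copied from the output above):
omega = 0.00021; alpha1 = 0; gamma1 = 0.215; beta1 = 0.781
sig2 = 0.002          # proxy for sigma^2(t-1): sample variance of the returns
a2 = 4*sig2           # a^2(t-1) = sigma^2(t-1)*e^2(t-1) with e(t-1) = +/-2
(omega + (alpha1 + gamma1)*a2 + beta1*sig2) / (omega + alpha1*a2 + beta1*sig2)
# approximately 1.97, i.e. the 1.966 above up to rounding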
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.007202 0.001921 3.749767 0.000177
omega 0.000212 0.000112 1.902791 0.057068
alpha1 0.000001 0.000054 0.020238 0.983853
beta1 0.781111 0.063980 12.208691 0.000000
gamma1 0.215386 0.056979 3.780097 0.000157
shape 7.179979 2.274951 3.156102 0.001599
LogLikelihood : 844.6336
Information Criteria
————————————
Akaike -3.4870
Bayes -3.4350
Shibata -3.4873
Hannan-Quinn -3.4666
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.1348 0.7135
Lag[p+q+1][1] 0.1348 0.7135
Lag[p+q+5][5] 4.7791 0.4434
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 0.7412 0.3893
Lag[p+q+1][3] 1.0145 0.3138
Lag[p+q+5][7] 2.5887 0.7631
d.o.f=2
ARCH LM Tests
————————————
Statistic DoF P-Value
ARCH Lag[2] 1.015 2 0.6020
ARCH Lag[5] 1.239 5 0.9411
ARCH Lag[10] 5.120 10 0.8830
Nyblom stability test
————————————
Joint Statistic: 2.2942
Individual Statistics:
mu 0.18155
omega 0.27666
alpha1 0.05641
beta1 0.20413
gamma1 0.03198
shape 0.06486
Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.49 1.68 2.12
Individual Statistic: 0.35 0.47 0.75
Sign Bias Test
————————————
t-value prob sig
Sign Bias 2.117 0.03474 **
Negative Sign Bias 1.408 0.15987
Positive Sign Bias 0.766 0.44408
Joint Effect 10.173 0.01715 **
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 19.83 0.4048
2 30 29.37 0.4457
3 40 37.17 0.5535
4 50 52.78 0.3300
Elapsed time : 0.458046
> plot(gjrgarch11.t.fit, which="all")
Residual analysis of TGARCH model shows that the model fits the data adequately – residuals are white noise and
show no ARCH effect. The goodness of fit test also shows that the t-distribution is adequate.
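Since gamma1 is significant, the asymmetry can also be visualized with the news impact curve, which plots the one-step conditional variance as a function of the previous shock. A minimal sketch, assuming rugarch's newsimpact() helper:
ni = newsimpact(z = NULL, gjrgarch11.t.fit)
plot(ni$zx, ni$zy, type='l', xlab='shock', ylab='conditional variance')
# the curve should be steeper on the negative side, consistent with gamma1 > 0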
Apply information criteria for model selection
> # MODEL COMPARISON
> # compare information criteria
> model.list = list(garch11 = garch11.fit, garch11.t = garch11.t.fit,
+ egarch11 = egarch11.t.fit,
+ gjrgarch11 = gjrgarch11.t.fit)
> info.mat = sapply(model.list, infocriteria)
> rownames(info.mat) = rownames(infocriteria(garch11.fit))
> info.mat
garch11 garch11.t egarch11 gjrgarch11
Akaike -3.428558 -3.468892 -3.480644 -3.487042
Bayes -3.393831 -3.425483 -3.428554 -3.434952
Shibata -3.428694 -3.469105 -3.480950 -3.487348
Hannan-Quinn -3.414909 -3.451830 -3.460170 -3.466568
The best model according to the model selection criteria is the GJR-GARCH(1,1) model with t-distribution.
R CODE:
# Analysis of daily S&P500 index
#
library(fBasics)
library(tseries)
library(rugarch)
# import data in R
# import libraries for TS analysis
myd= read.table('sp500_feb1970_Feb2010.txt', header=T)
# create time series object
rts= ts(myd$return, start = c(1970, 1), frequency=12)
# create a simple numeric object
ret =myd$return;
# CREATE TIME PLOT
plot(rts)
# Plots ACF function of vector data
acf(ret)
# Plot ACF of squared data to check for non-linear dependence
acf(ret^2)
# Computes Ljung-Box test on squared returns to test non-linear independence at lags 6 and 12
Box.test(ret^2,lag=6,type='Ljung')
Box.test(ret^2,lag=12,type='Ljung')
# Computes Ljung-Box test on absolute returns to test non-linear independence at lags 6 and 12
Box.test(abs(ret),lag=6,type='Ljung')
Box.test(abs(ret),lag=12,type='Ljung')
# FITTING AN GARCH(1,1) MODEL WITH GAUSSIAN DISTRIBUTION
# Use ugarchspec() function to specify model
garch11.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)),
mean.model=list(armaOrder=c(0,0)))
#estimate model
garch11.fit=ugarchfit(spec=garch11.spec, data=ret)
garch11.fit
#persistence = alpha1+beta1
persistence(garch11.fit)
#half-life: ln(0.5)/ln(alpha1+beta1)
halflife(garch11.fit)
#create selection list of plots for garch(1,1) fit
plot(garch11.fit)
#to display all subplots on one page
plot(garch11.fit, which="all")
#FIT ARMA(0,0)-GARCH(1,1) MODEL WITH T-DISTRIBUTION
# specify model using functions in rugarch package
garch11.t.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)),
mean.model=list(armaOrder=c(0,0)), distribution.model = "std")
#estimate model
garch11.t.fit=ugarchfit(spec=garch11.t.spec, data=ret)
garch11.t.fit
plot(garch11.t.fit)
#FIT ARMA(0,0)-EGARCH(1,1) MODEL WITH GAUSSIAN DISTRIBUTION
egarch11.spec=ugarchspec(variance.model=list(model = "eGARCH", garchOrder=c(1,1)),
mean.model=list(armaOrder=c(0,0)))
#estimate model
egarch11.fit=ugarchfit(spec=egarch11.spec, data=ret)
egarch11.fit
plot(egarch11.fit, which="all")
#FIT ARMA(0,0)-EGARCH(1,1) MODEL WITH T-DISTRIBUTION
egarch11.t.spec=ugarchspec(variance.model=list(model = "eGARCH", garchOrder=c(1,1)),
mean.model=list(armaOrder=c(0,0)), distribution.model = "std")
#estimate model
egarch11.t.fit=ugarchfit(spec=egarch11.t.spec, data=ret)
egarch11.t.fit
plot(egarch11.t.fit, which="all")
# compute expected value E(|e|)
shape=coef(egarch11.t.fit)[6]
exp.abse=(2*sqrt(shape-2)*gamma((shape+1)/2))/((shape-1)*gamma(shape/2)*sqrt(pi))
#FIT ARMA(0,0)-TGARCH(1,1) MODEL WITH T-DISTRIBUTION
gjrgarch11.t.spec=ugarchspec(variance.model=list(model = "gjrGARCH",
garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)), distribution.model = "std")
#estimate model
gjrgarch11.t.fit=ugarchfit(spec=gjrgarch11.t.spec, data=ret)
gjrgarch11.t.fit
plot(gjrgarch11.t.fit, which="all")
#FIT ARMA(0,0)-IGARCH(1,1) MODEL WITH T-DISTRIBUTION
igarch11.t.spec=ugarchspec(variance.model=list(model = "iGARCH", garchOrder=c(1,1)),
mean.model=list(armaOrder=c(0,0)), distribution.model = "std")
#estimate model
igarch11.t.fit=ugarchfit(spec=igarch11.t.spec, data=ret)
igarch11.t.fit
plot(igarch11.t.fit, which="all")
# MODEL COMPARISON
# compare information criteria
model.list = list(garch11 = garch11.fit, garch11.t = garch11.t.fit,
egarch11 = egarch11.t.fit,
gjrgarch11 = gjrgarch11.t.fit)
info.mat = sapply(model.list, infocriteria)
rownames(info.mat) = rownames(infocriteria(garch11.fit))
info.mat
# RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST
# EVALUATION STATISTICS
garch11.fit = ugarchfit(spec=garch11.spec, data=ret, out.sample=100)
garch11.t.fit = ugarchfit(spec=garch11.t.spec, data=ret, out.sample=100)
egarch11.t.fit = ugarchfit(egarch11.t.spec, data=ret, out.sample=100)
tgarch11.t.fit = ugarchfit(spec=gjrgarch11.t.spec, data=ret, out.sample=100)
# COMPUTE 100 1-STEP AHEAD ROLLING FORECASTS W/O RE-ESTIMATING
garch11.fcst = ugarchforecast(garch11.fit, n.roll=100, n.ahead=1)
garch11.t.fcst = ugarchforecast(garch11.t.fit, n.roll=100, n.ahead=1)
egarch11.t.fcst = ugarchforecast(egarch11.t.fit, n.roll=100, n.ahead=1)
tgarch11.t.fcst = ugarchforecast(tgarch11.t.fit, n.roll=100, n.ahead=1)
# COMPUTE FORECAST EVALUATION STATISTICS USING FPM() FUNCTION
fcst.list = list(garch11=garch11.fcst, garch11.t=garch11.t.fcst,
egarch11.t=egarch11.t.fcst,
tgarch11.t=tgarch11.t.fcst)
fpm.mat = sapply(fcst.list, fpm)
fpm.mat
CSC425– Time series analysis and forecasting
Homework 5 – not to be submitted
The goal of this assignment is to provide students with some practical training on
GARCH/EGARCH/TGARCH models for the analysis of the volatility of a stock return.
Solutions will be posted on Thursday November 7th 2013
Reading assignment:
1. Read Chapter 4 in short book and Chapter 3 in long book on volatility models
2. Review course documents posted under week 7 and 8.
Problem
Use the datafile nordstrom_w_00_13.csv that contains the Nordstrom (JWN) stock weekly prices
from January 2000 to October 2013. The data file contains dates (date) and weekly prices (price). You
can also use the code and the analysis of the S&P500 returns used in week 7 and 8 lectures as
your reference for the analysis of this data. Analyze the Nordstrom stock log returns following
the steps below.
1. Compute log returns, and analyze their time plot.
Moments
N 693 Sum Weights 693
Mean 0.00268299 Sum Observations 1.85931229
Std Deviation 0.06068019 Variance 0.00368209
Skewness -0.2804891 Kurtosis 7.55351876
Uncorrected SS 2.55299156 Corrected SS 2.54800304
Coeff Variation 2261.66262 Std Error Mean 0.00230505
The time plot of returns shows that Nordstrom stock returns exhibit periods of high volatility at
various times, and consistently high volatility from 2007 to 2010. Most returns are
between +/- 10%. Sample moments show that the distribution of log returns is somewhat
symmetric with very fat tails (kurtosis = 7.55).
(Time plot of Nordstrom weekly log returns)
2. Is there evidence of serial correlations in the log returns? Use autocorrelations and 5%
significance level to answer the question.
Log returns are not serially correlated, as shown by the Ljung-Box test with p-values >
0.05 and by the autocorrelation plot.
Name of Variable = return
Mean of Working Series 0.002683
Standard Deviation 0.060636
Number of Observations 693
Autocorrelation Check for White Noise
To Chi- Pr >
Lag Square DF ChiSq ——————–Autocorrelations——————–
6 6.16 6 0.4050 -0.009 0.011 -0.048 -0.009 0.078 -0.006
12 9.08 12 0.6957 -0.042 -0.038 -0.019 0.011 -0.015 0.016
18 26.17 18 0.0960 0.033 -0.026 0.088 -0.032 0.101 -0.056
24 31.58 24 0.1377 0.029 -0.037 0.025 -0.042 -0.026 -0.048
30 33.84 30 0.2871 -0.029 -0.003 -0.037 0.023 0.011 -0.016
3. Is there evidence of ARCH effects in the log returns? Use appropriate tests at 5%
significance level to answer this question.
The analysis below shows a strong ARCH effect. The squared returns are strongly
correlated: the Ljung-Box tests on the squared returns are highly significant with p-values
< 0.001, and the autocorrelation plot shows large autocorrelations for the first 10 lags.
Name of Variable = returnsq
Mean of Working Series 0.003684
Standard Deviation 0.011302
Number of Observations 693
Autocorrelation Check for White Noise
To Chi- Pr >
Lag Square DF ChiSq ——————–Autocorrelations——————–
6 331.56 6 <.0001 0.534 0.281 0.123 0.101 0.167 0.240
12 407.05 12 <.0001 0.246 0.187 0.053 0.036 0.032 0.082
18 469.70 18 <.0001 0.069 0.072 0.128 0.127 0.156 0.145
24 494.90 24 <.0001 0.149 0.090 0.050 0.035 0.014 0.031
30 501.01 30 <.0001 0.033 0.064 0.051 0.023 0.010 -0.004
4. Fit a GARCH(1,1) model for the log returns using a t-distribution for the error terms.
Perform model checking (analyze if residuals are white noise, if squared residuals are
white noise, and check if t-distribution is a good fit for the data) and write down the fitted
model.
The GARCH model with t-distribution is an adequate model for the volatility of the
returns. All parameters are significant and the squared residuals are white noise (LB
tests on squared residuals are non significant).
Fitted GARCH model can be expressed as follows:
rt = 0.005 + at
at = σtet with σ2t = 0.1483 a2t-1 + 0.836 σ2t-1
(Intercept ARCH0 can be considered equal to zero)
where the error term et has a t-distribution with 1/0.1851 ≈ 5 degrees of freedom.
The AUTOREG Procedure
GARCH Estimates
SSE 2.55069233 Observations 693
MSE 0.00368 Uncond Var 0.00531549
Log Likelihood 1102.73026 Total R-Square .
SBC -2172.7554 AIC -2195.4605
MAE 0.04086373 AICC -2195.3732
MAPE 113.334007 HQC -2186.6796
Normality Test 126.0152
Pr > ChiSq <.0001
Parameter Estimates
Standard Approx
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 0.004653 0.001570 2.96 0.0030
ARCH0 1 0.0000848 0.0000429 1.98 0.0482
ARCH1 1 0.1483 0.0369 4.01 <.0001
GARCH1 1 0.8358 0.0359 23.30 <.0001
TDFI 1 0.1851 0.0448 4.14 <.0001
Inverse of t DF
R output
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.004729 0.001465 3.2281 0.001246
omega 0.000086 0.000050 1.7302 0.083590
alpha1 0.145749 0.052318 2.7858 0.005340
beta1 0.836471 0.051967 16.0962 0.000000
shape 5.345569 1.009241 5.2966 0.000000
5. Fit an EGARCH(1,1) model for the Nordstrom log returns using a normal distribution for the
error terms. Perform model checking and write down the fitted model.
The EGARCH model is adequate although the Gaussian distribution on the error terms is
not sufficient to describe most extreme events. No ARCH effects are shown in residuals.
The leverage parameter “theta” is significant showing that volatility of Nordstrom stock
returns is affected more heavily by negative
shocks.
The fitted model can be written as follows:
rt = 0.0012 + at
at = σtet with ln σ2t = -0.194 + 0.213 g(et-1) + 0.966 ln σ2t-1
g(et-1) = -0.47 et-1 + [|et-1| - E(|et-1|)]
The AUTOREG Procedure
Exponential GARCH Estimates
SSE 2.54945444 Observations 693
MSE 0.00368 Uncond Var .
Log Likelihood 1093.55049 Total R-Square .
(t-distribution QQplot for residuals of GARCH model)
SBC -2154.3958 AIC -2177.101
MAE 0.04095052 AICC -2177.0136
MAPE 101.222881 HQC -2168.32
Normality Test 71.9939
Pr > ChiSq <.0001
Parameter Estimates
Standard Approx
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 0.001236 0.001752 0.71 0.4806
EARCH0 1 -0.1940 0.0636 -3.05 0.0023
EARCH1 1 0.2133 0.0316 6.75 <.0001
EGARCH1 1 0.9662 0.0106 91.41 <.0001
THETA 1 -0.4748 0.1121 -4.23 <.0001
R output
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.001271 0.001774 0.71655 0.473654
omega -0.169047 0.110169 -1.53443 0.124924
alpha1 -0.095670 0.035615 -2.68620 0.007227
beta1 0.970507 0.018820 51.56682 0.000000
gamma1 0.194542 0.059579 3.26526 0.001094
In R the model is written as
rt = 0.0012 + at
at = σtet with ln σ2t = -0.169 - 0.095 et-1 + 0.194(|et-1| - E(|et-1|)) + 0.970 ln σ2t-1
where the error term has a Gaussian distribution.
Note that the leverage coefficients of et-1 in SAS and R are equivalent.
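A quick numerical check of this equivalence: SAS parameterizes the news term as earch1*g(e) with g(e) = theta*e + (|e| - E|e|), while rugarch uses alpha1*e + gamma1*(|e| - E|e|), so SAS earch1*theta should match R's alpha1, and SAS earch1 should match R's gamma1:
0.2133*(-0.4748)   # -0.1013, close to R's alpha1 = -0.0957
0.2133             # close to R's gamma1 = 0.1945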
6. Fit an EGARCH(1,1) model for the Nordstrom log returns using a t-distribution for the
error terms. Perform model checking and write down the fitted model.
The EGARCH model with t-distributed errors is a good model for the data. No ARCH effects
are shown in residuals. The leverage parameter “theta” is significant and negative
showing that volatility of Nordstrom stock returns is affected more heavily by negative
shocks.
The fitted model can be written as follows:
rt = 0.0032 + at
at = σtet with ln σ2t = -0.191 + 0.158 g(et-1) + 0.974 ln σ2t-1
g(et-1) = -0.55 et-1 + [|et-1| - E(|et-1|)]
et has t-distribution with 6 degrees of freedom.
The MODEL Procedure
Nonlinear Liklhood Summary of Residual Errors
DF DF Adj
Equation Model Error SSE MSE Root MSE R-Square R-Sq
7 686 2.5482 0.00371 0.0609 -0.0001 -0.0088
nresid.y 686 1028.4 1.4991 1.2244
Nonlinear Liklhood Parameter Estimates
Approx Approx
Parameter Estimate Std Err t Value Pr > |t|
intercept 0.003273 0.00160 2.05 0.0410
earch0 -0.19123 0.0932 -2.05 0.0405
earch1 0.158378 0.0490 3.23 0.0013
egarch1 0.974881 0.0139 70.01 <.0001
theta -0.55013 0.1969 -2.79 0.0053
df 5.993367 1.3083 4.58 <.0001
R output
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.003374 0.001552 2.1743 0.029681
omega -0.137932 0.098212 -1.4044 0.160191
alpha1 -0.104435 0.034086 -3.0638 0.002185
beta1 0.977603 0.016446 59.4433 0.000000
gamma1 0.180592 0.075474 2.3928 0.016722
shape 5.783774 1.169335 4.9462 0.000001
In R the model is written as
rt = 0.0033 + at
at = σtet with ln σ2t = -0.138 - 0.104 et-1 + 0.180(|et-1| - E(|et-1|)) + 0.978 ln σ2t-1
where error term has t-distribution with 6 degrees of freedom.
Note that the leverage coefficients of et-1 in SAS and R are equivalent.
Name of Variable = resid2
Mean of Working Series 1.483942
Standard Deviation 2.819517
Number of Observations 693
Autocorrelation Check for White Noise
To Chi- Pr >
Lag Square DF ChiSq ——————–Autocorrelations——————–
6 5.11 6 0.5299 0.026 0.020 -0.044 -0.061 0.024 -0.004
12 13.61 12 0.3261 0.025 0.088 0.014 -0.041 0.017 -0.039
18 19.28 18 0.3748 0.043 -0.032 -0.038 -0.028 -0.049 0.020
24 23.55 24 0.4873 0.004 0.028 -0.025 -0.051 -0.039 -0.021
7. Fit a GJR-TGARCH(1,1) model for the Nordstrom log returns using a t-distribution for
the innovations. Perform model checking and write down the fitted model.
Similar to the AR(0)- EGARCH(1,1) model, the AR(0)-GJR(1,1) model is also a good
model for the data. No ARCH effects are shown in residuals. The leverage parameter is
significant and positive showing that volatility of Nordstrom stock returns is affected
more heavily by negative shocks.
The fitted model can be written as follows:
rt = 0.0036 + at
at = σtet with σ2t = (0.034 + 0.13 Nt-1) a2t-1 + 0.800 σ2t-1
with Nt-1 = 1 if at-1<0, and Nt-1 = 0 if at-1>0
et has t-distribution with 5 degrees of freedom.
The MODEL Procedure
Nonlinear Liklhood Parameter Estimates
Approx Approx
Parameter Estimate Std Err t Value Pr > |t|
intercept 0.003589 0.00157 2.28 0.0226
df 5.028263 0.9762 5.15 <.0001
arch0 0.000095 0.000044 2.18 0.0297
arch1 0.034154 0.0259 1.32 0.1869
garch1 0.800444 0.0584 13.71 <.0001
gamma 0.129939 0.0496 2.62 0.0090
R output
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.003880 0.001519 2.5535 0.010665
omega 0.000083 0.000053 1.5798 0.114152
alpha1 0.039211 0.032406 1.2100 0.226283
beta1 0.858372 0.058370 14.7058 0.000000
gamma1 0.153341 0.074560 2.0566 0.039723
shape 5.767186 1.177356 4.8984 0.000001
The fitted model can be written as follows:
rt = 0.0039 + at
at = σtet with σ2t = (0.039 + 0.15 Nt-1) a2t-1 + 0.858 σ2t-1
with Nt-1 = 1 if at-1<0, and Nt-1 = 0 if at-1>0
et has t-distribution with 6 degrees of freedom.
8. Is the leverage effect significant at the 5% level?
The leverage parameters in both the EGARCH and the GJR models are significant, with
values indicating that volatility reacts more heavily to negative shocks.
9. What model provides the best fit for the data? Explain.
Since the leverage parameters in the AR(0)-EGARCH(1,1) and AR(0)-GJR(1,1) models
are significant, and residuals are white noise with no ARCH effect, we can conclude that
both models provide a good explanation of the volatility behavior for Nordstrom stock
returns. Either model can be chosen; there is no clear winner. (Note that in SAS, PROC
MODEL does not compute the BIC criterion – backtesting can be used to compare models.)
In R the EGARCH(1,1) and the GJR-GARCH(1,1) models have very similar BIC values:
garch11.t egarch11 gjrgarch11
Akaike -3.172122 -3.186143 -3.183976
Bayes -3.139358 -3.146827 -3.144660
10. Use the selected model to compute up to 5 step-ahead forecasts of the simple returns and
its volatility.
The following predictions are based on the AR(0)-EGARCH(1,1) model. sigma is the
predicted conditional standard deviation or volatility, and series is the predicted return
(note that predicted returns are the sample mean returns since returns are white noise)
R output for forecasts
sigma series
2013-10-29 0.02732 0.003374
2013-10-30 0.02764 0.003374
2013-10-31 0.02796 0.003374
2013-11-01 0.02827 0.003374
2013-11-04 0.02858 0.003374
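These forecasts can be reproduced along the following lines (a sketch, assuming the fitted object egarch11.t.fit from step 6):
egarch11.t.fcst = ugarchforecast(egarch11.t.fit, n.ahead=5)
sigma(egarch11.t.fcst)    # predicted conditional standard deviations (volatility)
fitted(egarch11.t.fcst)   # predicted returns (constant at the estimated mean mu)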
SAS code
*import data from file;
proc import datafile='nordstrom_w_00_13.csv' out=myd replace;
(t-distribution QQplot of residuals for GJRGARCH model)
EXPLORATORY ANALYSIS
nobs 469.000000
NAs 0.000000
Minimum -0.166652
Maximum 0.164808
1. Quartile -0.022391
3. Quartile 0.029000
Mean 0.003843
Median 0.005614
Sum 1.802469
SE Mean 0.002079
LCL Mean -0.000242
UCL Mean 0.007929
Variance 0.002027
Stdev 0.045027
Skewness -0.040829
Kurtosis 1.569306
Title:
Jarque – Bera Normalality Test
Test Results:
STATISTIC:
X-squared: 49.4632
P VALUE:
Asymptotic p Value: 1.816e-11
The distribution of log(returns) is not normal: skewness is close to 0, but the excess kurtosis statistic of about 1.6 is far from the normal value of zero. The Jarque-Bera test is significant (p-value < 0.05), indicating that the hypothesis of a normal distribution for log(returns) can be rejected.
(Time series Plot of Google Prices)
The time series plot of the weekly price of Google shares above shows an increasing trend. The time plot of returns shows that Google stock returns exhibit periods of high volatility at various times. Most returns are between +/- 10%, with some weeks reaching +/- 15% during the more volatile stretches. Sample moments show that the distribution of log returns is somewhat symmetric, with tails somewhat heavier than the normal distribution (excess kurtosis = 1.56). Volatility does not die down quickly; it continues for a while after a volatile period.
To check the null hypothesis of non-stationarity: the Dickey-Fuller test is expressed as H0: φ1 = 1 vs Ha: φ1 ≠ 1, where the null hypothesis indicates unit-root non-stationarity.
The Dickey-Fuller test shows that the returns time series is stationary (no unit root): the test p-value at lag order 7 is less than 0.05, therefore the null hypothesis of non-stationarity can be rejected.
Augmented Dickey-Fuller Test
Test Results:
PARAMETER:
Lag Order: 7
STATISTIC:
Dickey-Fuller: -7.1331
P VALUE:
0.01
alternative hypothesis: stationary
ACF TESTS
The ACF plots below show the Google stock returns are not correlated indicating a constant mean model for rt. Both the squared returns time series and the absolute time series show large autocorrelations. We can conclude that the log returns process has a strong non-linear dependence.
Ljung Box testing for ARCH effects for Lags of 6-12-18 for squared returns and absolute returns
Box-Ljung test
data: coredata(rets^2)
X-squared = 48.3339, df = 6, p-value = 1.013e-08
Box-Ljung test
data: coredata(rets^2)
X-squared = 71.9976, df = 12, p-value = 1.352e-10
Box-Ljung test
data: coredata(rets^2)
X-squared = 98.7945, df = 18, p-value = 3.68e-13
The Ljung-Box tests confirm that the squared returns are autocorrelated, i.e. the analysis shows a strong ARCH effect. The Ljung-Box tests on the squared returns are highly significant with p-values < 0.005, and the autocorrelation plots show large autocorrelations for the first 18 lags.
Plot of PACF: there is no significant partial autocorrelation through lag 13, which indicates that no AR lags are needed in the mean equation.
MODEL FITTING
*---------------------------------*
* GARCH Model Fit *
*---------------------------------*
Conditional Variance Dynamics
-----------------------------------
GARCH Model : sGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : std
Optimal Parameters
------------------------------------
Estimate Std. Error t value Pr(>|t|)
mu 0.004432 0.001778 2.4924 0.012689
omega 0.000103 0.000053 1.9574 0.050295
alpha1 0.094169 0.032716 2.8784 0.003997
beta1 0.856326 0.044635 19.1850 0.000000
shape 6.797338 2.000925 3.3971 0.000681
The output above fits an ARMA(0,0)-GARCH(1,1) model for the log returns using a t-distribution for the error terms.
Fitted Model:
rt = 0.0044 + at, at = σtet with σ2t = 0.000103 + 0.094 a2t-1 + 0.856 σ2t-1
where et has a t-distribution with 7 degrees of freedom (nearest integer to the shape value).
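The numbers in the fitted equation come directly from the estimated coefficients; a short sketch, assuming the garch11.fit object created in the code at the end:
coef(garch11.fit)          # mu, omega, alpha1, beta1, shape
persistence(garch11.fit)   # alpha1 + beta1, about 0.95: volatility shocks decay slowly
halflife(garch11.fit)      # number of weeks for a volatility shock to decay by half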
Residual Analysis
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.0339 0.8539
Lag[p+q+1][1] 0.0339 0.8539
Lag[p+q+5][5] 2.4766 0.7800
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 0.1274 0.7212
Lag[p+q+1][3] 1.4245 0.2327
Lag[p+q+5][7] 3.4346 0.6333
d.o.f=2
Above output shows there is no evidence of autocorrelation in residuals. They behave like white noise. Also, there is no evidence of serial correlation in squared residuals. They also behave like white noise.
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 8465 0
2 30 13148 0
3 40 17834 0
4 50 22521 0
Goodness-of-fit test: the assumed error distribution is strongly rejected (all p-values are 0).
To fit ARMA(0,0)-eGARCH(1,1) model with Gaussian distribution:
Conditional Variance Dynamics
———————————–
GARCH Model : eGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : norm
Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)
mu 0.004814 0.001837 2.6205 0.008780
omega -0.427600 0.165861 -2.5781 0.009936
alpha1 -0.086690 0.033886 -2.5583 0.010518
beta1 0.931657 0.026273 35.4604 0.000000
gamma1 0.177187 0.048248 3.6724 0.000240
Fitted Model is:
rt = 0.0048 + at, at = σtet with ln(σ2t) = -0.428 - 0.087 et-1 + 0.177(|et-1| - E(|et-1|)) + 0.932 ln(σ2t-1)
Residual Analysis:
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.03655 0.8484
Lag[p+q+1][1] 0.03655 0.8484
Lag[p+q+5][5] 2.73813 0.7403
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 0.2255 0.6348
Lag[p+q+1][3] 2.6812 0.1015
Lag[p+q+5][7] 4.1362 0.5300
d.o.f=2
ARCH LM Tests
————————————
Statistic DoF P-Value
ARCH Lag[2] 2.224 2 0.3290
ARCH Lag[5] 4.682 5 0.4559
ARCH Lag[10] 7.053 10 0.7204
Above output shows there is no evidence of autocorrelation in residuals. They behave like white noise. Also, there is no evidence of serial correlation in squared residuals. They also behave like white noise.
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 26.65 0.1131
2 30 32.04 0.3179
3 40 43.67 0.2798
4 50 47.74 0.5243
The output above shows that the normal distribution for the error terms is not rejected, as all the p-values are greater than 0.05.
To fit an ARMA(0,0)-TGARCH(2,1) (gjrGARCH) model with Gaussian distribution:
Conditional Variance Dynamics
———————————–
GARCH Model : gjrGARCH(2,1)
Mean Model : ARFIMA(0,0,0)
Distribution : norm
Optimal Parameters
————————————
Estimate Std. Error t value Pr(>|t|)
mu 0.004928 0.001860 2.649926 0.008051
omega 0.000120 0.000053 2.257334 0.023987
alpha1 0.000000 0.000877 0.000011 0.999991
alpha2 0.072854 0.037335 1.951388 0.051011
beta1 0.828559 0.053752 15.414339 0.000000
gamma1 0.397469 0.144242 2.755564 0.005859
gamma2 -0.315459 0.131445 -2.399922 0.016399
Fitted Model is:
rt = 0.0049 + at, at = σtet
σ2t = 0.00012 + (0.000 + 0.397 Nt-1) a2t-1 + (0.073 - 0.315 Nt-2) a2t-2 + 0.829 σ2t-1
Residual Analysis:
Q-Statistics on Standardized Residuals
————————————
statistic p-value
Lag[1] 0.2311 0.6307
Lag[p+q+1][1] 0.2311 0.6307
Lag[p+q+5][5] 2.9412 0.7090
d.o.f=0
H0 : No serial correlation
Q-Statistics on Standardized Squared Residuals
————————————
statistic p-value
Lag[1] 2.137 0.14380
Lag[p+q+1][4] 4.191 0.04063
Lag[p+q+5][8] 4.680 0.45620
d.o.f=3
ARCH LM Tests
————————————
Statistic DoF P-Value
ARCH Lag[2] 2.461 2 0.2921
ARCH Lag[5] 4.904 5 0.4277
ARCH Lag[10] 7.140 10 0.7122
The output above shows no evidence of autocorrelation in the residuals; they behave like white noise. The squared residuals also show little serial correlation, although the Lag[4] Q-statistic is marginally significant (p = 0.041).
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 20.00 0.3947
2 30 32.43 0.3014
3 40 30.87 0.8203
4 50 54.56 0.2714
The output above shows that the normal distribution for the error terms is not rejected, as all the p-values are greater than 0.05.
BEST FIT FOR THE MODEL:
garch11 egarch11 gjrgarch11
Akaike -3.479720 -3.455538 -3.474405
Bayes -3.435471 -3.411288 -3.412455
Shibata -3.479944 -3.455762 -3.474841
Hannan-Quinn -3.462310 -3.438127 -3.450030
From the output above, the AR(0)-GARCH(1,1) model with t-distributed errors has the smallest (most negative) values for all of the fit criteria and hence can be considered the best model by information criteria.
Forecast Analysis:
RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST and EVALUATE STATISTICS for 5 Step Ahead Forecast.
garch11 egarch11 tgarch21
MSE 0.000950199 0.0009498531 0.0009499473
MAE 0.02257449 0.0225407 0.02255243
DAC 0.6 0.6 0.6
Since the MSE is smallest for the eGARCH model, we should consider this model for forecasting.
CODE:
#Tomas Georgakopoulos CSC 425 Final Project
setwd("C:/Course/CSC425")
# Analysis of Google weekly returns
library(forecast)
library(TSA)
# import data in R and compute log returns
# import libraries for TS analysis
library(zoo)
library(tseries)
myd= read.table('Weekly-GOOG-TSDATA.csv', header=T, sep=',')
pricets = zoo(myd$price, as.Date(as.character(myd$date), format=c("%m/%d/%Y")))
#log return time series
rets = log(pricets/lag(pricets, -1))
# strip off the dates and just create a simple numeric object
ret = coredata(rets);
#compute statistics
library(fBasics)
basicStats(rets)
#HISTOGRAM
par(mfcol=c(1,2))
hist(rets, xlab="Weekly Returns", prob=TRUE, main="Histogram")
#Add approximating normal density curve
xfit<-seq(min(rets,na.rm=TRUE),max(rets,na.rm=TRUE),length=40)
yfit<-dnorm(xfit,mean=mean(rets,na.rm=TRUE),sd=sd(rets,na.rm=TRUE))
lines(xfit, yfit, col="blue", lwd=2)
#CREATE NORMAL PROBABILITY PLOT
qqnorm(rets)
qqline(rets, col = 'red', lwd=2)
# creates time plot of log returns
par(mfrow=c(1,1))
plot(rets)
#Perform Jarque-Bera normality test.
normalTest(rets,method=c('jb'))
#SKEWNESS TEST
skew_test=skewness(rets)/sqrt(6/length(rets))
skew_test
print("P-value = ")
2*(1-pnorm(abs(skew_test)))
#FAT-TAIL TEST
k_stat = kurtosis(rets)/sqrt(24/length(rets))
print("Kurtosis test statistic")
k_stat
print("P-value = ")
2*(1-pnorm(abs(k_stat)))
#COMPUTE DICKEY-FULLER TEST
library(fUnitRoots)
adfTest(rets, lags=7, type=c("c"))
# Computes Ljung-Box test on squared returns to test non-linear independence at lag 6 and 12
Box.test(coredata(rets^2),lag=6,type='Ljung')
Box.test(coredata(rets^2),lag=12,type='Ljung')
Box.test(coredata(rets^2),lag=18,type='Ljung')
# Computes Ljung-Box test on absolute returns to test non-linear independence at lag 6 and 12
Box.test(abs(coredata(rets)),lag=6,type='Ljung')
Box.test(abs(coredata(rets)),lag=12,type='Ljung')
Box.test(abs(coredata(rets)),lag=18,type='Ljung')
# Plots ACF function of vector data
par(mfrow=c(3,1))
acf(ret)
# Plot ACF of squared returns to check for ARCH effect
acf(ret^2)
# Plot ACF of absolute returns to check for ARCH effect
acf(abs(ret))
#plot returns, squared returns and abs(returns)
par(mfrow=c(3,1))
# Time plot of log returns
plot(rets, type='l')
# Time plot of squared returns to check for ARCH effect
plot(rets^2,type='l')
# Time plot of absolute returns to check for ARCH effect
plot(abs(rets),type='l')
par(mfrow=c(1,1))
# plots PACF of squared returns to identify order of AR model
pacf(coredata(rets),lag=30)
#GARCH Models
library(rugarch)
#Fit AR(0,0)-GARCH(1,1) model
garch11.spec=ugarchspec(variance.model=list(garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)),distribution.model = "std")
#estimate model
garch11.fit=ugarchfit(spec=garch11.spec, data=rets)
garch11.fit
#Fit AR(0,0)-eGARCH(1,1) model with Gaussian distribution
egarch11.spec=ugarchspec(variance.model=list(model = "eGARCH",garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)))
#estimate model
egarch11.fit=ugarchfit(spec=egarch11.spec, data=rets)
egarch11.fit
#Fit AR(0,0)-TGARCH(1,1) model with norm-distribution
gjrgarch11.spec=ugarchspec(variance.model=list(model = "gjrGARCH",garchOrder=c(2,1)), mean.model=list(armaOrder=c(0,0)), distribution.model = "norm")
#estimate model
gjrgarch11.fit=ugarchfit(spec=gjrgarch11.spec, data=rets)
gjrgarch11.fit
# compare information criteria
model.list = list(garch11 = garch11.fit,
egarch11 = egarch11.fit,
gjrgarch11 = gjrgarch11.fit)
info.mat = sapply(model.list, infocriteria)
rownames(info.mat) = rownames(infocriteria(garch11.fit))
info.mat
# RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST
# EVALUATION STATISTICS
garch11.fit = ugarchfit(spec=garch11.spec, data=rets, out.sample=100)
egarch11.fit = ugarchfit(egarch11.spec, data=rets, out.sample=100)
tgarch11.fit = ugarchfit(spec=gjrgarch11.spec, data=rets, out.sample=100)
garch11.fcst = ugarchforecast(garch11.fit, n.roll=100, n.ahead=1)
egarch11.fcst = ugarchforecast(egarch11.fit, n.roll=100, n.ahead=1)
tgarch11.fcst = ugarchforecast(tgarch11.fit, n.roll=100, n.ahead=1)
fcst.list = list(garch11=garch11.fcst,
egarch11=egarch11.fcst,
tgarch21=tgarch11.fcst)
fpm.mat = sapply(fcst.list, fpm)
fpm.mat
The average weekly gross return of Google shares is 1.005 (i.e. roughly +0.5% per week) with a standard deviation of 0.045. The distribution of returns is more or less symmetric but has a high peak. The normality test (Jarque-Bera test) rejects the hypothesis of normality for the returns data at the 5% level.
Jarque Bera Test
data: returns
X-squared = 54.1642, df = 2, p-value = 1.731e-12
The distribution of log(returns) is also not close to a normal distribution: skewness is close to 0 but the excess kurtosis statistic is about 1.5, far from the normal value of zero. The Jarque-Bera test is significant (p-value < 0.05), indicating that the hypothesis of a normal distribution for log(returns) can be rejected.
Jarque Bera Test
data: rets
X-squared = 49.4632, df = 2, p-value = 1.816e-11
The following plot is the time series plot of the weekly price of Google shares, which shows an increasing trend.
Time series Plot of Google Prices
The time plot of returns shows that Google stock returns exhibit periods of high volatility at various times. Most returns are between +/- 10%. Sample moments show that the distribution of log returns is somewhat symmetric, with tails somewhat heavier than the normal distribution (excess kurtosis = 1.56).
To check the null hypothesis of non-stationarity:
The Dickey-Fuller test is expressed as H0: φ1 = 1 vs Ha: φ1 ≠ 1, where the null hypothesis indicates unit-root non-stationarity.
The Dickey-Fuller test shows that the returns time series is stationary (no unit root): the test p-value at lag order 7 is less than 0.05, therefore the null hypothesis of non-stationarity can be rejected.
Augmented Dickey-Fuller Test
data: rets
Dickey-Fuller = -7.1264, Lag order = 7, p-value = 0.01
alternative hypothesis: stationary
To check serial correlations in the log returns:
Log returns are not serially correlated, as shown by the Box-Pierce test with p-value > 0.05 and by the autocorrelation plot.
Box-Pierce test
data: coredata(rets)
X-squared = 0.686, df = 1, p-value = 0.4075
To look for evidence of ARCH effects in the log returns:
The analysis below shows a strong ARCH effect: the Box-Pierce test on the squared returns is highly significant with p-value < 0.005, and the autocorrelation plots show large autocorrelations for the first 15 lags.
Box-Pierce test
data: coredata(rets^2)
X-squared = 8.1483, df = 1, p-value = 0.00431
Plot of PACF: there is no significant partial autocorrelation through lag 10, which indicates that no AR lags are needed.
To fit an ARMA(0,0)-GARCH(2,1) model for the log returns using a normal distribution for the error terms:
Fitted Model:
Residual Analysis:
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 28.78 0.06948
2 30 38.57 0.11019
3 40 46.39 0.19380
4 50 51.15 0.38929
The output above shows that the normal distribution for the error terms is not rejected, as all the p-values are greater than 0.05.
To fit ARMA(0,0)-eGARCH(1,1) model with Gaussian distribution:
Fitted Model is:
Residual Analysis:
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 28.95 0.06673
2 30 27.82 0.52739
3 40 46.91 0.18000
4 50 50.72 0.40546
The output above shows that the normal distribution for the error terms is not rejected, as all the p-values are greater than 0.05.
To fit an ARMA(0,0)-TGARCH(2,1) model with a normal distribution:
Fitted Model:
Residual Analysis:
Adjusted Pearson Goodness-of-Fit Test:
————————————
group statistic p-value(g-1)
1 20 18.29 0.5030
2 30 31.28 0.3525
3 40 29.17 0.8742
4 50 51.36 0.3813
The output above shows that the normal distribution for the error terms is not rejected, as all the p-values are greater than 0.05.
BEST FIT FOR THE MODEL:
garch21 egarch11 gjrgarch21
Akaike -3.436225 -3.451379 -3.463469
Bayes -3.391975 -3.407129 -3.401519
Shibata -3.436449 -3.451603 -3.463905
Hannan-Quinn -3.418814 -3.433968 -3.439094
From the output above, the AR(0)-EGARCH(1,1) and AR(0)-GJR(2,1) models have almost equal (smallest) values across the fit criteria, and hence either can be considered the best model.
Forecast Analysis:
RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST and EVALUATE STATISTICS for 5 Step Ahead Forecast.
Model garch21 egarch11 tgarch21
MSE 0.0009497851 0.0009497741 0.0009499554
MAE 0.0224897000 0.0225260500 0.0225533000
DAC 0.6000000000 0.6000000000 0.6000000000
Since the MSE is smallest for the eGARCH model, we should consider this model for forecasting.
CODE:
library(e1071)
library(tseries)
library(xts)
library(rugarch)
price = scan("clipboard")
pricets=ts(price,frequency=52,start=c(2004,45))
returns=pricets/lag(pricets, -1)
summary(price)
skewness(price)
kurtosis(price)
sd(price)
summary(returns)
skewness(returns)
kurtosis(returns)
sd(returns)
#Normality check
hist(returns, prob=TRUE)
lines(density(returns, adjust=2), type="l")
jarque.bera.test(returns)
#log return time series
rets = log(pricets/lag(pricets, -1))
#Normality check
hist(rets, prob=TRUE)
lines(density(rets, adjust=2), type="l")
jarque.bera.test(rets)
plot.ts(pricets)
# strip off the dates and just create a simple numeric object (require:xts)
ret = coredata(rets);
# creates time plot of log returns
plot(rets)
summary(rets)
skewness(rets)
kurtosis(rets)
sd(returns)
#ADF test for checking null hypothesis of non stationarity
adf.test(rets)
# Computes Ljung-Box test on returns
Box.test( coredata(rets))
# Computes Ljung-Box test on squared returns to test non-linear independence
Box.test(coredata(rets^2))
# Computes Ljung-Box test on absolute returns to test non-linear independence
Box.test(abs(coredata(rets)))
par(mfrow=c(3,1))
# Plots ACF function of vector data
acf(ret)
# Plot ACF of squared returns to check for ARCH effect
acf(ret^2)
# Plot ACF of absolute returns to check for ARCH effect
acf(abs(ret))
#plot returns, squared returns and abs(returns)
par(mfrow=c(3,1))
# Time plot of log returns
plot(rets, type='l')
# Time plot of squared returns to check for ARCH effect
plot(rets^2,type='l')
# Time plot of absolute returns to check for ARCH effect
plot(abs(rets),type='l')
par(mfrow=c(1,1))
# plots PACF of squared returns to identify order of AR model
pacf(coredata(rets),lag=10)
#specify model using functions in rugarch package
#Fit ARMA(0,0)-GARCH(2,1) model
garch21.spec=ugarchspec(variance.model=list(garchOrder=c(2,1)), mean.model=list(armaOrder=c(0,0)))
#estimate model
garch21.fit = ugarchfit(spec=garch21.spec, data=rets)
garch21.fit
#Fit ARMA(0,0)-eGARCH(1,1) model with Gaussian distribution
egarch11.spec=ugarchspec(variance.model=list(model = "eGARCH",
garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0)))
#estimate model
egarch11.fit = ugarchfit(spec=egarch11.spec, data=ret)
egarch11.fit
#Fit ARMA(0,0)-TGARCH(2,1) model with norm-distribution
gjrgarch21.spec=ugarchspec(variance.model=list(model = "gjrGARCH",
garchOrder=c(2,1)), mean.model=list(armaOrder=c(0,0)), distribution.model = "norm")
#estimate model
gjrgarch21.fit = ugarchfit(spec=gjrgarch21.spec, data=ret)
gjrgarch21.fit
# compare information criteria
model.list = list(garch21 = garch21.fit,
egarch11 = egarch11.fit,
gjrgarch21 = gjrgarch21.fit)
info.mat = sapply(model.list, infocriteria)
rownames(info.mat) = rownames(infocriteria(garch21.fit))
info.mat
# RE-FIT MODELS LEAVING 100 OUT-OF-SAMPLE OBSERVATIONS FOR FORECAST
# EVALUATION STATISTICS
garch21.fit = ugarchfit(spec=garch21.spec, data=ret, out.sample=100)
egarch11.fit = ugarchfit(egarch11.spec, data=ret, out.sample=100)
tgarch21.fit = ugarchfit(spec=gjrgarch21.spec, data=ret, out.sample=100)
garch21.fcst = ugarchforecast(garch21.fit, n.roll=100, n.ahead=5)
egarch11.fcst = ugarchforecast(egarch11.fit, n.roll=100, n.ahead=5)
tgarch21.fcst = ugarchforecast(tgarch21.fit, n.roll=100, n.ahead=5)
fcst.list = list(garch21=garch21.fcst,
egarch11=egarch11.fcst,
tgarch21=tgarch21.fcst)
fpm.mat = sapply(fcst.list, fpm)
fpm.mat