ECO FORECASTING


The first part of the project is essentially what I requested for the proposal in Assignment 3.  This portion of the project (you may call it the introduction) states the purpose of the project — to develop the best quarterly forecast of your assigned company's revenue.  In this introduction you will need to state your logic for choosing the X variables and demonstrate their linear relationships with the revenue variable (Y).  Support this logic with scatter plots and a correlation matrix.  Then discuss the characteristics of the Y data — the trend (T), cycle (C) or seasonality (S) present in the Y variable — along with basic statistics.
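For students who want to prototype this screening step outside Minitab, a minimal Python (pandas/matplotlib) sketch follows. The series here are randomly generated placeholders, not course data; substitute your assigned company's quarterly revenue and candidate X variables.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder quarterly data; replace with your assigned company's series.
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({"Revenue": 500 + 5 * np.arange(n) + rng.normal(0, 20, n)})
df["X1"] = 0.4 * df["Revenue"] + rng.normal(0, 10, n)   # a plausible driver
df["X2"] = rng.normal(100, 10, n)                       # an unrelated series

print(df.describe())    # basic statistics for the Y discussion
print(df.corr())        # correlation matrix: look for strong r with Revenue
pd.plotting.scatter_matrix(df, figsize=(7, 7))          # pairwise scatter plots
plt.show()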

 

The introduction should be followed by a section for each of the forecast methods.  You need one section each for Exponential Smoothing using the work in Assignment 4, Decomposition using the work in Assignment 5, ARIMA using the work in Assignment 6, and finally a section on Multiple Regression using the work in Assignment 7.  You should change my questions in the assignments into statements followed by your answers to make the project easier to follow.  Each section should discuss only the best model for the method, accompanied by error measures (RMSE and MAPE) for fit and forecast as well as residual analysis for the fit and the forecast.  Remember, show only the single best model for each method and do not show model failures.

 


Finally, you need to write a summary section that includes a table of fit and forecast RMSE and MAPE.  Choose the most accurate method and model in the table (the lowest MAPE and RMSE for the forecast period).  Again, the project is your write-up, supported by Minitab results, that determines the best revenue forecast for your assigned company.
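As a purely illustrative sketch (the numbers below are invented, not results), the summary table and the lowest-forecast-error rule could be assembled like this:

import pandas as pd

# Invented error measures for illustration; substitute your Minitab results.
summary = pd.DataFrame(
    {"Fit RMSE":      [412.0, 388.5, 401.2, 375.9],
     "Fit MAPE":      [3.1, 2.8, 3.0, 2.6],
     "Forecast RMSE": [530.4, 498.7, 512.3, 471.1],
     "Forecast MAPE": [4.2, 3.9, 4.1, 3.5]},
    index=["Exponential Smoothing", "Decomposition", "ARIMA", "Multiple Regression"])
print(summary)
# The winner is the method with the lowest forecast-period error:
print("Best method:", summary["Forecast MAPE"].idxmin())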

 

You should have an Appendix with each data series that you used in the project to support your work.  All X variables (excluding dummy variables and counters) should have citations. There is a description in Doc Sharing for the project. You may include an abstract or executive summary at the front of the project before the introduction as well.

New folder/Eco 309 03W 82819 Fall 2013 Syllabus x
Economics 309 03W 82819
Economic Forecasting
Fall 2013
(See your specific section syllabus in Doc Sharing)
 

Professor: Stanley Holmes

Email:
Stanley.Holmes@tamuc.edu

Phone: 903-468-6029 (Commerce) 903- 365-7190 (Home Office)

Office Hours: 11:00 A.M. to 12:00 P.M. and 1:00 P.M. to 4:00 P.M. Central Time Tuesdays and Thursdays, and Wednesdays 10:00 A.M. to 12:00 P.M. Central Time, or by appointment (BA Room 102, TAMU-Commerce). We may also meet online at our Classlive website by appointment.

Text: Business Forecasting, 9th ed., Hanke and Wichern.
Pearson/Prentice Hall, Inc. ISBN-13: 9780132301206

 

Software: You need to rent the student version of MINITAB 16 for 6 months at
http://www.minitab.com/en-US/academic/ 

Important Dates: Please refer to the academic calendar at:
http://www.tamu-commerce.edu/registrar/pdfs/academicCalendar09
 

CLASS: Online lectures will be held on Tuesdays from 6:30 P.M. until 9:30 P.M. Central Time. During the lectures we will cover specific chapters and examples mentioned in the syllabus. You may use the BA computer lab or the library computers at TAMUC as an alternative to your personal computer. I suggest that you download a copy of Minitab to enable you to follow examples during the lecture.

 

COURSE OBJECTIVE

The objectives of this course are to introduce the student to the basics of quantitative methods and their application to real business situations, as well as the use of current software available for forecasting. After taking this course students will be able to apply different forecasting techniques to empirically test economic theories and business policy analysis and to professionally present the results of their analysis.

COURSE OUTLINE  
Chapter 1 Introduction to Forecasting
Chapter 2 Review of Basic Statistical Concepts
Chapter 3 Data Patterns and Forecasting Techniques

Project Part 1 (Proposal- 5 points) Due 9/16

Chapter 4 Moving Averages and Smoothing Methods

Project Part 2A (5 extra credit points) Due 10/14

Chapter 5 Time-Series and Their Components

Project Part 2B (5 extra credit points) Due 10/21

EXAM 1—Chapters 1, 2, 3, 4, 5 (25 points) Due 10/24-10/26

 
Chapter 9 Box-Jenkins (ARIMA) Type Forecasting Models and Combining Forecast Methods

Project Part 3 (5 extra credit points) Due 11/4

 EXAM 2— Chapter 9 (25 points) Due 11/7-11/9

Chapter 6 Simple Linear Regression
Chapters 7& 8 Multiple Regression Analysis/Time Series

Project Part 4 (5 extra credit points) Due 11/25

EXAM 3—Chapters 6, 7 and 8 (25 points) Due 12/5-12/6

 

Completed Class Project Part 5 (20 points) Due 12/2

NOTE: This outline is subject to change! Check your e-mail multiple times every day, check our class eCollege website and attend the class regularly.
 

GRADES AND ADMINISTRATIVE MATTERS:

Grades will be based on 2 exams (25 points each), a 5-part formal class project (total of 25 points), and a comprehensive final exam (25 points).  Project parts must be completed and submitted on time to earn credit.  No late work will be accepted.  Plan in advance for the exams: there will be no early exams and no make-up exams. A missed exam will be considered an F unless your professor is notified prior to the exam and the excuse is a legitimate medical one or officially approved. Regardless of the excuse, if you miss two tests you will automatically fail the class. Again, late assignments and projects will not be accepted. Course grades will be assigned as:
90 – 100 % A
80 – 89 % B
70 – 79 % C
60 – 69 % D
Below 60 % F
See the student evaluation criteria below.
 

HELPFUL HINTS Since this is an enhanced course, you need to follow your school emails regularly. You will have regular announcements and uploads posted in the class eCollege website. For each chapter assigned, you need to read your book, make sure you understand the key concepts and apply the concepts using MINITAB. Reading the assigned materials, working the assigned exercises, using office hours, being in frequent communication with your instructor, and checking the class website regularly are very important learning tools. A textbook will be placed on 2 hour reserve in the library on campus in case the dog ate yours.  It can be checked out from the circulation desk. Unfortunately, there is not a similar online opportunity.

 

All assignments must be submitted to the appropriate assignment dropbox in the course eCollege website.   Each submission should have a filename with your first initial followed by your last name, eco 309 and assignment number (assign#).
 

EXAMS: Each exam will be online and can be found on our class eCollege website.  Each exam is subject to a time limit.  You will have to upload your answers to exam problems by the specified deadline. Late work will not be accepted.
 

PROJECT PARTS: You will have to upload your project proposals and projects to BOTH turn-it-in.com and the relevant dropbox folder on eCollege by midnight of the specified due date. Each submission should include a summary page of what you did, how you did it, and your interpretations of the results. Plots and output without interpretation will be considered incomplete and will not be graded. Please submit everything in Word format, cite your sources, and LABEL your variables. The class id for turn-it-in is 2769279 and your enrollment password is ECO309.
 

CLASS, LAB/WORKSHOP AND OFFICE HOURS: I strongly recommend using all options. Do not miss a class lecture session, and if you have any questions contact me for further explanations via email.
 

RULES, REGULATIONS AND OTHER STUFF

All students enrolled at the university shall follow the tenets of common decency and acceptable behavior conducive to a positive learning environment.
 
The College of Business and Technology at Texas A&M University-Commerce expects students to follow the highest level of ethical and professional behavior. Actionable conduct includes illegal activity, dishonest conduct, cheating, and plagiarism. Failure to abide by the principles of ethical and professional behavior will result in sanctions up to and including dismissal from the university.
 

PLAGIARISM   Plagiarism represents disregard for academic standards and is strictly against University policy. Plagiarized work will result in an “F” for the course and further administrative sanctions permitted under University policy. Guidelines for properly quoting someone else’s writings and the proper citing of sources can be found in the APA Publication Manual. If you do not understand the term “plagiarism”, or if you have difficulty summarizing or documenting sources, contact your professor for assistance.
 

STUDENT WORKLOAD University students are expected to dedicate a minimum of 90 clock hours during the term/semester for a 3SH course.

 

Students with Disabilities:

The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities.  Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities.  If you have a disability requiring an accommodation, please contact:

Office of Student Disability Resources and Services

Texas A&M University-Commerce

Gee Library

Room 132

Phone (903) 886-5150 or (903) 886-5835

Fax (903) 468-8148

StudentDisabilityServices@tamuc.edu

 
 
 

Student Evaluation Criteria

Criterion: Understanding of time series data and components using various statistical and graphical tools.
1 (Unsatisfactory): Student cannot demonstrate understanding of the components.
2 (Emerging): Student can identify some components.
3 (Proficient): Student can identify most components using most of the tools.
4 (Exemplary): Student can identify all components using all the tools.

Criterion: Understanding of regression analysis and application to both time series and cross-section data.
1 (Unsatisfactory): Student cannot demonstrate an understanding of regression analysis.
2 (Emerging): Student demonstrates an understanding of some regression concepts but cannot apply them.
3 (Proficient): Student demonstrates an understanding of the concept of regression and can apply those concepts.
4 (Exemplary): Student demonstrates an understanding of the concept of regression and can apply it to time series and cross-section data.

Criterion: Understanding and application of different univariate time series models including but not limited to Smoothing, Decomposition, and ARIMA.
1 (Unsatisfactory): Student cannot demonstrate an understanding of univariate methods.
2 (Emerging): Student demonstrates an understanding of some or all of the univariate time series models but cannot apply them.
3 (Proficient): Student demonstrates an understanding of some or all univariate time series models and applies some of them successfully.
4 (Exemplary): Student demonstrates an understanding of all univariate time series models and applies them successfully.

Criterion: Identification of the best model from alternative models and obtaining forecasts using at least one software package.
1 (Unsatisfactory): Student cannot demonstrate an understanding of the model selection processes.
2 (Emerging): Student can demonstrate an understanding of 1 of the 3 processes.
3 (Proficient): Student can demonstrate an understanding of 2 of the 3 processes.
4 (Exemplary): Student can demonstrate an understanding of the entire process.

 

New folder/MINITAB ASSIGNMENTS/Assignment 4 Project Part 2A x
Chapter 4 – Assignment 4

Assignment 4 Project Part 2A — The Exponential Smoothing Forecast.   Due by midnight 10/14.  Get it in on time or it will not be graded.  This part of the assignment is worth up to 2.5 extra credit points and can serve as the exponential smoothing part of your class project. 
 Show your work and submit it to the Chapter 4 Assignment 6 Dropbox. 
This assignment addresses forecasting your selected Y data (dependent variable) using an exponential smoothing technique.  Note:  Do not use the X (independent) variables in this exercise.  Use only one exponential smoothing method — the best that applies. Do not use any other forecasting techniques in this assignment.  Turn in only the one best model that you develop.
(Remember: 1. Do not show failed models in business reports.  Share your failures with your family if you wish, not with your boss or instructor.  2. Never use Y hold out data observations in any forecast model.)
a) Tell me why you selected the appropriate exponential smoothing method by commenting on your Y data characteristics. (You should use a time series plot and autocorrelations to do this.)
b) Apply the appropriate exponential smoothing forecast technique to your Y variable, excluding the last two years of data (the 8-quarter hold out period).  Show the Y data, fitted values and residuals in Excel format, and show your exponential smoothing model coefficients.  (Find the correct coefficients; do not just use the default values.)
c) Evaluate the goodness of fit using at least two error measures — RMSE and MAPE.
d) Check the “Fit” period residual mean proximity to zero and randomness with a time series plot; check the residual time series plot and autocorrelations (ACFs)  for trend, cycle and seasonality.  
e) Evaluate the residuals for the "Fit" period by indicating the residual distribution using a histogram (normal or not, and random or not).
f) Comment on the acceptability of the model’s ability to pick up the systematic variation in your Fit period actual data.
g) Develop a two year quarterly forecast (for the hold out period). 
h) Evaluate the accuracy of the forecast for the hold out period using the RMSE and MAPE error measures computed from the forecast period residuals, and comment on them.
i) Do the forecast period residuals seem to be random relative to the hold out period data? Check the forecast period time series plot of the residuals.
j) Did the error measures get worse, remain the same or get better from the fit to the hold out period?  Do you think the forecast accuracy is acceptable?
Show your work and graphs in a Word document.  Make sure that you comment on statistics and graphs relevant to answering the above questions.  DO NOT leave statistics and graphs stranded.  If you show something write about it.  Note that this work will become part of your class project so do a good job on it.
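The assignment itself is a Minitab exercise, but the workflow can be sanity-checked outside Minitab. Below is a minimal Python (statsmodels) sketch under stated assumptions: a synthetic quarterly series with trend and multiplicative seasonality (hence a Winters-type model), an 8-quarter hold out, and RMSE/MAPE for both fit and forecast. It is an illustration, not the required Minitab output.

import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic stand-in for the Y variable: 12 years of quarterly data.
rng = np.random.default_rng(1)
t = np.arange(48)
y = pd.Series((200 + 3 * t) * np.tile([0.9, 1.1, 1.2, 0.8], 12) + rng.normal(0, 5, 48))

fit_data, holdout = y[:-8], y[-8:]                 # 8-quarter hold out period
model = ExponentialSmoothing(fit_data, trend="add", seasonal="mul",
                             seasonal_periods=4).fit()   # optimizes the coefficients
fcast = model.forecast(8)

def rmse(a, f): return float(np.sqrt(np.mean((a - f) ** 2)))
def mape(a, f): return float(np.mean(np.abs((a - f) / a)) * 100)

print("Fit      RMSE %.2f  MAPE %.2f%%" % (rmse(fit_data, model.fittedvalues),
                                           mape(fit_data, model.fittedvalues)))
print("Forecast RMSE %.2f  MAPE %.2f%%" % (rmse(holdout, fcast), mape(holdout, fcast)))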
 

New folder/MINITAB ASSIGNMENTS/Assignment 5 Project Part 2B x
Assignment 5 Project Part 2B — The Decomposition Forecast.  Due by midnight 10/21.  Get this in on time or it will not be graded.  This part of the assignment is worth up to 2.5 extra credit points and can serve as the Decomposition part of your class project. 
Show your work and submit it to the Chapter 5 Assignment 7 Dropbox.  (Again: 1. Do not show failed models in business reports.  Share your failures with your family if you wish, not with your boss or instructor.  2. Never use Y hold out data observations in any forecast model.)
a) Perform time series decomposition on your project Y variable, excluding the hold out period.  Show the smoothed trend values (TREN in Minitab), the smoothed cycle values (use the Minitab Calculator to compute DESE/TREN for the cycle factors) and the seasonal indexes (SEAS in Minitab).
— Note that you must use the last cycle factor and multiply each forecast observation by it to get a cycle-adjusted forecast.  Since this is a multiplicative decomposition model, this adjustment must be applied to the Minitab result to obtain a reasonable forecast.  We have discussed this procedure in class.
b) Show the seasonal indices (SEAS in Minitab) and develop a one year time series plot of them.  Do they indicate strong seasonality?  How can you tell?
c) Evaluate the goodness of fit using the RMSE and MAPE error measures.
d) Evaluate the residuals for the “Fit” period by indicating the residual distribution (random or not).  Use a fit period residual time series plot, residuals ACFs and a histogram to determine if the Fit period residuals are random.  If the residuals are not random state if you detect any trend, cycle and seasonality autoregressive characteristics.  (Note: you expect to see only cycle in the residuals — any T or S is a signal that the model did not use this information.  You will adjust the cycle component in the forecast by using the last cycle factor in the forecast.)
e) Develop a two year quarterly forecast (for the hold out period) using the time series decomposition model you evaluated in c) above and adjust the forecast with the last cycle factor.  Evaluate the reasonableness of the forecast by appending the cycle adjusted decomposition forecast to the Y data and developing a time series plot.
f) Evaluate the “Accuracy” of the model for the “hold out period” using the RMSE and MAPE measures used in part b) and comment on them.  Did the error measures increase, remain the same or decrease from the “Fit” to “Hold Out” or forecast period?

Show your work and graphs in a Word document.  Make sure that you comment on statistics and graphs relevant to answering the above questions.  Again, this will be the decomposition portion of your class project.
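A hedged Python sketch of the same steps follows; note that statsmodels' seasonal_decompose is not identical to Minitab's decomposition, so treat this only as an outline of the TREN, SEAS and DESE/TREN mechanics described above. The series and the trend fit are placeholders.

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Placeholder quarterly series with the 8-quarter hold out already removed.
rng = np.random.default_rng(2)
t = np.arange(40)
y = pd.Series((150 + 2 * t) * np.tile([0.85, 1.05, 1.25, 0.85], 10) + rng.normal(0, 4, 40))

dec = seasonal_decompose(y, model="multiplicative", period=4)
seas = dec.seasonal[:4]             # the four seasonal indexes (SEAS)
dese = y / dec.seasonal             # deseasonalized series (DESE)

b, a = np.polyfit(t, dese, 1)       # linear trend on the deseasonalized data
trend = a + b * t                   # TREN
cycle = dese / trend                # cycle factors (DESE/TREN)
last_cf = cycle.iloc[-1]            # the last cycle factor adjusts the forecast

h = np.arange(40, 48)               # 8-quarter forecast horizon
fcast = (a + b * h) * np.tile(seas.values, 2) * last_cf   # T x S x C
print(fcast)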

New folder/MINITAB ASSIGNMENTS/Assignment 6 Project Part 3 x
Assignment 6 Project Part 3 — The ARIMA Forecast.  This assignment is due by midnight Nov 4th.  The assignment is worth a maximum of 2.5 extra credit points and may serve as the project ARIMA section.  No late submissions will be graded.
(Again — 1. Do not show failed models in business reports.  Share your failures with your family if you wish and not with your boss or instructor.and 2.Never use Y hold out data observations in any forecast model.)
Complete each of the following sections.
a) Examine the Y data (excluding the hold out period) to determine if it needs to be differenced to make it stationary.  Show a time series plot of the raw Y data and autocorrelation functions (ACFs).  
b) From your time series data plot and ACFs, determine if you have seasonality.  If you do, use seasonal differences to remove it and run the ACFs and PACFs on the seasonally differenced Y data series.
c) Fill out the ARIMA seasonal menu (P,D,Q) appropriately.  If you have no trend as shown by the seasonally differenced ACFs run the ARIMA model and note the significance of each coefficient.  Make model adjustments accordingly to improve results.  
Note:  You may not use an ARIMA model with non-significant coefficients to forecast.  If the coefficients are not significant, derive another model that has significant coefficients and the lowest residual MS value.
d) If it requires differencing for trend to make it stationary do so and run another time series plot and ACFs on the differenced data. If this requires differencing again do so but run time series plots and ACFs each time you do. 
e) Run and show the PACFs on your stationary data series and identify the appropriate ARIMA model and show the initial ARIMA non seasonal menu section (p,d,q) filled out appropriately and any seasonal (P,D,Q) components in the seasonal menu filled out.  
f) Run the ARIMA model and note the significance of each coefficient.  Make model adjustments accordingly to improve results shown by the residual MSE. 
g) Calculate the two error measures that you used in the other model analyses and comment on the acceptability of their size.
h) Note the LBQ-associated p-values for the selected lags.  Each should be above .05 (that is, not significant) to qualify the residuals as potentially random.  If the residuals are not random, select an alternative ARIMA model form that has random residuals.
i) Run an ARIMA forecast for your hold out period and show a time series plot of the residuals (Y actual minus Y forecast) for the 8-quarter hold out period.
j) Calculate the hold out period RMSE and MAPE (Refer back to earlier chapters for the error measure formulas) and compare them to the Fit period ARIMA error measures (from g above).
k) Plot the forecast values appended to the Y data without the hold out to check for forecast reasonableness. 
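For orientation, here is a compressed Python (statsmodels) sketch of steps a) through k). The series is synthetic and the order (1,1,0)(0,1,1)[4] is only an example; you must identify your own order from the ACFs and PACFs, and Minitab remains the required tool.

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(3)
y = pd.Series(np.cumsum(rng.normal(2, 5, 48)) + np.tile([0, 10, 20, -10], 12))

fit_data, holdout = y[:-8], y[-8:]
plot_acf(fit_data, lags=16)                        # a) check for trend/seasonality
plot_pacf(fit_data.diff(4).dropna(), lags=12)      # b)/e) after seasonal differencing

res = ARIMA(fit_data, order=(1, 1, 0),             # example order only
            seasonal_order=(0, 1, 1, 4)).fit()
print(res.summary())                               # f) coefficient significance
print(acorr_ljungbox(res.resid.dropna(), lags=[4, 8, 12]))  # h) LBQ p-values > .05

fcast = res.forecast(8)                            # i) hold-out forecast
resid = holdout - fcast                            # j) hold-out residuals
print("Hold-out RMSE:", float(np.sqrt((resid ** 2).mean())))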
Read chapters 6 and 7.   Go to assignment 10 in chapter 6.

New folder/MINITAB ASSIGNMENTS/Assignment 7 Project Part 4 x
Chapter 8 – Assignment 7

Assignment 7 Project Part 4– The Multiple Regression Forecast — This assignment is due by midnight November 25th.  This completed assignment is worth up to 2.5 extra credit points and may serve as the multiple regression portion of the class project.  Late submissions will not be graded.
This assignment is essentially the multiple regression analysis portion of your project.  This means that I expect you to develop a good regression model with more than one independent variable (X).  Ideally, if you made a good choice of variables in your proposal, you should be able to include all three or more X variables in your regression equation.  Be sure to complete each part and write your responses supported by Minitab/Excel work.  This assignment should be turned in to me as a Word document.  You should include Excel and Minitab tables and graphs in the Word document as required.  Be sure to comment on each of the 12 points below.
1.  Run scatter plots and a correlation matrix on your project variables and comment on their values and significance.  If you have done this earlier you may use that analysis here.
2.  Note any seasonality in your Y data with ACFs (autocorrelation analysis of Y).  You may use ACFs that you previously developed.
3.  Determine if any of your variables require transformation.  If they do, calculate the transformed values, create a scatter plot with a regression line, and run a correlation with Y for each transformed X.  Create a table for the Y, X and transformed X values.
4.  Determine if your model requires dummy variables (e.g. for Y variable seasonality or significant events) and include a table of the dummy variable values for regression analysis.  You may either use the Decomposition centered moving average (CMA) and seasonal indices (SI) to seasonally adjust your Y variable, or use dummy X variables in regression.
5.  Use regression to evaluate the variable combinations to determine the best regression model.  Note that if any seasonal dummy variables are used, all of the seasonal dummy variables must be used.  Use R-square and F as the primary determinants of the best model.
6.  Note the significance of each slope term in the model.  Rule: if a coefficient is not significant then you may not use the model to forecast.
7.  Investigate your best model using appropriate statistics or graphs to comment on possible:
a.  Autocorrelation (serial correlation), with the DW statistic
b.  Heteroscedasticity, with a residuals-versus-order plot (look for a megaphone effect)
c.  Multicollinearity, with the VIF statistic
8.  Determine the best remedies for any of the problems identified in 7 above and make the appropriate changes to your regression model if required.  Rerun the model and evaluate the fit again, including error measures, adjusted R-square, F value, slope coefficient significance, DW and VIF.
9.  Evaluate the best multiple regression model's accuracy with 2 error measures (RMSE and MAPE), each for the fit and again for the forecast period.
10.  Evaluate the best model's fit residuals and comment on their randomness using autocorrelation functions (ACFs), a histogram and a normality plot (you should use the four-in-one graph as well).  Comment on the cause of the error — trend, cycle or seasonality — and whether it is statistically significant.
11.  Forecast for the holdout period using your hold-out X values to forecast Y.  You can use the Minitab Regression Options menu by placing the columns for the X variables' hold-out values and any dummy variable predictions in the "Minitab/Regression/Options/Prediction intervals for new observations" area.
12.  Evaluate the forecast error measures and residuals to determine if the error is acceptable or has systematic variation.  Write your conclusion relative to the acceptability of the sales forecast.
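A hedged Python (statsmodels) sketch of this workflow is below. The three X variables, the quarterly dummies and all data are hypothetical placeholders; the point is only to show where R-square, F, slope t-tests, DW and VIF come from and how the hold-out X values produce the Y forecast.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
n = 48
df = pd.DataFrame({"X1": rng.normal(100, 10, n),
                   "X2": rng.normal(50, 5, n),
                   "X3": rng.normal(20, 2, n)})
quarters = pd.Series(np.tile([1, 2, 3, 4], n // 4))
dummies = pd.get_dummies(quarters, prefix="Q", drop_first=True).astype(float)
X = sm.add_constant(pd.concat([df, dummies], axis=1))
y = 5 + 2 * df["X1"] + 3 * df["X2"] + rng.normal(0, 8, n)   # placeholder Y

model = sm.OLS(y[:-8], X[:-8]).fit()               # fit period only (8-quarter hold out)
print(model.summary())                             # R-sq, F, slope t-tests
print("DW:", durbin_watson(model.resid))           # serial correlation check
print("VIF:", [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])])

fcast = model.predict(X[-8:])                      # hold-out X values forecast Y
print(fcast)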

New folder/MINITAB ASSIGNMENTS/Assignment 8 — Project Part 5 — Class Projec x
Assignment 8 —  Project Part 5  — Class Project is due December 2.   This is worth a maximum of 20 final grade points.  No late submissions will receive a grade.  
You will not be given an example for the project.  Consider it a business assignment that you have been given by an executive to forecast the company sales (Y) variable.  The projects will be evaluated on how well you forecast the Y variable and on your use of the four alternative forecast techniques.  Consider the reader not as an economics professor but as an executive who requires the best forecast of Y.
You must not make this a forecasting tutorial — executives may take offense at this.  Assume that the executive is at least at the MBA level and has basic familiarity with statistical concepts.
You must not use possessive terms such as "my", "our" or "your" when referring to data, statistics or models in this study.  This is a sign of poor professional business report writing.
You must provide statistical or graphical support for your points in the project.  Do not make assertions without support or proof.  For example, what constitutes the "best" forecast?  Why are the X variables "significant" in forecasting the Y variable?  What is significant seasonality?
Remember that this is a business report and not a Minitab exercise.  As a result, each Minitab plot, table or statistic must be appropriately narrated relative to 1) why you are showing it and 2) what it indicates.  Stand alone plots, statistics or tables without narration will not be graded.  In essence, if it does not have narrative — it does not exist. 
Never show failed work in your report.  It is a waste of the executive’s time to read about your failures.  Show only your success or best results. 
Never use the word “attempt” in the report.  In business either the project is accomplished or it is not.  
Every project will automatically be subject to Turn-It-In, so do not use material from previous projects submitted by others.  If I detect plagiarism (a Turn-It-In score of 50 or more), you will receive a zero for the project and an F in the course.
You will be graded on organization, grammar and spelling as well as content described in the project outline in Doc Sharing.  Reread and spell check your work.  Your project will also be graded on the ease of reading your material.  Typically, plots and tables close to the narrative are easier to read.  However, you may include and refer to tables in appendices as well.  
Please ensure that your project is in MS Word format.  The Appendix items should be included in the document along with appropriate data citations.  Again, late projects will not receive a grade. 
Submit your project in the drop box for the class project by midnight December 2  with your first initial, last name followed by Eco 309 Project as the file name.   
Thank you,
Stanley Holmes, Ph.D.

New folder/POWERPOINTS/chap08-09.ppt

©2007 McGraw-Hill/Irwin

Chapter 8
Combining Forecast Results

8-*

Forecasts of the Same Series (Y) Can Be Quickly and Easily Combined
The forecast methods can be different – for example, a regression forecast can be combined with a qualitative forecast, a decomposition forecast, and an exponential smoothing forecast.

You must have the forecast series along with the residuals and RMSE to combine forecasts effectively.

Selecting the appropriate weights to give to each forecast series can be done using several alternative methods.

8-*

Selecting the Appropriate Weights for each Forecast is Important

8-*

One of the methods to assign forecast weights is the Variance-Correlation method, with the general form:

k = [(σ2)² - ρσ1σ2] / [(σ1)² + (σ2)² - 2ρσ1σ2]

Where:
k = the weight of Forecast 1
σ1 = the standard deviation of the residuals of Forecast 1
σ2 = the standard deviation of the residuals of Forecast 2
ρ = the correlation of the residuals between Forecast 1 and Forecast 2
1 - k = the weight of Forecast 2

Another method, which continually adjusts the weights, is the Ratio of Errors method, with the general form:

ɑ1,T+1 = Σ(t=T-v to T) ε²2t / Σ(t=T-v to T) (ε²1t + ε²2t)

Where:
ɑ1,T+1 = the weight assigned to Forecast 1 in time period T+1
εit = the error (residual) made by forecast i in time period t
v = the number of periods included in the adaptive weighting procedure
T = the total number of forecast error periods
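A numeric sketch of both weighting rules, using placeholder residual series (the values are invented):

import numpy as np

rng = np.random.default_rng(5)
e1 = rng.normal(0, 2.0, 24)     # residuals of Forecast 1 (placeholder)
e2 = rng.normal(0, 3.0, 24)     # residuals of Forecast 2 (placeholder)

# Variance-correlation weight for Forecast 1:
v1, v2 = e1.var(), e2.var()                    # residual variances
rho = np.corrcoef(e1, e2)[0, 1]
k = (v2 - rho * np.sqrt(v1 * v2)) / (v1 + v2 - 2 * rho * np.sqrt(v1 * v2))
print("k =", k, "  weight of Forecast 2 =", 1 - k)

# Ratio-of-errors weight over the last v = 8 periods:
v = 8
alpha1 = (e2[-v:] ** 2).sum() / ((e1[-v:] ** 2).sum() + (e2[-v:] ** 2).sum())
print("alpha1 =", alpha1)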

8-*

8-*

Combining a Subjective, Regression and a Decomposition Forecast

8-*

Note combining all three forecasts produces the best results.

8-*

The General Form for Combining Forecasts With Regression

Ŷ1 is the forecast series from the best forecast method; when compared to the observed (Y) values it generates residuals that can be summarized with the lowest RMSE.

Ŷ2 is the next best forecast series; when compared to the Y observation values it generates residuals that can be summarized by RMSE.

Problem: what weights to apply to each forecast series to get the best forecast results (lowest RMSE).

ŶF = β1Ŷ1 + β2Ŷ2

You may have more than 2 forecasts to combine. When you do, step through the regressions: combine two forecasts, then three, and if necessary four, in successive regression runs. Note how the RMSE changes, as well as the BIC, AIC and R-square, as you add other forecast (Ŷn) models. Also note the t-values for the coefficients of your forecasts.

8-*

Using Regression to Combine Forecasts

Make sure there is no overlap in model composition – Run correlation coefficient(s) on the squared residuals between each forecast Ŷ1 and Ŷ2. Correlation should be low.

Run regression with the actual observations (Y) as the dependent variable and the forecast series (Ŷ1) as X1 and Ŷ2 as X2.

Check the t-statistic for the intercept (constant) coefficient to ensure that it is not significant – you want to fail to reject the null hypothesis that the intercept is zero. If it is significant you may want to include another forecast as X3 and check the intercept t-statistic again.

Rerun the regression with the same data, this time forcing the regression through the origin (no intercept coefficient). Check the slope terms for forecasts X1 and X2 to ensure they sum to approximately 1. The coefficients are the weights to apply to each forecast.

Multiply each X variable forecast series by its weight (percentage) and sum for each period for the forecast.

Check the RMSE and MAPE to ensure that they are lower than the individual forecast measures.
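A minimal sketch of that combining regression, with synthetic forecasts standing in for Ŷ1 and Ŷ2:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
y  = 100 + 2.0 * np.arange(24) + rng.normal(0, 3, 24)   # actual observations (placeholder)
f1 = y + rng.normal(0, 2, 24)                           # forecast series 1
f2 = y + rng.normal(0, 4, 24)                           # forecast series 2

X = np.column_stack([f1, f2])
res = sm.OLS(y, X).fit()            # no constant term = forced through the origin
print(res.params, "sum =", res.params.sum())   # weights should sum to about 1

combined = X @ res.params           # weighted combination of the two forecasts
print("Combined RMSE:", float(np.sqrt(((y - combined) ** 2).mean())))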

8-*

8-*

Solutions to Case Questions #2

8-*

Solutions to Case Questions #3

8-*

Changing Landscape in Business

Most large databases can handle billions of records, which makes data mining a daunting task.

The databases have recovery; locking, multiple processes and multi-threading for high concurrency; hot and cold backup; and single-master replication for high availability applications, Web-centric applications and OLAP (Online Analytical Processing) capabilities.

These features create accessible and highly reliable data for mining.

People with talent in manipulating and transforming data into usable information are highly valued in business.

Business forecasters are a subset of these people.

8-*

Data Mining
(searching for meaningful patterns in the data)

Data mining is the extraction of useful information from a database.

In selecting and implementing a forecasting model we have a preconceived notion of how the model should behave and what the forecast should look like.

In data mining we don’t know what pattern the data should have. Let the data tell the story rather than conforming it to a model.

The current environment is data rich. Most corporations have large data warehouses and subtending data marts that provide the opportunity to reveal more about businesses than ever before.

The rise of high-speed data links, large storage devices, and SQL (and later PL/SQL) created the access needed to transform data into information.

8-*

RDBMS
RDBMS stands for Relational Database Management System.

RDBMS is the basis for SQL, and for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.

SQL is the standard language that works with all the well-known database systems.

The data in RDBMS is stored in database objects called tables.

A table is a collection of related data entries, and it consists of columns and rows.

Data is stored in specific formats for characters, numbers, date, and time.

8-*

SQL

SQL stands for Structured Query Language and lets you access and manipulate databases
It is typically used by Database Managers to administer and maintain large databases and data marts.
SQL can execute queries against a database and retrieve data from a database, as well as insert, update and delete records in a database.
SQL can create new databases, tables, stored procedures and views in a database
SQL can set permissions on tables, procedures, and views
Although SQL is an ANSI (American National Standards Institute) standard, there are many different versions of the SQL language.
They all support at least the major commands (such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a similar manner but have their own proprietary extensions.

8-*

PL/SQL

PL/SQL stands for Procedural Language/SQL. PL/SQL extends SQL by adding constructs found in procedural languages.

It includes declare, begin, exception, end and other procedural programming statements to use with the database.

This is used by DBMs (database managers) to create and execute tailored queries against the database.

The DBM is the data miner's best resource for achieving success in writing queries and managing databases.

PL/SQL is one of several proprietary SQL extension sets.

8-*

Four Data Mining Tools

Classification – distinguishing between different types of objects or actions (fraudulent or correct, influential or not, etc.).

Clustering – Techniques to define and attach labels to objects then group the objects into a meaningful data subset.

Association – (“affinity analysis”) defining rules of discovery that reveal preferences or related patterns in data.

Prediction – the techniques we have studied thus far: predicting continuous and categorical variables.

By using these tools data miners do not simply verify previous data-pattern hypotheses but seek new knowledge from the data in the form of meaningful facts or rules.

8-*

Differences in Terminology

Data Mining                          Statistical Forecasting
Output Variable or Target Variable   Dependent Variable
Algorithm                            Forecasting Model
Attribute or Feature                 Independent or Explanatory Variable
Record                               Observation
Training Data                        Sample or "In Sample"
Validation Data                      "Hold Out" or "Out of Sample"
Score                                Forecast

Data mining stems more from computer science than from a statistical orientation.
For example, a forecast from data relationships might reveal "if-then" rules between data subsets that determine outcomes or forecasts. These rules can be combined with statistical forecasting to create very accurate forecasting models.

8-*

Data Mining Classification Techniques

1. k-Nearest Neighbor

2. Classification Trees

3. Naïve Bayes

4. Logistic Regression

Most businesses today use classification techniques to determine good customers, target advertising, keep the best suppliers, and determine whether an applicant will be a good employee. There are thousands of uses for these techniques. They are used to predict specific outcomes and are gaining in use as data accessibility expands.

8-*

Nearest Neighbor Method

An investigation of bank customer IDs reveals that personal loan takers (PL) cluster around specific attribute levels.

What will the next account ID be? Can we identify them early on and tailor the bank's ad campaign to target just them?

[Scatter plot: Income vs. Age, showing clusters of Loan Takers and Non Loan Takers.]

8-*

Nearest Neighbor Method

The nearest neighbor method looks at the (k) nearest neighbors in each attribute cluster to determine the category of the observation (loan taker in this case). Here, k = 3.

We can expand the neighbor classes (X) beyond age and income to improve the nearest neighbor model and its predictive capabilities.

[Scatter plot: Income vs. Age, with the k = 3 nearest neighbors of a new point determining whether it is classified as a Personal Loan Taker or a Non Loan Taker.]

8-*

Data Mining Nearest Neighbor Technique

Nearest Neighbor – relates the output variable (Y) to the attribute variables (the independent (X) variables) whose data are associated with a particular outcome (classification).
Table 9.2 on page 451 shows the list of attributes (X) associated with banking customers (ID), some of whom took out personal loans (Y).
Table 9.3 on page 452 shows the training data set, and Table 9.5 on page 454 shows the Classification Matrix (Confusion Matrix) for the nearest neighbor outcome.
The training data (3,000 customer IDs) are run to create the relationships, which are then applied to the validation data (2,000 customer IDs).
Note: Of the validation cases that nearest neighbor predicted would take out loans, 118 took out loans and 8 did not. Of the cases predicted not to take out loans, 1,798 did not while 76 did.
Examine the percentage errors to determine what is acceptable – not only the overall error rate (4.2% in this case) but the class error rates as well.
Note the number (k) of nearest neighbors used. The software chose the lowest error rate with the fewest neighbors (k).
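A toy scikit-learn sketch of the same idea, with synthetic stand-ins for the text's 3,000 training and 2,000 validation IDs (the attribute rule below is invented):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(7)
income = rng.normal(60, 15, 500)
age = rng.normal(45, 10, 500)
X = np.column_stack([income, age])
y = ((income > 70) & (age > 40)).astype(int)   # 1 = personal loan taker (invented rule)

train, valid = slice(0, 300), slice(300, 500)  # training vs. validation split
knn = KNeighborsClassifier(n_neighbors=3).fit(X[train], y[train])
print(confusion_matrix(y[valid], knn.predict(X[valid])))   # classification matrix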

8-*

Lift Charts of Validation Data Sets
[Figure A: cumulative PL versus number of cases, with the lift curve well above the 45-degree random selection line. Figure B: decile mean / global mean by decile, with the first decile about seven times the next largest.]

Lift measures the change in a particular class (PLs in this case) when a model is used to select a group from the general population (IDs in this case). Note that the lift is well above the random selection result line in Figure A, and that the top 10 percent – the first decile of PL/ID – is seven times larger than any other randomly chosen decile in Figure B.

8-*

Decision Tree Method

An investigation of bank customer IDs reveals that personal loan takers (PL) cluster around specific attribute levels.

In this case two decision rules, D1 and D2, apply to place an observation in the "Personal Loan Taker" category or not:

D1 = Income > X1
D2 = Age > X2

[Decision tree: the root node tests Age > X2 – Yes leads to PL Taker; No leads to a test of Income > X1 (Yes: PL Taker, No: Non PL). A companion scatter plot of Income vs. Age shows the plane partitioned at X1 and X2 into loan taker and non loan taker regions.]

What will the next account ID be? Can we identify them early on and tailor the bank's ad campaign to target just them?

8-*

Data Mining Decision or Classification Tree Technique

It begins with the classification of observations using "decision nodes," or rules, which usually consist of a <, > or = operator and a numeric classification value. In the example on page 456, > 7.0 and > 6 are decision nodes for classification.

The observations move from the root (the full collection of observations) through the branches (the first classifications) to the leaves (the final outcomes), or "terminal nodes," of the classification process – an upside-down tree, as seen in Figure 9.2 on page 458.

Data observations pass from the roots through the tree to end up in one of the leaves or terminal nodes.

Note: prune or eliminate excessive decision rules; don't let the tree become too cumbersome. Use non-correlated decision characteristics or rules – for example, don't use house value and income level in separate decision rules.

Use training and validation data sets as with nearest neighbor.

Scoring of the validation data is similar to nearest neighbor (see page 461).

8-*

Decision Nodes In Classification Trees

[Diagram: a classification tree with decision nodes N1 through N4; one terminal leaf is labeled "No PL".]
The X independent variables define a decision node – values are computed for each decision point from the training data and then tested with the validation data set.

Where N1 = Decision value of variable X1
N2 = Decision value of variable X2
N3, N4, etc….
Note that the decision nodes have numeric values paired with an operator (>, < or =). A "yes"/"no" path follows each node, directing the classified observations to the next branch or decision point (Nx) until a final classification is reached (a "Terminal Leaf" or "Terminal Node"). The No PL in the chart above is a terminal leaf. This is an example of decision tree categorization. Predictions can be made with classification trees as well – e.g., the sale price of a car (instead of a category) based on variables such as age, mileage, engine size, original price, color, etc.

8-*

Data Mining Naïve Bayes Technique

Bayes' theorem predicts the probability of a prior event given that a subsequent event has occurred. The probability of the prior event is the "posterior probability." We can use this method to predict the outcome of conditional probabilities.

Any subset of data will have a probability associated with it. For example, the portion of females in a group defines the probability that anyone drawn from the group will be female.

The general form of the conditional probability is:

P(A|B) = P(B|A) P(A) / P(B)

Obtain the probabilities from the ratios of the data subsets. For example (page 469), let:
A = fraudulent credit card transactions
B = reported lost cards

8-*

Naïve Bayes Technique (Continued)

In the example we try to find the probability that a card transaction is fraudulent given that the card was reported lost. From the data in Table 9.10 on page 469 you can calculate the probability of B given A and the probability of A; their product is divided by the probability of B:

P(A|B) = (2/3)(3/8) / (3/8) = .667

In this technique you must clearly define and summarize the information on the subsets or classes of data you are to assign probabilities to. The example of the probability that an adult male crewmember dies on the Titanic is determined from the class conditional probabilities computed by the software.

This method generates the same Confusion Matrix and lift charts shown for the other classification techniques.

8-*

Data Mining Logistic Regression Technique

This method uses regression analysis with Y defined by a logistic (logit) curve to determine likely outcomes. Like the other classification methods, it deals with continuous and non-continuous Y outcomes. (Recall that dummy variables are a form of non-continuous, or discrete, variables that switch from 0 to 1.) The Y logit curve is a continuous function disguised as a non-continuous (discrete) variable.

1. Enter the data for the independent variables (X) – such as age, income, experience, family, education, etc. in the personal loan (Y) example in the text – and the logit model will generate an intercept (constant) and variable coefficients that are subject to t-tests and p-values.
2. The logit model may then be pruned of non-significant X variables and rerun to get a good estimator.
3. Logit regression also generates Confusion Matrix and Lift diagrams similar to the other classification methods.

8-*

[Figure: a logit curve rising continuously from 0 to 1, contrasted with a discrete pass/fail outcome.]

The log form of Y results in the classification of an occurrence (forecasted outcome) given the values of the X variables. For discrete Y outcomes (like pass/fail, fraudulent or good, male crewmember dies on the Titanic or lives, etc.) the logit regression on outcome Y will provide very accurate results. This form of classification has the added advantage of using the statistical significance measures of regression analysis along with classification error and lift results to improve the classification prediction.
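Returning to the Table 9.10 example above, the Naïve Bayes arithmetic can be checked directly (the three probabilities are the ones quoted in the text):

# P(fraudulent | reported lost) = P(lost | fraud) * P(fraud) / P(lost)
p_lost_given_fraud = 2 / 3
p_fraud = 3 / 8
p_lost = 3 / 8
print(p_lost_given_fraud * p_fraud / p_lost)   # 0.667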
8-*

We are back to where we started with the forecasting process.

8-*

Eco 309 01W – The End

Test next Tuesday at 12 noon. It will be available as shared Word and Excel documents. Don't forget to put your names on them. I have enjoyed working with you and I wish you the best of luck – Happy Forecasting!

New folder/POWERPOINTS/Chapter 4 and 5 Slides revised (3).ppt

3-*

Eco 309 Economic Forecasting

3-*

Residual Autocorrelation and the Adequacy of Forecasting Techniques (Summary)

Determine if the autocorrelation coefficients of the residuals are indicative of a non-trended random series.
1. Calculate the forecast residuals (Ft - At) for each period.
2. Calculate the autocorrelation coefficients (ACFs) of the residuals for representative lag periods.
3. Look at the ACF pattern to check for a stationary (no trend) series and statistically significant lags (greater than the t-value).
4. Compare the 12th and 24th LBQ values to the .05 Chi-square table values for 12 and 24 degrees of freedom. The residuals may be random (not autocorrelated) if the LBQ does not exceed the Chi-square table value.

3-*

Checking Your Y Variable Characteristics

1. Perform a time series plot of your Y data and note:
– Magnitude and range of the data
– Trended or stationary
– Cyclical, and the length of the cycles
– Seasonal, and the peak periods
2. Run autocorrelations with at least 2 years' worth of period lags:
– Note lag periods that exceed the t table values (+ or - red line)
– A slow decline in significant r values indicates trend
– Significant r values or peaks at cycle periods indicate cycle
– Significant r values or peaks at seasonal lags (the 4th peak for quarterly or 12th peak for monthly lags) indicate seasonality
3. Comment on significant trend, cycle and seasonality in the data.

3-*

How to Evaluate a Forecast and Forecast Model

1. Recall that models produce two types of output:
– Estimates of the data used in developing the model
– Forecasts of future data values
2. Residuals (error terms) are:
– Actuals - Estimates for model Fit
– Actuals - Forecasts for forecast Accuracy
3. Analysis of residuals is key to determining the usefulness of a model and the reliability of a forecast.
4. Treat residuals as another data series:
– Perform time series plots and autocorrelations to check for characteristics (the same as the Y check on the previous slide)
5. Check residuals with selected error measures:
– Select the model with the lowest error measure values
6. If statistically significant characteristics exist, then your model needs improvement (or try another forecast method).

3-*

Random Variation Is Not Expressed in Significant Lag Period Autocorrelations

How is this useful?
– Determining the characteristics of your data series.
– Examining the error terms of models that you develop – their error terms should not have any strong (significant) autocorrelation components.
– You should test your model error terms with autocorrelation analysis – that is, look for significant lag period autocorrelations that indicate the model could be improved, or choose another model.
– You must comment on these in your final project paper as analysis of residuals (error terms) for each of the four forecasting models that you use.

3-*

An Additional Residual Test: Residual Distribution and the Adequacy of Forecasting Techniques

Calculate the forecast residuals and plot them in a histogram. Look for normally distributed residuals: compare the histogram to a normal bell-shaped probability curve with residuals clustered around the mean.
Residuals clustered to one side of the residual mean indicate a distribution that is not normal, and an improvement to the forecast technique could be made. The forecast may be biased – examine the mean of the residuals (error measures) to confirm this.

3-*

Marriages  FITS1     RESI1
2413       2430.91   -17.9087
2407       2424.53   -17.5340
2403       2416.31   -13.3051
2396       2406.87   -10.8750
2403       2397.63     5.3704
2443       2392.90    50.0985
2371       2402.09   -31.0949
2362       2388.53   -26.5328
2334       2375.06   -41.0604
2362       2356.02     5.9757
2336       2349.66   -13.6648
2344       2337.54     6.4576
2384       2331.01    52.9928
2244       2338.99   -94.9926
2358       2303.89    54.1078
2329       2310.45    18.5465
2345       2308.40    36.5997
2254       2312.65   -58.6547
2245       2289.28   -44.2806
2279       2267.84    11.1579

We can calculate the error for each observation – but how can we tell if one forecast is better than another? We need a single value that expresses the amount of error that we can expect per observation (marriages here). We need error measures. What are error measures?

3-*

Error Measures

These measures are vital to our analysis of forecast model performance. Good models have the lowest error measures.
– They must be applied to "fit," or the data used in developing each forecast model.
– They must be applied to "accuracy," or the data that is forecasted by the model.
– They can be used in any forecast or budget accuracy analysis.
– They can be used for any forecast method, either quantitative or qualitative.
– You must use them consistently in your class project, and you must employ at least two of these measures across each of your four forecast models.

3-*

Forecast Error Measures or Evaluation Tools

ME = Σ(At - Ft)/n
Mean Error is good for detecting bias but typically underestimates forecast error.

MAE = Σ|At - Ft|/n
Mean Absolute Error (or Mean Absolute Deviation) is in forecast units and is not a good measure across various forecast series. This is produced by Minitab as MAD.

MPE = Σ[(At - Ft)/At]/n
Mean Percentage Error is good for detecting bias but typically underestimates forecast error.

MAPE = Σ|(At - Ft)/At|/n
Mean Absolute Percentage Error is good for comparing error across forecast series and is not sensitive to forecast unit size. This is produced by Minitab as MAPE.

MSE = Σ(At - Ft)²/n
Mean Squared Error is good for comparing model error for a given series but it is unit sensitive. This is produced by Minitab as MSD.

RMSE = √[Σ(At - Ft)²/n]
Root Mean Squared Error is good for comparing error for models of a given series. It is forecast unit sensitive (equivalent to a standard deviation). It can be found by taking the square root of Minitab's MSD.

3-*

Theil's U

U = √[Σ(At - Ft)²] / √[Σ(At - At-1)²]

This coefficient is very useful for determining if a forecast is better than a naïve forecast (next period will be the same as last period), and therefore whether a forecaster or budgeter is adding value. Values < 1 indicate results better than the naïve assumption, while U values > 1 indicate forecast results that are worse than the naïve assumption.

U values = 0 indicate a perfect forecast.
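The formulas above translate directly into code; here is a compact Python implementation (a sketch, with At as `actual` and Ft as `forecast`):

import numpy as np

def error_measures(actual, forecast):
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    e = a - f
    return {"ME":   e.mean(),                    # detects bias
            "MAE":  np.abs(e).mean(),            # Minitab MAD
            "MPE":  100 * (e / a).mean(),        # percentage bias
            "MAPE": 100 * np.abs(e / a).mean(),  # unit-free accuracy
            "MSE":  (e ** 2).mean(),             # Minitab MSD
            "RMSE": np.sqrt((e ** 2).mean())}    # square root of MSD

def theils_u(actual, forecast):
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    model_err = ((a[1:] - f[1:]) ** 2).sum()
    naive_err = ((a[1:] - a[:-1]) ** 2).sum()    # "next = last" benchmark
    return float(np.sqrt(model_err) / np.sqrt(naive_err))

print(error_measures([100, 110, 120], [98, 113, 118]))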

3-*

Use of Error Measures

From this point on you must show and comment on the error measures for any forecast or estimated values.

Remember the best models have the lowest error measures.

Apply the error measures consistently in your analysis.

These along with residual analysis determine forecast model fit or accuracy. (Confirmation that the error is small and random.)

Error measures required
Root Mean Square Error (RMSE) – the square root of MSE or MSD. Average expected error per observation.

Mean Absolute Percentage Error (MAPE) – always expressed as a percentage. Average expected percent error per observation. (see page 88)

3-*

Beyond Error Measures
(They tell you how much error there is)
What caused the error?

3-*

Model error components:

– Random
– Systematic (T,C and S)

If the model did not pick up the T, C and S information it is left in the error terms or residuals.
Actual Data Observations – Model Fit Estimates = Residuals

3-*

Use of Residual Analysis

From this point on you must show and comment on the T, C or S characteristics that you find in the residuals (error terms). If you see T, C or S in the residuals, it is not in the forecast.

Remember the best models have random residuals.
– Zero residual mean
– Normal residual distribution (bell-shaped histogram)
– Non autocorrelated observations (ACFs within t-limits and LBQ values below Chi-square table values)

Residual Analysis Requirements
– Time Series Plot of residuals
– Autocorrelation Functions of residuals
– Histogram of residuals
– Calculation of basic statistics (mean should be close to zero, otherwise the model has bias)
– A positive residual mean means the model underforecasts
– A negative residual mean means the model overforecasts
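A sketch of the four required residual checks in Python, assuming `resid` holds the fit-period residuals (placeholder values here):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(8)
resid = pd.Series(rng.normal(0, 5, 40))   # placeholder residual series

resid.plot(title="Residual time series")  # look for leftover T, C, S patterns
plot_acf(resid, lags=12)                  # significant lags = information left behind
resid.hist(bins=10)                       # roughly bell-shaped and centered?
print("Residual mean:", resid.mean())     # near zero, otherwise the model is biased
plt.show()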

3-*

Chapter 4 Continues Our Study of Forecasting Methods

Moving Averages

Exponential Smoothing

Error Measures Apply

Evaluation of the Residuals Apply

3-*

Forecasting Methods for Stationary Data

First demonstrate that the data is “stationary”

1. Plot the time series data (comment on trend)

2. Run autocorrelations to detect trend with 95% confidence

This first simple exponential smoothing model has very limited application in business, since most business data is trended and has cycle, and often seasonality – be careful about the characteristics of your data.

3-*

Simple Exponential Smoothing
(In Minitab Stat/Time Series/Single Exponential Smoothing)

– Save the fits and residuals
– Forecast the number of periods required
– Plot the Y data and the fits on the same time series graph
– Plot the time series residuals

3-*

Simple Exponential Smoothing
Ft+1= ɑXt+ (1- ɑ) Ft
Where:
Ft+1 = Forecast for period t+1
ɑ = Smoothing constant (0 < ɑ < 1)
Xt = Actual value now (in time period t)
Ft = Forecast (smoothed) value for period t

The equation can be written:

Ft+1 = Ft + ɑ(Xt - Ft)

where (Xt - Ft) is the forecast error for period t. This means that the forecast (Ft+1) is equal to the forecast for the previous period (Ft) plus the level coefficient alpha (ɑ) times the forecast error (Xt - Ft) of the last period t. The ɑXt term is the level, or intercept; this model typically results in a stationary forecast – one with no slope.

3-*

Forecasting With Trended Data

Validate the trend with a time series plot and ACF analysis. Two smoothing methods can be used to address trended data:
– Holt's: in Minitab, Stat/Time Series/Double Exponential Smoothing
– Winters': in Minitab, Stat/Time Series/Winters' Method (note you must have seasonality to use Winters')

If you have only trend or cycle in the X variable data you must use Double Exponential Smoothing. You must develop a two-year forecast for each X variable with the appropriate exponential smoothing model.

3-*

Holt's Two-Parameter (Double) Exponential Smoothing

Ft+1 = ɑXt + (1 - ɑ)(Ft + Tt)    (Level or Intercept)
Tt+1 = ɤ(Ft+1 - Ft) + (1 - ɤ)Tt    (Trend or Slope)
Ht+m = Ft+1 + mTt+1    (Sum)

Where:
Ft+1 = Smoothed value for period t+1
ɑ = Smoothing constant for the level (0 < ɑ < 1)
Xt = Actual value now
Ft = Forecast value for time period t (now)
Tt+1 = Trend estimate
ɤ = Smoothing constant for the trend (0 < ɤ < 1)
m = Number of periods ahead to be forecast
Ht+m = Holt's forecast value for period t+m

Note the lack of any parameter or equation to address seasonality. This model can produce a forecast that is similar to a trend forecast, with alpha governing the intercept and gamma the slope.

3-*

Note the alpha and gamma values of the Holt's method. Note the error measures.

3-*

What do you see in the fitted values relative to the actual observations?

3-*

Forecasting With Seasonal Data

Forecasts should be evaluated in the seasonal form of the data (the original form). Error measures should be applied to the seasonal data and the seasonal fits or forecast. You have one exponential smoothing option:
– Use Winters'

Note that if your data has seasonality you must use the Winters' method of exponential smoothing to forecast each X variable in your class project.
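As a quick illustration of the recursion Ft+1 = ɑXt + (1 - ɑ)Ft outside Minitab, here is a tiny hand-rolled Python sketch; seeding the first forecast with the first observation and using alpha = 0.3 are arbitrary choices, not course requirements:

def simple_exponential_smoothing(x, alpha=0.3):
    fits = [x[0]]                    # common seed: F1 = X1
    for t in range(1, len(x)):
        fits.append(alpha * x[t - 1] + (1 - alpha) * fits[-1])
    return fits

y = [100, 104, 101, 110, 108, 115]   # placeholder observations
fits = simple_exponential_smoothing(y)
residuals = [a - f for a, f in zip(y, fits)]
print(fits)
print(residuals)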
3-*

Winters' Exponential Smoothing

Ft = ɑXt/St-p + (1 - ɑ)(Ft-1 + Tt-1)    (Level)
St = βXt/Ft + (1 - β)St-p    (Seasonality)
Tt = ɤ(Ft - Ft-1) + (1 - ɤ)Tt-1    (Trend)
Wt+m = (Ft + mTt)St+m-p    (Sum)

Where:
Ft = Smoothed value for period t
ɑ = Smoothing constant for the level (0 < ɑ < 1)
Xt = Actual value now
Ft-1 = Forecast value for time period t-1
Tt = Trend estimate
St = Seasonality estimate
β = Smoothing constant for the seasonality estimate (0 < β < 1)
ɤ = Smoothing constant for the trend (0 < ɤ < 1)
m = Number of periods ahead to be forecast
p = Number of periods in the seasonal cycle
Wt+m = Winters' forecast value for period t+m

3-*

Note the alpha, beta and gamma values of this Winters' model. Note the error measures.

3-*

Quarterly Seasonal Indices – Note that you have one and only one index for each period of the year, regardless of the number of years. Note that they sum to the number of periods in the year – four in the quarterly example above.

3-*

Smoothing Techniques and Relation to Data

Consider the linear equation Y = a + bX, where a = intercept or level and b = slope or trend.

For stationary or non-trended data use level (intercept) techniques, which include:
1. Moving Averages (an extension of naïve methods)
2. Simple Exponential Smoothing
3. Adaptive-Response-Rate Single Exponential Smoothing

For trended data use level and trend (slope) techniques, which include:
4. Holt's (Double) Exponential Smoothing
5. Winters' Exponential Smoothing

3-*

Procedure for Evaluating Models for Fit and Forecast

1. Determine the characteristics of the data before you select the appropriate model.
2. Run the model and obtain fitted values and forecasts.
3. Evaluate the error measures from the model – can you use a model that generates this amount of error?
4. Evaluate the model fit by examining the fit residuals:
– Evaluate the fits against the actual data with time series plots
– Time series plot of the residuals (look for characteristics and a zero mean)
– Autocorrelation of the residuals (confirm characteristics)
– If there are no systematic characteristics, look for normality with a normal plot and histogram. Declare the residuals random normal or not.

3-*

Procedure for Evaluating Models for Fit and Forecast (continued)

5. Examine a time series plot of the forecast and the hold-out actuals.
6. Evaluate the model forecast by examining the forecast error measures. Are they larger than the fit error measures? Can you tell why?
7. Evaluate the forecast residuals for systematic characteristics:
– Evaluate the forecast against the actual data with time series plots
– Time series plot of the residuals (look for characteristics and a zero mean)
– Autocorrelation of the residuals (confirm characteristics)
– If there are no systematic characteristics, look for normality with a normal plot. Declare the residuals random normal or not.
8. Declare the forecast useful or not.

3-*

Forecasting for New Products

There is typically little or no data on new products being introduced. There are techniques based on the product life cycle hypothesis that enable forecasting of new products:
– Introduction (date of cycle beginning and early observations)
– Early Adoption (early take-off and observations)
– Growth (usually to be forecast)
– Maturity (the highest point, usually to be forecast with the S curve – but its length not using the S curve)
– Decline (usually to be forecast, but not with the S curve)

These techniques employ the selection of a specific curve type and early data to develop a forecast. You have one form of the S curve in Minitab under Stat/Time Series/Trend Analysis/S Curve.
3-* New Product Forecasting

(Bass model charts: innovation (p) and imitation (q) effects)

3-* S-Curve Application in Forecasting New Products

1. Use Gompertz curves for forecasts where 100% penetration is hard to achieve. The curve is not symmetrical and flattens before you get to complete market penetration.
2. Use Logistic curves for forecasts that have a network effect and can achieve 100% penetration more easily. These curves are symmetrical and do not flatten before complete market penetration is reached.
3. Use Bass curves for innovation (p) and imitation (q) manipulation. See the chart on page 137 to select appropriate values for p and q.

In Minitab you have the Pearl-Reed Logistic S-Curve option under the Time Series/Trend Analysis heading. This requires only a few initial observations to forecast new services.

3-* Economic Forecasting

3-* Time Series Decomposition

This forecasting method (as with the previous univariate methods) assumes that the forces that have worked in the past to influence data will continue in the forecast period. The process starts by breaking the time series into its component parts and checking them for reasonableness. We will use the multiplicative method for doing this. The next step is to reassemble the parts to develop fit and forecast values. The fits and forecasts are used to determine residuals or error terms for evaluation.

3-* Time-Series Decomposition (another single data series or univariate approach)

The decomposition model goes beyond the exponential smoothing models with a cycle factor. The general business application form of the multiplicative time series decomposition model is

Y = T x S x C x I

Where:
Y = variable to be forecast
T = long-term (secular) trend in the data
S = seasonal adjustment factor
C = business cycle adjustment factor
I = random or irregular variations in the data

Each element in the model must be generated separately. The determination of the elements begins with two-staged smoothing.

3-* Time Series Decomposition Features

It demonstrates how smoothing can be used to isolate data characteristics.
It enables the forecaster to use each of the isolated characteristics to develop a forecast.
It is not as popular as it once was and has largely been replaced by ARIMA (another univariate approach) and regression analysis.
It has two approaches – additive and multiplicative. Multiplicative is used for most business applications.
It requires a lot of data due to the smoothing technique involved.
It can be used to develop fairly accurate forecasts.

3-* An example of time series decomposition using monthly Private Housing Start data.

3-* An Illustration of the 2-Staged Smoothing Using Quarterly Data

3-* Seasonal Indices

Remove seasonality by calculating the Moving Average (MA) (12 for months, 4 for quarters) of the data. Start the moving average at the observation just below the median of the first year's data.
Remove random fluctuations by calculating the Centered Moving Average (CMA): average the MA and the MA for the following period.
Calculate Seasonal Factors (SF) by dividing each data observation by its respective CMA.
For each representative month or quarter, average the seasonal factors.
Divide the number of periods (months or quarters) by the sum of the averaged seasonal factors to get an adjustment ratio.
Multiply each representative (averaged) SF by the ratio to get the seasonal index for each period.
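A minimal sketch of this two-staged smoothing and index procedure, assuming Python/pandas in place of Minitab (the quarterly values are invented):

# Sketch of the CMA / seasonal-index procedure described above. Pandas is an
# assumed stand-in for Minitab; data values are invented quarterly figures.
import pandas as pd

y = pd.Series([72.0, 110, 117, 81, 89, 132, 140, 96, 103, 152, 161, 110])

ma = y.rolling(window=4).mean()              # 4-period moving average (quarters)
cma = ma.rolling(window=2).mean().shift(-2)  # centered moving average (CMA)

sf = y / cma                                 # seasonal factors: observation / CMA
quarter = pd.Series(range(len(y))) % 4
raw_index = sf.groupby(quarter).mean()       # average SF for each quarter

# scale so the four indices sum to 4, the number of periods in the year
seasonal_index = raw_index * (4 / raw_index.sum())
print(seasonal_index)
# note the NaNs at the ends of the CMA: n/2 observations are lost each side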
Note you will lose n/2 of the observations at the beginning and end of the series.

3-* Example of Highly Seasonal Data

3-* Another Example of Highly Seasonal Data

3-* These Monthly Indices Sum To 12

3-* Uses of Seasonal Indices

1. To deseasonalize data for a period, divide each raw data observation by the respective period seasonal index: deseasonalized = observation / seasonal index
2. To estimate reseasonalized (raw) data for a period, multiply the deseasonalized observation by the respective period index: seasonal = deseasonalized observation x seasonal index
3. Note that the seasonal indices should sum to the number of periods being evaluated (4 for quarters, 12 for months): sum of indices = 12 or 4
4. Each index represents the percentage of the average period observation that occurs in the year: (sum of annual data / periods) x period index = period seasonal estimate
5. To get an annual projection, multiply the average of the deseasonalized period values by the number of periods (4 for quarters, 12 for months): average deseasonalized period value x periods = annual projection

3-* At this point

You have smoothed data, or the Centered Moving Average (CMA), that is used to develop the trend and cycle components. You have developed the seasonal component – the Seasonal Indices (S). Now we will move to the trend component (T) that is shown by a Centered Moving Average Trend (CMAT) data series.

3-* Long-Term Trend Component

To estimate long-term trend we use the CMAs to find the linear intercept and slope. Run trend analysis in Minitab to get the trend equation and the trended values, or Centered Moving Average Trend (CMAT):

CMA trend = a + b(time)

Note that this is a linear function. Example: CMAT = 122.94 + .04(time). Plotted with the data, it shows the long-term trend, which can be positive (upward sloping) or negative (downward sloping). Be sure to inspect the coefficient values for reasonableness. The CMAT is the T component.

3-* The Cyclical Component (C)

This is shown by the Cycle Factor (CF) data series:

CF = CMA / CMAT

Where:
CF = the cyclical component of the series for each period
CMA = the smoothed deseasonalized data for each period
CMAT = the trend values for each period

Simply divide the CMA values by the CMAT (trended values) to get the Cycle Factor (CF), the C component. In Minitab, CF = DESE/TREN.

3-* Adding the Cycle Component (C) to the Decomposition Forecast – Cycle Choices

Calculate cycle factors by dividing the deseasonalized Y data by the trend factors (DESE/TREN) to get a historical series of cycle factors. Perform a time series plot of these cycle factors. If you do not have any idea of what the business cycle will look like in the future, adjust the decomposition forecast by multiplying the entire forecast by the last cycle factor from the Y data series. If you have an indication that the cycle will increase or decrease, find historical cycle factors that reflect this and multiply each forecast observation by the historical cycle factor series of the same length. (See the cycle-factor sketch below.)

3-* Business Cycles Are Difficult To Predict (they defy the term "cycle" since regularity cannot be counted on)

3-* Become Familiar with Leading Indices
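A short sketch of the CF = CMA/CMAT computation described above, assuming Python/numpy in place of Minitab's DESE and TREN columns (the CMA values are invented):

# Sketch: fit a linear trend to the CMA, then compute cycle factors
# CF = CMA / CMAT. Numpy/pandas assumed in place of Minitab; data invented.
import numpy as np
import pandas as pd

cma = pd.Series([123.1, 123.4, 123.2, 123.9, 124.4, 124.1, 124.8, 125.3])
t = np.arange(1, len(cma) + 1)

b, a = np.polyfit(t, cma, 1)   # slope (b) and intercept (a) of the CMA trend
cmat = a + b * t               # CMAT: trended values, e.g. 122.94 + .04*time
cf = cma / cmat                # cycle factor: >1 above trend, <1 below trend
print(pd.DataFrame({"CMA": cma, "CMAT": cmat, "CF": cf}))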
Forecasters have learned the downturn periods and can usually detect these in any data series influenced by economic conditions.

3-* Cycle Factors and Their Interpretation

3-* You can borrow from the past to estimate cycles

Calculate the cycle factors (DESE/TREN) and plot them. Note the CF where the data series ended. Pick up a forecast-period-length series of historical cycle factors that begins close to the CF value at the end of the series:
- one that continues up if you expect economic expansion
- one that continues down if you expect economic contraction
Multiply the decomposition forecast values by the CF series you selected.

3-* Time-Series Decomposition Restated

The decomposition model: Y = T x S x C x I

Where:
Y = variable to be forecast
T = long-term trend: CMAT data and forecast
S = seasonal index (SI) and forecast
C = cycle factor (CF) and forecast
I = random variations (assumed to be 1) – ignore this.

Fit or Forecast of Y = (CMAT) x (SI) x (CF) x (1)
In Minitab: Estimate of Y = TREN x SEAS x (your CF)

To calculate decomposition fitted values and forecast values, simply multiply the series together: trend times the seasonal index times the cycle factor.

3-* Forecasting cycle factors is a matter of judgment – know where you are in the current cycle and apply a recent cycle pattern for the forecast period. This is best done with a time series plot of the CMA data.

3-* Running Decomposition in Minitab

In Minitab select Stat/Time Series/Decomposition.
In the model, select your data column and enter the periodicity (12 or 4).
In model type select "multiplicative" and "trend plus seasonal" (for seasonal and trended data).
Select generate forecasts and enter the number (12 or 4).
Under Options, title your graphs and use the default if your data begins at the first period of the year.
Under Storage select all of the options.
In graphs select plot and residual plots "4 in one". Save and store:
TREN1 = CMAT, the trend line produced by the fitted trend equation
DESE1 = CMA
SEAS1 = SI, the repeated seasonal indices
DETR1 = data plotted with trend removed (shows seasonality and random variation)
FITS1 = decomposition fitted data
RESI1 = residuals or error terms
FORE1 = forecast
Produces error measures MAPE, MAD and MSD (or MSE).
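For comparison, a rough Python equivalent of the multiplicative decomposition run, assuming statsmodels' seasonal_decompose in place of Minitab (data invented; note that statsmodels does not separate C from I, so its remainder holds the cycle-times-irregular part):

# Sketch only: multiplicative decomposition Y = T x S x C x I via statsmodels,
# an assumed stand-in for Minitab's Stat/Time Series/Decomposition.
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

y = pd.Series([72.0, 110, 117, 81, 89, 132, 140, 96,
               103, 152, 161, 110, 120, 176, 185, 127])  # invented quarterly data

result = seasonal_decompose(y, model="multiplicative", period=4)
print(result.trend)     # analogous to the CMA (smoothed, deseasonalized level)
print(result.seasonal)  # analogous to SEAS1, the repeated seasonal indices
print(result.resid)     # cycle x irregular remainder (C would be isolated
                        # by dividing the CMA by a fitted CMAT, as above)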
3-* You can use the average of the entire series of seasonal factors as the seasonal index. This is similar to the process of deriving the index we discussed in our last class meeting.

(75.4 + 86.8 + 96.9 + 72.6 + 80.0 + 85.4)/6 = 82.9, so T x S = 900(.829) = 746.1

In this situation you can have at least two correct answers: a safe one that uses the average of all of the seasonal factors, or an answer that applies to a very steep trend. Regardless, you can adjust the trend value using an average of the seasonal factors. The author recommends using only the last two seasonal factor values in the average:

(80.0 + 85.4)/2 = 82.7, so T x S = 900(.827) = 744.3

Eco 309 Chapter 5 Assignment 7

3-* 16. a. Multiplicative Model

Data: Disney Sales
Length: 63
NMissing: 0
Fitted Trend Equation: Yt = -302.9 + 44.9*t

Seasonal Indices
Period  Index
1       0.957
2       1.022
3       1.046
4       0.975

3-* 16. b. There is a significant trend, but it is not a linear trend. First quarter sales tend to be relatively low and third quarter sales tend to be relatively high. However, the plot in part a indicates a multiplicative decomposition with a linear trend is not an adequate representation of Disney sales. Perhaps better to do a multiplicative decomposition with a quadratic trend. Even better, in this case, is to do an additive decomposition with the logarithms of Disney sales.

c. With the right decomposition, we would use both the trend and seasonal components to generate forecasts.

d. Forecasts

Quarter   Forecast
Q4/1995   2506
Q1/1996   2502
Q2/1996   2719
Q3/1996   2830
Q4/1996   2681

However, the plot in part a indicates that forecasts generated from a multiplicative decomposition with a linear trend are likely to be too low.

New folder/POWERPOINTS/Chapter 6 and 7 Slides Revised.ppt

Chapter 6
Forecasting with Simple Linear Regression Methods

4-* Regression is Useful to Test Theories of Business Variable Relationships and to Forecast Outcomes

You have hypothesized relationships. Business people do this every day – relationships between inputs and outputs, sales and product price, sales and customer visits, etc. You formulate hypotheses between variables in everyday life: the relationship between time spent studying for this course and your grade, between your earnings and the taxes you must pay, between expected miles you drive and gasoline costs, between the weather and your utility bills. In each of these cases one variable (X) seems to be a good determinant of, or explains, the value of another variable (Y). Economics is the study of variable relationships – the science part of economics. Economic theory is a compilation of these relationships observed and tested. Economic law occurs when the theories have been thoroughly tested and the outcome is a virtual certainty.

4-* We are going to examine how to forecast one variable Y with another variable X

This starts with bivariate (two variable) analysis. The relationship has to be (1) statistical and (2) logical before you can use X to forecast Y.

4-* Correlation Coefficients Quantify The Relationship Between Variables

The Pearson product moment correlation coefficient takes on the symbol ρ for the population. Perfect correlations are represented by positive (+1) or negative (-1) values, while 0 values represent no correlation.

r = Σ(X - X̄)(Y - Ȳ) / √[Σ(X - X̄)² Σ(Y - Ȳ)²]

Just getting a high r value is only half of the job. You must check the r value for significance. That is, you must answer the question "Is the (r) that you observe a member of a distribution (of r's) that has zero mean or expected value?" You must test the null hypothesis Ho: ρ = 0.

4-* Using Hypothesis Testing In Bivariate Analysis (Testing for correlation significance)

Ho: ρ = 0
H1: ρ ≠ 0

To determine if your independent (X) variable is useful in regression analysis you must perform a hypothesis test to ensure statistical significance even though you have a high correlation coefficient. Use the standard t-test to perform the significance test on the sample Pearson product moment correlation value (r) with the t-calc equation:

t = (r - 0) / √[(1 - r²)/(n - 2)]

Note that the t value is very sensitive to the size of (r) and the number of observations (n). If t-calc is greater than the t-table value for (n - 2) degrees of freedom, then you reject the null hypothesis and your (X) variable correlation is statistically significant. (ρ is for the population, r for the sample.)
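A sketch of this correlation significance test, assuming Python/scipy in place of hand calculation (the data are invented):

# Sketch of the Ho: rho = 0 significance test; scipy/numpy assumed, data invented.
import numpy as np
from scipy import stats

x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.1, 5.8, 6.1, 9.2, 9.8])

r, p = stats.pearsonr(x, y)                   # Pearson r and two-tailed p-value
n = len(x)
t_calc = r / np.sqrt((1 - r**2) / (n - 2))    # t = (r - 0)/sqrt((1-r^2)/(n-2))
t_table = stats.t.ppf(0.975, df=n - 2)        # two-tailed, 95% confidence

print(f"r={r:.3f}  t_calc={t_calc:.3f}  t_table={t_table:.3f}  p={p:.4f}")
# reject Ho (correlation significant) only if |t_calc| > t_table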
4-* Hypothesis Test for the Example C Scatter Plot

Given a correlation coefficient of r = .79 and a sample size of 5, test:
Ho: ρ = 0
H1: ρ ≠ 0

t-table (for a two-tailed test, 95% confidence and df = n - 2) is 3.182
t-calc = (.79 - 0)/√[(1 - .79²)/(5 - 2)] = .79/√.1253 = 2.232

Since t-calc (2.232) falls inside the interval -3.182 to +3.182, we fail to reject the null hypothesis Ho. The correlation is not statistically significant and X is not a good explanatory variable of Y.

4-* Why is Regression a Useful Forecasting Tool?

When two random variables are correlated (ρ ≠ 0), knowledge of the outcome of one will benefit the forecasting of the other. Statisticians have developed a useful theory of conditional expectation, called classical linear regression. Given a specific value of the independent random variable X, the conditional expected value of Y, denoted E(Y|X = x), is:

E(Y|X = x) = μY + ρ(σY/σX)(x - μX)

Where:
μX = mean of independent variable X
μY = mean of dependent variable Y
ρ = correlation coefficient between X and Y
σY, σX = standard deviations of Y and X, respectively

4-* Bivariate Causal Regression Model

Simple least squares regression where there are two variables (Y and X): Y = f(X), where variable X is assumed to cause changes in variable Y. The linear model can be expressed:

Y = βo + β1X + ε

Y is the dependent variable to be forecast
βo is the intercept, the value of Y where X is zero
β1 is the change in Y for every unit change in X (slope)
X is the independent variable (implied causation)
ε is the error term

Ordinary Least Squares (OLS) produces the model that minimizes the squared error (Σε²). That is, it minimizes Σ(Y - Ŷ)², where Y is the actual Y value and Ŷ is the estimated value from regression. Remember that Y = Ŷ + (Y - Ŷ), or the estimate plus the error.

4-* Population and Sample Data Points (Why Do We Need Hypothesis Tests?)

Estimate based on a sample: the regression line from sample data, Ŷ = bo + b1X. Is it representative of the population?

4-* Population and Sample Data Points (Why Do We Need Hypothesis Tests?)

Comparing the sample estimate with the population estimate, the sample regression line is not representative of the population in this case. We need to test Ho: b1 = 0.

4-* Regression Analysis Probabilities

Since the regression solution is an estimate of Y given an X, there is uncertainty at various levels. Is there significance and strength in the X relationship with Y? Is the Y sample a good representation of the Y population? Are the estimator coefficients significant? Is the model result significant? Is the Y value a member of the distribution about the Y estimate Ŷ? The bottom line is that regression models require a lot of hypothesis testing in evaluating their performance potential.
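A minimal bivariate OLS sketch, assuming Python/statsmodels in place of Minitab regression (the data are invented):

# Sketch: fit Y = b0 + b1*X by OLS; statsmodels assumed, data invented.
import numpy as np
import statsmodels.api as sm

x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.1, 5.8, 6.1, 9.2, 9.8])

X = sm.add_constant(x)        # adds the intercept column for b0
model = sm.OLS(y, X).fit()    # minimizes the sum of squared errors

print(model.params)           # b0 (intercept) and b1 (slope)
print(model.summary())        # t-tests, R-squared, F, ANOVA-style output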
4-* Solving For OLS Coefficients

Y = βo + β1X + ε is the equation for the population
Ŷ = bo + b1X is the estimated equation for the sample
residual = Y - Ŷ

To estimate the coefficients:
b1 = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)²   (solve for the slope first)
bo = Ȳ - b1X̄   (then solve for the intercept)

Given two variables anyone can solve for the regression equation, but does the equation produce a good forecasting model?

4-* Decomposition of Variance

SST = Sum of Squares Total (total variability of Y)
SSR = Sum of Squares Regression (variability explained by the linear relationship)
SSE = Sum of Squares Error (residual or unexplained variability)
SST = SSR + SSE

Components can be calculated by:
SST = Σ(Y - Ȳ)²
SSR = Σ(Ŷ - Ȳ)²
SSE = Σ(Y - Ŷ)²
Where Y = observed values of Y, Ȳ = mean of Y, Ŷ = forecast (fitted) values of Y

R² = SSR/SST, the portion of Y variance explained by the linear relationship.

4-* Important ANOVA Measures (Analysis of Variance)

R² = SSR/SST (explained variation divided by total variation)
Mean Squared Regression (MSR) = SSR/1
Mean Squared Error (MSE) = SSE/(n - 2)
F = MSR/MSE (average explained variation divided by average unexplained variation)
Run the ANOVA table in Minitab to accompany your model results.

Degrees of freedom:
SST: df = n - 1
SSR: df = 1 (the number of X variables)
SSE: df = n - 2

4-* The Relationship Between the F ratio and R²

R² is sometimes called the Multiple Correlation Coefficient or Coefficient of Determination. R² = SSR/SST, explained variance divided by total variance.

F = MSR/MSE or (SSR/k)/(SSE/(n - k - 1))
Where: n = number of observations, k = number of independent variables

F = [R²/(1 - R²)] x [(n - k - 1)/k], which reduces to R²(n - 2)/(1 - R²) for simple regression (k = 1)

Large R² values will result in large F values, all other things being equal. The F value provides a more reliable measure of the significance of the model results for the population by adjusting for the degrees of freedom in observations and X variables. The F-test must be performed to determine the applicability of the regression results to the population.

4-* Standard Error of the Estimate (SEE)

A measure of the Y dispersion about the forecast (Ŷ), similar to the standard deviation (S). It measures the amount by which the observations Y differ from the estimates Ŷ. One SEE includes about 68% of the differences between Y and Ŷ, while 2 SEE includes about 95%. A small SEE means that the actual observations of Y fall very close to the regression line.

SEE = √[Σ(Y - Ŷ)²/(n - 2)]

SEE is the square root of the sum of the squared residuals (SSE) divided by its degrees of freedom, which equals the square root of MSE from the variance decomposition. SEE is sometimes called the Standard Error of the Regression (SER). It can be used to develop a forecast range.

4-* Developing a Confidence Band around your Forecast

Approximate 95% Confidence Band = + and - (2 x SEE)

Develop a confidence band, particularly when the X forecast values exceed the historical or observed X values. This replaces uncertainty with probability in the forecast.
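A sketch of the SEE and the approximate 2-SEE band, continuing the assumed Python example above:

# Sketch: SEE = sqrt(SSE/(n-2)) and an approximate 95% band of +/- 2*SEE
# around the fitted values. Statsmodels assumed; data invented.
import numpy as np
import statsmodels.api as sm

x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.1, 5.8, 6.1, 9.2, 9.8])
res = sm.OLS(y, sm.add_constant(x)).fit()

n = len(y)
see = np.sqrt(np.sum(res.resid**2) / (n - 2))   # equals sqrt(MSE)
fitted = res.fittedvalues
lower, upper = fitted - 2 * see, fitted + 2 * see
print(f"SEE = {see:.3f}")
print(np.column_stack([lower, fitted, upper]))  # band around each fit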
4-* Standard Error of the Forecast (Sf)

A measure of the variability of an estimated Ŷ around the actual Y given a value X. That is, you solve for the interval given an X value:

Sf = SEE √[1 + 1/n + (X - X̄)²/Σ(X - X̄)²]

It can be used to calculate a confidence interval around forecasted values of Y (Ŷ): the interval = Ŷ ± (t-table value x Sf).

4-* How Can You Tell When You Have A Good Regression Model?

4-* Critical Regression Parameter Assumptions

Assumption 1. The relationship between Y and X is linear, as described in the above equation. This implies that Y is determined by X, rather than vice versa.
Assumption 2. Var(X) is nonzero and finite for any sample. The values of Xt are not all the same. If Var(X) = 0, it would be impossible to estimate the impact of ΔX on Y.
Assumption 3. The error term (et) has zero expected value. That is, random error terms will cancel out (+ and -) over the long run: Σ(et) = 0.
Assumption 4. The error term (et) has constant variance for all observations: Var(et) = σ².
Assumption 5. The random variables et are uncorrelated, i.e., Cov(et, et-i) = 0 for all i.
Assumption 6. The error term et is normally distributed over the entire range of values.

4-* Best Linear Unbiased Estimators

It is assumed that Y is normally distributed about the Ŷ values (the values of Y estimated from the regression equation), with constant variance over the entire range of X. Note that Sf and SEE take advantage of this property and enable confidence interval estimation.

4-* Best Linear Unbiased Estimators

It is assumed that the residuals are normally distributed about a zero mean, with constant variance over the entire range of X (or the time series). Note that a residual time series plot is important to verify the mean and the constant variance.

4-* Implications of the Critical Assumptions

1) The model has three unknown parameters: βo, β1 and σ² (intercept, slope and variance).
2) Each Y is a normal variate with mean βo + β1X and variance σ².
3) If we know βo, β1 and σ² we can forecast Y using the standard normal distribution.
4) Sample estimates of βo and β1 can be obtained using Ordinary Least Squares (OLS) and result in the Best Linear Unbiased Estimates (BLUE).
Note: Your regression results cannot be BLUE unless the assumptions hold.

4-* A Regression Model Expression may Fit Various Variable Relationships

The resulting regression equation is Y = 3 + .5X for each of the above data sets.

4-* Make sure that you inspect the data scatterplots for all variables. The linear equation may not produce an accurate forecast with some of these XY relationships.

4-* Process for Regression Forecasting

1. Look for causal relationships between the dependent variable to be forecast and causal independent variables. Clearly state the forecasting problem and your hypothesis of causation.
2. Visually inspect the data, looking for trend, seasonality and cycles in all variables (dependent and independent).
3. Determine the best regression model to fit the data (trend or causal, linear or nonlinear, bivariate or multivariate).
4. Forecast the independent variables (using the best method determined in Table 2.1 in Chapter 2).
5. Specify the regression model by estimating the coefficients bo and b1, b2, ... Designate a hold-out period that is not used in the estimation of the coefficients.
6. Perform an "in sample" evaluation using error measures, error autocorrelations and hypothesis tests (test for the critical BLUE assumptions).
4-* Process for Regression Forecasting (Continued)

7. Perform a hold-out or "out of sample" evaluation using the same error measures and error tests.
8. Adjust or respecify the model as necessary by transforming, adding or deleting independent (explanatory) variables.
9. Repeat the "in sample" and "out of sample" error measures and tests to ensure the accuracy of the model.
10. Use the tested and selected model to forecast beyond the boundary of known variables or actual observations.
11. Check the resulting forecast for reasonableness by plotting the actual observations along with the forecast results.
12. Declare the estimator BLUE if the residuals warrant it.

4-* Linear Trend Analysis (The Simplest Form of Regression)

You have used this in Decomposition to get CMAT from CMA. In this case the time index takes the place of the X variable – there is no implied causation in this model.

4-* Linear Trend Forecasting (Simple Regression) – Note the Time Index

4-* Inspect the raw data series for trend (positive or negative). What other test can you perform to confirm trend?

4-* Simple Trend Regression Forecasting (non-causal linear regression)

1. Inspect the data plot for long-term linear trend.
2. Set up a time index (T) for each independent data observation.
3. Estimate the regression coefficients for Ŷ = bo + b1T.
4. Plot the observations along with the fitted values for Y and check for reasonableness of the fit.
5. Check the plot for alternating wave patterns.
6. Check the error terms (residuals) for remaining trend or seasonality and perform other hypothesis tests for error randomness, variance and distribution.
Note: Linear trend regression is a substitute for Holt's exponential smoothing method.

4-* Bivariate Causal Regression Models

Y = bo + b1(X)
Where:
Y = dependent variable (caused by X)
X = independent variable (causes Y)
bo = intercept coefficient (may not cross the X axis in the observed range)
b1 = slope coefficient

Note that time does not influence Y in this model; the values of X influence Y. The procedures for testing the statistical significance of the slope parameter follow directly the recommended steps for hypothesis testing: a one-tailed test if the sign of the slope is known, a two-tailed test if it is unknown (or you have no strong rationale).

4-* Evaluating Regression Results

Logic of the dependent-to-independent variable relationship (+ or -). Does it make sense? If not, it implies an underspecified model and you may need to add independent variables.
Size of the slope term – the closer to zero, the weaker the relationship between the independent and dependent variables. A zero coefficient implies no relationship between X and Y and the model is overspecified (includes a non-productive variable).
Hypothesis test of the slope coefficient: Ho: β1 = 0, H1: β1 ≠ 0. The t-calc value is the slope coefficient (b1) divided by the standard error of b1. The t-table df is (n - 2) and the test is two-tailed.
4-* Determining the Significance of the Slope Term

Due to the size of the variables being forecast and the change in the variables over the series, the absolute size of the slope term is sometimes hard to judge. Perform a hypothesis test on the slope term at 95% confidence, where:

Ho: b1 = 0, H1: b1 ≠ 0 when the slope sign is not known; requires a two-tailed t-test
Ho: b1 ≤ 0, H1: b1 > 0 when the slope is positive; one-tailed test
Ho: b1 ≥ 0, H1: b1 < 0 when the slope is negative; one-tailed test

or use P-values, which indicate the level of significance of the slope coefficient. Recall that 95% confidence equates to a 5% level of significance. For a two-tailed test of Ho: b1 = 0, P must be less than .05 to reject Ho. For a one-tailed test of Ho: b1 ≤ 0 or b1 ≥ 0, one half of P must be less than .05 to reject Ho.

4-* Decomposition of Variance

SST = Sum of Squares Total (total variability of Y)
SSR = Sum of Squares Regression (variability explained by the linear relationship)
SSE = Sum of Squares Error (residual or unexplained variability)
SST = SSR + SSE

Mean Squared Regression (MSR) = SSR/df
Mean Squared Error (MSE) = SSE/df
R² = SSR/SST, the portion of Y variance explained by the linear relationship

Components can be calculated by:
SST = Σ(Y - Ȳ)²
SSR = Σ(Ŷ - Ȳ)²
SSE = Σ(Y - Ŷ)²
Where Y = observed values of Y, Ȳ = mean of Y, Ŷ = forecast (fitted) values of Y

4-* Coefficient of Determination R²

R² = Sum of Squares Regression (SSR) / Sum of Squares Total (SST)
   = total variability explained by the linear relation / total variability of Y
   = Σ(Ŷ - Ȳ)² / Σ(Y - Ȳ)²

R² is an easy measure to manipulate, and it generally overstates the fit of a given regression model, especially when serial correlation is present. Perform a hypothesis test on R² where Ho: R² = 0, H1: R² ≠ 0. In simple regression, R² equals the correlation coefficient (r) squared.

4-* Coefficient of Determination (R²) Requirement

R² shows the portion of the variation of Y explained by the independent variable X. In multiple regression, adding independent variables will increase R², since the correlation with Y will increase. To adjust for the added variables, R² is recalculated to account for the change in degrees of freedom (K); with 2 independent variables K = 2, with three independent variables K = 3, etc. The result is Adjusted R², which shows the true amount of variation explained by the multiple regression independent variables. Comment on the size of the coefficient of determination in discussions of the power of your regression model.

4-* Testing the Significance of Regression: The "F" Test

This is the equivalent of a two-tailed test of the null hypothesis βi = 0 for all X (independent) variables in the regression. It should provide the same results as the t-test on the null hypothesis for a simple regression model with one X variable. It can be used to evaluate the entire model when the number of X variables increases. Compare the calculated F value from regression with the F table value on pages 529-530. Degrees of freedom for the numerator is K, the number of X variables (e.g., 1 for simple regression). Degrees of freedom for the denominator is n - (K + 1) (e.g., n - 2 for simple regression). F-calc must exceed F-table to reject the null hypothesis.

4-* F-Test Requirement

Test the regression model hypothesis Ho: β1 = β2 = β3 = ... = βn = 0 (or R² = 0), using SSR and SSE:

F = (explained variation/K) / (unexplained variation/[n - (K + 1)])

Where K = the number of independent variables (X) and n = the number of dependent variable (Y) observations.

1. Compare the calculated F-statistic with table B-5 on pages 529-530:
a) Look up the column by using the K value
b) Look up the row by using the [n - (K + 1)] value
2. Check to ensure that F-calc (statistic) is greater than F-table to reject Ho.
3. The table provides a significance test at 95% confidence, or alpha of .05.
4. Comment on the F-test to indicate the reliability of your model.
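A sketch reading R², F and its p-value off a fitted model, assuming Python/statsmodels (the data are invented):

# Sketch: the F-test and R-squared reported by statsmodels OLS; data invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
x = rng.normal(10, 2, 30)
y = 4 + 1.2 * x + rng.normal(0, 1.5, 30)

res = sm.OLS(y, sm.add_constant(x)).fit()
print("R^2  =", round(res.rsquared, 3))
print("F    =", round(res.fvalue, 2))   # compare with the F-table value
print("p(F) =", res.f_pvalue)           # reject Ho: R^2 = 0 if p < .05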
4-* Bivariate Regression Evaluation Questions

1. Does the sign of the slope term make sense?
2. Does the t-test show that the slope term is statistically significant, either positive or negative?
3. How much of the dependent variable is explained by the independent variable, as shown by R²?

4-* There are two ways to address a curvilinear XY relationship (as shown by scatter plots of Y versus X)

1. Change the form of the regression (e.g., from linear to quadratic), but this will change the assumptions made relative to the measures. We will discuss this later in Chapter 7.
2. Change (transform) the X variable to create a more linear relationship. With a known transformation all of the regression statistics still hold just as they did with raw X data, but they will be improved.

Again, use only the X transform, if required, for now in your regression analysis and for your project data.

4-* Poor Linear Relationships and Data Transformation

If your XY scatter plot is not linear you may need to transform your data. (Note the difference between a weak linear relationship A on the following slide and a curvilinear relationship shown by B.) Use the Calculator function in Minitab to convert your X variable to another form to obtain a linear relationship. Scatter plot the results to select the most linear transformed X and Y relationship. Run the regression on the transformed X and Y. Check the R-square, F, error measures and residuals.

4-* Transformation Applications

The weak linear relationship in data set A will not be helped by transformation. The curvilinear relationship in data set B could be helped with transformation.

4-* Transformation Types

Square: X²
Square root: √X
Reciprocal: 1/X
Log (use base 10): Log10 X

1. In Minitab Calc, select Calculator.
2. Type in an open column to place the results in.
3. Select the function you wish from the above options.
4. In the (number) field place the column of your X variable. Select OK.
5. Scatter plot the Y data and the transformed X column.
6. If linear, run a regression on Y and the transformed X.
7. Evaluate the results: R-square, F, b1, error measures and residuals.

4-* Transformation Examples

The following scatter plot slides show what the transformations do to a perfectly linear XY data relationship. If you have a good linear relationship you do not need to transform X. Note that we only need to transform the X variable – Y stays in its raw form. The transformations have the reverse effects on the XY data series that are shown on the slides. The scatter plot for transformed data should become more linear. You need to select the best (most linear) transformation of X. The best will be indicated by a higher R-square value, a lower P-value for F and a more linear scatter plot. Be sure to run the best transformed value of X as the independent variable, replacing the raw X values, in the regression analysis.
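A sketch of the transform-and-compare step, assuming Python/statsmodels in place of Minitab's Calculator (the data are invented and deliberately curvilinear):

# Sketch: try the four X transforms from the slides and compare R-squared.
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 4.0, 6.0, 9.0, 12.0, 16.0, 20.0])
y = np.array([2.1, 3.9, 5.6, 6.6, 7.9, 8.8, 9.9, 10.7])  # curvilinear in x

transforms = {"raw": x, "square": x**2, "sqrt": np.sqrt(x),
              "reciprocal": 1 / x, "log10": np.log10(x)}
for name, xt in transforms.items():
    r2 = sm.OLS(y, sm.add_constant(xt)).fit().rsquared
    print(f"{name:10s} R^2 = {r2:.3f}")   # pick the most linear (highest R^2)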
4-* Linear Relationship Between X and Y

4-* Linear Relationship with a Square Transform of X

4-* Linear Relationship with a Square Root Transform of X

4-* Linear Relationship with a Reciprocal Transform of X

4-* Linear Relationship with a Log Transform of X

4-* Cross-Sectional Forecasting With Regression

All of the data pertains to one time period – we are exploring the relation between variables (Y and X) at a specific point in time. The example shown is the relationship across markets of sales and population at several locations. Cross-sectional regression could apply to any dependent and independent variables at a specific time. That is, we can determine the relationship across markets with regression analysis and forecast the result of an additional market (or Y variable) using the regression equation. Many of the estimator tests hold in this case as well: the t-test on the slope coefficient, R², RMSE and the F test.

4-* Multiple Regression Analysis (Specification, Test and Forecast)

The equation for the population is:
Y = βo + β1X1 + β2X2 + β3X3 + β4X4 + ... + βnXn + ε, where ε = Y - Ŷ

The estimated equation for the sample is:
Ŷ = bo + b1X1 + b2X2 + b3X3 + b4X4 + ... + bnXn, where the residuals = Y - Ŷ

In Ordinary Least Squares (OLS) we minimize Σε² = Σ(Y - Ŷ)², the sum of the squared errors.

There are several additional tests that you must perform when you add X variables in multiple regression:
- Multicollinearity
- Serial Correlation
- Heteroscedasticity

4-* Hypothesis Test of Independent Variables (Xn) Requirement

Check the statistical significance of each independent variable (X) and the direction of the variable coefficient sign (+ or -) by performing a t-test. The null hypothesis must be stated: Ho: b ≥ 0 or b ≤ 0. Set the null hypothesis opposite to the sign of your coefficient. H1: b < 0 or b > 0.

t-calc for Xi= coefficient (b) of Xi/standard error of Xi

Where:
i is the independent X variable

Compare the t-calc value (provided in the Minitab analysis for each X variable) to the t-table value, where df = n - (K + 1), for a one-tailed t-test at 95% confidence, or ɑ of .05. Remember to compare the absolute t-calc value to the t-table value.
n = number of observations
K = number of independent variables

You may use the p-values as an alternative to comparing calculated t-values with the table.
Comment on the statistical significance of each of your variables and its sign (+ or -).
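A sketch of this per-variable t-test in a multiple regression, assuming Python/statsmodels and scipy (the data are invented):

# Sketch: t-values (coef / std err) for each X in a multiple regression,
# compared to the t-table with df = n-(K+1). Data invented.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n = 24
x1 = rng.normal(100, 10, n)
x2 = rng.normal(50, 5, n)
y = 3.0 + 0.8 * x1 + 1.5 * x2 + rng.normal(0, 4, n)

res = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
K = 2
t_table = stats.t.ppf(0.95, df=n - (K + 1))   # one-tailed, alpha = .05
print("t-calc:", res.tvalues)                 # reject Ho where |t_calc| > t_table
print("p-values:", res.pvalues)
print("t-table:", round(t_table, 3))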

4-*

Multicollinearity Check Requirement
Multicollinearity is independent (X) variable information overlap and is indicated by a strong linear relationship between X variables.

Example:
Assume GDP and Personal Disposable Income (PDI) are used to forecast Houses Sold. GDP and PDI may each have strong, significant relationships (correlations) with Houses Sold. But GDP and PDI may have an even stronger correlation with each other (see the Correlation Matrix).

Using both GDP and PDI to forecast Houses Sold would result in
misleading error measures and
overstated t-tests and R2 values and
inflated and sensitive X coefficients.

This is a major reason why you do not want a “kitchen sink model”. Overlap or multicollinearity reduces the reliability of forecast measures.

You must comment on model multicollinearity and any steps taken to reduce it.

4-*

Detecting Multicollinearity

1. Determine the correlation coefficients between all independent variables (X) from the correlation matrix. (You have already done so.)

2. X-to-X variable correlation coefficients of + or - .8 to 1.0 signal possible multicollinearity.

3. Determine if the independent (X, X) correlation coefficients are greater than the correlation coefficients with the dependent variable (Y).

4. Examine the independent variable coefficients and look for low t-calc values on each independent variable.

5. Look at the Variance Inflation Factors (VIFs) for values that exceed 2 – this may indicate multicollinearity (in Minitab regression). VIFs greater than 2 indicate potentially unstable and inflated coefficients as well as other overstatements of significance and regression strength measures.

Take corrective actions suggested on the next slide and recheck the VIFs and coefficient t-values.
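A sketch of the VIF check, assuming statsmodels' variance_inflation_factor in place of Minitab's VIF output (the GDP/PDI values are invented and deliberately collinear):

# Sketch: compute VIFs for two overlapping X variables. Data invented.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 40
gdp = rng.normal(100, 10, n)
pdi = 0.9 * gdp + rng.normal(0, 2, n)    # PDI built to overlap with GDP

X = sm.add_constant(np.column_stack([gdp, pdi]))
for i, name in [(1, "GDP"), (2, "PDI")]:
    print(name, "VIF =", round(variance_inflation_factor(X, i), 2))
# VIFs above the slides' threshold of 2 flag possible multicollinearity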

4-*

Reducing Multicollinearity
1. If independent variable correlation coefficients are high and show possible multicollinearity use the first difference of one of the highly correlated independent variables as a substitute for it.

2. Recalculate the independent variable correlation coefficients including the first differenced variable to determine if correlation remains high.

3. If the correlation remains high eliminate one of the two highly correlated independent variables. Check for significant t-calc values.

4. Select another independent variable that has a logical coefficient sign, lower correlation and higher t-calc value.

5. Transform each independent variable (X) with the scalar formula 7.12 on page 299 and rerun the regression with the scaled Xs.

4-*

Detection of Multicollinearity
When signs of the coefficients and t-values don’t make sense

4-*

Detection of Multicollinearity
Examine the correlation matrix

4-*

4-*

Serial Correlation
(Autocorrelation of the Error Terms)

Detecting Serial Correlation is essential for correct forecasting technique and deserves special attention.

1) Serial correlation, while not introducing bias into the estimated slope coefficient, creates bias in the estimated standard errors.

2) Serial correlation produces an estimated standard error of the regression that is smaller than the true standard error.

3) This produces spurious regression results in that the significance of coefficient estimates and quality of fit measures will be overstated.

4) Regression coefficients may be deemed significant when in fact they are not. R2 and t values will be overestimated.

5) Serial correlation occurs frequently in business applications of regression.

4-*

Detecting Serial Correlation With Scatter Plots
( Serial Correlation Weakens your Regression Evaluation Measures)
Negative Serial Correlation
Positive Serial Correlation
Which type of serial correlation is associated with business cycles?

4-*

Durbin-Watson test for Serial Correlation

4-*

See the table B-6 on pages 530-531 and use ɑ of .05.
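A sketch of the Durbin-Watson statistic on regression residuals, assuming Python/statsmodels (the data are invented with positively autocorrelated errors):

# Sketch: DW near 2 suggests no serial correlation; toward 0, positive;
# toward 4, negative. Statsmodels assumed; data invented.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
t = np.arange(60)
e = np.zeros(60)
for i in range(1, 60):                  # build positively autocorrelated errors
    e[i] = 0.7 * e[i - 1] + rng.normal(0, 1)
y = 5 + 0.4 * t + e

res = sm.OLS(y, sm.add_constant(t)).fit()
print("DW =", round(durbin_watson(res.resid), 2))  # compare with table B-6 bounds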

4-*

4-*

Reducing Serial Correlation
(Options for you to explore)

Respecify the model –In the case of positive serial correlation the most likely cause is business cycles. Include a cyclical variable in your model.
Use the first difference of the X and Y variables to smooth out business cycle influences. You must reverse this in the estimates and forecast.
Include the square of your independent (X) variable as another independent variable. (square transform – try this first)
Include a lagged value of your dependent (Y) variable as another independent variable. (try this second)
Use Cochrane-Orcutt to adjust the autocorrelation out of the error terms by introducing ρ (rho) to create a differencing transformation. The procedure runs a new regression, estimates another ρ value, and uses it to estimate another regression equation with lower serial correlation.
Use Hildreth-Lu, which is similar to Cochrane-Orcutt. Both are only good for near-term forecasts.

4-*

Heteroscedasticity
(Violation of the Residual Constant Variance Assumption)

1) Is present when the error-variance is not constant across the range of the independent variable.

2) The result is bias in error variance estimates, causing misleading statistical inference (R2 and t values).

3) It can be identified with a residual plot against the independent (X) variable: X on the horizontal axis and the error terms or residuals on the vertical axis. Look for the megaphone effect.

4) Fixes for heteroscedasticity include data transformations to stabilize the error variance, such as a logarithmic transformation of the independent (X) variable.

4-*

Plot the Residuals vs X values and look for a Megaphone Effect
You want Homoscedasticity in residuals (constant variance)

You do not want Heteroscedasticity in residuals (non constant variance)

4-*

4-*

How to Correct for Heteroscedasticity

1. Fixes for heteroscedasticity include data transformations to stabilize the error variance, such as a logarithmic transformation of a selected independent (X) variable.

2. Use Minitab Calculator to transform a selected independent variable into the log base 10 form.

3. Use both the log form and the natural form of the independent variable in your regression.

4. Check the residuals with a time series plot to determine if the megaphone effect is gone.
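A sketch of the visual megaphone check, assuming Python/matplotlib/statsmodels (the data are invented so that the error variance grows with X):

# Sketch: residuals-vs-X plot for the megaphone check. Data invented.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = np.linspace(1, 100, 80)
y = 10 + 2 * x + rng.normal(0, 0.05 * x**1.5)   # error spread widens with x

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
plt.scatter(x, resid, s=10)                     # residuals vs X
plt.axhline(0, color="gray")
plt.title("Residuals vs X: look for the megaphone effect")
plt.show()
# after transforming a variable per the steps above, re-run the regression
# and re-plot the residuals to see whether the megaphone is gone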

4-*

Evaluation of Regression Analysis Summary

A. We can evaluate the relationship of the dependent (Y) variable and the independent (X) variables
by plotting a time series of each

B. We can evaluate the significance of independent variables before running regression models by:
1) Correlation Scatter Plots of X versus Y (look for linearity)
2) Correlation Coefficients (look for magnitude 0 to 1 and sign + or -)
3) Correlation Significance (t-test for Ho: ρ=0; reject Ho, then the variable is significant)

C. We can evaluate the estimator X variables by:
1) Evaluating the Logic of the Coefficient Sign –Does it make sense?
2) t-Test to evaluate X variable coefficient (b) significance with Ho: b=0 (compare the
t-value for b to the table value, or use p; reject Ho when the X variable is significant)
3) Serial correlation test for error term serial correlation by:
Plotting fitted data Ŷ against Y values and looking for + or - type patterns
DW between 0 and 4, compared with the table upper and lower limits

4) Heteroscedasticity test: in Minitab, plot the residuals vs the independent variable and
look for a megaphone distribution of the residuals
5) Multicollinearity: check variable sign logic; check the X1 vs Xn correlation to see whether it
is greater than the X1 vs Y correlation (omitted variable problem)
D. We can evaluate the entire estimator (regression equation) by:
6) F-test of significance, Ho: R²=0, comparing the F value to the F-table value to reject
Ho when R² is significant – useful with multiple regression methods


4-*

Required Hold Out Period Performance Assessment

4-*

Multiple Regression and Variable Relationships

4-*

Multiple Regression and Variable Relationships

4-*

How to Account For Known Occurrences or Outlier Values

What could you do to explain the occurrence of 9-11 or the effect of increased taxes, a sales promotion activity or an XY outlier?

It would be great to have a qualitative variable to account for their effects in a regression.

In this case we can capture the time series influence with a switching or dummy variable with data made by you.

In the period(s) in which the known influence occurs you switch from 0 to 1. All other periods remain 0.

You run the multiple regression with this new data series made by you and evaluate the statistics for the new series just as you would for other X variables.

4-*

Dummy Independent Variables (X)
(How to Account for Known Qualitative External Factors)

1. Dummy or indicator variables are used to introduce qualitative independent factors to forecast a dependent variable.

2. They are typically used in conjunction with other quantitative X variables. You can use more than one dummy variable.

3. The qualitative factors are indicated with a data series of 0 (no influence) and 1 (influence) assigned for each observation.

4. The dummy variable data series is considered a switch on or off for the qualitative factor.

5. Note that you must be able to account for the dummy variable historically and project it into the forecast future.

6. Also note that you cannot introduce symmetry in the dummy variables.
That is, do not introduce another data series with just the opposite values.
Check the significance and regression statistics for inclusion in the regression.
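A sketch of a one-period event dummy used alongside a quantitative X, assuming Python/statsmodels (the data are invented and the promotion period is hypothetical):

# Sketch: a 0/1 switch for a known one-time event. Data invented.
import numpy as np
import statsmodels.api as sm

n = 20
trend = np.arange(n, dtype=float)
promo = np.zeros(n)
promo[12] = 1                       # switch on only in the event period
y = 50 + 2 * trend + 15 * promo + np.random.default_rng(5).normal(0, 2, n)

X = sm.add_constant(np.column_stack([trend, promo]))
res = sm.OLS(y, X).fit()
print(res.params)    # the promo coefficient estimates the one-period lift
print(res.pvalues)   # check the dummy's significance like any other X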

4-*

Inspect the Dummy Variable Results and Significance

4-*

4-*

How to Address Seasonality in the Y Data

Inspect your Y data for seasonality (time series plot and autocorrelations)
(Your option)
Method A
Perform a decomposition to identify the seasonal factors
Apply seasonal indices to your Y data
Run the regression on the seasonally adjusted Y
Reseasonalize your Y data estimates and forecast results.

Method B
Use dummy variables to represent each period in the year (less one) as additional X variables.
Results will be in seasonal Y data form.

4-*

You can introduce Dummy Variables to Account For Seasonality
In this case you would need to introduce a separate dummy data series of 0 and 1 for each month – but you must leave one month out to avoid symmetry.
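A sketch of quarterly seasonal dummies with one period left out, assuming Python/pandas/statsmodels (the data are invented):

# Sketch: seasonal dummies for quarters 2-4, dropping Q1 to avoid the
# symmetry (dummy-variable trap) the slide warns about. Data invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm

n = 24
q = pd.Series(np.arange(n) % 4 + 1, name="quarter")
dummies = pd.get_dummies(q, prefix="Q", drop_first=True).astype(float)  # Q2..Q4

season = np.array([0, 8, 15, -5])[np.arange(n) % 4]
y = 100 + 1.5 * np.arange(n) + season + np.random.default_rng(6).normal(0, 2, n)

t = pd.Series(np.arange(n, dtype=float), name="t")
X = sm.add_constant(pd.concat([t, dummies], axis=1))
print(sm.OLS(y, X).fit().params)  # Q2..Q4 coefficients are shifts relative to Q1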

4-*

4-*

4-*

4-*

Examine the Plot of Fitted Y from the regression and Actual Y to check for seasonal pattern
Check Y Fits for Seasonality

4-*

Check monthly t and P values

4-*

4-*

(Equation objects from the slides above, reconstructed: the conditional expectation E(Y|X = x) = μY + ρ(σY/σX)(x - μX); the trend form E(Yt|Xt) = a + bXt; and the error variance s² = (1/N) Σ et², summed over t = 1 to N.)

New folder/POWERPOINTS/Chapter 9 ARIMA Slides Apr 19.ppt

7-*

Chapter 9
ARIMA
(Box-Jenkins)–Type Forecasting Models
Taking advantage of autoregressiveness in your Y variable

7-*

General Overview of Box-Jenkins Forecasting

Understand the process

– Box-Jenkins is the application of 3 predefined forecasting models to a Y data series that is stationary.

– The model type is selected by the autoregressive characteristics of the stationary data series.

– You begin by evaluating your Y data series for stationarity

– If it is not stationary it must be differenced until it is.

– Run autocorrelation analysis to determine the characteristics of the ACF patterns

– Fill out the menu based on the ACF patterns and run the appropriate ARIMA model.

– Evaluate the residuals just as you do with regression – look for
remaining autoregressiveness and randomness.

7-*

Box-Jenkins (ARMA, ARIMA) Techniques Review
Univariate Forecasting
Box-Jenkins is a procedure which uses a variable’s past behavior to select the best forecasting model from a general class of models. It assumes that any time series pattern can be represented by one of three categories of models. These categories include:

• Autoregressive models: forecasts of a variable based on linear function of its past
values
• Moving Average models: forecasts based on linear combination of past errors
• Autoregressive-Moving Average models: combination of the previous two categories

Advantages: Box-Jenkins approaches to forecasting provide some of the most accurate short-term forecasts. Limitations: It requires a very large amount of data.
Autoregressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA) models are commonly called Box-Jenkins models after the mathematicians George Box and Gwilym Jenkins who popularized them in their 1976 book Time Series Analysis-Forecasting And Control.

7-*

Box-Jenkins essentially works the forecasting problem backwards.

7-*

Do You Remember What A
Stationary Data Series Is?

It has no slope and the data observations fluctuate around a mean value.

You can determine a stationary series with autocorrelation analysis – no trend signals.

You can create one from a trended series by taking a difference. In Minitab/Stat/Time Series/Differences – select lag 1 to get the first difference.

If one difference doesn’t make it stationary then take another and continue until it is stationary.
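A sketch of first differencing, assuming Python/pandas in place of Minitab's Differences menu (the data are invented):

# Sketch: first difference of a trended series. Data invented.
import pandas as pd

y = pd.Series([100, 104, 109, 115, 118, 124, 131, 135, 142, 149])
d1 = y.diff(1).dropna()     # first difference removes a linear trend
print(d1)
# if d1 still trends, difference again: d2 = d1.diff(1).dropna()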

7-*

Note to be Careful with Differencing
(Differencing is Integration in ARIMA)
You will likely need to difference the Y data

Use Minitab to do this as many times as needed to make the data stationary – check a time series plot or trend after each difference.

Too many differences can actually result in more data series instability.

If you difference and find that the trend characteristics do not change or become worse (determined by the slope term of trend analysis) you should not difference further.
Do not waste your time.

7-*

Statisticians have found that some data sets with specific characteristics contain adequate autoregressive information to forecast them accurately

They have discovered that stationary data series have these special characteristics and can be forecast with the use of a template.

The template fit is applied by matching autoregressive characteristics found in autocorrelation and partial autocorrelation analysis.

Box and Jenkins have formalized this process and broadened it to apply to non stationary series.

The process involves

Determining if your data is stationary (2 nonstationary classes are trend and seasonality)
Changing the form of the data to make it stationary
Obtaining specific information on your Y data autoregressive characteristics
Matching the autoregressive characteristics to standard ARIMA templates or forms.

ARIMA Background

7-*

Autocorrelation in your Dependent Variable and Forecasting

A time series observation depends on the effects of previous observations.

As a result previous observation effects (t-n) + random effects = observation (t)

The question is what are these autocorrelated effects and can we isolate them and use them to forecast?

The question also is how many time lags are relevant?

In most cases more recent time (p) lags carry the most information about the current or last period.

7-*

Three Components of Box Jenkins Approach

The current observation (Yt) is represented by:
(1) a linear combination (weighted average) of previous observations (Yt-n),
(2) an error term associated with the current observation (εt),
(3) a linear combination of error terms associated with previous observations (εt-n).

The portion of the model involving the observations is called the autoregressive part of the model, and the portion involving the error terms is called the moving average part of the model.

Note: The residuals should have zero mean, constant variance, and be uncorrelated with each other. Always check the residual autocorrelations and lag pattern to confirm randomness.

7-*

Box Jenkins Model Expressions

AR (p) for Autoregressive models – Yt is a linear combination of weighted past observations (Y)

MA (q) for Moving Average models –Yt is a linear combination of weighted past errors. As Y moves forward in time the linear error combination moves forward as well.

ARMA (p,q) – non seasonal Combined AR and MA Models

ARIMA (p,d,q) – non seasonal differenced Combined AR and MA models

ARIMA (p,d,q)(P,D,Q) – seasonal differenced Combined AR and MA models. Derived after ARIMA (p,d,q) form is determined.

7-*

You are essentially going shopping for a model for your data

You need to make sure that you know the autoregressive tendencies of your data before shopping

You may need to integrate (difference) your data to get to the right picture of these tendencies with autocorrelation and partial autocorrelation.

During any transformation process and observation of these tendencies you will need to keep a log. The log has basically three entries (p,d,q)
1. for the observed autoregressive (AR) tendencies noted by (p)
2. for the number of times you transformed the data to get it
stationary (d)
3. for the observed moving average (MA) tendencies noted by (q)
Shopping has finished when you get a model that has significant coefficients
and random and uncorrelated residuals. Evaluate fit with error measures.

Forecast with the model and use the same analysis of residuals and error measures you used in other forecasting techniques.

General How To

7-*

Forecasting a Stationary Series with Previous Observations (Yt-n)


7-*

Look for high autocorrelations, especially between recent observations.
These will cause spikes on the correlogram.

7-*

Demand Example

We have 52 observations of demand data that we wish to prepare for ARIMA forecasting for problem 7 in Chapter 9

We will examine how to difference and what differencing does. Remember our differencing in Decomposition?

Next we will look at an example of how we can examine our data series with autocorrelation and partial autocorrelation

7-*

7-*

This shows definite trend in the data indicated by the slowly falling ACFs

7-*

The first difference of the trended data shows that it may be stationary. Confirm this with ACFs.

7-*

The ACFs indicate that the data series is stationary. Note the spike in lag 1.

7-*

Autocorrelation and Partial Autocorrelation

Autocorrelation is the total correlation between an observation and a (n) period lagged observation(s).

It is a good tool to statistically confirm data characteristics in Regression Analysis and is an even more useful tool in ARIMA model application.

Partial Autocorrelation is the direct correlation with only an (n) period lagged observation(s) holding other period lagged effects (or autocorrelations) constant.

The patterns between autocorrelation and partial autocorrelation define the type of ARIMA model to be applied.

The patterns are used to estimate model coefficient values

7-*

7-*

You will need to run your Y data through Autoregressive Analysis

Run it through Minitab Autocorrelation and note the results.
Run it through Minitab Partial Autocorrelation and note the results.
Go to Minitab/Stat/Time Series/Autocorrelation or Partial Autocorrelation
To get differences go to Minitab/Stat/Time Series/Differences and select lag 1 for a first difference; to take a second difference, difference the already-differenced series again.
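A sketch of the ACF/PACF runs, assuming Python/statsmodels in place of the Minitab menus (an invented AR(1) series):

# Sketch: ACF and PACF plots via statsmodels. Data invented.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(7)
y = np.zeros(120)
for t in range(1, 120):              # invented AR(1) series
    y[t] = 0.7 * y[t - 1] + rng.normal()

fig, axes = plt.subplots(2, 1, figsize=(7, 5))
plot_acf(y, lags=20, ax=axes[0])     # dies out slowly for an AR process
plot_pacf(y, lags=20, ax=axes[1])    # cuts off after lag p (here p = 1)
plt.tight_layout()
plt.show()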

7-*

1. We will explore three basic ARIMA model types: AR, MA, ARMA.

2. Then we will look at methods to difference your data series for trend and how to note it in the ARIMA menu.

3. Then we will look at methods to difference your data series for seasonality and how to note it in the ARIMA menu.

7-*

Box-Jenkins AR Model

The AR model will produce coefficients equal to the number of orders (spikes) in the partial autocorrelation analysis.

The number of spikes determines the number of AR coefficients in the model equation.

The model will usually consist of a constant term and a coefficient for the lag variable selected in the peak(s).

First Order (AR1) Form: Ŷ = C + A1Yt-1 + ε

Second Order (AR2) Form: Ŷ = C + A1Yt-1 + A2Yt-2+ ε

Each lag period (n) observation (Yt-n) generates a separate coefficient (An).

The Box-Jenkins software will generate forecasts for you from your p input (p,d,q) in conjunction with your difference instructions (d).
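A hypothetical sketch of fitting the AR(1) and AR(2) forms above with Python's statsmodels (note that statsmodels reports the series mean under "const" rather than the constant C):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    e = rng.normal(size=200)
    y = np.zeros(200)
    for t in range(1, 200):                 # simulate an AR(1): Yt = 5 + 0.6*Yt-1 + e
        y[t] = 5 + 0.6 * y[t - 1] + e[t]

    ar1 = ARIMA(y, order=(1, 0, 0)).fit()   # one A1 coefficient for Yt-1
    ar2 = ARIMA(y, order=(2, 0, 0)).fit()   # adds an A2 coefficient for Yt-2
    print(ar1.params.round(3))              # expect ar.L1 near 0.6
    print(ar2.params.round(3))              # expect ar.L2 near 0 (not significant here)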

7-*

Autoregressive (AR) models show one- and two-period partial autocorrelations with an abrupt fall-off, noted as the (p) type. [Figures: first-order and second-order PACF patterns.]

7-*

Calculation of the autocorrelation function (ACF) and partial autocorrelation function (PACF) is required to identify the correct model.

Note the spike in the autocorrelation (ACF) and the alternating signs of the PACF.

7-*

Forecasting a Stationary Series
with Previous Errors (εt-n)
Moving Average (MA) Model

7-*

Autocorrelation and Forecasts

A time series observation depends on the effects of previous errors:
previous error effects (t-n) + random effects = error (t)

The question is: what are these autocorrelated effects, and can we isolate them and use them to forecast?

The question also is: how many time lags are relevant?

In most cases the more recent time lags (q) carry the most information about the current or last period.

7-*

Box-Jenkins Moving Average Model

The MA model will produce coefficients equal to the number of orders (spikes) in autocorrelation analysis.

The number of spikes determines the number of MA coefficients in the model equation.

The model will usually consist of a constant term and a coefficient for the lag variable selected in the peak(s).

First Order (MA1) Form: Ŷ = C + W1εt-1 + ε

Second Order (MA2) Form: Ŷ = C + W1εt-1 + W2εt-2+ ε

Each lag period (n) error (εt-n) generates a separate coefficient (Wn).

The Box-Jenkins software will generate forecasts for you from your q input (p,d,q) in conjunction with the number of differences that you have taken (d).
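A matching hypothetical sketch for the MA(1) form (same caveat about the constant):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(1)
    e = rng.normal(size=201)
    y = 50 + e[1:] + 0.7 * e[:-1]           # MA(1): Yt = C + et + W1*et-1

    ma1 = ARIMA(y, order=(0, 0, 1)).fit()   # one W1 coefficient for et-1
    print(ma1.params.round(3))              # expect const near 50 and ma.L1 near 0.7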

7-*

Moving Average (MA) models show one- and two-period autocorrelations with an abrupt fall-off, noted as the (q) type. [Figures: first-order and second-order ACF patterns.]

7-*

7-*

Box-Jenkins ARMA Model

The Autoregressive Model (AR) and the Moving Average Model (MA) can be combined in an ARMA model that includes both coefficient types, p and q (p,d,q).

The ARMA models have particular autocorrelation and partial autocorrelation patterns that slowly die out over the time lags.

The ARMA model applies to a stationary time series. As with other models if the series is not stationary it must be integrated (differenced) until it is.

7-*

Combination (ARMA) models show no abrupt fall-off: the (p, q) type. ARMA models are typically first order – rarely second order – and both look the same relative to ACFs and PACFs. [Figure: ARMA ACF/PACF patterns.]

7-*

Summary of Model Identification

Autocorrelation    Partial Autocorrelation    B-J Model Type
Cuts Off (q)       Dies Out                   Moving Average MA(q)
Dies Out           Cuts Off (p)               Autoregressive AR(p)
Dies Out           Dies Out                   Combination ARMA(p,q)

q is the order of MA spikes and p is the order of AR spikes; they rarely exceed 2.
Rule of thumb for t-estimates for large sample sizes: spikes beyond ±2/√n are significant.
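A quick check of the ±2/√n rule of thumb, using the 126-observation series from the Chapter 9 example later in these notes:

    import math

    n = 126
    bound = 2 / math.sqrt(n)
    print(round(bound, 3))   # 0.178: spikes with |r| above this are significant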

7-*

What if your Time Series Has Seasonality?

ARIMA models can be used on seasonal data without seasonally adjusting the data. A seasonal difference may also reduce trend in the data – perform the seasonal difference first, and if you still have trend, difference the seasonally differenced data with lag 1.

The data must be differenced with a difference of the seasonal length (s) (e.g., s=4 for quarters or s=12 for months). For months, the Y difference = Y13 – Y1.

Stationarity can be checked using autocorrelation coefficients or correlograms.
The r values for each lagged period must lie within the t-value boundaries (statistically insignificant).

If differencing is used, the number of times seasonal differences are taken to achieve a stationary series is (D).

Seasonal ARIMA models are noted by (P,D,Q), giving the order of autoregression, differencing, and moving averaging, respectively, for the seasonal part of the model.

If you need to add other p or q values after a seasonal differencing to get a better model, do so in the p and q positions. Make sure you uncheck the “include constant term in model” box.

You will get a coefficient for each P, Q order that you have in the ARIMA model.

Keep track of D since the Box-Jenkins software will reseasonalize the forecast.
You will lose one entire year of data every time a D is taken.
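A sketch of a full seasonal (p,d,q)(P,D,Q) fit in Python, assuming statsmodels' SARIMAX; the orders, file name and column name are illustrative assumptions, not a recommendation:

    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    y = pd.read_csv("revenue.csv")["revenue"]    # hypothetical quarterly series
    fit = SARIMAX(y, order=(0, 1, 1),
                  seasonal_order=(0, 1, 1, 4),   # s = 4 for quarterly data
                  trend=None).fit(disp=False)    # no constant: d and D are nonzero
    print(fit.summary())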

7-*

Seasonal Differencing

Always difference for seasonality first (D), then difference for trend (d).
For quarterly data: sDY1 = dY5 – dY1 (in this example the seasonal, lag-4 difference is applied to the first-differenced series).

Obs. No.   Observation (Y)   First Difference (d = 1)   Quarterly Difference (D = 1)   2nd Qtrly Difference (D = 2)
 1   1.62
 2   1.55   -0.07
 3   1.59    0.04
 4   1.55   -0.04
 5   1.10   -0.45
 6   0.82   -0.28   -0.21
 7   1.06    0.24    0.20
 8   0.69   -0.37   -0.33
 9   0.74    0.05    0.50
10   0.73   -0.01    0.27    0.48
11   0.44   -0.29   -0.53   -0.73
12   0.98    0.54    0.91    1.24
13   0.62   -0.36   -0.41   -0.91
14   0.44   -0.18   -0.17   -0.44
15   0.66    0.22    0.51    1.04
16   0.83    0.17   -0.37   -1.28
17   1.25    0.42    0.78    1.19
18   0.89   -0.36   -0.18   -0.01
19   1.56    0.67    0.45   -0.06
20   1.75    0.19    0.02    0.39

7-*

Seasonal Differencing [figure slides: plots of the seasonally differenced series]

7-*

What if your Time Series Has a Trend?

Trends can be detected visually with time series plots and confirmed with autocorrelation (at least several statistically significant recent values that slowly trend toward zero as the time lags increase). If your data also has seasonality, do the seasonal difference first – it may remove some of the trend.

The average value (mean) of the series observations changes over time with the trend.

To use an AR, MA or ARMA model we must identify the model and then stabilize the series mean by differencing.

Differencing changes the model to ARIMA, where (I) stands for “integrated”: the differences are summed to transform the forecast series back to the original data values.

In Minitab/Stat/Time Series/Differences use lag 1 for the original data. If additional differences are required, use lag 1 on the previously differenced data. Uncheck the “include constant term in model” box.

Keep track of the number of times you take differences (d) and the ARIMA or “Box-Jenkins” software will transform the forecast back to original data values.

You always lose one observation for each difference (d)

7-*

Differencing the Data for Trend

Diff(Yt) = Yt – Yt-1 (each difference subtracts the older observation from the more recent one)

Obs. No.   Observation (Y)   First Difference (d = 1)   Second Difference (d = 2)   Third Difference (d = 3)
 1   1.62
 2   1.55   -0.07
 3   1.59    0.04    0.11
 4   1.55   -0.04   -0.08   -0.19
 5   1.10   -0.45   -0.41   -0.33
 6   0.82   -0.28    0.17    0.58
 7   1.06    0.24    0.52    0.35
 8   0.69   -0.37   -0.61   -1.13
 9   0.74    0.05    0.42    1.03
10   0.73   -0.01   -0.06   -0.48
11   0.44   -0.29   -0.28   -0.22
12   0.98    0.54    0.83    1.11
13   0.62   -0.36   -0.90   -1.73
14   0.44   -0.18    0.18    1.08
15   0.66    0.22    0.40    0.22
16   0.83    0.17   -0.05   -0.45
17   1.25    0.42    0.25    0.30
18   0.89   -0.36   -0.78   -1.03
19   1.56    0.67    1.03    1.81
20   1.75    0.19   -0.48   -1.51
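Both tables can be reproduced with pandas (a sketch; the seasonal column applies a lag-4 difference to the first differences, as above):

    import pandas as pd

    y = pd.Series([1.62, 1.55, 1.59, 1.55, 1.10, 0.82, 1.06, 0.69, 0.74, 0.73,
                   0.44, 0.98, 0.62, 0.44, 0.66, 0.83, 1.25, 0.89, 1.56, 1.75])

    d1 = y.diff(1)    # first difference for trend (d = 1); loses 1 observation
    d2 = d1.diff(1)   # second difference (d = 2); loses another
    D1 = d1.diff(4)   # quarterly difference of d1 (D = 1); loses 4 more
    print(pd.DataFrame({"Y": y, "d=1": d1, "d=2": d2, "D=1": D1}).round(2))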

7-*

Seasonal Differencing [figure slides: ACF patterns for example models noted (1,1,1) and (0,2,1)]

7-*

Running ARIMA in Minitab

You should not exceed 5 in any one of the ARIMA p,d,q or P,D,Q parameters.
The total of the p,d,q and P,D,Q parameters should not exceed 10.
You will need at least three observations after differencing to run the software – but three observations is not a good basis for a reliable forecast.
If you differenced the data (that is, d or D ≠ 0), do not include a constant term in your ARIMA model.
In Graphs… select the residual time series plots and the ACF, histogram, normal plot and residuals-versus-order plots.
In Forecasts… select the number of forecast periods you want in Lead. Under Storage in Forecasts: select the worksheet column where you want the forecasts to appear. In Lower limit and Upper limit select the worksheet columns where you want the confidence limits to appear.
In Results… ensure that you have selected at least the Table of final estimates.
In the Storage… option save the residuals. Select OK and run ARIMA.
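The equivalent run in Python (a sketch assuming statsmodels; the order matches the IBM example that follows, fitted here without the constant per the rule above, and the file and column names are hypothetical):

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    y = pd.read_csv("ibm.csv")["price"]      # hypothetical series
    fit = ARIMA(y, order=(1, 1, 0)).fit()
    print(fit.summary())                     # table of final estimates

    fc = fit.get_forecast(steps=6)           # Lead = 6 forecast periods
    print(fc.predicted_mean)                 # point forecasts
    print(fc.conf_int(alpha=0.05))           # 95% lower and upper limits
    resid = fit.resid                        # store the residuals for analysis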

7-*

ARIMA Model: IBM
Estimates at each iteration
Iteration SSE Parameters
0 1804.51 0.100 0.743
1 1714.82 0.250 0.654
2 1692.58 0.365 0.588
3 1692.52 0.371 0.588
4 1692.52 0.371 0.588
Relative change in each estimate less than 0.0010

Final Estimates of Parameters
Type Coef SE Coef T P
AR 1 0.3709 0.1506 2.46 0.017
Constant 0.5883 0.8277 0.71 0.481

Differencing: 1 regular difference
Number of observations: Original series 52, after differencing 51
Residuals: SS = 1692.42 (backforecasts excluded)
MS = 34.54 DF = 49
Minitab ARIMA Output
Note the significance of the ARIMA coefficient(s) – expect P values < .05.
Select the model with the lowest Residual MS.

7-*

Modified Box-Pierce (Ljung-Box) Chi-Square statistic

Lag 12 24 36 48
Chi-Square 7.4 15.9 28.8 38.4
DF 10 22 34 46
P-Value 0.690 0.822 0.720 0.778

Use the LBQ statistic to determine if the model is adequate.

Note that this is a test of the residuals in groups (lag periods).
– You want a model that has P-values greater than .05. P-values less than .05 indicate residuals with autocorrelation that is not captured in the ARIMA model.
– High P-values are preferred, especially in the early lag periods (12 and 24 for monthly data).
– Confirm the LBQ results with the evaluation of residuals.
Minitab ARIMA Output (continued)
LBQ: P should be greater than .05.
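The grouped LBQ test can be sketched in Python with statsmodels; the residuals here are a random stand-in for the ones stored from the ARIMA run, and model_df subtracts the p+q fitted coefficients so df = m - p - q:

    import numpy as np
    from statsmodels.stats.diagnostic import acorr_ljungbox

    resid = np.random.default_rng(2).normal(size=52)   # stand-in for stored residuals
    lb = acorr_ljungbox(resid, lags=[12, 24, 36, 48], model_df=1, return_df=True)
    print(lb)   # an adequate model shows lb_pvalue > .05 at each lag group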

7-*

Under Graph/Options select ACF of Residuals to inspect for significant residual autocorrelation
Minitab ARIMA Output (continued)

7-*

Minitab ARIMA Output (continued)
Under ARIMA Graph Options select 4 in 1 to check for a normal distribution and randomness of the residuals.

7-*

Forecasts from period 52

95% Limits
Period Forecast Lower Upper Actual
53 312.006 300.485 323.527
54 315.564 296.014 335.114
55 317.472 291.314 343.630
56 318.768 287.033 350.502
57 319.837 283.261 356.412
58 320.821 279.938 361.704
Minitab ARIMA Output (continued)
In ARIMA under Forecasts select the Lead time (Forecast Period) and the columns to store the forecast values and the 95% upper and lower confidence interval values.
Check the forecast for reasonableness by plotting the forecast values appended to the historical Y values.

7-*

Bayesian Information Criterion – BIC
The BIC is a statistic that indicates whether it is better to add a new variable to an ARIMA model (another AR or MA term).

Compare model BIC values and select the model with the lowest value that also has a low residual MS.

BIC is calculated by:
First, take the natural log of (residual Sum of Squares / number of observations), from the Minitab ARIMA output.
Add to that (natural log of the number of observations / number of observations) multiplied by the number of ARIMA parameters (constant, AR and MA terms or coefficients).
BIC penalizes the model for having many parameters.
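Applying that recipe to the IBM ARIMA(1,1,0) output above (a sketch; the parameter count k = 2 covers the AR 1 coefficient and the constant):

    import math

    sse, n, k = 1692.42, 51, 2   # residual SS, observations after differencing, parameters
    bic = math.log(sse / n) + k * math.log(n) / n
    print(round(bic, 3))         # about 3.656; compare with competing models' BICs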


7-*

The Principle of Parsimony
Prefer simple models over complex models – given similar forecasting performance.

Try to start your analysis with simple model forms – never start with complex models with several orders of AR and/or MA.

Do not begin your analysis with an ARMA model just to cover the bases.

The ARIMA process may introduce multicollinearity in the model results, which will give misleading significance tests and result in very sensitive coefficients.

Remember the simpler the better when it comes to ARIMA.

7-*

ARIMA Disadvantages:
A large amount of data is required – at least 40 observations for nonseasonal data and 70 or more for seasonal data.
There is no way to update parameters as new data become available – you will need to evaluate the data again to produce a new model.
Cost is high with ARIMA due to large data and data-manipulation requirements.
Forecasts are entirely based on historical data characteristics (cause and effect are autoregressive characteristics).
There is no causation from other factors.

7-*

ARIMA Advantages:

It usually provides a very accurate forecast especially in the short term.
It is highly flexible in dealing with any time series data.
The ARIMA process includes many tests of model adequacy.
Forecasts are derived directly from model parameters – the process is replicable.

In applying ARIMA techniques to business forecasting carefully consider the costs versus the benefits – benefits from improved forecast accuracy versus the cost of data collection and manipulation.

7-*

Models of a nonstationary series are called ARIMA models, and the data must be made stationary before the appropriate model is identified and run.

The existence of trend can be checked using autocorrelation coefficients or correlograms. To be stationary the r values for each lagged period must lie within t-value boundaries (statistically insignificant).

If differencing is used the number of times differences are taken to achieve a stationary series is (d).

ARIMA models are noted by (p,d,q) that note the order of autoregression, differencing, and moving averaging, respectively.

You will get a coefficient for each p, q order that you have in the ARIMA model.
e.g. If p =2, you will get Yt-1 and Yt-2 coefficients. Similarly if you have a q=1 you will get an εt-1 coefficient.
Model Identification Review

7-*

7-*

7-*

What is the ARIMA menu for this? [figure omitted]

7-*

What is the ARIMA menu for this? [figure omitted]

7-*

7-*

Checking the Model with the Ljung-Box Q Statistic

Q is an overall check of the model relative to the size of the residual autocorrelations as a group, which imply residual randomness. The null hypothesis in this case is H0: autocorrelation between residuals = 0. Note that you want to fail to reject the Q null hypothesis to support your ARIMA model performance.

Compare the P value associated with Q. It should be above .05 to be in the acceptable range (this is the opposite of the P value rule for coefficient t-statistics).

Or look up the Chi-Square table value and note the degrees of freedom as m-p-q
Where:
m = number of time lags tested (many times the default of 12 is used)
p = number of Autoregressive Parameters (AR coefficients)
q = number of Moving Average Parameters (MA coefficients)

If seasonality is considered in the ARIMA model then the Chi-Square df = m-p-q-P-Q.

The calculated Q from the ARIMA analysis should be less than the Chi-Square table value for the residuals to be considered random and the ARIMA model efficient.
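Written out in its standard textbook form, the statistic can also be computed by hand (a sketch):

    def ljung_box_q(r, n):
        """Q for residual autocorrelations r = [r1, ..., rm] from n observations;
        compare with the Chi-Square table value at m - p - q degrees of freedom."""
        return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))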

7-*

The Box-Jenkins Process Steps

1. Identify the tentative model – decide which of the categories listed above is appropriate:
(a) Analyze the autocorrelations.
(b) Difference the data for seasonality and/or trend to make it stationary, if required.
(c) Analyze the autocorrelations and partial autocorrelations of the (differenced) data.
(d) Determine the appropriate type of model for the specific situation by matching the observed autocorrelations (q) and partial autocorrelations (p) to the theoretical autocorrelation and partial autocorrelation profiles for each of the possible models.

2. Set up the model with the time series data, (p,d,q) or (p,d,q)(P,D,Q), in the Box-Jenkins software. If differenced data are used, uncheck the “include constant term in model” box. See pages 478-479.

3. Estimate the parameters of the model. This is similar to estimating the parameters in regression analysis; the most common method uses nonlinear least squares estimation, and hypothesis tests still hold.

4. Check the model by testing whether the estimated model conforms to the specifications of a stationary univariate process, in that the residuals are independent of each other and constant in mean and variance over time. The model checking procedure involves both:
(a) plotting the autocorrelations of the residuals (within t limits, with a random pattern), and
(b) testing the hypothesis of residual randomness with the Ljung-Box Q statistic.

5. Test coefficients for significance (t-test).

6. Adjust the model if necessary and repeat from step 2.

7. Apply the model to forecast and check RMSE and MAPE.

7-*

5 Critical ARIMA Checks

1. The significance of the coefficients (remove coefficients that are not significant – low t values and p values above .05).
2. The residual MS (select the model with the lowest residual MS).
3. The Chi-Square (select the model with nonsignificant LBQ values – p values above .05).
4. The residuals (select the model with normally distributed and random residuals in the histogram and residual normality plot).
5. The error measures (select the model with the lowest RMSE and MAPE for Fit).

7-*

5. a. MA(2)
   b. AR(1)
   c. ARIMA(1,0,1)

6. a. Model is not adequate.
   b. Q = 44.3, df = 11, α = .05. Reject H0 if χ² > 19.675.
Since Q = 44.3 > 19.675, reject H0 and conclude the model is not adequate. Also, there is a significant residual autocorrelation at lag 2. Add an MA term to the model at lag 2 and fit an ARIMA(1,1,2) model.
Chapter 9 Assignment 14

7-*

7. a. Autocorrelations of the original series fail to die out, suggesting that demand is nonstationary. Autocorrelations for first differences of demand do die out (cut off relative to standard error limits), suggesting the series of first differences is stationary. Low-lag autocorrelations of the series of second differences increase in magnitude, suggesting second differencing is too much.
If an ARIMA model is fit to the demand data, the autocorrelations and plots of the original series and the series of first differences, suggest an ARIMA(0,1,1) model with a constant term might be good starting point. The first order moving average term is suggested by the significant autocorrelation at lag 1 for the first differenced series.

7-*

b. The Minitab output from fitting an ARIMA(0,1,1) model with a constant is shown below. [Output omitted in this text version.]

The least squares estimate of the constant term, .7127, is virtually the same as the least squares slope coefficient in the straight line fit shown in part a. Also, the first order moving average coefficient is essentially 1. These two results are consistent with a straight line time trend regression model for the original data.
7. (continued)

7-*

c. Prediction equations for period 53.
Straight line model: Ŷ53 = 19.97 + .71(53)
ARIMA model: Ŷ53 = Y52 + .71 – 1.00ê52

d. The forecasts for the next four periods from forecast origin t = 52 for the ARIMA model follow. [Forecast table omitted in this text version.]

These forecasts are essentially the same as the forecasts obtained by extrapolating the fitted straight line in part a.
7. (continued)

7-*

8. Since the autocorrelation coefficients drop off after one time lag and the partial autocorrelation coefficients trail off, an MA(1) model should be adequate. The best model is

Ŷt = 56.1853 – (-0.7064)êt-1

The forecast for period 127 is

Ŷ127 = 56.1853 + 0.7064(ê126)

Ŷ127 = 56.1853 + 0.7064(-5.4) = 52.37

The critical 5% chi-square value for 10 df is 18.31. Since the calculated chi-square Q for the residual autocorrelations equals 7.4, the model is deemed adequate.
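The critical value quoted here can be verified with scipy (a sketch, assuming scipy is installed):

    from scipy.stats import chi2

    print(round(chi2.ppf(0.95, df=10), 2))   # 18.31, the 5% critical value for 10 df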

7-*

8. (continued)

Autocorrelation Function for Yt: a single large spike at lag 1 (r = 0.39, t = 4.33, LBQ = 19.20); the remaining autocorrelations are small. [Plot and data table omitted in this text version.]

7-*

8. (continued)

Partial Autocorrelation Function for Yt: the PACs alternate in sign and trail off (lag 1 = 0.39, lag 2 = -0.27, lag 3 = 0.26, lag 4 = -0.20). [Plot and data table omitted in this text version.]

7-*

8. (continued)

ARIMA model for Yt
Final Estimates of Parameters
Type Coef StDev T
MA 1 -0.7064 0.0638 -11.07
Constant 56.1853 0.5951 94.42
Mean 56.1853 0.5951
Number of observations: 126
Residuals: SS = 1910.10 (backforecasts excluded)
MS = 15.40 DF = 124
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag 12 24 36 48
Chi-Square 7.4(DF=10) 36.4(DF=22) 64.8(DF=34) 80.5(DF=46)
95 Percent Limits
Period Forecast Lower Upper
127 52.3696 44.6754 60.0637
128 56.1853 46.7651 65.6054
129 56.1853 46.7651 65.6054

7-*

9. Since the autocorrelation coefficients trail off and the partial autocorrelation coefficients cut off after one time lag, an AR(1) model should be adequate. The best model is

Ŷt = 109.628 – 0.9377Yt-1

The forecast for period 81 is

Ŷ81 = 109.628 – 0.9377Y80

Ŷ81 = 109.628 – 0.9377(85) = 29.92

7-*

9. (continued)

Autocorrelation Function for Yt: the autocorrelations alternate in sign and trail off slowly (lag 1 = -0.88, lag 2 = 0.80, lag 3 = -0.66, lag 4 = 0.59). [Plot and data table omitted in this text version.]

7-*

9. (continued)

Partial Autocorrelation Function for Yt: a single large spike at lag 1 (PAC = -0.88, t = -7.86); the PACs cut off after lag 1. [Plot and data table omitted in this text version.]

7-*

9. (continued)

ARIMA model for Yt
Final Estimates of Parameters
Type Coef StDev T
AR 1 -0.9377 0.0489 -19.17
Constant 109.628 0.611 179.57
Mean 56.5763 0.3151
Number of observations: 80
Residuals: SS = 2325.19 (backforecasts excluded)
MS = 29.81 DF = 78
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag 12 24 36 48
Chi-Square 24.8(DF=10) 39.4(DF=22) 74.0(DF=34) 83.9(DF=46)
95 Percent Limits
Period Forecast Lower Upper
81 29.9234 19.2199 40.6269
82 81.5688 66.8957 96.2419
83 33.1408 15.7088 50.5728

7-*

9. (continued)

The critical 5% chi-square value for 10 df is 18.31. Since the calculated chi-square Q for the residual autocorrelations equals 24.8, the model is deemed inadequate. An examination of the individual residual autocorrelations suggests it might be possible to improve the model by adding an MA term at lag 2.


New folder/POWERPOINTS/Chapters 1, 2 and 3 revised.ppt

1-*

Welcome to Eco 309
Economic Forecasting

1-*

Course Objectives

1. Be able to calculate and interpret basic sample statistics.
2. Be able to determine the components of a time series.
3. Distinguish between stationary, nonstationary, and random data.
4. Understand and distinguish between different types of smoothing.
5. Understand the concept of decomposition.
6. Be proficient in using at least one software package to forecast.
7. Understand and interpret simple and multiple regression analysis.
8. Be able to identify violations of regression assumptions.
9. Understand and run ARIMA models.
10. Be able to conduct residual diagnostics.
11. Be able to present forecast methods and results.

This course is part of the business core.

1-*

What Good is This Course?
It is needed to graduate with a business degree.
(The immediate need)
It is critical for management to determine the quality and value of forecasts handed to you.
(You will be responsible for applying resources and achieving it.)
It is critical for people in sales, finance and accounting to provide insight into the future of the business.
(Executive management counts on you to either produce or buy off on it)
It is an important element in any business plan to convince investors that you know what you are doing.
(Your reputation and ability to attract investment dollars depend on it)
This is one course that will give you information that you will use.
———————————-
Your challenge is to get the information – by doing the work.

1-*

What is your responsibility?
You must gain a working knowledge of four quantitative forecasting techniques using Minitab.
Exponential smoothing
Time Series Decomposition
ARIMA (Box Jenkins)
Multiple Regression

You will need to demonstrate your knowledge to me by earning points by completing the following tasks:

3 Tests
A class project proposal
A class project (formal paper)
Attending live online classes when possible

1-*

This is a Business Class
All work turned in to me must be read by you as well as spell- and grammar-checked. The work must be in Word format.
All work must be turned in on time—late homework, tests or class projects will not be accepted.
Read your submitted material as if it were a business report – make sure that you are clear
Use the correct terminology
Do not use “I feel” or “I believe” — answer questions with a direct statement.
Do not use possessive terms – for example “my data, your mean value, my graph”
Back up your work with demonstrations – graphs, tables, statistics.
Never leave a graph, table or statistic without a comment on what it shows – never “strand” graphs or tables.
Always try to answer any homework or test questions – a zero hurts.
Do not quote from sections of the book or any other document– use your own words.
Do not provide a tutorial in the project or exam – simply answer the question.

1-*

Do not do work that is not relevant to the problem.
About 80 percent of the students that take this course do work that is not requested – a waste of time – and most of them do not make excellent grades.

1-*

The Importance of Forecasting

You were likely brought into this world with a forecast.
Your bank balance depends on your forecast or estimate of personal spending and income.
Your decision to go on a date is based on a forecast of the outcome.
Many critical business decisions depend on how accurately a single number can be forecast.
The success of any company depends on how well it estimates next period sales.
– The number of cars that Chrysler produces depends on the estimated sales of its cars.
– The quantity of products that your local grocery store will purchase and stock depends on an estimate of product sales.

1-*

The Importance of Forecasting (Continued)

Forecasts are the way to actualize any market plan or business decision or plan.

The business forecast of the key driver (sales in units or dollars) provides a guide to all departmental actions and decisions.

Every day in business you deal with forecasts – either creating one, evaluating one or providing changes to one– regardless of where you are in the organization.

You must learn to tell the difference between a poor forecast and a good one.

1-*

What is a Plan in Business?
The foundation of any business plan is a forecast. That is, what you expect to happen in business as a result of either action or inaction.

This forms the basis of decision making – to proceed or not, to alter direction or not.

Whenever you see a plan, examine the forecast that it is based upon. As a result of this course you will be able to evaluate the business plan foundation – its forecast.

This will increase your value to the firm and improve the probability that you will recommend the right decision.

1-*

Find the Forecaster

Forecasters typically do not carry that label or title in an organization.

Key business driver forecasts are so important that the person with the responsibility is typically a VP in the organization

Strong leadership within a company is typically driven by the acceptance of the company direction shown by a single forecast.

Weaker leadership can be detected when the forecast function is “second guessed” or duplicated at various places in the organization.

Find the person responsible for key driver forecasts in your company and speak with them relative to the methods that they use.

1-*

Where do the Forecast Numbers Come From?
The questions are: where do the estimates come from, and are they reliable or accurate?

Source                Reliability/Accuracy
Thin air              ?
Experienced person    ?
Quantitative Model    ?

In business, accuracy has dollars-and-cents value – firms are willing to pay for accuracy.

1-*

Making Rational Business Decisions with Forecasts

In this course we will evaluate four quantitative methods of forecasting.
Why are quantitative methods important ?
They are replicable (you can reproduce the forecast given the same data)
They can be accurate – there are methods to determine forecast accuracy that you will learn – an experience substitute.
They move the forecast and the guidance of the firm from uncertainty (unknown probability) to risk (with known probability).
This means that in addition to acquiring new methods for evaluating data you will move to greater confidence in the evaluation.

1-*

Moving From Uncertainty To Low Risk
Observe a Data Series for Characteristics → Observe Data Relationships → Interpret Statistics Describing Relationships → Assign Probabilities to Statistics

[Diagram: this sequence moves you from Uncertainty to Low Risk.]

1-*

In this class you will deal with numbers and what they mean.
There are two types of numbers—
1. Those that carry information — typical business information (systematic information)
2. Those that don’t (random numbers)

If a number does not include any information it is called a random number. That is, it may take on almost any value (large, small, positive or negative) at any specific time. As a result, you cannot predict the next value it may take.

If a number is not random then it includes information that we want to use to predict the next values.

Do business people expect a zero outcome?

1-*

The Numbers Change — But Why?
As you move from number to number in a data series you notice that sometimes they increase and sometimes they decrease by various amounts (variation).

Why?

Some of the change can be viewed as random – that is, just like the flip of the coin some numbers change in magnitude and direction on pure chance. (random variation)

Some of the change can be associated with a specific cause – that is what the model will try to determine. It will tell you that there is a cause but will not tell you what caused it (non-random variation).

One model type – regression will try to determine the specific cause using your choice of causal independent (X) variables. (non random variation)

1-*

The information in business data is the firm speaking to you.
Businesses and economies do not know English – they speak through the information that they produce.

In this class you will learn how to read business and economic data and use the information contained in it to forecast future events.

The information is simple to read – businesses and economies want to grow over time – we read the information as trend (T).

Businesses sometimes have a tough time in an economic downturn – we read the information as cycle (C).
Businesses react to the time of the year – we read the information as seasonality (S).

1-*

What Is One of the Best Ways to Determine What the Business or Economy is Saying
A simple Time Series Plot (TSP) of the data will typically reveal the three elements (T, C and S)

Make sure that you plot each series independently – not on the same plot – to ensure the scale reveals any T, C and/or S.

Make sure that you get enough data observations to indicate T, C and/or S

Make sure that the plots are labeled correctly

Make sure that you accompany any plot with a comment about what is indicated by it.

1-*

What A Good Quantitative Model Will Do
Extract (Capture) T, C and S information from a data series

Use the information to provide estimates of future data values

Provide the degree of risk (providing also a degree of confidence) in the forecast
Provide error terms for determining error measures and other performance statistics.
What A Good Quantitative Model Will Not Do
It will not provide random estimates or random data components – we expect them to have a zero expected value over the long run.

1-*

Search for Randomness—

Where would a good forecaster like to find it?

1-*

Number Variation and Randomness
What are random numbers?

They can take on any value from –∞ to + ∞

They have a zero expected value (zero mean)

Numbers that cannot be forecast by definition

They have central tendency (to cluster about zero)

They are normally distributed about the zero mean (for large samples)

1-*

Number Variation and the Randomness Problem
Most business data are numbers that result from business causes mixed with random components.

The problem becomes one of extracting the business information (T, C and S) in data and leaving only the random components as a residual (leftover).

Remember, that the random components by definition cannot be forecast. (They have a zero expected value– so how can you forecast them?) – my experience with a random forecast.

The non random components of data (T, C and S) can be forecast with some degree of accuracy.

1-*

Data Series Distribution and the Randomness Problem – (moving from uncertainty to risk)
Statistics are derived from number series – they are descriptive values that indicate characteristics of a data series.

Important Concept: Even though the business data series itself may not be normally distributed (keep in mind that most business data is not normally distributed) – the statistics about the data series are normally distributed.

Therefore, there are tests to ensure that a statistic value (which is normally distributed – bell shaped histogram) is not a member of a random set of numbers (also normally distributed – bell shaped but zero mean)

We will discuss some critical statistical tests in this course.

1-*

Extracting Information From Numbers
Your job is to examine the numbers (that form a series) and extract as much information as possible from them.

You will use this information to derive a forecast of future values that the number is likely to take on.

The extraction process takes one of the four forms that we mentioned earlier — Exponential smoothing, time series decomposition, ARIMA and multiple regression.

Each of these methods uses the information and results in a model that can be used to
1) estimate past values and
2) forecast future values.

1-*

Each forecast method uses the information (data) and results in a model that can
1) estimate past values (estimates of the data that you provided to develop the model), called FITS
2) forecast future values (from the historical data that you provided), called the Forecast

Each forecast method uses the information (data) and its estimates (FITS) to produce error terms or Residuals:
Actual Data – FITS = Residuals

If the forecast model does not pick up T, C and S information well from the data then the information will reside in the Residuals.

Analysis of Residuals is an important concept in accurate and reliable forecasting.

Forecast Model Overview

1-*

Basic Approach to Quantitative Forecasting
Data (contains random and non-random elements) → Model (1 of four methods) → Estimates of the data used to build the model (non-random) → Comparison or Fit of the estimates to the original data → Forecast

Point of Logic:
If the estimates (non-random) are subtracted from the original data (random and non-random elements), the difference should be random.
This difference is called the residuals (or error terms):
Data – Estimates (or Fit values) = Residuals

1-*

Basic Approach to Quantitative Forecasting
Data (contains random and non-random elements) → Model (1 of four methods) → Estimates of the data used to build the model (non-random) → Comparison or Fit of the estimates to the original data (Fit Error) → Forecast → Comparison of the forecast to future data values (Forecast Error)

Data – Estimates (or Fit values) = Fit Residuals
New Data Values – Forecast = Forecast Residuals

2 Points of Forecast Model Evaluation: Fit and Accuracy

1-*

How a Forecast Model Basically Works
Each model is different in that it may handle certain information contained in data or it may not.

If the model cannot or is not capable of extracting and using the information in the data, it is ineffective.

If the model does not do a good job of extracting all or most of the information in the data, it is inefficient.

Data → Model → Estimates of the Data → Adjusted Model → Forecast of the Data

1-*

Two Basic Stages of Quantitative Forecasting

1. Model Development, Test and Development of Fit to Historical Time Series Data
In this stage you will use a quantitative method to develop a model that, in turn, produces estimates of the variable for the historical period.

The model fit can be evaluated using error measurements described in chapter 3 — note the error measures estimate model “Fit” to the data.
Model and Test

1-*

Two Basic Stages of Quantitative Forecasting
2. Implement the Forecast and Measure Future Error
Forecast
In this stage you will use the model to produce estimates of your forecast variable.

The forecast accuracy can be evaluated using the same error measurements used to determine model “Fit” — note the error measures now estimate forecast “Accuracy” to future observations.

1-*

A Common Thread Goes Through Both Stages
Measures of Forecast or Budget Quality
Error Measures
– They are measures of how close the Fit estimates and forecasts are to the actual data values
– Each measure has a specific focus on the type of data being forecast
– In your project you will use specific measures that best fit your needs and stay with them. See the error measures on page 88.

1-*

Eco 309 Project Objective
The project is to develop the best forecast and present quantitative forecast results.

For the project you must determine the best forecast method for a variable of interest (Y) for one year.

Forecast methods required are, 1) exponential smoothing (of your choice), 2) decomposition, 3) multiple regression (using 3 other variables), 4) ARIMA.

You will develop a model for each method using your data, excluding the last full (most recent) year of observations. This last year of excluded data is called the “hold out” period.

Each model will be used to produce error measures for “Fit”. In order to point to the best forecast model you will use error measures (pg. 88).

Each model will also be used to produce a forecast that covers the “hold out” period of one year. In order to point to the best forecast, use the error measures (pg. 88) to determine “Accuracy”.

Summarize your results in a table of “Fit” and “Accuracy” with error measures for each method and recommend the best model.
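A sketch of the hold-out split described above, assuming pandas and hypothetical file and column names:

    import pandas as pd

    y = pd.read_csv("revenue.csv")["revenue"]   # monthly Y series
    model_period = y[:-12]                      # data used to build each model (Fit)
    hold_out = y[-12:]                          # last year, reserved for Accuracy checks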

1-*

Economics 309
Business Forecasting
Project due – last day of class Tuesday,
Semester Project
 
The objective of the project is to simulate a problem you might face in your first job or business. Forecasting is an important process in almost all businesses and is an important skill to possess as an employee or as a business owner. You are strongly encouraged to spread the work throughout the semester.
 
Selecting a Topic for your project
You need to pick a weekly, monthly or quarterly data set (NOT seasonally adjusted).
It should be a topic of your interest.
No macroeconomic objective variables such as Money Supply, GDP, CPI, or national interest rates, etc.
Please remember that there is NOT one right way of preparing a paper.
Be precise, explain everything in detail and be structured.
Writing an abstract
After you pick your topic, write up a brief one-page abstract to help you focus your thoughts.
It should include: 
– Statement of your forecasting problem
– Information about and description of your data
– Description of your proposed models
(Your abstract should be turned in as a proposal to me in a few weeks)

1-*

Final Project Report Should Include:
Introduction to talk about why you selected your topic and data.
Executive summary of your findings that will include the conclusion (Which of the techniques works best and why?). You can add the results to your abstract and make the abstract into an executive summary.
Body of the paper titled from your forecast problem should include:
What did you do (ex: Winter’s method with these parameter values…)
Why did you do it?
What did you find? (Interpretation of your results)

Your work must address all four major quantitative forecasting techniques including:
(i) Smoothing (Chapters 2 & 3): You need to explain what, why, how, etc.
(ii) Time Series Decomposition (Chapter 6): You need to explain what, why, how, etc.
(iii) Regression (Chapters 4 & 5): You will need other variable(s) to help explain the variation in your variable of interest. You need to start looking for those explanatory variables!!
(iv) Box-Jenkins (Chapter 7): You will need to use autocorrelation analysis and differencing.
You must also address how you deal with seasonality, trend and business cycle effects for each forecast method. You must discuss the tests of statistical significance in this section and their influence on your acceptance or rejection of the respective forecast model results.
Conclusion should clearly identify the best estimation methodology and model and explain why it is the best. Make sure you include a summary analysis of your “hold out” period for each forecast method. This can be done by selecting the best performance measure(s) for the” hold out” period and present them for each forecast method used in a single table.
Appendix should include all calculations and supporting graphs. The Appendix should be done in MS Excel while the other sections of the project report should be in MS Word. Make sure that the source of your data is clearly identified in the Appendix.
Be detailed and complete in all your sections. This is a case of “more is better than less” as you explain to me what you have done. Grammar and spelling will be considered in your grade. Use spell check.

1-*

The Keys to Success for the Project and the Course

1. Learn the terms and when to use them.
2. Be logical and straightforward in your explanations of your findings
Do not just produce diagrams without explaining their significance

and what they demonstrate. I will count off for “stranded” graphs, tables and data.
Always show your data (tables), time series graphs of the data (Plots)

and note the sources of your data (References).
Always spell and grammar check your homework assignments and projects.
Never produce a graph that is scaled wrong. That is, all of the data points fall so closely on a line that they appear jumbled. A graph like this is useless.
Answer every test question. DO NOT leave any question unanswered.
Do not copy previous projects. We keep copies of them for reference by the data that was used even between the classes. You will fail if this occurs.
Attend the online class sessions. I need to know what you don’t know to explain concepts that you may be unsure of.

10. Turn the Homework Assignments in on time. NO late assignments will be accepted. We will review the homework assignments on line so turning in late assignments means nothing.
11. There will be no extra credit work – there is enough work scheduled. You miss an assignment or test and you miss a building block that weakens your chance of passing the course.

1-*

This is what You Must do for Your Project

There are some specific statistics that you must learn and when to apply them.

Most of the calculation work will be done by the software.

DON’T BE FOOLED by the software and don’t underestimate the time it will take you to do the work.

This course and the project are a collection of processes that must be done in a specific order and level of completeness.

If you don’t learn the procedures (what to do and when to do it) then you won’t know what you are doing and will not pass this course.

Learning these statistical definitions and procedures must be a high priority for you. Read, watch, listen, ask questions and do the work.

If you put this course off it will be impossible for you to catch up.

1-*

The Best Way to View Your Project
Assume that you want to go into business for yourself or improve your current business. That is you want to develop a business plan.
This could include starting a business, investing in business expansion including buying equipment or plant or hiring additional personnel.

Ask yourself what critical information would I need to know to achieve business success?

You will need the best quantitative forecast possible of the critical variable for one year in the future.

You will need to gather historical data on the critical information.

You will also need to gather other historical data on variables that “cause” or “help determine” your critical forecast variable.

Most business cases that you will present to bankers or other backers of your business depend on a reliable forecast of your critical variable. Make a convincing case that this is a great forecast of the variable.

1-*

[Diagram: a time series plot of Y vs. months – at least 72 monthly observations – split into a “Model Period” and a 12-observation “Hold Out” period; error measures give “Goodness of Fit” on the model period and “Forecast Accuracy” on the hold out.]

Your Forecast will be developed using the “Model Period” data for your variable(s). The model will produce “Fit” values that can be measured for error and “Forecast” values that can be measured for error (see the top of page 6).

Your Project Objective: get the best forecast possible given the 4 methods.

1-*

Approved Sources for your Project Data

Use data only from these sources for your projects
Business Data sources

www.rfe.org , www.economagic.com

Population Estimates and Economic Indicators at U.S. Bureau of the Census

http://www.census.gov/popest/estbygeo.html

http://www.census.gov/econ/index.html

U.S. Economic Data
http://research.stlouisfed.org/fred2/
Regional Economic Data from the Dallas Federal Reserve Bank

http://dallasfed.org/data/resources.html

Economic Data from the Bureau of Labor Statistics

http://bls.gov/data/

Texas employment and unemployment data from Tracer

http://www.tracer2.com/

Freelunch.com
Make sure that you cite the data source in any assignment or project in which the data is used or evaluated.

1-*

Project Data Issues – What to Avoid.

Do not forecast Macroeconomic variables – that is national PDI, GNP, GDP, CPI, PPI or other national accounting data or price indices. – too complex. Note that you may use these Macro variables as Independent variables.

Do not use components of a data series as a variable to explain that data series. That is, do not use data on car sales and data on truck sales to forecast vehicle sales – this creates model reliability problems discussed later.

3. Do not forecast interest rates – too discrete and are very difficult to model with a limited number of variables.

Keep in mind that you may use Macro variables as independent variables in your class projects but not as the objective or dependent variable.

What does that leave you? Sector information relative to a specific industry or product: revenues, incomes, taxes, units produced, household expenditures, etc.

1-*

Project Data Issues – What to Avoid

Do not try to make monthly or quarterly data out of annual data. The results will create false forecast results.

Do not use seasonally adjusted data (Seasonality removed). Many of the models we use will capture seasonality. What good is a seasonally adjusted forecast in business?

Inspect the data for missing values – missing information that reduces the capability of the forecast model.

Use time series data – not cross sectional data. That is data from sources captured over time (Monthly or Quarterly).

Use the data sources mentioned under Chapter 1 of your Course Home. Do not use proprietary data or data that is not publicly available.

1-*

Residuals and Error Measures
Recall that Actual Data – Forecast Data = Residuals (or Error Terms)

Each of the forecast Error Measures that you will use depend on the size of the residuals usually in absolute value.

The larger the residuals the larger the Error Measure. So you will likely choose the forecast with the smallest Error Measures (smallest residuals).

I want you to use at least two error measures to check every forecast and estimate of data in this course —

MAPE (Mean Absolute Percentage Error) and
RMSE (Root Mean Squared Error)

You may use other measures as well if you wish.
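A minimal sketch of the two required measures, computed from actual values and their fits or forecasts (Python, numpy assumed):

    import numpy as np

    def rmse(actual, fits):
        a, f = np.asarray(actual, dtype=float), np.asarray(fits, dtype=float)
        return float(np.sqrt(np.mean((a - f) ** 2)))

    def mape(actual, fits):
        a, f = np.asarray(actual, dtype=float), np.asarray(fits, dtype=float)
        return float(np.mean(np.abs((a - f) / a)) * 100)

    print(rmse([8, 6, 7], [6, 6, 8]), mape([8, 6, 7], [6, 6, 8]))  # 1.29 and 13.1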

1-*

The Goal of Forecast Modeling

Our goal is to create a model that captures as much information from the data series as possible, leaving only random variation that can’t be modeled.

How can we tell that we have achieved our goal?

You can tell by examining what is left after the model has done its work!

These remains are called “Residuals” or error terms.

They are calculated for each observation period by subtracting the model “Fits” from the actual data observation. Observation – Fit = Residual. For example if you estimated a number to have a value of 6 but it turned out to be 8 your residual will be 2.

We check the residual time series to determine if they have trend, cycle or seasonality.

If they don’t then only random variation is left and the model has done its work and it should produce a good quantitative forecast.

1-*

Two Basic Quantitative Model Types

Univariate Models – That use only the Y (dependent data)
Extract non random data elements and use these to forecast future values
All forecast methods except Regression.

Multivariate Models – that use X or independent data to estimate historical and future values of Y (the dependent variable)
Regression
In these models X variation is used to explain Y variation – X causes Y (causal model)

1-*

How to tell when you have the
Best Model Possible

Each forecast method has rules and checks relative to providing the best model

You need to calculate error measures (covered later in Chapter 3 page 88)

You need to apply the measures in two instances for each quantitative forecast method (“Fit” and Forecast “Accuracy”)

Your project conclusion should include the table below:

Forecast Method Fit Error Measures Accuracy Error Measures
Exponential Smoothing —— ——
Decomposition —— ——
Multiple Regression —— ——
ARIMA —— ——

Remember that there are several measures of error. You will apply at least 2 measures for each method for Fit and Accuracy.

Refer to your table results to recommend the best forecasting approach and forecast values.

1-*

Always consider the Costs vs Benefits of developing forecasts. This is why forecast method is important. Do not get too complex and caught up in the science of forecasting techniques or develop and implement methods that are too costly.

Remember, each model that you examine will have costs associated with it. These costs include data collection and preparation, model selection, model adjustment, fit and forecast evaluation as well as forecast reporting and method explanation.

Use the error measures as a key to determine if the model is producing an adequate forecast product. When you select your objective Y variable for your project keep in mind how accurate you wish the forecast to be. This will help you decide which model to use and at what cost.
Forecast Cost is Important

1-*

Two Basic Types of Forecasts
Note that both types can be evaluated using the same Error Measures. As a result, you can compare the accuracy of qualitative methods with quantitative methods.

1-*

Qualitative or Judgmental Forecasting – Chapter 10
-Sales Force Composite Forecasts:
Subjective and many times biased, but it can be better informed by exposing the forecaster to additional information that places the forecast in an informed context. Information includes company internal sales, production, promotion and customer service plans, external economic and market factors, and competitive assessment information used to develop information scripts for forecasters.
-Surveys of the Extent of the Market (Customers and General Population)
Surveys of buyers’ intentions (better for industrial users than the consumer market): how people feel about the product or market conditions, or their propensity to buy. The Index of Consumer Sentiment is an example.
-Jury of Executive Opinion
Forecasts from firm executives in different areas such as operations, finance, sales, etc., who meet to reach consensus about the forecast.
-Delphi Method: form a panel of informed participants (could be executives) and provide questionnaires to determine the forecast. Summarize the results and ask the panel to review these results and resubmit their revised views along with comments supporting their revisions. Go through iterations of this process until no significant changes result.

1-*

Qualitative or Judgmental Forecasting – Chapter 10

The qualitative methods below rely on participant experience (many years of observation).

-Sales Force Composite Forecasts

-Surveys of the Extent of the Market

-Jury of Executive Opinion

-Delphi Method

The question is do you want to wait until you have adequate experience to forecast? Do you know how much experience is required or at what level you need to be in the company to forecast? A way to get around this is by acquiring skills in quantitative forecast methods.

1-*

Quantitative Forecast Methods
(The Types of Quantitative Vehicles)
Naïve (tomorrow will be the same as today)
Moving Average (more recent averages are better than historical averages)
Exponential Smoothing*
Simple
Adaptive Response
Holt’s
Winter’s

Decomposition*
ARIMA*
Regression
Trend
Causal (Simple and Multiple Regression*)

* Methods that must be evaluated in your project.

1-*

Advantages of Qualitative Approaches

– Do not require forecasting training or background
– Subjectivity can capture complex forces
– Don’t require a lot of data
– Many times have low forecast development cost

Disadvantages of Qualitative Approaches

– How the forecast was reached is questionable – forecast is usually not replicable
– Not widely accepted by users (harder to convince users that it is the best forecast)
– Always biased to some degree
– Not consistently accurate over time
– Takes years of experience to convert judgment into forecast.
Do not exclude Qualitative Methods from your methodology – You can combine them with quantitative methods to obtain better forecasts.

1-*

Forecasting Issues
Subjective vs Quantitative (experience of responsible people)

Lean toward subjective techniques when: you are not trained in quantitative forecasting methods, the environment is complex, quick forecast turnaround is required, or relevant data (external and internal) are lacking.
Relevant Data

Many companies do not keep historical information due to data storage capacity and costs or the lack of capability to capture the data.
Relevant Forecasts

What is important to be forecasted, forecast units (U.S. vs other countries), Seasonal Patterns differ by global markets, Cultural Differences differ by regional markets.
Sales dollar forecasts (for Sales and Financial use) are not the same as unit forecasts (for Input and Operations Planning use).
Top Down vs Bottom Up Forecasts
Local market demand forecasts (bottom up) usually do not sum to total market forecasts (top down). Why? Bias. The sources need to reconcile the forecasts (my own example).

1-*

Forecasting Issues (continued)
Forecast and Purpose

Sales forecast and Quota Setting. Operations Input forecast vs Operations Cost. Why? – Bias
Forecast Timeframe

When it was produced and why. A sales forecast may not match the needs of financial reporting, etc. Quarterly forecasts do not supply the need for weekly or monthly forecasts.
How far in the future is the forecast required to be extended?
Forecast Costs

Sales cannot afford to forecast hundreds of local markets monthly. There needs to be cost/benefit judgment.
Forecast Accuracy

The degree of error in forecasts may be acceptable for Sales and Marketing use but not precise enough for input planning or engineering.
These are among the major issues. There are many others that will come to mind as we proceed.

1-*

Trends in Forecasting

Forecasts and the Supply Chain – forecasts of flow between suppliers, producers, distributors and consumer – For Cost Control and Customer Satisfaction –”Think Lean”
Collaborative forecasting (sharing information or forecasts between firms). Sometimes the information includes contractual information relative to buys or other sellers that can be deemed proprietary.
What are some of the advantages?

-Lower inventory and capacity buffers
-Fewer unplanned product runs and shipments
-Reduction of Stockouts
-Increased Customer Satisfaction and repeat business
-More Effective Sales Promotions (better preparations)
-Better Response to Market Changes

1-*

Trends in Forecasting (continued)

Collaborative Forecasting issues include sharing of confidential information, how to deal with non-participants, the added cost of gathering information and generating forecasts, and determining the reliability of these forecasts. WalMart, Target, Sears, Whirlpool and Goodyear are examples of collaborative forecasters.

Relational Data and Large Data Warehouses are now the norm for many corporations. No data is lost, it is gathered and retained and resides in large data repositories (warehouses or data marts) with the relationships intact. This enables the forecaster to create forecast threads in which a forecast for a business driver also influences the expected values of relational variables.

Data Mining and Analysis Tools linked to enterprise data are now relatively sophisticated and can provide information in a better form for business decision making. SAP and Oracle are leading the way with analysis tools such as ERP, CRM and SFA capabilities. These tools have GUIs that include data storage, retrieval and transformation capabilities that, in turn, reduce the need for a database management interface.

1-*

Data Evaluation
The basis for the quantitative approaches used in this course is the sample data to be forecast.

Careful selection of data is imperative – it must form the basis for your business plan and forecast. You need to ensure that it is accurate (from a reliable source), complete (no missing values), recent, and relevant to your plan.

1-*

A Population and a Sample
(One is a member of the other)
A population is a complete set of observations (data) about a variable.
A sample is a subset of the population – that is, drawn from it.
It may be sequential (e.g. a series of data generated over time)
It may be random (selected from a population without respect for the sequence)
Samples can tell us about a population but the question is – is it representative of the population?
Keep in mind that the data that you will use in your projects are samples where the entire population is unknown.

1-*

Data are not just Numbers

Things happen that cause numbers (data observations) to occur.
People become confused with data series and see just numbers.
People come to think that data observations (the numbers) are not related to each other.
In actuality they may be –just as one step in a staircase is related to another.

The questions are:
What is this relationship? (or what type?)
How strong is the relationship? (or correlation?)

1-*

Get to Know the Data
(Things you must write about in your project)
Know the Data Source – where it is from, when it was compiled and last updated. Note any missing observations.
Determine if it is “Time Series” data that spans a time period or “Cross Sectional” data that occurs at a single time.
Know the Data Periodicity – what time period does it span and what periods are represented (months, weeks, years, days, etc..)
Note the Data Range – the highest value minus the lowest value will give you the data range. The greater the range the more difficult the forecast.
Note the Data Mean – the average value and the Mode – most frequent value and the Median – middle value
Ensure that it is continuous – Note any missing observations. When you use data from a table make sure the columns and rows of data create a continuous series.
After copying the data make a plot to ensure you have all observations.
Ensure that your data plots are scaled appropriately. Flat lines or “piled observations” indicate that you need to rescale the data plot. Remember, the plots should be scaled to make data movements stand out.

1-*

Get to Know the Data
(Things you must write about in your project)
Basic statistics can reveal a lot about the forecast difficulties that you may face.

For example the variation of data about its mean indicates the forecast difficulty.

The mean relative to the standard deviation is a good indicator – if the standard deviation exceeds the mean value (for trended data, for example) the forecast model must process more systematic variation in the sample data. This indicates that the forecast may be more difficult than if the standard deviation is less than the mean.

Non linear data with high variation may be much more difficult to estimate (forecast) than more linear data.

Cyclical data is much more difficult to forecast than non cyclical data. Business cycles have a larger random component than non cyclical data.

1-*

Where Can you get these Statistics?
You can obtain statistics on your data series from Minitab

I will try to place all instructions for Minitab in green.

Go to Minitab/Stat (in the menu bar at the top)/Basic Statistics and choose either Display Descriptive Statistics or Store Descriptive Statistics.

You will find that mean and variance are of particular interest in determining the type of data series you have as well as the magnitude of information contained in the data series and the difficulty of the forecasting problem.
A lot of variation around the mean indicates that the model you select will need to extract a large amount of information from the data.

1-*

Basic Statistics and Data Series Description
Whenever you discuss a data series you should – comment on the size of the average value (mean)
Discuss the range of the data (highest and lowest value)
Describe the size of the standard deviation relative to the mean value.
Answer the question: does the mean minus three standard deviations become a negative value? In many cases negative business values do not hold – so the zero value becomes a floor.
Are there any outliers in the data series (beyond 3 standard deviations from the mean)?
What do you recommend doing with the outliers? Discard them or keep them? Do you have any explanation for them?

1-*

Fundamental Data Observations —
Basic Statistics of a Number Series
All number series distributions can be described by the mean and the standard deviation.
The degree of variation around the mean value determines how much information is carried in the observations.
Data series observations that vary far from the mean value are called “outliers” and may be the result of specific business condition causes or errors in data measurement or collection.
You need to determine whether you wish to include the outlier in your forecast model.
Examine the outlier from the perspective of standard deviation calculated from the data without the outlier.
If it is greater than 3 standard deviations from the mean value of the series you may want to either discard the observation or provide special treatment for it.
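
As a rough illustration of this rule, here is a minimal Python sketch (the data series and the flag_outliers helper are hypothetical) that flags observations more than 3 standard deviations from a mean and standard deviation computed with the candidate point left out:

```python
import numpy as np

def flag_outliers(y, k=3.0):
    """Flag observations more than k standard deviations from the mean,
    where mean and std are computed with the candidate point left out."""
    y = np.asarray(y, dtype=float)
    flags = []
    for i in range(len(y)):
        rest = np.delete(y, i)               # series without the candidate point
        z = (y[i] - rest.mean()) / rest.std(ddof=1)
        flags.append(abs(z) > k)
    return np.array(flags)

# Hypothetical series with one suspicious spike
sales = [102, 98, 105, 101, 99, 187, 103, 100]
print(flag_outliers(sales))  # the 187 observation should be flagged
```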

1-*

Which Series Will Be Easier To Forecast?

[Figure: two time series of Y plotted over months – one with small variation about its mean, one with large variation.]
Data with more variation is more difficult to forecast – look at the standard deviation relative to the mean of the data series for an estimate of the difficulty in forecasting a variable.

1-*

Fundamental Data Observations —
Linear versus Curvilinear Relationships

Data may have a linear relationship with another variable – in the previous case time.

It may have a non linear or curvilinear relationship with another variable as well.

It may have no relationship.

The challenge is to determine and comment on the type of relationship the data has with other variables – including time.

1-*

Linear vs Curvilinear Relationships.

1-*

What is Linearity?

How well observations of a variable fall on a straight line.
How to describe a straight line?

Y = a + bX where:
Y = the values of the variable
a = the intercept value
b = the slope value (the stair-step vertical rise)
X = the time index (the stair-step horizontal run)

Check out Minitab under Stat/Time Series/Trend Analysis to get the best linear fit for any data series. It will return a plot of the data and the linear fit along with the linear equation in the form above. The more trend, the higher the slope term.
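
If you want to reproduce this fit outside Minitab, here is a minimal Python sketch (the data values are hypothetical) that estimates a and b for Y = a + bX with a time-index X:

```python
import numpy as np

# Hypothetical quarterly revenue series
y = np.array([120.0, 125.0, 131.0, 138.0, 142.0, 149.0, 155.0, 161.0])
x = np.arange(1, len(y) + 1)        # time index 1, 2, 3, ..., n

b, a = np.polyfit(x, y, 1)          # degree-1 fit returns slope then intercept
print(f"Y = {a:.2f} + {b:.2f}X")    # the more trend, the larger b
```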

1-*

Fundamental Data Observations
With Time

1. If the data has long term trend then the slope term (b) will have either positive or negative values.
2. If the slope term (b) is zero then the data does not have trend.
3. Data with zero slope is termed “Stationary”.
4. The data observations will fluctuate around a single mean value.
5. Data that has a positive or negative slope term is termed “Nonstationary” or “Trended”.
Make sure to comment on this when writing about a data series.

1-*

When Data is “Stationary”
The mean of the data series is sometimes termed the “level” of the series. As more data are added the mean tends to become “Stationary”.
You can check this out in Minitab/Stat/Time Series/Trend Analysis and look for a slope term (b value) that is close to zero for a stationary series.

1-*

Fundamental Data Observations —
With Time Continued
(And things you must write about in your project continued)

Look for Trend in the data (shifts from below the mean to above the mean for a positive trend or the reverse for negative trend)

Look for Cycle in the data (roller coaster effect usually over quarters or years)

Look for Seasonality in the data (abrupt spikes and dips that occur the same months or quarters each year)

Do not tell me that annual data has seasonality

Determine if it has been Seasonally Adjusted or is raw unadjusted data. Note – you should use unadjusted data and adjust for seasonality yourself if required.

1-*

A Single Time Series Carries Information
(Additional things you must write about in your project)

If a series of numbers is increasing (positive trend) then each observation is related to a previous observation (as with steps on a staircase). We call this auto(self)correlation; this pattern is called Positive Trend.

If a series of numbers is decreasing (negative trend) then each observation is related to a previous observation (as with steps on a basement staircase). Another autocorrelation called Negative Trend.

If a series of numbers increases over a period of years then decreases in a predictable pattern (roller coaster) then each observation is related to a previous observation a year or more ago. Another autocorrelation called Cycle.

If a series of numbers increase and decrease depending on the same month year after year then each observation is related to a previous observation 12 months ago. Another autocorrelation called Seasonality.

When writing about a data series comment on each of these characteristics.

1-*

“Non Stationary” or “Trended” Data

Trended data either increases or decreases over time. It can be described with a level value (intercept) plus a slope value that can be either positive or negative.

Note that as more data observations are added, the mean of a trended data series increases or decreases. So a trended data series has either a rising or falling average value.

[Figure: trended series with the intercept (level) marked and the slope shown as rise over run.]

1-*

Cycle Effects – complete cycles take place over several years and the magnitude is difficult to predict.
Can you identify the length of the first cycle? How does this length differ from the second cycle?

1-*

Why As Business Majors are we interested in Cycles?
General business cycles affect interest rates, prices, demand, unemployment and income. A data series may have cycles that seem to be independent from overall general business cycles.
[Figure: cycle length measured trough to trough.]

1-*

Seasonality – Look for the short cycle repeating in the same fashion year after year (one cycle takes place every year)

Why as Business Majors are we interested in Seasonality?
In Minitab select Stat/Time Series/Time Series Plot to check for seasonality.

1-*

What can you tell from this data series?

1-*

Determine the Forecast Horizon

How much forecast do you need?
What does your business plan require?
What makes you comfortable with the forecast?
Very short term (one period)
Short term (one or two periods)
Medium term (one to four periods)
Long term (one + periods)
The selection of the Forecast Horizon depends on data availability, forecast method used and your business requirements

1-*


1-*

Fundamental Data Observations —
Basic Number Series Types

Numbers can take at least two forms depending on when they were captured.

Cross Sectional – all captured at the same time from different occurrences or places.

Time Series – all captured from the same place at different times (usually in time sequence – e.g. daily, weekly, monthly, quarterly or annually).

Note: Make sure that you are not fooled by the column format of numbers. Sometimes monthly or quarterly data are formatted in columns when they should be a single data series. For example, if you see four quarterly columns of numbers, this is not four separate data series but a single series with each quarterly number following in sequence.
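
As an illustration, here is a minimal pandas sketch (the table and its values are hypothetical) that flattens a year-by-quarter table into the single time series it actually represents:

```python
import pandas as pd

# Hypothetical table: one row per year, one column per quarter
table = pd.DataFrame(
    {"Q1": [100, 110], "Q2": [90, 98], "Q3": [85, 93], "Q4": [105, 115]},
    index=[2011, 2012],
)

# Read across each row so the quarters stay in calendar order
series = pd.Series(table.to_numpy().ravel(),
                   index=pd.period_range("2011Q1", periods=8, freq="Q"))
print(series)
```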

1-*

Time Series vs Cross Sectional Data

Note that you can use quantitative techniques to determine store performance by location at any point in time – a cross section.

1-*

Cross Sectional vs Time Series Data
(One point in time vs several)

[Figure: a plot of monthly sales for four stores in 2009, January through December.]
It answers the question as to which store is performing better in May.

1-*

Time Series Data

You are looking for variation over time of specific data –sales.

1-*

Data Plots are Very Revealing
(Always Plot Your Time Series Data)
Data versus Time (Time Series Plot) – shows the data observation relationship over time. In Minitab select Stat/Time Series/Time Series Plot.

Data versus Data (Scatter Plot) – shows the data observation relationship with other data observations. In Minitab select Graph/Scatter Plot/Simple and select Y and X data series.

1-*

What can you tell from looking at the data?
Always plot the time series data to examine it for trend. Confirm trend with Minitab trend analysis and observe the size and sign of the slope term.

1-*

When you collect data in a time series you are actually collecting two data series at once

— The objective data observations

— The time series (time)

Since you will be dealing with time series data in your projects you will always have a fourth independent variable – Time.

Y the objective variable
X1, X2 and X3 the independent variables
Time – an independent variable that you can include among your independent variables by using a counting sequence (e.g. 1, 2, 3, 4, … n).
Time as a Variable

1-*

Time Series Data Notation

[Timeline figure: n past data observations nt-n, …, nt-2, nt-1 lead up to nt (now), with future values nt+1, nt+2, …, nt+n ahead.]

Typically we want to use the n observations of data through nt to forecast the values nt+1, nt+2, etc.
The number of periods we want to forecast is termed the “Forecast Horizon”.

Remember that forecast model error measures can apply to past data values as well as future data values.

1-*

The Uncertain Future
There is a gambling parallel to statistics and forecasting

For any future outcome there is a probability that can be assigned if we make a few basic assumptions.
1. One assumption is “central tendency”; that is, outcomes will tend to cluster about some central value or mean.
2. Another assumption is “normality”; that is, the frequency of observations about the mean will assume a normal distribution.
3. If the data are normally distributed then the distribution of observations has characteristics of variance and standard deviation about the mean.
4. We must also test the probability that the sample data is representative of the population data.
5. We will need the mean, standard deviation (or variance), as well as the number of sample observations to determine this.
6. Another, weaker assumption is “linearity”; that is,
a) the relationship with other variables may be linear (correlation) and
b) the relationship between observations of the same variable may be linear.
7. Since this last assumption is weak, we must test to see if it is true.

1-*

You must know that data relationships depend on probabilities.

Significance (short for Statistical Significance) and Confidence are ways of expressing probabilities.

For example, if you have a sample of a population of numbers, what is the probability that the sample is representative of the population?

A descriptive statistic can be a way of describing a sample of a population and also has a probability associated with it.

In some cases you will need to calculate the statistic and then determine the probability that it applies to the population.

That is, there is generally a probability associated with any statistic describing a data series.

We typically decide the level of probability (Confidence or Significance) we will accept for the statistic and then test to see if it meets the level.
Why Know About Probabilities and Tests of Significance

1-*

Probability Tests That You Must Become Familiar With

For example you can test the probability of;

A sample mean representing a population mean,

The correlation of two variables from two samples representing the respective variable population correlations,

The correlation between previous observations of the same variable representing the population of that variable,

The coefficient of a model derived from a sample representing the population relationship of a variable,

An entire model derived from a sample representing the relationship of the population of all variables included in the model.

1-*

Sample vs the Population of a Variable

In most cases the entire data series for a variable (the population) cannot be observed.
Continuous variables have an infinite number of values.
Discrete variables may have past or future values that are not revealed.
You will typically use sample data to develop a forecast.
The question is how representative is the sample to the total population?
We begin by describing the population with parameters that have sample counterparts.

1-*

Statistics That Describe a Data Series

Number of Observations or n
Range (highest value minus the lowest value)
Mean or Ȳ or X̄ (arithmetic average of data values) pg. 15
Median (middle data value)
Mode (most frequent or repeated data value)
Standard Deviation or S (unit of measure of distance from the mean)
Variance or S² (squared measure of dispersion about the mean)
Recall that you can obtain these statistics from Minitab/Stat/Basic Statistics.

1-*

Population Parameters and Sample Statistics
μ = The population mean
σ = The population standard deviation
X̄ or Ȳ = The sample mean
S = The sample standard deviation
These are used to describe any normal distribution.

1-*

Standard Error is the standard deviation divided by the square root of the number of observations (√n).
Measures of Data Fluctuation About the Mean

1-*


Moving from Uncertainty to Risk
(Hypothesis Testing)

How to become more confident in your forecast model and the forecast itself.

How much risk does a good business person take?
Risk is reflected in “confidence”
A good business person needs to be 95% confident in the outcome, or willing to take a maximum of .05 (5%) risk relative to the outcome.

1-*

How Are The Data Observations Distributed?

For example the ages of the people in this class – there is a distribution of these ages about a mean value.
This distribution can be shown in a histogram (see Minitab/Graph/Histogram and enter the data series)

[Histogram: number of students by age (18–19 through 26+), clustered about the mean age.]
This distribution approaches a normal distribution about the mean age of students. It also describes the probability that anyone drawn from this class will be of a specific age or age range.

1-*

Data and Central Tendency

Data have Central Tendency and the result is a distribution around the central value (mean)

Even random variables have central tendency around a mean value of zero.

A population has a mean value and a distribution around it characterized by its variance (or standard deviation) —
A normal distribution

The question is: if you have a sample, does it belong to the population?

You form a hypothesis that it is a member and test it.

1-*


How can You Tell if Observations Are Part of a Population?

Statisticians have calculated the probability (Z) values of a sample being a member of a normally distributed portion of the population (by comparing sample and population statistics).

Calculate the Z value for each sample to answer the question.

The Z value corresponds to the probability that the sample came from the population – the higher the probability, the more sure (or confident) we are that the hypothesis is true.

Note that Z values are expressed as the number of standard deviations from the population mean.

1-*

Transforming the Distribution

In order to analyze the probability of a sample, we must transform the population sample distribution into a “standard normal” distribution.

Calculate the Z value from the sample statistic, the sample size and the population standard deviation. Calculate this to two decimal places.

Then find the probability of the sample occurrence taking place in the population by looking up the Z value in the Z table. If the test concerns only one direction (+ or –) use a one tailed Z value. If it concerns both directions (+ and –) use a two tailed Z value.

From the perspective of the standard normal distribution the sample parameter value has a probability associated with it relative to the population. The task is to define this probability.

1-*


Example of Probability

What is the probability that the mean of a random sample of 100 weights drawn from a population is within 2 pounds of the true population mean weight? The standard deviation of the population is estimated to be 15 pounds.

Standard error is σ/√n = 15/√100 = 1.5. Then Z = (X̄ – μ)/(σ/√n) = 2/1.5 = 1.33.

This gives a table Z value of .4082. Since this is a two tailed test we double the Z probability to .8164. Therefore the chances that the sample mean will lie within 2 pounds of the true population mean are about 82%.
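
A quick cross-check of this arithmetic in Python (scipy is assumed to be available):

```python
from scipy.stats import norm

se = 15 / 100 ** 0.5          # standard error = sigma / sqrt(n) = 1.5
z = 2 / se                    # Z for a 2-pound margin

# Two-tailed probability that the sample mean lies within 2 pounds of mu
p = norm.cdf(z) - norm.cdf(-z)
print(f"Z = {z:.2f}, P = {p:.4f}")   # about 0.8176; the slide's .8164 rounds Z to 1.33
```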

1-*

When you obtain a statistic which family or sample distribution does it belong to?
[Figure: a value of 5 shown against four normal distributions with means 0, 35, –5, and 4.]
How can you tell? You can assume the distributions are normal and there is a specific probability of the occurrence of a 5.

1-*

Inference and Hypothesis Testing

There is a difference in determining the “inference” that a sample statistic is a good estimate of a population parameter (e.g. mean of the sample vs mean of the population) and testing a hypothesis.

We will use the Student’s t-test as a method of testing hypotheses in this course.

We will test the basic null or “no change” hypothesis versus a hypothesis that you assert. We will use the t-test to determine if your alternative hypothesis has a high probability of being true versus the null hypothesis.

1-*

In some cases we want to determine which distribution it is not likely a member of.
[Figure: the same value of 5 against normal distributions with means 0, 35, –5, and 4 – which distribution is it unlikely to belong to?]
Again, you can assume the distributions are normal and there is a specific probability of the occurrence of a 5.

1-*

The Forecaster’s Challenge –
Proving Statistical Significance

When forecasters examine data relationships and forecast model coefficients we want to know if the data correlations and model coefficients are statistically different from zero.

If they are not then no relationship exists between the data or the model coefficients do not have any forecasting reliability.

What we need is a point of demarcation between an acceptable probability of occurrence and an unacceptable probability. We call this point of demarcation the critical value of either a Z or Student’s t test.

We typically apply a t test to the population sample.

1-*

The Forecaster’s Challenge –
Proving Statistical Significance

In forecasting we take a very conservative approach to proving statistical significance.

That is we challenge the forecaster to prove that the data relationships or model coefficients are a member of a distribution that has an expected value (mean) of zero.

The forecaster must then disprove the null or “no change” hypothesis where Ho: statistic = 0 with an alternative hypothesis H1: statistic ≠ 0.

The t-test is used to prove that the alternative hypothesis is true.


1-*

A Version of the Normal Distribution Used to Test Hypotheses – the t Test

In this test we use the opposite logic used in Z analysis.
This involves the decision between the Null Hypothesis and the Alternative Hypothesis
The hypothesis tested is called the “no change” or “null” hypothesis.
If you are testing a theory then your hypothesis is the “alternative hypothesis” or change hypothesis.
We want to test the probability that a null hypothesis (or proposition) is not true.
That is we want to “reject” the null hypothesis with a probability value that falls outside of the null distribution.
Therefore the rejection region sought after is in the tails of the t distribution.
Our test will compare a t-calc (one that either you or the software calculates) and compare it to the t-table value.
We want the t-calc value to be greater than the t-table value for our proposition (alternative hypothesis) to be accepted as “statistically significant”.
You need to know the “degrees of freedom” or observations (n) and the level of statistical significance you will test (%).

1-*

This is one of the times in your life that you want to experience rejection

Remember you want to reject the null or no change hypothesis (Ho)

You want t values calculated from your data that are greater in absolute terms than the t-table values.

This means that your model or data relationship hypothesis is statistically true.

Then you can say that your data relationship or model coefficients are statistically significant.

1-*

Note that the t-table values depend on the sample degrees of freedom and the level of confidence (or significance) that you want.

1-*

The t-Test
For statistical significance of critical statistics
1. State the hypotheses
Null Ho: sample statistic = pop. number
Alternative H1: sample statistic ≠ pop. number
2. Determine if the test is a one or two tailed
3. Determine the confidence level (e.g.95%)
4. Look up t-table value by sample degrees of freedom page 526
5. Calculate the sample t-value:
t = (sample statistic – pop. number) / (sample std dev / √sample n)
6. Compare t-table to t-calc (e.g. t-calc>t-table then reject null)
7. State whether you accept or reject the null hypothesis

1-*

SAT Score Example
Given a random sample of 15 students the mean (X) of the sample is 475 and the standard deviation (S) is 35. The national average for the SAT Score is 500. The investigator wants to know if there is a trend for lower test score based on the sample. Assume the investigator wants to be 95% confident is his assertion.

1-*

SAT Score Example (continued)

First set up the hypotheses:
H1: μ < 500    Ho: μ ≥ 500

df = n – 1 = 15 – 1 = 14; alpha (significance) = .05
Looking at the t-table for 14 df and alpha = .05 gives a t-table value of 1.761.
t-calc = (475 – 500)/(35/√15) = –2.77

The absolute value of t-calc is greater than the absolute value of t-table — we reject Ho and accept H1 as true.
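
The same test in Python from the summary statistics (scipy assumed):

```python
from math import sqrt
from scipy.stats import t

n, xbar, s, mu0 = 15, 475, 35, 500

t_calc = (xbar - mu0) / (s / sqrt(n))        # -2.77
t_table = t.ppf(0.95, df=n - 1)              # one-tailed critical value, 1.761

# Reject Ho when |t-calc| exceeds the critical value
print(f"t-calc = {t_calc:.2f}, t-table = {t_table:.3f}")
print("reject Ho" if abs(t_calc) > t_table else "do not reject Ho")
```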

1-*

In business we prefer a high degree of confidence in our probability assessments. This is expressed in two ways

Confidence – for example being 95% confident that the null hypothesis is rejected is the same as
Significance (statistical term) – the hypothesis is outside of the 5% level of significance. Significance = 1 – Confidence
This is a two tailed test at a 95% confidence level or 5% significance level. Typically we call the significance level “α” or “alpha”.
T-Values and Hypothesis Testing
[Figure: two-tailed t distribution with a rejection region in each tail.]

1-*

The Importance of Large Sample Size
A smaller interval at a given confidence level will result in greater area to reject the null hypothesis. That is, as n increases the “rejection region” increases.

1-*

Hypothesis Testing – Replacing Uncertainty with Probability
Step 1. Formulate the hypothesis being tested (called the Null Hypothesis Ho) and state the alternative hypothesis (the one concluded if Ho is rejected (symbol H1).

Step 2. Collect a sample of items from the population, measure them and compute the appropriate sample test statistic.

Step 3. Assume the null hypothesis is true and determine the sampling distribution of the test statistic. (t-table)

Step 4. Compute the probability that a value of the sample statistic at least as large as the observed could have been drawn from this sampling distribution. (t-calc).

Step 5. If the probability is high do not reject the null hypothesis. (t-table is greater than absolute t-calc) If the probability is low, the null hypothesis is discredited and can be rejected (t-table is less than absolute t-calc) with small chance of error (Type 1 error).

1-*

Do not be guilty of Type 1 error by not testing. Formulate and test your hypotheses.
Statistical Significance Error Types

1-*

To Determine the Right T-Test Look At the
Alternative Hypothesis Sign
“Inequality” sign indicates that a 2 tailed test is required

1-*

“Less than” sign indicates a one tailed test to the left (negative t value)
To Determine the Right T-Test Look At the
Alternative Hypothesis Sign

1-*

“Greater than” sign indicates a one tailed t-test to the right (positive t value)
To Determine the Right T-Test Look At the
Alternative Hypothesis Sign

1-*

Statistical Confirmation of Significant Linear Relationships Between Data

Can you develop a hypothesis for testing the relationship between each of your X variables with Y?
Is the hypothesis one or two tailed?
What is the test of each hypothesis?

1-*

Sales Example
From data on a large sample of sales transactions, a small business owner reports that a 95% confidence interval for the mean profit per transaction, µ, is (23.41, 102.59). Use these data to determine the following, and assume the standard error of the sample mean is 20.2.

1) A point estimate (best guess) of the mean µ and its 95% error margin.

2) A 90% confidence interval for the mean µ. (transform the 95% confidence interval to 90%)

1-*

1) A point estimate (best guess) of the mean µ and its 95% error margin.
Point estimate: X̄ = (23.41 + 102.59)/2 = 63
95% error margin: (102.59 – 23.41)/2 = 39.59

2) A 90% confidence interval for the mean µ. (Transform the 95% confidence interval to 90%.)
1 – α = .90 gives Z = 1.645; X̄ = 63;
S/√n = 39.59/1.96 = 20.2
The interval about the mean X̄ is
X̄ ± 1.645 (S/√n) = 63 ± 1.645 (20.2)
= 63 ± 33.23, or an interval of (29.77 to 96.23).
Note that this is a tighter interval than the 95% confidence interval above.
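
The same transformation in Python (scipy assumed); the only inputs are the interval endpoints from the example:

```python
from scipy.stats import norm

lo, hi = 23.41, 102.59               # reported 95% confidence interval
xbar = (lo + hi) / 2                 # point estimate, 63
se = (hi - lo) / 2 / norm.ppf(0.975) # margin / 1.96 = 20.2

z90 = norm.ppf(0.95)                 # 1.645 for a 90% interval
print(f"mean = {xbar:.2f}, SE = {se:.2f}")
print(f"90% CI = ({xbar - z90 * se:.2f}, {xbar + z90 * se:.2f})")
```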

1-*

Another quick check of the Significance Test
The P value
Your software will calculate a p value for each statistic as well. It should support the t-test conclusion.

The P value is the highest probability that Ho (the null hypothesis) is true.

For example, a high t-calc value and a low P value indicate that the null hypothesis is not true. This means that you can reject the null hypothesis. Alternatively, this indicates that the alternative hypothesis is statistically significant or true.

1-*


Testing the Relationship Between Two Variables (Data Series)

The Observation of TSPs for each Data Series
The Scatter Plot with a Regression line for Two Variables (Data Series)
The Correlation between Two Variables

( With the probability (p) that no correlation exists)

1-*

The Relationship Between Two Variables (Two Data Series)
We are evaluating whether two data series (Y and X) are highly linearly related. That is, as one increases (decreases) the other increases (decreases) in a constant fashion. These linear relationships can be positive or negative.

In Minitab go to Graph/Scatter Plot

1-*

The Pearson product moment correlation coefficient takes on the symbol ρ for the population. Perfect correlations are represented by positive (+1) or negative (–1) values while 0 values represent no correlation.

Minitab and Excel will calculate the correlation coefficient (r) for you. You will need to test it with a t-test for statistical significance.

The correlation coefficient is directly related in sign and magnitude to the slope term in a trend or regression equation.

r = ∑(X – X̄)(Y – Ȳ) / √([∑(X – X̄)²][∑(Y – Ȳ)²])

Correlation Coefficient (r)
The Statistic That Captures The Strength
of the Linear Relationship

1-*

You Must Use the Correlation Coefficient in Your Project
1. Need to examine each independent variable’s relationship to your dependent objective variable (Y).
2. Need to examine each independent variable’s relationship with the other independent variables (Xs)
3. The higher the (r) values the stronger the linear relationship.
General Rule: You want the Y and X relationships to be stronger than the X to X relationships.

You must state the results of your correlation analysis in your proposal and final project paper. There are also p values that should indicate the statistical significance of the correlation. You should mention the level of the p values or the statistical probability that Ho is not true as well.

1-*

Correlation Analysis of Two Variables
1. Perform a Scatter Plot of the Variables
Y variable on vertical axis
X variable on horizontal axis
Comment on Linearity of the Relationship
2. Run the Correlation Coefficient (r)
r close to 1 means high correlation, close to 0 low
+r = direct relationship, –r = inverse relationship
Comment on the strength and direction of r
3. Determine if r is statistically significant at 95% confidence
Null hypothesis Ho: r = 0
Alternative hypothesis H1: r ≠ 0
4. Examine the p value to decide whether to accept or reject the null
If p is below .05, reject the null
5. Continue using variables with strong r and low p values

1-*

Assignment 3 and Correlation Assessment

You will need to perform a hypothesis test on the correlation between your Y and each of your X variables.

Examine the p values to determine the probability that each correlation belongs to the distribution of correlations with zero mean (expected value).

Each correlation p value should be .05 or below.

Comment on these p values as well as the strength of the correlation coefficients.
Here you are moving from uncertainty to low risk.

1-*

Consider the Relationship Between the Health Care Cost and Age of Children.

Cost   Age
 859     8
 682     5
 471     3
 708     9
1094    11
 224     2
 320     1
 651     8
1049    12

1-*

Note the linearity of the relationship – points lying close to the regression line. Here the points are close, so there is a strong linear relationship between the variables.
In Minitab go to Graph/Scatter Plot/With Regression. Enter the Y variable first then X.

1-*

Correlations: Cost, Age

Pearson correlation of Cost and Age = 0.938
P-Value = 0.000
In Minitab go to Stat/Basic Stats/Correlation
Enter the Y value (dependent variable) first then the X (independent variable(s))
1. Note the strength of the relationship as shown by the correlation coefficient (.938 is close to 1 and indicates a strong linear relationship).
2. Note the direction of the relationship. r = .938 is positive and indicates a direct relationship.
3. Note the p value. It is .000 (below .05), which indicates that the null hypothesis Ho: r = 0 is very unlikely to be true.
4. You have found a statistically significant relationship.
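
The same correlation can be cross-checked in Python with scipy; the data are the cost/age pairs above:

```python
from scipy.stats import pearsonr

cost = [859, 682, 471, 708, 1094, 224, 320, 651, 1049]
age = [8, 5, 3, 9, 11, 2, 1, 8, 12]

r, p = pearsonr(cost, age)
# Expect r close to 0.938 and p below .05, matching the Minitab output
print(f"r = {r:.3f}, p = {p:.4f}")
```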

1-*

Where we are in this course—
Described what is expected in terms of a class project –your response is a proposal.
Described the characteristics of a data series and how to detect them.
Discovered that good forecasting models are good at estimating these characteristics (Leave Random Residuals)
Examined ways to determine a good forecast
We have not covered the required Forecast Methods or Models yet.

1-*

Statistical Confirmation of Data Series Characteristics (T, C and S)

We will use correlation analysis to see how T, C and S can be statistically proven with the correlation between observations within a single data series.
Note – this correlation is not between two variables but between observations of a single variable.
It is expressed as autocorrelation coefficients or functions (ACFs).
Provides the probability that T, C or S exist.

1-*

Autocorrelation

(Correlation Analysis applied to a Single Data Series)

Uses statistical probabilities to determine the likelihood of the relationship recurring in the data series.
Random relationships indicate a very low likelihood of recurring (zero probability).
Examines the entire data series and describes the strength of the observation relationships to other observations and the likelihood of recurring.
Autocorrelation Functions (ACFs) are measures of the relationship strength between lagged data observations.
T-values indicate the likelihood of recurrence (probability).
LBQ values indicate the probability of recurrence for the entire series.

1-*

Data Observations and Their Interrelationships

Just as steps in a staircase are related to one another, many data series observations are related to one another (e.g. 101, 104, 106, …).

We can examine this relationship using observation correlations expressed in lag period correlations of observations in the same data series — auto(self)correlations.
Note that autocorrelation coefficients express the non random or systematic variation relationship between observations.

1-*


Autocorrelation Coefficient (rk)

rk = ∑(Yt – Ȳ)(Yt–k – Ȳ) / ∑(Yt – Ȳ)²

rk = autocorrelation for a k-period lag
Yt = value of the time series at period t
Yt–k = value of the time series k periods before period t
Ȳ = mean of the time series

The Autocorrelation Coefficient Function (ACF) is the list of the autocorrelations for each lag period. Upper and lower bounds can be set by using the rule of thumb (approximately ±2/√n).
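
A direct translation of this formula into Python (numpy assumed; the acf helper is my own sketch), applied here to the first sixteen quarterly price observations that appear later in these notes:

```python
import numpy as np

def acf(y, max_lag):
    """Autocorrelation r_k for lags 1..max_lag, per the formula above."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    return [np.sum(dev[k:] * dev[:-k]) / denom for k in range(1, max_lag + 1)]

y = [0.40, 0.29, 0.24, 0.32, 0.47, 0.34, 0.30, 0.39,
     0.63, 0.43, 0.38, 0.49, 0.76, 0.51, 0.42, 0.61]
for k, r in enumerate(acf(y, 4), start=1):
    bound = 2 / len(y) ** 0.5        # rule-of-thumb significance bound
    print(f"lag {k}: r = {r:+.3f}  (bound ±{bound:.3f})")
```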

1-*

What to look for in ACFs

[Figure: ACF plot with upper and lower t value limits around the zero correlation line, marking the upper and lower rejection regions under a normal t value distribution, at 95% confidence or 5% alpha.]

What hypothesis are you testing? That the data is a member of a randomly distributed population.
If the ACFs fall outside of the limits you must reject this hypothesis.

1-*

Quarterly Price Data

0.40  0.29  0.24  0.32
0.47  0.34  0.30  0.39
0.63  0.43  0.38  0.49
0.76  0.51  0.42  0.61
0.86  0.51  0.47  0.63
0.94  0.56  0.50  0.65
0.95  0.42  0.57  0.60
0.93  0.38  0.37  0.57
(Read across: four quarters per row, in time order.)

1-*

These are the autocorrelation functions (ACFs) for problem 20. Note the peak in lag period 4 and appearing again in lag 8 indicating seasonality.
In Minitab go to Stat/Time Series/Autocorrelation
Quarterly Seasonal Effect

1-*

Lag ACF T LBQ
1 0.254028 1.44 2.26
2 -0.056970 -0.30 2.38
3 0.213617 1.13 4.09
4 0.734102 3.75 25.03
5 0.103383 0.39 25.47
6 -0.204855 -0.76 27.22
7 0.096260 0.35 27.62
8 0.463845 1.68 37.38
9 -0.085004 -0.28 37.72
10 -0.304619 -1.02 42.31
11 -0.042873 -0.14 42.40
12 0.228614 0.74 45.25
Check the Autocorrelation Function t values for each lag period. Note the t value for lag period 4 – it is statistically significant (higher than the 95% t-table value of 1.960).
Check the ACF and t values
In Minitab go to Stat/Time Series/Autocorrelation

Note the significant t value for the fourth lag period – indicating seasonality for quarterly data.
See the LBQ table on page 528 for .05 alpha and lag values as degrees of freedom (df).

If the LBQ calc above exceeds the table value then the data series is not random through this lag.

1-*

The time series plot of the data indicates that the series does have seasonality. The ACF value in period 4 shows that the seasonality is statistically significant. In Minitab go to Stat/Time Series/Time Series Plot.

1-*

1-*


ACF Evaluation

Trend – early lags (at least 2 or 3) are in the rejection region and they slowly fall or rise toward the zero line. Severe trend may never fall toward zero.
Seasonality – a spike into the rejection region for every seasonal spike in the data (12 and 24 lags for monthly data, 4 and 8 lags for quarterly data).
Cycle – a roller coaster of ACFs rising and falling, sometimes exceeding the rejection boundary.
Random – all ACFs fall inside the rejection boundaries (none significant) – you must still check the data distribution with a histogram.

1-*


How Much Error is There?
Error measures answer this question.

You need to focus on two types of error measures relevant to business –MAPE and RMSE

One (MAPE) measures the percent error while the other (RMSE) expresses error in units of Y.

These are estimates of the expected error per fit or forecast period observation.

For good forecast results you want the lowest error measures possible.

1-*

Error Measures
These measures are vital to our analysis of forecast model performance.
Good models have the lowest error measures.
They must be applied to “fit” or the data used in developing each forecast model.
They must be applied to “accuracy” or the data that is forecasted by the model.
They can be used in any forecast or budget accuracy analysis.
They can be used for any forecast method either quantitative or qualitative.
You must use them consistently in your class project – and you must employ at least two of these measures across each of your four forecast models.

1-*

Forecast Error Measures or Evaluation Tools

ME = ∑(At – Ft)/n
Mean Error is good for detecting bias but typically underestimates forecast error.

MAE = ∑|At – Ft|/n
Mean Absolute Error (or Mean Absolute Deviation) is in forecast units and so is not a good measure across various forecast series. This is produced by Minitab as MAD.

MPE = ∑[(At – Ft)/At]/n
Mean Percentage Error is good for detecting bias but typically underestimates forecast error.

MAPE = ∑|(At – Ft)/At|/n
Mean Absolute Percentage Error is good for comparing error across forecast series and is not sensitive to forecast unit size. This is produced by Minitab as MAPE.

MSE = ∑(At – Ft)²/n
Mean Squared Error is good for comparing model error for a given series but it is unit sensitive. This is produced by Minitab as MSD.

RMSE = √(∑(At – Ft)²/n)
Root Mean Squared Error is good for comparing error for models of a given series. It is forecast unit sensitive (equivalent to a standard deviation). It can be found by taking the square root of Minitab MSD.
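
Here is a minimal Python sketch of the two measures required in the project, MAPE and RMSE (the actual and fitted values are hypothetical):

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs((a - f) / a)) * 100

def rmse(actual, forecast):
    """Root Mean Squared Error, in units of Y."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return np.sqrt(np.mean((a - f) ** 2))

# Hypothetical fit-period actuals and model estimates
actual = [120, 125, 131, 138, 142]
fitted = [118, 127, 130, 140, 141]
print(f"MAPE = {mape(actual, fitted):.2f}%, RMSE = {rmse(actual, fitted):.2f}")
```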

1-*

Theil’s U
U = √∑(At – Ft)² / √∑(At – At–1)²
This coefficient is very useful to determine if a forecast is better than a naïve forecast (next period will be the same as last period).

This is good for determining if a forecaster or budgeter is adding value.

U values < 1 indicate better than the naïve assumption while U values > 1 indicate forecast results that are worse than the naïve assumption.

A U value of 0 indicates a perfect forecast.
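
And a matching sketch for Theil's U (same hypothetical series; the theils_u helper is my own):

```python
import numpy as np

def theils_u(actual, forecast):
    """Theil's U: forecast error relative to the naive 'no change' forecast."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    num = np.sqrt(np.sum((a[1:] - f[1:]) ** 2))
    den = np.sqrt(np.sum((a[1:] - a[:-1]) ** 2))   # naive: F_t = A_{t-1}
    return num / den

actual = [120, 125, 131, 138, 142]
fitted = [118, 127, 130, 140, 141]
print(f"U = {theils_u(actual, fitted):.3f}")  # below 1 beats the naive forecast
```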

1-*

Use of Error Measures

From this point on you must show and comment on the error measures for any forecast or estimated values.

Remember the best models have the lowest error measures.

Apply the error measures consistently in your analysis.

These along with residual analysis determine forecast model reliability. (Confirmation that the error is small and random.)
Error measures
Normality analysis plots of residuals
Autocorrelation Functions of residuals
Histogram of residuals
Calculation of basic statistics (mean should be close to 0)

1-*

What is the Source of Error?

Now that we have identified the amount of error with error measures we must determine the cause of error.

Remember, since the random component can take on any value, high error may not be the result of non-randomness.

We must investigate residuals further.

1-*

Where Do We Want Randomness ?

Answer: In the Residuals or Error Terms

How Do We Check for Randomness In Residuals?
TSP (random signs and no T, C or S)
Histogram and Mean (normal distribution and zero mean)
ACFs (non-significance of each ACF)

1-*

Autocorrelation and Residual Evaluation
(What is the Source of Error?)

Besides determining the characteristics of a data series to be forecast, autocorrelation can be applied to Fit and Forecast residuals.
Then check for the statistically significant characteristics you found in the original data – you do not want them to show up as significant in the residuals.
To indicate that the residuals are random – all lag relationships should be within the t boundaries.
Use this in conjunction with a histogram and TSP of residuals to confirm randomness.

1-*

Residual Distribution and the Adequacy of Forecasting Techniques

Calculate the forecast residuals and plot them in a histogram
Look for normally distributed residuals. Compare the histogram to a normal bell shaped probability curve with residuals clustered around the mean.
Residuals clustered to one side of the residual mean indicate a distribution that is not normal and that an improvement to the forecast technique could be made.

A good forecasting technique yields residuals (errors) that have random characteristics and approximate a normal distribution

1-*

Residual Autocorrelation and the Adequacy of Forecasting Techniques (Summary)
Determine if the autocorrelation coefficients of the residuals are indicative of a non trended random series.

1. Calculate the forecast residuals (At – Ft) for each period.

2. Calculate the autocorrelation coefficients (ACFs) of the
residuals for representative lag periods.

3. Look at the ACF pattern to check for a stationary (no trend) series and statistically significant lags (greater than the t-value).

4. Compare the 12th and 24th LBQ values to the .05 Chi-square table value for 12 and 24 degrees of freedom. Residuals may be random (not autocorrelated) if the LBQ does not exceed the Chi-square table value.

1-*

Checking Your Y Variable Characteristics

1. Perform a Time Series Plot of your Y data and note
Magnitude and range of data
Trended or Stationary
Cyclical and length of cycles
Seasonal and peak periods
2. Run Autocorrelations with at least 2 years worth of period lags
Note lag periods that exceed the t table values (+ or – red lines)
Slow decline in significant r values indicates trend
Significant values or peaks of r for cycle periods
Significant values or peaks of r for seasonal peaks
(4th peak for quarterly or 12th peak for monthly lags)
3. Comment on significant trend, cycle and seasonality of the data

1-*

How to Evaluate a Forecast and Forecast Model
1. Recall that models produce two types of output
Estimates of data used in developing the model
Forecasts of future data values
2. Residuals (Error Terms) are:
Actuals – Estimates for Model Fit
Actuals – Forecasts for Forecast Accuracy
3. Analysis of Residuals is key to determining the usefulness of a model and reliability of a forecast
4. Treat residuals as another data series
Perform Time Series Plots and Autocorrelations to check for characteristics (Same as Y Check on the previous slide)
5. Check residuals with selected Error Measures
Select the model with the lowest error measure values
6. If Statistically significant Characteristics exist then your model needs improvement (or try another forecast method)

1-*

Random Variation Is Not Expressed in Significant Lag Period Autocorrelations

How is this useful?
Determining the characteristics of your data series.
Examining the error terms of models that you develop – their error terms should not have any strong (significant) autocorrelation components.
You should test your model error terms with autocorrelation analysis – that is look for significant lag period autocorrelations that indicate that the model could be improved or choose another model.
You must comment on these in your final project paper as analysis of residuals (error terms) for each of the four forecasting models that you use.

1-*

An Additional Residual Test:
Residual Distribution and the Adequacy of Forecasting Techniques
Calculate the forecast residuals and plot them in a histogram

Look for normally distributed residuals. Compare the histogram to a normal bell shaped probability curve with residuals clustered around the mean.

Residuals clustered to one side of the residual mean indicate a distribution that is not normal and that an improvement to the forecast technique could be made.
The forecast may be biased – examine the mean of the residuals (error measures) to confirm this.

New folder/POWERPOINTS/Nov 3 chap 7 and 8.ppt

©2007 McGraw-Hill/Irwin

Chapters 7 and 8
Forecasting with Multiple Regression
And Time Series Regression Methods

4-*

Regression is Useful to Test Theories of Business Variable Relationships and to Forecast Outcomes
You have hypothesized relationships.

Business people do this every day – relationship between inputs and outputs, sales and product price, sales and customer visits, etc…

You formulate hypotheses between variables in everyday life. Relationship between time spent studying for this course and your grade, relationship between your earnings and the taxes you must pay, relationship between expected miles you drive and gasoline costs, relationship between weather and your utility bills. In each of these cases one variable (X) seems to be a good determinant of, or explains, the value of another variable (Y).

Economics is the study of variable relationships – the science part of economics. Economic theory is a compilation of these relationships observed and tested. Economic Law occurs when the theories have been thoroughly tested and the outcome is a virtual certainty.

4-*

Multiple Regression and Partial Regression Coefficients

[Figure: scatter of Y against X with the fitted regression line and vertical residual lines.]

Remember that in regression the appropriate slope term or coefficient is determined by minimizing the squared residuals (the squared vertical distances between the observations and the regression line), holding the other X variable relationships constant.
Note that although we find the best linear relationship it is still an estimate of Y actual values. These estimates are represented as fitted values or “Fits”.
In multiple regression each X variable slope coefficient is determined in this fashion. Each coefficient effect is combined to form the regression equation.

4-*

The Importance of Hypothesis Testing In Regression
Before you can use a regression equation to forecast values of your dependent Y variable you must test the regression model reliability.

Student’s T test for the X coefficient
That is, test the coefficient to ensure that it is not a member of the set of coefficients with a zero expected value (zero mean). Test to reject the Null Hypothesis Ho: B1 = 0.

F test for the regression equation
Test the entire model to ensure that it represents the relationships of the population of the Y and X variables and that the coefficients are jointly significantly different from zero. Test the Null Hypothesis Ho: B1 = B2 = … = 0.

Remember to check the t and F table values that you need to exceed to reject Ho. Also remember to check the p values to ensure that they are below the significance (alpha) value of .05.

4-*

4-*

Hypothesis Test for the Example C Scatter Plot

Given a correlation coefficient of r = .79 and a sample size of 5, test:
Ho: ρ = 0
H1: ρ ≠ 0

t-table (for a two tailed test, 95% confidence and df = n – 2 = 3) is 3.182
t-calc = (.79 – 0)/√((1 – .79²)/(5 – 2)) = .79/√.1253 = 2.232

Since t-calc = 2.232 falls between –3.182 and +3.182, do not reject the null hypothesis Ho.
The correlation is not statistically significant and X is not a good explanatory variable of Y.
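
The same calculation in Python (scipy assumed):

```python
from math import sqrt
from scipy.stats import t

r, n = 0.79, 5
t_calc = r / sqrt((1 - r ** 2) / (n - 2))          # 2.232
t_table = t.ppf(0.975, df=n - 2)                   # two-tailed 95%: 3.182

print(f"t-calc = {t_calc:.3f}, t-table = {t_table:.3f}")
print("significant" if abs(t_calc) > t_table else "not significant")
```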

4-*

Why is Regression a Useful Forecasting Tool?

When two random variables are correlated (r ≠ 0), knowledge of the outcome of one will benefit the forecasting of the other. Statisticians have developed a useful theory of conditional expectation, called Classical Linear Regression.

Given a specific value of the independent random variable X, the conditional expected value of Y, denoted E(Y|X = x), is:

E(Y|X = x) = μY + r (σY/σX)(x – μX)

μX = mean of independent variable X
μY = mean of dependent variable Y
r = correlation coefficient between X and Y
σY, σX = standard deviations of Y and X, respectively

4-*

Regression Model Types and Required Tests
Simple Linear Regression (Bivariate Model)
t and p test for coefficient significance
F test for model significance and pop. relevance
R square evaluation for model strength/usefulness

Multiple Regression
Each of the above plus
XX correlation test, VIF test for Multicollinearity
Adjusted R square evaluation for model strength

Time Series Regression
All of the above plus:
DW test for serial correlation
Residual order plot for Heteroscedasticity

All of the regression models above must also be subjected to residual analysis with residual time series plots, residual autocorrelation and residual histogram to ensure the residuals are random, normally distributed with zero mean.

4-*

Simple Linear Regression
Recall the Bivariate Causal Regression Model
Simple Least Squares regression where there are two variables (Y and X)
Y=f(X) where variable X is assumed to cause changes in variable Y
The linear model can be expressed:
Y= βo+ β1X + ε
Y is the dependent variable to be forecast
βo is the intercept where X is zero
β1 is the change in Y for every unit change in X (Slope)
X is the independent variable (implied causation)
ε is the error term

Ordinary Least Squares (OLS) produces the model that minimizes the squared error (∑ε²). That is, it minimizes ∑(Y – Ŷ)², where Y is the actual Y value and Ŷ is the estimated value from regression.
Remember that Y = Ŷ + (Y-Ŷ) or the estimate plus the error.
The same will hold true for Multiple Regression analysis.
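As an illustration of what OLS minimizes, here is a short Python sketch using the textbook normal-equation formulas; the data values are made up for the example, and the course work itself is done in Minitab:

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative X data
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # illustrative Y data

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)  # slope
b0 = Y.mean() - b1 * X.mean()                                               # intercept
Y_hat = b0 + b1 * X                        # fitted values (the "Fits")
SSE = np.sum((Y - Y_hat) ** 2)             # the quantity OLS minimizes
print(b0, b1, SSE)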

4-*

Decomposition of Variance
SST = Sum of Squares Total (total variability of Y)
SSR = Sum of Squares Regression (variability explained by linear relationship)
SSE = Sum of Squares Error (residual or unexplained variability)
SST = SSR + SSE

Components can be calculated by:
SST = ∑(Y-Ῡ)2
SSR = ∑(Ŷ-Ῡ)2
SSE = ∑(Y-Ŷ)2

Where Y = observed values of Y, Ῡ = mean of Y, Ŷ = forecast (fitted) values of Y

R2 = SSR/SST or the portion of Y variance explained by the linear relationship
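A small Python sketch of this decomposition (illustrative data; for any OLS fit with an intercept the identity SST = SSR + SSE holds):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1, b0 = np.polyfit(X, Y, 1)              # OLS slope and intercept
Y_hat = b0 + b1 * X

SST = np.sum((Y - Y.mean()) ** 2)         # total variability of Y
SSR = np.sum((Y_hat - Y.mean()) ** 2)     # explained by the linear relationship
SSE = np.sum((Y - Y_hat) ** 2)            # unexplained (residual)
print(SST, SSR + SSE)                     # equal, up to floating-point rounding
print("R2 =", SSR / SST)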

4-*

Decomposition of Variance and Mean Variance
SST = Sum of Squares Total (total variability of Y)
SSR = Sum of Squares Regression (variability explained by the linear relationship)
SSE = Sum of Squares Error (residual or unexplained variability)
SST = SSR + SSE
Mean Squared Regression (MSR) = SSR/ df
Where degrees of freedom (df) = 1 (the number of X variables)

Mean Squared Error (MSE) = SSE/ df
Where degrees of freedom (df) = n-2

4-*

Important ANOVA Components
Their Relationship to R2 and F
R2 = SSR/SST (Explained variation divided by total variation)

Mean Squared Regression (MSR) = SSR/ 1
Mean Squared Error (MSE) = SSE/(n-2)

F = MSR/MSE (Average Explained Variation divided by Average Unexplained Variation)

Run the ANOVA table in Minitab to accompany your model results.
Sum of Squares Total (total variability of Y) = SST
degrees of freedom (df) = n-1
Sum of Squares Regression = SSR
degrees of freedom (df) = 1 (the number of X variables)
Sum of Squares Error = SSE
degrees of freedom (df) = n-2

4-*

The Relationship Between F ratio and R2
R2 is sometimes called the Multiple Correlation Coefficient or Coefficient of Determination. There is no longer a direct r to R2 relationship since X values take shares of the Y variation.

R2 = SSR/SST Explained Variance Divided by Total Variance

F = MSR/ MSE or (SSR/k)/(SSE/(n-k-1))
Where: n = number of observations
k = number of independent variables

F = [R2/(1 − R2)] × [(n − k − 1)/k]; for simple regression (k = 1) this reduces to F = R2(n − 2)/(1 − R2)

Large R2 values will result in large F values all other things being equal.

The F value provides a more reliable measure of the significance of the model results for the population by adjusting for the degrees of freedom in the observations and X variables. The F-test must be performed to determine the applicability of the regression results to the population.
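The algebra linking R2 and F can be checked in a few lines of Python; the R2, n and k values below are illustrative, not course data:

R2, n, k = 0.85, 30, 2                                    # assumed values for illustration
F_from_anova_form = (R2 / k) / ((1 - R2) / (n - k - 1))   # (SSR/k)/(SSE/(n-k-1)) rewritten
F_from_r2_form = R2 / (1 - R2) * (n - k - 1) / k          # the formula on this slide
print(F_from_anova_form, F_from_r2_form)                  # identical: 76.5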

4-*

Standard Error of the Estimate (SEE)

A measure of the Y dispersion about the forecast (Ŷ) similar to standard deviation (S)

It measures the amount that the observations Y differ from the estimates Ŷ

One SEE would include about 67% of the differences while 2 SEE would include about 95% of the differences between Y and Ŷ.

A small SEE means that the actual observations of Y fall very close to the regression line.

SEE = √[∑(Y − Ŷ)²/(n − 2)]; that is, SEE is the square root of the sum of the squared residuals (SSE) divided by the degrees of freedom. This is equal to the square root of MSE from the variance decomposition in the ANOVA table.

4-*

Getting RMSE from the ANOVA Table
You must use the SSE to determine RMSE. Using the ANOVA MSE value will overstate the true RMSE for the regression.

Note the number of actual Y variable observations (n).

Divide the SSE by n.

Take the square root of the resulting number to get RMSE.

The square root of the ANOVA MSE will give a slightly overstated estimate of the true RMSE value – the larger the n value the closer to true RMSE the square root of the ANOVA MSE will be. Note that this is the same as the regression Standard Error of the Estimate (SEE).
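A short Python sketch of the SEE/RMSE distinction just described (the residual values are illustrative):

import numpy as np

residuals = np.array([0.5, -1.2, 0.8, -0.3, 1.1, -0.9])  # illustrative Y - Y_hat
n = len(residuals)

SSE = np.sum(residuals ** 2)
SEE = np.sqrt(SSE / (n - 2))   # square root of the ANOVA MSE (df-adjusted)
RMSE = np.sqrt(SSE / n)        # divide by n, as described above
print(SEE, RMSE)               # SEE > RMSE; the gap shrinks as n grows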

4-*

Developing a Confidence Band around your Forecast
Approximate 95% Confidence Band = Ŷ ± (2 × SEE)
Develop a confidence band particularly when the X forecast values exceed the historical or observed X values. This replaces uncertainty with probability in the forecast.

4-*

Using Standard Error of the Estimate to Develop a Forecast Range
SEE is sometimes called the Standard Error of the Regression (SER)

SEE = √[∑(Y − Ŷ)²/(n − 2)]
Where:
Ŷ = forecast (fitted) value for each period
Y = observed value for each period

4-*

Cross-Sectional Forecasting With Regression

All of the data pertains to one time period.
Exploring the relation between variables (Y and X) at a specific point in time.
The example shown is the relationship across markets of sales and population at several locations. Cross-sectional regression could apply to any dependent and independent variables at a specific time. (Examples)
That is, we can determine the relationship across markets with regression analysis and forecast the result of an additional market (or Y variable) using a regression equation.
Many of the estimator tests hold in this case as well: the t-test on the slope coefficient, R2, RMSE and the F test.

4-*

Multiple Regression Analysis
The equation for the population is
Y= βo+ β1X1+ β2X2+ β3X3 + β4X4 + ….. βnXn +ε
Where ε = Y- Ŷ

The estimated equation for the sample is
Ŷ= bo+ b1X1+ b2X2+ b3X3+ b4X4+ …. bnXn
Where residuals = Y- Ŷ

In Ordinary Least Squares (OLS) we Minimize ∑ ε2= ∑(Y- Ŷ)2 or the sum of the squared error

1. Note that this is causal regression analysis since X’s are understood to determine Y values.

2. Note that it is partial regression analysis relative to each variable since each independent variable regression coefficient is determined holding all other independent variables constant.

3. Note that the regression can be either time series or cross sectional.
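For comparison with the Minitab runs used in the course, here is a minimal multiple-regression sketch in Python with statsmodels; the data are simulated purely for illustration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))                              # three X variables
y = 2.0 + X @ np.array([1.5, -0.8, 0.3]) + rng.normal(scale=0.5, size=40)

model = sm.OLS(y, sm.add_constant(X)).fit()               # Y = bo + b1X1 + b2X2 + b3X3
print(model.params)        # bo and the partial slope coefficients
print(model.tvalues)       # t-test for each coefficient (Ho: b = 0)
print(model.rsquared_adj)  # adjusted R2, the multiple-regression strength measure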

4-*

Evaluation of Regression Analysis Summary

Before Running Regression:

A. Evaluate the relationship of the dependent (Y) variable and (X) variables
by plotting time series of each. X variables should meet the basic criteria.

B. Evaluate the significance of independent variables before you run regression models by:
1. Correlation Scatter Plots of X versus Y (look for linearity)
2. Correlation Coefficients (look for magnitude 0 to 1 and sign + or -)
3. Correlation Significance (t-test and P test for Ho: r=0, reject then variable is
significant)
Determine data transformation requirement using scatter plots. Explore transformation scatter plots to determine the correct X transform. Use the transformed X variables instead of raw X data.
Run autocorrelation on the Y variable. Look for characteristics especially seasonality or other major qualitative factors.

1. Use dummy X variables for other qualitative events (e.g. 9-11, Hurricane Katrina).
If seasonality is detected you must:
2. Deseasonalize the data by running Decomposition and using CMA or DESEA as the Y variable (do not forget to reseasonalize any forecast with the seasonal index from Decomposition) or,
3. Use dummy variables for each monthly or quarterly period (except one) as additional X variables. The forecast will include seasonality.


4-*

Evaluation of Regression Analysis (checklist)

Before Running Regression:

A. Time series plot of (Y) and (X)

B. Significance of Xs:
1. Correlation Scatter Plots of X and Y
2. Correlation Coefficients
3. Correlation t-test and P test
Data transformations
XY Scatter plots
Transform X
Y autocorrelation for seasonality

1. Dummy X variables for events
2. Deseasonalize and index with Decomposition or,
3. Dummy variables for periods.


4-*

Evaluation of Regression Analysis Summary (continued)

Run Regression:

E. Evaluate the X variables by:
1. Evaluating the Logic of the Coefficient Sign –Does it make sense?
2. t-Test and F-test to Evaluate X variable coefficient (b) significance with Ho: b=0
by comparing the t-value to t-table or p to reject when X variable is significant.
3. Running Stepwise Regression and examine signs, t and P values of coefficients.
F. Multicollinearity: check variable sign logic, and check the X1 vs Xn correlations to see whether any exceed the X1 vs Y correlation – a multicollinearity warning sign (and watch for the omitted variable problem). Look at the Variance Inflation Factors (VIFs) for values that exceed 2 – this may indicate multicollinearity.
G. Heteroscedasticity: test in Minitab by plotting the residuals vs the independent variable or time series and look for a megaphone distribution of the residuals.
H. Serial Correlation test for error term serial correlation by:
1. Plot the fitted data Ŷ against the Y values and look for a positive or negative serial correlation pattern
2. Check DW between 0 and 4 to compare with table upper and lower limits
I. R2 value check (adjusted R2 for multiple regression) and perform the
F-test Ho: R2=0 test of significance by comparing F value to F-table value to reject
Ho when R2 is significant –useful with multiple regression methods
Residuals time series plot for zero mean and randomness, histogram, normality plot and autocorrelation to determine if systematic characteristics exist. Determine if BLUE assumptions have been violated.

4-*

Evaluation of Regression Analysis Summary (checklist)

Run Regression:

E. Evaluate Regression Equation
1. Coefficient Sign logic
2. t-Test and P test for X coefficient
3. Run Stepwise Regression and examine signs, t and P values of the coefficients
F. Multicollinearity
coefficient sign logic
X1 vs Xn correlation

3. Variance Inflation Factors
Heteroscedasticity
Plot residuals vs order
Look for megaphone

H. Serial Correlation
1. Plot fitted Ŷ vs Y values and look for + or – type
2. Check DW
I. Adjusted R2 (strength) and F test (reliability)
F-test Ho: R2=0 test of significance by comparing F value to F-table value to reject
Ho when R2 is significant –useful with multiple regression methods
Residual randomness and normality (BLUE)

1. Time series plot residuals, histogram, normality plot
2. Residual autocorrelation

4-*

A. Independent Variable Candidate Required Characteristics

Plausible relationship between the dependent variable (Y) and the independent variable (X).

Is not subject to large measurement errors

Does not duplicate other independent variables

Not difficult or costly to measure

4-*

B. Examine the relationship between variable Y with another variable X

The relationship has to be (1) statistical and (2) logical before you can use X to forecast Y.

4-*

B. Correlation Coefficients Quantify The Relationship Between Variables

The Pearson product moment correlation coefficient takes on the symbol ρ for the population. Perfect correlations are represented by positive (+1) or negative (-1) values while 0 values represent no correlation.

Just getting a high r value is only half of the job. You must check the r value for significance. That is, you must answer the question "Is the (r) that you observe a member of a distribution (of r's) that has a zero mean or expected value?" You must test the null hypothesis Ho: r = 0.

r = ∑(X − X̄)(Y − Ῡ) / √{[∑(X − X̄)²][∑(Y − Ῡ)²]}

4-*

You Must Learn to Discard X Variables
Discard X variables if:
They do not contribute significantly to the R2
The coefficients are not statistically significant (t, P, or F)
The signs of the coefficients do not make sense.

B.

4-*

B. Using Hypothesis Testing In Bivariate Analysis
(Testing for correlation significance)
Ho: r = 0
H1: r ≠ 0
To determine if your independent (X) variable is useful in regression analysis you must perform a hypothesis test to ensure statistical significance even though you have a high correlation coefficient (ρ).
Use the standard t-test to perform the significance test on the sample Pearson Product Moment correlation value ( r ) with the t-calc equation:

t = (r − 0) / √[(1 − r²)/(n − 2)]
Note that the t value is very sensitive to the size of ( r ) and the number of observations ( n ).
If t-calc is greater than the t-table value for (n-2) observations then you reject the null hypothesis and your (X) variable correlation is statistically significant.
ρ for population, r for the sample
or, equivalently, the t value can be calculated by dividing each coefficient by the coefficient's Standard Error.

4-*

B.
Make sure that you compare the t-calc value for the correlation to the t-table value. t-calc must exceed the t-table value in absolute terms to reject Ho: r=0.

4-*

C.
There are two ways to address a curvilinear XY relationship
(as shown by scatter plots of Y versus X)

Change the form of the regression (e.g. from Linear to Quadratic) but this will change the assumptions made relative to the measures. We will discuss this later in Chapter 7.
Change (transform) the X variable to create a more linear relationship. With a known transformation all of the Regression Statistics still hold just as they did with raw X data but they will be improved.
You should check the linear relationship with a scatter plot of the Y variable and the transformed X variable. Look for a more linear function.
Again, use only the X transform if required in your regression analysis and for your project data.
C.

4-*

C. Poor Linear Relationships and Data Transformation

If your XY scatter plot is not linear you may need to transform your data

Use the Calculator function in Minitab to convert your X variable to another form to obtain a linear relationship.

Use a Scatterplot in Minitab to select the most linear transformed X and Y relationship.

Run the regression on the transformed X and Y. Check the R square, F, error measures and residuals.

4-*

Poor Linear Relationships and Data Transformation

If your XY scatter plot is not linear you may need to transform your data. (Note the difference between a weak linear relationship A on the following slide versus a curvilinear relationship shown by B.)

Use the Calculator function in Minitab to convert your X variable to another form to obtain a linear relationship.

Scatterplot to select the most linear transformed X and Y relationship.

Run the regression on the transformed X and Y. Check the R square, F, error measures and residuals.
C.

4-*

C. Transformation Applications
This weak linear relationship in data set A will not be helped by transformation.
This curvilinear relationship in data set B could be helped with transformation.

4-*

C. Transformation Types

Square X2
Square Root √ X
Reciprocal 1/X
Log (use base 10) Log10 X

1. In Minitab Calc select Calculator
2. Type in an open column to place the results in
3. Select the Function you wish from the above options
4. In the (number) field place the column of your X variable.
Select OK
5. Scatter Plot the Y data and transformed X column
6. If linear, run a Regression on the Y and transformed X
7. Evaluate the results: R square, F, b1, error measures and residuals.
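A hedged Python sketch of the same trial-and-compare procedure (the data are simulated so that the log10 transform should win; in the course this step is done with Minitab scatter plots):

import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(1, 50, 50)
Y = 3 * np.log10(X) + rng.normal(scale=0.1, size=50)      # curvilinear XY relationship

candidates = {"raw X": X, "X^2": X ** 2, "sqrt X": np.sqrt(X),
              "1/X": 1 / X, "log10 X": np.log10(X)}
for name, Xt in candidates.items():
    r = np.corrcoef(Xt, Y)[0, 1]           # linearity proxy: correlation with Y
    print(f"{name:8s} r = {r:+.3f}")       # keep the transform with |r| closest to 1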

4-*

C. Transformation Examples

The following scatter plot slides show what the transformations do to a perfectly linear XY data relationship. If you have a good linear relationship you do not need to transform X.

Note that we only need to transform the X variable – Y stays in its raw form.

The transformations have the reverse effects on the XY data series that are shown on the slides. The scatter plot for transformed data should become more linear. You need to select the best (most linear form) transformation of X.

The best will be indicated by a higher R square value, lower P value for F and a more linear scatter plot.

Be sure to run the best transformed value of X as the independent variable replacing the raw X values in regression analysis.

4-*

D. How to Account For Known Occurrences or other Qualitative Values

What could you do to explain the occurrence of 9-11 or the effect of increased taxes, a sales promotion activity or an XY outlier?

It would be great to have a qualitative variable to account for their effects in a regression.

In this case we can capture the time series influence with a switching or dummy variable with data made by you.

In the period(s) that the known influence occurs you switch from 0 to 1. All other periods remain as 0.

You run the multiple regression with this new data series made by you and evaluate the statistics for the new series just as you would for other X variables.

4-*

D. Dummy Independent Variables (X)
(How to Account for Known Qualitative External Factors)

1. Dummy or indicator variables are used to introduce qualitative independent factors to forecast a dependent variable.

2. They are typically used in conjunction with other quantitative X variables. You can use more than one dummy variable.

3. The qualitative factors are indicated with a data series of 0 (no influence) and 1 (influence) assigned for each observation.

4. The dummy variable data series is considered a switch on or off for the qualitative factor.

5. Note that you must be able to account for the dummy variable historically and project it into the forecast future.

6. Also note that you should not introduce symmetry in the dummy variables. That is, do not introduce another data series with just the opposite values. Check the significance and regression statistics for inclusion in the regression.
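Building the 0/1 series is trivial in any tool; here is a Python sketch (the event periods chosen are purely illustrative):

import numpy as np

n = 24                      # number of observations in the sample
event_dummy = np.zeros(n)   # 0 = no influence
event_dummy[7:10] = 1       # switch on only in the periods the known event occurs
print(event_dummy)          # append as another X column, then check its t and p values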

4-*

D. Modeling Events In Regression

In Minitab, to include a dummy variable for an event simply create a new data column, zero fill every observation where the event does not occur and place a 1 in every observation where the event occurs.

Include this new event column in your multiple regression run or stepwise regression run. Note the t and p statistic for each event variable that you enter. The model used in the example to the left has 10 event variables. Note that Time is a trend variable with a column filled with a time index (numbers from 1 to n).

4-*

D. Evaluating Event Dummy Variables

If you include event variables be sure to note the sign and significance of each. If they have unexplained sign relationships or low t (high p) values then exclude the event from your model.

4-*

Regression with Seasonal Data

Examine your Y data for seasonality.
If it is seasonal you must address the seasonal variation by either seasonally adjusting the data and reseasonalizing the estimates and forecast, or by using dummy seasonal X variables.
D.


4-*

D.
Look For Seasonality In the Y data with a Time Series Plot
Look for regular peaks and valleys that occur every year in the data.

4-*

Scatter Plots Also Show Signs of Seasonality
The seasonal peaks and valleys you see in time series data frequently show up as a shadow line in scatter plots.
D.

4-*

Autocorrelation Functions Confirm Seasonality
Quarterly (as above) or monthly significant lag period spikes confirm seasonality. Confirm it with spikes in the 4th and 8th lag period for quarterly data and the 12th and 24th lag period for monthly data.
D.

4-*

D. Decomposition Calculation of Deseasonalized Y Data (CMA)

4-*

Decomposition Seasonal Indices
Remove seasonality by calculating the Moving Average (MA) (12 for months, 4 for quarters) of the data.
Start the moving average at the observation just below the median of the first year's data.
Remove random fluctuations by calculating the Centered Moving Average (CMA), averaging the MA and the MA for the following period.
Calculate Seasonal Factors (SF) by dividing each data observation by its respective CMA.
For each representative month or quarter, average the seasonal factors.
Divide the number of periods (months or quarters) by the sum of the averaged seasonal factors (SF).
Multiply each representative (averaged) SF by this ratio to get the seasonal index for each period.

Note you will lose n/2 of the observations at the beginning and end of the series
D.

4-*

Uses of Seasonal Indices
1. To deseasonalize data for a period divide each raw data observation by the respective period seasonal index.
deseasonal = observation/seasonal index
In Minitab Deseasonal = Y data/SEAS

2. To estimate reseasonalized (raw) data for a period multiply the deseasonalized observation by the respective period index.
seasonal = deseasonal observation x seasonal index
In Minitab Seasonal = Y estimates x SEAS
3. Note that the seasonal indices should sum to the number of periods being evaluated (4 for quarters, 12 for months)
sum of indices = 12 or 4
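The ratio-to-centered-moving-average steps above can be sketched in a few lines of Python/pandas for quarterly data (the series is illustrative; in the course Minitab's Decomposition produces the SEAS and DESEA columns for you):

import numpy as np
import pandas as pd

y = pd.Series([10, 20, 15, 5, 12, 24, 18, 6, 14, 28, 21, 7], dtype=float)
quarter = pd.Series(np.tile([1, 2, 3, 4], 3))

ma = y.rolling(4).mean()                 # 4-period moving average
cma = ma.rolling(2).mean().shift(-2)     # centered moving average (CMA)
sf = y / cma                             # seasonal factors (ends of the series are lost)
avg_sf = sf.groupby(quarter).mean()      # one averaged factor per quarter
index = avg_sf * (4 / avg_sf.sum())      # force the indices to sum to 4
deseasonal = y / quarter.map(index)      # deseasonalized series (DESEA-style)
print(index.round(3))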

D.

4-*

You can introduce Dummy Variables to Account For Seasonality
In this case you would need to introduce a separate dummy data series of 0 and 1 for each month – but you must leave one month out to avoid symmetry.

Y = b0+b1X1+b2X2+b3X3+b4X4+b5X5+……..b11X11

Note that you use the monthly X variables in addition to other X variables.
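A Python sketch of building the seasonal dummies while leaving one period out (the monthly layout is illustrative):

import pandas as pd

months = pd.Series(list(range(1, 13)) * 3, name="month")       # 36 monthly observations
dummies = pd.get_dummies(months, prefix="m", drop_first=True)  # m_2 ... m_12 only
print(dummies.shape)   # (36, 11): eleven dummies, January left out to avoid symmetry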
D.

4-*

D.
Example

4-*

Evaluating Regression Results
Steps
E through J

4-*

E. Independent Variable Evaluation Questions

1. Does the sign of each X variable slope term make sense?

2. Does the t-test indicate that each X variable slope term is statistically significant either positive or negative?

3. How much of the dependent variable is explained by the independent variable(s) shown by Adjusted R2?

4-*

E. Evaluating Regression Coefficient Results

Logic of dependent to independent variable(s) relationship (+ or -). Does it make sense?

– if not, it implies an underspecified model and you may need to add independent variables

Size of the Slope term – the closer to zero, the weaker the relationship between independent and dependent variables. A zero coefficient implies no relationship between X and Y and the model is overspecified (includes a non productive variable).

Hypothesis test of the slope coefficient: Ho: β = 0, H1: β ≠ 0
The t-calc value is the slope coefficient (b1)/standard error of b1
The t-table value df is (n-2) and the test is two tailed.

4-*

E. Determining the Significance of the Slope Term

Due to the size of the variables being forecast and the change in the variables over the series, the absolute size of the slope term is sometimes hard to interpret.

Perform a hypothesis test on the slope term with either a 95% confidence interval where:
Ho: b1 = 0, H1: b1 ≠ 0 when the slope sign is not known and requires a two tailed t-test
Ho: b1< or = 0, H1: b1 > 0 when the slope is positive, one tailed test
Ho: b1 ≥ 0, H1: b1 < 0 when the slope is negative, one tailed test

or use P-values that indicate the level of significance for the slope coefficient. Recall 95% confidence equates to a 5% level of significance. For a two tailed test Ho: b1 = 0, P must be less than .05 to reject Ho. For a one tailed test Ho: b1 ≥ 0 or Ho: b1 ≤ 0, one half of P must be less than .05 to reject Ho.

4-*

E. Hypothesis Test of Independent Variable (Xn) Coefficients

Check the statistical significance of each independent variable (X) and the direction of the variable coefficient sign (+ or -) by performing a t-test. The null hypothesis must be stated: Ho: b ≥ 0 or Ho: b ≤ 0. Set the null hypothesis opposite to the sign of your coefficient. H1: b < 0 or H1: b > 0.

t-calc for Xi= coefficient (b) of Xi/standard error of Xi

Where:
i is the independent X variable

Compare the t-calc value (provided in the Minitab analysis for each X variable) to the t-table value where df = n − (K + 1), for a one tailed t-test at 95% confidence or α of .05. Remember to compare the absolute t-calc value to the t-table.
n = number of observations
K = number of independent variables

You may use the p values in comparison to calculated t-values.
Comment on the statistical significance of your variables and their signs (+ or -).

4-*

E. Stepwise Regression

If you have many X variable candidates to evaluate you can use stepwise regression in Minitab to obtain a model with the best fit.
1. In Minitab enter Y and all X variables – you can force an X variable to be in every regression run but select at most one. Use the default enter and leave values.
2. Examine the coefficient signs, t and p values.
3. Stepwise regression will not provide you with quadratic forms or transformed values of variables.
4. With only 2 or 3 variable candidates you can run the combinations manually if you wish. (Stepwise may be overkill.)
5. Note that you should not run the seasonal X variables discussed in D.3 in stepwise regression – you must include all of the seasonal X variables in your regression run to account for seasonality.

4-*

F. Multicollinearity Check Example
Multicollinearity is independent (X) variable information overlap and indicated by a strong linear relationship between X variables.

Example:
Assume GDP and Personal Disposable Income (PDI) are used to forecast Houses Sold. GDP and PDI may each have strong, significant correlations with Houses Sold. But GDP and PDI may have an even stronger correlation with each other (see the Correlation Matrix).

Using both GDP and PDI to forecast Houses Sold would result in
misleading error measures and
overstated t-tests and R2 values and
inflated and sensitive X coefficients.

This is a major reason why you do not want a “kitchen sink model”. Overlap or multicollinearity reduces the reliability of forecast measures.

You must comment on model multicollinearity and any steps taken to reduce it.

4-*

F. Detecting Multicollinearity

1. Determine correlation coefficients between all independent variables (X) from the correlation matrix.

2. X to X variable Correlation coefficients of + or – .8 to 1.0 will signal possible
multicollinearity. (In Minitab run Stat/Basic Statistics/Correlation )

3. Determine if the independent correlation coefficients are greater than the
correlation coefficients with the dependent variable.

4. Examine independent variable coefficients and look for low t-calc values for each independent variable.

5. Look at the Variance Inflation Factors (VIFs) for values that exceed 2 – this may indicate multicollinearity. (In Minitab run Stat/Regression and look at the VIF following the coefficients.)

VIFs greater than 2 indicate potentially unstable and inflated coefficients as well as other significance and regression strength measure overstatements.

Take corrective actions suggested on the next slide and recheck VIFs and coefficient t-values.
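The VIF itself is just 1/(1 − R2) from regressing each X on the other Xs; here is a Python sketch with deliberately collinear simulated data (the course reads VIFs off the Minitab output instead):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.1, size=50)   # nearly a copy of x1 (collinear)
x3 = rng.normal(size=50)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

for j in (1, 2, 3):                        # skip the constant column
    print(f"VIF for X{j}: {variance_inflation_factor(X, j):.1f}")  # x1, x2 get flagged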

4-*

F. Detection of Multicollinearity Example
When the Houses Sold (Y) correlation is compared to the X variables DPIPC, GDP and Interest Rates (IR) you need to examine the correlation between the X variables.

Examine the correlation matrix before you run the regression. The correlation between Houses Sold and each X variable ranges from .65 to .79. The following are the X variable correlation coefficients from the correlation matrix.
Note that the correlation between GDP and Disposable Personal Income is greater than its correlation to Houses Sold – a warning sign of Multicollinearity.

4-*

F. Detection of Multicollinearity Example
When signs of the coefficients and t-values don’t make sense – The regression coefficients for houses sold.

Note that the regression coefficient for Disposable Personal Income DPIPC is negative. You would expect DPIPC to have a positive relationship with Houses Sold. Also note the low t-calc value of -.21.

4-*

F. Reducing Multicollinearity
1. If independent variable correlation coefficients are high and show possible multicollinearity, use the first difference of one of the highly correlated independent variables as a substitute for it.
2. Recalculate the independent variable correlation coefficients including the first differenced variable to determine if correlation remains high.

3. If the correlation remains high eliminate one of the two highly correlated independent variables. Check for significant t-calc values.

4. Select another independent variable that has a logical coefficient sign, lower correlation and higher t-calc value.

5. Transform each independent variable (X) with the scalar formula 7.12 on page 299 and rerun the regression with the scaled Xs.

4-*

G. Heteroscedasticity
(Violation of the Residual Constant Variance Assumption)

1) Is present when the error-variance is not constant across the range of the independent variable.

2) The result is bias in error variance estimates, causing misleading and overstated statistical inference (R2 and t values).

3) It can be identified with a plot of the residuals against the independent (X) variable: X on the horizontal axis and the error terms or residuals on the vertical axis. Look for the megaphone effect.

4) Fixes for heteroscedasticity include data transformations to stabilize the error variance, such as a logarithmic transformation of the independent (X) variable.

4-*

Plot the Residuals vs X Values and Look for a Megaphone Effect

You want Homoscedasticity in residuals (constant variance)

You do not want Heteroscedasticity in residuals (non constant variance).

Implies that there are systematic influences not picked up in the model.

4-*

G. Heteroscedasticity is a Judgment Call
Example: there is slight Heteroscedasticity in this example. In most cases it is more pronounced than this.
[Panel: What is Homoscedasticity?]

4-*

How To Fix Heteroscedasticity

The problem of non constant residual variance must be solved by including a form of the explanatory (X) variable that shows rapid growth. This may reduce the expanded variance of the residuals.

1. Log or square transform of at least one of the X variables. See section C. The transformation of an X variable may influence the curvilinear form of the residuals shown by the time series or residuals versus order plot.

2. Using X and Y variables in constant dollar terms – applying a price index to deflate the dollar amounts. (You must make sure that you inflate the forecast with a forecasted index if actual dollar values are expected in the forecast).

3. Using another X variable with rapid growth or decline in later periods.
G.

4-*

H. Serial Correlation
(Autocorrelation of the Error Terms)

When using time series data detecting Serial Correlation is essential for correct forecasting technique and deserves special attention.

1) Serial correlation implies the error terms are correlated. εt = r εt-1+ v
where ε = residual, r = autocorrelation coefficient, v= random error.
2) Serial correlation, while not introducing bias into the estimated slope coefficient, creates bias in the estimated standard errors.

3) Serial Correlation produces estimated standard error of the regression that is smaller than the true standard error.

4) This produces spurious regression results in that the significance of coefficient estimates and quality of fit measures (t, P, and F) will be overstated.

5) Regression coefficients may be deemed significant when in fact they are not. R2 and t values will be overestimated.

4-*

H. Detecting Serial Correlation With Scatter Plots
( Serial Correlation Weakens your Regression Evaluation Measures)
Negative Serial Correlation
Positive Serial Correlation
Use the Fitted Line Plot for Y and X in Minitab Regression to detect Serial Correlation. In business Positive Serial Correlation is sometimes detected when business cycles are not picked up by the forecasting estimator.

4-*

Detecting Serial correlation with the Durbin-Watson Test for Serial Correlation
H.
For our purposes you will need to use the lower DW value for the appropriate alpha level and degrees of freedom. For negative serial correlation use 4- DW lower value for the upper boundary. For positive serial correlation use the DW lower value for the lower boundary.

4-*

H.
K = Number of Independent Variables

N = Number of Data Observations

Use dl for the lower DW value and du for the upper DW value.

Compare these to the DW-calc in Minitab.
See Table B-6
Page 530.

4-*

Durbin-Watson Statistic Quick Evaluation

1. If the Minitab DW is greater than 4-DW lower value in the DW table then you have negative serial correlation.

2. If the Minitab DW is close to 2 then you have no serial correlation.

3. If the Minitab DW is lower than the DW lower value in the DW table then you have positive serial correlation (typical of cyclical business data).

4. Also perform a closer check for an indeterminate DW by using the chart in F.2 and the DW table. You may do this after you have run your regressions to select the best variables.
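For reference, the DW statistic Minitab reports is simple to compute from the residuals; here is a Python sketch with illustrative residuals that drift (positive serial correlation):

import numpy as np

e = np.array([0.5, 0.7, 0.4, -0.2, -0.6, -0.4, 0.1, 0.5])  # illustrative residuals
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)              # DW = sum of squared changes / SSE
print(round(dw, 2))  # 0.64 here: well below 2, suggesting positive serial correlation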
H.

4-*

H. Example
When running your regressions with multiple X variables always select to include the Durbin-Watson statistic as a Minitab option. It is required to check for Serial Correlation.

4-*

H. Reducing Serial Correlation

Respecify the model –In the case of positive serial correlation the most likely cause is business cycles. Include a cyclical variable in your model.
Use the first difference of the X and Y variables to smooth out business cycle influences.
Include the square of your independent (X) variable as another independent variable.
Include a lagged value of your dependent (Y) variable as another independent variable (the auto regressive model).
Use Cochrane-Orcutt to adjust the autocorrelation out of the error terms by introducing ρ (rho) to create a differencing transformation. A new regression estimates another ρ value, which is used to estimate another regression equation with lower serial correlation.
Use the Hildreth-Lu procedure, which is similar to Cochrane-Orcutt. Both are only good for near term forecasts.
(You will not be responsible for using solutions 5 and 6)

4-*

I. Coefficient of Determination R2
(Critical Evaluation of Explanatory Strength or Usefulness of the Regression)
R2 shows the portion of the variation of Y explained by the independent variable(s) X. You want the percentage as high as possible.

In multiple regression adding independent variables will increase R2 since the correlation with Y will increase.

To adjust for the added variables R2 is recalculated to account for the change in degrees of freedom (K). With 2 independent variables K=2, with three independent variables K=3, etc.

The result is Adjusted R2, which shows the true amount of variation explained by the multiple regression independent variables.

Comment on the size of the coefficient of determination in discussions of the explanatory power of your regression model.

4-*

I. Coefficient of Determination R2 and ANOVA

R2 = Sum of Squares Regression (SSR) / Sum of Squares Total (SST)

= (Total variability explained by the linear relation) / (Total variability of Y)

= ∑(Ŷ − Ῡ)² / ∑(Y − Ῡ)²

R2 is an easy measure to manipulate and it generally overstates the fit of a given regression model, especially when serial correlation is present.
Perform hypothesis test on R2 where: Ho: R2= 0, H1: R2≠ 0

For simple (bivariate) regression, R2 = the correlation coefficient (r) squared.

4-*

I. Testing the Significance of Regression
The “ F” Test for Reliability of the regression model
This is the equivalent of a two tailed test of the null hypothesis β1 = 0.

It should provide the same results as the t-test for the null hypothesis for a simple regression model with one X variable.

It can be used to evaluate the entire model when the number of X variables increase.

Compare the calculated F value from regression with the F table value on pages 529-530.

Degrees of freedom for the numerator is the (K) number of X variables (e.g. 1 for simple regression)

Degrees of freedom for the denominator is n − (K + 1), where K is the number of X variables (e.g. n − 2 for simple regression).

F calc must exceed F table to reject the null hypothesis. The test also implies that the sample model represents the population relationships.

4-*

I. F-Test Requirement
F = (Explained variation/K) / (Unexplained variation/[n − (K + 1)])   (from SSR and SSE)

Where:
K = the number of independent variables (X)
n = the number of dependent variable (Y) observations

1. Compare the calculated F-statistic with the table B-5 on pages 529-530.
a) Look up the column by using the K value
b) Look up the row by using the [n – (K+1)] value

2. Check to ensure that F-calc (statistic) is greater than F-table to reject Ho

3. The table provides a significance test at 95% confidence or alpha of .05

4. Comment on the F-Test to indicate the reliability of your model. Ideally,
you would like F values about 4 times greater than F table values.
Test the regression model hypothesis:
Ho: β1=β2=β3=……βn=0 or R2 = 0

4-*

Table B-5, pg. 529

4-*

J. Critical Regression Parameter Assumptions
Assumption 1. The relationship between Y and X is linear, as described in the above equation. This implies that Y is determined by X, rather than vice versa.

Assumption 2. Var(X) is nonzero, and finite for any sample. The values of Xt are not all the same. If Var(X) = 0, it would be impossible to estimate the impact of ∆X on Y.

Assumption 3. The error term (et) has zero expected value. That is random error terms will cancel out (+ and -) over the long run. ∑(et) = 0

Assumption 4. The error term (et) has constant variance for all observations: Var(et) = σ².

Assumption 5. The random variables et are uncorrelated, i.e., Cov(et, et-i) = 0 for all i.

Assumption 6. The error term et is normally distributed over the entire range of values.


4-*

J. Best Linear Unbiased Estimators
It is assumed that Y is normally distributed about the Ŷ values (the values of Y estimated from the regression equation). They are normally distributed with constant variance over the entire range of X.
Note that Sf and SEE take advantage of this property and enable confidence interval estimation.
If the Y values are normally distributed about the fitted Y values, what does this mean for the residual distribution?
(See the next slide)

4-*

J. Best Linear Unbiased Estimators
It is assumed that the residuals are normally distributed about a zero mean. They are normally distributed with constant variance over the entire range of X (or the time series).
Note that a residual time series plot is important to verify the zero mean and the constant variance.

4-*

J. Implications of the Critical Assumptions

1) The model has at least three unknown parameters:
βo, β1 and s2. (intercept, slope and variance)

2) Each Y is a normal variate with mean βo + β1X and variance σ².

3) If we know βo, β1 and s2 we can forecast Y using standard normal distribution.

4) Sample estimates of βo, β1 can be obtained using Ordinary Least Squares (OLS) and result in the Best Linear Unbiased Estimates (BLUE)

Note: Your regression results cannot be BLUE unless the assumptions hold in slide J. Critical Regression Parameter Assumptions.

4-*

J. Regression Evaluation Questions

1. Does the sign of each X variable slope term make sense?

2. Does the t-test indicate that each X variable slope term is statistically significant either positive or negative?

3. How much of the dependent variable is explained by the independent variable(s) shown by Adjusted R2?

4. Is the F-calc value more than 3 times the significant F-table value?

5. Are there 10 data observations per each X variable?

6. Is the DW value close to 2?

7. Are the residuals random?

J. Regression Evaluation Check List

1. Does the sign of each X variable slope term make sense?

2. Does the t-test indicate that each X variable slope term is statistically significant either positive or negative? Does the P value support this?

3. How much of the dependent variable is explained by the independent variable(s) shown by Adjusted R2? Is the model underspecified?
4. Is the F-calc significant? Does the P value support this? Is the F-calc value more than 3 times the significant F-table value?

5. Are there 10 data observations per each X variable? Is the model overspecified?

6. Is the DW value close to 2?

7. Is the VIF close to 2?

8. Are the residuals random?

4-*

Applying Regression in Your Project

1. Create separate columns in your Minitab worksheet for Hold Out period information for each variable (Y HO, X1 HO, X2 HO, X3 HO) and for any transformations or dummy variables.
2. Use the Fit data to derive the best regression model using significant variables with strong R2 contribution.
3. Open the Calc function and enter the column where you want to place the forecast on your worksheet.
4. Enter the Regression equation in the Calc function without Y or = signs. Make sure that all other signs are entered in the regression equation.
5. Enter the intercept term in the Calc equation.
6. Enter the columns where your X variables for the hold out period are stored in the appropriate place in the equation
7. Run the equation in calc. Your forecast will appear in the designated worksheet column. This will be your forecast for the hold out period. Subtract the forecast from the Hold Out Y values to get residuals for error measure calculation and residual analysis.

4-*

Required Hold Out Period Performance Assessment
Remember to evaluate the hold out period versus Forecast residuals using the same analysis you used in Fit residual evaluation.
Use the same error measures for the Fit and Hold Out period.

4-*

Process for Regression Forecasting

1. Look for causal relationships between the dependent variable to be forecasted and causal independent variables. Clearly state the forecasting problem and your hypothesis of causation.
2. Visually inspect the data looking for trend, seasonality and cycles for all variables (dependent and independent).
3. Determine the best regression model to fit the data (trend or causal, linear or nonlinear, simple regression or multiple regression).
4. Forecast the dependent variable by substituting X values into the regression equation to get Ŷ. You can use the Minitab Calculator.
5. Specify the regression model by estimating the coefficients bo and b1, b2… Designate a hold out period
that is not used in the estimation of the coefficients.
6. Perform an "in sample" evaluation using error measures, error autocorrelations and hypothesis tests (test for the Critical BLUE Assumptions).

4-*

7. Perform a hold out or “out of sample” evaluation using the same error measures and error tests.
8. Adjust or respecify the model as necessary by transforming, adding or deleting independent (explanatory) variables.
9. Repeat “in sample” and “out of sample” error measures and tests to ensure accuracy of the model.
10. Use the tested and selected model to forecast beyond the boundary of known variables or actual observations.
11. Check the resulting forecast for reasonableness by plotting the actual observations along with the forecast results.
12. Declare the estimator BLUE (no multicollinearity, heteroscedasticity or serial correlation).
Process for Regression Forecasting (Continued)

4-*

Combining Forecasts Using Regression

The best results come from combining forecasts with varying methods or sources.

You can use any combination of quantitative and qualitative methods to derive a more accurate forecast than by one method alone.
[Diagram: forecasts of the same variable for the same period – from Regression, Exponential Smoothing, the Delphi Method, the Sales Survey Method and the Moving Average Method – are combined through regression into a single forecast that is more accurate than any of the component forecasts.]

4-*

Forecasts of the Same Series (Y) Can Be Quickly and Easily Combined
The forecast methods can be different – for example a regression forecast can be combined with a qualitative forecast, a decomposition forecast and an exponential smoothing forecast.

You must have the Fit and Forecast series for each forecast method along with the residuals and RMSE to combine forecasts effectively.

Selecting the appropriate weights to give to each forecast series can be done using several alternative methods but we will use regression to derive the weights (regression coefficients).

4-*

Regression Combination Procedure Overview
[Diagram: candidate forecasts from Regression, Exponential Smoothing, the Delphi Method, the Sales Survey Method and the Moving Average Method feed the steps below.]

Run correlation on the squared residuals of each method – select methods with low correlations.
Run regressions of the selected methods' Fit values against the actual Y values.
Select methods with high t and R square values and low RMSE.
Rerun the regression forcing the equation through the origin – the coefficients should sum close to one and represent the share of each forecast method.
Multiply the forecast values from each selected method by its regression coefficient and sum the results to get the combined forecast. Check the RMSE to ensure that it is low.

4-*

Combining a Subjective, Regression and a Decomposition Forecast
If you have forecasts from different methods you may combine them using regression to determine the share of the combined forecast given to each method.

4-*

Note combining all three forecasts produces the best results.

4-*

The General Form for Combining Forecasts With Regression

Ŷ1 is the forecast series from the best forecast method. When compared to the observed (Y) values it generates residuals that can be summarized with the lowest RMSE.

Ŷ2 is the next best forecast series; when compared to the Y observation values it generates residuals that can be summarized by RMSE.

Problem: what weights to apply to each forecast series to get the best forecast results (lowest RMSE).

ŶF = β1Ŷ1 + β2Ŷ2

You may have more than 2 forecasts to combine. When you do, step through the regressions: combine two forecasts, then three, and if necessary four, in successive regression runs. Note the RMSE and R square changes as you add other forecast (Ŷn) models. You must note the t values for the coefficients of your forecasts.

4-*

Using Regression to Combine Forecasts

This method uses regression to determine the share or percentage of each forecast method to use in the final forecast. Since qualitative factors may be represented with this method, forecast accuracy should improve.
You will need Actual Y values, Fits and Residuals and a Forecast for each forecast method to be combined.

1. Make sure there is no overlap in model composition – Run correlation coefficient(s) on the squared residuals between each forecast Ŷ1 and Ŷ2. Correlation should be low. (In Minitab/Stat/Basic Statistics/Correlation).

2. Run regression with the actual observations (Y) as the dependent variable and the Fit series for each forecast method, Ŷ1 as X1 and Ŷ2 as X2.
(In Minitab Stat/Regression/Stepwise)

3. Check the t-statistic for the intercept (constant) coefficient to ensure that it is not significant. You want it to fail the Ho hypothesis. If it is significant you may want to include another forecast as X3 and check the intercept t statistic again.

4-*

Using Regression to Combine Forecasts (continued)

4. Rerun the regression with the same data again and force the intercept through the origin (no intercept coefficient). Check the slope terms (coefficients) for forecast methods X1 and X2 to ensure they sum to approximately 1. The coefficients are the weights to apply to each forecast.
(In Minitab Stat/Regression/Regression)

5. Multiply each X variable forecast series by its coefficient weight (percentage) and sum for each period to get the forecast.

6. Check the RMSE and MAPE to ensure that they are lower than the individual forecast measures. (In Minitab use Calc to obtain RMSE, the square root of the average of the sum of the squared residuals.)
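A hedged Python sketch of steps 2 through 5 (the fit series are simulated here; in the course they would come from your Minitab runs):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
Y = np.linspace(100, 150, 30) + rng.normal(scale=2, size=30)  # actual observations
fit1 = Y + rng.normal(scale=3, size=30)    # fits from forecast method 1
fit2 = Y + rng.normal(scale=4, size=30)    # fits from forecast method 2
fits = np.column_stack([fit1, fit2])

with_const = sm.OLS(Y, sm.add_constant(fits)).fit()
print(with_const.tvalues[0])               # intercept t: you want it insignificant

through_origin = sm.OLS(Y, fits).fit()     # no constant = forced through the origin
w = through_origin.params                  # combination weights; should sum near 1
combined = fits @ w                        # the combined (weighted) series
print(w, w.sum())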

[Recovered equation fragments from the slides:
E(Y|X = x) = μY + ρ(σY/σX)(x − μX)
Yt = a + bXt + εt
s = √(∑t=1..N et²/N)]

New folder/POWERPOINTS/Revised Minitab Regression Forecasting Instructions.pptx

Instructions for Running Forecasts with Regression Model
Setting Up The Data in the Minitab Worksheet

In order to run the regression model to generate a forecast you should set up the data as described below.
Place the Y data in the first column (C1) followed by each independent variable (X) in sequence in columns (C2, C3 and C4).
If you have transforms of these variables substitute the columns where you have stored the transformations.
Next, if you have dummy variables place them in the next columns, for example C5 through C15 for monthly seasonal dummy variables or C5 through C8 for quarterly seasonal dummy variables.
Next place the Hold Out information for each of the variables in the next columns (Y hold out data in C16 and X hold out data in columns C17 through C19), followed by the forecast of the dummy variables for the hold out period in the next columns.
Now you should be prepared to run the regression. The regression results that you save in storage will appear after your designated variable columns.

Revenue Use Charge Customer Rev HO Use HO Charge HO Cust HO
19.3 10413 1.33 139881 161.5 12892 5.01 254890
20.4 11129 1.29 142806 164.9 13090 5.12 264950
20.9 11361 1.25 146616 172.8 13789 5.28 264500
21.9 11960 1.21 151640 178.2 14288 5.38 274232
23.4 12498 1.19 157205 180.5 14678 5.79 284750
24.5 12667 1.19 162328 184.6 15290 5.92 280495
25.8 12857 1.21 166558
30.5 13843 1.29 170317
33.3 14223 1.33 175536
37.2 14427 1.42 181553
42.5 14878 1.52 188325
48.8 15763 1.59 194237
55.4 15130 1.84 198847
64.3 14697 2.17 201465
78.9 15221 2.55 203444
56.5 14166 2.97 205533
114.6 14854 3.70 208574
129.7 14997 4.10 210811
126.1 13674 4.34 212865
132.0 13062 4.71 214479
138.1 13284 4.82 215610
141.2 13531 4.81 217164
143.7 13589 4.81 219968
149.2 13800 4.84 223364
146.1 13287 4.83 227575
153.9 13406 4.91 233795
146.9 12661 4.84 239733
156.8 12434 4.98 253364
Note that the Y data (Revenue) is listed first, followed by the three independent variables (Use, Charge and Customer).
If you have dummy variables they would follow in the next columns. Finally the Hold Out data for each variable is listed in separate columns.

Instructions for Running Forecasts with Regression Model
Running Exploratory Regressions

You should run exploratory regressions first in Minitab/Stat/Regression/Regression.
Enter the Y data in the "Response:" area and enter the X data in the "Predictors:" area in sequence. Under the Graphs option in the main menu select "Regular" and "Four in one".
In the main menu under Options select under Display "Variance inflation factors" and "Durbin-Watson statistic" and select OK.
Under the main menu "Storage" option select "Residuals", "Coefficients", "Fits" and "MSE" and select OK.
Under the main menu "Results" select the last choice, which should give you everything, and then select OK.
Select OK to run the regression and obtain the results. After you feel that you have a good regression and want to develop a forecast, go to the next steps on the following slide. See the last slide on Analysis of Regression Results to ensure that your regression is worthy of use in forecasting.

Instructions for Running Forecasts with Regression Model
Running a Regression Forecast

Select Minitab/Stat/Regression/Regression
In the main menu select the Y variable as the “Response”
Select the X variables in sequence for the “Predictors:”
Select the Dummy variables in sequence and add them to the “Predictors”

Under “Options…” make sure the Variance inflation factors and the Durbin-Watson statistic boxes are checked.
Ensure that 95.0 is filled in the confidence intervals
Now to predict with regression you use the forecasted or Hold Out X and Dummy variables
Again under the "Options" menu, in the "Prediction intervals for new observations" field, enter the columns for the X variables including the dummy variables in the same sequence as above.
Select Fits, Confidence limits and select OK.

Under the "Storage…" option in the main menu select Coefficients, Fits, Residuals and MSE, then select OK.
Under the “Results..” option in the main menu basically select everything in Results (The last option) then select OK.

In the main menu under “Graphs” select Regular and Four in one. Then select OK.
Select OK in the main menu to run the regression – you should get a residuals column (RESI1), a Fits column (FITS1), a forecast column (PFIT1) and lower and upper 95% confidence limits (CLIM1 and CLIM2).

Instructions for Running Forecasts with Regression Model (cont.)
Running a Regression Forecast

Analysis of Minitab Regression Results
Make sure that you observe the size and sign of the variable coefficients
Evaluate their t and p values for significance
Evaluate their VIF values for multicollinearity
Evaluate the regression DW statistic for serial correlation,
Evaluate the R square size for regression strength and the F value size for regression significance – note the p value – at or below .05.
In the four-in-one graph, check the residual histogram and normality plot for residual normality, and check the residuals-versus-order plot for randomness and for variation over time (the megaphone effect) indicating heteroscedasticity.
Next subject the residuals to autocorrelation analysis to check for randomness.
Evaluate the Fit error measures from the residuals (RMSE and MAPE) for size.
Evaluate the forecast of Y with a time series plot to check for reasonableness. Plot the forecast period residuals to check for randomness.
Evaluate the Forecast period error measures (RMSE and MAPE) for size – compare it to the Fit period error measures and note any increase or decrease.

Note for each of these checks you must comment on what you have found.

General Regression Analysis: Revenue versus Use, Charge, Customer

Regression Equation

Revenue = -65.625 + 0.00172984 Use + 29.4964 Charge + 0.000196841 Customer

Coefficients

Term Coef SE Coef T P 95% CI VIF
Constant -65.6250 14.8252 -4.4266 0.000 (-96.2227, -35.0274)
Use 0.0017 0.0015 1.1663 0.255 ( -0.0013, 0.0048) 2.1511
Charge 29.4964 2.4063 12.2579 0.000 ( 24.5300, 34.4628) 8.5151
Customer 0.0002 0.0001 1.4399 0.163 ( -0.0001, 0.0005) 10.2797

Summary of Model

S = 6.90038 R-Sq = 98.54% R-Sq(adj) = 98.36%
PRESS = 1378.33 R-Sq(pred) = 98.24%
Since this is multiple regression note the R-Sq(adj) value to determine regression explanatory power.
Note the sign and magnitude of the coefficients.
Note the t and p values for the coefficients and the VIF levels. VIF is used to check for multicollinearity.

Analysis of Variance

Source DF Seq SS Adj SS Adj MS F P
Regression 3 77036.9 77036.9 25679.0 539.30 0.000000
Use 1 2733.8 64.8 64.8 1.360 0.254957
Charge 1 74204.4 7154.5 7154.5 150.256 0.000000
Customer 1 98.7 98.7 98.7 2.073 0.162816
Error 24 1142.8 1142.8 47.6
Total 27 78179.7
Note the F value and its associated p value for the Regression. The F value must be larger than the F table value to be significant. If not, then R square = 0 (the null hypothesis holds).
Durbin-Watson statistic = 2.20656
Note the DW statistic and check it against the DW Lower table value. This is the test for serial correlation.

The Four in One Residual Plots
Check the residuals versus order plot for heteroscedasticity; check the histogram and normal probability plot for a normal distribution with zero mean.

RESI1 COEF1 FITS1
0.1477 -65.6250 19.152
0.6132 0.0017 19.787
1.1418 29.4964 19.758
1.2965 0.0002 20.603
1.3604 22.040
1.1596 23.340
0.7084 25.092
0.6031 29.897
0.5386 32.761
0.2467 36.953
0.4839 42.016
2.0245 46.776
1.4379 53.962
0.8378 63.462
2.9332 75.967
-30.4415 86.941
4.3374 110.263
6.9512 122.749
-1.8437 127.944
-6.1164 138.116
-3.8676 141.968
-1.2058 142.406
0.6419 143.058
4.2235 144.976
1.4770 144.623
5.4871 148.413
0.6717 146.228
4.1518 152.648
We have also saved the residuals, coefficients and fitted values in the worksheet.

Plot the fitted values appended with the forecast to check for reasonableness.

Run autocorrelations on the residuals with at least 24 lag periods to check randomness.
Regression Additions to the Worksheet

Check for random behavior in the residuals – ensure there is no trend, cycle or seasonality present in the residuals.
Evaluating Regression Fit Residuals

Check the Error Measures
RMSE = 6.389
MAPE = .0462

Calculate these measures manually for the Fit period. Then calculate the same measure values for the forecast period.
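A minimal Python sketch of the manual calculation (the six actual/fitted pairs below are illustrative stand-ins, not the worksheet values):

import numpy as np

actual = np.array([154.0, 160.0, 166.0, 172.0, 186.0, 190.0])  # illustrative Y
fitted = np.array([154.6, 160.2, 166.0, 171.8, 186.6, 190.7])  # illustrative fits
resid = actual - fitted

rmse = np.sqrt(np.mean(resid ** 2))        # root mean squared error
mape = np.mean(np.abs(resid / actual))     # a fraction; .0462 means 4.62%
print(rmse, mape)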

PFIT1 CLIM1 CLIM2
154.626 145.009 164.242
160.193 148.894 171.493
166.033 156.842 175.224
171.762 161.423 182.100
186.600 175.691 197.509
190.656 181.723 199.588

Next check the forecast for reasonableness and plot with the fitted values. The PFIT1 column is the predicted Fits or Forecast values. The CLIM1 and 2 are the 95% confidence bands around your forecast values. There is a 95% probability that new observed values of Y will fall within this band.

You may plot this with the confidence interval as well if you wish.
Evaluate the Forecast

‘Rev HO’-‘PFIT1’
6.87425
4.70692
6.76692
6.43844
-6.10009
-6.05572
Forecast residuals

In Minitab calculator subtract the Hold Out forecast values (PFIT1) from the Hold Out actual values to get the Forecast residuals.
Perform a time series plot of the Forecast period residuals and check for randomness.
Evaluate the Forecast (cont.)

Evaluate the Forecast (cont.)

Forecast: RMSE = 6.199, MAPE = .036
Fit: RMSE = 6.389, MAPE = .0462

Calculate the Forecast period error measures manually and compare them with the Fit period error measures.
Note if the error measures became worse or stayed basically the same for both periods. Did the forecast "accuracy" stay consistent with the model "goodness of fit"? Comment on what this means relative to the application of the regression model to the forecast of Y.

New folder/Project Outline Revised
Economics 309
Applied Financial and Economic Forecasting

Professor:
Stanley Holmes, Ph.D.
Email:

Stanley.Holmes@tamuc.edu

Office:

BA 132D
Phone:

(903) 468-6029 or (903) 365-7190

Fax:

(903) 886-5601

Semester Project

The objective of the project is to simulate a problem you might face in your first job or business—developing the best 2 year forecast for company sales. Forecasting is an important process in almost all businesses and is an important skill to possess as an employee. The project is due by midnight Central Time on 12/2/2013. You are strongly encouraged to spread the work throughout the semester.
1. Selecting independent (X) variables for your project

Hopefully you have all passed this stage. You need to pick X variable quarterly data sets. Make sure the data are NOT seasonally adjusted. The data relationships should be logical and not spurious. For example, do not use the price of hamburger to forecast auto sales.
2. Writing an abstract

After you pick your topic, it is a good idea to write a brief abstract to help focus your thoughts. The abstract should not be more than a page. It should include
(i) A statement of your forecasting problem: develop the best forecast of each variable.
(ii) Information about and description of your data

(iii) Description of your proposed models

3. Body of the paper

The paper will contain detailed explanations of your thought processes and methodology. At every stage you need to answer the following questions for each variable:
(i) What did you do? (ex: Winters' method with these parameter values…)

(ii) Why did you do it?

(iii) What did you find? (Interpretation of your results)

(iv) What is your conclusion?
Remember this is a business project report for your company – Do Not Show failed model attempts. Show only the best model for each variable for each forecast method.
4. Methodology

The body of your paper will include all four major forecasting techniques:

(i) Smoothing for each X variable (Chapter 8): you need to explain what, why, how, etc.

(ii) Box-Jenkins for each X variable (Chapters 9, 10, 11, 12)

(iii) Decomposition for each X variable (Chapter 7)

(iv) Multiple Regression forecast of the sales variable (Y) (Chapters 3, 4, 5 and 6): You will need other variable(s) to help explain the variation in the sales variable of interest. You need to start looking for those explanatory (X) variables!!
5. Appendix

Your appendix should contain all the relevant supporting printouts such as the tests, plots, graphs etc. that were not used in the discussion in the body of the presentation. It is better to divide the appendix into parts, where each part represents the output from one methodology (ex: Appendix A: Variable Data and citations, Appendix B: X variable outputs, Appendix C: Sales Revenue outputs). Don't forget to name or number the appendices; you will need to refer to them when you are talking about a specific methodology.
6. Final Report

Your final report should include an executive summary of your findings, which will include the conclusions (Which of the techniques works best, and why, for each X variable?). You can add the results to your abstract and make the abstract into an executive summary. You need to have an introduction to talk about your topic at greater length. The report will include sections 3, 4 and 5 above and a conclusion. Be detailed and complete in all your sections. This is a case where more is better than less as you explain to me what you have done.

Please remember that there is NOT one right way of preparing a paper. Be precise, explain everything in detail and be structured. The outline is ONLY to remind you about the required parts; its arrangement is completely up to you.

New folder/ProjectProposal Eco 309 Revised
Project Proposal
Due by Midnight 9/16/2013
You have been hired by a major corporation to forecast their quarterly sales for two years. As a result, you need to show progress in the initial forecasting steps.

1. Your proposal should begin with a hypothesis statement about the variables that you selected to use in the forecast of company sales. Include a brief statement of why you chose each variable to forecast sales.

2. Obtain quarterly sales/revenue data for the company you are assigned and load it into Minitab and save it as your Eco 309 Project. See the email that I sent you earlier with the quarterly data. You can refer to the project outline for further details. You may also download and save the data as an EXCEL file. The first column should be the dates and second column should be your data. Always label your columns appropriately.
3. Go to one of the approved data sources. This time you will need to obtain a minimum of 3 related variables. These variables should be values determined outside of the firm (not company accounting data). We will call these related variables the X variables. These variables will be quarterly, consistent with your Y variable (if your Y variable is quarterly, your X variables will also be quarterly). All variables should be from the same time period. You can refer to the project outline for further details.

4. As an option for one of the three X variables you may go to the company web site, annual reports or 10-K SEC filings and obtain any relevant internal data that you wish to include as another X variable. This could be R&D Expenses, Marketing Expenses, Total Expenses, Product Units, Employees, etc. that you believe are important to the output of the company. Make sure that this variable is quarterly as well and covers the same time period as the Y variable.

5. As another option for one of the three X variables you may go to the company data in Doc Sharing and see if you can find either competitor companies or companies that produce goods that are complementary to your assigned company's. You may also look for competitors' sales data from 10-Ks and annual reports. Remember that this must also cover the entire time period you have for the assigned Y data. You may use the revenue/sales figures for these companies as another X variable.
6. Download and save all your X variables in your Minitab project worksheet labeled appropriately and save Minitab project. You may also save them as an EXCEL file. Recall that the first column is dates and second column is your Y variable. Your third column will be X1, 4th will be X2 and so forth. Label your X variables properly so I can understand what those variables are.

7. At this point you should have a copy of dates (C1), Y variable (C2) and X variables (C3 – C5) in a MINITAB project worksheet. Make sure that the variable columns are labeled correctly so I can understand which variables you are analyzing.

8. Run individual “Time Series Plots” and “ACF” on all your variables. Note the Trend, Cycle and Seasonality characteristics that are indicated by these plots. Note which X variable trend, cycle and seasonality characteristics are similar to the Y variable characteristics.
9. Produce descriptive statistics for all your variables. Comment on the mean, range and standard deviation and what they indicate relative to each variable.

10. Run Scatter plots of Y against each X separately. Note the linearity of the relationship. Classify it as positive or negative, weak, moderate or strong.
11. Develop and show the correlation matrix (you can find it under Stat>Basic Statistics>Correlation) of Y and all Xs. Enter the Y values first, followed by each X that you have, in order of the data columns. You need to tell me which Xs are more related to Y and which Xs are not. You need to prove that the relationships are statistically significant. (A computational sketch of steps 9 and 11 follows this list.)
12. Your proposal should be a Word document and include the pasted relevant Minitab output as well as the data used, in a table at the end of the document. The table should be followed by a brief description and source of each variable. You should cite the webpage or document page that each variable came from.

Do not submit Minitab files. Submit the Word proposal document in the proposal dropbox.
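For reference, steps 9 and 11 can also be reproduced outside Minitab. A minimal sketch in Python, assuming the variables have been exported to a hypothetical eco309_project.csv with one column per variable:

    import pandas as pd
    from scipy import stats

    df = pd.read_csv("eco309_project.csv")   # columns: CostCo, Walmart, Exports, IPIC
    print(df.describe())                     # step 9: count, mean, std, min, max, quartiles

    print(df.corr().round(2))                # step 11: correlation matrix, Y first
    for x in ["Walmart", "Exports", "IPIC"]:
        r, p = stats.pearsonr(df["CostCo"], df[x])
        print(x, round(r, 2), round(p, 4))   # p < 0.05 => significant relationship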

New folder/Seasonal Forecast Dummy Variables For Regression.xlsx
Monthly Seasonal Dummy Variable

S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11. Simply cut and paste for each year. You should have 11 monthly seasonal columns that range over the entire fit period data. You also need 11 monthly columns covering 1 year of monthly data for the hold out period. These S variables should be treated like any other variable relative to significance tests. (A generation sketch follows the quarterly table below.)

0 0 0 0 0 0 0 0 0 0 0

1 0 0 0 0 0 0 0 0 0 0

0 1 0 0 0 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0

0 0 0 1 0 0 0 0 0 0 0

0 0 0 0 1 0 0 0 0 0 0

0 0 0 0 0 1 0 0 0 0 0

0 0 0 0 0 0 1 0 0 0 0

0 0 0 0 0 0 0 1 0 0 0

0 0 0 0 0 0 0 0 1 0 0

0 0 0 0 0 0 0 0 0 1 0

0 0 0 0 0 0 0 0 0 0 1

Quarterly Seasonal Dummy Var

S1 S2 S3. Simply cut and paste for each year. You should have 3 quarterly seasonal columns that range over the entire fit period data. You also need 3 quarterly columns covering 2 years of quarterly data for the hold out period. These S variables should be treated like any other variable relative to significance tests.

0 0 0

1 0 0

0 1 0

0 0 1
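A minimal sketch of generating the quarterly dummies programmatically rather than by cut-and-paste, matching the 0 0 0 / 1 0 0 / 0 1 0 / 0 0 1 pattern above (the quarterly date range is hypothetical):

    import pandas as pd

    dates = pd.period_range("1995Q1", "2013Q2", freq="Q")
    q = dates.quarter
    # S1 flags quarter 2, S2 flags quarter 3, S3 flags quarter 4;
    # quarter 1 is the omitted base category.
    dummies = pd.DataFrame({"S1": (q == 2).astype(int),
                            "S2": (q == 3).astype(int),
                            "S3": (q == 4).astype(int)}, index=dates)
    print(dummies.head(8))

The 11 monthly dummies are built the same way, flagging months 2 through 12.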

Sheet3

New folder/SOLVED ASSIGNMENTS/chapter_2_assignment_2 (1) x
Chapter 2 Assignment 2 Due by midnight September 6.
Submit your hypothesis relative to the X variables that you are considering that cause Y (your assigned company sales revenue) to change. Write your hypothesis in the form
Y = f(X1, X2, X3 etc..)  
Be sure to include the name in place of each X variable so I can tell what it is.   Also, include the sign (+ or -) in front of the name to tell me the hypothesized direction of the relationship.  I do not need the data at this point.  Submit your hypotheses with your first initial, last name and Assignment 2 in the file name in the assignment 2 dropbox.

Chapter 2 Assignment 2

The variables considered in the multiple regression model are:
Y= Quarterly Revenue of CostCo from 1st quarter of 1995 to 2nd quarter of 2013
X1 = Quarterly Revenue of Walmart from 1st quarter of 1989 to 2nd quarter of 2013
X2= Total monthly export data of goods and services published by U.S. Census Bureau, Foreign Trade Division from January 1995 to June 2013.
X3=Industrial production Index of crude oil from 1972 till 2013.
The Multiple Regression Equation
Y= α + β1 (X1) + β2 (X2) + β3 (X3)
The model will develop the best quarterly forecast of CostCo Revenue.
CostCo and Walmart are competitors, so the relation between the two companies' quarterly revenues should be negative.
So β1 is negative.
As exports increase, so will revenue.
So β2 is positive.
The Industrial Production Index of crude oil will determine the cost of exporting; as that cost increases, revenue will decrease, so the relation will be negative.
So β3 is negative.

So the model will be
The Multiple Regression Equation
Y= α – β1 (X1) + β2 (X2) – β3 (X3)
H01: β1 =0
H02: β2 =0
H03: β3 =0
Beta = 0 means that the corresponding independent variable does not affect the variable Y.
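A minimal sketch of estimating this equation and testing each H0: beta = 0, assuming the same hypothetical eco309_project.csv used in the proposal sketch:

    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("eco309_project.csv")   # CostCo (Y), Walmart, Exports, IPIC
    X = sm.add_constant(df[["Walmart", "Exports", "IPIC"]])
    fit = sm.OLS(df["CostCo"], X).fit()
    # Each coefficient's t and p value tests H0: beta = 0; the estimated
    # signs can be compared with the hypothesized directions above.
    print(fit.summary())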

New folder/SOLVED ASSIGNMENTS/Chapter_3_Assignment_3_Part_1 (1) x
Chapter 3: Chapter 3 – Assignment 3 Project Proposal

The variables considered for the study are described below.  

Y= Quarterly Revenue of CostCo from 1st quarter of 1995 to 2nd quarter of 2013
X1 = Quarterly Revenue of Walmart from 1st quarter of 1989 to 2nd quarter of 2013
X2= Total monthly export data of goods and services published by U.S. Census Bureau, Foreign Trade Division from January 1995 to June 2013.
X3=Industrial production Index of crude oil from 1972 till 2013.

Hypothesis

Usually a firm's revenue is closely associated with, and can be predicted from, its competitors' revenue and sales. In this paper, it is assumed that the quarterly revenue of Walmart linearly affects the quarterly revenue of its competitor CostCo.
The Industrial Production Index of crude oil is a macroeconomic indicator that affects stock prices. Oil prices can be shown to influence macroeconomic indicators and stock market returns by examining the effects of oil prices on industrial production and inflation (Darby, 1982; Hamilton, 1983; Burbridge and Harrison, 1984).
Higher oil prices result in higher costs of production and, subsequently, in lower production or lower expected earnings (Jones et al, 2004).
Oil prices influence the CPI, since higher oil prices can result in higher CPI levels, and this could lead to higher inflation.

Exports are also a good indicator of economic growth, and the revenue of CostCo is expected to relate directly to the export figures of the country.

Time series and ACF plot

The figures below show the time series plots of the variables Y, X1, X2 and X3. The time series plot shows that CostCo Revenue has a trend pattern. It is a normal feature that sales increase with time, but there is a sharp decline in Walmart Revenue in 2008 and 2009, and the reason is obvious: CostCo could sustain its sales even in the recession, but such was not the case with every company. The time series of Exports also shows a trend. The time series plot of IPIC, however, shows sharp increases during the recession; when the recession hit, the cost of oil increased. Otherwise the variable shows little variation and a significant cyclical component.

The plots in the figures below show the ACFs of the four variables. The default lag option was taken for all variables. Six lags are significant in the CostCo Revenue and Walmart Revenue data, but the ACF decays to zero gradually. The other variables also show six significant lags.

Descriptive Statistics

Statistic            CostCo Revenue   Walmart Revenue   Exports      IPIC
Mean                 13024.49         43364.07          112528.34    166.37
Standard Error       785.86           2810.65           4539.15      19.82
Standard Deviation   6760.27          24178.16          39047.24     170.53
Kurtosis             -0.42            -0.83             -0.93        13.34
Skewness             0.61             0.69              0.67         3.68
Range                28321.76         82871.70          126179.00    898.55
Minimum              3896.24          16206.30          64349.00     100.52
Maximum              32218.00         99078.00          190528.00    999.07
Count                74.00            74.00             74.00        74.00

The above analysis gives the details of all the variables in a snapshot. We can see the variability in the data from the standard deviation; the highest variability is seen in Exports, and the range also supports this.
The correlation matrix below shows the relations between the variables. The revenues of CostCo and Walmart are negatively related, as they are competitors. The high correlation with Exports explains CostCo's consistent sales growth irrespective of the recession, while the cost of transportation (IPIC) has no direct relation with CostCo's sales: whether the cost is high or low, the company has to bear the expense, and even a high cost does not deter profits.

 

                   CostCo Revenue   Walmart Revenue   Exports   IPIC
CostCo Revenue      1.00
Walmart Revenue    -0.15             1.00
Exports             0.94            -0.25              1.00
IPIC                0.17             0.43              0.07     1.00

We can view the relations between the variables in scatter plots.

The scatter plot of the two companies' revenue data shows a negative and weak relation. The strongest relation is seen with Exports, and the relation between revenue and IPIC appears spurious.
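A minimal sketch of one of the scatter plots, again assuming the hypothetical eco309_project.csv (matplotlib assumed to be installed):

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("eco309_project.csv")
    df.plot.scatter(x="Exports", y="CostCo")     # the strongest linear relation
    plt.title("Scatterplot of CostCo Revenue vs Exports")
    plt.savefig("scatter_costco_exports.png")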

References

Burbridge, J., and A. Harrison. (1984), Testing for the effects of oil-price rises using vector autoregressions. International Economic Review 25(1):459-484
Darby, M.R. (1982), The price of oil and world inflation and recession. American Economic Review 72(4):738-751
Jones, D.W., P.N. Lelby., and I.K. Paik. (2004), Oil price shocks and the macroeconomy: what has been learned since 1996. Energy Journal 25(2):1-32

[Figure: Time Series Plot of CostCo Revenue (x-axis: Index; y-axis: CostCo Revenue).]
[Figures: four Autocorrelation Function panels, each titled "Autocorrelation Function for CostCo Revenue (with 5% significance limits for the autocorrelations)" (x-axis: Lag; y-axis: Autocorrelation).]
[Figure: Scatterplot of CostCo Revenue vs Walmart Revenue.]
[Figure: Scatterplot of CostCo Revenue vs Exports.]
[Figure: Scatterplot of CostCo Revenue vs IPIC.]
[Figure: Time Series Plot of IPIC.]
[Figure: Time Series Plot of Walmart Revenue.]
[Figure: Time Series Plot of Exports.]

New folder/X1- WALMART.xlsx
Revenue

WALMART

Date Revenue

3/31/89 5373.2598

6/30/89 6046.4131

9/29/89 6283.4971

12/29/89 8107.4858

3/30/90 6768.1948

6/29/90 7543.5098

9/28/90 7930.9512

12/31/90 10358.9385

3/29/91 9280.5703

6/28/91 10339.9717

9/30/91 10627.5

12/31/91 13638.8604

3/31/92 11649.4297

6/30/92 13028.4502

9/30/92 13683.8203

12/31/92 17122.0703

3/31/93 13920.4072

6/30/93 16236.5771

9/30/93 16826.998

12/31/93 21001.5605

3/31/94 17686.1348

6/30/94 19942.3145

9/30/94 20417.7168

12/30/94 24447.834

3/31/95 20440

6/30/95 22723

9/29/95 22913

12/29/95 27551

3/29/96 22772

6/28/96 25587

9/30/96 25644

12/31/96 30856

3/31/97 25409

6/30/97 28386

9/30/97 28777

12/31/97 35386

3/31/98 29819

6/30/98 33521

9/30/98 33509

12/31/98 40785

3/31/99 34717

6/30/99 38470

9/30/99 40432

12/31/99 51394

3/31/00 42985

6/30/00 46112

9/29/00 45676

12/29/00 56556

3/30/01 48052

6/29/01 52799

9/28/01 52738

12/31/01 64210

3/29/02 52126

6/28/02 56271

9/30/02 55241

12/31/02 66400

3/31/03 56718

6/30/03 62637

9/30/03 62480

12/31/03 74494

3/31/04 64763

6/30/04 69722

9/30/04 68520

12/31/04 82217

3/31/05 71680

6/30/05 76697

9/30/05 75397

12/30/05 88418

3/31/06 79676

6/30/06 85430

9/29/06 84467

12/29/06 99078

3/30/07 86410

6/29/07 92999

9/28/07 91865

12/31/07 107202

3/31/08 94940

6/30/08 102342

9/30/08 98345

12/31/08 108747

3/31/09 94242

6/30/09 100876

9/30/09 99373

12/31/09 113622

3/31/10 99811

6/30/10 103726

9/30/10 101952

12/31/10 116360

3/31/11 104189

6/30/11 109366

9/30/11 110226

12/30/11 123169

3/30/12 113018

6/29/12 114296

9/28/12 113929

12/31/12 127919

3/29/13 114187

6/28/13 116945

New folder/X2-EXPORTS.xls
Exhibit 1 History

U.S. International Trade in Goods and Services, 1992 – Present

In millions of dollars. Seasonally adjusted; details may not equal totals due to seasonal adjustment and rounding.

Goods data presented on a Balance of Payments (BOP) Basis. Source: U.S. Census Bureau, Foreign Trade Division.

Exports

Period Total Goods Services

1995

Jan. – Dec. 794,387 575,204 219,183

March 64,349 46,882 17,467

April 65,074 47,165 17,909

May 65,755 47,742 18,013

June 64,855 47,614 17,241

July 66,560 47,985 18,575

August 67,469 48,770 18,699

September 68,943 49,781 19,162

October 68,105 49,272 18,833

November 68,009 48,852 19,157

December 69,071 49,653 19,418

1996

Jan. – Dec. 851,602 612,113 239,489

January 68,299 49,554 18,745

February 69,899 50,973 18,926

March 69,797 50,026 19,771

April 70,126 50,797 19,329

May 71,029 50,914 20,115

June 71,056 51,150 19,906

July 69,404 50,030 19,374

August 70,563 50,877 19,686

September 70,553 50,949 19,604

October 73,769 52,306 21,463

November 74,263 52,698 21,565

December 72,842 51,840 21,002

1997

Jan. – Dec. 934,453 678,366 256,087

January 72,910 52,260 20,650

February 75,225 54,470 20,755

March 77,050 55,940 21,110

April 77,769 56,621 21,148

May 77,883 56,386 21,497

June 78,889 57,242 21,647

July 80,099 58,545 21,554

August 78,622 56,895 21,727

September 79,287 57,714 21,573

October 79,383 57,655 21,728

November 78,300 56,982 21,318

December 79,038 57,654 21,384

1998

Jan.- Dec. 933,174 670,416 262,758

January 79,008 57,500 21,508

February 77,931 56,627 21,304

March 78,812 56,934 21,878

April 77,626 55,229 22,397

May 77,147 55,073 22,074

June 76,961 55,257 21,704

July 76,149 54,618 21,531

August 75,512 53,945 21,567

September 77,179 55,490 21,689

October 79,351 56,962 22,389

November 79,117 56,949 22,168

December 78,382 55,833 22,549

1999

Jan.-Dec. 967,008 698,218 268,790

January 77,835 56,269 21,566

February 77,436 55,645 21,791

March 78,243 55,990 22,253

April 78,716 56,882 21,835

May 78,782 56,723 22,059

June 79,070 56,665 22,405

July 80,019 57,500 22,519

August 81,508 58,930 22,578

September 82,718 59,920 22,798

October 83,410 60,274 23,136

November 84,286 60,836 23,450

December 84,983 62,585 22,397

2000

Jan.-Dec. 1,072,782 784,781 288,002

January 85,556 62,483 23,074

February 86,055 62,425 23,631

March 87,277 63,539 23,738

April 88,643 64,276 24,366

May 87,805 64,002 23,804

June 90,549 66,347 24,202

July 90,054 66,022 24,032

August 92,058 67,903 24,155

September 91,981 67,845 24,136

October 91,330 67,048 24,282

November 91,213 66,876 24,337

December 90,260 66,015 24,245

2001

Jan.-Dec. 1,007,725 731,189 276,537

January 90,277 65,974 24,304

February 90,282 66,267 24,014

March 88,714 64,521 24,193

April 86,988 63,257 23,731

May 87,482 63,875 23,607

June 85,265 61,355 23,910

July 82,967 59,402 23,565

August 83,730 59,999 23,731

September 77,286 56,516 20,770

October 78,114 57,068 21,046

November 78,439 56,802 21,637

December 78,180 56,152 22,027

2002

Jan.-Dec. 980,879 697,439 283,440

January 78,966 56,403 22,563

February 78,910 56,091 22,819

March 79,613 56,178 23,435

April 81,500 58,316 23,184

May 81,682 58,173 23,509

June 82,705 59,192 23,513

July 83,125 59,526 23,599

August 83,484 59,568 23,916

September 82,893 59,194 23,699

October 82,565 58,382 24,184

November 83,747 59,254 24,493

December 81,686 57,162 24,524

2003

Jan.-Dec. 1,023,937 729,816 294,121

January 82,502 58,645 23,857

February 83,186 59,347 23,840

March 82,944 59,435 23,509

April 81,511 58,752 22,759

May 82,325 58,769 23,556

June 84,810 60,892 23,917

July 85,520 61,158 24,362

August 84,409 59,679 24,730

September 86,399 61,349 25,050

October 88,632 62,793 25,839

November 90,998 64,916 26,082

December 90,699 64,081 26,618

2004

Jan.-Dec. 1,163,724 821,986 341,739

January 90,225 63,142 27,084

February 93,935 66,330 27,605

March 96,472 68,352 28,120

April 95,889 67,682 28,207

May 97,387 69,346 28,041

June 94,993 66,887 28,106

July 96,694 68,368 28,326

August 97,147 68,836 28,311

September 98,543 69,994 28,549

October 100,039 70,788 29,251

November 99,829 70,004 29,825

December 102,571 72,258 30,313

2005

Jan.-Dec. 1,288,257 911,686 376,571

January 102,986 72,545 30,440

February 103,806 73,071 30,736

March 104,752 73,599 31,153

April 107,031 76,295 30,736

May 106,552 75,679 30,873

June 106,652 75,656 30,996

July 106,977 75,853 31,124

August 108,361 77,028 31,333

September 107,166 75,230 31,936

October 109,634 77,329 32,305

November 110,859 78,733 32,126

December 113,482 80,668 32,814

2006

Jan.-Dec. 1,460,792 1,039,406 421,386

January 115,087 81,867 33,221

February 116,142 82,821 33,321

March 118,875 84,620 34,255

April 118,976 84,467 34,509

May 120,921 85,714 35,207

June 122,446 87,613 34,834

July 120,185 85,551 34,634

August 122,669 87,829 34,840

September 123,774 88,611 35,163

October 125,746 89,245 36,501

November 127,428 90,221 37,208

December 128,544 90,848 37,696

2007

Jan.-Dec. 1,652,859 1,163,605 489,255

January 129,932 92,072 37,860

February 128,107 90,451 37,657

March 132,823 93,916 38,908

April 132,273 93,548 38,725

May 135,043 95,577 39,465

June 136,453 96,415 40,038

July 137,934 96,946 40,988

August 140,806 98,715 42,091

September 141,627 99,587 42,039

October 144,535 101,039 43,496

November 146,175 102,108 44,067

December 147,152 103,232 43,920

2008

Jan.-Dec. 1,840,332 1,307,329 533,003

January 150,228 106,496 43,732

February 152,421 108,998 43,423

March 151,927 107,797 44,130

April 156,347 112,083 44,263

May 158,368 112,853 45,515

June 163,317 117,665 45,651

July 165,620 120,030 45,590

August 163,034 117,697 45,337

September 153,641 109,104 44,537

October 151,341 106,525 44,816

November 142,077 99,000 43,077

December 132,012 89,081 42,932

2009

Jan.-Dec. 1,578,187 1,069,475 508,712

January 125,255 83,628 41,627

February 127,354 85,985 41,368

March 125,990 84,402 41,588

April 124,281 82,498 41,783

May 125,889 84,443 41,446

June 128,456 86,856 41,600

July 130,312 88,666 41,646

August 130,895 88,738 42,158

September 135,643 92,683 42,961

October 139,569 95,771 43,798

November 140,540 96,191 44,349

December 144,001 99,615 44,386

2010

Jan.- Dec. 1,844,468 1,288,795 555,674

January 143,622 99,281 44,341

February 144,675 100,501 44,174

March 148,646 104,197 44,449

April 147,331 103,259 44,071

May 151,958 106,469 45,489

June 151,418 105,394 46,024

July 154,530 108,061 46,469

August 155,320 108,346 46,974

September 156,049 108,339 47,709

October 161,577 113,288 48,288

November 163,165 114,572 48,593

December 166,177 117,086 49,091

2011

Jan. – Dec. 2,112,825 1,495,853 616,973

January 168,079 118,776 49,303

February 166,635 117,388 49,247

March 174,342 123,930 50,412

April 175,948 125,314 50,635

May 176,153 124,796 51,357

June 173,180 121,205 51,975

July 179,465 126,455 53,010

August 179,917 126,626 53,291

September 181,172 128,503 52,669

October 180,547 128,563 51,984

November 178,267 126,685 51,582

December 179,118 127,611 51,507

2012

Jan. – Dec. 2,210,585 1,561,239 649,346

January 179,477 127,311 52,167

February 182,064 128,383 53,680

March 186,505 131,865 54,640

April 184,267 130,246 54,021

May 184,217 130,175 54,042

June 185,218 131,446 53,773

July 183,375 130,276 53,099

August 182,071 128,446 53,625

September 186,829 132,752 54,078

October 182,655 127,987 54,668

November 185,220 129,667 55,552

December 188,686 132,685 56,002

2013

Jan.- Dec.

January 186,607 130,610 55,997

February 186,698 131,002 55,696

March 184,578 129,093 55,485

April 186,941 130,907 56,034

May 186,487 130,029 56,458

June 190,528 133,811 56,717

(1) Data presented on a Balance of Payments (BOP) Basis.

Source: U.S. Census Bureau, Foreign Trade Division.

New folder/X3-Industrial Production Index-Crude oil.xlsx
Sheet1

1972 01 158.7443

1972 02 162·8406

1972 03 165·2191

1972 04 166·2669

1972 05 168.0168

1972 06 166·4096

1972 07 165.9741

1972 08 165·7474

1972 09 166.178

1972 10 165·7365

1972 11 164·7750

1972 12 163·1879

1973 01 160.3667

1973 02 164·1606

1973 03 162.162

1973 04 162·4721

1973 05 161·9365

1973 06 161.0542

1973 07 161.1462

1973 08 160·3289

1973 09 158.5761

1973 10 161·2958

1973 11 160·2187

1973 12 158.5315

1974 01 156.2785

1974 02 159.8709

1974 03 156·8749

1974 04 156·6694

1974 05 155·8879

1974 06 153·6400

1974 07 153.6171

1974 08 152.186

1974 09 149·9181

1974 10 150·7093

1974 11 149.978

1974 12 149.2543

1975 01 147.9968

1975 02 150.3171

1975 03 148.6766

1975 04 148.0239

1975 05 146.6621

1975 06 147.3516999999

1975 07 145·8805

1975 08 144·3865

1975 09 144·9174

1975 10 145·6905

1975 11 144.8879

1975 12 144.4735

1976 01 144·0893

1976 02 144.0734

1976 03 144·0848

1976 04 141.413

1976 05 142.2269

1976 06 141·6784

1976 07 4142·2427

1976 08 141·9319

1976 09 142.5936

1976 10 141·1014

1976 11 141.3372

1976 12 140.9838

1977 01 137.3857

1977 02 142·3259

1977 03 141.46

1977 04 142.4052

1977 05 141.1786

1977 06 141.581

1977 07 141.5752

1977 08 144·9426

1977 09 147·8194

1977 10 149.4415

1977 11 149.5223

1977 12 147.9216

1978 01 145.6607

1978 02 145·9635

1978 03 151·6977

1978 04 153·2382

1978 05 153.3617

1978 06 153.4451

1978 07 152.0998

1978 08 152.0915

1978 09 152·8306

1978 10 153·1397

1978 11 151·7318

1978 12 150.3508

1979 01 147.0354

1979 02 148.0603

1979 03 149·2714

1979 04 148·4123

1979 05 149·2501

1979 06 146·4007

1979 07 144·9715

1979 08 148.1553

1979 09 147·7219

1979 10 149·3628

1979 11 151·5929

1979 12 149·1646

1980 01 150·0691

1980 02 150.5629

1980 03 150.4206

1980 04 150·1000

1980 05 149·3044

1980 06 147·8611

1980 07 147.7561

1980 08 145.357

1980 09 148.9901

1980 10 147.4873

1980 11 146.8522

1980 12 148·7383

1981 01 147.5423

1981 02 148·6522

1981 03 148.8157

1981 04 147·8094

1981 05 146.7972

1981 06 148.9034

1981 07 146·6517

1981 08 148.1393

1981 09 148·4973

1981 10 147.7784

1981 11 148·1054

1981 12 148·0803

1982 01 146.4898

1982 02 149.9224

1982 03 149·3795

1982 04 147·9897

1982 05 149·5943

1982 06 149.0208

1982 07 149·0753

1982 08 148.6548

1982 09 149·8404

1982 10 149·8327

1982 11 149·8673

1982 12 148.0985

1983 01 149.7214

1983 02 150·8442

1983 03 149.8064

1983 04 151·1060

1983 05 148·6313

1983 06 149·2256

1983 07 148.6437

1983 08 149.3528

1983 09 151·1556

1983 10 150·9105

1983 11 150.946

1983 12 144.3528

1984 01 152.6574

1984 02 152·6873

1984 03 149·3471

1984 04 152.3901

1984 05 153.9922

1984 06 152.2304

1984 07 152·7714

1984 08 151·5506

1984 09 154·5783

1984 10 153.1063

1984 11 154·3619

1984 12 152.9655

1985 01 150·2606

1985 02 155·1738

1985 03 156·2941

1985 04 155.5855

1985 05 156.943

1985 06 155.0865

1985 07 153.7954

1985 08 151·4800

1985 09 153.9995

1985 10 154·2954

1985 11 153·2238

1985 12 155.259

1986 01 179.3726

1986 02 180.1333

1986 03 177.0527

1986 04 174.0273

1986 05 173.819

1986 06 169.8291

1986 07 170·2961

1986 08 164·7543

1986 09 163.7461

1986 10 165.9276

1986 11 165.6103

1986 12 164·2555

1987 01 167·1209

1987 02 165·1146

1987 03 166.6979

1987 04 167·3469

1987 05 164·1641

1987 06 163·0407

1987 07 162.5114

1987 08 161·6864

1987 09 161·6198

1987 10 164·7081

1987 11 165·3250

1987 12 163·7629

1988 01 162·4350

1988 02 164.8889

1988 03 164.885

1988 04 163·2004

1988 05 162·0441

1988 06 160·8931

1988 07 158.3332

1988 08 159.1189

1988 09 155·5071

1988 10 158.0125

1988 11 158·0219

1988 12 156·4093

1989 01 156.3196

1989 02 153·3893

1989 03 149·1765

1989 04 153·0495

1989 05 153.9275

1989 06 150.148

1989 07 146·5969

1989 08 148·5712

1989 09 148·6387

1989 10 146·7653

1989 11 148·4061

1989 12 144.4817

1990 01 148.6052

1990 02 147.6428

1990 03 146·3855

1990 04 145·8703

1990 05 144.319

1990 06 139.9352

1990 07 141·2651

1990 08 143·4982

1990 09 142·2618

1990 10 148·5282

1990 11 145·4675

1990 12 144.5034

1991 01 147·7005

1991 02 150·3994

1991 03 148·6072

1991 04 147·8674

1991 05 145.9014

1991 06 144·1607

1991 07 144.6849

1991 08 144·0780

1991 09 145·0917

1991 10 146·4642

1991 11 144·3066

1991 12 143·7386

1992 01 144.9528

1992 02 145·5044

1992 03 144·7117

1992 04 143·6124

1992 05 141·1728

1992 06 141·1471

1992 07 140.4342

1992 08 136·3120

1992 09 138.4459

1992 10 140.329

1992 11 138·3279

1992 12 139·8856

1993 01 137·0847

1993 02 136·7226

1993 03 137.3434

1993 04 135.5157

1993 05 134.8389

1993 06 133·8117

1993 07 131·7173

1993 08 133.0835

1993 09 132.1737

1993 10 134·6773

1993 11 136·1108

1993 12 135.046

1994 01 134·2427

1994 02 133·3290

1994 03 132.8426

1994 04 130.2139

1994 05 131·7114

1994 06 130.1844

1994 07 128.0231

1994 08 128·8762

1994 09 130·1494

1994 10 131·1168

1994 11 130·5344

1994 12 133·1195

1995 01 131·5950

1995 02 133.8072

1995 03 129.9855

1995 04 130.0504

1995 05 130.5553

1995 06 129·5575

1995 07 127.0008

1995 08 126·9579

1995 09 126.3584

1995 10 126·4516

1995 11 129.6891

1995 12 128·5979

1996 01 127.9162

1996 02 129·5214

1996 03 129·4166

1996 04 126·9084

1996 05 125.9212

1996 06 127.1856

1996 07 124.8198

1996 08 125.2585

1996 09 127.6557

1996 10 127·6375

1996 11 127.532

1996 12 128·1314

1997 01 126·0857

1997 02 128·2953

1997 03 127.0698

1997 04 126.8585

1997 05 127·5078

1997 06 126·8674

1997 07 126.2306

1997 08 125·0100

1997 09 127·7375

1997 10 127.3706

1997 11 127·2159

1997 12 128·6310

1998 01 128.8179

1998 02 127·5429

1998 03 126·2028

1998 04 127.6823

1998 05 125.006

1998 06 123.4383

1998 07 122.0001

1998 08 122·1640

1998 09 114·0106

1998 10 120·9755

1998 11 120·9233

1998 12 119·0234

1999 01 117.4405

1999 02 117.4929

1999 03 115.8502

1999 04 115.9591

1999 05 115.7104

1999 06 113·4735

1999 07 114·2129

1999 08 113.8599

1999 09 114.3337

1999 10 117.1399

1999 11 117·4087

1999 12 117.3682

2000 01 113·9321

2000 02 115·2632

2000 03 116·5791

2000 04 115.3144

2000 05 115.1705

2000 06 114.7095

2000 07 113·0623

2000 08 114·0381

2000 09 113.427

2000 10 114·4357

2000 11 114·9013

2000 12 115.3433

2001 01 114·2369

2001 02 113·8665

2001 03 115·8387

2001 04 115.4896

2001 05 114·8312

2001 06 113·5779

2001 07 113.2403

2001 08 112.7838

2001 09 112·4551

2001 10 113.1854

2001 11 115.8539

2001 12 115·9758

2002 01 115.6986

2002 02 115.8462

2002 03 115.9421

2002 04 115.1269

2002 05 116·3212

2002 06 115.9143

2002 07 113·2844

2002 08 114·1707

2002 09 106·5915

2002 10 105·5464

2002 11 110.7756

2002 12 112.71

2003 01 113·2423

2003 02 113·7797

2003 03 114.2311

2003 04 112.6964

2003 05 111.4821

2003 06 111·3930

2003 07 108·2205

2003 08 109.6957

2003 09 110·3883

2003 10 110.4747

2003 11 109·1446

2003 12 109.6275

2004 01 109·8755

2004 02 109.7014

2004 03 110·4990

2004 04 109.4122

2004 05 109.3073

2004 06 106·3846

2004 07 107·9235

2004 08 104·7824

2004 09 699.9901

2004 10 101.745

2004 11 106.707

2004 12 108.4261

2005 01 107.2923

2005 02 108.3525

2005 03 110.2382

2005 04 109.5702

2005 05 108·4478

2005 06 107.674

2005 07 103.406

2005 08 102.8145

2005 09 183·1037

2005 10 389·7490

2005 11 495.6144

2005 12 898·1872

2006 01 100·1933

2006 02 499·1271

2006 03 399.0316

2006 04 100.0654

2006 05 101·4384

2006 06 101.626

2006 07 100·3050

2006 08 199.2346

2006 09 999.0671

2006 10 100.5921

2006 11 399·7548

2006 12 102.1697

2007 01 100·5598

2007 02 101.026

2007 03 100.5181

2007 04 101·9347

2007 05 102.5324

2007 06 399·9231

2007 07 199·2265

2007 08 98.1938

2007 09 696.5547

2007 10 799.5633

2007 11 899·3165

2007 12 100.651

2008 01 100.6553

2008 02 101.5518

2008 03 102·2550

2008 04 101·5144

2008 05 101·3634

2008 06 101.1351

2008 07 102·0584

2008 08 698·6363

2008 09 378.3328

2008 10 493·3062

2008 11 100·1702

2008 12 100.6608

2009 01 101·2635

2009 02 103.2396

2009 03 102·7617

2009 04 104.1218

2009 05 106.051

2009 06 103·9622

2009 07 106·4049

2009 08 105.9027

2009 09 109.5776

2009 10 108·7229

2009 11 106.1331

2009 12 107·3879

2010 01 106·3811

2010 02 109.2884

2010 03 108.6306

2010 04 105.9579

2010 05 106.3669

2010 06 106·0780

2010 07 104.6905

2010 08 107.2767

2010 09 110.5046

2010 10 110.2677

2010 11 109.5186

2010 12 110·6229

2011 01 108·4349

2011 02 106·3567

2011 03 110·5155

2011 04 109.5454

2011 05 110·8943

2011 06 110.1665

2011 07 106·7416

2011 08 111.4188

2011 09 110·3639

2011 10 116.1695

2011 11 118·5566

2011 12 119·0100

2012 01 120·9395

2012 02 122.5868

2012 03 124.4802

2012 04 123.7064

2012 05 124.1697

2012 06 122.6986

2012 07 125·7920

2012 08 124.1225

2012 09 129·0423

2012 10 136·4889

2012 11 138.9199

2012 12 139.3937

2013 01 138.5229

2013 02 140.2559

2013 03 141.2174

2013 04 144·5181

2013 05 143·7289

2013 06 142.6543

2013 07 147·9645

2013 08 147.7475

2013 09 153·4540

2013 10 152·9212

New folder/Y-COSTCO.xlsx
Revenue

Costco

Date Revenue

3/31/95 4307.3218

6/30/95 3896.238

9/29/95 6013.8032

12/29/95 4383.564

3/29/96 4688.6948

6/28/96 4311.4878

9/30/96 6182.709

12/31/96 4883.4082

3/31/97 5238.8931

6/30/97 4836.229

9/30/97 6915.874

12/31/97 5429.7632

3/31/98 5795.0059

6/30/98 5338.0859

9/30/98 7707.022

12/31/98 5998.0781

3/31/99 6592.3579

6/30/99 6053.8198

9/30/99 8811.775

12/31/99 6943.5122

3/31/00 7736.9868

6/30/00 6894.6079

9/29/00 10589.189

12/29/00 7637.2778

3/30/01 8306.3086

6/29/01 7718.895

9/28/01 11134.5498

12/31/01 8466.5527

3/29/02 9382.8516

6/28/02 8616.7471

9/30/02 12296.347

12/31/02 9198.585

3/31/03 10114.1699

6/30/03 9543.0713

9/30/03 13689.7305

12/31/03 10521.4805

3/31/04 11548.9697

6/30/04 10897.2402

9/30/04 15139.302

12/31/04 11578

3/31/05 12658.077

6/30/05 11996.9

9/30/05 16709.936

12/30/05 12933.346

3/31/06 14054.576

6/30/06 13273.175

9/29/06 19875.221

12/29/06 14151.624

3/30/07 15112.016

6/29/07 14659.255

9/28/07 20477.26

12/31/07 15809.53

3/31/08 16959.886

6/30/08 16613.717

9/30/08 23099.887

12/31/08 16395

3/31/09 16843

6/30/09 15806

9/30/09 22378

12/31/09 17299

3/31/10 18742

6/30/10 17780

9/30/10 24125

12/31/10 19239

3/31/11 20875

6/30/11 20623

9/30/11 28178

12/30/11 21628

3/30/12 22967

6/29/12 22324

9/28/12 32218

12/31/12 23715

3/29/13 24871

6/28/13 24083



Sheet4

Anova: Single Factor

SUMMARY
Groups     Count   Sum          Average           Variance
Column 1   15      79928.0999   5328.5399933333   1105973.22986101
Column 2   15      413293       27552.8666666667  20078634.6952381

ANOVA
Source of Variation   SS                 df   MS                 F                P-value                F crit
Between Groups        3704405220.62277   1    3704405220.62277   349.7261062107   2.34310189399229E-17   4.1959718186
Within Groups         296584510.951387   28   10592303.9625496
Total                 4000989731.57415   29


Date | CostCo Revenue | Walmart Revenue | Exports | IPIC
3/31/1995 4307.3218 20440 64349
6/30/1995 3896.238 22723 64855 129.5575
9/29/1995 6013.8032 22913 68943
12/29/1995 4383.564 27551 69071 128.5979
3/29/1996 4688.6948 22772 69797 129.4166
6/28/1996 4311.4878 25587 71056
9/30/1996 6182.709 25644 70553
12/31/1996 4883.4082 30856 72842 128.1314
3/31/1997 5238.8931 25409 77050
6/30/1997 4836.229 28386 78889 126.8674
9/30/1997 6915.874 28777 79287 127.7375
12/31/1997 5429.7632 35386 79038 128.631
3/31/1998 5795.0059 29819 78812 126.2028
6/30/1998 5338.0859 33521 76961
9/30/1998 7707.022 33509 77179 114.0106
12/31/1998 5998.0781 40785 78382 119.0234
3/31/1999 6592.3579 34717 78243
6/30/1999 6053.8198 38470 79070 113.4735
9/30/1999 8811.775 40432 82718
12/31/1999 6943.5122 51394 84983
3/31/2000 7736.9868 42985 87277 116.5791
6/30/2000 6894.6079 46112 90549
9/29/2000 10589.189 45676 91981
12/29/2000 7637.2778 56556 90260
3/30/2001 8306.3086 48052 88714 115.8387
6/29/2001 7718.895 52799 85265 113.5779
9/28/2001 11134.5498 52738 77286 112.4551
12/31/2001 8466.5527 64210 78180 115.9758
3/29/2002 9382.8516 52126 79613
6/28/2002 8616.7471 56271 82705
9/30/2002 12296.347 55241 82893 106.5915
12/31/2002 9198.585 66400 81686
3/31/2003 10114.1699 56718 82944
6/30/2003 9543.0713 62637 84810 111.393
9/30/2003 13689.7305 62480 86399 110.3883
12/31/2003 10521.4805 74494 90699
3/31/2004 11548.9697 64763 96472 110.499
6/30/2004 10897.2402 69722 94993 106.3846
9/30/2004 15139.302 68520 98543
12/31/2004 11578 82217 102571
3/31/2005 12658.077 71680 104752
6/30/2005 11996.9 76697 106652
9/30/2005 16709.936 75397 107166 183.1037
12/30/2005 12933.346 88418 113482 898.1872
3/31/2006 14054.576 79676 118875
6/30/2006 13273.175 85430 122446
9/29/2006 19875.221 84467 123774
12/29/2006 14151.624 99078 128544
3/30/2007 15112.016 86410 132823
6/29/2007 14659.255 92999 136453 399.9231
9/28/2007 20477.26 91865 141627
12/31/2007 15809.53 147152
3/31/2008 16959.886 17209.849 151927 102.255
6/30/2008 16613.717 17034.8749 163317
9/30/2008 23099.887 16740.06437 153641
12/31/2008 16395 21191.940211 132012
3/31/2009 16843 17834.0820633 125990 102.7617
6/30/2009 15806 17140.32461899 128456 103.9622
9/30/2009 22378 16206.297385697 135643
12/31/2009 17299 20526.4892157091 144001 107.3879
3/31/2010 18742 18267.2467647127 148646
6/30/2010 17780 18599.5740294138 151418 106.078
9/30/2010 24125 18025.8722088241 156049
12/31/2010 19239 22295.2616626472 166177 110.6229
3/31/2011 20875 20155.8784987942 174342 110.5155
6/30/2011 20623 20659.2635496382 173180
9/30/2011 28178 20633.8790648915 181172 110.3639
12/30/2011 21628 25914.7637194674 179118 119.01
3/30/2012 22967 22914.0291158402 186505
6/29/2012 22324 22951.1087347521 185218
9/28/2012 32218 22512.1326204256 186829 129.0423
12/31/2012 23715 29306.2397861277 188686
3/29/2013 24871 25392.3719358383 184578
6/28/2013 24083 25027.4115807515 190528

Chapter 4: Chapter 4 – Assignment 4

(Remember: 1. Do not show failed models in business reports; share your failures with your family if you wish, not with your boss or instructor. 2. Never use Y hold-out data observations in any forecast model.)
a) Tell me why you selected the appropriate exponential smoothing method by commenting on your Y data characteristics (you should use a time series plot and autocorrelations to do this).

Exponential smoothing provides an exponentially weighted moving average of all previously observed values. The method revises an estimate in the light of more recent experience, and is based on averaging (smoothing) past values of a series in an exponentially decreasing manner.

b) Apply the appropriate exponential smoothing forecast technique to your Y variable excluding the last two years of data (8 quarter hold out period). Show the Y data, fitted values and residuals in Excel format and show your exponential smoothing model coefficients. (Find the correct coefficients; do not just use the default values.)
Exponential smoothing provides an exponentially weighted average of all previously observed values; a minimal computational sketch of fitting such a model and evaluating the hold-out forecast follows item j) below.

c) Evaluate the "Goodness of Fit" using at least two error measures: RMSE and MAPE.
RMSE is the root mean squared error used to evaluate forecasting methods. It penalizes large errors.
Sometimes it is more useful to compute forecast errors in terms of percentages. MAPE is the mean absolute percentage error, computed by finding the absolute error in each period and dividing it by the actual observed value for that period. MAPE is useful when the predicted Y values are large, and it has no units. From the RMSE and MAPE, together with the residual plots, we can see which model fits the data best.

d) Check the “Fit” period residual mean proximity to zero and randomness with a time series plot; check the residual time series plot and autocorrelations (ACFs)  for trend, cycle and seasonality.  
e) Evaluate the residuals for the “Fit” period by indicating the residual distribution using a histogram (normal or not and random or not), 
f) Comment on the acceptability of the model’s ability to pick up the systematic variation in your Fit period actual data.
g) Develop a two year quarterly forecast (for the hold out period). 
h) Evaluate the "Accuracy" of the forecast for the "hold out period" using RMSE and MAPE error measures computed from the forecast period residuals, and comment on them.
i) Do the forecast period residuals seem to be random relative to the hold out period data? Check the forecast period time series plot of the residuals.
j) Did the error measures get worse, remain the same or get better from the fit to the hold out period?  Do you think the forecast accuracy is acceptable?
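A minimal sketch of steps b), c), g) and h), assuming the quarterly revenue series sits in a hypothetical costco_revenue.csv with a date column; statsmodels optimizes the smoothing constants rather than taking default values:

    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    y = pd.read_csv("costco_revenue.csv", index_col=0,
                    parse_dates=True).squeeze()
    fit_data, hold_out = y[:-8], y[-8:]              # 8-quarter hold-out

    # Winters' multiplicative method: additive trend, multiplicative
    # seasonality, 4 periods per year.
    model = ExponentialSmoothing(fit_data, trend="add", seasonal="mul",
                                 seasonal_periods=4).fit()

    fit_resid = fit_data.values - model.fittedvalues.values
    fit_rmse = (fit_resid ** 2).mean() ** 0.5        # step c): goodness of fit

    fc_err = hold_out.values - model.forecast(8).values
    fc_rmse = (fc_err ** 2).mean() ** 0.5            # step h): forecast accuracy
    fc_mape = (abs(fc_err) / hold_out.values).mean()
    print(fit_rmse, fc_rmse, fc_mape)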
 
[Figure: Winters' Method Plot for CostCo Revenue, Multiplicative Method. Smoothing constants: Alpha (level) 0.2, Gamma (trend) 0.2, Delta (seasonal) 0.2. Accuracy measures: MAPE 3, MAD 431, MSD 362813. Shows Actual and Fits versus Index.]
[Figure: Winters' Method Plot for CostCo Revenue, Additive Method. Smoothing constants: Alpha (level) 0.2, Gamma (trend) 0.2, Delta (seasonal) 0.2. Accuracy measures: MAPE 7, MAD 735, MSD 1224425. Shows Actual and Fits versus Index.]
Date | CostCo Revenue | First Difference | ACF1 | TSTA1 | LBQ1 | PACF2 | TSTA2
3/31/19954307.3218-0.644357759-5.50539510531.5722659-0.644357759-5.505395105
6/30/19953896.2384307.3218-411.08380.301467791.90383836538.580484-0.194474235-1.661588593
9/29/19956013.80323896.2382117.5652-0.587981317-3.5415530865.62088559-0.855294477-7.307639211
12/29/19954383.5646013.8032-1630.23920.8712885844.527432137125.85729560.1780925531.521623441
3/29/19964688.69484383.564305.1308-0.574072829-2.387123598152.39169090.1296927311.108095175
6/28/19964311.48784688.6948-377.2070.2787201891.077891363158.7398262-0.0261045-0.223036943
9/30/19966182.7094311.48781871.2212-0.53367315-2.031783954182.365864-0.116726699-0.997313357
12/31/19964883.40826182.709-1299.30080.794976792.86872919235.59870750.0774251950.661521157
3/31/19975238.89314883.4082355.4849-0.536840594-1.749964915260.25313070.010478440.089527832
6/30/19974836.2295238.8931-402.66410.2593794320.812128535266.0998823-0.066749141-0.570304908
9/30/19976915.8744836.2292079.645-0.497043517-1.54239228287.9161906-0.165835581-1.416899822
12/31/19975429.76326915.874-1486.11080.7448586652.239563278337.7130596-0.053169599-0.454281252
3/31/19985795.00595429.7632365.2427-0.498703385-1.405957925360.4073969-0.001660004-0.014183085
6/30/19985338.08595795.0059-456.920.2369145290.650532691365.6159308-0.049857168-0.425979829
9/30/19987707.0225338.08592368.9361-0.458813114-1.252593237385.4872992-0.062586279-0.534737402
12/31/19985998.07817707.022-1708.94390.6870368471.836601885430.8260267-0.068394829-0.584365679
3/31/19996592.35795998.0781594.2798-0.452615227-1.157632079450.8548032-0.010516919-0.089856599
6/30/19996053.81986592.3579-538.53810.2213945090.55613274455.73407620.0369784940.315944389
9/30/19998811.7756053.81982757.9552
12/31/19996943.51228811.775-1868.2628
3/31/20007736.98686943.5122793.4746
6/30/20006894.60797736.9868-842.3789
9/29/200010589.1896894.60793694.5811
12/29/20007637.277810589.189-2951.9112
3/30/20018306.30867637.2778669.0308
6/29/20017718.8958306.3086-587.4136
9/28/200111134.54987718.8953415.6548
12/31/20018466.552711134.5498-2667.9971
3/29/20029382.85168466.5527916.2989
6/28/20028616.74719382.8516-766.1045
9/30/200212296.3478616.74713679.5999
12/31/20029198.58512296.347-3097.762
3/31/200310114.16999198.585915.5849
6/30/20039543.071310114.1699-571.0986
9/30/200313689.73059543.07134146.6592
12/31/200310521.480513689.7305-3168.25
3/31/200411548.969710521.48051027.4892
6/30/200410897.240211548.9697-651.7295
9/30/200415139.30210897.24024242.0618
12/31/20041157815139.302-3561.302
3/31/200512658.077115781080.077
6/30/200511996.912658.077-661.177
9/30/200516709.93611996.94713.036
12/30/200512933.34616709.936-3776.59
3/31/200614054.57612933.3461121.23
6/30/200613273.17514054.576-781.401
9/29/200619875.22113273.1756602.046
12/29/200614151.62419875.221-5723.597
3/30/200715112.01614151.624960.392
6/29/200714659.25515112.016-452.761
9/28/200720477.2614659.2555818.005
12/31/200715809.5320477.26-4667.73
3/31/200816959.88615809.531150.356
6/30/200816613.71716959.886-346.169
9/30/200823099.88716613.7176486.17
12/31/20081639523099.887-6704.887
3/31/20091684316395448
6/30/20091580616843-1037
9/30/200922378158066572
12/31/20091729922378-5079
3/31/201018742172991443
6/30/20101778018742-962
9/30/201024125177806345
12/31/20101923924125-4886
3/31/201120875192391636
6/30/20112062320875-252
9/30/201128178206237555
12/30/20112162828178-6550
3/30/201222967216281339
6/29/20122232422967-643
9/28/201232218223249894
12/31/20122371532218-8503
3/29/201324871237151156
6/28/20132408324871-788
[Figure: Smoothing Plot for CostCo Revenue, Single Exponential Method.  Actual values, fits, forecasts and 95.0% prediction interval plotted against the quarterly index.  Smoothing constant: Alpha 0.320669.  Accuracy measures: MAPE 11, MAD 1519, MSD 6358880.]

[Figure: Smoothing Plot for CostCo Revenue, Double Exponential Method.  Actual and fitted values plotted against the quarterly index.  Smoothing constants: Alpha (level) 0.266593, Gamma (trend) 0.085145.  Accuracy measures: MAPE 12, MAD 1662, MSD 5421984.]
Assignment 6  Project Part 3 — The ARIMA Forecast.  This assignment is due by midnight Nov 4th; no late submissions will be graded.  The assignment is worth a maximum of 2.5 extra credit points and may serve as the project ARIMA section.
(Again — 1. Do not show failed models in business reports; share your failures with your family if you wish, not with your boss or instructor.  2. Never use Y hold out data observations in any forecast model.)
Complete each of the following sections.
a) Examine the Y data (excluding the hold out period) to determine if it needs to be differenced to make it stationary.  Show a time series plot of the raw Y data and autocorrelation functions (ACFs).  

The time series plot shows an increasing trend along with seasonal variation.

The ACFs are significant through lag 4, and the series is not stationary, as the ACF plot shows.
b) From your time series data plot and ACFs determine if you have seasonality.  If you do, use seasonal differences to remove it and run the ACFs and PACFs on the non seasonal Y data series.
Yes, seasonality can be seen.  We take the first difference to remove it and then plot the ACFs and PACFs.

The first difference makes the data stationary, as the ACF and PACF plots show.
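For readers replicating this check outside Minitab, here is a hedged sketch using numpy and statsmodels; the series argument is a placeholder for the revenue data.

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    def difference_diagnostics(y, lags=18):
        """Plot ACF and PACF of the first difference of series y."""
        y_diff = np.diff(y)  # first difference: y[t] - y[t-1]
        plot_acf(y_diff, lags=lags)   # spikes should die out quickly if stationary
        plot_pacf(y_diff, lags=lags)
        plt.show()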
c) Fill out the ARIMA seasonal menu (P,D,Q) appropriately.  If you have no trend, as shown by the seasonally differenced ACFs, run the ARIMA model and note the significance of each coefficient.  Make model adjustments accordingly to improve results.
P = 1, D = 1, Q = 0.
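In Python's statsmodels, the counterpart to filling out Minitab's seasonal ARIMA menu is the seasonal_order argument, shown below with the (P, D, Q) values above and period s = 4 for quarterly data.  The non-seasonal order (0, 1, 0) is an illustrative assumption, not a value taken from the assignment.

    from statsmodels.tsa.statespace.sarimax import SARIMAX

    def fit_seasonal_arima(y):
        # order=(p, d, q) is the non-seasonal part; seasonal_order=(P, D, Q, s)
        # mirrors the seasonal menu, with s = 4 for quarterly data
        model = SARIMAX(y, order=(0, 1, 0), seasonal_order=(1, 1, 0, 4))
        results = model.fit()
        print(results.summary())  # check the significance of each coefficient
        return results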

d) If the series requires differencing for trend to make it stationary, do so, and run another time series plot and ACFs on the differenced data.  If it requires differencing again, do so, but run time series plots and ACFs each time you do.
e) Run and show the PACFs on your stationary data series and identify the appropriate ARIMA model and show the initial ARIMA non seasonal menu section (p,d,q) filled out appropriately and any seasonal (P,D,Q) components in the seasonal menu filled out.  
f) Run the ARIMA model and note the significance of each coefficient.  Make model adjustments accordingly to improve results shown by the residual MSE. 
g) Calculate the two error measures that you used in other model analysis and comment on the acceptability of the size of the measure.
h) Note the LBQ-associated p-values for the selected lags.  They should each be above .05 (that is, not significant) to qualify the residuals as potentially random.  If the residuals are not random, select an alternative ARIMA model form that has random residuals (the code sketch after item k illustrates this check).
i) Run an ARIMA forecast for your hold out period and show a time series plot of the residuals (Y actual minus Y forecast) for the 8 quarter hold out period.
j) Calculate the hold out period RMSE and MAPE (Refer back to earlier chapters for the error measure formulas) and compare them to the Fit period ARIMA error measures (from g above).
k) Plot the forecast values appended to the Y data without the hold out to check for forecast reasonableness. 
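Items h through k can also be reproduced with a short diagnostic routine.  This sketch assumes the same illustrative SARIMA order as above and hypothetical variable names; it is not the assignment's Minitab workflow.

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.tsa.statespace.sarimax import SARIMAX
    from statsmodels.stats.diagnostic import acorr_ljungbox

    def arima_holdout_check(y, h=8):
        """Test residual randomness, then score and plot the hold-out forecast."""
        fit_part, holdout = np.asarray(y[:-h]), np.asarray(y[-h:])
        res = SARIMAX(fit_part, order=(0, 1, 0),
                      seasonal_order=(1, 1, 0, 4)).fit()

        # Ljung-Box: p-values above .05 mean randomness cannot be rejected
        print(acorr_ljungbox(res.resid, lags=[4, 8, 12], return_df=True))

        forecast = np.asarray(res.forecast(h))
        hold_res = holdout - forecast
        rmse = np.sqrt(np.mean(hold_res ** 2))
        mape = np.mean(np.abs(hold_res / holdout)) * 100
        print(f"hold-out RMSE {rmse:.1f}, MAPE {mape:.1f}%")

        # append the forecast to the fit data to eyeball reasonableness (item k)
        plt.plot(range(len(fit_part)), fit_part, label="fit data")
        plt.plot(range(len(fit_part), len(y)), forecast, label="forecast")
        plt.legend()
        plt.show()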
[Figure: Partial Autocorrelation Function for first difference, with 5% significance limits for the partial autocorrelations.]

[Figure: PACF of Residuals for CostCo Revenue, with 5% significance limits for the partial autocorrelations.]

[Figure: ACF of Residuals for CostCo Revenue, with 5% significance limits for the autocorrelations.]
Date | CostCo Revenue | prior-quarter Revenue | first difference | ACF1 | TSTA1 | LBQ1 | PACF2 | TSTA2  (the ACF1 through TSTA2 values appear only on the first 18 rows, one per lag 1–18, and describe the series as a whole rather than those particular dates)
3/31/19954307.3218-0.644357759-5.50539510531.5722659-0.644357759-5.505395105
6/30/19953896.2384307.3218-411.08380.301467791.90383836538.580484-0.194474235-1.661588593
9/29/19956013.80323896.2382117.5652-0.587981317-3.5415530865.62088559-0.855294477-7.307639211
12/29/19954383.5646013.8032-1630.23920.8712885844.527432137125.85729560.1780925531.521623441
3/29/19964688.69484383.564305.1308-0.574072829-2.387123598152.39169090.1296927311.108095175
6/28/19964311.48784688.6948-377.2070.2787201891.077891363158.7398262-0.0261045-0.223036943
9/30/19966182.7094311.48781871.2212-0.53367315-2.031783954182.365864-0.116726699-0.997313357
12/31/19964883.40826182.709-1299.30080.794976792.86872919235.59870750.0774251950.661521157
3/31/19975238.89314883.4082355.4849-0.536840594-1.749964915260.25313070.010478440.089527832
6/30/19974836.2295238.8931-402.66410.2593794320.812128535266.0998823-0.066749141-0.570304908
9/30/19976915.8744836.2292079.645-0.497043517-1.54239228287.9161906-0.165835581-1.416899822
12/31/19975429.76326915.874-1486.11080.7448586652.239563278337.7130596-0.053169599-0.454281252
3/31/19985795.00595429.7632365.2427-0.498703385-1.405957925360.4073969-0.001660004-0.014183085
6/30/19985338.08595795.0059-456.920.2369145290.650532691365.6159308-0.049857168-0.425979829
9/30/19987707.0225338.08592368.9361-0.458813114-1.252593237385.4872992-0.062586279-0.534737402
12/31/19985998.07817707.022-1708.94390.6870368471.836601885430.8260267-0.068394829-0.584365679
3/31/19996592.35795998.0781594.2798-0.452615227-1.157632079450.8548032-0.010516919-0.089856599
6/30/19996053.81986592.3579-538.53810.2213945090.55613274455.73407620.0369784940.315944389
9/30/19998811.7756053.81982757.9552
12/31/19996943.51228811.775-1868.2628
3/31/20007736.98686943.5122793.4746
6/30/20006894.60797736.9868-842.3789
9/29/200010589.1896894.60793694.5811
12/29/20007637.277810589.189-2951.9112
3/30/20018306.30867637.2778669.0308
6/29/20017718.8958306.3086-587.4136
9/28/200111134.54987718.8953415.6548
12/31/20018466.552711134.5498-2667.9971
3/29/20029382.85168466.5527916.2989
6/28/20028616.74719382.8516-766.1045
9/30/200212296.3478616.74713679.5999
12/31/20029198.58512296.347-3097.762
3/31/200310114.16999198.585915.5849
6/30/20039543.071310114.1699-571.0986
9/30/200313689.73059543.07134146.6592
12/31/200310521.480513689.7305-3168.25
3/31/200411548.969710521.48051027.4892
6/30/200410897.240211548.9697-651.7295
9/30/200415139.30210897.24024242.0618
12/31/20041157815139.302-3561.302
3/31/200512658.077115781080.077
6/30/200511996.912658.077-661.177
9/30/200516709.93611996.94713.036
12/30/200512933.34616709.936-3776.59
3/31/200614054.57612933.3461121.23
6/30/200613273.17514054.576-781.401
9/29/200619875.22113273.1756602.046
12/29/200614151.62419875.221-5723.597
3/30/200715112.01614151.624960.392
6/29/200714659.25515112.016-452.761
9/28/200720477.2614659.2555818.005
12/31/200715809.5320477.26-4667.73
3/31/200816959.88615809.531150.356
6/30/200816613.71716959.886-346.169
9/30/200823099.88716613.7176486.17
12/31/20081639523099.887-6704.887
3/31/20091684316395448
6/30/20091580616843-1037
9/30/200922378158066572
12/31/20091729922378-5079
3/31/201018742172991443
6/30/20101778018742-962
9/30/201024125177806345
12/31/20101923924125-4886
3/31/201120875192391636
6/30/20112062320875-252
9/30/201128178206237555
12/30/20112162828178-6550
3/30/201222967216281339
6/29/20122232422967-643
9/28/201232218223249894
12/31/20122371532218-8503
3/29/201324871237151156
6/28/20132408324871-788
[Figure: Time Series Plot of CostCo Revenue.]

[Figure: Autocorrelation Function for CostCo Revenue, with 5% significance limits for the autocorrelations.]

[Figure: Autocorrelation Function for first difference, with 5% significance limits for the autocorrelations.]
