Forecasting and Business Analysis

The data contained in the spreadsheet “Corner Store.xls” provide information relating to the gross monthly sales of a hypothetical corner store chain. Each observation in the data represents a corner store at a different location.


 

You have been approached by the owners of this corner store chain to conduct a statistical analysis. They are looking to open new corner stores in several areas where they are currently not operating. Hence, they are interested in the determinants of gross sales, and in predicting the characteristics of the areas they should be considering for the establishment of new corner stores.

 

Task Write a report for the corner store chain owners. The report should provide a brief summary of the data, justification of the variable(s) that you consider to be relevant, and analysis of your results (including your interpretation of the results and any diagnostic tests you have conducted). You should also highlight any limitations in your analysis and suggestions for improving this research that you feel to be appropriate.



Assignment Extract

Background
The data contained in the spreadsheet “Corner Store.xls” (located in the same folder as
this document) provide information relating to the gross monthly sales of a hypothetical
corner store chain. Each observation in the data represents a corner store at a different
location. The data incorporate observations from different cities and states (but all in
Australia). For each corner store (i = 1, 2, …, 50), the variables you have been given are:
1. Gross monthly sales ($)
2. The number of competitors within 10km
3. The population (in 1000’s) within 10km
4. The average income of the population within 10km ($)
5. The average number of cars owned by households within 10km
6. The median age of dwellings within 10km
You have been approached by the owners of this corner store chain to conduct a
statistical analysis. They are looking to open new corner stores in several areas where
they are currently not operating. Hence, they are interested in the determinants of gross
sales, and in predicting the characteristics of the areas they should be considering for the
establishment of new corner stores.
Task
Write a report for the corner store chain owners. The report should provide a brief
summary of the data, justification of the variable(s) that you consider to be relevant, and
analysis of your results (including your interpretation of the results and any diagnostic
tests you have conducted). You should also highlight any limitations in your analysis and
suggestions for improving this research that you feel to be appropriate.


Forecasting and Business Analysis

Study Period 5, 2013

Assignment 1 – Report

Due Date: 13th September, 5pm


WORD LIMIT FOR REPORT: 1000 words including a 100 word executive summary, but excluding references and appendix.

– An electronic copy of your assignment is to be submitted online using Learnonline, while a hard copy of the identical version of your assignment is to be dropped into the pigeon hole marked as “Assignment Box” located outside the office of the School of Commerce (WL 2-57). For external students, the hard copy is not required.

– There is a penalty of 10% of the total mark for each day of late submission. The only exception to this is if your tutor approves an extension prior to the due date, for legitimate reasons based on documented evidence. So you should collect your documented evidence before you contact your tutor for an extension.

– Plagiarism is a specific form of academic misconduct. If found, both parties will be penalised regardless of who copies from whom. The electronic version of your submitted assignment will be used for a plagiarism check.

– For guidelines on report writing for an empirical project, see Appendix A of Koop.

The following are some guidelines that we will use to mark your assignments. This may
help you in preparing your assignment.

Main article

Superb: include all main results; accompanying well-presented and meaningful graphs
and tables; accurate, consistent and logical arguments; well-written.

Good, solid report: it is not superb in all aspects above, but an overall impression of the
report suggests that it is a very good and solid report.

Acceptable, but with several weaknesses: it clearly contains some mistakes, but overall it
covers most important findings.

Quite inadequate: the report contains many mistakes. An overall impression suggests that
assignments were done with little effort and students do not understand most of the
concepts introduced in the course.

Appendix

Include all necessary Excel outputs. Your appendix may include some technical
materials. The appendix should be well sorted out and with brief descriptions, so that they
are easy to follow.

Please note: marks are deducted for excessive or irrelevant output, in particular, printouts
of the worksheet.

Data

km

.00

84

5

0

4

2

0.2

5

4

0.2

3

8

0.1

3

1.1

3

3

1.1 3.5

3

0.1 1.2

2

4.0

2

2.4

2

0.1

2

0

1.1 4.0

2

1.1

2

0.9

1

5

1.1

2

9

21542 0.2

2

5.0

1

1.1

2

1.1

2

1.2

1

1.2 2.4

2

1

0.7

1

1.4

1

1.5

1

1.5 2.5

2

1.2

1

1.8 9.2

1

2.0

1

2.2

5

2.1 4.3

1

58457 1.1

5

2.2

1

1

1.1 1.3

2

1.8

1

2.0 1.6

1

2.0 1.5

1

2.0 1.2

2

2.2

1

2.2

0

2.2 3.7

2

2.1

1

2.5 5.3

1

1.4

1

2.1

0

2.0 1.8

Gross monthly sales | Number of competitors within 10km | Population within 10km (1000’s) | Average income of residents within 10km | Average number of cars owned | Median age of dwellings in area
12192 6 1

4 19

5 0.1 2.

3
13061 1

5.0 15000 0.2 1.6
19153 12.77 54785 1.3 1

1.1
20714 13.89 10357 1

4.0
21222 17.70 25645 0.6 3.5
22319 14.88 21545 20.0
32215 2

1.4 16859 2.4
33629 24.51 45215 4.3
33977 26.56 25845 1.2 8.8
34163 23.56 25985
36647 24.43 21548
40937 39.89 21542 0.5
45302 30.20 32546 0.9
49298 35.26 24649 9.2
49583 33.06 24792 0.3 14.5
57929 45.69 32545
59501 50.00 29751 2.6
62747 40.59 42151 1.5
63073 4

2.0 52545 6.9
63775 4

2.5 2.2
70985 58.75 45125 0.8
71616 12.59 35808 10.0
75019 50.01 35485 1.7
76374 100.00 32156 0.7
85372 56.91 44151
88057 6

1.8 41254 6.0
94150 62.80 31254 3.2
103683 71.26 58457 3.7
108781 77.00 58000
113330 75.53 52012 12.8
113987 77.54 65898
117454 70.11 61328 2.1
123245 89.00 61623 5.8
125584 83.72 62792
127114 88.00 1.0
132717 88.48 66358 5.3
134340 45.00 67125 1.9 4.1
139739 92.55 45875
141946 120.00 64885 3.1
144593 48.75 72296
145712 101.44 75551
156329 95.00 74584
160987 107.32 61454 6.6
161436 98.58 78545 14.0
171521 114.35 77854
201344 138.00 85784 9.3
205397 135.75 102565
207523 148.00 98475 2.3
211938 141.29 88545 7.9
233522 202.15 65854

Sheet3

Introduction

Forecasting and Business Analysis
Copyright UPmarket Software Services. This file must not be used without permission.

Correlation Analysis
Correlation models tell you how strong the linear relationship is between two variables. The statistic used is the coefficient of correlation, denoted r (the corresponding population parameter is denoted ρ, rho). It is used mainly for understanding and can take values from –1 to +1 to measure the degree of association.
[Chart: Correlation Analysis; scatter plot of Y against X]
The chart shows a simple relationship between X and Y. Correlation analysis can help show the strength of the relationship and also whether the relationship is positive or negative.
Using Excel for Correlation Analysis
Excel can be used to calculate an individual correlation or a correlation matrix. The matrix is a table of correlations between a number of variables. This worksheet shows how to calculate a single correlation coefficient using the CORREL function in Excel and also how to create a correlation matrix using the Analysis ToolPak.
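For readers working outside Excel, the same two calculations (a single coefficient, as CORREL returns, and a full correlation matrix) can be sketched in Python. This is only an illustrative sketch: the function names `pearson_r` and `correlation_matrix` and the small `data` set are assumptions made for the example, not values from the worksheet.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name -> list of values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical illustrative data (not taken from the spreadsheet).
data = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 5, 4, 5],
    "z": [10, 8, 6, 4, 2],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(value, 4) for other, value in row.items()})
```

The diagonal of the matrix is always 1, and a perfectly decreasing pair such as x and z gives –1, which mirrors the range of r described above.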

Correlation Example

5,500

94

5

5

94

6

5

,000

5

6

8

7

7

6 175

695 7

Value $ Land Area Rooms Building Area
Value $ 1
Land Area

1

Rooms

1

Building Area

1

Value $ Land Area Rooms Building Area
$1

5 6 124
$160,000 465 134
$163,500 7 119
$172,000 696 120
$

175 715 133
$212,000 634 8 234
$218,000 918 164
$225,000 695 204
$250,000 922 181
$265,000 801 158
$275,000 348
$310,000 220
0.00045
0.60722 0.19927
0.70607 -0.04193 0.86599
0.8659925633

The Excel Correlation Function in Analysis Tools
You can automatically calculate a correlation matrix in Excel using the Correlation function in the Analysis ToolPak. Open the ToolPak using Tools – Data Analysis. When you open the ToolPak you will see a long list of statistical methods that you can access. Go down the list and click Correlation. A dialog box will appear (it may vary depending upon the version of Excel you are using). The Input Range is the data array of all input variables. If this includes a label in the first row of your data, you should check the Labels in First Row box. In this case the data is organised by columns. You need to select an output range where the output will appear. When everything is entered, click OK and the correlation matrix will be calculated.
The Excel Correl Function
The Correl function is CORREL(array1,array2). This returns a single correlation between array1 and array2. Cell B21 above contains the function for calculating the correlation between Rooms and Building Area. Note that labels are not used – this is a mathematical function only. See HELP for details.

Introduction


Forecast Evaluation Using EXCEL
This is a simple exercise in which two very simple naïve models are estimated and then a series of evaluation techniques is applied. The purpose of this spreadsheet is to let you see the various formulas that can be used for these applications.
– The naive forecast tab shows the two forecasting methods
– The evaluation tab shows the basic evaluation methods for these two naive forecasts

Two Naive Forecasts

Actual   Forecast1   Forecast2
7.60     -           -
9.70     7.60        -
9.60     9.70        10.75
7.50     9.60        9.55
7.20     7.50        6.45
7.00     7.20        7.05
6.20     7.00        6.90
5.50     6.20        5.80
5.30     5.50        5.15
5.50     5.30        5.20
-        5.50        5.60

[Chart: Two Naive Forecasts; series: Actual, Forecast1, Forecast2]

Calculating the Errors

Naïve Forecast 1 columns: Actual, Forecast1, Error, Abs Error, % Error, Abs % Error, Adj MAPE, Squared Error
Naïve Forecast 2 columns: Actual, Forecast2, Error, Abs Error, % Error, Abs % Error, Adj MAPE, Squared Error

The simplest naïve forecast uses the last period's data as the forecast for the next. In other words, “whatever happened last period will happen next period”.
This can be expressed as
F_t = A_{t-1}
The second naïve model uses the last value and the DIRECTION of the last change in values. The last value is adjusted depending on whether the last change was positive or negative, and this change is weighted. This can be expressed as
F_t = A_{t-1} + P(A_{t-1} - A_{t-2})
where A_{t-1} - A_{t-2} is the change and P is the weight. In this example a 50% weight (P = 0.5) is used.
CALCULATING THE ERRORS
The “error” for each data point is simply the difference between the observed value and the forecast (actual minus forecast). In other words, how wrong were you? It is either positive or negative. Positive and negative errors often largely cancel each other out, so the mean of the errors (the Mean Error) can be close to zero even when individual errors are large. In most cases the “absolute error” is more important: it measures the size of the error but ignores the sign, so all values are positive. Another common error estimate is the percentage error. This is simply the error expressed as a proportion of the actual value. It often helps when comparing errors for time series with different relative values. Maybe a 10% error is acceptable, and this has meaning regardless of the magnitude of the actual values.
The squared error is simply the error squared. This also removes the direction of the error (all values are positive) and highlights large errors by making them disproportionately larger, since the penalty grows with the square of the error.
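The two naïve models and the per-point error measures can be sketched in Python as follows. The series is the "Actual" column of this workbook; the function names and the dictionary keys are assumptions chosen for the illustration, and the error is taken as actual minus forecast, as in the worksheet tables.

```python
# "Actual" series from the Two Naive Forecasts sheet.
actual = [7.60, 9.70, 9.60, 7.50, 7.20, 7.00, 6.20, 5.50, 5.30, 5.50]


def naive_forecast_1(series):
    """F_t = A_{t-1}. Index t holds the forecast for period t; the final
    element is the forecast for the period after the last observation."""
    return [None] + list(series)


def naive_forecast_2(series, weight=0.5):
    """F_t = A_{t-1} + P * (A_{t-1} - A_{t-2}), with P given by `weight`."""
    forecasts = [None, None]  # two observations are needed before the first forecast
    for t in range(2, len(series) + 1):
        last, previous = series[t - 1], series[t - 2]
        forecasts.append(last + weight * (last - previous))
    return forecasts


def point_errors(actual_value, forecast_value):
    """Error measures for a single period (error = actual - forecast)."""
    error = actual_value - forecast_value
    return {
        "error": error,
        "abs_error": abs(error),
        "pct_error": error / actual_value,
        "abs_pct_error": abs(error) / actual_value,
        "squared_error": error ** 2,
    }


forecast1 = naive_forecast_1(actual)
forecast2 = naive_forecast_2(actual, weight=0.5)
print(point_errors(actual[4], forecast1[4]))
```

Returning one extra element keeps the one-step-ahead forecast (the last row of the worksheet, where no actual value exists yet) alongside the in-sample forecasts.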

Forecast Errors (forecast 1)

Naïve Forecast 1

Actual   Forecast1   Error    Abs Error   % Error   Abs % Error   Squared Error   (A_t - A_{t-1})^2
7.60     -           -        -           -         -             -               -
9.70     7.60        -        -           -         -             -               -
9.60     9.70        -0.10    0.10        -0.01     0.01          0.01            0.01
7.50     9.60        -2.10    2.10        -0.28     0.28          4.41            4.41
7.20     7.50        -0.30    0.30        -0.04     0.04          0.09            0.09
7.00     7.20        -0.20    0.20        -0.03     0.03          0.04            0.04
6.20     7.00        -0.80    0.80        -0.13     0.13          0.64            0.64
5.50     6.20        -0.70    0.70        -0.13     0.13          0.49            0.49
5.30     5.50        -0.20    0.20        -0.04     0.04          0.04            0.04
5.50     5.30        0.20     0.20        0.04      0.04          0.04            0.04
-        5.50        -        -           -         -             -               -

ME          -0.53
MAE         0.58
MPE         -0.08
MAPE        0.09
MSE         0.72
RMSE        0.85
Theil’s U   1.00

Forecast Evaluation
The final forecast evaluation is based on the summary statistics of the errors. Most of these involve taking the mean of the various errors. The mean error is thus the mean of the errors, which will often be very small because of the positive and negative values. The Mean Absolute Percentage Error is a popular measure as it measures the average error (regardless of direction) in percentage terms. The root mean squared error is also popular as it highlights forecasts when there are a number of very large errors.

Forecast Errors (forecast 2)

Naïve Forecast 2

Actual   Forecast2   Error    Abs Error   % Error   Abs % Error   Squared Error   (A_t - A_{t-1})^2
7.60     -           -        -           -         -             -               -
9.70     -           -        -           -         -             -               -
9.60     10.75       -1.15    1.15        -0.12     0.12          1.32            0.01
7.50     9.55        -2.05    2.05        -0.27     0.27          4.20            4.41
7.20     6.45        0.75     0.75        0.10      0.10          0.56            0.09
7.00     7.05        -0.05    0.05        -0.01     0.01          0.00            0.04
6.20     6.90        -0.70    0.70        -0.11     0.11          0.49            0.64
5.50     5.80        -0.30    0.30        -0.05     0.05          0.09            0.49
5.30     5.15        0.15     0.15        0.03      0.03          0.02            0.04
5.50     5.20        0.30     0.30        0.05      0.05          0.09            0.04
-        5.60        -        -           -         -             -               -

ME          -0.38
MAE         0.68
MPE         -0.05
MAPE        0.09
MSE         0.85
RMSE        0.92
Theil’s U   1.09


Summary

            Naïve Forecast 1   Naïve Forecast 2
ME          -0.5250            -0.3813
MAE         0.5750             0.6813
MPE         -0.0773            -0.0476
MAPE        0.0864             0.0943
MSE         0.7200             0.8478
RMSE        0.8485             0.9208
Theil’s U   1.0000             1.0851
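The summary statistics in the table can be reproduced with a short Python sketch. It assumes that both forecasts are evaluated over the same eight periods (the third observation onwards), and that Theil's U is the square root of the sum of squared forecast errors divided by the sum of the (A_t - A_{t-1})^2 column; that is one common form of the statistic and it matches the worksheet figures, but the function name `evaluate` and its `start` parameter are assumptions of this example.

```python
from math import sqrt

actual = [7.60, 9.70, 9.60, 7.50, 7.20, 7.00, 6.20, 5.50, 5.30, 5.50]

# Naive forecast 1: F_t = A_{t-1}; naive forecast 2: F_t = A_{t-1} + 0.5 * (A_{t-1} - A_{t-2}).
forecast1 = [None] + actual[:-1]
forecast2 = [None, None] + [
    actual[t - 1] + 0.5 * (actual[t - 1] - actual[t - 2]) for t in range(2, len(actual))
]


def evaluate(series, forecasts, start=2):
    """Summary error statistics over periods start..end (0-based, start >= 1)."""
    periods = range(start, len(series))
    errors = [series[t] - forecasts[t] for t in periods]
    pct_errors = [(series[t] - forecasts[t]) / series[t] for t in periods]
    n = len(errors)
    sum_sq = sum(e * e for e in errors)
    # Squared errors of the simple no-change forecast, used as the benchmark in Theil's U.
    benchmark_sq = sum((series[t] - series[t - 1]) ** 2 for t in periods)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(pct_errors) / n,
        "MAPE": sum(abs(p) for p in pct_errors) / n,
        "MSE": sum_sq / n,
        "RMSE": sqrt(sum_sq / n),
        "Theil U": sqrt(sum_sq / benchmark_sq),
    }


summary1 = evaluate(actual, forecast1)
summary2 = evaluate(actual, forecast2)
for name in summary1:
    print(f"{name:8s} {summary1[name]:8.4f} {summary2[name]:8.4f}")
```

A Theil's U of exactly 1 for the first model is expected under this definition, because the first naïve forecast is itself the no-change benchmark; a value above 1 for the second model means it did worse than that benchmark on this series.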

Introduction


Follow the Example
Go to the sheet “Your Try” and use the Tools – Data Analysis menu to exponentially smooth the data in the “Actual” column, using a smoothing constant of 0.5. The input dialog box shows the inputs needed. You should forecast a value for the first month in 1996.
Remember that the Excel exponential smoothing function applies the damping factor to the forecast value, not the actual value. This means that the damping factor is equal to 1 minus alpha. For example, if you want an alpha value of 0.3 then you need a damping factor of 1 - 0.3 = 0.7.
You will find a solution and a fully worked example on other worksheet tabs.
The Excel Exponential Smoothing Function
You can automatically exponentially smooth data in Excel using the Exponential Smoothing function from the Data Analysis menu. First check that the Analysis ToolPak is turned on. Click on the Tools menu. The bottom item should be Data Analysis. If it isn’t, you will need to turn the ToolPak on: go to the Tools menu, click on Add-Ins, then check the box next to Analysis ToolPak. After a short period you should be able to access Data Analysis under the Tools menu. When you click on Data Analysis you will see a long list of statistical methods that you can access. Go down the list and click Exponential Smoothing. A dialog box will appear (it may vary depending upon the version of Excel you are using). The example input uses a damping factor of 0.5. The damping factor is not the alpha referred to in your text; it is 1 minus that alpha. You need to enter the cell range for the input data and also the destination of the output. If you include a label in the first row of your data, you should check the Labels in First Row box. When everything is entered, click OK and the smoothed data is calculated.
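The relationship between alpha and Excel's damping factor, and the smoothing recursion itself, can be sketched in Python. The five values are the first five observations used in the fully worked example on the later sheet, with alpha = 0.6; seeding the first forecast with the first actual value follows that example, while the function name `exponential_smoothing` is an assumption of this sketch.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing.

    forecasts[0] is seeded with the first actual value, and
    forecasts[t + 1] = alpha * series[t] + (1 - alpha) * forecasts[t].
    The final element is the forecast for the period after the data ends.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    forecasts = [series[0]]
    for t in range(len(series)):
        forecasts.append(alpha * series[t] + (1 - alpha) * forecasts[t])
    return forecasts


alpha = 0.6
damping_factor = 1 - alpha  # the value Excel's dialog box asks for

first_five = [94.3, 93.2, 91.5, 92.6, 92.8]  # 1994M1 to 1994M5
forecasts = exponential_smoothing(first_five, alpha)
for period, value in enumerate(forecasts, start=1):
    print(period, round(value, 3))
```

With these inputs the forecasts for the second to fifth periods are 94.300, 93.640, 92.356 and 92.502, which are the figures shown in the worked calculation.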

Your Try

Period    Actual
1994M1    94.300
1994M2    93.200
1994M3    91.500
1994M4    92.600
1994M5    92.800
1994M6    91.200
1994M7    89.000
1994M8    91.700
1994M9    91.500
1994M10   92.700
1994M11   91.600
1994M12   95.100
1995M1    97.600
1995M2    95.100
1995M3    90.300
1995M4    92.500
1995M5    89.800
1995M6    92.700
1995M7    94.400
1995M8    96.200
1995M9    88.900
1995M10   90.200
1995M11   88.200
1995M12   91.000
1996M1    (forecast this value)
Created by Peter Rossini © 2000 UPmarket Software Services

Exponential Smoothing Solution

Period    Actual    Forecast   Error    Pct Error   Sq Error
1994M1    94.300    94.300     0.000    0.000%      0.000
1994M2    93.200    94.300     -1.100   1.180%      1.210
1994M3    91.500    93.970     -2.470   2.699%      6.101
1994M4    92.600    93.229     -0.629   0.679%      0.396
1994M5    92.800    93.040     -0.240   0.259%      0.058
1994M6    91.200    92.968     -1.768   1.939%      3.127
1994M7    89.000    92.438     -3.438   3.863%      11.818
1994M8    91.700    91.406     0.294    0.320%      0.086
1994M9    91.500    91.494     0.006    0.006%      0.000
1994M10   92.700    91.496     1.204    1.299%      1.449
1994M11   91.600    91.857     -0.257   0.281%      0.066
1994M12   95.100    91.780     3.320    3.491%      11.022
1995M1    97.600    92.776     4.824    4.943%      23.270
1995M2    95.100    94.223     0.877    0.922%      0.769
1995M3    90.300    94.486     -4.186   4.636%      17.525
1995M4    92.500    93.230     -0.730   0.790%      0.533
1995M5    89.800    93.011     -3.211   3.576%      10.312
1995M6    92.700    92.048     0.652    0.703%      0.425
1995M7    94.400    92.244     2.156    2.284%      4.650
1995M8    96.200    92.890     3.310    3.440%      10.953
1995M9    88.900    93.883     -4.983   5.606%      24.834
1995M10   90.200    92.388     -2.188   2.426%      4.789
1995M11   88.200    91.732     -3.532   4.004%      12.474
1995M12   91.000    90.672     0.328    0.360%      0.107
1996M1    MISSING   90.771

Smoothing Constant (alpha)   0.3
RMS Error                    2.519

[Chart: Simple Exponential Smoothing Forecast of the Index of Consumer Sentiment; x-axis: Year and Month; y-axis: Index; series: Actual, Forecast]

Fully Worked Example

Example of Exponential Smoothing Calculations – 5 Periods with Calculations

Period    Actual    Calculation                      Forecast
1994M1    94.300    Last Forecast or Actual Value    94.300
1994M2    93.200    (0.6)(94.3) + (1 – 0.6)(94.3)    94.300
1994M3    91.500    (0.6)(93.2) + (0.4)(94.3)        93.640
1994M4    92.600    (0.6)(91.5) + (0.4)(93.64)       92.356
1994M5    92.800    (0.6)(92.6) + (0.4)(92.356)      92.502

Alpha   0.6   (change this value to see the effect on the calculations, the forecast and the errors)

Example of Exponential Smoothing Calculations – all periods

Period    Actual    Forecast   Error    Pct Error   Sq Error
1994M1    94.300    94.300     -        -           -
1994M2    93.200    94.300     -1.100   1.180%      1.210
1994M3    91.500    93.640     -2.140   2.339%      4.580
1994M4    92.600    92.356     0.244    0.263%      0.060
1994M5    92.800    92.502     0.298    0.321%      0.089
1994M6    91.200    92.681     -1.481   1.624%      2.193
1994M7    89.000    91.792     -2.792   3.138%      7.797
1994M8    91.700    90.117     1.583    1.726%      2.506
1994M9    91.500    91.067     0.433    0.473%      0.188
1994M10   92.700    91.327     1.373    1.481%      1.886
1994M11   91.600    92.151     -0.551   0.601%      0.303
1994M12   95.100    91.820     3.280    3.449%      10.757
1995M1    97.600    93.788     3.812    3.906%      14.531
1995M2    95.100    96.075     -0.975   1.025%      0.951
1995M3    90.300    95.490     -5.190   5.748%      26.937
1995M4    92.500    92.376     0.124    0.134%      0.015
1995M5    89.800    92.450     -2.650   2.951%      7.025
1995M6    92.700    90.860     1.840    1.985%      3.385
1995M7    94.400    91.964     2.436    2.580%      5.934
1995M8    96.200    93.426     2.774    2.884%      7.697
1995M9    88.900    95.090     -6.190   6.963%      38.319
1995M10   90.200    91.376     -1.176   1.304%      1.383
1995M11   88.200    90.670     -2.470   2.801%      6.103
1995M12   91.000    89.188     1.812    1.991%      3.283
1996M1    MISSING   90.275

Smoothing Constant (alpha)   0.6
RMS Error                    2.575

Finding the Optimal Value of Alpha using SOLVER
SORITEC and similar forecasting software will automatically find the optimal value for alpha by finding the value which minimises the RMS error. This can also be done using Solver in Excel. The Solver function enables the user to maximise or minimise a value (or function) by changing other cells until the optimal solution is found. This is the process used in optimising methods such as linear, non-linear, integer or dynamic programming.
For this example it is quite simple: minimise the value of the RMS error by changing the value of alpha. To do this click Tools, then Solver, and the Solver dialog box should appear. In this case we set Solver to minimise the value in cell F28, which is the RMS error, by changing the value of alpha (cell D27). Click Solve and you will find the same solution that you would get using SORITEC.
NOTE: In this spreadsheet the value for alpha has been named rather than using a cell reference. To find out how to use names, consult the Excel help system.
Change the value of alpha and see what happens. Try to find the value for alpha that minimises the RMS error. Then go to the next sheet to find out how to do this easily.
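The search that Solver performs can be imitated in Python with a simple grid search over alpha. This is a stand-in for illustration, not a description of how Solver or SORITEC work internally. The series is the "Actual" column from the exponential smoothing sheets; the function names, the grid of 101 candidate values, and the choice to divide by the number of forecast errors when computing the RMS error are assumptions of this sketch.

```python
from math import sqrt

# "Actual" column, 1994M1 to 1995M12.
index = [
    94.3, 93.2, 91.5, 92.6, 92.8, 91.2, 89.0, 91.7, 91.5, 92.7, 91.6, 95.1,
    97.6, 95.1, 90.3, 92.5, 89.8, 92.7, 94.4, 96.2, 88.9, 90.2, 88.2, 91.0,
]


def smoothing_rmse(series, alpha):
    """RMS error of one-step-ahead exponential smoothing forecasts.

    The first forecast is seeded with the first actual value; errors are
    measured from the second period onwards.
    """
    forecast = series[0]
    squared_errors = []
    for t in range(1, len(series)):
        forecast = alpha * series[t - 1] + (1 - alpha) * forecast
        squared_errors.append((series[t] - forecast) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))


def best_alpha(series, steps=100):
    """Return the alpha on an evenly spaced grid over [0, 1] with the lowest RMS error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda a: smoothing_rmse(series, a))


optimal = best_alpha(index)
print("alpha = 0.3 gives RMS error", round(smoothing_rmse(index, 0.3), 3))
print("grid-search alpha", optimal, "gives RMS error", round(smoothing_rmse(index, optimal), 3))
```

A grid search is coarse but easy to check by hand; a finer grid or a numerical optimiser would refine the answer in the same way that Solver iterates on the alpha cell.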


Introduction

The Excel Moving Average Function in Analysis Tools

You can automatically calculate moving averages in Excel using the Moving Average function in the Analysis ToolPak. First check that the Analysis ToolPak is turned on. Click on the Tools menu. The bottom item should be Data Analysis. If it isn’t, you will need to turn the ToolPak on: go to the Tools menu, click on Add-Ins, then check the box next to Analysis ToolPak. After a short period you should be able to access Data Analysis under the Tools menu.
When you open Data Analysis you will see a long list of statistical methods that you can access. Go down the list and click Moving Average. A dialog box will appear (it may vary depending upon the version of Excel you are using).
The example input is for a three period (Interval) moving average.
You need to enter the cell range for the input data and also the destination of the output. If you include a label in the first row of your data, you should check the Labels in First Row box. When everything is entered, click OK and the moving averages will be calculated.
Follow the Example
Go to the sheet “Your Try” and use the Excel Data Analysis, Moving Average Add-in to calculate 3 and 5 period moving averages.
The input dialog box shows the inputs needed for the 3 period moving average. The calculations, including a one period forecast, are shown on the “Moving Average Solution” sheet, together with a chart and Root Mean Square (RMS) errors indicating the “best” estimate. Try to follow the calculations for the RMS errors. The “Simple MA Example” sheet shows the concept of how the moving average is calculated.
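As a cross-check on the spreadsheet calculations, a trailing moving average forecast and its RMS error can be sketched in Python. The eight values are the first eight quarters of the "Actual" column (1980Q1 to 1981Q4); the function names are assumptions of this example, and the forecast for a period is taken to be the average of the preceding `window` actual values, as on the Moving Average Solution sheet.

```python
from math import sqrt


def moving_average_forecasts(series, window):
    """forecasts[t] is the mean of the `window` values before period t.

    The first `window` entries are None (not enough history), and the final
    entry is the forecast for the period after the data ends.
    """
    forecasts = [None] * window
    for t in range(window, len(series) + 1):
        forecasts.append(sum(series[t - window:t]) / window)
    return forecasts


def rms_error(series, forecasts):
    """Root mean square error over the periods that have a forecast."""
    squared = [(a - f) ** 2 for a, f in zip(series, forecasts) if f is not None]
    return sqrt(sum(squared) / len(squared))


quarters = [243.529, 232.129, 219.844, 210.585, 205.624, 219.938, 231.738, 224.549]

forecast_3q = moving_average_forecasts(quarters, 3)
forecast_5q = moving_average_forecasts(quarters, 5)

print("3Q forecast for 1980Q4:", round(forecast_3q[3], 3))
print("5Q forecast for 1981Q2:", round(forecast_5q[5], 3))
print("RMS error (3Q, first eight quarters):", round(rms_error(quarters, forecast_3q), 3))
```

Because only the first eight quarters are used here, the RMS error printed by this sketch will differ from the full-sample figures reported on the solution sheet.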

Your Try

Period   Actual
1980Q1   243.529
1980Q2   232.129
1980Q3   219.844
1980Q4   210.585
1981Q1   205.624
1981Q2   219.938
1981Q3   231.738
1981Q4   224.549
1982Q1   233.729
1982Q2   244.012
1982Q3   259.075
1982Q4   259.160
1983Q1   235.686
1983Q2   237.483
1983Q3   242.444
1983Q4   234.117
1984Q1   230.830
1984Q2   229.758
1984Q3   243.576
1984Q4   246.140
1985Q1   257.428
1985Q2   250.814
1985Q3   238.397
1985Q4   207.214
1986Q1   187.909
1986Q2   169.855
1986Q3   155.852
1986Q4   160.431
1987Q1   153.217
1987Q2   142.653
1987Q3   147.010
1987Q4   135.656
1988Q1   127.964
1988Q2   125.710
1988Q3   133.697
1988Q4   125.185
1989Q1   128.577
1989Q2   137.959
1989Q3   142.297
1989Q4   143.139
1990Q1   148.070
1990Q2   155.385
1990Q3   145.051
1990Q4   130.918
1991Q1   133.988
1991Q2   138.358
1991Q3   136.339
1991Q4   129.478
1992Q1   128.695
1992Q2   130.388
1992Q3   124.928
1992Q4   123.021
1993Q1   120.929
1993Q2   110.056
1993Q3   105.678
1993Q4   108.274
1994Q1   107.657
1994Q2   103.259
1994Q3   99.056
1994Q4   98.866
1995Q1   96.108
1995Q2   84.487
1995Q3   94.161
1995Q4   101.539
1996Q1   105.827
Created by Peter Rossini © 1999 UniSA

Note (Peter Rossini) on 1996Q1: This is the actual value for this quarter. DO NOT include it in your calculation; you can then use it to test the quality of your one period forecast.

Moving Average Solution

Period   Actual    3Q MA     3Q MA Forecast   5Q MA     5Q MA Forecast   Sq Error 3Q forecast   Sq Error 5Q forecast
1980Q1   243.529   -         -                -         -                -                      -
1980Q2   232.129   -         -                -         -                -                      -
1980Q3   219.844   231.834   -                -         -                -                      -
1980Q4   210.585   220.853   231.834          -         -                451.520                -
1981Q1   205.624   212.018   220.853          222.342   -                231.912                -
1981Q2   219.938   212.049   212.018          217.624   222.342          62.732                 5.780
1981Q3   231.738   219.100   212.049          217.546   217.624          387.657                199.205
1981Q4   224.549   225.408   219.100          218.487   217.546          29.692                 49.045
1982Q1   233.729   230.005   225.408          223.116   218.487          69.233                 232.325
1982Q2   244.012   234.097   230.005          230.793   223.116          196.187                436.660
1982Q3   259.075   245.605   234.097          238.621   230.793          623.917                799.860
1982Q4   259.160   254.082   245.605          244.105   238.621          183.729                421.867
1983Q1   235.686   251.307   254.082          246.332   244.105          338.425                70.880
1983Q2   237.483   244.110   251.307          247.083   246.332          191.103                78.312
1983Q3   242.444   238.538   244.110          246.770   247.083          2.774                  21.522
1983Q4   234.117   238.015   238.538          241.778   246.770          19.542                 160.088
1984Q1   230.830   235.797   238.015          236.112   241.778          51.619                 119.859
1984Q2   229.758   231.568   235.797          234.926   236.112          36.470                 40.373
1984Q3   243.576   234.721   231.568          236.145   234.926          144.184                74.816
1984Q4   246.140   239.825   234.721          236.884   236.145          130.386                99.900
1985Q1   257.428   249.048   239.825          241.546   236.884          309.877                422.048
1985Q2   250.814   251.461   249.048          245.543   241.546          3.119                  85.888
1985Q3   238.397   248.880   251.461          247.271   245.543          170.659                51.068
1985Q4   207.214   232.142   248.880          239.999   247.271          1736.028               1604.563
1986Q1   187.909   211.173   232.142          228.352   239.999          1956.529               2713.326
1986Q2   169.855   188.326   211.173          210.838   228.352          1707.205               3421.946
1986Q3   155.852   171.205   188.326          191.845   210.838          1054.561               3023.438
1986Q4   160.431   162.046   171.205          176.252   191.845          116.086                986.865
1987Q1   153.217   156.500   162.046          165.453   176.252          77.951                 530.620
1987Q2   142.653   152.100   156.500          156.402   165.453          191.739                519.831
1987Q3   147.010   147.627   152.100          151.833   156.402          25.911                 88.202
1987Q4   135.656   141.773   147.627          147.793   151.833          143.297                261.682
1988Q1   127.964   136.877   141.773          141.300   147.793          190.688                393.205
1988Q2   125.710   129.777   136.877          135.799   141.300          124.694                243.048
1988Q3   133.697   129.124   129.777          134.007   135.799          15.369                 4.417
1988Q4   125.185   128.197   129.124          129.642   134.007          15.513                 77.835
1989Q1   128.577   129.153   128.197          128.227   129.642          0.144                  1.135
1989Q2   137.959   130.574   129.153          130.226   128.227          77.546                 94.720
1989Q3   142.297   136.278   130.574          133.543   130.226          137.437                145.719
1989Q4   143.139   141.132   136.278          135.431   133.543          47.078                 92.083
1990Q1   148.070   144.502   141.132          140.008   135.431          48.140                 159.734
1990Q2   155.385   148.865   144.502          145.370   140.008          118.440                236.440
1990Q3   145.051   149.502   148.865          146.788   145.370          14.544                 0.102
1990Q4   130.918   143.785   149.502          144.513   146.788          345.365                251.870
1991Q1   133.988   136.652   143.785          142.682   144.513          95.975                 110.767
1991Q2   138.358   134.421   136.652          140.740   142.682          2.909                  18.700
1991Q3   136.339   136.228   134.421          136.931   140.740          3.677                  19.369
1991Q4   129.478   134.725   136.228          133.816   136.931          45.567                 55.544
1992Q1   128.695   131.504   134.725          133.372   133.816          36.361                 26.227
1992Q2   130.388   129.520   131.504          132.652   133.372          1.245                  8.902
1992Q3   124.928   128.004   129.520          129.966   132.652          21.090                 59.654
1992Q4   123.021   126.112   128.004          127.302   129.966          24.827                 48.227
1993Q1   120.929   122.959   126.112          125.592   127.302          26.867                 40.615
1993Q2   110.056   118.002   122.959          121.864   125.592          166.496                241.374
1993Q3   105.678   112.221   118.002          116.922   121.864          151.881                262.000
1993Q4   108.274   108.003   112.221          113.592   116.922          15.579                 74.795
1994Q1   107.657   107.203   108.003          110.519   113.592          0.119                  35.219
1994Q2   103.259   106.397   107.203          106.985   110.519          15.555                 52.705
1994Q3   99.056    103.324   106.397          104.785   106.985          53.880                 62.860
1994Q4   98.866    100.394   103.324          103.422   104.785          19.879                 35.039
1995Q1   96.108    98.010    100.394          100.989   103.422          18.368                 53.502
1995Q2   84.487    93.153    98.010           96.355    100.989          182.872                272.325
1995Q3   94.161    91.585    93.153           94.536    96.355           1.016                  4.813
1995Q4   101.539   93.396    91.585           95.032    94.536           99.075                 49.048
1996Q1   105.827   -         93.396           -         95.032           -                      -

RMS ERROR   3Q forecast: 14.464   5Q forecast: 18.297

[Chart: Actual Data and the Three-Quarter Moving Average Forecast; x-axis: Year and Quarter; series: Actual, Three Quarter Moving Average Forecast, Five-Quarter Moving Average Forecast]

Simple MA Example

Period | Actual | Three Quarter Moving Average | Three Quarter Moving Average Forecast | Five-Quarter Moving Average | Five-Quarter Moving Average Forecast | Sq Error 3Q forecast | Sq Error 5Q forecast
1980Q1 | 243.529 | Missing | Missing | Missing | Missing | Missing | Missing
1980Q2 | 232.129 | Missing | Missing | Missing | Missing | Missing | Missing
1980Q3 | 219.844 | 231.834 | Missing | Missing | Missing | Missing | Missing
1980Q4 | 210.585 | 220.853 | 231.834 | Missing | Missing | 451.520 | Missing
1981Q1 | 205.624 | 212.018 | 220.853 | 222.342 | Missing | 231.912 | Missing
1981Q2 | 219.938 | 212.049 | 212.018 | 217.624 | 222.342 | 62.732 | 5.780
1981Q3 | 231.738 | 219.100 | 212.049 | 217.546 | 217.624 | 387.657 | 199.205
1981Q4 | 224.549 | 225.408 | 219.100 | 218.487 | 217.546 | 29.692 | 49.045
1982Q1 | 233.729 | 230.005 | 225.408 | 223.116 | 218.487 | 69.233 | 232.325
1982Q2 | 244.012 | 234.097 | 230.005 | 230.793 | 223.116 | 196.187 | 436.660
…
1995Q2 | 84.487 | 93.153 | 98.010 | 96.355 | 100.989 | 182.872 | 272.325
1995Q3 | 94.161 | 91.585 | 93.153 | 94.536 | 96.355 | 1.016 | 4.813
1995Q4 | 101.539 | 93.396 | 91.585 | 95.032 | 94.536 | 99.075 | 49.048
1996Q1 | 105.827 | | 93.396 | | 95.032 | |

Created by Peter Rossini © 2000 UPmarket Software Services

RMS ERROR 14.464 18.297
Calculation
MISSING Missing
(243.529+232.129+219.844)/3
(232.129+219.844+210.585)/3
(219.844+210.585+205.624)/3
(210.585+205.624+219.938)/3
(205.624+219.938+231.738)/3
(219.938+231.738+224.549)/3
(231.738+224.549+233.729)/3
(224.549+233.729+244.012)/3
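
The calculation column above can also be reproduced outside Excel. The following is a minimal Python sketch, not part of the original workbook: the function names `moving_average` and `ma_forecast` are illustrative assumptions, and only the first eight actual values from the Simple MA Example are used, so the RMS error it computes covers those few quarters only and will not match the full-sample figures in the worksheet.

```python
from math import sqrt

# First eight actual values from the Simple MA Example (1980Q1 to 1981Q4).
actual = [243.529, 232.129, 219.844, 210.585, 205.624, 219.938, 231.738, 224.549]


def moving_average(values, window):
    """Trailing moving average; None where fewer than `window` values exist."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # shown as "Missing" in the worksheet
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def ma_forecast(values, window):
    """The forecast for a period is the moving average of the period before."""
    return [None] + moving_average(values, window)[:-1]


ma3 = moving_average(actual, 3)
f3 = ma_forecast(actual, 3)

# Squared forecast errors, only for periods that have a forecast.
sq_err = [(a - f) ** 2 for a, f in zip(actual, f3) if f is not None]
rmse = sqrt(sum(sq_err) / len(sq_err))

print([None if m is None else round(m, 3) for m in ma3])
print([round(e, 3) for e in sq_err])
```

The same two functions work for the five-quarter version by passing `window=5`.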

Centered Moving Average Example

Year | Quarter | Time Index | Y | Moving Average | Centred Moving Average | Seasonal Factor | Calculation
Year 1 | First Quarter | 1 | 10 | Missing | Missing | Missing |
Year 1 | Second Quarter | 2 | 18 | Missing | Missing | Missing |
Year 1 | Third Quarter | 3 | 20 | 15.00 | 15.25 | 1.31 | 20/15.25=1.31
Year 1 | Fourth Quarter | 4 | 12 | 15.50 | 15.75 | 0.76 | 12/15.75=0.76
Year 2 | First Quarter | 5 | 12 | 16.00 | Missing | Missing |
Year 2 | Second Quarter | 6 | 20 | Missing | Missing | Missing |
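
The centred moving average and seasonal factor in the example above can be checked with a few lines of Python. This is only a sketch under the assumption that the moving average is a four-quarter average and that centring means averaging two adjacent four-quarter averages; the variable names are illustrative and do not come from the workbook.

```python
# Quarterly Y values from the Centered Moving Average Example.
y = [10, 18, 20, 12, 12, 20]

# Four-quarter moving averages. Each one falls between the second and
# third of the four quarters it covers.
ma4 = [sum(y[i:i + 4]) / 4 for i in range(len(y) - 3)]

# Centring: average two adjacent four-quarter averages so that the
# result lines up with an actual quarter.
centred = [(ma4[i] + ma4[i + 1]) / 2 for i in range(len(ma4) - 1)]

# centred[0] lines up with the third quarter (index 2), centred[1] with
# the fourth. The seasonal factor is the actual value over the centred average.
seasonal = [y[i + 2] / c for i, c in enumerate(centred)]

print(ma4)
print(centred)
print([round(s, 2) for s in seasonal])
```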

>Introduction

Forecasting and Business Analysis
Copyright Upmarket Software Services. This file must not be used without permission.

A Practice session
This practice session will provide a review of some of the basic skills needed in Excel. It is expected that most students will have covered this material before and that this is simply a refresher. If you have not covered all of this material before then you may also find it necessary to use a simple “how to” type Excel reference in order for you to get started with Excel.
What will you do in this session?
This session will take you through the worked example that is used for a simple naive model that includes a proportion of the change in actual values. It is a very simple model so the calculations should be easy to follow. The example is built up in several steps so that you can become more familiar with Excel without getting lost in any of the steps. In particular you will learn
– how to input a function and formula
– the difference between relative and absolute referencing and how to use each
– how to copy a formula to make a single formula apply to a group of cells
– how to use cell naming
How do I follow this session?
It’s easy. At the bottom of this page you will see a number of “tabs”.

You are currently looking at the Introduction tab, so it is highlighted. If you click on another tab, that sheet will open. To follow this session, simply work through the tabs in order!
I hope that you find this session to be useful.

The Naive Forecast

Proportion (P) 0.5
Naïve Forecast 2
Period Actual Forecast2
1 7.60
2 9.70
3 9.60 10.8
4 7.50 9.6
5 7.20 6.5
6 7.00 7.1
7 6.20 6.9
8 5.50 5.8
9 5.30 5.2
10 5.50 5.2
11 5.6

This is what we are aiming to produce.
It’s a simple spreadsheet to make a naïve forecast. The concept of the forecast is explained in the text box below and the final worked example is shown. We will be working towards this outcome through the various steps.
The naïve forecasting model used in this session uses the last actual value and the last CHANGE in actual values in order to make a forecast.
The preceding actual value is adjusted depending on the size and direction of the last change in actual values. This change is weighted using a proportion P.
This can be expressed as:
F_t = A_{t-1} + P(A_{t-1} - A_{t-2})
where
F_t is the forecast at time t
A_{t-1} is the actual value at time t-1, the period before t
A_{t-1} - A_{t-2} is the change in the actual values between t-2 and t-1
P is the weight. In this example a 50% weight (P = 0.5) is used.
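
For readers who want to check the worksheet outside Excel, here is a minimal Python sketch of the same naïve model. The function name `naive_forecast` is an illustrative assumption; the actual values are the ten values from the worksheet and the proportion is 0.5. The worksheet displays forecasts to one decimal place, so 10.75 appears there as 10.8.

```python
def naive_forecast(actual, p):
    """Forecast F_t = A_{t-1} + p * (A_{t-1} - A_{t-2}).

    Returns one entry per period 1..len(actual)+1. The first two periods
    have no forecast because two preceding actual values are needed.
    """
    forecasts = [None, None]
    for t in range(2, len(actual) + 1):
        last = actual[t - 1]       # A_{t-1}
        previous = actual[t - 2]   # A_{t-2}
        forecasts.append(last + p * (last - previous))
    return forecasts


# Actual values for periods 1 to 10 from the worksheet.
actual = [7.6, 9.7, 9.6, 7.5, 7.2, 7.0, 6.2, 5.5, 5.3, 5.5]
forecast2 = naive_forecast(actual, 0.5)

for period, f in enumerate(forecast2, start=1):
    print(period, "no forecast" if f is None else f"{f:.2f}")
```

Changing the second argument plays the same role as changing the proportion cell in the later steps of the session.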

Step 1

Proportion (P) 0.5
Naïve Forecast 2
Period Actual Forecast2
1 7.60
2 9.70
3 9.60
4 7.50
5 7.20
6 7.00
7 6.20
8 5.50
9 5.30
10 5.50
11

Relative Cell References
This sheet uses a relative cell reference. This means that as you copy the formula it will “point” to a new set of cells. In the next step we examine an absolute cell reference.
At this time you should go to the Excel Help system and learn more about cell referencing. To do this open the Help System and search for relative cell reference in the index.
We can start by entering the formula into the appropriate cell. The formula is:
F_t = A_{t-1} + P(A_{t-1} - A_{t-2})
In this example the first actual values are in cells C6 and C7. We will make our first forecast in period 3 and enter this into a new column headed Forecast2. We can enter the formula as shown below. Please enter this formula into cell D8.
=C7+0.5*(C7-C6)
You should get an answer of 10.8. This is the first period when we can make a forecast since we need two preceding actual values.
Now move your cursor over cell D8. A large Ft will appear to show you that this is what we are calculating. I have added similar markers for At-1 and At-2. (This will not normally happen in your spreadsheets.) Now click the left mouse button. The formula will appear in the formula bar. Move your cursor into the formula bar as in the diagram on the right and click the left hand mouse button when the cursor is in the formula bar. You should see the formula and relevant cells change colour. This should help you to match the cells in the formula with the original formula.
We can now copy this formula to apply to the full range of data. Move your cursor to cell D8 and click the left mouse button. Now move the cursor to the lower right hand corner of the cell and the cursor should change to the cursor as indicated on the diagram.
Now click and HOLD the left hand mouse button and move the mouse down the spreadsheet. This will “drag down” the formula. You can drag it down to cell D16.

Step 2

Proportion (P) 0.5
Naïve Forecast 2
Period Actual Forecast2
1 7.60
2 9.70
3 9.60 10.8
4 7.50 9.6
5 7.20 6.5
6 7.00 7.1
7 6.20 6.9
8 5.50 5.8
9 5.30 5.2
10 5.50 5.2
11 5.6

Absolute Cell References
This sheet uses a combination of relative and absolute cell references. This means that some references will always point to a specific or absolute cell regardless of how or where the formula is copied.
In step 1 the proportion was included as a fixed value (0.5). In this step we will add the proportion as a cell reference. The proportion is input into cell D2. The formula can be modified to point to this cell. The formula becomes
=C7+$D$2*(C7-C6)
Note that the reference to D2 is shown as $D$2. This is an absolute reference. Unlike the relative reference used before, this part of the formula will always point to exactly the cell D2 even when “dragged down”. You can easily make a cell reference absolute by using the F4 key.
If you “drag down” the formula now, you will find that the answer remains as before. The advantage of this method is that you can now change the proportion for the whole forecast by simply changing the value in D2.
Using the same method as in step one, drag the formula down until period 11. Now try changing the value in D2 and see the effect on the forecast values.
In this example the absolute reference refers to an absolute column ($D) and an absolute row ($2). It is also possible to keep one part of the reference absolute, while leaving the other part relative. So $D2 would mean that the reference is always to column D but that as the formula is copied the row reference will change. There is a good discussion of this in the Excel help menu under the heading “the difference between relative and absolute references”. You should read this as you will need to use all forms of relative and absolute references in later problems.

Step 3

Proportion (P) 0.5
Naïve Forecast 2
Period Actual Forecast2
1 7.60
2 9.70
3 9.60 10.8
4 7.50 9.6
5 7.20 6.5
6 7.00 7.1
7 6.20 6.9
8 5.50 5.8
9 5.30 5.2
10 5.50 5.2
11 5.6

Cell Names as a Reference
This sheet uses cell names. Cell names are very useful in complex formulas or sheets as they make it easier to keep track of variables. You should read about cell names in the Excel Help under the heading “about labels and names in formulas”.
In step 2 the proportion was included by using an absolute cell reference. In more complex situations we may find that naming the cell is easier. A named cell can be referred to by name rather than by the cell reference. This is particularly useful if we are using a large number of variables or the same variable on multiple sheets. In this case we call the proportion P.
If you click on cell D2 you will see that the letter P appears in the Name Box as in the diagram on the right. Since the proportion is now named P we can use the formula below to calculate the naïve forecast.
=C7+P*(C7-C6)
To name a cell, simply click on the cell, then move the cursor to the Name Box and type in the name.
Try entering a cell name. Click on cell D16 and name it forecast by typing this in the Name Box. When you have typed it, hit the return (enter) key. Now click on cell C16 and enter the formula =forecast. This should give you the value of the cell that you have named forecast.

Introduction

Forecasting and Business Analysis

Estimating the Unknown Parameters in a Simple Regression Model
This Excel template shows a simple example of how the unknown parameters are estimated in a simple regression. These parameters are b0 (intercept) and b1 (slope). Students are not expected to be able to calculate these by hand, and for most problems a computer would always be used to estimate the parameters. The calculations and demonstration here are simply to provide a greater level of understanding. Using the ToolPak and the LINEST function is also explained.

[Diagram: scatter of six observed values (Y against X) with the line of best fit. The errors between each point and the line are labelled e1 to e6.]

Ordinary Least Squares
The method used to estimate the unknown parameters is called ordinary least squares or OLS. The diagram illustrates the 6 observed values and the line of best fit for these points. The two unknown parameters for this line are b0 and b1. We can find these parameters by minimising the errors between each point and the line. These are labelled as e1 to e6. The errors are squared first, so the parameters are found by minimising the sum of the squared errors.
The values of b0 and b1 that achieve this minimum can be found by setting the partial derivatives of the sum of squared errors with respect to b0 and b1 to zero.
The method is shown on the “Estimating the Model” tab.

Estimating the Model

Responses Size Estimate e e^2
7 4 7 0 0
6 6 9 -3 9
10 6 9 1 1
9 7 10 -1 1
10 8 11 -1 1
11 9 12 -1 1
Sum of Squared Errors 13
Intercept b0 3
Slope b1 1

Estimate the relationship between advertisement responses and size
You are analysing the responses from your company’s latest advertising campaign. You wish to estimate the number of responses that you receive on average for each column centimetre of advertisement. You hypothesise that there is a positive relationship between the size of the advertisement and the number of responses. The relationship is shown on the scattergram. The line of best fit is also shown. You can adjust the line by changing the b0 and b1 parameters below.

[Chart: Estimating the Model. Horizontal axis: Advertisement Size (Column Centimetres). Vertical axis: Number of Responses.]

OLS Calculations

Size (X) | Response (Y) | X^2 | XY
4 | 7 | 16 | 28
6 | 6 | 36 | 36
6 | 10 | 36 | 60
7 | 9 | 49 | 63
8 | 10 | 64 | 80
9 | 11 | 81 | 99
Sums | 40 | 53 | 282 | 366
Mean | 6.667 | 8.833
n | 6 | 6

Slope b1 = 12.6666666667 / 15.3333333333 = 0.8260869565
Intercept b0 = 3.3260869565

Find the Line of best fit
Adjust the b0 and b1 values to find the best fit. The errors (e) above are the difference between the observed response and the model response. Try to minimise the squared error.
Y = b0 + b1(size)
Finding the line of best fit by minimising the sum of squared errors
While you can find the line of best fit by trial and error, you can also minimise the squared errors by formula. This is shown on the “OLS Calculations” tab. To make the trial and error (iteration) method quicker, you can use Solver to find the minimum. Use Solver as shown below to minimise the squared errors. Check that the results are the same as OLS.
Using Solver to minimise the error
You can open Solver from the Tools menu. If Solver does not appear in the menu you may have to click the appropriate tick box in Tools Add-ins. In the Solver screen notice that you are setting the target cell J17 to a minimum. Thus you minimise the sum of squared errors in J17. You do this by changing the values in H18 and H19. These are the two parameters b0 and b1. Use the Solve button to find the parameter values. If you ask it to “Keep Solver Solution” the new parameter estimates will become the parameter values (the same as for OLS).
Finding the line of best fit using the OLS formulae
This shows the application of the OLS formulae for estimating the parameters.
The formulae are applied to the data using the table and the formulae to the left. Note that these OLS results are the same as for the iterative (trial and error) approach.
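
As a cross-check on the worksheet, the OLS formulae can be applied to the six observations in a short Python sketch. This is an illustration rather than part of the workbook: the function names `ols_fit` and `sum_squared_errors` are assumptions, and the formulae used are the standard simple-regression ones, b1 = (ΣXY − n·X̄·Ȳ) / (ΣX² − n·X̄²) and b0 = Ȳ − b1·X̄.

```python
size = [4, 6, 6, 7, 8, 9]          # X, advertisement size
responses = [7, 6, 10, 9, 10, 11]  # Y, number of responses


def ols_fit(x, y):
    """Return (b0, b1) for the line y = b0 + b1 * x by ordinary least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    denominator = sum(xi * xi for xi in x) - n * x_bar ** 2
    b1 = numerator / denominator
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_errors(x, y, b0, b1):
    """Sum of squared differences between observed and fitted values."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = ols_fit(size, responses)
print(f"b0 = {b0:.10f}, b1 = {b1:.10f}")

# The trial values b0 = 3, b1 = 1 from the "Estimating the Model" tab
# give a larger sum of squared errors than the OLS estimates.
print(sum_squared_errors(size, responses, 3, 1))
print(sum_squared_errors(size, responses, b0, b1))
```

Comparing the two printed sums shows why the trial-and-error starting values are not yet the best fit.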

Using the Excel ToolPak

Responses Size
7 4
6 6
10 6
9 7
10 8
11 9

SUMMARY OUTPUT

Regression Statistics
Multiple R | 0.7453846705
R Square | 0.555598307
Adjusted R Square | 0.4444978838
Standard Error | 1.4465100429
Observations | 6

ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 10.4637681159 | 10.4637681159 | 5.0008658009 | 0.0889902245
Residual | 4 | 8.3695652174 | 2.0923913043
Total | 5 | 18.8333333333

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | 3.3260869565 | 2.5325153929 | 1.3133531057 | 0.2593334655 | -3.7053175738 | 10.3574914868 | -3.7053175738 | 10.3574914868
Size | 0.8260869565 | 0.3694053363 | 2.2362615681 | 0.0889902245 | -0.1995488055 | 1.8517227186 | -0.1995488055 | 1.8517227186

Estimate the relationship between advertisement responses and size
You are analysing the responses from your company’s latest advertising campaign. You wish to estimate the number of responses that you receive on average for each column centimetre of advertisement. You hypothesise that there is a positive relationship between the size of the advertisement and the number of responses. This relationship can be tested using regression analysis. The analysis can be performed using the Analysis ToolPak.
Excel Regression Using Data Analysis
The graphic on the left shows the regression dialog box. To find this, use the Tools – Data Analysis menu and select Regression from the list of methods. The most important inputs to this dialog box are:
Y Range – the cells that refer to the dependent variable
X Range – a contiguous set of cells that refer to one or more independent variables (up to 16 variables can be chosen)
Labels – needs to be ticked if the first row of the data is a label
Output options – you need to select to put the output in a specified range of the same worksheet, on a new worksheet ply (or tab), or in a new workbook
Residuals – you may choose to select from a range of residual options
Normal Probability – you may choose to have a normal probability plot
The Output
On the left you can see the output from the data analysis. Notice that the Coefficients are the same as for the other methods of analysis. These are the parameter estimates of the intercept and the slope. The other details are relevant statistics, which will be discussed at a later stage.

Using Linest

Responses Size
7 4
6 6
10 6
9 7
10 8
11 9

=LINEST(known_y’s,known_x’s,const,stats)
Enter Linest Function here

LINEST output (coefficients only):
Slope | Intercept
0.8260869565 | 3.3260869565

LINEST output (with statistics):
 | Slope | Intercept
Reg Coeff | 0.8260869565 | 3.3260869565
Stdev | 0.3694053363 | 2.5325153929
R Square, SEE | 0.555598307 | 1.4465100429
F, DF | 5.0008658009 | 4
Regression sum of squares, Residual sum of squares | 10.4637681159 | 8.3695652174

Estimate the relationship between advertisement responses and size
You are analysing the responses from your company’s latest advertising campaign. You wish to estimate the number of responses that you receive on average for each column centimetre of advertisement. You hypothesise that there is a positive relationship between the size of the advertisement and the number of responses. This relationship can be tested using regression analysis. The analysis can be performed using the LINEST function.
Details of the LINEST Function from the Excel Help file
LINEST(known_y’s,known_x’s,const,stats)
Fits a straight line to your data and returns an array that describes that line. The accuracy of the line depends on the degree of scattering in the data you provide. The more linear the data, the more accurate the LINEST model. LINEST uses the method of least squares for determining the best fit for the data.
The known_x’s, const, and stats arguments are optional.
If the array known_y’s is in a single row, then each row of known_x’s is interpreted as a separate variable.
If the array known_y’s is in a single column, then each column of known_x’s is interpreted as a separate variable.
The array known_x’s can include one or more sets of variables.
If you use only one variable, known_y’s and known_x’s can be shaped differently.
If you use more than one variable, known_y’s must be a vector (a range with a height or width of 1).
If you omit known_x’s, LINEST uses the values {1,2,3,…} in an array the same size as known_y’s.
If const is FALSE, the constant term b equals zero.
If const is TRUE or omitted, the constant term will be estimated.
If stats is FALSE or omitted, LINEST returns only the slope and y-intercept.
If stats is TRUE, LINEST returns the additional values:
Standard error for each coefficient
Standard error for the constant b
Coefficient of determination (r-squared)
Standard error for the y-estimate
F-statistic
Degrees of freedom
Regression sum of squares
Residual sum of squares
LINEST is an array function in Excel that produces regression results. The advantage of this over the Analysis ToolPak approach is that, like all functions, the results will change as values in the data set change. The function is applied below and you can read about it at the bottom of the sheet. To use LINEST you must use an array. This is a group of cells which cannot change. In this case the arrays will be the X’s and Y’s and the results. To input an array you must use Ctrl, Shift and Enter together rather than just Enter. See the further example to your right.
Time for you to try this for yourself
Use the fx command to enter the LINEST function in cell O21 for the data in cells O3 to P8. The function should be LINEST(O3:O8,P3:P8,TRUE,TRUE). Now highlight cells O21 to P25. Click the cursor in the Equation Edit Bar while the cells are still highlighted. Now press Ctrl+Shift+Enter. You should get an answer like those in cells O14 to P18. A version with labels is to the right of that.
The yellow highlighted cells are the results from the LINEST function. In this case only the two unknown parameters are shown. The example to the right shows all of the statistics as well. Note the “squiggly” brackets around the formula. This indicates it is entered as an array.
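
The additional statistics that LINEST returns can be reproduced for a single X variable with the Python sketch below. It is an illustration under stated assumptions, not Excel's implementation: the function name `linest_like` and the dictionary keys are hypothetical, and the formulae are the textbook simple-regression ones (residual degrees of freedom n − 2, standard error of the estimate as the square root of the residual mean square).

```python
from math import sqrt

size = [4, 6, 6, 7, 8, 9]          # X, advertisement size
responses = [7, 6, 10, 9, 10, 11]  # Y, number of responses


def linest_like(x, y):
    """Simple-regression statistics in the spirit of LINEST(y, x, TRUE, TRUE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_resid = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = ss_total - ss_resid
    df = n - 2                      # two estimated parameters
    se_y = sqrt(ss_resid / df)      # standard error of the y-estimate
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_y / sqrt(sxx),
        "se_intercept": se_y * sqrt(1 / n + x_bar ** 2 / sxx),
        "r_squared": ss_reg / ss_total,
        "se_y": se_y,
        "f_stat": ss_reg / (ss_resid / df),
        "df": df,
        "ss_reg": ss_reg,
        "ss_resid": ss_resid,
    }


stats = linest_like(size, responses)
for name, value in stats.items():
    print(f"{name}: {value:.10f}")
```

Unlike the Excel array formula, this sketch has to be re-run when the data change.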

OLS formulae (equation objects from the regression workbook):

Minimise \sum e_i^2 = \sum (Y_i - b_0 - b_1 X_i)^2

b_1 = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2}

b_0 = \bar{Y} - b_1 \bar{X}

\hat{Y}_i = b_0 + b_1 X_i

29/07/13

Forecasting and Business Analysis
Introduction, Data Handling and Correlation
Lecture 1

•  Lecturer: Dr Patricia Sourdin, patricia.sourdin@unisa.edu.au
•  Tutor: Mr Minh Nguyen, HuuMinh.Nguyen@unisa.edu.au
 

 

 

Admin
•  Lectures: 5 pm – 7 pm, Thursday
•  Tutorials: you need to be enrolled in one of the tutorials
 

Admin
•  Your textbook

Course website
Features:
•  Course information booklet
•  Lecture slides
•  Online forum
•  Assessment information
•  Study guide and data sets to help you prepare for tutorials
 

Assessment
Three pieces of assessment:
-- Assignment 1 (20%) due 13 September 2013
-- Assignment 2 (20%) due 8 November 2013
-- Final Exam (60%)

Please look at the course outline for specific instructions and rules related to pass marks.
 

Prerequisites
This course REQUIRES successful completion of the following courses:
– Either Statistics for Business (MATH 1052) or Quantitative Methods for Business (MATH 1053)
and
– Either Principles of Economics (ECON 1008), or Microeconomics (ECON 1006), or Macroeconomics (ECON 1007).
If you do not have these courses you will be de-enrolled automatically.
 
 

What you need
•  This course builds upon your previous study, and therefore we assume you have acquired knowledge of the following:
 – Basic MS Excel
 – Fundamental economic theory and logic
 – Basic algebraic techniques
 – Basic concepts from statistics, such as parameter estimation and statistical testing
•  You will need regular access to a computer with MS Excel installed
 
 

 

What FBA will do for you
•  Introduce you to a range of quantitative analysis techniques and their limitations
•  Take you from being able to compile and summarise data in a very basic way towards more sophisticated analysis
 – Provide you with practical forecasting and quantitative skills that can be applied in business, government, and academic research
•  These tools are highly sought by employers
 
 

 

Examples of questions that can be answered using the techniques you will acquire
•  Do lower speed limits save lives?
•  How much is a university degree worth in the labour market?
•  Does campaign spending influence election outcomes?
•  Does economic development lead to less or more environmental damage?
•  Why are some countries so poor and others so rich?
•  What is the likely effect of a marketing campaign on sales?
 
 

 

This week
•  Topic 1a – Brief Review
 – Review some basic concepts from maths and statistics
•  Topic 1b – Data Handling
 – Data types, graphical methods, descriptive stats – mean and variance
•  Topic 1c – Probability distributions
•  Topic 1d – Correlation
 – Definition, correlation table

Topic 1a
Review and concepts from maths and statistics
 

Functional Notation
•  Often we are interested in the relationship between 2 or more variables, which is often denoted using the concept of a function
 Y = f(X)
•  Read: “Y is a function of X”
•  X can be one variable, or it can be a vector of many variables

Equation of a Straight Line
•  Any straight line can be expressed as:
 Y = α + βX
•  where α (y-intercept) and β (slope of the line) are values which determine the quantitative relationship between X and Y.
•  β is the amount Y changes when X changes by one unit
•  Explain the intuition of β=0.5, β=-0.75??

Logarithms
•  The logarithm is a common way of transforming a variable in business analysis
•  The logarithm of A is the power to which B (a base value) must be raised to give the value A
•  For example, if B = 9 and A = 81 then the logarithm is 2, expressed as log9(81) = 2
•  Often we use the natural logarithm, where B = e (e, a mathematical constant, is approximately equal to 2.718)
•  For example, if GDP = 243 then loge(243) = ln(243) = 5.493
 
 

Ln(A) in Excel
•  To calculate the natural logarithm of a number in Excel, use “=LN(number)”
•  To return that result back to the original number, use “=EXP(number)” (this is called “exponentiation”)
•  Both operations can also be done using a calculator
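
The same two operations can be tried outside Excel. The short Python sketch below is an illustration only (the variable names are assumptions); it uses the standard library functions `math.log` and `math.exp`, which play the roles of `=LN(number)` and `=EXP(number)`, with the two examples from the Logarithms slide.

```python
import math

gdp = 243

log_gdp = math.log(gdp)      # natural logarithm, like =LN(243)
recovered = math.exp(log_gdp)  # exponentiation, like =EXP(...), undoes the log

# Logarithm to another base: the power to which 9 must be raised to give 81.
log9_of_81 = math.log(81, 9)

print(round(log_gdp, 3))
print(round(recovered, 6))
print(round(log9_of_81, 6))
```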
 
 

 

Where we are headed
•  This week we’ll discuss data types, handling, correlation analysis, and other simple procedures, and then move on to basic time-series forecasting techniques
•  This will be followed by an introduction to cross-sectional regression, and then time-series regression, which together will take up the bulk of the course
•  During the course, we’ll focus on how the techniques we have discussed can be used in real-world forecasting and data analysis
 
 

What does regression have to do with “Forecasting”???
•  The course does have some maths in it. Don’t panic!
•  We will focus on regression: regression is a statistical technique
•  Despite many rumours, neither math nor statistics is an evil thing: they are TOOLS, which are used EVERYWHERE in business and elsewhere
 
 

What does regression have to do with “Forecasting”??? (continued)
•  A forecast is a predicted outcome. The prediction may be:
 – Of an aggregate “macro” variable (e.g., unemployment)
 – Of a “micro” variable (e.g., sales)
 – Based on opinion, theory, data analysis, or some combination of these three
•  Regressions yield predictions – this is why they are used heavily in forecasting
 
 

 

More about forecasting
•  Forecasting helps businesses and governments make decisions in the face of uncertainty
•  Virtually all governments forecast macroeconomic indicators such as unemployment, consumer spending, population growth, and GDP
•  Forecasts are constructed at national, regional, state and local levels, and inform public and private policy (resource allocation) at every level
•  Some of the data used to develop such forecasts are confidential, and some come from government agencies and departments (like the ABS)
 
 

Examples
We have just seen a major economic event: the “global financial crisis”
 – What will GDP growth be this year?
 – How will sales of my company’s goods do this year?
 – What will the level of unemployment be?
 – What will happen to house prices?
 – Will wages fall?
 – What will happen to a particular stock price? What will happen to stock prices in general?
 
 

A note on cross-sectional versus time series predictions
•  We interpret a “forecast” broadly as a “prediction” in this course
•  The prediction can be for future time periods (using time series data)
•  A prediction can also be constructed using cross-sectional data

Topic 1b
Data handling
 

 

Subscripts and Summation
•  Subscripts are used to denote different observations of a variable
•  Conventionally, we use subscript i for cross-sectional observations (i.e. states, individuals, etc), and t for time series observations (i.e. years, months, quarters)
•  Say we had data for GDP (Y) over a 10 year period, with one observation for GDP (Y) in year 1, another in year 2, etc.
•  The individual values of Y can be expressed as Y1 (= GDP in year one), Y2 (= GDP in year 2), all the way up to Y10, or {Yt} where t = 1 to 10
 
 

…continued
•  We can then write Yt to denote any individual observation of GDP
•  If you are interested in calculating the average of GDP over the ten-year period, then you first want to add up (or sum) over all the observations
•  Use the summation operator, capital sigma:
 \sum_{t=1}^{10} Y_t = Y_1 + Y_2 + \dots + Y_{10}
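
The summation above maps directly onto a loop or a built-in sum. The Python sketch below is only an illustration: the ten GDP values are made-up numbers chosen for the example (they do not come from the lecture), and the variable names are assumptions.

```python
# Hypothetical GDP observations Y_1 ... Y_10 (illustrative values only).
gdp = [100.0, 103.0, 105.5, 104.0, 108.2, 111.0, 113.4, 116.9, 118.0, 121.5]

# Sum over t = 1..10 written out as a loop, mirroring the sigma notation.
total_loop = 0.0
for t in range(1, 11):
    total_loop += gdp[t - 1]   # Y_t is stored at list index t - 1

# The same sum using the built-in function.
total = sum(gdp)

# The average over the ten-year period is the sum divided by the number of observations.
average = total / len(gdp)

print(total, average)
```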

Data Types
Types of Data:
 – time series data (Yt for t=1,…,T)
 – cross-sectional data (Yi for i=1,…,N)
 – panel data (Yit for i=1,…,N and t=1,…,T)

Cross-sectional vs time-series data
•  Cross-sectional data are observations on one or more unit-level variables collected at a single point in time
 -- Repeated cross-sections: cross-sectional data on variables that are roughly comparable and observed at successive time periods (but NOT for the same sample)
•  A “time series” is a series of observations on one variable over successive periods of time
 
 

Cross-sectional example
Household number | Yearly Household spending | Yearly Household income
1 | 30,000 | 100,000
2 | 40,000 | 70,000
3 | 80,000 | 60,000
4 | 100,000 | 250,000
5 | 30,000 | 25,000
6 | 15,000 | 20,000
7 | 40,000 | 60,000
8 | 50,000 | 50,000
9 | 80,000 | 90,000
10 | 20,000 | 100,000
… | … | …
100 | 30,000 | 40,000
 
 
 

Time-series data
Time Period | Median house price
1 | 100,000
2 | 150,000
3 | 150,000
4 | 155,000
5 | 157,000
6 | 200,000
7 | 210,000
8 | 150,000
9 | 200,000
10 | 205,000
… | …
100 | 250,000
 
 
 

Panel data example

Time Period   State   # Road Accidents
1             NSW     46
1             WA      31
1             Vic     19
2             NSW     52
2             WA      47
2             Vic     17
3             NSW     49
3             WA      37
3             Vic     14
…             …       …
 
 
 

 

 

Variable types

Categorical
 – Nominal scale: one-to-one or many-to-one mapping of categories into numerical “dummies”
 – Ordinal scale: numbers assigned to categories reflect an inherent ranking
Numerical
 – Discrete (binary or multinomial); often equivalent to ordinal categorical variables – e.g., number of bedrooms in a house
 – Continuous
Most variables we wish to forecast are numerical
 
 

Commonly used transformations

• Levels versus Growth Rates
 – Example: Might be more interested in growth of GDP as opposed to levels

Growth Rates:

$$\frac{Y_t - Y_{t-1}}{Y_{t-1}} \times 100$$

• Log transformations
• Proportions – e.g., the proportion of people in Australia with a university degree
• Index (e.g. CPI). See Sect 2.1 Koop.
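
The growth-rate transformation can be sketched in a few lines of Python. This is an illustration only: the function name `growth_rates` is made up for this sketch, and the input is the first five periods of the median house price series from the time-series example.

```python
def growth_rates(levels):
    """Percentage growth between consecutive observations:
    (Y_t - Y_{t-1}) / Y_{t-1} * 100."""
    return [
        (curr - prev) / prev * 100
        for prev, curr in zip(levels, levels[1:])
    ]

# First five periods of the median house price series.
house_prices = [100_000, 150_000, 150_000, 155_000, 157_000]
rates = growth_rates(house_prices)

# One growth rate per pair of consecutive periods, so one fewer than the levels.
print([round(r, 2) for r in rates])
```

Note that a series of T levels yields only T − 1 growth rates, because the first observation has no predecessor.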

Graphical methods

Time series
• Retail trade over time
 

Histograms

• Example questions:
 – “What is the distribution of income across countries?”
 – “What is the extent of global inequality?”
• Related to the idea of a distribution.
• Data to use: real GDP per capita in 1992 for 90 countries measured in $US.
 
 

Constructing a Histogram: Step 1

• Construct “class intervals” (“bins”).
• Real GDP per capita in our data set varies from $408 in Chad to $17,945 in the U.S.
• Class intervals must include these extremes.
• One choice of class intervals (of many choices possible):
 
 

 

Step 2: Calculate frequencies.

• Count the number of countries whose GDP per capita falls into each bin.

Step 3: Make a bar chart

• Make a bar chart, with the bins on the x-axis, and frequency on the y-axis.
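
The three histogram steps can be sketched as follows. The GDP figures below are hypothetical (only the extremes $408 and $17,945 come from the slides), the bin edges are one arbitrary choice among many, and `bin_frequencies` is a name invented for this sketch.

```python
def bin_frequencies(values, edges):
    """Count how many values fall in each class interval [edges[i], edges[i+1]).
    The last interval also includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lower, upper = edges[i], edges[i + 1]
            is_last = i == len(counts) - 1
            if lower <= v < upper or (is_last and v == upper):
                counts[i] += 1
                break
    return counts

# Step 1: class intervals that cover the extremes (408 and 17945).
edges = [0, 2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000, 18000]

# Hypothetical GDP-per-capita values; only the two extremes are from the slides.
gdp = [408, 950, 1200, 2500, 3100, 4800, 7600, 9900, 12500, 17945]

# Step 2: frequencies per bin.
freq = bin_frequencies(gdp, edges)

# Step 3: a text bar chart, one row per bin, bar length = frequency.
for lower, upper, n in zip(edges, edges[1:], freq):
    print(f"{lower:>6}-{upper:<6} {'#' * n}")
```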
 
 

XY Graph (cross-sectional)

• Example: Deforestation and Population density for 70 tropical countries.
• Question of interest:
 – “Do countries with high population density also tend to have high deforestation rates?”
• Plot of one variable versus another (e.g. deforestation on y-axis, population density is on x-axis).
• Each point on graph represents deforestation and population density for one country.
 
 

 

Scatterplot Example

XY plot of population density against deforestation
 

Interpretation of XY-plots

• There seems to be a positive relationship between deforestation and population density
• Countries with low population density also tend to have low deforestation rates (i.e. low-low)
• Countries with high population density also tend to have high deforestation rates (i.e. high-high)
• Outliers: countries which do not fit the “general pattern”.
 
 

 

Descriptive Statistics

• Example (continued): real GDP per capita for 90 countries.
• A histogram graphically summarises the cross-country income distribution.
• Descriptive statistics are numbers which summarise properties of the income distribution.
 
 

1. Measures of Location

• Intuition: centre of distribution, average, “typical country” (careful!!). We can calculate the sample’s value – not the population value.
• Sample mean:

$$\bar{Y} = \frac{\sum_{i=1}^{N} Y_i}{N}$$

• Mean GDP per capita is $5,443.80 in this sample
• Median or mode is often useful for skewed data

2. Measures of dispersion

• Intuition: spread/variability/dispersion of distribution; inequality across observations. Again – calculated for the sample at hand.
• Standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{N} \left(Y_i - \bar{Y}\right)^2}{N - 1}}$$

• Variance = standard deviation squared
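
As a rough illustration of the location and dispersion formulas, the sketch below computes the sample mean, standard deviation and variance for the yearly spending of the first ten households in the cross-sectional example. The helper names `sample_mean` and `sample_std` are assumptions made for this sketch.

```python
import math

def sample_mean(ys):
    """Y-bar: the sum of the observations divided by N."""
    return sum(ys) / len(ys)

def sample_std(ys):
    """Sample standard deviation, using the N - 1 divisor."""
    y_bar = sample_mean(ys)
    return math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (len(ys) - 1))

# Yearly spending of households 1 to 10 in the cross-sectional example.
spending = [30_000, 40_000, 80_000, 100_000, 30_000,
            15_000, 40_000, 50_000, 80_000, 20_000]

y_bar = sample_mean(spending)
s = sample_std(spending)
variance = s ** 2  # variance = standard deviation squared

print(y_bar, round(s, 1), round(variance, 1))
```

These are sample values for ten households only, not population values.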

 

Topic 1c

Probability distributions
 

Random variables
• A random variable is a variable whose value is unknown until it is observed; in other words it is a variable that is not perfectly predictable
 – Each random variable has a set of possible values it can take
 – A discrete random variable can take only a limited, or countable, number of values
   • An indicator variable taking the values one if yes, or zero if no
   • Indicator variables are discrete and are used to represent qualitative characteristics such as gender (male or female), or race (white or nonwhite)
 – A random variable that can have any value is treated as a continuous random variable
• Probability is usually defined in terms of experiments
 – If we were to select one cell from the table at random, that would constitute a random experiment
 

 

• We summarize the probabilities of possible outcomes using a probability density function (pdf)
 – The pdf for a discrete random variable indicates the probability of each possible value occurring
 – For a discrete random variable X the value of the probability density function f(x) is the probability that the random variable X takes the value x, f(x) = P(X = x)
   • It must be true that 0 ≤ f(x) ≤ 1
     f(x1) + f(x2) + … + f(xn) = 1
 
 

PDF of a discrete random variable

PDF of continuous RV
 

Properties of PDF

• Two key features of a probability distribution are its center (location) and width (dispersion)
 – A key measure of the center is the mean, or expected value
 – Measures of dispersion are variance, and its square root, the standard deviation
 

Expected value

• The mean of a random variable is given by its mathematical expectation
 – If X is a discrete random variable, then the mathematical expectation, or expected value, of X is:

$$E(X) = x_1 P(X = x_1) + x_2 P(X = x_2) + \dots + x_n P(X = x_n)$$

 

• For the population in our table, the expected value of X is:

$$\begin{aligned}
E(X) &= 1 \times P(X = 1) + 2 \times P(X = 2) + 3 \times P(X = 3) + 4 \times P(X = 4) \\
     &= (1 \times 0.1) + (2 \times 0.2) + (3 \times 0.3) + (4 \times 0.4) \\
     &= 3
\end{aligned}$$

• The mean of a random variable is the population mean
 – We use Greek letters for population parameters
• The expected value can be written equivalently as:

$$\begin{aligned}
\mu_X = E(X) &= x_1 f(x_1) + x_2 f(x_2) + \dots + x_n f(x_n) \\
             &= \sum_{i=1}^{n} x_i f(x_i) \\
             &= \sum_{x} x f(x)
\end{aligned}$$

• For our example:

$$\begin{aligned}
\mu_X = E(X) &= \sum_{x=1}^{4} x f(x) \\
             &= (1 \times 0.1) + (2 \times 0.2) + (3 \times 0.3) + (4 \times 0.4) \\
             &= 3
\end{aligned}$$

 

Properties of expectations

• If a is a constant, then g(X) = aX is a function of X, and:

$$\begin{aligned}
E(aX) = E[g(X)] &= \sum_{x} g(x) f(x) \\
                &= \sum_{x} a x f(x) = a \sum_{x} x f(x) \\
                &= a E(X)
\end{aligned}$$

• If a and b are constants, then:

$$E(aX + b) = aE(X) + b$$

• The expected value of the random variable is the average value that occurs in many repeated trials of an experiment

Variance of a random variable

• The variance of a discrete or continuous random variable X is the expected value of:

$$g(X) = \left[X - E(X)\right]^2$$

Algebraically:

$$\begin{aligned}
\operatorname{var}(X) = \sigma_X^2 &= E\left[(X - \mu)^2\right] \\
  &= E\left(X^2 - 2\mu X + \mu^2\right) \\
  &= E(X^2) - 2\mu E(X) + \mu^2 \\
  &= E(X^2) - \mu^2
\end{aligned}$$

 

• For our problem, we know that E(X) = μ = 3
 – Now:

$$\begin{aligned}
E(X^2) = \sum_{x=1}^{4} g(x) f(x) &= \sum_{x=1}^{4} x^2 f(x) \\
  &= \left[1^2 \times 0.1\right] + \left[2^2 \times 0.2\right] + \left[3^2 \times 0.3\right] + \left[4^2 \times 0.4\right] \\
  &= 10
\end{aligned}$$

 – Then:

$$\operatorname{var}(X) = \sigma_X^2 = E(X^2) - \mu^2 = 10 - 3^2 = 1$$

 – The square root of the variance is called the standard deviation
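
The expected value and variance worked out above can be checked numerically. The sketch below stores the pdf from the slides (f(1) = 0.1, …, f(4) = 0.4) in a dictionary; the helper `expectation` and the constants a = 2, b = 5 are assumptions chosen purely for illustration.

```python
# pdf of the discrete random variable from the slides: f(x) = P(X = x).
pdf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

def expectation(pdf, g=lambda x: x):
    """E[g(X)] = sum over x of g(x) * f(x)."""
    return sum(g(x) * p for x, p in pdf.items())

mu = expectation(pdf)                        # E(X)
e_x2 = expectation(pdf, lambda x: x ** 2)    # E(X^2)
var = e_x2 - mu ** 2                         # var(X) = E(X^2) - mu^2
std = var ** 0.5                             # standard deviation

# Linear transformation Y = aX + b with arbitrary illustrative constants.
a, b = 2, 5
mu_y = expectation(pdf, lambda x: a * x + b)
var_y = expectation(pdf, lambda x: (a * x + b - mu_y) ** 2)

print(round(mu, 10), round(e_x2, 10), round(var, 10), round(var_y, 10))
```

The last two lines illustrate E(aX + b) = aE(X) + b and var(aX + b) = a² var(X) for one particular choice of a and b; they are a numerical check, not a proof.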

• 2 PDF with different variances

Property of variances

• A useful property of variances is the following
 – Let a and b be constants, then:

$$\operatorname{var}(aX + b) = a^2 \operatorname{var}(X)$$

 – To see this, let Y = aX + b. Then:

$$\begin{aligned}
\operatorname{var}(aX + b) = \operatorname{var}(Y) &= E\left[(Y - \mu_Y)^2\right] = E\left[\left(aX + b - (a\mu_X + b)\right)^2\right] \\
  &= E\left[(aX - a\mu_X)^2\right] = E\left[a^2 (X - \mu_X)^2\right] \\
  &= a^2 E\left[(X - \mu_X)^2\right] \\
  &= a^2 \operatorname{var}(X)
\end{aligned}$$

 

Rules of expected values

•  E(c) = c
•  E(X+c) = E(X) + c
•  E(cX) = cE(X)

Rules of Variance

•  V(c) = 0
•  V(X+c) = V(X)
•  V(cX) = c²V(X)

Histogram of a normally-distributed variable

PDF of the normal distribution
 
 

Topic 1d

Correlation

Definition of correlation

• Correlation measures numerically the relationship between two variables X and Y (e.g., population density and deforestation).
• Correlation between X and Y is symbolised by the correlation coefficient “r” or “rXY”.

$$r = \frac{\sum \left(Y_i - \bar{Y}\right)\left(X_i - \bar{X}\right)}{\sqrt{\sum \left(Y_i - \bar{Y}\right)^2 \sum \left(X_i - \bar{X}\right)^2}}$$
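
A minimal sketch of the correlation formula, applied to income and spending for the first ten households in the cross-sectional example. The function name `correlation` is an assumption; the computation follows the formula for r term by term.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r between two equal-length samples."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    cross = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return cross / math.sqrt(ss_y * ss_x)

# Households 1 to 10 from the cross-sectional example.
income = [100_000, 70_000, 60_000, 250_000, 25_000,
          20_000, 60_000, 50_000, 90_000, 100_000]
spending = [30_000, 40_000, 80_000, 100_000, 30_000,
            15_000, 40_000, 50_000, 80_000, 20_000]

r = correlation(income, spending)
print(round(r, 3))
```

By construction r always lies between −1 and 1, and it does not matter which variable is called X and which Y.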

 

Coefficient of correlation values

Coefficient of correlation and XY plots
 
 

Why are variables correlated?
Correlation does not imply causality!!!!

Example:
• Correlation between education and wages is strong & positive
• Does this mean education “causes” higher wages?
• Possibility 1: Education improves skills, skilled workers get better paying jobs, and therefore education causes wages to increase
• Possibility 2: Some individuals are born with high innate ability, which makes it easier for such individuals to pursue more education and to be more productive on the job. Innate ability (not education) causes wages to increase.

• NEED AN UNDERLYING THEORY TO BRING TOGETHER!
 
 

 

Correlation with several variables

• Correlation relates precisely two variables
• What to do with three or more? Usually use regression.
• Or you can calculate the correlation between every possible pair of variables.
• Given three variables: X, Y and Z, we can calculate three correlations:
 – rxy, rxz and ryz

Correlation matrix

A correlation matrix shows the correlation between each variable and each other variable in a sample.
 
 

Conclusions

• Appropriate data description is a necessary first step before ANY modelling
• How you describe data depends on what sort of data you’re describing
• Correlation can be suggestive, but alone cannot establish causality
 – Correlation + a sensible theory suggests (does not prove but provides evidence of) a causal relationship
 
 

 

 

Study

• Keep up with the readings,
• Prepare for your tutorials:
 – There is independent work that must be completed by you prior to tutorials, if you wish to participate in this learning experience
 – Please don’t come to tutorials if you have not prepared

Next topic…

• Begin looking at simple regression analysis
 

7/08/13

 

Topic 2

Simple Regression
Koop Chapters 4 and 5
 

Admin

• Assignment 1 to be posted week 3
• Due date 13 September – week 7
• Assign 1: Regression exercise and report
 

Last lecture

• Maths and stats review
 – Data handling
 – Data description
   • XY plots
   • Mean, Standard deviation
 – Correlation
 – Probability and probability distributions
 

 

This topic

• A discussion of the simple regression…
• ABSOLUTELY ESSENTIAL READING: Koop Chapters 4 and 5
 

 
Introduction to Simple Regression

• Regression is the most common tool of the applied economist.
• Used to help understand what factors (variables) account for the outcome of a variable of interest.
• We begin with simple regression to understand the relationship between two variables, X and Y.
 
 

Imagine a “best-fitting” line…
• XY-plot of population density against deforestation

 

 
The Regression Line IS the Line of Best Fit

• The process of (bivariate) regression is the process of fitting a line through the points in the XY-plot that best captures the relationship between deforestation and population density.
• What do we mean by “best fitting” line?
 
 

 
Assumed Model Structure

Assume a true linear relationship exists between Y and X:

$$Y = \alpha + \beta X$$

α = intercept of line
β = slope of the line

Example:
Y = output of a good, X = labour input
α = ? (perhaps 0)
β = ? (perhaps 0.8 = marginal product of labour)

NOT the Line of Perfect Fit
1. Even if the straight line relationship were true on average, we would never get all points on an XY-plot lying precisely on it due to the fact that some of Y’s movement is not able to be explained using X.
2. Also – the true relationship is probably more complicated; a straight line is typically thought of as an approximation
3. Y or X may be measured with errors.
Due to 1, 2 and 3, we add an error term to the model.
 
 

 

 
Adding the error

Y = α + βX + e
where e is an error.

• What we know: X and Y.
• What we do not know: α, β and e.
• Regression analysis uses data (X and Y) to make an estimate of what α and β are.
• Notation: α̂ and β̂ are the estimates of α and β that the regression (line-fitting) process spits out.

 
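Pre- versus Post-Estimation Model

• True regression model:

$$Y = \alpha + \beta X + e, \qquad e = Y - \alpha - \beta X, \qquad e = \text{error}$$

• Estimated regression model:

$$Y = \hat{\alpha} + \hat{\beta} X + u, \qquad u = Y - \hat{\alpha} - \hat{\beta} X, \qquad u = \text{residual}$$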

How do we choose α̂ and β̂?

With more than two points, it’s usually not possible to find a line that fits perfectly through all points:

 

 
Essential Characteristic of Regression (or “Ordinary Least Squares”)

OLS regression chooses the line that minimizes the sum of squared residuals.
 
 

Expressing the OLS estimator

We observe data on two variables for i=1,..,N individuals. Each individual has a Yi and an Xi.

Any line we fit/choice of α̂ and β̂ will yield residuals ui.

$$\text{Sum of squared residuals} = SSR = \sum_{i=1}^{N} u_i^2$$

OLS estimator chooses α̂ and β̂ to minimise SSR

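Mathematical expressions for the bivariate OLS estimators

Solution:

$$\hat{\beta} = \frac{\sum \left(Y_i - \bar{Y}\right)\left(X_i - \bar{X}\right)}{\sum \left(X_i - \bar{X}\right)^2}$$

and

$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$$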
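
The two closed-form expressions translate directly into code. The sketch below is illustrative only: the data are hypothetical, and `ols_fit` and `ssr` are names invented here rather than anything from Excel or the textbook.

```python
def ols_fit(xs, ys):
    """Bivariate OLS estimates (alpha_hat, beta_hat) from the closed-form solution."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    beta_hat = (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
                / sum((x - x_bar) ** 2 for x in xs))
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

def ssr(xs, ys, alpha, beta):
    """Sum of squared residuals for the line Y = alpha + beta * X."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = ols_fit(xs, ys)
best = ssr(xs, ys, alpha_hat, beta_hat)

# Any other line, e.g. one with a slightly steeper slope, has a larger SSR.
nudged = ssr(xs, ys, alpha_hat, beta_hat + 0.1)

print(round(alpha_hat, 4), round(beta_hat, 4), round(best, 4), round(nudged, 4))
```

Comparing `best` with `nudged` shows the defining property of OLS: moving away from the fitted line in any direction increases the sum of squared residuals.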

 

Regression and Causation

How do you choose which variables to use?
• Ideally, the explanatory variable should be the one which causes/influences the other (dependent) variable: so, X causes Y.
• If you can, only estimate models where this causality assumption makes sense.
• But, what guides this?
  Intuition, reasoning, rationale, theory
 
 

Examples

• Increases in X (= population density) cause Y (= deforestation) to increase (or vice versa? Make your argument)
• Increasing X (= the lot size of a house) causes Y (= its value) to increase (or vice versa? Make your argument)
• Increasing X (= advertising expenditures) causes Y (= company sales) to increase (or vice versa? Make your argument)
 
 

Causality (cont.)

• In practice, great care must be taken in interpreting regression results as reflecting causality. Why?
 – your assumption that X causes Y may be wrong.
 – X and Y may both be caused by some third factor, call it Z.
 – X may cause Y but Y may also cause X (e.g. exchange rates and interest rates).
 – the whole concept of causality may be inappropriate.
• Formally, one key question regression addresses is: “How much of the variability in Y can be explained by X?” (we will look at this shortly)
 
 

 

 

Interpretation of α̂

• Estimated value of Y if X = 0
• This is often not of interest
Example:
• X = lot size, Y = house price
• α̂ = estimated value of a house with lot size = 0

Interpretation of β̂
1. β̂ is the estimate of the marginal effect of X on Y
2. Using the regression model:

$$\hat{\beta} = \frac{dY}{dX} = \frac{\Delta Y}{\Delta X}$$

3. The OLS estimator – the estimated “slope” – is a measure of how much Y tends to change when you change X.
4. “If X changes by 1 unit then Y tends to change by β̂ units”, where “units” refers to what the variables are measured in (e.g. $, $billions, £, %, hectares, metres, etc.)

Deforestation example
Development economists have theories that imply that increasing population density should increase deforestation.
Thus:
• Y = deforestation (annual percentage lost) = dependent variable
• X = population density (people per thousand hectares) = explanatory variable
• Using data on N = 70 tropical countries we find: β̂ = 0.000842

 

Interpretation and prediction

a) “If population density increases by 1 person per 1,000 hectares, then the average deforestation is estimated (or expected) to increase by 0.000842% per year”

b) “If population density increases by 100 people per 1,000 hectares, then deforestation is estimated to increase by 0.0842% per year on average”
 
 

Basic evaluation statistics

• R-squared
• F-test
• Data evaluation
• t-test
 

 

 
R²: A Measure of Fit

Intuition:
• “Variability” = (e.g.) how deforestation rates vary across countries

Total variability in dependent variable Y = (1)+(2):
1. Variability explained by the explanatory variable (X) in the regression
     +
2. Variability that cannot be explained and is left over in the residual.
 
 

 

Sums of squares

In mathematical terms,

   TSS = RSS + SSR

where TSS = Total sum of squares =

$$TSS = \sum \left(Y_i - \bar{Y}\right)^2$$

Note similarity to formula for variance.

More sums of squares

• RSS = Regression sum of squares

$$RSS = \sum \left(\hat{Y}_i - \bar{Y}\right)^2$$

• SSR = Sum of squared residuals

$$SSR = \sum_{i=1}^{N} u_i^2$$

R-squared expressed as sums of squares

$$R^2 = 1 - \frac{SSR}{TSS} \quad \text{or equivalently:} \quad R^2 = \frac{RSS}{TSS}$$

since TSS − SSR = RSS

• R-squared is a measure of fit (i.e. how well does the regression line fit the data points – meaning how closely X and Y are related)
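
The decomposition TSS = RSS + SSR and the two equivalent expressions for R-squared can be verified numerically. The sketch below uses hypothetical data and a helper `fit_line` defined here for the purpose; it is a check of the identities, not a reproduction of any output in these slides.

```python
def fit_line(xs, ys):
    """Bivariate OLS: returns (alpha_hat, beta_hat)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    beta_hat = (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
                / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - beta_hat * x_bar, beta_hat

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = fit_line(xs, ys)
y_bar = sum(ys) / len(ys)
fitted = [alpha_hat + beta_hat * x for x in xs]

tss = sum((y - y_bar) ** 2 for y in ys)               # total variability in Y
rss = sum((f - y_bar) ** 2 for f in fitted)           # variability explained by X
ssr = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # variability left in residuals

r2 = 1 - ssr / tss
r2_alt = rss / tss

print(round(tss, 3), round(rss, 3), round(ssr, 3), round(r2, 4))
```

Both `r2` and `r2_alt` give the same number, which is what the identity TSS − SSR = RSS implies.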

 

 
Properties of R-squared

$$0 \le R^2 \le 1$$

• R²=1 means perfect fit. All data points exactly on the regression line (i.e. SSR=0).
• R²=0 means X does not have any linear explanatory power for Y whatsoever in this sample.
• Bigger values of R² imply X has more explanatory power for Y.
• R² is equal to (the correlation between X and Y) squared (i.e. R² = r²xy)

R-squared example

• R² measures the proportion of the variability in Y that can be explained by X.

Example:
• In regression of Y = deforestation on X = population density, we obtain R²=0.44
→ We can say that “44% of the cross-country variation in deforestation rates can be explained by the cross-country variation in population density”
 
 

F test of overall significance
The F test is often used to measure the explanatory power of the whole model (or, equivalently, the significance of the R-squared). The typical hypotheses in this context are:
• H0 : there is no statistically significant relationship between Y and X
• H1 : there is a statistically significant relationship between Y and X
• The F statistic is calculated as the ratio of the amount of variation in Y that is explained by the model to the amount of variation unexplained, corrected for degrees of freedom:

$$F_{k,\,n-k-1} = \frac{RSS / k}{SSR / (n - k - 1)}$$
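
A short sketch of the F statistic. Because RSS/TSS = R² and SSR/TSS = 1 − R², the same ratio can be computed from R² alone; the second function below relies on that. The inputs R² = 0.44, n = 70 and k = 1 are the rounded deforestation figures from these slides, so the result is only approximate and need not match the Excel output exactly. Function names are assumptions.

```python
def f_statistic(rss, ssr, n, k):
    """F = (RSS / k) / (SSR / (n - k - 1))."""
    return (rss / k) / (ssr / (n - k - 1))

def f_from_r2(r2, n, k):
    """Same ratio after dividing RSS and SSR by TSS."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Rounded values from the deforestation example: R-squared 0.44, 70 countries, 1 regressor.
f_deforestation = f_from_r2(0.44, n=70, k=1)

# The two forms agree for any TSS; here TSS is set to 100 for convenience.
f_check = f_statistic(rss=44.0, ssr=56.0, n=70, k=1)

print(round(f_deforestation, 2))
```

The statistic is then compared with a critical value from the F distribution with k and n − k − 1 degrees of freedom.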

 

Deforestation Excel output

Non-linearities

[Figure: scatterplot of average hourly earnings (y-axis) against years of education (x-axis)]

[Figure: scatterplot of child mortality (y-axis) against per capita GNP in 1980 (x-axis)]

Estimates only!

As mentioned, α̂ and β̂ are estimates of the true population parameters only
• But how accurate are they?
• The t-test allows us to formally address this problem for each variable separately. It is based on the estimated standard deviation – or “standard error” – of β̂ which is estimated, along with the value itself, by the regression process

Standard error of β̂

$$se(\hat{\beta}) = \sqrt{\frac{SSR / (n - 2)}{\sum \left(X - \bar{X}\right)^2}}$$

The s.e. of the estimated slope varies:
 directly with SSR (the variability in the residuals)
 inversely with N
 inversely with Σ(X − X̄)², which relates to the variance/variability of X

 

What factors affect the accuracy of the estimate β̂?

Ceteris paribus,
• A large number of observations (more data points)
• Small errors (small SSR)
• A bigger spread of values of the explanatory variable X (X has a range of values)
will increase the accuracy of the estimate β̂

Very small sample size

Large sample size, large error variance

Large sample size, small error variance

Limited range of X values

Deforestation Excel output
 

Test of a slope coefficient
The t-test in the context of linear regression tests whether there is a statistically significant linear relationship between X and Y.
Hypotheses:

$$H_0 : \beta = 0 \quad \text{(No linear relationship)}$$
$$H_1 : \beta \neq 0 \quad \text{(Linear relationship)}$$

To perform the test, form the following statistic using the estimated coefficient and standard error:

$$t_{(n-k-1)} = \frac{\hat{\beta}}{se(\hat{\beta})}$$

n-k-1 represents the degrees of freedom associated with the test; when there is one independent variable, k = 1.
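
As a hedged illustration, the t statistic for the deforestation slope can be computed from the rounded coefficient (0.000842) and standard error (0.0001165) quoted in these slides, and compared with the critical value of 1.99 that the slides use for a 95% interval. The function name and the decision variable are assumptions for this sketch.

```python
def t_statistic(beta_hat, se_beta_hat):
    """t = beta_hat / se(beta_hat), testing H0: beta = 0."""
    return beta_hat / se_beta_hat

# Rounded figures from the deforestation example in these slides.
beta_hat = 0.000842
se_beta_hat = 0.0001165
n, k = 70, 1

t_stat = t_statistic(beta_hat, se_beta_hat)
degrees_of_freedom = n - k - 1

# Critical value used in the slides for a two-sided test at the 5% level.
critical_value = 1.99
reject_h0 = abs(t_stat) > critical_value

print(round(t_stat, 2), degrees_of_freedom, reject_h0)
```

Because the inputs are rounded, the result may differ slightly from the t-stat that Excel reports.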

Concept check

Test your understanding of the t-test by doing the following:

1. Calculate the t-stat using the standard error of the estimate and estimated coefficient
2. Check that this is equal to the t-stat reported in Excel
 

Evaluating the model using the original data: The issues

• How well does the model ‘explain’ the variance in the dependent variable?
 – ‘Goodness of fit’: the closer the points to the regression line, the better
• How strongly are the independent variables related to the dependent variable?
 – Are the estimated effects economically meaningful?
 – Are the estimated effects statistically significant?
• Determine whether the underlying assumptions of regression modelling have been met (more on this later…)
• Determine robustness of the model to outliers (‘unusual’ observations)
 
 

 

Confidence in our results

• Uncertainty about accuracy of the estimator can be summarised in a “confidence interval”.
• This will provide us with some more information about the accuracy of our results
 
 

Confidence Interval for β

• Confidence interval for β is given by:

$$\left[\hat{\beta} - t_{\hat{\beta}} \times se_{\hat{\beta}},\ \ \hat{\beta} + t_{\hat{\beta}} \times se_{\hat{\beta}}\right]$$

Where:
 $t_{\hat{\beta}}$ = “critical value” from the t-distribution (note that Excel can provide the value)
And
 $se_{\hat{\beta}}$ is the standard error of β̂

Constructing a Confidence Interval
• If we want a 95% CI on β, we need the s.e. (provided in Excel output) and the relevant t-statistic*
• Therefore, the CI on our estimate is:

Lower bound:
 0.000842 – 1.99*0.0001165 = 0.00061

Upper bound:
 0.000842 + 1.99*0.0001165 = 0.001075

 CI = [0.00061, 0.001075]

Interpretation (informal):
 – There is a 95% probability that the true value of β lies between 0.00061 and 0.001075.
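
The interval calculation can be sketched as below, using the rounded coefficient, standard error and critical value quoted in the slides. With these rounded inputs the upper bound comes out at roughly 0.001074, marginally below the 0.001075 shown above; a plausible explanation (an assumption, not something the slides state) is that the slide values were computed from unrounded Excel output. The function name is invented for this sketch.

```python
def confidence_interval(beta_hat, se_beta_hat, t_critical):
    """Return (lower, upper) = beta_hat -/+ t_critical * se(beta_hat)."""
    margin = t_critical * se_beta_hat
    return beta_hat - margin, beta_hat + margin

# Rounded figures from the deforestation example.
lower, upper = confidence_interval(
    beta_hat=0.000842,
    se_beta_hat=0.0001165,
    t_critical=1.99,
)

# The interval excludes zero, consistent with rejecting H0: beta = 0.
excludes_zero = lower > 0 or upper < 0

print(f"{lower:.6f} {upper:.6f} {excludes_zero}")
```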

 

 

$$Y = \alpha + \beta X + e$$

• OLS estimator has many nice statistical properties if certain conditions hold.
• These conditions are known as the Gauss-Markov Conditions

The Gauss-Markov Assumptions, In Brief

• These necessary conditions are:
 – The linear model is correct
 – We’ve got a random sample of data from the population whose behaviour we’re using the model to explain
 – There’s some sample variance in X
 – X and the unexplained part of Y (that is, e) aren’t related
 

Why the G-M assumptions matter

• If any of these conditions DON’T hold, then you can’t run the regression and expect the OLS estimator to deliver parameter estimates that are reasonable guesses of the true relationship between Y and X in the population you care about.
 – Bias means β̂ is not a true estimate of β
 – Lack of precision means β̂ has a large standard error relative to itself

 

More on non-linear relationships

• So far, we’ve discussed estimating a LINEAR regression of Y on X and we have seen briefly a little of non-linearity:

$$Y = \alpha + \beta X + e$$

• So let’s say you have chosen your variables (say explaining birth weight (Y) using mother’s income (X))
• Now choose functional form

Nonlinearity
• Is the relationship linear?
• We could perform a regression of Y (or ln(Y) or Y²) on X² (or 1/X or ln(X) or X³, etc.)… and the same estimation technique for the equation’s parameters would hold.
• How will we decide?
 – Theory (would the marginal effect on birth weight of income be likely to be constant? i.e. does $1 of extra income have the same effect for an unemployed person as a millionaire?)
 – Graphical analysis: What does a plot of the two variables look like?
 
 

Nonlinearity - example

• e.g:

$$Y = \alpha + \beta X^2 + e$$

• But how might you know if the TRUE relationship between X and Y is likely to be nonlinear?
• Answer: Careful examination of X-Y plots and/or theory.

Figure 4.2 A quadratic relationship between X and Y

Copyright © 2005 John Wiley & Sons, Ltd
 

Choosing functional form

• Common transformations are:
 – Squared terms
 – Taking natural logs (one side or both sides) – implies elasticities, not slopes, are constant.
 – Note: Need values > 0 to use log models!
• To find the proper data transformation, try the following:
 – Plot out the data in X-Y space, as per following slides
 – Scan relevant theory for any suggestions
 
 

Figure 4.3 X and Y need to be logged

Figure 4.4 ln(X) versus ln(Y)
 

Functional form

• So, data in previous slide suggest that double log model is ‘correct’ one:

$$\ln Y = \alpha + \beta \ln X + e$$

• Interpretation of results will be different:
• E.g. a coefficient of 10 implies a 1% change in X yields a 10% change in Y
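
To illustrate why the slope in a double log model reads as an elasticity, the sketch below generates hypothetical, noise-free data from Y = 2·X^1.5 and fits a straight line to ln(Y) against ln(X). The data, the constants 2 and 1.5, and the helper `ols_fit` are all assumptions made for this example.

```python
import math

def ols_fit(xs, ys):
    """Bivariate OLS: returns (alpha_hat, beta_hat)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    beta_hat = (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
                / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - beta_hat * x_bar, beta_hat

# Hypothetical noise-free data with a constant elasticity of 1.5.
xs = [1, 2, 4, 8, 16]
ys = [2 * x ** 1.5 for x in xs]

# Log models need strictly positive values.
assert all(x > 0 for x in xs) and all(y > 0 for y in ys)

log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]

alpha_hat, beta_hat = ols_fit(log_x, log_y)

# beta_hat is the elasticity: a 1% change in X goes with about beta_hat % change in Y.
print(round(beta_hat, 6), round(math.exp(alpha_hat), 6))
```

With real data the fit would not be exact, but the reading of the slope as "percent change in Y per one percent change in X" is the same.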

Next topic

• More on regression
 – Multiple regression
 – Dig deeper: What assumptions do we rely on to get unbiased and efficient estimates?

22/08/13

 

1
 

Forecasting and Business Analysis

Topic 3
 

This topic

• Koop Ch 6
 – Multiple regression
• Last topic:
 – Simple regression

Multiple Regression

 

Differences between simple and multiple regression

• Multiple regression is like simple regression, except that there are many explanatory variables: X1, X2,…, Xk
• The key differences are:
 – You can perform multiple t-tests, achieve higher R-squared, and build a more theoretically complete model of Y
 – The effect of each independent variable on Y is estimated CONDITIONAL on the other independent variables
 

OLS estimation

• Multiple regression model:

$$Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + e_i$$

• OLS estimates:

$$\hat{\alpha}, \hat{\beta}_1, \dots, \hat{\beta}_k$$

• These estimates (still) minimise the sum of squared residuals
• Solution to minimisation problem: Messy
• Calculation of β̂ is harder for multiple OLS
• Excel will calculate the OLS estimates for you

Statistical Aspects and Evaluation

• Standard error of the estimate: largely the same as for simple regression, just with bigger ‘k’
• Confidence intervals can be calculated for each individual coefficient, as we did before for just the one coefficient.
• Can test βj=0 using a t-test for each individual coefficient (j=1,2,..,k), just as before
 
 

 

Multiple OLS statistics – cont’d

• R² is still a measure of fit, with the same interpretation (although now it is no longer simply the square of the correlation between Y and ‘X’).
• Can still test R²=0 using an F-test, but with bigger ‘k’.
• If the F-test rejects R²=0, then you conclude that the explanatory variables together provide explanatory power (note: this does not necessarily mean that each individual explanatory variable [through t-stats] is significant).
 
 

Interpreting OLS Estimates in the Multiple Regression Model

Mathematical Intuition
Total vs. partial derivative

Simple regression:

$$\frac{dY}{dX} = \beta$$

Multiple regression:

$$\frac{\partial Y}{\partial X_j} = \beta_j$$

Interpretation of Multiple OLS Estimates, cont’d

•  Verbal intuition
•  \(\beta_j\) is the marginal effect of \(X_j\) on Y, ceteris paribus
•  \(\beta_j\) is the effect on the dependent variable of a small change in the jth explanatory variable, holding all the other explanatory variables constant.

 

Example: Explaining Birth Weight

•  Let’s take some time going over the following example:
•  Data on N = 1388 individuals
•  Dependent variable:
– Y = birth weight of child, in pounds
•  Explanatory variables:
– X1 = number of cigarettes smoked per day by pregnant mum
– X2 = family income, in 1988 USD
•  NOTE k=2!
 
 

Example: Excel Output

Explaining Birth Weight

•  Fitted Regression Line: \(\hat{Y} = 7.31 - 0.029 X_1 + 0.000006 X_2\)
•  Evaluate the following:
– Significance of coefficient estimates
– \(R^2\) and its significance
– Do the results accord with common sense and/or formal theory in the areas of economics, general human behaviour, and health?

 

Interpretation: Birth Weight Results

•  Since \(\hat{\beta}_1 = -0.029\), then at the average number of cigarettes smoked:
– Mum having one extra cigarette per day is expected to reduce the baby’s birth weight by 0.029 pounds, ceteris paribus (i.e., holding income constant)
– If we compare individuals with the exact same income, mums who smoke 10 cigarettes per day are expected to have babies that weigh 0.29 pounds less than those of mums who do not smoke.

Interpretation: Birth Weight Results

•  Since \(\hat{\beta}_2 = 0.000006\), then at the average family income:
– The family’s having one extra dollar of annual income is expected to increase the baby’s birth weight by 0.000006 pounds, ceteris paribus (i.e., holding mum’s smoking behaviour constant)
– If we compare mums with the exact same number of cigarettes smoked per day, mums who have $10,000 more in family income are expected to have babies that weigh 0.06 pounds more (a seemingly “small” effect in the output, but economically meaningful and statistically significant!).
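The ceteris paribus comparisons can be checked by plugging values into the fitted line. The sketch below uses the estimated coefficients from the example; the income level of 30,000 and the 5 cigarettes per day are arbitrary illustrative values (they cancel out, which is exactly what "holding the other variable constant" means).

```python
ALPHA_HAT = 7.31
BETA1_HAT = -0.029      # pounds per cigarette smoked per day
BETA2_HAT = 0.000006    # pounds per dollar of family income


def predicted_birth_weight(cigs_per_day, family_income):
    """Fitted value from the two-variable birth weight regression."""
    return ALPHA_HAT + BETA1_HAT * cigs_per_day + BETA2_HAT * family_income


income = 30000  # hypothetical income, held constant in the first comparison
effect_10_cigs = (predicted_birth_weight(10, income)
                  - predicted_birth_weight(0, income))

cigs = 5  # hypothetical smoking level, held constant in the second comparison
effect_10k_income = (predicted_birth_weight(cigs, income + 10000)
                     - predicted_birth_weight(cigs, income))

print(round(effect_10_cigs, 3))     # -0.29
print(round(effect_10k_income, 3))  # 0.06
```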

Example: Data transformation

•  Is it reasonable to expect that birth weight will increase at a constant rate with income? Economic intuition would tell us that this is probably not the case.
•  Can check X-Y plot of birth weight and income to confirm.
•  Might make more sense to use ln(income)?
 
 

 

Excel Output

Interpretation of Results

•  No massive changes in the explanatory power of the model, and no changes to significance
•  New fitted model is: \(\hat{Y} = 7.31 - 0.029 X_1 + 0.116 \ln(X_2)\)
•  The estimated coefficient on cigs (number of cigarettes) has not changed all that much, nor has the intercept.

Interpreting \(\hat{\beta}_2\)

•  How do we interpret the new coefficient on our transformed variable?
•  Recall, \(\hat{\beta}_2 = 0.116\), our dependent variable is in pounds, and our independent variable is ln(income)
•  Interpretation: A 1% increase in income is expected to increase birth weight by about (0.116/100) = 0.00116 pounds, ceteris paribus (i.e., holding smoking behaviour constant).
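The "divide by 100" rule is an approximation that relies on ln(1.01) being close to 0.01. A short sketch comparing the rule of thumb with the exact change implied by a level-log model (the function name is made up for illustration):

```python
import math

BETA2_HAT = 0.116  # estimated coefficient on ln(income)


def change_in_birth_weight(pct_increase):
    """Exact change in predicted Y for a given % rise in income."""
    return BETA2_HAT * math.log(1 + pct_increase / 100)


approx = BETA2_HAT / 100             # rule of thumb for a 1% increase
exact = change_in_birth_weight(1)    # 0.116 * ln(1.01)

print(round(approx, 5), round(exact, 5))
```

The two numbers agree closely for a 1% change; the approximation gets worse for large percentage changes, where the exact formula should be used.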

 

Some pitfalls…?

•  Consider the following output for a simple version of our birth weight model:

Answer

•  These estimates come from two different regressions which control for different explanatory variables.
•  This means that the two estimates come with different ‘ceteris paribus’ conditions.
– Specifically: In the simple model, we are not holding anything else constant.
 
 

Answer (cont’d)

•  Simple Regression: 1% increase in income is expected to increase birth weight by 0.00147 lbs
•  One solution to increasing birth weight, according to this model, would be to give people more money.
•  BUT: other factors will influence birth weight, such as smoking, diet, etc.
•  People with higher incomes tend not to be smokers and therefore are “healthier”
•  The simple model only shows that “richer” people tend to have higher birth weight babies, but gives no indication that this effect may be partly due to their increased health
 
 

 

Statistical Evidence

•  The negative correlation between income and cigarette consumption suggests that people with lower incomes tend to consume more cigarettes, and vice versa. We are not examining the smoking and income effects separately when we only consider one variable in our model.
 
 

Multiple regression may provide a better approximation to reality

•  If we evaluate the overall performance and fit of the model, we see that the multiple regression model has a lower standard error, a higher F statistic, and a higher Adjusted \(R^2\).
–  Adjusted R squared accounts for the fact that more variables are included (greater k)
•  Models which contain all or most of the drivers of Y will tend to look better “on paper,” as well as in terms of their theoretical coherence, than simple regression.
•  This is not always the case; it depends on the marginal effect of each additional variable (the trade-off between an increase in \(R^2\) and the increase in k)
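The trade-off between a higher R-squared and a bigger k can be seen directly in the usual adjusted R-squared formula, 1 − (1 − R²)(n − 1)/(n − k − 1). The slides do not print this formula, so treat it as the standard textbook definition rather than something taken from the Excel output; the R-squared values and sample size below are made up for illustration.

```python
def adjusted_r_squared(r_squared, n, k):
    """Standard adjusted R-squared for n observations and k explanatory variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


n = 30  # hypothetical sample size

# Hypothetical case: a third variable lifts R-squared only from 0.500 to 0.505.
adj_two_vars = adjusted_r_squared(0.500, n, k=2)
adj_three_vars = adjusted_r_squared(0.505, n, k=3)

print(round(adj_two_vars, 4), round(adj_three_vars, 4))
```

Here the extra variable raises R-squared slightly but lowers adjusted R-squared, which is the signal that it adds less explanatory power than it costs in degrees of freedom.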
 
 

Pitfall #1: Omitted Variables Bias

The technical term for what we just described is “OMITTED VARIABLES BIAS”

IF
•  We exclude explanatory variable(s) that should be present in the model

AND
•  These variable(s) are correlated with an included explanatory variable

THEN
•  The OLS estimate of the coefficient on the included explanatory variable will be “biased”: that is, it won’t reflect the “pure” impact of that variable on Y
 
 

 

OVB cont’d

•  In our simple regression, we only considered ln(income)
•  Many important determinants of birth weight were omitted, such as smoking.
•  This omitted variable was correlated with income, and therefore the estimate from our simple regression was biased.
 
 

Omitted variable bias

•  True model is: \(y = \alpha + \beta_1 X_1 + \beta_2 X_2 + e\)
•  But we omit \(X_2\) and estimate: \(y = \alpha + \beta_1 X_1 + v\), where \(v = \beta_2 X_2 + e\)
•  If \(X_2\) and \(X_1\) are correlated, OLS is biased

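Omitted variable bias

Direction of the bias in the estimate of \(\beta_1\):

              corr(X1,X2) > 0    corr(X1,X2) < 0
β2 > 0        Bias (+)           Bias (–)
β2 < 0        Bias (–)           Bias (+)

The true model is: \(wage = \alpha + \beta_1 educ + \beta_2 ability + e\)

But we omit ability and estimate: \(wage = \alpha + \beta_1 educ + v\), where \(v = \beta_2 ability + e\)

Since ability and education are most likely correlated (+ve), and ability most likely raises wages (\(\beta_2 > 0\)), our coefficient on education is likely biased upwards.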
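A small simulation can illustrate the upward bias in the wage example. Everything here is made up: the "true" coefficients (2 on educ, 3 on ability), the way educ is generated from ability, and the sample size. The point is only that the simple-regression slope lands near \(\beta_1 + \beta_2 \times\) (slope of ability on educ) rather than near \(\beta_1\).

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 5000
BETA1, BETA2 = 2.0, 3.0  # hypothetical true coefficients on educ and ability

ability = [rng.gauss(0, 1) for _ in range(n)]
educ = [a + rng.gauss(0, 1) for a in ability]  # positively correlated with ability
wage = [1.0 + BETA1 * ed + BETA2 * ab + rng.gauss(0, 1)
        for ed, ab in zip(educ, ability)]


def slope(x, y):
    """Simple-regression OLS slope of y on x: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


short_slope = slope(educ, wage)    # regression of wage on educ, ability omitted
delta_hat = slope(educ, ability)   # how the omitted variable moves with educ

print("true beta1:", BETA1)
print("estimate with ability omitted:", round(short_slope, 2))
print("beta1 + beta2 * delta_hat:", round(BETA1 + BETA2 * delta_hat, 2))
```

With a positive correlation and a positive \(\beta_2\), the estimate sits well above the true value of 2, matching the top-left cell of the bias table.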

 

Pitfall #2: Irrelevant variables

•  If we include any irrelevant variables as independent variables in our model, it will not cause bias if the true coefficient of the extra variable is zero (irrelevant).
•  However, it will increase the variance of the estimated coefficients, which will tend to decrease the magnitude of their t-scores.
•  Irrelevant variables also usually decrease the adjusted R squared.
•  Hence irrelevant variables reduce the precision of regressions.
 
 

Pitfall #3: Multi-collinearity

•  Explanatory variables may be HIGHLY correlated → the model has trouble differentiating between their effects on Y.
•  High \(R^2\), large F-stat, but insignificant t stats for coefficient estimates (or wrong sign).
•  i.e. Model overall fits well, but cannot pin down marginal effects of individual variables
•  Diagnosing: Look at your correlation matrix for high levels of correlation between your explanatory variables. This will reveal the source and extent of the multicollinearity problem.
•  NOTE: High correlation between your dependent variable and independent variable(s) is usually OK!
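The correlation-matrix check can be sketched in a few lines. The three variables and their values are made up, and the 0.8 cut-off is only a commonly used rule of thumb assumed here, not a threshold given in the slides.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical explanatory variables: x2 is roughly 2 * x1, x3 is unrelated.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1],
    "x3": [5, 3, 6, 2, 7, 1, 4, 8],
}

THRESHOLD = 0.8  # assumed rule-of-thumb cut-off for "high" correlation

flagged = []
for a, b in combinations(data, 2):
    r = pearson(data[a], data[b])
    print(a, b, round(r, 3))
    if abs(r) > THRESHOLD:
        flagged.append((a, b))

print("pairs to look at:", flagged)
```

Only the explanatory variables go into this check; correlation between each of them and the dependent variable is not a multicollinearity concern.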
 
 

Remedies for multicollinearity

1. Do nothing
– If looking for overall prediction and not individual effects
– If theory suggests variables should be included
2. Transform variables
– If theoretically justified. Multicollinearity is a problem when there is a linear relationship between explanatory variables
3. Drop or combine explanatory variables
 

 

How do you select explanatory variables?

•  Include (insofar as possible) all explanatory variables that you think might explain your dependent variable. This will reduce OVB.
•  However, including irrelevant variables or ones that are highly multicollinear is also not advisable.
•  Ideally, turn to theory, intuition, logic, and/or common sense for suggestions on what is appropriate to include.
 
 

Example: Forecasting Demand

•  Back to first year microeconomics.
•  Theory argues that market demand for a product is a function of the following:
–  Price
–  Tastes and preferences
–  Disposable income
–  Number of consumers in the market
–  Prices of related goods
–  Expectations
•  This theory gives you an indication of what should be included in a regression model of market demand. However, you will also need to consider the availability of data (which is almost always the biggest constraint on model development).
 
 

Start with some data:

6302.0 Average Weekly Earnings, Australia. TABLE 3. Average Weekly Earnings Of Employees, Australia (Dollars) – Original

Date        Total earnings (male)
Nov.1983    362.00
Feb.1984    370.60
May.1984    383.80
Aug.1984    386.20
Nov.1984    389.50
Feb.1985    392.70
May.1985    397.20
Aug.1985    403.10
Nov.1985    413.90
Feb.1986    422.70
May.1986    425.50
Aug.1986    437.20
Nov.1986    446.30
Feb.1987    444.50
May.1987    450.90
Aug.1987    457.00
Nov.1987    470.00
Feb.1988    474.90
May.1988    481.70
Aug.1988    486.20

Set p = 0.5 and find your forecasts for the time period of interest.

Once you have calculated your RMSE (or other evaluation measure), go to Solver

and impose the appropriate constraints:

Go back and check your value of p, which should have changed. You have now minimised the forecast error (RMSE) by choice of p, which provides you with the best naive 2 forecast.
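The same search can be sketched outside Excel. Two things here are assumptions, because the formula and the Solver constraints are not shown in the text: the naive 2 forecast is taken to be last period's value plus a proportion p of the most recent change, \(F_{t+1} = Y_t + p\,(Y_t - Y_{t-1})\), and p is restricted to the range 0 to 1. A simple grid search stands in for Solver.

```python
import math

# Average weekly total earnings (male), Nov 1983 to Aug 1988.
earnings = [362.00, 370.60, 383.80, 386.20, 389.50, 392.70, 397.20, 403.10,
            413.90, 422.70, 425.50, 437.20, 446.30, 444.50, 450.90, 457.00,
            470.00, 474.90, 481.70, 486.20]


def naive2_forecasts(series, p):
    """ASSUMED form: forecast for period t+1 is Y_t + p * (Y_t - Y_{t-1})."""
    return [series[t] + p * (series[t] - series[t - 1])
            for t in range(1, len(series) - 1)]


def rmse(series, p):
    """Root mean squared error of the one-step-ahead forecasts."""
    forecasts = naive2_forecasts(series, p)
    actuals = series[2:]  # first period that has a forecast
    squared_errors = [(a - f) ** 2 for a, f in zip(actuals, forecasts)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Grid search over 0 <= p <= 1 (assumed constraint), in steps of 0.001.
candidates = [i / 1000 for i in range(1001)]
best_p = min(candidates, key=lambda p: rmse(earnings, p))

print("RMSE at p = 0.5:", round(rmse(earnings, 0.5), 3))
print("best p:", best_p, "RMSE:", round(rmse(earnings, best_p), 3))
```

If the Solver set-up uses a different forecast formula or different bounds on p, only `naive2_forecasts` and `candidates` need to change.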
