Finish HW1 with quality work. This homework is due 2018/01/15 at 23:30. I have also posted our PPT in case it helps you finish this homework.
You may use Econometric Software
The homework assignments include computer exercises in which you will apply the econometric
methods learned in class to analyze data using Stata. The edition called Stata/IC will be sufficient
for our uses (or the discontinued “Small Stata” edition if you have purchased an earlier version).
• You may purchase Stata at the following site and install it on your own computer:
http://www.stata.com/order/new/edu/gradplans/student-pricing/
Econ 5420: Econometrics II
The Ohio State University
Spring 2018
Prof. Jason Blevins
Homework 1
Due in class on January 16.
Review Chapters 1–3 of Wooldridge and complete the exercises below. (Textbook exercise
numbers are given in parentheses for reference, but the first two are not from our textbook.)
Problem 1. (Studenmund, 1.11) The distinction between the stochastic error term ui and the
residual ûi is one of the most important in this class.
a. List at least two differences between the error term and the residual.
b. Usually, we can never observe the error term, but we can get around this difficulty if we
assume values for the true coefficients. Calculate values of the error term and residual for
each of the following six observations given that the true β0 equals 0.0, the true β1 equals
1.5, and the estimated regression equation is Ŷi = 0.48 + 1.32 · Xi:
Yi   2   6   3   8   5   4
Xi   1   4   2   5   3   4
(Hint: To answer this question, you’ll have to solve for ui in the “true model” equation for Yi.)
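The arithmetic behind part b can be sketched in a few lines of Python. The course itself uses Stata; Python is used here only because the example is self-contained. The variable names (`true_errors`, `residuals`) are illustrative and not part of the assignment. The error term comes from solving the true model for ui, and the residual from subtracting the fitted value from Yi.

```python
# Data from Problem 1(b).
Y = [2, 6, 3, 8, 5, 4]
X = [1, 4, 2, 5, 3, 4]

# True model: Y_i = 0.0 + 1.5 * X_i + u_i, so u_i = Y_i - 1.5 * X_i.
true_b0, true_b1 = 0.0, 1.5
# Estimated equation: Yhat_i = 0.48 + 1.32 * X_i, so uhat_i = Y_i - Yhat_i.
est_b0, est_b1 = 0.48, 1.32

true_errors = [y - (true_b0 + true_b1 * x) for x, y in zip(X, Y)]
residuals = [y - (est_b0 + est_b1 * x) for x, y in zip(X, Y)]

# Print one row per observation for a side-by-side comparison.
for x, y, u, uhat in zip(X, Y, true_errors, residuals):
    print(f"X={x}  Y={y}  u={u:+.2f}  uhat={uhat:+.2f}")
```

The error term depends on the unobservable true coefficients, while the residual depends only on the estimates, which is why the two columns differ for every observation.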
Problem 2. In order to estimate a regression equation using Ordinary Least Squares (OLS), it
must be linear in the coefficients. Determine whether the coefficients in each of the following
equations could be estimated using OLS.
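a. $Y_i = \beta_0 + \beta_1 \ln X_i + u_i$
b. $\ln Y_i = \beta_0 + \beta_1 \ln X_i + u_i$
c. $Y_i = \beta_0 + \beta_1 X_i^{\beta_2} + u_i$
d. $Y_i^{\beta_0} = \beta_1 + \beta_2 X_i^2 + u_i$
e. $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i$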
Problem 3. (Wooldridge, 3.4) The median starting salary for new law school graduates is determined by
ln SALARY = β0 + β1 LSAT + β2 GPA + β3 ln LIBVOL + β4 ln COST + β5 RANK + u,
where LSAT is the median LSAT score for the graduating class, GPA is the median college GPA for
the class, LIBVOL is the number of volumes in the law school library, COST is the annual cost of
attending law school, and RANK is a law school ranking (with RANK = 1 being the best).
a. Explain why we expect β5 ≤ 0.
b. What signs do you expect for the other slope parameters? Justify your answers.
c. Using the lawsch85 dataset, the estimated equation is
\[ \widehat{\ln \text{SALARY}} = 8.34 + 0.0047\,\text{LSAT} + 0.248\,\text{GPA} + 0.095 \ln \text{LIBVOL} + 0.038 \ln \text{COST} - 0.0033\,\text{RANK}, \]
n = 136, R² = 0.842.
What is the predicted ceteris paribus difference in salary for schools with a median GPA
different by one point? (Report your answer as a percentage.)
d. Interpret the coefficient on the variable ln LIBVOL.
e. Would you say it is better to attend a higher ranked law school? How much is a difference
in ranking of 20 worth in terms of predicted starting salary?
Problem 4. (Wooldridge, C3.2) Use the hprice1 dataset to estimate the model
PRICE = β0 +β1SQRFT +β2BDRMS + u
where PRICE is the house price measured in thousands of dollars.
a. Write out the results in equation form.
b. What is the estimated increase in price for a house with one more bedroom, holding square
footage constant?
c. What is the estimated increase in price for a house with an additional bedroom that is 140
square feet in size? Compare this to your answer in part b.
d. What percentage of the variation in price is explained by square footage and number of
bedrooms?
e. The first house in the sample has SQRFT = 2,438 and BDRMS = 4. Find the predicted
selling price for this house from the OLS regression line.
f. The actual selling price of the first house in the sample was $300,000 (so PRICE = 300).
Find the residual for this house. Does it suggest that the buyer underpaid or overpaid for
the house?
User: Jason Blevins
name:
log: /tmp/hw1-problem4.smcl
log type: smcl
opened on: 12 Jan 2018, 11:09:18
1 . use hprice1
2 . regress price sqrft bdrms
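      Source |       SS           df       MS      Number of obs   =        88
-------------+----------------------------------   F(2, 85)        =     72.96
       Model |  580009.152         2  290004.576   Prob > F        =    0.0000
    Residual |  337845.354        85  3974.65122   R-squared       =    0.6319
-------------+----------------------------------   Adj R-squared   =    0.6233
       Total |  917854.506        87  10550.0518   Root MSE        =    63.045

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       sqrft |   .1284362   .0138245     9.29   0.000     .1009495    .1559229
       bdrms |   15.19819   9.483517     1.60   0.113    -3.657582    34.05396
       _cons |    -19.315   31.04662    -0.62   0.536    -81.04399      42.414
------------------------------------------------------------------------------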
3 . log close
name:
log: /tmp/hw1-problem4.smcl
log type: smcl
closed on: 12 Jan 2018, 11:09:50
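As a rough cross-check of the log above, the estimated coefficients can be plugged into the fitted equation by hand. The sketch below does this in Python rather than Stata so that it runs without the hprice1 dataset. The function name `predicted_price` is hypothetical, and the inputs are the values quoted in Problem 4 (an extra 140 square feet, and a first house with SQRFT = 2,438, BDRMS = 4, PRICE = 300).

```python
# Coefficients copied from the regress output above.
b_cons, b_sqrft, b_bdrms = -19.315, 0.1284362, 15.19819


def predicted_price(sqrft, bdrms):
    """Fitted price in thousands of dollars from the estimated equation."""
    return b_cons + b_sqrft * sqrft + b_bdrms * bdrms


# One more bedroom that also adds 140 square feet to the house.
delta_bedroom_140 = b_sqrft * 140 + b_bdrms * 1

# First house in the sample.
fitted = predicted_price(2438, 4)
residual = 300 - fitted  # actual price minus fitted price

print(f"extra bedroom with 140 sq ft: {delta_bedroom_140:.3f}")
print(f"fitted price: {fitted:.3f}, residual: {residual:.3f}")
```

All amounts are in thousands of dollars, matching the units of PRICE.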
Econ 5420: Cross Section:
Introduction
Prof. Jason Blevins
Department of Economics
The Ohio State University
http://jblevins.org/
What is Econometrics?
Model ⇐⇒ Econometrics ⇐⇒ Data
Figure 1: What is econometrics?
• Literally “economic measurement”
• Quantitative analysis of economic problems
• Application of statistical methods to connect theoretical
economic models to data
• Bridges abstract economic theory and real-world human
economic activity
• Notion of a “true model”
Relevance
• Academic research
• Every field of economics
• With the exception of pure theory
• Government
• Procurement auctions (e.g., FCC spectrum
auctions)
• Assignment of scarce resources (e.g., timber and spectrum
auctions)
• Antitrust
• Environmental regulation
• Business
• Estimate demand for a new product
• Forecasting sales
• Pricing financial assets
• Search engine advertisements
Non-ideal data
• Traditional statistics: controlled experiments
• Economists rarely have this luxury
• Much of econometrics focuses on statistical analysis of
data under the non-ideal circumstances inherent in
measuring economic interactions.
Quote: Ragnar Frisch
[T]here are several aspects of the quantitative
approach to economics, and no single one of these
aspects, taken by itself, should be confounded with
econometrics. Thus, econometrics is by no means
the same as economic statistics. Nor is it identical
with what we call general economic theory, although
a considerable portion of this theory has a definitely
quantitative character. Nor should econometrics be
taken as synonomous with the application of
mathematics to economics. . . . It is the unification of
all three that is powerful. And it is this unification that
constitutes econometrics.
–Frisch (1933)
Roles of Econometrics
In light of these definitions, it is clear that econometrics is used
for:
1. quantifying economic relationships (estimation),
2. testing economic theories, and
3. prediction and forecasting.
Demand Example
Example
Let Q denote the quantity demanded of a particular good. We
expect that Q should depend on P, the price of the good itself,
Ps, the price of a substitute good, and Yd, disposable income.
Without saying more about the specific relationships between
these variables, we can represent this notion using the abstract
functional relationship
Q = f(P, Ps, Yd).
Typically, we would expect demand (Q) to decrease with P
and increase with Ps.
Examples of the Roles of Econometrics
Examples of the three roles applied to the demand model
Q = f(P, Ps, Yd):
1. Quantifying economic relationships: In a linear model
Q = β1 + β2P + β3Ps + β4Yd,
what are the values of β1, β2, β3, and β4?
2. Testing economic theories: Is the good a normal good?
3. Prediction and forecasting: How many units would be
demanded, given hypothetical values of prices and
income?
Estimation: Quantifying Uncertainty
[Figure: three panels plotting Q against Yd: (a) Small sample, (b) Large sample, (c) Low variance]
Figure 2: Statistical significance.
A Note on Causality
• Correlation does not imply causation!
• Suppose we observe that when large numbers of people
carry umbrellas to work it tends to rain.
• Obviously, carrying umbrellas does not cause it to rain!
• This is of course just a correlation, and the causation runs
in the opposite direction.
• For the purposes of prediction, perhaps the number of
umbrellas is a good predictor of rain, but for interpretation
of the underlying process it is nonsensical.
Another Note on Causality
Example
A study of traffic accidents due to alcohol examined police
reports of traffic accidents. For each, researchers recorded
whether the driver had consumed alcohol and whether or not
the report noted an empty beer container in the vehicle.
Statement A: A recent study has found that drinking beer while
driving may lead to increased risk of an accident.
Statement B: A recent study has found that empty beer
containers in cars may lead to increased risk of an accident.
Econometric Data
There are three basic types of econometric data:
1. Cross-sectional data: observations on different individual
units (e.g., people or firms) with no natural ordering.
2. Time series data: observations at different points in time
(e.g., monthly or yearly) with a natural ordering (time).
3. Panel data: a mixture of time series and cross sectional
data consisting of observations on multiple individuals
(unordered) at different points in time (ordered).
Cross-Sectional Data
Example
A cross-sectional dataset on different students including age,
GPA, and hours studied per week:
Student   Age   GPA   Hours
1         20    3.4   10
2         21    3.1   5
3         19    3.9   12
…         …     …     …
Time Series Data: Example
Example
A time series dataset consisting of semester-by-semester
observations on a single student:
Semester      Age   GPA   Hours
Fall 2013     18    3.3   7
Spring 2014   18    3.8   5
Fall 2014     19    3.5   12
…             …     …     …
Panel Data: Example
Example
A panel dataset with observations on multiple students across
multiple semesters:
Student   Semester      Age   GPA   Hours
1         Fall 2013     18    3.3   7
1         Spring 2014   18    3.8   5
1         Fall 2014     19    3.5   12
…         …             …     …     …
2         Fall 2013     20    3.1   10
2         Spring 2014   20    2.9   10
2         Fall 2014     21    3.4   18
…         …             …     …     …
Mean, Variance, and Standard Deviation
• Three important features of random variables: mean,
variance, and standard deviation.
• Referred to as “moments” of a distribution.
• The formula depends on whether the distribution is discrete
(sum) or continuous (integral).
Expected Value
The expected value or mean of a discrete random variable is
the sum of all possible outcomes weighted by the probability
that each outcome occurs.
• Discrete random variable Z takes on a countable number
of outcomes.
• Let k denote the number of outcomes and let z1, z2, . . . , zk
denote the values of the outcomes.
• Let P(z1), P(z2), . . . , P(zk) denote the probabilities
associated with each outcome.
• The expected value of Z, denoted E[Z] or µ, is
\[ \mu = E[Z] = \sum_{i=1}^{k} P(z_i)\, z_i = P(z_1) z_1 + \cdots + P(z_k) z_k. \]
Expected Value
Example
An individual makes a bet with the possibility of losing $1.00,
breaking even, winning $3.00, or winning $5.00. Let Z be a
random variable representing the winnings.
Outcome -$1.00 $0.00 $3.00 $5.00
Probability 0.30 0.40 0.20 0.10
The mean outcome is:
µ = E[Z] = (0.3 × −1) + (0.4 × 0) + (0.2 × 3) + (0.1 × 5)
= −0.3 + 0.6 + 0.5 = 0.8
Therefore, the expected payoff for this bet is $0.80.
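The weighted sum in this example can be reproduced with a short Python sketch. The helper name `expected_value` is illustrative; the outcomes and probabilities are the ones in the table above.

```python
# Outcomes (in dollars) and their probabilities from the betting example.
outcomes = [-1.00, 0.00, 3.00, 5.00]
probabilities = [0.30, 0.40, 0.20, 0.10]


def expected_value(values, probs):
    """Probability-weighted sum of the outcomes of a discrete random variable."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * z for p, z in zip(probs, values))


mu = expected_value(outcomes, probabilities)
print(f"E[Z] = {mu:.2f}")
```

The check that the probabilities sum to one is a small safeguard added for the sketch; it is not part of the definition.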
Variance
The variance of a discrete random variable Z, denoted Var(Z)
or σ², is a measure of the variability of the distribution and is
defined as
\[ \sigma^2 = \operatorname{Var}(Z) = E[(Z - \mu)^2] = \sum_{i=1}^{k} P(z_i)(z_i - \mu)^2. \]
The variance is the expected value of (Z − µ)², which is the
anticipated average value of the squared deviations of Z from
the mean.
Variance: Example
Example
Returning to the example, even though the odds are favorable
one might also be concerned about the possibility of extreme
outcomes. The variance of the winnings is
\[
\begin{aligned}
\sigma^2 &= [0.3 \times (-1 - 0.8)^2] + [0.4 \times (0 - 0.8)^2] + [0.2 \times (3 - 0.8)^2] + [0.1 \times (5 - 0.8)^2] \\
&= [0.3 \times (-1.8)^2] + [0.4 \times (-0.8)^2] + [0.2 \times (2.2)^2] + [0.1 \times (4.2)^2] \\
&= [0.3 \times 3.24] + [0.4 \times 0.64] + [0.2 \times 4.84] + [0.1 \times 17.64] \\
&= 0.972 + 0.256 + 0.968 + 1.764 \\
&= 3.96.
\end{aligned}
\]
Standard Deviation
The standard deviation, denoted σ, is the square root of the
variance.
Example
In our example, the variance was σ² = 3.96, so the standard
deviation is
\[ \sigma = \sqrt{3.96} \approx 1.99. \]
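The variance and standard deviation of the betting example can be checked numerically. This is a minimal Python sketch using the same outcomes and probabilities as above; the variable names are illustrative.

```python
import math

# Outcomes (in dollars) and probabilities from the betting example.
outcomes = [-1.00, 0.00, 3.00, 5.00]
probabilities = [0.30, 0.40, 0.20, 0.10]

# Mean: probability-weighted sum of the outcomes.
mu = sum(p * z for p, z in zip(probabilities, outcomes))

# Variance: probability-weighted sum of squared deviations from the mean.
variance = sum(p * (z - mu) ** 2 for p, z in zip(probabilities, outcomes))

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(f"mean = {mu:.2f}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
```

The standard deviation is in dollars, the same units as the winnings, whereas the variance is in squared dollars.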
Sample Statistics
In contrast to the population mean, the sample average or
sample mean is a sum over n sampled values (called
realizations) of a random variable where the weights are
all equal to 1/n. Let {Z1, Z2, . . . , Zn} denote a sample of size n
from the distribution of Z. The sample average of
{Z1, Z2, . . . , Zn} is
\[ \bar{Z} = \frac{1}{n} \sum_{i=1}^{n} Z_i = \frac{1}{n} Z_1 + \cdots + \frac{1}{n} Z_n. \]
The sample variance is defined similarly:
\[ s^2 = \frac{1}{n} \sum_{i=1}^{n} (Z_i - \bar{Z})^2. \]
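A short Python sketch of these two sample statistics follows. The three-observation sample is made up purely for illustration. The slide's definition divides by n, which corresponds to `statistics.pvariance` in the Python standard library; many textbooks and `statistics.variance` divide by n − 1 instead, so the sketch prints both to make the difference visible.

```python
import statistics

# Hypothetical sample of size n = 3, for illustration only.
sample = [7, 5, 12]
n = len(sample)

# Sample mean: every observation gets weight 1/n.
z_bar = sum(sample) / n

# Sample variance with the 1/n weighting used in the definition above.
s2 = sum((z - z_bar) ** 2 for z in sample) / n

# Standard-library equivalents: pvariance divides by n, variance by n - 1.
s2_divide_by_n = statistics.pvariance(sample)
s2_divide_by_n_minus_1 = statistics.variance(sample)

print(f"mean = {z_bar:.3f}")
print(f"variance (1/n) = {s2:.3f}, variance (1/(n-1)) = {s2_divide_by_n_minus_1:.3f}")
```

The two versions converge as n grows, so the choice matters mainly in small samples.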
Normal Distribution
• Mean, variance, and standard deviation can be
illustrated intuitively with the normal distribution.
• Write N(µ, σ2) to denote the normal distribution with
mean µ and variance σ2.
• The standard deviation is defined as the square root of the
variance, σ in this case.
Normal Distribution
[Figure: probability density (vertical axis, 0.00 to 0.45) plotted against x (horizontal axis, −9.0 to 9.0)]
Figure 3: N(0, 1)
Normal Distribution
[Figure: probability density (vertical axis, 0.00 to 0.45) plotted against x (horizontal axis, −9.0 to 9.0)]
Figure 4: N(1, 2)
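The two densities in Figures 3 and 4 can be evaluated with `statistics.NormalDist` from the Python standard library. One detail is easy to get wrong: the notation N(µ, σ²) lists the variance second, while `NormalDist` takes the standard deviation, so N(1, 2) needs `sigma = sqrt(2)`. The variable names below are illustrative.

```python
import math
from statistics import NormalDist

# N(0, 1): mean 0, variance 1 (so standard deviation 1).
standard = NormalDist(mu=0.0, sigma=1.0)

# N(1, 2): mean 1, variance 2 (so standard deviation sqrt(2)).
shifted = NormalDist(mu=1.0, sigma=math.sqrt(2.0))

# Each density peaks at its mean; a larger variance gives a lower, wider curve.
peak_standard = standard.pdf(0.0)
peak_shifted = shifted.pdf(1.0)

print(f"N(0,1) peak density: {peak_standard:.4f}")
print(f"N(1,2) peak density: {peak_shifted:.4f}")
```

Comparing the two peak heights gives a quick numerical sense of how the variance controls the spread of the curve.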
• Ordinary Least Squares (OLS) with a single independent
variable.
• The theoretical model of interest is the linear model
Yi = β0 + β1Xi + ui.
• From this equation, we seek to use the information
contained in a dataset of observations (Xi, Yi) to estimate
the values of β0 and β1. We call the estimates β̂0 and β̂1.
• The fitted values are
Ŷi = β̂0 + β̂1Xi.
• The residuals are the differences between the observed
values Yi and the fitted values Ŷi:
ûi ≡ Yi − Ŷi.
The Geometry of Simple Regression
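[Figure: observations (X1, Y1) and (X2, Y2) plotted around the fitted line, with the fitted values Ŷ1 and Ŷ2 and the residuals û1 and û2 marked]
Figure 5: Estimated regression line and residuals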
Simple Regression
• We need to formally define our loss function, the criterion
by which we determine whether the fit is good or not.
• OLS is founded on minimizing the sum of squared
residuals (SSR), defined as
\[ SSR = \sum_{i=1}^{n} \hat{u}_i^2 = \hat{u}_1^2 + \hat{u}_2^2 + \cdots + \hat{u}_n^2, \]
where n is the sample size.
• OLS estimates of β0 and β1 are defined to be the values of
β̂0 and β̂1 which, when plugged into the estimated
regression equation, minimize the SSR.
• Note that different samples (of the same size) yield
different estimates.
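The minimization described above has a well-known closed-form solution in the single-regressor case: β̂1 is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and β̂0 = Ȳ − β̂1X̄. The slides do not state the slope formula, so treat it as a standard textbook result brought in for this sketch. The function names are illustrative, and the data are the six observations from Problem 1 of the homework, for which the handout reports the estimated equation Ŷi = 0.48 + 1.32 · Xi.

```python
def ols_simple(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals of y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def ssr(x, y, b0, b1):
    """Sum of squared residuals for a candidate intercept and slope."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Six observations from Problem 1 of the homework.
X = [1, 4, 2, 5, 3, 4]
Y = [2, 6, 3, 8, 5, 4]

b0, b1 = ols_simple(X, Y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSR = {ssr(X, Y, b0, b1):.3f}")

# Any other line should give a larger SSR, for example a slightly shifted one.
print(f"SSR with intercept + 0.1: {ssr(X, Y, b0 + 0.1, b1):.3f}")
```

Perturbing either coefficient and recomputing the SSR is a simple way to see that the OLS estimates sit at the minimum of the loss function.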
Why Ordinary Least Squares?
OLS is used so often for several reasons.
1. It is very straightforward to implement, both by hand and
computationally, and it is simple to work with theoretically.
2. The criterion of minimizing the squared residuals is intuitive.
3. Other nice properties: the regression line passes through
the means of X and Y, (X̄, Ȳ), the sum of the residuals is
zero, etc.
Simple Regression: Interpretation
In the regression line
Ŷi = β̂0 + β̂1Xi,
the coefficient on Xi represents the amount by which we
predict Yi will change when Xi increases by one unit.
Simple Regression: Interpretation
Example
Let Yi be individual i’s annual demand for housing in dollars
and let Xi be individual i’s annual income, also measured in
dollars.
Then β̂1 is the number of additional dollars individual i is
predicted to spend on housing when income increases by one
dollar.
The intercept, β̂0, is an individual’s predicted expenditure on
housing when income is zero.
Anscombe’s Quartet: A Cautionary Tale
[Figure: four scatter plots (y1 against x1, y2 against x2, y3 against x3, y4 against x4), each with the horizontal axis running from 4 to 18 and the vertical axis from 4 to 12]
OLS Review: Sum of the Residuals is Zero
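To see that the residuals sum to zero, we can look at the
average of the residuals. If the average is zero, so is the sum.
\[
\begin{aligned}
\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i
&= \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right) \\
&= \frac{1}{n} \sum_{i=1}^{n} Y_i - \hat{\beta}_0 - \hat{\beta}_1 \frac{1}{n} \sum_{i=1}^{n} X_i \\
&= \bar{Y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{X}.
\end{aligned}
\]
But recall that β̂0 = Ȳ − β̂1X̄, and so
\[
\begin{aligned}
\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i
&= \bar{Y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{X} \\
&= \bar{Y} - (\bar{Y} - \hat{\beta}_1 \bar{X}) - \hat{\beta}_1 \bar{X} \\
&= 0.
\end{aligned}
\]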
OLS Review: Fitted Value at Mean
Show that the regression line passes through the point (X̄, Ȳ).
We can evaluate the regression line at the point X̄ and show
that the fitted value is indeed Ȳ.
Substituting for β̂0 gives:
Ŷ = β̂0 + β̂1X
= (Ȳ − β̂1X̄) + β̂1X.
Then evaluating at the point X = X̄ gives:
Ŷ = (Ȳ − β̂1X̄) + β̂1X̄
= Ȳ.
OLS Review: Average of Predicted Values
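Another property is that the average of the predicted values Ŷi
equals the average of the observations Yi:
\[
\begin{aligned}
\frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i
&= \frac{1}{n} \sum_{i=1}^{n} (\hat{\beta}_0 + \hat{\beta}_1 X_i) \\
&= \hat{\beta}_0 + \hat{\beta}_1 \bar{X} \\
&= (\bar{Y} - \hat{\beta}_1 \bar{X}) + \hat{\beta}_1 \bar{X} \\
&= \bar{Y}.
\end{aligned}
\]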
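The three OLS properties reviewed above (residuals sum to zero, the line passes through (X̄, Ȳ), and the fitted values average to Ȳ) can be confirmed numerically. The sketch below uses the six observations from Problem 1 of the homework and the standard closed-form slope estimate, which the slides do not spell out; the variable names are illustrative. Because of floating-point rounding, the checks compare against a small tolerance rather than exact zero.

```python
# Six observations from Problem 1 of the homework.
X = [1, 4, 2, 5, 3, 4]
Y = [2, 6, 3, 8, 5, 4]
n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Closed-form OLS estimates for a single regressor.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in X]
residuals = [y - f for y, f in zip(Y, fitted)]

sum_of_residuals = sum(residuals)   # property 1: should be (numerically) zero
fitted_at_mean = b0 + b1 * x_bar    # property 2: should equal y_bar
mean_of_fitted = sum(fitted) / n    # property 3: should equal y_bar

print(f"sum of residuals = {sum_of_residuals:.2e}")
print(f"fitted value at X-bar = {fitted_at_mean:.4f}, Y-bar = {y_bar:.4f}")
print(f"mean of fitted values = {mean_of_fitted:.4f}")
```

All three properties rely on the regression including an intercept, since each derivation substitutes β̂0 = Ȳ − β̂1X̄.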
References
Frisch, R. (1933). Editor’s note. Econometrica 1, 1–4.
Econ 5420: Cross Section:
Multiple Regression
Prof. Jason Blevins
Department of Economics
The Ohio State University
Simple Regression Notation
• Previously, we have been using the single-variable model
Yi = β0 + β1Xi + ui.
• The i subscript represents a single observation. For a
dataset with n observations we have n equations:
Y1 = β0 + β1X1 + u1,
Y2 = β0 + β1X2 + u2,
...
Yn = β0 + β1Xn + un.
• This model only allows for one independent variable Xi.
• To allow for multiple independent variables in our
regression, we have to generalize the notation.
Multiple Regression Notation
• Extending to K regressors, we add a k subscript.
• We write the model for K regressors and n observations as
\[ Y_i = \beta_0 + \underbrace{\beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_K X_{iK}}_{K \text{ regressors}} + u_i \]
• Writing out the entire system of n equations:
\[
\begin{aligned}
Y_1 &= \beta_0 + \beta_1 X_{11} + \beta_2 X_{12} + \cdots + \beta_K X_{1K} + u_1, \\
Y_2 &= \beta_0 + \beta_1 X_{21} + \beta_2 X_{22} + \cdots + \beta_K X_{2K} + u_2, \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 X_{n1} + \beta_2 X_{n2} + \cdots + \beta_K X_{nK} + u_n.
\end{aligned}
\]
Interpreting Multiple Regression Coefficients
• Coefficients in multiple linear regression are called the
multiple regression coefficients or partial regression
coefficients.
• Previously, β1 in our single-variable linear regression
model represented the change in the dependent variable
associated with a one-unit increase in the independent
variable.
• Holding other variables constant: Now, for some variable
Xk, βk represents the change in Y associated with a
one-unit increase in the k-th independent variable, holding
the other independent variables constant.
Example: Financial Aid
Example
(Studenmund 2010, p. 42)
• FINAIDi = amount of financial aid awarded (dollars per
year)
• PARENTi = expected family contribution (dollars per year)
• HSRANKi = GPA rank in high school (percentage, 0–100,
with 100 being the highest)
FINAIDi = β0 + β1 PARENTi + β2 HSRANKi + ui
Example: Financial Aid (cont’d)
Example
\[ \widehat{\text{FINAID}}_i = 8927 - 0.36\,\text{PARENT}_i + 87.4\,\text{HSRANK}_i \]
• Here β̂1 = −0.36 means that, holding high school rank
fixed, students whose parents can contribute an additional
$1 will receive on average $0.36 less in financial aid.
• For an extra $1,000 in expected parental contributions,
that’s $360 less in financial aid.
• (−$360 = $1,000 × (−0.36))
• Because of linearity, this is the estimated effect for both
high and low ranked students.
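The linearity point can be illustrated with a small Python sketch. It uses rounded versions of the coefficients reported in the Stata output for this example (8926.929, −0.3567721 and 87.37815, rounded to 8927, −0.36 and 87.4). The function name `predicted_finaid` and the particular PARENT and HSRANK values are hypothetical, chosen only to compare a lower-ranked and a higher-ranked student.

```python
def predicted_finaid(parent, hsrank, b0=8927.0, b1=-0.36, b2=87.4):
    """Fitted financial aid (dollars per year) from the estimated equation."""
    return b0 + b1 * parent + b2 * hsrank


# Effect of an extra $1,000 in parental contribution at two different ranks.
# The contribution levels (10,000 and 11,000) and ranks (50, 90) are made up.
effect_low_rank = predicted_finaid(11000, 50) - predicted_finaid(10000, 50)
effect_high_rank = predicted_finaid(11000, 90) - predicted_finaid(10000, 90)

print(f"change in aid at HSRANK = 50: {effect_low_rank:.2f}")
print(f"change in aid at HSRANK = 90: {effect_high_rank:.2f}")
```

Because the model has no interaction between PARENT and HSRANK, the HSRANK term cancels in each difference, which is what "holding high school rank fixed" means in practice.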
Geometry of Multiple Regression
[Figure: panels plotting FINAID against PARENT and HSRANK, with the slopes −0.36 and 87.4 marked]
Figure 1: Geometric interpretation of multiple regression
The Coefficient of Determination
• The most common measure of the overall fit of a
regression is the coefficient of determination, denoted R².
• This measure summarizes the fit of a single regression
independently.
• It is also useful to compare the fit of a collection of
regressions with different combinations of included
independent variables.
Decomposition of Variance
• To define R², we use the decomposition of variance.
• SST is the total variation in Yi, the total sum of squares:
\[ SST \equiv \sum_{i=1}^{n} (Y_i - \bar{Y})^2. \]
• SST can be factored into two components as
SST = SSE + SSR:
1. SSE captures the deviations of the fitted values Ŷi from the mean Ȳ (the explained variation):
\[ SSE \equiv \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \]
2. SSR captures the “unexplained” or residual deviations:
\[ SSR \equiv \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]
Decomposition of Variance
Mathematically:
\[
\begin{aligned}
SST &= \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \\
&= \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \\
&= \sum_{i=1}^{n} \hat{u}_i^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \\
&= SSR + SSE.
\end{aligned}
\]
In other words, more in line with our regression form:
SST = SSE + SSR.
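The decomposition of the total sum of squares (SST) into an explained part (SSE) and a residual part (SSR) can be verified numerically. The sketch below is a minimal Python illustration using the six observations from Problem 1 of the homework and the standard closed-form OLS estimates for one regressor; the variable names are illustrative. The identity holds up to floating-point rounding, and only for OLS with an intercept.

```python
# Six observations from Problem 1 of the homework.
X = [1, 4, 2, 5, 3, 4]
Y = [2, 6, 3, 8, 5, 4]
n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Closed-form OLS estimates for a single regressor.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum(
    (x - x_bar) ** 2 for x in X
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in X]

sst = sum((y - y_bar) ** 2 for y in Y)               # total variation
sse = sum((f - y_bar) ** 2 for f in fitted)          # explained variation
ssr = sum((y - f) ** 2 for y, f in zip(Y, fitted))   # residual variation

r_squared = sse / sst

print(f"SST = {sst:.4f}, SSE = {sse:.4f}, SSR = {ssr:.4f}")
print(f"SSE + SSR = {sse + ssr:.4f}, R-squared = {r_squared:.4f}")
```

The ratio SSE/SST computed at the end is the share of the total variation accounted for by the fitted line.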
Unadjusted R²
• The coefficient of determination is defined as
\[ R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}. \]
• It is the fraction of total variation in Yi (i.e., SST) that is
explained by the included regressors (i.e., SSE).
• Measures such as R² are called goodness of fit measures.
• A higher value of R² indicates that the regression fits the
data better since X can explain more of the variation in Y.
A Problem with R²
• When adding a variable, R² never decreases.
• This can happen even if the variable is unrelated.
• To see this, recall that
\[ R^2 = \frac{SSE}{SST}. \]
• SST is a function of Yi only, so it is unchanged.
• But adding an additional X variable to the regression
will (weakly) increase SSE and hence R².
• Therefore, looking at R² alone can encourage the addition
of too many explanatory variables.
Adjusted R²
• The adjusted R² is defined as
\[ \bar{R}^2 = 1 - \frac{SSR/(n - K - 1)}{SST/(n - 1)}. \]
• There is a “penalty” in the sense that increasing the
number of regressors, K, decreases R̄² unless the SSR
decreases (SSE increases) enough to compensate.
• Intuitively, the adjusted R² measures the percentage of the
variation in Y around Ȳ that is explained by the regressors,
with an adjustment for the degrees of freedom.
R̄² May Increase or Decrease
• As opposed to the standard R², R̄² may increase, decrease,
or stay the same when an additional regressor is added.
• The direction of the change will depend on whether the
fit improves enough to justify the decrease in the degrees
of freedom.
• As with R², R̄² is bounded above by 1.0.
• However, it may also be negative, while the minimum
possible R² is 0.
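The behavior of the adjusted R² can be illustrated with a few made-up numbers. Since SSR/SST = 1 − R², the definition of the adjusted R² can be rewritten as 1 − (1 − R²)(n − 1)/(n − K − 1), which is the form used in this Python sketch. The function name and all of the R², n and K values below are hypothetical, chosen only to show an increase, a decrease and a negative value.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k regressors (plus an intercept)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical baseline: n = 20 observations, K = 2 regressors, R^2 = 0.50.
base = adjusted_r_squared(0.50, n=20, k=2)

# Adding a third regressor that barely improves the fit lowers adjusted R^2.
small_gain = adjusted_r_squared(0.51, n=20, k=3)

# Adding a third regressor that improves the fit a lot raises adjusted R^2.
large_gain = adjusted_r_squared(0.60, n=20, k=3)

# With a very poor fit and few observations, adjusted R^2 can be negative.
poor_fit = adjusted_r_squared(0.05, n=10, k=1)

print(f"base = {base:.4f}")
print(f"small gain = {small_gain:.4f}, large gain = {large_gain:.4f}")
print(f"poor fit = {poor_fit:.4f}")
```

In each comparison the unadjusted R² goes up, so only the adjusted measure signals whether the extra regressor was worth the lost degree of freedom.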
Ordinary Least Squares Regression in Stata
• Ordinary least squares: To run OLS in Stata, use the
regress command followed by the dependent variable
and then the independent variables. For example:
    regress y x1 x2
• Fitted values: To generate a new variable, say yhat,
containing the fitted values, run
    predict yhat, xb
following the regress command.
• Residuals: To generate a new variable, say uhat,
containing the residuals, run
    predict uhat, resid
following the regress command.
Stata Example: Input
    use finaid
    regress finaid parent hsrank
    predict yhat, xb
    predict uhat, resid
Stata Example: Output
. use finaid
. regress finaid parent hsrank
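      Source |       SS           df       MS      Number of obs   =        50
-------------+----------------------------------   F(2, 47)        =     68.33
       Model |  1.0496e+09         2   524779682   Prob > F        =    0.0000
    Residual |   360941186        47   7679599.7   R-squared       =    0.7441
-------------+----------------------------------   Adj R-squared   =    0.7332
       Total |  1.4105e+09        49  28785725.5   Root MSE        =    2771.2

------------------------------------------------------------------------------
      finaid |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      parent |  -.3567721   .0316851   -11.26   0.000    -.4205143   -.2930299
      hsrank |   87.37815   20.67413     4.23   0.000     45.78717    128.9691
       _cons |   8926.929   1739.083     5.13   0.000     5428.346    12425.51
------------------------------------------------------------------------------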
. predict yhat, xb
. predict uhat, resid
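The goodness-of-fit figures in this output can be reproduced from the sums of squares using the slide definitions of R² and adjusted R². The Python sketch below is only a cross-check of the printed numbers. It assumes n = 50 and K = 2 as shown in the output, and it recovers SST from the Total mean square because the output prints the Total sum of squares only in rounded scientific notation.

```python
import math

n, k = 50, 2               # number of observations and of regressors
ss_resid = 360941186.0     # Residual SS (this is SSR in the slides' notation)
ms_total = 28785725.5      # Total MS, which equals SST / (n - 1)
sst = ms_total * (n - 1)   # recover SST at full precision

# Definitions from the slides.
r_squared = 1.0 - ss_resid / sst
adj_r_squared = 1.0 - (ss_resid / (n - k - 1)) / (sst / (n - 1))

# Root MSE is the square root of the residual mean square.
root_mse = math.sqrt(ss_resid / (n - k - 1))

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"Root MSE      = {root_mse:.1f}")
```

The residual degrees of freedom n − K − 1 = 47 match the df column of the Residual row in the output.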