Exam 2 is online: 10 questions, 2 hours long. Needed by Sunday evening, 4 PM.
It covers these two chapters:
- Lecture Notes & Materials From Class – Correlation & Simple Linear Regression (Chapter 14)
- Lecture Notes & Materials – Multiple Regression (Chapter 15)
BUS 305: PRACTICE PROBLEMS EXAM 2
Simple Regression Problems
1) Of the following two graphs, indicate which one has a correlation coefficient that is closer to 0.
Scatterplot A Scatterplot B
2) Which of the following describes the relationship between the variables in the graphs in #1 above?
A. positive correlation
B. negative correlation
C. perfect positive correlation
D. perfect negative correlation
E. no correlation
3) If the scatterplot below depicted a set of bivariate data with independent variable X and dependent variable Y, would a regression model be appropriate for this data? Why or why not?
4) If the scatterplot below depicted a set of bivariate data with independent variable X and dependent variable Y, would a regression model be appropriate for this data? Why or why not?
5) Which of the following would represent the regression line for this data set? Why? Explain what characteristic of the line makes it the regression line.
6) Suppose your company is interested in discovering if there is a relationship or correlation between production volume (in number of units) and costs (in $). Which would be more appropriate for this data – to run a correlation analysis or to run a regression analysis? Explain.
7) Suppose your company is interested in discovering if there is a relationship between production volume (in number of units) and costs (in $).
ProdVolume   Cost
20           137
25           122
28           123
30           120
35           106
40           109
45           97
Which of the following is the most appropriate statistical analysis to run?
A. ANOVA
B. Multiple linear regression
C. Simple linear regression
D. T-test for the mean of a single population
8) Suppose you run a regression for variables X and Y, and find that r² = 0.64, that the t-statistic for the hypothesis test H0: β1 = 0 is 1.31, and that the p-value for that test is 0.117. Then:
a) r = ______________
b) t-statistic for the hypothesis test H0: ρ = 0 equals (give a number): _____________
c) p-value for the hypothesis test H0: ρ = 0 equals (give a number): _________________
d) What do you conclude about the existence of a significant correlation between X and Y in the population? Explain.
9) Provide about one or two sentences to answer each question.
a) In a simple regression model, what is the difference between β1 and b1?
b) Why are outliers problematic in a multiple regression model?
10) Given the following data and scatterplot, determine if a simple linear regression model is appropriate for this data. If so, generate the regression output using StatCrunch or Excel. If not, explain why linear regression is not appropriate.
ProdVolume   Cost
20           137
25           122
28           123
30           120
35           106
40           109
45           97
11) When answering questions (a) and (b) below, refer to the following StatCrunch output from a regression model that asserts that the number of near misses per year (Y) of commercial airliners is a linear function of the number of flights per year (X).
(a) Test for a linear relationship between near_misses and num_flights by reading the appropriate values from the output above. Be sure to indicate a test statistic, a p-value, and a conclusion as to whether or not there is a relationship.
(b) What percentage of the variation in the number of near misses is explained by the number of flights? Do you think this is a good regression model?
(c) What is the correlation between misses and flights? Is there a strong relationship between these variables? Explain.
(d) Write the regression line and then use it to calculate the predicted number of near misses if the number of flights is 100. Does this prediction make sense? Explain. Is it wise to make predictions with this model? Why or why not? (Refer to a part of the output to back up your conclusions.)
(e) Interpret the value of b1, the sample slope. Does this value appear to make sense? Explain.
Multiple Regression Problems
12) Provide one or two sentences to answer each of these questions.
a. Briefly explain the difference between multiple and simple regression.
b. What is multicollinearity in a multiple regression model, and why is it problematic?
c. How do you incorporate qualitative/categorical variables into a regression model? Be specific about what kind of variable is added to the model and what values that variable can be.
13) Suppose you want to try to estimate the miles per gallon of various car types by using their engine size (number of cylinders), cab space, horsepower, top speed and weight.
Which of the following is the most appropriate statistical analysis to run?
A. ANOVA
B. Multiple linear regression
C. Simple linear regression
D. T-test for the mean of a single population
14) Given the following data set, generate the multiple regression output for the model that states that MPG of a car is a linear function of EngineSize, CabSpace, Horsepower, TopSpeed, and Weight. Use StatCrunch or Excel. (See Excel file, PracticeExam2data.xlsx to copy the entire data set.)
MAKE/Model           EngineSize   CabSpace   HorsePower   TopSpeed   Weight   MPG
GM/GeoMetroXF1       4            89         49           96         17.5     65.4
GM/GeoMetro          4            92         55           97         20       56
GM/GeoMetroLSI       4            92         55           97         20       55.9
BMW750IL             6            119        295          157        45       16.7
Rolls-RoyceVarious   8            107        236          130        55       13.2
15) Use the following Excel output from a multiple regression model to answer questions (a)–(e). The model asserts that the sale price of an item is a function of both the original price and the manufacturer’s suggested retail price (MSRP).
a) What do the F-statistic and its p-value tell you about the overall significance of the model in terms of the effects of Orig_Price and MSRP on the price of an item?
b) Which, if any, of the independent variables appear to affect the sale price (Y)? Indicate any numbers from the table you used to arrive at this conclusion.
c) State the regression equation and use it to predict the value of Y (sale price) corresponding to Original Price = 80 and MSRP = 100.
d) How much can you expect the sale price (Y) to increase as the MSRP increases by 1 unit? As Orig_Price increases by one unit?
e) How good/effective is this model? Are you comfortable using this regression equation to predict prices? Why or why not?
16) Consider the data in the file PracticeExam2data.xls. This data shows 82 cars and measures several characteristics of each. Use this data to develop the BEST/most efficient multiple regression model for predicting how many miles per gallon (MPG) that vehicles get (you may have to run more than one). Once you have your final model, explain why this was the best model possible using the discussion points from class.
Data excerpt for problem 14 (first eight rows):
MAKE/Model           EngineSize   CabSpace   HorsePower   TopSpeed   Weight   MPG
GM/GeoMetroXF1       4            89         49           96         17.5     65.4
GM/GeoMetro          4            92         55           97         20       56
GM/GeoMetroLSI       4            92         55           97         20       55.9
SuzukiSwift          4            92         70           105        20       49
DaihatsuCharade      4            92         53           96         20       46.5
GM/GeoSprintTurbo    4            89         70           105        20       46.2
GM/GeoSprint         4            92         55           97         20       45.4
HondaCivicCRXHF      4            50         62           98         22.5     59.2
SUMMARY OUTPUT (Excel output for problem 15)

Regression Statistics
Multiple R           0.931
R Square             0.866
Adjusted R Square    0.854
Standard Error       16.991
Observations         25

ANOVA
             df    SS          MS         F        Significance F
Regression   2     41129.41    20564.7    71.23    2.45E-10
Residual     22    6351.15     288.7
Total        24    47480.56

             Coefficients   Standard Error   t Stat    P-value     Lower 95%   Upper 95%
Intercept    -7.62          31.05            -0.245    0.808       -72.00      56.77
Orig_Price   1.01           0.10             10.087    1.031E-09   0.81        1.22
MSRP         -0.08          0.11             -0.727    0.475       -0.30       0.15
BUS 305: SOLUTIONS TO
PRACTICE PROBLEMS EXAM 2
1) B
2) B
3) No, fan pattern (heteroscedasticity)
4) No, nonlinear relationship between X and Y
5) The black line is the regression line because it gets closest to the sample points (it minimizes the error between the points and the line). The red line has a larger error; that is, a larger total distance from the points to the line.
6) It is reasonable to suppose that costs depend on production volume (units are produced, directly resulting in costs). Regression is therefore more appropriate for this data, since regression is appropriate when a cause-and-effect relationship is assumed.
7) C
8) a) r = 0.8;
b) T = 1.31;
c) p = 0.117
d) There is no evidence of a significant correlation between X and Y in the population because we did not reject the null hypothesis H0: ρ = 0.
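The arithmetic behind solution 8 can be sketched in a few lines of Python. This is only an illustration: the 0.05 significance level is an assumption (the problem does not state one), and the sign of r is taken from the sign of the slope's t-statistic, which works because r and the slope always share a sign in simple regression.

```python
import math

r_squared = 0.64   # given in problem 8
t_stat = 1.31      # given t-statistic for the slope test
p_value = 0.117    # given p-value for the slope test

# |r| is the square root of r-squared; r carries the sign of the slope,
# and the slope has the same sign as its t-statistic.
r = math.copysign(math.sqrt(r_squared), t_stat)

# In simple regression the test of zero slope and the test of zero
# correlation are equivalent, so the same t and p apply to both.
alpha = 0.05       # assumed significance level, not stated in the problem
reject_null = p_value < alpha

print(round(r, 2), reject_null)
```

Because the p-value exceeds the assumed alpha, the null hypothesis is not rejected, which matches part (d).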
9) Note: the following are not complete answers to Question 9; they are just enough for you to know whether your short answer addressed the correct things.
a) β1 = population slope, b1 = sample slope. On the exam, you would also want to address what you know (or don’t know) about each of these and how each is found.
b) An outlier can “drag” the regression line toward it. On the exam, also think about how this would affect the quality of your regression model and the predictions.
10) Yes, there appears to be a straight line relationship between the variables. Linear regression appears to be appropriate. The regression output is:
11) a) T = -0.09, p = 0.929; do not reject H0 and conclude there is no evidence of a linear relationship.
b) R² = 0.002 = 0.2%. No, because this value is very close to zero.
c) Correlation = r = -0.0421. No, there is not a strong relationship between these variables. The correlation is nearly 0.
d) Regression line is Y^ = 1.26 – 0.035X.
Y^ = 1.26 – 0.035(100) = 1.26 – 3.5 = -2.24. No, this does not make sense because you cannot have a negative number of near misses. It is not wise to predict with this model. The R-squared value is extremely low (essentially 0%), which means that there is essentially no linear relationship between near misses and flights in this data. Therefore, predicting misses from flights is meaningless.
e) b1 = -0.035. As the number of flights increases by 1, we expect the number of near misses to go down by 0.035. Or, put another way, as flights increase by 1000, we expect the number of near misses to go down by 35. No, this does not make sense. We would assume that as flights increase, so would near misses.
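As a rough check on solution 11, the prediction in (d) and the link between r and R-squared in (b) and (c) can be reproduced as below. The function name `predict_near_misses` is made up for this sketch; the coefficients and r are the rounded values quoted in the solution.

```python
def predict_near_misses(num_flights, b0=1.26, b1=-0.035):
    """Fitted line from solution 11(d): Y^ = 1.26 - 0.035 X."""
    return b0 + b1 * num_flights

# Prediction for 100 flights; a negative count is not meaningful.
y_hat = predict_near_misses(100)

# R-squared is the square of the correlation coefficient.
r = -0.0421
r_squared = r ** 2

print(round(y_hat, 2), round(r_squared, 3))
```

The squared correlation rounds to the 0.002 reported in (b), so the two answers are consistent with each other.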
12) a. Multiple regression is a direct extension of simple regression, except that now we have more than one independent (X) variable.
b. Note: the following is not a complete answer; it is just enough for you to know whether your short answer addressed the correct things: Multicollinearity is when the independent variables are highly correlated with one another. On the exam, also indicate how this affects the model, how one can identify if it is present, and what can be done to correct it.
c. Dummy variables are used to incorporate categorical variables into a regression model. A dummy variable is added that is “1” if the person/item has the characteristic and “0” if it does not.
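A minimal sketch of the 0/1 coding described in 12(c) follows. The `transmission` column and the helper `dummy_encode` are hypothetical and are not part of the exam data; they only illustrate how a category becomes a numeric column that a regression can use.

```python
def dummy_encode(values, category):
    """Return 1 where the value equals `category`, else 0."""
    return [1 if v == category else 0 for v in values]

# Hypothetical categorical column (not from the exam data).
transmission = ["manual", "automatic", "manual", "automatic"]

# One dummy column is enough for a two-level variable:
# 1 = has the characteristic ("manual"), 0 = does not.
is_manual = dummy_encode(transmission, "manual")

print(is_manual)
```

For a variable with more than two categories, the usual practice is to add one fewer dummy than there are categories, leaving one category as the baseline.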
13) B
14)
15) a) Since the p-value associated with the F-statistic is very small (note: 2.45E-10 means to move the decimal point 10 places to the LEFT, i.e. 0.000000000245), we would reject the null hypothesis that none of the independent variables (Orig_Price and MSRP) have an effect on price. Therefore, we conclude that at least one of these X variables does have an effect on, or relationship with, price.
b) Orig_Price does affect Price, since p = 1.031E-09 = 0.000000001031 < 0.01; reject H0: β1 = 0.
MSRP does NOT, since p = 0.475 > 0.10; do not reject H0: β2 = 0.
c) Regression equation: Y^ = -7.62 + 1.01X1 – 0.08X2, where X1 = Orig_Price and X2 = MSRP; prediction: -7.62 + 1.01(80) – 0.08(100) = 65.18
d) MSRP: -0.08 (sale price is expected to decrease by 0.08 for each one-unit increase in MSRP, holding Orig_Price fixed). Orig_Price: 1.01 (sale price is expected to increase by 1.01 for each one-unit increase in Orig_Price, holding MSRP fixed).
e) R-squared = 0.866. This is a good model because r-square is close to 1 (100%), thus I would feel pretty confident that my predictions would be fairly accurate in this case.
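The numbers used in solution 15 can be recomputed from the sums of squares and coefficients in the Excel output for problem 15. The sketch below is one way to do that check; variable names such as `b_orig` and `predict_sale_price` are chosen here for illustration, and the inputs are the rounded values printed in the output.

```python
# Values read from the Excel output for problem 15.
n, k = 25, 2                                   # observations, predictors
ss_regression, ss_residual = 41129.41, 6351.15
b0, b_orig, b_msrp = -7.62, 1.01, -0.08        # rounded coefficients

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# F is the regression mean square divided by the residual mean square.
f_stat = (ss_regression / k) / (ss_residual / (n - k - 1))

def predict_sale_price(orig_price, msrp):
    """Y^ = b0 + b1 * Orig_Price + b2 * MSRP."""
    return b0 + b_orig * orig_price + b_msrp * msrp

prediction = predict_sale_price(80, 100)

print(round(r_squared, 3), round(adj_r_squared, 3),
      round(f_stat, 2), round(prediction, 2))
```

These recomputed values agree with the R Square, Adjusted R Square, and F entries in the output and with the prediction in part (c).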
16) Model 1: The first model run states that MPG is a linear function of: EngineSize, CabSpace, HorsePower, TopSpeed, and Weight. When that model is run, we find:
· R-square = 0.873
· Adjusted r-square = 0.865
· Significant variables: Horsepower, TopSpeed, Weight
· Insignificant variables: EngineSize, CabSpace
Because we have two insignificant variables, take them out.
Model 2: This model states that MPG is a linear function of HorsePower, TopSpeed, and Weight. We find that:
· R-square = 0.873
· Adjusted r-square = 0.868
· Significant variables: Horsepower, TopSpeed, Weight
· Insignificant variables: none
Taking out EngineSize and CabSpace did not change the R-squared value at all, and adjusted R-squared rose slightly (from 0.865 to 0.868). Apparently, EngineSize and CabSpace did not explain any additional variation in MPG, so removing them results in a better model (simpler with no loss of explanatory power). Since all of the remaining independent variables are significant, we take this as the best model (removing a significant variable would be expected to decrease R-squared).
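The comparison between Model 1 and Model 2 rests on adjusted R-squared, which penalizes extra predictors. The sketch below applies the standard formula to the rounded R-squared values quoted in the solution, with n = 82 cars; the function name `adjusted_r_squared` is chosen here for illustration.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

n = 82  # cars in the data set

# Both models report R-squared = 0.873 (rounded) in the solution.
model_1 = adjusted_r_squared(0.873, n, k=5)  # five predictors
model_2 = adjusted_r_squared(0.873, n, k=3)  # three predictors

print(round(model_1, 3), round(model_2, 3))
```

With the same R-squared and fewer predictors, the penalty shrinks, which is why Model 2 ends up with the higher adjusted value.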
SUMMARY OUTPUT (regression output for problem 10)

Regression Statistics
Multiple R           0.9583
R Square             0.9183
Adjusted R Square    0.9020
Standard Error       4.1442
Observations         7

ANOVA
             df    SS          MS         F         Significance F
Regression   1     965.556     965.556    56.221    0.00067
Residual     5     85.872      17.174
Total        6     1051.429

             Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept    162.7007       6.3854           25.4802    0.0000    146.2865    179.1148
ProdVolume   -1.4570        0.1943           -7.4980    0.0007    -1.9565     -0.9575
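The regression output for problem 10 can be cross-checked by computing the least-squares quantities directly from the seven ProdVolume/Cost pairs. The following is a sketch using only the Python standard library, not a substitute for the StatCrunch or Excel output the problem asks for.

```python
import math

prod_volume = [20, 25, 28, 30, 35, 40, 45]
cost = [137, 122, 123, 120, 106, 109, 97]

n = len(prod_volume)
x_bar = sum(prod_volume) / n
y_bar = sum(cost) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in prod_volume)
s_yy = sum((y - y_bar) ** 2 for y in cost)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(prod_volume, cost))

b1 = s_xy / s_xx                    # sample slope
b0 = y_bar - b1 * x_bar             # sample intercept
r = s_xy / math.sqrt(s_xx * s_yy)   # correlation (negative here)
r_squared = r ** 2

sse = s_yy - b1 * s_xy              # residual sum of squares
std_error = math.sqrt(sse / (n - 2))
se_b1 = std_error / math.sqrt(s_xx)
t_b1 = b1 / se_b1                   # t-statistic for the slope

print(round(b0, 4), round(b1, 4), round(r_squared, 4),
      round(std_error, 4), round(t_b1, 3))
```

Note that Excel's "Multiple R" is reported as a positive number, while the correlation computed here is negative because cost falls as production volume rises.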
SUMMARY OUTPUT (regression output for problem 14; Model 1 in problem 16)

Regression Statistics
Multiple R           0.9346
R Square             0.8734
Adjusted R Square    0.8651
Standard Error       3.6750
Observations         82

ANOVA
             df    SS             MS          F          Significance F
Regression   5     7081.047344    1416.209    104.862    1.19E-32
Residual     76    1026.415217    13.50546
Total        81    8107.462561

             Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept    192.812        23.716           8.130     0.000     145.578     240.047
EngineSize   -0.104         0.387            -0.267    0.790     -0.875      0.668
CabSpace     -0.015         0.023            -0.668    0.506     -0.061      0.030
HorsePower   0.393          0.082            4.796     0.000     0.230       0.556
TopSpeed     -1.298         0.246            -5.265    0.000     -1.789      -0.807
Weight       -1.847         0.220            -8.402    0.000     -2.285      -1.409
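To see how the five-variable output is used for prediction, the sketch below plugs one row of the problem 14 data (GM/GeoMetro, observed MPG 56) into the fitted equation. The coefficients are the three-decimal values from the output, so the result is approximate; `predict_mpg` is a name made up for this illustration.

```python
# Coefficients read from the five-variable regression output (82 cars).
coefficients = {
    "Intercept": 192.812,
    "EngineSize": -0.104,
    "CabSpace": -0.015,
    "HorsePower": 0.393,
    "TopSpeed": -1.298,
    "Weight": -1.847,
}

def predict_mpg(car):
    """Intercept plus the sum of coefficient * value for each predictor."""
    return coefficients["Intercept"] + sum(
        coefficients[name] * value for name, value in car.items()
    )

# GM/GeoMetro row from the problem 14 data table.
geo_metro = {"EngineSize": 4, "CabSpace": 92, "HorsePower": 55,
             "TopSpeed": 97, "Weight": 20}
observed_mpg = 56

predicted = predict_mpg(geo_metro)
residual = observed_mpg - predicted   # observed minus predicted

print(round(predicted, 3), round(residual, 3))
```

A residual of a few MPG for a single car is not unusual given that the output reports a standard error of about 3.7.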