statistics 2

Exam 2 is online: 10 questions, 2 hours long. Needed by Sunday evening, 4 PM.

The exam covers these two chapters:

Lecture Notes & Materials From Class – Correlation & Simple Linear Regression (Chapter 14)
Lecture Notes & Materials – Multiple Regression (Chapter 15)

BUS 305: PRACTICE PROBLEMS EXAM 2

Simple Regression Problems

1) Of the following two graphs, indicate which one has a correlation coefficient that is closer to 0.

Scatterplot A Scatterplot B
2) Which of the following describes the relationship between the variables in the graphs in #1 above?
A. positive correlation
B. negative correlation
C. perfect positive correlation
D. perfect negative correlation
E. no correlation
3) If the scatterplot below depicted a set of bivariate data with independent variable X and dependent variable Y, would a regression model be appropriate for this data? Why or why not?

4) If the scatterplot below depicted a set of bivariate data with independent variable X and dependent variable Y, would a regression model be appropriate for this data? Why or why not?
5) Which of the following would represent the regression line for this data set? Why? Explain what characteristic of the line makes it the regression line.

6) Suppose your company is interested in discovering if there is a relationship or correlation between production volume (in number of units) and costs (in $). Which would be more appropriate for this data – to run a correlation analysis or to run a regression analysis? Explain.
7) Suppose your company is interested in discovering if there is a relationship between production volume (in number of units) and costs (in $).

ProdVolume    Cost
20            137
25            122
28            123
30            120
35            106
40            109
45            97

Which of the following is the most appropriate statistical analysis to run?
A. ANOVA
B. Multiple linear regression
C. Simple linear regression
D. T-test for the mean of a single population
8) Suppose you run a regression for variables X and Y, and find that r² = 0.64, that the t-statistic for the hypothesis test H0: β1 = 0 is 1.31, and that the p-value for that test is 0.117. Then:
a) r = ___________
b) t-statistic for the hypothesis test H0: ρ = 0 equals (give a number): ___________
c) p-value for the hypothesis test H0: ρ = 0 equals (give a number): ___________
d) What do you conclude about the existence of a significant correlation between X and Y in the population? Explain.

9) Provide about one or two sentences to answer each question.
a) In a simple regression model, what is the difference between β1 and b1?
b) Why are outliers problematic in a multiple regression model?
10) Given the following data and scatterplot, determine if a simple linear regression model is appropriate for this data. If so, generate the regression output using StatCrunch or Excel. If not, explain why linear regression is not appropriate.

ProdVolume    Cost
20            137
25            122
28            123
30            120
35            106
40            109
45            97

11) When answering questions (a) and (b) below, refer to the following StatCrunch output from a regression model that asserts that the number of near misses per year (Y) of commercial airliners is a linear function of the number of flights per year (X).

(a) Test for a linear relationship between near_misses and num_flights by reading the appropriate values from the output above. Be sure to indicate a test statistic, a p-value, and a conclusion as to whether or not there is a relationship.
(b) What percentage of the variation in the number of near misses is explained by the number of flights? Do you think this is a good regression model?
(c) What is the correlation between misses and flights? Is there a strong relationship between these variables? Explain.
(d) Write the regression line and then use it to calculate the predicted number of near misses if the number of flights is 100. Does this prediction make sense? Explain. Is it wise to make predictions with this model? Why or why not? (Refer to a part of the output to back up your conclusions.)
(e) Interpret the value of b1, the sample slope. Does this value appear to make sense? Explain.

Multiple Regression Problems

12) Provide one or two sentences to answer each of these questions.
a. Briefly explain the difference between multiple and simple regression.
b. What is multicollinearity in a multiple regression model, and why is it problematic?
c. How do you incorporate qualitative/categorical variables into a regression model? Be specific about what kind of variable is added to the model and what values that variable can be.
13) Suppose you want to try to estimate the miles per gallon of various car types by using their engine size (number of cylinders), cab space, horsepower, top speed and weight.

Which of the following is the most appropriate statistical analysis to run?
A. ANOVA
B. Multiple linear regression
C. Simple linear regression
D. T-test for the mean of a single population
14) Given the following data set, generate the multiple regression output for the model that states that MPG of a car is a linear function of EngineSize, CabSpace, Horsepower, TopSpeed, and Weight. Use StatCrunch or Excel. (See Excel file, PracticeExam2data.xlsx to copy the entire data set.)

MAKE/Model            EngineSize   CabSpace   HorsePower   TopSpeed   Weight   MPG
GM/GeoMetroXF1        4            89         49           96         17.5     65.4
GM/GeoMetro           4            92         55           97         20       56
GM/GeoMetroLSI        4            92         55           97         20       55.9
BMW750IL              6            119        295          157        45       16.7
Rolls-RoyceVarious    8            107        236          130        55       13.2

15) Use the following Excel output from a multiple regression model to answer questions (a) – (e). The model asserts that the sale price of an item is a function of both the original price and the manufacturer’s suggested retail price (MSRP).
a) What do the F-statistic and its p-value tell you about the overall significance of the model in terms of the effects of Orig_Price and MSRP on the price of an item?
b) Which, if any, of the independent variables appear to affect the sale price (Y)? Indicate any numbers from the table you used to arrive at this conclusion.
c) State the regression equation and use it to predict the value of Y (sale price) corresponding to Original Price = 80 and MSRP = 100.
d) How much can you expect the sale price (Y) to increase as the MSRP increases by 1 unit? As Orig_Price increases by one unit?
e) How good/effective is this model? Are you comfortable using this regression equation to predict prices? Why or why not?
16) Consider the data in the file PracticeExam2data.xls. This data shows 82 cars and measures several characteristics of each. Use this data to develop the BEST/most efficient multiple regression model for predicting how many miles per gallon (MPG) that vehicles get (you may have to run more than one). Once you have your final model, explain why this was the best model possible using the discussion points from class.
MAKE/Model            EngineSize   CabSpace   HorsePower   TopSpeed   Weight   MPG
GM/GeoMetroXF1        4            89         49           96         17.5     65.4
GM/GeoMetro           4            92         55           97         20       56
GM/GeoMetroLSI        4            92         55           97         20       55.9
SuzukiSwift           4            92         70           105        20       49
DaihatsuCharade       4            92         53           96         20       46.5
GM/GeoSprintTurbo     4            89         70           105        20       46.2
GM/GeoSprint          4            92         55           97         20       45.4
HondaCivicCRXHF       4            50         62           98         22.5     59.2

SUMMARY OUTPUT (Problem 15)

Regression Statistics
Multiple R           0.931
R Square             0.866
Adjusted R Square    0.854
Standard Error       16.991
Observations         25

ANOVA
              df    SS          MS         F        Significance F
Regression    2     41129.41    20564.7    71.23    2.45E-10
Residual      22    6351.15     288.7
Total         24    47480.56

              Coefficients   Standard Error   t Stat    P-value      Lower 95%   Upper 95%
Intercept     -7.62          31.05            -0.245    0.808        -72.00      56.77
Orig_Price    1.01           0.10             10.087    1.031E-09    0.81        1.22
MSRP          -0.08          0.11             -0.727    0.475        -0.30       0.15

BUS 305: SOLUTIONS TO PRACTICE PROBLEMS EXAM 2

1) B
2) B
3) No, fan pattern (heteroscedasticity)
4) No, nonlinear relationship between X and Y
5) The black line is the regression line because it gets closest to the sample points: it minimizes the sum of squared vertical distances (errors) between the points and the line. The red line has a larger total error.
6) Regression is more appropriate for this data. It is reasonable to suppose that costs depend on production volume (producing units directly results in costs), and regression is the appropriate tool when a cause-and-effect relationship is assumed.
7) C
8) a) r = √0.64 = 0.8 (positive, because the t-statistic for the slope is positive);
b) T = 1.31 (in simple regression, the test of H0: ρ = 0 is equivalent to the test of H0: β1 = 0);
c) p = 0.117
d) There is no evidence of a significant correlation between X and Y in the population because we did not reject the null hypothesis H0: ρ = 0.
9) Note: the following are not complete answers to Question 9; they are just enough for you to know whether your short answer addressed the correct things.
a) β1 = population slope, b1 = sample slope. On the exam, you would also want to address what you know (or don’t know) about each of these and how each is found.
b) An outlier can “drag” the regression line toward it. On the exam, also think about how this would affect the quality of your regression model and the predictions.

10) Yes, there appears to be a straight line relationship between the variables. Linear regression appears to be appropriate. The regression output is:

11) a) T = -0.09, p = 0.929. Do not reject H0; conclude there is no evidence of a relationship.
b) R² = 0.002 = 0.2%. No, because the value is very close to zero.
c) Correlation = r = -0.0421. No, there is not a strong relationship between these variables. The correlation is nearly 0.
d) Regression line is Y^ = 1.26 – 0.035X.
Y^ = 1.26 – 0.035(100) = 1.26 – 3.5 = -2.24. No, this does not make sense because you cannot have a negative number of near misses. It is not wise to predict with this model. The R-squared value is extremely low (essentially 0%), which means that there is essentially no linear relationship between near misses and flights in this data. Therefore, predicting misses from flights is meaningless.
e) b1 = -0.035. As the number of flights increases by 1, we expect the number of near misses to go down by 0.035. Or, put another way, as the number of flights increases by 1000, we expect the number of near misses to go down by 35. No, this does not make sense. We would assume that as flights increase, so would near misses.
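The arithmetic in 11(b) through 11(e) can be checked with a few lines of Python. This is only a sketch that plugs in the coefficients and correlation quoted in the solution above; the function and variable names are made up for illustration and do not come from StatCrunch.

```python
# Values quoted in the solution to problem 11 (read from the StatCrunch output).
B0 = 1.26     # sample intercept
B1 = -0.035   # sample slope
R = -0.0421   # sample correlation


def predict_near_misses(num_flights):
    """Predicted near misses from the fitted line Y^ = B0 + B1 * X."""
    return B0 + B1 * num_flights


y_hat = predict_near_misses(100)
change_per_1000_flights = B1 * 1000
r_squared = R ** 2

print(f"Predicted near misses at X = 100: {y_hat:.2f}")            # -2.24
print(f"Change per 1000 extra flights:    {change_per_1000_flights:.0f}")  # -35
print(f"r squared:                        {r_squared:.4f}")        # 0.0018
```

Squaring r gives about 0.0018, which rounds to the R² of 0.002 reported in part (b).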
12) a. Multiple regression is a direct extension of simple regression, except that now we have more than one independent (X) variable.
b. Note: the following is not a complete answer; it is just enough for you to know whether your short answer addressed the correct things: Multicollinearity is when the independent variables are highly correlated with one another. On the exam, also indicate how this affects the model, how one can identify if it is present, and what can be done to correct it.
c. Dummy variables are used to incorporate categorical variables into a regression model. A dummy variable is added that is “1” if the person/item has the characteristic and “0” if it does not.
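As a small illustration of 12(c), the sketch below builds 0/1 dummy variables from a categorical column. The column values and the helper names `make_dummy` and `make_dummies` are hypothetical and not part of the practice exam data. For a variable with more than two categories, one category is left out as the baseline, so k categories need k - 1 dummy variables.

```python
def make_dummy(values, category):
    """Return 1 where the item has the given category, 0 otherwise."""
    return [1 if v == category else 0 for v in values]


def make_dummies(values, baseline):
    """One dummy column per category except the baseline (k - 1 columns)."""
    levels = sorted(set(values) - {baseline})
    return {level: make_dummy(values, level) for level in levels}


# Hypothetical two-category variable.
transmission = ["manual", "automatic", "manual", "automatic", "automatic"]
is_manual = make_dummy(transmission, "manual")
print(is_manual)  # [1, 0, 1, 0, 0]

# Hypothetical three-category variable with "east" as the baseline.
region = ["east", "west", "south", "east"]
region_dummies = make_dummies(region, baseline="east")
print(region_dummies)  # {'south': [0, 0, 1, 0], 'west': [0, 1, 0, 0]}
```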
13) B
14)

15) a) Since the p-value associated with the F-statistic is very small (note: 2.45E-10 means to move the decimal point 10 places to the LEFT, i.e. 0.000000000245), we reject the null hypothesis that none of the independent variables (Orig_Price and MSRP) has an effect on price. Therefore, we conclude that at least one of these X variables has an effect on, or relationship with, price.
b) Orig_Price does affect Price: p = 1.031E-09 = 0.000000001031 < 0.01, so reject H0: β1 = 0. MSRP does NOT: p = 0.475 > 0.10, so do not reject H0: β2 = 0.
c) Regression equation: Y^ = -7.62 + 1.01X1 – 0.08X2, where X1 = Orig_Price and X2 = MSRP; prediction: Y^ = -7.62 + 1.01(80) – 0.08(100) = 65.18
d) Holding the other variable constant, the sale price is expected to change by -0.08 (a decrease of 0.08) for each 1-unit increase in MSRP, and to increase by 1.01 for each 1-unit increase in Orig_Price.
e) R-squared = 0.866. This is a good model because R-squared is close to 1 (100%); thus I would feel fairly confident that my predictions would be reasonably accurate in this case.
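The numbers used in problem 15 are tied together by simple formulas: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, and R-squared is the regression sum of squares over the total. The sketch below recomputes them from the values in the Excel output and reproduces the prediction in part (c). Variable and function names are chosen here for illustration only.

```python
# Sums of squares and degrees of freedom from the ANOVA table in problem 15.
ss_regression, df_regression = 41129.41, 2
ss_residual, df_residual = 6351.15, 22
ss_total = 47480.56

# Mean squares, F-statistic, and R-squared.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual
r_squared = ss_regression / ss_total

# Coefficients from the same output.
b0, b_orig_price, b_msrp = -7.62, 1.01, -0.08


def predict_sale_price(orig_price, msrp):
    """Predicted sale price from the fitted equation in 15(c)."""
    return b0 + b_orig_price * orig_price + b_msrp * msrp


prediction = predict_sale_price(orig_price=80, msrp=100)

print(f"F         = {f_statistic:.2f}")   # 71.23
print(f"R-squared = {r_squared:.3f}")     # 0.866
print(f"Y^(80, 100) = {prediction:.2f}")  # 65.18
```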
16) Model 1: The first model run states that MPG is a linear function of: EngineSize, CabSpace, HorsePower, TopSpeed, and Weight. When that model is run, we find:
· R-square = 0.873
· Adjusted r-square = 0.865
· Significant variables: Horsepower, TopSpeed, Weight
· Insignificant variables: EngineSize, CabSpace
Because we have two insignificant variables, take them out.
Model 2: This model states that MPG is a linear function of HorsePower, TopSpeed, and Weight. We find that:
· R-square = 0.873
· Adjusted r-square = 0.868
· Significant variables: Horsepower, TopSpeed, Weight
· Insignificant variables: none
Taking out EngineSize and CabSpace did not change the R-squared value at all, and adjusted R-squared rose slightly (from 0.865 to 0.868). Apparently, EngineSize and CabSpace did not explain any additional variation in MPG, so removing them results in a better model (simpler with no loss of explanatory power). Since all of the remaining independent variables are significant, we take this as the best model (removing any more would decrease R-squared).
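The adjusted R-squared values for the two models can be reproduced from R-squared, the sample size n = 82, and the number of independent variables k, using the standard formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below assumes the R-squared values as reported (0.8734 in the Model 1 output, 0.873 as stated for Model 2); the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Model 1: EngineSize, CabSpace, HorsePower, TopSpeed, Weight (k = 5).
model_1 = adjusted_r_squared(0.8734, n=82, k=5)

# Model 2: HorsePower, TopSpeed, Weight (k = 3).
model_2 = adjusted_r_squared(0.873, n=82, k=3)

print(f"Model 1 adjusted R-squared: {model_1:.3f}")  # 0.865
print(f"Model 2 adjusted R-squared: {model_2:.3f}")  # 0.868
```

Because the penalty term (n - 1)/(n - k - 1) shrinks when k drops from 5 to 3 while R-squared stays essentially the same, the smaller model ends up with the higher adjusted R-squared.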
SUMMARY OUTPUT (Problem 10: Cost regressed on ProdVolume)

Regression Statistics
Multiple R           0.9583
R Square             0.9183
Adjusted R Square    0.9020
Standard Error       4.1442
Observations         7

ANOVA
              df    SS          MS         F         Significance F
Regression    1     965.556     965.556    56.221    0.00067
Residual      5     85.872      17.174
Total         6     1051.429

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     162.7007       6.3854           25.4802    0.0000    146.2865    179.1148
ProdVolume    -1.4570        0.1943           -7.4980    0.0007    -1.9565     -0.9575
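
For readers without StatCrunch or Excel at hand, the slope, intercept, and R-squared for the ProdVolume/Cost data can be recomputed with the least-squares formulas b1 = Sxy/Sxx and b0 = ȳ - b1·x̄. This is a minimal pure-Python sketch, not the output of either package; the function name is illustrative.

```python
import math

# Data from problems 7 and 10: production volume (units) and cost ($).
prod_volume = [20, 25, 28, 30, 35, 40, 45]
cost = [137, 122, 123, 120, 106, 109, 97]


def simple_linear_regression(x, y):
    """Return least-squares slope b1, intercept b0, correlation r, and R-squared."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    r = s_xy / math.sqrt(s_xx * s_yy)
    return b1, b0, r, r ** 2


b1, b0, r, r_squared = simple_linear_regression(prod_volume, cost)
print(f"slope b1     = {b1:.4f}")         # -1.4570
print(f"intercept b0 = {b0:.4f}")         # 162.7007
print(f"r            = {r:.4f}")          # -0.9583
print(f"R-squared    = {r_squared:.4f}")  # 0.9183
```

Note that r is negative here because the slope is negative, whereas the "Multiple R" line in Excel's output reports the value without its sign.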
SUMMARY OUTPUT (Problem 14; Model 1 in Problem 16)

Regression Statistics
Multiple R           0.9346
R Square             0.8734
Adjusted R Square    0.8651
Standard Error       3.6750
Observations         82

ANOVA
              df    SS             MS          F          Significance F
Regression    5     7081.047344    1416.209    104.862    1.19E-32
Residual      76    1026.415217    13.50546
Total         81    8107.462561

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     192.812        23.716           8.130     0.000     145.578     240.047
EngineSize    -0.104         0.387            -0.267    0.790     -0.875      0.668
CabSpace      -0.015         0.023            -0.668    0.506     -0.061      0.030
HorsePower    0.393          0.082            4.796     0.000     0.230       0.556
TopSpeed      -1.298         0.246            -5.265    0.000     -1.789      -0.807
Weight        -1.847         0.220            -8.402    0.000     -2.285      -1.409


