hiii
Sales | Age | Growth | Income | HS | College
1695712.620 | 33.1574 | 0.8299 | 26748.51 | 73.5949 | 17.8350
3403862.053 | 32.6667 | 0.6619 | 53063.79 | 88.4557 | 31.9439
2710352.905 | 35.6553 | 0.9688 | 36090.14 | 73.5362 | 18.6198
529215.459 | 33.0728 | 0.0821 | 32058.07 | 79.1780 | 20.6284
663686.654 | 35.7585 | 0.4646 | 47843.42 | 84.1838 | 35.2032
2546324.335 | 33.8132 | 2.1796 | 50180.97 | 93.4996 | 41.7057
2787046.202 | 30.9797 | 1.8048 | 30710.08 | 78.0234 | 28.0250
612696.054 | 30.7843 | -0.0569 | 29141.70 | 70.2949 | 15.0882
891822.033 | 32.3164 | -0.1577 | 25980.15 | 70.6674 | 10.9829
1124967.965 | 32.5312 | 0.3664 | 18730.88 | 63.7395 | 13.2458
909500.976 | 31.4400 | 2.2256 | 31109.23 | 76.9059 | 19.5500
2631166.881 | 33.1613 | 1.5158 | 35614.12 | 82.9452 | 20.8135
882972.654 | 31.8736 | 0.1413 | 23038.43 | 65.2127 | 16.9796
1078573.124 | 33.4072 | -1.0400 | 34531.72 | 73.4944 | 32.9920
844320.194 | 34.0470 | 1.6836 | 30350.36 | 80.2201 | 22.3185
1849119.029 | 28.8879 | 2.3596 | 38964.94 | 87.5973 | 24.5670
3860007.316 | 36.1056 | 0.7840 | 49392.77 | 85.3041 | 30.8790
826573.880 | 32.8083 | 0.1164 | 25595.69 | 65.5884 | 17.4545
604682.868 | 33.0538 | 1.1498 | 29622.61 | 80.6176 | 18.6356
1903611.600 | 33.4996 | 0.0606 | 31586.10 | 80.3790 | 38.3249
2356808.391 | 32.6809 | 1.6338 | 39674.56 | 79.8526 | 23.7780
2788571.957 | 28.5166 | 1.1256 | 28878.98 | 81.2371 | 16.9300
634878.286 | 32.8945 | 1.4884 | 24287.08 | 70.2244 | 19.1429
2371627.369 | 30.5024 | 4.7937 | 46711.24 | 87.1046 | 30.8843
2627837.961 | 30.2922 | 1.8922 | 33449.81 | 80.2057 | 26.5570
1868116.330 | 31.2911 | 1.8667 | 31694.45 | 75.2914 | 28.3600
2236796.862 | 33.0498 | 1.7896 | 25459.22 | 77.6162 | 19.2490
1318876.234 | 32.9348 | 0.2707 | 47047.34 | 85.1753 | 35.4994
1868097.836 | 31.8381 | 3.0129 | 26433.24 | 74.1792 | 18.6375
1695218.566 | 31.0794 | 23.4630 | 33396.66 | 81.6991 | 41.1130
2700194.415 | 32.1807 | 0.7041 | 26179.36 | 73.4140 | 17.8566
1156049.774 | 31.6944 | -0.1569 | 33454.64 | 73.7161 | 26.5426
643858.444 | 34.0263 | 0.7084 | 42271.50 | 78.6493 | 29.8734
2188687.363 | 34.7315 | 0.1353 | 46514.75 | 80.9503 | 24.5374
830351.940 | 30.5613 | 0.3848 | 27030.81 | 66.8057 | 14.1390
1226905.572 | 33.5183 | 0.7417 | 42910.08 | 77.8905 | 20.8340
566903.589 | 32.3952 | 0.6693 | 40561.40 | 79.3622 | 19.0309
826518.398 | 29.9108 | 0.1111 | 22325.96 | 58.3610 | 10.6729
1) Can demographic information be helpful in
predicting sales at sporting goods stores? The file
contains the monthly sales totals from a random sample of
38 stores in a large chain of nationwide sporting goods
stores. All stores in the franchise, and thus within the
sample, are approximately the same size and carry the same
merchandise. The county or, in some cases, counties in
which the store draws the majority of its customers is
referred to here as the customer base. For each of the 38
stores, demographic information about the customer base is
provided. The data are real, but the name of the franchise is
not used, at the request of the company. The data set
contains the following variables:
Sales—Latest one-month sales total
(dollars)
Age—Median age of customer base (years)
HS—Percentage of customer base with a high school
diploma
College—Percentage of customer base with a college
diploma
Growth—Annual population growth rate of customer
base over the past 10 years
Income—Median family income of customer base
(dollars)
a. Construct a scatter plot, using sales as the dependent
variable and median family income as the independent
variable. Discuss the scatter plot.
b. Assuming a linear relationship, use the least-squares
method to compute the regression coefficients b0 and b1.
c. Interpret the meaning of the Y intercept, b0, and the
slope, b1, in this problem.
d. Compute the coefficient of determination, r^2, and interpret
its meaning.
e. Perform a residual analysis on your results and determine
the adequacy of the fit of the model.
f. At the 0.05 level of significance, is there evidence of a
linear relationship between the independent variable and
the dependent variable?
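Parts (b) through (f) can be sketched in plain Python as below. This is only a minimal illustration, not the required Excel or StatCrunch workflow: the function name `simple_linear_regression` is made up for this sketch, and the sample uses just the first five stores from the table, so the printed numbers are not the answer for the full 38-store data set. For part (f), the t statistic for the slope is compared with the two-tailed critical value for n - 2 degrees of freedom (about 2.03 from a t table when n = 38).

```python
import math

def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1*x. Returns (b0, b1, r2, t_slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)       # total sum of squares
    b1 = sxy / sxx                                 # slope
    b0 = y_bar - b1 * x_bar                        # Y intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / sst                             # coefficient of determination
    s_yx = math.sqrt(sse / (n - 2))                # standard error of the estimate
    t_slope = b1 / (s_yx / math.sqrt(sxx))         # t statistic for H0: slope = 0
    return b0, b1, r2, t_slope

# First five stores only (Income, Sales), copied from the table for illustration.
income = [26748.51, 53063.79, 36090.14, 32058.07, 47843.42]
sales = [1695712.620, 3403862.053, 2710352.905, 529215.459, 663686.654]

b0, b1, r2, t_slope = simple_linear_regression(income, sales)
print(f"b0 = {b0:.2f}, b1 = {b1:.4f}, r^2 = {r2:.4f}, t = {t_slope:.4f}")
```

Replacing `income` with the Age, HS, or College column gives the same calculation for the other predictors.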
2) For the data of Problem 13.85 (Problem 1 above), repeat (a) through (f), using Age as the independent variable.
3) For the data of Problem 13.85 (Problem 1 above), repeat (a) through (f), using HS as the independent variable.
4) For the data of Problem 13.85 (Problem 1 above), repeat (a) through (f), using College as the independent variable.
Use Excel, StatCrunch, or any other statistical package to solve the following 4 problems.
1) Develop a model to predict the assessed value (in
thousands of dollars), using the size of the houses (in thousands
of square feet) and the age of the houses (in years) from the following table:
Assessed Value | Heating Area | Age
184.4 | 2.00 | 3.42
177.4 | 1.71 | 11.50
175.7 | 1.45 | 8.33
185.9 | 1.76 | 0.00
179.1 | 1.93 | 7.42
170.4 | 1.20 | 32.00
175.8 | 1.55 | 16.00
178.5 | 1.59 | 1.75
179.2 | 1.50 | 2.75
186.7 | 1.90 |
179.3 | 1.39 |
174.5 | 1.54 | 12.58
183.8 | 1.89 |
176.8 |  | 7.17
a. State the multiple regression equation.
b. Interpret the meaning of the slopes in this equation.
c. Predict the assessed value for a house that has a size of
1,750 square feet and is 10 years old.
d. Perform a residual analysis on the results and determine
whether the regression assumptions are valid.
e. Determine whether there is a significant relationship between
assessed value and the two independent variables
(size and age) at the 0.05 level of significance.
f. Determine the p-value in (e) and interpret its meaning.
g. Interpret the meaning of the coefficient of multiple determination
in this problem.
h. Determine the adjusted r^2.
i. At the 0.05 level of significance, determine whether
each independent variable makes a significant contribution
to the regression model. Indicate the most appropriate
regression model for this set of data.
j. Determine the p-values in (i) and interpret their meaning.
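Parts (a) and (c) can be sketched by solving the normal equations (X'X)b = X'y directly. This is a rough illustration under stated assumptions: the helper names `fit_multiple_regression` and `predict` are invented here, and only the ten houses whose three values are all legible in the table are used, so the fitted numbers will differ from a fit on the complete data file. Because size is recorded in thousands of square feet, a 1,750 square foot house enters the model as 1.75.

```python
def fit_multiple_regression(rows, y):
    """Solve the normal equations (X'X) b = X'y. Returns [b0, b1, ..., bk]."""
    X = [[1.0, *r] for r in rows]                  # add the intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(xr[i] * xr[j] for xr in X) for j in range(p)]
         + [sum(xr[i] * yi for xr, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(b, row):
    """Predicted Y for one observation given coefficients b."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], row))

# (size in thousands of sq ft, age in years) for the rows with no missing cells
houses = [(2.00, 3.42), (1.71, 11.50), (1.45, 8.33), (1.76, 0.00), (1.93, 7.42),
          (1.20, 32.00), (1.55, 16.00), (1.59, 1.75), (1.50, 2.75), (1.54, 12.58)]
value = [184.4, 177.4, 175.7, 185.9, 179.1, 170.4, 175.8, 178.5, 179.2, 174.5]

b = fit_multiple_regression(houses, value)
print("b0, b1 (size), b2 (age):", [round(v, 4) for v in b])
print("Predicted value for 1,750 sq ft, 10 years:", round(predict(b, (1.75, 10)), 2))
```

The same function accepts any number of predictors, so it also applies to the later problems with more independent variables.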
2) (Problem 14.73) Crazy Dave, a well-known baseball analyst, wants
to determine which variables are important in predicting a
team's wins in a given season. He has collected data related
to wins, earned run average (ERA), and runs scored for the
2009 season. Develop a model to predict
the number of wins based on ERA and runs scored.
Team | Wins | E.R.A. | Runs Scored
Baltimore | 64 | 5.15 | 741
Boston | 95 | 4.35 | 872
Chicago White Sox | 79 | 4.14 | 724
Cleveland | 65 | 5.06 | 773
Detroit | 86 | 4.29 | 743
Kansas City |  | 4.83 | 686
Los Angeles Angels | 97 | 4.45 | 883
Minnesota |  | 4.50 | 817
New York Yankees | 103 | 4.26 | 915
Oakland | 75 |  | 759
Seattle | 85 | 3.87 | 640
Tampa Bay | 84 | 4.33 | 803
Texas | 87 | 4.38 | 764
Toronto |  | 4.47 | 798
Arizona | 70 | 4.42 | 720
Atlanta |  | 3.57 | 735
Chicago Cubs | 83 | 3.84 | 707
Cincinnati | 78 | 4.18 | 673
Colorado | 92 | 4.22 | 804
Florida |  |  | 772
Houston | 74 | 4.54 | 643
Los Angeles Dodgers |  | 3.41 | 780
Milwaukee | 80 |  | 785
New York Mets |  |  | 671
Philadelphia | 93 | 4.16 | 820
Pittsburgh | 62 | 4.59 | 636
St. Louis | 91 | 3.66 | 730
San Diego |  | 4.37 | 638
San Francisco | 88 | 3.55 | 657
Washington | 59 | 5.00 | 710
a. State the multiple regression equation.
b. Interpret the meaning of the slopes in this equation.
c. Predict the number of wins for a team that has an ERA
of 4.50 and has scored 750 runs.
d. Perform a residual analysis on the results and determine
whether the regression assumptions are valid.
e. Is there a significant relationship between number of
wins and the two independent variables (ERA and runs
scored) at the 0.05
level of significance?
f. Determine the p-value in (e) and interpret its meaning.
g. Interpret the meaning of the coefficient of multiple determination
in this problem.
h. Determine the adjusted r^2.
i. At the 0.05 level of significance, determine whether
each independent variable makes a significant contribution
to the regression model. Indicate the most appropriate
regression model for this set of data.
j. Determine the p-values in (i) and interpret their meaning.
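For parts (e), (g), and (h), the overall F statistic, r^2, and adjusted r^2 all follow from the sums of squares once fitted values are available. The sketch below assumes you already have predicted values from your software; the function name `fit_summary` and the five-point toy data are made up for illustration and are not the baseball data. The F statistic is compared with the critical value for k and n - k - 1 degrees of freedom (about 3.35 from an F table for k = 2 and n = 30 at the 0.05 level).

```python
def fit_summary(y, y_hat, k):
    """Return (r2, adjusted r2, F) for a model with k independent variables."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ssr = sst - sse                                        # explained variation
    r2 = ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_stat = (ssr / k) / (sse / (n - k - 1))
    return r2, adj_r2, f_stat

# Toy observed and fitted values (hypothetical, one predictor, so k = 1)
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

r2, adj_r2, f_stat = fit_summary(y, y_hat, k=1)
print(f"r^2 = {r2:.4f}, adjusted r^2 = {adj_r2:.4f}, F = {f_stat:.4f}")
```

Adjusted r^2 penalizes each added predictor, which is why it is the better figure for comparing models with different numbers of independent variables.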
3) Referring to Problem 2, suppose that in addition to
using ERA to predict the number of wins, Crazy Dave wants to
include the league (American, National) as an independent
variable. Develop a model to predict wins based on ERA
and league. For (a) through (f), do not include an interaction term.
a. State the multiple regression equation.
b. Interpret the slopes in (a).
c. Predict the number of wins for a team with an ERA of
4.50 in the American League. Construct a 95%
confidence interval estimate for all teams and a 95%
prediction interval for an individual team.
d. Perform a residual analysis on the results and determine
whether the regression assumptions are valid.
e. Is there a significant relationship between wins and the
two independent variables (ERA and league) at the 0.05
level of significance?
f. At the 0.05 level of significance, determine whether each
independent variable makes a contribution to the regression
model. Indicate the most appropriate regression
model for this set of data.
League |
0 |
1 |
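A dummy variable without an interaction term shifts the intercept while keeping the ERA slope the same for both leagues. The sketch below shows that structure only. Two things in it are assumptions, not facts from the problem: the coding American = 0 and National = 1 (the table does not say which league is which), and the coefficient values, which are placeholders rather than fitted estimates.

```python
# Assumed coding; swap the values if your data file codes the leagues the other way.
LEAGUE_CODE = {"American": 0, "National": 1}

def predict_wins(b0, b1, b2, era, league):
    """Wins = b0 + b1*ERA + b2*League, with League as a 0/1 dummy variable."""
    return b0 + b1 * era + b2 * LEAGUE_CODE[league]

# Placeholder coefficients for illustration only (NOT estimated from the data).
b0, b1, b2 = 150.0, -16.0, 2.0

american = predict_wins(b0, b1, b2, 4.50, "American")
national = predict_wins(b0, b1, b2, 4.50, "National")
print("American League prediction:", american)
print("National League prediction:", national)
print("Difference between leagues at the same ERA:", national - american)
```

Under this coding, b2 is the estimated difference in wins between a National League and an American League team with the same ERA, and b1 is the change in wins per one-unit change in ERA within either league.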
4) Last year, we conducted a small survey of 27 undergraduate students regarding their school performance (grade point average, GPA) and possible factors that might influence it. The accompanying file summarizes the results. Let Y denote the grade point average, X1 the number of hours per week spent studying, X2 the average number of hours spent preparing for tests, X3 the number of hours per week spent in cafeterias, X4 whether the student takes notes or marks highlights when reading texts (X4 = 1 if yes, 0 if no), and X5 the average number of credit hours taken per semester. Develop a multiple regression model based on the above variables. Fully discuss your model, using at least four criteria to evaluate multiple regression models.
Y | X1 | X2 | X3 | X4 | X5
4.8 | 25 | 5 | 6 | 16
4.3 | 22 | 2 | 15
3.8 | 9 | 3 | 4
8 | 17
4.2
30 | 13
20 | 7
10 | 19
3.1
3.9 | 18
3.2
4.9
4.4 | 12
4.5
4.6
28
3.7 | 14
3.5
2.8
4.1