Hypothesis Test: Mean vs. Hypothesized Value
1,150.000 hypothesized value
1,183.235 mean Score
118.979 std. dev.
20.405 std. error
34 n
1.63 z
.0517 p-value (one-tailed, upper)
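Hypothesis Test: Mean vs. Hypothesized Value
80.000 hypothesized value
83.528 mean Income
13.233 std. dev.
2.206 std. error
36 n
35 df
1.60 t
.0593 p-value (one-tailed, upper)

Hypothesis Test: Independent Groups (t-test, pooled variance)
Briar Hills Englewood
338.74 315.264 mean
26.755 31.781 std. dev.
12 14 n
24 df
23.4774 difference (Briar Hills – Englewood)
875.1917 pooled variance
29.58 pooled std. dev.
11.6381 standard error of difference
0 hypothesized difference
2.02 t
.0275 p-value (one-tailed, upper)

Hypothesis test for proportion vs hypothesized value
Observed Hypothesized
0.73 0.62 p (as decimal)
38/52 32/52 p (as fraction)
37.96 32.24 X
52 52 n
0.0673 std. error
1.63 z
.0511 p-value (one-tailed, upper)

Hypothesis test for two independent proportions
p1 p2 pc
0.474 0.358 0.4133 p (as decimal)
64/135 53/148 117/283 p (as fraction)
63.99 52.984 116.974 X
135 148 283 n
0.116 difference
0 hypothesized difference
0.0586 std. error
1.98 z
.0478 p-value (two-tailed)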
Week 4 – More Hypothesizing and some real practical material
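Hypothesis testing is what you do to show whether or not there is a difference between what you are testing and the current situation.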
The steps we use in this course are:
Know the Ho (Null Hypothesis or current situation)
Know the Alpha value
Enter into MegaStat the H1 (Research Hypothesis)
Enter the data.
Make a decision on the Ho, based on the result of the “p-value” computation of H1
First we do the Ho testing using the Z or “t” for one or two groups
Example
An instructor wants to know if the mean entrance exam score of his class of 34 students
is significantly above the university mean of 1150. Level of significance (Alpha) is .05 (95% confidence).
115
Score
1295
112
1326
102
1006
1206
1279
123
122
1301
124
987
104
1177
Note: since n > 30, one uses the Z test
1040
1266
1345
1230
1239
1434
1385
114
101
1182
1012
121
113
1120
Decision – since the p-value (.0517) > Alpha (.05), fail to reject Ho
1277
The mean score is not statistically significantly above 1150
992
119
1181
109
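As a cross-check on the Z test in this example, here is a minimal Python sketch that recomputes the test from the summary statistics in the MegaStat output (hypothesized value 1,150, mean 1,183.235, std. dev. 118.979, n = 34). The function name `z_test_upper` is made up for illustration and is not part of MegaStat.

```python
from math import sqrt
from statistics import NormalDist

def z_test_upper(sample_mean, hypothesized, std_dev, n):
    """One-sample z-test, upper-tailed. Returns (std_error, z, p_value)."""
    std_error = std_dev / sqrt(n)
    z = (sample_mean - hypothesized) / std_error
    p_value = 1.0 - NormalDist().cdf(z)  # area to the right of z
    return std_error, z, p_value

# Summary statistics taken from the MegaStat output for this example
std_error, z, p_value = z_test_upper(1183.235, 1150, 118.979, 34)

alpha = 0.05
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"std. error = {std_error:.3f}, z = {z:.2f}, p-value = {p_value:.4f}: {decision}")
```

The decision line follows the course steps: compare the p-value with Alpha and reject Ho only when the p-value is smaller.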
Another example
An instructor wants to know whether his On-campus students scored differently than his Commuter students. Alpha = .075
On-campus Commuter
86 71
79 80
88 83
97 87
88 76
85 62
97 68
79 82
88 84
87 75
91 84
86 61
104 72
67 96
85 73
Hypothesis Test: Independent Groups (t-test, pooled variance)
On-campus Commuter
87.13 76.93 mean
8.64 9.54 std. dev.
15 15 n
28 df
10.200 difference (On-campus – Commuter)
82.810 pooled variance
9.100 pooled std. dev.
3.323 standard error of difference
0 hypothesized difference
3.07 t
.0047 p-value (two-tailed)
Decision: since the p-value (.0047) < Alpha (.075), one rejects the Ho and
can say there is a difference. Note: since one did not test
for greater or less, one cannot say which group scored higher
based upon this test
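The pooled-variance t-test above can be reproduced from the two score columns. The sketch below is illustrative only: the function names are hypothetical, and because the Python standard library has no t distribution, the two-tailed p-value is approximated by integrating the t density with Simpson's rule (an assumption of this sketch, not how MegaStat computes it).

```python
from math import sqrt, gamma, pi
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus twice the area under the density from 0 to |t|."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)

def pooled_t_test(a, b):
    """Independent-groups t-test with pooled variance. Returns (t, df, p_two_tailed)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / std_error
    return t, df, t_two_tailed_p(t, df)

on_campus = [86, 79, 88, 97, 88, 85, 97, 79, 88, 87, 91, 86, 104, 67, 85]
commuter = [71, 80, 83, 87, 76, 62, 68, 82, 84, 75, 84, 61, 72, 96, 73]

t, df, p_value = pooled_t_test(on_campus, commuter)
alpha = 0.075
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"t = {t:.2f}, df = {df}, p-value (two-tailed) = {p_value:.4f}: {decision}")
```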
1.0 Your turn
A shopping center developer wants to create a development in a particular area if and only if
the mean income of the homes in the immediate vicinity is greater than 80 thousand dollars. Alpha = .025.
Does the developer build in this area?
Income
79
95
77
105
65
100
67
83
76
93
79
75
62
103
87
74
107
73
89
106
97
80
64
96
78
70
65 92
2.0
A developer wants to build a shopping center near Briar Hills, IF the homes there are more expensive
than the homes in Englewood. Test using Alpha = .05.
328.1
330.9
368.6
350.7
306.8
300.2
348
297.7
399
353.3
338.6
283.2
337.3
271.6
349.6
353.7
314.4
307.4
275.8
347.4
343
319.7
339.3
298.1
339.2
FYI – you can conduct similar tests on Proportions
3.0
A professional basketball player has a 62% free throw percentage. Since making a change in his technique
he has hit 73% out of 52 free throws. Is this evidence that his change has helped? Alpha = .05
4.0
47.4% of 135 men say they would purchase a particular product, 35.8% of 148 women say they
would purchase the product. Are these percentages different?
Do the appropriate test. Alpha =.05
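For the proportion questions, the z statistics can be checked with a short Python sketch. The function names are hypothetical; the formulas are the usual one-proportion z-test (standard error based on the hypothesized proportion) and the two-proportion z-test with a pooled proportion, which is an assumption about what the MegaStat proportion tests compute.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(p_observed, p_hypothesized, n):
    """Upper-tailed z-test of one proportion against a hypothesized value."""
    std_error = sqrt(p_hypothesized * (1 - p_hypothesized) / n)
    z = (p_observed - p_hypothesized) / std_error
    return z, 1.0 - NormalDist().cdf(z)

def two_proportion_z(p1, n1, p2, n2):
    """Two-tailed z-test for the difference of two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    return z, 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# 3.0: free throws, 73% of 52 against a hypothesized 62%
z_free_throw, p_free_throw = one_proportion_z(0.73, 0.62, 52)
# 4.0: 47.4% of 135 men against 35.8% of 148 women
z_purchase, p_purchase = two_proportion_z(0.474, 135, 0.358, 148)

print(f"3.0: z = {z_free_throw:.2f}, p-value (one-tailed, upper) = {p_free_throw:.4f}")
print(f"4.0: z = {z_purchase:.2f}, p-value (two-tailed) = {p_purchase:.4f}")
```

Compare each p-value with Alpha = .05 to make the decision on Ho.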
Now we do a statistical test for when there are more than two groups (ANOVA test)
There are several different ANOVA tests one can use. The version one uses depends upon the groups being tested. In this class we do a simple, straightforward test of the means.
Example
Given the sales in the four regions shown below, test if there is a difference in the average sales by region.
North  South  Central  East
87.0   82.4   60.1     54.2
82.9   76.0   65.7     70.2
91.1   67.2   62.3     67.4
85.3   74.6   80.4     71.9
79.0   80.5   64.5     59.6
74.4   59.5   75.3     67.8
76.9   84.4   77.0     54.6
85.6   59.8   79.0     66.4
84.0   61.1   79.2     61.4
91.8   82.5   78.8     61.4
82.6          82.1     62.3
73.4                   68.5
One factor ANOVA
Mean n Std. Dev
82.83 12 5.964 North
72.80 10 10.043 South
73.13 11 8.205 Central
63.81 12 5.820 East
73.16 45 10.134 Total
ANOVA table
Source SS df MS F p-value
Treatment 2,173.632 3 724.5440 12.67 5.34E-06
Error 2,344.778 41 57.1897
Total 4,518.410 44
p-value (5.34E-06) < .05 so reject Ho
There is a difference in means
Post hoc analysis
p-values for pairwise t-tests
              East   South  Central  North
              63.81  72.80  73.13    82.83
East    63.81
South   72.80 .0082
Central 73.13 .0052  .9216
North   82.83 2.54E-07 .0035 .0037
FYI – these p-values show which groups are different
Tukey simultaneous comparison t-values (d.f. = 41)
              East   South  Central  North
              63.81  72.80  73.13    82.83
East    63.81
South   72.80 2.78
Central 73.13 2.95   0.10
North   82.83 6.16   3.10   3.07
critical values for experimentwise error rate:
0.05 2.68
0.01 3.32
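To see where the Treatment and Error rows of the ANOVA table come from, here is a small Python sketch that computes the sums of squares and the F statistic for the four regions. The function name `one_way_anova` is hypothetical, and the sketch stops at F; the p-value needs the F distribution, which the standard library does not provide.

```python
from statistics import mean

def one_way_anova(groups):
    """One-factor ANOVA. Returns (ss_treatment, ss_error, df_treatment, df_error, f_stat)."""
    values = [x for g in groups.values() for x in g]
    grand_mean = mean(values)
    ss_treatment = 0.0
    ss_error = 0.0
    for g in groups.values():
        group_mean = mean(g)
        ss_treatment += len(g) * (group_mean - grand_mean) ** 2  # between groups
        ss_error += sum((x - group_mean) ** 2 for x in g)        # within groups
    df_treatment = len(groups) - 1
    df_error = len(values) - len(groups)
    f_stat = (ss_treatment / df_treatment) / (ss_error / df_error)
    return ss_treatment, ss_error, df_treatment, df_error, f_stat

sales = {
    "North": [87.0, 82.9, 91.1, 85.3, 79.0, 74.4, 76.9, 85.6, 84.0, 91.8, 82.6, 73.4],
    "South": [82.4, 76.0, 67.2, 74.6, 80.5, 59.5, 84.4, 59.8, 61.1, 82.5],
    "Central": [60.1, 65.7, 62.3, 80.4, 64.5, 75.3, 77.0, 79.0, 79.2, 78.8, 82.1],
    "East": [54.2, 70.2, 67.4, 71.9, 59.6, 67.8, 54.6, 66.4, 61.4, 61.4, 62.3, 68.5],
}

ss_treatment, ss_error, df_treatment, df_error, f_stat = one_way_anova(sales)
print(f"Treatment SS = {ss_treatment:.3f}, df = {df_treatment}")
print(f"Error SS = {ss_error:.3f}, df = {df_error}")
print(f"F = {f_stat:.2f}")
```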
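5.0 Your turn
Given the sample of prices, are the prices different in the different subdivisions of the RealEstateData? Sort on subdivision and form the groups below. Do the analysis with MegaStat.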
No.
Price
SubDiv
1
480.1
2
397.8
3 307.4 3
4
413.0
5
389.3
6 353.3 2
7
331.0
8
381.2
9
42
10
427.3
Burbsville
Lone Tree
Stanton
11
380.6
12
439.6
13
249.8
14
248.0
15 337.3 3
16
376.5
17
320.4
18
341.8
19 339.3 2
20
455.9
21
273.7
22
283.8
23
381.4
24
382.8
25
419.6
26 275.8 2
27
336.8
28 339.2 2
29
391.4
30
387.0
31
412.9
32
290.7
33 306.8 3
34
343.0
35
452.2
36
224.7
37 271.6 2
38
407.4
39
278.0
40
350.6
41
328.4
42 330.9 2
43
401.8
44
235.0
45
357.9
46 353.7 2
47
475.7
48
257.7
49
283.0
50
399.2
51
245.0
52
192.9
53
258.4
54 298.1 2
55
298.3
56
227.0
57
224.1
58
262.0
59
433.8
60
333.3
61
346.2
62 300.2 2
63 347.4 3
64
299.7
65
407.0
66
272.3
67
380.9
68
414.9
69
354.6
70
415.1
71
381.6
72
452.3
73
296.7
74
451.4
75
280.1
76
248.2
77
411.4
78 314.4 3
79
500.0
80
316.8
81
406.8
82
267.7
83
247.5
84
345.5
85
207
86 338.6 3
87
276.5
88 368.6 3
89
309.7
90 297.7 2
91 350.7 2
92
511.0
93
460.2
94
411.7
95
383.3
96
392.3
97
450.9
98
341.6
99
379.1
100
197.8
101
390.3
102 349.6 3
103
296.2
104
390.2
105
348.8
106
386
107
475.5
108
385.3
109
263.6
110
200.5
111
202
112
341.4
113
452.4
114
332.0
115
430.8
116
421.6
117
429.5
118 283.2 2
119 328.1 3
120
405.6
121
277.0
122 456.0 5
123
239.9
124
457.3
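Now a statistical test used with Nominal Data (means are not involved): the Chi-Square Test
Example
Given the sample of units sold in four regions, are the number of units sold in the four regions uniformly distributed?
Test with Alpha = .05
Region fo fe
North 193 200
South 202 200
East 184 200
West 221 200
Note: the fo gives the actual data, while the fe gives the expected values if there were an equal distribution.
Goodness of Fit Test
observed expected O – E (O – E)² / E % of chisq
200 193.000 7.000 0.254 6.94
200 202.000 -2.000 0.020 0.54
200 184.000 16.000 1.391 38.01
200 221.000 -21.000 1.995 54.51
800 800.000 0.000 3.660 100.00
3.66 chi-square
3 df
.3005 p-value
Note: in this output the fe values (200) were entered as "observed" and the fo values as "expected". With fo as observed and fe as expected, the chi-square is 3.75 (p-value about .29), and the decision is the same.
Decision: since the p-value > .05, fail to reject Ho.
There is no evidence that the units are not evenly distributed.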
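The goodness-of-fit arithmetic can be sketched in Python as follows. This is an illustration, not MegaStat's implementation: `chi_square_sf` is a hypothetical helper that gets the upper-tail p-value from a series for the incomplete gamma function. The sketch treats fo (193, 202, 184, 221) as observed and fe (200 each) as expected, which gives a chi-square of 3.75 rather than the 3.66 in the MegaStat output, where the two columns appear the other way round.

```python
from math import exp, gamma

def chi_square_sf(x, df, terms=300):
    """Upper-tail p-value of the chi-square distribution (series for the lower incomplete gamma)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2, x / 2
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= half_x / (a + n)
        total += term
    lower = total * exp(-half_x) * half_x ** a / gamma(a)
    return 1.0 - lower

observed = [193, 202, 184, 221]                  # fo: units sold per region
expected = sum(observed) / len(observed)         # fe: 200 per region if uniform

chi_square = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1
p_value = chi_square_sf(chi_square, df)

alpha = 0.05
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.4f}: {decision}")
```

Calling `chi_square_sf(3.66, 3)` reproduces the .3005 p-value shown in the output, so either way the decision is to fail to reject Ho.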
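Another example
This table shows the computer ownership of a sample of GRaduate and UnderGraduate students.
Are the factors independent? Calculate the expected frequencies and perform a chi-square test: Alpha = .05
observed frequencies
Own PC UG GR totals
Yes 49 47 96
No 114 36 150
totals 163 83 246
Chi-square Contingency Table Test for Independence
                 UG      GR      totals  Total
Yes    Observed  49      47      96      192
       Expected  63.61   32.39   96.00   192.00
No     Observed  114     36      150     300
       Expected  99.39   50.61   150.00  300.00
totals Observed  163     83      246     492
       Expected  163.00  83.00   246.00  492.00
Total  Observed  326     166     492     984
       Expected  326.00  166.00  492.00  984.00
16.31 chi-square
4 df
.0026 p-value
Note: the totals row and column were included in the input range, so MegaStat treated the table as 3 × 3 and reported 4 df. For the 2 × 2 table of observed frequencies, df = (2 – 1)(2 – 1) = 1; the chi-square is still 16.31, the p-value is smaller still, and the decision is the same.
Decision: since the p-value < Alpha of .05, reject Ho; the factors are not independent (there is a difference).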
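The expected frequencies and the chi-square statistic for the ownership table can be recomputed with a short Python sketch. The names `independence_test` and `chi_square_sf` are hypothetical helpers written for this illustration. Only the 2 × 2 body of the table is passed in (no totals row or column), so the degrees of freedom come out as 1.

```python
from math import exp, gamma

def chi_square_sf(x, df, terms=300):
    """Upper-tail p-value of the chi-square distribution (series for the lower incomplete gamma)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2, x / 2
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= half_x / (a + n)
        total += term
    return 1.0 - total * exp(-half_x) * half_x ** a / gamma(a)

def independence_test(table):
    """Chi-square test of independence. Returns (chi_square, df, p_value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df, chi_square_sf(chi_square, df)

observed = [[49, 47],    # Own PC: Yes (UG, GR)
            [114, 36]]   # Own PC: No  (UG, GR)

chi_square, df, p_value = independence_test(observed)
print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.6f}")
```

For comparison, `chi_square_sf(16.31, 4)` gives the .0026 reported in the MegaStat output with 4 df; with either df the p-value is below Alpha = .05.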
6.0 Your turn
This table is a result of a sample of 386 managers from small, medium, and large companies
who were asked by a local university if they planned to pursue an MBA degree in the next five years.
Test to see if there are differences – Alpha = .025
Size of company
Plan MBA? Small Medium Large Total
Yes 24 50 105 179
No 49 58 100 207
Total 73 108 205 386
Now we move on to Regression, scatter plots, and correlation – they are all tied together
We previously made some scatter plots. Now we move forward and show how they are related to Correlation and
regression
Example
These data show the relationship between a sales aptitude test (X) and Sales in thousands (Y).
A. Use MegaStat to do a Regression Analysis. Make predictions for X = 107
X Y
Aptitude Sales
73 462
75 447
83 408
83 435
85 483
85 486
89 465
93 480
93 507
103 474
105 512
107 456
117 468
120 541
129 524
141 522
147 594
153 581
155 537
Note – the independent variable is on the horizontal axis
Correlation Matrix
         Aptitude Sales
Aptitude 1.000
Sales    .822     1.000
r = 0.8218873019904539 (unrounded)
19 sample size
± .456 critical value .05 (two-tail)
± .575 critical value .01 (two-tail)
Note that r (.822) is greater than the critical values, so it
is significant
Also note .822^2 =
0.675684
which is R² (shown as r² in the MegaStat output)
R² is the correlation squared and is the relationship (%) between the variables
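The correlation in the matrix can be checked by hand with the usual Pearson formula. The sketch below is a plain-Python illustration (the function name `pearson_r` is hypothetical); it compares r with the .01 two-tail critical value quoted above and squares it to get the percent relationship.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

aptitude = [73, 75, 83, 83, 85, 85, 89, 93, 93, 103,
            105, 107, 117, 120, 129, 141, 147, 153, 155]
sales = [462, 447, 408, 435, 483, 486, 465, 480, 507, 474,
         512, 456, 468, 541, 524, 522, 594, 581, 537]

r = pearson_r(aptitude, sales)
r_squared = r * r
critical_01 = 0.575  # two-tail critical value for n = 19, from the MegaStat output

print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
print("significant at .01" if abs(r) > critical_01 else "not significant at .01")
```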
Now regression – note where we typed the 107
Regression Analysis
r² 0.675 n 19
r 0.822 k 1
Std. Error 28.198 Dep. Var. Sales
ANOVA table
Source SS df MS F p-value
Regression 28,138.01 1 28,138.0066 35.39 1.59E-05
Residual 13,517.15 17 795.1265
Total 41,655.16 18
Regression output confidence interval
variables coefficients std. error t (df=17) p-value 95% lower 95% upper
Intercept 335.9221 27.3148 12.298 6.91E-10 278.2928 393.5513
Aptitude 1.4732 0.2477 5.949 1.59E-05 0.9507 1.9957
Predicted values for: Sales
95% Confidence Interval 95% Prediction Interval
Aptitude Predicted lower upper lower upper Leverage
107 493.557 479.908 507.206 432.519 554.595 0.053
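The coefficients and the predicted value for X = 107 follow from the least-squares formulas. Here is a minimal Python sketch under that assumption (`fit_line` is a hypothetical helper, not a MegaStat function); it also recomputes the standard error of the estimate from the residuals. The confidence and prediction intervals are left to MegaStat.

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Least-squares line y = intercept + slope * x. Returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

aptitude = [73, 75, 83, 83, 85, 85, 89, 93, 93, 103,
            105, 107, 117, 120, 129, 141, 147, 153, 155]
sales = [462, 447, 408, 435, 483, 486, 465, 480, 507, 474,
         512, 456, 468, 541, 524, 522, 594, 581, 537]

intercept, slope = fit_line(aptitude, sales)
predicted = intercept + slope * 107  # predicted Sales for Aptitude = 107

# Standard error of the estimate: sqrt(residual SS / (n - 2))
residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(aptitude, sales))
std_error = sqrt(residual_ss / (len(aptitude) - 2))

print(f"Sales = {intercept:.4f} + {slope:.4f} * Aptitude")
print(f"Predicted Sales at 107 = {predicted:.3f}, Std. Error = {std_error:.3f}")
```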
7.0 Your turn
Use the following data to compute (1) a Scatter plot, (2) a Correlation – state if r is statistically significant,
(3) state the relationship between the variables in percent, and (4) forecast/predict the sales of a sales
person that makes 22 calls
Calls Sales
6 19
12 38
14 34
10 24
20 47
22 38
25 60
27 53
29 70
51 46
33 59
36 63
37 70
42 67
44 53
48 57
52 33