Read assignments before reaching out to me.

Due: by Monday 2 PM

Chapter 7 assignments are at the end (all exercises).

You will have a link. You must use information from that site for the CDC assignment. Also, 2010 is the best time frame for the examples.

Section 7

Analyzing Our Marketing Test, Survey Results and Other Metrics Using Confidence Intervals

Rhonda Knehans Drake

Associate Professor, New York University

Data Analytics, Interpretation and Reporting

copyright © 2013

2

• When we estimate population averages or percentages based on samples, a certain amount of error is present.

• The amount of error present is mostly a function of your sample size.

• The larger the sample, the less error in your estimates.

• Today we will learn how to place bounds around the estimates we obtain.

• With such an interval, we will then be able to say with 90%, 95% or 99% confidence that the true population value lies within these bounds.

Introduction

3

The eight confidence interval formulas we will discuss are for

the following situations:

1. Average or mean based on large samples

2. Average or mean based on small samples

3. Response rate or survey percent based on large samples

4. Response rate or survey percent based on small samples

5. Difference between 2 averages for large samples

6. Difference between 2 averages for small samples

7. Difference between 2 response rates for large samples

8. Difference between 2 response rates for small samples

Confidence Intervals

A-B Split Tests

4

1. Confidence Interval for Averages or Means Based on

Large Samples (n ≥ 30)

To calculate a confidence interval around a mean, the following

information is required:

– The sample mean x̄ obtained from the test.

– The sample standard deviation S obtained from the test.

Many software packages, including Microsoft Excel™, can automatically calculate this value for you. (Review Section 3 for the standard deviation formula.)

– The sample size n of the test.

This is the number of observations used to calculate your mean. The sample size must be at least 30.

– The desired confidence level: 90%, 95% or 99%.

A confidence interval constructed around the mean lets you state, with your specified level of confidence, that the true mean falls within those bounds.

Sample Means

(Large Sample)

5

• Once all information is known, you construct the confidence

interval around the mean by adding and subtracting from the

mean a multiple of your standard deviation associated with the

sample mean. The “multiple” depends on your desired

confidence level.

• The formula for a confidence interval around the mean is

calculated as follows:

x̄ ± (Z)(S/√n)

Where:

• S is the standard deviation associated with the sample.

• n is the sample size.

• Z is equal to 1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence level, respectively.
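The Z multipliers listed above come from the standard normal distribution. As a rough illustration (not part of the original slides), the sketch below recovers them with Python's standard library. The function name `z_multiplier` is an assumption made for this example. Note that the computed 99% value rounds to 2.576, while the slides use the table value 2.575.

```python
from statistics import NormalDist


def z_multiplier(confidence: float) -> float:
    """Two-sided Z multiplier for a confidence level such as 0.95."""
    alpha = 1.0 - confidence
    # Half of alpha sits in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_multiplier(level):.3f}")
```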

Sample Means

(Large Sample)

6

Example:

Money magazine conducts a survey of 100 retirees across the US

and asks them how much they have in their retirement fund.

The survey yields an average of $84.75 with a standard deviation of $18.75, both in thousands of dollars.

You are about to write an article based on this average but realize that the true average may be somewhat more or less than this.

Construct a 95% confidence interval around this average.

Sample Means

(Large Sample)

7

Sample Means

(Large Sample)

1. 84.75 ± 1.96 (18.75 / √100)

2. 84.75 ± 1.96 (1.875)

3. 84.75 ± 3.675

4. ($81,075 , $88,425)
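The four steps above can be reproduced with a few lines of Python. This is only a sketch of the slide arithmetic; the function name `mean_confidence_interval` is made up for the example, and the multiplier 1.96 is the 95% Z value from the slides.

```python
import math


def mean_confidence_interval(mean, std_dev, n, multiplier):
    """Return (lower, upper) for mean ± multiplier * std_dev / sqrt(n)."""
    margin = multiplier * std_dev / math.sqrt(n)
    return mean - margin, mean + margin


# Money magazine example: figures are in thousands of dollars.
low, high = mean_confidence_interval(84.75, 18.75, 100, 1.96)
print(f"95% interval: ({low:.3f}, {high:.3f}) thousand dollars")
```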

8

Later we will discuss how to choose the confidence level and address whether 95% was the appropriate level for this example.

Sample Means

(Large Sample)

9

• Let’s do the previous example again but using the Plan-alyzer.

1. Select the tab “Table of Calculators”

2. Select “Confidence Interval Calculators for Averages, Large Samples”

3. Select “One Sample”

Sample Means

(Large Sample)

10

Sample Means

(Large Sample)

Input the required info.

11

Sample Means

(Large Sample)

See the answer.

12

2. Confidence Interval for Averages or Means Based on

Small Samples (n < 30)

To calculate a confidence interval around a mean, the following information is required:

– The sample mean x̄ obtained from the test.

– The sample standard deviation S obtained from the test.

Many software packages, including Microsoft Excel™, can automatically calculate this value for you. (Review Section 3 for the standard deviation formula.)

– The sample size n of the test.

This is the number of observations used to calculate your mean. For this formula the sample size is less than 30.

– The desired confidence level: 90%, 95% or 99%.

A confidence interval constructed around the mean lets you state, with your specified level of confidence, that the true mean falls within those bounds.

Sample Means

(Small Sample)

13

• Once all information is known, you construct the confidence

interval around the mean by adding and subtracting from the

mean a multiple of your standard deviation associated with the

sample mean. The “multiple” depends on your desired

confidence level.

• The formula for a confidence interval around the mean is the same as the prior formula except we use a value from the “t-distribution,” which is for approximately normally distributed data:

x̄ ± (t)(S/√n)

Where:

• S is the standard deviation associated with the sample.

• n is the sample size.

• t is obtained by using the Excel function TINV, as will be seen shortly.

Sample Means

(Small Sample)

14

Example:

Suppose Money magazine surveyed a sample of only 10 retirees instead of 100 as in our prior example, all else the same.

Construct a 95% confidence interval around this average.

Sample Means

(Small Sample)

15

We construct the confidence interval as before except we will use the t-distribution.

84.75 ± (t)(18.75/√10)

Where the value of t = TINV(.05, 9) = 2.262. The first argument is (1 – confidence level) and the second is (n – 1).

84.75 ± (2.262)(18.75/3.1623)

84.75 ± (2.262)(5.93)

84.75 ± 13.41

($71.34, $98.16) in thousands, i.e. ($71,340, $98,160)

Sample Means

(Small Sample)
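To see how much the t multiplier and the smaller sample widen the interval, the sketch below compares the margin of error for n = 100 (Z = 1.96) with n = 10 (t = 2.262). The t value is simply copied from the TINV result on the slide rather than computed, since Python's standard library has no t-distribution; the function name `margin_of_error` is an assumption for this example.

```python
import math


def margin_of_error(std_dev, n, multiplier):
    """Margin of error: multiplier * std_dev / sqrt(n)."""
    return multiplier * std_dev / math.sqrt(n)


mean, std_dev = 84.75, 18.75      # thousands of dollars, from the example
T_95_DF9 = 2.262                  # TINV(.05, 9), taken from the slide

large = margin_of_error(std_dev, 100, 1.96)
small = margin_of_error(std_dev, 10, T_95_DF9)

print(f"n = 100: 84.75 ± {large:.3f}")
print(f"n = 10:  84.75 ± {small:.2f} -> ({mean - small:.2f}, {mean + small:.2f})")
```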

16

Note our confidence interval is wider for two reasons:

1. The smaller sample size.

2. Because our sample is smaller than 30, we use the t-distribution instead of the normal distribution, so our multiplier is larger (2.262 versus 1.96).

Sample Means

(Small Sample)

17

• Let’s do the previous example again but using the Plan-alyzer.

1. Select the tab “Table of Calculators”

2. Select “Confidence Interval Calculators for Averages, Small Samples”

3. Select “One Sample”

Sample Means

(Small Sample)

18

Sample Means

(Small Sample)

Input the required info.

19

Sample Means

(Small Sample)

See the answer.

20

3. Confidence Intervals for Response Rates or Survey

Percentages Based on Large Samples (where n*p and

n*(1 – p) are both ≥ 5)

To calculate a confidence interval around a sample proportion, the

following information is required.

– The sample proportion p obtained from the test.

– The sample size n of the test.

This is the number of observations used to calculate your proportion. The sample size, when multiplied by the sample proportion and when multiplied by one minus the sample proportion, must in both cases be greater than or equal to 5.

– The desired confidence level: 90%, 95% or 99%.

A confidence interval constructed around the sample proportion lets you state, with your specified level of confidence, that the true population proportion falls within those bounds.

Sample Proportions

(Large Sample)

21

• Once all information is known, you construct the confidence

interval around the sample proportion by adding and subtracting

from the sample proportion a multiple of the standard deviation

associated with the sample proportion. The “multiple” depends

on your desired confidence level.

• The formula for a confidence interval around the sample proportion is calculated as follows:

p ± (Z)·√[p(1 – p)/n]

• Where n is your sample size and Z is 1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence interval, respectively.

The square-root term is the standard deviation for a proportion or response rate.

Sample Proportions

(Large Sample)

22

Example:

AT&T samples a new prospect list and sends them an offer to

order their new wireless cellular service.

They sample 10,000 prospects and receive a 0.89% response rate.

What is the margin of error at 90% confidence?

Sample Proportions

(Large Sample)

23

• So, for our example the confidence interval is:

1. .0089 ± (1.645)·√[(.0089)(1 – .0089)/10,000]

2. .0089 ± (1.645)·√[(.0089)(.9911)/10,000]

3. .0089 ± (1.645)·√.00000088

4. .0089 ± (1.645)·(.000939)

5. .0089 ± .0015

(.0074, .0104) or (.74%, 1.04%)

The margin of error is .0015, or .15 percentage points.

Sample Proportions

(Large Sample)
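As a cross-check on the hand calculation, here is a minimal Python sketch of the large-sample proportion interval, including the n*p and n*(1 – p) rule of thumb. The function names are invented for this illustration; the Z value 1.645 is the 90% multiplier from the slides.

```python
import math


def is_large_sample(p, n):
    """Rule of thumb from the slides: n*p and n*(1 - p) are both at least 5."""
    return n * p >= 5 and n * (1 - p) >= 5


def proportion_interval(p, n, multiplier):
    """Return (lower, upper, margin) for p ± multiplier * sqrt(p(1 - p)/n)."""
    std_err = math.sqrt(p * (1 - p) / n)
    margin = multiplier * std_err
    return p - margin, p + margin, margin


# AT&T example: 10,000 prospects, 0.89% response rate, 90% confidence.
print("Large sample?", is_large_sample(0.0089, 10_000))
low, high, margin = proportion_interval(0.0089, 10_000, 1.645)
print(f"Margin of error: {margin:.4f}")
print(f"Interval: ({low:.4f}, {high:.4f})")
```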

24

• Let’s do the previous example again but using the Plan-alyzer.

1. Select the tab “Table of Calculators”

2. Select “Confidence Interval Calculators for Percentages, Large Samples”

3. Select “One Sample”

Sample Proportions

(Large Sample)

25

Sample Proportion

(Large Sample)

Input the required info.

26

Sample Proportion

(Large Sample)

See the answer.

27

4. Confidence Intervals for Response Rates or Survey

Percentages Based on Small Samples (where either n*p or

n*(1 – p) are < 5)

To calculate a confidence interval around a sample proportion, the

following information is required:

– The sample proportion p obtained from the test.

– The sample size n of the test.

This is the number of observations used to calculate your proportion. For this formula the sample size, when multiplied by the sample proportion or when multiplied by one minus the sample proportion, is less than 5.

– The desired confidence level: 90%, 95% or 99%.

A confidence interval constructed around the sample proportion will

guarantee, with your specified level of confidence, the true population

proportion will fall within those bounds.

Sample Proportions

(Small Sample)

28

• Once all information is known, you construct the confidence

interval around the sample proportion by adding and subtracting

from the sample proportion a multiple of the standard deviation

associated with the sample proportion. The “multiple” depends

on your desired confidence level.

• The formula for a confidence interval around the sample proportion is the same as the prior formula except we use a value from the “t-distribution,” which is for approximately normally distributed data:

p ± (t)·√[p(1 – p)/n]

• Where n is your sample size and t is obtained by using the Excel function TINV, as will be seen shortly.

The square-root term is the standard deviation for a proportion or response rate.

Sample Proportions

(Small Sample)

29

Example:

Suppose AT&T only sampled 100 prospects instead of 10,000 as

in our previous example, all else the same.

What is the margin of error at 90% confidence?

Sample Proportions

(Small Sample)

30

We construct the confidence interval as before except we will use the t-distribution.

.0089 ± (t)·√[(.0089)(1 – .0089)/100]

Where the value of t = TINV(.10, 99) = 1.66. The first argument is (1 – confidence level) and the second is (n – 1).

= .0089 ± (1.66)·√[(.0089)(.9911)/100]

= .0089 ± (1.66)·√.0000882

= .0089 ± (1.66)·(.0093914)

= .0089 ± .0156

= (0.00, 0.0245) or (0.00%, 2.45%)

The lower bound here cannot be negative, so we change it to zero.

Sample Proportions

(Small Sample)
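The small-sample version differs only in the multiplier and in the clamping of the lower bound at zero. The sketch below follows the slide's approach; the t value 1.66 is copied from the TINV result rather than computed, and the function name `clamped_proportion_interval` is an assumption for this example.

```python
import math


def clamped_proportion_interval(p, n, multiplier):
    """p ± multiplier * sqrt(p(1 - p)/n), kept inside the range [0, 1]."""
    margin = multiplier * math.sqrt(p * (1 - p) / n)
    lower = max(0.0, p - margin)   # a proportion cannot be negative
    upper = min(1.0, p + margin)   # and cannot exceed 100%
    return lower, upper, margin


T_90_DF99 = 1.66  # TINV(.10, 99), taken from the slide

low, high, margin = clamped_proportion_interval(0.0089, 100, T_90_DF99)
print(f"Margin of error: {margin:.4f}")
print(f"Interval: ({low:.4f}, {high:.4f})")
```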

31

• Let’s do the previous example again but using the Plan-alyzer.

1. Select the tab “Table of Calculators”

2. Select “Confidence Interval Calculators for Percentages, Small Samples”

3. Select “One Sample”

Sample Proportions

(Small Sample)

32

Sample Proportions

(Small Sample)

Input the required info.

33

See the answer.

Sample Proportions

(Small Sample)

34

5. Confidence Interval for the Difference between 2 Means or

Averages for Large Samples (n1 ≥ 30 and n2 ≥ 30 )

To calculate a confidence interval around the difference between

two sample means, the following information is required:

– The means of both samples (x1 and x2).

– The standard deviation of both samples (S1 and S2).

Many software packages, including Microsoft ExcelTM, can

automatically calculate these values.

– The size of both samples (n1 and n2).

These are the number of observations that went into calculating each

of your means. Both sample sizes n1 and n2 must be greater than or

equal to 30 in size.

– The desired confidence level: 90%, 95% or 99%.

A confidence interval constructed around the difference between the two sample means lets you state, with your specified level of confidence, that the true difference between the population means falls within those bounds.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Samples Means – Large Samples)

35

• Once all information is known, you construct the confidence interval around the difference between two means by adding and subtracting from the difference in means a multiple of the standard deviation associated with the difference. The “multiple” depends on your desired confidence level.

• The formula for a confidence interval around the difference between means is calculated as follows:

(x̄1 – x̄2) ± (Z)·√[(S1²/n1) + (S2²/n2)]

The square-root term is the standard deviation for the difference in averages.

• Where Z is 1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence interval, respectively.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Means – Large Samples)

36

Example:

You sample 100 home sales in San Francisco and 100 home sales in NYC for 2010 with the following results (prices in thousands of dollars):

San Francisco: nSF = 100, x̄SF = $745.25, SSF = $40

NYC: nNYC = 100, x̄NYC = $775.10, SNYC = $45

Is there any difference in home prices between NYC and San Francisco? Base your answer on the 95% confidence level.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Means – Large Samples)

37

• Assume SSF = $40 and SNYC = $45

1. (775.10 – 745.25) ± (1.96)·√[(40²/100) + (45²/100)]

2. 29.85 ± (1.96)·√(16 + 20.25)

3. 29.85 ± (1.96)·(6.021)

4. 29.85 ± 11.80

5. ($18.050 , $41.650) in thousands, i.e. ($18,050 , $41,650)

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Means – Large Samples)
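The same steps can be written as a short Python sketch. The function name `mean_difference_interval` is hypothetical; the inputs are the home-price figures from the example (in thousands of dollars) and the 95% Z value of 1.96.

```python
import math


def mean_difference_interval(mean1, sd1, n1, mean2, sd2, n2, multiplier):
    """(mean1 - mean2) ± multiplier * sqrt(sd1^2/n1 + sd2^2/n2)."""
    difference = mean1 - mean2
    std_err = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    margin = multiplier * std_err
    return difference - margin, difference + margin


# NYC minus San Francisco, figures in thousands of dollars.
low, high = mean_difference_interval(775.10, 45, 100, 745.25, 40, 100, 1.96)
print(f"95% interval for the difference: ({low:.2f}, {high:.2f})")
print("Zero inside the interval?", low <= 0 <= high)
```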

38

How do we interpret?

– With 95% confidence, we can say that NYC home prices in 2010 are higher than SF home prices by anywhere from $18,050 to $41,650.

• What if the interval were -$18,050 to $41,650? How would you interpret it, and is it okay in this case to have a negative value?

– If zero were in the interval, then you would say there is no difference between the two (a hypothesis test!!).

– It means there’s no statistical evidence to conclude that NYC prices are different from those of SF.

– It’s OK to have a negative value.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Means – Large Samples)

39

• Let’s do the previous example again but using the Plan-alyzer.

1. Select the tab “Table of Calculators”

2. Select “Confidence Interval Calculators for Averages, Large Samples”

3. Select “Test vs. Control”

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Means – Large Samples)

40

Input the required info.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Means – Large Samples)

41

See the answer.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Means – Large Samples)

42

6. Confidence Interval for the Difference between 2 Means or

Averages for Small Samples (n1 < 30 or n2 < 30).

– If one or both samples are less than 30 in size then you will replace the Z value with the t value.

– You will again use the TINV function in Excel.

– Your parameters are TINV(1 – confidence level, n1 + n2 – 2).

– All else the same.

– This will be used for small market research problems.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Means – Small Samples)

43

7. Confidence Intervals for the Difference Between 2

Percentages for Large Samples (n1*p1, n1*(1-p1), n2*p2,

n2*(1-p2), all ≥ 5)

To calculate a confidence interval around the difference between

two sample proportions, the following information is required:

– The proportions (p1 and p2) for both samples.

– The size of both samples (n1 and n2).

These are the number of observations used in calculating each of the

sample proportions. Both sample sizes, when multiplied by their

respective sample proportions and when multiplied by one minus their

respective sample proportions, must all be greater than or equal to 5.

– The desired confidence level: 90%, 95% or 99%.

A confidence interval constructed around the difference between the two sample proportions lets you state, with your specified level of confidence, that the true difference between the population proportions falls within those bounds.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Proportions – Large Samples)

44

• Once all information is known, you construct the confidence

interval around the difference between two proportions by adding

and subtracting from the difference in proportions a multiple of the

standard deviation associated with the difference. The “multiple”

depends on your desired confidence level.

• The formula for a confidence interval around the difference between proportions is calculated as follows:

(p1 – p2) ± (Z)·√[p1(1 – p1)/n1 + p2(1 – p2)/n2]

The square-root term is the standard deviation for the difference in proportions.

• Where Z is 1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence interval, respectively.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Proportions – Large Samples)

45

Example:

You are in charge of new card acquisitions at American Express.

You conduct a new offer test for the green card versus your control offer with the following results:

Offer | Sample Size | Response Rate
Control Offer with 10,000 Bonus Miles | 10,000 | 1.10%
Test Offer with 25,000 Bonus Miles | 10,000 | 1.38%

Did the test beat the control with 95% confidence? Do you have a winner?

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Proportions – Large Samples)

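1. Difference in response rates: .0138 – .0110 = .0028

2. Margin of error at 95% confidence (as shown on the slide): .00297

3. .0028 ± .00297

4. (–.00017, .00577) or (–.017%, .577%)

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Proportions – Large Samples)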
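For readers who want to check the test-versus-control arithmetic, here is a sketch of the difference-in-proportions formula applied to the American Express figures. The function name is an assumption for this example. Carried at full precision, the margin comes out near .0031, giving an interval of roughly (–.03%, .59%); the slightly narrower figures quoted on the slides (–.017%, .577%) appear to reflect rounding of intermediate values. Either way the interval contains zero.

```python
import math


def proportion_difference_interval(p_test, n_test, p_ctrl, n_ctrl, multiplier):
    """(p_test - p_ctrl) ± multiplier * sqrt of the summed variances."""
    difference = p_test - p_ctrl
    std_err = math.sqrt(
        p_test * (1 - p_test) / n_test + p_ctrl * (1 - p_ctrl) / n_ctrl
    )
    margin = multiplier * std_err
    return difference - margin, difference + margin, margin


low, high, margin = proportion_difference_interval(
    0.0138, 10_000, 0.0110, 10_000, 1.96
)
print(f"Margin of error: {margin:.5f}")
print(f"Interval: ({low:.3%}, {high:.3%})")
print("Contains zero?", low <= 0 <= high)
```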

47

So how do we interpret?

• With 95% confidence the test can do worse than the control by .017% OR do better than the control by .577%.

• As such we say the test and the control are not significantly different

since the confidence interval wrapped around the difference in

response rates contains zero.

• Had the lower bound been above zero then we would say the test has

beaten the control.

• But let’s be real here. For all purposes, the test is a winner. The lower

bound is soooo close to zero. So worst case the test is the same as

the control with much upside potential.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Proportions – Large Samples)

48

So how do we interpret (continuation)?

• But remember, just because the test has beaten the control from a statistical point of view, that does not mean that it won from a marketing point of view.

• In this example we were giving away additional bonus miles. So the test will need to beat the control by some minimum, most likely greater than zero, or else we will not generate the same revenue.

• Suppose I told you that based on the cost of the additional bonus miles the test needs to beat the control by at least .025% to break even.

Would you consider it a winner?

• What if I told you the test needs to beat the control by .25% to break

even. Would you now consider the test a winner?

(Difference Between Two Sample Proportions – Large Samples)

49

• Let’s do the previous example again but using the Plan-alyzer.

1. Select the tab “Table of Calculators”

2. Select “Confidence Interval Calculators for Proportions, Large Samples”

3. Select “Test vs. Control”

(Difference Between Two Sample Proportions – Large Samples)

50

Input the required info.

(Difference Between Two Sample Proportions – Large Samples)

51

See the answer.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Proportions – Large Samples)

52

8. Confidence Intervals for the Difference Between 2

Percentages for Small Samples (where n1*p1 or n1*(1-p1)

or n2*p2 or n2*(1-p2) < 5)

• You will replace the Z value with the t value.

• You will again use the TINV function in Excel.

• Your parameters are TINV(1- confidence level, n1+n2 – 2).

• All else the same.

• This will be used for small market research problems.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Proportions – Small Samples)

53

• Remember, interpretation of confidence intervals is not that

simple.

• They will not tell you what to do.

• They simply give you valid best and worst case scenarios to

take into

consideration.

• They give you additional information upon which to help you

base your marketing decisions:

• Worst case, are the results meeting your criteria?

• How does the upside potential compare to the downside potential?

Interpretation of Confidence Intervals

54

• No direct marketer should ever consider evaluating their test results with a confidence level lower than 90%. To do so assumes way too much risk.

• And, fishing for a confidence level that yields significance should

never be practiced.

• The rules that any good direct marketer should follow regarding

significance are shown on the next slide.

Setting the Confidence Level

55

Evaluate your test response rate at the 95% confidence level. Significant?

• Yes: Is it significant at the 99% confidence level?

– Yes: A no brainer. Let’s roll!

– No: That’s okay…at a minimum let’s consider a partial to full rollout.

• No: Not that we want to go that low, but is it significant at the 90% confidence level?

– Yes: Okay, so we have something here. Let’s either retest or go for a partial rollout.

– No: Not good. We should scrap this from further consideration.
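The decision rules above can be expressed as a small function. This is only one possible reading of the flowchart: it treats a result as significant at a given level when the interval around the difference excludes zero, and it uses the Z values from the slides. The names `Z_VALUES`, `is_significant` and `rollout_recommendation` are made up for this sketch.

```python
# Z multipliers as given on the slides.
Z_VALUES = {"90%": 1.645, "95%": 1.96, "99%": 2.575}


def is_significant(difference, std_error, z):
    """True when the interval difference ± z * std_error excludes zero."""
    return abs(difference) > z * std_error


def rollout_recommendation(difference, std_error):
    """Walk the flowchart: start at 95%, then check 99% or 90%."""
    if is_significant(difference, std_error, Z_VALUES["95%"]):
        if is_significant(difference, std_error, Z_VALUES["99%"]):
            return "A no brainer: roll out"
        return "Consider a partial to full rollout"
    if is_significant(difference, std_error, Z_VALUES["90%"]):
        return "Retest or go for a partial rollout"
    return "Scrap from further consideration"


# Illustrative inputs: a difference of .0028 with a standard error of .0015649.
print(rollout_recommendation(0.0028, 0.0015649))
```

Feeding the flowchart the difference and its standard deviation, rather than three yes/no answers, keeps the starting point at 95% and avoids fishing for a level that happens to yield significance.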

Setting the Confidence Level

56

It is important to keep in mind the following facts regarding the

creation of a confidence interval.

• If you want more confidence in your estimates, the resulting interval

will widen.

• If you increase your sample size, the resulting interval will become

tighter.

• The more accuracy you need in your test estimate, the higher you

should set your confidence level.

• The confidence level you set should depend on the risk you are

willing to take in making an incorrect decision.

Setting the Confidence Level

57

• If our sample represents a large percent (> 10%) of the population in total,

then we typically apply a correction factor to our margin of error estimates.

• The larger our sample is as a percent of the total population the more valid

our estimates.

• Of the following two samples, which would you think would yield a better

parameter estimate?

• A sample of size 5,000 from a population of size 10,000 in total

• A sample of size 5,000 from a population of size 10,000,000 in total.

• If our sample represents 10% or more of the total population, you multiply the margin of error for the first four formulas by the following factor, where N is the population size and n is the sample size:

√[(N – n)/(N – 1)]

Finite Population Correction Factor

58

Example:

Going back to our first exercise, suppose the survey of 100 people was conducted among the 800 residents of the Happy Retirement Village and not Money magazine subscribers.

What is our correction factor and what is the new interval?

Finite Population Correction Factor

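1. Correction factor: √[(800 – 100)/(800 – 1)] = √(700/799) = .9360

2. Corrected margin of error: 3.675 × .9360 = 3.4398

3. New interval: 84.75 ± 3.4398, or ($81,310 , $88,190)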
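The correction can be checked with a short Python sketch. The function name `finite_population_correction` is an assumption for this example; the inputs are the retirement-village figures (N = 800, n = 100) and the uncorrected margin of 3.675 from the first example.

```python
import math


def finite_population_correction(population_size, sample_size):
    """Factor sqrt((N - n) / (N - 1)) applied to the margin of error."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))


factor = finite_population_correction(800, 100)
uncorrected_margin = 1.96 * 18.75 / math.sqrt(100)   # 3.675, in thousands
corrected_margin = uncorrected_margin * factor

print(f"Correction factor: {factor:.4f}")
print(f"Corrected margin:  {corrected_margin:.4f}")
print(f"Interval: ({84.75 - corrected_margin:.2f}, {84.75 + corrected_margin:.2f})")
```

The factor is always below 1 when the sample is more than one observation, so applying it can only tighten the interval.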

59

7.1 Briefly explain how the width of a confidence interval decreases

with an increase in the sample size. Give me an example.

7.2 Briefly explain how the width of a confidence interval decreases

with a decrease in the confidence level. Give me an example.

7.3 According to a study done by Dr. Martha S. Linet and others, the mean

duration of the recent headache was 8.2 hours for a sample of 5055

females aged 12 through 29. Assume that this sample represents the

current population of all headaches for all females aged 12 through 29

and that the standard deviation for this sample is 2.4 hours. Make a 95%

confidence interval for the mean duration of all headaches for all 12-to-29-

year-old females. Do by hand and

using The Plan-alyzer.

7.4 A sample of 12 observations was drawn from a population of size 100.

Calculate a 95% confidence interval around the average for this sample

by hand. HINT: You will want to use the finite population correction factor for this

problem as found on slides 57 and 58.

13 15 9 11 8 16 14 9 10 14 16 12

Section 7 Exercises

60

7.5 A company wants to estimate the mean net weight of its “Big Top Circus”

cereal boxes. A sample of 16 such boxes produced the mean net weight of

31.98 ounces with a standard deviation of .26 ounces. Make a 95%

confidence interval for the mean net weight of all boxes. Do by hand and

using The Plan-alyzer.

7.6 Crate and Barrel Cataloger promises its customers that the products

ordered will be mailed within 72 hours after an order is placed. The quality

control department at the company checks from time to time to see if this

promise is fulfilled. Recently, the quality control department took a sample

of 50 orders and found that 42 of them were mailed within 72 hours of the

placement of the orders.

a) Construct a 95% confidence interval for the percentage of all orders

that are mailed within 72 hours of their placement. Do by hand and

using The Plan-alyzer.

b) Suppose the confidence interval obtained in part a is too wide. How

can the width of this interval be reduced? Discuss all possible

alternatives. Which of these alternatives is the best?

Section 7 Exercises

61

7.7 In virtual reality a person views a computer-generated scene that changes

as if the viewer’s body were in motion. Some individuals experience

unpleasant side effects from virtual reality, such as nausea, dizziness, or

disorientation. In a recent study by Clare Tegan of Britain’s Defense

Research Agency, each of the 150 people included in the study spent 20

minutes wearing a head-mounted virtual reality system through which he

or she explored a virtual environment consisting of a series of rooms.

Either during their time in the virtual environment or in the 10 minutes

immediately afterward, 61% of these 150 persons suffered side effects.

Find the 95% confidence interval for the proportion of all virtual reality users who would suffer side effects. Do by hand and using The Plan-alyzer.

Section 7 Exercises

62

7.8 One of the major problems faced by department stores is a high

percentage of returns. The manager of Macy’s wanted to estimate the

percentage of all sales that result in returns. A sample of 40 sales showed

that 8 of them had products returned within the time allowed for returns.

a) Construct a 99% confidence interval for the percentage of all sales

that result in returns. Do by

hand and using The Plan-alyzer.

b) Do you think 99% confidence is appropriate in this case and if not

what would be a more appropriate level of confidence to use?

7.9 According to a survey, the mean price of gasoline in the U.S. was

$1.20 per gallon in 1995 and $1.10 per gallon in 1994 (Wow, don’t you

wish!) Suppose these means were based on random samples of 100 gas stations for 1995 and 120 gas stations for 1994. Also, assume that the sample standard deviations were $.11 for 1995 and $.10 for 1994. Find a 90% confidence interval for the difference between the mean gasoline prices for 1995 and 1994. Do by hand and using The Plan-alyzer.

Section 7 Exercises

63

7.10 An insurance company wants to know if the average speed at which men

drive cars is higher than that of women drivers. The company took a

random sample of 27 cars driven by men on a highway and found the

mean speed to be 68 miles per hour with a standard deviation of 2.2 miles per hour.

Another sample of 18 cars driven by women on the same highway gave a

mean speed of 65 miles per hour with a standard deviation of 2.5 miles per hour.

Assume that the speeds at which all men and all women drive cars on this

highway are both known to be normally distributed. Construct a 99%

confidence interval for the difference between the mean speeds of cars

driven by all men and all women drivers on this highway. Do by hand and

using The Plan-alyzer.

Section 7 Exercises

64

7.11 Removed.

7.12 In a Prevention magazine survey released in 2008, Princeton Survey

Research Association examined the weight of children aged 3 to 17.

According to this study, 24% of children in this age group were

overweight in 2000, and 31% were considered overweight in 2008.

Suppose that these percentages are based on random samples of 400

and 500 children in the given age group in 2000 and 2008, respectively.

Construct a 95% confidence interval for the difference between the

proportions of overweight 3-to-17-year-olds in 2000 and 2008. Do by

hand and using The Plan-alyzer.

Section 7 Exercises

**CDC CASE STUDY ASSIGNMENT**

**Rising Insurance Premium for the Northeast**

You are a journalist for the New York Times. Your specialty is research.

Some readers have written to you complaining that their health insurance premiums are going up while that is not the case for people living in the Midwest. Upon calling several insurance companies (at your readers’ prompting), you were told the same thing –

“Those residing in the Northeast compared to those residing in the Midwest live a more risky lifestyle in terms of alcohol consumption, driving habits, etc. As such their premiums unfortunately are going up while those living in the Midwest are not.”

You have decided to research this and write an article for the newspaper to either confirm or refute this claim by the insurance companies. You suspect they are not telling the whole story and that the cost difference is most likely a function of the higher cost of living in the Northeast.

Upon searching for various data sources to assist you in this effort, you ran across the Center for Disease Control’s “2010 Behavior Risk Factor Surveillance Survey.” This survey is conducted every 10 years and compiles the following information by state.

| Variable | Label |
| --- | --- |
| SMK | Current cigarette smokers |
| WEI | Overweight (based on government height weight formula) |
| SED | Sedentary lifestyle (less than three 20-minute exercise sessions a week) |
| ACT | No leisure time activity off of the job |
| ALC | Binge drinking (five or more drinks on occasion) |
| DWI | Drinking and driving (after too much to drink) |
| SEA | Seat-belt use (occasionally or never) |
| STATE | U.S. State (alphanumeric) |
| N | Number of people surveyed |

In this study you will note that about 1,000 people from each state were randomly chosen to estimate the true population percentages.

The file has been uploaded to Blackboard. A printout of the dataset is attached. Please note that some data is missing for certain states and is denoted with an asterisk.

__Your task:__

1. First you will need to determine which states represent the Midwest and which represent the Northeast. You can find this information as defined by the U.S. government at the web address: **www.census.gov/geo/www/us_regdiv**

2. Once determined you will then need to weight together the survey percentages based on population estimates for each state for the northeast and again for the Midwest. Use the 2000 population estimates. You can find these population estimates at the web address:

http://quickfacts.census.gov/qfd/index.html

3. Once all data is obtained and weighted together, you will then be ready to conduct some hypothesis tests to determine statistical significance. I suggest you create confidence intervals around the difference in percentages or averages. For example, you may want to determine if the percentage of those that binge drink is higher in the Northeast versus the Midwest and if so, is the difference statistically significant.

4. In addition to using the CDC data you are required to also add some census data on income (median income) and education (% with bachelor’s degree) to the mix. But you must confirm it makes sense by correlating these measures with the risky CDC behaviors at a state level. In other words, you will conduct a correlation analysis where you correlate the census median incomes and percent with a bachelor’s degree versus the CDC smoking and alcohol consumption percentages. Are they correlated and if so how?

5. Additionally, read the latest research on education and life expectancy. Google “Do Rich People Live Longer Harvard.” You may wish to cite portions of this study in your findings.

6. In addition, I want you to consider the cost of living differences for the NE states versus the Midwest states. Google “Money Magazine Cost of Living Calculator” to find a tool to help you conduct this portion of the analysis. The true reason as to why we in the NE pay more is simply because our cost of living is more. Is that true? What do these calculators suggest?

7. At this stage you will then be ready to create a final presentation. The presentations will need to display the raw data and findings in table and graph format using Excel. Be creative. (Hint: Perry loves presentations with nice backgrounds and graphics).

8. Your final report will be in PowerPoint format with no more than 15 slides.

9. Each Presentation must have at a minimum the following sections:

· Introduction

· Data Used and How Collected

· Analysis Steps

· Findings

· Conclusion

10. In the last class each person will present their findings.
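For tasks 2 and 4 above, the two calculations (a population-weighted regional percentage and a state-level correlation) might be sketched as follows. This is an illustration only: the function names are invented, and every number in the usage lines is a made-up placeholder, not CDC or census data. The assignment itself expects the work to be presented in Excel.

```python
import math


def weighted_percentage(state_rates, state_populations):
    """Combine state percentages into one regional figure, weighted by population."""
    total_population = sum(state_populations)
    weighted_sum = sum(r * p for r, p in zip(state_rates, state_populations))
    return weighted_sum / total_population


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for three hypothetical states (NOT real survey data).
binge_rates = [0.15, 0.18, 0.12]                 # share reporting binge drinking
populations = [1_000_000, 3_000_000, 500_000]    # state populations
median_incomes = [52_000, 61_000, 47_000]        # state median incomes

print(f"Regional rate: {weighted_percentage(binge_rates, populations):.4f}")
print(f"Correlation:   {pearson_correlation(median_incomes, binge_rates):.3f}")
```

The weighted regional percentages can then be fed into the difference-in-proportions interval from Section 7, using the summed survey counts as the sample sizes.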

**Census Web Site Screenshot**

1. This is where you will go. At this location you can get mini reports by state including population estimates, median income values, education levels, etc. It is a vast source of free information by state.

2. To find the map that defines what states represent Northeast versus the Midwest go to the PDF file located at: www.census.gov/geo/www/us_regdiv

**2010 CDC Survey Data Regarding Behavior Risk**