Statistics: Chapters 7 and 8, with one big assignment due 3/9/13 and one due 3/11/13

Read assignments before reaching out to me.

Due: by 3/9/13 Saturday 2 PM

The Chapter 7 and 8 assignments are at the end of all the exercises.

Chapter 7

You will be given a link; you must use information from that site for the CDC assignment. Also, 2010 is the best time frame to use for the examples.

* Chapter 8 will not be due until Monday 3/11/13, 2 PM.

Section 8

Ensure Valid Test and Survey Results Through Proper Sample Size Estimation

Rhonda Knehans Drake

Associate Professor, New York University

Data Analytics, Interpretation and Reporting
Copyright © 2013

• Before conducting surveys or marketing tests, or obtaining estimates, we need to determine how large a sample is required.

• The more accuracy required in our estimates, the more we will
need to sample.

• However, we should only sample enough to obtain the level of
accuracy needed to help us make a decision.

• Some situations will require more accuracy than others.

• The key here is to determine how much accuracy you will
need to make the decision and only sample the required number
of names.

• We will discuss 4 sample size formulas in this section.

Introduction

We will learn the following formulas for sample size estimation when concerned with:

1. The error associated with a single sample mean

2. The error associated with a single sample proportion or
response rate

3. The error associated with the difference between 2 sample
means or averages

4. The error associated with the difference between 2 sample
proportions or response rates

Sample Size Estimation

1. Sample Size Formula When Concerned with the Accuracy of a Single Sample Mean

$$ n = \frac{Z^2 S^2}{E^2} $$

Where:

• S is an estimate of the standard deviation. If unsure, use 25% of your estimated average.

• E is the ± error you can tolerate.

• Z is 1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence level.

Single Sample Mean I

Example

Every year you conduct a survey to determine student satisfaction at NYU.

The scale is 1 to 10 (1 = extremely unhappy and 10 = extremely happy).

Last year the survey yielded an average of 7.5 and a standard deviation of 1.2.

Your goal for this year was to increase satisfaction by 1 full point, to 8.5. If unsuccessful, you will not receive your bonus of $20,000.

You work hard at increasing satisfaction over the course of the year by holding town hall meetings with students, putting suggestion boxes in all dorms, upgrading housing conditions, enhancing the student union with free coffee and snacks, etc.

Single Sample Mean II

Example (continue…)

Based on what you are hearing students say, you believe you will just meet your goal of moving the needle one point, to a new average of 8.5. But you do not believe it will be much higher than this value.

How many students should you survey to ensure a tight read here, with a maximum error of ± 0.1 at 99% confidence?

Single Sample Mean II

• The resulting sample size is:

Single Sample Mean III

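$$ n = \frac{(2.575)^2 (1.2)^2}{(0.1)^2} = \frac{(6.63)(1.44)}{0.01} \approx 954.7 $$

Rounding up (always round a sample size up to the next whole number), survey 955 students.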
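The formula can also be checked outside the Plan-alyzer. The Python sketch below is a minimal illustration, not part of the course materials. `z_for` and `n_single_mean` are hypothetical helper names. `z_for` uses the standard library's `statistics.NormalDist` to show where the 1.645, 1.96 and 2.575 multipliers come from. It returns 2.576 at 99%, which the slides round to 2.575. The sketch rounds up, on the assumption that a fractional answer such as 954.8 students should become 955.

```python
import math
from statistics import NormalDist


def z_for(confidence: float) -> float:
    """Two-sided Z multiplier for a confidence level (0.95 -> about 1.96)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_single_mean(s: float, e: float, z: float) -> int:
    """Sample size n = Z^2 S^2 / E^2, rounded up to a whole respondent."""
    return math.ceil((z * s / e) ** 2)


# Student-satisfaction example: S = 1.2, E = 0.1, Z = 2.575 (99%)
print(n_single_mean(1.2, 0.1, 2.575))  # 955
# Same S at a looser tolerance of +/- 0.5
print(n_single_mean(1.2, 0.5, 2.575))  # 39
print(round(z_for(0.99), 3))           # 2.576
```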

• Let’s do the previous example again but using the Plan-alyzer.

Select the tab “Table of Calculators”

Select “Sample Size Calculators for Averages”

Select “One Sample”

Input the required info.

See the answer.

• Is a 99% confidence level appropriate here, in your opinion? What would be a more appropriate level to use?

• Had you felt you had moved the needle by almost 1.5 points instead of only one point, could you have sampled fewer names and tolerated more error without putting your bonus in jeopardy? Explain.

Single Sample Mean VII

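A 99% confidence level is a bit extreme for a survey.

A 90% to 95% level is more appropriate.

If satisfaction had moved to close to 9.0, an error of ± 0.5 is still tolerable, since even the low end of the interval (about 8.5) would meet the goal. Even keeping the 99% multiplier (Z² = 6.63), the required sample drops sharply:

$$ n = \frac{(6.63)(1.44)}{(0.5)^2} = \frac{9.5472}{0.25} \approx 38.2 $$

so about 39 students would suffice.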

2. Sample Size Formula When Concerned with the Accuracy of a Single Sample Proportion or Response Rate

$$ n = \frac{Z^2\, p(1-p)}{E^2} $$

Where:

• p is an estimate of the population proportion. You will base this figure on prior experience.

• E is the ± error you can tolerate.

• Z is 1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence level.

Single Sample Proportion I

Example

You are about to test a new prospect list.

You expect the response rate of this new list to be somewhere around 1%, based on your list broker’s experience.

Your break-even (the lowest response rate you can tolerate) for prospecting is 0.9%.

How many names should you sample so that, should the response rate come in at 1%, you will be able to make a decision about using the entire list?

Single Sample Proportion II

• So, with E = 1% − 0.9% = 0.1% and 95% confidence:

$$ n = \frac{(1.96)^2 (0.01)(0.99)}{(0.01 - 0.009)^2} = \frac{(3.8416)(0.0099)}{0.000001} \approx 38{,}032 $$

• Doing so ensures that, should the response rate of the test come in at 1%, the resulting confidence interval will look like

$$ 0.01 \pm 1.96\sqrt{\frac{(0.01)(0.99)}{38{,}032}} = 0.01 \pm 0.001 $$

(0.9%, 1.1%)

and we can make our decision knowing that, even in the worst case, our response rate is at or above break-even!
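Here is a minimal Python sketch of formula 2 and the follow-up interval check. It assumes the inputs from the prospect-list example: p = 0.01, E = 0.01 − 0.009 = 0.001 and Z = 1.96. The function names are hypothetical, and the sample size is rounded up to a whole name.

```python
import math


def n_single_proportion(p: float, e: float, z: float) -> int:
    """Sample size n = Z^2 p(1 - p) / E^2, rounded up to a whole name."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


def proportion_margin(p: float, n: int, z: float) -> float:
    """Half-width Z * sqrt(p(1 - p) / n) of the interval around p."""
    return z * math.sqrt(p * (1 - p) / n)


# Prospect-list example: expect 1%, break-even 0.9%, so E = 0.001 at 95%
n = n_single_proportion(0.01, 0.001, 1.96)
print(n)                                           # 38032
print(round(proportion_margin(0.01, n, 1.96), 4))  # 0.001
```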

Single Sample Proportion III


• Let’s do the previous example again but using the Plan-alyzer.

Select the tab “Table of Calculators”

Select “Sample Size Calculators for Percentages”

Select “One Sample”

Input the required info.

See the answer.

3. Sample Size Estimation When Concerned with Accurately Measuring the Difference Between 2 Means or Averages

$$ n_1 = n_2 = \frac{Z^2 (S_1^2 + S_2^2)}{d^2} $$

Where:

• d is the minimum difference you wish to detect as significant should it be observed.

• S1 and S2 are estimates of the standard deviation associated with each sample. In most cases you will use the same estimate for both samples and, if unsure, you will use 25% of your expected average.

• Z is 1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence level.

Difference Between Sample Means I

Example

You work for MasterCard and you wish to test an incentive for new card members to increase spend over their first 3 months as a card holder.

Based on a break-even analysis, you will need spend to increase by $5 to cover the costs of your incentive (a few bonus sky miles that cost you about 10 cents per card member). Currently, new card members spend on average $325 over the first 3 months, with a standard deviation of $25.

How many names should we sample to ensure that, if the test yields a spending level of $330, we can read the results as significant?

Difference Between Sample Means II

Difference Between Sample Means III

$$ n_1 = n_2 = \frac{(1.96)^2 (25^2 + 25^2)}{5^2} = \frac{(3.8416)(625 + 625)}{25} \approx 192.1 $$

n1 = n2 = 193 names per group (rounded up).
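Here is the same idea for formula 3, as a hedged Python sketch with a hypothetical function name. With S1 = S2 = $25 and d = $5, the raw value is 192.08, which the sketch rounds up to 193 names per group.

```python
import math


def n_two_means(s1: float, s2: float, d: float, z: float) -> int:
    """Per-group size n1 = n2 = Z^2 (S1^2 + S2^2) / d^2, rounded up."""
    return math.ceil(z ** 2 * (s1 ** 2 + s2 ** 2) / d ** 2)


# MasterCard example: S = $25 in both groups, detect d = $5 at 95%
print(n_two_means(25, 25, 5, 1.96))  # 193
```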

• Let’s do the previous example again but using the Plan-alyzer.

Select the tab “Table of Calculators”

Select “Sample Size Calculators for Averages”

Select “Test vs. Control”

Input the required info.

See the answer.

4. Sample Size Estimation When Concerned with Accurately Measuring the Difference Between 2 Proportions or Response Rates

$$ n_1 = n_2 = \frac{Z^2 \left[ p_1(1-p_1) + p_2(1-p_2) \right]}{d^2} $$

Where:

• p1 is an estimate of the response rate for one of the samples. You typically know the response rate of your control group.

• d is the difference you wish to detect as significant.

• p2 is p1 + d.

• Z is 1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence level.

Difference Between Sample Proportions I

Example

You are testing the addition of a premium to your control package.

Based on a break-even analysis, you determine that you need two additional orders per thousand names mailed to break even with the control.

Your control response rate is typically 1.00%.

How many names should we sample to ensure that, if we obtain two additional orders per thousand for our test package, we will be able to detect it as a significant increase?

Difference Between Sample Proportions II

Difference Between Sample Proportions III

$$ n_1 = n_2 = \frac{(1.96)^2 \left[ (0.01)(0.99) + (0.012)(0.988) \right]}{(0.002)^2} = \frac{(3.8416)(0.0099 + 0.011856)}{0.000004} \approx 20{,}894.5 $$

n1 = n2 = 20,895 names per panel (rounded up).
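Here is formula 4 in the same hypothetical Python style, assuming p2 = p1 + d as defined above. The raw result is about 20,894.5, so rounding up gives 20,895 names per panel.

```python
import math


def n_two_proportions(p1: float, d: float, z: float) -> int:
    """Per-group size Z^2 [p1(1-p1) + p2(1-p2)] / d^2 with p2 = p1 + d, rounded up."""
    p2 = p1 + d
    return math.ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / d ** 2)


# Premium test: control 1.00%, detect 2 more orders per thousand (d = 0.002) at 95%
print(n_two_proportions(0.01, 0.002, 1.96))  # 20895
```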

• Let’s do the previous example again but using the Plan-alyzer.

Select the tab “Table of Calculators”

Select “Sample Size Calculators for Percentages”

Select “Test vs. Control”

Input the required info.

See the answer.

There are two break-evens we typically calculate as marketers:

• The break-even response rate required for a new list or product test, such that revenue exactly offsets costs.

• The break-even for a new and more expensive format or creative test, such that the net profit generated equals that of the control format.

Break-Even Analysis

The break-even response rate for a new list or product test is the lowest response rate you can tolerate and not lose any money. It is easily calculated. It is the response rate such that:

Revenue – Costs = $0, or

(MQ x RR x PPP) – (MQ x PC) = $0

Where: MQ = Mail Quantity

RR = Response Rate

PPP = Profit Prior Promotional Costs

PC = Promotional Costs

Break-Even Response I

By rearranging the formula and solving for RR, we find that the break-even response rate is equal to:

RR = PC / PPP

Break-Even Response II

Consider the following example:

Assume you sell collector plates via direct mail. The average profit per order before promotional costs is $55.00. You are planning to test a new list on the market that will cost you $650.00 per 1,000 names promoted.

What is the minimum response rate you must achieve on this list test in order to break even and not lose any money?

And, if you have historically never seen a response rate above 1.00% (regardless of how good the list is), do you recommend testing the list?

Break-Even Response III

Break-even is calculated as:

Our decision to test this list is:

Break-Even Response IV

RR = PC / PPP

= 0.65 / 55

= 0.0118 or 1.18%

No. We will most likely not see this level of response, since it is above the 1.00% we have never exceeded, so do not test.
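The break-even response rate is a one-line calculation. This small Python sketch uses a hypothetical function name. It also converts the list cost from a per-thousand figure to a per-name figure, which is the step that makes PC = $0.65 here.

```python
def breakeven_rr(cost_per_name: float, profit_per_order: float) -> float:
    """Break-even response rate RR = PC / PPP (the mail quantity cancels out)."""
    return cost_per_name / profit_per_order


# Collector-plate list: $650 per 1,000 names, $55.00 profit per order
rr = breakeven_rr(650 / 1000, 55.00)
print(f"{rr:.2%}")  # 1.18%
print(rr > 0.01)    # True: above the 1.00% historical ceiling, so do not test
```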

The increase in response that must be obtained on a new and more expensive format test in order to generate at least the same profit as the control format is the response rate such that:

Test Rev – Test Cost = Control Rev – Control Cost, or

(MQ*RRT*PPP) – (MQ*PCT) = (MQ*RRC*PPP) – (MQ*PCC)

Where: MQ = Mail Quantity

RRT = Test Response Rate

RRC = Control Response Rate

PPP = Profit Prior Promotional Costs

PCT = Test Promotional Costs

PCC = Control Promotional Costs

Increase in Response Required Break – Even

Consider the following example:

Your current control format is known to yield a 5% response rate and has a promotional cost of $1 per piece. The profit prior promotional costs per order is $30.

Your creative director has come up with a new format, but it is quite expensive. This new format will cost you $1.75 per piece to mail.

What is the increase in response required for this new format to break even with the control format?

Increase in Response Required Break – Even

Break-even for the new format test is calculated as:

Increase in Response Required Break – Even

(1,000*RRT*30) – (1,000*1.75) = (1,000*0.05*30) – (1,000*1)

Divide both sides by 1,000

(30*RRT) – (1.75) = (1.5) – (1)

30*RRT = 2.25

RRT = 0.075 or 7.50%, an increase of 2.5 percentage points (50%) over the control’s 5.00% response rate.
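Dividing the equal-profit equation above by MQ and solving for RRT gives RRT = RRC + (PCT − PCC) / PPP. That general rearrangement is not written on the slides, but it follows directly from them. The Python sketch below (hypothetical function name) applies it to the new-format example.

```python
def breakeven_test_rr(control_rr: float, ppp: float,
                      test_cost: float, control_cost: float) -> float:
    """Test response rate at which test profit equals control profit.

    From RRT*PPP - PCT = RRC*PPP - PCC (mail quantity divided out):
    RRT = RRC + (PCT - PCC) / PPP
    """
    return control_rr + (test_cost - control_cost) / ppp


# New format: control 5% at $1.00 per piece, test $1.75 per piece, $30 PPP
rrt = breakeven_test_rr(0.05, 30.0, 1.75, 1.00)
print(f"{rrt:.2%}")         # 7.50%
print(f"{rrt - 0.05:.2%}")  # 2.50% increase needed
```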

8.1 A researcher wants to determine a 95% confidence interval for the mean number of hours that high school students spend doing homework per week. She believes, based on prior research, that the average study time per week is about 20 hours with a standard deviation of 7 hours. How large a sample should the researcher select this year so that the estimate will be within 1.5 hours of the population mean?

Do by hand and using the Plan-alyzer.

8.2 A U.S. government agency wants to estimate, at a 95% confidence level, the mean speed for all cars traveling on Interstate Highway I-95. From a previous study last year, the agency knows that the average is about 63 miles per hour with a standard deviation of 3.5 miles per hour. What sample size should the agency choose this year so that the estimate will be within 1.5 miles per hour of the population mean?

Do by hand and using the Plan-alyzer.

Section 8 Exercises I

8.3 Tony’s Pizza guarantees all pizza deliveries within 30 minutes of the placement of orders. The Federal Trade Commission is concerned with Tony’s advertisements and feels, based on customer complaints, that they only meet their guarantee about 50% of the time. As such, the FTC has requested that Tony’s conduct a study. What sample size should the FTC require of Tony’s to ensure the estimate obtained is within 2% of the true percentage with 99% confidence?

Do by hand and using the Plan-alyzer.

8.4 A consumer agency wants to estimate the proportion of all drivers who wear seat belts while driving. Assume that a preliminary study has shown that 76% of drivers wear seat belts while driving. How large should the sample size be so that a 99% confidence interval for the population proportion has a maximum error of .03?

Do by hand and using the Plan-alyzer.

Section 8 Exercises II

8.5 The marketing director at ACME Direct is planning to test the addition of a 4-color flyer to his current direct mail control format. The 4-color flyer will contain testimonials from famous celebrities praising the product being offered. The control format is expected to yield a 4.50% response rate. In order to cover the cost of the flyer (break-even), the test format will need to yield an additional 3 orders per thousand names promoted. To ensure the marketing director will be able to read the break-even response rate with statistical significance, how large should each test panel be? Assume a 95% confidence level.

Do by hand and using the Plan-alyzer.

Section 8 Exercises III

8.6 Jet Music is a direct marketer of music packages covering all genres. Their active music buyer market is shrinking, and fast. There is much competition. Based on prior mailings, one of Jet Music’s most popular CD packages, “Dance Till You Drop,” is known to yield a net response rate of 3.63% at a $9.97 price point. In an effort to help increase response rates, the marketing manager has tested this title at a $1 lower price. Order intake is just beginning to come in for this test. After two weeks of intake, the net response rate is approaching 3.95% and climbing. At least 6 more weeks of intake are expected. It is looking good.

The marketing manager’s boss is curious whether the test of a $1 price decrease is at the break-even response rate level yet. Calculate the minimum net order rate required to break even with the $1 price decrease test so that the marketing manager can answer her boss.

Section 8 Exercises IV

8.7 You are the marketing manager at ACME Publishing. You test promoted a new cookbook concept to a very large compiled list file and received a response rate of 2.54%. Based on an examination of age information, you notice that for those over 50 years of age you received a response rate of 4.30% (an index of 169 to total, or a 69% gain over total).

Assume the following:

– Cookbook profit prior promotion costs = $9.92

– Promotion costs including list rental costs = $0.4217 per

Should you promote those on this compiled list file who are over the age of 50 if your goal for this promotion is to break even?

Section 8 Exercises V

Section

Analyzing Our Marketing Tests, Survey Results and Other Metrics Using Confidence Intervals


• When we estimate population averages or percentages based on samples, a certain amount of error is present.

• The amount of error present is mostly a function of your sample size.

• The larger the sample, the less error in your estimates.

• Today we will learn how to place bounds around the estimates we obtain.

• With such an interval, we will then be able to say with 90%, 95% or 99% confidence that the true population value lies within these bounds.

Introduction

The eight confidence interval formulas we will discuss are for the following situations:

1. Average or mean based on large samples

2. Average or mean based on small samples

3. Response rate or survey percent based on large samples

4. Response rate or survey percent based on small samples

5. Difference between 2 averages for large samples

6. Difference between 2 averages for small samples

7. Difference between 2 response rates for large samples

8. Difference between 2 response rates for small samples

Confidence Intervals

A-B Split Tests

1. Confidence Interval for Averages or Means Based on Large Samples (n ≥ 30)

To calculate a confidence interval around a mean, the following information is required:

– The sample mean x̄ obtained from the test.

– The sample standard deviation S obtained from the test.

Many software packages, including Microsoft Excel™, can automatically calculate this value for you. (Review Section 3 for the standard deviation formula.)

– The sample size n of the test.

This is the number of observations used to calculate your mean. The sample size must be greater than or equal to 30.

– The desired confidence level: 90%, 95% or 99%.

A confidence interval constructed around the mean lets us state, with the specified level of confidence, that the true mean falls within those bounds.

Sample Means

(Large Sample)

• Once all information is known, you construct the confidence interval around the mean by adding to and subtracting from the mean a multiple of the standard deviation associated with the sample mean (S/√n). The “multiple” depends on your desired confidence level.

• The formula for a confidence interval around the mean is calculated as follows:

$$ \bar{x} \pm Z \frac{S}{\sqrt{n}} $$

Where:

• S is the standard deviation associated with the sample.

• n is the sample size.

• Z is equal to 1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence level.

Sample Means

(Large Sample)

Example:

Money Magazine conducts a survey of 100 retirees across the US and asks them how much they have in their retirement fund.

You obtain an average of $84.75 with a standard deviation of $18.75, both in thousands of dollars.

You are about to write an article based on this average, but you realize that the true average is likely somewhat higher or lower than this.

Construct a 95% confidence interval around this average.

Sample Means
(Large Sample)

$$ 84.75 \pm 1.96\left(\frac{18.75}{\sqrt{100}}\right) = 84.75 \pm 1.96(1.875) = 84.75 \pm 3.675 $$

($81,075, $88,425)

Later we will discuss how to choose the confidence level and address whether 95% was the appropriate level for this example.
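Here is a minimal Python sketch of the large-sample interval for a mean, using the Money Magazine inputs in thousands of dollars. The function name is hypothetical. Z is passed in so any of the three confidence levels can be used.

```python
import math


def mean_ci(xbar: float, s: float, n: int, z: float) -> tuple[float, float]:
    """Large-sample interval xbar +/- Z * S / sqrt(n)."""
    half = z * s / math.sqrt(n)
    return xbar - half, xbar + half


# Money Magazine retirees, in thousands of dollars, 95% confidence
lo, hi = mean_ci(84.75, 18.75, 100, 1.96)
print(round(lo, 3), round(hi, 3))  # 81.075 88.425
```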

Sample Means
(Large Sample)

• Let’s do the previous example again but using the Plan-alyzer.

Select the tab “Table of Calculators”

Select “Confidence Interval Calculators for Averages, Large Samples”

Select “One Sample”

Input the required info.

See the answer.

2. Confidence Interval for Averages or Means Based on Small Samples (n < 30)

To calculate a confidence interval around a mean, the following information is required:
– The sample mean x̄ obtained from the test.
– The sample standard deviation S obtained from the test.
Many software packages, including Microsoft Excel™, can automatically calculate this value for you. (Review Section 3 for the standard deviation formula.)
– The sample size n of the test.
This is the number of observations used to calculate your mean. Here the sample size is less than 30.
– The desired confidence level: 90%, 95% or 99%.

A confidence interval constructed around the mean lets us state, with the specified level of confidence, that the true mean falls within those bounds.

Sample Means

(Small Sample)

• The formula for a confidence interval around the mean is the same as the prior formula, except we use a value from the “t-distribution,” which applies to approximately normally distributed data:

$$ \bar{x} \pm t \frac{S}{\sqrt{n}} $$

Where:

• S is the standard deviation associated with the sample.
• n is the sample size.

• t is obtained by using the Excel function TINV, as will be seen shortly.

Sample Means

(Small Sample)

Example:

Suppose Money Magazine surveyed only 10 retirees instead of 100 as in our prior example, all else the same.

Construct a 95% confidence interval around this average.
Sample Means
(Small Sample)

We construct the confidence interval as before, except we use the t-distribution:

$$ 84.75 \pm t\left(\frac{18.75}{\sqrt{10}}\right) $$

where t = TINV(.05, 9) = 2.262. The first TINV argument is (1 − confidence level) and the second is (n − 1).

$$ 84.75 \pm (2.262)\left(\frac{18.75}{3.1623}\right) = 84.75 \pm (2.262)(5.93) = 84.75 \pm 13.41 $$

($71,340, $98,160)

Sample Means

(Small Sample)

Note our confidence interval is wider for two reasons:

1. The smaller sample size.

2. Because our sample is smaller than 30, we cannot assume it is normal, only approximately normal, so our multiplier is larger (2.262 versus 1.96).
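The small-sample version only changes the multiplier. Python's standard library has no t-distribution quantile, so this sketch takes t as an argument (for example, the 2.262 from TINV(.05, 9)). If SciPy is installed, `scipy.stats.t.ppf(0.975, 9)` returns the same two-tailed multiplier. The function name is hypothetical.

```python
import math


def mean_ci_t(xbar: float, s: float, n: int, t: float) -> tuple[float, float]:
    """Small-sample interval xbar +/- t * S / sqrt(n).

    The t multiplier is passed in (e.g. Excel's TINV(1 - confidence, n - 1))
    because Python's standard library has no t-distribution quantile.
    """
    half = t * s / math.sqrt(n)
    return xbar - half, xbar + half


# Same survey with only 10 retirees: t = TINV(.05, 9) = 2.262
lo, hi = mean_ci_t(84.75, 18.75, 10, 2.262)
print(round(lo, 2), round(hi, 2))  # 71.34 98.16
```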

Sample Means
(Small Sample)

• Let’s do the previous example again but using the Plan-alyzer.

Select the tab “Table of Calculators”

Select “Confidence Interval Calculators for Averages, Small Samples”

Select “One Sample”

Input the required info.

See the answer.

3. Confidence Intervals for Response Rates or Survey Percentages Based on Large Samples (where n*p and n*(1 – p) are both ≥ 5)

To calculate a confidence interval around a sample proportion, the following information is required:

– The sample proportion p obtained from the test.

– The sample size n of the test.
This is the number of observations used to calculate your proportion.

The sample size, when multiplied by the sample proportion and when multiplied by one minus the sample proportion, must both be greater than or equal to 5.

– The desired confidence level: 90%, 95% or 99%.

A confidence interval constructed around the sample proportion lets us state, with the specified level of confidence, that the true population proportion falls within those bounds.

Sample Proportions

(Large Sample)

• Once all information is known, you construct the confidence
interval around the sample proportion by adding and subtracting
from the sample proportion a multiple of the standard deviation
associated with the sample proportion. The “multiple” depends
on your desired confidence level.

• The formula for a confidence interval around the sample proportion is calculated as follows:

$$ \hat{p} \pm Z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} $$

• Where n is your sample size and Z is 1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence interval. The square-root term is the standard deviation for a proportion or response rate.

Sample Proportions
(Large Sample)

Example:

AT&T samples a new prospect list and sends them an offer to

order their new wireless cellular service.

They sample 10,000 prospects and receive a 0.89% response rate.

What is the margin of error at 90% confidence?

Sample Proportions
(Large Sample)

• So, for our example, the confidence interval is:

$$ \hat{p} \pm Z \cdot S_{\hat{p}} $$

$$ = 0.0089 \pm 1.645\sqrt{\frac{(0.0089)(1 - 0.0089)}{10{,}000}} = 0.0089 \pm 1.645\sqrt{\frac{(0.0089)(0.9911)}{10{,}000}} $$

$$ = 0.0089 \pm 1.645\sqrt{0.000000882} = 0.0089 \pm 1.645(0.000939) = 0.0089 \pm 0.0015 $$

(.0074, .0104) or (.74%, 1.04%)
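Here is a hedged Python sketch of the large-sample interval for a response rate. The function name is hypothetical. The sketch also floors the lower bound at zero, since a response rate cannot be negative; this has no effect on this example.

```python
import math


def proportion_ci(p: float, n: int, z: float) -> tuple[float, float]:
    """Interval p +/- Z * sqrt(p(1 - p) / n), lower bound floored at 0."""
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), p + half


# AT&T prospect list: 0.89% response on 10,000 names, 90% confidence
lo, hi = proportion_ci(0.0089, 10_000, 1.645)
print(f"{lo:.2%} {hi:.2%}")  # 0.74% 1.04%
```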

Sample Proportions
(Large Sample)


• Let’s do the previous example again but using the Plan-alyzer.

Select the tab “Table of Calculators”

Select “Confidence Interval Calculators for Percentages, Large Samples”

Select “One Sample”

Input the required info.

See the answer.

4. Confidence Intervals for Response Rates or Survey Percentages Based on Small Samples (where either n*p or n*(1 – p) is < 5)

To calculate a confidence interval around a sample proportion, the following information is required:

– The sample proportion p obtained from the test.
– The sample size n of the test.

This is the number of observations used to calculate your proportion. Use this small-sample approach when the sample size multiplied by the sample proportion, or multiplied by one minus the sample proportion, is less than 5.

– The desired confidence level: 90%, 95% or 99%.
A confidence interval constructed around the sample proportion lets us state, with the specified level of confidence, that the true population proportion falls within those bounds.
Sample Proportions
(Small Sample)

• The formula for a confidence interval around the sample proportion is the same as the prior formula, except we use a value from the “t-distribution,” which applies to approximately normally distributed data:

$$ \hat{p} \pm t\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} $$

• Where n is your sample size and t is obtained by using the Excel function TINV, as will be seen shortly. The square-root term is the standard deviation for a proportion or response rate.
Sample Proportions
(Small Sample)

Example:

Suppose AT&T only sampled 100 prospects instead of 10,000 as

in our previous example, all else the same.

What is the margin of error at 90% confidence?
Sample Proportions
(Small Sample)

We construct the confidence interval as before, except we use the t-distribution:

$$ 0.0089 \pm t\sqrt{\frac{(0.0089)(1 - 0.0089)}{100}} $$

where t = TINV(.10, 99) = 1.66.

$$ = 0.0089 \pm 1.66\sqrt{\frac{(0.0089)(0.9911)}{100}} = 0.0089 \pm 1.66\sqrt{0.0000882} = 0.0089 \pm 1.66(0.0093914) = 0.0089 \pm 0.0156 $$

= (0.00, 0.0245) or (0.00%, 2.45%)

The lower bound here cannot be negative (0.0089 − 0.0156 < 0), so we change it to zero.

Sample Proportions
(Small Sample)
In TINV(.10, 99), the first argument is (1 − confidence level) and the second is (n − 1).
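The small-sample proportion interval follows the same pattern, with a t multiplier supplied by the caller (here 1.66 from TINV(.10, 99)), because the standard library has no t quantile. Here is a Python sketch with a hypothetical function name. It floors the lower bound at zero as described above.

```python
import math


def proportion_ci_t(p: float, n: int, t: float) -> tuple[float, float]:
    """Small-sample interval p +/- t * sqrt(p(1 - p) / n), floored at zero.

    t is supplied by the caller, e.g. from Excel's TINV(1 - confidence, n - 1).
    """
    half = t * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), p + half


# AT&T with only 100 prospects: t = TINV(.10, 99) = 1.66
lo, hi = proportion_ci_t(0.0089, 100, 1.66)
print(f"{lo:.2%} {hi:.2%}")  # 0.00% 2.45%
```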

• Let’s do the previous example again but using the Plan-alyzer.

Select the tab “Table of Calculators”

Select “Confidence Interval Calculators for Percentages, Small Samples”

Select “One Sample”

Input the required info.

See the answer.

5. Confidence Interval for the Difference between 2 Means or Averages for Large Samples (n1 ≥ 30 and n2 ≥ 30)

To calculate a confidence interval around the difference between two sample means, the following information is required:

– The means of both samples (x̄1 and x̄2).

– The standard deviation of both samples (S1 and S2).

Many software packages, including Microsoft Excel™, can automatically calculate these values.

– The size of both samples (n1 and n2).

These are the number of observations that went into calculating each of your means. Both sample sizes n1 and n2 must be greater than or equal to 30.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Samples Means – Large Samples)

• Once all information is known, you construct the confidence interval around the difference between two means by adding to and subtracting from the difference in means a multiple of the standard deviation associated with the difference. The “multiple” depends on your desired confidence level.

• The formula for a confidence interval around the difference between means is calculated as follows:

$$ (\bar{x}_1 - \bar{x}_2) \pm Z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} $$

• Where Z is 1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence interval. The square-root term is the standard deviation for the difference in averages.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Means – Large Samples)

Example:

You sample 100 home sales in San Francisco and 100 home sales in NYC for 2010, with the following results (prices in thousands of dollars):

N_SF = 100     X̄_SF = $745.25

N_NYC = 100    X̄_NYC = $775.10

Is there any difference in home prices between NYC and San Francisco? Base your answer on the 95% confidence level.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Means – Large Samples)

• Assume S_SF = $40 and S_NYC = $45.

$$ (775.10 - 745.25) \pm 1.96\sqrt{\frac{40^2}{100} + \frac{45^2}{100}} = 29.85 \pm 1.96\sqrt{16 + 20.25} = 29.85 \pm 1.96\sqrt{36.25} $$

$$ = 29.85 \pm 1.96(6.021) = 29.85 \pm 11.80 $$

($18,050, $41,650)
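Here is a Python sketch of the interval for a difference between two means, applied to the home-price example. It uses prices in thousands, with S_SF = $40 and S_NYC = $45 as assumed above. The function name is hypothetical. The final line checks whether zero lies outside the interval.

```python
import math


def diff_means_ci(x1: float, s1: float, n1: int,
                  x2: float, s2: float, n2: int,
                  z: float) -> tuple[float, float]:
    """Interval (x1 - x2) +/- Z * sqrt(S1^2/n1 + S2^2/n2)."""
    diff = x1 - x2
    half = z * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return diff - half, diff + half


# NYC vs. San Francisco 2010 home prices ($ thousands), 95% confidence
lo, hi = diff_means_ci(775.10, 45, 100, 745.25, 40, 100, 1.96)
print(round(lo, 2), round(hi, 2))  # 18.05 41.65
print(lo > 0)                      # True: zero is outside, so the difference is significant
```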

Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Means – Large Samples)

How do we interpret?

• What if the interval were -$18,050 to $41,650? How would you interpret it, and is it okay in this case to have a negative value?

Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Means – Large Samples)

– With 95% confidence, we can say that NYC home prices in 2010 are higher than SF home prices by anywhere from $18,050 to $41,650.

– If zero were in the interval, you would conclude there is no significant difference between the two (a hypothesis test!).

– It means there’s no statistical evidence to conclude that NYC prices are different from those of SF.

– It’s okay to have a negative value: it means SF prices could be higher than NYC prices.

• Let’s do the previous example again but using the Plan-alyzer.

Select the tab “Table of Calculators”

Select “Confidence Interval Calculators for Averages, Large Samples”

Select “Test vs. Control”

Input the required info.

See the answer.

6. Confidence Interval for the Difference between 2 Means or
Averages for Small Samples (n1 < 30 or n2 < 30).

– If one or both samples are less than 30 in size, replace the Z value with the t value.

– You will again use the TINV function in Excel.

– Your parameters are TINV(1 − confidence level, n1 + n2 − 2).

– All else is the same.

– This will be used for small market research problems.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Means – Small Samples)

7. Confidence Intervals for the Difference Between 2 Percentages for Large Samples (n1*p1, n1*(1-p1), n2*p2, n2*(1-p2), all ≥ 5)

To calculate a confidence interval around the difference between two sample proportions, the following information is required:

– The proportions (p1 and p2) for both samples.

– The size of both samples (n1 and n2).

These are the number of observations used in calculating each of the
sample proportions. Both sample sizes, when multiplied by their
respective sample proportions and when multiplied by one minus their
respective sample proportions, must all be greater than or equal to 5.

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Proportions – Large Samples)

• Once all information is known, you construct the confidence
interval around the difference between two proportions by adding
and subtracting from the difference in proportions a multiple of the
standard deviation associated with the difference. The “multiple”
depends on your desired confidence level.

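• The formula for a confidence interval around the difference between proportions is calculated as follows:

$$ (\hat{p}_1 - \hat{p}_2) \pm Z\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} $$

• Where Z is 1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence interval. The square-root term is the standard deviation for the difference in proportions.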

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Proportions – Large Samples)

Example:

You are in charge of new card acquisitions at American Express. You conduct a new offer test for the green card versus your control offer, with the following results:

Offer                                    Sample Size   Response Rate
Control Offer with 10,000 Bonus Miles    10,000        1.10%
Test Offer with 25,000 Bonus Miles       10,000        1.38%

Did the test beat the control with 95% confidence? Do you have a winner?

Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Proportions – Large Samples)

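With p1 = 1.38% (test), p2 = 1.10% (control), and n1 = n2 = 10,000:

$$(0.0138 - 0.0110) \pm 1.96\sqrt{\frac{0.0138(0.9862)}{10{,}000} + \frac{0.0110(0.9890)}{10{,}000}} = 0.0028 \pm 1.96(0.001565) = 0.0028 \pm 0.00307$$

So how do we interpret?

• With 95% confidence, the test can do worse than the control by as much as 0.027% or do better than the control by as much as 0.587%.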

• As such we say the test and the control are not significantly different
since the confidence interval wrapped around the difference in
response rates contains zero.

• Had the lower bound been above zero then we would say the test has
beaten the control.

• But let’s be real here. For all practical purposes, the test is a winner. The lower bound is soooo close to zero. So, worst case, the test is about the same as the control, with much upside potential.
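The large-sample interval above can be sketched in Python as follows. This is a minimal sketch, assuming the unpooled standard deviation shown in the formula and the slide's rule that n*p and n*(1-p) must both be at least 5; the function names are hypothetical, and the Plan-alyzer may round differently.

```python
import math
from statistics import NormalDist


def large_sample_ok(n: int, p: float) -> bool:
    """Slide rule: n*p and n*(1 - p) must both be at least 5."""
    return n * p >= 5 and n * (1 - p) >= 5


def diff_proportion_ci(p1: float, n1: int, p2: float, n2: int,
                       confidence: float = 0.95) -> tuple:
    """Confidence interval for p1 - p2 using the unpooled standard deviation."""
    if not (large_sample_ok(n1, p1) and large_sample_ok(n2, p2)):
        raise ValueError("small samples: use the t-based interval instead")
    # Two-sided critical value: about 1.645, 1.96 or 2.576 for 90/95/99%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    sd_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * sd_diff, diff + z * sd_diff


# American Express example: test (25,000 miles) minus control (10,000 miles).
low, high = diff_proportion_ci(0.0138, 10_000, 0.0110, 10_000)
print(f"95% CI for test - control: {low:.3%} to {high:.3%}")
print("Significant" if low > 0 or high < 0 else "Not significant: interval contains zero")
```

Because the intermediate variance terms are not rounded, the margin comes out at about 0.00307, giving roughly -0.027% to 0.587%; rounding each variance term to one significant figure before taking the square root produces a slightly narrower interval.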


So how do we interpret (continuation)?

• But remember: just because the test has beaten the control from a statistical point of view does not mean that it has won from a marketing point of view.

• In this example we were giving away additional sky miles, so the test will need to beat the control by some minimum margin, most likely greater than zero, or else we will not generate the same revenue.

• Suppose I told you that, based on the cost of the additional sky miles, the test needs to beat the control by at least .025% to break even. Would you consider it a winner?

• What if I told you the test needs to beat the control by .25% to break
even. Would you now consider the test a winner?
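One way to frame the break-even questions is to ask where the break-even lift falls relative to the interval. The sketch below is an illustrative aid, not the author's method; the helper names and the three-way classification are hypothetical, and it uses Z = 1.96 as on the slides.

```python
import math


def diff_interval(p_test, n_test, p_control, n_control, z=1.96):
    """Interval for (test - control) response rate, as on the slides."""
    sd_diff = math.sqrt(p_test * (1 - p_test) / n_test
                        + p_control * (1 - p_control) / n_control)
    diff = p_test - p_control
    return diff - z * sd_diff, diff + z * sd_diff


def versus_break_even(low, high, break_even):
    """Classify where a break-even lift falls relative to the interval."""
    if low >= break_even:
        return "winner: beats break-even even in the worst case"
    if high < break_even:
        return "loser: misses break-even even in the best case"
    return "uncertain: break-even lies inside the interval"


low, high = diff_interval(0.0138, 10_000, 0.0110, 10_000)
observed = 0.0138 - 0.0110
for break_even in (0.00025, 0.0025):   # .025% and .25% from the slide
    verdict = versus_break_even(low, high, break_even)
    print(f"break-even {break_even:.3%}: observed lift {observed:.2%}, {verdict}")
```

In both cases the break-even sits inside the interval, so the numbers alone do not decide. At .025% almost the whole interval lies above break-even; at .25% the observed lift of .28% clears it only barely, and much of the interval falls below it.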


• Let’s do the previous example again, but using the Plan-alyzer.

Select the tab “Table of Calculators”.

Select “Confidence Interval Calculators for Proportions, Large Samples”.

Select “Test vs. Control”.


Input the required info.


See the answer.

8. Confidence Intervals for the Difference Between 2 Percentages for Small Samples (where n1*p1, n1*(1-p1), n2*p2, or n2*(1-p2) < 5)

• You will replace the Z value with the t value.

• You will again use the TINV function in Excel.

• Your parameters are TINV(1 - confidence level, n1 + n2 - 2).

• All else stays the same.

• This interval is used for small market research problems.
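Excel's TINV(probability, df) returns the two-tailed critical t value. The sketch below reproduces it with the standard library only, by integrating the t density with Simpson's rule and bisecting for the cutoff; it then swaps that t value in for Z as the slide describes. It is a minimal sketch under those assumptions: the function names and the 3-of-20 versus 1-of-20 test are hypothetical. If SciPy is available, `scipy.stats.t.ppf(1 - probability / 2, df)` gives the same critical value.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_area_0_to(x: float, df: int, steps: int = 2000) -> float:
    """Area under the t density from 0 to x (Simpson's rule, even steps)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3


def tinv(probability: float, df: int) -> float:
    """Two-tailed inverse t, mirroring Excel's TINV(probability, df)."""
    target = (1 - probability) / 2       # area needed between 0 and t
    lo, hi = 0.0, 1.0
    while t_area_0_to(hi, df) < target:  # widen the bracket
        hi *= 2
    for _ in range(60):                  # bisection
        mid = (lo + hi) / 2
        if t_area_0_to(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def diff_proportion_ci_small(p1, n1, p2, n2, confidence=0.95):
    """Same interval as the large-sample case, with t (df = n1 + n2 - 2) for Z."""
    t = tinv(1 - confidence, n1 + n2 - 2)
    sd_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - t * sd_diff, diff + t * sd_diff


# Hypothetical small test: 3 responders out of 20 vs. 1 out of 20.
low, high = diff_proportion_ci_small(3 / 20, 20, 1 / 20, 20)
print(f"t = {tinv(0.05, 38):.4f}, 95% CI: {low:.1%} to {high:.1%}")
```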

Confidence Interval Estimation (A-B Split Testing)

(Difference Between Two Sample Proportions – Small Samples)

• Remember, interpreting confidence intervals is not that simple.

• They will not tell you what to do.

• They simply give you valid best- and worst-case scenarios to take into consideration.

• They give you additional information on which to base your marketing decisions:

• Worst case, are the results meeting your criteria?

• How does the upside potential compare to the downside potential?

Interpretation of Confidence Intervals

• No direct marketer should ever evaluate test results at a confidence level lower than 90%. Doing so assumes far too much risk.

• And, fishing for a confidence level that yields significance should
never be practiced.

• The rules that any good direct marketer should follow regarding
significance are shown on the next slide.

Setting the Confidence Level

Evaluate your test response rate at the 95% confidence level. Significant?

• Yes: Is it significant at the 99% confidence level?
– Yes: A no-brainer. Let’s roll!
– No: That’s okay… at a minimum, let’s consider a partial to full rollout.

• No: Not that we want to go that low, but is it significant at the 90% confidence level?
– Yes: Okay, so we have something here. Let’s either retest or go for a partial rollout.
– No: Not good. We should scrap this from further consideration.
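The decision tree can be walked mechanically. This sketch assumes "significant" means the lower bound of the test-minus-control interval is above zero (as in the American Express interpretation) and reuses the large-sample proportion formula; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def beats_control(p_test, n_test, p_control, n_control, confidence):
    """True when the lower bound for (test - control) is above zero."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    sd_diff = math.sqrt(p_test * (1 - p_test) / n_test
                        + p_control * (1 - p_control) / n_control)
    return (p_test - p_control) - z * sd_diff > 0


def recommend(p_test, n_test, p_control, n_control):
    """Walk the 95% first, then 99% or 90%, decision tree from the slide."""
    args = (p_test, n_test, p_control, n_control)
    if beats_control(*args, 0.95):
        if beats_control(*args, 0.99):
            return "A no-brainer. Let's roll!"
        return "At a minimum, consider a partial to full rollout."
    if beats_control(*args, 0.90):
        return "Retest or go for a partial rollout."
    return "Scrap this from further consideration."


# American Express example: not significant at 95%, but significant at 90%.
print(recommend(0.0138, 10_000, 0.0110, 10_000))
```

The 95% level is always checked first; the 99% or 90% level is only consulted as the second step, which keeps the process from fishing for whichever level happens to yield significance.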

It is important to keep in mind the following facts regarding the creation of a confidence interval.

• If you want more confidence in your estimates, the resulting interval
will widen.

• If you increase your sample size, the resulting interval will become
tighter.

• The more accuracy you need in your test estimate, the higher you
should set your confidence level.

• The confidence level you set should depend on the risk you are
willing to take in making an incorrect decision.
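The first two facts can be checked numerically. This minimal sketch uses the z-based interval for a single mean, whose full width is 2 * Z * S / sqrt(n); the standard deviation of 10 is a hypothetical value chosen only for illustration.

```python
import math
from statistics import NormalDist


def ci_width(std_dev: float, n: int, confidence: float) -> float:
    """Full width (2 * margin of error) of a z-based interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * std_dev / math.sqrt(n)


STD_DEV = 10.0  # hypothetical standard deviation, for illustration only
for confidence in (0.90, 0.95, 0.99):
    widths = ", ".join(f"n={n}: {ci_width(STD_DEV, n, confidence):.2f}"
                       for n in (100, 400, 1600))
    print(f"{confidence:.0%} confidence -> {widths}")
```

Reading across a row, each fourfold increase in sample size halves the width; reading down a column, a higher confidence level widens it.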


• If our sample represents a large percentage (10% or more) of the total population, then we typically apply a correction factor to our margin of error estimates.

• The larger our sample is as a percentage of the total population, the more precise our estimates.

• Of the following two samples, which do you think would yield a better parameter estimate?

• A sample of size 5,000 from a population of size 10,000 in total

• A sample of size 5,000 from a population of size 10,000,000 in total.

• If our sample represents 10% or more of the total population, multiply the margin of error for the first four formulas by:

$$\sqrt{\frac{N - n}{N - 1}}$$

where N is the population size and n is the sample size.
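The correction factor and the 10% rule can be sketched as follows; this is a minimal sketch, and the helper names are hypothetical. It compares the two samples of 5,000 above.

```python
import math


def fpc(n: int, N: int) -> float:
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))


def corrected_margin(margin: float, n: int, N: int, threshold: float = 0.10) -> float:
    """Apply the correction only when the sample is at least 10% of the population."""
    return margin * fpc(n, N) if n / N >= threshold else margin


for n, N in [(5_000, 10_000), (5_000, 10_000_000)]:
    print(f"n={n:,} of N={N:,} ({n / N:.2%}): FPC = {fpc(n, N):.4f}")
```

Sampling half of a population of 10,000 shrinks the margin of error to about 71% of its uncorrected value, while the same 5,000 drawn from 10,000,000 leaves it essentially unchanged. For a survey of 100 out of 800 residents, the factor is sqrt(700/799), about 0.936.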

Finite Population Correction Factor

Example:

Going back to our first exercise, suppose the survey of 100 people was conducted among the 800 residents of the Happy Retirement Village rather than among Money Magazine subscribers.

What is our correction factor and what is the new interval?


[Worked solution, steps 1 to 3: not recoverable from the source except for the final value, 3.4398.]

7.1 Briefly explain how the width of a confidence interval decreases with an increase in the sample size. Give me an example.

7.2 Briefly explain how the width of a confidence interval decreases with a decrease in the confidence level. Give me an example.

7.3 According to a study done by Dr. Martha S. Linet and others, the mean duration of the most recent headache was 8.2 hours for a sample of 5055 females aged 12 through 29. Assume that this sample represents the current population of all headaches for all females aged 12 through 29 and that the standard deviation for this sample is 2.4 hours. Make a 95% confidence interval for the mean duration of all headaches for all 12-to-29-year-old females. Do by hand and using The Plan-alyzer.

7.4 A sample of 12 observations was drawn from a population of size 100.
Calculate a 95% confidence interval around the average for this sample
by hand. HINT: You will want to use the finite population correction factor for this
problem as found on slides 57 and 58.

13 15 9 11 8 16 14 9 10 14 16 12

Section 7 Exercises

7.5 A company wants to estimate the mean net weight of its “Big Top Circus” cereal boxes. A sample of 16 such boxes produced the mean net weight of 31.98 ounces with a standard deviation of .26 ounces. Make a 95% confidence interval for the mean net weight of all boxes. Do by hand and using The Plan-alyzer.

7.6 Crate and Barrel Cataloger promises its customers that the products ordered will be mailed within 72 hours after an order is placed. The quality control department at the company checks from time to time to see if this promise is fulfilled. Recently, the quality control department took a sample of 50 orders and found that 42 of them were mailed within 72 hours of the placement of the orders.

a) Construct a 95% confidence interval for the percentage of all orders that are mailed within 72 hours of their placement. Do by hand and using The Plan-alyzer.

b) Suppose the confidence interval obtained in part a is too wide. How can the width of this interval be reduced? Discuss all possible alternatives. Which of these alternatives is the best?


7.7 In virtual reality a person views a computer-generated scene that changes as if the viewer’s body were in motion. Some individuals experience unpleasant side effects from virtual reality, such as nausea, dizziness, or disorientation. In a recent study by Clare Tegan of Britain’s Defense Research Agency, each of the 150 people included in the study spent 20 minutes wearing a head-mounted virtual reality system through which he or she explored a virtual environment consisting of a series of rooms. Either during their time in the virtual environment or in the 10 minutes immediately afterward, 61% of these 150 persons suffered side effects. Find the 95% confidence interval for the proportion of all virtual reality users who would suffer side effects. Do by hand and using The Plan-alyzer.


7.8 One of the major problems faced by department stores is a high percentage of returns. The manager of Macy’s wanted to estimate the percentage of all sales that result in returns. A sample of 40 sales showed that 8 of them had products returned within the time allowed for returns.

a) Construct a 99% confidence interval for the percentage of all sales that result in returns. Do by hand and using The Plan-alyzer.

b) Do you think 99% confidence is appropriate in this case and if not what would be a more appropriate level of confidence to use?

7.9 According to a survey, the mean price of gasoline in the U.S. was $1.20 per gallon in 1995 and $1.10 per gallon in 1994 (Wow, don’t you wish!). Suppose these means were based on random samples of 100 gas stations for 1995 and 120 gas stations for 1994. Also, assume that the sample standard deviations were $.11 for 1995 and $.10 for 1994. Find a 90% confidence interval for the difference between the mean gasoline prices for 1995 and 1994. Do by hand and using The Plan-alyzer.

7.10 An insurance company wants to know if the average speed at which men drive cars is higher than that of women drivers. The company took a random sample of 27 cars driven by men on a highway and found the mean speed to be 68 miles per hour with a standard deviation of 2.2 miles per hour. Another sample of 18 cars driven by women on the same highway gave a mean speed of 65 miles per hour with a standard deviation of 2.5 miles per hour. Assume that the speeds at which all men and all women drive cars on this highway are both known to be normally distributed. Construct a 99% confidence interval for the difference between the mean speeds of cars driven by all men and all women drivers on this highway. Do by hand and using The Plan-alyzer.


7.11 Removed.

7.12 In a Prevention magazine survey released in 2008, Princeton Survey Research Associates examined the weight of children aged 3 to 17. According to this study, 24% of children in this age group were overweight in 2000, and 31% were considered overweight in 2008. Suppose that these percentages are based on random samples of 400 and 500 children in the given age group in 2000 and 2008, respectively. Construct a 95% confidence interval for the difference between the proportions of overweight 3-to-17-year-olds in 2000 and 2008. Do by hand and using The Plan-alyzer.


CDC CASE STUDY ASSIGNMENT

Rising Insurance Premium for the Northeast

You are a journalist for the New York Times. Your specialty is research.

Some readers have written to you complaining that their health insurance premiums are going up while premiums for people living in the Midwest are not. Upon calling several insurance companies (at your readers’ prompting), you were told the same thing:

“Those residing in the Northeast compared to those residing in the Midwest live a more risky lifestyle in terms of alcohol consumption, driving habits, etc. As such their premiums unfortunately are going up while those living in the Midwest are not.”

You have decided to research this and write an article for the newspaper to either confirm or refute this claim by the insurance companies. You suspect they are not telling the whole story and that the cost difference is most likely a function of the higher cost of living in the Northeast.

Upon searching for various data sources to assist you in this effort, you ran across the Centers for Disease Control and Prevention’s “2010 Behavior Risk Factor Surveillance Survey.” This survey is conducted annually and compiles the following information by state.

Variable	Label
SMK	Current cigarette smokers
WEI	Overweight (based on government height-weight formula)
SED	Sedentary lifestyle (fewer than three 20-minute exercise sessions a week)
ACT	No leisure-time activity outside of the job
ALC	Binge drinking (five or more drinks on an occasion)
DWI	Drinking and driving (after too much to drink)
SEA	Seat-belt use (occasionally or never)
STATE	U.S. State (alphanumeric)
N	Number of people surveyed

In this study you will note that about 1,000 people from each state were randomly chosen to estimate the true population percentages.

The file has been uploaded to Blackboard. A printout of the dataset is attached. Please note that some data is missing for certain states and is denoted with an asterisk.

Your task:

1. First you will need to determine which states represent the Midwest and which represent the Northeast. You can find this information as defined by the U.S. government at the web address: www.census.gov/geo/www/us_regdiv

2. Once the regions are determined, you will need to combine the survey percentages for the Northeast, weighting each state by its population estimate, and then do the same for the Midwest. Use the 2000 population estimates. You can find these population estimates at the web address:

http://quickfacts.census.gov/qfd/index.html

3. Once all data is obtained and weighted together, you will then be ready to conduct some hypothesis tests to determine statistical significance. I suggest you create confidence intervals around the difference in percentages or averages. For example, you may want to determine if the percentage of those that binge drink is higher in the Northeast versus the Midwest and if so, is the difference statistically significant.

4. In addition to using the CDC data, you are required to add some census data on income (median income) and education (% with a bachelor’s degree) to the mix. But you must first confirm that these measures are relevant by correlating them with the risky CDC behaviors at the state level. In other words, you will conduct a correlation analysis in which you correlate the census median incomes and percent with a bachelor’s degree against the CDC smoking and alcohol consumption percentages. Are they correlated, and if so, how?

5. Additionally, read the latest research on education and life expectancy. Google “Do Rich People Live Longer Harvard.” You may wish to cite portions of this study in your findings.

6. In addition, I want you to consider the cost-of-living differences for the NE states versus the Midwest states. Google “Money Magazine Cost of Living Calculator” to find a tool to help you conduct this portion of the analysis. Is the true reason we in the NE pay more simply that our cost of living is higher? What do these calculators suggest?

7. At this stage you will be ready to create a final presentation. The presentation will need to display the raw data and findings in table and graph format using Excel. Be creative. (Hint: Perry loves presentations with nice backgrounds and graphics.)

8. Your final report will be in PowerPoint format with no more than 15 slides.

9. Each presentation must have, at a minimum, the following sections:

· Introduction

· Data Used and How Collected

· Analysis Steps

· Findings

· Conclusion

10. In the last class, each person will present their findings.
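Steps 2 and 4 of the task (population weighting and state-level correlation) can be sketched as follows. This is a minimal sketch, assuming a simple population-weighted average and a Pearson correlation (the assignment does not name a correlation method). States marked with an asterisk in the CDC file are represented as None and skipped. The function names and all numbers below are hypothetical placeholders, not real CDC or census data.

```python
import math


def weighted_regional_pct(pcts, populations):
    """Population-weighted average; states with missing data (None) are skipped."""
    pairs = [(p, w) for p, w in zip(pcts, populations) if p is not None]
    total_pop = sum(w for _, w in pairs)
    return sum(p * w for p, w in pairs) / total_pop


def pearson_r(xs, ys):
    """Pearson correlation over the states where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placeholder values for five made-up states, NOT real data.
binge_pct = [18.0, None, 15.0, 17.0, 14.0]   # None marks an asterisk (missing)
population = [2_000_000, 1_000_000, 3_000_000, 1_500_000, 2_500_000]
median_income = [52_000, 48_000, 45_000, 50_000, 44_000]

print(f"Weighted binge-drinking %: {weighted_regional_pct(binge_pct, population):.2f}")
print(f"r(income, binge %): {pearson_r(median_income, binge_pct):.3f}")
```

Running the weighting once with the Northeast states and once with the Midwest states gives the two regional percentages to compare with a confidence interval around their difference.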

Census Website Screenshot

1. This is where you will go. At this location you can get mini reports by state including population estimates, median income values, education levels, etc. It is a vast source of free information by state.

2. To find the map that defines what states represent Northeast versus the Midwest go to the PDF file located at: www.census.gov/geo/www/us_regdiv

2010 CDC Survey Data Regarding Behavior Risk
