# Marketing Assignments

All of the exercises at the end of the chapters must be completed (Microsoft Excel). The cancer graphing assignment does not have a course attached.

Section 3

Measures of Central Tendency

Rhonda Knehans Drake

Associate Professor, New York University

Data Analytics, Interpretation and Reporting


• One way to aid in better understanding your sample data is with
descriptive measures or statistics.

• Each basic summary statistic has its own unique purpose and,
therefore, each plays a critical role in helping you describe and
understand your data. However, not fully understanding what
each of these statistics measures, how it is calculated, or when
to use one versus another can cause you to draw erroneous
conclusions.

• For example, did you know that the average or mean is quite
sensitive to extreme data values (outliers), which could cause you
to draw incorrect conclusions from the data? Do you know
what to do in this case?

• This section will show you how to calculate the basic measures
of central tendency.

Introduction


• One way to gain a better understanding of your quantitative data is
with measures of central tendency.

• These measures tell, for example, where the center of the distribution
of income levels lies.

• The Three Measures of Central Tendency are:

– Mean

– Median

– Mode

Measures of Central Tendency


• The Mean is the sum of all observations divided by the number of
observations.

• The mean or average is the most widely used measure of central
tendency of a set of observations.

• We will denote the sample mean as x̄ (pronounced “x bar”), the
population mean as µ (the Greek letter mu), the number of
observations in a sample as n, the number of observations in a
population as N, the observations for the variable of concern as x1,
x2, x3, x4, …, and the sum of all observations as Σx (Σ is the
uppercase Greek letter sigma).

• With the sample mean we are estimating the population mean
denoted as µ.

The Mean I


Example: Suppose the following are the prices of 5 houses sold in
Seattle, in thousands of dollars.

158 189 265 127 191

What is the mean?
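
As a minimal sketch of this calculation, the mean of the five house prices can be checked in Python using only the standard library. The variable names are illustrative and not part of the course material.

```python
from statistics import mean

# Prices of the 5 houses sold in Seattle, in thousands of dollars
prices = [158, 189, 265, 127, 191]

# Mean = sum of all observations / number of observations
total = sum(prices)
n = len(prices)
sample_mean = total / n

print(total, n, sample_mean)  # 930 5 186.0

# The library function gives the same answer
assert sample_mean == mean(prices)
```

So the mean price is \$186 (in thousands).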

The Mean II


• Sometimes a data set contains outliers, which are extremely
low or high values. They may or may not be legitimate.

Example: Consider our sample prices of houses. Assume the 265 is instead 982.

158 189 982 127 191

This outlier value of 982 pulls the mean up, so the mean no longer
reflects a typical price.

Old mean = \$930 / 5 = \$186 (in thousands)

New mean = \$1,647 / 5 ≈ \$329 (in thousands)

So, what do we do in this case?

The Mean III


• Median represents the “exact middle” observation for the variable
of concern when the values in your sample are ranked from
lowest to highest.

• Median is also an important measure of central tendency and is
not as sensitive to outliers.

The median is determined by performing the following steps:

1. Rank the observations in your data set from the lowest value to
the highest value.

2. Select the observation in position (n + 1)/2 of this ranked data set,
where n is the size of the sample drawn.

If the sample size, n, is an even number, then position (n + 1)/2 falls
exactly between two observations. In this case, the median is
simply the average of these two observations.

The Median I


Example: Consider the following 5 observations which are weight
loss figures in pounds at a health club after 4 weeks for new
members.

10 5 19 8 3

What is the median?

The Median II

3 5 8 10 19
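
The two steps for finding the median can be sketched in Python as follows. The function name `median_by_rank` is a hypothetical helper written for this illustration. The second example reuses the house prices with the 982 outlier to show that the median is far less sensitive to it than the mean.

```python
from statistics import mean, median

def median_by_rank(values):
    """Median via the two steps: rank, then take position (n + 1) / 2."""
    ranked = sorted(values)              # step 1: rank lowest to highest
    n = len(ranked)
    middle = (n + 1) / 2                 # step 2: 1-based position
    if n % 2 == 1:
        return ranked[int(middle) - 1]
    # Even n: the position falls between two observations, so average them
    lower = ranked[int(middle) - 1]
    upper = ranked[int(middle)]
    return (lower + upper) / 2

weight_loss = [10, 5, 19, 8, 3]
print(sorted(weight_loss))               # [3, 5, 8, 10, 19]
print(median_by_rank(weight_loss))       # 8

# House prices with the 982 outlier: the mean jumps, the median stays at 189
prices_with_outlier = [158, 189, 982, 127, 191]
print(mean(prices_with_outlier))             # 329.4
print(median_by_rank(prices_with_outlier))   # 189

# The library function agrees with the step-by-step version
assert median_by_rank(weight_loss) == median(weight_loss)
```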


Because the mean can be influenced by outliers, the best
practice is to show both the mean and the median on
corporate-level reports and dashboards.

Corporate Reporting


• The mode is simply the most frequently occurring observation for
the variable of concern in your sample.

• The mode is a less commonly used measure of central tendency.

The mode is determined by the following rules:

1. The most frequently occurring observation in the data set.

2. If all observations within the data set occur the same number of
times, there is no mode.

3. If there is a tie for the most frequently occurring observation in the
data set, the data set has multiple modes.

There can be more than one mode in a set of values.

The Mode I


Example: Suppose the following are the ages of 10 college graduates from NYU.

23 25 24 23 24 23 20 23 22 30

What is the mode?

The Mode II

| Observation | Occurrences |
|---|---|
| 20 | 1 |
| 22 | 1 |
| 23 | 4 |
| 24 | 2 |
| 25 | 1 |
| 30 | 1 |
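
The occurrence counts and the three mode rules can be sketched in Python. The helper `modes` is hypothetical and written for this illustration. It returns every value tied for the highest count, or an empty list when all values occur equally often.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes; an empty list means there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # Rule 2: every observation occurs the same number of times -> no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    # Rules 1 and 3: the most frequent value(s); ties give multiple modes
    return sorted(v for v, c in counts.items() if c == highest)

ages = [23, 25, 24, 23, 24, 23, 20, 23, 22, 30]
print(sorted(Counter(ages).items()))
# [(20, 1), (22, 1), (23, 4), (24, 2), (25, 1), (30, 1)]
print(modes(ages))              # [23]

print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (tie: multiple modes)
print(modes([1, 2, 3]))         # []      (no mode)
```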


The relationship between the mean, median, and mode is shown below for various data sets.

Mean = Median = Mode: The mean, median, and mode are all equal when the distribution is symmetric and bell-shaped.

Mean < Median < Mode: The mean is less than the median and mode when the distribution is skewed to the left.

Mode < Median < Mean: The mean is greater than the median and mode when the distribution is skewed to the right.

Mean vs. Median vs. Mode


Let’s do an in-class example.

Suppose we sample 10 customers from the database and note their online
expenditures for 2010.

\$27 \$55 \$42 \$38 \$75 \$62 \$54 \$21 \$398 \$42

Calculate the mean and median, and explain what is happening here.

In Class Example

21 27 38 42 42 54 55 62 75 398

Mean:

814 / 10 = 81.4

Median:

(42 + 54) / 2 = 48

What measure of central tendency is best?
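
A short Python sketch of the in-class example makes the effect of the \$398 observation visible. The comparison without the outlier is an added illustration, not part of the original slide.

```python
from statistics import mean, median

# Online expenditures of the 10 sampled customers, in dollars
spend = [27, 55, 42, 38, 75, 62, 54, 21, 398, 42]

print(sorted(spend))   # [21, 27, 38, 42, 42, 54, 55, 62, 75, 398]
print(mean(spend))     # 81.4
print(median(spend))   # 48.0

# How many customers spent less than the mean?
below_mean = sum(1 for x in spend if x < mean(spend))
print(below_mean)      # 9

# Same statistics with the 398 observation set aside
typical = [x for x in spend if x != 398]
print(round(mean(typical), 2))  # 46.22
print(median(typical))          # 42
```

Nine of the ten customers spent less than the mean, so here the median describes the typical customer better than the mean does.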


How does Excel do this for us?

Consider the table below, which shows the ages of 10 college graduates from NYU.

| Age |
|---|
| 23 |
| 25 |
| 24 |
| 23 |
| 24 |
| 23 |
| 20 |
| 23 |
| 22 |
| 30 |

Calculating the Mean, Median and Mode I

(Using Excel – PC or Mac prior version 8)


To obtain the measures of central tendency in Excel, first go to the
Data tab, click Data Analysis, and then select “Descriptive Statistics.”

Calculating the Mean, Median and Mode II

(Using Excel – PC or Mac prior version 8)


Highlight your data in the “Input Range,” check “Labels,” choose
your “Output Range,” and then check “Summary Statistics.”

Calculating the Mean, Median and Mode III

(Using Excel – PC or Mac prior version 8)


Then you have the mean, median and mode of the data set.

Calculating the Mean, Median and Mode IV

(Using Excel – PC or Mac prior version 8)

Skewness and kurtosis measure how skewed the distribution is and how peaked it is.

The skewness measure indicates the level of non-symmetry. If the distribution of the data is symmetric, then skewness will be close to 0 (zero). The further from 0, the more skewed the data. A negative value indicates a skew to the left, and a positive value a skew to the right. Here we note the data is slightly skewed to the right.

Kurtosis is a measure of the peakedness of the data. Again, for data that is not excessively peaked, kurtosis is close to 0 (zero). In this case our data is somewhat peaked.
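
To see where these two numbers come from, here is a sketch in Python. It assumes the bias-adjusted sample formulas that Excel documents for its SKEW and KURT functions. The slides do not state which formulas the Descriptive Statistics tool uses, so treat the exact values as illustrative. The function names are hypothetical.

```python
from math import sqrt

def _mean_and_sd(values):
    """Sample mean and sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    m = sum(values) / n
    s = sqrt(sum((x - m) ** 2 for x in values) / (n - 1))
    return n, m, s

def sample_skewness(values):
    # Assumed formula: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)
    n, m, s = _mean_and_sd(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def sample_excess_kurtosis(values):
    # Assumed formula: adjusted fourth moment minus a correction term,
    # so that bell-shaped data scores close to 0
    n, m, s = _mean_and_sd(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

ages = [23, 25, 24, 23, 24, 23, 20, 23, 22, 30]
print(round(sample_skewness(ages), 2))         # positive: skewed to the right
print(round(sample_excess_kurtosis(ages), 2))  # positive: more peaked than a bell curve
```

Both values come out positive for the age data, which matches the reading above: a skew to the right and a peaked distribution, driven largely by the single age of 30.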

http://www.stattutorials.com/EXCEL/EXCEL-DESCRIPTIVE-STATISTICS.html


Levels of Kurtosis

Kurtosis of 0


When we examine the average home prices in Seattle, we may
also want to compare them to the average home prices in
Chicago.

For example, suppose the sample average for Seattle was
\$186,000 and for Chicago it was \$175,000.

Is the observed difference in average home prices significant?

We will learn this in a few weeks.

Teaser for Future Discussion


Appendix Part 1

How to load the Analysis ToolPak in Windows Excel

• Go to the Data tab.

• In all likelihood, you will not see
the “Data Analysis” option on the
toolbar as it is displayed here.

• So, here is what you do.

• Click on the Office Button in the upper left-hand corner and select “Excel Options.”

• Select Add-ins and then click
on the Go button.


• Check off Analysis ToolPak and
select OK.

• It will now be ready to use.

Appendix Part 2

If you have a Mac, you unfortunately do not have the Analysis ToolPak option in Excel. Luckily, StatPlus kindly offers a free version for Mac owners with Microsoft’s approval:

http://www.analystsoft.com/en/products/statplusmacle/

Once downloaded, open the program, open the data, go to Statistics, then Basic Statistics, and then Descriptive Statistics.

Highlight the data and click OK.

Voila!

3.1 Which of the three measures of central tendency (the mean, the median,
and the mode) can be calculated for quantitative data only, and which
ones can be calculated for both quantitative and qualitative data? Illustrate
with examples.

3.2 Which of the three measures of central tendency (the mean, the median,
and the mode) can assume more than one value for a data set? Give an
example of a data set for which that summary measure assumes more
than one value.

3.3 Prices of cars have a distribution that is skewed to the right with outliers in
the right tail. Which of the measures of central tendency is the best to
summarize that data set? Explain.

3.4 The following data give the number of car thefts that occurred in a city
during the past 12 days.

6 3 7 11 5 3 8 7 2 6 9 13

Find the mean, median, and mode.

Section 3 Exercises I


3.5 The following data give the 2010 total area of farmland (in millions of
acres) for 10 states (Statistical Abstract of the United States). The data,
in the order entered, are for the states of Colorado, Iowa, Kansas,
Minnesota, Missouri, Nebraska, North Dakota, Oklahoma, South Dakota,
and Texas, respectively. (Do in Excel)

33 33 48 30 30 47 40 34 44 129

a) Calculate the mean and median for these data.

b) Do these data contain an outlier? If yes, drop this value and
recalculate the mean and median. Which of the two summary
measures changes by a larger amount when you drop the outlier?

c) Is the mean or the median a better summary measure for these
data? Explain.

3.6 The mean 2009 income for five families was \$39,520. What was the total
2009 income of these five families?

Section 3 Exercises II


3.7 Consider the following two data sets.

Data set 1: 12 25 37 8 41

Data set 2: 19 32 44 15 48

Notice that each value of the second data set is obtained by adding 7 to the
corresponding value of the first data set.

a) Calculate the mean for each of these two data sets.

b) Comment on the relationship between the two means.

Section 3 Exercises II

Section 4

Measures of Dispersion


• One way to aid in better understanding your sample data is with
descriptive measures or statistics.

• Each basic summary statistic has its own unique purpose and,
therefore, each plays a critical role in helping you describe and
understand your data. However, not fully understanding what
each of these statistics measures, how it is calculated, or when
to use one versus another can cause you to draw erroneous
conclusions.

• Did you know the spread of your data reveals how solid your
estimate of the mean is? The tighter your spread, the better your
estimate of the average. In other words, we want our data sets
to have as little spread as possible.

• This section will show you how to calculate the basic measures
of dispersion.

Introduction


• The two main measures of dispersion of concern are:

1. The Range

2. The Variance and Standard Deviation

Measures of Dispersion


• The range is the largest observation minus the smallest observation
for the variable of concern in your sample.

• The range gives a sense of the “true spread” of all observations in
the data set.

• In fact, the range is sometimes referred to as the “spread.” When
being reported, it is often accompanied by the minimum and
maximum values observed.

• The range is denoted by the following formula:

Range = (the maximum observation) – (the minimum observation)

The Range I


• For example, assume we survey 20 people and ask them their online
expenditures for 2009.

\$100 \$50 \$100 \$150 \$125 \$100 \$80 \$75 \$125 \$150

\$150 \$175 \$50 \$80 \$25 \$100 \$100 \$75 \$125 \$100

The Range II

What is the range of this data set?

Max = 175

Min = 25

Range = 175 – 25 = 150
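
The same range calculation, as a minimal Python sketch (the variable names are illustrative):

```python
# Online expenditures for 2009 reported by the 20 people surveyed, in dollars
spend_2009 = [100, 50, 100, 150, 125, 100, 80, 75, 125, 150,
              150, 175, 50, 80, 25, 100, 100, 75, 125, 100]

largest = max(spend_2009)
smallest = min(spend_2009)

# Range = (the maximum observation) - (the minimum observation)
data_range = largest - smallest

print(largest, smallest, data_range)  # 175 25 150
```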


• The standard deviation is an average measure of how far each
observation in your data set lies from the mean for the variable of
concern.

• In other words, the standard deviation tells us how far, on average,
the data lie from the mean.

• It tells us whether the observations for the variable of concern are
tightly clustered or widely dispersed around the mean.

• To obtain the standard deviation, take the square root of the
variance.

The Standard Deviation I


• The sample variance (S²) is the sum of the squared differences
between each observation and the mean, divided by your
sample size minus one.

S² = Σ(X – X̄)² / (n – 1)

• We square the difference because we do not care whether the observation
is, for example, 5 units above or below the mean, only that it is 5
units away from the mean.

• We divide by n – 1 rather than n because it was proven a long time ago
that dividing by n – 1 provides a better (unbiased) estimate of
the true population variance.

• To determine the standard deviation (S), we take the square root of
the variance:

S = √(S²)

The Standard Deviation II


• We denote the population variance and standard deviation with the
Greek letter sigma:

• Population variance = σ²

• Population standard deviation = σ

The Standard Deviation III


• Let’s calculate the variance and standard deviation for our online
spend example.

The Standard Deviation IV


• We can also use the shortcut formula as follows:

Using the shortcut formula for our online spend example we get:
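
The worked figures for these two slides did not survive in this text, so the sketch below recomputes them in Python for the 20 online spend values. The shortcut used here is the commonly taught form S² = (Σx² – (Σx)²/n) / (n – 1). That this is the exact form shown on the slide is an assumption.

```python
from math import sqrt, isclose
from statistics import variance

spend_2009 = [100, 50, 100, 150, 125, 100, 80, 75, 125, 150,
              150, 175, 50, 80, 25, 100, 100, 75, 125, 100]

n = len(spend_2009)
x_bar = sum(spend_2009) / n

# Definitional formula: sum of squared deviations from the mean / (n - 1)
s2_definition = sum((x - x_bar) ** 2 for x in spend_2009) / (n - 1)

# Shortcut formula (assumed form): needs only the sum and the sum of squares
sum_x = sum(spend_2009)
sum_x2 = sum(x * x for x in spend_2009)
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

s = sqrt(s2_definition)

print(x_bar)                    # 101.75
print(sum_x, sum_x2)            # 2035 234675
print(round(s2_definition, 2))  # 1453.36
print(round(s2_shortcut, 2))    # 1453.36
print(round(s, 2))              # 38.12

# Both routes, and the library function, agree
assert isclose(s2_definition, s2_shortcut)
assert isclose(s2_definition, variance(spend_2009))
```

The sample variance is about 1,453.36 (in squared dollars) and the standard deviation is about \$38.12.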

The Standard Deviation V


• The larger the variation, the more spread out the data.

• The larger the variation, the more difficult it will be for the
marketer to make inferences about the data.

Dispersion of data around the mean.

[Figure: Three distributions with variances S₁², S₂², and S₃², where S₃² > S₂² > S₁².]

The Standard Deviation VI


• There are several rules, but we will focus only on the “Empirical
Rule,” which states that if your sample has a symmetric and
bell-shaped distribution, then:

– 68% of the observations within a data set will lie within one
standard deviation of the mean

– 95% of the observations within a data set will lie within two
standard deviations of the mean

– 99.7% of the observations within a data set will lie within three
standard deviations of the mean

Data Dispersion Rules I
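
The three percentages in the Empirical Rule can be checked against the normal curve itself. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# A standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The rule's 68%, 95%, and 99.7% are rounded versions of these values.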


• Pictorially, this looks as follows:

[Figure: The Empirical Rule. A bell-shaped curve with the horizontal axis marked at µ – 3σ, µ – 2σ, µ – σ, µ, µ + σ, µ + 2σ, and µ + 3σ.]

Data Dispersion Rules II


• We determine observations in our data to be outliers by
examining if they lie more than 3, 4, 5, or 6 standard deviations
from the mean

• It all depends on the quantities you are dealing with.

• Legitimate or not, you must remove outliers before examining
relationships in the data.

• SAS and SPSS offer a drop down menu where you can easily
eliminate outliers.
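
As a sketch of the standard-deviation test for outliers, the hypothetical helper below flags any observation lying more than k standard deviations from the mean. It is applied to the 10 online expenditure values from the Section 3 in-class example. The cutoff k is left as a parameter because, as noted above, the right choice depends on the quantities you are dealing with.

```python
from statistics import mean, stdev

def flag_outliers(values, k=3):
    """Return the observations lying more than k sample standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)
    return [x for x in values if abs(x - m) > k * s]

spend = [27, 55, 42, 38, 75, 62, 54, 21, 398, 42]

print(flag_outliers(spend, k=2))  # [398]
print(flag_outliers(spend, k=3))  # []
```

The 398 value is flagged at 2 standard deviations but not at 3. In a small sample a single extreme value inflates the standard deviation itself: with n = 10, no observation can lie more than (n – 1)/√n, or about 2.85, sample standard deviations from the mean. Cutoffs of 3 or more are therefore only meaningful for larger samples.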

Outliers


• Let’s now use Excel to calculate the variance and standard
deviation.

Consider the table below, which shows the ages of 10 college graduates from NYU.

| Age |
|---|
| 23 |
| 25 |
| 24 |
| 23 |
| 24 |
| 23 |
| 20 |
| 23 |
| 22 |
| 30 |

Calculating the Standard Deviation I

(Using Excel – PC)


• To determine the measures of dispersion in Excel, we first go to the Data tab,
click Data Analysis, and then select “Descriptive Statistics” (just as we did to
calculate the mean).

Calculating the Standard Deviation II

(Using Excel – PC)


• Highlight your data in the “Input Range,” check “Labels,” choose
your “Output Range,” and then check “Summary Statistics.”

Calculating the Standard Deviation III

(Using Excel – PC)


• Then you have the range and standard deviation of the data set.

Calculating the Standard Deviation IV

(Using Excel – PC)


4.1 The range, as a measure of spread, has the disadvantage of being
influenced by outliers. Illustrate this with an example.

4.2 Can the standard deviation have a negative value? Explain.

4.3 When is the value of the standard deviation for a data set zero? Give one
example. Calculate the standard deviation for this example and show that
its value is zero.

4.4 The following table gives the 2009 revenues (rounded to billions of dollars) of the top 10 companies in Fortune magazine’s Global 500 (Fortune Magazine). Find the range, variance, and standard deviation for these data (by hand and Excel).

| Company | Revenue (in billions of U.S. dollars) |
|---|---|
| Mitsubishi (Japan) | 184 |
| Mitsui (Japan) | 182 |
| Itochu (Japan) | 169 |
| General Motors (U.S.) | 169 |
| Sumitomo (Japan) | 168 |
| Marubeni (Japan) | 161 |
| Ford Motor (U.S.) | 137 |
| Toyota Motor (Japan) | 111 |
| Exxon (U.S.) | 110 |
| Royal Dutch/Shell Group (Brit/Neth) | 110 |

Section 4 Exercises I


Section 4 Exercises II

4.5 Consider the following two data sets.

Data Set I: 12 25 37 8 41

Data Set II: 19 32 44 15 48

Note that each value of the second data set is obtained by adding 7 to the
corresponding value of the first data set. Calculate the standard deviation
for each of these two data sets using the formula for sample data.
Comment on the relationship between the two standard deviations.

Section 5

Data Distribution


• There are many distributional forms that our marketing data
can take on.

• Most data follows a specific form of one type or another.

• Because of this fact, we can make estimates and forecasts
about what our data is telling us and do so with a certain level of
confidence.

Introduction


Examples of common forms of data in industry are:

– Time to failure follows what is known as an exponential
distribution. Xerox takes advantage of this well-known fact to
determine servicing needs and how to set contract prices for its
various equipment.

– Department stores take advantage of the fact that the period of time
between the arrivals of two successive customers also follows an
exponential distribution.
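
To make the exponential distribution concrete, here is a minimal Python sketch of the kind of question it answers. The 2-minute average gap between customers is a hypothetical figure chosen for illustration, and the function name is not from the course material.

```python
from math import exp

def prob_wait_longer_than(t, mean_time):
    """P(T > t) when T is exponentially distributed with the given mean."""
    return exp(-t / mean_time)

# Hypothetical: on average, 2 minutes pass between successive customer arrivals
mean_gap_minutes = 2.0

# Chance that more than 5 minutes pass before the next customer arrives
print(round(prob_wait_longer_than(5, mean_gap_minutes), 3))      # 0.082

# Chance that the next customer arrives within the average gap
print(round(1 - prob_wait_longer_than(2, mean_gap_minutes), 3))  # 0.632
```

The same calculation applies to time to failure: replace the mean gap with the mean time to failure of the equipment.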

Examples of Data Distribution I


Examples of common forms of data in industry are:

– If you are interested in counting the number of occurrences of a
specific event within a set time period then we have what is called
the Poisson distribution.

– The number of murders in NYC during the month of April follows a
Poisson distribution. Based on this fact, the city can then
estimate the murder rate for the next month.

– The hypergeometric distribution is used when doing QC
checking for defectives in a large lot. With this distribution, you
can determine the probability that in a sample of size n from the
entire lot of size N, you will accept it when in reality it exceeds your
defective rate.
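
The Poisson and hypergeometric probabilities described above can be sketched with the Python standard library. Every number in the example (the event rate, lot size, number of defectives, sample size, and acceptance cutoff) is hypothetical and chosen only for illustration. The function names are not from the course material.

```python
from math import comb, exp, factorial

def poisson_pmf(k, rate):
    """P(exactly k events in a period) when events average `rate` per period."""
    return exp(-rate) * rate ** k / factorial(k)

def hypergeom_pmf(k, lot_size, defectives, sample_size):
    """P(exactly k defectives in a sample drawn without replacement from the lot)."""
    return (comb(defectives, k)
            * comb(lot_size - defectives, sample_size - k)
            / comb(lot_size, sample_size))

def prob_accept(lot_size, defectives, sample_size, max_allowed):
    """P(the sample shows at most `max_allowed` defectives, so the lot is accepted)."""
    return sum(hypergeom_pmf(k, lot_size, defectives, sample_size)
               for k in range(max_allowed + 1))

# Hypothetical Poisson example: events average 2 per period
print(round(poisson_pmf(0, 2.0), 3))  # 0.135, the chance of a period with no events

# Hypothetical QC example: a lot of N = 1,000 with 80 defectives (8%),
# a sample of n = 20, and the lot is accepted if at most 1 defective is found
p_accept = prob_accept(lot_size=1000, defectives=80, sample_size=20, max_allowed=1)
print(round(p_accept, 3))
```

The value printed for `p_accept` is the probability of accepting the lot even though its true defective rate is 8%.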

Examples of Data Distribution I


• However, the most prevalent and most important functional
form for a marketer is the normal distribution.

• Also known as the bell shaped curve.

• Many data elements follow the normal distribution:

– Age

– Income Levels

– GPAs

– Spend

– Years of education

– Etc.

• And because all these measures tend to follow the normal curve,
we as marketers and researchers are able to make many
inferences about our metrics with a high degree of confidence.

Examples of Data Distribution II


• Referring back to the distribution of income from a prior chapter,
we now know this data to be normally distributed.

• The distribution of income was symmetric and bell-shaped.

[Figure: Histogram and polygon for the relative frequency distribution of income levels. Horizontal axis: Income Categories, in bands from \$10,000–\$25,000 up to \$100,000–\$115,000; vertical axis: Relative Frequency, from 0 to 0.35.]

Normal Distribution


• What would you expect the measures of skewness and kurtosis to
be for these data?

[Figure: Histogram and polygon for the relative frequency distribution of income levels.]

Normal Distribution


• As you recall from Section 4, we discussed the Empirical Rule.
This rule stated that if the distribution of your data is symmetric
and bell-shaped (now known as normally distributed data):

– 68% of the observations within the data set will lie within one
standard deviation of the mean

– 95% of the observations within the data set will lie within two
standard deviations of the mean

– 99.7% of the observations within the data set will lie within
three standard deviations of the mean

The Spread of Normally Distributed Data I


• Pictorially, this looks as follows:

[Figure: A bell-shaped curve with the horizontal axis marked at µ – 3σ, µ – 2σ, µ – σ, µ, µ + σ, µ + 2σ, and µ + 3σ.]

The Spread of Normally Distributed Data II


5.1 Rite Aid Pharmacy wishes to monitor the number of customers arriving at the checkout counter on Sunday afternoons for staffing purposes. What distributional form will this data follow?

5.2 How is the time to failure for GE light bulbs distributed?

5.3 How would you suspect the average age of your customer base to be distributed?

5.4 Income on your customer database is distributed normally with a mean of \$55,000 and a standard deviation of \$10,000. What percent of the database do you estimate will have an income within the range \$35,000 to \$75,000?

5.5 How do the width and height of a normal distribution change when its mean remains the same but its standard deviation decreases? Show this graphically.

5.6 How do the width and/or height of a normal distribution change when its standard deviation remains the same but its mean increases? Show this graphically.

Section 5 Exercises

Section 6

The Central Limit Theorem


• The assessment of sample means (averages and percentages)
is the basis of many everyday business decisions.

• Therefore, understanding exactly how an average is distributed
is KEY to properly assessing one versus another (Pre vs. Post,
Control vs. Test, etc.).

• As luck would have it, averages approximately follow a
normal distribution as n gets large (n > 30).

Introduction


• When you take a sample from a population and calculate the
average dollars spent, for example, that average or mean has
certain distributional properties.

• According to the Central Limit Theorem, regardless of how the
population from which we sampled is distributed, the sample
mean or response rate or click-through rate (for n ≥ 30) will be
approximately normally distributed with a mean equal to the mean
of the population from which the sample came and a standard
deviation equal to the standard deviation of the population from
which the sample came divided by the square root of the sample
size.

• This was proven a long time ago. And we can take advantage
of it.

The Central Limit Theorem I


• So what does this mean? Even if the distribution of dollars
spent is highly skewed, and therefore not normal at all, the
distribution of your statistics, such as average spend, will be
approximately normal.

• Let’s take a look at what the Central Limit Theorem is saying.

The Central Limit Theorem I


[Figure: The ACME Database (10,000,000 customer records) feeds Sample 1, Sample 2, Sample 3, Sample 4, …, Sample 1,000 (n = 10,000 customers each). Each sample i yields X̄i, the average income of sample i.]

The Central Limit Theorem II

• The database analyst at ACME Direct draws 1,000 random
samples, of size 10,000 each, from the database and observes
the data element “household income.”


• The analyst then calculates the mean “household income” for
each of the 1,000 samples and creates a frequency distribution
of the average incomes from the 1,000 samples drawn.

| Income Range | Frequency |
|---|---|
| Less than \$10,000 | 32 |
| \$10,000 – \$20,000 | 119 |
| \$20,000 – \$30,000 | 187 |
| \$30,000 – \$40,000 | 326 |
| \$40,000 – \$50,000 | 192 |
| \$50,000 – \$60,000 | 116 |
| \$60,000+ | 28 |
| Total | 1,000 |

The Central Limit Theorem III


• The analyst creates a histogram using the frequency table and
notes that these 1,000 sample mean income values are normally
distributed (symmetric and bell-shaped).

[Figure: Distribution of 1,000 Sample Means. Horizontal axis: Income Ranges, from “Less than \$10,000” to “\$60,000+”; vertical axis: Frequency, from 0 to 350.]

The Central Limit Theorem IV


• According to the Central Limit Theorem, the histogram will be
approximately normally distributed (bell-shaped and symmetric) with

– a mean equal to the true mean “household income” level of
all people on the database and

– a standard deviation equal to the true standard deviation
for all people on the database divided by the square
root of n.
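
The analyst's experiment can be imitated with a small simulation. The sketch below is an illustration under stated assumptions, not the ACME data. The population is a highly skewed exponential distribution with mean 1 and standard deviation 1, and the experiment draws 2,000 samples of size 50 with a fixed random seed so that it can be repeated.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with mean 1 (its standard deviation is also 1).
# This distribution is strongly skewed to the right, so it is far from normal.
POP_MEAN = 1.0
POP_SD = 1.0

n = 50              # size of each sample
num_samples = 2000  # number of samples drawn

# Draw the samples and record the mean of each one
sample_means = [
    mean([random.expovariate(1 / POP_MEAN) for _ in range(n)])
    for _ in range(num_samples)
]

# What the Central Limit Theorem predicts for the sample means
print("predicted mean:", POP_MEAN)
print("predicted sd:  ", round(POP_SD / sqrt(n), 3))  # 0.141

# What the simulation actually produced
print("observed mean: ", round(mean(sample_means), 3))
print("observed sd:   ", round(stdev(sample_means), 3))
```

The observed mean and standard deviation of the sample means land close to the predicted values, even though the individual observations are highly skewed.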

The Central Limit Theorem V


The normal curve is so important that it was printed on the German 10 Deutsche Mark note until 1993.

Gauss and the Deutsche Mark

http://en.wikipedia.org/wiki/File:Carl_Friedrich_Gauss


Suppose you work for American Express and ran a test with 1,000 new card members to stimulate spend.

• Based on your test, you observed an average spend of \$175 with a
standard deviation of \$25.

• You know this is not reality because you only conducted a test.

How can you estimate the spend level for rollout to all new card
members?

• Based on the CLT, we can say with 95% certainty that the true average
spend will lie within plus or minus 2 standard deviations of the sample
average, where the standard deviation of the sample average is \$25
divided by the square root of 1,000, or about \$0.79.

• So, in other words, the true average spend should lie somewhere between
about \$173.42 and \$176.58 with 95% certainty. (The wider range of \$125
to \$225 is where about 95% of individual card members' spend would fall
if spend is bell-shaped.)
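
The Central Limit Theorem as stated earlier in this section gives the sample average a standard deviation equal to the population standard deviation divided by the square root of n. The sketch below works through that arithmetic for this test, using the test's \$25 standard deviation as a stand-in for the unknown population value (an assumption). For comparison, it also prints the plus or minus 2 standard deviation range for individual card members.

```python
from math import sqrt

n = 1000            # card members in the test
avg_spend = 175.0   # average spend observed in the test, in dollars
sd_spend = 25.0     # standard deviation of spend observed in the test

# Standard deviation of the sample average (often called the standard error)
standard_error = sd_spend / sqrt(n)

# Range for the true average spend: average +/- 2 standard errors
avg_low = avg_spend - 2 * standard_error
avg_high = avg_spend + 2 * standard_error

# Range for individual card members: average +/- 2 standard deviations
ind_low = avg_spend - 2 * sd_spend
ind_high = avg_spend + 2 * sd_spend

print(round(standard_error, 2))                # 0.79
print(round(avg_low, 2), round(avg_high, 2))   # 173.42 176.58
print(ind_low, ind_high)                       # 125.0 225.0
```

The first range describes where the true average spend is likely to lie. The second describes where about 95% of individual card members would fall if spend is bell-shaped.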

An Example of the CLT in Practice


6.1 You note the number of arrivals each day at Starbucks in Grand Central Station between the hours of 5 pm and 6 pm for the months of April and May. You do the same for the Starbucks at Penn Station. You calculate the average for Grand Central and the average for Penn Station.

a) How is the variable “number of arrivals between 5 pm and 6 pm” distributed?

b) How is the average number of arrivals for Grand Central and Penn Station distributed?

6.2 You know income to be normally distributed on your customer file. You sample 20 people on the file. How will the average income be distributed for this small sample?

6.3 Income on your database is highly skewed. You sample 20 people on the file. How will the average income be distributed for this small sample?

Section 6 Exercises I


6.4 Income on your database is highly skewed. You sample 1,000 people on the file. How will the average income be distributed for this sample of size 1,000?

Section 6 Exercises II

Graphing/Charting Project

Below is data obtained from the National Cancer Society with projections for the number of newly diagnosed cancer patients through 2050, broken out by age.

Age of Newly Diagnosed Cancer Patients

| Year | <50 | 50-64 | 65-74 | 75-84 | >85 |
|---|---|---|---|---|---|
| 2000 | 0.17 | 0.38 | 0.35 | 0.31 | 0.14 |
| 2010 | 0.17 | 0.46 | 0.57 | 0.34 | 0.14 |
| 2020 | 0.17 | 0.55 | 0.68 | 0.40 | 0.20 |
| 2030 | 0.18 | 0.60 | 0.70 | 0.62 | 0.16 |
| 2040 | 0.23 | 0.59 | 0.69 | 0.71 | 0.28 |
| 2050 | 0.22 | 0.68 | 0.65 | 0.70 | 0.42 |

Note: 0.17 equals 170,000, for example.

Instructions:

Prepare one slide that graphically best presents this data and make appropriate observations. It would be interesting to try to show whether the number of cancer patients is going up year after year and how the distribution by age range is also changing, all in one graphical representation. A stacked bar chart or area chart might serve this purpose well.