Read the instructions in the attached 3 templates, which cover chapters 7, 8, 9, and 11, and answer the questions in each template.
Name:
Chapter 7 Instructions
Practice Problem 14
Due Week 5 Day 6 (Sunday)
Follow the instructions below to submit your answers for Chapter 7 Practice Problem 14.
1. Save Chapter 7 Instructions to your computer.
2. Type your answers into the shaded boxes below. The boxes will expand as you type your answers.
3. Resave this form to your computer with your answers filled in.
4. Attach the saved form to your reply when you turn in your work in the Assignments section of the Classroom tab. Note: Each question in the Assignments section will be listed separately; however, you only need to submit this form one time to turn in your answers.
Below is an explanation of the symbols in Chapter 7, Practice Problem 14.
M = mean of your sample
S2 = estimated population variance
SM = standard deviation of the distribution of means
t = t score for your sample
t needed = cutoff score that establishes the region of rejection (also known as the critical value)
Decision:
Reject the Null
or
Fail to Reject the Null
(select only one)
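The quantities above are computed in a fixed chain: the sample mean M, then the estimated population variance S2, then SM, then t. As a sketch only, here is that chain in Python with made-up numbers (these are NOT the data or answers for Practice Problem 14):

```python
# Chain of Chapter 7 quantities for a single-sample t test.
# The scores and population mean below are invented for illustration.
from math import sqrt

scores = [5, 7, 6, 9, 8]   # hypothetical sample
pop_mean = 6               # hypothetical known population mean

n = len(scores)
m = sum(scores) / n                                # M: sample mean
s2 = sum((x - m) ** 2 for x in scores) / (n - 1)   # S2: unbiased variance estimate
sm = sqrt(s2 / n)                                  # SM: SD of distribution of means
t = (m - pop_mean) / sm                            # t score for the sample

print(round(m, 2), round(s2, 2), round(sm, 2), round(t, 2))  # prints: 7.0 2.5 0.71 1.41
```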
Read Chapter 7 Practice Problem 14 in your textbook and then type your answers into the shaded boxes below. Note: Please provide only those answers indicated below, nothing more. You do not need to show your work. Round your answers to 2 decimal places.
M =
S2 =
SM =
t =
t needed = +
Decision:
Name:
Chapter 8 Instructions
Practice Problem 18
Due Week 5 Day 6 (Sunday)
Follow the instructions below to submit your answers for Chapter 8 Practice Problem 18.
1. Save Chapter 8 Instructions to your computer.
2. Type your answers into the shaded boxes below. The boxes will expand as you type your answers.
3. Resave this form to your computer with your answers filled in.
4. Attach the saved form to your reply when you turn in your work in the Assignments section of the Classroom tab. Note: Each question in the Assignments section will be listed separately; however, you only need to submit this form one time to turn in your answers.
Below is an explanation of the symbols in Chapter 8, Practice Problem 18.
N1 = number of participants in the experimental group
N2 = number of participants in the control group
df1 = degrees of freedom for the experimental group
df2 = degrees of freedom for the control group
dfTotal = degrees of freedom for both groups
M1 = mean of the experimental group
M2 = mean of the control group
S21 = estimated population variance of the experimental group
S22 = estimated population variance of the control group
S2Pooled = pooled estimate of the population variance
S2M1 = variance of the distribution of means for the experimental group
S2M2 = variance of the distribution of means for the control group
S2Difference = variance of the distribution of differences between means
SDifference = standard deviation of the distribution of differences between means
t = t score for your sample
t needed = cut-off score that establishes the region of rejection (also known as the critical value)
Decision:
Reject the Null
or
Fail to Reject the Null
(select only one)
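The symbols above form a computation chain for the t test for independent means. As an illustrative sketch, here is that chain in Python with invented data (NOT the values from Practice Problem 18):

```python
# Chain of Chapter 8 quantities for a t test for independent means.
# Group data below are invented for illustration only.
from math import sqrt

exp = [6, 8, 7, 9]    # hypothetical experimental group
ctrl = [4, 5, 6, 5]   # hypothetical control group

n1, n2 = len(exp), len(ctrl)
df1, df2 = n1 - 1, n2 - 1
df_total = df1 + df2
m1 = sum(exp) / n1
m2 = sum(ctrl) / n2
s2_1 = sum((x - m1) ** 2 for x in exp) / df1    # estimated variance, group 1
s2_2 = sum((x - m2) ** 2 for x in ctrl) / df2   # estimated variance, group 2
# pooled estimate: each group's estimate weighted by its share of the df
s2_pooled = (df1 / df_total) * s2_1 + (df2 / df_total) * s2_2
s2_m1 = s2_pooled / n1          # variance of distribution of means, group 1
s2_m2 = s2_pooled / n2          # variance of distribution of means, group 2
s2_diff = s2_m1 + s2_m2         # variance of distribution of differences
s_diff = sqrt(s2_diff)          # its standard deviation
t = (m1 - m2) / s_diff

print(round(s2_pooled, 2), round(t, 2))  # prints: 1.17 3.27
```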
Read Chapter 8 Practice Problem 18 in your textbook and then type your answers into the shaded boxes below. Note: Please provide only those answers indicated below, nothing more. You do not need to show your work. Round your answers to 2 decimal places.
N1 =
N2 =
df1 =
df2 =
dfTotal =
M1 =
M2 =
S21 =
S22 =
S2Pooled =
S2M1 =
S2M2 =
S2Difference =
SDifference =
t =
t needed = +
Decision:
Name:
Chapter 9 Instructions
Practice Problem 17
Due Week 5 Day 6 (Sunday)
Follow the instructions below to submit your answers for Chapter 9 Practice Problem 17.
1. Save Chapter 9 Instructions to your computer.
2. Type your answers into the shaded boxes below. The boxes will expand as you type your answers.
3. Resave this form to your computer with your answers filled in.
4. Attach the saved form to your reply when you turn in your work in the Assignments section of the Classroom tab. Note: Each question in the Assignments section will be listed separately; however, you only need to submit this form one time to turn in your answers.
Below is an explanation of the symbols in Chapter 9, Practice Problem 17.
S2Between = between groups population variance estimate
S2Within = within groups population variance estimate
F = statistical score that represents the ratio of the between groups to the within groups population variance estimate
F needed = cut-off score that establishes the region of rejection (also known as the critical value)
Decision:
Reject the Null
or
Fail to Reject the Null
(select only one)
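As a rough sketch of how the Chapter 9 quantities relate, the following computes the between-groups and within-groups variance estimates and their ratio F for equal-sized groups, using invented data (NOT the data from Practice Problem 17):

```python
# Chapter 9 quantities for a one-way analysis of variance, sketched
# for equal group sizes with invented scores.
groups = [[8, 7, 9], [5, 6, 4], [6, 7, 5]]   # hypothetical groups, equal n

n = len(groups[0])                 # scores per group
means = [sum(g) / n for g in groups]
grand_mean = sum(means) / len(groups)

# between-groups estimate: variance of the group means, times n
s2_means = sum((m - grand_mean) ** 2 for m in means) / (len(groups) - 1)
s2_between = s2_means * n

# within-groups estimate: average of the groups' variance estimates
s2_within = sum(
    sum((x - sum(g) / n) ** 2 for x in g) / (n - 1) for g in groups
) / len(groups)

f = s2_between / s2_within
print(round(s2_between, 2), round(s2_within, 2), round(f, 2))  # prints: 7.0 1.0 7.0
```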
Read Chapter 9 Practice Problem 17 in your textbook and then type your answers into the shaded boxes below. Note: Please provide only those answers indicated below, nothing more. You do not need to show your work. Round your answers to 2 decimal places.
S2Between =
S2Within =
F =
F needed = +
Decision:
Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc. ISBN 0-558-46761-X.

CHAPTER 7
Introduction to t Tests: Single Sample and Dependent Means

Chapter Outline
✪ The t Test for a Single Sample 223
✪ The t Test for Dependent Means 236
✪ Assumptions of the t Test for a Single Sample and the t Test for Dependent Means 247
✪ Effect Size and Power for the t Test for Dependent Means 247
✪ Controversy: Advantages and Disadvantages of Repeated-Measures Designs 250
✪ Single Sample t Tests and Dependent Means t Tests in Research Articles 252
✪ Summary 253
✪ Key Terms 254
✪ Example Worked-Out Problems 254
✪ Practice Problems 258
✪ Using SPSS 265
✪ Chapter Notes 268

At this point, you may think you know all about hypothesis testing. Here's a surprise: what you know will not help you much as a researcher. Why? The procedures for testing hypotheses described up to this point were, of course, absolutely necessary for what you will now learn. However, these procedures involved comparing a group of scores to a known population. In real research practice, you often compare two or more groups of scores to each other, without any direct information about populations. For example, you may have two scores for each person in a group of people, such as scores on an anxiety test before and after psychotherapy or number of familiar versus unfamiliar words recalled in a memory experiment. Or you might have one score per person for two groups of people, such
as an experimental group and a control group in a study of the effect of sleep loss on problem solving, or comparing the self-esteem test scores of a group of 10-year-old girls to a group of 10-year-old boys.

t test: a hypothesis-testing procedure in which the population variance is unknown; it compares t scores from a sample to a comparison distribution called a t distribution.
These kinds of research situations are among the most common in psychology,
where usually the only information available is from samples. Nothing is known
about the populations that the samples are supposed to come from. In particular, the
researcher does not know the variance of the populations involved, which is a crucial
ingredient in Step ❷ of the hypothesis-testing process (determining the characteristics
of the comparison distribution).
In this chapter, we first look at the solution to the problem of not knowing the
population variance by focusing on a special situation: comparing the mean of a sin-
gle sample to a population with a known mean but an unknown variance. Then, after
describing how to handle this problem of not knowing the population variance, we
go on to consider the situation in which there is no known population at all—the sit-
uation in which all we have are two scores for each of a number of people.
The hypothesis-testing procedures you learn in this chapter, those in which the
population variance is unknown, are examples of t tests. The t test is sometimes
called “Student’s t” because its main principles were originally developed by
William S. Gosset, who published his research articles anonymously using the name
“Student” (see Box 7–1).
The t Test for a Single Sample
Let’s begin with an example. Suppose your college newspaper reports an informal
survey showing that students at your college study an average of 17 hours per week.
However, you think that the students in your dormitory study much more than that.
You randomly pick 16 students from your dormitory and ask them how much they
study each day. (We will assume that they are all honest and accurate.) Your result is
that these 16 students study an average of 21 hours per week. Should you conclude
that students in your dormitory study more than the college average? Or should you
conclude that your results are close enough to the college average that the small dif-
ference of 4 hours might well be due to your having picked, purely by chance, 16 of
the more studious residents in your dormitory?
In this example you have scores for a sample of individuals and you want to com-
pare the mean of this sample to a population for which you know the mean but not the
variance. Hypothesis testing in this situation is called a t test for a single sample. (It is
also called a one-sample t test.) The t test for a single sample works basically the same
way as the Z test you learned in Chapter 5. In the studies we considered in that chapter,
you had scores for a sample of individuals (such as a group of 64 students rating the at-
tractiveness of a person in a photograph after being told that the person has positive
personality qualities) and you wanted to compare the mean of this sample to a popula-
tion (in this case, a population of students not told about the person’s personality qual-
ities). However, in the studies we considered in Chapter 5, you knew both the mean
and variance of the general population to which you were going to compare your sam-
ple. In the situations we are now going to consider, everything is the same, but you
don’t know the population variance. This presents two important new wrinkles affect-
ing the details of how you carry out two of the steps of the hypothesis-testing process.
The first important new wrinkle is in Step ❷. Because the population variance is not
known, you have to estimate it. So the first new wrinkle we consider is how to estimate
an unknown population variance. The other important new wrinkle affects Steps ❷
and ❸. When the population variance has to be estimated, the shape of the comparison
distribution is not quite a normal curve; so the second new wrinkle we consider is the shape of the comparison distribution (for Step ❷) and how to use a special table to find the cutoff (Step ❸) on what is a slightly differently shaped distribution.

t test for a single sample: a hypothesis-testing procedure in which a sample mean is being compared to a known population mean and the population variance is unknown.
Let’s return to the amount of studying example. Step ❶ of the hypothesis-testing
procedure is to restate the problem as hypotheses about populations. There are two
populations:
Population 1: The kind of students who live in your dormitory.
Population 2: The kind of students in general at your college.
The research hypothesis is that Population 1 students study more than Population 2
students; the null hypothesis is that Population 1 students do not study more than
Population 2 students. So far, the problem is no different from those in Chapter 5.
Step ❷ is to determine the characteristics of the comparison distribution. In this
example, its mean will be 17, what the survey found for students at your college
generally (Population 2).
BOX 7–1 William S. Gosset, Alias "Student": Not a Mathematician, But a Practical Man

William S. Gosset graduated from Oxford University in 1899 with degrees in mathematics and chemistry. It happened that in the same year the Guinness brewers in Dublin, Ireland, were seeking a few young scientists to take a first-ever scientific look at beer making. Gosset took one of these jobs and soon had immersed himself in barley, hops, and vats of brew.

The problem was how to make beer of a consistently high quality. Scientists such as Gosset wanted to make the quality of beer less variable, and they were especially interested in finding the cause of bad batches. A proper scientist would say, "Conduct experiments!" But a business such as a brewery could not afford to waste money on experiments involving large numbers of vats, some of which any brewer worth his hops knew would fail. So Gosset was forced to contemplate the probability of, say, a certain strain of barley producing terrible beer when the experiment could consist of only a few batches of each strain. Adding to the problem was that he had no idea of the variability of a given strain of barley—perhaps some fields planted with the same strain grew better barley. (Does this sound familiar? Poor Gosset, like today's psychologists, had no idea of his population's variance.)

Gosset was up to the task, although at the time only he knew that. To his colleagues at the brewery, he was a professor of mathematics and not a proper brewer at all. To his statistical colleagues, mainly at the Biometric Laboratory at University College in London, he was a mere brewer and not a proper mathematician.

So Gosset discovered the t distribution and invented the t test—simplicity itself (compared to most of statistics)—for situations when samples are small and the variability of the larger population is unknown. However, the Guinness brewery did not allow its scientists to publish papers, because one Guinness scientist had revealed brewery secrets. To this day, most statisticians call the t distribution "Student's t" because Gosset wrote under the anonymous name "Student." A few of his fellow statisticians knew who "Student" was, but apparently meetings with others involved the secrecy worthy of a spy novel. The brewery learned of his scientific fame only at his death, when colleagues wanted to honor him.

In spite of his great achievements, Gosset often wrote in letters that his own work provided "only a rough idea of the thing" or so-and-so "really worked out the complete mathematics." He was remembered as a thoughtful, kind, humble man, sensitive to others' feelings. Gosset's friendliness and generosity with his time and ideas also resulted in many students and younger colleagues making major breakthroughs based on his help.

To learn more about William Gosset, go to http://www-history.mcs.st-andrews.ac.uk/Biographies/Gosset.html.

Sources: Peters (1987); Salsburg (2001); Stigler (1986); Tankard (1984).
The next part of Step ❷ is finding the variance of the distribution of means. Now
you face a problem. Up to now in this book, you have always known the variance of
the population of individuals. Using that variance, you then figured the variance of the
distribution of means. However, in the present example, the variance of the number of
hours studied for students at your college (the Population 2 students) was not reported
in the newspaper article. So you email the paper. Unfortunately, the reporter did not
figure the variance, and the original survey results are no longer available. What to do?
Basic Principle of the t Test: Estimating the Population
Variance from the Sample
Scores
If you do not know the variance of the population of individuals, you can estimate it
from what you do know—the scores of the people in your sample.
In the logic of hypothesis testing, the group of people you study is considered to
be a random sample from a particular population. The variance of this sample ought
to reflect the variance of that population. If the scores in the population have a lot of
variation, then the scores in a sample randomly selected from that population should
also have a lot of variation. If the population has very little variation, the scores in a
sample from that population should also have very little variation. Thus, it should be
possible to use the variation among the scores in the sample to make an informed
guess about the spread of the scores in the population. That is, you could figure the
variance of the sample’s scores, and that should be similar to the variance of the
scores in the population. (See Figure 7–1.)
There is, however, one small hitch. The variance of a sample will generally be
slightly smaller than the variance of the population from which it is taken. For this
reason, the variance of the sample is a biased estimate of the population variance.1
It is a biased estimate because it consistently underestimates the actual variance of
the population. (For example, if a population has a variance of 180, a typical sample
of 20 scores might have a variance of only 171.) If we used a biased estimate of the population variance in our research studies, our results would not be accurate. Therefore, we need to identify an unbiased estimate of the population variance.

Figure 7–1 The variation in samples (as in each of the lower distributions) is similar to the variations in the populations they are taken from (each of the upper distributions).

biased estimate: an estimate of a population parameter that is likely systematically to overestimate or underestimate the true value of the population parameter. For example, SD2 would be a biased estimate of the population variance (it would systematically underestimate it).

unbiased estimate of the population variance (S2): an estimate of the population variance, based on sample scores, which has been corrected so that it is equally likely to overestimate or underestimate the true population variance; the correction used is dividing the sum of squared deviations by the sample size minus 1, instead of the usual procedure of dividing by the sample size directly.
Fortunately, you can figure an unbiased estimate of the population variance by
slightly changing the ordinary variance formula. The ordinary variance formula is the
sum of the squared deviation scores divided by the number of scores. The changed for-
mula still starts with the sum of the squared deviation scores, but divides this by the
number of scores minus 1. Dividing by a slightly smaller number makes the result
slightly larger. Dividing by the number of scores minus 1 makes the variance you get
just enough larger to make it an unbiased estimate of the population variance. (This
unbiased estimate is our best estimate of the population variance. However, it is still
an estimate, so it is unlikely to be exactly the same as the true population variance. But
we can be certain that our unbiased estimate of the population variance is equally likely
to be too high as it is to be too low. This is what makes the estimate unbiased.)
The symbol we will use for the unbiased estimate of the population variance is S2. The formula is the usual variance formula, but now dividing by N – 1:

S2 = Σ(X – M)2/(N – 1) = SS/(N – 1)    (7–1)

S = √S2    (7–2)

Let's return again to the example of hours spent studying and figure the estimated population variance from the sample's 16 scores. First, you figure the sum of squared deviation scores. (Subtract the mean from each of the scores, square those deviation scores, and add them.) Presume in our example that this comes out to 694 (SS = 694). To get the estimated population variance, you divide this sum of squared deviation scores by the number of scores minus 1; that is, in this example, you divide 694 by 16 – 1; 694 divided by 15 comes out to 46.27. In terms of the formula,

S2 = Σ(X – M)2/(N – 1) = SS/(N – 1) = 694/(16 – 1) = 694/15 = 46.27

At this point, you have now seen several different types of standard deviation and variance (that is, for a sample, for a population, and unbiased estimates); and each of these types has used a different symbol. To help you keep them straight, a summary of the types of standard deviation and variance is shown in Table 7–1.

Degrees of Freedom

The number you divide by (the number of scores minus 1) to get the estimated population variance has a special name. It is called the degrees of freedom. It has this name because it is the number of scores in a sample that are "free to vary." The idea is that, when figuring the variance, you first have to know the mean. If you know the mean and all but one of the scores in the sample, you can figure out the one you don't know with a little arithmetic. Thus, once you know the mean, one of the scores in the sample is not free to have any possible value. So in this kind of situation the degrees of freedom are the number of scores minus 1. In terms of a formula,

df = N – 1    (7–3)

df is the degrees of freedom.
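The estimated-variance arithmetic from the studying example can be double-checked in a few lines, using the SS and N given in the text:

```python
# Check of the estimated population variance for the studying example:
# SS = 694 and N = 16, so S2 = SS / (N - 1).
ss = 694        # sum of squared deviation scores, from the text
n = 16          # sample size
df = n - 1      # degrees of freedom
s2 = ss / df    # unbiased estimate of the population variance

print(df, round(s2, 2))   # prints: 15 46.27
```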
(The estimated population variance is the sum of the squared deviation scores divided by the number of scores minus 1. The estimated population standard deviation is the square root of the estimated population variance. The degrees of freedom are the number of scores in the sample minus 1.)

degrees of freedom (df): the number of scores free to vary when estimating a population parameter; usually part of a formula for making that estimate—for example, in the formula for estimating the population variance from a single sample, the degrees of freedom is the number of scores minus 1.

Table 7–1 Summary of Different Types of Standard Deviation and Variance

Statistical Term                         Symbol
Sample standard deviation                SD
Population standard deviation            σ
Estimated population standard deviation  S
Sample variance                          SD2
Population variance                      σ2
Estimated population variance            S2
In our example, df = 16 – 1 = 15. (In some situations you learn about in later chapters, the degrees of freedom are figured a bit differently. This is because in those situations, the number of scores free to vary is different. For all the situations you learn about in this chapter, df = N – 1.)

The formula for the estimated population variance is often written using df instead of N – 1:

S2 = Σ(X – M)2/df = SS/df    (7–4)

(The estimated population variance is the sum of squared deviations divided by the degrees of freedom.)

The Standard Deviation of the Distribution of Means

Once you have figured the estimated population variance, you can figure the standard deviation of the comparison distribution using the same procedures you learned in Chapter 5. Just as before, when you have a sample of more than one, the comparison distribution is a distribution of means, and the variance of a distribution of means is the variance of the population of individuals divided by the sample size. You have just estimated the variance of the population. Thus, you can estimate the variance of the distribution of means by dividing the estimated population variance by the sample size. The standard deviation of the distribution of means is the square root of its variance. Stated as formulas,

S2M = S2/N    (7–5)

SM = √S2M    (7–6)

(The variance of the distribution of means based on an estimated population variance is the estimated population variance divided by the number of scores in the sample. The standard deviation of the distribution of means based on an estimated population variance is the square root of the variance of the distribution of means based on an estimated population variance.)

Note that, with an estimated population variance, the symbols for the variance and standard deviation of the distribution of means use S instead of σ.

In our example, the sample size was 16 and we worked out the estimated population variance to be 46.27. The variance of the distribution of means, based on that estimate, will be 2.89. That is, 46.27 divided by 16 equals 2.89. The standard deviation is 1.70, the square root of 2.89. In terms of the formulas,

S2M = S2/N = 46.27/16 = 2.89

SM = √S2M = √2.89 = 1.70

TIP FOR SUCCESS: Be sure that you fully understand the difference between S2 and S2M. These terms look quite similar, but they are quite different. S2 is the estimated variance of the population of individuals. S2M is the estimated variance of the distribution of means (based on the estimated variance of the population of individuals, S2).

The Shape of the Comparison Distribution When Using an Estimated Population Variance: The t Distribution

In Chapter 5 you learned that when the population distribution follows a normal curve, the shape of the distribution of means will also be a normal curve. However, this changes when you do hypothesis testing with an estimated population variance. When you are using an estimated population variance, you have less true information and more room for error. The mathematical effect is that there are likely to be slightly more extreme means than in an exact normal curve. Further, the smaller your
sample size, the bigger this tendency. This is because, with a smaller sample size,
your estimate of the population variance is based on less information.
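The distribution-of-means figures from the studying example (formulas 7–5 and 7–6) can likewise be checked in a couple of lines:

```python
# Check of the distribution-of-means figures for the studying example:
# estimated population variance 46.27, sample size 16.
from math import sqrt

s2 = 46.27
n = 16
s2_m = s2 / n       # variance of the distribution of means
s_m = sqrt(s2_m)    # standard deviation of the distribution of means

print(f"{s2_m:.2f} {s_m:.2f}")   # prints: 2.89 1.70
```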
The result of all this is that, when doing hypothesis testing using an estimated
variance, your comparison distribution will not be a normal curve. Instead, the com-
parison distribution will be a slightly different curve called a t distribution.
Actually, there is a whole family of t distributions. They vary in shape according
to the degrees of freedom you used to estimate the population variance. However, for
any particular degrees of freedom, there is only one t distribution.
Generally, t distributions look to the eye like a normal curve—bell-shaped, sym-
metrical, and unimodal. A t distribution differs subtly in having heavier tails (that is,
slightly more scores at the extremes). Figure 7–2 shows the shape of a
t distribution
compared to a normal curve.
This slight difference in shape affects how extreme a score you need to reject
the null hypothesis. As always, to reject the null hypothesis, your sample mean has
to be in an extreme section of the comparison distribution of means, such as the top
5%. However, if the comparison distribution has more of its means in the tails than a
normal curve would have, then the point where the top 5% begins has to be farther
out on this comparison distribution. The result is that it takes a slightly more extreme
sample mean to get a significant result when using a t distribution than when using a
normal curve.
Just how much the t distribution differs from the normal curve depends on the de-
grees of freedom, the amount of information used in estimating the population vari-
ance. The t distribution differs most from the normal curve when the degrees of
freedom are low (because your estimate of the population variance is based on a very
small sample). For example, using the normal curve, you may recall that 1.64 is the
cutoff for a one-tailed test at the .05 level. On a t distribution with 7 degrees of free-
dom (that is, with a sample size of 8), the cutoff is 1.895 for a one-tailed test at the .05
level. If your estimate is based on a larger sample, say a sample of 25 (so that df = 24), the cutoff is 1.711, a cutoff much closer to that for the normal curve. If your sample size is infinite, the t distribution is the same as the normal curve. (Of course, if your sample size were infinite, it would include the entire population!) But even with sample sizes of 30 or more, the t distribution is nearly identical to the normal curve.

Shortly, you will learn how to find the cutoff using a t distribution, but let's first return briefly to the example of how much students in your dorm study each week. You finally have everything you need for Step ❷ about the characteristics of the comparison distribution. We have already seen that the distribution of means in this example has a mean of 17 hours and a standard deviation of 1.70. You can now add that the shape of the comparison distribution will be a t distribution with 15 degrees of freedom.2

Figure 7–2 A t distribution (dashed blue line) compared to the normal curve (solid black line).

t distribution: a mathematically defined curve that is the comparison distribution used in a t test.
The Cutoff Sample Score for Rejecting the Null
Hypothesis: Using the t Table
Step ❸ of hypothesis testing is determining the cutoff for rejecting the null hypothesis.
There is a different t distribution for any particular degrees of freedom. However, to
avoid taking up pages and pages with tables for each possible t distribution, you use a
simplified table that gives only the crucial cutoff points. We have included such a
t table in the Appendix (Table A–2). Just as with the normal curve table, the t table
shows only positive t scores. If you have a one-tailed test, you need to decide whether
your cutoff score is a positive t score or a negative t score. If your one-tailed test is test-
ing whether the mean of Population 1 is greater than the mean of Population 2, the cut-
off t score is positive. However, if your one-tailed test is testing whether the mean of
Population 1 is less than the mean of Population 2, the cutoff t score is negative.
In the hours-studied example, you have a one-tailed test. (You want to know
whether students in your dorm study more than students in general at your college
study.) You will probably want to use the 5% significance level, because the cost of a
Type I error (mistakenly rejecting the null hypothesis) is not great. You have 16 partic-
ipants, making 15 degrees of freedom for your estimate of the population variance.
Table 7–2 shows a portion of the t table from Table A–2 in the Appendix. Find
the column for the .05 significance level for one-tailed tests and move down to the
row for 15 degrees of freedom. The crucial cutoff is 1.753. In this example, you are
testing whether students in your dormitory (Population 1) study more than students
in general at your college (Population 2). In other words, you are testing whether
Table 7–2 Cutoff Scores for t Distributions with 1 Through 17 Degrees of Freedom
(Highlighting Cutoff for Hours-Studied Example)
One-Tailed Tests Two-Tailed Tests
df .10 .05 .01 .10 .05 .01
1 3.078 6.314 31.821 6.314 12.706 63.657
2 1.886 2.920 6.965 2.920 4.303 9.925
3 1.638 2.353 4.541 2.353 3.182 5.841
4 1.533 2.132 3.747 2.132 2.776 4.604
5 1.476 2.015 3.365 2.015 2.571 4.032
6 1.440 1.943 3.143 1.943 2.447 3.708
7 1.415 1.895 2.998 1.895 2.365 3.500
8 1.397 1.860 2.897 1.860 2.306 3.356
9 1.383 1.833 2.822 1.833 2.262 3.250
10 1.372 1.813 2.764 1.813 2.228 3.170
11 1.364 1.796 2.718 1.796 2.201 3.106
12 1.356 1.783 2.681 1.783 2.179 3.055
13 1.350 1.771 2.651 1.771 2.161 3.013
14 1.345 1.762 2.625 1.762 2.145 2.977
15 1.341 1.753 2.603 1.753 2.132 2.947
16 1.337 1.746 2.584 1.746 2.120 2.921
17 1.334 1.740 2.567 1.740 2.110 2.898
t table: table of cutoff scores on the t distribution for various degrees of freedom, significance levels, and one- and two-tailed tests.
ISBN 0-558-46761-X
Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.
students in your dormitory have a higher t score than students in general. This means
that the cutoff t score is positive. Thus, you will reject the null hypothesis if your
sample’s mean is 1.753 or more standard deviations above the mean on the compar-
ison distribution. (If you were using a known variance, you would have found your
cutoff from a normal curve table. The Z score to reject the null hypothesis based on
the normal curve would have been 1.645.)
One other point about using the t table: In the full t table in the Appendix, there
are rows for each degree of freedom from 1 through 30, then for 35, 40, 45, and so
on up to 100. Suppose your study has degrees of freedom between two of these higher
values. To be safe, you should use the nearest degrees of freedom to yours given on
the table that is less than yours. For example, in a study with 43 degrees of freedom,
you would use the cutoff for df = 40.
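The "nearest degrees of freedom below yours" rule can be sketched in Python (a minimal illustration; the function name and the list of tabled rows are ours, based on the description above):

```python
# Rows listed in the full t table: df = 1 through 30, then 35, 40, 45, ... up to 100.
TABLE_DFS = list(range(1, 31)) + list(range(35, 101, 5))

def table_df(df):
    """Return the largest tabled df that does not exceed the study's df."""
    usable = [d for d in TABLE_DFS if d <= df]
    return max(usable)

print(table_df(43))  # a study with 43 degrees of freedom uses the row for df = 40
```

Using the smaller tabled df gives a slightly more extreme (safer) cutoff than your actual df would.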
The Sample Mean’s Score on the Comparison
Distribution: The t Score
Step ❹ of hypothesis testing is figuring your sample mean’s score on the comparison
distribution. In Chapter 5, this meant finding the Z score on the comparison
distribution—the number of standard deviations your sample’s mean is from the
mean on the distribution. You do exactly the same thing when your comparison distri-
bution is a t distribution. The only difference is that, instead of calling this a Z score,
because it is from a t distribution, you call it a t score. In terms of a formula,

t = (M – μ)/SM     (7–7)

The t score is your sample's mean minus the population mean, divided by the standard deviation of the distribution of means.

In the example, your sample's mean of 21 is 4 hours from the mean of the distribution of means, which amounts to 2.35 standard deviations from the mean (4 hours divided by the standard deviation of 1.70 hours).3 That is, the t score in the example is 2.35. In terms of the formula,

t = (M – μ)/SM = (21 – 17)/1.70 = 4/1.70 = 2.35

Deciding Whether to Reject the Null Hypothesis
Step ➎ of hypothesis testing is deciding whether to reject the null hypothesis. This step is exactly the same with a t test as it was in the hypothesis-testing situations discussed in previous chapters. In the example, the cutoff t score was 1.753 and the actual t score for your sample was 2.35. Conclusion: reject the null hypothesis. The research hypothesis is supported that students in your dorm study more than students in the college overall.
Figure 7–3 shows the various distributions for this example.

Summary of Hypothesis Testing When the Population Variance Is Not Known
Table 7–3 compares the hypothesis-testing procedure we just considered (for a t test for a single sample) with the hypothesis-testing procedure for a Z test from Chapter 5. That is, we are comparing the current situation, in which you know the population's mean but not its variance, to the Chapter 5 situation, where you knew the population's mean and variance.

t score: on a t distribution, the number of standard deviations from the mean (like a Z score, but on a t distribution).
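Formula (7–7) and the hours-studied figuring can be sketched in Python (a minimal sketch; the function name is ours, not from the text):

```python
def t_score(m, mu, s_m):
    """t = (M - mu) / S_M: how many estimated standard deviations the
    sample mean falls from the mean of the comparison distribution."""
    return (m - mu) / s_m

# Hours-studied example: M = 21, population mean 17, S_M = 1.70.
t = t_score(21, 17, 1.70)
print(round(t, 2))  # 2.35
```

Since 2.35 is beyond the cutoff of 1.753, the null hypothesis is rejected, matching the decision in the text.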
Figure 7–3 Distribution for the hours-studied example. (The figure shows the population, which is normal with a mean of 17; the t comparison distribution, marked in raw scores from 13.60 to 20.40 and t scores from –2 to +2; and the sample mean of 21.)
Table 7–3 Hypothesis Testing with a Single Sample Mean When Population Variance Is Unknown (t Test for a Single Sample) Compared to When Population Variance Is Known (Z Test)

Steps in Hypothesis Testing (with the difference from when population variance is known):
❶ Restate the question as a research hypothesis and a null hypothesis about the populations.
   No difference in method.
❷ Determine the characteristics of the comparison distribution:
   Population mean: No difference in method.
   Population variance: Estimate from the sample.
   Standard deviation of the distribution of sample means: No difference in method (but based on estimated population variance).
   Shape of the comparison distribution: Use the t distribution with df = N – 1.
❸ Determine the significance cutoff.
   Use the t table.
❹ Determine your sample's score on the comparison distribution.
   No difference in method (but called a t score).
❺ Decide whether to reject the null hypothesis.
   No difference in method.

Another Example of a t Test for a Single Sample
Consider another fictional example. Suppose a researcher was studying the psychological effects of a devastating flood in a small rural community. Specifically, the researcher was interested in how hopeful (versus unhopeful) people felt after the flood. The
researcher randomly selected 10 people from this community to complete a short ques-
tionnaire. The key item on the questionnaire asked how hopeful they felt, using a 7-point
scale from extremely unhopeful (1) to neutral (4) to extremely hopeful (7). The re-
searcher wanted to know whether the ratings of hopefulness for people who had been
through the flood would be consistently above or below the neutral point on the scale (4).
Table 7–4 shows the results and figuring for the t test for a single sample;
Figure 7–4 shows the distributions involved. Here are the steps of hypothesis testing.
❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are two populations:
Population 1: People who experienced the flood.
Population 2: People who are neither hopeful nor unhopeful.
The research hypothesis is that the two populations will score differently. The
null hypothesis is that they will score the same.
❷ Determine the characteristics of the comparison distribution. If the null hy-
pothesis is true, the mean of both populations is 4. The variance of these popu-
lations is not known, so you have to estimate it from the sample. As shown in
Table 7–4, the sum of the squared deviations of the sample’s scores from the
sample’s mean is 32.10. Thus, the estimated population variance is 32.10 divided
by 9 degrees of freedom (10 – 1), which comes out to 3.57.
The distribution of means has a mean of 4 (the same as the population mean).
Its variance is the estimated population variance divided by the sample size (3.57
Table 7–4 Results and Figuring for a Single-Sample t Test for a Study of 10 People's Ratings of Hopefulness Following a Devastating Flood (Fictional Data)

Rating (X)   Difference From the Mean (X – M)   Squared Difference From the Mean (X – M)²
5      .30      .09
3   –1.70     2.89
6    1.30     1.69
2   –2.70     7.29
7    2.30     5.29
6    1.30     1.69
7    2.30     5.29
4    –.70      .49
2   –2.70     7.29
5      .30      .09
Σ: 47               32.10

M = (ΣX)/N = 47/10 = 4.70.
df = N – 1 = 10 – 1 = 9.
μ = 4.00.
S2 = SS/df = 32.10/(10 – 1) = 32.10/9 = 3.57.
S2M = S2/N = 3.57/10 = .36.
SM = √S2M = √.36 = .60.
t needed with df = 9 for 1% significance level, two-tailed = ±3.250.
Actual sample t = (M – μ)/SM = (4.70 – 4.00)/.60 = .70/.60 = 1.17.
Decision: Do not reject the null hypothesis.
TIP FOR SUCCESS
Be careful. To find the variance of a
distribution of means, you always
divide the population variance by
the sample size. This is true
whether the population’s variance
is known or only estimated. It is
only when making the estimate of
the population variance that you
divide by the sample size minus 1.
That is, the degrees of freedom are
used only when estimating the
variance of the population of
individuals.
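The tip can be checked against the flood-study numbers (a sketch; the variable names are ours):

```python
ss = 32.10  # sum of squared deviations from the sample mean (Table 7-4)
n = 10      # number of people in the sample

s2 = ss / (n - 1)  # estimating the population variance: divide by df = N - 1
s2_m = s2 / n      # variance of the distribution of means: divide by N itself
print(round(s2, 2), round(s2_m, 2))  # 3.57 0.36
```

Only the population-variance estimate uses N – 1; once you have that estimate, the distribution of means always divides by the full sample size N.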
divided by 10 equals .36). The square root of this, the standard deviation of the
distribution of means, is .60. Its shape will be a t distribution for df = 9.
❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected. The researcher wanted to be very cau-
tious about mistakenly concluding that the flood made a difference. Thus, she
decided to use the .01 significance level. The hypothesis was nondirectional
(that is, no specific direction of difference from the mean of 4 was specified;
either result would have been of interest); so the researcher used a two-tailed
test. The researcher looked up the cutoff in Table 7–2 (or Table A–2 in the
Appendix) for a two-tailed test and 9 degrees of freedom. The cutoff given in
the table is 3.250. Thus, to reject the null hypothesis, the sample’s score on the
comparison distribution must be 3.250 or higher, or –3.250 or lower.
❹ Determine your sample’s score on the comparison distribution. The sam-
ple’s mean of 4.70 is .70 scale points from the null hypothesis mean of 4.00.
That makes it 1.17 standard deviations on the comparison distribution from that
distribution's mean (.70/.60 = 1.17); that is, t = 1.17.
➎ Decide whether to reject the null hypothesis. The t of 1.17 is not as extreme
as the needed t of ±3.250. Therefore, the researcher cannot reject the null hypothesis.
The study is inconclusive. (If the researcher had used a larger sample,
giving more power, the result might have been quite different.)
Summary of Steps for a t Test for a Single Sample
Table 7–5 summarizes the steps of hypothesis testing when you have scores from a
single sample and a population with a known mean but an unknown variance.4
Figure 7–4 Distributions for the example of how hopeful individuals felt following a devastating flood. (The figure shows the population, which is normal with a mean of 4.00; the t comparison distribution, marked in raw scores 3.40, 4.00, and 4.60 at t scores –1, 0, and 1; and the sample mean of 4.70.)
Table 7–5 Steps for a t Test for a Single Sample
❶ Restate the question as a research hypothesis and a null hypothesis about the populations.
❷ Determine the characteristics of the comparison distribution.
a. The mean is the same as the known population mean.
b. The standard deviation is figured as follows:
●A Figure the estimated population variance: S2 = SS/df.
●B Figure the variance of the distribution of means: S2M = S2/N.
●C Figure the standard deviation of the distribution of means: SM = √S2M.
c. The shape will be a t distribution with N – 1 degrees of freedom.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis
should be rejected.
a. Decide the significance level and whether to use a one-tailed or a two-tailed test.
b. Look up the appropriate cutoff in a t table.
❹ Determine your sample's score on the comparison distribution: t = (M – μ)/SM.
❺ Decide whether to reject the null hypothesis: Compare the scores from Steps ❸ and ❹.
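The figuring steps of Table 7–5 can be sketched as a small Python function (an illustration, not from the text; checked here against the flood example, with no intermediate rounding, so later decimals may differ slightly from the hand figuring):

```python
import math

def single_sample_t(scores, mu):
    """t test for a single sample: returns M, S2, SM, t, and df."""
    n = len(scores)
    m = sum(scores) / n                     # sample mean
    ss = sum((x - m) ** 2 for x in scores)  # sum of squared deviations
    s2 = ss / (n - 1)                       # estimated population variance
    s_m = math.sqrt(s2 / n)                 # SD of the distribution of means
    return m, s2, s_m, (m - mu) / s_m, n - 1

# Hopefulness ratings from Table 7-4, compared to the neutral point of 4.
m, s2, s_m, t, df = single_sample_t([5, 3, 6, 2, 7, 6, 7, 4, 2, 5], mu=4)
print(round(m, 2), round(s2, 2), round(s_m, 2), round(t, 2), df)  # 4.7 3.57 0.6 1.17 9
```

The resulting t of 1.17 would then be compared against the cutoff from the t table for df = 9.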
How are you doing?
1. In what sense is a sample’s variance a biased estimate of the variance of the
population the sample is taken from? That is, in what way does the sample’s
variance typically differ from the population’s?
2. What is the difference between the usual formula for figuring the variance and
the formula for estimating a population’s variance from the scores in a sample
(that is, the formula for an unbiased estimate of the population variance)?
3. (a) What are degrees of freedom? (b) How do you figure the degrees of freedom
in a t test for a single sample? (c) What do they have to do with estimating the
population variance? (d) What do they have to do with the t distribution?
4. (a) How does a t distribution differ from a normal curve? (b) How do degrees of
freedom affect this? (c) What is the effect of the difference on hypothesis testing?
5. List three differences in how you do hypothesis testing for a t test for a single
sample versus the Z test you learned in Chapter 5.
6. A population has a mean of 23. A sample of 4 is given an experimental proce-
dure and has scores of 20, 22, 22, and 20. Test the hypothesis that the proce-
dure produces a lower score. Use the .05 significance level. (a) Use the steps
of hypothesis testing and (b) make a sketch of the distributions involved.
❸ Determine the cutoff sample score on the comparison distribution at
which the null hypothesis should be rejected. From Table A–2, the cutoff
for a one-tailed t test at the .05 level for df = 3 is –2.353. The cutoff t score
is negative, since the research hypothesis is that the procedure produces a
lower score.
❹ Determine your sample's score on the comparison distribution:
t = (M – μ)/SM = (21 – 23)/.57 = –2/.57 = –3.51.
➎ Decide whether to reject the null hypothesis. The t of –3.51 is more extreme
than the needed t of –2.353. Therefore, reject the null hypothesis;
the research hypothesis is supported.
(b) Sketches of distributions are shown in Figure 7–5.
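The figuring for question 6 can be checked in Python, following the answer's convention of rounding each intermediate value to two decimals (variable names are ours):

```python
import math

scores, mu = [20, 22, 22, 20], 23
n = len(scores)
m = sum(scores) / n                          # 21.0
ss = sum((x - m) ** 2 for x in scores)       # 4.0
s2 = round(ss / (n - 1), 2)                  # 1.33
s_m = round(math.sqrt(round(s2 / n, 2)), 2)  # sqrt(.33) = .57
t = round((m - mu) / s_m, 2)                 # -3.51
print(m, s2, s_m, t)
```

Carrying full precision instead of rounding at each step gives a slightly different t (about –3.46), which is why hand figuring and software output can disagree in the last decimals.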
6. (a) Steps of hypothesis testing:
❶ Restate the question as a research hypothesis and a null hypothesis
about the populations. There are two populations:
Population 1: People who are given the experimental procedure.
Population 2: The general population.
The research hypothesis is that Population 1 will score lower than Population 2.
The null hypothesis is that Population 1 will not score lower than Population 2.
❷ Determine the characteristics of the comparison distribution.
a. The mean of the distribution of means is 23.
b. The standard deviation is figured as follows:
●A Figure the estimated population variance. You first need to figure the sample mean, which is (20 + 22 + 22 + 20)/4 = 84/4 = 21. The estimated population variance is S2 = SS/(N – 1) = [(20 – 21)² + (22 – 21)² + (22 – 21)² + (20 – 21)²]/(4 – 1) = (1 + 1 + 1 + 1)/3 = 4/3 = 1.33.
●B Figure the variance of the distribution of means: S2M = S2/N = 1.33/4 = .33.
●C Figure the standard deviation of the distribution of means: SM = √S2M = √.33 = .57.
c. The shape of the comparison distribution will be a t distribution with df = 3.
Figure 7–5 Distributions for answer to "How Are You Doing?" question 6b. (The figure shows the population, which is normal with a mean of 23; the t comparison distribution, marked in raw scores 21.29, 21.86, 22.43, and 23 at t scores –3, –2, –1, and 0; and the sample mean of 21.)
repeated-measures design: research strategy in which each person is tested more than once; same as within-subjects design.
t test for dependent means: hypothesis-testing procedure in which there are two scores for each person and the population variance is not known; it determines the significance of a hypothesis that is being tested using difference or change scores from a single group of people.
The t Test for Dependent Means
The situation you just learned about (the t test for a single sample) is for when you
know the population mean but not its variance and you have a single sample of
scores. It turns out that in most research you do not even know the population's
mean; moreover, you usually have not one set, but two sets, of scores. This
situation, not knowing the population mean and having two sets of scores, is
very common.
The rest of this chapter focuses specifically on this important research situation in
which you have two scores from each person in your sample. This kind of research sit-
uation is called a repeated-measures design (also known as a within subjects design).
A common example is when you measure the same people before and after some
psychological or social intervention. For example, a psychologist might measure the
quality of men’s communication before and after receiving premarital counseling.
The hypothesis-testing procedure for the situation in which each person is mea-
sured twice (that is, for the situation in which we have a repeated-measures design) is a
t test for dependent means. It has the name "dependent means" because the means
of the two groups of scores (for example, a group of before-scores and a group of
after-scores) are dependent on each other in that they are both from the same people. (In Chapter 8,
we consider the situation in which you compare scores from two different groups of
people, a research situation you analyze using a t test for independent means.)
You do a t test for dependent means exactly the same way as a t test for a single
sample, except that (a) you use something called difference scores, and (b) you as-
sume that the population mean (of the difference scores) is 0. We will now consider
each of these two new aspects.
Difference Scores
With a repeated-measures design, your sample includes two scores for each person in-
stead of just one. The way you handle this is to make the two scores per person into one
Answers
1.The sample’s variance will in general be smaller than the variance of the pop-
ulation the sample is taken from.
2. In the usual formula you divide by the number of participants (N); in the formula for estimating a population's variance from the scores in a sample, you divide by the number of participants in the sample minus 1 (that is, N – 1).
3. (a) Degrees of freedom consist of the number of scores free to vary. (b) The degrees of freedom in a t test for a single sample consist of the number of scores in the sample minus 1. (c) In estimating the population variance, the formula is the sum of squared deviations divided by the degrees of freedom. (d) t distributions differ slightly from each other according to the degrees of freedom.
4. (a) A t distribution differs from a normal curve in that it has heavier tails; that is, more scores at the extremes. (b) The more degrees of freedom, the closer the shape (including the tails) is to a normal curve. (c) The cutoffs for significance are more extreme for a t distribution than for a normal curve.
5. In the t test you (a) estimate the population variance from the sample (it is not known in advance); (b) look up the cutoff on a t table, in which you also have to take into account the degrees of freedom (you don't use a normal curve table); and (c) your sample's score on the comparison distribution, which is a t distribution (not a normal curve), is a t score (not a Z score).
score per person! You do this magic by creating difference scores: For each person,
you subtract one score from the other. If the difference is before versus after, differ-
ence scores are also called change scores.
Consider the example of the quality of men’s communication before and after re-
ceiving premarital counseling. The psychologist subtracts the communication quality
score before the counseling from the communication quality score after the counsel-
ing. This gives an after-minus-before difference score for each man. When the two
scores are a before-score and an after-score, we usually take the after-score minus the
before-score to indicate the change.
Once you have the difference score for each person in the study, you do the rest of
the hypothesis testing with difference scores. That is, you treat the study as if there were
a single sample of scores (scores that in this situation happen to be difference scores).
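Turning two scores per person into one difference score per person is a one-line computation. For example (the numbers here are hypothetical, purely for illustration):

```python
# Hypothetical before and after scores for three people.
before = [3, 5, 4]
after = [6, 5, 2]

# After-minus-before change scores, one per person.
changes = [a - b for a, b in zip(after, before)]
print(changes)  # [3, 0, -2]
```

From here on, the list of change scores is treated exactly like a single sample of scores.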
Population of Difference Scores with a Mean of 0
So far in the research situations we have considered in this book, you have always
known the mean of the population to which you compared your sample’s mean. For
example, in the college dormitory survey of hours studied, you knew the population
mean was 17 hours. However, now we are using difference scores, and we usually
don’t know the mean of the population of difference scores.
Here is the solution. Ordinarily, the null hypothesis in a repeated-measures de-
sign is that on the average there is no difference between the two groups of scores.
For example, the null hypothesis in a study of the quality of men’s communication
before and after receiving premarital counseling is that on the average there is no dif-
ference between communication quality before and after the counseling. What does
no difference mean? Saying there is on the average no difference is the same as say-
ing that the mean of the population of the difference scores is 0. Therefore, when
working with difference scores, you are comparing the population of difference
scores that your sample of difference scores comes from to a population of differ-
ence scores with a mean of 0. In other words, with a t test for dependent means, what
we call Population 2 will ordinarily have a mean of 0 (that is, it is a population of dif-
ference scores that has a mean of 0).
Example of a t Test for Dependent Means
Olthoff (1989) tested the communication quality of couples three months before and
again three months after marriage. One group studied was 19 couples who had re-
ceived ordinary (very minimal) premarital counseling from the ministers who were
going to marry them. (To keep the example simple, we will focus on just this one
group and only on the husbands in the group. Scores for wives were similar, though
somewhat more varied, making it a more complicated example for learning the t test
procedure.)
The scores for the 19 husbands are listed in the “Before” and “After” columns in
Table 7–6, followed by all the t test figuring. (The distributions involved are shown
in Figure 7–6.) The crucial column for starting the analysis is the difference scores.
For example, the first husband, whose communication quality was 126 before marriage
and 115 after, had a difference of –11. (We figured after minus before, so that
an increase is positive and a decrease, as for this husband, is negative.) The mean of
the difference scores is –12.05. That is, on the average, these 19 husbands' communication
quality decreased by about 12 points.
Is this decrease significant? In other words, how likely is it that this sample of
difference scores is a random sample from a population of difference scores whose
mean is 0?
difference scores: difference between a person's score on one testing and the same person's score on another testing; often an after-score minus a before-score, in which case it is also called a change score.
❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are two populations:
Population 1: Husbands who receive ordinary premarital counseling.
Population 2: Husbands whose communication quality does not change from
before to after marriage. (In other words, it is a population of husbands whose
mean difference in communication quality from before to after marriage is 0.)
The research hypothesis is that Population 1’s mean difference score (com-
munication quality after marriage minus communication quality before marriage)
is different from Population 2’s mean difference score (of zero). That is, the
Table 7–6 t Test for Communication Quality Scores Before and After Marriage for 19 Husbands Who Received Ordinary Premarital Counseling

Husband   Before   After   Difference (After – Before)   Deviation (Difference – M)   Squared Deviation
A   126   115    –11     1.05      1.10
B   133   125     –8     4.05     16.40
C   126    96    –30   –17.95    322.20
D   115   115      0    12.05    145.20
E   108   119     11    23.05    531.30
F   109    82    –27   –14.95    223.50
G   124    93    –31   –18.95    359.10
H    98   109     11    23.05    531.30
I    95    72    –23   –10.95    119.90
J   120   104    –16    –3.95     15.60
K   118   107    –11     1.05      1.10
L   126   118     –8     4.05     16.40
M   121   102    –19    –6.95     48.30
N   116   115     –1    11.05    122.10
O    94    83    –11     1.05      1.10
P   105    87    –18    –5.95     35.40
Q   123   121     –2    10.05    101.00
R   125   100    –25   –12.95    167.70
S   128   118    –10     2.05      4.20
Σ: 2,210   1,981   –229             2,762.90

For difference scores:
M = –229/19 = –12.05.
μ = 0 (assumed as a no-change baseline of comparison).
S2 = SS/df = 2,762.90/(19 – 1) = 153.49.
S2M = S2/N = 153.49/19 = 8.08.
SM = √S2M = √8.08 = 2.84.
t with df = 18 needed for 5% level, two-tailed = ±2.101.
t = (M – μ)/SM = (–12.05 – 0)/2.84 = –4.24.
Decision: Reject the null hypothesis.
Source: Data from Olthoff (1989).
TIP FOR SUCCESS
As in previous chapters, Popula-
tion 2 is the population for when
the null hypothesis is true.
research hypothesis is that husbands who receive ordinary premarital counseling,
like the husbands Olthoff studied, do change in communication quality from be-
fore to after marriage. The null hypothesis is that the populations are the same—
that the husbands who receive ordinary premarital counseling do not change in
their communication quality from before to after marriage.
Notice that you have no actual information about Population 2 husbands.
The husbands in the study are a sample of Population 1 husbands. For the pur-
poses of hypothesis testing, you set up Population 2 as a kind of straw man com-
parison group. That is, for the purpose of the analysis, you set up a comparison
group of husbands who, if measured before and after marriage, would on the
average show no difference.
❷ Determine the characteristics of the comparison distribution. If the null hy-
pothesis is true, the mean of the population of difference scores is 0. The vari-
ance of the population of difference scores can be estimated from the sample
of difference scores. As shown in Table 7–6, the sum of squared deviations of
the difference scores from the mean of the difference scores is 2,762.90. With 19
husbands in the study, there are 18 degrees of freedom. Dividing the sum of
squared deviation scores by the degrees of freedom gives an estimated popula-
tion variance of difference scores of 153.49.
The distribution of means (from this population of difference scores) has a
mean of 0, the same as the mean of the population of difference scores. The vari-
ance of the distribution of means of difference scores is the estimated population
variance of difference scores (153.49) divided by the sample size (19), which
Figure 7–6 Distributions for the Olthoff (1989) example of a t test for dependent means. (The figure shows the population of difference scores with a mean of 0; the t comparison distribution, marked in raw scores –2.85, 0, and 2.85 at t scores –1, 0, and 1; and the sample mean of –12.05.)
gives 8.08. The standard deviation of the distribution of means of difference
scores is 2.84, the square root of 8.08. Because Olthoff was using an estimated
population variance, the comparison distribution is a t distribution. The estimate
of the population variance of difference scores is based on 18 degrees of freedom;
so this comparison distribution is a t distribution for 18 degrees of freedom.
❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected. Olthoff used a two-tailed test to allow
for either an increase or decrease in communication quality. Using the .05 significance
level and 18 degrees of freedom, Table A–2 shows cutoff t scores of
+2.101 and –2.101.
❹ Determine your sample's score on the comparison distribution. Olthoff's
sample had a mean difference score of –12.05. That is, the mean was 12.05
points below the mean of 0 on the distribution of means of difference scores.
The standard deviation of the distribution of means of difference scores is 2.84.
Thus, the mean of the difference scores of –12.05 is 4.24 standard deviations
below the mean of the distribution of means of difference scores. So Olthoff's
sample of difference scores has a t score of –4.24.
❺ Decide whether to reject the null hypothesis. The t of –4.24 for the sample of
difference scores is more extreme than the needed t of –2.101. Thus, you can reject
the null hypothesis: Olthoff's husbands are from a population in which husbands'
communication quality is different after marriage from what it was
before (it is lower).
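Olthoff's figuring can be reproduced with a short Python sketch: a t test for dependent means is just a single-sample t test on the difference scores with μ = 0 (the function name is ours; the figuring carries full precision, so later decimals can differ slightly from the rounded values in Table 7–6):

```python
import math

def dependent_means_t(before, after):
    """Return the mean difference score and t, assuming mu = 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    m = sum(diffs) / n
    ss = sum((d - m) ** 2 for d in diffs)
    s_m = math.sqrt(ss / (n - 1) / n)  # SD of distribution of means of diffs
    return m, m / s_m                  # t = (M - 0) / S_M

# Before and after communication-quality scores for the 19 husbands (Table 7-6).
before = [126, 133, 126, 115, 108, 109, 124, 98, 95, 120,
          118, 126, 121, 116, 94, 105, 123, 125, 128]
after = [115, 125, 96, 115, 119, 82, 93, 109, 72, 104,
         107, 118, 102, 115, 83, 87, 121, 100, 118]
m, t = dependent_means_t(before, after)
print(round(m, 2), round(t, 2))  # -12.05 -4.24
```

The t of –4.24 is more extreme than the two-tailed cutoff of ±2.101 for df = 18, so the null hypothesis is rejected, matching the decision above.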
Olthoff’s actual study was more complex. You may be interested to know that he
found that the wives also showed this decrease in communication quality after mar-
riage. But a group of similar engaged couples who were given special communication
skills training by their ministers (much more than the usual short session) had no sig-
nificant decline in marital communication quality after marriage. In fact, there is a
great deal of research showing that on the average marital happiness declines steeply
over time (VanLaningham et al., 2001). And many studies have now shown the value
of a full course of premarital communications training. For example, a recent repre-
sentative survey of 3,344 adults in the United States showed that those who had at-
tended a premarital communication program had significantly greater marital
satisfaction, had less marital conflict, and were 31% less likely to divorce (Stanley et al.,
2006). Further, benefits were greatest for those with a college education!
Summary of Steps for a t Test for Dependent Means
Table 7–7 summarizes the steps for a t test for dependent means.5
A Second Example of a t Test for Dependent Means
Here is another example. A team of researchers examined the brain systems involved
in human romantic love (Aron et al., 2005). One issue was whether romantic love en-
gages a part of the brain called the caudate (a brain structure that is engaged when peo-
ple win money, are given cocaine, and other such “rewards”). Thus, the researchers
recruited people who had very recently fallen “madly in love.” (For example, to be in
the study participants had to think about their partner at least 80% of their waking
hours.) Participants brought a picture of their beloved with them, plus a picture of a fa-
miliar, neutral person of the same age and sex as their beloved. Participants then went
in to the functional magnetic resonance imaging (fMRI) machine and their brain was
scanned while they looked at the two pictures—30 seconds at the neutral person’s pic-
ture, 30 seconds at their beloved, 30 seconds at the neutral person, and so forth.
TIP FOR SUCCESS
Step ❷ of hypothesis testing for the
t test for dependent means is more
complex than previously. This can
make it easy to lose track of the
purpose of this step. Step ❷ of
hypothesis testing determines the
characteristics of the comparison
distribution. In the case of the t test
for dependent means, this compar-
ison distribution is a distribution of
means of difference scores. The
key characteristics of this distribu-
tion are its mean (which is as-
sumed to equal 0), its standard
deviation (which is estimated as SM),
and its shape (a t distribution with
degrees of freedom equal to the
sample size minus 1).
TIP FOR SUCCESS
You now have to deal with some
rather complex terms, such as the
standard deviation of the distribu-
tion of means of difference scores.
Although these terms are complex,
there is good logic behind them.
The best way to understand such
terms is to break them down into
manageable pieces. For example,
you will notice that these new
terms are the same as the terms
for the t test for a single sample,
with the added phrase “of differ-
ence scores.” This phrase has
been added because all of the fig-
uring for the t test for dependent
means uses difference scores.
ISBN 0-558-46761-X
Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.
Introduction to t Tests 241
Table 7–8 shows average brain activations (mean fMRI scanner values) in the
caudate area of interest during the two kinds of pictures. (We have simplified the
example for teaching purposes, including using only 10 participants when the actual
study had 17.) It also shows the figuring of the difference scores and all the other
Table 7–7 Steps for a t Test for Dependent Means
❶ Restate the question as a research hypothesis and a null hypothesis about the populations.
❷ Determine the characteristics of the comparison distribution.
   a. Make each person's two scores into a difference score. Do all the remaining steps using these difference scores.
   b. Figure the mean of the difference scores.
   c. Assume a mean of the distribution of means of difference scores of 0: μ = 0.
   d. The standard deviation of the distribution of means of difference scores is figured as follows:
      ●A Figure the estimated population variance of difference scores: S² = SS/df.
      ●B Figure the variance of the distribution of means of difference scores: S²M = S²/N.
      ●C Figure the standard deviation of the distribution of means of difference scores: SM = √S²M.
   e. The shape is a t distribution with df = N - 1.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
   a. Decide the significance level and whether to use a one-tailed or a two-tailed test.
   b. Look up the appropriate cutoff in a t table.
❹ Determine your sample's score on the comparison distribution: t = (M - μ)/SM.
➎ Decide whether to reject the null hypothesis: Compare the scores from Steps ❸ and ❹.
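The arithmetic in Steps ❷ and ❹ can be sketched in a few lines of code. The following is a minimal illustration of the figuring, not part of the text; the function name and the before and after scores are made up for demonstration.

```python
import math

def t_test_dependent_means(before, after):
    """Figure t for dependent means, following the steps in Table 7-7."""
    # Step 2a: make each person's two scores into a difference score
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Step 2b: mean of the difference scores
    m = sum(diffs) / n
    # Step 2d-A: estimated population variance of difference scores, S^2 = SS/df
    df = n - 1
    s2 = sum((d - m) ** 2 for d in diffs) / df
    # Step 2d-B and 2d-C: variance and standard deviation of the
    # distribution of means of difference scores
    sm = math.sqrt(s2 / n)
    # Step 4: t = (M - mu)/SM, with mu assumed to be 0 (Step 2c)
    t = (m - 0) / sm
    return m, s2, sm, t, df

# Hypothetical data: three people measured before and after a procedure
m, s2, sm, t, df = t_test_dependent_means([5, 7, 6], [8, 9, 7])
```

The returned t would then be compared against the cutoff from a t table for df degrees of freedom (Steps ❸ and ➎).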
Table 7–8 t Test for a Study of Romantic Love and Brain Activation in Part of the Caudate

                     Brain Activation
Student   Beloved's photo   Control photo   Difference (Beloved - Control)   Deviation (Difference - M)   Squared Deviation
  1           1487.8            1487.2                   .6                           -.800                      .640
  2           1329.4            1328.1                  1.3                           -.100                      .010
  3           1407.9            1405.9                  2.0                            .600                      .360
  4           1236.1            1234.0                  2.1                            .700                      .490
  5           1299.8            1298.2                  1.6                            .200                      .040
  6           1447.2            1444.7                  2.5                           1.100                     1.210
  7           1354.1            1354.3                  -.2                          -1.600                     2.560
  8           1204.6            1203.7                   .9                           -.500                      .250
  9           1322.3            1320.8                  1.5                            .100                      .010
 10           1388.5            1386.8                  1.7                            .300                      .090
Σ:           13477.7           13463.7                 14.0                                                     5.660

For difference scores:
M = 14.0/10 = 1.400.
μ = 0 (assumed as a no-change baseline of comparison).
S² = SS/df = 5.660/(10 - 1) = 5.660/9 = .629.
S²M = S²/N = .629/10 = .063.
SM = √S²M = √.063 = .251.
t needed with df = 9 for 5% level, one-tailed = 1.833.
t = (M - μ)/SM = (1.400 - 0)/.251 = 5.58.
Decision: Reject the null hypothesis.
Source: Data based on Aron et al. (2005).
figuring for the t test for dependent means. Figure 7–7 shows the distributions in-
volved. Here are the steps of hypothesis testing:
❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are two populations:
Population 1: Individuals like those tested in this study.
Population 2: Individuals whose brain activation in the caudate area of interest is
the same when looking at a picture of their beloved and a picture of a familiar,
neutral person.
The research hypothesis is that Population 1’s mean difference score (brain activa-
tion when viewing the beloved’s picture minus brain activation when viewing the
neutral person’s picture) is greater than Population 2’s mean difference score (of no
difference). That is, the research hypothesis is that brain activation in the caudate
area of interest is greater when viewing the beloved person’s picture than when
viewing the neutral person’s picture. The null hypothesis is that Population 1’s
mean difference score is not greater than Population 2’s. That is, the null hypothe-
sis is that brain activation in the caudate area of interest is not greater when viewing
the beloved person’s picture than when viewing the neutral person’s picture.
❷ Determine the characteristics of the comparison distribution.
a. Make each person’s two scores into a difference score. This is shown in the
column labeled “Difference” in Table 7–8. You do all the remaining steps
using these difference scores.
[Figure 7–7 Distributions for the example of romantic love and brain activation in part of the caudate: the population of difference scores (mean 0), the comparison distribution (t, with raw scores -.251, 0, .251 at t scores -1, 0, 1), and the sample mean difference of 1.400.]
b. Figure the mean of the difference scores. The sum of the difference scores (14.0) divided by the number of difference scores (10) gives a mean of the difference scores of 1.400. So, M = 1.400.
c. Assume a mean of the distribution of means of difference scores of 0: μ = 0.
d. The standard deviation of the distribution of means of difference scores is fig-
ured as follows:
●A Figure the estimated population variance of difference scores: S² = SS/df = 5.660/(10 - 1) = .629.
●B Figure the variance of the distribution of means of difference scores: S²M = S²/N = .629/10 = .063.
●C Figure the standard deviation of the distribution of means of difference scores: SM = √S²M = √.063 = .251.
e. The shape is a t distribution with df = N - 1. Therefore, the comparison
distribution is a t distribution for 9 degrees of freedom. It is a t distribution
because we figured its variance based on an estimated population variance. It
has 9 degrees of freedom because there were 9 degrees of freedom in the
estimate of the population variance.
❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected.
a. We will use the standard .05 significance level. This is a one-tailed test because
the researchers were interested only in a specific direction of difference.
b. Using the .05 significance level with 9 degrees of freedom, Table A–2 shows a
cutoff t of 1.833. In Table 7–8, the difference score is figured as brain activa-
tion when viewing the beloved’s picture minus brain activation when viewing
the neutral person’s picture. Thus, the research hypothesis predicts a positive
difference score, which means that our cutoff is +1.833.
❹ Determine your sample's score on the comparison distribution.
t = (M - μ)/SM = (1.400 - 0)/.251 = 5.58. The sample's mean difference
of 1.400 is 5.58 standard deviations (of .251 each) above the mean of 0 on the
distribution of means of difference scores.
➎ Decide whether to reject the null hypothesis. The sample’s t score of 5.58 is
more extreme than the cutoff t of 1.833. You can reject the null hypothesis.
Brain activation in the caudate area of interest is greater when viewing a
beloved’s picture than when viewing a neutral person’s picture. The results of
this study are not limited to North Americans. Recently, the study was replicated,
with virtually identical results, in Beijing with Chinese students who were in-
tensely in love (Xu et al., 2007).
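The figuring in Table 7–8 can be checked in code. This sketch (ours, not from the text) uses Python's statistics module, whose variance function divides by N - 1, the same denominator as S²:

```python
import math
import statistics

# Difference scores (Beloved - Control) from Table 7-8
diffs = [0.6, 1.3, 2.0, 2.1, 1.6, 2.5, -0.2, 0.9, 1.5, 1.7]

m = statistics.mean(diffs)        # M, the mean of the difference scores
s2 = statistics.variance(diffs)   # S^2 = SS/(N - 1)
sm = math.sqrt(s2 / len(diffs))   # S_M, standard deviation of the
                                  # distribution of means of difference scores
t = (m - 0) / sm                  # t = (M - mu)/S_M, with mu = 0

cutoff = 1.833  # t needed from a t table: df = 9, 5% level, one-tailed
reject_null = t > cutoff
```

Rounded to two decimal places, the figures match the table: M = 1.40, S² = .63, SM = .25, t = 5.58, and the null hypothesis is rejected.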
t Test for Dependent Means with Scores
from Pairs of Research Participants
The t test for dependent means is also called a paired-samples t test, t test for correlated
means, t test for matched samples, and t test for matched pairs. Each of these names
comes from the same idea that in this kind of t test you are comparing two sets of scores
that are related to each other in a direct way. In the t test for dependent means examples
in this chapter, the two sets of scores have been related because each individual had a
score in both sets of scores (for example, a score before a procedure and a score after a
procedure). However, you can also use a t test for dependent means with scores from
pairs of research participants, considering each pair as if it were one person, and figur-
ing the difference score for each pair. For example, suppose you have 30 married cou-
ples and want to test whether wives consistently do more housework than husbands.
You could figure for each couple a difference score of the wife's hours of housework per week minus her husband's hours of housework per week. There are also
situations in which experimenters create pairs. For example, a researcher might put
participants into pairs to do a puzzle task together and, for each pair, assign one to be a
leader and one a follower. At the end of the study, participants privately fill out a ques-
tionnaire about how much they enjoyed the interaction. The procedure for analyzing
this study would be to create a difference score for each pair by taking the enjoyment
rating of the leader minus the enjoyment rating of the follower.
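As a small illustration (ours, with made-up ratings), forming the per-pair difference scores is a one-line step; the resulting list is then analyzed exactly like before-after difference scores:

```python
# Hypothetical enjoyment ratings for five leader-follower pairs
leaders = [7, 5, 8, 6, 9]
followers = [6, 5, 6, 7, 7]

# One difference score per pair (leader minus follower);
# each pair is treated as if it were one participant
diffs = [lead - follow for lead, follow in zip(leaders, followers)]
```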
Review and Comparison of Z Test, t Test for a Single Sample, and t Test for Dependent Means
In Chapter 5 you learned about the Z test; in this chapter you have learned about the
t test for a single sample and the t test for dependent means. Table 7–9 provides a
review and comparison of the Z test, the t test for a single sample, and the t test for
dependent means.
TIP FOR SUCCESS
We recommend that you spend
some time carefully going through
Table 7–9. Test your understanding
of the different tests by covering
up portions of the table and trying
to recall the hidden information.
Also, take a look at Chapter Note 3
(page 268) for a discussion of the
terminology used in the formulas.
Table 7–9 Review of the Z Test, the t Test for a Single Sample, and the t Test for Dependent Means

                                                           Type of Test
Features                                  Z Test            t Test for a Single Sample   t Test for Dependent Means
Population variance is known              Yes               No                           No
Population mean is known                  Yes               Yes                          No
Number of scores for each participant     1                 1                            2
Shape of comparison distribution          Z distribution    t distribution               t distribution
Formula for degrees of freedom            Not applicable    df = N - 1                   df = N - 1
Formula                                   Z = (M - μM)/σM   t = (M - μ)/SM               t = (M - μ)/SM
How are you doing?
1. Describe the situation in which you would use a t test for dependent means.
2. When doing a t test for dependent means, what do you do with the two
scores you have for each participant?
3. In a t test for dependent means, (a) what is usually considered to be the mean
of the “known” population (Population 2). (b) Why?
4. Five individuals are tested before and after an experimental procedure; their
scores are given in the following table. Test the hypothesis that there is no
change, using the .05 significance level. (a) Use the steps of hypothesis test-
ing and (b) sketch the distributions involved.
Person Before After
1 20 30
2 30 50
3 20 10
4 40 30
5 30 40
5. What about the research situation makes the difference in whether you should
carry out a Z test or a t test for a single sample?
6. What about the research situation makes the difference in whether you should
carry out a t test for a single sample or a t test for dependent means?
Answers
1. A t test for dependent means is used when you are doing hypothesis testing and you have two scores for each participant (such as a before-score and an after-score) and the population variance is unknown. It is also used when a study compares participants who are organized into pairs.
2. Subtract one from the other to create a difference (or change) score for each person. The t test is then done with these difference (or change) scores.
3. (a) The mean of the "known" population (Population 2) is 0. (b) You are comparing your sample to a situation in which there is no difference—a population of difference scores in which the average difference is 0.
4. (a) Steps of hypothesis testing (all figuring is shown in Table 7–10):
❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:
Population 1: People like those tested before and after the experimental procedure.
Population 2: People whose scores are the same before and after the experimental procedure.
The research hypothesis is that Population 1's mean change score (after minus before) is different from Population 2's. The null hypothesis is that Population 1's mean change score is the same as Population 2's.
❷ Determine the characteristics of the comparison distribution. The mean of the distribution of means of difference scores (the comparison distribution) is 0; the standard deviation of the distribution of means of difference scores is 6; it is a t distribution with 4 degrees of freedom.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. For a two-tailed test at the .05 level, the cutoff sample scores are +2.776 and -2.776.
❹ Determine your sample's score on the comparison distribution. t = (4 - 0)/6 = .67.
➎ Decide whether to reject the null hypothesis. The sample's t score of .67 is not more extreme than the cutoff t of ±2.776. Therefore, do not reject the null hypothesis.

Table 7–10 Figuring for Answer to "How Are You Doing?" Question 4

            Score             Difference         Deviation          Squared
Person   Before   After   (After - Before)   (Difference - M)     Deviation
  1        20       30           10                  6                36
  2        30       50           20                 16               256
  3        20       10          -10                -14               196
  4        40       30          -10                -14               196
  5        30       40           10                  6                36
Σ:        140      160           20                                  720

For difference scores:
M = 20/5 = 4.00.
μ = 0.
S² = SS/df = 720/(5 - 1) = 720/4 = 180.
S²M = S²/N = 180/5 = 36.
SM = √S²M = √36 = 6.
t needed with df = 4 for 5% significance level, two-tailed = ±2.776.
t = (M - μ)/SM = (4 - 0)/6 = .67.
Decision: Do not reject the null hypothesis.

(b) The distributions are shown in Figure 7–8.
[Figure 7–8 Distributions for answer to "How Are You Doing?" question 4: the population of difference scores (mean 0), the comparison distribution (t, with raw scores -6, 0, 6 at t scores -1, 0, 1), and the sample mean difference of 4.0.]
5. As shown in Table 7–9, whether the population variance is known determines whether you should carry out a Z test or a t test for a single sample. You use a Z test when the population variance is known and you use the t test for a single sample when it is not known.
6. As shown in Table 7–9, whether the population mean is known and whether there are one or two scores for each participant determines whether you should carry out a t test for a single sample or a t test for dependent means. You use a t test for a single sample when you know the population mean and you have one score for each participant; you use the t test for dependent means when you do not know the population mean and there are two scores for each participant.
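The figuring in Table 7–10 can be verified directly. A quick sketch (ours, not part of the text) of the question 4 arithmetic:

```python
import math

# Before and after scores for the five people in question 4
before = [20, 30, 20, 40, 30]
after = [30, 50, 10, 30, 40]

diffs = [a - b for b, a in zip(before, after)]   # difference scores
n = len(diffs)
m = sum(diffs) / n                               # M = 20/5 = 4.00
s2 = sum((d - m) ** 2 for d in diffs) / (n - 1)  # S^2 = 720/4 = 180
sm = math.sqrt(s2 / n)                           # S_M = 6
t = (m - 0) / sm                                 # t = .67
```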
Assumptions of the t Test for a Single Sample
and the t Test for Dependent Means
As we have seen, when you are using an estimated population variance, the comparison
distribution is a t distribution. However, the comparison distribution will be exactly a
t distribution only if the distribution of individuals follows a normal curve. Otherwise,
the comparison distribution will follow some other (usually unknown) shape.
Thus, strictly speaking, a normal population is a requirement within the logic
and mathematics of the t test. A requirement like this for a hypothesis-testing
procedure is called an assumption. That is, a normal population distribution is one
assumption of the t test. The effect of this assumption is that if the population distri-
bution is not normal, the comparison distribution will be some indeterminate shape
other than a t distribution—and thus the cutoffs on the t table will be incorrect.
Unfortunately, when you do a t test, you don’t know whether the population is nor-
mal. This is because, when doing a t test, usually all you have to go on are the scores in
your sample. Fortunately, however, as we saw in Chapter 3, distributions in psychology
research quite often approximate a normal curve. (This also applies to distributions of difference scores.) Also, statisticians have found that, in practice, you get reasonably accurate results with t tests even when the population is rather far from normal. In other
words, the t test is said to be robust over moderate violations of the assumption of a nor-
mal population distribution. How statisticians figure out the robustness of a test is an
interesting topic, which is described in Box 8–1 in Chapter 8.
The only very common situation in which using a t test for dependent means is
likely to give a seriously distorted result is when you are using a one-tailed test and
the population is highly skewed (is very asymmetrical, with a much longer tail on
one side than the other). Thus, you need to be cautious about your conclusions when
doing a one-tailed test if the sample of difference scores is highly skewed, suggest-
ing the population it comes from is also highly skewed.
Effect Size and Power for the t Test
for Dependent Means
Effect Size
You can figure the effect size for a study using a t test for dependent means the same way as in Chapter 6.6 d is the difference between the population means divided by the population standard deviation: d = (μ1 - μ2)/σ. When using this formula for a t test for dependent means, μ1 is for the predicted mean of the population of difference scores, μ2 (the "known" population mean) is almost always 0, and σ usually stands for the standard deviation of the population of difference scores. The conventions for effect size for a t test for dependent means are also the same as you learned for the situation we considered in Chapter 6: A small effect size is .20, a medium effect size is .50, and a large effect size is .80.
Consider an example. A sports psychologist plans a study on attitudes toward teammates before versus after a game. She will administer an attitude questionnaire twice, once before and once after a game. Suppose that the smallest before-after difference that would be of any importance is 4 points on the questionnaire. Also suppose that, based on related research, the researcher figures that the standard deviation of difference scores on this attitude questionnaire is about 8 points. Thus, μ1 = 4 and σ = 8. Applying the effect size formula, d = (μ1 - μ2)/σ = (4 - 0)/8 = .50. In terms of the effect size conventions, her planned study has a medium effect size.
assumption  Condition, such as a population's having a normal distribution, required for carrying out a particular hypothesis-testing procedure; a part of the mathematical foundation for the accuracy of the tables used in determining cutoff values.
robustness  Extent to which a particular hypothesis-testing procedure is reasonably accurate even when its assumptions are violated.
To estimate the effect size after a study, use the actual mean of your sample's difference scores as your estimate of μ1, and use S (for the population of difference scores) as your estimate of σ.
Consider our first example of a t test for dependent means, the study of husbands' change in communication quality. In that study, the mean of the difference scores was -12.05. The estimated population standard deviation of the difference scores would be 12.39. That is, we figured the estimated variance of the difference scores (S²) to be 153.49; √153.49 = 12.39. Therefore, the estimated effect size is d = (μ1 - μ2)/σ = (M - 0)/S = (-12.05 - 0)/12.39 = -.97. This is a very large effect size. (The negative sign for the effect size means that the large effect was a decrease.)
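Both calculations come down to a line each. A sketch (ours, not from the text) of the planned effect size for the sports psychology study and the estimated effect size for the husbands' communication study:

```python
import math

# Planned study: smallest important difference is 4 points,
# and sigma is figured to be about 8 points
d_planned = (4 - 0) / 8          # a medium effect size by the conventions

# Completed study: M = -12.05, estimated variance S^2 = 153.49
s = math.sqrt(153.49)            # estimated standard deviation S
d_estimated = (-12.05 - 0) / s   # a very large (negative) effect size
```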
Power
Power for a t test for dependent means can be determined using a power table, a power
software package, or an Internet power calculator. Table 7–11 gives the approximate
power at the .05 significance level for small, medium, and large effect sizes and one-
tailed and two-tailed tests. In the sports psychology example, the researcher expected
a medium effect size (d = .50). If she planned to conduct the study using the .05 level,
two-tailed, with 20 participants, the study would have a power of .59. This means that,
if the research hypothesis is true and has a medium effect size, there is a 59% chance
that this study will come out significant.
The power table (Table 7–11) is also useful when you are reading about a non-
significant result in a published study. Suppose that a study using a t test for dependent
means has a nonsignificant result. The study tested significance at the .05 level, was
two-tailed, and had 10 participants. Should you conclude that there is in fact no differ-
ence at all in the populations? Probably not. Even assuming a medium effect size, Table
7–11 shows that there is only a 32% chance of getting a significant result in this study.
TIP FOR SUCCESS
Recall from Chapter 6 that power
can be expressed as a probability
(such as .71) or as a percentage
(such as 71%). Power is expressed
as a probability in Table 7–11 (as
well as in power tables in later
chapters).
Table 7–11 Approximate Power for Studies Using the t Test for Dependent Means for Testing Hypotheses at the .05 Significance Level

Difference                      Effect Size
Scores in           Small         Medium        Large
Sample (N)        (d = .20)     (d = .50)     (d = .80)
One-tailed test
10 .15 .46 .78
20 .22 .71 .96
30 .29 .86 *
40 .35 .93 *
50 .40 .97 *
100 .63 * *
Two-tailed test
10 .09 .32 .66
20 .14 .59 .93
30 .19 .77 .99
40 .24 .88 *
50 .29 .94 *
100 .55 * *
*Power is nearly 1.
Consider another study that was not significant. This study also used the .05 sig-
nificance level, two-tailed. This study had 100 research participants. Table 7–11 tells
you that there would be a 55% chance of the study’s coming out significant if there
were even a true small effect size in the population. If there were a medium effect
size in the population, the table indicates that there is almost a 100% chance that this
study would have come out significant. Thus, in this study with 100 participants, we
could conclude from the results that in the population there is probably at most a
small difference.
To keep Table 7–11 simple, we have given power figures for only a few differ-
ent numbers of participants (10, 20, 30, 40, 50, and 100). This should be adequate for
the kinds of rough evaluations you need to make when evaluating results of research
articles.7
Planning Sample Size
Table 7–12 gives the approximate number of participants needed for 80% power for
a planned study. (Eighty percent is a common figure used by researchers for the
minimum power to make a study worth doing.) Suppose you plan a study in which
you expect a large effect size and you use the .05 significance level, two-tailed. The
table shows you would only need 14 participants to have 80% power. On the other
hand, a study using the same significance level, also two-tailed, but in which you ex-
pect only a small effect size would need 196 participants for 80% power.8
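The power figures in Table 7–11 can be approximated by simulation. The following sketch (ours, not from the text) estimates power for the sports psychology situation: a medium effect size (d = .50), .05 significance level, two-tailed, 20 participants. The value 2.093 is the standard two-tailed .05 cutoff from a t table for df = 19.

```python
import math
import random

def simulated_power(n, d, cutoff, trials=10_000):
    """Estimate power: repeatedly draw n difference scores from a normal
    population whose mean lies d standard deviations above 0, run the
    t test for dependent means, and count how often the null is rejected."""
    random.seed(0)  # fixed seed so the estimate is reproducible
    rejections = 0
    for _ in range(trials):
        diffs = [random.gauss(d, 1.0) for _ in range(n)]
        m = sum(diffs) / n
        s2 = sum((x - m) ** 2 for x in diffs) / (n - 1)
        t = m / math.sqrt(s2 / n)
        if abs(t) > cutoff:  # two-tailed test
            rejections += 1
    return rejections / trials

power = simulated_power(n=20, d=0.50, cutoff=2.093)
```

With these settings the estimate lands in the high .50s, in line with the approximate .59 given in Table 7–11.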
How are you doing?
1. (a) What is an assumption in hypothesis testing? (b) Describe a specific as-
sumption for a t test for dependent means. (c) What is the effect of violating
this assumption? (d) What does it mean to say that the t test for dependent
means is robust? (e) Describe a situation in which it is not robust.
2. How can you tell if you have violated the normal curve assumption?
3. (a) Write the formula for effect size; (b) describe each of its terms as they
apply to a planned t test for dependent means; (c) describe what you use for
each of its terms in figuring effect size for a completed study that used a t test
for dependent means.
4. You are planning a study in which you predict the mean of the population of
difference scores to be 40, and the population standard deviation is 80. You
plan to test significance using a t test for dependent means, one-tailed, with
an alpha of .05. (a) What is the predicted effect size? (b) What is the power of
this study if you carry it out with 20 participants? (c) How many participants
would you need to have 80% power?
Table 7–12 Approximate Number of Research Participants Needed for 80% Power for the t Test for Dependent Means in Testing Hypotheses at the .05 Significance Level

                      Effect Size
              Small      Medium      Large
            (d = .20)   (d = .50)   (d = .80)
One-tailed     156          26          12
Two-tailed     196          33          14
Controversy: Advantages and Disadvantages
of Repeated-Measures Designs
The main controversies about t tests have to do with their relative advantages and
disadvantages compared to various alternatives (alternatives we will discuss in
Chapter 14). There is, however, one consideration that we want to comment on now.
It is about all research designs in which the same participants are tested before and
after some experimental intervention (the kind of situation the t test for dependent
means is often used for).
Studies using difference scores (that is, studies using a repeated-measures de-
sign) often have much larger effect sizes for the same amount of expected difference
between means than other kinds of research designs. That is, testing each of a group
of participants twice (once under one condition and once under a different condition)
usually produces a study with high power. In particular, this kind of study gives more
power than dividing the participants up into two groups and testing each group once
(one group tested under one condition and the other tested under another condition).
In fact, studies using difference scores usually have even more power than those in
which you have twice as many participants, but each is tested only once.
Why do repeated-measures designs have so much power? The reason is that the
standard deviation of difference scores is usually quite low. (The standard deviation
of difference scores is what you divide by to get the effect size when using difference
scores.) This produces a large effect size, which increases the power. In a repeated-
measures design, the only variation is in the difference scores. Variation among par-
ticipants on each testing’s scores is not part of the variation involved in the analysis.
As an example, look back at Table 7–8 from our romantic love and brain imaging
study. Notice that there were very great differences between the scores (fMRI scanner
Answers
1. (a) An assumption is a requirement that you must meet for the results of the hypothesis-testing procedure to be accurate. (b) The population of individuals' difference scores is assumed to be a normal distribution. (c) The significance level cutoff from the t table is not accurate. (d) Unless you very strongly violate the assumption (that is, unless the population distribution is very far from normal), the cutoff is fairly accurate. (e) The t test for dependent means is not robust when you are doing a one-tailed test and the population distribution is highly skewed.
2. You look at the distribution of the sample of difference scores to see if it is dramatically different from a normal curve.
3. (a) d = (μ1 - μ2)/σ. (b) d is the effect size; μ1 is for the predicted mean of the population of difference scores; μ2 is the mean of the known population, which for a population of difference scores is almost always 0; σ is for the standard deviation of the population of difference scores. (c) To estimate μ1, you use M, the actual mean of your sample's difference scores; μ2 remains as 0; and for σ, you use S, the estimated standard deviation of the population of difference scores.
4. (a) Predicted effect size: d = (μ1 - μ2)/σ = (40 - 0)/80 = .50. (b) Power of this study: .71. (c) Number of participants for 80% power: 26.
activation values) for each participant. The first participant's scores were around 1,487, the second's around 1,328, and so forth. Each person has a quite different
overall level of activation. But the differences between the two conditions were rela-
tively small. What we see in this example is that, because difference scores are all
comparing participants to themselves, the variation in them is much less (and does
not include the variation between participants). William S. Gosset, who essentially
invented the t test (see Box 7–1), made much of the higher power of repeated-
measures studies in a historically interesting controversy over an experiment about
milk, which is described in Box 7–2.
On the other hand, testing a group of people before and after an experimental proce-
dure, without any kind of control group that does not go through the procedure, is a weak
research design (Cook & Campbell, 1979). Even if such a study produces a significant
difference, it leaves many alternative explanations for that difference. For example, the
research participants might have matured or improved during that period anyway, or
perhaps other events happened between tests, or the participants not getting benefits
may have dropped out. It is even possible that the initial test itself caused changes.
Note, however, that the difficulties of research that tests people before and after
some intervention are shared only slightly with the kind of study in which participants
are tested under two conditions, such as viewing a beloved person’s picture and a neu-
tral person’s picture, with half tested first viewing the beloved’s picture and half tested
first viewing the neutral person's picture. Another example would be a study examining
the hand-eye coordination of a group of surgeons under both quiet and noisy conditions
(not while doing surgery, of course). Each surgeon would perform the test of hand-eye
BOX 7–2 The Power of Studies Using Difference Scores: How the Lanarkshire Milk Experiment Could Have Been Milked for More

In 1930, a major health experiment was conducted in Scotland involving 20,000 schoolchildren. Its main purpose was to compare the growth of a group of children who were assigned to drink milk regularly to those who were in a control group. The results were that those who drank milk showed more growth.
However, William Gosset, a contemporary statistician and inventor of the t test (see Box 7–1), was appalled at the way the experiment was conducted. It had cost about £7,500, which in 1930 was a huge amount of money, and was done wrong! Large studies such as this were very popular among statisticians in those days because they seemed to imitate the large numbers found in nature. Gosset, by contrast, being a brewer, was forced to use very small numbers in his studies—experimental batches of beer were too costly. And he was often chided by the "real statisticians" for his small sample sizes. But Gosset argued that no number of participants was large enough when strict random assignment was not followed. And in this study, teachers were permitted to switch children from group to group if they took pity on a child whom they felt would benefit from receiving milk!
However, even more interesting in light of the present chapter, Gosset demonstrated that the researchers could have obtained the same result with 50 pairs of identical twins, flipping a coin to determine which of each pair was in the milk group (and sticking to it). Of course, the statistic you would use is the t test as taught in this chapter—the t test for dependent means.
More recently, the development of power analysis, which we introduced in Chapter 6, has thoroughly vindicated Gosset. It is now clear just how surprisingly few participants are needed when a researcher can find a way to set up a repeated-measures design in which difference scores are the basic unit of analysis. (In this case, each pair of twins would be one "participant.") As Gosset could have told them, studies that use the t test for dependent means can be extremely sensitive.

Sources: Peters (1987); Tankard (1984).
IS
B
N
0-
55
8-
46
76
1-
X
Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.
252 Chapter 7
coordination during quiet conditions and noisy conditions. Ideally, any effects of prac-
tice or fatigue from taking the test twice would be equalized by testing half of the sur-
geons under noisy conditions first, and half under quiet conditions first.
Single Sample t Tests and Dependent Means t Tests in Research Articles
Research articles usually describe t tests in a fairly standard format that includes the
degrees of freedom, the t score, and the significance level. For example,
“t(24) = 2.80, p < .05” tells you that the researcher used a t test with 24 degrees of
freedom, found a t score of 2.80, and the result was significant at the .05 level.
Whether a one-tailed or two-tailed test was used may also be noted. (If not, assume
that it was two-tailed.) Usually the means, and sometimes the standard deviations, are
given for each testing. Rarely does an article report the standard deviation of the dif-
ference scores.
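This reporting convention can be mimicked with a one-line formatting helper. This is only a sketch of the format itself; the function name and arguments are ours, not anything from the text:

```python
def report_t(df, t, p, tails="two-tailed"):
    """Format a t test result in the standard journal style,
    e.g. t(24) = 2.80, p < .05, two-tailed."""
    return f"t({df}) = {t:.2f}, p < {p}, {tails}"

print(report_t(24, 2.80, ".05"))  # t(24) = 2.80, p < .05, two-tailed
```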
Had our student in the dormitory example reported the results in a research article,
she would have written something like this: “The sample from my dormitory
studied a mean of 21 hours (SD = 6.80). Based on a t test for a single sample, this
was significantly different from the known mean of 17 for the college as a whole,
t(15) = 2.35, p < .05, one-tailed.” The researchers in our fictional flood victims
example might have written up their results as follows: “The reported hopefulness
of our sample of flood victims (M = 4.70, SD = 1.89) was not significantly different
from the midpoint of the scale, t(9) = 1.17.”
As we noted earlier, psychologists only occasionally use the t test for a single
sample. We introduced it mainly as a stepping-stone to the more widely used t test
for dependent means. Nevertheless, one sometimes sees the t test for a single sample
in research articles. For example, Soproni and colleagues (2001), as part of a larger
study, had pet dogs respond to a series of eight trials in which the owner would look
at one of two bowls of dog food and the researchers measured whether the dog went
to the correct bowl. (The researchers called these “at trials” because the owner
looked directly at the target.) For each dog, this produced an average percentage cor-
rect that was compared to chance, which would be 50% correct. Here is part of their
results: “During the eight test trials for gesture, dogs performed significantly above
chance on at target trials: one sample t test, t(13) = 5.3, p < .01 . . .” (p. 124).
As we have said, the t test for dependent means is much more commonly used.
Olthoff (1989) might have reported the result of his study of husbands’ communica-
tion quality as follows: “There was a significant decline in communication quality,
dropping from a mean of 116.32 before marriage to a mean of 104.26 after marriage,
t(18) = −4.24, p < .05.”
As another example, Rashotte and Webster (2005) carried out a study about
people’s general expectations about the abilities of men and women. In the study, the
researchers showed 174 college students photos of women and men (referred to as
the female and male targets, respectively). The students rated the person in each
photo in terms of that person’s general abilities (e.g., in terms of the person’s intelli-
gence, abstract abilities, capability at most tasks, and so on). For each participant,
these ratings were combined to create a measure of the perceived status of the female
targets and of the male targets. The researchers then compared the status ratings
given for the female targets and male targets. Since each participant in the study
rated both the female and the male targets, the researchers compared the status rat-
ings assigned to the female and male targets using a t test for dependent means.
Table 7–13 shows the results. The row entitled “Whole sample (N = 174)” gives the
Introduction to t Tests 253
result of the t test for all 174 participants and shows that the status rating assigned to
the male targets was significantly higher than the rating assigned to the female targets
(t = 3.46, p < .001). As shown in the table, the researchers also conducted two
additional t tests to see if this effect was the same among the female participants and
the male participants. The results showed that both the female and the male participants
assigned higher ratings to the male targets.

Table 7–13 Status Scale: Mean (and SE) General Expectations for Female and Male Targets

Respondents | Female Target Mean (SE) | Male Target Mean (SE) | M − F Target Difference | t (1-tailed p)
Whole sample (N = 174) | 5.60 (.06) | 5.85 (.07) | .25 | 3.46 (p < .001)
Female respondents (N = 111) | 5.62 (.07) | 5.84 (.081) | .22 | 2.62 (p < .05)
Male respondents (N = 63) | 5.57 (.10) | 5.86 (.11) | .29 | 2.26 (p < .05)

Source: Rashotte, L. S., & Webster, M., Jr. (2005). Gender status beliefs. Social Science Research, 34, 618–633. Copyright © 2005 by Elsevier. Reprinted by permission of Elsevier.
Summary

1. You use the standard steps of hypothesis testing even when you don’t know the
population variance. However, in this situation you have to estimate the popula-
tion variance from the scores in the sample, using a formula that divides the sum
of squared deviation scores by the degrees of freedom (df = N – 1).
2. When the population variance is estimated, the comparison distribution of means
is a t distribution (with cutoffs given in a t table). A t distribution has slightly
heavier tails than a normal curve (just how much heavier depends on how few the
degrees of freedom are). Also, in this situation, a sample’s number of standard
deviations from the mean of the comparison distribution is called a t score.
3. You use a t test for a single sample when a sample mean is being compared to a
known population mean and the population variance is unknown.
4. You use a t test for dependent means in studies where each participant has two
scores, such as a before-score and an after-score or a score in each of two experi-
mental conditions. A t test for dependent means is also used when you have scores
from pairs of research participants. In this t test, you first figure a difference or
change score for each participant, then go through the usual five steps of hypothe-
sis testing with the modifications described in summary points 1 and 2 and making
Population 2 a population of difference scores with a mean of 0 (no difference).
5. An assumption of the t test is that the population distribution is a normal curve.
However, even when it is not, the t test is usually fairly accurate.
6. The effect size of a study using a t test for dependent means is the mean of the
difference scores divided by the standard deviation of the difference scores. You
can look up power and needed sample size for any particular level of power
using power software packages, an Internet power calculator, or special tables.
7. The power of studies using difference scores is usually much higher than that of
studies using other designs with the same number of participants. However, re-
search using a single group tested before and after some intervening event, without
a control group, allows for many alternative explanations of any observed changes.
8. t tests are reported in research articles using a standard format. For example,
“t(24) = 2.80, p < .05.”
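Summary point 6 can be sketched in a few lines of Python (the function names and the “very small” label are ours; the benchmarks of .20, .50, and .80 for small, medium, and large effects are the usual conventions, and the numbers below are illustrative, not from the text):

```python
def effect_size(mean_diff, sd_diff):
    """Effect size for a t test for dependent means: the mean of the
    difference scores divided by their standard deviation."""
    return mean_diff / sd_diff

def size_label(d):
    """Classify |d| using the usual benchmarks (.20 small, .50 medium, .80 large)."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "very small"

d = effect_size(2.0, 4.0)          # illustrative numbers
print(round(d, 2), size_label(d))  # 0.5 medium
```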
Key Terms

t tests (p. 223)
t test for a single sample (p. 223)
biased estimate (p. 225)
unbiased estimate of the population variance (S2) (p. 226)
degrees of freedom (df) (p. 226)
t distribution (p. 228)
t table (p. 229)
t score (p. 230)
repeated-measures design (p. 236)
t test for dependent means (p. 236)
difference scores (p. 237)
assumption (p. 247)
robustness (p. 247)
t Test for a Single Sample
Eight participants are tested after being given an experimental procedure. Their
scores are 14, 8, 6, 5, 13, 10, 10, and 6. The population of people not given this pro-
cedure is normally distributed with a mean of 6. Using the .05 level, two-tailed, does
the experimental procedure make a difference? (a) Use the five steps of hypothesis
testing and (b) sketch the distributions involved.
Answer
(a) Steps of hypothesis testing:
❶ Restate the question as a research hypothesis and a null hypothesis
about the populations. There are two populations:
Population 1: People who are given the experimental procedure.
Population 2: The general population.
The research hypothesis is that the Population 1 will score differently than
Population 2. The null hypothesis is that Population 1 will score the same as
Population 2.
❷ Determine the characteristics of the comparison distribution. The mean
of the distribution of means is 6 (the known population mean). To figure the
estimated population variance, you first need to figure the sample mean,
which is (14 + 8 + 6 + 5 + 13 + 10 + 10 + 6)/8 = 72/8 = 9. The
estimated population variance is S2 = SS/df = 78/7 = 11.14; the variance
of the distribution of means is S2M = S2/N = 11.14/8 = 1.39. The standard
deviation of the distribution of means is SM = √S2M = √1.39 = 1.18. Its
shape will be a t distribution for df = 7.
❸ Determine the cutoff sample score on the comparison distribution at
which the null hypothesis should be rejected. From Table A–2, the cutoffs
for a two-tailed t test at the .05 level for df = 7 are +2.365 and −2.365.
❹ Determine your sample’s score on the comparison distribution.
t = (M − μ)/SM = (9 − 6)/1.18 = 3/1.18 = 2.54.
➎ Decide whether to reject the null hypothesis. The t of 2.54 is more extreme
than the needed t of ±2.365. Therefore, reject the null hypothesis; the research
hypothesis is supported. The experimental procedure does make a difference.
(b) Sketches of distributions are shown in Figure 7–9.
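The figuring in this answer can be double-checked with a short script (a pure-Python sketch; the cutoff of 2.365 is read from the t table for df = 7, since the table lookup itself is done by hand):

```python
import math

def one_sample_t(scores, pop_mean):
    """t test for a single sample: estimate the population variance from
    the sample, dividing the sum of squared deviations by df = N - 1."""
    n = len(scores)
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)  # sum of squared deviations
    s2 = ss / (n - 1)          # estimated population variance
    s2m = s2 / n               # variance of the distribution of means
    sm = math.sqrt(s2m)        # standard deviation of the distribution of means
    return m, s2, sm, (m - pop_mean) / sm

M, S2, SM, t = one_sample_t([14, 8, 6, 5, 13, 10, 10, 6], pop_mean=6)
print(round(M, 2), round(S2, 2), round(SM, 2), round(t, 2))  # 9.0 11.14 1.18 2.54
cutoff = 2.365  # t table: df = 7, .05 level, two-tailed
print("reject the null" if abs(t) > cutoff else "fail to reject")  # reject the null
```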
Example Worked-Out Problems
Figure 7–9 Distributions for answer to Example Worked-Out Problem for t test for a
single sample. [The figure shows the population (normal, mean 6), the comparison t
distribution (raw scores 4.82, 6, 7.18, and 8.36 marked at t scores of −1, 0, 1, and 2),
and the sample with its mean of 9.]

t Test for Dependent Means

A researcher tests 10 individuals before and after an experimental procedure. The
results are as follows:
Participant Before After
1 10.4 10.8
2 12.6 12.1
3 11.2 12.1
4 10.9 11.4
5 14.3 13.9
6 13.2 13.5
7 9.7 10.9
8 11.5 11.5
9 10.8 10.4
10 13.1 12.5
Test the hypothesis that there is an increase in scores, using the .05 significance
level. (a) Use the five steps of hypothesis testing and (b) sketch the distributions
involved.
Answer
(a) Table 7–14 shows the results, including the figuring of difference scores and
all the other figuring for the t test for dependent means. Here are the steps of
hypothesis testing:
❶ Restate the question as a research hypothesis and a null hypothesis
about the populations. There are two populations:
Population 1: People like those who are given the experimental procedure.
Population 2: People who show no change from before to after.
The research hypothesis is that Population 1’s mean difference score (figured
using “after” scores minus “before” scores) is greater than Population 2’s
mean difference score. The null hypothesis is that Population 1’s mean dif-
ference score is not greater than Population 2’s.
❷ Determine the characteristics of the comparison distribution. Its population
mean is 0 difference. The estimated population variance of difference
scores, S2, is shown in Table 7–14 to be .388. As shown in Table 7–14, the
standard deviation of the distribution of means of difference scores, SM, is
.197. Therefore, the comparison distribution has a mean of 0 and a standard
deviation of .197. It will be a t distribution for df = 9.
❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected. For a one-tailed test at the .05 level with
df = 9, the cutoff is 1.833. (The cutoff is positive as the research hypothesis is
that Population 1’s mean difference score will be greater than Population 2’s.)
Table 7–14 Figuring for Answer to Example Worked-Out Problem for t Test for Dependent Means

Participant | Before | After | Difference (After − Before) | Deviation (Difference − M) | Squared Deviation
1 | 10.4 | 10.8 | .4 | .260 | .068
2 | 12.6 | 12.1 | −.5 | −.640 | .410
3 | 11.2 | 12.1 | .9 | .760 | .578
4 | 10.9 | 11.4 | .5 | .360 | .130
5 | 14.3 | 13.9 | −.4 | −.540 | .292
6 | 13.2 | 13.5 | .3 | .160 | .026
7 | 9.7 | 10.9 | 1.2 | 1.060 | 1.124
8 | 11.5 | 11.5 | 0.0 | −.140 | .020
9 | 10.8 | 10.4 | −.4 | −.540 | .292
10 | 13.1 | 12.5 | −.6 | −.740 | .548
Σ | 117.7 | 119.1 | 1.4 | | 3.488

For difference scores:
M = 1.4/10 = .140.
μ = 0.
S2 = SS/df = 3.488/(10 − 1) = 3.488/9 = .388.
S2M = S2/N = .388/10 = .039.
SM = √S2M = √.039 = .197.
t needed for 5% significance level, one-tailed, df = 9 = 1.833.
t = (M − μ)/SM = (.140 − 0)/.197 = .71.
Decision: Do not reject the null hypothesis.
❹ Determine your sample’s score on the comparison distribution. The sample’s
mean change of .140 is .71 standard deviations (of .197 each) on the
distribution of means above that distribution’s mean of 0. That is,
t = (M − μ)/SM = (.140 − 0)/.197 = .71.
➎ Decide whether to reject the null hypothesis. The sample’s t of .71 is less
extreme than the needed t of 1.833. Thus, you cannot reject the null hypothesis.
The study is inconclusive.
(b) Sketches of distributions are shown in Figure 7–10.

Figure 7–10 Distributions for answer to Example Worked-Out Problem for t test for
dependent means. [The figure shows the population of difference scores (mean 0), the
comparison t distribution (raw scores −.20, 0, and .20 marked at t scores of −1, 0, and 1),
and the sample with its mean of .14.]

Outline for Writing Essays for a t Test for a Single Sample

1. Describe the core logic of hypothesis testing in this situation. Be sure to mention
that the t test for a single sample is used for hypothesis testing when you have
scores for a sample of individuals and you want to compare the mean of this
sample to a population for which the mean is known but the variance is unknown.
Be sure to explain the meaning of the research hypothesis and the null
hypothesis in this situation.
2. Outline the logic of estimating the population variance from the sample scores.
Explain the idea of biased and unbiased estimates of the population variance,
and describe the formula for estimating the population variance and why it is
different from the ordinary variance formula.
3. Describe the comparison distribution (the t distribution) that is used with a t test
for a single sample, noting how it is different from a normal curve and why.
Explain why a t distribution (as opposed to the normal curve) is used as the
comparison distribution.
4. Describe the logic and process for determining the cutoff sample score(s) on the
comparison distribution at which the null hypothesis should be rejected.
5. Describe why and how you figure the t score of the sample mean on the compar-
ison distribution.
6. Explain how and why the scores from Steps ❸ and ❹ of the hypothesis-testing
process are compared. Explain the meaning of the result of this comparison with
regard to the specific research and null hypotheses being tested.
Outline for Writing Essays for a t Test for Dependent Means
1. Describe the core logic of hypothesis testing in this situation. Be sure to mention
that the t test for dependent means is used for hypothesis testing when you have
two scores from each person in your sample. Be sure to explain the meaning of
the research hypothesis and the null hypothesis in this situation. Explain the
logic and procedure for creating difference scores.
2. Explain why you use 0 as the mean for the comparison distribution.
3. Outline the logic of estimating the population variance of difference scores from
the sample scores. Explain the idea of biased and unbiased estimates of the pop-
ulation variance, and describe the formula for estimating the population vari-
ance. Describe how to figure the standard deviation of the distribution of means
of difference scores.
4. Describe the comparison distribution (the t distribution) that is used with a t test
for dependent means. Explain why a t distribution (as opposed to the normal
curve) is used as the comparison distribution.
5. Describe the logic and process for determining the cutoff sample score(s) on the
comparison distribution at which the null hypothesis should be rejected.
6. Describe why and how you figure the t score of the sample mean on the compar-
ison distribution.
7. Explain how and why the scores from Steps ❸ and ❹ of the hypothesis-testing
process are compared. Explain the meaning of the result of this comparison with
regard to the specific research and null hypotheses being tested.
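Before turning to the practice problems, the dependent-means Example Worked-Out Problem above can likewise be replicated in a few lines (a Python sketch; the exact sum of squares comes out 3.484, slightly below the table's 3.488 because the table rounds each deviation to three decimals, but t still rounds to .71):

```python
import math

def dependent_means_t(before, after):
    """t test for dependent means: compute difference scores and test
    their mean against a population mean of 0."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    m = sum(diffs) / n
    ss = sum((d - m) ** 2 for d in diffs)
    s2 = ss / (n - 1)       # estimated population variance of difference scores
    sm = math.sqrt(s2 / n)  # standard deviation of the distribution of means
    return m, sm, (m - 0) / sm

before = [10.4, 12.6, 11.2, 10.9, 14.3, 13.2, 9.7, 11.5, 10.8, 13.1]
after = [10.8, 12.1, 12.1, 11.4, 13.9, 13.5, 10.9, 11.5, 10.4, 12.5]
M, SM, t = dependent_means_t(before, after)
print(round(M, 2), round(SM, 3), round(t, 2))  # 0.14 0.197 0.71
# Cutoff 1.833 from the t table: df = 9, .05 level, one-tailed
print("reject" if t > 1.833 else "do not reject")  # do not reject
```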
Practice Problems

These problems involve figuring. Most real-life statistics problems are done on a
computer with special statistical software. Even if you have such software, do these
problems by hand to ingrain the method in your mind. To learn how to use a computer
to solve statistics problems like those in this chapter, refer to the Using SPSS section
at the end of this chapter and the Study Guide and Computer Workbook that
accompanies this text.
All data are fictional unless an actual citation is given.
Set I (for Answers to Set I Problems, see pp. 681–683)
1. In each of the following studies, a single sample’s mean is being compared to a
population with a known mean but an unknown variance. For each study, decide
whether the result is significant. (Be sure to show all of your calculations.)
2. Suppose a candidate running for sheriff in a rural community claims that she will
reduce the average speed of emergency response to less than 30 minutes, which is
thought to be the average response time with the current sheriff. There are no past
records; so the actual standard deviation of such response times cannot be deter-
mined. Thanks to this campaign, she is elected sheriff, and careful records are
now kept. The response times for the first month are 26, 30, 28, 29, 25, 28, 32, 35,
24, and 23 minutes.
Using the .05 level of significance, did she keep her promise? (a) Use the
steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your
answer to someone who has never taken a course in statistics.
3. A researcher tests five individuals who have seen paid political ads about a
particular issue. These individuals take a multiple-choice test about the issue in which
people in general (who know nothing about the issue) usually get 40 questions
correct. The number correct for these five individuals was 48, 41, 40, 51, and 50.
Using the .05 level of significance, two-tailed, do people who see the ads
do better on this test? (a) Use the steps of hypothesis testing. (b) Sketch the dis-
tributions involved. (c) Explain your answer to someone who is familiar with
the Z test (from Chapter 5) but is unfamiliar with t tests.
4. For each of the following studies using difference scores, test the significance
using a t test for dependent means.
Table for Problem 1:

Study | Sample Size (N) | Population Mean (μ) | Estimated Population Variance (S2) | Sample Mean (M) | Tails | Significance Level (α)
(a) | 64 | 12.40 | 9.00 | 11.00 | 1 (low predicted) | .05
(b) | 49 | 1,006.35 | 317.91 | 1,009.72 | 2 | .01
(c) | 400 | 52.00 | 7.02 | 52.41 | 1 (high predicted) | .01

Table for Problem 4:

Study | Number of Difference Scores in Sample | Mean of Difference Scores in Sample | Estimated Population Variance of Difference Scores | Tails | Significance Level
(a) | 20 | 1.7 | 8.29 | 1 (high predicted) | .05
(b) | 164 | 2.3 | 414.53 | 2 | .05
(c) | 15 | −2.2 | 4.00 | 1 (low predicted) | .01
5. A program to decrease littering was carried out in four cities in California’s
Central Valley starting in August 2007. The amount of litter in the streets (aver-
age pounds of litter collected per block per day) was measured during July be-
fore the program started and then the next July, after the program had been in
effect for a year. The results were as follows:
City July 2007 July 2008
Fresno 9 2
Merced 10 4
Bakersfield 8 9
Stockton 9 1
Using the .01 level of significance, was there a significant decrease in the
amount of litter? (a) Use the five steps of hypothesis testing. (b) Sketch the
distributions involved. (c) Explain your answer to someone who understands
mean, standard deviation, and variance, but knows nothing else about statistics.
6. A researcher assesses the level of a particular hormone in the blood in five pa-
tients before and after they begin taking a hormone treatment program. Results
for the five are as follows:
Patient Before After
A .20 .18
B .16 .16
C .24 .23
D .22 .19
E .17 .16
Using the .05 significance level, was there a significant change in the level of
this hormone? (a) Use the steps of hypothesis testing. (b) Sketch the distribu-
tions involved. (c) Explain your answer to someone who understands the t test
for a single sample but is unfamiliar with the t test for dependent means.
7. Figure the estimated effect size and indicate whether it is approximately small,
medium, or large, for each of the following studies:

Study | Mean Change | S
(a) | 20 | 32
(b) | 5 | 10
(c) | .1 | .4
(d) | 100 | 500
8. What is the power of each of the following studies, using a t test for dependent
means (based on the .05 significance level)?
Effect Size N Tails
(a) Small 20 One
(b) Medium 20 One
(c) Medium 30 One
(d) Medium 30 Two
(e) Large 30 Two
9. About how many participants are needed for 80% power in each of the follow-
ing planned studies that will use a t test for dependent means with p < .05?
Predicted Effect Size Tails
(a) Medium Two
(b) Large One
(c) Small One
10. Weller and Weller (1997) conducted a study of the tendency for the menstrual
cycles of women who live together (such as sisters) to become synchronized. For
their statistical analysis, they compared scores on a measure of synchronization
of pairs of sisters living together versus the degree of synchronization that would
be expected by chance (lower scores mean more synchronization). Their key re-
sults (reported in a table not reproduced here) were synchrony scores of 6.32 for
the 30 roommate sister pairs in their sample compared to an expected synchrony
score of 7.76; they then reported a t score of 2.27 and a p level of .011 for this dif-
ference. Explain this result to a person who is familiar with hypothesis testing
with a known population variance, but not with the t test for a single sample.
11. A psychologist conducts a study of perceptual illusions under two different
lighting conditions. Twenty participants were each tested under both of the two
different conditions. The experimenter reported: “The mean number of effective
illusions was 6.72 under the bright conditions and 6.85 under the dimly lit conditions,
a difference that was not significant, t(19) = 1.62.” Explain this result
to a person who has never had a course in statistics. Be sure to use sketches of
the distributions in your answer.
12. A study was done of personality characteristics of 100 students who were tested
at the beginning and end of their first year of college. The researchers reported
the results in the following table:
(a) Focusing on the difference scores, figure the t values for each personality
scale. (Assume that SD in the table is for what we have called S, the unbiased
estimate of the population standard deviation.)
(b) Explain to a person who has never had a course in statistics what this table
means.
Set II
13. In each of the following studies, a single sample’s mean is being compared to a
population with a known mean but an unknown variance. For each study, decide
whether the result is significant.
Table for Problem 12:

Personality Scale | Fall M (SD) | Spring M (SD) | Difference M (SD)
Anxiety | 16.82 (4.21) | 15.32 (3.84) | 1.50** (1.85)
Depression | 89.32 (8.39) | 86.24 (8.91) | 3.08** (4.23)
Introversion | 59.89 (6.87) | 60.12 (7.11) | −.23 (2.22)
Neuroticism | 38.11 (5.39) | 37.22 (6.02) | .89* (4.21)
*p < .05. **p < .01.
Table for Problem 13:

Study | Sample Size (N) | Population Mean (μ) | Estimated Population Standard Deviation (S) | Sample Mean (M) | Tails | Significance Level (α)
(a) | 16 | 100.31 | 2.00 | 100.98 | 1 (high predicted) | .05
(b) | 16 | .47 | 4.00 | .00 | 2 | .05
(c) | 16 | 68.90 | 9.00 | 34.00 | 1 (low predicted) | .01
14. Evolutionary theories often emphasize that humans have adapted to their physi-
cal environment. One such theory hypothesizes that people should spontaneously
follow a 24-hour cycle of sleeping and waking—even if they are not exposed to
the usual pattern of sunlight. To test this notion, eight paid volunteers were
placed (individually) in a room in which there was no light from the outside and
no clocks or other indications of time. They could turn the lights on and off as
they wished. After a month in the room, each individual tended to develop a
steady cycle. Their cycles at the end of the study were as follows: 25, 27, 25, 23,
24, 25, 26, and 25.
Using the .05 level of significance, what should we conclude about the
theory that 24 hours is the natural cycle? (That is, does the average cycle length
under these conditions differ significantly from 24 hours?) (a) Use the steps of
hypothesis testing. (b) Sketch the distributions involved. (c) Explain your an-
swer to someone who has never taken a course in statistics.
15. In a particular country, it is known that college seniors report falling in love an
average of 2.20 times during their college years. A sample of five seniors, origi-
nally from that country but who have spent their entire college career in the
United States, were asked how many times they had fallen in love during their
college years. Their numbers were 2, 3, 5, 5, and 2. Using the .05 significance
level, do students like these who go to college in the United States fall in love
more often than those from their country who go to college in their own coun-
try? (a) Use the steps of hypothesis testing. (b) Sketch the distributions in-
volved. (c) Explain your answer to someone who is familiar with the Z test
(from Chapter 5) but is unfamiliar with the t test for a single sample.
16. For each of the following studies using difference scores, test the significance
using a t test for dependent means.
Study | Number of Difference Scores in Sample | Mean of Difference Scores | S2 for Difference Scores | Tails | Significance Level
(a) | 10 | 3.8 | 50 | One (high) | .05
(b) | 100 | 3.8 | 50 | One (high) | .05
(c) | 100 | 1.9 | 50 | One (high) | .05
(d) | 100 | 1.9 | 50 | Two | .05
(e) | 100 | 1.9 | 25 | Two | .05
17. Four individuals with high levels of cholesterol went on a special crash diet,
avoiding high-cholesterol foods and taking special supplements. Their total
cholesterol levels before and after the diet were as follows:
Participant Before After
J. K. 287 255
L. M. M 305 269
A. K. 243 245
R. O. S. 309 247
Using the .05 level of significance, was there a significant change in cholesterol
level? (a) Use the steps of hypothesis testing. (b) Sketch the distributions
involved. (c) Explain your answer to someone who has never taken a course in
statistics.
18. Five people who were convicted of speeding were ordered by the court to attend
a workshop. A special device put into their cars kept records of their speeds for
two weeks before and after the workshop. The maximum speeds for each person
during the two weeks before and the two weeks after the workshop follow.
Participant Before After
L. B. 65 58
J. K. 62 65
R. C. 60 56
R. T. 70 66
J. M. 68 60
Using the .05 significance level, should we conclude that people are likely to
drive more slowly after such a workshop? (a) Use the steps of hypothesis test-
ing. (b) Sketch the distributions involved. (c) Explain your answer to someone
who is familiar with hypothesis testing involving known populations, but has
never learned anything about t tests.
19. The amount of oxygen consumption was measured in six individuals over two
10-minute periods while sitting with their eyes closed. During one period, they
listened to an exciting adventure story; during the other, they heard restful
music.
Participant Story Music
1 6.12 5.39
2 7.25 6.72
3 5.70 5.42
4 6.40 6.16
5 5.82 5.96
6 6.24 6.08
Based on the results shown, is oxygen consumption less when listening to the
music? Use the .01 significance level. (a) Use the steps of hypothesis testing.
(b) Sketch the distributions involved. (c) Explain your answer to someone who
understands mean, standard deviation, and variance but knows nothing else
about statistics.
20. Five sophomores were given an English achievement test before and after
receiving instruction in basic grammar. Their scores are shown below.
Student Before After
A 20 18
B 18 22
C 17 15
D 16 17
E 12 9
Is it reasonable to conclude that future students would show higher scores after
instruction? Use the .05 significance level. (a) Use the steps of hypothesis test-
ing. (b) Sketch the distributions involved. (c) Explain your answer to someone
who understands mean, standard deviation, and variance but knows nothing else
about statistics.
21. Figure the predicted effect size and indicate whether it is approximately small,
medium, or large, for each of the following planned studies:
Predicted Mean Change σ
(a) 8 30
(b) 8 10
(c) 16 30
(d) 16 10
22. What is the power of each of the following studies, using a t test for dependent
means (based on the .05 significance level)?
Effect Size N Tails
(a) Small 50 Two
(b) Medium 50 Two
(c) Large 50 Two
(d) Small 10 Two
(e) Small 40 Two
(f) Small 100 Two
(g) Small 100 One
23. About how many participants are needed for 80% power in each of the follow-
ing planned studies that will use a t test for dependent means with p < .05?
Predicted Effect Size Tails
(a) Small Two
(b) Medium One
(c) Large Two
24. A study compared union activity of employees in 10 plants during two different
decades. The researchers reported “a significant increase in union activity,
t(9) = 3.28, p < .01.” Explain this result to a person who has never had a course
in statistics. Be sure to use sketches of the distributions in your answer.
25. Holden and colleagues (1997) compared mothers’ reported attitudes toward cor-
poral punishment of their children from before to 3 years after having their first
child. “The average change in the women’s prior-to-current attitudes was signif-
icant, t(107) = 10.32, p < .001 . . . ” (p. 485). (The change was that they felt
more negatively about corporal punishment after having their child.) Explain
this result to someone who is familiar with the t test for a single sample, but not
with the t test for dependent means.
26. Table 7–15 (reproduced from Table 4 of Larson et al., 2001) shows ratings of
various aspects of work and home life of 100 middle-class men in India who
were fathers. Pick three rows of interest to you and explain the results to some-
one who is familiar with the mean, variance, and Z scores but knows nothing
else about statistics.
Using SPSS
The U in the following steps indicates a mouse click. (We used SPSS version 15.0
for Windows to carry out these analyses. The steps and output may be slightly differ-
ent for other versions of SPSS.)
t Test for a Single Sample
❶ Enter the scores from your distribution in one column of the data window.
❷ U Analyze.
❸ U Compare means.
❹ U One-sample T test (this is the name SPSS uses for a t test for a single sample).
➎ U on the variable for which you want to carry out the t test and then U the arrow.
❻ Enter the population mean in the “Test Value” box.
❼ U OK.
Practice these steps by carrying out a single sample t test for the example shown
earlier in this chapter of 10 people’s ratings of hopefulness after a flood. The sample
scores, population mean, and figuring for that study are shown in Table 7–4 on
page 232. Your SPSS output window should look like Figure 7–11. The first table
provides information about the variable: the number of scores (“N”); the mean of the
scores (“Mean”); the estimated population standard deviation, S (“Std. Deviation”);
and the standard deviation of the distribution of means, SM (“Std. Error Mean”).
Check that the values in that table are consistent (allowing for rounding error) with
the values in Table 7–4.
The second table in the SPSS output window gives the outcome of the t test.
Compare the values of t and df in that table and the values shown in Table 7–4. The
exact two-tailed significance level of the t test is given in the “Sig. (2-tailed)” col-
umn. In this study, the researcher was using the .01 significance level. The signifi-
cance level given by SPSS (.271) is not more extreme than .01, which means that the
researcher cannot reject the null hypothesis and the study is inconclusive.
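The quantities SPSS reports in these two tables (N, M, S, SM, t, and df) can be checked by hand. Below is a minimal Python sketch of the same single-sample t test; the scores and population mean are invented for illustration, not the Table 7–4 flood data:

```python
import math

def one_sample_t(scores, pop_mean):
    """t test for a single sample: t = (M - mu) / S_M."""
    n = len(scores)
    m = sum(scores) / n                        # sample mean, M
    ss = sum((x - m) ** 2 for x in scores)     # sum of squared deviations
    s2 = ss / (n - 1)                          # estimated population variance, S^2
    s_m = math.sqrt(s2 / n)                    # SD of the distribution of means, S_M
    return (m - pop_mean) / s_m, n - 1         # t and its degrees of freedom

# Hypothetical ratings compared against a population mean of 5
t, df = one_sample_t([5, 7, 6, 8, 7], pop_mean=5)
print(round(t, 2), df)   # 3.14 4
```

As in the SPSS output, you would then compare this t against the cutoff t for df = N − 1.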
Table 7–15 Comparison of Fathers’ Mean Psychological States in the Job and Home Spheres (N = 100)
Scale Range Work Home Work vs. Home
Important 0–9 5.98 5.06 6.86***
Attention 0–9 6.15 5.13 7.96***
Challenge 0–9 4.11 2.41 11.49***
Choice 0–9 4.28 4.74 -3.38***
Wish doing else 0–9 1.50 1.44 0.61
Hurried 0–3 1.80 1.39 3.21**
Social anxiety 0–3 0.81 0.64 3.17**
Affect 1–7 4.84 4.98 -2.64**
Social climate 1–7 5.64 5.95 4.17***
Note: Values for column 3 are t scores; df = 90 for all t tests.
**p < .01. ***p < .001.
Source: Larson, R., Dworkin, J., & Verma, S. (2001). Men’s work and family lives in India: The daily organization of time and
emotions. Journal of Family Psychology, 15, 206–224. Copyright © 2001 by the American Psychological Association.
t Test for Dependent Means
❶ Enter one set of scores (for example, the “before” scores) in the first column of
the data window. Then enter the second set of scores (for example, the “after”
scores) in the second column of the data window. (Be sure to enter the scores in
the order they are listed.) Since each row in the SPSS data window represents a
separate person, it is important that you enter each person’s scores in two sepa-
rate columns (for example, a “before” column and an “after” column).
❷ U Analyze.
❸ U Compare means.
❹ U Paired-Samples T Test (this is the name SPSS uses for a t test for dependent
means).
Figure 7–11 Using SPSS to carry out a t test for a single sample for the example of 10
people’s ratings of hopefulness after a flood.
❺ U on the first variable (this will highlight the variable). U on the second vari-
able (this will highlight the variable). U the arrow. The two variables will now
appear in the “Paired Variables” box.
❻ U OK.
Practice these steps by carrying out a t test for dependent means for Olthoff’s
(1989) study of communication quality of 19 men who received ordinary premarital
counseling. The scores and figuring for that study are shown in Table 7–6 on page 238.
Your SPSS output window should look like Figure 7–12. The key information is con-
tained in the third table (labeled “Paired Samples Test”). The final three columns of
this table give the t score (4.240), the degrees of freedom (18), and the two-tailed sig-
nificance level (.000 in this case) of the t test. The significance level is so small that,
even after rounding to three decimal places, it is less than .001. Since the significance
level is more extreme than the .05 significance level we set for this study, you can re-
ject the null hypothesis. By looking at the means for the “before” variable and the
“after” variable in the first table (labeled “Paired Samples Statistics”), you can see that
Figure 7–12 Using SPSS to carry out a t test for dependent means for Olthoff’s (1989)
study of communication quality of 19 men who received ordinary premarital counseling.
the husbands’ communication quality was lower after marriage (a mean of 104.2632)
than before marriage (a mean of 116.3158). Don’t worry that the t value figured in
Table 7–6 was negative, whereas the t value in the SPSS output is positive. This hap-
pens because the difference score in Table 7–6 was figured as after minus before, but
SPSS figured the difference scores as before minus after. Both ways of figuring the
difference score are mathematically correct and the overall result is the same in
each case.
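The sign behavior is easy to verify: figuring the difference scores one way or the other flips only the sign of t. A Python sketch (not SPSS itself), using the before and after speeds from Practice Problem 18 as sample input:

```python
import math

def dependent_t(pairs, order):
    """t test for dependent means on (before, after) pairs of scores."""
    if order == "after-before":
        d = [a - b for b, a in pairs]               # difference scores, D
    else:
        d = [b - a for b, a in pairs]
    n = len(d)
    m = sum(d) / n                                  # mean of the difference scores
    s2 = sum((x - m) ** 2 for x in d) / (n - 1)     # estimated population variance
    s_m = math.sqrt(s2 / n)                         # SD of distribution of means
    return m / s_m                                  # t = (M - 0) / S_M

speeds = [(65, 58), (62, 65), (60, 56), (70, 66), (68, 60)]
print(round(dependent_t(speeds, "after-before"), 2))   # -2.08
print(round(dependent_t(speeds, "before-after"), 2))   # 2.08: same size, opposite sign
```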
Chapter Notes
1. A sample’s variance is slightly smaller than the population’s because it is based
on deviations from the sample’s mean. A sample’s mean is the optimal balance
point for its scores. Thus, deviations of a sample’s scores from its mean will be
smaller than deviations from any other number. The mean of a sample generally
is not exactly the same as the mean of the population it comes from. Thus, devi-
ations of a sample’s scores from its mean will generally be smaller than devia-
tions of that sample’s scores from the population mean.
2. Statisticians make a subtle distinction in this situation between the comparison
distribution and the distribution of means. (We avoid this distinction to simplify
your learning of what is already fairly difficult.) The general procedure of hy-
pothesis testing, as we introduced it in Chapter 5, can be described as figuring
a Z score for your sample’s mean, where Z = (M − μ)/σM, and then comparing
this Z score to a cutoff Z score from the normal curve table. We described this
process as using the distribution of means as your comparison distribution. Sta-
tisticians would say that actually you are comparing the Z score you figured for
your sample mean to a distribution of Z scores (which is simply a standard nor-
mal curve). Similarly, for a t test, statisticians think of the procedure as figuring
a t score (like a Z score, but figured using an estimated standard deviation),
where t = (M − μ)/SM, and then comparing your computed t score to a cutoff
t score from a t distribution table. Thus, according to the formal statistical logic,
the comparison distribution is a distribution of t scores, not of means.
3. In line with the terminology we used in Chapter 5, the symbol μ in the formula
should read μM, since it refers to the population mean of a distribution of means.
In Chapter 5, we used the terminology μM to emphasize the conceptual differ-
ence between the mean of a population of individuals and the mean of a popula-
tion of means. But μ and μM are always equal. Thus, to keep the terminology as
straightforward as possible in this and subsequent chapters, we refer to the mean
of a distribution of means as μ. (If we were even more formal, we might use μM
or even μM2, since we are referring to the mean of Population 2.)
4. The steps of carrying out a t test for a single sample can be combined into a
computational formula for t based on difference scores. For learning purposes in
your class, you should use the steps as we have discussed them in this chapter.
In a real research situation, the figuring is usually all done by computer (see this
chapter’s Using SPSS section). Should you ever have to do a t test for a single
sample for an actual research study by hand (or just with a hand calculator), you
may find the following formula useful:
t = (M − μ) / √[(ΣX² − (ΣX)²/N) / ((N − 1)(N))]
The t score for a t test for a single sample is the result of subtracting the population mean from the sample mean and dividing that difference by the square root of the following: the sum of the squared scores minus the result of taking the sum of all the scores, squaring this sum and dividing by the number of scores, then taking this whole difference and dividing it by the result of multiplying the number of scores minus 1 by the number of scores.
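The computational formula in note 4 always agrees with the step-by-step figuring; a quick check in Python with made-up scores:

```python
import math

def t_computational(x, mu):
    """Note 4 formula: t = (M - mu) / sqrt((SumX^2 - (SumX)^2/N) / ((N - 1)N))."""
    n = len(x)
    m = sum(x) / n
    root = math.sqrt((sum(v * v for v in x) - sum(x) ** 2 / n) / ((n - 1) * n))
    return (m - mu) / root

def t_stepwise(x, mu):
    """Step-by-step version: estimate S^2, then S_M, then t."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / (n - 1)
    return (m - mu) / math.sqrt(s2 / n)

x = [5, 7, 6, 8, 7]                         # hypothetical sample
print(round(t_computational(x, 5), 2))      # 3.14, same as the stepwise method
```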
5. The steps of carrying out a t test for dependent means can be combined into a
computational formula for t based on difference scores. For learning purposes in
your class, you should use the steps as we have discussed them in this chapter.
In a real research situation, the figuring is usually all done by computer (see the
Using SPSS section at the end of this chapter). However, if you ever have to do
a t test for dependent means for an actual research study by hand (or with just a
hand calculator), you may find the formula useful:
t = ((ΣD)/N) / √[(ΣD² − (ΣD)²/N) / ((N − 1)(N))]
6. Single sample t tests are quite rare in practice; so we didn’t include a discussion
of effect size or power for them in the main text. However, the effect size for a
single sample t test can be figured using the same approach as in Chapter 6
(which is the same as the approach for figuring effect size for the t test for de-
pendent means). It is the difference between the population means divided by
the population standard deviation: d = (μ₁ − μ₂)/σ. When using this formula
for a t test for a single sample, μ₁ is the predicted mean of Population 1 (the
population from which you are studying a sample), μ₂ is the mean of the
“known” population, and σ is the population standard deviation. The conven-
tions for effect size for a t test for a single sample are the same as you learned for
the situation we considered in Chapter 6: A small effect size is .20, a medium
effect size is .50, and a large effect size is .80.
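Note 6’s effect size figuring and conventions can be sketched as follows; the population values below are invented for illustration:

```python
def effect_size(mu1, mu2, sigma):
    """Note 6: d = (mu1 - mu2) / sigma."""
    return (mu1 - mu2) / sigma

def size_label(d):
    """Cohen's conventions: .20 small, .50 medium, .80 large."""
    d = abs(d)
    if d >= 0.80:
        return "large"
    if d >= 0.50:
        return "medium"
    if d >= 0.20:
        return "small"
    return "below small"

# Predicted mean 208 vs. known mean 200, population SD 16 (hypothetical numbers)
d = effect_size(208, 200, 16)
print(d, size_label(d))   # 0.5 medium
```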
7. Cohen (1988, pp. 28–39) provides more detailed tables in terms of numbers of
participants, levels of effect size, and significance levels. If you use his tables,
note that the d referred to is actually based on a t test for independent means (the
situation we consider in Chapter 8). To use these tables for a t test for dependent
means, first multiply your effect size by 1.4. For example, if your effect size is
.30, for purposes of using Cohen’s tables, you would consider it to be .42 (that
is, .30 × 1.4 = .42). The only other difference from our table is that Cohen de-
scribes the significance level by the letter a (for “alpha level”), with a subscript
of either 1 or 2, referring to a one-tailed or two-tailed test. For example, a table
that refers to “a1 = .05” at the top means that this is the table for p < .05, one-
tailed.
8. More detailed tables, giving the needed numbers of participants for levels of
power other than 80% (and also for effect sizes other than .20, .50, and .80 and
for other significance levels) are provided in Cohen (1988, pp. 54–55). However,
see Chapter Note 7 about using Cohen’s tables for a t test for dependent means.
The t score for a t test for dependent means is the result of dividing the sum of the difference scores by the number of difference scores and then dividing that result by the square root of the following: the sum of the squared difference scores minus the result of taking the sum of all the difference scores, squaring this sum and dividing by the number of difference scores, then taking this whole difference and dividing it by the result of multiplying the number of difference scores minus 1 by the number of difference scores.
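The difference-score formula in note 5 can likewise be sketched directly; the difference scores here are the before-minus-after speeds from Practice Problem 18:

```python
import math

def t_dep_computational(d):
    """Note 5 formula: t = ((SumD)/N) / sqrt((SumD^2 - (SumD)^2/N) / ((N - 1)N))."""
    n = len(d)
    numerator = sum(d) / n
    root = math.sqrt((sum(v * v for v in d) - sum(d) ** 2 / n) / ((n - 1) * n))
    return numerator / root

# Before-minus-after difference scores from Practice Problem 18
print(round(t_dep_computational([7, -3, 4, 4, 8]), 2))   # 2.08
```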
CHAPTER 8
The t Test for Independent Means

Chapter Outline
✪ The Distribution of Differences Between Means 271
✪ Hypothesis Testing with a t Test for Independent Means 278
✪ Assumptions of the t Test for Independent Means 286
✪ Effect Size and Power for the t Test for Independent Means 288
✪ Review and Comparison of the Three Kinds of t Tests 290
✪ Controversy: The Problem of Too Many t Tests 291
✪ The t Test for Independent Means in Research Articles 292
✪ Advanced Topic: Power for the t Test for Independent Means When Sample Sizes Are Not Equal 293
✪ Summary 294
✪ Key Terms 295
✪ Example Worked-Out Problems 295
✪ Practice Problems 298
✪ Using SPSS 305
✪ Chapter Notes 309

In the previous chapter, you learned how to use the t test for dependent means to
compare two sets of scores from a single group of people (such as the same men
measured on communication quality before and after premarital counseling).
In this chapter, you learn how to compare two sets of scores, one from each of
two entirely separate groups of people. This is a very common situation in psychol-
ogy research. For example, a study may compare the scores from individuals in an
experimental group and individuals in a control group (or from a group of men and
a group of women). This is a t test situation because you don’t know the population
variances (so they must be estimated). The scores of the two groups are indepen-
dent of each other; so the test you learn in this chapter is called a t test for inde-
pendent means.

t test for independent means hypothesis-testing procedure in which there are two separate groups of people tested and in which the population variance is not known.
Let’s consider an example. A team of researchers is interested in the effect on
physical health of writing about thoughts and feelings associated with traumatic life
events. This kind of writing is called expressive writing. Suppose the researchers
recruit undergraduate students to take part in a study and randomly assign them to
be in an expressive writing group or a control group. Students in the expressive writ-
ing group are instructed to write four 20-minute essays over four consecutive days
about their most traumatic life experiences. Students in the control group write
four 20-minute essays over four consecutive days describing their plans for that day.
One month later, the researchers ask the students to rate their overall level of physi-
cal health (on a scale from 0 = very poor health to 100 = perfect health). Since the
expressive writing and the control group contain different students, a t test for inde-
pendent means is the appropriate test of the effect of expressive writing on physical
health. We will return to this example later in the chapter. But first, you will learn
about the logic of the t test for independent means, which involves learning about a
new kind of distribution (called the distribution of differences between means).
The Distribution of Differences Between Means
In the previous chapter, you learned the logic and figuring for the t test for dependent
means. In that chapter, the same group of people each had two scores; that is, you
had a pair of scores for each person. This allowed you to figure a difference score for
each person. You then carried out the hypothesis-testing procedure using these dif-
ference scores. The comparison distribution you used for this hypothesis testing was
a distribution of means of difference scores.
In the situation you face in this chapter, the scores in one group are for different
people than the scores in the other group. So you don’t have any pairs of scores, as you
did when the same group of people each had two scores. Thus, it wouldn’t make sense
to create difference scores, and you can’t use difference scores for the hypothesis-
testing procedure in this chapter. Instead, when the scores in one group are for differ-
ent people than the scores in the other group, what you can compare is the mean of one
group to the mean of the other group.
So the t test for independent means focuses on the difference between the
means
of the two groups. The hypothesis-testing procedure, however, for the most part
works just like the hypothesis-testing procedures you have already learned. Since the
focus is now on the difference between means, the comparison distribution is a
distribution of differences between means.
A distribution of differences between means is, in a sense, two steps removed
from the populations of individuals: First, there is a distribution of means from each
population of individuals; second, there is a distribution of differences between pairs
of means, one of each pair from each of these distributions of means.
Think of this distribution of differences between means as being built up as
follows: (a) randomly select one mean from the distribution of means for the first
group’s population, (b) randomly select one mean from the distribution of means for the
second group’s population, and (c) subtract. (That is, take the mean from the first distri-
bution of means and subtract the mean from the second distribution of means.) This
gives a difference score between the two selected means. Then repeat the process. This
creates a second difference score, a difference between the two newly selected means.
Repeating this process a large number of times creates a distribution of differences be-
tween means. You would never actually create a distribution of differences between
means using this lengthy method. But it shows clearly what makes up the distribution.
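Steps (a) through (c) can be simulated directly. In this sketch the two normal populations, their common mean and standard deviation, and the sample size are all arbitrary choices for illustration:

```python
import random
import statistics

random.seed(0)   # fixed seed so the simulation is repeatable

def sample_mean(mu, sigma, n):
    """Randomly draw one sample of size n and return its mean."""
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))

# Null hypothesis true: both populations have the same mean (50) and SD (10).
# (a) a mean from the first population, (b) a mean from the second, (c) subtract.
diffs = [sample_mean(50, 10, 30) - sample_mean(50, 10, 30) for _ in range(10_000)]

# With equal population means, the differences balance out around 0
print(round(statistics.fmean(diffs), 1))
```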
distribution of differences between means distribution of differences between means of pairs of samples such that, for each pair of means, one is from one population and the other is from a second population; the comparison distribution in a t test for independent means.
TIP FOR SUCCESS
The comparison distributions for the t test for dependent means and the t test for independent means have similar names: a distribution of means of difference scores, and a distribution of differences between means, respectively. Thus, it can be easy to confuse these comparison distributions. To remember which is which, think of the logic of each t test. The t test for dependent means involves difference scores. So, its comparison distribution is a distribution of means of difference scores. The t test for independent means involves differences between means. Thus, its comparison distribution is a distribution of differences between means.
Figure 8–1 Diagram of the logic of a distribution of differences between means. (Top to bottom: the two populations, the distributions of means, the distribution of differences between means, and the samples.)
The Logic
Figure 8–1 shows the entire logical construction for a distribution of differences
between means. At the top are the two population distributions. We do not know the
characteristics of these population distributions, but we do know that if the null hy-
pothesis is true, the two population means are the same. That is, the null hypothesis
is that μ₁ = μ₂. We also can estimate the variance of these populations based on the
sample information (these estimated variances will be S²₁ and S²₂).
Below each population distribution is the distribution of means for that popula-
tion. Using the estimated population variance and knowing the size of each sample,
you can figure the variance of each distribution of means in the usual way. (It is the
estimated variance of its parent population divided by the size of the sample from
that population that is being studied.)
Below these two distributions of means, and built from them, is the crucial
distribution of differences between means. This distribution’s variance is ultimately
based on estimated population variances. Thus, we can think of it as a t distribution.
The goal of a t test for independent means is to decide whether the difference be-
tween the means of your two actual samples is a more extreme difference than the
cutoff difference on this distribution of differences between means. The two actual
samples are shown (as histograms) at the bottom.
Remember, this whole procedure is really a kind of complicated castle in the air. It
exists only in our minds to help us make decisions based on the results of an actual ex-
periment. The only concrete reality in all of this is the actual scores in the two samples.
You estimate the population variances from these sample scores. The variances of the
two distributions of means are based entirely on these estimated population variances
(and the sample sizes). And, as you will see shortly, the characteristics of the distribu-
tion of differences between means are based on these two distributions of means.
Still, the procedure is a powerful one. It has the power of mathematics and logic
behind it. It helps you develop general knowledge based on the specifics of a particu-
lar study.
With this overview of the basic logic, we now turn to six key details: (1) the
mean of the distribution of differences between means, (2) the estimated population
variance, (3) the variance of the two distributions of means, (4) the variance and
standard deviation of the distribution of differences between means, (5) the shape of
the distribution of differences between means, and (6) the t score for the difference
between the two means being compared.
Mean of the Distribution of Differences Between Means
In a t test for independent means, you are considering two populations: for example,
one population from which an experimental group is taken and one population from
which a control group is taken. In practice, you don’t know the mean of either popu-
lation. You do know that if the null hypothesis is true, these two populations have
equal means. Also, if these two populations have equal means, the two distributions
of means have equal means. (This is because each distribution of means has the same
mean as its parent population of individuals.) Finally, if you take random samples
from two distributions with equal means, the differences between the means of these
random samples, in the long run, balance out to 0. The result of all this is the follow-
ing: whatever the specifics of the study, you know that, if the null hypothesis is true,
the distribution of differences between means has a mean of 0.
Estimating the Population Variance
In Chapter 7, you learned to estimate the population variance from the scores in your
sample. It is the sum of squared deviation scores divided by the degrees of freedom
(the number in the sample minus 1). To do a t test for independent means, it has to be
reasonable to assume that the populations the two samples come from have the same
variance (which, in statistical terms, is called homogeneity of variance). (If the null
hypothesis is true, they also have the same mean. However, whether or not the null
hypothesis is true, you must be able to assume that the two populations have the
same variance.) Therefore, when you estimate the variance from the scores in either
sample, you are getting two separate estimates of what should be the same number.
In practice, the two estimates will almost never be exactly identical. Since they are
both supposed to be estimating the same thing, the best solution is to average the two
estimates to get the best single overall estimate. This is called the pooled estimate of
the population variance (S²Pooled).
In making this average, however, you also have to take into account that the two
samples may not be the same size. If one sample is larger than the other, the estimate
it provides is likely to be more accurate (because it is based on more information). If
both samples are exactly the same size, you could just take an ordinary average of
the two estimates. On the other hand, when they are not the same size, you need to
make some adjustment in the averaging to give more weight to the larger sample.
That is, you need a weighted average, an average weighted by the amount of infor-
mation each sample provides.
Also, to be precise, the amount of information each sample provides is not its
number of scores, but its degrees of freedom (its number of scores minus 1). Thus,
your weighted average needs to be based on the degrees of freedom each sample
provides. To find the weighted average, you figure out what proportion of the total
degrees of freedom each sample contributes and multiply that proportion by the pop-
ulation variance estimate from that sample. Finally, you add up the two results, and
that is your weighted, pooled estimate. In terms of a formula,
S²Pooled = (df₁/dfTotal)(S²₁) + (df₂/dfTotal)(S²₂) (8–1)
In this formula, S²Pooled is the pooled estimate of the population variance. df₁ is the
degrees of freedom in the sample from Population 1, and df₂ is the degrees of freedom
in the sample from Population 2. (Remember, each sample’s df is its number of scores
minus 1.) dfTotal is the total degrees of freedom (dfTotal = df₁ + df₂).
pooled estimate of the population variance (S²Pooled) in a t test for independent means, weighted average of the estimates of the population variance from two samples (each estimate weighted by the proportion consisting of its sample’s degrees of freedom divided by the total degrees of freedom for both samples).
weighted average average in which the scores being averaged do not have equal influence on the total, as in figuring the pooled variance estimate in a t test for independent means.
The pooled estimate of the population variance is the degrees of freedom in the first sample divided by the total degrees of freedom (from both samples), multiplied by the population variance estimate based on the first sample, plus the degrees of freedom in the second sample divided by the total degrees of freedom multiplied by the population variance estimate based on the second sample.
S²₁ is the estimate of the population variance based on the scores in Population 1’s
sample; S²₂ is the estimate based on the scores in Population 2’s sample.
Consider a study in which the population variance estimate based on an experi-
mental group of 11 participants is 60, and the population variance estimate based on
a control group of 31 participants is 80. The estimate from the experimental group is
based on 10 degrees of freedom (11 participants minus 1), and the estimate from the
control group is based on 30 degrees of freedom (31 minus 1). The total information
on which the estimate is based is the total degrees of freedom—in this example, 40
(that is, 10 + 30 = 40). Thus, the experimental group provides one-quarter of the infor-
mation (10/40 = 1/4), and the control group provides three-quarters of the informa-
tion (30/40 = 3/4).
You then multiply the estimate from the experimental group by 1/4, making 15
(that is, 60 × 1/4 = 15), and you multiply the estimate from the control group by
3/4, making 60 (that is, 80 × 3/4 = 60). Adding the two gives an overall estimate
of 15 plus 60, which is 75. Using the formula,
S²Pooled = (df₁/dfTotal)(S²₁) + (df₂/dfTotal)(S²₂) = (10/40)(60) + (30/40)(80)
= (1/4)(60) + (3/4)(80) = 15 + 60 = 75.
Notice that this procedure does not give the same result as ordinary averaging
(without weighting). Ordinary averaging would give an estimate of 70 (that is,
(60 + 80)/2 = 70). Your weighted, pooled estimate of the population variance of 75
is closer to the estimate based on the control group alone than to the estimate based
on the experimental group alone. This is as it should be, because the control group
estimate in this example was based on more information.
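The weighted averaging just described can be written as a short function; run on this example’s numbers it reproduces the pooled estimate of 75:

```python
def pooled_variance(s2_1, n1, s2_2, n2):
    """Formula 8-1: weight each sample's variance estimate by its share of df."""
    df1, df2 = n1 - 1, n2 - 1
    df_total = df1 + df2
    return (df1 / df_total) * s2_1 + (df2 / df_total) * s2_2

# Experimental group: 11 participants, S^2 = 60; control group: 31 participants, S^2 = 80
print(pooled_variance(60, 11, 80, 31))   # 75.0, closer to the larger sample's estimate
```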
Figuring the Variance of Each of the Two
Distributions of Means
The pooled estimate of the population variance is the best estimate for both popula-
tions. (Remember, to do a t test for independent means, you have to be able to as-
sume that the two populations have the same variance.) However, even though the
two populations have the same variance, if the samples are not the same size, the dis-
tributions of means taken from them do not have the same variance. That is because
the variance of a distribution of means is the population variance divided by the sam-
ple size. In terms of formulas,
(8–2)S2M1 =
S2Pooled
N1
360 + 804>2 = 70
=
1
4
(60) +
3
4
(80) = 15 + 60 = 75.
S2Pooled =
df1
dfTotal
(S21) +
df2
dfTotal
(S22) =
10
40
(60) +
30
40
(80)
80 * 3>4 = 603>4 60 * 1>4 = 15
1>430>40 = 3>4
10>40 = 1>410 + 30
TIP FOR SUCCESS
You know you have made a mistake in figuring S²Pooled if it does not come out between the two estimates of the population variance. (You also know you have made a mistake if it does not come out closer to the estimate from the larger sample.)
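The weighted pooling, together with the Tip's sanity checks, can be sketched in a few lines of Python. This is our own illustration, not from the text; the function name is ours:

```python
# Pooled estimate of the population variance:
# S2_pooled = (df1/dfTotal)(S2_1) + (df2/dfTotal)(S2_2)
def pooled_variance(s2_1, df1, s2_2, df2):
    df_total = df1 + df2
    return (df1 / df_total) * s2_1 + (df2 / df_total) * s2_2

# Chapter example: experimental group df = 10, S2 = 60; control df = 30, S2 = 80.
s2_pooled = pooled_variance(60, 10, 80, 30)
print(s2_pooled)  # prints 75.0

# The Tip's checks: the result lies between the two estimates,
# and is closer to the estimate from the larger sample (the control group).
assert min(60, 80) < s2_pooled < max(60, 80)
assert abs(s2_pooled - 80) < abs(s2_pooled - 60)
```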
The variance of the distribution of means for the first population (based on an estimated population variance) is the pooled estimate of the population variance divided by the number of participants in the sample from the first population.
S²M2 = S²Pooled / N2     (8–3)

Consider again the study with 11 in the experimental group and 31 in the control group. We figured the pooled estimate of the population variance to be 75. For the experimental group, the variance of the distribution of means would be 75/11, which is 6.82. For the control group, the variance would be 75/31, which is 2.42. Using the formulas,

S²M1 = S²Pooled / N1 = 75/11 = 6.82
S²M2 = S²Pooled / N2 = 75/31 = 2.42.

The Variance and Standard Deviation of the Distribution of Differences Between Means

The variance of the distribution of differences between means (S²Difference) is the variance of Population 1's distribution of means plus the variance of Population 2's distribution of means. (This is because, in a difference between two numbers, the variation in each contributes to the overall variation in their difference. It is like subtracting a moving number from a moving target.) Stated as a formula,

S²Difference = S²M1 + S²M2     (8–4)

The standard deviation of the distribution of differences between means (SDifference) is the square root of the variance:

SDifference = √S²Difference     (8–5)

In the example we have been considering, the variance of the distribution of means for the experimental group was 6.82, and the variance of the distribution of means for the control group was 2.42; the variance of the distribution of the difference between means is thus 6.82 plus 2.42, which is 9.24. This makes the standard deviation of this distribution the square root of 9.24, which is 3.04. In terms of the formulas,

S²Difference = S²M1 + S²M2 = 6.82 + 2.42 = 9.24
SDifference = √S²Difference = √9.24 = 3.04.

Steps to Find the Standard Deviation of the Distribution of Differences Between Means

●A Figure the estimated population variances based on each sample. That is, figure one estimate for each population using the formula S² = SS/(N − 1).
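Continuing the same worked example, the quantities above line up step by step in a short Python sketch (our own illustration; the variable names are ours):

```python
import math

# Worked example: S2_pooled = 75, N1 = 11 (experimental), N2 = 31 (control).
s2_pooled = 75
s2_m1 = s2_pooled / 11   # variance of the distribution of means, population 1
s2_m2 = s2_pooled / 31   # variance of the distribution of means, population 2
s2_difference = s2_m1 + s2_m2
s_difference = math.sqrt(s2_difference)

print(round(s2_m1, 2))          # prints 6.82
print(round(s2_m2, 2))          # prints 2.42
print(round(s2_difference, 2))  # prints 9.24
print(round(s_difference, 2))   # prints 3.04
```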
TIP FOR SUCCESS
Remember that when figuring estimated variances, you divide by the degrees of freedom. But when figuring the variance of a distribution of means, which does not involve any additional estimation, you divide by the actual number in the sample.

The variance of the distribution of differences between means is the variance of the distribution of means for the first population (based on an estimated population variance) plus the variance of the distribution of means for the second population (based on an estimated population variance).

variance of a distribution of differences between means (S²Difference): one of the numbers figured as part of a t test for independent means; it equals the sum of the variances of the distributions of means associated with each of the two samples.

standard deviation of the distribution of differences between means (SDifference): in a t test for independent means, square root of the variance of the distribution of differences between means.

The standard deviation of the distribution of differences between means is the square root of the variance of the distribution of differences between means.

The variance of the distribution of means for the second population (based on an estimated population variance) is the pooled estimate of the population variance divided by the number of participants in the sample from the second population.
●B Figure the pooled estimate of the population variance:

S²Pooled = (df1/dfTotal)(S²1) + (df2/dfTotal)(S²2)
(df1 = N1 − 1 and df2 = N2 − 1; dfTotal = df1 + df2)

●C Figure the variance of each distribution of means: S²M1 = S²Pooled/N1 and S²M2 = S²Pooled/N2.
●D Figure the variance of the distribution of differences between means: S²Difference = S²M1 + S²M2.
●E Figure the standard deviation of the distribution of differences between means: SDifference = √S²Difference.

The Shape of the Distribution of Differences Between Means

The distribution of differences between means is based on estimated population variances. Thus, the distribution of differences between means (the comparison distribution) is a t distribution. The variance of this distribution is figured based on population variance estimates from two samples. Therefore, the degrees of freedom for this t distribution are the sum of the degrees of freedom of the two samples. In terms of a formula,

dfTotal = df1 + df2     (8–6)

In the example we have been considering with an experimental group of 11 and a control group of 31, we saw earlier that the total degrees of freedom is 40 (that is, 11 − 1 = 10; 31 − 1 = 30; and 10 + 30 = 40). To find the t score needed for significance, you look up the cutoff point in the t table in the row with 40 degrees of freedom. Suppose you are conducting a one-tailed test using the .05 significance level. The t table in the Appendix (Table A–2) shows a cutoff of 1.684 for 40 degrees of freedom. That is, for a result to be significant, the difference between the means has to be at least 1.684 standard deviations above the mean difference of 0 on the distribution of differences between means.

The t Score for the Difference Between the Two Actual Means

Here is how you figure the t score for Step ❹ of the hypothesis testing: First, figure the difference between your two samples' means. (That is, subtract one from the other.) Then, figure out where this difference is on the distribution of differences between means. You do this by dividing your difference by the standard deviation of this distribution. In terms of a formula,

t = (M1 − M2)/SDifference     (8–7)

For our example, suppose the mean of the first sample is 198 and the mean of the second sample is 190. The difference between these two means is 8 (that is, 198 − 190 = 8). Earlier we figured the standard deviation of the distribution of differences between means in this example to be 3.04. That would make a t score of 2.63 (that is, 8/3.04 = 2.63). In other words, in this example the difference between the two means is 2.63 standard deviations above the mean of the distribution of differences between means. In terms of the formula,

t = (M1 − M2)/SDifference = (198 − 190)/3.04 = 8/3.04 = 2.63
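Steps ●A through ●E and the t score can be put together in one compact Python sketch. This is our own illustration of the chapter's formulas; the function and variable names are ours:

```python
import math

def t_independent(m1, s2_1, n1, m2, s2_2, n2):
    """t test for independent means, from each sample's mean,
    estimated population variance, and size. Returns (t, dfTotal)."""
    df1, df2 = n1 - 1, n2 - 1
    df_total = df1 + df2
    # Pooled estimate, weighted by degrees of freedom (formula 8-1).
    s2_pooled = (df1 / df_total) * s2_1 + (df2 / df_total) * s2_2
    # Variance and standard deviation of the distribution of differences.
    s2_difference = s2_pooled / n1 + s2_pooled / n2
    s_difference = math.sqrt(s2_difference)
    return (m1 - m2) / s_difference, df_total

# Running example: M1 = 198, S2_1 = 60, N1 = 11; M2 = 190, S2_2 = 80, N2 = 31.
t, df = t_independent(198, 60, 11, 190, 80, 31)
print(round(t, 2), df)  # prints 2.63 40
```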
The total degrees of freedom for a t test for independent means is the degrees of freedom in the first sample plus the degrees of freedom in the second sample.

The t score is the difference between the two sample means divided by the standard deviation of the distribution of differences between means.
How are you doing?

1. (a) When would you carry out a t test for independent means? (b) How is this different from the situation in which you would carry out a t test for dependent means?
2. (a) What is the comparison distribution in a t test for independent means? (b) Explain the logic of going from scores in two samples to an estimate of the variance of this comparison distribution. (c) Illustrate your answer with sketches of the distributions involved. (d) Why is the mean of this distribution 0?
3. Write the formula for each of the following: (a) pooled estimate of the population variance, (b) variance of the distribution of means for the first population, (c) variance of the distribution of differences between means, and (d) t score in a t test for independent means. (e) Define all the symbols used in these formulas.
4. Explain (a) why a t test for independent means uses a single pooled estimate of the population variance, and (b) why and (c) how this estimate is “weighted.”
5. For a particular study comparing means of two samples, the first sample has 21 participants and an estimated population variance of 100; the second sample has 31 participants and an estimated population variance of 200. (a) What is the standard deviation of the distribution of differences between means? (b) What is its mean? (c) What will be its shape? (d) Illustrate your answer with sketches of the distributions involved.

Answers

1. (a) You carry out a t test for independent means when you have done a study in which you have scores from two samples of different individuals and you do not know the population variance. (b) In a t test for dependent means you have two scores from each of several individuals.
2. (a) The comparison distribution in a t test for independent means is a distribution of differences between means. (b) You estimate the population variance from each sample's scores. Since you assume the populations have the same variance, you then pool the two estimates (giving proportionately more weight in this averaging to the sample that has more degrees of freedom in its estimate). Using this pooled estimate, you figure the variance of the distribution of means for each sample's population by dividing this pooled estimate by the sample's number of participants. Finally, since your interest is in a difference between means, you create a comparison distribution of differences between means. This comparison distribution will have a variance equal to the sum of the variances of the two distributions of means. (Because the distribution of differences between means is made up of pairs of means, one taken from each distribution of means, the variance of both of these distributions of means contributes to the variance of the comparison distribution.) (c) Your sketch should look like Figure 8–1. (d) The mean of this distribution will be zero because, if the null hypothesis is true, the two populations have the same mean. So differences between means would on the average come out to zero.
3. (a) Pooled estimate of the population variance: S²Pooled = (df1/dfTotal)(S²1) + (df2/dfTotal)(S²2). (b) Variance of the distribution of means for the first population: S²M1 = S²Pooled/N1. (c) Variance of the distribution of differences between means: S²Difference = S²M1 + S²M2. (d) t score in a t test for independent means: t = (M1 − M2)/SDifference. (e) S²Pooled is the pooled estimate of the population variance; df1 and df2 are the degrees of freedom in the samples from the first and second populations, respectively; dfTotal is the total degrees of freedom (the sum of df1 and df2); S²1 and S²2 are the population variance estimates based on the samples from the first and second populations, respectively; S²M1 is the variance of the distribution of means for the first population based on an estimated variance of the population of individuals; N1 is the number of participants in the sample from the first population; S²Difference is the variance of the distribution of differences between means based on estimated variances of the populations of individuals; t is the t score for a t test for independent means (the number of standard deviations from the mean on the distribution of differences between means); M1 and M2 are the means of the samples from the first and second populations, respectively; and SDifference is the standard deviation of the distribution of differences between means based on estimated variances of the populations of individuals.
4. (a) You assume that both populations have the same variance; thus the estimates from the two samples should be estimates of the same number. (b) We weight (give more influence to) an estimate from a larger sample because, being based on more information, it is likely to be more accurate. (c) The actual weighting is done by multiplying each sample's estimate by the degrees of freedom for that sample divided by the total degrees of freedom; you then sum these two products.
5. (a) Standard deviation of the distribution of differences between means: S²Pooled = (20/50)(100) + (30/50)(200) = 40 + 120 = 160; S²M1 = 160/21 = 7.62; S²M2 = 160/31 = 5.16; S²Difference = 7.62 + 5.16 = 12.78; SDifference = √12.78 = 3.57. (b) Mean: 0. (c) Shape: t distribution with df = 50. (d) Should look like Figure 8–1 with numbers written in (see Figure 8–2 for an example).
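As a quick check of question 5's arithmetic, here is a short Python sketch (ours, not the book's):

```python
import math

# Question 5: N1 = 21, S2_1 = 100; N2 = 31, S2_2 = 200.
df1, df2 = 21 - 1, 31 - 1
df_total = df1 + df2                                   # 50
s2_pooled = (df1 / df_total) * 100 + (df2 / df_total) * 200   # 160.0
s_difference = math.sqrt(s2_pooled / 21 + s2_pooled / 31)
print(round(s_difference, 2))  # prints 3.57
```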
Hypothesis Testing with a t Test for Independent Means

Considering the five steps of hypothesis testing, there are three new wrinkles for a t test for independent means: (1) the comparison distribution is now a distribution of differences between means (this affects Step ❷); (2) the degrees of freedom for finding the cutoff on the t table is based on two samples (this affects Step ❸); and (3) your sample's score on the comparison distribution is based on the difference between your two means (this affects Step ❹).

Example of a t Test for Independent Means

Let's return to the expressive writing study that we introduced at the start of the chapter. Twenty students were recruited to take part in the study. The 10 students
randomly assigned to the expressive writing group wrote about their thoughts and feelings associated with their most traumatic life events. The 10 students randomly assigned to the control group wrote about their plans for the day. One month later, all of the students rated their overall level of physical health on a scale from 0 = very poor health to 100 = perfect health.

The scores and figuring for the t test are shown in Table 8–1. Figure 8–2 shows the distributions involved. Let's go through the five steps of hypothesis testing.

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: Students who engage in expressive writing.
Population 2: Students who write about a neutral topic (their plans for the day).

The researchers were interested in identifying a positive or a negative health effect of expressive writing. Thus, the research hypothesis was that Population 1 students would rate their health differently from Population 2 students: μ1 ≠ μ2. The null hypothesis was that Population 1 students would rate their health the same as Population 2 students: μ1 = μ2.
❷ Determine the characteristics of the comparison distribution. The comparison distribution is a distribution of differences between means. (a) Its mean is
TIP FOR SUCCESS
Note that in previous chapters, Population 2 represented the population situation if the null hypothesis is true.
Table 8–1 t Test for Independent Means for a Fictional Study of the Effect of Expressive Writing on Physical Health

Expressive Writing Group                 Control Writing Group
Score  Deviation   Squared Deviation     Score  Deviation   Squared Deviation
       from Mean   from Mean                    from Mean   from Mean
       (Score − M)                              (Score − M)
 77       −2            4                  87      19           361
 88        9           81                  77       9            81
 77       −2            4                  71       3             9
 90       11          121                  70       2             4
 68      −11          121                  63      −5            25
 74       −5           25                  50     −18           324
 62      −17          289                  58     −10           100
 93       14          196                  63      −5            25
 82        3            9                  76       8            64
 79        0            0                  65      −3             9
Σ: 790                850                 680                 1002

M1 = 79.00; S²1 = 850/9 = 94.44; M2 = 68.00; S²2 = 1002/9 = 111.33
N1 = 10; df1 = N1 − 1 = 9; N2 = 10; df2 = N2 − 1 = 9
dfTotal = df1 + df2 = 9 + 9 = 18
S²Pooled = (df1/dfTotal)(S²1) + (df2/dfTotal)(S²2) = (9/18)(94.44) + (9/18)(111.33) = 47.22 + 55.67 = 102.89
S²M1 = S²Pooled/N1 = 102.89/10 = 10.29
S²M2 = S²Pooled/N2 = 102.89/10 = 10.29
S²Difference = S²M1 + S²M2 = 10.29 + 10.29 = 20.58
SDifference = √S²Difference = √20.58 = 4.54
Needed t with df = 18, 5% level, two-tailed = ±2.101
t = (M1 − M2)/SDifference = (79.00 − 68.00)/4.54 = 2.42
Decision: Reject the null hypothesis.
0 (as it almost always is in a t test for independent means, because we are interested in whether there is more than 0 difference between the two populations). (b) Regarding its standard deviation,
●A Figure the estimated population variances based on each sample. As shown in Table 8–1, S²1 comes out to 94.44 and S²2 = 111.33.
●B Figure the pooled estimate of the population variance: As shown in Table 8–1, the figuring for S²Pooled gives a result of 102.89.
●C Figure the variance of each distribution of means: Dividing S²Pooled by the N in each sample, as shown in Table 8–1, gives S²M1 = 10.29 and S²M2 = 10.29.
●D Figure the variance of the distribution of differences between means: Adding up the variances of the two distributions of means, as shown in Table 8–1, comes out to S²Difference = 20.58.
●E Figure the standard deviation of the distribution of differences between means: SDifference = √S²Difference = √20.58 = 4.54.
(c) The shape of this comparison distribution will be a t distribution with a total of 18 degrees of freedom.
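Starting from the raw scores in Table 8–1, the same figures can be reproduced in Python (our own sketch; the variable names are ours):

```python
import math

expressive = [77, 88, 77, 90, 68, 74, 62, 93, 82, 79]
control = [87, 77, 71, 70, 63, 50, 58, 63, 76, 65]

def est_population_variance(scores):
    # S2 = SS / (N - 1)
    m = sum(scores) / len(scores)
    ss = sum((x - m) ** 2 for x in scores)
    return ss / (len(scores) - 1)

m1, m2 = sum(expressive) / 10, sum(control) / 10
s2_1 = est_population_variance(expressive)
s2_2 = est_population_variance(control)
df1 = df2 = 9
s2_pooled = (df1 / 18) * s2_1 + (df2 / 18) * s2_2
s_difference = math.sqrt(s2_pooled / 10 + s2_pooled / 10)
t = (m1 - m2) / s_difference

print(round(s2_1, 2), round(s2_2, 2))  # prints 94.44 111.33
print(round(t, 2))                     # prints 2.42
```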
[Figure: the two population curves (students who engage in expressive writing, S² = 94.44; students who write about a neutral topic, S² = 111.33; S²Pooled = 102.89), the two distributions of means (S²M = 10.29, SM = 3.21 each), the distribution of differences between means (SDifference = 4.54, t score = 2.42), and the two samples (M = 79.00 and 68.00).]
Figure 8–2 Distributions for a t test for independent means for the expressive writing example.
TIP FOR SUCCESS
Notice that, in this example, the value for S²M1 is the same as the value for S²M2. This is because there was the same number of students in the two groups (that is, N1 was the same as N2). When the number of individuals in the two groups is not the same, the values for S²M1 and S²M2 will be different.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. This requires a two-tailed test because the researchers were interested in an effect in either direction. As shown in Table A–2 (in the Appendix), the cutoff t scores at the .05 level are 2.101 and −2.101.
❹ Determine your sample's score on the comparison distribution. The t score is the difference between the two sample means (79.00 − 68.00, which is 11.00), divided by the standard deviation of the distribution of differences between means (which is 4.54). This comes out to 2.42.
❺ Decide whether to reject the null hypothesis. The t score of 2.42 for the difference between the two actual means is larger than the cutoff t score of 2.101. You can reject the null hypothesis. The research hypothesis is supported: students who engage in expressive writing report a higher level of health than students who write about a neutral topic.

Although the actual numbers in this study were fictional, the results are consistent with those from many actual studies that have shown beneficial effects of expressive writing on self-reported health outcomes, as well as additional outcomes such as psychological well-being (e.g., Pennebaker & Beall, 1986; Warner et al., 2006; see also Frattaroli, 2006).

Summary of Steps for a t Test for Independent Means

Table 8–2 summarizes the steps for a t test for independent means.¹
Table 8–2 Steps for a t Test for Independent Means

❶ Restate the question as a research hypothesis and a null hypothesis about the populations.
❷ Determine the characteristics of the comparison distribution.
   a. Its mean will be 0.
   b. Figure its standard deviation.
      ●A Figure the estimated population variances based on each sample. For each population, S² = SS/(N − 1).
      ●B Figure the pooled estimate of the population variance:
         S²Pooled = (df1/dfTotal)(S²1) + (df2/dfTotal)(S²2)
         (df1 = N1 − 1 and df2 = N2 − 1; dfTotal = df1 + df2)
      ●C Figure the variance of each distribution of means: S²M1 = S²Pooled/N1 and S²M2 = S²Pooled/N2
      ●D Figure the variance of the distribution of differences between means: S²Difference = S²M1 + S²M2
      ●E Figure the standard deviation of the distribution of differences between means: SDifference = √S²Difference
   c. Determine its shape: It will be a t distribution with dfTotal degrees of freedom.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
   a. Determine the degrees of freedom (dfTotal), desired significance level, and tails in the test (one or two).
   b. Look up the appropriate cutoff in a t table. If the exact df is not given, use the df below it.
❹ Determine your sample's score on the comparison distribution: t = (M1 − M2)/SDifference
❺ Decide whether to reject the null hypothesis: Compare the scores from Steps ❸ and ❹.
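Steps ❸ and ❺ amount to a lookup-and-compare, which can be sketched in Python. The cutoff values below are a tiny assumed excerpt of a two-tailed .05 t table (in the spirit of Table A–2); they are hard-coded, not computed:

```python
# Two-tailed .05 cutoffs for a few df values (assumed t-table excerpt).
CUTOFFS_05_TWO_TAILED = {18: 2.101, 40: 2.021, 60: 2.000, 80: 1.990}

def decide(t, df_total):
    # If the exact df is not listed, use the largest listed df below it
    # (the "use the df below it" rule from Table 8-2).
    usable = [df for df in CUTOFFS_05_TWO_TAILED if df <= df_total]
    cutoff = CUTOFFS_05_TWO_TAILED[max(usable)]
    return "reject the null hypothesis" if abs(t) >= cutoff else "fail to reject"

# Expressive writing example: t = 2.42, dfTotal = 18.
print(decide(2.42, 18))  # prints reject the null hypothesis
```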
A Second Example of a t Test for Independent Means
Valenzuela (1997) compared the mothering received by poor children who either
were or were not undernourished. One of her measures was systematic ratings of
how well the mother assisted her child in a standard puzzle-solving task (observed
during home visits). The mothers of the 43 adequately nourished children had a
mean quality of assistance of 33.10 and an estimated population variance of 201.64.
The mothers of the 42 chronically undernourished children had a mean of 27.00 on
this measure, with an estimated population variance of 134.56.
The figuring for the t test comparing the quality of assistance scores for the two
conditions is shown in Table 8–3. The distributions involved are shown in Figure 8–3.
Next, we go through the five steps of hypothesis testing.
❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: Mothers of adequately nourished poor children.
Population 2: Mothers of chronically undernourished poor children.

The research hypothesis was that Population 1 mothers would score differently from Population 2 mothers on the quality of assistance to their children. Valenzuela predicted that Population 1 would score higher than Population 2. However, following conventional practice in studies like this, she used a nondirectional (two-tailed) significance test. (This had the advantage of allowing the possibility of finding significant results in the direction opposite to her prediction.) Thus, the research hypothesis actually tested was that Population 1 mothers would score differently from Population 2 mothers: μ1 ≠ μ2. The null hypothesis was that the Population 1 mothers would score the same as Population 2 mothers: μ1 = μ2.
❷ Determine the characteristics of the comparison distribution. (a) Its mean will be 0. (b) Figure its standard deviation (see Table 8–3 for the figuring for each step below),
●A Figure the estimated population variances based on each sample. These are already figured for us: S²1 = 201.64 and S²2 = 134.56.
Table 8–3 t Test for Independent Means in Study of Quality of Assistance of Mothers of Adequately Nourished Versus Chronically Undernourished Poor Chilean Children

Adequately nourished children: N1 = 43; df1 = N1 − 1 = 42; M1 = 33.10; S²1 = 201.64
Chronically undernourished children: N2 = 42; df2 = N2 − 1 = 41; M2 = 27.00; S²2 = 134.56
dfTotal = df1 + df2 = 42 + 41 = 83
S²Pooled = (df1/dfTotal)(S²1) + (df2/dfTotal)(S²2) = (42/83)(201.64) + (41/83)(134.56)
         = .51(201.64) + .49(134.56) = 102.84 + 65.93 = 168.77
S²M1 = S²Pooled/N1 = 168.77/43 = 3.92
S²M2 = S²Pooled/N2 = 168.77/42 = 4.02
S²Difference = S²M1 + S²M2 = 3.92 + 4.02 = 7.94
SDifference = √S²Difference = √7.94 = 2.82
Needed t with df = 83 (using df = 80 in the table), 5% level, two-tailed = ±1.990
t = (M1 − M2)/SDifference = (33.10 − 27.00)/2.82 = 6.10/2.82 = 2.16
Decision: Reject the null hypothesis; the research hypothesis is supported.
Source: Data from Valenzuela (1997).
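Table 8–3 rounds at each step (for example, the weights 42/83 and 41/83 become .51 and .49). A Python sketch of ours that rounds the same way reproduces the table's t of 2.16; carrying full precision throughout gives a slightly different value (about 2.17), a normal consequence of intermediate rounding:

```python
import math

# Mirror Table 8-3, rounding intermediate results to 2 decimals as the table does.
w1, w2 = round(42 / 83, 2), round(41 / 83, 2)      # .51 and .49
s2_pooled = round(w1 * 201.64 + w2 * 134.56, 2)    # 168.77
s2_m1 = round(s2_pooled / 43, 2)                   # 3.92
s2_m2 = round(s2_pooled / 42, 2)                   # 4.02
s_difference = round(math.sqrt(s2_m1 + s2_m2), 2)  # 2.82
t = round((33.10 - 27.00) / s_difference, 2)
print(t)  # prints 2.16
```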
[Figure: the two population curves (adequately nourished poor children, S² = 201.64; chronically undernourished poor children, S² = 134.56; S²Pooled = 168.77), the distributions of means (S²M = 3.92, SM = 1.98; S²M = 4.02, SM = 2.00), the distribution of differences between means (SDifference = 2.82, t score = 2.16), and the two samples (M = 33.10 and 27.00).]
Figure 8–3 Distributions for a t test for independent means for the mothers of adequately nourished versus chronically undernourished poor children.
Source: Data from Valenzuela, 1997.
●B Figure the pooled estimate of the population variance: The figuring for S²Pooled gives a result of 168.77.
●C Figure the variance of each distribution of means: Dividing S²Pooled by the N in each sample gives S²M1 = 3.92 and S²M2 = 4.02.
●D Figure the variance of the distribution of differences between means: Adding up the variances of the two distributions of means comes out to S²Difference = 7.94.
●E Figure the standard deviation of the distribution of differences between means: SDifference = √S²Difference = √7.94 = 2.82.
(c) The shape of this comparison distribution will be a t distribution with dfTotal of 83.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. The cutoff you need is for a two-tailed test (because the research hypothesis is nondirectional) at the usual .05 level, with 83 degrees of freedom. The t table in the Appendix (Table A–2) does not have a listing for 83 degrees of freedom. Thus, you use the next lowest df available, which is 80. This gives cutoff t scores of +1.990 and −1.990.
❹ Determine your sample's score on the comparison distribution. The t score is the difference between the two sample means divided by the standard deviation of the distribution of differences between means. This comes out to a t of 2.16. (That is, t = 6.10/2.82 = 2.16.)
❺ Decide whether to reject the null hypothesis. The t score of 2.16 for the difference between the means of the two conditions is more extreme than the cutoff t score of ±1.990. Therefore, the researchers could reject the null hypothesis. The research hypothesis is supported: mothers of adequately nourished children provide better-quality assistance to their children than do mothers of chronically undernourished children.

How are you doing?

1. List the ways in which hypothesis testing for a t test for independent means is different from a t test for dependent means in terms of (a) Step ❷, (b) Step ❸, and (c) Step ❹.
2. Using the .05 significance level, figure a t test for independent means for an experiment in which scores in an experimental condition are predicted to be lower than scores in a control condition. For the experimental condition, with 26 participants, M = 5, S² = 10; for the control condition, with 36 participants, M = 8, S² = 12. (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved.
Answers

1. (a) The comparison distribution for a t test for independent means is a distribution of differences between means. (b) The degrees of freedom for a t test for independent means is the sum of the degrees of freedom for the two samples. (c) The t score for a t test for independent means is based on differences between means (divided by the standard deviation of the distribution of differences between means).
2. (a) Steps of hypothesis testing:
❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations.
Population 1: People given the experimental procedure.
Population 2: People given the control procedure.
The research hypothesis is that the mean of Population 1 is less than the mean of Population 2: μ1 < μ2. The null hypothesis is that the mean of Population 1 is not less than the mean of Population 2: μ1 ≥ μ2.
❷ Determine the characteristics of the comparison distribution.
(a) Its mean will be 0.
(b) Figure its standard deviation,
●A Figure the estimated population variances based on each sample. S²1 = 10; S²2 = 12.
●B Figure the pooled estimate of the population variance: S²Pooled = (25/60)(10) + (35/60)(12) = 4.17 + 7.00 = 11.17.
●C Figure the variance of each distribution of means: S²M1 = 11.17/26 = .43 and S²M2 = 11.17/36 = .31.
●D Figure the variance of the distribution of differences between means: S²Difference = .43 + .31 = .74.
●E Figure the standard deviation of the distribution of differences between means: SDifference = √S²Difference = √.74 = .86.
(c) The shape is a t distribution with dfTotal = 60.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. The t cutoff for the .05 level, one-tailed, dfTotal = 60, is −1.671. (The cutoff is a negative t score, because the research hypothesis is that the mean of Population 1 will be lower than the mean of Population 2.)
❹ Determine your sample's score on the comparison distribution. t = (M1 − M2)/SDifference = (5 − 8)/.86 = −3.49.
❺ Decide whether to reject the null hypothesis. The t of −3.49 is more extreme than the cutoff t of −1.671. Therefore, reject the null hypothesis.
(b) The distributions involved are shown in Figure 8–4.

[Figure: the two population curves (experimental group, S² = 10; control group, S² = 12; S²Pooled = 11.17), the distributions of means (S²M = .43, SM = .66; S²M = .31, SM = .56), the distribution of differences between means (SDifference = .86, t score = −3.49), and the two samples (M = 5 and 8).]
Figure 8–4 Distributions for a t test for independent means for the answer to “How Are You Doing?” question 2.
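The exercise's figures can be checked with a short Python sketch of ours, following the chapter's step order:

```python
import math

# Experimental: N = 26, M = 5, S2 = 10. Control: N = 36, M = 8, S2 = 12.
df1, df2 = 26 - 1, 36 - 1
df_total = df1 + df2
s2_pooled = (df1 / df_total) * 10 + (df2 / df_total) * 12
s_difference = math.sqrt(s2_pooled / 26 + s2_pooled / 36)
t = (5 - 8) / s_difference

print(df_total)                # prints 60
print(round(s_difference, 2))  # prints 0.86
print(round(t, 2))             # prints -3.49
```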
BOX 8–1 Monte Carlo Methods: When Mathematics Becomes Just an Experiment, and Statistics Depend on a Game of Chance

The name for the methods, Monte Carlo (after the famous Monegasque casino resort city), has been adopted only in recent years. But the approach itself dates back at least a few centuries to when mathematicians would set down their pens or chalk and go out and try an actual experiment to test a particular understanding of a probability problem. For example, in 1777 Buffon described, in his Essai d'Arithmétique morale, a method of computing the ratio of the diameter of a circle to its circumference by tossing a needle onto a flat surface containing parallel lines. Assuming that the needle fell randomly into any position, one could figure the odds of its taking certain positions, such as touching the lines or not and lying at certain angles. The term Monte Carlo no doubt reflects the early understanding of mathematicians and statisticians that many of their problems were like those involving games of chance. (Recall Pascal and the problem of points from Chapter 3, Box 3–3.)

Wide use of Monte Carlo methods by statisticians became possible with the advent of computers. This is because the essence of Monte Carlo studies is the interaction of randomness and probabilities, which means testing out a great many possibilities. Indeed, the first application of Monte Carlo methods was in neutron physics because the behavior of particles when scattered by a neutron beam is so complicated and so close to random that solving the problem mathematically from equations was practically impossible. But by artificially simulating the statistical conditions of what were essentially physical experiments, the physical world could be understood—or at least approximated in an adequate way.

Do you remember being shown Brownian motion in your chemistry or physics class in high school? Its study is a good example of a Monte Carlo problem. Here are atomic particles, more or less, this time in fluids, free to do an almost limitless number of almost random things. In fact, Brownian motion has been likened to a "random walk" of a drunkard. At any moment, the drunkard could move in any direction. But the problem is simplified by limiting the drunkard (or particle) to an imaginary grid.

Picture the grid of a city's streets. Further imagine that there is a wall around the city that the drunkard cannot escape (just as all particles must come to a limit; they cannot go on forever). At the limit, the wall, the drunkard must pay a fine, which also varies randomly. The point of this example is how much is random—all the movements and also all the ultimate consequences. So the number of possible paths is enormous.

The random walk example brings us to the main feature of Monte Carlo methods: they require the use of random numbers. And for an explanation of them, you can look forward to Chapter 14, Box 14–1.

Now, let's return to what interests us here: the use of Monte Carlo studies to check out what will be the result of the violations of assumptions of statistical tests. For example, the computer may set up two populations with identical means, but the other parameters are supplied by the statistical researcher so that these violate some important assumption. Perhaps the populations are skewed a certain way or the two populations have different variances.

Then, samples are randomly selected from each of these two offbeat populations (remember, they were invented by the computer). The means of these samples are compared using the usual t-test procedure with the usual t tables with all their assumptions. A large number, often around 10,000, of such pairs of samples are selected, and a t test is figured for each. The question is, "How many of these 10,000 t tests will come out significant at the 5% significance level?" Ideally, the result would be about 5%, or 500 of the 10,000. But what if 10% (1,000) of these supposedly 5%-level tests come out significant? What if only 1% do? If these kinds of results arise, then this particular violation of the assumptions of the t test cannot be tolerated. But, in fact, most violations (except for very extreme ones) checked with these methods do not create very large changes in the p values.

Monte Carlo methods are a boon to statistics, but like everything else, they have their drawbacks as well and consequently their critics. One problem is that the ways in which populations can violate assumptions are almost limitless in their variations. But even computers have their limits; Monte Carlo studies are tried on only a representative set of those variations. A more specific problem is that there is good reason to think that some of the variations that are not studied are far more like the real world than those that have been studied (see the discussion in Chapter 3 of the controversy about how common the normal curve really is). Finally, when we are deciding whether to use a particular statistic in any specific situation, we have no idea about the population our sample came from; is it like any on which there has been a Monte Carlo study performed, or not? Simply knowing that Monte Carlo studies have shown some statistic to be robust in the face of many kinds of assumption violations does not prove that it is robust in a given situation. We can only hope that it increases the chances that using the statistic is safe and justifiable.

At any rate, Monte Carlo studies are a perfect example of how the computer has changed science. Shreider (1966) expressed it this way:

Computers have led to a novel revolution in mathematics. Whereas previously an investigation of a random process was regarded as complete as soon as it was reduced to an analytic description, nowadays it is convenient in many cases to solve an analytic problem by reducing it to a corresponding random process and then simulating that process. (p. vii)

In other words, instead of math helping us analyze experiments, experiments are helping us analyze math.

Assumptions of the t Test for Independent Means

The first assumption for a t test for independent means is the same as that for any t test: each of the population distributions is assumed to follow a normal curve. In practice, this is only a problem if you have reason to think that the two populations are dramatically skewed distributions and in opposite directions. The t test holds up well even when the shape of the population distributions is fairly far from normal.

In a t test for independent means, you also have to be able to assume that the two populations have the same variance. (As you learned earlier in the chapter, this assumption is called homogeneity of variance.) Once again, however, it turns out that in practice the t test gives pretty accurate results even when there are fairly large differences in the population variances, particularly when there are equal or near equal numbers of scores in the two samples. (How do we know that the t test holds up well to moderate violations of its assumptions? See Box 8–1 for a description of what are called Monte Carlo methods.)
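The kind of Monte Carlo check described in Box 8–1 can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular published study: it uses 2,000 trials instead of 10,000, a rough normal-curve cutoff of 1.96 instead of an exact t cutoff, and populations that violate only the equal-variance assumption:

```python
import random

def pooled_t(a, b):
    """Usual t test for independent means (pooled variance estimate)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    s2_1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    s2_2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    s2_pooled = ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)
    return (m1 - m2) / (s2_pooled / n1 + s2_pooled / n2) ** 0.5

random.seed(1)  # reproducible run
trials = 2000   # the box describes around 10,000; fewer here to keep it quick
hits = 0
for _ in range(trials):
    # Two "offbeat" populations invented by the computer: identical means,
    # very different variances (an assumption violation).
    a = [random.gauss(0, 1) for _ in range(30)]
    b = [random.gauss(0, 4) for _ in range(30)]
    if abs(pooled_t(a, b)) > 1.96:  # rough two-tailed 5%-level cutoff
        hits += 1

# With equal sample sizes the t test is robust: the proportion of false
# "significant" results stays near the nominal 5%.
print(hits / trials)
```

Rerunning the same sketch with very unequal sample sizes (say 10 versus 50) shows the rejection rate drifting well away from 5%, which is exactly the pattern the box describes.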
However, the t test can give quite misleading results if (a) the scores in the
samples suggest that the populations are very far from normal, (b) the variances
are very different, or (c) there are both problems. In these situations, there are al-
ternatives to the ordinary t test procedure, some of which we will consider in
Chapter 14.
Many computer programs for figuring the t test for independent means actually
provide two sets of results. One set of results figures the t test assuming the popula-
tion variances are equal. This method is the standard one, the one you have learned
in this chapter. A second set of results uses a special alternative procedure that takes
into account that the population variances may be unequal. (But it still assumes that
the populations follow a normal distribution.) An example of these two sets of re-
sults is shown in the Using SPSS section at the end of this chapter (see Figure 8–8).
However, in most situations we can assume that the population variances are equal.
Thus, researchers usually use the standard method. Using the special alternative pro-
cedure has the advantage that you don’t have to worry about whether you met the
equal population variance assumption. But it has the disadvantage that if you have
met that assumption, with this special method you have less power. That is, when
you do meet the assumption, you are slightly less likely to get a significant result
using the special method.
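The two sets of results such programs report can be sketched in Python. The pooled version is the standard method from this chapter; the "special alternative procedure" shown here is the widely used Welch procedure (its degrees-of-freedom formula is not derived in this chapter), and the scores are invented purely for illustration:

```python
# Pooled (equal-variance) t test vs. the Welch (unequal-variance) t test.

def mean(xs):
    return sum(xs) / len(xs)

def est_var(xs):
    """Unbiased estimate of the population variance (divide by N - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pooled_t(a, b):
    """Standard t test for independent means; returns (t, df)."""
    n1, n2 = len(a), len(b)
    df1, df2 = n1 - 1, n2 - 1
    s2_pooled = (df1 * est_var(a) + df2 * est_var(b)) / (df1 + df2)
    s_diff = (s2_pooled / n1 + s2_pooled / n2) ** 0.5
    return (mean(a) - mean(b)) / s_diff, df1 + df2

def welch_t(a, b):
    """Welch's t test (no equal-variance assumption); returns (t, df)."""
    n1, n2 = len(a), len(b)
    v1, v2 = est_var(a) / n1, est_var(b) / n2
    t = (mean(a) - mean(b)) / (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

exp = [6, 4, 9, 7, 7, 3, 6]  # invented experimental-group scores
ctl = [6, 1, 5, 3, 1, 1]     # invented control-group scores (unequal n)
print(pooled_t(exp, ctl))
print(welch_t(exp, ctl))
```

The Welch degrees of freedom always come out at or below the pooled dfTotal, which is the "less power" trade-off the paragraph above describes.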
Effect Size and Power for the t Test
for Independent Means
Effect Size
Effect size for the t test for independent means is figured in basically the same way as we have been using all along:

d = (μ1 - μ2)/σ    (8–8)

(The effect size is the difference between the population means divided by the population's standard deviation.)

Cohen's (1988) conventions for the t test for independent means are the same as in all the situations we have considered so far: .20 for a small effect size, .50 for a medium effect size, and .80 for a large effect size.

Suppose that an environmental psychologist is working in a city with high levels of air pollution. This psychologist plans a study of the number of problems completed on a creativity test over a one-hour period. The study compares performance under two conditions. In the experimental condition, each participant takes the test in a room with a special air purifier. In the control condition, each participant takes the test in a room without the air purifier. The researcher expects that the control group will probably score like others who have taken this test in the past, which is a mean of 21. But the researcher expects that the experimental group will perform better, scoring about 29. This test is known from previous research to have a standard deviation of about 10. Thus, μ1 = 29, μ2 = 21, and σ = 10. Given these figures, d = (μ1 - μ2)/σ = (29 - 21)/10 = .80, a large effect size.

When you have results of a completed study, you estimate the effect size as the difference between the sample means divided by the pooled estimate of the population standard deviation (the square root of the pooled estimate of the population variance). You use the sample means because they are the best estimate of the population means, and you use SPooled because it is the best estimate of σ. Stated as a formula,

Estimated d = (M1 - M2)/SPooled    (8–9)

(The estimated effect size is the difference between the sample means divided by the pooled estimate of the population's standard deviation.)

Consider Valenzuela's (1997) study of the quality of instructional assistance provided by mothers of poor children. The mean for the sample of mothers of the adequately nourished children was 33.10; the mean for the sample of mothers of chronically undernourished children was 27.00. We figured the pooled estimate of the population variance to be 168.77; the standard deviation is thus 12.99. The difference in means of 6.10, divided by 12.99, gives an effect size of .47—a medium effect size. In terms of the formula,

Estimated d = (M1 - M2)/SPooled = (33.10 - 27.00)/12.99 = 6.10/12.99 = .47

Power

Power for a t test for independent means can be determined using a power table, a power software package, or an Internet power calculator. The power table shown in Table 8–4 gives the approximate power for the .05 significance level for small, medium, and large effect sizes, and one-tailed or two-tailed tests.2 Consider again the environmental psychology example of a planned study, where the researcher expected a large effect size (d = .80). Suppose this researcher plans to use the .05 level, one-tailed, with 10 participants. Based on Table 8–4, the study would have a power of .53. This means that, even if the research hypothesis is in fact true and has a large effect size, there is only a 53% chance that the study will come out significant.
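The two effect-size formulas in this section, (8–8) for a planned study and (8–9) for a completed one, amount to one line of arithmetic each. Here is a Python sketch using the chapter's air-purifier and Valenzuela examples (the function names are mine):

```python
# Effect size for the t test for independent means.

def d_planned(mu1, mu2, sigma):
    """Formula (8-8): d = (mu1 - mu2) / sigma, for a planned study."""
    return (mu1 - mu2) / sigma

def d_estimated(m1, m2, s_pooled):
    """Formula (8-9): estimated d = (M1 - M2) / S_Pooled, from a completed study."""
    return (m1 - m2) / s_pooled

# Air-purifier planning example: mu1 = 29, mu2 = 21, sigma = 10.
print(d_planned(29, 21, 10))  # 0.8, a large effect size

# Valenzuela (1997): M1 = 33.10, M2 = 27.00, S_Pooled = sqrt(168.77) = 12.99.
print(round(d_estimated(33.10, 27.00, 168.77 ** 0.5), 2))  # 0.47, a medium effect size
```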
Now consider an example of a completed study. Suppose you have read a study
using a t test for independent means that had a nonsignificant result using the .05 sig-
nificance level, two-tailed. There were 40 participants in each group. Should you
conclude that there is in fact no difference at all in the populations? This conclusion
seems quite unjustified. Table 8–4 shows a power of only .14 for a small effect size.
This suggests that even if such a small effect does indeed exist in the populations,
this study would probably not come out significant. Still, we can also conclude that,
if there is a true difference in the populations, it is probably not large. Table 8–4
shows a power of .94 for a large effect size. This suggests that, if a large effect exists,
it almost surely would have produced a significant result.
Planning Sample Size
Table 8–5 gives the approximate number of participants needed for 80% power for
estimated small, medium, and large effect sizes, using one-tailed and two-tailed
tests, all using the .05 significance level.3 Suppose you plan a study in which you
expect a medium effect size and will use the .05 significance level, one-tailed. Based
on Table 8–5, you need 50 people in each group (100 total) to have 80% power.
However, if you did a study using the same significance level but expected a large
effect size, you would need only 20 people in each group (40 total).
Table 8–4 Approximate Power for Studies Using the t Test for Independent Means Testing Hypotheses at the .05 Significance Level

Number of Participants                 Effect Size
in Each Group            Small (.20)   Medium (.50)   Large (.80)

One-tailed test
10                       .11           .29            .53
20                       .15           .46            .80
30                       .19           .61            .92
40                       .22           .72            .97
50                       .26           .80            .99
100                      .41           .97            *

Two-tailed test
10                       .07           .18            .39
20                       .09           .33            .69
30                       .12           .47            .86
40                       .14           .60            .94
50                       .17           .70            .98
100                      .29           .94            *

*Nearly 1.
Table 8–5 Approximate Number of Participants Needed in Each Group (Assuming Equal
Sample Sizes) for 80% Power for the t Test for Independent Means, Testing
Hypotheses at the .05 Significance Level
Effect Size
Small (.20) Medium (.50) Large (.80)
One-tailed 310 50 20
Two-tailed 393 64 26
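Values like those in Tables 8–4 and 8–5 can also be approximated without a table. The sketch below uses a standard normal approximation (via Python's `statistics.NormalDist`) rather than the exact noncentral t distribution the published tables rest on, so its power figures run a little high for small samples; the function names are mine:

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal curve

def approx_power(d, n_per_group, alpha=0.05, tails=2):
    """Approximate power for a t test for independent means."""
    z_crit = Z.inv_cdf(1 - alpha / tails)
    return Z.cdf(d * sqrt(n_per_group / 2) - z_crit)

def n_per_group_for_power(d, power=0.80, alpha=0.05, tails=2):
    """Approximate participants needed in each group for a given power."""
    z_crit = Z.inv_cdf(1 - alpha / tails)
    z_power = Z.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)

# One-tailed .05 level, small/medium/large effect sizes:
print([n_per_group_for_power(d, tails=1) for d in (0.20, 0.50, 0.80)])
# -> [310, 50, 20], matching Table 8-5's one-tailed row.

print([n_per_group_for_power(d, tails=2) for d in (0.20, 0.50, 0.80)])
# -> [393, 63, 25]; Table 8-5 gives 393, 64, 26 (it is based on the exact
#    t distribution, which this normal approximation slightly undershoots).
```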
Review and Comparison of the Three Kinds of t Tests

You have now learned about three kinds of t tests: In Chapter 7, you learned about the t test for a single sample and the t test for dependent means, and in this chapter you learned about the t test for independent means. Table 8–6 provides a review and comparison of these three kinds of t tests.

As you can see in Table 8–6, the population variance is not known for each test, and the shape of the comparison distribution for each test is a t distribution. The single sample t test is used for hypothesis testing when you are comparing the mean of a single sample to a known population mean. However, in most research in psychology, you do not know the population mean. With an unknown population mean, the t test for dependent means is the appropriate t test when each participant has two scores (such as a before-score and an after-score) and you want to see if, on average, there is a difference between the participants' pairs of scores. The t test for independent means is used for hypothesis testing when you are comparing the mean of scores from one group of individuals (such as an experimental group) with the mean of scores from a different group of individuals (such as a control group).

How are you doing?
1. List two assumptions for the t test for independent means. For each, give the situations in which violations of these assumptions would be seriously problematic.
2. Why do you need to assume the populations have the same variance?
3. What is the effect size for a planned study in which Population 1 is predicted to have a mean of 17, Population 2 is predicted to have a mean of 25, and the population standard deviation is assumed to be about 20?
4. What is the power of a study using a t test for independent means, with a two-tailed test at the .05 significance level, in which the researchers predict a large effect size and there are 20 participants in each group?
5. How many participants do you need in each group for 80% power in a planned study in which you predict a small effect size and will be using a t test for independent means, one-tailed, at the .05 significance level?

Answers
1. One assumption is that the two populations are normally distributed; this is mainly a problem if you have reason to think the two populations are strongly skewed in opposite directions. A second assumption is that the two populations have the same variance; this is mainly a problem if you believe the two distributions have quite different variances and the sample sizes are different.
2. You need to assume the populations have the same variance because you make a pooled estimate of the population variance. The pooling would not make sense if the estimates from the two samples were for populations with different variances.
3. The effect size is d = (17 - 25)/20 = -8/20 = -.40.
4. The power is .69.
5. You need 310 participants.
Controversy: The Problem of Too Many t Tests
A long-standing controversy is what is usually called the problem of “too many
t tests.” The basic issues come up in all types of hypothesis testing, not just in the
t test. However, we introduce this problem now because it has traditionally been
brought up in this context.
Suppose you do a large number of t tests for the same study. For example, you
might compare two groups on each of 17 different measures, such as different indi-
cators of memory on a recall task, various intelligence test subscales, or different as-
pects of observed interactions between infants. When you do several t tests in the
same study, the chance of any one of them coming out significant at, say, the 5%
level is really greater than 5%. If you make 100 independent comparisons, on the av-
erage five of them will come out significant at the 5% level just by chance. That is,
about five will come out significant even if there is no true difference at all between
the populations the t tests are comparing.
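The arithmetic behind this point is simple. Assuming the comparisons really are independent, the chance that at least one of several 5%-level tests comes out significant by chance alone can be sketched as:

```python
# Chance that at least one of k independent 5%-level tests comes out
# "significant" when the null hypothesis is true for every comparison.

def chance_of_any(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

print(round(chance_of_any(17), 2))   # 17 measures: 0.58
print(round(chance_of_any(100), 2))  # 100 measures: 0.99
print(0.05 * 100)                    # expected number significant among 100: 5.0
```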
The fundamental issue is not controversial. Everyone agrees that there is a prob-
lem in a study involving a large number of comparisons. And everyone agrees that in
a study like this, if only a few results come out significant, these differences should
be viewed very cautiously. The controversy is about how cautious to be and about
how few is “only a few.” One reason there is room for controversy is that, in most
cases, the many comparisons being made are not independent; the chance of one
coming out significant is related to the chance of another coming out significant.
TIP FOR SUCCESS
We recommend that you spend
some time carefully going through
Table 8–6. Test your understanding
of the three kinds of t tests by cov-
ering up portions of the table and
trying to recall the hidden informa-
tion. If you are at all unsure about
any information in the table, be
sure to review the relevant material
in this chapter and in Chapter 7.
Table 8–6 Review of the Three Kinds of t Tests

                                           Type of t Test
Feature of the t Tests        Single Sample     Dependent Means   Independent Means
Population variance
is known                      No                No                No
Formula for degrees
of freedom                    df = N - 1        df = N - 1        dfTotal = df1 + df2
                                                                  (df1 = N1 - 1; df2 = N2 - 1)
Shape of comparison
distribution                  t distribution    t distribution    t distribution
Population mean is
known                         Yes               No                No
Number of scores for
each participant              1                 2                 1
t test carried out on
difference scores             No                Yes               No
Formula for t                 t = (M - μ)/SM    t = (M - μ)/SM    t = (M1 - M2)/SDifference
Here is an example. A study compares a sample of lawyers to a sample of doc-
tors on 100 personality traits. Now suppose the researcher simply conducts 100
t tests. If these 100 t tests were truly independent, we would expect that on the aver-
age five would come out significant just by chance. In fact, tables exist that tell you
quite precisely the chance of any particular number of t tests coming out significant.
The problem, however, is that in practice these 100 t tests are not independent. Many
of the various personality traits are probably related: if doctors and lawyers differ on
assertiveness, they probably also differ on self-confidence. Thus, certain sets of
comparisons may be more or less likely to come out significant by chance so that
5 in 100 may not be what you should expect by chance.
There is yet another complication: in most cases, differences on some of the
variables are more important than on others. Some comparisons may directly test a
theory or the effectiveness of some practical procedure; other comparisons may be
more “exploratory.”
Here is another kind of example. In studies using brain imaging procedures
[such as functional magnetic resonance imagery (fMRI)], the way the analysis works
for a typical study is like this: a person’s brain is scanned every few seconds over a
10- or 15-minute period. During this time, the person is sometimes looking at one
kind of image, say a picture of a person smiling, and at other times is looking at a dif-
ferent kind of image, say a picture of the same person frowning. For each little area
of the brain, the fMRI produces a number for how active that area was during each
2- to 3-second scan. Thus, for each little area of the brain, you might have 60 num-
bers for activation when looking at the smile and 60 numbers for when looking at the
frown. Thus, for each little area, you can figure a t test for dependent means. In fact,
this is exactly what is done in this kind of research. (We considered an example like
this in Chapter 7.) The problem, however, is that you have a great many little areas
of the brain. (Typically, in fMRI research, each little area may be about a 1/4-inch cube or smaller.) Thus, you have several thousand t tests, and you would expect
some of them to be significant just by chance. This whole situation is further compli-
cated by the issue that some brain areas might be expected to be more likely to show
different levels of activity for this kind of image. In addition, the situation is still fur-
ther complicated by the fact that you might want to pay more attention when two or
more little areas that are right next to each other show significant differences.
In these various examples, there are a variety of contending solutions. We intro-
duce one kind of solution in Chapter 9 (the Bonferroni procedure), when we consider
a related situation, one that comes up in studies comparing more than two groups.
However, the issue remains at the forefront of work on the development of statistical
methods. [Aron et al. (2005) used one of the more conservative methods in the study
that was the basis of the Chapter 7 example; so they were very confident of their
results—but, using that method, they might have missed finding even more differ-
ences.] In the neuroimaging research literature, this issue has been a particularly
lively topic of late (e.g., Nandy & Cordes, 2007; Nichols & Hayasaka, 2003).
The t Test for Independent Means
in Research Articles
A t test for independent means is usually described in a research article by giving the means (and sometimes the standard deviations) of the two samples, plus the usual way of reporting any kind of t test—for example, t(38) = 4.72, p < .01 (recall that the number in parentheses is the degrees of freedom). The result of the study of the health effects of expressive writing might be written up as follows: "The mean level
of self-reported health in the expressive writing group was 79.00 (SD = 9.72), and the mean for the control writing group was 68.00 (SD = 10.55); t(18) = 2.42, p < .05, two-tailed."

Here is another example. Dodge and Kaufman (2007) conducted a study of college students' use of and attitudes toward dietary supplements. Here is an excerpt from their results section: "Men were more likely than women to report they had used a dietary supplement to improve physical performance, t(61) = 4.03, p < .01, whereas women were more likely than men to report having used a dietary supplement to help with weight loss, t(61) = -2.74, p < .01" (p. 515).

Table 8–7 is an example in which the results of several t tests are given in a table. This table is taken from a study conducted by Gibbons and colleagues (2006). In that study, 152 college students in Guatemala were surveyed on their beliefs about machismo (a strong sense of masculinity), their attitudes toward women, and their beliefs about adoption. As shown in Table 8–7, the researchers used three t tests for independent means to examine whether female and male students differed on these beliefs and attitudes. The scales were scored so that higher scores were for more positive attitudes about machismo, more egalitarian (equal) gender beliefs (which were measured using the Attitudes Towards Women Scale for Adolescents, abbreviated as AWSA in Table 8–7), and more favorable beliefs about adoption. The first line of the table shows that men (with a mean score of 1.32) had more positive attitudes about machismo than women (mean score of 1.17). The t score for this comparison was 4.77 and it was statistically significant at p < .001. The results in Table 8–7 also show that women had more positive attitudes toward women than men did and that women had more favorable beliefs regarding adoption than men. (The number after each ± sign is the standard deviation for that particular group.)

Table 8–7 Mean and Standard Deviation of Scores for Women and Men on Measures of Machismo, Attitudes Toward Women, and Adoption Beliefs

            Women (n = 64)   Men (n = 88)    t      p
Machismo    1.17 ± .15       1.32 ± .20      4.77   < .001
AWSA        3.26 ± .31       2.98 ± .35      5.00   < .001
Adoption    3.10 ± .39       2.85 ± .41      3.07   < .01

Source: Gibbons, J. L., Wilson, S. L., & Rufener, C. A. (2006). Gender attitudes mediate gender differences in attitudes towards adoption in Guatemala. Sex Roles, 54, 139–145. Copyright © 2006. Reprinted by permission of Springer Science and Business Media.

Advanced Topic: Power for the t Test for Independent Means When Sample Sizes Are Not Equal

harmonic mean: special average influenced disproportionately by smaller numbers; in a t test for independent means when the number of scores in the two groups differ, the harmonic mean is used as the equivalent of each group's sample size when determining power. (The harmonic mean is two times the first sample size times the second sample size, all divided by the sum of the two sample sizes.)

For a study with any given total number of participants, power is greatest when the participants are divided into two equal groups. Recall the example from earlier in this chapter where the 42 participants were divided into 11 in the experimental group and 31 in the control group. This study has much less power than it would have if the researchers had been able to divide their 42 participants into 21 in each group.

There is a practical problem in figuring power from tables when sample sizes are not equal. (Power software packages and Internet power calculators require you
to specify the sample sizes, which are then taken into account when they figure
power.) Like most power tables, Table 8–4 assumes equal numbers in each of the
two groups. What do you do when your two samples have different numbers of peo-
ple in them? It turns out that in terms of power, the harmonic mean of the numbers
of participants in two unequal sample sizes gives the equivalent sample size for what
you would have with two equal samples. There are several accounts as to the origin
of the harmonic mean, but it seems most likely that it originated from ancient Greek
times (around 350 BCE) in the context of music and harmonious tones. The harmonic
mean sample size is given by this formula:

Harmonic mean = (2)(N1)(N2)/(N1 + N2)    (8–10)

In our example with 11 in one group and 31 in the other, the harmonic mean is 16.24:

Harmonic mean = (2)(N1)(N2)/(N1 + N2) = (2)(11)(31)/(11 + 31) = 682/42 = 16.24

Thus, even though you have a total of 42 participants, the study has the power of a study with equal sample sizes of only about 16 in each group. (This means that a study with a total of 32 participants divided equally would have had about the same power.)
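Formula (8–10) is a one-liner in code; the function name below is mine, and the power value itself still comes from looking up the equivalent sample size in Table 8–4:

```python
# Formula (8-10): the harmonic mean of two unequal sample sizes gives the
# equivalent per-group sample size for looking up power in Table 8-4.

def harmonic_mean_n(n1, n2):
    return (2 * n1 * n2) / (n1 + n2)

print(f"{harmonic_mean_n(11, 31):.2f}")  # 16.24, the example above
print(f"{harmonic_mean_n(21, 21):.2f}")  # 21.00: with equal groups it is just n
```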
How are you doing?
1. What is the approximate power of a study using a t test for independent
means, with a two-tailed test at the .05 significance level, in which the re-
searchers predict a large effect size, and there are 6 participants in one group
and 34 participants in the other group?
Answer
1. Harmonic mean = (2)(6)(34)/(6 + 34) = 408/40 = 10.20. Power for a study like this with 10 in each group is .39 (see Table 8–4).
Summary

1. A t test for independent means is used for hypothesis testing with scores from two entirely separate groups of people. The comparison distribution for a t test for independent means is a distribution of differences between means of samples. This distribution can be thought of as being built up in two steps: each population of individuals produces a distribution of means, and then a new distribution is created of differences between pairs of means selected from these two distributions of means.
2. The distribution of differences between means has a mean of 0 and is a t distribution with the total of the degrees of freedom from the two samples. Its standard deviation is figured in several steps:
●A Figure the estimated population variances based on each sample.
●B Figure the pooled estimate of the population variance.
●C Figure the variance of each distribution of means.
●D Figure the variance of the distribution of differences between means.
●E Figure the standard deviation of the distribution of differences between means.
3. The assumptions of the t test for independent means are that the two populations
are normally distributed and have the same variance. However, the t test gives
fairly accurate results when the true situation is moderately different from the
assumptions.
4. Effect size for a t test for independent means is the difference between the
means divided by the population standard deviation. Power for a t test for inde-
pendent means can be determined using a table (see Table 8–4), a power soft-
ware package, or an Internet power calculator.
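The effect-size arithmetic can be sketched in a line of Python (our own illustration; the book itself works these by hand or with a table). For the estimated effect size, the pooled variance estimate stands in for the population variance:

```python
import math

def estimated_effect_size(m1, m2, s2_pooled):
    # Difference between the means divided by the (pooled estimate
    # of the) population standard deviation.
    return (m1 - m2) / math.sqrt(s2_pooled)
```

Using the figures from the Example Worked-Out Problem later in this chapter, estimated_effect_size(6, 3, 4.17) gives about 1.47, a large effect by the conventions the text uses (large = .80).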
5. When you carry out many significance tests in the same study, such as a series of
t tests comparing two groups on various measures, the possibility that any one of
the comparisons may turn out significant at the .05 level by chance is greater than
.05. There is controversy about how to adjust for this problem, though most agree
that results should be interpreted cautiously in a situation of this kind.
6. t tests for independent means are usually reported in research articles with the
means of the two groups plus the degrees of freedom, t score, and significance
level. Results may also be reported in a table in which each significant differ-
ence may be shown by asterisks.
7. ADVANCED TOPIC: Power is greatest when the sample sizes of the two
groups are equal. When they are not equal, you use the harmonic mean of the
two sample sizes when looking up power in a table.
Key Terms
t test for independent means (p. 270)
distribution of differences between means (p. 271)
pooled estimate of the population variance (S²Pooled) (p. 273)
weighted average (p. 273)
variance of the distribution of differences between means (S²Difference) (p. 275)
standard deviation of the distribution of differences between means (SDifference) (p. 275)
harmonic mean (p. 294)
Example Worked-Out Problems
Figuring the Standard Deviation of the Distribution
of Differences Between Means
Figure SDifference for the following study: N1 = 40, S²1 = 15; N2 = 60, S²2 = 12.
Answer
●A Figure the estimated population variances based on each sample:
S²1 = 15; S²2 = 12.
●B Figure the pooled estimate of the population variance:
df1 = N1 - 1 = 40 - 1 = 39; df2 = N2 - 1 = 60 - 1 = 59
dfTotal = df1 + df2 = 39 + 59 = 98
S²Pooled = (df1/dfTotal)(S²1) + (df2/dfTotal)(S²2) = (39/98)(15) + (59/98)(12) = 13.19
●C Figure the variance of each distribution of means:
S²M1 = S²Pooled/N1 = 13.19/40 = .33
S²M2 = S²Pooled/N2 = 13.19/60 = .22
●D Figure the variance of the distribution of differences between means:
S²Difference = S²M1 + S²M2 = .33 + .22 = .55
●E Figure the standard deviation of the distribution of differences between means:
SDifference = √S²Difference = √.55 = .74
Hypothesis Testing Using the t Test
for Independent Means
A researcher randomly assigns seven individuals to receive a new experimental pro-
cedure and seven to a control condition. At the end of the study, all 14 are measured.
Scores for those in the experimental group were 6, 4, 9, 7, 7, 3, and 6. Scores for
those in the control group were 6, 1, 5, 3, 1, 1, and 4. Carry out a t test for indepen-
dent means using the .05 level of significance, two-tailed. Use the five steps of hy-
pothesis testing and sketch the distributions involved.
Answer
The figuring is shown in Table 8–8; the distributions are shown in Figure 8–5. Here
are the steps of hypothesis testing.
Table 8–8 Example Worked-Out Problem for Hypothesis Testing Using the t Test for Independent Means

        Experimental Group                          Control Group
        Deviation   Squared                         Deviation   Squared
Score   From Mean   Deviation From Mean     Score   From Mean   Deviation From Mean
  6         0           0                     6         3           9
  4        -2           4                     1        -2           4
  9         3           9                     5         2           4
  7         1           1                     3         0           0
  7         1           1                     1        -2           4
  3        -3           9                     1        -2           4
  6         0           0                     4         1           1
Σ: 42       0          24                    21         0          26

M1 = 6; S²1 = 24/6 = 4.00; M2 = 3; S²2 = 26/6 = 4.33
N1 = 7; df1 = N1 - 1 = 6; N2 = 7; df2 = N2 - 1 = 6
dfTotal = df1 + df2 = 6 + 6 = 12
S²Pooled = (df1/dfTotal)(S²1) + (df2/dfTotal)(S²2) = (6/12)(4) + (6/12)(4.33) = 2.00 + 2.17 = 4.17
S²M1 = S²Pooled/N1 = 4.17/7 = .60
S²M2 = S²Pooled/N2 = 4.17/7 = .60
S²Difference = S²M1 + S²M2 = .60 + .60 = 1.20
SDifference = √S²Difference = √1.20 = 1.10
Needed t with df = 12, 5% level, two-tailed = ±2.179
t = (M1 - M2)/SDifference = (6.00 - 3.00)/1.10 = 3.00/1.10 = 2.73
Decision: Reject the null hypothesis; the research hypothesis is supported.
❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are two populations:
Population 1: People like those who receive the experimental procedure.
Population 2: People like those who receive the control procedure.
The research hypothesis is that the means of the two populations are different:
μ1 ≠ μ2. The null hypothesis is that the means of the two populations are the
same: μ1 = μ2.
❷ Determine the characteristics of the comparison distribution.
(a) The distribution of differences between means has a mean of 0. (b) Regarding
its standard deviation,
●A Figure the estimated population variances based on each sample: S²1 = 4.00; S²2 = 4.33.
●B Figure the pooled estimate of the population variance: S²Pooled = 4.17.
●C Figure the variance of each distribution of means: S²M1 = .60; S²M2 = .60.
●D Figure the variance of the distribution of differences between means: S²Difference = 1.20.
Figure 8–5 Distributions for the Example Worked-Out Problem for hypothesis testing
using the t test for independent means.
●E Figure the standard deviation of the distribution of differences between
means: SDifference = √1.20 = 1.10.
(c) The shape of the comparison distribution is a t distribution with dfTotal = 12.
❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected. With dfTotal = 12, .05 significance
level, two-tailed test, the cutoffs are +2.179 and -2.179.
❹ Determine the sample's score on the comparison distribution. t = (6.00 - 3.00)/1.10 = 2.73.
❺ Decide whether to reject the null hypothesis. The t of 2.73 is more extreme
than the cutoffs of ±2.179. Thus, you can reject the null hypothesis. The research
hypothesis is supported.
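The whole example can be re-figured from the raw scores in a few lines of Python (our own sketch, not part of the text). Note that carrying full precision gives a t of about 2.75; the 2.73 above reflects the intermediate rounding to 4.17, 1.20, and 1.10:

```python
import math

experimental = [6, 4, 9, 7, 7, 3, 6]
control = [6, 1, 5, 3, 1, 1, 4]

def mean(xs):
    return sum(xs) / len(xs)

def est_var(xs):
    # S^2 = sum of squared deviations divided by N - 1.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

m1, m2 = mean(experimental), mean(control)            # 6.00, 3.00
s2_1, s2_2 = est_var(experimental), est_var(control)  # 4.00, 4.33
df1, df2 = len(experimental) - 1, len(control) - 1
df_total = df1 + df2                                  # 12
s2_pooled = (df1 / df_total) * s2_1 + (df2 / df_total) * s2_2
s_difference = math.sqrt(s2_pooled / len(experimental)
                         + s2_pooled / len(control))
t = (m1 - m2) / s_difference                          # about 2.75 unrounded
```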
Advanced Topic: Finding Power When
Sample Sizes Are Unequal
A planned study with a predicted small effect size has 22 in one group and 51 in the
other. What is the approximate power for a one-tailed test at the .05 significance level?
Answer
Harmonic mean = (2)(N1)(N2)/(N1 + N2) = (2)(22)(51)/(22 + 51) = 2244/73 = 30.74
From Table 8–4, for a one-tailed test with 30 participants in each group, power for a
small effect size is .19.
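The harmonic mean formula is simple enough to sketch in one line of Python (our illustration):

```python
def harmonic_mean(n1, n2):
    # Harmonic mean of two sample sizes, used to look up power in
    # Table 8-4 when the two groups are of unequal size.
    return (2 * n1 * n2) / (n1 + n2)
```

Here harmonic_mean(22, 51) gives 30.74, and harmonic_mean(6, 34) gives the 10.20 used in the practice problem answer at the start of this section. Notice that the harmonic mean is always pulled toward the smaller group, which is why unequal groups cost power.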
Outline for Writing Essays for a t Test for Independent Means
1. Describe the core logic of hypothesis testing in this situation. Be sure to mention
that the t test for independent means is used for hypothesis testing when you have
scores from two entirely separate groups of people. Be sure to explain the mean-
ing of the research hypothesis and the null hypothesis in this situation.
2. Explain the logic of the comparison distribution that is used with a t test for
independent means (the distribution of differences between means). Be sure to
explain why you use 0 as its mean.
3. Outline the logic of estimating the population variance and the variance of
the two distributions of means. Describe how to figure the standard deviation
of the distribution of differences between means.
4. Explain why the shape of the comparison distribution that is used with a t test
for independent means is a t distribution (as opposed to the normal curve).
5. Describe the logic and process for determining the cutoff sample score(s) on the
comparison distribution at which the null hypothesis should be rejected.
6. Describe why and how you figure the t score of the sample mean on the compar-
ison distribution.
7. Explain how and why the scores from Steps ❸ and ❹ of the hypothesis-testing
process are compared. Explain the meaning of the result of this comparison with
regard to the specific research and null hypotheses being tested.
Practice Problems
These problems involve figuring. Most real-life statistics problems are done on a
computer with special statistical software. Even if you have such software, do these
problems by hand to ingrain the method in your mind. To learn how to use a computer
to solve statistics problems like those in this chapter, refer to the Using SPSS section
at the end of this chapter and the Study Guide and Computer Workbook that
accompanies this text.
All data are fictional unless an actual citation is given.
Set I (for Answers to Set I Problems, see pp. 683–685)
1. For each of the following studies, say whether you would use a t test for depen-
dent means or a t test for independent means.
(a) A researcher randomly assigns a group of 25 unemployed workers to receive
a new job skills program and 24 other workers to receive the standard job
skills program, and then measures how well they all do on a job skills test.
(b) A researcher measures self-esteem in 21 students before and after taking a
difficult exam.
(c) A researcher tests reaction time of each member of a group of 14 individuals
twice, once while in a very hot room and once while in a normal-temperature
room.
2. Figure SDifference for each of the following studies:

      N1    S²1    N2    S²2
(a)   20     1     20     2
(b)   20     1     40     2
(c)   40     1     20     2
(d)   40     1     40     2
(e)   40     1     40     4
3. For each of the following experiments, decide whether the difference between
conditions is statistically significant at the .05 level (two-tailed).
      Experimental Group          Control Group
      N     M      S²             N     M      S²
(a)   30   12.0    2.4            30   11.1    2.8
(b)   20   12.0    2.4            40   11.1    2.8
(c)   30   12.0    2.2            30   11.1    3.0
4. A social psychologist studying mass communication randomly assigned 82
volunteers to one of two experimental groups. Sixty-one were instructed to get
their news for a month only from television, and 21 were instructed to get their
news for a month only from the Internet. (Why the researcher didn’t assign
equal numbers to the two conditions is a mystery!) After the month was up, all
participants were tested on their knowledge of several political issues. The
researcher did not have a prediction as to which news source would make
people more knowledgeable. That is, the researcher simply predicted that there is
some kind of difference. These were the results of the study. TV group:
M = 24, S² = 4; Internet group: M = 26, S² = 6. Using the .01 level, what
should the social psychologist conclude? (a) Use the steps of hypothesis testing,
(b) sketch the distributions involved, and (c) explain your answers to
someone who is familiar with the t test for a single sample, but not with the
t test for independent means.
5. An educational psychologist was interested in whether using a student's own
name in a story affected children's attention span while reading. Six children
were randomly assigned to read a story under ordinary conditions (using names
like Dick and Jane). Five other children read versions of the same story, but with
each child's own name substituted for one of the children in the story. The
researcher kept a careful measure of how long it took each child to read the story.
The results are shown in the following table. Using the .05 level, does including
the child's name make any difference? (a) Use the steps of hypothesis testing,
(b) sketch the distributions involved, and (c) explain your answers to someone
who has never had a course in statistics.

      Ordinary Story              Own-Name Story
      Student   Reading Time      Student   Reading Time
      A              2            G              4
      B              5            H             16
      C              7            I             11
      D              9            J              9
      E              6            K              8
      F              7

6. A developmental psychologist compares 4-year-olds and 8-year-olds on their
ability to understand the analogies used in stories. The scores for the five 4-year-
olds tested were 7, 6, 2, 3, and 8. The scores for the three 8-year-olds tested were
9, 2, and 5. Using the .05 level, do older children do better? (a) Use the steps of
hypothesis testing, (b) sketch the distributions involved, and (c) explain your
answers to someone who understands the t test for a single sample but does not
know anything about the t test for independent means.
7. Figure the estimated effect size for problems (a) 4, (b) 5, and (c) 6. (d) Explain
what you have done in part (a) to someone who understands the t test for
independent means but knows nothing about effect size.
8. Figure the approximate power of a t test for independent means for each of the
following planned studies:

      Number of People    One- or
      in Each Group       Two-Tailed    Effect Size
(a)        30                 1         Small (.20)
(b)       100                 2         Large (.80)
(c)        40                 1         Medium (.50)
(d)        40                 1         Large (.80)

9. ADVANCED TOPIC: Figure the approximate power of each of the following
planned studies, all using a t test for independent means at the .05 significance
level, one-tailed, with a predicted small effect size:

      N1    N2
(a)    3    57
(b)   10    50
(c)   20    40
(d)   30    30

10. What are the approximate numbers of participants needed for each of the
following planned studies to have 80% power, assuming equal numbers in the
two groups and all using the .05 significance level? (Be sure to give the total
number of participants needed, not just the number needed for each group.)

      Expected μ1    Expected μ2     σ      Tails
(a)      107.0          149.0       84.0      1
(b)       22.5           16.2       31.5      2
(c)       14.0           12.0        2.5      1
(d)      480.0          520.0       50.0      2

11. Van Aken and Asendorpf (1997) studied 139 German 12-year-olds. All of the
children completed a general self-worth questionnaire and were interviewed
about the supportiveness they experienced from their mothers, fathers, and class-
mates. The researchers then compared the self-worth of those with high and low
levels of support of each type. The researchers reported that "lower general self-
worth was found for children with a low-supportive mother (t(137) = 4.52,
p < .001, d = 0.78) and with a low-supportive father (t(137) = 4.03, p < .001,
d = 0.69) . . . . A lower general self-worth was also found for children with only
low supportive classmates (t(137) = 2.04, p < .05, d = 0.35)." (a) Explain
what these results mean to a person who has never had a course in statistics.
(b) Include a discussion of effect size and power. (When figuring power, you can
assume that the two groups in each comparison had about equal sample sizes.)
12. Gallagher-Thompson and her colleagues (2001) compared 27 wives who were
caring for their husbands who had Alzheimer's disease to 27 wives in which nei-
ther partner had Alzheimer's. The two groups of wives were otherwise similar
in terms of age, number of years married, and social economic status. Table 8–9
(reproduced from their Table 1) shows some of their results. Focusing on the
Table 8–9 Comparison of Caregiving and Noncaregiving Wives on Select Psychosocial Variables

                                Caregiving Wives (n = 27)      Noncaregiving Wives (n = 27)
                                M      SD     Range            M      SD     Range           t       p
Geriatric Depression Scale a    9.42   6.59   1–25             2.37   2.54   0–8             5.14    .0001
Perceived Stress Scale b        22.29  8.34   6–36             15.33  6.36   7–30            3.44    .001
Hope questionnaire c
  Agency                        11.88  1.63   9–16             13.23  1.39   10–16           3.20    .002
  Resilience                    11.89  0.91   10–14            13.08  1.60   10–16           3.31    .002
  Total                         23.77  2.03   21–29            26.31  2.56   22–31           3.97    .0001
Mutuality Scale d
  Closeness                     3.51   .81    .33–4            3.70   .41    2.67–4          -1.02   .315
  Reciprocity                   2.25   1.19   .17–4            3.25   .55    1.67–4          -3.68   .001
  Shared pleasures              2.65   1.00   0–4              3.52   .61    1.75–4          -3.66   .001
  Shared values                 3.15   .89    0–4              3.46   .45    2.4–4           -1.51   .138
Note: For all measures, higher scores indicate more of the construct being measured.
a Maximum score is 30.
b Maximum score is 56.
c Four questions in each subscale, with a maximum total score of 32.
d Maximum mean for each subscale is 4.
Source: Gallagher-Thompson, D., Dal Canto, P. G., Jacob, T., & Thompson, L. W. (2001). A comparison of marital interaction patterns between couples in which the husband does or does not have Alzheimer's disease. The Journals of Gerontology Series B: Psychological Sciences and Social Sciences, 56, P140–P150. Copyright © 2001 by the Gerontological Society of America. Reprinted by permission of the publishers.
Geriatric Depression Scale (the first row of the table) and the Mutuality Scale
for Shared Values (the last row in the table), explain these results to a person
who knows about the t test for a single sample but is unfamiliar with the t test
for independent means.
Set II
13. Make up two examples of studies (not in the book or from your lectures) that
would be tested with a t test for independent means.
14. For each of the following studies, say whether you would use a t test for depen-
dent means or a t test for independent means.
(a) A researcher measures the heights of 40 university students who are the
firstborn in their families and compares the 15 who come from large fami-
lies to the 25 who come from smaller families.
(b) A researcher tests performance on a math skills test of each of 250 individuals
before and after they complete a one-day seminar on managing test anxiety.
(c) A researcher compares the resting heart rate of 15 individuals who have
been taking a particular drug to the resting heart rate of 48 other individuals
who have not been taking the drug.
15. Figure SDifference for each of the following studies:

      N1    S²1    N2    S²2
(a)   30     5     20     4
(b)   30     5     30     4
(c)   30     5     50     4
(d)   20     5     30     4
(e)   30     5     20     2
16. For each of the following experiments, decide whether the difference between
conditions is statistically significant at the .05 level (two-tailed).
      Experimental Group          Control Group
      N     M      S²             N     M      S²
(a)   10   604     60             10   607     50
(b)   40   604     60             40   607     50
(c)   10   604     20             40   607     16
17. A psychologist theorized that people can hear better when they have just eaten a
large meal. Six individuals were randomly assigned to eat either a large meal or
a small meal. After eating the meal, their hearing was tested. The hearing ability
scores (high numbers indicate greater ability) are given in the following table.
Using the .05 level, do the results support the psychologist’s theory? (a) Use the
steps of hypothesis testing, (b) sketch the distributions involved, and (c) explain
your answers to someone who has never had a course in statistics.
Big Meal Group Small Meal Group
Subject Hearing Subject Hearing
A 22 D 19
B 25 E 23
C 25 F 21
18. Twenty students randomly assigned to an experimental group receive an
instructional program; 30 in a control group do not. After 6 months, both groups
are tested on their knowledge. The experimental group has a mean of 38 on the
test (with an estimated population standard deviation of 3); the control group
has a mean of 35 (with an estimated population standard deviation of 5). Using
the .05 level, what should the experimenter conclude? (a) Use the steps of
hypothesis testing, (b) sketch the distributions involved, and (c) explain your
answer to someone who is familiar with the t test for a single sample but not
with the t test for independent means.
19. A study of the effects of color on easing anxiety compared anxiety test scores of
participants who completed the test printed on either soft yellow paper or on
harsh green paper. The scores for five participants who completed the test printed
on the yellow paper were 17, 19, 28, 21, and 18. The scores for four participants
who completed the test on the green paper were 20, 26, 17, and 24. Using the
.05 level, one-tailed (predicting lower anxiety scores for the yellow paper), what
should the researcher conclude? (a) Use the steps of hypothesis testing,
(b) sketch the distributions involved, and (c) explain your answers to someone
who is familiar with the t test for a single sample but not with the t test for
independent means.
20. Figure the estimated effect size for problems (a) 16, (b) 17, and (c) 18.
(d) Explain your answer to part (a) to a person who understands the t test for
independent means but is unfamiliar with effect size.
21. Figure the approximate power of a t test for independent means for each of
the following planned studies:
      Number of People    One- or
      in Each Group       Two-Tailed    Effect Size
(a)        60                 1         Small (.20)
(b)        60                 2         Large (.80)
(c)        10                 2         Medium (.50)
(d)       100                 2         Medium (.50)
22. ADVANCED TOPIC: What is the approximate power of each of the following
planned studies, all using a t test for independent means at the .05 significance
level, two-tailed, with a predicted medium effect size?
      N1    N2
(a)   90    10
(b)   50    50
(c)    6    34
(d)   20    20
23. What are the approximate numbers of participants needed for each of the fol-
lowing planned studies to have 80% power, assuming equal numbers in the two
groups and all using the .05 significance level? (Be sure to give the total number
of participants needed, not just the number needed for each group.)

      Expected μ1    Expected μ2     σ     Tails
(a)       10             15          25      1
(b)       10             30          25      1
(c)       10             30          40      1
(d)       10             15          25      2

24. Escudero and colleagues (1997) videotaped 30 couples discussing a marital
problem in their laboratory. The videotapes were later systematically rated for
various aspects of the couple's communication, such as domineeringness and
the positive or negative quality of affect (emotion) expressed between them. A
major interest of their study was to compare couples who were having relation-
ship problems with those who were not. The 18 couples in the group having
problems were recruited from those who had gone to a marital clinic for help;
they were called the Clinic group. The 12 couples in the group not having prob-
lems were recruited through advertisements and were called the Nonclinic
group. (The two groups in fact had dramatically different scores on a standard
test of marital satisfaction.) Table 8–10 presents some of their results. (You can
ignore the arrows and plus and minus signs, which have to do with how they
rated the interactions. Also, ignore the note at the bottom about "arcsine trans-
formation"; we will explain this in Chapter 14.) (a) Focusing on Domineering-
ness and Submissiveness, explain these results to a person who has never had a
course in statistics. (b) ADVANCED TOPIC: Include a discussion of effect size
and power.

Table 8–10 Base-Rate Differences between Clinic and Nonclinic Couples on Relational
Control and Nonverbal Affect Codes Expressed in Proportions (SDs in Parentheses)

                            Couple Status                        Between-Group
                            Clinic Mean      Nonclinic Mean      Differences t
Domineeringness (↑)         .452 (.107)      .307 (.152)         3.06*
Levelingness (→)            .305 (.061)      .438 (.065)         5.77**
Submissiveness (↓)          .183 (.097)      .226 (.111)         1.12
Double-codes (↑↓)           .050 (.028)      .024 (.017)         2.92*
Positive affect (+)         .127 (.090)      .280 (.173)         3.22*
Negative affect (-)         .509 (.192)      .127 (.133)         5.38**
Neutral affect (0)          .344 (.110)      .582 (.089)         6.44**
Double-codes (+/-)          .019 (.028)      .008 (.017)         2.96*
Note: Proportions of each control and affect code were converted using arcsine transformation for use in between-group comparisons. *p < .01, **p < .001 (d.f. = 28).
Source: Escudero, V., Rogers, L. E., & Gutierrez, E. (1997). Patterns of relational control and nonverbal affect in clinic and nonclinic couples. Journal of Social and Personal Relationships, 14, 5–29. Copyright © 1997 by Sage Publications, Ltd. Reprinted by permission of Sage Publications, Thousand Oaks, London, and New Delhi.
25. Jackson and colleagues (2001) gave a questionnaire about Internet usage to uni-
versity students. Table 8–11 (their Table 1) shows their results comparing men
and women. (a) Select one significant and one nonsignificant result and explain
these two results to a person who understands the t test for a single sample but
does not know anything about the t test for independent means. (b) ADVANCED
TOPIC: Include a discussion of effect size and power (note that the sample sizes
for the male and female groups are shown in the table footnote).

Table 8–11 Gender Differences in Internet Use and Potential Mediators

                                                  Males a         Females b       t-value    df    p-value
E-mail use                                        4.16 (0.66)     4.30 (0.57)      2.81      626   .005
Web use                                           3.57 (0.67)     3.30 (0.67)     -4.84      627   .000
Overall Internet use                              3.86 (0.58)     3.80 (0.53)     -1.44      627   .130
Computer anxiety                                  1.67 (0.56)     1.80 (0.57)      4.03      612   .000
Computer self-efficacy                            3.89 (0.52)     3.71 (0.62)     -3.49      608   .001
Loneliness                                        2.06 (0.64)     1.96 (0.64)     -1.88      607   .061
Depression                                        1.22 (0.32)     1.28 (0.34)      2.36      609   .019
E-mail privacy                                    4.04 (0.78)     4.10 (0.69)     -0.97      609   .516
E-mail trust                                      3.50 (0.77)     3.46 (0.75)     -0.65      610   .516
Web privacy                                       4.06 (0.74)     4.09 (0.71)      0.62      623   .534
Web trust                                         3.14 (0.73)     3.12 (0.73)     -0.28      624   .780
Web search success                                4.05 (0.85)     4.13 (0.81)      1.12      568   .262
Importance of computer skills                     2.54 (1.03)     2.31 (0.90)     -2.57      477   .011
Computers cause health problems                   2.67 (1.00)     3.00 (1.08)      3.36      476   .001
Gender stereotypes about computer skills          3.45 (1.15)     4.33 (0.96)     -8.95      476   .000
Racial/ethnic stereotypes about computer skills   3.63 (1.17)     3.99 (1.07)      3.40      477   .001
Computers are taking over                         3.08 (1.19)     2.87 (1.08)     -1.89      476   .059
Note: For the attitude items, 1 = strongly agree, 2 = strongly disagree. For gender, 1 = male, 2 = female. Numbers in parentheses are standard deviations.
a n = 227.
b n = 403.
Source: Jackson, L. A., Ervin, K. S., Gardner, P. D., & Schmitt, N. (2001). Gender and the Internet: Women communicating and men searching. Sex Roles, 44, 363–379. Copyright © 2001. Reprinted by permission of Springer Science and Business Media.
Using SPSS
The U in the following steps indicates a mouse click. (We used SPSS version 15.0
for Windows to carry out these analyses. The steps and output may be slightly differ-
ent for other versions of SPSS.)
t Test for Independent Means
It is easier to learn these steps using actual numbers, so we will use the expressive
writing example from earlier in the chapter. The scores for that example are shown in
Table 8–1 on page 279.
Figure 8–6 SPSS data editor window for the expressive writing example (in which
20 students were randomly assigned to be in an expressive writing group or a control writing
group).
(either the expressive writing group or the control writing group). Thus, to tell
SPSS which person is in each group, you should enter the numbers as shown in
Figure 8–6. In the first column (labeled “group”), we used the number “1” to
indicate that a person is in the expressive writing group and the number “2” to
indicate that a person is in the control writing group. Each person’s score on the
health measure is listed in the second column (labeled “health”). (For the t test
for dependent means in the previous chapter, you set up the SPSS data with a
before-scores column and an after-scores column so that both scores for a par-
ticular person were on the same line. In this example, you have only one score
per person; so you have one column of scores and another column to show
which experimental group each person is in; that is, you have a score column
and a group column.)
❷ U Analyze.
❸ U Compare means.
❹ U Independent-Samples T Test (this is the name SPSS uses for a t test for inde-
pendent means).
➎ U on the variable called “health” and then U the arrow next to the box labeled
“Test Variable(s).” This tells SPSS that the t test should be carried out on the
scores for the “health” variable.
❻ U the variable called “group” and then U the arrow next to the the box labeled
“Grouping Variable.” This tells SPSS that the variable called “group” shows
which person is in which group. U Define Groups. You now tell SPSS the val-
ues you used to label each group. Put 1 in the Group 1 box and put 2 in the
Group 2 box. Your screen should now look like Figure 8–7. U Continue.
❼ U OK. Your SPSS output window should look like Figure 8–8.
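The long-format layout SPSS expects (one row per person: a group column plus a score column) can be pictured with a small sketch. Since Table 8–1 itself is not reproduced here, we borrow the scores from the Example Worked-Out Problem; the layout, not the data, is the point:

```python
# One row per person, exactly as in the SPSS data editor:
# group 1 = first condition, group 2 = second condition.
group1_scores = [6, 4, 9, 7, 7, 3, 6]
group2_scores = [6, 1, 5, 3, 1, 1, 4]

rows = ([(1, score) for score in group1_scores]
        + [(2, score) for score in group2_scores])
# Each tuple is (group, score), i.e., one line of the data editor.
```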
The first table in the SPSS output provides information about the two variables.
The first column gives the levels of the grouping variable (1 and 2, which indicate
the expressive writing group and the control writing group, respectively). The sec-
ond, third, and fourth columns give, respectively, the number of individuals (N),
mean (M), and estimated population standard deviation (S) for each group. The fifth
column, labeled "Std. error mean," is the standard deviation of the distribution of
means, SM, for each group. Note that these values for the standard error of the mean
are based on each population variance estimate and not on the pooled estimate; so
they are not quite the same for each group as the square root of each S²M figured in
the text. (See Table 8–1 for the figuring for this example.)
The second table in the SPSS output shows the actual results of the t test for
independent means. Before the t test results, SPSS shows the results of "Levene's
Figure 8–7 SPSS independent means t test window for the expressive writing
example.
Figure 8–8 SPSS output window for a t test for independent means for the expressive
writing example.
Test for Equality of Variances,” which is a test of whether the variances of the two
populations are the same. This test is important mainly as a check on whether you
have met the assumption of equal population variances (called “homogeneity of
variance”). If this test is significant (that is, the value in the “Sig.” column is less
than .05), this assumption is brought into question. However, in this example, the
result is clearly not significant (.766 is well above .05), so we have no reason to
doubt the assumption of equal population variances. Thus, we can feel more confi-
dent that whatever conclusion we draw from the t test will be accurate.
The t test results begin with the column labeled “t.” Note that there are two rows
of t test results. The first row (a t of 2.425, df of 18, and so on), labeled “Equal
variances assumed” (on the left hand side of the table), shows the t test results
assuming the population variances are equal. The second row (a t of 2.425, df of
17.880, and so on), labeled “Equal variances not assumed,” shows the t test results if
we do not assume that the population variances are equal. In the present example
(as in most real-life cases), the Levene test was not significant; so we use the t test re-
sults assuming equal population variances. Notice that the values for “t” (the sample’s
t score), “df ” (degrees of freedom), and “Std. Error Difference” (the standard devia-
tion of the distribution of differences between means, SDifference) in Figure 8–8 are
the same (within rounding error) as their respective values we figured by hand in
Table 8–1. The column labeled “Sig. (2-tailed)” shows the exact significance level of
the sample’s t score. The significance level of .026 is less than our .05 cutoff for this
example, which means that you can reject the null hypothesis and the research hy-
pothesis is supported. (You can ignore the final two columns of the table, listed under
the heading “95% Confidence Interval of the Difference.” These columns refer to the
raw scores corresponding to the t scores at the bottom 2.5% and the top 2.5% of the
t distribution; see Chapter 5 for a discussion of confidence intervals). Note that SPSS
does not know if you are doing a one-tailed or a two-tailed test. So it always gives re-
sults for a two-tailed test. If you are doing a one-tailed test, the true significance level
is exactly half of what is given by SPSS.
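The SPSS analysis described above can also be reproduced in open-source software. Below is a sketch using SciPy; the two score lists are illustrative placeholders, not the expressive writing study's actual data, so the printed numbers will not match Table 8–1.

```python
# Sketch: reproducing SPSS's independent-means t test output with SciPy.
# The score lists are illustrative placeholders, NOT the textbook's data.
from scipy import stats

group1 = [20, 19, 23, 25, 18, 22, 24, 21, 20, 23]  # "expressive writing" (illustrative)
group2 = [18, 17, 16, 20, 15, 19, 21, 17, 18, 16]  # "control writing" (illustrative)

# Levene's Test for Equality of Variances (the first part of SPSS's table).
levene_stat, levene_p = stats.levene(group1, group2)

# "Equal variances assumed" row: Student's t with the pooled estimate.
t_equal, p_equal = stats.ttest_ind(group1, group2, equal_var=True)

# "Equal variances not assumed" row: Welch's t.
t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)

print(f"Levene p = {levene_p:.3f}")  # if > .05, no reason to doubt equal variances
print(f"t (equal variances) = {t_equal:.3f}, Sig. (2-tailed) = {p_equal:.3f}")
print(f"t (Welch)           = {t_welch:.3f}, Sig. (2-tailed) = {p_welch:.3f}")
# As the text notes for SPSS, these p values are two-tailed; for a one-tailed
# test in the predicted direction, halve the reported significance level.
```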
Chapter Notes
1. In a real research situation, the figuring for a t test for independent means is usu-
ally all done by computer (see this chapter’s Using SPSS section). However, if
you ever have to do a t test for independent means for an actual research study
by hand (or with just a hand calculator), you may find the following formula
useful:

t = (M₁ − M₂) / √{ [(N₁ − 1)(S²₁) + (N₂ − 1)(S²₂)] / (N₁ + N₂ − 2) × (1/N₁ + 1/N₂) }

2. Cohen (1988, pp. 28–39) provides more detailed tables in terms of number of
participants, levels of effect size, and significance levels. Note that Cohen de-
scribes the significance level by the letter a (for “alpha level”), with a subscript
of either 1 or 2, referring to a one-tailed or two-tailed test. For example, a table
that refers to “a₁ = .05” at the top means that this is the table for p < .05,
one-tailed.
3. Cohen (1988, pp. 54–55) provides fuller tables, indicating needed numbers of
participants for levels of power other than 80%; for effect sizes other than .20,
.50, and .80; and for other significance levels. If you just need a rough approxi-
mation, Dunlap and Myers (1997) have developed a shortcut for finding the
approximate number of participants needed for studies using the t test for inde-
pendent means. For 50% power, the number of participants needed per group is
approximately 8/d² + 1. For 80%–90% power, it is approximately 16/d² + 2.
The t score for a t test for in-
dependent means is the result
of subtracting Sample 2’s
mean from Sample 1’s mean
and dividing that difference
by the square root of the fol-
lowing: multiplying one less
than the number of scores in
Sample 1 by Population 1’s
estimated population variance
and adding this product to the
result of multiplying one less
than the number of scores in
Sample 2 by Population 2’s
estimated population vari-
ance, and then dividing this
summed result by two less
than the sum of the number
of scores in Sample 1 and the
number of scores in Sample 2,
and then taking the result of
this division and multiplying
it by the result of adding one
divided by the number of
scores in Sample 1 to one di-
vided by the number of
scores in Sample 2.
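The formula in note 1 (and its verbal translation in the margin) can be checked with a few lines of standard-library Python. The sample statistics below are illustrative inputs, not figures from the book.

```python
# Sketch of the t test for independent means formula from note 1,
# using only the standard library. The inputs are illustrative.
import math

def independent_t(m1, m2, s2_1, s2_2, n1, n2):
    """t = (M1 - M2) / sqrt(pooled variance estimate * (1/N1 + 1/N2))."""
    pooled = ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Example: means 6.0 vs. 4.0, variance estimates 4.0 each, N = 10 per group.
t = independent_t(6.0, 4.0, 4.0, 4.0, 10, 10)
print(round(t, 3))  # prints 2.236
```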
CHAPTER 9
Introduction to the Analysis of Variance

Chapter Outline
✪ Basic Logic of the Analysis of Variance
✪ Carrying Out an Analysis of Variance
✪ Hypothesis Testing with the Analysis of Variance
✪ Assumptions in the Analysis of Variance
✪ Planned Contrasts
✪ Post Hoc Comparisons
✪ Effect Size and Power for the Analysis of Variance
✪ Controversy: Omnibus Tests versus Planned Contrasts
✪ Analyses of Variance in Research Articles
✪ Advanced Topic: The Structural Model in the Analysis of Variance
✪ Summary
✪ Key Terms
✪ Example Worked-Out Problems
✪ Practice Problems
✪ Using SPSS
✪ Chapter Notes

T I P F O R S U C C E S S
This chapter assumes you understand the logic of hypothesis testing and the
t test (particularly estimated population variance and the distribution of
means). So be sure you understand the relevant material in Chapters 4, 5, 7,
and 8 before starting this chapter.

In Chapter 8, you learned about the t test for independent means, a procedure for
comparing two groups of scores from entirely separate groups of people (such as
an experimental group and a control group). In this chapter, you will learn about
a procedure for comparing more than two groups of scores, each of which is from an
entirely separate group of people.
We will begin with an example. Cindy Hazan and Philip Shaver (1987) arranged
to have the Rocky Mountain News, a large Denver area newspaper, print a mail-in
survey. The survey included the question shown in Table 9–1 to measure what is called
attachment style. (How would you answer this item?) Those who selected the first
choice are “secure”; those who selected the second, “avoidant”; and those who selected
analysis of variance (ANOVA)
hypothesis-testing procedure for studies
with three or more groups.
the third, “anxious-ambivalent.” These attachment styles are thought to be different
ways of behaving and thinking in close relationships that develop from a person’s ex-
perience with early caretakers (Mikulincer & Shaver, 2007). (Of course, this single
item is only a very rough measure that works for a large survey but is certainly not
definitive in any particular person.) Readers also answered questions about various as-
pects of love, including amount of jealousy. Hazan and Shaver then compared the
amount of jealousy reported by people with the three different attachment styles.
With a t test for independent means, Hazan and Shaver could have compared the
mean jealousy scores of any two of the attachment styles. Instead, they were interested
in differences among all three attachment styles. The statistical procedure for testing
variation among the means of more than two groups is called the analysis of variance,
abbreviated as ANOVA. (You could use the analysis of variance for a study with only
two groups, but the simpler t test gives the same result.)
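The parenthetical point, that an analysis of variance on two groups reaches the same conclusion as the t test, can be verified directly: for two groups, F equals t squared. Here is a standard-library sketch with illustrative data (the within- and between-groups estimates are figured the way the chapter later describes for equal group sizes).

```python
# Sketch: for two groups, one-way ANOVA's F ratio equals the squared t
# from the t test for independent means. The data are illustrative.
import math
import statistics

a = [5.0, 7.0, 6.0, 8.0, 7.0, 6.0]
b = [3.0, 4.0, 5.0, 4.0, 3.0, 5.0]
n = len(a)  # equal group sizes

# t test for independent means with the pooled variance estimate (Chapter 8).
pooled = ((n - 1) * statistics.variance(a)
          + (n - 1) * statistics.variance(b)) / (2 * n - 2)
t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (2 / n))

# One-way ANOVA on the same two groups (equal-n figuring).
within = (statistics.variance(a) + statistics.variance(b)) / 2
between = n * statistics.variance([statistics.mean(a), statistics.mean(b)])
f = between / within

print(abs(f - t ** 2) < 1e-9)  # prints True: F = t squared for two groups
```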
In this chapter, we introduce the analysis of variance, focusing on the situation
in which the different groups being compared each have the same number of scores.
In an Advanced Topic section later in the chapter, we describe a more flexible way
of thinking about analysis of variance that allows groups to have different numbers
of scores. In Chapter 10, we consider situations in which the different groups are ar-
rayed across more than one dimension. For example, in the same analysis we might
consider both gender and attachment style, making six groups in all (female secure,
male secure, female avoidant, etc.), arrayed across the two dimensions of gender and
attachment style. This situation is known as a factorial analysis of variance. To em-
phasize the difference from factorial analysis of variance, what you learn in this chap-
ter is often called a one-way analysis of variance. (If this is confusing, don’t worry.
We will go through it slowly and systematically in Chapter 10. We only mention this
now so that, if you hear these terms, you will not be surprised.)
Basic Logic of the Analysis of Variance
The null hypothesis in an analysis of variance is that the several populations being com-
pared all have the same mean. For example, in the attachment style example, the null
hypothesis is that the populations of secure, avoidant, and anxious-ambivalent peo-
ple all have the same degree of jealousy. The research hypothesis would be that the
degree of jealousy differs among these three populations.
Hypothesis testing in analysis of variance is about whether the means of the sam-
ples differ more than you would expect if the null hypothesis were true. This question
about means is answered, surprisingly, by analyzing variances (hence the name
Table 9–1 Question Used in Hazan and Shaver (1987) Newspaper Survey
Which of the following best describes your feelings? [Check one]
[ ] I find it relatively easy to get close to others and am comfortable depending on them and having them de-
pend on me. I don’t often worry about being abandoned or about someone getting too close to me.
[ ] I am somewhat uncomfortable being close to others; I find it difficult to trust them completely, difficult to
allow myself to depend on them. I am nervous when anyone gets too close, and often, love partners want
me to be more intimate than I feel comfortable being.
[ ] I find that others are reluctant to get as close as I would like. I often worry that my partner doesn’t really
love me or won’t want to stay with me. I want to merge completely with another person, and this desire
sometimes scares people away.
Source: Hazan and Shaver (1987, p. 515).
analysis of variance). Among other reasons, you focus on variances because, when you
want to know how several means differ, you are asking about the variation among
those means.
Thus, to understand the logic of analysis of variance, we consider variances. In
particular, we begin by discussing two different ways of estimating population vari-
ances. As you will see, the analysis of variance is about a comparison of the results
of these two different ways of estimating population variances.
Estimating Population Variance from Variation
Within Each Sample
With the analysis of variance, as with the t test, you do not know the true population
variances. However, as with the t test, you can estimate the variance of each of the pop-
ulations from the scores in the samples. Also, as with the t test, you assume in the
analysis of variance that all populations have the same variance. This allows you to
average the estimates from each sample into a single pooled estimate, called the
within-groups estimate of the population variance. It is an average of estimates
figured entirely from the scores within each of the samples.
One of the most important things to remember about this within-groups estimate
is that it is not affected by whether the null hypothesis is true. This estimate comes out
the same whether the means of the populations are all the same (the null hypothesis
is true) or the means of the populations are not all the same (the null hypothesis is
false). This estimate comes out the same because it focuses only on the variation
inside each population. Thus, it doesn’t matter how far apart the means of the differ-
ent populations are.
If the variation in scores within each sample is not affected by whether the null
hypothesis is true, what determines the level of within-group variation? The answer
is that chance factors (that is, factors that are unknown to the researcher) account for
why different people in a sample have different scores. These chance factors include
the fact that different people respond differently to the same situation or treatment
and that there may be some experimental error associated with the measurement of
the variable of interest. Thus, we can think of the within-groups population variance
estimate as an estimate based on chance (or unknown) factors that cause different
people in a study to have different scores.
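For equal group sizes, the within-groups estimate described above is simply the average of the separate sample estimates. A minimal standard-library sketch, with illustrative scores (each sample's S² uses N − 1 in the denominator, as an estimated population variance should):

```python
# Sketch: the within-groups population variance estimate, for equal group
# sizes, is the ordinary average of each sample's estimated population
# variance (S^2, figured with N - 1). The scores are illustrative.
import statistics

groups = [
    [6, 7, 5, 8, 9],
    [4, 5, 6, 5, 5],
    [8, 9, 7, 10, 8],
]

s2_per_group = [statistics.variance(g) for g in groups]  # each uses N - 1
within_groups_estimate = statistics.mean(s2_per_group)
print(round(within_groups_estimate, 2))  # prints 1.43
```

Note that this estimate uses only deviations of scores from their own group's mean, which is why it is unaffected by how far apart the group means happen to be.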
Estimating the Population Variance from Variation
Between the Means of the Samples
There is also a second way to estimate the population variance. Each sample’s mean
is a number in its own right. If there are several samples, there are several such num-
bers, and these numbers will have some variation among them. The variation among
these means gives another way to estimate the variance in the populations that the
samples come from. Just how this works is a bit tricky; so follow the next two sec-
tions closely.
When the Null Hypothesis Is True First, consider the situation in which the null
hypothesis is true. In this situation, all samples come from populations that have the
same mean. Remember, we are always assuming that all populations have the same
variance (and also that they are all normal curves). Thus, if the null hypothesis is
true, all populations are identical and thus they have the same mean, variance, and
shape.
within-groups estimate of the popu-
lation variance estimate of the vari-
ance of the population of individuals
based on the variation among the scores
in each of the actual groups studied.
However, even when the populations are identical (that is, even when the null hy-
pothesis is true), samples from the different populations will each be a little different.
How different can the sample means be? That depends on how much variation there is
in each population. If a population has very little variation in the scores in it, then the
means of samples from that population (or any identical population) will tend to be very
similar to each other. When the null hypothesis is true, the variability among the sam-
ple means is influenced by the same chance factors that influence the variability among
the scores within each sample.
What if several identical populations (with the same population mean) have a lot
of variation in the scores within each? In that situation, if you take one sample from
each population, the means of those samples could easily be very different from each
other. Being very different, the variance of these means will be large. The point is
that the more variance within each of several identical populations, the more vari-
ance there will be among the means of samples when you take a random sample from
each population.
Suppose you were studying samples of six children from each of three large play-
grounds (the populations in this example). If each playground had children who were all
either 7 or 8 years old, the means of your three samples would all be between 7 and 8.
Thus, there would not be much variance among those means. However, if each play-
ground had children ranging from 3 to 12 years old, the means of the three samples would
probably vary quite a bit. What this shows is that the variation among the means of sam-
ples is related directly to the amount of variation in each of the populations from which
the samples are taken. The more variation in each population, the more variation there
is among the means of samples taken from those populations.
This principle is shown in Figure 9–1. The three identical populations on the left
have small variances, and the three identical populations on the right have large vari-
ances. In each set of three identical populations, even though the means of the popula-
tions (shown by triangles) are exactly the same, the means of the samples from those
populations (shown by Xs) are not exactly the same. Most important, the sample means
from the populations that each have a small amount of variance are closer together
(have less variance among them). The sample means from the populations that each have
more variance are more spread out (have more variance among them).
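The playground example can be simulated directly. This is an illustrative standard-library sketch (the ages, group sizes, and number of simulated samples are stand-ins, not figures from the text): sample means drawn from low-variance populations cluster together, while those drawn from high-variance populations spread out.

```python
# Sketch: simulating the playground example. Sample means from populations
# with little variation vary less than sample means from populations with a
# lot of variation, even when all population means are equal (here, 7.5).
# All numbers are illustrative.
import random
import statistics

random.seed(0)

def spread_of_sample_means(pop_sd, n_samples=200, n_per_sample=6):
    """SD of the means of many samples from a normal population (mean 7.5)."""
    means = [
        statistics.mean(random.gauss(7.5, pop_sd) for _ in range(n_per_sample))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

narrow = spread_of_sample_means(pop_sd=0.5)  # ages clustered around 7-8
wide = spread_of_sample_means(pop_sd=2.5)    # ages ranging roughly 3-12
print(narrow < wide)  # prints True: more population variance, more variance among means
```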
We have now seen that the variation among the means of samples taken from
identical populations is related directly to the variation of the scores in each of those
populations. This has a very important and perhaps surprising implication: it should
be possible to estimate the variance in each population from the variation among the
means of our samples.
Such an estimate is called a between-groups estimate of the population vari-
ance. (It has this name because it is based on the variation between the means of the
samples, the “groups.” Grammatically, it ought to be among groups, but between
groups is traditional.) You will learn how to figure this estimate later in the chapter.
So far, all of this logic we have considered has assumed that the null hypothesis
is true, so that there is no variation among the means of the populations. In this situ-
ation, the between-groups estimate of the population variance (which reflects variabil-
ity in the means of the samples) is influenced by the chance factors that cause different
people in the same sample to have different scores. Let’s now consider what happens
when the null hypothesis is not true, when instead the research hypothesis is true.
When the Null Hypothesis Is Not True If the null hypothesis is not true (and
thus the research hypothesis is true), the populations themselves have different
means. In this situation, variation among the means of samples taken from these
between-groups estimate of the
population variance estimate of the
variance of the population of individuals
based on the variation among the means
of the groups studied.
populations is still caused by the chance factors that cause variation within the pop-
ulations. So the larger the variation within the populations, the larger the variation
will be among the means of samples taken from the populations. However, in this
situation, in which the research hypothesis is true, variation among the means of the
samples also is caused by variation among the population means. You can think of
Figure 9–1 Means of samples from identical populations will not be identical. (a) Sam-
ple means from populations with less variation will vary less. (b) Sample means from popula-
tions with more variation will vary more. (Population means are indicated by a triangle, sample
means by an X.)
Figure 9–2 Means of samples from populations whose means differ (b) will vary more
than sample means taken from populations whose means are the same (a). (Population means
are indicated by a triangle, sample means by an X.)
this variation among population means as resulting from a treatment effect—that is,
the different treatment received by the groups (as in an experiment) causes the
groups to have different means. So, when the research hypothesis is true, the means
of the samples are spread out for two different reasons: (1) because of variation in
each of the populations (due to chance factors) and (2) because of variation among
the population means (that is, a treatment effect). The left side of Figure 9–2 shows
populations with the same means (shown by triangles) and the means of samples
taken from them (shown by Xs). (This is the same situation as in both sides of
Figure 9–1.) The right side of Figure 9–2 shows three populations with different
means (shown by triangles) and the means of samples taken from them (shown
by Xs). (This is the situation we have just been discussing.) Notice that the means of
the samples are more spread out in the situation on the right side of Figure 9–2. This
is true even though the variations in the populations are the same for the situation on
both sides of Figure 9–2. This additional spread (variance) for the means on the right
side of Figure 9–2 is due to the populations having different means.
In summary, the between-groups estimate of the population variance is figured
based on the variation among the means of the samples. If the null hypothesis is true,
this estimate gives an accurate indication of the variation within the populations (that
is, the variation due to chance factors). But if the null hypothesis is false, this method
of estimating the population variance is influenced both by the variation within the pop-
ulations (the variation due to chance factors) and the variation among the population
means (the variation due to a treatment effect). It will not give an accurate estimate of
the variation within the populations because it also will be affected by the variation
among the populations. This difference between the two situations has important
T I P F O R S U C C E S S
You may want to read this para-
graph again to ensure that you fully
understand the logic we are
presenting.
implications. It is what makes the analysis of variance a method of testing hypothe-
ses about whether there is a difference among means of populations.
Comparing the Within-Groups and Between-Groups
Estimates of Population Variance
Table 9–2 summarizes what we have seen so far about the within-groups and between-
groups estimates of population variance, both when the null hypothesis is true and
when the research hypothesis is true. When the null hypothesis is true, the within-
groups and between-groups estimates are based on the same thing (that is, the chance
variation within populations). Literally, they are estimates of the same population
variance. Therefore, when the null hypothesis is true, both estimates should be about
the same. (Only about the same; these are estimates). Here is another way of describ-
ing this similarity of the between-groups estimate and the within-groups estimate
when the null hypothesis is true: In this situation, the ratio of the between-groups
estimate to the within-groups estimate should be approximately one to one. For ex-
ample, if the within-groups estimate is 107.5, the between-groups estimate should be
around 107.5, so that the ratio would be about 1. (A ratio is found by dividing one num-
ber by the other; thus 107.5/107.5 = 1.)
The situation is quite different when the null hypothesis is not true. As shown
in Table 9–2, when the research hypothesis is true, the between-groups estimate is
influenced by two sources of variation: (a) the variation of the scores in each pop-
ulation (due to chance factors) and (b) the variation of the means of the populations
from each other (due to a treatment effect). Yet even when the research hypothesis
is true, the within-groups estimate still is influenced only by the variation in the
populations. Therefore, when the research hypothesis is true, the between-groups
estimate should be larger than the within-groups estimate. In this situation, the ratio
of the between-groups estimate to the within-groups estimate should be greater
than 1. For example, the between-groups estimate might be 638.9 and the within-
groups estimate 107.5, making a ratio of 638.9 to 107.5, or 5.94. In this example the
between-groups estimate is nearly six times bigger (5.94 times to be exact) than the
within-groups estimate.
This is the central principle of the analysis of variance: When the null hypothe-
sis is true, the ratio of the between-groups population variance estimate to the within-
groups population variance estimate should be about 1. When the research hypothesis
is true, this ratio should be greater than 1. If you figure this ratio and it comes out much
T I P F O R S U C C E S S
Table 9–2 summarizes the logic of
the analysis of variance. Test your
understanding of this logic by try-
ing to explain Table 9–2, without
referring to the book. You might try
writing your answer down and
swapping it with someone else in
your class.
Table 9–2 Sources of Variation in Within-Groups and Between-Groups Variance Estimates
Variation Within
Populations (Due to
Chance Factors)
Variation
Between
Populations (Due to
a Treatment Effect)
Null hypothesis is true
Within-groups estimate reflects ✓
Between-groups estimate reflects ✓
Research hypothesis is true
Within-groups estimate reflects ✓
Between-groups estimate reflects ✓ ✓
greater than 1, you can reject the null hypothesis. That is, it is unlikely that the null
hypothesis could be true and the between-groups estimate be a lot bigger than the
within-groups estimate.
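This "about 1 versus greater than 1" principle can be watched in a small simulation. The sketch below is illustrative: the group sizes, effect sizes, and the equal-n figuring of the between-groups estimate (sample size times the variance of the sample means) are assumptions for demonstration, not the chapter's worked example.

```python
# Sketch: simulating the central principle of the analysis of variance.
# Under the null hypothesis, the between-groups estimate and the
# within-groups estimate track the same chance variation, so their ratio
# hovers near 1; a treatment effect inflates only the between-groups
# estimate. All numbers are illustrative.
import random
import statistics

random.seed(1)

def f_ratio(group_means, n=30, pop_sd=1.0):
    """One simulated study: three samples of n scores each."""
    samples = [[random.gauss(m, pop_sd) for _ in range(n)] for m in group_means]
    sample_means = [statistics.mean(s) for s in samples]
    between = n * statistics.variance(sample_means)      # equal-n figuring
    within = statistics.mean(statistics.variance(s) for s in samples)
    return between / within

# Average the ratio over many simulated studies.
null_ratio = statistics.mean(f_ratio([0, 0, 0]) for _ in range(300))
effect_ratio = statistics.mean(f_ratio([0, 0.5, 1.0]) for _ in range(300))

print(0.7 < null_ratio < 1.4)      # prints True: near 1 under the null
print(effect_ratio > null_ratio)   # prints True: a treatment effect inflates the ratio
```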
The F Ratio
This crucial ratio of the between-groups to the within-groups population variance
estimate is called an F ratio. (The F is for Sir Ronald Fisher, an eminent statistician
who developed the analysis of variance; see Box 9–1.)
The F Distribution and the F Table
We have said that if the crucial ratio of between-groups estimate to within-groups
estimate (the F ratio) is a lot larger than 1, you can reject the null hypothesis. The
next question is, “Just how much bigger than 1 should it be?”
F ratio ratio of the between-groups
population variance estimate to the
within-groups population variance
estimate.
BOX 9–1 Sir Ronald Fisher, Caustic Genius of Statistics
Ronald A. Fisher, a contemporary of William Gosset (see Chapter 7, Box 7–1)
and Karl Pearson (see Chapter 13, Box 13–1), was probably the brightest and
certainly the most productive of this close-knit group of British statisticians. In
the process of writing 300 papers and seven books, he developed many of the
modern field’s key concepts: variance, analysis of variance, significance levels,
the null hypothesis, and almost all of our basic ideas of research design,
including the fundamental importance of randomization.
A family legend is that little Ronald, born in 1890, was so fascinated by math
that one day, at age 3, when put into his highchair for breakfast, he asked his
nurse, “What is a half of a half?” Told it was a quarter, he asked, “What’s half
of a quarter?” To that answer he wanted to know what was half of an eighth.
At the next answer he purportedly thought a moment and said, “Then I suppose
that a half of a sixteenth must be a thirty-toof.” Ah, baby stories.
As a grown man, however, Fisher seems to have been anything but darling.
Some observers ascribe this to a cold and unemotional mother, but, whatever
the reason, throughout his life he was embroiled in bitter feuds, even with
scholars who had previously been his closest allies and who certainly ought to
have been comrades in research.
Fisher’s thin ration of compassion extended to his readers as well; not only
was his writing hopelessly obscure, but it often simply failed to supply important
assumptions and proofs. Gosset said that when Fisher began a sentence with
“Evidently,” it meant two hours of hard work before one could hope to see why
the point was evident.
Indeed, his lack of empathy extended to all of humankind. Like Galton,
Fisher was fond of eugenics, favoring anything that might increase the birthrate
of the upper and professional classes and skilled artisans. Not only did he see
contraception as a poor idea, fearing that the least desirable persons would use
it least, but he defended infanticide as serving an evolutionary function. It may
be just as well that his opportunities to experiment with breeding never
extended beyond the raising of his own children and some crops of potatoes
and wheat.
Although Fisher eventually became the Galton Professor of Eugenics at
University College, his most influential appointment probably came when he
was invited to Iowa State College in Ames for the summers of 1931 and 1936
(where he was said to be so put out with the terrible heat that he stored his
sheets in the refrigerator all day). At Ames, Fisher greatly impressed George
Snedecor, an American professor of mathematics also working on agricultural
problems. Consequently, Snedecor wrote a textbook of statistics for agriculture
that borrowed heavily from Fisher’s work. The book so popularized Fisher’s
ideas about statistics and research design that its second edition sold 100,000
copies.
You can learn more about Fisher at the following Web site:
http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Fisher.html.
Sources: Peters (1987); Salsburg (2001); Stigler (1986); Tankard (1984).
Courtesy of the Library of Congress
T I P F O R S U C C E S S
These “How Are You Doing” ques-
tions and answers provide a useful
summary of the logic of the analy-
sis of variance. Be sure to review
them (and the relevant sections in
the text) as many times as neces-
sary to fully understand this logic.
How are you doing?
1. When do you use an analysis of variance?
2. (a) What is the within-groups population variance estimate based on? (b) How
is it affected by the null hypothesis being true or not? (c) Why?
3. (a) What is the between-groups population variance estimate based on?
(b) How is it affected by the null hypothesis being true or not? (c) Why?
4. What are two sources of variation that can contribute to the between-groups
population variance estimate?
5. (a) What is the F ratio; (b) why is it usually about 1 when the null hypothesis is
true; and (c) why is it usually larger than 1 when the null hypothesis is false?
Statisticians have developed the mathematics of an F distribution and have pre-
pared tables of F ratios. For any given situation, you merely look up in an F table how
extreme an F ratio is needed to reject the null hypothesis at, say, the .05 level. (You
learn to use the F table later in the chapter.)
For an example of an F ratio, let’s return to the attachment style study. The
results of that study, for jealousy, were as follows: The between-groups population vari-
ance estimate was 23.27, and the within-groups population variance estimate was .53.
(You learn shortly how to figure these estimates on your own.) The ratio of the
between-groups to the within-groups variance estimates (23.27/.53) came out to 43.91;
that is, F = 43.91. This F ratio is considerably larger than 1. The F ratio needed to
reject the null hypothesis at the .05 level in this study is only 3.01. Thus, the re-
searchers confidently rejected the null hypothesis and concluded that the amount of
jealousy is not the same for the three attachment styles. (Mean jealous ratings were
2.17 for secures, 2.57 for avoidants, and 2.88 for anxious-ambivalents.)
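The cutoff can also be found in software rather than an F table. In the sketch below, the degrees of freedom are assumptions for illustration (2 between-groups df for three groups, and a large within-groups df standing in for a big survey sample); the text reports the study's .05 cutoff as 3.01.

```python
# Sketch: looking up the .05 cutoff F with SciPy instead of an F table.
# The degrees of freedom (2 between, 600 within) are illustrative
# assumptions, not figures reported in the text.
from scipy import stats

df_between = 2    # three groups minus 1
df_within = 600   # illustrative: a large survey sample
cutoff = stats.f.ppf(0.95, df_between, df_within)  # .05 level cutoff
print(round(cutoff, 2))  # close to the 3.01 cutoff the text reports

f_ratio = 23.27 / 0.53   # between-groups / within-groups estimates
print(round(f_ratio, 2), f_ratio > cutoff)  # 43.91 True: reject the null hypothesis
```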
An Analogy
Some students find an analogy helpful in understanding the analysis of variance. The
analogy is to what engineers call the signal-to-noise ratio. For example, your ability
to make out the words in a staticky cell phone conversation depends on the strength
of the signal versus the amount of random noise. With the F ratio in the analysis of
variance, the difference among the means of the samples is like the signal; it is the in-
formation of interest. The variation within the samples is like the noise. When the
variation among the samples is sufficiently great in comparison to the variation within
the samples, you conclude that there is a significant effect.
F distribution: mathematically defined curve that is the comparison distribution used in an analysis of variance.
F table: table of cutoff scores on the F distribution.
4. Two sources of variation that can contribute to the between-groups population variance estimate are (i) variation among the scores in each of the populations (that is, variation due to chance factors) and (ii) variation among the means of the populations (that is, variation due to a treatment effect).
5. (a) The F ratio is the ratio of the between-groups population variance estimate to the within-groups population variance estimate. (b) Both estimates are based entirely on the same source of variation—the variation among the scores in each of the populations (that is, due to chance factors). (c) The between-groups estimate is also influenced by the variation among the means of the populations (that is, a treatment effect), whereas the within-groups estimate is not. Thus, when the null hypothesis is false (and thus the means of the populations are not the same), the between-groups estimate will be bigger than the within-groups estimate.
ISBN 0-558-46761-X
Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.
Introduction to the Analysis of Variance 319
Carrying Out an Analysis of Variance
Now that we have considered the basic logic of the analysis of variance, we will go
through an example to illustrate the details. (We use a fictional study to keep the num-
bers simple.)
Suppose a social psychologist is studying the influence of knowledge of previ-
ous criminal record on juries’ perceptions of the guilt or innocence of defendants. The
researcher recruits 15 volunteers who have been selected for jury duty (but have not
yet served at a trial). The researcher shows them a video of a four-hour trial in which
a woman is accused of passing bad checks. Before viewing the tape, however, all of
the research participants are given a “background sheet” with age, marital status, ed-
ucation, and other such information about the accused woman. The sheet is the same
for all 15 participants, with one difference. For five of the participants, the last sec-
tion of the sheet says that the woman has been convicted several times before of pass-
ing bad checks; we will call those participants the Criminal Record group. For five
other participants, the last section of the sheet says the woman has a completely clean
criminal record—the Clean Record group. For the remaining five participants, the
sheet does not mention anything about criminal record one way or the other—the No
Information group.
The participants are randomly assigned to the groups. After viewing the tape of
the trial, all 15 participants make a rating on a 10-point scale, which runs from com-
pletely sure she is innocent (1) to completely sure she is guilty (10). The results of this
fictional study are shown in Table 9–3. As you can see, the means of the three groups
are different (8, 4, and 5). Yet there is also quite a bit of variation within each of the
three groups. Population variance estimates from the scores in each of these three
groups are 4.5, 5.0, and 6.5.
You need to figure the following numbers to test the hypothesis that the three
populations are different: (a) a population variance estimate based on the variation of
the scores in each of the samples, (b) a population variance estimate based on the
Answers
1. Analysis of variance is used when you are comparing means of samples from
more than two populations.
2. (a) The within-groups population variance estimate is based on the variation
among the scores in each of the samples. (b) It is not affected. (c) Whether the
null hypothesis is true has to do with whether the means of the populations dif-
fer. Thus, the within-groups estimate is not affected by whether the null hy-
pothesis is true because the variation within each population (which is the basis
for the variation in each sample) is not affected by whether the population
means differ.
3. (a) The between-groups population variance estimate is based on the variation
among the means of the samples. (b) It is larger when the null hypothesis is
false. (c) Whether the null hypothesis is true has to do with whether the means
of the populations differ. When the null hypothesis is false, the means of the
populations differ. Thus, the between-groups estimate is bigger when the null
hypothesis is false, because the variation among the means of the populations
(which is one basis for the variation among the means of the samples) is greater
when the population means differ.
differences among the group means, and (c) the ratio of the two, the F ratio. (In
addition, you need the significance cutoff F from an F table.)
Figuring the Within-Groups Estimate
of the Population Variance
You can estimate the population variance from any one group (that is, from any one
sample) using the usual method of estimating a population variance from a sample.
First, you figure the sum of the squared deviation scores. That is, you take the devi-
ation of each score from its group’s mean, square that deviation score, and sum all the
squared deviation scores. Second, you divide that sum of squared deviation scores by
that group’s degrees of freedom. (The degrees of freedom for a group are the number
of scores in the group minus 1.) For the example, as shown in Table 9–3, this gives
an estimated population variance of 4.5 based on the Criminal Record group’s scores,
an estimate of 5.0 based on the Clean Record group’s scores, and an estimate of 6.5
based on the No Information group’s scores.
Once again, in the analysis of variance, as with the t test, we assume that the pop-
ulations have the same variance and that the estimates based on each sample’s scores
are all estimating the same true population variance. The sample sizes are equal in this
example; so the estimate for each group is based on an equal amount of information.
Thus (unlike with the t test), you can pool these variance estimates by straight aver-
aging. This gives an overall estimate of the population variance based on the varia-
tion within groups of 5.33 (that is, the sum of 4.5, 5.0, and 6.5, which is 16, divided
by 3, the number of groups).
To summarize, the two steps are:
●A Figure population variance estimates based on each group’s scores.
●B Average these variance estimates. The estimated population variance based on the variation of the scores within each of the groups is the within-groups variance estimate. This is symbolized as S2Within or MSWithin. MSWithin is short for mean squares within. The term mean squares is another name for the variance, because the variance is the mean of the squared deviations. (S2Within or MSWithin is also sometimes called the error variance and symbolized as S2Error or MSError.)
S2Within or MSWithin: within-groups estimate of the population variance.
Table 9–3 Results of the Criminal Record Study (Fictional Data)

Criminal Record Group            Clean Record Group             No Information Group
Rating  Deviation  Squared       Rating  Deviation  Squared     Rating  Deviation  Squared
        from Mean  Deviation             from Mean  Deviation           from Mean  Deviation
                   from Mean                        from Mean                      from Mean
10        2          4            5        1          1          4       -1          1
 7       -1          1            1       -3          9          6        1          1
 5       -3          9            3       -1          1          9        4         16
10        2          4            7        3          9          3       -2          4
 8        0          0            4        0          0          3       -2          4
Σ: 40     0         18           20        0         20         25        0         26

M = 40/5 = 8. S2 = 18/4 = 4.5.   M = 20/5 = 4. S2 = 20/4 = 5.0.   M = 25/5 = 5. S2 = 26/4 = 6.5.
In terms of a formula,

S2Within or MSWithin = (S21 + S22 + … + S2Last)/NGroups    (9–1)

In this formula, S21 is the estimated population variance based on the scores in the first group (the group from Population 1), S22 is the estimated population variance based on the scores in the second group, and S2Last is the estimated population variance based on the scores in the last group. (The dots, or ellipsis, in the formula show that you are to fill in a population variance estimate for as many other groups as there are in the analysis.) NGroups is the number of groups.
Using this formula for our figuring, we get

S2Within or MSWithin = (4.5 + 5.0 + 6.5)/3 = 16/3 = 5.33
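As an illustration (not part of the text), the two steps above, figuring a variance estimate from each group and then averaging them, can be sketched in Python using the fictional ratings from Table 9–3:

```python
# Within-groups population variance estimate for the fictional Table 9-3 data.
criminal_record = [10, 7, 5, 10, 8]
clean_record = [5, 1, 3, 7, 4]
no_information = [4, 6, 9, 3, 3]

def variance_estimate(scores):
    """Estimate the population variance from one sample: SS / (n - 1)."""
    m = sum(scores) / len(scores)
    ss = sum((x - m) ** 2 for x in scores)  # sum of squared deviations
    return ss / (len(scores) - 1)           # divide by df = n - 1

estimates = [variance_estimate(g)
             for g in (criminal_record, clean_record, no_information)]
# Equal sample sizes, so pool by straight averaging.
s2_within = sum(estimates) / len(estimates)   # (4.5 + 5.0 + 6.5) / 3 = 5.33
```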
Figuring the Between-Groups Estimate
of the Population Variance
Figuring the between-groups estimate of the population variance also involves two
steps (though quite different ones from the within-groups estimate). First estimate,
from the means of your samples, the variance of a distribution of means. Second,
based on the variance of this distribution of means, figure the variance of the popu-
lation of individuals. Here are the two steps in more detail:
●A Estimate the variance of the distribution of means: Add up the sample
means’ squared deviations from the overall mean (the mean of all the scores)
and divide this by the number of means minus 1.
You can think of the means of your samples as taken from a distribution of
means. Follow the standard procedure of using the scores in a sample to estimate
the variance of the population from which these scores are taken. In this situation,
you think of the means of your samples as the scores and the distribution of means
as the population from which these scores come. What this all boils down to are
the following procedures. You begin by figuring the sum of squared deviations.
(You find the mean of your sample means, figure the deviation of each sample
mean from this mean of means, square each of these deviations, and then sum
these squared deviations.) Then, divide this sum of squared deviations by the de-
grees of freedom, which is the number of means minus 1. In terms of a formula
(when sample sizes are all equal),
S2M = Σ(M – GM)2/dfBetween    (9–2)
In this formula, S2M is the estimated variance of the distribution of means
(estimated based on the means of the samples in your study). M is the mean of
each of your samples. GM is the grand mean, the overall mean of all your scores,
which is also the mean of your means. dfBetween is the degrees of freedom in the
between-groups estimate, the number of groups minus 1.
Stated as a formula,

dfBetween = NGroups – 1    (9–3)
In the criminal record example, the three means are 8, 4, and 5. The figuring of S2M is shown in Table 9–4.
The within-groups population
variance estimate is the sum
of the population variance es-
timates based on each sam-
ple, divided by the number of
groups.
The estimated variance of the
distribution of means is the
sum of each sample mean’s
squared deviation from the
grand mean, divided by the
degrees of freedom for the
between-groups population
variance estimate.
grand mean (GM) overall mean of all
the scores, regardless of what group they
are in; when group sizes are equal, mean
of the group means.
The degrees of freedom for
the between-groups popula-
tion variance estimate is the
number of groups minus 1.
●B Figure the estimated variance of the population of individual scores:
Multiply the variance of the distribution of means by the number of scores in
each group.
What we just figured in Step ●A, from a sample of a few means, is the estimated
variance of a distribution of means. From this we want to estimate the variance of the
population (the distribution of individuals) on which the distribution of means is
based. We saw in Chapter 5 that the variance of a distribution of means is smaller
than the variance of the population (the distribution of individuals) that it is based on.
This is because means are less likely to be extreme than are individual scores (be-
cause any one sample is unlikely to include several scores that are extreme in the
same direction). Specifically, you learned in Chapter 5 that the variance of a distrib-
ution of means is the variance of the distribution of individual scores divided by the
number of scores in each sample.
Now, however, we are going to reverse what we did in Chapter 5. In Chapter 5 you
figured the variance of the distribution of means by dividing the variance of the distri-
bution of individuals by the sample size. Now you are going to figure the variance of
the distribution of individuals by multiplying the variance of the distribution of means
by the sample size (see Table 9–5). That is, to come up with the variance of the popu-
lation of individuals, you multiply your estimate of the variance of the distribution of
means by the sample size in each of the groups. The result of all this is the between-
groups variance estimate. Stated as a formula (for when sample sizes are equal),
S2Between or MSBetween = (S2M)(n)    (9–4)
In this formula, S2Between or MSBetween is the estimate of the population variance
based on the variation between the means (the between-groups population variance
estimate). n is the number of participants in each sample.
Let’s return to our example in which there were five participants in each sample
and an estimated variance of the distribution of means of 4.34. In this example,
Table 9–4 Estimated Variance of the Distribution of Means Based on Means of the Three
Experimental Groups in the Criminal Record Study (Fictional Data)

Sample Means    Deviation from        Squared Deviation from
(M)             Grand Mean (M – GM)   Grand Mean (M – GM)2
 4              -1.67                  2.79
 8               2.33                  5.43
 5               -.67                   .45
Σ: 17            -.01                  8.67

GM = (ΣM)/NGroups = 17/3 = 5.67; S2M = Σ(M – GM)2/dfBetween = 8.67/2 = 4.34.

Source: Hazan, C., & Shaver, P. (1987). Romantic love conceptualized as an attachment process. Journal of Personality and Social Psychology, 52, 515. Published by the American Psychological Association. Reprinted with permission.
Table 9–5 Comparison of Figuring the Variance of a Distribution of Means
from the Variance of a Distribution of Individuals, and the Reverse

• From distribution of individuals to distribution of means: S2M = S2/n
• From distribution of means to distribution of individuals: S2 = (S2M)(n)
The between-groups popula-
tion variance estimate (or
mean squares between) is the
estimated variance of the dis-
tribution of means multiplied
by the number of scores in
each group.
S2Between or MSBetween: between-groups estimate of the population variance.
T I P F O R S U C C E S S
A very common mistake when fig-
uring the F ratio is to turn the for-
mula upside down. Just remember
it is as simple as Black and White,
so it is Between divided by Within.
multiplying 4.34 by 5 gives a between-groups population variance estimate of 21.70.
In terms of the formula,

S2Between or MSBetween = (S2M)(n) = (4.34)(5) = 21.70
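A minimal Python sketch of the same figuring (an illustration, not part of the text; note that the text rounds the grand mean to 5.67, giving 4.34 and 21.70, while exact arithmetic gives about 4.33 and 21.67):

```python
# Between-groups population variance estimate for the criminal record example.
means = [8, 4, 5]     # group means from Table 9-3
n = 5                 # scores per group

grand_mean = sum(means) / len(means)     # 17/3, about 5.67
df_between = len(means) - 1              # number of groups minus 1 = 2
s2_m = sum((m - grand_mean) ** 2 for m in means) / df_between   # about 4.33
s2_between = s2_m * n                    # Formula 9-4: about 21.67
```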
Figuring the F Ratio
The F ratio is the ratio of the between-groups to the within-groups estimate of the
population variance. Stated as a formula,
F = S2Between/S2Within or MSBetween/MSWithin    (9–5)
In the example, the ratio of between to within is 21.70 to 5.33. Carrying out the
division gives an F ratio of 4.07. In terms of the formula,

F = S2Between/S2Within or MSBetween/MSWithin = 21.70/5.33 = 4.07
The F Distribution
You are not quite done. You still need to find the cutoff for the F ratio that is large
enough to reject the null hypothesis. This requires a distribution of F ratios that you
can use to figure out what is an extreme F ratio.
In practice, you simply look up the needed cutoff on a table (or read the exact sig-
nificance from the computer output). To understand where that number on the table
comes from, you need to understand the F distribution. The easiest way to understand
this distribution is to think about how you would go about making one.
Start with three identical populations. Next, randomly select five scores from
each. Then, on the basis of these three samples (of five scores each), figure the F ratio.
(That is, use these scores to make a within-groups estimate and a between-groups es-
timate, then divide the between estimate by the within estimate.) Let’s say that you
do this and the F ratio you come up with is 1.36. Now you select three new random
samples of five scores each and figure the F ratio using these three samples. Perhaps
you get an F of .93. If you do this whole process many, many times, you will even-
tually get a lot of F ratios. The distribution of all possible F ratios figured in this way
(from random samples from identical populations) is called the F distribution.
Figure 9–3 shows an example of an F distribution. (There are many different F dis-
tributions, and each has a slightly different shape. The exact shape depends on how
many samples you take each time and how many scores are in each sample. The
general shape is like that shown in the figure.)
No one actually goes about making F distributions in this way. It is a mathemat-
ical distribution whose exact characteristics can be found from a formula. Statisti-
cians can also prove that, if you had the patience to follow this procedure of taking
random samples and figuring the F ratio of each for a very long time, you would get
the same result.
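The construction just described can be imitated by simulation. The sketch below (an illustration, not part of the text) repeatedly draws three samples of five scores from identical normal populations and figures the F ratio each time; most ratios land near 1, and roughly 5% exceed the 3.89 cutoff for 2 and 12 degrees of freedom.

```python
import random

def f_ratio(groups):
    """Figure F = between-groups estimate / within-groups estimate."""
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    gm = sum(means) / len(means)
    s2_between = sum((m - gm) ** 2 for m in means) / (len(means) - 1) * n
    s2_within = sum(sum((x - m) ** 2 for x in g) / (n - 1)
                    for g, m in zip(groups, means)) / len(groups)
    return s2_between / s2_within

random.seed(0)  # reproducible illustration
fs = [f_ratio([[random.gauss(0, 1) for _ in range(5)] for _ in range(3)])
      for _ in range(10_000)]
share_beyond_cutoff = sum(f > 3.89 for f in fs) / len(fs)  # roughly .05
```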
As you can see in Figure 9–3, the F distribution is not symmetrical but has a long
tail on the right. The reason for the positive skew is that an F distribution is a distri-
bution of ratios of variances. Variances are always positive numbers. (A variance is
an average of squared deviations, and anything squared is a positive number.)
A ratio of a positive number to a positive number can never be less than 0. Yet there
is nothing to stop a ratio from being a very high number. Thus, the F ratio’s
The F ratio is the between-
groups population variance
estimate (or mean squares be-
tween) divided by the within-
groups population variance
estimate (or mean squares
within).
between-groups (or numerator) degrees of freedom (dfBetween): degrees of freedom used in the between-groups estimate of the population variance in an analysis of variance (the numerator of the F ratio); number of scores free to vary (number of means minus 1) in figuring the between-groups estimate of the population variance.
distribution cannot be lower than 0 and can rise quite high.1 (Most F ratios pile
up near 1, but they spread out more on the positive side, where they have more room
to spread out.)
The F Table
The F table is a little more complicated than the t table. This is because there is a dif-
ferent F distribution according to both the degrees of freedom used in the between-
groups variance estimate and the degrees of freedom used in the within-groups variance
estimate. That is, you have to take into account two different degrees of freedom to look
up the needed cutoff. One is the between-groups degrees of freedom. It is also called
the numerator degrees of freedom. This is the degrees of freedom you use in the
between-groups variance estimate, the numerator of the F ratio. As shown earlier in Formula 9–3, the degrees of freedom for the between-groups population variance estimate is equal to the number of groups minus 1 (dfBetween = NGroups – 1).
The other type of degrees of freedom is the within-groups degrees of freedom,
also called the denominator degrees of freedom. This is the sum of the degrees of
freedom from each sample you use when figuring out the within-groups variance
estimate, the denominator of the F ratio.
Stated as a formula,
dfWithin = df1 + df2 + … + dfLast    (9–6)
In the criminal record study example, the between-groups degrees of freedom is 2.
(There are 3 means, minus 1.) In terms of the formula,
dfBetween = NGroups – 1 = 3 – 1 = 2
Figure 9–3 An F distribution. [The figure shows a positively skewed curve over F ratios from 0 to 5.]
The degrees of freedom for
the within-groups population
variance estimate is the sum
of the degrees of freedom
used in making estimates of
the population variance from
each sample.
within-groups (or denominator) degrees of freedom (dfWithin): degrees of freedom used in the within-groups estimate of the population variance in an analysis of variance (the denominator of the F ratio); number of scores free to vary (number of scores in each group minus 1, summed over all the groups) in figuring the within-groups population variance estimate.
The within-groups degrees of freedom is 12. This is because each of the groups has
4 degrees of freedom on which the estimate is based (5 scores minus 1) and there are
3 groups overall, making a total of 12 degrees of freedom. In terms of the formula,
You would look up the cutoff for an F distribution “with 2 and 12” degrees of free-
dom. As shown in Table 9–6, for the .05 level, you need an F ratio of 3.89 to reject
the null hypothesis. (The full F table is Table A–3 in the Appendix.)
dfWithin = df1 + df2 + … + dfLast = (5 – 1) + (5 – 1) + (5 – 1) = 4 + 4 + 4 = 12
How are you doing?
For part (c) of each question, use the following scores involving three samples:
The scores in Sample A are 5 and 7 (M = 6), the scores in Sample B are 6 and 10 (M = 8), and the scores in Sample C are 8 and 9 (M = 8.5).
1. (a) Write the formula for the within-groups population variance estimate and
(b) define each of the symbols. (c) Figure the within-groups population vari-
ance estimate for these scores.
2. (a) Write the formula for the variance of the distribution of means when using it
as part of an analysis of variance and (b) define each of the symbols. (c) Figure
the variance of the distribution of means for these scores.
3. (a) Write the formula for the between-groups population variance estimate
based on the variance of the distribution of means and (b) define each of the
symbols and explain the logic behind this formula. (c) Figure the between-
groups population variance estimate for these scores.
4. (a) Write the formula for the F ratio and (b) define each of the symbols. (c) Figure
the F ratio for these scores.
Table 9–6 Selected Cutoffs for the F Distribution (with Values Highlighted for the Criminal
Record Study)

Denominator     Significance     Numerator Degrees of Freedom
Degrees of      Level              1       2       3       4       5       6
Freedom
10              .01              10.05    7.56    6.55    6.00    5.64    5.39
                .05               4.97    4.10    3.71    3.48    3.33    3.22
                .10               3.29    2.93    2.73    2.61    2.52    2.46
11              .01               9.65    7.21    6.22    5.67    5.32    5.07
                .05               4.85    3.98    3.59    3.36    3.20    3.10
                .10               3.23    2.86    2.66    2.54    2.45    2.39
12              .01               9.33    6.93    5.95    5.41    5.07    4.82
                .05               4.75    3.89    3.49    3.26    3.11    3.00
                .10               3.18    2.81    2.61    2.48    2.40    2.33
13              .01               9.07    6.70    5.74    5.21    4.86    4.62
                .05               4.67    3.81    3.41    3.18    3.03    2.92
                .10               3.14    2.76    2.56    2.43    2.35    2.28
Note: Full table is Table A–3 in the Appendix.
samples. GM is the grand mean, the overall mean of all your scores, which is also the mean of your means. dfBetween is the degrees of freedom in the between-groups estimate, the number of groups minus 1.
(c) Grand mean (GM) is (6 + 8 + 8.5)/3 = 7.5.
S2M = Σ(M – GM)2/dfBetween = [(6 – 7.5)2 + (8 – 7.5)2 + (8.5 – 7.5)2]/(3 – 1) = (2.25 + .25 + 1)/2 = 3.5/2 = 1.75.
3. (a) S2Between = (S2M)(n).
(b) S2Between is the between-groups population variance estimate; S2M is the estimated variance of the distribution of means (estimated based on the means of the samples in your study); n is the number of participants in each sample. The goal is to have a variance of a distribution of individuals based on the variation among the means of the groups. S2M is the estimate of the variance of a distribution of means from the overall population based on the means of the samples. To go from the variance of a distribution of means to the variance of a distribution of individuals, you multiply by the size of each sample. This is because the variance of the distribution of means is always smaller than the distribution of individuals (because means of samples are less likely to be extreme than are individual scores); the exact relation is that the variance of the distribution of means is the variance of the distribution of individuals divided by the sample size; thus you reverse that process here.
(c) S2Between = (S2M)(n) = (1.75)(2) = 3.5.
4. (a) F = S2Between/S2Within.
(b) F is the F ratio; S2Between is the between-groups population variance estimate; S2Within is the within-groups population variance estimate.
(c) F = S2Between/S2Within = 3.5/3.5 = 1.0.
5. (a) dfBetween = NGroups – 1 and dfWithin = df1 + df2 + … + dfLast.
(b) dfBetween is the between-groups degrees of freedom; NGroups is the number of groups; dfWithin is the within-groups degrees of freedom; df1 is the degrees of freedom for the population variance estimate based on the scores in the first sample; df2 is the degrees of freedom for the population variance estimate based on the scores in the second sample; dfLast is the degrees of freedom for the population variance estimate based on the scores in the last sample; the dots show that you are to fill in the degrees of freedom for as many other samples as there are in the analysis.
(c) dfBetween = NGroups – 1 = 3 – 1 = 2; dfWithin = df1 + df2 + … + dfLast = 1 + 1 + 1 = 3.
6. (a) The distribution of F ratios you would expect by chance. (b) F ratios, because they are a ratio of variances (which as averages of squared numbers have to be positive), are ratios of two positive numbers, which always have to be positive. Thus, they can't be less than 0. But there is no limit to how high an F ratio can be. Thus, the scores bunch up at the left (near 0) and spread out to the right. (c) Cutoff F for the .05 significance level: 9.55.
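The figures in these answers can be checked with a short numerical sketch (an illustration, not part of the text), using the Sample A, B, and C scores:

```python
a, b, c = [5, 7], [6, 10], [8, 9]   # Samples A, B, C
groups = [a, b, c]
n = 2                                # scores per sample

means = [sum(g) / n for g in groups]                 # 6, 8, 8.5
gm = sum(means) / len(means)                         # grand mean = 7.5
s2_within = sum(sum((x - m) ** 2 for x in g) / (n - 1)
                for g, m in zip(groups, means)) / len(groups)   # 3.5
s2_m = sum((m - gm) ** 2 for m in means) / (len(means) - 1)     # 1.75
s2_between = s2_m * n                                # 3.5
f = s2_between / s2_within                           # 1.0
```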
5. (a) Write the formulas for the between-groups and within-groups degrees of
freedom and (b) define each of the symbols. (c) Figure the between-groups
and within-groups degrees of freedom for these scores.
6. (a) What is the F distribution? (b) Why is it skewed to the right? (c) What is the
cutoff F for these scores for the .05 significance level?
Hypothesis Testing with the Analysis of Variance
Here are the five steps of hypothesis testing for the criminal record study. The distri-
butions involved are shown in Figure 9–4.
❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are three populations:
Population 1: Jurors told that the defendant has a criminal record.
Population 2: Jurors told that the defendant has a clean record.
Population 3: Jurors given no information about the defendant’s record.
The null hypothesis is that these three populations have the same mean (μ1 = μ2 = μ3). The research hypothesis is that the populations’ means are not
the same.
❷ Determine the characteristics of the comparison distribution. The compari-
son distribution is an F distribution with 2 and 12 degrees of freedom.
❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected. Using the F table for the .05 signifi-
cance level, the cutoff F ratio is 3.89.
❹ Determine your sample’s score on the comparison distribution. In the analy-
sis of variance, the comparison distribution is an F distribution, and the sample’s
score on that distribution is thus its F ratio. In the example, the F ratio was 4.07.
❺ Decide whether to reject the null hypothesis. In the example, the F ratio of
4.07 is more extreme than the .05 significance level cutoff of 3.89. Thus, the re-
searcher would reject the null hypothesis that the three groups come from popu-
lations with the same mean. This suggests that they come from populations with
different means: that people exposed to different kinds of information (or no in-
formation) about the criminal record of a defendant in a situation of this kind
will differ in their ratings of the defendant’s guilt.
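The whole five-step decision can be compressed into one sketch (an illustration, not part of the text). Exact arithmetic gives an F of about 4.06; the text's 4.07 reflects intermediate rounding, and either way the ratio exceeds the 3.89 cutoff.

```python
criminal_record = [10, 7, 5, 10, 8]   # fictional ratings from Table 9-3
clean_record = [5, 1, 3, 7, 4]
no_information = [4, 6, 9, 3, 3]
groups = [criminal_record, clean_record, no_information]
n = len(criminal_record)

means = [sum(g) / n for g in groups]
gm = sum(means) / len(means)
s2_within = sum(sum((x - m) ** 2 for x in g) / (n - 1)
                for g, m in zip(groups, means)) / len(groups)
s2_between = sum((m - gm) ** 2 for m in means) / (len(groups) - 1) * n

f = s2_between / s2_within     # about 4.06
cutoff = 3.89                  # Table 9-6: df = (2, 12), .05 level
reject_null = f > cutoff       # True, so reject the null hypothesis
```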
Answers
1. (a) Formula for the within-groups population variance estimate:
S2Within = (S21 + S22 + … + S2Last)/NGroups.
(b) S2Within is the within-groups population variance estimate; S21 is the estimated population variance based on the scores in the first group (the group from Population A); S22 is the estimated population variance based on the scores in the second group; S2Last is the estimated population variance based on the scores in the last group; the dots show that you are to fill in a population variance estimate for as many other groups as there are in the analysis; NGroups is the number of groups.
(c) Figuring for the within-groups population variance estimate:
S21 = [(5 – 6)2 + (7 – 6)2]/(2 – 1) = (1 + 1)/1 = 2.
S22 = [(6 – 8)2 + (10 – 8)2]/(2 – 1) = (4 + 4)/1 = 8.
S23 = [(8 – 8.5)2 + (9 – 8.5)2]/(2 – 1) = (.25 + .25)/1 = .5.
S2Within = (S21 + S22 + … + S2Last)/NGroups = (2 + 8 + .5)/3 = 10.5/3 = 3.5.
2. (a) S2M = Σ(M – GM)2/dfBetween.
(b) S2M is the estimated variance of the distribution of means (estimated based on the means of the samples in your study). M is the mean of each of your
Figure 9–4 Distributions involved in the criminal record study example (fictitious data). [The figure shows: population distributions for the Criminal Record, Clean Record, and No Information groups, assumed to be normal and to have the same variance, with either the same means (null hypothesis true) or different means (research hypothesis true); the distributions of the three samples; and the F(2, 12) comparison distribution of ratios, with 5% of the area beyond the F cutoff of 3.89 and the F of 4.07 obtained from the sample.]
You may be interested to know that several real studies have looked at whether
knowing a defendant’s prior criminal record affects the likelihood of conviction.
The overall conclusion seems to be reasonably consistent with that of the fictional
study described here. For a review of such studies, see Dane and Wrightsman (1982).
(For an example of a study showing this pattern see Greene & Dodge, 1995.)
Another Example
Mikulincer (1998) conducted a series of studies in Israel using the same attachment
style classification measure we discussed earlier in the chapter (see Table 9–1). One
of his studies included 30 university students (10 of each attachment style), all of whom
were in serious romantic relationships. As part of the study, each evening the students
wrote down whether during that day their partner had done something to violate their
trust. Participants noted such events as the partner being very late for a promised
meeting or “forgetting” to tell the participant about some important plan. The results,
along with the analysis of variance figuring, are shown in Table 9–7. The distributions
involved are shown in Figure 9–5. The steps of the hypothesis testing follow.
❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are three populations.
Population 1: Students with a secure attachment style.
Population 2: Students with an avoidant attachment style.
Population 3: Students with an anxious-ambivalent attachment style.
The null hypothesis is that these three populations have the same mean
(μ1 = μ2 = μ3). The research hypothesis is that their means are not the same.
❷ Determine the characteristics of the comparison distribution. The compari-
son distribution will be an F distribution. Its degrees of freedom are figured as
follows: the between-groups variance estimate is based on three groups, making
2 degrees of freedom. The within-groups estimate is based on 9 degrees of free-
dom (10 participants) in each of the three groups, making a total of 27 degrees
of freedom.
❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected. Using Table A–3 in the Appendix, look
down the column for 2 degrees of freedom in the numerator and stop at the row
Introduction to the Analysis of Variance 329
Table 9–7 Number of Trust Violation Events by Romantic Partners Over 3 Weeks Reported by Individuals of Three Attachment Styles

Attachment Style: Secure, Avoidant, Anxious-Ambivalent
n: 10, 10, 10
M: 2.10, 3.70, 4.20
S: 1.66, 1.89, 1.93
S²: 2.76, 3.57, 3.72

F distribution:
dfBetween = NGroups – 1 = 3 – 1 = 2
dfWithin = df1 + df2 + … + dfLast = (10 – 1) + (10 – 1) + (10 – 1) = 9 + 9 + 9 = 27
F needed for significance at .05 level from F table, df = 2, 27: 3.36

Between-groups population variance estimate:
Table for finding S²M for the three means:
Secure: M = 2.10, deviation = –1.23, squared deviation = 1.51
Avoidant: M = 3.70, deviation = .37, squared deviation = .14
Anxious-Ambivalent: M = 4.20, deviation = .87, squared deviation = .76
Σ: 10.00; GM: 3.33; Σ(M – GM)²: 2.41
S²M = Σ(M – GM)²/dfBetween = 2.41/2 = 1.205
S²Between or MSBetween = (S²M)(n) = (1.205)(10) = 12.05

Within-groups population variance estimate:
S²Within or MSWithin = (S²1 + S²2 + … + S²Last)/NGroups = (2.76 + 3.57 + 3.72)/3 = 10.05/3 = 3.35

F ratio: F = S²Between/S²Within or MSBetween/MSWithin = 12.05/3.35 = 3.60

Decision: Reject the null hypothesis.
Source: Data from Mikulincer (1998)
for our denominator degrees of freedom of 27. We will use the .05 significance
level. This gives a cutoff F of 3.36.
❹ Determine your sample’s score on the comparison distribution. This step
requires determining the sample’s F ratio. You find the between-groups variance
estimate (the numerator of the F ratio) in two steps.
●A Estimate the variance of the distribution of means: Add up the sample
means’ squared deviations from the grand mean, and divide by the number of
means minus 1. From Table 9–7, this comes out to 1.205.
●B Figure the estimated variance of the population of individual scores:
Multiply the variance of the distribution of means by the number of scores in
each group. From Table 9–7, this comes out to 12.05.
You find the within-groups variance estimate (the denominator of the F ratio) in
two steps.
●A Figure population variance estimates based on each group’s scores: As
shown in Table 9–7, the population variance estimates are 2.76, 3.57, and 3.72.
●B Average these variance estimates: The average of 2.76, 3.57, and 3.72 comes
out to 3.35.
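The two-step figuring above can be checked numerically. The short Python script below is an illustrative sketch (ours, not the text's) that recomputes the F ratio from the Table 9–7 summary statistics; because it does not round the grand mean to 3.33 at an intermediate step, it lands a shade under the text's 3.60.

```python
# Recompute the Table 9-7 ANOVA F ratio from summary statistics alone
# (group means and per-group variance estimates; equal group sizes).
means = [2.10, 3.70, 4.20]   # M for Secure, Avoidant, Anxious-Ambivalent
s2s = [2.76, 3.57, 3.72]     # S^2 for each group
n = 10                       # scores per group

gm = sum(means) / len(means)                                  # grand mean
s2_m = sum((m - gm) ** 2 for m in means) / (len(means) - 1)   # variance of distribution of means
s2_between = s2_m * n                                         # between-groups estimate (MS Between)
s2_within = sum(s2s) / len(s2s)                               # within-groups estimate (MS Within)
f_ratio = s2_between / s2_within
print(round(f_ratio, 2))     # about 3.59; the text's 3.60 comes from rounding GM to 3.33 first
```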
Figure 9–5 Distributions involved in the attachment style example. (Source: Data from Mikulincer, 1998.) [Figure: sample distributions for the Secure (M = 2.10, S² = 2.76), Avoidant (M = 3.70, S² = 3.57), and Anxious-Ambivalent (M = 4.20, S² = 3.72) groups; the F(2, 27) distribution (F ratio axis 0 to 6) with 5% of the area beyond the cutoff F of 3.36 and the obtained F of 3.60 from the sample marked. Population distributions are assumed to be normal and to have the same variance; they have either the same means (null hypothesis is true) or different means (research hypothesis is true).]
The F ratio is the between-groups variance estimate divided by the within-groups
variance estimate, which comes out to 3.60 (that is, 12.05/3.35 = 3.60).
❺ Decide whether to reject the null hypothesis. The F ratio of 3.60 is more
extreme than the .05 significance level cutoff F of 3.36. Therefore, Mikulin-
cer (1998) rejected the null hypothesis. He was able to conclude that students
having the three attachment styles differ in the number of trust violations by
their romantic partners they reported over a 3-week period. This conclusion was
consistent with Mikulincer’s hypotheses based on attachment theory.
Summary of Steps for Hypothesis Testing
with the Analysis of Variance
Table 9–8 summarizes the steps of an analysis of variance of the kind we have been
considering in this chapter.
Assumptions in the Analysis of Variance
The assumptions for the analysis of variance are basically the same as for the t test for
independent means. That is, the cutoff F ratio from the table (or the exact p level from
the computer output) is strictly accurate only when the populations follow a normal
curve and have equal variances. As with the t test, in practice the cutoffs are reason-
ably accurate even when your populations are moderately far from normal and have
Table 9–8 Steps for the Analysis of Variance (When Sample Sizes Are Equal)

❶ Restate the question as a research hypothesis and a null hypothesis about the populations.
❷ Determine the characteristics of the comparison distribution.
a. The comparison distribution is an F distribution.
b. The between-groups (numerator) degrees of freedom is the number of groups minus 1: dfBetween = NGroups – 1.
c. The within-groups (denominator) degrees of freedom is the sum of the degrees of freedom in each group (the number in the group minus 1): dfWithin = df1 + df2 + … + dfLast.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
a. Decide the significance level.
b. Look up the appropriate cutoff in an F table, using the degrees of freedom from Step ❷.
❹ Determine your sample's score on the comparison distribution. This will be an F ratio.
a. Figure the between-groups population variance estimate (S²Between or MSBetween). Figure the means of each group.
●A Estimate the variance of the distribution of means: S²M = Σ(M – GM)²/dfBetween.
●B Figure the estimated variance of the population of individual scores: S²Between or MSBetween = (S²M)(n).
b. Figure the within-groups population variance estimate (S²Within or MSWithin).
●A Figure population variance estimates based on each group's scores: For each group, S² = Σ(X – M)²/(n – 1) = SS/df.
●B Average these variance estimates: S²Within or MSWithin = (S²1 + S²2 + … + S²Last)/NGroups.
c. Figure the F ratio: F = S²Between/S²Within or F = MSBetween/MSWithin.
❺ Decide whether to reject the null hypothesis: Compare the scores from Steps ❸ and ❹.
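The steps in Table 9–8 can be sketched as a small Python function. This is an illustrative implementation for equal-sized groups, not code from the text; the made-up three-group example at the end happens to give an F of exactly 1.0.

```python
# Illustrative sketch of the Table 9-8 steps for equal-sized groups (our code, not the book's).
def anova_f(groups):
    """Return (F ratio, df_between, df_within) for a list of equal-sized score lists."""
    n_groups = len(groups)
    n = len(groups[0])                                   # scores per group (assumed equal)
    means = [sum(g) / n for g in groups]                 # Step 4a: group means
    grand_mean = sum(means) / n_groups
    s2_m = sum((m - grand_mean) ** 2 for m in means) / (n_groups - 1)
    s2_between = s2_m * n                                # (S^2_M)(n)
    # Step 4b: each group's S^2 = SS/df, then average them
    s2s = [sum((x - m) ** 2 for x in g) / (n - 1) for g, m in zip(groups, means)]
    s2_within = sum(s2s) / n_groups
    # Step 4c: F ratio; Step 2: degrees of freedom
    return s2_between / s2_within, n_groups - 1, n_groups * (n - 1)

# Made-up example: three groups of two scores each
print(anova_f([[5, 7], [6, 10], [8, 9]]))   # (1.0, 2, 3)
```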
moderately different variances. As a general rule, if the variance estimate of the group
with the largest estimate is no more than four or five times that of the smallest and the
sample sizes are equal, the conclusions using the F distribution should be adequately
accurate. In Chapter 14 we consider what to do when your populations are a long way
from meeting these assumptions.
How are you doing?
1. A study compares the effects of three experimental treatments, A, B, and C, by
giving each treatment to 16 participants and then assessing their performance
on a standard measure. The results on the standard measure are as follows.
Treatment A group: M = 20, S² = 8; Treatment B group: M = 22, S² = 9; Treatment
C group: M = 18, S² = 7. Using the .01 significance level, do the three
experimental treatments create any difference among the populations these
groups represent? (a) Use the steps of hypothesis testing and (b) sketch the dis-
tributions involved.
2. Give the two main assumptions for the analysis of variance.
3. Why do we need the equal variance assumption?
4. What is the general rule about when violations of the equal variance assump-
tion are likely to lead to serious inaccuracies in results?
Figure 9–6 Distributions for "How Are You Doing?" question 1. [Figure: sample distributions for Treatment A (M = 20, S² = 8), Treatment B (M = 22, S² = 9), and Treatment C (M = 18, S² = 7); the F(2, 45) distribution (F ratio axis 0 to 6) with 1% of the area beyond the cutoff F of 5.11 and the obtained F of 8.00 from the sample marked. Population distributions are assumed to be normal and to have the same variance; they have either the same means (null hypothesis is true) or different means (research hypothesis is true).]
Answers
1. (a) Steps of hypothesis testing:
❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are three populations.
Population 1: People given experimental treatment A.
Population 2: People given experimental treatment B.
Population 3: People given experimental treatment C.
The null hypothesis is that these three populations have the same mean (μ1 = μ2 = μ3). The research hypothesis is that their means are not the same.
❷ Determine the characteristics of the comparison distribution. The comparison distribution will be an F distribution. Its degrees of freedom are figured as follows: The between-groups variance estimate is based on three groups, making 2 degrees of freedom. The within-groups estimate is based on 15 degrees of freedom (16 participants) in each of the three groups, making 45 degrees of freedom.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. Using Table A–3 in the Appendix, the cutoff F for df = 2, 45 at the .01 level is 5.11.
❹ Determine your sample's score on the comparison distribution.
(a) Figure the between-groups population variance estimate (S²Between): First, figure the mean of each group. The group means are 20, 22, and 18.
●A Estimate the variance of the distribution of means: Add up the sample means' squared deviations from the grand mean and divide by the number of means minus 1: S²M = [(20 – 20)² + (22 – 20)² + (18 – 20)²]/(3 – 1) = (0 + 4 + 4)/2 = 4.
●B Figure the estimated variance of the population of individual scores: Multiply the variance of the distribution of means by the number of scores in each group: S²Between = (4)(16) = 64.
(b) Figure the within-groups population variance estimate (S²Within):
●A Figure population variance estimates based on each group's scores: Treatment A group, S² = 8; Treatment B group, S² = 9; Treatment C group, S² = 7.
●B Average these variance estimates: S²Within = (8 + 9 + 7)/3 = 8.
The F ratio is the between-groups estimate divided by the within-groups estimate: F = 64/8 = 8.00.
❺ Decide whether to reject the null hypothesis. The F of 8.00 is more extreme than the .01 cutoff F of 5.11. Therefore, reject the null hypothesis. The research hypothesis is supported; the different experimental treatments do produce different effects on the standard performance measure.
(b) The distributions involved are shown in Figure 9–6.
2. The populations are assumed to be normally distributed with equal variances.
3. We need the equal variance assumption to be able to justify averaging the estimates from each sample into an overall within-groups population variance estimate.
4. The analysis can lead to inaccurate results when the variance estimate from the group with the largest estimate is more than four or five times the smallest variance estimate.
Planned Contrasts
When you reject the null hypothesis in an analysis of variance, this implies that the
population means are not all the same. What is not clear, however, is which population
means differ from which. For example, in the criminal record study, the Criminal
Record group jurors had the highest ratings for the defendant's guilt (M = 8);
the No Information group jurors, the second highest (M = 5); and the Clean Record
group jurors, the lowest (M = 4). From the analysis of variance results, we concluded
that the true means of the three populations these groups represent are not all the
same. (That is, the overall analysis of variance was significant.) However, we do not
know which populations’ means are significantly different from each other.
In practice, in most research situations involving more than two groups, our real
interest is not in an overall, or omnibus, difference among the several groups, but
rather in more specific comparisons. For example, in the criminal record study, the re-
searchers’ prediction in advance would probably have been that the Criminal Record
group would rate the defendant’s guilt higher than both the No Information group and
the Clean Record group. If, in fact, the researchers had made these predictions, these
predictions would be examples of what are called planned contrasts. (They are called
"contrasts" because they contrast the results from specific groups.)
Researchers use planned contrasts to look at some particular, focused differences
between groups that directly follow from a theory or that are related directly to some
practical application. Planned contrasts are also sometimes called a priori comparisons
because they have been planned in advance of the study. They may also be called
planned comparisons because they compare the results for specific groups. Finally, a
general name you may see for most contrasts you would figure is linear contrasts.
Figuring Planned Contrasts
The procedure to compare the means of a particular pair of groups is a direct exten-
sion of what you already know: figure a between-groups population variance esti-
mate, a within-groups population variance estimate, and an F.
The within-groups population variance estimate will be the same as for the over-
all analysis of variance. This is because, regardless of the particular groups you are
comparing, you are still assuming that all groups are from populations with the same
variance. Thus, your best estimate of that variance is the one that makes use of the in-
formation from all the groups, the average of the population variance estimates from
each of the samples.
The between-groups population variance estimate, however, in a planned contrast
is different from the between-groups variance estimate in the overall analysis. It is dif-
ferent because in a planned contrast you are interested in the variation only between
a particular pair of means. Specifically, in a planned contrast between two group
means, you figure the between-groups population variance estimate with the usual
two-step procedure, but using just the two means of interest.2
Once you have the two variance estimates for the planned contrast, you figure the
F in the usual way, and compare it to a cutoff from the F table based on the df that go
into the two estimates, which are the same as the overall analysis for dfWithin and are
usually exactly 1 for dfBetween (because the between estimate is based on two means,
and 2 – 1 = 1).
An Example
Consider the planned contrast of the Criminal Record group (M = 8) to the No
Information group (M = 5).
planned contrast: comparison in which the particular means to be compared were decided in advance. Also called planned comparison.
The within-groups population variance estimate for a planned contrast is always
the same as the within-groups estimate from the overall analysis: In the criminal record
example, S²Within was 5.33.
For the between-groups population variance estimate, you follow the usual two-step
procedure, but using only the two means you plan to compare.
●A Estimate the variance of the distribution of means: Add up the sample means' squared deviations from the grand mean and divide by the number of means minus 1. The grand mean for these two means would be 6.5 [that is, (8 + 5)/2 = 6.5], and dfBetween when there are two means being compared is 2 – 1 = 1. Thus, S²M = Σ(M – GM)²/dfBetween = [(8 – 6.5)² + (5 – 6.5)²]/1 = [1.5² + (–1.5)²]/1 = 2.25 + 2.25 = 4.5.
●B Figure the estimated variance of the population of individual scores: Multiply the variance of the distribution of means by the number of scores in each group. There are five scores in each group in this study. Thus, S²Between = (S²M)(n) = (4.5)(5) = 22.5.
Thus, for this planned contrast, F = S²Between/S²Within = 22.5/5.33 = 4.22. The
.05 cutoff F for df = 1, 12 is 4.75. Thus, the planned contrast is not significant.
You can conclude that the three means differ overall (from the original analysis
of variance, which was significant), but you cannot conclude specifically that
the Criminal Record condition makes a person rate guilt differently from being
in the No Information condition.
A Second Example
What about the other planned contrast of the Criminal Record group (M = 8) to the
Clean Record group (M = 4)?
For the between-groups population variance estimate,
●A Estimate the variance of the distribution of means: Add the sample means' squared deviations from the grand mean and divide by the number of means minus 1. The grand mean for these two means is (8 + 4)/2 = 6.0, and dfBetween = 2 – 1 = 1. Thus, S²M = [(8 – 6.0)² + (4 – 6.0)²]/1 = [2.0² + (–2.0)²]/1 = 4.0 + 4.0 = 8.0.
●B Figure the estimated variance of the population of individual scores: Multiply the variance of the distribution of means by the number of scores in each group: S²Between = (S²M)(n) = (8)(5) = 40.0.
The within-groups estimate, again, is the same as we figured for the overall
analysis: 5.33.
Thus, F = S²Between/S²Within = 40.0/5.33 = 7.50. This F of 7.50 is larger than
4.75 (the .05 cutoff F for df = 1, 12), which means that the planned contrast is significant.
Thus, you can conclude that the Criminal Record condition makes a person
rate guilt differently from the Clean Record condition.
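The two planned-contrast examples follow the same arithmetic, so they can be captured in one small helper. The function below is an illustrative sketch; its name is ours, not the text's.

```python
# Planned contrast between two group means with equal n
# (illustrative helper; not code from the text).
def planned_contrast_f(m1, m2, n, s2_within):
    gm = (m1 + m2) / 2                                   # grand mean of the two means
    s2_m = ((m1 - gm) ** 2 + (m2 - gm) ** 2) / (2 - 1)   # df_between = 2 - 1 = 1
    s2_between = s2_m * n                                # (S^2_M)(n)
    return s2_between / s2_within

# Criminal record study: n = 5 per group, S^2_Within = 5.33 from the overall analysis
print(round(planned_contrast_f(8, 5, 5, 5.33), 2))   # 4.22, below the cutoff of 4.75
print(round(planned_contrast_f(8, 4, 5, 5.33), 2))   # 7.5, beyond the cutoff, so significant
```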
The Bonferroni Procedure
There is a problem when you carry out several planned contrasts. Normally, when you
set the .05 significance level, this means you have selected a cutoff so extreme that you
have only a .05 chance of getting a significant result if the null hypothesis is true. How-
ever, with multiple contrasts, if you use the .05 cutoff, you can actually have much
more than a .05 chance of getting a significant result if the null hypothesis is true!
The reason is this: if you are making several contrasts (comparisons), each at the
.05 level, the chance of any one of them coming out significant is more than .05. (It
is like flipping coins. If you flip any one coin, it has only a 50% chance of coming up
heads. But if you flip five coins, there is a lot better than 50% chance at least one of
them will come up heads.) In fact, if you make two contrasts, each at the .05 signifi-
cance level, there is about a .10 chance that at least one will come out significant just
by chance (that is, if the null hypothesis is true). If you make three planned contrasts
at the .05 level, there is about a .15 chance.
A widely used approach for dealing with this problem with planned contrasts is
the Bonferroni procedure. The idea of the Bonferroni procedure is that you use a more
stringent significance level for each contrast. The result is that the overall chance of
any one of the contrasts being mistakenly significant is still reasonably low. For example,
if each of two planned contrasts used the .025 significance level, the overall
chance of any one of them being mistakenly significant would still be less than .05.
(That is, .05/2 = .025.) With three planned contrasts, you could use the .017 level
(.05/3 = .017).
The general principle is that the Bonferroni corrected cutoff you use is the true
significance level you want divided by the number of planned contrasts. Thus, if you
want to test your hypothesis at the .01 level and you will make three planned contrasts,
you would test each planned contrast using the .0033 significance level. That is,
.01/3 = .0033.
If you are doing your analyses on a computer, it gives exact significance probabil-
ities as part of the output—that is, it might give a p of .037 or .0054, not just whether you
are beyond the .05 or .01 level. However, if you are using tables, normally only the .01
or .05 cutoffs would be available. Thus, even though almost all researchers use comput-
ers for their analyses, this situation has led to some traditions that are still followed
today. Specifically, for simplicity, when the Bonferroni corrected cutoff might be .017
or even .025, researchers often use the .01 significance level. Also, if there are only two
planned contrasts (or even three), it is common for researchers not to correct at all.
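The arithmetic behind the Bonferroni procedure is easy to check directly. The sketch below is illustrative (our code, not the text's): it figures both the familywise chance of a spurious significant result and the corrected per-contrast level.

```python
# Arithmetic behind the Bonferroni procedure (illustrative, not from the text).
def familywise_chance(alpha, k):
    """Chance that at least one of k independent tests comes out significant
    when the null hypothesis is true for all of them."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(alpha, k):
    """More stringent per-contrast level: desired overall level divided by k."""
    return alpha / k

print(familywise_chance(0.05, 2))            # about .10, as the text says
print(familywise_chance(0.05, 3))            # about .15
print(bonferroni_alpha(0.05, 2))             # 0.025
print(round(bonferroni_alpha(0.05, 3), 3))   # 0.017
print(round(bonferroni_alpha(0.01, 3), 4))   # 0.0033
```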
Bonferroni procedure: multiple-comparison procedure in which the total alpha percentage is divided among the set of comparisons so that each is tested at a more stringent significance level.
How are you doing?
1. (a) What is a planned contrast? (b) Why do researchers make them?
2. How is the procedure for figuring a planned contrast between two particular
groups different from the overall analysis of variance?
3. A study has three groups of 25 participants each in the overall analysis of variance,
and S²Within is 100. The researcher makes a single planned contrast between
a group that has a mean of 10 and another group that has a mean of 16.
Is it significant? (Use the .05 significance level.)
4. (a) Why do researchers making planned contrasts need to make the Bonferroni
correction? (b) What is the principle of the Bonferroni correction?
5. If a researcher is making four planned contrasts using the .05 significance level,
what would be the Bonferroni corrected significance level?
Post Hoc Comparisons
As we have noted, rejecting the null hypothesis in an analysis of variance implies that
the population means are not all the same, but it does not tell you which population
means differ from which. As you learned in the preceding section on planned contrasts,
researchers often plan specific comparisons based on theory or practical considerations.
Sometimes, however, researchers take a more exploratory approach, for example,
comparing all the different pairings of means to discover which ones do and do not
differ significantly. (We call this making pairwise comparisons, because you are com-
paring all possible pairings of means.) That is, after the study is done, the researcher
is fishing through the results to see which groups differ from each other. These are
called post hoc comparisons (or a posteriori comparisons) because they are after
the fact and not planned in advance.
In post hoc comparisons, all possible comparisons have to be taken into account
when figuring the overall chance of any one of them turning out significant. Using the
Bonferroni procedure for post hoc comparisons is safe, in the sense that you are confi-
dent you won’t get too many results significant by chance. But in post hoc comparisons
there are often so many comparisons to consider that the overall significance level is
divided into such a small number by the Bonferroni procedure that getting any one
post hoc comparisons: multiple comparisons, not specified in advance; procedure conducted as part of an exploratory analysis after an analysis of variance.
Answers
1. (a) A planned contrast is a focused comparison of two groups in an overall analysis of variance that the researcher planned in advance of the study based on a theory or practical issue.
(b) Researchers make them because they are more likely to be of theoretical or practical interest than the overall difference among means.
2. The procedure for figuring a planned contrast between two particular groups is the same except that you make the between-groups estimate using only the means of the two groups being compared.
3. It is significant. For the between-groups population variance estimate for the planned contrast,
●A Estimate the variance of the distribution of means: GM = (10 + 16)/2 = 13; dfBetween = 2 – 1 = 1; S²M = [(10 – 13)² + (16 – 13)²]/1 = 18.
●B Figure the estimated variance of the population of individual scores: S²Between = (18)(25) = 450.
The within-groups estimate is the same as the overall within-groups estimate, 100.
F = 450/100 = 4.5.
The cutoff for df = 1, 72 (actually 1, 70, since 1, 72 is not in the table) is 3.98. You can reject the null hypothesis. The planned contrast is significant.
4. (a) With more than one contrast, the chance of any one coming out significant is greater than the direct significance level used.
(b) You divide your overall desired true significance level by the number of contrasts. This way, the chance of any one of them coming out significant is taken into account.
5. The Bonferroni corrected significance level is .05/4 = .0125.
comparison to come out significant would be a long shot. For example, with four groups,
there are six possible pairs to compare; so using a Bonferroni correction and an overall
significance level of .05, you would have to test each comparison at .05/6 or .0083. If there
are five groups, there are 10 possible comparisons; .05 overall becomes .005 for each
comparison. And so forth. Thus, the power for any one comparison becomes very low.
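The counts above follow the standard pairs formula, k(k – 1)/2. The snippet below is an illustrative sketch, not code from the text.

```python
# Pairwise-comparison counts and the Bonferroni per-comparison level
# (illustrative arithmetic, not from the text).
def n_pairs(k):
    """Number of possible pairings among k group means: k(k - 1)/2."""
    return k * (k - 1) // 2

for k in (4, 5):
    print(k, n_pairs(k), round(0.05 / n_pairs(k), 4))
# 4 groups -> 6 pairs, .0083 each; 5 groups -> 10 pairs, .005 each
```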
Of course, you might think, “I’ll just test the pairs of means that have the biggest
difference so that the number of comparisons won’t be so great.” Unfortunately, this
strategy won’t work. Since you did not decide in advance which pairs of means would
be compared, when exploring after the fact, you have to take into account that any of
the pairs might have been the biggest ones. So unless you made specific predictions
in advance—and had a sound theoretical or practical basis for those predictions—all
the possible pairings have to be counted.
For this reason, statisticians have developed a variety of procedures to use in
these fishing expeditions. These procedures attempt to keep the overall risk of a Type I
error at some level like .05, while at the same time not too drastically reducing statis-
tical power. You may see some of these referred to in articles you read, described by
the names of their developers; the Scheffé test and Tukey test are the most widely
used, with the Newman-Keuls and Duncan procedures almost as common. Which procedure
is best under various conditions remains a topic of dispute. You can learn the
details about the possibilities and controversies in intermediate statistics.
The Scheffé Test
As a post hoc test, the Scheffé method has the advantage of being the most widely
applicable method. We say that because it is the only one that can be used when you
are making relatively simple comparisons (such as the ones we have considered in
which two groups are being compared), as well as when you are making more com-
plex comparisons (for example, comparing the average of two groups to a third group).
Its disadvantage, however, is that, compared to the Tukey and other procedures, it is
the most conservative. That is, for any given post hoc comparison, its chance of being
significant using the Scheffé is usually still better than the Bonferroni, but worse than
the Tukey or any of the other post hoc contrasts.
To use the Scheffé test, you first figure the F for your comparison in the usual way.
But then you divide that F by the overall study's dfBetween (the number of groups
minus 1). You then compare this much smaller F to the overall study's F cutoff.
Here is an example. Recall that for the comparison of the Criminal Record group
versus the No Information group, we figured an F of 4.22. Since the overall dfBetween
in that study was 2 (there were three groups), for a Scheffé test, you would actually
consider the F for this contrast to be an F of only 4.22/2 = 2.11. You would then
compare this Scheffé corrected F of 2.11 to the cutoff F for the overall between effect
(in this example, the F for df = 2, 12), which was 3.89. Thus, the comparison is not
significant using the Scheffé test.

Scheffé test: method of figuring the significance of post hoc comparisons that takes into account all possible comparisons that could be made.
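The Scheffé correction described here amounts to a one-line computation. The helper below is an illustrative sketch (the function name is ours, not library code).

```python
# Scheffe test sketch: divide the contrast's F by the overall study's
# df_between and compare to the overall F cutoff (our helper, not from the text).
def scheffe_significant(f_contrast, df_between_overall, f_cutoff_overall):
    return (f_contrast / df_between_overall) > f_cutoff_overall

# Criminal record contrast: F = 4.22, overall df_between = 2, overall cutoff F = 3.89
print(scheffe_significant(4.22, 2, 3.89))   # False; 4.22/2 = 2.11 falls short of 3.89
```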
How are you doing?
1. (a) What are post hoc comparisons? (b) Why do researchers make them?
2. (a) Why do researchers typically not use the Bonferroni procedure for post hoc
comparisons?
(b) What is the advantage over the Bonferroni procedure of procedures such
as the Tukey and Scheffé tests?
3. What are the (a) advantages and (b) disadvantages of the Scheffé procedure
versus other post hoc tests (such as the Tukey)?
4. How do you carry out the Scheffé procedure?
proportion of variance accounted for (R²): proportion of the total variation of scores from the grand mean that is accounted for by the variation between the means of the groups.
Effect Size and Power for the Analysis of Variance
Effect Size
Effect size for the analysis of variance is a little more complex than for a t test. With
the t test, you took the difference between the two means and divided by the standard
deviation. In the analysis of variance, you have more than two means; so it is not
obvious just what is the equivalent to the difference between the means—the numer-
ator in figuring effect size.3 Thus, in this section we consider a quite different ap-
proach to effect size, the proportion of variance accounted for (R2).
To be precise, R² is the proportion of the total variation of scores from the grand
mean that is accounted for by the variation between the means of the groups. (In other
words, you consider how much of the variance in the measured variable—such as
ratings of guilt—is accounted for by the variable that divides up the groups—such as
what experimental condition one is in.) In terms of a formula,
R2 = (S2Between)(dfBetween)/[(S2Between)(dfBetween) + (S2Within)(dfWithin)]    (9–7)

The between- and within-groups degrees of freedom are included in the formula
to take into account the number of participants and the number of groups used in the
study.
Consider once again the criminal record study. In that example, S2Between = 21.70, dfBetween = 2, S2Within = 5.33, and dfWithin = 12. Thus, the
5. Suppose in a study with four groups of 50 participants each, for a particular
contrast, you figure an F of 12.60. Using a Scheffé test, is this significant at the
.05 significance level?
Answers
1.(a) Post hoc comparisons are comparisons figured after an analysis of vari-
ance, such as between two groups, that were not planned in advance.
(b) Researchers make them as an exploratory procedure to see what patterns
of relations among populations are suggested by the data over and above any
comparisons that were planned in advance.
2.(a) In any follow-up analysis, there are usually so many possible post hoc com-
parisons that, if you used the Bonferroni procedure, your corrected significance
level would be so extreme that it would be very hard for any result to be significant.
(b) The effect of the Tukey and Scheffé tests, and others like them, when doing
multiple post hoc comparisons and correctly adjusting for the many compar-
isons being made, is that a result does not have to be quite so extreme to be
significant.
3.(a) You can use the Scheffé procedure for any number of comparisons, includ-
ing complex comparisons.
(b) The Scheffé procedure is more conservative: the chance of any comparison
being significant is less.
4. Figure the comparison in the usual way, but divide the F by the overall study's dfBetween and use the overall study's F cutoff.
5. Overall study's dfBetween = 4 - 1 = 3; dfWithin = 49 + 49 + 49 + 49 = 196. Scheffé corrected F = 12.60/3 = 4.20. Overall study's .05 cutoff F (df = 3, 196; closest on the table, df = 3, 100) is 2.70. Thus, even with the Scheffé correction, this comparison is significant.
The proportion of variance accounted for is the between-groups population variance estimate multiplied by the between-groups degrees of freedom, divided by the sum of the between-groups population variance estimate multiplied by the between-groups degrees of freedom, plus the within-groups population variance estimate multiplied by the within-groups degrees of freedom.
proportion of the total variation accounted for by the variation between groups
is (21.70)(2)/[(21.70)(2) + (5.33)(12)], which is .40 (or 40%). In terms of the
formula,

R2 = (S2Between)(dfBetween)/[(S2Between)(dfBetween) + (S2Within)(dfWithin)] = (21.70)(2)/[(21.70)(2) + (5.33)(12)] = 43.40/107.36 = .40
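Formula 9–7 is easy to verify with a short Python sketch. This is an illustration using the criminal record study's numbers; the function name is ours, not the text's.

```python
def r_squared(s2_between, df_between, s2_within, df_within):
    # Formula 9-7: proportion of variance accounted for.
    between_part = s2_between * df_between
    within_part = s2_within * df_within
    return between_part / (between_part + within_part)

# Criminal record study: S2Between = 21.70, dfBetween = 2,
# S2Within = 5.33, dfWithin = 12.
r2 = r_squared(21.70, 2, 5.33, 12)
print(round(r2, 2))  # 0.4
```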
What if the between-groups and within-groups variance estimates are not available, as is often true in published studies? It is also possible to figure R2 directly from
F and the degrees of freedom. The formula is

R2 = (F)(dfBetween)/[(F)(dfBetween) + dfWithin]    (9–8)
For example, in the criminal record study,

R2 = (F)(dfBetween)/[(F)(dfBetween) + dfWithin] = (4.07)(2)/[(4.07)(2) + 12] = 8.14/20.14 = .40
You should also know that another common name for this measure of effect size (besides R2) is η2, the Greek letter eta squared; η2 is also known as the correlation ratio.
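Formula 9–8 can be sketched the same way (again an illustration, using numbers from the criminal record study and from "How are you doing?" question 3 below):

```python
def r_squared_from_f(f_ratio, df_between, df_within):
    # Formula 9-8: R^2 directly from F and the degrees of freedom.
    numerator = f_ratio * df_between
    return numerator / (numerator + df_within)

# Criminal record study: F = 4.07, dfBetween = 2, dfWithin = 12.
print(round(r_squared_from_f(4.07, 2, 12), 2))  # 0.4

# Three groups of 18 participants each, F = 4.50: dfWithin = 17 * 3 = 51.
print(round(r_squared_from_f(4.50, 2, 51), 2))  # 0.15
```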
The proportion of variance accounted for is a useful measure of effect size
because it has the direct meaning suggested by its name. [Further, researchers are
familiar with R2 from its use in regression (see Chapter 12), and its square root, R,
is a kind of correlation coefficient that is very familiar to most researchers (see
Chapter 11).]
R2 has a minimum of 0 and a maximum of 1. However, in practice it is rare in
most psychology research for an analysis of variance to have an R2 even as high
as .20. Cohen's (1988) conventions for R2 are .01, a small effect size; .06, a medium
effect size; and .14, a large effect size.
Power
Table 9–9 shows the approximate power for the .05 significance level for small,
medium, and large effect sizes; sample sizes of 10, 20, 30, 40, 50, and 100 per group;
and three, four, and five groups.4
Consider a planned study with five groups of 10 participants each and an ex-
pected large effect size (.14). Using the .05 significance level, the study would have
a power of .56. Thus, even if the research hypothesis is in fact true and has a large ef-
fect size, there is only a little greater than even chance (56%) that the study will come
out significant.
As we have noted in previous chapters, determining power is especially useful
when interpreting the practical implication of a nonsignificant result. For example, sup-
pose that you have read a study using an analysis of variance with four groups of 30
participants each, and there is a nonsignificant result at the .05 level. Table 9–9 shows
a power of only .13 for a small effect size. This suggests that even if such a small
effect exists in the population, this study would be very unlikely to have come out
The proportion of variance accounted for is the F ratio multiplied by the between-groups degrees of freedom (the degrees of freedom for the between-groups population variance estimate), divided by the sum of the F ratio multiplied by the between-groups degrees of freedom, plus the degrees of freedom for the within-groups population variance estimate.
eta squared (η2): common name for the R2 measure of effect size for the analysis of variance. Also called correlation ratio.
significant. But the table shows a power of .96 for a large effect size. This suggests
that if a large effect existed in the population, it almost surely would have shown up
in that study.
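The table lookup described in this example can be mimicked with a small dictionary. The values below are the four-groups, n = 30 row of Table 9–9; the dictionary itself is our sketch for illustration, not anything from the text.

```python
# Approximate power at the .05 level, four groups of 30 each (Table 9-9).
power_by_effect_size = {"small": .13, "medium": .61, "large": .96}

# A nonsignificant result says little about a small effect (power .13)
# but argues strongly against a large one (power .96).
print(power_by_effect_size["small"], power_by_effect_size["large"])
```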
Planning Sample Size
Table 9–10 gives the approximate number of participants you need in each group for
80% power at the .05 significance level for estimated small, medium, and large effect
sizes for studies with three, four, and five groups.5
Table 9–9 Approximate Power for Studies Using the Analysis of Variance Testing Hypotheses
at the .05 Significance Level

                                        Effect Size
Participants per Group (n)    Small (R2 = .01)  Medium (R2 = .06)  Large (R2 = .14)
Three groups (dfBetween = 2)
  10                                .07               .20               .45
  20                                .09               .38               .78
  30                                .12               .55               .93
  40                                .15               .68               .98
  50                                .18               .79               .99
  100                               .32               .98                *
Four groups (dfBetween = 3)
  10                                .07               .21               .51
  20                                .10               .43               .85
  30                                .13               .61               .96
  40                                .16               .76               .99
  50                                .19               .85                *
  100                               .36               .99                *
Five groups (dfBetween = 4)
  10                                .07               .23               .56
  20                                .10               .47               .90
  30                                .13               .67               .98
  40                                .17               .81                *
  50                                .21               .90                *
  100                               .40                *                 *

*Nearly 1.
Table 9–10 Approximate Number of Participants Needed in Each Group (Assuming Equal
Sample Sizes) for 80% Power for the One-Way Analysis of Variance Testing
Hypotheses at the .05 Significance Level

                                        Effect Size
                              Small (R2 = .01)  Medium (R2 = .06)  Large (R2 = .14)
Three groups (dfBetween = 2)        322               52                21
Four groups (dfBetween = 3)         274               45                18
Five groups (dfBetween = 4)         240               39                16
For example, suppose you are planning a study involving four groups and you
expect a small effect size (and will use the .05 significance level). For 80% power,
you would need 274 participants in each group, a total of 1,096 in all. However, sup-
pose you could adjust the research plan so that it was now reasonable to predict a
large effect size (perhaps by using more accurate measures and a stronger experi-
mental manipulation). Now you would need only 18 in each of the four groups, for
a total of 72.
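The arithmetic in this planning example can be sketched as follows. The per-group numbers are the four-groups row of Table 9–10; the dictionary is our own illustration.

```python
# Participants needed per group for 80% power at .05, four groups (Table 9-10).
per_group = {"small": 274, "medium": 45, "large": 18}
n_groups = 4

total_small = per_group["small"] * n_groups  # 1096 in all
total_large = per_group["large"] * n_groups  # 72 in all
print(total_small, total_large)
```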
How are you doing?
1. (a) Why is the method of figuring effect size for analysis of variance quite dif-
ferent from that used for the t tests? (b) Explain the logic of why proportion of
variance accounted for can serve as an effect size in analysis of variance.
2. (a) Write the formula for effect size in analysis of variance using S2Between and
S2Within; (b) define each of the symbols; (c) give an alternative symbol for R2;
and (d) figure the effect size for a study in which S2Between = 12.22, S2Within = 7.20, dfBetween = 2, and dfWithin = 8.
3. (a) Write the formula for effect size in analysis of variance from a study in which
only the F ratio and degrees of freedom are available; (b) define each of the sym-
bols; and (c) figure the effect size for a study with 18 participants in each of the
three groups and an F of 4.50.
4. What is the power of a study with four groups of approximately 40 participants
each to be tested at the .05 significance level, in which the researchers predict
a large effect size?
5. About how many participants do you need in each group for 80% power in a
planned study with five groups in which you predict a medium effect size and
will be using the .05 significance level?
Answers
1. (a) With t tests, your focus is on the difference between two means; there is no direct equivalent in the analysis of variance. (b) You are figuring the percentage of the total variation among the scores that is accounted for by which group the participant is in.
2. (a) The formula for effect size in analysis of variance: R2 = (S2Between)(dfBetween)/[(S2Between)(dfBetween) + (S2Within)(dfWithin)]. (b) R2 is the proportion of variance accounted for; S2Between is the between-groups population variance estimate; dfBetween is the between-groups degrees of freedom (number of groups minus 1); S2Within is the within-groups population variance estimate; dfWithin is the within-groups degrees of freedom (the sum of the degrees of freedom for each group's population variance estimate). (c) η2. (d) Effect size: .30.
3. (a) R2 = (F)(dfBetween)/[(F)(dfBetween) + dfWithin]. (b) R2 is the proportion of variance accounted for; F is the F ratio from the study; dfBetween is the between-groups degrees of freedom; and dfWithin is the within-groups degrees of freedom. (c) dfBetween = 3 - 1 = 2; dfWithin = 17 + 17 + 17 = 51. R2 = (F)(dfBetween)/[(F)(dfBetween) + dfWithin] = (4.5)(2)/[(4.5)(2) + 51] = 9/60 = .15.
4. The power of the study is .99.
5. The number of participants needed in each group is 39.
Controversy: Omnibus Tests versus Planned Contrasts
The analysis of variance is commonly used in situations comparing three or more
groups. (If you are comparing two groups, you can use a t test.) However, following
the logic we introduced earlier, Rosnow and Rosenthal (1989) argue that such diffuse
or omnibus tests are not very useful. They say that, in almost all cases when we test
the overall difference among three or more groups, “we have tested a question in
which we almost surely are not interested” (p. 1281). In which questions are we in-
terested? We are interested in specific comparisons, such as between two particular
groups.
Rosnow and Rosenthal (1989; see also Furr & Rosenthal, 2003) advocate that,
when figuring an analysis of variance, you should analyze only planned contrasts.
These should replace entirely the overall F test (that is, the diffuse or omnibus F test)
for whether you can reject the hypothesis of no difference among population means.
Traditionally, planned contrasts, when used at all, are a supplement to the overall
F test. So this has been a rather revolutionary idea.
Consider an example. Orbach and colleagues (1997) compared a group of suici-
dal mental hospital patients (individuals who had made serious suicide attempts), non-
suicidal mental hospital patients with similar diagnoses, and a control group of
volunteers from the community. The purpose of the study was to test the theory that
suicidal individuals have a higher tolerance for physical pain. The idea is that their
higher pain threshold makes it easier for them to do the painful acts usually involved
in suicide. The researchers carried out standard pain threshold and other sensory tests
and administered a variety of questionnaires to all three groups. Here is how they
describe their analysis:
To examine the study hypothesis, we performed a set of two linear contrasts for each
pain measure. . . . The first linear contrast, suicidality contrast, compared the suicidal
group with the two nonsuicidal groups (psychiatric inpatients and control partici-
pants). The second contrast compared the two nonsuicidal groups. . . . We did not
make a previous omnibus F because we conducted preplanned group comparisons
testing the study hypothesis. Because of multiple comparisons needed, the critical
alpha was set at .01, to avoid Type I error. . . .
The suicidality contrast was significant for thermal sensation threshold,
F(1, 95) = 21.64, p < .01; pain threshold, F(1, 95) = 23.65, p < .01; pain tolerance, F(1, 95) = 6.55, p < .01; and maximum tolerance, F(1, 95) = 16.05. No signif-
icant difference was found between the suicidal and nonsuicidal groups in the
magnitude estimate measure. An examination of the means . . . supports our main
hypothesis: Suicidal participants, as expected, had high sensation and pain thresholds,
high pain tolerance, and were more likely to tolerate the maximum temperature
administered than inpatients and control participants. Interestingly, the second set of
contrasts revealed no significant differences between the psychiatric inpatients and
control participants in any of the five pain measures. (p. 648)
The study by Orbach and colleagues exemplifies Rosnow and Rosenthal's
advice to use planned contrasts instead of an overall analysis of variance. But, although the idea was originally proposed nearly two decades ago, this approach has
not yet been widely adopted and is still controversial. The main concern is much
like the issue we considered in Chapter 4 regarding one-tailed and two-tailed tests.
If we adopt the highly targeted, planned contrasts recommended by Rosnow and
Rosenthal, critics argue, we lose out on finding unexpected differences not initially
planned, and we put too much control of what is found in the hands of the researcher
(versus nature).
Analyses of Variance in Research Articles
Analyses of variance (of the kind we have considered in this chapter) are usually de-
scribed in a research article by giving the F, the degrees of freedom, and the signifi-
cance level. For example, "F(3, 68) = 5.21, p < .01." The means for the groups
usually are given in a table, although if there are only a few groups and only one or
a few measures, the means may be given in the regular text of the article. Usually, there
is also some report of follow-up analyses, such as planned contrasts.
Returning again to the criminal record study example, we could describe the
analysis of variance results this way: “The means for the Criminal Record, Clean
Record, and No Information groups were 8.0, 4.0, and 5.0, respectively. These were
significantly different, F(2, 12) = 4.07, p < .05. We also carried out two planned
contrasts: The Criminal Record versus the No Information condition, F(1, 12) = 4.22,
p < .10; and the Criminal Record versus the Clean Record condition, F(1, 12) =
7.50, p < .05. The first contrast approached significance, but after a Bonferroni cor-
rection (for two planned contrasts), it would not even reach the .10 level.”
Note that it is also common for researchers to report planned contrasts using t tests.
These are not ordinary t tests for independent means, but rather special t tests for the
comparisons that are mathematically equivalent to the method we described—that is,
the results in terms of significance are identical (see Chapter Note 2).
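The equivalence noted here comes down to the fact that, for a comparison with 1 degree of freedom in the numerator, F(1, df) is simply t squared. The sketch below illustrates that general relationship using the contrast F of 7.50 reported above; it is our illustration, not a formula given in the text.

```python
import math

# For a 1-df comparison, F(1, df) = t**2, so a planned contrast can be
# reported either way with identical significance results.
f_contrast = 7.50                     # Criminal vs. Clean Record contrast
t_equivalent = math.sqrt(f_contrast)  # about 2.74
print(round(t_equivalent, 2))         # 2.74
```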
Researchers often report results of post hoc comparisons among all pairs of means.
The most common method of doing this is by putting small letters by the means in the
tables. Usually, means with the same letter are not significantly different from each
other; those with different letters are. For example, Table 9–11 presents the actual
results on the love experience measures in the Hazan and Shaver (1987) study (our
Table 9–11 Love Subscale Means for the Three Attachment Types (Newspaper Sample)

Scale Name                 Avoidant   Anxious/Ambivalent   Secure   F(2, 571)
Happiness                   3.19a          3.31a            3.51b    14.21***
Friendship                  3.18a          3.19a            3.50b    22.96***
Trust                       3.11a          3.13a            3.43b    16.21***
Fear of closeness           2.30a          2.15a            1.88b    22.65***
Acceptance                  2.86a          3.03b            3.01b     4.66**
Emotional extremes          2.75a          3.05b            2.36c    27.54***
Jealousy                    2.57a          2.88b            2.17c    43.91***
Obsessive preoccupation     3.01a          3.29b            3.01a     9.47***
Sexual attraction           3.27a          3.43b            3.27a     4.08*
Desire for union            2.81a          3.25b            2.69a    22.67***
Desire for reciprocation    3.24a          3.55b            3.22a    14.90***
Love at first sight         2.91a          3.17b            2.97a     6.00**

Note: Within each row, means with different subscripts differ at the .05 level of significance according to a Scheffé test.
*p < .05; **p < .01; ***p < .001.
Source: Hazan, C., & Shaver, P. (1987). Romantic love conceptualized as an attachment process. Journal of Personality and
Social Psychology, 52, 511–524. Published by the American Psychological Association. Reprinted with permission.
example at the start of the chapter). Consider the first row (the happiness results).
The avoidant and anxious-ambivalent groups are not significantly different from each
other, since they have the same letter (a). But both are significantly different on hap-
piness compared to the secure group, which has a different letter (b). In the jealousy
row, however, all three groups differ from one another.
When reading results of post hoc comparisons, you will see many different pro-
cedures named. For example, Table 9–11 (from the Hazan and Shaver study) explic-
itly mentions that the results are “according to a Scheffé test.”
Advanced Topic: The Structural Model
in the Analysis of Variance
This chapter introduced the basic logic of the analysis of variance. Building on this
understanding, we now briefly describe an alternative but mathematically equivalent
way of understanding the analysis of variance. This alternative is called the structural
model. The core logic you learned earlier in the chapter still applies. However, the
structural model provides a different and more flexible way of figuring the two pop-
ulation variance estimates. Understanding the structural model provides deeper in-
sights into the underlying logic of the analysis of variance, including helping you
understand the way analysis of variance results are laid out in computer printouts.
Also, the structural model method more easily handles the situation in which the number of
individuals in each group is not equal. Finally, the structural model method is related
to a fundamental mathematical approach to which we want to expose those of you who
might be going on to more advanced statistics courses.
Principles of the Structural Model
Dividing Up the Deviations
The structural model is all about deviations. To start with, there is the deviation of a
score from the grand mean. In the criminal record example earlier in the chapter (see
Tables 9–3 and 9–4), the grand mean of the 15 scores was 85/15 = 5.67.
The deviation from the grand mean is just the beginning. You then think of this
deviation from the grand mean as having two parts: (a) the deviation of the score from
the mean of its group and (b) the deviation of the mean of its group from the grand
mean. Consider a participant in the criminal record study who rated the defendant's
guilt as a 10. The grand mean of all participants' guilt ratings was 5.67. This person's
score has a total deviation of 4.33 (that is, 10 - 5.67 = 4.33). The mean of the Criminal Record group by itself was 8. Thus, the deviation of this person's score from his
or her group's mean is 2 (that is, 10 - 8 = 2), and the deviation of that group's mean
from the grand mean is 2.33 (that is, 8 - 5.67 = 2.33). Note that these two deviations
(2 and 2.33) add up to the total deviation of 4.33. This is shown in Figure 9–7. We
encourage you to study this figure until you grasp it well.
encourage you to study this figure until you grasp it well.
Summing the Squared Deviations
The next step in the structural model is to square each of these deviation scores and
add up the squared deviations of each type for all the participants. This gives a sum
of squared deviations for each type of deviation score. It turns out that the sum of
squared deviations of each score from the grand mean is equal to (a) the sum of the
squared deviations of each score from its group’s mean plus (b) the sum of the squared
structural model: way of understanding the analysis of variance as a division of the deviation of each score from the overall mean into two parts: the variation in groups (its deviation from its group's mean) and the variation between groups (its group's mean's deviation from the overall mean); an alternative (but mathematically equivalent) way of understanding the analysis of variance.
deviations of each score’s group’s mean from the grand mean. This principle can be
stated as a formula:

Σ(X - GM)2 = Σ(X - M)2 + Σ(M - GM)2   or   SSTotal = SSWithin + SSBetween    (9–9)

In this formula, Σ(X - GM)2 or SSTotal is the sum of squared deviations of each
score from the grand mean, completely ignoring the group a score is in. Σ(X - M)2 or
SSWithin is the sum of squared deviations of each score from its group's mean, added
up for all participants. Σ(M - GM)2 or SSBetween is the sum of squared deviations of
each score's group's mean from the grand mean—again, added up for all participants.
This rule applies only to the sums of the squared deviations. For each individual
score, the deviations themselves, but not the squared deviations, always add up.
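The summing rule can be verified directly on the criminal record scores. This is our sketch; the scores are the fictional data from Tables 9–3 and 9–12, and because it keeps the grand mean exact at 85/15 rather than rounding it to 5.67, its totals come out 107.33 rather than the text's 107.35.

```python
groups = {
    "criminal": [10, 7, 5, 10, 8],
    "clean": [5, 1, 3, 7, 4],
    "none": [4, 6, 9, 3, 3],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)                 # 85/15
group_means = {g: sum(s) / len(s) for g, s in groups.items()}  # 8, 4, 5

ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
ss_within = sum((x - group_means[g]) ** 2
                for g, s in groups.items() for x in s)
ss_between = sum(len(s) * (group_means[g] - grand_mean) ** 2
                 for g, s in groups.items())

# SSTotal equals SSWithin + SSBetween.
print(round(ss_total, 2), round(ss_within + ss_between, 2))
```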
From the Sums of Squared Deviations
to the Population Variance Estimates
Now we are ready to use these sums of squared deviations to figure the needed population variance estimates for an analysis of variance. To do this, you divide each sum of
squared deviations by an appropriate degrees of freedom. The between-groups population variance estimate (S2Between or MSBetween) is the sum of squared deviations of each
score's group's mean from the grand mean (SSBetween) divided by the degrees of freedom on which it is based (dfBetween, the number of groups minus 1). Stated as a formula,

S2Between = Σ(M - GM)2/dfBetween   or   MSBetween = SSBetween/dfBetween    (9–10)

The within-groups population variance estimate (S2Within or MSWithin) is the sum
of squared deviations of each score from its group's mean (SSWithin) divided by the
total degrees of freedom on which this is based (dfWithin; the sum of the degrees of
freedom over all the groups—the number of scores in the first group minus 1, plus the
number in the second group minus 1, etc.). Stated as a formula,

S2Within = Σ(X - M)2/dfWithin   or   MSWithin = SSWithin/dfWithin    (9–11)

Notice that we have ignored the sum of squared deviations of each score from the
grand mean (SSTotal). This sum of squares is useful mainly for checking our arithmetic. Recall that SSTotal = SSWithin + SSBetween.
Figure 9–7 Example from the fictional criminal record study of the deviation of one individual's score from the grand mean being that individual's score's deviation from his or her
group's mean plus that individual's group's mean's deviation from the grand mean. (Score = 10, group mean = 8, grand mean = 5.67; the score's deviation from its group's mean is 10 - 8 = 2, the group's mean's deviation from the grand mean is 8 - 5.67 = 2.33, and the score's deviation from the grand mean is 10 - 5.67 = 4.33.)
The sum of squared deviations of each score from the grand mean is the sum of squared deviations of each score from its group's mean plus the sum of squared deviations of each score's group's mean from the grand mean.

The between-groups population variance estimate is the sum of squared deviations of each score's group's mean from the grand mean divided by the degrees of freedom for the between-groups population variance estimate.

The within-groups population variance estimate is the sum of squared deviations of each score from its group's mean divided by the degrees of freedom for the within-groups population variance estimate.
SSTotal: sum of squared deviations of each score from the overall mean of all scores, completely ignoring the group a score is in.
SSWithin: sum of squared deviations of each score from its group's mean.
SSBetween: sum of squared deviations of each score's group's mean from the grand mean.
Figure 9–8 again shows the division of the deviation score into two parts, but
this time emphasizes which deviations are associated with which population variance
estimates.
Relation of the Structural Model Method to the Method
You Learned Earlier in the Chapter
The methods we have just described for figuring the within-groups and between-
groups population variance estimates using the structural model approach give ex-
actly the same result as the methods you learned earlier in the chapter. (If you enjoy
algebra, you might see whether you can derive the earlier formulas from the ones you
have just learned.) However, the procedures you follow to figure those estimates are
quite different. In the structural model method, when figuring the within-groups vari-
ance estimate method, you never actually figure the variance estimate for each group
and average them. Similarly, for the between-groups estimate, with the structural
model method, you never multiply anything by the number of scores in each sample.
The point is that, with either method, you get the same within-groups and between-
groups variance estimates, and thus the same F and the same overall result.
The deeper logic of the analysis of variance with the structural model is also es-
sentially the same as what you learned earlier in the chapter, with a twist. The twist is
one of emphasis. The method you learned earlier in the chapter emphasizes entire
groups, comparing a variance based on differences among group means to a variance
based on averaging variances of the groups. The structural model method emphasizes
individual scores. It compares a variance based on deviations of individual scores’
groups’ means from the grand mean to a variance based on deviations of individual
scores from their group’s mean. The method earlier in the chapter focuses directly on
what contributes to the overall population variance estimates; the structural model
method focuses directly on what contributes to the divisions of the deviations of scores
from the grand mean.
An Example
Table 9–12 shows all the figuring using the structural model for an analysis of vari-
ance for the criminal record study. This table shows all three types of deviations and
squared deviations for each score. For example, for the first person, the deviation
from the grand mean is 4.33 (the score of 10 minus the grand mean of 5.67). This
Figure 9–8 The score's deviation from its group's mean is the basis for the within-groups population variance estimate; the group's mean's deviation from the grand mean is the
basis for the between-groups population variance estimate.
Table 9–12 Analysis of Variance for the Criminal Record Study (Fictional Data) Using the
Structural Model Method (Compare to Tables 9–3 and 9–4)

Criminal Record Group (M = 40/5 = 8)
 X    X - GM   (X - GM)2   X - M   (X - M)2   M - GM   (M - GM)2
10     4.33     18.75        2        4        2.33      5.43
 7     1.33      1.77       -1        1        2.33      5.43
 5     -.67       .45       -3        9        2.33      5.43
10     4.33     18.75        2        4        2.33      5.43
 8     2.33      5.43        0        0        2.33      5.43
Σ: 40           45.15                18                 27.15

Clean Record Group (M = 20/5 = 4)
 X    X - GM   (X - GM)2   X - M   (X - M)2   M - GM   (M - GM)2
 5     -.67       .45        1        1       -1.67      2.79
 1    -4.67     21.81       -3        9       -1.67      2.79
 3    -2.67      7.13       -1        1       -1.67      2.79
 7     1.33      1.77        3        9       -1.67      2.79
 4    -1.67      2.79        0        0       -1.67      2.79
Σ: 20           33.95                20                 13.95

No Information Group (M = 25/5 = 5)
 X    X - GM   (X - GM)2   X - M   (X - M)2   M - GM   (M - GM)2
 4    -1.67      2.79       -1        1        -.67       .45
 6      .33       .11        1        1        -.67       .45
 9     3.33     11.09        4       16        -.67       .45
 3    -2.67      7.13       -2        4        -.67       .45
 3    -2.67      7.13       -2        4        -.67       .45
Σ: 25           28.25                26                  2.25

Sums of squared deviations:
Σ(X - GM)2 or SSTotal = 45.15 + 33.95 + 28.25 = 107.35
Σ(X - M)2 or SSWithin = 18 + 20 + 26 = 64
Σ(M - GM)2 or SSBetween = 27.15 + 13.95 + 2.25 = 43.35
Check (SSTotal = SSWithin + SSBetween): SSTotal = 107.35; SSWithin + SSBetween = 64 + 43.35 = 107.35

Degrees of freedom:
dfTotal = N - 1 = 15 - 1 = 14
dfWithin = df1 + df2 + … + dfLast = (5 - 1) + (5 - 1) + (5 - 1) = 4 + 4 + 4 = 12
dfBetween = NGroups - 1 = 3 - 1 = 2
Check (dfTotal = dfWithin + dfBetween): 14 = 12 + 2

Population variance estimates:
S2Between or MSBetween = SSBetween/dfBetween = 43.35/2 = 21.68
S2Within or MSWithin = SSWithin/dfWithin = 64/12 = 5.33

F ratio: F = S2Between/S2Within or MSBetween/MSWithin = 21.68/5.33 = 4.07
deviation squared is 18.75. The deviation of the score from its group’s mean is 2;
this deviation squared is 4. Finally, the deviation of the score’s group’s mean from
the grand mean is 2.33; this deviation squared is 5.43. Notice that the deviations of
each score’s group’s mean from the grand mean (in this case, 2.33) is the same
number for all the scores in a group. At the bottom of each column, we have also
summed the squared deviations of each type.
The bottom part of Table 9–12 shows the analysis of variance figuring. First, you
figure the three sums of squared deviations (SSTotal, SSWithin, and SSBetween). The
next step is to check for accuracy. You do this following the principle that the sum of
squared deviations of each score from the grand mean comes out to the total of the
other two kinds of sums of squared deviations.
The degrees of freedom, the next step shown in the table, are figured the same
way as you learned earlier in the chapter. Then, the table shows the figuring of the two
crucial population variance estimates. You figure them by dividing each sum of squared
deviations by the appropriate degrees of freedom. Finally, the table shows the figur-
ing of the F ratio in the usual way—dividing the between-groups variance estimate
by the within-groups variance estimate. All these results, degrees of freedom, variance
estimates, and F come out exactly the same (within rounding error) as we figured ear-
lier in the chapter.
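This figuring is mechanical enough to sketch in code. Below is a minimal Python sketch of the three sums of squared deviations and the accuracy check; the function name is ours, and the data are the alcohol-treatment satisfaction scores from the worked-out example later in this chapter.

```python
def sums_of_squares(groups):
    """Structural-model sums of squares for a list of groups of scores."""
    scores = [x for g in groups for x in g]
    gm = sum(scores) / len(scores)  # grand mean of all scores
    ss_total = sum((x - gm) ** 2 for x in scores)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    # each score's group mean deviates from the grand mean; counted once per score
    ss_between = sum(len(g) * (sum(g) / len(g) - gm) ** 2 for g in groups)
    return ss_total, ss_within, ss_between

# Alcohol-treatment example from later in this chapter (Treatments A, B, C)
ss_total, ss_within, ss_between = sums_of_squares(
    [[8, 13, 10, 9], [7, 3, 8], [6, 4, 2]])
# Accuracy check: SS Total = SS Within + SS Between (102 = 36 + 66)
assert ss_total == ss_within + ss_between
```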
Analysis of Variance Tables
An analysis of variance table lays out the results of an analysis of variance based on
the structural model method. These kinds of charts are automatically produced by
most analysis of variance computer programs (see, for example, Figure 9–11 later in
the chapter). A standard analysis of variance table has five columns. Table 9–13 shows
an analysis of variance table for the criminal record study.
The first column in a standard analysis of variance table is labeled “Source”; it
lists the type of variance estimate or deviation score involved [“between” (groups),
“within” (groups), and “total”]. The next column is usually labeled “SS” (sum of
squares); it lists the different types of sums of squared deviations. The third column
is “df ” (the degrees of freedom of each type). The fourth column is “MS” (mean
square); this refers to mean squares, that is, MS is SS divided by df, the variance es-
timate. MS is, as usual, the same thing as S². However, in an analysis of variance table
the variance is almost always referred to as MS. The last column is “F,” the F ratio.
(In a computer printout there may be additional columns, listing the exact p value and
possibly effect size or confidence intervals.) Each row of the table refers to one of the
variance estimates. The first row is for the between-groups variance estimate. It is
usually listed under Source as “Between” or “Group,” although you will sometimes
see it called “Model” or “Treatment.” The second row is for the within-groups vari-
ance estimate, though it is sometimes labeled as “Error.” The final row is for the sum
of squares based on the total deviation of each score from the grand mean. Note, how-
ever, that computer printouts will sometimes use a different order for the columns
and will sometimes omit either SS or MS, but not both.
Table 9–13 Analysis of Variance Table for the Criminal Record Study (Fictional Data)
Source SS df MS F
Between 43.35 2 21.68 4.07
Within 64 12 5.33
Total 107.35 14
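The MS and F columns of such a table follow mechanically from the SS and df columns. A quick Python sketch using the Table 9–13 values (variable names ours):

```python
# SS and df values from Table 9-13 (criminal record study)
ss_between, df_between = 43.35, 2
ss_within, df_within = 64, 12

ms_between = ss_between / df_between  # mean square = SS divided by df
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within      # about 4.07, as in the table
```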
analysis of variance table: chart showing the major elements in figuring an analysis of variance using the structural model approach.
Table 9–14 Hypothesis Testing Steps for an Analysis of Variance Using the Structural Model
Approach (Equal- or Unequal-Sized Groups)
❶ Restate the question as a research hypothesis and a null hypothesis about the populations.
❷ Determine the characteristics of the comparison distribution.
a. The comparison distribution will be an F distribution.
b. The between-groups (numerator) degrees of freedom is the number of groups minus 1: dfBetween = NGroups – 1.
c. The within-groups (denominator) degrees of freedom is the sum of the degrees of freedom in each group (the number of scores in the group minus 1): dfWithin = df1 + df2 + … + dfLast.
d. Check the accuracy of your figuring by making sure that dfWithin and dfBetween sum to dfTotal (which is the total number of participants minus 1).
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
a. Decide the significance level.
b. Look up the appropriate cutoff in an F table, using the degrees of freedom from Step ❷.
❹ Determine your sample's score on the comparison distribution. This will be an F ratio.
a. Figure the mean of each group and the grand mean of all scores.
b. Figure the following deviations for each score:
i. Its deviation from the grand mean (X – GM).
ii. Its deviation from its group's mean (X – M).
iii. Its group's mean's deviation from the grand mean (M – GM).
c. Square each of these deviation scores.
d. Figure the sums of each of these three types of deviation scores (SS Total, SS Within, and SS Between).
e. Check the accuracy of your figuring by making sure that SS Within + SS Between = SS Total.
f. Figure the between-groups variance estimate: SS Between/dfBetween.
g. Figure the within-groups variance estimate: SS Within/dfWithin.
h. Figure the F ratio: F = S²Between/S²Within or F = MS Between/MS Within.
❺ Decide whether to reject the null hypothesis: Compare scores from Steps ❸ and ❹.
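Steps ❷ and ❹ can be collected into one function that works for equal- or unequal-sized groups. A sketch (names ours), checked against the alcohol-treatment worked-out example later in this chapter:

```python
def anova_f(groups):
    """F ratio and degrees of freedom via the structural model (Table 9-14)."""
    n = sum(len(g) for g in groups)
    gm = sum(sum(g) for g in groups) / n          # grand mean (Step 4a)
    df_between = len(groups) - 1                  # Step 2b
    df_within = sum(len(g) - 1 for g in groups)   # Step 2c
    ss_between = sum(len(g) * (sum(g) / len(g) - gm) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    # Steps 4f-4h: the two variance estimates and their ratio
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

f, df_b, df_w = anova_f([[8, 13, 10, 9], [7, 3, 8], [6, 4, 2]])
# about 6.42 with df = 2, 7, matching the worked-out example
```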
Table 9–15 Analysis of Variance Table Showing Symbols and Formulas for Figuring the Analysis of Variance

Symbols Corresponding to Each Part of an Analysis of Variance Table
Source    SS           df          MS                           F
Between   SS Between   dfBetween   MS Between (or S²Between)    F
Within    SS Within    dfWithin    MS Within (or S²Within)
Total     SS Total     dfTotal

Formulas for Each Part of an Analysis of Variance Table
Source    SS            df                        MS                      F
Between   Σ(M – GM)²    NGroups – 1               SS Between/dfBetween    MS Between/MS Within
Within    Σ(X – M)²     df1 + df2 + … + dfLast    SS Within/dfWithin
Total     Σ(X – GM)²    N – 1
TIP FOR SUCCESS
Check your understanding of the structural model method for analysis of variance by covering up portions of Table 9–15 and trying to recall the hidden material.
Summary of Procedures for an Analysis
of Variance Using the Structural Model
Table 9–14 summarizes the steps in an analysis of variance using the structural model
method. Note that the only difference from what you learned earlier in this chapter is
in Step ❹, substeps b through g (compare to Table 9–8). Table 9–15 shows an analy-
sis of variance table with the symbols for all the parts put in each section where the
numbers would usually go. It is followed by the same style of analysis of variance table
with the various formulas filled in where the numbers would usually go.6
Summary
1. The analysis of variance (ANOVA) is used to test hypotheses based on differences
among means of more than two samples. The procedure compares two estimates
of population variance. One, the within-groups estimate, is the average of the
variance estimates from each of the samples. The other, the between-groups es-
timate, is based on the variation among the means of the samples.
2. The F ratio is the between-groups estimate divided by the within-groups esti-
mate. The null hypothesis is that all the samples come from populations with the
same mean. If the null hypothesis is true, the F ratio should be about 1. This is
because the two population variance estimates are based on the same thing, the
variation within each of the populations (due to chance factors). If the research
hypothesis is true, so that the samples come from populations with different
means, the F ratio should be larger than 1. This is because the between-groups
estimate is now influenced by the variation both within the populations (due to
chance factors) and among them (due to a treatment effect). But the within-groups
estimate is still affected only by the variation within each of the populations.
3. When the samples are of equal size, the within-groups population variance esti-
mate is the ordinary average of the estimates of the population variance figured
from each sample. The between-groups population variance estimate is done in two
steps. First, you estimate the variance of the distribution of means based on the
means of your samples. (This is figured with the usual formula for estimating pop-
ulation variance from sample scores.) Second, you multiply this estimate by the
number of participants in each group. This step takes you from the variance of the
distribution of means to the variance of the distribution of individual scores.
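For the equal-n case described in point 3, the two estimates can be sketched directly in Python; the numbers come from the Example Worked-Out Problem later in this chapter, and the variable names are ours.

```python
means = [15, 12, 18, 15]      # group means, n = 20 scores per group
variances = [20, 25, 14, 27]  # S^2 estimated from each sample
n = 20

# Within-groups estimate: ordinary average of the sample estimates
s2_within = sum(variances) / len(variances)                    # 21.5

# Between-groups estimate, step 1: variance of the distribution of means
gm = sum(means) / len(means)
s2_m = sum((m - gm) ** 2 for m in means) / (len(means) - 1)    # 6.0

# Step 2: multiply by n to get back to the variance of individual scores
s2_between = s2_m * n                                          # 120.0

f_ratio = s2_between / s2_within                               # about 5.58
```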
4. The distribution of F ratios when the null hypothesis is true is a mathematically
defined distribution that is skewed to the right. Significance cutoffs are given on
an F table according to the degrees of freedom for each population variance es-
timate: the between-groups (numerator) degrees of freedom is the number of
groups minus 1, and the within-groups (denominator) degrees of freedom is the
sum of the degrees of freedom within all samples.
5. The assumptions for the analysis of variance are the same as for the t test. The
populations must be normally distributed, with equal variances. Like the t test, the
analysis of variance is robust to moderate violations of these assumptions.
6. The overall results of an analysis of variance are often followed up by planned con-
trasts, based on theory or a specific practical need, that examine differences such
as those between specific pairs of means. These contrasts (or comparisons) are
figured using the usual analysis of variance method, but with the between-groups
estimate based on the variation between the two means being compared.
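A planned contrast reuses the overall within-groups estimate and refigures only the between-groups estimate from the two means involved. A sketch using the Treatment 2 versus Treatment 3 contrast worked out later in this chapter (names ours):

```python
m2, m3 = 12, 18     # the two means being contrasted
n = 20              # scores per group
s2_within = 21.5    # within-groups estimate from the overall analysis

gm = (m2 + m3) / 2                                   # grand mean of the pair
s2_m = ((m2 - gm) ** 2 + (m3 - gm) ** 2) / (2 - 1)   # df_between = 2 - 1 = 1
s2_between = s2_m * n                                # 360.0
f_contrast = s2_between / s2_within                  # about 16.74
```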
7. When making more than one planned contrast, researchers often protect against
the possibility of getting some contrasts significant just by chance by making a
Bonferroni correction of the significance level (dividing the overall desired sig-
nificance level by the number of contrasts).
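The correction itself is a one-line division; a minimal sketch (function name ours):

```python
def bonferroni_alpha(overall_alpha, n_contrasts):
    """Per-contrast significance level under a Bonferroni correction."""
    return overall_alpha / n_contrasts

# Six planned contrasts at an overall .05 level -> about .0083 each
```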
8. An analysis of variance may be followed up by exploratory, post hoc comparisons.
Such comparisons have to protect against the possibility of getting some signif-
icant results just by chance because of the great many comparisons that could be
made. There are a number of methods for dealing with this problem that are not
as severe as the Bonferroni correction. In one such method, the Scheffé test, you
figure each comparison of interest in the usual way and then divide its F by the
between-groups degrees of freedom from the overall analysis of variance.
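In code, the Scheffé decision described above reduces to one comparison; a sketch (function name ours), using the numbers from this chapter's worked-out Scheffé example:

```python
def scheffe_significant(f_comparison, df_between_overall, f_cutoff_overall):
    """Divide the comparison's F by the overall between-groups df,
    then compare against the overall study's cutoff F."""
    return f_comparison / df_between_overall > f_cutoff_overall

# 5 groups (df_between = 4), comparison F = 11.21, .01 cutoff F = 3.77:
# 11.21 / 4 = 2.80, which does not exceed 3.77, so not significant
```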
9. The proportion of variance accounted for (R²), also called "eta squared" (η²),
is a measure of analysis of variance effect size. The formula for R² is:
R² = (S²Between)(dfBetween)/[(S²Between)(dfBetween) + (S²Within)(dfWithin)]. Power
depends on effect size, number of people in the study, significance level, and
number of groups.
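The R² formula in point 9 translates directly; a sketch (function name ours), checked against the effect-size worked-out example later in this chapter:

```python
def r_squared(s2_between, df_between, s2_within, df_within):
    """Proportion of variance accounted for (also called eta squared)."""
    explained = s2_between * df_between
    return explained / (explained + s2_within * df_within)

# Overall worked-out example: (120)(3) / [(120)(3) + (21.5)(76)] -> about .18
```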
10. Some experts recommend that, instead of using an analysis of variance to make
diffuse, overall comparisons among several means, researchers should plan in
advance to conduct only specific planned contrasts, targeted directly to their the-
oretical or practical questions.
11. Analysis of variance results are reported in a standard fashion, such as
F(3, 68) = 5.21, p < .01. Results of planned contrasts are also commonly reported
(sometimes using special t tests instead of analysis of variance). Results of post hoc
comparisons are usually shown by putting small letters by the means in tables.
12. ADVANCED TOPIC: An alternative approach to the analysis of variance uses
the structural model. In the structural model method, the deviation of each score
from the grand mean is divided into two parts: (a) the score’s difference from its
group’s mean and (b) its group’s mean’s difference from the grand mean. These
deviations, when squared, summed, and divided by the appropriate degrees of
freedom, give the same within-groups and between-groups estimates as using the
standard analysis of variance method you learned. However, the structural model
is more flexible and can be applied to studies with unequal sample sizes. Com-
putations using the structural model are usually summarized in an analysis of
variance table, with a column for source of variation (between, within, and total),
sums of squared deviations (SS), degrees of freedom (df), population variance
estimates (MS, which equals SS/df), and F (which equals MS Between/MS Within).
Key Terms
analysis of variance (ANOVA) (p. 311)
within-groups estimate of the population variance (S²Within or MS Within) (p. 312)
between-groups estimate of the population variance (S²Between or MS Between) (p. 313)
F ratio (p. 317)
F distribution (p. 318)
F table (p. 318)
S²Within or MS Within (p. 320)
grand mean (GM) (p. 321)
S²Between or MS Between (p. 322)
between-groups (or numerator) degrees of freedom (dfBetween) (p. 324)
within-groups (or denominator) degrees of freedom (dfWithin) (p. 324)
planned contrasts (p. 334)
Bonferroni procedure (p. 336)
post hoc comparisons (p. 337)
Scheffé test (p. 338)
proportion of variance accounted for (R²) (p. 339)
eta squared (η²) (p. 340)
structural model (p. 345)
SS Total, SS Within, SS Between (p. 346)
analysis of variance table (p. 349)
Example Worked-Out Problems

Overall Analysis of Variance
An experiment compares the effects of four treatments, giving each treatment to 20 participants and then assessing their performance on a standard measure. The results on the standard measure are as follows. Treatment 1: M = 15, S² = 20; Treatment 2: M = 12, S² = 25; Treatment 3: M = 18, S² = 14; Treatment 4: M = 15, S² = 27. Using the .05 significance level, does treatment matter? Use the five steps of hypothesis testing and sketch the distributions involved.

Answer
The distributions involved are shown in Figure 9–9.
❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are four populations.
Population 1: People given experimental treatment 1.
Population 2: People given experimental treatment 2.
Population 3: People given experimental treatment 3.
Population 4: People given experimental treatment 4.
[Figure 9–9 Distributions involved in Example Worked-Out Problem for overall analysis of variance: the four sample distributions (Treatments 1–4: M = 15, S² = 20; M = 12, S² = 25; M = 18, S² = 14; M = 15, S² = 27); the four population distributions, assumed normal with the same variance, having either the same means (null hypothesis true) or different means (research hypothesis true); and the F(3, 76) comparison distribution, with 5% of the area beyond the cutoff F of 2.73 and the obtained F of 5.58 marked.]
The null hypothesis is that these four populations have the same mean (μ1 = μ2 = μ3 = μ4). The research hypothesis is that the four population means are not the same.
❷ Determine the characteristics of the comparison distribution. The comparison distribution will be an F distribution. dfBetween = NGroups – 1 = 4 – 1 = 3; dfWithin = df1 + df2 + … + dfLast = 19 + 19 + 19 + 19 = 76.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. Using Table A–3 in the Appendix for df = 3, 75 (the closest below 3, 76) at the .05 level, the cutoff F is 2.73.
❹ Determine your sample's score on the comparison distribution.
a. Figure the between-groups population variance estimate (S²Between):
Figure the mean of each group. The group means are 15, 12, 18, and 15. GM = (15 + 12 + 18 + 15)/4 = 15.
●A Estimate the variance of the distribution of means: Add up the sample means' squared deviations from the grand mean and divide by the number of means minus 1:
S²M = Σ(M – GM)²/dfBetween = [(15 – 15)² + (12 – 15)² + (18 – 15)² + (15 – 15)²]/(4 – 1) = (0 + 9 + 9 + 0)/3 = 18/3 = 6.
●B Figure the estimated variance of the population of individual scores: Multiply the variance of the distribution of means by the number of scores in each group. S²Between = (S²M)(n) = (6)(20) = 120.
b. Figure the within-groups population variance estimate (S²Within):
●A Figure population variance estimates based on each group's scores: Treatment 1 group, S² = 20; Treatment 2 group, S² = 25; Treatment 3 group, S² = 14; Treatment 4 group, S² = 27.
●B Average these variance estimates: S²Within = (20 + 25 + 14 + 27)/4 = 86/4 = 21.5.
F = S²Between/S²Within = 120/21.5 = 5.58.
❺ Decide whether to reject the null hypothesis. The F of 5.58 is more extreme than the .05 cutoff F of 2.73. Therefore, reject the null hypothesis. The research hypothesis is supported; the different experimental treatments do produce different effects on the standard performance measure.

Planned Contrasts
For the preceding study, figure a planned contrast comparing Treatment 2 to Treatment 3 using the .01 significance level.

Answer
For the between-groups population variance estimate,
●A Estimate the variance of the distribution of means: Add up the sample means' squared deviations from the grand mean and divide by the number of means minus 1. The grand mean for these two means is (12 + 18)/2 = 15 and dfBetween = NGroups – 1 = 2 – 1 = 1.
IS
B
N
0-558-46761-X
Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.
Introduction to the Analysis of Variance 355
S²M = Σ(M – GM)²/dfBetween = [(12 – 15)² + (18 – 15)²]/1 = [(-3)² + 3²]/1 = (9 + 9)/1 = 18.
●B Figure the estimated variance of the population of individual scores: Multiply the variance of the distribution of means by the number of scores in each group. S²Between = (S²M)(n) = (18)(20) = 360.
S²Within (from the overall analysis) = 21.5.
F = S²Between/S²Within = 360/21.5 = 16.74.
The cutoff F for .05, df = 1, 75 (the closest on the table below the true df of 1, 76) = 3.97.
Reject the null hypothesis; the contrast is significant.

Bonferroni Procedure
What is the Bonferroni corrected significance level for each of six planned contrasts at the overall .05 significance level?

Answer
Bonferroni corrected significance level = .05/6 = .0083.

Post Hoc Comparisons Using the Scheffé Method
A study has five groups with 10 participants in each. Using a Scheffé test, is a comparison with a computed F of 11.21 significant at the .01 significance level?

Answer
The overall study's dfBetween = 5 – 1 = 4; dfWithin = 9 + 9 + 9 + 9 + 9 = 45. The Scheffé corrected F for this contrast is 11.21/4 = 2.80. The overall study's .01 cutoff F (df = 4, 45) is 3.77. The contrast is not significant.

Figuring Effect Size for an Analysis of Variance
Figure the effect size for the overall analysis of variance Example Worked-Out Problem.

Answer
R² = (S²Between)(dfBetween)/[(S²Between)(dfBetween) + (S²Within)(dfWithin)] = (120)(3)/[(120)(3) + (21.5)(76)] = 360/[360 + 1634] = .18.

Advanced Topic: Figuring an Analysis of Variance Using the Structural Model Method
A researcher at an alcohol treatment center conducts a study of client satisfaction with treatment methods A, B, and C. The researcher randomly assigns each of the available 10 clients to receive one of these treatments; four clients end up with Treatment A, three with Treatment B, and three with Treatment C. Two weeks later, the researcher measures client satisfaction with the three treatments on a scale from 1 (low satisfaction)
to 20 (high satisfaction). Scores for Treatment A were 8, 13, 10, and 9. Scores for
Treatment B were 7, 3, and 8. Scores for Treatment C were 6, 4, and 2. Use the steps
of hypothesis testing and figure an analysis of variance (at the .05 level) using the
structural model method. (Although the example we used for the structural model
method earlier in the chapter had the same number of participants in each group, you
can use the same method in this example with unequal sized groups. Just remember
to figure the grand mean, GM, as the average of all of the scores.)
Answer
Table 9–16 shows the figuring and the analysis of variance table.
❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are three populations:
Population 1: Alcoholics receiving Treatment A.
Population 2: Alcoholics receiving Treatment B.
Population 3: Alcoholics receiving Treatment C.
The null hypothesis is that these three populations have the same mean (μ1 = μ2 = μ3). The research hypothesis is that they do not all have the same mean.
❷ Determine the characteristics of the comparison distribution. An F distribution; from Table 9–16, df = 2, 7.
Table 9–16 Analysis of Variance Figuring and Analysis of Variance Table for an Alcohol Treatment Study (Fictional Data)

Treatment A (M = 40/4 = 10)
X     X – GM  Dev²   X – M  Dev²   M – GM  Dev²
8       1      1      -2     4       3      9
13      6     36       3     9       3      9
10      3      9       0     0       3      9
9       2      4      -1     1       3      9
Σ = 40        50             14             36

Treatment B (M = 18/3 = 6)
X     X – GM  Dev²   X – M  Dev²   M – GM  Dev²
7       0      0       1     1      -1      1
3      -4     16      -3     9      -1      1
8       1      1       2     4      -1      1
Σ = 18        17             14              3

Treatment C (M = 12/3 = 4)
X     X – GM  Dev²   X – M  Dev²   M – GM  Dev²
6      -1      1       2     4      -3      9
4      -3      9       0     0      -3      9
2      -5     25      -2     4      -3      9
Σ = 12        35              8             27

Note: Dev² = Squared deviation.
GM = (40 + 18 + 12)/10 = 70/10 = 7
SS Total = 50 + 17 + 35 = 102
SS Within = 14 + 14 + 8 = 36
SS Between = 36 + 3 + 27 = 66
dfTotal = N – 1 = 10 – 1 = 9
dfWithin = df1 + df2 + … + dfLast = (4 – 1) + (3 – 1) + (3 – 1) = 3 + 2 + 2 = 7
dfBetween = NGroups – 1 = 3 – 1 = 2
F needed for df = 2, 7 at the .05 level: 4.74

ANALYSIS OF VARIANCE TABLE:
Source    SS   df   MS     F
Between   66    2   33     6.42
Within    36    7   5.14
Total    102    9

Decision: Reject the null hypothesis.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. Using Table A–3, for df = 2, 7 and a .05 significance level, the cutoff F is 4.74.
❹ Determine your sample's score on the comparison distribution. From the figuring shown in Table 9–16, F = 6.42.
➎ Decide whether to reject the null hypothesis. The F ratio of 6.42 is more extreme than the .05 significance level cutoff F of 4.74. Thus, the researcher can reject the null hypothesis. If these were real data, the researcher could conclude that the three kinds of treatment have different effects on how satisfied clients like these are with their treatment.
Outline for Writing Essays for a One-Way
Analysis of Variance
1. Explain that the one-way analysis of variance is used for hypothesis testing when
you have scores from three or more entirely separate groups of people. Be sure
to explain the meaning of the research hypothesis and the null hypothesis in this
situation.
2. Describe the core logic of hypothesis testing in this situation. Be sure to mention
that the analysis of variance involves comparing the results of two ways of esti-
mating the population variance. One population variance estimate (the within-
groups estimate) is based on the variation within each sample and the other estimate
(the between-groups estimate) is based on the variation among the means of the
samples. Be sure to describe these estimates in detail (including how they are fig-
ured, why they are figured that way, and how each is affected by whether the null
hypothesis is true); explain how and why they are used to figure an F ratio.
3. Explain the logic of the comparison distribution that is used with a one-way analysis of variance (the F distribution).
4. Describe the logic and process for determining the cutoff sample F score on the
comparison distribution at which the null hypothesis should be rejected.
5. Explain how and why the scores from Steps ❸ and ❹ of the hypothesis-testing
process are compared. Explain the meaning of the result of this comparison with
regard to the specific research and null hypotheses being tested.
Practice Problems
These problems involve figuring. Most real-life statistics problems are done on a computer with special statistical software. Even if you have such software, do these problems by hand to ingrain the method in your mind. To learn how to use a computer to solve statistics problems like those in this chapter, refer to the Using SPSS section at the end of this chapter and the Study Guide and Computer Workbook that accompanies this text.
All data are fictional unless an actual citation is given.

Set I (for Answers to Set I Problems, see pp. 685–688)
1. For each of the following studies, decide whether you can reject the null hypothesis that the groups come from identical populations. Use the .05 level. Note that study (b) provides S, not S².

(a)      Group 1   Group 2   Group 3
n          10        10        10
M         7.4       6.8       6.8
S²        .82       .90       .80

(b)      Group 1   Group 2   Group 3   Group 4
n          25        25        25        25
M          94       101       124       105
S          24        28        31        25

2. For each of the following studies, (a) and (b), decide whether you can reject the null hypothesis that the groups come from identical populations. Use the .01 level. (c) Figure the effect size for each study. (d) ADVANCED TOPIC: For study (a), carry out an analysis of variance using the structural model method.

(a)  Group 1   Group 2   Group 3
        8         6         4
        8         6         4
        7         5         3
        9         7         5

(b)  Group 1   Group 2   Group 3
       12        10         8
        4         2         0
       12        10         8
        4         2         0

3. A psychologist at a private mental hospital was asked to determine whether there was any clear difference in the length of stay of patients with different categories of diagnosis. Looking at the last four patients in each of the three major categories, the results (in terms of weeks of stay) were as follows:

Diagnosis Category
Affective Disorders   Cognitive Disorders   Drug-Related Conditions
        7                    12                       8
        6                     8                      10
        5                     9                      12
        6                    11                      10

(a) Using the .05 level and the five steps of hypothesis testing, is there a significant difference in length of stay among diagnosis categories? (b) Sketch the distributions involved. (c) Figure the effect size for the study. (d) Explain your answer to part (a) to someone who understands everything involved in conducting a t test for independent means but is unfamiliar with the analysis of variance. (e) Test the significance of planned contrasts (using the .05 level without a Bonferroni correction) for affective disorders versus drug-related conditions and (f) cognitive disorders versus drug-related conditions. (g) Explain your answers to parts (e) and (f) to a person who understands analysis of variance but is unfamiliar with planned contrasts.
4. A study compared the felt intensity of unrequited love (loving someone who doesn't love you) among three groups: 50 individuals who were currently experiencing unrequited love, who had a mean experienced intensity M = 3.5, S² = 5.2; 50 who had previously experienced unrequited love and described their experiences retrospectively, M = 3.2, S² = 5.8; and 50 who had never experienced unrequited love but described how they thought they would feel if they were to experience it, M = 3.8, S² = 4.8. Determine the significance of the difference among groups, using the 5% level. (a) Use the steps of hypothesis testing; (b) sketch the distributions involved; (c) figure the effect size for the study; and (d) explain your answer to part (a) to someone who has never had a course in statistics.
5. A researcher studying genetic influences on learning compares the maze performance of four genetically different strains of mice, using eight mice per strain. Performance for the four strains was as follows:
Strain Mean S
J 41 3.5
M 38 4.6
Q 14 3.8
W 37 4.9
Using the .01 significance level, is there an overall difference in maze perfor-
mance among the four strains? (a) Use the steps of hypothesis testing; (b) sketch
the distributions involved; (c) figure the effect size for the study; and (d) explain
your answer to part (a) to someone who is familiar with hypothesis testing
with known populations but unfamiliar with the t test or the analysis of variance.
(e) Test the significance of planned contrasts using the overall .05 level (with a
Bonferroni correction for testing each of the five contrasts) for strain J versus
strain M, (f) for strain J versus strain Q, (g) for strain J versus strain W, (h) for
strain Q versus strain M, and (i) for strain Q versus strain W. (j) Explain your an-
swers to parts (e) through (i) to a person who understands analysis of variance but
is unfamiliar with planned contrasts and the Bonferroni correction.
6. What is the Bonferroni corrected significance level for each of the following
situations?
Situation (a) (b) (c) (d)
Overall significance level .05 .05 .01 .01
Number of planned contrasts 2 4 3 5
7. For each of the following studies, test whether a comparison in which the re-
searcher figures an F of 17.21 would be significant using the Scheffé method.
Number of
Groups
Participants in
Each Group
Significance
Level
(a) 5 10 .05
(b) 6 10 .05
(c) 5 20 .05
(d) 5 10 .01
8. What is the power of each of the following planned studies, using the analysis of variance with p < .05?

     Predicted Effect Size   Number of Groups   Participants in Each Group
(a)        Small                    3                     20
(b)        Small                    3                     30
(c)        Small                    4                     20
(d)        Medium                   3                     20

9. About how many participants do you need in each group for 80% power in each of the following planned studies, using the analysis of variance with p < .05?

     Predicted Effect Size   Number of Groups
(a)        Small                    3
(b)        Large                    3
(c)        Small                    4
(d)        Medium                   3

10. Grilo and colleagues (1997) are clinical psychologists interested in the relationship of depression and substance use to personality disorders. Personality disorders are persistent, problematic traits and behaviors that exceed the usual range of individual differences. The researchers conducted interviews assessing personality disorders with adolescents who were psychiatric inpatients and had one of three diagnoses: (1) those with major depression, (2) those with substance abuse, and (3) those with both major depression and substance abuse. The mean number of disorders was as follows: major depression, M = 1.0; substance abuse, M = .7; those with both conditions, M = 1.9. The researchers reported, "The three study groups differed in the average number of diagnosed personality disorders, F(2, 112) = 10.18, p < .0001." Explain this result to someone who is familiar with hypothesis testing with known populations but is unfamiliar with the t test or the analysis of variance.
11. A researcher wants to know whether the need for mental health care among prisoners varies according to the different types of prison facilities. The researcher randomly selects 40 prisoners from each of the three main types of prisons in a particular Canadian province and conducts exams to determine their need for mental health care. In the article describing the results, the researcher reported the means for each group and then added: "The need for mental health care among prisoners in the three types of prison systems appeared to be clearly different, F(2, 117) = 5.62, p < .01. A planned comparison [contrast] of System 1 to System 2 was significant, F(1, 117) = 4.03, p < .05." Explain this result to a person who has never had a course in statistics.
12. Based on Table 9–11 from the Hazan and Shaver (1987) study, indicate for which variables, if any, (a) the Avoidants are significantly different from the other two groups, (b) the Anxious-Ambivalents are different from the other two groups, (c) the Secures are different from the other two groups, and (d) all three groups are different. (e) Explain, to a person who understands analysis of variance but does not know anything about post hoc comparisons, what is meant in the table note that the results are "according to a Scheffé test."
ISBN 0-558-46761-X
Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.
Introduction to the Analysis of Variance 361
13. Which type of English words are longer: nouns, verbs, or adjectives? Go to a book
of at least 400 pages (not this book) and turn to random pages using the random
numbers listed at the end of this paragraph. Go down the page until you come to a
noun. Note its length (in number of letters). Do this for 10 different nouns. Do the
same for 10 verbs and then for 10 adjectives. Using the .05 significance level, (a) carry
out an analysis of variance comparing the three types of words, and (b) figure a
planned contrast of nouns versus verbs. (Be sure also to give the full bibliographic
information on the book you used: authors, title, year published, publisher, city.)
73, 320, 179, 323, 219, 176, 167, 102, 228, 352, 4, 335, 118, 12, 333, 123,
38, 49, 399, 17, 188, 264, 342, 89, 13, 77, 378, 223, 92, 77, 152, 34, 214, 75,
83, 198, 210
Set II
14. For each of the following studies, decide whether you can reject the null hypoth-
esis that the groups come from identical populations. Use the .05 level.
(a)       Group 1   Group 2   Group 3
     n    5         5         5
     M    10        12        14
     S²   4         6         5

(b)       Group 1   Group 2   Group 3
     n    10        10        10
     M    10        12        14
     S²   4         6         5
15. For each of the following studies, (a) and (b), decide whether you can reject the
null hypothesis that the groups come from identical populations. Use the .01
level. (c) Figure the effect size for each study. (d) ADVANCED TOPIC: Carry out
an analysis of variance for study (a) using the structural model method.
(a) Group 1 Group 2 Group 3
1 1 8
2 2 7
1 1 8
2 2 7
(b) Group 1 Group 2 Group 3
1 4 8
2 5 7
1 4 8
2 5 7
16. An organizational psychologist was interested in whether individuals working in
different sectors of a company differed in their attitudes toward the company. The
results for the three people surveyed in development were 10, 12, and 11; for the
three in the marketing department, 6, 6, and 8; for the three in accounting, 7, 4,
and 4; and for the three in production, 14, 16, and 13 (higher numbers mean more
362 Chapter 9
positive attitudes). Was there a significant difference in attitude toward the com-
pany among employees working in different sectors of the company at the .05
level? (a) Use the steps of hypothesis testing; (b) sketch the distributions involved;
(c) figure the effect size for the study; (d) explain your answer to part (a) to some-
one who understands everything involved in conducting a t test for independent
means but is unfamiliar with the analysis of variance; (e) test the significance of
planned contrasts using the overall .05 level (with a Bonferroni correction for testing each of the five contrasts) for development versus production, (f) marketing versus production, (g) accounting versus production, (h) development versus marketing, and (i) development versus accounting. (j) Explain your answers to parts
(e) through (i) to a person who understands analysis of variance but is unfamiliar
with planned contrasts or Bonferroni corrections. (k) ADVANCED TOPIC: Carry
out an analysis of variance for the study using the structural model method.
17. Do students at various universities differ in how sociable they are? Twenty-five
students were randomly selected from each of three universities in a region and
were asked to report on the amount of time they spent socializing each day with
other students. The result for University X was a mean of 5 hours and an estimated population variance of 2 hours; for University Y, M = 4, S² = 1.5; and for University Z, M = 6, S² = 2.5. What should you conclude? Use the .05 level.
(a) Use the steps of hypothesis testing, (b) figure the effect size for the study;
and (c) explain your answers to parts (a) and (b) to someone who has never had
a course in statistics.
18. A psychologist studying artistic preference randomly assigns a group of 45 par-
ticipants to one of three conditions in which they view a series of unfamiliar ab-
stract paintings. The 15 participants in the Famous condition are led to believe that
these are each famous paintings; their mean rating for liking the paintings is 6.5 (S = 3.5). The 15 in the Critically Acclaimed condition are led to believe that these are paintings that are not famous but are very highly thought of by a group of professional art critics; their mean rating is 8.5 (S = 4.2). The 15 in the Control condition are given no special information about the paintings; their mean rating is 3.1 (S = 2.9). Does what people are told about paintings make a difference
in how well they are liked? Use the .05 level. (a) Use the steps of hypothesis test-
ing; (b) sketch the distributions involved; (c) figure the effect size for the study;
(d) explain your answer to part (a) to someone who is familiar with the t test for
independent means but is unfamiliar with analysis of variance; (e) test the signif-
icance of planned contrasts (using the .05 significance level without a Bonferroni
correction) for Famous versus Control and (f) Critically Acclaimed versus Con-
trol. (g) Explain your answers to parts (e) and (f) to a person who understands
analysis of variance but is unfamiliar with planned contrasts.
19. What is the Bonferroni corrected significance level for each of the following
situations?
Situation (a) (b) (c) (d)
Overall significance level .01 .01 .05 .05
Number of planned contrasts 4 2 4 3
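The corrected level in each situation is just the overall significance level divided by the number of planned contrasts. A minimal sketch of that arithmetic (the function name is ours, for illustration):

```python
# Bonferroni correction: divide the overall significance level by the
# number of planned contrasts to get the level for testing each contrast.
def bonferroni_level(overall_level, n_contrasts):
    return overall_level / n_contrasts

# Situation (a) above: overall .01 level with 4 planned contrasts
level_a = bonferroni_level(.01, 4)   # .0025
```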
20. For each of the following studies, test whether a comparison in which the re-
searcher figures an F of 8.12 would be significant using the Scheffé method.
Introduction to the Analysis of Variance 363
    Number of Groups    Participants in Each Group    Significance Level
(a) 4                   30                            .05
(b) 5                   80                            .05
(c) 4                   5                             .05
(d) 8                   30                            .01

21. What is the power of each of the following planned studies, using the analysis of variance with p < .05?

    Predicted Effect Size    Number of Groups    Participants in Each Group
(a) Small                    4                   50
(b) Medium                   4                   50
(c) Large                    4                   50
(d) Medium                   5                   50
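For problem 20, the Scheffé procedure as this chapter describes it amounts to dividing the comparison's F by the overall analysis's dfBetween (the number of groups minus 1) and comparing the result to the usual F cutoff for the overall analysis. A sketch of that step (function name ours, for illustration):

```python
# Scheffé procedure as described in this chapter: divide the comparison's F
# by dfBetween, then compare the result to the overall study's F cutoff.
def scheffe_adjusted_f(f_comparison, n_groups):
    df_between = n_groups - 1
    return f_comparison / df_between

# Problem 20's comparison F of 8.12 in, say, a 4-group study:
adjusted = scheffe_adjusted_f(8.12, 4)   # 8.12 / 3, about 2.71
```

The adjusted value would then be checked against the overall study's F cutoff from an F table, which we leave to the table lookup.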
22. About how many participants do you need in each group for 80% power in each
of the following planned studies, using the analysis of variance with p < .05?
Predicted Effect Size Number of Groups
(a) Small 5
(b) Medium 5
(c) Large 5
(d) Medium 3
23. An experiment is conducted in which 60 participants each fill out a personality
test, but not according to the way the participants see themselves. Instead, 15 are
randomly assigned to fill it out according to the way they think their mothers see
them (that is, the way they think their mothers would fill it out to describe the par-
ticipants); 15 as their fathers would fill it out for them; 15 as their best friends
would fill it out for them; and 15 as the professors they know best would fill it
out for them. The main results appear in Table 9–17. Explain these results to a per-
son who has never had a course in statistics.
24. Rosalie Friend (2001), an educational psychologist, compared three methods
of teaching writing. Students were randomly assigned to three different exper-
imental conditions involving different methods of writing a summary. At the
Table 9–17 Means for Main Personality Scales for Each Experimental Condition
(Fictional Data)
Scale Mother Father Friend Professor F(3, 56)
Conformity 24 21 12 16 4.21**
Extroversion 14 13 15 13 2.05
Maturity 15 15 22 19 3.11*
Self-confidence 38 42 27 32 3.58*
*p < .05, **p < .01.
364 Chapter 9
end of the two days of instructions, participants wrote a summary. One of the
ways it was scored was the percentage of specific details of information it in-
cluded from the original material. Here is a selection from her article describ-
ing one of the findings:
The effect of summarization method on inclusion of important information was significant: F(2, 144) = 4.1032, p < .019. The mean scores (with standard
deviations in parentheses) were as follows: Argument Repetition, 59.6% (17.9);
Generalization, 59.8% (15.2); and Self-Reflection, 50.2% (18.0). (p. 14)
(a) Explain these results to a person who has never had a course in statistics. (b)
Using the information in the preceding description, figure the effect size for the
study.
25. Miller (1997) asked 147 female students to view slides of magazine ads that in-
cluded, among other things, pictures of attractive men. The participants were
measured for physiological arousal (skin conductance) while viewing the ads and
also after viewing them; they were asked to rate the attractiveness and how much
they would like to meet each person in the ads. As part of the analysis, Miller com-
pared results for women dating no one, women in casual dating relationships,
and women in exclusive dating relationships. Table 9–18 shows Miller’s results.
(a), (b), and (c) Describe the pattern of results on each variable. (d) Explain, to a
person who understands analysis of variance but is unfamiliar with post hoc com-
parisons, what is meant in a general way by the table note that the results are
based on “Duncan’s multiple range test.” (That is, you don’t need to explain this
specific test, but you do need to explain why a test like this was used and what it
attempts to accomplish.)
Table 9–18 Effects of Relationship Status
Dependent measure Dating No One Casual Dating Exclusive Dating
Skin conductance 19.5b 19.1b 15.8a
Desire to meet target 14.6b 15.3b 11.2a
Perceived physical attractiveness of target 15.6b 17.1b 13.8a
Note: Higher numbers reflect greater arousal, desire to meet target, and perceived attractiveness; for the latter two items the
possible range was 1–19. Within each row, means with different subscripts differ significantly (p < .05) by Duncan's multiple
range test.
Source: Miller, R.S. (1997). Inattentive and contented: Relationship commitment and attention to alternatives. Journal of
Personality and Social Psychology, 73, 758–766. Published by the American Psychological Association. Reprinted with
permission.
Using SPSS
The U in the following steps indicates a mouse click. (We used SPSS version 15.0
for Windows to carry out these analyses. The steps and output may be slightly differ-
ent for other versions of SPSS.)
It is easier to learn these steps using actual numbers; so we will use the criminal
record example from earlier in the chapter. The scores for that example are shown in
Table 9–3 on page 320.
Introduction to the Analysis of Variance 365
Figuring a One-Way Analysis of Variance
❶ Enter the scores into SPSS. SPSS assumes that all scores in a row are from the
same person. In this example, each person is in only one of the three groups (the
Criminal Record group, the Clean Record group, or the No Information group).
Thus, to tell SPSS which person is in each group, enter the numbers as shown in
Figure 9–10. In the first column (labeled “group”), we used the number “1” to
indicate that a person is in the Criminal Record group, the number “2” to indi-
cate a person in the Clean Record group, and the number “3” to indicate a per-
son in the No Information group.
❷ U Analyze.
❸ U Compare means.
❹ U One-Way ANOVA.
❺ U on the variable called “guilt” and then U the arrow next to the box labeled “De-
pendent List.” This tells SPSS that the analysis of variance should be carried out
on the scores for the “guilt” variable.
Figure 9–10 SPSS data editor window for the criminal record example (in which 15
individuals rated the guilt of a defendant after being randomly assigned to one of the three
groups that were given different information about the defendant’s previous criminal record).
366 Chapter 9
❻ U the variable called “group” and then U the arrow next to the box labeled
“Factor.” This tells SPSS that the variable called “group” shows which person is
in which group.
❼ U Options. U the box labeled Descriptive (this checks the box). This tells SPSS
to provide descriptive statistics (such as the mean and standard deviation) for
each group. U Continue. (Step ❼ is optional, but we recommend always request-
ing descriptive statistics for any hypothesis testing situation.)
➑ U OK. Your SPSS output window should look like Figure 9–11.
The first table in the SPSS output provides descriptive statistics (number of in-
dividuals, mean, estimated population standard deviation, and other statistics) for the
“guilt” scores for each of the three groups.
The second table in the SPSS output shows the actual results of the one-way
analysis of variance. The first column lists the types of population variance estimates
Figure 9–11 SPSS output window for a one-way analysis of variance for the criminal
record example.
Introduction to the Analysis of Variance 367
(between groups and within groups). The second column lists the between groups and
within groups sums of squares: these are described in the Advanced Topic section
earlier in this chapter, but ignore this column if you did not read that section. The
third column, "df," gives the degrees of freedom. In the between groups row, this corresponds to dfBetween; in the within groups row, this corresponds to dfWithin. The fourth column, "Mean Square," gives the population variance estimates (S²Between and S²Within), with the between-groups estimate first and then the within-groups estimate. The next column gives the F ratio for the analysis of variance. Allowing for rounding error, the values for "df," "Mean Square," and "F" (and "Sum of Squares") are the same as those reported earlier in the chapter. The final
column, “Sig.,” shows the exact significance level of the F ratio. The significance
level of .045 is less than our .05 cutoff for this example. Thus, you can reject the null
hypothesis and the research hypothesis is supported (that is, the result is statistically
significant).
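The between-groups and within-groups estimates in that output can be reproduced by hand in a few lines. The sketch below assumes the guilt ratings shown in the comments are the Table 9–3 scores (an assumption for illustration; verify against your text — they do reproduce the group mean differences and the significance level of about .045 discussed here):

```python
# One-way ANOVA by hand, assuming these are the Table 9-3 guilt ratings.
groups = [
    [10, 7, 5, 10, 8],  # Criminal Record group (M = 8)
    [5, 1, 3, 7, 4],    # Clean Record group (M = 4)
    [4, 6, 9, 3, 3],    # No Information group (M = 5)
]

n = len(groups[0])                        # 5 participants per group
means = [sum(g) / n for g in groups]
grand_mean = sum(means) / len(groups)

# Between-groups estimate: n times the estimated variance of the means
ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (len(groups) - 1)

# Within-groups estimate: average of the three group variance estimates (S²)
ms_within = sum(
    sum((x - m) ** 2 for x in g) / (n - 1) for g, m in zip(groups, means)
) / len(groups)

f_ratio = ms_between / ms_within          # about 4.06 with df (2, 12)
```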
Post Hoc Tests for a One-Way Analysis of Variance
We will again use the criminal record example that we used for the one-way analysis
of variance. Note that, before going through the following steps, we went to the "Variable View" window in SPSS and entered value labels (in the "Values" column) for the "group" variable (1 = "criminal record"; 2 = "clean record"; 3 = "no information").
Doing this makes it easier to read the SPSS output for the post hoc tests.
First, follow Steps ❶ through ❼ shown above for a one-way analysis of variance.
❽ UPost Hoc. As you will see in the window that appears, there are many dif-
ferent types of post hoc tests available. In this chapter, we focused on the Scheffé test,
so U the box labeled Scheffe (this checks the box). U Continue.
❾ U OK.
The first two tables shown in your SPSS output window will be the same as the
tables shown in Figure 9–11. There will also be two new tables in your SPSS output
window; the first of these tables, labeled “Post Hoc Tests,” is the most important and
is shown in Figure 9–12. The table shows the results of all possible comparisons of the
study groups (Criminal Record, Clean Record, and No Information). Let's start by looking at the first row of numbers, which shows the comparison of the Criminal Record
group to the Clean Record group. The value of 4 in the “Mean Difference” column
tells you that the difference between the means of these groups was 4. The “Sig.” col-
umn tells you the exact significance level associated with a difference of that size. The
value of .054 is not less than our standard .05 cutoff value, which tells you that the
means of the Criminal Record and Clean Record groups are not significantly different.
(For the current purposes, you do not need to worry about the columns labeled “Std.
Error” or “95% Confidence Interval.”) The second row of numbers shows the result
of comparing the Criminal Record group to the No Information group. As you can see
in the “Mean Difference” column, the difference between the means of these groups
was 3; this difference is not statistically significant since the significance level of .164
is greater than our standard .05 cutoff. You can ignore the third row of numbers because
it shows the result of comparing the Clean Record group to the Criminal Record group,
and you have already seen the result of that comparison in the first row of the table.
The only remaining comparison to consider is the difference between the Clean Record
group and the No Information group. That comparison is shown in the fourth row of
numbers. The difference between the means of those groups was -1, and the significance level of .795 tells you that this difference is not statistically significant. So, overall, none of the three Scheffé post hoc comparisons was statistically significant.
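The "Mean Difference" column in Figure 9–12 is just each pair of group means subtracted. A quick sketch using the example's group means (yielding the 4, 3, and -1 discussed above):

```python
from itertools import combinations

# Group means for the criminal record example, as used in the text above.
means = {"Criminal Record": 8, "Clean Record": 4, "No Information": 5}

# Each Scheffé row in Figure 9-12 reports the difference between two group
# means; SPSS lists every ordered pair, but only the unordered pairs matter.
diffs = {(a, b): means[a] - means[b] for a, b in combinations(means, 2)}
```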
368 Chapter 9
Figure 9–12 SPSS output window for post hoc tests for a one-way analysis of vari-
ance for the criminal record example.
Chapter Notes
1. It is possible, by chance, for F to be larger or smaller than 1 in any particular
situation. Both the between-groups and the within-groups estimates are only
estimates and can each vary a fair amount even when the null hypothesis is per-
fectly true. If F is considerably larger than 1, you reject the null hypothesis that
the populations all have the same mean. But what if F is substantially smaller than 1?
This rarely happens. When it does, it could mean that there is less variation among
the groups than would be expected by chance—something is restricting the
variation between groups.
2. Why not just use a t test to compare the two groups? If you used an ordinary t test,
your pooled estimate of the population variance would be based on only these two
groups. Thus, you would be ignoring the information about the population vari-
ance provided by the scores in the other groups. One way to deal with this would
be to do the ordinary t test in the usual way at every step, except wherever you
would ordinarily use the pooled estimate, you would instead use the within-groups
population variance estimate from the overall analysis of variance. Also, you
would determine your significance cutoff using the df for the overall within-
groups estimate. Actually, this modified t test procedure for a planned contrast and
the one we describe using the F test are mathematically equivalent and give ex-
actly the same final result in terms of whether or not your result is significant. (See
Chapter 15 for a more general discussion of the relation of the t test to the analy-
sis of variance.) We emphasize the F test approach here because it is more straight-
forward in terms of the rest of the material in this chapter.
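The modified t test in this note can be sketched directly: use S²Within in place of the pooled estimate (the significance cutoff would then use dfWithin). The helper below is ours, not the text's notation; with the criminal record example's numbers, squaring t recovers the comparison's F, illustrating the note's point that the two procedures are equivalent:

```python
import math

# Planned-contrast t test per Chapter Note 2: an ordinary t test for
# independent means, except the pooled variance estimate is replaced by
# S²Within from the overall analysis of variance.
def contrast_t(m1, n1, m2, n2, s2_within):
    s2_diff = s2_within / n1 + s2_within / n2   # variance of the difference
    return (m1 - m2) / math.sqrt(s2_diff)

# Criminal Record (M = 8) vs. Clean Record (M = 4), n = 5 each, with the
# example's S²Within of 16/3; t² equals the comparison's F of 7.5.
t = contrast_t(8, 5, 4, 5, 16 / 3)
```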
3. There actually is a kind of analysis of variance equivalent to the difference be-
tween means—the variation among the means. In fact, Cohen (1988) recom-
mends using the standard deviation of the distribution of means. Thus, he defines
what he calls f as an effect size for the analysis of variance, which is figured as
the standard deviation of the distribution of means (estimated as SM) divided by the standard deviation of the individuals (estimated as SWithin). However, this
measure of effect size is rarely used in research articles and is less intuitively
meaningful than the more common one we discuss here.
4. More detailed tables are provided in Cohen (1988, pp. 289–354). When using
these tables, note that the value of u at the top of each of his tables refers to dfBetween, which for a one-way analysis of variance is the number of groups
minus 1, not the number of groups as used in our Table 9–9.
5. More detailed tables are provided in Cohen (1988, pp. 381–389). If you use these,
see Chapter Note 4.
6. There are also computational formulas for figuring an analysis of variance with
the structural model method. For learning purposes in your class, you should use
the steps as we have discussed them in this Advanced Topic section. In a real re-
search situation, the figuring is usually all done by computer (see this chapter’s
Using SPSS section). However, if you are ever in the unlikely situation of hav-
ing to do a one-way analysis of variance for an actual research study by hand (or
just using a hand calculator), you may find the following formulas useful:
SSTotal = ΣX² − (ΣX)²/N

SSBetween = (ΣX₁)²/n₁ + (ΣX₂)²/n₂ + … + (ΣXLast)²/nLast − (ΣX)²/N

SSWithin = SSTotal − SSBetween
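These computational formulas can be sketched in a few lines of Python; the scores below assume the Table 9–3 guilt ratings from the chapter's example (an assumption, for illustration only):

```python
# The computational formulas above, applied to assumed Table 9-3 scores.
groups = [[10, 7, 5, 10, 8], [5, 1, 3, 7, 4], [4, 6, 9, 3, 3]]
scores = [x for g in groups for x in g]
N = len(scores)
correction = sum(scores) ** 2 / N                 # (ΣX)² / N

ss_total = sum(x ** 2 for x in scores) - correction
ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
ss_within = ss_total - ss_between
```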
Introduction to the Analysis of Variance 369
Name:
Chapter 11 Instructions
Practice Problem 11 and 12c & 12d
Due Week 5 Day 6 (Sunday)
Follow the instructions below to submit your answers for Chapter 11 Practice Problem 11 and 12c & d.
1. Save Chapter 11 Instructions to your computer.
2. Type your answers into the shaded boxes below. The boxes will expand as you type your answers.
3. Resave this form to your computer with your answers filled-in.
4. Attach the saved form to your reply when you turn-in your work in the Assignments section of the Classroom tab. Note: Each question in the assignments section will be listed separately; however, you only need to submit this form one time to turn-in your answers.
The following link provides a tutorial on creating graphs in Excel for Chapter 11, Practice Problem 11.
***Note: There is no template for Problem 11. Each student will create an Excel document on which the scatter graphs are created. The Excel document will be turned in as an attachment as if it were a template.
https://ecampus.phoenix.edu/secure/aapd/uc/psy315/r1/psy315r1excelresources.htm
Or you can click on the Classroom tab after you sign-in to your UoP account and then click on Materials → Excel Resources → Graphs. An Excel tutorial will open with instructions on how to create graphs in Excel.
Below is an explanation of the symbols in Chapter 11, Practice Problem 12c & 12d. Note: Instructions for Practice Problem 12c & 12d begin in the second portion of Problem 11 on page 479 in your textbook (above problem 12).
r = score that represents the correlation coefficient
t = score that determines the significance of the correlation coefficient score
t needed = cut-off score that establishes the region of rejection (also known as the critical value)
Decision:
Reject the Null
or
Fail to Reject the Null
(select only one)
Read Chapter 11 Practice Problem 12c & 12d in your text book and then type your answers into the shaded boxes below. Note: Please provide only those answers indicated below, nothing more. You do not need to show your work. Round your answers to 2 decimal places.
r =
t =
t needed = +
Decision:
CHAPTER 11
Correlation

Chapter Outline
✪ Graphing Correlations: The Scatter Diagram 434
✪ Patterns of Correlation 437
✪ The Correlation Coefficient 443
✪ Significance of a Correlation Coefficient 452
✪ Correlation and Causality 456
✪ Issues in Interpreting the Correlation Coefficient 458
✪ Effect Size and Power for the Correlation Coefficient 464
✪ Controversy: What Is a Large Correlation? 466
✪ Correlation in Research Articles 467
✪ Summary 469
✪ Key Terms 471
✪ Example Worked-Out Problems 471
✪ Practice Problems 474
✪ Using SPSS 482
✪ Chapter Notes 485

This chapter is about a statistical procedure that allows you to look at the relationship between two groups of scores. To give you an idea of what we mean, let's consider some common real-world examples. Among students, there is a relationship between high school grades and college grades. It isn't a perfect relationship, but generally speaking students with better high school grades tend to get better grades in college. Similarly, there is a relationship between parents' heights and the adult height of their children. Taller parents tend to give birth to children who grow up to be taller than the children of shorter parents. Again, the relationship isn't perfect, but the general pattern is clear. Now we'll look at an example in detail.

One hundred thirteen married people in the small college town of Santa Cruz, California, responded to a questionnaire in the local newspaper about their marriage. [This was part of a larger study reported by Aron and colleagues (2000).] As part of the questionnaire, they answered the question, "How exciting are the things you do
Correlation 433
together with your partner?" using a scale from 1, not exciting at all, to 5, extremely exciting. The questionnaire also included a standard measure of marital satisfaction (that included items such as, "In general, how often do you think that things between you and your partner are going well?").

TIP FOR SUCCESS
You can learn most of the material in this chapter if you have mastered Chapters 1 and 2; but if you are reading this before having studied Chapters 3 through 7, you should not try to read the material near the end of this chapter on the significance of a correlation coefficient or on effect size and power.
The researchers were interested in finding out the relationship between doing ex-
citing things with a marital partner and the level of marital satisfaction people re-
ported. In other words, they wanted to look at the relationship between two groups of
scores: the group of scores for doing exciting things and the group of scores for mar-
ital satisfaction. As shown in Figure 11–1, the relationship between these two groups
of scores can be shown very clearly using a graph. The horizontal axis is for people’s
answers to the question, “How exciting are the things you do together with your part-
ner?” The vertical axis is for the marital satisfaction scores. Each person’s score on
the two variables is shown as a dot.
The overall pattern is that the dots go from the lower left to the upper right. That
is, lower scores on the variable “doing exciting activities with your partner” more often
go with lower scores on the variable “marital satisfaction,” and higher with higher. So,
in general, this graph shows that the more that people did exciting activities with their
partner, the more satisfied they were in their marriage. Even though the pattern is far from
one to one, you can see a general trend. This general pattern is of high scores on one vari-
able going with high scores on the other variable, low scores going with low scores,
and mediums with mediums. This is an example of a correlation.
A correlation describes the relationship between two variables. More precisely, the
usual measure of a correlation describes the relationship between two equal-interval
numeric variables. As you learned in Chapter 1, the differences between values for
[Figure 11–1 here: a scatter diagram with "Exciting Activities with Partner" (1–5) on the horizontal axis and "Marital Satisfaction" (0–60) on the vertical axis.]
Figure 11–1 Scatter diagram showing the correlation for 113 married individuals between doing exciting activities with their partner and their marital satisfaction. (Data from Aron et al., 2000.)
correlation: association between scores on two variables.
434 Chapter 11
equal-interval numeric variables correspond to differences in the underlying thing being
measured. (Most psychologists consider scales like a 1-to-10 rating scale as approximately equal-interval scales.) There are countless examples of correlations: in children, there is a correlation between age and coordination skills; among students, there is a correlation between amount of time studying and amount learned; in the marketplace, we often assume that a correlation exists between price and quality—that high
prices go with high quality and low with low.
This chapter explores correlation, including how to describe it graphically, dif-
ferent types of correlations, how to figure the correlation coefficient (which gives a
number for the degree of correlation), the statistical significance of a correlation co-
efficient, issues about how to interpret a correlation coefficient, and effect size and
power for a correlation coefficient.
Graphing Correlations: The Scatter Diagram
Figure 11–1 shows the correlation between exciting activities and marital satisfac-
tion and is an example of a scatter diagram (also called a scatterplot). A scatter
diagram shows you at a glance the pattern of the relationship between the two
variables.
How to Make a Scatter Diagram
There are three steps to making a scatter diagram:
❶ Draw the axes and decide which variable goes on which axis. Often, it
doesn’t matter which variable goes on which axis. However, sometimes the re-
searchers are thinking of one of the variables as predicting or causing the other.
In that case, the variable that is doing the predicting or causing goes on the hor-
izontal axis and the variable that is being predicted about or caused goes on the
vertical axis. In Figure 11–1, we put exciting activities on the horizontal axis
and marital satisfaction on the vertical axis. This was because the study was
based on a theory that the more the activities that a couple does together are
exciting, the more the couple is satisfied with their marriage. (We will have
more to say about this later in the chapter when we discuss causality and also in
Chapter 12 when we discuss prediction.)
❷ Determine the range of values to use for each variable and mark them on the
axes. Your numbers should go from low to high on each axis, starting from where
the axes meet. Your low value on each axis should be 0.
Each axis should continue to the highest value your measure can possibly
have. When there is no obvious highest possible value, make the axis go to a
value that is as high as people ordinarily score in the group of people of interest
for your study. Note that scatter diagrams are usually made roughly square, with
the horizontal and vertical axes being about the same length (a 1:1 ratio).
❸ Mark a dot for each pair of scores. Find the place on the horizontal axis for
the first pair of scores on the horizontal-axis variable. Next, move up to the
height for the score for the first pair of scores on the vertical-axis variable. Then
mark a clear dot. Continue this process for the remaining pairs of scores. Some-
times the same pair of scores occurs twice (or more times). This means that the
dots for these pairs would go in the same place. When this happens, you can put
a second dot as near as possible to the first—touching, if possible—but making
it clear that there are in fact two dots in the one place. Alternatively, you can put
the number 2 in that place.
scatter diagram: graph showing the relationship between two variables; the values of one variable are along the horizontal axis and the values of the other variable are along the vertical axis; each score is shown as a dot in this two-dimensional space.
TIP FOR SUCCESS
If you’re in any way unsure about what a numeric equal-interval variable is, be sure to review the Chapter 1 material on kinds of variables.
TIP FOR SUCCESS
When making a scatter diagram, it is easiest if you use graph paper.
ISBN 0-558-46761-X
Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.
Correlation 435
An Example
Suppose a researcher is studying the relationship of sleep to mood. As an initial test,
the researcher asks six students in her morning seminar two questions:
1. How many hours did you sleep last night?
2. How happy do you feel right now on a scale from 0, not at all happy, to 8,
extremely happy?
The (fictional) results are shown in Table 11–1. (In practice, a much larger group
would be used in this kind of research. We are using an example with just six to keep
things simple for learning. In fact, we have done a real version of this study. Results
of the real study are similar to what we show here, except not as strong as the ones
we made up to make the pattern clear for learning.)
❶ Draw the axes and decide which variable goes on which axis. Because sleep
comes before mood in this study, it makes most sense to think of sleep as the
predictor. Thus, as shown in Figure 11–2a, we put hours slept on the horizontal
axis and happy mood on the vertical axis.
[Four scatter diagram panels, (a) through (d), each with Hours Slept Last Night (0–12) on the horizontal axis and Happy Mood (0–8) on the vertical axis.]
Figure 11–2 Steps for making a scatter diagram. (a) ❶ Draw the axes and decide which variable goes on which axis—the predictor variable (Hours Slept Last Night) on the horizontal axis, the other (Happy Mood) on the vertical axis. (b) ❷ Determine the range of values to use for each variable and mark them on the axes. (c) ❸ Mark a dot for the pair of scores for the first student. (d) ❸ continued: Mark dots for the remaining pairs of scores.
Table 11–1 Hours Slept Last Night and Happy Mood Example (Fictional Data)

Hours Slept    Happy Mood
 7             4
 5             2
 8             7
 6             2
 6             3
10             6
How are you doing?
1. What does a scatter diagram show, and what does it consist of?
2. (a) When it is the kind of study in which one variable can be thought of as pre-
dicting another variable, which variable goes on the horizontal axis? (b) Which
goes on the vertical axis?
3. Make a scatter diagram for the following scores for four people who were each
tested on two variables, X and Y. X is the variable we are predicting from; it can
have scores ranging from 0 to 6. Y is the variable being predicted; it can have
scores from 0 to 7.
[Figure 11–3: X (0–6) on the horizontal axis, Y (0–7) on the vertical axis.]
Figure 11–3 Scatter diagram for scores in “How are you doing?” question 3.
❷ Determine the range of values to use for each variable and mark them on the
axes. For the horizontal axis, we start at 0 as usual. We do not know the maxi-
mum possible, but let us assume that students rarely sleep more than 12 hours.
The vertical axis goes from 0 to 8, the lowest and highest scores possible on the
happiness question. See Figure 11–2b.
❸ Mark a dot for each pair of scores. For the first student, the number of hours
slept last night was 7. Move across to 7 on the horizontal axis. The happy mood
rating for the first student was 4, so move up to the point across from the 4 on the
vertical axis. Place a dot at this point, as shown in Figure 11–2c. Do the same for
each of the other five students. The result should look like Figure 11–2d.
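The dot-placing steps can also be sketched in code. The following is a minimal illustration (not part of the text’s procedure), rendering the six Table 11–1 score pairs on a rough text grid in the spirit of the graph-paper tip: one row per mood level from 8 down to 0, one column per hour slept from 0 to 12, a `*` for each pair, and a count wherever two or more pairs land in the same place.

```python
# A sketch (not from the text): the Table 11-1 scores as a text-grid
# scatter diagram.
hours = [7, 5, 8, 6, 6, 10]   # hours slept last night (Table 11-1)
mood = [4, 2, 7, 2, 3, 6]     # happy mood, 0-8 scale

def text_scatter(xs, ys, x_max=12, y_max=8):
    """Return one string per vertical-axis level, from y_max down to 0."""
    rows = []
    for y in range(y_max, -1, -1):
        cells = []
        for x in range(x_max + 1):
            n = sum(1 for px, py in zip(xs, ys) if px == x and py == y)
            # Mark repeated pairs with their count, as the text suggests.
            cells.append(str(n) if n > 1 else ("*" if n == 1 else "."))
        rows.append(f"{y} " + " ".join(cells))
    return rows

for line in text_scatter(hours, mood):
    print(line)
```

Reading the grid top to bottom gives the same picture as Figure 11–2d: dots drifting up and to the right.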
(Data for “How are you doing?” question 3:)

Person   X   Y
A        3   4
B        6   7
C        1   2
D        4   6
Patterns of Correlation
Linear and Curvilinear Correlations
In each example so far, the pattern in the scatter diagram very roughly approximates
a straight line. Thus, each is an example of a linear correlation. In the scatter diagram
for the study of happy mood and sleep (Figure 11–2d), you could draw a line show-
ing the general trend of the dots, as we have done in Figure 11–4. Notice that the
scores do not all fall right on the line. Notice, however, that the line does describe the
general tendency of the scores. (In Chapter 12 you learn the precise rules for draw-
ing such a line.)
Sometimes, however, the general relationship between two variables does not
follow a straight line at all, but instead follows the more complex pattern of a
curvilinear correlation. Consider, for example, the relationship between a person’s
level of kindness and the degree to which that person is desired by others as a poten-
tial romantic partner. There is evidence suggesting that, up to a point, a greater level
of kindness increases a person’s desirability as a romantic partner. However, beyond
that point, additional kindness does little to increase desirability (Li et al., 2002). This
particular curvilinear pattern is shown in Figure 11–5. Notice that you could not draw
a straight line to describe this pattern. Some other examples of curvilinear relation-
ships are shown in Figure 11–6.
linear correlation: relation between two variables that shows up on a scatter diagram as the dots roughly following a straight line.
curvilinear correlation: relation between two variables that shows up on a scatter diagram as dots following a systematic pattern that is not a straight line.
Answers
1. A scatter diagram is a graph that shows the relation between two variables. One axis is for one variable; the other axis, for the other variable. The graph has a dot for each individual’s pair of scores. The dot for each pair is placed above the score for that pair on the horizontal-axis variable and directly across from the score for that pair on the vertical-axis variable.
2. (a) The variable that is doing the predicting goes on the horizontal axis. (b) The variable that is being predicted goes on the vertical axis.
3. See Figure 11–3.
[Figure 11–4: Hours Slept Last Night (0–12) on the horizontal axis, Happy Mood (0–8) on the vertical axis.]
Figure 11–4 Scatter diagram from Figure 11–2d with a line drawn to show the general trend.
[Figure 11–5: Kindness on the horizontal axis, Desirability on the vertical axis.]
Figure 11–5 Example of a curvilinear relationship: desirability and kindness.
[Three panels: (a) Stimulus Complexity (from simple, familiar to complex, novel) on the horizontal axis, Feeling (− to +) on the vertical axis; (b) Position of Item in the List (Beginning, Middle, End) on the horizontal axis, Percent Who Remember Each Item on the vertical axis; (c) Motivation (0–5) on the horizontal axis, Rate of Substitution of Digits for Symbols on the vertical axis.]
Figure 11–6 Examples of curvilinear relationships: (a) the way we feel and the complexity of a stimulus; (b) the number of people who remember an item and its position on a list; and (c) children’s rate of and motivation for substituting digits for symbols.
no correlation: no systematic relationship between two variables.
[Figure 11–7: Shoe Size on the horizontal axis, Income on the vertical axis.]
Figure 11–7 Two variables with no association with each other: income and shoe size (fictional data).
The usual way of figuring the correlation (the one you learn shortly in this chap-
ter) gives the degree of linear correlation. If the true pattern of association is curvi-
linear, figuring the correlation in the usual way could show little or no correlation.
Thus, it is important to look at scatter diagrams to identify these richer relationships
rather than automatically figuring correlations in the usual way, assuming that the
only relationship is a straight line.
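To see this numerically, here is a small sketch with fictional, perfectly curvilinear data (a symmetric U-shaped pattern). It uses the sum of the products of deviation scores, the quantity behind the usual correlation figuring introduced later in this chapter; for these scores it comes out to exactly zero, even though the relationship is strong and systematic.

```python
# A sketch (fictional data, not from the text): a perfectly U-shaped
# pattern whose sum of products of deviation scores is zero, so the
# usual (linear) correlation figuring shows no relationship at all.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [(x - 3) ** 2 for x in xs]   # 9, 4, 1, 0, 1, 4, 9 - a clear pattern

mx = sum(xs) / len(xs)            # mean of X
my = sum(ys) / len(ys)            # mean of Y
sum_products = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

print(sum_products)               # prints 0.0 - linear figuring misses the curve
```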
No Correlation
It is also possible for two variables to be essentially unrelated to each other. For ex-
ample, if you were to do a study of income and shoe size, your results might appear
as shown in Figure 11–7. The dots are spread everywhere, and there is no line,
straight or otherwise, that is any reasonable representation of a trend. There is simply
no correlation.
Positive and Negative Linear Correlations
In the examples so far of linear correlations, such as exciting activities and marital satisfaction, high scores go with high scores, lows with lows, and mediums with mediums. This is called a positive correlation. (One reason for the term “positive” is that
in geometry, the slope of a line is positive when it goes up and to the right on a graph
like this. Notice that in Figure 11–4 the positive correlation between happy mood and
sleep is shown by a line that goes up and to the right.)
Sometimes, however, high scores on one variable go with low scores on the other
variable and lows with highs. This is called a negative correlation. For example, in
the newspaper survey about marriage, the researchers also asked about boredom with
the relationship and the partner. Not surprisingly, the more bored a person was, the
lower was the person’s marital satisfaction. That is, low scores on one variable went
with high scores on the other. Similarly, the less bored a person was, the higher the
marital satisfaction. This is shown in Figure 11–8, where we also put in a line to em-
phasize the general trend. You can see that as it goes from left to right, the line slopes
slightly downward.
Another example of a negative correlation is from organizational psychology. A
well-established finding in that field is that absenteeism from work has a negative
negative correlation: relation between two variables in which high scores on one go with low scores on the other, mediums with mediums, and lows with highs; on a scatter diagram, the dots roughly follow a straight line sloping down and to the right.
positive correlation: relation between two variables in which high scores on one go with high scores on the other, mediums with mediums, and lows with lows; on a scatter diagram, the dots roughly follow a straight line sloping up and to the right.
[Figure 11–8: Bored with Relationship (2–12) on the horizontal axis, Marital Satisfaction (0–80) on the vertical axis.]
Figure 11–8 Scatter diagram with the line drawn in to show the general trend for a negative correlation between two variables: greater boredom with the relationship goes with lower marital satisfaction. (Data from Aron et al., 2000.)
linear correlation with satisfaction with the job (e.g., Mirvis & Lawler, 1977): that is,
the higher the level of job satisfaction, the lower the level of absenteeism. Put
another way, the lower the level of job satisfaction is, the higher the absenteeism be-
comes. Research on this topic has continued to show this pattern all over the world
(e.g., Punnett et al., 2007), and the same pattern is found for university classes: the
more satisfied students are, the less they miss class (Yorges et al., 2007).
Strength of the Correlation
What we mean by the strength of the correlation is how much there is a clear pat-
tern of some particular relationship between two variables. For example, we saw
that a positive linear correlation is when high scores go with highs, mediums with
mediums, lows with lows. The strength (or degree) of such a correlation, then, is
how much highs go with highs, and so on. Similarly, the strength of a negative lin-
ear correlation is how much the highs on one variable go with the lows on the
other, and so forth. In terms of a scatter diagram, there is a “large” (or “strong”)
linear correlation if the dots fall close to a straight line (the line sloping up or down
depending on whether the linear correlation is positive or negative). A perfect lin-
ear correlation means all the dots fall exactly on the straight line. There is a “small”
(or “weak”) correlation when you can barely tell there is a correlation at all; the
dots fall far from a straight line. The correlation is “moderate” (also called a
“medium” correlation) if the pattern of dots is somewhere between a small and a
large correlation.
Importance of Identifying the Pattern of Correlation
The procedure you learn in the next main section is for figuring the direction and
strength of linear correlation. As we suggested earlier, the best approach to such a
problem is first to make a scatter diagram and to identify the pattern of correla-
tion. If the pattern is curvilinear, then you would not go on to figure the linear
correlation. This is important because figuring the linear correlation when the
true correlation is curvilinear would be misleading. (For example, you might conclude that there is little or no correlation when in fact there is a quite strong relationship; it is just not linear.) You should assume that the correlation is linear unless the scatter diagram shows a curvilinear correlation. We say this because, when the linear correlation is small, the dots will fall far from a straight line. In such
situations, it can sometimes be hard to imagine a straight line that roughly shows
the pattern of dots.
If the correlation appears to be linear, it is also important to “eyeball” the scatter
diagram a bit more. The idea is to note the direction (positive or negative) of the lin-
ear correlation and also to make a rough guess as to the strength of the correlation.
Scatter diagrams with varying directions and strengths of correlation are shown in
Figure 11–9. For example, scatter diagram (a) in Figure 11–9 shows a large positive
correlation, because the dots fall relatively close to a straight line, with low scores
[Six scatter diagram panels, (a) through (f).]
Figure 11–9 Examples of scatter diagrams with different degrees of correlation.
How are you doing?
1. What is the difference between a linear and curvilinear correlation in terms of
how they appear in a scatter diagram?
2. What does it mean to say that two variables have no correlation?
3. What is the difference between a positive and negative linear correlation?
Answer this question in terms of (a) the patterns in a scatter diagram and
(b) what those patterns tell you about the relationship between the two variables.
4. For each of the scatter diagrams shown in Figure 11–10, say whether the pat-
tern is roughly linear, curvilinear, or no correlation. If the pattern is roughly lin-
ear, also say if it is positive or negative, and whether it is large, moderate, or
small.
5. Give two reasons why it is important to identify the pattern of correlation in a
scatter diagram before proceeding to figure the precise correlation.
going with low scores and highs with highs. Scatter diagram (d), however, shows a
negative correlation (there is a general tendency for lows to be with highs and highs
with lows) that is of a moderate size (the dots fall too far from a straight line to be a
large correlation, but are not so far apart that it is a small correlation). Using a scat-
ter diagram to examine the direction and approximate strength of correlation is im-
portant because it lets you check to see whether you have made a major mistake when
you then do the figuring you learn in the next section.
[Four scatter diagram panels, (a) through (d).]
Figure 11–10 Scatter diagrams for “How are you doing?” question 4.
product of deviation scores: the result of multiplying the deviation score on one variable by the deviation score on another variable.
The Correlation Coefficient
Looking at a scatter diagram gives you a rough idea of the relationship between two
variables, but it is not a very precise approach. What you need is a number that gives
the exact correlation (in terms of its direction and strength).
Logic of Figuring the Linear Correlation
A linear correlation (when it is positive) means that highs go with highs and lows
with lows. Thus, the first thing you need in figuring the correlation is some consis-
tent way to measure what is a high score and what is a low score. An efficient way to
solve this problem is to use deviation scores—that is, the raw score minus the mean
(X − MX for one variable and Y − MY for the other variable). A
raw score above the mean (that is, a high score) will always give a positive deviation
score and a raw score below the mean (that is, a low score) will always give a nega-
tive deviation score.
There is an additional and very important reason why deviation scores are so use-
ful when figuring the correlation. It has to do with what happens if you multiply a score
on one variable by a score on the other variable and get the product. When using
deviation scores, this is called a product of deviation scores (or product of deviations).
If you multiply a positive deviation score on one variable by a positive deviation score
on another variable (each positive deviation score represents a raw score above the
mean), you will always get a positive product. Further—and here is where it gets in-
teresting—if you multiply a negative deviation score by a negative deviation score
(each negative deviation score represents a raw score below the mean), you also get
a positive product.
deviation scores: X − MX for one variable and Y − MY for the other.
Answers
1. In a linear correlation, the pattern of dots roughly follows a straight line (although with a small correlation, the dots will be spread widely around a straight line); in a curvilinear correlation, there is a clear systematic pattern to the dots, but it is not a straight line.
2. Two variables have no correlation when there is no pattern of relationship between them.
3. (a) In a scatter diagram for a positive linear correlation, the line that roughly describes the pattern of dots goes up and to the right; in a negative linear correlation, the line goes down and to the right. (b) In a positive linear correlation, the basic pattern is that high scores on one variable go with high scores on the other, mediums go with mediums, and lows go with lows; in a negative linear correlation, high scores on one variable go with low scores on the other, mediums go with mediums, and lows go with highs.
4. In Figure 11–10: (a) linear, negative, large; (b) curvilinear; (c) linear, positive, large; (d) no correlation.
5. Identifying whether the pattern of correlation in a scatter diagram is linear tells you whether it is appropriate to use the standard procedures for figuring a linear correlation. If it is linear, identifying the direction and approximate strength of correlation before doing the figuring lets you check the results of your figuring when you are done.
So, if highs on one variable go with highs on the other, and lows on one go with
lows on the other, the products of deviation scores always will be positive. Consid-
ering a whole distribution of scores, suppose you take each person’s deviation score
on one variable and multiply it by that person’s deviation score on the other variable.
The result of doing this when highs go with highs and lows with lows is that the
products all come out positive. If you sum up these products of deviation scores for
all the people in the study, which are all positive, you will end up with a big positive
number.
On the other hand, with a negative correlation, highs go with lows and lows
with highs. In terms of deviation scores, this would mean positives with negatives
and negatives with positives. Multiplied out, that gives all negative products of deviation scores. If you add all these negative products together, you get a big negative number.
Finally, suppose there is no linear correlation. In this situation, for some people
highs on one variable would go with highs on the other variable (and some lows would
go with lows), making positive products of deviations. For other people, highs on
one variable would go with lows on the other variable (and some lows would go with
highs), making negative products. Adding up these products for all the people in the
study would result in the positive products and the negative products canceling each
other out, giving a result around 0.
In each situation, we changed all the scores to deviation scores, multiplied the two
deviation scores for each person by each other, and added up these products of devi-
ations. The result was a large positive number if there was a positive linear correla-
tion, a large negative number if there was a negative linear correlation, and 0 if there
was no linear correlation.
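A minimal sketch of this logic, with fictional scores: the sum of the products of deviation scores comes out clearly positive when highs go with highs, clearly negative when highs go with lows, and zero when the pattern is mixed.

```python
# A sketch (fictional data) of the three situations just described.

def sum_of_deviation_products(xs, ys):
    """Sum of (X - MX)(Y - MY) over all pairs of scores."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

highs_with_highs = sum_of_deviation_products([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
highs_with_lows = sum_of_deviation_products([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])
mixed = sum_of_deviation_products([1, 2, 3, 4, 5], [8, 2, 6, 10, 4])

print(highs_with_highs)  # prints 20.0 - a big positive number
print(highs_with_lows)   # prints -20.0 - a big negative number
print(mixed)             # prints 0.0 - positives and negatives cancel out
```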
Table 11–2 summarizes the logic up to this point. The table shows the effect on
the correlation of different patterns of raw scores and resulting deviation scores. For
example, the first row shows a high score on X going with a high score on Y. In this
situation, the deviation score for variable X is a positive number (since X is a high
number, above the mean of X ), and similarly the deviation score for variable Y is a
positive number (since Y is a high number, above the mean of Y ). Thus, the product
of these two positive deviation scores must be a positive number (since a positive
number multiplied by a positive number always gives a positive number). The overall
Table 11–2 The Effect on the Correlation of Different Patterns of Raw Scores and Deviation Scores

Pair of Scores      Deviation Scores               Product of Deviation Scores    Effect on Correlation
X        Y          X − MX         Y − MY          (X − MX)(Y − MY)
High     High       +              +               +                              Contributes to positive correlation
Low      Low        −              −               +                              Contributes to positive correlation
High     Low        +              −               −                              Contributes to negative correlation
Low      High       −              +               −                              Contributes to negative correlation
Middle   Any        Zero           +, −, or Zero   Zero                           Makes correlation near zero
Any      Middle     +, −, or Zero  Zero            Zero                           Makes correlation near zero

Note: + indicates a positive number; − indicates a negative number.
TIP FOR SUCCESS
Test your understanding of correlation by covering up portions of Table 11–2 and trying to recall the hidden information.
effect is that when a high score on X goes with a high score on Y, the pair of scores contributes toward making a positive correlation. The table shows that positive products of deviation scores contribute toward making a positive correlation, negative
products of deviation scores contribute toward making a negative correlation,
and products of deviation scores that are zero (or close to zero) contribute toward
making a correlation of zero.
However, you are still left with the problem of figuring the precise strength of
a positive or negative correlation. The larger the number is (that is, the farther from
zero), the stronger the correlation will be. But how large is large, and how large is
not very large? You can’t judge from the sum of the products of deviations alone,
which gets bigger just by adding the products of more persons together. For exam-
ple, a study with 100 people would have a larger sum of products of deviations than
the same study with only 25 people. The sum of the products also gets larger if the
scores are on a more spread-out scale. For example, a study in which the scores on
the two variables have a lot of variation, so they range from, say, 0 to 50, will have
much larger products of deviation scores (and thus a larger sum of the products)
than a study in which the scores on the two variables have less variation and range
from, say, 0 to 10. This is because you are multiplying larger deviation scores by
each other.
The upshot of all this is that the sign (+ or −) of the sum of the products of deviation scores tells you the direction of the correlation. And the bigger it is (ignoring the
sign), the more positive or negative it is. But it is hard to know from the sum of the
products of deviation scores just how strong the correlation is because the number of
people in the study and the amount of variation of the scores for each variable both
affect the size of the sum of the products of deviation scores.
The solution to finding the precise degree of correlation is to divide this sum of
the products of deviations by a number that corrects for both the number of people in
the study and the variation of the scores for each variable. It turns out that this num-
ber is based on the sum of the squared deviations of each variable. This is because the
more people there are in the study, the more squared deviations are being summed and
because the more variation there is in the scores for each variable, the larger will be
the squared deviations being summed. That is, to adjust our sum of products, we use
a correction number that has two properties:
1. It gets larger with more people.
2. It gets larger as the scores for each variable have more variation.
These two properties of the correction number mean that it serves two very
important purposes: it adjusts for the number of people in the study, and it adjusts for
the different variation in scores for each variable.
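The two properties can be checked directly with the sum of squared deviations (the SS mentioned next). A small sketch with fictional scores:

```python
# A sketch (fictional scores): the sum of squared deviations gets larger
# with more people and with more spread-out scores.

def sum_of_squared_deviations(scores):
    """SS: sum of (score - mean) squared."""
    m = sum(scores) / len(scores)
    return sum((s - m) ** 2 for s in scores)

few = sum_of_squared_deviations([2, 4, 6])              # 3 people
more = sum_of_squared_deviations([2, 4, 6, 2, 4, 6])    # same spread, 6 people
spread = sum_of_squared_deviations([0, 4, 8])           # 3 people, more variation

print(few, more, spread)  # prints 8.0 16.0 32.0
```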
The actual specific correction number that is used is the square root of what you get when you take the sum of squared deviations for each variable (the SS or sum of squares you figure when figuring the variance) and multiply the two sums of squares by each other: √(SSX)(SSY). However, we will turn to the formulas shortly.
So how do you actually use this number to make the correction? You divide the sum of products of deviations by this correction number. It turns out that the result of dividing the sum of the products of deviation scores by the correction number can never be more than +1, which would be a perfect positive linear correlation. It can never be less than −1, which would be a perfect negative linear correlation. In the situation of no linear correlation, the result is 0.
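Putting the pieces together for the Table 11–1 sleep and mood scores, this sketch divides the sum of the products of deviation scores by the correction number √(SSX)(SSY); the result (about .85) is a strong positive correlation that stays within the −1 to +1 limits.

```python
import math

# A sketch of the figuring just described, using the fictional sleep and
# mood scores from Table 11-1.
hours = [7, 5, 8, 6, 6, 10]
mood = [4, 2, 7, 2, 3, 6]

mx = sum(hours) / len(hours)                     # mean of X
my = sum(mood) / len(mood)                       # mean of Y
sum_products = sum((x - mx) * (y - my) for x, y in zip(hours, mood))
ss_x = sum((x - mx) ** 2 for x in hours)         # sum of squared deviations of X
ss_y = sum((y - my) ** 2 for y in mood)          # sum of squared deviations of Y

r = sum_products / math.sqrt(ss_x * ss_y)        # the correlation
print(round(r, 2))                               # prints 0.85
```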
For a positive linear correlation that is not perfect (it is extremely rare to find a perfect correlation), the result of taking the sum of the products of deviation scores and dividing by the correction number is a number between 0 and +1. To put this another way, if the general trend of the dots is upward and to the right, but they do not fall exactly on a single straight line, the result of this process is between 0 and +1. The same rule holds for negative correlations: they fall between 0 and −1. So, overall, a correlation varies from −1 to +1.
Interpreting the Correlation Coefficient
The result of dividing the sum of the products of deviation scores by the correction
number is called the correlation coefficient. It is also called the Pearson correlation
coefficient (or the Pearson product-moment correlation coefficient, to be very
traditional). It is named after Karl Pearson (whom you meet in Box 13–1). Pearson,
along with Francis Galton (see Box 11–1 in this chapter), played a major role in
developing the correlation coefficient. The correlation coefficient is abbreviated by
TIP FOR SUCCESS
If you figure a correlation coefficient to be larger than +1 or less than −1, you have made a mistake in your figuring.
correlation coefficient (r): measure of degree of linear correlation between two variables ranging from −1 (a perfect negative linear correlation) through 0 (no correlation) to +1 (a perfect positive correlation).
BOX 11–1 Galton: Gentleman Genius
Francis Galton is credited with
inventing the correlation statis-
tic. (Karl Pearson, the hero of
our Chapter 13, worked out the
formulas, but Pearson was a stu-
dent of Galton and gave Galton
all the credit.) Statistics at this
time (around the end of the 19th
century) was a tight little British
club. In fact, most of science
was an only slightly larger club.
Galton also was influenced greatly by his own cousin,
Charles Darwin.
Galton was a typical eccentric, independently wealthy
gentleman scientist. Aside from his work in statistics, he
possessed a medical degree, had explored “darkest
Africa,” invented glasses for reading underwater, exper-
imented with stereoscopic maps, dabbled in meteorology
and anthropology, and wrote a paper about receiving in-
telligible signals from the stars.
Above all, Galton was a compulsive counter. Some of
his counts are rather infamous. Once while attending a lec-
ture he counted the fidgets of an audience per minute, look-
ing for variations with the boringness of the subject matter.
While twice having his picture painted, he counted the
artist’s brush strokes per hour, concluding that each portrait
required an average of 20,000 strokes. While walking the
streets of various towns in the British Isles, he classified the
beauty of the female inhabitants by fingering a recording
device in his pocket to register good, medium, or bad.
Galton’s consuming interest, however, was the count-
ing of geniuses, criminals, and other types in families. He
wanted to understand how each type was produced so that
science could improve the human race by encouraging
governments to enforce eugenics—selective breeding for
intelligence, proper moral behavior, and other qualities—
to be determined, of course, by the eugenicists. (Eugenics
has since been generally discredited.) The concept of cor-
relation came directly from his first simple efforts in this
area, the study of the relation of the height of children to
their parents.
At first, Galton’s method of exactly measuring the ten-
dency for “one thing to go with another” seemed almost
the same as proving the cause of something. For exam-
ple, if it could be shown mathematically that most of the
brightest people came from a few highborn British fami-
lies and most of the least intelligent people came from
poor families, that seemed at first to “prove” that intelli-
gence was caused by the inheritance of certain genes (pro-
vided that you were prejudiced enough to overlook the
differences in educational opportunities). Now the study
only proves that if you were a member of one of those
highborn British families, history would make you a prime
example of how easy it is to misinterpret the meaning of
a correlation.
You can learn more about Galton on the following Web
page: http://www-history.mcs.st-andrews.ac.uk/Biographies/
Galton.html.
Sources: Peters (1987); Salsburg (2001); Tankard (1984).
ISBN 0-558-46761-X
Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.
Correlation 447
Figure 11–11 Examples of scatter diagrams and correlation coefficients for different degrees of linear correlation: (a) r = .81; (b) r = −.75; (c) r = .46; (d) r = −.42; (e) r = .16; (f) r = −.18.
the letter r, which is short for regression, an idea closely related to correlation (see
Chapter 12).
The sign (+ or −) of a correlation coefficient tells you the direction of the linear correlation between two variables (a positive correlation or a negative correlation). The actual value of the correlation coefficient, from a low of 0 to a high of 1 (ignoring the sign of the correlation coefficient), tells you the strength of the linear correlation. So, a correlation coefficient of +.85 represents a larger linear correlation than a correlation of +.42. Similarly, a correlation of −.90 represents a larger linear correlation than +.85 (since .90 is bigger than .85). Another way of thinking of this is that, in a scatter diagram, the closer the dots are to falling on a single straight line, the larger the linear correlation. Figure 11–11 shows the scatter diagrams from Figure 11–9, with the correlation coefficient shown for each
T I P F O R S U C C E S S
When changing the raw scores to
deviation scores, it is easiest (and
you will make fewer mistakes) if
you do all the deviation scores for
one variable and then all the devia-
tion scores for the other variable.
Also, to make sure you have done
it correctly, when you finish all the
deviation scores for a variable, add
them up; they should add up to 0
(within rounding error).
scatter diagram. Be sure that the correlation coefficient for each scatter diagram
agrees roughly with the correlation coefficient you would expect based on the pat-
tern of dots.
Formula for the Correlation Coefficient
The correlation coefficient, as we have seen, is the sum of the products of deviation scores divided by a correction number that takes into account the number of people and the variation on each variable being correlated. Put as a formula,

r = Σ[(X − MX)(Y − MY)] / √(SSX)(SSY)    (11–1)

r is the correlation coefficient. X − MX is the deviation score for each person on the X variable and Y − MY is the deviation score for each person on the Y variable; (X − MX)(Y − MY) is the product of deviation scores for each person; and Σ[(X − MX)(Y − MY)] is the sum of the products of deviation scores over all the people in the study. SSX is the sum of squared deviations for the X variable and SSY is the sum of squared deviations for the Y variable.1
Steps for Figuring the Correlation Coefficient
Here are the steps for figuring the correlation coefficient.
❶ Change the scores for each variable to deviation scores. Figure the mean of
each variable. Then subtract each variable’s mean from each of its scores. (This
is just what you have been doing all along as part of figuring the variance.)
❷ Figure the product of the deviation scores for each pair of scores. That is, for
each pair of scores, multiply the deviation score on one variable by the deviation
score on the other variable.
❸ Add up all the products of the deviation scores.
❹ For each variable, square each deviation score.
❺ Add up the squared deviation scores for each variable.
➏ Multiply the two sums of squared deviations and take the square root of the
result. This creates a correction number.
❼ Divide the sum of the products of deviation scores from Step ❸ by the cor-
rection number from Step ➏.
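The seven steps above translate directly into code. Here is a short Python sketch of our own (not from the text; the function name is illustrative), applied to the sleep and mood scores used below:

```python
import math

def correlation_coefficient(x_scores, y_scores):
    """Figure r by the seven deviation-score steps described above."""
    n = len(x_scores)
    mean_x = sum(x_scores) / n
    mean_y = sum(y_scores) / n
    # Step 1: change each score to a deviation score.
    dev_x = [x - mean_x for x in x_scores]
    dev_y = [y - mean_y for y in y_scores]
    # Steps 2 and 3: multiply each pair of deviation scores, then add up the products.
    sum_products = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    # Steps 4 and 5: square each deviation score and add them up for each variable (SSX, SSY).
    ss_x = sum(dx ** 2 for dx in dev_x)
    ss_y = sum(dy ** 2 for dy in dev_y)
    # Step 6: the correction number is the square root of SSX times SSY.
    correction = math.sqrt(ss_x * ss_y)
    # Step 7: divide the sum of the products of deviation scores by the correction number.
    return sum_products / correction

# The sleep and mood scores from Table 11-3:
sleep = [7, 5, 8, 6, 6, 10]
mood = [4, 2, 7, 2, 3, 6]
print(round(correlation_coefficient(sleep, mood), 2))  # prints 0.85
```

Note how the structure of the function follows the steps one for one; nothing beyond the formula is needed.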
An Example
Let us try these steps with the sleep and mood example.
❶ Change the scores for each variable to deviation scores. Starting with the number of hours slept last night, the mean is 7 (sum of 42 divided by 6 students). The deviation score for the first student's sleep score is 7 − 7 = 0. We figured the rest of the deviation scores for each variable and show them in the X − MX and Y − MY columns in Table 11–3.
❷ Figure the product of the deviation scores for each pair of scores. For the first student, multiply 0 by 0 to give 0. The products of deviation scores for all the students are shown in the last column of Table 11–3.
The correlation coefficient is
the sum, over all the people in
the study, of the product of
each person’s two deviation
scores, divided by the result of
taking the square root of what
you get when you multiply the
sum of everyone’s squared
deviation scores on the X
variable by the sum of
everyone’s squared deviation
scores on the Y variable.
❸ Add up all the products of the deviation scores. Adding up all the products of
the deviation scores, as shown in Table 11–3, gives a sum of 16.
❹ For each variable, square each deviation score. For the first student, the squared deviation for the sleep variable is 0 multiplied by 0, which is 0. The squared deviation scores for all the students for the sleep variable are shown in the (X − MX)² column of Table 11–3. The squared deviation scores for all the students for the happy mood variable are shown in the (Y − MY)² column.
❺ Add up the squared deviation scores for each variable. As shown in Table
11–3, the sum of squared deviations for the sleep variable is 16 and the sum of
squared deviations for the happy mood variable is 22.
➏ Multiply the two sums of squared deviations and take the square root of the
result. Multiplying 16 by 22 is 352, and the square root of 352 is 18.76.
❼ Divide the sum of the products of deviation scores from Step ❸ by the cor-
rection number from Step ➏. Dividing 16 by 18.76 gives a result of .85. This is
the correlation coefficient. (Note that correlation coefficients are usually rounded
to two decimal places.)
Table 11–3 Figuring the Correlation Coefficient for the Sleep and Mood Study (Fictional Data)

Number of Hours Slept (X)                       Happy Mood (Y)
         Deviation ❶   Squared ❹               Deviation ❶   Squared ❹    Products of Deviation Scores ❷
X        X − MX        (X − MX)²        Y      Y − MY        (Y − MY)²    (X − MX)(Y − MY)
7          0              0             4        0              0            0
5         −2              4             2       −2              4            4
8          1              1             7        3              9            3
6         −1              1             2       −2              4            2
6         −1              1             3       −1              1            1
10         3              9             6        2              4            6
Σ = 42   M = 7    SSX = 16 ❺           Σ = 24  M = 4    SSY = 22 ❺          Σ = 16 ❸

r = Σ[(X − MX)(Y − MY)] / √(SSX)(SSY) ❻ = 16 / √(16)(22) = 16 / √352 = 16 / 18.76 = .85 ❼

In terms of the correlation coefficient formula,

r = Σ[(X − MX)(Y − MY)] / √(SSX)(SSY) = 16 / 18.76 = .85

Because this correlation coefficient is positive and near 1, the highest possible value, this is a very large positive linear correlation.

A Second Example
Suppose that a memory researcher does an experiment to test a theory predicting that the number of exposures to a word increases the chance that the word will be remembered. One research participant is randomly assigned to be exposed to the list of 10 words once, one participant to be exposed to the list twice, and so forth, up to a total of eight exposures to each word. This makes eight participants in all, one for
each of the eight levels of exposure. The researchers record how many of the 10 words
each participant is able to remember. Results are shown in Table 11–4. (An actual
study of this kind would probably show a pattern in which the relative improvement
in recall is less at higher numbers of exposures.) The steps for figuring the correlation
coefficient are shown in Table 11–5.
❶ Change the scores for each variable to deviation scores. The mean of the number of exposures is 4.5. Thus, the first exposure score of 1 gives a deviation score of 1 − 4.5 = −3.5. Using the same procedure for all the other scores gives the deviation scores shown in the X − MX and Y − MY columns in Table 11–5.
❷ Figure the product of the deviation scores for each pair of scores. For the first person, multiply −3.5 by −2 to give 7. The products of deviation scores for all the scores are shown in the last column of Table 11–5.
❸ Add up all the products of the deviation scores. Adding up all the products of the deviation scores, as shown in Table 11–5, gives a sum of 30.
❹ For each variable, square each deviation score. For the first person, the squared deviation for the number of exposures variable is −3.5 multiplied by −3.5, which is 12.25. The squared deviation scores for all the scores are shown in the (X − MX)² and (Y − MY)² columns of Table 11–5.
❺ Add up the squared deviation scores for each variable. As shown in Table 11–5, the sum of squared deviations for the number of exposures variable is 42, and the sum of squared deviations for the number of words recalled variable is 32.
❻ Multiply the two sums of squared deviations and take the square root of the result. Multiplying 42 by 32 is 1344, and the square root of 1344 is 36.66.
❼ Divide the sum of the products of deviation scores from Step ❸ by the correction number from Step ❻. Dividing 30 by 36.66 gives a result of .82. This is the correlation coefficient.
Table 11–4 Effect of Number of Exposures to Words on the Number of Words Recalled (Fictional Data)
Number of Exposures    Number of Words Recalled
1 3
2 2
3 6
4 4
5 5
6 5
7 6
8 9
Table 11–5 Figuring the Correlation Coefficient for the Effect of Number of Exposures to Each Word on the Number of Words Recalled (Fictional Data)

Number of Exposures (X)                         Number of Words Recalled (Y)
         Deviation ❶   Squared ❹               Deviation ❶   Squared ❹    Products of Deviation Scores ❷
X        X − MX        (X − MX)²        Y      Y − MY        (Y − MY)²    (X − MX)(Y − MY)
1        −3.5           12.25           3       −2              4            7.0
2        −2.5            6.25           2       −3              9            7.5
3        −1.5            2.25           6        1              1           −1.5
4         −.5             .25           4       −1              1             .5
5          .5             .25           5        0              0            0
6         1.5            2.25           5        0              0            0
7         2.5            6.25           6        1              1            2.5
8         3.5           12.25           9        4             16           14
Σ = 36   M = 4.5  SSX = 42 ❺           Σ = 40  M = 5    SSY = 32 ❺          Σ = 30 ❸

r = Σ[(X − MX)(Y − MY)] / √(SSX)(SSY) ❻ = 30 / √(42)(32) = 30 / √1344 = 30 / 36.66 = .82 ❼
In terms of the correlation coefficient formula,

r = Σ[(X − MX)(Y − MY)] / √(SSX)(SSY) = 30 / 36.66 = .82

Because this correlation coefficient is positive and near 1, the highest possible value, this is a very large positive linear correlation.
How are you doing?
1. Why do we change the scores for each variable into deviation scores in the first
step of figuring the correlation coefficient?
2. Explain the logic of using the sum of the products of deviation scores as the
numerator of the formula for the correlation coefficient.
3. When figuring the correlation coefficient, why do you divide the sum of the
products of deviation scores by a correction number?
4. Write the formula for the correlation coefficient and define each of the symbols.
5. Figure the correlation coefficient for the following scores for three people who
were each tested on two variables, X and Y.
Person X Y
K 5 10
L 4 10
M 3 13
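Before checking the answers, question 5 can be verified numerically; here is a brief Python sketch of our own applying formula (11–1) to the three pairs of scores:

```python
import math

x = [5, 4, 3]       # scores for persons K, L, and M on variable X
y = [10, 10, 13]    # scores for persons K, L, and M on variable Y

mx, my = sum(x) / len(x), sum(y) / len(y)                    # means of X and Y
products = sum((a - mx) * (b - my) for a, b in zip(x, y))    # sum of products of deviation scores
ssx = sum((a - mx) ** 2 for a in x)                          # sum of squared deviations for X
ssy = sum((b - my) ** 2 for b in y)                          # sum of squared deviations for Y
r = products / math.sqrt(ssx * ssy)
print(round(r, 2))  # prints -0.87
```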
Answers

1. We change the scores for each variable into deviation scores because deviation scores show directly what is a high score and what is a low score.
2. When both deviation scores are positive (which represent scores above the mean) or both deviation scores are negative (which represent scores below the mean), the products of the deviation scores in each case are positive. Across a whole distribution of high with high and low with low scores, the sum of the products of deviation scores gives a large positive number (indicating a positive correlation between the two variables). However, when one deviation score is positive (which represents a score above the mean) and the other deviation score is negative (which represents a score below the mean), the product of the deviation scores is negative. Across a whole distribution of high with low (and low with high) scores, the sum of the products of deviation scores gives a large negative number (indicating a negative correlation between the two variables). However, when there is no linear correlation, the sum of the products of deviation scores will be close to zero, because the positive and negative products of deviation scores will cancel each other out.
3. You divide the sum of the products of deviation scores by a correction number because, otherwise, the more people there are in the study, and the greater the variability of each variable's scores, the bigger the sum of the products of deviation scores will be, even if the degree of correlation is the same. Dividing by the correction number (which is the result of taking the square root of the result of multiplying the sum of squares for the X variable by the sum of squares for the Y variable) corrects for this.
4. Formula for the correlation coefficient: r = Σ[(X − MX)(Y − MY)] / √(SSX)(SSY). r is the correlation coefficient; Σ is the symbol for sum of (add up all the scores that follow; in this formula, you add up all the products of deviation scores that follow); X − MX is the deviation score for each person on the X variable; Y − MY is the deviation score for each person on the Y variable; SSX is the sum of squared deviations for the X variable; SSY is the sum of squared deviations for the Y variable.
5. As shown in Table 11–6, r = −.87.

Table 11–6 Figuring the Correlation Coefficient for "How are you doing?" Question 5

         Deviation ❶   Squared ❹               Deviation ❶   Squared ❹    Products of Deviation Scores ❷
X        X − MX        (X − MX)²        Y      Y − MY        (Y − MY)²    (X − MX)(Y − MY)
5          1              1            10       −1              1           −1
4          0              0            10       −1              1            0
3         −1              1            13        2              4           −2
Σ = 12   M = 4    SSX = 2 ❺            Σ = 33  M = 11   SSY = 6 ❺           Σ = −3 ❸

r = Σ[(X − MX)(Y − MY)] / √(SSX)(SSY) ❻ = −3 / √(2)(6) = −3 / √12 = −3 / 3.46 = −.87 ❼

T I P F O R S U C C E S S
You will not be able to make much sense of this section if you have not yet studied Chapters 3 through 7.
Significance of a Correlation Coefficient
The correlation coefficient is a descriptive statistic, like the mean or standard
deviation. The correlation coefficient describes the linear relationship between two
variables. However, in addition to describing this relationship, we may also want to
test whether it is statistically significant. In the case of a correlation, the question is
usually whether it is significantly different from zero. That is, the null hypothesis in
hypothesis testing for a correlation is usually that in the population the true relation
between the two variables is no correlation (r = 0).2
The overall logic is much like that we have considered for the various t test and
analysis of variance situations discussed in previous chapters. Suppose for a particu-
lar population we had the distribution of two variables, X and Y. And suppose further
that in this population there was no correlation between these two variables. The
scatter diagram might look like that shown in Figure 11–12. Thus, if you were to
consider the dot for one random person from this scatter diagram, the scores might be
X = 4 and Y = 2. For another random person, it might be X = 2 and Y = 1. For a third person, X = 3 and Y = 5. The correlation for these three persons would be r = .24. If you then took out another three persons and figured the correlation, it might come out to r = −.12. Presuming there was no actual correlation in the population, if you did this lots and lots of times, you would end up with a distribution of correlations with a mean of zero. This is a distribution of correlations of three persons each. As shown in Figure 11–13, it would have a mean of zero and be spread out in both directions up to a maximum of +1 and a minimum of −1.
Figure 11–12 Scatter diagram for variables X and Y for a population in which there is no relationship between X and Y.
It would actually be possible to figure out the cutoffs for significance on such a
distribution of correlation coefficients, just as we did for example for the F distribu-
tion. Then you could just compare your actual r to that cutoff to see if it was signifi-
cant. However, we do not need to introduce a whole new distribution with its own
tables and such. It turns out that we can figure out a number based on the correlation
coefficient that will follow a t distribution. This number is figured using the following formula:

t = r / √[(1 − r²) / (N − 2)]    (11–2)

Notice that in this formula if r = 0, then t = 0. This is because the numerator would be 0 and the result of dividing 0 by any number is 0. Also notice that the bigger the r, the bigger the t.
If you were to take three persons' scores at random from the distribution with no true correlation, you could figure this t value. For example, for the first three-person example we just considered, the correlation was .24. So, t = .24 / √[(1 − .24²) / (3 − 2)] = .24 / √(.9424 / 1) = .25. If you took a large number of such samples of three persons each, computed the correlation and then the t for each, you would eventually have a distribution of t scores. And here is the main point: you
Figure 11–13 Distribution of correlation coefficients for a large number of samples (N = 3) drawn from a population with no correlation between variables X and Y.
The t score for a correlation
coefficient is the result of
dividing the correlation
coefficient by the square root
of what you get when you
divide one minus the
correlation coefficient squared
by two less than the number
of people in the study.
could then compare the t score figured in this way for the actual correlation in the
study, using the standard t table cutoffs.
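Formula (11–2) translates directly into code. This Python sketch is ours (the function name is illustrative); it reproduces the t of .25 for the three-person sample with r = .24:

```python
import math

def t_for_correlation(r, n):
    """Convert a correlation coefficient to a t score: t = r / sqrt((1 - r**2) / (n - 2))."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

print(round(t_for_correlation(0.24, 3), 2))  # prints 0.25
```

Note that r = 0 gives t = 0 for any sample size, and larger (absolute) correlations give larger (absolute) t scores, just as the text observes.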
As usual with the t statistic, there are different t distributions for different de-
grees of freedom. In the case of the t test for a correlation, df is the number of people
in the sample minus 2. (We subtract 2 because the whole figuring involved two different means, the mean of X and the mean of Y.) In terms of a formula,

df = N − 2    (11–3)

Finally, note that the t value will be positive or negative, according to whether your correlation is positive or negative. Thus, as with any t test, the t test for a correlation can be either one-tailed or two-tailed. A one-tailed test means that the researcher has predicted the sign (+ or −) of the correlation. However, in practice, even when a researcher expects a certain direction of correlation, correlations are usually tested with two-tailed tests.
An Example
In the sleep and mood study example, let’s suppose that the researchers predicted a
correlation between number of hours slept and happy mood the next day, to be tested
at the .05 level, two-tailed.
❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are two populations:
Population 1: People like those in this study.
Population 2: People for whom there is no correlation between number of hours
slept the night before and mood the next day.
The null hypothesis is that the two populations have the same correlation.
The research hypothesis is that the two populations do not have the same
correlation.
❷ Determine the characteristics of the comparison distribution. The comparison distribution is a t distribution with df = 4. (That is, df = N − 2 = 6 − 2 = 4.)
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. The t table (Table A–2 in the Appendix) shows that for a two-tailed test at the .05 level, with 4 degrees of freedom, the cutoff t scores are +2.776 and −2.776.
❹ Determine your sample's score on the comparison distribution. We figured a correlation of r = .85. Applying the formula to find the equivalent t, we get

t = r / √[(1 − r²) / (N − 2)] = .85 / √[(1 − .85²) / (6 − 2)] = .85 / √.0694 = 3.23

❺ Decide whether to reject the null hypothesis. The t score of 3.23 for our sample correlation is more extreme than the cutoff t score of 2.776. Thus, we can reject the null hypothesis and the research hypothesis is supported.
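The numerical steps of this test can be mirrored in a small Python sketch of our own (the cutoff is looked up in the t table, not computed here):

```python
import math

r, n = 0.85, 6                               # correlation and sample size for the sleep/mood study
df = n - 2                                   # degrees of freedom, formula (11-3)
cutoff = 2.776                               # two-tailed .05 cutoff for df = 4, from Table A-2
t = r / math.sqrt((1 - r ** 2) / (n - 2))    # formula (11-2)
print(round(t, 2))                                                  # prints 3.23
print("reject the null" if abs(t) > cutoff else "fail to reject")   # prints reject the null
```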
Assumptions for the Significance Test
of a Correlation Coefficient
The assumptions for testing the significance of a correlation coefficient are similar to
those for the t test for independent means and analysis of variance. In those situations
you have to assume the population for each group follows a normal distribution and
The degrees of freedom for
the t test for a correlation are
the number of people in the
sample minus 2.
has the same variance as the population for the other groups. With the correlation you
have to assume that:
1. The population of each variable (X and Y) follows a normal distribution.
Actually, you also assume that the relationship between the two variables follows a normal curve. This creates what is called a bivariate normal distribution. In practice, however, we usually check whether we have met the requirement by checking whether the distribution in the sample for each of our variables is roughly normal.
2. There is an equal distribution of each variable at each point of the other
variable. For example, in a scatter diagram, if there is much more variation at
the low end than at the high end (or vice versa), this suggests a problem. In prac-
tice, you should look at the scatter diagram for your study to see if it looks like
the dots are much more spread out at the low or high end (or both). A lot of dots
in the middle are to be expected. So long as the greater number of dots in the
middle are not a lot more spread out than those at either end, this does not sug-
gest a problem with the assumptions.
Like the t tests you have already learned and like the analysis of variance, the
t test for the significance of a correlation coefficient is pretty robust to all but extreme
violations of its assumptions.
How are you doing?
1. What is the usual null hypothesis in hypothesis testing with a correlation coef-
ficient?
2. Write the formula for testing the significance of a correlation coefficient, and de-
fine each of the symbols.
3. Use the five steps of hypothesis testing to determine whether a correlation coefficient of r = −.31 from a study with a sample of 60 people is significant at the .05 level, two-tailed.
4. What are the assumptions for the significance test of a correlation coefficient?
Answers

1. In hypothesis testing with a correlation coefficient, the usual null hypothesis is that in the population the true relation between the two variables is no correlation (r = 0).
2. Formula for testing the significance of a correlation coefficient: t = r / √[(1 − r²) / (N − 2)]. t is the t statistic for testing the significance of the correlation coefficient; r is the correlation coefficient; N is the number of people in the study.
3. ❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:
Population 1: People like those in this study.
Population 2: People for whom there is no correlation between the two variables.
The null hypothesis is that the two populations have the same correlation. The research hypothesis is that the two populations do not have the same correlation.
❷ Determine the characteristics of the comparison distribution. The comparison distribution is a t distribution with df = 58. (That is, df = N − 2 = 60 − 2 = 58.)
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. The t table (Table A–2 in the Appendix) shows that for a two-tailed test at the .05 level, with 58 degrees of freedom, the cutoff t scores are 2.004 and −2.004 (we used the cutoffs for df = 55, the closest df in the table below 58).
❹ Determine your sample's score on the comparison distribution. The correlation in the study was −.31. Applying the formula to find the equivalent t, we get

t = r / √[(1 − r²) / (N − 2)] = −.31 / √[(1 − (−.31)²) / 58] = −.31 / .125 = −2.48

❺ Decide whether to reject the null hypothesis. The t score of −2.48 for our sample correlation is more extreme than the cutoff t score of −2.004. Thus, we can reject the null hypothesis and the research hypothesis is supported.
4. The population of each variable (and the relationship between them) follows a normal distribution, and there is an equal distribution of each variable at each point of the other variable.
Correlation and Causality
If two variables have a significant linear correlation, we normally assume that there is something causing them to go together. However, you can't know the direction of causality (what is causing what) just from the fact that the two variables are correlated.
Three Possible Directions of Causality
Consider the example with which we started the chapter, the correlation between
doing exciting activities with your partner and satisfaction with the relationship. There
are three possible directions of causality for these two variables:
1. It could be that doing exciting activities together causes the partners to be more
satisfied with their relationship.
2. It could also be that people who are more satisfied with their relationship choose
to do more exciting activities together.
3. Another possibility is that something like having less pressure (versus more
pressure) at work makes people happier in their marriage and also gives them
more time and energy to do exciting activities with their partner.
These three possible directions of causality are shown in Figure 11–14a.
The principle is that for any correlation between variables X and Y, there are at
least three possible directions of causality:
1. X could be causing Y.
2. Y could be causing X.
3. Some third factor could be causing both X and Y.
These three possible directions of causality are shown in Figure 11–14b.
direction of causality path of causal
effect; if X is thought to cause Y then the
direction of causality is from X to Y.
It is also possible (and often likely) that there is more than one direction of causal-
ity making two variables correlated.
Ruling Out Some Possible Directions of Causality
Sometimes you can rule out one or more of these three possible directions based on
additional knowledge of the situation. For example, the correlation between sleep the
night before and a happy mood the next day cannot be due to happy mood the next
day causing you to sleep more the night before (causality doesn’t go backward in
time). But we still do not know whether the sleep the night before caused the happy
mood or some third factor, such as a general tendency to be happy, caused people
both to sleep well and to be happy on any particular day.
Another way we can rule out alternative directions of causality is by conducting
a true experiment. In a true experiment, participants are randomly assigned to a par-
ticular level of a variable and then measured on another variable. An example of this
is the study in which participants were randomly assigned (say, by flipping a coin) to
different numbers of exposures to a list of words, and then the number of words they
could remember was measured. There was an .82 correlation between number of
exposures and number of words recalled. In this situation, any causality has to be
from the variable that was manipulated (number of exposures) to the variable that is
measured (words recalled). The number of words recalled can’t cause more expo-
sures, because the exposures came first. And a third variable can’t be causing both
number of exposures and words recalled because number of exposures was deter-
mined randomly; nothing can be causing it other than the random method we used
(such as flipping a coin).
Correlational Statistical Procedures versus Correlation
Research Methods
Discussions of correlation and causality in psychology research are often confused
by there being two uses of the word correlation. Sometimes the word is used as the
name of a statistical procedure, the correlation coefficient (as we have done in this
Figure 11–14 Three possible directions of causality (shown with arrows) for a correlation for (a) the exciting activities and marital satisfaction example and (b) the general principle for any two variables X and Y.
chapter). At other times, the term correlation is used to describe a kind of research
design. A correlational research design is any research design other than a true
experiment. A correlational research design is not necessarily statistically analyzed
using the correlation coefficient, and some studies using experimental research designs
are most appropriately analyzed using a correlation coefficient. Hence the confusion.
We recommend you take one or more research methods courses to learn more about
research designs used in research in psychology.
How are you doing?
1. If anxiety and depression are correlated, what are three possible directions of
causality that might explain this correlation?
2. If high school and college grades are correlated, what directions of causality
can and cannot be ruled out by the situation?
3. A researcher randomly assigns participants to eat either zero or four cookies
and then asks them how full they feel. The number of cookies eaten and feel-
ing full are highly correlated. What directions of causality can and cannot be
ruled out?
4. What is the difference between correlation as a statistical procedure and a cor-
relational research design?
Answers
1. Being depressed can cause a person to be anxious; being anxious can cause a person to be depressed; some third variable (such as some aspect of heredity or childhood traumas) could be causing both anxiety and depression.
2. College grades cannot be causing high school grades (causality doesn't go backward), but high school grades could be causing college grades (maybe knowing you did well in high school gives you more confidence), and some third variable (such as general academic ability) could be causing students to do well in both high school and college.
3. Eating more cookies can cause participants to feel full. Feeling full cannot have caused participants to have eaten more cookies, because how many cookies were eaten was determined randomly. Third variables can't cause both, because how many cookies were eaten was determined randomly.
4. The statistical procedure of correlation is about using the formula for the correlation coefficient, regardless of how the study was done. A correlational research design is any research design other than a true experiment.
Issues in Interpreting the Correlation Coefficient
There are a number of subtle cautions in interpreting a correlation coefficient.
The Correlation Coefficient and the Proportionate Reduction
in Error or Proportion of Variance Accounted For
A correlation coefficient tells you the direction and strength of a linear correlation.
Bigger rs (values farther from 0) mean a higher degree of correlation. That is, an r of
.60 is a larger correlation than an r of .30. However, most researchers would hold
correlational research design: any research design other than a true experiment.
that an r of .60 is more than twice as large as an r of .30. To compare correlations
with each other, most researchers square the correlations (that is, they use r² instead
of r). This is called, for reasons you will learn in an Advanced Topic section of
Chapter 12, the proportionate reduction in error (and also the proportion of variance
accounted for).
For example, a correlation of .30 is an r² of .09 and a correlation of .60 is an r²
of .36. Thus, a correlation of .60 is actually four times as large as one of .30 (that is,
.36 is four times as big as .09).
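The comparison above can be sketched in a few lines of Python (an illustrative sketch, not part of the book's materials):

```python
# Comparing correlations on the r-squared (proportionate reduction in
# error) scale, as described above.

def proportionate_reduction_in_error(r: float) -> float:
    """Square a correlation coefficient r to get r^2."""
    return r ** 2

r2_small = proportionate_reduction_in_error(0.30)   # about .09
r2_large = proportionate_reduction_in_error(0.60)   # about .36

# On the r^2 scale, an r of .60 is four times as large as an r of .30.
print(round(r2_small, 2), round(r2_large, 2), round(r2_large / r2_small, 2))
# 0.09 0.36 4.0
```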
Restriction in Range
Suppose an educational psychologist studies the relation of grade level to knowledge
of geography. If this researcher studied students from the entire range of school grade
levels, the results might appear as shown in the scatter diagram in Figure 11–15a.
That is, the researcher might find a large positive correlation. But suppose the
researcher had studied students only from the first three grades. The scatter diagram
(see Figure 11–15b) would show a much smaller correlation (the general increasing
tendency is in relation to much more noise). However, the researcher would be mak-
ing a mistake by concluding that grade level is only slightly related to knowledge of
geography over all grades.
The problem in this situation is that the correlation is based on people who
include only a limited range of the possible values on one of the variables. (In this
example, there is a limited range of grade levels.) It is misleading to think of the
correlation as if it applied to the entire range of values the variable might have. This
situation is called restriction in range.
It is easy to make such mistakes in interpreting correlations. (You will occasion-
ally see them even in published research articles.) Consider another example. Busi-
nesses sometimes try to decide whether their hiring tests are correlated with how
successful the persons hired turn out on the job. Often, they find very little relation-
ship. What they fail to take into account is that they hired only people who did well
on the tests. Their study of job success included only the subgroup of high scorers. This
example is shown in Figure 11–16.
proportionate reduction in error (r²): measure of association between variables that is used when comparing associations. Also called proportion of variance accounted for.
restriction in range: situation in which you figure a correlation but only a limited range of the possible values on one of the variables is included in the group studied.
Figure 11–15 Example of restriction in range comparing two scatter diagrams
(a) when the entire range is shown (of school grade level and knowledge of geography) and
(b) when the range is restricted (to the first three grades) (fictional data).
Yet another example is any study that tries to correlate intelligence with other
variables that uses only college students. The problem here is that college students do
not include many lower or below-average intelligence students. Thus, a researcher
could find a low correlation in such a study. But if the researcher did the same study
with people who included the full range of intelligence levels, there could well be a
high correlation.
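The pattern these examples describe can be demonstrated with a small simulation (hypothetical numbers, not data from any of the studies above):

```python
# Simulating restriction in range: knowledge rises with grade level plus
# random noise; correlating over only the first three grades shrinks r.
import random
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(1)
grades = [g for g in range(13) for _ in range(20)]         # grades 0-12
knowledge = [5 * g + random.gauss(0, 10) for g in grades]  # linear trend plus noise

r_full = pearson_r(grades, knowledge)

# Restrict the range to grades 0-2, as in Figure 11-15b.
restricted = [(g, k) for g, k in zip(grades, knowledge) if g <= 2]
r_restricted = pearson_r([g for g, _ in restricted],
                         [k for _, k in restricted])

print(round(r_full, 2), round(r_restricted, 2))
```

With this seed the full-range correlation comes out much larger than the restricted-range one, mirroring the two scatter diagrams.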
Figure 11–16 Additional example of restriction in range comparing two scatter
diagrams (a) when the entire range is shown (of all persons tested) and (b) when the range is
restricted (to just those persons who were hired) (fictional data).
BOX 11–2 Illusory Correlation: When You Know Perfectly Well
That If It’s Big, It’s Fat—and You Are Perfectly Wrong
The concept of correlation was not really invented by sta-
tisticians. It is one of the most basic of human mental
processes. The first humans must have thought in terms of
correlation all the time—at least those who survived.
“Every time it snows, the animals we hunt go away. Snow
belongs with no animals. When the snow comes again, if
we follow the animals, we may not starve.”
In fact, correlation is such a typically human and highly
successful thought process that we seem to be psychologi-
cally organized to see more correlation than is there—like
the Aztecs, who thought that good crops correlated with
human sacrifices (let’s hope they were wrong), and like the
following examples from social psychology of what is
called illusory correlation (Hamilton, 1981; Hamilton &
Gifford, 1976; Johnson & Mullen, 1994).
Illusory correlation is the term for the overestimation
of the strength of the relationship between two variables
(the term has also had other special meanings in the past).
Right away, you may think of some harmful illusory cor-
relations related to ethnicity, race, gender, and age. One
source of illusory correlation is the tendency to link two
infrequent and therefore highly memorable events. Sup-
pose Group B is smaller than Group A, and in both groups
one-third of the people are known to commit certain infre-
quent but undesirable acts. In this kind of situation, re-
search shows that Group B, whose members are less
frequently encountered, will in fact be blamed for far more
of these undesirable acts than Group A. This is true even
though the odds are greater that a particular act was com-
mitted by a member of Group A, since Group A has more
members. The problem is that infrequent events stick to-
gether in memory. Membership in the less frequent group
and the occurrence of less frequent behaviors form an il-
lusory correlation. One obvious consequence is that we
remember anything unusual done by the member of a mi-
nority group better than we remember anything unusual
done by a member of a majority group.
Illusory correlation due to “paired distinctiveness” (two
unusual events being linked in our minds) may occur
because when we first encounter distinctive experiences,
we think more about them, processing them more deeply so
that they are more accessible in memory later (Johnson &
Mullen, 1994). If we encounter, for example, members of
a minority we don’t see often, or negative acts that we rarely
see or hear about, we really think about them. If they are
paired, we study them both and they are quicker to return
to memory. It also seems that we can continue to process in-
formation about groups, people, and their behaviors with-
out any awareness of doing so. Sometime along the way, or
when we go to make a judgment, we overassociate the un-
usual groups or people with the unusual (negative) behav-
iors (McConnell et al., 1994). This effect is stronger when
information about the groups or people is sparse, as if we
try even harder in ambiguous situations to make sense of
what we have seen (Berndsen et al., 2001).
Indeed, observing a single instance of a rare group show-
ing some unusual behavior, a “one-shot illusory correlation,”
is sufficient to create the effect (Risen et al., 2007).
Most illusory correlations, however, occur simply be-
cause of prejudices. Prejudices are implicit, erroneous the-
ories that we carry around with us. For example, we
estimate that we have seen more support for an association
between two social traits than we have actually seen:
driving skills and a particular age group; level of acade-
mic achievement and a specific ethnic group; certain
speech, dress, or social behaviors and residence in some
region of the country. One especially interesting example
is that most people in business believe that job satisfaction
and job performance are closely linked, when in fact the
correlation is quite low. People who do not like their jobs
can still put in a good day’s work; people who rave about
their job can still be lazy about doing it.
By the way, some people form their implicit theories
impulsively and hold them rigidly; others seem to base
them according to what they remember about people and
change their theories as they have new experiences
(McConnell, 2001). Which are you?
The point is, the next time you ask yourself why you
are struggling to learn statistics, it might help to think of
it as a quest to make ordinary thought processes more
moral and fair. So, again, we assert that statistics can be
downright romantic: it can be about conquering dark, evil
mistakes with the pure light of numbers, subduing the lie
of prejudices with the honesty of data.
Unreliability of Measurement
Suppose the number of hours slept and mood the next day have a very high degree of
correlation. However, suppose also that in a particular study the researcher had
asked people about their sleep on a particular night three weeks ago and about their
mood on the day after that particular night. There are many problems with this kind
of study, but one is that the measurement of hours slept and mood would not be very
accurate. For example, what a person recalls about how many hours were slept on a
particular night three weeks ago is probably not very close to how many hours the per-
son actually slept. Thus, the true correlation between sleep and mood could be high,
but the correlation in the particular study might be quite low, just because there is lots
of “random noise” (random inaccuracy) in the scores.
Here is another way to understand this issue: think of a correlation in terms of how
close the dots in the scatter diagram fall to a straight line. One of the reasons why
dots may not fall close to the line is inaccurate measurement.
Consider another example. Height and social power have been found in many
studies to have a moderate degree of correlation. However, if someone were to do
this study and measure each person’s height using an elastic measuring tape, the cor-
relation would be much lower. Some other examples of not fully accurate measure-
ment are personality questionnaires that include items that are difficult to understand
(or are understood differently by different people), ratings of behavior (such as
children’s play activity) that require some subjective judgment, or physiological mea-
sures that are influenced by things like ambient magnetic fields.
Often in psychology research our measures are not perfectly accurate or reliable
(this idea is discussed in more detail in Chapter 15). The result is that a correlation
between any two variables is lower than it would be if you had perfect measures of
the two variables.
The reduction in a correlation due to unreliability of measures is called
attenuation. More advanced statistics texts and psychological measurement texts
describe formulas for correction for attenuation that can be used under some condi-
tions. However, studies using such procedures are relatively rare in most areas of psy-
chology research.
The main thing to remember from all of this is that, to the extent the measures used
in a study are less than perfectly accurate, the correlations reported in that study usu-
ally underestimate the true correlation between the variables (the correlation that
would be found if there was perfect measurement).
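The classic correction for attenuation mentioned above (Spearman's formula, described in measurement texts) divides the observed correlation by the square root of the product of the two measures' reliabilities; the numbers below are invented for illustration:

```python
from math import sqrt

def correct_for_attenuation(r_observed: float,
                            reliability_x: float,
                            reliability_y: float) -> float:
    """Estimate the correlation that perfect measures would have produced."""
    return r_observed / sqrt(reliability_x * reliability_y)

# Hypothetical example: observed r = .30, reliabilities of .70 and .80.
r_corrected = correct_for_attenuation(0.30, 0.70, 0.80)
print(round(r_corrected, 2))  # roughly .40
```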
Influence of Outliers
The direction and strength of a correlation can be drastically distorted by one or more
individual’s scores on the two variables if each pair of scores is a very unusual com-
bination. For example, suppose in the sleep and mood example that an additional per-
son was added to the study who had not slept at all (0 hours sleep) and yet was
extremely happy the next day (8 on the happiness scale). (Maybe the person was going
through some sort of manic phase!) We have shown this situation in the scatter dia-
gram in Figure 11–17. It turns out that the correlation, which without this added per-
son was a large positive correlation ( ), now becomes a small to moderate
negative correlation ( )!
As we mentioned in Chapter 2, extreme scores are called outliers (they lie out-
side of the usual range of scores, a little like “outlaws”). Outliers are actually a prob-
lem in most kinds of statistical analyses and we will have more to say about them in
Chapter 14. However, the main point for now is this: if the scatter diagram shows one
or more unusual combinations, you need to be aware that these individuals have an
especially large influence on the correlation.
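A sketch with invented numbers (not the Table 11–1 data) shows how a single outlier pair can even flip the sign of r:

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical sleep/mood scores with a clear positive relationship.
hours = [5, 6, 7, 8, 9, 10]
mood  = [2, 3, 4, 5, 6, 7]
r_before = pearson_r(hours, mood)   # a perfect +1.00 for these made-up scores

# Add one outlier combination: 0 hours slept, mood of 8.
r_after = pearson_r(hours + [0], mood + [8])

print(round(r_before, 2), round(r_after, 2))  # 1.0 -0.12
```

One unusual pair out of seven is enough to turn a strong positive correlation slightly negative.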
Figure 11–17 A scatter diagram for the hours slept last night and happy mood example (see Table 11–1 and Figure 11–2d) with an outlier combination of scores (0 hours slept and happy mood of 8) for an extra person (correlation is now r = –.18 compared to r = .85 without the extra person).
outliers: scores with an extreme (very high or very low) value in relation to the other scores in the distribution.
TIP FOR SUCCESS
If you feel you need some extra practice figuring a correlation coefficient, add the scores for this extra person to the scores shown in Table 11–1 and verify that the correlation is now indeed r = –.18.
What If There Is Some Curvilinearity? The Spearman Rho
The correlation coefficient, as we have seen, describes the direction and strength of
the linear relationship between two variables. It shows us how well the dots in a scat-
ter diagram follow a straight line in which highs go with highs and lows go with lows
(a positive correlation) or highs go with lows and lows with highs (a negative corre-
lation). Sometimes, however, as you saw earlier in the chapter, the dots follow a
precise pattern, but that pattern is curved. For example, consider Figure 11–6b.
In this example, highs go with highs, middle scores go with lows, and low scores go
with highs. It is a kind of U shape. There are methods of figuring the degree to which
the dots follow such a curved line; these procedures are considered in advanced text-
books (e.g., Cohen et al., 2003).
Sometimes, however, as shown in Figure 11–5, highs go with highs and lows
with lows, but the pattern is still not quite linear. In these particular kinds of situa-
tions we can in a sense straighten out the line and then use the ordinary correlation.
One way this can be done is by changing all the scores to their rank order. So, sep-
arately for each variable, you would rank the scores from lowest to highest (start-
ing with 1 for the lowest score and continuing until all the scores have been ranked).
This makes the pattern more linear. In fact, we could now proceed to figure the cor-
relation coefficient in the usual way, but using the rank-order scores instead of the
original scores. A correlation figured in this way is called Spearman’s rho. (It was
developed in the 1920s by Charles Spearman, an important British psychologist who
invented many statistical procedures to help him solve the problems he was work-
ing on, mainly involving the nature and measurement of human intelligence.)
We discuss changing scores to ranks more generally in Chapter 14, and consider
Spearman’s rho again in that context. We bring it up now, however, because in some areas
of psychology it is common practice to use Spearman’s rho instead of the ordinary cor-
relation coefficient, even if the dots do not show curvilinearity. Some researchers pre-
fer Spearman’s rho because it works correctly even if the original scores are not based
on true equal-interval measurement (as we discussed in Chapter 1). Finally, many
researchers like to use Spearman’s rho because it is much less affected by outliers.
Spearman's rho: the equivalent of a correlation coefficient for rank-ordered scores.
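The rank-then-correlate recipe described above can be sketched directly (an illustrative implementation; tied scores get the average of their ranks, a common convention):

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(scores):
    """Rank from lowest (1) to highest; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1   # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """The ordinary correlation figured on the rank-ordered scores."""
    return pearson_r(ranks(xs), ranks(ys))

# A curved but consistently increasing pattern: rho is a perfect 1.0
# even though the ordinary r falls below 1.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson_r(x, y), 2), round(spearman_rho(x, y), 2))  # 0.98 1.0
```

Because ranking caps how extreme any score can be, the same function also illustrates why rho is much less affected by outliers.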
How are you doing?
1. (a) What numbers do psychologists use when they compare the size of two
correlation coefficients? (b) What are these numbers called? (c) How much
larger is a correlation of .80 than a correlation of .20?
2. (a) What is restriction in range? (b) What is its effect on the correlation coefficient?
3. (a) What is unreliability of measurement? (b) What is its effect on the correla-
tion coefficient?
4. (a) What is the outlier combination of scores in the set of scores below?
(b) Why are outliers a potential problem with regard to correlation?
5. Give three reasons why a researcher might choose to use Spearman’s rho
instead of the regular correlation coefficient.
X     Y
10    41
8     35
12    46
9     37
2     70
Effect Size and Power for the Correlation Coefficient
The correlation coefficient itself is a measure of effect size. (Thus, in the study of
sleep and mood, effect size was r = .85.) Cohen's (1988) conventions for the corre-
lation coefficient are .10 for a small effect size, .30 for a medium (or moderate) ef-
fect size, and .50 for a large effect size.
Power for a correlation can be determined using a power table, a power software
package, or an Internet power calculator. Table 11–7 gives the approximate power for
the .05 significance level for small, medium, and large correlations, and one-tailed or
two-tailed tests.3 For example, the power for a study with an expected medium effect
size (r = .30), two-tailed, with 50 participants, is .57 (which is below the standard de-
sired level of at least .80 power). This means that even if the research hypothesis is
in fact true and has a medium effect size (that is, the two variables are correlated at
r = .30 in the population), there is only a 57% chance that the study will produce a
significant correlation.
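One standard way such power values are approximated (a sketch using the Fisher z transformation, not necessarily the exact method behind Table 11–7) is:

```python
from math import atanh, erf, sqrt

def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def power_for_r(r: float, n: int, z_crit: float = 1.96) -> float:
    """Approximate power to detect a population correlation r with n
    participants (z_crit = 1.96 for a two-tailed .05 test; the tiny
    opposite-tail probability is ignored)."""
    delta = atanh(r) * sqrt(n - 3)  # Fisher z of r, scaled by its precision
    return normal_cdf(delta - z_crit)

# Medium effect (r = .30), N = 50, two-tailed .05: close to the .57
# power value discussed above.
print(round(power_for_r(0.30, 50), 2))
```

The approximation lands within a point or two of the tabled values across the range of the table.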
Planning Sample Size
Table 11–8 gives the approximate number of participants needed for 80% power for
estimated small, medium, and large correlations, using one-tailed and two-tailed tests,
all using the .05 significance level.4
Answers
1. (a) When psychologists compare the size of two correlation coefficients, they use the correlation coefficients squared. (b) The correlation coefficient squared is called the proportionate reduction in error (or proportion of variance accounted for). (c) A correlation of .80 is 16 times larger than a correlation of .20 (for r = .80, r² = .64; for r = .20, r² = .04; and .64 is 16 times larger than .04).
2. (a) Restriction in range is a situation in correlation in which the scores of the group of people studied on one of the variables do not include the full range of scores that are found among people more generally. (b) The effect is often to drastically reduce the correlation compared to what it would be if people more generally were included in the study (presuming there would be a correlation among people more generally).
3. (a) Unreliability of measurement is when the procedures used to measure a particular variable are not perfectly accurate. (b) The effect is to make the correlation smaller than it would be if perfectly accurate measures were used (presuming there would be a correlation if perfectly accurate measures were used).
4. (a) The outlier combination of scores is the final pair of scores (X = 2 and Y = 70). The other four pairs of scores all suggest a positive correlation between variables X and Y, but the final pair of scores is a very low score for variable X and a very high score for variable Y. (b) Outliers have a larger effect on the correlation than other combinations of scores.
5. First, Spearman's rho can be used in certain situations when the scatter diagram suggests a curvilinear relationship between two variables. Second, Spearman's rho can be used in certain situations to figure a correlation when the original scores are not based on true equal-interval measurement. Finally, Spearman's rho is less affected by outliers than the regular correlation coefficient.
TIP FOR SUCCESS
Do not read this section if you have not studied Chapters 3 through 7.
Table 11–7 Approximate Power of Studies Using the Correlation Coefficient (r) for Testing
Hypotheses at the .05 Level of Significance

                      Effect Size
             Small        Medium       Large
             (r = .10)    (r = .30)    (r = .50)
Two-tailed
  Total N:  10   .06         .13          .33
            20   .07         .25          .64
            30   .08         .37          .83
            40   .09         .48          .92
            50   .11         .57          .97
           100   .17         .86          *
One-tailed
  Total N:  10   .08         .22          .46
            20   .11         .37          .75
            30   .13         .50          .90
            40   .15         .60          .96
            50   .17         .69          .98
           100   .26         .92          *

*Power is nearly 1.
How are you doing?
1. What are the conventions for effect size for correlation coefficients?
2. What is the power of a study using a correlation, with a two-tailed test at the
.05 significance level, in which the researchers predict a large effect size and
there are 50 participants?
3. How many participants do you need for 80% power in a planned study in which
you predict a small effect size and will be using a correlation, two-tailed, at the
.05 significance level?
Answers
1. The conventions for effect size for correlation coefficients: r = .10, small effect size; r = .30, medium effect size; r = .50, large effect size.
2. Power is .97.
3. The number of participants needed is 783.
Table 11–8 Approximate Number of Participants Needed for 80% Power for a Study Using the
Correlation Coefficient (r) for Testing a Hypothesis at the .05 Significance Level

                      Effect Size
             Small        Medium       Large
             (r = .10)    (r = .30)    (r = .50)
Two-tailed     783           85           28
One-tailed     617           68           22
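Solving the Fisher z power approximation for N reproduces such sample-size values quite closely (a sketch under the stated assumptions; exact tabled methods can differ by a participant or two for large effects):

```python
from math import atanh, ceil

def n_for_80_percent_power(r: float,
                           z_alpha: float = 1.96,    # two-tailed .05 cutoff
                           z_beta: float = 0.8416):  # 80% power
    """Approximate N needed, via the Fisher z transformation."""
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

# Small, medium, and large effects, two-tailed .05, 80% power.
print(n_for_80_percent_power(0.10),
      n_for_80_percent_power(0.30),
      n_for_80_percent_power(0.50))
```

The small- and medium-effect values come out at 783 and 85, matching Table 11–8; the large-effect value comes out at 30 rather than the tabled 28, because the approximation is least accurate for large correlations in small samples.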
Controversy: What Is a Large Correlation?
An ongoing controversy about the correlation coefficient is, “What is a large r?”
Traditionally in psychology, a large correlation is considered to be about .50 or above,
a moderate correlation to be about .30, and a small correlation to be about .10 (Cohen,
1988). In fact, in many areas of psychology it is rare to find correlations that are
greater than .40. Even when we are confident that X causes Y, X will not be the only
cause of Y. For example, doing exciting activities together may cause people to be
happier in their marriage. (In fact, we have done a number of true experiments sup-
porting this direction of causality; Aron et al., 2000.) However, exciting activities is
still only one of a great many factors that affect marital satisfaction. All those other
factors are not part of our correlation. No one correlation could possibly tell the whole
story. Small correlations are also due to the unavoidably low reliability of many mea-
sures in psychology.
It is traditional to caution that a low correlation is not very important even if it is
statistically significant. (A small correlation can be statistically significant if the study
includes a very large number of participants.)
Further, even experienced research psychologists tend to treat any particular size
of correlation as meaning more of an association between two variables than it actually
does. Michael Oakes (1982) at the University of Sussex gave 30 research psychologists
the two columns of numbers shown in Table 11–9. He then asked them to
estimate r (without doing any calculations). What is your guess? The intuitions of the
British researchers (who are as a group at least as well trained in statistics as psychol-
ogists anywhere in the world) ranged from –.20 to +.60, with a mean of .24. You can
figure the true correlation for yourself. It comes out to .50! That is, what psycholo-
gists think a correlation of .50 means in the abstract is a much stronger degree of cor-
relation than what they think when they see the actual numbers (which even at
r = .50 only look like .24).
Oakes (1982) gave a different group of 30 researchers just the X column and
asked them to fill in numbers in the Y column that would come out to a correlation of
.50 (again, just using their intuition and without any figuring). When Oakes figured
the actual correlations from their answers, these correlations averaged .68. In other
words, once again, even experienced researchers think of a correlation coefficient as
meaning more linkage between the two variables than it actually does.
In contrast, other psychologists hold that small correlations can be very impor-
tant theoretically. They also can have major practical implications in that small effects
may accumulate over time (Prentice & Miller, 1992).
To demonstrate the practical importance of small correlations, Rosnow and
Rosenthal (1989) give an example of a now famous study (Steering Committee of the
Physicians’ Health Study Research Group, 1988) in which doctors either did or did not
take aspirin each day. Whether or not they took aspirin each day was then correlated with
heart attacks. The results were that taking aspirin was correlated –.034 with heart
attacks.5 This means that taking aspirin explains only .1% (r² = –.034 × –.034 = .001,
which is .1%) of the variation in whether people get heart attacks. So taking aspirin is
only a small part of what affects people getting heart attacks; 99.9% of the variation in
whether people get heart attacks is due to other factors (diet, exercise, genetic factors,
etc.). However, Rosnow and Rosenthal point out that this correlation of "only –.034"
meant that among the more than 20,000 doctors who were in the study, there were 72
more heart attacks in the group that did not take aspirin. (In fact, there were also 13
more heart attack deaths in the group that did not take aspirin.) Certainly, this difference
in getting heart attacks is a difference we care about.
Table 11–9 Table Presented to 30 Psychologists to Estimate r

X     Y
1     1
2     10
3     2
4     9
5     5
6     4
7     6
8     3
9     11
10    8
11    7
12    12

Source: Based on Oakes (1982).
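You can check the "figure the true correlation for yourself" claim directly; the r for the Table 11–9 numbers does come out at about .50:

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# The X and Y columns from Table 11-9.
x = list(range(1, 13))
y = [1, 10, 2, 9, 5, 4, 6, 3, 11, 8, 7, 12]

print(round(pearson_r(x, y), 2))  # 0.5
```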
Another argument for the importance of small correlations emphasizes research
methods. Prentice and Miller (1992) explain:
Showing that an effect holds even under the most unlikely circumstances possible can
be as impressive as (or in some cases, perhaps even more impressive) than showing
that it accounts for a great deal of variance. (p. 163)
Some examples they give are studies showing correlations between attractiveness and
judgments of guilt or innocence in court cases (e.g., Sigall & Ostrove, 1975). The
point is that “legal judgments are supposed to be unaffected by such extraneous fac-
tors as attractiveness.” Thus, if studies show that attractiveness is associated with
legal judgments even slightly, we are persuaded of just how important attractiveness
could be in influencing social judgments in general.
Finally, you should be aware that there is even controversy about the widespread
use of Cohen’s (1988) conventions for the correlation coefficient (that is, .10 for a
small effect size, .30 for a medium effect size, and .50 for a large effect size). When
proposing conventions for effect size estimates, such as the correlation coefficient
(r), Cohen himself noted: “. . . these proposed conventions were set forth throughout
with much diffidence, qualifications, and invitations not to employ them if possible.
The values chosen had no more a reliable basis than my own intuition. They were
offered as conventions because they were needed in a research climate characterized
by a neglect of issues of [effect size] magnitude” (p. 532). Thus, some researchers
strongly suggest that the magnitude of effects found in research studies should not be
compared with Cohen’s conventions, but rather with the effects reported in previous
similar research studies (Thompson, 2007).
Correlation in Research Articles
Scatter diagrams are occasionally included in research articles. For example, Gump
and colleagues (2007) conducted a study of the level of lead in children’s blood and
the socioeconomic status of their family. The participants were 122 children who were
taking part in an ongoing study of the developmental effects of environmental toxi-
cants. Between the age of 2 and 3 years, a blood sample was taken from each child
(with parental permission), and the amount of lead in each sample was determined with
a laboratory test. The researchers measured the socioeconomic status of each child’s
family based on the parents’ self-reported occupation and education level. As shown
in Figure 11–18, Gump et al. (2007) used a scatter diagram to describe the relation-
ship between childhood blood lead levels and family socioeconomic status. There was
a clear linear negative trend, with the researchers noting “. . . increasing family SES
[socioeconomic status] was significantly associated with declining blood lead levels”
(p. 300). The scatter diagram shows that children from families with a higher socio-
economic status had lower levels of lead in their blood. Of course, this is a correlational
result; so it does not necessarily mean that family socioeconomic status directly in-
fluences the amount of lead in children’s blood. It is possible that some other factor
may explain this association, such as a person’s level of education.
Correlation coefficients are very commonly reported in research articles, both in
the text of articles and in tables. The result with which we started the chapter would
be described as follows: there was a positive correlation ( ) between excitement
of activities done with partner and marital satisfaction. Usually, the statistical signif-
icance of the correlation will also be reported; in this example, it would be ,
p � .05.
r = .51
r = .51
ISBN 0-558-46761-X
Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.
468 Chapter 11
Tables of correlations are common when several variables are involved. Usually,
the table is set up so that each variable is listed down the left and also across the top.
The correlation of each pair of variables is shown inside the table. This is called a
correlation matrix.
Table 11–10 is a correlation matrix from a study of 114 expert Scrabble players
(Halpern & Wai, 2007). (You may remember that we first mentioned this study in
Figure 11–18 Children’s family socioeconomic status (Hollingshead Index) as a function of childhood lead levels. (Scatter diagram: Family Socioeconomic Status on the horizontal axis, Childhood Blood Lead Levels in μg/dL on the vertical axis.)
Source: Gump, B. B., Reihman, J., Stewart, P., Lonky, E., Darvill, T., & Matthews, K. A. (2007). Blood lead (Pb) levels: A potential environmental mechanism explaining the relation between socioeconomic status and cardiovascular reactivity in children. Health Psychology, 26, 296–304. Published by the American Psychological Association. Reprinted with permission.
Table 11–10 Correlations with Official Scrabble Ratings (Experts Only)
Variable 1 2 3 4 5 6 7 8 9
1. Official Scrabble rating — .116 * .021 .227* .224*
2. Gender — .318* .094 .265* .104 .220* .242*
3. Current age — .167 .727** .088 .769** .515**
4. Age started playing Scrabble — .355* .233* .094 ** .058
5. Age started competing — .096 .112 .386* .121
6. Days of year playing Scrabble — .050
7. Hours per day playing Scrabble — .377*
8. Years of practice — .492**
9. Total hours playing (Years × Hours) —
* p < .05. ** p < .01.
Source: Halpern, D. F., & Wai, J. (2007). The world of competitive Scrabble: Novice and expert differences in visuospatial and verbal abilities. Journal of Experimental Psychology: Applied, 13, 79–94. Published by the American Psychological Association. Reprinted with permission.
correlation matrix: common way of reporting the correlation coefficients among several variables in a research article; table in which the variables are named on the top and along the side and the correlations among them are all shown.
Chapter 4.) The researchers asked the expert Scrabble players a series of questions
about their Scrabble playing, including the age at which they started playing and the
age at which they started competing, the number of days a year and the number of
hours per day they play Scrabble, and the number of years they had been practicing.
The expert Scrabble players also provided their official Scrabble rating to the re-
searchers. Table 11–10 shows the correlations among all the study measures.
This example shows several features that are typical of the way correlation ma-
trixes are laid out. First, notice that the correlation of a variable with itself is not given.
In this example, a short line is put in instead; sometimes they are just left blank. Also
notice that only the upper triangle is filled in. This is because the lower left triangle
would contain exactly the same information. For example, the correlation of official
Scrabble rating with current age (which is .116) has to be the same as the correlation
of current age with official Scrabble rating. Another shortcut saves space across the
page: the names of the variables are listed only on the side of the table, with the num-
bers for them put across the top.
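The layout logic of a correlation matrix is easy to verify computationally. The following is an illustrative sketch only; the scores are made up, and the use of NumPy’s corrcoef function is a convenience we assume here, not anything from the study:

```python
import numpy as np

# Made-up scores for three variables measured on the same five people.
scores = np.array([
    [3, 70, 12],
    [5, 82, 15],
    [2, 65,  9],
    [4, 90, 20],
    [1, 60, 11],
], dtype=float)

# With rowvar=False, each column is a variable; cell [i, j] of the
# result holds the correlation of variable i with variable j.
matrix = np.corrcoef(scores, rowvar=False)

# The diagonal is each variable's correlation with itself (always 1),
# and the matrix is symmetric, which is why articles print one triangle.
assert np.allclose(np.diag(matrix), 1.0)
assert np.allclose(matrix, matrix.T)
```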
Looking at this example, among other results, you can see that there is a small to
moderate negative correlation between official Scrabble rating and the age at which
a person started competing in Scrabble. Also, there is a small to moderate correlation
between official Scrabble rating and the years of practice. The asterisks—* and **—
after some of the correlation coefficients tell you that those correlations are statisti-
cally significant. The note at the bottom of the table tells you the significance levels
associated with the asterisks.
Summary
1. When two variables are associated in a clear pattern (for example, when high
scores on one consistently go with high scores on the other, and lows on one go
with lows on the other) the two variables are correlated.
2. A scatter diagram shows the relation between two variables. The lowest to high-
est possible values of one variable (the one you are predicting from if one vari-
able can be thought of as predicting the other variable) are marked on the
horizontal axis. The lowest to highest possible values of the other variable are
marked on the vertical axis. Each individual’s pair of scores is shown as a dot.
3. When the dots in the scatter diagram generally follow a straight line, this is called
a linear correlation. In a curvilinear correlation, the dots follow a line pattern
other than a simple straight line. There is no correlation when the dots do not fol-
low any kind of line. In a positive linear correlation, the line goes upward to the
right (so that low scores go with lows, mediums with mediums, and highs with
highs). In a negative linear correlation, the line goes downward to the right (so
that low scores go with highs, mediums with mediums, and highs with lows).
The strength of the correlation refers to the degree to which there is a clear pat-
tern of relationship between the two variables.
4. The correlation coefficient (r) gives the precise linear correlation between two
equal-interval numeric variables. The correlation coefficient is the sum of the
products of the deviation scores (X – MX and Y – MY) divided by a correction
number that takes into account the number of people in the study and the
variation of each variable’s scores. The correction number is figured as the
square root of the result of multiplying the sum of squared deviations for one
variable (SSX) by the sum of squared deviations for the other variable (SSY).
The correlation coefficient is highly positive when there is a large positive
linear correlation. This is
because positive deviation scores are multiplied by positive, and negative by
negative (giving all positive products of deviation scores). The correlation co-
efficient is highly negative when there is a large negative linear correlation. This
is because negative deviation scores are multiplied by positive deviation scores
and positive by negative (giving all negative products of deviation scores). The
correlation coefficient is 0 when there is no linear correlation. This is because
positives are sometimes multiplied by positives and sometimes by negatives
(and vice versa), so that positive and negative products of deviation scores can-
cel each other out.
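The figuring described in point 4 can be sketched in a few lines of Python. This is an illustration with made-up paired scores; the variable names (sum_products, correction, and so on) are ours, not the text’s:

```python
import math

# Made-up paired scores for two variables, X and Y.
x = [3, 5, 2, 4, 1]
y = [60, 82, 55, 90, 48]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Sum of the products of deviation scores (the numerator).
sum_products = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

# The correction number: the square root of SSx times SSy.
ss_x = sum((xi - mx) ** 2 for xi in x)
ss_y = sum((yi - my) ** 2 for yi in y)
correction = math.sqrt(ss_x * ss_y)

r = sum_products / correction
# r always falls between -1 and +1; here high X scores go with
# high Y scores, so the products are mostly positive and r > 0.
```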
5. The sign (+ or –) of a correlation coefficient tells you the direction of the linear
correlation between two variables. The actual value of the correlation coefficient
(ignoring the sign) tells you the strength of the linear correlation. The maximum
positive value of r is +1; r = +1 when there is a perfect positive linear correlation.
The maximum negative value of r is –1; r = –1 when there is a perfect negative
linear correlation.
6. The statistical significance of a correlation coefficient can be tested by chang-
ing the correlation coefficient into a t score and using cutoffs on a t distribution
with degrees of freedom equal to the number of people in the study minus two.
The t score for a correlation coefficient is the result of dividing the correlation
coefficient by the square root of what you get when you divide one minus the
correlation coefficient squared by two less than the number of people in the
study. The null hypothesis for hypothesis testing with a correlation coefficient
is that the true relation between the two variables in the population is no correlation (r = 0).
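Point 6 amounts to a one-line formula. A minimal sketch, assuming a correlation r from a study of n people (the function name r_to_t is ours):

```python
import math

def r_to_t(r, n):
    """Convert a correlation coefficient to its equivalent t score
    on a t distribution with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Example: r = .30 in a study of 50 people gives t on df = 48,
# which is then compared against the cutoff from a t table.
t = r_to_t(0.30, 50)
df = 50 - 2
```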
7. The assumptions for the significance test of a correlation coefficient are that the
population of each variable (and the relationship between them) follows a nor-
mal distribution, and that there is an equal distribution of each variable at each
point of the other variable.
8. Correlation does not tell you the direction of causation. If two variables, X and
Y, are correlated, the correlation could be because X is causing Y, Y is causing X,
or a third factor is causing both X and Y.
9. Comparisons of the degree of linear correlation are considered most accurate in
terms of the correlation coefficient squared (r²), called the proportionate reduction
in error or proportion of variance accounted for.
10. A correlation coefficient will be lower (closer to 0) than the true correlation if it
is based on scores from a group selected for study that is restricted in its range
of scores (compared to people in general) or if the scores are based on unreliable
measures.
11. The direction and strength of a correlation can be drastically distorted by extreme
combinations of scores called outliers.
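Point 11 is easy to demonstrate with made-up scores: in this sketch, a single extreme combination of scores turns a strong positive correlation negative (NumPy’s corrcoef is assumed for convenience):

```python
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 3, 3, 5, 6]

# Without the outlier the correlation is strongly positive.
r_clean = np.corrcoef(x, y)[0, 1]

# Add one extreme combination of scores (an outlier).
x_out = x + [20]
y_out = y + [-10]
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

# The single outlier drags the correlation from near +1 to negative.
```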
12. Spearman’s rho is a special type of correlation based on rank-order scores. It can
be used in certain situations when the scatter diagram suggests a curvilinear re-
lationship between two variables. Spearman’s rho is less affected than the regu-
lar correlation by outliers, and it works correctly even if the original scores are
not based on true equal-interval measurement.
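Point 12 can be sketched by converting each variable to rank-order scores and then figuring the ordinary correlation on the ranks. This fragment assumes no tied scores (ties would need averaged ranks, which it does not handle):

```python
import numpy as np

def spearman_rho(x, y):
    """Correlation of rank-ordered scores (no tied scores assumed)."""
    # Double argsort turns raw scores into ranks 1, 2, 3, ...
    rank = lambda v: np.argsort(np.argsort(v)) + 1
    return np.corrcoef(rank(x), rank(y))[0, 1]

# A curvilinear but consistently increasing relationship:
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
# Pearson's r is below 1 here, but rho is exactly 1, because
# the rank orders of the two variables line up perfectly.
```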
13. The correlation itself is a measure of effect size. Power and needed sample size
for 80% power for a correlation coefficient can be determined using special power
tables, a power software package, or an Internet power calculator.
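In place of power tables, an approximate power figure can be computed with the Fisher z transformation. This is a standard large-sample approximation, not the method behind the text’s tables, so its results may differ slightly from tabled values:

```python
import math

def approx_power(r, n, alpha=0.05, tails=2):
    """Approximate power for testing r = 0, via Fisher's z transformation.
    A large-sample approximation; published power tables differ slightly."""
    # Standard normal cumulative distribution, from the error function.
    phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    # Critical z for the chosen alpha, found by bisection on phi.
    target = 1 - alpha / tails
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if phi(mid) < target:
            lo = mid
        else:
            hi = mid
    z_crit = (lo + hi) / 2
    # Fisher z of the effect size, scaled by sqrt(N - 3).
    z_effect = math.atanh(r) * math.sqrt(n - 3)
    return phi(z_effect - z_crit)

# Example: a planned study with a large effect size (r = .50), N = 30.
power = approx_power(0.50, 30)
```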
14. Studies suggest that psychologists tend to think of any particular correlation
coefficient as meaning more association than actually exists. However, small
correlations may have practical importance and may also be impressive in demon-
strating the importance of a relationship when a study shows that the correlation
holds even under what would seem to be unlikely conditions.
15. Correlational results are usually presented in research articles either in the text with
the value of r (and usually the significance level) or in a special table (a correla-
tion matrix) showing the correlations among several variables.
Key Terms
correlation (p. 433)
scatter diagram (p. 434)
linear correlation (p. 437)
curvilinear correlation (p. 437)
no correlation (p. 439)
positive correlation (p. 439)
negative correlation (p. 439)
product of deviation scores (p. 443)
correlation coefficient (p. 446)
direction of causality (p. 456)
correlational research design (p. 458)
proportionate reduction in error (p. 459)
restriction in range (p. 459)
outliers (p. 462)
Spearman’s rho (p. 463)
correlation matrix (p. 468)
Example Worked-Out Problems
Making a Scatter Diagram and Describing the General Pattern of Association
Based on the class size and average achievement test scores for five elementary schools
in the following table, make a scatter diagram and describe in words the general pat-
tern of association.
Elementary School Class Size Achievement Test Score
Main Street 25 80
Casat 14 98
Harland 33 50
Shady Grove 28 82
Jefferson 20 90
Answer
The steps in solving the problem follow; Figure 11–19 shows the scatter diagram with
markers for each step.
❶ Draw the axes and decide which variable goes on which axis. It seems more
reasonable to think of class size as predicting achievement test scores rather
than the other way around. Thus, you can draw the axis with class size along
the bottom. (However, the prediction was not explicitly stated in the problem; so
the other direction of prediction is certainly possible. Thus, putting either vari-
able on either axis would be acceptable.)
❷ Determine the range of values to use for each variable and mark them on the
axes. We will assume that the achievement test scores go from 0 to 100. We don’t
know the maximum class size; so we guessed 50. (The range of the variables
was not given in the problem; thus any reasonable range would be acceptable as
long as it includes the values of the scores in the actual study.)
❸ Mark a dot for each pair of scores. For example, to mark the dot for Main
Street School, you go across to 25 and up to 80.
The general pattern is roughly linear. Its direction is negative (it goes down and
to the right, with larger class sizes going with smaller achievement scores and vice
versa). It is a quite large correlation, since the dots all fall fairly close to a straight line;
it should be fairly close to –1. In words, it is a large, linear, negative correlation.
Figuring the Correlation Coefficient
Figure the correlation coefficient for the class size and achievement test in the preced-
ing example.
Answer
You can figure the correlation using either the formula or the steps. The basic figur-
ing is shown in Table 11–11 with markers for each of the steps.
Using the formula,
r = Σ[(X – MX)(Y – MY)] / √((SSX)(SSY)) = –482/533.10 = –.90
Using the steps,
❶ Change the scores for each variable to deviation scores. The mean of the
class size is 24. Thus, the first class size score of 25 gives a deviation score of
25 – 24 = 1. Using the same procedure for all the other scores gives the deviation
scores shown in the X – MX and Y – MY columns in Table 11–11.
❷ Figure the product of the deviation scores for each pair of scores. For the first
school, multiply 1 by 0 to give 0. The products of deviation scores for all the
scores are shown in the last column of Table 11–11.
Figure 11–19 Scatter diagram for scores in Example Worked-Out Problem (Class Size on the horizontal axis, Achievement Test Score on the vertical axis). ❶ Draw the axes and decide which variable goes on which axis. ❷ Determine the range of values to use for each variable and mark them on the axes. ❸ Mark a dot for each pair of scores.
❸ Add up all the products of the deviation scores. Adding up all the products of
the deviation scores, as shown in Table 11–11, gives a sum of –482.
❹ For each variable, square each deviation score. For the first school, the squared
deviation for the class size variable is 1 multiplied by 1, which is 1. The squared
deviation scores for all the scores are shown in the (X – MX)² and (Y – MY)²
columns of Table 11–11.
➎ Add up the squared deviation scores for each variable. As shown in
Table 11–11, the sum of squared deviations for the class size variable is 214
and the sum of squared deviations for the achievement test score variable
is 1,328.
➏ Multiply the two sums of squared deviations and take the square root of the
result. Multiplying 214 by 1,328 is 284,192 and the square root of 284,192 is
533.10.
❼ Divide the sum of the products of deviation scores from Step ❸ by the
correction number from Step ➏. Dividing –482 by 533.10 gives a result of
–.90. This is the correlation coefficient.
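The arithmetic in these steps can be checked with a short Python fragment using the five schools’ scores (a verification sketch; the variable names are ours):

```python
import math

class_size = [25, 14, 33, 28, 20]
test_score = [80, 98, 50, 82, 90]

mx = sum(class_size) / 5          # mean class size: 24
my = sum(test_score) / 5          # mean test score: 80

# Step 3: sum of the products of deviation scores.
products = sum((x - mx) * (y - my) for x, y in zip(class_size, test_score))

# Steps 4-5: sum of squared deviations for each variable.
ss_x = sum((x - mx) ** 2 for x in class_size)   # 214
ss_y = sum((y - my) ** 2 for y in test_score)   # 1,328

# Steps 6-7: divide by the correction number.
r = products / math.sqrt(ss_x * ss_y)
# products is -482, the correction number is 533.10, and r rounds to -.90.
```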
Figuring the Significance of a Correlation Coefficient
Figure whether the correlation between class size and achievement test score in the
preceding example is statistically significant (use the .05 level, two-tailed).
Answer
❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are two populations:
Population 1: Schools like those in this study.
Population 2: Schools for whom there is no correlation between the two
variables.
Table 11–11 Figuring the Correlation Coefficient Between Class Size and Achievement Test
Score for the Example Worked-Out Problem

Class Size (X)                              Achievement Test Score (Y)
  X    Deviation ❶   Deviation Squared ❹     Y    Deviation ❶   Deviation Squared ❹   Products of Deviation Scores ❷
       (X – MX)      (X – MX)²                    (Y – MY)      (Y – MY)²             (X – MX)(Y – MY)
 25         1              1                 80        0              0                      0
 14       –10            100                 98       18            324                   –180
 33         9             81                 50      –30            900                   –270
 28         4             16                 82        2              4                      8
 20        –4             16                 90       10            100                    –40
Σ = 120            SSX = 214 ❺             Σ = 400            SSY = 1,328 ❺           Σ = –482 ❸
M = 24                                     M = 80

r = Σ[(X – MX)(Y – MY)] / √((SSX)(SSY)) ❻ = –482/√((214)(1,328)) = –482/533.10 ❼ = –.90
The null hypothesis is that the two populations have the same correlation. The
research hypothesis is that the two populations do not have the same correlation.
❷ Determine the characteristics of the comparison distribution. The comparison
distribution is a t distribution with df = 3. (That is, df = N – 2 = 5 – 2 = 3.)
❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected. The t table (Table A–2 in the Appendix)
shows that for a two-tailed test at the .05 level, with 3 degrees of freedom, the
cutoff t scores are 3.182 and –3.182.
❹ Determine your sample’s score on the comparison distribution. The
correlation in the study was –.90. Applying the formula to find the equivalent t,
we get
t = r / √((1 – r²)/(N – 2)) = –.90 / √((1 – (–.90)²)/3) = –.90 / √.0633 = –3.58.
❺ Decide whether to reject the null hypothesis. The t score of –3.58 for our
sample correlation is more extreme than the cutoff t score of –3.182. Thus, we can
reject the null hypothesis and the research hypothesis is supported.
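These hypothesis-testing steps can be checked the same way; the 3.182 cutoff is taken from the t table, as in Step ❸ (a verification sketch, not part of the text):

```python
import math

r = -0.90
n = 5
df = n - 2                              # 3 degrees of freedom

# Convert the sample correlation to its equivalent t score.
t = r / math.sqrt((1 - r ** 2) / df)    # about -3.58

# Two-tailed cutoff at the .05 level with df = 3, from the t table.
cutoff = 3.182

# The sample's t is more extreme than the cutoff, so reject the null.
reject = abs(t) > cutoff
```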
Outline for Writing Essays on the Logic and Figuring
of a Correlation Coefficient
1. If the question involves creating a scatter diagram, explain how and why you cre-
ated the diagram to show the pattern of relationship between the two variables.
Explain the meaning of the term correlation. Mention the type of correlation
(e.g., linear; positive or negative; small, moderate, or large) shown by the scat-
ter diagram.
2. Explain the idea that a correlation coefficient tells you the direction and strength
of linear correlation between two variables.
3. Outline and explain the steps for figuring the correlation coefficient. Be sure to
mention that the first step involves changing the scores for each variable to de-
viation scores. Describe how to figure the product of the deviation scores. Explain
why the product of deviation scores will tend to be positive if the correlation is
positive and will tend to be negative if the correlation is negative. Explain the two
reasons why it is necessary to use a correction number to adjust the sum of the
products of deviation scores. Describe how that correction number is figured and
how it acts to adjust the sum of the products of deviation scores. Explain what the
value of the correlation coefficient means in terms of the direction and strength
of linear correlation.
4. Be sure to discuss the direction and strength of correlation of your particular re-
sult. As needed for the specific question you are answering, discuss whether the
correlation is statistically significant.
Practice Problems
These problems involve figuring. Most real-life statistics problems are done on a com-
puter with special statistical software. Even if you have such software, do these prob-
lems by hand to ingrain the method in your mind. To learn how to use a computer to
solve statistics problems like those in this chapter, refer to the Using SPSS section
at the end of this chapter and the Study Guide and Computer Workbook that
accompanies this text.
All data are fictional unless an actual citation is given.
Set I (for Answers to Set I Problems, see pp. 690–692)
1. For each of the following scatter diagrams, indicate whether the pattern is
linear, curvilinear, or no correlation; if it is linear, indicate whether it is posi-
tive or negative and the approximate strength (large, moderate, small) of the
correlation.
(a)  (b)  (c)  (d)  (e)  (f)
2. A researcher studied the relation between psychotherapists’ degree of empathy and
their patients’ satisfaction with therapy. As a pilot study, four patient–therapist
pairs were studied. Here are the results:

Pair Number   Therapist Empathy   Patient Satisfaction
1             70                  4
2             94                  5
3             36                  2
4             48                  1

(a) Make a scatter diagram of the scores; (b) describe in words the general pattern
of correlation, if any; (c) figure the correlation coefficient; (d) figure whether
the correlation is statistically significant (use the .05 significance level, two-
tailed); (e) explain the logic of what you have done, writing as if you are speaking
to someone who has never heard of correlation (but who does understand the
mean, deviation scores, and hypothesis testing); and (f) give three logically pos-
sible directions of causality, saying for each whether it is a reasonable direction
in light of the variables involved (and why).

3. An instructor asked five students how many hours they had studied for an exam.
Here are the hours studied and the students’ grades:

Hours Studied   Test Grade
0               52
10              95
6               83
8               71
6               64

(a) Make a scatter diagram of the scores; (b) describe in words the general pattern
of correlation, if any; (c) figure the correlation coefficient; (d) figure whether
the correlation is statistically significant (use the .05 significance level, two-
tailed); (e) explain the logic of what you have done, writing as if you are speaking
to someone who has never heard of correlation (but who does understand the
mean, deviation scores, and hypothesis testing); and (f) give three logically pos-
sible directions of causality, saying for each whether it is a reasonable direction
in light of the variables involved (and why).

4. In a study of people first getting acquainted with each other, researchers exam-
ined the amount of self-disclosure of one’s partner and one’s liking for one’s part-
ner. Here are the results:

Partner’s Self-Disclosure   Liking for Partner
8                           7
7                           9
10                          6
3                           7
1                           4

(a) Make a scatter diagram of the scores; (b) describe in words the general pattern
of correlation, if any; (c) figure the correlation coefficient; and (d) figure whether
the correlation is statistically significant (use the .05 significance level, two-tailed).
5. The following have been prepared so that data sets B through D are slightly mod-
ified versions of data set A. For each data set, (a) make a scatter diagram, (b) fig-
ure the correlation coefficient, and (c) figure whether the correlation is statistically
significant (use the .05 significance level, two-tailed).
Data Set A Data Set B Data Set C Data Set D
X Y X Y X Y X Y
1 1 1 1 1 5 1 1
2 2 2 2 2 2 2 4
3 3 3 3 3 3 3 3
4 4 4 5 4 4 4 2
5 5 5 4 5 1 5 5
6. For each of the following situations, indicate why the correlation coefficient
might be a distorted estimate of the true correlation (and what kind of distortion
you would expect):
(a) Scores on two questionnaire measures of personality are correlated.
(b) Comfort of living situation and happiness are correlated among a group of
millionaires.
7. What is the power of each of the following studies using a correlation coefficient
and the .05 significance level?
Effect Size (r ) N Tails
(a) .10 50 2
(b) .30 100 1
(c) .50 30 2
(d) .30 40 1
(e) .10 100 2
8. About how many participants are needed for 80% power in each of the follow-
ing planned studies that will use a correlation coefficient and the .05 significance
level?
Effect Size (r ) Tails
(a) .50 2
(b) .30 1
(c) .10 2
9. Chapman et al. (1997) interviewed 68 inner city pregnant women and their hus-
bands (or boyfriends) twice during their pregnancy, once between three and six
months into the pregnancy and again between six and nine months into the preg-
nancy. Table 11–12 shows the correlations among several of their measures.
(“Zero-Order Correlations” means the same thing as ordinary correlations.) Most
important in this table are the correlations among women’s reports of their own
stress, men’s reports of their partners’ stress, women’s perception of their partners’
support at the first and at the second interviews, and women’s depression at the
first and at the second interviews.
Table 11–12 Zero-Order Correlations for Study Variables
Variable 1 2 3 4 5 6 7 8 9 10
1. Women’s report of stress —
2. Men’s report of women’s stress .17 —
3. Partner Support 1 * —
4. Partner Support 2 * .44*** —
5. Depressed Mood 1 .23* .10 ** —
6. Depressed Mood 2 .50*** .14 *** *** .55*** —
7. Women’s age .06 .16 .04 * * —
8. Women’s ethnicity .11 .13 —
9. Women’s marital status .01 .12 .24* .05 ** —
10. Parity .19 .13 .10 .16 .26* .31* —
* p < .05. ** p < .01. *** p < .001.
Source: Chapman, H. A., Hobfoll, S. E., & Ritter, C. (1997). Partners’ stress underestimations lead to women’s distress: A study of pregnant inner-city women. Journal of Personality and Social Psychology, 73, 418–425. Published by the American Psychological Association. Reprinted with permission.
Explain the results on these measures as if you were writing to a person who
has never had a course in statistics. Specifically, (a) explain what is meant by
a correlation coefficient using one of the correlations as an example; (b) study
the table and then comment on the patterns of results in terms of which vari-
ables are relatively strongly correlated and which are not very strongly corre-
lated; and (c) comment on the limitations of making conclusions about the
direction of causality based on these data, using a specific correlation as an ex-
ample (noting at least one plausible alternative causal direction and why that
alternative is plausible).
Set II
10. For each of the following scatter diagrams, indicate whether the pattern is lin-
ear, curvilinear, or no correlation; if it is linear, indicate whether it is positive
or negative and the approximate strength (large, moderate, small) of the
correlation.
(a)  (b)  (c)  (d)  (e)  (f)
11. Make up a scatter diagram with 10 dots for each of the following situations:
(a) perfect positive linear correlation, (b) large but not perfect positive linear
correlation, (c) small positive linear correlation, (d) large but not perfect negative
linear correlation, (e) no correlation, (f) clear curvilinear correlation.
For problems 12 to 14, do the following: (a) Make a scatter diagram of the
scores; (b) describe in words the general pattern of correlation, if any; (c) figure
the correlation coefficient; (d) figure whether the correlation is statistically sig-
nificant (use the .05 significance level, two-tailed); (e) explain the logic of what
you have done, writing as if you are speaking to someone who has never heard
of correlation (but who does understand the mean, deviation scores, and hypoth-
esis testing); and (f) give three logically possible directions of causality, indicat-
ing for each direction whether it is a reasonable explanation for the correlation
in light of the variables involved (and why).
12. Four research participants take a test of manual dexterity (high scores mean better dex-
terity) and an anxiety test (high scores mean more anxiety). The scores are as follows.
Person Dexterity Anxiety
1 1 10
2 1 8
3 2 4
4 4 -2
13. Four young children were monitored closely over a period of several weeks to
measure how much they watched violent television programs and their amount
of violent behavior toward their playmates. The results were as follows:
Child’s Code Number   Weekly Viewing of Violent TV (hours)   Number of Violent or Aggressive Acts Toward Playmates
G3368                 14                                     9
R8904                 8                                      6
C9890                 6                                      1
L8722                 12                                     8

14. Five college students were asked about how important a goal it is to them to have
a family and about how important a goal it is for them to be highly successful in
their work. Each variable was measured on a scale from 0, not at all important
goal, to 10, very important goal.

Student   Family Goal   Work Goal
A         7             5
B         6             4
C         8             2
D         3             9
E         4             1

For problems 15 and 16, (a) make a scatter diagram of the scores; (b) describe in
words the general pattern of correlation, if any; (c) figure the correlation coeffi-
cient; and (d) figure whether the correlation is statistically significant (use the
.05 significance level, two-tailed).

15. The Louvre Museum is interested in the relation of the age of a painting to pub-
lic interest in it. The number of people stopping to look at each of 10 randomly
selected paintings is observed over a week. The results are as shown:

Painting Title             Approximate Age (Years) X   Number of People Stopping to Look Y
The Entombment             465                         68
Mys Mar Sainte Catherine   515                         71
The Bathers                240                         123
The Toilette               107                         112
Portrait of Castiglione    376                         48
Charles I of England       355                         84
Crispin and Scapin         140                         66
Nude in the Sun            115                         148
The Balcony                122                         71
The Circus                 99                          91

16. A developmental psychologist studying people in their eighties was interested in
the relation between number of very close friends and overall health. The scores
for six research participants follow.

Research Participant   Number of Friends X   Overall Health Y
A                      2                     41
B                      4                     72
C                      0                     37
D                      3                     84
E                      2                     52
F                      1                     49
17. What is the power of each of the following studies using a correlation coefficient
and the .05 significance level?

     Effect Size (r)    N      Tails
(a)       .10            30      1
(b)       .30            40      2
(c)       .50            50      2
(d)       .30           100      2
(e)       .10            20      1

18. About how many participants are needed for 80% power in each of the following
planned studies that will use a correlation coefficient and the .05 significance
level?

     Effect Size (r)    Tails
(a)       .10             1
(b)       .30             2
(c)       .50             1
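The text answers problems like 17 and 18 from power tables (see the chapter notes, which point to Cohen, 1988). As a rough cross-check, power for a correlation can also be approximated with the Fisher z transformation. This is a standard approximation, not the book's table method, and the function names below are mine; small discrepancies from the tables are expected.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def approx_power_r(r, n, tails=2):
    """Approximate power for testing a correlation of size r with n
    participants at the .05 level, via the Fisher z transformation."""
    z_r = 0.5 * math.log((1 + r) / (1 - r))   # Fisher z of the effect size
    z_crit = 1.960 if tails == 2 else 1.645   # critical z at alpha = .05
    # Chance of landing beyond the critical value (the far tail is ignored)
    return norm_cdf(z_r * math.sqrt(n - 3) - z_crit)

def approx_n_for_power(r, tails=2, power_z=0.8416):
    """Approximate N needed for 80% power (0.8416 is the z score
    cutting off the top 20% of the normal curve)."""
    z_r = 0.5 * math.log((1 + r) / (1 - r))
    z_crit = 1.960 if tails == 2 else 1.645
    return ((power_z + z_crit) / z_r) ** 2 + 3

# Problem 17(a): r = .10, N = 30, one-tailed
print(round(approx_power_r(0.10, 30, tails=1), 2))
# Problem 18(b): r = .30, two-tailed, 80% power
print(round(approx_n_for_power(0.30, tails=2)))
```

The approximation tracks the table values closely for the sample sizes in these problems, but the tables remain the authoritative source for homework answers.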
19. As part of a larger study, Speed and Gangestad (1997) collected ratings and
nominations on a number of characteristics for 66 fraternity men from their fellow
fraternity members. The following paragraph is taken from their Results section:

. . . men’s romantic popularity significantly correlated with several characteristics:
best dressed (r = .48), most physically attractive (r = .47), most outgoing
(r = .47), most self-confident (r = .44), best trendsetters (r = .38),
funniest (r = .37), most satisfied (r = .32), and most independent (r = .28).
Unexpectedly, however, men’s potential for financial success did not significantly
correlate with romantic popularity (r = .10). (p. 931)

Explain these results as if you were writing to a person who has never had a
course in statistics. Specifically, (a) explain what is meant by a correlation
coefficient using one of the correlations as an example; (b) explain in a general way
what is meant by “significantly” and “not significantly,” referring to at least one
specific example; and (c) speculate on the meaning of the pattern of results, taking
into account the issue of direction of causality.

20. Gable and Lutz (2000) studied 65 children, 3 to 10 years old, and their parents.
One of their results was “Parental control of child eating showed a negative
association with children’s participation in extracurricular activities (r = –.34,
p < .01)” (p. 296). Another result was “Parents who held less appropriate beliefs
about children’s nutrition reported that their children watched more hours of
television per day (r = .36, p < .01)” (p. 296). Explain these results as if you were
writing to a person who has never had a course in statistics. Be sure to comment
on possible directions of causality for each result.

21. Table 11–13 is from a study by Baldwin and colleagues (2006) that examined
the associations among feelings of shame, guilt, and self-efficacy in a sample
of 194 college students. Self-efficacy refers to people’s beliefs about their ability
to be successful at various things they may try to do. (For example, the students
indicated how much they agreed with statements such as, “When I make
plans, I am certain I can make them work.”) Table 11-13 shows the correla-
tions among the questionnaire measures of shame, guilt, general self-efficacy,
social self-efficacy, and total self-efficacy (general self-efficacy plus social
self-efficacy).
Explain the results as if you were writing to a person who has never had a
course in statistics. Specifically, (a) explain what is meant by a correlation coef-
ficient using one of the correlations as an example; (b) study the table and then
comment on the patterns of results in terms of which variables are relatively
strongly correlated and which are not very strongly correlated; and (c) comment
on the limitations of making conclusions about the direction of causality based
on these data, using a specific correlation as an example (noting at least one plau-
sible alternative causal direction and why that alternative is plausible).
22. Arbitrarily select eight people from your class. Do each of the following: (a)
Make a scatter diagram for the relation between the number of letters in each
person’s first and last name; (b) figure the correlation coefficient for the relation
between the number of letters in each person’s first and last name; (c) figure
whether the correlation is statistically significant (use the .05 significance level,
two-tailed); (d) describe the result in words; and (e) suggest a possible interpre-
tation for your results.
Using SPSS
The U in the following steps indicates a mouse click. (We used SPSS version 15.0
for Windows to carry out these analyses. The steps and output may be slightly differ-
ent for other versions of SPSS.)
In the following steps for the scatter diagram and correlation coefficient, we will
use the example of the sleep and happy mood study. The scores for that study are
shown in Table 11–1 on p. 435, the scatter diagram is shown in Figure 11–2 on
p. 435, and the figuring for the correlation coefficient and its significance is shown in
Table 11–3 on p. 449.
Creating a Scatter Diagram
❶ Enter the scores into SPSS. Enter the scores as shown in Figure 11–20.
❷ U Graphs.
Table 11–13   Correlations Among Shame, Guilt, and Self-Efficacy Subscales

                             1        2       3       4      5
1. Shame
2. Guilt                    .34**
3. General Self-efficacy   –.29**   .12
4. Social Self-efficacy    –.18*   –.06     .47**
5. Total Self-efficacy     –.29**   .07     .94**   .74**
*p < .01, **p < .001. For all correlations, n is between 184 and 190.
Source: Baldwin, K. M., Baldwin, J. R., & Ewald, T. (2006). The relationship among shame, guilt, and self-efficacy. American Journal of Psychotherapy, 60, 1–21. Copyright © 2006 by The Association for the Advancement of Psychotherapy. Reprinted by permission of the publisher.
❸ U Legacy/Dialogs, U Scatter/Dot. A box will appear that allows you to select
different types of scatter diagrams. You want the “Simple Scatter” diagram. This
is selected as the default type of diagram; so you just need to U Define.
❹ U the variable called “mood” and then U the arrow next to the box labeled
“Y axis.” This tells SPSS that the scores for the “mood” variable should go on
the vertical (or Y) axis of the scatter diagram. U the variable called “sleep” and
then U the arrow next to the box labeled “X axis.” This tells SPSS that the
scores for the “sleep” variable should go on the horizontal (or X) axis of the
scatter diagram.
❺ U OK. Your SPSS output window should look like Figure 11–21.
Finding the Correlation Coefficient
❶ Enter the scores into SPSS. Enter the scores as shown in Figure 11–20.
❷ U Analyze.
❸ U Correlate.
❹ U Bivariate.
❺ U on the variable called “mood” and then U the arrow next to the box labeled
“Variables.” U on the variable called “sleep” and then U the arrow next to the
box labeled “Variables.” This tells SPSS to figure the correlation between the
“mood” and “sleep” variables. (If you wanted to find the correlation between
each of several variables, you would put all of them into the “Variables” box.) No-
tice that by default SPSS carries out a Pearson correlation (the type of correlation
you have learned in this chapter), gives the significance level using a two-tailed
test, and flags statistically significant correlations using the .05 significance level.
(Clicking the box next to “Spearman” requests Spearman’s rho, which is a spe-
cial type of correlation we briefly discussed earlier in the chapter.)
Figure 11–20 SPSS data editor window for the fictional study of the relationship be-
tween hours slept last night and mood.
❻ U OK. Your SPSS output window should look like Figure 11–22.
The table shown in Figure 11-22 is a small correlation matrix (there are only two
variables). (If you were interested in the correlations among more than two vari-
ables—which is often the case in psychology research—SPSS would produce a larger
correlation matrix.) The correlation matrix shows the correlation coefficient (“Pear-
son Correlation”), the exact significance level of the correlation coefficient [“Sig.
(2-tailed)”], and the number of people in the correlation analysis (“N”). Note that two
of the cells of the correlation matrix show a correlation coefficient of exactly 1. You
can ignore these cells; they simply show that each variable is perfectly correlated with
Figure 11–21 An SPSS scatter diagram showing the relationship between hours slept
last night and mood (fictional data).
itself. (In larger correlation matrixes all of the cells on the diagonal from the top left
to the bottom right of the table will have a correlation coefficient of 1.) You will also
notice that the remaining two cells provide identical information. This is because the
table shows the correlations between sleep and mood and also between mood and
sleep (which are, of course, identical correlations). So you can look at either one. (In
a larger correlation matrix, you need only look either at all of the correlations above
the diagonal that goes from top left to bottom right or at all of the correlations below
that diagonal.) The correlation coefficient is .853 (which is usually rounded to two dec-
imal places in research articles). The significance level of .031 is less than our .05
cutoff, which means that it is a statistically significant correlation. The asterisk (*) by
the correlation of .853 also shows that it is statistically significant (at the .05 signifi-
cance level, as shown by the note under the table).
Figure 11–22 SPSS output window for the correlation between hours slept and mood
(fictional data).
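The structure the text describes — a diagonal of exactly 1 where each variable meets itself, and mirrored values above and below that diagonal — can be reproduced without SPSS in a few lines. The sketch below uses illustrative stand-in scores, not the actual Table 11–1 data (which appears on an earlier page), and the helper names are mine.

```python
# Build a small correlation matrix by hand to show why the diagonal
# cells are 1 and the off-diagonal cells mirror each other.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / (ssx * ssy) ** 0.5

# Illustrative scores (NOT the Table 11-1 data)
data = {"sleep": [5, 7, 8, 6, 6, 10], "mood": [2, 4, 7, 2, 3, 6]}
names = list(data)

# Every variable correlated with every variable, like SPSS's output table
matrix = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

for a in names:
    print(a, [round(matrix[a][b], 3) for b in names])
```

Correlating a variable with itself makes the numerator identical to the denominator, which is why the diagonal is always 1; and because r treats its two inputs symmetrically, the cell for (sleep, mood) necessarily equals the cell for (mood, sleep).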
Chapter Notes
1. There is also a “computational” version of this formula that is mathematically
equivalent and thus gives the same result:

r = (NΣXY − (ΣX)(ΣY)) / ( √[NΣX² − (ΣX)²] · √[NΣY² − (ΣY)²] )

This formula is easier to use when computing by hand (or with a hand calculator)
when you have a large number of people in the study, because you don’t have
to first figure out all the deviation scores. However, researchers rarely use com-
putational formulas like this any more because the actual figuring is done by a
computer. As a student learning statistics, it is much better to use the definitional
formula (11–1). This is because when solving problems using the definitional
formula, you are strengthening your understanding of what the correlation coef-
ficient means. In all examples in this chapter, we use the definitional formula
and we urge you to use it in doing the chapter’s practice problems.
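The equivalence claimed in note 1 is easy to check numerically. Below is a sketch that computes r both ways on the same scores (the problem 16 data), assuming formula (11–1) is the usual deviation-score form; the function names are mine.

```python
# Definitional (deviation-score) formula for r
def r_definitional(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / (ssx * ssy) ** 0.5

# Computational formula: raw sums only, no deviation scores needed
def r_computational(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den = ((n * sxx - sx * sx) * (n * syy - sy * sy)) ** 0.5
    return num / den

xs = [2, 4, 0, 3, 2, 1]
ys = [41, 72, 37, 84, 52, 49]
print(abs(r_definitional(xs, ys) - r_computational(xs, ys)) < 1e-12)
```

The computational version touches each score only through running sums, which is exactly why it suited hand calculators; the definitional version makes the logic of the statistic visible, which is why the text recommends it for learning.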
2. As we noted in Chapter 3, statisticians usually use Greek letters to denote a pop-
ulation parameter. The population parameter for a correlation is ρ (rho). However,
for ease of learning (and to avoid potential confusion with a term we introduce
later in the chapter) we use the ordinary letter r for both the correlation you fig-
ure from a sample and the correlation in a population.
3. More complete tables are provided in Cohen (1988, pp. 84–95).
4. More complete tables are provided in Cohen (1988, pp. 101–102).
5. To figure the correlation between getting a heart attack and taking aspirin, you
would have to make the two variables into numbers. For example, you could
make getting a heart attack equal 1 and not getting a heart attack equal 0; simi-
larly, you could make being in the aspirin group equal 1 and being in the control
group equal 0. It would not matter which two numbers you used for the two val-
ues for each variable. Whichever two numbers you use, the result will come out
the same after converting to deviation scores and using the correction number. The
only difference that the two numbers you use makes is that the value that gets the
higher number determines whether the correlation will be positive or negative.
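The invariance described in note 5 can be demonstrated directly. In the sketch below the group assignments are hypothetical, made up purely for illustration; recoding each variable with arbitrary number pairs leaves the size of the correlation unchanged, and flipping which value gets the higher number only flips the sign.

```python
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / (ssx * ssy) ** 0.5

# Hypothetical data: 1 = heart attack / aspirin group, 0 = neither
heart_attack = [1, 0, 0, 1, 0, 0, 1, 0]
aspirin      = [0, 1, 1, 0, 1, 0, 0, 1]
r_original = pearson_r(heart_attack, aspirin)

# Recode each variable with two arbitrary other numbers
recoded_attack  = [7 if v == 1 else 3 for v in heart_attack]   # higher = attack
recoded_aspirin = [-2 if v == 1 else 5 for v in aspirin]       # higher = CONTROL

r_recoded = pearson_r(recoded_attack, recoded_aspirin)
# Same magnitude; the sign flips because aspirin now gets the lower number
print(round(abs(r_original), 4), round(abs(r_recoded), 4))
```

This works because converting to deviation scores makes the correlation indifferent to any positive linear recoding of either variable; only the ordering of the two code numbers matters, and it determines the sign alone.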