CHAPTER 4 QUESTIONS

Prepare a written response to the following assignments located in the text. Follow the template.

· Ch. 4, Practice Problems 11, 14, and 18.

Name:


Chapter 4 Instructions

Practice Problems 11, 14, & 18

Due Week 4 Day 6 (Sunday)

Follow the instructions below to submit your answers for Chapter 4 Practice Problems 11, 14 & 18.

1. Save Chapter 4 Instructions to your computer.

2. Type your answers into the shaded boxes below. The boxes will expand as you type your answers.

3. Resave this form to your computer with your answers filled in.

4. Attach the saved form to your reply when you turn in your work in the Assignments section of the Classroom tab. Note: Each question in the Assignments section will be listed separately; however, you only need to submit this form one time to turn in your answers.

Read each question in your textbook and then type your answers for Chapter 4 Practice Problems 11, 14 & 18 in the shaded boxes below. Please record only your answers. It is not necessary to show your work.

11.

Step 1 –

Step 2 –

Step 3 –

Step 4 –

Step 5 –

14. For Conclusion, select one: Reject the Null or Fail to Reject Null

Problem    Cutoff Score    Z Score    Conclusion

18. Use a one-tailed test for the cutoff score. For Conclusion, select one: Reject Null or Fail to Reject Null

Cutoff Score =

Z Score =

Conclusion:


CHAPTER 4

Introduction to Hypothesis Testing

Chapter Outline

✪ A Hypothesis-Testing Example
✪ The Core Logic of Hypothesis Testing
✪ The Hypothesis-Testing Process
✪ One-Tailed and Two-Tailed Hypothesis Tests
✪ Controversy: Should Significance Tests Be Banned?
✪ Hypothesis Tests in Research Articles
✪ Summary
✪ Key Terms
✪ Example Worked-Out Problems
✪ Practice Problems
✪ Chapter Notes

In this chapter, we introduce the crucial topic of hypothesis testing. A hypothesis is
a prediction intended to be tested in a research study. The prediction may be based on
informal observation (as in clinical or applied settings regarding a possible practical
innovation), on related results of previous studies, or on a broader theory about what
is being studied. You can think of a theory as a set of principles that attempt to
explain an important psychological process. A theory usually leads to various specific
hypotheses that can be tested in research studies.

This chapter focuses on the basic logic for analyzing results of a research study to
test a hypothesis. The central theme of hypothesis testing has to do with the important
distinction between sample and population discussed in the last chapter: hypothesis
testing is a systematic procedure for deciding whether the results of a research study,
which examines a sample, support a hypothesis that applies to a population. Hypothesis
testing is the central theme in all the remaining chapters of this book, as it is in
most research in psychology and related fields.

Many students find the most difficult part of the course to be mastering the basic
logic of this chapter and the next two. This chapter in particular requires some mental
gymnastics. Even if you follow everything the first time through, you will be wise to
review the chapter thoroughly. Hypothesis testing involves grasping ideas that make
little sense covered separately; so in this chapter you learn several new ideas all at once.
However, once you understand the material in this chapter and the two that follow, your
mind will be used to this sort of thing, and the rest of the course should seem easier.

theory: set of principles that attempt to explain one or more facts, relationships,
or events; psychologists often derive specific predictions from theories that are then
tested in research studies.

hypothesis: prediction, often based on informal observation, previous research,
or theory, that is tested in a research study.

hypothesis testing: procedure for deciding whether the outcome of a study
(results for a sample) supports a particular theory or practical innovation (which is
thought to apply to a population).

At the same time, we have kept this introduction to hypothesis testing as simple
as possible, putting off what we could for later chapters. For example, real-life psy-
chology research involves samples of many individuals. However, to minimize how
much you have to learn at one time, this chapter’s examples are about studies in
which the sample is a single individual. To do this, we use some odd examples. Just
remember that you are building a foundation that will, by Chapter 7, prepare you to
understand hypothesis testing as it is actually done in real research.

A Hypothesis-Testing Example
Here is our first necessarily odd example that we made up to keep this introduction to
hypothesis testing as straightforward as possible. A large research project has been
going on for several years. In this project, new babies are given a particular vitamin
and then the research team follows their development during the first 2 years of life. So
far, the vitamin has not speeded up the development of the babies. The ages at which
these and all other babies start to walk are shown in Figure 4–1. The mean is 14 months
(μ = 14), the standard deviation is 3 months (σ = 3), and the ages follow a normal
curve. Based on the normal curve percentages, you can figure that fewer than 2% of
babies start walking before 8 months of age; these are the babies who are more than
2 standard deviations below the mean. [This fictional distribution is close to the true
distribution psychologists have found for European babies, although that true distrib-
ution is slightly skewed to the right (Hindley et al., 1966).]
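As a rough check on the normal curve percentages used here, the short Python sketch below converts a walking age of 8 months to a Z score and looks up the area of the curve below it. It assumes Python 3.8 or later (for `statistics.NormalDist`), and the variable names are our own illustrations rather than anything from the study. The exact area comes out at roughly 2.3%, which agrees with the rounded figure of about 2% that the approximate normal curve percentages give.

```python
from statistics import NormalDist

# Fictional population of walking ages from the example (in months).
MU = 14.0     # population mean
SIGMA = 3.0   # population standard deviation

walking_ages = NormalDist(mu=MU, sigma=SIGMA)

age = 8.0
z = (age - MU) / SIGMA               # Z score of a baby who walks at 8 months
share_below = walking_ages.cdf(age)  # proportion of babies walking before 8 months

print(f"Z score for {age:g} months: {z:.2f}")
print(f"Proportion walking earlier: {share_below:.4f}")
```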

One of the researchers working on the project has an idea. If the vitamin the ba-
bies are taking could be more highly refined, perhaps its effect would be dramatically
increased: babies taking the highly purified version should start walking much earli-
er than other babies. (We will assume that the purification process could not possibly
make the vitamin harmful.) However, refining the vitamin in this way is extremely
expensive for each dose; so the research team decides to try the procedure with just
enough purified doses for one baby. A newborn in the project is then randomly
selected to take the highly purified version of the vitamin, and the researchers then
follow this baby’s progress for 2 years. What kind of result should lead the researchers
to conclude that the highly purified vitamin allows babies to walk earlier?

[Figure 4–1 Distribution of when babies begin to walk (fictional data). Normal curve
with μ = 14 and σ = 3; the horizontal axis shows age in months with the corresponding
Z scores.]

This is a hypothesis-testing problem. The researchers want to draw a general
conclusion about whether the purified vitamin allows babies in general to walk ear-
lier. The conclusion will be about babies in general (a population of babies). How-
ever, the conclusion will be based on results of studying a sample. In this example,
the sample consists of a single baby.

The Core Logic of Hypothesis Testing
There is a standard way researchers approach any hypothesis-testing problem. For
this example, it works as follows. Consider first the population of babies in general
(those who are not given the specially purified vitamin). In this population, the
chance of a baby’s starting to walk at age 8 months or earlier would be less than 2%.
(As shown in Figure 4–1, the mean walking age is 14 months with a standard devia-
tion of 3 months.) Thus, walking at 8 months or earlier is highly unlikely among
such babies. But what if the randomly selected sample of one baby in our study does
start walking by 8 months? If the specially purified vitamin had no effect on this par-
ticular baby’s walking age (which means that the baby’s walking age should be sim-
ilar to that of babies who were not given the vitamin), it is highly unlikely (less than
a 2% chance) that the particular baby we selected at random would start walking by
8 months. So, if the baby in our study does in fact start walking by 8 months, that al-
lows us to reject the idea that the specially purified vitamin has no effect. And if we
reject the idea that the specially purified vitamin has no effect, then we must also
accept the idea that the specially purified vitamin does have an effect.

Using the same reasoning, if the baby starts walking by 8 months, we can reject
the idea that this baby comes from a population of babies like that of the general
population with a mean walking age of 14 months. We therefore conclude that ba-
bies given the specially purified vitamin will on the average start to walk before
14 months. Our explanation for the baby’s early walking age in the study is that the
specially purified vitamin speeded up the baby’s development.

In this example, the researchers first spelled out what would have to happen for
them to conclude that the special purification procedure makes a difference. Having
laid this out in advance, the researchers then conducted their study. Conducting the
study in this case meant giving the specially purified vitamin to a randomly selected
baby and watching to see how early that baby walked. We supposed that the result of
the study is that the baby started walking before 8 months. The researchers then con-
cluded that it is unlikely the specially purified vitamin makes no difference and thus
also that it does make a difference.

This kind of testing, with its opposite-of-what-you-predict, roundabout reason-
ing, is at the heart of inferential statistics in psychology. It is something like a double
negative. One reason for this approach is that we have the information to figure the
probability of getting a particular experimental result if the situation of there being
no difference is true. In the purified vitamin example, the researchers know what the
probabilities are of babies walking at different ages if the specially purified vitamin
does not have any effect. The probabilities of babies walking at various ages are
already known from studies of babies in general—that is, babies who have not
received the specially purified vitamin. If the specially purified vitamin has no ef-
fect, then the ages at which babies start walking are the same with or without the spe-
cially purified vitamin. Thus, the distribution is that shown in Figure 4–1, based on
ages at which babies start walking in general.
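One way to get a feel for this reasoning is to simulate the no-difference situation directly. The Python sketch below is only an illustration under stated assumptions: it draws walking ages for many imaginary babies from the general-population distribution (mean 14, standard deviation 3) and counts how often a randomly selected baby would walk by 8 months if the specially purified vitamin had no effect. The number of simulated babies and the random seed are arbitrary choices of ours.

```python
import random

MU, SIGMA = 14.0, 3.0   # walking-age distribution if the vitamin has no effect
EARLY_AGE = 8.0         # "walks by 8 months"
N_BABIES = 100_000      # number of simulated babies (arbitrary choice)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw walking ages for babies from the no-effect population.
ages = [rng.gauss(MU, SIGMA) for _ in range(N_BABIES)]

# Fraction of those babies who walk by 8 months purely by chance.
early_fraction = sum(a <= EARLY_AGE for a in ages) / N_BABIES

print(f"Fraction walking by {EARLY_AGE:g} months with no effect: {early_fraction:.3f}")
```

Because so few simulated babies walk this early, a real baby who does so is hard to explain by the no-difference situation alone, which is the step that lets the researchers reject it.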

TIP FOR SUCCESS
This section, The Core Logic of Hypothesis Testing, is central to everything else we do
in the book. Thus, you may want to read it a few times. You should also be certain that
you understand the logic of hypothesis testing before reading later chapters.

ISBN 0-558-46761-X
Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.


Without such a tortuous way of going at the problem, in most cases you could
not test hypotheses scientifically at all. In almost all psychology research, we base
our conclusions on the question, “What is the probability of getting our research re-
sults if the opposite of what we are predicting were true?” That is, we usually predict
an effect of some kind. However, we decide on whether there is such an effect by
seeing if it is unlikely that there is not such an effect. If it is highly unlikely that we
would get our research results if the opposite of what we are predicting were true,
that finding allows us to reject the opposite prediction. If we reject the opposite pre-
diction, we are able to accept our prediction. However, if it is likely that we would
get our research results if the opposite of what we are predicting were true, we are
not able to reject the opposite prediction. If we are not able to reject the opposite pre-
diction, we are not able to accept our prediction.

The Hypothesis-Testing Process
Let’s look at our example again, this time going over each step in some detail. Along
the way, we cover the special terminology of hypothesis testing. Most important, we
introduce the five steps of hypothesis testing that you use for the rest of this book.

Step ❶: Restate the Question as a Research Hypothesis
and a Null Hypothesis About the Populations
Our researchers are interested in the effects on babies in general (not just on this par-
ticular baby). That is, the purpose of studying samples is to know about populations.
Thus, it is useful to restate the research question in terms of populations. In our
example, we can think of two populations of babies:

Population 1: Babies who take the specially purified vitamin.
Population 2: Babies in general (that is, babies who do not take the specially
purified vitamin).

Population 1 consists of babies who receive the experimental treatment (the spe-
cially purified vitamin). In our example, we use a sample of one baby to draw a con-
clusion about the age at which babies in Population 1 start to walk. Population 2 is a
kind of comparison baseline of what is already known about babies in general.

The prediction of our research team is that Population 1 babies (those who take
the specially purified vitamin) will on the average walk earlier than Population 2
babies (babies in general who do not take the specially purified vitamin). This
prediction is based on the researchers’ theory of how these vitamins work. A prediction
like this about the difference between populations is called a research hypothesis. Put
more formally, the prediction is that the mean of Population 1 is lower (babies
receiving the special vitamin walk earlier) than the mean of Population 2. In symbols,
the research hypothesis for this example is μ₁ < μ₂.

The opposite of the research hypothesis is that the populations are not different
in the way predicted. Under this scenario, Population 1 babies (those who take the
specially purified vitamin) will on the average not walk earlier than Population 2
babies (babies in general, those who do not take the specially purified vitamin). That
is, the prediction is that there is no difference in the ages at which Population 1 and
Population 2 babies start walking. On the average, they start at the same time. A
statement like this, about a lack of difference between populations, is the crucial
opposite of the research hypothesis. It is called a null hypothesis. It has this name
because it states the situation in which there is no difference (the difference is “null”)
between the populations. In symbols, the null hypothesis is μ₁ = μ₂.

research hypothesis: statement in hypothesis testing about the predicted relation
between populations (often a prediction of a difference between population means).

null hypothesis: statement about a relation between populations that is the opposite
of the research hypothesis; statement that in the population there is no difference (or
a difference opposite to that predicted) between populations; contrived statement set
up to examine whether it can be rejected as part of hypothesis testing.

The research hypothesis and the null hypothesis are complete opposites: if one
is true, the other cannot be. In fact, the research hypothesis is sometimes called the
alternative hypothesis—that is, it is the alternative to the null hypothesis. This term
is a bit ironic. As researchers, we care most about the research hypothesis. But when
doing the steps of hypothesis testing, we use this roundabout method of seeing
whether or not we can reject the null hypothesis so that we can decide about its alter-
native (the research hypothesis).

Step ❷: Determine the Characteristics
of the Comparison Distribution
Recall that the overall logic of hypothesis testing involves figuring out the probability
of getting a particular result if the null hypothesis is true. Thus, you need to know
what the situation would be if the null hypothesis were true. In our example, we start
out knowing the key information about Population 2, babies in the general population
(see Figure 4–1): we know it follows a normal curve, μ = 14, and σ = 3. If the
null hypothesis is true, Population 1 and Population 2 are the same; in our example,
this would mean Populations 1 and 2 both follow a normal curve, μ = 14, and σ = 3.

In the hypothesis-testing process, you want to find out the probability that you
could have gotten a sample score as extreme as what you got (say, a baby walking
very early) if your sample were from a population with a distribution of the sort you
would have if the null hypothesis were true. Thus, in this book we call this distribution
a comparison distribution. (The comparison distribution is sometimes called a sampling
distribution, an idea we discuss in Chapter 5.) That is, in the hypothesis-testing
process, you compare the actual sample’s score to this comparison distribution.

In our vitamin example, the null hypothesis is that there is no difference in
walking age between babies who take the specially purified vitamin (Population 1)
and babies in general who do not take the specially purified vitamin (Population 2).
The comparison distribution is the distribution for Population 2, since this popula-
tion represents the walking age of babies if the null hypothesis is true. In later chap-
ters, you will learn about different types of comparison distributions, but the same
principle applies in all cases: The comparison distribution is the distribution that rep-
resents the population situation if the null hypothesis is true.

Step ❸: Determine the Cutoff Sample Score
on the Comparison Distribution at Which
the Null Hypothesis Should Be Rejected
Ideally, before conducting a study, researchers set a target against which they will
compare their result: how extreme a sample score they would need to decide against
the null hypothesis, that is, how extreme the sample score would have to be for it to
be too unlikely that they could get such an extreme score if the null hypothesis were
true. This is called the cutoff sample score. (The cutoff sample score is also known
as the critical value.)

Consider our purified vitamin example, in which the null hypothesis is that
walking age is not influenced by whether babies take the specially purified vitamin.
The researchers might decide that, if the null hypothesis were true, a randomly
selected baby walking before 8 months would be very unlikely. With a normal
distribution, being 2 or more standard deviations below the mean (walking by 8 months)
could occur less than 2% of the time. Thus, based on the comparison distribution, the
researchers set their cutoff sample score even before doing the study. They decide in
advance that if the result of their study is a baby who walks by 8 months, they will
reject the null hypothesis.

comparison distribution: distribution used in hypothesis testing. It represents
the population situation if the null hypothesis is true. It is the distribution to
which you compare the score based on your sample’s results.

cutoff sample score: point in hypothesis testing, on the comparison distribution,
at which, if reached or exceeded by the sample score, you reject the null hypothesis.
Also called critical value.

But what if the baby does not start walking until after 8 months? If that happens,
the researchers will not be able to reject the null hypothesis.

When setting in advance how extreme a sample’s score needs to be to reject the
null hypothesis, researchers use Z scores and percentages. In our purified vitamin ex-
ample, the researchers might decide that if a result were less likely than 2%, they
would reject the null hypothesis. Being in the bottom 2% of a normal curve means
having a Z score of about –2 or lower. Thus, the researchers would set –2 as their
Z-score cutoff point on the comparison distribution for deciding that a result is ex-
treme enough to reject the null hypothesis. So, if the actual sample Z score is –2 or
lower, the researchers will reject the null hypothesis. However, if the actual sample
Z score is greater than –2, the researchers will not reject the null hypothesis.

Suppose that the researchers are even more cautious about too easily rejecting the
null hypothesis. They might decide that they will reject the null hypothesis only if they
get a result that could occur by chance 1% of the time or less. They could then figure
out the Z-score cutoff for 1%. Using the normal curve table, to have a score in the
lower 1% of a normal curve, you need a Z score of –2.33 or less. (In our example, a
Z score of –2.33 means 7 months.) In Figure 4–2, we have shaded the 1% of the com-
parison distribution in which a sample would be considered so extreme that the possibil-
ity that it came from a distribution like this would be rejected. Now the researchers will
reject the null hypothesis only if the actual sample Z score is –2.33 or lower—that is, if
it falls in the shaded area in Figure 4–2. If the sample Z score falls outside the shaded
area in Figure 4–2, the researchers will not reject the null hypothesis.
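The cutoffs described above can also be figured without a normal curve table. The following Python sketch is a minimal illustration, assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `lower_tail_cutoff` is hypothetical and simply wraps the two conversions used in the text: from a tail percentage to a Z score, and from that Z score back to a raw score in months.

```python
from statistics import NormalDist

MU, SIGMA = 14.0, 3.0           # comparison distribution (babies in general)
standard_normal = NormalDist()  # mean 0, standard deviation 1


def lower_tail_cutoff(proportion):
    """Return (Z cutoff, raw-score cutoff in months) for the lower tail."""
    z_cutoff = standard_normal.inv_cdf(proportion)  # Z with `proportion` of the curve below it
    raw_cutoff = MU + z_cutoff * SIGMA              # convert the Z score back to months
    return z_cutoff, raw_cutoff


for proportion in (0.02, 0.01):
    z_cutoff, months = lower_tail_cutoff(proportion)
    print(f"Bottom {proportion:.0%}: Z = {z_cutoff:.2f}, about {months:.1f} months")
```

For the 1% cutoff this gives a Z score that rounds to −2.33 and a walking age of about 7 months, matching the values used in the example.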

In general, psychology researchers use a cutoff on the comparison distribution
with a probability of 5% that a score will be at least that extreme if the null
hypothesis were true. That is, researchers reject the null hypothesis if the probability
of getting a sample score this extreme (if the null hypothesis were true) is less than 5%.
This probability is usually written as p < .05. However, in some areas of research, or
when researchers want to be especially cautious, they use a cutoff of 1% (p < .01).
These are called conventional levels of significance. They are described as the .05
significance level and the .01 significance level. We also refer to them as the 5%
significance level and the 1% significance level. (We discuss in more detail in Chapter 6
the issues in deciding on the significance level to use.) When a sample score is so
extreme that researchers reject the null hypothesis, the result is said to be statistically
significant (or significant, as it is often abbreviated).

[Figure 4–2 Distribution of when babies begin to walk, with bottom 1% shaded
(fictional data). The horizontal axis shows age in months (7 to 21) with the
corresponding Z scores (−2 to +2); the shaded bottom 1% lies below Z = −2.33.]

Step ❹: Determine Your Sample’s Score
on the Comparison Distribution
The next step is to carry out the study and get the actual results for your sample.
Once you have the results for your sample, you figure the Z score for the sample’s
raw score based on the population mean and standard deviation of the comparison
distribution.

Assume that the researchers did the study and the baby who was given the specially
purified vitamin started walking at 6 months. The mean of the comparison distribution
to which we are comparing these results is 14 months and the standard deviation is
3 months. That is, μ = 14 and σ = 3. Thus, a baby who walks at 6 months is 8 months
below the population mean. This puts the baby 2⅔ standard deviations below the
population mean. The Z score for this sample baby on the comparison distribution is
thus −2.67 [that is, Z = (6 − 14)/3 = −2.67]. Figure 4–3
shows the score of our sample baby on the comparison distribution.
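The same figuring can be written as a small Python function. This is only a sketch; the function name `z_score` and its argument names are our own, and the check on `sigma` is an added safeguard rather than part of the procedure in the text.

```python
def z_score(raw_score, mu, sigma):
    """Number of standard deviations a raw score lies from the population mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (raw_score - mu) / sigma


# The sample baby walked at 6 months; the comparison distribution has mu = 14, sigma = 3.
sample_z = z_score(raw_score=6, mu=14, sigma=3)
print(f"Sample Z score: {sample_z:.2f}")
```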

Step ➎: Decide Whether to Reject the Null Hypothesis
To decide whether to reject the null hypothesis, compare your actual sample’s Z
score (from Step ❹) to the cutoff Z score (from Step ❸). In our example, the actual
result was −2.67. Let’s suppose the researchers had decided in advance that they
would reject the null hypothesis if the sample’s Z score was below −2. Since −2.67
is below −2, the researchers would reject the null hypothesis.

Alternatively, suppose the researchers had used the more conservative 1% significance
level. The needed Z score to reject the null hypothesis would then have been −2.33
or lower. But, again, the actual Z for the randomly selected baby was −2.67 (a more
extreme score than −2.33). Thus, even with this more conservative cutoff, they would
still reject the null hypothesis. This situation is shown in Figure 4–3. As you can see
in the figure, the bottom 1% of the distribution is shaded. We recommend that you
always draw such a picture of the distribution. Be sure to shade in the part of the
distribution that is more extreme (that is, farther out in the tail) than the cutoff
sample score. If your actual sample Z score falls within the shaded region, you can
reject the null hypothesis. Since the sample Z score (−2.67) in this example falls
within the shaded tail region, the researchers can reject the null hypothesis.

[Figure 4–3 Distribution of when babies begin to walk, showing both the bottom 1%
and the single baby who is the sample studied (fictional data). The horizontal axis
shows age in months (7 to 21) with the corresponding Z scores (−2 to +2); the cutoff
Z score is −2.33 and the experimental sample baby is at Z = −2.67.]

statistically significant: conclusion that the results of a study would be unlikely
if in fact the sample studied represents a population that is no different from the
population in general; an outcome of hypothesis testing in which the null hypothesis
is rejected.

conventional levels of significance (p < .05, p < .01): levels of significance widely
used in psychology.

TIP FOR SUCCESS
If you are unsure about these symbols for population parameters (μ, σ), be sure to
review Table 3–2 on p. 87.

If the researchers reject the null hypothesis, what remains is the research hy-
pothesis. In this example, the research team can conclude that the results of their
study support the research hypothesis that babies who take the specially purified
vitamin walk earlier than babies in general.

Implications of Rejecting or Failing to Reject
the Null Hypothesis
It is important to emphasize two points about the conclusions you can make from the
hypothesis-testing process. First, when you reject the null hypothesis, all you are
saying is that your results support the research hypothesis (as in our example). You
would not go on to say that the results prove the research hypothesis or that the re-
sults show that the research hypothesis is true. Terms such as prove and true are too
strong because the results of research studies are based on probabilities. Specifically,
they are based on the probability being low of getting your result if the null hypoth-
esis were true. Proven and true are okay terms in logic and mathematics, but to use
these words in conclusions from scientific research is unprofessional. (It is okay to
use true when speaking hypothetically—for example, “if this hypothesis were true,
then . . .”—but not when speaking of conclusions about an actual result.) What you
do say when you reject the null hypothesis is that the results are statistically signifi-
cant. You can also say that the results “support” or “provide evidence for” the
research hypothesis.

Second, when a result is not extreme enough to reject the null hypothesis, you
do not say that the result supports the null hypothesis. You simply say the result is
not statistically significant.

A result that is not strong enough to reject the null hypothesis means the study
was inconclusive. The results may not be extreme enough to reject the null hypothe-
sis, but the null hypothesis might still be false (and the research hypothesis true).
Suppose in our example that the specially purified vitamin had only a slight but still
real effect. In that case, we would not expect to find a baby who is given the purified
vitamin to be walking a lot earlier than babies in general. Thus, we would not be able
to reject the null hypothesis, even though it is false. (You will learn more about such
situations in the Decision Errors section in Chapter 6.)

Showing the null hypothesis to be true would mean showing that there is absolutely
no difference between the populations. It is always possible that there is a
difference between the populations but that the difference is much smaller than the
particular study was able to detect. Therefore, when a result is not extreme enough to
reject the null hypothesis, the results are said to be inconclusive. Sometimes, however,
if studies have been done using large samples and accurate measuring procedures,
evidence may build up in support of something close to the null hypothesis: that
there is at most very little difference between the populations. (We have more to say
on this important issue later in this chapter and in Chapter 6.) The basic logic of
hypothesis testing is summarized in Table 4–1, which also includes the logic for our
example of a baby who is given a specially purified vitamin.

Summary of Steps of Hypothesis Testing
Here is a summary of the five steps of hypothesis testing.

❶ Restate the question as a research hypothesis and a null hypothesis about
the populations.
❷ Determine the characteristics of the comparison distribution.
❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected.
❹ Determine your sample’s score on the comparison distribution.
➎ Decide whether to reject the null hypothesis.
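To see how the five steps fit together, here is a minimal Python sketch that strings them into one function for the kind of situation in this chapter: a single score compared with a known normal population, with the prediction that the score will be low. The names `ZTestResult` and `lower_tail_z_test` are hypothetical, and the sketch assumes Python 3.8 or later for `statistics.NormalDist`. Step ❶ stays in words (as a comment), because restating the question is a matter of reasoning and not of arithmetic.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class ZTestResult:
    cutoff_z: float    # Step 3: cutoff Z score on the comparison distribution
    sample_z: float    # Step 4: the sample's Z score
    reject_null: bool  # Step 5: the decision


def lower_tail_z_test(sample_score, mu, sigma, significance_level):
    """Test a single score against a known normal population, predicting a low score."""
    # Step 1 (in words): research hypothesis mu1 < mu2; null hypothesis mu1 = mu2.
    # Step 2: the comparison distribution is normal with the known mu and sigma.
    # Step 3: Z cutoff that leaves `significance_level` of the curve in the lower tail.
    cutoff_z = NormalDist().inv_cdf(significance_level)
    # Step 4: the sample's Z score on the comparison distribution.
    sample_z = (sample_score - mu) / sigma
    # Step 5: reject the null hypothesis if the sample is at least as extreme as the cutoff.
    return ZTestResult(cutoff_z, sample_z, sample_z <= cutoff_z)


# Baby example: walked at 6 months, mu = 14, sigma = 3, 1% significance level.
result = lower_tail_z_test(sample_score=6, mu=14, sigma=3, significance_level=0.01)
print(result)
```

With the baby example the function reports that the null hypothesis is rejected; a baby who walked at 10 months would give a Z score of about −1.33, and the result would be inconclusive.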

A Second Example
Here is another fictional example. Two happy-go-lucky personality psychologists
are examining the theory that happiness comes from positive experiences. In partic-
ular, these researchers argue that if people have something very fortunate happen to
them, they become very happy and will still be happy 6 months later. So the re-
searchers plan the following experiment: a person will be randomly selected from
the North American adult public and given $10 million. Six months later, the per-
son’s happiness will be measured. It is already known (in this fictional example)
what the distribution of happiness is like in the general population of North Ameri-
can adults, and this is shown in Figure 4–4. On the test being used, the mean happi-
ness score is 70, the standard deviation is 10, and the distribution is approximately
normal.

Table 4–1 The Basic Logic of Hypothesis Testing, Including the Logic for the Example of the
Effect of a Specially Purified Vitamin on the Age That Babies Begin to Walk

Focus of Research
  Basic Logic: Sample is studied.
  Baby Example: Baby given specially purified vitamin and age of walking observed.

Question
  Basic Logic: Is the sample typical of the general population?
  Baby Example: Is this baby’s walking age typical of babies in general?

Answer
  Basic Logic: Very unlikely, or could be.
  Baby Example: Very unlikely.

Conclusion
  Basic Logic (if very unlikely): The sample is probably not from the general
  population; it is probably from a different population.
  Basic Logic (if could be): Inconclusive.
  Baby Example: This baby is probably not from the general population of babies,
  because its walking age is much lower than for babies in general. Therefore, babies
  who take the specially purified vitamin will probably begin walking at an earlier age
  than babies in the general population.

The psychologists now carry out the hypothesis-testing procedure. That is, the
researchers consider how happy the person would have to be before they can confi-
dently reject the null hypothesis that receiving so much money does not make people
happier 6 months later. If the researchers’ result shows a very high level of happi-
ness, the psychologists will reject the null hypothesis and conclude that getting
$10 million probably does make people happier 6 months later. But if the result is
not very extreme, the researchers will conclude that there is not sufficient evidence
to reject the null hypothesis, and the results of the experiment are inconclusive.

Now let us consider the hypothesis-testing procedure in more detail in this
example, following the five steps.

❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are two populations of interest:

Population 1: People who 6 months ago received $10 million.
Population 2: The general population (consisting of people who 6 months ago
did not receive $10 million).

The prediction of the personality psychologists, based on their theory of
happiness, is that Population 1 people will on the average be happier than
Population 2 people: in symbols, μ₁ > μ₂. The null hypothesis is that Population 1
people (those who get $10 million) will not be happier than Population 2 people
(people in general who do not get $10 million).

❷ Determine the characteristics of the comparison distribution. The comparison
distribution is the distribution that represents the population situation if the
null hypothesis is true. If the null hypothesis is true, the distributions of
Populations 1 and 2 are the same. We know Population 2’s distribution (it is
normally distributed with μ = 70 and σ = 10); so we can use it as the comparison
distribution.
❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected. What kind of result would be extreme
enough to convince us to reject the null hypothesis? In this example, assume
that the researchers decided the following in advance: they will reject the null
hypothesis as too unlikely if the results would occur less than 5% of the time if
this null hypothesis were true. We know that the comparison distribution is a
normal curve. Thus, we can figure that the top 5% of scores from the normal
curve table begin at a Z score of about 1.64. Thus the researchers set as the
cutoff point for rejecting the null hypothesis a result in which the sample’s Z score
on the comparison distribution is at or above 1.64. (The mean of the comparison
distribution is 70 and the standard deviation is 10. Therefore, the null
hypothesis will be rejected if the sample result is at or above 86.4.)

Figure 4–4 Distribution of happiness scores (fictional data). [Normal curve; happiness
scores of 50, 60, 70, 80, and 90 correspond to Z scores of −2, −1, 0, +1, and +2.]
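
The cutoff in step ❸ can be reproduced numerically. The sketch below uses Python's standard-library `statistics.NormalDist` in place of a printed normal curve table; the variable names are illustrative, while the mean, standard deviation, and 5% level are those given in the example.

```python
from statistics import NormalDist

POP_MEAN = 70   # mean happiness score in the general population (from the text)
POP_SD = 10     # standard deviation (from the text)
ALPHA = 0.05    # reject if the result would occur less than 5% of the time

# Z score that cuts off the top 5% of a standard normal curve.
exact_cutoff_z = NormalDist().inv_cdf(1 - ALPHA)

# The text uses the two-decimal table value 1.64 for this cutoff.
table_cutoff_z = 1.64

# Convert the cutoff Z score back to a raw happiness score.
cutoff_raw = POP_MEAN + table_cutoff_z * POP_SD

print(f"exact cutoff Z = {exact_cutoff_z:.4f}")
print(f"raw-score cutoff = {cutoff_raw:.1f}")
```

The exact value is slightly above 1.64 (about 1.645); the text's "about 1.64" is the rounded table figure, and it is that figure that gives the raw-score cutoff of 86.4.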

❹ Determine your sample’s score on the comparison distribution. Now for the
results: six months after giving the randomly selected person $10 million, the
now very wealthy research participant takes the happiness test. The person’s
score is 80. As you can see from Figure 4–4, a score of 80 has a Z score of
+1 on the comparison distribution.

❺ Decide whether to reject the null hypothesis. The Z score of the sample
individual is +1. The researchers set the minimum Z score to reject the null
hypothesis at +1.64. Thus, the sample score is not extreme enough to reject the null
hypothesis. The experiment is inconclusive; researchers would say the results
are “not statistically significant.” Figure 4–5 shows the comparison distribution
with the top 5% shaded and the location of the sample participant who received
$10 million.
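
Steps ❹ and ❺ for the happiness example amount to one subtraction, one division, and one comparison. The following Python sketch works through them with the numbers from the text. The final probability calculation is an extra illustration that the text does not include, added only to show how often a score this high would occur if the null hypothesis were true.

```python
from statistics import NormalDist

comparison = NormalDist(mu=70, sigma=10)  # comparison distribution from the text
sample_score = 80                          # the participant's happiness score
cutoff_z = 1.64                            # one-tailed cutoff at the 5% level

# Step 4: the sample's Z score on the comparison distribution.
sample_z = (sample_score - comparison.mean) / comparison.stdev

# Step 5: compare with the cutoff.
reject_null = sample_z >= cutoff_z

# Extra illustration (not a step in the text): how often a score this high
# or higher would occur if the null hypothesis were true.
prob_this_extreme = 1 - comparison.cdf(sample_score)

print(sample_z, reject_null, round(prob_this_extreme, 3))
```

Under these assumptions the probability comes out at roughly 16%, well above the 5% the researchers required, which is consistent with the inconclusive result.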

You may be interested to know that Brickman et al. (1978) carried out a more
elaborate study based on the same question. They studied lottery winners as exam-
ples of people suddenly having a very positive event happen to them. Their results
were similar to those in our fictional example: those who won the lottery were not
much happier 6 months later than people who did not win the lottery. Also, another
group they studied, people who had become paraplegics through a random accident,
were not much less happy than other people 6 months later. These researchers con-
cluded that if a major event does have a lasting effect on happiness, it is probably not
a very big one. This conclusion is consistent with the findings of more recent studies
(e.g., Suh et al., 1996). Indeed, in recent years, a great deal of research has examined
what factors contribute to people’s level of happiness. If you are interested in know-
ing more about this topic, we highly recommend an article by Diener and colleagues
(2006) and social psychologist Daniel Gilbert’s (2006) engaging best seller,
Stumbling on Happiness.

Figure 4–5 Distribution of happiness scores with upper 5% shaded and showing the
location of the sample participant (fictional data). [Normal curve; the top 5% begins
at the cutoff Z score of 1.64; the sample participant is at Z = 1, a happiness score of 80.]

How are you doing?

1. A sample of rats in a laboratory is given an experimental treatment intended
to make them learn a maze faster than other rats. State (a) the null hypothesis
and (b) the research hypothesis.

2. (a) What is a comparison distribution? (b) What role does it play in hypothesis
testing?

3. What is the cutoff sample score?
4. Why do we say that hypothesis testing involves a double negative logic?
5. What can you conclude when (a) a result is so extreme that you reject the null
hypothesis and (b) a result is not very extreme so that you cannot reject the
null hypothesis?

6. A training program to increase friendliness is tried on one individual randomly
selected from the general public. Among the general public (who do not get
this training program), the mean on the friendliness measure is 30 with a stan-
dard deviation of 4. The researchers want to test their hypothesis at the 5%
significance level. After going through the training program, this individual
takes the friendliness measure and gets a score of 40. What should the re-
searchers conclude?

Answers

1. (a) The population of rats like those that get the experimental treatment score
the same on the time to learn the maze as the population of rats in general
that do not get the experimental treatment. (b) The population of rats like
those that get the experimental treatment learn the maze faster than the
population of rats in general that do not get the experimental treatment.

2. (a) A comparison distribution is a distribution to which you compare the
results of your study. (b) In hypothesis testing, the comparison distribution is the
distribution for the situation when the null hypothesis is true. To decide
whether to reject the null hypothesis, you check how extreme the score of
your sample is on this comparison distribution, that is, how likely it would be to get
a sample with a score this extreme if your sample came from this comparison
distribution.

3. The cutoff sample score is the Z score on the comparison distribution at which,
if the sample’s Z score is more extreme than it, you reject the null hypothesis.

4. We say that hypothesis testing involves a double negative logic because we
are interested in the research hypothesis, but we test whether it is true by
seeing if we can reject its opposite, the null hypothesis.

5. (a) The research hypothesis is supported when a result is so extreme that you
reject the null hypothesis; the result is statistically significant. (b) The result is
not statistically significant when a result is not very extreme; the result is
inconclusive.

6. The training program increases friendliness. (The cutoff sample Z score on the
comparison distribution is 1.64. The actual sample’s Z score of 2.50 is more
extreme (that is, farther in the tail) than the cutoff Z score. Therefore, reject
the null hypothesis; the research hypothesis is supported; the result is
statistically significant.)

directional hypothesis: research hypothesis predicting a particular direction
of difference between populations; for example, a prediction that the population
like the sample studied has a higher mean than the population in general.

one-tailed test: hypothesis-testing procedure for a directional hypothesis;
situation in which the region of the comparison distribution in which the null
hypothesis would be rejected is all on one side (tail) of the distribution.

nondirectional hypothesis: research hypothesis that does not predict a particular
direction of difference between the population like the sample studied and
the population in general.

One-Tailed and Two-Tailed Hypothesis Tests
In our examples so far, the researchers were interested in only one direction of
result. In our first example, researchers tested whether babies given the specially
purified vitamin would walk earlier than babies in general. In the happiness exam-
ple, the personality psychologists predicted the person who received $10 million
would be happier than other people. The researchers in these studies were not in-
terested in the possibility that giving the specially purified vitamin would cause
babies to start walking later or that people getting $10 million might become less
happy.

Directional Hypotheses and One-Tailed Tests
The purified vitamin and happiness studies are examples of testing a directional
hypothesis. Both studies focused on a specific direction of effect. When a researcher
makes a directional hypothesis, the null hypothesis is also, in a sense, directional.
Suppose the research hypothesis is that getting $10 million will make a person
happier than the general population. The null hypothesis, then, is that the money will
either have no effect or make the person less happy. [In symbols, if the research
hypothesis is μ₁ > μ₂, then the null hypothesis is μ₁ ≤ μ₂ (“≤” is the symbol for
less than or equal to).] Thus, in Figure 4–5, to reject the null hypothesis, the sample
has to have a score in one tail of the comparison distribution: the upper extreme or
tail (in this example, the top 5%) of the comparison distribution. (When it comes to
rejecting the null hypothesis with a directional hypothesis, a score at the other tail is
the same as a score in the middle; that is, such a score does not allow you to reject
the null hypothesis.) For this reason, the test of a directional hypothesis is called a
one-tailed test. A one-tailed test can be one-tailed in either direction. In the happi-
ness study example, the tail for the predicted effect was at the high end. In the baby
study example, the tail for the predicted effect was at the low end (that is, the predic-
tion tested was that babies given the specially purified vitamin would start walking
unusually early).
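
The point that a one-tailed test "can be one-tailed in either direction" can be made concrete with a short Python sketch. The function `reject_one_tailed` and the Z scores passed to it are hypothetical illustrations; only the 1.64 cutoff comes from the text.

```python
def reject_one_tailed(sample_z, cutoff_z, predicted_tail):
    """Decide a one-tailed test given the sample's Z score.

    cutoff_z is given as a positive number (for example 1.64 at the 5% level);
    predicted_tail is "upper" or "lower" and must be fixed before seeing data.
    """
    if predicted_tail == "upper":
        return sample_z >= cutoff_z
    if predicted_tail == "lower":
        return sample_z <= -cutoff_z
    raise ValueError("predicted_tail must be 'upper' or 'lower'")


# Hypothetical Z scores, chosen only for illustration.
print(reject_one_tailed(-2.0, 1.64, "lower"))  # extreme in the predicted tail
print(reject_one_tailed(-2.0, 1.64, "upper"))  # extreme, but in the other tail
print(reject_one_tailed(0.3, 1.64, "upper"))   # near the middle
```

The second and third calls give the same decision, which is the sense in which a score at the other tail is treated the same as a score in the middle.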

Nondirectional Hypotheses and Two-Tailed Tests
Sometimes, a research hypothesis states that an experimental procedure will have
an effect, without saying whether it will produce a very high score or a very low
score. Suppose an organizational psychologist is interested in how a new social
skills program will affect productivity. The program could either improve produc-
tivity by making the working environment more pleasant or hurt productivity by
encouraging people to socialize instead of work. The research hypothesis is that the
social skills program changes the level of productivity; the null hypothesis is that
the program does not change productivity one way or the other. In symbols, the
research hypothesis is μ₁ ≠ μ₂ (“≠” is the symbol for not equal); the null hypothesis
is μ₁ = μ₂.

When a research hypothesis predicts an effect but does not predict a direction
for the effect, it is called a nondirectional hypothesis. To test the significance of a
nondirectional hypothesis, you have to consider the possibility that the sample
could be extreme at either tail of the comparison distribution. Thus, this is called a
two-tailed test.

two-tailed test: hypothesis-testing procedure for a nondirectional hypothesis;
the situation in which the region of the comparison distribution in which the
null hypothesis would be rejected is divided between the two sides (tails) of the
distribution.

Determining Cutoff Scores with Two-Tailed Tests
There is a special complication in a two-tailed test. You have to divide the signifi-
cance percentage between the two tails. For example, with a 5% significance level,
you reject a null hypothesis only if the sample is so extreme that it is in either the top
2.5% or the bottom 2.5% of the comparison distribution. This keeps the overall level
of significance at a total of 5%.

Note that a two-tailed test makes the cutoff Z scores for the 5% level +1.96 and
−1.96. For a one-tailed test at the 5% level, the cutoff is not so extreme: only
+1.64 or −1.64. But with a one-tailed test, only one side of the distribution is considered.
These situations are shown in Figure 4–6a.

Using the 1% significance level, a two-tailed test (.5% at each tail) has cutoffs of
+2.58 and −2.58, while a one-tailed test’s cutoff is either +2.33 or −2.33. These
situations are shown in Figure 4–6b. The Z score cutoffs for one-tailed and two-tailed
tests for the .05 and .01 significance levels are also summarized in Table 4–2.
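
The four cutoffs discussed above follow from one rule: put the whole significance percentage in one tail, or half of it in each of two tails, and find the Z score that marks off that area. The Python sketch below applies that rule with the standard library's `statistics.NormalDist`; the helper name `cutoff_z` is an illustrative choice, not something from the text.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # Z distribution: mean 0, standard deviation 1


def cutoff_z(alpha, two_tailed):
    """Positive cutoff Z score for a given significance level.

    A two-tailed test splits alpha evenly between the tails, so each tail
    holds alpha / 2; a one-tailed test puts all of alpha in one tail.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return standard_normal.inv_cdf(1 - tail_area)


for alpha in (0.05, 0.01):
    one = cutoff_z(alpha, two_tailed=False)
    two = cutoff_z(alpha, two_tailed=True)
    print(f"alpha = {alpha}: one-tailed {one:.2f}, two-tailed {two:.2f}")
```

Rounded to two decimals, the values agree with the cutoffs quoted in the text (1.64, 1.96, 2.33, and 2.58); the negative cutoffs are the same numbers with the sign reversed.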

Figure 4–6 Significance level cutoffs for one-tailed and two-tailed tests: (a) .05
significance level; (b) .01 significance level. (The one-tailed tests in these examples assume the
prediction was for a high score. You could instead have a one-tailed test where the prediction
is for the lower, left tail.) [Panel (a): one-tailed cutoff at Z = 1.64 with .05 in the tail;
two-tailed cutoffs at Z = −1.96 and 1.96 with .025 in each tail. Panel (b): one-tailed cutoff
at Z = 2.33 with .01 in the tail; two-tailed cutoffs at Z = −2.58 and 2.58 with .005 in each tail.]

Table 4–2 One-Tailed and Two-Tailed Cutoff Z Scores for the .05 and .01 Significance Levels

                        Type of Test
Significance Level      One-Tailed        Two-Tailed
.05                     −1.64 or 1.64     −1.96 and 1.96
.01                     −2.33 or 2.33     −2.58 and 2.58

When to Use One-Tailed or Two-Tailed Tests
If the researcher decides in advance to use a one-tailed test, then the sample’s score
does not need to be so extreme to be significant compared to what would be needed
with a two-tailed test. Yet there is a price. If the result is extreme in the direction op-
posite to what was predicted—no matter how extreme—the result cannot be consid-
ered statistically significant.

In principle, you plan to use a one-tailed test when you have a clearly directional
hypothesis and a two-tailed test when you have a clearly nondirectional hypothesis.
In practice, the decision is not so simple. Even when a theory clearly predicts a par-
ticular result, the actual result may come out opposite to what you expected. Some-
times, the opposite may be more interesting than what you had predicted. (For
example, what if, as in all the fairy tales about wish-granting genies and fish, receiv-
ing $10 million and being able to fulfill almost any desire had made that individual
miserable?) By using one-tailed tests, we risk having to ignore possibly important
results.

For these reasons, researchers disagree about whether one-tailed tests should be
used, even when there is a clearly directional hypothesis. To be safe, many re-
searchers use two-tailed tests for both nondirectional and directional hypotheses. If
the two-tailed test is significant, then the researcher looks at the result to see the di-
rection and considers the study significant in that direction. In practice, always using
two-tailed tests is a conservative procedure because the cutoff scores are more ex-
treme for a two-tailed test and so it is less likely that a two-tailed test will give a sig-
nificant result. Thus, if you do get a significant result with a two-tailed test, you are
more confident about the conclusion. In fact, in most psychology research articles,
unless the researcher specifically states that a one-tailed test was used, it is assumed
that the test was two-tailed.

In practice, however, our experience is that most research results are either so
extreme that they will be significant whether you use a one-tailed or two-tailed test or
so far from extreme that they would not be significant in either kind of test. But what
happens when a result is less certain? The researcher’s decision about one- or two-
tailed tests now can make a big difference. In this situation the researcher tries to use
the type of test that will give the most accurate and noncontroversial conclusion. The
idea is to let nature—not a researcher’s decisions—determine the conclusion as much
as possible. Further, whenever a result is less than completely clear one way or the
other, most researchers are not comfortable drawing strong conclusions until more
research is done.
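
The trade-off described in this section can be seen by running the same Z scores through both kinds of test at the 5% level. In the Python sketch below, the cutoffs are those given in the text, while the function `compare_tests` and the four Z scores are hypothetical, and the one-tailed test is assumed to predict a high score.

```python
ONE_TAILED_CUTOFF = 1.64   # 5% level, prediction in the upper tail
TWO_TAILED_CUTOFF = 1.96   # 5% level, 2.5% in each tail


def compare_tests(sample_z):
    """Return (significant with one-tailed test, significant with two-tailed test)."""
    one_tailed = sample_z >= ONE_TAILED_CUTOFF
    two_tailed = abs(sample_z) >= TWO_TAILED_CUTOFF
    return one_tailed, two_tailed


# Hypothetical Z scores: clearly extreme, borderline, far from extreme,
# and extreme in the direction opposite to the prediction.
for z in (3.10, 1.80, 0.40, -2.40):
    print(z, compare_tests(z))
```

Only the borderline score (1.80) and the score in the unpredicted direction (−2.40) are decided differently by the two tests, which is where the choice of test matters.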

Example of Hypothesis Testing with a Two-Tailed Test
Here is one more fictional example, this time using a two-tailed test. Clinical psy-
chologists at a residential treatment center have developed a new type of therapy
to reduce depression that they believe is more effective than the current therapy.

Figure 4–7 Distribution of depression scores at 4 weeks after admission for diagnosed
depressed psychiatric patients receiving the standard therapy (fictional data). [Normal curve;
depression scores of 41.3, 55.4, 69.5, 83.6, and 97.7 correspond to Z scores of −2, −1, 0, +1, and +2.]

However, as with any treatment, it could make patients’ depression worse. Thus, the
clinical psychologists make a nondirectional hypothesis.

The psychologists randomly select an incoming patient to receive the new form
of therapy instead of the usual therapy. (In a real study, of course, more than one pa-
tient would be selected, but let’s assume that only one person has been trained to do
the new therapy and she has time to treat only one patient.) After 4 weeks, the patient
fills out a standard depression scale that is given automatically to all patients after
4 weeks. The standard scale has been given at this treatment center for a long time.
Thus, the psychologists know in advance the distribution of depression scores at
4 weeks for those who receive the usual therapy: it follows a normal curve with a
mean of 69.5 and a standard deviation of 14.1. [These figures correspond roughly to
the depression scores found in a national survey of 75,000 psychiatric patients given
a widely used standard test (Dahlstrom et al., 1986).] This distribution is shown in
Figure 4–7.

The clinical psychologists then carry out the five steps of hypothesis testing.

❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are two populations of interest:

Population 1: Patients diagnosed as depressed who receive the new therapy.
Population 2: Patients diagnosed as depressed in general (who receive the
usual therapy).

The research hypothesis is that when measured on depression 4 weeks after admission,
patients who receive the new therapy (Population 1) will on the average score
differently from patients who receive the current therapy (Population 2).
In symbols, the research hypothesis is μ₁ ≠ μ₂. The opposite of the research
hypothesis, the null hypothesis, is that patients who receive the new therapy will
have the same average depression level as the patients who receive the usual
therapy. (That is, the depression level measured after 4 weeks will have the same
mean for Populations 1 and 2.) In symbols, the null hypothesis is μ₁ = μ₂.

Tip for Success
Remember that the research hypothesis and null hypothesis must
always be complete opposites. Researchers specify the research
hypothesis and this determines the null hypothesis that goes with it.

❷ Determine the characteristics of the comparison distribution. If the null
hypothesis is true, the distributions of Populations 1 and 2 are the same. We know
the distribution of Population 2 (it is the one shown in Figure 4–7). Thus, we can
use Population 2 as our comparison distribution. As noted, it follows a normal
curve, with μ = 69.5 and σ = 14.1.

❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected. The clinical psychologists select the
5% significance level. They have made a nondirectional hypothesis and will
therefore use a two-tailed test. Thus, they will reject the null hypothesis only if
the patient’s depression score is in either the top or bottom 2.5% of the
comparison distribution. In terms of Z scores, these cutoffs are +1.96 and −1.96 (see
Figure 4–6 and Table 4–2).

❹ Determine your sample’s score on the comparison distribution. The patient
who received the new therapy was measured 4 weeks after admission. The
patient’s score on the depression scale was 41, which is a Z score on the comparison
distribution of −2.02. That is, Z = (X − M)/SD = (41 − 69.5)/14.1 = −2.02.
Figure 4–8 shows the distribution of Population 2 for this study, with the upper
and lower 2.5% areas shaded; the depression score of the sample patient is also
shown.

❺ Decide whether to reject the null hypothesis. A Z score of −2.02 is slightly
more extreme than a Z score of −1.96, which is where the lower 2.5% of the
comparison distribution begins. Notice in Figure 4–8 that the Z score of −2.02
falls within the shaded area in the left tail of the comparison distribution. This Z
score of −2.02 is a result so extreme that it is unlikely to have occurred if this
patient were from a population no different from Population 2. Therefore, the
clinical psychologists reject the null hypothesis. The result is statistically significant,
and it supports the research hypothesis that depressed patients receiving the new
therapy have different depression levels than depressed patients in general who
receive the usual therapy.

Figure 4–8 Distribution of depression scores with upper and lower 2.5% shaded and
showing the sample patient who received the new therapy (fictional data). [Normal curve;
cutoff Z scores at −1.96 and 1.96; the sample patient's depression score of 41 is at Z = −2.02.]
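
Steps ❸ through ❺ of the depression example can be checked with a few lines of Python. The sketch below takes the mean, standard deviation, sample score, and significance level from the text and uses the standard library's `statistics.NormalDist` instead of a printed table. Expressing the cutoffs as raw depression scores is an added illustration, not a calculation the text performs.

```python
from statistics import NormalDist

POP_MEAN = 69.5      # mean depression score under the usual therapy (from the text)
POP_SD = 14.1        # standard deviation (from the text)
SAMPLE_SCORE = 41    # score of the patient who received the new therapy
ALPHA = 0.05         # significance level, split between the two tails

# Step 3: two-tailed cutoffs, 2.5% in each tail.
cutoff_z = NormalDist().inv_cdf(1 - ALPHA / 2)

# Step 4: the patient's Z score on the comparison distribution.
sample_z = (SAMPLE_SCORE - POP_MEAN) / POP_SD

# Step 5: reject if the Z score is at or beyond either cutoff.
reject_null = abs(sample_z) >= cutoff_z

# The same cutoffs expressed as raw depression scores.
lower_raw = POP_MEAN - cutoff_z * POP_SD
upper_raw = POP_MEAN + cutoff_z * POP_SD

print(f"cutoff Z = {cutoff_z:.2f}, sample Z = {sample_z:.2f}, reject = {reject_null}")
print(f"raw cutoffs: {lower_raw:.1f} and {upper_raw:.1f}")
```

Seen in raw scores, the patient's 41 falls just below the lower cutoff of about 41.9, which is the same near miss that the Z scores of −2.02 and −1.96 describe.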

Tip for Success
When carrying out the five steps of hypothesis testing, always draw a
figure like Figure 4–8. Be sure to include the cutoff score(s) and
shade the appropriate tail(s). If the sample score falls inside a shaded
tail region, you can reject the null hypothesis and the result is
statistically significant. If the sample score does not fall inside a shaded
tail region, you cannot reject the null hypothesis.

How are you doing?

1. What is a nondirectional hypothesis test?
2. What is a two-tailed test?
3. Why do you use a two-tailed test when testing a nondirectional hypothesis?
4. What is the advantage of using a one-tailed test when your theory predicts a
particular direction of result?
5. Why might you use a two-tailed test even when your theory predicts a
particular direction of result?
6. A researcher predicts that making people hungry will affect how they do on a
coordination test. A randomly selected person is asked not to eat for 24 hours
before taking a standard coordination test and gets a score of 400. For people
in general of this age group and gender, tested under normal conditions,
coordination scores are normally distributed with a mean of 500 and a
standard deviation of 40. Using the .01 significance level, what should the
researcher conclude?

Answers

1. A nondirectional hypothesis test is a hypothesis test in which you do not
predict a particular direction of difference.

2. A two-tailed test is one in which the overall percentage for the cutoff is divided
between the two tails of the comparison distribution. A two-tailed test is used
to test the significance of a nondirectional hypothesis.

3. You use a two-tailed test when testing a nondirectional hypothesis because
an extreme result in either direction supports the research hypothesis.

4. The cutoff for a one-tailed test is not so extreme; thus, if your result comes
out in the predicted direction, it is more likely to be significant. The cutoff is
not so extreme because the entire percentage (say 5%) is put in one tail
instead of being divided between two tails.

5. It lets you count as significant an extreme result in either direction; if you used
a one-tailed test and the result came out opposite to the prediction, it could
not be called statistically significant.

6. The cutoffs are +2.58 and −2.58. The sample person’s Z score is
(400 − 500)/40 = −2.5. The result is not significant; the study is inconclusive.
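
The answer to question 6 depends on the significance level chosen in advance. As a rough Python illustration, the sketch below tests the same Z score with a two-tailed test at both the .05 and .01 levels. The numbers come from the question; the `results` dictionary is just an illustrative way to hold the two decisions.

```python
from statistics import NormalDist

sample_z = (400 - 500) / 40   # score, mean, and standard deviation from question 6

results = {}
for alpha in (0.05, 0.01):
    # Two-tailed test: half of alpha in each tail.
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    results[alpha] = abs(sample_z) >= cutoff
    print(f"alpha = {alpha}: cutoff {cutoff:.2f}, Z = {sample_z}, significant = {results[alpha]}")
```

A Z score of −2.5 is beyond the .05 cutoffs but inside the .01 cutoffs, so the researcher's conclusion at the .01 level remains that the study is inconclusive.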

Controversy: Should Significance Tests
Be Banned?
In recent years, there has been a major controversy about significance testing itself,
with a concerted movement on the part of a small but vocal group of psychologists
to ban significance tests completely! This is a radical suggestion with far-reaching
implications: for at least half a century, nearly every research study in psychology
has used significance tests. There probably has been more written in the major psy-
chology journals in the last dozen years or so about this controversy than ever before
in history about any issue having to do with statistics.

The discussion has gotten so heated that one article began as follows:

It is not true that a group of radical activists held 10 statisticians and six editors
hostage at the . . . convention of the American Psychological Society and chanted,
“Support the total test ban!” and “Nix the null!” (Abelson, 1997, p. 12)

Since this is by far the most important controversy in years regarding statistics
as used in psychology, we discuss the issues in at least three different places. In this
chapter we focus on some basic challenges to hypothesis testing. In Chapters 5 and 6,
we cover other topics that relate to aspects of hypothesis testing that you will learn
about in those chapters.

Before discussing this controversy, you should be reassured that you are not
learning about hypothesis testing for nothing. Whatever happens in the future, you
absolutely have to understand hypothesis testing to make sense of virtually every re-
search article published in the past. Further, in spite of the controversy that has raged
for more than a decade, it is extremely rare to see new articles that do not use signif-
icance testing. Thus, it is doubtful that any major shifts will occur in the near future.
Finally, even if hypothesis testing is completely abandoned, the alternatives (which
involve procedures you will learn about in Chapters 5 and 6) require understanding
virtually all of the logic and procedures we are covering here.

So what is the big controversy? Some of the debate concerns subtle points of
logic. For example, one issue relates to whether it makes sense to worry about reject-
ing the null hypothesis when a hypothesis of no effect whatsoever is extremely un-
likely to be true. Another issue is about the foundation of hypothesis testing in terms
of populations and samples, since in most experiments the samples we use are not
randomly selected from any definable population. We discussed some points relating
to this issue in Chapter 3. Finally, some have questioned the appropriateness of con-
cluding that if the data are inconsistent with the null hypothesis, this should be
counted as evidence for the research hypothesis. This controversy becomes rather
technical, but our own view is that, given recent considerations of the issues, the way
researchers in psychology use hypothesis testing is reasonable (Balluerka et al.,
2005; Iacobucci, 2005; Nickerson, 2000).

However, the biggest complaint against significance tests, and the one that has
received almost universal agreement, is that they are misused (Balluerka et al.,
2005). In fact, opponents of significance tests argue that even if there were no other
problems with the tests, they should be banned simply because they are so often and
so badly misused. They are misused in two main ways: one we can consider now; the
other must wait until we have covered a topic you learn in Chapter 6.

A major misuse of significance tests is the tendency for researchers to decide
that if a result is not significant, the null hypothesis is shown to be true (see Box 4–1).
We have emphasized that when you can’t reject the null hypothesis, the results are
simply inconclusive. The error of concluding the null hypothesis is true from failing
to reject it is extremely serious, because important theories and methods may be con-
sidered false just because a particular study did not get strong enough results. [You
learn in Chapter 6 that it is quite easy for a true research hypothesis not to come out
significant just because there were too few people in the study or the measures were
not very accurate. In fact, Hunter (1997) argues that in about 60% of psychology
studies, we are likely to get nonsignificant results even when the research hypothesis
is actually true.]

What should be done? The general consensus seems to be that we should keep
significance tests, but better train our students not to misuse them (hence the empha-
sis on these points in this book). We should not, as it were, throw the baby out with
the bathwater. To address this controversy, the American Psychological Association
(APA) established a committee of eminent psychologists renowned for their statisti-
cal expertise. The committee met over a two-year period, circulated a preliminary
report, and considered reactions to it from a large number of researchers. In the end,
they strongly condemned various misuses of significance testing of the kind we have
been discussing, but they left its use up to the decision of each researcher. In their
report they concluded:

Some had hoped that this task force would vote to recommend an outright ban on the
use of significance tests in psychology journals. Although this might eliminate some
abuses, the committee thought there were enough counterexamples (e.g., Abelson,
1997) to justify forbearance. (Wilkinson & Task Force on Statistical Inference, 1999,
pp. 602–603)

Balluerka and colleagues (2005) reviewed the arguments for and against signif-
icance testing. Their conclusion, with which we agree (as do probably most psychol-
ogy researchers), is that “. . . rigorous research activity requires the use of . . .
[significance testing] in the appropriate context, the complementary use of other
methods which provide information about aspects not addressed by . . . [significance
testing], and adherence to a series of recommendations which promote its rational
use in psychological research” (p. 55).

BOX 4–1 Jacob Cohen, the Ultimate New Yorker:
Funny, Pushy, Brilliant, and Kind

New Yorkers can be proud of Jacob Cohen, who single-handedly introduced to behavioral
and social scientists some of our most important statistical tools. Never worried
about being popular (although he was), he almost single-handedly forced the current
debate over significance testing, which he liked to joke was entrenched like a
“secular religion.” About the asterisk that accompanies a significant result, he said
the religion must be “of Judeo-Christian derivation, as it employs as its most
powerful icon a six-pointed cross” (1990, p. 1307).

Jacob entered graduate school at New York University (NYU) in clinical psychology in
1947 and three years later had a master's and a doctorate. He then worked in rather
lowly roles for the Veterans Administration, doing research on various practical
topics, until he returned to NYU in 1959. There he became a very famous faculty
member because of his creative, offbeat ideas about statistics. Amazingly, he made
his contributions having no mathematics training beyond high school algebra.

But a lack of formal training may have been Jacob Cohen’s advantage because he
emphasized looking at data and thinking about them, not just applying a standard
analysis. In particular, he demonstrated that the standard methods were not working
very well, especially for the “soft” fields of psychology such as clinical,
personality, and social psychology. Many of his ideas were hailed as great
breakthroughs. Starting in the 1990s he really began to force the issue of the
mindless use of significance testing. But he still used humor to tease behavioral and
social scientists for their failure to see the problems inherent in the arbitrary
yes-no decision feature of null hypothesis testing. For example, he liked to remind
everyone that significance testing came out of Sir Ronald Fisher’s work in
agriculture (see Box 9–1), in which the decisions were yes-no matters such as whether
a crop needed manure. He pointed out that behavioral and social scientists “do not
deal in manure, at least not knowingly” (Cohen, 1990, p. 1307)! He really disliked
the fact that Fisher-style decision making is used to determine the fate of not only
doctoral dissertations, research funds, publications, and promotions, “but whether to
have a baby just now” (1990, p. 1307). And getting more serious, he charged that
significance testing’s “arbitrary unreasonable tyranny has led to data fudging of
varying degrees of subtlety, from grossly altering data to dropping cases where there
‘must have been’ errors” (p. 1307).

Cohen was active in many social causes, especially desegregation in the schools and
fighting discrimination in police departments. He cared passionately about everything
he did. He was deeply loved. And he suffered from major depression, becoming
incapacitated by it four times in his life.

Got troubles? Got no more math than high school algebra? It doesn’t have to stop you
from contributing to science.

Hypothesis Tests in Research Articles
In general, hypothesis testing is reported in research articles using one of the specific
methods of hypothesis testing you learn in later chapters. For each result of interest,
the researcher usually first indicates whether the result was statistically significant.
(Note that, as with the first of the following examples, the researcher will not
necessarily use the word significant; so look out for other indicators, such as reporting
that scores on a variable decreased, increased, or were associated with scores on another
variable.) Next, the researcher usually gives the symbol associated with the specific
method used in figuring the probability that the result would have occurred if the
null hypothesis was true, such as t, F, r, or χ² (see Chapters 7 to 13). Finally, there
will be an indication of the significance level, such as p < .05 or p < .01. (The
researcher will usually also provide other information, such as the mean and standard
deviation of sample scores.) For example, in a study of competitive Scrabble players,
Halpern and Wai (2007) reported: “Contrary to expectations, the number of correctly
defined words correlated significantly with participants’ official Scrabble
rating, r(21) = .45, p < .05, showing a moderate relationship (Cohen & Cohen,
1983), with higher-rated players defining more words correctly.” There is a lot here
that you will learn about in later chapters, but the key thing to understand now about
this result is the “p < .05.” This means that the probability of the results if the null
hypothesis were true is less than .05 (5%).

When a result is close but does not reach the significance level chosen, it may be
reported anyway as a “near significant trend” or as having “approached significance.”
When a result is not even close to being extreme enough to reject the null hypothesis,
it may be reported as “not significant,” or the abbreviation ns will be used.
Finally, whether or not a result is significant, it is increasingly common for
researchers to report the exact p level, such as p = .03 or p = .27 (these are given in
computer outputs of hypothesis testing results). The p reported here is based on the
proportion of the comparison distribution that is more extreme than the sample score,
information that you could figure from the Z score for your sample and a normal
curve table.
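
As a rough illustration of how an exact p level relates to a Z score, the following Python sketch figures the proportion of a normal comparison distribution that is more extreme than a given Z. It is only a sketch: the function name p_from_z and its tails argument are illustrative assumptions, and it uses NormalDist from Python's standard statistics module in place of a normal curve table.

```python
from statistics import NormalDist


def p_from_z(z, tails=2):
    """Proportion of a normal comparison distribution more extreme than z.

    tails=2 counts both tails; tails=1 counts only one tail and assumes the
    result fell in the predicted direction (an assumption of this sketch).
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Area beyond |z| in a single tail of the standard normal curve.
    one_tail_area = 1.0 - NormalDist().cdf(abs(z))
    return one_tail_area * tails


# Hypothetical sample with a Z score of 2 on the comparison distribution.
print(f"two-tailed p = {p_from_z(2.0, tails=2):.3f}")
print(f"one-tailed p = {p_from_z(2.0, tails=1):.3f}")
```

For a Z score of 2, this gives roughly p = .046 for a two-tailed test and p = .023 for a one-tailed test, which is the kind of exact p level a computer output would report.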

A researcher will usually note if he or she used a one-tailed test. When reading
research articles, assume the researcher used a two-tailed test if nothing is said
otherwise. Even though a researcher has chosen a significance level in advance, such as
.05, the researcher may note that results meet more rigorous standards. Thus, in the
same article, you may see some results noted as “p < .05,” others as “p < .01,” and
still others as “p < .001.”

Finally, often researchers show hypothesis testing results only as asterisks
(stars) in a table of results. In such tables, a result with an asterisk means it is
significant, while a result without an asterisk is not. For example, Table 4–3 shows the
results of part of a study by Bohnert and colleagues (2007) comparing various aspects
of social adjustment to college of male and female college students during the
summer before their first year of college (Time 1) and 10 months later (Time 2). The
table gives figures for means, standard deviations, and t statistics; the “t(83)” is
about details of the specific hypothesis testing procedure used in this study, called a
t test, which you will learn in Chapters 7 and 8. The important things to look at now
are the asterisks (and the notes at the bottom of the table that go with them). The
asterisks tell you the significance levels for the various comparisons. For example,
females had a higher level of friendship quality at Time 1 (M = 2.82) than males
(M = 2.49); thus there are three asterisks at the end of the row for this result, which
the note at the bottom tells you means that the probability of getting this big a
difference if the null hypothesis was true is less than one in a thousand (.001). At
Time 1, males reported being more lonely (M = 39.30) than females (M = 34.78), but you
can see that there was no significant gender difference in loneliness at Time 2 (the
means were 37.88 and 34.71, and the lack of an asterisk in this row indicates that
these were not different enough to be significant in this study). At Time 2, females
again reported a significantly higher level of friendship quality (M = 3.21) than males
(M = 2.84); the asterisks show that the difference was significant at the .001 (one in
a thousand) level.
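
The asterisk convention described here can be written out as a small lookup. The Python sketch below is only an illustration of the note at the bottom of a table such as Table 4–3 (* for p < .05, ** for p < .01, *** for p < .001); the function name significance_stars and the example p values are hypothetical and do not come from the study.

```python
def significance_stars(p):
    """Return the asterisks a results table would show for an exact p level."""
    if p < .001:
        return "***"
    if p < .01:
        return "**"
    if p < .05:
        return "*"
    return ""  # no asterisk: not significant at the .05 level


# Hypothetical exact p levels, as might appear in a computer output.
for p in (.0004, .004, .03, .27):
    label = significance_stars(p) or "ns"
    print(f"p = {p}: {label}")
```

A result whose exact p level is .03 would carry one asterisk, while a result with p = .27 would carry none and might be reported as ns.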

In reporting results of significance testing, researchers rarely talk explicitly
about the research hypothesis or the null hypothesis, nor do they describe any of the
other steps of the process in detail. It is assumed that readers of psychology research
understand all of this very well.

Table 4–3 Means and Standard Deviation for Main Study Variables by Gender

                              Total (n = 85)   Males (n = 31)   Females (n = 54)
                              M       SD       M       SD       M       SD       t(83)

Adolescence (Time 1)
  Friendship quality          2.70    0.40     2.49    0.46     2.82    0.32     13.98***
  Loneliness                  36.39   8.71     39.30   9.98     34.78   7.56     5.47*

Emerging adulthood (Time 2)
  Friendship quality          3.10    0.48     2.84    0.57     3.21    0.38     11.31***
  Loneliness                  35.84   9.98     37.88   11.38    34.71   9.21     1.76
  Activities: Intensity       8.09    8.27     10.00   10.19    7.18    7.18     0.98
  Activities: Breadth         1.71    1.06     1.84    1.18     1.65    1.01     0.51

*p < .05. **p < .01. ***p < .001.
Source: Bohnert, A. M., Aikins, J. W., & Edidin, J. (2007). The role of organized activities in facilitating social adaptation across
the transition to college. Journal of Adolescent Research, 22, 189–208. Sage Publications, Ltd. Reprinted by permission of Sage
Publications, Thousand Oaks, London, and New Delhi.

Summary

1. Hypothesis testing considers the probability that the result of a study could have
come about even if the experimental procedure had no effect. If this probability
is low, the scenario of no effect is rejected and the hypothesis behind the
experimental procedure is supported.

2. The expectation of an effect is the research hypothesis, and the hypothetical
situation of no effect is the null hypothesis.

3. When a result (that is, a sample score) is so extreme that the result would be
very unlikely if the null hypothesis were true, the researcher rejects the null
hypothesis and describes the research hypothesis as supported. If the result is not
that extreme, the researcher does not reject the null hypothesis, and the study is
inconclusive.

4. Psychologists usually consider a result too extreme if it is less likely than 5%
(that is, a significance level of p < .05) to have come about if the null hypothesis
were true. Psychologists sometimes use a more stringent 1% (p < .01 significance
level), or even .1% (p < .001 significance level), cutoff.

5. The cutoff percentage is the probability of the result being extreme in a predicted
direction in a directional or one-tailed test. The cutoff percentages are the probability
of the result being extreme in either direction in a nondirectional or two-tailed test.

6. The five steps of hypothesis testing are:
❶ Restate the question as a research hypothesis and a null hypothesis about the
populations.
❷ Determine the characteristics of the comparison distribution.
❸ Determine the cutoff sample score on the comparison distribution at which the null
hypothesis should be rejected.
❹ Determine your sample’s score on the comparison distribution.
❺ Decide whether to reject the null hypothesis.

7. There has been much controversy about significance tests, including critiques
of the basic logic and, especially, that they are often misused. One major way
researchers misuse significance tests is by interpreting not rejecting the null
hypothesis as demonstrating that the null hypothesis is true.

8. Research articles typically report the results of hypothesis testing by saying a
result was or was not significant and giving the probability level cutoff (usually
5% or 1%) that the decision was based on.

Key Terms

hypothesis testing (p. 107)
hypothesis (p. 107)
theory (p. 107)
research hypothesis (p. 110)
null hypothesis (p. 110)
comparison distribution (p. 111)
cutoff sample score (p. 111)
conventional levels of significance (p < .05, p < .01) (p. 113)
statistically significant (p. 113)
directional hypothesis (p. 119)
one-tailed test (p. 119)
nondirectional hypothesis (p. 119)
two-tailed test (p. 119)

Example Worked-Out Problems

A randomly selected individual, after going through an experimental treatment, has a
score of 27 on a particular measure. The scores of people in general on this measure are
normally distributed with a mean of 19 and a standard deviation of 4. The researcher
predicts an effect, but does not predict a particular direction of effect. Using the 5%
significance level, what should you conclude? Solve this problem explicitly using all five
steps of hypothesis testing and illustrate your answer with a sketch showing the
comparison distribution, the cutoff (or cutoffs), and the score of the sample on this
distribution.

Answer
❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are two populations of interest:

Population 1: People who go through the experimental procedure.
Population 2: People in general (that is, people who do not go through the
experimental procedure).

The research hypothesis is that Population 1 will score differently than Population 2
on the particular measure. The null hypothesis is that the two populations
are not different on the measure.

❷ Determine the characteristics of the comparison distribution: μ = 19, σ = 4,
normally distributed.

❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected. For a two-tailed test at the 5% level (2.5%
at each tail), the cutoff scores are +1.96 and −1.96 (see Figure 4–6 or Table 4–2).

❹ Determine your sample’s score on the comparison distribution.
Z = (27 − 19)/4 = 2.

❺ Decide whether to reject the null hypothesis. A Z score of 2 is more extreme
than the cutoff Z of ±1.96. Reject the null hypothesis; the result is significant.
The experimental procedure affects scores on this measure. The diagram is shown
in Figure 4–9.

Figure 4–9 Diagram for Example Worked-Out Problem showing comparison distribution,
cutoffs (2.5% shaded area in each tail), and sample score. (Raw scores of 11, 15, 19,
23, and 27 correspond to Z scores of −2, −1, 0, +1, and +2; the cutoff Z scores are
−1.96 and +1.96; the sample participant has a raw score of 27 and a Z score of 2.)
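
The figuring in Steps ❸ through ❺ of this worked-out problem can be checked with a short Python sketch. This is an illustration under stated assumptions, not part of the five-step method itself: the function name two_tailed_z_test is hypothetical, it handles only the two-tailed case with a single score and a known normal population, and it uses NormalDist from Python's standard statistics module instead of a normal curve table.

```python
from statistics import NormalDist


def two_tailed_z_test(score, pop_mean, pop_sd, alpha=0.05):
    """Return (cutoff Z, sample Z, reject?) for one score and a known population."""
    # Step 3: cutoff Z score, with alpha split evenly between the two tails.
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: the sample's Z score on the comparison distribution.
    z = (score - pop_mean) / pop_sd
    # Step 5: reject the null hypothesis if Z is more extreme than either cutoff.
    reject = abs(z) > cutoff
    return cutoff, z, reject


# Values from the worked-out problem: score of 27, population mean 19, SD 4.
cutoff, z, reject = two_tailed_z_test(27, 19, 4, alpha=0.05)
print(f"cutoff Z = +/-{cutoff:.2f}, sample Z = {z:.2f}, reject null: {reject}")
```

With the values from this problem, the sketch gives cutoffs of about ±1.96 and a sample Z of 2, so the null hypothesis is rejected, in agreement with the hand figuring. Steps ❶ and ❷ (stating the hypotheses and describing the comparison distribution) still have to be reasoned out by the researcher.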

Outline for Writing Essays for Hypothesis-Testing
Problems Involving a Single Sample of
One Participant and a Known Population

1. Describe the core logic of hypothesis testing. Be sure to explain terminology
such as research hypothesis and null hypothesis, and explain the concept of
providing support for the research hypothesis when the study results are strong
enough to reject the null hypothesis.

2. Explain the concept of the comparison distribution. Be sure to mention that it is the
distribution that represents the population situation if the null hypothesis is true.
Note that the key characteristics of the comparison distribution are its mean,
standard deviation, and shape.

3. Describe the logic and process for determining (using the normal curve) the
cutoff sample scores on the comparison distribution at which you should reject the
null hypothesis.

4. Describe how to figure the sample’s score on the comparison distribution.

5. Explain how and why the scores from Steps ❸ and ❹ of the hypothesis-testing
process are compared. Explain the meaning of the result of this comparison with
regard to the specific research and null hypotheses being tested.

Practice Problems

These problems involve figuring. Most real-life statistics problems are done on a
computer with special statistical software. Even if you have such software, do these
problems by hand to ingrain the method in your mind.

All data are fictional unless an actual citation is given.

Set I (for Answers to Set I Problems, see pp. 675–677)
1. Define the following terms in your own words: (a) hypothesis-testing procedure,
(b) .05 significance level, and (c) two-tailed test.

2. When a result is not extreme enough to reject the null hypothesis, explain why it
is wrong to conclude that your result supports the null hypothesis.

3. For each of the following, (a) say which two populations are being compared,
(b) state the research hypothesis, (c) state the null hypothesis, and (d) say whether
you should use a one-tailed or two-tailed test and why.

  i. Do Canadian children whose parents are librarians score higher than Canadian
  children in general on reading ability?

  ii. Is the level of income for residents of a particular city different from the
  level of income for people in the region?

  iii. Do people who have experienced an earthquake have more or less
  self-confidence than the general population?

4. Based on the information given for each of the following studies, decide
whether to reject the null hypothesis. For each, give (a) the Z-score cutoff (or
cutoffs) on the comparison distribution at which the null hypothesis should be
rejected, (b) the Z score on the comparison distribution for the sample score, and
(c) your conclusion. Assume that all populations are normally distributed.

        Population
Study   μ      σ      Sample Score   p      Tails of Test
A       10     2      14             .05    1 (high predicted)
B       10     2      14             .05    2
C       10     2      14             .01    1 (high predicted)
D       10     2      14             .01    2
E       10     4      14             .05    1 (high predicted)

5. Based on the information given for each of the following studies, decide
whether to reject the null hypothesis. For each, give (a) the Z-score cutoff (or
cutoffs) on the comparison distribution at which the null hypothesis should be
rejected, (b) the Z score on the comparison distribution for the sample score, and
(c) your conclusion. Assume that all populations are normally distributed.

        Population
Study   μ      σ      Sample Score   p      Tails of Test
A       70     4      74             .05    1 (high predicted)
B       70     1      74             .01    2
C       70     2      76             .01    2
D       72     2      77             .01    2
E       72     2      68             .05    1 (low predicted)

6. A psychologist studying the senses of taste and smell has carried out many
studies in which students are given each of 20 different foods (apricot, choco-
late, cherry, coffee, garlic, and so on). She administers each food by dropping
a liquid on the tongue. Based on her past research, she knows that for students
overall at the university, the mean number of the 20 foods that students can
identify correctly is 14, with a standard deviation of 4, and the distribution of
scores follows a normal curve. The psychologist wants to know whether peo-
ple’s accuracy on this task has more to do with smell than with taste. In other
words, she wants to test whether people do worse on the task when they are
only able to taste the liquid compared to when they can both taste and smell it
(note that this is a directional hypothesis). Thus, she sets up special procedures
that keep a person from being able to use the sense of smell during the task.
The psychologist then tries the procedure on one randomly selected student.
This student is able to identify only 5 correctly. (a) Using the .05 significance
level, what should the psychologist conclude? Solve this problem explicitly
using all five steps of hypothesis testing and illustrate your answer with a
sketch showing the comparison distribution, the cutoff (or cutoffs), and the
score of the sample on this distribution. (b) Then explain your answer to some-
one who has never had a course in statistics (but who is familiar with mean,
standard deviation, and Z scores).

7. A psychologist is working with people who have had a particular type of
major surgery. This psychologist proposes that people will recover from the
operation more quickly if friends and family are in the room with them for
the first 48 hours after the operation. It is known that time to recover from
this kind of surgery is normally distributed with a mean of 12 days and a
standard deviation of 5 days. The procedure of having friends and family in
the room for the period after the surgery is tried with a randomly selected pa-
tient. This patient recovers in 18 days. (a) Using the .01 significance level,
what should the researcher conclude? Solve this problem explicitly using all
five steps of hypothesis testing, and illustrate your answer with a sketch
showing the comparison distribution, the cutoff (or cutoffs), and the score of
the sample on this distribution. (b) Then explain your answer to someone
who has never had a course in statistics (but who is familiar with mean, stan-
dard deviation, and Z scores).

8. What is the effect of going through a natural disaster on the attitude of police
chiefs about the goodness of the people in their city? A researcher studying this
expects a more positive attitude (because of the many acts of heroism and helping
of neighbors), but a more negative attitude is also possible (because of looting
and scams). It is known that, using a 1-to-10 scale (from 1 = extremely
negative attitude to 10 = extremely positive attitude), in general police chiefs’
attitudes about the goodness of the people in their cities are normally distributed,
with a mean of 6.5 and a standard deviation of 2.1. A major earthquake has just
occurred in an isolated city, and shortly afterward the researcher is able to give
the attitude questionnaire to the police chief of that city. The chief’s score is 8.2.
(a) Using the .05 significance level, what should the researcher conclude? Solve
this problem explicitly using all five steps of hypothesis testing and illustrate
your answer with a sketch showing the comparison distribution, the cutoff (or
cutoffs), and the score of the sample on this distribution. (b) Then explain your
answer to someone who has never had a course in statistics (but who is familiar
with mean, standard deviation, and Z scores).

9. Robins and John (1997) carried out a study on narcissism (self-love), comparing
people who scored high versus low on a narcissism questionnaire. (An example
item was, “If I ruled the world it would be a better place.”) They also had other
questionnaires, including one that had an item about how many times the partic-
ipant looked in the mirror on a typical day. In their results section, the re-
searchers noted “. . . as predicted, high-narcissism individuals reported looking
at themselves in the mirror more frequently than did low narcissism individuals
(Ms = 5.7 vs. 4.8), . . . p < .05” (p. 39). Explain this result to a person who has
never had a course in statistics. (Focus on the meaning of this result in terms of
the general logic of hypothesis testing and statistical significance.)

10. Reber and Kotovsky (1997), in a study of problem solving, described one of
their results comparing a specific group of participants within their overall con-
trol condition as follows: “This group took an average of 179 moves to solve the
puzzle, whereas the rest of the control participants took an average of 74 moves,
t(19) = 3.31, p < .01” (p. 183). Explain this result to a person who has never
had a course in statistics. (Focus on the meaning of this result in terms of the
general logic of hypothesis testing and statistical significance.)

Set II
11. List the five steps of hypothesis testing, and explain the procedure and logic of
each.

12. When a result is significant, explain why it is wrong to say the result “proves”
the research hypothesis.

13. For each of the following, (a) state which two populations are being compared,
(b) state the research hypothesis, (c) state the null hypothesis, and (d) say
whether you should use a one-tailed or two-tailed test and why.

i. In an experiment, people are told to solve a problem by focusing on the details.
Is the speed of solving the problem different for people who get such instruc-
tions compared to the speed for people who are given no special instructions?

ii. Based on anthropological reports in which the status of women is scored on a
10-point scale, the mean and standard deviation across many cultures are
known. A new culture is found in which there is an unusual family arrangement.
The status of women is also rated in this culture. Do cultures with the unusual
family arrangement provide higher status to women than cultures in general?

iii. Do people who live in big cities develop more stress-related conditions than
people in general?

14. Based on the information given for each of the following studies, decide
whether to reject the null hypothesis. For each, give (a) the Z-score cutoff (or
cutoffs) on the comparison distribution at which the null hypothesis should be
rejected, (b) the Z score on the comparison distribution for the sample score, and
(c) your conclusion. Assume that all populations are normally distributed.

        Population
Study   μ      σ      Sample Score   p      Tails of Test
A       5      1      7              .05    1 (high predicted)
B       5      1      7              .05    2
C       5      1      7              .01    1 (high predicted)
D       5      1      7              .01    2

15. Based on the information given for each of the following studies, decide
whether to reject the null hypothesis. For each, give (a) the Z-score cutoff (or
cutoffs) on the comparison distribution at which the null hypothesis should be
rejected, (b) the Z score on the comparison distribution for the sample score, and
(c) your conclusion. Assume that all populations are normally distributed.

        Population
Study   μ        σ       Sample Score   p      Tails of Test
A       100.0    10.0    80             .05    1 (low predicted)
B       100.0    20.0    80             .01    2
C       74.3     11.8    80             .01    2
D       16.9     1.2     80             .05    1 (low predicted)
E       88.1     12.7    80             .05    2

16. A researcher wants to test whether a certain sound will make rats do worse on
learning tasks. It is known that an ordinary rat can learn to run a particular maze
correctly in 18 trials, with a standard deviation of 6. (The number of trials to
learn this maze is normally distributed.) The researcher now tries an ordinary rat
in the maze, but with the sound. The rat takes 38 trials to learn the maze.
(a) Using the .05 level, what should the researcher conclude? Solve this problem
explicitly using all five steps of hypothesis testing, and illustrate your answer
with a sketch showing the comparison distribution, the cutoff (or cutoffs), and
the score of the sample on this distribution. (b) Then explain your answer to
someone who has never had a course in statistics (but who is familiar with
mean, standard deviation, and Z scores).

17. A family psychologist developed an elaborate training program to reduce the
stress of childless men who marry women with adolescent children. It is
known from previous research that such men, one month after moving in with
their new wife and her children, have a stress level of 85 with a standard devi-
ation of 15, and the stress levels are normally distributed. The training program
is tried on one man randomly selected from all those in a particular city who
during the preceding month have married a woman with an adolescent child.
After the training program, this man’s stress level is 60. (a) Using the .05 level,
what should the researcher conclude? Solve this problem explicitly using all
five steps of hypothesis testing and illustrate your answer with a sketch show-
ing the comparison distribution, the cutoff (or cutoffs), and the score of the
sample on this distribution. (b) Then explain your answer to someone who has
never had a course in statistics (but who is familiar with mean, standard devia-
tion, and Z scores).

18. A researcher predicts that listening to music while solving math problems will
make a particular brain area more active. To test this, a research participant has
her brain scanned while listening to music and solving math problems, and the
brain area of interest has a percentage signal change of 58. From many previous
studies with this same math problems procedure (but not listening to music), it is
known that the signal change in this brain area is normally distributed with a
mean of 35 and a standard deviation of 10. (a) Using the .01 level, what should
the researcher conclude? Solve this problem explicitly using all five steps of hy-
pothesis testing, and illustrate your answer with a sketch showing the comparison
distribution, the cutoff (or cutoffs), and the score of the sample on this distribu-
tion. (b) Then explain your answer to someone who has never had a course in sta-
tistics (but who is familiar with mean, standard deviation, and Z scores).

19. Pecukonis (1990), as part of a larger study, measured ego development (a mea-
sure of overall maturity) and ability to empathize with others among a group of
24 aggressive adolescent girls in a residential treatment center. The girls were di-
vided into high- and low-ego development groups, and the empathy (“cognitive
empathy”) scores of these two groups were compared. In his results section,
Pecukonis reported, “The average score on cognitive empathy for subjects scor-
ing high on ego development was 22.1 as compared with 16.3 for low scorers, . . .
p < .005” (p. 68). Explain this result to a person who has never had a course in
statistics. (Focus on the meaning of this result in terms of the general logic of
hypothesis testing and statistical significance.)

20. In an article about antitobacco campaigns, Siegel and Biener (1997) discuss the
results of a survey of tobacco usage and attitudes, conducted in Massachusetts
in 1993 and 1995; Table 4–4 shows the results of this survey. Focusing on just
the first line (the percentage smoking >25 cigarettes daily), explain what this
result means to a person who has never had a course in statistics. (Focus on the
meaning of this result in terms of the general logic of hypothesis testing and
statistical significance.)

Table 4–4 Selected Indicators of Change in Tobacco Use, ETS Exposure, and Public Attitudes
toward Tobacco Control Policies, Massachusetts, 1993–1995

                                                                 1993    1995

Adult Smoking Behavior
  Percentage smoking >25 cigarettes daily                        24      10*
  Percentage smoking <15 cigarettes daily                        31      49*
  Percentage smoking within 30 minutes of waking                 54      41

Environmental Tobacco Smoke Exposure
  Percentage of workers reporting a smoke free worksite          53      65*
  Mean hours of ETS exposure at work during prior week           4.2     2.3*
  Percentage of homes in which smoking is banned                 41      51*

Attitudes Toward Tobacco Control Policies
  Percentage supporting further increase in tax on
    tobacco with funds earmarked for tobacco control             78      81
  Percentage believing ETS is harmful                            90      84
  Percentage supporting ban on vending machines                  54      64*
  Percentage supporting ban on support of sports and cultural
    events by tobacco companies                                  59      53*

*p < .05
Source: Siegel, M., & Biener, L. (1997). Evaluating the impact of statewide anti-tobacco campaigns: The Massachusetts and
California tobacco control programs. Journal of Social Issues, 53, 147–168. Copyright © 1997 by Blackwell Publishing.
Reprinted by permission of Blackwell Publishers Journals.

Chapter Notes

1. We are oversimplifying a bit to make the initial learning easier. The research
hypothesis is that one population will walk earlier than the other, μ₁ < μ₂. Thus,
to be precise, its opposite is that the other group will either walk at the same
time or later. That is, the opposite of the research hypothesis in this example
includes both no difference and a difference in the direction opposite to what we
predicted. In terms of symbols, if our research hypothesis is μ₁ < μ₂, then its
opposite is μ₁ ≥ μ₂ (the symbol “≥” means “greater than or equal to”). We
discuss this issue in some detail later in the chapter.

2. In practice, since hypothesis testing is usually done on a computer, you have to
decide in advance only on the cutoff probability. The computer prints out the
exact probability of getting your result if the null hypothesis were true. You then
just compare the printed-out probability to see if it is less than the cutoff
probability level you set in advance. However, to understand what these probability
levels mean, you need to learn the entire process, including how to figure the
Z score for a particular cutoff probability.
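
To make Chapter Note 2 concrete, here is a hedged Python sketch of the two things it mentions: figuring the cutoff Z score for a particular cutoff probability, and comparing a printed-out exact probability with the cutoff probability chosen in advance. The names cutoff_z and is_significant and the example p value of .03 are illustrative assumptions, and NormalDist from Python's standard statistics module stands in for a normal curve table.

```python
from statistics import NormalDist


def cutoff_z(alpha, tails=2):
    """Positive cutoff Z score on a normal curve for a cutoff probability."""
    # A two-tailed test splits alpha between the tails; a one-tailed test does not.
    return NormalDist().inv_cdf(1 - alpha / tails)


def is_significant(p, alpha):
    """True if the exact probability is less than the cutoff chosen in advance."""
    return p < alpha


for alpha in (.05, .01):
    for tails in (1, 2):
        print(f"alpha = {alpha}, tails = {tails}: cutoff Z = {cutoff_z(alpha, tails):.2f}")

# Hypothetical exact probability from a computer output.
print(is_significant(.03, alpha=.05))
print(is_significant(.03, alpha=.01))
```

The loop gives cutoffs of about 1.64 and 1.96 at the .05 level (one- and two-tailed) and about 2.33 and 2.58 at the .01 level; for a one-tailed test in the low direction, the cutoff is the negative of the value shown.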
