Depression Levels of Bipolar Patients Before and After LSR 53


You will test the effects of LSR 53, a new medication that affects depression, on patients diagnosed with bipolar disorder by a licensed psychiatrist.


You will draw one sample of 20 participants and give each participant (patient) a pretest for depression (see attached). Following the pretest, each patient will start a six-week trial of LSR 53 in accordance with the psychiatrist’s instructions. Following the six-week trial, each patient will take a posttest for depression. Results of the pretest and posttest will be compared to determine whether LSR 53 was effective in changing depression levels. You will compare the means of the pretest and posttest using a t test for dependent means.


Note: Use the fictitious “Bi-Polar Depression Survey” to rate each patient’s depression level before and after the trial.


Pay particular attention to the information on the t test for dependent means in Chapter 7 of your text; it relates directly to your hypothesis test.


Attached, you will find a data set that accompanies your hypothesis. Your job will be to apply the formula(s) in Chapter 7 of your text to the data, interpret your results, and write your paper. The numbers in the Pretest column are the results of the Bi-Polar Depression Survey before LSR 53 was taken. The numbers in the Post Test column are the results of the survey after LSR 53 was taken. Higher numbers indicate greater depression levels; the maximum score is 100. You will use a two-tailed test at the .05 level of significance to test your data.


Your hypothesis is below (use this hypothesis statement in your paper):


Depression levels in patients diagnosed with bipolar disorder change after a six-week trial of LSR 53.
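The computation this assignment calls for can be checked with a short script once the formulas have been worked by hand. The sketch below is an illustration, not part of the assignment: it assumes the usual dependent-means procedure (one difference score per patient, a null-hypothesis mean difference of 0, and df = N – 1), and the 20 pairs of scores are hypothetical placeholders, not the attached data set. The cutoff of 2.093 (two-tailed, .05 level, df = 19) is taken from a standard t table and should be confirmed against the table in your text.

```python
import math

# Two-tailed .05 cutoff for df = 19 (N = 20), from a standard t table.
# Assumption of this sketch: confirm it against the t table in your text.
CUTOFF_TWO_TAILED_05_DF19 = 2.093


def dependent_means_t(pretest, posttest):
    """Return (mean difference, S_M, t, df) for a t test for dependent means."""
    if len(pretest) != len(posttest) or len(pretest) < 2:
        raise ValueError("need matching pretest/posttest scores for at least 2 people")
    # One difference score per patient: posttest minus pretest.
    # Negative differences mean depression scores went down.
    diffs = [post - pre for pre, post in zip(pretest, posttest)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    ss = sum((d - mean_diff) ** 2 for d in diffs)  # sum of squared deviations
    s2 = ss / (n - 1)                              # estimated population variance
    s_m = math.sqrt(s2 / n)                        # SD of the distribution of means
    t = (mean_diff - 0) / s_m                      # null hypothesis: mean difference is 0
    return mean_diff, s_m, t, n - 1


# Hypothetical scores for illustration only; NOT the attached data set.
pretest = [72, 65, 80, 58, 77, 69, 84, 61, 70, 75, 66, 79, 73, 62, 88, 71, 67, 76, 64, 82]
posttest = [65, 63, 71, 60, 70, 62, 78, 59, 66, 68, 64, 72, 70, 58, 80, 69, 60, 71, 63, 74]

mean_diff, s_m, t, df = dependent_means_t(pretest, posttest)
reject = abs(t) > CUTOFF_TWO_TAILED_05_DF19  # two-tailed: either direction counts
print(f"M_diff = {mean_diff:.2f}, S_M = {s_m:.2f}, t({df}) = {t:.2f}, reject H0: {reject}")
```

To use it, replace the two lists with the Pretest and Post Test columns from the data set. Because the hypothesis is two-tailed, the absolute value of t is compared with the cutoff.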


The paper will be divided as follows:
Introduction:
Participants: Rachel
Apparatus: Rachel
Materials:
Procedure:
Results:
Discussion:








Emissions of Automobiles in America: A Controlled Investigation

Learning Team Z

Team Member Name Here

Team Member Name Here
Team Member Name Here


Psych 315

October 8, 2059

Mr. Avery

Emissions of Automobiles in America: A Controlled Investigation

A growing body of research by leading industry experts shows alarming levels of hydrocarbons in the atmosphere throughout the 48 contiguous United States and Canada. Harmful emissions from automobile tailpipes and engine compartments contribute to the rising amount of hydrocarbons released into the atmosphere each year. Preston (2005) reports an average atmospheric increase in hydrocarbons of 3.8% per year since 1999. He estimates that a 50% reduction in automobile emissions would lower the overall level of harmful hydrocarbons in the air by 23%. Engineers from the Oppenheimer Group, a leading manufacturer of emissions control products, developed an emissions control device that can be retrofitted to any automobile exhaust system through the tailpipe and modified to fit in the engine compartment of most cars and trucks. The Oppenheimer Group installed its emissions control device in the exhaust systems of 36 randomly selected cars from across the country. Industry standards indicate that automobiles not equipped with the Oppenheimer Group’s emissions control device emit an average of 100 pounds of pollutants per year into the atmosphere, with a standard deviation of 15 pounds. The Oppenheimer Group predicts that cars equipped with its emissions control device will differ from cars not equipped with the device in the number of pounds of pollutants released into the air: $H_1\colon \mu \neq 100$ and $H_0\colon \mu = 100$.



Forty drivers and their cars were recruited from various cities across the country to participate in the emissions control test through a radio, newspaper, and billboard advertising campaign that lasted 60 days before the investigation began. Four participants withdrew before the study began because of prior commitments that would have interfered with their ability to complete all 30 days of the study. The remaining 36 drivers ranged in age from 18 to 59 years, with a mean age of 34.12 years. Nineteen participants were female (52.78%); seventeen were male (47.22%). Twenty-nine participants were married (80.56%); the remaining seven were single or divorced (19.44%). The average education level for all participants was 11.63 years, and the average reported income was $39,700 per year. For all 36 participants, the average number of miles driven per month was 1,140.18. Eight participants were unemployed at the beginning of the study (22.22%); all others had one or more jobs. A 25-dollar inducement was offered to participants who completed the investigation.


Oppenheimer’s technicians, who flew to each participant’s city, retrofitted the Oppenheimer Group’s emissions control device into the tailpipe of all 36 automobiles. The device was placed at both ends of the catalytic converter; that is, exhaust fumes traveling from the engine first passed through one emissions control device, then through the catalytic converter, and then through a second emissions control device before being released through the tailpipe into the air.


All participants completed Truman’s (1999) questionnaire of driving habits, which included questions about driving infractions and tickets over the past five years. Drivers who reported three or more traffic tickets within the past year and/or one or more DUI or DWI infractions within the past five years were excluded from the study to maintain the integrity of the investigation.


Drivers were instructed to maintain their normal driving routine throughout the 30-day trial. Exhaust emissions were measured on 20 of the 30 days (Monday through Friday) by inserting an emissions meter into the tailpipe and by taking emission readings from each automobile’s engine compartment. Readings were totaled for each car, and these totals were added together at the end of the study. Yearly totals were then estimated from the monthly totals.


A Z test was used to determine whether the amount of hydrocarbons released into the air by cars equipped with the emissions control device differed significantly from the amount released by cars without it. The Oppenheimer Group tested its emissions control device using a two-tailed test at the .05 level of significance. The average amount of hydrocarbons released into the air by cars not equipped with the emissions control device was 100.00 pounds per year, with a standard deviation of 15.00.

All 36 automobiles in the test group released an average of 99.00 pounds of hydrocarbons into the air per year. The standard error of the mean was 2.50; that is, the population standard deviation of 15.00 divided by the square root of N (N = 36) yielded a standard error of 2.50. The resulting Z test yielded an obtained value of –0.40 against two-tailed critical values of ±1.96. The obtained value, reported below, was not significant. The mean pounds per year for the sample (M = 99.00, SD = 4.31) was slightly lower than the mean pounds per year for the population (M = 100.00, SD = 15.00).

Z = -0.40, p > .05.
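The Results arithmetic can be reproduced in a few lines. This is a minimal sketch; the function name `z_test` is hypothetical, and the ±1.96 cutoff is the two-tailed .05 value used in the paper.

```python
import math


def z_test(sample_mean, pop_mean, pop_sd, n, critical=1.96):
    """Return (standard error, Z, significant?) for a two-tailed Z test."""
    se = pop_sd / math.sqrt(n)       # population SD divided by the square root of N
    z = (sample_mean - pop_mean) / se
    significant = abs(z) > critical  # significant only beyond -1.96 or +1.96
    return se, z, significant


se, z, significant = z_test(sample_mean=99.0, pop_mean=100.0, pop_sd=15.0, n=36)
print(f"SE = {se:.2f}, Z = {z:.2f}, significant: {significant}")
# SE = 2.50, Z = -0.40, significant: False
```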


The present study attempted to demonstrate that the Oppenheimer Group’s emissions control device released fewer pollutants into the air from cars on which it was installed than were released from automobiles without the device. After the 30-day trial, no significant difference was found between automobiles on which the emissions control device was installed and automobiles on which it was not installed.

Hydrocarbon pollutants were the focus of this study, and they will continue to accumulate in the air at no less than Preston’s (2005) estimated rate of 3.8% per year until effective measures are put in place to reduce harmful emissions from automobile exhaust systems.

This study was limited in that only 36 participants and their automobiles took part. A broader group of participants from a wider geographical area would be desirable. Also, global implications can scarcely be discussed, since the present trials were conducted entirely within the United States, where the level of atmospheric hydrocarbons is at least three times greater than in other industrialized nations (Jennings, 2004). Similar research in regions outside the U.S. might produce different results.

Future research should include more participants with a broader range of automobiles, including small, medium, and large privately owned trucks. Further, trials should be conducted in each of the country’s climatic regions and at multiple elevations. Moreover, the researchers believe that a longer study would produce more accurate data than estimating yearly emission totals from a single 30-day study. Lastly, each emissions control device was newly installed on each of the 36 automobiles in the study. Even if results had been significant, the research team would want to know whether the device would continue to reduce the level of harmful pollutants released into the air, and for how long before it needed to be replaced. Further research may answer these important questions.
Jennings, W. R. (2004). Filling our air with poisons: A case study of pollutants in our air. Hydrocarbon Quarterly, 34, 165–184.
Preston, H. G. (2005). Hydrocarbons and the air: The poisoning of America. New York: McGraw-Hill.
Truman, J. C. (1997). Driving habits: Integrity of the American driver. Journal of Automobile Engineering and Science, 15, 125–133.





Introduction to t Tests: Single Sample and Dependent Means

Chapter Outline

✪ The t Test for a Single Sample
✪ The t Test for Dependent Means
✪ Assumptions of the t Test for a Single Sample and the t Test for Dependent Means
✪ Effect Size and Power for the t Test for Dependent Means
✪ Controversy: Advantages and Disadvantages of Repeated-Measures Designs
✪ Single Sample t Tests and Dependent Means t Tests in Research Articles
✪ Summary
✪ Key Terms
✪ Example Worked-Out Problems
✪ Practice Problems
✪ Using SPSS
✪ Chapter Notes

At this point, you may think you know all about hypothesis testing. Here’s a surprise: what you know will not help you much as a researcher. Why? The procedures for testing hypotheses described up to this point were, of course, absolutely necessary for what you will now learn. However, these procedures involved comparing a group of scores to a known population. In real research practice, you often compare two or more groups of scores to each other, without any direct information about populations. For example, you may have two scores for each person in a group of people, such as scores on an anxiety test before and after psychotherapy or number of familiar versus unfamiliar words recalled in a memory experiment. Or you might have one score per person for two groups of people, such as an experimental group and a control group in a study of the effect of sleep loss on problem solving, or a group of 10-year-old girls and a group of 10-year-old boys in a study comparing their self-esteem test scores.

t test: hypothesis-testing procedure in which the population variance is unknown; it compares t scores from a sample to a comparison distribution called a t distribution.

These kinds of research situations are among the most common in psychology,
where usually the only information available is from samples. Nothing is known
about the populations that the samples are supposed to come from. In particular, the
researcher does not know the variance of the populations involved, which is a crucial
ingredient in Step ❷ of the hypothesis-testing process (determining the characteristics
of the comparison distribution).

In this chapter, we first look at the solution to the problem of not knowing the population variance by focusing on a special situation: comparing the mean of a single sample to a population with a known mean but an unknown variance. Then, after describing how to handle this problem of not knowing the population variance, we go on to consider the situation in which there is no known population at all, that is, the situation in which all we have are two scores for each of a number of people.

The hypothesis-testing procedures you learn in this chapter, those in which the
population variance is unknown, are examples of t tests. The t test is sometimes
called “Student’s t ” because its main principles were originally developed by
William S. Gosset, who published his research articles anonymously using the name
“Student” (see Box 7–1).

The t Test for a Single Sample


Let’s begin with an example. Suppose your college newspaper reports an informal survey showing that students at your college study an average of 17 hours per week. However, you think that the students in your dormitory study much more than that. You randomly pick 16 students from your dormitory and ask them how much they study each week. (We will assume that they are all honest and accurate.) Your result is that these 16 students study an average of 21 hours per week. Should you conclude that students in your dormitory study more than the college average? Or should you conclude that your results are close enough to the college average that the small difference of 4 hours might well be due to your having picked, purely by chance, 16 of the more studious residents in your dormitory?

In this example you have scores for a sample of individuals and you want to compare the mean of this sample to a population for which you know the mean but not the variance. Hypothesis testing in this situation is called a t test for a single sample. (It is also called a one-sample t test.) The t test for a single sample works basically the same way as the Z test you learned in Chapter 5. In the studies we considered in that chapter, you had scores for a sample of individuals (such as a group of 64 students rating the attractiveness of a person in a photograph after being told that the person has positive personality qualities) and you wanted to compare the mean of this sample to a population (in this case, a population of students not told about the person’s personality qualities). However, in the studies we considered in Chapter 5, you knew both the mean and variance of the general population to which you were going to compare your sample. In the situations we are now going to consider, everything is the same, but you don’t know the population variance. This presents two important new wrinkles affecting the details of how you carry out two of the steps of the hypothesis-testing process.

The first important new wrinkle is in Step ❷. Because the population variance is not known, you have to estimate it. So the first new wrinkle we consider is how to estimate an unknown population variance. The other important new wrinkle affects Steps ❷ and ❸. When the population variance has to be estimated, the shape of the comparison distribution is not quite a normal curve; so the second new wrinkle we consider is the shape of the comparison distribution (for Step ❷) and how to use a special table to find the cutoff (Step ❸) on what is a slightly differently shaped distribution.

t test for a single sample: hypothesis-testing procedure in which a sample mean is being compared to a known population mean and the population variance is unknown.

Let’s return to the amount of studying example. Step ❶ of the hypothesis-testing procedure is to restate the problem as hypotheses about populations. There are two populations:

Population 1: The kind of students who live in your dormitory.
Population 2: The kind of students in general at your college.

The research hypothesis is that Population 1 students study more than Population 2 students; the null hypothesis is that Population 1 students do not study more than Population 2 students. So far, the problem is no different from those in Chapter 5.

Step ❷ is to determine the characteristics of the comparison distribution. In this
example, its mean will be 17, what the survey found for students at your college
generally (Population 2).

BOX 7–1 William S. Gosset, Alias “Student”: Not a Mathematician, But a Practical Man

(Photo: The Granger Collection)

William S. Gosset graduated from Oxford University in 1899 with degrees in mathematics and chemistry. It happened that in the same year the Guinness brewers in Dublin, Ireland, were seeking a few young scientists to take a first-ever scientific look at beer making. Gosset took one of these jobs and soon had immersed himself in barley, hops, and vats of brew.

The problem was how to make beer of a consistently high quality. Scientists such as Gosset wanted to make the quality of beer less variable, and they were especially interested in finding the cause of bad batches. A proper scientist would say, “Conduct experiments!” But a business such as a brewery could not afford to waste money on experiments involving large numbers of vats, some of which any brewer worth his hops knew would fail. So Gosset was forced to contemplate the probability of, say, a certain strain of barley producing terrible beer when the experiment could consist of only a few batches of each strain. Adding to the problem was that he had no idea of the variability of a given strain of barley; perhaps some fields planted with the same strain grew better barley. (Does this sound familiar? Poor Gosset, like today’s psychologists, had no idea of his population’s variance.)

Gosset was up to the task, although at the time only he knew that. To his colleagues at the brewery, he was a professor of mathematics and not a proper brewer at all. To his statistical colleagues, mainly at the Biometric Laboratory at University College in London, he was a mere brewer and not a proper mathematician.

So Gosset discovered the t distribution and invented the t test (simplicity itself, compared to most of statistics) for situations when samples are small and the variability of the larger population is unknown. However, the Guinness brewery did not allow its scientists to publish papers, because one Guinness scientist had revealed brewery secrets. To this day, most statisticians call the t distribution “Student’s t” because Gosset wrote under the anonymous name “Student.” A few of his fellow statisticians knew who “Student” was, but apparently meetings with others involved the secrecy worthy of a spy novel. The brewery learned of his scientific fame only at his death, when colleagues wanted to honor him.

In spite of his great achievements, Gosset often wrote in letters that his own work provided “only a rough idea of the thing” or so-and-so “really worked out the complete mathematics.” He was remembered as a thoughtful, kind, humble man, sensitive to others’ feelings. Gosset’s friendliness and generosity with his time and ideas also resulted in many students and younger colleagues making major breakthroughs based on his help.

To learn more about William Gosset, go to http://

Sources: Peters (1987); Salsburg (2001); Stigler (1986); Tankard (1984).



The next part of Step ❷ is finding the variance of the distribution of means. Now
you face a problem. Up to now in this book, you have always known the variance of
the population of individuals. Using that variance, you then figured the variance of the
distribution of means. However, in the present example, the variance of the number of
hours studied for students at your college (the Population 2 students) was not reported
in the newspaper article. So you email the paper. Unfortunately, the reporter did not
figure the variance, and the original survey results are no longer available. What to do?

Basic Principle of the t Test: Estimating the Population Variance from the Sample


If you do not know the variance of the population of individuals, you can estimate it
from what you do know—the scores of the people in your sample.

In the logic of hypothesis testing, the group of people you study is considered to
be a random sample from a particular population. The variance of this sample ought
to reflect the variance of that population. If the scores in the population have a lot of
variation, then the scores in a sample randomly selected from that population should
also have a lot of variation. If the population has very little variation, the scores in a
sample from that population should also have very little variation. Thus, it should be
possible to use the variation among the scores in the sample to make an informed
guess about the spread of the scores in the population. That is, you could figure the
variance of the sample’s scores, and that should be similar to the variance of the
scores in the population. (See Figure 7–1.)

Figure 7–1: The variation in samples (as in each of the lower distributions) is similar to the variations in the populations they are taken from (each of the upper distributions). Panels (a), (b), and (c).

There is, however, one small hitch. The variance of a sample will generally be slightly smaller than the variance of the population from which it is taken. For this reason, the variance of the sample is a biased estimate of the population variance.¹ It is a biased estimate because it consistently underestimates the actual variance of the population. (For example, if a population has a variance of 180, a typical sample of 20 scores might have a variance of only 171.) If we used a biased estimate of the population variance in our research studies, our results would not be accurate. Therefore, we need to identify an unbiased estimate of the population variance.

biased estimate: estimate of a population parameter that is likely systematically to overestimate or underestimate the true value of the population parameter. For example, $SD^2$ would be a biased estimate of the population variance (it would systematically underestimate it).

unbiased estimate of the population variance ($S^2$): estimate of the population variance, based on sample scores, which has been corrected so that it is equally likely to overestimate or underestimate the true population variance; the correction used is dividing the sum of squared deviations by the sample size minus 1, instead of the usual procedure of dividing by the sample size directly.
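The claim that a sample's variance tends to run low can be checked by simulation. This sketch assumes a normally distributed population with variance 180 (the mean of 0 is an arbitrary choice) and draws many samples of 20 scores; the seed and number of trials are also arbitrary. For reference, dividing by N gives an expected value of $\sigma^2 (N - 1)/N = 180 \times 19/20 = 171$, which matches the figure in the example, while dividing by N – 1 averages out near 180.

```python
import random


def average_variance_estimates(pop_variance=180.0, n=20, trials=20000, seed=1):
    """Average SS/N and SS/(N - 1) over many random samples from a normal population."""
    rng = random.Random(seed)
    sd = pop_variance ** 0.5
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sd) for _ in range(n)]
        m = sum(sample) / n
        ss = sum((x - m) ** 2 for x in sample)
        biased_total += ss / n          # ordinary sample variance, SD^2
        unbiased_total += ss / (n - 1)  # estimated population variance, S^2
    return biased_total / trials, unbiased_total / trials


biased, unbiased = average_variance_estimates()
print(f"average SD^2 = {biased:.1f} (theory: 171), average S^2 = {unbiased:.1f} (true: 180)")
```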

Fortunately, you can figure an unbiased estimate of the population variance by slightly changing the ordinary variance formula. The ordinary variance formula is the sum of the squared deviation scores divided by the number of scores. The changed formula still starts with the sum of the squared deviation scores, but divides this by the number of scores minus 1. Dividing by a slightly smaller number makes the result slightly larger. Dividing by the number of scores minus 1 makes the variance you get just enough larger to make it an unbiased estimate of the population variance. (This unbiased estimate is our best estimate of the population variance. However, it is still an estimate, so it is unlikely to be exactly the same as the true population variance. But we can be certain that our unbiased estimate of the population variance is as likely to be too high as it is to be too low. This is what makes the estimate unbiased.)

The symbol we will use for the unbiased estimate of the population variance is $S^2$. The formula is the usual variance formula, but now dividing by $N - 1$:

$$S^2 = \frac{\sum (X - M)^2}{N - 1} = \frac{SS}{N - 1}$$

Let’s return again to the example of hours spent studying and figure the estimated population variance from the sample’s 16 scores. First, you figure the sum of squared deviation scores. (Subtract the mean from each of the scores, square those deviation scores, and add them.) Presume in our example that this comes out to 694 ($SS = 694$). To get the estimated population variance, you divide this sum of squared deviation scores by the number of scores minus 1; that is, in this example, you divide 694 by $16 - 1$; 694 divided by 15 comes out to 46.27. In terms of the formula,

$$S^2 = \frac{SS}{N - 1} = \frac{694}{16 - 1} = \frac{694}{15} = 46.27$$
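The same arithmetic can be written as code. The 16 individual scores are not given, so `variance_from_ss` works from SS = 694 directly, while `estimated_population_variance` computes $S^2$ from raw scores, shown on a short hypothetical list and cross-checked against Python's `statistics.variance`, which also divides by N – 1. Both function names are illustrative.

```python
import statistics


def estimated_population_variance(scores):
    """S^2: sum of squared deviations divided by N - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)  # SS
    return ss / (n - 1)                     # divide by N - 1, not N


def variance_from_ss(ss, n):
    """S^2 when only SS and N are known."""
    return ss / (n - 1)


# Dormitory example from the text: SS = 694, N = 16.
print(round(variance_from_ss(694, 16), 2))  # 46.27

# Hypothetical scores; statistics.variance also uses N - 1 in the denominator.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(estimated_population_variance(scores), statistics.variance(scores))
```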

At this point, you have now seen several different types of standard deviation
and variance (that is, for a sample, for a population, and unbiased estimates); and
each of these types has used a different symbol. To help you keep them straight, a
summary of the types of standard deviation and variance is shown in Table 7–1.

Degrees of Freedom
The number you divide by (the number of scores minus 1) to get the estimated population variance has a special name. It is called the degrees of freedom. It has this name because it is the number of scores in a sample that are “free to vary.” The idea is that, when figuring the variance, you first have to know the mean. If you know the mean and all but one of the scores in the sample, you can figure out the one you don’t know with a little arithmetic. Thus, once you know the mean, one of the scores in the sample is not free to have any possible value. So in this kind of situation the degrees of freedom are the number of scores minus 1. In terms of a formula,

$$df = N - 1$$

where $df$ is the degrees of freedom.
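The “free to vary” idea can be made concrete: given the mean and any N – 1 of the scores, the last score is fully determined because all N scores must add up to N times the mean. The helper names below are hypothetical, and the four scores are made up for illustration.

```python
def degrees_of_freedom(n):
    """df for estimating the population variance from a single sample."""
    return n - 1


def last_score_from_mean(mean, known_scores, n):
    """Once the mean and N - 1 scores are known, the last score is fixed."""
    if len(known_scores) != n - 1:
        raise ValueError("supply exactly N - 1 of the scores")
    # All N scores must add up to N * mean, so the missing one is what is left over.
    return n * mean - sum(known_scores)


print(degrees_of_freedom(16))                 # 15
print(last_score_from_mean(5, [3, 7, 4], 4))  # 6
```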


$$df = N - 1$$
The degrees of freedom are the number of scores in the sample minus 1.

$$S^2 = \frac{\sum (X - M)^2}{N - 1} = \frac{SS}{N - 1}$$
The estimated population variance is the sum of the squared deviation scores divided by the number of scores minus 1.

$$S = \sqrt{S^2}$$
The estimated population standard deviation is the square root of the estimated population variance.

degrees of freedom (df): number of scores free to vary when estimating a population parameter; usually part of a formula for making that estimate. For example, in the formula for estimating the population variance from a single sample, the degrees of freedom is the number of scores minus 1.

Table 7–1 Summary of Different Types of Standard Deviation and Variance

Statistical Term                            Symbol
Sample standard deviation                   SD
Population standard deviation               σ
Estimated population standard deviation     S
Sample variance                             SD²
Population variance                         σ²
Estimated population variance               S²

Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.

In our example, $df = 16 - 1 = 15$. (In some situations you learn about in later chapters, the degrees of freedom are figured a bit differently. This is because in those situations, the number of scores free to vary is different. For all the situations you learn about in this chapter, $df = N - 1$.)

The formula for the estimated population variance is often written using $df$ instead of $N - 1$:

$$S^2 = \frac{\sum (X - M)^2}{df} = \frac{SS}{df}$$


The Standard Deviation of the Distribution of Means
Once you have figured the estimated population variance, you can figure the standard deviation of the comparison distribution using the same procedures you learned in Chapter 5. Just as before, when you have a sample of more than one, the comparison distribution is a distribution of means, and the variance of a distribution of means is the variance of the population of individuals divided by the sample size. You have just estimated the variance of the population. Thus, you can estimate the variance of the distribution of means by dividing the estimated population variance by the sample size. The standard deviation of the distribution of means is the square root of its variance. Stated as formulas,

$$S_M^2 = \frac{S^2}{N} \qquad S_M = \sqrt{S_M^2}$$

Note that, with an estimated population variance, the symbols for the variance and standard deviation of the distribution of means use $S$ instead of $\sigma$.

In our example, the sample size was 16 and we worked out the estimated population variance to be 46.27. The variance of the distribution of means, based on that estimate, will be 2.89. That is, 46.27 divided by 16 equals 2.89. The standard deviation is 1.70, the square root of 2.89. In terms of the formulas,

$$S_M^2 = \frac{S^2}{N} = \frac{46.27}{16} = 2.89 \qquad S_M = \sqrt{S_M^2} = \sqrt{2.89} = 1.70$$
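A minimal sketch of these two formulas, using the example's numbers ($S^2 = 46.27$, N = 16); the function name is illustrative.

```python
import math


def distribution_of_means(est_pop_variance, n):
    """Return (S_M^2, S_M) from the estimated population variance S^2 and sample size N."""
    s2_m = est_pop_variance / n  # S_M^2 = S^2 / N
    s_m = math.sqrt(s2_m)        # S_M = sqrt(S_M^2)
    return s2_m, s_m


s2_m, s_m = distribution_of_means(46.27, 16)
print(f"S_M^2 = {s2_m:.2f}, S_M = {s_m:.2f}")  # S_M^2 = 2.89, S_M = 1.70
```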

$$S^2 = \frac{\sum (X - M)^2}{df} = \frac{SS}{df}$$
The estimated population variance is the sum of squared deviations divided by the degrees of freedom.

$$S_M^2 = \frac{S^2}{N}$$
The variance of the distribution of means based on an estimated population variance is the estimated population variance divided by the number of scores in the sample.

$$S_M = \sqrt{S_M^2}$$
The standard deviation of the distribution of means based on an estimated population variance is the square root of the variance of the distribution of means based on an estimated population variance.

Be sure that you fully understand the difference between $S^2$ and $S_M^2$. These terms look quite similar, but they are quite different. $S^2$ is the estimated variance of the population of individuals. $S_M^2$ is the estimated variance of the distribution of means (based on the estimated variance of the population of individuals, $S^2$).

The Shape of the Comparison Distribution When Using an Estimated Population Variance: The t Distribution

In Chapter 5 you learned that when the population distribution follows a normal curve, the shape of the distribution of means will also be a normal curve. However, this changes when you do hypothesis testing with an estimated population variance. When you are using an estimated population variance, you have less true information and more room for error. The mathematical effect is that there are likely to be slightly more extreme means than in an exact normal curve. Further, the smaller your sample size, the bigger this tendency. This is because, with a smaller sample size, your estimate of the population variance is based on less information.

The result of all this is that, when doing hypothesis testing using an estimated variance, your comparison distribution will not be a normal curve. Instead, the comparison distribution will be a slightly different curve called a t distribution.

Actually, there is a whole family of t distributions. They vary in shape according
to the degrees of freedom you used to estimate the population variance. However, for
any particular degrees of freedom, there is only one t distribution.

Generally, t distributions look to the eye like a normal curve: bell-shaped, symmetrical, and unimodal. A t distribution differs subtly in having heavier tails (that is, slightly more scores at the extremes). Figure 7–2 shows the shape of a t distribution compared to a normal curve.

This slight difference in shape affects how extreme a score you need to reject
the null hypothesis. As always, to reject the null hypothesis, your sample mean has
to be in an extreme section of the comparison distribution of means, such as the top
5%. However, if the comparison distribution has more of its means in the tails than a
normal curve would have, then the point where the top 5% begins has to be farther
out on this comparison distribution. The result is that it takes a slightly more extreme
sample mean to get a significant result when using a t distribution than when using a
normal curve.

Just how much the t distribution differs from the normal curve depends on the
degrees of freedom, the amount of information used in estimating the population
variance. The t distribution differs most from the normal curve when the degrees of
freedom are low (because your estimate of the population variance is based on a very
small sample). For example, using the normal curve, you may recall that 1.64 is the
cutoff for a one-tailed test at the .05 level. On a t distribution with 7 degrees of
freedom (that is, with a sample size of 8), the cutoff is 1.895 for a one-tailed test at
the .05 level. If your estimate is based on a larger sample, say a sample of 25 (so that
df = 24), the cutoff is 1.711, a cutoff much closer to that for the normal curve. If your
sample size is infinite, the t distribution is the same as the normal curve. (Of course,
if your sample size were infinite, it would include the entire population!) But even
with sample sizes of 30 or more, the t distribution is nearly identical to the normal curve.
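To see these cutoffs emerge from the t distribution itself, here is a minimal Python sketch that uses only the standard library. It integrates the t density numerically (Simpson's rule) and then searches for the one-tailed .05 cutoff by bisection. These numerical choices and the function names (`t_pdf`, `t_cdf`, `one_tailed_cutoff`) are our own assumptions for illustration, not the textbook's method; in practice you would read the cutoff from a t table or a statistics library.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def one_tailed_cutoff(alpha: float, df: int) -> float:
    """Positive t with P(T > t) = alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid  # too much area beyond mid, so the cutoff lies further out
        else:
            hi = mid
    return (lo + hi) / 2


print(round(one_tailed_cutoff(0.05, 7), 3))   # 1.895 (df = 7, N = 8)
print(round(one_tailed_cutoff(0.05, 24), 3))  # 1.711 (df = 24, N = 25)
print(round(NormalDist().inv_cdf(0.95), 3))   # 1.645 (normal curve)
```

As the degrees of freedom grow, the computed cutoff moves down toward the normal-curve value, which is the pattern the paragraph describes.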

Shortly, you will learn how to find the cutoff using a t distribution, but let’s first
return briefly to the example of how much students in your dorm study each week.
You finally have everything you need for Step ❷ about the characteristics of the
comparison distribution. We have already seen that the distribution of means in this
example has a mean of 17 hours and a standard deviation of 1.70. You can now add
that the shape of the comparison distribution will be a t distribution with 15 degrees
of freedom.2

Figure 7–2 A t distribution (dashed blue line) compared to the normal curve (solid
black line).

t distribution: mathematically defined curve that is the comparison distribution
used in a t test.

Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.

Introduction to t Tests 229

The Cutoff Sample Score for Rejecting the Null Hypothesis: Using the t Table
Step ❸ of hypothesis testing is determining the cutoff for rejecting the null hypothesis.
There is a different t distribution for any particular degrees of freedom. However, to
avoid taking up pages and pages with tables for each possible t distribution, you use a
simplified table that gives only the crucial cutoff points. We have included such a
t table in the Appendix (Table A–2). Just as with the normal curve table, the t table
shows only positive t scores. If you have a one-tailed test, you need to decide whether
your cutoff score is a positive t score or a negative t score. If your one-tailed test is
testing whether the mean of Population 1 is greater than the mean of Population 2, the
cutoff t score is positive. However, if your one-tailed test is testing whether the mean of
Population 1 is less than the mean of Population 2, the cutoff t score is negative.

In the hours-studied example, you have a one-tailed test. (You want to know
whether students in your dorm study more than students in general at your college.)
You will probably want to use the 5% significance level, because the cost of a
Type I error (mistakenly rejecting the null hypothesis) is not great. You have 16
participants, making 15 degrees of freedom for your estimate of the population variance.

Table 7–2 shows a portion of the t table from Table A–2 in the Appendix. Find
the column for the .05 significance level for one-tailed tests and move down to the
row for 15 degrees of freedom. The crucial cutoff is 1.753. In this example, you are
testing whether students in your dormitory (Population 1) study more than students
in general at your college (Population 2). In other words, you are testing whether
students in your dormitory have a higher t score than students in general. This means
that the cutoff t score is positive. Thus, you will reject the null hypothesis if your
sample’s mean is 1.753 or more standard deviations above the mean on the comparison
distribution. (If you were using a known variance, you would have found your
cutoff from a normal curve table. The Z score to reject the null hypothesis based on
the normal curve would have been 1.645.)

Table 7–2 Cutoff Scores for t Distributions with 1 Through 17 Degrees of Freedom
(Highlighting Cutoff for Hours-Studied Example: df = 15, one-tailed .05, cutoff 1.753)

       One-Tailed Tests          Two-Tailed Tests
 df     .10     .05     .01       .10     .05     .01
  1   3.078   6.314  31.821     6.314  12.706  63.657
  2   1.886   2.920   6.965     2.920   4.303   9.925
  3   1.638   2.353   4.541     2.353   3.182   5.841
  4   1.533   2.132   3.747     2.132   2.776   4.604
  5   1.476   2.015   3.365     2.015   2.571   4.032
  6   1.440   1.943   3.143     1.943   2.447   3.708
  7   1.415   1.895   2.998     1.895   2.365   3.500
  8   1.397   1.860   2.897     1.860   2.306   3.356
  9   1.383   1.833   2.822     1.833   2.262   3.250
 10   1.372   1.813   2.764     1.813   2.228   3.170
 11   1.364   1.796   2.718     1.796   2.201   3.106
 12   1.356   1.783   2.681     1.783   2.179   3.055
 13   1.350   1.771   2.651     1.771   2.161   3.013
 14   1.345   1.762   2.625     1.762   2.145   2.977
 15   1.341   1.753   2.603     1.753   2.132   2.947
 16   1.337   1.746   2.584     1.746   2.120   2.921
 17   1.334   1.740   2.567     1.740   2.110   2.898

t table: table of cutoff scores on the t distribution for various degrees of
freedom, significance levels, and one- and two-tailed tests.
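The lookup just described, including the sign rule for one-tailed tests, can be sketched in Python. This minimal sketch assumes only the rows of Table 7–2 (df 1 through 17); the function `t_cutoff` and its `direction` argument are hypothetical names of our own, not part of any library.

```python
# Rows of Table 7-2: df -> (one-tailed .10, .05, .01, two-tailed .10, .05, .01)
T_TABLE = {
    1: (3.078, 6.314, 31.821, 6.314, 12.706, 63.657),
    2: (1.886, 2.920, 6.965, 2.920, 4.303, 9.925),
    3: (1.638, 2.353, 4.541, 2.353, 3.182, 5.841),
    4: (1.533, 2.132, 3.747, 2.132, 2.776, 4.604),
    5: (1.476, 2.015, 3.365, 2.015, 2.571, 4.032),
    6: (1.440, 1.943, 3.143, 1.943, 2.447, 3.708),
    7: (1.415, 1.895, 2.998, 1.895, 2.365, 3.500),
    8: (1.397, 1.860, 2.897, 1.860, 2.306, 3.356),
    9: (1.383, 1.833, 2.822, 1.833, 2.262, 3.250),
    10: (1.372, 1.813, 2.764, 1.813, 2.228, 3.170),
    11: (1.364, 1.796, 2.718, 1.796, 2.201, 3.106),
    12: (1.356, 1.783, 2.681, 1.783, 2.179, 3.055),
    13: (1.350, 1.771, 2.651, 1.771, 2.161, 3.013),
    14: (1.345, 1.762, 2.625, 1.762, 2.145, 2.977),
    15: (1.341, 1.753, 2.603, 1.753, 2.132, 2.947),
    16: (1.337, 1.746, 2.584, 1.746, 2.120, 2.921),
    17: (1.334, 1.740, 2.567, 1.740, 2.110, 2.898),
}
ALPHAS = (0.10, 0.05, 0.01)


def t_cutoff(df, alpha, tails, direction="greater"):
    """Cutoff t score(s) from Table 7-2.

    Two-tailed tests return (negative cutoff, positive cutoff). One-tailed tests
    return a positive cutoff when the research hypothesis predicts Population 1
    is greater, and a negative one when it predicts Population 1 is less.
    """
    row = T_TABLE[df]
    column = ALPHAS.index(alpha)
    if tails == 2:
        cutoff = row[3 + column]
        return (-cutoff, cutoff)
    if tails != 1:
        raise ValueError("tails must be 1 or 2")
    cutoff = row[column]
    if direction == "greater":
        return cutoff
    if direction == "less":
        return -cutoff
    raise ValueError("direction must be 'greater' or 'less'")


print(t_cutoff(15, 0.05, tails=1))  # 1.753 (hours-studied example)
print(t_cutoff(9, 0.01, tails=2))   # (-3.25, 3.25)
```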

One other point about using the t table: In the full t table in the Appendix, there
are rows for each degree of freedom from 1 through 30, then for 35, 40, 45, and so
on up to 100. Suppose your study has degrees of freedom between two of these higher
values. To be safe, you should use the nearest degrees of freedom given on the table
that is less than yours (fewer degrees of freedom give a slightly more extreme cutoff,
so this choice errs on the side of not rejecting the null hypothesis). For example, in a
study with 43 degrees of freedom, you would use the cutoff for df = 40.
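The "use the next smaller df" rule is easy to mechanize. Here is a minimal sketch, assuming the row layout described above (1 through 30, then every 5 up to 100). `conservative_row` is a hypothetical helper. Falling back to the df = 100 row for larger studies is our own assumption, since the handling of df above 100 in the full table is not described here.

```python
import bisect

# df rows in the full Appendix table, as described in the text:
# every df from 1 through 30, then 35, 40, 45, ..., 100.
TABLE_ROWS = list(range(1, 31)) + list(range(35, 101, 5))


def conservative_row(df: int) -> int:
    """Largest tabled df that does not exceed the study's df."""
    if df < 1:
        raise ValueError("df must be at least 1")
    # bisect_right finds the first tabled row greater than df;
    # the row just before it is the largest one <= df.
    return TABLE_ROWS[bisect.bisect_right(TABLE_ROWS, df) - 1]


print(conservative_row(43))  # 40
print(conservative_row(15))  # 15 (an exact row exists)
```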

The Sample Mean’s Score on the Comparison Distribution: The t Score
Step ❹ of hypothesis testing is figuring your sample mean’s score on the comparison
distribution. In Chapter 5, this meant finding the Z score on the comparison
distribution, the number of standard deviations your sample’s mean is from the
mean on the distribution. You do exactly the same thing when your comparison
distribution is a t distribution. The only difference is that, instead of calling this a
Z score, because it is from a t distribution, you call it a t score. In terms of a formula,

$$t = \frac{M - \mu}{S_M}$$

In the example, your sample’s mean of 21 is 4 hours from the mean of the distribution
of means, which amounts to 2.35 standard deviations from the mean (4 hours divided
by the standard deviation of 1.70 hours).3 That is, the t score in the example is 2.35.
In terms of the formula,

$$t = \frac{M - \mu}{S_M} = \frac{21 - 17}{1.70} = \frac{4}{1.70} = 2.35$$

Deciding Whether to Reject the Null Hypothesis
Step ❺ of hypothesis testing is deciding whether to reject the null hypothesis. This
step is exactly the same with a t test as it was in the hypothesis-testing situations
discussed in previous chapters. In the example, the cutoff t score was 1.753 and the
actual t score for your sample was 2.35. Conclusion: reject the null hypothesis. The
research hypothesis is supported that students in your dorm study more than students
in the college overall.

Figure 7–3 shows the various distributions for this example.

Summary of Hypothesis Testing When the Population Variance Is Not Known
Table 7–3 compares the hypothesis-testing procedure we just considered (for a t test
for a single sample) with the hypothesis-testing procedure for a Z test from
Chapter 5. That is, we are comparing the current situation in which you know the
population’s mean but not its variance to the Chapter 5 situation, where you knew
the population’s mean and variance.

The t score is your sample’s mean minus the population mean, divided by the
standard deviation of the distribution of means.

t score: on a t distribution, number of standard deviations from the mean (like
a Z score, but on a t distribution).
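Steps ❹ and ❺ for the hours-studied example reduce to one division and one comparison. Here is a minimal Python sketch, assuming a one-tailed test whose cutoff sign encodes the predicted direction (as described for the t table above). The helper names `t_score` and `reject_one_tailed` are our own.

```python
def t_score(sample_mean: float, pop_mean: float, sd_of_means: float) -> float:
    """Step 4: how many standard deviations the sample mean lies from the
    mean of the comparison distribution, t = (M - mu) / S_M."""
    return (sample_mean - pop_mean) / sd_of_means


def reject_one_tailed(t: float, cutoff: float) -> bool:
    """Step 5 for a one-tailed test: a positive cutoff means 'reject if t is at
    or above it'; a negative cutoff means 'reject if t is at or below it'."""
    return t >= cutoff if cutoff > 0 else t <= cutoff


t = t_score(21, 17, 1.70)
print(round(t, 2))                  # 2.35
print(reject_one_tailed(t, 1.753))  # True: reject the null hypothesis
```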






Figure 7–3 Distribution for the hours-studied example. (Raw scores 13.60, 15.30,
17.00, 18.70, and 20.40 correspond to t scores of –2, –1, 0, 1, and 2.)

Table 7–3 Hypothesis Testing with a Single Sample Mean When Population Variance
Is Unknown (t Test for a Single Sample) Compared to When Population
Variance Is Known (Z Test)

Steps in Hypothesis Testing | Difference From When Population Variance Is Known
❶ Restate the question as a research hypothesis and a null hypothesis about the populations. | No difference in method.
❷ Determine the characteristics of the comparison distribution.
    Population mean | No difference in method.
    Standard deviation of the distribution of sample means | No difference in method (but based on estimated population variance).
    Population variance | Estimate from the sample.
    Shape of the comparison distribution | Use the t distribution with df = N – 1.
❸ Determine the significance cutoff. | Use the t table.
❹ Determine your sample’s score on the comparison distribution. | No difference in method (but called a t score).
❺ Decide whether to reject the null hypothesis. | No difference in method.

Another Example of a t Test for a Single Sample
Consider another fictional example. Suppose a researcher was studying the
psychological effects of a devastating flood in a small rural community. Specifically,
the researcher was interested in how hopeful (versus unhopeful) people felt after the
flood. The researcher randomly selected 10 people from this community to complete a
short questionnaire. The key item on the questionnaire asked how hopeful they felt,
using a 7-point scale from extremely unhopeful (1) to neutral (4) to extremely hopeful
(7). The researcher wanted to know whether the ratings of hopefulness for people who
had been through the flood would be consistently above or below the neutral point on
the scale (4).

Table 7–4 shows the results and figuring for the t test for a single sample;
Figure 7–4 shows the distributions involved. Here are the steps of hypothesis testing.

❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are two populations:

Population 1: People who experienced the flood.
Population 2: People who are neither hopeful nor unhopeful.

The research hypothesis is that the two populations will score differently. The
null hypothesis is that they will score the same.

❷ Determine the characteristics of the comparison distribution. If the null
hypothesis is true, the mean of both populations is 4. The variance of these
populations is not known, so you have to estimate it from the sample. As shown in
Table 7–4, the sum of the squared deviations of the sample’s scores from the
sample’s mean is 32.10. Thus, the estimated population variance is 32.10 divided
by 9 degrees of freedom (10 – 1), which comes out to 3.57.

The distribution of means has a mean of 4 (the same as the population mean).
Its variance is the estimated population variance divided by the sample size (3.57
divided by 10 equals .36). The square root of this, the standard deviation of the
distribution of means, is .60. Its shape will be a t distribution for df = 9.

Table 7–4 Results and Figuring for a Single-Sample t Test for a Study of 10 People’s
Ratings of Hopefulness Following a Devastating Flood (Fictional Data)

Rating  Difference From   Squared Difference
   (X)  the Mean (X – M)  From the Mean (X – M)²
     5               .30                     .09
     3             –1.70                    2.89
     6              1.30                    1.69
     2             –2.70                    7.29
     7              2.30                    5.29
     6              1.30                    1.69
     7              2.30                    5.29
     4              –.70                     .49
     2             –2.70                    7.29
     5               .30                     .09
 Σ: 47                                     32.10

$M = (\Sigma X)/N = 47/10 = 4.70$
$df = N - 1 = 10 - 1 = 9$
$\mu = 4.00$
$S^2 = SS/df = 32.10/(10 - 1) = 32.10/9 = 3.57$
$S^2_M = S^2/N = 3.57/10 = .36$
$S_M = \sqrt{S^2_M} = \sqrt{.36} = .60$
t with df = 9 needed for 1% significance level, two-tailed = ±3.250.
Actual sample $t = (M - \mu)/S_M = (4.70 - 4.00)/.60 = .70/.60 = 1.17$.
Decision: Do not reject the null hypothesis.

Be careful. To find the variance of a distribution of means, you always divide the
population variance by the sample size. This is true whether the population’s
variance is known or only estimated. It is only when making the estimate of the
population variance that you divide by the sample size minus 1. That is, the degrees
of freedom are used only when estimating the variance of the population of
individual scores.
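The distinction in the "Be careful" note (divide by N – 1 to estimate the population variance, but by N to get the variance of the distribution of means) maps directly onto Python's standard `statistics` module. There, `statistics.variance` divides the sum of squared deviations by N – 1 and `statistics.pvariance` divides by N. Here is a minimal sketch using the Table 7–4 ratings:

```python
import math
import statistics

ratings = [5, 3, 6, 2, 7, 6, 7, 4, 2, 5]  # hopefulness ratings from Table 7-4
n = len(ratings)

s2 = statistics.variance(ratings)       # SS / (N - 1) = 32.10 / 9, the unbiased estimate
biased = statistics.pvariance(ratings)  # SS / N, too small as an estimate of the population
s2_m = s2 / n                           # variance of the distribution of means: divide by N
s_m = math.sqrt(s2_m)                   # standard deviation of the distribution of means

print(round(s2, 2), round(biased, 2))   # 3.57 3.21
print(round(s2_m, 2), round(s_m, 2))    # 0.36 0.6
```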

❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected. The researcher wanted to be very
cautious about mistakenly concluding that the flood made a difference. Thus, she
decided to use the .01 significance level. The hypothesis was nondirectional
(that is, no specific direction of difference from the mean of 4 was specified;
either result would have been of interest), so the researcher used a two-tailed
test. The researcher looked up the cutoff in Table 7–2 (or Table A–2 in the
Appendix) for a two-tailed test and 9 degrees of freedom. The cutoff given in
the table is 3.250. Thus, to reject the null hypothesis, the sample’s score on the
comparison distribution must be 3.250 or higher, or –3.250 or lower.

❹ Determine your sample’s score on the comparison distribution. The sample’s
mean of 4.70 is .70 scale points from the null hypothesis mean of 4.00.
That makes it 1.17 standard deviations on the comparison distribution from that
distribution’s mean (.70/.60 = 1.17); t = 1.17.

❺ Decide whether to reject the null hypothesis. The t of 1.17 is not as extreme
as the needed t of ±3.250. Therefore, the researcher cannot reject the null
hypothesis. The study is inconclusive. (If the researcher had used a larger sample,
giving more power, the result might have been quite different.)
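The closing remark about power can be made concrete. The sketch below is purely hypothetical: it assumes that a larger sample would show the same mean (4.70) and the same estimated population variance (3.57) as the flood study, which real data need not do. It only shows how the t score grows with the square root of N, because $S_M = \sqrt{S^2/N}$ shrinks as N grows.

```python
import math

MEAN, NULL_MEAN, S2 = 4.70, 4.00, 3.57  # flood-study values, held fixed (assumption)

for n in (10, 20, 40, 80):
    s_m = math.sqrt(S2 / n)  # standard deviation of the distribution of means
    t = (MEAN - NULL_MEAN) / s_m
    print(n, round(t, 2))
```

Whether any of these hypothetical t scores would pass the .01 two-tailed cutoff depends on the cutoff for the larger df, which you would look up in Table A–2.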

Summary of Steps for a t Test for a Single Sample
Table 7–5 summarizes the steps of hypothesis testing when you have scores from a
single sample and a population with a known mean but an unknown variance.4

Figure 7–4 Distributions for the example of how hopeful individuals felt following a
devastating flood. (Comparison distribution: t distribution with df = 9; raw scores
3.40, 4.00, and 4.60 correspond to t scores of –1, 0, and 1; cutoffs ±3.250; sample
t = 1.17.)


Table 7–5 Steps for a t Test for a Single Sample

❶ Restate the question as a research hypothesis and a null hypothesis about the populations.

❷ Determine the characteristics of the comparison distribution.

a. The mean is the same as the known population mean.

b. The standard deviation is figured as follows:

●A Figure the estimated population variance: $S^2 = SS/df$.

●B Figure the variance of the distribution of means: $S^2_M = S^2/N$.

●C Figure the standard deviation of the distribution of means: $S_M = \sqrt{S^2_M}$.

c. The shape will be a t distribution with N – 1 degrees of freedom.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis
should be rejected.

a. Decide the significance level and whether to use a one-tailed or a two-tailed test.

b. Look up the appropriate cutoff in a t table.

❹ Determine your sample’s score on the comparison distribution: $t = (M - \mu)/S_M$.

❺ Decide whether to reject the null hypothesis: Compare the scores from Steps ❸ and ❹.
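The steps in Table 7–5 translate line for line into code. Here is a minimal Python sketch (the names `single_sample_t` and `SingleSampleT` are our own). Unlike the hand figuring in the chapter, it does not round intermediate results, so its t can differ from a hand-figured t in the second decimal place.

```python
import math
from dataclasses import dataclass


@dataclass
class SingleSampleT:
    mean: float  # M
    df: int      # N - 1
    s2: float    # estimated population variance, S^2 = SS / df   (step 2A)
    s2_m: float  # variance of the distribution of means, S^2 / N (step 2B)
    s_m: float   # standard deviation of the distribution of means (step 2C)
    t: float     # (M - mu) / S_M                                   (step 4)


def single_sample_t(scores, mu):
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations
    df = n - 1
    s2 = ss / df
    s2_m = s2 / n
    s_m = math.sqrt(s2_m)
    return SingleSampleT(mean, df, s2, s2_m, s_m, (mean - mu) / s_m)


flood = single_sample_t([5, 3, 6, 2, 7, 6, 7, 4, 2, 5], mu=4.0)
print(flood.df, round(flood.t, 2))  # 9 1.17
print(abs(flood.t) >= 3.250)        # False: do not reject (two-tailed .01 cutoff from the table)
```

For the data in question 6 below (scores 20, 22, 22, and 20 against a population mean of 23), the same function gives t of about –3.46 without intermediate rounding. A hand calculation that rounds $S_M$ to two decimals gives a slightly different value, but the decision against the one-tailed cutoff of –2.353 is unchanged.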

How are you doing?

1. In what sense is a sample’s variance a biased estimate of the variance of the
population the sample is taken from? That is, in what way does the sample’s
variance typically differ from the population’s?

2. What is the difference between the usual formula for figuring the variance and
the formula for estimating a population’s variance from the scores in a sample
(that is, the formula for an unbiased estimate of the population variance)?

3. (a) What are degrees of freedom? (b) How do you figure the degrees of freedom
in a t test for a single sample? (c) What do they have to do with estimating the
population variance? (d) What do they have to do with the t distribution?

4. (a) How does a t distribution differ from a normal curve? (b) How do degrees of
freedom affect this? (c) What is the effect of the difference on hypothesis testing?

5. List three differences in how you do hypothesis testing for a t test for a single
sample versus for the Z test (you learned in Chapter 5).

6. A population has a mean of 23. A sample of 4 is given an experimental
procedure and has scores of 20, 22, 22, and 20. Test the hypothesis that the
procedure produces a lower score. Use the .05 significance level. (a) Use the steps
of hypothesis testing and (b) make a sketch of the distributions involved.

Answers

1. The sample’s variance will in general be smaller than the variance of the
population the sample is taken from.

2. In the usual formula you divide by the number of participants (N); in the
formula for estimating a population’s variance from the scores in a sample, you
divide by the number of participants in the sample minus 1 (that is, N – 1).

3. (a) Degrees of freedom consist of the number of scores free to vary. (b) The
degrees of freedom in a t test for a single sample consist of the number of scores
in the sample minus 1. (c) In estimating the population variance, the formula is
the sum of squared deviations divided by the degrees of freedom. (d) t distributions
differ slightly from each other according to the degrees of freedom.

4. (a) A t distribution differs from a normal curve in that it has heavier tails; that is,
more scores at the extremes. (b) The more degrees of freedom, the closer the
shape (including the tails) is to a normal curve. (c) The cutoffs for significance
are more extreme for a t distribution than for a normal curve.

5. In the t test you (a) estimate the population variance from the sample (it is not
known in advance); (b) look up the cutoff on a t table in which you also have to
take into account the degrees of freedom (you don’t use a normal curve table); and
(c) find that your sample’s score on the comparison distribution, which is a
t distribution (not a normal curve), is a t score (not a Z score).

6. (a) Steps of hypothesis testing:
❶ Restate the question as a research hypothesis and a null hypothesis about the
populations. There are two populations:

Population 1: People who are given the experimental procedure.
Population 2: The general population.

The research hypothesis is that Population 1 will score lower than Population 2.
The null hypothesis is that Population 1 will not score lower than Population 2.

❷ Determine the characteristics of the comparison distribution.

a. The mean of the distribution of means is 23.
b. The standard deviation is figured as follows:

●A Figure the estimated population variance. The sample mean is
(20 + 22 + 22 + 20)/4 = 84/4 = 21, so
$S^2 = SS/df = [(20 - 21)^2 + (22 - 21)^2 + (22 - 21)^2 + (20 - 21)^2]/(4 - 1) = 4/3 = 1.33$.
●B Figure the variance of the distribution of means: $S^2_M = S^2/N = 1.33/4 = .33$.
●C Figure the standard deviation of the distribution of means: $S_M = \sqrt{S^2_M} = \sqrt{.33} = .57$.

c. The shape of the comparison distribution will be a t distribution with df = 3.

❸ Determine the cutoff sample score on the comparison distribution at
which the null hypothesis should be rejected. From Table A–2, the cutoff
for a one-tailed t test at the .05 level for df = 3 is –2.353. The cutoff t score
is negative, since the research hypothesis is that the procedure produces a
lower score.

❹ Determine your sample’s score on the comparison distribution.
$t = (M - \mu)/S_M = (21 - 23)/.57 = -3.51$.

❺ Decide whether to reject the null hypothesis. The t of –3.51 is more
extreme than the needed t of –2.353. Therefore, reject the null hypothesis;
the research hypothesis is supported.

(b) Sketches of distributions are shown in Figure 7–5.

Figure 7–5 Distributions for answer to “How Are You Doing?” question 6b.
(Comparison distribution: t distribution with df = 3; cutoff t = –2.353.)

The t Test for Dependent Means
The situation you just learned about (the t test for a single sample) is for when you
know the population mean but not its variance and you have a single sample of
scores. It turns out that in most research you do not even know the population’s
mean; plus, in most research situations you usually have not one set, but two sets, of
scores. These two things, not knowing the population mean and having two sets of
scores, are very, very common.

The rest of this chapter focuses specifically on this important research situation in
which you have two scores from each person in your sample. This kind of research
situation is called a repeated-measures design (also known as a within subjects design).
A common example is when you measure the same people before and after some
psychological or social intervention. For example, a psychologist might measure the
quality of men’s communication before and after receiving premarital counseling.

The hypothesis-testing procedure for the situation in which each person is
measured twice (that is, for the situation in which we have a repeated-measures
design) is a t test for dependent means. It has the name “dependent means” because
the means for the two groups of scores (for example, a group of before-scores and a
group of after-scores) are dependent on each other: both come from the same people.
(In Chapter 8, we consider the situation in which you compare scores from two
different groups of people, a research situation you analyze using a t test for
independent means.)

You do a t test for dependent means exactly the same way as a t test for a single
sample, except that (a) you use something called difference scores, and (b) you
assume that the population mean (of the difference scores) is 0. We will now consider
each of these two new aspects.

repeated-measures design: research strategy in which each person is tested
more than once; same as within subjects design.

t test for dependent means: hypothesis-testing procedure in which there are two
scores for each person and the population variance is not known; it determines the
significance of a hypothesis that is being tested using difference or change scores
from a single group of people.

Difference Scores
With a repeated-measures design, your sample includes two scores for each person
instead of just one. The way you handle this is to make the two scores per person into
one score per person! You do this magic by creating difference scores: For each person,
you subtract one score from the other. If the difference is before versus after,
difference scores are also called change scores.

Consider the example of the quality of men’s communication before and after
receiving premarital counseling. The psychologist subtracts the communication quality
score before the counseling from the communication quality score after the
counseling. This gives an after-minus-before difference score for each man. When the
two scores are a before-score and an after-score, we usually take the after-score minus
the before-score to indicate the change.

Once you have the difference score for each person in the study, you do the rest of
the hypothesis testing with difference scores. That is, you treat the study as if there were
a single sample of scores (scores that in this situation happen to be difference scores).
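Making difference scores is a one-line transformation. Here is a minimal Python sketch (the function name `difference_scores` is our own), using the first three husbands from Table 7–6 and the chapter's after-minus-before convention:

```python
def difference_scores(before, after):
    """After-minus-before change score for each person, paired by position."""
    if len(before) != len(after):
        raise ValueError("each person needs both a before and an after score")
    return [a - b for b, a in zip(before, after)]


print(difference_scores([126, 133, 126], [115, 125, 96]))  # [-11, -8, -30]
```

From this point on, the resulting list is treated exactly like a single sample of scores.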

Population of Difference Scores with a Mean of 0
So far in the research situations we have considered in this book, you have always
known the mean of the population to which you compared your sample’s mean. For
example, in the college dormitory survey of hours studied, you knew the population
mean was 17 hours. However, now we are using difference scores, and we usually
don’t know the mean of the population of difference scores.

Here is the solution. Ordinarily, the null hypothesis in a repeated-measures
design is that on the average there is no difference between the two groups of scores.
For example, the null hypothesis in a study of the quality of men’s communication
before and after receiving premarital counseling is that on the average there is no
difference between communication quality before and after the counseling. What does
no difference mean? Saying there is on the average no difference is the same as saying
that the mean of the population of the difference scores is 0. Therefore, when
working with difference scores, you are comparing the population of difference
scores that your sample of difference scores comes from to a population of difference
scores with a mean of 0. In other words, with a t test for dependent means, what
we call Population 2 will ordinarily have a mean of 0 (that is, it is a population of
difference scores that has a mean of 0).

Example of a t Test for Dependent Means
Olthoff (1989) tested the communication quality of couples three months before and
again three months after marriage. One group studied was 19 couples who had
received ordinary (very minimal) premarital counseling from the ministers who were
going to marry them. (To keep the example simple, we will focus on just this one
group and only on the husbands in the group. Scores for wives were similar, though
somewhat more varied, making it a more complicated example for learning the t test.)

The scores for the 19 husbands are listed in the “Before” and “After” columns in
Table 7–6, followed by all the t test figuring. (The distributions involved are shown
in Figure 7–6.) The crucial column for starting the analysis is the difference scores.
For example, the first husband, whose communication quality was 126 before
marriage and 115 after, had a difference of –11. (We figured after minus before, so that
an increase is positive and a decrease, as for this husband, is negative.) The mean of
the difference scores is –12.05. That is, on the average, these 19 husbands’
communication quality decreased by about 12 points.

Is this decrease significant? In other words, how likely is it that this sample of
difference scores is a random sample from a population of difference scores whose
mean is 0?

difference scores: difference between a person’s score on one testing and the
same person’s score on another testing; often an after-score minus a before-score,
in which case it is also called a change score.


❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are two populations:

Population 1: Husbands who receive ordinary premarital counseling.
Population 2: Husbands whose communication quality does not change from
before to after marriage. (In other words, it is a population of husbands whose
mean difference in communication quality from before to after marriage is 0.)

The research hypothesis is that Population 1’s mean difference score
(communication quality after marriage minus communication quality before marriage)
is different from Population 2’s mean difference score (of zero). That is, the
research hypothesis is that husbands who receive ordinary premarital counseling,
like the husbands Olthoff studied, do change in communication quality from
before to after marriage. The null hypothesis is that the populations are the same:
that the husbands who receive ordinary premarital counseling do not change in
their communication quality from before to after marriage.

Table 7–6 t Test for Communication Quality Scores Before and After Marriage for
19 Husbands Who Received Ordinary Premarital Counseling

Husband  Before  After  Difference        Deviation         Squared
                        (After – Before)  (Difference – M)  Deviation
   A        126    115               –11              1.05       1.10
   B        133    125                –8              4.05      16.40
   C        126     96               –30            –17.95     322.20
   D        115    115                 0             12.05     145.20
   E        108    119                11             23.05     531.30
   F        109     82               –27            –14.95     223.50
   G        124     93               –31            –18.95     359.10
   H         98    109                11             23.05     531.30
   I         95     72               –23            –10.95     119.90
   J        120    104               –16             –3.95      15.60
   K        118    107               –11              1.05       1.10
   L        126    118                –8              4.05      16.40
   M        121    102               –19             –6.95      48.30
   N        116    115                –1             11.05     122.10
   O         94     83               –11              1.05       1.10
   P        105     87               –18             –5.95      35.40
   Q        123    121                –2             10.05     101.00
   R        125    100               –25            –12.95     167.70
   S        128    118               –10              2.05       4.20
   Σ:     2,210  1,981              –229                     2,762.90

For difference scores:
$M = -229/19 = -12.05$
$\mu = 0$ (assumed as a no-change baseline of comparison).
$S^2 = SS/df = 2{,}762.90/(19 - 1) = 153.49$
$S^2_M = S^2/N = 153.49/19 = 8.08$
$S_M = \sqrt{S^2_M} = \sqrt{8.08} = 2.84$
t with df = 18 needed for 5% level, two-tailed = ±2.101.
$t = (M - \mu)/S_M = (-12.05 - 0)/2.84 = -4.24$
Decision: Reject the null hypothesis.

Source: Data from Olthoff (1989).

As in previous chapters, Population 2 is the population for when the null
hypothesis is true.

Notice that you have no actual information about Population 2 husbands.
The husbands in the study are a sample of Population 1 husbands. For the
purposes of hypothesis testing, you set up Population 2 as a kind of straw man
comparison group. That is, for the purpose of the analysis, you set up a comparison
group of husbands who, if measured before and after marriage, would on the
average show no difference.

❷ Determine the characteristics of the comparison distribution. If the null
hypothesis is true, the mean of the population of difference scores is 0. The
variance of the population of difference scores can be estimated from the sample
of difference scores. As shown in Table 7–6, the sum of squared deviations of
the difference scores from the mean of the difference scores is 2,762.90. With 19
husbands in the study, there are 18 degrees of freedom. Dividing the sum of
squared deviation scores by the degrees of freedom gives an estimated population
variance of difference scores of 153.49.

The distribution of means (from this population of difference scores) has a
mean of 0, the same as the mean of the population of difference scores. The
variance of the distribution of means of difference scores is the estimated population
variance of difference scores (153.49) divided by the sample size (19), which
gives 8.08. The standard deviation of the distribution of means of difference
scores is 2.84, the square root of 8.08. Because Olthoff was using an estimated
population variance, the comparison distribution is a t distribution. The estimate
of the population variance of difference scores is based on 18 degrees of freedom,
so this comparison distribution is a t distribution for 18 degrees of freedom.

Figure 7–6 Distributions for the Olthoff (1989) example of a t test for dependent
means. (Population of difference scores, distribution of means of difference scores,
and the sample's distribution; axes show raw scores and t scores.)

❸ Determine the cutoff sample score on the comparison distribution at which
the null hypothesis should be rejected. Olthoff used a two-tailed test to allow
for either an increase or decrease in communication quality. Using the .05
significance level and 18 degrees of freedom, Table A–2 shows cutoff t scores of
+2.101 and –2.101.

❹ Determine your sample’s score on the comparison distribution. Olthoff’s
sample had a mean difference score of –12.05. That is, the mean was 12.05
points below the mean of 0 on the distribution of means of difference scores.
The standard deviation of the distribution of means of difference scores is 2.84.
Thus, the mean of the difference scores of –12.05 is 4.24 standard deviations
below the mean of the distribution of means of difference scores. So Olthoff’s
sample of difference scores has a t score of –4.24.

❺ Decide whether to reject the null hypothesis. The t of –4.24 for the sample of
difference scores is more extreme than the needed t of ±2.101. Thus, you can
reject the null hypothesis: Olthoff’s husbands are from a population in which
husbands’ communication quality is different after marriage from what it was
before (it is lower).
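The whole Olthoff analysis can be reproduced as a single-sample t test on the difference scores against a population mean of 0. Here is a minimal Python sketch using the Table 7–6 data (the function name `dependent_means_t` is our own). The ±2.101 cutoff is taken from the t table rather than computed.

```python
import math

before = [126, 133, 126, 115, 108, 109, 124, 98, 95, 120,
          118, 126, 121, 116, 94, 105, 123, 125, 128]
after = [115, 125, 96, 115, 119, 82, 93, 109, 72, 104,
         107, 118, 102, 115, 83, 87, 121, 100, 118]


def dependent_means_t(before, after, mu=0.0):
    """t test for dependent means: a single-sample t test on the
    after-minus-before difference scores against a population mean of mu."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean = sum(diffs) / n
    ss = sum((d - mean) ** 2 for d in diffs)
    s2 = ss / (n - 1)        # estimated population variance of difference scores
    s_m = math.sqrt(s2 / n)  # SD of the distribution of means of difference scores
    return mean, n - 1, (mean - mu) / s_m


mean, df, t = dependent_means_t(before, after)
print(round(mean, 2), df, round(t, 2))  # -12.05 18 -4.24
print(abs(t) >= 2.101)                  # True: reject the null hypothesis
```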

Olthoff’s actual study was more complex. You may be interested to know that he
found that the wives also showed this decrease in communication quality after
marriage. But a group of similar engaged couples who were given special communication
skills training by their ministers (much more than the usual short session) had no
significant decline in marital communication quality after marriage. In fact, there is a
great deal of research showing that on the average marital happiness declines steeply
over time (VanLaningham et al., 2001). And many studies have now shown the value
of a full course of premarital communications training. For example, a recent
representative survey of 3,344 adults in the United States showed that those who had
attended a premarital communication program had significantly greater marital
satisfaction, had less marital conflict, and were 31% less likely to divorce (Stanley et al.,
2006). Further, benefits were greatest for those with a college education!

Summary of Steps for a t Test for Dependent Means
Table 7–7 summarizes the steps for a t test for dependent means.5

A Second Example of a t Test for Dependent Means
Here is another example. A team of researchers examined the brain systems involved
in human romantic love (Aron et al., 2005). One issue was whether romantic love
engages a part of the brain called the caudate (a brain structure that is engaged when
people win money, are given cocaine, and other such “rewards”). Thus, the researchers
recruited people who had very recently fallen “madly in love.” (For example, to be in
the study, participants had to think about their partner at least 80% of their waking
hours.) Participants brought a picture of their beloved with them, plus a picture of a
familiar, neutral person of the same age and sex as their beloved. Participants then went
into the functional magnetic resonance imaging (fMRI) machine and their brains were
scanned while they looked at the two pictures: 30 seconds at the neutral person’s
picture, 30 seconds at their beloved, 30 seconds at the neutral person, and so forth.


Step ❷ of hypothesis testing for the t test for dependent means is more complex than before. This can make it easy to lose track of the purpose of this step. Step ❷ of hypothesis testing determines the characteristics of the comparison distribution. In the case of the t test for dependent means, this comparison distribution is a distribution of means of difference scores. The key characteristics of this distribution are its mean (which is assumed to equal 0), its standard deviation (which is estimated as $S_M$), and its shape (a t distribution with degrees of freedom equal to the sample size minus 1).


You now have to deal with some rather complex terms, such as the standard deviation of the distribution of means of difference scores. Although these terms are complex, there is good logic behind them. The best way to understand such terms is to break them down into manageable pieces. For example, you will notice that these new terms are the same as the terms for the t test for a single sample, with the added phrase “of difference scores.” This phrase has been added because all of the figuring for the t test for dependent means uses difference scores.

Introduction to t Tests 241

Table 7–8 shows average brain activations (mean fMRI scanner values) in the
caudate area of interest during the two kinds of pictures. (We have simplified the
example for teaching purposes, including using only 10 participants when the actual
study had 17.) It also shows the figuring of the difference scores and all the other

Table 7–7 Steps for a t Test for Dependent Means

❶ Restate the question as a research hypothesis and a null hypothesis about the populations.
❷ Determine the characteristics of the comparison distribution.
   a. Make each person’s two scores into a difference score. Do all the remaining steps using these difference scores.
   b. Figure the mean of the difference scores.
   c. Assume a mean of the distribution of means of difference scores of 0: $\mu = 0$.
   d. The standard deviation of the distribution of means of difference scores is figured as follows:
      ●A Figure the estimated population variance of difference scores: $S^2 = SS/df$.
      ●B Figure the variance of the distribution of means of difference scores: $S^2_M = S^2/N$.
      ●C Figure the standard deviation of the distribution of means of difference scores: $S_M = \sqrt{S^2_M}$.
   e. The shape is a t distribution with $df = N - 1$.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
   a. Decide the significance level and whether to use a one-tailed or a two-tailed test.
   b. Look up the appropriate cutoff in a t table.
❹ Determine your sample’s score on the comparison distribution: $t = (M - \mu)/S_M$.
❺ Decide whether to reject the null hypothesis: Compare the scores from Steps ❸ and ❹.

Table 7–8 t Test for a Study of Romantic Love and Brain Activation in Part of the Caudate

                    Brain Activation
Student   Beloved’s photo   Control photo   Difference (Beloved – Control)   Difference – M   (Difference – M)²
1         1487.8            1487.2            .6                              –.800             .640
2         1329.4            1328.1           1.3                              –.100             .010
3         1407.9            1405.9           2.0                               .600             .360
4         1236.1            1234.0           2.1                               .700             .490
5         1299.8            1298.2           1.6                               .200             .040
6         1447.2            1444.7           2.5                              1.100            1.210
7         1354.1            1354.3           –.2                             –1.600            2.560
8         1204.6            1203.7            .9                              –.500             .250
9         1322.3            1320.8           1.5                               .100             .010
10        1388.5            1386.8           1.7                               .300             .090
Σ         13477.7           13463.7         14.0                                               5.660

For difference scores:
$M = 14.0/10 = 1.400$.
$\mu = 0$ (assumed as a no-change baseline of comparison).
$S^2 = SS/df = 5.660/(10 - 1) = 5.660/9 = .629$.
$S^2_M = S^2/N = .629/10 = .063$.
$S_M = \sqrt{S^2_M} = \sqrt{.063} = .251$.
t with df = 9 needed for 5% level, one-tailed = 1.833.
$t = (M - \mu)/S_M = (1.400 - 0)/.251 = 5.58$.
Decision: Reject the null hypothesis.

Source: Data based on Aron et al. (2005).

242 Chapter 7

figuring for the t test for dependent means. Figure 7–7 shows the distributions in-
volved. Here are the steps of hypothesis testing:

❶ Restate the question as a research hypothesis and a null hypothesis about
the populations. There are two populations:

Population 1: Individuals like those tested in this study.
Population 2: Individuals whose brain activation in the caudate area of interest is
the same when looking at a picture of their beloved and a picture of a familiar,
neutral person.

The research hypothesis is that Population 1’s mean difference score (brain activa-
tion when viewing the beloved’s picture minus brain activation when viewing the
neutral person’s picture) is greater than Population 2’s mean difference score (of no
difference). That is, the research hypothesis is that brain activation in the caudate
area of interest is greater when viewing the beloved person’s picture than when
viewing the neutral person’s picture. The null hypothesis is that Population 1’s
mean difference score is not greater than Population 2’s. That is, the null hypothe-
sis is that brain activation in the caudate area of interest is not greater when viewing
the beloved person’s picture than when viewing the neutral person’s picture.

❷ Determine the characteristics of the comparison distribution.
a. Make each person’s two scores into a difference score. This is shown in the column labeled “Difference” in Table 7–8. You do all the remaining steps using these difference scores.

Figure 7–7 Distributions for the example of romantic love and brain activation in part of the caudate [comparison distribution (t) of means of difference scores; raw-score axis marked –.251, 0, .251; t-score axis marked –1, 0, 1].


b. Figure the mean of the difference scores. The sum of the difference scores (14.0) divided by the number of difference scores (10) gives a mean of the difference scores of 1.400. So, $M = 1.400$.

c. Assume a mean of the distribution of means of difference scores of 0: $\mu = 0$.
d. The standard deviation of the distribution of means of difference scores is figured as follows:
●A Figure the estimated population variance of difference scores: $S^2 = SS/df = 5.660/(10 - 1) = .629$.
●B Figure the variance of the distribution of means of difference scores: $S^2_M = S^2/N = .629/10 = .063$.
●C Figure the standard deviation of the distribution of means of difference scores: $S_M = \sqrt{S^2_M} = \sqrt{.063} = .251$.

e. The shape is a t distribution with $df = N - 1$. Therefore, the comparison distribution is a t distribution for 9 degrees of freedom. It is a t distribution because we figured its variance based on an estimated population variance. It has 9 degrees of freedom because there were 9 degrees of freedom in the estimate of the population variance.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
a. We will use the standard .05 significance level. This is a one-tailed test because the researchers were interested only in a specific direction of difference.
b. Using the .05 significance level with 9 degrees of freedom, Table A–2 shows a cutoff t of 1.833. In Table 7–8, the difference score is figured as brain activation when viewing the beloved’s picture minus brain activation when viewing the neutral person’s picture. Thus, the research hypothesis predicts a positive difference score, which means that our cutoff is +1.833.

❹ Determine your sample’s score on the comparison distribution. $t = (M - \mu)/S_M = (1.400 - 0)/.251 = 5.58$. The sample’s mean difference of 1.400 is 5.58 standard deviations (of .251 each) above the mean of 0 on the distribution of means of difference scores.

❺ Decide whether to reject the null hypothesis. The sample’s t score of 5.58 is more extreme than the cutoff t of 1.833. You can reject the null hypothesis. Brain activation in the caudate area of interest is greater when viewing a beloved’s picture than when viewing a neutral person’s picture. The results of this study are not limited to North Americans. Recently, the study was replicated, with virtually identical results, in Beijing with Chinese students who were intensely in love (Xu et al., 2007).
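The figuring in Steps ❷ through ❺ can be reproduced in a few lines of Python. This is a minimal sketch, not the authors’ method: the function name `dependent_means_t` and its returned dictionary are our own illustrative choices, the data are the ten pairs of scores from Table 7–8, and the one-tailed cutoff of 1.833 is copied from the text rather than computed.

```python
import math

# Brain activation scores from Table 7-8 (beloved's photo, then control photo)
beloved = [1487.8, 1329.4, 1407.9, 1236.1, 1299.8,
           1447.2, 1354.1, 1204.6, 1322.3, 1388.5]
control = [1487.2, 1328.1, 1405.9, 1234.0, 1298.2,
           1444.7, 1354.3, 1203.7, 1320.8, 1386.8]


def dependent_means_t(first, second, mu=0.0):
    """Figure a t test for dependent means following the steps in Table 7-7."""
    # Step 2a: one difference score per person (first minus second)
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    df = n - 1
    # Step 2b: mean of the difference scores
    m = sum(diffs) / n
    # Step 2d, A: estimated population variance of difference scores, S^2 = SS / df
    ss = sum((d - m) ** 2 for d in diffs)
    s2 = ss / df
    # Step 2d, B and C: variance and SD of the distribution of means
    s2_m = s2 / n
    s_m = math.sqrt(s2_m)
    # Step 4: the sample's t score on the comparison distribution
    t = (m - mu) / s_m
    return {"M": m, "S2": s2, "S2_M": s2_m, "S_M": s_m, "df": df, "t": t}


result = dependent_means_t(beloved, control)
CUTOFF = 1.833  # one-tailed .05 cutoff for df = 9, taken from the text

print(f"M = {result['M']:.3f}, S_M = {result['S_M']:.3f}, t = {result['t']:.2f}")
# Step 5: one-tailed test in the predicted (positive) direction
print("reject the null hypothesis" if result["t"] > CUTOFF
      else "do not reject the null hypothesis")
```

Run as written, this prints M = 1.400, S_M = 0.251, t = 5.58 and the decision to reject the null hypothesis, matching Table 7–8. The code keeps full precision instead of rounding $S_M$ to .251, so small differences can appear beyond the second decimal of t.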

t Test for Dependent Means with Scores from Pairs of Research Participants
The t test for dependent means is also called a paired-samples t test, t test for correlated
means, t test for matched samples, and t test for matched pairs. Each of these names
comes from the same idea that in this kind of t test you are comparing two sets of scores
that are related to each other in a direct way. In the t test for dependent means examples
in this chapter, the two sets of scores have been related because each individual had a
score in both sets of scores (for example, a score before a procedure and a score after a
procedure). However, you can also use a t test for dependent means with scores from
pairs of research participants, considering each pair as if it were one person, and figur-
ing the difference score for each pair. For example, suppose you have 30 married cou-
ples and want to test whether wives consistently do more housework than husbands.


You could figure for each couple a difference score of the wife’s hours of housework
per week minus her husband’s number of hours of housework per week. There are also
situations in which experimenters create pairs. For example, a researcher might put
participants into pairs to do a puzzle task together and, for each pair, assign one to be a
leader and one a follower. At the end of the study, participants privately fill out a ques-
tionnaire about how much they enjoyed the interaction. The procedure for analyzing
this study would be to create a difference score for each pair by taking the enjoyment
rating of the leader minus the enjoyment rating of the follower.

Review and Comparison of Z Test, t Test for a Single Sample, and t Test for Dependent Means
In Chapter 5 you learned about the Z test; in this chapter you have learned about the
t test for a single sample and the t test for dependent means. Table 7–9 provides a
review and comparison of the Z test, the t test for a single sample, and the t test for
dependent means.

We recommend that you spend some time carefully going through Table 7–9. Test your understanding of the different tests by covering up portions of the table and trying to recall the hidden information. Also, take a look at Chapter Note 3 (page 268) for a discussion of the terminology used in the formulas.

Table 7–9 Review of the Z Test, the t Test for a Single Sample, and the t Test for Dependent Means

                                                          Type of Test
Features                                Z Test                       t Test for a Single Sample   t Test for Dependent Means
Population variance is known            Yes                          No                           No
Population mean is known                Yes                          Yes                          No
Number of scores for each participant   1                            1                            2
Shape of comparison distribution        Z distribution               t distribution               t distribution
Formula for degrees of freedom          Not applicable               $df = N - 1$                 $df = N - 1$
Formula                                 $Z = (M - \mu_M)/\sigma_M$   $t = (M - \mu)/S_M$          $t = (M - \mu)/S_M$
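The decision logic in Table 7–9 can be written as a small function. This is an illustrative sketch only: `choose_test` and its three arguments are hypothetical names, and it covers just the three tests in the table.

```python
def choose_test(variance_known: bool, mean_known: bool, scores_per_participant: int) -> str:
    """Return which of the three tests in Table 7-9 fits a research situation."""
    if scores_per_participant == 2:
        # Two scores per participant (or per pair): analyze difference scores
        return "t test for dependent means"
    if scores_per_participant != 1:
        raise ValueError("Table 7-9 covers only one or two scores per participant")
    if mean_known and variance_known:
        # Known population mean and variance: comparison distribution is a Z distribution
        return "Z test"
    if mean_known:
        # Known mean, unknown variance: estimate the variance and use a t distribution
        return "t test for a single sample"
    raise ValueError("one score per participant requires a known population mean")


print(choose_test(variance_known=False, mean_known=True, scores_per_participant=1))
```

The order of the checks mirrors the table: the number of scores per participant is tested first, because two scores per participant always points to the t test for dependent means.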

How are you doing?

1. Describe the situation in which you would use a t test for dependent means.
2. When doing a t test for dependent means, what do you do with the two scores you have for each participant?
3. In a t test for dependent means, (a) what is usually considered to be the mean of the “known” population (Population 2)? (b) Why?
4. Five individuals are tested before and after an experimental procedure; their scores are given in the following table. Test the hypothesis that there is no change, using the .05 significance level. (a) Use the steps of hypothesis testing and (b) sketch the distributions involved.

Person Before After

1 20 30
2 30 50
3 20 10
4 40 30
5 30 40


5. What about the research situation makes the difference in whether you should
carry out a Z test or a t test for a single sample?

6. What about the research situation makes the difference in whether you should
carry out a t test for a single sample or a t test for dependent means?

Figure 7–8 Distributions for answer to “How Are You Doing?” question 4 [comparison distribution (t) of means of difference scores; axes: raw scores and t scores].


4. (b) The distributions are shown in Figure 7–8.
5. As shown in Table 7–9, whether the population variance is known determines whether you should carry out a Z test or a t test for a single sample. You use a Z test when the population variance is known and you use the t test for a single sample when it is not known.

6. As shown in Table 7–9, whether the population mean is known and whether there are one or two scores for each participant determine whether you should carry out a t test for a single sample or a t test for dependent means. You use a t test for a single sample when you know the population mean and you have one score for each participant; you use the t test for dependent means when you do not know the population mean and there are two scores for each participant.




1. A t test for dependent means is used when you are doing hypothesis testing and you have two scores for each participant (such as a before-score and an after-score) and the population variance is unknown. It is also used when a study compares participants who are organized into pairs.

2. Subtract one from the other to create a difference (or change) score for each person. The t test is then done with these difference (or change) scores.

3. (a) The mean of the “known” population (Population 2) is 0. (b) You are comparing your sample to a situation in which there is no difference: a population of difference scores in which the average difference is 0.

4. (a) Steps of hypothesis testing (all figuring is shown in Table 7–10):
❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: People like those tested before and after the experimental procedure.
Population 2: People whose scores are the same before and after the experimental procedure.

The research hypothesis is that Population 1’s mean change score (after minus before) is different from Population 2’s. The null hypothesis is that Population 1’s mean change score is the same as Population 2’s.

❷ Determine the characteristics of the comparison distribution. The mean of the distribution of means of difference scores (the comparison distribution) is 0; the standard deviation of the distribution of means of difference scores is 6; it is a t distribution with 4 degrees of freedom.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. For a two-tailed test at the .05 level, the cutoff sample scores are –2.776 and +2.776.

❹ Determine your sample’s score on the comparison distribution. $t = (M - \mu)/S_M = (4 - 0)/6 = .67$.

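Table 7–10 Figuring for Answer to “How Are You Doing?” Question 4

Person   Before   After   Difference (After – Before)   Difference – M   (Difference – M)²
1        20       30        10                              6                36
2        30       50        20                             16               256
3        20       10       –10                            –14               196
4        40       30       –10                            –14               196
5        30       40        10                              6                36
Σ        140      160       20                                              720

For difference scores:
$M = 20/5 = 4.00$.
$\mu = 0$.
$S^2 = SS/df = 720/(5 - 1) = 180$.
$S^2_M = S^2/N = 180/5 = 36$.
$S_M = \sqrt{S^2_M} = \sqrt{36} = 6$.
t for df = 4 needed for 5% significance level, two-tailed = ±2.776.
$t = (M - \mu)/S_M = (4 - 0)/6 = .67$.
Decision: Do not reject the null hypothesis.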


Assumptions of the t Test for a Single Sample
and the t Test for Dependent Means
As we have seen, when you are using an estimated population variance, the comparison
distribution is a t distribution. However, the comparison distribution will be exactly a
t distribution only if the distribution of individuals follows a normal curve. Otherwise,
the comparison distribution will follow some other (usually unknown) shape.

Thus, strictly speaking, a normal population is a requirement within the logic
and mathematics of the t test. A requirement like this for a hypothesis-testing
procedure is called an assumption. That is, a normal population distribution is one
assumption of the t test. The effect of this assumption is that if the population distri-
bution is not normal, the comparison distribution will be some indeterminate shape
other than a t distribution—and thus the cutoffs on the t table will be incorrect.

Unfortunately, when you do a t test, you don’t know whether the population is nor-
mal. This is because, when doing a t test, usually all you have to go on are the scores in
your sample. Fortunately, however, as we saw in Chapter 3, distributions in psychology
research quite often approximate a normal curve. (This also applies to distributions of
difference scores.) Also, statisticians have found that, in practice, you get reasonably ac-
curate results with t tests even when the population is rather far from normal. In other
words, the t test is said to be robust over moderate violations of the assumption of a nor-
mal population distribution. How statisticians figure out the robustness of a test is an
interesting topic, which is described in Box 8–1 in Chapter 8.

The only very common situation in which using a t test for dependent means is
likely to give a seriously distorted result is when you are using a one-tailed test and
the population is highly skewed (is very asymmetrical, with a much longer tail on
one side than the other). Thus, you need to be cautious about your conclusions when
doing a one-tailed test if the sample of difference scores is highly skewed, suggest-
ing the population it comes from is also highly skewed.

Effect Size and Power for the t Test
for Dependent Means

Effect Size

You can figure the effect size for a study using a t test for dependent means the same way as in Chapter 6.⁶ It is the difference between the population means divided by the population standard deviation: $d = (\mu_1 - \mu_2)/\sigma$. When using this formula for a t test for dependent means, $\mu_1$ is for the predicted mean of the population of difference scores, $\mu_2$ (the “known” population mean) is almost always 0, and $\sigma$ usually stands for the standard deviation of the population of difference scores. The conventions for effect size for a t test for dependent means are also the same as you learned for the situation we considered in Chapter 6: A small effect size is .20, a medium effect size is .50, and a large effect size is .80.

Consider an example. A sports psychologist plans a study on attitudes toward teammates before versus after a game. She will administer an attitude questionnaire twice, once before and once after a game. Suppose that the smallest before-after difference that would be of any importance is 4 points on the questionnaire. Also suppose that, based on related research, the researcher figures that the standard deviation of difference scores on this attitude questionnaire is about 8 points. Thus, $\mu_1 = 4$ and $\sigma = 8$. Applying the effect size formula, $d = (\mu_1 - \mu_2)/\sigma = (4 - 0)/8 = .50$. In terms of the effect size conventions, her planned study has a medium effect size.

assumption: condition, such as a population’s having a normal distribution, required for carrying out a particular hypothesis-testing procedure; a part of the mathematical foundation for the accuracy of the tables used in determining cutoff values.

robustness: extent to which a particular hypothesis-testing procedure is reasonably accurate even when its assumptions are violated.


To estimate the effect size after a study, use the actual mean of your sample’s difference scores as your estimate of $\mu_1$, and use S (for the population of difference scores) as your estimate of $\sigma$.

Consider our first example of a t test for dependent means, the study of husbands’ change in communication quality. In that study, the mean of the difference scores was –12.05. The estimated population standard deviation of the difference scores would be 12.39. That is, we figured the estimated variance of the difference scores ($S^2$) to be 153.49, and $\sqrt{S^2} = 12.39$. Therefore, the estimated effect size is $d = (\mu_1 - \mu_2)/\sigma = (M - 0)/S = (-12.05 - 0)/12.39 = -.97$. This is a very large effect size. (The negative sign for the effect size means that the large effect was a decrease.)
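The effect size formula, used both for a planned study and for a completed one, can be sketched in Python. The helper names are hypothetical, and `label_effect` is our own rough reading of the .20/.50/.80 conventions (treating each convention as the lower edge of a band), not a rule from the text. The inputs are the sports psychology example ($\mu_1 = 4$, $\sigma = 8$) and the husbands’ communication example ($M = -12.05$, $S^2 = 153.49$).

```python
import math


def effect_size(mu_1: float, sigma: float, mu_2: float = 0.0) -> float:
    """d = (mu_1 - mu_2) / sigma; for difference scores mu_2 is almost always 0."""
    return (mu_1 - mu_2) / sigma


def label_effect(d: float) -> str:
    """Rough reading of the conventions (.20 small, .50 medium, .80 large).

    Treating each convention as the lower edge of a band is an assumption.
    The sign only gives the direction of the effect, so it is ignored here.
    """
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "smaller than small"


# Planned study: predicted mean difference of 4 points, population SD of 8
planned = effect_size(4, 8)
# Completed study: M = -12.05 and S = sqrt(153.49) stand in for mu_1 and sigma
estimated = effect_size(-12.05, math.sqrt(153.49))

print(f"planned d = {planned:.2f} ({label_effect(planned)})")
print(f"estimated d = {estimated:.2f} ({label_effect(estimated)})")
```

This prints d = 0.50 (medium) for the planned study and d = –0.97 (large) for the completed one, in line with the figures above.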

Power for a t test for dependent means can be determined using a power table, a power software package, or an Internet power calculator. Table 7–11 gives the approximate power at the .05 significance level for small, medium, and large effect sizes and one-tailed and two-tailed tests. In the sports psychology example, the researcher expected a medium effect size ($d = .50$). If she planned to conduct the study using the .05 level, two-tailed, with 20 participants, the study would have a power of .59. This means that, if the research hypothesis is true and has a medium effect size, there is a 59% chance that this study will come out significant.
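Power is the probability that a study comes out significant when the research hypothesis is true, so one way to see what a figure such as .59 means is to simulate many studies. This Monte Carlo sketch is our own illustration, not how Table 7–11 was built: it assumes normally distributed difference scores with mean d and standard deviation 1 (so the true effect size is exactly d), and it takes the two-tailed .05 cutoff for df = 19, 2.093, from a standard t table. The function name `simulated_power` is hypothetical.

```python
import math
import random


def simulated_power(d: float, n: int, t_cutoff: float,
                    reps: int = 20_000, seed: int = 1) -> float:
    """Estimate the power of a two-tailed t test for dependent means by simulation.

    Assumes difference scores are normal with mean d and SD 1, so the true
    effect size is d. t_cutoff must be the two-tailed cutoff for df = n - 1.
    """
    rng = random.Random(seed)
    significant = 0
    for _ in range(reps):
        # One simulated study: n difference scores
        diffs = [rng.gauss(d, 1.0) for _ in range(n)]
        m = sum(diffs) / n
        s2 = sum((x - m) ** 2 for x in diffs) / (n - 1)  # S^2 = SS / df
        t = m / math.sqrt(s2 / n)                          # t = (M - 0) / S_M
        if abs(t) > t_cutoff:
            significant += 1
    # Proportion of simulated studies that came out significant
    return significant / reps


# Medium effect, 20 participants, .05 level, two-tailed (cutoff for df = 19)
print(round(simulated_power(0.50, 20, 2.093), 2))
```

With these settings the estimate should land in the mid .50s, close to the approximate .59 that Table 7–11 gives for a medium effect with 20 participants. Setting d to 0 instead gives a rate near .05, which is simply the significance level.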

The power table (Table 7–11) is also useful when you are reading about a non-
significant result in a published study. Suppose that a study using a t test for dependent
means has a nonsignificant result. The study tested significance at the .05 level, was
two-tailed, and had 10 participants. Should you conclude that there is in fact no differ-
ence at all in the populations? Probably not. Even assuming a medium effect size, Table
7–11 shows that there is only a 32% chance of getting a significant result in this study.



Recall from Chapter 6 that power can be expressed as a probability (such as .71) or as a percentage (such as 71%). Power is expressed as a probability in Table 7–11 (as well as in power tables in later chapters).

Table 7–11 Approximate Power for Studies Using the t Test for Dependent Means for Testing Hypotheses at the .05 Significance Level

Difference Scores            Effect Size
in Sample (N)        d = .20   d = .50   d = .80

One-tailed test
10                   .15       .46       .78
20                   .22       .71       .96
30                   .29       .86       *
40                   .35       .93       *
50                   .40       .97       *
100                  .63       *         *

Two-tailed test
10                   .09       .32       .66
20                   .14       .59       .93
30                   .19       .77       .99
40                   .24       .88       *
50                   .29       .94       *
100                  .55       *         *

*Power is nearly 1.


Consider another study that was not significant. This study also used the .05 sig-
nificance level, two-tailed. This study had 100 research participants. Table 7–11 tells
you that there would be a 55% chance of the study’s coming out significant if there
were even a true small effect size in the population. If there were a medium effect
size in the population, the table indicates that there is almost a 100% chance that this
study would have come out significant. Thus, in this study with 100 participants, we
could conclude from the results that in the population there is probably at most a
small difference.

To keep Table 7–11 simple, we have given power figures for only a few differ-
ent numbers of participants (10, 20, 30, 40, 50, and 100). This should be adequate for
the kinds of rough evaluations you need to make when evaluating results of research

Planning Sample Size
Table 7–12 gives the approximate number of participants needed for 80% power for
a planned study. (Eighty percent is a common figure used by researchers for the
minimum power to make a study worth doing.) Suppose you plan a study in which
you expect a large effect size and you use the .05 significance level, two-tailed. The
table shows you would only need 14 participants to have 80% power. On the other
hand, a study using the same significance level, also two-tailed, but in which you ex-
pect only a small effect size would need 196 participants for 80% power.8
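A rough way to see where numbers like those in Table 7–12 come from is the normal approximation $N \approx ((z_{\alpha} + z_{power})/d)^2$. This is a swapped-in textbook shortcut, not the method used to build the table: it treats the comparison distribution as normal rather than t. The function name `approx_n_for_power` is hypothetical; `statistics.NormalDist` is part of Python’s standard library.

```python
import math
from statistics import NormalDist


def approx_n_for_power(d: float, power: float = 0.80,
                       alpha: float = 0.05, two_tailed: bool = True) -> int:
    """Rough N for a t test for dependent means via the normal approximation.

    N is about ((z_alpha + z_power) / d) ** 2, rounded up to a whole participant.
    """
    z = NormalDist()  # standard normal distribution
    # Critical z for the significance level (split across both tails if two-tailed)
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    # z that leaves the desired power above the cutoff
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / d) ** 2)


for d in (0.20, 0.50, 0.80):
    print(d, approx_n_for_power(d, two_tailed=False), approx_n_for_power(d))
```

The loop gives 155, 25, and 10 participants one-tailed and 197, 32, and 13 two-tailed, each within two participants of Table 7–12. Because the shortcut ignores the extra uncertainty from estimating the population variance, treat it as a sanity check on a power table, not a replacement for one.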

How are you doing?

1. (a) What is an assumption in hypothesis testing? (b) Describe a specific as-
sumption for a t test for dependent means. (c) What is the effect of violating
this assumption? (d) What does it mean to say that the t test for dependent
means is robust? (e) Describe a situation in which it is not robust.

2. How can you tell if you have violated the normal curve assumption?
3. (a) Write the formula for effect size; (b) describe each of its terms as they apply to a planned t test for dependent means; (c) describe what you use for each of its terms in figuring effect size for a completed study that used a t test for dependent means.

4. You are planning a study in which you predict the mean of the population of
difference scores to be 40, and the population standard deviation is 80. You
plan to test significance using a t test for dependent means, one-tailed, with
an alpha of .05. (a) What is the predicted effect size? (b) What is the power of
this study if you carry it out with 20 participants? (c) How many participants
would you need to have 80% power?

Table 7–12 Approximate Number of Research Participants Needed for 80% Power for the t Test for Dependent Means in Testing Hypotheses at the .05 Significance Level

               Effect Size
               d = .20   d = .50   d = .80
One-tailed     156       26        12
Two-tailed     196       33        14


Controversy: Advantages and Disadvantages
of Repeated-Measures Designs
The main controversies about t tests have to do with their relative advantages and
disadvantages compared to various alternatives (alternatives we will discuss in
Chapter 14). There is, however, one consideration that we want to comment on now.
It is about all research designs in which the same participants are tested before and
after some experimental intervention (the kind of situation the t test for dependent
means is often used for).

Studies using difference scores (that is, studies using a repeated-measures de-
sign) often have much larger effect sizes for the same amount of expected difference
between means than other kinds of research designs. That is, testing each of a group
of participants twice (once under one condition and once under a different condition)
usually produces a study with high power. In particular, this kind of study gives more
power than dividing the participants up into two groups and testing each group once
(one group tested under one condition and the other tested under another condition).
In fact, studies using difference scores usually have even more power than those in
which you have twice as many participants, but each is tested only once.

Why do repeated-measures designs have so much power? The reason is that the
standard deviation of difference scores is usually quite low. (The standard deviation
of difference scores is what you divide by to get the effect size when using difference
scores.) This produces a large effect size, which increases the power. In a repeated-
measures design, the only variation is in the difference scores. Variation among par-
ticipants on each testing’s scores is not part of the variation involved in the analysis.
As an example, look back at Table 7–8 from our romantic love and brain imaging
study. Notice that there were very great differences between the scores (fMRI scanner


1. (a) An assumption is a requirement that you must meet for the results of the hypothesis-testing procedure to be accurate. (b) The population of individuals’ difference scores is assumed to be a normal distribution. (c) The significance level cutoff from the t table is not accurate. (d) Unless you very strongly violate the assumption (that is, unless the population distribution is very far from normal), the cutoff is fairly accurate. (e) The t test for dependent means is not robust when you are doing a one-tailed test and the population distribution is highly skewed.

2. You look at the distribution of the sample of difference scores to see if it is dramatically different from a normal curve.

3. (a) $d = (\mu_1 - \mu_2)/\sigma$. (b) d is the effect size; $\mu_1$ is for the predicted mean of the population of difference scores; $\mu_2$ is the mean of the known population, which for a population of difference scores is almost always 0; $\sigma$ is for the standard deviation of the population of difference scores. (c) To estimate $\mu_1$, you use M, the actual mean of your sample’s difference scores; $\mu_2$ remains as 0; and for $\sigma$, you use S, the estimated standard deviation of the population of difference scores.

4. (a) Predicted effect size: $d = (\mu_1 - \mu_2)/\sigma = (40 - 0)/80 = .50$. (b) Power of this study: .71. (c) Number of participants for 80% power: 26.


activation values) for each participant. The first participant’s scores were around
1,487, the second’s were around 1,328, and so forth. Each person has a quite different
overall level of activation. But the differences between the two conditions were rela-
tively small. What we see in this example is that, because difference scores are all
comparing participants to themselves, the variation in them is much less (and does
not include the variation between participants). William S. Gosset, who essentially
invented the t test (see Box 7–1), made much of the higher power of repeated-
measures studies in a historically interesting controversy over an experiment about
milk, which is described in Box 7–2.
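The contrast between the large person-to-person differences and the small differences between conditions can be checked directly on the Table 7–8 scores. This sketch uses Python’s standard `statistics.stdev`, which divides by N – 1 like S in the text; the variable names are our own.

```python
from statistics import stdev

# Brain activation scores from Table 7-8
beloved = [1487.8, 1329.4, 1407.9, 1236.1, 1299.8,
           1447.2, 1354.1, 1204.6, 1322.3, 1388.5]
control = [1487.2, 1328.1, 1405.9, 1234.0, 1298.2,
           1444.7, 1354.3, 1203.7, 1320.8, 1386.8]
# Each participant compared with themselves
diffs = [b - c for b, c in zip(beloved, control)]

# statistics.stdev uses the N - 1 denominator, like S in the text
print(f"SD of beloved-photo scores: {stdev(beloved):.1f}")
print(f"SD of control-photo scores: {stdev(control):.1f}")
print(f"SD of difference scores:    {stdev(diffs):.3f}")
```

The difference scores have a standard deviation of about .79 (the square root of the .629 figured in Table 7–8), while each photo condition’s scores vary by roughly 89 points. Dividing by that much smaller standard deviation is what makes the effect size, and so the power, so much larger when the analysis uses difference scores.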

On the other hand, testing a group of people before and after an experimental proce-
dure, without any kind of control group that does not go through the procedure, is a weak
research design (Cook & Campbell, 1979). Even if such a study produces a significant
difference, it leaves many alternative explanations for that difference. For example, the
research participants might have matured or improved during that period anyway, or
perhaps other events happened between tests, or the participants not getting benefits
may have dropped out. It is even possible that the initial test itself caused changes.

Note, however, that the difficulties of research that tests people before and after some intervention are shared only slightly with the kind of study in which participants are tested under two conditions, such as viewing a beloved person’s picture and a neutral person’s picture, with half tested first viewing the beloved’s picture and half tested first viewing the neutral person’s picture. Another example would be a study examining the hand-eye coordination of a group of surgeons under both quiet and noisy conditions (not while doing surgery, of course). Each surgeon would perform the test of hand-eye coordination during quiet conditions and noisy conditions. Ideally, any effects of practice or fatigue from taking the test twice would be equalized by testing half of the surgeons under noisy conditions first, and half under quiet conditions first.

BOX 7–2 The Power of Studies Using Difference Scores: How the Lanarkshire Milk Experiment Could Have Been Milked for More

In 1930, a major health experiment was conducted in Scotland involving 20,000 schoolchildren. Its main purpose was to compare the growth of a group of children who were assigned to drink milk regularly to those who were in a control group. The results were that those who drank milk showed more growth.

However, William Gosset, a contemporary statistician and inventor of the t test (see Box 7–1), was appalled at the way the experiment was conducted. It had cost about £7,500, which in 1930 was a huge amount of money, and was done wrong! Large studies such as this were very popular among statisticians in those days because they seemed to imitate the large numbers found in nature. Gosset, by contrast, being a brewer, was forced to use very small numbers in his studies: experimental batches of beer were too costly. And he was often chided by the “real statisticians” for his small sample sizes. But Gosset argued that no number of participants was large enough when strict random assignment was not followed. And in this study, teachers were permitted to switch children from group to group if they took pity on a child whom they felt would benefit from receiving milk!

However, even more interesting in light of the present chapter, Gosset demonstrated that the researchers could have obtained the same result with 50 pairs of identical twins, flipping a coin to determine which of each pair was in the milk group (and sticking to it). Of course, the statistic you would use is the t test as taught in this chapter, the t test for dependent means.

More recently, the development of power analysis, which we introduced in Chapter 6, has thoroughly vindicated Gosset. It is now clear just how surprisingly few participants are needed when a researcher can find a way to set up a repeated-measures design in which difference scores are the basic unit of analysis. (In this case, each pair of twins would be one “participant.”) As Gosset could have told them, studies that use the t test for dependent means can be extremely sensitive.

Single Sample t Tests and Dependent Means t Tests in Research Articles
Research articles usually describe t tests in a fairly standard format that includes the degrees of freedom, the t score, and the significance level. For example, “t(24) = 2.80, p < .05” tells you that the researcher used a t test with 24 degrees of freedom, found a t score of 2.80, and the result was significant at the .05 level. Whether a one-tailed or two-tailed test was used may also be noted. (If not, assume that it was two-tailed.) Usually the means, and sometimes the standard deviations, are given for each testing. Rarely does an article report the standard deviation of the difference scores.

Had our student in the dormitory example reported the results in a research article, she would have written something like this: “The sample from my dormitory studied a mean of 21 hours (SD = 6.80). Based on a t test for a single sample, this was significantly different from the known mean of 17 for the college as a whole, t(15) = 2.35, p < .05, one-tailed.” The researchers in our fictional flood victims example might have written up their results as follows: “The reported hopefulness of our sample of flood victims (M = 4.70, SD = 1.89) was not significantly different from the midpoint of the scale, t(9) = 1.17.”

As we noted earlier, psychologists only occasionally use the t test for a single sample. We introduced it mainly as a stepping-stone to the more widely used t test for dependent means. Nevertheless, one sometimes sees the t test for a single sample in research articles. For example, Soproni and colleagues (2001), as part of a larger study, had pet dogs respond to a series of eight trials in which the owner would look at one of two bowls of dog food and the researchers measured whether the dog went to the correct bowl. (The researchers called these “at trials” because the owner looked directly at the target.) For each dog, this produced an average percentage correct that was compared to chance, which would be 50% correct. Here is part of their results: “During the eight test trials for gesture, dogs performed significantly above chance on at target trials: one sample t test, t(13) = 5.3, p < .01 . . .” (p. 124).

As we have said, the t test for dependent means is much more commonly used. Olthoff (1989) might have reported the result of his study of husbands’ communication quality as follows: “There was a significant decline in communication quality, dropping from a mean of 116.32 before marriage to a mean of 104.26 after marriage, t(18) = –4.24, p < .05.”

As another example, Rashotte and Webster (2005) carried out a study about people’s general expectations about the abilities of men and women. In the study, the researchers showed 174 college students photos of women and men (referred to as the female and male targets, respectively). The students rated the person in each photo in terms of that person’s general abilities (e.g., in terms of the person’s intelligence, abstract abilities, capability at most tasks, and so on). For each participant, these ratings were combined to create a measure of the perceived status of the female targets and of the male targets. The researchers then compared the status ratings given for the female targets and male targets. Since each participant in the study rated both the female and the male targets, the researchers compared the status ratings assigned to the female and male targets using a t test for dependent means. Table 7–13 shows the results. The row entitled “Whole sample (N = 174)” gives the

Introduction to t Tests 253

result of the t test for all 174 participants and shows that the status rating assigned to the male targets was significantly higher than the rating assigned to the female targets (t = 3.46, p < .001). As shown in the table, the researchers also conducted two additional t tests to see if this effect was the same among the female participants and the male participants. The results showed that both the female and the male participants assigned higher ratings to the male targets.

Table 7–13 Status Scale: Mean (and SE) General Expectations for Female and Male Targets

                                 Mean Score (SE)
Respondents                      Female Target   Male Target   M – F Target Difference   t (1-tailed p)
Whole sample (N = 174)           5.60 (.06)      5.85 (.07)    .25                       3.46 (< .001)
Female respondents (N = 111)     5.62 (.07)      5.84 (.081)   .22                       2.62 (< .05)
Male respondents (N = 63)        5.57 (.10)      5.86 (.11)    .29                       2.26 (< .05)

Source: Rashotte, L. S., & Webster, M., Jr. (2005). Gender status beliefs. Social Science Research, 34, 618–633. Copyright © 2005 by Elsevier. Reprinted by permission of Elsevier.
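The reporting format described above is regular enough to build mechanically. The following Python sketch is only an illustration of that convention, not something from the text: the function name `format_t_report` is hypothetical, and it assumes you already know which significance level the result beat (it does not figure p itself).

```python
def format_t_report(df, t, p_bound, one_tailed=False):
    """Build a string like 't(24) = 2.80, p < .05' (hypothetical helper)."""
    # Significance levels are written without the leading zero: .05, .01, .001
    p_text = f"{p_bound:g}".lstrip("0")
    report = f"t({df}) = {t:.2f}, p < {p_text}"
    if one_tailed:
        # Articles assume two-tailed unless they say otherwise
        report += ", one-tailed"
    return report


print(format_t_report(24, 2.80, .05))                   # t(24) = 2.80, p < .05
print(format_t_report(15, 2.35, .05, one_tailed=True))  # t(15) = 2.35, p < .05, one-tailed
```

The t score is always shown to two decimals here, so a value the authors reported as 5.3 would appear as 5.30.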

1. You use the standard steps of hypothesis testing even when you don’t know the population variance. However, in this situation you have to estimate the population variance from the scores in the sample, using a formula that divides the sum of squared deviation scores by the degrees of freedom ($df = N - 1$).

2. When the population variance is estimated, the comparison distribution of means is a t distribution (with cutoffs given in a t table). A t distribution has slightly heavier tails than a normal curve (just how much heavier depends on how few the degrees of freedom are). Also, in this situation, a sample’s number of standard deviations from the mean of the comparison distribution is called a t score.

3. You use a t test for a single sample when a sample mean is being compared to a known population mean and the population variance is unknown.

4. You use a t test for dependent means in studies where each participant has two scores, such as a before-score and an after-score or a score in each of two experimental conditions. A t test for dependent means is also used when you have scores from pairs of research participants. In this t test, you first figure a difference or change score for each participant, then go through the usual five steps of hypothesis testing with the modifications described in summary points 1 and 2, making Population 2 a population of difference scores with a mean of 0 (no difference).

5. An assumption of the t test is that the population distribution is a normal curve. However, even when it is not, the t test is usually fairly accurate.

6. The effect size of a study using a t test for dependent means is the mean of the difference scores divided by the standard deviation of the difference scores. You can look up power, and the sample size needed for any particular level of power, using power software packages, an Internet power calculator, or special tables.

7. The power of studies using difference scores is usually much higher than that of studies using other designs with the same number of participants. However, research using a single group tested before and after some intervening event, without a control group, allows for many alternative explanations of any observed changes.

8. t tests are reported in research articles using a standard format. For example, “t(24) = 2.80, p < .05.”
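Summary point 6 can be made concrete with a short Python sketch (our own illustration, not from the text). The function names are hypothetical, and the .20, .50, and .80 cut points are Cohen’s conventions for small, medium, and large effects, used here as an assumption about how to turn “approximately small, medium, or large” into a rule. The numbers plugged in come from the Example Worked-Out Problem for the t test for dependent means ($M = .140$, $S^2 = .388$).

```python
import math


def effect_size_dependent_means(mean_change, sd_change):
    """Estimated effect size: mean of the difference scores divided by their SD (S)."""
    return mean_change / sd_change


def label_effect_size(d):
    """Rough label using Cohen's conventions (.20 small, .50 medium, .80 large)."""
    size = abs(d)  # the direction of the change does not affect its size
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "less than small"


# Example Worked-Out Problem for dependent means: M = .140, S^2 = .388
d = effect_size_dependent_means(0.140, math.sqrt(0.388))
print(round(d, 2), label_effect_size(d))  # 0.22 small
```

Note that the divisor is $S$, the square root of the estimated population variance of the difference scores, not the standard deviation of the distribution of means ($S_M$).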


Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.

Key Terms

t tests (p. 223)
t test for a single sample (p. 223)
biased estimate (p. 225)
unbiased estimate of the population variance ($S^2$) (p. 226)
degrees of freedom (df) (p. 226)
t distribution (p. 228)
t table (p. 229)
t score (p. 230)
repeated-measures design (p. 236)
t test for dependent means (p. 236)
difference scores (p. 237)
assumption (p. 247)
robustness (p. 247)

Example Worked-Out Problems

t Test for a Single Sample
Eight participants are tested after being given an experimental procedure. Their scores are 14, 8, 6, 5, 13, 10, 10, and 6. The population of people not given this procedure is normally distributed with a mean of 6. Using the .05 level, two-tailed, does the experimental procedure make a difference? (a) Use the five steps of hypothesis testing and (b) sketch the distributions involved.

(a) Steps of hypothesis testing:

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: People who are given the experimental procedure.
Population 2: The general population.

The research hypothesis is that Population 1 will score differently than Population 2. The null hypothesis is that Population 1 will score the same as Population 2.

❷ Determine the characteristics of the comparison distribution. The mean of the distribution of means is 6 (the known population mean). To figure the estimated population variance, you first need to figure the sample mean, which is $M = (14 + 8 + 6 + 5 + 13 + 10 + 10 + 6)/8 = 72/8 = 9$. The estimated population variance is $S^2 = SS/df = 78/7 = 11.14$; the variance of the distribution of means is $S^2_M = S^2/N = 11.14/8 = 1.39$. The standard deviation of the distribution of means is $S_M = \sqrt{S^2_M} = \sqrt{1.39} = 1.18$. Its shape will be a t distribution for $df = 7$.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. From Table A–2, the cutoffs for a two-tailed t test at the .05 level for $df = 7$ are +2.365 and –2.365.

❹ Determine your sample’s score on the comparison distribution. $t = (M - \mu)/S_M = (9 - 6)/1.18 = 3/1.18 = 2.54$.

➎ Decide whether to reject the null hypothesis. The t of 2.54 is more extreme than the needed t of ±2.365. Therefore, reject the null hypothesis; the research hypothesis is supported. The experimental procedure does make a difference.

(b) Sketches of distributions are shown in Figure 7–9.
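To check the figuring in Steps ❷ and ❹ by computer, here is a minimal Python sketch. It is not part of the text’s procedure (the chapter uses hand figuring and SPSS), and the function name `single_sample_t` is hypothetical. The cutoff is copied from the t table rather than computed, so the sketch assumes the same $df = 7$, .05-level, two-tailed situation as the problem.

```python
import math


def single_sample_t(scores, population_mean):
    """Figure the t score for a single sample, following Steps 2 and 4."""
    n = len(scores)
    m = sum(scores) / n                       # sample mean, M
    ss = sum((x - m) ** 2 for x in scores)    # sum of squared deviations, SS
    df = n - 1                                # degrees of freedom
    s2 = ss / df                              # estimated population variance, S^2
    s2_m = s2 / n                             # variance of the distribution of means
    s_m = math.sqrt(s2_m)                     # its standard deviation, S_M
    t = (m - population_mean) / s_m
    return {"M": m, "SS": ss, "df": df, "S2": s2, "S_M": s_m, "t": t}


scores = [14, 8, 6, 5, 13, 10, 10, 6]
result = single_sample_t(scores, population_mean=6)
cutoff = 2.365  # copied from the t table (df = 7, .05 level, two-tailed)
reject_null = abs(result["t"]) > cutoff
print(round(result["t"], 2), reject_null)  # 2.54 True
```

Dividing $SS$ by $df$ rather than $N$ is what makes $S^2$ the unbiased estimate; dividing by 8 here would understate the population variance.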

[Figure 7–9: Distributions for answer to Example Worked-Out Problem for t test for a single sample.]

t Test for Dependent Means
A researcher tests 10 individuals before and after an experimental procedure. The results are as follows:

Participant   Before   After
1             10.4     10.8
2             12.6     12.1
3             11.2     12.1
4             10.9     11.4
5             14.3     13.9
6             13.2     13.5
7              9.7     10.9
8             11.5     11.5
9             10.8     10.4
10            13.1     12.5

Test the hypothesis that there is an increase in scores, using the .05 significance level. (a) Use the five steps of hypothesis testing and (b) sketch the distributions involved.

(a) Table 7–14 shows the results, including the figuring of difference scores and all the other figuring for the t test for dependent means. Here are the steps of hypothesis testing:

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: People like those who are given the experimental procedure.
Population 2: People who show no change from before to after.

The research hypothesis is that Population 1’s mean difference score (figured using “after” scores minus “before” scores) is greater than Population 2’s mean difference score. The null hypothesis is that Population 1’s mean difference score is not greater than Population 2’s.

❷ Determine the characteristics of the comparison distribution. Its population mean is 0 difference. The estimated population variance of difference scores, $S^2$, is shown in Table 7–14 to be .388. As shown in Table 7–14, the standard deviation of the distribution of means of difference scores, $S_M$, is .197. Therefore, the comparison distribution has a mean of 0 and a standard deviation of .197. It will be a t distribution for $df = 9$.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. For a one-tailed test at the .05 level with $df = 9$, the cutoff is 1.833. (The cutoff is positive because the research hypothesis is that Population 1’s mean difference score will be greater than Population 2’s.)

Table 7–14 Figuring for Answer to Example Worked-Out Problem for t Test for Dependent Means

                  Score        Difference         Deviation          Squared
Participant   Before   After   (After – Before)   (Difference – M)   Deviation
1             10.4     10.8      .4                 .260               .068
2             12.6     12.1     –.5                –.640               .410
3             11.2     12.1      .9                 .760               .578
4             10.9     11.4      .5                 .360               .130
5             14.3     13.9     –.4                –.540               .292
6             13.2     13.5      .3                 .160               .026
7              9.7     10.9     1.2                1.060              1.124
8             11.5     11.5     0.0                –.140               .020
9             10.8     10.4     –.4                –.540               .292
10            13.1     12.5     –.6                –.740               .548
Σ:           117.7    119.1     1.4                                   3.488

For difference scores:
$M = 1.4/10 = .140$.
$\mu = 0$.
$S^2 = SS/df = 3.488/(10 - 1) = 3.488/9 = .388$.
$S^2_M = S^2/N = .388/10 = .039$.
$S_M = \sqrt{S^2_M} = \sqrt{.039} = .197$.
t needed for 5% significance level, one-tailed, $df = 9$: 1.833.
$t = (M - \mu)/S_M = (.140 - 0)/.197 = .71$.
Decision: Do not reject the null hypothesis.
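The figuring in Table 7–14 can be reproduced in a few lines of Python. This is a sketch for checking your hand figuring, not the text’s method; the function name `dependent_means_t` is hypothetical, and the sketch assumes the two lists are in the same participant order (each pair of scores must come from the same person). Because it does not round the squared deviations, it gets $SS = 3.484$ and $S^2 = .387$ rather than the table’s 3.488 and .388; $S_M$ and t come out the same to the precision shown.

```python
import math


def dependent_means_t(before, after):
    """t test for dependent means on paired scores (difference = after - before)."""
    if len(before) != len(after):
        raise ValueError("each participant needs both a before and an after score")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    m = sum(diffs) / n                        # mean difference score, M
    ss = sum((d - m) ** 2 for d in diffs)     # SS of the difference scores
    s2 = ss / (n - 1)                         # estimated population variance, S^2
    s_m = math.sqrt(s2 / n)                   # S_M, SD of the distribution of means
    t = (m - 0) / s_m                         # Population 2's mean difference is 0
    return {"M": m, "df": n - 1, "S2": s2, "S_M": s_m, "t": t}


before = [10.4, 12.6, 11.2, 10.9, 14.3, 13.2, 9.7, 11.5, 10.8, 13.1]
after = [10.8, 12.1, 12.1, 11.4, 13.9, 13.5, 10.9, 11.5, 10.4, 12.5]
result = dependent_means_t(before, after)
cutoff = 1.833  # from the t table (df = 9, .05 level, one-tailed, increase predicted)
print(round(result["t"], 2), result["t"] > cutoff)  # 0.71 False
```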

❹ Determine your sample’s score on the comparison distribution. The sample’s mean change of .140 is .71 standard deviations (of .197 each) on the distribution of means above that distribution’s mean of 0. That is, $t = (M - \mu)/S_M = (.140 - 0)/.197 = .71$.

➎ Decide whether to reject the null hypothesis. The sample’s t of .71 is less extreme than the needed t of 1.833. Thus, you cannot reject the null hypothesis. The study is inconclusive.

(b) Sketches of distributions are shown in Figure 7–10.

[Figure 7–10: Distributions for answer to Example Worked-Out Problem for t test for dependent means.]

Outline for Writing Essays for a t Test for a Single Sample
1. Describe the core logic of hypothesis testing in this situation. Be sure to mention that the t test for a single sample is used for hypothesis testing when you have scores for a sample of individuals and you want to compare the mean of this sample to a population for which the mean is known but the variance is unknown. Be sure to explain the meaning of the research hypothesis and the null hypothesis in this situation.

2. Outline the logic of estimating the population variance from the sample scores. Explain the idea of biased and unbiased estimates of the population variance, and describe the formula for estimating the population variance and why it is different from the ordinary variance formula.

3. Describe the comparison distribution (the t distribution) that is used with a t test for a single sample, noting how it is different from a normal curve and why. Explain why a t distribution (as opposed to the normal curve) is used as the comparison distribution.

4. Describe the logic and process for determining the cutoff sample score(s) on the comparison distribution at which the null hypothesis should be rejected.

5. Describe why and how you figure the t score of the sample mean on the comparison distribution.

6. Explain how and why the scores from Steps ❸ and ❹ of the hypothesis-testing process are compared. Explain the meaning of the result of this comparison with regard to the specific research and null hypotheses being tested.

Outline for Writing Essays for a t Test for Dependent Means
1. Describe the core logic of hypothesis testing in this situation. Be sure to mention that the t test for dependent means is used for hypothesis testing when you have two scores from each person in your sample. Be sure to explain the meaning of the research hypothesis and the null hypothesis in this situation. Explain the logic and procedure for creating difference scores.

2. Explain why you use 0 as the mean for the comparison distribution.

3. Outline the logic of estimating the population variance of difference scores from the sample scores. Explain the idea of biased and unbiased estimates of the population variance, and describe the formula for estimating the population variance. Describe how to figure the standard deviation of the distribution of means of difference scores.

4. Describe the comparison distribution (the t distribution) that is used with a t test for dependent means. Explain why a t distribution (as opposed to the normal curve) is used as the comparison distribution.

5. Describe the logic and process for determining the cutoff sample score(s) on the comparison distribution at which the null hypothesis should be rejected.

6. Describe why and how you figure the t score of the sample mean on the comparison distribution.

7. Explain how and why the scores from Steps ❸ and ❹ of the hypothesis-testing process are compared. Explain the meaning of the result of this comparison with regard to the specific research and null hypotheses being tested.

Practice Problems

These problems involve figuring. Most real-life statistics problems are done on a computer with special statistical software. Even if you have such software, do these problems by hand to ingrain the method in your mind. To learn how to use a computer to solve statistics problems like those in this chapter, refer to the Using SPSS section at the end of this chapter and the Study Guide and Computer Workbook that accompanies this text.

All data are fictional unless an actual citation is given.

Set I (for Answers to Set I Problems, see pp. 681–683)
1. In each of the following studies, a single sample’s mean is being compared to a population with a known mean but an unknown variance. For each study, decide whether the result is significant. (Be sure to show all of your calculations.)

     Sample     Population    Population        Sample                            Significance
     Size (N)   Mean ($\mu$)  Variance ($S^2$)  Mean (M)   Tails                  Level ($\alpha$)
(a)  64         12.40         9.00              11.00      1 (low predicted)      .05
(b)  49         1,006.35      317.91            1,009.72   2                      .01
(c)  400        52.00         7.02              52.41      1 (high predicted)     .01

2. Suppose a candidate running for sheriff in a rural community claims that she will reduce the average emergency response time to less than 30 minutes, which is thought to be the average response time with the current sheriff. There are no past records, so the actual standard deviation of such response times cannot be determined. Thanks to this campaign, she is elected sheriff, and careful records are now kept. The response times for the first month are 26, 30, 28, 29, 25, 28, 32, 35, 24, and 23 minutes.

Using the .05 level of significance, did she keep her promise? (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who has never taken a course in statistics.

3. A researcher tests five individuals who have seen paid political ads about a particular issue. These individuals take a multiple-choice test about the issue in which people in general (who know nothing about the issue) usually get 40 questions correct. The number correct for these five individuals was 48, 41, 40, 51, and 50.

Using the .05 level of significance, two-tailed, do people who see the ads do better on this test? (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who is familiar with the Z test (from Chapter 5) but is unfamiliar with t tests.

4. For each of the following studies using difference scores, test the significance using a t test for dependent means.

     Number of           Mean of             Estimated Population
     Difference Scores   Difference Scores   Variance of                                Significance
     in Sample           in Sample           Difference Scores   Tails                  Level
(a)  20                  1.7                 8.29                1 (high predicted)     .05
(b)  164                 2.3                 414.53              2                      .05
(c)  15                  –2.2                4.00                1 (low predicted)      .01

5. A program to decrease littering was carried out in four cities in California’s Central Valley starting in August 2007. The amount of litter in the streets (average pounds of litter collected per block per day) was measured during July before the program started and then the next July, after the program had been in effect for a year. The results were as follows:

City          July 2007   July 2008
Fresno        9           2
Merced        10          4
Bakersfield   8           9
Stockton      9           1

Using the .01 level of significance, was there a significant decrease in the amount of litter? (a) Use the five steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who understands mean, standard deviation, and variance, but knows nothing else about statistics.

6. A researcher assesses the level of a particular hormone in the blood in five patients before and after they begin taking a hormone treatment program. Results for the five are as follows:

Patient   Before   After
A         .20      .18
B         .16      .16
C         .24      .23
D         .22      .19
E         .17      .16

Using the .05 significance level, was there a significant change in the level of this hormone? (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who understands the t test for a single sample but is unfamiliar with the t test for dependent means.

7. Figure the estimated effect size and indicate whether it is approximately small,
medium, or large, for each of the following studies:

Mean Change S

(a) 20 32
(b) 5 10
(c) .1 .4
(d) 100 500

8. What is the power of each of the following studies, using a t test for dependent
means (based on the .05 significance level)?

Effect Size N Tails

(a) Small 20 One
(b) Medium 20 One
(c) Medium 30 One
(d) Medium 30 Two
(e) Large 30 Two

9. About how many participants are needed for 80% power in each of the following planned studies that will use a t test for dependent means with p < .05?

     Predicted Effect Size   Tails
(a)  Medium                  Two
(b)  Large                   One
(c)  Small                   One


10. Weller and Weller (1997) conducted a study of the tendency for the menstrual cycles of women who live together (such as sisters) to become synchronized. For their statistical analysis, they compared scores on a measure of synchronization of pairs of sisters living together versus the degree of synchronization that would be expected by chance (lower scores mean more synchronization). Their key results (reported in a table not reproduced here) were synchrony scores of 6.32 for the 30 roommate sister pairs in their sample compared to an expected synchrony score of 7.76; they then reported a t score of 2.27 and a p level of .011 for this difference. Explain this result to a person who is familiar with hypothesis testing with a known population variance, but not with the t test for a single sample.

11. A psychologist conducts a study of perceptual illusions under two different lighting conditions. Twenty participants were each tested under both of the two different conditions. The experimenter reported: “The mean number of effective illusions was 6.72 under the bright conditions and 6.85 under the dimly lit conditions, a difference that was not significant, t(19) = 1.62.” Explain this result to a person who has never had a course in statistics. Be sure to use sketches of the distributions in your answer.

12. A study was done of personality characteristics of 100 students who were tested at the beginning and end of their first year of college. The researchers reported the results in the following table:

                      Fall            Spring          Difference
Personality Scale     M       SD      M       SD      M         SD
Anxiety               16.82   4.21    15.32   3.84    1.50**    1.85
Depression            89.32   8.39    86.24   8.91    3.08**    4.23
Introversion          59.89   6.87    60.12   7.11    –.23      2.22
Neuroticism           38.11   5.39    37.22   6.02    .89*      4.21

*p < .05. **p < .01.

(a) Focusing on the difference scores, figure the t values for each personality scale. (Assume that SD in the table is for what we have called S, the unbiased estimate of the population standard deviation.)
(b) Explain to a person who has never had a course in statistics what this table means.

Set II
13. In each of the following studies, a single sample’s mean is being compared to a population with a known mean but an unknown variance. For each study, decide whether the result is significant.

     Sample     Population    Standard        Sample                             Significance
     Size (N)   Mean ($\mu$)  Deviation (S)   Mean (M)   Tails                   Level ($\alpha$)
(a)  16         100.31        2.00            100.98     1 (high predicted)      .05
(b)  16         .47           4.00            .00        2                       .05
(c)  16         68.90         9.00            34.00      1 (low predicted)       .01

14. Evolutionary theories often emphasize that humans have adapted to their physical environment. One such theory hypothesizes that people should spontaneously follow a 24-hour cycle of sleeping and waking, even if they are not exposed to the usual pattern of sunlight. To test this notion, eight paid volunteers were placed (individually) in a room in which there was no light from the outside and no clocks or other indications of time. They could turn the lights on and off as they wished. After a month in the room, each individual tended to develop a steady cycle. Their cycles at the end of the study were as follows: 25, 27, 25, 23, 24, 25, 26, and 25.

Using the .05 level of significance, what should we conclude about the theory that 24 hours is the natural cycle? (That is, does the average cycle length under these conditions differ significantly from 24 hours?) (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who has never taken a course in statistics.

15. In a particular country, it is known that college seniors report falling in love an average of 2.20 times during their college years. A sample of five seniors, originally from that country but who have spent their entire college career in the United States, were asked how many times they had fallen in love during their college years. Their numbers were 2, 3, 5, 5, and 2. Using the .05 significance level, do students like these who go to college in the United States fall in love more often than those from their country who go to college in their own country? (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who is familiar with the Z test (from Chapter 5) but is unfamiliar with the t test for a single sample.

16. For each of the following studies using difference scores, test the significance using a t test for dependent means.

     Number of           Mean of             Estimated Population
     Difference Scores   Difference Scores   Variance of Difference                 Significance
     in Sample           in Sample           Scores ($S^2$)          Tails          Level
(a)  10                  3.8                 50                      One (high)     .05
(b)  100                 3.8                 50                      One (high)     .05
(c)  100                 1.9                 50                      One (high)     .05
(d)  100                 1.9                 50                      Two            .05
(e)  100                 1.9                 25                      Two            .05

17. Four individuals with high levels of cholesterol went on a special crash diet,
avoiding high-cholesterol foods and taking special supplements. Their total
cholesterol levels before and after the diet were as follows:

Participant Before After

J. K. 287 255
L. M. M 305 269
A. K. 243 245
R. O. S. 309 247

Using the .05 level of significance, was there a significant change in cholesterol level? (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who has never taken a course in statistics.

18. Five people who were convicted of speeding were ordered by the court to attend a workshop. A special device put into their cars kept records of their speeds for two weeks before and after the workshop. The maximum speeds for each person during the two weeks before and the two weeks after the workshop follow.

Participant   Before   After
L. B.         65       58
J. K.         62       65
R. C.         60       56
R. T.         70       66
J. M.         68       60

Using the .05 significance level, should we conclude that people are likely to drive more slowly after such a workshop? (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who is familiar with hypothesis testing involving known populations, but has never learned anything about t tests.

19. The amount of oxygen consumption was measured in six individuals over two 10-minute periods while sitting with their eyes closed. During one period, they listened to an exciting adventure story; during the other, they heard restful music.

Participant   Story   Music
1             6.12    5.39
2             7.25    6.72
3             5.70    5.42
4             6.40    6.16
5             5.82    5.96
6             6.24    6.08

Based on the results shown, is oxygen consumption less when listening to the music? Use the .01 significance level. (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who understands mean, standard deviation, and variance but knows nothing else about statistics.

20. Five sophomores were given an English achievement test before and after receiving instruction in basic grammar. Their scores are shown below.

Student   Before   After
A         20       18
B         18       22
C         17       15
D         16       17
E         12       9

Is it reasonable to conclude that future students would show higher scores after instruction? Use the .05 significance level. (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who understands mean, standard deviation, and variance but knows nothing else about statistics.

21. Figure the predicted effect size and indicate whether it is approximately small, medium, or large, for each of the following planned studies:

     Predicted Mean Change   Predicted S
(a)  8                       30
(b)  8                       10
(c)  16                      30
(d)  16                      10

22. What is the power of each of the following studies, using a t test for dependent
means (based on the .05 significance level)?

Effect Size N Tails

(a) Small 50 Two
(b) Medium 50 Two
(c) Large 50 Two
(d) Small 10 Two
(e) Small 40 Two
(f) Small 100 Two
(g) Small 100 One

23. About how many participants are needed for 80% power in each of the following planned studies that will use a t test for dependent means with p < .05?

     Predicted Effect Size   Tails
(a)  Small                   Two
(b)  Medium                  One
(c)  Large                   Two
24. A study compared union activity of employees in 10 plants during two different
decades. The researchers reported “a significant increase in union activity,

, p � .01.” Explain this result to a person who has never had a course
in statistics. Be sure to use sketches of the distributions in your answer.

25. Holden and colleagues (1997) compared mothers’ reported attitudes toward cor-
poral punishment of their children from before to 3 years after having their first
child. “The average change in the women’s prior-to-current attitudes was signif-
icant, , p � .001 . . . ” (p. 485). (The change was that they felt
more negatively about corporal punishment after having their child.) Explain
this result to someone who is familiar with the t test for a single sample, but not
with the t test for dependent means.

26. Table 7–15 (reproduced from Table 4 of Larson et al., 2001) shows ratings of
various aspects of work and home life of 100 middle-class men in India who
were fathers. Pick three rows of interest to you and explain the results to some-
one who is familiar with the mean, variance, and Z scores but knows nothing
else about statistics.

t(107) = 10.32

t(9) = 3.28

Introduction to t Tests 265

The U in the following steps indicates a mouse click. (We used SPSS version 15.0
for Windows to carry out these analyses. The steps and output may be slightly differ-
ent for other versions of SPSS.)

t Test for a Single Sample
❶ Enter the scores from your distribution in one column of the data window.
❷ U Analyze.
❸ U Compare means.
❹ U One-sample T test (this is the name SPSS uses for a t test for a single sample).
➎ U on the variable for which you want to carry out the t test and then U the arrow.
❻ Enter the population mean in the “Test Value” box.
❼ U OK.

Practice these steps by carrying out a single sample t test for the example shown
earlier in this chapter of 10 people’s ratings of hopefulness after a flood. The sample
scores, population mean, and figuring for that study are shown in Table 7–4 on
page 232. Your SPSS output window should look like Figure 7–11. The first table
provides information about the variable: the number of scores (“N”); the mean of the
scores (“Mean”); the estimated population standard deviation, S (“Std. Deviation”);
and the standard deviation of the distribution of means, (“Std. Error Mean”).
Check that the values in that table are consistent (allowing for rounding error) with
the values in Table 7–4.

The second table in the SPSS output window gives the outcome of the t test.
Compare the values of t and df in that table with the values shown in Table 7–4. The
exact two-tailed significance level of the t test is given in the “Sig. (2-tailed)”
column. In this study, the researcher was using the .01 significance level. The
significance level given by SPSS (.271) is not more extreme than .01, which means that the
researcher cannot reject the null hypothesis and the study is inconclusive.
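The decision rule in this paragraph, comparing SPSS’s “Sig. (2-tailed)” value with the chosen significance level, can be sketched as a tiny Python function. The name `decide` is hypothetical, and the sketch assumes that “more extreme than” means strictly smaller than the significance level.

```python
def decide(sig_two_tailed, alpha):
    """Compare SPSS's exact two-tailed significance with the chosen alpha."""
    # A smaller significance value is "more extreme"; only then is the null rejected.
    if sig_two_tailed < alpha:
        return "reject the null hypothesis"
    return "cannot reject the null hypothesis (inconclusive)"


print(decide(0.271, 0.01))   # hopefulness example: .271 is not below .01
print(decide(0.000, 0.05))   # an SPSS value shown as .000 is below .05
```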


Using SPSS

Table 7–15 Comparison of Fathers’ Mean Psychological States in the Job and Home Spheres (N = 100)

Scale             Range   Work   Home   Work vs. Home
Important         0–9     5.98   5.06     6.86***
Attention         0–9     6.15   5.13     7.96***
Challenge         0–9     4.11   2.41    11.49***
Choice            0–9     4.28   4.74    −3.38***
Wish doing else   0–9     1.50   1.44     0.61
Hurried           0–3     1.80   1.39     3.21**
Social anxiety    0–3     0.81   0.64     3.17**
Affect            1–7     4.84   4.98    −2.64**
Social climate    1–7     5.64   5.95     4.17***

Note: Values for column 3 are t scores; df = 90 for all t tests.
***p < .001. **p < .01.
Source: Larson, R., Dworkin, J., & Verma, S. (2001). Men’s work and family lives in India: The daily organization of time and
emotions. Journal of Family Psychology, 15, 206–224. Copyright © 2001 by the American Psychological Association.

266 Chapter 7

t Test for Dependent Means
❶ Enter one set of scores (for example, the “before” scores) in the first column of
the data window. Then enter the second set of scores (for example, the “after”
scores) in the second column of the data window. (Be sure to enter the scores in
the order they are listed.) Since each row in the SPSS data window represents a
separate person, it is important that you enter each person’s scores in two
separate columns (for example, a “before” column and an “after” column).

❷ U Analyze.
❸ U Compare means.
❹ U Paired-Samples T Test (this is the name SPSS uses for a t test for dependent
means).

Figure 7–11 Using SPSS to carry out a t test for a single sample for the example of 10
people’s ratings of hopefulness after a flood.


❺ U on the first variable (this will highlight the variable). U on the second
variable (this will highlight the variable). U the arrow. The two variables will now
appear in the “Paired Variables” box.

❻ U OK.

Practice these steps by carrying out a t test for dependent means for Olthoff’s
(1989) study of communication quality of 19 men who received ordinary premarital
counseling. The scores and figuring for that study are shown in Table 7–6 on page 238.
Your SPSS output window should look like Figure 7–12. The key information is
contained in the third table (labeled “Paired Samples Test”). The final three columns of
this table give the t score (4.240), the degrees of freedom (18), and the two-tailed
significance level (.000 in this case) of the t test. The significance level is so small that,
rounded to three decimal places, it shows as .000; that is, it is less than .001. Since the
significance level is more extreme than the .05 significance level we set for this study,
you can reject the null hypothesis. By looking at the means for the “before” variable and
the “after” variable in the first table (labeled “Paired Samples Statistics”), you can see
that the husbands’ communication quality was lower after marriage (a mean of 104.2632)
than before marriage (a mean of 116.3158). Don’t worry that the t value figured in
Table 7–6 was negative, whereas the t value in the SPSS output is positive. This
happens because the difference score in Table 7–6 was figured as after minus before, but
SPSS figured the difference scores as before minus after. Both ways of figuring the
difference score are mathematically correct: the size of t, the degrees of freedom, and
the conclusion are the same in each case.

Figure 7–12 Using SPSS to carry out a t test for dependent means for Olthoff’s (1989)
study of communication quality of 19 men who received ordinary premarital counseling.

Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.
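To see why the sign of t flips while everything else stays the same, here is a minimal Python sketch of a t test for dependent means that can figure the difference scores either way. The function `dependent_t` and the before/after scores are hypothetical illustrations, not Olthoff’s data. The sketch assumes the null hypothesis that the population mean of the difference scores is 0.

```python
import math


def dependent_t(before, after, order="before-after"):
    """t test for dependent means on paired scores (one pair per person)."""
    if len(before) != len(after):
        raise ValueError("every person needs both a before and an after score")
    if order == "before-after":
        diffs = [b - a for b, a in zip(before, after)]   # direction SPSS used here
    else:
        diffs = [a - b for b, a in zip(before, after)]   # after minus before (Table 7-6)
    n = len(diffs)
    mean_d = sum(diffs) / n
    s2 = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    s_m = math.sqrt(s2 / n)
    # Under the null hypothesis the population mean of difference scores is 0.
    return (mean_d - 0) / s_m, n - 1


# Hypothetical before/after scores for five people (not Olthoff's data).
before = [10, 12, 9, 14, 11]
after = [8, 11, 9, 10, 9]
t_spss, df = dependent_t(before, after)
t_book, _ = dependent_t(before, after, order="after-before")
print(round(t_spss, 3), round(t_book, 3), df)
```

Both calls give the same df and the same size of t. Only the sign differs, which matches the Table 7–6 versus SPSS situation described above.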

1. A sample’s variance tends to be slightly smaller than the population’s because it
is based on deviations from the sample’s mean. A sample’s mean is the optimal
balance point for its scores. Thus, the sum of squared deviations of a sample’s scores
from its mean will be smaller than the sum of squared deviations from any other
number. The mean of a sample generally is not exactly the same as the mean of the
population it comes from. Thus, deviations of a sample’s scores from its mean will
generally be smaller than deviations of that sample’s scores from the population mean.
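A quick numerical check of Chapter Note 1 follows, using a hypothetical sample of four scores and a hypothetical population mean of 4.0. The helper `sum_sq_dev` is illustrative only. The assertion relies on a standard identity: the sum of squared deviations from any number $c$ equals the sum from the sample mean plus $N(M - c)^2$.

```python
def sum_sq_dev(scores, center):
    """Sum of squared deviations of the scores from a chosen center."""
    return sum((x - center) ** 2 for x in scores)


# Hypothetical sample and population mean, for illustration only.
sample = [3, 5, 4, 8]
sample_mean = sum(sample) / len(sample)   # 5.0
pop_mean = 4.0

ss_from_sample_mean = sum_sq_dev(sample, sample_mean)   # 14.0
ss_from_pop_mean = sum_sq_dev(sample, pop_mean)         # 18.0

# SS(c) = SS(M) + N * (M - c) ** 2, so any c other than M gives a larger sum.
assert ss_from_pop_mean == ss_from_sample_mean + len(sample) * (sample_mean - pop_mean) ** 2
print(ss_from_sample_mean, ss_from_pop_mean)
```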

2. Statisticians make a subtle distinction in this situation between the comparison
distribution and the distribution of means. (We avoid this distinction to simplify
your learning of what is already fairly difficult.) The general procedure of
hypothesis testing, as we introduced it in Chapter 5, can be described as figuring
a Z score for your sample’s mean, where $Z = (M - \mu)/\sigma_M$, and then comparing
this Z score to a cutoff Z score from the normal curve table. We described this
process as using the distribution of means as your comparison distribution.
Statisticians would say that actually you are comparing the Z score you figured for
your sample mean to a distribution of Z scores (which is simply a standard
normal curve). Similarly, for a t test, statisticians think of the procedure as figuring
a t score (like a Z score, but figured using an estimated standard deviation),
where $t = (M - \mu)/S_M$, and then comparing your computed t score to a cutoff
t score from a t distribution table. Thus, according to the formal statistical logic,
the comparison distribution is a distribution of t scores, not of means.

3. In line with the terminology we used in Chapter 5, the symbol $\mu$ in the formula
should read $\mu_M$, since it refers to the population mean of a distribution of means.
In Chapter 5, we used the terminology $\mu_M$ to emphasize the conceptual
difference between the mean of a population of individuals and the mean of a
population of means. But $\mu$ and $\mu_M$ are always equal. Thus, to keep the
terminology as straightforward as possible in this and subsequent chapters, we refer
to the mean of a distribution of means as $\mu$. (If we were even more formal, we
might use $\mu_2$ or even $\mu_{M_2}$, since we are referring to the mean of Population 2.)

4. The steps of carrying out a t test for a single sample can be combined into a
computational formula for t. For learning purposes in your class, you should use
the steps as we have discussed them in this chapter. In a real research situation,
the figuring is usually all done by computer (see this chapter’s Using SPSS section).
Should you ever have to do a t test for a single sample for an actual research study
by hand (or just with a hand calculator), you may find the following formula useful:

$$t = \frac{M - \mu}{\sqrt{\dfrac{\sum X^2 - \left((\sum X)^2 / N\right)}{(N - 1)(N)}}}$$

Chapter Notes

The t score for a t test for a single sample is the result of subtracting the
population mean from the sample mean and dividing that difference by the square
root of the following: the sum of the squared scores minus the result of taking the
sum of all the scores, squaring this sum and dividing by the number of scores, then
taking this whole difference and dividing it by the result of multiplying the number
of scores minus 1 by the number of scores.
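The computational formula in Chapter Note 4 is algebraically the same as the step-by-step figuring. This Python sketch, with hypothetical function names and data, figures t both ways so the two results can be compared. Any difference between them should be nothing more than floating-point rounding.

```python
import math


def t_single_steps(scores, mu):
    """Step by step: t = (M - mu) / S_M, with S^2 = SS / (N - 1)."""
    n = len(scores)
    m = sum(scores) / n
    s2 = sum((x - m) ** 2 for x in scores) / (n - 1)
    return (m - mu) / math.sqrt(s2 / n)


def t_single_computational(scores, mu):
    """Chapter Note 4's computational formula, using only the sums of X and X**2."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    m = sum_x / n
    return (m - mu) / math.sqrt((sum_x2 - sum_x ** 2 / n) / ((n - 1) * n))


scores = [6, 4, 7, 3, 5]   # hypothetical data
print(t_single_steps(scores, 4.0), t_single_computational(scores, 4.0))
```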


5. The steps of carrying out a t test for dependent means can be combined into a
computational formula for t based on difference scores. For learning purposes in
your class, you should use the steps as we have discussed them in this chapter.
In a real research situation, the figuring is usually all done by computer (see the
Using SPSS section at the end of this chapter). However, if you ever have to do
a t test for dependent means for an actual research study by hand (or with just a
hand calculator), you may find the following formula useful:

$$t = \frac{\sum D / N}{\sqrt{\dfrac{\sum D^2 - \left((\sum D)^2 / N\right)}{(N - 1)(N)}}}$$

The t score for a t test for dependent means is the result of dividing the sum of the
difference scores by the number of difference scores and then dividing that result by
the square root of the following: the sum of the squared difference scores minus the
result of taking the sum of all the difference scores, squaring this sum and dividing
by the number of difference scores, then taking this whole difference and dividing it
by the result of multiplying the number of difference scores minus 1 by the number of
difference scores.

6. Single sample t tests are quite rare in practice, so we didn’t include a discussion
of effect size or power for them in the main text. However, the effect size for a
single sample t test can be figured using the same approach as in Chapter 6
(which is the same as the approach for figuring effect size for the t test for
dependent means). It is the difference between the population means divided by
the population standard deviation: $d = (\mu_1 - \mu_2)/\sigma$. When using this formula
for a t test for a single sample, $\mu_1$ is the predicted mean of Population 1 (the
population from which you are studying a sample), $\mu_2$ is the mean of the
“known” population, and $\sigma$ is the population standard deviation. The
conventions for effect size for a t test for a single sample are the same as you learned
for the situation we considered in Chapter 6: A small effect size is .20, a medium
effect size is .50, and a large effect size is .80.

7. Cohen (1988, pp. 28–39) provides more detailed tables in terms of numbers of
participants, levels of effect size, and significance levels. If you use his tables,
note that the d referred to is actually based on a t test for independent means (the
situation we consider in Chapter 8). To use these tables for a t test for dependent
means, first multiply your effect size by 1.4. For example, if your effect size is
.30, for purposes of using Cohen’s tables, you would consider it to be .42 (that
is, $.30 \times 1.4 = .42$). The only other difference from our table is that Cohen
describes the significance level by the letter $\alpha$ (for “alpha level”), with a subscript
of either 1 or 2, referring to a one-tailed or two-tailed test. For example, a table
that refers to “$\alpha_1 = .05$” at the top means that this is the table for $p = .05$,
one-tailed.

8. More detailed tables, giving the needed numbers of participants for levels of
power other than 80% (and also for effect sizes other than .20, .50, and .80 and
for other significance levels) are provided in Cohen (1988, pp. 54–55). However,
see Chapter Note 7 about using Cohen’s tables for a t test for dependent means.
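As with the single-sample case, Chapter Note 5’s computational formula should agree with the step-by-step figuring. This Python sketch uses hypothetical difference scores and illustrative function names. It assumes, as the formula does, that the population mean of the difference scores is 0 under the null hypothesis.

```python
import math


def t_dependent_computational(diffs):
    """Chapter Note 5's formula: t from the sums of D and D**2 (null mean of D = 0)."""
    n = len(diffs)
    sum_d = sum(diffs)
    sum_d2 = sum(d * d for d in diffs)
    return (sum_d / n) / math.sqrt((sum_d2 - sum_d ** 2 / n) / ((n - 1) * n))


def t_dependent_steps(diffs):
    """The same t figured step by step: M / S_M, with S^2 = SS / (N - 1)."""
    n = len(diffs)
    m = sum(diffs) / n
    s2 = sum((d - m) ** 2 for d in diffs) / (n - 1)
    return m / math.sqrt(s2 / n)


diffs = [2, 1, 0, 4, 2]   # hypothetical difference scores
print(t_dependent_computational(diffs), t_dependent_steps(diffs))
```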


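Chapter Notes 6 and 7 come down to two small pieces of arithmetic. In this hypothetical Python sketch, the population values are made up so that d = .30, matching the example in Note 7. The 1.4 multiplier applies only when you look up a t test for dependent means in Cohen’s (1988) tables; the function names are illustrative.

```python
def effect_size(mu1, mu2, sigma):
    """Chapter Note 6: d = (mu1 - mu2) / sigma."""
    return (mu1 - mu2) / sigma


def d_for_cohen_tables(d):
    """Chapter Note 7: multiply a dependent-means effect size by 1.4 for Cohen's tables."""
    return d * 1.4


# Hypothetical population values chosen so that d = .30.
d = effect_size(mu1=53.0, mu2=50.0, sigma=10.0)
print(round(d, 2), round(d_for_cohen_tables(d), 2))
```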


1 1





5 21












29 27

19 28 32


Participant Pretest Post Test

5 1

4 9 3 6
17 14
20 22
8 56 38
16 19
10 31
11 28 33
12 44 39
13 35 29
48 41
37 27

