Annotated Bibliography

 


Read the instructions for the final paper to determine the kinds of articles you will need to use as references. Also read the Example Research Proposal provided in the course materials to help you visualize your final paper. Search the Ashford University Library’s databases and the Research Methods research guide to find appropriate peer-reviewed sources for your proposal. Read the selected articles and reread relevant sections of the textbook and the full text of the study you have been working with throughout the course. Using the Sample Annotated Bibliography information and example from the Writing Center as a guide, create an annotated bibliography of the sources you will use in your final research proposal.

The references in the annotated bibliography must be listed in alphabetical order, formatted in APA style, and published within the past 10 years. All selected sources other than the textbook must be available in full text in the Ashford University Library. After each reference, insert two paragraphs. The first paragraph should summarize the main points of the source, in your own words. Do not use any quotations or verbatim wording from the source for this assignment. In the second paragraph, explain how you will use the source to support your research proposal.

For this assignment, a title page in APA format is required but it is not necessary to include a separate reference page, because the paper itself is the reference page with additional information inserted. If you do include a separate page of references, be aware that it will not be counted towards the required page count. Your paper must be a minimum of four pages (excluding title page) and formatted according to APA style as outlined in the Ashford Writing Center.

Carefully review the Grading Rubric for the criteria that will be used to evaluate your assignment.

See the attached documents for all the information you need to complete this paper.

Thanks

CANVAS – PSY326 Week Four
Welcome to Week Four of Psychology 326, Research Methods at Ashford University. This week
we learn about the second category of research designs, which is called correlational. This
category is different than the purely descriptive designs we looked at last week, because with
correlational research, we explore the relationships between variables.

There are many different correlational research designs. But the one we will study this week is
survey research. As you will see from reading Chapter 4 in the textbook, within survey research,
there are many ways to design a specific survey.

I personally find this week the most fun because students get to design a survey research project
on any topic they wish. It’s so interesting to see the different topic ideas. And I always learn
something new from my students during this week of the course.

In the discussion, you’ll plan and describe all of the important aspects of your survey idea,
including the topic and explaining why it’s important, identifying an appropriate sampling
strategy and sample size, deciding how to distribute the survey, considering any ethical
implications, and providing two sample questions exactly as they will appear on the survey.

The quiz this week covers the information in Chapter 4 about non-experimental correlational
research methods. The written assignment is an annotated bibliography. This bibliography
should include all of the references other than the textbook that you will use for your final
research proposal. Check the Week 5 assignment instructions for details about what sources are
needed.

As soon as you receive feedback on your Week 3 paper, start thinking about how you would
design a new study on the topic of the study you just critiqued. Would you try using a different
research method? A different way of recruiting participants? Or a different data analysis
technique?

In the course materials, there’s an example research proposal that I put together so you can get an
idea of what is expected for the final paper and how the references are used in a research
proposal. Take a look at that. Then find at least 6 peer-reviewed journal articles that will help
you create a high-quality research proposal on your topic.

One of these should be the article you critiqued last week. And one should be about research
ethics. You can also include one or two other studies related to the topic. But it is especially
important to have some sources about the research methods you plan to use.

You should use the textbook in your final paper. But it does not count as one of the 6 required
peer-reviewed articles. You do not need to summarize the textbook or include it in the Week
Four paper.

When you have all of your sources, read them, take notes on them, and compile your annotated
bibliography. There are resources in the instructor guidance and in the Ashford Writing Center

on how to do an annotated bibliography. Your instructor may also post more resources in an
announcement.

In short, type your APA style reference list. Make sure the references are in alphabetical order.
Then after each reference, insert two paragraphs.

In the first paragraph, summarize the main points of the source in your own words. Do not use
wording from the source or quotations for this assignment. In the second paragraph, explain how
this source will be used in your research proposal. For instance, if it is a research study on the
topic, you would use it in the literature review section to provide background information on the
topic and how it has previously been studied. If it is an article that explains how to perform a
research method you want to use, it will help you prepare your method section.

When the bibliography entries are done, add a title page and your assignment should be
complete. You do not need a separate references page because the annotated bibliography is the
references page with additional information inserted after each reference. I’ll talk more about the
final paper in next week’s video. In the meantime, have a great week and don’t hesitate to ask
your instructor if you have any questions.

PSY326 Research Methods Week 4 Guidance

Begin by viewing the video on the Week 4 overview screen. Read Chapter 4 of your textbook. It is also highly recommended that you read the book chapter on non-experimental quantitative research written by Dr. Gabriella Belli. A link to a website featuring this chapter is included in the course materials.

After completing this instructional unit, you will be able to:

· Evaluate the features of survey research and survey design.

· Design a survey project on a topic of interest.

This week, we continue learning about non-experimental research methods with the correlational research design category and survey research. In this week’s discussion, you will design a survey study using one of the approaches presented in Chapter 4, specify a suitable sampling strategy, devise two sample survey questions, consider ethical implications and how to handle them, and explain why the choices you made are appropriate. The quiz, due on Sunday, will cover the non-experimental correlational designs in Chapter 4.

Remember that all discussions should cite at least two scholarly sources, so be sure to search the Ashford Library resources (including the Research Methods research guide) for journal articles that extend the information given in the textbook on survey research. All references should be cited in APA format. See the Ashford Writing Center, under Learning Resources in the left navigation panel, for examples of correct APA style.

The written assignment this week is an annotated bibliography. For the final paper, you will need to have at least six peer-reviewed journal articles plus the textbook for references. Read the instructions for the final paper and look at the Example Research Proposal provided in the course materials to determine the kinds of articles you need, then use the Ashford University Library databases and Research Methods research guide to find the articles.

Prepare the title page for your assignment, then type your references in APA format, and arrange them in alphabetical order by the first author’s last name. After each reference, insert two paragraphs. The first paragraph should be a summary of the main points of the article written in your own words without any quotes from the source. In the second paragraph, explain how you will use the article to support your final research proposal.

You can find a sample annotated bibliography in the Ashford Writing Center at this link:

https://awc.ashford.edu/tocw-sample-annotated-bibliography.html

Here is a video presentation from the Ashford Writing Center on how to do an annotated bibliography.

According to Floyd Fowler (1995), an expert on survey design, a survey is data collection that produces summary information about a study population. Most survey research is quantitative, with the aim of producing statistical results. This is accomplished by selecting a representative sample of the population and asking people questions. Thus, a survey is a self-report measure (Newman, 2016), which allows us to get some insight into what people think and how they perceive things. Surveys can be administered by mail, by telephone, in person, or over the internet. Each approach has advantages and disadvantages.

For mail surveys, advantages include relatively low costs (when compared to telephone or in-person interviewing) and privacy for respondents. When asking about sensitive topics, a mail survey is preferred because respondents may feel more comfortable answering questions anonymously. With a mail survey, the researcher is not present and the answers will not be connected to the person who gave them. On the other hand, if the sample is very large and the questionnaire long and detailed, the costs of printing and postage can get quite expensive. Other disadvantages of mail surveys include the difficulty of getting a good mailing list, differences in reading skills among respondents, and a typically low response rate (Dillman, Smyth, & Christian, 2009).

Telephone surveys used to be very popular and effective. If the researcher has access to random digit dialing equipment, the sampling process is easy. The response rate was very good, until people got Caller ID, so this former advantage may now be considered a disadvantage. An advantage of phone surveys is that more open-ended questions can be asked, and once the interviewer is able to reach and get consent to participate, people tend to be willing to answer most or all of the questions. A disadvantage of the telephone survey is that it is time-consuming.

In-person interviews can be used in both qualitative and quantitative methods, but they are especially appropriate for qualitative studies. This format allows for the use of open-ended questions and the interviewer can read the body language of the respondent, allowing for follow-up probing to encourage more complete answers. A disadvantage is that sensitive topics may not be able to be addressed because of lack of anonymity. In addition, this approach is expensive and requires more time than the other survey approaches. If you are interested in using a qualitative approach with open-ended questions, you will need to use a qualitative data analysis technique instead of the statistical procedures covered in our textbook.

Finally, the internet survey approach is gaining in popularity as more people become computer literate. An online survey offers dynamic interaction with respondents, yet is not labor-intensive for the researcher. The cost of setting up a web survey is low, and data can be transferred automatically to the data file used for analysis. Thus, the step of entering data into the statistical analysis software is eliminated, along with the risk of data entry errors. In contrast to the other methods, a well-designed and promoted internet survey can be designed, implemented, and completed fairly quickly. Of course, there are disadvantages, too. There is no control over who actually responds to the survey, even if it is by invitation only. The intended respondent could log in and then hand the computer over to another person to answer the questions. Sampling bias can be a problem, because even though more people have computers and internet access than before, still not everyone has them or uses them, so an internet sample would not be completely representative of the general population. Spam filters may make your survey invitation impossible to deliver to some people, and other technical difficulties may interfere with the completion of the survey. Due to these disadvantages and problems, the response rate may end up being low.

As you read Chapter 4 of the textbook this week, pay careful attention to the information on developing good questions for a survey in section 4.2. In Discussion 1, you will include two sample questions from your proposed survey design. Make sure those questions are not confusing, poorly worded, double-barreled, or leading.

Here is a video that you may find helpful in designing a survey (Howcast, 2010):

https://www.youtube.com/watch?v=SsZySkZ8bRo

If you have any questions about this week’s readings or assignments, email your instructor or post your question on the “Ask Your Instructor” forum. Remember, use the forum only for questions that may concern the whole class. For personal issues, use email.

References

Dillman, D. A., Smyth, J. D., & Christian, L. M. (2009). Internet, mail, and mixed-mode surveys: The tailored design method (3rd ed.). Hoboken, NJ: Wiley.

Fowler, F. J. (1995). Improving survey questions: Design and evaluation. Thousand Oaks, CA: Sage.

Howcast. (2010, August 30). How to write a survey or questionnaire [Video file]. Retrieved from https://www.youtube.com/watch?v=SsZySkZ8bRo


Newman, M. (2016). Research methods in psychology (2nd ed.). San Diego, CA: Bridgepoint Education, Inc.

Running head: RESEARCH PROPOSAL

Example Research Proposal

Pamela Murphy

PSY 326 Research

Methods

Instructor’s Name

Date Submitted

NOTE: The details in this example research proposal are based on a published study which I co-

authored with Charles B. Hodges and my doctoral dissertation, both in 2009. Portions of the text

are excerpted from the published article (Hodges & Murphy, 2009) and the dissertation (Murphy,

2009).


Example Research Proposal

Introduction

The concept of self-efficacy was introduced nearly 40 years ago. “Perceived self-efficacy

refers to beliefs in one’s capabilities to organize and execute the courses of action required to

produce given attainments” (Bandura, 1977, p. 3). Self-efficacy has been identified as an

important construct for academic achievement in traditional learning environments for at least

two decades. Zimmerman and Schunk (2003) go so far as to say that “the predictive power of

self-efficacy beliefs on students’ academic functioning has been extensively verified” (p. 446).

Its importance has been noted consistently through all levels of the educational process, with

various student populations, and in varied domains of learning.

While learner self-efficacy has a well-established literature base in the context of

traditional learning environments, self-efficacy research related to learners in online and other

non-traditional learning environments is relatively new. Hodges (2008a) has called for

researchers to explore self-efficacy in online learning environments. Additionally, in terms of

students’ self-efficacy beliefs toward academic achievement, “there have been few efforts to

investigate the sources underlying these self-beliefs” (Usher, 2009, p. 275). The purpose of the

proposed study is to investigate the relative strength of the four traditionally proposed sources of

self-efficacy beliefs of students enrolled in a technology-intensive asynchronous college math

course.


Literature Review

Self-efficacy beliefs have been found to be significant contributors to motivation and

performance in academic achievement (Multon, Brown, & Lent, 1991), group functioning

(Gully, Incalcaterra, Joshi, & Beaubien, 2002; Stajkovic & Luthans, 1998), health (Holden,

1991), and sports performance (Moritz, Feltz, Fahrbach, & Mack, 2000). Research revealing the

connection between self-efficacy and mathematics, the context of the proposed study, includes

many cultures and levels of education (Malpass, O’Neil, & Hocevar, 1999; Pietsch, Walker, &

Chapman, 2003; Randhawa, Beamer, & Lundberg, 1993; Stevens, Olivarez, Lan, & Tallent-

Runnels, 2004) and continues to the present (Usher, 2009).

Sources of Self-Efficacy

Albert Bandura’s (1977) introduction of self-efficacy theory included the proposition that

self-efficacy is derived from four principal sources: mastery experiences, vicarious experience,

social persuasion, and physiological/affective states. These four areas are generally accepted in

the literature as core elements in the development of self-efficacy beliefs, but an ordering of the

importance of each of these sources is unsettled.

Mastery Experiences. Mastery experiences refer to previous, successful experiences a

learner has had performing a task. Successes build positive self-efficacy beliefs and failures

undermine self-efficacy. If failures are experienced before a firm positive belief in one’s self-

efficacy is formed, the creation of positive self-efficacy beliefs is more difficult.

Vicarious Experience. Vicarious experience refers to one’s observation of a role model

performing a task. Knowledge of how others have performed a similar task helps one determine

whether or not a performance should be judged a success or failure. Surpassing the performances

of others increases self-efficacy and falling below others’ performances lowers self-efficacy.


Note the importance of the selection of individuals for comparison. Self-efficacy beliefs will

vary depending on the abilities of those chosen for comparison, thus, models for comparison

should be selected carefully (Wood, 1989).

Social Persuasion. Social persuasion is commonly used due to the ease with which it can

be dispensed. The believability of the persuader(s) is important in the use of social persuasion.

The receiver must view the persuader as competent to provide meaningful and accurate

feedback. Bandura (1997) cautions that verbal persuasion consists of more than flippant, off-

hand comments of encouragement. Unrealistic comments from the persuader may mislead the

receiver, which may decrease self-efficacy and diminish the belief in the persuader as one

competent to evaluate the performance. “Skilled efficacy builders encourage people to measure

their successes in terms of self-improvement rather than in terms of triumphs over others”

(Bandura, 1997, p. 106).

Physiological/Affective States. Stress, emotion, mood, pain, and fatigue are all

interpreted when making judgments regarding self-efficacy. For example, someone may have

prepared well for an exam, but upon learning of some unfortunate news, stress may reduce

concentration, thus impacting performance on the exam. In general, success is expected when

one is not in a state of aversive arousal (Bandura, 1997).

Usher and Pajares (2006) summarize the inconsistent findings regarding the relative

strength of each self-efficacy source well. They follow with the proposition that “exploring the

predictive value of the sources of students’ academic self-efficacy beliefs and determining

whether this prediction varies as a function of group membership such as gender, academic

ability, and race/ethnicity is a matter of import” (p. 130).


Methods

Design

The proposed study is quantitative in nature and will use a survey research design

(Newman, 2011). Survey research falls into the non-experimental category of research designs.

The survey questions use mostly ordinal scales and will result in numeric scores summarizing the

extent of use of each source of self-efficacy beliefs as well as a score representing the level of

self-efficacy held by each student in relation to the ability to learn math in an asynchronous

learning environment.

Participants

Approximately 300 students in an asynchronous college algebra course offered at a large,

state supported university in the mid-Atlantic region of the United States will be invited to

participate in a survey. This is a convenience sample, and participation is voluntary, so the final

sample size may be considerably smaller than the number of students invited. The course is

delivered using an emporium format (Twigg, 2003) which is technology intensive. The students

enrolled in the course tend to be engaged in academic majors that are not math-intensive. They

may have a high degree of math anxiety or at least some negative feelings toward their math

abilities. In addition, the emporium model may be an unfamiliar concept for them.

Procedure/Measures

This course is offered through the Math Emporium and has no traditional class meetings.

After a brief, face-to-face, orientation meeting, students complete the course asynchronously.

There are weekly deadlines for quizzes, and proctored tests are administered periodically.

Students prepare for the quizzes and tests by taking advantage of various technology resources

available to them online. Lesson pages serve as an online textbook for the course, short


streaming video lectures are available on most topics, and an unlimited number of practice

quizzes are available. For students who desire it, face-to-face interactions with assistants in the

computer lab are available several hours each week. No appointment is needed for the face-to-

face assistance.

At the conclusion of the course, data will be collected using a web-based survey tool.

Students who provide informed consent to participate will be given an ID number and survey

access information. They may access the survey either in the Math Emporium or offsite through

the internet. Specific instruments to be used are the Self-Efficacy for Learning Mathematics

Asynchronously (SELMA) survey (Hodges, 2008b), a demographics survey, and the Sources of

Mathematics Self-Efficacy (SMSE) scale (Lent, Lopez, & Bieschke, 1991).

The SELMA survey is a 25-question survey constructed for use in college algebra and

trigonometry courses offered in an emporium model. A validation study showed an internal

consistency Cronbach’s alpha value of 0.87 (Hodges, 2008b) which is greater than the 0.80

minimum level recommended by Gable and Wolf (1993) for instruments in the affective domain.

The SMSE scale consists of four 10-question subscales designed to measure each of the

four sources of self-efficacy: mastery, vicarious experiences, social persuasion, and

affective/physiological state. In a validation study of the SMSE, Lent et al. (1991) reported

internal consistencies of 0.86 for mastery, 0.56 for vicarious, 0.74 for persuasion, and 0.90 for

affective/physiological arousal.
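For readers unfamiliar with internal consistency coefficients, the following short sketch (not part of the original proposal) shows how Cronbach's alpha could be computed from item-level survey responses; the response values below are invented purely for illustration.

import numpy as np

def cronbach_alpha(items):
    # items: 2-D array with one row per respondent and one column per scale item.
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                               # number of items in the scale
    item_variances = items.var(axis=0, ddof=1)       # sample variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the summed scale scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Invented 5-point Likert responses from six respondents to a 4-item subscale.
responses = [[4, 5, 4, 4],
             [2, 2, 3, 2],
             [5, 4, 5, 5],
             [3, 3, 3, 4],
             [1, 2, 1, 2],
             [4, 4, 5, 4]]
print(round(cronbach_alpha(responses), 2))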

Data Analysis

To investigate the relative strength of the four traditional sources of self-efficacy beliefs

of students in an asynchronous math course, analysis of variance (ANOVA) and multiple

regression will be used. Scores from each of the four subscales of the SMSE will be used as


predictors of the SELMA score. Bivariate correlations will also be examined. Significant

correlations among the predictor variables may present a problem of multicollinearity. If

necessary, additional statistical tests such as ridge regression (Joe & Mendoza, 1989; Kidwell &

Brown, 1982) will be applied to solve this problem.
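As a rough illustration of the planned analysis (the data below are simulated stand-ins rather than real SMSE or SELMA scores, and the variable names are placeholders), the four subscale scores could be entered as predictors of the SELMA score in an ordinary least-squares regression, with the bivariate correlations among predictors inspected first for signs of multicollinearity.

import numpy as np

rng = np.random.default_rng(0)
n = 250  # hypothetical number of completed surveys

# Simulated subscale scores standing in for the four sources of self-efficacy.
mastery = rng.normal(35, 6, n)
vicarious = rng.normal(30, 5, n)
persuasion = rng.normal(32, 5, n)
affective = rng.normal(28, 7, n)
predictors = np.column_stack([mastery, vicarious, persuasion, affective])

# Simulated SELMA self-efficacy score as the outcome variable.
selma = 20 + 0.8 * mastery + 0.3 * persuasion - 0.4 * affective + rng.normal(0, 5, n)

# Bivariate correlations among predictors; large values would signal multicollinearity.
print(np.round(np.corrcoef(predictors, rowvar=False), 2))

# Multiple regression: SELMA score predicted from the four subscale scores.
design = np.column_stack([np.ones(n), predictors])   # add an intercept column
coefficients, *_ = np.linalg.lstsq(design, selma, rcond=None)
for name, b in zip(["intercept", "mastery", "vicarious", "persuasion", "affective"],
                   coefficients):
    print(f"{name}: {b:.2f}")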

Ethical Issues

Participation in the survey will be strictly voluntary, and will not be tied to evaluation of

the student’s performance in the course in any way. As a non-experimental survey study, no

deception will be used. Signed informed consent will be obtained from those who wish to

participate. Those who agree to participate may withdraw from the study at any time without any

type of penalty.

Confidentiality of participants will be protected by the assignment of ID numbers to be

used on the survey documents instead of names or any other type of identifying information. A

single copy of the list matching the ID numbers with participants’ names will be kept in a secure,

locked location for a period of three years after the completion of the study. After three years, the

list will be destroyed in accordance with the instructions of the Institutional Review Board

(IRB).

As a token of appreciation, all participants will be entered into a drawing for an Amazon

gift card. The proposed amount of the gift card, subject to IRB approval, is $25. University

facilities, including the computer lab known as the Math Emporium, its computers and a survey

software program, will be used if this study is approved. This project will not receive any

external funding from commercial or other sources, and no conflicts of interest are reported by

the researchers.


Conclusion

Self-efficacy and its relationship to academic achievement in asynchronous online

learning environments are only recently beginning to be researched (Hodges, 2008a). Given the

growing prominence of asynchronous online learning, it is essential that we understand what role

constructs such as self-efficacy play in these learning environments. The proposed study will

address this need by using a survey research design. The surveys will provide data on the four

sources of self-efficacy which will serve as predictors of students’ self-efficacy for learning

mathematics in an asynchronous online setting. A multiple regression model using the four

predictors with the SELMA survey score as the dependent variable will indicate how much each

source contributes to self-efficacy.

The results of this study are expected to be important to instructional designers and

educational practitioners who either currently use or are considering using an emporium model,

as they will give indications of which elements of the asynchronous course design should be

emphasized to best promote students’ self-efficacy relating to the subject matter. An expedited

review of this proposal by the IRB is requested for approval to begin this research as soon as

possible.


References

Allison, P. D. (1999). Multiple regression: A primer. Thousand Oaks, CA: Pine Forge Press.

Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change.

Psychological Review, 84(2), 191-215.

Bandura, A. (1997). Self-efficacy: The exercise of control. New York: W. H. Freeman and

Company.

Gable, R. K., & Wolf, M. B. (1993). Instrument development in the affective domain: Measuring

attitudes and values in corporate and school settings, 2nd ed. Boston: Kluwer Academic

Publishers.

Gully, S. M., Incalcaterra, K. A., Joshi, A., & Beaubien, J. M. (2002). A meta-analysis of team-

efficacy, potency, and performance: Interdependence and level of analysis as moderators

of observed relationships. Journal of Applied Psychology, 87, 819-832.

Hodges, C. B. (2008a). Self-efficacy in the context of online learning environments: A review of

the literature and directions for research. Performance Improvement Quarterly, 20(3-4),

7-25.

Hodges, C. B. (2008b). Self-efficacy, motivational email, and achievement in an asynchronous

math course. Journal of Computers in Mathematics and Science Teaching, 27(3), 265-

285.

Hodges, C. B., & Murphy, P. F. (2009). Sources of self-efficacy beliefs of students in a

technology-intensive asynchronous college algebra course. Internet and Higher

Education, 12(2), 93-97. doi:10.1016/j.iheduc.2009.06.005.

Holden, G. (1991). The relationship of self-efficacy appraisals to subsequent health-related

outcomes: A meta-analysis. Social Work in Health Care, 16, 53-93.


Howell, D. C. (2002). Statistical methods for psychology, 5th ed. Pacific Grove, CA: Duxbury.

Joe, G. W., & Mendoza, J. L. (1989). The internal correlation: Its applications in statistics and

psychometrics. Journal of Educational Statistics, 14(3), 211-226.

Kidwell, J. S., & Brown, L. H. (1982). Ridge regression as a technique for analyzing models

with multicollinearity. Journal of Marriage and the Family, 44(2), 287-299.

Lent, R. W., Lopez, F. G., & Bieschke, K. J. (1991). Mathematics self-efficacy: Sources and

relation to science-based career choice. Journal of Counseling Psychology, 38(4), 424-

430.

Malpass, J. R., O’Neil, H. F., & Hocevar, D. (1999). Self-regulation, goal orientation, self-

efficacy, worry, and high-stakes math achievement for mathematically gifted high school

students. Roeper Review, 21, 281-295.

Moritz, S. E., Feltz, D. L., Fahrbach, K. R., & Mack, D. E. (2000). The relation of self-efficacy

measures to sport performance: A meta-analytic review. Research Quarterly for Exercise

and Sport, 71, 280-294.

Multon, K. D., Brown, S. D., & Lent, R. W. (1991). Relation of self-efficacy beliefs to academic

outcomes: A meta-analytic investigation. Journal of Counseling Psychology, 38(1), 30-

38.

Murphy, P. F. (2009). Relationships of parenting practices, independent learning, achievement,

and family structure (Doctoral dissertation). Virginia Tech, Blacksburg, VA. Retrieved

from http://scholar.lib.vt.edu/theses/available/etd-04022009-174950/

Newman, M. (2011). Research methods in psychology. San Diego, CA: Bridgepoint Education.



Pietsch, J., Walker, R., & Chapman, E. (2003). The relationship among self-concept, self-

efficacy, and performance in mathematics during secondary school. Journal of

Educational Psychology, 95, 589-603.

Randhawa, B. S., Beamer, J. E., & Lundberg, I. (1993). Role of mathematics self-efficacy in the

structural model of mathematics achievement. Journal of Educational Psychology, 85(1),

41-48.

Stajkovic, A. D., & Luthans, F. (1998). Self-efficacy and work-related performance: A meta-

analysis. Psychological Bulletin, 124, 240-261.

Stevens, T., Olivarez, A. J., Lan, W. Y., & Tallent-Runnels, M. K. (2004). Role of mathematics

self-efficacy and motivation in mathematics performance across ethnicity. Journal of

Educational Research, 97(4), 208-221.

Twigg, C. A. (2003). Improving learning and reducing costs: New models for online learning.

EDUCAUSE Review (September/October), 28-38.

Usher, E. L. (2009). Sources of middle school students’ self-efficacy in mathematics: A

qualitative investigation. American Educational Research Journal, 46(1), 275-314.

Usher, E. L., & Pajares, F. (2006). Sources of academic and self-regulatory efficacy beliefs of

entering middle school students. Contemporary Educational Psychology, 31(2), 125-141.

Wood, J. V. (1989). Theory and research concerning social comparison of personal attributes.

Psychological Bulletin, 106(2), 231-248.

Zimmerman, B. J., & Schunk, D. H. (2003). Albert Bandura: The scholar and his contributions to

educational psychology. In B. J. Zimmerman, & D. H. Schunk (Eds.), Educational

psychology: A century of contributions (pp. 431-457). Mahwah, NJ: L. Erlbaum

Associates.


ANNOTATED BIBLIOGRAPHY

What is an Annotated Bibliography?

Some of your courses at Ashford University will require you to write an Annotated Bibliography. An

Annotated Bibliography is a working list of references—books, journal articles, online documents,

websites, etc.—that you will use for an essay, research paper, or project. However, each reference

citation is followed by a short summative and/or evaluative paragraph, which is called an annotation. The

purpose of the annotation is to inform the reader of the relevance, accuracy, and quality of the sources

cited, and to state how this source will be used in or relevant to the paper or project.

Thus, an Annotated Bibliography has two main parts:

1. the citation of your book, article, webpage, video, or document (in APA style)

2. your annotation

How to create an Annotated Bibliography

1. Research the required number of scholarly sources from the library for your project.

2. Reference each source in APA format. For help on how to format each source, see our sample

references list (https://awc.ashford.edu/PDFHandouts/APA_References_List_Sample).

3. Write two paragraphs under each source:

a. The first paragraph is a short summary of the article in your own words. Don’t just cut

and paste the abstract of the article.

b. The second paragraph is a short discussion of how this source supports your paper

topic. What does this source provide that reinforces the argument or claim you are

making? This support may be statistics, expert testimony, or specific examples that relate

to your focused topic.

Sample Annotated Bibliography Entry

Here is a sample entry from an Annotated Bibliography:

Belcher, D. D. (2004). Trends in teaching English for specific purposes. Annual Review of Applied Linguistics,

24(3), 165-186. doi: 10.1017/S026719050400008X.

This article reviews differing English for Specific Purposes (ESP) trends in practice and in theory. Belcher

categorizes the trends into three non-exclusive sects: sociodiscoursal, sociocultural, and sociopolitical.

Sociodiscoursal, she postulates, is difficult to distinguish from genre analysis because many of the major

players (e.g., Ann Johns) tend to research and write in favor of both disciplines. Belcher acknowledges the

preconceived shortcomings of ESP in general, including its emphasis on “narrowly-defined venues” (p. 165), its tendency to “help learners fit into, rather than contest, existing…structures” (p. 166), and its

supposed “cookie-cutter” approach. In response to these common apprehensions about ESP, Belcher cites

the New Rhetoric Movement and the Sydney School as two institutions that have influenced progressive

changes and given more depth to “genre” (p. 167). She concludes these two schools of thought address the

issue of ESP pandering to “monologic” communities. Corpus linguistics is also a discipline that is

expanding the knowledge base of ESP practitioners in order to improve instruction in content-specific

areas. Ultimately, she agrees with Swales (1996) that most genres that could help ESL learners are

“hidden…or poorly taught” (p. 167) and the field of genre is only beginning to grasp the multitude of

complexities within this potentially valuable approach to the instruction of language—and in turn, writing.

This article provides examples as well as expert opinion that I can use in my project. This will provide me

with evidence to support my claims about the current disciplines in ESL studies.

Guidelines for Formatting Your Annotated Bibliography

· Citations should follow APA format.

· Annotations should be indented a half inch (0.5”) so that the author’s last name is the only text that is completely flush left.


CHAPTER 4

NONEXPERIMENTAL QUANTITATIVE RESEARCH

GABRIELLA BELLI

KEY IDEAS

· The distinction between experimental and nonexperimental research rests on the manipulation of treatments and on random assignment.

· Any quantitative study without manipulation of treatments or random assignment is a nonexperimental study.

· Nonexperimental research is used when variables of interest cannot be manipulated because they are naturally existing attributes or when random assignment of individuals to a given treatment condition would be unethical.

· Numbers are used to represent different amounts of quantitative variables and different classifications of categorical variables.

· Nonexperimental studies may be classified along two dimensions: one based on the purpose of the study and the other on the time frame of the data collection.

· Evidence of a relationship is not convincing evidence of causality.

· Alternative explanations for results in nonexperimental research should be explored and ruled out.

NOTE: My thanks to Professor Bill Frakes, from the Computer Science Department at Virginia Tech, and to

students, including many from my Research Methods class in Fall 2007, for reviewing a prior draft of this chapter.

Their insightful comments and suggestions helped improve this version. I take responsibility for any elements of confusion that remain.


OVERVIEW OF NONEXPERIMENTAL RESEARCH

QUANTITATIVE RESEARCH is empirical, using numeric and quantifiable data. Conclusions are

based on experimentation and on objective and systematic observations. Quantitative research may

be divided into two general categories: experimental and nonexperimental. The essential elements

of experimental research, which was discussed in detail in the previous chapter, are presented

here first as a contrast to nonexperimental research. A primary goal for experimental research is

to provide strong evidence for cause-and-effect relationships. This is done by demonstrating that

manipulations of at least one variable, called the treatment or independent variable (IV), produce

different outcomes in another variable, called the dependent variable (DV). An experimental study

involves at least one IV that is manipulated or controlled by the researcher, random assignment to

different treatment conditions, and the measurement of some DV after treatments are applied.

Any resulting differences in the DV across the treatment groups can then be attributed to the

differences in the treatment conditions that were applied.

In contrast to experimental research, nonexperimental research involves variables

that are not manipulated by the researcher and instead are studied as they exist. One

reason for using nonexperimental research is that many variables of interest in social

science cannot be manipulated because they are attribute variables, such as gender,

socioeconomic status, learning style, or any other personal characteristic or trait. For

example, a researcher cannot randomly place individuals into different groups based on

gender or learning style because these are naturally existing attributes.

Another reason to use nonexperimental research is that, in some cases, it would

be unethical to randomly assign individuals to different treatment conditions. A classic

example of this is that one could not study the effects of smoking by randomly assigning

individuals to either a smoking or a nonsmoking group for a given number of years. The

only ethical way to investigate the potential effects of smoking would be to identify a

group of smokers and a group of nonsmokers and compare them for differences in their

current state of health. The researcher, however, would also need to take other variables

into account, such as how long people had smoked, their gender, age, and general health

level. To do so would be important because the researcher cannot take for granted that

the groups are comparable in aspects other than smoking behavior. This is in contrast

to experimental groups, which, due to the process of random assignment, start out

equal in all respects except for the treatment condition in which they are placed. In

nonexperimental research, groups based on different traits or on self-selection, such as

being or not being a smoker, may differ for any number of reasons other than the variable

under investigation. Therefore, in nonexperimental studies, one cannot be as certain as

in experimental studies that outcome differences are due to the independent variable

under investigation. The researcher needs to consider possible alternative explanations,

to jointly analyze several variables, and to present conclusions without making definitive

causal statements.


In this chapter, you will learn how to characterize nonexperimental studies that do

not rely on either manipulation of variables or random assignment of subjects to groups.

Different types of nonexperimental studies will be explained, and you will learn how

to characterize them using a two-dimensional classification system. By the end of the

chapter, you will understand the basic elements of nonexperimental studies, as well

as the rationale for their use. Nonexperimental research examples, including published

studies, will be incorporated into the discussion to facilitate understanding. At the end

of the chapter, text and Web resources are provided to help you locate supplemental

materials and additional information.

VARIABLES AND THEIR MEASUREMENT

To facilitate reading the remainder of the chapter, a brief review of variables and some

of their different aspects is presented. A variable is any characteristic or attribute

that can differ across people or things; it can take on different values. Some variables

are inherent traits, such as gender or height. Others may vary due to experimenter

manipulation, such as treatment groups of drug versus placebo, or due to self-selection,

such as attending a two- or a four-year college. In quantitative research, variables are

measured in some way and those numerical values are then used in statistical analyses.

The nature of variables is important because, to some extent, it dictates the way research

questions are asked and which analysis is used.

One basic distinction is that variables can be either categorical or quantitative.

Categorical variables are those that differ across two or more distinct categories. The

researcher assigns arbitrary numbers to the categories, but the numbers have no inter-

pretable numerical meaning. For example, for categories of the variable “employment

status,” we could assign the value “1” to employed full-time, “2” to employed part-time,

and “3” to not employed. Additional examples of categorical variables that are indi-

vidual traits are gender, ethnicity, and learning style; some that are self-selected are

marital status, political party affiliation, and field of study.
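A minimal sketch of this coding step follows; the category labels and numeric codes simply restate the employment status example above, while the code itself is an illustration added here, not part of the chapter.

# Arbitrary numeric codes for the categorical variable "employment status".
employment_codes = {"employed full-time": 1, "employed part-time": 2, "not employed": 3}

respondents = ["employed part-time", "not employed", "employed full-time"]
coded = [employment_codes[status] for status in respondents]
print(coded)  # [2, 3, 1] -- the numbers label categories and carry no quantitative meaning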

Quantitative variables can be measured across a scale, their numeric values have

meaning, and they can be subjected to arithmetic operations. The following are all

examples of quantitative variables: age, height, weight, grade point average (GPA), job

satisfaction, and motivation. There is an important distinction between the first three and

the last three variables in this list. For such variables as age, height, and weight, zero

is a meaningful value that indicates the absence of the characteristic being measured,

as in something that is brand new or has no weight. The numbers have interpretable

meaning. We know what five years or five feet means because there is no arbitrariness

about these values or how to interpret them.

In contrast, zero is an arbitrary value for variables such as GPA, satisfaction,

or motivation. A zero motivation score does not mean one has no motivation, but

merely that one attained the lowest possible score for the particular instrument

being used. GPA in most schools in the United States is given on a continuum from

0.0 to 4.0 but, for example, at the Massachusetts Institute of Technology (MIT), it

goes from 0.0 to 5.0 (see GPA calculation and unit conversion in MIT Web page

at http://web.mit.edu/registrar/gpacalc.html). The International Baccalaureate grades

range from 1 to 7, based on a rubric developed from the standardized curriculum.


For another example, consider measurements for temperature. The freezing point of

water is represented as zero on a Celsius thermometer, but as 32 on a Fahrenheit

thermometer. In neither case does a zero represent the absence of temperature. In each

case, we understand what the numbers mean because specific interpretations have been

assigned to them.

Interpretation of different grading schemes or thermometers is possible because

of commonly understood unit descriptors. This is not so for such variables as job

satisfaction or motivation, where scores are arbitrary and depend on the measurement

instrument being used and how it has been designed. Typically, such scores are the

sum or the average of responses to a set of items. The items may be statements,

constructed so that all are related to the variable to be measured, and responses are

often, but not always, on a Likert scale from 1 (strongly agree) to 5 (strongly disagree).

The terms scale and index are often used to describe such sets of related items that,

together, produce a score about some characteristic or phenomenon. For example, the

Multidimensional Job Satisfaction Scale (Shouksmith, Pajo, & Jepsen, 1990) contains

eleven different subscales, each a multi-item scale measure of a different dimension

of job satisfaction. Another instrument, the Job Satisfaction Survey (Spector, 1985),

consists of nine four-item subscales to assess employee attitudes about the job. As you

can see from this example, different researchers developed different measures of the

same construct, job satisfaction.
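To make the scoring step concrete, here is a small illustration (mine, not from the chapter) of turning one respondent's Likert item responses into a scale score. The reverse-coding of negatively worded items is common practice but is an assumption added for this sketch, and all values are invented.

# One respondent's answers to a hypothetical 6-item satisfaction scale (1-5),
# where items 3 and 5 (zero-based positions 2 and 4) are negatively worded.
responses = [4, 5, 2, 4, 1, 5]
reverse_positions = {2, 4}

def scale_score(resp, reverse_idx, points=5):
    # Reverse-code flagged items so a higher value always means more of the characteristic.
    adjusted = [(points + 1 - r) if i in reverse_idx else r for i, r in enumerate(resp)]
    return sum(adjusted), sum(adjusted) / len(adjusted)

total, mean = scale_score(responses, reverse_positions)
print(total, round(mean, 2))  # sum score and average score for this respondent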

Exact interpretation of a scale score’s value, or measure, for variables such as moti-

vation or satisfaction is not important. What is important is to know that the higher the

score, the more one has of the characteristic being measured and vice versa. One could,

for example, examine whether males or females had higher levels of job satisfaction

or if people with higher levels of job satisfaction also tended to have higher levels of

motivation. To be confident of results, it is also important to know that the measures

being used are reliable and have been validated.

Reliability relates to the consistency or dependability of a measure. Basically, if

it is reliable, you can be confident that all the items that make up the measure are

consistent with each other and that, if you were to use the measure again with the

same individuals, they would be rated similarly to the first time. Validity relates to

whether it is measuring what we intend it to measure, and represents the overarching

quality of the measure. The purpose of using the measure is an important consideration

in evaluating validity because it could be valid for one use but not for another. These

concepts are complex and beyond the scope of this chapter (see Trochim, 2005 for a

very understandable description of validity and reliability of measures). As a consumer

of research, you should at least be aware of them and look for how research authors

deal with these concepts. Do they describe their measures in detail and provide some

indication of reliability and validity?

Defining Variables

Although some variables are inherently categorical or quantitative, others may be

defined in either way. Imagine, for example, that you are interested in measuring the

education level of a group of individuals. You could do this categorically, by defining


education as “highest degree earned” and using five values representing none, high

school, college, masters, or doctorate as different levels of education. Or, you could

do this quantitatively by defining education as “number of years of schooling,” where

the resulting values would be meaningfully interpreted. This distinction is important

if one is interested in studying the relationship between educational level and salary,

a quantitative variable, because it relates to how the data might be analyzed and how

research questions would be phrased. Using the categorical definition, you could com-

pare the median salary value across the five categories of “highest degree earned.” The

median represents the midpoint when all the salaries are listed from lowest to highest.

One could then determine if there were any appreciable differences in salary across

the five groups and whether more education (represented by having a higher degree)

corresponded to higher salary.

Using the quantitative definition, you could graph the two variables in a scatter plot or compute a correlation coefficient (a measure of strength and direction of relationship

for two variables) for the number of years of schooling and salary. The first would

provide a visual representation of their relationship and the second a numerical one.

Figure 4.1 shows how resulting data might be depicted in the two cases described. The

table shows the number of people in each group and their median salary. The scatter

plot shows all the data points. The correlation for this data set is 0.66. Correlation values range from −1 to +1, with zero indicating no relationship and 1 indicating either a negative or a positive perfect relationship depending on the sign. We could say these data showed a moderate positive relationship. Fewer years of schooling tend to correspond to lower salaries and more schooling to higher salaries.

FIGURE 4.1. Two Representations of the Relationship Between Salary and Education Level

Highest Degree    N     Median Salary
Doctorate         30    68,438
Master's          20    65,938
Bachelor          181   33,150
High School       190   24,975
None              53    24,000
Total             474   28,875

Education is measured as a categorical variable (highest degree). The size of each group (N) and the median salary are given in the table.

Education is measured as a quantitative variable (number of years in school). Each point in the scatter plot (salary plotted against educational level in years; not reproduced here) represents years in school and salary for a single individual.
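The following sketch shows, with a handful of invented records (not the data behind Figure 4.1), how the two framings lead to different analyses: group medians for the categorical definition of education, and a correlation coefficient for the quantitative one.

import statistics

# Invented (highest degree, years of schooling, salary) records for illustration only.
records = [("High School", 12, 24000), ("High School", 12, 27500),
           ("Bachelor", 16, 31000), ("Bachelor", 16, 36000),
           ("Master's", 18, 62000), ("Doctorate", 21, 70000)]

# Categorical framing: compare the median salary across "highest degree" groups.
salaries_by_degree = {}
for degree, _, salary in records:
    salaries_by_degree.setdefault(degree, []).append(salary)
for degree, salaries in salaries_by_degree.items():
    print(degree, statistics.median(salaries))

# Quantitative framing: correlate years of schooling with salary (Pearson r).
years = [r[1] for r in records]
salary = [r[2] for r in records]
print(round(statistics.correlation(years, salary), 2))  # requires Python 3.10+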

Phrasing Questions

In the first case demonstrated in Figure 4.1, you would be comparing groups with

different levels of education on some measure (salary), and in the second case, you

would be relating two sets of numeric scores (years and salary). The research questions

of interest in the two cases would be: (1) how do groups, based on highest degree

earned, differ from each other with respect to salary? and (2) how does number of

years of schooling relate to salary? Phrased generically, the key questions in the two

situations are: How do groups differ from each other on some measure? How are the

variables related to each other? The distinction between these two cases depends only

on the fact that education was conceptualized as either categorical or quantitative and

not on the nature of the relationship involved.

REFLECTION QUESTIONS

By now, you should be able to:

1. Describe the difference between experimental and nonexperimental studies

2. Give an example of an independent and a dependent variable within the context of

a research question

3. Give an example of a categorical and a measured, quantitative variable

CLASSIFYING NONEXPERIMENTAL RESEARCH

In the literature on experimental studies, there is agreement on the distinction between

true- and quasi-experiments. Although both involve treatment manipulation, true-

experiments use random assignment of subjects to groups and random assignment

of groups to treatments. Quasi-experiments use preexisting intact groups, which are

randomly assigned to treatment conditions.

For nonexperimental designs, there appears to be no consistent agreement on typol-

ogy. In 1991, Elazar Pedhazur and Liora Schmelkin stated that “there is no consensus

regarding the term used to refer to designs” which were presented in their chapter

on nonexperimental designs (p. 305). Two commonly used terms for nonexperimental

studies are “correlational research” and “survey research.” However, the term correla-

tion relates more to an analysis strategy than to a research design and the term survey

describes a method of gathering data that can be used in different types of research.

Ten years later, Burke Johnson (2001) came to the same conclusion. Based on

a review of twenty-three leading methods textbooks in education and related fields

(thirteen explicitly from education and the rest from anthropology, psychology, political

science, and sociology), he found little consistency in how nonexperimental studies

were classified. He discovered over two dozen different labels being used, sometimes


with slight variations in the wording. The most frequently used labels in these texts

were survey (twelve times), correlational (ten times), descriptive (eight times), and

causal-comparative (five times). The result of my informal review of six additional

research methods texts was consistent with Johnson’s findings.

In an attempt to remedy this confusion, Johnson (2001) proposed a categoriza-

tion scheme consisting of two basic dimensions, each with three categories. The first

dimension represents a characterization of the basic goal or main purpose for conduct-

ing the nonexperimental quantitative study. The second dimension allows the research

to be classified according to the time frame in which data were collected. These two

dimensions will be presented here and discussed separately in the next two sections.

In your reading of published articles or research methods textbooks, you will proba-

bly encounter other terms for nonexperimental research. You may want to read Johnson

(2001) to familiarize yourself with these terms and with the problems that arise because

of their use.

Classification Based on Purpose (Dimension 1)

The categories of the first dimension for classifying nonexperimental studies, which are

based on the main purpose of the study, are:

1. Descriptive nonexperimental research, in which the primary focus for the research

is to describe some phenomenon or to document its characteristics. Such studies

are needed in order to document the status quo or do a needs assessment in a

given area of interest.

2. Predictive nonexperimental research, in which the primary focus for the research

is to predict some variable of interest (typically called the criterion) using infor-

mation from other variables (called predictors). The development of the proper

set of predictors for a given variable is often the focus of such studies.

3. Explanatory nonexperimental research, in which the primary focus for the

research is to explain how some phenomenon works or why it operates. The

objective is often to test a theory about the phenomenon. Hypotheses derived

from a given theoretical orientation are tested in attempts to validate the theory.

The three categories could be seen as answers to the question: Was the main purpose

of the research to describe a phenomenon, to study how to predict some future event,

or to understand how something operates or what drives it?

To help explain these three categories, consider the use of exit interviews. Such

interviews are often conducted by organizations with employees who leave or by school

systems with departing teachers and graduating seniors. An exit interview study can be

descriptive if the purpose is to collect data in order to get a comprehensive picture of

reasons for employees leaving their organization or school. These descriptions might be

used to determine if people leave for reasons related to the organization or for personal

reasons. On the other hand, the study would be predictive if exit data were collected

and then related to hiring data for the same individuals for the purpose of using the

results to screen potential employees and hiring people who might be less likely to

leave. Finally, the study would be explanatory if the data were analyzed with the


purpose of testing hypotheses about how personal characteristics might be related to

employee or student feelings about their organization or school.

A good example of a published descriptive study is the 39th Annual Phi Delta

Kappa/Gallup Poll of the Public’s Attitudes toward the Public Schools (Rose & Gallup,

2007). Begun as an effort to inform educators, the annual survey now provides infor-

mation that has policy implications. Although the accumulated database can be used

to track changes in attitudes about Pre-K–12 schooling over a long period of time, the

design for each yearly survey is purely descriptive in terms of its purpose. Results are

a descriptive representation of how the general public feels about different aspects of

public schools.

A study by Leslie Halpern and Thomas Dodson (2006) to develop a set of indicators

that could identify women likely to report injuries related to intimate partner violence

is an example of a predictive study. They tried to develop markers that could be used

in hospital settings to make predictions about likelihood of intimate partner violence.

They identified two variables as potential predictors: injury location and responses

to a standard screening questionnaire. They included them, along with demographic

variables, in developing a prediction model.

An explanatory study was done to examine the relationships among the variables

of attachment, work satisfaction, marital satisfaction, parental satisfaction, and life sat-

isfaction (Perrone, Webb, & Jackson, 2007). This research was informed by attachment

theory, which describes “parental attachment as a stable connection that provides a feel-

ing of safety and security for the child” (p. 238). The researchers used five published

instruments and present a very good description of reliability and validity for each one.

Classification Based on Time (Dimension 2)

The categories of the second dimension for classifying nonexperimental research, which

refer to time, are:

1. Cross-sectional research, in which data are collected at one point in time, often in

order to make comparisons across different types of respondents or participants.

2. Prospective or longitudinal research, in which data are collected on multiple

occasions starting with the present and going into the future for comparisons

across time. Data are sometimes collected on different groups over time in order

to determine subsequent differences on some other variable.

3. Retrospective research, in which the researcher looks back in time using existing

or available data to explain or explore an existing occurrence. This backwards

examination may be an attempt to find potential explanations for current group

differences.

These categories could be seen as answers to the question: Were the data collected

at a single time point, across some time span into the future, or were already exist-

ing data explored? You could think of them as representing the past (retrospective),

present (cross-sectional), and future (prospective) with respect to timing of data collec-

tion. As an example, suppose you were interested in assessing differences in college

students’ attitudes toward potential careers. In a cross-sectional study, you might take a


random sample of first-year college students (freshmen) and fourth-year college students

(seniors) and compare their attitudes. Your purpose might be to show that more mature

students (seniors) view career options differently from less mature students (freshmen).

Now consider assessing career attitudes in a prospective study. There are actually

three options: trend, cohort, or panel study. To distinguish among these three approaches,

think of a four-year prospective study starting in 2008 with college freshmen. The pop-

ulation of interest is all college freshmen in the United States. In 2008, a random

sample of college freshmen is taken for all three approaches. Table 4.1 describes the

samples in the subsequent three years for each approach. In the trend study, the same

general population (college freshmen) is tracked. In the cohort study, the same specific

population (college freshmen in 2008) is tracked. In the panel study, the same individu-

als are tracked. One of the advantages of a panel study is that you can look for changes

and not simply report on trends. A disadvantage is that you have to start with a fairly

large sample due to attrition over time, particularly for a lengthy study.

An example of a retrospective study could be an examination of the educational

background and experience of very successful teachers and less successful teachers.

The idea is to look backward in time and examine what differences existed that might

provide an explanation for the present differences in success. To the extent that such a

study needed to depend on people’s memories of relevant background information, it

would be less accurate than if prior data were available for examination.

For a published example, consider one question addressed by Michael Heise (2004),

which was whether key actors in a criminal court case view case complexity in the same

way. The results of his cross-sectional comparison of three key actor groups (juries,

attorneys, and judges) suggest that they do possess slightly different views on whether

crimes are complex.

Examples of both prospective and retrospective research are based on the Nurses’

Health Study, a large-scale longitudinal study started in 1976 with a mailed survey of

121,700 female registered nurses between thirty and fifty-five years of age who lived in

eleven states. Descriptive information about risk factors for major chronic diseases and

related issues was gathered every two years. Although most of the information gathered

TABLE 4.1. Description of Samples After Initial 2008 Sampling of College Freshmen

Trend study: a new sample of college freshmen is drawn in 2009, in 2010, and again in 2011.

Cohort study: a new sample is drawn each year from the original 2008 freshman class: college sophomores in 2009, college juniors in 2010, and college seniors in 2011.

Panel study: the same individuals sampled in 2008 are surveyed again as sophomores in 2009, as juniors in 2010, and as seniors in 2011.


was identical, new questions were added periodically. The Nurses’ Health Study Web

page (www.channing.harvard.edu/nhs) contains a complete list of publications based on

these data.

One such study was conducted by Francine Laden et al. (2000). They examined

the responses from the 87,497 women who answered newly included questions about

lifetime use of electric blankets and heated waterbeds. Using data from the larger study,

Laden and her colleagues focused their attention on the relationship between electric

blanket use and breast cancer from both a prospective and retrospective view. This was

done because electric blanket use is a source of electric and magnetic fields (EMFs)

exposure, and EMF exposure had been hypothesized to increase the risk of breast

cancer. The relevant year is 1992, when information about use of electric blankets and

waterbeds was first documented. For the prospective part of their study, they considered

women who had not been diagnosed with cancer as of 1992 and analyzed the occurrence

of breast cancer from 1992 to 1996 for groups according to electric blanket or waterbed

usage. For the retrospective part, they used records from 1976 to 1992, considering only

women who were cancer free in 1976. In the prospective part of the study “exposure

to electric blankets and waterbed use was assessed prior to the occurrence of breast

cancer,” while in the retrospective analysis “exposure was ascertained after diagnosis”

(Laden et al., 2000, p. 42).

Retrospective studies may be based on past records, as in the previous example,

or on retrospective questions, that is, on questions about past behaviors or experiences.

Merely using already existing data, however, does not make it retrospective. The key

distinction is the study’s purpose. Are you looking backwards to discover some potential

cause or explanation for a current situation, or are you using data from one point in

time to predict data from a later time? Notice that Laden and her colleagues (2000)
used preexisting data for both retrospective and prospective studies. For the prospective

part, women who had not been diagnosed with cancer in 1992 were divided into groups

based on whether they did or did not use electric blankets, and the groups were then

compared with respect to breast cancer incidents by 1996. For the retrospective part,

they divided the women into two groups based on whether they had or had not been

diagnosed with cancer as of 1992 and then compared them in terms of reported prior

use of electric blankets.

Combining Classification Dimensions

When used together, Johnson’s (2001) two dimensions combine to form a 3 × 3 design

for a total of nine distinct categories that may be used to describe nonexperimental

research. Examples of all nine may be found in the National Education Longitudinal

Study of 1988 (NELS:88), which was a large-scale data collection effort. A nationally

representative sample of eighth graders was first surveyed in 1988, with subsequent

follow-up surveys every two years until 1994, and then once again in 2000. The National

Center for Education Statistics’ Web page (http://nces.ed.gov/surveys/nels88) describes

this study, and also provides an annotated bibliography of research done using the

various data sets. Depending on which data were selected for each study and the study


purpose, different NELS:88 studies might be classified using all nine of the purpose
by time frame classifications. To help clarify this cross-classification scheme, Table 4.2
gives the titles of articles representing each type, which are then described.

TABLE 4.2. Articles Classified According to Both Research Objective and Time Dimensions

Descriptive studies:
Type 1 (retrospective): Behavioral responses of substance-exposed newborns: A retrospective study (Higley & Morin, 2004)
Type 2 (cross-sectional): Criminal case complexity: An empirical perspective (Heise, 2004)
Type 3 (prospective): The stability of undergraduate students’ cognitive test anxiety levels (Cassady, 2001)

Predictive studies:
Type 4 (retrospective): Electric blanket use and breast cancer in the nurses’ health study (Laden et al., 2000)
Type 5 (cross-sectional): A predictive model to identify women with injuries related to intimate partner violence (Halpern & Dodson, 2006)
Type 6 (prospective): Electric blanket use and breast cancer in the nurses’ health study (Laden et al., 2000)

Explanatory studies:
Type 7 (retrospective): A further look at youth intellectual giftedness and its correlates: values, interests, performance, and behavior (Roznowski, Reith, & Hong, 2000)
Type 8 (cross-sectional): Relationships between parental attachment, work and family roles, and life satisfaction (Perrone, Webb, & Jackson, 2007)
Type 9 (prospective): Thirty-year stability and predictive validity of vocational interests (Rottinghaus, Coon, Gaffey, & Zytowski, 2007)

Type 1 — Descriptive retrospective. Using retrospective chart review, Anne Marie

Higley and Karen Morin (2004) described the behavior of infants whose mothers had

a drug history. Their findings supported the use of an assessment tool to guide parents

in providing a supportive care environment to help infants recover.

Type 2 — Descriptive cross-sectional. This study was discussed earlier as an

example of a cross-sectional study. It is descriptive because the goal was to document

the extent to which juries, attorneys, and judges held similar or different views about

a case. The results have implications for legal reform efforts.

Type 3 — Descriptive prospective. This was an investigation of the stability of

test anxiety measures over time and testing formats, with data collected at three time

points in an academic semester, therefore making it prospective. The purpose for the

description was to determine whether test anxiety is a stable condition or whether it is necessary to


include a test anxiety measure with every test in a longitudinal study. Results indicated

that it is not necessary to measure anxiety with every test; it is only necessary to measure

anxiety in one test-taking situation.

Type 4 — Predictive retrospective and Type 6 — Predictive prospective. The two

parts of this study were described earlier as examples of retrospective and prospective

studies. Both parts were predictive in nature, using a backward and a forward perspective

to determine the extent to which electric blanket and waterbed use could be used to

predict breast cancer. Although results did not exclude small risks, neither analysis

supported an association between breast cancer risk and use of electric blankets and

waterbeds.

Type 5 — Predictive cross-sectional. In this study, discussed as an example of a

predictive study, a one-time data collection was used. The authors’ aim was to develop

and validate a predictive model. They subdivided their sample, using one group to

develop their model and the second group to validate, or test it. Their work produced

a predictive and validated model of three components: risk of self-report of intimate

partner violence related injury, age, and race. The researchers then hypothesized that

these three variables could be used to develop a protocol to assist in the early diagnosis

of intimate partner violence in an emergency department and outpatient clinical setting.

Type 7 — Explanatory retrospective. This study was explanatory because a goal

was to further previous work on giftedness and knowledge and understanding of sev-

eral related variables. The data came from the High School and Beyond database, a

longitudinal study with baseline information on 14,825 students who were high school

sophomores in 1980. The data for this study included the base year and the third

follow-up survey, four years later, after graduation. The data set “allowed for more

comparisons than could reasonably be included in a single study. Variables were cho-

sen that would either serve to replicate previous findings or expand psychological and

behavioral profiles of gifted male and female students into more detail” (Roznowski,

Reith, & Hong, 2000, p. 96). A retrospective conclusion was that educational attain-

ment differences of gifted males and females had their origins in the early high school

years.

Type 8 — Explanatory cross-sectional. Already discussed as an example of an

explanatory study, this study was based on data from the fifteenth annual survey of a

longitudinal study that started in 1988 with 1,724 participants. About 1,200 participants

were lost in the first three years. Only 108 participants were left for this study, which

shows the dramatic attrition that can happen in a longitudinal study. Although the data

were from a longitudinal study, these authors only used the fifteenth year’s data, thereby

making it cross-sectional.

Type 9 — Explanatory prospective. The authors suggested that “Assessing the

predictive validity of an interest inventory is essentially answering the question, ‘Do

early interest scores match one’s future occupation?’” (Rottinghaus et al., 2007, p. 7). To

answer this question, they did a thirty-year follow-up of 107 former high school juniors

and seniors whose interests were assessed in 1975. The first author had collected the

initial data. Their results extend research on vocational interests, indicating that interests

were fairly stable even after such a long time span.


REFLECTION QUESTIONS

1. How do descriptive, predictive, and explanatory studies differ?

2. How do retrospective, cross-sectional, and prospective studies differ?

3. Find several recent articles in your field of study where a nonexperimental

design was used. Classify their main purpose as being descriptive, predictive, or

explanatory and classify the time dimension as retrospective, cross-sectional,

or prospective.

CAUSAL EXPLANATIONS AND NONEXPERIMENTAL STUDIES

In Johnson’s (2001) classification system, many nonexperimental studies are either

descriptive or predictive. For those, the notion of causation is not relevant. However,

a goal for many explanatory nonexperimental research studies is to explore potentially

causal relationships. A causal relationship is one in which a given action is likely to

produce a particular result.

The terms independent and dependent refer to the different roles variables play in

experimental studies. If a causal relationship exists, then the outcome (the measured

DV) depends on, or is a direct result of, the nature of the assigned independent treatment

condition. Strictly speaking, these terms are not applicable in nonexperimental research,

although they are often used. The more appropriate terms in nonexperimental studies

are criterion and predictor variables, criterion being the presumed outcome of one

or more predictor variables. When the intent is to use nonexperimental research to

study potential cause-and-effect relationships where experimentation is not possible,

the concept of IV and DV may still be of interest, but conclusions about causation that

can be made from nonexperimental studies are weaker than those that can be made

from true-experimental studies. Additionally, great care needs to be taken to assure that

nothing essential has been overlooked.

As explained earlier, the distinction is often made between nonexperimental studies

that involve both categorical and quantitative variables and those that involve only

quantitative variables. Considering only two variables for the sake of simplicity, an

example of the first type of study is a comparison of gender differences in mathematics

achievement in high school. Gender, with male and female as the two categories, is

considered the independent variable and some mathematics achievement score is the

measured dependent variable. Examples where both variables are quantitative might

be an examination of the relationship between test scores and time spent studying, or

between scores on some measure of motivation and scores on an achievement test.

Examples like these, of very simple cases involving only two variables, are neither

very interesting nor very informative. Additional variables could be included in order

to examine more complex relationships.

No matter which type of design or which type of variable is used, evidence of a rela-

tionship would not be convincing evidence of causality. Recall the example described

earlier about investigating the relationship between education level and salary and the


two ways that education level could be measured. Regardless of whether education

level was construed as categorical (highest degree earned) or as quantitative (number

of years of schooling), it should not be concluded that one’s educational level caused or

produced a different level of salary. If dramatic differences across the five groups with

different degrees were found such that those with higher education had higher median
salaries, all that can be concluded is that there was a relationship between educational

level and salary. This same conclusion would be possible if results indicated a strong

positive correlation between years of schooling and salary: that people with fewer years

of school tended to have low salaries and people with more years of school tended to

have high salaries (see Figure 4.1 for graphical representation of a positive relationship).

The scatter plot for a negative relationship would go from the upper left corner to the

lower right corner, indicating that low scores on one variable tended to go with high

scores on the other variable.
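As a concrete illustration of direction, here is a minimal Python sketch; the data and variable names are invented for this example and are not taken from the chapter. A coefficient near +1 corresponds to the upward-trending scatter plot described for schooling and salary, while a reversed pattern would produce a negative coefficient and a scatter plot running from the upper left to the lower right.

```python
# Minimal sketch: the direction of a relationship as a correlation coefficient.
# The numbers below are hypothetical and exist only for illustration.
import numpy as np

years_of_schooling = np.array([10, 12, 12, 14, 16, 16, 18, 20])
salary_thousands = np.array([28, 31, 35, 40, 48, 52, 60, 75])

# np.corrcoef returns a 2 x 2 correlation matrix; the off-diagonal entry
# is the Pearson r between the two variables.
r = np.corrcoef(years_of_schooling, salary_thousands)[0, 1]
print(f"Pearson r = {r:.2f}")  # positive: more schooling goes with higher salary
```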

The differences in the wording of the research questions in the two education-level examples above
reflect the nature of the variables used (categorical or quantitative). They would require

different analysis strategies, either to test if the median values did differ more than you

might expect by chance, or to determine the strength and direction of the relationship.

Differences in wording or analysis do not, however, reflect any difference in the nature

of the relationship between the variables. Explanatory nonexperimental research articles

often have conclusions phrased in causal language. Therefore, the next section is a

review of the essential elements needed to establish cause-and-effect relationships and

a discussion of their applicability to nonexperimental studies.

Requirements for Causality

There are three conditions necessary in order to be able to argue that some variable X
(the presumed independent) causes another variable Y (the presumed dependent).

1. The two variables X and Y must be related. If they are not related, it is impossible

for one to cause the other. For nonexperimental research, that means that it must

be demonstrated that differences in X are associated with differences in Y.

2. Changes in X must happen before observed changes in Y. This is always the case

when X is a manipulated treatment variable in an experiment. But establishing

that a cause happened before an effect needs to be documented in some way or

logically explained in nonexperimental studies. This is impossible to do when the

data are cross-sectional and collected simultaneously.

3. There is no possible alternative explanation for the relationship between X and

Y. That is, there is no plausible third variable that might explain the observed

relationship between X and Y, possibly having caused both of them.

In nonexperimental studies, the first requirement can be established easily with

correlational analyses. The second could also be established if longitudinal data are used

so that predictor variables are measured before the criterion. The third requirement is

more difficult to demonstrate. To do so requires a thorough knowledge of the literature

and the underlying theory or theories governing the topic being investigated, logical

arguments, plus testing and ruling out of alternative possibilities.
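To make the first and third requirements a bit more tangible, here is a small, hypothetical Python sketch; the variables, simulated data, and the partial-correlation check are my own illustration rather than a procedure given in this chapter. It first confirms that X and Y are related, and then probes one alternative explanation by statistically removing a candidate third variable Z from both and correlating what remains. If the relationship between X and Y largely disappears, Z is a plausible common cause.

```python
# Hypothetical sketch: checking that X and Y are related (requirement 1) and
# probing a third-variable explanation (one part of requirement 3).
# All data are simulated; Z drives both X and Y, and X has no effect on Y.
import numpy as np

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)              # candidate third variable (e.g., SES)
x = 0.8 * z + rng.normal(size=n)    # X depends partly on Z
y = 0.8 * z + rng.normal(size=n)    # Y depends partly on Z, not on X

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Requirement 1: a nonzero zero-order correlation between X and Y.
print(f"r(X, Y) = {corr(x, y):.2f}")

# Probe an alternative explanation: correlate the residuals of X and Y after
# each is regressed on Z (a partial correlation controlling for Z).
x_resid = x - np.polyval(np.polyfit(z, x, 1), z)
y_resid = y - np.polyval(np.polyfit(z, y, 1), z)
print(f"r(X, Y | Z) = {corr(x_resid, y_resid):.2f}")  # near zero: Z explains the link
```

A single statistical control like this cannot, of course, rule out every alternative; the point is only that such checks, combined with theory and logical argument, are part of making the case.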


The fact that two variables are related does not inform us of which one influences

the other. There are at least three reasons why two variables could be related, and it is not
possible to tell from the correlation alone which explanation is correct. Three potential

explanations are: (1) that X causes or influences Y, (2) that Y causes or influences X,

or (3) that Z, a third variable, causes both X and Y. Consider the following headline:

“Migraines plague the poor more than the rich.” It could be argued that the stresses of

living in poverty and other poverty-related conditions could trigger migraine headaches.

It could also be argued that migraines cause one to miss work and eventually lose

employment, thereby inducing poverty for a subset of individuals prone to migraines.

Which is the correct interpretation? It is impossible to tell.

Although there is no formal way to prove causation in nonexperimental research, it

may be possible to suggest it. This is done through careful consideration, by referring

to the three conditions for cause, by presenting logical arguments, and by testing likely

alternatives in order to make a case for the likely conclusion of a causal relationship.

One must be careful, however, not to phrase conclusions as proof of causation.

Ruling Out Alternative Hypotheses

To demonstrate the process for ruling out alternative hypotheses, we will use a medical

example. Consider the process a doctor goes through in diagnosing a new patient’s

illness. First, the doctor considers the symptoms. The list of symptoms is used to

select potential problems with similar symptoms and to rule out problems with different

symptoms. Tests are ordered to confirm the most likely diagnosis and remedies are

tried. If the test results are negative or the remedies do not work, then the original

diagnosis is discarded, and other possible diagnoses are considered and tested. How

does this process relate to research? The first step is matching observations (the reported

symptoms) to theory (known symptoms for an illness). The second step is to test a hunch

or tentative hypothesis (initial diagnosis) and rule out alternative hypotheses (other

potential diagnoses). The process continues until a reasonable conclusion is reached.

The analogy breaks down because, ideally, the correct diagnosis is made and the patient

is cured, although results are never as conclusive in nonexperimental studies.

Given a theory that is driving the research, how does one rule out potential alterna-

tive hypotheses? One way is to consider all likely confounding or lurking variables. In

an experimental study, two variables are confounded when their effects on a dependent

variable cannot be distinguished. The following example, although purely correlational,

should clarify the concept of confounding or lurking variables.

One would expect that grades and standardized tests, such as SAT scores, would

be related more to each other than they would to socioeconomic status (SES). In many

studies, however, SES and SAT appear to have a much stronger relationship than do

grades and SAT. Rebecca Zwick and Jennifer Green (2007) explored reasons for such

results with data from a random sample of 98,391 students from 7,330 high schools.

They performed two different analyses. In the first analysis, they found the correlation

for grades and SAT for the entire sample and, in the second analysis, they did so for

each school individually and then averaged the school-level results to get one overall

measure of relationship. The second analysis produced a much stronger relationship


between grades and SAT scores than did the first analysis. This is because the first

analysis ignored the fact that there are school-level differences in SES as well as other

variables.

Figure 4.2 should help you visualize this discussion. In part A, the two smaller

ovals represent a scatter plot of scores for two schools, where both grades and SAT
scores tend to be higher in School 2 than in School 1. The lines bisecting these two

ovals provide a linear representation of the relationship between the variables within

each school and are called regression lines. Both ovals are rather narrow in width,

being fairly close to their regression lines, and thereby give a visual representation of a

relatively strong positive relationship between grades and SAT within each school. The

larger oval represents the relationship between grades and SAT scores as it would appear

across or between schools, that is, if school membership were ignored in the analysis.

It is much more spread out around its regression line (the dotted line), erroneously

indicating a much weaker relationship between grades and SAT. The two smaller ovals

correspond to Zwick and Green’s second analysis (2007) and the larger oval to their

first analysis. Ignoring the differences between the schools confounds the relationship

between grades and SAT being investigated.

Part B of Figure 4.2 shows a worst-case scenario of ignoring a lurking variable.

Suppose the relationship between two variables, X and Y, is negative for each of

two groups. This is shown by the two smaller ovals, where lower scores on X tend

to go with higher scores on Y and vice versa within each group. Ignoring groups,

however, would produce a positive relationship, which would be a completely wrong conclusion.

FIGURE 4.2. Representation of Effects of Confounding Variables. [Two scatter plots. Panel A plots Grades against SAT for School 1 and School 2: a fairly strong positive relationship between grades and SAT within each school, but a weak relationship when school membership is ignored. Panel B plots X against Y for Group 1 and Group 2: a fairly strong negative relationship between X and Y within each group, but a seemingly positive relationship when group membership is ignored.]
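The worst-case pattern in Part B can be reproduced with a few lines of simulation. The sketch below is hypothetical and does not use Zwick and Green's data: within each group the correlation between X and Y is strongly negative, yet pooling the two groups while ignoring group membership yields a positive correlation, exactly the wrong conclusion described above.

```python
# Hypothetical simulation of Figure 4.2, Part B: negative within-group
# relationships that look positive when the grouping variable is ignored.
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Group 1: low on both X and Y, with a negative slope within the group.
x1 = rng.normal(loc=2.0, scale=1.0, size=n)
y1 = 2.0 - 0.8 * (x1 - 2.0) + rng.normal(scale=0.5, size=n)

# Group 2: high on both X and Y, also with a negative slope within the group.
x2 = rng.normal(loc=8.0, scale=1.0, size=n)
y2 = 8.0 - 0.8 * (x2 - 8.0) + rng.normal(scale=0.5, size=n)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print(f"Group 1 only:  r = {corr(x1, y1):.2f}")   # negative
print(f"Group 2 only:  r = {corr(x2, y2):.2f}")   # negative
x_all = np.concatenate([x1, x2])
y_all = np.concatenate([y1, y2])
print(f"Groups pooled: r = {corr(x_all, y_all):.2f}")  # positive: the lurking variable at work
```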

REFLECTION QUESTIONS

By now, you should be able to

1. List and explain three essential requirements to argue cause

2. Explain why even a strong correlation does not imply causation

3. Describe why ruling out alternative hypotheses is important.

4. Find one or two nonexperimental studies in your field of study where hypotheses

were tested or where a theory was explored. What extraneous variables or poten-

tial alternative hypotheses were discussed? Can you think of others that were not

discussed? How might inclusion of those variables have changed results?

ANALYSIS AND INTERPRETATION IN NONEXPERIMENTAL STUDIES

Data analyses in nonexperimental studies depend on both the goal for the study and

the nature of the variables in the data set. Almost any analysis may be appropriate, so a
comprehensive presentation is not feasible here. There are ample books and sources for

details about statistical methods and their use. A few examples are given at the end of

the chapter; also see the discussion on understanding quantitative data in Chapter Six.

You need to be aware of the basic distinction between descriptive and inferential

statistics. Descriptive statistics involve summarizing and describing quantitative infor-

mation in meaningful ways. For example, a mean, or arithmetic average, is a statistic

used to describe a central value for a set of numbers. Inferential statistics are used to

make conclusions beyond the data collected and to test hypotheses. Statistical tests are

used to make conclusions about populations based on results from random samples or

to determine the probability that results are not due to random chance.
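As a small, hypothetical illustration of this distinction, the Python sketch below first computes a descriptive statistic, the mean of each group, and then applies an inferential test, an independent-samples t test from SciPy, to ask how likely a difference this large would be if only chance were operating. The groups, scores, and choice of test are invented for the example and are not drawn from the chapter.

```python
# Hypothetical example: descriptive versus inferential statistics.
import numpy as np
from scipy import stats

group_a = np.array([72, 75, 78, 81, 84, 86, 88, 90])  # invented exam scores
group_b = np.array([65, 68, 70, 73, 75, 77, 80, 82])

# Descriptive statistics summarize the data in hand.
print(f"Mean of group A = {group_a.mean():.1f}")
print(f"Mean of group B = {group_b.mean():.1f}")

# Inferential statistics generalize beyond these samples: how probable is a
# mean difference this large if the two populations do not actually differ?
result = stats.ttest_ind(group_a, group_b)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```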

Interpretation of results in nonexperimental studies should be consistent with the

nature of the work, which is based on nonmanipulated variables. Therefore, conclusions

about cause and effect are not appropriate in any nonexperimental study. As you read

empirical articles, you should be attuned to how conclusions are discussed and be wary

of causal language. Robinson, Levin, Thomas, Pituch, and Vaughn (2007) reviewed

274 empirical articles in five teaching-and-learning research journals in 1994 and 2004.

They recorded causal and noncausal language use in abstracts and discussion sections.

Their two main conclusions were: (1) experimental articles in teaching-and-learning

declined in the ten-year span, and (2) on average, the use of causal conclusions made

in nonexperimental and qualitative studies increased. They conclude by saying that “as

journal readers, we have an obligation to search an article for information about how

the data were collected so we are not unduly influenced by unwarranted conclusions”

(Robinson et al., 2007, p. 412). Ideally, after studying this chapter you will be able to

search through articles for information about how the study was conducted and use that

to consider conclusions.


SUMMARY

The goal for this chapter was to present

adequate information about nonexperi-

mental designs so that a practitioner could

read the literature and have a basic under-

standing of methods used. Nonexperi-

mental research is described in many

ways and covers any quantitative study

that does not have manipulated vari-

ables or random assignment. A topic

of research interest can be modified to

serve alternative purposes, and data can

be collected over different time frames.

The two-dimensional classification sys-

tem presented here should help you cat-

egorize articles. Reading any of the arti-

cles listed in Table 4.2 that are of interest

to you could be useful in understand-

ing why it was classified according to

the two dimensions given. A good place

to start, with a relatively straightforward

example, would be the Cassady (2001)

article, which is an example of Type 3,

a descriptive prospective study. A good

exercise would be to find other nonexper-

imental studies and classify them accord-

ing to the two dimensions of purpose and

time of data collection.

A key to understanding published

research is to identify the goal of the

research, evaluate what was done in rela-

tion to that goal, and consider aspects and

variables that may have been overlooked.

Most important, consider the language

used in published works and be skeptical

if overzealous researchers present their

nonexperimental results in causal terms.

Regardless of what type of research is

presented, be a wary consumer.

KEY TERMS

attribute variables

categorical variables

confounding or lurking variables

correlation coefficient

criterion

cross-sectional research

dependent variable

descriptive nonexperimental research

descriptive statistics

experimental research

explanatory

nonexperimental research

independent variable

index

inferential statistics

median

nonexperimental research

predictive nonexperimental research

predictors

prospective or longitudinal research

quantitative variables

quasi-experiments

random assignment

random sample

regression line

reliability

retrospective research

scale

scatter plot

true-experiments

validity

variable


FURTHER READINGS AND RESOURCES

Suggested Readings

Allison, P. D. (1999). Multiple regression: A primer. Thousand Oaks, CA: Pine Forge Press.

This basic text, discussing an analysis technique often used in nonexperimental studies, is written in an

understandable manner, using examples from social science research literature to develop the concepts.

Johnson, R. B., & Christensen, L. See lecture in Chapter Eleven: Nonexperimental quantitative research, based

on Educational Research: Quantitative, Qualitative, and Mixed Applications. Retrieved March 13, 2008, from

www.southalabama.edu/coe/bset/johnson/2lectures.htm.

Discusses steps in nonexperimental research, ways to control extraneous variables in nonexperimental

research, and Johnson’s classification scheme for nonexperimental research, and provides a graphic description of

controlling for a third variable.

Locke, L. F., Silverman, S. J., & Waneen, W. S. (2004). Reading and understanding research (2nd ed.). Thousand Oaks, CA: Sage.

Although this book deals with research in general, it is an easily understandable resource with good examples

to help you read and understand published research articles. Aimed at consumers of research, the approach is

nontechnical and user-friendly.

Lowry, R. (1999–2008). Concepts and applications of inferential statistics. Retrieved October 10, 2007, from

http://faculty.vassar.edu/lowry/webtext.html.

Chapter Three of this free, full-length statistics textbook provides an introduction to linear correlation and

regression using examples and diagrams. This is useful for understanding the basic analyses used with nonexperi-

mental data.

Meltzoff, J. (1997). Critical thinking about research: Psychology and related fields . Washington, DC: American

Psychological Association.

This text should help develop critical thinking skills through exercises in critiquing different types of
research studies. It combines fundamental content with practice articles.

Trochim, W. M. The research methods knowledge base (2nd ed.). Retrieved October 20, 2006, from

www.socialresearchmethods.net/kb.

Of particular use is the Language of Research part of the Foundation section, where types of relationships

are clearly described, using simple examples and graphs.

Week 5 – Final Paper

Final Paper: Research Proposal

Review the Example Research Proposal provided in the course materials. Design a research study on the topic of the study selected in Week One and critiqued in Week Three. Your design should seek to resolve the limitations you identified in the study you critiqued. Your paper must address all of the components required in the “Methods” section of a research proposal:

· State the research question and/or hypothesis.

· Specify the approach (qualitative or quantitative), research design, sampling strategy, data collection procedures, and data analysis techniques to be used.

· If the design is quantitative, also describe the variables, measures, and statistical tests you would use.

· Analyze ethical issues that may arise and explain how you would handle these issues.

Your Final Paper must be six to eight pages in length (excluding title and reference pages) and formatted according to APA style as outlined in the Ashford Writing Center. Utilize a minimum of six peer-reviewed sources that were published within the last 10 years, in addition to the textbook, that are documented in APA style as outlined in the Ashford Writing Center. The sources should consist of the following:

· One source should be the article you critiqued in the Week Three assignment.

· At least two sources should be about the research methodology you have chosen for your study.

· At least one source should be on ethical issues in research.

· The remaining sources may be about anything pertinent to your study.

In accordance with APA style, all references listed must be cited in the body of the paper.

Required Sections and Subsections (use these headings in your paper)

I. Introduction – Introduce the research topic, explain why it is important, and present your research question and/or hypothesis.

II. Literature Review – Summarize the current state of knowledge on your topic, making reference to the findings of previous research studies (including the one you critiqued in Week Three). Briefly analyze and critique these studies and mention the research methods that have previously been used to study the topic. State whether your proposed study is a replication of a previous study or a new approach using methods that have not been used before. Be sure to properly cite all of your sources in APA style.

III. Methods

A. Design – Indicate whether your proposed study is qualitative or quantitative in approach. Identify the specific research design, using one of the designs we have studied in Weeks Three through Five, and indicate whether it is experimental or non-experimental. Evaluate your chosen design and explain why you believe this design is appropriate for the topic and how it will provide the information you need to answer the research question. Cite sources on research methodology to support your choices.

B. Participants – Identify and describe the sampling strategy you would use to recruit participants for your study. Estimate the number of participants you would need and explain why your sampling method is appropriate for your research design and approach.

C. Procedure/Measures – Apply the scientific method by describing the steps you would use in carrying out your study. Indicate whether you will use any kind of test, questionnaire, or measurement instrument. If using an existing published instrument, provide a brief description and cite your source. If you are creating a questionnaire, survey, or test, describe the types of information you will gather and explain how you would establish the validity and reliability. If you are not using such an instrument, describe how you would collect the data.

D. Data Analysis – Describe the statistical techniques (if quantitative) or the analysis procedure (if qualitative) you plan to use to analyze the data. Cite at least one source on the chosen analysis technique (from your Week Two assignment).

E. Ethical Issues – Analyze the impact of ethical concerns on your proposed study, such as confidentiality, deception, informed consent, potential harm to participants, conflict of interest, IRB approval, etc. After analyzing the ethical issues that apply to your research proposal, indicate what you would do to handle these concerns.

IV. Conclusion – Briefly summarize the major points from your paper and reiterate why your proposed study is needed.


Writing the Final Paper

The Final Paper:

· Must be six to eight double-spaced pages in length, and formatted according to APA style as outlined in the Ashford Writing Center.

· Must include a title page with the following:

· Title of paper

· Student’s name

· Course name and number

· Instructor’s name

· Date submitted

· Must begin with an introductory paragraph that has a succinct thesis statement.

· Must address the topic of the paper with critical thought.

· Must end with a conclusion that reaffirms your thesis.

· Must use at least six peer-reviewed sources that were published within the last 10 years, in addition to the textbook.

· Must document all sources in APA style, as outlined in the Ashford Writing Center.

· Must include a separate reference page, formatted according to APA style as outlined in the Ashford Writing Center.

Carefully review the Grading Rubric (Links to an external site.) for the criteria that will be used to evaluate your assignment.


Observational research can be used to measure an infant’s attachment to a caregiver.

3.4 Observational Research
Moving further along the continuum of control, we come to the
descriptive design with the greatest amount of researcher control.
Observational research involves studies that directly observe behavior
and record these observations in an objective and systematic way. Your
previous psychology courses may have explored the concept of
attachment theory, which argues that an infant’s bond with his or her
primary caregiver has implications for later social and emotional
development. Mary Ainsworth, a Canadian developmental psychologist,
and John Bowlby, a British psychologist and psychiatrist, articulated this
theory in the 1960s. They argued that children can form either “secure”
or a variety of “insecure” attachments with their caregivers (Ainsworth &
Bell, 1970; Bowlby, 1963).

To assess these classifications, Ainsworth and Bell developed an
observational technique called the “strange situation.” Mothers would
arrive at their laboratory with their children for a series of structured interactions, including having the mother play
with the infant, leave him alone with a stranger, and then return to the room after a brief absence. The researchers
were most interested in coding the ways in which the infant responded to these various episodes (eight in total).
One group of infants, for example, was curious when the mother left but then returned to playing with toys, trusting
that she would return. Another group showed immediate distress when the mother left and clung to her nervously
upon her return. Based on these and other behavioral observations, Ainsworth and colleagues classified these
groups of infants as “securely” and “insecurely” attached to their mothers, respectively.

Research: Making an Impact

Harry Harlow

In the 1950s, U.S. psychologist Harry Harlow conducted a landmark series of studies on the mother–infant
bond using rhesus monkeys. Although contemporary standards would consider his research unethical, the
results of his work revealed the importance of affection, attachment, and love on healthy childhood
development.

Prior to Harlow’s findings, it was believed that infants attached to their mothers as a part of a drive to fulfill
exclusively biological needs, in this case obtaining food and water and avoiding pain (Herman, 2007; van der
Horst & van der Veer, 2008). In an effort to clarify the reasons that infants so clearly need maternal care,
Harlow removed rhesus monkeys from their natural mothers several hours after birth, giving the young
monkeys a choice between two surrogate “mothers.” Both mothers were made of wire, but one was bare and
one was covered in terry cloth. Although the wire mother provided food via an attached bottle, the monkeys
preferred the softer, terry-cloth mother, even though the latter provided no food (Harlow & Zimmerman,
1958; Herman, 2007).

Further research with the terry-cloth mothers contributed to the understanding of healthy attachment and
childhood development (van der Horst & van der Veer, 2008). When the young monkeys were given the
option to explore a room with their terry-cloth mothers and had the cloth mothers in the room with them,
they used the mothers as a safe base. Similarly, when exposed to novel stimuli such as a loud noise, the
monkeys would seek comfort from the cloth-covered surrogate (Harlow & Zimmerman, 1958). However,
when the monkeys were left in the room without their cloth mothers, they reacted poorly—freezing up,
crouching, crying, and screaming.


A control group of monkeys who were never exposed to either their real mothers or one of the surrogates
revealed stunted forms of attachment and affection. They were left incapable of forming lasting emotional
attachments with other monkeys (Herman, 2007). Based on this research, Harlow discovered the importance
of proper emotional attachment, stressing the importance of physical and emotional bonding between
infants and mothers (Harlow & Zimmerman, 1958; Herman, 2007).

Harlow’s influential research led to improved understanding of maternal bonding and child development
(Herman, 2007). His research paved the way for improvements in infant and child care and in helping
children cope with separation from their mothers (Bretherton, 1992; Du Plessis, 2009). In addition, Harlow’s
work contributed to the improved treatment of children in orphanages, hospitals, day care centers, and
schools (Herman, 2007; van der Horst & van der Veer, 2008).

Pros and Cons of Observational Research

Observational designs are well suited to a wide range of research questions, provided the questions can be
addressed through directly observable behaviors and events. For example, researchers can observe parent–child
interactions, or nonverbal cues to emotion, or even crowd behavior. However, if they are interested in studying
thought processes—such as how close mothers feel to their children—then observation will not suffice. This point
harkens back to the discussion of behavioral measures in Chapter 2 (2.2): In exchange for giving up access to
internal processes, researchers gain access to unfiltered behavioral responses.

To capture these unfiltered behaviors, it is vital for the researcher to be as unobtrusive as possible. As we have
already discussed, people have a tendency to change their behavior when they are being observed. In the bullying
study by Craig and Pepler (1997) discussed at the beginning of this chapter, the researchers used video cameras to
record children’s behavior unobtrusively. Imagine how (artificially) low the occurrence of bullying might be if the
playground had been surrounded by researchers with clipboards!

If researchers conduct an observational study in a laboratory setting, they have no way to hide the fact that people
are being observed, but the use of one-way mirrors and video recordings can help people to become comfortable
with the setting. Researchers who conduct an observational study out in the real world have even more possibilities
for blending into the background, including using observers who are literally hidden. For example, suppose someone
hypothesizes that people are more likely to pick up garbage when the weather is nicer. Rather than station an
observer with a clipboard by the trash can, the researcher could place someone out of sight behind a tree, or
perhaps sitting on a park bench pretending to read a magazine. In both cases, people would be less conscious of
being observed and therefore more likely to behave naturally.

One extremely clever strategy for blending in comes from a study by the social psychologist Muzafer Sherif and colleagues
(1954), involving observations of cooperative and competitive behaviors among boys at a summer camp. For Sherif,
it was particularly important to make observations in this context without the boys realizing they were part of a
research study. Sherif took on the role of camp janitor, which allowed him to be a presence in nearly all of the camp
activities. The boys never paid enough attention to the “janitor” to realize his omnipresence—or his discreet note-
taking. The brilliance of this idea is that it takes advantage of the fact that people tend to blend into the background
once we become used to their presence.

Types of Observational Research

Several variations of observational research exist, according to the amount of control that a researcher has over the
data collection process. Structured observation involves creating a standard situation in a controlled setting and
then observing participants’ responses to a predetermined set of events. The “strange situation” studies of parent–
child attachment (discussed above) are a good example of structured observation—mothers and infants are
subjected to a series of eight structured episodes, and researchers systematically observe and record the infants’


reactions. Even though these types of studies are conducted in a laboratory, they differ from experimental studies in
an important way: Rather than systematically manipulate a variable to make comparisons, researchers present the
same set of conditions to all participants.

Another example of structured observation comes from the research of John Gottman, a psychologist at the
University of Washington. For nearly three decades, Gottman and his colleagues have conducted research on the
interaction styles of married couples. Couples who take part in this research are invited for a three-hour session in a
laboratory that closely resembles a living room. Gottman’s goal is to make couples feel reasonably comfortable and
natural in the setting to get them talking as they might do at home. After allowing them to settle in, Gottman adds
the structured element by asking the couple to discuss an “ongoing issue or problem” in their marriage. The
researchers then sit back to watch the sparks fly, recording everything from verbal and nonverbal communication to
measures of heart rate and blood pressure. Gottman has observed and tracked so many couples over the decades
that he is able to predict, with remarkable accuracy, which couples will divorce in the 18 months following the lab
visit (Gottman & Levenson, 1992).

Naturalistic observation, meanwhile, involves observing and systematically recording behavior in the real world.
This can be conducted in two broad ways—with or without intervention on the part of the researcher. Intervention
in this context means that the researcher manipulates some aspect of the environment and then observes people’s
responses. For example, a researcher might leave a shopping cart just a few feet away from the cart-return area and
track whether people move the cart. (Given the number of carts that are abandoned just inches away from their
proper destination, someone must be doing this research all the time.) Recall an example from Chapter 1 (the
discussion of ethical dilemmas in section 1.5) in which Harari et al. (1995) used naturalistic observation to study
whether people would help in emergency situations. In brief, these researchers staged what appeared to be an
attempted rape in a public park and then observed whether groups or individual males were more likely to rush to
the victim’s aid.

The ABC network has developed a hit reality show that mimics this type of research. The show, What Would You Do?,
sets up provocative situations in public settings and videotapes people’s reactions. An unwitting participant in one
of these episodes might witness a customer stealing tips from a restaurant table, or a son berating his father for
being gay, or a man proposing to his girlfriend who minutes earlier had been kissing another man at the bar. Of
course, these observation “studies” are more interested in shock value than data collection (or Institutional Review
Board [IRB] approval; see Section 1.5), but the overall approach can be a useful strategy to assess people’s reactions
to various situations. In fact, some of the scenarios on the show are based on classic studies in social psychology,
such as the well-documented phenomenon that people are reluctant to take responsibility for helping in
emergencies.

Alternatively, naturalistic studies can involve simply recording ongoing behavior without any attempt by the
researchers to intervene or in�luence the situation. In these cases, the goal is to observe and record behavior in a
completely natural setting. For example, researchers might station themselves at a liquor store and observe the
numbers of men and women who buy beer versus wine. Or, they might observe the numbers of people who give
money to the Salvation Army bell-ringers during the holiday season. A researcher can use this approach to compare
different conditions, provided the differences occur naturally. That is, researchers could observe whether people
donate more money to the Salvation Army on sunny or snowy days, or compare donation rates when the bell ringers
are different genders or races. Do people give more money when the bell-ringer is an attractive female? Or do they
give more to someone who looks needier? These are all research questions that could be addressed using a well-
designed naturalistic observation study.

Finally, participant observation involves having the researcher(s) conduct observations while engaging in the
same activities as the participants. The goal is to interact with these participants to gain better access and insight
into their behaviors. In one famous example, the psychologist David Rosenhan (1973) was interested in the
experience of people hospitalized for mental illness. To study these experiences, he had eight perfectly sane people
gain admission to different mental hospitals. These fake patients were instructed to give accurate life histories to a
doctor but lie about one diagnostic symptom. They all claimed to hear an occasional voice saying the words “empty,”
“hollow,” and “thud.” Such auditory hallucinations are a symptom of schizophrenia, and Rosenhan chose these words
to vaguely suggest an existential crisis.

Psychologist David Rosenhan’s study of staff and patients in a mental hospital found that patients tended to be
treated based on their diagnosis, not on their actual behavior.

Once admitted, these “patients” behaved in a normal and cooperative manner, with instructions to convince hospital
staff that they were healthy enough to be released. In the meantime, they observed life in the hospital and took notes
on their experiences—a behavior that many doctors interpreted as “paranoid note-taking.” The main finding of this
study was that hospital staff tended to view all patient behaviors through the lens of their initial diagnoses. Despite
immediately acting “normally,” these fake patients were hospitalized an average of 19 days (with a range from 7 to
52) before being released. All but one was diagnosed with “schizophrenia in remission” upon release. Rosenhan’s
other striking finding was that treatment was generally depersonalized, with staff spending little time with
individual patients.

In another example of participant observation, Festinger,
Riecken, and Schachter (1956) decided to join a doomsday
cult to test their new theory of cognitive dissonance. Briefly,
this theory argues that people are motivated to maintain a
sense of consistency among their various thoughts and
behaviors. So, for example, a person who smokes a cigarette
despite being aware of the health risks might rationalize
smoking by convincing herself that lung-cancer risk is really
just genetic. In this case, Festinger and colleagues stumbled
upon the case of a woman named Mrs. Keach, who was
predicting the end of the world, via alien invasion, at 11 p.m.
on a specific date six months in the future. What would
happen, they wondered, when this prophecy failed to come
true? (One can only imagine how shocked they would have
been had the prophecy turned out to be correct.)

To answer this question, the researchers pretended to be new
converts and joined the cult, living among the members and

observing them as they made their preparations for doomsday. Sure enough, the day came, and 11 p.m. came and
went without the world ending. Mrs. Keach first declared that she had forgotten to account for a time-zone
difference, but as sunrise started to approach, the group members became restless. Finally, after a short absence to
communicate with the aliens, Mrs. Keach returned with some good news: The aliens were so impressed with the
devotion of the group that they decided to postpone their invasion. The group members rejoiced, rallying around
this brilliant piece of rationalizing, and quickly began a new campaign to recruit new members.

As these examples illustrate, participant observation can provide access to amazing and one-of-a-kind data,
including insights into group members’ thoughts and feelings. This approach also provides access to groups that
might be reluctant to allow outside observers. However, the participant approach has two clear disadvantages over
other types of observation. The first problem is ethical; data are collected from individuals who do not have the
opportunity to give informed consent. Indeed, the whole point of the technique is to observe people without their
knowledge. Before an IRB can approve this kind of study, researchers must show an extremely compelling reason to
waive informed consent, as well as extremely rigorous measures to protect identities. The second problem is
methodological; the approach provides ample opportunity for the objectivity of observations to be compromised by
the close contact between researcher and participant. Because the researchers are a part of the group, they can
change the dynamics in subtle ways, possibly leading the group to confirm their hypothesis. In addition, the group
can shape the researchers’ interpretations in subtle ways, leading them to miss important details.

Another spin on participant observation is called ethnography, or the scientific study of the customs of people and
cultures. This is very much a qualitative method that focuses on observing people in the real world and learning
about a culture from the perspective of the person being studied—that is, learning from the ground up rather than
testing hypotheses. Ethnography is used primarily in other social-science fields, such as anthropology. In one famous
example, the cultural anthropologist Margaret Mead (1928) used this approach to shed light on differences in social


norms around adolescence between American and Samoan societies. Mead’s conclusions were based on interviews
she conducted over a six-month period, observing and living alongside a group of 68 young women. Mead concluded
from these interviews that Samoan children and adolescents are largely ignored until they reach the age of 16 and
become full members of society. Among her more provocative claims was the idea that Samoan adolescents were
much more liberal in their sexual attitudes and behaviors than American adolescents.

Mead’s work has been the subject of criticism by a handful of other anthropologists, one of whom has even suggested
that Mead was taken in by an elaborate joke played by the group of young girls. Still others have come to Mead’s
rescue and challenged the critics’ interpretations. The nature of this debate between Mead’s critics and her
supporters highlights a distinctive characteristic of qualitative methods: “Winning” the argument is based on
challenging interpretations of the original interviews and observations. In contrast, disagreements around
quantitative methods are generally based on examining statistical results from hypothesis testing. While
quantitative methods may lose much of the richness of people’s experiences, they do offer an arguably more
objective way of settling theoretical disputes.

Steps in Observational Research

One of the major strengths of observational research is its high degree of ecological validity; that is, the research
can be conducted in situations that closely resemble the real world. Think of the chapter examples so far—married
couples observed in a living-room-like laboratory; doomsday cults observed from within; bullying behaviors on the
school playground. In every case, people’s behaviors are observed in the natural environment or something very
close to it. However, this ecological validity comes at a price; the real world is a jumble of information, some
relevant, some not so much. The challenge for researchers, then, is to decide on a system that provides the best test
of their hypothesis, one that can sort out the signal from the noise. This section discusses a three-step process for
conducting observational research. The key point to note right away is that most of this process involves making
decisions ahead of time so that the process of data collection is smooth, simple, and systematic.

Step 1—Develop a Hypothesis
For research to be systematic, it is important to impose structure by having a clear research question, and, in the
case of quantitative research, a clear hypothesis as well. Other chapters have covered hypotheses in detail, but the
main points bear repeating: A hypothesis must be testable and falsifiable, meaning that it must be framed in such a
way that it can be addressed through empirical data and might be disconfirmed by these data. In the example
involving Salvation Army donations, we predicted that people might donate more money to an attractive bell-ringer.
This hypothesis could easily be tested empirically and could just as easily be disconfirmed by the right set of data—
say, if attractive bell-ringers brought in the fewest donations.

This particular example also highlights an additional important feature of observational hypotheses; namely, they
must be based on observable behaviors. That is, we can safely make predictions about the amount of money people
will donate because we can directly observe it. We are, nonetheless, unable to make predictions in this context about
the reasons for donations. We would have no way to observe, say, that people donate more to attractive bell-ringers
because they are trying to impress them. In sum, one limitation of observing behavior in the real world is that it
prevents researchers from delving into the cognitive and motivational reasons behind the behaviors.

Step 2—Decide What and How to Sample
Once a researcher has developed a hypothesis that is testable, falsifiable,
and observable, the next step is to decide what kind of information to
gather from the environment to test this hypothesis. The simple fact is
that the world is too complex to sample everything. Imagine that
someone wanted to observe the dinner rush at a restaurant. A nearly
infinite list of possibilities for observation presents itself: What time does
the restaurant get crowded? How often do people send their food back to
the kitchen? What are the most popular dishes? How often do people get
in arguments with the wait staff? To simplify the process of observing
behavior, the researcher will need to take a sample, or a smaller portion
of the population, that is relevant to the hypothesis. That is, rather than
observing “dinner at the restaurant,” the researcher’s goal is to narrow
his or her focus to something as specific as “the number of people
waiting in line for a table at 6 p.m. versus 9 p.m.”

The choice of what and how to sample will ultimately depend on the best
fit for the hypothesis. The context of observational research offers three
strategies for sampling behaviors and events. The first strategy, time
sampling, involves comparing behaviors during different time intervals.
For example, to test the hypothesis that football teams make more
mistakes when they start to get tired, researchers could count the
number of penalties in the first five minutes and the last five minutes of
the game. These data would allow researchers to compare mistakes at one
time interval with mistakes at another time interval. In the case of
Festinger’s (1956) study of a doomsday cult, time sampling was used to
compare how the group members behaved before and after their
prophecy failed to come true.

The second strategy, individual sampling, involves collecting data by
observing one person at a time to test hypotheses about individual
behaviors. Many of the examples already discussed involve individual
sampling: Ainsworth and colleagues (1970) tested their hypotheses
about attachment behaviors by observing individual infants, while
Gottman (1992) tested his hypotheses about romantic relationships by
observing one married couple at a time. These types of data allow researchers to examine behavior at the individual
level and test hypotheses about the kinds of things people do—from the way they argue with their spouses to
whether they wear team colors to a football game.

The third strategy, event sampling, involves observing and recording behaviors that occur throughout an event. For
example, we could track the number of fights that break out during an event such as a football game, or the number
of times people leave the restaurant without paying the check. This strategy allows for testing hypotheses about the
types of behaviors that occur in a particular environment or setting. For instance, a researcher might compare the
number of fights that break out in a professional football versus a professional hockey game. Or, the next time we
host a party, we could count the number of wine bottles versus beer bottles that end up in the recycling bin. The
distinguishing feature of this strategy is its focus on occurrence of behaviors more than on the individuals
performing these behaviors.

Step 3—Record and Code Behavior
Having formulated a hypothesis and decided on the best sampling strategy, researchers must perform one final and
critical step before beginning data collection. Namely, they have to develop good operational definitions of the
variables by translating the underlying concepts into measurable variables. Gottman’s research turns the concept of
marital interactions into a range of measurable variables, such as the number of dismissive comments and passive-
aggressive sighing—all things that can be observed and counted objectively. Rosenhan’s 1973 study involving fake
schizophrenic patients turned the concept of patient experience into measurable variables such as the amount of
time staff members spent with each patient—again, something very straightforward to observe.

It is vital that researchers decide up front what kinds and categories of behavior they will be observing and
recording. In the last section, we narrowed down our observation of dinner at the restaurant to the number of
people in line at 6 p.m. versus the number of people in line at 9 p.m. But how can we be sure of an accurate count?
What if two people are waiting by the door while the other two members of the group are sitting at the bar? Are
those at the bar waiting for a table or simply having drinks? One possibility might be to count the number of
individuals who walk through the door in different time periods, although our count could be inflated by those who
give up on waiting or who only enter to sneak in and out of the restroom.

In short, observing behavior in the real world can be messy. The best way to deal with this mess is to develop a clear
and consistent categorization scheme and stick with it. That is, in testing a hypothesis about the most crowded time
at a restaurant, researchers would choose one method of counting people and use it for the duration of the study. In
part, this choice of a method is a judgment call, but researchers’ judgment should be informed by three criteria.
First, they should consider practical issues, such as whether their categories can be directly observed. A researcher
can observe the number of people who leave the restaurant but cannot observe whether they got impatient. Second,
they should consider theoretical issues, such as how well the categories represent the underlying theory. Why did
researchers decide to study the most crowded time at the restaurant? Perhaps this particular restaurant is in a new,
up-and-coming neighborhood, and they expect the restaurant to become crowded over the course of the evening.
This expectation would also lead researchers to include people sitting both at tables and at the bar—because this crowd
may come to the restaurant with the sole intention of staying at the bar. Finally, researchers should consider
previous research in choosing their categories. Have other researchers studied dining patterns in restaurants? What
kinds of behaviors did they observe? If these categories make sense for the project, researchers may feel free to re-
use them—no need to reinvent the wheel.

Last but not least, a researcher should take a step back and evaluate both the validity and the reliability of the
coding system. (See Section 2.2 for a review of these terms.) Validity in this case means making sure the categories
capture the underlying variables in the hypothesis (i.e., construct validity; see Section 2.2). For example, in
Gottman’s studies of marital interactions, some of the most important variables are the emotions expressed by both
partners. One way to observe emotions would be to count the number of times a person smiles. However, we would
have to think carefully about the validity of this measure, because smiling could indicate either genuine happiness or
condescension. As a general rule, the better and more specific researchers’ operational definitions, the more valid
their measures will be (Chapter 2).

Reliability in this context means making sure data are collected in a consistent way. If research involves more than
one observer using the same system, their data should look roughly the same (i.e., interrater reliability). This
reliability is accomplished in part by making the observation task simple and straightforward—for example, having
trained assistants use a checklist to record behaviors rather than depending on open-ended notes. The other key to
improving reliability is careful training of the observers, giving them detailed instructions and ample opportunities
to practice the rating system.
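
As a concrete illustration of how interrater reliability might be checked, the short Python sketch below compares two hypothetical observers' checklist codes for the same ten restaurant patrons and computes both their percent agreement and Cohen's kappa, which corrects agreement for chance. The observer data and category labels are invented for this example; the textbook does not tie the calculation to any particular software.

    from collections import Counter

    # Hypothetical checklist codes from two trained observers watching the same
    # ten restaurant patrons ("W" = waiting for a table, "B" = sitting at the bar).
    observer_a = ["W", "W", "B", "W", "B", "B", "W", "W", "B", "W"]
    observer_b = ["W", "W", "B", "B", "B", "B", "W", "W", "W", "W"]
    n = len(observer_a)

    # Percent agreement: the proportion of patrons coded identically by both observers.
    agreement = sum(a == b for a, b in zip(observer_a, observer_b)) / n

    # Chance agreement: how often the observers would match if each simply used
    # his or her own category frequencies at random.
    freq_a, freq_b = Counter(observer_a), Counter(observer_b)
    chance = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a | freq_b)

    # Cohen's kappa corrects the observed agreement for chance agreement.
    kappa = (agreement - chance) / (1 - chance)
    print(f"Percent agreement: {agreement:.2f}")  # 0.80 for these made-up codes
    print(f"Cohen's kappa: {kappa:.2f}")          # roughly 0.58

A kappa near 1 would suggest the category definitions and observer training are working well; a low value would be a signal to sharpen the coding scheme before collecting the real data.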

Observation Examples

To explain how all of this comes together, we will explore a pair of examples, from research question to data
collection.

Example 1—Theater Restroom Usage
First, imagine, for the sake of this example, that someone is interested in whether people are more likely to use the
restroom before or after watching a movie. Such a research question could provide valuable information for theater
owners in planning employee schedules (i.e., when are bathrooms most likely to need cleaning). Thus, studying
patterns of human behavior results in valuable applied knowledge.

The first step is to develop a specific, testable, and observable hypothesis. In this case, we might predict that people
are more likely to use the restroom after the movie, as a result of consuming those 64-ounce sodas during the movie.
Just for fun, we will also compare the restroom usage of men and women. Perhaps men are more likely to wait until
after the movie, whereas women are just as likely to go before as after? This pattern of data might look something
like the percentages in Table 3.1. That is, men make 80% of their restroom visits after the movie and 20% before the
movie, while women make about 50% of their restroom visits at each time.


Table 3.1: Hypothesized restroom visits

Gender Men Women

Before movie 20% 50%

After movie 80% 50%

Total 100% 100%

The next step is to decide on the best sampling strategy to test this hypothesis. Of the three sampling strategies
discussed—individual, event, and time—which one seems most relevant here? The best option would probably be
time sampling because the hypothesis involves comparing the number of restroom visitors in two time periods
(before versus after the movie). So, in this case, we would need to define a time interval for collecting data. We could
limit our observations to the 10 minutes before the previews begin and the 10 minutes after the credits end. The
potential problem here, of course, is that some people might use either the previews or the end credits as a chance
to use the restroom. Another complication arises in trying to determine which movie people are watching; in a giant
multiplex theater, movies start just as others are finishing. One possible solution, then, would be to narrow the
sample to movie theaters that show only one movie at a time and to define the sampling times based on the actual
movie start- and end-times.

Having determined a sampling strategy, the next step is to identify the types of behaviors we want to record. This
particular hypothesis poses a challenge because it deals with a rather private behavior. To faithfully record people
“using the restroom,” we would need to station researchers in both men’s and women’s restrooms to verify that
people actually, well, “use” the restroom while they are in it. However, this strategy poses the potential downside
that the researcher’s presence (standing in the corner of the restroom) will affect people’s behavior. Another, less
intrusive option would be to stand outside the restroom and simply count “the number of people who enter.” The
downside to that, of course, is that we technically do not know why people are going into the restroom. But
sometimes research involves making these sorts of compromises—in this case, we chose to sacrifice a bit of
precision in favor of a less-intrusive measurement. This compromise would also serve to reduce ethical issues with
observing people in the restroom.

So, in sum, we started with the hypothesis that men are more likely to use the restroom after a movie, while women
use the restroom equally before and after. We then decided that the best sampling strategy would be to identify a
movie theater showing only one movie and to sample from the 10-minute periods before and after the actual
movie’s running time. Finally, we decided that the best strategy for recording behavior would be to station observers
outside the restrooms and count the number of people who enter. Now, say we conduct these observations every
evening for one week and collect the data in Table 3.2.

Table 3.2: Findings from observing restroom visits

Gender Men Women

Before movie 75 (25%) 300 (60%)

After movie 225 (75%) 200 (40%)

Total 300 (100%) 500 (100%)

Notice that more women (N = 500) than men (N = 300) were observed entering the restrooms during our week of sampling.
The real test of our hypothesis, however, comes from examining the percentages within gender groups. That is, of
the 300 men who went into the restroom, what percentage of them did so before the movie and what percentage of
them did so after the movie? In this dataset, women used the restroom with relatively equal frequency before (60%)
and after (40%) the movie. Men, in contrast, were three times as likely to use the restroom after (75%) as before
(25%) the movie. In other words, our hypothesis appears to be confirmed by examining these percentages.
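
To make the arithmetic behind Table 3.2 explicit, the brief Python sketch below reproduces the within-gender percentages by dividing each cell by that gender's total. The counts come straight from the table; the code is simply one convenient way to do the bookkeeping, not a required procedure.

    # Observed restroom entries from Table 3.2, organized by gender and timing.
    counts = {
        "Men": {"before": 75, "after": 225},
        "Women": {"before": 300, "after": 200},
    }

    # Divide each cell by that gender's total, because the hypothesis concerns
    # timing within each group rather than the overall number of visitors.
    for gender, cells in counts.items():
        total = sum(cells.values())
        before_pct = 100 * cells["before"] / total
        after_pct = 100 * cells["after"] / total
        print(f"{gender}: {before_pct:.0f}% before, {after_pct:.0f}% after (N = {total})")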

Example 2—Cell Phone Usage While Driving


Imagine that we are interested in patterns of cell phone usage among drivers. Several recent studies have reported
that drivers using cell phones are as impaired as drunk drivers, making this an important public safety issue. Thus, if
we could understand the contexts in which people are most likely to use cell phones, it would provide valuable
information for developing guidelines for safe and legal use of these devices. So, this study might count the number
of drivers using cell phones in two settings: while navigating rush-hour traffic and while moving on the freeway.

The first step is to develop a specific, testable, and observable hypothesis. In this case, we might predict that people
are more likely to use cell phones when they are bored in the car. So, we hypothesize that we will see more drivers
using cell phones while stuck in rush-hour traffic than while moving on the freeway.

The next step is to decide on the best sampling strategy to test this hypothesis. Of the three sampling strategies
discussed—individual, event, and time—which one seems most relevant here? The best option would probably be
individual sampling because we are interested in the cell phone usage of individual drivers. That is, for each
individual car we see during the observation period, we want to know whether the driver is using a cell phone. One
strategy for collecting these observations would be to station observers along a fast-moving stretch of freeway, as
well as along a stretch of road that is clogged during rush hour. These observers would keep a record of each passing
car and note whether the driver is on the phone.

After selecting a sampling strategy, we next must decide the types of behaviors to record. One challenge this study
presents is how broadly to define cell phone usage. Should we include both talking and text messaging? Given our
interest in distraction and public safety, we probably want to include text messaging. Several states have recently
banned this practice while driving, often in response to tragic accidents. Because we will be observing moving
vehicles, the most reliable approach might be to simply note whether drivers have a cell phone in their hand. As with
the restroom study, we sacrifice a little bit of precision (i.e., knowing what the driver is using the cell phone for) to
capture behaviors that are easier to record.

To sum up, we started with the hypothesis that drivers would be more likely to use cell phones when stuck in traffic.
We then decided that the best sampling strategy would be to station observers along two stretches of road who
would note whether drivers were using cell phones. Finally, we decided that cell phone usage would be defined
as whether each driver was holding a cell phone. Now, suppose we conducted these observations over a 24-hour period and
collected the data in Table 3.3.

Table 3.3: Findings from observing cell phone usage

Rush Hour Highway

Cell Phone 30 (30%) 200 (67%)

No Cell Phone 70 (70%) 100 (33%)

Total 100 (100%) 300 (100%)

The results show that more cars passed by on the highway (N = 300) than on the street during the rush-hour stretch
(N = 100). The real test of our hypothesis, though, comes from examining the percentages within each stretch. That
is, of the 100 people observed during rush hour and the 300 observed on the highway, what percentage was using
cell phones? In this data set, 30% of those in rush hour were using cell phones, compared with 67% of those on the
highway. In other words, the data did not confirm our hypothesis. Drivers in rush hour were less than half as likely
to be using cell phones. The next step in this research program would be to speculate on the reasons the data
contradicted the hypothesis.
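
Readers who want to go beyond eyeballing the percentages in Table 3.3 could submit the counts to a chi-square test of independence. The sketch below does so with the scipy library; the chapter itself stops at comparing percentages, so treat this as an optional extension rather than part of the textbook's own procedure.

    from scipy.stats import chi2_contingency

    # Counts from Table 3.3: rows are phone / no phone, columns are rush hour / highway.
    observed = [
        [30, 200],  # drivers holding a cell phone
        [70, 100],  # drivers not holding a cell phone
    ]

    # Test whether holding a cell phone is independent of driving context.
    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(f"chi-square = {chi2:.1f}, df = {dof}, p = {p_value:.4f}")

A small p value would indicate that the rush-hour and highway percentages differ by more than chance alone would produce; note that here the direction of the difference runs opposite to the original hypothesis.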

Qualitative versus Quantitative Approaches

The general method of observation lends itself equally well to qualitative and quantitative approaches, although
some types of observation fit one approach better than the other. For example, structured observation tends to focus
on hypothesis testing and quantification of responses. In Mary Ainsworth’s (1970) “strange situation” research
(described previously), the primary goal was to expose children to a predetermined script of events and to test
hypotheses about how children with secure and insecure attachments would respond to these events. In contrast,
naturalistic observation—and, to a greater extent, participant observation—tends to focus on learning from events
as they occur naturally. In Leon Festinger’s “doomsday cult” study, the researchers joined the group to observe the
ways members reacted when their prophecy failed to come true. Margaret Mead (1928) spent several months living
with Samoan adolescents to understand social norms around coming of age.

Research: Thinking Critically

“Irritable Heart” Syndrome in Civil War Veterans

Follow the link below to an article by science writer and editor K. Kris Hirst. In this article, Hirst reviews
compelling research from health psychologist Roxanne Cohen Silver and her colleagues at the University of
California, Irvine. Cohen Silver and her colleagues reviewed the service records of 15,027 Civil War veterans,
finding an astounding rate of mental illness—long before post-traumatic stress disorder was recognized. As
you read the article, consider what you have learned so far about the research process, and then respond to
the questions below.

http://psychology.about.com/od/ptsd/a/irritableheart.htm
(http://psychology.about.com/od/ptsd/a/irritableheart.htm)

Think about it:

1. What hypotheses are the researchers testing in this study?
2. How did the researchers quantify trauma experienced by Civil War soldiers? Do you think this is a
valid way to operationalize trauma? Explain why or why not.
3. Would this research be best described as case studies, archival research, or naturalistic observation?
Does the study involve elements of more than one type? Explain.


4 Survey Designs—Predicting Behavior

Learning Objectives

By the end of this chapter, you should be able to:

Describe the distinguishing features of survey research.
Outline best practices for designing questionnaires to ensure quality responses.
Explain the reasons for sampling from the population.
Distinguish the different types of sampling strategies.
Explain the logic behind common approaches to analyzing survey data.

In a highly influential book published in the 1960s, the sociologist Erving Goffman (1963) defined stigma as an
unusual characteristic that triggers a negative evaluation. In his words, “the stigmatized person is one who is
reduced in our minds from a whole and usual person to a tainted, discounted one” (p. 3). People’s beliefs about
stigmatized characteristics exist largely in the eye of the beholder, but have substantial influence on social
interactions with the stigmatized (see Snyder, Tanke, & Berscheid, 1977). A large research tradition in psychology
has been devoted to understanding both the origins of stigma and the consequences of being stigmatized. According
to Goffman and others, the characteristics associated with the greatest degree of stigma have three features in
common: they are highly visible, they are perceived as controllable, and they are misunderstood by the public.

Recently, researchers have taken considerable interest in people’s attitudes toward members of the gay and lesbian
community. Although these attitudes have become more positive over time, this group still encounters harassment
and other forms of discrimination on a regular basis (see Almeida, Johnson, Corliss, Molnar, & Azrael, 2009; Berrill,
1990). One of the top recognized experts on this subject is Gregory Herek, professor of psychology at the University
of California at Davis (http://psychology.ucdavis.edu/herek/ (http://psychology.ucdavis.edu/herek/) ). In a 1988
article, Herek conducted a survey of heterosexuals’ attitudes toward both lesbians and gay men, with the goal of
understanding the predictors of negative attitudes. Herek approached this research question by constructing a
questionnaire to measure people’s attitudes toward these groups. In three studies, participants were asked to
complete this attitude measure, along with other existing scales assessing attitudes about gender roles, religion, and
traditional ideologies.

Herek’s (1988) research revealed that, as hypothesized, heterosexual males tended to hold more negative attitudes
about gay men and lesbians than did heterosexual females. However, the same psychological mechanisms seemed to
explain the prejudice in both genders. That is, negative attitudes toward gays and lesbians were associated with
increased religiosity, more traditional beliefs about family and gender, and fewer experiences actually interacting
with gay men and lesbians. These associations meant that Herek could predict people’s attitudes toward gay men
and lesbians based on knowing their views about family, gender, and religion, as well as their past interactions with
the stigmatized group. In this paper, Herek’s primary contribution to the literature was the insight that reducing
stigma toward gay men and lesbians “may require confronting deeply held, socially reinforced values” (1988, p.
473). This insight was only possible because people were asked to report these values directly.

This chapter continues along the continuum of control, moving on to survey research, in which the primary goal is
either describing or predicting attitudes and behavior. For our purposes, survey research refers to any method that
relies on people’s direct reports of their own attitudes, feelings, and behaviors. So, for example, in Herek’s (1988)
study, the participants reported their attitudes toward lesbians and gay men, rather than these attitudes being
somehow directly observed by the researchers. Compared to the descriptive designs we discussed in Chapter 3,
survey designs tend to have more control over both data collection and question content. Thus, survey research falls
somewhere between purely descriptive research (Chapter 3) and the explanatory power of experimental designs
(Chapter 5). This chapter provides an overview of survey research from conceptualization through analysis. It will
discuss the types of research questions that are best suited to survey research and provide an overview of the
decisions to consider in designing and conducting a survey study. We will then cover the process of data collection,
with a focus on selecting the people who will complete surveys. Finally, the chapter will describe the three most
common approaches for analyzing survey data.

Research: Making an Impact

Kinsey Reports

Alfred Kinsey’s research on human sexuality is an example of social research that changed the way society
thought about a complex issue—in this case, ideas about “normal” sexual behavior. Kinsey’s research,
particularly two books on male and female sexuality known together as the Kinsey Reports, illuminated the
discrepancies between the assumptions made by a “moral public” and the actual behavior of individuals. His
shift in the approach to studying sex—applying scientific methods and reasoning rather than basing
conclusions on medical speculation and dogmatic opinions—changed the nature of sex research and the
general public’s view of sex for decades to come.

Kinsey’s major contribution was in challenging the prevailing assumptions about sexual activity in the United
States and obtaining descriptive data from both men and women that described their own sexual practices
(Bullough, 1998). By collecting actual data instead of relying on speculation, Kinsey made the study of
sexuality more scienti�ically based. The results of his surveys revealed a variety of sexual behaviors that
shocked many members of society and redefined the sexual morality of modern America.


Until Kinsey’s research, the general, Victorian viewpoint was that women should not show any interest in sex
and should submit to their husband without any sign of pleasure (Davis, 1929). Kinsey’s data challenged the
prevailing assumption that women were asexual. His studies revealed that 25% of the women studied had
experienced an orgasm by the age of 15 and more than half by the age of 20 (Kinsey, Pomeroy, Martin, &
Gebhard, 1953). Eventually, these results were bundled into the various elements that fueled the women’s
movement of the 1960s and encouraged further examination of female sexuality (Bullough, 1998).

Kinsey’s data also contributed to the budding gay and lesbian liberation movement. Until the Kinsey Reports,
studies of human sexuality were based on the assumption that homosexuals were mentally ill (Bullough,
1998). When Kinsey’s data revealed that many males and females practiced homosexuality to some degree,
he suggested that sexuality was more of a continuum than a series of categories into which people fit. In
addition, the Kinsey Reports revealed that the number of extramarital relationships people were having was
higher than most expected. Forty percent of married American males reported having an extramarital
relationship (Kinsey et al., 1953).

These ideas, though controversial, prompted society to take a realistic look at the actual sexual practices of its
members. The topic of sexuality became less dogmatic as society became more open about sexual activities
and preferences.

Kinsey’s data not only encouraged social change but also revolutionized the way in which scientists study
sexuality. By examining data and studying sex from an unbiased standpoint, Kinsey successfully transformed
the study of human sexuality into a science. His research not only changed our way of studying sexual
behavior but also allowed society to become less restrictive in its expectations of “normal” sexual behavior.

Think About It

1. What type of data formed the basis of Kinsey’s reports? What are the pros and cons of this type?
2. How did applying the scientific method change the national conversation about sexuality?


4.1 Introduction to Survey Research
Whether we are aware of it or not, most of us encounter survey research throughout our lives. Every time
we decide to answer that call from an unknown number, and the person on the other end of the line insists on
knowing our household income and favorite brand of laundry detergent, we are helping to conduct
survey research. When news programs try to predict the winner of an election two weeks early, these reports are
based on survey research of eligible voters. In both cases, the researcher is trying to make predictions about the
products people buy or the candidates they will elect based on what people say about their own attitudes, feelings,
and behaviors.

Surveys can be used in a variety of contexts and are most
appropriate for questions that involve people describing their
attitudes, their behaviors, or a combination of the two. For
example, if we want to examine the predictors of attitudes
toward the death penalty, we could ask people their opinions
on this topic and also ask them about their political party
affiliation. Based on these responses, we could test whether
political affiliation predicted attitudes toward the death
penalty. Or, imagine we want to know whether students who
spend more time studying are more likely to do well on their
exams. This question could be answered using a survey that
asked students about their study habits and then tracked
their exam grades. We will return to this example near the
end of the chapter, as we discuss the process of analyzing
survey data to test our hypotheses about predictions.
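
As a small preview of that analysis, the Python sketch below computes a Pearson correlation between hypothetical self-reported study hours and exam scores. The numbers are invented purely to show how a survey response can be used to predict an outcome; the actual analysis strategies are covered later in the chapter.

    from statistics import correlation  # available in Python 3.10 and later

    # Hypothetical survey responses: weekly study hours reported by ten students,
    # paired with their later exam scores (both invented for illustration).
    study_hours = [2, 5, 1, 8, 4, 6, 3, 7, 5, 9]
    exam_scores = [62, 75, 58, 90, 70, 80, 66, 85, 73, 93]

    # A positive correlation means students who report more study time
    # also tend to earn higher exam scores.
    r = correlation(study_hours, exam_scores)
    print(f"Pearson r = {r:.2f}")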

The common thread of these two examples is that they require people to report either their thoughts (e.g., opinions
about the death penalty) or their behaviors (e.g., the hours they spend studying). Contrast these with an example
that might be a poor fit for survey research: If a researcher wanted to test whether a new drug led to increased risk
of developing blood clots, it would be much safer to test for these clots using medical technology, rather than asking
people for their beliefs (“on a scale from 1 to 5, how many clots have you developed this week?”). Thus, when
deciding whether a survey is the best fit for a research question, a researcher must consider whether people will be
both able and willing to report the opinions or behaviors accurately. The next section expands on both of these
issues.

Distinguishing Features of Surveys

Survey research designs have three distinguishing features that set them apart from other designs. First, all survey
research relies on either written or verbal self-reports of people’s attitudes, feelings, and behaviors. This self-
reporting means that researchers will ask participants a series of questions and record their responses. The
approach has several advantages, including being relatively straightforward and allowing a degree of access to
psychological processes (e.g., “Why do you support candidate X?”). However, researchers should also be
cautious in their interpretation of self-report data because participants’ responses often reflect a combination of
their true attitude and concern over how this attitude will be perceived. Scientists refer to this concern as social
desirability, which means that people may be reluctant to report unpopular attitudes. For example, if we were to
ask people their attitudes about different racial groups, their answers might reflect both their true attitude and their
desire not to appear racist. We return to the issue of social desirability later in this chapter and discuss some tactics
for designing questions that can help to sidestep these concerns and capture respondents’ true attitudes.

The second distinguishing feature of survey research is its ability to access internal states that cannot be measured
through direct observation. The discussion of observational designs in Chapter 3 explained that one limitation of
these designs was a lack of insight into why people behave the way they do. Survey research can address this
limitation directly: By asking people what they think, how they feel, and why they behave in certain ways,
researchers come closer to capturing the underlying psychological processes. However, people’s reports of their
internal states should be taken with a grain of salt, for two reasons. First, as mentioned, these reports may be biased
by social-desirability concerns, particularly when unpopular attitudes are involved. Second, a large body of
literature in social psychology suggests that people may not understand the true reasons for their behavior. In an
influential review paper, psychologists Richard Nisbett and Tim Wilson (1977) argued that we make poor guesses
after the fact about why we do things, based more on our assumptions than on any real introspection. Thus, survey
questions can provide access to internal states, but researchers should always interpret responses with caution.

Third, on a more practical note, survey research allows us to collect large amounts of data with relatively little effort
and few resources. Many of the descriptive designs Chapter 3 discussed require observing one person at a time, and
the same will hold true when Chapter 5 explores experimental designs. Survey-research designs stand out as the
most ef�icient, because surveys can be distributed to large groups of people simultaneously. Still, their actual
ef�iciency depends on the decisions researchers make during the design process. In reality, ef�iciency is often in a
delicate balance with the accuracy and completeness of the data.

Broadly speaking, survey research can be conducted using either verbal or written self-reports (or a combination of
the two). Before diving into the details of writing and formatting a survey, we need to understand the pros and cons
of administering a survey as an interview (i.e., a verbal survey) or a questionnaire (i.e., a written survey).

Interviews

An interview involves a verbal question-and-answer exchange between the researcher and the participant. This
verbal exchange can take place either face-to-face or over the phone. So, our earlier telemarketer example
represents an interview because the questions are asked verbally via phone. Likewise, if we are approached in a
shopping mall and asked to answer questions about our favorite products, we experience a survey in interview form
because the questions are administered verbally face-to-face. And, if a person has ever participated in a focus group,
during which a group of people gives their reactions to a new product, the researchers are essentially conducting an
interview with the group.

Interview Schedules
Regardless of how the interview is administered, the interviewer (i.e., the researcher) has a predetermined plan, or
script, for how the interview should go. This plan, or script, for the progress of the interview is known as an
interview schedule. When conducting an interview—including those telemarketing calls—the
researcher/interviewer has a detailed plan for the order of questions to be asked, along with follow-up questions
that depend on the participant’s responses.

Broadly speaking, researchers employ two types of interview schedules. A linear (also called “structured”)
schedule will ask the same questions, in the same order, for all participants. In contrast, a branching schedule
unfolds more like a flowchart, with the next question dependent on participants’ answers. Interviewers typically use
a branching schedule in cases with follow-up questions that only make sense for some of the participants. For
example, a researcher might first ask people whether they have children; if they answer “yes,” the interviewer might
then follow up by asking how many.
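
The logic of a branching schedule can be captured in a few lines of code. The Python sketch below encodes the children example as a tiny command-line script; the wording and flow are hypothetical and are meant only to show how one answer determines whether a follow-up question is asked at all.

    def branching_interview():
        """A minimal branching schedule: the follow-up is asked only when
        the first answer makes it relevant."""
        has_children = input("Do you have children? (yes/no) ").strip().lower()
        if has_children == "yes":
            # This branch is reached only for respondents with children.
            how_many = input("How many children do you have? ")
            return {"has_children": True, "number_of_children": how_many}
        # Everyone else skips the follow-up entirely.
        return {"has_children": False, "number_of_children": None}

    if __name__ == "__main__":
        print(branching_interview())

A linear schedule, by contrast, would simply ask every respondent every question in the same fixed order.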

One danger in using a branching schedule is that it is based partly on the researcher’s assumptions about the
relationships between variables. Granted, to ask only people with children to indicate how many they have is fairly
uncontroversial. Imagine the following scenario, however. Say we first ask participants for their household income,
and then ask about their political donations:

“How much money do you make? $18,000? OK, how likely are you to donate to the Democratic Party?”
“How much money do you make? $250,000? OK, how likely are you to donate money to the Republican
Party?”


The way these questions branch implicitly assumes that wealthier people are more likely to be Republicans, and less
wealthy people are more likely to be Democrats. The data might support this assumption or they might not. By
planning the follow-up questions in this way, though, we are unable to capture cases that do not fit our stereotypes
(i.e., the wealthy Democrats and the poor Republicans). Researchers must therefore be careful about letting their
biases shape the data-collection process.

Advantages and Disadvantages of Interviews
Interviews offer a number of advantages over written surveys. For one, people are often more motivated to talk than
they are to write. Consider the example of an actual undergraduate research assistant who was dispatched to a local
shopping mall to interview people about their experiences in romantic relationships. He had no trouble at all
recruiting participants, many of whom would go on and on (and on, and on) about recent relationships—one
woman even confided to him that she had just left an abusive spouse earlier that week. For better or for worse, these
experiences would have been more dif�icult to capture in writing.

Related to this bene�it, people’s oral responses are typically richer and more detailed than their written responses.
Think of the difference between asking someone to “describe your views on gun control” and asking someone to
“indicate on a scale of 1 to 7 the degree to which you support gun control.” The former is more likely to capture the
richness and subtlety involved in people’s attitudes about guns. On a practical note, an interview format also allows
the researcher to ensure that respondents understand the questions. Poorly worded written-questionnaire items
force survey participants to guess at the researcher’s meaning, and these guesses introduce a large source of error
variance. On the other hand, if an interview question is poorly asked, people can easily ask the interviewer to clarify.
Finally, using an interview format allows researchers to reach a broader cross-section of people and to include those
who are unable to read and write—or, perhaps, unable to read and write the language of the survey.

Interviews also have two clear disadvantages compared to written surveys. First, interviews cost more in terms of
both time and money. It took more time for the research assistant to go to a shopping mall than it would have taken
to mail out packets of surveys (but no more money—research-assistant positions tend to be unpaid). Second, the
interview format allows many opportunities for interviewers to pass on their personal biases. These biases are
unlikely to be deliberate, but participants can often pick up on body language and subtle facial expressions when the
interviewer disagrees with their answers. Such cues may influence them to shape their responses to make the
interviewer happier. The best way to understand the pros and cons of interviewing is to recognize that both are a
consequence of personal interaction. The interaction between interviewer and interviewee allows for richer
responses but also the potential for these responses to be biased. Researchers must weigh these pros and cons and
decide which method is the best fit for their survey. The next section turns to the process of administering surveys in
writing.

One additional problem with interviews is the increasing difficulty of obtaining representative samples for
interviews over the telephone due to low or declining use of landline phones, coupled with the use of unlisted
numbers and call-screening devices. In the United States, the Pew Research Center (2012) reports that overall
response rate—a ratio of completed interviews to the number of phone numbers dialed—was just 9% in 2012, one-
fourth of the 36% level from 1997. Thus, significant differences may exist between people who elect to respond to
phone surveys and those who do not.
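
The response rate quoted above is just a ratio, and a short sketch makes the bookkeeping explicit. The call counts below are invented solely to show how a 9% rate could arise.

    # Hypothetical telephone-survey bookkeeping (numbers invented for illustration).
    numbers_dialed = 12000
    completed_interviews = 1080

    # Response rate = completed interviews divided by phone numbers dialed.
    response_rate = completed_interviews / numbers_dialed
    print(f"Response rate: {response_rate:.0%}")  # 9%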

Questionnaires

A questionnaire is a survey that involves a written question-and-answer exchange between the researcher and the
participant. The exchange is a bit different from interview formats—in this case, the questions are designed ahead of
time, then distributed to participants, who write their responses and return the questionnaire to the researcher. The
next section discusses details for designing these questions. First, however, we will take a quick look at the process
of administering written surveys.

Distribution Methods


Questionnaires can be distributed in three primary ways, each with its own pattern of advantages and
disadvantages:

Distributing by mail: Until recently, researchers commonly distributed surveys by sending paper copies through
the mail to a group of participants (see the section on “Sampling” for more discussion on how this group is selected).
Mailing surveys is relatively cheap and relatively easy to do, but it is unfortunately one of the worst methods in
terms of response rates. People tend to ignore questionnaires that they receive in the mail, dismissing them as one
more piece of junk. Researchers have a few methods available for increasing response rates, including providing
incentives, making the survey interesting, and making it as easy as possible to return the results (e.g., with a
postage-paid envelope). However, even using all of these tactics, researchers consider themselves extremely lucky to
obtain a 30% response rate from a mail survey. That means a researcher who mails 1,000 surveys will be doing well
to receive 300 back. More typical response rates for mail surveys can be in the single digits. Because of this low
return on investment, researchers have begun relying on other methods for their written surveys.

Distributing in person: Another option for researchers is to distribute a written survey in person, simply handing
out copies and asking participants to fill them out on the spot. This method is certainly more time-consuming; a
researcher has to be stationed for long periods of time to collect data. In addition, people are less likely to answer
the questions honestly because the presence of a researcher makes them worry about social desirability. Last, the
sample for this method is limited to people who are in the physical area at the time that questionnaires are being
distributed. As the chapter discusses later, this limitation might lead to problems in the composition of the sample.
On the plus side, however, this method tends to result in higher compliance rates because people find it harder to
say no to someone face-to-face than to ignore a piece of mail.

Distributing online: During the last two decades, online
surveys have become the dominant method of data
collection, for both market research and academic research.
Online distribution involves posting a questionnaire on a web
page, and then directing participants to this web page to
complete the questionnaire. Online surveys offer many
benefits over other forms of data collection, including the
ability to: present audio and visual stimuli, randomize the
order of questions, and implement complex branching logic
(e.g., asking people to evaluate local grocery stores
depending on where they live).

Most recently, researchers have begun exploring the best
ways to design surveys for mobile devices. According to a
report from the International Telecommunications Union, in
2013, 6.8 billion mobile phones were in use, compared to a
world population of 7.2 billion. In 2012, 44% of Americans slept next to their phones (Pew Research Center, 2012).
Not surprisingly, consensus in the market research industry is that approximately 20–30% of online surveys are
actually completed on a mobile device (Poynter, Williams, & York, 2014). Why does this matter? People take surveys
on their smartphones because it is convenient (or, in some cases, because it is their only Internet device). However,
despite recent exponential advancement, mobile phones still have smaller screens, less functional keyboards, and
less predictability in displaying images and videos. (Imagine someone being asked to view a set of two-minute-long
advertisements on an iPhone while trying to complete a survey before a doctor’s appointment.) Researchers do have
ways to make this experience more pleasant for respondents and consequently to increase the quality of data
obtained. For example, mobile surveys work best when they are shorter overall, when the question text is short and
straightforward, and when response scales (discussed below) are kept at five points (see Poynter et al., 2014, for a
review). The latter point is a direct result of small screen size: Longer response scales require respondents to scroll
back and forth on their screens to see the entire scale. Unfortunately, but understandably, some applied research
suggests that people tend to ignore the scale points that they cannot see—perhaps using only four points out of a
ten-point scale.


Because these methods are relatively new, the jury is still out on whether online and mobile distribution results in
biased samples or biased responses. However, worth keeping in mind is that approximately 13% of the U.S.
population does not have Internet access (Internet Users by Country, 2014). This group is disproportionately older
(65+) and represents the lowest income and least educated segments of the population. Thus, if research questions
involve reaching these groups, it is necessary to supplement online surveys with other distribution methods. For
readers interested in more information on designing and conducting Internet research, Sam Gosling and John
Johnson’s (2010) recent book provides an excellent resource. In addition, several groups of psychological
researchers have been attempting to understand the psychology of Internet users (read about recent studies on this
website: http://www.spring.org.uk/2010/10/internet-psychology.php
(http://www.spring.org.uk/2010/10/internet-psychology.php) ).

Advantages and Disadvantages of Questionnaires
Just as interview methods do, written questionnaires claim their own set of advantages and disadvantages. Written
surveys allow researchers to collect large amounts of data with little cost or effort, and they can offer a greater
degree of anonymity than interviews. Anonymity can be a particular advantage in dealing with sensitive or
potentially embarrassing topics. That is, people may be more willing to answer a questionnaire about their alcohol
use or their sexual history than they would be to discuss these things face-to-face with an interviewer. On the
downside, written surveys miss out on one advantage of interviews because no one is available to clarify confusing
questions. Fortunately, researchers have one relatively easy way of minimizing this problem: make survey questions
as clear as possible. The next section explains the process of questionnaire design.


4.2 Questionnaire Design
One of the most important steps in conducting survey research is deciding how to construct and assemble the
questionnaire items. In some cases, a researcher will be able to answer research questions using questionnaires that
other researchers have already developed. For example, quite a bit of psychology research uses standard scales that
measure self-esteem, prejudice, depression, or stress levels. The advantage of these ready-made measures is that
other people have already gone to the trouble of making sure they are valid and reliable. So, someone interested in
the relationship between stress and depression could distribute the Perceived Stress Scale (Cohen, Kamarck, &
Mermelstein, 1983) and the Beck Depression Inventory (Beck, Steer, Ball, & Ranieri, 1996) to a group of participants
and more quickly move along to the fun part of data analyses.

However, in many cases, no perfect measure exists for a research question—either because no one has studied the
topic before or because the current measures are all flawed in some way. When this happens, researchers need to go
through the process of designing their own questions. This section discusses strategies for writing questions and
choosing the most appropriate response format.

Five Rules for Better Questions

Each of the rules listed below is designed to make research questions as clear and easy to understand as possible so
as to minimize the potential for error variance. We discuss each rule below and illustrate it with contrasting pairs of
items: “bad” items that do not follow the rule and “better” items that do.

1. Use simple language. One of the simplest and most important rules to keep in mind is that people have to
be able to understand the survey questions. This means avoiding jargon and specialized language whenever
possible.

BAD: “Have you ever had an STD?”

BETTER: “Have you ever had a sexually transmitted disease?”

BAD: “What is your opinion of the S-CHIP program?”

BETTER: “What is your opinion of the State Children’s Health Insurance Program?”

It is also a good idea to simplify the language as much as possible, so that people spend time answering the
question rather than trying to decode its meaning. For example, words like assist and consider can be
replaced with simpler words like help and think. This may seem odd—or perhaps even condescending to
participants—but it is always better to err on the side of simplicity. Remember, when people are forced to
guess at the meaning of questions, these guesses add error variance to their answers.

2. Be precise. Another way to ensure that people understand the question is to be as precise as possible with
wording. Ambiguously (or vaguely) worded questions will introduce an extra source of error variance into
the data because people may interpret these questions in varying ways.

BAD: “What drugs do you take?” (Legal drugs? Illegal drugs? Now? In college?)

BETTER: “What prescription drugs are you currently taking?”

BAD: “Do you like sports?” (Playing? Watching? Which sports??)

BETTER: “How much do you enjoy watching basketball on television?”

3. Use neutral language. Questions should be designed to measure participants’ attitudes, feelings, or
behaviors rather than to manipulate these things. That is, avoid leading questions that are written in such
a way that they suggest an answer.

BAD: “Do you beat your children?” (Who would say yes?)

BETTER: “Is it acceptable to use physical forms of discipline?”

BAD: “Do you agree that the president is an idiot?”

BETTER: “How would you rate the president’s job performance?”

This guideline can be used to sidestep social desirability concerns. If the researcher suspects that people
may be reluctant to report holding an attitude—for example, using corporal punishment with their children
—it helps to phrase the question in a nonthreatening way: “using physical forms of discipline” versus
“beating your children.” Many current measures of prejudice adopt this technique. For example,
McConahay’s (1986) “modern racism” scale contains items such as “Discrimination against Blacks is no
longer a problem in the United States.” People who hold prejudicial attitudes are more likely to confess
agreement with statements like this one than with blunter ones, like “I hate people from Group X.”

4. Ask one question at a time. One remarkably common error that people make in designing questions is to
include a double-barreled question (one which asks more than one question at a time). A new-patient
questionnaire at a doctor's office often asks whether the patient suffers from "headaches and nausea." What
if an individual only suffers from one of these or has a lot of nausea and an occasional headache? The better
approach is to ask about each of these symptoms separately.

BAD: “Do you suffer from pain and numbness?”

BETTER: “How often do you suffer from pain?” “How often do you suffer from numbness?”

BAD: “Do you like watching football and boxing?”

BETTER: “How much do you enjoy watching football?” “How much do you enjoy watching boxing?”

5. Avoid negations. One final and simple way to clarify questions is to avoid questions with negative
statements because these can often be difficult to understand. The first example below may be a little silly,
but the second comes from a real survey of voter opinion.

BAD: “Do you never not cheat on your exams?” (Wait, what? Do I cheat? Do I not cheat? What is
this asking?)

BETTER: “Have you ever cheated on an exam?”

BAD: “Are you against rejecting the ban on pesticides?” (Wait, so, am I for the ban? Against the
ban? What is this asking?)

BETTER: “Do you support the current ban on pesticides?”

Participant-Response Options

This section discusses the issue of deciding how participants should respond to survey questions. The decisions
researchers make at this stage will affect the type of data they ultimately collect, so it is important to choose
carefully. This section reviews the primary decisions a researcher will need to make about response options, as well
as the pros and cons of each one.

One of the first choices to make is whether to collect open-ended or fixed-format responses. As the names imply, fixed-format responses require participants to choose from a list of options (e.g., "Choose your favorite color"), while open-ended responses ask participants to provide unstructured responses to a question or statement (e.g., "How do you feel about legalizing marijuana?"). Open-ended responses tend to be richer and more flexible but harder to translate into quantifiable data—analogous to the tradeoff we discussed in comparing written versus oral survey methods. To put it another way, some concepts are difficult to reduce to a seven-point fixed-format scale, but number ratings on these scales are easier to analyze than a paragraph of free-flowing text.

Another reason to think carefully about this decision is that fixed-format responses will, by definition, restrict people's options in answering the question. In some cases, these restrictions can even act as leading questions. In a study of people's perceptions of history, Dario Páez Rovira and his colleagues (Rovira, Deschamps, & Pennebaker, 2006) asked respondents to indicate the "most significant event over the last 50 years." When this was asked in an open-ended way (i.e., "list the most significant event"), 2% of participants listed the invention of computers. Another version of the survey asked the question in a fixed-format way (i.e., "choose the most significant event"). When asked to select
from a list of four options (World War II, invention of
computers, Tiananmen Square, or man on the moon), 30%
chose the invention of computers. In exchange for having
easily coded data, the researchers accidentally forced participants into a smaller number of options. The result, in
this case, was a distorted sense of the importance of computers in people’s perceptions of history.

Fixed-Format Options
Although fixed-format responses can sometimes constrain or skew participants' answers, researchers tend to use them more often than not. This decision is largely practical; fixed-format responses allow for more efficient data collection from a much larger sample. (Imagine the chore of having to hand-code 2,000 essays.) But once researchers have decided on this option for the questionnaire, the decision process is far from over. In this section, we discuss three possibilities for constructing a fixed-format response scale.

True/false. One fixed-format option is the true/false format, which asks participants to indicate
whether they endorse a statement. For example:

“I attended church last Sunday.” True False

“I am a U.S. citizen.” True False

“I am in favor of abortion.” True False

This last example may strike you as odd, and in fact it illustrates an important limitation in the use of true/false
formats: They are best used for statements of facts rather than attitudes. It is relatively straightforward to answer
whether we attended church or are a U.S. citizen. However, people’s attitudes toward abortion are often complicated
—one might be “pro-choice” but still support some restrictions, or “pro-life” but support exceptions (e.g., in cases of
rape). For most people, a true/false question cannot even come close to capturing the complexity of these beliefs.
However, for survey items that involve simple statements of fact, the true/false format can be a good option.

Multiple choice. A second option uses a multiple-choice format, which asks participants to select from a set of
predetermined responses.

1/4/2018 Print

https://content.ashford.edu/print/Newman.2681.16.1?sections=navpoint-23,navpoint-26,navpoint-27,navpoint-28,navpoint-29,navpoint-30,navpoint-3… 22/44

“Which of the following is your favorite fast-food restaurant?”

a) McDonald’s
b) Burger King
c) Wendy’s
d) Taco Bell

“Whom did you vote for in the 2012 presidential election?”

a) Mitt Romney
b) Barack Obama

“How do you travel to work most days? (Select all that apply.)”

a) drive alone
b) carpool
c) public transportation

As these examples show, multiple-choice questions offer quite a bit of freedom in both the content and the response-
scaling of questions. A researcher can ask participants either to select one answer or, as in the last example, to select
all applicable answers. A survey can cover everything from preferences (e.g., favorite fast-food restaurant) to
behaviors (e.g., how people travel to work).

Multiple-choice formats do have a downside. Whenever the survey provides a set of responses, it restricts
participants’ responses to that set. This is the problem that Rovira and colleagues (2006) encountered in asking
people about the most significant events of the last 50 years. In each of the examples above, the categories fail to
capture all possible responses. What if someone's favorite restaurant is In-N-Out Burger? What if a respondent
voted for Ralph Nader? What if a person telecommutes or bicycles to work? Researchers have two relatively easy
ways to avoid (or at least minimize) this problem. First, when choosing the response options, plan carefully. During
the design process, it helps to brainstorm with other people to ensure the survey is capturing the most likely range
of responses. However, it is often impossible to provide every option that people might conceive. The second
solution is to provide an “other” response to a multiple-choice question, which allows people to write in an option
that the survey neglected to include. For example, our last question about traveling to work could be rewritten as:

“How do you travel to work on most days? (Select all that apply.)”

a) drive alone
b) carpool
c) public transportation
d) other (please specify): __________________

This way, people who telecommute, or bicycle, or even ride their trained pony to work will have a way to respond
rather than skipping the question. And, if researchers start to notice a pattern in these write-in responses (e.g., 20%
of people added “bicycle”), then they have valuable knowledge to improve the next incarnation of the survey.

Rating scales. Last, but certainly not least, another option uses a rating-scale format, which asks participants to
respond on a scale representing a continuum.

“Sometimes it is necessary to sacrifice liberty in the name of security.”

1 2 3 4 5

not at all necessary very necessary

“I would vote for a candidate who supported the death penalty.”

1 2 3 4 5

not at all likely very likely

“The political party in power right now has really messed things up.”

1 2 3 4 5

strongly disagree strongly agree

This format is well suited to capturing attitudes and opinions, and, indeed, is one of the most common approaches to
attitude research. Rating scales are easy to score, and they give participants some flexibility in indicating their
agreement with or endorsement of the questions. Researchers have two critical decisions to make about the
construction of rating-scale items; both have implications for how they analyze and interpret results.

First, a researcher needs to decide the anchors, or labels, for the response scale. Rating scales offer a good deal of flexibility in these anchors, as the examples above demonstrate. A survey can frame questions in terms of
“agreement” with a statement or “likelihood” of a behavior, or researchers can customize the anchors to match their
questions (e.g., “not at all necessary”). Scales that use anchors of “strongly agree” and “strongly disagree” are also
referred to as Likert scales. At a fairly simple level, the choice of labels affects the interpretation of the results. For
example, if we asked the “political party in power” question above, we have to be aware that the anchors are phrased
in terms of agreement with the statement. In discussing these results, we would be able to discuss how much people
agreed with the statement, on average, and whether agreement correlated with other things. If this seems like an
obvious point, readers would be amazed how often researchers (or the media) will take an item like this and spin
the results to talk about the “likelihood of voting” for the party in power—confusing an attitude with a behavior. So,
in short, researchers must make sure they are being honest when presenting and interpreting research data.

At a more conceptual level, a researcher needs to decide whether the anchors for the rating scale make use of a
bipolar scale, which has polar opposites at its endpoints, or a unipolar scale, which assesses a single construct.
The difference between these options is best illustrated by an example:

Bipolar: How would you rate your current mood?

Sad—————————————Happy

Unipolar: How would you rate your current mood?

1 2 3 4 5 6 7

not at all sad very sad

1 2 3 4 5 6 7

not at all happy very happy

The bipolar option requires participants to place themselves on a continuous scale somewhere between “sad” and
“happy,” which are polar opposites. The bipolar scale assumes that the endpoints represent the only two options;
participants can be sad, happy, or somewhere in between. In contrast, the unipolar option asks participants to rate
themselves on two scales, indicating their level of both “sadness” and “happiness.” A pair of unipolar scales assumes
that it is possible to experience varying degrees of each item—participants can be moderately happy, but also a little
bit sad, for example. The decision to use a bipolar or a unipolar scale comes down to the context. What is the most
logical way to think about these constructs? What have previous researchers done?

In the 1970s, Sandra Lipsitz Bem revolutionized the way researchers
thought about gender roles by arguing against a bipolar approach.
Previously, gender role identification had been measured on a bipolar
scale from “masculine” to “feminine”; the scale assumed that a person
could be one or the other. Bem (1974) argued instead that people could
easily have varying degrees of masculine and feminine traits. Her scale,
the Bem Sex Role Inventory, asks respondents to rate themselves on a set
of 60 unipolar traits. Someone with mostly feminine and hardly any
masculine traits would be described as “feminine.” Someone with high
ratings on both masculine and feminine traits would be described as
“androgynous.” And, someone with low ratings on both masculine and
feminine traits would be described as “undifferentiated.” View and
complete Bem’s scale online at:
http://garote.bdmonkeys.net/bsri.html
(http://garote.bdmonkeys.net/bsri.html) .

After settling on the best way to anchor the scale, the researcher’s second
critical decision concerns the number of points in the response
scale. Notice that all of the examples in this section have an odd number
of points (i.e., five or seven). Odd numbers are usually preferable for
rating-scale items because the middle of the scale (i.e., “3” or “4”) allows
respondents to give a neutral, middle-of-the-road answer. That is, on a
scale from “strongly disagree” to “strongly agree,” the midpoint can be
used to indicate “neither” or “I’m not sure.” However, in some cases, a
researcher may not want to allow a neutral option in a scale. Using an
even number of points (e.g., four or six) essentially compels people either
to agree or disagree with the statement; this type of scaling is referred to
as forced choice.

So, how many points should the scale have? As a general rule, more points will translate into more variability in
responses—the more choice people have (up to a point), the more likely they are to distribute their responses
among those choices. From a researcher’s perspective, the big question is whether this variability is meaningful. For
example, if we assess college students’ attitudes about a student-fee increase, student opinions will likely vary
depending on the size of the fee and the ways in which it will be used. Thus, we might prefer a five- or seven-point
scale to a two-point (yes or no) scale. However, past a certain point, increases in the scale range cease to connect to
meaningful variation in attitudes. In other words, the difference between a 5 and a 6 on a seven-point scale is fairly
intuitive for participants to grasp. What is the real difference, though, between an 80 and an 81 on a 100-point
scale? When scales become too large, researchers risk introducing another source of error variance as participants
impose their own interpretations on the scaling. In sum, more points do not always translate to a better scale.

Back to the question: How many points should the scale have? The ideal compromise supported by most
statisticians is to use a seven-point scale whenever possible because of the differences between scales of
measurement. As the discussion in Chapter 2 explained, the way variables are measured has implications for data
analyses. For the most popular statistical tests to be legitimate, variables need to be on an interval scale (i.e., with
equal intervals between points) or a ratio scale (i.e., with a true zero point). Based on mathematical modeling
research, statisticians have concluded that the variability generated by a seven-point scale is most likely to mimic an
interval scale (e.g., Nunnally, 1978). So, from a statistical perspective, a seven-point scale is ideal because it allows us
the most flexibility in data analyses.

Finalizing the Questionnaire

After constructing the questionnaire items, researchers face one last important step before beginning data
collection. This section discusses a few guidelines for assembling the items into a coherent questionnaire. One main
goal at this stage is to think carefully about the order of the individual items.

First, keep in mind that the first few questions will set the tone for the rest of the questionnaire. It is best to start
with questions that are both interesting and nonthreatening to help ensure that respondents complete the
questionnaire with open minds. For example:

BAD OPENING: “Do you agree that your child’s teacher is an idiot?” (threatening, and also a leading question)

BETTER OPENING: “How would you rate the performance of your child’s teacher?”

BAD OPENING: “Would you support a 1% sales tax increase?” (boring)

BETTER OPENING: “How do you feel about raising taxes to help fund education?”

Second, strive whenever possible to have continuity in the different sections of the questionnaire. Imagine
constructing a survey to give to college freshmen. It might include questions on family background, stress levels,
future plans, campus engagement, and so on. The survey will be most effective if it groups questions by topic. So, for
instance, students respond to a set of questions about future plans on one page and then answer a set of questions
about campus engagement on another page. This approach makes it easier for participants to progress through the
questions without having to switch mentally between topics.

Third, remember that individual questions are always read in context. This means that if the college-student survey
begins with a question about plans for the future and then asks about stress, respondents will likely have their
future plans in mind when they think about their stress level. Consider again the example of the graduate assistant.
His department used to administer a gigantic survey packet (on paper) to the 2,000 students enrolled in
Introductory Psychology each semester. One year, a faculty member included a measure of identity, asking
participants to complete the statements “I am______” and “I am not______.” As researchers started to analyze data from
this survey, they discovered an astonishing 60% of students had filled in the blank with "I am not a homosexual!"
This response seemed downright strange, until the surveyors realized that the questionnaire immediately preceding
the identity one measured prejudice toward gay and lesbian individuals. So, as these students completed the
identity measure, they had homosexuality on their minds and felt compelled to point out that they were not
homosexual. In other words, responses are all about context.

Finally, after assembling a draft version of the questionnaire, perform a test run. This test run, called pilot testing,
involves giving the questionnaire to a small sample of people, getting their feedback, and making any necessary
changes. One of the best ways to pilot test is to find a patient group of friends to complete the questionnaire who
will provide extensive feedback. In soliciting their feedback, ask questions like the following:

Was anything confusing or unclear?

Was anything offensive or threatening?

How long did the questionnaire take you to complete?

Did it seem repetitive or boring? Did it seem too long?

Were there particular questions that you liked or disliked? Why?

The answers to these questions will supply valuable information to revise and clarify the questionnaire before
devoting resources to a full round of data collection. The next section turns to the question of how to find and select
participants for this stage of the research.

Research: Thinking Critically

Beauty and First Impressions

Follow the link below to a press release from the University of British Columbia, describing a recent
publication by researchers in the psychology department. This study suggests that physical beauty may play
a role in how easily we form first impressions of other people. As you read the article, consider what you have
learned so far about the research process, and then respond to the questions below.

http://news.ubc.ca/2010/12/21/beautiful-people-convey-personality-traits-better-during-first-impressions/ (http://news.ubc.ca/2010/12/21/beautiful-people-convey-personality-traits-better-during-first-impressions/)

Think About It:

1. Suppose the following questions were part of the questionnaire given after the three-minute one-on-
one conversations in this study. Based on the goals of the study and the rules discussed in this
chapter, identify the problem with each of the following questions and suggest a better item.

a. Jane is very neat.

1 2 3 4 5
strongly disagree                                   strongly agree

main problem:
better item:

b. Jane is generous and organized.

1 2 3 4 5
strongly disagree                                   strongly agree

main problem:
better item:

c. Jane is extremely attractive.

TRUE FALSE

main problem:
better item:

2. What are the strengths and weaknesses of using a fixed-format questionnaire in this study versus
open-ended responses?

3. The researchers state that they took steps to control for the “positive bias that can occur in self-
reporting.” How might social desirability influence the outcome of this particular study? What might
the researchers have done to reduce the effect of social desirability?


4.3 Sampling From the Population
At this point, the chapter should have conveyed an understanding of how to construct survey items. The next step is
to find a group of people to fill out the survey. But where does a researcher find this group? And how many people
are needed? On the one hand, researchers want as many people as possible to capture the full range of attitudes and
experiences. On the other hand, they have to conserve time and other resources, which often means choosing a
smaller sample of people. This section examines the strategies researchers can use to select samples for their
studies.

Researchers refer to the entire collection of people who could possibly be relevant to a study as the population. For
example, if we were interested in the effects of prison overcrowding, our population would consist of prisoners in
the United States. If we wanted to study voting behavior in the next presidential election, the population would be
U.S. residents eligible to vote. And if we wanted to know how well college students cope with the transition from
high school, our population would include every college student enrolled in every college in the country.

These populations suggest an obvious practical complication. How can we get every college student—much less
every prisoner—in the country to fill out our questionnaire? We cannot; instead, researchers will collect data from a
sample, a subset of the population. Instead of trying to reach all prisoners, we might sample inmates from a handful
of state prisons. Rather than attempt to survey all college students in the country, researchers often restrict their
studies to a collection of students at one university.

The goal in choosing a sample is to make it as representative as possible of the larger population. That is, if
researchers choose students at one university, they need to be reasonably similar to college students elsewhere in
the country. If the phrase “reasonably similar” sounds vague, this is because the basis for evaluating a sample varies
depending on the hypothesis and the key variables. For example, if we wanted to study the relationship between
family income and stress levels, we would need to make sure that our sample mirrored the population in the
distribution of income levels. Thus, a sample of students from a state university might be a better choice than
students from, say, Harvard (which costs about $60,000 per year including room and board). On the other hand, if
the research question deals with the pressures faced by students in selective private schools, then Harvard students
could be a representative sample for the study.

Figure 4.1 shows a conceptual illustration of both a representative and nonrepresentative sample, drawn from a
larger population. The population in this case consists of 144 individuals, split evenly between Xs and Os. Thus, we
would want our sample to come as close as possible to capturing this 50/50 split. The sample of 20 individuals on
the left is representative of the population because it is split evenly between Xs and Os. But the sample of 20 individuals on the right is nonrepresentative because it contains 75% Xs. Because this sample has far fewer Os than we would expect, it does not accurately represent the population. This failure of the sample to represent the population is also referred to as sampling bias.

Figure 4.1: Representative and nonrepresentative samples of a
population


Where do these samples come from? Broadly speaking, researchers have two categories of sampling strategies at their disposal: probability sampling and nonprobability sampling.

Probability Sampling

Researchers use probability sampling when each person in the population has a known chance of being in the
sample. This is possible only in cases where researchers know the exact size of the population. For instance, the
current population of the United States is 322.1 million people (www.census.gov/popclock/
(http://www.census.gov/popclock/) ). If we were to select a U.S. resident at random, each resident would have a one in
322.1 million chance of being selected. Whenever researchers have this information, probability-sampling strategies
are the most powerful approach because they greatly increase the odds of getting a representative sample. Within
this broad category of probability sampling are three specific strategies: simple random sampling, stratified random
sampling, and cluster sampling.

Simple random sampling, the most straightforward approach, involves randomly picking study participants from a
list of everyone in the population. The term for this list is a sampling frame (e.g., imagine a list of every resident of
the United States). To have a truly representative random sample, researchers must have a sampling frame; they
must choose from it randomly; and they must have a 100% response rate from those selected. (As Chapter 2
discussed, if people drop out of a study, it can threaten the validity of the hypothesis test.)
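
To make the idea concrete, here is a minimal sketch in Python (not part of the textbook's materials) that draws a simple random sample from a hypothetical sampling frame of 20,000 students; the frame, the random seed, and the sample size of 400 are illustrative assumptions.

import random

# Hypothetical sampling frame: a complete list of everyone in the population
# (for example, a registrar's list of enrolled students).
sampling_frame = ["student_{}".format(i) for i in range(1, 20001)]

random.seed(42)  # fixed seed so the example is reproducible
sample = random.sample(sampling_frame, k=400)

# Every person on the frame had the same known chance (400 / 20,000 = 2%)
# of ending up in the sample.
print(len(sample), sample[:3])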

Researchers use stratified random sampling, a variation of simple random sampling, when subgroups of the population might be left out of a purely random sampling process.
Imagine a city with a population that is 80% Caucasian, 10%
Hispanic, 5% African American, and 5% Asian. If we were to
pick 100 residents at random, the chances are very good that
our entire sample would consist of Caucasian residents and
ignore the perspective of all ethnic minority residents. To
prevent this problem, researchers use stratified random
sampling—breaking the sampling frame into subgroups and
then sampling a random number from each subgroup. In this
example, we could divide the list of residents into four ethnic
groups and then pick a random 25 from each of these groups.
The end result would be a sample of 100 people that
captured opinions from each ethnic group in the population.
Notice that this approach results in a sample that does not
exactly represent the underlying population—that is,
Hispanics constitute 25% of the sample, rather than 10%.
One way to correct for this issue is to use a statistical
technique known as “weighting” the data. Although the full
details are beyond the scope of this book, weighting involves
trying to correct for problems in representation by assigning each participant a weighting coefficient for analyses. In
essence, people from groups that are underrepresented would have a weight greater than 1, while those from
groups that are overrepresented would have a weight less than 1. For more information on weighting and its uses,
see http://www.applied-survey-methods.com/weight.html (http://www.applied-survey-methods.com/weight.html) .
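
As a rough illustration of both ideas, the sketch below (hypothetical numbers, not drawn from the textbook) stratifies the imaginary city described above into four ethnic groups, samples 25 residents from each, and then computes a weighting coefficient for each group as its population share divided by its sample share.

import random

random.seed(1)

# Hypothetical city of 10,000 residents with the breakdown described above
# (80% / 10% / 5% / 5%). All names are placeholders.
population = {
    "Caucasian": ["C{}".format(i) for i in range(8000)],
    "Hispanic": ["H{}".format(i) for i in range(1000)],
    "African American": ["A{}".format(i) for i in range(500)],
    "Asian": ["S{}".format(i) for i in range(500)],
}

# Stratified random sampling: draw 25 residents at random from each subgroup.
sample = {group: random.sample(members, k=25) for group, members in population.items()}

# Weighting: population share divided by sample share, so groups that are
# underrepresented in the sample count more and overrepresented groups count less.
total_pop = sum(len(m) for m in population.values())
total_sample = sum(len(s) for s in sample.values())
for group in population:
    pop_share = len(population[group]) / total_pop      # e.g., 0.80 for the largest group
    sample_share = len(sample[group]) / total_sample    # 0.25 for every group here
    print("{}: weight = {:.2f}".format(group, pop_share / sample_share))
# Prints weights of 3.20, 0.40, 0.20, and 0.20 for the four groups.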

Finally, researchers employ cluster sampling, another variation of random sampling, when they do not have access
to a full sampling frame (i.e., a full list of everyone in the population). Imagine that we want to study how cancer
patients in the United States cope with their illness. Because no list exists of every cancer patient in the country, we
have to get a little creative with our sampling. The best way to think about cluster sampling is as “samples within
samples.” Just as with stratified sampling, we divide the overall population into groups, but cluster sampling differs
in that we are dividing into groups based on more than one level of analysis. In our cancer example, we could start
by dividing the country into regions, then randomly selecting cities from within each region, and then randomly
selecting hospitals from within each city, and finally randomly selecting cancer patients from each hospital. The end
result would be a random sample of cancer patients from, say, Phoenix, Miami, Dallas, Cleveland, Albany, and Seattle;
taken together, these patients would provide a fairly representative sample of cancer patients around the country.
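
A hedged sketch of this multistage logic appears below; the regions, cities, hospitals, and patient lists are all invented for illustration rather than taken from any real data set.

import random

random.seed(7)

# Hypothetical multistage structure: regions contain cities, cities contain
# hospitals, and each hospital has a list of (anonymous) cancer patients.
regions = {
    "West":  {"Phoenix":   {"Hosp_A": list(range(30)), "Hosp_B": list(range(25))},
              "Seattle":   {"Hosp_C": list(range(40))}},
    "South": {"Miami":     {"Hosp_D": list(range(35))},
              "Dallas":    {"Hosp_E": list(range(20)), "Hosp_F": list(range(45))}},
    "East":  {"Albany":    {"Hosp_G": list(range(15))},
              "Cleveland": {"Hosp_H": list(range(50))}},
}

sample = []
for region, cities in regions.items():
    city = random.choice(list(cities))                      # randomly pick one city per region
    hospital = random.choice(list(cities[city]))             # then one hospital in that city
    patients = random.sample(cities[city][hospital], k=5)    # then 5 patients in that hospital
    sample.extend((region, city, hospital, p) for p in patients)

print(len(sample))  # 15 patients sampled without ever needing a national list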

Nonprobability Sampling

The other broad category of sampling strategies is known as nonprobability sampling. These strategies are used in
the (remarkably common) case in which researchers do not know the odds of any given individual’s being in the
sample. This uncertainty represents an obvious shortcoming—if we do not know the exact size of the population
and do not have a list of everyone in it, we have no way to know that our sample is representative. Despite this
limitation, researchers use nonprobability sampling on a regular basis. We will discuss two of the most common
nonprobability strategies here.

In many cases, it is not possible to obtain a sampling frame. When researchers study rare or hard-to-reach
populations or study potentially stigmatizing conditions, they often recruit by word-of-mouth. The term for this is
snowball sampling—imagine a snowball rolling down a hill, picking up more snow (or participants) as it goes. If
we wanted to study how often homeless people took advantage of social services, we would be hard pressed to find
a sampling frame that listed the homeless population. Instead, we could recruit a small group of homeless people
and ask each of them to pass the word along to others, and so on. If we wanted to study changes in people’s
identities following sex-reassignment surgery, we would find it difficult to track down this population via public
records. Instead, we could recruit one or two patients and ask for referrals to others. The resulting sample in both
cases is unlikely to be representative, but researchers often have to compromise for the sake of obtaining access to a
population. Snowball sampling is most often used in qualitative research, where the advantages of gaining a rich
narrative from these individuals outweigh the loss of representativeness.


One of the most popular nonprobability strategies is known as convenience sampling, or simply including people
who show up for the study. Any time a 24-hour news station announces the results of a viewer poll, they are likely
based on a convenience sample. CNN and Fox News do not randomly select from a list of their viewers; they post a
question onscreen or online, and people who are motivated (or bored) enough to respond will do so. As a matter of
fact, the vast majority of psychology research studies are based on convenience samples of undergraduate college
students. Research in psychology departments often works like this: Experimenters advertise their studies on a
website, and students enroll in these studies, either to earn extra cash or to fulfill a research requirement for a course. Students often pick a particular study based on whether it fits their busy schedules or whether the
advertisement sounds interesting. These decisions are hardly random and, consequently, neither is the sample. The
goal here is not to disparage all psychology research—that would be self-defeating—but to emphasize that all of the
decisions researchers make have both pros and cons.

Choosing a Sampling Strategy

Although researchers always strive for a representative sample, no such thing as a perfectly representative one
exists. Some degree of sampling error, defined as the degree to which the characteristics of the sample differ from
the characteristics of the population, is always present. Instead of aiming for perfection, then, researchers aim for an
estimate of how far from perfection their samples are. These estimates are known as the margin of error, or the
degree to which the results from a particular sample are expected to deviate from the population as a whole.

One of the main advantages of a probability sample is that we are able to calculate these errors, as long as we know
our sample size and desired level of confidence. In fact, most of us encounter margins of error every time we see the
results of an opinion poll. For example, CNN may report that “Candidate A is leading the race with 60% of the vote, ±
3%.” This means Candidate A’s approval percentage in the sample is 60%, but based on statistical calculations, her
real percentage is between 57% and 63%. The smaller the error (3% in this example), the more closely the results
from the sample match the population. Naturally, researchers conducting these opinion polls want the error of
estimation to be as small as possible. How persuaded would anyone be to learn that “Candidate A has a 10-point
lead, plus or minus 20 points?” This margin of error ought to trigger our skepticism, because the real result could be anywhere from a 30-point lead to a 10-point deficit—i.e., a 10-point lead for the other candidate.

Researchers’ most direct means of controlling the margin of error is by changing the sample size. Most survey
research aims for a margin of error of less than five percentage points. Based on standard calculations, this requires a sample size of 400 people per group. That is, if we want to draw conclusions about the entire sample (e.g., "30% of registered voters said X"), then we would need at least 400 respondents to say this with some confidence. If we want to draw conclusions about subgroups (e.g., "30% of women compared to 50% of men"), then we would actually need at least 400 respondents of each gender to draw conclusions with confidence.

The magic number of 400 represents a compromise—a researcher is willing to accept 5% error for the sake of
keeping time and costs down. It is worth noting, however, that some types of research have more stringent
standards: For political polls to be reported by the media, they must have at least 1,000 respondents, which brings
the margin of error down to three percentage points. In contrast, some areas of applied research may have more
relaxed standards. In marketing research, for example, budget considerations sometimes lead to smaller samples,
which means drawing conclusions at lower levels of confidence. For example, with a sample size of 100 people per
group, researchers have to contend with 8–10% margin of error—almost double the error, but at a fraction of the
costs.
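
These ballpark figures follow from the standard formula for the margin of error of a proportion, z * sqrt(p(1 - p)/n). The short sketch below is only an illustration; it assumes a 95% confidence level (z = 1.96) and the conservative value p = 0.5, which maximizes the error, and it reproduces the approximate percentages mentioned above.

import math

def margin_of_error(n, p=0.5, z=1.96):
    # Approximate 95% margin of error for a proportion estimated from n respondents.
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1000):
    print("n = {:>4}: +/- {:.1f} percentage points".format(n, margin_of_error(n) * 100))
# n =  100: +/- 9.8 percentage points
# n =  400: +/- 4.9 percentage points
# n = 1000: +/- 3.1 percentage points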

If probability sampling is so powerful, why are nonprobability strategies so popular? One reason is that convenience
samples are more practical; they are cheaper, easier, and almost always possible to conduct with relatively few
resources because researchers can avoid the costs of large-scale sampling. A second reason is that convenience is
often a good-enough starting point for a new line of research. For example, if we wanted to study the predictors of
relationship satisfaction, we could start by testing our hypotheses in a controlled setting using college student
participants and then extend the research to the study of adult married couples. Finally, and relatedly, in many cases
it is acceptable to have a nonrepresentative sample because researchers do not need to generalize results. If we
want to study the prevalence of alcohol use in college students, it may be perfectly acceptable to use a convenience
sample of college students. Although, even in this case, researchers would have to keep in mind that they are
studying drinking behaviors among students who volunteered to complete a study on drinking behaviors.

In some cases, however, it is critical to use probability sampling, despite the extra effort required. Specifically,
researchers use probability samples any time it is important to generalize and any time it is important to predict
behavior of a population. The best way of understanding these criteria is to think of political polls. In the lead-up to
an election, each campaign is invested in knowing exactly what the voting public thinks of its candidate. In contrast
to a CNN poll, which is based on a convenience sample of viewers, polls conducted by a campaign will be based on
randomly selected households from a list of registered voters. The resulting sample is much more likely to be
representative, much more likely to tell the campaign how the entire population views its candidate, and therefore
much more likely to be useful.


4.4 Analyzing Survey Data
Now comes the fun part. Once researchers have designed a survey, chosen an appropriate sample, and collected
some data, it is time for analyses. As with the descriptive designs Chapter 3 explained, the goal of these analyses is to
subject hypotheses to a statistical test. Surveys can be used both to describe and predict thoughts, feelings, and
behaviors. Since Chapter 3 already covered the basics of descriptive analysis, this section will focus on predictive
analyses, which are designed to assess the associations between and among variables. Researchers typically use
three approaches to test predictive hypotheses: correlational analyses, chi-square analyses, and regression analyses.
Each has its advantages and disadvantages, and each is most appropriate for a different kind of data. This section
will walk through the basics of each analysis. Because the statistics course discusses these approaches in more
detail, the goal here is to acquire a more conceptual overview of each technique and its usefulness in answering
research questions.

Correlational Analysis

The beginning of this chapter described an example of a survey research question: What is the relationship between
the number of hours that students spend studying and their grades in the class? In this case, the hypothesis claims
that we can predict something about students’ grades by knowing how many hours they spend studying.

Imagine we collected a small amount of data (shown in Table 4.1) to test this hypothesis. (Of course, a true test of
this hypothesis would require more than 10 people in the sample, but these data will do as an illustration.)

Table 4.1: Data for quiz grade/hours studied example

Participant Hours Studied Quiz Grade

1 1 2

2 1 3

3 2 4

4 3 5

5 3 6

6 3 6

7 4 7

8 4 8

9 4 9

10 5 9

The Logic of Correlation
The important question here is whether and to what extent we can predict grades based on study time. One
common statistic for testing these kinds of hypotheses is a correlation, which gives an assessment of the linear
relationship between two variables. A stronger correlation between two variables indicates a stronger association
between them. In the case of the current example, the stronger the correlation between study time and quiz grade,
the more accurately we can predict grades based on knowing how long the student spends studying.

Before we calculate the correlation between these variables, it is always a good idea to visualize the data on a graph.
Chapter 3 discussed a type of graph, called a scatterplot, that displays points of data on two variables at a time. The
scatterplot in Figure 4.2 shows our sample data from the studying/quiz grade study.

Figure 4.2: Scatterplot for quiz grade/hours studied example

Each point on the graph represents one participant. For example, the point in the top right corner represents a
student who studied for five hours and earned a 9 on the quiz. The two points in the bottom left represent students who studied for only one hour and earned a 2 and a 3 on the quiz.
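
For readers who want to reproduce a plot like Figure 4.2, a minimal sketch using the Table 4.1 data might look like the following (it assumes the matplotlib library is available; the styling choices are arbitrary).

import matplotlib.pyplot as plt

hours = [1, 1, 2, 3, 3, 3, 4, 4, 4, 5]    # hours studied, from Table 4.1
grades = [2, 3, 4, 5, 6, 6, 7, 8, 9, 9]   # quiz grades, from Table 4.1

plt.scatter(hours, grades)
plt.xlabel("Hours studied")
plt.ylabel("Quiz grade")
plt.title("Quiz grade by hours studied")
plt.show()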

Researchers have three reasons to graph data before conducting statistical tests. First, a graph allows us to get a
general sense of the pattern—in this case, students who study less appear to do worse on the quiz. As a result, we
will be better informed going into our statistical calculations. Second, the graph lets us examine the raw data for any
outliers, or points that stand out as clear exceptions to the overall pattern. These outlier points may indicate that a
respondent misunderstood the question and should be dropped from analyses. On the other hand, a cluster of
outlier points could indicate the presence of subgroups within our data. Perhaps most students do worse if they
study less, but a group of students is able to ace the quizzes without any preparation. Examining this cluster of
people in more detail might suggest either a refinement of our hypothesis or an interesting direction for future
research.

Third, the graph assures researchers that there is a linear
relationship between the variables. This is a very important point
about correlations: The math of the standard correlation formula is
based on how well the data points fit a straight line, which means nonlinear relationships might be overlooked. Figure 4.3 demonstrates a robust nonlinear finding in psychology regarding the relationship between task performance and physiological arousal. As this graph shows, people tend to perform their best on just about any task when they have a moderate level of arousal. When arousal is too high, people find it difficult to calm down and concentrate; when arousal is too low, people find it difficult to care about the task at all. If we simply ran a standard correlation with data on performance and arousal, the correlation would be close to zero because the points do not fit a straight line. Thus, it is critical to visualize the data before jumping ahead to the statistics. Otherwise, researchers risk overlooking an important finding in the data. (It is important to note that nonlinear relationships like this one can still be analyzed, but the calculations quickly become complex and typically require specialized statistical software and expertise.)


Interpreting Coefficients
Once we are satisfied that our data look linear, it is time to calculate our statistics. Researchers typically calculate
using a computer software program, such as SPSS, SAS, or Microsoft Excel. The number used to quantify the
correlation is called the correlation coefficient. This number ranges from –1 to +1 and contains two important
pieces of information:

The direction of the relationship is based on the sign of the correlation coefficient. A +0.8 would indicate a
positive correlation, meaning that as one variable increases, so does the other variable. A –0.8 would
indicate a negative correlation, meaning that as one variable increases, the other variable decreases. (Refer
back to Section 2.1 for a review of these two terms.)
The size of the relationship is based on the absolute value of the correlation coefficient. The farther the coefficient is from zero in either direction, the stronger the relationship between variables. For example,
both a +0.8 and a –0.8 indicate strong relationships.

So, for example, a +0.2 represents a weak positive relationship and a –0.7 represents a strong negative relationship.

Calculating the correlation for our quiz-grade study produces a coefficient of 0.962, indicating a strong positive
relationship between studying and quiz grade. What does this mean in plain English? Students who spend more
hours studying tend to score higher on the quiz.

How do we know whether to get excited about a correlation of 0.962? As with all of our statistical analyses, we look
this value up in a critical value table, or, more commonly, let the computer software do this for us. The critical value
table provides a p value representing the odds that our correlation is due to random chance. In this case, the p value
is less than 0.001. This means that the chance of our correlation being a random fluke is less than 1 in 1,000; we can feel pretty confident in our results.
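
For readers who want to check the arithmetic, a brief sketch using the SciPy library (one option among many; SPSS, SAS, or Excel would work equally well) reproduces both the coefficient and the p value for the Table 4.1 data.

from scipy import stats

hours = [1, 1, 2, 3, 3, 3, 4, 4, 4, 5]    # Table 4.1
grades = [2, 3, 4, 5, 6, 6, 7, 8, 9, 9]

r, p = stats.pearsonr(hours, grades)
print("r = {:.3f}, p = {:.6f}".format(r, p))
# Prints r = 0.962, with a p value well below 0.001.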

When interpreting correlation results, realize that statistical significance is closely tied to the sample size. In a small sample, it is possible to see moderate to strong relationships that do not meet the threshold for statistical significance. One good option in these cases is to collect additional data. If the correlation maintains its size and also attains statistical significance, researchers can have some confidence in the results. It is also possible to have the opposite problem: Large sample sizes can make even the smallest relationships show high levels of statistical significance. In a 2008 journal article, Newman, Groom, Handelman, and Pennebaker analyzed differences in language use between men and women. Because the authors had a sample of over 14,000 text samples, even the tiniest differences in language were statistically significant. For example, men used words related to anger about 4% more than women; with such a large sample, this trivial difference was significant at p < 0.05. To deal with this issue, the authors chose a more conservative threshold of p < 0.001 and treated results that did not reach it as too trivial to interpret.

Returning to our quiz-grade study, we now have all the information we need to report this correlation in a research
paper. The standard way of reporting a correlation coefficient includes information about the sample size (N) and p value, as well as the coefficient itself. Our quiz-grade study would be reported as Figure 4.4 depicts.

Figure 4.4: Correlation coefficient diagram


Where, then, does this leave our hypothesis? We started by predicting that students who spent more time studying
would perform better on their quizzes than those who spent less time studying. We then designed a study to test
this hypothesis by collecting data on study habits and quiz grades. Finally, we analyzed these data and found a significant, strong, positive correlation between hours studied and quiz grade. Based on this study, our hypothesis has been confirmed—students who study more have higher quiz grades. Of course, because this is a correlational
study, we are unable to make causal statements. It could be that studying more for an exam helps students to learn
more. Or, it could be the case that previous low quiz grades make students give up and study less. A third variable of
motivation could cause students both to study more and perform better on the quizzes. To tease these explanations
apart and determine causality calls for a different type of research design, which Chapter 5 will discuss.

Multiple Regression Analysis

Correlations are the best tool to test the linear relationship between pairs of quantitative variables. However, in
many cases, researchers are interested in comparing the influence of several variables at once. Imagine we want to
expand the study about hours studying and quiz grade by looking at other variables that might predict students’
quiz grades. We have already learned that the hours students spend studying correlate positively with their grades.
But what about SAT scores? Will students with higher standardized-test scores do better in all of their college
classes? What about the number of classes that students have previously taken in the subject area? Will increased
familiarity with the subject be associated with higher scores? To compare the influence of all three variables, we can
use a slightly different analytic approach. Multiple regression is a variation on correlational analysis in which more
than one predictor variable is used to predict a single outcome variable. In this example, we would attempt to
predict the outcome variable of quiz scores based on three predictor variables: SAT scores, number of previous
classes, and hours studied.

Multiple regression requires an extensive set of calculations; consequently, it is always performed by computer
software. A detailed look at these calculations is beyond the scope of this book, but a conceptual overview will help
convey the unique advantages of this analysis. Essentially, the calculations for multiple regression are based on the
correlation coefficients between each of our predictor variables, as well as between each of these variables and the
outcome variable. Table 4.2 shows these correlations for our revised quiz-grade study. If we scan the top row, we see
the correlations between quiz grade and the three predictor variables: SAT (r = 0.14), previous classes (r = 0.24),
and hours studied (r = 0.25). The remainder of the table shows correlations between the various predictor
variables; for example, hours studied and previous classes correlate at r = 0.24. When researchers conduct multiple
regression analysis using computer software, the software will use all of these correlations in performing its
calculations.

Table 4.2: Correlations for a multiple regression analysis

                     Quiz Grade    SAT Score    Previous Classes    Hours Studied
Quiz Grade               —           0.14            0.24*              0.25*
SAT Score                             —              0.02             –0.02
Previous Classes                                       —                0.24*
Hours Studied                                                            —

Note. Correlations marked with an asterisk (*) are statistically significant at the 95% confidence level. This notation in results tables is common and allows researchers to quickly spot the most interesting findings.

The advantage of multiple regression is that it considers both the individual and the combined influence of the predictor variables. Figure 4.5 shows a visual diagram of the individual predictors of quiz grades. The numbers along each line are known as regression coefficients, or beta weights. These values are very similar to correlation coefficients but differ in an important way: They represent the effects of each predictor variable while controlling for the effects of all the other predictors. That is, the value of b = 0.21 linking hours studied with quiz grades is the independent contribution of hours studied, controlling for SAT scores and previous classes. If we compare the size of these regression coefficients, we see that, in fact, hours spent studying is still the largest predictor of quiz grades (b = 0.21), compared to both SAT scores (b = 0.14) and previous classes (b = 0.19).

Even if individual variables only have a small influence, they can add up to a larger combined influence. So, if we
were to analyze the predictors of quiz grades in this study, we would find a combined multiple correlation
coefficient of r = 0.34. The multiple correlation coefficient represents the combined association between the
outcome variable and the full set of predictor variables. Note that in this case, the combined r of 0.34 is larger than
any of the individual correlations in Table 4.2, which ranged from 0.14 to 0.25. These numbers mean that we are
better able to predict quiz grades from examining all three variables than we are from examining any single variable.
Or, as the saying goes, the whole is greater than the sum of its parts.
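
To make the logic concrete, here is a minimal computational sketch (written in Python with NumPy, which is not part of this chapter's materials) that starts from the correlations in Table 4.2 and recovers both the beta weights shown in Figure 4.5 and the multiple correlation of r = 0.34. It is offered only as an illustration; in practice, statistical software works from the raw data rather than from a rounded correlation table, so results can differ slightly.

    import numpy as np

    # Correlations from Table 4.2; predictor order: SAT score, previous classes, hours studied.
    R_xx = np.array([[1.00, 0.02, -0.02],    # correlations among the predictors
                     [0.02, 1.00,  0.24],
                     [-0.02, 0.24, 1.00]])
    r_xy = np.array([0.14, 0.24, 0.25])      # correlation of each predictor with quiz grade

    # Standardized regression coefficients (beta weights) solve the system R_xx * b = r_xy,
    # which is how each predictor's contribution is adjusted for the other predictors.
    betas = np.linalg.solve(R_xx, r_xy)
    print(np.round(betas, 2))                # [0.14 0.19 0.21]

    # The squared multiple correlation is the sum of beta_i * r_i; its square root is R.
    multiple_r = np.sqrt(betas @ r_xy)
    print(round(multiple_r, 2))              # 0.34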

Figure 4.5: Predictors of quiz grades

Multiple regression is an incredibly useful and powerful analytic approach, but it can also be a difficult concept to
grasp. Before moving on, we will revisit the concept in the form of an analogy. Imagine someone has just eaten the
most delicious hamburger of his life and is determined to understand what makes it so good. Many things contribute
to the taste of the hamburger: the quality of the meat, the type and amount of cheese, the freshness of the bun,
perhaps the smoked chili peppers layered on top. If the diner were to approach this investigation using multiple
regression, he would be able to distinguish the influence of each variable (how important is the cheese compared to
the smoked peppers?) as well as take into account the full set of ingredients (does the freshness of the bun really
matter when the other elements taste so good?). Ultimately, the individual would be armed with the knowledge of
which elements are most important in crafting the perfect hamburger and would understand more about the perfect
hamburger than if he had examined each ingredient in isolation.

Chi-Square Analyses

Both correlations and regressions are well suited to testing hypotheses about prediction, as long as we can
demonstrate a linear relationship between two variables. Linear relationships, however, require that variables be
measured on one of the quantitative scales, that is, ordinal, interval, or ratio scales (see Section 2.3 for a review).
What if we want to test an association between nominal, or categorical, variables? In these cases, we need an
alternative statistic called the chi-square statistic, which determines whether two nominal variables are
independent from or related to one another. Chi-square is often abbreviated with the symbol χ², which shows the
Greek letter chi with the superscript 2 for squared. (This statistic is also referred to as the chi-square test for
independence—a slightly longer but more descriptive synonym.)

The idea behind this test is similar to that of the correlation coefficient. If two variables are independent, then
knowing the value of one variable does not tell us anything about the value of the other variable. As we will see in
the example below, a larger chi-square reflects a larger deviation from what we would expect by chance and is thus
an index of statistical significance.

Imagine that we want to know whether people in rural or urban areas are more likely to support a sales-tax
increase. We can easily speculate why either group might be more likely to do so—perhaps people living in cities are
more politically liberal or perhaps people living in small towns are better able to see benefits of higher local taxes.
So, we might survey a sample of 100 people, asking them to indicate both their location (rural or urban) and their
support for a sales-tax proposal. The survey produces the following results (in Table 4.3), presented in a
contingency table, which displays the number of individuals in each combination of our nominal variables. Notice
that we have more urban than rural residents, reflecting the higher population density in cities.

Table 4.3: Chi-square example: support for a sales-tax increase

                   Rural    Urban    Total

Support               10       45       55

Don't Support         30       15       45

Total                 40       60      100

But, as it turns out, the raw numbers are less important than the ratios within each group. The chi-square
calculation works by first considering what each cell in the table would look like if there were no relationship at all
(i.e., under the null hypothesis), and then determining how much the data differ from that reference point.

In this example, our final chi-square value is 24.24; this represents the total difference across the table between
actual and expected data. The larger this number is, the more our observed data differ from the expected
frequencies, and the more our variables relate to one another. In the current example, this means we can predict a
person’s support for a sales-tax increase based on where he or she lives, which is consistent with our initial
hypothesis.
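
As a quick check on this arithmetic, the sketch below (Python with NumPy and SciPy, again not part of the chapter's materials) builds the expected counts for Table 4.3 from the row and column totals, sums the squared deviations, and then cross-checks the result and the 3.84 critical value discussed next.

    import numpy as np
    from scipy import stats

    # Observed counts from Table 4.3 (rows: support, don't support; columns: rural, urban).
    observed = np.array([[10, 45],
                         [30, 15]])

    # Expected count for each cell if location and support were unrelated:
    # (row total * column total) / overall N.
    row_totals = observed.sum(axis=1, keepdims=True)     # 55 and 45
    col_totals = observed.sum(axis=0, keepdims=True)     # 40 and 60
    expected = row_totals * col_totals / observed.sum()  # [[22, 33], [18, 27]]

    # Chi-square: sum over all cells of (observed - expected)^2 / expected.
    chi_square = ((observed - expected) ** 2 / expected).sum()
    print(round(chi_square, 2))                          # 24.24

    # Cross-check with SciPy (correction=False gives the same uncorrected statistic),
    # and look up the 95% critical value for a 2 x 2 table, which has 1 degree of freedom.
    chi2, p, dof, exp = stats.chi2_contingency(observed, correction=False)
    print(round(chi2, 2), dof, round(stats.chi2.ppf(0.95, df=1), 2))   # 24.24 1 3.84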

Still, how do we know if our value of 24.24 is meaningful? As with the other statistical tests we have discussed,
determining the significance requires looking up the result in a critical-value table to assess whether the calculated
value is above threshold. In this case, the critical value for a chi-square based on a 2 × 2 table (which has one degree
of freedom) is 3.84 at the 95% confidence level, so we can feel confident in our value of 24.24, which is more than
six times the threshold value.

However, unlike correlation and regression coefficients, our chi-square results cannot tell us anything about the
direction or magnitude of the relationship. A larger chi-square reflects a larger deviation from what we would
expect by chance and is thus an index of statistical significance. To interpret the patterns of our data, we need to
visually inspect the numbers in our data table. Better yet, we can create a bar graph like we did in Chapter 3 to
display these frequencies visually.

As Figure 4.6 shows, the cell frequencies suggest a fairly clear interpretation: People who live in urban settings are
much more likely than people who live in rural settings to support a sales-tax increase. In fact, urban residents
support the increase by a 3-to-1 margin, while rural residents oppose the increase by a 3-to-1 margin.

Figure 4.6: Graph of chi-square results
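
As a rough illustration only (Python with NumPy and Matplotlib, not part of the chapter's materials), the sketch below converts the Table 4.3 counts into within-group proportions, which makes the 3-to-1 split visible, and draws a simple grouped bar chart in the spirit of Figure 4.6.

    import numpy as np
    import matplotlib.pyplot as plt

    # Observed counts from Table 4.3 (rows: support, don't support; columns: rural, urban).
    observed = np.array([[10, 45],
                         [30, 15]])

    # Proportion of each location that supports the increase: 10/40 = 0.25 rural, 45/60 = 0.75 urban.
    support_rate = observed[0] / observed.sum(axis=0)
    print(support_rate)                        # [0.25 0.75]

    # A grouped bar chart of the raw frequencies.
    x = np.arange(2)                           # positions for Rural and Urban
    plt.bar(x - 0.2, observed[0], width=0.4, label="Support")
    plt.bar(x + 0.2, observed[1], width=0.4, label="Don't support")
    plt.xticks(x, ["Rural", "Urban"])
    plt.ylabel("Number of respondents")
    plt.legend()
    plt.show()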

Research: Thinking Critically

Self-Esteem in Youth and Early Adulthood

Follow the link below to read a press release from the American Psychological Association, describing recent
research on self-esteem during adolescence. This study, by a group of Swiss researchers, challenges some of
our popular assumptions about gender differences in self-esteem. As you read the article, consider what you
have learned so far about the research process, and then respond to the questions below.

http://www.apa.org/news/press/releases/2011/07/youth-self-esteem.aspx

Think About It:

1. Why is self-esteem a good topic to study using survey research methods? Does using a survey to
study self-esteem present any weaknesses?

2. What type of sampling was used in this study? Was this an appropriate strategy?

3. What type of data analysis discussed in this chapter is appropriate to understanding the influence of
multiple variables (mastery, health, income) on self-esteem?


Summary and Resources

Chapter Summary
This chapter has covered the process of survey research from conceptualization through analysis. We first discussed
the types of research questions that are best suited to survey research—essentially, those that can be answered
based on people’s observations of their own behavior. Survey research can involve either verbal reports (i.e.,
interviews) or written reports (i.e., questionnaires). In both cases, surveys are distinguished by their reliance on
people’s self-reports of their attitudes, feelings, and behaviors.

This chapter covered several key points for writing survey items. The key takeaway of the five rules for better
questions is that questions should be written as clearly and unambiguously as possible. This helps to minimize the
error variance that might result from participants imposing their own guesses and interpretations on the material.
When designing survey items, researchers also have a broad choice between open-ended and fixed-format
responses. The former provide richer and more extensive data but are harder to score and code; the latter are easier
to code but can constrain people’s responses to a researcher’s choice of categories. If and when researchers settle on
a fixed-format response, they have another set of decisions to make regarding the response scaling, labels, and
general format.

Once researchers have constructed the scale, it is time to begin data collection. This chapter discussed the concept of
sampling, or choosing a portion of the population to use for a study. Broadly speaking, sampling can be either
“probability” or “nonprobability,” depending on whether researchers have a known population size from which they
sample randomly. Probability sampling is more likely to result in a representative sample, but this approach is not
possible in all studies. In fact, a significant proportion of psychology research studies use a form of nonprobability
sampling called convenience sampling, meaning that the sample consists of those who show up for the study.

Finally, this chapter covered three approaches to analyzing survey data and testing hypotheses about prediction. The
first, correlational analysis, is a very popular way to analyze survey data. The correlation is a statistical test that
assesses the linear relationship between two variables. The stronger the correlation between variables, the more we
can accurately predict one based on knowing the other. Second, regression analyses allow us to expand our
investigations into multiple predictors. Multiple regression offers the advantage of considering both the individual
and the combined influence of the predictor variables. However, both correlation and regression require the
variables to be quantitative—that is, measured on an ordinal, interval, or ratio scale. In cases where our survey
produces nominal or categorical data, we use an alternative called the chi-square statistic, which determines
whether two nominal variables are independent or related. The chi-square works by examining the extent to which
our observed data deviate from the pattern we would expect if the variables were unrelated.

The common thread in all these analyses is that while they measure the association between variables, they do not
tell us anything about the causal relationship between them. To make causal statements, we have to conduct
experiments, which the next chapter will discuss.

Key Terms

anchors

bipolar scale

branching schedule

chi-square statistic

cluster sampling

contingency table

convenience sampling

correlation

correlation coefficient

double-barreled question

fixed-format response

forced choice

interview

interview schedule

leading question

Likert scale

linear schedule

margin of error

multiple-choice format

multiple correlation coefficient

multiple regression

nonprobability sampling

open-ended response

pilot testing

population

probability sampling

questionnaire

rating scale

regression coefficients (beta weights)

sampling bias

sampling error

sampling frame

self-reports

simple random sampling

snowball sampling

social desirability

stratified random sampling

survey research

true/false format

unipolar scale

Chapter 4 Flashcards

Apply Your Knowledge
1. For each of the following poorly written questionnaire items, identify the major problem and then rewrite it so that the problem is resolved.
a. How much do you like cats and ponies?

main problem:
better item:

b. Do you think that John McCain’s complete lack of personality proved that he would have been a
terrible president?

main problem:
better item:

c. Do you dislike not playing basketball?

main problem:
better item:

d. Do you support SB 1070?

main problem:
better item:

e. How often do you take drugs?

main problem:
better item:

2. Dr. Truxillo is interested in Arizona residents’ thoughts and feelings about global warming. For each of the
following examples, identify the sampling method used by her research assistants.

a. Alejandra sets up a table in the mall and hands a survey to people who approach her.
b. Catherine randomly chooses five cities, then chooses three neighborhoods in each, then randomly samples 5,000 households for a phone survey.
c. Isaiah starts with a list of the entire population of Arizona and selects participants by dialing random phone numbers.
d. Anna obtains the master list from Isaiah and divides the population according to education level. She then randomly chooses 500 high school dropouts, 500 college graduates, and 500 people with some postgraduate education.

3. Based on each of the following study descriptions, choose whether the best analysis would be a correlation,
a multiple regression, or a chi-square.

a. Ahmad is interested in the relationship between annual income and self-reported happiness.
b. Sheila is interested in whether some ethnic groups are more likely to use counseling services (a yes-or-no question).
c. Angela is interested in knowing the best predictors of recovery from depression, comparing the influence of drugs, therapy, and family resources.
d. Kartik is interested in whether high school dropouts or college graduates are more likely to vaccinate their children.
e. Nicole is interested in understanding the best predictors of weight loss.
f. Isabella is interested in the relationship between self-esteem and prejudice.

Critical Thinking Questions
1. In survey research, explain the trade-off between the “richness” of people’s responses and the ease of analyzing their responses.
2. When conducting interviews, the researcher has a personal interaction with the subject. Why is this both good and bad?
3. What are some of the new challenges in conducting surveys over the Internet? On mobile devices?
4. Explain the compromises between confidence level and research costs. When might researchers be willing to accept more error in their findings?
