Experimental Design and Validity

Review articles and use the template to demonstrate each of the four experimental designs (reversal, multiple baseline, changing criterion, and alternating treatment), and discuss the strengths and limitations, as well as all forms of validity, for each experimental design.


Reversal Design

From the articles in the article bank provided by your instructor, choose one that demonstrates reversal design and complete the following.

APA citation

Full APA citation here.

Strengths

1. Strength of reversal design

2. Another strength of reversal design

Limitations

1. Limitation of reversal design

2. Another limitation of reversal design

External Validity

First explain what external validity is. Then explain how external validity was present or absent with support from the article.

Internal Validity

First explain what internal validity is. Then explain how internal validity was present or absent with support from the article.

Social Validity

First explain what social validity is. Then explain how social validity was present or absent with support from the article.

Multiple Baseline Design

From the articles in the article bank provided by your instructor, choose one that demonstrates multiple baseline design and complete the following.

APA citation

Full APA citation here.

Strengths

1. Strength of multiple baseline design

2. Another strength of multiple baseline design

Limitations

1. Limitation of multiple baseline design

2. Another limitation of multiple baseline design

External Validity

First explain what external validity is. Then explain how external validity was present or absent with support from the article.

Internal Validity

First explain what internal validity is. Then explain how internal validity was present or absent with support from the article.

Social Validity

First explain what social validity is. Then explain how social validity was present or absent with support from the article.

Changing Criterion Design

From the articles in the article bank provided by your instructor, choose one that demonstrates changing criterion design and complete the following.

APA citation

Full APA citation here.

Strengths

1. Strength of changing criterion design

2. Another strength of changing criterion design

Limitations

1. Limitation of changing criterion design

2. Another limitation of changing criterion design

External Validity

First explain what external validity is. Then explain how external validity was present or absent with support from the article.

Internal Validity

First explain what internal validity is. Then explain how internal validity was present or absent with support from the article.

Social Validity

First explain what social validity is. Then explain how social validity was present or absent with support from the article.

 

Alternating Treatment Design

From the articles in the article bank provided by your instructor, choose one that demonstrates alternating treatment design and complete the following.

APA citation

Full APA citation here.

Strengths

1. Strength of alternating treatment design

2. Another strength of alternating treatment design

Limitations

1. Limitation of alternating treatment design

2. Another limitation of alternating treatment design

External Validity

First explain what external validity is. Then explain how external validity was present or absent with support from the article.

Internal Validity

First explain what internal validity is. Then explain how internal validity was present or absent with support from the article.

Social Validity

First explain what social validity is. Then explain how social validity was present or absent with support from the article.

 

RESEARCH ARTICLE

A systematic review of social-validity assessments in the
Journal of Applied Behavior Analysis: 2010–2020

Erin S. Leif | Nadine Kelenc-Gasior | Bradley S. Bloomfield | Brett Furlonger | Russell A. Fox

Faculty of Education, Monash University,
Clayton, Victoria, Australia

Correspondence
Erin S. Leif, Faculty of Education, Monash
University, 19 Ancora Imparo Way,
Clayton VIC 3131, Australia.
Email: erin.leif@monash.edu

Editor-in-Chief: John Borrero
Handling Editor: Timothy Vollmer

Abstract
We conducted a systematic review of studies published in the Journal of Applied
Behavior Analysis between 2010 and 2020 to identify reports of social validity. A
total of 160 studies (17.60%) published during this time included a measure of
social validity. For each study, we extracted data on (a) the dimensions of social
validity, (b) the methods used for collecting social-validity data, (c) the respon-
dents, and (d) when social-validity data were collected. Most social-validity
assessments measured the acceptability of intervention procedures and outcomes,
with fewer evaluating goals. The most common method for collecting social valid-
ity data was Likert-type rating scales, followed by non-Likert-type questionnaires.
In most studies, the direct recipients of the intervention provided feedback on
social validity. Social-validity assessment data were often collected at the conclusion
of the study. We provide examples of social-validity measurement methods, discuss
their strengths and limitations, and provide recommendations for improving the
future collection and reporting of social-validity data.

KEYWORDS
consumer satisfaction, intervention acceptability, intervention preference, social validity

Social validity is defined as a consumer’s satisfaction with
the goals, procedures, and outcomes of intervention pro-
grams (Wolf, 1978). Social-validity assessments of
behavior-analytic interventions provide participants and
relevant stakeholders with the opportunity to give feed-
back and express their satisfaction with these three
dimensions (Wolf, 1978). These assessments may also
allow individuals to express their preferences for interven-
tions, which might enhance participation and outcomes
(Hanley, 2010). One of the criticisms, however, of pub-
lished research on behavior-analytic interventions has
been the lack of social-validity measurement, as studies
have instead predominantly focused on the efficacy and
effectiveness of interventions and practices (Callahan
et al., 2017; Carr et al., 1999; Ferguson et al., 2019;
Huntington et al., 2023). There have been recent calls to
improve the collection and reporting of information about
the degree to which the direct recipients of behavior-
analytic interventions view the procedures used as part of

these interventions as acceptable and preferred and the
outcomes meaningful (Common & Lane, 2017).

Wolf (1978) noted that the construct of social valid-
ity consists of three dimensions: (a) the goals of the
intervention, or what behaviors the intervention is
intended to change; (b) the procedures used during interven-
tion; and (c) the degree to which intervention effects are
meaningful and desirable, including those intended and
unpredicted. This conceptualization has been the primary
guide for the development of social-validity assessment
methods in the behavior-analytic research literature. Social
validity may be a critical variable in addressing the research-
to-practice gap, as interventions deemed impractical,
unacceptable, or harmful may not be adopted or applied
in real-world settings (Kazdin, 1977; Kern & Manz, 2004;
Leko, 2014; Lloyd & Heubusch, 1996). Assessing the
social validity of behavior-analytic interventions may also
support the sustainable implementation of evidence-based
interventions at a larger scale (Cook et al., 2013; Reimers
et al., 1987) and prevent the development and distribution
of interventions that are likely to be rejected by consumers
and the public (Schwartz & Baer, 1991).

Carr et al. (1999) reviewed research published in the
Journal of Applied Behavior Analysis (JABA) from 1968
to 1998 to identify the prevalence of social-validity mea-
sures. Two dimensions of social validity were assessed for
each study, intervention acceptability and intervention
outcomes. On average, during this 31-year period, mea-
sures of social validity related to intervention acceptabil-
ity and outcomes were reported in only 13% of published
studies. Carr et al. expressed concerns that failure to
report the outcomes of social-validity assessments may
prevent researchers and practitioners from identifying the
reasons that behavior-analytic interventions may be
rejected or discontinued by consumers. Additionally,
Carr et al. noted that failure to report the methods used
to gather social-validity data from various consumers
may prevent the development, refinement, and uptake of
these methods.

The methods used by Carr et al. (1999) were replicated
and extended by Ferguson et al. (2019) who identified the
prevalence and type of social-validity assessments published
in JABA between 1999 and 2016. Across this 17-year
period, only 12% of studies included a social-validity mea-
sure. The social validity of the intervention procedures and
outcomes were more likely to be reported than the social
validity of intervention goals. The authors noted that most
studies used a combination of rating scales, questionnaires,
and intervention choice to collect social-validity data. The
authors also reported that “other” forms of social-validity
measurement were used in 8% of studies, but they did not
provide examples of what these types of measurement
involved.

Other researchers have explored the prevalence and
type of social-validity assessment data published across a
range of journals. Snodgrass et al. (2018) systematically
reviewed reports of social validity published in six special
education journals. All single-case research design studies
published in these six journals between 2005 and 2018
were reviewed, with 26.8% (n = 115) reporting results of
a social-validity assessment. Of these 115 studies, 28
measured the social validity of the goals, procedures, and
outcomes of the intervention. For these 28 studies, ques-
tionnaires were the most common method for collecting
data (n = 20), the direct recipients of the intervention
most often provided data on social validity (n = 19), and
most social-validity assessments were administered at or
after the intervention concluded (n = 27). However, one
limitation of Snodgrass et al. was that the authors limited
their assessment of the methods, respondents, and times
to only those 28 studies that measured all three dimen-
sions of social validity. Additionally, the authors did not
include JABA in their sample of journals.

Most recently, Huntington et al. (2023) assessed social
validity across eight behavior-analytic journals between
2010 and 2020, including JABA. Huntington et al. found

47% of studies included in their review reported a measure
of social validity, with a large increase evident in 2019 and
2020. The authors highlighted the need for future research
to identify and describe methods used to collect social-
validity data, the participants who provide social-validity
data, and timing of social-validity assessments in behavior-
analytic journals. The collection and reporting of these data
might provide a clearer picture of how social validity has
been measured in studies published in JABA, assist in the
evaluation of the quality of the data collected, and provide
new insights into how to potentially improve the future
assessment of social validity. To this end, our purpose was
to systematically identify and appraise social-validity assess-
ments included in studies published in JABA between 2010
and 2020. For the studies included in this review, we sought
to identify (a) the dimensions of social validity assessed,
(b) the types of methods used to collect social-validity data,
(c) the individuals who provided social-validity data (the
respondents), and (d) the point at which social-validity
assessments were conducted. We provide illustrative exam-
ples of different ways to measure social validity and discuss
the strengths and potential limitations of different social-
validity assessments. Based on these data and examples, we
provide recommendations for potentially improving the col-
lection and reporting of social-validity data in behavior-
analytic research.

METHOD

A systematic literature review was undertaken to iden-
tify studies for inclusion in this report. Figure 1
includes a diagram of the study screening process.
Rather than conducting a keyword search of terms
related to social validity in various databases, the iden-
tification of relevant peer-reviewed studies for inclu-
sion in this review was undertaken by compiling and
systematically screening all studies published in JABA
from 2010 (Volume 43[1]) to 2020 (Volume 53[4]). All
studies were downloaded directly from the journal’s
website and independently reviewed. A total of 1,059
studies was published in JABA between 2010 and 2020.
The search focused on studies published from 2010
onward to allow us to systematically replicate and
extend the procedures described by Carr et al. (1999)
and Ferguson et al. (2019) within a more recent
10-year period. Additionally, as the purpose of the cur-
rent review was to provide a more in-depth analysis of
the characteristics of social-validity assessments pub-
lished in JABA, studies published in other journals
were not included in the analysis.

Initial study screening procedure

To be included in the current review, the study needed to
include at least one human or nonhuman participant.
The following were excluded during the initial screening
process: technical reports, systematic reviews, meta-
analyses, brief reviews, book reviews, errata, announce-
ments, surveys, issue information, acknowledgments, and
reanalyses of previously published data sets. The methods
and results sections of all 1,059 studies were examined to
determine which studies fulfilled this inclusion criterion.
This resulted in the exclusion of 177 studies that did not
include at least one human or nonhuman participant.

Inclusion and exclusion criteria

The remaining 882 studies were reviewed a second time for
the presence or absence of at least one measure of social
validity. First, the following terms were typed into the elec-
tronic search bar of the downloaded PDF version of each
study: social validity, social validation, social acceptability,
intervention validity, intervention acceptability, consumer
satisfaction, satisfaction survey, interview, preference, or
choice. If this search returned a result, the study was
reviewed to locate any social-validity measure. If this search
did not yield any results, the methods, results, and discus-
sion section of the study were reviewed in full to determine
whether a social-validity measure was included. If the study
did not include a measure of social validity, it was excluded.
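
To illustrate the logic of this keyword screening step, the following Python sketch is a hypothetical reconstruction; the authors searched each downloaded PDF manually, and the function below simply applies the same list of terms to extracted study text.

    # Hypothetical sketch of the keyword screening step; the authors
    # searched each PDF manually rather than programmatically.
    SEARCH_TERMS = [
        "social validity", "social validation", "social acceptability",
        "intervention validity", "intervention acceptability",
        "consumer satisfaction", "satisfaction survey",
        "interview", "preference", "choice",
    ]

    def flag_for_full_review(study_text: str) -> bool:
        """Return True if any screening term appears in the study text."""
        text = study_text.lower()
        return any(term in text for term in SEARCH_TERMS)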

A study was included if it reported any qualitative or
quantitative data measuring the social significance of the
intervention goals, procedures, or outcomes (Wolf, 1978) or
if it included a measure of intervention preference (Hanley,
2010). All studies that included one or more measures of
social validity and reported the outcomes of the assessment
were retained. Of the 882 reviewed studies, 160 studies
reported one or more measures of social validity.

Dependent measures

Data were extracted for each of the 160 studies that
included a measure of social validity for the following
categories (and category variables): (a) the authors, (b) the
year of publication, (c) the dimension of social validity
measured (goals, procedures, or outcomes), (d) the specific
method that was used to collect social-validity data
(e.g., Likert-type rating scales, questionnaires, or inter-
views), (e) the person who provided the social-validity data
(e.g., parents, teachers, or participants), and (f) the specific
point(s) at which the social-validity data were collected
(e.g., before, during, or after intervention). The data col-
lected as part of this study can be found in the Additional
Supporting Information in the online version of this article
at the publisher’s website.
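
As a rough illustration of this per-study coding scheme, the record sketch below uses hypothetical field names that mirror categories (a) through (f); it is not the authors' actual spreadsheet layout.

    # Hypothetical record structure mirroring extraction categories (a)-(f);
    # field names are illustrative, not the authors' spreadsheet columns.
    from dataclasses import dataclass, field

    @dataclass
    class SocialValidityRecord:
        authors: str
        year: int
        dimensions: list = field(default_factory=list)    # e.g., ["procedures", "outcomes"]
        methods: list = field(default_factory=list)        # e.g., ["Likert-type rating scale"]
        respondents: list = field(default_factory=list)    # e.g., ["participants who received the intervention"]
        measurement_points: list = field(default_factory=list)  # e.g., ["after"]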

FIGURE 1. Flow diagram of the study screening process.


Dimensions of social validity

Table 1 provides a definition of each dimension of social
validity assessed in the current review. A study was scored
as reporting a measure of the social validity of the interven-
tion goals if formal measures were employed to assess
consumer acceptance of or agreement with the purpose
or purported goals of the intervention and the behaviors
targeted for change as part of the intervention. A study
was scored as reporting a measure of the social validity of
the intervention procedures if formal measures were
employed to assess consumer acceptance of, agreement
with, or preference for the tactics used to deliver the inter-
vention or to assess the consumer’s willingness to continue
with intervention. A study was scored as reporting an assess-
ment of the social validity of the intervention outcomes if

formal measures were used to assess consumer satisfaction
with, social importance of, or practical significance of the
intervention effects.

Social-validity assessment methods

Table 2 provides a definition of each method of social-
validity assessment included in the current review. Social-
validity assessment methods were defined as the specific
procedures used to collect data on measures of each dimen-
sion of social validity. Social-validity assessment methods
included (a) Likert-type rating scales, (b) non-Likert-type
questionnaires, (c) direct observations, (d) intervention pref-
erence or choice questions, (e) concurrent-chains interven-
tion preference assessments, or (f) interviews.

TABLE 1. Dimensions of social validity assessed (adapted from Wolf, 1978).

Intervention goals: Acceptance of or agreement with the purpose or purported goals of the intervention and the behaviors targeted for change (Are the specific behaviors selected for change and the reasons for behavior change important and valued?). Studies: 26 (16.25%).

Intervention procedures: Acceptance of, agreement with, or preference for the strategies and tactics used to deliver the intervention or willingness to continue with intervention (Are the specific intervention strategies used acceptable and preferred?). Studies: 144 (90%).

Intervention outcome: Satisfaction with, social importance of, or practical significance of the intervention effects (Are the outcomes associated with the intervention meaningful, including any unexpected outcomes?). Studies: 110 (68.75%).

TABLE 2. Social-validity assessment methods (adapted from Carter & Wheeler, 2019).

Likert-type rating scales: A scale that consists of a series of statements or items related to the goals of an intervention, intervention procedures, or outcomes of an intervention for which respondents are asked to indicate their level of agreement or disagreement with each statement. The scale typically ranges from "Strongly Disagree" to "Strongly Agree," with several intermediate response options. Studies: 129 (80.63%).

Non-Likert-type questionnaires: A survey or assessment tool that does not use the traditional Likert-type scale format for collecting responses. Questionnaires might include closed-ended response options, including multiple-choice or yes/no questions; visual-analogue scales; or open-ended questions about the intervention. Studies: 53 (33.13%).

Direct observations: In vivo or video-based observations in which observers watch intervention sessions and then provide feedback on the intervention, often using Likert-type rating scales or non-Likert-type questionnaires. Studies: 41 (25.63%).

Intervention preference or choice: Opportunities for people who are directly involved in the intervention (as recipients or interventionists) to provide feedback on which intervention they prefer or will continue to use following the study. However, the respondent does not experience the intervention after indicating their preference or choice. Studies: 17 (10.63%).

Concurrent-chains intervention preference assessments: Opportunities for people who are directly involved in the intervention (as recipients) to choose from available interventions by selecting a discriminative stimulus associated with that intervention and then experiencing their selected intervention following their selection. Studies: 15 (9.38%).

Interviews: A conversation facilitated by an interviewer who asks the respondent a range of questions to collect information about their opinion of, satisfaction with, or preference for the interventions' goals, procedures, and outcomes. Studies: 5 (3.13%).


Respondents

Table 3 provides a definition of each group of social-
validity assessment respondents included in the current
review. Respondents were defined as any person who was
formally invited by the researchers to participate in a
social-validity assessment and included (a) participants
who received the intervention; (b) participants who deliv-
ered the intervention; (c) parents or caregivers of the par-
ticipants who received the intervention but who did not
deliver the intervention; (d) educators, therapists, instruc-
tors, or other professionals who had a relationship with
the participants who received the intervention but who
did not deliver the intervention; and (e) individuals
who were not involved in the study and who did not have
a relationship with the participants but were invited by
the researchers to provide feedback on the study’s goals,
procedures, or outcomes.

Social-validity measurement points

Table 4 provides a definition of the different points at
which social-validity assessments were conducted in the

included studies. If social-validity data were collected prior
to the start of the intervention (e.g., by asking parents
about the acceptability of the intervention goals), it was
coded as “before.” If social-validity data were collected
during the implementation of the intervention (e.g., by
providing participants with a choice of which intervention
they would like to experience), it was coded as “during.” If
social-validity data were collected at the conclusion of
intervention, during maintenance or generalization phases,
or during follow-up sessions, it was coded as “after.” If
there was not enough information provided in the methods
section of the study to determine the point at which social-
validity data were collected, it was coded as “unclear.” If
social validity was assessed at more than one point
(e.g., before and after the study), it was coded as both
“before” and “after.” If social validity was assessed multi-
ple times at a single point (e.g., assessed three times after
the study), it was coded as “after” only one time.
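
A minimal sketch of this coding rule, assuming each study's collection points are recorded as simple strings, is shown below; repeated assessments at the same point collapse to a single code.

    # Minimal sketch of the coding rule described above: each distinct point
    # ("before", "during", "after", "unclear") is coded once per study, even
    # if social validity was assessed several times at that point.
    def code_measurement_points(collection_points):
        valid = {"before", "during", "after", "unclear"}
        return {p for p in collection_points if p in valid}

    # Assessed once before and three times after the study -> {"before", "after"}
    print(code_measurement_points(["before", "after", "after", "after"]))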

Data extraction procedures

To extract data, the first author read the methods and
results section for each included study. If tables were

TABLE 3. Social-validity respondents.

Participants who received the intervention: Consumers whose behavior was targeted for change through the delivery of the intervention (e.g., children, students, athletes, employees). Studies: 89 (55.63%).

Participants who delivered the intervention: Consumers who delivered the intervention but who were not members of the research team (e.g., parents, teachers, coaches, therapists). Studies: 41 (25.63%).

Parents/caregivers: Family members or primary caregivers for participants who received the intervention but who were not involved in the delivery of the intervention. Studies: 27 (16.88%).

Educators/therapists/instructors: Professionals who had a relationship with the participants who received the intervention but who were not involved in the delivery of the intervention. Studies: 25 (15.63%).

Individuals who were not involved in the study: Any individual who served as a respondent and provided social-validity data but who did not have a relationship with the participant who received the intervention and/or who was naïve to the purpose of the study. Studies: 35 (21.88%).

TABLE 4. Point at which social-validity data were collected.

Before: Prior to the start of the intervention. Studies: 15 (9.38%).

During: Any time during the delivery of the intervention, or when intervention sessions followed the collection of social-validity data and were informed by the social-validity data. Studies: 23 (13.75%).

After: After the conclusion of the intervention when no additional intervention sessions were planned or delivered, based on the data, or during maintenance and generalization or follow-up sessions. Studies: 133 (83.13%).

Unclear: The information provided in the methods section of the study was not detailed enough to permit the identification of the point at which social-validity data were collected. Studies: 13 (8.13%).


presented that included a list of specific questions asked
as part of the social-validity assessment, these were
reviewed as well. In some cases, authors provided an
example of social-validity data collection tools as part of
supplementary materials information. When provided,
supplementary materials were also reviewed. The pres-
ence or absence of each category variable was determined
by the presence of keywords in the text, tables, and/or
supplementary materials and the description of the
dimensions of social validity measured, methods used to
measure social validity, the respondents who provided
social-validity data, and the point(s) at which social-
validity data were collected as evident in the text of the
study. All data were entered into an author-created Excel
spreadsheet to facilitate data analysis (available upon
request). The percentage of studies that included each
variable was calculated by dividing the total number of
studies that measured each dimension by the total num-
ber of included studies (n = 160) and multiplying by 100.
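
For example, the calculation works out as in the short Python sketch below, using the dimension counts reported in Table 1.

    # Percentage calculation described above, using the dimension counts
    # reported in Table 1 (160 included studies).
    INCLUDED_STUDIES = 160
    dimension_counts = {"goals": 26, "procedures": 144, "outcomes": 110}

    percentages = {
        dimension: count / INCLUDED_STUDIES * 100
        for dimension, count in dimension_counts.items()
    }
    print(percentages)  # {'goals': 16.25, 'procedures': 90.0, 'outcomes': 68.75}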

Interrater reliability

Interrater reliability data were collected at four points.
First, interrater reliability data were collected for the
total number of studies published in JABA from 2010 to
2020. Two independent raters (the third and fifth
authors) reviewed all studies in all issues published in
four randomly selected years of publication (2011, 2015,
2016, and 2020; 44% of total studies). Years were selected
at random using an online random number generator.
An agreement was defined as the primary and indepen-
dent rater calculating the same total number of studies
included in each issue. A disagreement was defined as
any discrepancy in the total number of studies per issue.
Interrater reliability was calculated for each study by
adding the total agreements and dividing by the sum of
the agreements plus disagreements and multiplying by
100. Interrater reliability for the number of total studies
published in JABA was 100%.
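
The same agreement formula is used throughout this subsection; a minimal sketch follows, with invented counts rather than the authors' data.

    # Agreement formula used throughout this subsection:
    # agreements / (agreements + disagreements) * 100.
    def interrater_reliability(agreements, disagreements):
        return agreements / (agreements + disagreements) * 100

    # Invented counts for illustration only.
    print(interrater_reliability(48, 2))  # 96.0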

Second, interrater reliability data were collected for
the initial screening procedure (N = 1,059). The two
independent raters reviewed all studies in all issues pub-
lished in the same four randomly selected years of publi-
cation (2011, 2015, 2016, and 2020; 44% of total studies).
An agreement was defined as the primary and indepen-
dent rater calculating the same total number of studies
that included at least one human or nonhuman partici-
pant for each issue in each year. A disagreement was
defined as any discrepancy in the total number of studies
identified as including at least one human or nonhuman
participant per issue per year. Interrater reliability was
calculated for each study by adding the total agreements
and dividing by the sum of the agreements plus disagree-
ments and multiplying by 100. Total agreement was cal-
culated by averaging the interrater reliability score across
years. Interrater reliability for the number of total studies

included following the initial screening process was
96.50%. Any discrepancies (n = 16) were reviewed by the
first author and one of the independent raters and
resolved.

Third, the two independent raters applied the inclu-
sion and exclusion criteria to the studies retained follow-
ing initial screening (n = 882). The independent raters
reviewed all studies in the same four randomly selected
years of publication to determine whether the study
included a measure of social validity. If the study
included a measure of social validity, the independent
raters recorded the authors, title, year, and issue in an
Excel workbook that was identical to that used by the
primary rater. An agreement was defined as the primary
and independent rater selecting the same authors, title,
year, and issue. A disagreement was defined as any dis-
crepancy between the studies identified by the two raters.
Interrater reliability was calculated for each year by add-
ing the total agreements and dividing by the sum of the
agreements plus disagreements for that year and multi-
plying by 100. Total agreement was calculated by averag-
ing the interrater reliability score across years. Total
interrater reliability for the inclusion procedures was
96%. Any discrepancies (n = 4) were reviewed by the first
author and one of the independent raters and resolved.

Finally, the second author independently reviewed and
coded 84.38% (n = 135) of the included studies. The inde-
pendent rater followed the same coding procedures
described above. Data entered in the Excel workbook by
the primary rater were then compared with those entered by
the independent rater. An agreement was defined as the pri-
mary and independent rater indicating the presence or
absence of each category variable (dimension, method,
respondent, and point of collection). A disagreement was
defined as any discrepancy between the coding of each cate-
gory variable between the two raters. Interrater reliability
was calculated individually for each study by adding the
total agreements and dividing by the sum of the agreements
plus disagreements and multiplying by 100. Total agreement, calculated by averaging the interrater reliability scores across studies, was 95.60% (range: 92%–100%).
Any discrepancies (n = 6) were reviewed by the first author
and one of the independent raters and resolved.

RESULTS AND DISCUSSION

Prevalence of social validity

Figure 2 depicts the percentage of total studies published
per year that included a measure of social validity. Of the
882 studies retained for review, 160 (18.14%) included
measures of social validity. Between 2010 and 2019, the
total number of studies published in JABA each year
(and retained for inclusion in the current review) ranged
from 57 (in 2017) to 99 (in 2011). The percentage of these
studies reporting results of social-validity data was stable,
ranging between 10% and 20%. A notable exception was
observed in 2017, when 28% of included studies included
a measure of social validity. Interestingly, a large increase
in the percentage of studies including a measure of social
validity was observed in 2020. In 2020, 137 studies were
published and included for review in the current study,
and of these, 32.10% included a measure of social
validity.

These findings extend those presented by Carr et al.
(1999) and Ferguson et al. (2019). Between 1968 and 1998,
Carr et al. identified an increasing trend in the number of
studies published in JABA, particularly between the mid-
1970s and the mid-1980s. Between the mid-1980s and 1998,
Carr et al. reported that approximately 25% of studies
included a measure of social validity. Between 1999 and
2016, Ferguson et al. identified 1,209 studies that included
at least one participant. Of these studies, only 141 (12%)
included a measure of social validity, a notable decrease rel-
ative to the findings of Carr et al. However, Ferguson et al.
noted a variable but increasing trend in the percentage of
studies including a measure of social validity, primarily
between 2005 and 2016.

In the current study, we found that between 2010 and
2020, on average, 18.14% of studies published in JABA
that included at least one participant included a measure
of social validity. These data suggest that publication of
social-validity assessment data in JABA is increasing. As
mentioned above, we found a marked increase in the
publication of social-validity assessment data in 2020,
with 32.10% of studies including a measure of social
validity. These findings replicate those reported by Hun-
tington et al. (2023), who also showed a substantial

increase in the number of studies including a measure of
social validity in behavior-analytic journals in 2019
and 2020.

Dimensions of social validity

Table 1 depicts the number and percentage of included
studies (n = 160) reporting a measure of each dimension
of social validity. Assessing consumer acceptance of the
procedures used as part of the intervention was the most
common dimension of social validity measured, with 90%
of included studies reporting a measure of acceptability or
satisfaction with procedures used. A measure of consumer
satisfaction with intervention outcomes was included in
68.75% of included studies, whereas measures of consumer
acceptance or agreement with the goals of intervention were
reported less often, in only 16.25% of included studies.

These findings differ from those reported by Carr et al.
(1999), who identified between 0% and 30% of studies as
including a measure of the acceptability of the procedures
used. This value increased to a high of nearly 50% when
data on the percentage of studies including a measure of
the acceptability of the procedures or perceptions of inter-
vention outcomes were also included. However, Carr et al.
did not include data on the percentage of studies including
a measure of the social validity of intervention goals.
Although the findings of the current study differ from
those of Carr et al., they are consistent with those reported
by Ferguson et al. (2019), who found that 85% of studies
included a measure of the social validity of the interven-
tion procedures, 60% included a measure of the social

FIGURE 2. Total included studies and percentage of studies reporting social-validity data in the Journal of Applied Behavior Analysis from 2010 to 2020.


validity of the outcomes, and only 12% included a measure
of the social validity of the intervention goals.

In the current study, we found that the acceptability of
intervention goals was often assessed concurrently with the
acceptability of intervention procedures using Likert-type
rating scales. However, some authors conducted observa-
tions of behavior prior to implementing any interventions
to determine the overall goals for the intervention. In one
noteworthy example, Mann and Karsten (2020) asked col-
lege students to model different types of typical conversa-
tion behaviors and recorded data on the topography of
these behaviors. These behaviors were then used as a nor-
mative sample to develop socially valid intervention goals
for participants. Because procedures designed to assess the
social validity of intervention goals are published less fre-
quently, it is possible that behavior analysts are less familiar
with how to design these types of social-validity assess-
ments. Alternatively, it is possible that behavior analysts
develop individualized and socially valid goals and proce-
dures for intervention through conversations with partici-
pants and other stakeholders prior to intervention during
the process of gaining informed consent. However, we
found that information about these types of informal mea-
sures of social validity was not commonly published.

Social-validity methods

Table 2 depicts the number and percentage of studies that
used various methods to assess social validity.

Likert-type rating scales

Likert-type rating scales were the most frequently used
method, accounting for 80.63% of the total number of
studies. In these studies, researchers developed a set of
statements and asked respondents to select from a set
of response options to indicate how much they agreed
with the statement. For example, DiGennaro Reed et al.
(2010) evaluated a video-modeling intervention to
improve the procedural fidelity of behavioral interven-
tions delivered by teachers. At the conclusion of the
study, the teachers were invited to respond to 15 Likert-
type questions adapted from the Intervention Rating
Profile-15 (Martens et al., 1985) to indicate the accept-
ability of the video-modeling intervention. The teachers
read each statement before selecting a response option
on a Likert-type scale ranging from 1 (strongly disagree)
to 6 (strongly agree), with higher scores representing
higher intervention acceptability.

Other authors have used Likert-type rating scales to
assess all three dimensions of social validity. Austin and
Bevan (2011) evaluated the effects of a differential-rein-
forcement-of-low-rates-of-behavior intervention on the
rate of attention-seeking behavior displayed by three stu-
dents. At the end of the study, the researchers invited the

teacher to respond to questions about whether students
asked for attention too often prior to intervention (assess-
ment of the social validity of intervention goals), whether
the intervention was easy to implement, whether it
could be easily integrated into classroom routines,
whether she would continue to use it (assessment of the
social validity of intervention procedures), and whether
she thought the children worked more independently
and completed more work when the intervention was
in place (assessment of the social validity of interven-
tion outcomes). Data were collected using a 5-point
Likert-type scale (strongly disagree to strongly agree),
with higher scores indicating higher levels of accept-
ability. The authors also adapted the Likert-type rating
scale to collect social-validity data with students.
The students were invited to indicate whether they
liked the intervention, liked earning points exchanged
for reinforcers, and wanted their teacher to keep using
the intervention. Students circled faces on a 3-point
smiling-faces scale for each question.

Likert-type rating scales are a type of closed-ended
social-validity assessment in that they allow participants
to select a single response that represents their answer to
a question or agreement with a statement. Likert-type
social-validity assessments may be relatively easy and fast
to implement and may allow for the quantitative analysis
of social-validity data and comparison of data across
multiple respondents. For example, mean ratings for each
participant can be compared across participants or for
the same participant over time. A unique example of a
pre- and postintervention measure of social validity was
provided by Mancuso and Miltenberger (2016), who
assessed participants’ perceptions of their public speaking
skills before and after a habit reversal intervention. Par-
ticipants rated their confidence and comfort with public
speaking before and after intervention, and mean scores
were compared to determine whether participants had
more positive views of their public speaking postinterven-
tion. These data supplemented direct observations of the
participants’ public speaking skills. However, a notewor-
thy limitation of Likert-type rating scales is that they do
not allow respondents to expand on the reasons for their
response selections and thus may not help researchers
identify why interventions may or may not be viewed as
acceptable or preferred by consumers.
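
A simple sketch of this kind of pre/post comparison of mean Likert-type ratings is shown below; the rating values are invented for illustration and are not data from Mancuso and Miltenberger (2016).

    # Pre/post comparison of mean Likert-type ratings; values are invented.
    pre_ratings = [2, 3, 2, 3, 2]    # e.g., confidence items before intervention
    post_ratings = [4, 5, 4, 4, 5]   # the same items after intervention

    pre_mean = sum(pre_ratings) / len(pre_ratings)
    post_mean = sum(post_ratings) / len(post_ratings)
    print(f"Mean rating before: {pre_mean:.2f}, after: {post_mean:.2f}")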

Non-Likert-type questionnaires

Non-Likert-type questionnaires were the second most
common method of assessing social validity and were used
in 33.13% of included studies. For example, Raiff and
Dallery (2010) invited participants to complete a treatment-
acceptability questionnaire, using a 100-mm visual analogue
scale, to rate the ease of use, enjoyment, convenience, help-
fulness, and effectiveness of an Internet-based contingency-
management program for the management of Type
1 diabetes. Higher numbers on the visual analogue scale
were indicative of more favorable perceptions of the inter-
vention. Jones et al. (2019) developed open-ended questions
to assess participant perceptions of the acceptability and
outcomes of an interdependent group contingency imple-
mented in a classroom setting to reduce students’ use of
cell phones during instructional periods. Students who
participated in the intervention were invited to answer
three questions following the intervention: (1) What did
you think about not having your phones during class
time? (2) Did you feel that you could focus better during
class without your phones? and (3) What was your reaction
when other students caused the rest of the class to lose their
10 min of free time? Interestingly, the students who partici-
pated conveyed an unfavorable view of the interdependent
group-contingency procedures because they were discour-
aged from using their cell phones at school (a measure of
the social validity of intervention acceptability). However,
these same participants reported that they were satisfied
with intervention outcomes because they were better able to
sustain their focus in the classroom (a measure of the social
validity of the intervention outcome). The use of open-ended
questions allowed researchers to gain more information
about participants’ opinions, which may be helpful in inter-
preting and understanding the reason for discrepant or unfa-
vorable ratings on closed-ended social-validity assessments.

Direct observations

Direct observations were the third most common method
used by researchers to gather social-validity data. Of the
total number of studies reporting a measure of social valid-
ity, 25.63% included a direct observation measure. For
example, before initiating an intervention to improve the
safety skills of employees working in a manufacturing
setting, Abellon and Wilder (2014) collected data on the
workplace behavior displayed by one employee whom the
supervisor identified as displaying exemplary safety skills.
These data were used to establish socially valid intervention
goals (i.e., a performance standard) for the participating
employees. Similarly, Stokes et al. (2010) evaluated an inter-
vention to improve the pass-blocking skills of high school
American-rules football players. Prior to intervention, the
researchers watched video clips of the top-performing
players from the previous year and measured their correct
performance using a 10-step task analysis. The researchers
used data collected from these videos to establish perfor-
mance goals for players receiving the intervention. In both
studies, the behaviors displayed by participants during base-
line and intervention were compared with these normative
samples to determine how much improvement was made
and when performance goals were achieved.

In some studies that used direct observation measures,
observers were asked to watch video clips of different
interventions and then rate the acceptability of the proce-
dures used. For example, Gibbs et al. (2018) asked the

parents of children who received an intervention to reduce
vocal stereotypy to watch videos of two different interven-
tions: response interruption and redirection (RIRD) alone
or free access to competing stimuli + RIRD. Using an
adapted version of the Treatment Evaluation Inventory–
Short Form (TEI-SF; Kelley et al., 1989), parents were
asked to respond to statements about the acceptability of
the procedures used in each condition via a Likert-type
rating scale (1 = strongly disagree; 5 = strongly agree).
Example statements included “I find this intervention to
be an acceptable way of dealing with my child’s vocal ste-
reotypy” and “I would be willing to use this intervention at
home to address my child’s vocal stereotypy.”

In other studies, naïve observers were asked to view video
samples of participants pre- and postintervention to judge
whether the outcomes were meaningful or consistent with
behavioral norms or performance standards. For example,
Grosberg and Charlop (2017) asked 20 mothers of school-
aged children who were unfamiliar with both the purpose of
the study and the participants to view video clips of children
collected during baseline and intervention and answer ques-
tions about the children’s play and social skills. After watch-
ing each clip, the mothers responded to questions such as
“Does the child demonstrate an interest in having a conver-
sation with his/her peers?” and “Would my child want to
talk with this peer?” using a 7-point Likert-type rating scale
(with 1 being strongly disagree, 4 being neutral, and 7 being
strongly agree). The researchers displayed differences
between ratings of participants’ pre- and postintervention
play and social behavior, which were also compared through
paired samples t tests conducted with numerical data col-
lected from these questionnaire items. This method of data
analysis allowed the researchers to determine the statistical
significance (as well as practical significance) of changes in
ratings related to the behavior demonstrated by children
before and after intervention.
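
A minimal sketch of such a paired-samples t test, assuming each observer's ratings of baseline and intervention clips are paired, is shown below; the rating values are invented, and scipy is used only as one convenient implementation.

    # Paired-samples t test on observers' ratings of baseline vs. intervention
    # clips; the rating values are invented for illustration.
    from scipy import stats

    baseline_ratings = [2, 3, 2, 4, 3, 2, 3, 2]
    intervention_ratings = [5, 6, 5, 6, 5, 6, 5, 6]

    t_statistic, p_value = stats.ttest_rel(baseline_ratings, intervention_ratings)
    print(f"t = {t_statistic:.2f}, p = {p_value:.4f}")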

As described above, direct observation measures of
social validity may be useful for developing intervention
goals, assessing the acceptability of intervention proce-
dures with consumers who are not directly receiving or
involved in the delivery of interventions, or assessing the
degree to which behavior change is meaningful or consis-
tent with behavioral norms (or expected behaviors based
on normative samples or comparisons). Direct observa-
tions were often used in conjunction with Likert-type rat-
ing scales to measure observers’ agreement with
statements about the acceptability of the intervention
procedures used or the relative degree of behavior
change. However, the degree to which independent
observers rate intervention procedures as acceptable or
the degree to which behavior change is consistent with
behavioral norms may not necessarily reflect the degree
to which the direct recipient of the intervention perceives
the procedures to be acceptable and the outcomes mean-
ingful. Rather, such measures more often reflected the
degree to which others view the procedures as acceptable
and the outcomes meaningful.


Intervention preference or choice

Intervention preference or choice questions were the
fourth most common method used to gather social-
validity data, reported in 10.63% of included studies. In
these studies, direct consumers of the intervention were
invited to indicate their preference for different interven-
tion components or choose the intervention they would
like to continue with at the conclusion of the study. Nota-
bly, respondents did not experience the intervention after
indicating their preference or choice. For example,
following an intervention to address food selectivity dis-
played by a child with autism, Allison et al. (2012) asked
the child’s parent, who did not deliver the intervention,
to indicate her preference for two equally effective inter-
vention procedures: escape extinction + differential rein-
forcement of alternative behavior and escape extinction
+ noncontingent reinforcement. The authors noted that
because both interventions were effective, parent prefer-
ence might be the most important determinant of which
intervention to use. The parent reported that escape
extinction + noncontingent reinforcement was more
acceptable, easier to implement, and a better fit for her
child’s needs (measures of the social validity of the inter-
vention procedures). The parent also indicated that she
would feel more comfortable implementing escape extinc-
tion + noncontingent reinforcement at home and in pub-
lic settings. Intervention preference or choice assessments
might address limitations associated with direct observa-
tional methods because they involve the direct recipients
of intervention or those who are responsible for imple-
menting intervention outside of the study. Combining
measures of intervention preference with open-ended
questions about why the specific intervention is preferred
may provide researchers with rich information about
components of intervention that are viewed as more or
less acceptable as well as components of interventions
that might continue to be implemented postintervention.

Concurrent-chains intervention preference
assessments

Likert-type rating scales, questionnaires, and other
methods of collecting social-validity data might not be
accessible to people with disabilities or young children
who cannot vocally report their preferences. In such
cases, concurrent-chains assessments might be used to
assess relative preference for different interventions.
Concurrent-chains assessments were reported in 9.38% of
included studies. In concurrent-chains assessments,
participants choose between two or more concurrent
interventions. Response options, each associated with a
discriminative stimulus (e.g., a colored card), are pre-
sented to the participant. Following a selection response
(e.g., pointing to a colored card), the participant experi-
ences the intervention associated with that response

option. For example, Potter et al. (2013) evaluated pref-
erence for interventions designed to increase leisure item
engagement and decrease motor stereotypy with teen-
agers with developmental disabilities and language
delays. Participants were invited to select a colored card
corresponding to each intervention. When differential
consequences were provided (i.e., the participant experi-
enced the intervention associated with the colored card),
preferences were identified. All participants consistently
selected the colored card associated with response block-
ing plus differential access to automatic reinforcement.
Although concurrent-chains assessments will likely
provide valuable information about the preferences of
individuals with disabilities, the approach may require
teaching prerequisite skills (e.g., discrimination between
interventions). Leaf et al. (2010) evaluated preferences
for different prompting procedures with young children
with autism and language delays. The children in this
study made inconsistent selections, suggesting they either
did not have clear preferences or could not discriminate
the interventions associated with each colored card.

Interviews

Interviews were the least common assessment method
across all studies. They were used to gather social-validity
data in 3.13% of included studies. This finding likely
reflects the fact that JABA favors the publishing of quan-
titative rather than qualitative data. Interview data are
often analyzed using qualitative research methods, such
as thematic analysis (Braun & Clarke, 2022), a research
method that may be less familiar to behavior analysts.
However, there were some noteworthy examples. For
example, Gunning et al. (2020) taught parents of typi-
cally developing children and children with autism to
implement a version of the Preschool Life Skills program
with their children at home. At the end of the study, the
authors interviewed the children to find out what they
thought of the program. The authors reported activities
included in the program that the children said they liked
(e.g., marble runs, foam building kit). In another exam-
ple, Nieto and Wiskow (2020) interviewed students fol-
lowing their participation in the STEP it UP! game to
determine which condition (i.e., no game, Step it UP!
game, or Step it UP! game + adult interaction) they liked
the most and why. In these studies, the researchers posed
brief open-ended questions to participants and recorded
their responses. The questions asked were similar to
open-ended survey questions, with the main difference
being that the researcher asked the questions instead of
asking respondents to write down their answers. In most
studies that included interviews, short illustrative quotes
were provided in the results section or the authors summa-
rized the main findings in one or two sentences.

Overall, these findings add to the literature on the
assessment of social validity by defining different methods,
reporting data on the prevalence of different methods, and
providing examples of how different methods might be
used to facilitate the collection of social-validity data. In
addition, we highlighted some noteworthy strengths and
limitations of these methods. Of note, Carr et al. (1999)
and Huntington et al. (2023) did not report data on the
methods used to collect social-validity data. Ferguson
et al. (2019) noted that a combination of two or more
methods (questionnaire, rating scale, intervention choice,
or other) was the most commonly used to collect social-
validity data (48% of included studies). Ferguson et al.
noted that rating scales were used in 21% of studies,
followed by questionnaires (17%), other methods (8%),
and intervention choice (6%). We extended the findings of
Ferguson et al. by disaggregating this information and
providing data on the exact number of studies that
included each type of measure. We also reported data on
additional methods for collecting social-validity data
including direct observations, concurrent-chains interven-
tion preference assessments, and interviews.
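
To illustrate how such disaggregation might be carried out, the following is a minimal sketch in Python of tallying the number and percentage of studies that reported each assessment method. The study labels, method names, and counts are hypothetical and are not drawn from the review’s data set.

from collections import Counter

# Hypothetical study-level codes: each included study is tagged with every
# social-validity assessment method it reported (a study may report several).
coded_studies = {
    "Study A": {"likert_rating_scale", "interview"},
    "Study B": {"non_likert_questionnaire"},
    "Study C": {"likert_rating_scale", "direct_observation"},
    "Study D": {"concurrent_chains_preference", "likert_rating_scale"},
}

# Count the studies that included each method and express each count as a
# percentage of the total number of coded studies.
method_counts = Counter(m for methods in coded_studies.values() for m in methods)
total = len(coded_studies)
for method, count in method_counts.most_common():
    print(f"{method}: {count} of {total} studies ({count / total:.2%})")

Reporting counts alongside percentages in this way corresponds to the disaggregated presentation described above.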

Understanding the methods used to collect social-
validity data is important for several reasons. First, pro-
viding detailed descriptions of data collection methods
may enable other researchers to replicate or adapt the
methods for similar research questions, which may
enhance the future reporting of social-validity data. Sec-
ond, different methods for collecting social-validity data
are likely to have different strengths and limitations.
Knowing the specific methods that might be used may
help practitioners and researchers evaluate the degree to
which a specific method might be useful with a specific
respondent and the extent to which the data they collect
represent the constructs being measured. Third, the
choice of data collection methods may influence the inter-
pretation of results. For example, qualitative methods
(e.g., interviews or open-ended questions) may provide
richer insights into participants’ perspectives, whereas
quantitative methods (e.g., Likert-type rating scales) may
yield more precise numerical data. Finally, providing a
description of specific methods used in social-validity
assessments may be valuable for practitioners who wish
to implement similar assessments in real-world settings.
By providing examples of how different methods have
been used in behavior-analytic research, practitioners
may be better able to select and adapt different methods
for use in their work.

Respondents who provided social-validity data

Table 3 depicts the number and percentage of studies that
gathered social-validity data from different groups of
respondents. In the current study, participants who
received the intervention were the most common respon-
dents for social-validity assessments. Of the total number
of studies that reported a measure of social validity,
55.63% gathered data from the participants themselves.

The demographics of the participants varied substan-
tially, ranging from young children to adults in a range
of contexts including homes, schools, employment set-
tings, and disability programs. For example, Fogel et al.
(2010) evaluated the effects of exergaming on students’
physical activity in a school physical education class. Sev-
eral different exergaming programs were provided to the
students. At the end of the study, the researchers asked
the participants to rank order the exergames from most
to least preferred. This allowed the researchers to identify
differences in preference among the participating stu-
dents. Erath et al. (2020) taught 25 human-services staff
working in a residential services program for adults with
disabilities to implement behavioral skills training to
teach job skills to newly hired program staff. At the con-
clusion of the training, participants were invited to
respond to questions about their experiences with the
training using a modified version of the Intervention
Rating Profile-15 (Martens et al., 1985). Finally, studies
that employed concurrent-chains intervention preference
assessments (e.g., Potter et al., 2013) allowed individuals
with disabilities and communication delays to express
their preferences for different interventions by providing
opportunities for them to choose which intervention con-
text they would like to experience.

Participants who were responsible for delivering the
intervention provided social-validity data in 25.63% of
included studies. Lerman et al. (2013) coached adults
with disabilities to deliver teaching programs to young
children with autism as part of a vocational training pro-
gram. At the conclusion of the role-play portion of the
training, the participants were invited to complete a
Likert-type rating scale to answer questions including “I
like the methods used to train me,” “These training
methods were effective in teaching me new skills,” and “I
would feel comfortable using these skills with children.”
In another study, Allen and Wallace (2013) taught dentists
to use a fixed-time schedule of breaks to decrease escape-
maintained challenging behavior displayed by children
attending a local dental clinic. The dentists were invited to
complete a modified version of the Treatment Evaluation
Inventory-Short Form (TEI-SF; Kelley et al., 1989) to pro-
vide feedback on how acceptable the treatment was, how
willing they would be to use the procedure, and how much
they liked the procedure.
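
As an illustration of how responses to such Likert-type items might be summarized numerically, the following is a minimal Python sketch. The item wording paraphrases the examples quoted above, and the respondent ratings are invented for illustration rather than taken from Lerman et al. (2013).

from statistics import mean, median

# Hypothetical 5-point Likert-type ratings (1 = strongly disagree,
# 5 = strongly agree) from six respondents; all numbers are invented.
ratings = {
    "I like the methods used to train me": [5, 4, 5, 4, 3, 5],
    "These training methods were effective in teaching me new skills": [4, 4, 5, 3, 4, 4],
    "I would feel comfortable using these skills with children": [5, 5, 4, 4, 5, 3],
}

# Summarize each item with its mean and median rating.
for item, scores in ratings.items():
    print(f"{item}: mean = {mean(scores):.2f}, median = {median(scores)}")

Because rating-scale responses are ordinal, medians are often reported alongside means when summarizing data of this kind.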

Social-validity data were gathered from parents or
caregivers not directly involved in delivering the interven-
tion in 16.88% of included studies. For example, Rubio
et al. (2020) evaluated the effects of a finger prompt on the
food acceptance and refusal behavior of children attending
a day treatment program for the assessment and treatment
of avoidant/restrictive food intake disorder. Parents were
invited to observe the intervention and respond to a set of
Likert-type questions, such as “I was comfortable with this
treatment for my child” and “I feel my child is now accept-
ing more food (amount and/or variety) during mealtimes
than before this treatment.” Following this feedback,
parents received training on how to implement the proce-
dure with their children at home.

Gibbs et al. (2018) evaluated the effects of noncontin-
gent music and RIRD on vocal stereotypy displayed by
two children with autism. After completion of the inter-
vention, the parents of the participants were invited to
view video clips of their child during RIRD alone and
RIRD + music. After viewing the recording, the parents
responded to Likert-type questions adapted from the
TEI-SF (Kelley et al., 1989) measuring treatment accept-
ability for each condition. Example questions included “I
find this intervention to be an acceptable way of dealing
with my child’s vocal stereotypy” and “I believe that my
child experiences discomfort during this intervention.”
Both parents expressed a preference for using RIRD +
music at home.

Educators, therapists, instructors, or other profes-
sionals who had a relationship with the participant but
who were not directly involved in delivering the interven-
tion provided social-validity data in 15.63% of included
studies. For example, Luczynski and Hanley (2013)
taught communication and social skills to preschool-aged
children at risk for the development of challenging
behavior. At the end of the study, the authors invited the
assistant director of quality assurance for all local pre-
schools, the director of the preschool that participants
attended, and the lead and assistant classrooms teachers
who worked directly with the participants to view video
clips of the children during baseline and intervention ses-
sions and respond to a series of Likert-type questions
about the goals, procedures, and outcomes of the
intervention.

Finally, individuals who were not involved in the study
or were naïve to the purpose of the study provided social-
validity data in 21.88% of included studies. For example,
Howard and DiGennaro Reed (2014) coached animal
shelter staff to conduct obedience training with hard-
to-adopt shelter dogs. At the conclusion of the training,
the researchers recruited potential adopters, shelter staff,
and animal trainers employed by or volunteering at the
shelter but who were not involved in the research to view
video clips of the trainer and dog interacting during base-
line and training sessions. Respondents were asked to
answer questions about the acceptability of the training
methods observed using a Likert-type rating scale and to
select which video (before or after training) they consid-
ered “better” along five dimensions: (a) effectiveness of
trainer, (b) desirability of trainer, (c) adoptability of dog,
(d) which dog would be better with children, and (e) which
dog would be better for a first-time pet owner. Tai and
Miltenberger (2017) used behavioral skills training to
teach safe tackling skills to youth American-rules football
players. At the conclusion of the study, a youth football
coach who was naïve to the purpose of the study viewed
videos of the tackles made by the participants during base-
line and intervention sessions and was asked to select the
video depicting the safer tackle.

Collecting these data allowed us to extend the
methods used by Carr et al. (1999), Ferguson et al.
(2019), and Huntington et al. (2023), who did not report
data on respondent types. Our findings were consistent
with those reported by Snodgrass et al. (2018), who found
that the direct recipients of the intervention most com-
monly provided social-validity data. Understanding the
source of social-validity assessment data may allow for a
more comprehensive assessment of the credibility and
reliability of the information. Different individuals or
groups may have varying perspectives on and vested
interests in the social validity of an intervention. For
example, participants who receive an intervention may
comment on their preference for the intervention, how
much the intervention helped them achieve their unique
goals, and how participating in the intervention fits into
their daily life. In contrast, opinions provided by parents,
teachers, or health care professionals may offer different
viewpoints on the acceptability and effectiveness of an
intervention based on other factors, such as ease of imple-
mentation and cost effectiveness. Knowing who provided
the data may help readers determine the extent to which
the findings related to social validity can be generalized
to broader populations. Finally, in some cases, knowing
who provided the social-validity assessment data can
reveal potential conflicts of interest. This is especially
important in cases where financial or personal interests
may influence the assessment of an intervention.

Another potential concern arises when researchers col-
lect social-validity data by directly asking participants
(e.g., the direct recipients of the intervention or parents of
the direct recipient) to rate the quality of services provided.
This method introduces a potential bias, as the person pro-
viding and evaluating the intervention is the one soliciting
feedback, possibly exerting pressure on participants to pro-
vide favorable responses. In future research, researchers
might mitigate this concern by implementing strategies to
minimize bias. For example, if participants are aware that
the researcher who delivered the services is gathering feed-
back, transparency can be maintained by ensuring that
participants understand the purpose of the evaluation and
emphasizing the importance of honest feedback. Addition-
ally, researchers can employ measures such as anonymous
surveys or third-party data collection to reduce the influ-
ence of social-desirability bias and encourage participants
to provide genuine responses without feeling pressured to
be overly positive.

Social-validity measurement points

Table 4 depicts the percentage of total studies that col-
lected social-validity data at different points. Most studies
(83.13%) assessed social validity at or after the conclusion
of the study. At the conclusion of the study, participants
who received the intervention or delivered the intervention
were often provided with a Likert-type rating scale and
asked to rate their agreement with statements about the
intervention goals, procedures, and/or outcomes. For
example, at the conclusion of the study, Hanley et al.
(2014) administered a four-item rating scale to parents
whose children participated in an intervention to reduce
challenging behavior and increase functional replacement
behaviors. To supplement information gathered from the
families following the intervention, Hanley et al. reported
data on the time and cost associated with the intervention.
Although not directly related to participant or family per-
ceptions about the goals, procedures, and outcomes of the
intervention, providing representative data on time and
cost might help influence public perceptions about the
social validity of the intervention, particularly if the inter-
vention is publicly funded.

A much smaller number of studies conducted social-
validity assessments prior to the start of the intervention
(9.38%) or during the intervention (13.75%). As discussed
above, conducting direct observations of peers prior to
intervention might help researchers develop socially valid
goals that reflect developmentally or contextually appro-
priate behavior. These observations can also inform the
development and implementation of the intervention by
helping researchers to define the target behaviors of inter-
est or providing a performance standard (or terminal goal)
from which to evaluate the participant’s progress. Carlile
et al. (2018) provided a unique example of a social-validity
preassessment conducted with children who were not
involved with the study. Prior to implementing an inter-
vention to teach six school-aged children with autism to
request help when lost, the researchers asked 45 similar-
aged typically developing peers to answer open-ended
questions about what it meant to be lost, what to do when
lost, and their use of cell phones. The data collected from
this assessment were used to develop the individualized
target behavior definitions for each participant. In another
example, Downs et al. (2015) asked a certified yoga
instructor to review and provide feedback on a task analy-
sis for teaching yoga postures prior to implementing a
video self-evaluation intervention for improving yoga pos-
tures with two adult yoga students.

Studies that used concurrent-chains intervention prefer-
ence assessments most often collected social-validity data
during intervention. In other words, participants were pro-
vided with the opportunity to select a schedule-correlated
stimulus associated with a specific intervention and then
experience the intervention following selection. Although
these types of social-validity assessments were coded as
occurring during the intervention, they often occurred after
the researchers introduced and assessed the efficacy of dif-
ferent interventions for the participant. Campbell and
Anderson (2011) provided a unique example of a social-
validity assessment conducted with teachers during the
delivery of the intervention, using a Likert-type rating scale
and questionnaire. Teachers were coached to deliver a
Check-In Check-Out intervention with four students who
displayed challenging behavior that resulted in office
disciplinary referrals. Teachers’ perceptions of changes in
student challenging behavior (outcomes) were assessed once
or twice a week throughout the study using a two-item
rating scale. Additionally, the contextual fit of the interven-
tion was assessed with the teachers during the initial imple-
mentation phase and at the end of the study using
the Contextual Fit Questionnaire (Horner et al., 2003). This
questionnaire asked teachers to provide feedback on the
ease of implementation of the intervention, the amount of
effort required to implement the intervention, and whether
the effects of the intervention were worth the effort. The
researchers made modifications to the intervention on an
ongoing basis in response to the information provided by
teachers via social-validity assessments.

These findings were similar to those reported by
Snodgrass et al. (2018), who found that social-validity
assessments were most commonly conducted at or after
the conclusion of the intervention. Knowing the point
at which social-validity assessment data were collected
may be important for several reasons. Social-validity
data collected at different points can provide insights
into whether and how an intervention has been adapted
or modified in response to feedback from participants.
This can shed light on the dynamic nature of interven-
tion development and implementation. In the current
study, most social-validity assessments were found to
be conducted at the conclusion of the study. Thus,
social-validity data may not be commonly used in
research to inform the development of interventions or
changes to an intervention during a study (although
these data may inform the development of subsequent
studies). Additionally, over time a participant’s percep-
tions and expectations of an intervention may change.
Knowing the point of data collection helps identify poten-
tial response shifts, where participants’ initial expectations
or judgments may evolve as they experience the interven-
tion. Although we only looked at the points at which
social-validity data were collected within studies, it may be
equally important to look at the points at which social-
validity data are collected across studies and years. An
intervention considered socially valid during one period
may become less acceptable due to changing societal atti-
tudes and norms (see Barnes, 2019), individual or collec-
tive beliefs (see King et al., 2006), or global health and
economic conditions (see Nicolson et al., 2020). Social-
validity assessments may be useful in identifying these
changes.

GENERAL DISCUSSION

In this review, we replicated and extended the procedures
described by Carr et al. (1999), Snodgrass et al. (2018),
Ferguson et al. (2019), and Huntington et al. (2023) to
systematically identify the prevalence and type of social-
validity assessments published in JABA between 2010
and 2020. We found the percentage of studies including
social-validity assessments was relatively stable between
2010 and 2019, with a marked increase in 2020. We
found that social-validity measures designed to assess the
acceptability of intervention procedures and outcomes
were most common, with relatively fewer studies asses-
sing the acceptability of the intervention goals. Likert-
type rating scales were the most commonly used method
for collecting social-validity data, followed by non-
Likert-type questionnaires. In addition to prevalence and
type, we reported data on the respondent characteristics
and the points at which social-validity data were
collected during the study. We found that in over half
of the included studies, the direct recipients of the inter-
vention provided information about the social validity
of the intervention’s goals, procedures, or outcomes.
Social-validity data were less commonly provided by peo-
ple who delivered the intervention (e.g., parents, teachers,
coaches), people who had a relationship with the direct
recipient of the intervention but did not deliver the inter-
vention (e.g., parents, teachers, therapists), or people
who did not have a relationship with the direct recipient
of the intervention and were not involved in the study
(e.g., undergraduate students, Board Certified Behavior
Analysts, health professionals, coaches, employers). Most
social-validity assessments were conducted at the conclu-
sion of the intervention.

Recommendations to increase the collection and
reporting of social-validity data in behavior-analytic
research have been made consistently (Baer et al., 1987;
Detrich, 2018; Hanley, 2010; Schwartz & Baer, 1991;
Wolf, 1978), yet the current findings demonstrate that
social-validity assessments are still relatively infrequently
employed as primary or secondary measures for research
published in JABA. There are several potential reasons
why this might be the case. First, Carr et al. (1999) noted
that behavior-analytic journals do not provide recommen-
dations about when and how to report social-validity data
or require such measures for publication, both of which
may contribute to the underreporting of such data. Sec-
ond, editors and reviewers of behavior-analytic journals
may prioritize the collection and reporting of data on the
effectiveness of interventions rather than more subjective
measures about the perceived acceptability and value of
these interventions. Indeed, JABA’s author guidelines
(Journal of Applied Behavior Analysis, n.d.) state that the
primary focus of JABA is on research studies demonstrat-
ing socially important functional relations. Although the
author guidelines currently state that the clinical signifi-
cance of the effects for individuals should be discussed, it
is noted that direct measures of behavior are critical for
the acceptance of research in the journal. Concurrent-
chains intervention preference assessments provide one
direct measure of the potential social acceptability of the
intervention procedures. However, most social-validity
assessments rely on subjective measures such as personal
opinions. Therefore, authors may wonder if personal
opinions about the goals, procedures, or outcomes of an
intervention are appropriate for publication in JABA.
We encourage the editorial board of JABA to consider
providing clearer advice on when and how to report
measures of social validity in research studies.

Third, Huntington et al. (2023) noted that the variety
of terms used in the literature to describe social validity
(e.g., satisfaction, preference, acceptability) may make it
challenging to identify, compare, and contrast social-
validity assessments and outcomes. Huntington et al.
argued that imprecise use of terms to describe social
validity may be inconsistent with a behavior-analytic
commitment to technical descriptions of intervention
procedures and research methods. As noted above, the
JABA author guidelines recommend that authors
describe the clinical significance of behavior change.
However, the term “clinical significance” is not defined
and its relation to social validity is unclear. To address
this challenge, we have attempted to provide more precise
definitions for the dimensions of social validity (Table 1),
the methods of collecting social-validity data (Table 2),
groups of respondents who might provide social-validity
data (Table 3), and points at which social-validity data
might be collected (Table 4). We encourage researchers
to clearly describe the methods and procedures used
to collect social-validity data in future studies and hope
the definitions provided in the current study will help
increase consistency in the use of terms or concepts.

Recommendations

Based on the findings of the current study, we believe
there are meaningful steps that researchers can take to
improve the reporting of social validity in JABA. In what
follows, we provide three practical recommendations for
potentially improving the collection and reporting of infor-
mation about the social validity of interventions.

Recommendation #1: Integrate social validity
and informed consent procedures

Behavior analysts adhering to the Behavior Analyst Certifi-
cation Board Ethics Code (2020) have an ethical responsibil-
ity to obtain informed consent from clients and participants
before engaging in behavioral assessments, interventions, or
changes in intervention design. Additionally, behavior ana-
lysts have an ethical responsibility to respect and actively
promote client choice and self-determination to the best
of their abilities, particularly when providing services to
vulnerable populations. To obtain informed consent, it
is important for researchers and practitioners to clearly
explain the goals, procedures, and anticipated outcomes
associated with the delivery of the intervention. Thus,
the process of obtaining informed consent may provide
opportunities to gather data on the social validity of the
intervention prior to its implementation. However, in
the current study, few studies reported formal measures
of social validity prior to the start of the intervention
(although such measures might be collected but omitted
from published research). We encourage researchers to inte-
grate social-validity measures into the informed consent
process. We recommend that researchers describe how
social validity was assessed before the intervention
(e.g., via interviews or questionnaires) and what changes
were made to aspects of the intervention (e.g., interven-
tion procedures used) based on social-validity data.
Researchers might ask participants to consent to data
being collected and reported on how the intervention goals
and procedures were developed and changed based on par-
ticipant feedback prior to the actual implementation of the
intervention or report the number of participants who
declined to participate following a description of the inter-
vention procedures.

Recommendation #2: Incorporate ongoing
assessments of social validity

Behavior analysis is distinct from other fields of psychologi-
cal study in that there is an emphasis on understanding idio-
syncratic functional relations between an organism’s
behavior(s) and the environments within which it occurs
(Skinner, 1953). This has led to the prioritization of direct
behavioral assessments as well as the use of single-case
research methods that allow for rigorous and reliable explo-
ration of behavioral variability as the datum of interest
(Sidman, 1960). These methods allow for the elaboration of
broader behavioral principles while actively informing ongo-
ing intervention and treatment decisions (Kazdin, 2021).
However, this same approach has not been applied to
social-validity assessment. Most social-validity assessments
were conducted at the conclusion of the intervention, and,
therefore, the data may not be used to inform changes to
goals of the intervention or the intervention design during
the study. We recommend that researchers and practitioners
adopt an ongoing approach to social-validity assessment
during the intervention. Assessments of the social validity of
the goals of the intervention, the procedures used to deliver
the intervention, and the outcomes of the intervention
should ideally be conducted at various stages throughout
the intervention process. Understanding stakeholders’ per-
spectives at various stages can help ensure that the interven-
tion aligns with their values, priorities, and needs. For
example, researchers and practitioners might assess stake-
holders’ perceptions of the intervention procedures such as
the clarity of instructions, the feasibility of implementation,
the acceptability of the delivery format, and the appropriate-
ness of the intervention activities. In some cases, it may be
useful to collect social-validity data from multiple stake-
holders (e.g., parents, teachers, therapists) at different points
to identify and address disagreements related to the goals of
intervention, acceptability of the intervention procedures, or
importance of the outcomes. Regular and systematic
assessment of social validity throughout the intervention
timeline may help promote stakeholder engagement,
improve intervention design and delivery, and enhance the
overall efficacy of the intervention.

We also recommend that researchers and practitioners
explore ways in which personalized and idiosyncratic mea-
sures of social validity can be incorporated alongside exist-
ing Likert-type questionnaires. Some meaningful examples
have been presented in the published research where behav-
ior analysts have assessed, defined, and then measured per-
sonalized and idiosyncratic behaviors that may be
indicative of the social validity of an intervention. For
example, Green and Reid (1996), Parsons et al. (2012), and
Ramey et al. (2023) demonstrated that personalized indi-
ces of happiness and unhappiness could be operationally
defined and reliably measured. In addition, the concept
of “happy, relaxed, and engaged” may provide a useful
heuristic to support the personalization of measures of
social validity (see Gover et al., 2022). Finally, develop-
ing and incorporating novel applications of concurrent-
chains intervention preference assessments, such as the
enhanced choice model (Rajaraman et al., 2022), may
be useful for refining the measurement and reporting
of the social validity of interventions for the direct
recipient of the intervention. Rajaraman et al. (2022)
demonstrated that ongoing assessment of social validity
could be implemented with children by providing
them with concurrent, continuously available options to
(a) experience skill-based treatment for their challenging
behavior (intervention context), (b) experience noncon-
tingent reinforcement (hangout context), or (c) leave the
intervention setting altogether. By regularly assessing
the child’s choice, the researchers could alter the skill-
based treatment context (including the schedule and
type of demands and reinforcers presented) to ensure
that it included components that the child preferred and
to enhance the child’s willingness to participate in the
intervention.

Recommendation #3: Include open-ended
response options

In the current study, the most common method of collect-
ing social-validity data was via Likert-type rating scales, a
closed-ended assessment method. Including open-ended
response options in social-validity assessments may be a
valuable way to gather qualitative data and in-depth feed-
back from participants. These open-ended responses can
provide insights, context, and nuanced perspectives that
closed-ended questions may not capture (Fryling &
Baires, 2016). Asking open-ended questions to participants
with repertoires of vocalized verbal behavior and record-
ing their responses (see Nieto & Wiskow, 2020) may be
one way to reduce the response effort for participants to
engage in social-validity assessments. Researchers might
ask open-ended questions to learn about participants’
perceptions of the benefits of the intervention and any
adverse or unexpected effects associated with the interven-
tion. To report data, researchers might consider including
illustrative quotes from participants in text or in a table.

Limitations and future research

Some limitations of the current study warrant mention.
First, we only systematically identified and appraised
social-validity assessments published in JABA. Thus, our
findings may not be representative of all relevant studies
including social-validity assessments in the behavior-
analytic literature, and future research could evaluate
similar procedures for additional journals. Second, we
did not report data on the settings in which interventions
were conducted. The dimensions of social validity
assessed and the methods used to gather these data may
differ among university-based clinics, schools, and com-
munity-based settings. In the future, researchers could
explore differences in social-validity measurement and
reporting across settings, including differences in respon-
dents across settings. Third, we included both open-ended
questions and non-Likert-type closed-ended questions in
our definition of non-Likert-type questionnaires (see
Table 2), a limitation of our data-coding procedures.
Open-ended questions often yield qualitative data that
require different analytical approaches than those used
for closed-ended questions. Future researchers could
focus on developing more precise coding methods.
Knowing the characteristics of individuals who find
an intervention acceptable and effective might help
researchers tailor interventions to better meet the needs
of specific populations. Finally, the studies included in
this review used a wide range of methods to collect
social-validity data. Future researchers may wish to con-
duct a more in-depth review of individual methods used
to collect social-validity data and the outcomes reported.
The findings of such reviews might help practitioners and
researchers identify when and how to conduct various
types of social-validity assessments and may establish a
more robust evidence base for the social validity of
behavior-analytic interventions.

ACKNOWLEDGMENT
Open access publishing facilitated by Monash University,
as part of the Wiley – Monash University agreement via
the Council of Australian University Librarians.

CONFLICT OF INTEREST STATEMENT
The authors do not have any conflicts of interest to
declare.

DATA AVAILABILITY STATEMENT
The data collected as part of this study can be found in
the Additional Supporting Information in the online ver-
sion of this article at the publisher’s website.

ETHICS APPROVAL
No human or animal subjects were used to produce this
article.

ORCID
Erin S. Leif https://orcid.org/0000-0003-2219-2405
Bradley S. Bloomfield https://orcid.org/0000-0002-5792-5480
Russell A. Fox https://orcid.org/0000-0002-3061-3495

REFERENCES
An asterisk denotes studies that were included in the current review. A
full list of included studies can be found in the Supporting Information
in the online version of this article at the publisher’s website.
*Abellon, O. E., & Wilder, D. A. (2014). The effect of equipment prox-
imity on safe performance in a manufacturing setting. Journal of
Applied Behavior Analysis, 47(3), 628–632. https://doi.org/10.1002/
jaba.137

*Allen, K. D., & Wallace, D. P. (2013). Effectiveness of using noncon-
tingent escape for general behavior management in a pediatric
dental clinic. Journal of Applied Behavior Analysis, 46(4), 723–737.
https://doi.org/10.1002/jaba.82

*Allison, J., Wilder, D. A., Chong, I. V. Y., Lugo, A., Pike, J., &
Rudy, N. (2012). A comparison of differential reinforcement and
noncontingent reinforcement to treat food selectivity in a child
with autism. Journal of Applied Behavior Analysis, 45(3), 613–617.
https://doi.org/10.1901/jaba.2012.45-613

*Austin, J. L., & Bevan, D. (2011). Using differential reinforcement
of low rates to reduce children’s requests for teacher attention.
Journal of Applied Behavior Analysis, 44(3), 451–461. https://doi.
org/10.1901/jaba.2011.44-451

Baer, D. M., Wolf, M. M., & Risley, T. R. (1987). Some still-current
dimensions of applied behavior analysis. Journal of Applied Behav-
ior Analysis, 20(4), 313–327. https://doi.org/10.1901/jaba.1987.
20-313

Barnes, C. (2019). Understanding the social model of disability: Past,
present and future. In N. Watson, A. Roulstone, & C. Thomas
(Eds.), Routledge handbook of disability studies (pp. 14–31).
Routledge.

Behavior Analyst Certification Board. (2020). Ethics code for behavior
analysts. https://bacb.com/wp-content/ethics-code-for-behavior-
analysts/

Braun, V., & Clarke, V. (2022). Conceptual and design thinking for the-
matic analysis. Qualitative Psychology, 9(1), 3–26. https://psycnet.
apa.org/doi/10.1037/qup0000196

Callahan, K., Hughes, H. L., Mehta, S., Toussaint, K. A.,
Nichols, S. M., Ma, P. S., Kutlu, M., & Wang, H. T. (2017).
Social validity of evidence-based practices and emerging interven-
tions in autism. Focus on Autism and Other Developmental Disabil-
ities, 32(3), 188–197. https://doi.org/10.1177/1088357616632446

*Campbell, A., & Anderson, C. M. (2011). Check-in/check-out: A sys-
tematic evaluation and component analysis. Journal of Applied
Behavior Analysis, 44(2), 315–326. https://doi.org/10.1901/jaba.
2011.44-315

*Carlile, K. A., DeBar, R. M., Reeve, S. A., Reeve, K. F., &
Meyer, L. S. (2018). Teaching help-seeking when lost to individ-
uals with autism spectrum disorder. Journal of Applied Behavior
Analysis, 51(2), 191–206. https://doi.org/10.1002/jaba.447

Carr, J. E., Austin, J. L., Britton, L. N., Kellum, K. K., & Bailey, J. S.
(1999). An assessment of social validity trends in applied behavior
analysis. Behavioral Interventions, 14(4), 223–231. https://doi.org/
10.1002/(SICI)1099-078X(199910/12)14:4

Carter, S. L., & Wheeler, J. J. (2019). The social validity manual.
Elsevier Science & Technology.

Common, E. A., & Lane, K. L. (2017). Social validity assessment. In
J. K. Luiselli. (Ed.), Applied behavior analysis advanced guidebook:
A manual for professional practice (pp. 73–92). Academic Press.
https://doi.org/10.1016/B978-0-12-811122-2.00004-8

Cook, B. G., Cook, L., & Landrum, T. J. (2013). Moving research into
practice: Can we make dissemination stick? Exceptional Children,
79(3), 163–180. https://doi.org/10.1177/001440291307900203

Detrich, R. (2018). Rethinking dissemination: Storytelling as a part of
the repertoire. Perspectives on Behavior Science, 41(2), 541–549.
https://doi.org/10.1007/s40614-018-0160-y

*DiGennaro-Reed, F. D., Codding, R., Catania, C. N., & Maguire, H.
(2010). Effects of video modeling on intervention integrity of
behavioral interventions. Journal of Applied Behavior Analysis,
43(2), 291–295. https://doi.org/10.1901/jaba.2010.43-291

*Downs, H. E., Miltenberger, R., Biedronski, J., & Witherspoon, L.
(2015). The effects of video self-evaluation on skill acquisition with
yoga postures. Journal of Applied Behavior Analysis, 48(4), 930–
935. https://doi.org/10.1002/jaba.248

*Erath, T. G., DiGennaro Reed, F. D., Sundermeyer, H. W.,
Brand, D., Novak, M. D., Harbison, M. J., & Shears, R. (2020).
Enhancing the training integrity of human service staff using pyra-
midal behavioral skills training. Journal of Applied Behavior Anal-
ysis, 53(1), 449–464. https://doi.org/10.1002/jaba.608

Ferguson, J. L., Cihon, J. H., Leaf, J. B., Van Meter, S. M.,
McEachin, J., & Leaf, R. (2019). Assessment of social validity
trends in the Journal of Applied Behavior Analysis. European Jour-
nal of Behavior Analysis, 20(1), 146–157. https://doi.org/10.1080/
15021149.2018.1534771

*Fogel, V. A., Miltenberger, R. G., Graves, R., & Koehler, S. (2010). The
effects of exergaming on physical activity among inactive children in
a physical education classroom. Journal of Applied Behavior Analysis,
43(4), 591–600. https://doi.org/10.1901/jaba.2010.43-591

Fryling, M. J., & Baires, N. A. (2016). The practical importance of the
distinction between open and closed-ended indirect assessments.
Behavior Analysis in Practice, 9(2), 146–151. https://doi.org/10.
1007/s40617-016-0115-2

*Gibbs, A. R., Tullis, C. A., Thomas, R., & Elkins, B. (2018). The
effects of noncontingent music and response interruption and redi-
rection on vocal stereotypy. Journal of Applied Behavior Analysis,
51(4), 899–914. https://doi.org/10.1002/jaba.485

Gover, H. C., Staubitz, J. E., & Juárez, A. P. (2022). Revisiting rein-
forcement: A focus on happy, relaxed, and engaged students.
TEACHING Exceptional Children, 55(1), 72–74. https://doi.org/
10.1177/00400599221123185

Green, C. W., & Reid, D. H. (1996). Defining, validating, and increas-
ing indices of happiness among people with profound multiple
disabilities. Journal of Applied Behavior Analysis, 29(1), 67–78.
https://doi.org/10.1901/jaba.1996.29-67

*Grosberg, D., & Charlop, M. H. (2017). Teaching conversational
speech to children with autism spectrum disorder using
text-message prompting. Journal of Applied Behavior Analysis,
50(4), 789–804. https://doi.org/10.1002/jaba.403

*Gunning, C., Holloway, J., & Grealish, L. (2020). An evaluation of
parents as behavior change agents in the Preschool Life Skills pro-
gram. Journal of Applied Behavior Analysis, 53(2), 889–917.
https://doi.org/10.1002/jaba.660

Hanley, G. P. (2010). Toward effective and preferred programming: A
case for the objective measurement of social validity with recipi-
ents of behavior-change programs. Behavior Analysis in Practice,
3(1), 13–21. https://doi.org/10.1007/BF03391754

*Hanley, G. P., Jin, C. S., Vanselow, N. R., & Hanratty, L. A. (2014).
Producing meaningful improvements in problem behavior of chil-
dren with autism via synthesized analyses and treatments. Journal
of Applied Behavior Analysis, 47(1), 16–36. https://doi.org/10.1002/
jaba.106

Horner, R., Salantine, S., & Albin, R. (2003). Self assessment of contex-
tual fit in schools. Educational and Community Supports.

*Howard, V. J., & DiGennaro Reed, F. D. (2014). Training shelter vol-
unteers to teach dog compliance. Journal of Applied Behavior
Analysis, 47(2), 344–359. https://doi.org/10.1002/jaba.120

Huntington, R. N., Badgett, N. M., Rosenberg, N. E., Greeny, K.,
Bravo, A., Bristol, R. M., Byun, Y. H., & Park, M. S. (2023).
Social validity in behavioral research: A selective review. Perspec-
tives on Behavior Science, 46(1), 201–215. https://doi.org/10.1007/
s40614-022-00364-9

*Jones, M. E., Allan Allday, R., & Givens, A. (2019). Reducing adoles-
cent cell phone usage using an interdependent group contingency.
Journal of Applied Behavior Analysis, 52(2), 386–393. https://doi.
org/10.1002/jaba.538

Journal of Applied Behavior Analysis. (n.d.). Author guidelines.
https://onlinelibrary.wiley.com/page/journal/19383703/
homepage/forauthors.html

Kazdin, A. E. (1977). Assessing the clinical or applied importance of
behavior change through social validation. Behavior Modification,
1(4), 427–452. https://doi.org/10.1177/014544557714001

Kazdin, A. E. (2021). Single-case experimental designs: Characteristics,
changes, and challenges. Journal of the Experimental Analysis of
Behavior, 115(1), 56–85. https://doi.org/10.1002/jeab.638

Kelley, M. L., Heffer, R. W., Gresham, F. M., & Elliott, S. N. (1989).
Development of a modified intervention evaluation inventory.
Journal of Psychopathology and Behavioral Assessment, 11(3),
235–247. https://doi.org/10.1007/BF00960495

Kern, L., & Manz, P. (2004). A look at current validity issues of school-
wide behavior support. Behavioral Disorders, 30(1), 47–59. https://
doi.org/10.1177/019874290403000102

King, G. A., Zwaigenbaum, L., King, S., Baxter, D., Rosenbaum, P., &
Bates, A. (2006). A qualitative investigation of changes in the
belief systems of families of children with autism or Down syn-
drome. Child: Care, Health and Development, 32(3), 353–369.
https://doi.org/10.1111/j.1365-2214.2006.00571.x

*Leaf, J. B., Sheldon, J. B., & Sherman, J. A. (2010). Comparison of
simultaneous prompting and no-no prompting in two-choice dis-
crimination learning with children with autism. Journal of Applied
Behavior Analysis, 43(2), 215–228. https://doi.org/10.1901/jaba.
2010.43-215

Leko, M. M. (2014). The value of qualitative methods in social validity
research. Remedial and Special Education, 35(5), 275–286. https://
doi.org/10.1177/0741932514524002

*Lerman, D. C., Hawkins, L., Hoffman, R., & Caccavale, M. (2013).
Training adults with an autism spectrum disorder to conduct
discrete-trial training for young children with autism: A pilot
study. Journal of Applied Behavior Analysis, 46(2), 465–478.
https://doi.org/10.1002/jaba.50

Lloyd, J. W., & Heubusch, J. D. (1996). Issues of social validation in
research on serving individuals with emotional or behavioral disor-
ders. Behavioral Disorders, 22(1), 8–14. https://doi.org/10.1177/
019874299602200105

*Luczynski, K. C., & Hanley, G. P. (2013). Prevention of problem
behavior by teaching functional communication and self-control
skills to preschoolers. Journal of Applied Behavior Analysis, 46(2),
355–368. https://doi.org/10.1002/jaba.44

*Mancuso, C., & Miltenberger, R. G. (2016). Using habit reversal to
decrease filled pauses in public speaking. Journal of Applied Behav-
ior Analysis, 49(1), 188–192. https://doi.org/10.1002/jaba.267

*Mann, C. C., & Karsten, A. M. (2020). Efficacy and social validity of
procedures for improving conversational skills of college students
with autism. Journal of Applied Behavior Analysis, 53(1), 402–421.
https://doi.org/10.1002/jaba.600

Martens, B. K., Witt, J. C., Elliott, S. N., & Darveaux, D. X. (1985).
Teacher judgments concerning the acceptability of school-based
interventions. Professional Psychology: Research and Practice,
16(2), 191–198. https://doi.org/10.1037/0735-7028.16.2.191

Nicolson, A. C., Lazo-Pearson, J. F., & Shandy, J. (2020). ABA finding
its heart during a pandemic: An exploration in social validity.
Behavior Analysis in Practice, 13(4), 757–766. https://doi.org/10.
1007/s40617-020-00517-9

*Nieto, P., & Wiskow, K. M. (2020). Evaluating adult interaction dur-
ing the Step It UP! game to increase physical activity in children.
Journal of Applied Behavior Analysis, 53(3), 1354–1366. https://
doi.org/10.1002/jaba.699

Parsons, M. B., Reid, D. H., Bentley, E., Inman, A., & Lattimore, L. P.
(2012). Identifying indices of happiness and unhappiness among
adults with autism: Potential targets for behavioral assessment and
intervention. Behavior Analysis in Practice, 5(1), 15–25. https://doi.
org/10.1007/BF03391814

*Potter, J. N., Hanley, G. P., Augustine, M., Clay, C. J., &
Phelps, M. C. (2013). Treating stereotypy in adolescents diagnosed
with autism by refining the tactic of “using stereotypy as reinforce-
ment.” Journal of Applied Behavior Analysis, 46(2), 407–423.
https://doi.org/10.1002/jaba.52

*Raiff, B. R., & Dallery, J. (2010). Internet-based contingency manage-
ment to improve adherence with blood glucose testing recommen-
dations for teens with Type 1 diabetes. Journal of Applied Behavior
Analysis, 43(3), 487–491.

Rajaraman, A., Hanley, G. P., Gover, H. C., Staubitz, J. L.,
Staubitz, J. E., Simcoe, K. M., & Metras, R. (2022). Minimizing
escalation by treating dangerous problem behavior within an enhanced
choice model. Behavior Analysis in Practice, 15(1), 219–242. https://
doi.org/10.1007/s40617-020-00548-2

Ramey, D., Healy, O., & McEnaney, E. (2023). Defining and
measuring indices of happiness and unhappiness in children diag-
nosed with autism spectrum disorder. Behavior Analysis in Prac-
tice, 16(1), 194–209. https://doi.org/10.1007/s40617-022-00710-y

Reimers, T., Wacker, D., & Koeppl, G. (1987). Acceptability of behav-
ioral interventions: A review of the literature. School Psychology
Review, 16(2), 212–227. https://doi.org/10.1080/02796015.1987.
12085286

*Rubio, E. K., Volkert, V. M., Farling, H., & Sharp, W. G. (2020).
Evaluation of a finger prompt variation in the treatment of pediat-
ric feeding disorders. Journal of Applied Behavior Analysis, 53(2),
956–972. https://doi.org/10.1002/jaba.658

Schwartz, I. S., & Baer, D. M. (1991). Social validity assessments: Is
current practice state of the art? Journal of Applied Behavior
Analysis, 24(2), 189–204. https://doi.org/10.1901/jaba.1991.
24-189

Sidman, M. (1960). Tactics of scientific research: Evaluating experimen-
tal data in psychology. Basic Books.

Skinner, B. F. (1953). Science and human behavior. Macmillan.

Snodgrass, M. R., Chung, M. Y., Meadan, H., & Halle, J. W. (2018).
Social validity in single-case research: A systematic literature
review of prevalence and application. Research in Developmental
Disabilities, 74, 160–173. https://doi.org/10.1016/j.ridd.2018.01.007

*Stokes, J. V., Luiselli, J. K., & Reed, D. D. (2010). A behavioral inter-
vention for teaching tackling skills to high school football athletes.
Journal of Applied Behavior Analysis, 43(3), 509–512. https://doi.
org/10.1901/jaba.2010.43-509

*Tai, S. S., & Miltenberger, R. G. (2017). Evaluating behavioral skills
training to teach safe tackling skills to youth football players. Jour-
nal of Applied Behavior Analysis, 50(4), 849–855. https://doi.org/
10.1002/jaba.412

Wolf, M. M. (1978). Social validity: The case for subjective measure-
ment or how applied behavior analysis is finding its heart. Journal
of Applied Behavior Analysis, 11(2), 203–214. https://doi.org/10.
1901/jaba.1978.11-203

SUPPORTING INFORMATION
Additional supporting information can be found online
in the Supporting Information section at the end of this
article.

How to cite this article: Leif, E. S., Kelenc-Gasior,
N., Bloomfield, B. S., Furlonger, B., & Fox, R. A.
(2024). A systematic review of social-validity
assessments in the Journal of Applied Behavior
Analysis: 2010–2020. Journal of Applied Behavior
Analysis, 57(3), 542–559. https://doi.org/10.1002/
jaba.1092
