5D2-9 – Teamwork and Program Evaluation. See details below; please follow the instructions given and answer all questions.

Discussion Instructions:


* Explain in what ways teamwork is an asset when evaluating a program.

RESEARCH ARTICLE

The Effectiveness of Teamwork Training on Teamwork Behaviors and Team Performance: A Systematic Review and Meta-Analysis of Controlled Interventions

Desmond McEwan1*, Geralyn R. Ruissen1, Mark A. Eys2, Bruno D. Zumbo3, Mark R. Beauchamp1

1 School of Kinesiology, The University of British Columbia, Vancouver, British Columbia, Canada; 2 Departments of Kinesiology/Physical Education and Psychology, Wilfrid Laurier University, Waterloo, Ontario, Canada; 3 Department of Educational and Counseling Psychology, Faculty of Education, University of British Columbia, Vancouver, British Columbia, Canada

* desi.mcewan@ubc.ca

Abstract

The objective of this study was to conduct a systematic review and meta-analysis of teamwork interventions that were carried out with the purpose of improving teamwork and team performance, using controlled experimental designs. A literature search returned 16,849 unique articles. The meta-analysis was ultimately conducted on 51 articles, comprising 72 (k) unique interventions, 194 effect sizes, and 8439 participants, using a random effects model. Positive and significant medium-sized effects were found for teamwork interventions on both teamwork and team performance. Moderator analyses were also conducted, which generally revealed positive and significant effects with respect to several sample, intervention, and measurement characteristics. Implications for effective teamwork interventions as well as considerations for future research are discussed.

Introduction

From road construction crews and professional soccer squads to political parties and special operations corps, teams have become a ubiquitous part of today’s world. Bringing a group of highly skilled individuals together is not sufficient for teams to be effective. Rather, team members need to be able to work well together in order for the team to successfully achieve its purposes [1, 2]. As a result, there has been a proliferation of research assessing whether, and how, teams can be improved through teamwork training. A wide range of studies have shown positive effects of teamwork interventions for improving team effectiveness across several contexts such as health care (e.g., [3]), military (e.g., [4]), aviation (e.g., [5]), and academic (e.g., [6]) settings. Similarly, improvements in teamwork have been observed as a result of training with a variety of team types including new teams (e.g., [7]), intact teams (e.g., [8]), and those created for laboratory-based experiments (e.g., [9]). In sum, the extant empirical evidence suggests that teams can be improved via teamwork training.


Citation: McEwan D, Ruissen GR, Eys MA, Zumbo BD, Beauchamp MR (2017) The Effectiveness of Teamwork Training on Teamwork Behaviors and Team Performance: A Systematic Review and Meta-Analysis of Controlled Interventions. PLoS ONE 12(1): e0169604. doi:10.1371/journal.pone.0169604

Editor: Nico W. Van Yperen, Rijksuniversiteit Groningen, NETHERLANDS

Received: September 15, 2016; Accepted: December 19, 2016; Published: January 13, 2017

Copyright: © 2017 McEwan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are within the paper and its Supporting Information files. Raw data (taken from the studies in our meta-analysis) are available upon request from the corresponding author.

Funding: The authors received no specific funding for this work.

Competing Interests: The authors have declared that no competing interests exist.


What is Teamwork?

Within teams, members’ behaviors can be categorized in terms of both taskwork and teamwork processes [2]. Marks et al. [10] differentiated between the two by suggesting that “taskwork represents what it is that teams are doing, whereas teamwork describes how they are doing it with each other” (p. 357). Specifically, while taskwork involves the execution of core technical competencies within a given domain, teamwork refers to the range of interactive and interdependent behavioral processes among team members that convert team inputs (e.g., member characteristics, organizational funding, team member composition) into outcomes (e.g., team performance, team member satisfaction) [2, 10]. Some examples of teamwork (and respective comparisons to taskwork) include: the seamless communication between a surgeon, nurse, and anaesthesiologist, rather than the technical competencies of these practitioners; the synergy between a quarterback and receiver to complete a passing play, rather than their respective skill sets related to throwing or catching a football; the collaborative adjustments a flight crew makes in response to adverse weather or system problems, rather than each individual’s aviation skills; and so forth. Research from an assortment of studies indicates that teamwork—the focus of the current paper—is positively related to important team effectiveness variables, including team performance, group cohesion, collective efficacy, and member satisfaction [1].

Teamwork has been conceptualized within several theoretical models. For example, in their review, Rousseau et al. [2] reported that 29 frameworks related to teamwork have been published. Although there is much overlap across these models, there are also some notable differences, relating to the number of dimensions of teamwork being conceptualized as well as the specific labelling of these dimensions. One thing that is generally agreed upon, however, is that teamwork is comprised of multiple observable and measurable behaviors. For instance, two highly cited frameworks by Marks et al. [10] and Rousseau et al. [2] consist of 10 and 14 dimensions of teamwork, respectively. In general, teamwork models focus on behaviors that function to (a) regulate a team’s performance and/or (b) keep the team together. These two components coincide with the two respective processes that Kurt Lewin, the widely recognized father of group dynamics, originally proposed all groups to be involved in: locomotion and maintenance [11].

With regard to regulating team performance (i.e., locomotion), teamwork behaviors include those that occur (a) before/in preparation for team task performance, (b) during the execution of team performance, and (c) after completing the team task [2]. First, teamwork behaviors that occur before/in preparation for team task performance include the active process of defining the team’s overall purpose/mission, setting team goals, and formulating action plans/strategies for how goals and broader purposes will be achieved. These behaviors help ensure that all team members are clear in terms of what is required of them in order for the team to function effectively. Second, teamwork behaviors that occur during the execution of team tasks include actions that correspond to members’ communication, coordination, and cooperation with each other. At this stage, team members translate what they have previously planned (during the preparation phase) into action. Third, teamwork behaviors that occur after completing the team task (i.e., reflection) include monitoring important situations and conducting post-task appraisals of the team’s performance and system variables (e.g., internal team resources, broader environmental conditions), solving problems that are precluding team goal attainment, making innovative adjustments to the team’s strategy, and providing/receiving verbal and behavioral assistance to/from teammates. Hence, team members determine whether their actions have moved them closer towards accomplishing the team goals and objectives, and whether any modifications are required in order to facilitate future success. In addition to these three dimensions concerned with the regulation of team performance, a fourth dimension of teamwork involves behaviors that function to keep the team together (i.e., maintenance). These behaviors focus on the team’s interpersonal dynamics, and include the management of interpersonal conflict between members and the provision of social support for members experiencing personal difficulties. Managing interpersonal dynamics is critical, as it is theorized that teams cannot operate effectively when these issues are present [2].

How Can Teamwork Be Trained?

Teamwork interventions have utilized a number of training methods in order to target the regulation of team performance (i.e., the preparation, execution, and reflection dimensions) and the management of team maintenance (i.e., the interpersonal dynamics dimension). These intervention strategies generally fall under one of four categories. First, the most basic approach to training and developing teamwork involves providing didactic education to team members in a classroom-type setting, such as lecturing about the importance of providing social support within the team or promoting ways to manage interpersonal conflict among teammates. This type of training has been found to be useful for enhancing team effectiveness (e.g., [12]). A second category of team training involves utilizing a more interactive workshop-style format, wherein team members take part in various group activities, such as having discussions about the team’s purposes and goals (e.g., [13]) or working through case studies together (e.g., [14]). The third broad category of team training involves simulation training, wherein teams experientially enact various teamwork skills, such as interpersonal communication and coordination, in an environment that mimics upcoming team tasks (e.g., airline simulators or medical patient manikins). Although often used as a means of fostering taskwork competencies (e.g., teaching new surgeons how to perform the technical skills of a medical operation), simulation training has been found to be an efficacious approach to teamwork intervention (e.g., [15]). In addition to these three training approaches that occur outside of the team task environment (i.e., training within classroom and simulation settings), teamwork can also be fostered by incorporating team reviews in situ (i.e., where the team actually performs its tasks), which allows teams to monitor/review their quality of teamwork on an ongoing basis. These team reviews involve some form of team briefing before (e.g., creating action plans), during (e.g., monitoring team members’ actions), and/or after (e.g., assessing the team’s performance) team task execution, and have also been shown to be efficacious in previous studies (e.g., [16]).

The effectiveness of teamwork interventions can be determined with an assortment of criteria, including team- and individually-based behaviors, cognitions, and affective states. Hackman and Katz [17] posit that team effectiveness can be determined by examining the extent to which the team has achieved its a priori objectives. Since the broad purpose of forming a team is to produce something of value, it is perhaps unsurprising that the most widely tested criterion of team effectiveness has been team performance [18–20]. Thus, although teams come from an array of settings and are idiosyncratic in their own ways, one question that essentially all teams address at some point during their tenure is whether they are performing well. For example, is that road construction crew fixing potholes adequately? Does the local soccer squad have a respectable winning percentage? Has an elected political party successfully completed the tasks for which it campaigned? Did a special operations corps achieve the mission it set out to accomplish? When taken in concert, questions related to team performance are often of central interest when characterizing a team’s effectiveness.

In addition to assessing the outcome variable of team performance, researchers have also been interested in whether teamwork training actually improves teamwork itself. The efficacy of these interventions can be determined with a number of objective (e.g., products produced by an industry team), self-report (e.g., questionnaires regarding perceived social support amongst team members), and third-party assessments (e.g., expert ratings of team behaviors). Both general/omnibus measures of teamwork (e.g., [21]) as well as those assessing specific dimensions of teamwork (e.g., communication [22]) have been operationalized to examine the effectiveness of these interventions. For example, do team goal setting activities actually result in members creating and pursuing effective team goals? Does simulation training improve the requisite coordination processes among aviation cockpit crews? Has a didactic lecture contributed to improved conflict management among team members? Answering these types of questions is important for determining whether an intervention is actually efficacious in changing the variable that is targeted for improvement (i.e., teamwork behaviors).

The Current Review

Prior to outlining the purposes of this systematic review, it is important to recognize that previous quantitative reviews have been conducted that addressed—to some degree—teamwork training. In preparation for this systematic review, we conducted a scoping review, which revealed that eight previous meta-analyses have assessed teamwork intervention studies in some way. However, these reviews were delimited based on various sample and/or intervention characteristics. For example, some reviews included studies that were only conducted with certain team types (e.g., intact teams [23]) or within a particular context (e.g., sports [24]; medical teams [25]). Others were delimited to specific training programs/strategies that were restricted to a narrow range of teamwork strategies (e.g., [23, 25–29]). Finally, studies that used a combination of teamwork and taskwork intervention components have been systematically reviewed [30]; however, these types of interventions provide a limited ability to determine the extent to which the resulting effects were due to teamwork training versus taskwork training.

It should also be noted that all but one [23] of these previous reviews pooled together studies that included a control condition (i.e., wherein teams do not receive any type of teamwork training) and those that did not (as mentioned above, that review only analyzed the effects of certain teamwork strategies). This is an important consideration, as it has been suggested that controlled and uncontrolled studies should not be combined into the same meta-analysis due to differences in study quality (which is a major source of heterogeneity) and since stronger conclusions can be derived from controlled interventions compared to uncontrolled interventions (e.g., [31]). Therefore, while previous systematic reviews have provided valuable contributions to the teamwork literature, a systematic review that assesses the effects of controlled teamwork interventions across a range of contexts and team types, including interventions that targeted diverse dimensions of teamwork, appears warranted. Such a review provides a more comprehensive assessment of the efficacy of teamwork interventions, while also having the capacity to examine the potential moderating effects of various sample, intervention, and measurement characteristics. Moreover, by including only controlled studies, one is able to draw stronger conclusions regarding the observed effects.

The overall purpose of this study was to better understand the utility of teamwork training for enhancing team effectiveness. Specifically, a meta-analysis was conducted on controlled studies (i.e., comparing teams who have received teamwork training with those who have not) that have examined the effects of teamwork interventions on teamwork processes and/or team performance. To better disentangle the effectiveness of these studies, we also sought to assess potential moderators of these main effects; that is, to determine whether there are certain conditions under which the independent variable of teamwork training more strongly (or weakly) causally influences the dependent variables of teamwork behaviors or team performance [32]. The specific moderators that we assessed included: (a) the team context/field of study, (b) the type of teams that were trained, (c) the primary type of intervention method employed, (d) the dimensions of teamwork that were targeted in the intervention, (e) the number of dimensions targeted, (f) the types of measures used to quantify the training effects, and (g) in studies where teamwork was assessed as an outcome variable, the dimensions of teamwork that were measured. It was hypothesized that teamwork training would have a positive and significant effect on both teamwork and team performance and that these effects would be evident across a range of the aforementioned sample, intervention, and measurement characteristics/conditions.

Methods

Literature Search

Searches for potential articles were conducted in the following databases: PsycInfo, Medline, Cochrane Central Register of Controlled Trials, SportDiscus, and ProQuest Dissertations and Theses. Hand searches were also conducted across thirteen journals that typically publish articles on group dynamics (e.g., Group Dynamics: Theory, Research, and Practice; Small Group Research; Journal of Applied Psychology; Personnel Psychology; Human Factors; Academy of Management Journal; Journal of Sport & Exercise Psychology). In each database and journal search, the following combination of search terms was used: (team OR interprofessional OR interdisciplinary) AND (intervention OR training OR building OR simulation) AND (teamwork OR mission analysis OR goal specification OR goal setting OR planning OR strategy OR coordination OR cooperation OR communication OR information exchange OR information sharing OR monitoring OR problem solving OR backing up OR coaching OR innovation OR adaptability OR feedback OR support OR conflict management OR situation awareness OR confidence building OR affect management). These terms were based on various models of teamwork that exist within the literature (see Rousseau et al. [2] for an overview of these models). An additional search was conducted within these databases and journals using the search terms (TeamSTEPPS OR Crew Resource Management OR SBAR [Situation-Background-Assessment-Recommendation]), as several articles in the initial search used these specific training programs. We also searched the reference sections of the articles from past teamwork training review papers as well as from articles that initially met inclusion criteria to determine whether any additional articles could be retrieved. The searches were conducted in September 2015 and no time limits were placed on the search strategy. Each article was first subjected to title elimination, then abstract elimination, and finally full-text elimination.
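To illustrate how a Boolean query of this kind can be assembled, a minimal Python sketch follows. The term lists are taken verbatim from the search strategy above; the joining function is our own illustration, since each database has its own query syntax and this is not the exact string submitted to any one engine.

# Sketch: assembling the Boolean search string used across the databases.
population = ["team", "interprofessional", "interdisciplinary"]
intervention = ["intervention", "training", "building", "simulation"]
outcomes = [
    "teamwork", "mission analysis", "goal specification", "goal setting",
    "planning", "strategy", "coordination", "cooperation", "communication",
    "information exchange", "information sharing", "monitoring",
    "problem solving", "backing up", "coaching", "innovation", "adaptability",
    "feedback", "support", "conflict management", "situation awareness",
    "confidence building", "affect management",
]

def or_block(terms):
    """Wrap a list of terms in a parenthesized OR block."""
    return "(" + " OR ".join(terms) + ")"

query = " AND ".join(or_block(t) for t in [population, intervention, outcomes])
print(query)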

Eligibility Criteria

To be included in the meta-analysis, a study needed to examine the effects of teamwork training by comparing teams in an experimental condition (i.e., those who received teamwork training) with those in a control condition (i.e., where teams did not receive teamwork training). Cross-sectional/non-experimental studies were excluded, as were intervention studies that did not include a control condition. As this review was only concerned with teamwork interventions, studies that focused on training taskwork—whether independent of, or in addition to, a teamwork intervention—were excluded. For example, as previously mentioned, simulation-based training (SBT) has been used as a means of training individuals to perform technical skills and also to enhance teamwork. In order for an SBT intervention to be included in this meta-analysis, it had to be clear that only teamwork (not technical skills) was being targeted during training. In order to address our primary research question, the study had to provide data on at least one teamwork dimension and/or team performance. The study also needed to provide sufficient statistics to compute an effect size. In cases of insufficient data, corresponding authors were contacted for this information. The articles were delimited to those published in the English language.

Data Analysis

Articles that met the aforementioned eligibility criteria were extracted for effect sizes and coded independently with respect to seven moderators by two of the authors (DM and GR). Interrater reliability for the coding of these moderators was over 90%, kappa (SE) = 0.80 (0.01). The moderators examined were based on a scoping review (the purpose of which included identifying pertinent characteristics that were commonly reported in previous teamwork intervention research), which was conducted in preparation for this systematic review. The moderators that were examined in this review included (1) the context within which an intervention was conducted (health care, aviation, military, academia, industry, or laboratory experiment), (2) the type of team targeted (intact or new), (3) the primary training method applied to conduct the intervention (didactic education, workshop, simulation, or team reviews), (4) the dimension(s) of teamwork (preparation, execution, reflection, and/or interpersonal dynamics) targeted in the intervention as well as (5) the number of dimensions targeted (between one and four), (6) the type of measure used to derive effect sizes (self-report, third party, or objective measures), and—when teamwork was assessed as the criterion variable—(7) the specific dimension(s) of teamwork that were measured (general, preparation, execution, reflection, and interpersonal dynamics).
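As an illustration of this interrater-reliability check, the sketch below computes percent agreement and Cohen's kappa for two coders' labels on one moderator. The labels are invented for the example, and scikit-learn's cohen_kappa_score is one standard implementation; the paper does not specify what software was used for this step.

# Sketch: interrater reliability for moderator coding, assuming two coders'
# category labels are stored in parallel lists (labels here are invented).
from sklearn.metrics import cohen_kappa_score

coder_dm = ["workshop", "simulation", "didactic", "workshop", "team reviews"]
coder_gr = ["workshop", "simulation", "workshop", "workshop", "team reviews"]

# Percent agreement (the paper reports >90% across the seven moderators).
agreement = sum(a == b for a, b in zip(coder_dm, coder_gr)) / len(coder_dm)

# Chance-corrected agreement (the paper reports kappa = 0.80).
kappa = cohen_kappa_score(coder_dm, coder_gr)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")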

Once coded, data were entered into the software Comprehensive Meta-Analysis, Version 2 [33] and analyzed as a random-effects model (DerSimonian and Laird approach). This type of model assumes that there is heterogeneity in the effect sizes across the included studies and is the appropriate model to use in social science research, as opposed to a fixed-effects model (which assumes that effect sizes do not vary from study to study) [34, 35]. Where possible, effect sizes for each study were derived from means, standard deviations, and sample sizes at baseline and post-intervention [34, 36]. If these statistics were not fully provided, they were supplemented with F-statistics, t scores, correlations, and p-values to compute the effect size. Each study was given a relative weight based on its precision, which is determined by the study’s sample size, standard error, and confidence interval (i.e., the more precise the data, the larger the relative study weight) [34].
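The following Python sketch illustrates the random-effects approach just described: a standardized mean difference and its variance are computed from group summary statistics, and studies are then pooled with DerSimonian and Laird weights. This is an illustrative re-implementation with invented numbers, not the Comprehensive Meta-Analysis source or the paper's data.

# Sketch: DerSimonian-Laird random-effects pooling of standardized mean
# differences (all inputs below are invented for illustration).
import numpy as np

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference between experimental and control groups."""
    sd_pooled = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))  # approximate variance
    return d, var_d

def dersimonian_laird(d, v):
    """Random-effects pooled estimate; studies weighted by their precision."""
    d, v = np.asarray(d), np.asarray(v)
    w = 1.0 / v                                  # fixed-effect weights
    d_fixed = np.sum(w * d) / np.sum(w)
    q = np.sum(w * (d - d_fixed) ** 2)           # heterogeneity statistic
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(d) - 1)) / c)      # between-study variance
    w_star = 1.0 / (v + tau2)                    # random-effects weights
    d_pooled = np.sum(w_star * d) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return d_pooled, se, d_pooled - 1.96 * se, d_pooled + 1.96 * se

ds, vs = zip(*[cohens_d(5.1, 1.2, 20, 4.3, 1.1, 20),
               cohens_d(3.9, 0.9, 35, 3.6, 1.0, 33)])
print(dersimonian_laird(ds, vs))  # pooled d, SE, and 95% CI bounds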

In instances where a study provided data to calculate multiple effect sizes (such as when several measures of the criterion variable—teamwork or team performance—were examined), these effects were combined into one overall effect size statistic (i.e., a weighted average) for that study. This was done to ensure that studies with multiple measures of teamwork or team performance were not given greater weight compared to studies that only provided one effect size (i.e., only had one measure of performance or teamwork), which could potentially skew the overall results [34]. The exception to this was when articles reported the effects of more than one intervention (i.e., had multiple experimental conditions), each of which had a unique teamwork training protocol. In these cases, an effect size from each intervention was computed. Thus, these articles would contribute multiple effect sizes to the total number of comparisons within the meta-analysis. To correct for potential unit-of-analysis errors in these particular articles, the sample size of the control condition was divided by the number of within-study comparisons [31]. For example, if three different types of teamwork interventions were compared to one control condition (e.g., which had a sample size of 30 participants), the n of the control condition was divided by 3 (i.e., 30/3 = 10) when calculating the effect sizes of those interventions. Cohen’s d was used as the effect size metric to represent the standardized effect (i.e., the average magnitude of effectiveness) of teamwork interventions on teamwork and team performance [37]. Standard errors and 95% confidence intervals were computed to test for the accuracy of the standardized effects obtained.
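A small sketch of these two unit-of-analysis safeguards, with invented numbers: a study's multiple effect sizes are combined into one inverse-variance weighted average, and a shared control group's n is split across the study's comparisons.

# Sketch: within-study averaging and control-group splitting (invented data).
import numpy as np

def combine_within_study(effects, variances):
    """Weighted average so a multi-measure study contributes one effect size."""
    w = 1.0 / np.asarray(variances)
    return float(np.sum(w * np.asarray(effects)) / np.sum(w))

# Three measures from one study collapse to a single effect size.
print(combine_within_study([0.52, 0.61, 0.40], [0.04, 0.05, 0.06]))

# Three interventions sharing one 30-person control group: 30 / 3 = 10 each.
n_control, n_comparisons = 30, 3
print(n_control / n_comparisons)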

To reduce heterogeneity and improve the interpretability of the results, we pooled studies into those that measured teamwork as the criterion variable and those that measured team performance. Pooling studies in this manner not only reduces heterogeneity but also allowed us to identify the extent to which teamwork interventions impact team performance and, separately, the extent to which they affect teamwork processes. Heterogeneity within the meta-analysis was also assessed by computing a Q value—which estimates the variability in the observed effect sizes across studies—and an I² statistic—which estimates the ratio of the true heterogeneity to the total observed variation across studies. High Q and I² statistics can be problematic for interpreting the results of a meta-analysis and can also indicate that the meta-analysis includes outlier studies. We also planned to identify and exclude outliers from subsequent moderator analyses in two ways. First, sensitivity analyses were carried out by removing a single intervention from the meta-analysis and noting the resulting effect size—this estimates the impact that each individual intervention has on the overall effect size of teamwork or team performance. If the resulting effect size with an intervention removed (i.e., k − 1) is substantially different from the effect size with that intervention present, this may suggest that it is an outlier and needs to be removed [34]. Second, we noted any studies that had abnormally high effect sizes and standardized residuals (above 3.0), especially when these values were accompanied by narrow confidence intervals. If heterogeneity (Q and I²) is substantially reduced upon removal of a study, this further confirms that the study is an outlier and should be omitted from subsequent subgroup/moderator analyses.
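The heterogeneity statistics and the leave-one-out sensitivity check can be sketched as follows. The values are illustrative only, and the helper mirrors the standard Q and I² formulas rather than any code used by the authors.

# Sketch: Cochran's Q, I-squared, and a leave-one-out sensitivity analysis
# following the outlier-screening logic above (effect sizes are invented).
import numpy as np

def q_and_i2(d, v):
    """Q (variability across studies) and I^2 (true/total variation ratio)."""
    d, w = np.asarray(d), 1.0 / np.asarray(v)
    d_bar = np.sum(w * d) / np.sum(w)
    q = float(np.sum(w * (d - d_bar) ** 2))
    i2 = max(0.0, (q - (len(d) - 1)) / q) * 100 if q > 0 else 0.0
    return q, i2

d = [0.55, 0.60, 0.48, 2.9]          # the last value mimics an outlier
v = [0.04, 0.05, 0.06, 0.03]

print("full pool:", q_and_i2(d, v))
for i in range(len(d)):              # heterogeneity with each study removed
    rest = [j for j in range(len(d)) if j != i]
    print(f"without study {i}:",
          q_and_i2([d[j] for j in rest], [v[j] for j in rest]))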

Once the two pools of studies were produced, bias within each pool was assessed. First, publication bias was examined by calculating a fail-safe N statistic, which estimates the number of unpublished studies with null findings that would have to exist to reduce the obtained effect size to zero [38]. If this number is sufficiently large—Rosenberg [39] recommends a critical value of 5N + 10—then the probability of such a number of studies existing is considered to be low. For example, if 20 studies were included in a meta-analysis, then the resulting fail-safe N should be larger than 110 (i.e., 5 × 20 + 10); if this value were not larger than 110, then publication bias would be likely within this pool of studies. We also obtained two funnel plots (one for studies where teamwork was the outcome variable and one for team performance as the outcome) to provide a visual depiction of potential publication bias. We then conducted an Egger’s test as a measure of symmetry for these two funnel plots. If this test statistic is significant (p < 0.05), the distribution around the effect size is asymmetric and publication bias is likely present [34].
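Both publication-bias checks lend themselves to short sketches. Below, Rosenberg's 5N + 10 criterion is applied to one of the fail-safe N values reported later in the paper, and an Egger-style regression (the standardized effect d/SE regressed on precision 1/SE, with the intercept tested for asymmetry) is run on invented data via statsmodels; this is a generic textbook formulation, not necessarily the exact variant used by Comprehensive Meta-Analysis.

# Sketch: fail-safe N criterion and an Egger-style funnel asymmetry test.
import numpy as np
import statsmodels.api as sm

def failsafe_is_sufficient(fail_safe_n, n_studies):
    """True if the fail-safe N exceeds Rosenberg's 5N + 10 critical value."""
    return fail_safe_n > 5 * n_studies + 10

print(failsafe_is_sufficient(3598, 39))   # teamwork pool: 3598 > 205 -> True

d = np.array([0.55, 0.60, 0.48, 0.70, 0.35])    # invented effect sizes
se = np.array([0.20, 0.22, 0.25, 0.18, 0.30])   # invented standard errors
X = sm.add_constant(1.0 / se)                   # precision as the predictor
model = sm.OLS(d / se, X).fit()
intercept, p_value = model.params[0], model.pvalues[0]
print(f"Egger intercept = {intercept:.3f}, p = {p_value:.3f}")  # p < 0.05 suggests asymmetry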

Results

Literature Search

The literature search from the five databases returned 22,066 articles, while the hand searches of the 13 journals returned 3797 articles, vetting of studies from previous team training reviews returned 191 articles, and the ancestry search of reference lists returned 471 articles (see Fig 1). After removing duplicates, 16,849 articles were subject to title and abstract screening, where they were dichotomously coded as ‘potentially relevant’ or ‘clearly not relevant’. The 1517 potentially relevant articles were then full-text reviewed and coded as meeting eligibility criteria or as ineligible for the following reasons: (1) not a teamwork intervention; (2) teamwork-plus-taskwork intervention; (3) insufficient statistics to compute an effect size; (4) not including a measure of teamwork or team performance; or (5) not including a control group. As a result of this eligibility coding, 51 articles were included in the meta-analysis. Thirteen of these studies reported results on two or more interventions, bringing the total number of comparisons (k) to 72, with 8439 participants (4966 experimental, 3473 control). See S1 Table for descriptions of each study with regard to study context, type of team and participants, targeted teamwork dimensions of the intervention, number of effect sizes, the criteria measured, and an overview of the intervention.

Fig 1. Results of Literature Search (PRISMA Flow Diagram).
doi:10.1371/journal.pone.0169604.g001

Summary Statistics

Results of the overall effect of teamwork interventions on teamwork processes, along with summary statistics and sensitivity analyses (i.e., the final column marked ‘ES with intervention removed’), for this pool of studies are presented in Table 1. This pool included a total of 39 interventions from 33 studies. The results revealed that teamwork interventions had a significant, medium-to-large effect on teamwork, d (SE) = 0.683 (0.13), 95% CI = 0.43–0.94, Z = 5.23, p < 0.001; Q (df) = 660.7 (38), I² = 94.2. The fail-safe N was 3598, which is sufficiently large, as it exceeds the critical value of 205 (5 × 39 + 10).

Table 1. Summary Results of Interventions Assessing the Effects of Teamwork Training on Teamwork.

Study | Relative Weight | Effect Size (SE) | 95% CI (lower, upper) | Z-value | p-value | ES with intervention removed
Aaron 2014 [13] a | 2.43 | 1.432 (.35) | .74, 2.13 | 4.04 | < .001 | 0.67
Aaron 2014 [13] b | 2.48 | .869 (.33) | .22, 1.52 | 2.61 | .009 | 0.68
Becker 2005 [40] | 2.75 | .635 (.21) | .22, 1.05 | 3.02 | .003 | 0.69
Beck-Jones 2004 [41] a | 2.70 | -.030 (.24) | -.50, .44 | -0.13 | .898 | 0.70
Beck-Jones 2004 [41] b | 2.69 | -.003 (.24) | -.47, .47 | -0.01 | .990 | 0.70
Beranek 2005 [42] | 2.67 | .649 (.25) | .16, 1.13 | 2.62 | .009 | 0.68
Bjornberg 2014 [9] | 2.83 | .080 (.16) | -.23, .39 | 0.50 | .615 | 0.69
Brannick 2005 [5] | 2.72 | 1.229 (.23) | .79, 1.67 | 5.47 | < .001 | 0.69
Bushe 1995 [43] a | 2.53 | .405 (.31) | -.20, 1.01 | 1.31 | .192 | 0.69
Bushe 1995 [43] b | 2.53 | .534 (.31) | -.08, 1.14 | 1.71 | .086 | 0.69
Cheater 2005 [12] | 2.82 | .336 (.17) | .00, .67 | 1.97 | .049 | 0.69
Clay-Williams 2013 [44] a | 2.04 | .531 (.51) | -.46, 1.53 | 1.05 | .296 | 0.69
Clay-Williams 2013 [44] b | 2.06 | -.213 (.50) | -1.20, .77 | -0.43 | .671 | 0.70
Clay-Williams 2013 [44] c | 2.12 | 0.000 (.48) | -.94, .94 | 0.00 | 1.00 | 0.70
Dalenberg 2009 [45] | 2.82 | 1.001 (.17) | .68, 1.33 | 6.02 | < .001 | 0.67
Deneckere 2013 [46] | 2.92 | .129 (.09) | -.04, .29 | 1.52 | .129 | 0.70
Dibble 2010 [47] | 2.92 | -.242 (.09) | -.42, -.07 | -2.72 | .007 | 0.71
Eden 1986 [48] | 2.92 | .427 (.09) | .07, .42 | 2.73 | .006 | 0.70
Ellis 2005 [14] | 2.88 | .792 (.13) | .54, 1.05 | 6.14 | < .001 | 0.68
Emmert 2011 [49] | 2.54 | .763 (.31) | .16, 1.36 | 2.48 | .013 | 0.68
Entin 1999 [50] | 2.32 | .771 (.40) | -.01, 1.55 | 1.93 | .054 | 0.68
Friedlander 1967 [51] | 2.72 | .495 (.22) | .06, .94 | 2.21 | .027 | 0.69
Green 1994 [52] a | 1.91 | .665 (.56) | -.44, 1.76 | 1.19 | .236 | 0.68
Green 1994 [52] b | 1.87 | 1.058 (.58) | -.08, 2.20 | 1.82 | .069 | 0.68
Jankouskas 2010 [7] | 2.22 | .778 (.44) | -.08, 1.64 | 1.77 | .077 | 0.68
Kim 2014 [53] | 2.65 | .062 (.26) | -.45, .57 | 0.24 | .813 | 0.70
Marshall 2009 [22]* | 2.70 | 3.277 (.33) | 2.65, 3.95 | 9.90 | < .001 | 0.61
Martinez-Moreno 2015 [54] | 2.86 | .503 (.14) | .23, .78 | 3.63 | < .001 | 0.69
Morey 2002 [3]* | 2.93 | 1.896 (.08) | 1.75, 2.05 | 24.83 | < .001 | 0.64
O’Leary 2011 [21] | 2.82 | .426 (.17) | .10, .76 | 2.54 | .011 | 0.69
Padmo Putri 2012 [6] | 2.82 | -.097 (.17) | -.42, .23 | -0.58 | .561 | 0.71
Prichard 2007 [55] | 2.40 | 1.981 (.37) | 1.26, 2.70 | 5.38 | < .001 | 0.65
Rapp 2007 [56] | 2.61 | .535 (.28) | -.01, 1.08 | 1.93 | .053 | 0.69
Shapiro 2004 [57] | 2.03 | .689 (.52) | -.32, 1.70 | 1.34 | .181 | 0.68
Smith-Jentsch 2008 [4] | 2.63 | 1.103 (.27) | .58, 1.63 | 4.13 | < .001 | 0.67
Thomas 2007 [58] | 2.39 | .891 (.37) | .16, 1.62 | 2.40 | .016 | 0.68
Volpe 1996 [59] | 2.71 | .450 (.23) | .00, .90 | 1.97 | .049 | 0.69
Weaver 2010 [60] | 2.41 | .580 (.36) | -.13, 1.29 | 1.61 | .109 | 0.69
Weller 2014 [61] | 2.64 | 1.563 (.26) | 1.05, 2.08 | 5.92 | < .001 | 0.66
OVERALL | 100 | .683 (.13) | .43, .94 | 5.23 | < .001 | —

Note. a, b, c = intervention groups within study; SE = standard error; CI = confidence interval; ES = effect size.
* = Study identified as an outlier and removed from subsequent moderator analyses.
The final column gives the results of the sensitivity analysis for each respective intervention (the overall ES with that intervention removed).

doi:10.1371/journal.pone.0169604.t001

The funnel plot for this pool of studies is presented in Fig 2. Egger’s value for this funnel plot was not significant (B = 0.364, SE = 1.30, 95% CI = -2.26, 2.99, t = 0.28, p = 0.78), which also suggests that publication bias was not present. Two studies were identified as outliers within this pool of studies: Morey et al. [3] and Marshall et al. [22]. The resulting effect size when these studies were excluded was d (SE) = 0.550 (0.08), 95% CI = 0.39–0.71, Z = 6.73, p < 0.001; Q (df) = 187.53 (36), I² = 80.8. Subsequent moderator analyses were conducted with these two outlier studies omitted.

Fig 2. Funnel Plot for Studies Assessing Teamwork. Circles filled with black indicate outlier studies.
doi:10.1371/journal.pone.0169604.g002

Results of the overall effect of teamwork interventions on team performance, as well as summary statistics and sensitivity analyses (i.e., the final column marked ‘ES with intervention removed’), for this pool of studies are presented in Table 2. This pool of studies included a total of 50 interventions from 32 studies. It was shown that teamwork interventions had a significant, large effect on team performance, d (SE) = 0.919 (0.14), 95% CI = 0.65–1.19, Z = 6.72, p < 0.001; Q (df) = 851.3 (49), I² = 94.2. The fail-safe N was 6692, which is sufficiently large, as it exceeds the critical value of 260 (5 × 50 + 10). The funnel plot for this pool of studies is presented in Fig 3. Egger’s value for this funnel plot was not significant (B = 0.131, SE = 1.19, 95% CI = -2.26, 2.54, t = 0.11, p = 0.91), which also implies that bias was not present. There were five outlier interventions (from four studies) in this pool of studies that assessed team performance: Morey et al. [3], Smith-Jentsch et al. [4], one of the interventions from Buller and Bell [63] (the team-building condition), and both interventions from Bushe and Coetzer [43]. When these outliers were removed, the resulting effect size was d (SE) = 0.582 (0.06), 95% CI = 0.47–0.69, Z = 10.30, p < 0.001; Q (df) = 101.1 (44), I² = 56.5. Subsequent moderator analyses were conducted with these five interventions omitted.

Fig 3. Funnel plot for studies assessing team performance. Circles filled with black indicate outlier studies.
doi:10.1371/journal.pone.0169604.g003


Table 2. Summary Results of Interventions Assessing the Effects of Teamwork Training on Team Performance.

Study | Relative Weight | Effect Size (SE) | 95% CI (lower, upper) | Z-value | p-value | ES with intervention removed
Beck-Jones 2004 [41] a | 2.16 | .502 (.18) | .35, 1.04 | 3.91 | < .001 | 0.93
Beck-Jones 2004 [41] b | 2.15 | .902 (.18) | .33, 1.30 | 3.83 | < .001 | 0.92
Bjornberg 2014 [9] | 2.24 | .466 (.16) | .15, .78 | 2.91 | .004 | 0.93
Brannick 2005 [5] | 2.20 | .237 (.21) | -.17, .64 | 1.15 | .249 | 0.94
Brown 2003 [62] | 2.25 | .267 (.15) | -.02, .56 | 1.80 | .072 | 0.94
Buller 1986 [63] a | 1.33 | 1.435 (.77) | -0.08, 2.95 | 1.86 | .063 | 0.91
Buller 1986 [63] b* | 1.11 | 3.72 (.94) | 1.88, 5.56 | 3.96 | < .001 | 0.89
Buller 1986 [63] c | 1.46 | 1.58 (.69) | .23, 2.94 | 2.30 | .022 | 0.91
Bushe 1995 [43] a* | 1.67 | 4.57 (.56) | 3.47, 5.66 | 8.19 | < .001 | 0.86
Bushe 1995 [43] b* | 1.47 | 5.96 (.68) | 4.63, 7.29 | 8.75 | < .001 | 0.84
Cannon-Bowers 1998 [64] | 2.22 | .46 (.19) | .09, .82 | 2.45 | .014 | 0.93
Chang 2008 [65] | 2.04 | 1.344 (.33) | .70, 1.99 | 4.09 | < .001 | 0.91
Dalenberg 2009 [45] | 2.24 | .653 (.16) | .34, .97 | 4.06 | < .001 | 0.93
Dibble 2010 [47] | 2.29 | .181 (.09) | .01, .36 | 2.04 | .042 | 0.94
Entin 1999 [50] | 1.92 | .927 (.41) | .13, 1.72 | 2.88 | .022 | 0.92
Fandt 1990 [66] | 2.25 | .095 (.15) | -.19, .38 | 0.65 | .518 | 0.94
Green 1994 [52] a | 1.67 | .655 (.56) | -.44, 1.75 | 1.17 | .243 | 0.92
Green 1994 [52] b | 1.62 | 1.212 (.59) | .05, 2.37 | 2.05 | .040 | 0.91
Haslam 2009–1 [67] a | 2.08 | .223 (.30) | -.37, .82 | 0.73 | .464 | 0.93
Haslam 2009–1 [67] b | 2.06 | .690 (.31) | .07, 1.31 | 2.20 | .028 | 0.92
Haslam 2009–2 [67] a | 2.02 | .941 (.34) | .27, 1.61 | 2.76 | .006 | 0.92
Haslam 2009–2 [67] b | 2.04 | .610 (.33) | -.03, 1.25 | 1.87 | .062 | 0.93
Haslam 2009–2 [67] c | 2.02 | .957 (.35) | .28, 1.63 | 2.78 | .005 | 0.92
Haslam 2009–2 [67] d | 2.03 | .963 (.34) | .31, 1.62 | 2.87 | .004 | 0.92
Ikomi 1999 [68] | 2.06 | 1.008 (.32) | .39, 1.63 | 3.18 | .001 | 0.92
Jankouskas 2010 [7] | 1.86 | -.173 (.44) | -1.04, .70 | -0.39 | .696 | 0.94
Jarrett 2012 [69] a | 2.22 | .243 (.19) | -.12, .61 | 1.31 | .191 | 0.94
Jarrett 2012 [69] b | 2.21 | .834 (.19) | .46, 1.21 | 4.34 | < .001 | 0.92
Jarrett 2012 [69] c | 2.22 | .358 (.19) | -.01, .72 | 1.92 | .055 | 0.93
Jarrett 2012 [69] d | 2.21 | .940 (.19) | .56, 1.32 | 4.84 | < .001 | 0.92
Kring 2005 [70] a | 2.00 | .062 (.36) | -.64, .76 | 0.17 | .862 | 0.94
Kring 2005 [70] b | 2.00 | -.092 (.36) | -.79, .61 | -0.26 | .795 | 0.94
Longenecker 1994 [71] | 2.03 | 1.89 (.33) | 1.24, 2.54 | 5.66 | < .001 | 0.90
Morey 2002 [3]* | 2.29 | 2.781 (.09) | 2.61, 2.95 | 31.51 | < .001 | 0.80
Padmo Putri 2012 [6] | 2.23 | .542 (.17) | .21, .87 | 3.21 | .001 | 0.93
Rapp 2007 [56] | 2.12 | .254 (.27) | -.28, .79 | 0.93 | .353 | 0.93
Schurig 2013 [72] a | 2.26 | .513 (.27) | -.02, 1.05 | 1.88 | .061 | 0.93
Schurig 2013 [72] b | 2.26 | .688 (.28) | .15, 1.23 | 2.49 | .013 | 0.93
Siegel 1973 [73] | 1.99 | .594 (.36) | -.11, 1.30 | 1.64 | .100 | 0.93
Sikorski 2012 [74] | 2.26 | .272 (.14) | -.01, .56 | 1.89 | .059 | 0.94
Smith-Jentsch 2008 [4]* | 1.91 | 3.729 (.41) | 2.92, 4.54 | 9.07 | < .001 | 0.86
Smith-Jentsch 1996 [75] a | 1.74 | .206 (.52) | -.81, 1.22 | 0.40 | .690 | 0.93
Smith-Jentsch 1996 [75] b | 1.74 | .025 (.52) | -.99, 1.04 | 0.05 | .961 | 0.94
Smith-Jentsch 1996 [75] c | 1.71 | .901 (.54) | -.15, 1.95 | 1.68 | .092 | 0.92
Stout 1997 [76] | 2.04 | .984 (.33) | .34, 1.63 | 3.00 | .003 | 0.92
Villado 2013 [16] | 2.19 | .834 (.22) | .41, 1.36 | 3.88 | < .001 | 0.92
Volpe 1996 [59] | 2.16 | .877 (.24) | .28, 1.12 | 3.70 | < .001 | 0.92
Wegge 2005 [77] a | 1.91 | 1.004 (.41) | .19, 1.81 | 2.44 | .015 | 0.92
Wegge 2005 [77] b | 1.90 | .682 (.42) | -.14, 1.50 | 1.64 | .102 | 0.92
Wegge 2005 [77] c | 1.95 | .487 (.39) | -.28, 1.25 | 1.25 | .212 | 0.93
OVERALL | 100 | .919 (.14) | .65, 1.19 | 6.72 | < .001 | —

Note. a, b, c, d = intervention groups within study; SE = standard error; CI = confidence interval; ES = effect size.
* = Study identified as an outlier and removed from subsequent moderator analyses.
The final column gives the results of the sensitivity analysis for each respective intervention (the overall ES with that intervention removed).

doi:10.1371/journal.pone.0169604.t002

Moderator Analyses

The results of the moderator analyses are shown in Table 3 (for teamwork behaviors) and Table 4 (for team performance). With respect to sample characteristics, significant positive effects of teamwork interventions were found for enhancing teamwork across all contexts (ds = 0.46–1.23) except for the single effect size from an industry setting (d = 0.50). In terms of team performance, significant effects were evident across all settings (ds = 0.40–1.76). In addition, interventions were effective for enhancing teamwork with intact teams (d = 0.33) and newly formed teams (d = 0.67), with the effect size for new teams being significantly larger (Q = 4.04, p = 0.004) than that for existing teams. Teamwork training was also effective at fostering team performance for both team types; however, in contrast to the findings on teamwork, the effect size for intact teams (d = 0.99) was significantly larger (Q = 6.04, p = 0.02) than that for new teams (d = 0.54).
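For readers who want to see how such subgroup contrasts are computed, the sketch below implements a generic Q-between test on two pooled subgroup estimates. The inputs echo the intact-versus-new teamwork comparison from Table 3, but the helper is a textbook formulation; its output will not exactly reproduce the reported Q, since the published analysis may use a different subgroup variance estimator.

# Sketch: a Q-between (moderator) contrast of two pooled subgroup estimates.
import numpy as np
from scipy import stats

def q_between(estimates, ses):
    """Q statistic testing whether subgroup pooled effects differ."""
    est, w = np.asarray(estimates), 1.0 / np.asarray(ses) ** 2
    grand = np.sum(w * est) / np.sum(w)
    q = float(np.sum(w * (est - grand) ** 2))
    df = len(est) - 1
    return q, df, float(stats.chi2.sf(q, df))

# e.g., intact (d = 0.33, SE = 0.14) vs. new teams (d = 0.67, SE = 0.10)
q, df, p = q_between([0.33, 0.67], [0.14, 0.10])
print(f"Q({df}) = {q:.2f}, p = {p:.3f}")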


Table 3. Moderator results for interventions assessing teamwork as the outcome variable.

Moderator | k | Effect size (SE) | 95% CI | Z-value | p-value | Q value (df), p-value

Sample Characteristics
Context | | | | | | 3.272 (5), p = 0.658
  Health care | 13 | 0.51 (0.15) | 0.20, 0.81 | 3.30 | 0.001 |
  Academia | 10 | 0.46 (0.17) | 0.14, 0.78 | 2.78 | 0.005 |
  Laboratory experiment | 6 | 0.51 (0.20) | 0.12, 0.89 | 2.55 | 0.011 |
  Military | 6 | 0.77 (0.23) | 0.33, 1.22 | 3.42 | 0.001 |
  Aviation | 1 | 1.23 (0.47) | 0.25, 2.21 | 2.46 | 0.014 |
  Industry | 1 | 0.50 (0.50) | -0.48, 1.47 | 0.99 | 0.321 |
Team type | | | | | | 4.04 (1), p = 0.004
  Intact | 13 | 0.33 (0.14) | 0.05, 0.60 | 2.35 | 0.019 |
  New | 24 | 0.67 (0.10) | 0.47, 0.87 | 6.58 | <0.001 |

Intervention Characteristics
Method of intervention | | | | | | 6.17 (3), p = 0.10
  Didactic education | 4 | 0.19 (0.19) | -0.20, 0.57 | 0.95 | 0.341 |
  Workshop | 18 | 0.50 (0.10) | 0.31, 0.70 | 4.96 | <0.001 |
  Simulation | 11 | 0.78 (0.16) | 0.48, 1.09 | 5.05 | <0.001 |
  Team Reviews | 4 | 0.64 (0.19) | 0.26, 1.01 | 3.34 | 0.001 |
Teamwork dimensions targeted (a)
  Preparation | 20 | 0.75 (0.11) | 0.54, 0.95 | 7.09 | <0.001 |
  Execution | 21 | 0.64 (0.11) | 0.42, 0.86 | 5.70 | <0.001 |
  Reflection | 22 | 0.65 (0.11) | 0.43, 0.86 | 5.80 | <0.001 |
  Interpersonal dynamics | 11 | 0.69 (0.16) | 0.38, 1.00 | 4.33 | <0.001 |
Number of dimensions targeted (b) | | | | | | 19.73 (4), p = 0.001
  One | 6 | 0.05 (0.16) | -0.26, 0.35 | 0.29 | 0.775 |
  Two | 11 | 0.65 (0.12) | 0.42, 0.89 | 5.39 | <0.001 |
  Three | 6 | 0.98 (0.16) | 0.66, 1.30 | 6.04 | <0.001 |
  Four | 7 | 0.57 (0.15) | 0.27, 0.87 | 3.70 | <0.001 |

Measurement Characteristics
Type of teamwork measure (c) | | | | | | 16.86 (1), p<0.001
  Third party | 45 | 0.80 (0.07) | 0.66, 0.94 | 10.92 | <0.001 |
  Self-report | 46 | 0.38 (0.07) | 0.25, 0.52 | 5.47 | <0.001 |
Teamwork dimension measured (c) | | | | | | 2.98 (4), p = 0.56
  General | 27 | 0.71 (0.11) | 0.49, 0.93 | 6.36 | <0.001 |
  Preparation | 8 | 0.53 (0.19) | 0.16, 0.89 | 2.80 | 0.005 |
  Execution | 31 | 0.55 (0.10) | 0.35, 0.74 | 5.57 | <0.001 |
  Reflection | 12 | 0.70 (0.16) | 0.40, 1.01 | 4.50 | <0.001 |
  Interpersonal dynamics | 13 | 0.45 (0.14) | 0.17, 0.73 | 3.12 | 0.002 |

Note. The df of the Q-value represents the total number of combinations of the targeted dimensions minus 1.
a: The total k of this moderator is greater than 37 as many interventions targeted more than one dimension of teamwork. Because of this, each category within this moderator was analyzed independently (i.e., whether each teamwork dimension was targeted or not targeted in the intervention); as a result, it was not possible to calculate a Q value for this moderator.
b: The total k of this moderator is less than 37 as seven interventions were unclear in terms of the exact teamwork dimensions targeted.
c: The total k of this moderator is greater than 37 as many studies used more than one type of criterion measure of teamwork. Because of this, each category within this moderator was analyzed independently.

doi:10.1371/journal.pone.0169604.t003

Three intervention characteristics were analyzed as potential moderators. First, with regard to the intervention method utilized, significant effects on teamwork were found for workshop training (d = 0.50), simulation-based teamwork training (d = 0.78), and team reviews (d = 0.64) but not for didactic education (d = 0.19). All training methods were effective for enhancing team performance (ds = 0.41–0.69). Second, significant effects of training on teamwork were evident when two or more dimensions of teamwork were targeted (ds = 0.65–0.98) but not when only one dimension was targeted (d = 0.05). Team performance, however, improved significantly as a result of teamwork training regardless of the number of teamwork dimensions that were targeted (ds = 0.46–0.67). Third, significant effects were shown regardless of which dimension (i.e., preparation, execution, reflection, interpersonal dynamics) was targeted for both teamwork (ds = 0.64–0.75) and team performance (ds = 0.52–0.60).

Table 4. Moderator results for interventions assessing team performance as the outcome variable.

Moderator | k | Effect size (SE) | 95% CI | Z value | p value | Q value (df), p-value

Sample Characteristics
Context | | | | | | 16.94 (5), p = 0.01
  Health care | 2 | 0.76 (0.31) | 0.15, 1.36 | 2.46 | 0.014 |
  Laboratory experiment | 25 | 0.54 (0.07) | 0.41, 0.67 | 8.08 | <0.001 |
  Aviation | 4 | 0.64 (0.18) | 0.28, 0.99 | 3.51 | <0.001 |
  Military | 5 | 0.66 (0.17) | 0.34, 0.99 | 3.99 | <0.001 |
  Industry | 3 | 1.76 (0.32) | 1.13, 2.38 | 5.52 | <0.001 |
  Academia | 6 | 0.40 (0.12) | 0.17, 0.63 | 3.35 | 0.001 |
Team type | | | | | | 6.04 (1), p = 0.02
  Intact | 6 | 0.99 (0.18) | 0.64, 1.33 | 5.63 | <0.001 |
  New | 39 | 0.54 (0.06) | 0.42, 0.65 | 9.32 | <0.001 |

Intervention Characteristics
Method of intervention | | | | | | 2.44 (3), p = 0.49
  Didactic education | 4 | 0.41 (0.16) | 0.09, 0.74 | 2.52 | 0.012 |
  Workshop | 24 | 0.55 (0.08) | 0.39, 0.71 | 6.87 | <0.001 |
  Simulation | 7 | 0.57 (0.17) | 0.23, 0.90 | 3.30 | 0.001 |
  Team Reviews | 10 | 0.69 (0.10) | 0.50, 0.89 | 6.88 | <0.001 |
Teamwork dimensions targeted (a)
  Preparation | 15 | 0.60 (0.07) | 0.46, 0.73 | 8.69 | <0.001 |
  Execution | 26 | 0.52 (0.08) | 0.37, 0.66 | 6.87 | <0.001 |
  Reflection | 22 | 0.55 (0.08) | 0.40, 0.70 | 7.17 | <0.001 |
  Interpersonal dynamics | 6 | 0.57 (0.18) | 0.18, 0.95 | 2.88 | 0.004 |
Number of dimensions targeted (b) | | | | | | 3.98 (4), p = 0.67
  One | 20 | 0.61 (0.09) | 0.44, 0.79 | 6.85 | <0.001 |
  Two | 12 | 0.63 (0.12) | 0.40, 0.86 | 5.31 | <0.001 |
  Three | 9 | 0.46 (0.11) | 0.24, 0.67 | 4.08 | <0.001 |
  Four | 3 | 0.67 (0.25) | 0.19, 1.15 | 2.74 | 0.006 |

Measurement Characteristics
Type of team performance measure (c) | | | | | | 2.03 (1), p = 0.15
  Third party | 31 | 0.56 (0.08) | 0.40, 0.72 | 6.79 | <0.001 |
  Objective | 62 | 0.61 (0.06) | 0.48, 0.73 | 9.70 | <0.001 |

Note. The df of the Q-value represents the total number of combinations of the targeted dimensions minus 1.
a: The total k of this moderator is greater than 45 as many interventions targeted more than one dimension of teamwork. Because of this, each category within this moderator was analyzed independently (i.e., whether each teamwork dimension was targeted or not targeted in the intervention); as a result, it was not possible to calculate a Q value for this moderator.
b: The total k of this moderator is less than 45 as one intervention was unclear in terms of the exact teamwork dimensions targeted.
c: The total k of this moderator is greater than 45 as many studies used more than one type of criterion measure of team performance. Because of this, each category within this moderator was analyzed independently.

doi:10.1371/journal.pone.0169604.t004
With regard to measurement characteristics, significant improvements in teamwork emerged when either third-party (d = 0.80) or self-report (d = 0.38) measures of teamwork were utilized; the effect size for third-party measures was significantly larger (Q = 6.02, p = 0.014) than the effect size for self-report measures. For team performance outcomes, significant effects were shown for both objective (d = 0.61) and third-party measures (d = 0.56). Finally, significant effects on teamwork were found when general/omnibus measures of teamwork were taken (d = 0.71), as well as when a specific dimension of teamwork was measured (ds = 0.45–0.70).

Discussion

The purpose of this systematic review and meta-analysis was to quantify the effects of the extant controlled experimental research on teamwork training interventions on teamwork and team performance. We found positive and significant medium-to-large effects for these interventions on teamwork and large effects on team performance. When outlier studies were removed, medium-sized effects were found for both criteria. Additional subgroup/moderator analyses also revealed several notable findings, each of which will be discussed in turn. The paper concludes with a discussion of the limitations associated with this meta-analysis as well as considerations for future teamwork training research.

Who Can Benefit From Teamwork Training?

With regard to sample characteristics, teamwork interventions were shown to be effective at enhancing both teamwork and team performance across a variety of team contexts, including laboratory settings as well as the real-world contexts of health care, aviation, military, and academia. This highlights the efficacy of teamwork training as a means of improving teams; this is an important finding, as effective teams (i.e., those that work well together and perform at a high level) are vital in many of the aforementioned contexts. For example, it has been estimated that approximately 70% of adverse events in medical settings are due not to individuals’ technical errors but, rather, to breakdowns in teamwork [78]. Thus, there is a critical need to ensure that teams are effective across these settings, as these teams greatly impact (among other things) the welfare of others. The results of this meta-analysis suggest that teamwork training can indeed be a useful way of enhancing team effectiveness within these contexts.

We also examined whether there were differential effects of teamwork training for new teams compared to intact teams. It was shown that these interventions were effective for both team types. The effects of teamwork training on teamwork outcomes were significantly larger for new teams (which showed a medium-to-large effect size) compared to existing teams (which showed a small-to-medium effect size). Interestingly, when we examined team performance as the criterion variable, the training effects were significantly larger for intact teams (which showed a large effect size) compared to newly formed teams (which again showed a medium-to-large effect size). It should be noted that there were many more studies conducted with new teams than with intact teams—thus, caution should be exercised in directly comparing these findings. Nonetheless, at this point, the existing research seems to suggest that teamwork interventions work particularly well at enhancing teamwork processes for newly established teams—and also work with existing teams, though not to the same extent. It is possible that teamwork processes are more malleable and display greater potential for improvement with new teams compared to more established teams, whose teamwork processes may be more entrenched. On the other hand, it is notable that the effects of teamwork training on team performance were stronger for established teams. In line with this, it is plausible that, while intact teams may show less pronounced changes in teamwork, they might be better able to translate their teamwork training into improved team performance outcomes.

What Type of Training Works?

Three moderator variables were assessed with regard to intervention characteristics. First, with regard to the training method utilized, it was shown that all four training methods were effective for enhancing team performance. These included the provision of didactic lectures/presentations, workshops, simulation training, and review-type activities conducted in situ. Although significant effects were shown for the latter three training methods on teamwork outcomes, interventions that relied on didactic instruction did not result in significant improvements in teamwork itself. This suggests that simply providing educational lectures wherein team members passively learn about teamwork is not an effective way of improving teamwork. Taken together, these findings suggest that teamwork training should incorporate experiential activities that provide participants with more active ways of learning and practising teamwork. These may include various workshop-style exercises that involve all team members, such as working through case studies of how teams can improve teamwork, watching and critiquing video vignettes of teams displaying optimal versus suboptimal teamwork, discussing and setting teamwork-related goals and action plans, or other activities that help stimulate critical thinking and active learning of effective teamwork. Teams may also find it useful to conduct simulations of specific team tasks that the group is likely to encounter in situ, such as aviation teams using an airplane simulator, surgical teams conducting mock surgeries on medical manikins, military teams practising various field missions, and so on. Teamwork can also be fostered by having team members participate in team reviews/briefings before, during, and/or after the execution of team tasks that occur in situ. In summary, simply lecturing about the importance of teamwork is not sufficient to create meaningful improvements in teamwork; rather, substantive positive effects can be derived by having team members engage in activities that require them to actively learn about and practise teamwork.
We also sought to assess how comprehensive an intervention should be—specifically, the number of teamwork dimensions that need to be targeted—in order to be effective. With regard to improving team performance, there were significant effects when one or more dimensions were targeted. However, in terms of improving teamwork behaviors, significant effects only emerged when two or more dimensions were targeted. From an applied perspective, individuals concerned with intervention (e.g., team consultants, coaches, managers, team leaders) can utilize these findings by targeting more than one dimension of teamwork within their training protocol. For instance, if the purpose of an intervention is to improve a health care team’s communication, greater effects may be derived by not merely targeting communication during the execution phase alone (e.g., with a structured communication tool), but by also incorporating strategies that target other dimensions of teamwork, such as setting goals and action plans for how communication will be improved (i.e., the preparation dimension of teamwork) as well as monitoring progress towards those goals, resolving any communication-related problems that arise, and making adjustments to action plans as necessary (i.e., the reflection dimension).

Relatedly, we sought to address whether there were differential effects of teamwork interventions on teamwork and team performance based on the dimensions of teamwork that were targeted. It was found that interventions had a significant effect on both teamwork behaviors and team performance when any dimension of teamwork was targeted. This is important, as it means that if those concerned with intervention target any one of the four dimensions of teamwork, this will likely result in improvements in team functioning. While the preparation (i.e., behaviors occurring before team task performance, such as setting goals and action plans), execution (i.e., intra-task behaviors such as communication and coordination), and reflection (i.e., behaviors occurring following task performance, such as performance monitoring and problem solving) dimensions have each been theorized to be implicated in fostering team performance [2, 79], it is particularly noteworthy that interventions targeting the interpersonal dynamics of a team (i.e., managing interpersonal conflict and the provision of social support between members) also displayed significant effects in relation to team performance. Specifically, efforts to enhance interpersonal processes have generally been theorized to be related to supporting team maintenance more so than supporting team performance [2, 79]. However, the results from the current review provide evidence that training teams with regard to social support and interpersonal conflict management processes may actually be a useful way to enhance team performance. While the exact reason for this effect is not immediately clear from this review, it may be that improving interpersonal dynamics has an indirect relationship with team performance. That is, teamwork training focused on improving social support and conflict management may improve the functioning of a team, which, in turn, improves the team’s performance. As Marks et al. [10] contend, these interpersonal processes “lay the foundation for the effectiveness of other processes” (p. 368). Relatedly, Rousseau et al. [2] suggest that problems related to social support and conflict management “may prevent team members from fully contributing to task accomplishment or from effectively regulating team performance” (p. 557). Further research examining this potential relationship is required, as this would have implications in both research and applied teamwork settings.

Does It Matter How Criterion Variables Are Measured?

Two measurement characteristics were examined as moderators within this meta-analysis. First, significant large and small-to-medium sized effects were found for third-party and self-report measures of teamwork, respectively. Significant medium effects were also evident for third-party and objective measures of team performance. It is worth noting that significantly larger effect sizes emerged for third-party assessments of teamwork compared to self-report measures. Taken together, these findings suggest that the positive effects that were found for teamwork interventions are not merely a matter of perception and/or individuals' self-report biases (i.e., social desirability). Rather, these results indicate that the effects of these interventions on both teamwork and team performance are clearly observable with measures beyond self-report indices.

Finally, we sought to assess whether the effects of teamwork training varied based on which teamwork dimension(s) were measured. Medium-to-large effects emerged when general/omnibus measures of teamwork—that is, those that provided an overall score of teamwork as opposed to examining individual dimensions of teamwork—were taken. Measures that tapped into the specific dimensions of teamwork (e.g., those that provided individual scores on preparation, execution, reflection, and interpersonal dynamics) also yielded comparable effect sizes. Hence, teamwork interventions appear to have a somewhat similar effect on each of the components of teamwork. In summary, the results of the above two moderators (i.e., type of measure and dimension of teamwork examined) suggest that teamwork training has a positive impact on teamwork and team performance regardless of the way in which these variables are assessed.

Limitations

Despite the contributions of this meta-analytic review, it is not without limitations. First, there were additional variables that we had planned a priori to analyze as moderators, including team size and the length of/contact time within the intervention. However, there was an insufficient amount of reliable data across the studies on these variables to conduct these subgroup analyses appropriately. For instance, although many studies noted the total number of participants within an organization (e.g., a hospital) that took part in an intervention, information on the size of the teams within the organization (e.g., the various units within the hospital) was often missing. Team composition variables such as this have been noted as important factors to take into account when examining teams (e.g., [30, 80]). Similarly, although some studies were explicit about the total length of the intervention and the contact time between interventionists and participating teams, this information was not provided consistently. This too would have been a valuable feature to analyze in order to provide more specific recommendations about how teamwork training programs should be designed—that is, how long should an intervention last? Unfortunately, due to the paucity of information available in the included manuscripts, we were unable to determine whether these variables moderated the observed effects of teamwork training on teamwork and team performance in the current meta-analysis.

Furthermore, there was a considerable amount of variability within some of the moderator categories that were coded. For instance, with regard to intervention methods, 'workshops' consisted of many different types of activities, including team charter sessions, strategy planning meetings, case study activities, and so on. Combining these activities into one category was done for the sake of being adequately powered to conduct moderator analyses (i.e., to include a sufficient number of studies within each of the resulting categories). However, while the above examples are indeed activities that teams do together, they are of course each different in their own ways. Hence, although it is evident that workshop-type activities are effective overall, it is unclear whether specific workshop activities are more effective than others. This example underscores the difficulty that can occur when trying to balance statistical power with accuracy for each moderator category when conducting subgroup analyses in a meta-analysis.

Relatedly, effect sizes were only computed with the statistics that were provided from baseline and post-intervention, even if studies provided additional data on teamwork and/or performance at some other point in between or at a follow-up point in time (although it is worth noting that relatively few studies actually did this). This was done in order to minimize heterogeneity within the meta-analysis and improve the interpretability of the results (i.e., determining the effects of teamwork training from pre- to post-intervention). However, by not taking these measurement time-points into consideration, two questions in particular are raised. First, do certain dimensions of teamwork and team performance evolve differently over time and, if so, how? For instance, do improvements in teamwork occur immediately in response to training and then plateau, or do they improve in a slower, more linear fashion from the onset of training? Second, what are the long-term implications of teamwork training? That is, does teamwork training result in sustained improvements in teamwork and team performance beyond the intervention period, or do these effects eventually wane? Answers to these types of research questions would certainly be of interest to teamwork researchers and applied practitioners.
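
For readers who want to see the pre- to post-intervention logic in symbols, one common formulation of a standardized mean difference for a controlled pre-post design is sketched below. This is an illustrative, generic estimator only, not a statement of the exact computation used in this review (see the article's Methods section for those details):

$$d_{\mathrm{ppc}} = \frac{\left(M_{\mathrm{post}}^{T} - M_{\mathrm{pre}}^{T}\right) - \left(M_{\mathrm{post}}^{C} - M_{\mathrm{pre}}^{C}\right)}{SD_{\mathrm{pre,pooled}}}$$

where $T$ and $C$ denote the intervention and control groups, $M_{\mathrm{pre}}$ and $M_{\mathrm{post}}$ are the baseline and post-intervention means, and $SD_{\mathrm{pre,pooled}}$ is the pooled baseline standard deviation; applying the usual small-sample correction factor converts such a $d$ into Hedges' $g$.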

Future Directions

In addition to summarizing the previous research on teamwork interventions for improving teamwork and team performance, the findings from this systematic review also highlight several potential avenues for future research. First, with regard to sample characteristics, the majority of studies that examined the effects of teamwork interventions on team performance were conducted within laboratory settings, with relatively few controlled studies having been conducted in real-world settings. Thus, although significant effects on team performance (and teamwork) were found in health care, aviation, military, and academic settings, the extant literature would be strengthened by further controlled intervention research within these contexts. It was also shown that teamwork training was less effective for improving teamwork for intact teams compared to new teams. Since many teams seeking teamwork training are likely to be intact, it is important that future research continue to test various training strategies that can be utilized with these types of teams. In addition, there are other contexts in which controlled interventions have not yet been conducted, such as police squads, firefighting crews, sports teams, and political parties. Research in these areas is clearly ripe for future inquiry.

Further research on the ideal combination of teamwork dimensions (i.e., preparation and/or execution and/or reflection and/or interpersonal dynamics) targeted in an intervention would also enhance our current knowledge of how to train teamwork most effectively and efficiently. We had originally planned to further assess this moderator by conducting a method co-occurrence analysis [81]. Specifically, since there would likely be a variety of combinations of dimensions targeted in the teamwork interventions (e.g., preparation only; preparation and execution; preparation, execution, reflection, and interpersonal dynamics; etc.), we had hoped to examine whether there would be differential effects of these combinations with regard to intervention effectiveness. Unfortunately, because there was such a large number of combinations of dimensions targeted in the included studies, an insufficient number of interventions fell into each category. We were, therefore, unable to pursue this method co-occurrence analysis [81] of the various combinations of dimensions. Thus, although our findings suggest that interventions are more effective when two or more dimensions are targeted, further research that examines the effects of the ideal combinations of these dimensions would certainly enhance our current knowledge of teamwork training. For example, if the objective of teamwork training is to improve the coordination and cooperation of the team, should the training also target (in addition to these execution behaviors) both the preparation and reflection dimensions, or simply one or the other? Answering such complex questions will help to advance our understanding of what makes for an effective teamwork training program.

Conclusion

Balanced against the contributions and insights provided by the various moderator analyses conducted in this study, the overall take-home message is that teamwork training is an effective way to foster teamwork and team performance. These effects appear to be evident across a range of samples, utilizing numerous intervention methods, and when considering various measurement characteristics. Interventions appear to be particularly effective when they target multiple dimensions of teamwork and include experiential activities for team members to actively learn about, practise, and continually develop teamwork.

Supporting Information

S1 Table. Summaries of Interventions. Summaries of each study and intervention included in the meta-analysis are provided in S1 Table. (DOCX)

S1 File. PRISMA Checklist. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Checklist [82] for this review is presented in S1 File. (DOC)


Author Contributions

Conceptualization: DM ME BZ MB.

Data curation: DM.

Formal analysis: DM.

Investigation: DM GR.

Methodology: DM MB.

Project administration: DM MB.

Resources: DM MB.

Supervision: MB.

Validation: DM GR MB.

Visualization: DM GR ME BZ MB.

Writing – original draft: DM MB.

Writing – review & editing: DM GR ME BZ MB.

References
1. Lepine JA, Piccolo RF, Jackson CL, Mathieu JE, Saul JR. A meta-analysis of teamwork processes: Tests of a multidimensional model and relationships with team effectiveness criteria. Personnel Psychology. 2008; 61(2): 273–307.
2. Rousseau V, Aubé C, Savoie A. Teamwork behaviors: A review and an integration of frameworks. Small Group Research. 2006; 37(5): 540–70.
3. Morey JC, Simon R, Jay GD, Wears RL, Salisbury M, Dukes KA, et al. Error reduction and performance improvement in the emergency department through formal teamwork training: Evaluation results of the MedTeams project. Health Services Research. 2002; 37(6): 1553–81. doi: 10.1111/1475-6773.01104 PMID: 12546286
4. Smith-Jentsch KA, Cannon-Bowers JA, Tannenbaum SI, Salas E. Guided team self-correction: Impacts on team mental models, processes, and effectiveness. Small Group Research. 2008; 39(3): 303–27.
5. Brannick MT, Prince C, Salas E. Can PC-based systems enhance teamwork in the cockpit? The International Journal of Aviation Psychology. 2005; 15(2): 173–87.
6. Padmo Putri DA. The effect of communication strategy and planning intervention on the processes and performance of course material development teams [dissertation]. Tallahassee (FL): Florida State University; 2012.
7. Jankouskas TS. Crisis Resource Management training: Impact on team process and team effectiveness [dissertation]. State College (PA): Pennsylvania State University; 2012.
8. McCulloch P, Mishra A, Handa A, Dale T, Hirst G, Catchpole K. The effects of aviation-style non-technical skills training on technical performance and outcome in the operating theatre. Quality and Safety in Health Care. 2009; 18(2): 109–15. doi: 10.1136/qshc.2008.032045 PMID: 19342524
9. Bjornberg NH. Mutual performance monitoring in virtual teams [dissertation]. Norfolk (VA): Old Dominion University; 2014.
10. Marks MA, Mathieu JE, Zaccaro SJ. A temporally based framework and taxonomy of team processes. Academy of Management Review. 2001; 26(3): 356–76.
11. Lewin K. A dynamic theory of personality. New York, NY: McGraw-Hill; 1935.
12. Cheater FM, Hearnshaw H, Baker R, Keane M. Can a facilitated programme promote effective multidisciplinary audit in secondary care teams? An exploratory trial. International Journal of Nursing Studies. 2005; 42(7): 779–91. doi: 10.1016/j.ijnurstu.2004.11.002 PMID: 16084925
13. Aaron JR, McDowell WC, Herdman AO. The effects of a team charter on student team behaviors. Journal of Education for Business. 2014; 89(2): 90–7.
14. Ellis AP, Bell BS, Ployhart RE, Hollenbeck JR, Ilgen DR. An evaluation of generic teamwork skills training with action teams: Effects on cognitive and skill-based outcomes. Personnel Psychology. 2005; 58(3): 641–72.
15. Achille LB, Schulze KG, Schmidt-Nielsen A. An analysis of communication and the use of military terms in Navy team training. Military Psychology. 1995; 7(2): 95–107.
16. Villado AJ, Arthur W. The comparative effect of subjective and objective after-action reviews on team performance on a complex task. Journal of Applied Psychology. 2013; 98(3): 514–28. doi: 10.1037/a0031510 PMID: 23356248
17. Hackman JR, Katz N. Group behavior and performance. In: Fiske ST, Gilbert DT, Lindzey G, editors. Handbook of social psychology (Vol. 2, 5th ed.). West Sussex, UK: John Wiley & Sons; 2010. pp. 1251–1280.
18. Argote L, McGrath JE. Group processes in organizations: Continuity and change. In: Cooper CL, Robertson IT, editors. International review of industrial and organizational psychology. Chichester, UK: Wiley; 1993. pp. 333–389.
19. Bommer WH, Johnson JL, Rich GA, Podsakoff PM, MacKenzie SB. On the interchangeability of objective and subjective measures of employee performance: A meta-analysis. Personnel Psychology. 1995; 48(3): 587–605.
20. Mathieu J, Maynard MT, Rapp T, Gilson L. Team effectiveness 1997–2007: A review of recent advancements and a glimpse into the future. Journal of Management. 2008; 34(3): 410–76.
21. O'Leary KJ, Haviley C, Slade ME, Shah HM, Lee J, Williams MV. Improving teamwork: Impact of structured interdisciplinary rounds on a hospitalist unit. Journal of Hospital Medicine. 2010; 6(2): 88–93.
22. Marshall S, Harrison J, Flanagan B. The teaching of a structured tool improves the clarity and content of interprofessional clinical communication. Quality and Safety in Health Care. 2009; 18(2): 137–40. doi: 10.1136/qshc.2007.025247 PMID: 19342529
23. Salas E, Nichols DR, Driskell JE. Testing three team training strategies in intact teams: A meta-analysis. Small Group Research. 2007; 38(4): 471–88.
24. Martin LJ, Carron AV, Burke SM. Team building interventions in sport: A meta-analysis. Sport and Exercise Psychology Review. 2009; 5(2): 3–18.
25. O'Dea A, O'Connor P, Keogh I. A meta-analysis of the effectiveness of crew resource management training in acute care domains. Postgraduate Medical Journal. 2014; 90(1070): 699–708. doi: 10.1136/postgradmedj-2014-132800 PMID: 25370080
26. Klein C, DiazGranados D, Salas E, Le H, Burke CS, Lyons R, Goodwin GF. Does team building work? Small Group Research. 2009; 40(2): 181–222.
27. Kleingeld A, van Mierlo H, Arends L. The effect of goal setting on group performance: A meta-analysis. Journal of Applied Psychology. 2011; 96(6): 1289–304. doi: 10.1037/a0024315 PMID: 21744940
28. Salas E, Rozell D, Mullen B, Driskell JE. The effect of team building on performance: An integration. Small Group Research. 1999; 30(3): 309–29.
29. Tannenbaum SI, Cerasoli CP. Do team and individual debriefs enhance performance? A meta-analysis. Human Factors: The Journal of the Human Factors and Ergonomics Society. 2012; 55(1): 231–45.
30. Salas E, DiazGranados D, Klein C, Burke CS, Stagl KC, Goodwin GF, et al. Does team training improve team performance? A meta-analysis. Human Factors: The Journal of the Human Factors and Ergonomics Society. 2008; 50(6): 903–33.
31. Higgins JP, Green S. Cochrane handbook for systematic reviews of interventions (Vol. 5.1.0). The Cochrane Collaboration; 2008. Available from: www.cochrane-handbook.org.
32. Wu AD, Zumbo BD. Understanding and using mediators and moderators. Social Indicators Research. 2008; 87(3): 367–92.
33. Borenstein M, Hedges L, Higgins JPT, Rothstein HR. Comprehensive meta-analysis (2nd ed.). Englewood, NJ: Biostat; 2005.
34. Borenstein M, Hedges L, Higgins JPT, Rothstein HR. Introduction to meta-analysis. West Sussex, UK: John Wiley & Sons; 2009.
35. Field AP, Gillett R. How to do a meta-analysis. British Journal of Mathematical and Statistical Psychology. 2010; 63(3): 665–94.
36. DeCoster J, Claypool HM. A meta-analysis of priming effects on impression formation supporting a general model of informational biases. Personality and Social Psychology Review. 2004; 8(1): 2–27. doi: 10.1207/S15327957PSPR0801_1 PMID: 15121538
37. Cohen J. A power primer. Psychological Bulletin. 1992; 112(1): 155–9. PMID: 19565683
38. Rosenthal R. The file drawer problem and tolerance for null results. Psychological Bulletin. 1979; 86(3): 638–41.
39. Rosenberg MS. The file-drawer problem revisited: A general weighted method for calculating fail-safe numbers in meta-analysis. Evolution. 2005; 59(2): 464–8. PMID: 15807430
40. Becker EA, Godwin EM. Methods to improve teaching interdisciplinary teamwork through computer conferencing. Journal of Allied Health. 2005; 34(3): 169–176. PMID: 16252680
41. Beck-Jones JJ. The effect of cross-training and role assignment in cooperative learning groups on task performance, knowledge of accounting concepts, teamwork behavior, and acquisition of interpositional knowledge [dissertation]. Tallahassee (FL): Florida State University; 2004.
42. Beranek PM, Martz B. Making virtual teams more effective: Improving relational links. Team Performance Management: An International Journal. 2005; 11(5/6): 200–13.
43. Bushe GR, Coetzer G. Appreciative inquiry as a team-development intervention: A controlled experiment. The Journal of Applied Behavioral Science. 1995; 31(1): 13–30.
44. Clay-Williams R, McIntosh CA, Kerridge R, Braithwaite J. Classroom and simulation team training: A randomized controlled trial. International Journal for Quality in Health Care. 2013; 25(3): 314–21. doi: 10.1093/intqhc/mzt027 PMID: 23548443
45. Dalenberg S, Vogelaar ALW, Beersma B. The effect of a team strategy discussion on military team performance. Military Psychology. 2009; 21(Suppl 2).
46. Deneckere S, Euwema M, Lodewijckx C, Panella M, Mutsvari T, Sermeus W, et al. Better interprofessional teamwork, higher level of organized care, and lower risk of burnout in acute health care teams using care pathways: A cluster randomized controlled trial. Medical Care. 2013; 51(1): 99–107. doi: 10.1097/MLR.0b013e3182763312 PMID: 23132203
47. Dibble R. Collaboration for the common good: An examination of internal and external adjustment [dissertation]. Irvine (CA): University of California; 2010.
48. Eden D. Team development: Quasi-experimental confirmation among combat companies. Group & Organization Management. 1986; 11(3): 133–46.
49. Emmert MC. Pilot test of an innovative interprofessional education assessment strategy [dissertation]. Los Angeles (CA): University of California; 2011.
50. Entin EE, Serfaty D. Adaptive team coordination. Human Factors: The Journal of the Human Factors and Ergonomics Society. 1999; 41(2): 312–25.
51. Friedlander F. The impact of organizational training laboratories upon the effectiveness and interaction of ongoing work groups. Personnel Psychology. 1967; 20(3): 289–307.
52. Green LR. The effectiveness of tactical adaptation and coordination training on team performance in tactical scenarios. Monterey (CA): Naval Postgraduate School; 1994.
53. Kim LY. The effects of simulation-based TeamSTEPPS interprofessional communication and teamwork training on patient and provider outcomes [dissertation]. Los Angeles (CA): University of California; 2014.
54. Martínez-Moreno E, Zornoza A, Orengo V, Thompson LF. The effects of team self-guided training on conflict management in virtual teams. Group Decision and Negotiation. 2014; 24(5): 905–23.
55. Prichard JS, Ashleigh MJ. The effects of team-skills training on transactive memory and performance. Small Group Research. 2007; 38(6): 696–726.
56. Rapp TL, Mathieu JE. Evaluating an individually self-administered generic teamwork skills training program across time and levels. Small Group Research. 2007; 38(4): 532–55.
57. Shapiro MJ, Morey JC, Small SD, Langford V, Kaylor CJ, Jagminas L, Suner S, Salisbury ML, Simon R, Jay GD. Simulation based teamwork training for emergency department staff: Does it improve clinical team performance when added to an existing didactic teamwork curriculum? Quality and Safety in Health Care. 2004; 13(6): 417–21. doi: 10.1136/qshc.2003.005447 PMID: 15576702
58. Thomas EJ, Taggart B, Crandell S, Lasky RE, Williams AL, Love LJ, et al. Teaching teamwork during the Neonatal Resuscitation Program: A randomized trial. Journal of Perinatology. 2007; 27(7): 409–14. doi: 10.1038/sj.jp.7211771 PMID: 17538634
59. Volpe CE, Cannon-Bowers JA, Salas E, Spector PE. The impact of cross-training on team functioning: An empirical investigation. Human Factors: The Journal of the Human Factors and Ergonomics Society. 1996; 38(1): 87–100.
60. Weaver SJ, Rosen MA, DiazGranados D, Lazzara EH, Lyons R, Salas E, et al. Does teamwork improve performance in the operating room? A multilevel evaluation. Joint Commission Journal on Quality and Patient Safety. 2010; 36(3): 133–142. PMID: 20235415
61. Weller JM, Torrie J, Boyd M, Frengley R, Garden A, Ng WL, et al. Improving team information sharing with a structured call-out in anaesthetic emergencies: A randomized controlled trial. British Journal of Anaesthesia. 2014; 112(6): 1042–9. doi: 10.1093/bja/aet579 PMID: 24561645
62. Brown TC. The effect of verbal self-guidance training on collective efficacy and team performance. Personnel Psychology. 2003; 56(4): 935–64.
63. Buller PF, Bell CH. Effects of team building and goal setting on productivity: A field experiment. Academy of Management Journal. 1986; 29(2): 305–28.
64. Cannon-Bowers JA, Salas E, Blickensderfer E, Bowers CA. The impact of cross-training and workload on team functioning: A replication and extension of initial findings. Human Factors: The Journal of the Human Factors and Ergonomics Society. 1998; 40(1): 92–101.
65. Chang S, Waid E, Martinec DV, Zheng B, Swanstrom LL. Verbal communication improves laparoscopic team performance. Surgical Innovation. 2008; 15(2): 143–7. doi: 10.1177/1553350608318452 PMID: 18492733
66. Fandt PM, Richardson WD, Conner HM. The impact of goal setting on team simulation experience. 1990; 21(4): 411–22.
67. Haslam SA, Wegge J, Postmes T. Are we on a learning curve or a treadmill? The benefits of participative group goal setting become apparent as tasks become increasingly challenging over time. European Journal of Social Psychology. 2009; 39(3): 430–46.
68. Ikomi PA, Boehm-Davis DA, Holt RW, Incalcaterra KA. Jump seat observations of advanced crew resource management (ACRM) effectiveness. In: Proceedings of the 10th International Symposium on Aviation Psychology (Vol. 5); 1999.
69. Jarrett S. The comparative effectiveness of after-action reviews in co-located and distributed team training environments [dissertation]. College Station (TX): Texas A&M University; 2012.
70. Kring JP. Communication modality and after action review performance in a distributed immersive virtual environment [dissertation]. Orlando (FL): University of Central Florida; 2005.
71. Longenecker CO, Scazzero JA, Stansfield TT. Quality improvement through team goal setting, feedback, and problem solving: A field experiment. International Journal of Quality & Reliability Management. 1994; 11(4): 45–52.
72. Schurig IA. An investigation of the effect of after-action reviews on teams' performance-efficacy relationships [dissertation]. College Station (TX): Texas A&M University; 2013.
73. Siegel AI, Federman PJ. Communications content training as an ingredient in effective team performance. Ergonomics. 1973; 16(4): 403–16. doi: 10.1080/00140137308924530 PMID: 4757408
74. Sikorski EG, Johnson TE, Ruscher PH. Team knowledge sharing intervention effects on team shared mental models and student performance in an undergraduate science course. Journal of Science Education and Technology. 2011; 21(6): 641–51.
75. Smith-Jentsch KA, Salas E, Baker DP. Training team performance-related assertiveness. Personnel Psychology. 1996; 49(4): 909–36.
76. Stout RJ, Salas E, Fowlkes JE. Enhancing teamwork in complex environments through team training. Group Dynamics: Theory, Research, and Practice. 1997; 1(2): 169–82.
77. Wegge J, Haslam SA. Improving work motivation and performance in brainstorming groups: The effects of three group goal-setting strategies. European Journal of Work and Organizational Psychology. 2005; 14(4): 400–30.
78. The Joint Commission. Health care at the crossroads: Strategies for improving the medical liability system and preventing patient injury. 2005. Available from: http://www.jointcommission.org/assets/1/18/Medical_Liability
79. McEwan D, Beauchamp MR. Teamwork in sport: A theoretical and integrative review. International Review of Sport and Exercise Psychology. 2014; 7(1): 229–250.
80. De Dreu CKW. Cooperative outcome interdependence, task reflexivity, and team effectiveness: A motivated information processing perspective. Journal of Applied Psychology. 2007; 92(3): 628–38. doi: 10.1037/0021-9010.92.3.628 PMID: 17484546
81. Peters G-JY, de Bruin M, Crutzen R. Everything should be as simple as possible, but no simpler: Towards a protocol for accumulating evidence regarding the active content of health behaviour change interventions. Health Psychology Review. 2013; 9(1): 1–14. doi: 10.1080/17437199.2013.848409 PMID: 25793484
82. Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med. 2009; 6(6): e1000097.

Classroom Assessment and the National Science Education Standards
J. Myron Atkin, Paul Black, and Janet Coffey, Editors; Committee on Classroom Assessment and the National Science Education Standards; National Research Council
The National Academies Press. ISBN 978-0-309-38624-1; DOI 10.17226/9847. Available at http://nap.edu/9847

5 Professional Development

Teachers, teacher educators, professional-development specialists, and administrators may be most interested in this chapter.

Improvement by teachers of formative assessment practices will usually involve a significant change in the way they plan and carry out their teaching, so that attempts to force adoption of the same simple recipe by all teachers will not be effective. Success will depend on how each can work out his or her own way of implementing change. (Black, 1997)

Just as there is powerful evidence that formative assessment can improve students' learning and achievement, it is equally clear that sustained professional development for teachers is required if they are to improve this aspect of their teaching. Clear goals are necessary, along with well-understood criteria for high-quality student work. Accurately gauging student understanding requires that teachers engage in questioning and listen carefully to student responses. It means focusing on the students' own questions. It means figuring out what students comprehend by listening to them during their discussions about science. Teachers need to carefully consider written work and what they observe while students engage in projects and investigations. The teacher strives to fathom what the student is saying and what is implied about the student's knowledge in his or her statements, questions, work, and actions. Teachers need to listen in a way that goes well beyond an immediate right-or-wrong judgment.

Once the current level of understanding is ascertained, teachers need to use data drawn from conversations, observations, and prior student work to make informed decisions about how to help a student move toward the desired goals. They also need to facilitate and cultivate peer- and self-assessment strategies among their students. Although this list is not complete, it does begin to show the scope of professional development that is required to achieve high-quality classroom assessment. Many teachers already engage smoothly and effectively in the processes associated with effective classroom assessment, but these practices need to be developed and enhanced in all classrooms and among all teachers.

FEATURES OF PROFESSIONAL DEVELOPMENT

Change in assessment practices that are closely linked to everyday teaching will not come about through occasional in-service days or special workshops. Teacher professional-development research (Loucks-Horsley, Hewson, Love, & Stiles, 1998) indicates that a "one-shot" professional-development experience is rarely effective in any significant attempt to improve teaching practice. Because the kind of assessment discussed in this document is intimately associated with a teacher's fundamental approach to her responsibilities, and not simply an add-on to current practice, professional development must permit the examination of basic questions about what it means to be a teacher. Professional development needs to become a continuous process (see Professional Development Standards, NRC, 1996), in which teachers have opportunities to engage in professional growth throughout their careers.

Rooted in Practice

As Black's statement at the outset of this chapter suggests, widespread formative assessment will not come about solely through changes in policies, nor solely by adopting specific programs. New techniques can help, but understanding the basis for those techniques is also necessary if they are to be implemented in a manner consistent with their intent. Yet a teacher cannot successfully implement all of these changes overnight. Successful and lasting change takes time and deep examination. It is therefore critical to root professional-development experiences in what teachers actually do. This approach is also consistent with what research says about teacher learning. A recent study by the NRC (1999a) asserts that teachers continue to learn about teaching in many ways. Primarily, the study states, "they learn from their own practice" (p. 179). Teachers develop repertoires of action that are shaped both by standards and by the knowledge that is gleaned in practice (Wenger, 1998).

Reflective Practice

The standards for assessment and teaching stress the importance of incorporating reflection into regular teaching practice. The Teaching Standards (NRC, 1996) state that teachers should "use student data, observations of teaching, and interactions with colleagues to reflect on and improve teaching practice" (p. 42). Underlying many of the successful professional growth strategies is the use of data from a teacher's own classroom and experience. When teachers examine their own teaching, they begin to notice incidents and patterns that may otherwise have been overlooked. It is important that teachers allow feedback from their own practice to inform their future practice, including the beliefs and understandings involved in their teaching. Reflection and inquiry into teaching, and the local and practical knowledge that results, is a start towards improved assessment in the classroom.

One form this inquiry into teaching practice could take is action research: research conducted by teachers to improve aspects of their own teaching. This form of research is based on the principle that the practical reasoning of teachers is directed toward taking principled action in their own classrooms (Atkin, 1992). By making changes in their own professional activities, teachers learn about themselves and the improvements they desire. Their understanding is deepened when they discuss these experiences with peers who share similar values and who are trying to make similar changes (Atkin, 1994; Cochran-Smith & Lytle, 1999; Elliot, 1987; Hargreaves, 1998).

Collaborative

For teachers working in what is often considered a solitary culture, collaboration with peers is thus another feature of improving practices. This is supported by research findings that teachers learn through their interactions with other teachers (NRC, 1999a):

. . . research evidence indicates that the most successful teacher professional development activities are those that are extended over time and encourage the development of teachers' learning communities. These kinds of activities have been accomplished by creating opportunities for shared experiences and discourse around shared texts and data about student learning, and focus on shared decision making. (p. 192)

Deliberation among peers is a fundamental feature of professional development in any field. These deliberations can be formal or informal and can occur among colleagues who teach the same grade level or across grades. The exact composition of the group is secondary to the common interest in, and goal of, improved practice. Parallels do exist between what we know about teacher learning and our understanding of student learning. One such parallel is the importance of collecting information that can be used to inform teaching. Collaboration and cooperative groups help facilitate feedback; thus, opportunities that allow colleagues to observe attempts to implement new ideas—by visits to other classrooms and by watching videotape—should be built into professional-development experiences (NRC, 1999a). As well as finding out about effective practices, teachers can glean valuable lessons from sharing and discussing practices that are less than successful (NRC, 1999a). To paraphrase Thomas Edison: I didn't fail; I found out what doesn't work.

Multiple Entry Points

Because teachers have different professional needs, designers of professional-development programs usually try to provide multiple points of entry to the experience, as well as to encourage multiple forms of follow-up. Furthermore, they are cognizant of the fact that change does not happen all at once. To facilitate long-term growth, professional-development experiences need to provide for, and foster as a desired skill, sustained reflection and deliberation.

A major theme throughout this report is that formative assessment practices are, or ought to be, so deeply embedded in instructional practice that efforts to improve them open up a broad agenda of issues associated with curriculum, instruction, and assessment, and the interactions among all three. Figure 5-1 offers a graphic illustration of learning environments. The diagram illustrates that while assessment is a subject for study in and of itself, there will be overlaps with the areas of curriculum (knowledge) and instruction (the students), as well as an inescapable impact on the context in which the learning is taking place. Because of this close integration, examination of classroom assessment is a particularly fertile entry point for the study and improvement of a range of teachers' professional activities, all within an integrated context of content, teaching, and learning. It can give impetus and shape to teacher education at all levels, preservice and inservice.

[FIGURE 5-1 Perspectives on learning environments: overlapping learner-centered, knowledge-centered, and assessment-centered perspectives, situated within a community. SOURCE: NRC (1999a).]

Discussions with groups of teachers focusing on the assessment that goes on in their classrooms can quickly lead to some basic questions: What is really worth knowing? What is worth teaching? What counts as knowing? What is competence? What is excellence? How does a particular piece of work reflect what a student understands and is able to do? After conducting classroom-assessment professional-development programs with teachers, staff members at TERC (Love, 1999) in Cambridge, Massachusetts, concluded:

When done well, a discussion by teachers of students' endeavors can lead to deeper understandings about individual students and can provide information about the quality of assignments, teaching strategies and classroom climate. Perhaps most important of all, it provides a rich professional learning opportunity for teachers. (p. 1)

AN AGENDA FOR ASSESSMENT-CENTERED PROFESSIONAL DEVELOPMENT

This section articulates an agenda necessary to enhance these professional perspectives and to improve these skills. There is no single, clear sequence in which the various issues, skills, and perspectives entailed might best be explored and understood in teacher development. A variety of components will be called into play, sooner or later, in any rich program of professional development that starts from a focus on formative assessment. The order in which they arise may well depend on the particular interests and starting points of the teachers involved.

Any comprehensive professional-development program associated with improved formative classroom assessment corresponds closely to the framework for formative assessment itself. That is to say, professional-development activities need to address establishing goals for student learning and performance, identifying a student's understanding, and articulating plans and pathways that help students move towards the set goals. In addition, assessment-centered professional-development activities need to attend to providing feedback to students, science subject matter, conceptions of learning, and supporting student involvement in assessment.

Establishing Goals

Clarity about the purposes and goals being pursued in and through the curriculum is essential. Learning how to establish these goals is an important step toward improving assessment in one's classroom. In inquiry activities, for example, it is important to keep in mind both the development of the students' understandings and skills regarding the process of investigation and the aim of developing conceptual understanding of the phenomena being studied in that investigation. If skills of communication, or the capacity to reflect on one's own thinking (metacognition), are aims of the curriculum, then these also have to be on a professional-development agenda.

Identifying Student Understanding

Implementing effective formative assessment requires that a teacher elicit information about students' understandings as they approach any particular topic. This is particularly important since a student will likely interpret new material in the framework of her preexisting knowledge and understanding (the first main principle from How People Learn, NRC, 1999a). Professional development that will lead to improved assessment must begin with sensitivity to teachers' need to learn how to obtain information about a student's current level of understanding of the subject to be taught and learned.

A teacher influenced by the importance of probing students' current knowledge started his teaching of a new science topic with questions designed to elicit their existing understanding. He found that the class knew far more about energy than he had anticipated but lacked a coherent structure in which they could relate their various ideas. He thus abandoned the formal presentation of the whole menu of relevant knowledge that he had emphasized in previous years, and had intended to use again, and attempted instead to help them reorganize their existing understandings. He was able to incorporate student investigations into the unit that helped students challenge their ideas and apply concepts to everyday events. Overall, the work now took less time than before but was more ambitious in developing understanding of the concepts involved.

As this example demonstrates, teachers must develop and use means to elicit students' existing ideas and understandings. This may be achieved by direct questioning, whether orally, with individuals or in group discussions, or in writing. However, such questioning may be more evocative if it is indirect, that is, if it is about relevant phenomena or situations that are put before students and about which they have to think in order to respond. The responses may then indicate how students interpret the concepts and skills that they possess and choose to bring to bear on the specific problem. For example, the teacher above could ask his students at the outset to try to define mechanical, kinetic, or potential energy, or he could provide the students with a scenario and ask them to discuss it in terms of the types of energy. How best to evaluate and use the data that come from questioning is equally important to consider and is discussed in a later section.

The analysis here is not simply about a single starting point in a teaching plan. Curriculum also needs consideration. Content has to be organized in a meaningful way so that subgoals help lead to main goals. Box 5-1 provides an example of subgoals for inquiry science. Teachers may determine the subgoals with different levels of specificity. Some teachers may find dividing a concept into too fine a level of detail overly formal; it may deprive them of the flexibility to address the needs of individual students. Although many of these goals can be determined beforehand, they also may emerge and need to be reevaluated based on assessments occurring during the course of instruction. A check on one step or goal becomes part of the design for the next.

BOX 5-1 Fundamental Abilities and Understandings of Inquiry, 9-12 (Sample)

Ability: Identify questions and concepts that guide scientific investigations.
Elaboration: Students should formulate a testable hypothesis and demonstrate the logical connections between the scientific concepts guiding a hypothesis and the design of an experiment. They should demonstrate appropriate procedures, a knowledge base, and conceptual understanding of scientific investigations.

Ability: Design and conduct scientific investigations.
Elaboration: Designing and conducting a scientific investigation requires introduction to the major concepts in the area being investigated, proper equipment, safety precautions, assistance with methodological problems, recommendations for use of technologies, clarification of ideas that guide the inquiry, and scientific knowledge obtained from sources other than the actual investigation. The investigation may also require student clarification of the question, method, controls, and variables; student organization and display of data; student revision of methods and explanations; and a public presentation of the results with a critical response from peers. Regardless of the scientific investigation performed, students must use evidence, apply logic, and construct an argument for their proposed explanations.

Understanding: Scientists usually inquire about how physical, living, or designed systems function.
Elaboration: Conceptual principles and knowledge guide scientific inquiries. Historical and current scientific knowledge influences the design and interpretation of investigations and the evaluations of proposed explanations made by other scientists.

Understanding: Mathematics is essential in scientific inquiry.
Elaboration: Mathematical tools and models guide and improve the posing of questions, the gathering of data, the construction of explanations, and the communication of results.

To best help students meet their learning goals, subgoals often have to be identified and articulated. Coming to understand a particular model requires well-organized knowledge of concepts and inquiry procedures, which often requires time and many "little steps" to reach the larger goal. With a solid understanding of science, the underlying structure of the discipline can serve as the roadmap that guides a teacher in selecting and sequencing activities, assessments, and other interactions with students (NRC, 1999a).

Inquiry and the National Science Education Standards (NRC, 2000) elaborates on some of the more particular elements of inquiry as abilities and understandings for the K-4, 5-8, and 9-12 grade spans. Mastering the abilities and understandings associated with inquiry is difficult and can seem elusive even for the most experienced teacher. Such detail is useful for a teacher when articulating subgoals to support student inquiry in the classroom. Box 5-1 is an example of the delineation of the fundamental abilities and understandings for inquiry at the 9-12 level. For further elaboration, the Standards (NRC, 1996) offer complete descriptions of scientific inquiry as abilities and understandings at the K-4, 5-8, and 9-12 levels.

Articulating a Plan

This process of organizing content into meaningful steps and activities is one of the most demanding aspects of teaching. The teacher needs both a clear idea about the structure of the concepts and skills involved and knowledge of the ways in which students may progress. If intermediary goals are too ambitious, the step towards growth may be too difficult, while if they are too slight, students may not be challenged. An appropriate subgoal is one that goes beyond what the student can learn without help but is within reach given a reasonable degree of teacher support. For more background on the theoretical roots presented here, see Vygotsky's discussion (1962) of the zone of proximal development. Teacher knowledge of common misconceptions, and of the tools available to promote conceptual reconstruction or fluency with new skills, can powerfully inform the process of structuring the curriculum.

Responding to Students—Feedback

Teachers also need ways to respond to the information they elicit from students. One necessary step is being able to analyze and interpret students' responses to questions, or their actions in problem situations. In short, teachers need to use data from assessment in order to make appropriate inferences that form the basis of their feedback. This can require careful analysis to probe the meanings behind what students say, write, or do. Questions of good quality are those that evoke evidence relevant to critical points of understanding, but students may often respond in ways that are hard to interpret. Many studies show that seemingly incorrect responses to questions are evidence of a misinterpretation of the question rather than a misunderstanding of the idea being questioned (NRC, 1981). Difficulties with language, or with the contexts or purposes of a question, are often the cause. Although such difficulties can undermine the validity of formal tests, they need not undermine formative work by the teacher, provided that follow-up questions are used to check, as will happen if question responses are shared and explored in discussion with the teacher or with peers.

Understanding of Subject
Matter

A teacher’s interpretation of a student
response, questions, and action will be
related to that teacher’s understanding
of the concept or skill that is at issue.
Thus a solid understanding of the

subject matter being taught is essential.
Performance criteria need to be based
on authentic subject matter goals and on
a depth of understanding of the subject
matter. For formal tests, sound scoring
requires careful rubrics—assessment
tools that articulate criteria for differen-
tiating between performance levels—
that help the assessor to distinguish
between the fully correct, the partially
correct, and the incorrect response.
Such rubrics are even more useful if the
common ways in which answers can
be partially correct are identified,
inasmuch as each partially correct
response requires a different kind of
help from the teacher as the student
works to overcome particular
obstacles. For an example of
a rubric, see Table 4-3 in Chapter 4.

Similarly, less formal assessments
also may benefit from a rubric-type
tool for interpretation. For example,
during a classroom discussion, a
teacher can draw on her previous
experience with a student’s particular
difficulty in order to formulate the
most helpful oral response.

Exploring Conceptions of
Learning

Underpinning such appropriate
rubrics or frameworks will be the
teacher’s conception of how a student
learns both generally and in the
particular topic of study. A vision of
learning will inform teachers’ guidance
to students. Addressing issues related
to learning may sound formidable, but
all teachers already have such concep-
tions, even if they are incomplete and
implicit. Ideas about learning are part
of any teacher’s pedagogic skills;
making them explicit so that they can
be shared and reflected upon with
colleagues may refine these skills.

SUPPORTING STUDENT
INVOLVEMENT IN ASSESSMENT

A central issue for an assessment-
centered professional-development
agenda is the development of self-
reflection, or metacognition, among
students. Evidence of the powers of
metacognition can be evoked through a
variety of activities: when students are
asked to review what they have learned,
compose their own test questions, justify
to others how their work meets the
goals of the learning, and assess the
strengths and weaknesses of their own
work or the work of their peers.

Attending to the ways in which
students arrived at their results, as
well as to the qualities of those results,
helps give guidance about the
metacognitive aspect of students’
development. Here, as elsewhere,
teachers have to develop a clear
notion of the meaning and importance
of metacognition, and this notion has
to be related to students’ work in
practice. Reflection and discussion
with peers can help begin the
examination of these notions.

A focus on student self-reflection
raises a final issue. As argued in
Chapter 3, an important task required
by and promoted by good formative
assessment is the cultivation of self-
assessment and peer-assessment
practices among students. The agenda
for the development of the professional
capabilities of teachers, or much of it,
also can be viewed as an agenda for the
development of the capabilities of
students to become independent and
lifelong learners. In particular, sharing
with students the goals, as perceived
and pursued by their teachers, and
sharing the criteria of quality by which
those teachers guide and assess their
work, are essential to students’ growth
as learners. For teachers, this implies a
change of understanding of their role, a
shift away from being seen as director
or controller towards a model of guide
or coach.

An Example

The following vignette highlights
many issues previously discussed,
offering an example of the sometimes
serendipitous nature of assessment-
centered professional development.
In this case, teachers were working
together over the course of a year to
design summative assessments and
scoring mechanisms and to discuss
the student work generated during
larger-scale summative assessment
tasks administered at the state level in
Delaware.


Vignette
The task of the Lead Teacher Assessment Committees seemed straightforward:
develop end-of-unit assessments for the inquiry-based curricular modules being used in
elementary schools across the state. After months of often frustrating efforts, it was not
until the teachers recognized that they first needed to examine their values, beliefs, and
assumptions about student learning, and, in turn, the design and purpose of the
assessments they were being asked to develop, that any substantive progress was
made. What emerged from this process was a self-created learning organization in
which assessment became a force that would support and inform instructional decision
making at multiple levels of Delaware’s science education system. Just how this process
evolved will be related through the professional-development experiences of a team of
fifth-grade teachers charged with the responsibility of creating a performance-based
assessment for an ecosystem module.

From the very beginning in 1992, Delaware’s standards-based reform initiative
included teachers in crucial roles, such as developing the state science content
standards and serving on the framework commission. Within this reform context, in 1997
elementary lead teachers from across the state began to collaborate on the develop-
ment of end-of-unit performance assessments for the curricular modules used in their
classrooms. Even though the teaching guides that accompanied the modules included
assessments, many Delaware teachers felt that the majority of the assessment items did
not elicit the kinds of responses needed to determine if their students really understood
the major concepts central to the State Science Standards. Although the teachers were
in agreement that the accompanying assessments were inadequate, there was very little
agreement as to how best to proceed in developing alternative assessments. After days of
discussion, the team of fifth-grade teachers decided to begin by constructing a concept
map to ensure that there was consensus about which “big ideas” and processes from
the ecosystem module they considered important enough to assess.

What happened next was a surprise not only to the teachers but also to the leaders
facilitating the development process. As the concept map began to take shape, it
became apparent that many of the teachers were confusing the skills their students
needed to perform the ecosystem activities with the major concepts they needed to
understand. Attempts to clarify the confusion led to a series of conversations in which
the teachers realized that in their zeal to provide “hands-on” experiences, they had
often taught scientific processes in isolation from, or at times to the exclusion of,
scientific concepts. Consequently, observing the ecocolumn and constructing observa-
tion charts had taken precedence over students explaining the relationship between the
living and nonliving components of the ecocolumn they were studying.

In efforts to determine the cause of the over-emphasis, the assessment-development
team made some interesting discoveries. The teachers began to openly acknowledge
that even though each of them had participated in 30 hours of professional develop-
ment centered on the ecosystem module, they still did not feel comfortable with some
important ecological concepts. This conversation was especially insightful for the
leaders facilitating the development process, since they had assumed that 30 hours of
professional development were adequate. At that point in the process, it became clear
that once again the timetable for developing the assessments needed to be modified
so that the teachers could better understand the concepts they were being asked to
teach and develop an assessment around.

Through this period of self-discovery, the teachers also began to realize, as they
reviewed the teacher’s guide, that their perceptions regarding what should be empha-
sized during the course of the unit had been strongly influenced by the wording and
formatting of the guide. Because the titles of most of the ecosystem activities began
with action verbs, such as “observing,” “adding,” “setting up,” they had naturally
inferred that their instructional focus should be process oriented. As the team of fifth-
grade teachers became more confident about their content knowledge and more
comfortable with their newly acquired role of being a “wise curricular consumer,” they
were able to identify the major concepts from the Delaware Science Standards they did
not feel had been made explicit enough in the ecosystem module. They also began to
rethink how the investigative activities needed to be presented and taught to support
and strengthen students’ conceptual understandings. This rethinking automatically led
to the kinds of assessment discussions they had been unable to engage in for weeks.
With a new and collective understanding about the module’s instructional goals, the
challenge for the team of teachers became twofold: how to create a performance-
based assessment that could be used to evaluate both a student’s skill level and concep-
tual understandings, and how not to reinforce the process/concept dichotomy they
themselves had experienced.

Several other crucial lessons learned by the assessment-development team were the
importance of using educational research to inform decision-making processes and the
need to seek outside expertise to stretch the thinking of team members. One of the most
challenging issues facing the team was ensuring that the assessment items under develop-
ment could be used to evaluate a range of student capabilities—from making accurate
observations to formulating well-reasoned explanations. This included discussions about
how items had to match what they were intended to assess. For example, if the teachers
wanted to assess critical reasoning, the items had to probe in ways that would elicit it.

FIGURE 5-2 First version of assessment item.
SOURCE: Adapted from Delaware Science Coalition (1999).

As Figure 5-2 shows, once a draft version of the ecosystem assessment was finally
developed, the team began efforts to construct scoring criteria. Initially, attempts were
made to develop very generic or holistic scoring rubrics, but the team soon realized that
if these assessments were going to be used to inform teachers’ instructional practices,
generic rubrics were simply not diagnostic enough. The problem then became “so now
what?” Leaders facilitating the team efforts once again realized the need to look beyond
the expertise of the group for an alternative to generic rubrics, so they brought in an
assessment expert who was willing to work with the teachers in the process.

Bringing in an outside expert was good for the entire process. In conversations that
ensued in the expert’s presence, the teachers accepted that not all of the students
got the answers wrong for the same reason. It also was through the expert’s insistence
that the team began to explicitly state the criteria for a complete response. This exer-
cise would initiate some of the most interesting discussion that occurred during the
entire development process. As the debates about the criteria got into full swing, the
teachers recognized that, because they each had their own set of internalized criteria
for evaluating student work, what was considered quality work in one of their classes
was not necessarily considered quality work in another. Having to explicitly state the
criteria became the impetus for very intense discussions regarding what counts as
evidence of student learning and how good is good enough. These discussions would
be revisited again and again, and evidence of student learning would ultimately
become the foundation of the assessment decision-making process.

Although the plan had always been to pilot draft versions of the ecosystem assess-
ment so that student work could be used to modify both the instrument and scoring
criteria, none of the team members anticipated how profoundly many of their assump-
tions about student learning would be challenged by critically examining and analyz-
ing student work. Because so much time and effort had gone into the design of the
assessment instrument and scoring rubrics, team members felt very confident that they
had developed a quality product. The teachers naturally experienced a great sense of
ownership regarding the assessment, and although the process at times had been
grueling, they felt very proud of their efforts. When samples of student work from the
pilot were returned for scoring and analysis and were not of the quality anticipated, the
first tendency was to blame the students. It took several rounds of discussions and
more objective evaluations of student responses before the team was ready to admit
that at least some of the problem was the way in which the items or rubrics had been
designed. Student responses from across the state provided converging evidence,
difficult to ignore, that some items were clearly confusing and therefore did not allow
students the best opportunity to demonstrate their understandings. Box 5-2 presents
two samples of student work to illustrate this point. The samples are typical student
responses for the item that appeared on the piloted version, as shown in Figure 5-2.

BOX 5-2 Student Work from Original Version (Sample #1 and Sample #2; responses not reproduced here).
SOURCE: Delaware Science Coalition (1999).

After samples of student work generated by earlier versions of the assessment had been
analyzed, it became apparent that the item itself was contributing to students’ oversimplifi-
cation of an important scientific concept: the interdependency of organisms within an eco-
system. In their efforts to reduce the complexity of the wetland assessment item, the lead
teachers were inadvertently fostering a very linear model of interdependency and were
actually setting the students up to respond incorrectly. The overwhelming state-wide
response, that if the large-mouth bass disappeared then the heron would automatically die
and nothing would happen to other organisms, forced the lead teachers not only to take
a much closer look at the item but also to begin to question reasons for the prevalence of
such a response.

In the revised version, shown in Figure 5-3, the item was rewritten to reinforce a web-like
model of interdependency that more closely approximates what actually occurs in wetland
ecosystems. Results from the subsequent field tests indicated that the modifications to the
item were in part responsible for more complete and accurate student responses, as seen
in Box 5-3. In their responses to this item, more students mentioned how populations of
organisms would be affected rather than how a single organism would be.

FIGURE 5-3 Revised version of assessment item.
SOURCE: Adapted from Delaware Science Coalition (1999).

BOX 5-3 Student Work from Revised Version (Sample #1 and Sample #2; responses not reproduced here).
SOURCE: Delaware Science Coalition (1999).

Several other factors contributed to an increase in student performance on this item.
As the lead teachers began to focus on acquiring evidence of student understanding,
they realized several important things about their instructional practice. For one, most of
their instructional emphasis had focused exclusively on the student-built ecocolumns and
not on creating a learning environment in which students were encouraged and chal-
lenged to extend their own understandings beyond their own ecocolumn model to other
local ecosystems. The responses to the item prompted teachers to go beyond the kit
and explore local habitats. Additionally, the conversations revealed that many of the
teachers also had developed the linear interdependency model that their students sub-
scribed to in their responses. Clarifying this particular content issue with teachers resulted
in immediate and significant improvement in student responses. Teachers continue to
work collaboratively to develop assessment items and rubrics aligned with their curricu-
lum and to identify areas that would prove rich for further professional development.

As the Delaware experience indi-
cates, groups of teachers and other
experts coming together around
student work can be a powerful
experience. The “messiness” of this
professional-development experience
is in many ways its very strength. It
also demonstrates a realistic view of
the complexity of assessment-related
discussions. Allowing the valuable
conversations to emerge and run their
course provided a richness that may
not have been captured in a session
with a strict agenda. Also, it certainly
would not have happened in a single
scoring session. The conversations
about scoring student work, assess-
ment criteria, or assessment designs
quickly get to issues of content and
questions of worth.

One important element highlighted
in the Delaware experience was the
discussion concerning the valid
inferences that can be made from
assessment data. All teachers must
grapple with this issue as they use
assessment data to inform teaching
decisions. In this instance, asking
critical questions such as, “What does
this piece of evidence show?” and
“What else do I need to find out?” led
the teachers to identify the flaws in
their assessment items.

As this case also conveys, efforts
to use assessment as a cornerstone of
teacher professional development can
spawn a deeper knowledge of science
content. To design the assessment and
score the students’ work, teachers had
to probe and extend their own under-
standings. Many of the teachers
discovered that even they did not fully
grasp all of the ecology concepts being
taught and assessed. The teachers
recognized that they were led by the
wording and format of the curriculum
guide rather than their own under-
standing of the material. Identifying
and addressing areas where teachers
need additional support to better learn
the content is very important, especially
when one considers research showing
that a lack of science-content knowl-
edge limits a teacher’s ability to give
appropriate feedback, including identi-
fying misconceptions (Tobin &
Garnett, 1988). In her 1993 work,
Deborah Ball demonstrated the impor-
tance of subject-matter knowledge in
teaching students for understanding,
which requires careful listening to why
and what the students are saying.

Numerous routes can be taken for
professional development aimed at
improving assessment and under-
standing of students. For example,
placing student work at the center of
its efforts, Project Zero brings
teachers together to discuss and
reflect on assessment practices and
student work. Teachers involved in
the project engage in a “collaborative
review process” as they look critically
and deeply at student work. They are
urged to stick to the piece of work
and not look for psychological and
social factors that could prevent a
student from producing strong work
(NRC, 1999a).

In a 13-country study of 21 innova-
tions in science, mathematics, and
technology education, the researchers
(Black & Atkin, 1996) noted seven
essential elements of programs that
were successful in promoting changes
in teachers. These elements are
displayed in Box 5-4.

Attention to the assessment that
occurs in their classrooms forces
teachers to focus on some aspect of
their practice. Assessment-centered
professional development, regardless
of the starting point, can be a powerful
vehicle for teacher professional
growth when performed collaborative-
ly, with regular reflection, and based
on knowledge gleaned in practice.


BOX 5-4 Some Basic Features of Professional Development

■ Change begins with disequilibrium—a perception that current practices and policies cannot
help the teachers achieve their goals. If that perception does not exist, then any voluntary project
will have to create it.

■ Teacher networks can be powerful. Describing the effects of setting up networks, one
project reported: “Exposure [to other ideas, resources, and opportunities] broadens teachers’
awareness of possibilities for change and fosters a sense that alternatives to traditional knowledge
and beliefs, classroom practices, and professional involvement are available and within their
reach.”

■ Teachers react against ideas and materials that are theoretically sound but do not function
in the classroom. They seek proof that other professionals with whom they identify are making
new methods work. Such existence proof—the fact that others can do it—gives them moral
support and challenges them.

■ Demonstrating an idea to teachers in action in a real context deepens their understanding
in powerful, subtle, and manifold ways. Such modeling adds to the existence proof the proof of
the teacher’s own experience.

■ Innovation is risky. Personal support—which must be both knowledgeable and close at
hand—is then essential, as the isolated teacher can easily lose direction and lose heart when the
inevitable, often unexpected, difficulties arise.

■ It is most often the case that the whole environment of schools that demonstrably promote
effective professional development also encourages experimentation.

■ We know from research into teachers’ professional development that change without
reflection is often shallow and incompetent. Such reflection must follow on experimentation,
however well or badly an experiment may turn out. Yet teachers are rarely given the time or
stimulus for reflection.

SOURCE: Black and Atkin (1996).

KEY POINTS

• Professional development
becomes a lifelong process directed
towards catalyzing professional growth.

• Assessment offers fertile ground
for teacher professional development
across a range of activities because of
the close integration of assessment,
curriculum, teaching, and learning.
There is no “best” place to start and
no “best” way to proceed.

• Professional development should
be rooted in real-world practice.

• Regular and sustained reflection
and inquiry into teaching is a start
towards improved daily assessment.

• Collaboration is necessary, as is
support at the school and broader
systems level.

