4D2-9 – Performance Reports. See details below. Please follow all instructions and answer the questions as given.

Discussion Instructions:

Read Chapter 10 in Program Evaluation and Performance Measurement.  Address and discuss the following:

Analyze the benefits and drawbacks of performance reports.

 

* You must understand and analyze successful performance measurements.

* Articulate challenges with implementing performance measurement systems with stakeholders. 

* Communicate through writing that is concise, balanced, and logically organized.

If program managers are expected to be involved in developing, implementing, and using
performance measures (particularly measures that focus on outcomes), the intended uses
become important. Managers will inherently have stronger incentives to take “ownership” of
measures that are used formatively than of measures treated as indicators of the program
outcomes attributable to the organization and used for summative purposes.

Summative uses of outcome results create situations where managers, knowing that they
often do not control the environmental factors that can affect their program outcomes, perceive
a dilemma. If they are expected to develop and use performance measures, they can be called
to account if they fail to do so. But if they do develop and use performance measures, they
could be held responsible for poor performance results that reflect factors that they cannot
control. In many public organizations—NPM initiatives notwithstanding—procedural rules
and restrictions, and political risk aversion, still mean that even if managers do see innovative
ways of changing programs or program environments to increase the probability of success,
they may have limited freedom to do so.

There is a related problem for organizations that deliver human service programs. In
Chapter 2, we introduced the idea that programs vary in the a priori robustness of their intended
cause-and-effect linkages. Programs that are based on engineering knowledge, such as a
highway maintenance program, typically incorporate technologies that have a high probability
of success. If the program is implemented as planned, it is highly likely that the intended
outcomes will occur. At the other end of the continuum, there are many programs that operate
with low-probability technologies. In general, these programs focus on human service issues
and involve efforts to ameliorate or change human conditions, knowledge, attitudes, or
behaviors. Even if these programs are fully implemented, we often observe mixes of program
successes and failures. Our knowledge of what works tends to be far less certain than is true of
high-probability technologies. There are usually important environmental variables that affect
both program activities and outcome variables. In many situations, it is very difficult to
mitigate the influences of these variables—low-probability technologies tend to be much more
vulnerable to external influences.

For program managers who are involved in delivering programs that incorporate low-
probability technologies, the best efforts of their organization may not succeed. For these kinds
of programs, being accountable for outcome-based performance measures is daunting. Aside
from the challenges of measuring intended outcomes, program managers will point out that
outcomes are not really under their control.

An example might be a social welfare agency that has a program focused on single
mothers. The objective of the program is to support these women in their efforts to obtain
permanent jobs. The logic model for the program emphasizes training and child care support
during the training. But the single mothers may have more needs than this program can meet.
Some may need to deal with substance abuse problems or deal with former partners who
continue to harass the family. Some may need to deal with psychological issues from their own
family of origin. A single program typically will not be able to address this full range of issues,
so holding any one program accountable for the overall objective is not appropriate. Even if a
comprehensive program could be designed and implemented, the state of our knowledge about
how to achieve the objective likely means that there will be many partial successes and failures
in program outcomes. Holding managers to account for their efforts is appropriate, but holding
them to account for the outcome may easily result in gaming behaviors.

The Levels of Analysis Problem: Conflating Organizational, Program, and Individual
Performance

In our discussions of performance measurement, we have generally referred to situations
where program performance is being measured. The logic modeling approaches discussed in
Chapter 2, the research design considerations included in Chapter 3, and the measurement
issues in Chapter 4 are focused on evaluating programs. However, performance measurement
can occur at different levels within and between public and nonprofit organizations.

In principle, we can measure the performance of individuals, the performance of programs
or administrative units, the performance of organizations, the performance of sectors (e.g., the
health and social services sector of a government), and the performance of whole governments.
In some performance measurement systems where public reporting has been mandated, there is
a problem with the conceptual framework for performance measurement. Even in the
scholarly literature, program and organizational performance are sometimes used
interchangeably. Hatry (2006), for example, refers to program performance in his book but
does not always distinguish between program and organizational performance.

Where government departments are expected to report on their performance, there appears
to be an assumption that measuring departmental performance also reveals the performance of
the programs within the overall organization. Similarly, if a
department is meeting its performance targets, there can be a tendency to conclude that its
programs must also be performing well. Although this assumption simplifies the task of
measuring performance, it is not the case that one level of analysis is equivalent to the other.
Knowing how ministries or departments are performing is not equivalent to knowing how their
programs are performing, or vice versa.

This levels of analysis problem can be extended. Performance measurement can also
focus on how well individuals are doing in an organization or a program. Again, it might seem
reasonable to assume that if the people are performing well, then the organization or program
is performing well. But as managers know, there is more to effective organizations than the
programs and the people who deliver them. It is fallacious to assume that knowledge of
performance at one level implies knowledge of performance at other levels. Obtaining a
credible picture of how well an organization is doing requires measuring performance at each
level: individual, program, department, and the organization as a whole.

SUMMARY

Performance measurement and public reporting are now a central part of governmental efforts to
demonstrate public accountability. The normative performance management cycle we
introduced in Chapter 1 suggests that public performance reporting should have real
consequences for the reporting organizations and that the prospect of these consequences will
serve as a driver for performance improvements.

In Chapter 10, we have looked at the ways that performance information is used and
whether public performance reporting does improve performance. The experience with public
reporting is that where performance reporting is high stakes and is accompanied by
independent ratings of organizations—in England, a three-star rating system was used between
2000 and 2005—challenging the reputations of public organizations can improve performance.

But high-stakes performance measurement settings usually produce unintended side
effects, and the key one is gaming of performance results. In the English ambulance service,
gaming included modifying the reported response times for ambulances to meet the 8-minute
target for emergency responses to calls. Managing gaming responses to performance targets
requires investing in strategies such as audit functions, capable of checking data systems and
performance results. Gaming is dynamic; that is, it evolves over time. Responses to gaming
will reduce it but probably never eliminate it.

In most settings, public performance reporting is high stakes in the sense that negative
political consequences can develop. The risk of that happening is situational. In adversarial
political cultures, for example, where risk aversion is a factor in administrative and even policy
decisions, public performance reports can become part of efforts to minimize risks, including
making sure that they contain “good news” or at least performance results that are not negative.

In high-risk settings where public organizations collect their own performance data and
prepare their own performance reports, performance information that is included in public
reports may be decoupled from the performance information that is used for internal
management purposes. Decoupling is a strategy for reducing risk; it offers performance results
to meet nominal accountability expectations, and it protects managers from possible blowback
from political controversy over a public performance result.

In low-risk settings (e.g., most local governments), it is easier to use performance results for
both internal performance management and external accountability. Although there is limited
research on this relationship, in one local government in Western Canada, managers and
council members agreed that the performance information that was produced by managers was
both useful and credible. At the same time, neither the managers nor the council members were
substantially concerned about reporting negative performance results publicly.

There is little empirical evidence that politicians use ex post performance information in
their roles and deliberations. A study that examined the ways that legislators use performance
reports over time showed that expectations were high before the first reports were received, but
actual uses were very modest and focused mainly on general (and perhaps symbolic)
accountability uses as well as information dissemination uses.

Although there has been widespread adoption of performance measurement and public
reporting in many countries, there is growing evidence of a pulling back from high-stakes
performance measurement systems. In 2005, the British government stopped the “naming and
shaming” approach that was being used in England. In 2010, the U.S. government stopped
using the high-stakes PART process for rating programs on their effectiveness.

Although there has been a pulling back from high-stakes performance measurement and
public reporting, performance measurement is here to stay. It has become an expected part of
public accountability and will continue to be used to produce public performance reports.
Performance measurement has survived and evolved with different governmental reform
initiatives, the latest one being NPM. Even if NPM fades or is replaced by another wave of
public sector reforms, it is unlikely that performance measurement will disappear.

For program evaluators, performance data can be a useful part of evaluations, but the
gaming-related problems that can develop in high-stakes performance measurement and
reporting settings suggest that evaluators have to be cautious about which data they rely on.
Performance information that is developed for internal use is likely to be more valid and
reliable than information produced for external, summative purposes.

DISCUSSION QUESTIONS

1. What are the key differences between the technical/rational view of implementing
performance management in organizations and the political/cultural view?

2. Some commentators have suggested that failures of performance measurement systems
to live up to their promises are due to poor or inadequate implementation. This view
suggests that if organizations properly implement performance measurement, paying
attention to what is really needed to get it right, performance measurement will be
successful. Another view is that performance measurement itself is a flawed idea and
that no amount of attention to implementation will solve its problems. What are your
views on this issue?

3. Will auditing performance reports increase their usefulness? Why?
4. If you were making recommendations to the governor of your state about ways of improving the performance measurement system, what recommendations would you make?

5. The record on uses of performance reports suggests that they are used by decision
makers to some extent, but there is much room for improvement. What three things
would you suggest to make performance reports more useful for elected decision
makers?

6. What is the levels of analysis problem in performance measurement systems?
7. What does it mean for organizational managers to “game” performance measures? What are some ways of reducing the occurrence of this problem?
8. What is the “ratchet effect” in setting targets for performance measures?

REFERENCES

Askim, J. (2007). How do politicians use performance information? An analysis of the
Norwegian local government experience. International Review of Administrative Sciences,
73(3), 453–472.

Auditor General of British Columbia. (2008). Strengthening accountability in British
Columbia: Trends and opportunities in performance reporting. Victoria, British Columbia,
Canada: Queen’s Printer.

Auditor General of British Columbia & Deputy Ministers’ Council. (1996). Enhancing
accountability for performance: A framework and an implementation plan—Second joint
report. Victoria, British Columbia, Canada: Queen’s Printer.

Barrett, K., & Greene, R. (2008). Grading the states ’08: The mandate to measure. Governing,
21(6), 24–95.

Bevan, G., & Hamblin, R. (2009). Hitting and missing targets by ambulance services for
emergency calls: Effects of different systems of performance measurement within the UK.
Journal of the Royal Statistical Society. Series A (Statistics in Society), 172(1), 161–190.

Bevan, G., & Hood, C. (2006). Health policy: Have targets improved performance in the
English NHS? BMJ, 332(7538), 419–422.

Bish, R. (1971). The public economy of metropolitan areas. Chicago, IL: Markham.

Bouckaert, G., & Halligan, J. (2008). Managing performance: International comparisons. New York: Routledge.
CCAF-FCVI. (2002). Reporting principles: Taking public performance reporting to a new level. Ottawa, Ontario, Canada: Author.

CCAF-FCVI. (2006). Users and uses: Towards producing and using better public
performance reporting: Perspectives and solutions. Ottawa, Ontario, Canada: Author.

CCAF-FCVI. (2007a). CCAF’s improved public performance reporting program gets guidance from task force. Retrieved from http://www.ccaf-fcvi.com/english/updates/PPRPTaskForce03-15-07.html

CCAF-FCVI. (2007b). How to improve public performance reports: Major topic at conference chaired by CCAF. Retrieved from http://www.ccaf-fcvi.com/english/updates/VictoriaConference02-15-07.html

Downs, A. (1965). An economic theory of democracy. New York: Harper & Row.

Dunleavy, P., Margetts, H., Bastow, S., & Tinkler, J. (2006). New public management is dead: Long live digital-era governance. Journal of Public Administration Research and Theory, 16(3), 467–494.

Epstein, P. D., Grifel, S. S., & Morgan, S. S. (2004). Auditor roles in government performance measurement: A guide to exemplary practices at the state, local and provincial levels. Alpharetta, GA: Institute of Internal Auditors Research Foundation.

Feller, I. (2002). Performance measurement redux. American Journal of Evaluation, 23(4), 435–452.

Frisco, V., & Stalebrink, O. (2008). Congressional use of the Program Assessment Rating
Tool. Public Budgeting & Finance, 28(2), 1–19.

Gill, D. (Ed.). (2011). The iron cage recreated: The performance management of state
organisations in New Zealand. Wellington, NZ: Institute of Policy Studies.

Gilmour, J., & Lewis, D. (2006). Does performance budgeting work? An examination of the
Office of Management and Budget’s PART scores. Public Administration Review, 66(5),
742–752.

Government of BC and the Auditor General of BC. (2003). Reporting principles and an
assurance program for BC: Progress report on the February 2002 recommendations of the
Public Accounts Committee of British Columbia related to building better reports.
Victoria, British Columbia, Canada: Author.

Government of British Columbia. (2000). Budget Transparency and Accountability Act: [SBC
2000 Chapter 23]. Victoria, British Columbia, Canada: Queen’s Printer.

Government of British Columbia. (2001). Budget Transparency and Accountability Act [SBC
2000 Chapter 23] (amended). Victoria, British Columbia, Canada: Queen’s Printer.

Government Performance and Results Act of 1993, Pub. L. No. 103-62.
Government Performance and Results Act Modernization Act of 2010, Pub. L. No. 111-352.
Hatry, H. P. (1974). Measuring the effectiveness of basic municipal services. Washington, DC: Urban Institute and International City Management Association.

Hatry, H. P. (1980). Performance measurement principles and techniques: An overview for local government. Public Productivity Review, 4(4), 312–339.

Hatry, H. P. (2006). Performance measurement: Getting results (2nd ed.). Washington, DC: Urban Institute Press.
Hibbard, J. (2008). What can we say about the impact of public reporting? Inconsistent execution yields variable results. Annals of Internal Medicine, 148, 160–161.

Hibbard, J., Stockard, J., & Tusler, M. (2003). Does publicizing hospital performance stimulate quality improvement efforts? Health Affairs, 22(2), 84–94.

Hildebrand, R., & McDavid, J. (2011). Joining public accountability and performance
management: A case study of Lethbridge, Alberta. Canadian Public Administration, 54(1),
41–72.

Hood, C. (1991). A public management for all seasons? Public Administration, 69(1), 3–19.

Hood, C. (2006). Gaming in targetworld: The targets approach to managing British public services. Public Administration Review, 66(4), 515–521.

Hood, C., Dixon, R., & Wilson, D. (2009). “Managing by numbers”: The way to make public services better? Retrieved from http://www.christopherhood.net/pdfs/Managing_by_numbers

Kettl, D., & Kelman, S. (2007). Reflections on 21st century government management.
Washington, DC: IBM Center for the Business of Government.

Klay, W. E., McCall, S. M., & Baybes, C. E. (2004). Should financial reporting by government
encompass performance reporting? Origins and implications of the GFOA-GASB conflict.
In A. Khan & W. B. Hildreth (Eds.), Financial management theory in the public sector (pp.
115–140). Westport, CT: Praeger.

Le Grand, J. (2010). Knights and knaves return: Public service motivation and the delivery of
public services. International Public Management Journal, 13(1), 56–71.

Manitoba Office of the Auditor General. (2000). Business and performance measurement:
Study of trends and leading practices. Winnipeg, Manitoba, Canada: Author.

Mayne, J., & Wilkins, P. (2005). “Believe it or not?” The emergence of performance
information auditing. In R. Schwartz & J. Mayne (Eds.), Quality matters: Seeking
confidence in evaluating, auditing, and performance reporting (pp. 237–259). New
Brunswick, NJ: Transaction.

McDavid, J. C. (2001). Solid-waste contracting-out, competition, and bidding practices among
Canadian local governments. Canadian Public Administration, 44(1), 1–25.

McDavid, J. C., & Huse, I. (2012). Legislator uses of public performance reports: Findings
from a five-year study. American Journal of Evaluation, 33(1), 7–25.

McLean, I., Haubrich, D., & Gutierrez-Romer, R. (2007). The perils and pitfalls of
performance measurement: The CPA regime for local authorities in England. Public Money
and Management, 27(2), 111–118.

Melkers, J. (2006). On the road to improved performance: Changing organizational
communication through performance management. Public Performance & Management
Review, 30(1), 73–95.

Melkers, J., & Willoughby, K. (1998). The state of the states: Performance-based budgeting
requirements in 47 out of 50 states. Public Administration Review, 58(1), 66–73.

Moynihan, D. P. (2008). The dynamics of performance management: Constructing information
and reform. Washington, DC: Georgetown University Press.

Niskanen, W. A. (1971). Bureaucracy and representative government. New York: Aldine-
Atherton.

Osborne, D., & Gaebler, T. (1992). Reinventing government: How the entrepreneurial spirit is
transforming the public sector. Reading, MA: Addison-Wesley.

Otley, D. (2003). Management control and performance management: Whence and whither?
British Accounting Review, 35(4), 309–326.

Pentland, B. T. (2000). Will auditors take over the world? Program, technique and the
verification of everything. Accounting, Organizations and Society, 25(3), 307–312.

Perrin, B. (1998). Effective use and misuse of performance measurement. American Journal of
Evaluation, 19(3), 367–379.

Poister, T. H. (2010). The future of strategic planning in the public sector: Linking strategic
management and performance. Public Administration Review, 70(Suppl. S1), S246–S254.

Pollanen, R. M. (2005). Performance measurement in municipalities: Empirical evidence in
Canadian context. International Journal of Public Sector Management, 18(1), 4–24.

Pollitt, C. (2011). 30 years of public management reforms: Has there been a pattern? (Background paper for the World Bank consultation exercise). Retrieved from http://siteresources.worldbank.org/EXTGOVANTICORR/Resources/Politt

Pollitt, C., Bal, R., Jerak-Zuiderent, S., Dowswell, G., & Harrison, S. (2010). Performance
regimes in health care: Institutions, critical junctures and the logic of escalation in England
and the Netherlands. Evaluation, 16(1), 13–29.

Power, M. K. (1997). The audit society: Rituals of verification. Oxford, UK: Oxford University
Press.

Propper, C., & Wilson, D. (2003). The use and usefulness of performance measures in the
public sector. Oxford Review of Economic Policy, 19(2), 250–267.

Savas, E. S. (1982). Privatizing the public sector: How to shrink government. Chatham, NJ:
Chatham House.

Savas, E. S. (1987). Privatization: The key to better government. Chatham, NJ: Chatham
House.

Schaffner, B. F., Streb, M., & Wright, G. (2001). Teams without uniforms: The nonpartisan
ballot in state and local elections. Political Research Quarterly, 54(1), 7–30.

Schwartz, R., & Mayne, J. (Eds.). (2005). Quality matters: Seeking confidence in evaluating,
auditing, and performance reporting. New Brunswick, NJ: Transaction.

Steele, G. (2005, April). Re-aligning resources and expectations: Getting legislators to do
what they “should.” Paper presented at the 25th Anniversary Conference of CCAF-FCVI,
Ottawa, Ontario, Canada.

Sterck, M. (2007). The impact of performance budgeting on the role of the legislature: A four-
country study. International Review of Administrative Sciences, 73(2), 189–203.

Streib, G. D., & Poister, T. H. (1999). Assessing the validity, legitimacy, and functionality of
performance measurement systems in municipal governments. American Review of Public
Administration, 29(2), 107–123.

Thomas, P. G. (2006). Performance measurement, reporting, obstacles and accountability: Recent trends and future directions. Canberra, ACT, Australia: ANU E Press. Retrieved from http://epress.anu.edu.au/anzsog/performance/pdf/performance-whole

Thomas, P. G. (2008). Why is performance-based accountability so popular in theory and so
difficult in practice? In KPMG International (Ed.), Holy grail or achievable quest?
International perspectives on public sector performance management (pp. 169–191).
Toronto, Canada: KPMG International.

Treasury Board of Canada Secretariat. (2009). Policy on evaluation. Retrieved from
http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=15024

Wankhade, P. (2011). Performance measurement and the UK emergency ambulance service.
International Journal of Public Sector Management, 24(5), 384–402.

Weber, M. (1930). The Protestant ethic and the spirit of capitalism. London, England: George
Allen.

Wilkins, P., & Mayne, J. (2002). Providing assurance on performance reports: Two
jurisdictions compared. Perth, Western Australia, Australia: Office of the Auditor General.

Williams, D. W. (2003). Measuring government in the early twentieth century. Public
Administration Review, 63(6), 643–659.

Willoughby, K., & Benson, P. (2011). Program evaluation, performance budgeting and PART:
The U.S. Federal Government experience. Atlanta: Georgia State University.

CHAPTER 10

USING PERFORMANCE MEASUREMENT FOR
ACCOUNTABILITY AND PERFORMANCE
IMPROVEMENT

Introduction
Using Performance Results

Legislator Expected Versus Actual Uses of Performance Reports in British Columbia,
Canada

High-Stakes Uses of Performance Measures

The British Experience With Performance Management

Assessing the “Naming and Shaming” Approach to Performance Management in Britain

A Case Study of Gaming: Distorting the Output of a Coal Mine

Reassessing the Performance Management Cycle: The Roles of Incentives and
Organizational Politics

Use of Performance Measures in a Non-Adversarial Political Environment

Joining Internal and External Uses of Performance Information: The Lethbridge Local Government Study

Using Performance Information for Management: Encouraging Internal Uses of Performance Results

Increasing Uses of Performance Information by Elected Officials: Supply and Demand Improvements

Improving the Supply and Demand of Performance Information: Examining the Audit Strategy

Improving the Demand for Performance Information: Examining Legislation and Training

Assessing the Realities of Performance Measurement for Public Accountability, Performance Improvement, and Program Evaluation

Three Additional Considerations in Implementing and Sustaining Performance Measurement Systems

The Centralizing Influence of Performance Measurement in Public Organizations

Attributing Outcomes to Programs

The Levels of Analysis Problem: Conflating Organizational, Program, and Individual Performance
Summary
Discussion Questions
References

INTRODUCTION

In Chapter 10, we look at both the intended and actual uses of performance information by
elected officials and organizational managers. Performance measurement systems are intended
to improve public accountability and the management of organizational performance. Elected
officials are key stakeholders in efforts to improve accountability, so we summarize key
findings from one of the few studies that looks at ways that elected decision makers actually
use public performance reports provided to them each year. Then, we look at uses of
performance information in a system where poor performance results were given media
coverage and, in some cases, resulted in the executives of poorly performing organizations
being fired. The “naming and shaming” approach used in Britain between 2000 and 2005 is
unique in implementing a high-stakes performance measurement system for public
accountability and performance improvement.

The British experience also raises the issue of the importance of incentives and
organizational political factors in understanding how performance information is actually used
and whether it is seen to be credible. So we come back to the (idealistic) performance
management cycle that we introduced in Chapter 1 and re-examine it from a perspective that
now takes into account how people in organizations actually behave, instead of how they
“should” behave. Given the challenges of implementing public performance reporting systems
that produce information that is not distorted by “gaming” responses from those whose
performance is being judged, we consider the following: Are there any circumstances where
public performance reports are taken at face value, and the managers who produce them are not
concerned about the risks of reporting less-than-positive results? There is one empirical study
of a local government in Western Canada that directly addresses this issue, so we report some
of the key findings from that study.

In many settings, public performance reporting is risky; often, political cultures are risk
averse, so it is difficult to report anything but positive or at least noncontroversial performance
results. We look at possible strategies to increase internal uses of performance information in
such environments and some strategies that managers tend to adopt to manage performance but
not expose the program to political risks.

Finally, we turn to several problems with performance measurement that can more
generally affect both the validity and the usefulness of performance information. We come
back to a key theme in our textbook: performance measurement is part of a broader suite of
approaches to doing evaluations. Performance measurement does not replace program
evaluation but instead focuses and informs it.

In Chapter 9, we discussed the design and implementation of performance measurement
systems and outlined 12 steps for guiding the process. Implementation implies that
performance measurement data are being collected regularly, are being analyzed, and are being
reported (either internally or externally). If the system is intended primarily for formative uses,
reporting may be informal and open-ended; that is, analyses and reports are prepared as
needed. If the reports are prepared to meet external accountability requirements, they usually
have a summative intent. In many jurisdictions where performance measurement is mandated,
public reports are required on a periodic (often annual) basis and are intended to be used for
decision making, including budgetary decision making (Hatry, 2006; Melkers, 2006).

Generating performance information is not sufficient to ensure that it is actually used. This
chapter will examine ways that performance information is intended to be used and is actually
used. We will look at the factors that affect how performance information is used and suggest
that the political cultures in which organizations are embedded are important in assessing the
prospects for using performance measurement information.

Historically, accountability focused on the processes by which decisions were made in
public organizations. The emphasis was on keeping good records, following established
procedures, and knowing how decisions were made and who was involved in making the
decisions. Public organizations have typically been structured hierarchically, and authority to
make the final decisions and accountability for decisions have ultimately resided in the person
or persons who occupied the higher offices of the structure. These bureaucratic structures and
their limitations have been critically analyzed by public choice theorists (e.g., Downs, 1965;
Niskanen, 1971) who have argued that to understand how public bureaucracies work, it is
necessary to acknowledge that public servants are not unlike private sector employees and
have an underlying self-interest that affects their work efforts. Furthermore, they argue that
understanding public sector motivations is central to designing government organizations and
structures that perform well.

Le Grand (2010) has suggested that historically, the assumption has been that public
servants are chiefly motivated by their desire to serve the public interest: that public servants
were “knights,” motivated to do the right thing. Public choice theorists and proponents of New
Public Management (Osborne & Gaebler, 1992) challenged that assumption and instead
offered a model of motivation that emphasized the importance of incentives, rewards, and
sanctions to induce good performance (see Le Grand, 2010). For public choice theorists,
process-focused accountability missed what was important, and even got in the way of good
performance. They argued that, instead, accountability should be focused on results and on
aligning the incentives for public servants, so that following their self-interest would lead to
efficient and effective outcomes for public organizations. Key to that approach was identifying
desired results, measuring the extent to which they have been achieved, and holding public
servants accountable for delivering those results.

NPM, as a broad movement that has influenced both public and nonprofit management and
governance in Western countries, has been strongly connected to the drive to measure and
publicly report performance (Hood, 1991; Osborne & Gaebler, 1992). Among the intended
uses of performance information, improving accountability is a key one.

As we suggested in Figure 9.3 in Chapter 9, performance measurement and public
reporting were expected to contribute to improved public accountability (for results) and also
improve performance. Requiring public reporting of performance results, particularly in
relation to targets for a suite of performance measures, is now built into many public sector
performance measurement systems (Bevan & Hood, 2006; Pollitt, Bal, Jerak-Zuiderent,
Dowswell, & Harrison, 2010).

Making public performance reporting work as a means of inducing improved performance
assumes that performance results have real consequences for the organizations reporting them.
Recall Figure 1.1 in Chapter 1 in which we introduced the performance management cycle.
The final stage of the cycle, once performance results (both performance measures and
program evaluations) are reported, is to use that information to inform decisions, set priorities,
make budgets, and position organizations and governments for another performance
management cycle based on evidence from the past cycle. These are the real consequences that
are expected to flow from reporting performance results. If the performance management cycle
is an annual process, then a part of it would be annual performance reports.

USING PERFORMANCE RESULTS

Elected leaders are expected to be principal users of performance results (McDavid & Huse,
2012; Thomas, 2006). The efficacy of the performance management cycle is linked to elected
decision makers paying attention to performance results and using that information in their
deliberations and decisions. In spite of the importance of this link, there has been limited
research that has focused squarely on whether, and to what extent, elected officials make use of
the performance information that is regularly supplied to them through public reports. In the
next section, we will summarize a recent empirical study that has examined, over time, how
elected public officials in one jurisdiction used performance reports. The findings from this
study help us understand how the link between public reporting and “real consequences”
actually operates.

Legislator Expected Versus Actual Uses of Performance Reports in British Columbia,
Canada

In 2000, the British Columbia (B.C.) Legislature passed the Budget Transparency and
Accountability Act, a law mandating annual performance plans and annual performance
reports for all departments and agencies. The act was amended in 2001 (Government of British
Columbia, 2000, 2001), and the first performance reports were completed in June 2003.
McDavid and Huse (2012) surveyed all elected members of the legislature anonymously on
three occasions: The first survey was completed before the first public performance reports
were received in 2003, and then legislators were surveyed again in 2005 and 2007.

Table 10.1 summarizes the overall response rates to the three surveys.

In each of the three surveys, the same measures were used—the only difference between
the 2003 survey and the other two was that in 2003 the statements were worded in terms of
expected uses of the performance reports. Fifteen separate Likert statements were included,
asking politicians to rate the extent to which they used (or, in the first survey, how they
expected to use) the public performance reports for those 15 purposes. Figure 10.1 shows the
format and content of the Likert statements in all three surveys. For further analysis, the Likert
statements were later clustered so that they reflected five overall reported uses of performance
reports: (1) accountability, (2) communications, (3) improving efficiency and effectiveness, (4)
making policy decisions, and (5) making budget decisions.

Table 10.1 Legislator Response Rates for the 2003, 2005, and 2007 Surveys

Source: McDavid and Huse (2012, p. 14). Reproduced by permission from Sage.

Figure 10.1 Format for the Survey Questions on Expected Uses of Performance Reports in
2003

Source: McDavid and Huse (2012, p. 13). Reproduced with the permission of Sage.

Survey responses from the governing party were grouped to distinguish between cabinet
ministers (politicians who were responsible for departments and agencies) and backbenchers
(elected officials in the governing party who had no administrative oversight responsibilities).
Figures 10.2 and 10.3 display key findings from the three surveys for the governing (Liberal)
party.

If we look at Figures 10.2 and 10.3 together, we can see several trends. Overall, initial
expectations in 2003 about ways that performance reports could be used were high. Cabinet
ministers had even higher expectations than did their colleagues in the governing party who did
not head administrative departments. The drops from expected uses reported in 2003 to actual
uses reported in 2005 and 2007 are substantial. For three of the clusters of uses (communication
uses, efficiency and effectiveness uses, and policy uses), when the 2005 and 2007 levels are
averaged, the drops for cabinet ministers were greater than 50%. The overall pattern for
backbench government members is similar to that for cabinet ministers. When the two groups
of elected officials are compared, the drops in reported uses for cabinet ministers were larger
than those for backbench members.

Figure 10.2 Clusters of Performance Reports Uses for Cabinet Ministers in 2003, 2005, and
2007

Source: McDavid and Huse (2012, p. 16). Reproduced with the permission of Sage.

Figure 10.3 Clusters of Performance Reports Uses for Liberal Backbench Members of the
Legislature

Source: McDavid and Huse (2012, p. 17). Reproduced with permission from Sage.

There was a provincial election in the spring of 2005 after the second survey was
completed, in which the New Democratic Party won 33 of the 79 seats in the legislature.
Figure 10.4 compares the reported uses of performance reports in 2007 by government
members of the legislature and members of the opposition party. In 2007, when there was a
substantial number of elected opposition members in the B.C. Legislature, comparisons
indicated that they generally used the reports even less than government members. Although
the responses indicated that members of the opposition used the reports for general
accountability purposes more than government members did, the difference is not statistically
significant. Overall, the reports appeared to be relatively less useful for opposition members in
their roles as critics of government policies and programs. These findings are generally
consistent with under-utilization or even non-utilization of public performance reports by
elected officials reported elsewhere (Barrett & Greene, 2008; Bouckaert & Halligan, 2008;
Steele, 2005; Sterck, 2007).

The picture of legislator uses of public performance reports, based on this empirical study,
suggests that although there were high expectations before the first performance reports were
seen—expectations that reflect the intended uses of performance reports that have been a part
of the NPM literature—the two subsequent rounds of actual reports were not used nearly as
much as had been expected (McDavid & Huse, 2012).

Figure 10.4 Government (Liberal) and Opposition (New Democratic Party) Uses of the
Performance Reports in 2007

Source: McDavid and Huse (2012, p. 18). Reproduced with the permission of Sage.

If elected officials are not using performance reports, or are using them very little in their
roles and responsibilities, an important link in the intended performance management cycle is
weakened. As well, expectations that public accountability will drive performance
improvements via legislator scrutiny are then questionable. While there may be process-related
benefits associated with an annual cycle of setting targets, measuring performance against
those targets, and reporting the results (McDavid & Huse, 2012), those are not the real
consequences of public reporting that have been envisioned by advocates of this approach to
public accountability and performance management.

High-Stakes Uses of Performance Measures

Britain has been an exemplar for its commitment to performance measurement, including
setting targets, producing public reports, and ensuring that there are real consequences for
reporting organizations. Pollitt et al. (2010), Bevan and Hamblin (2009), Hood, Dixon, and
Wilson (2009), and Le Grand (2010) have all examined the British experience of using
performance measurement and public reporting as a means to manage and improve
governmental performance. We will describe the different approaches taken by the British to
enhance public accountability and how well they worked to improve performance.

The British Experience With Performance Management

Beginning in 1997 when the Labour government under Tony Blair was first elected,
performance measurement was given a high priority. There had been earlier efforts in the
National Health Service (NHS) to implement results-based management regimes (Pollitt et al.,
2010), but the New Labour government quickly expanded the scope of performance
measurement and target setting. Initially, there was an emphasis on constructing performance
measures for government departments and agencies, setting targets, and reporting actual results
compared with targets. This approach was used for about 3 years (1997–2000), and
assessments of its results suggested that performance had not improved, even though more
money had been put into key services such as health and education (Le Grand, 2010).

Le Grand suggests that the assumption made during that time was that public servants were
“knights” who were motivated to “do the right thing”—that they would be motivated
intrinsically to improve their performance to meet targets. By 2000, the Blair government had
decided to use a much more centralized and directive approach to public accountability and
performance management. For the next five years, public rating and ranking systems were
widely implemented; this regime has been called the “targets and terror” approach to
performance management (Bevan & Hamblin, 2009).

In the health sector, the heart of this approach was a star rating system wherein public
organizations were rated on their overall performance from zero stars up to three stars. This
public accountability approach was first applied in acute care hospitals in 2001 in England and
then extended in 2002 to cover ambulance services in England (Bevan & Hamblin, 2009).
Eventually, it was implemented in other parts of the public sector, including local governments
(McLean, Haubrich, & Gutierrez-Romer, 2007). The mechanism that was integral to the star
rating system was to challenge the reputation of each organization (Bevan & Hamblin, 2009;
Hibbard, 2008). Hibbard, Stockard, and Tusler (2003) specify four criteria that are necessary to
establish an effective ranking system that has real reputational consequences for the
organizations that are rated and ranked (see also Bevan & Hamblin, 2009):

1. A ranking system must be established for the organizations in a sector.
2. The ranking results need to be published and disseminated widely.
3. The ranking results need to be easily understood by the public and other stakeholders so that it is obvious which organizations are top performers and which are not.
4. Published rankings are periodically followed up to see whether performance has improved—one way to do this is to make the rankings cyclical.

The process by which organizations (e.g., hospitals) were rated and ranked is described by
Bevan and Hamblin (2009). For hospitals, approximately 50 measures were used to rate
performance, and those measures were then aggregated so that for each hospital a “star rating”
was determined. The performance measures for the hospitals were based primarily on
administrative data collected and collated by an independent agency. This agency (the
Healthcare Commission) assessed performance and gave each organization a star rating.
Hospitals could earn anywhere between zero and three stars depending on their aggregate
performance. The star ratings were published in a league table that was widely disseminated.

Bevan and Hamblin (2009) summarize the impacts of the first published three-star rankings
in 2001 for acute care hospitals: “the 12 zero-rated hospitals [in that year’s ratings] were
described by the then Secretary of State for Health as the ‘dirty dozen’; six of their chief
executives lost their jobs” (p. 167). Star ratings in 2004 resulted in

the chief executives of the nine acute care hospitals that were zero rated [being] “named
and shamed” by the Sun (on October 21, 2004), the newspaper with a circulation of over 3
million in Britain [which published] a two-page spread [that] had the heading “You make
us sick! Scandal of Bosses running Britain’s worst hospitals” and claimed that they were
delivering “squalid wards, long waiting times for treatment and rock-bottom staff
morale.” (p. 167)

The whole process was high stakes for organizations being rated. It had real consequences.
What made the British approach to performance management unique, and offers us a way to
see what difference it made, is that the star rating system was implemented in England and not
in Wales or Scotland—those latter two countries (within Britain) controlled their own
administration of all health-related organizations and services, even though the funding source
was the (British) NHS.

In Wales and Scotland, there were performance targets and public reporting, but no ratings
and no regimes of “naming and shaming” or “targets and terror.” Bevan and Hamblin (2009)
take advantage of this natural experiment to compare performance over time in England
compared with Wales and Scotland. What they discovered was that given enough oversight,
the English approach apparently can have an effect—measurable improvements in
performance happened in England’s hospitals that did not occur in either Wales or Scotland.

A second natural experiment was also evaluated by Bevan and Hamblin (2009). For
ambulance services (organizations that provide ambulance services are called trusts in Britain)
England again implemented a high-stakes summative star rating performance measurement
system, whereas Scotland and Wales did not. For emergency calls (Category A calls), the NHS
had established a British target of having 75% of those calls completed in eight minutes or less
in England, Scotland, and Wales. In England, by 2003 most ambulance organizations reported
achieving that target (Bevan & Hamblin, 2009), but neither Wales nor Scotland did. The star
rating system in England apparently produced the desired result of improving performance.

However, the star rating system also produced substantial unintended effects, which we
will explore in greater detail in the next sections of this chapter. In 2005, there was a national
election in England and one of the issues that surfaced was the side effects of the performance
management regime that was in place. Bevan and Hood (2006) report one incident involving
the prime minister, Tony Blair, and a questioner on the campaign trail:

In May 2005, during the British general election campaign, the prime minister was
apparently nonplussed by a complaint made during a televised question session that
pressure to meet the key target that 100% of patients be offered an appointment to see a
general practitioner within two working days had meant that many general practices
refused to book any appointments more than two days in advance. A survey of patients
found that 30% reported that their general practice did not allow them to make a doctor’s
appointment three or more working days in advance. (p. 420)

The star rating system was ended in 2005. Target setting and public reporting were
continued, but the “naming and shaming” aspects of the system were largely abandoned in
favor of a less confrontational approach.

Assessing the “Naming and Shaming” Approach to Performance Management in
Britain

When we examine the implications of the British approach to performance management,
we can see that three different models were tried at different times and places: the first model,
in place before the Blair government came to power in 1997, involved performance results—to
the extent that they were available—being used to inform and induce improvements through
target setting (Hood et al., 2009). Pollitt et al. (2010) point out that the first performance
measurement system in the NHS was developed in 1983 but was initially intended to be
formative, to be used by managers to monitor programs and improve performance locally,
notwithstanding the fact that national performance targets were a part of the development of
this system. In the NHS, this approach gave way to the first published comparisons of
performance results across health regions in 1994; this was the first version of the “league
tables” approach that was used more widely later on.

The second model, high-stakes performance measurement and public reporting, was
implemented in England between 2000 and 2005, and has been linked to several different
kinds of unintended effects, which we will describe shortly. The third model (2006 to the
present) is similar to the first one in that objectives, targets, and reporting are all mandated, but
the high-stakes star rating system has been omitted.

Bevan and Hamblin (2009), Otley (2003), Pollitt et al. (2010), and others have commented
on problems that can arise when performance measurement and management regimes utilize
the kind of “naming and shaming” strategies that were central to the English approach between
2000 and 2005. Bevan and Hamblin (2009) suggest several problems with the high-stakes star
rating system that was adopted in Britain. We will highlight three problems here.

The first is that what gets measured matters and, by implication, what is not or cannot be
measured does not matter and may be neglected. A phrase that has been used to characterize
this situation is “hitting the target and missing the point.” Wankhade (2011) looked at the
English ambulance service and found that the dominant focus on a response time target of
eight minutes for Category A emergency ambulance runs distorted the real work that was being
done and forced these organizations to devalue the importance of patient outcomes.

The second problem is related to the first one in that picking key performance measures
often misrepresents the complexity of the work being done by public organizations (Bevan &
Hood, 2006). Picking performance measures is at least in part opportunistic—measures
represent values and priorities that are politically important at a given time, but may not be
sound measures of the performance of core objectives in organizations.

The third problem is perhaps the most significant one: gaming performance measures is a
widespread problem and has been linked to the lack of credibility of performance results in
many settings (Bevan & Hamblin, 2009; Bevan & Hood, 2006; Hood, 2006). We will look at
gaming as an issue in the NHS-related studies that have been done, and then summarize a case
that illustrates gaming, based on Otley’s (2003) work with the coal mining industry in Britain.

Gaming amounts to situations where unintended behaviors result from the implementation
of performance measures. Propper and Wilson (2003) describe gaming behaviors in terms of
the relationship between principals (political decision makers) and agents (those who are
actually delivering the programs or services): “As the principal tries to get higher effort (and so
better public services) by implementing performance measurement, the response may be better
services but also may be other less desired behaviour” (p. 252).

In their examination of English ambulance services during the star rating regime from 2000
to 2005, Bevan and Hamblin (2009) point out that many ambulance services (about one third)
were manually “correcting” their reported response times to come in “on target.” Furthermore,
in an audit that was conducted in 2006 after whistle-blowers had contacted the counter fraud
service, the Department of Health “reported … that six of 31 trusts [ambulance organizations]
had failed accurately to record the actual response times to the most serious life threatening
emergency calls” (p. 182).

Figures 10.5 and 10.6 have been reproduced from Bevan and Hamblin (2009) and offer a
visual interpretation of gaming behaviors in the English ambulance trusts during the time that
the star ratings performance management system was in place. First, Figure 10.5 illustrates a
frequency distribution of ambulance response times, taken from one service, that indicates a
fairly linear distribution of frequency of response times and corresponding number of calls for
service. This overall pattern suggests that the ambulance trust is reporting response times
accurately—there is no visible change in frequency of calls around the 8-minute target that was
the core of the star rating system for ambulance services in England.

Figure 10.5 A Distribution of Response Times for One English Ambulance Service: No
Gaming Is Evident

Source: Bevan and Hamblin (2009, p. 178).

Figure 10.6 Distribution of Response Times for One English Ambulance Service: Gaming
Is Evident

Source: Bevan and Hamblin (2009, p. 179).

Figure 10.6, in contrast, indicates a marked difference from the earlier linear pattern of the
frequency of ambulance response times and the number of calls for service. Up to the 8-minute
performance target, there are apparently more ambulance calls the closer the response times are
to that target. But beyond the target, the response frequency drops off dramatically. The pattern
in Figure 10.6 strongly suggests that ambulance response times are being “adjusted” (gamed)
so that they meet the 8-minute threshold—it is highly unlikely that the discontinuity in Figure
10.6 could have occurred by chance.
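The discontinuity described here can be illustrated with a small simulation. The sketch below uses synthetic response times, not the English ambulance data, and models "gaming" simply as re-recording some calls that just miss the 8-minute target as coming in just under it; the distributional assumptions are ours, chosen only to reproduce the qualitative pattern in Figures 10.5 and 10.6.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "true" response times in minutes (roughly lognormal). This is an
# assumption for illustration, not the actual English ambulance data.
true_times = rng.lognormal(mean=np.log(7.0), sigma=0.4, size=10_000)

def game_times(times, target=8.0, window=2.0, fraction=0.5):
    """Illustrative output distortion: a share of calls that just miss the
    target are re-recorded as coming in just under it."""
    gamed = times.copy()
    just_missed = (times > target) & (times < target + window)
    flip = just_missed & (rng.random(times.size) < fraction)
    gamed[flip] = target - rng.uniform(0.0, 0.5, size=int(flip.sum()))
    return gamed

gamed_times = game_times(true_times)

bins = np.arange(0.0, 16.0, 1.0)
honest_hist, _ = np.histogram(true_times, bins=bins)
gamed_hist, _ = np.histogram(gamed_times, bins=bins)

for lo, h, g in zip(bins[:-1], honest_hist, gamed_hist):
    print(f"{lo:4.0f}-{lo + 1:.0f} min  honest={h:5d}  gamed={g:5d}")

# The "gamed" column bulges just below 8 minutes and drops sharply just above
# it -- the kind of discontinuity Bevan and Hamblin read as evidence of gaming.
```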

Hood (2006) points out that gaming was either not anticipated as the transition to high-
stakes performance measurement and reporting was made in Britain in 2000, or was possibly
downplayed by those who had a stake in “meeting the targets.” Hood puts it this way:

Why was there no real attempt to check such data properly from the start? The slow and
half-hearted approach to developing independent verification of performance data itself
might be interpreted as a form of gaming by the central managers (like the famous
English admiral, Horatio Nelson, who put a telescope to his blind eye to avoid seeing a
signal he did not want to obey). (pp. 519–520)

Hood (2006) has suggested three general categories of gaming behaviors based on his
research on performance management in Britain. Ratchet effects are exemplified where
organizations try to negotiate performance targets that are easy to attain. An example from
Bevan and Hamblin (2009) was the Welsh ambulance service that could not meet the NHS
target of eight minutes for 75% of Category A calls and, over several successive years,
succeeded in getting that target reduced year over year.

Threshold effects occur when a performance target results in organizational behaviors that
distort the range of work activities in an organization. Hood (2006) gives the example of

schools that were set pupil-attainment targets on test scores, leading teachers to
concentrate on a narrow band of marginal students who are close to the target thresholds
and to give proportionately less attention to those at the extreme ends of the ability range.
(p. 518)

The third kind of gaming is arguably the most important of the three proposed by Hood
(2006). Output distortions occur in situations where performance results are “adjusted” so
that they line up with expectations. Bevan and Hamblin (2009) quote Carvel (2006) who
examined the actual methods used to measure English ambulance response times:

Some did not start the clock as soon as a 999 call was received. Others did not
synchronize the clocks in the emergency switchboard with those used by the paramedics.
In some cases, ambulance organizations re-categorized the urgency of the call after the
job was done to make it fit the response time achieved rather than the priority given when
the original call was made. This would allow staff to downgrade an emergency if the
ambulance arrived late. (Bevan & Hamblin, 2009, p. 182)

A Case Study of Gaming: Distorting the Output of a Coal Mine

Otley (2003) recounts a story based on his early experience as a British mining engineer.
His first project was to develop a computer model of a coal mine. Using an existing model of
how a single coal face operated, he quickly extended this model to a whole mine. Validating
the model involved comparing the model’s predicted mine outputs with data from the actual
mine. The model predicted average output quite well but could not predict the variability in
output. Since the model was intended in part to assist in the design of an underground
transportation system, peak loads needed to be accurately estimated.

Otley assumed that he had made some kind of programming error; he spent several weeks
searching for such an error, to no avail. He decided to look at the actual raw data to see if
anything emerged. The weekly data had patterns. The mining output data showed that for a
typical Monday through Thursday, actual tonnes of coal produced conformed pretty closely to
a budgeted target for each day. But on Friday, the actual tonnes could be anything from much
more to much less than the daily average. It turned out that the mine managers knew that for
every day of the week but Friday, they could report an output to headquarters that was close to
the budgeted output because the actual tonnes were only totaled up on Fridays. To reconcile
their reported figures with the weekly total (being on budget with actual production was their
performance measure), they approached the Friday output figure creatively.

The mine managers had created an additional way of assuring that they met the weekly
production targets. At the bottom of the mine shaft there was a bunker that was intended to be
used to store coal that could not be transported to the surface during a given day. The bunker
was supposed to be emptied on Friday, so that it could be used to buffer the next week’s daily
production—the hoist that brought the coal to the surface was a bottleneck, and the bunker was
a way to work with this problem. But Otley discovered that the bunker was often full on
Monday mornings; the managers had determined that having a full bunker to start the week
meant that they had a leg up on that week’s quota, and since the penalty for under-producing
was greater than any consequence for overproducing, they responded to the incentives.

Mine managers had developed ways to game the performance measure for which they were
accountable. For Otley’s modeling process, the output data were not sufficiently accurate to be
useful.
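
To make the distortion concrete, the following Python sketch uses invented figures (not Otley's actual data) to show how reporting "on budget" output from Monday to Thursday, with Friday absorbing the reconciliation, preserves the weekly total while destroying the day-to-day variability that Otley's model needed to estimate peak loads.

import statistics

BUDGET_PER_DAY = 1000  # hypothetical daily budget target, in tonnes

actual = [960, 1085, 930, 1120, 1005]         # Mon..Fri, true tonnes produced
reported = [BUDGET_PER_DAY] * 4               # Mon..Thu reported as "on budget"
reported.append(sum(actual) - sum(reported))  # Friday absorbs the difference

assert sum(reported) == sum(actual)           # the weekly totals still reconcile
print("actual daily variability:  ", round(statistics.pstdev(actual), 1))
print("reported daily variability:", round(statistics.pstdev(reported), 1))
# The reported series smooths Monday through Thursday and dumps the variance into
# Friday, which is why a model calibrated against reported data could match average
# output but not its variability.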

Reassessing the Performance Management Cycle: The Roles of Incentives and
Organizational Politics

To this point in Chapter 10, we have looked at the linkage in the performance management
cycle between reporting performance results and the “real consequences” when those results
become public. What we have seen is that, in the study reported by McDavid and Huse (2012),
legislators generally under-utilize public performance reports for budgetary decisions, policy-
related decisions, or for improving efficiency and effectiveness. Instead, performance reports
appear to be more useful for symbolic accountability purposes and to communicate with
constituents and other stakeholders. The high-stakes approach to inducing real consequences
for performance reporting—the very public “naming and shaming” approach that was
implemented in England between 2000 and 2005—appeared to work to improve performance
compared with the less draconian approach used in Wales and Scotland, but the English system
created substantial gaming-related side effects that served in part to put an end to it in 2005.

Britain has returned to a less high-stakes variant of performance measurement. In the
United States, where the Office of Management and Budget from 2002 to 2009 annually conducted
summative assessments of the effectiveness of federal programs using the Program Assessment
Rating Tool (or the PART) process, the current American administration has pulled back from
this approach—amending the Government Performance and Results Act (GPRA) in 2010 (GPRA
Modernization Act, 2010) to focus more attention on performance management in individual
departments and agencies. Performance measurement is still required, as is reporting on a
quarterly basis, but there is more emphasis on balancing performance measurement and
program evaluation (Willoughby & Benson, 2011).

The performance management cycle that we introduced in Chapter 1 is a normative model.
It displays intended relationships among the five phases of the cycle: (1) strategic planning;
(2) policy and program design; (3) implementation; (4) evaluation, performance measurement,
and public reporting; and, finally, (5) real consequences. Given the research that has been
done that examines the way the performance management cycle “closes the loop,” that is,
makes performance results available to decision makers, it is appropriate to reassess the model.

As part of our reassessment, recall that in Chapter 9, we introduced the rational/technical
and the political/cultural lenses through which we can “see” organizations and the process of
developing and implementing a performance measurement system. The model of the
performance management cycle we introduced in Chapter 1 of this book is aligned with a
rational/technical view of organizations in terms of the development and implementation of
performance measurement systems. In our view, this model of performance management
assumes that people will behave as if their motives, intentions, and values are aligned with the
rational/technical “systems” view of the organization.

The political/cultural lens introduced in Chapter 9 suggests a view of performance
management that highlights what we have seen so far in the examples in Chapter 10.
Performance measurement and public reporting systems that are high stakes can encounter
significant problems in terms of the ways that performance information is created, compiled,
reported, and actually used. Some results are broadly consistent with the normative
performance management model, in that high-stakes public reporting does improve
performance (at least on the measures that are being highlighted). However, other behaviors
work to undermine the credibility of such systems unless they are policed through processes
like external audits of measurement systems and results. Audit and performance assessment
systems are often costly to operate, affecting the sustainability of these systems over time.

Figure 10.7 displays our original performance management cycle introduced in Chapter 1
but includes changes aimed at incorporating organizational politics and incentives. What
Figure 10.7 implies is that a plan to design and implement a performance management system
will of necessity need to navigate organizational processes in which people, their backgrounds
and experiences, and their sense of “who wins and who loses?” and “what does this change do
to my own prospects?” will be key to how well, if at all, the performance management system
actually “performs.”

One way to look at the performance management cycle that is depicted in Figure 10.7 is
that in settings where the stakes are high—the political culture is adversarial, the media and
other interests are openly critical, and the public reporting of performance results is highly
visible (in other words, closer to the “naming and shaming” system that was implemented in
England between 2000 and 2005)—it is more likely that unintended effects in the performance
management cycle will emerge. Otley (2003) has suggested that over time, as gaming becomes
more sophisticated, it is necessary to counter it with more sophisticated monitoring and control
mechanisms.

Pollitt et al. (2010), who have extensively examined the British experience with
performance management, suggest that there is a pattern to the development of public
performance measurement systems that consists of six stages:

1. The initial few, simple indicators become more numerous and comprehensive in scope;
2. Initially formative approaches to performance become summative, e.g., through league tables or targets;
3. The summative approach becomes linked to incentives and sanctions, with associated pressures for “gaming”;
4. The initial simple indicators become more complex and more difficult for non-experts to understand;
5. “Ownership” of the performance regime becomes more diffuse, with the establishment of a performance “industry” of regulators, academic units and others, including groups of consultants and analysts …; and
6. External audiences’ trust in performance data and interpretations of them tends to decline. (p. 19)

What Pollitt et al. (2010) are describing is the evolution of performance measurement
systems in adversarial political cultures, specifically, Britain since the early 1980s. Although it
is not appropriate to generalize to all other jurisdictions, it is important to keep in mind that
securing managerial buy-in, in the initial stages of developing a performance measurement
system, is often linked to an initial goal of using performance information formatively. The
pattern summarized above is generally consistent with the evolution of the performance
measurement regime in British Columbia, for example (McDavid, 2001), where an initial
formative stage from about 1995, in which individual departments developed their own
measures and shared their experiences, gave way to legislated performance measurement and
reporting requirements by 2000. That summative, target-based public reporting system endures
to the present.

Figure 10.7 Public Sector Accountability and Performance Management: Impacts of
Incentives and Politics

Source: Adapted from Auditor General of British Columbia & Deputy Ministers’ Council (1996).

However, there is another way to achieve a linkage between public performance reporting,
accountability, and performance improvements. Instead of constructing a high-stakes
environment for the process, it is possible, in some settings, to work with a low-stakes
approach. We will consider an example of such a setting—a local government in Western
Canada.

Use of Performance Measures in a Non-Adversarial Political Environment

Joining Internal and External Uses of Performance Information: The Lethbridge Local
Government Study

In Chapter 8, we pointed out that performance measurement had its origins in local
governments in the United States at the turn of the 20th century. Although there were no
computers or even calculators, it was possible to construct performance information that
included costs of local government services as well as key outputs and often outcomes
(Williams, 2003). When Niskanen (1971) wrote his seminal book on bureaucracies, one of his
recommendations was to increase the role of the private sector in providing government
programs and services. Variations on that recommendation were also made by Bish (1971) and
others who were writing about urban local governments in the United States. Contracting out
of local government services became a widespread practice during the 1970s and 1980s
(McDavid, 2001; Savas, 1982, 1987), and part of the success of that movement was due to the
relative ease with which the performance of local government services could be measured.
Hatry (1974, 1980) was among the first to champion performance measurement for local
governments. Generally, local government programs and services have outputs and often
outcomes that are tangible, countable, and agreed upon.

Many local governments also deliver programs and services in political environments that
are broadly nonpartisan. Indeed, a key goal of the Progressive Movement in the United States
during the late 1800s to post–World War I was to eliminate political parties from local
elections (Schaffner, Streb, & Wright, 2001). For advocates of progressive local government,
introducing business-like practices into local government was also desirable. From this
perspective, citizens who are served by local governments can be seen to be consumers who
are relatively able to see the amount and quality of services they receive, and can offer
performance feedback via complaints and other mechanisms.

Requirements for public performance reporting by local governments vary from one state or
province to the next.

Researchers have examined the ways in which local governments use performance information
(Askim, 2007; Pollanen, 2005; Streib & Poister, 1999), and one case study has looked
systematically at how managers and elected officials in a single local government use that
information. Lethbridge is a community of 84,000 people situated in the southern part of
the province of Alberta in Western Canada. Performance measurement has been widely
implemented across the city departments, but the measures have been developed by managers
for their own uses. Although public performance reports are not required, nearly all business
units prepare such reports for City Council each year.

In 2009, eight of nine members of City Council and 25 of 28 departmental managers were
interviewed to solicit their perceptions of the usefulness of performance information for certain
purposes (Hildebrand & McDavid, 2011). Table 10.2 shows a comparison on the same
interview questions between council members and business unit managers on several
indicators of the usefulness of performance information (based on a scale of 1 to 5).

There is general agreement between councillors and managers on the extent to which
performance information is useful. One clear difference between councillors and managers was
around how useful they thought citizens would find the performance reports. Council members
were more likely than managers to indicate that citizens would find the reports useful. Because
managers in Lethbridge have built performance measures for their own uses, some of the
measures are technical, reflecting the business of their department, and managers possibly
share the view that citizens would not find this information useful. Council members, on the
other hand, have a perspective that is perhaps more likely to consider citizens as stakeholders
interested in the reports, given that performance reports are a means to demonstrate the
accountability of the city government.

Table 10.3 compares councillor and manager perceptions of the quality and credibility of
the performance data that are produced by departments in the city. Both council members and
managers generally agreed on the high quality of the performance information they were using.
The one difference concerned the extent to which performance information is accurate;
council members were more likely to take a “neutral” position, but when asked to rate overall
data believability, council members responded that they trusted the information produced by
their managers.

Table 10.2 Perceived Usefulness of Performance Information for Council Members and
Managers

Source: Reproduced from Hildebrand and McDavid (2011, p. 56).

Table 10.3 Council Member and Manager Perceptions of Performance Data Quality and
Credibility

Source: Reproduced from Hildebrand and McDavid (2011, p. 59).

Both council members and managers were asked to respond to a Likert statement about the
extent to which they were concerned about publicly reporting performance results that are not
positive. Their responses could be 1 = not at all, 2 = hardly any degree, 3 = some degree, 4 =
moderate degree, or 5 = great degree. Overall, the responses from council members and
managers were quite similar; neither group was substantially concerned about reporting results
that are not positive, although council members were somewhat more concerned: 25% of them
expressed at least a moderate degree of concern (response mean of 2.75), versus 16% of the
business-unit managers (response mean of 2.04).
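
The arithmetic behind these comparisons is simple. The Python sketch below uses hypothetical response vectors, constructed only so that their summary statistics reproduce the figures reported by Hildebrand and McDavid (they are not the actual survey responses), to show how the group means and the share of respondents at "moderate degree" or above are computed.

def summarize(responses):
    """Mean rating and percentage of respondents at 4 ('moderate degree') or above."""
    mean = sum(responses) / len(responses)
    pct_moderate_plus = 100 * sum(1 for r in responses if r >= 4) / len(responses)
    return round(mean, 2), round(pct_moderate_plus)

# Hypothetical 1-5 ratings, chosen only to match the published summary figures
council = [1, 2, 2, 3, 3, 3, 4, 4]               # 8 council members
managers = [1] * 14 + [3] * 7 + [4] * 4          # 25 business-unit managers

print(summarize(council))   # (2.75, 25): mean 2.75, 25% at least moderate concern
print(summarize(managers))  # (2.04, 16): mean 2.04, 16% at least moderate concern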

The Lethbridge local government case contrasts with high-stakes performance
measurement and reporting in adversarial political environments. In Lethbridge, performance
measures have been developed by managers, and public reporting is not the central purpose of
the system. Both council members and managers find performance information credible and
useful and neither group is substantially concerned with publicly reporting performance results
that are not positive. Most important, a nonadversarial political culture facilitates developing
and using performance information for both accountability and performance management
purposes. In other words, the absence of a high-stakes “naming and shaming” approach to the
public airing of performance results means that the same performance data are used both to
improve public accountability and to improve performance.

The Lethbridge findings, although they represent only one local government, provide an
interesting contrast to findings from studies of high-stakes utilization of performance
measures. As we have seen, high-stakes top-down performance measurement and public
reporting generally have significant challenges related to their efforts to increase both public
accountability and performance improvement. As the English experience has suggested,
making a “naming and shaming” performance management system work requires a major
investment in monitoring, auditing, and measurement capacities to manage the side effects of
such an approach. Hood (2006) has suggested that these capacities were not sufficiently
developed when the system was implemented in Britain. Most governments are not willing to
put those kinds of resources into the design and implementation of performance management
systems.

In Chapter 9, we suggested that unless managers are substantially involved in developing
and implementing performance measurement systems, they will generally not be useful for
performance management/performance improvement and will not be sustainable. Our findings
from examining performance measurement and public reporting in Lethbridge, Alberta, also
suggest that in political cultures where the “temperature” is lower, that is, performance results
are not treated as political ammunition, there is a higher likelihood that the goal of
simultaneously realizing public accountability and performance improvement will be achieved.

But what about jurisdictions where public reporting is high stakes, is visible, and can result
in both media and even political consequences? Even though these jurisdictions typically do
not have “name and shame” policies, there are potential political risks associated with public
performance reporting. Real consequences are a risk rather than a certainty. We will next
consider an approach to navigating such situations that not only makes it possible to use
performance information for management but also takes into account the risks of reporting
performance results publicly.

Using Performance Information for Management: Encouraging Internal Uses of
Performance Results

As we have discussed, evaluations found that the British approach to performance
management, particularly during the “naming and shaming” regime from 2000 to 2005,
arguably improved performance in services in England as compared with Wales and Scotland.
But this version of performance measurement and public reporting was not sustainable. The
political blowback that the Blair government experienced in the 2005 election campaign was
one reason that the government pulled back from that high-stakes system. What has replaced it
is a system that includes performance measurement, target setting (decentralized to some
extent), and public reporting. A similar approach is now in place in the United States and
Canada.

Hatry (2006) suggests that “using performance information only for accountability to
higher levels would be a great waste. Performance measurement data should be used to help
improve programs” (p. 153). What he has in mind is managers using the information, premised
on the assumption that performance results can be used formatively. At the same time, he
recognizes the risk of performance results becoming fuel for blaming:

Concern (unfortunately realistic) that the performance data will be used by others
primarily to cast blame has led programs to select performance indicators that are easier
for them to influence … Performance information should be used to learn, not “shoot the
messenger.” (p. 195)

Dealing with perceived risks of being caught “making mistakes” is a part of what drives
both political and administrative decisions. The real consequences of making performance
results public are not known in advance but instead are a function of factors that are often
situational; risk management for governments includes managing the risks of public
performance reporting. Some have suggested that managers should be prepared to discuss and
take strategic risks involved in using and reporting actual performance results (Poister, 2010).
Although this approach has the promise of joining together performance reporting for
accountability and performance improvement, the evidence suggests that the incentives to
avoid risks will continue to be a key factor in adversarial political cultures.

There is another strategy for reducing risks and increasing the likelihood that performance
information will be used by managers and others inside organizations. Kettl and Kelman
(2007) and McDavid and Huse (2012) have suggested that decoupling (Power, 1997)
externally reported performance results from the information that is used by managers
internally can reduce the risks of using performance information for monitoring and program
improvements. Decoupling means that public performance reporting is largely separated from
the internal performance measurement and management activities in departments and agencies.
Information that is developed for internal uses may then be viewed as being more trustworthy
because it is not intended to be made public. It can be used formatively. Managers may create
their own databases that are distinct from existing organization-wide performance systems and
even prepare their own internal performance reports (Hatry, 2006). Gill (2011), in a large-scale
study of the New Zealand public service, reports that managers have developed their own
information sources and have at least partially decoupled internal uses of performance
information from external accountability-related reporting requirements.

In summary, the relationships between public performance reporting, public accountability,
and performance improvement are more complex than the normative performance management
cycle suggests. The normative model makes public reporting the driver for both public
accountability and performance improvement. One way to envision this intended linkage is
that public performance reporting will “force” public sector organizations to make
performance improvements. The star rating system in England epitomized that kind of
mechanism. But in many other jurisdictions where high-stakes performance reporting is not
accompanied by visible ratings or rankings, the risks are more diffuse. How organizations
navigate such environments depends on experience and the tolerance of political leaders for
risk. Figure 10.8 depicts a model of how public performance reporting both affects and is
affected by the political culture in which an organization is embedded. We are saying that the
political culture influences how the performance measures are chosen, compiled, and reported;
in a high-stakes environment, gaming is likely.

Figure 10.8 A Behavioral Model of the Relationships Between Accountability and
Performance Improvement

Increasing Uses of Performance Information by Elected Officials: Supply and
Demand Improvements

To this point in Chapter 10, we have examined three different scenarios for combining
performance measurement, public reporting, increased accountability, and performance
improvement. The “naming and shaming” strategy uses publicly reported performance results
for accountability and to improve performance, but has limitations. In adversarial settings, the
incentives to avoid risks effectively undermine reporting results that are not positive. The
credibility of performance results is then an issue in such settings. The low-stakes approach
that was illustrated by Lethbridge, Alberta, makes public reporting an adjunct to a system that
is intended to serve the needs of the managers, but is possibly only effective in a non-
adversarial political environment. The third scenario involves decoupling public performance
reporting from internal uses of performance information. In effect, one set of measures is
developed for accountability purposes, and another set is developed and used internally. The
three approaches highlight the importance of context in the development, implementation, and
use of performance measures.

What are the implications of these challenges in terms of use of performance measures by
political officials and other high-level budgetary decision makers? Studies suggest that there is
limited evidence that the performance results from the PART process in the United States
influenced congressional budgetary decision making (Frisco & Stalebrink, 2008; Gilmour &
Lewis, 2006). This finding is consistent with findings from examinations of the impacts of
performance-based budgeting in the states: “Research shows that performance-based budgeting
initiatives in the states have had some success over the last few years, particularly in the
management of programs, but less dramatically in changing actual appropriations” (Melkers,
2006, p. 73). Sterck (2007) looked at performance budgeting in Sweden, the Netherlands,
Australia, and Canada and concluded as follows: “Despite the fact that new budget formats and
new budget procedures are installed, there is little evidence that the performance information is
consistently used by members of Parliament in their oversight function” (p. 200).

Advocates for public performance reporting and its use by elected decision makers have
suggested strategies that are aimed at bridging the gap between producing public performance
reports and their being used. These strategies focus on improving the supply and/or the demand
for performance information. Below we discuss audit, legislation, and training as potential
solutions to increasing performance measurement use by elected officials and other high-level
budgetary decision makers.

Improving the Supply and Demand of Performance Information: Examining the Audit
Strategy

Auditing performance measures and performance reports are strategies promoted by public
auditing bodies and are intended to increase the credibility of the performance information
(CCAF-FCVI, 2002, 2007b; Klay, McCall, & Baybes, 2004; Power, 1997; Wilkins & Mayne,
2002). Assuring the credibility of performance reports is assumed to increase the likelihood
that the information will be used, although there is little research that looks at this linkage
directly. Some jurisdictions have invested substantially in auditing public performance reports,
with national audit offices being required to audit all public performance reports (Schwartz &
Mayne, 2005).

It is evident that organizations representing public auditors have advocated for credibility
assurance audits as part of the mandate of state, provincial, and national auditors general
(Mayne & Wilkins, 2005). But what are the views of elected legislators on this issue? Is there
any evidence that legislators are demanding audits to assure themselves of the credibility of
public performance reports?

The study of legislator uses of public performance reports in British Columbia included a
question (not published in the McDavid & Huse [2012] report) in the 2007 survey that asked
both government and opposition members to indicate their preferences for procedures to assure
the credibility of the performance reports. Each respondent was asked to agree or disagree, on
a 5-point scale, with a series of four Likert items, each of which described a way to assure the
credibility of the performance reports. Figure 10.9 displays the responses to the four statements
for both government and opposition members of the legislature.

Figure 10.9 shows that government members have a preference for internal ways of
assuring the credibility of performance information. Involving external auditors is rated as
considerably less attractive. Keep in mind that government members did not use the
performance reports in a substantive way over time, so these results may be related to the level
and types of use by those of the ruling party. Members of the opposition have the opposite
view on how to assure credibility of the reports. Involving external auditors is their strongest
preference, reflecting the adversarial political culture in British Columbia wherein trust
between the government and the opposition is quite limited. What Figure 10.9 implies is that
for the political party in power, external audits of performance reports are not a priority. Without
a voiced demand for external audits, it is difficult to see how undertaking external audits would
increase demand for performance information.

Figure 10.9 Ratings of Options for Credibility Assurance for Government and Opposition
Members of the BC Legislature

Improving the Demand for Performance Information: Examining Legislation and Training

A principal strategy for ensuring a demand for performance reports is to legislate their
annual production and review. In British Columbia, for example, legislation was passed in
2000 and amended in 2001 (Government of British Columbia, 2000, 2001) that made it
mandatory for all government departments and agencies to produce performance reports and
table them in the Legislature during the summer of each year (although the legislation did not
actually require systematic review of all the annual performance reports). The presumption was
that if the reports were available to elected officials, they would be more likely to be used. Half
of the 10 Canadian provinces have passed legislation that mandates performance measurement
and public reporting (Manitoba Office of the Auditor General, 2000).

The original Government Performance and Results Act (1993) was intended to be the
legislative mandate to implement government-wide performance measurement and reporting in
the United States federal government. Setting objectives, measuring performance, comparing
actual results to intended results and reporting were all part of the GPRA process. When GPRA
was amended in 2010 (GPRA Modernization Act, 2010), the successor legislation mandated
public reporting, although it does not link performance results to the budgeting process in the
way that the PART process did between 2002 and 2009.

At the state level in the United States, nearly all states have required that performance
information be included in the budgetary process. In their survey of American states,
Melkers and Willoughby (1998) found that “all but three states have performance-based
budgeting as well as administrative requirements” (p. 66).

As noted earlier, however, the availability of performance measures does not seem to have
translated into their widespread use. Notwithstanding the efforts by public auditors to promote
performance reporting (Auditor General of British Columbia, 2008; CCAF-FCVI, 2002, 2006,
2007a, 2007b; Epstein, Grifel, & Morgan, 2004; Government of BC and the Auditor General
of BC, 2003; Pentland, 2000), in most performance measurement and public reporting systems,
even where legislated, there appears to be a limited appetite for substantively using
performance results to make political decisions (Frisco & Stalebrink, 2008; Gilmour & Lewis,
2006; McDavid & Huse, 2012; Sterck, 2007; Thomas, 2008).

Sterck (2007) suggests a bold strategy for increasing the uses of performance information:
“Intensive training for MPs and increased communication about [performance budget] reforms
could be instruments to convince MPs to find the right information and to use it” (p. 200). If
that were done, it could possibly increase demand for performance information. But the
prospect for training political decision makers to better use public performance reports to make
policy and legislative decisions is possibly quite limited. There may be insufficient overlap
between the administrative technical/rational perspective that comprises performance
measurement and performance reporting systems, and the communications-related
informational needs of elected officials (Steele, 2005).

Assessing the Realities of Performance Measurement for Public Accountability,
Performance Improvement, and Program Evaluation

The British approach to performance measurement, target setting, and the star rating
system between 2000 and 2005 in England was an exemplar of a fully realized high-stakes
centralized approach to performance management tied to performance measures. Using the star
rating system to challenge the reputations of public sector organizations did improve
performance (Bevan & Hamblin, 2009). At the same time, there were substantial side effects,
principal among them being the gaming-related effects on the performance results (Bevan &
Hamblin, 2009; Hood, 2006; Pollitt et al., 2010). “Naming and shaming” worked, but it was
costly both from an administrative standpoint and from a service quality standpoint.

Looking at the British experience, Pollitt (2011), in a review of public management
reforms in Europe over the past 30 years, sums up the changes in Britain from 1997 through
2010 this way:

Within the 13 year life span of the Blair/Brown New Labour administration (1997–2010)
central government’s approach to performance measurement swung from moderate to
intense and back again, so that by the end the government was officially distancing itself
from its former period of intense central targetry, and proclaiming the need for
professional service deliverers to have greater freedom to take their own decisions and set
their own targets (Cabinet Office, 2008). The coalition government that succeeded them
[2010] took this rhetoric further, denouncing their predecessor’s performance
measurement regime as inhibiting of local innovation, and declaring a “bonfire of
targets.” Soon, however, new frameworks of “milestones” and “standards” appeared, and
operational managers could be forgiven for thinking that in practice they still had a great
deal of upwards performance reporting to do. (p. 13)

If we step back from the British experience and look at public performance measurement
more broadly, we can safely say that it is here to stay (Feller, 2002; McDavid & Huse, 2012;
Perrin, 1998; Thomas, 2006). Even if political decision makers make only limited use of
performance information, the drive to measure and publicly report performance results is
entrenched—public performance reporting is now part of accountability expectations in most
countries and subnational jurisdictions, and is expected to be useful for performance planning
(Dunleavy, Margetts, Bastow, & Tinkler, 2006).

In this textbook, we see performance measurement as being a complement to program
evaluation and, together, they comprise the main approaches to evaluating programs and
policies. Some jurisdictions have policies that explicitly recognize the complementary nature
of performance measurement and program evaluation (Treasury Board of Canada Secretariat,
2009). In this chapter, we have seen that after a period during which performance
measurement and public reporting were linked to high-stakes consequences for public
organizations, particularly in Britain, there has been a pulling back to systems that emphasize
public reporting, but do not rely on the reputational elements that were central to the English
approach during the 2000 to 2005 period. In jurisdictions where the political cultures are
adversarial, there are political risks in performance reporting, but not the same consequences
that the “naming and shaming” regime in England entailed.

Notwithstanding efforts in some jurisdictions to ensure the credibility of performance
information, most public departments and agencies collect and report their own data. Their
data systems and their reports are not subject to external audits. The concern about such
situations is that public performance results could, in part, reflect gaming strategies.

Where the political environment is low stakes (e.g., many local governments and nonprofit
organizations), performance information is more likely to be used for internal performance
management purposes and, when it is reported externally, is perhaps more likely to be credible
(Hildebrand & McDavid, 2011). For program evaluators, then, settings where performance
information is used by managers to monitor and manage performance are more likely to yield
data that are valid and reliable. In situations where reporting performance results externally
carries a risk to the organization and its managers, there are incentives to decouple external
performance reporting and the information therein from internal performance measures. For
evaluators, internal performance information is, again, more likely to be considered
trustworthy.

THREE ADDITIONAL CONSIDERATIONS IN IMPLEMENTING
AND SUSTAINING PERFORMANCE MEASUREMENT
SYSTEMS

It is worthwhile briefly reviewing several additional issues that affect the viability of
performance measurement systems. In addition to the issues that have been raised in Chapter
10 to this point, we want to mention three others: (1) the centralizing influence of performance
measurement on public organizations; (2) the continuing challenge of using performance
results to claim that programs cause those results; and (3) the levels of analysis
problem—assuming that organizational performance tells us about program and individual
performance, and vice versa.

The Centralizing Influence of Performance Measurement in Public Organizations

When NPM was first proffered as a different approach to public sector management in the
1990s, there was an emphasis on managing for results and structuring incentives to align
organizational behaviors with the achievement of outputs and outcomes. One of the premises
built into results-based management systems was that managers would be freed from the need
to pay close attention to traditional process and procedures; instead, they would have the
latitude to work with their inputs in ways that improved efficiency and effectiveness. The
operative phrase was to “let the managers manage.”

But in many settings, particularly where performance measurement and public reporting
are high stakes, managers were not freed up in this way. In fact, the results-based management
requirements were layered on top of existing process-focused requirements (Moynihan, 2008).
The overall effect has been a tendency to centralize organizational decision making. Targets
are not accompanied by more latitude, but instead become controls.

Gill (2011) examined the New Zealand public service and included a survey of more than
1,700 public servants in organizations across the public service. One of his conclusions is that
Max Weber’s “Iron Cage” has been re-created in the New Zealand public service. Weber
introduced this metaphor in his writings on 19th-century bureaucracies (Weber, 1930). Weber
recognized the importance of bureaucracies not just as instruments in the emerging rational
societies in Europe but also as a cultural phenomenon by which relationships would be
transformed in governments and between governments and societies. Bureaucracy, in addition
to offering societies ways of regularizing administration and governance, could also become an
iron cage wherein behaviors and relationships become circumscribed by the values and
expectations core to well-functioning bureaucracies.

In New Zealand, performance management was implemented, but decentralization was not.
Instead, performance information has been used to demonstrate alignment, with objectives and
targets cascading downward. Although results-focused information is used by managers,
performance information on inputs and processes is used more. When bureaucracies are
challenged either by their minister or by external stakeholders, the impulse is to retreat to rules
and processes. One specific finding is worth noting: Of 10 possible influences on the daily
work of the managers who were surveyed, they were least likely to agree that their work unit
has a lot of freedom in how budget and staff are allocated (Gill, 2011, p. 385).

Attributing Outcomes to Programs

The typical scenario for outcome-focused performance measures is that changes in the
measurement values over time or across districts/regions or client groups are assumed to tell us
something about what the program actually accomplished. We do not usually build into
performance measurement systems the kinds of comparisons that yield counterfactual
“program versus no-program” scenarios—the paradigmatic case for program evaluations that
was suggested in Table 8.1.

Attribution becomes more of a problem further down the logic chain from program outputs.
In other words, attributing the variance of longer-term outcome variables to a program is far
more challenging than attributing the variations in outputs to the program.
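
The attribution problem can be illustrated with a small numerical example. The Python sketch below uses hypothetical figures (not drawn from any study cited in this chapter) to contrast the pre-post change that an outcome-focused performance measure would report with a counterfactual-style "program versus no-program" comparison of the kind described in Table 8.1.

def pre_post_change(before, after):
    """What an outcome-focused performance measure typically reports."""
    return after - before

def difference_in_differences(prog_before, prog_after, comp_before, comp_after):
    """Program group's change minus the change in a no-program comparison group."""
    return (prog_after - prog_before) - (comp_after - comp_before)

# Hypothetical employment rates (%) for a training program's clients and for a
# similar group that did not receive the program.
naive_result = pre_post_change(42, 55)                      # +13 points
attributable = difference_in_differences(42, 55, 40, 48)    # +5 points

print(naive_result, attributable)
# The performance measure alone (+13) overstates what the program accomplished,
# because the comparison group improved by 8 points without it; the remaining
# 5 points are closer to what could plausibly be attributed to the program.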

If program managers are expected to be involved in developing, implementing, and using
performance measures (particularly, measures that focus on outcomes), the intended uses
become important. There will inherently be more incentives for managers to take “ownership”
of measures that are used formatively, as opposed to performance measures that will be seen to
be indicators reflecting the program outcomes attributable to the organization to be used for
summative purposes.

Summative uses of outcome results create situations where managers, knowing that they
often do not control the environmental factors that can affect their program outcomes, perceive
a dilemma. If they are expected to develop and use performance measures, they can be called
to account if they fail to do so. But if they do develop and use performance measures, they
could be held responsible for poor performance results that reflect factors that they cannot
control. In many public organizations—NPM initiatives notwithstanding—procedural rules
and restrictions, and political risk aversion, still mean that even if managers do see innovative
ways of changing programs or program environments to increase the probability of success,
they may have limited freedom to do so.

There is a related problem for organizations that deliver human service programs. In
Chapter 2, we introduce the idea that programs vary in the a priori robustness of their intended
cause-and-effect linkages. Programs that are based on engineering knowledge, such as a
highway maintenance program, typically incorporate technologies that have a high probability

of success. If the program is implemented as planned, it is highly likely that the intended
outcomes will occur. At the other end of the continuum, there are many programs that operate
with low-probability technologies. In general, these programs focus on human service issues
and involve efforts to ameliorate or change human conditions, knowledge, attitudes, or
behaviors. Even if these programs are fully implemented, we often observe mixes of program
successes and failures. Our knowledge of what works tends to be far less certain than is true of
high-probability technologies. There are usually important environmental variables that affect
both program activities and outcome variables. In many situations, it is very difficult to
mitigate the influences of these variables—low-probability technologies tend to be much more
vulnerable to external influences.

For program managers who are involved in delivering programs that incorporate low-
probability technologies, the best efforts of their organization may not succeed. For these kinds
of programs, being accountable for outcome-based performance measures is daunting. Aside
from the challenges of measuring intended outcomes, program managers will point out that
outcomes are not really under their control.

An example might be a social welfare agency that has a program focused on single
mothers. The objective of the program is to support these women in their efforts to obtain
permanent jobs. The logic model for the program emphasizes training and child care support
during the training. But the single mothers may have more needs than this program can meet.
Some may need to deal with substance abuse problems or deal with former partners who
continue to harass the family. Some may need to deal with psychological issues from their own
family of origin. Single programs typically will not be able to address this range of issues, so to
hold any one of them accountable for the program objective is not appropriate. Even if a
comprehensive program could be designed and implemented, the state of the art of our
knowledge of how to achieve the objective is likely to mean that there will be a lot of partial
successes and failures in program outcomes. Holding managers to account for their
