Question 1: Professional Assignment
In a 4-6-page, APA-formatted paper, design a software test process that utilizes the foundations of modern software testing and includes technical methods to design effective test case values from criteria. Show the concepts being put into practice and point out any additional pragmatic concerns. Finally, provide a summary overview of the major aspects of putting the Model-Driven Test Design process into practice. Be sure to take into consideration test plans, integration testing, regression testing, and the design and implementation of test oracles. Remember to support your thoughts and justifications with outside, reliable resources that are properly identified, cited, and referenced.
Question 2: (Discussion)
What is one of the ways in which we as managers of the software test process can properly write and implement test plans? Explain a structure that you have previously used, or one you have explored in your research of methods currently being used in the industry. Give examples and cite your work where necessary. Minimum 250 words; include 2 citations and 2 references.
Introduction to Software Testing
This extensively classroom-tested text takes an innovative approach to
explaining software testing that defines it as the process of applying a few
precise, general-purpose criteria to a structure or model of the software.
The text incorporates cutting-edge developments, including techniques to
test modern types of software such as OO, web applications, and
embedded software. This revised second edition significantly expands
coverage of the basics, thoroughly discussing test automation frameworks,
and adds new, improved examples and numerous exercises. Key features
include:
The theory of coverage criteria is carefully, cleanly explained to help
students understand concepts before delving into practical
applications.
Extensive use of the JUnit test framework gives students practical
experience in a test framework popular in industry.
Exercises feature specifically tailored tools that allow students to check
their own work.
Instructor’s manual, PowerPoint slides, testing tools for students, and
example software programs in Java are available from the book’s
website.
Paul Ammann is Associate Professor of Software Engineering at George
Mason University. He earned the Volgenau School’s Outstanding
Teaching Award in 2007. He led the development of the Applied
Computer Science degree, and has served as Director of the MS Software
Engineering program. He has taught courses in software testing, applied
object-oriented theory, formal methods for software engineering, web
software, and distributed software engineering. Ammann has published
more than eighty papers in software engineering, with an emphasis on
software testing, security, dependability, and software engineering
education.
Jeff Offutt is Professor of Software Engineering at George Mason
University. He leads the MS in Software Engineering program, teaches
software engineering courses at all levels, and developed new courses on
several software engineering subjects. He was awarded the George Mason
University Teaching Excellence Award, Teaching with Technology, in
2013. Offutt has published more than 165 papers in areas such as model-based testing, criteria-based testing, test automation, empirical software
engineering, and software maintenance. He is Editor-in-Chief of the
Journal of Software Testing, Verification and Reliability; helped found the
IEEE International Conference on Software Testing; and is the founder of
the μJava project.
INTRODUCTION TO
SOFTWARE
TESTING
Paul Ammann
George Mason University
Jeff Offutt
George Mason University
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
4843/24, 2nd Floor, Ansari Road, Daryaganj, Delhi – 110002, India
79 Anson Road, #06-04/06, Singapore 079906
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781107172012
DOI: 10.1017/9781316771273
© Paul Ammann and Jeff Offutt 2017
This publication is in copyright. Subject to statutory exception and to the
provisions of relevant collective licensing agreements, no reproduction of any part
may take place without the written permission of Cambridge University Press.
First published 2017
Printed in the United States of America by Sheridan Books, Inc.
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloguing in Publication Data
Names: Ammann, Paul, 1961– author. — Offutt, Jeff, 1961– author.
Title: Introduction to software testing / Paul Ammann, George Mason
University, Jeff Offutt, George Mason University.
Description: Edition 2. — Cambridge, United Kingdom; New York, NY, USA:
Cambridge University Press, [2016]
Identifiers: LCCN 2016032808 — ISBN 9781107172012 (hardback)
Subjects: LCSH: Computer software–Testing.
Classification: LCC QA76.76.T48 A56 2016 — DDC 005.3028/7–dc23
LC record available at https://lccn.loc.gov/2016032808
ISBN 978-1-107-17201-2 Hardback
Additional resources for this publication at https://cs.gmu.edu/~offutt/softwaretest/.
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party Internet Web sites referred to in this publication
and does not guarantee that any content on such Web sites is, or will remain,
accurate or appropriate.
Contents
List of Figures
List of Tables
Preface to the Second Edition
Part 1 Foundations
1 Why Do We Test Software?
1.1 When Software Goes Bad
1.2 Goals of Testing Software
1.3 Bibliographic Notes
2 Model-Driven Test Design
2.1 Software Testing Foundations
2.2 Software Testing Activities
2.3 Testing Levels Based on Software Activity
2.4 Coverage Criteria
2.5 Model-Driven Test Design
2.5.1 Test Design
2.5.2 Test Automation
2.5.3 Test Execution
2.5.4 Test Evaluation
2.5.5 Test Personnel and Abstraction
2.6 Why MDTD Matters
2.7 Bibliographic Notes
3 Test Automation
3.1 Software Testability
3.2 Components of a Test Case
3.3 A Test Automation Framework
3.3.1 The JUnit Test Framework
3.3.2 Data-Driven Tests
3.3.3 Adding Parameters to Unit Tests
3.3.4 JUnit from the Command Line
3.4 Beyond Test Automation
3.5 Bibliographic Notes
4 Putting Testing First
4.1 Taming the Cost-of-Change Curve
4.1.1 Is the Curve Really Tamed?
4.2 The Test Harness as Guardian
4.2.1 Continuous Integration
4.2.2 System Tests in Agile Methods
4.2.3 Adding Tests to Legacy Systems
4.2.4 Weaknesses in Agile Methods for Testing
4.3 Bibliographic Notes
5 Criteria-Based Test Design
5.1 Coverage Criteria Defined
5.2 Infeasibility and Subsumption
5.3 Advantages of Using Coverage Criteria
5.4 Next Up
5.5 Bibliographic Notes
Part 2 Coverage Criteria
6 Input Space Partitioning
6.1 Input Domain Modeling
6.1.1 Interface-Based Input Domain Modeling
6.1.2 Functionality-Based Input Domain Modeling
6.1.3 Designing Characteristics
6.1.4 Choosing Blocks and Values
6.1.5 Checking the Input Domain Model
6.2 Combination Strategies Criteria
6.3 Handling Constraints Among Characteristics
6.4 Extended Example: Deriving an IDM from JavaDoc
6.4.1 Tasks in Designing IDM-Based Tests
6.4.2 Designing IDM-Based Tests for Iterator
6.5 Bibliographic Notes
7 Graph Coverage
7.1 Overview
7.2 Graph Coverage Criteria
7.2.1 Structural Coverage Criteria
7.2.2 Touring, Sidetrips, and Detours
7.2.3 Data Flow Criteria
7.2.4 Subsumption Relationships Among Graph Coverage
Criteria
7.3 Graph Coverage for Source Code
7.3.1 Structural Graph Coverage for Source Code
7.3.2 Data Flow Graph Coverage for Source Code
7.4 Graph Coverage for Design Elements
7.4.1 Structural Graph Coverage for Design Elements
7.4.2 Data Flow Graph Coverage for Design Elements
7.5 Graph Coverage for Specifications
7.5.1 Testing Sequencing Constraints
7.5.2 Testing State Behavior of Software
7.6 Graph Coverage for Use Cases
7.6.1 Use Case Scenarios
7.7 Bibliographic Notes
8 Logic Coverage
8.1 Semantic Logic Coverage Criteria (Active)
8.1.1 Simple Logic Expression Coverage Criteria
8.1.2 Active Clause Coverage
8.1.3 Inactive Clause Coverage
8.1.4 Infeasibility and Subsumption
8.1.5 Making a Clause Determine a Predicate
8.1.6 Finding Satisfying Values
8.2 Syntactic Logic Coverage Criteria (DNF)
8.2.1 Implicant Coverage
8.2.2 Minimal DNF
8.2.3 The MUMCUT Coverage Criterion
8.2.4 Karnaugh Maps
8.3 Structural Logic Coverage of Programs
8.3.1 Satisfying Predicate Coverage
8.3.2 Satisfying Clause Coverage
8.3.3 Satisfying Active Clause Coverage
8.3.4 Predicate Transformation Issues
8.3.5 Side Effects in Predicates
8.4 Specification-Based Logic Coverage
8.5 Logic Coverage of Finite State Machines
8.6 Bibliographic Notes
9 Syntax-Based Testing
9.1 Syntax-Based Coverage Criteria
9.1.1 Grammar-Based Coverage Criteria
9.1.2 Mutation Testing
9.2 Program-Based Grammars
9.2.1 BNF Grammars for Compilers
9.2.2 Program-Based Mutation
9.3 Integration and Object-Oriented Testing
9.3.1 BNF Integration Testing
9.3.2 Integration Mutation
9.4 Specification-Based Grammars
9.4.1 BNF Grammars
9.4.2 Specification-Based Mutation
9.5 Input Space Grammars
9.5.1 BNF Grammars
9.5.2 Mutating Input Grammars
9.6 Bibliographic Notes
Part 3 Testing in Practice
10 Managing the Test Process
10.1 Overview
10.2 Requirements Analysis and Specification
10.3 System and Software Design
10.4 Intermediate Design
10.5 Detailed Design
10.6 Implementation
10.7 Integration
10.8 System Deployment
10.9 Operation and Maintenance
10.10 Implementing the Test Process
10.11 Bibliographic Notes
11 Writing Test Plans
11.1 Level Test Plan Example Template
11.2 Bibliographic Notes
12 Test Implementation
12.1 Integration Order
12.2 Test Doubles
12.2.1 Stubs and Mocks: Variations of Test Doubles
12.2.2 Using Test Doubles to Replace Components
12.3 Bibliographic Notes
13 Regression Testing for Evolving Software
13.1 Bibliographic Notes
14 Writing Effective Test Oracles
14.1 What Should Be Checked?
14.2 Determining Correct Values
14.2.1 Specification-Based Direct Verification of Outputs
14.2.2 Redundant Computations
14.2.3 Consistency Checks
14.2.4 Metamorphic Testing
14.3 Bibliographic Notes
List of Criteria
Bibliography
Index
Figures
1.1 Cost of late testing
2.1 Reachability, Infection, Propagation, Revealability (RIPR) model
2.2 Activities of test engineers
2.3 Software development activities and testing levels – the “V Model”
2.4 Model-driven test design
2.5 Example method, CFG, test requirements and test paths
3.1 Calc class example and JUnit test
3.2 Minimum element class
3.3 First three JUnit tests for Min class
3.4 Remaining JUnit test methods for Min class
3.5 Data-driven test class for Calc
3.6 JUnit Theory about sets
3.7 JUnit Theory data values
3.8 AllTests for the Min class example
4.1 Cost-of-change curve
4.2 The role of user stories in developing system (acceptance) tests
6.1 Partitioning of input domain D into three blocks
6.2 Subsumption relations among input space partitioning criteria
7.1 Graph (a) has a single initial node, graph (b) multiple initial nodes, and graph (c) (rejected) with no initial nodes
7.2 Example of paths
7.3 A Single-Entry Single-Exit graph
7.4 Test case mappings to test paths
7.5 A set of test cases and corresponding test paths
7.6 A graph showing Node Coverage and Edge Coverage
7.7 Two graphs showing prime path coverage
7.8 Graph with a loop
7.9 Tours, sidetrips, and detours in graph coverage
7.10 An example for prime test paths
7.11 A graph showing variables, def sets and use sets
7.12 A graph showing an example of du-paths
7.13 Graph showing explicit def and use sets
7.14 Example of the differences among the three data flow coverage
criteria
7.15 Subsumption relations among graph coverage criteria
7.16 CFG fragment for the if-else structure
7.17 CFG fragment for the if structure without an else
7.18 CFG fragment for the if structure with a return
7.19 CFG fragment for the while loop structure
7.20 CFG fragment for the for loop structure
7.21 CFG fragment for the do-while structure
7.22 CFG fragment for the while loop with a break structure
7.23 CFG fragment for the case structure
7.24 CFG fragment for the try-catch structure
7.25 Method patternIndex() for data flow example
7.26 A simple call graph
7.27 A simple inheritance hierarchy
7.28 An inheritance hierarchy with objects instantiated
7.29 An example of parameter coupling
7.30 Coupling du-pairs
7.31 Last-defs and first-uses
7.32 Quadratic root program
7.33 Def-use pairs under intra-procedural and inter-procedural data flow
7.34 Def-use pairs in object-oriented software
7.35 Def-use pairs in web applications and other distributed software
7.36 Control flow graph using the File ADT
7.37 Elevator door open transition
7.38 Watch–Part A
7.39 Watch–Part B
7.40 An FSM representing Watch, based on control flow graphs of the
methods
7.41 An FSM representing Watch, based on the structure of the software
7.42 An FSM representing Watch, based on modeling state variables
7.43 ATM actor and use cases
7.44 Activity graph for ATM withdraw funds
8.1 Subsumption relations among logic coverage criteria
8.2 Fault detection relationships
8.3 Thermostat class
8.4 PC true test for Thermostat class
8.5 CC test assignments for Thermostat class
8.6 Calendar method
8.7 FSM for a memory car seat–Nissan Maxima 2012
9.1 Method Min and six mutants
9.2 Mutation testing process
9.3 Partial truth table for (a ∧ b)
9.4 Finite state machine for SMV specification
9.5 Mutated finite state machine for SMV specification
9.6 Finite state machine for bank example
9.7 Finite state machine for bank example grammar
9.8 Simple XML message for books
9.9 XML schema for books
12.1 Test double example: Replacing a component
Tables
6.1 First partitioning of triang()’s inputs (interface-based)
6.2 Second partitioning of triang()’s inputs (interface-based)
6.3 Possible values for blocks in the second partitioning in Table 6.2
6.4 Geometric partitioning of triang()’s inputs (functionality-based)
6.5 Correct geometric partitioning of triang()’s inputs (functionality-based)
6.6 Possible values for blocks in geometric partitioning in Table 6.5
6.7 Examples of invalid block combinations
6.8 Table A for Iterator example: Input parameters and
characteristics
6.9 Table B for Iterator example: Partitions and base case
6.10 Table C for Iterator example: Refined test requirements
6.11 Table A for Iterator example: Input parameters and
characteristics (revised)
6.12 Table C for Iterator example: Refined test requirements
(revised)
7.1 Defs and uses at each node in the CFG for patternIndex()
7.2 Defs and uses at each edge in the CFG for patternIndex()
7.3 du-path sets for each variable in patternIndex()
7.4 Test paths to satisfy all du-paths coverage on patternIndex()
7.5 Test paths and du-paths covered in patternIndex()
8.1 DNF fault classes
8.2 Reachability for Thermostat predicates
8.3 Clauses in the Thermostat predicate on lines 28-30
8.4 Correlated active clause coverage for Thermostat
8.5 Correlated active clause coverage for cal() preconditions
8.6 Predicates from memory seat example
9.1 Java’s access levels
10.1 Testing objectives and activities during requirements analysis and
specification
10.2 Testing objectives and activities during system and software design
10.3 Testing objectives and activities during intermediate design
10.4 Testing objectives and activities during detailed design
10.5 Testing objectives and activities during implementation
10.6 Testing objectives and activities during integration
10.7 Testing objectives and activities during system deployment
10.8 Testing objectives and activities during operation and maintenance
Preface to the Second Edition
Much has changed in the field of testing in the eight years since the first
edition was published. High-quality testing is now more common in
industry. Test automation is now ubiquitous, and almost assumed in large
segments of the industry. Agile processes and test-driven development are
now widely known and used. Many more colleges offer courses on
software testing, both at the undergraduate and graduate levels. The ACM
curriculum guidelines for software engineering include software testing in
several places, including as a strongly recommended course [Ardis et al.,
2015].
The second edition of Introduction to Software Testing incorporates new
features and material, yet retains the structure, philosophy, and online
resources that have been so popular among the hundreds of teachers who
have used the book.
What is new about the second edition?
The first thing any instructor has to do when presented with a new edition
of a book is analyze what must be changed in the course. Since we have
been in that situation many times, we want to make it as easy as possible
for our audience. We start with a chapter-to-chapter mapping.
First Edition → Second Edition: Topic

Part I: Foundations
Chapter 1 → Chapter 01: Why do we test software? (motivation)
Chapter 1 → Chapter 02: Model-driven test design (abstraction)
Chapter 1 → Chapter 03: Test automation (JUnit)
Chapter 1 → Chapter 04: Putting testing first (TDD)
Chapter 1 → Chapter 05: Criteria-based test design (criteria)

Part II: Coverage Criteria
Chapter 2 → Chapter 07: Graph coverage
Chapter 3 → Chapter 08: Logic coverage
Chapter 4 → Chapter 09: Syntax-based testing
Chapter 5 → Chapter 06: Input space partitioning

Part III: Testing in Practice
Chapter 6 → Chapter 10: Managing the test process
Chapter 6 → Chapter 11: Writing test plans
Chapter 6 → Chapter 12: Test implementation
Chapter 6 → Chapter 13: Regression testing for evolving software
Chapter 6 → Chapter 14: Writing effective test oracles
Chapter 7 → N/A: Technologies
Chapter 8 → N/A: Tools
Chapter 9 → N/A: Challenges
The most obvious, and largest change, is that the introductory chapter 1
from the first edition has been expanded into five separate chapters. This is
a significant expansion that we believe makes the book much better. The
new part 1 grew out of our lectures. After the first edition came out, we
started adding more foundational material to our testing courses. These
new ideas were eventually reorganized into five new chapters. The new
chapter 01 has much of the material from the first edition chapter 1,
including motivation and basic definitions. It closes with a discussion of
the cost of late testing, taken from the 2002 RTI report that is cited in
every software testing research proposal. After completing the first edition,
we realized that the key novel feature of the book, viewing test design as
an abstract activity that is independent of the software artifact being used
to design the tests, implied a completely different process. This led to
chapter 02, which suggests how test criteria can fit into practice. Through
our consulting, we have helped software companies modify their test
processes to incorporate this model.
A flaw with the first edition was that it did not mention JUnit or other
test automation frameworks. In 2016, JUnit is used very widely in
industry, and is commonly used in CS1 and CS2 classes for automated
grading. Chapter 03 rectifies this oversight by discussing test automation
in general, the concepts that make test automation difficult, and explicitly
teaches JUnit. Although the book is largely technology-neutral, having a
consistent test framework throughout the book helps with examples and
exercises. In our classes, we usually require tests to be automated and
often ask students to try other “*-Unit” frameworks such as HttpUnit as
homework exercises. We believe that test organizations cannot be ready to
apply test criteria successfully before they have automated their tests.
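To make this concrete, here is a minimal JUnit 4 sketch in the spirit of the Calc example listed among the figures; the class body and test values shown are assumptions for illustration, not the book's exact listing.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical class under test, shown inline so the sketch is self-contained.
class Calc {
    static int add(int a, int b) {
        return a + b;
    }
}

public class CalcTest {
    @Test
    public void addsTwoPositiveIntegers() {
        // Test values chosen by hand here; Part II shows how coverage
        // criteria make this choice systematic rather than ad hoc.
        assertEquals(7, Calc.add(3, 4));
    }

    @Test
    public void addsANegativeOperand() {
        assertEquals(-1, Calc.add(3, -4));
    }
}

Once tests like these run automatically, a test organization is in a position to measure them against the coverage criteria of Part II.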
Chapter 04 goes to the natural next step of test-driven development.
Although TDD is a different take on testing than the rest of the book, it’s
an exciting topic for test educators and researchers precisely because it
puts testing front and center—the tests become the requirements. Finally,
chapter 05 introduces the concept of test criteria in an abstract way. The
jelly bean example (which our students love, especially when we share) is
still there, as are concepts such as subsumption.
Part 2, which is the heart of the book, has changed the least for the
second edition. In 2014, Jeff asked Paul a very simple question: “Why are
the four chapters in part 2 in that order?” The answer was stunned silence,
as we realized that we had never asked which order they should appear in.
It turns out that the RIPR model, which is certainly central to software
testing, dictates a logical order. Specifically, input space partitioning does
not require reachability, infection, or propagation. Graph coverage criteria
require execution to “get to” some location in the software artifact under
test, that is, reachability, but not infection or propagation. Logic coverage
criteria require that a predicate not only be reached, but be exercised in a
particular way to affect the result of the predicate. That is, the predicate
must be infected. Finally, syntax coverage not only requires that a location
be reached, and that the program state of the “mutated” version be
different from the original version, but that difference must be visible after
execution finishes. That is, it must propagate. The second edition orders
these four concepts based on the RIPR model, where each chapter now has
successively stronger requirements. From a practical perspective, all we
did was move the previous chapter 5 (now chapter 06) in front of the graph
chapter (now chapter 07).
Another major structural change is that the second edition does not
include chapters 7 through 9 from the first edition. The first edition
material has become dated. Because it is used less than other material in
the book, we decided not to delay this new edition of the book while we
tried to find time to write this material. We plan to include better versions
of these chapters in a third edition.
We also made hundreds of changes at a more detailed level. Recent
research has found that in addition to an incorrect value propagating to the
output, testing only succeeds if our automated test oracle looks at the right
part of the software output. That is, the test oracle must reveal the failure.
Thus, the old RIP model is now the RIPR model. Several places in the
book have discussions that go beyond or into more depth than is strictly
needed. The second edition now includes “meta discussions,” which are
ancillary discussions that can be interesting or insightful to some students,
but unnecessarily complicated for others.
The new chapter 06 now has a fully worked out example of deriving an
input domain model from a widely used Java library interface (in section
06.4). Our students have found this helps them understand how to use the
input space partitioning techniques. The first edition included a section on
“Representing graphs algebraically.” Although one of us found this
material to be fun, we both found it hard to motivate and unlikely to be
used in practice. It also has some subtle technical flaws. Thus, we removed
this section from the second edition. The new chapter 08 (logic) has a
significant structural modification. The DNF criteria (formerly in section
3.6) properly belong at the front of the chapter. Chapter 08 now starts with
semantic logic criteria (ACC and ICC) in 08.1, then proceeds to syntactic
logic criteria (DNF) in 08.2. The syntactic logic criteria have also changed.
One was dropped (UTPC), and CUTPNFP has been joined by MUTP and
MNFP. Together, these three criteria comprise MUMCUT.
Throughout the book (especially part 2), we have improved the
examples, simplified definitions, and included more exercises. When the
first edition was published we had a partial solution manual, which
somehow took five years to complete. We are proud to say that we learned
from that mistake: we made (and stuck by!) a rule that we couldn’t add an
exercise without also adding a solution. The reader might think of this rule
as testing for exercises. We are glad to say that the second edition book
website debuts with a complete solution manual.
The second edition also has many dozens of corrections (starting with
the errata list from the first edition book website), but including many
more that we found while preparing the second edition. The second edition
also has a better index. We put together the index for the first edition in
about a day, and it showed. This time we have been indexing as we write,
and committed time near the end of the process to specifically focus on the
index. For future book writers, indexing is hard work and not easy to turn
over to a non-author!
What is still the same in the second edition?
The things that have stayed the same are those that were successful in the
first edition. The overall observation that test criteria are based on only
four types of structures is still the key organizing principle of the second
edition. The second edition is also written from an engineering viewpoint,
assuming that users of the book are engineers who want to produce the
highest quality software with the lowest possible cost. The concepts are
well grounded in theory, yet presented in a practical manner. That is, the
book tries to make theory meet practice; the theory is sound according to
the research literature, but we also show how the theory applies in practice.
The book is also written as a text book, with clear explanations, simple
but illustrative examples, and lots of exercises suitable for in-class or out-of-class work. Each chapter ends with bibliographic notes so that
beginning research students can proceed to learning the deeper ideas
involved in software testing. The book website
(https://cs.gmu.edu/~offutt/softwaretest/) is rich in materials with solution
manuals, listings of all example programs in the text, high quality
PowerPoint slides, and software to help students with graph coverage,
logic coverage, and mutation analysis. Some explanatory videos are also
available and we hope more will follow. The solution manual comes in
two flavors. The student solution manual, with solutions to about half the
exercises, is available to everyone. The instructor solution manual has
solutions to all exercises and is only available to those who convince the
authors that they are using a book to teach a course.
Using the book in the classroom
The book chapters are built in a modular, component-based manner. Most
chapters are independent, and although they are presented in the order that
we use them, inter-chapter dependencies are few and they could be used in
almost any order. Our primary target courses at our university are a fourth-year course (SWE 437) and a first-year graduate course (SWE 637).
Interested readers can search on those courses (“mason swe 437” or
“mason swe 637”) to see our schedules and how we use the book. Both
courses are required; SWE 437 is required in the software engineering
concentration in our Applied Computer Science major, and SWE 637 is
required in our MS program in software engineering2. Chapters 01 and 03
can be used in an early course such as CS2 in two ways. First, to sensitize
early students to the importance of software quality, and second to get
them started with test automation (we use JUnit at Mason). A second-year
course in testing could cover all of part 1, chapter 06 from part 2, and all or
part of part 3. The other chapters in part 2 are probably more than what
such students need, but input space partitioning is a very accessible
introduction to structured, high-end testing. A common course in North
American computer science programs is a third-year course on general
software engineering. Part 1 would be very appropriate for such a course.
In 2016 we are introducing an advanced graduate course on software
testing, which will span cutting-edge knowledge and current research. This
course will use some of part 3, the material that we are currently
developing for part 4, and selected research papers.
Teaching software testing
Both authors have become students of teaching over the past decade. In the
early 2000s, we ran fairly traditional classrooms. We lectured for most of
the available class time, kept organized with extensive PowerPoint slides,
required homework assignments to be completed individually, and gave
challenging, high-pressure exams. The PowerPoint slides and exercises in
the first edition were designed for this model.
However, our teaching has evolved. We replaced our midterm exam
with weekly quizzes, given in the first 15 minutes of class. This distributed
a large component of the grade through the semester, relieved much of the
stress of midterms, encouraged the students to keep up on a weekly basis
instead of cramming right before the exam, and helped us identify students
who were succeeding or struggling early in the term.
After learning about the “flipped classroom” model, we experimented
with recorded lectures, viewed online, followed by doing the “homework”
assignments in class with us available for immediate help. We found this
particularly helpful with the more mathematically sophisticated material
such as logic coverage, and especially beneficial to struggling students. As
the educational research evidence against the benefits of lectures has
mounted, we have been moving away from the “sage on a stage” model of
talking for two hours straight. We now often talk for 10 to 20 minutes,
then give in-class exercises3 where the students immediately try to solve
problems or answer questions. We confess that this is difficult for us,
because we love to talk! Or, instead of showing an example during our
lecture, we introduce the example, let the students work the next step in
small groups, and then share the results. Sometimes our solutions are
better, sometimes theirs are better, and sometimes solutions differ in
interesting ways that spur discussion.
There is no doubt that this approach to teaching takes time and cannot
accommodate all of the PowerPoint slides we have developed. We believe
that although we cover less material, we uncover more, a perception
consistent with how our students perform on our final exams.
Most of the in-class exercises are done in small groups. We also
encourage students to work out-of-class assignments collaboratively. Not
only does evidence show that students learn more when they work
collaboratively (“peer-learning”), they enjoy it more, and it matches the
industrial reality. Very few software engineers work alone.
Of course, you can use this book in your class as you see fit. We offer
these insights simply as examples for things that work for us. We
summarize our current philosophy of teaching simply: Less talking, more
teaching.
Acknowledgments
It is our pleasure to acknowledge by name the many contributors to this
text. We begin with students at George Mason who provided excellent
feedback on early draft chapters from the second edition: Firass Almiski,
Natalia Anpilova, Khalid Bargqdle, Mathew Fadoul, Mark Feghali,
Angelica Garcia, Mahmoud Hammad, Husam Hilal, Carolyn Koerner,
Han-Tsung Liu, Charon Lu, Brian Mitchell, Tuan Nguyen, Bill Shelton,
Dzung Tran, Dzung Tray, Sam Tryon, Jing Wu, Zhonghua Xi, and Chris
Yeung.
We are particularly grateful to colleagues who used draft chapters of the
second edition. These early adopters provided valuable feedback that was
extremely helpful in making the final document classroom-ready. Thanks
to: Moataz Ahmed, King Fahd University of Petroleum & Minerals; Jeff
Carver, University of Alabama; Richard Carver, George Mason
University; Jens Hannemann, Kentucky State University; Jane Hayes,
University of Kentucky; Kathleen Keogh, Federation University Australia;
Robyn Lutz, Iowa State University; Upsorn Praphamontripong, George
Mason University; Alper Sen, Bogazici University; Marjan Sirjani,
Reykjavik University; Mary Lou Soffa, University of Virginia; Katie
Stolee, North Carolina State University; and Xiaohong Wang, Salisbury
University.
Several colleagues provided exceptional feedback from the first edition:
Andy Brooks, Mark Hampton, Jian Zhou, Jeff (Yu) Lei, and six
anonymous reviewers contacted by our publisher. The following
individuals corrected, and in some cases developed, exercise solutions:
Sana’a Alshdefat, Yasmine Badr, Jim Bowring, Steven Dastvan, Justin
Donnelly, Martin Gebert, JingJing Gu, Jane Hayes, Rama Kesavan,
Ignacio Martín, Maricel Medina-Mora, Xin Meng, Beth Paredes, Matt
Rutherford, Farida Sabry, Aya Salah, Hooman Safaee, Preetham
Vemasani, and Greg Williams. The following George Mason students
found, and often corrected, errors in the first edition: Arif Al-Mashhadani,
Yousuf Ashparie, Parag Bhagwat, Firdu Bati, Andrew Hollingsworth,
Gary Kaminski, Rama Kesavan, Steve Kinder, John Krause, Jae Hyuk
Kwak, Nan Li, Mohita Mathur, Maricel Medina Mora, Upsorn
Praphamontripong, Rowland Pitts, Mark Pumphrey, Mark Shapiro, Bill
Shelton, David Sracic, Jose Torres, Preetham Vemasani, Shuang Wang,
Lance Witkowski, Leonard S. Woody III, and Yanyan Zhu. The following
individuals from elsewhere found, and often corrected, errors in the first
edition: Sana’a Alshdefat, Alexandre Bartel, Don Braffitt, Andrew Brooks,
Josh Dehlinger, Gordon Fraser, Rob Fredericks, Weiyi Li, Hassan Mirian,
Alan Moraes, Miika Nurminen, Thomas Reinbacher, Hooman Rafat
Safaee, Hossein Saiedian, Aya Salah, and Markku Sakkinen. Lian Yu of
Peking University translated the first edition into Mandarin Chinese.
We also want to acknowledge those who implicitly contributed to the
second edition by explicitly contributing to the first edition: Aynur
Abdurazik, Muhammad Abdulla, Roger Alexander, Lionel Briand, Renee
Bryce, George P. Burdell, Guillermo Calderon-Meza, Jyothi Chinman,
Yuquin Ding, Blaine Donley, Patrick Emery, Brian Geary, Hassan Gomaa,
Mats Grindal, Becky Hartley, Jane Hayes, Mark Hinkle, Justin
Hollingsworth, Hong Huang, Gary Kaminski, John King, Yuelan Li, Ling
Liu, Xiaojuan Liu, Chris Magrin, Darko Marinov, Robert Nilsson, Andrew
J. Offutt, Buzz Pioso, Jyothi Reddy, Arthur Reyes, Raimi Rufai, Bo
Sanden, Jeremy Schneider, Bill Shelton, Michael Shin, Frank Shukis, Greg
Williams, Quansheng Xiao, Tao Xie, Wuzhi Xu, and Linzhen Xue.
While developing the second edition, our graduate teaching assistants at
George Mason gave us fantastic feedback on early drafts of chapters: Lin
Deng, Jingjing Gu, Nan Li, and Upsorn Praphamontripong. In particular,
Nan Li and Lin Deng were instrumental in completing, evolving, and
maintaining the software coverage tools available on the book website.
We are grateful to our editor, Lauren Cowles, for providing unwavering
support and enforcing the occasional deadline to move the project along,
as well as Heather Bergmann, our former editor, for her strong support on
this long-running project.
Finally, of course none of this is possible without the support of our
families. Thanks to Becky, Jian, Steffi, Matt, Joyce, and Andrew for
helping us stay balanced.
Just as all programs contain faults, all texts contain errors. Our text is no
different. And, as responsibility for software faults rests with the
developers, responsibility for errors in this text rests with us, the authors.
In particular, the bibliographic notes sections reflect our perspective of the
testing field, a body of work we readily acknowledge as large and
complex. We apologize in advance for omissions, and invite pointers to
relevant citations.
1 To help reduce confusion, we developed the convention of using two digits for
second edition chapters. Thus, in this preface, chapter 01 implies the second
edition, whereas chapter 1 implies the first.
2 Our MS program is practical in nature, not research-oriented. The majority of
students are part-time students with five to ten years of experience in the software
industry. SWE 637 begat this book when we realized Beizer’s classic text
[Beizer, 1990] was out of print.
3 These in-class exercises are not yet a formal part of the book website. But we
often draw them from regular exercises in the text. Interested readers can extract
recent versions from our course web pages with a search engine.
PART I
Foundations
1
Why Do We Test Software?
The true subject matter of the tester is not testing, but the design of test cases.
The purpose of this book is to teach software engineers how to test. This
knowledge is useful whether you are a programmer who needs to unit test
your own software, a full-time tester who works mostly from requirements
at the user level, a manager in charge of testing or development, or any
position in between. As the software industry moves into the second
decade of the 21st century, software quality is increasingly becoming
essential to all businesses and knowledge of software testing is becoming
necessary for all software engineers.
Today, software defines behaviors that our civilization depends on in
systems such as network routers, financial calculation engines, switching
networks, the Web, power grids, transportation systems, and essential
communications, command, and control services. Over the past two
decades, the software industry has become much bigger, is more
competitive, and has more users. Software is an essential component of
exotic embedded applications such as airplanes, spaceships, and air traffic
control systems, as well as mundane appliances such as watches, ovens,
cars, DVD players, garage door openers, mobile phones, and remote
controllers. Modern households have hundreds of processors, and new cars
have over a thousand; all of them running software that optimistic
consumers assume will never fail! Although many factors affect the
engineering of reliable software, including, of course, careful design and
sound process management, testing is the primary way industry evaluates
software during development. The recent growth in agile processes puts
increased pressure on testing; unit testing is emphasized heavily and test-driven development makes tests key to functional requirements. It is clear
that industry is deep into a revolution in what testing means to the success
of software products.
Fortunately, a few basic software testing concepts can be used to design
tests for a large variety of software applications. A goal of this book is to
present these concepts in such a way that students and practicing engineers
can easily apply them to any software testing situation.
This textbook differs from other software testing books in several
respects. The most important difference is in how it views testing
techniques. In his landmark book Software Testing Techniques, Beizer
wrote that testing is simple—all a tester needs to do is “find a graph and
cover it.” Thanks to Beizer’s insight, it became evident to us that the
myriad of testing techniques present in the literature have much more in
common than is obvious at first glance. Testing techniques are typically
presented in the context of a particular software artifact (for example, a
requirements document or code) or a particular phase of the lifecycle (for
example, requirements analysis or implementation). Unfortunately, such a
presentation obscures underlying similarities among techniques.
This book clarifies these similarities with two innovative, yet
simplifying, approaches. First, we show how testing is more efficient and
effective by using a classical engineering approach. Instead of designing
and developing tests on concrete software artifacts like the source code or
requirements, we show how to develop abstraction models, design tests at
the abstract level, and then implement actual tests at the concrete level by
satisfying the abstract designs. This is the exact process that traditional
engineers use, except whereas they usually use calculus and algebra to
describe the abstract models, software engineers usually use discrete
mathematics. Second, we recognize that all test criteria can be defined
with a very short list of abstract models: input domain characterizations,
graphs, logical expressions, and syntactic descriptions. These are directly
reflected in the four chapters of Part II of this book.
This book provides a balance of theory and practical application,
thereby presenting testing as a collection of objective, quantitative
activities that can be measured and repeated. The theory is based on the
published literature, and presented without excessive formalism. Most
importantly, the theoretical concepts are presented when needed to support
the practical activities that test engineers follow. That is, this book is
intended for all software developers.
1.1 WHEN SOFTWARE GOES BAD
As said, we consider the development of software to be engineering. And
like any engineering discipline, the software industry has its shares of
failures, some spectacular, some mundane, some costly, and sadly, some
that have resulted in loss of life. Before learning about software disasters,
it is important to understand the difference between faults, errors, and
failures. We adopt the definitions of software fault, error, and failure from
the dependability community.
Definition 1.1 Software Fault: A static defect in the software.
Definition 1.2 Software Error: An incorrect internal state that is the
manifestation of some fault.
Definition 1.3 Software Failure: External, incorrect behavior with
respect to the requirements or another description of the expected
behavior.
Consider a medical doctor diagnosing a patient. The patient enters the
doctor’s office with a list of failures (that is, symptoms). The doctor then
must discover the fault, or root cause of the symptoms. To aid in the
diagnosis, a doctor may order tests that look for anomalous internal
conditions, such as high blood pressure, an irregular heartbeat, high levels
of blood glucose, or high cholesterol. In our terminology, these anomalous
internal conditions correspond to errors.
While this analogy may help the student clarify his or her thinking about
faults, errors, and failures, software testing and a doctor’s diagnosis differ
in one crucial way. Specifically, faults in software are design mistakes.
They do not appear spontaneously, but exist as a result of a decision by a
human. Medical problems (as well as faults in computer system hardware),
on the other hand, are often a result of physical degradation. This
distinction is important because it limits the extent to which any process
can hope to control software faults. Specifically, since no foolproof way
exists to catch arbitrary mistakes made by humans, we can never eliminate
all faults from software. In colloquial terms, we can make software
development foolproof, but we cannot, and should not attempt to, make it
damn-foolproof.
For a more precise example of the definitions of fault, error, and failure,
we need to clarify the concept of the state. A program state is defined
during execution of a program as the current value of all live variables and
the current location, as given by the program counter. The program
counter (PC) is the next statement in the program to be executed and can
be described with a line number in the file (PC = 5) or the statement as a
string (PC = “if (x > y)”). Most of the time, what we mean by a statement
is obvious, but complex structures such as for loops have to be treated
specially. The program line “for (i=1; i < N; i++)” actually has three
statements that can result in separate states. The loop initialization (“i=1”)
is separate from the loop test (“i < N”), and the loop increment (“i++”)
occurs at the end of the loop body. As an illustrative example, consider the
following Java method:
Sidebar
Programming Language Independence
This book strives to be independent of language, and most of the
concepts in the book are. At the same time, we want to illustrate these
concepts with specific examples. We choose Java, and emphasize that
most of these examples would be very similar in many other common
languages.
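The method referred to above does not survive in this extract. What follows is a reconstruction sketched from the surrounding discussion (the enclosing class name is an assumption), with the fault described below deliberately present:

public class NumZeroExample {
    /**
     * Returns the number of occurrences of 0 in x.
     * Throws NullPointerException if x is null.
     * Reconstructed sketch: the loop intentionally starts at index 1,
     * which is the fault discussed in the text.
     */
    public static int numZero(int[] x) {
        int count = 0;
        for (int i = 1; i < x.length; i++) {  // fault: should start at i = 0
            if (x[i] == 0) {
                count++;
            }
        }
        return count;
    }
}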
The fault in this method is that it starts looking for zeroes at index 1
instead of index 0, as is necessary for arrays in Java. For example,
numZero([2, 7, 0]) correctly evaluates to 1, while numZero([0, 7, 2])
incorrectly evaluates to 0. In both tests the faulty statement is executed.
Although both of these tests result in an error, only the second results in
failure. To understand the error states, we need to identify the state for
the method. The state for numZero() consists of values for the variables
x, count, i, and the program counter (PC). For the first example above,
the state at the loop test on the very first iteration of the loop is
(x = [2, 7, 0], count = 0, i = 1, PC = “i < x.length”). Notice that this
state is erroneous precisely because the
value of i should be zero on the first iteration. However, since the value of
count is coincidentally correct, the error state does not propagate to the
output, and hence the software does not fail. In other words, a state is in
error simply if it is not the expected state, even if all of the values in the
state, considered in isolation, are acceptable. More generally, if the
required sequence of states is s0, s1, s2, ..., and the actual sequence of states
is s0, s2, s3, ..., then state s2 is in error in the second sequence. The fault
model described here is quite deep, and this discussion gives the broad
view without going into unneeded details. The exercises at the end of the
section explore some of the subtleties of the fault model.
In the second test for our example, the error state is
(x = [0, 7, 2], count = 0, i = 1, PC = “i < x.length”). In this case, the error
propagates to the variable count and is present in the return value of the
method. Hence a failure results.
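Expressed as automated checks (a sketch assuming the reconstructed NumZeroExample class above and JUnit 4), the two inputs behave as just described: the first test passes even though an error state occurs during execution, while the second fails and so reveals the fault.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class NumZeroTest {
    @Test
    public void zeroAtEndOfArray() {
        // An error state occurs (i starts at 1), but count is still correct,
        // so the error does not propagate and the test passes.
        assertEquals(1, NumZeroExample.numZero(new int[] {2, 7, 0}));
    }

    @Test
    public void zeroAtStartOfArray() {
        // The skipped element is the only zero, so the error propagates
        // to the return value; the assertion fails and reveals the fault.
        assertEquals(1, NumZeroExample.numZero(new int[] {0, 7, 2}));
    }
}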
The term bug is often used informally to refer to all three of fault, error,
and failure. This book will usually use the specific term, and avoid using
“bug.” A favorite story of software engineering teachers is that Grace
Hopper found a moth stuck in a relay on an early computing machine,
which started the use of bug as a problem with software. It is worth noting,
however, that the term bug has an old and rich history, predating software
by at least a century. The first use of bug to generally mean a problem we
were able to find is from a quote by Thomas Edison:
It has been just so in all of my inventions. The first step is an intuition, and
comes with a burst, then difficulties arise–this thing gives out and [it is]
then that ‘Bugs’–as such little faults and difficulties are called–show
themselves and months of intense watching, study and labor are requisite.
— Thomas Edison
A very public failure was the Mars lander of September 1999, which
crashed due to a misunderstanding in the units of measure used by two
modules created by separate software groups. One module computed
thruster data in English units and forwarded the data to a module that
expected data in metric units. This is a very typical integration fault (but in
this case enormously expensive, both in terms of money and prestige).
One of the most famous cases of software killing people is the Therac-25 radiation therapy machine. Software faults were found to have caused
at least three deaths due to excessive radiation. Another dramatic failure
was the launch failure of the first Ariane 5 rocket, which exploded 37
seconds after liftoff in 1996. The low-level cause was an unhandled
floating point conversion exception in an inertial guidance system
function. It turned out that the guidance system could never encounter the
unhandled exception when used on the Ariane 4 rocket. That is,
the guidance system function was correct for Ariane 4. The developers of
the Ariane 5 quite reasonably wanted to reuse the successful inertial
guidance system from the Ariane 4, but no one reanalyzed the software in
light of the substantially different flight trajectory of the Ariane 5.
Furthermore, the system tests that would have found the problem were
technically difficult to execute, and so were not performed. The result was
spectacular–and expensive!
The famous Pentium bug was an early alarm of the need for better
testing, especially unit testing. Intel introduced its Pentium microprocessor
in 1994, and a few months later, Thomas Nicely, a mathematician at
Lynchburg College in Virginia, found that the chip gave incorrect answers
to certain floating-point division calculations.
The chip was slightly inaccurate for a few pairs of numbers; Intel
claimed (probably correctly) that only one in nine billion division
operations would exhibit reduced precision. The fault was the omission of
five entries in a table of 1,066 values (part of the chip’s circuitry) used by
a division algorithm. The five entries should have contained the constant
+2, but the entries were not initialized and contained zero instead. The
MIT mathematician Edelman claimed that “the bug in the Pentium was an
easy mistake to make, and a difficult one to catch,” an analysis that misses
an essential point. This was a very difficult mistake to find during system
testing, and indeed, Intel claimed to have run millions of tests using this
table. But the table entries were left empty because a loop termination
condition was incorrect; that is, the loop stopped storing numbers before it
was finished. Thus, this would have been a very simple fault to find during
unit testing; indeed analysis showed that almost any unit level coverage
criterion would have found this multimillion dollar mistake.
The great northeast blackout of 2003 was started when a power line in
Ohio brushed against overgrown trees and shut down. This is called a fault
in the power industry. Unfortunately, the software alarm system failed in
the local power company, so system operators could not understand what
happened. Other lines also sagged into trees and switched off, eventually
overloading other power lines, which then cut off. This cascade effect
eventually caused a blackout throughout southeastern Canada and eight
states in the northeastern part of the US. This is considered the biggest
blackout in North American history, affecting 10 million people in Canada
and 40 million in the USA, contributing to at least 11 deaths and costing
up to $6 billion USD.
Some software failures are felt widely enough to cause severe
embarrassment to the company. In 2011, a centralized student data
management system in Korea miscalculated the academic grades of over
29,000 middle and high school students. This led to massive confusion
about college admissions and a government investigation into the software
engineering practices of the software vendor, Samsung Electronics.
A 1999 study commissioned by the U.S. National Research Council and
the U.S. President’s commission on critical infrastructure protection
concluded that the current base of science and technology is inadequate for
building systems to control critical software infrastructure. A 2002 report
commissioned by the National Institute of Standards and Technology
(NIST) estimated that defective software costs the U.S. economy $59.5
billion per year. The report further estimated that 64% of the costs were a
result of user mistakes and 36% a result of design and development
mistakes, and suggested that improvements in testing could reduce this
cost by about a third, or $22.5 billion. Blumenstyk reported that web
application failures lead to huge losses in businesses; $150,000 per hour
in media companies, $2.4 million per hour in credit card sales, and $6.5
million per hour in the financial services market.
Software faults do not just lead to functional failures. According to a
Symantec security threat report in 2007, 61 percent of all vulnerabilities
disclosed were due to faulty software. The most common are web
application vulnerabilities that can be attacked by some common attack
techniques using invalid inputs.
These public and expensive software failures are getting more common
and more widely known. This is simply a symptom of the change in
expectations of software. As we move further into the 21st century, we are
using more safety critical, real-time software. Embedded software has
become ubiquitous; many of us carry millions of lines of embedded
software in our pockets. Corporations rely more and more on large-scale
enterprise applications, which by definition have large user bases and high
reliability requirements. Security, which used to depend on cryptography,
then database security, then avoiding network vulnerabilities, is now
largely about avoiding software faults. The Web has had a major impact. It
features a deployment platform that offers software services that are very
competitive and available to millions of users. They are also distributed,
adding complexity, and must be highly reliable to be competitive. More so
than at any previous time, industry desperately needs to apply the
accumulated knowledge of over 30 years of testing research.
1.2 GOALS OF TESTING SOFTWARE
Surprisingly, many software engineers are not clear about their testing
goals. Is it to show correctness, find problems, or something else? To
explore this concept, we first must separate validation and verification.
Most of the definitions in this book are taken from standards documents,
and although the phrasing is ours, we try to be consistent with the
standards. Useful standards for reading in more detail are the IEEE
Standard Glossary of Software Engineering Terminology, DOD-STD-2167A and MIL-STD-498 from the US Department of Defense, and the
British Computer Society’s Standard for Software Component Testing.
Definition 1.4 Verification: The process of determining whether the
products of a phase of the software development process fulfill the
requirements established during the previous phase.
Definition 1.5 Validation: The process of evaluating software at the
end of software development to ensure compliance with intended
usage.
Verification is usually a more technical activity that uses knowledge
about the individual software artifacts, requirements, and specifications.
Validation usually depends on domain knowledge; that is, knowledge of
the application for which the software is written. For example, validation
of software for an airplane requires knowledge from aerospace engineers
and pilots.
As a familiar example, consider a light switch in a conference room.
Verification asks if the lighting meets the specifications. The specifications
might say something like, “The lights in front of the projector screen can
be controlled independently of the other lights in the room.” If the
specifications are written down somewhere and thelights cannot be
controlled independently, then the lighting fails verification, precisely
because the implementation does not satisfy the specifications. Validation
asks whether users are satisfied, an inherently fuzzy question that has
nothing to do with verification. If the “independent control” specification
is neither written down nor satisfied, then, despite the disappointed users,
verification nonetheless succeeds, because the implementation satisfies the
specification. But validation fails, because the specification for the lighting
does not reflect the true needs of the users. This is an important general
point: validation exposes flaws in specifications.
The acronym “IV&V” stands for “Independent Verification and
Validation,” where “independent” means that the evaluation is done by
non-developers. Sometimes the IV&V team is within the same project,
sometimes the same company, and sometimes it is entirely an external
entity. In part because of the independent nature of IV&V, the process
often is not started until the software is complete and is often done by
people whose expertise is in the application domain rather than software
development. This can sometimes mean that validation is given more
weight than verification. This book emphasizes verification more than
validation, although most of the specific test criteria we discuss can be
used for both activities.
Beizer discussed the goals of testing in terms of the “test process
maturity levels” of an organization, where the levels are characterized by
the testers’ goals. He defined five levels, where the lowest level is not
worthy of being given a number.
Level 0: There is no difference between testing and debugging.
Level 1: The purpose of testing is to show correctness.
Level 2: The purpose of testing is to show that the software does not work.
Level 3: The purpose of testing is not to prove anything specific, but to reduce the risk of using the software.
Level 4: Testing is a mental discipline that helps all IT professionals develop higher-quality software.
Level 0 is the view that testing is the same as debugging. This is the
view that is naturally adopted by many undergraduate Computer Science
majors. In most CS programming classes, the students get their programs
to compile, then debug the programs with a few inputs chosen either
arbitrarily or provided by the professor. This model does not distinguish
between a program’s incorrect behavior and a mistake within the program,
and does very little to help develop software that is reliable or safe.
In Level 1 testing, the purpose is to show correctness. While a
significant step up from the naive level 0, this has the unfortunate problem
that in any but the most trivial of programs, correctness is virtually
impossible to either achieve or demonstrate. Suppose we run a collection
of tests and find no failures. What do we know? Should we assume that we
have good software or just bad tests? Since the goal of correctness is
impossible, test engineers usually have no strict goal, real stopping rule, or
formal test technique. If a development manager asks how much testing
remains to be done, the test manager has no way to answer the question. In
fact, test managers are in a weak position because they have no way to
quantitatively express or evaluate their work.
In Level 2 testing, the purpose is to show failures. Although looking for
failures is certainly a valid goal, it is also inherently negative. Testers may
enjoy finding the problem, but the developers never want to find
problems–they want the software to work (yes, level 1 thinking can be
natural for the developers). Thus, level 2 testing puts testers and
developers into an adversarial relationship, which can be bad for team
morale. Beyond that, when our primary goal is to look for failures, we are
still left wondering what to do if no failures are found. Is our work done?
Is our software very good, or is the testing weak? Having confidence in
when testing is complete is an important goal for all testers. It is our view
that this level currently dominates the software industry.
The thinking that leads to Level 3 testing starts with the realization that
testing can show the presence, but not the absence, of failures. This lets us
accept the fact that whenever we use software, we incur some risk. The
risk may be small and the consequences unimportant, or the risk may be
great and the consequences catastrophic, but risk is always there. This
allows us to realize that the entire development team wants the same
thing–to reduce the risk of using the software. In level 3 testing, both
testers and developers work together to reduce risk. We see more and more
companies move to this testing maturity level every year.
Once the testers and developers are on the same “team,” an organization
can progress to real Level 4 testing. Level 4 thinking defines testing as a
mental discipline that increases quality. Various ways exist to increase
quality, of which creating tests that cause the software to fail is only one.
Adopting this mindset, test engineers can become the technical leaders of
the project (as is common in many other engineering disciplines). They
have the primary responsibility of measuring and improving software
quality, and their expertise should help the developers. Beizer used the
analogy of a spell checker. We often think that the purpose of a spell
checker is to find misspelled words, but in fact, the best purpose of a spell
checker is to improve our ability to spell. Every time the spell checker
finds an incorrectly spelled word, we have the opportunity to learn how to
spell the word correctly. The spell checker is the “expert” on spelling
quality. In the same way, level 4 testing means that the purpose of testing
is to improve the ability of the developers to produce high-quality
software. The testers should be the experts who train your developers!
As a reader of this book, you probably start at level 0, 1, or 2. Most
software developers go through these levels at some stage in their careers.
If you work in software development, you might pause to reflect on which
testing level describes your company or team. The remaining chapters in
Part I should help you move to level 2 thinking, and to understand the
importance of level 3. Subsequent chapters will give you the knowledge,
skills, and tools to be able to work at level 3. An ultimate goal of this book
is to provide a philosophical basis that will allow readers to become
“change agents” in their organizations for level 4 thinking, and test
engineers to become software quality experts. Although level 4 thinking
is currently rare in the software industry, it is common in more mature
engineering fields.
These considerations help us decide at a strategic level why we test. At a
more tactical level, it is important to know why each test is present. If you
do not know why you are conducting each test, the test will not be very
helpful. What fact is each test trying to verify? It is essential to document
test objectives and test requirements, including the planned coverage
levels. When the test manager attends a planning meeting with the other
managers and the project manager, the test manager must be able to
articulate clearly how much testing is enough and when testing will
complete. In the 1990s, we could use the “date criterion,” that is, testing is
“complete” when the ship date arrives or when the budget is spent.
Figure 1.1 dramatically illustrates the advantages of testing early rather
than late. This chart is based on a detailed analysis of faults that were
detected and fixed during several large government contracts. The bars
marked ‘A’ indicate what percentage of faults appeared in that phase.
Thus, 10% of faults appeared during the requirements phase, 40% during
design, and 50% during implementation. The bars marked ‘D’ indicate
the percentage of faults that were detected during each phase. About 5%
were detected during the requirements phase, and over 35% during system
testing. Last is the cost analysis. The solid bars marked ‘C’ indicate the
relative cost of finding and fixing faults during each phase. Since each
project was different, the costs are normalized to a “unit cost.” Thus,
faults detected and fixed during requirements, design, and unit testing cost
a single unit. Faults detected and fixed during integration testing cost
five times as much, 10 times as much during system testing, and 50 times
as much after the software is deployed.
Figure 1.1. Cost of late testing.
If we take the simple assumption of a $1000 USD unit cost per fault, and
100 faults, that means we spend $39,000 to find and correct faults during
requirements, design, and unit testing. During integration testing, the cost
goes up to $100,000. But system testing and deployment are the serious
problems. We find more faults during system testing, at ten times the cost,
for a total of $360,000. And even though we find only a few faults after
deployment, at 50 times the unit cost we spend $250,000!
Avoiding the work early (requirements analysis and unit testing) saves
money in the short term. But it leaves faults in software that are like little
bombs, ticking away, and the longer they tick, the bigger the explosion
when they finally go off.
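To make the arithmetic concrete, the short sketch below recomputes these totals. The per-phase fault counts (39, 20, 36, and 5) are assumptions chosen so the totals match the figures quoted above; they are illustrative, not data taken directly from the underlying report.

    // Illustrative recomputation of the late-testing cost figures.
    public class LateTestingCost {
        public static void main(String[] args) {
            final int unitCost = 1000;      // assumed cost (USD) of one fault fixed at unit level
            int earlyFaults = 39;           // requirements, design, and unit testing (1x unit cost)
            int integrationFaults = 20;     // integration testing (5x unit cost)
            int systemFaults = 36;          // system testing (10x unit cost)
            int deployedFaults = 5;         // after deployment (50x unit cost)

            System.out.println("Early:       $" + earlyFaults * unitCost);           // $39,000
            System.out.println("Integration: $" + integrationFaults * 5 * unitCost); // $100,000
            System.out.println("System:      $" + systemFaults * 10 * unitCost);     // $360,000
            System.out.println("Deployed:    $" + deployedFaults * 50 * unitCost);   // $250,000
        }
    }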
To put Beizer’s level 4 test maturity level in simple terms, the goal of
testing is to eliminate faults as early as possible. We can never be perfect,
but every time we eliminate a fault during unit testing (or sooner!), we
save money. The rest of this book will teach you how to do that.
EXERCISES
Chapter 1.
1. What are some factors that would help a development organization
move from Beizer’s testing level 2 (testing is to show errors) to
testing level 4 (a mental discipline that increases quality)?
2. What is the difference between software fault and software failure?
3. What do we mean by “level 3 thinking is that the purpose of testing is
to reduce risk?” What risk? Can we reduce the risk to zero?
4. The following exercise is intended to encourage you to think of
testing in a more rigorous way than you may be used to. The exercise
also hints at the strong relationship between specification clarity,
faults, and test cases1.
(a) Write a Java method with the signature
public static Vector union(Vector a, Vector b)
The method should return a Vector of objects that are in either
of the two argument Vectors.
(b) Upon reflection, you may discover a variety of defects and
ambiguities in the given assignment. In other words, ample
opportunities for faults exist. Describe as many possible faults
as you can. (Note: Vector is a Java Collection class. If you
are using another language, interpret Vector as a list.)
(c) Create a set of test cases that you think would have a reasonable
chance of revealing the faults you identified above. Document a
rationale for each test in your test set. If possible, characterize
all of your rationales in some concise summary. Run your tests
against your implementation.
(d) Rewrite the method signature to be precise enough to clarify the
defects and ambiguities identified earlier. You might wish to
illustrate your specification with examples drawn from your test
cases.
5. Below are four faulty programs. Each includes test inputs that result
in failure. Answer the following questions about each program.
(a) Explain what is wrong with the given code. Describe the fault
precisely by proposing a modification to the code.
(b) If possible, give a test case that does not execute the fault. If
not, briefly explain why not.
(c) If possible, give a test case that executes the fault, but does not
result in an error state. If not, briefly explain why not.
(d) If possible, give a test case that results in an error, but not a
failure. If not, briefly explain why not. Hint: Don’t forget about
the program counter.
(e) For the given test case, describe the first error state. Be sure to
describe the complete state.
(f) Implement your repair and verify that the given test now
produces the expected output. Submit a screen printout or other
evidence that your new program works.
6. Answer question (a) or (b), but not both, depending on your
background.
(a) If you do, or have, worked for a software development
company, what level of test maturity do you think the company
worked at? (0: testing=debugging, 1: testing shows correctness,
2: testing shows the program doesn’t work, 3: testing reduces
risk, 4: testing is a mental discipline about quality).
(b) If you have never worked for a software development company,
what level of test maturity do you think that you have? (0:
testing=debugging, 1: testing shows correctness, 2: testing
shows the program doesn’t work, 3: testing reduces risk, 4:
testing is a mental discipline about quality).
7. Consider the following three example classes. These are OO faults
taken from Joshua Bloch’s Effective Java, Second Edition. Answer
the following questions about each.
(a) Explain what is wrong with the given code. Describe the fault
precisely by proposing a modification to the code.
(b) If possible, give a test case that does not execute the fault. If
not, briefly explain why not.
(c) If possible, give a test case that executes the fault, but does not
result in an error state. If not, briefly explain why not.
(d) If possible, give a test case that results in an error, but not a
failure. If not, briefly explain why not. Hint: Don’t forget about
the program counter.
(e) In the given code, describe the first error state. Be sure to
describe the complete state.
(f) Implement your repair and verify that the given test now
produces the expected output. Submit a screen printout or other
evidence that your new program works.
1.3 BIBLIOGRAPHIC NOTES
This textbook has been deliberately left uncluttered with references.
Instead, each chapter contains a Bibliographic Notes section, which
contains suggestions for further and deeper reading for readers who want
more. We especially hope that research students will find these sections
helpful.
Most of the terminology in testing is from standards documents,
including the IEEE Standard Glossary of Software Engineering
Terminology [IEEE, 2008], the US Department of Defense [Department of
Defense, 1988, Department of Defense, 1994], the US Federal Aviation
Administration FAA-DO178B, and the British Computer Society’s
Standard for Software Component Testing [British Computer Society,
2001].
Beizer [Beizer, 1990] first defined the testing levels in Section 1.2.
Beizer described them in terms of the maturity of individual developers
and used the term phase instead of level. We adapted the discussion to
organizations rather than individual developers and chose the term level to
mirror the language of the well-known Capability Maturity Model [Paulk
et al., 1995].
All books on software testing and all researchers owe major thanks to
the landmark books in 1979 by Myers [Myers, 1979], in 1990 by Beizer
[Beizer, 1990], and in 2000 by Binder [Binder, 2000]. Some excellent
overviews of unit testing criteria have also been published, including one
by White [White, 1987] and more recently by Zhu, Hall, and May [Zhu et
al., 1997]. The recent text from Pezze and Young [Pezze and Young,
2008] reports relevant processes, principles, and techniques from the
testing literature, and includes many useful classroom materials. The Pezze
and Young text presents coverage criteria in the traditional lifecycle-based
manner, and does not organize criteria into the four abstract models
discussed in this chapter. Another recent book by Mathur offers a
comprehensive, in-depth catalog of test techniques and criteria [Mathur,
2014].
Numerous other software testing books were not intended as textbooks,
or do not offer general coverage for classroom use. Beizer’s Software
System Testing and Quality Assurance [Beizer, 1984] and Hetzel’s The
Complete Guide to Software Testing [Hetzel, 1988] cover various aspects
of management and process for software testing. Several books cover
specific aspects of testing [Howden, 1987, Marick, 1995, Roper, 1994].
The STEP project at Georgia Institute of Technology resulted in a
comprehensive survey of the practice of software testing by Department of
Defense contractors in the 1980s [DeMillo et al., 1987].
The information for the Pentium bug and Mars lander was taken from
several sources, including works by Edelman, Moler, Nuseibeh, Knutson, and
Peterson [Edelman, 1997, Knutson and Carmichael, 2000, Moler, 1995,
Nuseibeh, 1997, Peterson, 1997]. The well-written official accident report
[Lions, 1996] is our favorite source for understanding the details of the
Ariane 5 Flight 501 Failure. The information for the Therac-25 accidents
was taken from Leveson and Turner’s deep analysis [Leveson and Turner,
1993]. The details on the 2003 Northeast Blackout were taken from
Minkel’s analysis in Scientific American [Minkel, 2008] and Rice’s book
[Rice, 2008]. The information about the Korean education information
system was taken from two newspaper articles [Min-sang and Sang-soo,
2011, Korea Times, 2011].
The 1999 study mentioned was published in an NRC / PITAC report
[PITAC, 1999, Schneider, 1999]. The data in Figure 1.1 were taken from a
NIST report that was developed by the Research Triangle Institute [RTI,
2002]. The figures on web application failures are due to Blumenstyk
[Blumenstyk, 2006]. The figures about faulty software leading to security
vulnerabilities are from Symantec [Symantec, 2007].
Finally, Rick Hower’s QATest website is a good resource for current,
elementary, information about software testing: www.softwareqatest.com.
1 Liskov’s Program Development in Java, especially chapters 9 and 10, is a great
source for students who wish to learn more about this.
2 Model-Driven Test Design
Designers are more efficient and effective if they can raise their level of
abstraction.
This chapter introduces one of the major innovations in the second edition
of Introduction to Software Testing. Software testing is inherently
complicated and our ultimate goal, completely correct software, is
unreachable. The reasons are formal (as discussed below in section 2.1)
and philosophical. As discussed in Chapter 1, it’s not even clear that the
term “correctness” means anything when applied to a piece of engineering
as complicated as a large computer program. Do we expect correctness out
of a building? A car? A transportation system? Intuitively, we know that
all large physical engineering systems have problems, and moreover, there
is no way to say what correct means. This is even more true for software,
which can quickly get orders of magnitude more complicated than physical
structures such as office buildings or airplanes.
Instead of looking for “correctness,” wise software engineers try to
evaluate software’s “behavior” to decide if the behavior is acceptable
within consideration of a large number of factors including (but not limited
to) reliability, safety, maintainability, security, and efficiency. Obviously
this is more complex than the naive desire to show the software is correct.
So what do software engineers do in the face of such overwhelming
complexity? The same thing that physical engineers do–we use
mathematics to “raise our level of abstraction.” The Model-Driven Test
Design (MDTD) process breaks testing into a series of small tasks that
simplify test generation. Then test designers isolate their task, and work at
a higher level of abstraction by using mathematical engineering structures
to design test values independently of the details of software or design
artifacts, test automation, and test execution.
A key intellectual step in MDTD is test case design. Test case design
can be the primary determining factor in whether tests successfully find
failures in software. Tests can be designed with a “human-based”
approach, where a test engineer uses domain knowledge of the software’s
purpose and his or her experience to design tests that will be effective at
finding faults. Alternatively, tests can be designed to satisfy well-defined
engineering goals such as coverage criteria. This chapter describes the task
activities and then introduces criteria-based test design. Criteria-based test
design will be discussed in more detail in Chapter 5, then specific criteria
on four mathematical structures are described in Part II. After these
preliminaries, the model-driven test design process is defined in detail. The
book website has simple web applications that support the MDTD in the
context of the mathematical structures in Part II.
2.1 SOFTWARE TESTING FOUNDATIONS
One of the most important facts that all software testers need to know is
that testing can show only the presence of failures, not their absence. This
is a fundamental, theoretical limitation; to be precise, the problem of
finding all failures in a program is undecidable. Testers often call a test
successful (or effective) if it finds an error. While this is an example of
level 2 thinking, it is also a characterization that is often useful and that we
will use throughout the book. This section explores some of the theoretical
underpinnings of testing as a way to emphasize how important the MDTD
is.
The definitions of fault and failure in Chapter 1 allow us to develop the
reachability, infection, propagation, and revealability model (“RIPR”).
First, we distinguish testing from debugging.
Definition 2.6 Testing: Evaluating software by observing its
execution.
Definition 2.7 Test Failure: Execution of a test that results in a
software failure.
Definition 2.8 Debugging: The process of finding a fault given a
failure.
Of course the central issue is that for a given fault, not all inputs will
“trigger” the fault into creating incorrect output (a failure). Also, it is often
very difficult to relate a failure to the associated fault. Analyzing these
ideas leads to the fault/failure model, which states that four conditions are
needed for a failure to be observed.
Figure 2.1 illustrates the conditions. First, a test must reach the location
or locations in the program that contain the fault (Reachability). After the
location is executed, the state of the program must be incorrect (Infection).
Third, the infected state must propagate through the rest of the execution
and cause some output or final state of the program to be incorrect
(Propagation). Finally, the tester must observe part of the incorrect portion
of the final program state (Revealability). If the tester only observes parts
of the correct portion of the final program state, the failure is not revealed.
This is shown in the cross-hatched intersection in Figure 2.1. Issues with
revealing failures will be discussed in Chapter 4 when we present test
automation strategies.
Figure 2.1. Reachability, Infection, Propagation, Revealability (RIPR) model.
Collectively, these four conditions are known as the fault/failure model,
or the RIPR model.
It is important to note that the RIPR model applies even when the fault
is missing code (so-called faults of omission). In particular, when
execution passes through the location where the missing code should be,
the program counter, which is part of the program state, necessarily has the
wrong value.
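As an illustration of the four conditions, consider the small hypothetical method below. It is ours, not an example drawn from elsewhere in the book, but it shows how a fault can be executed without the resulting error propagating to the output, and how even a failure can go unrevealed if the test does not observe the right part of the output.

    // countZeroes should count the zero entries in x, but the loop
    // mistakenly starts at index 1 instead of 0.
    public static int countZeroes(int[] x) {
        int count = 0;
        for (int i = 1; i < x.length; i++) {   // FAULT: should be i = 0
            if (x[i] == 0) {
                count++;
            }
        }
        return count;
    }

    // Reachability:  any test with a non-empty array reaches the faulty loop.
    // Infection:     once the loop begins at i == 1, the internal state differs
    //                from that of a correct version, so the state is infected.
    // Propagation:   the infection reaches the output only when x[0] == 0;
    //                countZeroes(new int[] {0, 7, 2}) returns 0 instead of 1,
    //                while countZeroes(new int[] {2, 7, 0}) still returns the
    //                correct value 1.
    // Revealability: even the failing test reveals nothing unless its oracle
    //                actually checks the returned count.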
From a practitioner’s view, these limitations mean that software testing
is complex and difficult. The common way to deal with complexity in
engineering is abstraction: we abstract away the complicating details that
can safely be ignored and model the problem with mathematical
structures. That is a central theme of this book, which we
begin by analyzing the separate technical activities involved in creating
good tests.
2.2 SOFTWARE TESTING ACTIVITIES
In this book, a test engineer is an Information Technology (IT)
professional who is in charge of one or more technical test activities,
including designing test inputs, producing test case values, running test
scripts, analyzing results, and reporting results to developers and
managers. Although we cast the description in terms of test engineers,
every engineer involved in software development should realize that he or
she sometimes wears the hat of a test engineer. The reason is that each
software artifact produced over the course of a product’s development has,
or should have, an associated set of test cases, and the person best
positioned to define these test cases is often the designer of the artifact. A
test manager is in charge of one or more test engineers. Test managers set
test policies and processes, interact with other managers on the project,
and otherwise help the engineers test software effectively and efficiently.
Figure 2.2 shows some of the major activities of test engineers. A test
engineer must design tests by creating test requirements. These
requirements are then transformed into actual values and scripts that are
ready for execution. These executable tests are run against the software,
denoted P in the figure, and the results are evaluated to determine if the
tests reveal a fault in the software. These activities may be carried out by
one person or by several, and the process is monitored by a test manager.
Figure 2.2. Activities of test engineers.
One of a test engineer’s most powerful tools is a formal coverage
criterion. Formal coverage criteria give test engineers ways to decide what
test inputs to use during testing, making it more likely that the tester will
find problems in the program and providing greater assurance that the
software is of high quality and reliability. Coverage criteria also provide
stopping rules for the test engineers. The technical core of this book
presents the coverage criteria that are available, describes how they are
supported by tools (commercial and otherwise), explains how they can
best be applied, and suggests how they can be integrated into the overall
development process.
Software testing activities have long been categorized into levels, and
the most often used level categorization is based on traditional software
process steps. Although most types of tests can only be run after some part
of the software is implemented, tests can be designed and constructed
during all software development steps. The most time-consuming parts of
testing are actually the test design and construction, so test activities can
and should be carried out throughout development.
2.3 TESTING LEVELS BASED ON SOFTWARE
ACTIVITY
Tests can be derived from requirements and specifications, design
artifacts, or the source code. In traditional texts, a different level of testing
accompanies each distinct software development activity:
Acceptance Testing: assess software with respect to requirements or
users’ needs.
System Testing: assess software with respect to architectural design
and overall behavior.
Integration Testing: assess software with respect to subsystem design.
Module Testing: assess software with respect to detailed design.
Unit Testing: assess software with respect to implementation.
Figure 2.3, often called the “V model,” illustrates a typical scenario for
testing levels and how they relate to software development activities by
isolating each step. Information for each test level is typically derived from
the associated development activity. Indeed, standard advice is to design
the tests concurrently with each development activity, even though the
software will not be in an executable form until the implementation phase.
The reason for this advice is that the mere process of designing tests can
identify defects in design decisions that otherwise appear reasonable. Early
identification of defects is by far the best way to reduce their ultimate cost.
Note that this diagram is not intended to imply a waterfall process. The
synthesis and analysis activities generically apply to any development
process.
Figure 2.3. Software development activities and testing levels – the “V Model”.
The requirements analysis phase of software development captures the
customer’s needs. Acceptance testing is designed to determine whether the
completed software in fact meets these needs. In other words, acceptance
testing probes whether the software does what the users want. Acceptance
testing must involve users or other individuals who have strong domain
knowledge.
The architectural design phase of software development chooses
components and connectors that together realize a system whose
specification is intended to meet the previously identified requirements.
System testing is designed to determine whether the assembled system
meets its specifications. It assumes that the pieces work individually, and
asks if the system works as a whole. This level of testing usually looks for
design and specification problems. It is a very expensive place to find
lower-level faults and is usually not done by the programmers, but by a
separate testing team.
The subsystem design phase of software development specifies the
structure and behavior of subsystems, each of which is intended to satisfy
some function in the overall architecture. Often, the subsystems are
adaptations of previously developed software. Integration testing is
designed to assess whether the interfaces between modules (defined
below) in a subsystem have consistent assumptions and communicate
correctly. Integration testing must assume that modules work correctly.
Some testing literature uses the terms integration testing and system testing
interchangeably; in this book, integration testing does not refer to testing
the integrated system or subsystem. Integration testing is usually the
responsibility of members of the development team.
The detailed design phase of software development determines the
structure and behavior of individual modules. A module is a collection of
related units that are assembled in a file, package, or class. This
corresponds to a file in C, a package in Ada, and a class in C++ and Java.
Module testing is designed to assess individual modules in isolation,
including how the component units interact with each other and their
associated data structures. Most software development organizations make
module testing the responsibility of the programmer; hence the common
term developer testing.
Implementation is the phase of software development that actually
produces code. A program unit, or procedure, is one or more contiguous
program statements, with a name that other parts of the software use to call
it. Units are called functions in C and C++, procedures or functions in
Ada, methods in Java, and subroutines in Fortran. Unit testing is designed
to assess the units produced by the implementation phase and is the
“lowest” level of testing. In some cases, such as when building general-
purpose library modules, unit testing is done without knowledge of the
encapsulating software application. As with module testing, most software
development organizations make unit testing the responsibility of the
programmer, again, often called developer testing. It is straightforward to
package unit tests together with the corresponding code through the use of
tools such as JUnit for Java classes.
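As a minimal illustration, a test class such as the following (JUnit 4 annotation style, with java.util.Stack standing in for the unit under test) can live alongside the code it exercises and be run automatically with every build.

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import java.util.Stack;
    import org.junit.Test;

    // A minimal developer test packaged with the unit it exercises.
    // java.util.Stack is used here purely for illustration.
    public class StackTest {

        @Test
        public void pushThenPopReturnsLastElementPushed() {
            Stack<Integer> stack = new Stack<Integer>();
            stack.push(42);
            assertEquals(Integer.valueOf(42), stack.pop());
            assertTrue(stack.isEmpty());
        }
    }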
Because of the many dependencies among methods in classes, it is
common among developers using object-oriented (OO) software to
combine unit and module testing and use the term unit testing or
developer testing.
Not shown in Figure 2.3 is regression testing, a standard part of the
maintenance phase of software development. Regression testing is done
after changes are made to the software, to help ensure that the updated
software still possesses the functionality it had before the updates.
Mistakes in requirements and high-level design end up being
implemented as faults in the program; thus testing can reveal them.
Unfortunately, the software faults that come from requirements and design
mistakes are visible only through testing months or years after the original
mistake. The effects of the mistake tend to be dispersed throughout
multiple software components; hence such faults are usually difficult to
pin down and expensive to correct. On the positive side, even if tests
cannot be executed, the very process of defining tests can identify a
significant fraction of the mistakes in requirements and design. Hence, it is
important for test planning to proceed concurrently with requirements
analysis and design and not be put off until late in a project. Fortunately,
through techniques such as use case analysis, test planning is becoming
better integrated with requirements analysis in standard software practice.
Although most of the literature emphasizes these levels in terms of
when they are applied, a more important distinction is the types of
faults that we are looking for. The faults are based on the software artifact
that we are testing, and the software artifact that we derive the tests from.
For example, unit and module tests are derived to test units and modules,
and we usually try to find faults that can be found when executing the units
and modules individually.
One final note is that OO software changes the testing levels. OO
software blurs the distinction between units and modules, so the OO
software testing literature has developed a slight variation of these levels.
Intra-method testing evaluates individual methods. Inter-method testing
evaluates pairs of methods within the same class. Intra-class testing
evaluates a single entire class, usually as sequences of calls to methods
within the class. Finally, inter-class testing evaluates more than one class
at the same time. The first three are variations of unit and module testing,
whereas inter-class testing is a type of integration testing.
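As a small, hypothetical sketch of what these levels mean in practice (the Account and Transfer classes below are ours, invented only to make the distinctions concrete):

    class Account {
        private int balance = 0;
        void deposit(int amount)  { balance += amount; }
        void withdraw(int amount) { balance -= amount; }
        int getBalance()          { return balance; }
    }

    class Transfer {
        // Moves money between two Account objects.
        static void move(Account from, Account to, int amount) {
            from.withdraw(amount);
            to.deposit(amount);
        }
    }

    // Intra-method test: call deposit(10) on a fresh Account and check the balance.
    // Inter-method test: call deposit(10) then withdraw(4) and check that the two
    //                    methods interact correctly through the balance field.
    // Intra-class test:  run a longer sequence of deposits and withdrawals against
    //                    a single Account object.
    // Inter-class test:  exercise Transfer.move() across two Account objects,
    //                    a form of integration testing.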
2.4 COVERAGE CRITERIA
The essential problem with testing is the numbers. Even a small program
has a huge number of possible inputs. Consider a tiny method that
computes the average of three integers. We have only three input
variables, but each can have any value between -MAXINT and
+MAXINT. On a 32-bit machine, each variable has a possibility of over 4
billion values. With three inputs, this means the method has nearly 80
octillion possible inputs!
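The arithmetic is easy to check: each 32-bit int has 2^32 possible values, so three independent int parameters give (2^32)^3 = 2^96 combinations. The short sketch below (class name ours) simply computes that number.

    import java.math.BigInteger;

    // Computes the size of the input space for a method taking three 32-bit ints.
    public class InputSpaceSize {
        public static void main(String[] args) {
            BigInteger perVariable = BigInteger.valueOf(2).pow(32);
            BigInteger total = perVariable.pow(3);
            System.out.println("Values per int variable: " + perVariable); // 4294967296
            System.out.println("Inputs for three ints:   " + total);
            // Prints 79228162514264337593543950336, roughly 7.9 * 10^28.
        }
    }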
So no matter whether we are doing unit testing, integration testing, or
system testing, it is impossible to test with all inputs. The input space is, to
all practical purposes, infinite. Thus a test designer’s goal could be
summarized in a very high-level way as searching a huge input space,
hoping to find the fewest tests that will reveal the most problems. This is
the source of two key problems in testing: (1) how do we search? and (2)
when do we stop? Coverage criteria give us structured, practical ways to
search the input space. Satisfying a coverage criterion gives a tester some
amount of confidence in two crucial goals: (A) we have looked in many
corners of the input space, and (B) our tests have a fairly low amount of
overlap.
Coverage criteria have many advantages for improving the quality and
reducing the cost of test data generation. Coverage criteria can maximize
the “bang for the buck,” with fewer tests that are effective at finding more
faults. Well-designed criteria-based tests will be comprehensive, yet factor
out unwanted redundancy. Coverage criteria also provide traceability from
software artifacts such as source, design models, requirements, and input
space descriptions. This supports regression testing by making it easier to
decide which tests need to be reused, modified, or deleted. From an
engineering perspective, one of the strongest benefits of coverage criteria
is they provide a “stopping rule” for testing; that is, we know in advance
approximately how many tests are needed and we know when we have
“enough” tests. This is a powerful tool for engineers and managers.
Coverage criteria also lend themselves well to automation. As we will
formalize in Chapter 5, a test requirement is a specific element of a
software artifact that a test case must satisfy or cover, and a coverage
criterion is a rule or collection of rules that yield test requirements. For
example, the coverage criterion “cover every statement” yields one test
requirement for each statement. The coverage criterion “cover every
functional requirement” yields one test requirement for each functional
requirement. Test requirements can be stated in semi-formal, mathematical
terms, and then manipulated algorithmically. This allows much of the test
data design and generation process to be automated.
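As a small, hypothetical illustration of the test requirements generated by the “cover every statement” criterion (the method below is ours, not taken from the book):

    // Each statement in this method yields one test requirement under the
    // "cover every statement" criterion.
    public static int absoluteValue(int x) {
        int result = x;        // requirement r1: execute this statement
        if (x < 0) {           // requirement r2
            result = -x;       // requirement r3
        }
        return result;         // requirement r4
    }

    // Test requirements: { r1, r2, r3, r4 }.
    // The single test absoluteValue(-5) covers all four requirements;
    // the test absoluteValue(3) covers r1, r2, and r4 but leaves r3 uncovered.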
The research literature presents a lot of overlapping and identical
coverage criteria. Researchers have invented hundreds of criteria on
dozens of software artifacts. However, if we abstract these artifacts into
mathematical models, many criteria turn out to be exactly the same. For
example, the idea of covering pairs of edges in finite state machines was
first published in 1976, using the term switch cover. Later, the same idea
was applied to control flow graphs and called two-trip; still again, the same
idea was “invented” for state transition diagrams and called transition-pair
(we define this formally using the generic term edge-pair in Chapter 7).
Although they looked very different in the research literature, if we
generalize these structures to graphs, all three ideas are the same.
Similarly, node coverage and edge coverage have each been defined
dozens of times.
Sidebar
Black-Box and White-Box Testing
Black-box testing and the complementary white-box testing are old and
widely used terms in software testing. In black-box testing, we derive
tests from external descriptions of the software, including
specifications, requirements, and design. In white-box testing, on the
other hand, we derive tests from the source code internals of the
software, specifically including branches, individual conditions, and
statements. This somewhat arbitrary distinction started to lose
coherence when the term gray-box testing was applied to developing
tests from design elements, and the approach taken in this book
eliminates the need for the distinction altogether.
Some older sources say that white-box testing is used for system testing
and black-box testing for unit testing. This distinction is certainly false,
since all testing techniques considered to be white-box can be used at
the system level, and all testing techniques considered to be black-box
can be used on individual units. In reality, unit testers are currently
more likely to use white-box testing than system testers are, simply
because white-box testing requires knowledge of the program and is
more expensive to apply, costs that can balloon on a large system.
This book relies on developing tests from mathematical abstractions
such as graphs and logical expressions. As will become clear in Part II,
these structures can be extracted from any software artifact, including
source, design, specifications, or requirements. Thus asking whether a
coverage criterion is black-box or white-box is the wrong question. One
more properly should ask from what level of abstraction is the structure
drawn.
In fact, all test coverage criteria can be boiled down to a few dozen
criteria on just four mathematical structures: input domains, graphs, logic
expressions, and syntax descriptions (grammars). Just like mechanical,
civil, and electrical engineers use calculus and algebra to create abstract
representations of physical structures, then solve various problems at this
abstract level, software engineers can use discrete math to create abstract
representations of software, then solve problems such as test design.
The core of this book is organized around these four structures, as
reflected in the four chapters in Part II. This structure greatly simplifies
teaching test design, and our classroom experience with the first edition of
this book helped us realize this structure also leads to a simplified testing
process. This process allows test design to be abstracted and carried out
efficiently, and also separates test activities that need different knowledge
and skill sets. Because the approach is based on these four abstract models,
we call it the Model-Driven Test Design process (MDTD).
Sidebar
MDTD and Model-Based Testing
Model-based testing (MBT) is the design of software tests from an
abstract model that represents one or more aspects of the software. The
model usually, but not always, represents some aspects of the behavior
of the software, and sometimes, but not always, is able to generate
expected outputs. The models are often described with UML diagrams,
although more formal models as well as other informal modeling
languages are also used. MBT typically assumes that the model has
been built to specify the behavior of the software and was created
during a design stage of development.
The ideas presented in this book are not, strictly speaking, exclusive to
model-based testing. However, there is much overlap with MDTD and
most of the concepts in this book can be directly used as part of MBT.
Specifically, we derive our tests from abstract structures that are very
similar to models. An important difference is that these structures can
be created after the software is implemented, by the tester as part of
test design. Thus, the structures do not specify behavior; they represent
behavior. If a model was created to specify the software behavior, a
tester can certainly use it, but if not, a tester can create one. Second, we
create idealized structures that are more abstract than most modeling
languages. For example, instead of UML statecharts or Petri nets, we
design our tests from graphs. If model-based testing is being used, the
graphs can be derived from a graphical model. Third, model-based
testing explicitly does not use the source code implementation to design
tests. In this book, abstract structures can be created from the
implementation via things like control flow graphs, call graphs, and
conditionals in decision statements.
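For example, a tester can derive a graph directly from source code. The hypothetical method below and the node and edge labels in its comments are ours, sketched only to show the idea; graph coverage criteria themselves are the subject of Part II.

    // A control flow graph can be extracted from an implementation and then
    // used with graph coverage criteria.
    public static String classify(int x) {
        // node n1: entry and decision
        if (x >= 0) {
            return "non-negative";  // node n2
        } else {
            return "negative";      // node n3
        }
    }

    // Derived control flow graph:
    //   nodes: { n1, n2, n3 }
    //   edges: { (n1, n2), (n1, n3) }
    // Edge coverage is satisfied by two tests, for example classify(1) and classify(-1).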
2.5 MODEL-DRIVEN TEST DESIGN
Academic teachers and researchers have long focused on the design of
tests. We define test design to be the process of creating input values that
will effectively test software. This is the most mathematical and
technically challenging part of testing; however, academics can easily
forget that this is only a small part of testing.
The job of developing tests can be divided into four discrete tasks: test
design, test automation, test execution, and test evaluation. Many
organizations assign the same person to all tasks. However, each task
requires different skills, background knowledge, education, and training.
Assigning the same person to all these tasks is akin to assigning the same
software developer to requirements, design, implementation, integration,
and configuration control. Although this was common in previous decades,
few companies today assign the same engineers to all development tasks.
Engineers specialize, sometimes temporarily, sometimes for a project, and
sometimes for their entire career. But should test organizations still assign
the same people to all test tasks? The tasks require different skills, and it is
unreasonable to expect all testers to be good at all tasks, so this clearly
wastes resources. The following subsections analyze each of these tasks in
detail.
2.5.1 Test Design
As said above, test design is the process of designing input values that will
effectively test software. In practice, engineers use two general approaches
to designing tests. In criteria-based test design, we design test values that
satisfy engineering goals such as coverage criteria. In human-based test
design, we design test values based on domain knowledge of the program
and human knowledge of testing. These are quite different activities.
Criteria-based test design is the most technical and mathematical job in
software testing. To apply criteria effectively, the tester needs knowledge
of discrete math, programming, and testing. That is, this requires much of
a traditional degree in computer science. For somebody with a degree in
computer science or software engineering, this is intellectually stimulating,
rewarding, and challenging. Much of the work involves creating abstract
models and manipulating them to design high-quality tests. In software
development, this is analogous to the job of software architect; in building
construction, this is analogous to the job of construction engineer. If an
organization uses people who are not qualified (that is, do not have the
required knowledge), they will spend time creating ineffective tests and be
dissatisfied at work.
Human-based test design is quite different. The testers must have
knowledge of the software’s application domain, of testing, and of user
interfaces. Human-based test designers explicitly attempt to find stress
tests, tests that stress the software by including very large or very small
values, boundary values, invalid values, or other values that the software
may not expect during typical behavior. Human-based testers also
explicitly consider actions the users might do, including unusual actions.
This is much harder than developers may think and more necessary than
many test researchers and educators realize. Although criteria-based
approaches often implicitly include techniques such as stress testing, they
can be blind to special situations, and may miss problems that human-based tests would not. Although almost no traditional CS is needed, an
empirical background (biology or psychology) or a background in logic
(law, philosophy, math) is helpful. If the software is embedded on an
airplane, a human-based test designer should understand piloting; if the
software runs an online store, the test designers should understand
marketing and the products being sold. For people with these abilities,
human-based test design is intellectually stimulating, rewarding, and
challenging–but ...