Question 1: Professional Assignment
In a 4-6-page, APA-formatted paper, design a software test process that utilizes the foundations of modern software testing and includes technical methods to design effective test case values from criteria. Show the concepts being put into practice and point out any additional pragmatic concerns. Finally, provide a summary overview of the major aspects of putting the Model-Driven Test Design process into practice. Be sure to take into consideration test plans, integration testing, regression testing, and the design and implementation of test oracles. Remember to support your thoughts and justifications with outside, reliable resources that are properly identified, cited, and referenced.
Question 2: (Discussion)
What is one of the ways in which we as managers of the software test process can properly write and implement test plans? Explain a structure that you have previously used, or one you have explored in your research of methods currently being used in the industry. Give examples and cite your work where necessary. Minimum 250 words; include 2 citations and 2 references.
Introduction to Software Testing
This extensively classroom-tested text takes an innovative approach to
explaining software testing that defines it as the process of applying a few
precise, general-purpose criteria to a structure or model of the software.
The text incorporates cutting-edge developments, including techniques to
test modern types of software such as OO, web applications, and
embedded software. This revised second edition significantly expands
coverage of the basics, thoroughly discussing test automation frameworks,
and adds new, improved examples and numerous exercises. Key features
include:
The theory of coverage criteria is carefully, cleanly explained to help
students understand concepts before delving into practical
applications.
Extensive use of the JUnit test framework gives students practical
experience in a test framework popular in industry.
Exercises feature specifically tailored tools that allow students to check
their own work.
Instructor’s manual, PowerPoint slides, testing tools for students, and
example software programs in Java are available from the book’s
website.
Paul Ammann is Associate Professor of Software Engineering at George
Mason University. He earned the Volgenau School’s Outstanding
Teaching Award in 2007. He led the development of the Applied
Computer Science degree, and has served as Director of the MS Software
Engineering program. He has taught courses in software testing, applied
object-oriented theory, formal methods for software engineering, web
software, and distributed software engineering. Ammann has published
more than eighty papers in software engineering, with an emphasis on
software testing, security, dependability, and software engineering
education.
Jeff Offutt is Professor of Software Engineering at George Mason
University. He leads the MS in Software Engineering program, teaches
software engineering courses at all levels, and developed new courses on
several software engineering subjects. He was awarded the George Mason
University Teaching Excellence Award, Teaching with Technology, in
2013. Offutt has published more than 165 papers in areas such as model-based testing, criteria-based testing, test automation, empirical software
engineering, and software maintenance. He is Editor-in-Chief of the
Journal of Software Testing, Verification and Reliability; helped found the
IEEE International Conference on Software Testing; and is the founder of
the μJava project.
INTRODUCTION TO
SOFTWARE
TESTING
Paul Ammann
George Mason University
Jeff Offutt
George Mason University
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
4843/24, 2nd Floor, Ansari Road, Daryaganj, Delhi – 110002, India
79 Anson Road, #06-04/06, Singapore 079906
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781107172012
DOI: 10.1017/9781316771273
© Paul Ammann and Jeff Offutt 2017
This publication is in copyright. Subject to statutory exception and to the
provisions of relevant collective licensing agreements, no reproduction of any part
may take place without the written permission of Cambridge University Press.
First published 2017
Printed in the United States of America by Sheridan Books, Inc.
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloguing in Publication Data
Names: Ammann, Paul, 1961– author. — Offutt, Jeff, 1961– author.
Title: Introduction to software testing / Paul Ammann, George Mason
University, Jeff Offutt, George Mason University.
Description: Edition 2. — Cambridge, United Kingdom; New York, NY, USA:
Cambridge University Press, [2016]
Identifiers: LCCN 2016032808 — ISBN 9781107172012 (hardback)
Subjects: LCSH: Computer software–Testing.
Classification: LCC QA76.76.T48 A56 2016 — DDC 005.3028/7–dc23
LC record available at https://lccn.loc.gov/2016032808
ISBN 978-1-107-17201-2 Hardback
Additional resources for this publication at https://cs.gmu.edu/~offutt/softwaretest/.
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party Internet Web sites referred to in this publication
and does not guarantee that any content on such Web sites is, or will remain,
accurate or appropriate.
Contents
List of Figures
List of Tables
Preface to the Second Edition
Part 1 Foundations
1 Why Do We Test Software?
1.1 When Software Goes Bad
1.2 Goals of Testing Software
1.3 Bibliographic Notes
2 Model-Driven Test Design
2.1 Software Testing Foundations
2.2 Software Testing Activities
2.3 Testing Levels Based on Software Activity
2.4 Coverage Criteria
2.5 Model-Driven Test Design
2.5.1 Test Design
2.5.2 Test Automation
2.5.3 Test Execution
2.5.4 Test Evaluation
2.5.5 Test Personnel and Abstraction
2.6 Why MDTD Matters
2.7 Bibliographic Notes
3 Test Automation
3.1 Software Testability
3.2 Components of a Test Case
3.3 A Test Automation Framework
3.3.1 The JUnit Test Framework
3.3.2 Data-Driven Tests
3.3.3 Adding Parameters to Unit Tests
3.3.4 JUnit from the Command Line
3.4 Beyond Test Automation
3.5 Bibliographic Notes
4 Putting Testing First
4.1 Taming the Cost-of-Change Curve
4.1.1 Is the Curve Really Tamed?
4.2 The Test Harness as Guardian
4.2.1 Continuous Integration
4.2.2 System Tests in Agile Methods
4.2.3 Adding Tests to Legacy Systems
4.2.4 Weaknesses in Agile Methods for Testing
4.3 Bibliographic Notes
5 Criteria-Based Test Design
5.1 Coverage Criteria Defined
5.2 Infeasibility and Subsumption
5.3 Advantages of Using Coverage Criteria
5.4 Next Up
5.5 Bibliographic Notes
Part 2 Coverage Criteria
6 Input Space Partitioning
6.1 Input Domain Modeling
6.1.1 Interface-Based Input Domain Modeling
6.1.2 Functionality-Based Input Domain Modeling
6.1.3 Designing Characteristics
6.1.4 Choosing Blocks and Values
6.1.5 Checking the Input Domain Model
6.2 Combination Strategies Criteria
6.3 Handling Constraints Among Characteristics
6.4 Extended Example: Deriving an IDM from JavaDoc
6.4.1 Tasks in Designing IDM-Based Tests
6.4.2 Designing IDM-Based Tests for Iterator
6.5 Bibliographic Notes
7 Graph Coverage
7.1 Overview
7.2 Graph Coverage Criteria
7.2.1 Structural Coverage Criteria
7.2.2 Touring, Sidetrips, and Detours
7.2.3 Data Flow Criteria
7.2.4 Subsumption Relationships Among Graph Coverage
Criteria
7.3 Graph Coverage for Source Code
7.3.1 Structural Graph Coverage for Source Code
7.3.2 Data Flow Graph Coverage for Source Code
7.4 Graph Coverage for Design Elements
7.4.1 Structural Graph Coverage for Design Elements
7.4.2 Data Flow Graph Coverage for Design Elements
7.5 Graph Coverage for Specifications
7.5.1 Testing Sequencing Constraints
7.5.2 Testing State Behavior of Software
7.6 Graph Coverage for Use Cases
7.6.1 Use Case Scenarios
7.7 Bibliographic Notes
8 Logic Coverage
8.1 Semantic Logic Coverage Criteria (Active)
8.1.1 Simple Logic Expression Coverage Criteria
8.1.2 Active Clause Coverage
8.1.3 Inactive Clause Coverage
8.1.4 Infeasibility and Subsumption
8.1.5 Making a Clause Determine a Predicate
8.1.6 Finding Satisfying Values
8.2 Syntactic Logic Coverage Criteria (DNF)
8.2.1 Implicant Coverage
8.2.2 Minimal DNF
8.2.3 The MUMCUT Coverage Criterion
8.2.4 Karnaugh Maps
8.3 Structural Logic Coverage of Programs
8.3.1 Satisfying Predicate Coverage
8.3.2 Satisfying Clause Coverage
8.3.3 Satisfying Active Clause Coverage
8.3.4 Predicate Transformation Issues
8.3.5 Side Effects in Predicates
8.4 Specification-Based Logic Coverage
8.5 Logic Coverage of Finite State Machines
8.6 Bibliographic Notes
9 Syntax-Based Testing
9.1 Syntax-Based Coverage Criteria
9.1.1 Grammar-Based Coverage Criteria
9.1.2 Mutation Testing
9.2 Program-Based Grammars
9.2.1 BNF Grammars for Compilers
9.2.2 Program-Based Mutation
9.3 Integration and Object-Oriented Testing
9.3.1 BNF Integration Testing
9.3.2 Integration Mutation
9.4 Specification-Based Grammars
9.4.1 BNF Grammars
9.4.2 Specification-Based Mutation
9.5 Input Space Grammars
9.5.1 BNF Grammars
9.5.2 Mutating Input Grammars
9.6 Bibliographic Notes
Part 3 Testing in Practice
10 Managing the Test Process
10.1 Overview
10.2 Requirements Analysis and Specification
10.3 System and Software Design
10.4 Intermediate Design
10.5 Detailed Design
10.6 Implementation
10.7 Integration
10.8 System Deployment
10.9 Operation and Maintenance
10.10 Implementing the Test Process
10.11 Bibliographic Notes
11 Writing Test Plans
11.1 Level Test Plan Example Template
11.2 Bibliographic Notes
12 Test Implementation
12.1 Integration Order
12.2 Test Doubles
12.2.1 Stubs and Mocks: Variations of Test Doubles
12.2.2 Using Test Doubles to Replace Components
12.3 Bibliographic Notes
13 Regression Testing for Evolving Software
13.1 Bibliographic Notes
14 Writing Effective Test Oracles
14.1 What Should Be Checked?
14.2 Determining Correct Values
14.2.1 Specification-Based Direct Verification of Outputs
14.2.2 Redundant Computations
14.2.3 Consistency Checks
14.2.4 Metamorphic Testing
14.3 Bibliographic Notes
List of Criteria
Bibliography
Index
Figures
1.1 Cost of late testing
2.1 Reachability, Infection, Propagation, Revealability (RIPR) model
2.2 Activities of test engineers
2.3 Software development activities and testing levels – the “V Model”
2.4 Model-driven test design
2.5 Example method, CFG, test requirements and test paths
3.1 Calc class example and JUnit test
3.2 Minimum element class
3.3 First three JUnit tests for Min class
3.4 Remaining JUnit test methods for Min class
3.5 Data-driven test class for Calc
3.6 JUnit Theory about sets
3.7 JUnit Theory data values
3.8 AllTests for the Min class example
4.1 Cost-of-change curve
4.2 The role of user stories in developing system (acceptance) tests
6.1 Partitioning of input domain D into three blocks
6.2 Subsumption relations among input space partitioning criteria
7.1 Graph (a) has a single initial node, graph (b) multiple initial nodes, and graph (c) (rejected) with no initial nodes
7.2 Example of paths
7.3 A Single-Entry Single-Exit graph
7.4 Test case mappings to test paths
7.5 A set of test cases and corresponding test paths
7.6 A graph showing Node Coverage and Edge Coverage
7.7 Two graphs showing prime path coverage
7.8 Graph with a loop
7.9 Tours, sidetrips, and detours in graph coverage
7.10 An example for prime test paths
7.11 A graph showing variables, def sets and use sets
7.12 A graph showing an example of du-paths
7.13 Graph showing explicit def and use sets
7.14 Example of the differences among the three data flow coverage
criteria
7.15 Subsumption relations among graph coverage criteria
7.16 CFG fragment for the if-else structure
7.17 CFG fragment for the if structure without an else
7.18 CFG fragment for the if structure with a return
7.19 CFG fragment for the while loop structure
7.20 CFG fragment for the for loop structure
7.21 CFG fragment for the do-while structure
7.22 CFG fragment for the while loop with a break structure
7.23 CFG fragment for the case structure
7.24 CFG fragment for the try-catch structure
7.25 Method patternIndex() for data flow example
7.26 A simple call graph
7.27 A simple inheritance hierarchy
7.28 An inheritance hierarchy with objects instantiated
7.29 An example of parameter coupling
7.30 Coupling du-pairs
7.31 Last-defs and first-uses
7.32 Quadratic root program
7.33 Def-use pairs under intra-procedural and inter-procedural data flow
7.34 Def-use pairs in object-oriented software
7.35 Def-use pairs in web applications and other distributed software
7.36 Control flow graph using the File ADT
7.37 Elevator door open transition
7.38 Watch–Part A
7.39 Watch–Part B
7.40 An FSM representing Watch, based on control flow graphs of the
methods
7.41 An FSM representing Watch, based on the structure of the software
7.42 An FSM representing Watch, based on modeling state variables
7.43 ATM actor and use cases
7.44 Activity graph for ATM withdraw funds
8.1 Subsumption relations among logic coverage criteria
8.2 Fault detection relationships
8.3 Thermostat class
8.4 PC true test for Thermostat class
8.5 CC test assignments for Thermostat class
8.6 Calendar method
8.7 FSM for a memory car seat–Nissan Maxima 2012
9.1 Method Min and six mutants
9.2 Mutation testing process
9.3 Partial truth table for (a ∧ b)
9.4 Finite state machine for SMV specification
9.5 Mutated finite state machine for SMV specification
9.6 Finite state machine for bank example
9.7 Finite state machine for bank example grammar
9.8 Simple XML message for books
9.9 XML schema for books
12.1 Test double example: Replacing a component
Tables
6.1 First partitioning of triang()’s inputs (interface-based)
6.2 Second partitioning of triang()’s inputs (interface-based)
6.3 Possible values for blocks in the second partitioning in Table 6.2
6.4 Geometric partitioning of triang()’s inputs (functionality-based)
6.5 Correct geometric partitioning of triang()’s inputs (functionality-based)
6.6 Possible values for blocks in geometric partitioning in Table 6.5
6.7 Examples of invalid block combinations
6.8 Table A for Iterator example: Input parameters and
characteristics
6.9 Table B for Iterator example: Partitions and base case
6.10 Table C for Iterator example: Refined test requirements
6.11 Table A for Iterator example: Input parameters and
characteristics (revised)
6.12 Table C for Iterator example: Refined test requirements
(revised)
7.1 Defs and uses at each node in the CFG for patternIndex()
7.2 Defs and uses at each edge in the CFG for patternIndex()
7.3 du-path sets for each variable in patternIndex()
7.4 Test paths to satisfy all du-paths coverage on patternIndex()
7.5 Test paths and du-paths covered in patternIndex()
8.1 DNF fault classes
8.2 Reachability for Thermostat predicates
8.3 Clauses in the Thermostat predicate on lines 28-30
8.4 Correlated active clause coverage for Thermostat
8.5 Correlated active clause coverage for cal() preconditions
8.6 Predicates from memory seat example
9.1 Java’s access levels
10.1 Testing objectives and activities during requirements analysis and
specification
10.2 Testing objectives and activities during system and software design
10.3 Testing objectives and activities during intermediate design
10.4 Testing objectives and activities during detailed design
10.5 Testing objectives and activities during implementation
10.6 Testing objectives and activities during integration
10.7 Testing objectives and activities during system deployment
10.8 Testing objectives and activities during operation and maintenance
Preface to the Second Edition
Much has changed in the field of testing in the eight years since the first
edition was published. High-quality testing is now more common in
industry. Test automation is now ubiquitous, and almost assumed in large
segments of the industry. Agile processes and test-driven development are
now widely known and used. Many more colleges offer courses on
software testing, both at the undergraduate and graduate levels. The ACM
curriculum guidelines for software engineering include software testing in
several places, including as a strongly recommended course [Ardis et al.,
2015].
The second edition of Introduction to Software Testing incorporates new
features and material, yet retains the structure, philosophy, and online
resources that have been so popular among the hundreds of teachers who
have used the book.
What is new about the second edition?
The first thing any instructor has to do when presented with a new edition
of a book is analyze what must be changed in the course. Since we have
been in that situation many times, we want to make it as easy as possible
for our audience. We start with a chapter-to-chapter mapping.
First Edition → Second Edition: Topic

Part I: Foundations
Chapter 1 → Chapter 01: Why do we test software? (motivation)
Chapter 1 → Chapter 02: Model-driven test design (abstraction)
Chapter 1 → Chapter 03: Test automation (JUnit)
Chapter 1 → Chapter 04: Putting testing first (TDD)
Chapter 1 → Chapter 05: Criteria-based test design (criteria)

Part II: Coverage Criteria
Chapter 2 → Chapter 07: Graph coverage
Chapter 3 → Chapter 08: Logic coverage
Chapter 4 → Chapter 09: Syntax-based testing
Chapter 5 → Chapter 06: Input space partitioning

Part III: Testing in Practice
Chapter 6 → Chapter 10: Managing the test process
Chapter 6 → Chapter 11: Writing test plans
Chapter 6 → Chapter 12: Test implementation
Chapter 6 → Chapter 13: Regression testing for evolving software
Chapter 6 → Chapter 14: Writing effective test oracles
Chapter 7 → N/A: Technologies
Chapter 8 → N/A: Tools
Chapter 9 → N/A: Challenges
The most obvious, and largest change, is that the introductory chapter 1
from the first edition has been expanded into five separate chapters. This is
a significant expansion that we believe makes the book much better. The
new part 1 grew out of our lectures. After the first edition came out, we
started adding more foundational material to our testing courses. These
new ideas were eventually reorganized into five new chapters. The new
chapter 01 has much of the material from the first edition chapter 1,
including motivation and basic definitions. It closes with a discussion of
the cost of late testing, taken from the 2002 RTI report that is cited in
every software testing research proposal. After completing the first edition,
we realized that the key novel feature of the book, viewing test design as
an abstract activity that is independent of the software artifact being used
to design the tests, implied a completely different process. This led to
chapter 02, which suggests how test criteria can fit into practice. Through
our consulting, we have helped software companies modify their test
processes to incorporate this model.
A flaw with the first edition was that it did not mention JUnit or other
test automation frameworks. In 2016, JUnit is used very widely in
industry, and is commonly used in CS1 and CS2 classes for automated
grading. Chapter 03 rectifies this oversight by discussing test automation
in general, the concepts that make test automation difficult, and explicitly
teaches JUnit. Although the book is largely technology-neutral, having a
consistent test framework throughout the book helps with examples and
exercises. In our classes, we usually require tests to be automated and
often ask students to try other “*-Unit” frameworks such as HttpUnit as
homework exercises. We believe that test organizations cannot be ready to
apply test criteria successfully before they have automated their tests.
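To make this concrete, here is a minimal JUnit 4 sketch in the spirit of the Calc example listed among the figures; the class body and test values shown are assumptions for illustration, not the book's exact listing.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical class under test, shown inline so the sketch is self-contained.
class Calc {
    static int add(int a, int b) {
        return a + b;
    }
}

public class CalcTest {
    @Test
    public void addsTwoPositiveIntegers() {
        // Test values chosen by hand here; Part II shows how coverage
        // criteria make this choice systematic rather than ad hoc.
        assertEquals(7, Calc.add(3, 4));
    }

    @Test
    public void addsANegativeOperand() {
        assertEquals(-1, Calc.add(3, -4));
    }
}

Once tests like these run automatically, a test organization is in a position to measure them against the coverage criteria of Part II.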
Chapter 04 goes to the natural next step of test-driven development.
Although TDD is a different take on testing than the rest of the book, it’s
an exciting topic for test educators and researchers precisely because it
puts testing front and center—the tests become the requirements. Finally,
chapter 05 introduces the concept of test criteria in an abstract way. The
jelly bean example (which our students love, especially when we share) is
still there, as are concepts such as subsumption.
Part 2, which is the heart of the book, has changed the least for the
second edition. In 2014, Jeff asked Paul a very simple question: “Why are
the four chapters in part 2 in that order?” The answer was stunned silence,
as we realized that we had never asked which order they should appear in.
It turns out that the RIPR model, which is certainly central to software
testing, dictates a logical order. Specifically, input space partitioning does
not require reachability, infection, or propagation. Graph coverage criteria
require execution to “get to” some location in the software artifact under
test, that is, reachability, but not infection or propagation. Logic coverage
criteria require that a predicate not only be reached, but be exercised in a
particular way to affect the result of the predicate. That is, the predicate
must be infected. Finally, syntax coverage not only requires that a location
be reached, and that the program state of the “mutated” version be
different from the original version, but that difference must be visible after
execution finishes. That is, it must propagate. The second edition orders
these four concepts based on the RIPR model, where each chapter now has
successively stronger requirements. From a practical perspective, all we
did was move the previous chapter 5 (now chapter 06) in front of the graph
chapter (now chapter 07).
Another major structural change is that the second edition does not
include chapters 7 through 9 from the first edition. The first edition
material has become dated. Because it is used less than other material in
the book, we decided not to delay this new edition of the book while we
tried to find time to write this material. We plan to include better versions
of these chapters in a third edition.
We also made hundreds of changes at a more detailed level. Recent
research has found that in addition to an incorrect value propagating to the
output, testing only succeeds if our automated test oracle looks at the right
part of the software output. That is, the test oracle must reveal the failure.
Thus, the old RIP model is now the RIPR model. Several places in the
book have discussions that go beyond or into more depth than is strictly
needed. The second edition now includes “meta discussions,” which are
ancillary discussions that can be interesting or insightful to some students,
but unnecessarily complicated for others.
The new chapter 06 now has a fully worked out example of deriving an
input domain model from a widely used Java library interface (in section
06.4). Our students have found this helps them understand how to use the
input space partitioning techniques. The first edition included a section on
“Representing graphs algebraically.” Although one of us found this
material to be fun, we both found it hard to motivate and unlikely to be
used in practice. It also has some subtle technical flaws. Thus, we removed
this section from the second edition. The new chapter 08 (logic) has a
significant structural modification. The DNF criteria (formerly in section
3.6) properly belong at the front of the chapter. Chapter 08 now starts with
semantic logic criteria (ACC and ICC) in 08.1, then proceeds to syntactic
logic criteria (DNF) in 08.2. The syntactic logic criteria have also changed.
One was dropped (UTPC), and CUTPNFP has been joined by MUTP and
MNFP. Together, these three criteria comprise MUMCUT.
Throughout the book (especially part 2), we have improved the
examples, simplified definitions, and included more exercises. When the
first edition was published we had a partial solution manual, which
somehow took five years to complete. We are proud to say that we learned
from that mistake: we made (and stuck by!) a rule that we couldn’t add an
exercise without also adding a solution. The reader might think of this rule
as testing for exercises. We are glad to say that the second edition book
website debuts with a complete solution manual.
The second edition also has many dozens of corrections (starting with
the errata list from the first edition book website), but including many
more that we found while preparing the second edition. The second edition
also has a better index. We put together the index for the first edition in
about a day, and it showed. This time we have been indexing as we write,
and committed time near the end of the process to specifically focus on the
index. For future book writers, indexing is hard work and not easy to turn
over to a non-author!
What is still the same in the second edition?
The things that have stayed the same are those that were successful in the
first edition. The overall observation that test criteria are based on only
four types of structures is still the key organizing principle of the second
edition. The second edition is also written from an engineering viewpoint,
assuming that users of the book are engineers who want to produce the
highest quality software with the lowest possible cost. The concepts are
well grounded in theory, yet presented in a practical manner. That is, the
book tries to make theory meet practice; the theory is sound according to
the research literature, but we also show how the theory applies in practice.
The book is also written as a text book, with clear explanations, simple
but illustrative examples, and lots of exercises suitable for in-class or out-of-class work. Each chapter ends with bibliographic notes so that
beginning research students can proceed to learning the deeper ideas
involved in software testing. The book website
(https://cs.gmu.edu/~offutt/softwaretest/) is rich in materials with solution
manuals, listings of all example programs in the text, high quality
PowerPoint slides, and software to help students with graph coverage,
logic coverage, and mutation analysis. Some explanatory videos are also
available and we hope more will follow. The solution manual comes in
two flavors. The student solution manual, with solutions to about half the
exercises, is available to everyone. The instructor solution manual has
solutions to all exercises and is only available to those who convince the
authors that they are using a book to teach a course.
Using the book in the classroom
The book chapters are built in a modular, component-based manner. Most
chapters are independent, and although they are presented in the order that
we use them, inter-chapter dependencies are few and they could be used in
almost any order. Our primary target courses at our university are a fourth-year course (SWE 437) and a first-year graduate course (SWE 637).
Interested readers can search on those courses (“mason swe 437” or
“mason swe 637”) to see our schedules and how we use the book. Both
courses are required; SWE 437 is required in the software engineering
concentration in our Applied Computer Science major, and SWE 637 is
required in our MS program in software engineering2. Chapters 01 and 03
can be used in an early course such as CS2 in two ways. First, to sensitize
early students to the importance of software quality, and second to get
them started with test automation (we use JUnit at Mason). A second-year
course in testing could cover all of part 1, chapter 06 from part 2, and all or
part of part 3. The other chapters in part 2 are probably more than what
such students need, but input space partitioning is a very accessible
introduction to structured, high-end testing. A common course in North
American computer science programs is a third-year course on general
software engineering. Part 1 would be very appropriate for such a course.
In 2016 we are introducing an advanced graduate course on software
testing, which will span cutting-edge knowledge and current research. This
course will use some of part 3, the material that we are currently
developing for part 4, and selected research papers.
Teaching software testing
Both authors have become students of teaching over the past decade. In the
early 2000s, we ran fairly traditional classrooms. We lectured for most of
the available class time, kept organized with extensive PowerPoint slides,
required homework assignments to be completed individually, and gave
challenging, high-pressure exams. The PowerPoint slides and exercises in
the first edition were designed for this model.
However, our teaching has evolved. We replaced our midterm exam
with weekly quizzes, given in the first 15 minutes of class. This distributed
a large component of the grade through the semester, relieved much of the
stress of midterms, encouraged the students to keep up on a weekly basis
instead of cramming right before the exam, and helped us identify students
who were succeeding or struggling early in the term.
After learning about the “flipped classroom” model, we experimented
with recorded lectures, viewed online, followed by doing the “homework”
assignments in class with us available for immediate help. We found this
particularly helpful with the more mathematically sophisticated material
such as logic coverage, and especially beneficial to struggling students. As
the educational research evidence against the benefits of lectures has
mounted, we have been moving away from the “sage on a stage” model of
talking for two hours straight. We now often talk for 10 to 20 minutes,
then give in-class exercises3 where the students immediately try to solve
problems or answer questions. We confess that this is difficult for us,
because we love to talk! Or, instead of showing an example during our
lecture, we introduce the example, let the students work the next step in
small groups, and then share the results. Sometimes our solutions are
better, sometimes theirs are better, and sometimes solutions differ in
interesting ways that spur discussion.
There is no doubt that this approach to teaching takes time and cannot
accommodate all of the PowerPoint slides we have developed. We believe
that although we cover less material, we uncover more, a perception
consistent with how our students perform on our final exams.
Most of the in-class exercises are done in small groups. We also
encourage students to work out-of-class assignments collaboratively. Not
only does evidence show that students learn more when they work
collaboratively (“peer-learning”), they enjoy it more, and it matches the
industrial reality. Very few software engineers work alone.
Of course, you can use this book in your class as you see fit. We offer
these insights simply as examples for things that work for us. We
summarize our current philosophy of teaching simply: Less talking, more
teaching.
Acknowledgments
It is our pleasure to acknowledge by name the many contributors to this
text. We begin with students at George Mason who provided excellent
feedback on early draft chapters from the second edition: Firass Almiski,
Natalia Anpilova, Khalid Bargqdle, Mathew Fadoul, Mark Feghali,
Angelica Garcia, Mahmoud Hammad, Husam Hilal, Carolyn Koerner,
Han-Tsung Liu, Charon Lu, Brian Mitchell, Tuan Nguyen, Bill Shelton,
Dzung Tran, Dzung Tray, Sam Tryon, Jing Wu, Zhonghua Xi, and Chris
Yeung.
We are particularly grateful to colleagues who used draft chapters of the
second edition. These early adopters provided valuable feedback that was
extremely helpful in making the final document classroom-ready. Thanks
to: Moataz Ahmed, King Fahd University of Petroleum & Minerals; Jeff
Carver, University of Alabama; Richard Carver, George Mason
University; Jens Hannemann, Kentucky State University; Jane Hayes,
University of Kentucky; Kathleen Keogh, Federation University Australia;
Robyn Lutz, Iowa State University; Upsorn Praphamontripong, George
Mason University; Alper Sen, Bogazici University; Marjan Sirjani,
Reykjavik University; Mary Lou Soffa, University of Virginia; Katie
Stolee, North Carolina State University; and Xiaohong Wang, Salisbury
University.
Several colleagues provided exceptional feedback from the first edition:
Andy Brooks, Mark Hampton, Jian Zhou, Jeff (Yu) Lei, and six
anonymous reviewers contacted by our publisher. The following
individuals corrected, and in some cases developed, exercise solutions:
Sana’a Alshdefat, Yasmine Badr, Jim Bowring, Steven Dastvan, Justin
Donnelly, Martin Gebert, JingJing Gu, Jane Hayes, Rama Kesavan,
Ignacio Martín, Maricel Medina-Mora, Xin Meng, Beth Paredes, Matt
Rutherford, Farida Sabry, Aya Salah, Hooman Safaee, Preetham
Vemasani, and Greg Williams. The following George Mason students
found, and often corrected, errors in the first edition: Arif Al-Mashhadani,
Yousuf Ashparie, Parag Bhagwat, Firdu Bati, Andrew Hollingsworth,
Gary Kaminski, Rama Kesavan, Steve Kinder, John Krause, Jae Hyuk
Kwak, Nan Li, Mohita Mathur, Maricel Medina Mora, Upsorn
Praphamontripong, Rowland Pitts, Mark Pumphrey, Mark Shapiro, Bill
Shelton, David Sracic, Jose Torres, Preetham Vemasani, Shuang Wang,
Lance Witkowski, Leonard S. Woody III, and Yanyan Zhu. The following
individuals from elsewhere found, and often corrected, errors in the first
edition: Sana’a Alshdefat, Alexandre Bartel, Don Braffitt, Andrew Brooks,
Josh Dehlinger, Gordon Fraser, Rob Fredericks, Weiyi Li, Hassan Mirian,
Alan Moraes, Miika Nurminen, Thomas Reinbacher, Hooman Rafat
Safaee, Hossein Saiedian, Aya Salah, and Markku Sakkinen. Lian Yu of
Peking University translated the first edition into Mandarin Chinese.
We also want to acknowledge those who implicitly contributed to the
second edition by explicitly contributing to the first edition: Aynur
Abdurazik, Muhammad Abdulla, Roger Alexander, Lionel Briand, Renee
Bryce, George P. Burdell, Guillermo Calderon-Meza, Jyothi Chinman,
Yuquin Ding, Blaine Donley, Patrick Emery, Brian Geary, Hassan Gomaa,
Mats Grindal, Becky Hartley, Jane Hayes, Mark Hinkle, Justin
Hollingsworth, Hong Huang, Gary Kaminski, John King, Yuelan Li, Ling
Liu, Xiaojuan Liu, Chris Magrin, Darko Marinov, Robert Nilsson, Andrew
J. Offutt, Buzz Pioso, Jyothi Reddy, Arthur Reyes, Raimi Rufai, Bo
Sanden, Jeremy Schneider, Bill Shelton, Michael Shin, Frank Shukis, Greg
Williams, Quansheng Xiao, Tao Xie, Wuzhi Xu, and Linzhen Xue.
While developing the second edition, our graduate teaching assistants at
George Mason gave us fantastic feedback on early drafts of chapters: Lin
Deng, Jingjing Gu, Nan Li, and Upsorn Praphamontripong. In particular,
Nan Li and Lin Deng were instrumental in completing, evolving, and
maintaining the software coverage tools available on the book website.
We are grateful to our editor, Lauren Cowles, for providing unwavering
support and enforcing the occasional deadline to move the project along,
as well as Heather Bergmann, our former editor, for her strong support on
this long-running project.
Finally, of course none of this is possible without the support of our
families. Thanks to Becky, Jian, Steffi, Matt, Joyce, and Andrew for
helping us stay balanced.
Just as all programs contain faults, all texts contain errors. Our text is no
different. And, as responsibility for software faults rests with the
developers, responsibility for errors in this text rests with us, the authors.
In particular, the bibliographic notes sections reflect our perspective of the
testing field, a body of work we readily acknowledge as large and
complex. We apologize in advance for omissions, and invite pointers to
relevant citations.
1 To help reduce confusion, we developed the convention of using two digits for
second edition chapters. Thus, in this preface, chapter 01 implies the second
edition, whereas chapter 1 implies the first.
2 Our MS program is practical in nature, not research-oriented. The majority of
students are part-time students with five to ten years of experience in the software
industry. SWE 637 begat this book when we realized Beizer’s classic text
[Beizer, 1990] was out of print.
3 These in-class exercises are not yet a formal part of the book website. But we
often draw them from regular exercises in the text. Interested readers can extract
recent versions from our course web pages with a search engine.
PART I
Foundations
1
Why Do We Test Software?
The true subject matter of the tester is not testing, but the design of test cases.
The purpose of this book is to teach software engineers how to test. This
knowledge is useful whether you are a programmer who needs to unit test
your own software, a full-time tester who works mostly from requirements
at the user level, a manager in charge of testing or development, or any
position in between. As the software industry moves into the second
decade of the 21st century, software quality is increasingly becoming
essential to all businesses and knowledge of software testing is becoming
necessary for all software engineers.
Today, software defines behaviors that our civilization depends on in
systems such as network routers, financial calculation engines, switching
networks, the Web, power grids, transportation systems, and essential
communications, command, and control services. Over the past two
decades, the software industry has become much bigger, is more
competitive, and has more users. Software is an essential component of
exotic embedded applications such as airplanes, spaceships, and air traffic
control systems, as well as mundane appliances such as watches, ovens,
cars, DVD players, garage door openers, mobile phones, and remote
controllers. Modern households have hundreds of processors, and new cars
have over a thousand; all of them running software that optimistic
consumers assume will never fail! Although many factors affect the
engineering of reliable software, including, of course, careful design and
sound process management, testing is the primary way industry evaluates
software during development. The recent growth in agile processes puts
increased pressure on testing; unit testing is emphasized heavily and test-driven development makes tests key to functional requirements. It is clear
that industry is deep into a revolution in what testing means to the success
of software products.
Fortunately, a few basic software testing concepts can be used to design
tests for a large variety of software applications. A goal of this book is to
present these concepts in such a way that students and practicing engineers
can easily apply them to any software testing situation.
This textbook differs from other software testing books in several
respects. The most important difference is in how it views testing
techniques. In his landmark book Software Testing Techniques, Beizer
wrote that testing is simple—all a tester needs to do is “find a graph and
cover it.” Thanks to Beizer’s insight, it became evident to us that the
myriad of testing techniques present in the literature have much more in
common than is obvious at first glance. Testing techniques are typically
presented in the context of a particular software artifact (for example, a
requirements document or code) or a particular phase of the lifecycle (for
example, requirements analysis or implementation). Unfortunately, such a
presentation obscures underlying similarities among techniques.
This book clarifies these similarities with two innovative, yet
simplifying, approaches. First, we show how testing is more efficient and
effective by using a classical engineering approach. Instead of designing
and developing tests on concrete software artifacts like the source code or
requirements, we show how to develop abstraction models, design tests at
the abstract level, and then implement actual tests at the concrete level by
satisfying the abstract designs. This is the exact process that traditional
engineers use, except whereas they usually use calculus and algebra to
describe the abstract models, software engineers usually use discrete
mathematics. Second, we recognize that all test criteria can be defined
with a very short list of abstract models: input domain characterizations,
graphs, logical expressions, and syntactic descriptions. These are directly
reflected in the four chapters of Part II of this book.
This book provides a balance of theory and practical application,
thereby presenting testing as a collection of objective, quantitative
activities that can be measured and repeated. The theory is based on the
published literature, and presented without excessive formalism. Most
importantly, the theoretical concepts are presented when needed to support
the practical activities that test engineers follow. That is, this book is
intended for all software developers.
1.1 WHEN SOFTWARE GOES BAD
As said, we consider the development of software to be engineering. And
like any engineering discipline, the software industry has its shares of
failures, some spectacular, some mundane, some costly, and sadly, some
that have resulted in loss of life. Before learning about software disasters,
it is important to understand the difference between faults, errors, and
failures. We adopt the definitions of software fault, error, and failure from
the dependability community.
Definition 1.1 Software Fault: A static defect in the software.
Definition 1.2 Software Error: An incorrect internal state that is the
manifestation of some fault.
Definition 1.3 Software Failure: External, incorrect behavior with
respect to the requirements or another description of the expected
behavior.
Consider a medical doctor diagnosing a patient. The patient enters the
doctor’s office with a list of failures (that is, symptoms). The doctor then
must discover the fault, or root cause of the symptoms. To aid in the
diagnosis, a doctor may order tests that look for anomalous internal
conditions, such as high blood pressure, an irregular heartbeat, high levels
of blood glucose, or high cholesterol. In our terminology, these anomalous
internal conditions correspond to errors.
While this analogy may help the student clarify his or her thinking about
faults, errors, and failures, software testing and a doctor’s diagnosis differ
in one crucial way. Specifically, faults in software are design mistakes.
They do not appear spontaneously, but exist as a result of a decision by a
human. Medical problems (as well as faults in computer system hardware),
on the other hand, are often a result of physical degradation. This
distinction is important because it limits the extent to which any process
can hope to control software faults. Specifically, since no foolproof way
exists to catch arbitrary mistakes made by humans, we can never eliminate
all faults from software. In colloquial terms, we can make software
development foolproof, but we cannot, and should not attempt to, make it
damn-foolproof.
For a more precise example of the definitions of fault, error, and failure,
we need to clarify the concept of the state. A program state is defined
during execution of a program as the current value of all live variables and
the current location, as given by the program counter. The program
counter (PC) is the next statement in the program to be executed and can
be described with a line number in the file (PC = 5) or the statement as a
string (PC = “if (x > y)”). Most of the time, what we mean by a statement
is obvious, but complex structures such as for loops have to be treated
specially. The program line “for (i=1; i < N; i++)” actually has three
statements that can result in separate states. The loop initialization (“i=1”)
is separate from the loop test (“i < N”), and the loop increment (“i++”)
occurs at the end of the loop body. As an illustrative example, consider the
following Java method:
Sidebar
Programming Language Independence
This book strives to be independent of language, and most of the
concepts in the book are. At the same time, we want to illustrate these
concepts with specific examples. We choose Java, and emphasize that
most of these examples would be very similar in many other common
languages.
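The method referred to above does not survive in this extract. What follows is a reconstruction sketched from the surrounding discussion (the enclosing class name is an assumption), with the fault described below deliberately present:

public class NumZeroExample {
    /**
     * Returns the number of occurrences of 0 in x.
     * Throws NullPointerException if x is null.
     * Reconstructed sketch: the loop intentionally starts at index 1,
     * which is the fault discussed in the text.
     */
    public static int numZero(int[] x) {
        int count = 0;
        for (int i = 1; i < x.length; i++) {  // fault: should start at i = 0
            if (x[i] == 0) {
                count++;
            }
        }
        return count;
    }
}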
The fault in this method is that it starts looking for zeroes at index 1
instead of index 0, as is necessary for arrays in Java. For example,
numZero([2, 7, 0]) correctly evaluates to 1, while numZero([0, 7, 2])
incorrectly evaluates to 0. In both tests the faulty statement is executed.
Although both of these tests result in an error, only the second results in
failure. To understand the error states, we need to identify the state for
the method. The state for numZero() consists of values for the variables
x, count, i, and the program counter (PC). For the first example above,
the state at the loop test on the very first iteration of the loop is
(x = [2, 7, 0], count = 0, i = 1, PC = “i < x.length”). Notice that this
state is erroneous precisely because the
value of i should be zero on the first iteration. However, since the value of
count is coincidentally correct, the error state does not propagate to the
output, and hence the software does not fail. In other words, a state is in
error simply if it is not the expected state, even if all of the values in the
state, considered in isolation, are acceptable. More generally, if the
required sequence of states is s0, s1, s2, ..., and the actual sequence of states
is s0, s2, s3, ..., then state s2 is in error in the second sequence. The fault
model described here is quite deep, and this discussion gives the broad
view without going into unneeded details. The exercises at the end of the
section explore some of the subtleties of the fault model.
In the second test for our example, the error state is
(x = [0, 7, 2], count = 0, i = 1, PC = “i < x.length”). In this case, the error
propagates to the variable count and is present in the return value of the
method. Hence a failure results.
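Expressed as automated checks (a sketch assuming the reconstructed NumZeroExample class above and JUnit 4), the two inputs behave as just described: the first test passes even though an error state occurs during execution, while the second fails and so reveals the fault.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class NumZeroTest {
    @Test
    public void zeroAtEndOfArray() {
        // An error state occurs (i starts at 1), but count is still correct,
        // so the error does not propagate and the test passes.
        assertEquals(1, NumZeroExample.numZero(new int[] {2, 7, 0}));
    }

    @Test
    public void zeroAtStartOfArray() {
        // The skipped element is the only zero, so the error propagates
        // to the return value; the assertion fails and reveals the fault.
        assertEquals(1, NumZeroExample.numZero(new int[] {0, 7, 2}));
    }
}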
The term bug is often used informally to refer to all three of fault, error,
and failure. This book will usually use the specific term, and avoid using
“bug.” A favorite story of software engineering teachers is that Grace
Hopper found a moth stuck in a relay on an early computing machine,
which started the use of bug as a problem with software. It is worth noting,
however, that the term bug has an old and rich history, predating software
by at least a century. The first use of bug to generally mean a problem we
were able to find is from a quote by Thomas Edison:
It has been just so in all of my inventions. The first step is an intuition, and
comes with a burst, then difficulties arise–this thing gives out and [it is]
then that ‘Bugs’–as such little faults and difficulties are called–show
themselves and months of intense watching, study and labor are requisite.
— Thomas Edison
A very public failure was the Mars lander of September 1999, which
crashed due to a misunderstanding in the units of measure used by two
modules created by separate software groups. One module computed
thruster data in English units and forwarded the data to a module that
expected data in metric units. This is a very typical integration fault (but in
this case enormously expensive, both in terms of money and prestige).
One of the most famous cases of software killing people is the Therac-25 radiation therapy machine. Software faults were found to have caused
at least three deaths due to excessive radiation. Another dramatic failure
was the launch failure of the first Ariane 5 rocket, which exploded 37
seconds after liftoff in 1996. The low-level cause was an unhandled
floating point conversion exception in an inertial guidance system
function. It turned out that the guidance system could never encounter the
unhandled exception when used on the Ariane 4 rocket. That is,
the guidance system function was correct for Ariane 4. The developers of
the Ariane 5 quite reasonably wanted to reuse the successful inertial
guidance system from the Ariane 4, but no one reanalyzed the software in
light of the substantially different flight trajectory of the Ariane 5.
Furthermore, the system tests that would have found the problem were
technically difficult to execute, and so were not performed. The result was
spectacular–and expensive!
The famous Pentium bug was an early alarm of the need for better
testing, especially unit testing. Intel introduced its Pentium microprocessor
in 1994, and a few months later, Thomas Nicely, a mathematician at
Lynchburg College in Virginia, found that the chip gave incorrect answers
to certain floating-point division calculations.
The chip was slightly inaccurate for a few pairs of numbers; Intel
claimed (probably correctly) that only one in nine billion division
operations would exhibit reduced precision. The fault was the omission of
five entries in a table of 1,066 values (part of the chip’s circuitry) used by
a division algorithm. The five entries should have contained the constant
+2, but the entries were not initialized and contained zero instead. The
MIT mathematician Edelman claimed that “the bug in the Pentium was an
easy mistake to make, and a difficult one to catch,” an analysis that misses
an essential point. This was a very difficult mistake to find during system
testing, and indeed, Intel claimed to have run millions of tests using this
table. But the table entries were left empty because a loop termination
condition was incorrect; that is, the loop stopped storing numbers before it
was finished. Thus, this would have been a very simple fault to find during
unit testing; indeed analysis showed that almost any unit level coverage
criterion would have found this multimillion dollar mistake.
The great northeast blackout of 2003 was started when a power line in
Ohio brushed against overgrown trees and shut down. This is called a fault
in the power industry. Unfortunately, the software alarm system failed in
the local power company, so system operators could not understand what
happened. Other lines also sagged into trees and switched off, eventually
overloading other power lines, which then cut off. This cascade effect
eventually caused a blackout throughout southeastern Canada and eight
states in the northeastern part of the US. This is considered the biggest
blackout in North American history, affecting 10 million people in Canada
and 40 million in the USA, contributing to at least 11 deaths and costing
up to $6 billion USD.
Some software failures are felt widely enough to cause severe
embarrassment to the company. In 2011, a centralized student data
management system in Korea miscalculated the academic grades of over
29,000 middle and high school students. This led to massive confusion
about college admissions and a government investigation into the software
engineering practices of the software vendor, Samsung Electronics.
A 1999 study commissioned by the U.S. National Research Council and
the U.S. President’s commission on critical infrastructure protection
concluded that the current base of science and technology is inadequate for
building systems to control critical software infrastructure. A 2002 report
commissioned by the National Institute of Standards and Technology
(NIST) estimated that defective software costs the U.S. economy $59.5
billion per year. The report further estimated that 64% of the costs were a
result of user mistakes and 36% a result of design and development
mistakes, and suggested that improvements in testing could reduce this
cost by about a third, or $22.5 billion. Blumenstyk reported that web
application failures lead to huge losses in businesses; $150,000 per hour
in media companies, $2.4 million per hour in credit card sales, and $6.5
million per hour in the financial services market.
Software faults do not just lead to functional failures. According to a
Symantec security threat report in 2007, 61 percent of all vulnerabilities
disclosed were due to faulty software. The most common are web
application vulnerabilities that can be attacked by some common attack
techniques using invalid inputs.
These public and expensive software failures are getting more common
and more widely known. This is simply a symptom of the change in
expectations of software. As we move further into the 21st century, we are
using more safety critical, real-time software. Embedded software has
become ubiquitous; many of us carry millions of lines of embedded
software in our pockets. Corporations rely more and more on large-scale
enterprise applications, which by definition have large user bases and high
reliability requirements. Security, which used to depend on cryptography,
then database security, then avoiding network vulnerabilities, is now
largely about avoiding software faults. The Web has had a major impact. It
features a deployment platform that offers software services that are very
competitive and available to millions of users. They are also distributed,
adding complexity, and must be highly reliable to be competitive. More so
than at any previous time, industry desperately needs to apply the
accumulated knowledge of over 30 years of testing research.
1.2 GOALS OF TESTING SOFTWARE
Surprisingly, many software engineers are not clear about their testing
goals. Is it to show correctness, find problems, or something else? To
explore this concept, we first must separate validation and verification.
Most of the definitions in this book are taken from standards documents,
and although the phrasing is ours, we try to be consistent with the
standards. Useful standards for reading in more detail are the IEEE
Standard Glossary of Software Engineering Terminology, DOD-STD-2167A and MIL-STD-498 from the US Department of Defense, and the
British Computer Society’s Standard for Software Component Testing.
Definition 1.4 Verification: The process of determining whether the
products of a phase of the software development process fulfill the
requirements established during the previous phase.
Definition 1.5 Validation: The process of evaluating software at the
end of software development to ensure compliance with intended
usage.
Verification is usually a more technical activity that uses knowledge
about the individual software artifacts, requirements, and specifications.
Validation usually depends on domain knowledge; that is, knowledge of
the application for which the software is written. For example, validation
of software for an airplane requires knowledge from aerospace engineers
and pilots.
As a familiar example, consider a light switch in a conference room.
Verification asks if the lighting meets the specifications. The specifications
might say something like, “The lights in front of the projector screen can
be controlled independently of the other lights in the room.” If the
specifications are written down somewhere and thelights cannot be
controlled independently, then the lighting fails verification, precisely
because the implementation does not satisfy the specifications. Validation
asks whether users are satisfied, an inherently fuzzy question that has
nothing to do with verification. If the “independent control” specification
is neither written down nor satisfied, then, despite the disappointed users,
verification nonetheless succeeds, because the implementation satisfies the
specification. But validation fails, because the specification for the lighting
does not reflect the true needs of the users. This is an important general
point: validation exposes flaws in specifications.
The acronym “IV&V” stands for “Independent Verification and
Validation,” where “independent” means that the evaluation is done by
non-developers. Sometimes the IV&V team is within the same project,
sometimes the same company, and sometimes it is entirely an external
entity. In part because of the independent nature of IV&V, the process
often is not started until the software is complete and is often done by
people whose expertise is in the application domain rather than software
development. This can sometimes mean that validation is given more
weight than verification. This book emphasizes verification more than
validation, although most of the specific test criteria we discuss can be
used for both activities.
Beizer discussed the goals of testing in terms of the “test process
maturity levels” of an organization, where the levels are characterized by
the testers’ goals. He defined five levels, where the lowest level is not
worthy of being given a number.
Level 0: There is no difference between testing and debugging.
Level 1: The purpose of testing is to show correctness.
Level 2: The purpose of testing is to show that the software does not work.
Level 3: The purpose of testing is not to prove anything specific, but to reduce the risk of using the software.
Level 4: Testing is a mental discipline that helps all IT professionals develop higher-quality software.
Level 0 is the view that testing is the same as debugging. This is the
view that is naturally adopted by many undergraduate Computer Science
majors. In most CS programming classes, the students get their programs
to compile, then debug the programs with a few inputs chosen either
arbitrarily or provided by the professor. This model does not distinguish
between a program’s incorrect behavior and a mistake within the program,
and does very little to help develop software that is reliable or safe.
In Level 1 testing, the purpose is to show correctness. While a
significant step up from the naive level 0, this has the unfortunate problem
that in any but the most trivial of programs, correctness is virtually
impossible to either achieve or demonstrate. Suppose we run a collection
of tests and find no failures. What do we know? Should we assume that we
have good software or just bad tests? Since the goal of correctness is
impossible, test engineers usually have no strict goal, real stopping rule, or
formal test technique. If a development manager asks how much testing
remains to be done, the test manager has no way to answer the question. In
fact, test managers are in a weak position because they have no way to
quantitatively express or evaluate their work.
In Level 2 testing, the purpose is to show failures. Although looking for
failures is certainly a valid goal, it is also inherently negative. Testers may
enjoy finding the problem, but the developers never want to find
problems–they want the software to work (yes, level 1 thinking can be
natural for the developers). Thus, level 2 testing puts testers and
developers into an adversarial relationship, which can be bad for team
morale. Beyond that, when our primary goal is to look for failures, we are
still left wondering what to do if no failures are found. Is our work done?
Is our software very good, or is the testing weak? Having confidence in
when testing is complete is an important goal for all testers. It is our view
that this level currently dominates the software industry.
The thinking that leads to Level 3 testing starts with the realization that
testing can show the presence, but not the absence, of failures. This lets us
accept the fact that whenever we use software, we incur some risk. The
risk may be small and the consequences unimportant, or the risk may be
great and the consequences catastrophic, but risk is always there. This
allows us to realize that the entire development team wants the same
thing–to reduce the risk of using the software. In level 3 testing, both
testers and developers work together to reduce risk. We see more and more
companies move to this testing maturity level every year.
Once the testers and developers are on the same “team,” an organization
can progress to real Level 4 testing. Level 4 thinking defines testing as a
mental discipline that increases quality. Various ways exist to increase
quality, of which creating tests that cause the software to fail is only one.
Adopting this mindset, test engineers can become the technical leaders of
the project (as is common in many other engineering disciplines). They
have the primary responsibility of measuring and improving software
quality, and their expertise should help the developers. Beizer used the
analogy of a spell checker. We often think that the purpose of a spell
checker is to find misspelled words, but in fact, the best purpose of a spell
checker is to improve our ability to spell. Every time the spell checker
finds an incorrectly spelled word, we have the opportunity to learn how to
spell the word correctly. The spell checker is the “expert” on spelling
quality. In the same way, level 4 testing means that the purpose of testing
is to improve the ability of the developers to produce high-quality
software. The testers should be the experts who train your developers!
As a reader of this book, you probably start at level 0, 1, or 2. Most
software developers go through these levels at some stage in their careers.
If you work in software development, you might pause to reflect on which
testing level describes your company or team. The remaining chapters in
Part I should help you move to level 2 thinking, and to understand the
importance of level 3. Subsequent chapters will give you the knowledge,
skills, and tools to be able to work at level 3. An ultimate goal of this book
is to provide a philosophical basis that will allow readers to become
“change agents” in their organizations for level 4 thinking, and test
engineers to become software quality experts. Although level 4 thinking
is currently rare in the software industry, it is common in more mature
engineering fields.
These considerations help us decide at a strategic level why we test. At a
more tactical level, it is important to know why each test is present. If you
do not know why you are conducting each test, the test will not be very
helpful. What fact is each test trying to verify? It is essential to document
test objectives and test requirements, including the planned coverage
levels. When the test manager attends a planning meeting with the other
managers and the project manager, the test manager must be able to
articulate clearly how much testing is enough and when testing will
complete. In the 1990s, we could use the “date criterion,” that is, testing is
“complete” when the ship date arrives or when the budget is spent.
Figure 1.1 dramatically illustrates the advantages of testing early rather
than late. This chart is based on a detailed analysis of faults that were
detected and fixed during several large government contracts. The bars
marked ‘A’ indicate what percentage of faults appeared in that phase.
Thus, 10% of faults appeared during the requirements phase, 40% during
design, and 50% during implementation. The bars marked ‘D’ indicate
the percentage of faults that were detected during each phase. About 5%
were detected during the requirements phase, and over 35% during system
testing. Last is the cost analysis. The solid bars marked ‘C’ indicate the
relative cost of finding and fixing faults during each phase. Since each
project was different, the costs are normalized to a “unit cost.” Thus,
faults detected and fixed during requirements, design, and unit testing cost
a single unit. Faults detected and fixed during integration testing cost
five times as much, 10 times as much during system testing, and 50 times
as much after the software is deployed.
Figure 1.1. Cost of late testing.
If we take the simple assumption of a $1000 USD unit cost per fault, and
100 faults, that means we spend $39,000 to find and correct faults during
requirements, design, and unit testing. During integration testing, the cost
goes up to $100,000. But system testing and deployment are the serious
problems. We find more faults during system testing, at ten times the cost,
for a total of $360,000. And even though we find only a few faults after
deployment, at 50 times the unit cost we spend $250,000!
Avoiding the work early (requirements analysis and unit testing) saves
money in the short term. But it leaves faults in software that are like little
bombs, ticking away, and the longer they tick, the bigger the explosion
when they finally go off.
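To make the arithmetic concrete, the short sketch below recomputes these totals. The per-phase fault counts (39, 20, 36, and 5) are assumptions chosen so the totals match the figures quoted above; they are illustrative, not data taken directly from the underlying report.

    // Illustrative recomputation of the late-testing cost figures.
    public class LateTestingCost {
        public static void main(String[] args) {
            final int unitCost = 1000;      // assumed cost (USD) of one fault fixed at unit level
            int earlyFaults = 39;           // requirements, design, and unit testing (1x unit cost)
            int integrationFaults = 20;     // integration testing (5x unit cost)
            int systemFaults = 36;          // system testing (10x unit cost)
            int deployedFaults = 5;         // after deployment (50x unit cost)

            System.out.println("Early:       $" + earlyFaults * unitCost);           // $39,000
            System.out.println("Integration: $" + integrationFaults * 5 * unitCost); // $100,000
            System.out.println("System:      $" + systemFaults * 10 * unitCost);     // $360,000
            System.out.println("Deployed:    $" + deployedFaults * 50 * unitCost);   // $250,000
        }
    }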
To put Beizer’s level 4 test maturity level in simple terms, the goal of
testing is to eliminate faults as early as possible. We can never be perfect,
but every time we eliminate a fault during unit testing (or sooner!), we
save money. The rest of this book will teach you how to do that.
EXERCISES
Chapter 1.
1. What are some factors that would help a development organization
move from Beizer’s testing level 2 (testing is to show errors) to
testing level 4 (a mental discipline that increases quality)?
2. What is the difference between software fault and software failure?
3. What do we mean by “level 3 thinking is that the purpose of testing is
to reduce risk?” What risk? Can we reduce the risk to zero?
4. The following exercise is intended to encourage you to think of
testing in a more rigorous way than you may be used to. The exercise
also hints at the strong relationship between specification clarity,
faults, and test cases1.
(a) Write a Java method with the signature
public static Vector union(Vector a, Vector b)
The method should return a Vector of objects that are in either
of the two argument Vectors.
(b) Upon reflection, you may discover a variety of defects and
ambiguities in the given assignment. In other words, ample
opportunities for faults exist. Describe as many possible faults
as you can. (Note: Vector is a Java Collection class. If you
are using another language, interpret Vector as a list.)
(c) Create a set of test cases that you think would have a reasonable
chance of revealing the faults you identified above. Document a
rationale for each test in your test set. If possible, characterize
all of your rationales in some concise summary. Run your tests
against your implementation.
(d) Rewrite the method signature to be precise enough to clarify the
defects and ambiguities identified earlier. You might wish to
illustrate your specification with examples drawn from your test
cases.
5. Below are four faulty programs. Each includes test inputs that result
in failure. Answer the following questions about each program.
(a) Explain what is wrong with the given code. Describe the fault
precisely by proposing a modification to the code.
(b) If possible, give a test case that does not execute the fault. If
not, briefly explain why not.
(c) If possible, give a test case that executes the fault, but does not
result in an error state. If not, briefly explain why not.
(d) If possible, give a test case that results in an error, but not a
failure. If not, briefly explain why not. Hint: Don’t forget about
the program counter.
(e) For the given test case, describe the first error state. Be sure to
describe the complete state.
(f) Implement your repair and verify that the given test now
produces the expected output. Submit a screen printout or other
evidence that your new program works.
6. Answer question (a) or (b), but not both, depending on your
background.
(a) If you do, or have, worked for a software development
company, what level of test maturity do you think the company
worked at? (0: testing=debugging, 1: testing shows correctness,
2: testing shows the program doesn’t work, 3: testing reduces
risk, 4: testing is a mental discipline about quality).
(b) If you have never worked for a software development company,
what level of test maturity do you think that you have? (0:
testing=debugging, 1: testing shows correctness, 2: testing
shows the program doesn’t work, 3: testing reduces risk, 4:
testing is a mental discipline about quality).
7. Consider the following three example classes. These are OO faults
taken from Joshua Bloch’s Effective Java, Second Edition. Answer
the following questions about each.
(a) Explain what is wrong with the given code. Describe the fault
precisely by proposing a modification to the code.
(b) If possible, give a test case that does not execute the fault. If
not, briefly explain why not.
(c) If possible, give a test case that executes the fault, but does not
result in an error state. If not, briefly explain why not.
(d) If possible, give a test case that results in an error, but not a
failure. If not, briefly explain why not. Hint: Don’t forget about
the program counter.
(e) In the given code, describe the first error state. Be sure to
describe the complete state.
(f) Implement your repair and verify that the given test now
produces the expected output. Submit a screen printout or other
evidence that your new program works.
1.3 BIBLIOGRAPHIC NOTES
This textbook has been deliberately left uncluttered with references.
Instead, each chapter contains a Bibliographic Notes section, which
contains suggestions for further and deeper reading for readers who want
more. We especially hope that research students will find these sections
helpful.
Most of the terminology in testing is from standards documents,
including the IEEE Standard Glossary of Software Engineering
Terminology [IEEE, 2008], the US Department of Defense [Department of
Defense, 1988, Department of Defense, 1994], the US Federal Aviation
Administration FAA-DO178B, and the British Computer Society’s
Standard for Software Component Testing [British Computer Society,
2001].
Beizer [Beizer, 1990] first defined the testing levels in Section 1.2.
Beizer described them in terms of the maturity of individual developers
and used the term phase instead of level. We adapted the discussion to
organizations rather than individual developers and chose the term level to
mirror the language of the well-known Capability Maturity Model [Paulk
et al., 1995].
All books on software testing and all researchers owe major thanks to
the landmark books in 1979 by Myers [Myers, 1979], in 1990 by Beizer
[Beizer, 1990], and in 2000 by Binder [Binder, 2000]. Some excellent
overviews of unit testing criteria have also been published, including one
by White [White, 1987] and more recently by Zhu, Hall, and May [Zhu et
al., 1997]. The recent text from Pezze and Young [Pezze and Young,
2008] reports relevant processes, principles, and techniques from the
testing literature, and includes many useful classroom materials. The Pezze
and Young text presents coverage criteria in the traditional lifecycle-based
manner, and does not organize criteria into the four abstract models
discussed in this chapter. Another recent book by Mathur offers a
comprehensive, in-depth catalog of test techniques and criteria [Mathur,
2014].
Numerous other software testing books were not intended as textbooks,
or do not offer general coverage for classroom use. Beizer’s Software
System Testing and Quality Assurance [Beizer, 1984] and Hetzel’s The
Complete Guide to Software Testing [Hetzel, 1988] cover various aspects
of management and process for software testing. Several books cover
specific aspects of testing [Howden, 1987, Marick, 1995, Roper, 1994].
The STEP project at Georgia Institute of Technology resulted in a
comprehensive survey of the practice of software testing by Department of
Defense contractors in the 1980s [DeMillo et al., 1987].
The information for the Pentium bug and Mars lander was taken from
several sources, including works by Edelman, Moler, Nuseibeh, Knutson, and
Peterson [Edelman, 1997, Knutson and Carmichael, 2000, Moler, 1995,
Nuseibeh, 1997, Peterson, 1997]. The well-written official accident report
[Lions, 1996] is our favorite source for understanding the details of the
Ariane 5 Flight 501 Failure. The information for the Therac-25 accidents
was taken from Leveson and Turner’s deep analysis [Leveson and Turner,
1993]. The details on the 2003 Northeast Blackout were taken from
Minkel’s analysis in Scientific American [Minkel, 2008] and Rice’s book
[Rice, 2008]. The information about the Korean education information
system was taken from two newspaper articles [Min-sang and Sang-soo,
2011, Korea Times, 2011].
The 1999 study mentioned was published in an NRC / PITAC report
[PITAC, 1999, Schneider, 1999]. The data in Figure 1.1 were taken from a
NIST report that was developed by the Research Triangle Institute [RTI,
2002]. The figures on web application failures are due to Blumenstyk
[Blumenstyk, 2006]. The figures about faulty software leading to security
vulnerabilities are from Symantec [Symantec, 2007].
Finally, Rick Hower’s QATest website is a good resource for current,
elementary, information about software testing: www.softwareqatest.com.
1 Liskov’s Program Development in Java, especially chapters 9 and 10, is a great
source for students who wish to learn more about this.
2 Model-Driven Test Design
Designers are more efficient and effective if they can raise their level of
abstraction.
This chapter introduces one of the major innovations in the second edition
of Introduction to Software Testing. Software testing is inherently
complicated and our ultimate goal, completely correct software, is
unreachable. The reasons are formal (as discussed below in section 2.1)
and philosophical. As discussed in Chapter 1, it’s not even clear that the
term “correctness” means anything when applied to a piece of engineering
as complicated as a large computer program. Do we expect correctness out
of a building? A car? A transportation system? Intuitively, we know that
all large physical engineering systems have problems, and moreover, there
is no way to say what correct means. This is even more true for software,
which can quickly get orders of magnitude more complicated than physical
structures such as office buildings or airplanes.
Instead of looking for “correctness,” wise software engineers try to
evaluate software’s “behavior” to decide if the behavior is acceptable
within consideration of a large number of factors including (but not limited
to) reliability, safety, maintainability, security, and efficiency. Obviously
this is more complex than the naive desire to show the software is correct.
So what do software engineers do in the face of such overwhelming
complexity? The same thing that physical engineers do–we use
mathematics to “raise our level of abstraction.” The Model-Driven Test
Design (MDTD) process breaks testing into a series of small tasks that
simplify test generation. Then test designers isolate their task, and work at
a higher level of abstraction by using mathematical engineering structures
to design test values independently of the details of software or design
artifacts, test automation, and test execution.
A key intellectual step in MDTD is test case design. Test case design
can be the primary determining factor in whether tests successfully find
failures in software. Tests can be designed with a “human-based”
approach, where a test engineer uses domain knowledge of the software’s
purpose and his or her experience to design tests that will be effective at
finding faults. Alternatively, tests can be designed to satisfy well-defined
engineering goals such as coverage criteria. This chapter describes the task
activities and then introduces criteria-based test design. Criteria-based test
design will be discussed in more detail in Chapter 5, then specific criteria
on four mathematical structures are described in Part II. After these
preliminaries, the model-driven test design process is defined in detail. The
book website has simple web applications that support the MDTD in the
context of the mathematical structures in Part II.
2.1 SOFTWARE TESTING FOUNDATIONS
One of the most important facts that all software testers need to know is
that testing can show only the presence of failures, not their absence. This
is a fundamental, theoretical limitation; to be precise, the problem of
finding all failures in a program is undecidable. Testers often call a test
successful (or effective) if it finds an error. While this is an example of
level 2 thinking, it is also a characterization that is often useful and that we
will use throughout the book. This section explores some of the theoretical
underpinnings of testing as a way to emphasize how important the MDTD
is.
The definitions of fault and failure in Chapter 1 allow us to develop the
reachability, infection, propagation, and revealability model (“RIPR”).
First, we distinguish testing from debugging.
Definition 2.6 Testing: Evaluating software by observing its
execution.
Definition 2.7 Test Failure: Execution of a test that results in a
software failure.
Definition 2.8 Debugging: The process of finding a fault given a
failure.
Of course the central issue is that for a given fault, not all inputs will
“trigger” the fault into creating incorrect output (a failure). Also, it is often
very difficult to relate a failure to the associated fault. Analyzing these
ideas leads to the fault/failure model, which states that four conditions are
needed for a failure to be observed.
Figure 2.1 illustrates the conditions. First, a test must reach the location
or locations in the program that contain the fault (Reachability). After the
location is executed, the state of the program must be incorrect (Infection).
Third, the infected state must propagate through the rest of the execution
and cause some output or final state of the program to be incorrect
(Propagation). Finally, the tester must observe part of the incorrect portion
of the final program state (Revealability). If the tester only observes parts
of the correct portion of the final program state, the failure is not revealed.
This is shown in the cross-hatched intersection in Figure 2.1. Issues with
revealing failures will be discussed in Chapter 4 when we present test
automation strategies.
Figure 2.1. Reachability, Infection, Propagation, Revealability (RIPR) model.
Collectively, these four conditions are known as the fault/failure model,
or the RIPR model.
It is important to note that the RIPR model applies even when the fault
is missing code (so-called faults of omission). In particular, when
execution passes through the location where the missing code should be,
the program counter, which is part of the program state, necessarily has the
wrong value.
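As an illustration of the four conditions, consider the small hypothetical method below. It is ours, not an example drawn from elsewhere in the book, but it shows how a fault can be executed without the resulting error propagating to the output, and how even a failure can go unrevealed if the test does not observe the right part of the output.

    // countZeroes should count the zero entries in x, but the loop
    // mistakenly starts at index 1 instead of 0.
    public static int countZeroes(int[] x) {
        int count = 0;
        for (int i = 1; i < x.length; i++) {   // FAULT: should be i = 0
            if (x[i] == 0) {
                count++;
            }
        }
        return count;
    }

    // Reachability:  any test with a non-empty array reaches the faulty loop.
    // Infection:     once the loop begins at i == 1, the internal state differs
    //                from that of a correct version, so the state is infected.
    // Propagation:   the infection reaches the output only when x[0] == 0;
    //                countZeroes(new int[] {0, 7, 2}) returns 0 instead of 1,
    //                while countZeroes(new int[] {2, 7, 0}) still returns the
    //                correct value 1.
    // Revealability: even the failing test reveals nothing unless its oracle
    //                actually checks the returned count.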
From a practitioner’s view, these limitations mean that software testing
is complex and difficult. The common way to deal with complexity in
engineering is abstraction: we abstract away the complicating details that
can safely be ignored and model the problem with mathematical
structures. That is a central theme of this book, which we
begin by analyzing the separate technical activities involved in creating
good tests.
2.2 SOFTWARE TESTING ACTIVITIES
In this book, a test engineer is an Information Technology (IT)
professional who is in charge of one or more technical test activities,
including designing test inputs, producing test case values, running test
scripts, analyzing results, and reporting results to developers and
managers. Although we cast the description in terms of test engineers,
every engineer involved in software development should realize that he or
she sometimes wears the hat of a test engineer. The reason is that each
software artifact produced over the course of a product’s development has,
or should have, an associated set of test cases, and the person best
positioned to define these test cases is often the designer of the artifact. A
test manager is in charge of one or more test engineers. Test managers set
test policies and processes, interact with other managers on the project,
and otherwise help the engineers test software effectively and efficiently.
Figure 2.2 shows some of the major activities of test engineers. A test
engineer must design tests by creating test requirements. These
requirements are then transformed into actual values and scripts that are
ready for execution. These executable tests are run against the software,
denoted P in the figure, and the results are evaluated to determine if the
tests reveal a fault in the software. These activities may be carried out by
one person or by several, and the process is monitored by a test manager.
Figure 2.2. Activities of test engineers.
One of a test engineer’s most powerful tools is a formal coverage
criterion. Formal coverage criteria give test engineers ways to decide what
test inputs to use during testing, making it more likely that the tester will
find problems in the program and providing greater assurance that the
software is of high quality and reliability. Coverage criteria also provide
stopping rules for the test engineers. The technical core of this book
presents the coverage criteria that are available, describes how they are
supported by tools (commercial and otherwise), explains how they can
best be applied, and suggests how they can be integrated into the overall
development process.
Software testing activities have long been categorized into levels, and
the most often used level categorization is based on traditional software
process steps. Although most types of tests can only be run after some part
of the software is implemented, tests can be designed and constructed
during all software development steps. The most time-consuming parts of
testing are actually the test design and construction, so test activities can
and should be carried out throughout development.
2.3 TESTING LEVELS BASED ON SOFTWARE
ACTIVITY
Tests can be derived from requirements and specifications, design
artifacts, or the source code. In traditional texts, a different level of testing
accompanies each distinct software development activity:
Acceptance Testing: assess software with respect to requirements or
users’ needs.
System Testing: assess software with respect to architectural design
and overall behavior.
Integration Testing: assess software with respect to subsystem design.
Module Testing: assess software with respect to detailed design.
Unit Testing: assess software with respect to implementation.
Figure 2.3, often called the “V model,” illustrates a typical scenario for
testing levels and how they relate to software development activities by
isolating each step. Information for each test level is typically derived from
the associated development activity. Indeed, standard advice is to design
the tests concurrently with each development activity, even though the
software will not be in an executable form until the implementation phase.
The reason for this advice is that the mere process of designing tests can
identify defects in design decisions that otherwise appear reasonable. Early
identification of defects is by far the best way to reduce their ultimate cost.
Note that this diagram is not intended to imply a waterfall process. The
synthesis and analysis activities generically apply to any development
process.
Figure 2.3. Software development activities and testing levels – the “V Model”.
The requirements analysis phase of software development captures the
customer’s needs. Acceptance testing is designed to determine whether the
completed software in fact meets these needs. In other words, acceptance
testing probes whether the software does what the users want. Acceptance
testing must involve users or other individuals who have strong domain
knowledge.
The architectural design phase of software development chooses
components and connectors that together realize a system whose
specification is intended to meet the previously identified requirements.
System testing is designed to determine whether the assembled system
meets its specifications. It assumes that the pieces work individually, and
asks if the system works as a whole. This level of testing usually looks for
design and specification problems. It is a very expensive place to find
lower-level faults and is usually not done by the programmers, but by a
separate testing team.
The subsystem design phase of software development specifies the
structure and behavior of subsystems, each of which is intended to satisfy
some function in the overall architecture. Often, the subsystems are
adaptations of previously developed software. Integration testing is
designed to assess whether the interfaces between modules (defined
below) in a subsystem have consistent assumptions and communicate
correctly. Integration testing must assume that modules work correctly.
Some testing literature uses the terms integration testing and system testing
interchangeably; in this book, integration testing does not refer to testing
the integrated system or subsystem. Integration testing is usually the
responsibility of members of the development team.
The detailed design phase of software development determines the
structure and behavior of individual modules. A module is a collection of
related units that are assembled in a file, package, or class. This
corresponds to a file in C, a package in Ada, and a class in C++ and Java.
Module testing is designed to assess individual modules in isolation,
including how the component units interact with each other and their
associated data structures. Most software development organizations make
module testing the responsibility of the programmer; hence the common
term developer testing.
Implementation is the phase of software development that actually
produces code. A program unit, or procedure, is one or more contiguous
program statements, with a name that other parts of the software use to call
it. Units are called functions in C and C++, procedures or functions in
Ada, methods in Java, and subroutines in Fortran. Unit testing is designed
to assess the units produced by the implementation phase and is the
“lowest” level of testing. In some cases, such as when building general-
purpose library modules, unit testing is done without knowledge of the
encapsulating software application. As with module testing, most software
development organizations make unit testing the responsibility of the
programmer, again, often called developer testing. It is straightforward to
package unit tests together with the corresponding code through the use of
tools such as JUnit for Java classes.
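As a minimal illustration, a test class such as the following (JUnit 4 annotation style, with java.util.Stack standing in for the unit under test) can live alongside the code it exercises and be run automatically with every build.

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import java.util.Stack;
    import org.junit.Test;

    // A minimal developer test packaged with the unit it exercises.
    // java.util.Stack is used here purely for illustration.
    public class StackTest {

        @Test
        public void pushThenPopReturnsLastElementPushed() {
            Stack<Integer> stack = new Stack<Integer>();
            stack.push(42);
            assertEquals(Integer.valueOf(42), stack.pop());
            assertTrue(stack.isEmpty());
        }
    }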
Because of the many dependencies among methods in classes, it is
common among developers using object-oriented (OO) software to
combine unit and module testing and use the term unit testing or
developer testing.
Not shown in Figure 2.3 is regression testing, a standard part of the
maintenance phase of software development. Regression testing is done
after changes are made to the software, to help ensure that the updated
software still possesses the functionality it had before the updates.
Mistakes in requirements and high-level design end up being
implemented as faults in the program; thus testing can reveal them.
Unfortunately, the software faults that come from requirements and design
mistakes are visible only through testing months or years after the original
mistake. The effects of the mistake tend to be dispersed throughout
multiple software components; hence such faults are usually difficult to
pin down and expensive to correct. On the positive side, even if tests
cannot be executed, the very process of defining tests can identify a
significant fraction of the mistakes in requirements and design. Hence, it is
important for test planning to proceed concurrently with requirements
analysis and design and not be put off until late in a project. Fortunately,
through techniques such as use case analysis, test planning is becoming
better integrated with requirements analysis in standard software practice.
Although most of the literature emphasizes these levels in terms of
when they are applied, a more important distinction is the types of
faults that we are looking for. The faults are based on the software artifact
that we are testing, and the software artifact that we derive the tests from.
For example, unit and module tests are derived to test units and modules,
and we usually try to find faults that can be found when executing the units
and modules individually.
One final note is that OO software changes the testing levels. OO
software blurs the distinction between units and modules, so the OO
software testing literature has developed a slight variation of these levels.
Intra-method testing evaluates individual methods. Inter-method testing
evaluates pairs of methods within the same class. Intra-class testing
evaluates a single entire class, usually as sequences of calls to methods
within the class. Finally, inter-class testing evaluates more than one class
at the same time. The first three are variations of unit and module testing,
whereas inter-class testing is a type of integration testing.
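As a small, hypothetical sketch of what these levels mean in practice (the Account and Transfer classes below are ours, invented only to make the distinctions concrete):

    class Account {
        private int balance = 0;
        void deposit(int amount)  { balance += amount; }
        void withdraw(int amount) { balance -= amount; }
        int getBalance()          { return balance; }
    }

    class Transfer {
        // Moves money between two Account objects.
        static void move(Account from, Account to, int amount) {
            from.withdraw(amount);
            to.deposit(amount);
        }
    }

    // Intra-method test: call deposit(10) on a fresh Account and check the balance.
    // Inter-method test: call deposit(10) then withdraw(4) and check that the two
    //                    methods interact correctly through the balance field.
    // Intra-class test:  run a longer sequence of deposits and withdrawals against
    //                    a single Account object.
    // Inter-class test:  exercise Transfer.move() across two Account objects,
    //                    a form of integration testing.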
2.4 COVERAGE CRITERIA
The essential problem with testing is the numbers. Even a small program
has a huge number of possible inputs. Consider a tiny method that
computes the average of three integers. We have only three input
variables, but each can have any value between -MAXINT and
+MAXINT. On a 32-bit machine, each variable has a possibility of over 4
billion values. With three inputs, this means the method has nearly 80
octillion possible inputs!
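The arithmetic is easy to check: each 32-bit int has 2^32 possible values, so three independent int parameters give (2^32)^3 = 2^96 combinations. The short sketch below (class name ours) simply computes that number.

    import java.math.BigInteger;

    // Computes the size of the input space for a method taking three 32-bit ints.
    public class InputSpaceSize {
        public static void main(String[] args) {
            BigInteger perVariable = BigInteger.valueOf(2).pow(32);
            BigInteger total = perVariable.pow(3);
            System.out.println("Values per int variable: " + perVariable); // 4294967296
            System.out.println("Inputs for three ints:   " + total);
            // Prints 79228162514264337593543950336, roughly 7.9 * 10^28.
        }
    }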
So no matter whether we are doing unit testing, integration testing, or
system testing, it is impossible to test with all inputs. The input space is, to
all practical purposes, infinite. Thus a test designer’s goal could be
summarized in a very high-level way as searching a huge input space,
hoping to find the fewest tests that will reveal the most problems. This is
the source of two key problems in testing: (1) how do we search? and (2)
when do we stop? Coverage criteria give us structured, practical ways to
search the input space. Satisfying a coverage criterion gives a tester some
amount of confidence in two crucial goals: (A) we have looked in many
corners of the input space, and (B) our tests have a fairly low amount of
overlap.
Coverage criteria have many advantages for improving the quality and
reducing the cost of test data generation. Coverage criteria can maximize
the “bang for the buck,” with fewer tests that are effective at finding more
faults. Well-designed criteria-based tests will be comprehensive, yet factor
out unwanted redundancy. Coverage criteria also provide traceability from
software artifacts such as source, design models, requirements, and input
space descriptions. This supports regression testing by making it easier to
decide which tests need to be reused, modified, or deleted. From an
engineering perspective, one of the strongest benefits of coverage criteria
is they provide a “stopping rule” for testing; that is, we know in advance
approximately how many tests are needed and we know when we have
“enough” tests. This is a powerful tool for engineers and managers.
Coverage criteria also lend themselves well to automation. As we will
formalize in Chapter 5, a test requirement is a specific element of a
software artifact that a test case must satisfy or cover, and a coverage
criterion is a rule or collection of rules that yield test requirements. For
example, the coverage criterion “cover every statement” yields one test
requirement for each statement. The coverage criterion “cover every
functional requirement” yields one test requirement for each functional
requirement. Test requirements can be stated in semi-formal, mathematical
terms, and then manipulated algorithmically. This allows much of the test
data design and generation process to be automated.
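As a small, hypothetical illustration of the test requirements generated by the “cover every statement” criterion (the method below is ours, not taken from the book):

    // Each statement in this method yields one test requirement under the
    // "cover every statement" criterion.
    public static int absoluteValue(int x) {
        int result = x;        // requirement r1: execute this statement
        if (x < 0) {           // requirement r2
            result = -x;       // requirement r3
        }
        return result;         // requirement r4
    }

    // Test requirements: { r1, r2, r3, r4 }.
    // The single test absoluteValue(-5) covers all four requirements;
    // the test absoluteValue(3) covers r1, r2, and r4 but leaves r3 uncovered.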
The research literature presents a lot of overlapping and identical
coverage criteria. Researchers have invented hundreds of criteria on
dozens of software artifacts. However, if we abstract these artifacts into
mathematical models, many criteria turn out to be exactly the same. For
example, the idea of covering pairs of edges in finite state machines was
first published in 1976, using the term switch cover. Later, the same idea
was applied to control flow graphs and called two-trip; still again, the same
idea was “invented” for state transition diagrams and called transition-pair
(we define this formally using the generic term edge-pair in Chapter 7).
Although they looked very different in the research literature, if we
generalize these structures to graphs, all three ideas are the same.
Similarly, node coverage and edge coverage have each been defined
dozens of times.
Sidebar
Black-Box and White-Box Testing
Black-box testing and the complementary white-box testing are old and
widely used terms in software testing. In black-box testing, we derive
tests from external descriptions of the software, including
specifications, requirements, and design. In white-box testing, on the
other hand, we derive tests from the source code internals of the
software, specifically including branches, individual conditions, and
statements. This somewhat arbitrary distinction started to lose
coherence when the term gray-box testing was applied to developing
tests from design elements, and the approach taken in this book
eliminates the need for the distinction altogether.
Some older sources say that white-box testing is used for system testing
and black-box testing for unit testing. This distinction is certainly false,
since all testing techniques considered to be white-box can be used at
the system level, and all testing techniques considered to be black-box
can be used on individual units. In reality, unit testers are currently
more likely to use white-box testing than system testers are, simply
because white-box testing requires knowledge of the program and is
more expensive to apply, costs that can balloon on a large system.
This book relies on developing tests from mathematical abstractions
such as graphs and logical expressions. As will become clear in Part II,
these structures can be extracted from any software artifact, including
source, design, specifications, or requirements. Thus asking whether a
coverage criterion is black-box or white-box is the wrong question. One
more properly should ask from what level of abstraction is the structure
drawn.
In fact, all test coverage criteria can be boiled down to a few dozen
criteria on just four mathematical structures: input domains, graphs, logic
expressions, and syntax descriptions (grammars). Just like mechanical,
civil, and electrical engineers use calculus and algebra to create abstract
representations of physical structures, then solve various problems at this
abstract level, software engineers can use discrete math to create abstract
representations of software, then solve problems such as test design.
The core of this book is organized around these four structures, as
reflected in the four chapters in Part II. This structure greatly simplifies
teaching test design, and our classroom experience with the first edition of
this book helped us realize this structure also leads to a simplified testing
process. This process allows test design to be abstracted and carried out
efficiently, and also separates test activities that need different knowledge
and skill sets. Because the approach is based on these four abstract models,
we call it the Model-Driven Test Design process (MDTD).
Sidebar
MDTD and Model-Based Testing
Model-based testing (MBT) is the design of software tests from an
abstract model that represents one or more aspects of the software. The
model usually, but not always, represents some aspects of the behavior
of the software, and sometimes, but not always, is able to generate
expected outputs. The models are often described with UML diagrams,
although more formal models as well as other informal modeling
languages are also used. MBT typically assumes that the model has
been built to specify the behavior of the software and was created
during a design stage of development.
The ideas presented in this book are not, strictly speaking, exclusive to
model-based testing. However, there is much overlap with MDTD and
most of the concepts in this book can be directly used as part of MBT.
Specifically, we derive our tests from abstract structures that are very
similar to models. An important difference is that these structures can
be created after the software is implemented, by the tester as part of
test design. Thus, the structures do not specify behavior; they represent
behavior. If a model was created to specify the software behavior, a
tester can certainly use it, but if not, a tester can create one. Second, we
create idealized structures that are more abstract than most modeling
languages. For example, instead of UML statecharts or Petri nets, we
design our tests from graphs. If model-based testing is being used, the
graphs can be derived from a graphical model. Third, model-based
testing explicitly does not use the source code implementation to design
tests. In this book, abstract structures can be created from the
implementation via things like control flow graphs, call graphs, and
conditionals in decision statements.
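For example, a tester can derive a graph directly from source code. The hypothetical method below and the node and edge labels in its comments are ours, sketched only to show the idea; graph coverage criteria themselves are the subject of Part II.

    // A control flow graph can be extracted from an implementation and then
    // used with graph coverage criteria.
    public static String classify(int x) {
        // node n1: entry and decision
        if (x >= 0) {
            return "non-negative";  // node n2
        } else {
            return "negative";      // node n3
        }
    }

    // Derived control flow graph:
    //   nodes: { n1, n2, n3 }
    //   edges: { (n1, n2), (n1, n3) }
    // Edge coverage is satisfied by two tests, for example classify(1) and classify(-1).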
2.5 MODEL-DRIVEN TEST DESIGN
Academic teachers and researchers have long focused on the design of
tests. We define test design to be the process of creating input values that
will effectively test software. This is the most mathematical and
technically challenging part of testing; however, academics can easily
forget that this is only a small part of testing.
The job of developing tests can be divided into four discrete tasks: test
design, test automation, test execution, and test evaluation. Many
organizations assign the same person to all tasks. However, each task
requires different skills, background knowledge, education, and training.
Assigning the same person to all these tasks is akin to assigning the same
software developer to requirements, design, implementation, integration,
and configuration control. Although this was common in previous decades,
few companies today assign the same engineers to all development tasks.
Engineers specialize, sometimes temporarily, sometimes for a project, and
sometimes for their entire career. But should test organizations still assign
the same people to all test tasks? The tasks require different skills, and it is
unreasonable to expect all testers to be good at all tasks, so this clearly
wastes resources. The following subsections analyze each of these tasks in
detail.
2.5.1 Test Design
As said above, test design is the process of designing input values that will
effectively test software. In practice, engineers use two general approaches
to designing tests. In criteria-based test design, we design test values that
satisfy engineering goals such as coverage criteria. In human-based test
design, we design test values based on domain knowledge of the program
and human knowledge of testing. These are quite different activities.
Criteria-based test design is the most technical and mathematical job in
software testing. To apply criteria effectively, the tester needs knowledge
of discrete math, programming, and testing. That is, this requires much of
a traditional degree in computer science. For somebody with a degree in
computer science or software engineering, this is intellectually stimulating,
rewarding, and challenging. Much of the work involves creating abstract
models and manipulating them to design high-quality tests. In software
development, this is analogous to the job of software architect; in building
construction, this is analogous to the job of construction engineer. If an
organization uses people who are not qualified (that is, do not have the
required knowledge), they will spend time creating ineffective tests and be
dissatisfied at work.
Human-based test design is quite different. The testers must have
knowledge of the software’s application domain, of testing, and of user
interfaces. Human-based test designers explicitly attempt to find stress
tests, tests that stress the software by including very large or very small
values, boundary values, invalid values, or other values that the software
may not expect during typical behavior. Human-based testers also
explicitly consider actions the users might do, including unusual actions.
This is much harder than developers may think and more necessary than
many test researchers and educators realize. Although criteria-based
approaches often implicitly include techniques such as stress testing, they
can be blind to special situations, and may miss problems that human-based tests would not. Although almost no traditional CS is needed, an
empirical background (biology or psychology) or a background in logic
(law, philosophy, math) is helpful. If the software is embedded on an
airplane, a human-based test designer should understand piloting; if the
software runs an online store, the test designers should understand
marketing and the products being sold. For people with these abilities,
human-based test design is intellectually stimulating, rewarding, and
challenging–but ...