Why Do We Test Software?
Testing in the 21st Century
Software defines behavior
– network routers, finance, switching networks, other infrastructure
Today’s software market :
– is much bigger
– is more competitive
– has more users
Embedded Control Applications
– airplanes, air traffic control
– spaceships
– watches
– ovens
– remote controllers
– PDAs
– memory seats
– DVD players
– garage door openers
– cell phones
Agile processes put increased pressure on testers
– Programmers must unit test – with no training or education!
Industry is going through a revolution in what testing means to the success of software products
Software is a Skin that Surrounds Our Civilization
Quote due to Dr. Mark Harman
Software Faults, Errors & Failures
Software Fault : A static defect in the software
Software Failure : External, incorrect behavior with respect to the requirements or other description of the expected behavior
Software Error : An incorrect internal state that is the manifestation of some fault
Faults in software are equivalent to design mistakes in hardware.
Software does not degrade.
A Concrete Example
public static int numZero (int [ ] arr)
{  // Effects: If arr is null throw NullPointerException
   // else return the number of occurrences of 0 in arr
   int count = 0;
   for (int i = 1; i < arr.length; i++)
   {
      if (arr [ i ] == 0)
      {
         count++;
      }
   }
   return count;
}
Fault: Should start searching at 0, not 1
Test 1 [ 2, 7, 0 ] Expected: 1 Actual: 1
– Error: i is 1, not 0, on the first iteration
– Failure: none
Test 2 [ 0, 2, 7 ] Expected: 1 Actual: 0
– Error: i is 1, not 0
– Error propagates to the variable count
– Failure: count is 0 at the return statement
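The two tests above can be replayed with a small sketch. It copies the faulty method from the slide and adds a corrected variant that applies the fix the slide names (start at index 0). The class name NumZeroDemo and the method names numZeroFaulty and numZeroFixed are made up for this illustration.

```java
// Illustrative sketch only: class and method names are hypothetical.
final class NumZeroDemo {
    // Faulty version from the slide: the loop starts at index 1,
    // so arr[0] is never examined.
    static int numZeroFaulty(int[] arr) {
        int count = 0;
        for (int i = 1; i < arr.length; i++) {
            if (arr[i] == 0) {
                count++;
            }
        }
        return count;
    }

    // Corrected version: start searching at index 0.
    static int numZeroFixed(int[] arr) {
        int count = 0;
        for (int i = 0; i < arr.length; i++) {
            if (arr[i] == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] test1 = {2, 7, 0};  // the error does not propagate: no failure
        int[] test2 = {0, 2, 7};  // the error reaches count: failure
        System.out.println("Test 1 faulty=" + numZeroFaulty(test1)
                + " fixed=" + numZeroFixed(test1));
        System.out.println("Test 2 faulty=" + numZeroFaulty(test2)
                + " fixed=" + numZeroFixed(test2));
    }
}
```

Both tests execute the fault and both start in an incorrect state, but only Test 2 has a zero at index 0, so only Test 2 turns the error into a visible failure.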
Spectacular Software Failures
NASA’s Mars lander: September 1999, crashed due to a units integration fault
THERAC-25 radiation machine : Poor testing of safety-critical software can cost lives : 3 patients were killed
Ariane 5 explosion : Very expensive
– exception-handling bug : forced self destruct on maiden flight (64-bit to 16-bit conversion: about 370 million $ lost)
Intel’s Pentium FDIV fault : Public relations nightmare
[Figures: Mars Polar Lander crash site?; THERAC-25 design; Ariane 5]
We need our software to be dependable
Testing is one way to assess dependability
Northeast Blackout of 2003
Affected 10 million people in Ontario, Canada
Affected 40 million people in 8 US states
Financial losses of $6 Billion USD
508 generating units and 256 power plants shut down
The alarm system in the energy management system failed due to a software error and operators were not informed of the power overload in the system
Testing in the 21st Century
More safety critical, real-time software
Embedded software is ubiquitous … check your pockets
Enterprise applications means bigger programs, more users
Paradoxically, free software increases our expectations !
Security is now all about software faults
– Secure software is reliable software
The web offers a new deployment platform
– Very competitive and very available to more users
– Web apps are distributed
– Web apps must be highly reliable
What Does This Mean?
Software testing is getting more important
What are we trying to do when we test ?
What are our goals ?
Validation & Verification ( IEEE )
Validation : The process of evaluating software at the end of software development to ensure compliance with intended usage
Verification : The process of determining whether the products of a given phase of the software development process fulfill the requirements established during the previous phase
IV&V stands for “independent verification and validation”
Testing Goals Based on Test Process Maturity
Level 0 : There’s no difference between testing and debugging
Level 1 : The purpose of testing is to show correctness
Level 2 : The purpose of testing is to show that the software doesn’t work
Level 3 : The purpose of testing is not to prove anything specific, but to reduce the risk of using the software
Level 4 : Testing is a mental discipline that helps all IT professionals develop higher quality software
Where Are You?
Are you at level 0, 1, or 2 ?
Is your organization at work at level 0, 1, or 2 ?
Or 3?
We hope to teach you to become
“change agents” in your workplace …
Advocates for level 4 thinking
Tactical Goals : Why Each Test ?
Written test objectives and requirements must be documented
What are your planned coverage levels?
How much testing is enough?
Common objective – spend the budget … test until the ship-date …
– Sometimes called the “date criterion”
If you don’t know why you’re conducting each test, it won’t be very helpful
Cost of Not Testing
Testing is the most time consuming and expensive part of software development
Not testing is even more expensive
If we have too little testing effort early, the cost of testing increases
Planning for testing after development is prohibitively expensive
Poor Program Managers might say:
“Testing is too expensive.”
Cost of Late Testing
[Figure: chart of fault origin (%), fault detection (%), and unit cost (X); vertical axis from 0 to 60]
Assume $1000 unit cost, per fault, 100 faults
Software Engineering Institute; Carnegie Mellon University; Handbook CMU/SEI-96-HB-002
Summary: Why Do We Test Software ?
A tester’s goal is to eliminate faults as early as possible
• Improve quality
• Reduce cost
• Preserve customer satisfaction
Model-Driven Test Design
Complexity of Testing Software
No other engineering field builds products as complicated as software
The term correctness has no meaning
– Is a building correct?
– Is a car correct?
– Is a subway system correct?
Like other engineers, we must use abstraction to manage complexity
– This is the purpose of the model-driven test design process
– The “model” is an abstract structure
Software Testing Foundations
Testing can only show the presence of failures
Not their absence
Testing & Debugging
Testing : Evaluating software by observing its execution
Test Failure : Execution of a test that results in a software failure
Debugging : The process of finding a fault given a failure
Not all inputs will “trigger” a fault into causing a failure
Fault & Failure Model
Three conditions necessary for a failure to be observed
1. Reachability : The location or locations in the program that contain the fault must be reached
2. Infection : The state of the program must be incorrect
3. Propagation : The infected state must cause some output or final state of the program to be incorrect
Software Testing Activities
Test Engineer : An IT professional who is in charge of one or more technical test activities
– Designing test inputs
– Producing test values
– Running test scripts
– Analyzing results
– Reporting results to developers and managers
Test Manager : In charge of one or more test engineers
– Sets test policies and processes
– Interacts with other managers on the project
– Otherwise supports the engineers
Traditional Testing Levels
[Figure: main Class P; Class A with methods mA1() and mA2(); Class B with methods mB1() and mB2()]
Acceptance testing : Is the software acceptable to the user?
System testing : Test the overall functionality of the system
Integration testing : Test how modules interact with each other
Module testing (developer testing) : Test each class, file, module, component
Unit testing (developer testing) : Test each unit (method) individually
This view obscures underlying similarities
Object-Oriented Testing Levels
[Figure: Class A with methods mA1() and mA2(); Class B with methods mB1() and mB2()]
Intra-class testing : Test an entire class as sequences of calls
Inter-class testing : Test multiple classes together
Inter-method testing : Test pairs of methods in the same class
Intra-method testing : Test each method
Coverage Criteria
Even small programs have too many inputs to fully test them all
– private static double computeAverage (int A, int B, int C)
– On a 32-bit machine, each variable has over 4 billion possible values
– About 80 octillion possible tests (2^96, roughly 7.9 × 10^28)!!
– Input space might as well be infinite
Testers search a huge input space
– Trying to find the fewest inputs that will find the most problems
Coverage criteria give structured, practical ways to search the input space
– Search the input space thoroughly
– Not much overlap in the tests
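The size of the computeAverage input space can be checked with a few lines of arithmetic. This is a minimal sketch that assumes three independent 32-bit int parameters, as on the slide; the class name InputSpaceSize and the method combinations are invented for the illustration.

```java
import java.math.BigInteger;

// Illustrative sketch only: class and method names are hypothetical.
final class InputSpaceSize {
    // Number of distinct input combinations for `params` parameters
    // that each take `bits` bits.
    static BigInteger combinations(int bits, int params) {
        return BigInteger.valueOf(2).pow(bits * params);
    }

    public static void main(String[] args) {
        // One 32-bit int: 4294967296 values ("over 4 billion").
        System.out.println(combinations(32, 1));
        // Three 32-bit ints: 2^96 combinations.
        System.out.println(combinations(32, 3));
    }
}
```

Even at a billion tests per second, running 2^96 tests would take far longer than any project schedule, which is why the input space "might as well be infinite".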
Advantages of Coverage Criteria
Maximize the “bang for the buck”
Provide traceability from software artifacts to tests
– Source, requirements, design models, …
Make regression testing easier
Give testers a “stopping rule” … when testing is finished
Can be well supported with powerful tools
Test Requirements and Criteria
Test Criterion : A collection of rules and a process that define test requirements
– Cover every statement
– Cover every functional requirement
Test Requirements : Specific things that must be satisfied or covered during testing
– Each statement is a test requirement
– Each functional requirement is a test requirement
Testing researchers have defined dozens of criteria, but they are all really just a few criteria on four types of structures …
1. Graphs
2. Logic expressions
3. Input domains
4. Syntax descriptions
Types of Test Activities
Testing can be broken up into four general types of activities
1. Test Design
   1.a) Criteria-based
   1.b) Human-based
2. Test Automation
3. Test Execution
4. Test Evaluation
Each type of activity requires different skills, background knowledge, education and training
No reasonable software development organization uses the same people for requirements, design, implementation, integration and configuration control
Why do test organizations still use the same people for all four test activities??
This clearly wastes resources
Other Activities
Test management : Sets policy, organizes team, interfaces with development, chooses criteria, decides how much automation is needed, …
Test maintenance : Save tests for reuse as software evolves
– Requires cooperation of test designers and automators
– Deciding when to trim the test suite is partly policy and partly technical – and in general, very hard !
– Tests should be put in configuration control
Test documentation : All parties participate
– Each test must document “why” – criterion and test requirement satisfied or a rationale for human-designed tests
– Ensure traceability throughout the process
– Keep documentation in the automated tests
Using MDTD in Practice
This approach lets one test designer do the math
Then traditional testers and programmers can do their parts
– Find values
– Automate the tests
– Run the tests
– Evaluate the tests
Just like in traditional engineering … an engineer constructs models with calculus, then gives direction to carpenters, electricians,
technicians, …
Test designers become the technical
experts
Model-Driven Test Design
[Figure: flow from software artifact to model / structure to test requirements to refined requirements / test specs at the design abstraction level, then to input values, test cases, test scripts, test results, and pass / fail at the implementation abstraction level]
Model-Driven Test Design – Steps
[Figure: flow from software artifact through model / structure, test requirements, refined requirements / test specs, input values, test cases, test scripts, and test results to pass / fail, spanning the design and implementation abstraction levels. The steps between artifacts are labeled: analysis, domain analysis, criterion, refine, generate, prefix / postfix / expected, automate, execute, evaluate]
Model-Driven Test Design –Activities
[Figure: flow from software artifact through model / structure, test requirements, refined requirements / test specs, input values, test cases, test scripts, and test results to pass / fail, with the activities Test Design, Test Execution, and Test Evaluation marked across the design and implementation abstraction levels]
Raising our abstraction level makes test design MUCH easier
Small Illustrative Example
Software Artifact : Java Method
/**
 * Return index of node n at the
 * first position it appears,
 * -1 if it is not present
 */
public int indexOf (Node n)
{
   for (int i=0; i < path.size(); i++)
      if (path.get(i).equals(n))
         return i;
   return -1;
}
Control Flow Graph
[Figure: control flow graph with nodes 1 to 5, labeled i = 0, i < path.size(), if, return i, return -1]
Example (2)
Graph : Abstract version
[Figure: graph with nodes 1 to 5]
Edges : (1, 2), (2, 3), (3, 2), (3, 4), (2, 5)
Initial Node: 1
Final Nodes: 4, 5
6 requirements for Edge-Pair Coverage
1. [1, 2, 3]
2. [1, 2, 5]
3. [2, 3, 4]
4. [2, 3, 2]
5. [3, 2, 3]
6. [3, 2, 5]
Test Paths
[1, 2, 5]
[1, 2, 3, 2, 5]
[1, 2, 3, 2, 3, 4]
Find values …
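The six edge-pair requirements and the claim that the three test paths cover them can be checked mechanically. The sketch below is one possible way to do that, not the tooling used in the course; the class name EdgePairCoverage and its method names are invented here. It derives every sequence [a, b, c] where a to b and b to c are both edges, then collects every length-3 window toured by the test paths.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: class and method names are hypothetical.
final class EdgePairCoverage {
    // Edges of the abstract graph from the slide.
    static final int[][] EDGES = {{1, 2}, {2, 3}, {3, 2}, {3, 4}, {2, 5}};

    // Test paths from the slide.
    static final List<List<Integer>> TEST_PATHS = Arrays.asList(
            Arrays.asList(1, 2, 5),
            Arrays.asList(1, 2, 3, 2, 5),
            Arrays.asList(1, 2, 3, 2, 3, 4));

    // Test requirements: every [a, b, c] such that (a, b) and (b, c) are edges.
    static Set<List<Integer>> requirements(int[][] edges) {
        Set<List<Integer>> trs = new LinkedHashSet<>();
        for (int[] first : edges) {
            for (int[] second : edges) {
                if (first[1] == second[0]) {
                    trs.add(Arrays.asList(first[0], first[1], second[1]));
                }
            }
        }
        return trs;
    }

    // Every window of three consecutive nodes that some test path tours.
    static Set<List<Integer>> covered(List<List<Integer>> paths) {
        Set<List<Integer>> seen = new LinkedHashSet<>();
        for (List<Integer> path : paths) {
            for (int i = 0; i + 2 < path.size(); i++) {
                seen.add(Arrays.asList(path.get(i), path.get(i + 1), path.get(i + 2)));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Set<List<Integer>> trs = requirements(EDGES);
        System.out.println("Requirements: " + trs);
        System.out.println("All covered: " + covered(TEST_PATHS).containsAll(trs));
    }
}
```

Working at the graph level like this is what makes the design step independent of the Java source: the same check applies to any artifact that yields the same graph.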
Test Automation
What is Test Automation?
The use of software to control the execution of tests, the comparison of actual outcomes to predicted outcomes, the setting up of test preconditions, and other test control and test reporting functions
Reduces cost
Reduces human error
Reduces variance in test quality from different individuals
Significantly reduces the cost of regression testing
Components of a Test Case
A test case is a multipart artifact with a definite structure
Test case values : The values that directly satisfy one test requirement
Expected results : The result that will be produced when executing the test if the program satisfies its intended behavior
What is JUnit?
Open source Java testing framework used to write and run repeatable automated tests
JUnit is open source (junit.org)
A structure for writing test drivers
JUnit features include:
– Assertions for testing expected results
– Test features for sharing common test data
– Test suites for easily organizing and running tests
– Graphical and textual test runners
JUnit is widely used in industry
JUnit can be used as stand alone Java programs (from the command line) or within an IDE such as Eclipse
Writing Tests for JUnit
Need to use the methods of the junit.framework.Assert class (org.junit.Assert in JUnit 4)
– javadoc gives a complete description of its capabilities
Each test method checks a condition (assertion) and reports to the test runner whether the test failed or succeeded
The test runner uses the result to report to the user (in command line mode) or update the display (in an IDE)
All of the methods return void
A few representative methods of junit.framework.Assert
– assertTrue (boolean)
– assertTrue (String, boolean)
JUnit Test Fixtures
A test fixture is the state of the test
– Objects and variables that are used by more than one test – Initializations (prefix values)
– Reset values (postfix values)
Different tests can use the objects without sharing the state
Objects used in test fixtures should be declared as instance variables
They should be initialized in a @Before method
Can be deallocated or reset in an @After method
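The point that tests use the fixture objects without sharing state can be seen in a plain-Java stand-in for the lifecycle a JUnit 4 runner follows: a fresh test object per test method, the @Before method first, the test, then the @After method. This sketch deliberately avoids the JUnit library so it runs on its own; the class FixtureLifecycleSketch and all of its methods are hypothetical, and the real runner does this through annotations rather than explicit calls.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: imitates the runner's call order by hand.
final class FixtureLifecycleSketch {
    private List<String> list;  // test fixture, declared as an instance variable

    void setUp() {              // plays the role of an @Before method
        list = new ArrayList<>();
    }

    void tearDown() {           // plays the role of an @After method
        list = null;
    }

    // Each "test" reports the fixture size it saw on entry, then changes the fixture.
    int testAddCat() {
        int sizeOnEntry = list.size();
        list.add("cat");
        return sizeOnEntry;
    }

    int testAddDog() {
        int sizeOnEntry = list.size();
        list.add("dog");
        return sizeOnEntry;
    }

    // Runs both tests the way the runner would and returns the sizes seen on entry.
    static int[] runAll() {
        int[] sizes = new int[2];
        FixtureLifecycleSketch first = new FixtureLifecycleSketch();
        first.setUp();
        sizes[0] = first.testAddCat();
        first.tearDown();
        FixtureLifecycleSketch second = new FixtureLifecycleSketch();
        second.setUp();
        sizes[1] = second.testAddDog();
        second.tearDown();
        return sizes;
    }

    public static void main(String[] args) {
        int[] sizes = runAll();
        System.out.println("Sizes on entry: " + sizes[0] + ", " + sizes[1]);
    }
}
```

Because the second test starts from an empty list even though the first test added an element, the order in which the tests run does not affect their results.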
Simple JUnit Example
public class Calc
{
   static public int add (int a, int b)
   {
      return a + b;
   }
}

import org.junit.Test;
import static org.junit.Assert.*;

public class CalcTest
{
   @Test public void testAdd()
   {
      // The message is reported only if the assertion fails
      assertTrue ("Calc sum incorrect", 5 == Calc.add (2, 3));
   }
}
Testing the Min Class
import java.util.*;

public class Min
{
   /**
    * Returns the minimum element in a list
    * @param list Comparable list of elements to search
    * @return the minimum element in the list
    * @throws NullPointerException if list is null or
    *         if any list elements are null
    * @throws ClassCastException if list elements are not mutually comparable
    * @throws IllegalArgumentException if list is empty
    */
   public static <T extends Comparable<? super T>> T min (List<? extends T> list)
   {
      if (list.size() == 0)
      {
         throw new IllegalArgumentException ("Min.min");
      }
      Iterator<? extends T> itr = list.iterator();
      T result = itr.next();
      if (result == null) throw new NullPointerException ("Min.min");
      while (itr.hasNext())
      {  // throws NPE, CCE as needed
         T comp = itr.next();
         if (comp.compareTo (result) < 0)
         {
            result = comp;
         }
      }
      return result;
   }
}
MinTest Class
Standard imports for all JUnit classes :
import static org.junit.Assert.*;
import org.junit.*;
import java.util.*;
Test fixture and pre-test setup method (prefix) :
private List<String> list; // Test fixture
// Set up - Called before every test method.
@Before
public void setUp()
{
   list = new ArrayList<String>();
}
Post test teardown method (postfix) :
// Tear down - Called after every test method.
@After
public void tearDown()
{
   list = null; // redundant in this example!
}
Min Test Cases: NullPointerException
This NullPointerException test uses the fail assertion
@Test public void testForNullList()
{
   list = null;
   try {
      Min.min (list);
   } catch (NullPointerException e) {
      return;
   }
   fail ("NullPointerException expected");
}
This NullPointerException test decorates the @Test annotation with the class of the exception
@Test(expected = NullPointerException.class)
public void testForNullElement()
{
   list.add (null);
   list.add ("cat");
   Min.min (list);
}
This NullPointerException test catches an easily overlooked special case
@Test(expected = NullPointerException.class)
public void testForSoloNullElement()
{
   list.add (null);
   Min.min (list);
}
Remaining Test Cases for Min
Note that Java generics don’t prevent clients from using raw types!
@Test(expected = ClassCastException.class)
@SuppressWarnings("unchecked")
public void testMutuallyIncomparable()
{
   List list = new ArrayList();
   list.add ("cat");
   list.add ("dog");
   list.add (1);
   Min.min (list);
}
Special case: Testing for the empty list
@Test(expected = IllegalArgumentException.class)
public void testEmptyList()
{
   Min.min(list);
}
Finally! A couple of “Happy Path” tests
@Test
public void testSingleElement()
{
   list.add ("cat");
   Object obj = Min.min (list);
   assertTrue("Single Element List", obj.equals("cat"));
}
@Test
public void testDoubleElement()
{
   list.add ("dog");
   list.add ("cat");
   Object obj = Min.min (list);
   assertTrue ("Double Element List", obj.equals ("cat"));
}
Question: Where Does The Data Come From?
Answer:
– All combinations of values from @DataPoints annotations where assume clause is true
– Four (of nine) combinations in this particular case
– Note: @DataPoints format is an array
@DataPoints
public static String[] string = {"ant", "bat", "cat"};
@DataPoints
public static Set[] sets = {
   new HashSet (Arrays.asList ("ant", "bat")),
   new HashSet (Arrays.asList ("bat", "cat", "dog", "elk")),
   new HashSet (Arrays.asList ("Snap", "Crackle", "Pop"))
};
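The "four of nine" figure can be reproduced without the Theories runner by enumerating every string and set pair by hand. The theory itself is not shown in this section, so the sketch assumes its assume clause is "the set contains the string"; that assumption, the class name DataPointCombinations, and the helper names are all illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: enumerates data point combinations by hand.
final class DataPointCombinations {
    static final String[] STRINGS = {"ant", "bat", "cat"};

    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    static final List<Set<String>> SETS = Arrays.asList(
            setOf("ant", "bat"),
            setOf("bat", "cat", "dog", "elk"),
            setOf("Snap", "Crackle", "Pop"));

    // ASSUMPTION: stands in for the theory's assume clause.
    static boolean assumption(Set<String> set, String s) {
        return set.contains(s);
    }

    // Total number of (string, set) combinations.
    static int countAll() {
        return STRINGS.length * SETS.size();
    }

    // Number of combinations for which the assumption holds.
    static int countAccepted() {
        int accepted = 0;
        for (Set<String> set : SETS) {
            for (String s : STRINGS) {
                if (assumption(set, s)) {
                    accepted++;
                }
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(countAccepted() + " of " + countAll() + " combinations pass the assumption");
    }
}
```

Under that assumption the first set accepts "ant" and "bat", the second accepts "bat" and "cat", and the third accepts none, which matches the count on the slide.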
JUnit Theories Need BoilerPlate
import org.junit.*;
import org.junit.runner.RunWith;
import static org.junit.Assert.*;
import static org.junit.Assume.*;
import org.junit.experimental.theories.DataPoint;
import org.junit.experimental.theories.DataPoints;
import org.junit.experimental.theories.Theories;
import org.junit.experimental.theories.Theory;
import java.util.*;
@RunWith (Theories.class)
public class SetTheoryTest
{
   … // See Earlier Slides
}
Running from a Command Line
This is all we need to run JUnit in an IDE (like Eclipse)
We need a main() for command line execution …
AllTests
import org.junit.runner.RunWith;
import org.junit.runners.Suite;
import junit.framework.JUnit4TestAdapter;
// This section declares all of the test classes in the program.
@RunWith (Suite.class)
@Suite.SuiteClasses ({ StackTest.class }) // Add test classes here.
public class AllTests
{
   // Execution begins in main(). This test class executes a
   // test runner that tells the tester if any fail.
   public static void main (String[] args)
   {
      junit.textui.TestRunner.run (suite());
   }
   // The suite() method helps when using JUnit 3 Test Runners or Ant.
   public static junit.framework.Test suite()
   {
      return new JUnit4TestAdapter (AllTests.class);
   }
}
How to Run Tests
JUnit provides test drivers
– Character-based test driver runs from the command line
– GUI-based test driver : junit.swingui.TestRunner
• Allows programmer to specify the test class to run
• Creates a “Run” button
If a test fails, JUnit gives the location of the failure and any exceptions that were thrown
Summary
The only way to make testing efficient as well as effective is to automate as much as possible
JUnit provides a very simple way to automate our unit tests
It is no “silver bullet” however … it does not solve the hard problem of testing :
What test values to use ?
• This is test design … the purpose of test criteria
Criteria-Based Test Design
Changing Notions of Testing
Old view focused on testing at each software development phase as being very different from other phases
– Unit, module, integration, system …
New view is in terms of structures and criteria
– Graphs, logical expressions, syntax, input space
Test design is largely the same at each phase
– Creating the model is different
– Choosing values and automating the tests is different
New : Test Coverage Criteria
Test Requirements : Specific things that must be satisfied or covered during testing
Test Criterion : A collection of rules and a process that define test requirements
A tester’s job is simple : Define a model of the software, then find ways to cover it
Testing researchers have defined dozens of
criteria, but they are all really just a few criteria on four types of structures …
Criteria Based on Structures
Structures : Four ways to model software
1. Graphs
2. Logical Expressions
   (not X or not Y) and A and B
3. Input Domain Characterization
   A: {0, 1, >1}
   B: {600, 700, 800}
   C: {swe, cs, isa, infs}
4. Syntactic Structures
   if (x > y) z = x - y;
   else z = 2 * x;
Old View : Black and White Boxes
Black-box testing : Derive tests from external
descriptions of the software, including specifications, requirements, and design
White-box testing : Derive tests from the source code internals of the software, specifically including branches, individual conditions, and statements
Model-based testing : Derive tests from a model of the software (such as a UML diagram)
MDTD makes these distinctions less important.
The more general question is:
from what abstraction level do we derive tests?
Source of Structures
These structures can be extracted from lots of software artifacts
– Graphs can be extracted from UML use cases, finite state machines, source code, …
– Logical expressions can be extracted from decisions in program source, guards on transitions, conditionals in use cases, …
This is not the same as “model-based testing,” which derives tests from a model that describes some aspects of the
system under test
– The model usually describes part of the behavior
– The source is usually not considered a model
1. Graph Coverage – Structural
[Figure: graph with nodes 1 to 7]
This graph may represent
• statements & branches
• methods & calls
• components & signals
• states and transitions
Node (Statement) : Cover every node
• 12567
• 1343567
Edge (Branch) : Cover every edge
• 12567
• 1343567
• 1357
Path : Cover every path
• 12567
• 1257
• 13567
• 1357
• 1343567
• 134357 …
1. Graph Coverage – Data Flow
[Figure: graph with nodes 1 to 7, annotated with def and use sets]
This graph contains:
• defs: nodes & edges where variables get values
• uses: nodes & edges where values are accessed
Defs : node 1 {x, y}; node 2 {a, m}; node 3 {a}; node 4 {m}; node 6 {m}
Uses : edges (1,2) and (1,3) {x}; node 4 {y}; edges (5,6) and (5,7) {a}; node 6 {y}; node 7 {m}
Defs & Uses Pairs
• (x, 1, (1,2)), (x, 1, (1,3))
• (y, 1, 4), (y, 1, 6)
• (a, 2, (5,6)), (a, 2, (5,7)), (a, 3, (5,6)), (a, 3, (5,7)),
• (m, 2, 7), (m, 4, 7), (m, 6, 7)
All Defs : Every def used once
• 1, 2, 5, 6, 7
• 1, 2, 5, 7
• 1, 3, 4, 3, 5, 7
All Uses : Every def “reaches” every use
• 1, 2, 5, 6, 7
• 1, 2, 5, 7
• 1, 3, 5, 6, 7
• 1, 3, 5, 7
1. Graph - FSM Example
Memory Seats in a Lexus ES 300
[Figure: finite state machine. States: Driver 1 Configuration, Driver 2 Configuration, Modified Configuration, New Configuration Driver 1, New Configuration Driver 2. Transition labels have the form guard | trigger, where the guard is a safety constraint and the trigger is an input: [Ignition = off] | Button1, [Ignition = off] | Button2, [Ignition = on] | seatBack (), [Ignition = on] | seatBottom (), [Ignition = on] | lumbar (), [Ignition = on] | sideMirrors () (to Modified), [Ignition = on] | Reset AND Button1, [Ignition = on] | Reset AND Button2, Ignition = off]
2. Logical Expressions
( (a > b) or G ) and (x < y)
Logical expressions come from :
– Transitions
– Software Specifications
– Program Decision Statements
2. Logical Expressions
Predicate Coverage : Each predicate must be true and false
– ( (a>b) or G ) and (x < y) = True, False
Clause Coverage : Each clause must be true and false
– (a > b) = True, False
– G = True, False
– (x < y) = True, False
Combinatorial Coverage : Various combinations of clauses
– Active Clause Coverage: Each clause must determine the predicate’s result
( (a > b) or G ) and (x < y)
2. Logic – Active Clause Coverage
( (a > b) or G ) and (x < y)
     (a > b)   G   (x < y)
1       T      F      T
2       F      F      T
3       F      T      T
4       F      F      T     duplicate of row 2
5       T      T      T
6       T      T      F
In rows 1 and 2, with these values for G and (x<y), (a>b) determines the value of the predicate
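Whether a clause "determines" the predicate for a given row can be checked by flipping only that clause and seeing whether the predicate's value flips too. The sketch below applies that check to the six rows of the table, reading rows 1 and 2 as the pair for (a > b), rows 3 and 4 as the pair for G, and rows 5 and 6 as the pair for (x < y). The class name ActiveClauseCheck and its members are made up for the illustration.

```java
// Illustrative sketch only: class and member names are hypothetical.
final class ActiveClauseCheck {
    // The predicate from the slide: a stands for (a > b), g for G, x for (x < y).
    static boolean predicate(boolean a, boolean g, boolean x) {
        return (a || g) && x;
    }

    // A clause determines the predicate for the given row if flipping
    // only that clause changes the predicate's value.
    static boolean determines(int clause, boolean[] row) {
        boolean[] flipped = row.clone();
        flipped[clause] = !flipped[clause];
        return predicate(row[0], row[1], row[2])
                != predicate(flipped[0], flipped[1], flipped[2]);
    }

    // The six rows of the table, in column order (a > b), G, (x < y).
    static final boolean[][] ROWS = {
        {true, false, true}, {false, false, true},
        {false, true, true}, {false, false, true},
        {true, true, true},  {true, true, false}
    };

    // Index of the major clause for each row: 0 = (a > b), 1 = G, 2 = (x < y).
    static final int[] MAJOR = {0, 0, 1, 1, 2, 2};

    // True if every row lets its major clause determine the predicate.
    static boolean allRowsValid() {
        for (int i = 0; i < ROWS.length; i++) {
            if (!determines(MAJOR[i], ROWS[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("All six rows valid: " + allRowsValid());
    }
}
```

The same check explains why a row such as T T T cannot serve (a > b): with G true, flipping (a > b) leaves the predicate unchanged.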
3. Input Domain Characterization
Describe the input domain of the software
– Identify inputs, parameters, or other categorization
– Partition each input into finite sets of representative values
– Choose combinations of values
System level
– Number of students { 0, 1, >1 }
– Level of course { 600, 700, 800 }
– Major { swe, cs, isa, infs }
Unit level
– Parameters F (int X, int Y)
– Possible values X: { <0, 0, 1, 2, >2 }, Y : { 10, 20, 30 }
– Tests F (-5, 10), F (0, 20), F (1, 30), F (2, 10), F (5, 20)
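The unit-level example can be read as a check that the five tests touch every block of X and every value of Y at least once. The sketch below performs that check; it is only one reading of "choose combinations of values", and the class name InputDomainPartition and its methods are invented for the illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch only: class and method names are hypothetical.
final class InputDomainPartition {
    // The five tests from the slide, as {X, Y} pairs.
    static final int[][] TESTS = {{-5, 10}, {0, 20}, {1, 30}, {2, 10}, {5, 20}};

    // Maps a concrete X to its block in the partition { <0, 0, 1, 2, >2 }.
    static String blockOfX(int x) {
        if (x < 0) {
            return "<0";
        }
        if (x > 2) {
            return ">2";
        }
        return Integer.toString(x);
    }

    // Blocks of X that at least one test falls into.
    static Set<String> xBlocksCovered() {
        Set<String> blocks = new LinkedHashSet<>();
        for (int[] test : TESTS) {
            blocks.add(blockOfX(test[0]));
        }
        return blocks;
    }

    // Values of Y that at least one test uses.
    static Set<Integer> yValuesCovered() {
        Set<Integer> values = new LinkedHashSet<>();
        for (int[] test : TESTS) {
            values.add(test[1]);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println("X blocks covered: " + xBlocksCovered());
        System.out.println("Y values covered: " + yValuesCovered());
    }
}
```

Five tests are enough here because each test covers one block of X and one value of Y at the same time; covering all 15 pairings of X blocks and Y values would need 15 tests.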
4. Syntactic Structures
Based on a grammar, or other syntactic definition
Primary example is mutation testing
1. Induce small changes to the program: mutants
2. Find tests that cause mutant programs to fail: killing mutants
3. Failure is defined as different output from the original program
4. Check the output of useful tests on the original program
Example program and mutants
Original program :
if (x > y)
   z = x - y;
else
   z = 2 * x;
Mutants (each replaces one line of the original) :
if (x >= y)      replaces  if (x > y)
z = x + y;       replaces  z = x - y;
z = x - m;       replaces  z = x - y;
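Killing a mutant means finding an input on which the mutant's output differs from the original's. The sketch below encodes the original program and two of the mutants as functions of x and y that return z; the third mutant is left out because the variable m is not defined in this section. The class name MutationSketch and the constant names are illustrative.

```java
import java.util.function.IntBinaryOperator;

// Illustrative sketch only: class and member names are hypothetical.
final class MutationSketch {
    // Original program: z = x - y when x > y, otherwise z = 2 * x.
    static final IntBinaryOperator ORIGINAL = (x, y) -> (x > y) ? x - y : 2 * x;

    // Mutant 1: the condition x > y is replaced by x >= y.
    static final IntBinaryOperator MUTANT_GE = (x, y) -> (x >= y) ? x - y : 2 * x;

    // Mutant 2: the assignment z = x - y is replaced by z = x + y.
    static final IntBinaryOperator MUTANT_PLUS = (x, y) -> (x > y) ? x + y : 2 * x;

    // A test input kills a mutant if the mutant's output differs from the original's.
    static boolean kills(IntBinaryOperator mutant, int x, int y) {
        return ORIGINAL.applyAsInt(x, y) != mutant.applyAsInt(x, y);
    }

    public static void main(String[] args) {
        System.out.println("(3, 3) kills x >= y mutant: " + kills(MUTANT_GE, 3, 3));
        System.out.println("(5, 2) kills x >= y mutant: " + kills(MUTANT_GE, 5, 2));
        System.out.println("(5, 2) kills x + y mutant:  " + kills(MUTANT_PLUS, 5, 2));
        System.out.println("(5, 0) kills x + y mutant:  " + kills(MUTANT_PLUS, 5, 0));
    }
}
```

The input (5, 0) executes the mutated statement in the second mutant but still produces the same z, which is why reaching the change alone does not kill a mutant.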
Coverage Overview
Four Structures for Modeling Software
Graphs, applied to : Use cases, Specs, Design, Source
Logic, applied to : Specs, DNF, FSMs, Source
Input Space
Syntax, applied to : Input, Models, Integ, Source
Coverage
Given a set of test requirements TR for coverage criterion C, a test set T satisfies C coverage if and only if for every test requirement tr in TR, there is at least one test t in T such that t satisfies tr
Infeasible test requirements : test requirements that cannot be satisfied
– No test case values exist that meet the test requirements
– Dead code
– Detection of infeasible test requirements is formally undecidable for most test criteria
Thus, 100% coverage is impossible in practice
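The definition of satisfying a criterion translates almost word for word into a loop: for every test requirement, look for at least one test that satisfies it. The sketch below instantiates it for node coverage, where a requirement is a node and a test path satisfies it by visiting that node. The seven-node requirement set and the two paths are taken from the graph coverage slides; the class name CoverageCheck and its methods are made up for the illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: class and method names are hypothetical.
final class CoverageCheck {
    // TR for node coverage on a graph with nodes 1 to 7.
    static Set<Integer> nodeRequirements() {
        return new LinkedHashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7));
    }

    // T satisfies the criterion if and only if every tr in TR is
    // satisfied by at least one test t in T.
    static boolean satisfies(Set<Integer> requirements, List<List<Integer>> tests) {
        for (Integer tr : requirements) {
            boolean met = false;
            for (List<Integer> t : tests) {
                if (t.contains(tr)) {  // the path visits the required node
                    met = true;
                    break;
                }
            }
            if (!met) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> both = Arrays.asList(
                Arrays.asList(1, 2, 5, 6, 7),
                Arrays.asList(1, 3, 4, 3, 5, 6, 7));
        List<List<Integer>> one = Arrays.asList(Arrays.asList(1, 2, 5, 6, 7));
        System.out.println("Two paths satisfy node coverage: " + satisfies(nodeRequirements(), both));
        System.out.println("One path satisfies node coverage: " + satisfies(nodeRequirements(), one));
    }
}
```

A check of this kind only measures a given test set against the requirements; it says nothing about whether an unmet requirement is infeasible.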
Two Ways to Use Test Criteria
1. Directly generate test values to satisfy the criterion
– often assumed by the research community
– most obvious way to use criteria
– very hard without automated tools
2. Generate test values externally and measure against the criterion
– usually favored by industry
– sometimes misleading
– if tests do not reach 100% coverage, what does that mean?
Test criteria are sometimes called
metrics
Generators and Recognizers
Generator : A procedure that automatically generates values to satisfy a criterion
Recognizer : A procedure that decides whether a given set of test values satisfies a criterion
Both problems are provably undecidable for most criteria
It is possible to recognize whether test cases satisfy a criterion far more often than it is possible to generate tests that satisfy the criterion
Coverage analysis tools are quite plentiful
Criteria Summary
• Many companies still use “monkey testing”
   • A human sits at the keyboard, wiggles the mouse and bangs the keyboard
   • No automation
   • Minimal training required
• Some companies automate human-designed tests
• But companies that use both automation and criteria-based testing
   • Save money
   • Find more faults
   • Build better software
Summary of Part 1’s New Ideas
Why do we test – to reduce the risk of using the software
– faults, failures, the RIP model
– Test process maturity levels – level 4 is a mental discipline that improves the quality of the software
Model-Driven Test Design
– Four types of test activities – test design, automation, execution and evaluation
Test Automation
– Testability, observability and controllability, test automation frameworks
Criteria-based test design
– Four structures – test requirements and criteria
Earlier and better testing can empower the