LAB OF SOFTWARE ARCHITECTURES AND INFORMATION SYSTEMS FACULTY OF INFORMATICS MASARYK UNIVERSITY, BRNO PV260 - SOFTWARE QUALITY LECT 7. Requirements and Test Cases. From Unit Testing to Integration Testing Bruno Rossi brossi@mail.muni.cz 2-106 ● Software Testing → Introduction → Basic Principles ● From Requirements to Test Cases → Functional testing → Translating specifications into test cases ● From Unit Testing to Integration Testing → comparison with unit testing → Strategies ● Specific Issues in Testing Object Oriented Software Outline "Discovering the unexpected is"Discovering the unexpected is more important than confirmingmore important than confirming the known."the known." George BoxGeorge Box 4-106 ● In Eclipse and Mozilla, 30–40% of all changes are fixes (Sliverski et al., 2005) ● Fixes are 2–3 times smaller than other changes (Mockus +Votta, 2000) ● 4% of all one-line changes introduce new errors (Purushothaman + Perry, 2004) Introduction A. Zeller, Why Programs Fail, Second Edition: A Guide to Systematic Debugging, 2 edition. Amsterdam ; Boston: Morgan Kaufmann, 2009.  5-106 Motivating Examples A. Zeller, Why Programs Fail, Second Edition: A Guide to Systematic Debugging, 2 edition. Amsterdam ; Boston: Morgan Kaufmann, 2009.  6-106 ● “Testing is the process of exercising or evaluating a system or system component by manual or automated means to verify that it satisfies specified requirements.” IEEE standards definition What is Software Testing 7-106 Reminder for some important terms: ● Defect: “An imperfection or deficiency in a work product where that work product does not meet its requirements or specifications and needs to be either repaired or replaced.” ● Error: “A human action that produces an incorrect result” ● Failure: “(A) Termination of the ability of a product to perform a required function or its inability to perform within previously specified limits. (B) An event in which a system or system component does not perform a required function within specified limits. A failure may be produced when a fault is encountered→ . “ ● Fault: “A manifestation of an error in software.” ● Problem: “(A) Difficulty or uncertainty experienced by one or more persons, resulting from an unsatisfactory encounter with a system in use. (B) A negative situation to overcome” What is Software Testing Definitions according to IEEE Std 1044-2009 “IEEE Standard Classification for Software Anomalies“ 8-106 Hopefully you haven't seen some of these 9-106 Maybe some of these... 10-106 And defects are everywhere... This is one failure I encountered when preparing this presentation on LibreOffice 4.2.7.2 A formula in ppt that got converted into image – looks good when editing The slides preview on the left, looks a bit strange... When converted to pdf... 11-106 Where is the term “bug”? ● Very often a synonymous of “defect” so that “debugging” is the activity related to removing defects in code However: → it may lead to confusion: it is not rare the case in which “bug” is used in natural language to refer to different levels: “this line is buggy” - “this pointer being null, is a bug” - “the program crashed: it's a bug” → starting from Dijkstra, there was the search for terms that could increase the responsibility of developers – the term “bug” might give the impression of something that magically appears into software What about the term “Bug”? Definitions according to IEEE Std 1044-2009 “IEEE Standard Classification for Software Anomalies“ 12-106 Who's to blame? image from http://blog.smartbear.com/sqc/when-bad-software-requirements-happen-to-good-people 13-106 ● Sensitivity: better to fail every time than sometimes ● Redundancy: making intentions explicit ● Restrictions: making the problem easier ● Partition: divide and conquer ● Visibility: making information accessible ● Feedback: applying lessons from experience in process and techniques Basic Principles of Testing (c) 2007 Mauro Pezzè & Michal Young 14-106 ● Consistency helps: – a test selection criterion works better if every selected test provides the same result, i.e., if the program fails with one of the selected tests, it fails with all of them (reliable criteria) – run time deadlock analysis works better if it is machine independent, i.e., if the program deadlocks when analyzed on one machine, it deadlocks on every machine Sensitivity: better to fail every time than sometimes (c) 2007 Mauro Pezzè & Michal Young 15-106 ● Look at the following code fragment Sensitivity Example char before[] = “=Before=”; char middle[] = “Middle”; char after [] = “=After=”; int main(int argc, char *argv){ strcpy(middle, “Muddled”); /* fault, may not fail */ strncpy(middle, “Muddled”, sizeof(middle)); /* fault, may not fail */ } What's the problem? (c) 2007 Mauro Pezzè & Michal Young 16-106 ● Let's make the following adjustment Sensitivity Example char before[] = “=Before=”; char middle[] = “Middle”; char after [] = “=After=”; int main(int argc, char *argv){ strcpy(middle, “Muddled”); /* fault, may not fail */ strncpy(middle, “Muddled”, sizeof(middle)); /* fault, may not fail */ stringcpy(middle, “Muddled”, sizeof(middle)); /* guaranteed to fail */ } void stringcpy(char *target, const char *source, int size){ assert(strlen(source) < size); strcpy(target, source); } This adds sensitivity to a non-sensitive solution (c) 2007 Mauro Pezzè & Michal Young 17-106 ● Let's look at the following Java code fragment. We use the ArrayList as a sort of queue and we remove one item after printing the results Sensitivity Example public class TestIterator { public static void main(String args[]) { List myList = new ArrayList<>(); myList.add("PV260"); myList.add("SW"); myList.add("Quality"); Iterator it = myList.iterator(); while (it.hasNext()) { String value = it.next(); System.out.println(value); myList.remove(value); } } } Will this output “PV260 SW Quality” ? 18-106 ● Let's look at the following Java code fragment. We use the ArrayList as a sort of queue and we remove one item after printing the results Sensitivity Example public class TestIterator { public static void main(String args[]) { List myList = new ArrayList<>(); myList.add("PV260"); myList.add("SW"); myList.add("Quality"); Iterator it = myList.iterator(); while (it.hasNext()) { String value = it.next(); System.out.println(value); myList.remove(value); } } } Actually, this throws java.util.ConcurrentModificationException 19-106 ● From Java SE documentation: ● “[...] Some Iterator implementations (including those of all the general purpose collection implementations provided by the JRE) may choose to throw this exception if this behavior is detected. Iterators that do this are known as fail-fast iterators, as they fail quickly and cleanly, rather that risking arbitrary, non-deterministic behavior at an undetermined time in the future.” ● “Note that fail-fast behavior cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast operations throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: ConcurrentModificationException should be used only to detect bugs.” Sensitivity Example 20-106 • Redundant checks can increase the capabilities of catching specific faults early or more efficiently. – Static type checking is redundant with respect to dynamic type checking, but it can reveal many type mismatches earlier and more efficiently. – Validation of requirement specifications is redundant with respect to validation of the final software, but can reveal errors earlier and more efficiently. – Testing and proof of properties are redundant, but are often used together to increase confidence Redundancy: making intentions explicit (c) 2007 Mauro Pezzè & Michal Young 21-106 • Adding redundancy by asserting that a condition must always be true for the correct execution of the program Redundancy Example void save(File *file, const char *dest){ assert(this.isInitialized()); ... } • From a language (e.g. Java) point of view, why are we obliged to declare the exception we throw from a method - isn't this redundant? public void throwException() throws FileNotFoundException{ throw new FileNotFoundException(); } Think if you could throw any exception from a method without declaration in the method signature 22-106 • Suitable restrictions can reduce hard (unsolvable) problems to simpler (solvable) problems – A weaker spec may be easier to check: it is impossible (in general) to show that pointers are used correctly, but the simple Java requirement that pointers are initialized before use is simple to enforce. – A stronger spec may be easier to check: it is impossible (in general) to show that type errors do not occur at run-time in a dynamically typed language, but statically typed languages impose stronger restrictions that are easily checkable. Restriction: making the problem easier (c) 2007 Mauro Pezzè & Michal Young 23-106 ● Will the following compile in Java? Restriction Example public static void questionable(){ int k; for (int i=0; i<10;++i){ if (someCondition(i)){ k = 0; } else { k+=i; } } } int k; if (true == false){ k+=i; } Java ALWAYS enforces variable initialization before usage as the following example shows – this is a case of restriction But restrictions can be applied at different levels, e.g. at the architectural level the decision of making the HTTP protocol stateless hugely simplified testing (and as such made the protocol more robust) 24-106 • Hard testing and verification problems can be handled by suitably partitioning the input space: – both structural (white box) and functional test (black box) selection criteria identify suitable partitions of code or specifications (partitions drive the sampling of the input space) – verification techniques fold the input space according to specific characteristics, grouping homogeneous data together and determining partitions → Examples of structural (white box) techniques: unit testing, integration testing, performance testing → Examples of functional (black box) techniques: system testing, acceptance testing, regression testing Partition: divide and conquer (c) 2007 Mauro Pezzè & Michal Young 25-106 ● Non-uniform distribution of faults ● Example: Java class “roots” applies quadratic equation ● Incomplete implementation logic: Program does not properly handle the case in which b2 - 4ac = 0 and a = 0 → Failing values are sparse in the input space — needles in a very big haystack. Random sampling is unlikely to choose a=0.0 and b=0.0 Partition - Example These would make good input values for test cases (c) 2007 Mauro Pezzè & Michal Young ax 2 +bx+c=0 x= −b±√b2 −4 ac 2a 26-106 Partition - Example Failure (valuable test case) No failure Failures are sparse in the space of possible inputs ... ... but dense in some parts of the space If we systematically test some cases from each part, we will include the dense parts Functional testing is one way of drawing pink lines to isolate regions with likely failures Thespaceofpossibleinputvalues (thehaystack) (c) 2007 Mauro Pezzè & Michal Young 27-106 ● The ability to measure progress or status against goals ● X visibility = ability to judge how we are doing on X, e.g., schedule visibility = “Are we ahead or behind schedule,” quality visibility = “Does quality meet our objectives?” – Involves setting goals that can be assessed at each stage of development ● The biggest challenge is early assessment, e.g., assessing specifications and design with respect to product quality ● Related to observability – Example: Choosing a simple or standard internal data format to facilitate unit testing Visibility: Judging status (c) 2007 Mauro Pezzè & Michal Young 28-106 ● The HTTP Protocol Visibility - Example GET /index.html HTTP/1.1 Host: www.google.com Why wasn't a more efficient binary format selected? To note HTTP 2.0 will use a binary format (from https://http2.github.io/faq): “Binary protocols are more efficient to parse, more compact “on the wire”, and most importantly, they are much less error-prone, compared to textual protocols like HTTP/1.x, because they often have a number of affordances to “help” with things like whitespace handling, capitalization, line endings, blank links and so on.” In fact, reduction of visibility is confirmed by “It’s true that HTTP/2 isn’t usable through telnet, but we already have some tool support, such as a Wireshark plugin.” 29-106 • Learning from experience: Each project provides information to improve the next • Examples – Checklists are built on the basis of errors revealed in the past – Error taxonomies can help in building better test selection criteria – Design guidelines can avoid common pitfalls Feedback: tuning the development process Using a software reliability model fitting past project data Looking for problematic modules based on prior knowledge (c) 2007 Mauro Pezzè & Michal Young 30-106 From Requirements to Test Cases 31-106 According to ISO/IEC/IEEE 29148-2011 standard: ● Correctness: requirements represent the client’s view ● Completeness: all possible scenarios through the system are described, including exceptional behavior by the user ● Consistency: There are functional or nonfunctional requirements that contradict each other ● Clarity: There are no ambiguities in the requirements ● Realism: Requirements can be implemented and delivered ● Traceability: Each system function can be traced to a corresponding set of functional requirements Characteristics of Requirements 32-106 According to IEEE Std 829-1998: ● Test Case Specification: “A document specifying inputs, predicted results, and a set of execution conditions for a test item” Test Cases Definition 33-106 • Functional testing: Deriving test cases from program specifications • Functional refers to the source of information used in test case design, not to what is tested • Also known as: – specification-based testing (from specifications) – black-box testing (no view of the code) • Functional specification = description of intended program behavior – either formal or informal Functional Testing (c) 2007 Mauro Pezzè & Michal Young 34-106 • Functional testing uses the specification (formal or informal) to partition the input space – E.g., specification of “roots” program suggests division between cases with zero, one, and two real roots • Test each category, and boundaries between categories – No guarantees, but experience suggests failures often lie at the boundaries (as in the “roots” program) Functional testing: exploiting the specification (c) 2007 Mauro Pezzè & Michal Young 35-106 • The base-line technique for designing test cases – Timely • Often useful in refining specifications and assessing testability before code is written – Effective • finds some classes of fault (e.g., missing logic) that can elude other approaches – Widely applicable • to any description of program behavior serving as spec • at any level of granularity from module to system testing. – Economical • typically less expensive to design and execute than structural (code-based) test cases Why functional Tests? (c) 2007 Mauro Pezzè & Michal Young 36-106 • Program code is not necessary – Only a description of intended behavior is needed – Even incomplete and informal specifications can be used • Although precise, complete specifications lead to better test suites • Early functional test design has side benefits – Often reveals ambiguities and inconsistency in spec – Useful for assessing testability • And improving test schedule and budget by improving spec – Useful explanation of specification • or in the extreme case (as in XP), test cases are the spec Early Functional Test Design (c) 2007 Mauro Pezzè & Michal Young 37-106 • Functional test applies at all granularity levels: – Unit (from module interface spec) – Integration (from API or subsystem spec) – System (from system requirements spec) – Regression (from system requirements + bug history) • Structural (code-based) test design applies to relatively small parts of a system: – Unit – Integration • Functional testing is best for missing logic faults – A common problem: Some program logic was simply forgotten – Structural (code-based) testing will never focus on code that isn’t there! Functional vs structural test: granularity levels (c) 2007 Mauro Pezzè & Michal Young 38-106 1. Decompose the specification – If the specification is large, break it into independently testable features to be considered in testing 2. Select representatives – Representative values of each input, or Representative behaviors of a model – Often simple input/output transformations don’t describe a system. We use models in program specification, in program design, and in test design 3. Form test specifications – Typically: combinations of input values, or model behaviors 4. Produce and execute actual tests Steps: from specifications to test cases (c) 2007 Mauro Pezzè & Michal Young 39-106 Steps: from specifications to test cases: example Derive Independently Testable Features: identify features that can be tested separately Examples: a search functionality on a web application or addition of new users this may map to different→ levels at the design and code level NOTE: this helps also in determining if there are requirements that are not testable or need to be rewritten or clarified! Derive Representative values OR a model that can be used to derive test cases. Note that this phase is mostly enumeration of values in isolation. Example: considering empty list or a one element list as representative cases Generation of test case specification based on the previous step, usually based on the Cartesian product from the enumeration values (considering feasible cases). Example: the search functionality, representative values might be 0,1, many characters and 0,1, many special characters, but the case {0,many} is clearly impossible 40-106 Example One: using category partitioning Using combinatorial testing (category partition) from the specifications • We are building a catalogue of computer components in which customers can select the different parts and assemble their PC for delivery • A model identifies a specific product and determines a set of constraints on available components • A set of (slot, component) pairs, corresponding to the required and optional slots of the model. A component might be empty for optional slots 41-106 Parameter Model – Model number – Number of required slots for selected model (#SMRS) – Number of optional slots for selected model (#SMOS) Parameter Components – Correspondence of selection with model slots – Number of required components with selection ≠ empty – Required component selection – Number of optional components with selection ≠ empty – Optional component selection Environment element: Product database – Number of models in database (#DBM) – Number of components in database (#DBC) Step 1: Identify independently testable units (c) 2007 Mauro Pezzè & Michal Young 42-106 Model number Malformed Not in database Valid Number of required slots for selected model (#SMRS) 0 1 Many Number of optional slots for selected model (#SMOS) 0 1 Many Step 2: Identify relevant values: Component (1/3) (c) 2007 Mauro Pezzè & Michal Young 43-106 Correspondence of selection with model slots Omitted slots Extra slots Mismatched slots Complete correspondence Number of required components with non empty selection 0 < number required slots = number required slots Required component selection Some defaults All valid ≥ 1 incompatible with slots ≥ 1 incompatible with another selection ≥ 1 incompatible with model ≥ 1 not in database Number of optional components with non empty selection 0 < #SMOS = #SMOS Optional component selection Some defaults All valid ≥ 1 incompatible with slots ≥ 1 incompatible with another selection ≥ 1 incompatible with model ≥ 1 not in database Step 2: Identify relevant values: Component (2/3) (c) 2007 Mauro Pezzè & Michal Young 44-106 Number of models in database (#DBM) 0 1 Many Number of components in database (#DBC) 0 1 Many Note 0 and 1 are unusual (special) values. They might cause unanticipated behavior alone or in combination with particular values of other parameters. Step 2: Identify relevant values: Component (3/3) (c) 2007 Mauro Pezzè & Michal Young 45-106 ● A combination of values for each category corresponds to a test case specification – in the example we have 314.928 test cases – most of which are impossible! ● example zero slots and at least one incompatible slot ● Introduce constraints to – rule out impossible combinations – reduce the size of the test suite if too large Step 3: Introduce constraints (c) 2007 Mauro Pezzè & Michal Young 46-106 [Error] indicates a value class that – corresponds to a erroneous values – need be tried only once Example Model number: Malformed and Not in database error value classes – No need to test all possible combinations of errors – One test is enough (we assume that handling an error case bypasses other program logic) Step 3: error constraint (c) 2007 Mauro Pezzè & Michal Young 47-106 Model number Malformed [error] Not in database [error] Valid Correspondence of selection with model slots Omitted slots [error] Extra slots [error] Mismatched slots [error] Complete correspondence Number of required comp. with non empty selection 0 [error] < number of required slots [error] Required comp. selection ≥ 1 not in database [error] Number of models in database (#DBM) 0 [error] Number of components in database (#DBC) 0 [error] Error constraints reduce test suite from 314.928 to 2.711 test cases Example - Step 3: error constraint (c) 2007 Mauro Pezzè & Michal Young 48-106 constraint [property] [if-property] rule out invalid combinations of values [property] groups values of a single parameter to identify subsets of values with common properties [if-property] bounds the choices of values for a category that can be combined with a particular value selected for a different category Example combine Number of required comp. with non empty selection = number required slots [if RSMANY] only with Number of required slots for selected model (#SMRS) = Many [Many] Step 3: property constraints (c) 2007 Mauro Pezzè & Michal Young 49-106 Number of required slots for selected model (#SMRS) 1 [property RSNE] Many [property RSNE] [property RSMANY] Number of optional slots for selected model (#SMOS) 1 [property OSNE] Many [property OSNE] [property OSMANY] Number of required comp. with non empty selection 0 [if RSNE] [error] < number required slots [if RSNE] [error] = number required slots [if RSMANY] Number of optional comp. with non empty selection < number required slots [if OSNE] = number required slots [if OSMANY] from 2.711 to 908 test cases Example - Step 3: property constraints (c) 2007 Mauro Pezzè & Michal Young 50-106 [single] indicates a value class that test designers choose to test only once to reduce the number of test cases Example value some default for required component selection and optional component selection may be tested only once despite not being an erroneous condition note single and error have the same effect but differ in rationale. Keeping them distinct is important for documentation and regression testing Step 3 (cont): single constraints (c) 2007 Mauro Pezzè & Michal Young 51-106 from 908 to 69 test cases Number of required slots for selected model (#SMRS) 0 [single] 1 [property RSNE] [single] Number of optional slots for selected model (#SMOS) 0 [single] 1 [single] [property OSNE] Required component selection Some default [single] Optional component selection Some default [single] Number of models in database (#DBM) 1 [single] Number of components in database (#DBC) 1 [single] Example - Step 3: single constraints (c) 2007 Mauro Pezzè & Michal Young 52-106 Parameter Model ● Model number – Malformed [error] – Not in database [error] – Valid ● Number of required slots for selected model (#SMRS) – 0 [single] – 1 [property RSNE] [single] – Many [property RSNE] [property RSMANY] ● Number of optional slots for selected model (#SMOS) – 0 [single] – 1 [property OSNE] [single] – Many [property OSNE] [property OSMANY] Environment Product data base ● Number of models in database (#DBM) – 0 [error] – 1 [single] – Many ● Number of components in database (#DBC) – 0 [error] – 1 [single] – Many Parameter Component ● Correspondence of selection with model slots – Omitted slots [error] – Extra slots [error] – Mismatched slots [error] – Complete correspondence ● # of required components (selection  empty) – 0 [if RSNE] [error] – < number required slots [if RSNE] [error] – = number required slots [if RSMANY] ● Required component selection – Some defaults [single] – All valid ≥ 1 incompatible with slots ≥ 1 incompatible with another selection ≥ 1 incompatible with model ≥ 1 not in database [error] ● # of optional components (selection  empty) – 0 – < #SMOS [if OSNE] – = #SMOS [if OSMANY] ● Optional component selection – Some defaults [single] – All valid ≥ 1 incompatible with slots ≥ 1 incompatible with another selection ≥ 1 incompatible with model ≥ 1 not in database [error] Example - Summary (c) 2007 Mauro Pezzè & Michal Young 53-106 Example Two: Deriving a model Maintenance: The Maintenance function records the history of items undergoing maintenance. • If the product is covered by warranty or maintenance contract, maintenance can be requested either by calling the maintenance toll free number, or through the web site, or by bringing the item to a designated maintenance station. • If the maintenance is requested by phone or web site and the customer is a US or EU resident, the item is picked up at the customer site, otherwise, the customer shall ship the item with an express courier. • If the maintenance contract number provided by the customer is not valid, the item follows the procedure for items not covered by warranty. • If the product is not covered by warranty or maintenance contract, maintenance can be requested only by bringing the item to a maintenance station. The maintenance station informs the customer of the estimated costs for repair. Maintenance starts only when the customer accepts the estimate. • If the customer does not accept the estimate, the product is returned to the customer. • Small problems can be repaired directly at the maintenance station. If the maintenance station cannot solve the problem, the product is sent to the maintenance regional headquarters (if in US or EU) or to the maintenance main headquarters (otherwise). • If the maintenance regional headquarters cannot solve the problem, the product is sent to the maintenance main headquarters. • Maintenance is suspended if some components are not available. • Once repaired, the product is returned to the customer. Multiple choices in the first step ... ... determine the possibilities for the next step ... ... and so on ... From an informal specification: (c) 2007 Mauro Pezzè & Michal Young 54-106 Example Two: Deriving a model To a finite state machine: (c) 2007 Mauro Pezzè & Michal Young 55-106 Example Two: Deriving a model To a test suite: (c) 2007 Mauro Pezzè & Michal Young 56-106 Example Two: Deriving a model Using transition coverage: Using transition coverage: Every transition between states should be traversed by at least one test case (c) 2007 Mauro Pezzè & Michal Young Does history matter? That is the order in which we traverse a node influences the functionality? (e.g. see wait for completion) 57-106 In the Agile context, the problem of functional testing has been addressed by having user stories and acceptance tests in collaboration with customers, constantly updated and runnable A complementary point of view (1/5) User Stories Architectural Spike Release Planning Iteration Acceptance Tests Small Releases Spike Exploration Phase Planning Phase Iterations to Release Phase Productionizing Phase requirements Test scenarios bugs next iteration latest version customer approval system metaphor uncertain estimates confident estimates release plan eXtreme Programming (XP) process 58-106 A complementary point of view (2/5) Using Fitnesse to write acceptance tests so that the customer can actually write the acceptance conditions for the software looking at our previous example the “root” case That we solve by means of ax2 +bx+c=0 x= −b±√b2 −4 ac 2a 59-106 A complementary point of view (3/5) public class Root { double rootOne, rootTwo; int numRoots; public Root (double a, double b, double c){ double q; double r; q = b*b - 4 * a *c; if (q >0 && a != 0){ // if b^2 > 4ac there are two dinstict roots numRoots = 2; r = (double) Math.sqrt(q); rootOne = ((0-b) + r) / (2*a); rootTwo = ((0-b) - r) / (2*a); } else if (q==0){ // DEFECT HERE numRoots = 1; rootOne = (0-b)/(2*a); rootTwo = rootOne; }else { // equation had no roots if b^2<4ac numRoots = 0; rootOne = -1; rootTwo = -1; } } } Source code from Mauro Pezzè & Michal Young 60-106 A complementary point of view (4/5) Our first attempt returns the number of solutions, but the customer did not want only this – so this is a mistake we would not have captured with unit tests The customer also wanted the solutions to the equation, however this opens other discussions how should we deal with no solutions? What→ about imaginary numbers? 61-106 A complementary point of view (5/5) Running with a=0 reports the mistake and also opens up a discussion about the format for returning the solutions and what were the original requirements in these cases 62-106 From Unit Testing to Integration Testing 63-106 ● On cars, children should not sit on the front passenger if passenger's airbag has not been disabled ● On most cars there is lever to turn to disable it Motivating Example ● However: one cars' manufacturer had trouble with the following scenario – Airbag turned off by the user – Car sent for check-up central unit replaced→ – Complete reset of the system reactivated airbags even though lever was OFF ● How could have this been detected by testing & which type of tests? 64-106 Module test Integration test System test Specification: Module interface Interface specs, module breakdown Requirements specification Visible structure: Coding details Modular structure (software architecture) — none — Scaffolding required: Some Often extensive Some Looking for faults in: Modules Interactions, compatibility System functionality What is integration testing? (c) 2007 Mauro Pezzè & Michal Young 65-106 • Unit (module) testing is a necessary foundation – Unit level has maximum controllability and visibility – Integration testing can never compensate for inadequate unit testing • Integration testing may serve as a process check – If module faults are revealed in integration testing, they signal inadequate unit testing – If integration faults occur in interfaces between correctly implemented modules, the errors can be traced to module breakdown and interface specifications Integration versus Unit Testing (c) 2007 Mauro Pezzè & Michal Young 66-106 • Inconsistent interpretation of parameters or values – Example: Mixed units (meters/yards) in Martian Lander • Violations of value domains, capacity, or size limits – Example: Buffer overflow • Side effects on parameters or resources – Example: Conflict on (unspecified) temporary file • Omitted or misunderstood functionality – Example: Inconsistent interpretation of web hits • Nonfunctional properties – Example: Unanticipated performance issues • Dynamic mismatches – Example: Incompatible polymorphic method calls Integration Faults (c) 2007 Mauro Pezzè & Michal Young 67-106 Static void ssl_io_filter_disable(ap_filter_t *f) { bio_filter_in_ctx_t *inctx = f->ctx; inctx->ssl = NULL; inctx->filter ctx->pssl = NULL; } Apache web server, version 2.0.48 Response to normal page request on secure (https) port Example: A Memory Leak No obvious error, but Apache leaked memory slowly (in normal use) or quickly (if exploited for a DOS attack) (c) 2007 Mauro Pezzè & Michal Young 68-106 Static void ssl_io_filter_disable(ap_filter_t *f) { bio_filter_in_ctx_t *inctx = f->ctx; SSL_free(inctx -> ssl); inctx->ssl = NULL; inctx->filter ctx->pssl = NULL; } Apache web server, version 2.0.48 Response to normal page request on secure (https) port The missing code is for a structure defined and created elsewhere, accessed through an opaque pointer. Example: A Memory Leak (c) 2007 Mauro Pezzè & Michal Young 69-106 Static void ssl_io_filter_disable(ap_filter_t *f) { bio_filter_in_ctx_t *inctx = f->ctx; SSL_free(inctx -> ssl); inctx->ssl = NULL; inctx->filter ctx->pssl = NULL; } Apache web server, version 2.0.48 Response to normal page request on secure (https) port Almost impossible to find with unit testing. (Inspection and some dynamic techniques could have found it.) Example: A Memory Leak (c) 2007 Mauro Pezzè & Michal Young 70-106 • Yes, I implemented module A , but I didn’t⟨ ⟩ test it thoroughly yet. It will be tested along with module B when that’s⟨ ⟩ ready. Maybe you have heard... (c) 2007 Mauro Pezzè & Michal Young 71-106 • Yes, I implemented module A , but I didn’t⟨ ⟩ test it thoroughly yet. It will be tested along with module B when that’s⟨ ⟩ ready. • I didn’t think at all about the strategy for testing. I didn’t design module⟨ A for testability and I⟩ didn’t think about the best order to build and test modules A and B .⟨ ⟩ ⟨ ⟩ (c) 2007 Mauro Pezzè & Michal Young Translation... 72-106 An extreme and desperate approach: Test only after integrating all modules +Does not require scaffolding • The only excuse, and a bad one - Minimum observability, diagnosability, efficacy, feedback - High cost of repair • Recall: Cost of repairing a fault rises as a function of time between error and repair (c) 2007 Mauro Pezzè & Michal Young Big Bang Integration Test 73-106 • Structural orientation: Modules constructed, integrated and tested based on a hierarchical project structure – Top-down, Bottom-up, Sandwich, Backbone • Functional orientation: Modules integrated according to application characteristics or features – Threads, Critical module (c) 2007 Mauro Pezzè & Michal Young Structural and Functional Strategies 74-106 Testing Framework Class A Stub B Stub X Stub Y Testing Framework Class A Class B Stub X Stub Y Testing Framework Class A Class B Class X Class Y ● Working from the top level (in terms of “use” or “include” relation) toward the bottom. ● No drivers required if program tested from top-level interface (e.g. GUI, CLI, web app, etc.) ● Write stubs of called or used modules at each step in construction ● As modules replace stubs, more functionality is testable ● ...until the program is complete, and all functionality can be tested (c) 2007 Mauro Pezzè & Michal Young Top-Down 75-106 Testing Framework Driver Class X Testing Framework Driver Driver Class X Class Y Testing Framework Class A Class B Class X Class Y ● Starting at the leaves of the “uses” hierarchy, we never need stubs ● ... but we must construct drivers for each module (as in unit testing) … ● ... an intermediate module replaces a driver, and needs its own driver ● so we may have several working subsystems that are eventually integrated into a single system. (c) 2007 Mauro Pezzè & Michal Young Bottom-Up . 76-106 ● Working from the extremes (top and bottom) toward center, we may use fewer drivers and stubs ● Sandwich integration is flexible and adaptable, but complex to plan (c) 2007 Mauro Pezzè & Michal Young Sandwich Top (parts) Stub A Class B Class Y Top (parts) Class A Class B Class YClass Y 77-106 ● A “thread” is a portion of several modules that together provide a user-visible program feature. ● Integrating one thread, then another, etc., we maximize visibility for the user ● As in sandwich integration testing, we can minimize stubs and drivers, but the integration plan may be complex (c) 2007 Mauro Pezzè & Michal Young Thread Driver Class A Class B Class Y Driver Class A Class B Class Y Class C Class Z 78-106 • Strategy: Start with riskiest modules – Risk assessment is necessary first step – May include technical risks (is X feasible?), process risks (is schedule for X realistic?), other risks • May resemble thread or sandwich process in tactics for flexible build order – E.g., constructing parts of one module to test functionality in another • Key point is risk-oriented process – Integration testing as a risk-reduction activity, designed to deliver any bad news as early as possible (c) 2007 Mauro Pezzè & Michal Young Critical Modules 79-106 • Functional strategies require more planning – Structural strategies (bottom up, top down, sandwich) are simpler – But thread and critical modules testing provide better process visibility, especially in complex systems • Possible to combine – Top-down, bottom-up, or sandwich are reasonable for relatively small components and subsystems – Combinations of thread and critical modules integration testing are often preferred for larger subsystems (c) 2007 Mauro Pezzè & Michal Young Choosing a Strategy 80-106 Specific Issues in Testing Object Oriented Software 81-106 ● Procedural software – unit = single program, function, or procedure more often: a unit of work that may correspond to one or more intertwined functions or programs ● Object oriented software – unit = class or (small) cluster of strongly related classes (e.g., sets of Java classes that correspond to exceptions) – unit testing = intra-class testing – integration testing = inter-class testing (cluster of classes) → dealing with single methods separately is usually too expensive (complex scaffolding), so methods are usually tested in the context of the class they belong to Ch 15, slide 81 OO definitions of unit and integration testing (c) 2007 Mauro Pezzè & Michal Young 82-106 • The Unit in Unit Testing is usually a class, however, there are specific issues that need to be taken into account when considering OO: – State dependent behavior – Encapsulation – Inheritance – Polymorphism and dynamic binding – Abstract and generic classes – Exception handling “Unit” in Unit Testing (c) 2007 Mauro Pezzè & Michal Young 83-106  abstract class Credit {  ...    abstract boolean validateCredit( Account a, int amt, CreditCard c);  ... } USAccount UKAccount EUAccount JPAccount OtherAccount EduCredit BizCredit IndividualCredit VISACard AmExpCard StoreCard The combinatorial problem: 3 x 5 x 3 = 45 possible combinations of dynamic bindings (just for this one method!) “Isolated” calls: the combinatorial explosion problem (c) 2007 Mauro Pezzè & Michal Young 84-106 Account Credit creditCard USAccount EduCredit VISACard USAccount BizCredit AmExpCard USAccount individualCredit ChipmunkCard UKAccount EduCredit AmExpCard UKAccount BizCredit VISACard UKAccount individualCredit ChipmunkCard EUAccount EduCredit ChipmunkCard EUAccount BizCredit AmExpCard EUAccount individualCredit VISACard JPAccount EduCredit VISACard JPAccount BizCredit ChipmunkCard JPAccount individualCredit AmExpCard OtherAccount EduCredit ChipmunkCard OtherAccount BizCredit VISACard OtherAccount individualCredit AmExpCard Identify a set of combinations that cover all pairwise combinations of dynamic bindings The combinatorial approach (c) 2007 Mauro Pezzè & Michal Young 85-106 public abstract class Account { ... public int getYTDPurchased() { if (ytdPurchasedValid) { return ytdPurchased; } int totalPurchased = 0; for (Enumeration e = subsidiaries.elements() ; e.hasMoreElements(); ) { Account subsidiary = (Account) e.nextElement(); totalPurchased += subsidiary.getYTDPurchased(); } for (Enumeration e = customers.elements(); e.hasMoreElements(); ) { Customer aCust = (Customer) e.nextElement(); totalPurchased += aCust.getYearlyPurchase(); } ytdPurchased = totalPurchased; ytdPurchasedValid = true; return totalPurchased; } … } Problem: different implementations of methods getYDTPurchased refer to different currencies. Combined calls: undesired effects (c) 2007 Mauro Pezzè & Michal Young 86-106 public abstract class Account { ... public int getYTDPurchased() { if (ytdPurchasedValid) { return ytdPurchased; } int totalPurchased = 0; for (Enumeration e = subsidiaries.elements() ; e.hasMoreElements(); ) { Account subsidiary = (Account) e.nextElement(); totalPurchased += subsidiary.getYTDPurchased(); } for (Enumeration e = customers.elements(); e.hasMoreElements(); ) { Customer aCust = (Customer) e.nextElement(); totalPurchased += aCust.getYearlyPurchase(); } ytdPurchased = totalPurchased; ytdPurchasedValid = true; return totalPurchased; } … } step 1: identify polymorphic calls, binding sets, defs and uses totalPurchased used and defined totalPurchased used and defined totalPurchased defined totalPurchased usedtotalPurchased used A Data Flow Approach (c) 2007 Mauro Pezzè & Michal Young 87-106 ● Derive a test case for each possible polymorphic pair – Each binding must be considered individually – Pairwise combinatorial selection may help in reducing the set of test cases ● Example: Dynamic binding of currency – We need test cases that bind the different calls to different methods in the same run – We can reveal faults due to the use of different currencies in different methods Def-Use (dataflow) testing of polymorphic calls (c) 2007 Mauro Pezzè & Michal Young 88-106 ● When testing a subclass ... – We would like to re-test only what has not been thoroughly tested in the parent class ● for example, no need to test hashCode and getClass methods inherited from class Object in Java – But we should test any method whose behavior may have changed ● even accidentally! Inheritance (c) 2007 Mauro Pezzè & Michal Young 89-106 ● Track test suites and test executions – determine which new tests are needed – determine which old tests must be re-executed ● New and changed behavior ... – new methods must be tested – redefined methods must be tested, but we can partially reuse test suites defined for the ancestor – other inherited methods do not have to be retested Reusing Tests with the Testing History Approach (c) 2007 Mauro Pezzè & Michal Young 90-106 Testing history (c) 2007 Mauro Pezzè & Michal Young 91-106 Inherited, unchanged (c) 2007 Mauro Pezzè & Michal Young 92-106 Newly introduced methods (c) 2007 Mauro Pezzè & Michal Young 93-106 Overridden methods (c) 2007 Mauro Pezzè & Michal Young 94-106 ● Abstract methods (and classes) – Design test cases when abstract method is introduced (even if it can’t be executed yet) ● Behavior changes – Should we consider a method “redefined” if another new or redefined method changes its behavior? ● The standard “testing history” approach does not do this ● It might be reasonable combination of data flow (structural) OO testing with the (functional) testing history approach Testing history – some details (c) 2007 Mauro Pezzè & Michal Young 95-106 Testing History - Summary (c) 2007 Mauro Pezzè & Michal Young 96-106 ● Executing test cases should (usually) be cheap – It may be simpler to re-execute the full test suite of the parent class – ... but still add to it for the same reasons ● But sometimes execution is not cheap ... – Example: Control of physical devices – Or very large test suites ● Ex: Some Microsoft product test suites require more than one night (so daily build cannot be fully tested) – Then some use of testing history is profitable Does Testing History help? (c) 2007 Mauro Pezzè & Michal Young 97-106 A generic class class PriorityQueue {...} is designed to be instantiated with many different parameter types PriorityQueue PriorityQueue A generic class is typically designed to behave consistently some set of permitted parameter types. Testing can be broken into two parts – Showing that some instantiation is correct – showing that all permitted instantiations behave consistently Testing Generic Classes (c) 2007 Mauro Pezzè & Michal Young 98-106 ● Design tests as if the parameter were copied textually into the body of the generic class. – We need source code for both the generic class and the parameter class Show that some instantiation is correct (c) 2007 Mauro Pezzè & Michal Young 99-106 ● Identify potential interactions between generic and its parameters – Identify potential interactions by inspection or analysis, not testing – Look for: method calls on parameter object, access to parameter fields, possible indirect dependence – Easy case is no interactions at all (e.g., a simple container class) ● Where interactions are possible, they will need to be tested Identify (possible) interactions (c) 2007 Mauro Pezzè & Michal Young 100-106 class PriorityQueue {...} ● Priority queue uses the “Comparable” interface of Elem to make method calls on the generic parameter ● We need to establish that it does so consistently – So that if priority queue works for one kind of Comparable element, we can have some confidence it does so for others Example Interaction (c) 2007 Mauro Pezzè & Michal Young 101-106 ● We can’t test every possible instantiation – Just as we can’t test every possible program input ● ... but there is a contract (a specification) between the generic class and its parameters – Example: “implements Comparable” is a specification of possible instantiations – Other contracts may be written only as comments ● Functional (specification-based) testing techniques are appropriate – Identify and then systematically test properties implied by the specification Testing variation in instantiation (c) 2007 Mauro Pezzè & Michal Young 102-106 Most but not all classes that implement Comparable also satisfy the rule (x.compareTo(y) == 0) == (x.equals(y)) (from java.lang.Comparable) So test cases for PriorityQueue should include ● instantiations with classes that do obey this rule: class String ● instantiations that violate the rule: class BigDecimal with values 4.0 and 4.00 Example: Testing variation in instantiation (c) 2007 Mauro Pezzè & Michal Young 103-106 void addCustomer(Customer theCust) { customers.add(theCust); } public static Account newAccount(...) throws InvalidRegionException { Account thisAccount = null; String regionAbbrev = Regions.regionOfCountry( mailAddress.getCountry()); if (regionAbbrev == Regions.US) { thisAccount = new USAccount(); } else if (regionAbbrev == Regions.UK) { .... } else if (regionAbbrev == Regions.Invalid) { throw new InvalidRegionException(mailAddress.getCountry()); } ... } exceptions create implicit control flows and may be handled by different handlers Exception handling (c) 2007 Mauro Pezzè & Michal Young 104-106 ● Impractical to treat exceptions like normal flow ● too many flows: every array subscript reference, every memory allocation, every cast, ... ● multiplied by matching them to every handler that could appear immediately above them on the call stack. ● many actually impossible ● So we separate testing exceptions ● and ignore program error exceptions (test to prevent them, not to handle them) ● What we do test: Each exception handler, and each explicit throw or re-throw of an exception Testing Exception Handling (c) 2007 Mauro Pezzè & Michal Young 105-106 ● Local exception handlers – test the exception handler (consider a subset of points bound to the handler) ● Non-local exception handlers – Difficult to determine all pairings of – So enforce (and test for) a design rule: if a method propagates an exception, the method call should have no other effect Testing program exception handlers (c) 2007 Mauro Pezzè & Michal Young 106-106 Most of the source code examples, class diagrams, etc... from [2] if not differently stated [1] A. Zeller, Why Programs Fail, Second Edition: A Guide to Systematic Debugging, 2 edition. Amsterdam ; Boston: Morgan Kaufmann, 2009.  [2] M. Pezzè and M. Young, Software Testing And Analysis: Process, Principles And Techniques. Hoboken, N.J.: John Wiley & Sons Inc, 2007. Acceptance Testing example using Fitnesse (www.fitnesse.org) References