LAB OF SOFTWARE ARCHITECTURES AND INFORMATION SYSTEMS
FACULTY OF INFORMATICS, MASARYK UNIVERSITY, BRNO
PV260 - SOFTWARE QUALITY
SOFTWARE MEASUREMENT & METRICS AND THEIR ROLE IN QUALITY IMPROVEMENT
Bruno Rossi brossi@mail.muni.cz

2-99 Introduction
● The following defect (can you spot it?) in Apple's SSL code went undiscovered from Sept 2012 to Feb 2014 – how can that be?
M. Bland, "Finding more than one worm in the apple," Communications of the ACM, vol. 57, no. 7, pp. 58–64, Jul. 2014.

3-99 Introduction
● Modern systems are very large & complex in terms of structure & runtime behaviour
● The figure on the right represents Eclipse JDT 3.5.0 (350K LOCs, 1,324 classes, 23,605 methods)
Classes → black – Methods → red – Attributes → blue. Method containment, attribute containment, and class inheritance → gray – Invocations → red – Accesses → blue

4-99 Introduction
● We need ways to understand attributes of software, represent them in a concise way, and use them to track software & development process improvement
● Software Measurement and Metrics are one of the aspects we can consider
LOCs 354,780 – NOM 23,605 – NOC 1,324 – NOP 45
(LOCs = lines of code, NOM = nr. of methods, NOC = nr. of classes, NOP = nr. of packages)
If we consider the following metrics, what can we say? What are these metrics "good" for?

5-99 Introduction
● Typical problems of measurement:
→ How can I measure the maintainability of my software?
→ Can I estimate the number of defects of my software?
→ What is the productivity of my development team?
→ Can I measure the quality of my testing process?

6-99 Measurement
● Measurement is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules (N. Fenton and S. L. Pfleeger, 1997)
→ A measurement is the process that defines a measure

7-99 The Measurement Process
● The measurement process goes from the real world to the numerical representation
● Interpretation goes from the numerical representation to the relevant empirical results
Real World → (measures) → Numbers → (statistics) → Reduced Numbers → (interpretation) → Relevant Empirical Results, crossing the "intelligence barrier"

8-99 Why Software Measurement
● To avoid anecdotal evidence unsupported by clear research (e.g., through experiments or prototypes)
● To increase the visibility and the understanding of the process
● To analyze the software development process
● To make predictions through statistical models
Gilb's Principle of fuzzy targets (1988): "Projects without clear goals will not achieve their goals clearly"

9-99 However...
● Although measurement may be integrated in development, very often the objectives of measurement are not clear
"I measure the process because there is an automated tool that collects the metrics, but I do not know how to read the data and what I can do with the data"
Tom De Marco (1982): "You cannot manage what you cannot measure"...
...but you need to know what to measure and how to measure
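To get a first feeling for what raw size counts are "good" for, here is a minimal Java sketch using the Eclipse JDT 3.5.0 numbers quoted on slide 4; the ratios it derives (LOC per method, methods per class, classes per package) anticipate the normalized measures discussed later in the lecture. The program itself is only an illustration, not a measurement tool.

/** Minimal sketch: turning raw size counts into comparable ratios.
 *  Numbers are the Eclipse JDT 3.5.0 figures quoted on slide 4. */
public class SizeRatios {
    public static void main(String[] args) {
        double loc = 354_780;   // lines of code
        double nom = 23_605;    // number of methods
        double noc = 1_324;     // number of classes
        double nop = 45;        // number of packages

        // Raw counts say little on their own; ratios make projects comparable.
        System.out.printf("LOC per method:      %.1f%n", loc / nom);
        System.out.printf("Methods per class:   %.1f%n", nom / noc);
        System.out.printf("Classes per package: %.1f%n", noc / nop);
    }
}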
10-99 Motivational Example

11-99 Review of Defective Toyota Camry's System (1/3)
● Expert source code and system review after reported cases of accidents due to cars accelerating without user input *
● 18 months of review + a previous code review by NASA experts
● Investigation on unintended accelerations
* http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf

12-99 Review of Defective Toyota Camry's System (2/3)
● Usage of software metrics (p.24):
● "Data-flow spaghetti – Complex coupling between software modules and between tasks – Count of global variables is a software metric for 'tangledness' → 2005 Camry L4 has >11,000 global variables (NASA)"
* http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf

13-99 Review of Defective Toyota Camry's System (3/3)
● Usage of software metrics (p.24):
● "Control-flow spaghetti – Many long, overly-complex function bodies – Cyclomatic Complexity is a software metric for 'testability' → 2005 Camry L4 has 67 functions scoring >50 ('untestable') → The throttle angle function scored over 100 (unmaintainable)"
● See also pp.30-31 for coding rule violations and the expected number of bugs
* http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf

14-99 Pitfalls in linking real-world phenomena to numbering systems
https://xkcd.com/605/

15-99 Pitfall Example (1/3)
● A/B testing is a kind of randomized experiment in which you propose two variants of the same application to the users
● Set up an experiment with two browsers and two variations of the same webpage
Conversion rates: Firefox – A: 87.50%, B: 100.00%; Chrome – A: 50.00%, B: 62.50%
What can you conclude? Which alternative is better?
https://medium.com/homeaway-tech-blog/simpsons-paradox-in-a-b-testing-93af7a2f3307

16-99 Pitfall Example (2/3)
● Let's look at the same table, but with additional information about the way the tests were split
Firefox – A: 70/80 = 87.5%, B: 20/20 = 100%
Chrome – A: 10/20 = 50%, B: 50/80 = 62.5%
Both – A: 80/100 = 80%, B: 70/100 = 70%
https://medium.com/homeaway-tech-blog/simpsons-paradox-in-a-b-testing-93af7a2f3307

17-99 Pitfall Example (3/3) – Simpson's paradox
● It can happen that:
a/b < A/B
c/d < C/D
(a + c)/(b + d) > (A + C)/(B + D)
● Example:
1/5 (20%) < 2/8 (25%)
6/8 (75%) < 4/5 (80%)
7/13 (53%) > 6/13 (46%)
Dept A: Men – 5 applicants, 20% admitted; Women – 8 applicants, 25% admitted
Dept B: Men – 8 applicants, 75% admitted; Women – 5 applicants, 80% admitted
Total: Men – 13 applicants, 53% admitted; Women – 13 applicants, 46% admitted
See: https://plato.stanford.edu/entries/paradox-simpson/ – considering the following papers:
J. Pearl (2000). Causality: Models, Reasoning, and Inference, Cambridge University Press.
P.J. Bickel, E.A. Hammel and J.W. O'Connell (1975). "Sex Bias in Graduate Admissions: Data From Berkeley." Science 187 (4175): 398–404.
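A minimal Java sketch of the A/B-testing pitfall above, using exactly the split from slide 16 (70/80 vs 20/20 on Firefox, 10/20 vs 50/80 on Chrome): variant B wins in every browser segment, yet the pooled rate favours A, because the traffic split between browsers is very unbalanced.

/** Simpson's paradox with the slide's A/B-testing numbers. */
public class SimpsonsParadox {

    static double rate(int conversions, int visitors) {
        return 100.0 * conversions / visitors;
    }

    public static void main(String[] args) {
        // conversions / visitors per segment and variant (from slide 16)
        int[] firefoxA = {70, 80}, firefoxB = {20, 20};
        int[] chromeA  = {10, 20}, chromeB  = {50, 80};

        System.out.printf("Firefox: A=%.1f%%  B=%.1f%%%n",
                rate(firefoxA[0], firefoxA[1]), rate(firefoxB[0], firefoxB[1]));
        System.out.printf("Chrome:  A=%.1f%%  B=%.1f%%%n",
                rate(chromeA[0], chromeA[1]), rate(chromeB[0], chromeB[1]));

        // Pooled over both browsers the ordering flips: 80% for A vs 70% for B.
        System.out.printf("Pooled:  A=%.1f%%  B=%.1f%%%n",
                rate(firefoxA[0] + chromeA[0], firefoxA[1] + chromeA[1]),
                rate(firefoxB[0] + chromeB[0], firefoxB[1] + chromeB[1]));
    }
}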
18-99 Background on Software Measurement

19-99 Software Measurement Methods
Measurement artifacts / objects: Product (architecture, implementation, documentation), Process (management, lifecycle, CASE), Resources (personnel, software, hardware)
Measurement models: flow graphs, call graphs, structure tree, code schema, ...
Scale types, statistics: correlation, estimation, adjustment, calibration
Measurement evaluation: analysis, visualization, exploration, prediction, ...
Measurement goals: understanding, learning, improvement, management, controlling, ...
Measurement operations: artefact-based, quantification-based, value-based, experience-based

20-99 Measurement Information Model (ISO/IEC 15939)
Attributes of an Entity → (Measurement Method) → Base Measures → (Measurement Function) → Derived Measures → ((analysis) Model) → Indicator → (Interpretation) → Information Product, which satisfies the Information Needs
Measurable Concept: abstract relationship between attributes of entities and information needs

21-99 Measurement Information Model (ISO/IEC 15939) – bottom part
Attribute: property relevant to information needs
Measurement Method: operations mapping an attribute to a scale
Base Measure: variable assigned a value by applying the method to one attribute
Measurement Function: algorithm for combining two or more base measures
Derived Measure: variable assigned a value by applying the measurement function to two or more values of base measures

22-99 Measurement Information Model (ISO/IEC 15939) – top part
(Analysis) Model: algorithm for combining measures and decision criteria
Indicator: variable assigned a value by applying the analysis model to base and/or derived measures
Interpretation: explanation relating the quantitative information in the indicator to the information needs
Information Product: the outcome of the measurement process that satisfies the information needs

23-99 ISO/IEC 15939 Examples
● External quality measures – Functionality – Accuracy: entity = software; attributes = run-time accuracy, run-time usability; B1 = nr. of inaccurate computations encountered by users; B2 = operation time; derived measure = B1/B2 (computational accuracy); indicator = comparison of the values obtained with generic thresholds and/or targets
● External quality measures – Reliability – Maturity: entity = software; attributes = run-time reliability, level of testing; B1 = number of detected failures; B2 = number of performed test cases; derived measure = B1/B2 (failure density against test cases); indicator = comparison of the values obtained with generic thresholds and/or targets
Inspired by Abran, Alain, et al. "An information model for software quality measurement with ISO standards." Proceedings of the International Conference on Software Development (SWDC-REK), Reykjavik, Iceland, 2005.

24-99 Measure Definition
● A measure is a mapping between
– The real world
– The mathematical or formal world with its objects and relations
● Different mappings give different views of the world depending on the context (height, weight, …)
● The mapping relates attributes to mathematical objects; it does not relate entities to mathematical objects
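A minimal Java sketch of the ISO/IEC 15939 chain above, instantiated with the failure-density example from slide 23: two base measures, a measurement function, and an indicator obtained by comparing the derived measure against a threshold. Class and method names, input values and the threshold are illustrative assumptions, not taken from the standard.

/** Sketch of the ISO/IEC 15939 chain with the failure-density example:
 *  base measures -> measurement function -> derived measure -> analysis model -> indicator. */
public class FailureDensityIndicator {

    // Base measures: values obtained by applying a measurement method to one attribute each.
    static int detectedFailures = 12;    // B1 = number of detected failures
    static int executedTestCases = 400;  // B2 = number of performed test cases

    // Measurement function: combines two or more base measures into a derived measure.
    static double failureDensity(int failures, int testCases) {
        return (double) failures / testCases;   // B1 / B2
    }

    // Analysis model: combines the derived measure with a decision criterion (threshold).
    static String indicator(double density, double threshold) {
        return density <= threshold ? "within target" : "above target - investigate";
    }

    public static void main(String[] args) {
        double density = failureDensity(detectedFailures, executedTestCases);
        // Interpretation: relate the indicator back to the information need
        // ("is the product mature enough, given the level of testing?").
        System.out.printf("Failure density = %.3f failures/test case -> %s%n",
                density, indicator(density, 0.05 /* illustrative threshold */));
    }
}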
25-99
● The validity of a measure depends on a definition of the attribute that is coherent with the specification of the real world
● Example: Is LOC a valid measure of productivity?
→ Think by paradox: 100K lines of system.out statements vs 100K lines of complex loops and statements
ADDITIONAL PROBLEM: you might have two different projects with two different definitions of LOCs (e.g., counting blanks+comments vs only ";"), so that the following can be true at the same time: P1 > P2 and P1 < P2

26-99 – 31-99 Measurement Scales (1/4)
The scale types form a hierarchy Nominal → Ordinal → Interval → Ratio, in which each type admits progressively more operations and statistics: equality/inequality (=, ≠) only; ordering with min, max and the median; differences with the arithmetic mean; ratios and proportions

32-99 Measurement Scales (2/4)
● Some examples of measures and related scales (Scale Type – Examples in Software Eng. – Indicators of Central Tendency):
Nominal – Name of the programming language (e.g. Java, C++, C#) – Mode
Ordinal – Ranking of failures (as a measure of failure severity) – Mode + Median
Interval – Beginning date, end date of activities – Mode + Median + Arithmetic Mean
Ratio – LOC (as a measure of program size) – Mode + Median + Arithmetic Mean + Geometric Mean
Morasca, Sandro. "Software measurement." Handbook of Software Engineering and Knowledge Engineering (2001): 239-276.

33-99 Measurement Scales (3/4) - Examples
● Example: suppose that we have the following ranking of software tickets by severity (Level – Severity – Description):
6 – Blocker – Prevents a function from being used, no workaround, blocking progress on multiple fronts
5 – Critical – Prevents a function from being used, no workaround
4 – Major – Prevents a function from being used, but a workaround is possible
3 – Normal – A problem making a function difficult to use, but no special workaround is required
2 – Minor – A problem not affecting the actual function, but the behavior is not natural
1 – Trivial – A problem not affecting the actual function; a typo would be an example

34-99 Measurement Scales (4/4) - Examples
● Is it meaningful to use the weighted average to compare two projects in terms of severity of the open issues?
Open issues (Order – Severity – P1 – P2):
6 – Blocker – 2 – 10
5 – Critical – 36 – 19
4 – Major – 25 – 22
3 – Normal – 15 – 32
2 – Minor – 2 – 5
1 – Trivial – 121 – 113
Let's define the following metric:
Sev(Pn) = avg(∑ issues_i ∗ weight_i)
Sev(P1) = (2∗6 + 36∗5 + 25∗4 + 15∗3 + 2∗2 + 121∗1) / 6 = 462/6 = 77
Sev(P2) = (10∗6 + 19∗5 + 22∗4 + 32∗3 + 5∗2 + 113∗1) / 6 = 462/6 = 77
Are the projects the same according to our metric? Is there the "same distance" from a critical ticket to a blocker as there is between minor and trivial?
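A minimal Java sketch of the example above: the weighted "average severity" makes P1 and P2 look identical (both score 77), while the raw counts per level tell a different story. The weights and counts are the slide's; the comparison of Blocker/Critical counts at the end is my own illustration of the information the mean hides.

/** The slide's two projects score identically (77) under the weighted "average severity",
 *  even though their issue profiles differ: an arithmetic mean over an ordinal severity
 *  scale hides this. Counts and weights are taken from slide 34. */
public class SeverityAverage {

    // index 0..5 = Trivial (weight 1) .. Blocker (weight 6)
    static final int[] P1 = {121, 2, 15, 25, 36, 2};
    static final int[] P2 = {113, 5, 32, 22, 19, 10};

    static double sev(int[] issues) {
        double weighted = 0;
        for (int i = 0; i < issues.length; i++) {
            weighted += issues[i] * (i + 1);   // issues_i * weight_i
        }
        return weighted / issues.length;       // "avg" as defined on the slide
    }

    public static void main(String[] args) {
        System.out.printf("Sev(P1) = %.0f, Sev(P2) = %.0f%n", sev(P1), sev(P2)); // both 77
        // What the average hides: P2 has five times as many Blockers as P1.
        System.out.printf("Blockers: P1=%d vs P2=%d; Criticals: P1=%d vs P2=%d%n",
                P1[5], P2[5], P1[4], P2[4]);
    }
}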
35-99 Direct vs Indirect Measures (1/2)
● Some measures are harder to collect or are not regularly collected
– Direct: from a direct process of measuring
– Indirect: from a mathematical equation in the world of symbols
(see the ISO/IEC 15939 bottom-part diagram above: this is what the standard refers to as base measure and derived measure)

36-99 Direct vs Indirect Measures (2/2)
● Direct
– Number of known defects
● Indirect
– Defect density (DD): DD = known defects / product size
– COCOMO, measure of effort: E = a · KSLoC^b · EAF, where a = 2.94 and b = 0.91 + 0.01 · ∑(i=1..5) SF_i
(EAF = Effort Adjustment Factor, SF = Scale Factors)

37-99 Internal vs External Attributes (1/4)
● Generally, it is easier to collect measures of length and complexity of the code (internal attributes of the product) than measures of its quality (external attributes)
– Internal attributes: internal characteristics of product, process, and human resources
– External attributes: due to the external environment

38-99 Internal vs External Attributes (2/4)
● One of the aims of Software Engineering is to improve the quality of software

39-99 Internal vs External Attributes (3/4)
● The mapping of internal attributes to external ones – and then to quality in use – is not straightforward

40-99 Internal vs External Attributes (4/4)
● The mapping of internal attributes to external ones – and then to quality in use – is not straightforward (example: reliability)
External view: nr. of failures over a period of time
Internal view: how many faults were detected in the reviewed product? X = A/B, where A = absolute number of faults detected in review and B = number of faults estimated to be detected in review (using past history or a reference model)
Is there a relation between the two? ASSUMPTION (!): fixing the internal faults fixes the corresponding failure(s)

41-99 Objective vs Subjective Measures
Objective: the same each time they are taken (e.g., collected automatically by some tool or device) → e.g., LOCs
Subjective: manually collected by individuals → e.g., time to use a functionality in an application

42-99 SOFTWARE METRICS - SIZE

43-99 Various Measures of Size
[01] * multiples. Repeat until there are no more multiples
[02] * in the array.
[03] */
[04] public class PrimeGenerator
[05] {
[06]   private static boolean[] crossedOut;
[07]   private static int[] result;
[08]   public static int[] generatePrimes(int maxValue){
[09]     if (maxValue < 2){
[10]       return new int[0];
[11]     }else{
[12]       uncrossIntegersUpTo(maxValue);
[13]       crossOutMultiples();
[14]       putUncrossedIntegersIntoResult();
[15]       return result;
[16]     }
[17]   }
[18] }
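Before moving on to the size measures of this listing, here is a minimal Java sketch going back to the indirect measures of slide 36: a derived value computed from directly collected ones. The defect-density and COCOMO formulas (a = 2.94, b = 0.91 + 0.01·ΣSF_i) are the slide's; the sample scale factors, EAF, size and defect count plugged in are illustrative placeholders, not calibration data.

/** Indirect measures derived from direct ones: defect density and a COCOMO-style effort estimate. */
public class IndirectMeasures {

    /** DD = known defects / product size (here size in KSLoC). */
    static double defectDensity(int knownDefects, double ksloc) {
        return knownDefects / ksloc;
    }

    /** E = a * KSLoC^b * EAF, with a = 2.94 and b = 0.91 + 0.01 * sum(SF_i). */
    static double cocomoEffort(double ksloc, double[] scaleFactors, double eaf) {
        double sumSf = 0;
        for (double sf : scaleFactors) sumSf += sf;
        double b = 0.91 + 0.01 * sumSf;
        return 2.94 * Math.pow(ksloc, b) * eaf;
    }

    public static void main(String[] args) {
        double ksloc = 50;                              // direct: measured size (placeholder)
        int knownDefects = 120;                         // direct: counted defects (placeholder)
        double[] sf = {3.72, 3.04, 4.24, 3.29, 4.68};   // five scale factors (placeholders)
        double eaf = 1.10;                              // effort adjustment factor (placeholder)

        System.out.printf("Defect density: %.2f defects/KSLoC%n",
                defectDensity(knownDefects, ksloc));
        System.out.printf("Estimated effort: %.1f person-months%n",
                cocomoEffort(ksloc, sf, eaf));
    }
}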
44-99 – 46-99 Various Measures of Size
For the same listing as above:
LOC = 18 (Lines Of Code)
CLOC = 3 (Commented Lines Of Code)
NLOC = 15 (Non-Commented Lines Of Code)
NOC = 1 (Number Of Classes)
NOM = 1 (Number of Methods)
NOP = 1 (Number of Packages)

47-99 Measures of Size – Good for?
● Size is used for normalization of existing measures
→ from the example before, it would be much more useful to report a comment density of 16% (3/18) rather than 3 CLOCs
CD = CLOCs / LOCs = 3/18 = 0.16

48-99 Measures of Size – Good for?
● Example: using comment density to compare Open Source projects after normalization
What is a good reference value for "comment density" in your opinion? These look "scary"
O. Arafat and D. Riehle, "The comment density of open source software code," in 31st International Conference on Software Engineering - Companion Volume, 2009. ICSE-Companion 2009, 2009, pp. 195–198.

49-99 Measures of Size – Good for?
● Size can give a good rough initial estimation of effort, although...
→ Measures of source code size should *never* be used to assess the productivity of developers
How would you compare Mozilla Firefox with the Linux Kernel in terms of maintenance effort?
Software – LOCs: Microsoft Windows Vista ~50M; Linux Kernel 3.1 ~15M; Android ~12M; Mozilla Firefox ~10M; Unreal Engine 3 ~2M

50-99 Measures of Size – Good for?
● Size can be used for comparison of projects and across releases
→ http://www.informationisbeautiful.net/visualizations/million-lines-of-code/

51-99 Another Observation about LOCs
"The task then is to refine the code base to better meet customer need. If that is not clear, the programmers should not write a line of code. Every line of code costs money to write and more money to support."
Jeff Sutherland, one of the main proponents of the Agile Manifesto and the Scrum methodology

52-99 SOFTWARE METRICS - COMPLEXITY

53-99
● G=(N,E) is a graph representing the control flow of a program. N = nodes, E = edges, P = nr.
disconnected parts of G, like main program and method call ● Cyclomatic Complexity is defined as: v(G) = |E|-|N|+ 2P Cyclomatic Complexity (CC) → Assumptions: higher complexity of the program flow graphs, more complex testing process for the source code 54-99 CC = 2 [01] * multiples. Repeat until there are no more multiples [02] * in the array. [03] */ [04] public class PrimeGenerator{ [05] private static boolean[] crossedOut; [06] private static int[] result; [07] public static int[] generatePrimes(int maxValue){ [08] if (maxValue < 2){ [09] return new int[0]; [10] }else{ [11] uncrossIntegersUpTo(maxValue); [12] crossOutMultiples(); [13] putUncrossedIntegersIntoResult(); [14] return result; [15] } [16] } [17] } Typical ranges 1-4 low 5-7 medium 8-10 high 11+ very high Cyclomatic Complexity (CC) CC of method generatePrimes v(G)=|E|-|N|+2 v(G)=9-9+2=2 entry exit 55-99 ● The following code structure from a 2008 students' project implementing chess: one method with 292LOCs and 163 CC Example by using CC 56-99 ● Let's decompose a bit such huge method public boolean eatCoin(Movement mov, Movement eatMov, Coin coin) throws IOException{ //Controls if the eatMove is in the board, if not return if(!canMove(eatMov)){ System.out.println("You can't eat this coin"); return false; } try{ //If it is a coin if(!this.board[mov.row][mov.col].isKing()){ //If the coin to eat isn't a king System.out.println("nextRow " + mov.nextRow + " nextCol " + mov.nextCol + " isKing " + this.board[mov.nextRow][mov.nextCol].isKing()); if(!this.board[mov.nextRow][mov.nextCol].isKing()){ .... Example by using CC 57-99 Example by using CC 58-99 ● A word of warning is that metrics take typically into account syntactic complexity NOT semantic complexity ● Both of the following code fragments have the *same* Cyclomatic Complexity → which code fragment is easier to understand? [04] public class PrimeGenerator [05] { [06] private static boolean[] crossedOut; [07] private static int[] result; [08] [09] public static int[] generatePrimes(int maxValue){ [10] if (maxValue < 2){ [11] return new int[0]; [12] }else{ [13] uncrossIntegersUpTo(maxValue); [14] crossOutMultiples(); [15] putUncrossedIntegersIntoResult(); [16] return result; [17] } [18] } [04] public class A [05] { [06] private static boolean[] c; [07] private static int[] b; [08] [09] public static int[] generate(int m){ [10] if (m < 2){ [11] return new int[0]; [12] }else{ [13] methodOne(m); [14] methodTwo(); [15] methodThree(); [16] return b; [17] } [18] } ● As well, as in the initial motivating example, a word of warning when comparing projects in terms of average complexity Complexity 59-99 OBJECT ORIENTED METRICS 60-99 ● WMC: Weighted methods per class → nr. of methods per class ● DIT: Depth of Inheritance Tree → max inheritance level from the root to the class ● NOC: Number of Children → nr. Of direct descendants of a class ● CBO: Coupling between object classes → Class A coupled with B, if A is using methods/attributes of B ● RFC: Response for a Class → count of methods that can be executed by class A responding to a message ● LCOM: Lack of cohesion in methods → (see next slide!) Chidamber & Kemerer Suite (1994!) 
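Before moving to the next slide, here is a minimal Java sketch going back to Cyclomatic Complexity: instead of building the control-flow graph and computing v(G) = |E| − |N| + 2P, many tools simply count the decision points (if, while, for, case, catch, &&, ||) in a method and add 1, which is equivalent for structured single-entry/single-exit code. Applied to the generatePrimes body above (one if), it gives the same CC = 2 as the flow graph on slide 54. The regular expression is a naive illustration, not a robust parser (comments and string literals would be miscounted in real code).

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Naive cyclomatic-complexity estimate: 1 + number of decision points. */
public class CyclomaticEstimate {

    private static final Pattern DECISION_POINTS = Pattern.compile(
            "\\b(if|while|for|case|catch)\\b|&&|\\|\\|");

    static int estimate(String methodBody) {
        Matcher m = DECISION_POINTS.matcher(methodBody);
        int count = 0;
        while (m.find()) count++;
        return count + 1;
    }

    public static void main(String[] args) {
        String generatePrimes =
                "if (maxValue < 2){ return new int[0]; }" +
                "else { uncrossIntegersUpTo(maxValue); crossOutMultiples(); " +
                "putUncrossedIntegersIntoResult(); return result; }";
        System.out.println("CC(generatePrimes) = " + estimate(generatePrimes)); // 2
    }
}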
61-99 More recent metrics

62-99 FINAL REMARKS

63-99 Final Remarks
● Given all that we have seen, what are your thoughts about the following metric (from the 90's but still used), computing the Maintainability Index (MI) of a project:
MI = 171 − 5.2·ln(V) − 0.23·CC − 16.2·ln(LOC)
Note: you might see different versions of MI implemented in different tools – this is the original formula, which has range (171, −∞); other variations map to the (0, 100) range, e.g. look at the Microsoft Visual Studio documentation for details
Where V is the Halstead volume, measuring the complexity of code based on the length and vocabulary used (in the code):
V = N · log2(n)
where N = N1 + N2 (N1 = total operators, like >, ;, etc.; N2 = total operands, like j, i, 0, etc.)
and n = n1 + n2 (n1 = unique operators, n2 = unique operands)
In your view, what is good and what is bad about this metric?

64-99 The Goal Question Metric (GQM) Approach

65-99 Software Measurement - Pitfalls
● Common pitfalls in software measurement
– Collecting measurements without a meaning → measurement must be goal-driven
– Not analyzing measurements → numbers need detailed analysis
– Setting unrealistic targets → targets should not be defined uniquely based on the numbers
– Paralysis by analysis → measurement is a key activity in management, not a separate activity
"Count what is countable. Measure what is measurable. And what is not measurable, make measurable." (attributed to Galileo Galilei)

66-99 The GQM Approach
● Introduced in 1986 by Rombach and Basili – GQM stands for Goal Question Metric
● It is a deductive instrument to derive suitable measures from prescribed goals
● The paradigm is initiated by Business Goals (BG)
● From the BGs we can derive the GQM
● The Goal Question Metric top-down approach consists of three layers
– Conceptual layer – the Measurement Goal (G)
– Operational layer – the Question (Q)
– Measurement layer – the Metric (M)

67-99 Goal-oriented Measurement
● Measurements must be goal-oriented
● They typically follow a structure such as the GQM approach:
Measurement Goal (G) – business objectives, key performance indicators, project targets, improvement goals – "What are the goals to reach? What do I need to improve?"
Question (Q) – approaches to reach the goals, improvement programs, change management, project management techniques – "How do I reach my objectives? Will I improve?"
Metric (M) – business, employees, products, processes – "Am I doing well or badly? Am I doing better or worse?"
Goals are defined top-down (Define) and reviewed through a feedback loop (Review, understand)

68-99 Goal-oriented Measurement
The primary question must be "What do I need to improve?" rather than "What measurements should I use?"
(same Goal/Question/Metric structure as on the previous slide)

69-99 The Measurement Goal
● Here are some possible and commonly used words for each item of the Goal structure
● Object of study: process, product, model, metric, etc.
● Purpose: characterize, evaluate, predict, motivate, etc., in order to understand, assess, manage, engineer, improve, etc. it
● Point of view: manager, developer, tester, customer, etc.
● Perspective or Focus: cost, effectiveness, correctness, defects, changes, product measures, etc.
● Environment or Context: specify the environmental factors, including process factors, people factors, problem factors, methods, tools, constraints, etc.
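Going back to the Maintainability Index formula on slide 63, a minimal Java sketch fed with hand-picked illustrative numbers (the Halstead counts, CC and LOC below are placeholders, not measured from real code). It also shows how strongly the logarithmic size term dominates: doubling LOC alone costs about 16.2·ln(2) ≈ 11 points, regardless of what the code does – one input for the "what is good and bad about this metric" discussion.

/** Maintainability Index as on slide 63: MI = 171 - 5.2*ln(V) - 0.23*CC - 16.2*ln(LOC),
 *  with Halstead volume V = N * log2(n), N = N1 + N2 and n = n1 + n2. */
public class MaintainabilityIndex {

    static double halsteadVolume(int totalOperators, int totalOperands,
                                 int uniqueOperators, int uniqueOperands) {
        int bigN = totalOperators + totalOperands;      // program length N
        int smallN = uniqueOperators + uniqueOperands;  // vocabulary n
        return bigN * (Math.log(smallN) / Math.log(2));
    }

    static double mi(double volume, double cyclomaticComplexity, double loc) {
        return 171 - 5.2 * Math.log(volume) - 0.23 * cyclomaticComplexity - 16.2 * Math.log(loc);
    }

    public static void main(String[] args) {
        double v = halsteadVolume(300, 250, 20, 35);    // placeholder Halstead counts
        System.out.printf("V  = %.1f%n", v);
        System.out.printf("MI = %.1f (original scale: 171 down to -inf)%n", mi(v, 15, 600));
        // Doubling LOC alone lowers MI by ~11 points.
        System.out.printf("MI with doubled LOC = %.1f%n", mi(v, 15, 1200));
    }
}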
70-99 SQALE (Software Quality Assessment Based on Lifecycle Expectations)

71-99 SQALE
● SQALE (Software Quality Assessment Based on Lifecycle Expectations) is a quality method to evaluate technical debt in software projects based on the measurement of software characteristics
● It also lets us discuss how quality characteristics can be mapped into numerical representations

72-99 SQALE
● The SQALE quality model is based around three levels: Characteristic (Level 1) 1–1,n Sub-Characteristic (Level 2) 1–1,n Source Code Requirement (Level 3)
● The first level includes 8 software characteristics: Testability, Reliability, Changeability, Efficiency, Security, Maintainability, Portability, Reusability

73-99 SQALE
● The second level is formed by sub-characteristics, e.g.:
Testability → Unit Testing Testability, Integration Testing Testability
Reliability → Data-related, Logic-related, Statement-related, Synchronization-related, Resource-related and Architecture-related reliability, Fault tolerance
and further sub-characteristics (Understandability, Readability, ...) for the remaining characteristics

74-99 SQALE
● The third level links language-specific constructs (source code requirements) to the sub-characteristics, e.g.:
Number of parameters in a module call (NOP) < 6
Coupling between objects (CBO) < 7
Switch statements have a 'default' condition
No assignment '=' within an 'if' statement
No assignment '=' within a 'while' statement
Invariant iteration index
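A minimal Java sketch of the three-level idea above: two level-3 source code requirements (the NOP < 6 and CBO < 7 thresholds quoted on the slide) are checked against an artefact's measured values, and non-compliances are reported together with their characteristic. The characteristic/sub-characteristic each requirement is attached to, the measured values and the remediation hours are invented placeholders; how SQALE actually turns non-compliances into costs is covered on the next slides.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch of SQALE levels: characteristic <- sub-characteristic <- source code requirement. */
public class SqaleRequirements {

    record Requirement(String characteristic, String subCharacteristic,
                       String description, int threshold, double remediationHours) {}

    public static void main(String[] args) {
        // Which characteristic a requirement maps to is a placeholder here, not the SQALE mapping.
        List<Requirement> requirements = List.of(
                new Requirement("Changeability", "Architecture-related changeability",
                        "Number of parameters in a module call (NOP)", 6, 0.5),
                new Requirement("Changeability", "Architecture-related changeability",
                        "Coupling between objects (CBO)", 7, 1.0));

        // Measured values for one artefact (placeholders).
        Map<String, Integer> measured = Map.of(
                "Number of parameters in a module call (NOP)", 9,
                "Coupling between objects (CBO)", 5);

        List<String> violations = new ArrayList<>();
        double remediation = 0;
        for (Requirement r : requirements) {
            if (measured.get(r.description()) >= r.threshold()) {   // requirement is "< threshold"
                violations.add(r.characteristic() + " / " + r.subCharacteristic()
                        + ": " + r.description());
                remediation += r.remediationHours();
            }
        }
        violations.forEach(System.out::println);
        System.out.printf("Estimated remediation: %.1f hours%n", remediation);
    }
}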
75-99 SQALE – Remediation Function
● For each source code requirement we need to associate a remediation function that translates non-compliances into remediation costs
● In the most complex case you can associate a different function with each requirement; in the simplest case you can have a predefined value for the categories into which code requirements fall

76-99 SQALE – Non-Remediation Function
● Non-remediation functions represent the cost of keeping a non-conformity, i.e., a negative impact from the business point of view

77-99 SQALE – Indices
● The sum of all the remediation costs associated with a particular hierarchy of characteristics constitutes an index:
– SQALE Testability Index: STI
– SQALE Reliability Index: SRI
– SQALE Changeability Index: SCI
– SQALE Efficiency Index: SEI
– SQALE Security Index: SSI
– SQALE Maintainability Index: SMI
– SQALE Portability Index: SPI
– SQALE Reusability Index: SRuI
– SQALE Quality Index: SQI (overall index)
* Note that there is a version of each index that represents density, normalized by some measure of size

78-99 SQALE – Rating
● Indices can be used to build a rating value:
Rating = estimated remediation cost / estimated development cost
Example: an artefact with an estimated development cost of 300 hours and an STI of 8.30 hours, using the reference table on the left:
Rating = 8.30h / 300h = 2.7% → C

79-99 SQALE – Rating
● The final representation can take the form of a Kiviat diagram in which the different density indices are represented

80-99 SQALE – Rating
● This is the view you find in SonarQube
http://www.sonarqube.org/sonar-sqale-1-2-in-screenshot

81-99 SQALE
● Given our initial discussion of measurement pitfalls, scales and the representation condition, the following sentence should now be clear:
"Because the non-remediation costs are not established on an ordinal scale but on a ratio scale, we have shown [..] that we can aggregate the measures by addition and comply with the measurement theory and the representation clause."
Letouzey, Jean-Louis, and Michel Ilkiewicz. "Managing technical debt with the SQALE method." IEEE Software 6 (2012): 44-51.

82-99 Case Studies

83-99 Case Study
● Suppose that we have some projects on which we computed the following set of metrics
→ What can you say about the projects?
Metric – Project01, Project02, Project03, Project04, Project05, Project06
# LOCs – 4920, 5817, 4013, 4515, 3263, 5735
# packages – 29, 49, 33, 35, 25, 33
# classes – 126, 199, 159, 181, 75, 198
# methods – 658, 862, 644, 817, 415, 715
# attributes – 153, 196, 227, 285, 78, 177
# parameters – 301, 459, 393, 440, 182, 415
# local vars – 493, 533, 325, 397, 339, 416
# calls – 2051, 2830, 1844, 2297, 917, 2015
Proj_status – complete, complete, incomplete, complete, incomplete, complete

84-99 Case Study
● What if we consider relative instead of absolute values?
● This would allow to compare the values across projects Project01 Project02 Project03 Project04 Project05 Project06 LOCs/NOM 7.48 6.75 6.23 5.53 7.86 8.02 NOC/NOP 4.34 4.06 4.82 5.17 3.00 6.00 NOM/NOC 5.22 4.33 4.05 4.51 5.53 3.61 att/NOC 1.21 0.98 1.43 1.57 1.04 0.89 param/NOM 0.46 0.53 0.61 0.54 0.44 0.58 locvars/NOM 0.75 0.62 0.50 0.49 0.82 0.58 Calls/NOM 3.12 3.28 2.86 2.81 2.21 2.82 highest value Proj_status complete complete incomplete complete incomplete complete lowest value Case Study 85-99 Case Study ● What if we make sense out of the metrics by using the GQM approach? G1. Analyze the software product (object of study) for the purpose of evaluation (purpose) with respect to the effectiveness of code structure (quality focus) from the point of view of the development team (point of view) in the environment of our project named xyx (environment). Q1.1. what is the structure of the system? M1.2.1 Calls/NOM M1.2.2 param/NOM M1.1.3 NOM/NOC Q1.2. what is the coupling within the system? M1.1.1 NOC/NOP M1.1.2 LOCs/NOM 86-99 Case Study ● What if we make sense out of the metrics by using the GQM approach? G1. Analyze the software product (object of study) for the purpose of evaluation (purpose) with respect to the effectiveness of code structure (quality focus) from the point of view of the development team (point of view) in the environment of our project named xyx (environment). Q1.1. what is the structure of the system? M1.2.1 Calls/NOM M1.2.2 param/NOM M1.1.3 NOM/NOC Q1.2. what is the coupling within the system? M1.1.1 NOC/NOP M1.1.2 LOCs/NOM P1: 3.12 P5: 2.21 P1: 0.46 P5: 0.44 87-99 Case Study ● What happens if we consider LOCs instead of NOMs? G1. Analyze the software product (object of study) for the purpose of evaluation (purpose) with respect to the effectiveness of code structure (quality focus) from the point of view of the development team (point of view) in the environment of our project named xyx (environment). Q1.1. what is the structure of the system? M1.2.1 Calls/LOCs M1.2.2 param/LOCs M1.1.3 NOM/NOC Q1.2. what is the coupling within the system? 
M1.1.1 NOC/NOP M1.1.2 LOCs/NOM
P1: 0.41 P5: 0.28
P1: 0.14 P5: 0.05

88-99 Case Study – The Overview Pyramid
● Another useful way to think in terms of relative values and thresholds is to use the Overview Pyramid
● The Overview Pyramid allows us to represent three different aspects of internal quality: inheritance, size & complexity, and coupling
● It provides both absolute and relative values that are compared against typical thresholds
NOP: Number of Packages – NOC: Number of Classes – NOM: Number of Methods – LOC: Lines of Code – CYCLO: Cyclomatic Complexity – ANDC: Average Number of Derived Classes – AHH: Average Hierarchy Height – CALL: Number of Distinct Method Invocations – FANOUT: Number of Called Classes

89-99 Case Study – The Overview Pyramid
Overview pyramids of Project 1, Project 2, Project 3 (legend: close to high / close to average / close to low)

90-99 Case Study – The Overview Pyramid
Overview pyramids of Project 4, Project 5, Project 6 (legend: close to high / close to average / close to low)

91-99 Case Study – The Overview Pyramid
Back to our initial project: Eclipse JDT 3.5.0 and its overview pyramid (legend: close to high / close to average / close to low)

92-99 Conclusions
● Measurement is important to track the progress of software projects and to focus on the relevant parts that need attention
● As such, we always need to take measurements with a "grain of salt"
● Still, collecting non-relevant or non-valid metrics might be even worse than not collecting any valid measure at all

93-99 Extra Slides

94-99 List of some Acronyms
● LOCs: Lines of Code
● CC: McCabe Cyclomatic Complexity
● Fan-in: number of local flows that terminate in a module
● Fan-out: number of local flows that emanate from a module
● Information flow complexity of a module: length of the module times the square of the product of fan-in and fan-out
● NOM: Number of Methods per class
● WMC: Weighted Methods per Class
● DIT: Depth of Inheritance Tree
● NOC: Number of Children
● CBO: Coupling Between Objects
● RFC: Response For a Class
● LCOM: Lack of Cohesion of Methods
● ANDC: Average Number of Derived Classes
● AHH: Average Hierarchy Height

95-99 Software Engineering Laws (1/4)
Example: laws in Software Engineering – how were these derived?

96-99 Software Engineering Laws (2/4)
Information hiding in object-oriented programming
"A human being can concentrate on 7±2 items at a time"
"Productivity is improved by reducing accidents and controlling essence"
"Testing can show the presence but not the absence of errors"
Pr(A|B) = Pr(B|A)·Pr(A) / Pr(B)

97-99 Software Engineering Laws (3/4)
"Requirement deficiencies are the prime source of project failure"
"The value of a model depends on the view taken, but none is best for all purposes"
"The user will never know what they want until after the system is in production"
"Good designs require deep application domain knowledge"
"What applies to small systems does not apply to large ones"
"Everything put together falls apart sooner or later" → the 8 laws of software evolution

98-99
The number of transistors on an integrated circuit will double in about 18 months.
The number of radio communications doubles every 30 months “the number of lines of code a programmer can write in a fixed period of time is the same regardless of the programming language” “If builders built buildings the way programmers wrote programs, the first woodpecker that came along would destroy civilization” Perspective based inspections (along one dimension, for a specific stakeholder) are highly eeffective and efficient Software reuse reduces cycle time and increases productivity and quality Software Engineering Laws (4/4) 99-99 ● N. Fenton and J. Bieman, Software Metrics: A Rigorous and Practical Approach, Third Edition, 3 edition. Boca Raton: CRC Press, 2014. ● C. Ebert and R. Dumke, Software Measurement: Establish - Extract Evaluate - Execute, Softcover reprint of hardcover 1st ed. 2007 edition. Springer, 2010. ● Lanza, Michele, and Radu Marinescu. Object-oriented metrics in practice: using software metrics to characterize, evaluate, and improve the design of object-oriented systems. Springer Science & Business Media, 2007. ● Some code samples from Martin, Robert C. Clean code: a handbook of agile software craftsmanship. Pearson Education, 2008. ● Moose platform for software data analysis http://moosetechnology.org ● The SQALE Method http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf References