PV260 - SOFTWARE QUALITY [Spring 2023] SOFTWARE MEASUREMENT & METRICS AND THEIR ROLE IN QUALITY IMPROVEMENT Bruno Rossi brossi@mail.muni.cz LAB OF SOFTWARE ARCHITECTURES AND INFORMATION SYSTEMS FACULTY OF INFORMATICS MASARYK UNIVERSITY, BRNO
2/94 ● The following defect (can you spot it?) in Apple's SSL code went undiscovered from Sept 2012 to Feb 2014 – how can that be? M. Bland, "Finding more than one worm in the apple," Communications of the ACM, vol. 57, no. 7, pp. 58–64, Jul. 2014. Introduction
3/94 ● Modern systems are very large & complex in terms of structure & runtime behaviour ● The figure on the right represents Eclipse JDT 3.5.0 (350K LOCs, 1,324 classes, 23,605 methods). [Figure legend: classes black, methods red, attributes blue; method containment, attribute containment, and class inheritance gray; invocations red; accesses blue] Introduction
4/94 ● We need ways to understand the attributes of software, represent them concisely, and use them to track software & development process improvement ● Software Measurement and Metrics are one of the aspects we can consider. If we consider the following metrics, what can we say? What are these metrics "good" for? LOCs 354,780 | NOM 23,605 | NOC 1,324 | NOP 45 (LOCs = lines of code, NOM = nr. of methods, NOC = nr. of classes, NOP = nr. of packages) Introduction
5/94 ● Typical problems related to software measurement: → How can I measure the maintainability of my software? → Can I estimate the number of defects in my software? → What is the productivity of my development team? → Can I measure the quality of my testing process? Introduction
6/94 Motivational Example
7/94 ● Expert source code and system review after reported cases of accidents due to cars accelerating without user input * ● 18-month review + previous code review by NASA experts ● Investigation of unintended acceleration * http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf Review of defective Toyota Camry's System (1/3)
8/94 ● Usage of software metrics (p.24): ● "Data-flow spaghetti – Complex coupling between software modules and between tasks – Count of global variables is a software metric for "tangledness" 2005 Camry L4 has >11,000 global variables (NASA)" * http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf Review of defective Toyota Camry's System (2/3)
9/94 ● Usage of software metrics (p.24): ● "Control-flow spaghetti – Many long, overly-complex function bodies – Cyclomatic Complexity is a software metric for "testability" 2005 Camry L4 has 67 functions scoring >50 ("untestable") The throttle angle function scored over 100 (unmaintainable)" ● See also p.30-31 for coding rule violations and the expected number of bugs * http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf Review of defective Toyota Camry's System (3/3)
10/94 Background on Software Measurement
11/94 Measurement is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules (Fenton & Pfleeger, 1997) Measurement
12/94 ● To avoid anecdotal evidence not backed by clear research (through experiments or prototypes, for example) ● To increase the visibility and the understanding of the process ● To analyze the software development process ● To make predictions through statistical models Gilb's Principle of fuzzy targets (1988): "Projects without clear goals will not achieve their goals clearly" Why Software Measurement
13/94 ● Although measurement may be integrated into development, very often the
objectives of measurement are not clear: "I measure the process because there is an automated tool that collects the metrics, but I do not know how to read the data and what I can do with the data" Tom De Marco (1982): "You cannot manage what you cannot measure"... ...but you need to know what to measure and how to measure it However...
14/94 ● The measurement process goes from the real world to the numerical representation ● Interpretation goes from the numerical representation to the relevant empirical results [Figure: Real World → (Measures) → Numbers → (Statistics) → Reduced Numbers → (Interpretation) → Relevant Empirical Results, crossing the "Intelligence Barrier"] The Measurement Process
15/94 ● A measure is a mapping between – The real world – The mathematical or formal world with its objects and relations ● Different mappings give different views of the world depending on the context (height, weight, …) ● The mapping relates attributes to mathematical objects; it does not relate entities to mathematical objects Measure Definition
16/94 ● The validity of a measure depends on a definition of the attribute that is coherent with the specification of the real world ● Example: Is LOC a valid measure of productivity? Think by paradox: 100K system.out statements vs 100K lines of complex loops and statements ADDITIONAL PROBLEM: You might have two different projects with two different definitions of LOCs (e.g., counting blanks+comments vs only ";"), so that P1>P2 and P1<P2 can be true at the same time
[Scale hierarchy figure: Nominal → Ordinal → Interval → Ratio; admissible statistics grow from =/≠, to min/max and median, to averages, to proportions] Measurement Scales (1/4)
23/94 ● Some examples of measures and related scales:
Scale Type | Examples in Software Eng. | Indicators of Central Tendency
Nominal | Name of the programming language (e.g. Java, C++, C#) | Mode
Ordinal | Ranking of failures (as a measure of failure severity) | Mode + Median
Interval | Beginning date, end date of activities | Mode + Median + Arithmetic Mean
Ratio | LOC (as a measure of program size) | Mode + Median + Arithmetic Mean + Geometric Mean
Morasca, Sandro. "Software measurement." Handbook of Software Engineering and Knowledge Engineering (2001): 239-276. Measurement Scales (2/4)
24/94 ● Example: suppose that we have the following ranking of software tickets by severity:
Level | Severity | Description
6 | Blocker | Prevents a function from being used, no workaround, blocking progress on multiple fronts
5 | Critical | Prevents a function from being used, no workaround
4 | Major | Prevents a function from being used, but a workaround is possible
3 | Normal | A problem making a function difficult to use, but no special workaround is required
2 | Minor | A problem not affecting the actual function, but the behavior is not natural
1 | Trivial | A problem not affecting the actual function, a typo would be an example
Measurement Scales (3/4) - example
25/94 ● Is it meaningful to use the weighted average to compare two projects in terms of severity of the open issues? Open issues per project:
Order | Severity | P1 | P2
6 | Blocker | 2 | 10
5 | Critical | 36 | 19
4 | Major | 25 | 22
3 | Normal | 15 | 32
2 | Minor | 2 | 5
1 | Trivial | 121 | 113
Let's define the following metric: Sev(Pn) = (Σ_i issues_i · weight_i) / 6, i.e., the weighted sum averaged over the six severity levels
Sev(P1) = (2·6 + 36·5 + 25·4 + 15·3 + 2·2 + 121·1)/6 = 462/6 = 77
Sev(P2) = (10·6 + 19·5 + 22·4 + 32·3 + 5·2 + 113·1)/6 = 462/6 = 77
Are the projects the same according to our metric? Is the "distance" from a critical ticket to a blocker the same as the distance from a minor ticket to a trivial one? Measurement Scales (4/4) - example
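To see the pitfall concretely, here is a minimal Java sketch of the Sev metric defined above, using the issue counts from the table; the class name and the array-based encoding are illustrative choices, not part of the slides. Both projects come out at exactly 77 even though their severity profiles are very different, which is the risk of averaging values that only live on an ordinal scale.

// Minimal sketch of the Sev metric from the slide above (illustrative only).
// Weights 1..6 mirror the severity levels Trivial..Blocker; the issue counts
// are taken from the P1/P2 table. Averaging ordinal levels assumes the
// "distance" between adjacent severities is constant, which is not guaranteed.
public class SeverityAverage {

    // counts indexed from Trivial (weight 1) to Blocker (weight 6)
    static double sev(int[] countsByWeight) {
        double weightedSum = 0;
        for (int i = 0; i < countsByWeight.length; i++) {
            weightedSum += countsByWeight[i] * (i + 1); // weight = severity level
        }
        return weightedSum / countsByWeight.length;     // average over the 6 levels
    }

    public static void main(String[] args) {
        int[] p1 = {121, 2, 15, 25, 36, 2};   // Trivial..Blocker for P1
        int[] p2 = {113, 5, 32, 22, 19, 10};  // Trivial..Blocker for P2
        System.out.println("Sev(P1) = " + sev(p1)); // 77.0
        System.out.println("Sev(P2) = " + sev(p2)); // 77.0 -> same score, very different profiles
    }
}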
26/94 Pitfalls in linking the real-world phenomenon to numbering systems https://xkcd.com/605/
27/94 ● A/B Testing is a kind of randomized experiment in which you can propose two variants of the same application to the users ● We set up an experiment with two browsers and two variations of the same webpage ● Conversion Rate: % of users completing an action
Browser | Conv Rate A | Conv Rate B
Firefox | 87.50% | 100.00%
Chrome | 50.00% | 62.50%
What can you conclude? Which alternative is better? https://medium.com/homeaway-tech-blog/simpsons-paradox-in-a-b-testing-93af7a2f3307 Pitfall Example (1/3)
28/94 ● Let's look at the same table, but with additional information about the way the tests were split https://medium.com/homeaway-tech-blog/simpsons-paradox-in-a-b-testing-93af7a2f3307
Browser | Conv Rate A | Conv Rate B
Firefox | 70/80 = 87.5% | 20/20 = 100%
Chrome | 10/20 = 50% | 50/80 = 62.5%
Both | 80/100 = 80% | 70/100 = 70%
Pitfall Example (2/3)
29/94 Simpson's paradox ● It can happen that: a/b < A/B and c/d < C/D, and yet (a + c)/(b + d) > (A + C)/(B + D) ● Example: 1/5 (20%) < 2/8 (25%) and 6/8 (75%) < 4/5 (80%), but 7/13 (53%) > 6/13 (46%). See: https://plato.stanford.edu/entries/paradox-simpson/ – considering the following papers: J. Pearl (2000). Causality: Models, Reasoning, and Inference, Cambridge University Press. P.J. Bickel, E.A. Hammel and J.W. O'Connell (1975). "Sex Bias in Graduate Admissions: Data From Berkeley." Science 187 (4175): 398–404.
Dept | Men: Applicants, admitted | Women: Applicants, admitted
A | 5, 20% | 8, 25%
B | 8, 75% | 5, 80%
Total | 13, 53% | 13, 46%
Pitfall Example (3/3)
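A small sketch of the same aggregation in Java, using the conversion numbers from the A/B-testing slides (class and variable names are illustrative): variant B wins within each browser, yet variant A wins on the pooled data, because the traffic is split very unevenly across browsers.

// Illustrative sketch of the Simpson's paradox numbers from the A/B-testing
// slides above: B wins per browser, A wins overall, because of the uneven split.
public class SimpsonsParadoxDemo {

    static double rate(int conversions, int users) {
        return 100.0 * conversions / users;
    }

    public static void main(String[] args) {
        // conversions and users per (browser, variant), as in the slide
        int firefoxA = 70, firefoxAUsers = 80, firefoxB = 20, firefoxBUsers = 20;
        int chromeA = 10, chromeAUsers = 20, chromeB = 50, chromeBUsers = 80;

        System.out.printf("Firefox: A=%.1f%%  B=%.1f%%%n",
                rate(firefoxA, firefoxAUsers), rate(firefoxB, firefoxBUsers)); // 87.5 vs 100.0
        System.out.printf("Chrome:  A=%.1f%%  B=%.1f%%%n",
                rate(chromeA, chromeAUsers), rate(chromeB, chromeBUsers));     // 50.0 vs 62.5

        // Aggregating reverses the conclusion: 80% for A vs 70% for B.
        System.out.printf("Both:    A=%.1f%%  B=%.1f%%%n",
                rate(firefoxA + chromeA, firefoxAUsers + chromeAUsers),
                rate(firefoxB + chromeB, firefoxBUsers + chromeBUsers));
    }
}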
30/94 Software Measurement Models & Methods
31/94 Measurement artifacts / objects: Product (architecture, implementation, documentation), Process (management, lifecycle, CASE), Resources (personnel, software, hardware). Measurement models: flow graphs, call graphs, structure trees, code schemas, ...; scale types, statistics, correlation, estimation, adjustment, calibration. Measurement evaluation: analysis, visualization, exploration, prediction, ... Measurement goals: understanding, learning, improvement, management, controlling, ... Measurement methods: artefact-based operation, quantification-based operation, value-based operation, experience-based operation Software Measurement Methods
32/94 [ISO/IEC 15939 measurement information model: an Entity has Attributes; Measurement Methods turn Attributes into Base Measures; a Measurement Function combines Base Measures into Derived Measures; an (analysis) Model produces an Indicator; its Interpretation yields the Information Product that satisfies the Information Needs] Measurable Concept: abstract relationship between attributes of entities and information needs Measurement Information Model (ISO/IEC 15939)
33/94 Bottom part of the model: Attribute = property relevant to information needs; Measurement Method = operations mapping an attribute to a scale; Base Measure = variable assigned a value by applying the method to one attribute; Measurement Function = algorithm for combining two or more base measures; Derived Measure = variable assigned a value by applying the measurement function to two or more values of base measures Measurement Information Model (ISO/IEC 15939)
34/94 Top part of the model: (Analysis) Model = algorithm for combining measures and decision criteria; Indicator = variable assigned a value by applying the analysis model to base and/or derived measures; Interpretation = explanation relating the quantitative information in the indicator to the information needs; Information Product = the outcome of the measurement process that satisfies the information needs Measurement Information Model (ISO/IEC 15939)
35/94 Example 1 – External quality measures – Functionality – Accuracy: entity = software; attributes = run-time accuracy, run-time usability; base measures B1 = nr. of inaccurate computations encountered by users, B2 = operation time; measurement function = B1/B2; indicator = computational accuracy; interpretation = comparison of the values obtained with generic thresholds and/or targets. Example 2 – External quality measures – Reliability – Maturity: entity = software; attributes = run-time reliability, level of testing; base measures B1 = number of detected failures, B2 = number of performed test cases; measurement function = B1/B2; indicator = failure density against test cases; interpretation = comparison of the values obtained with generic thresholds and/or targets. Inspired by Abran, Alain, et al. "An information model for software quality measurement with ISO standards." Proceedings of the International Conference on Software Development (SWDC-REK), Reykjavik, Iceland. 2005. ISO/IEC 15939 Examples
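As a rough illustration of how the ISO/IEC 15939 chain (base measures → measurement function → derived measure → indicator) could look in code for the failure-density example, here is a hedged sketch; the method names, the hard-coded counts and the 0.05 threshold are assumptions for illustration, not part of the standard.

// Minimal sketch of the second ISO/IEC 15939 example above (failure density
// against test cases). Names and the 0.05 threshold are illustrative only.
public class FailureDensityIndicator {

    // Base measures: obtained by applying a measurement method to one attribute each.
    static int countDetectedFailures()   { return 12;  }  // e.g., from the issue tracker
    static int countPerformedTestCases() { return 480; }  // e.g., from the test report

    // Measurement function: combines the two base measures into a derived measure.
    static double failureDensity(int failures, int testCases) {
        return (double) failures / testCases;
    }

    // Analysis model: compares the derived measure against a decision criterion,
    // producing the indicator that is then interpreted against the information need.
    static String indicator(double density, double threshold) {
        return density <= threshold ? "within target" : "above target";
    }

    public static void main(String[] args) {
        int b1 = countDetectedFailures();
        int b2 = countPerformedTestCases();
        double derived = failureDensity(b1, b2);
        System.out.printf("Failure density = %.3f -> %s%n", derived, indicator(derived, 0.05));
    }
}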
36/94 ● Some measures are harder to collect or are not regularly collected – Direct: from a direct process of measuring – Indirect: from a mathematical equation in the world of symbols [ISO/IEC 15939 model figure as before: Attributes → Measurement Methods → Base Measures → Measurement Function → Derived Measures] ISO/IEC 15939 refers to them as base measure and derived measure Direct vs Indirect Measures (1/2)
37/94 ● Direct – Number of known defects ● Indirect – Defect density (DD) – COCOMO, measure of effort:
E = a · KSLoC^b · EAF, where a = 2.94 and b = 0.91 + 0.01 · Σ_{i=1..5} SF_i
DD = known defects / product size
(EAF = Effort Adjustment Factor, SF = Scale Factors) Direct vs Indirect Measures (2/2)
38/94 ● Generally, it is easier to collect measures of the length and complexity of the code (internal attributes of the product) than measures of its quality (external attributes) – Internal attributes: internal characteristics of product, process, and human resources – External attributes: characteristics due to the external environment Internal vs External Attributes (1/4)
39/94 ● One of the aims of Software Engineering is to improve the quality of software Internal vs External Attributes (2/4)
40/94 ● The mapping of internal attributes to external ones – and then to quality in use – is not straightforward Internal vs External Attributes (3/4)
41/94 ● The mapping of internal attributes to external ones – and then to quality in use – is not straightforward (example: reliability). External measure: nr. of failures over a period of time. Internal measure: how many faults were detected in the reviewed product? X = A/B, where A = absolute number of faults detected in review, B = number of estimated faults to be detected in review (using past history or a reference model). Is there a relation between the two? ASSUMPTION (!) → fixing the internal mistakes fixes the corresponding failure(s) Internal vs External Attributes (4/4)
42/94 Objective: the same each time they are taken (e.g. collected automatically by some device), e.g., LOCs. Subjective: manually collected by individuals, e.g., time to use a functionality in an application Objective vs Subjective Measures
43/94 SOFTWARE METRICS - SIZE
44/94
[01] * multiples. Repeat until there are no more multiples
[02] * in the array.
[03] */
[04] public class PrimeGenerator
[05] {
[06]   private static boolean[] crossedOut;
[07]   private static int[] result;
[08]   public static int[] generatePrimes(int maxValue){
[09]     if (maxValue < 2){
[10]       return new int[0];
[11]     }else{
[12]       uncrossIntegersUpTo(maxValue);
[13]       crossOutMultiples();
[14]       putUncrossedIntegersIntoResult();
[15]       return result;
[16]     }
[17]   }
[18] }
Various Measures of Size
45/94 [same listing as above, lines 01-18] LOC = 18 (Lines Of Code) CLOC = 3 (Commented Lines of Code) Various Measures of Size
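The size measures above are easy to approximate with a few lines of code. The sketch below uses deliberately naive counting rules (a line is a comment if it starts with //, /*, * or */), which is exactly the kind of definition choice that makes LOC values from different tools hard to compare; it also computes the comment density ratio used a couple of slides further on.

import java.util.List;

// Deliberately naive sketch of the size measures discussed above. Real tools
// differ in what they count (blank lines, comment styles, ";"-only lines),
// which is the definition problem mentioned earlier. The rules below are
// illustrative assumptions, not a standard.
public class SizeMetrics {

    static boolean isComment(String line) {
        String t = line.trim();
        return t.startsWith("//") || t.startsWith("/*") || t.startsWith("*");
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "/* Generates primes by crossing out multiples. */",
                "public class PrimeGenerator {",
                "    // result of the last run",
                "    private static int[] result;",
                "}");

        long loc  = source.size();                                        // every physical line
        long cloc = source.stream().filter(SizeMetrics::isComment).count(); // comment lines
        long nloc = loc - cloc;                                           // non-comment lines
        double commentDensity = (double) cloc / loc;                      // CD = CLOC / LOC

        System.out.printf("LOC=%d CLOC=%d NLOC=%d CD=%.2f%n", loc, cloc, nloc, commentDensity);
    }
}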
46/94 [same listing as above, lines 01-18] NLOC = 15 (Non-Commented Lines Of Code) Various Measures of Size
47/94 [same listing as above, lines 01-18] NOC = 1 (Number Of Classes) NOM = 1 (Number of Methods) NOP = 1 (Number of Packages) Various Measures of Size
48/94 ● Size is used for the normalization of other measures: from the example before, it would be much more useful to report a comment density of 16% (3/18) rather than 3 CLOCs. CD = CLOC / LOC = 3/18 = 0.16 Measures of Size good for…?
49/94 ● Example: using comment density to compare Open Source projects after normalization. What is a good reference value for "comment density" in your opinion? These look "scary". O. Arafat and D. Riehle, "The comment density of open source software code," in 31st International Conference on Software Engineering - Companion Volume, 2009. ICSE-Companion 2009, 2009, pp. 195–198. Measures of Size good for…?
50/94 ● Size can give a rough initial estimation of effort, although... → Measures of source code size should *never* be used to assess the productivity of developers. How would you compare Mozilla Firefox with the Linux Kernel in terms of maintenance effort?
Software | LOCs
Microsoft Windows Vista | ~50M
Linux Kernel 3.1 | ~15M
Android | ~12M
Mozilla Firefox | ~10M
Unreal Engine 3 | ~2M
Measures of Size good for…?
51/94 → http://www.informationisbeautiful.net/visualizations/million-lines-of-code/ ● Size can be used for comparing projects and for comparisons across releases Measures of Size good for…?
52/94 "The task then is to refine the code base to better meet customer need. If that is not clear, the programmers should not write a line of code. Every line of code costs money to write and more money to support." Jeff Sutherland, one of the main proponents of the Agile Manifesto and the SCRUM methodology Another observation about LOCs
53/94 SOFTWARE METRICS - COMPLEXITY
54/94 ● CC represents the number of linearly independent control-flow paths ● G=(N,E) is a graph representing the control flow of a program: N = nodes, E = edges, P = nr. of connected components of G (e.g., the main program and a called method) ● Cyclomatic Complexity is defined as: v(G) = |E| - |N| + 2P → Assumption: the higher the complexity of the program flow graph, the more complex the testing process for the source code McCabe's Cyclomatic Complexity (CC) Note: a shortcut is to use # branches + 1 (if, for, foreach, while, do-while, case label, catch, conditional statements)
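Here is a small, hypothetical method (not taken from the slides) annotated with the "number of branches + 1" shortcut; only the constructs listed in the note above are counted.

// Illustrative sketch of the "branches + 1" shortcut for cyclomatic complexity
// on a made-up method. Decision points: 1 if + 1 for + 2 case labels + 1 catch
// = 5, so CC = 5 + 1 = 6.
public class CcExample {

    static int classify(int[] values, int mode) {
        int score = 0;
        if (values == null) {                       // if -> +1
            return -1;
        }
        for (int v : values) {                      // for -> +1
            switch (mode) {
                case 0: score += v; break;          // case -> +1
                case 1: score -= v; break;          // case -> +1
                default: break;
            }
        }
        try {
            score = score / values.length;
        } catch (ArithmeticException e) {           // catch -> +1
            score = 0;
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(classify(new int[] {2, 4, 6}, 0)); // prints 4
    }
}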
55/94 CC = 2
[01] * multiples. Repeat until there are no more multiples
[02] * in the array.
[03] */
[04] public class PrimeGenerator{
[05]   private static boolean[] crossedOut;
[06]   private static int[] result;
[07]   public static int[] generatePrimes(int maxValue){
[08]     if (maxValue < 2){
[09]       return new int[0];
[10]     }else{
[11]       uncrossIntegersUpTo(maxValue);
[12]       crossOutMultiples();
[13]       putUncrossedIntegersIntoResult();
[14]       return result;
[15]     }
[16]   }
[17] }
Typical ranges: 1-4 low, 5-7 medium, 8-10 high, 11+ very high. CC of method generatePrimes (flow graph with entry and exit nodes): v(G) = |E| - |N| + 2 = 9 - 9 + 2 = 2 McCabe's Cyclomatic Complexity (CC)
56/94 ● The following code structure is from a 2008 students' project implementing chess: one method with 292 LOCs and a CC of 163 Example Application of CC
57/94 ● Let's look at a small piece of such a huge method:
public boolean eatCoin(Movement mov, Movement eatMov, Coin coin) throws IOException{
  //Controls if the eatMove is in the board, if not return
  if(!canMove(eatMov)){
    System.out.println("You can't eat this coin");
    return false;
  }
  try{
    //If it is a coin
    if(!this.board[mov.row][mov.col].isKing()){
      //If the coin to eat isn't a king
      System.out.println("nextRow " + mov.nextRow + " nextCol " + mov.nextCol + " isKing " + this.board[mov.nextRow][mov.nextCol].isKing());
      if(!this.board[mov.nextRow][mov.nextCol].isKing()){
....
Example Application of CC
58/94 Example Application of CC
59/94 ● A word of warning: metrics typically take into account syntactic complexity, NOT semantic complexity ● Both of the following code fragments have the *same* Cyclomatic Complexity → which code fragment is easier to understand?
[04] public class PrimeGenerator
[05] {
[06]   private static boolean[] crossedOut;
[07]   private static int[] result;
[08]
[09]   public static int[] generatePrimes(int maxValue){
[10]     if (maxValue < 2){
[11]       return new int[0];
[12]     }else{
[13]       uncrossIntegersUpTo(maxValue);
[14]       crossOutMultiples();
[15]       putUncrossedIntegersIntoResult();
[16]       return result;
[17]     }
[18]   }
[04] public class A
[05] {
[06]   private static boolean[] c;
[07]   private static int[] b;
[08]
[09]   public static int[] generate(int m){
[10]     if (m < 2){
[11]       return new int[0];
[12]     }else{
[13]       methodOne(m);
[14]       methodTwo();
[15]       methodThree();
[16]       return b;
[17]     }
[18]   }
● Also, as in the initial motivating example, a word of warning applies when comparing projects in terms of average complexity Complexity
60/94 OBJECT ORIENTED METRICS
61/94 ● WMC: Weighted Methods per Class ● DIT: Depth of Inheritance Tree ● NOC: Number of Children ● CBO: Coupling Between Object classes ● RFC: Response For a Class ● LCOM: Lack of Cohesion in Methods Chidamber & Kemerer Suite
62/94 ● WMC: Weighted Methods per Class – weighted sum of the methods of a class. Given a class C with methods M1, …, Mn of complexity c1, …, cn: WMC = Σ_{i=1..n} c_i, where c_i is the complexity of method M_i
63/94 → What is the WMC of the following classes? WMC = Σ_{i=1..n} c_i WMC
64/94 → What is the WMC of the following classes? WMC = Σ_{i=1..n} c_i (with all method complexities set to 1, WMC reduces to the number of methods NoM) WMC(A) = NoM(A) = 5 WMC(B) = NoM(B) = 1 WMC(C) = NoM(C) = 0 WMC(D) = NoM(D) = 1 WMC(E) = NoM(E) = 3 WMC(F) = NoM(F) = 0 WMC(G) = NoM(G) = 0 WMC
65/94 ● DIT: Depth of Inheritance Tree – max inheritance level from the root to the class ● NOC: Number of Children – nr. of direct descendants of a class DIT & NOC
66/94 ● DIT: Depth of Inheritance Tree – max inheritance level from the root to the class ● NOC: Number of Children – nr. of direct descendants of a class. The deeper a class is in the hierarchy, the more methods it is likely to inherit, making it more complex. Deep trees as such indicate greater design complexity. As a positive factor, deep trees promote reuse because of method inheritance. What are "good" DIT & NOC values? DIT & NOC
67/94 ● CBO: Coupling Between Objects – class A is coupled with B if A uses methods/attributes of B. Multiple accesses to the same class are counted as one access. High CBO is undesirable: excessive coupling between object classes is detrimental to modular design and prevents reuse. Note: some definitions of CBO consider both A using B (fan-out) and B using A (fan-in) for the computation of CBO CBO
68/94 → What is the CBO of the following classes? CBO
69/94 → What is the CBO of the following classes? CBO(A) = 3, CBO(B) = CBO(C) = CBO(D) = CBO(E) = CBO(F) = 0 CBO
70/94 ● RFC: Response For a Class – the number of methods of a class that can be invoked in response to a call to a method of the class – Mc: the methods of class A that can be executed in response to a message – Me: the external methods called by them (each called method is counted only once) RFC = |Mc ∪ Me|. A large RFC has been found to indicate more faults. Classes with a high RFC are more complex and harder to understand: testing and debugging are more complicated RFC
71/94 → What is the RFC of the following classes? RFC = |Mc ∪ Me| RFC
72/94 → What is the RFC of the following classes? RFC = |Mc ∪ Me| RFC(A) = 7 WMC(B) = NoM(B) = 1 WMC(C) = NoM(C) = 0 WMC(D) = NoM(D) = 1 WMC(E) = NoM(E) = 3 WMC(F) = NoM(F) = 0 WMC(G) = NoM(G) = 0 RFC
73/94 ● LCOM: Lack of Cohesion in Methods – how closely the local methods are related to the local instance variables in the class – we use a "negative" measure of cohesiveness, the lack of cohesion of its methods. LCOM = 1 − (Σ_{f∈F} |Mf|) / (|M| × |F|), where M = static and instance methods in the class, F = instance fields in the class, Mf = methods accessing field f, |S| = cardinality of set S. Take each field in the class, count the methods that reference it, and sum over all fields; then divide by the # of methods multiplied by the # of fields. Examples (figure from the NDepend documentation, violet = attributes, pink = methods): 1 − 10/50 = 0.8 and 1 − 2/2 = 0 LCOM
74/94 Question Time
75/94 ● Given all that we have seen, what are your thoughts about the following metric computing the Maintainability Index (MI) of a project: MI = 171 − 5.2·ln(V) − 0.23·CC − 16.2·ln(LOC). Note: you might see different versions of MI implemented in different tools – this is the original formula, which can range from 171 down to −∞; other variations map it to the (0, 100) range, e.g. look at the Microsoft Visual Studio documentation for details. V is the Halstead volume, measuring the complexity of code based on the length and vocabulary used in the code: V = N · log2(n), where N = N1 + N2 (N1 = total operators, like >, ;, etc.; N2 = total operands, like j, i, 0, etc.) and n = n1 + n2 (n1 = unique operators, n2 = unique operands). In your view, what is good and what is bad about this metric? (CC = Cyclomatic Complexity as defined previously, LOC = Lines of Code) Maintainability Index (MI)
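As a quick illustration of the formula above, the sketch below plugs made-up values of V, CC and LOC into the original MI definition; real tools work per module and usually rescale the result, so the numbers are only indicative.

// Minimal sketch of the original Maintainability Index formula discussed above,
// applied to made-up inputs (the V, CC and LOC values are assumptions).
public class MaintainabilityIndex {

    static double mi(double halsteadVolume, double cyclomaticComplexity, double linesOfCode) {
        return 171.0
                - 5.2 * Math.log(halsteadVolume)      // ln(V)
                - 0.23 * cyclomaticComplexity
                - 16.2 * Math.log(linesOfCode);       // ln(LOC)
    }

    public static void main(String[] args) {
        // A small, simple module vs a large, branchy one (illustrative numbers only).
        System.out.printf("small module: MI = %.1f%n", mi(250, 3, 60));
        System.out.printf("large module: MI = %.1f%n", mi(9000, 45, 2500));
    }
}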
76/94 The Goal Question Metrics (GQM) Approach
77/94 ● Common pitfalls in software measurement – Collecting measurements without a meaning ● Measurement must be goal-driven – Not analyzing measurements ● Numbers need detailed analysis – Setting unrealistic targets ● Targets should not be defined uniquely based on the numbers – Paralysis by analysis ● Measurement is a key activity in management, not a separate activity. "Count what is countable. Measure what is measurable. And what is not measurable, make measurable." Galileo Galilei Software Measurement Pitfalls
78/94 ● Introduced in 1986 by Rombach and Basili – GQM stands for Goal Question Metric ● It is a deductive instrument to derive suitable measures from prescribed goals ● The paradigm is initiated by Business Goals (BG) ● From the BGs we can derive the GQM ● The Goal Question Metric top-down approach consists of three layers – Conceptual layer – the Measurement Goal (G) – Operational layer – the Question (Q) – Measurement layer – the Metric (M) The GQM Approach
79/94 ● Measurements must be goal-oriented ● They typically follow a structure such as the GQM approach [Figure: Measurement Goal (G) – business objectives, key performance indicators, project targets, improvement goals ("What are the goals to reach? What do I need to improve?"); Question (Q) – approaches to reach the goals, improvement programs, change management, project management techniques ("How do I reach my objectives? How will I improve?"); Metric (M) – business, employees, products, processes ("Am I doing good or bad? Am I doing better or worse?"); connected by a Define → Review → feedback loop (understand)] Goal-oriented Measurement
80/94 ● Here are some possible and commonly used words for each item of the Goal structure → Object of study: process, product, model, metric, etc... → Purpose: characterize, evaluate, predict, motivate, etc... in order to understand, assess, manage, engineer, improve, etc... → Point of view: manager, developer, tester, customer, etc... → Perspective or Focus: cost, effectiveness, correctness, defects, changes, product measures, etc... → Environment or Context: specify the environmental factors, including process factors, people factors, problem factors, methods, tools, constraints, etc... (a small worked example follows after the SQALE overview below) The Measurement Goal
81/94 SQALE (Software Quality Assessment Based on Lifecycle Expectations)
82/94 ● SQALE (Software Quality Assessment Based on Lifecycle Expectations) is a quality method to evaluate technical debt in software projects based on the measurement of software characteristics – Three levels, the first one including 8 software characteristics [SQALE quality model: Level 1 Characteristic (1) → Level 2 Sub-Characteristic (1,n) → Level 3 Source Code Requirement (1,n); Level 1: Testability, Reliability, Changeability, Efficiency, Security, Maintainability, Portability, Reusability] SQALE Adapted from: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
83/94 ● The second level is formed by sub-characteristics [examples: Unit Testing Testability, Integration Testing Testability, Data related reliability, Logic related reliability, Statement related reliability, Synchronization related reliability, Resource related reliability, Architecture related reliability, Fault tolerance, Understandability, Readability, ...] SQALE Adapted from: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
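Going back to the GQM template a few slides above, here is one possible, purely illustrative instantiation of a goal with its questions and metrics, written as plain Java data; the goal wording and the chosen metrics are assumptions, not prescribed by GQM.

import java.util.List;
import java.util.Map;

// Hypothetical GQM instantiation sketched as plain data (illustrative only).
public class GqmPlanExample {

    record Goal(String objectOfStudy, String purpose, String focus,
                String pointOfView, String context) { }

    public static void main(String[] args) {
        Goal goal = new Goal(
                "the code review process",        // object of study
                "evaluate, in order to improve",  // purpose
                "defect detection effectiveness", // perspective / focus
                "the QA manager",                 // point of view
                "team X, release 2.1");           // environment / context

        Map<String, List<String>> questionsToMetrics = Map.of(
                "How many defects do reviews catch before testing?",
                List.of("defects found in review / total defects found",
                        "defect density of reviewed vs non-reviewed modules"),
                "Is the review effort sustainable?",
                List.of("review hours per KLOC", "review coverage (% of changes reviewed)"));

        System.out.println("Goal: analyze " + goal.objectOfStudy() + " to " + goal.purpose()
                + " w.r.t. " + goal.focus() + " from the viewpoint of " + goal.pointOfView()
                + " in the context of " + goal.context());
        questionsToMetrics.forEach((q, metrics) -> {
            System.out.println("  Q: " + q);
            metrics.forEach(m -> System.out.println("    M: " + m));
        });
    }
}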
84/94 ● The third level links language-specific constructs to the sub-characteristics [same model figure as before, now with example source code requirements: Number of parameters in a module call (NOP) < 6; Coupling between objects (CBO) < 7; Switch statements have a 'default' condition; No assignment '=' within an 'if' statement; No assignment '=' within a 'while' statement; Invariant iteration index] SQALE Adapted from: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
85/94 ● For each of the source code requirements we need to associate a remediation function that translates the non-compliances into remediation costs ● In the most complex case you can associate a different function with each requirement, but in the simplest case you can have predefined values for the categories into which code requirements fall SQALE – Remediation Function Source: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
86/94 ● Non-remediation functions represent the cost of keeping a nonconformity, i.e., a negative impact from the business point of view SQALE – Non-remediation Function Source: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
87/94 ● The sum of all the remediation costs associated with a particular hierarchy of characteristics constitutes an index: – SQALE Testability Index: STI – SQALE Reliability Index: SRI – SQALE Changeability Index: SCI – SQALE Efficiency Index: SEI – SQALE Security Index: SSI – SQALE Maintainability Index: SMI – SQALE Portability Index: SPI – SQALE Reusability Index: SRuI – SQALE Quality Index: SQI (overall index) * Note that there is a version of each index that represents density, normalized by some measure of size SQALE - Indexes
88/94 ● Indexes can be used to build a rating value: Rating = estimated remediation cost / estimated development cost. Example: an artefact with an estimated development cost of 300 hours and an STI of 8.30 hours, using the reference table on the left: Rating = 8.30h / 300h = 2.7% → C (a small computation sketch follows at the end of this SQALE part) SQALE - Rating Source: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
89/94 ● The final representation can take the form of a Kiviat diagram in which the different density indexes are represented SQALE - Rating Source: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
90/94 ● This is the overall view you find in SonarQube SQALE - Rating Source: http://www.sonarqube.org
91/94 ● Given our initial discussion of measurement pitfalls, scales, and the representation condition, the following sentence should now be clear: "Because the non-remediation costs are not established on an ordinal scale but on a ratio scale, we have shown [..] that we can aggregate the measures by addition and comply with the measurement theory and the representation clause." Letouzey, Jean-Louis, and Michel Ilkiewicz. "Managing technical debt with the SQALE method." IEEE Software 6 (2012): 44-51. SQALE – Small Detail
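The rating computation from the SQALE-Rating slide can be sketched as follows; since the reference table itself is not reproduced in the text, the grade thresholds below are assumptions chosen only so that the 8.30h/300h example again lands in grade C.

// Sketch of the SQALE rating shown above: remediation cost divided by estimated
// development cost, then mapped to a grade. The thresholds are illustrative
// assumptions, not the official SQALE reference table.
public class SqaleRating {

    static double ratingPercent(double remediationHours, double developmentHours) {
        return 100.0 * remediationHours / developmentHours;
    }

    // Assumed thresholds for the A-E grades.
    static char grade(double ratingPercent) {
        if (ratingPercent <= 1)  return 'A';
        if (ratingPercent <= 2)  return 'B';
        if (ratingPercent <= 5)  return 'C';
        if (ratingPercent <= 10) return 'D';
        return 'E';
    }

    public static void main(String[] args) {
        double rating = ratingPercent(8.30, 300); // the example from the slide
        System.out.printf("Rating = %.2f%% -> grade %c%n", rating, grade(rating));
    }
}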
92/94 ● Measurement is important to track the progress of software projects and to focus on the relevant parts that need attention ● As such, we always need to take measurements with a "grain of salt" ● Still, collecting non-relevant or non-valid metrics might be even worse than not collecting any measures at all Conclusions
93/94 ● LOCs: Lines of Code ● CC: McCabe's Cyclomatic Complexity ● Fan-in: number of local flows that terminate in a module ● Fan-out: number of local flows that emanate from a module ● Information flow complexity of a module: length of the module times the square of the product of fan-in and fan-out ● NOM: Number of Methods per class ● WMC: Weighted Methods per Class ● DIT: Depth of Inheritance Tree ● NOC: Number of Children ● CBO: Coupling Between Objects ● RFC: Response For a Class ● LCOM: Lack of Cohesion of Methods ● ANDC: Average Number of Derived Classes ● AHH: Average Hierarchy Height List of some acronyms
94/94 ● N. Fenton and J. Bieman, Software Metrics: A Rigorous and Practical Approach, Third Edition, 3rd edition. Boca Raton: CRC Press, 2014. ● C. Ebert and R. Dumke, Software Measurement: Establish - Extract - Evaluate - Execute, Softcover reprint of hardcover 1st ed. 2007 edition. Springer, 2010. ● Lanza, Michele, and Radu Marinescu. Object-Oriented Metrics in Practice: Using Software Metrics to Characterize, Evaluate, and Improve the Design of Object-Oriented Systems. Springer Science & Business Media, 2007. ● Some code samples from Martin, Robert C. Clean Code: A Handbook of Agile Software Craftsmanship. Pearson Education, 2008. ● Moose platform for software data analysis http://moosetechnology.org ● The SQALE Method http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf References