PV260 - SOFTWARE QUALITY [Spring 2023] SOFTWARE MEASUREMENT & METRICS AND THEIR ROLE IN QUALITY IMPROVEMENT Bruno Rossi brossi@mail.muni.cz LAB OF SOFTWARE ARCHITECTURES AND INFORMATION SYSTEMS FACULTY OF INFORMATICS MASARYK UNIVERSITY, BRNO
2/94 ● The following defect (can you spot it?) in Apple's SSL code went undiscovered from Sept 2012 to Feb 2014 – how can that be? M. Bland, "Finding more than one worm in the apple," Communications of the ACM, vol. 57, no. 7, pp. 58–64, Jul. 2014. Introduction
3/94 ● Modern systems are very large & complex in terms of structure & runtime behaviour ● The figure on the right represents Eclipse JDT 3.5.0 (350K LOCs, 1,324 classes, 23,605 methods). [Figure legend: classes black, methods red, attributes blue; method containment, attribute containment, and class inheritance gray; invocations red; accesses blue] Introduction
4/94 ● We need ways to understand the attributes of software, represent them concisely, and use them to track software & development process improvement ● Software Measurement and Metrics are one of the aspects we can consider. If we consider the following metrics, what can we say? What are these metrics "good" for? LOCs 354,780 | NOM 23,605 | NOC 1,324 | NOP 45 (LOCs = lines of code, NOM = nr. of methods, NOC = nr. of classes, NOP = nr. of packages) Introduction
5/94 ● Typical problems related to software measurement: → How can I measure the maintainability of my software? → Can I estimate the number of defects in my software? → What is the productivity of my development team? → Can I measure the quality of my testing process? Introduction
6/94 Motivational Example
7/94 ● Expert source code and system review after reported cases of accidents due to cars accelerating without user input * ● 18-month review + previous code review by NASA experts ● Investigation of unintended acceleration * http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf Review of defective Toyota Camry's System (1/3)
8/94 ● Usage of software metrics (p.24): ● "Data-flow spaghetti – Complex coupling between software modules and between tasks – Count of global variables is a software metric for "tangledness" 2005 Camry L4 has >11,000 global variables (NASA)" * http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf Review of defective Toyota Camry's System (2/3)
9/94 ● Usage of software metrics (p.24): ● "Control-flow spaghetti – Many long, overly-complex function bodies – Cyclomatic Complexity is a software metric for "testability" 2005 Camry L4 has 67 functions scoring >50 ("untestable") The throttle angle function scored over 100 (unmaintainable)" ● See also p.30-31 for coding rule violations and the expected number of bugs * http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf Review of defective Toyota Camry's System (3/3)
10/94 Background on Software Measurement
11/94 Measurement is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules (Fenton & Pfleeger, 1997) Measurement
12/94 ● To avoid anecdotal evidence not backed by clear research (through experiments or prototypes, for example) ● To increase the visibility and the understanding of the process ● To analyze the software development process ● To make predictions through statistical models Gilb's Principle of fuzzy targets (1988): "Projects without clear goals will not achieve their goals clearly" Why Software Measurement
13/94 ● Although measurement may be integrated into development, very often the
objectives of measurement are not clear: "I measure the process because there is an automated tool that collects the metrics, but I do not know how to read the data and what I can do with the data" Tom De Marco (1982): "You cannot manage what you cannot measure"... ...but you need to know what to measure and how to measure it However...
14/94 ● The measurement process goes from the real world to the numerical representation ● Interpretation goes from the numerical representation to the relevant empirical results [Figure: Real World → (Measures) → Numbers → (Statistics) → Reduced Numbers → (Interpretation) → Relevant Empirical Results, crossing the "Intelligence Barrier"] The Measurement Process
15/94 ● A measure is a mapping between – The real world – The mathematical or formal world with its objects and relations ● Different mappings give different views of the world depending on the context (height, weight, …) ● The mapping relates attributes to mathematical objects; it does not relate entities to mathematical objects Measure Definition
16/94 ● The validity of a measure depends on a definition of the attribute that is coherent with the specification of the real world ● Example: Is LOC a valid measure of productivity? Think by paradox: 100K system.out statements vs 100K lines of complex loops and statements ADDITIONAL PROBLEM: You might have two different projects with two different definitions of LOCs (e.g., counting blanks+comments vs only ";"), so that P1>P2 and P1<P2 can be true at the same time
[Scale hierarchy figure: Nominal → Ordinal → Interval → Ratio; admissible statistics grow from =/≠, to min/max and median, to averages, to proportions] Measurement Scales (1/4)
23/94 ● Some examples of measures and related scales:
Scale Type | Examples in Software Eng. | Indicators of Central Tendency
Nominal | Name of the programming language (e.g. Java, C++, C#) | Mode
Ordinal | Ranking of failures (as a measure of failure severity) | Mode + Median
Interval | Beginning date, end date of activities | Mode + Median + Arithmetic Mean
Ratio | LOC (as a measure of program size) | Mode + Median + Arithmetic Mean + Geometric Mean
Morasca, Sandro. "Software measurement." Handbook of Software Engineering and Knowledge Engineering (2001): 239-276. Measurement Scales (2/4)
24/94 ● Example: suppose that we have the following ranking of software tickets by severity:
Level | Severity | Description
6 | Blocker | Prevents a function from being used, no workaround, blocking progress on multiple fronts
5 | Critical | Prevents a function from being used, no workaround
4 | Major | Prevents a function from being used, but a workaround is possible
3 | Normal | A problem making a function difficult to use, but no special workaround is required
2 | Minor | A problem not affecting the actual function, but the behavior is not natural
1 | Trivial | A problem not affecting the actual function, a typo would be an example
Measurement Scales (3/4) - example
25/94 ● Is it meaningful to use the weighted average to compare two projects in terms of severity of the open issues? Open issues per project:
Order | Severity | P1 | P2
6 | Blocker | 2 | 10
5 | Critical | 36 | 19
4 | Major | 25 | 22
3 | Normal | 15 | 32
2 | Minor | 2 | 5
1 | Trivial | 121 | 113
Let's define the following metric: Sev(Pn) = (Σ_i issues_i · weight_i) / 6, i.e., the weighted sum averaged over the six severity levels
Sev(P1) = (2·6 + 36·5 + 25·4 + 15·3 + 2·2 + 121·1)/6 = 462/6 = 77
Sev(P2) = (10·6 + 19·5 + 22·4 + 32·3 + 5·2 + 113·1)/6 = 462/6 = 77
Are the projects the same according to our metric? Is the "distance" from a critical ticket to a blocker the same as the distance from a minor ticket to a trivial one? Measurement Scales (4/4) - example
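To see the pitfall concretely, here is a minimal Java sketch of the Sev metric defined above, using the issue counts from the table; the class name and the array-based encoding are illustrative choices, not part of the slides. Both projects come out at exactly 77 even though their severity profiles are very different, which is the risk of averaging values that only live on an ordinal scale.

// Minimal sketch of the Sev metric from the slide above (illustrative only).
// Weights 1..6 mirror the severity levels Trivial..Blocker; the issue counts
// are taken from the P1/P2 table. Averaging ordinal levels assumes the
// "distance" between adjacent severities is constant, which is not guaranteed.
public class SeverityAverage {

    // counts indexed from Trivial (weight 1) to Blocker (weight 6)
    static double sev(int[] countsByWeight) {
        double weightedSum = 0;
        for (int i = 0; i < countsByWeight.length; i++) {
            weightedSum += countsByWeight[i] * (i + 1); // weight = severity level
        }
        return weightedSum / countsByWeight.length;     // average over the 6 levels
    }

    public static void main(String[] args) {
        int[] p1 = {121, 2, 15, 25, 36, 2};   // Trivial..Blocker for P1
        int[] p2 = {113, 5, 32, 22, 19, 10};  // Trivial..Blocker for P2
        System.out.println("Sev(P1) = " + sev(p1)); // 77.0
        System.out.println("Sev(P2) = " + sev(p2)); // 77.0 -> same score, very different profiles
    }
}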
26/94 Pitfalls in linking the real-world phenomenon to numbering systems https://xkcd.com/605/
27/94 ● A/B Testing is a kind of randomized experiment in which you can propose two variants of the same application to the users ● We set up an experiment with two browsers and two variations of the same webpage ● Conversion Rate: % of users completing an action
Browser | Conv Rate A | Conv Rate B
Firefox | 87.50% | 100.00%
Chrome | 50.00% | 62.50%
What can you conclude? Which alternative is better? https://medium.com/homeaway-tech-blog/simpsons-paradox-in-a-b-testing-93af7a2f3307 Pitfall Example (1/3)
28/94 ● Let's look at the same table, but with additional information about the way the tests were split https://medium.com/homeaway-tech-blog/simpsons-paradox-in-a-b-testing-93af7a2f3307
Browser | Conv Rate A | Conv Rate B
Firefox | 70/80 = 87.5% | 20/20 = 100%
Chrome | 10/20 = 50% | 50/80 = 62.5%
Both | 80/100 = 80% | 70/100 = 70%
Pitfall Example (2/3)
29/94 Simpson's paradox ● It can happen that: a/b < A/B and c/d < C/D, and yet (a + c)/(b + d) > (A + C)/(B + D) ● Example: 1/5 (20%) < 2/8 (25%) and 6/8 (75%) < 4/5 (80%), but 7/13 (53%) > 6/13 (46%). See: https://plato.stanford.edu/entries/paradox-simpson/ – considering the following papers: J. Pearl (2000). Causality: Models, Reasoning, and Inference, Cambridge University Press. P.J. Bickel, E.A. Hammel and J.W. O'Connell (1975). "Sex Bias in Graduate Admissions: Data From Berkeley." Science 187 (4175): 398–404.
Dept | Men: Applicants, admitted | Women: Applicants, admitted
A | 5, 20% | 8, 25%
B | 8, 75% | 5, 80%
Total | 13, 53% | 13, 46%
Pitfall Example (3/3)
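A small sketch of the same aggregation in Java, using the conversion numbers from the A/B-testing slides (class and variable names are illustrative): variant B wins within each browser, yet variant A wins on the pooled data, because the traffic is split very unevenly across browsers.

// Illustrative sketch of the Simpson's paradox numbers from the A/B-testing
// slides above: B wins per browser, A wins overall, because of the uneven split.
public class SimpsonsParadoxDemo {

    static double rate(int conversions, int users) {
        return 100.0 * conversions / users;
    }

    public static void main(String[] args) {
        // conversions and users per (browser, variant), as in the slide
        int firefoxA = 70, firefoxAUsers = 80, firefoxB = 20, firefoxBUsers = 20;
        int chromeA = 10, chromeAUsers = 20, chromeB = 50, chromeBUsers = 80;

        System.out.printf("Firefox: A=%.1f%%  B=%.1f%%%n",
                rate(firefoxA, firefoxAUsers), rate(firefoxB, firefoxBUsers)); // 87.5 vs 100.0
        System.out.printf("Chrome:  A=%.1f%%  B=%.1f%%%n",
                rate(chromeA, chromeAUsers), rate(chromeB, chromeBUsers));     // 50.0 vs 62.5

        // Aggregating reverses the conclusion: 80% for A vs 70% for B.
        System.out.printf("Both:    A=%.1f%%  B=%.1f%%%n",
                rate(firefoxA + chromeA, firefoxAUsers + chromeAUsers),
                rate(firefoxB + chromeB, firefoxBUsers + chromeBUsers));
    }
}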
30/94 Software Measurement Models & Methods
31/94 Measurement artifacts / objects: Product (architecture, implementation, documentation), Process (management, lifecycle, CASE), Resources (personnel, software, hardware). Measurement models: flow graphs, call graphs, structure trees, code schemas, ...; scale types, statistics, correlation, estimation, adjustment, calibration. Measurement evaluation: analysis, visualization, exploration, prediction, ... Measurement goals: understanding, learning, improvement, management, controlling, ... Measurement methods: artefact-based operation, quantification-based operation, value-based operation, experience-based operation Software Measurement Methods
32/94 [ISO/IEC 15939 measurement information model: an Entity has Attributes; Measurement Methods turn Attributes into Base Measures; a Measurement Function combines Base Measures into Derived Measures; an (analysis) Model produces an Indicator; its Interpretation yields the Information Product that satisfies the Information Needs] Measurable Concept: abstract relationship between attributes of entities and information needs Measurement Information Model (ISO/IEC 15939)
33/94 Bottom part of the model: Attribute = property relevant to information needs; Measurement Method = operations mapping an attribute to a scale; Base Measure = variable assigned a value by applying the method to one attribute; Measurement Function = algorithm for combining two or more base measures; Derived Measure = variable assigned a value by applying the measurement function to two or more values of base measures Measurement Information Model (ISO/IEC 15939)
34/94 Top part of the model: (Analysis) Model = algorithm for combining measures and decision criteria; Indicator = variable assigned a value by applying the analysis model to base and/or derived measures; Interpretation = explanation relating the quantitative information in the indicator to the information needs; Information Product = the outcome of the measurement process that satisfies the information needs Measurement Information Model (ISO/IEC 15939)
35/94 Example 1 – External quality measures – Functionality – Accuracy: entity = software; attributes = run-time accuracy, run-time usability; base measures B1 = nr. of inaccurate computations encountered by users, B2 = operation time; measurement function = B1/B2; indicator = computational accuracy; interpretation = comparison of the values obtained with generic thresholds and/or targets. Example 2 – External quality measures – Reliability – Maturity: entity = software; attributes = run-time reliability, level of testing; base measures B1 = number of detected failures, B2 = number of performed test cases; measurement function = B1/B2; indicator = failure density against test cases; interpretation = comparison of the values obtained with generic thresholds and/or targets. Inspired by Abran, Alain, et al. "An information model for software quality measurement with ISO standards." Proceedings of the International Conference on Software Development (SWDC-REK), Reykjavik, Iceland. 2005. ISO/IEC 15939 Examples
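As a rough illustration of how the ISO/IEC 15939 chain (base measures → measurement function → derived measure → indicator) could look in code for the failure-density example, here is a hedged sketch; the method names, the hard-coded counts and the 0.05 threshold are assumptions for illustration, not part of the standard.

// Minimal sketch of the second ISO/IEC 15939 example above (failure density
// against test cases). Names and the 0.05 threshold are illustrative only.
public class FailureDensityIndicator {

    // Base measures: obtained by applying a measurement method to one attribute each.
    static int countDetectedFailures()   { return 12;  }  // e.g., from the issue tracker
    static int countPerformedTestCases() { return 480; }  // e.g., from the test report

    // Measurement function: combines the two base measures into a derived measure.
    static double failureDensity(int failures, int testCases) {
        return (double) failures / testCases;
    }

    // Analysis model: compares the derived measure against a decision criterion,
    // producing the indicator that is then interpreted against the information need.
    static String indicator(double density, double threshold) {
        return density <= threshold ? "within target" : "above target";
    }

    public static void main(String[] args) {
        int b1 = countDetectedFailures();
        int b2 = countPerformedTestCases();
        double derived = failureDensity(b1, b2);
        System.out.printf("Failure density = %.3f -> %s%n", derived, indicator(derived, 0.05));
    }
}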
36/94 ● Some measures are harder to collect or are not regularly collected – Direct: from a direct process of measuring – Indirect: from a mathematical equation in the world of symbols [ISO/IEC 15939 model figure as before: Attributes → Measurement Methods → Base Measures → Measurement Function → Derived Measures] ISO/IEC 15939 refers to them as base measure and derived measure Direct vs Indirect Measures (1/2)
37/94 ● Direct – Number of known defects ● Indirect – Defect density (DD) – COCOMO, measure of effort:
E = a · KSLoC^b · EAF, where a = 2.94 and b = 0.91 + 0.01 · Σ_{i=1..5} SF_i
DD = known defects / product size
(EAF = Effort Adjustment Factor, SF = Scale Factors) Direct vs Indirect Measures (2/2)
38/94 ● Generally, it is easier to collect measures of the length and complexity of the code (internal attributes of the product) than measures of its quality (external attributes) – Internal attributes: internal characteristics of product, process, and human resources – External attributes: characteristics due to the external environment Internal vs External Attributes (1/4)
39/94 ● One of the aims of Software Engineering is to improve the quality of software Internal vs External Attributes (2/4)
40/94 ● The mapping of internal attributes to external ones – and then to quality in use – is not straightforward Internal vs External Attributes (3/4)
41/94 ● The mapping of internal attributes to external ones – and then to quality in use – is not straightforward (example: reliability). External measure: nr. of failures over a period of time. Internal measure: how many faults were detected in the reviewed product? X = A/B, where A = absolute number of faults detected in review, B = number of estimated faults to be detected in review (using past history or a reference model). Is there a relation between the two? ASSUMPTION (!) → fixing the internal mistakes fixes the corresponding failure(s) Internal vs External Attributes (4/4)
42/94 Objective: the same each time they are taken (e.g. collected automatically by some device), e.g., LOCs. Subjective: manually collected by individuals, e.g., time to use a functionality in an application Objective vs Subjective Measures
43/94 SOFTWARE METRICS - SIZE
44/94
[01] * multiples. Repeat until there are no more multiples
[02] * in the array.
[03] */
[04] public class PrimeGenerator
[05] {
[06]   private static boolean[] crossedOut;
[07]   private static int[] result;
[08]   public static int[] generatePrimes(int maxValue){
[09]     if (maxValue < 2){
[10]       return new int[0];
[11]     }else{
[12]       uncrossIntegersUpTo(maxValue);
[13]       crossOutMultiples();
[14]       putUncrossedIntegersIntoResult();
[15]       return result;
[16]     }
[17]   }
[18] }
Various Measures of Size
45/94 [same listing as above, lines 01-18] LOC = 18 (Lines Of Code) CLOC = 3 (Commented Lines of Code) Various Measures of Size
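The size measures above are easy to approximate with a few lines of code. The sketch below uses deliberately naive counting rules (a line is a comment if it starts with //, /*, * or */), which is exactly the kind of definition choice that makes LOC values from different tools hard to compare; it also computes the comment density ratio used a couple of slides further on.

import java.util.List;

// Deliberately naive sketch of the size measures discussed above. Real tools
// differ in what they count (blank lines, comment styles, ";"-only lines),
// which is the definition problem mentioned earlier. The rules below are
// illustrative assumptions, not a standard.
public class SizeMetrics {

    static boolean isComment(String line) {
        String t = line.trim();
        return t.startsWith("//") || t.startsWith("/*") || t.startsWith("*");
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "/* Generates primes by crossing out multiples. */",
                "public class PrimeGenerator {",
                "    // result of the last run",
                "    private static int[] result;",
                "}");

        long loc  = source.size();                                        // every physical line
        long cloc = source.stream().filter(SizeMetrics::isComment).count(); // comment lines
        long nloc = loc - cloc;                                           // non-comment lines
        double commentDensity = (double) cloc / loc;                      // CD = CLOC / LOC

        System.out.printf("LOC=%d CLOC=%d NLOC=%d CD=%.2f%n", loc, cloc, nloc, commentDensity);
    }
}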
46/94 [same listing as above, lines 01-18] NLOC = 15 (Non-Commented Lines Of Code) Various Measures of Size
47/94 [same listing as above, lines 01-18] NOC = 1 (Number Of Classes) NOM = 1 (Number of Methods) NOP = 1 (Number of Packages) Various Measures of Size
48/94 ● Size is used for the normalization of other measures: from the example before, it would be much more useful to report a comment density of 16% (3/18) rather than 3 CLOCs. CD = CLOC / LOC = 3/18 = 0.16 Measures of Size good for…?
49/94 ● Example: using comment density to compare Open Source projects after normalization. What is a good reference value for "comment density" in your opinion? These look "scary". O. Arafat and D. Riehle, "The comment density of open source software code," in 31st International Conference on Software Engineering - Companion Volume, 2009. ICSE-Companion 2009, 2009, pp. 195–198. Measures of Size good for…?
50/94 ● Size can give a rough initial estimation of effort, although... → Measures of source code size should *never* be used to assess the productivity of developers. How would you compare Mozilla Firefox with the Linux Kernel in terms of maintenance effort?
Software | LOCs
Microsoft Windows Vista | ~50M
Linux Kernel 3.1 | ~15M
Android | ~12M
Mozilla Firefox | ~10M
Unreal Engine 3 | ~2M
Measures of Size good for…?
51/94 → http://www.informationisbeautiful.net/visualizations/million-lines-of-code/ ● Size can be used for comparing projects and for comparisons across releases Measures of Size good for…?
52/94 "The task then is to refine the code base to better meet customer need. If that is not clear, the programmers should not write a line of code. Every line of code costs money to write and more money to support." Jeff Sutherland, one of the main proponents of the Agile Manifesto and the SCRUM methodology Another observation about LOCs
53/94 SOFTWARE METRICS - COMPLEXITY
54/94 ● CC represents the number of linearly independent control-flow paths ● G=(N,E) is a graph representing the control flow of a program: N = nodes, E = edges, P = nr. of connected components of G (e.g., the main program and a called method) ● Cyclomatic Complexity is defined as: v(G) = |E| - |N| + 2P → Assumption: the higher the complexity of the program flow graph, the more complex the testing process for the source code McCabe's Cyclomatic Complexity (CC) Note: a shortcut is to use # branches + 1 (if, for, foreach, while, do-while, case label, catch, conditional statements)
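Here is a small, hypothetical method (not taken from the slides) annotated with the "number of branches + 1" shortcut; only the constructs listed in the note above are counted.

// Illustrative sketch of the "branches + 1" shortcut for cyclomatic complexity
// on a made-up method. Decision points: 1 if + 1 for + 2 case labels + 1 catch
// = 5, so CC = 5 + 1 = 6.
public class CcExample {

    static int classify(int[] values, int mode) {
        int score = 0;
        if (values == null) {                       // if -> +1
            return -1;
        }
        for (int v : values) {                      // for -> +1
            switch (mode) {
                case 0: score += v; break;          // case -> +1
                case 1: score -= v; break;          // case -> +1
                default: break;
            }
        }
        try {
            score = score / values.length;
        } catch (ArithmeticException e) {           // catch -> +1
            score = 0;
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(classify(new int[] {2, 4, 6}, 0)); // prints 4
    }
}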
55/94 CC = 2
[01] * multiples. Repeat until there are no more multiples
[02] * in the array.
[03] */
[04] public class PrimeGenerator{
[05]   private static boolean[] crossedOut;
[06]   private static int[] result;
[07]   public static int[] generatePrimes(int maxValue){
[08]     if (maxValue < 2){
[09]       return new int[0];
[10]     }else{
[11]       uncrossIntegersUpTo(maxValue);
[12]       crossOutMultiples();
[13]       putUncrossedIntegersIntoResult();
[14]       return result;
[15]     }
[16]   }
[17] }
Typical ranges: 1-4 low, 5-7 medium, 8-10 high, 11+ very high. CC of method generatePrimes (flow graph with entry and exit nodes): v(G) = |E| - |N| + 2 = 9 - 9 + 2 = 2 McCabe's Cyclomatic Complexity (CC)
56/94 ● The following code structure is from a 2008 students' project implementing chess: one method with 292 LOCs and a CC of 163 Example Application of CC
57/94 ● Let's look at a small piece of such a huge method:
public boolean eatCoin(Movement mov, Movement eatMov, Coin coin) throws IOException{
  //Controls if the eatMove is in the board, if not return
  if(!canMove(eatMov)){
    System.out.println("You can't eat this coin");
    return false;
  }
  try{
    //If it is a coin
    if(!this.board[mov.row][mov.col].isKing()){
      //If the coin to eat isn't a king
      System.out.println("nextRow " + mov.nextRow + " nextCol " + mov.nextCol + " isKing " + this.board[mov.nextRow][mov.nextCol].isKing());
      if(!this.board[mov.nextRow][mov.nextCol].isKing()){
....
Example Application of CC
58/94 Example Application of CC
59/94 ● A word of warning: metrics typically take into account syntactic complexity, NOT semantic complexity ● Both of the following code fragments have the *same* Cyclomatic Complexity → which code fragment is easier to understand?
[04] public class PrimeGenerator
[05] {
[06]   private static boolean[] crossedOut;
[07]   private static int[] result;
[08]
[09]   public static int[] generatePrimes(int maxValue){
[10]     if (maxValue < 2){
[11]       return new int[0];
[12]     }else{
[13]       uncrossIntegersUpTo(maxValue);
[14]       crossOutMultiples();
[15]       putUncrossedIntegersIntoResult();
[16]       return result;
[17]     }
[18]   }
[04] public class A
[05] {
[06]   private static boolean[] c;
[07]   private static int[] b;
[08]
[09]   public static int[] generate(int m){
[10]     if (m < 2){
[11]       return new int[0];
[12]     }else{
[13]       methodOne(m);
[14]       methodTwo();
[15]       methodThree();
[16]       return b;
[17]     }
[18]   }
● Also, as in the initial motivating example, a word of warning applies when comparing projects in terms of average complexity Complexity
60/94 OBJECT ORIENTED METRICS
61/94 ● WMC: Weighted Methods per Class ● DIT: Depth of Inheritance Tree ● NOC: Number of Children ● CBO: Coupling Between Object classes ● RFC: Response For a Class ● LCOM: Lack of Cohesion in Methods Chidamber & Kemerer Suite
62/94 ● WMC: Weighted Methods per Class – weighted sum of the methods of a class. Given a class C with methods M1, …, Mn of complexity c1, …, cn: WMC = Σ_{i=1..n} c_i, where c_i is the complexity of method M_i
63/94 → What is the WMC of the following classes? WMC = Σ_{i=1..n} c_i WMC
64/94 → What is the WMC of the following classes? WMC = Σ_{i=1..n} c_i (with all method complexities set to 1, WMC reduces to the number of methods NoM) WMC(A) = NoM(A) = 5 WMC(B) = NoM(B) = 1 WMC(C) = NoM(C) = 0 WMC(D) = NoM(D) = 1 WMC(E) = NoM(E) = 3 WMC(F) = NoM(F) = 0 WMC(G) = NoM(G) = 0 WMC
65/94 ● DIT: Depth of Inheritance Tree – max inheritance level from the root to the class ● NOC: Number of Children – nr. of direct descendants of a class DIT & NOC
66/94 ● DIT: Depth of Inheritance Tree – max inheritance level from the root to the class ● NOC: Number of Children – nr. of direct descendants of a class. The deeper a class is in the hierarchy, the more methods it is likely to inherit, making it more complex. Deep trees as such indicate greater design complexity. As a positive factor, deep trees promote reuse because of method inheritance. What are "good" DIT & NOC values? DIT & NOC
67/94 ● CBO: Coupling Between Objects – class A is coupled with B if A uses methods/attributes of B. Multiple accesses to the same class are counted as one access. High CBO is undesirable: excessive coupling between object classes is detrimental to modular design and prevents reuse. Note: some definitions of CBO consider both A using B (fan-out) and B using A (fan-in) for the computation of CBO CBO
68/94 → What is the CBO of the following classes? CBO
69/94 → What is the CBO of the following classes? CBO(A) = 3, CBO(B) = CBO(C) = CBO(D) = CBO(E) = CBO(F) = 0 CBO
70/94 ● RFC: Response For a Class – the number of methods of a class that can be invoked in response to a call to a method of the class – Mc: the methods of class A that can be executed in response to a message – Me: the external methods called by them (each called method is counted only once) RFC = |Mc ∪ Me|. A large RFC has been found to indicate more faults. Classes with a high RFC are more complex and harder to understand: testing and debugging are more complicated RFC
71/94 → What is the RFC of the following classes? RFC = |Mc ∪ Me| RFC
72/94 → What is the RFC of the following classes? RFC = |Mc ∪ Me| RFC(A) = 7 WMC(B) = NoM(B) = 1 WMC(C) = NoM(C) = 0 WMC(D) = NoM(D) = 1 WMC(E) = NoM(E) = 3 WMC(F) = NoM(F) = 0 WMC(G) = NoM(G) = 0 RFC
73/94 ● LCOM: Lack of Cohesion in Methods – how closely the local methods are related to the local instance variables in the class – we use a "negative" measure of cohesiveness, the lack of cohesion of its methods. LCOM = 1 − (Σ_{f∈F} |Mf|) / (|M| × |F|), where M = static and instance methods in the class, F = instance fields in the class, Mf = methods accessing field f, |S| = cardinality of set S. Take each field in the class, count the methods that reference it, and sum over all fields; then divide by the # of methods multiplied by the # of fields. Examples (figure from the NDepend documentation, violet = attributes, pink = methods): 1 − 10/50 = 0.8 and 1 − 2/2 = 0 LCOM
74/94 Question Time
75/94 ● Given all that we have seen, what are your thoughts about the following metric computing the Maintainability Index (MI) of a project: MI = 171 − 5.2·ln(V) − 0.23·CC − 16.2·ln(LOC). Note: you might see different versions of MI implemented in different tools – this is the original formula, which can range from 171 down to −∞; other variations map it to the (0, 100) range, e.g. look at the Microsoft Visual Studio documentation for details. V is the Halstead volume, measuring the complexity of code based on the length and vocabulary used in the code: V = N · log2(n), where N = N1 + N2 (N1 = total operators, like >, ;, etc.; N2 = total operands, like j, i, 0, etc.) and n = n1 + n2 (n1 = unique operators, n2 = unique operands). In your view, what is good and what is bad about this metric? (CC = Cyclomatic Complexity as defined previously, LOC = Lines of Code) Maintainability Index (MI)
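As a quick illustration of the formula above, the sketch below plugs made-up values of V, CC and LOC into the original MI definition; real tools work per module and usually rescale the result, so the numbers are only indicative.

// Minimal sketch of the original Maintainability Index formula discussed above,
// applied to made-up inputs (the V, CC and LOC values are assumptions).
public class MaintainabilityIndex {

    static double mi(double halsteadVolume, double cyclomaticComplexity, double linesOfCode) {
        return 171.0
                - 5.2 * Math.log(halsteadVolume)      // ln(V)
                - 0.23 * cyclomaticComplexity
                - 16.2 * Math.log(linesOfCode);       // ln(LOC)
    }

    public static void main(String[] args) {
        // A small, simple module vs a large, branchy one (illustrative numbers only).
        System.out.printf("small module: MI = %.1f%n", mi(250, 3, 60));
        System.out.printf("large module: MI = %.1f%n", mi(9000, 45, 2500));
    }
}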
76/94 The Goal Question Metrics (GQM) Approach
77/94 ● Common pitfalls in software measurement – Collecting measurements without a meaning ● Measurement must be goal-driven – Not analyzing measurements ● Numbers need detailed analysis – Setting unrealistic targets ● Targets should not be defined uniquely based on the numbers – Paralysis by analysis ● Measurement is a key activity in management, not a separate activity. "Count what is countable. Measure what is measurable. And what is not measurable, make measurable." Galileo Galilei Software Measurement Pitfalls
78/94 ● Introduced in 1986 by Rombach and Basili – GQM stands for Goal Question Metric ● It is a deductive instrument to derive suitable measures from prescribed goals ● The paradigm is initiated by Business Goals (BG) ● From the BGs we can derive the GQM ● The Goal Question Metric top-down approach consists of three layers – Conceptual layer – the Measurement Goal (G) – Operational layer – the Question (Q) – Measurement layer – the Metric (M) The GQM Approach
79/94 ● Measurements must be goal-oriented ● They typically follow a structure such as the GQM approach [Figure: Measurement Goal (G) – business objectives, key performance indicators, project targets, improvement goals ("What are the goals to reach? What do I need to improve?"); Question (Q) – approaches to reach the goals, improvement programs, change management, project management techniques ("How do I reach my objectives? How will I improve?"); Metric (M) – business, employees, products, processes ("Am I doing good or bad? Am I doing better or worse?"); connected by a Define → Review → feedback loop (understand)] Goal-oriented Measurement
80/94 ● Here are some possible and commonly used words for each item of the Goal structure → Object of study: process, product, model, metric, etc... → Purpose: characterize, evaluate, predict, motivate, etc... in order to understand, assess, manage, engineer, improve, etc... → Point of view: manager, developer, tester, customer, etc... → Perspective or Focus: cost, effectiveness, correctness, defects, changes, product measures, etc... → Environment or Context: specify the environmental factors, including process factors, people factors, problem factors, methods, tools, constraints, etc... (a small worked example follows after the SQALE overview below) The Measurement Goal
81/94 SQALE (Software Quality Assessment Based on Lifecycle Expectations)
82/94 ● SQALE (Software Quality Assessment Based on Lifecycle Expectations) is a quality method to evaluate technical debt in software projects based on the measurement of software characteristics – Three levels, the first one including 8 software characteristics [SQALE quality model: Level 1 Characteristic (1) → Level 2 Sub-Characteristic (1,n) → Level 3 Source Code Requirement (1,n); Level 1: Testability, Reliability, Changeability, Efficiency, Security, Maintainability, Portability, Reusability] SQALE Adapted from: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
83/94 ● The second level is formed by sub-characteristics [examples: Unit Testing Testability, Integration Testing Testability, Data related reliability, Logic related reliability, Statement related reliability, Synchronization related reliability, Resource related reliability, Architecture related reliability, Fault tolerance, Understandability, Readability, ...] SQALE Adapted from: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
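Going back to the GQM template a few slides above, here is one possible, purely illustrative instantiation of a goal with its questions and metrics, written as plain Java data; the goal wording and the chosen metrics are assumptions, not prescribed by GQM.

import java.util.List;
import java.util.Map;

// Hypothetical GQM instantiation sketched as plain data (illustrative only).
public class GqmPlanExample {

    record Goal(String objectOfStudy, String purpose, String focus,
                String pointOfView, String context) { }

    public static void main(String[] args) {
        Goal goal = new Goal(
                "the code review process",        // object of study
                "evaluate, in order to improve",  // purpose
                "defect detection effectiveness", // perspective / focus
                "the QA manager",                 // point of view
                "team X, release 2.1");           // environment / context

        Map<String, List<String>> questionsToMetrics = Map.of(
                "How many defects do reviews catch before testing?",
                List.of("defects found in review / total defects found",
                        "defect density of reviewed vs non-reviewed modules"),
                "Is the review effort sustainable?",
                List.of("review hours per KLOC", "review coverage (% of changes reviewed)"));

        System.out.println("Goal: analyze " + goal.objectOfStudy() + " to " + goal.purpose()
                + " w.r.t. " + goal.focus() + " from the viewpoint of " + goal.pointOfView()
                + " in the context of " + goal.context());
        questionsToMetrics.forEach((q, metrics) -> {
            System.out.println("  Q: " + q);
            metrics.forEach(m -> System.out.println("    M: " + m));
        });
    }
}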
84/94 ● The third level links language-specific constructs to the sub-characteristics [same model figure as before, now with example source code requirements: Number of parameters in a module call (NOP) < 6; Coupling between objects (CBO) < 7; Switch statements have a 'default' condition; No assignment '=' within an 'if' statement; No assignment '=' within a 'while' statement; Invariant iteration index] SQALE Adapted from: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
85/94 ● For each of the source code requirements we need to associate a remediation function that translates the non-compliances into remediation costs ● In the most complex case you can associate a different function with each requirement, but in the simplest case you can have predefined values for the categories into which code requirements fall SQALE – Remediation Function Source: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
86/94 ● Non-remediation functions represent the cost of keeping a nonconformity, i.e., a negative impact from the business point of view SQALE – Non-remediation Function Source: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
87/94 ● The sum of all the remediation costs associated with a particular hierarchy of characteristics constitutes an index: – SQALE Testability Index: STI – SQALE Reliability Index: SRI – SQALE Changeability Index: SCI – SQALE Efficiency Index: SEI – SQALE Security Index: SSI – SQALE Maintainability Index: SMI – SQALE Portability Index: SPI – SQALE Reusability Index: SRuI – SQALE Quality Index: SQI (overall index) * Note that there is a version of each index that represents density, normalized by some measure of size SQALE - Indexes
88/94 ● Indexes can be used to build a rating value: Rating = estimated remediation cost / estimated development cost. Example: an artefact with an estimated development cost of 300 hours and an STI of 8.30 hours, using the reference table on the left: Rating = 8.30h / 300h = 2.7% → C (a small computation sketch follows at the end of this SQALE part) SQALE - Rating Source: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
89/94 ● The final representation can take the form of a Kiviat diagram in which the different density indexes are represented SQALE - Rating Source: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf
90/94 ● This is the overall view you find in SonarQube SQALE - Rating Source: http://www.sonarqube.org
91/94 ● Given our initial discussion of measurement pitfalls, scales, and the representation condition, the following sentence should now be clear: "Because the non-remediation costs are not established on an ordinal scale but on a ratio scale, we have shown [..] that we can aggregate the measures by addition and comply with the measurement theory and the representation clause." Letouzey, Jean-Louis, and Michel Ilkiewicz. "Managing technical debt with the SQALE method." IEEE Software 6 (2012): 44-51. SQALE – Small Detail
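The rating computation from the SQALE-Rating slide can be sketched as follows; since the reference table itself is not reproduced in the text, the grade thresholds below are assumptions chosen only so that the 8.30h/300h example again lands in grade C.

// Sketch of the SQALE rating shown above: remediation cost divided by estimated
// development cost, then mapped to a grade. The thresholds are illustrative
// assumptions, not the official SQALE reference table.
public class SqaleRating {

    static double ratingPercent(double remediationHours, double developmentHours) {
        return 100.0 * remediationHours / developmentHours;
    }

    // Assumed thresholds for the A-E grades.
    static char grade(double ratingPercent) {
        if (ratingPercent <= 1)  return 'A';
        if (ratingPercent <= 2)  return 'B';
        if (ratingPercent <= 5)  return 'C';
        if (ratingPercent <= 10) return 'D';
        return 'E';
    }

    public static void main(String[] args) {
        double rating = ratingPercent(8.30, 300); // the example from the slide
        System.out.printf("Rating = %.2f%% -> grade %c%n", rating, grade(rating));
    }
}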
92/94 ● Measurement is important to track the progress of software projects and to focus on the relevant parts that need attention ● As such, we always need to take measurements with a "grain of salt" ● Still, collecting non-relevant or non-valid metrics might be even worse than not collecting any measures at all Conclusions
93/94 ● LOCs: Lines of Code ● CC: McCabe's Cyclomatic Complexity ● Fan-in: number of local flows that terminate in a module ● Fan-out: number of local flows that emanate from a module ● Information flow complexity of a module: length of the module times the square of the product of fan-in and fan-out ● NOM: Number of Methods per class ● WMC: Weighted Methods per Class ● DIT: Depth of Inheritance Tree ● NOC: Number of Children ● CBO: Coupling Between Objects ● RFC: Response For a Class ● LCOM: Lack of Cohesion of Methods ● ANDC: Average Number of Derived Classes ● AHH: Average Hierarchy Height List of some acronyms
94/94 ● N. Fenton and J. Bieman, Software Metrics: A Rigorous and Practical Approach, Third Edition, 3rd edition. Boca Raton: CRC Press, 2014. ● C. Ebert and R. Dumke, Software Measurement: Establish - Extract - Evaluate - Execute, Softcover reprint of hardcover 1st ed. 2007 edition. Springer, 2010. ● Lanza, Michele, and Radu Marinescu. Object-Oriented Metrics in Practice: Using Software Metrics to Characterize, Evaluate, and Improve the Design of Object-Oriented Systems. Springer Science & Business Media, 2007. ● Some code samples from Martin, Robert C. Clean Code: A Handbook of Agile Software Craftsmanship. Pearson Education, 2008. ● Moose platform for software data analysis http://moosetechnology.org ● The SQALE Method http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf References