LAB OF SOFTWARE ARCHITECTURES AND INFORMATION SYSTEMS
FACULTY OF INFORMATICS, MASARYK UNIVERSITY, BRNO

PV260 – Software Quality
Lecture 2: Software Measurement & Metrics and their role in quality improvement
Bruno Rossi, brossi@mail.muni.cz

Outline

● Introduction
● The Measurement Process
● Motivational Examples
● Background on Software Measurement
● The Goal Question Metric (GQM) approach
● Measures and Software Quality Improvement → SQALE (Software Quality Assessment based on Lifecycle Expectations)
● Case Studies

Introduction

● The following bug (can you spot it?) in Apple's SSL code went undiscovered from September 2012 to February 2014 – how can that be?

M. Bland, "Finding more than one worm in the apple," Communications of the ACM, vol. 57, no. 7, pp. 58–64, Jul. 2014.
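The slide refers to the now-famous "goto fail" defect in the signature verification of Apple's sslKeyExchange.c, reproduced in Bland's article cited above. The excerpt below shows its structure: the second, duplicated goto fail; is unconditional, so the final hash update and the actual signature check are skipped while err still holds 0 (success), and invalid signatures are accepted.

```c
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;  /* duplicated line: always executed, regardless of err */
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
    goto fail;  /* never reached: the signature is never actually checked */
/* ... */
fail:
    /* ... cleanup ... */
    return err; /* err == 0 when the duplicated goto is taken: "success" */
```

Bland's article argues that straightforward unit tests (and the code-coverage measurement that would have exposed the unreachable lines) could have caught this defect long before 2014.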
Introduction

● Modern systems are very large & complex in terms of structure & runtime behaviour
● The figure below represents Eclipse JDT 3.5.0 (350K LOC, 1,324 classes, 23,605 methods)

[Figure: polymetric view of Eclipse JDT 3.5.0 – classes in black, methods in red, attributes in blue; method containment, attribute containment, and class inheritance edges in gray; invocations in red; accesses in blue]

Introduction

● We need ways to understand attributes of software, represent them in a concise way, and use them to track software & development process improvement
● Software measurement and metrics are one of the aspects we can consider
● If we consider the following metrics, what can we say? Are they "good" metrics?

  LOC   354,780
  NOM    23,605
  NOC     1,324
  NOP        45

  (LOC = lines of code, NOM = nr. of methods, NOC = nr. of classes, NOP = nr. of packages)

Measurement

● Measurement is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules (N. Fenton and S. L. Pfleeger, 1997)
→ Measurement is thus the process through which a measure is defined

The Measurement Process

● The measurement process goes from the real world to the numerical representation
● Interpretation goes from the numerical representation back to the relevant empirical results

[Diagram: Real World → (measures, across the "intelligence barrier") → Numbers → Reduced Numbers → (interpretation) → Relevant Empirical Results]

Why Software Measurement

● To avoid anecdotal evidence without a clear study (through experiments or prototypes, for example)
● To increase the visibility and the understanding of the process
● To analyze the software development
● To make predictions through statistical models

Gilb's principle of fuzzy targets (1988): "Projects without clear goals will not achieve their goals clearly"

However...

● Although measurement may be integrated into development, very often the objectives of the measurement are not clear
● "I measure the process because there is an automated tool that collects the metrics, but I do not know how to read the data and what I can do with the data"

Tom De Marco (1982): "You cannot manage what you cannot measure"...
...but you need to know what to measure and how to measure it

Motivational Examples

about the pitfalls in linking real-world phenomena to numbering systems

A Motivational Example (1/3)

● You were asked to conduct a study to evaluate whether there is discrimination between men and women in a university's enrollment
● You set up a case study and looked at the final results:

           Applicants   % admitted
  Men         8,442        44%
  Women       4,321        35%

→ Is there discrimination in place?
→ What can you conclude from the numbers above?

A Motivational Example (2/3)

● Now look at the same study, but performed at the department level (top 6 departments):

  Department   Men: applicants, % admitted   Women: applicants, % admitted
  A            825, 62%                      108, 82%
  B            560, 63%                       25, 68%
  C            325, 37%                      593, 34%
  D            417, 33%                      375, 35%
  E            191, 28%                      393, 24%
  F            272,  6%                      341,  7%

● There does not seem to be any discrimination against women! The conclusion is that women tended to apply to more competitive departments than men
● The effect we just saw is called Simpson's paradox

Source of the example: http://en.wikipedia.org/wiki/Simpson%27s_paradox – considering the following works: J. Pearl (2000), Causality: Models, Reasoning, and Inference, Cambridge University Press; P. J. Bickel, E. A. Hammel and J. W. O'Connell (1975), "Sex Bias in Graduate Admissions: Data From Berkeley," Science 187 (4175): 398–404.

A Motivational Example (3/3)

● Simpson's paradox: how can it be?
● It can happen that:
  a/b < A/B
  c/d < C/D
  (a + c)/(b + d) > (A + C)/(B + D)
● e.g. 1/5 < 2/8 and 6/8 < 4/5, yet 7/13 > 6/13
● It is the result of not considering a hidden variable – in the example, the difficulty of entering a certain department

  Dept    Men: applicants, % admitted   Women: applicants, % admitted
  A        5, 20%                        8, 25%
  B        8, 75%                        5, 80%
  Total   13, 53%                       13, 46%
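To make the arithmetic above concrete, here is a minimal Python sketch (using the toy numbers from the last table) that reproduces the reversal: women are admitted at a higher rate in each department, yet at a lower rate overall.

```python
# Toy data from the slide: department -> (applicants, admitted).
men = {"A": (5, 1), "B": (8, 6)}
women = {"A": (8, 2), "B": (5, 4)}

def rate(applicants, admitted):
    return admitted / applicants

for dept in men:
    print(dept, f"men {rate(*men[dept]):.0%}", f"women {rate(*women[dept]):.0%}")
# A men 20% women 25%
# B men 75% women 80%

def totals(groups):
    return (sum(a for a, _ in groups.values()),
            sum(x for _, x in groups.values()))

print("Total", f"men {rate(*totals(men)):.0%}", f"women {rate(*totals(women)):.0%}")
# Total men 54% women 46% -> the ordering flips in the aggregate
```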
Background on Software Measurement

Software Measurement Methods

● Measurement artifacts / objects: product (architecture, implementation, documentation), process (management, lifecycle, CASE), resources (personnel, software, hardware)
● Measurement models: flow graphs, call graphs, structure trees, code schemas, ...
● Scale types and statistics: correlation, estimation, adjustment, calibration
● Measurement evaluation: analysis, visualization, exploration, prediction, ...
● Measurement goals: understanding, learning, improvement, management, controlling, ...
● Kinds of operations: artefact-based, quantification-based, value-based, and experience-based

Measurement Information Model (ISO/IEC 15939)

[Diagram, bottom to top: Entity → Attributes → Measurement Method → Base Measures → Measurement Function → Derived Measures → (Analysis) Model → Indicator → Interpretation → Information Product, which satisfies the Information Needs]

● Measurable concept: abstract relationship between attributes of entities and information needs

Measurement Information Model (ISO/IEC 15939) – bottom part

● Attribute: property relevant to the information needs
● Measurement method: operations mapping an attribute to a scale
● Base measure: variable assigned a value by applying the method to one attribute
● Measurement function: algorithm for combining two or more base measures
● Derived measure: variable assigned a value by applying the measurement function to two or more values of base measures

Measurement Information Model (ISO/IEC 15939) – top part

● (Analysis) model: algorithm for combining measures and decision criteria
● Indicator: variable assigned a value by applying the analysis model to base and/or derived measures
● Interpretation: explanation relating the quantitative information in the indicator to the information needs
● Information product: the outcome of the measurement process that satisfies the information needs

ISO/IEC 15939 Examples

● External quality measures – Functionality – Accuracy:
  – Entity: software; attributes: run-time accuracy, run-time usability
  – Base measures: B1 = nr. of inaccurate computations encountered by users; B2 = operation time
  – Measurement function: B1/B2; derived measure: computational accuracy
  – Indicator: comparison of the values obtained with generic thresholds and/or targets
● External quality measures – Reliability – Maturity:
  – Entity: software; attributes: run-time reliability, level of testing
  – Base measures: B1 = number of detected failures; B2 = number of performed test cases
  – Measurement function: B1/B2; derived measure: failure density against test cases
  – Indicator: comparison of the values obtained with generic thresholds and/or targets

Inspired by Abran, Alain, et al., "An information model for software quality measurement with ISO standards," Proceedings of the International Conference on Software Development (SWDC-REK), Reykjavik, Iceland, 2005.

Measure Definition

● A measure is a mapping between
  – the real world
  – the mathematical or formal world, with its objects and relations
● Different mappings give different views of the world depending on the context (height, weight, ...)
● The mapping relates attributes to mathematical objects; it does not relate entities to mathematical objects

● The validity of a measure depends on a definition of the attribute that is coherent with the specification of the real world
  – Is LOC a valid measure?
  – It depends on our measurement goals, e.g.:
    → Do we consider blanks and comments in the LOC?
    → How exactly are the lines computed (e.g. counting only statements terminated by ";")?
  – You might have two projects measured with two different definitions of LOC, so that P1 > P2 and P1 < P2 can both be true at the same time, as the sketch below illustrates
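A minimal sketch of how two defensible LOC definitions reverse a comparison; the two counting rules and the code snippets are illustrative assumptions, not standard definitions.

```python
# Two plausible LOC definitions for a C-like language (illustrative only).
def physical_loc(source: str) -> int:
    """Count non-blank lines, excluding '//' comment lines."""
    lines = (line.strip() for line in source.splitlines())
    return sum(1 for line in lines if line and not line.startswith("//"))

def logical_loc(source: str) -> int:
    """Count statements only, i.e. ';' terminators."""
    return source.count(";")

p1 = "foo(\n  a,\n  b\n);\n// helper call\nbar(\n  c\n);\n"  # spread-out style
p2 = "x = 1; y = 2; z = 3;\n"                                # dense style

print(physical_loc(p1), physical_loc(p2))  # 7 vs 1 -> P1 > P2
print(logical_loc(p1), logical_loc(p2))    # 2 vs 3 -> P1 < P2
```

Neither count is wrong; they are different measures, so values obtained under different definitions must not be compared with each other.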
SQALE – Rating

● Example: an artefact has an estimated development cost of 300 hours and an STI of 8.30 hours; using the reference rating table (i.e. looking up the density 8.30/300 ≈ 2.8%), the corresponding rating can be read off
  [Reference rating table from the slide not reproduced]
● The final representation can take the form of a Kiviat diagram in which the different density indexes are represented

SQALE – Rating

● This is the view you find in SonarQube: http://www.sonarqube.org/sonar-sqale-1-2-in-screenshot

SQALE

● Given our initial discussion of measurement pitfalls, scales, and the representation condition, the following sentence should now be clear:

"Because the non-remediation costs are not established on an ordinal scale but on a ratio scale, we have shown [..] that we can aggregate the measures by addition and comply with the measurement theory and the representation clause."

Letouzey, Jean-Louis, and Michel Ilkiewicz, "Managing technical debt with the SQALE method," IEEE Software 29.6 (2012): 44–51.

Case Studies

Case Study

● Suppose that we have some projects on which we computed the following set of metrics
● What can you say about the projects?

                Project01  Project02  Project03  Project04  Project05  Project06
  # LOC           4920       5817       4013       4515       3263       5735
  # packages        29         49         33         35         25         33
  # classes        126        199        159        181         75        198
  # methods        658        862        644        817        415        715
  # attributes     153        196        227        285         78        177
  # parameters     301        459        393        440        182        415
  # local vars     493        533        325        397        339        416
  # calls         2051       2830       1844       2297        917       2015
  Proj_status   complete   complete   incomplete complete   incomplete complete

Case Study

● What if we consider relative instead of absolute values?
● This would allow us to compare the values across projects (the highest and lowest value of each row were highlighted on the slide):

                Project01  Project02  Project03  Project04  Project05  Project06
  LOC/NOM          7.48       6.75       6.23       5.53       7.86       8.02
  NOC/NOP          4.34       4.06       4.82       5.17       3.00       6.00
  NOM/NOC          5.22       4.33       4.05       4.51       5.53       3.61
  att/NOC          1.21       0.98       1.43       1.57       1.04       0.89
  param/NOM        0.46       0.53       0.61       0.54       0.44       0.58
  locvars/NOM      0.75       0.62       0.50       0.49       0.82       0.58
  calls/NOM        3.12       3.28       2.86       2.81       2.21       2.82
  Proj_status   complete   complete   incomplete complete   incomplete complete

Case Study

● What if we make sense of the metrics by using the GQM approach?

G1. Analyze the software product (object of study) for the purpose of evaluation (purpose) with respect to the effectiveness of the code structure (quality focus) from the point of view of the development team (point of view) in the environment of our project named xyz (environment).
  Q1.1. What is the structure of the system?
    M1.1.1 calls/NOM
    M1.1.2 param/NOM
    M1.1.3 NOM/NOC
  Q1.2. What is the coupling within the system?
    M1.2.1 NOC/NOP
    M1.2.2 LOC/NOM

● Filling in the values for the two extreme projects: calls/NOM – P1: 3.12, P5: 2.21; param/NOM – P1: 0.46, P5: 0.44 (derived from the absolute table, as in the sketch below)
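A minimal sketch of how the relative values used by the GQM questions are derived from the absolute metrics table (only two projects shown; the rounding follows the slide):

```python
# Absolute metrics for two projects, from the case-study table.
projects = {
    "Project01": {"LOC": 4920, "NOP": 29, "NOC": 126, "NOM": 658, "CALLS": 2051},
    "Project05": {"LOC": 3263, "NOP": 25, "NOC": 75, "NOM": 415, "CALLS": 917},
}

for name, m in projects.items():
    print(name,
          f"LOC/NOM={m['LOC'] / m['NOM']:.2f}",      # 7.48 vs 7.86
          f"NOM/NOC={m['NOM'] / m['NOC']:.2f}",      # 5.22 vs 5.53
          f"NOC/NOP={m['NOC'] / m['NOP']:.2f}",      # 4.34 vs 3.00
          f"calls/NOM={m['CALLS'] / m['NOM']:.2f}")  # 3.12 vs 2.21
```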
Case Study

● What happens if we consider LOC instead of NOM?

G1. Analyze the software product (object of study) for the purpose of evaluation (purpose) with respect to the effectiveness of the code structure (quality focus) from the point of view of the development team (point of view) in the environment of our project named xyz (environment).
  Q1.1. What is the structure of the system?
    M1.1.1 calls/LOC (P1: 0.41, P5: 0.28)
    M1.1.2 param/LOC (P1: 0.14, P5: 0.05)
    M1.1.3 NOM/NOC
  Q1.2. What is the coupling within the system?
    M1.2.1 NOC/NOP
    M1.2.2 LOC/NOM

Case Study – The Overview Pyramid

● Another useful way to think in terms of relative values and thresholds is to use the Overview Pyramid
● The Overview Pyramid represents three different aspects of internal quality: inheritance, size & complexity, and coupling
● It provides both absolute and relative values that are compared against typical thresholds

NOP: Number of Packages; NOC: Number of Classes; NOM: Number of Methods; LOC: Lines of Code; CYCLO: Cyclomatic Complexity; ANDC: Average Number of Derived Classes; AHH: Average Hierarchy Height; CALLS: Number of Distinct Method Invocations; FANOUT: Number of Called Classes

Case Study – The Overview Pyramid

[Overview pyramids for Project 1 – Project 6, with each value marked as close to the high, close to the average, or close to the low threshold]

Case Study – The Overview Pyramid

● Back to our initial project, Eclipse JDT 3.5.0:

[Overview pyramid for Eclipse JDT 3.5.0, with each value marked as close to the high, close to the average, or close to the low threshold]
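Since the pyramid figures are not reproduced here, the following sketch shows the idea on the Eclipse JDT 3.5.0 numbers from the beginning of the lecture. The reference intervals are illustrative assumptions in the spirit of Lanza & Marinescu's statistically derived thresholds, not their exact published values.

```python
# Base metrics for Eclipse JDT 3.5.0, from the earlier slide.
jdt = {"LOC": 354_780, "NOM": 23_605, "NOC": 1_324, "NOP": 45}

# Assumed (low, high) reference intervals for Java systems (illustrative).
ref = {"LOC/NOM": (7.0, 13.0), "NOM/NOC": (4.0, 10.0), "NOC/NOP": (6.0, 26.0)}

ratios = {
    "LOC/NOM": jdt["LOC"] / jdt["NOM"],  # average method length
    "NOM/NOC": jdt["NOM"] / jdt["NOC"],  # average class size
    "NOC/NOP": jdt["NOC"] / jdt["NOP"],  # average package size
}
for name, value in ratios.items():
    low, high = ref[name]
    verdict = "low" if value < low else "high" if value > high else "average"
    print(f"{name} = {value:.2f} -> close to {verdict}")
# LOC/NOM = 15.03, NOM/NOC = 17.83, NOC/NOP = 29.42 -> all close to high
```

Reading the ratios against the intervals, rather than the raw counts in isolation, is what lets the pyramid say something about a system's proportions.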
Conclusions

● Measurement is important to track the progress of software projects and to focus on the relevant parts that need attention
● Even so, we always need to take measurements with a "grain of salt"
● Still, collecting irrelevant or invalid metrics might be even worse than not collecting any measure at all

Extra Slides

List of some Acronyms

● LOC: Lines of Code
● CC: McCabe's Cyclomatic Complexity
● Fan-in: number of local flows that terminate in a module
● Fan-out: number of local flows that emanate from a module
● Information flow complexity of a module: the length of the module times the square of the product of its fan-in and fan-out
● NOM: Number of Methods per class
● WMC: Weighted Methods per Class
● DIT: Depth of Inheritance Tree
● NOC: Number of Children
● CBO: Coupling Between Objects
● RFC: Response For a Class
● LCOM: Lack of Cohesion of Methods
● ANDC: Average Number of Derived Classes
● AHH: Average Hierarchy Height

Measurement Experience

● Measurement experience can take the form of:
  – Analogies
  – Axioms
  – Correlations
  – Criteria
  – Intuitions
  – Laws
  – Lemmas
  – Formulas
  – Methodologies
  – Principles
  – Relations
  – Rules of thumb
  – Theories

Software Engineering Laws (1/4)

● Example: laws in software engineering – how were these derived?

Software Engineering Laws (2/4)

● Information hiding in object-oriented programming
● "A human being can concentrate on 7±2 items at a time"
● "Productivity is improved by reducing accidents and controlling essence"
● "Testing can show the presence but not the absence of errors"
● Bayes' theorem: Pr(A|B) = Pr(B|A) · Pr(A) / Pr(B)

Software Engineering Laws (3/4)

● "Requirement deficiencies are the prime source of project failure"
● "The value of a model depends on the view taken, but none is best for all purposes"
● "The user will never know what they want until after the system is in production"
● "Good designs require deep application domain knowledge"
● "What applies to small systems does not apply to large ones"
● "Everything put together falls apart sooner or later"
● The 8 laws of software evolution

Software Engineering Laws (4/4)

● The number of transistors on an integrated circuit doubles in about 18 months
● The number of radio communications doubles every 30 months
● "The number of lines of code a programmer can write in a fixed period of time is the same regardless of the programming language"
● "If builders built buildings the way programmers wrote programs, the first woodpecker that came along would destroy civilization"
● Perspective-based inspections (along one dimension, for a specific stakeholder) are highly effective and efficient
● Software reuse reduces cycle time and increases productivity and quality

References

● N. Fenton and J. Bieman, Software Metrics: A Rigorous and Practical Approach, 3rd ed. Boca Raton: CRC Press, 2014.
● C. Ebert and R. Dumke, Software Measurement: Establish - Extract - Evaluate - Execute. Springer, 2007.
● M. Lanza and R. Marinescu, Object-Oriented Metrics in Practice: Using Software Metrics to Characterize, Evaluate, and Improve the Design of Object-Oriented Systems. Springer, 2007.
● Some code samples from R. C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship. Pearson Education, 2008.
● Moose platform for software data analysis: http://moosetechnology.org
● The SQALE Method: http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf