LAB OF SOFTWARE ARCHITECTURES AND INFORMATION SYSTEMS
FACULTY OF INFORMATICS, MASARYK UNIVERSITY, BRNO
PV260 - SOFTWARE QUALITY
SOFTWARE MEASUREMENT & METRICS AND THEIR ROLE IN QUALITY IMPROVEMENT
Bruno Rossi brossi@mail.muni.cz

2-99 Introduction
● The following defect (can you spot it?) in Apple's SSL code went undiscovered from Sept 2012 to Feb 2014 – how can that be?
M. Bland, "Finding more than one worm in the apple," Communications of the ACM, vol. 57, no. 7, pp. 58–64, Jul. 2014.

3-99 Introduction
● Modern systems are very large & complex in terms of structure & runtime behaviour
● The figure on the right represents Eclipse JDT 3.5.0 (350K LOCs, 1,324 classes, 23,605 methods)
Classes → black – Methods → red – Attributes → blue. Method containment, attribute containment, and class inheritance → gray – Invocations → red – Accesses → blue

4-99 Introduction
● We need ways to understand attributes of software, represent them in a concise way, and use them to track software & development process improvement
● Software Measurement and Metrics are one of the aspects we can consider
LOCs 354,780 – NOM 23,605 – NOC 1,324 – NOP 45
(LOCs = lines of code, NOM = nr. of methods, NOC = nr. of classes, NOP = nr. of packages)
If we consider the following metrics, what can we say? What are these metrics "good" for?

5-99 Introduction
● Typical problems of measurement:
→ How can I measure the maintainability of my software?
→ Can I estimate the number of defects of my software?
→ What is the productivity of my development team?
→ Can I measure the quality of my testing process?

6-99 Measurement
● Measurement is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules (N. Fenton and S. L. Pfleeger, 1997)
→ A measurement is the process that defines a measure

7-99 The Measurement Process
● The measurement process goes from the real world to the numerical representation
● Interpretation goes from the numerical representation to the relevant empirical results
Real World → (measures) → Numbers → (statistics) → Reduced Numbers → (interpretation) → Relevant Empirical Results, crossing the "intelligence barrier"

8-99 Why Software Measurement
● To avoid anecdotal evidence unsupported by clear research (e.g., through experiments or prototypes)
● To increase the visibility and the understanding of the process
● To analyze the software development process
● To make predictions through statistical models
Gilb's Principle of fuzzy targets (1988): "Projects without clear goals will not achieve their goals clearly"

9-99 However...
● Although measurement may be integrated in development, very often the objectives of measurement are not clear
"I measure the process because there is an automated tool that collects the metrics, but I do not know how to read the data and what I can do with the data"
Tom De Marco (1982): "You cannot manage what you cannot measure"...
...but you need to know what to measure and how to measure
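To get a first feeling for what raw size counts are "good" for, here is a minimal Java sketch using the Eclipse JDT 3.5.0 numbers quoted on slide 4; the ratios it derives (LOC per method, methods per class, classes per package) anticipate the normalized measures discussed later in the lecture. The program itself is only an illustration, not a measurement tool.

/** Minimal sketch: turning raw size counts into comparable ratios.
 *  Numbers are the Eclipse JDT 3.5.0 figures quoted on slide 4. */
public class SizeRatios {
    public static void main(String[] args) {
        double loc = 354_780;   // lines of code
        double nom = 23_605;    // number of methods
        double noc = 1_324;     // number of classes
        double nop = 45;        // number of packages

        // Raw counts say little on their own; ratios make projects comparable.
        System.out.printf("LOC per method:      %.1f%n", loc / nom);
        System.out.printf("Methods per class:   %.1f%n", nom / noc);
        System.out.printf("Classes per package: %.1f%n", noc / nop);
    }
}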
10-99 Motivational Example

11-99 Review of Defective Toyota Camry's System (1/3)
● Expert source code and system review after reported cases of accidents due to cars accelerating without user input *
● 18 months of review + a previous code review by NASA experts
● Investigation on unintended accelerations
* http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf

12-99 Review of Defective Toyota Camry's System (2/3)
● Usage of software metrics (p.24):
● "Data-flow spaghetti – Complex coupling between software modules and between tasks – Count of global variables is a software metric for 'tangledness' → 2005 Camry L4 has >11,000 global variables (NASA)"
* http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf

13-99 Review of Defective Toyota Camry's System (3/3)
● Usage of software metrics (p.24):
● "Control-flow spaghetti – Many long, overly-complex function bodies – Cyclomatic Complexity is a software metric for 'testability' → 2005 Camry L4 has 67 functions scoring >50 ('untestable') → The throttle angle function scored over 100 (unmaintainable)"
● See also pp.30-31 for coding rule violations and the expected number of bugs
* http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf

14-99 Pitfalls in linking real-world phenomena to numbering systems
https://xkcd.com/605/

15-99 Pitfall Example (1/3)
● A/B testing is a kind of randomized experiment in which you propose two variants of the same application to the users
● Set up an experiment with two browsers and two variations of the same webpage
Conversion rates: Firefox – A: 87.50%, B: 100.00%; Chrome – A: 50.00%, B: 62.50%
What can you conclude? Which alternative is better?
https://medium.com/homeaway-tech-blog/simpsons-paradox-in-a-b-testing-93af7a2f3307

16-99 Pitfall Example (2/3)
● Let's look at the same table, but with additional information about the way the tests were split
Firefox – A: 70/80 = 87.5%, B: 20/20 = 100%
Chrome – A: 10/20 = 50%, B: 50/80 = 62.5%
Both – A: 80/100 = 80%, B: 70/100 = 70%
https://medium.com/homeaway-tech-blog/simpsons-paradox-in-a-b-testing-93af7a2f3307

17-99 Pitfall Example (3/3) – Simpson's paradox
● It can happen that:
a/b < A/B
c/d < C/D
(a + c)/(b + d) > (A + C)/(B + D)
● Example:
1/5 (20%) < 2/8 (25%)
6/8 (75%) < 4/5 (80%)
7/13 (53%) > 6/13 (46%)
Dept A: Men – 5 applicants, 20% admitted; Women – 8 applicants, 25% admitted
Dept B: Men – 8 applicants, 75% admitted; Women – 5 applicants, 80% admitted
Total: Men – 13 applicants, 53% admitted; Women – 13 applicants, 46% admitted
See: https://plato.stanford.edu/entries/paradox-simpson/ – considering the following papers:
J. Pearl (2000). Causality: Models, Reasoning, and Inference, Cambridge University Press.
P.J. Bickel, E.A. Hammel and J.W. O'Connell (1975). "Sex Bias in Graduate Admissions: Data From Berkeley." Science 187 (4175): 398–404.
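A minimal Java sketch of the A/B-testing pitfall above, using exactly the split from slide 16 (70/80 vs 20/20 on Firefox, 10/20 vs 50/80 on Chrome): variant B wins in every browser segment, yet the pooled rate favours A, because the traffic split between browsers is very unbalanced.

/** Simpson's paradox with the slide's A/B-testing numbers. */
public class SimpsonsParadox {

    static double rate(int conversions, int visitors) {
        return 100.0 * conversions / visitors;
    }

    public static void main(String[] args) {
        // conversions / visitors per segment and variant (from slide 16)
        int[] firefoxA = {70, 80}, firefoxB = {20, 20};
        int[] chromeA  = {10, 20}, chromeB  = {50, 80};

        System.out.printf("Firefox: A=%.1f%%  B=%.1f%%%n",
                rate(firefoxA[0], firefoxA[1]), rate(firefoxB[0], firefoxB[1]));
        System.out.printf("Chrome:  A=%.1f%%  B=%.1f%%%n",
                rate(chromeA[0], chromeA[1]), rate(chromeB[0], chromeB[1]));

        // Pooled over both browsers the ordering flips: 80% for A vs 70% for B.
        System.out.printf("Pooled:  A=%.1f%%  B=%.1f%%%n",
                rate(firefoxA[0] + chromeA[0], firefoxA[1] + chromeA[1]),
                rate(firefoxB[0] + chromeB[0], firefoxB[1] + chromeB[1]));
    }
}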
18-99 Background on Software Measurement

19-99 Software Measurement Methods
Measurement artifacts / objects: Product (architecture, implementation, documentation), Process (management, lifecycle, CASE), Resources (personnel, software, hardware)
Measurement models: flow graphs, call graphs, structure tree, code schema, ...
Scale types, statistics: correlation, estimation, adjustment, calibration
Measurement evaluation: analysis, visualization, exploration, prediction, ...
Measurement goals: understanding, learning, improvement, management, controlling, ...
Measurement operations: artefact-based, quantification-based, value-based, experience-based

20-99 Measurement Information Model (ISO/IEC 15939)
Attributes of an Entity → (Measurement Method) → Base Measures → (Measurement Function) → Derived Measures → ((analysis) Model) → Indicator → (Interpretation) → Information Product, which satisfies the Information Needs
Measurable Concept: abstract relationship between attributes of entities and information needs

21-99 Measurement Information Model (ISO/IEC 15939) – bottom part
Attribute: property relevant to information needs
Measurement Method: operations mapping an attribute to a scale
Base Measure: variable assigned a value by applying the method to one attribute
Measurement Function: algorithm for combining two or more base measures
Derived Measure: variable assigned a value by applying the measurement function to two or more values of base measures

22-99 Measurement Information Model (ISO/IEC 15939) – top part
(Analysis) Model: algorithm for combining measures and decision criteria
Indicator: variable assigned a value by applying the analysis model to base and/or derived measures
Interpretation: explanation relating the quantitative information in the indicator to the information needs
Information Product: the outcome of the measurement process that satisfies the information needs

23-99 ISO/IEC 15939 Examples
● External quality measures – Functionality – Accuracy: entity = software; attributes = run-time accuracy, run-time usability; B1 = nr. of inaccurate computations encountered by users; B2 = operation time; derived measure = B1/B2 (computational accuracy); indicator = comparison of the values obtained with generic thresholds and/or targets
● External quality measures – Reliability – Maturity: entity = software; attributes = run-time reliability, level of testing; B1 = number of detected failures; B2 = number of performed test cases; derived measure = B1/B2 (failure density against test cases); indicator = comparison of the values obtained with generic thresholds and/or targets
Inspired by Abran, Alain, et al. "An information model for software quality measurement with ISO standards." Proceedings of the International Conference on Software Development (SWDC-REK), Reykjavik, Iceland, 2005.

24-99 Measure Definition
● A measure is a mapping between
– The real world
– The mathematical or formal world with its objects and relations
● Different mappings give different views of the world depending on the context (height, weight, …)
● The mapping relates attributes to mathematical objects; it does not relate entities to mathematical objects
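A minimal Java sketch of the ISO/IEC 15939 chain above, instantiated with the failure-density example from slide 23: two base measures, a measurement function, and an indicator obtained by comparing the derived measure against a threshold. Class and method names, input values and the threshold are illustrative assumptions, not taken from the standard.

/** Sketch of the ISO/IEC 15939 chain with the failure-density example:
 *  base measures -> measurement function -> derived measure -> analysis model -> indicator. */
public class FailureDensityIndicator {

    // Base measures: values obtained by applying a measurement method to one attribute each.
    static int detectedFailures = 12;    // B1 = number of detected failures
    static int executedTestCases = 400;  // B2 = number of performed test cases

    // Measurement function: combines two or more base measures into a derived measure.
    static double failureDensity(int failures, int testCases) {
        return (double) failures / testCases;   // B1 / B2
    }

    // Analysis model: combines the derived measure with a decision criterion (threshold).
    static String indicator(double density, double threshold) {
        return density <= threshold ? "within target" : "above target - investigate";
    }

    public static void main(String[] args) {
        double density = failureDensity(detectedFailures, executedTestCases);
        // Interpretation: relate the indicator back to the information need
        // ("is the product mature enough, given the level of testing?").
        System.out.printf("Failure density = %.3f failures/test case -> %s%n",
                density, indicator(density, 0.05 /* illustrative threshold */));
    }
}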
25-99
● The validity of a measure depends on a definition of the attribute that is coherent with the specification of the real world
● Example: Is LOC a valid measure of productivity?
→ Think by paradox: 100K lines of system.out statements vs 100K lines of complex loops and statements
ADDITIONAL PROBLEM: you might have two different projects with two different definitions of LOCs (e.g., counting blanks+comments vs only ";"), so that the following can be true at the same time: P1 > P2 and P1 < P2

26-99 – 31-99 Measurement Scales (1/4)
The scale types form a hierarchy Nominal → Ordinal → Interval → Ratio, in which each type admits progressively more operations and statistics: equality/inequality (=, ≠) only; ordering with min, max and the median; differences with the arithmetic mean; ratios and proportions

32-99 Measurement Scales (2/4)
● Some examples of measures and related scales (Scale Type – Examples in Software Eng. – Indicators of Central Tendency):
Nominal – Name of the programming language (e.g. Java, C++, C#) – Mode
Ordinal – Ranking of failures (as a measure of failure severity) – Mode + Median
Interval – Beginning date, end date of activities – Mode + Median + Arithmetic Mean
Ratio – LOC (as a measure of program size) – Mode + Median + Arithmetic Mean + Geometric Mean
Morasca, Sandro. "Software measurement." Handbook of Software Engineering and Knowledge Engineering (2001): 239-276.

33-99 Measurement Scales (3/4) - Examples
● Example: suppose that we have the following ranking of software tickets by severity (Level – Severity – Description):
6 – Blocker – Prevents a function from being used, no workaround, blocking progress on multiple fronts
5 – Critical – Prevents a function from being used, no workaround
4 – Major – Prevents a function from being used, but a workaround is possible
3 – Normal – A problem making a function difficult to use, but no special workaround is required
2 – Minor – A problem not affecting the actual function, but the behavior is not natural
1 – Trivial – A problem not affecting the actual function; a typo would be an example

34-99 Measurement Scales (4/4) - Examples
● Is it meaningful to use the weighted average to compare two projects in terms of severity of the open issues?
Open issues (Order – Severity – P1 – P2):
6 – Blocker – 2 – 10
5 – Critical – 36 – 19
4 – Major – 25 – 22
3 – Normal – 15 – 32
2 – Minor – 2 – 5
1 – Trivial – 121 – 113
Let's define the following metric:
Sev(Pn) = avg(∑ issues_i ∗ weight_i)
Sev(P1) = (2∗6 + 36∗5 + 25∗4 + 15∗3 + 2∗2 + 121∗1) / 6 = 462/6 = 77
Sev(P2) = (10∗6 + 19∗5 + 22∗4 + 32∗3 + 5∗2 + 113∗1) / 6 = 462/6 = 77
Are the projects the same according to our metric? Is there the "same distance" from a critical ticket to a blocker as there is between minor and trivial?
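A minimal Java sketch of the example above: the weighted "average severity" makes P1 and P2 look identical (both score 77), while the raw counts per level tell a different story. The weights and counts are the slide's; the comparison of Blocker/Critical counts at the end is my own illustration of the information the mean hides.

/** The slide's two projects score identically (77) under the weighted "average severity",
 *  even though their issue profiles differ: an arithmetic mean over an ordinal severity
 *  scale hides this. Counts and weights are taken from slide 34. */
public class SeverityAverage {

    // index 0..5 = Trivial (weight 1) .. Blocker (weight 6)
    static final int[] P1 = {121, 2, 15, 25, 36, 2};
    static final int[] P2 = {113, 5, 32, 22, 19, 10};

    static double sev(int[] issues) {
        double weighted = 0;
        for (int i = 0; i < issues.length; i++) {
            weighted += issues[i] * (i + 1);   // issues_i * weight_i
        }
        return weighted / issues.length;       // "avg" as defined on the slide
    }

    public static void main(String[] args) {
        System.out.printf("Sev(P1) = %.0f, Sev(P2) = %.0f%n", sev(P1), sev(P2)); // both 77
        // What the average hides: P2 has five times as many Blockers as P1.
        System.out.printf("Blockers: P1=%d vs P2=%d; Criticals: P1=%d vs P2=%d%n",
                P1[5], P2[5], P1[4], P2[4]);
    }
}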
35-99 Direct vs Indirect Measures (1/2)
● Some measures are harder to collect or are not regularly collected
– Direct: from a direct process of measuring
– Indirect: from a mathematical equation in the world of symbols
(see the ISO/IEC 15939 bottom-part diagram above: this is what the standard refers to as base measure and derived measure)

36-99 Direct vs Indirect Measures (2/2)
● Direct
– Number of known defects
● Indirect
– Defect density (DD): DD = known defects / product size
– COCOMO, measure of effort: E = a · KSLoC^b · EAF, where a = 2.94 and b = 0.91 + 0.01 · ∑(i=1..5) SF_i
(EAF = Effort Adjustment Factor, SF = Scale Factors)

37-99 Internal vs External Attributes (1/4)
● Generally, it is easier to collect measures of length and complexity of the code (internal attributes of the product) than measures of its quality (external attributes)
– Internal attributes: internal characteristics of product, process, and human resources
– External attributes: due to the external environment

38-99 Internal vs External Attributes (2/4)
● One of the aims of Software Engineering is to improve the quality of software

39-99 Internal vs External Attributes (3/4)
● The mapping of internal attributes to external ones – and then to quality in use – is not straightforward

40-99 Internal vs External Attributes (4/4)
● The mapping of internal attributes to external ones – and then to quality in use – is not straightforward (example: reliability)
External view: nr. of failures over a period of time
Internal view: how many faults were detected in the reviewed product? X = A/B, where A = absolute number of faults detected in review and B = number of faults estimated to be detected in review (using past history or a reference model)
Is there a relation between the two? ASSUMPTION (!): fixing the internal faults fixes the corresponding failure(s)

41-99 Objective vs Subjective Measures
Objective: the same each time they are taken (e.g., collected automatically by some tool or device) → e.g., LOCs
Subjective: manually collected by individuals → e.g., time to use a functionality in an application

42-99 SOFTWARE METRICS - SIZE

43-99 Various Measures of Size
[01] * multiples. Repeat until there are no more multiples
[02] * in the array.
[03] */
[04] public class PrimeGenerator
[05] {
[06]   private static boolean[] crossedOut;
[07]   private static int[] result;
[08]   public static int[] generatePrimes(int maxValue){
[09]     if (maxValue < 2){
[10]       return new int[0];
[11]     }else{
[12]       uncrossIntegersUpTo(maxValue);
[13]       crossOutMultiples();
[14]       putUncrossedIntegersIntoResult();
[15]       return result;
[16]     }
[17]   }
[18] }
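Before moving on to the size measures of this listing, here is a minimal Java sketch going back to the indirect measures of slide 36: a derived value computed from directly collected ones. The defect-density and COCOMO formulas (a = 2.94, b = 0.91 + 0.01·ΣSF_i) are the slide's; the sample scale factors, EAF, size and defect count plugged in are illustrative placeholders, not calibration data.

/** Indirect measures derived from direct ones: defect density and a COCOMO-style effort estimate. */
public class IndirectMeasures {

    /** DD = known defects / product size (here size in KSLoC). */
    static double defectDensity(int knownDefects, double ksloc) {
        return knownDefects / ksloc;
    }

    /** E = a * KSLoC^b * EAF, with a = 2.94 and b = 0.91 + 0.01 * sum(SF_i). */
    static double cocomoEffort(double ksloc, double[] scaleFactors, double eaf) {
        double sumSf = 0;
        for (double sf : scaleFactors) sumSf += sf;
        double b = 0.91 + 0.01 * sumSf;
        return 2.94 * Math.pow(ksloc, b) * eaf;
    }

    public static void main(String[] args) {
        double ksloc = 50;                              // direct: measured size (placeholder)
        int knownDefects = 120;                         // direct: counted defects (placeholder)
        double[] sf = {3.72, 3.04, 4.24, 3.29, 4.68};   // five scale factors (placeholders)
        double eaf = 1.10;                              // effort adjustment factor (placeholder)

        System.out.printf("Defect density: %.2f defects/KSLoC%n",
                defectDensity(knownDefects, ksloc));
        System.out.printf("Estimated effort: %.1f person-months%n",
                cocomoEffort(ksloc, sf, eaf));
    }
}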
44-99 – 46-99 Various Measures of Size
For the same listing as above:
LOC = 18 (Lines Of Code)
CLOC = 3 (Commented Lines Of Code)
NLOC = 15 (Non-Commented Lines Of Code)
NOC = 1 (Number Of Classes)
NOM = 1 (Number of Methods)
NOP = 1 (Number of Packages)

47-99 Measures of Size – Good for?
● Size is used for normalization of existing measures
→ from the example before, it would be much more useful to report a comment density of 16% (3/18) rather than 3 CLOCs
CD = CLOCs / LOCs = 3/18 = 0.16

48-99 Measures of Size – Good for?
● Example: using comment density to compare Open Source projects after normalization
What is a good reference value for "comment density" in your opinion? These look "scary"
O. Arafat and D. Riehle, "The comment density of open source software code," in 31st International Conference on Software Engineering - Companion Volume, 2009. ICSE-Companion 2009, 2009, pp. 195–198.

49-99 Measures of Size – Good for?
● Size can give a good rough initial estimation of effort, although...
→ Measures of source code size should *never* be used to assess the productivity of developers
How would you compare Mozilla Firefox with the Linux Kernel in terms of maintenance effort?
Software – LOCs: Microsoft Windows Vista ~50M; Linux Kernel 3.1 ~15M; Android ~12M; Mozilla Firefox ~10M; Unreal Engine 3 ~2M

50-99 Measures of Size – Good for?
● Size can be used for comparison of projects and across releases
→ http://www.informationisbeautiful.net/visualizations/million-lines-of-code/

51-99 Another Observation about LOCs
"The task then is to refine the code base to better meet customer need. If that is not clear, the programmers should not write a line of code. Every line of code costs money to write and more money to support."
Jeff Sutherland, one of the main proponents of the Agile Manifesto and the Scrum methodology

52-99 SOFTWARE METRICS - COMPLEXITY

53-99
● G=(N,E) is a graph representing the control flow of a program. N = nodes, E = edges, P = nr.
disconnected parts of G, like main program and method call ● Cyclomatic Complexity is defined as: v(G) = |E|-|N|+ 2P Cyclomatic Complexity (CC) → Assumptions: higher complexity of the program flow graphs, more complex testing process for the source code 54-99 CC = 2 [01] * multiples. Repeat until there are no more multiples [02] * in the array. [03] */ [04] public class PrimeGenerator{ [05] private static boolean[] crossedOut; [06] private static int[] result; [07] public static int[] generatePrimes(int maxValue){ [08] if (maxValue < 2){ [09] return new int[0]; [10] }else{ [11] uncrossIntegersUpTo(maxValue); [12] crossOutMultiples(); [13] putUncrossedIntegersIntoResult(); [14] return result; [15] } [16] } [17] } Typical ranges 1-4 low 5-7 medium 8-10 high 11+ very high Cyclomatic Complexity (CC) CC of method generatePrimes v(G)=|E|-|N|+2 v(G)=9-9+2=2 entry exit 55-99 ● The following code structure from a 2008 students' project implementing chess: one method with 292LOCs and 163 CC Example by using CC 56-99 ● Let's decompose a bit such huge method public boolean eatCoin(Movement mov, Movement eatMov, Coin coin) throws IOException{ //Controls if the eatMove is in the board, if not return if(!canMove(eatMov)){ System.out.println("You can't eat this coin"); return false; } try{ //If it is a coin if(!this.board[mov.row][mov.col].isKing()){ //If the coin to eat isn't a king System.out.println("nextRow " + mov.nextRow + " nextCol " + mov.nextCol + " isKing " + this.board[mov.nextRow][mov.nextCol].isKing()); if(!this.board[mov.nextRow][mov.nextCol].isKing()){ .... Example by using CC 57-99 Example by using CC 58-99 ● A word of warning is that metrics take typically into account syntactic complexity NOT semantic complexity ● Both of the following code fragments have the *same* Cyclomatic Complexity → which code fragment is easier to understand? [04] public class PrimeGenerator [05] { [06] private static boolean[] crossedOut; [07] private static int[] result; [08] [09] public static int[] generatePrimes(int maxValue){ [10] if (maxValue < 2){ [11] return new int[0]; [12] }else{ [13] uncrossIntegersUpTo(maxValue); [14] crossOutMultiples(); [15] putUncrossedIntegersIntoResult(); [16] return result; [17] } [18] } [04] public class A [05] { [06] private static boolean[] c; [07] private static int[] b; [08] [09] public static int[] generate(int m){ [10] if (m < 2){ [11] return new int[0]; [12] }else{ [13] methodOne(m); [14] methodTwo(); [15] methodThree(); [16] return b; [17] } [18] } ● As well, as in the initial motivating example, a word of warning when comparing projects in terms of average complexity Complexity 59-99 OBJECT ORIENTED METRICS 60-99 ● WMC: Weighted methods per class → nr. of methods per class ● DIT: Depth of Inheritance Tree → max inheritance level from the root to the class ● NOC: Number of Children → nr. Of direct descendants of a class ● CBO: Coupling between object classes → Class A coupled with B, if A is using methods/attributes of B ● RFC: Response for a Class → count of methods that can be executed by class A responding to a message ● LCOM: Lack of cohesion in methods → (see next slide!) Chidamber & Kemerer Suite (1994!) 
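Before moving to the next slide, here is a minimal Java sketch going back to Cyclomatic Complexity: instead of building the control-flow graph and computing v(G) = |E| − |N| + 2P, many tools simply count the decision points (if, while, for, case, catch, &&, ||) in a method and add 1, which is equivalent for structured single-entry/single-exit code. Applied to the generatePrimes body above (one if), it gives the same CC = 2 as the flow graph on slide 54. The regular expression is a naive illustration, not a robust parser (comments and string literals would be miscounted in real code).

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Naive cyclomatic-complexity estimate: 1 + number of decision points. */
public class CyclomaticEstimate {

    private static final Pattern DECISION_POINTS = Pattern.compile(
            "\\b(if|while|for|case|catch)\\b|&&|\\|\\|");

    static int estimate(String methodBody) {
        Matcher m = DECISION_POINTS.matcher(methodBody);
        int count = 0;
        while (m.find()) count++;
        return count + 1;
    }

    public static void main(String[] args) {
        String generatePrimes =
                "if (maxValue < 2){ return new int[0]; }" +
                "else { uncrossIntegersUpTo(maxValue); crossOutMultiples(); " +
                "putUncrossedIntegersIntoResult(); return result; }";
        System.out.println("CC(generatePrimes) = " + estimate(generatePrimes)); // 2
    }
}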
61-99 More recent metrics

62-99 FINAL REMARKS

63-99 Final Remarks
● Given all that we have seen, what are your thoughts about the following metric (from the 90's but still used), computing the Maintainability Index (MI) of a project:
MI = 171 − 5.2·ln(V) − 0.23·CC − 16.2·ln(LOC)
Note: you might see different versions of MI implemented in different tools – this is the original formula, which has range (171, −∞); other variations map to the (0, 100) range, e.g. look at the Microsoft Visual Studio documentation for details
Where V is the Halstead volume, measuring the complexity of code based on the length and vocabulary used (in the code):
V = N · log2(n)
where N = N1 + N2 (N1 = total operators, like >, ;, etc.; N2 = total operands, like j, i, 0, etc.)
and n = n1 + n2 (n1 = unique operators, n2 = unique operands)
In your view, what is good and what is bad about this metric?

64-99 The Goal Question Metric (GQM) Approach

65-99 Software Measurement - Pitfalls
● Common pitfalls in software measurement
– Collecting measurements without a meaning → measurement must be goal-driven
– Not analyzing measurements → numbers need detailed analysis
– Setting unrealistic targets → targets should not be defined uniquely based on the numbers
– Paralysis by analysis → measurement is a key activity in management, not a separate activity
"Count what is countable. Measure what is measurable. And what is not measurable, make measurable." (attributed to Galileo Galilei)

66-99 The GQM Approach
● Introduced in 1986 by Rombach and Basili – GQM stands for Goal Question Metric
● It is a deductive instrument to derive suitable measures from prescribed goals
● The paradigm is initiated by Business Goals (BG)
● From the BGs we can derive the GQM
● The Goal Question Metric top-down approach consists of three layers
– Conceptual layer – the Measurement Goal (G)
– Operational layer – the Question (Q)
– Measurement layer – the Metric (M)

67-99 Goal-oriented Measurement
● Measurements must be goal-oriented
● They typically follow a structure such as the GQM approach:
Measurement Goal (G) – business objectives, key performance indicators, project targets, improvement goals – "What are the goals to reach? What do I need to improve?"
Question (Q) – approaches to reach the goals, improvement programs, change management, project management techniques – "How do I reach my objectives? Will I improve?"
Metric (M) – business, employees, products, processes – "Am I doing well or badly? Am I doing better or worse?"
Goals are defined top-down (Define) and reviewed through a feedback loop (Review, understand)

68-99 Goal-oriented Measurement
The primary question must be "What do I need to improve?" rather than "What measurements should I use?"
(same Goal/Question/Metric structure as on the previous slide)

69-99 The Measurement Goal
● Here are some possible and commonly used words for each item of the Goal structure
● Object of study: process, product, model, metric, etc.
● Purpose: characterize, evaluate, predict, motivate, etc., in order to understand, assess, manage, engineer, improve, etc. it
● Point of view: manager, developer, tester, customer, etc.
● Perspective or Focus: cost, effectiveness, correctness, defects, changes, product measures, etc.
● Environment or Context: specify the environmental factors, including process factors, people factors, problem factors, methods, tools, constraints, etc.
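Going back to the Maintainability Index formula on slide 63, a minimal Java sketch fed with hand-picked illustrative numbers (the Halstead counts, CC and LOC below are placeholders, not measured from real code). It also shows how strongly the logarithmic size term dominates: doubling LOC alone costs about 16.2·ln(2) ≈ 11 points, regardless of what the code does – one input for the "what is good and bad about this metric" discussion.

/** Maintainability Index as on slide 63: MI = 171 - 5.2*ln(V) - 0.23*CC - 16.2*ln(LOC),
 *  with Halstead volume V = N * log2(n), N = N1 + N2 and n = n1 + n2. */
public class MaintainabilityIndex {

    static double halsteadVolume(int totalOperators, int totalOperands,
                                 int uniqueOperators, int uniqueOperands) {
        int bigN = totalOperators + totalOperands;      // program length N
        int smallN = uniqueOperators + uniqueOperands;  // vocabulary n
        return bigN * (Math.log(smallN) / Math.log(2));
    }

    static double mi(double volume, double cyclomaticComplexity, double loc) {
        return 171 - 5.2 * Math.log(volume) - 0.23 * cyclomaticComplexity - 16.2 * Math.log(loc);
    }

    public static void main(String[] args) {
        double v = halsteadVolume(300, 250, 20, 35);    // placeholder Halstead counts
        System.out.printf("V  = %.1f%n", v);
        System.out.printf("MI = %.1f (original scale: 171 down to -inf)%n", mi(v, 15, 600));
        // Doubling LOC alone lowers MI by ~11 points.
        System.out.printf("MI with doubled LOC = %.1f%n", mi(v, 15, 1200));
    }
}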
70-99 SQALE (Software Quality Assessment Based on Lifecycle Expectations)

71-99 SQALE
● SQALE (Software Quality Assessment Based on Lifecycle Expectations) is a quality method to evaluate technical debt in software projects based on the measurement of software characteristics
● It also lets us discuss how quality characteristics can be mapped into numerical representations

72-99 SQALE
● The SQALE quality model is based around three levels: Characteristic (Level 1) 1–1,n Sub-Characteristic (Level 2) 1–1,n Source Code Requirement (Level 3)
● The first level includes 8 software characteristics: Testability, Reliability, Changeability, Efficiency, Security, Maintainability, Portability, Reusability

73-99 SQALE
● The second level is formed by sub-characteristics, e.g.:
Testability → Unit Testing Testability, Integration Testing Testability
Reliability → Data-related, Logic-related, Statement-related, Synchronization-related, Resource-related and Architecture-related reliability, Fault tolerance
and further sub-characteristics (Understandability, Readability, ...) for the remaining characteristics

74-99 SQALE
● The third level links language-specific constructs (source code requirements) to the sub-characteristics, e.g.:
Number of parameters in a module call (NOP) < 6
Coupling between objects (CBO) < 7
Switch statements have a 'default' condition
No assignment '=' within an 'if' statement
No assignment '=' within a 'while' statement
Invariant iteration index
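A minimal Java sketch of the three-level idea above: two level-3 source code requirements (the NOP < 6 and CBO < 7 thresholds quoted on the slide) are checked against an artefact's measured values, and non-compliances are reported together with their characteristic. The characteristic/sub-characteristic each requirement is attached to, the measured values and the remediation hours are invented placeholders; how SQALE actually turns non-compliances into costs is covered on the next slides.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch of SQALE levels: characteristic <- sub-characteristic <- source code requirement. */
public class SqaleRequirements {

    record Requirement(String characteristic, String subCharacteristic,
                       String description, int threshold, double remediationHours) {}

    public static void main(String[] args) {
        // Which characteristic a requirement maps to is a placeholder here, not the SQALE mapping.
        List<Requirement> requirements = List.of(
                new Requirement("Changeability", "Architecture-related changeability",
                        "Number of parameters in a module call (NOP)", 6, 0.5),
                new Requirement("Changeability", "Architecture-related changeability",
                        "Coupling between objects (CBO)", 7, 1.0));

        // Measured values for one artefact (placeholders).
        Map<String, Integer> measured = Map.of(
                "Number of parameters in a module call (NOP)", 9,
                "Coupling between objects (CBO)", 5);

        List<String> violations = new ArrayList<>();
        double remediation = 0;
        for (Requirement r : requirements) {
            if (measured.get(r.description()) >= r.threshold()) {   // requirement is "< threshold"
                violations.add(r.characteristic() + " / " + r.subCharacteristic()
                        + ": " + r.description());
                remediation += r.remediationHours();
            }
        }
        violations.forEach(System.out::println);
        System.out.printf("Estimated remediation: %.1f hours%n", remediation);
    }
}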
75-99 SQALE – Remediation Function
● For each source code requirement we need to associate a remediation function that translates non-compliances into remediation costs
● In the most complex case you can associate a different function with each requirement; in the simplest case you can have a predefined value for the categories into which code requirements fall

76-99 SQALE – Non-Remediation Function
● Non-remediation functions represent the cost of keeping a non-conformity, i.e., a negative impact from the business point of view

77-99 SQALE – Indices
● The sum of all the remediation costs associated with a particular hierarchy of characteristics constitutes an index:
– SQALE Testability Index: STI
– SQALE Reliability Index: SRI
– SQALE Changeability Index: SCI
– SQALE Efficiency Index: SEI
– SQALE Security Index: SSI
– SQALE Maintainability Index: SMI
– SQALE Portability Index: SPI
– SQALE Reusability Index: SRuI
– SQALE Quality Index: SQI (overall index)
* Note that there is a version of each index that represents density, normalized by some measure of size

78-99 SQALE – Rating
● Indices can be used to build a rating value:
Rating = estimated remediation cost / estimated development cost
Example: an artefact with an estimated development cost of 300 hours and an STI of 8.30 hours, using the reference table on the left:
Rating = 8.30h / 300h = 2.7% → C

79-99 SQALE – Rating
● The final representation can take the form of a Kiviat diagram in which the different density indices are represented

80-99 SQALE – Rating
● This is the view you find in SonarQube
http://www.sonarqube.org/sonar-sqale-1-2-in-screenshot

81-99 SQALE
● Given our initial discussion of measurement pitfalls, scales and the representation condition, the following sentence should now be clear:
"Because the non-remediation costs are not established on an ordinal scale but on a ratio scale, we have shown [..] that we can aggregate the measures by addition and comply with the measurement theory and the representation clause."
Letouzey, Jean-Louis, and Michel Ilkiewicz. "Managing technical debt with the SQALE method." IEEE Software 6 (2012): 44-51.

82-99 Case Studies

83-99 Case Study
● Suppose that we have some projects on which we computed the following set of metrics
→ What can you say about the projects?
Metric – Project01, Project02, Project03, Project04, Project05, Project06
# LOCs – 4920, 5817, 4013, 4515, 3263, 5735
# packages – 29, 49, 33, 35, 25, 33
# classes – 126, 199, 159, 181, 75, 198
# methods – 658, 862, 644, 817, 415, 715
# attributes – 153, 196, 227, 285, 78, 177
# parameters – 301, 459, 393, 440, 182, 415
# local vars – 493, 533, 325, 397, 339, 416
# calls – 2051, 2830, 1844, 2297, 917, 2015
Proj_status – complete, complete, incomplete, complete, incomplete, complete

84-99 Case Study
● What if we consider relative instead of absolute values?
● This would allow to compare the values across projects Project01 Project02 Project03 Project04 Project05 Project06 LOCs/NOM 7.48 6.75 6.23 5.53 7.86 8.02 NOC/NOP 4.34 4.06 4.82 5.17 3.00 6.00 NOM/NOC 5.22 4.33 4.05 4.51 5.53 3.61 att/NOC 1.21 0.98 1.43 1.57 1.04 0.89 param/NOM 0.46 0.53 0.61 0.54 0.44 0.58 locvars/NOM 0.75 0.62 0.50 0.49 0.82 0.58 Calls/NOM 3.12 3.28 2.86 2.81 2.21 2.82 highest value Proj_status complete complete incomplete complete incomplete complete lowest value Case Study 85-99 Case Study ● What if we make sense out of the metrics by using the GQM approach? G1. Analyze the software product (object of study) for the purpose of evaluation (purpose) with respect to the effectiveness of code structure (quality focus) from the point of view of the development team (point of view) in the environment of our project named xyx (environment). Q1.1. what is the structure of the system? M1.2.1 Calls/NOM M1.2.2 param/NOM M1.1.3 NOM/NOC Q1.2. what is the coupling within the system? M1.1.1 NOC/NOP M1.1.2 LOCs/NOM 86-99 Case Study ● What if we make sense out of the metrics by using the GQM approach? G1. Analyze the software product (object of study) for the purpose of evaluation (purpose) with respect to the effectiveness of code structure (quality focus) from the point of view of the development team (point of view) in the environment of our project named xyx (environment). Q1.1. what is the structure of the system? M1.2.1 Calls/NOM M1.2.2 param/NOM M1.1.3 NOM/NOC Q1.2. what is the coupling within the system? M1.1.1 NOC/NOP M1.1.2 LOCs/NOM P1: 3.12 P5: 2.21 P1: 0.46 P5: 0.44 87-99 Case Study ● What happens if we consider LOCs instead of NOMs? G1. Analyze the software product (object of study) for the purpose of evaluation (purpose) with respect to the effectiveness of code structure (quality focus) from the point of view of the development team (point of view) in the environment of our project named xyx (environment). Q1.1. what is the structure of the system? M1.2.1 Calls/LOCs M1.2.2 param/LOCs M1.1.3 NOM/NOC Q1.2. what is the coupling within the system? 
M1.1.1 NOC/NOP M1.1.2 LOCs/NOM
P1: 0.41 P5: 0.28
P1: 0.14 P5: 0.05

88-99 Case Study – The Overview Pyramid
● Another useful way to think in terms of relative values and thresholds is to use the Overview Pyramid
● The Overview Pyramid allows us to represent three different aspects of internal quality: inheritance, size & complexity, and coupling
● It provides both absolute and relative values that are compared against typical thresholds
NOP: Number of Packages – NOC: Number of Classes – NOM: Number of Methods – LOC: Lines of Code – CYCLO: Cyclomatic Complexity – ANDC: Average Number of Derived Classes – AHH: Average Hierarchy Height – CALL: Number of Distinct Method Invocations – FANOUT: Number of Called Classes

89-99 Case Study – The Overview Pyramid
Overview pyramids of Project 1, Project 2, Project 3 (legend: close to high / close to average / close to low)

90-99 Case Study – The Overview Pyramid
Overview pyramids of Project 4, Project 5, Project 6 (legend: close to high / close to average / close to low)

91-99 Case Study – The Overview Pyramid
Back to our initial project: Eclipse JDT 3.5.0 and its overview pyramid (legend: close to high / close to average / close to low)

92-99 Conclusions
● Measurement is important to track the progress of software projects and to focus on the relevant parts that need attention
● As such, we always need to take measurements with a "grain of salt"
● Still, collecting non-relevant or non-valid metrics might be even worse than not collecting any valid measure at all

93-99 Extra Slides

94-99 List of some Acronyms
● LOCs: Lines of Code
● CC: McCabe Cyclomatic Complexity
● Fan-in: number of local flows that terminate in a module
● Fan-out: number of local flows that emanate from a module
● Information flow complexity of a module: length of the module times the square of the product of fan-in and fan-out
● NOM: Number of Methods per class
● WMC: Weighted Methods per Class
● DIT: Depth of Inheritance Tree
● NOC: Number of Children
● CBO: Coupling Between Objects
● RFC: Response For a Class
● LCOM: Lack of Cohesion of Methods
● ANDC: Average Number of Derived Classes
● AHH: Average Hierarchy Height

95-99 Software Engineering Laws (1/4)
Example: laws in Software Engineering – how were these derived?

96-99 Software Engineering Laws (2/4)
Information hiding in object-oriented programming
"A human being can concentrate on 7±2 items at a time"
"Productivity is improved by reducing accidents and controlling essence"
"Testing can show the presence but not the absence of errors"
Pr(A|B) = Pr(B|A)·Pr(A) / Pr(B)

97-99 Software Engineering Laws (3/4)
"Requirement deficiencies are the prime source of project failure"
"The value of a model depends on the view taken, but none is best for all purposes"
"The user will never know what they want until after the system is in production"
"Good designs require deep application domain knowledge"
"What applies to small systems does not apply to large ones"
"Everything put together falls apart sooner or later" → the 8 laws of software evolution

98-99
The number of transistors on an integrated circuit will double in about 18 months.
The number of radio communications doubles every 30 months “the number of lines of code a programmer can write in a fixed period of time is the same regardless of the programming language” “If builders built buildings the way programmers wrote programs, the first woodpecker that came along would destroy civilization” Perspective based inspections (along one dimension, for a specific stakeholder) are highly eeffective and efficient Software reuse reduces cycle time and increases productivity and quality Software Engineering Laws (4/4) 99-99 ● N. Fenton and J. Bieman, Software Metrics: A Rigorous and Practical Approach, Third Edition, 3 edition. Boca Raton: CRC Press, 2014. ● C. Ebert and R. Dumke, Software Measurement: Establish - Extract Evaluate - Execute, Softcover reprint of hardcover 1st ed. 2007 edition. Springer, 2010. ● Lanza, Michele, and Radu Marinescu. Object-oriented metrics in practice: using software metrics to characterize, evaluate, and improve the design of object-oriented systems. Springer Science & Business Media, 2007. ● Some code samples from Martin, Robert C. Clean code: a handbook of agile software craftsmanship. Pearson Education, 2008. ● Moose platform for software data analysis http://moosetechnology.org ● The SQALE Method http://www.sqale.org/wp-content/uploads/2010/08/SQALE-Method-EN-V1-0.pdf References