E2011: Theoretical fundamentals of computer science
Introduction to numerical approximation
Vlad Popovici, Ph.D.
Fac. of Science - RECETOX

Motivation: approximations and errors (video)

Scientific computing
Wikipedia: "Computational science (also scientific computing or scientific computation) is concerned with constructing mathematical models and quantitative analysis techniques and using computers to analyze and solve scientific problems."
Basically: find numerical solutions to mathematically formulated problems.

Well-posed problems (J. Hadamard)
A problem is well posed if its solution
- exists,
- is unique,
- changes continuously with the initial conditions;
otherwise, it is ill posed. Inverse problems are often ill posed. Example: recovering a 3D scene from its 2D projection.

General computational approach
- well-posed but ill-conditioned problems: small errors in the input lead to large variations in the solution → improve the conditioning by regularization
- continuous domain → discrete domain
- infinite → finite
- differential → algebraic
- nonlinear → (a combination of) linear
- accept approximate solutions, but control the error
Approximations
- Modeling approximations:
  ▶ "model" = an approximation of nature
  ▶ data - inexact measurements or previous results
- Implementation/computational approximations:
  ▶ discretization of the continuous domain; truncation
  ▶ rounding
- errors in the input data and errors propagated by the algorithm determine the accuracy of the final result

Example: area of the Earth
- model: a sphere, A = 4πr^2
- r = ? (measured, hence inexact)
- π = 3.14159... (truncated)
- rounded arithmetic

Errors
- absolute error = (approximate value) − (true value)
- relative error = (absolute error) / (true value)
  → approximate value = (1 + relative error) × (true value)
- if the relative error is ~ 10^(−d), the approximation x̂ has about d exact digits: there exists τ = ±(0.0...0 n_{d+1} n_{d+2}...) such that x̂ = x + τ
- the true value is usually not known → use estimates or bounds on the error
- the relative error can also be taken relative to the approximate value

Example
Stirling's approximation for factorials:
S_n = √(2πn) (n/e)^n ≈ n!,  n = 1, 2, ...
where e = exp(1). The relative error is (S_n − n!)/n!.
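As a quick check, the relative error of Stirling's approximation can be computed numerically. This is a sketch (the function name `stirling` is mine, not from the slides):

```python
import math

def stirling(n):
    # Stirling's approximation: sqrt(2*pi*n) * (n/e)**n
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (1, 5, 10):
    rel_err = (stirling(n) - math.factorial(n)) / math.factorial(n)
    print(n, rel_err)
```

The relative error shrinks roughly like −1/(12n), so the approximation gains accuracy slowly as n grows.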
Errors: data and computational
Compute f(x) for f : R → R, where
- x ∈ R is the true input
- f(x) is the true/desired result
- x̂ is the approximate input
- f̂ is the approximate computation
total error: f̂(x̂) − f(x) = (f̂(x̂) − f(x̂)) + (f(x̂) − f(x)) = computational error + propagated data error
The choice of algorithm has no effect on the propagated data error.

The computational error is the sum of:
- truncation error = (true result) − (result of the algorithm using exact arithmetic), i.e. the error due to approximating mathematical procedures. Example: keeping only the first terms of an infinite Taylor series; stopping an iteration before convergence.
- rounding error = (result of the algorithm using exact arithmetic) − (result of the algorithm using limited-precision arithmetic), i.e. the error due to the approximate representation of numbers. Example: π ≈ 3.14 or π ≈ 3.141593.

Finite difference approximation
f′(x) = lim_{h→0} (f(x + h) − f(x))/h ≈ (f(x + h) − f(x))/h,  for some small h > 0
- truncation error: |f′(x) − (f(x + h) − f(x))/h| ≤ Mh/2, where |f″(t)| ≤ M for t in a small neighborhood of x
- rounding error: 2ϵ/h, where ϵ is the machine precision
- the total error is minimized for h ≈ 2√(ϵ/M)

Figure: Total computational error as a tradeoff between truncation and rounding error (from Heath - Scientific Computing)
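The truncation/rounding tradeoff can be observed directly. A small sketch (the name `fd_error` and the choice f = sin at x = 1, so that M ≈ 1, are my own illustration):

```python
import math

def fd_error(f, df, x, h):
    # absolute error of the forward-difference approximation of f'(x)
    return abs((f(x + h) - f(x)) / h - df(x))

eps = 2**-53                   # double-precision machine precision
h_opt = 2 * math.sqrt(eps)     # ~ 2*sqrt(eps/M), taking M ~ 1 for f = sin
for h in (1e-2, h_opt, 1e-12):
    print(h, fd_error(math.sin, math.cos, 1.0, h))
```

The error grows on both sides of h_opt: for large h the truncation term Mh/2 dominates, for tiny h the rounding term 2ϵ/h does.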
Error analysis
For y = f(x), f : R → R, an approximate result ŷ is obtained.
- forward error: ∆y = ŷ − y
- backward error: ∆x = x̂ − x, where f(x̂) = ŷ
Diagram: x ↦ f(x) = y and x̂ ↦ f(x̂) = ŷ = f̂(x); the backward error relates x to x̂, the forward error relates y to ŷ.

Example
Compute f(x) = e^x for x = 1, using the first four terms of the Taylor expansion:
f̂(x) = 1 + x + x^2/2 + x^3/6
Take as "true" value f(1) = 2.718282 and compute f̂(1) = 2.666667; then
- forward error: |∆y| = 0.051615, i.e. a relative forward error of about 2%
- backward error: x̂ = ln f̂(1) = 0.980829 ⇒ |∆x| = 0.019171, i.e. a relative backward error of about 2%
These are two perspectives on assessing the same accuracy.

Example
Consider the series with limit e:
∑_{n=0}^∞ 1/n! = e
How many terms are needed to approximate e to three decimal places?

For 3 correct decimals, the tail (the rest of the sum, not computed) must be bounded above by 0.0005 (why?):
∑_{n=0}^∞ 1/n! = ∑_{n=0}^k 1/n! + ∑_{n=k+1}^∞ 1/n!
= ∑_{n=0}^k 1/n! + (1/(k+1)!)[1 + 1/(k+2) + 1/((k+2)(k+3)) + ...]
< ∑_{n=0}^k 1/n! + (1/(k+1)!)[1 + 1/(k+1) + 1/(k+1)^2 + ...]
= ∑_{n=0}^k 1/n! + (1/(k+1)!) · 1/(1 − 1/(k+1))
= ∑_{n=0}^k 1/n! + 1/(k·k!)
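The stopping rule implied by the bound 1/(k·k!) < 0.0005 can be checked with a short loop (a sketch; the variable names are mine):

```python
import math

k, s = 1, 2.0   # s = partial sum of 1/n! up to n = k
while 1.0 / (k * math.factorial(k)) >= 0.0005:
    k += 1
    s += 1.0 / math.factorial(k)
print(k, s)
```

The loop stops at k = 6 with s ≈ 2.71806, in agreement with the table that follows.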
k    approximation   tail bound (1/(k·k!))
1    2.00000000      1.00000000
2    2.50000000      0.25000000
3    2.66666667      0.05555556
4    2.70833333      0.01041667
5    2.71666667      0.00166667
6    2.71805556      0.00023148  <-----
7    2.71825397      0.00002834
8    2.71827877      0.00000310
9    2.71828153      0.00000031
10   2.71828180      0.00000003

Backward error analysis
- idea: the approximate result is the exact solution of a modified problem
- how far from the original problem is the modified version?
- how much error in the input data would explain all of the error in the result?
- an approximate solution is good if it is the exact solution of a nearby problem
- backward analysis is usually easier

Sensitivity and conditioning
- insensitive (well-conditioned) problem: relative changes in the input data cause similar relative changes in the result
- large changes in the solution for small changes in the input data indicate a sensitive (ill-conditioned) problem
- condition number:
cond = |relative change in solution| / |relative change in input| = |∆y/y| / |∆x/x|
- if cond ≫ 1, the problem is sensitive

- the condition number is a scale factor for the error:
relative forward error = cond × relative backward error
- usually, only upper bounds on the condition number can be estimated, cond ≤ C, hence
relative forward error ≤ C × relative backward error
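The definition cond = |∆y/y| / |∆x/x| suggests a direct numerical estimate. A sketch (the function `cond_estimate` and the choice f = exp, whose condition number is |x f′(x)/f(x)| = |x|, are my own illustration):

```python
import math

def cond_estimate(f, x, dx=1e-8):
    # ratio of the relative change in the output to the relative change in the input
    dy = f(x + dx) - f(x)
    return abs(dy / f(x)) / abs(dx / x)

print(cond_estimate(math.exp, 10.0))
```

For f(x) = e^x at x = 10 the estimate is ≈ 10, so a relative input error is amplified about tenfold in the result.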
Estimating the condition number
For x̂ = x + ∆x:
- forward error: f(x + ∆x) − f(x) ≈ f′(x)∆x, for small enough ∆x
- relative forward error: ≈ f′(x)∆x / f(x)
⇒ cond ≈ |x f′(x) / f(x)|

Example: the tangent function is sensitive in the neighborhood of π/2:
tan(1.57079) ≈ 1.58058 × 10^5;  tan(1.57078) ≈ 6.12490 × 10^4
for x = 1.57079, cond ≈ 2.48275 × 10^5

Stability
- an algorithm is stable if it is relatively insensitive to perturbations occurring during the computation
- the stability of algorithms is analogous to the conditioning of problems
- backward view: an algorithm is stable if the result it produces is the exact solution of a nearby problem
- for a stable algorithm, the effect of the computational error is no worse than the effect of a small error in the input data

Accuracy
- accuracy: the closeness of the result to the true solution of the problem
- it depends on the conditioning of the problem AND on the stability of the algorithm
- stable algorithm + well-conditioned problem = accurate results

Floating-point numbers
Like in scientific notation: mantissa × radix^exponent, e.g.
2.35 × 10^3. Formally,
x = ±(b_0 + b_1/β + b_2/β^2 + ... + b_{p−1}/β^{p−1}) × β^E
where
- β is the radix (or base)
- p is the precision
- L ≤ E ≤ U are the limits of the exponent
- 0 ≤ b_k ≤ β − 1
- mantissa: m = b_0 b_1 ... b_{p−1}; fraction: b_1 b_2 ... b_{p−1}
The sign, mantissa and exponent are stored in fixed-size fields; the radix is implicit for a given system (usually β = 2).

Normalization: b_0 ≠ 0 for all x ≠ 0, i.e. the mantissa satisfies 1 ≤ m < β; this ensures a unique representation and optimal use of the available bits.
Internal representation (on 64 bits - "double precision", binary):
x = sign | exponent | fraction = b_63 | b_62...b_52 | b_51...b_0

Properties:
- only a finite number of discrete values can be represented
- the total number of floating-point numbers representable in normalized format is 2(β − 1)β^{p−1}(U − L + 1) + 1
- underflow level (smallest positive normalized number): UFL = β^L
- overflow level (largest number): OFL = β^{U+1}(1 − β^{−p})
- not all real numbers can be represented exactly:
  ▶ those that can are called machine numbers
  ▶ the others must be rounded → rounding error

From a continuous domain to a discrete domain

Example: for β = 2, p = 3, L = −1, U = 1 there are 25 distinct representable numbers, with UFL = 0.5 and OFL = 3.5; note the non-uniform coverage of the real line.
For x ∈ R, fl(x) denotes the floating-point representation of x; x − fl(x) is the rounding error.
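The toy system above can be enumerated exhaustively; a sketch (the loop structure is mine):

```python
# enumerate the normalized toy system beta = 2, p = 3, L = -1, U = 1, plus zero
beta, p, L, U = 2, 3, -1, 1
nums = {0.0}
for E in range(L, U + 1):
    for b0 in range(1, beta):          # normalization: b0 != 0
        for b1 in range(beta):
            for b2 in range(beta):
                m = b0 + b1 / beta + b2 / beta**2
                nums.add(m * beta**E)
                nums.add(-m * beta**E)
print(len(nums), min(x for x in nums if x > 0), max(nums))
```

The counting formula agrees: 2(β − 1)β^{p−1}(U − L + 1) + 1 = 2 · 1 · 4 · 3 + 1 = 25, with UFL = 0.5 and OFL = 3.5.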
Rounding rules
- chop = round toward zero: truncate the base-β representation after the (p−1)st fraction digit
- round to nearest: fl(x) is the machine number closest to x

Machine precision
- machine precision, ϵ_mach:
  ▶ with chopping: ϵ_mach = β^{1−p}
  ▶ with rounding to nearest: ϵ_mach = (1/2)β^{1−p}
- also called unit roundoff: the smallest number ϵ such that fl(1 + ϵ) > 1
- maximum relative error of representation:
|fl(x) − x| / |x| ≤ ϵ_mach
- usually, 0 < UFL < ϵ_mach < OFL

Machine precision - example
For β = 2, p = 3, L = −1, U = 1:
- ϵ_mach = (0.01)_2 = (0.25)_10 with chopping
- ϵ_mach = (0.001)_2 = (0.125)_10 with rounding to nearest
The usual case (IEEE floating-point systems):
- ϵ_mach = 2^−24 ≈ 10^−7 in single precision
- ϵ_mach = 2^−53 ≈ 10^−16 in double precision
→ about 7 and 16 decimal digits of precision, respectively (hence, in R: p-value < 2.2e−16)

Gradual underflow
- to improve the representation of numbers around 0, subnormal (or denormalized) numbers are used
- when the exponent is at its minimum, the leading digits are allowed to be 0
- subnormals are less precise → gradual underflow

Special values
In the IEEE standard:
- Inf (infinity): e.g. the result of 1/0
- NaN (not a number): the result of 0/0 or Inf/Inf
Both use a special encoding of the exponent field.
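The double-precision figures above can be probed directly. A sketch; note that with round-to-nearest-even this loop finds the gap 2^−52 between 1 and the next larger double, i.e. 2ϵ_mach (the tie 1 + 2^−53 rounds back to 1):

```python
import sys

eps = 1.0
while 1.0 + eps / 2 > 1.0:   # halve until 1 + eps/2 rounds back to 1
    eps /= 2
print(eps, eps == sys.float_info.epsilon)
```

This prints 2.220446049250313e-16, which is exactly `sys.float_info.epsilon` in CPython.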
Floating-point arithmetic
- addition/subtraction: denormalization might be required:
3.52 × 10^3 + 1.97 × 10^5 = 0.0352 × 10^5 + 1.97 × 10^5 = 2.0052 × 10^5
→ this might cause the loss of some digits
- multiplication/division: the result may not be representable
- overflow is more serious than underflow: there is no good way to approximate an arbitrarily large number, while an underflowed result can often be approximated by 0
- in FP arithmetic, addition and multiplication are commutative but not associative: if ϵ is slightly smaller than ϵ_mach, then (1 + ϵ) + ϵ = 1, but 1 + (ϵ + ϵ) > 1
- ideally, x flop y = fl(x op y); the IEEE standard guarantees this for results within range

Example: the divergent series
∑_{n=1}^∞ 1/n
In FP arithmetic, the computed sum of this series is finite; depending on the system, this is because
▶ after a while, the partial sum overflows, or
▶ 1/n underflows, or
▶ for all n such that 1/n < ϵ_mach · ∑_{k=1}^{n−1} 1/k, the sum no longer changes

Cancellation
- subtracting two numbers of similar magnitude cancels the most significant digits:
1.92403 × 10^2 − 1.92275 × 10^2 = 1.28000 × 10^−1
→ only 3 significant digits remain
- let ϵ > 0 be slightly smaller than ϵ_mach; then (1 + ϵ) − (1 − ϵ) yields 0 in FP arithmetic, instead of the true value 2ϵ

Cancellation - example
For the quadratic equation ax^2 + bx + c = 0, the two solutions are given by
x_{1,2} = (−b ± √(b^2 − 4ac)) / (2a)
Problems:
- for very large/small coefficients, the terms b^2 or 4ac may over-/underflow → rescale the coefficients by max{|a|, |b|, |c|}
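The (1 + ϵ) − (1 − ϵ) example can be reproduced in double precision. A sketch; I take ϵ = 2^−54, just below ϵ_mach = 2^−53, so that both sums round to exactly 1:

```python
eps = 2**-54        # slightly smaller than the double-precision eps_mach = 2**-53
a = 1.0 + eps       # rounds to 1.0
b = 1.0 - eps       # a tie, rounded to even: also 1.0
diff = a - b        # true value is 2*eps, but FP arithmetic yields 0.0
print(diff)
```

All of the informative digits were lost before the subtraction even happened, so the subtraction itself, though exact, returns 0.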
- the cancellation between −b and √(b^2 − 4ac) can be avoided by computing one of the roots as
x = 2c / (−b ∓ √(b^2 − 4ac))

Questions?

Supplementals
- here you can convert between real numbers and their representation and see the machine error
- have a look at float toy to see how real numbers are represented; see also the Wikipedia links on the same page
- for a principled discussion of the subject, read "What Every Computer Scientist Should Know About Floating-Point Arithmetic" by D. Goldberg
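A sketch of a cancellation-free quadratic solver combining the two formulas (the function name `quadratic_roots` and the test coefficients are my own; the over-/underflow rescaling mentioned above is omitted):

```python
import math

def quadratic_roots(a, b, c):
    # compute the larger-magnitude root with the sign choice that avoids
    # cancellation, then obtain the other root from x1 * x2 = c / a
    d = math.sqrt(b * b - 4 * a * c)
    x1 = (-b - d) / (2 * a) if b >= 0 else (-b + d) / (2 * a)
    x2 = c / (a * x1)     # equivalent to 2c / (-b -/+ d)
    return x1, x2

print(quadratic_roots(1.0, -1e8, 1.0))   # roots near 1e8 and 1e-8
```

For x^2 − 10^8 x + 1 = 0, the naive formula would compute the small root as (10^8 − √(10^16 − 4))/2 and lose most of its digits to cancellation, while the product form keeps it accurate.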