E2011: Theoretical fundamentals of computer science
Introduction to numerical approximation
Vlad Popovici, Ph.D.
Fac. of Science - RECETOX

Motivation: approximations and errors (video)

Scientific computing
Wikipedia: "Computational science (also scientific computing or scientific computation) is concerned with constructing mathematical models and quantitative analysis techniques and using computers to analyze and solve scientific problems."
Basically: find numerical solutions to mathematically formulated problems.

Well-posed problems (J. Hadamard)
A problem is well posed if its solution
- exists,
- is unique,
- changes continuously with the initial conditions;
otherwise, it is ill posed. Inverse problems are often ill posed. Example: recovering a 3D scene from its 2D projection.

General computational approach
- well-posed but ill-conditioned problems: small errors in the input lead to large variations in the solution → improve the conditioning by regularization
- continuous domain → discrete domain
- infinite → finite
- differential → algebraic
- nonlinear → (a combination of) linear
- accept approximate solutions, but control the error
Approximations
- Modeling approximations:
  ▶ "model" = an approximation of nature
  ▶ data - inexact measurements or previous results
- Implementation/computational approximations:
  ▶ discretization of the continuous domain; truncation
  ▶ rounding
- errors in the input data and errors propagated by the algorithm determine the accuracy of the final result

Example: area of the Earth
- model: a sphere, A = 4πr^2
- r = ? (measured, hence inexact)
- π = 3.14159... (truncated)
- rounded arithmetic

Errors
- absolute error = (approximate value) − (true value)
- relative error = (absolute error) / (true value)
  → approximate value = (1 + relative error) × (true value)
- if the relative error is ~ 10^(−d), the approximation x̂ has about d exact digits: there exists τ = ±(0.0...0 n_{d+1} n_{d+2}...) such that x̂ = x + τ
- the true value is usually not known → use estimates or bounds on the error
- the relative error can also be taken relative to the approximate value

Example
Stirling's approximation for factorials:
S_n = √(2πn) (n/e)^n ≈ n!,  n = 1, 2, ...
where e = exp(1). The relative error is (S_n − n!)/n!.
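As a quick check, the relative error of Stirling's approximation can be computed numerically. This is a sketch (the function name `stirling` is mine, not from the slides):

```python
import math

def stirling(n):
    # Stirling's approximation: sqrt(2*pi*n) * (n/e)**n
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (1, 5, 10):
    rel_err = (stirling(n) - math.factorial(n)) / math.factorial(n)
    print(n, rel_err)
```

The relative error shrinks roughly like −1/(12n), so the approximation gains accuracy slowly as n grows.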
Errors: data and computational
Compute f(x) for f : R → R, where
- x ∈ R is the true input
- f(x) is the true/desired result
- x̂ is the approximate input
- f̂ is the approximate computation
total error: f̂(x̂) − f(x) = (f̂(x̂) − f(x̂)) + (f(x̂) − f(x)) = computational error + propagated data error
The choice of algorithm has no effect on the propagated data error.

The computational error is the sum of:
- truncation error = (true result) − (result of the algorithm using exact arithmetic), i.e. the error due to approximating mathematical procedures. Example: keeping only the first terms of an infinite Taylor series; stopping an iteration before convergence.
- rounding error = (result of the algorithm using exact arithmetic) − (result of the algorithm using limited-precision arithmetic), i.e. the error due to the approximate representation of numbers. Example: π ≈ 3.14 or π ≈ 3.141593.

Finite difference approximation
f′(x) = lim_{h→0} (f(x + h) − f(x))/h ≈ (f(x + h) − f(x))/h,  for some small h > 0
- truncation error: |f′(x) − (f(x + h) − f(x))/h| ≤ Mh/2, where |f″(t)| ≤ M for t in a small neighborhood of x
- rounding error: 2ϵ/h, where ϵ is the machine precision
- the total error is minimized for h ≈ 2√(ϵ/M)

Figure: Total computational error as a tradeoff between truncation and rounding error (from Heath - Scientific Computing)
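The truncation/rounding tradeoff can be observed directly. A small sketch (the name `fd_error` and the choice f = sin at x = 1, so that M ≈ 1, are my own illustration):

```python
import math

def fd_error(f, df, x, h):
    # absolute error of the forward-difference approximation of f'(x)
    return abs((f(x + h) - f(x)) / h - df(x))

eps = 2**-53                   # double-precision machine precision
h_opt = 2 * math.sqrt(eps)     # ~ 2*sqrt(eps/M), taking M ~ 1 for f = sin
for h in (1e-2, h_opt, 1e-12):
    print(h, fd_error(math.sin, math.cos, 1.0, h))
```

The error grows on both sides of h_opt: for large h the truncation term Mh/2 dominates, for tiny h the rounding term 2ϵ/h does.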
Error analysis
For y = f(x), f : R → R, an approximate result ŷ is obtained.
- forward error: ∆y = ŷ − y
- backward error: ∆x = x̂ − x, where f(x̂) = ŷ
Diagram: x ↦ f(x) = y and x̂ ↦ f(x̂) = ŷ = f̂(x); the backward error relates x to x̂, the forward error relates y to ŷ.

Example
Compute f(x) = e^x for x = 1, using the first four terms of the Taylor expansion:
f̂(x) = 1 + x + x^2/2 + x^3/6
Take as "true" value f(1) = 2.718282 and compute f̂(1) = 2.666667; then
- forward error: |∆y| = 0.051615, i.e. a relative forward error of about 2%
- backward error: x̂ = ln f̂(1) = 0.980829 ⇒ |∆x| = 0.019171, i.e. a relative backward error of about 2%
These are two perspectives on assessing the same accuracy.

Example
Consider the series with limit e:
∑_{n=0}^∞ 1/n! = e
How many terms are needed to approximate e to three decimal places?

For 3 correct decimals, the tail (the rest of the sum, not computed) must be bounded above by 0.0005 (why?):
∑_{n=0}^∞ 1/n! = ∑_{n=0}^k 1/n! + ∑_{n=k+1}^∞ 1/n!
= ∑_{n=0}^k 1/n! + (1/(k+1)!)[1 + 1/(k+2) + 1/((k+2)(k+3)) + ...]
< ∑_{n=0}^k 1/n! + (1/(k+1)!)[1 + 1/(k+1) + 1/(k+1)^2 + ...]
= ∑_{n=0}^k 1/n! + (1/(k+1)!) · 1/(1 − 1/(k+1))
= ∑_{n=0}^k 1/n! + 1/(k·k!)
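The stopping rule implied by the bound 1/(k·k!) < 0.0005 can be checked with a short loop (a sketch; the variable names are mine):

```python
import math

k, s = 1, 2.0   # s = partial sum of 1/n! up to n = k
while 1.0 / (k * math.factorial(k)) >= 0.0005:
    k += 1
    s += 1.0 / math.factorial(k)
print(k, s)
```

The loop stops at k = 6 with s ≈ 2.71806, in agreement with the table that follows.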
k    approximation   tail bound (1/(k·k!))
1    2.00000000      1.00000000
2    2.50000000      0.25000000
3    2.66666667      0.05555556
4    2.70833333      0.01041667
5    2.71666667      0.00166667
6    2.71805556      0.00023148  <-----
7    2.71825397      0.00002834
8    2.71827877      0.00000310
9    2.71828153      0.00000031
10   2.71828180      0.00000003

Backward error analysis
- idea: the approximate result is the exact solution of a modified problem
- how far from the original problem is the modified version?
- how much error in the input data would explain all of the error in the result?
- an approximate solution is good if it is the exact solution of a nearby problem
- backward analysis is usually easier

Sensitivity and conditioning
- insensitive (well-conditioned) problem: relative changes in the input data cause similar relative changes in the result
- large changes in the solution for small changes in the input data indicate a sensitive (ill-conditioned) problem
- condition number:
cond = |relative change in solution| / |relative change in input| = |∆y/y| / |∆x/x|
- if cond ≫ 1, the problem is sensitive

- the condition number is a scale factor for the error:
relative forward error = cond × relative backward error
- usually, only upper bounds on the condition number can be estimated, cond ≤ C, hence
relative forward error ≤ C × relative backward error
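The definition cond = |∆y/y| / |∆x/x| suggests a direct numerical estimate. A sketch (the function `cond_estimate` and the choice f = exp, whose condition number is |x f′(x)/f(x)| = |x|, are my own illustration):

```python
import math

def cond_estimate(f, x, dx=1e-8):
    # ratio of the relative change in the output to the relative change in the input
    dy = f(x + dx) - f(x)
    return abs(dy / f(x)) / abs(dx / x)

print(cond_estimate(math.exp, 10.0))
```

For f(x) = e^x at x = 10 the estimate is ≈ 10, so a relative input error is amplified about tenfold in the result.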
Estimating the condition number
For x̂ = x + ∆x:
- forward error: f(x + ∆x) − f(x) ≈ f′(x)∆x, for small enough ∆x
- relative forward error: ≈ f′(x)∆x / f(x)
⇒ cond ≈ |x f′(x) / f(x)|

Example: the tangent function is sensitive in the neighborhood of π/2:
tan(1.57079) ≈ 1.58058 × 10^5;  tan(1.57078) ≈ 6.12490 × 10^4
for x = 1.57079, cond ≈ 2.48275 × 10^5

Stability
- an algorithm is stable if it is relatively insensitive to perturbations occurring during the computation
- the stability of algorithms is analogous to the conditioning of problems
- backward view: an algorithm is stable if the result it produces is the exact solution of a nearby problem
- for a stable algorithm, the effect of the computational error is no worse than the effect of a small error in the input data

Accuracy
- accuracy: the closeness of the result to the true solution of the problem
- it depends on the conditioning of the problem AND on the stability of the algorithm
- stable algorithm + well-conditioned problem = accurate results

Floating-point numbers
Like in scientific notation: mantissa × radix^exponent, e.g.
2.35 × 10^3. Formally,
x = ±(b_0 + b_1/β + b_2/β^2 + ... + b_{p−1}/β^{p−1}) × β^E
where
- β is the radix (or base)
- p is the precision
- L ≤ E ≤ U are the limits of the exponent
- 0 ≤ b_k ≤ β − 1
- mantissa: m = b_0 b_1 ... b_{p−1}; fraction: b_1 b_2 ... b_{p−1}
The sign, mantissa and exponent are stored in fixed-size fields; the radix is implicit for a given system (usually β = 2).

Normalization: b_0 ≠ 0 for all x ≠ 0, i.e. the mantissa satisfies 1 ≤ m < β; this ensures a unique representation and optimal use of the available bits.
Internal representation (on 64 bits - "double precision", binary):
x = sign | exponent | fraction = b_63 | b_62...b_52 | b_51...b_0

Properties:
- only a finite number of discrete values can be represented
- the total number of floating-point numbers representable in normalized format is 2(β − 1)β^{p−1}(U − L + 1) + 1
- underflow level (smallest positive normalized number): UFL = β^L
- overflow level (largest number): OFL = β^{U+1}(1 − β^{−p})
- not all real numbers can be represented exactly:
  ▶ those that can are called machine numbers
  ▶ the others must be rounded → rounding error

From a continuous domain to a discrete domain

Example: for β = 2, p = 3, L = −1, U = 1 there are 25 distinct representable numbers, with UFL = 0.5 and OFL = 3.5; note the non-uniform coverage of the real line.
For x ∈ R, fl(x) denotes the floating-point representation of x; x − fl(x) is the rounding error.
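The toy system above can be enumerated exhaustively; a sketch (the loop structure is mine):

```python
# enumerate the normalized toy system beta = 2, p = 3, L = -1, U = 1, plus zero
beta, p, L, U = 2, 3, -1, 1
nums = {0.0}
for E in range(L, U + 1):
    for b0 in range(1, beta):          # normalization: b0 != 0
        for b1 in range(beta):
            for b2 in range(beta):
                m = b0 + b1 / beta + b2 / beta**2
                nums.add(m * beta**E)
                nums.add(-m * beta**E)
print(len(nums), min(x for x in nums if x > 0), max(nums))
```

The counting formula agrees: 2(β − 1)β^{p−1}(U − L + 1) + 1 = 2 · 1 · 4 · 3 + 1 = 25, with UFL = 0.5 and OFL = 3.5.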
Rounding rules
- chop = round toward zero: truncate the base-β representation after the (p−1)st fraction digit
- round to nearest: fl(x) is the machine number closest to x

Machine precision
- machine precision, ϵ_mach:
  ▶ with chopping: ϵ_mach = β^{1−p}
  ▶ with rounding to nearest: ϵ_mach = (1/2)β^{1−p}
- also called unit roundoff: the smallest number ϵ such that fl(1 + ϵ) > 1
- maximum relative error of representation:
|fl(x) − x| / |x| ≤ ϵ_mach
- usually, 0 < UFL < ϵ_mach < OFL

Machine precision - example
For β = 2, p = 3, L = −1, U = 1:
- ϵ_mach = (0.01)_2 = (0.25)_10 with chopping
- ϵ_mach = (0.001)_2 = (0.125)_10 with rounding to nearest
The usual case (IEEE floating-point systems):
- ϵ_mach = 2^−24 ≈ 10^−7 in single precision
- ϵ_mach = 2^−53 ≈ 10^−16 in double precision
→ about 7 and 16 decimal digits of precision, respectively (hence, in R: p-value < 2.2e−16)

Gradual underflow
- to improve the representation of numbers around 0, subnormal (or denormalized) numbers are used
- when the exponent is at its minimum, the leading digits are allowed to be 0
- subnormals are less precise → gradual underflow

Special values
In the IEEE standard:
- Inf (infinity): e.g. the result of 1/0
- NaN (not a number): the result of 0/0 or Inf/Inf
Both use a special encoding of the exponent field.
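The double-precision figures above can be probed directly. A sketch; note that with round-to-nearest-even this loop finds the gap 2^−52 between 1 and the next larger double, i.e. 2ϵ_mach (the tie 1 + 2^−53 rounds back to 1):

```python
import sys

eps = 1.0
while 1.0 + eps / 2 > 1.0:   # halve until 1 + eps/2 rounds back to 1
    eps /= 2
print(eps, eps == sys.float_info.epsilon)
```

This prints 2.220446049250313e-16, which is exactly `sys.float_info.epsilon` in CPython.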
Floating-point arithmetic
- addition/subtraction: denormalization might be required:
3.52 × 10^3 + 1.97 × 10^5 = 0.0352 × 10^5 + 1.97 × 10^5 = 2.0052 × 10^5
→ this might cause the loss of some digits
- multiplication/division: the result may not be representable
- overflow is more serious than underflow: there is no good way to approximate an arbitrarily large number, while an underflowed result can often be approximated by 0
- in FP arithmetic, addition and multiplication are commutative but not associative: if ϵ is slightly smaller than ϵ_mach, then (1 + ϵ) + ϵ = 1, but 1 + (ϵ + ϵ) > 1
- ideally, x flop y = fl(x op y); the IEEE standard guarantees this for results within range

Example: the divergent series
∑_{n=1}^∞ 1/n
In FP arithmetic, the computed sum of this series is finite; depending on the system, this is because
▶ after a while, the partial sum overflows, or
▶ 1/n underflows, or
▶ for all n such that 1/n < ϵ_mach · ∑_{k=1}^{n−1} 1/k, the sum no longer changes

Cancellation
- subtracting two numbers of similar magnitude cancels the most significant digits:
1.92403 × 10^2 − 1.92275 × 10^2 = 1.28000 × 10^−1
→ only 3 significant digits remain
- let ϵ > 0 be slightly smaller than ϵ_mach; then (1 + ϵ) − (1 − ϵ) yields 0 in FP arithmetic, instead of the true value 2ϵ

Cancellation - example
For the quadratic equation ax^2 + bx + c = 0, the two solutions are given by
x_{1,2} = (−b ± √(b^2 − 4ac)) / (2a)
Problems:
- for very large/small coefficients, the terms b^2 or 4ac may over-/underflow → rescale the coefficients by max{|a|, |b|, |c|}
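The (1 + ϵ) − (1 − ϵ) example can be reproduced in double precision. A sketch; I take ϵ = 2^−54, just below ϵ_mach = 2^−53, so that both sums round to exactly 1:

```python
eps = 2**-54        # slightly smaller than the double-precision eps_mach = 2**-53
a = 1.0 + eps       # rounds to 1.0
b = 1.0 - eps       # a tie, rounded to even: also 1.0
diff = a - b        # true value is 2*eps, but FP arithmetic yields 0.0
print(diff)
```

All of the informative digits were lost before the subtraction even happened, so the subtraction itself, though exact, returns 0.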
- the cancellation between −b and √(b^2 − 4ac) can be avoided by computing one of the roots as
x = 2c / (−b ∓ √(b^2 − 4ac))

Questions?

Supplementals
- here you can convert between real numbers and their representation and see the machine error
- have a look at float toy to see how real numbers are represented; see also the Wikipedia links on the same page
- for a principled discussion of the subject, read "What Every Computer Scientist Should Know About Floating-Point Arithmetic" by D. Goldberg
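A sketch of a cancellation-free quadratic solver combining the two formulas (the function name `quadratic_roots` and the test coefficients are my own; the over-/underflow rescaling mentioned above is omitted):

```python
import math

def quadratic_roots(a, b, c):
    # compute the larger-magnitude root with the sign choice that avoids
    # cancellation, then obtain the other root from x1 * x2 = c / a
    d = math.sqrt(b * b - 4 * a * c)
    x1 = (-b - d) / (2 * a) if b >= 0 else (-b + d) / (2 * a)
    x2 = c / (a * x1)     # equivalent to 2c / (-b -/+ d)
    return x1, x2

print(quadratic_roots(1.0, -1e8, 1.0))   # roots near 1e8 and 1e-8
```

For x^2 − 10^8 x + 1 = 0, the naive formula would compute the small root as (10^8 − √(10^16 − 4))/2 and lose most of its digits to cancellation, while the product form keeps it accurate.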