E7441: Scientific computing
Vlad Popovici, Ph.D. (RECETOX)

    There is nothing more practical than a good theory.
        Kurt Lewin (1890–1947)

Bibliography
- KONG Q., SIAUW T., BAYEN A. (2020). Python Programming and Numerical Methods. Academic Press. ISBN: 9780128195499
- HEATH M.T. (2002). Scientific Computing: An Introductory Survey. 2nd edition. McGraw-Hill. ISBN: 0-07-239910-4. Good accompanying materials (slides and demos) at https://heath.cs.illinois.edu/scicomp/notes/index.html; used as the basis for the first part of the course.
- KEPNER J. (2009). Parallel Matlab for Multicore and Multinode Computers. SIAM Publishing. ISBN: 978-0-898716-73-3
- GENTLE J.E. (2005). Elements of Computational Statistics. Springer. ISBN: 978-0387954899

Computing environments for the course
- Python 3, https://www.python.org, with the NumPy and SciPy packages; JupyterLab is recommended for the exercises. Suggestion: install Python and the related packages through a distribution such as Anaconda or Mamba for easier management of dependencies.
- R, http://www.r-project.org, an "environment for statistical computing and graphics".

WARNING: some pieces of code shown during the course may not be the optimal implementation in the given language. They are merely a device for demonstrating some principles.

Scientific computing
Wikipedia: "Computational science (also scientific computing or scientific computation) is concerned with constructing mathematical models and quantitative analysis techniques and using computers to analyze and solve scientific problems."
Basically: find numerical solutions to mathematically formulated problems.

Well-posed problems
(J. Hadamard) A problem is well posed if its solution
- exists,
- is unique, and
- changes continuously with the initial conditions;
otherwise, it is ill posed. Inverse problems are often ill posed. Example: recovering a 3D scene from its 2D projection.

Moving from the continuous to the discrete domain, even a well-posed problem can be ill-conditioned: small errors in the input lead to large variations in the solution. Conditioning can be improved by regularization.

General computational approach
- continuous domain → discrete domain
- infinite → finite
- differential → algebraic
- nonlinear → (combination of) linear
- accept approximate solutions, but control the error

Approximations
- Modeling approximations:
  ▶ a "model" is an approximation of nature
  ▶ data: inexact measurements or results of previous computations
- Implementation/computational approximations:
  ▶ discretization of the continuous domain; truncation
  ▶ rounding
- Errors in the input data and errors propagated by the algorithm together determine the accuracy of the final result.
Example: the area of the Earth
- model: a sphere, A = 4πr²
- r = ? (inexact measurement)
- π = 3.14159… (truncated)
- rounded arithmetic

Errors
- absolute error = approximate value (x̂) − true value (x)
- relative error = absolute error / true value, so that
    approximate value = (1 + relative error) × (true value)
- if the relative error is ~10⁻ᵈ, then x̂ has about d exact digits: there exists τ = ±(0.0…0 n_{d+1} n_{d+2} …) such that x̂ = x + τ
- the true value is usually not known → use estimates or bounds on the error
- the relative error can also be taken relative to the approximate value

Example/exercise - Homework!
Stirling's approximation for factorials:
    S_n = √(2πn) (n/e)ⁿ ≈ n!,  n = 1, 2, …
where e = exp(1). Compute the relative error (S_n − n!)/n! for increasing n.

Errors: data and computational
Compute f(x) for f: ℝ → ℝ, where
▶ x ∈ ℝ is the true input and f(x) the true/desired result,
▶ x̂ is the approximate input and f̂ the approximate computation.
Total error:
    f̂(x̂) − f(x) = (f̂(x̂) − f(x̂)) + (f(x̂) − f(x)) = computational error + propagated data error
The algorithm has no effect on the propagated data error.

The computational error is the sum of:
- truncation error = (true result) − (result of the algorithm using exact arithmetic). Example: considering only the first terms of an infinite Taylor series; stopping an iteration before convergence.
- rounding error = (result of the algorithm using exact arithmetic) − (result of the algorithm using limited-precision arithmetic). Example: π ≈ 3.14 or π ≈ 3.141593.

Finite difference approximation
    f′(x) = lim_{h→0} (f(x+h) − f(x))/h ≈ (f(x+h) − f(x))/h,  for some small h > 0
- truncation error: |f′(x) − (f(x+h) − f(x))/h| ≤ Mh/2, where |f″(t)| ≤ M for t in a small neighborhood of x (HOMEWORK)
- rounding error: 2ϵ/h, where ϵ is the machine precision
- the total error is minimized for h ≈ 2√(ϵ/M) (a numerical check appears at the end of this part)

Figure: total computational error as a tradeoff between truncation and rounding error (from Heath, Scientific Computing).

Error analysis
For y = f(x), with f: ℝ → ℝ, an approximate result ŷ is obtained.
- forward error: Δy = ŷ − y
- backward error: Δx = x̂ − x, where f(x̂) = ŷ

    x  ──f──→  y = f(x)
    |              |
    | backward     | forward
    | error        | error
    |              |
    x̂  ──f──→  ŷ = f̂(x) = f(x̂)

Example: compute f(x) = eˣ for x = 1, using the first 4 terms of the Taylor expansion:
    f̂(x) = 1 + x + x²/2 + x³/6
Take the "true" value f(1) = 2.718282 and compute f̂(1) = 2.666667; then
- forward error: |Δy| = 0.051615, i.e. a relative forward error of about 2%
- backward error: x̂ = ln f̂(1) = 0.980829 ⇒ |Δx| = 0.019171, i.e. a relative backward error of about 2%
These are two perspectives on assessing the accuracy (see the numerical check at the end of this part).

Exercise
Consider the Taylor series with limit e:
    Σ_{n=0}^{∞} 1/n! = e
How many terms are needed for an approximation of e to three decimal places?

Backward error analysis
- idea: the approximate result is the exact solution of a modified problem
- how far from the original problem is the modified version?
- how much error in the input data would explain all the error in the result?
- an approximate solution is good if it is the exact solution of a nearby problem
- backward analysis is usually easier
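As a numerical check of the forward/backward error example above, a minimal sketch in plain Python (standard library only):

    import math

    x = 1.0
    y = math.exp(x)                    # "true" value f(x) = e
    y_hat = 1 + x + x**2/2 + x**3/6    # 4-term Taylor approximation: 2.666667

    forward = y_hat - y                # Δy, about -0.051615
    x_hat = math.log(y_hat)            # the input that f maps exactly onto ŷ
    backward = x_hat - x               # Δx, about -0.019171

    print(forward, backward)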
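The truncation/rounding tradeoff in the finite-difference formula can also be observed numerically. A small sketch, assuming f = exp at x = 1 (so M ≈ e and the predicted optimum is h ≈ 2√(ϵ/M) ≈ 10⁻⁸):

    import numpy as np

    x = 1.0
    hs = np.logspace(-15, -1, 15)               # step sizes 1e-15 ... 1e-1
    approx = (np.exp(x + hs) - np.exp(x)) / hs  # forward differences
    err = np.abs(approx - np.exp(x))            # total error vs f'(x) = e

    print(hs[np.argmin(err)])                   # smallest error near h ~ 1e-8

The error first decreases as h shrinks (truncation dominates) and then grows again as rounding takes over, reproducing the tradeoff shown in the figure above.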
Sensitivity and conditioning
- insensitive (well-conditioned) problem: relative changes in the input data cause similar relative changes in the result
- large changes in the solution for small changes in the input data indicate a sensitive (ill-conditioned) problem
- condition number:
    cond = |relative change in solution| / |relative change in input data| = |Δy/y| / |Δx/x|
- if cond ≫ 1, the problem is sensitive

The condition number is a scale factor for the error:
    relative forward error = cond × relative backward error
Usually only upper bounds on the condition number can be estimated, cond ≤ C, hence
    relative forward error ≤ C × relative backward error

For x̂ = x + Δx:
- forward error: f(x + Δx) − f(x) ≈ f′(x)Δx, for small enough Δx
- relative forward error: ≈ f′(x)Δx / f(x)
- hence cond ≈ |x f′(x) / f(x)|

Example: the tangent function is sensitive in the neighborhood of π/2:
    tan(1.57079) ≈ 1.58058 × 10⁵;  tan(1.57078) ≈ 6.12490 × 10⁴
    for x = 1.57079, cond ≈ 2.48275 × 10⁵

Stability
- an algorithm is stable if it is relatively insensitive to perturbations during the computation
- stability of algorithms is analogous to conditioning of problems
- backward view: an algorithm is stable if the result it produces is the exact solution of a nearby problem
- for a stable algorithm, the effect of the computational error is no worse than the effect of a small error in the input data

Accuracy
- accuracy: closeness of the computed result to the true solution of the problem
- depends on the conditioning of the problem AND on the stability of the algorithm
- stable algorithm + well-conditioned problem = accurate results

Number representation
- internally, all data are represented in binary format (each digit is either 0 or 1, e.g. 1011001…)
- units: bit, nybble (4 bits), byte (8 bits); the word size is architecture-specific: 1, 2, 4, or 8 bytes
- integers:
  ▶ unsigned (≥ 0): on n bits, the range is 0, …, 2ⁿ − 1. The stored representation (for 1 byte) is b₇b₆b₅b₄b₃b₂b₁b₀, for the value x = Σ_{i=0}^{7} b_i 2ⁱ.
  ▶ signed: one bit determines the sign; the range is −2ⁿ⁻¹, …, 0, …, 2ⁿ⁻¹ − 1. In the usual two's-complement representation (for 1 byte), b₇b₆b₅b₄b₃b₂b₁b₀ encodes the value x = b₇(−2⁷) + Σ_{i=0}^{6} b_i 2ⁱ.

Floating-point numbers
- as in scientific notation: mantissa × radix^exponent, e.g. 2.35 × 10³
- formally,
    x = ±(b₀ + b₁/β + b₂/β² + ⋯ + b_{p−1}/β^{p−1}) × β^E
  where
  ▶ β is the radix (or base),
  ▶ p is the precision,
  ▶ L ≤ E ≤ U are the limits of the exponent,
  ▶ 0 ≤ b_k ≤ β − 1.
- mantissa: m = b₀b₁…b_{p−1}; fraction: b₁b₂…b_{p−1}
- the sign, mantissa, and exponent are stored in fixed-sized fields (the radix is implicit for a given system; usually β = 2); a sketch for inspecting these fields follows below
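To make these fixed-sized fields concrete, a small illustrative sketch (the helper float_bits is not part of any library) that prints the stored bits of a Python float, which is an IEEE 754 double; the exact 64-bit layout is described next:

    import struct

    def float_bits(x: float) -> str:
        """Raw 64 bits of an IEEE 754 double, most significant bit first."""
        (u,) = struct.unpack(">Q", struct.pack(">d", x))
        return f"{u:064b}"

    b = float_bits(-6.5)           # -6.5 = -1.625 x 2^2
    print(b[0], b[1:12], b[12:])   # sign | biased exponent | fraction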
Normalization
- b₀ ≠ 0 for all x ≠ 0, i.e. the mantissa satisfies 1 ≤ m < β
- ensures a unique representation and optimal use of the available bits

Internal representation (on 64 bits, "double precision", binary):
    x = sign | exponent | fraction = b₆₃ | b₆₂ … b₅₂ | b₅₁ … b₀

Properties
- only a finite number of discrete values can be represented
- the total number of floating-point numbers representable in normalized format is
    2(β − 1)β^{p−1}(U − L + 1) + 1
- underflow level (smallest positive normalized number): UFL = β^L
- overflow level (largest number): OFL = β^{U+1}(1 − β^{−p})
- not all real numbers can be represented exactly:
  ▶ those that can are called machine numbers
  ▶ the others have to be rounded → rounding error

Example: for β = 2, p = 3, L = −1, U = 1, there are 25 distinct representable numbers:
- UFL = 0.5₁₀; OFL = 3.5₁₀
- note the non-uniform coverage of the real line
- for x ∈ ℝ, fl(x) denotes its floating-point representation; x − fl(x) is the rounding error

Rounding rules
- chop (round toward zero): truncate the base-β representation after the (p−1)st digit
- round to nearest: fl(x) is the machine number closest to x

Machine precision
- machine precision, ϵ_mach:
  ▶ with chopping: ϵ_mach = β^{1−p}
  ▶ with rounding to nearest: ϵ_mach = ½ β^{1−p}
- also called unit roundoff: the smallest number ϵ such that fl(1 + ϵ) > 1
- maximum relative error of representation:
    |fl(x) − x| / |x| ≤ ϵ_mach
- usually 0 < UFL < ϵ_mach < OFL

Machine precision - example
- for β = 2, p = 3: ϵ_mach = (0.01)₂ = (0.25)₁₀ with chopping; ϵ_mach = (0.001)₂ = (0.125)₁₀ with rounding to nearest
- the usual case (IEEE floating-point systems):
  ▶ ϵ_mach = 2⁻²⁴ ≈ 10⁻⁷ in single precision
  ▶ ϵ_mach = 2⁻⁵³ ≈ 10⁻¹⁶ in double precision
  → about 7 and 16 decimal digits of precision, respectively (hence, in R: p-value < 2.2e−16)

Gradual underflow
- to improve the representation of numbers around 0, subnormal (or denormalized) numbers are used
- when the exponent is at its minimum, leading digits are allowed to be 0
- subnormals are less precise → gradual underflow

Special values (IEEE standard)
- Inf: infinity, e.g. the result of 1/0
- NaN ("not a number"), e.g. the result of 0/0 or Inf/Inf
- both use special bit patterns in the exponent field

Floating-point arithmetic
- addition/subtraction: denormalization might be required:
    3.52 × 10³ + 1.97 × 10⁵ = 0.0352 × 10⁵ + 1.97 × 10⁵ = 2.0052 × 10⁵
  → this might cause the loss of some digits
- multiplication/division: the result may not be representable
- overflow is more serious than underflow: there is no reasonable approximation for numbers that are too large, while an underflowed result can often be approximated by 0
- in floating-point arithmetic, addition and multiplication are commutative but not associative: if ϵ is slightly smaller than ϵ_mach, then (1 + ϵ) + ϵ = 1, but 1 + (ϵ + ϵ) > 1 (see the sketch below)
- ideally, x flop y = fl(x op y); the IEEE standard ensures this for results within range
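A quick double-precision demonstration of this non-associativity. Note that NumPy's eps is the spacing between 1 and the next representable number (β^{1−p} = 2⁻⁵²), so a quantity just below the round-to-nearest threshold is eps/2:

    import numpy as np

    eps = np.finfo(np.float64).eps   # 2**-52
    e = eps / 2                      # 1 + e rounds back to 1 (round to nearest/even)

    print((1.0 + e) + e == 1.0)      # True: each addition is rounded away
    print(1.0 + (e + e) > 1.0)       # True: e + e = eps is exactly representable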
Example: a divergent series
    Σ_{n=1}^{∞} 1/n = ∞
In floating-point arithmetic the computed sum of the series is finite; depending on the system, this is because:
▶ after a while, the partial sum overflows, or
▶ 1/n underflows, or
▶ for all n such that 1/n < ϵ_mach × Σ_{k=1}^{n−1} 1/k, the sum no longer changes.

Cancellation
- subtracting two numbers of similar magnitude usually cancels the most significant digits:
    1.92403 × 10² − 1.92275 × 10² = 1.28000 × 10⁻¹
  → only 3 significant digits remain
- let ϵ > 0 be slightly smaller than ϵ_mach; then (1 + ϵ) − (1 − ϵ) yields 0 in floating-point arithmetic, instead of the correct 2ϵ

Cancellation - example
For the quadratic equation ax² + bx + c = 0, the two solutions are given by
    x₁,₂ = (−b ± √(b² − 4ac)) / (2a)
Problems:
- for very large/small coefficients, the terms b² or 4ac may over-/underflow → rescale the coefficients by max{|a|, |b|, |c|}
- cancellation between −b and √(b² − 4ac) can be avoided by computing one root as
    x = 2c / (−b ∓ √(b² − 4ac))
Exercise: let x₁ = 2000 and x₂ = 0.05 be the roots of a quadratic equation. Compute the coefficients and then use the formulas above to retrieve the roots. Try the numpy.roots() function in Python. (A sketch of the cancellation-free computation is given at the end of this section.)

Cancellation - another example
P(X) = (X − 1)⁶ = X⁶ − 6X⁵ + 15X⁴ − 20X³ + 15X² − 6X + 1. What happens around X = 1?

    import matplotlib.pyplot as plt
    import numpy as np

    epsilon = [0.01, 0.005, 0.001]
    for k in range(3):
        x = np.linspace(1 - epsilon[k], 1 + epsilon[k], 100)
        # expanded form: suffers from cancellation near x = 1
        px = x**6 - 6*x**5 + 15*x**4 - 20*x**3 + 15*x**2 - 6*x + 1
        # factored form: numerically well behaved
        px0 = (x - 1)**6
        plt.subplot(2, 3, k + 1)
        plt.plot(x, px, '-b', x, np.zeros(100), '-r')
        plt.axis([1 - epsilon[k], 1 + epsilon[k], -max(abs(px)), max(abs(px))])
        plt.subplot(2, 3, k + 4)
        plt.plot(x, px0, '-b', x, np.zeros(100), '-r')
        plt.axis([1 - epsilon[k], 1 + epsilon[k], -max(abs(px0)), max(abs(px0))])
    plt.show()

…mathematically equivalent, but numerically different…

(very small) Project
Study the paper: Moler C., Morrison D. (1983). Replacing square roots by Pythagorean sums. IBM J. Res. Develop. 27(6). Then implement the proposed method and compare it with the naive sqrt()-based approach.

In Python...
- basic (and not only basic) numerical functions are in the numpy package
- ϵ_mach is returned by:
  ▶ single precision: np.finfo(np.float32).eps gives 1.1920929e-07 = 2⁻²³
  ▶ double precision: np.finfo(np.float64).eps gives 2.220446049250313e-16 = 2⁻⁵²
  (note: NumPy reports the spacing between 1 and the next float, β^{1−p}, i.e. twice the round-to-nearest unit roundoff defined earlier)
- to obtain the smallest or largest single/double precision numbers, use np.finfo(np.float32).min, np.finfo(np.float32).max, np.finfo(np.float64).min, np.finfo(np.float64).max
- the special constants np.inf and np.nan are also available
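Returning to the quadratic-equation exercise above, a minimal sketch of the cancellation-free computation (equivalent to the 2c/(−b ∓ √(b² − 4ac)) form; real roots are assumed, and quad_roots is just an illustrative name):

    import math

    def quad_roots(a, b, c):
        # pick the root formula in which -b and the square root carry the
        # same sign, so that no cancellation occurs
        d = math.sqrt(b*b - 4*a*c)               # assumes b^2 - 4ac >= 0
        x1 = (-b - math.copysign(d, b)) / (2*a)
        x2 = c / (a * x1)                        # Vieta's formula: x1*x2 = c/a
        return x1, x2

    # roots 2000 and 0.05  =>  a = 1, b = -2000.05, c = 100
    print(quad_roots(1.0, -2000.05, 100.0))

Comparing with the naive ± formula typically shows the smaller root losing digits to cancellation, while the version above recovers both roots to full precision.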