Volker John
Numerical Methods for Ordinary Differential Equations

1. Explicit One-Step Methods (p. 3)
2. Numerical Methods for Stiff Ordinary Differential Equations (p. 27)
3. Multi-Step Methods (p. 49)
4. Summary and Outlook (p. 73)
Appendix A: Topics on the Theory of Ordinary Differential Equations (p. 77)
References (p. 99)

Chapter 1 Explicit One-Step Methods

Remark 1.1. Contents. This course presents methods for the numerical solution of explicit systems of initial value problems for ordinary differential equations of first order
$$y'(x) = f(x, y(x)), \quad y(x_0) = y_0.$$
For the most part, only initial value problems for scalar ordinary differential equations of first order
$$y'(x) = f(x, y(x)), \quad y(x_0) = y_0, \qquad (1.1)$$
are considered, for simplicity of presentation. The extension of the results and the methods to systems is generally straightforward. It will always be assumed that the initial value problem possesses a unique solution in a neighborhood of the initial value. In applications, the independent variable is often the time. ✷

1.1 Consistency and Convergence

Definition 1.2. Grid, step size. A grid is a decomposition $I_h$ of the interval $I = [x_0, x_e]$,
$$I_h = \{x_0, x_1, \ldots, x_N = x_e\} \quad \text{with } x_0 < x_1 < \ldots < x_N.$$
The differences between neighboring grid points, $h_k = x_{k+1} - x_k$, are called step sizes. For an equidistant grid, the notation $h = h_k$ will be used for the step size, see Figure 1.1. ✷

Fig. 1.1 Equidistant grid.

Remark 1.3. Explicit and implicit methods. Let $y(x_k)$ denote the solution of (1.1) in the node $x_k$ and $y_k$ a numerical approximation of $y(x_k)$. A numerical method for the solution of (1.1) on a grid $I_h$ is called explicit if an approximation $y_{k+1}$ in $x_{k+1}$ can be calculated directly from already computed values $y_i$, $i \le k$. Otherwise, the method is called implicit. Implicit methods require in each step the solution of a generally nonlinear equation for computing $y_{k+1}$. ✷

Definition 1.4. One-step method, incremental function. A one-step method for the computation of an approximation $y_{k+1}$ of the solution of (1.1) on a grid $I_h$ has the form
$$y_{k+1} = y_k + h_k \Phi(x, y, h_k), \quad k = 0, 1, \ldots, \qquad y_0 = y(x_0). \qquad (1.2)$$
Here, $\Phi(\cdot, \cdot, \cdot)$ is called the incremental function of the one-step method. ✷

Example 1.5. One-step methods, incremental functions. The explicit or forward Euler method
$$y_{k+1} = y_k + h_k f(x_k, y_k), \quad k = 0, 1, 2, \ldots, \qquad y_0 = y(x_0),$$
is an explicit one-step method with the incremental function
$$\Phi(x, y, h_k) = f(x_k, y_k).$$
The computation of $y_{k+1}$ requires only the substitution of already computed values into the function $f(x, y)$ from the initial value problem (1.1).

The implicit or backward Euler method
$$y_{k+1} = y_k + h_k f(x_{k+1}, y_{k+1}), \quad k = 0, 1, 2, \ldots, \qquad y_0 = y(x_0),$$
is an implicit one-step method with the incremental function
$$\Phi(x, y, h_k) = f(x_{k+1}, y_{k+1}).$$
One has to solve an equation for computing $y_{k+1}$. The complexity of this step depends on $f(x, y)$. ✷

Remark 1.6.
Representation of implicit one-step methods. Explicit one-step methods require only that known values are inserted in the incremental function. Hence, their incremental function can be written finally in the form 1.1 Consistency and Convergence 5 xk+1xk ˆyk+1 y(xk+1) y(xk) lek+1 Fig. 1.2 The local error. Φ (xk, yk, hk). For the considerations in this section, one can adopt the point of view that also implicit one-step methods can be written as explicit onestep methods, because the data for the nonlinear equation are xk, yk, and hk. However, generally one does not know the concrete form of the incremental function. ✷ Example 1.7. Incremental function of the implicit Euler method. The incremental function of the implicit Euler method on an equidistant grid can be written in the form Φ (x, y, h) = f (x + h, y + hΦ (x, y, h)) , which allows formally the representation of this method as explicit one-step scheme. ✷ Definition 1.8. Local error. Let ˆyk+1 be the result of one step of an explicit one-step method (1.2) with the initial value y(xk), i.e., ˆyk+1 = y(xk) + hkΦ (xk, y(xk), hk) . Then, le (xk+1) = lek+1 = y (xk+1) − ˆyk+1 = y (xk+1) − (y(xk) + hkΦ (xk, y(xk), hk)) (1.3) is called local error, see Figure 1.2 ✷ Remark 1.9. The local error. In the literature, sometimes y (xk+1) − y(xk) hk − Φ (xk, y(xk), hk) is defined to be the local error. 6 1 Explicit One-Step Methods For the local error, one starts from the solution of the initial value problem and considers the error after one step of the numerical method. One should require for a reasonable method that the local error is small in an appropriate sense. ✷ Definition 1.10. Consistent method. Let y(x) be the solution of the initial value problem (1.1), hmax = maxk hk, and S := {(x, y) : x ∈ [x0, xe], y ∈ R} . The one-step method (1.2) is said to be consistent, if for all f ∈ C(S), which satisfy in S a Lipschitz condition with respect to y, it holds lim hmax→0 � max xk∈Ih |le (xk+1)| hk � = 0 or lim hmax→0 � max xk∈Ih |f(xk, y(xk)) − Φ (xk, y(xk), hk)| � = 0. (1.4) Both conditions are equivalent, compare Remark 1.11. ✷ Remark 1.11. Approximation of the derivative with the incremental function. For bounded incremental functions, it is obvious that the local error converges to zero if hmax → 0, because in this case it holds hk → 0 and ˆyk+1 → y(xk), such that this statement follows from (1.3). Consistency requires more, namely that the incremental function approximates the derivative of the solution sufficiently well. Applying (1.3) and (1.1) yields le (xk+1) hk = y (xk+1) − y(xk) hk − Φ (xk, y(xk), hk) ≈ y� (xk) − Φ (xk, y(xk), hk) = f (xk, yk) − Φ (xk, y(xk), hk) , compare (1.4). ✷ Example 1.12. Consistency of the explicit Euler method. For the explicit Euler method, it is Φ (xk, y(xk), hk) = f (xk, y(xk)) . Hence, condition (1.4) from Definition 1.10 is satisfied and the method is consistent. ✷ Remark 1.13. Quality of the approximation of the incremental function. For practical purposes, not only the consistency itself but the quality of the approximation of the derivative by the incremental function is essential. The quality allows a comparison of different one-step methods. For simplicity of presentation, let hk = h for all k. ✷ 1.1 Consistency and Convergence 7 Definition 1.14. Order of consistency. 
An explicit one-step method (1.2) has the consistency order p ∈ N, if p is the largest natural number such that for all functions f ∈ C(S), which satisfy a Lipschitz condition with respect to y, it holds |le (xk + h)| ≤ Chp+1 for all xk ∈ Ih, for all Ih with h ∈ (0, H], and with the constant C > 0 being independent of h. The constant C might depend on derivatives of y(x), on f(x, y), and on partial derivatives of f(x, y). ✷ Example 1.15. Order of consistency of the explicit Euler method. Consider the explicit Euler method and assume that the function y(x) is two times continously differentiable. Then, it follows with Taylor series expansion and using the differential equation that |le (xk + h)| = |y(xk + h) − ˆyk+1| = |y(xk) + hy� (xk) + h2 2 y�� (xk + θh) − y(xk) − h f (xk, y(xk)) � �� � =y � (xk) | = h2 2 � �y�� (xk + θh) � � ≤ h2 2 �y�C 2 ([x0,xe]) , with θ ∈ (0, 1). Since there is no way to replace the term on the right-hand side by a term with a larger power of h, the method has consistency order 1. ✷ Remark 1.16. Consistency and convergence. The consistency is a local property of a one-step method. For practical purposes, it is important that the computed solution converges to the analytic solution if the grid becomes finer and finer. Of course, the order of convergence is of importance, too. It will be shown that, under certain conditions, the convergence of a onestep method follows from its consistency and that the order of convergence equals the consistency order. ✷ Definition 1.17. Convergent method, order of convergence. A onestep method (1.2) converges for the initial value problem (1.1) on the interval I = [x0, xe], if for each sequence of grids {Ih} with hmax = maxhk hk → 0 for the global error e(xk, h) = y(xk) − yk, xk ∈ Ih, it follows that max xk∈Ih |e(xk, h)| → 0 for hmax → 0. The one-step method has the order of convergence p∗ , if p∗ is the largest natural number such that for all step lengths hmax ∈ (0, H], it holds |e(xk, h)| ≤ Chp ∗ max ∀ xk ∈ Ih, 8 1 Explicit One-Step Methods where C > 0 is independent of hmax. ✷ Lemma 1.18. Estimate for a sequence of real numbers. Assume that for real numbers xn, n = 0, 1, . . ., the inequality |xn+1| ≤ (1 + δ) |xn| + β holds with constants δ > 0, β ≥ 0. Then, it holds that |xn| ≤ enδ |x0| + enδ − 1 δ β, n = 0, 1, . . . . Proof. With induction, problem for exercises. � Theorem 1.19. Connection of consistency and convergence. Let y(x) be the solution of the initial value problem (1.1) with f ∈ C(S). Let a Lipschitz condition hold for the second argument of the incremental function |Φ(x, y1, h) − Φ(x, y2, h)| ≤ M |y1 − y2| ∀ (x, y) ∈ S, h ∈ (0, H], (1.5) with M ∈ R fixed. Assume that for the local error the estimate |le (xk + h)| ≤ Chp+1 ∀ xk ∈ Ih, h ∈ (0, H] (1.6) is valid and assume that y0 = y(x0). Then, it follows for the global error that |e(xk+1, h)| ≤ C eM(xk+1−x0) − 1 M hp , where C is independent of h. Proof. It holds that yk+1 = yk + hΦ (xk, yk, h) , y(xk+1) = y(xk) + hΦ (xk, y(xk), h) + le � xk+1 � , k = 0, 1, . . . . Then, it follows with the triangle inequality, the assumption on the local error (1.6), and the Lipschitz condition of the incremental function (1.5) that � �e(xk+1, h) � � = � �y(xk+1) − yk+1 � � = � �y(xk) − yk + le � xk+1 � + h � Φ (xk, y(xk), h) − Φ (xk, yk, h) �� � = � �e(xk, h) + le � xk+1 � + h � Φ (xk, y(xk), h) − Φ (xk, yk, h) �� � ≤ |e(xk, h)| + � �le � xk+1 �� � + h |Φ (xk, y(xk), h) − Φ (xk, yk, h)| ≤ |e(xk, h)| + Ch p+1 + hM |y(xk) − yk| = (1 + hM) |e(xk, h)| + Ch p+1 . 
This sequence of inequalities has the form that was considered in Lemma 1.18. One obtains with e(x0) = 0 1.2 Explicit Runge–Kutta Schemes 9 � �e(xk+1, h) � � ≤ e (k+1)hM |e(x0)| + C e (k+1)hM − 1 hM h p+1 = C e M(xk+1−x0) − 1 M h p . � Remark 1.20. To Theorem 1.19. • The consideration of a constant step length is only for simplicity of presentation. The result of the theorem holds also for non-constant step lengths with h = maxk hk. • One-step methods compute an approximation yk of the solution in the grid points xk, k = 0, 1, . . . , N. To enable a better comparison with the analytic solution, one connects these points linearly from (xk, yk) to (xk+1, yk+1). In this way, one obtains a piecewise linear approximation of the solution that is defined on [x0, xe]. This function is called yh (x). The considerations from above can be extended to yh (x). ✷ 1.2 Explicit Runge–Kutta Schemes Remark 1.21. Idea. The Euler methods are only of first order. The idea of Runge1 –Kutta2 methods consists in using an incremental function Φ(x, y, h) that is a linear combination of values of f(x, y) in different points. With this approach, one obtains methods of higher order for the cost of evaluating more values of f(x, y). This approach can be illustrated well at the integral equation that is equivalent to the initial value problem (1.1). For simplicity, let the right-hand side of (1.1) depend only on x. Then, the integral equation has the form y(x) = y0 + � x x0 f(t) dt. (1.7) The idea of the Runge–Kutta methods consists in approximating the righthand side by a quadrature rule, e.g., in the interval [xk, xk+1] by � xk+1 xk f(t) dt ≈ hk s� j=1 bjf � xk + cjhk � with the weights bj and the nodes xk + cjh. In the following, only hk = h for all k will be considered for the reason of simplicity. ✷ 1 Carle David Tolm´e Runge (1856 – 1927) 2 Martin Kutta (1867 – 1944) 10 1 Explicit One-Step Methods Definition 1.22. Runge–Kutta methods, increments, and stages. A Runge–Kutta method has the form yk+1 = yk + hΦ(x, y, h), k = 0, 1, . . . , y0 = y(x0), where the incremental function is defined with the help of Ki(x, y, h) = f  xk + cih, yk + h s� j=1 aijKj(x, y, h)   by Φ(x, y, h) = s� i=1 biKi(x, y, h), with c1, . . . , cs, b1, . . . , bs, aij ∈ R, i, j = 1, . . . , s. The quantities Ki(x, y, h), i = 1, . . . , s, are called increments. The natural number s ∈ N is the number of stages of the method. An equivalent definition is as follows y (i) k+1 = yk + h s� j=1 aijf � xk + cjh, y (j) k+1 � , (1.8) Φ(x, y, h) = s� i=1 bif � xk + cih, y (i) k+1 � . (1.9) The intermediate values y (i) k+1 are called stages. ✷ Remark 1.23. Butcher3 tableau. For the reason of clarity, one writes a Runge– Kutta scheme in general in form of a tableau, the so-called Butcher tableau c1 a11 a12 · · · a1s c2 a21 a22 · · · a2s c3 a31 a32 · · · a3s ... ... ... ... cs as1 as2 · · · ass b1 b2 · · · bs =⇒ c A bT . (1.10) Here, c are the nodes, A is the matrix of the method, and b are the weights. ✷ Remark 1.24. Increments and Butcher tableau. For explicit Runge–Kutta schemes, the increments can be computed one after the other 3 John C. Butcher, born. 1933 1.2 Explicit Runge–Kutta Schemes 11 K1(x, y, h) = f(xk, yk), K2(x, y, h) = f (xk + c2h, yk + ha21K1(x, y, h)) , ... Ks(x, y, h) = f  xk + csh, yk + h s−1� j=1 asjKj(x, y, h)   . (1.11) The Butcher tableau has the form 0 c2 a21 c3 a31 a32 ... ... ... ... cs as1 as2 · · · as,s−1 b1 b2 · · · bs−1 bs . ✷ Example 1.25. Explicit Euler scheme. 
The explicit Euler scheme is an explicit Runge–Kutta scheme with the Butcher tableau 0 1 . In the integral equation, the approximation � xk+1 xk f(t, y(t)) dt ≈ hf (xk, yk(xk)) is used, see the proof of the Theorem of Peano, lectures notes of Numerical Mathematics I. ✷ Theorem 1.26. Consistency of explicit Runge–Kutta schemes. Let f ∈ C(S), see Definition 1.10. An explicit Runge–Kutta scheme is consistent if and only if s� i=1 bi = 1. (1.12) Proof. From the continuity of f(x, y) and the definition (1.11) of the increments of an explicit Runge–Kutta scheme, it follows that lim h→0 Ki(x, y, h) = f(xk, y(xk)), ∀ (x, y) ∈ S, i = 1, . . . , s, for the case that the initial value of this step is yk = y(xk). The continuity of the absolute value function gives lim h→0 |f(xk, y(xk)) − Φ (xk, y(xk), h)| = lim h→0 � � � � � f(xk, y(xk)) − s� i=1 biKi(x, y, h) � � � � � 12 1 Explicit One-Step Methods = � � � � � f(xk, y(xk)) − s� i=1 bi lim h→0 Ki(x, y, h) � � � � � = � � � � � f(xk, y(xk)) � 1 − s� i=1 bi �� � � � � = 0 if and only if �s i=1 bi = 1. Hence, the condition (1.4) in Definition 1.10 is satisfied. � Theorem 1.27. Interpretation of the increments. Let for the solution of (1.1) hold y ∈ C2 ([x0, xe]), let f ∈ C(S), and let f be Lipschitz continuous in the second argument. If yk = y(xk) and ci = i−1� j=1 aij, i ≥ 2, (1.13) holds, then Ki(x, y, h) is an approximation of at least first order (of consistency) to y� (xk + cih), i.e., y� (xk + cih) − Ki(x, y, h) = O � h2 � . Proof. The proof follows by induction. i = 2. For i = 2, it follows with (1.1), the Lipschitz continuity, and Taylor series expansion that � �y � (xk + c2h) − K2(x, y, h) � � = � �f � xk + c2h, y(xk + c2h) � − f � xk + c2h, y(xk) + ha21f(xk, y(xk)) �� � ≤ L |y(xk + c2h) − y(xk) − ha21f(xk, y(xk))| = L � � �y(xk) + c2hy � (xk) + O(h 2 ) − y(xk) − ha21y � (xk) � � � = L � � �(c2 − a21)hy � (xk) + O(h 2 ) � � � . Hence, in the case c2 = a21, the error is of order O(h 2 ). i > 2. Let the asymptotic order of the errors be proved for all indices 2, . . . , i−1. Then, one gets in the same way as for i = 2 � �y � (xk + cih) − Ki(x, y, h) � � = � � � � � � f (xk + cih, y(xk + cih)) − f  xk + cih, y(xk) + h i−1� j=1 aijKj(x, y, h)   � � � � � � ≤ L � � � � � � y(xk + cih) − y(xk) − h i−1� j=1 aijKj(x, y, h) � � � � � � = L � � � � � � y(xk) + cihy � (xk) + O(h 2 ) − y(xk) − h i−1� j=1 � aij � y � (xk + cjh) + O(h 2 ) �� � � � � � � = L � � � � � � cihy � (xk) + O(h 2 ) − h i−1� j=1 � aij � y � (xk) + O(h) �� � � � � � � 1.2 Explicit Runge–Kutta Schemes 13 = L � � � � � � h  ci − i−1� j=1 aij   y � (xk) + O(h 2 ) � � � � � � . The order of the error O(h 2 ) is given, if ci = �i−1 j=1 aij. � Remark 1.28. Conditions on the coefficients for certain orders of convergence. The conditions from Theorems 1.26 and 1.27 are satisfied for many explicit Runge–Kutta schemes. The goal consists in determining the coefficients b1, . . . , bs, and aij in such a way that one obtains an order of consistency as high as possible. The consistency order of a Runge–Kutta scheme with s stages can be derived from the Taylor series expansion of the local error. Let (1.12) be valid, then one obtains, e.g., • A Runge–Kutta scheme with the parameters (A, b, c) has at least consistency order p = 2 if s� j=1 bjcj = 1 2 . (1.14) This condition will be shown in Example 1.29 for s = 2. • If in addition s� j=1 bjc2 j = 1 3 and s� j=1 bj s� k=1 ajkck = 1 6 hold, then the order of consistency is at least p = 3. 
The proof of the last statement and conditions for even higher order consistency can be found in the literature, e.g. in (Strehmel & Weiner, 1995; Strehmel et al., 2012, Section 2.4.2). ✷ Example 1.29. Runge–Kutta methods with 2 stages. For the investigation of 2-stage Runge–Kutta schemes, one considers for simplicity the so-called autonomous initial value problem y� (x) = f(y(x)), y(x0) = y0. One has for the increments K1(y, h) = f(yk), K2(y, h) = f (yk + ha21K1(yk, h)) = f (yk + ha21f(yk)) = f(yk) + ha21f(yk)∂yf(yk) + O(h2 ). If the initial value is exact, it follows for the incremental function that Φ(y(xk)) = b1K1(y, h) + b2K2(y, h) (1.15) = (b1 + b2)f(y(xk)) + hb2a21f(y(xk))∂yf(y(xk)) + O(h2 ). The Taylor series expansion of the solution has the form 14 1 Explicit One-Step Methods y(xk + h) = y(xk) + h y� (xk) � �� � =f(y(xk)) + h2 2 y�� (xk) + O � h3 � . One obtains with the chain rule y�� (x) = d dx y� (x) = d dx f(y(x)) = ∂yf(y)y� (x) = ∂yf(y)f(y(x)). Now, it follows for the local error, using Taylor series expansion and (1.15), that le(xk + h) = y(xk + h) − y(xk) − hΦ(y(xk)) = y(xk) + hf(y(xk)) + h2 2 � ∂yf(y(xk))f(y(xk)) � + O � h3 � − y(xk) −h � (b1 + b2)f(y(xk)) + hb2a21f(y(xk))∂yf(y(xk)) + O(h2 ) � = h � 1 − (b1 + b2) � f(y(xk)) + h2 � 1 2 − b2a21 � f(y(xk))∂yf(y(xk)) +O � h3 � . To achieve an order of consistency as large as possible, the first two terms have to vanish. One obtains with the condition c2 = a21 that b1 + b2 = 1, b2a21 = 1 2 ⇐⇒ b2c2 = 1 2 . The first equation is the general condition for consistency (1.12) and the second condition is exactly (1.14) for s = 2. These two conditions characterize all 2-stage explicit Runge–Kutta methods that possess consistency and convergence order 2 c2 c2 1 − 1 2c2 1 2c2 , with c2 �= 0. In the case c2 = 1/2, one obtains the method of Runge (1895) 1/2 1/2 0 1 . This method corresponds with respect to the approximation of the integral in (1.7) to the application of the mid point rule. For c2 = 1, one gets the method of Heun4 (1900) 4 Karl Heun (1859 – 1929) 1.2 Explicit Runge–Kutta Schemes 15 1 1 1/2 1/2 , which corresponds to the use of the trapezoidal rule for the numerical quadrature in (1.7). ✷ Remark 1.30. Autonomous ordinary differential equations. Every explicit first order ordinary differential equation y� (x) = f(x, y(x)) can be transformed into an autonomous form ˜y� (x) = ˜f (˜y(x)) = � f(x, y(x)) 1 � by introducing the function y(x) := x and ˜y(x) := � y(x) y(x) � and noting that (y(x), x) are just the components of ˜y(x). ✷ Theorem 1.31. Consistency and convergence of explicit Runge– Kutta methods. Let y(x) be the solution of the initial value problem (1.1) with f ∈ C(S) and let f(x, y) satisfy a Lipschitz condition in the second argument. Then, an explicit Runge–Kutta scheme that is consistent of order p converges also with order p. Proof. The incremental function of an explicit Runge–Kutta scheme is a linear combination of values of the right-hand side f(x, y). Thus, the assumptions of Theorem 1.19 are satisfied, since the Lipschitz condition in this theorem follows from the required Lipschitz condition on the right-hand side of the differential equation. The statement of the theorem follows now directly from Theorem 1.19. � Remark 1.32. Explicit Runge–Kutta methods of higher order. Analogously to 2-stage methods, it is possible to derive conditions on the coefficients of an explicit Runge–Kutta scheme in order to construct methods of higher order. 
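Before turning to higher-order schemes, it may help to see how the increment recursion (1.11) and the update formula look in code. The following sketch is an illustration only and not part of the original notes; the function name `explicit_rk_step` and the test problem are chosen here for convenience. It evaluates one step of a general explicit Runge–Kutta method from its Butcher tableau and applies the tableau of Heun's method from Example 1.29 to $y' = -y$, $y(0) = 1$.

```python
import numpy as np

def explicit_rk_step(f, x, y, h, A, b, c):
    """One step of an explicit Runge-Kutta method given by a Butcher tableau.

    Computes the increments K_i = f(x + c_i h, y + h * sum_{j<i} a_ij K_j)
    one after the other, as in (1.11), and returns y + h * sum_i b_i K_i.
    """
    s = len(b)
    K = np.zeros(s)
    for i in range(s):
        K[i] = f(x + c[i] * h, y + h * np.dot(A[i, :i], K[:i]))
    return y + h * np.dot(b, K)

# Heun's method (Example 1.29, c2 = 1), consistency and convergence order 2
A = np.array([[0.0, 0.0],
              [1.0, 0.0]])
b = np.array([0.5, 0.5])
c = np.array([0.0, 1.0])

# Test problem y' = -y, y(0) = 1, exact solution exp(-x)
f = lambda x, y: -y
for h in [0.1, 0.05, 0.025]:
    n = round(1.0 / h)
    x, y = 0.0, 1.0
    for _ in range(n):
        y = explicit_rk_step(f, x, y, h, A, b, c)
        x += h
    # error at x = 1 decreases roughly by a factor of 4 when h is halved
    print(h, abs(y - np.exp(-1.0)))
```

For systems of equations, $y$ and the increments become vectors; the structure of the loop is unchanged.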
An important question is the minimal number of stages that is necessary to be able to reach a certain order. Some answers to this question are from Butcher (1963, 1965, 1985) p 1 2 3 4 5 6 7 8 min s 1 2 3 4 6 7 9 11 . ✷ Example 1.33. Classical Runge–Kutta scheme (1901). The so-called classical Runge–Kutta scheme has four stages and the Butcher tableau 16 1 Explicit One-Step Methods 0 1/2 1/2 1/2 0 1/2 1 0 0 1 1/6 1/3 1/3 1/6 . It is based on the Simpson5 rule. The center node of the Simpson rule is used twice, c2 = c3, but with a different second argument for the computation of the increments. This method is of fourth order. ✷ 1.3 Step Length Control Remark 1.34. Motivation. The considerations so far did not provide a way for estimating a good step length for solving a given initial value problem with prescribed accuracy and with as little work as possible. • If the steps are too large, then the numerical solution might be too inac- curate. • If the steps are too small, then the numerical simulation might take much longer than necessary. A good step length depends certainly on the concrete problem and generally it will change within the considered interval. For these reasons, the step length should be controlled during the numerical simulation of the initial value problem. A typical approach consists in computing two approximations of the solution in a node with different methods and to draw conclusions on the size of the local error based on the difference of these approximations. Of course, the consideration of the global error would be better. However, Theorem 1.19 shows that on the one hand, the global error is influenced by problem-dependent terms, like the length of the interval [x0, xe] or the Lipschitz constant. On the other hand, the global error is expected to be small only if the local errors are small. ✷ 1.3.1 The Richardson Method Remark 1.35. Idea. Given a numerical method for solving an initial value problem and given a step length h. The Richardson6 method consists of the following steps, see also Figure 1.3: 5 Thomas Simpson (1710 – 1761) 6 Lewis Fry Richardson (1881 – 1953) 1.3 Step Length Control 17 y(x0 + 2h) yh x0 + hx0 x0 + 2h y0 y(x0 + h) y2h y2×h Fig. 1.3 Sketch of the Richardson method. 1. Starting from a node (x0, y0) and using a step length of 2h, an approximation y2h at the node x0 + 2h will be computed. 2. Two approximations yh and y2×h in x0 + h and x0 + 2h are computed with two steps of length h. 3. The step length will be controlled by comparing y2h and y2×h. In general, the more accurate approximation will be y2×h. In addition, it will be demonstrated that it is possible to improve the accuracy of y2×h with the information obtained by this method. ✷ Example 1.36. Richardson method for an explicit 2-stage Runge–Kutta method. Consider an explicit 2-stage Runge–Kutta scheme. One obtains in the first step of the Richardson method, using (1.8), (1.9), y (1) 2h = y0, y (2) 2h = y0 + 2ha21f(x0, y0), y2h = y0 + 2h [b1K1(x, y) + b2K2(x, y)] = y0 + 2h � b1f(x0, y (1) 2h ) + b2f � x0 + 2c2h, y (2) 2h �� , or written as Butcher tableau 0 2c2 2a21 2b1 2b2 . Note that because of the step length 2h, the weights sum up to 2. The second step of the Richardson method yields y (1) 2×h = y0, y (2) 2×h = y0 + ha21f(x0, y0), y (3) 2×h = yh = y0 + h � b1f � x0, y (1) 2×h � + b2f � x0 + c2h, y (2) 2×h �� , y (4) 2×h = yh + ha21f(x0 + h, yh), 18 1 Explicit One-Step Methods y2×h = yh + h � b1f(x0 + h, yh) + b2f � x0 + h + c2h, y (4) 2×h �� . 
Inserting the formula for yh in the last two lines, one sees that the Butcher tableau of this method is 0 c2 a21 1 b1 b2 1 + c2 b1 b2 a21 b1 b2 b1 b2 . That means, the computation of y2×h is equivalent to the computation of an approximation with the help of an explicit 4-stage Runge–Kutta scheme. Altogether, five function evaluations are needed: f(x0, y0), f � x0 + 2c2h, y (2) 2h � , f � x0 + c2h, y (2) 2×h � , f(x0 + h, yh), f � x0 + h + c2h, y (4) 2×h � . In the case of a s-stage Runge–Kutta method, (3s − 1) function evaluations are required. This number is rather large and the high costs per time step are a disadvantage of the Richardson method. ✷ Remark 1.37. Comparison of both approximations. Consider a one-step meth- od yk+1 = yk + hΦ(x, y, h) of order p. Let the initial value y(x0) be exact, then it follows for the local error in x0 + 2h that y(x0 + 2h) − y2h = C(x0)(2h)p+1 + O � hp+2 � . (1.16) For estimating the local error of y2×h it will be assumed that the incremental function Φ(x, y, h) is Lipschitz continuous in the second argument. This assumption is always satisfied for explicit Runge–Kutta schemes if f(x, y) possesses this property, see the proof of Theorem 1.31. It is y2×h = yh + hΦ (x + h, yh, h) . (1.17) Let ˆy2×h = y(x0 + h) + hΦ (x + h, y(x0 + h), h) (1.18) be the iterate that is computed with the exact starting value in x0 +h. Using the definition of the consistency order, one obtains with (1.17) and (1.18) y(x0 + 2h) − y2×h = (y(x0 + 2h) − ˆy2×h) + (ˆy2×h − y2×h) 1.3 Step Length Control 19 = � C(x0 + h)hp+1 + O � hp+2 � � + � y(x0 + h) + hΦ (x + h, y(x0 + h), h) −yh − hΦ (x + h, yh, h) � . For the terms with the incremental function, one gets from the Lipschitz continuity and the consistency order |hΦ (x + h, y(x0 + h), h) − hΦ (x + h, yh, h)| ≤ hL |y(x0 + h) − yh| � �� � O(h p+1 ) = O � hp+2 � . It follows, applying again the consistency error, that y(x0 + 2h) − y2×h = C(x0 + h)hp+1 + y(x0 + h) − yh + O � hp+2 � = C(x0 + h)hp+1 + C(x0)hp+1 + O � hp+2 � + O � hp+2 � = 2C(x0)hp+1 + O � hp+2 � , (1.19) where one assumes that C(x0 + h) = C(x0) + O(h), i.e., that the constants do not change too rapidly. Neglecting in (1.16) and (1.19) the higher order terms allows to eliminate y(x0 + 2h) and solve for the constant, yielding C(x0) = 1 2 � y2×h − y2h 2p − 1 � 1 hp+1 . (1.20) From (1.19), it follows for the local error of the more accurate method that y(x0 + 2h) − y2×h = y2×h − y2h 2p − 1 + O � hp+2 � . (1.21) The first term on the right-hand side is a computable approximation of this local error. ✷ Remark 1.38. Increasing the accuracy. Rearranging terms in (1.21) gives y(x0 + 2h) − � y2×h + y2×h − y2h 2p − 1 � = O � hp+2 � . Then, y2×h = y2×h + y2×h − y2h 2p − 1 is an approximation of the solution of order p + 1. ✷ 20 1 Explicit One-Step Methods Remark 1.39. Automatic step length control. From (1.21) and (1.20), it follows that err = |y2×h − y2h| 2p − 1 ≈ 2C(x0)hp+1 (1.22) is a computable approximation of the local error. This approximation will be compared with a prescribed tolerance. Often, a so-called scaled tolerance sc is used, (Hairer et al., 1993, p. 167) or (Strehmel et al., 2012, p. 61). The scaled tolerance is a combination of an absolute tolerance atol and a relative tolerance rtol sc = atol + max {|y0| , |y2×h|} rtol. Then, the scaled error errsc = |y2×h − y2h| (2p − 1)sc is defined. • If errsc ≤ 1 ⇐⇒ err ≤ sc, then the performed step will be accepted. Starting from y2×h or y2×h, the next step will be performed. 
If y2×h is used, this approach is also called local Richardson extrapolation. An important aspect is the choice of the step length hnew for the next step. The guideline is that the scaled error for the next step should be on the one hand still smaller than 1 but on the other hand as close to 1 as possible. Following (1.22), it should hold 1 = errnew sc = 2C (x0 + 2h) hp+1 new sc ≈ 2C(x0)hp+1 new sc = 2C(x0)hp+1 sc � hnew h �p+1 ≈ errsc � hnew h �p+1 , i.e., hnew has to be chosen such that hnew ≈ � 1 errsc �1/(p+1) h. (1.23) • If errsc > 1, then the performed step will be rejected. The Richardson method is repeated from (x0, y0) with a step length hnew < h. That means, the work that was spent for performing the step with step length h was wasted. One likes to avoid this situation. ✷ Remark 1.40. Issues of the practical implementation. In practical simulations, one uses some modifications of (1.23) for stabilizing the algorithm. • A safety factor α ∈ (0, 1) is introduced hnew = α � 1 errsc �1/(p+1) h, 1.3 Step Length Control 21 often α ∈ [0.8, 0.9]. • One likes to avoid large oscillations of the sizes of subsequent steps. For this reason, a factor for the maximal increase αmax of the new step size with respect to the current step size and a factor for the maximal decrease αmin < αmax are used. Then, one obtains hnew = h min � αmax, max � αmin, α � 1 errsc �1/(p+1) �� . If a very large step length is proposed α � 1 errsc �1/(p+1) > αmax, then the factor αmax becomes effective and similarly for the case that a very small step length is proposed. • Usually, one prescribes a minimal step length hmin and a maximal step length hmax and requires for all step lengths that hk ∈ [hmin, hmax]. • In the first step, one has to estimate h. Generally, this estimate has to be corrected. In practice, this correction is done very fast by algorithms for automatic step length control. An algorithm for determining a good initial step length can be found in (Hairer et al., 1993, p. 168). ✷ 1.3.2 Embedded Runge–Kutta Schemes Remark 1.41. Motivation, embedded Runge–Kutta schemes. Richardson extrapolation is quite expensive in terms of evaluations of the incremental function. It is possible to construct a step length control that needs less evaluations, with so-called embedded Runge–Kutta schemes. The idea of embedded Runge–Kutta schemes consists in computing numerical approximations of the solution at the next time with two one-step methods with different order. The methods are chosen in such a way that it is possible to use certain evaluations of the incremental function for both of them. That means, one has to construct a Runge–Kutta scheme of the form 0 c2 a21 ... ... cs as1 · · · as,s−1 b1 · · · bs−1 bs ˜b1 · · · ˜bs−1 ˜bs , 22 1 Explicit One-Step Methods x0 + hx0 y0 y(x0 + h) ˜y1 y1 Fig. 1.4 Sketch of embedded Runge–Kutta schemes. such that y1 = y0 + h s� i=1 biKi(x, y) is order of p and ˜y1 = y0 + h s� i=1 ˜biKi(x, y) is of order q, see Figure 1.4. In general, it is q = p − 1 or q = p + 1. ✷ Example 1.42. Runge–Kutta–Fehlberg 2(3). Consider explicit Runge–Kutta schemes with 3 stages 0 c2 a21 c3 a31 a32 b1 b2 b3 p = 2 ˜b1 ˜b2 ˜b3 q = 3 . One of the schemes should be of order 2 and the other one of third order. There are 11 parameters to choose. 
From Theorem 1.26, Theorem 1.27, and Remark 1.28, it follows that 8 equations have to be satisfied c2 = a21, c3 = a31 + a32, b1 + b2 + b3 = 1, b2c2 + b3c3 = 1 2 , ˜b1 + ˜b2 + ˜b3 = 1, ˜b2c2 + ˜b3c3 = 1 2 , 1.3 Step Length Control 23 ˜b2c2 2 + ˜b3c2 3 = 1 3 , ˜b3a32c2 = 1 6 . That means, one has to set three parameters. First, one can choose c2 = 1, b3 = 0. Then, it follows from the first equation that a21 = 1, from the fourth equation that b2 = 1/2, and from the third equation that b1 = 1/2. Now, one chooses c3 = 1/2. From the sixth and seventh equation, it follows that ˜b2 = 1/6 and ˜b3 = 4/6. Then, one gets from the fifth equation ˜b1 = 1/6 and from the eighth equation a32 = 1/4. Finally, the second equation gives a31 = 1/4. The resulting methods have the form 0 1 1 1/2 1/4 1/4 1/2 1/2 0 p = 2 1/6 1/6 4/6 q = 3 . The method with order q = 3 is the Simpson rule. The complete embedded approach is called Runge–Kutta–Fehlberg7 2(3) method (RKF 2(3)). ✷ Remark 1.43. Error estimate, theoretical drawback. By construction, it holds for the embedded scheme that y1 = y(x0 + h) + O � hp+1 � , ˜y1 = y(x0 + h) + O � hq+1 � . It follows that |err| := |˜y1 − y1| = � � �y(x0 + h) + O � hq+1 � − y(x0 + h) + O � hp+1 �� � � = � � �O � hp+1 � + O � hq+1 �� � � (1.24) is an estimate of the main error term of the Runge–Kutta scheme of order q∗ = min{p, q}. That means, one obtains only an estimate of the error of the lower order method. To obtain information only on the lower order method is the main theoretical drawback of this approach, since one is interested actually in the higher order method and one will continue the computation also from the higher order approximation. ✷ Remark 1.44. Automatic step length control, I Controller. Let h be the step size that was used for computing y1 of order p and ˜y1 of order q with p < q. From (1.24), one has |err| = |y1 − ˜y1| = Chp+1 . (1.25) 7 Erwin Fehlberg (1911 – 1972) 24 1 Explicit One-Step Methods Given a tolerance tol for the maximal local error. • One approach consists in controlling the error per step (EPS). Then, one requires that r1 = |err| ≤ tol. (1.26) If this condition is satisfied, then the current step is accepted. Next, one requires for the new step size that the local error is equal to the tolerance Chp+1 new = tol, with C from (1.25) This requirement gives hnew = � tol C �1/(p+1) = � tol Chp+1 �1/(p+1) h. With (1.25) and (1.26), the new step length is computed by hnew = � α tol |err| �1/k h = � α tol r1 �1/k h, (1.27) where k = p + 1 and α ∈ (0, 1) is again a safety factor. • Another way is the consideration of the error relative to the current step length, the so-called error per unit step (EPUS), r1 = |err| h ≤ tol. (1.28) The satisfaction of a condition of form (1.28) leads to a new step of form (1.27) with k = p. • If (1.26) or (1.28) is not satisfied, then the step is rejected and it will be repeated with a step length smaller than h. • A generalization of this approach is the so-called I Controller. Replacing in (1.27) 1/k by kI gives hnew = � α tol r1 �kI h. For obtaining a useful automatic step length control mechanism, the choice kI = 1/k or equivalently kkI = 1 is not necessary. The following choices can be found in the literature kkI ∈ [0, 2] ⇐⇒ kI ∈ [0, 2/k] stable control, kkI ∈ (1, 2) ⇐⇒ kI ∈ (1/k, 2/k) fast and oscillating control, kkI ∈ (0, 1) ⇐⇒ kI ∈ (0, 1/k) slow and smooth control, kkI = 1 ⇐⇒ kI = 1/k standard I Controller. 
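The step length selection described above can be summarized in a few lines of code. The following sketch is an illustration with assumed default values for the safety factor and the bounds (it is not taken from the notes); it implements the accept/reject decision (1.26) and a step size update in the spirit of (1.27), with the safeguards of Remark 1.40.

```python
def new_step_size(h, err, tol, p, alpha=0.85, alpha_min=0.2, alpha_max=5.0,
                  h_min=1e-10, h_max=1.0):
    """Propose a new step size from the error estimate of a method of order p.

    Uses h_new = alpha * (tol / err)**(1/(p+1)) * h, cf. (1.27) with k = p + 1
    (error per step) and the safety factor applied as a prefactor as in
    Remark 1.40, limited by alpha_min, alpha_max and by h_min, h_max.
    Returns (h_new, accepted), where accepted means err <= tol, cf. (1.26).
    """
    factor = alpha * (tol / max(err, 1e-16)) ** (1.0 / (p + 1))
    factor = min(alpha_max, max(alpha_min, factor))
    return min(h_max, max(h_min, factor * h)), err <= tol

# Example: error estimate err = |y1 - y1_tilde| from an embedded pair, lower order p = 2
h_new, accepted = new_step_size(h=0.1, err=3.0e-4, tol=1.0e-4, p=2)
print(accepted, h_new)  # step rejected; repeat from the same node with h_new < h
```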
1.3 Step Length Control 25 There are more sophisticated controllers that are used in practical simulations, see S¨oderlind (2002) for an overview. ✷ Remark 1.45. Methods used in practice. In practice, one uses, e.g., • RKF 4(5), s = 6, Fehlberg (1964), • RKF 7(8), s = 13, Fehlberg (1969), • DOPRI 4(5) (or DOPRI 5(4) or DOPRI5), s = 6, Dormand8 , Prince9 : Dormand & Prince (1980), • DOPRI 7(8), s = 13, Prince & Dormand (1981). The standard routine ode45 from MATLAB uses DOPRI 4(5). ✷ Remark 1.46. Fehlberg trick. The Fehlberg trick requires that Ks = f � xk + csh, yk + h s−1� i=1 asiKi � ! = f � xk + h, yk + h s� i=1 biKi � �� � yk+1 � , i.e., the last evaluation of the incremental function of the old step can be used as first value of the incremental function in the new step. The conditions for applying this trick are asi = bi, i = 1, . . . , s − 1, bs = 0, cs = 1. It can be applied, e.g., in DOPRI 4(5). This trick works only if hold ≈ hnew. ✷ 8 John R. Dormand 9 P. J. Prince Chapter 2 Numerical Methods for Stiff Ordinary Differential Equations 2.1 Stiff Ordinary Differential Equations Remark 2.1. Stiffness. It was observed in Curtiss & Hirschfelder (1952) that explicit methods failed for the numerical solution of initial value problems for ordinary differential equations that model certain chemical reactions. They introduced the notation stiffness for such chemical reactions where the fast reacting components arrive in a very short time in their equilibrium and the slowly changing components are more or less fixed, i.e., stiff. In 1963, Dahlquist found out that the reason for the failure of explicit Runge–Kutta methods is their bad stability, see Section 2.3. It should be emphasized that the stability properties of the equations themselves are good, it is in fact a problem of the explicit methods. There is no unique definition of stiffness in the literature. However, essential properties of stiff systems are as follows: • There exist, for certain initial conditions, solutions that change slowly. • Solutions in a neighborhood of these smooth solutions converge quickly to them. A definition of stiffness can be found in (Strehmel & Weiner, 1995, p. 202), (Strehmel et al., 2012, p. 208). This definition involves a certain norm that depends on the equation and it might be complicated to evaluate this norm. If the solution of (1.1) is sought in the interval [x0, xe] and if the right-hand side of (1.1) is Lipschitz continuous in the second argument with Lipschitz constant L, then an approximation of this definition is as follows. A system of ordinary differential equations is called stiff if L (xe − x0) � 1. (2.1) Another definition of stiffness will be given in Definition 2.28. ✷ Example 2.2. Stiff system of ordinary differential equations. Consider the sys- tem 27 28 2 Numerical Methods for Stiff Ordinary Differential Equations Fig. 2.1 Solutions of Example 2.2, left: first component, right: second component. y� 1 = −80.6y1 + 119.4y2, y� 2 = 79.6y1 − 120.4y2, in (0, 1). This system is a linear system of ordinary differential equations that can be written in the form y� = � −80.6 119.4 79.6 −120.4 � y. Taking as Lipschitz constant, e.g., the l1 norm of the system matrix (column sums), one gets L = 239.8 and condition (2.1) is satisfied. The general solution of this system is, compare Appendix A.2.3, y(x) = c1 � 3 2 � e−x + c2 � −1 1 � e−200x . The constants are determined by the initial condition. If the initial condition is such that c2 = 0, then the solution is smooth for all x > 0. 
Otherwise, if c2 �= 0, then the solutions changes rapidly for small x while approaching the smooth solution, see Figure 2.1 ✷ 2.2 Implicit Runge–Kutta Schemes Remark 2.3. Motivation. If the upper triangular part of the matrix of a Runge–Kutta method, see Definition 1.22, is not identical to zero, the Runge– Kutta method is called implicit. That means, there are increments that depend not only on previously computed increments but also on not yet computed increments. Thus, one has to solve a nonlinear problem for computing these increments. Consequently, the implementation of implicit Runge–Kutta 2.2 Implicit Runge–Kutta Schemes 29 methods is much more involved compared with the implementation of explicit Runge–Kutta methods. Generally, performing one step of an implicit method is much more time-consuming than for an explicit method. However, the great advantage of implicit methods is that they can be used for the numerical simulation of stiff systems, see the stability theory in Section 2.3. ✷ Remark 2.4. Derivation of implicit Runge–Kutta methods. Implicit Runge– Kutta schemes can be derived from the integral representation (1.7) of the initial value problem. One can show that for each implicit Runge–Kutta scheme with the weights bj and the nodes xk + cjh there is a corresponding quadrature rule with the same weights and the same nodes, see the section on Gaussian quadrature in Numerical Mathematics I. ✷ Example 2.5. Gauss–Legendre quadrature. Consider the interval [xk, xk +h] = [xk, xk+1]. Let c1, . . . , cs be the roots of the Legendre polynomial Ps(t) with the arguments t = 2 h (x − xk) − 1 =⇒ t ∈ [−1, 1]. There are s mutually distinct real roots in (−1, 1). After having computed c1, . . . , cs, one can determine the coefficients aij, bj such that one obtains a method of order 2s, see Example 2.8. ✷ Remark 2.6. Simplifying order conditions. The order conditions for an implicit Runge–Kutta scheme with s stages are the same as given in Theorems 1.26, 1.27, and Remark 1.28. These conditions lead to a nonlinear system of equations for computing the parameters of the scheme. These computations are generally quite complicated. A useful tool for solving this problem are the so-called simplifying order conditions, introduced in Butcher (1964): B(p) : s� i=1 bick−1 i = 1 k , k = 1, . . . , p, C(l) : s� j=1 aijck−1 j = 1 k ck i , i = 1, . . . , s, k = 1, . . . , l, (2.2) D(m) : s� i=1 bick−1 i aij = 1 k bj � 1 − ck j � , j = 1, . . . , s, k = 1, . . . , m, with 00 = 1. One can show that for sufficiently large values l and m, the conditions C(l) and D(m) can be reduced to B(p) with appropriate p. ✷ Remark 2.7. Interpretation of B(p) and C(l). Consider the initial value prob- lem 30 2 Numerical Methods for Stiff Ordinary Differential Equations y� (x) = f(x), y(x0) = 0. With the fundamental theorem of differential calculus, one sees that this problem has the solution y(x0 + h) = � x0+h x0 f(ξ) dξ = h � 1 0 f (x0 + hθ) dθ. A Runge–Kutta method with s stages gives y1 = h s� i=1 bif (x0 + cih) . Consider in particular the case that f(x) is a polynomial f(x) = (x−x0)k−1 , k ∈ N \ {0}. Then, the analytical solution has the form y(x0 + h) = h � 1 0 (hθ) k−1 dθ = (hθ)k k � � � � � θ=1 θ=0 = hk k . (2.3) The Runge–Kutta scheme yields y1 = h s� i=1 bi(cih)k−1 = hk s� i=1 bick−1 i . (2.4) Comparing (2.3) and (2.4), one can observe that condition B(p) means that the quadrature rule that is the basis of the Runge–Kutta method is exact for polynomials of degree (p − 1). 
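This interpretation is easy to check numerically. The following sketch is an illustration and not part of the notes; it verifies the conditions $B(k)$ for the weights and nodes of the implicit midpoint rule and of the classical Runge–Kutta scheme from Example 1.33.

```python
import numpy as np

def check_B(b, c, k_max):
    """Return the largest k <= k_max such that B(k) holds, i.e.
    sum_i b_i c_i^{k-1} = 1/k for k = 1, ..., k_max (up to round-off)."""
    b, c = np.asarray(b, float), np.asarray(c, float)
    p = 0
    for k in range(1, k_max + 1):
        if abs(np.dot(b, c ** (k - 1)) - 1.0 / k) > 1e-12:
            break
        p = k
    return p

# Implicit midpoint rule (s = 1 Gauss-Legendre scheme): B(2), order 2s = 2
print(check_B([1.0], [0.5], 6))                              # -> 2
# Classical Runge-Kutta scheme (Example 1.33): B(4)
print(check_B([1/6, 1/3, 1/3, 1/6], [0.0, 0.5, 0.5, 1.0], 6))  # -> 4
```

Note that $B(p)$ alone does not determine the order; the conditions $C(l)$ and $D(m)$ constrain the matrix $A$ as well.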
Condition C(1) is (1.13) with the upper limit s ci = s� j=1 aij, i = 1, . . . , s. (2.5) ✷ Example 2.8. Classes of implicit Runge–Kutta schemes. • Gauss–Legendre schemes. The nodes of the Gauss–Legendre quadrature are used. A method with s stages possesses the maximal possible order 2s, where all nodes are in the interior of the intervals. To get the optimal order, one has to show that B(2s), C(s), D(s) are satisfied, see (Strehmel et al., 2012, Section 8.1.2), i.e., s� i=1 bick−1 i = 1 k , k = 1, . . . , 2s, s� j=1 aijck−1 j = 1 k ck i , i = 1, . . . , s, k = 1, . . . , s, (2.6) 2.2 Implicit Runge–Kutta Schemes 31 s� i=1 bick−1 i aij = 1 k bj � 1 − ck j � , j = 1, . . . , s, k = 1, . . . , s. An example is the implicit mid point rule, whose coefficients can be derived by setting s = 1 in (2.6). One obtains the following conditions b1 = 1, b1c1 = 1 2 , a11 = c1, b1a11 = b1 (1 − c1) . Consequently, the implicit mid point rule is given by 1/2 1/2 1 . • Gauss–Radau1 methods. These methods are characterized by the feature that one of the end points of the interval [xk, xk+1] belongs to the nodes. A method of this class with s stages has at most order 2s − 1. Examples (s = 1): ◦ 0 1 1 s = 1, p = 1, ◦ 1 1 1 s = 1, p = 1, implicit Euler scheme. The first scheme does not satisfy condition (2.5). • Gauss–Lobatto2 methods. In these methods, both end points of the interval [xk, xk+1] are nodes. A method of this kind with s stages cannot be of higher order than (2s − 2). Examples: ◦ trapezoidal rule, Crank3 –Nicolson4 scheme 0 0 0 1 1/2 1/2 1/2 1/2 s = p = 2. ◦ other scheme 0 1/2 0 1 1/2 0 1/2 1/2 s = 2, p = 2. The second scheme does not satisfy condition (2.5). ✷ 1 Rodolphe Radau (1835 – 1911) 2 Rehuel Lobatto (1797 – 1866) 3 John Crank (1916 – 2006) 4 Phyllis Nicolson (1917 – 1968) 32 2 Numerical Methods for Stiff Ordinary Differential Equations Remark 2.9. Diagonally implicit Runge–Kutta methods (DIRK methods). For an implicit Runge–Kutta method with s stages, one has to solve a coupled nonlinear system for the increments K1(x, y), . . . , Ks(x, y). This step is expensive for a large number of stages s. A compromise is the use of so-called diagonally implicit Runge–Kutta (DIRK) methods c1 a11 0 0 · · · 0 c2 a21 a22 0 · · · 0 c3 a31 a32 a33 · · · 0 ... ... ... ... cs as1 as2 · · · ass b1 b2 · · · bs−1 bs . In DIRK methods, one has to solve s independent nonlinear equations for the increments. In the equation for Ki(x, y), only the stages K1(x, y), . . . , Ki(x, y) appear, where K1(x, y), . . . , Ki−1(x, y) were already computed. ✷ 2.3 Stability Theory Remark 2.10. On the stability theory. The stability theory studies numerical methods for solving the linear initial value problem y� (x) = λy(x), y(0) = 1, λ ∈ C. (2.7) It will turn out the even at the simple initial value problem (2.7) the most important stability properties of numerical methods can be explored. The solution of (2.7) is y(x) = eλx . If the initial condition will be slightly perturbed to be 1+δ0, then the solution of the perturbed initial value problem is ˜y(x) = (1 + δ0)eλx = eλx + δ0eλx . If λ = a + ib with a = Re(λ) > 0, then the difference |y(x) − ˜y(x)| = � � �δ0eλx � � � = |δ0| |eax | � � �e ibx � � � = |δ0| |eax | becomes for each δ0 �= 0 arbitrarily large if x is sufficiently large. That means, the initial value problem (2.7) is not stable in this case. In this situation, one cannot expect that any numerical method is stable. Hence, this situation is not of interest for numerical simulations. 
In contrast, if Re(λ) < 0, then the difference |y(x) − ˜y(x)| becomes arbitrarily small and the initial value problem is stable, i.e., small changes of the 2.3 Stability Theory 33 data result only in small changes of the solution. This case is of interest for the stability theory of methods for solving ordinary differential equations. This section considers one-step methods with equidistant meshes with step size h. The solution of (2.7) in the node xk+1 = (k + 1)h is y(xk+1) = eλxk+1 = eλ(xk+h) = eλh eλxk = eλh y(xk) =: ez y(xk), with z := λh ∈ C, Re(z) ≤ 0. Now, it will be studied how the step from xk to xk+1 looks like for different one-step methods. In particular, large steps are of interest, i.e., |z| → ∞. ✷ Example 2.11. Behavior of different one-step methods for one step of the model problem (2.7). 1. Explicit Euler method. The general form of this method is yk+1 = yk + hf (xk, yk) . In particular, one obtains for (2.7) yk+1 = yk + hλyk = (1 + z) yk =: R(z)yk. It holds, independently of Re(z), that lim|z|→∞ |R(z)| = ∞. 2. Implicit Euler method. This method has the form yk+1 = yk + hf (xk+1, yk+1) . For applying it to (2.7), one can rewrite it as follows yk+1 = yk + hλyk+1 ⇐⇒ (1 − z)yk+1 = yk ⇐⇒ yk+1 = 1 1 − z yk = � 1 + z 1 − z � yk =: R(z)yk. For this method, one has, independently of Re(z), that lim|z|→∞ |R(z)| = 0. 3. Trapezoidal rule. The general form of this method is yk+1 = yk + h 2 (f(xk, yk) + f(xk+1, yk+1)) , which can be derived from the Butcher tableau given in Example 2.8. For the linear differential equation (2.7), one gets 34 2 Numerical Methods for Stiff Ordinary Differential Equations yk+1 = yk + h 2 (λyk + λyk+1) ⇐⇒ � 1 − z 2 � yk+1 = � 1 + z 2 � yk ⇐⇒ yk+1 = 1 + z/2 1 − z/2 yk = � 1 + z 1 − z/2 � yk =: R(z)yk. Let z = 2r (cos(φ) + i sin(φ)). Inserting this expression gives lim |z|→∞ � � � � 1 + z/2 1 − z/2 � � � � = lim r→∞ � � � � 1 + r (cos(φ) + i sin(φ)) 1 − r (cos(φ) + i sin(φ)) � � � � = lim r→∞ � � � � 1/r + (cos(φ) + i sin(φ)) 1/r − (cos(φ) + i sin(φ)) � � � � = |(cos(φ) + i sin(φ))| |− (cos(φ) + i sin(φ))| = 1 1 = 1. Hence, one has that lim|z|→∞ |R(z)| = 1 for the trapezoidal rule, independently of φ, and with that independently of Re(z). The function R(z) describes for each method the step from xk to xk+1. Thus, this function is an approximation of ez , which has for different methods different properties, e.g., the limit for |z| → ∞. ✷ Definition 2.12. Stability function. Let 1 = (1, . . . , 1)T ∈ Rs , ˆC = C ∪ ∞, where ∞ has to be understood as in function theory (Riemann sphere), and consider a Runge–Kutta method with s stages and with the parameters (A, b, c). Then, the function R : ˆC → ˆC, z �→ 1 + zbT (I − zA)−1 1 (2.8) is called stability function of the Runge–Kutta method. ✷ Remark 2.13. Stability functions from Example 2.11. All stability functions from Example 2.11 can be written in the form (2.8). One obtains, e.g., for the trapezoidal rule b = � 1/2 1/2 � , I − zA = � 1 0 −z 2 1 − z 2 � , (I − zA) −1 = 1 1 − z 2 � 1 − z 2 0 z 2 1 � , from what follows that 1 + zbT (I − zA)−1 1 = 1 + z 1 − z/2 � 1 2 − z 4 + z 4 + 1 2 � = 1 + z 1 − z/2 . ✷ Theorem 2.14. Form of the stability function of Runge–Kutta methods. Given a Runge–Kutta scheme with s stages and with the parameters 2.3 Stability Theory 35 (A, b, c), then the stability function R(z) is a rational function defined on ˆC, whose polynomial order in the numerator and in the denominator is at most s. 
The poles of this functions might be only at values that correspond to the inverse of an eigenvalue of A. For an explicit Runge–Kutta scheme, R(z) is a polynomial. Proof. Consider first an explicit Runge–Kutta scheme. In this case, the matrix A is a strictly lower triangular matrix. Hence, I − zA is a triangular matrix with the values one at its main diagonal. This matrix is invertible and it is (I − zA) −1 = I + zA + . . . + z s−1 A s−1 , (2.9) which can be checked easily by multiplication with (I − zA) and using that A s = 0 since A is strictly lower triangular. It follows from (2.8) and (2.9) that R(z) is a polynomial in z of degree at most s. Now, the general case will be considered. The expression (I −zA) −1 1 can be interpreted as the solution of the linear system of equations (I − zA)ζ = 1. Using the Cramer rule, one finds that the i-th component of the solution has the form ζi = det Ai det(I − zA) , where Ai is the matrix that is obtained by replacing the i-th column of (I − zA) by the right-hand side, i.e., by 1. The numerator of ζi is a polynomial in z of order at most (s−1) since there is one column where z does not appear. The denominator is a polynomial of degree at most s. Multiplying with zb T from the left-hand side gives just a rational function with polynomials of at most degree s both in the numerator and in the denominator. There is only one case where this approach does not work, namely if det(I − zA) = det(z(I/z − A)) = z s det(I/z − A) = 0, i.e., if 1/z is an eigenvalue of A. � Theorem 2.15. Solution of the initial value problem (2.7) obtained with a Runge–Kutta scheme. Consider a Runge–Kutta method with s stages and with the parameters (A, b, c). If z−1 = (λh)−1 is not an eigenvalue of A, then the Runge–Kutta scheme is well-defined for the initial value problem (2.7). In this case, it is yk = (R(hλ))k , k = 0, 1, 2, . . . . Proof. The statement of the theorem follows directly if one writes the Runge–Kutta scheme for (2.7) and applies induction. exercise � Definition 2.16. Stability domain. The stability domain of a one-step method is the set S := {z ∈ ˆC : |R(z)| ≤ 1}. ✷ Remark 2.17. Desirable property for the stability domain. The stability domain of the initial value problem (2.7) is, see Remark 2.10, 36 2 Numerical Methods for Stiff Ordinary Differential Equations Sanal = C− 0 := {z ∈ C : Re(z) ≤ 0}, since R(z) = ez . In this domain, the solution decreases (for Re(z) < 0) or its absolute value is constant (for Re(z) = 0). A desirable property of a numerical method is that it should be stable for all parameters where the initial value problem is stable, i.e., C− 0 ⊆ S. ✷ Definition 2.18. A-stable method. If for the stability domain S of a onestep method, it holds that C− 0 ⊆ S, then this one-step method is called A-stable. ✷ Lemma 2.19. Property of an A-stable method. Consider an A-stable one-step method, then it is |R(∞)| ≤ 1. Proof. By the assumption C − 0 ⊆ S, the absolute value of the stability function is bounded from above by 1 for all |z| → ∞ with Re(z) ≤ 0. From Theorem 2.14, it follows that the stability function has to be a rational function where the polynomial degree of the numerator is not larger than the polynomial degree of the denominator, since otherwise the function is unbounded for |z| → ∞. It is known from function theory that such rational functions are continuous in ∞. Hence, it is |R(∞)| ≤ 1. � Remark 2.20. On A-stable methods. 
The behavior of the stability function for $|z| \to \infty$, $z \in \mathbb{C}_0^-$, is of utmost interest, since it describes the length of the steps that is admissible for a given $\lambda$ such that the method is still stable. However, the property $|R(\infty)| \le 1$ alone does not imply that the step length can be chosen arbitrarily large without losing the stability of the method. ✷

Definition 2.21. Strongly A-stable method, L-stable method. An A-stable one-step method is called strongly A-stable if it satisfies in addition $|R(\infty)| < 1$. It is called L-stable (left stable) if it even holds that $|R(\infty)| = 0$. ✷

Example 2.22. Stability of some one-step methods. The types of stability defined in Definitions 2.18 and 2.21 are of utmost importance for the quality of a numerical method.

1. Explicit Euler method. It is $R(z) = 1 + z$, i.e., the stability domain is the closed circle with radius 1 and center $(-1, 0)$, see Figure 2.2. This method is not A-stable. For $|\lambda|$ large, one has to use very small steps in order to get stable simulations. The smallness of the step lengths for stable simulations of stiff problems is the basic problem of all explicit methods.

2. Implicit Euler method. One has for this method $R(z) = 1/(1 - z)$. The stability domain is the complete complex plane without the open circle with radius 1 and center $(1, 0)$, see Figure 2.2. Hence, the method is A-stable. From Example 2.11, it is already known that $|R(\infty)| = 0$, such that the implicit Euler method is even L-stable. A smallness condition on the step lengths does not arise for this method, at least for the model problem (2.7). In general, one can apply with the implicit Euler method much larger steps than, e.g., with the explicit Euler method. Step size restrictions arise, e.g., from the physics of the problem and from the required accuracy of the simulations. However, as for every implicit scheme, one has to solve in general a nonlinear equation in each node. Thus, the numerical costs and the computing time per step are usually much larger than for explicit schemes.

Fig. 2.2 Stability domain of the explicit Euler method (left) and the implicit Euler method (right).

3. Trapezoidal rule. For the trapezoidal rule, one gets with $z = a + ib$, $a, b \in \mathbb{R}$,
$$|R(z)|^2 = \left|\frac{1 + z/2}{1 - z/2}\right|^2 = \left|\frac{1 + a/2 + ib/2}{1 - a/2 - ib/2}\right|^2 = \frac{(2 + a)^2 + b^2}{(2 - a)^2 + b^2}.$$
Thus, $|R(z)| \le 1$ if $|2 + a| \le |2 - a|$, compare Figure 2.3, i.e.,
$$|R(z)| \begin{cases} < 1 & \text{for } a < 0 \iff \operatorname{Re}(z) < 0, \\ = 1 & \text{for } a = 0 \iff \operatorname{Re}(z) = 0, \\ = 1 & \text{for } z = \infty, \end{cases}$$
see also Example 2.11. Hence, one obtains $S = \mathbb{C}_0^-$. This method is A-stable but not L-stable. However, in contrast to the implicit Euler method, which is a first order method, the trapezoidal rule is a second order method. ✷

Fig. 2.3 Illustration to the trapezoidal rule, Example 2.22.

Remark 2.23. Linear systems of ordinary differential equations. Consider the linear system of ordinary differential equations with constant coefficients
$$y'(x) = Ay(x), \quad y(0) = y_0, \quad A \in \mathbb{R}^{n \times n}, \ y_0 \in \mathbb{R}^n. \qquad (2.10)$$
The solution of (2.10) has the form
$$y(x) = e^{Ax} y_0,$$
where $Ax$ is defined component-wise, as a multiplication of a scalar with a matrix. The first factor on the right-hand side is the matrix exponential. ✷

Definition 2.24. Matrix exponential. Let $A \in \mathbb{R}^{n \times n}$ and $A^0 := I$, $A^1 := A$, $A^2 := AA$, \ldots, $A^k := A^{k-1} A$. The matrix exponential is defined by
$$e^A := \exp(A) : \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}, \quad A \mapsto \sum_{k=0}^{\infty} \frac{A^k}{k!}.$$
✷

Lemma 2.25.
Properties of the matrix exponential. The matrix exponential has the following properties: i) The series ∞� k=0 Ak k! converges absolutely for all A ∈ Rn×n , like in the real case n = 1. ii) If the matrices A, B ∈ Rn×n are commuting, i.e., if AB = BA holds, then it follows that eA eB = eA+B . iii) The matrix � eA �−1 ∈ Rn×n exists for all A ∈ Rn×n and it holds � eA �−1 = e−A . This property corresponds to ex �= 0 for the scalar case. 2.3 Stability Theory 39 iv) It holds rank � eA � = n, det � eA � �= 0. v) The matrix-valued function R → Rn×n , x �→ eAx , where Ax is defined component-wise, is continuously differentiable with respect to x with d dx eAx = AeAx . The derivative of the exponential is the first factor in this matrix product. The formula looks the same as in the scalar case. Proof. i) with comparison test with a majorizing series, using that the corresponding series with real argument converges for all real numbers, see literature, ii) follows from i), exercise, iii) follows from ii), exercise, iv) follows from iii), v) direct calculation with difference quotient, exercise. � Example 2.26. Matrix exponential. There are only few classes of matrices that allow an easy computation of the matrix exponential: diagonal matrices and nilpotent matrices. 1. Consider A =   1 0 0 0 2 0 0 0 3   =⇒ Ak =    1k 0 0 0 2k 0 0 0 3k    . It follows that eAx = ∞� k=0 (Ax)k k! = ∞� k=0 1 k!    xk 0 0 0 (2x)k 0 0 0 (3x)k    =    �∞ k=0 x k k! 0 0 0 �∞ k=0 (2x) k k! 0 0 0 �∞ k=0 (3x) k k!    =   ex 0 0 0 e2x 0 0 0 e3x   . 2. This example illustrates property ii) of Lemma 2.25. For the matrices A = � 2 0 0 3 � , B = � 0 1 0 0 � , it is possible to calculate the corresponding series easily, since B is a nilpotent matrix (B2 = 0). One obtains eA = � e2 0 0 e3 � , eB = I + B = � 1 1 0 1 � . It holds AB �= BA and 40 2 Numerical Methods for Stiff Ordinary Differential Equations eA eB = � e2 e2 0 e3 � �= � e2 e3 0 e3 � = eB eA . Assume that eA eB = eA+B . Since eA+B = eB+A , it follows that then also eB eA = eB+A = eA+B = eA eB , which is a contradiction to the calculations from above. ✷ Remark 2.27. Extension of the stability theory to linear systems. Consider system (2.10). It will be assumed that the matrix A possesses n eigenvalues λ1, . . . , λn ∈ C. Further, it will be assumed that this matrix can be diagonalized, i.e., there exists a matrix Q ∈ Rn×n such that Λ = Q−1 AQ, with Λ = diag(λ1, . . . , λn). This property is given, e.g., if all eigenvalues are mutually different. The columns qi of Q are the eigenvectors of A. With the substitution y(x) = Qz(x) =⇒ y� (x) = Qz� (x), one obtains the differential equation Qz� (x) = AQz(x) ⇐⇒ z� (x) = Q−1 AQz(x) = Λz(x). The equations of this system are decoupled. Its general solution is given by z(x) = eΛx c = � cieλix � i=1,...,n . It follows that the general solution of (2.10) has the form y(x) = Qz(x) = n� i=1 cieλix qi. Inserting this expression in the initial condition yields y(0) = n� i=1 ciqi = Qc = y0 =⇒ c = Q−1 y0. Hence, one obtains the following solution of the initial value problem y(x) = n� i=1 � Q−1 y0 � i eλix qi, (2.11) 2.3 Stability Theory 41 where � Q−1 y0 � i is the i-th component of Q−1 y0. Now, one can easily see that the solution is stable (small changes of the initial data lead to small changes of the solution) only if all eigenvalues have a negative real part. The study of numerical methods makes sense only in the case that the problem is well posed, i.e., all eigenvalues have a negative real part. 
Then, the most important term in (2.11) with respect to stability is the term with the eigenvalue of A with the largest absolute value of its real part, since for the stability, the absolute values of the product of the real parts of the eigenvalues and the step length are important. ✷ Definition 2.28. Stiff system of ordinary differential equations. The linear system of ordinary differential equations y� (x) = Ay(x), A ∈ Rn×n , is called stiff, if all eigenvalues λi of A possess a negative real part and if q := max{|Re(λi)| , i = 1, . . . , n} min{|Re(λi)| , i = 1, . . . , n} � 1. Sometimes, the system is called weakly stiff if q ≈ 10 and stiff if q > 10. ✷ Remark 2.29. On Definition 2.28. Definition 2.28 has a disadvantage. The ratio becomes large also in the case that the eigenvalue with the smallest absolute value of the real part is close to zero. However, this eigenvalue is not important for the stability of numerical methods, only the eigenvalue with the largest absolute value of the real part. ✷ Remark 2.30. Local stiffness for general ordinary differential equations. The concept of stiffness can be extended in some sense from linear differential equations to general differential equations. The differential equation y� (x) = f(x, y(x)) can be transformed, by introducing the functions y(x) := x and ˜y(x) := � y(x) y(x) � , to the autonomous form ˜y� (x) = ˜f (˜y(x)) = � f(x, y(x)) 1 � . By linearizing at the initial value ˜y0, one obtains a differential equation of the form ˜y� (x) = A˜y(x). Applying some definition of stiffness to the linearized equation, it is possible to define a local stiffness for the general equation. 42 2 Numerical Methods for Stiff Ordinary Differential Equations However, if one considers nonlinear problems, one has to be careful in the interpretation of the results. In general, the results are valid only locally, i.e., in a neighborhood of the point of linearization, and they do not describe the stability of a numerical method in the whole domain of definition of the nonlinear problem. ✷ 2.4 Rosenbrock Methods Remark 2.31. Goal. From the stability theory, it became obvious that one has to use implicit methods for stiff problems. However, implicit methods are computationally expensive, one has to solve in general nonlinear problems in each step. The goal consists in constructing implicit methods that have on the one hand a reduced computational complexity but on the other hand, they should be still accurate and stable. ✷ Remark 2.32. Linearly implicit Runge–Kutta methods. Consider, without loss of generality, the autonomous initial value problem in Rn y� (x) = f(y), y(0) = y0, compare Remark 1.30. DIRK methods, see Remark 2.9, have a Butcher tableau of the form c1 a11 0 0 · · · 0 c2 a21 a22 0 · · · 0 c3 a31 a32 a33 · · · 0 ... ... ... ... cs as1 as2 · · · ass b1 b2 · · · bs−1 bs . One has to solve s decoupled nonlinear equations Kj = f � yk + h j−1� l=1 ajlKl + hajjKj � , j = 1, . . . , s. (2.12) This fixed point equation can be solved with a fixed point iteration. As a special fixed point iteration, the quasi Newton method for solving the j-th equation leads to an iterative scheme of the form K (m+1) j = K (m) j (2.13) − � I − ajjhJ �−1 � K (m) j − f � yk + h j−1� l=1 ajlKl + hajjK (m) j �� � �� � residual , 2.4 Rosenbrock Methods 43 m = 0, 1, . . .. The derivative with respect to Kj of the corresponding nonlinear problem to (2.12) with right-hand side 0 is I − ajjh ���� ∂y ∂Kj ∂yf � yk + h j−1� l=1 ajlKl + hajjKj � ∈ Rn×n . 
In (2.13), one uses usually the approximation of the derivative J = ∂yf(yk) instead of the derivative at the current iterate, hence it is a quasi Newton method. If the step length h is sufficiently small, then the matrix � I − ajjhJ � is non-singular, since then it is sufficiently close to the identity, and the linear systems of equations possess a unique solution. Often, it turns out to be sufficient for reaching the required accuracy to perform just one step of the iteration. This statement holds in particular if the step length is sufficiently small and if a sufficiently accurate start value K (0) j is available. One utilizes the ansatz (linear combination of the already computed increments) K (0) j := j−1� l=1 djl ajj Kl, where the coefficients djl, l = 1, . . . , j −1, still need to be determined. Applying just one step in (2.13) with this ansatz, one obtains an implicit method with linear systems of equations of the form � I − ajjhJ � Kj = f � yk + h j−1� l=1 � ajl + djl � Kl � − hJ j−1� l=1 djlKl, j = 1, . . . , s, yk+1 = yk + h s� j=1 bjKj. (2.14) This class of methods is called linearly implicit Runge–Kutta methods. Linearly implicit Runge–Kutta methods are still implicit methods. One has to solve in each step only s linear systems of equations. That means, these methods are considerably less computationally complex than the original implicit methods and the first goal stated in Remark 2.31 is achieved. Now, one has to study which properties of the original methods are transferred to the linearly implicit methods. In particular, stability is of importance. If stability will be lost, then the linearly implicit methods are not suited for solving stiff differential equations. ✷ Theorem 2.33. Stability of linearly implicit Runge–Kutta methods. Consider a Runge–Kutta method with the parameters (A, b, c), where A ∈ Rs×s is a non-singular lower triangular matrix (which was used for the derivation of (2.14)). Then, the corresponding linearly implicit Runge–Kutta 44 2 Numerical Methods for Stiff Ordinary Differential Equations method (2.14) with J = ∂yf(yk) has the same stability function R(z) as the original method, independently of the choice of {djl}. Proof. The linearly implicit method will be applied to the one-dimensional (to simplify notations) test problem y � (x) = λy(x), y(0) = 1, with Re(λ) < 0. Since f(y) = λy, one obtains J = λ. The j-th equation of (2.14) has the form � 1 − ajjhλ � Kj = λ � yk + h j−1� l=1 � ajl + djl � Kl � − hλ j−1� l=1 djlKl = λyk + hλ j−1� l=1 ajlKl, j = 1, . . . , s. Multiplication with h gives with z = λh Kjh − z j� l=1 ajlKlh = zyk, j = 1, . . . , s. This equation is equivalent, using matrix-vector notation, to (I − zA) Kh = zyk1, K = (K1, . . . , Ks) T . Let h be chosen in such a way that z −1 is not an eigenvalue of A. Then, one obtains by inserting this equation in the second equation of (2.14) yk+1 = yk + hb T K = yk + hb T (I − zA) −1 1 z h yk = � 1 + zb T (I − zA) −1 1 � yk = R(z)yk. Now one can see that in the parentheses there is the stability function R(z) of the original Runge–Kutta method, see (2.8). � Remark 2.34. On the stability and consistency. Since the most important stability properties of a numerical method for solving initial value problems with ordinary differential equations depend only on the stability function, these properties transfer from the original implicit Runge–Kutta method to the corresponding linearly implicit method. The choice of the coefficients {djl} will influence the order of the linearly implicit method. 
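As an illustration of the structure of (2.14), the following minimal sketch (Python/NumPy as an assumed tool) performs one step of a linearly implicit Runge–Kutta method. The function name, the test problem, and the Jacobian in the usage part are hypothetical and not taken from these notes.

```python
import numpy as np

def linearly_implicit_rk_step(f, J, yk, h, A, b, d):
    """One step of a linearly implicit Runge-Kutta method of the form (2.14).

    A : (s, s) lower triangular Runge-Kutta matrix (a_jl) with nonzero diagonal
    b : (s,)   weights b_j
    d : (s, s) strictly lower triangular matrix of the coefficients d_jl
    J : approximation of the Jacobian of f at yk
    """
    s, n = len(b), len(yk)
    K = np.zeros((s, n))
    I = np.eye(n)
    for j in range(s):
        u = sum(((A[j, l] + d[j, l]) * K[l] for l in range(j)), np.zeros(n))
        v = sum((d[j, l] * K[l] for l in range(j)), np.zeros(n))
        # one linear solve per stage with the system matrix (I - a_jj h J), see (2.14)
        K[j] = np.linalg.solve(I - A[j, j] * h * J, f(yk + h * u) - h * J @ v)
    return yk + h * (b @ K)

# Usage sketch: with s = 1 and a_11 = b_1 = 1 (the tableau of the implicit Euler
# method) and no d-coefficients, one obtains a linearly implicit Euler step.
f = lambda y: np.array([-50.0 * y[0]])     # hypothetical stiff test problem
J = np.array([[-50.0]])                    # Jacobian of f at yk
y_next = linearly_implicit_rk_step(f, J, np.array([1.0]), 0.1,
                                   np.array([[1.0]]),    # A
                                   np.array([1.0]),      # b
                                   np.array([[0.0]]))    # d
print(y_next)
```

Note that only s linear solves with the matrices (I − a_jj hJ) occur per step; no nonlinear equation has to be solved.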
For an inappropriate choice of these coefficients, the order of the linearly implicit method might be lower than the order of the original method. ✷ Example 2.35. Linearly implicit Euler method. The implicit Euler method has the Butcher tableau 1 1 1 . With (2.14), it follows that the linearly implicit Euler method has the form � I − h∂yf(yk) � K1 = f (yk) , yk+1 = yk + hK1. 2.4 Rosenbrock Methods 45 The linearly implicit Euler method is L-stable, like the implicit Euler method, and one has to solve in each step only one linear system of equations. There are no coefficients {djl} to be chosen in this method. ✷ Remark 2.36. Rosenbrock5 methods. Another possibility for simplifying the use of linearly implicit methods and decreasing the numerical costs consists in using for all increments the same coefficient ajj = a. In this case, all linear systems of equations in (2.14) possess the same system matrix (I − ahJ). Then, one needs only one LU decomposition of this matrix and can solve all systems in (2.14) with this decomposition. This approach is called Rosenbrock methods or Rosenbrock–Wanner6 methods (ROW methods) (I − ahJ) Kj = f � yk + h j−1� l=1 � ajl + djl � Kl � − hJ j−1� l=1 djlKl, j = 1, . . . , s, yk+1 = yk + h s� j=1 bjKj. (2.15) In practice, it is often even possible to use the same approximation J of the Jacobian for some subsequent steps. This is true in particular, if the solution changes only slowly. In this way, one can save additional computational costs. ✷ Example 2.37. The method ode23s. In MATLAB, one can find for solving stiff initial value problems with ordinary differential equations the Rosenbrock method ode23s, see Shampine & Reichelt (1997). This method has the form (I − ahJ) K1 = f(yk), a = 1 2 + √ 2 ≈ 0.2928932, (I − ahJ) K2 = f � yk + 1 2 hK1 � − ahJK1, (2.16) yk+1 = yk + hK2. From the equation for the second increment, it follows that d21 = a. Then, one obtains with (2.15) a21 = 1/2 − d21 = 1/2 − a. Using the condition that the nodes are the sums of the rows of the matrix, it follows that the corresponding Butcher tableau looks like a a 1/2 1/2 − a a 0 1 . ✷ 5 Howard H. Rosenbrock (1920 – 2010) 6 Gerhard Wanner, born 1942 46 2 Numerical Methods for Stiff Ordinary Differential Equations Theorem 2.38. Order of ode23s. The Rosenbrock method ode23s is of second order if h ∈ (0, 1/(2a �J�2)). Proof. Let h ∈ (0, 1/(2a �J�2)), where �·�2 denotes the spectral norm of J, which is induced by the Euclidean vector norm �·�2. It can be shown, see class Computer Mathematics, that the matrix (I − ahJ) is invertible if �ahJ�2 < 1. This condition is satisfied for the choice of h from above. Let K be the solution of (I − ahJ)K = f. (2.17) Then, one obtains with the triangle inequality, with the compatibility of the Euclidean vector norm and the spectral matrix norm, and with the choice of h that �(I − ahJ)K�2 ≥ �K�2 − ah �JK�2 ≥ �K�2 − ah �J�2 �K�2 ≥ �K�2 − a �J�2 2a �J�2 �K�2 = 1 2 �K�2 . It follows with (2.17) that 1 2 �K�2 ≤ �(I − ahJ)K�2 = �f�2 =⇒ �K�2 ≤ 2 �f�2 . (2.18) Thus, the solution of the linear system of equations is bounded by the right-hand side. One obtains for the first increment of ode23s by recursive insertion, using (2.16), K1 = f(yk) + ahJK1 = f(yk) + ahJ (f(yk) + ahJK1) = f(yk) + ahJf(yk) + h 2 a 2 J 2 K1 = f(yk) + ahJf(yk) + O � h 2 � . (2.19) The last step is allowed since K1 is bounded by the data of the problem (the right-hand side f(yk)) independently of h, see (2.18) where the constant in the estimate is 2. 
Using a Taylor series expansion and considering only first order terms explicitly, one obtains in a similar way for the second increment of ode23s K2 = f � yk + 1 2 hK1 � − ahJK1 + ahJK2 = f (yk) + 1 2 h∂yf (yk) K1 − ahJK1 + ahJK2 + O � h 2 � (2.20) (2.19) = f (yk) + 1 2 h∂yf (yk) f (yk) − ahJf(yk) + ahJK2 + O � h 2 � (2.20) = f (yk) + 1 2 h∂yf (yk) f (yk) − ahJf(yk) + ahJf(yk) + O � h 2 � = f (yk) + 1 2 h∂yf (yk) f (yk) + O � h 2 � . Inserting these results in (2.16) gives for one step of ode23s yk+1 = yk + hf (yk) + 1 2 h 2 ∂yf (yk) f (yk) + O � h 3 � . (2.21) The Taylor series expansion of the solution y(x) of the system of differential equations in xk has the form, using the differential equation and the chain rule, y(xk+1) = y(xk) + hy � (xk) + h 2 2 y �� (xk) + O � h 3 � 2.4 Rosenbrock Methods 47 = y(xk) + hf (yk) + h 2 2 ∂f (y) ∂x (xk) + O � h 3 � = y(xk) + hf (yk) + h 2 2 ∂yf (yk) y � (xk) + O � h 3 � = y(xk) + hf (yk) + h 2 2 ∂yf (yk) f (yk) + O � h 3 � . Starting with the exact value at xk, then the first three terms of (2.21) correspond to the Taylor series expansion of the solution y(x) of the system of differential equations in xk. Thus, it follows that the local error is of order O � h 3 � , from what follows that the consistency order of ode23s is two, see Definition 1.14. � Remark 2.39. To the proof of Theorem 2.38. Note that it is not needed in the proof of Theorem 2.38 that J is the exact derivative ∂yf (yk). The method ode23s remains a second order method if J is only an approximation of ∂yf (yk) and even if J is an arbitrary matrix. However, the transfer of the stability properties from the original method to ode23s is only guaranteed for the choice J = ∂yf (yk), see Theorem 2.33. ✷ Theorem 2.40. Stability function of ode23s. Assume that J = ∂yf(yk), then the stability function of the Rosenbrock method ode23s has the form R(z) = 1 + (1 − 2a)z (1 − az)2 . (2.22) Proof. The statement of the theorem follows from applying the method to the usual test equation, exercise. � Corollary 2.41. Stability of ode23s. If J = ∂yf(yk), then the Rosenbrock method ode23s is L-stable. Proof. The statement is obtained by applying the definition of L-stability to the stability function (2.22). � Remark 2.42. On the order of ode23s. It remains the question whether an appropriate choice of J might even increase the order of the method. However, for the model problem of the stability analysis, a series expansion of the stability function shows that the exponential function is reproduced exactly only to the quadratic term. From this observation, it follows that one does not obtain a third order method even with exact Jacobian. In practice, there is no important reason from the point of view of accuracy to compute a new Jacobian in each step. Often, it is sufficient to update J every now and then. ✷ Chapter 3 Multi-Step Methods 3.1 Definition Remark 3.1. Multi-step methods. The characteristic feature of one-step methods is that they need for computing yk+1 only the value from the previous approximation yk of the solution. A straightforward extension consists in constructing methods that use for computing yk+1 more than one of the previous approximations yk, yk−1, . . .. Such methods are called multi-step methods. ✷ Definition 3.2. q-step method, linear q-step method. A q-step method with q ≥ 1 is a numerical method for approximately solving y� (x) = f (x, y(x)) , y(x0) = y0, (3.1) where yk+1 depends on yk+1−q but not on yi with i < k + 1 − q. 
A q-step method is called linear, if it has the form yk+1 = q−1� j=0 ajyk−j + h q−1� j=0 bjf � xk−j, yk−j � + hb−1f (xk+1, yk+1) , (3.2) k = q − 1, q, . . . , with q ≥ 1, a0, . . . , aq−1, b−1, . . . , bq−1 ∈ R, aq−1 �= 0 or bq−1 �= 0. For q = 1, the method is called a one-step method. If b−1 �= 0, then the linear q-step method is an implicit method, otherwise it is an explicit method. ✷ Remark 3.3. Initial values for a q-step method. A q-step method needs q initial values. However, the initial value problem (3.1) provides only the initial value y0. The second initial value y1 can be computed with a one-step method, the next initial value y2 with a one-step method or with a two-step method and so on. It follows that all initial values yi, i > 0, are already numerical approximations. This aspect has to be taken into account in the error analysis of multi-step methods, see Remark 3.23. ✷ 49 50 3 Multi-Step Methods integral xp interpolationxp−r xp−j xp+m Fig. 3.1 Parameters in the derivation of predictor-corrector schemes. 3.2 Predictor-Corrector Methods Remark 3.4. Construction. Starting point of the construction of predictorcorrector methods is the equivalent integral form of the initial value problem (3.1) y(x) = y0 + � x x0 f (t, y(t)) dt. (3.3) Denote the solution at ˜x by y(˜x), then it holds that y(x) = y(˜x) + � x ˜x f (t, y(t)) dt. (3.4) The main idea of predictor-corrector methods consists in approximating the integral on the right-hand side of (3.4) in an appropriate way. There are two principal difficulties: • The dependency of the term in the integral on t is generally not known since the function y(t) is unknown. • Even if the dependency of the function in the integral on t is known, generally it will be impossible to find an analytic expression of the solution. Consider an equidistant grid with nodes xi = x0 + ih, i = 0, 1, . . . . For the derivation of the methods, assume that the term in the integral is known. Then, the derivation is similar to the derivation of the Newton1 – Cotes2 formulas for numerical quadrature. In this approach, the term in the integral of (3.4) is replaced by a polynomial interpolant. Let the boundaries of the integral be the nodes ˜x = xp−j, starting point with parameter j, x = xp+m end point with parameter m, (3.5) with parameters j, m ∈ N0 that need yet to be determined. It will be required that the interpolation polynomial pr(x) satisfies the following properties: • the degree of pr(x) is lower than or equal to r, • pr(xi) = f(xi, y(xi)) for i = p, p − 1, . . . , p − r. 1 Isaac Newton (1642 – 1727) 2 Roger Cotes (1682 – 1716) 3.2 Predictor-Corrector Methods 51 Thus, xp is the most right-hand side node for computing the interpolation polynomial. The value r is a third parameter, compare Figure 3.1. Note that two sets of nodes are involved in the construction, namely the nodes that determine the boundaries of the integral and the nodes that are used to define the interpolation polynomial. The solution of this interpolation problem is given by the Lagrange3 interpolation polynomial pr(x) = r� i=0 f � xp−i, y(xp−i) � Li(x) with Li(x) = r� l=0,l�=i x − xp−l xp−i − xp−l , i = 0, 1, . . . , r. (3.6) It follows by using (3.4), (3.5), and (3.6) that yp+m ≈ yp−j + r� i=0 f � xp−i, y(xp−i) � � x ˜x Li(t) dt = yp−j + h r� i=0 βif � xp−i, y(xp−i) � (3.7) with βi = 1 h � x ˜x Li(t) dt = 1 h � x ˜x   r� l=0,l�=i t − xp−l xp−i − xp−l   dt. The constructed method is in particular linear. 
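The weights βi in (3.7) can be computed numerically for arbitrary interpolation nodes by integrating the Lagrange basis polynomials. The following minimal sketch (Python/SciPy as an assumed tool; the function name is hypothetical) does this and, for two interpolation nodes xk, xk−1 with integration from xk to xk+1, recovers the weights 3/2 and −1/2 that reappear in Example 3.5 below.

```python
import numpy as np
from scipy.integrate import quad

def beta_coefficients(nodes, x_lower, x_upper, h):
    """Weights beta_i of (3.7): beta_i = (1/h) * integral of the Lagrange basis
    polynomial L_i over [x_lower, x_upper], for arbitrary interpolation nodes."""
    betas = []
    for i in range(len(nodes)):
        def L_i(t, i=i):
            val = 1.0
            for l, xl in enumerate(nodes):
                if l != i:
                    val *= (t - xl) / (nodes[i] - xl)
            return val
        integral, _ = quad(L_i, x_lower, x_upper)
        betas.append(integral / h)
    return betas

# Interpolation nodes x_k and x_{k-1}, integration from x_k to x_{k+1} on an
# equidistant grid; this recovers the weights 3/2 and -1/2, see Example 3.5.
h, xk = 0.1, 0.0
print(beta_coefficients([xk, xk - h], xk, xk + h, h))   # approx [1.5, -0.5]
```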
Note that so far the assumption of having an equidistant grid was not used. Finally, the formula for βi should be simplified. To this end, note that all fixed values from the interval are nodes of the equidistant grid, such that, e.g., xp = x0 + ph. Replacing these values and using the substitution t = xp + sh =⇒ dt = hds, yields βi = 1 h � m −j   r� l=0,l�=i xp + sh − xp−l xp−i − xp−l   h ds = � m −j   r� l=0,l�=i x0 + ph + sh − x0 − ph + lh x0 + ph − ih − x0 − ph + lh   ds 3 Joseph Louis Lagrange (1736 – 1813) 52 3 Multi-Step Methods = � m −j   r� l=0,l�=i s + l −i + l   ds. (3.8) Now, different methods can be obtained depending on the choice of m, j, and r and by replacing the generally unknown value y(xp−i) in (3.7) with yp−i. There are four important classes of methods. ✷ Example 3.5. Adams4 –Bashforth5 methods. The class of q-step Adams–Bashforth methods is given by m = 1, j = 0, and r = q−1. It follows that the q-step Adams–Bashforth method uses the nodes xk+1−q, . . . , xk for computing the Lagrangian interpolation polynomial. These are q nodes and pq(x) is at most of degree q − 1. Adams–Bashforth methods are explicit methods. They have the general form yk+1 = yk + h q−1� i=0 βif (xk−i, yk−i) , (3.9) see (3.7), with βi = � 1 0   q−1� l=0,l�=i s + l −i + l   ds, (3.10) compare (3.8). In the case q = 1, the term in the integral in (3.4) is replaced by a constant interpolation polynomial with the node (xk, f(xk, yk)). Using the convention that the product is 1 if there is formally no factor in (3.10), this approach yields yk+1 = yk + h �� 1 0 ds � f (xk, yk) = yk + hf (xk, yk) , i.e., one obtains the explicit Euler method. If q = 2, then the term in the integral is approximated by a linear interpolation polynomial with the nodes (xk−1, f(xk−1, yk−1)) and (xk, f(xk, yk)). Using (3.9) and (3.10), one obtains yk+1 = yk + h ��� 1 0 s + 1 1 ds � f (xk, yk) + �� 1 0 s −1 ds � f (xk−1, yk−1)) � = yk + h � 3 2 f (xk, yk) − 1 2 f (xk−1, yk−1) � = yk + h 2 � 3f (xk, yk) − f (xk−1, yk−1) � . q ≥ 3, exercise ✷ 4 John Couch Adams (1819 – 1892) 5 Francis Bashforth (1819 – 1912) 3.2 Predictor-Corrector Methods 53 Example 3.6. Adams–Moulton6 methods. Adams–Moulton methods are defined by m = 0, j = 1, and r = q. Hence, it follows that βi = � 0 −1   q� l=0,l�=i s + l −i + l   ds and from (3.7) that yk = yk−1 + h q� i=0 βif (xk−i, yk−i) or, by transforming the index, yk+1 = yk + h q� i=0 βif (xk+1−i, yk+1−i) . The q + 1 nodes of these methods are given by xk+1−q, . . . , xk, xk+1. That means, Adams–Moulton methods are implicit methods. This class contains two one-step methods that are obtained for q = 0 (which can be used in contrast to the requirement in Definition 3.2) and q = 1. Note that the parameter q in (3.2) determines both the previous approximations to be used and the previous arguments of the function f. But in the construction of the methods, three independent parameters were introduced to determine these values. This construction introduces some freedom which allows here to set q = 0. Considering the case q = 0, then the term in the integral is replaced by a constant interpolation polynomial with the node at (xk+1, f(xk+1, yk+1)). This approach results in the method yk+1 = yk + h �� 0 −1 ds � f (xk+1, yk+1) = yk + hf (xk+1, yk+1) , which is the implicit Euler method. For q = 1, one uses a linear interpolation polynomial with the points (xk, f(xk, yk)) and (xk+1, f(xk+1, yk+1)). 
One gets yk+1 = yk + h ��� 0 −1 s + 1 1 ds � f (xk+1, yk+1) + �� 0 −1 s −1 ds � f (xk, yk) � = yk + h � 1 2 f (xk+1, yk+1) + 1 2 f (xk, yk) � = yk + h 2 [f (xk+1, yk+1) + f (xk, yk)] . 6 Forest Ray Moulton (1872 – 1952) 54 3 Multi-Step Methods This method is the trapezoidal rule. ✷ Example 3.7. Nystr¨om7 methods. The class of Nystr¨om methods is obtained by using m = 1, j = 1, and r = q − 1. They have the form yk+1 = yk−1 + h q−1� i=0 βif (xk−i, yk−i) with βi = � 1 −1   q−1� l=0,l�=i s + l −i + l   ds. These methods are explicit and one uses the q nodes xk+1−q, . . . , xk. One gets, e.g., for q = 1, the method yk+1 = yk−1 + h �� 1 −1 ds � f (xk, yk) = yk−1 + 2hf (xk, yk) . ✷ Example 3.8. Milne8 method. Milne methods are defined by m = 0, j = 2, and r = q. Using a transform of the index, one finds that they have the form yk+1 = yk−1 + h q� i=0 βif (xk+1−i, yk+1−i) with βi = � 0 −2   q� l=0,l�=i s + l −i + l   ds. Thus, these are implicit methods. ✷ Remark 3.9. On the coefficients of multi-step methods. One can find tables with the coefficients for multi-step methods in the literature. ✷ Remark 3.10. Using implicit methods in practice, predictor-corrector methods. If implicit methods are used, then one has to solve in each node xk+1 an equation that is generally nonlinear. This step can be performed with some kind of fixed point iteration, e.g., with a method of Newton-type. To achieve a good efficiency of the method, a good initial iterate is of importance. To obtain a good initial iterate, one can use an explicit (multi-step) method. For 7 Evert J. Nystr¨om (1895 – 1960) 8 William Edwin Milne (1890 – 1971) 3.3 Convergence of Multi-Step Methods 55 this reason, explicit multi-step methods are called predictor methods and implicit multi-step methods are called corrector methods. The combination of a predictor method with a corrector method is called predictor-corrector method. Often, it is sufficient for computing the next iterate to perform the predictor step and one or two corrector steps. ✷ Remark 3.11. Nordsieck9 form. It is possible to transform multi-step methods in a one-step form, the so-called Nordsieck form. This form uses instead of yk, . . . , yk−q+1, f (xk, yk) , . . . , f � xk−q+1, yk−q+1 � , the values yk, y� (xk), y�� (xk), . . . , y(q) (xk), see, e.g., (Strehmel et al., 2012, Section 4.4.3). The advantage of the Nordsieck form consists in the possibility of applying a step length control as it is known from one-step methods, Section 1.3. Otherwise, a step length control for form (3.2) of multi-step methods becomes rather complicated. On the other hand, using the Nordsieck form requires that the solution of the initial value problem is q times continuously differentiable. ✷ 3.3 Convergence of Multi-Step Methods Remark 3.12. Generalities. In this section, linear multi-step methods of the form (3.2) will be considered. Similarly to one-step methods, notations like local error, consistency, or order of convergence will be introduced. The extension of these notations to nonlinear multi-step methods is straightforward. ✷ Definition 3.13. Local error. Let yk+1 be the results of (3.2), k ≥ q, where the initial values are exactly the values of the solution yk+1−q = y(xk+1−q), . . . , yk = y(xk). Then, the local error is defined by le(xk+1) = lek+1 = y(xk+1)−   q−1� j=0 ajy � xk−j � + h q−1� j=−1 bjf � xk−j, y(xk−j) �   . (3.11) ✷ 9 Arnold Nordsieck (1911 – 1971) 56 3 Multi-Step Methods Definition 3.14. Consistent method, consistency order. 
Let y(x) be the solution of the initial value problem (3.1), S = {(x, y) : x ∈ I = [x0, xe], y ∈ R}, and IN an equidistant mesh on I with N intervals. The multi-step method (3.2) is called consistent if for all f ∈ C(S), which satisfy in S a Lipschitz condition with respect to y, it holds lim h→0 � max xk∈IN le(xk + h) h � = 0, with h = xe − x0 N . (3.12) If the expression on the left-hand side converges like hp for p ≥ 1, then the multi-step scheme has the consistency order p. ✷ Example 3.15. Consistency order for a Nystr¨om method. The consistency order of a multi-step method can be computed in the same way as for a one-step method by expanding the local error in a Taylor series with respect to h. After having then divided by h, the order of the first non-vanishing term gives the consistency order. Consider the Nystr¨om method for q = 3 yk+1 = yk−1 + h ��� 1 −1 2� l=1 s + l l ds � f (xk, yk) +   � 1 −1 2� l=0,l�=1 s + l −1 + l ds   f (xk−1, yk−1) + �� 1 −1 1� l=0 s + l −2 + l ds � f (xk−2, yk−2) � = yk−1 + h � 7 3 f (xk, yk) − 2 3 f (xk−1, yk−1) + 1 3 f (xk−2, yk−2) � . It follows with (3.11) and (3.1) that le(xk+1) = y(xk+1) − y(xk−1) −h � 7 3 f (xk, y(xk)) − 2 3 f (xk−1, y(xk−1)) + 1 3 f (xk−2, y(xk−2)) � = y(xk+1) − y(xk−1) − h � 7 3 y� (xk) − 2 3 y� (xk−1) + 1 3 y� (xk−2) � . (3.13) Now, the the individual terms will be expanded y(xk+1) = y(xk + h) = y(xk) + hy� (xk) + h2 2 y�� (xk) + h3 6 y��� (xk) + h4 24 y(4) (xk) + O(h5 ), 3.3 Convergence of Multi-Step Methods 57 y(xk−1) = y(xk − h) = y(xk) − hy� (xk) + h2 2 y�� (xk) − h3 6 y��� (xk) + h4 24 y(4) (xk) + O(h5 ), y� (xk−1) = y� (xk − h) = y� (xk) − hy�� (xk) + h2 2 y��� (xk) − h3 6 y(4) (xk) + O(h4 ), y� (xk−2) = y� (xk − 2h) = y� (xk) − 2hy�� (xk) + 2h2 y��� (xk) − 4h3 3 y(4) (xk) + O(h4 ). Inserting these expressions in formula (3.13) for the local error gives le(xk+1) = y(xk) + hy� (xk) + h2 2 y�� (xk) + h3 6 y��� (xk) + h4 24 y(4) (xk) −y(xk) + hy� (xk) − h2 2 y�� (xk) + h3 6 y��� (xk) − h4 24 y(4) (xk) − 7h 3 y� (xk) + 2 3 � hy� (xk) − h2 y�� (xk) + h3 2 y��� (xk) − h4 6 y(4) (xk) � − 1 3 � hy� (xk) − 2h2 y�� (xk) + 2h3 y��� (xk) − 4h4 3 y(4) (xk) � + O(h5 ) = h4 3 y(4) (xk) + O(h5 ). With (3.12), one obtains that this method has consistency order 3. ✷ Remark 3.16. Linear multi-step methods with a high order of convergence. The goal in constructing multi-step methods consists of course in obtaining convergent methods of high order. A high order of convergence can be expected only if the consistency order is high, i.e., if the local error is small. Using the Taylor series expansion of the terms in the local error and requiring that as many leading terms as possible vanish, one gets a linear system of equations for determining the coefficients aj, bj, j = 0, . . . , q − 1 and b−1 in (3.11). In this way, one obtains a method of the form yk+1 − q−1� j=0 ajyk−j = h q−1� j=−1 bjf � xk−j, yk−j � . (3.14) Constructing one-step methods in this way, one always obtains a convergent one-step method, e.g., compare Example 1.29. However, the situation might be different for multi-step methods. ✷ 58 3 Multi-Step Methods Example 3.17. Non-convergent multi-step method. Consider the idea from Remark 3.16 for the construction of an explicit linear multi-step method with q = 2 and with maximal order of consistency. That means, the ansatz for the method is as follows, compare (3.14), yk+1 − a0yk − a1yk−1 = h [b0f (xk, yk) + b1f (xk−1, yk−1)] . 
The local error has the form le(xk+1) = y (xk+1) − a0y (xk) − a1y (xk−1) − hb0f (xk, y(xk)) −hb1f (xk−1, y(xk−1)) = y (xk+1) − a0y (xk) − a1y (xk−1) − hb0y� (xk) − hb1y� (xk−1). Now, the individual terms are expanded in powers of h: y (xk+1) = y (xk + h) = y (xk) + hy� (xk) + h2 2 y�� (xk) + h3 6 y��� (xk) + O(h4 ), y (xk−1) = y (xk − h) = y (xk) − hy� (xk) + h2 2 y�� (xk) − h3 6 y��� (xk) + O(h4 ), y� (xk−1) = y� (xk − h) = y� (xk) − hy�� (xk) + h2 2 y��� (xk) + O(h3 ). Inserting the expansions gives le(xk+1) = y (xk) + hy� (xk) + h2 2 y�� (xk) + h3 6 y��� (xk) − a0y (xk) −a1 � y (xk) − hy� (xk) + h2 2 y�� (xk) − h3 6 y��� (xk) � − hb0y� (xk) −hb1 � y� (xk) − hy�� (xk) + h2 2 y��� (xk) � + O(h4 ) = [1 − a1 − a0] y (xk) + [1 + a1 − b1 − b0] hy� (xk) + � 1 2 − a1 2 + b1 � h2 y�� (xk) + � 1 6 + a1 6 − b1 2 � h3 y��� (xk) + O(h4 ). Requiring that the first four terms should vanish leads to the following linear system of equations     1 1 0 0 −1 0 1 1 1/2 0 −1 0 −1/6 0 1/2 0         a1 a0 b1 b0     =     1 1 1/2 1/6     . The unique solution of this system is a1 = 5, a0 = −4, b1 = 2, b0 = 4. Consequently, one obtains the method 3.3 Convergence of Multi-Step Methods 59 yk+1 = −4yk + 5yk−1 + h [4f (xk, yk) + 2f (xk−1, yk−1)] (3.15) with third order of consistency. Next, the convergence of the method will be studied at the model initial value problem y� (x) = −y(x), y(0) = 1, with the solution y(x) = exp(−x). As second initial condition, one takes the value of the solution in the mesh point x1 = h, i.e., y1 = exp(−h). Inserting the special form of the right-hand side of the model problem, f(xk, yk) = −yk, in (3.15), one can represent the computed solution explicitly. This solution satisfies the homogeneous linear difference equation yk+1 + (4 + 4h)yk + (−5 + 2h)yk−1 = 0. The solution of this difference equation can be obtained with the ansatz yk = ξk . Inserting this ansatz in the difference equation leads to ξk+1 + (4 + 4h)ξk + (−5 + 2h)ξk−1 = 0. This equation is satisfied for ξ = 0. Other solutions can be obtained after division by ξk−1 from ξ2 + (4 + 4h)ξ + (−5 + 2h) = 0. (3.16) One gets the solutions ξ1(h) = −2 − 2h + 3 � 1 + 2 3 h + 4 9 h2 , ξ2(h) = −2 − 2h − 3 � 1 + 2 3 h + 4 9 h2 . For simplicity, the dependency on h will be neglected in the notation. The general solution of the difference equations can be represented as a linear combination of the special solutions (superposition principle) yk = C1ξk 1 + C2ξk 2 . Now, the constants can be determined from the initial conditions. It holds y0 = C1 + C2 = 1, y1 = e−h = C1ξ1 + C2ξ2, from what follows that C1(h) = e−h − ξ2 ξ1 − ξ2 , C2(h) = − e−h − ξ1 ξ1 − ξ2 . Expanding ξ1(h), ξ2(h), C1(h) and C2(h) in powers of h and inserting these expansions in the solution (exercise), gives for fixed x > 0 and hN := x/N 60 3 Multi-Step Methods yN = � 1 + O � x N �� � 1 − x N + O �� x N �2 ��N − 1 216 � x N �4 � 1 + O � x N �� � −5 − 3 x N + O �� x N �2 ��N . Considering now the convergence of the method, i.e., hN → 0 ⇐⇒ N → ∞. Then, one obtains for the first term, using well known properties of the exponential, that lim N→∞ � 1 + O � x N �� � 1 − x N + O �� x N �2 ��N = e−x . This part approximates the solution of the model problem. For the second term, it holds that − 1 216 � x N �4 � 1 + O � x N �� � −5 − 3 x N + O �� x N �2 ��N = − (−5)N 216 � x N �4 � 1 + O � x N �� � 1 + 3 5 x N + O �� x N �2 ��N . 
Since lim N→∞ � 1 + 3 5 x N + O �� x N �2 ��N = e3x/5 , one finds that the second term behaves for large N as follows − (−5)N 216 � x N �4 e3x/5 . (3.17) This expression oscillates with increasing N and the modulus is increasing for finer grids (‘exponential (−5)N is stronger than polynomial (x/N)4 ’), compare the values for x = 1 in the following table N value of (3.17) 1 0.0421787 2 - 0.1054467 3 0.3514890 4 - 1.3180836 5 5.2723345 6 - 21.96806 7 94.14883 8 - 411.90113 9 1830.6717 10 - 8238.0226 It follows that the method does not converge. 3.3 Convergence of Multi-Step Methods 61 Such an oscillatory behavior can be observed also if the method is applied for solving other initial value problems. For the considered example, the reason for this behavior is that the general solution of the difference equation contains a term that becomes arbitrarily large for large k and for small h (or large N). For the considered method it is lim h→0 ξ2(h) = −5 =⇒ lim k→∞ � � �ξ k 2 (h) � � � = ∞. The solution of the difference equation was obtained from the roots of the polynomial (3.16). It can be guessed that the roots of this polynomial will be of importance for the convergence of multi-step methods. ✷ Definition 3.18. Null stable linear multi-step method. A linear q-step method is called null stable if the first characteristic polynomial Ψ(ξ) = ξq − a0ξq−1 − . . . − aq−1 (3.18) possesses only roots ξq with � �ξq � � ≤ 1 that are simple in the case that � �ξq � � = 1. For the notation ‘null stable’, compare Remark 3.37 below. ✷ Example 3.19. Null stability for predictor-corrector methods. The methods from the four most important classes of predictor-corrector methods are null stable. • Adams–Bashforth methods, Adams–Moulton methods. The first characteristic polynomial has the form Ψ(ξ) = ξq − ξq−1 = (ξ − 1) ξq−1 . The only non-trivial root is ξq = 1 and this root is simple. • Nystr¨om methods, Milne methods. For these methods, the first characteristic polynomial is Ψ(ξ) = ξq − ξq−2 = (ξ + 1) (ξ − 1) ξq−2 . Hence, the only non-trivial roots are ξq = 1 and ξq = −1. They are simple. Null stability does not mean stable in the sense that the method can be applied for the numerical solution of stiff problems, see Example 3.22. ✷ Theorem 3.20. First Dahlquist10 barrier. The maximal order of consistency of a null stable linear q-step method is p =    q + 1 for q odd, q + 2 for q even, q if b−1 ≤ 0, in particular, if the method is explicit. 10 Germund Dahlquist (1925 – 2005) 62 3 Multi-Step Methods Proof. Only a sketch of the proof is given here, for details see the literature, e.g., (Strehmel et al., 2012, Section 4.2.3) or (Hairer et al., 1993, Section III.3). First, one sets for ξ ∈ C, |ξ| < 1, z = ξ − 1 ξ + 1 . Then, one defines the polynomials R(z) = � 1 − z 2 �q Ψ(ξ) = q� l=0 αlz l , S(z) = � 1 − z 2 �q σ(ξ) = q� l=0 βlz l , with σ(ξ) = b−1ξ q + b0ξ q−1 + . . . + bq−1. (3.19) As next step, one can prove that a linear multi-step method has consistency order p if and only if R(z) � ln 1 + z 1 − z �−1 − S(z) = O � z p� for z → 0. Using a Taylor series expansion of the term with the logarithm, one has on the left-hand side of this statement a polynomial. Now, one studies which coefficients of this polynomial might vanish such that the method is null stable in the individual cases given in the theorem. � Example 3.21. Consistency order of some predictor-corrector methods. 
• Adams–Bashforth methods with q steps have the consistency order q and Adams–Moulton methods with q steps possess the consistency order q+1. Thus, Adams–Moulton methods where q is even have an order that is less than the maximal possible order according to Theorem 3.20. • The 2-step Milne method (also Milne–Simpson method) yk+1 = yk−1 + h � 1 3 f(xk+1, yk+1) + 4 3 f(xk, yk) + 1 3 f(xk−1, yk−1) � (3.20) has the consistency order 4. This method achieves the maximal order of consistency for a null stable 2-step method. ✷ Example 3.22. Stability of the 2-step Milne method. This implicit method is null stable, see Example 3.19, and it possesses the maximal possible order of consistency for a null stable method, see Example 3.21. Thus, so far it shows favorable properties. But having a closer look on its stability reveals that this method has a severe drawback. Consider again the model initial value problem y� (x) = λy(x), y(0) = 1, 3.3 Convergence of Multi-Step Methods 63 with the solution y(x) = exp(λx). Applying the 2-step Milne method for the solution of this problem, then the method (3.20) has the form yk+1 = yk−1 + hλ � 1 3 yk+1 + 4 3 yk + 1 3 yk−1 � . This equation can be rewritten as a linear difference equation � 1 − hλ 3 � yk+1 − 4hλ 3 yk − � 1 + hλ 3 � yk−1 = 0. The general solution of this difference equation can be represented in the form yk = C1ξk 1 + C2ξk 2 , (3.21) where ξ1(h) and ξ2(h) are the solutions of the quadratic equation � 1 − hλ 3 � ξ2 − 4hλ 3 ξ − � 1 + hλ 3 � = 0. One obtains ξ1(h) = 3 3 − hλ  2hλ 3 + � 1 + (hλ)2 3   , ξ2(h) = 3 3 − hλ  2hλ 3 − � 1 + (hλ)2 3   . Now, the constants C1, C2 can be determined from the initial condition and from the value of the first step. It is for x = 0 C1 + C2 = 1 (3.22) and for x = h, taking the exact value, eλh = C1ξ1 + (1 − C1)ξ2 = (1 − C2) ξ1 + C2ξ2. (3.23) Expanding ξ1(h) and ξ2(h) in powers of h at h = 0, one obtains as first order approximation by computing the derivatives of ξ1(h) and ξ2(h), respectively, and inserting zero ξ1(h) = 1 + λh + O � h2 � , ξ2(h) = −1 + λ 3 h + O � h2 � . (3.24) In the interesting case, λ < 0, ξ1(h) approaches 1 from left, with values smaller than 1, and ξ2(h) approaches −1 also from left, but here the modulus of ξ2(h) is larger than 1. The last property leads to undesired effects. 64 3 Multi-Step Methods For the approximate solution in the node xk = kh, k = 0, 1, . . ., one gets with (3.21) yk = C1 � 1 + λh + O � h2 ��xk/h + C2 � −1 + λ 3 h + O � h2 ��xk/h . (3.25) The first term converges to exp(λxk) for h → 0, since lim h→0 (1 + λh) xk/h = lim h→0 � 1 + λxk h xk �xk/h = exp(λxk). It behaves like the solution of the model initial value problem. The second term behaves for small h like (−1)xk/h � 1 − λ 3 h �xk/h . Here, the second factor converges to exp(−λxk/3), but the first factor oscillates for xk/h ∈ N. That means, for the stable initial value problem with λ < 0, this term gives an oscillatory, bounded (for fixed xk), but exponentially large perturbation. The behavior of the solution depends on the constants C1 and C2. Inserting the expansion (3.24) in condition (3.23) gives eλh = (1 − C2) � 1 + λh + O � h2 �� + C2 � −1 + λ 3 h + O � h2 �� = 1 + λh − 2C2 − 2λh 3 C2 + O � h2 � . An expansion of the exponential yields 1 + λh + O � h2 � = 1 + λh − 2C2 − 2λh 3 C2 + O � h2 � =⇒ O � h2 � = −2C2 − 2λh 3 C2. It follows that C2(h) = O � h2 � and from (3.22), it follows that C1 = O (1). In summary, it is for the second term of (3.25) lim h→0 C2(h) � −1 + λ 3 h + O � h2 ��xk/h � �� � bounded = 0. 
The method converges. However, the term 3.3 Convergence of Multi-Step Methods 65 Fig. 3.2 Example 3.22: application of the 2-step Milne method to the model problem with λ = −5 and h ∈ {0.1, 0.01, 0.001} (left to right, top to bottom). C2(h) � −1 + λ 3 h + O � h2 ��xk/h ≈ ±h2 exp � − λxk 3 � becomes small in the case λ � −1 and large xk only if the step size h is very small, see Figure 3.2. The behavior found for this method can be observed in practice for all qstep methods of consistency order q+2 if these methods are applied to initial value problems with exponentially decaying solution. This kind of instability is a strong restriction of the usefulness of these methods. ✷ Remark 3.23. Start of multi-step methods and convergence. Apart of the consistency of multi-step methods, one is above all interested in their convergence. For one-step methods, convergence follows from consistency under rather general assumptions and the order of consistency and convergence are the same, see Theorem 1.19. The situation becomes more complicated for multi-step methods. First of all, one needs for starting a q-step method besides the known initial value y0 = y(x0) still (q − 1) further approximations y1, . . . , yq−1 for y(x1), . . . , y(xq−1). These values can be computed, for instance by a onestep method. The accuracy of these approximations has a strong influence on the accuracy of the q-step method that uses these values. Assume that the 66 3 Multi-Step Methods approximations behave as follows y0 = y(x0), y1 = y(x1) + ε1(h), . . . , yq−1 = y(xq−1) + εq−1(h). Then, the values that are computed with the q-step method depend also on the perturbations ε1(h), . . . , εq−1(h) and one should write for the computed solution in the node xk more exactly yk(ε, h), where ε(x, h) is a function for which εi(h) = ε(xi, h), i = 1, . . . , q − 1, holds. ✷ Definition 3.24. Global error. Let y(x) be the solution of the initial value problem (3.1). Denote the approximations of y(x) that are computed with a multi-step method with step length h by yk(ε, h), where the accuracy of the initial approximations is given by the function ε(x, h). Then, the quantity e(xk, ε, h) := yk(ε, h) − y(xk) is called global error or global discretization error at the node xk with respect to the step length h and the perturbations ε(x, h). ✷ Definition 3.25. Convergence of a multi-step method. Consider the ordinary differential equation of the initial value problem (3.1) in [a, b] and let x0 ∈ [a, b]. A multi-step method for solving initial value problems of form (3.1) is called convergent if lim n→∞ e(x, ε, hn) = 0, with hn = x − x0 n , for all x ∈ [a, b], for all f ∈ C1 ([a, b] × R), and for all functions ε(x, h) with lim n→∞ |ε(x, hn)| = 0, for x = x0 + ihn, i = 0, . . . , q − 1. ✷ Lemma 3.26. A convergent linear multi-step methods is null stable. A convergent linear multi-step method (3.2) is null stable. Proof. The proof is performed by contradiction. Assume that the linear multi-step method is not null stable. Consider the initial value problem y � (x) = 0, y(0) = 0, whose solution is y(x) = 0. Applying a linear multi-step method of form (3.2) to this problem yields the homogeneous linear difference equation yk+1 − q−1� j=0 ajyk−j = 0. (3.26) Since the method is assumed to be not null stable, the corresponding first characteristic polynomial Ψ(ξ) has a root with |ξ1| > 1 or a root |ξ2| = 1 that is not simple. Without loss 3.3 Convergence of Multi-Step Methods 67 of generality, let the multiplicity of ξ1 be one and of ξ2 be two. 
Similarly to Example 3.17, one finds that the solution of (3.26) in the node xk = kh is given by yk = C1ξ k 1 + C2kξ k 2 , C1, C2 ∈ R, where one of these coefficients is not zero. Consider a fixed x with x = mh, m ∈ N. Choosing C1 = C2 = √ h, where it will be discussed below that this is an admissible choice, the solution in x is given by √ hξ x/h 1 + x √ h ξ x/h 2 . (3.27) For the initial value x = 0, the value of (3.27) is √ h and for the initial value x = h, it is √ hξ1 + √ hξ2. Thus, for the initial values, (3.27) converges to the analytic solution as h → 0, so that the choices of C1 and C2 were admissible. However, for other values of x, which is assumed to be fix, both terms in (3.27) diverge. This observation contradicts the assumed convergence of the linear multi-step method. Hence, it is null stable. � Theorem 3.27. Connection of convergence and null stability. Let yk+1 = q−1� j=0 ajyk−j + hΦ(xk+1, . . . , xk+1−q, yk+1, . . . , yk+1−q, h) (3.28) be a consistent multi-step method for the solution of initial value problems of form (3.1), which is more general than a linear multi-step method. Assume that the incremental function satisfies the following conditions: i) Φ(xk+1, . . . , xk+1−q, yk+1, . . . , yk+1−q, h) ≡ 0 for all x ∈ [a, b], all yk ∈ R, and all h ∈ R if f(x, y) ≡ 0. ii) Lipschitz continuity with respect to the y-components, i.e., there are constants h0 > 0 and M such that � �Φ(xq, . . . , x0, vq, . . . , v0, h) − Φ(xq, . . . , x0, wq, . . . , w0, h) � � ≤ M q� i=0 |vi − wi| for all xq, . . . , x0 ∈ [a, b], all vi, wi ∈ R, i = 0, . . . , q, and all step sizes h with h < h0. Then, the multi-step method converges if and only if it is null stable. Proof. For the proof, it is referred to the literature, e.g., (Strehmel et al., 2012, Section 4.2.5). � Remark 3.28. To Theorem 3.27. • The first assumption and the null stability guarantee that the multi-step method solves the trivial initial value problem y� (x) = 0, y(x0) = 0, exactly if ε0 = ε1 = . . . = εq−1 = 0. 68 3 Multi-Step Methods • A linear multi-step method is a special case of (3.28). For linear multi-step methods, the first assumption is always satisfied, since the incremental function is a linear combination of values of the right-hand side f(x, y) of the ordinary differential equation. Due to the same reason, the incremental function of these methods satisfies the second assumption if the right-hand side f(x, y) is Lipschitz continuous with respect to the second argument. Altogether, if the right-hand side of the initial value problem is sufficiently smooth, then a consistent linear multi-step method is convergent if and only if it is null stable. ✷ Theorem 3.29. Order of convergence. Consider a multi-step method of the form (3.28) that satisfies the assumptions stated in Theorem 3.27 and which possesses the order of consistency p. Then, it holds for all f ∈ Cp ([a, b] × R) and for all x ∈ [a, b] that |e(x, ε, h)| = O (hp ) , if for the accuracy of the initial values it holds |εi(h)| = O (hp ) for i = 0, . . . , q − 1. Proof. See literature, e.g., (Strehmel et al., 2012, Section 4.2.5) or (Hairer et al., 1993, Chapter III.4). � Remark 3.30. Interpretation of Theorem 3.29. If a multi-step method with consistency order p should also have convergence order p, then it is necessary to compute the initial approximations sufficiently accurately, e.g., with a onestep method of order p. Considering the complete method, which consists of the starting method for computing the approximations y1, . . . 
, yq−1 and a predictor-corrector method for computing the other values, then the order of the complete method is determined by the partial method with the lowest order. ✷ 3.4 Backward Difference Formula (BDF) Methods Remark 3.31. Construction. The construction of Backward Difference Formula (BDF) methods is based on the original initial value problem (3.1) and not on the integral form (3.3) as it is the case for predictor-corrector methods. Given q + 1 nodes xk+1−q, . . . , xk+1 and q ≥ 1 known approximations of the solution yk+1−q, . . . , yk. Then, the idea of BDF methods consists in approximating the solution by an interpolation polynomial pq(x) of degree q with the nodes � xk+1−q, yk+1−q � , . . . , (xk, yk). Now, another condition is needed in order to define a polynomial of degree q and this condition shall also allow to compute yk+1. For BDF methods, one uses the requirement that 3.4 Backward Difference Formula (BDF) Methods 69 this polynomial should satisfy the differential equation (3.1) in xk+1, i.e., p� q (xk+1) = f (xk+1, yk+1) , (3.29) which leads to an interpolation of Hermite type. It follows from this requirement that BDF methods are implicit methods. ✷ Example 3.32. BDF methods. Consider an equidistant grid with grid size h. • q = 1. The linear interpolation polynomial through the points (xk, yk) and (xk+1, yk+1) is given by the Newton representation (using divided differences) p1(x) = yk+1 + (x − xk+1) yk − yk+1 xk − xk+1 . It is p� 1(x) = yk − yk+1 xk − xk+1 such that requirement (3.29) and xk − xk+1 = −h leads to yk − yk+1 −h = f (xk+1, yk+1) ⇐⇒ yk+1 = yk + hf (xk+1, yk+1) . Hence, BDF(1) is just the implicit Euler method. • q = 2. The Newton representation of the quadratic interpolation polynomial through (xk−1, yk−1) , (xk, yk) , (xk+1, yk+1) is given by p2(x) = yk+1 + (x − xk+1) yk − yk+1 xk − xk+1 + (x − xk+1)(x − xk) xk−1 − xk+1 � yk−1 − yk xk−1 − xk − yk − yk+1 xk − xk+1 � . (3.30) Computing the derivative of this polynomial and using that the grid is equidistant yields p� 2(x) = yk − yk+1 −h + (x − xk+1) + (x − xk) −2h � yk−1 − yk −h − yk − yk+1 −h � , such that requirement (3.29) leads to p� 2(xk+1) = yk+1 − yk h + h 2h � yk+1 − 2yk + yk−1 h � = f (xk+1, yk+1) . Collecting terms gives the BDF(2) method 3 2 yk+1 − 2yk + 1 2 yk−1 = hf (xk+1, yk+1) . (3.31) BDF(2) is the most popular multi-step method for stiff problems. 70 3 Multi-Step Methods • q ≥ 3. The derivation of higher order methods proceeds in the same way. One obtains, e.g., as BDF(3) method 11 6 yk+1 − 18 6 yk + 9 6 yk−1 − 2 6 yk−2 = hf (xk+1, yk+1) . (3.32) It should be emphasized that in BDF methods the right-hand side of the initial value problem appears only in one term, namely f (xk+1, yk+1). This situation is in contrast to the predictor-corrector methods from Section 3.2. This property of BDF methods is of advantage if the computation of the right-hand side is complicated or numerically expensive, like for special discretizations of partial differential equations. ✷ Lemma 3.33. Null stability of BDF(1), BDF(2), and BDF(3). The methods BDF(1), BDF(2), and BDF(3) are null stable. Proof. The statement of the lemma is obtained by computing the roots of the first characteristic polynomial. • q = 1. The characteristic polynomial is λ − 1 with the root λ1 = 1. • q = 2. The characteristic polynomial of BDF(2) (3.31) is λ 2 − 4 3 λ + 1 3 . A straightforward calculation gives λ1 = 2 3 + 1 3 = 1, λ2 = 2 3 − 1 3 = 1 3 . • q = 3. 
For BDF(3), see (3.32), one obtains the characteristic polynomial λ 3 − 18 11 λ 2 + 9 11 λ − 2 11 . By inserting, one checks that λ1 = 1 is a root of this polynomial. Extracting the linear factor with this root yields λ 3 − 18 11 λ 2 + 9 11 λ − 2 11 λ − 1 = λ 2 − 7 11 λ + 2 11 . The remaining roots are given by the roots of the quadratic polynomial, which are λ2 = 7 + i √ 39 22 , λ3 = 7 − i √ 39 22 , such that |λ2| = |λ3| = √ 22/11 ≈ 0.4264. � Remark 3.34. Null stability of BDF(q) methods. It can be shown that BDF(q) methods are null stable only for q ≤ 6, e.g., see Cryer (1972). Hence, BDF(q) methods for q > 6 are not of interest. ✷ Lemma 3.35. Consistency of BDF(q) methods. BDF(q) methods with q ≤ 6 are consistent of order q. 3.4 Backward Difference Formula (BDF) Methods 71 Proof. The proof is obtained by a Taylor series expansion (exercise). � Theorem 3.36. Convergence of BDF(q) methods. Let f ∈ Cq ([a, b]×R) and Lipschitz continuous with respect to the second argument and let the initial values be computed sufficiently accurately, then the BDF(q) methods with q ≤ 6 are convergent of order q. Proof. The incremental function of BDF(q) methods is Φ(xk+1, . . . , xk+1−q, yk+1, . . . , yk+1−q, h) = f � xk+1, yk+1 � , so that the assumptions of Theorem 3.27 are satisfied. Because BDF(q) methods with q ≤ 6 are null stable and consistent of order q, the other assumptions of Theorem 3.29 are also satisfied and the statement of the theorem follows now from Theorem 3.29. � Remark 3.37. On the stability. Stability of multi-step methods is studied at the same initial value problem (2.7) as it was used for one-step methods. In the same way as in Example 3.17, one obtains a homogeneous difference equation yk+1 − q−1� j=0 ajyk−j = z q−1� j=−1 bjyk−j with z = λh. With the ansatz yk = ξk and after division by ξk+1−q , one obtains a characteristic equation Ψ(ξ) − zσ(ξ) = 0, (3.33) compare (3.16), where Ψ(ξ) is the first characteristic polynomial (3.18). The polynomial σ has the coefficients bj, compare (3.19). Note that for z = 0, only the roots of the first characteristic polynomial are considered, which are important for the null stability of the method. This relation might be the reason for the notion ‘null’ stable. ✷ Definition 3.38. Stability domain. The set S = � z ∈ C : for all roots ξl of (3.33) it holds |ξl| ≤ 1; if ξl is a multiple root, then it holds |ξl| < 1 � is called stability domain of a linear multi-step method. ✷ Definition 3.39. A-stability, A(α)-stability. A linear multi-step method is called A-stable if C− ⊂ S. It is called A(α)-stable with α ∈ (0, π/2) if � z ∈ C− with |arg(z) − π| ≤ α � ⊂ S, with arg(z) ∈ [0, 2π). ✷ Theorem 3.40 (Second Dahlquist barrier). An A-stable linear multistep method is at most of second order. 72 3 Multi-Step Methods Table 3.1 Values of α (in degree) for the A(α)-stability of BDF(q) methods. q 1 2 3 4 5 6 α 90 90 86.03 73.35 51.84 17.84 Proof. See literature, e.g., (Strehmel et al., 2012, Section 9.1). � Remark 3.41. A(α)-stability of BDF(q) methods. BDF(q) methods are A(α)stable for q ≤ 6 and even A-stable for q ≤ 2. The values of α for BDF(q) methods are given in Table 3.1. Because of the small value of α for q = 6, the method BDF(6) is not used in practice. ✷ Remark 3.42. Variable step size for BDF(q) methods. BDF(q) methods can be used on non-equidistant grids, e.g., a formula for BDF(2) with variable step size can be derived on the basis of the quadratic interpolation polynomial (3.30). 
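For the equidistant case, the following minimal sketch (Python/NumPy as an assumed tool; the function name and test data are hypothetical) applies the BDF(2) scheme (3.31) to the linear test problem y′ = λy, for which the implicit equation in each step is linear and can be solved directly. It merely illustrates that large steps remain usable for strongly negative λ, in line with the A-stability of BDF(2).

```python
import numpy as np

def bdf2_linear(lam, y0, h, n_steps):
    """BDF(2), see (3.31), applied to the linear test problem y'(x) = lam*y(x),
    y(0) = y0. The second starting value is computed with BDF(1), i.e. with the
    implicit Euler method."""
    ys = [y0, y0 / (1.0 - h * lam)]            # y_0 and the starting value y_1
    for k in range(1, n_steps):
        # (3/2) y_{k+1} - 2 y_k + (1/2) y_{k-1} = h * lam * y_{k+1}
        ys.append((2.0 * ys[k] - 0.5 * ys[k - 1]) / (1.5 - h * lam))
    return np.array(ys)

# Stiff example: despite h*lam = -100, the approximations stay bounded and decay
# in magnitude, whereas an explicit method would need a much smaller step.
print(bdf2_linear(-1000.0, 1.0, 0.1, 10))
```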
For q > 1, there is some restriction on the admissible change of the step size from one mesh cell to its neighbor; e.g., for BDF(2), stability is guaranteed as long as hk+1/hk ≤ 2.41421, see (Strehmel et al., 2012, p. 328) for more details. ✷

Chapter 4 Summary and Outlook

4.1 Comparison of Numerical Methods

Remark 4.1. Motivation. Given an initial value problem in practice, one has to choose a method for its numerical solution. It is desirable to use a method that is appropriate for the given problem. This section discusses some criteria for making this choice. ✷

Remark 4.2. Criteria for comparing numerical methods for solving initial value problems.
• Computing time. Computing time is important in many applications. If the evaluation of the right-hand side of the initial value problem is time-consuming, the number of evaluations is important. For implicit methods, the number of evaluations of the Jacobian and the number of LU factorizations are of importance.
• Accuracy. Computing an accurate numerical solution is of course desirable. However, aiming for high accuracy is often in conflict with short computing times. An easy step length control should be possible.
• Memory. On modern computers, the amount of memory is usually not a big issue. However, if the given initial value problem has special structures, like a sparse Jacobian, such structures should be supported by the numerical method. It should be kept in mind that on modern computers the access to the memory, and not the number of floating point operations, essentially determines the computing time.
• Reliability. The step length control (or order control) should be sensitive with respect to local changes of the right-hand side and react correctly.
• Robustness. The method should also work for complicated examples. It should be flexible with the step length control, e.g., reduce the step length appropriately if the right-hand side has steep gradients.
• Simplicity. In complex applications, the simplicity of the method is often of importance. ✷

Remark 4.3. Some experience. There are many more numerical methods for solving initial value problems than presented in this course. Here, only some remarks on the presented methods are given.
• Non-stiff problems. For such problems, explicit one-step and linear multi-step methods were presented. Often, one-step methods need fewer steps than multi-step methods and they require fewer evaluations of the right-hand side. A few popular explicit one-step methods are given in Remark 1.45.
• Complex problems. For complex problems, e.g., from fluid dynamics, where one has an initial value problem with respect to time, very often only the simplest methods are used, like the explicit or implicit Euler method, the trapezoidal rule (Crank–Nicolson scheme), or sometimes BDF(2). An efficient and theoretically supported step length control is not possible with these methods. Other methods, like Rosenbrock schemes, have been used only for academic problems so far, e.g., in John & Rang (2010). ✷

4.2 Boundary Value Problems

Remark 4.4. A one-dimensional boundary value problem. Boundary value problems prescribe, in contrast to initial value problems, values at (some part of) the boundary of the domain. A typical example in one dimension is

−u′′ = f in (0, 1), u(0) = a, u(1) = b, (4.1)

with given values a, b ∈ R. Solving (4.1) can in principle be performed by integrating the right-hand side of the differential equation twice.
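For instance, assuming the particular right-hand side f ≡ 1 (an illustration chosen here, not an example from the notes), two integrations of −u′′ = 1 give u(x) = −x²/2 + c1x + c0. The boundary conditions yield c0 = u(0) = a and −1/2 + c1 + a = u(1) = b, i.e., c1 = b − a + 1/2, hence u(x) = a + (b − a)x + x(1 − x)/2.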
Whether or not this analytic calculation can be performed depends on f. Each integration gives a constant, such that the general solution of the differential equation contains two constants. The values of these constants can be determined from the given boundary conditions. ✷

Remark 4.5. Boundary value problems in higher dimensions. Let Ω ⊂ Rd, d ∈ {2, 3}, be a bounded domain. Then, a typical boundary value problem is

−Δu = −Σ_{i=1}^{d} ∂²u/∂x_i² = f in Ω, u = g on ∂Ω, (4.2)

where Δ is the Laplacian and ∂Ω is the boundary of Ω. Solutions of problems of type (4.2) can hardly be found analytically. The topic of Numerical Mathematics III will be the introduction of numerical methods for solving problems of type (4.2). Such methods include Finite Difference Methods, Finite Element Methods, and Finite Volume Methods. ✷

4.3 Differential-Algebraic Equations

Remark 4.6. Differential-Algebraic Equations (DAEs). In many applications, the modeling of processes leads to a coupled system of equations of different type. A typical example is a system of the form

y′(t) = f(t, y(t), z(t)),
0 = g(t, y(t), z(t)), (4.3)

with given functions f and g. In (4.3), the derivative of y with respect to t occurs, but not the derivative of z with respect to t. Problem (4.3) is called a semi-explicit differential-algebraic equation (DAE), the variable y is the differential variable, and z is the algebraic variable. In this context, the notion ‘algebraic’ means that there are no derivatives. ✷

Example 4.7. Equations for incompressible fluids. Equations for the behavior of incompressible fluids are derived on the basis of two conservation laws. The first one is Newton’s second law of motion (net force equals mass times acceleration, conservation of linear momentum) and the second one is the conservation of mass. The unknown variables in these equations are the velocity u(t, x, y, z) = (u1(t, x, y, z), u2(t, x, y, z), u3(t, x, y, z))^T and the pressure p(t, x, y, z), where t is the time and (x, y, z) is the spatial variable. Whereas the conservation of linear momentum contains the temporal derivative ∂tu of the velocity, i.e., it is a differential equation with respect to time, the conservation of mass reads

∇ · u = ∂xu1 + ∂yu2 + ∂zu3 = 0. (4.4)

Thus, one obtains a coupled model consisting of the differential equation (with respect to time) and the algebraic equation (4.4). ✷

Remark 4.8. Theory of DAEs. There is not sufficient time for presenting the theory of DAEs. One can find it, e.g., in (Strehmel et al., 2012, Chapter 13) or (Kunkel & Mehrmann, 2006, Part I). ✷

Remark 4.9. Direct approach for the discretization of DAEs. In the so-called direct approach, the DAE (4.3) is embedded into the singularly perturbed problem

y′(t) = f(t, y(t), z(t)),
εz′(t) = g(t, y(t), z(t)), (4.5)

with 0 < ε ≪ 1. Problem (4.5) is an ordinary differential equation, to which the methods presented in this course can be applied. When these methods are formulated for problem (4.5), the so-called singular perturbation parameter ε appears. Setting ε = 0 in these formulations then leads formally to methods for the DAE (4.3). One then has to study the properties of these methods. ✷

Appendix A Topics on the Theory of Ordinary Differential Equations

A.1 Ordinary Differential Equations of Higher Order

Remark A.1. Motivation. The notion of stiffness comes from the consideration of first order systems of ordinary differential equations.
There are some connections of such systems to ordinary differential equations of higher order, e.g. a solution method for linear first order systems requires the solution of a higher order linear differential equation, see Remark A.36. ✷ A.1.1 Definition, Connection to First Order Systems Definition A.2. General and explicit n-th order ordinary differential equation. The general ordinary differential equation of order n has the form F x, y(x), y′ (x), . . . , y(n) (x) = 0. (A.1) This equation is called explicit, if one can write it in the form y(n) (x) = f x, y(x), y′ (x), . . . , y(n−1) (x) . (A.2) The function y(x) is a solution of (A.1) in an interval I if y(x) is n times continuously differentiable in I and if y(x) satisfies (A.1). Let x0 ∈ I be given. Then, (A.1) together with the conditions y(x0) = y0, y′ (x0) = y1, . . . , y(n−1) (x0) = yn−1 is called initial value problem for (A.1). ✷ 77 78 A Topics on the Theory of Ordinary Differential Equations Example A.3. Special cases. The general resp. explicit ordinary differential equation of higher order can be solved analytically only in special cases. Two special cases, that will not be considered here, are as follows: • Consider the second order differential equation y′′ (x) = f(x, y′ (x)). Substituting y′ (x) = z(x), one obtains a first order differential equation for z(x) z′ (x) = f(x, z(x)). If one can solve this equation analytically, one gets y′ (x). If it is then possible to find a primitive of y′ (x), one has computed an analytical solution of the differential equation of second order. In the case of an initial value problem with y(x0) = y0, y′ (x0) = y1, the initial value for the first order differential equation is z(x0) = y1. The second initial value is needed for determining the constant of the primitive of y′ (x). • Consider the differential equation of second order y′′ (x) = f(y, y′ ). Let a solution y(x) of this differential equation be known and let y−1 (y) its inverse function, i.e. y−1 (y(x)) = x. Then, one can use the ansatz p(y) := y′ y−1 (y) . With the rule for differentiating the inverse function ((f−1 )′ (y0) = 1/f′ (x0)), one obtains dp dy (y) = y′′ y−1 (y) d dy y−1 (y(x)) = y′′ y−1 (y) y′(x) = y′′ y−1 (y) y′ (y−1(y)) = y′′ y−1 (y) p(y) = y′′ (x) p(y) . This approach leads then to the first order differential equation p′ (y) = f(y, p(y)) p(y) . ✷ A.1 Ordinary Differential Equations of Higher Order 79 Theorem A.4. Connection of explicit ordinary differential equations of higher order and systems of differential equations of first order. Every explicit differential equation of n-th order (A.2) can be transformed equivalently to a system of n differential equations of first order y′ k(x) = yk+1(x), k = 1, . . . , n − 1, y′ n(x) = f(x, y1(x), . . . , yn(x)) (A.3) or (note that the system is generally nonlinear, since the unknown functions appear also in f(·, . . . , ·)) y′ (x) =      y′ 1(x) y′ 2(x) ... y′ n(x)      =      0 1 0 · · · 0 0 0 1 · · · 0 ... ... ... ... ... 0 0 0 · · · 0           y1(x) y2(x) ... yn(x)      +      0 0 ... f(x, y1, . . . , yn)      for the n functions y1(x), . . . , yn(x). The solution of (A.2) is y(x) = y1(x). Proof. Insert in (A.2) y1(x) := y(x), y2(x) := y′ 1(x) = y′ (x), y3(x) := y′ 2(x) = y′′ (x), . . . yn(x) := y′ n−1(x) = y(n−1) (x). If y ∈ Cn(I) is a solution of (A.2), then y1(x), . . . , yn(x) is obviously a solution of (A.3) in I. Conversely, if y1(x), . . . 
, yn(x) ∈ C1(I) is a solution of (A.3), then it holds y2(x) = y′ 1(x), y3(x) = y′ 2(x) = y′′ 1 (x), . . . , yn(x) = y (n−1) 1 (x) y′ n(x) = y (n) 1 (x) = f(x, y1, . . . , yn). Hence, the function y1(x) is n times continuously differentiable and it is the solution of (A.2) in I. Example A.5. Transform of a higher order differential equation into a system of first order equations. The third order differential equation y′′′ (x) + 2y′′ (x) − 5y′ (x) = f(x, y(x)) can be transformed into the form y1(x) = y(x) y′ 1(x) = y2(x)(= y′ (x)) y′ 2(x) = y3(x)(= y′′ (x)) y′ 3(x) = y′′′ (x) = −2y′′ (x) + 5y′ (x) + f(x, y(x)) = −2y3(x) + 5y2(x) + f(x, y1(x)). ✷ 80 A Topics on the Theory of Ordinary Differential Equations A.1.2 Linear Differential Equations of n-th Order Definition A.6. Linear n-th order differential equations. A linear differential equation of n-th order has the form an(x)y(n) (x)+an−1(x)y(n−1) (x)+. . .+a1(x)y′ (x)+a0(x)y(x) = f(x), (A.4) where the functions a0(x), . . . , an(x) are continuous in the interval I, in which a solution of (A.4) is searched, and it holds an(x) = 0 in I. The linear n-th order differential equation is called homogeneous if f(x) = 0 for all x ∈ I an(x)y(n) (x) + an−1(x)y(n−1) (x) + . . . + a1(x)y′ (x) + a0(x)y(x) = 0. (A.5) ✷ Theorem A.7. Superposition principle for linear differential equations of higher order. Consider the linear differential equation of n-th order (A.4), then the superposition principle holds: i) If y1(x) and y2(x) are two solutions of the homogeneous equation (A.5), then c1y1(x) + c2y2(x), c1, c2 ∈ R, is a solution of the homogeneous equation, too. ii) If y0(x) is a solution of the inhomogeneous equation and y1(x) is a solution of the homogeneous equation, then y0(x) + y1(x) is a solution of the inhomogeneous equation. iii) If y1(x) and y2(x) are two solutions of the inhomogeneous equation, then y1(x) − y2(x) is a solution of the homogeneous equation. Proof. Direct calculations, exercise. Corollary A.8. General solution of the inhomogeneous differential equation. The general solution of (A.4) is the sum of the general solution of the homogeneous linear differential equation of n-th order (A.5) and one special solution of the inhomogeneous n-th order differential equation (A.4). Remark A.9. Transform in a linear system of ordinary differential equations of first order. A linear differential equation of n-th order can be transformed equivalently into a linear n × n system y′ k(x) = yk+1(x), k = 1, . . . , n − 1, y′ n(x) = − n−1 i=0 ai(x) an(x) yi+1(x) + f(x) an(x) or A.1 Ordinary Differential Equations of Higher Order 81 y′ (x) =      y′ 1(x) y′ 2(x) ... y′ n(x)      =      0 1 0 · · · 0 0 0 1 · · · 0 ... ... ... ... ... − a0(x) an(x) − a1(x) an(x) − a2(x) an(x) · · · −an−1(x) an(x)           y1(x) y2(x) ... yn(x)      +      0 0 ... f(x) an(x)      =: A(x)y(x) + f(x). (A.6) ✷ Theorem A.10. Existence and uniqueness of a solution of the initial value problem. Let I = [x0−a, x0+a] and ai ∈ C(I), i = 0, . . . , n, f ∈ C(I). Then, the linear differential equation of n-th order (A.4) has exactly one solution y ∈ Cn (I) for given initial value y(x0) = y0, y′ (x0) = y1, . . . , y(n−1) (x0) = yn−1. Proof. Since (A.4) is equivalent to the system (A.6), one can apply the theorem on global existence and uniqueness of a solution of an initial value problem from Picard–Lindel¨of, see lecture notes Numerical Mathematics I or the literature. 
To this end, one has to show the Lipschitz continuity of the right-hand side of (A.6) with respect to y1, . . . , yn. Denoting the right-hand side by F(x, y) gives F(x, y) − F(x, ˜y) [C(I)]n = A(y − ˜y) [C(I)]n ≤ A [C(I)]n,∞ y − ˜y [C(I)]n , where one uses the triangle inequality to get Ai·y C(I) = max x∈I n j=1 aij(x)yj(x) ≤ max x∈I n j=1 |aij(x)| max j=1,...,n max x∈I |yj(x)| = Ai· C(I) y [C(I)]n for i = 1, . . . , n. Now, one can choose L = A [C(I)]n,∞ = max x∈I max 1, a1(x) an(x) + . . . + an−1(x) an(x) . All terms are bounded since I is closed (compact) and continuous functions are bounded on compact sets. Definition A.11. Linearly independent solutions, fundamental system. The solutions yi(x) : I → R, i = 1, . . . , k, of (A.5) are called linearly independent if from k i=1 ciyi(x) = 0, for all x ∈ I, ci ∈ R, 82 A Topics on the Theory of Ordinary Differential Equations it follows that ci = 0 for i = 1, . . . , k. A set of n linearly independent solutions is called a fundamental system of (A.5). ✷ Definition A.12. Wronski1 matrix, Wronski determinant. Let yi(x), i = 1, . . . , k, be solutions of (A.5). The matrix W(x) =      y1(x) . . . yk(x) y′ 1(x) . . . y′ k(x) ... y (n−1) 1 (x) . . . y (n−1) k (x)      is called Wronski matrix. For k = n the Wronski determinant is given by det(W)(x) =: W(x). ✷ Lemma A.13. Properties of the Wronski matrix and Wronski determinant. Let I = [a, b] and let y1(x), . . . , yn(x) be solutions of (A.5). i) The Wronski determinant fulfills the linear first order differential equation W′ (x) = − an−1(x) an(x) W(x). ii) It holds for all x ∈ I W(x) = W(x0) exp − x x0 an−1(t) an(t) dt with arbitrary x0 ∈ I. iii) If there exists a x0 ∈ I with W(x0) = 0, then it holds W(x) = 0 for all x ∈ I. iv) If there exists a x0 ∈ I with rank(W(x0)) = k, then there are at least k solutions of (A.5), e.g. y1(x), . . . , yk(x), linearly independent. Proof. i) Let Sn be the set of all permutations of {1, . . . , n} and let σ ∈ Sn. Denote the entries of the Wronski matrix by W(x) = yjk(x) n j,k=1 . If σ = (σ1, . . . , σn), then let n j=1 yj,σj (x) = (y1,σ1 y2,σ2 . . . yn,σn ) (x). Applying the Laplace2 formula for determinants and the product rule yields 1 Joseph Marie Wronski (1758 – 1853) 2 Pierre–Simon (Marquis de) Laplace (1749 – 1827) A.1 Ordinary Differential Equations of Higher Order 83 d dx det(W(x)) = d dx   σ∈Sn  sgn(σ) n j=1 yj,σj (x)     = σ∈Sn  sgn(σ) n i=1   n j=1,j=i yj,σj (x)   y′ i,σi (x)   = n i=1   σ∈Sn  sgn(σ) n j=1,j=i yj,σj (x)y′ i,σi (x)     = n i=1 det    · · · · · · · · · y (i−1) 1 (x) ′ · · · y (i−1) n (x) ′ · · · · · · · · ·    . exercise for n = 2, 3. In the last step, again the Laplace formula for determinants was applied. In the i-th row of the last matrix is the first derivative of the corresponding row of the Wronski matrix, i.e. there is the i-th order derivative of (y1(x), . . . , yn(x)). The rows with dots in this matrix coincide with the respective rows of W(x). For i = 1, . . . , n − 1, the determinants vanish, since in these cases there are two identical rows, namely row i and i + 1. Thus, it is d dx det(W(x)) = det         y1(x) . . . yn(x) y′ 1(x) . . . y′ n(x) ... y (n−2) 1 (x) . . . y (n−2) n (x) y (n) 1 (x) . . . y (n) n (x)         . Now, one uses that y1(x), . . . , yn(x) are solutions of (A.5) and one replaces the n-th derivative in the last row by (A.5). Using rules for the evaluation of determinants, one obtains d dx det(W(x)) = n i=1 − ai−1(x) an(x) det      y1(x) . . . 
yn(x) y′ 1(x) . . . y′ n(x) ... y (i−1) 1 (x) . . . y (i−1) n (x)      . Apart of the last term, all other determinants vanish, since all other terms have two identical rows, namely the i-th row and the last row. ii) This term is the solution of the initial value problem for the Wronski determinant and the initial value W(x0), see the respective theorem in the lecture notes of Numerical Mathematics I. iii) This statement follows directly from ii) since the exponential does not vanish. iv) exercise Theorem A.14. Existence of a fundamental system, representation of the solution of a homogeneous linear differential equation of nth order by the fundamental system. Let I = [a, b] with x0 ∈ I. The homogeneous equation (A.5) has a fundamental system in I. Each solution of (A.5) can be written as a linear combination of the solutions of an arbitrary fundamental system. 84 A Topics on the Theory of Ordinary Differential Equations Proof. Consider n homogeneous initial value problems with the initial values y (i−1) j (x0) = δij, i, j = 1, . . . , n. Each of these initial value problems has a unique solution yj(x), see Theorem A.10. It is W(x0) = 1 for these solutions. From Lemma A.13, iii), it follows that {y1(x), . . . , yn(x)} is a fundamental system. Let y(x) be an arbitrary solution of (A.5) with the initial values y(i−1)(x0) = ˜yi−1, i = 1, . . . , n, and {y1(x), . . . , yn(x)} an arbitrary fundamental system. The system      y1(x0) . . . yn(x0) y′ 1(x0) . . . y′ n(x0) . . . y (n−1) 1 (x0) . . . y (n−1) n (x0)           c0 c1 ... cn−1      =      ˜y0 ˜y1 ... ˜yn−1      has a unique solution since the matrix spanned by a fundamental system is not singular. The function n i=1 ci−1yi(x) satisfies the initial conditions (these are just the equations of the system) and, because of the superposition principle, it is a solution of (A.5). Since the solution of the initial value problem to (A.5) is unique, Theorem A.10, it follows that y(x) = n i=1 ci−1yi(x). Theorem A.15. Special solution of the inhomogeneous equation. Let {y1(x), . . . , yn(x)} be a fundamental system of the homogeneous equation (A.5) in I = [a, b]. In addition, let Wl(x) be the determinant, which is obtained from the Wronski determinant W(x) with respect to {y1(x), . . . , yn(x)} by replacing the l-th column by (0, 0, . . . , f(x)/an(x))T . Then, y(x) = n l=1 yl(x) x x0 Wl(t) W(t) dt, x0, x ∈ I, is a solution of the inhomogeneous equation (A.4). Proof. The proof uses the principle of the variation of the constants. This principle will be explained in a simpler setting in Remark A.27. For details of the proof, see the literature. A.1.3 Linear n-th Order Differential Equations with Constant Coefficients Definition A.16. Linear differential equation of n-th order with constant coefficients. A linear n-th order differential equation with constant coefficients has the form any(n) (x) + an−1y(n−1) (x) + . . . + a1y′ (x) + a0y(x) = f(x), (A.7) with ai ∈ R, i = 0, . . . , n, an = 0. ✷ A.1 Ordinary Differential Equations of Higher Order 85 A.1.3.1 The Homogeneous Equation Remark A.17. Basic approach for solving the homogeneous linear differential equation of n-th order with constant coefficients. Because of the superposition principle, one needs the general solution of the homogeneous differential equation. That means, one has to find a fundamental system, i.e. n linearly independent solutions. Consider n i=0 aiy (i) h (x) = 0. (A.8) In the case of a differential equation of first order, i.e. 
n = 1, a1y′ h(x) + a0yh(x) = 0, one can get the solution by the method of separating the variables (unknowns), see lecture notes of Numerical Mathematics I. One obtains yh(x) = c exp − a0 a1 x , c ∈ R. One uses the same structural ansatz for computing the solution of (A.8) yh(x) = eλx , λ ∈ C. (A.9) It follows that y′ h(x) = λeλx , . . . , y (n) h (x) = λn eλx . Inserting into (A.8) gives anλn + an−1λn−1 + . . . + a1λ + a0 eλx = 0. (A.10) It is eλx = 0, also for complex λ. Because, using Euler’s formula, it holds for λ = a + ib, a, b ∈ R, that eλx = eax (cos(bx) + i sin(bx)) = eax cos(bx) + ieax sin(bx). A complex number is zero iff its real part and its imaginary part are vanish. It is eax > 0 and there does not exist a (bx) ∈ R such that at the same time sin(bx) and cos(bx) vanish. Hence, eλx = 0. The equation (A.10) is satisfied iff one of the factors is equal to zero. Since the second factor cannot vanish, it must hold p(λ) := anλn + an−1λn−1 + . . . + a1λ + a0 = 0. The function p(λ) is called characteristic polynomial of (A.8). The roots of the characteristic polynomial are the values of λ in the ansatz of yh(x). 86 A Topics on the Theory of Ordinary Differential Equations From the fundamental theorem of algebra it holds that p(λ) has exactly n roots, which do not need to be mutually different. Since the coefficients of p(λ) are real numbers, it follows that with each complex root λ1 = a + ib, a, b ∈ R, b = 0, also its conjugate λ2 = a − ib is a root of p(λ). It will be shown that the basic ansatz (A.9) is not sufficient in the case of multiple roots. ✷ Theorem A.18. Linearly independent solutions in the case of real roots with multiplicity k. Let λ0 ∈ R be a real root of the characteristic polynomial p(λ) with multiplicity k, 1 ≤ k ≤ n. Then, one can obtain with λ0 the k linearly independent solutions of (A.8) yh,1(x) = eλ0x , yh,2(x) = xeλ0x , . . . , yh,k(x) = xk−1 eλ0x . (A.11) Proof. For k = 2. yh,1(x), yh,2(x) solve (A.8). This statement is already clear for yh,1(x) since this function has the form of the ansatz (A.9). For yh,2(x) it holds y′ h,2(x) = (1 + λ0x) eλ0x , y′′ h,2(x) = 2λ0 + λ2 0x eλ0x , ... y (n) h,2(x) = nλn−1 0 + λn 0 x eλ0x . Inserting into the left-hand side of (A.8) yields eλ0x n i=0 ai(iλi−1 0 + λi 0x) = eλ0x x n i=0 aiλi 0 p(λ0) + n i=0 aiiλi 0 p′(λ0) . (A.12) It is p(λ0) = 0, since λ0 is a root of p(λ). The second term is the derivative p′(λ) of p(λ) at λ0. Since the multiplicity of λ0 is two, one can write p(λ) in the form p(λ) = (λ − λ0)2 p0(λ), where p0(λ) is a polynomial of degree n − 2. It follows that p′ (λ) = 2 (λ − λ0) p0(λ) + (λ − λ0)2 p′ 0(λ). Hence, it holds p′(λ0) = 0, (A.12) vanishes, and yh,2(x) is a solution of (A.8). yh,1(x), yh,2(x) are linearly independent. One has to show, Lemma A.13, that the Wronski determinant does not vanish. It holds W(x) = det yh,1(x) yh,2(x) y′ h,1(x) y′ h,2(x) = det eλ0x xeλ0x λ0eλ0x (1 + λ0x)eλ0x = e2λ0x det 1 x λ0 1 + λ0x = e2λ0x (1 + λ0x − λ0x) = e2λ0x > 0 for all x ∈ I. A.1 Ordinary Differential Equations of Higher Order 87 Roots of multiplicity k > 2. The principle proof is analogous to the case k = 2, where one uses the factorization p(λ) = (λ − λ0)k p0(λ). The computation of the Wronski determinant becomes more involved. Remark A.19. Complex roots. The statement of Theorem A.18 is true also for complex roots of p(λ). The Wronski determinant is e2λ1x = 0. However, the corresponding solutions, e.g. ˜y1,h(x) = eλ1x = e(a+ib)x are complex-valued. 
Since one has real coefficients in (A.8), one likes to obtain also real-valued solutions. Such solutions can be constructed from the complex-valued solutions. Let λ1 = a + ib, λ1 = a − ib, a, b ∈ R, b = 0, be a conjugate complex roots of p(λ), then one obtains with Euler’s formula eλ1x = e(a+ib)x = eax (cos(bx) + i sin(bx)) , eλ1x = e(a−ib)x = eax (cos(bx) − i sin(bx)) . Because of the superposition principle, each linear combination is also solution of (A.8). ✷ Theorem A.20. Linearly independent solution for simple conjugate complex roots. Let λ1 ∈ C, λ1 = a+ib, b = 0, be a simple conjugate complex root of the characteristic polynomial p(λ) with real coefficients. Then, yh,1(x) = Re eλ1x = eax cos(bx), yh,2(x) = Im eλ1x = eax sin(bx), are real-valued, linearly independent solutions of (A.8). Proof. Use the superposition principle for proving that the functions are solutions and the Wronski determinant for proving that they are linearly independent, exercise. Theorem A.21. Linearly independent solution for conjugate complex roots with multiplicity greater than one. Let λ1 ∈ C, λ1 = a+ib, b = 0, be a conjugate complex root with multiplicity k of the characteristic polynomial p(λ) with real coefficients. Then, yh,1(x) = eax cos(bx), . . . , yh,k(x) = xk−1 eax cos(bx), yh,k+1(x) = eax sin(bx), . . . , yh,2k(x) = xk−1 eax sin(bx) (A.13) are real-valued, linearly independent solutions of (A.8). Proof. The proof is similarly to the previous theorems. Theorem A.22. Fundamental system for (A.8). Let p(λ) be the characteristic polynomial of (A.8) with the roots λ1, . . . , λn ∈ C, where the roots are counted in correspondence to their multiplicity. Then, the set of solutions of form (A.11) and (A.13) form a fundamental system of (A.8). 88 A Topics on the Theory of Ordinary Differential Equations Proof. A real root with multiplicity k gives k linearly independent solutions and a conjugate complex root with multiplicity k gives 2k linearly independent solutions. Thus, the total number of solutions of form (A.11) and (A.13) is equal to the number of roots of p(λ). This number is equal to n, because of the fundamental theorem of algebra. It is known from Theorem A.14 that a fundamental system has exactly n functions. Altogether, the correct number of functions is there. One can show that solutions that correspond to different roots are linearly independent, e.g., (G¨unther et al., 1974, p. 75). The linearly independence of the solutions that belong to the same root, was already proved. Example A.23. Homogeneous second order linear differential equation with constant coefficients. 1. Consider y′′ (x) + 6y′ (x) + 9y(x) = 0. The characteristic polynomial is p(λ) = λ2 + 6λ + 9 with the roots λ1 = λ2 = −3. One obtains the fundamental system yh,1(x) = e−3x , yh,2(x) = xe−3x . The general solution of the homogeneous equation has the form yh(x) = c1yh,1(x) + c2yh,2(x) = c1e−3x + c2xe−3x , c1, c2 ∈ R. 2. Consider y′′ (x) + 4y(x) = 0 =⇒ p(λ) = λ2 + 4 =⇒ λ1,2 = ±2i. It follows that yh,1(x) = cos(2x), yh,2(x) = sin(2x) yh(x) = c1 cos(2x) + c2 sin(2x), c1, c2 ∈ R. ✷ A.1.3.2 The Inhomogeneous Equation Remark A.24. Goal. Because of the superposition principle, a special solution of (A.7) has to be found. This section sketches several possibilities to obtain such a solution. ✷ Remark A.25. Appropriate ansatz (St¨orgliedans¨atze). If the right-hand side f(x) possesses a special form, it is possible to obtain a solution of the inhomogeneous equation (A.7) with an appropriate ansatz. 
From (A.7) it becomes clear, that this way works only if on the left-hand side and the right-hand A.1 Ordinary Differential Equations of Higher Order 89 side of the equation are the same types of functions. In particular, one needs the same types of functions for yi(x) and all derivatives up to order n. This approach works, e.g., for the following classes of right-hand sides: • f(x) is a polynomial f(x) = b0 + b1x + . . . + bmxm , bm = 0. The appropriate ansatz is also a polynomial yi(x) = xk (c0 + c1x + . . . + cmxm ) , where 0 is a root of p(λ) with multiplicity k. • If the right-hand side is f(x) = (b0 + b1x + . . . + bmxm ) eax , then one can use the following ansatz yi(x) = xk (c0 + c1x + . . . + cmxm ) eax , where a is a root of p(λ) with multiplicity k. The first class of functions is just a special case for a = 0. • For right-hand sides of the form f(x) = (b0 + b1x + . . . + bmxm ) cos(bx), f(x) = (b0 + b1x + . . . + bmxm ) sin(bx), one can use the ansatz yi(x) = xk (c0 + c1x + . . . + cmxm ) cos(bx) +xk (d0 + d1x + . . . + dmxm ) sin(bx), if ib is a root of p(λ) with multiplicity k. One can find the ansatz for more right-hand sides in the literature, e.g. in Heuser (2006). ✷ Example A.26. Appropriate ansatz (St¨orgliedansatz). Consider y′′ (x) − y′ (x) + 2y(x) = cos x. The appropriate ansatz is given by yi(x) = a cos x + b sin x =⇒ y′ i(x) = −a sin x + b cos x =⇒ y′′ i (x) = −a cos x − b sin x. Inserting into the equation gives 90 A Topics on the Theory of Ordinary Differential Equations −a cos x − b sin x + a sin x − b cos x + 2a cos x + 2b sin x = cos x =⇒ (−a − b + 2a) cos x + (−b + a + 2b) sin x = cos x. The last equation is satisfied if the numbers a, b solve the following linear system of equations a − b = 1, a + b = 0 =⇒ a = 1 2 , b = − 1 2 . One obtains the special solution yi(x) = 1 2 (cos x − sin x) . ✷ Remark A.27. Variation of the constants. If one cannot find an appropriate ansatz, then one can try the variation of the constants. This approach will be demonstrated for the second order differential equation y′′ (x) + a1y′ (x) + a0y(x) = f(x). (A.14) Let yh,1(x), yh,2(x) be two linearly independent solutions of the homogeneous differential equation such that yh(x) = c1yh,1(x) + c2yh,2(x) is the general solution of the homogeneous equation. Now, one makes the ansatz yi(x) = c1(x)yh,1(x) + c2(x)yh,2(x) with two unknown functions c1(x), c2(x). The determination of these functions requires two conditions. One has y′ i(x) = c′ 1(x)yh,1(x) + c1(x)y′ h,1(x) + c′ 2(x)yh,2(x) + c2(x)y′ h,2(x) = (c′ 1(x)yh,1(x) + c′ 2(x)yh,2(x)) + c1(x)y′ h,1(x) + c2(x)y′ h,2(x). Now, one sets the term in the parentheses zero. This is the first condition. It follows that y′′ i (x) = c′ 1(x)y′ h,1(x) + c1(x)y′′ h,1(x) + c′ 2(x)y′ h,2(x) + c2(x)y′′ h,2(x). Inserting this expression into (A.14) gives A.1 Ordinary Differential Equations of Higher Order 91 f(x) = c′ 1(x)y′ h,1(x) + c1(x)y′′ h,1(x) + c′ 2(x)y′ h,2(x) + c2(x)y′′ h,2(x) +a1 c1(x)y′ h,1(x) + c2(x)y′ h,2(x) + a0 (c1(x)yh,1(x) + c2(x)yh,2(x)) = c1(x) y′′ h,1(x) + a1y′ h,1(x) + a0yh,1(x) =0 +c2(x) y′′ h,2(x) + a1y′ h,2(x) + a0yh,2(x) =0 +c′ 1(x)y′ h,1(x) + c′ 2(x)y′ h,2(x). This is the second condition. Summarizing both conditions gives the following system of equations yh,1(x) yh,2(x) y′ h,1(x) y′ h,2(x) c′ 1(x) c′ 2(x) = 0 f(x) . This system possesses a unique solution since yh,1(x), yh,2(x) are linearly independent from what follows that the determinant of the system matrix, which is just the Wronksi matrix, is not equal to zero. 
The solution is c′ 1(x) = − f(x)yh,2(x) yh,1(x)y′ h,2(x) − y′ h,1(x)yh,2(x) , c′ 2(x) = f(x)yh,1(x) yh,1(x)y′ h,2(x) − y′ h,1(x)yh,2(x) . The success of the method of the variation of the constants depends only on the difficulty to find the primitives of c′ 1(x) and c′ 2(x). For equations of order higher than two, one has the goal to get a linear system of equations for c′ 1(x), . . . , c′ n(x). To this end, one sets for each derivative of the ansatz the terms with c′ 1(x), . . . , c′ n(x) equal to zero. The obtained linear system of equations has as matrix the Wronski matrix and as right-hand side a vector, whose first (n − 1) components are equal to zero and whose last component is f(x). ✷ Example A.28. Variation of the constants. Find the general solution of y′′ (x) + 6y′ (x) + 9y(x) = e−3x 1 + x . The general solution of the homogeneous equation is yh(x) = c1e−3x + c2xe−3x , see Example A.23. The variation of the constants leads to the following system of linear equations e−3x xe−3x −3e−3x (1 − 3x)e−3x c′ 1(x) c′ 2(x) = 0 e−3x 1+x . Using, e.g., the Cramer rule, gives 92 A Topics on the Theory of Ordinary Differential Equations c′ 1(x) = − e−6x x 1+x (1 − 3x + 3x)e−6x = − x 1 + x , c′ 2(x) = e−6x 1 1+x (1 − 3x + 3x)e−6x = 1 1 + x . One obtains c1(x) = − x 1 + x dx = − 1 + x 1 + x dx + 1 1 + x dx = −x + ln |1 + x| , c2(x) = 1 1 + x dx = ln |1 + x| . Thus, one gets yi(x) = (−x + ln |1 + x|) e−3x + ln |1 + x| xe−3x and one obtains for the general solution y(x) = (−x + ln |1 + x| + c1) e−3x + (ln |1 + x| + c2) xe−3x . Inserting this function into the equation proves the correctness of the result. ✷ A.2 Linear Systems of Ordinary Differential Equations of First Order A.2.1 Definition, Existence and Uniqueness of a Solution Definition A.29. Linear system of first order differential equations. In a linear system of ordinary differential equations of first order one tries to find functions y1(x), . . . , yn(x) : I → R, I = [a, b] ⊂ R, that satisfy the system y′ i(x) = n j=1 aij(x)yj(x) + fi(x), i = 1, . . . , n, or in matrix-vector notation y′ (x) = A(x)y(x) + f(x) (A.15) with A.2 Linear Systems of Ordinary Differential Equations of First Order 93 y(x) =    y1(x) ... yn(x)    , y′ (x) =    y′ 1(x) ... y′ n(x)    , A(x) =    a11(x) · · · a1n(x) ... ... ... an1(x) · · · ann(x)    , f(x) =    f1(x) ... fn(x)    , where aij(x), fi(x) ∈ C(I). If f(x) ≡ 0, then the system is called homogeneous. ✷ Theorem A.30. Superposition principle for linear systems. Consider the linear system of ordinary differential equations (A.15), then the superposition principle holds: i) If y1(x) and y2(x) are two solutions of the homogeneous systems, then c1y1(x) + c2y2(x), c1, c2 ∈ R, is a solution of the homogeneous system, too. ii) If y0(x) is a solution of the inhomogeneous system and y1(x) is a solution of the homogeneous system, then y0(x) + y1(x) is a solution of the inhomogeneous system. iii) If y1(x) and y2(x) are two solutions of the inhomogeneous system, then y1(x) − y2(x) is a solution of the homogeneous system. Proof. Direct calculations, exercise. Corollary A.31. General solution of the inhomogeneous system. i) If y1(x), y2(x), . . . , yk(x) are solutions of the homogeneous system, then any linear combination k i=1 ciyi(x), c1, . . . , ck ∈ R, is also a solution of the homogeneous system. ii) The general solution of the inhomogeneous system is the sum of a special solution of the inhomogeneous system and the general solution of the homogeneous system. Theorem A.32. 
Existence and uniqueness of a solution of the initial value problem. Let I = [x0 − a, x0 + a] and aij ∈ C(I), fi ∈ C(I), i, j = 1, . . . , n. Then, there is exactly one solution y(x) : I → Rn of the initial value problem to (A.15) with the initial value y(x0) = y0 ∈ Rn . Proof. The statement of the theorem follows from the theorem on global existence and uniqueness of a solution of an initial value problem from Picard–Lindel¨of, see lecture notes Numerical Mathematics I or the literature. Since the functions aij(x) are continuous on the closed (compact) interval I, they are also bounded due to the Weierstrass theorem. That means, there is a constant M with |aij(x)| ≤ M, x ∈ I, i, j = 1, . . . , n. Denoting the right hand side of (A.15) by f(x, y), it follows that 94 A Topics on the Theory of Ordinary Differential Equations f(x, y1) − f(x, y2) ∞ = max i=1,...,n |fi(x, y1) − fi(x, y2)| = max i=1,...,n n j=1 aij(x)y1,j(x) + fi(x) − n j=1 aij(x)y2,j(x) − fi(x) = max i=1,...,n n j=1 aij(x) (y1,j(x) − y2,j(x)) ≤ n max i,j=1,...,n |aij(x)| max i=1,...,n |y1,i(x) − y2,i(x)| ≤ nM y1 − y2 ∞ , i.e. the right hand side satisfies a uniform Lipschitz condition with respect to y with the Lipschitz constant nM. Hence, the assumptions of the theorem on global existence and uniqueness of a solution of an initial value problem from Picard–Lindel¨of are satisfied. A.2.2 Solution of the Homogeneous System Remark A.33. Scalar case. Because of the superposition principle, one needs the general solution of the homogeneous system y′ (x) = A(x)y(x) (A.16) for finding the general solution of (A.15). The homogeneous system has always the trivial solution y(x) = 0. In the scalar case y′ (x) = a(x)y(x), the general solution has the form y(x) = c exp x x0 a(t) dt , c ∈ R, x0 ∈ (a, b), see lecture notes Numerical Mathematics I or the literature. Also for the system (A.16), it is possible to specify the general solution with the help of the exponential. ✷ Theorem A.34. General solution of the homogeneous linear system of first order. The general solution of (A.16) is yh(x) = e x x0 A(t) dt c, c ∈ Rn , x0 ∈ (a, b). (A.17) The integral is defined component-wise. Proof. i) (A.17) is a solution of (A.16). This statement follows from the derivative of the matrix exponential and the rule on the differentiation of an integral with respect to the upper limit y′ h(x) = d dx e x x0 A(t) dt c = d dx x x0 A(t) dt e x x0 A(t) dt c = A(x)e x x0 A(t) dt c. A.2 Linear Systems of Ordinary Differential Equations of First Order 95 ii) every solution of (A.16) is of form (A.17). Consider an arbitrary solution ˜yh(x) of (A.16) with ˜yh(x0) ∈ Rn. Take in (A.17) c = ˜yh(x0). Then, it follows that yh(x0) = e x0 x0 A(t) dt ˜yh(x0) = e0 =I ˜yh(x0) = ˜yh(x0). That means, e x x0 A(t) dt ˜yh(x0) is a solution of (A.16) which has in x0 the same initial value as ˜yh(x). Since the solution of the initial value problem is unique, Theorem A.32, it follows that ˜yh(x) = e x x0 A(t) dt ˜yh(x0). A.2.3 Linear Systems of First Order with Constant Coefficients Remark A.35. Linear system of first order differential equations with constant coefficients. A linear system of first order differential equations with constant coefficients has the form y′ (x) = Ay(x) + f(x), A =    a11 · · · a1n ... ... ... an1 · · · ann    ∈ Rn×n . (A.18) Thus, the homogeneous system has the form y′ (x) = Ay(x). (A.19) Its general solution is given by yh(x) = eAx c, c ∈ Rn , (A.20) see Theorem A.34. ✷ Remark A.36. 
Elimination method, substitution method for the homogeneous system. One needs, due to the superposition principle, the general solution of the homogeneous system. In practice, it is generally hard to compute exp(Ax), because it is defined by an infinite series. For small systems, i.e., n ≤ 3, 4, one can use the elimination or substitution method for computing the general solution of (A.19). This method is already known from the solution of linear systems of equations. One solves one equation for a certain unknown function yi(x) and inserts the result into the other equations. For differential equations, the equation has to be differentiated, see Example A.37. This step reduces the dimension of the system by one. One continues with this method until one reaches an equation with only one unknown function. For this function, a homogeneous linear differential equation of order n has to be solved, see Section A.1.3. The other components of the solution vector of (A.19) can be obtained by back substitution. ✷

Example A.37. Elimination method, substitution method. Find the solution of

y'(x) = ( −3  −1 ; 1  −1 ) y(x), i.e., y1'(x) = −3y1(x) − y2(x), y2'(x) = y1(x) − y2(x).

Solving the second equation for y1(x) and differentiating gives

y1(x) = y2'(x) + y2(x), y1'(x) = y2''(x) + y2'(x).

Inserting into the first equation yields

y2''(x) + y2'(x) = −3(y2'(x) + y2(x)) − y2(x) ⟺ y2''(x) + 4y2'(x) + 4y2(x) = 0.

The general solution of this equation is

y2(x) = c1 e^{−2x} + c2 x e^{−2x}, c1, c2 ∈ R.

One obtains from the second equation

y1(x) = y2'(x) + y2(x) = (−c1 + c2) e^{−2x} − c2 x e^{−2x}.

Thus, the general solution of the given linear system of differential equations with constant coefficients is

y(x) = (−c1 + c2, c1)^T e^{−2x} + (−c2, c2)^T x e^{−2x}.

Note that one can choose the constants in y2(x), but the constants in y1(x) are then determined by the back substitution. If the constants are chosen via y1(x), one obtains

y(x) = (C1, −C1 − C2)^T e^{−2x} + (C2, −C2)^T x e^{−2x}.

If an initial condition is given, then the corresponding constants can be determined. ✷

Remark A.38. Other methods for computing the general solution of the homogeneous system. There are also other methods for computing the general solution of (A.19).
• The idea of the method of main vectors and eigenvectors consists in transforming the system to a triangular system. Then it is possible to solve the equations successively. To this end, one constructs with the so-called main vectors and eigenvectors an invertible matrix C ∈ R^{n×n} such that C^{−1}AC is a triangular matrix. One can show that such a matrix C exists for each A ∈ R^{n×n}. Then, one sets

y(x) = Cz(x) ⟹ y'(x) = Cz'(x).

Inserting into (A.19) yields

Cz'(x) = ACz(x) ⟺ z'(x) = C^{−1}ACz(x).

This is a triangular system for z(x), which is solved successively for the components of z(x). The solution of (A.19) is obtained by computing Cz(x).
• The method of matrix functions is based on an appropriate ansatz for the solution.
However, the application of both methods becomes very time-consuming for larger n, see the literature. ✷

Remark A.39. Methods for determining a special solution of the inhomogeneous system. For computing the general solution of the inhomogeneous system of linear differential equations of first order with constant coefficients, one also needs a special solution of the inhomogeneous system.
There are several possibilities for obtaining this solution:
• Method of the variation of constants. One replaces c in (A.20) by c(x), inserts this expression into (A.18), obtains conditions for c'(x), and tries to compute c(x) from these conditions.
• Appropriate ansatz (Störgliedansätze). If each component of the right-hand side f(x) has a special form, e.g., a polynomial, sine, cosine, or exponential, then it is often possible to find the special solution with an appropriate ansatz.
• Method of elimination. If the right-hand side f(x) of (A.18) is (n − 1) times continuously differentiable, then one can proceed exactly as in the elimination method. One obtains for one component of y(x) an inhomogeneous ordinary differential equation of order n with constant coefficients, for which one has to find a special solution. A special solution for (A.18) is obtained by back substitution. ✷

References

Butcher, J. C. (1964) Implicit Runge-Kutta processes. Math. Comp., 18, 50–64.
Cryer, C. W. (1972) On the instability of high order backward-difference multistep methods. Nordisk Tidskr. Informationsbehandling (BIT), 12, 17–25.
Curtiss, C. F. & Hirschfelder, J. O. (1952) Integration of stiff equations. Proc. Nat. Acad. Sci. U. S. A., 38, 235–243.
Deuflhard, P. & Bornemann, F. (2002) Scientific computing with ordinary differential equations. Texts in Applied Mathematics, vol. 42. Springer-Verlag, New York, pp. xx+485. Translated from the 1994 German original by Werner C. Rheinboldt.
Deuflhard, P. & Bornemann, F. (2008) Numerische Mathematik 2. de Gruyter Lehrbuch [de Gruyter Textbook], revised edn. Walter de Gruyter & Co., Berlin, pp. xii+499. Gewöhnliche Differentialgleichungen [Ordinary differential equations].
Dormand, J. R. & Prince, P. J. (1980) A family of embedded Runge-Kutta formulae. J. Comput. Appl. Math., 6, 19–26.
Fehlberg, E. (1964) New high-order Runge-Kutta formulas with step size control for systems of first- and second-order differential equations. Z. Angew. Math. Mech., 44, T17–T29.
Fehlberg, E. (1969) Klassische Runge-Kutta-Formeln fünfter und siebenter Ordnung mit Schrittweiten-Kontrolle. Computing (Arch. Elektron. Rechnen), 4, 93–106.
Günther, P., Beyer, K., Gottwald, S. & Wünsch, V. (1974) Grundkurs Analysis. Teil 4. Leipzig: BSB B. G. Teubner Verlagsgesellschaft, p. 308. Erste Auflage, Mathematisch-Naturwissenschaftliche Bibliothek, Band 56.
Hairer, E., Nørsett, S. P. & Wanner, G. (1993) Solving ordinary differential equations. I. Springer Series in Computational Mathematics, vol. 8, second edn. Berlin: Springer-Verlag, pp. xvi+528. Nonstiff problems.
Heuser, H. (2006) Gewöhnliche Differentialgleichungen. Mathematische Leitfäden [Mathematical Textbooks], fifth edn. Stuttgart: B. G. Teubner, p. 628. Einführung in Lehre und Gebrauch [Introduction to theory and application].
John, V. & Rang, J. (2010) Adaptive time step control for the incompressible Navier-Stokes equations. Comput. Methods Appl. Mech. Engrg., 199, 514–524.
Kunkel, P. & Mehrmann, V. (2006) Differential-algebraic equations. EMS Textbooks in Mathematics. European Mathematical Society (EMS), Zürich, pp. viii+377. Analysis and numerical solution.
Prince, P. J. & Dormand, J. R. (1981) High order embedded Runge-Kutta formulae. J. Comput. Appl. Math., 7, 67–75.
Shampine, L. F. & Reichelt, M. W. (1997) The MATLAB ODE suite. SIAM J. Sci. Comput., 18, 1–22. Dedicated to C. William Gear on the occasion of his 60th birthday.
Söderlind, G. (2002) Automatic control and adaptive time-stepping.
Numer. Algorithms, 31, 281–310. Numerical methods for ordinary differential equations (Auckland, 2001).
Strehmel, K., Weiner, R. & Podhaisky, H. (2012) Numerik gewöhnlicher Differentialgleichungen, second edn. Springer Spektrum, p. 505.
Strehmel, K. & Weiner, R. (1995) Numerik gewöhnlicher Differentialgleichungen. Teubner Studienbücher Mathematik [Teubner Mathematical Textbooks]. Stuttgart: B. G. Teubner, p. 462.