CODE AND DESIGN SAFETY OF PROBABILISTIC SYSTEMS

Habilitation Thesis (collection of articles)

by Petr Novotný

Masaryk University
Faculty of Informatics
2022

ABSTRACT

This thesis, submitted as a part of the habilitation proceedings of its author, presents the author’s contribution to the areas of code and design safety of probabilistic systems. Code safety refers to the process of ensuring that program code involving probabilistic instructions behaves according to the intent of the programmer. Design safety refers to designing probabilistic systems so that they behave in an inherently safe way. In the domain of code safety, the thesis details the author’s work on static analysis of probabilistic programs via martingales, a concept from probability theory. We show how martingales can be used to reason about termination, safety, and efficiency questions pertaining to probabilistic programs and how such reasoning can be performed in an automated way. On the design safety front, we describe the author’s work on the problem of risk-constrained probabilistic decision-making. The idea is to design decision-making algorithms for autonomous agents so that apart from the usual objective of maximising the agent’s expected utility, the algorithm ensures a bounded level of risk, quantitatively expressed via a suitable risk metric. The thesis is written in the form of a collection of articles (Part II) accompanied by an extensive commentary (Part I). The commentary is written so as to provide a high-level overview of the problems tackled and the ideas behind their solutions. The technical details are contained in the enclosed articles.

ACKNOWLEDGEMENTS

First and foremost, I would like to thank all my co-authors (and co-authors to be), whether the outcomes of our collaboration are included in this thesis or not. Namely, my thanks go to (in alphabetic order) Sheshansh Agrawal, Michal Ajdarów, Guy Avni, Nikhil Balaji, František Blahoudek, Tomáš Brázdil, Šimon Brlej, Krishnendu Chatterjee, Taolue Chen, Martin Chmelík, Murat Cubuktepe, Adrián Elgyütt, Vojtěch Forejt, Hongfei Fu, Ehsan Kafshdar Goharshady, Anchit Gupta, Rouzbeh Hasheminezhad, Thomas A. Henzinger, Rasmus Ibsen-Jensen, Deep Karkhanis, Joost-Pieter Katoen, Stefan Kiefer, David Klaška, Ľuboš Korenčiak, Jan Krčál, Antonín Kučera, Vít Musil, Melkior Ornik, Guillermo A. Pérez, Jean-François Raskin, Owen Rouillé, Amélie Royer, Vojtěch Řehák, Aistis Simaitis, Mahsa Shirmohammadi, Ufuk Topcu, Pranay Thangeda, Jiří Vahala, Dominik Velan, Dominik Wojtczak, Florian Zuleger, Jiří Zárevúcky, and Ðorđe Žikelić. Of course, there were many more colleagues who have impacted my research career through discussions, suggestions, and friendships. It would be difficult to compile their list while avoiding the risk of inadvertently omitting someone, and hence I extend my thanks to all of you anonymously. Special thanks go to Dana Komárková and Jan Strejček for helping with the administrative aspects of the habilitation procedure and to Benjamin Kaminski for helping in my search for nice thesis fonts. Finally and most importantly, I would like to thank my wife Jana for her love and support and our son Vilém for bringing so much fun to our lives.

Petr Novotný
November 5, 2022

CONTENTS

i Commentary on the Enclosed Publications
1 Probabilistic Systems in Computer Science 3
2 Preliminaries 7
2.1 Basic Notation 7
2.2 Modelling Probabilistic Computations 7
3 Introduction to Code Safety 11
3.1 Problems in Analysis of Probabilistic Programs 11
3.2 Probabilistic Arithmetic Programs 13
4 Ranking Functions and Martingales for Termination Proving: A Brief History 17
4.1 History of Ranking Functions 17
4.2 History of Ranking Supermartingales 26
5 Code Safety: Author’s Contribution 35
5.1 Scalar RSMs: Algorithms, Complexity, Tail Bounds 35
5.2 Repulsing Supermartingales: Quantitative Reachability, Safety, and More 38
5.3 Lexicographic Ranking Supermartingales 41
5.4 Beyond Non-Negative LexRSMs 44
5.5 Summary of Martingale-Based Techniques: A Book Chapter 47
5.6 Non-Probabilistic Tech Transfer: Proving Non-Termination by Program Reversal 47
6 Design Safety: Risk-Aware Decision Making under Uncertainty 51
6.1 POMDP Optimisation under Worst-Case Payoff Constraints 53
6.2 From Worst-Case to Quantile Constraints 55
6.3 Risk-Constrained Learning 57
Notes 59
Bibliography 61
ii Collection of Applicant’s Publications
7 List of Enclosed Publications 75

Part I

COMMENTARY ON THE ENCLOSED PUBLICATIONS

1 PROBABILISTIC SYSTEMS IN COMPUTER SCIENCE

Probability? Many people (including, from the author’s experience, a non-trivial number of computer scientists) view computer systems as an epitome of determinism. From this viewpoint, a computer ought to predictably follow the instructions set by the programmer, and any variation in its behaviour is deemed to be a result of errors in programming and of inadequate testing. While such a perception might be correct within some application domains, it is not accurate in general. Uncertainty and randomness pervade various fields of computing. In what might come as a surprise, this is not entirely the result of the recent rapid developments in artificial intelligence (though AI definitely is one of the major drivers of the surge of interest in probabilistic techniques); randomness is connected to the very beginnings of computer science as an independent field. Indeed, among the first computations performed on the first general-purpose computer ENIAC were randomised¹ simulations of particle-level physical processes [Met87]. This concrete application nicely illustrates the typical usage of randomness in computing (similar to its usage in human reasoning): a computation that aims to say something about the real world necessarily operates over some model of the real world, and this model is necessarily imprecise; probabilities must then be used within the model to represent our uncertainty about the model/real-world correspondence. Domains using probabilistic methods include the aforementioned artificial intelligence (itself consisting of a diverse variety of sub-domains [SB18]), control theory [Kus71], planning and scheduling [LMA98], robotics [TBF05], simulation of complex systems (from weather and climate to epidemiological models [All08]), and security [MVV18].
Formal Reasoning about Probabilistic Computations

The proliferation of probabilistic reasoning in computer science quite naturally spawned a demand for formal analysis of probabilistic computations. Major developments in formal methods for probabilistic systems started to appear in the 1980s, mostly in the form of logical calculi for temporal properties of probabilistic programs [FH82; Fel84; Koz83]. This was followed by a strive towards automation of formal verification techniques for probabilistic systems, inspired by the success of automated verification in the classical (i.e. non-probabilistic) setting. Initially, the automation focused on finite-state probabilistic systems (Markov chains, Markov decision processes, and stochastic games), where algorithms verifying a wide range of properties were developed [BK08; KP12]. The logical next step was considering automated verification of decidable infinite-state program models, typically probabilistic pushdown automata [BEKK12; EY12] and their restrictions, such as probabilistic BPAs [BBKO10] or one-counter automata [BBEK11; BKNW12; BBEKW10; BBE10; SEY13]. It is not surprising, given the inherent complexity of probabilistic systems, that the decidability frontier is reached significantly faster with increasing model expressiveness than in the classical setting. This became particularly evident when considering models capturing multiple unbounded variables, such as probabilistic vector addition systems with states (pVASS) [BKKNK14; BKKN15] or branching processes [ESY12; ESY17]. While the strive towards automation of probabilistic formal methods ultimately covered most of the commonly studied models, challenges along various fronts remained (and remain) to be resolved. We do not aim to provide their exhaustive list here. Instead, the next subsection focuses on those challenges appearing since the early-to-mid 2010s that became the focus of the author’s research. Then, we will sketch how the author tackled these challenges and outline the remainder of this thesis, which contains a more detailed account of the author’s contribution.

Challenges

The first challenge we consider is the lifting of probabilistic formal methods from the world of abstract and decidable models to the world of more concrete and Turing-complete probabilistic programs, that is, making a step towards probabilistic program analysis (PPA).² This challenge mimics a similar process in the non-probabilistic setting. There, many efficient (though necessarily incomplete) automated techniques for proving a variety of program properties were developed, despite the inherent undecidability of verification of Turing-complete programs [Tur38]. Such an assortment of techniques was not available for probabilistic programs, although the progress on the PPA front has been tremendous in the past decade. Going back to the early 2010s, the pioneering steps towards automated PPA focused on safety properties, e.g. bounding the probability of assertion violation via techniques such as probabilistic CEGAR [HWZ08] or game-based abstraction [KNP06]. There were markedly fewer results concerning liveness properties, such as termination and its quantitative aspects (e.g. expected termination time). The previous paragraph paints the picture of the first challenge this thesis deals with: developing new automated techniques for proving properties of Turing-complete probabilistic programs, with a (non-exclusive) focus on liveness properties.
We call this challenge code safety, since the guiding goal here is to ensure that probabilistic code behaves correctly and according to the intents of its programmer. The second challenge we consider focuses on what happens before we even start to check the correctness of a probabilistic program: the process of designing a probabilistic program so that it operates in a safe way. Hence, we call this challenge design safety. In this thesis, we study design safety within the AI domain of planning for autonomous agents under probabilistic uncertainty. This domain has a rich history and a lot of common ground with probabilistic formal methods: both fields employ similar models, such as Markov decision processes, and the methods developed within the two fields are often close in both intent and technical detail. (For instance, algorithms like value and policy iteration [Put05] are used within both fields.) However, agent planning puts more emphasis on scalability to large models, as opposed to the focus on theoretical guarantees sought in formal methods (e.g. a guarantee to find an optimal policy for the agent). This led to the development of many heuristic and approximate approaches to agent planning, including algorithms utilising machine learning techniques [SB18]. The heuristic nature of the approaches increases the risk that the agent will not operate according to the designer’s intent. This risk is compounded by the fact that agent planning algorithms typically optimise the agent’s mean performance, expressed in the form of an expected reward accumulated along the agent’s run. The focus on expected value might allow for large and potentially dangerous deviations from the mean behaviour. The design safety challenge, as understood in this thesis, deals with designing agent planning algorithms that combine the efficiency of the heuristic approaches with ideas from formal methods to mitigate the aforementioned risk.

Author’s Contributions

This thesis presents its author’s contributions to tackling the code and design safety challenges for probabilistic systems.

Static Analysis of Probabilistic Programs with Martingales

Chapters 3–5 focus on code safety, in particular on the analysis of termination properties of probabilistic programs. The primary focus will be on almost-sure (a.s.) termination, i.e. proving that a program terminates with probability 1. We will show how the mathematical concept of martingales from probability theory has been leveraged for the automation of a.s. termination proving. Chapter 4 reviews the conceptual roots of the approach, found in the previous work on ranking functions and scalar martingales, and it (or parts thereof) can be safely skipped by a reader acquainted with the topic. Chapter 5 presents the author’s contribution to martingale-based program analysis techniques.

Risk-Aware Agent Planning

Chapter 6 focuses on design safety. We will present new agent planning algorithms that use ideas from constrained optimisation to mitigate the planning risk. That is, we assign to each policy some measure of risk, typically expressed as the probability of either entering some undesirable state (state-based risk) or of the reward dropping below some acceptable level (tail or quantile risk). We then seek policies optimising the expected payoff subject to the constraint that the risk in question is below a given threshold 𝑇.
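Schematically, the resulting constrained optimisation problem can be written as follows (a generic formulation for illustration only; Π stands for the policy space and risk(𝜋) for whichever risk metric is used, so the symbols here are ours rather than fixed notation of the enclosed papers):
\[
\max_{\pi \in \Pi} \; \mathbb{E}^{\pi}[\text{payoff}] \quad \text{subject to} \quad \operatorname{risk}(\pi) \le T.
\]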
To tackle this problem, our algorithms combine ideas from heuristic search and reinforcement learning with tools inspired by the work on constrained and multi-objective Markov decision processes.

The thesis is written in the form of a collection of papers (Part II) provided with an extensive commentary (Part I). We aim to provide a high-level overview of the author’s conceptual contribution to the above topics. The technical details can be found in the referenced publications.

2 PRELIMINARIES

2.1 Basic Notation

We assume familiarity with the basics of probability theory [Wil91]. In particular, a probability space is a triple (Ω, F, P), where Ω is a sample space, F is a sigma-algebra of measurable sets over Ω, and P is a probability measure on F. A random variable (r.v.) 𝑅 : Ω → ℝ ∪ {±∞} is an F-measurable real-valued function (i.e. {𝜔 | 𝑅(𝜔) ≤ 𝑥} ∈ F for all 𝑥 ∈ ℝ), and we denote by E[𝑅] its expected value. We will also use random variables of the form 𝑅 : Ω → 𝐴 for some finite or countable set 𝐴, a setting which easily translates to the real-valued one.

2.2 Modelling Probabilistic Computations

Before we delve into the world of probabilistic program analysis, we fix some necessary notation and define concepts that will serve as our mathematical “modelling language” throughout the thesis. Chief of these is the notion of a probabilistic transition system.

Definition 1 (Probabilistic transition system). A probabilistic transition system (PTS) is a tuple T = (𝑆, Δ) where:

• 𝑆 is a set of states; and
• Δ is a set of transitions. Here, a transition is a tuple 𝜏 = (𝑠, −→𝜏), where 𝑠 ∈ 𝑆 is a source state and −→𝜏 is a next-state function specifying how probable it is for T to transition from 𝑠 into each particular state when executing 𝜏. Formally, −→𝜏 is either a probability distribution over 𝑆 (if 𝑆 is discrete) or a Markov kernel [MT09] of type 𝑆 × B(𝑆) → [0, 1], where B(𝑆) denotes the measurable subsets of 𝑆 (if 𝑆 is continuous).

We say that a transition 𝜏 is enabled in a state 𝑠 if 𝑠 is the first component (the source state) of 𝜏. We denote by Δ(𝑠) the set of all transitions enabled in 𝑠. We stipulate that Δ(𝑠) ≠ ∅ for each state 𝑠.

In this thesis, we use PTSs as an abstract “back-end” through which we define the semantics of various probabilistic models, including Turing-complete probabilistic programs. For now, we consider a coarse-grained classification of PTSs into three classes:

• finite PTSs, where both the sets 𝑆 and Δ are finite,
• discrete PTSs, where 𝑆 and Δ are finite or countably infinite sets (discrete subsumes finite),
• continuous or general state space PTSs, where 𝑆 is an arbitrary set.

Below, we sketch the semantics of PTSs: we will describe how each PTS encodes a family of stochastic processes. We will keep the description on an intuitive level, using a language suitable mostly for discrete PTSs. The presented intuition also applies to continuous PTSs, though formalising it in the continuous setting would require delving into technicalities beyond the scope of the thesis. We refer the reader to the literature on general state space stochastic processes (e.g. [MT09]) for further details.

Semantics of PTSs

To define the dynamics of a PTS, an initial state 𝑠0 and a scheduler 𝜎 have to be fixed. Here, a scheduler is a function which resolves the non-determinism present in the system due to the fact that multiple (indeed, possibly infinitely many) transitions can be enabled in some states.
Formally, each scheduler is a function which takes as input the whole current history of states and transitions exhibited by the PTS (i.e. the sequence 𝑠0𝜏0𝑠1𝜏1 . . . 𝑠𝑖−1𝜏𝑖−1𝑠𝑖, where 𝑠𝑖 and 𝜏𝑖 are the 𝑖-th state and transition encountered so far, respectively) and outputs³ a distribution over Δ(𝑠𝑖).

Our PTSs evolve in discrete time steps. We start in the initial state 𝑠0. Then, in each step 𝑖, the PTS is in some state 𝑠𝑖 ∈ 𝑆 and performs a transition 𝜏𝑖 ∈ Δ(𝑠𝑖) sampled according to 𝜎(𝑠0𝜏0𝑠1𝜏1 . . . 𝑠𝑖−1𝜏𝑖−1𝑠𝑖). Once this is done, the next state 𝑠𝑖+1 is sampled according to −→𝜏𝑖. This process continues ad infinitum and thus (randomly) produces a single trajectory (or run) 𝑠0𝜏0𝑠1𝜏1𝑠2𝜏2 . . ..

Thus, an intuitive view of the semantics of a PTS is that of an abstract “parameterised sampler” which inputs an initial state 𝑠0 and a measurable scheduler 𝜎 and randomly samples an infinite state-transition trajectory according to the aforementioned rules. More precisely, a PTS T induces a function which, given 𝑠0 and 𝜎, outputs a probability measure $\mathbb{P}^{\sigma}_{\mathcal{T},s_0}$ over T’s runs. The measure captures the behaviour of T under 𝜎, sketched in the previous paragraph. We denote by $\mathbb{E}^{\sigma}_{\mathcal{T},s_0}$ the expected value operator associated with $\mathbb{P}^{\sigma}_{\mathcal{T},s_0}$. We will omit T from the subscript when clear from the context.
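To make the “parameterised sampler” view concrete, the following minimal sketch (our own illustration, not code accompanying the enclosed papers) implements a discrete PTS together with a simple randomised scheduler; the transition structure, a biased walk with two transitions enabled in each state, is a made-up example:

```python
import random

# A discrete PTS over integer states: delta(s) returns the transitions
# enabled in s, each given as a {successor: probability} dictionary.
def delta(s):
    biased = {s - 1: 2/3, s + 1: 1/3}   # drift towards smaller states
    lazy = {s: 1/2, s - 1: 1/2}         # stay put or step down
    return [biased, lazy]

def sample(dist):
    return random.choices(list(dist), weights=list(dist.values()))[0]

def run_prefix(s0, scheduler, steps=10):
    """Sample a finite prefix of a trajectory s0 tau0 s1 tau1 s2 ..."""
    history = [s0]
    for _ in range(steps):
        tau = scheduler(history)   # the scheduler resolves non-determinism
        history.append(sample(tau))
    return history

# A memoryless randomised scheduler: it looks only at the current state
# and picks one of the enabled transitions uniformly at random.
uniform_scheduler = lambda history: random.choice(delta(history[-1]))

print(run_prefix(0, uniform_scheduler))
```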
Markov Decision Processes and Stochastic Games

A PTS can be viewed as a type of Markov decision process (MDP) [Put94] with a possibly general state space. A possible extension of the model gives rise to stochastic games that contain two types of non-determinism. Formally, we can partition the set of states 𝑆 into sets 𝑆𝐴, 𝑆𝐷 of angelic and demonic states, respectively. Two schedulers, an angelic one 𝜎𝐴 and a demonic one 𝜎𝐷, are then involved in the resolution of non-determinism: 𝜎𝐴 chooses transitions in angelic states, while 𝜎𝐷 does so in demonic ones. In the context of verification, 𝜎𝐴 typically models the behaviour of a rational agent who wants the system to achieve a given specification, while 𝜎𝐷 (which typically models unknown aspects of the system) aims to hinder such achievement. In this work, we will make only passing reference to stochastic games and hence omit the details. In the chapters on verification, we will take the demonic view of non-determinism, which amounts to over-approximating the possible behaviours of the system. On the other hand, in chapters focused on probabilistic planning, we will take the angelic view, since we will use PTSs to model autonomous agents operating towards some desired goal.

3 INTRODUCTION TO CODE SAFETY

In this chapter, we describe concrete problems in probabilistic program analysis that we want to address. The following chapters describe the techniques used to solve these problems. These descriptions will, for each studied problem, consist of two parts:

• Proof rule & certificate: First, we will formulate a mathematical proof rule for proving the property addressed in the studied problem, e.g. almost-sure termination. The proof rule will typically be of the form “if there exists a proof certificate of a specific form, then the property holds.” For instance, in the case of almost-sure termination, we will show that if a program admits a so-called ranking supermartingale, then it terminates almost-surely. The proof rules and certificates are abstract objects; thus, they will be defined and analysed (w.r.t. soundness and completeness) over the abstract model of PTSs. In particular, our proof rules will be applicable to any probabilistic system that can be modelled as a PTS, not only to models stemming from a concrete form of probabilistic programs.

• Automation: Once we define a sound rule for some property, we focus on its automation, typically in the form of designing an algorithm capable of synthesising the respective proof certificate. To do this, we need to use a more concrete program model than PTSs: indeed, for infinite-state PTSs it is not even clear how to represent them in a finitary way so as to submit them to algorithmic analysis. In our work, we focus on algorithmic analysis of imperative arithmetic programs. Thus, whenever considering the question of automation, we will focus on PTSs representable by such programs. The syntax and semantics of this class of programs are provided in Section 3.2.

3.1 Problems in Analysis of Probabilistic Programs

We focus on proving the fundamental temporal properties of probabilistic programs listed below. We frame the properties in abstract terms, over a PTS T = (𝑆, Δ).

Almost-sure termination: We designate sets 𝐼𝑛, 𝐹 ⊆ 𝑆 of initial and terminal states, respectively, and say that T is almost-surely (a.s.) terminating if for every scheduler 𝜎 and every initial state 𝑠0 ∈ 𝐼𝑛 it holds that $\mathbb{P}^{\sigma}_{s_0}[\text{a state from } F \text{ is reached}] = 1$. The task is to prove that a given program is a.s. terminating.

Quantitative termination: The problem setup is the same as for a.s. termination, but the task is to compute an approximation of, or bounds on, the worst-case termination probability, i.e. the quantity
\[
\inf_{\sigma} \inf_{s_0 \in In} \mathbb{P}^{\sigma}_{s_0}[\text{a state from } F \text{ is reached}].
\]
Performing such an analysis makes sense when the program in question does not terminate a.s. or when its a.s. termination cannot be proved by the available methods. Since we typically view termination as a positive event, we focus on computing lower bounds on its probability.

Finite termination: We say that a program is finitely terminating if it terminates in a finite expected number of steps. Each finitely terminating program is also a.s. terminating, since non-terminating runs have, by definition, an infinite termination time, and thus their measure must be zero if the expected runtime is finite. The converse is not true. For instance, a program simulating a symmetric random walk with an absorbing barrier at 0 will terminate a.s., but its expected termination time is infinite. Formally, given a target set 𝐹 of states, we define a random variable Time_𝐹 such that for each run 𝜚 the quantity Time_𝐹(𝜚) denotes the first point in time at which the run visits a state from 𝐹 (or ∞ if no such time step exists). The variable Time_𝐹 is typically called the termination time. Then, given a set of initial states 𝐼𝑛, the task is to decide whether for each 𝑠0 ∈ 𝐼𝑛 and each scheduler 𝜎 it holds that $\mathbb{E}^{\sigma}_{s_0}[\mathit{Time}_F] < \infty$. The property of finite termination is sometimes called positive termination in the literature, due to the analogy with the concept of positive recurrence in infinite-state Markov chains.

Expected termination time analysis: The setup is the same as in the finite termination problem, but the task is to compute some approximation of the worst-case expected termination time. Since we typically see early termination as preferable to a later one, we usually aim to compute upper bounds (as tight as possible) on the quantity
\[
\sup_{\sigma} \sup_{s_0 \in In} \mathbb{E}^{\sigma}_{s_0}[\mathit{Time}_F].
\]
To obtain state-specific bounds, we often aim to compute a function which inputs 𝑠0 and returns an upper bound on $\sup_{\sigma} \mathbb{E}^{\sigma}_{s_0}[\mathit{Time}_F]$. For arithmetic programs, where the state is characterised by a valuation of the program’s variables, such a bounding function typically takes the form of a symbolic expression involving these variables.

Tail bounds on runtime: The expected value is a rather crude characteristic of a random variable, and the expected termination time is no exception to this fact. A program might have a small expected runtime, but the probability of extremely high runtimes could still be relatively high: the runtime of a probabilistic program could follow a fat-tailed distribution. To account for such eventualities, we might want to compute finer characteristics of the runtime, such as its tail bounds. That is, we want to compute functions 𝑁0 : 𝑆 → ℕ and 𝑔 : 𝑆 × ℕ → [0, 1] (both typically in the form of a symbolic expression) such that for every initial state 𝑠0 and every 𝑛 ≥ 𝑁0(𝑠0) it holds that
\[
\sup_{\sigma} \mathbb{P}^{\sigma}_{s_0}[\mathit{Time}_F \ge n] \le g(s_0, n).
\]
Of particular interest are exponentially decreasing tail bounds, where 𝑔(𝑠0, 𝑛) converges exponentially fast to 0 as 𝑛 → ∞. Exponentially decreasing tail bounds entail that the distribution of the runtime is concentrated around its expected value (the scale of the concentration being determined by the constants appearing in the definitions of 𝑁0 and 𝑔).

Quantitative safety: The above problems focus on various aspects of termination (or, in general, reachability), which is a liveness property. It is natural to also study safety properties, where we aim to bound (from above) the probabilities of some undesirable events. The basic safety property is state safety, where the undesirable event is specified as reaching some set of undesirable states. In the context of imperative programs, we speak about the probability of assertion violation, since the undesirable states are typically those where the variable valuation violates some assertion in the program. Formally, given a set 𝑉 ⊆ 𝑆 of violating states and an initial state 𝑠0, we want to compute an upper bound (again, as tight as possible) on
\[
\sup_{\sigma} \mathbb{P}^{\sigma}_{s_0}[\text{a state from } V \text{ is reached}].
\]
As before, the bound can be either a concrete number, if 𝑠0 is fixed in advance, or a symbolic expression determining a function from the set of initial states 𝐼𝑛 to the unit interval.
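Before turning to proof-based methods, the quantities defined above can at least be estimated by naive simulation. The sketch below (our own illustration) does this for a program simulating an asymmetric random walk that steps down with probability 2/3 and terminates upon reaching a non-positive value; the simulation cut-off `horizon` is an artefact of the sketch, not of the definitions:

```python
import random

def walk(x0, horizon=10_000):
    """One run of the walk; returns the termination time Time_F,
    or None if the run is cut off at the horizon."""
    x = x0
    for t in range(horizon):
        if x <= 0:
            return t
        x += -1 if random.random() < 2/3 else 1
    return None

times = [walk(10) for _ in range(10_000)]
finished = [t for t in times if t is not None]
print("termination probability ~", len(finished) / len(times))
print("expected termination time ~", sum(finished) / len(finished))
n = 100  # empirical tail: fraction of runs with Time_F >= n
tail = sum(1 for t in times if t is None or t >= n) / len(times)
print("P[Time_F >=", n, "] ~", tail)
```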
3.2 Probabilistic Arithmetic Programs

While the problems we study, and the theoretical solutions we provide, are phrased in the language of abstract PTSs, for algorithmic solutions we restrict ourselves to a sub-class of PTSs that represent imperative arithmetic programs. More concretely, we study a simple imperative programming language consisting of standard programming constructs (assignments, conditionals, while loops, and sequencing of statements) operating over numerical variables (mathematical reals, rationals, or integers).⁴ Our language also contains two inherently probabilistic constructs: sampling of a variable value from some probability distribution (the sample instruction) and probabilistic branching (the if prob(𝑝) construct, which selects the if branch with probability 𝑝 and the else branch with probability 1 − 𝑝). Our programs also contain support for non-determinism: non-deterministic branching (the if ★ statement, where the choice between the if and the else branch is resolved by a scheduler) and non-deterministic variable update (ndet), which sets a variable to a value non-deterministically chosen from a set of valuations satisfying a given expression. The abstract grammar of the programming language is summarised in Figure 1.

    ⟨stmt⟩    ::= ⟨assgn⟩ | skip | ⟨stmt⟩; ⟨stmt⟩
                | if ⟨ndbexpr⟩ then ⟨stmt⟩ else ⟨stmt⟩ fi
                | while ⟨boolexpr⟩ do ⟨stmt⟩ od
    ⟨assgn⟩   ::= ⟨var⟩ := ⟨expr⟩ | ⟨var⟩ := sample(⟨dist⟩) | ⟨var⟩ := ndet(⟨expr⟩)
    ⟨ndbexpr⟩ ::= ★ | prob(⟨p⟩) | ⟨boolexpr⟩

    while 𝑥 ≥ 0 ∧ 𝑦 ≥ 0 do
        if ★ then
            if prob(1/2) then
                𝑥 := 𝑥 − 2
            else
                skip
            fi
        else
            𝑥 := 2𝑥; 𝑦 := 𝑦 − 1
        fi
    od

Figure 1: Left: Abstract grammar of the probabilistic programs considered in this chapter. Here, ⟨var⟩ ranges over a countable set of program variables, ⟨expr⟩ over the set of arithmetic expressions, ⟨dist⟩ over probability distributions over integers or reals, and ⟨boolexpr⟩ over the set of boolean expressions (boolean combinations of atomic predicates of the form 𝐸1 ≤ 𝐸2, where 𝐸1, 𝐸2 are arithmetic expressions). Right: An example probabilistic program.
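For intuition, the example program of Figure 1 can be rendered directly in executable form. In the sketch below (ours, not code from the enclosed papers), the non-deterministic branch if ★ is resolved by a scheduler function supplied by the caller, here instantiated with a fair coin:

```python
import random

def example_program(x, y, scheduler):
    """The program of Figure 1. `scheduler` resolves the `if *` choice:
    it receives the current state and returns True for the if-branch."""
    steps = 0
    while x >= 0 and y >= 0:
        if scheduler((x, y)):
            if random.random() < 1/2:   # if prob(1/2) then x := x - 2
                x = x - 2
            # else: skip
        else:
            x, y = 2 * x, y - 1         # x := 2x; y := y - 1
        steps += 1
    return steps

# Resolve the non-determinism with a fair-coin scheduler.
print(example_program(5, 5, lambda state: random.random() < 1/2))
```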
We consider both discrete and continuous distributions, and we assume that the expected value and the support of each distribution used in the program are known and well-defined. As for expressions, for the theoretical development we allow arbitrary measurable expressions, which are built from program variables and Borel-measurable functions over integers or real numbers. The measurability condition is, in particular, satisfied for all the standard arithmetic operations utilised in basic calculus.

Affine Probabilistic Programs

When considering automation of the presented approaches, we restrict to the class of affine probabilistic programs (APPs). In APPs, all arithmetic expressions are of the form $d + \sum_{i=1}^{n} c_i x_i$, where 𝑐1, . . . , 𝑐𝑛, 𝑑 are constants and 𝑥1, . . . , 𝑥𝑛 are program variables. This restriction will allow us to reduce the search for proof certificates to linear programming, thus allowing for efficient automation. We note, however, that many techniques presented in this thesis were recently extended also to programs with non-linear arithmetic [CFG16].

From Programs to PTSs

A probabilistic program can be naturally represented by a PTS whose states are tuples (ℓ, 𝜈), where ℓ is a program location (intuitively, corresponding to a logical line of code or a program command currently being executed) and 𝜈 is a valuation of program variables. The transitions of the PTS are defined so as to capture the control flow of the program. For instance, the command ℓ = “if prob(0.6) then 𝐶1 else 𝐶2” is represented by adding, for each state of the form (ℓ, 𝜈), a transition 𝜏 whose source state is (ℓ, 𝜈) and whose next-state function assigns probability 0.6 to (ℓ1, 𝜈) and probability 0.4 to (ℓ2, 𝜈), where ℓ𝑖 is the first command of the sub-program 𝐶𝑖. Similarly, the command ℓ = “x := sample(Uniform[0,1]); 𝐶” is represented by adding a transition 𝜏 from each state (ℓ, 𝜈), such that the next-state function of 𝜏 is a Markov kernel representing a uniform distribution over states of the form (ℓ′, 𝜈(𝑥 ← 𝑎)) for 𝑎 ∈ [0, 1] and for ℓ′ being the first command of 𝐶. The remaining programming constructs are represented in a similar manner, as is common in program analysis.

In the context of imperative programs, the set of terminal states 𝐹 is typically a set of states of the form (ℓterm, 𝜈), where ℓterm is a special terminal location representing the “last line of code” at which the program has terminated. Similarly, there is an initial location ℓinit representing the first line of code. By default, any variable valuation is considered initial; when this is not the desired interpretation, an initial block of (possibly non-deterministic) assignments specifies the permitted range of initial values.
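A minimal sketch (ours) of the translation just described, for the probabilistic-branching case: states are (location, valuation) pairs, and the transition enabled at the branching location is a distribution over the first locations of the two branches. The location names "l", "l1", "l2" are hypothetical:

```python
# The command `if prob(0.6) then C1 else C2` at location "l" induces,
# for every valuation, a transition assigning probability 0.6 to the
# first location "l1" of C1 and 0.4 to the first location "l2" of C2,
# with the variable valuation left unchanged.
def prob_branch_transition(state):
    loc, valuation = state
    assert loc == "l"
    return {("l1", valuation): 0.6, ("l2", valuation): 0.4}

valuation = frozenset({"x": 3}.items())   # hashable valuation {x: 3}
print(prob_branch_transition(("l", valuation)))
```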
4 RANKING FUNCTIONS AND MARTINGALES FOR TERMINATION PROVING: A BRIEF HISTORY

While this chapter aims to survey previous martingale-based techniques for probabilistic program analysis, we start our survey in the non-probabilistic world. This might seem counter-intuitive: the notion of a martingale is (as we shall see) decidedly probabilistic, and hence it seems unnecessary to tie it to non-probabilistic concepts. However, the story of martingales in probabilistic program analysis does not primarily seem to be a case of “technological transfer” from abstract probability theory to the domain of formal methods. Instead, the story can be interpreted as a quest for a generalisation of techniques (proof certificates and algorithms) successful in the non-probabilistic domain to the probabilistic one. In particular, the martingales we will encounter can be viewed as a “syntactically natural” generalisation of so-called ranking functions to the probabilistic world, in the sense that the definitions of the two concepts look syntactically very similar, apart from certain differences that seem natural when augmenting a non-probabilistic concept to deal with probabilities. We stress that “naturalness” does not equate to “straightforwardness.” While defining probabilistic generalisations of ranking functions is not difficult, proving their soundness as proof certificates and utilising their full power for program analysis are endeavours that often require delicate probabilistic reasoning and knowledge of the tools of martingale theory: in this way, the connection to abstract probability is eventually created. However, the connection to the original source of the probabilistic generalisation, i.e. to ranking functions, is still present and becomes important when focusing on automation: due to the syntactic similarity of the original and the probabilistic proof certificates, the algorithms computing the latter share the same design patterns as the algorithms for the former. Hence, the next section provides a brief survey of ranking functions and algorithms for their synthesis.

4.1 History of Ranking Functions

We will treat the term non-probabilistic program mostly informally. Where a more formal view is required, we can imagine arithmetic programs as defined in Section 3.2 (including their formalisation via PTSs) but without any probabilistic commands (i.e. no probabilistic branching or sampling of a variable value). Termination analysis for non-probabilistic programs aims to (dis)prove that all runs (or the single run, if there is no non-determinism) of a given program reach a terminal state (i.e., terminate).

Ranking Functions

Informally, a ranking function (RF) is a mapping 𝑓 : 𝑆 → 𝑊 from the set of states of a program to a well-founded set, i.e. an ordered set (𝑊, ⪯) in which there are no infinite strictly descending sequences. To call such a function 𝑓 ranking, it must satisfy some sort of strict decrease condition: in the most simple form, we require that the value of 𝑓 strictly decreases (w.r.t. the ordering ⪯) with each step of the program’s execution (i.e. until a terminal state is reached). If a program admits such a ranking function, it clearly has to terminate: a non-terminating run 𝑠0𝜏0𝑠1𝜏1𝑠2𝜏2 . . . would induce an infinite decreasing sequence 𝑓(𝑠0) ≻ 𝑓(𝑠1) ≻ 𝑓(𝑠2) ≻ . . . of values in 𝑊, contradicting the well-foundedness of this set.

One can readily notice that requiring a decrease in every step of an imperative program’s execution is not strictly necessary, since we only need to show that the program does not get stuck in an infinite loop. To this end, it suffices to identify a set of cutpoints, i.e. a set of program locations such that each program loop goes through at least one such cutpoint, and require that the value of 𝑓 strictly decreases between every two visits of cutpoints. (One natural choice of cutpoints are the heads of the loops.) While the resulting notion of a ranking function is, from the theoretical point of view, no more powerful than the one with a strict decrease in every step, using cutpoints might simplify some termination proofs. Another possible relaxation of the RF definition is transferring the well-foundedness requirement from 𝑊 to 𝑓 itself: that is, we no longer require that 𝑊 is well-founded, but then 𝑓 itself must satisfy some additional property which ensures that there cannot be an infinite sequence 𝑓(𝑠0) ≻ 𝑓(𝑠1) ≻ 𝑓(𝑠2) ≻ . . . induced by a run of the program.

Scalar Ranking Functions

There are several natural choices for the set 𝑊, the most prominent being the non-negative integers with their natural ordering. For programs with real arithmetic, it might be beneficial to work with non-negative rationals or reals, in which case the strict decrease condition must be strengthened to require that the magnitude of the decrease is uniformly bounded from below (e.g. “between every two visits of cutpoints, the value of 𝑓 must decrease by at least 1”). We refer to RFs with such co-domains as scalar, since they assign a single number to each program state.
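A minimal example (ours): for the deterministic loop while 𝑥 > 0 do 𝑥 := 𝑥 − 1 od with integer 𝑥, the function
\[
f(\ell, \nu) = \max\{\nu(x), 0\}
\]
maps states into the non-negative integers and decreases by 1 with each iteration of the loop, so it is a scalar RF witnessing termination.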
Completeness of Ranking Functions

We have already argued why ranking functions are sound for proving program termination. It turns out that for programs with bounded non-determinism (which means that there is some uniform bound 𝐵 on the number of successors of every state with non-deterministic choice), they are also complete in the sense that every terminating program admits a (scalar) ranking function. This is not hard to see: by König’s lemma [Kön27], the set of states visited by a terminating program with bounded non-determinism must be finite, and hence also the maximal termination time from each reachable state is finite. It is then easy to see that the function assigning to each state its maximal (over all runs initiated in that state) termination time is a scalar ranking function with codomain ℤ.

Vectorial and Lexicographic Ranking Functions

While, as mentioned above, scalar RFs are (in theory) sufficient to prove termination of any terminating program, termination proofs are typically easier to construct when the co-domain is endowed with a richer structure. In practice, the most commonly used ranking functions are arguably the lexicographic ones (LexRFs), which map the states to vectors of numbers ordered lexicographically. The concrete co-domain of a LexRF 𝑓 can be either directly well-founded (say ℕⁿ ordered lexicographically) or more general, with the well-foundedness being provided by the definition of 𝑓. As an example of the latter approach, consider a function 𝑓 mapping 𝑆 to ℤⁿ equipped with the lexicographic ordering (obviously not a well-founded set) such that in each step of the program’s execution:

• the value of 𝑓 decreases in the lexicographic ordering, and
• the leftmost component of 𝑓 to decrease is non-negative before executing the step.

Here, even though the codomain of 𝑓 is not well-founded, one can easily verify that a program admitting such a function must terminate.
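A standard illustration (our example, with 𝑁 ≥ 0 a constant): for the nested loop while 𝑖 > 0 do 𝑖 := 𝑖 − 1; 𝑗 := 𝑁; while 𝑗 > 0 do 𝑗 := 𝑗 − 1 od od, with a single cutpoint at the inner loop head, the map
\[
f(\ell, \nu) = (\nu(i), \nu(j))
\]
decreases lexicographically between consecutive cutpoint visits: while 𝑗 > 0, the non-negative second component decreases; and when the inner loop exits with 𝑖 > 0, the non-negative first component decreases, which pays for the reset of 𝑗 to 𝑁.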
One might wonder why LexRFs are preferable over their scalar counterparts. One key reason is that LexRFs are apt for reasoning about programs with a non-trivial control flow structure. In particular, termination of nested loops can be nicely captured by the lexicographic ordering, with the leftmost component typically ranking (i.e. proving the termination of) the outer loop and one additional component added for each level of nesting. Automated tools for ranking function synthesis benefit from this type of structural decomposition: trying to directly compute a scalar RF can be too ambitious (e.g. there is no guarantee that the function admits a nice closed-form representation).

The LexRFs can be viewed as a special case of vectorial ranking functions, called so (unsurprisingly) because their codomains are sets of vectors. Examples of vectorial RFs which are not lexicographic RFs are the Ramsey-based ranking functions presented in [PR04b] and [CPR06].

Brief History of Ranking Functions

The termination analysis literature often cites the paper of Floyd [Flo67] as the originator of ranking functions. The paper lays out a general framework for program verification, citing unpublished work of Alan Perlis and Saul Gorn as a source of inspiration. However, similar ideas about proving correctness and termination can be traced back even further, to Alan Turing’s 1949 three-page conference piece “Checking a Large Routine” [Tur49]. The likely reason for the relative obscurity of Turing’s paper is, as stated in [MJ84], the large number of typos in its typewritten transcript, which obfuscates its message.

It is worth noting that the papers of both Turing and Floyd already consider what we previously called a lexicographic ranking function. Also, neither of them actually uses the term ranking function: Floyd uses the name 𝑊-function (as a shorthand for well-ordered), while Turing does not use any specific name at all. The earliest use of the term ranking function the author was able to find is in the paper [LPS81], which deals with termination proving in concurrent programs. Other terms used for the concept of RFs in the past are termination function [DM79] and loop variant, which Gries [Gri03] attributes to Dijkstra [Dij76].

Ranking functions bear some resemblance to Lyapunov functions, used in control theory to prove the stability of a dynamical system. This connection does not seem to be discussed in the termination proving literature, likely because of the rather continuous nature of Lyapunov functions as opposed to the discrete reasoning used in program analysis (e.g. in Lyapunov functions, the decrease condition is replaced with a condition placed on the function’s derivatives). The connection between control theory and termination proving seems to be more pronounced in the probabilistic setting. Indeed, in probability theory, the term Lyapunov function is used in the context of so-called Foster-Lyapunov criteria [Fos53] to prove positive recurrence in infinite-state Markov chains, including chains with a discrete state space. Such Lyapunov functions can be seen as natural probabilistic generalisations of scalar ranking functions, and hence the “Lyapunov” terminology is used in some early literature on probabilistic termination proving. We will provide more details on this topic in the subsequent sections.

Linear RFs and Supporting Invariants

While the aforementioned works did not consider automated termination proving, they often contain conceptual points that streamline the proving process and thus act as stepping stones towards the automation. The paper [KM75] highlights two such points: linear RFs and supporting invariants. A (vectorial) RF 𝑓 is linear if at every (cutpoint) location, each component of 𝑓 is an affine function of the current valuation 𝜈 of program variables. That is, denoting by 𝑓𝑘 the 𝑘-th component of 𝑓, we have $f_k(\ell, \nu) = b_\ell + \sum_{i=1}^{n} a^\ell_i \cdot \nu(x_i)$, where $a^\ell_1, \ldots, a^\ell_n, b_\ell$ are location-specific coefficients and 𝑥1, . . . , 𝑥𝑛 are the program variables. Restricting to linear RFs makes constructing and checking termination proofs easier, which comes at the expense of coverage: even if a program admits a ranking function, there is no guarantee that it admits a linear one. This limitation can, to some degree, be mitigated by using vectorial RFs: there are programs that only admit non-linear scalar RFs, but do admit linear vectorial RFs [ACN18].

Now one might ask whether the use of linear RFs makes sense at all: one of the key requirements on an RF is that it is in some sense bounded from below (well-foundedness). But non-constant linear functions are clearly not bounded, so how could they act as RFs? The key observation is that an RF does not have to be lower-bounded over all possible states but only over reachable states, i.e. states that can appear during the program’s execution. A similar observation holds for the strict decrease condition.

Precisely identifying the set of all reachable states is generally difficult and not possible to automate due to undecidability barriers. However, to check that 𝑓 is a ranking function, it is sufficient to check the required conditions on some over-approximation of the set of reachable states. Such over-approximations are called invariants. Formally, an invariant for a program is a mapping 𝐼 assigning to each location a set of variable valuations such that each state (ℓ, 𝜈) reachable from the initial state satisfies 𝜈 ∈ 𝐼(ℓ). For a state 𝑠 = (ℓ, 𝜈), we will use 𝑠 ∈ 𝐼 as a shorthand for 𝜈 ∈ 𝐼(ℓ). An invariant 𝐼 supports an RF 𝑓 if 𝑓 satisfies the well-foundedness and strict decrease conditions once its domain is restricted to $\bigcup_{\ell} \{\ell\} \times I(\ell)$. The ability to compute good invariants is an important prerequisite of many automated termination-proving techniques.
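A small example (ours) of why supporting invariants matter: consider the loop while 𝑥 ≥ 0 do 𝑥 := 𝑥 − 𝑦 od, started after an initial assignment guaranteeing 𝑦 ≥ 1. The linear function and polyhedral invariant
\[
f(\ell, \nu) = \nu(x) + 1, \qquad I(\ell) = \{\nu \mid \nu(y) \ge 1\}
\]
work together: 𝑦 is never modified, so 𝐼 is indeed an invariant, and on 𝐼 the function 𝑓 is positive whenever the guard 𝑥 ≥ 0 holds and decreases by 𝜈(𝑦) ≥ 1 in each iteration. Over arbitrary (unreachable) states with 𝑦 ≤ 0, the program does not even terminate, so no RF exists without the supporting invariant.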
Algorithmic Synthesis of Ranking Functions: The Farkas Approach

The first results concerning the automated synthesis of linear scalar RFs started to appear in the early 2000s, e.g. [CS01; PR04a]. While these papers considered programs of various syntactic structures and their approaches provided different sorts of guarantees, they were all based on the template-based approach utilising Farkas’ lemma. We will sketch the approach below, since it naturally transfers to probabilistic termination proving. The approach is most suitable for affine, i.e. linear-arithmetic, programs. Suppose that we are given such a program and its invariant 𝐼. For simplicity, we make the following two assumptions:

• The invariant 𝐼 is polyhedral, i.e. each of the sets 𝐼(ℓ) is an intersection of half-spaces.
• Each location forms a cutpoint, i.e. we want to ensure the strict decrease and lower bound conditions in every location.

Since linear ranking functions can be arbitrarily re-scaled and shifted, we can w.l.o.g. restrict our search to ranking functions that are always positive and that decrease by at least one in every step. Since we aim to find a linear scalar RF, we can fix, for each location ℓ, a linear template
\[
b_\ell + \sum_{i=1}^{n} a^\ell_i \cdot x_i,
\]
where 𝑥1, . . . , 𝑥𝑛 are all the program variables and $a^\ell_1, \ldots, a^\ell_n, b_\ell$ are unknown coefficients. Substituting concrete values for these unknown coefficients in every location yields a linear function mapping program states to reals, and vice versa. Hence, we can identify RF candidates 𝑓 with |𝐿| · (𝑛 + 1)-dimensional vectors (𝐿 being the set of all locations).

Now a vector 𝑓 defines a ranking function iff for every location ℓ and every location ℓ′ reachable in one step from ℓ, the following two formulae NNEGℓ (for “non-negativity”) and DECℓ,ℓ′ (for “decrease”) are satisfied:
\[
\mathrm{NNEG}_\ell(f) \;\equiv\; \forall \nu\colon\ \nu \in I(\ell) \implies b_\ell + \sum_{i=1}^{n} a^\ell_i \cdot \nu(x_i) > 0, \tag{1}
\]
\[
\mathrm{DEC}_{\ell,\ell'}(f) \;\equiv\; \forall \nu\colon\ \nu \in I(\ell) \implies b_{\ell'} + \sum_{i=1}^{n} a^{\ell'}_i \cdot \nu'(x_i) < b_\ell + \sum_{i=1}^{n} a^\ell_i \cdot \nu(x_i) - 1, \tag{2}
\]
where 𝜈′ is the new variable valuation obtained by executing the command corresponding to the step from location ℓ to ℓ′ in valuation 𝜈. A careful reader might be surprised by the strict inequalities in the consequents, since the previous discussion called only for non-strict ones. However, the change of non-strict inequalities to strict ones is w.l.o.g., again due to the possibility of re-scaling and shifting an RF, and it will later simplify the construction.

Let us now investigate the syntactic structure of the two types of formulae. The consequent in NNEGℓ is a (strict) linear inequality (linear w.r.t. the terms 𝜈(𝑥𝑖)). The same holds for DECℓ,ℓ′ when working with affine programs: in such a case, for each 𝑖 we have $\nu'(x_i) = g + \sum_{j=1}^{n} h_j \cdot \nu(x_j)$, where 𝑔, ℎ1, . . . , ℎ𝑛 are constants (depending on 𝑖) given by the respective assignment command in the program. Substituting these linear combinations for 𝜈′(𝑥𝑖) and re-arranging again yields a linear inequality w.r.t. all the 𝜈(𝑥𝑖). Let us now consider the antecedent 𝜈 ∈ 𝐼(ℓ) common to both formulae. Since we assumed that 𝐼 is polyhedral, 𝐼(ℓ) is an intersection of half-spaces, i.e. there is an 𝑛 × 𝑛 matrix 𝐶 and an 𝑛-dimensional vector 𝒅 such that 𝜈 ∈ 𝐼(ℓ) iff $C \cdot (\nu(x_i))_{i=1}^{n} \le \boldsymbol{d}$. To streamline the notation, we abuse it a bit and denote by 𝝂 the vector $(\nu(x_i))_{i=1}^{n}$. We conclude that both the conditions NNEGℓ and DECℓ,ℓ′ can be re-written in the following generic form:
\[
\mathrm{GEN}(\boldsymbol{a}, b, \boldsymbol{d}, C) \;\equiv\; \forall \boldsymbol{\nu} \in \mathbb{R}^n\colon\ C \cdot \boldsymbol{\nu} \le \boldsymbol{d} \implies \boldsymbol{a} \cdot \boldsymbol{\nu} < b, \tag{3}
\]
for appropriate vectors 𝒂, 𝒅, matrix 𝐶, and scalar 𝑏.
If all these four objects were constants, checking the validity of GEN(𝒂, 𝑏, 𝒅, 𝐶) would reduce to checking whether a given convex polyhedron is contained in a given half-space, which in turn reduces to checking the feasibility of a system of linear inequalities: GEN(𝒂, 𝑏, 𝒅, 𝐶) holds iff there is no counterexample to the implication, i.e. iff the system of linear inequalities
\[
\begin{pmatrix} C \\ -\boldsymbol{a} \end{pmatrix} \cdot \boldsymbol{\nu} \;\le\; \begin{pmatrix} \boldsymbol{d} \\ -b \end{pmatrix} \tag{4}
\]
is infeasible over 𝝂 ∈ ℝⁿ. However, in the ranking function synthesis scenario, only 𝐶 and 𝒅 are constants, since they are given entirely by the invariants, which are assumed to be pre-computed before the synthesis process. On the other hand, each component of 𝒂 is of the form $a_i \cdot a^\ell_i$, where $a_i$ is a constant given by the program and $a^\ell_i$ is an unknown template coefficient; a similar decomposition can be applied to 𝑏. This could seem problematic, because then the system of inequalities (4) is no longer linear: both 𝝂 and 𝒂 contain unknowns. This issue is overcome by a fundamental result in mathematical optimisation called Farkas’ lemma, from which this RF synthesis algorithm takes its name. There are several possible versions of the lemma [MG07], from which we take the one most suitable for our purposes:

Theorem 1 (Farkas’ lemma). Let 𝐴 ∈ ℝⁿˣⁿ and 𝒗 ∈ ℝⁿ be arbitrary. Then exactly one of the following statements is true:

• The system 𝐴 · 𝒙 ≤ 𝒗 has a solution 𝒙 ∈ ℝⁿ.
• There exists 𝒚 ∈ ℝⁿ such that 𝒚 ≥ 0, $\boldsymbol{y}^{T} \cdot A = 0$, and $\boldsymbol{y}^{T} \cdot \boldsymbol{v} < 0$.

A computer scientist might imagine Farkas’ lemma to state the refutation-completeness of a certain logical system. If some 𝒙 satisfies 𝐴 · 𝒙 ≤ 𝒗, then for any 𝒚 ≥ 0 the linear inequality $(\boldsymbol{y}^{T} \cdot A) \cdot \boldsymbol{x} \le \boldsymbol{y}^{T} \cdot \boldsymbol{v}$ is also true; this can be easily checked. Hence, non-negative linear combinations of the rows of (𝐴 | 𝒗) can be seen as consequences of the original system of inequalities, and the process of taking such combinations as a sound inference rule producing such consequences. In particular, if the first item in the lemma’s statement is true, the second one must be false, since the inequality “0 ≤ something negative” is false. The crux of Farkas’ lemma is that it shows the rule to be refutation-complete: if the original system is infeasible (i.e. the first item in the lemma’s statement is false), then the second item must be true, and hence we must be able to infer an obvious contradiction: an inequality 0 ≤ 𝑧 for some 𝑧 < 0.

We now apply Farkas’ lemma to the ranking function synthesis problem, namely to the system of inequalities (4), substituting $\binom{C}{-\boldsymbol{a}}$ for 𝐴 and $\binom{\boldsymbol{d}}{-b}$ for 𝒗. We get that (4) is infeasible (i.e., GEN(𝒂, 𝑏, 𝒅, 𝐶) is valid) iff the following system has an (𝑛 + 1)-dimensional solution 𝒚 = (𝝁, 𝜆) ≥ 0 (where 𝝁 is 𝑛-dimensional and 𝜆 is a scalar):
\[
\boldsymbol{\mu}^{T} \cdot C - \lambda \boldsymbol{a} = 0 \quad\text{and}\quad \boldsymbol{\mu}^{T} \cdot \boldsymbol{d} - \lambda b < 0. \tag{5}
\]
It is easy to check that in any non-negative solution to (5), 𝜆 must be non-zero, i.e. positive. (Otherwise we would have $\boldsymbol{\mu}^{T} \cdot C = 0$ and $\boldsymbol{\mu}^{T} \cdot \boldsymbol{d} < 0$, which by the easy direction of the lemma would make the system 𝐶 · 𝝂 ≤ 𝒅 infeasible, i.e. the invariant 𝐼(ℓ) would be empty; a contradiction, as we may w.l.o.g. assume non-empty invariant sets.) Taking one such solution and multiplying it by 1/𝜆 again yields a feasible solution to (5) in which 𝜆 = 1. Hence, we can factor 𝜆 out of (5) entirely, getting
\[
\boldsymbol{\mu}^{T} \cdot C - \boldsymbol{a} = 0 \quad\text{and}\quad \boldsymbol{\mu}^{T} \cdot \boldsymbol{d} - b < 0. \tag{6}
\]
The system (6) contains two types of unknowns: 𝝁, and the template variables hidden in 𝒂 and 𝑏. Since 𝝁 is only multiplied with the constant 𝐶 and 𝒅, the system is linear, and its feasibility can be checked by linear programming, retrieving a feasible solution if one exists.
The following list summarises all the conceptual steps of the algorithm.

1. Input an affine program P with 𝑛 variables 𝑥1, . . . , 𝑥𝑛 and an invariant 𝐼 for the program.
2. For each location ℓ of P, fix a linear template $b_\ell + \sum_{i=1}^{n} a^\ell_i \cdot x_i$ for a function 𝑓, with unknown coefficients $b_\ell, a^\ell_1, \ldots, a^\ell_n$.
3. For each location ℓ, construct the formula NNEGℓ(𝑓). For each pair of locations ℓ, ℓ′, where ℓ′ is a one-step successor of ℓ, construct the formula DECℓ,ℓ′(𝑓). Gather the constructed formulae together to get a finite set of linear-arithmetic formulae {GEN(𝒂𝑘, 𝑏𝑘, 𝒅𝑘, 𝐶𝑘) | 𝑘 ∈ 𝐾}, where 𝐾 is a finite index set (with 𝒂𝑘 and 𝑏𝑘 containing the template unknowns).
4. For each 𝑘 ∈ 𝐾, introduce an 𝑛-dimensional vector of variables 𝝁𝑘 and construct a system L𝑘 of linear (in)equalities:
\[
L_k\colon\quad \boldsymbol{\mu}_k^{T} \cdot C_k - \boldsymbol{a}_k = 0 \quad\text{and}\quad \boldsymbol{\mu}_k^{T} \cdot \boldsymbol{d}_k - b_k < 0.
\]
5. Gather all the constructed inequalities together to obtain a system L:
\[
L\colon\quad L_1 \wedge L_2 \wedge \ldots \wedge L_{|K|}.
\]
6. Using linear programming, find a feasible solution of L (w.r.t. the set of variables {𝝁𝑘 | 𝑘 ∈ 𝐾} ∪ {$a^\ell_1, \ldots, a^\ell_n, b_\ell$ | ℓ is a location}). If no such solution exists, report that a linear ranking function supported by 𝐼 does not exist. Otherwise, let $(a^\ell_1, \ldots, a^\ell_n, b_\ell)_\ell$ be the computed feasible solution projected to the template variables, and return a linear ranking function 𝑓 such that
\[
f(\ell, \nu) = b_\ell + \sum_{i=1}^{n} a^\ell_i \cdot \nu(x_i).
\]

In the last step, we use a standard trick to handle the strict inequalities in the systems L𝑘: we introduce a fresh non-negative variable 𝜀 and replace each strict inequality 𝐿 < 𝑅 with 𝐿 + 𝜀 ≤ 𝑅. We then seek a solution maximising the value of 𝜀, rejecting the system as infeasible if the optimal value of 𝜀 is zero.

The algorithm is not only sound but also relatively complete in the sense that if there exists a linear ranking function for P supported by 𝐼, the algorithm will find one such function. Moreover, the algorithm is essentially a polynomial-time reduction to linear programming. Hence, the problem of whether a given program admits a linear RF supported by a given invariant is solvable in polynomial time.
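The algorithm is straightforward to prototype. Below is a minimal sketch (our own illustration, assuming SciPy is available; it is not a tool from the cited literature) that synthesises a linear RF for the one-location loop while 𝑥 ≥ 0 do 𝑥 := 𝑥 − 1 od. With a single location and a single variable, the template is 𝑓(𝑥) = 𝑎 · 𝑥 + 𝑏, the relevant condition region is the guard 𝑥 ≥ 0 (i.e. −𝑥 ≤ 0), and working out the two Farkas systems by hand gives the constraints encoded in the matrices:

```python
from scipy.optimize import linprog

# Template f(x) = a*x + b over the region x >= 0 (i.e. C = [-1], d = [0]).
#   NNEG: a*x + b > 0             ~Farkas (6)~>  a - mu1 = 0,  -b < 0
#   DEC:  a*(x-1)+b < a*x + b - 1 ~Farkas (6)~>  mu2 = 0,      1 - a < 0
# LP variables z = (a, b, mu1, mu2, eps); strict "<" becomes "+ eps <=",
# and we maximise eps (linprog minimises, hence the -1 coefficient).
c = [0, 0, 0, 0, -1.0]
A_eq = [[1, 0, -1, 0, 0],   # a - mu1 = 0
        [0, 0, 0, 1, 0]]    # mu2 = 0
b_eq = [0.0, 0.0]
A_ub = [[0, -1, 0, 0, 1],   # -b + eps <= 0
        [-1, 0, 0, 0, 1]]   # -a + eps <= -1
b_ub = [0.0, -1.0]
bounds = [(None, None), (None, None), (0, None), (0, None), (0, 1)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
a, b, _, _, eps = res.x
assert eps > 0, "no linear RF supported by the invariant"
print(f"synthesised ranking function: f(x) = {a:g}*x + {b:g}")
```

On this input the LP is feasible with 𝜀 = 1, yielding, e.g., 𝑓(𝑥) = 2𝑥 + 1, which is indeed positive and decreases by at least one per iteration on 𝑥 ≥ 0.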
Algorithmic Synthesis of Ranking Functions: Specific Algorithms

The earliest use of Farkas’ lemma for termination proving we were able to identify is due to Colón and Sipma, who worked first with unnested loops [CS01] and then with more general programs [CS02]. They do not ultimately reduce RF synthesis to linear programming and instead present an algorithm based on the manipulation of polyhedral cones. Linear programming was utilised by Podelski and Rybalchenko [PR04a], who prove that their approach is complete (in an absolute sense) for unnested straight-line loops. The paper [BMS05] extends the template-based approach to the scenario where the invariants are not pre-computed but synthesised during the termination-proving process. The key observation is that we can set up a linear template for the invariants themselves, turning the 𝐶-matrices in the GEN-formulae into matrices of unknowns. This turns (6) into a system of quadratic constraints, and [BMS05] presents two heuristic approaches to solving such systems.⁵

The aforementioned work [BMS05] also considers the synthesis of lexicographic ranking functions. Indeed, the Farkas approach can be used to synthesise the individual components of a linear LexRF. The paper [ADFG10] describes an iterative technique computing the components one by one. In each iteration, a relaxed version of the linear system L is constructed, aiming to find a “partial” ranking function that maximises the number of program transitions along which the function value strictly decreases (on the other transitions, the function must not increase). The ranked transitions are then removed from the program, the synthesised partial RF is added as the next component of the lexicographic RF, and the process proceeds to the next iteration, until no unranked transitions remain.

The paper [CSZ13] considers a similar approach with an important twist inspired by the approach to synthesising Ramsey-based RFs [CPR06]. The twist consists of deploying a safety prover for termination proving: when (the relaxation of) L becomes infeasible, it means that there is no RF supported by the given invariant. This might be because the invariant is not precise enough. Hence, a safety prover is used to check whether we can produce an execution along which the function computed so far does not lexicographically decrease. If the safety check fails, the safety prover provides such a counterexample execution. This counterexample is used to refine the vectorial function: using an adaptation of the Farkas approach, we compute a new component that decreases along the counterexample and thus extends the scope of the termination certificate. One interesting difference between [ADFG10] and [CSZ13] is that in the former, all components of the LexRF have to be non-negative, while in the latter, each component needs to be non-negative only on that part of the program whose termination is proved by that component. These subtle differences between the definitions of LexRFs and their computational implications were studied in [BG15].

Finally, we note that termination proving via the synthesis of LexRFs was implemented in several successful temporal provers, such as Terminator [CPR06], T2 [BCIKP16], AProVE [Gie+17], and others. The research surrounding termination proving and ranking functions goes well beyond the scope of discussion warranted in this thesis; we refer the reader to the survey paper [CPR11] as a starting point for further research on the topic.

4.2 History of Ranking Supermartingales

Let us now see how the ideas from the previous section can be transferred into the probabilistic setting. In this section, we describe several previous lines of research that converged towards the notion of a ranking supermartingale, which forms the foundation of probabilistic termination proving. The work of this thesis’ author, which builds on this notion to develop several new approaches to probabilistic program analysis, is described in the next chapter.

RFs and Probabilistic Programs

Recall that an RF 𝑓 maps program states to some well-founded set (𝑊, ⪯) in such a way that upon each execution step, the 𝑓-value of the current state strictly decreases w.r.t. the ⪯ ordering. Nothing prevents us from defining RFs in exactly the same way also for probabilistic programs. This implicitly amounts to treating probability as non-determinism: the value of the function will be required to decrease for each outcome of any probabilistic instruction in the program. Such a definition is definitely sound w.r.t. almost-sure termination proving: the existence of such an RF demonstrates that each of the program’s executions terminates, and thus the probability of terminating runs is one.
termination proving, as it is easy to find programs that do terminate almost-surely but still admit (possibly infinitely many) non-terminating executions (consider a simple loop simulating an asymmetric random walk over the integers with negative bias, terminating once a non-positive number is reached). Moreover, by ignoring the quantitative aspects of the probabilistic behaviour, classical RFs are not suitable for quantitative analysis (quantitative termination, expected runtime, tail bound analysis, ...).

From RFs to (Scalar) Ranking Supermartingales

A reader familiar with the basic theory of random walks will likely not be surprised (in particular in light of the above example) by how to fit RFs into the probabilistic setting in a less naive and more useful way: in essence, an RF tracks the program’s current “distance” from termination, and for a probabilistic system to (almost-surely) reach a certain state, it is sufficient to show that the distance towards that state decreases in expectation. The following definition formalises this intuition.

Definition 2 (Ranking supermartingale). Let $T = (S, \Delta)$ be a PTS and $f\colon S \to \mathbb{R}$ a lower-bounded function. We say that $f$ is a ranking supermartingale (RSM) for a target set of states $F \subseteq S$ if for every transition $\tau = (s, \vec{\tau}) \in \Delta$ s.t. $s \notin F$ it holds that $\mathbb{E}_{s' \sim \vec{\tau}}[f(s')] \leq f(s) - 1$.

Similarly to the case of classical RFs, the choice of $-1$ in the definition of an RSM is rather arbitrary and can be replaced by any negative constant.

Mathematical Martingales

The term “supermartingale” might seem puzzling to readers unfamiliar with probabilistic literature. Given that RSMs seem like a straightforward probabilistic generalisation of RFs, why not use a more down-to-earth term such as a “probabilistic ranking function?” The reason is that the RSMs we just introduced are deeply connected to so-called (super)martingale processes in abstract probability theory. How these processes came to be named “martingales” is an interesting (and highly speculative) story that is beyond the scope of this thesis (we refer the reader to the article [Man05] for the sometimes colourful details). We just state what the (super)martingales are: a martingale is a real-valued stochastic process whose expected change in each step is zero, even if knowledge of the past values of the process is provided; in a supermartingale, the expected change of the process is non-positive, i.e. the process tends to decrease (or stay the same) over time. Formally, the definition is as follows:

Definition 3. Let $X_0, X_1, X_2, X_3, \ldots$ be a stochastic process (a sequence of random variables in the same probability space). The process is a martingale if for every $i \geq 0$ it holds that
$$\mathbb{E}[X_{i+1} \mid X_0, X_1, \ldots, X_i] = X_i, \qquad (7)$$
where $\mathbb{E}[\,\cdot \mid \cdot\,]$ denotes conditional expectation. If (7) holds with $\leq$ instead of $=$, we say that the process is a supermartingale; and if it holds with $\geq$ instead of $=$, we say that the process is a submartingale.

We note the sometimes confusing terminology: supermartingales tend to decrease (or stay the same) while the converse holds for submartingales. A way to imagine this is that a supermartingale is “pushed” from above, producing a downward trend. The above definition swept a lot of technical details under the rug. In particular, defining the conditional expectation is a rather non-trivial process.
We just note that the conditional expectation in (7) is a random variable which, given a sample $\omega \in \Omega$, returns the expected value of $X_{i+1}$ conditional on the event $X_0 = X_0(\omega) \wedge \ldots \wedge X_i = X_i(\omega)$. We also leave aside the fact that probabilistic literature (including the literature on termination proving with martingales) uses a more general definition of a martingale, which involves filtrations and conditioning by sub-σ-algebras. These formal foundations, the lack of which should not impede the comprehension of this thesis, can be found in most textbooks on formal probability theory (e.g. [Wil91; Bil95; Ros06]).

Example 1. Consider the following stochastic process $(Y_i)_{i=0}^{\infty}$: we put $Y_0 = 0$. Then, we flip a fair coin once. If it lands on heads, in every subsequent time step we increase the value of the process by 1; if the coin lands on tails, in every subsequent time step we decrease the value of the process. The process $(Y_i)_{i=0}^{\infty}$ can exhibit exactly two trajectories, each with probability $\frac{1}{2}$: one trajectory is $0, 1, 2, 3, \ldots$, while the other one is $0, -1, -2, -3, \ldots$. It is easy to see that $\mathbb{E}[Y_i] = 0$ for every $i \geq 0$. However, the process is neither a martingale nor a super- or submartingale. Indeed, once the value of $Y_1$ is revealed, we know for sure that the process will keep either increasing (if $Y_1 = 1$), in which case it cannot be a (super)martingale, or decreasing, in which case it cannot be a submartingale. However, if we modify the process by tossing a fair coin in every step, with the outcome of the toss deciding if the next step should be an increment or a decrement (i.e., if we change the process into the standard symmetric random walk), then we get a martingale.

Example 2. Now consider the same example as before, but with an unfair coin which lands on tails with probability $\frac{2}{3}$. That is, the process $(Y_i)_{i=0}^{\infty}$ can again exhibit only two trajectories, with the decreasing one being “heavier”. Compare this to the asymmetric, 0-initiated random walk $(X_i)_{i=0}^{\infty}$ where in each step there is a probability $\frac{2}{3}$ of decrementing and $\frac{1}{3}$ of incrementing. Then for all $i \geq 0$ we have $\mathbb{E}[X_i] = \mathbb{E}[Y_i] = -\frac{i}{3}$. However, for the same reasons as above, the random walk $(X_i)_{i=0}^{\infty}$ is a supermartingale, while $(Y_i)_{i=0}^{\infty}$ is not. In a sense, while it would be tempting to say that both processes have a downward trend, such a characterisation would be misleading for the latter one, as we can easily find ourselves in a situation where the converse is true.

The previous examples give two well-known examples of (super)martingale processes: 1D symmetric and asymmetric random walks. They also illustrate why martingales are relevant to proving termination properties. Indeed, the random walks in the two examples do eventually hit negative numbers with probability 1, whereas the processes $(Y_i)_{i=0}^{\infty}$ do not. In a sense, the conditional expectation constraints (7) ensure that there is a solid trend towards some set of states, unlike the weaker notion of a trend captured by the (unconditional) expected values.

The connection between ranking supermartingales (Definition 2) and mathematical martingales (Definition 3) is rather straightforward. Let $T$ be a PTS and $f$ a ranking supermartingale for a target set $F$. We can define a stochastic process $(M_i)_{i=0}^{\infty}$ such that $M_i = f(s_i)$, where $s_i$ is a random variable representing the $i$-th state along a run in $T$.
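The distinction drawn in the two examples can also be checked empirically. The following small simulation (our illustration, not taken from the enclosed papers) contrasts the biased one-coin process $(Y_i)$ with the asymmetric random walk $(X_i)$: their unconditional means coincide, but conditioning on the past separates them.

import random

def sample_Y(steps, p_down=2/3):
    down = random.random() < p_down                 # a single coin flip at time 0
    return [(-i if down else i) for i in range(steps + 1)]

def sample_X(steps, p_down=2/3):
    x, traj = 0, [0]
    for _ in range(steps):
        x += -1 if random.random() < p_down else 1  # a fresh coin flip each step
        traj.append(x)
    return traj

n, runs = 9, 20000
# Unconditional means agree: both are approximately -n/3 = -3 at time n.
print(sum(sample_Y(n)[-1] for _ in range(runs)) / runs)
print(sum(sample_X(n)[-1] for _ in range(runs)) / runs)

# Conditioning on Y_1 = 1, the next value of Y is always 2 > Y_1, so Y
# violates the supermartingale property; X keeps drifting down by 1/3.
ys = [t for t in (sample_Y(n) for _ in range(runs)) if t[1] == 1]
print(sum(t[2] for t in ys) / len(ys))              # exactly 2.0
xs = [t for t in (sample_X(n) for _ in range(runs)) if t[1] == 1]
print(sum(t[2] for t in xs) / len(xs))              # approximately 2/3

Returning to the process $(M_i)_{i=0}^{\infty}$ induced by an RSM $f$: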
It is not difficult to see that $(M_i)_{i=0}^{\infty}$ is then a supermartingale (in the mathematical sense) with two additional properties:

• it is uniformly bounded from below; and
• it satisfies a stricter decrease property than “standard” supermartingales: namely, for each $i$ it holds that $\mathbb{E}[M_{i+1} \mid M_0, M_1, \ldots, M_i] \leq M_i - \mathbb{I}_{s_i \notin F}$, where $\mathbb{I}_E$ is the indicator function of the event $E$.

This connection allows us to utilise the vast arsenal of tools from martingale theory in probabilistic program analysis and other areas of probabilistic verification.

Martingales in Probabilistic Verification

Before we describe the usage of RSMs in termination proving, we briefly discuss the importance of martingale theory for probabilistic verification in general. The usefulness of martingales stems from the fact that many interesting theorems were proved about them, often theorems similar in spirit to those holding for sequences of independent and identically distributed (i.i.d.) random variables. Examples of these are the martingale central limit theorem, which does not need any i.i.d. assumptions, or the martingale convergence theorem, which posits that a bounded (in a certain sense) martingale must almost-surely stabilise. Another useful tool are the concentration inequalities for martingales (such as Azuma’s inequality), analogues of the Chernoff bounds for i.i.d. variables. The fact that the i.i.d. assumption is not needed to apply these theorems is important in probabilistic verification, where we work with state-based systems. The behaviour of such a system in some time step depends on the current state, and the current state depends on choices made in the past, a situation not matching the i.i.d. property. On the other hand, to apply martingale-based tools, we “only” need to find a martingale process connected to the probabilistic system we want to analyse (such as when we find an RSM for a given program). There is no guarantee that finding such a martingale is easy, but it is typically possible as long as the system exhibits a robust trend towards some behaviour.

We conclude this part by noting that the mathematical arsenal of martingale theory is not always necessary in order to prove something about, say, a ranking supermartingale. For instance, the fact that an RSM for the set of terminal states acts as a certificate for almost-sure termination (which the reader by now likely, and reasonably, expects to hold and which will be formally stated in the next part) can be proved by a relatively short probabilistic computation, without resorting to black-box results. Still, it is useful to have the arsenal at our disposal, in particular once we start investigating more complex properties than a.s. termination.

RSMs as Certificates of Finite Termination

The usage of RSMs for proving a.s. termination predates the author’s research. In this part, we recapitulate the history of the concept and of its applications. We first state the, by now anticipated, correctness of RSMs.

Theorem 2. Let $T = (S, \Delta)$ be a PTS and $\mathit{In}, F \subseteq S$ sets of initial and terminal states. Assume that there is an RSM $f$ for $F$ in $T_{\mathit{In}}$, i.e. in the restriction of $T$ to states reachable from $\mathit{In}$. Then:

a) For every $s_0 \in \mathit{In}$ and every scheduler $\sigma$ it holds that $\mathbb{P}^{\sigma}_{s_0}[\text{a state from } F \text{ is reached}] = 1$.

b) Denoting by $\mathit{Time}_F$ the number of steps to reach $F$, it holds that $\sup_{\sigma} \mathbb{E}^{\sigma}_{s_0}[\mathit{Time}_F] \leq f(s_0)$. In particular, the expected termination time is finite.
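For a concrete instance of Theorem 2, consider the asymmetric random walk of Example 2 phrased as a program: while $x \geq 0$, decrement $x$ with probability $\frac{2}{3}$ and increment it with probability $\frac{1}{3}$; the terminal states $F$ are those with $x < 0$. We claim (a toy computation of ours, not taken from the cited works) that $f(x) = 3x + 3$ is an RSM for $F$: it is lower-bounded on the reachable states (where $x \geq -1$), and for every non-terminal state, i.e. for $x \geq 0$,
$$\mathbb{E}[f(x')] = \tfrac{2}{3}\big(3(x-1)+3\big) + \tfrac{1}{3}\big(3(x+1)+3\big) = 3x + 3 - 1 = f(x) - 1.$$
Theorem 2 thus yields both almost-sure termination and the bound $\mathbb{E}[\mathit{Time}_F] \leq 3x_0 + 3$ on the expected termination time from any initial value $x_0 \geq 0$.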
The proof of Theorem 2 is not difficult for someone with a fair command of formal probability theory. Still, it is an order of magnitude more intricate than the obvious correctness of scalar RFs, which follows directly from the definition of a well-founded set. This is a pattern common to multiple methods in probabilistic program analysis: on the surface, they look (syntactically) similar to their non-probabilistic counterparts. However, proving that the methods work is typically much more intricate in the probabilistic setting and, as we shall see, one needs to be careful of subtle yet crucial details.

RSMs for Finite Termination: A History

Although we presented RSMs as generalisations of ranking functions, the historical developments of the two notions seem to be independent. The earliest appearance of a concept similar to RSMs we were able to identify is due to Foster [Fos53], who studied the ergodicity of, and expected return times in, countable-state Markov chains with applications to queuing processes. The presented proof certificates of ergodicity eventually came to be known as Foster-Lyapunov functions [MT09], in a nod to Lyapunov functions from non-stochastic control theory. Foster’s paper essentially contains (in a somewhat different form) the statement of Theorem 2 for countable Markov chains (i.e. countable state-space PTSs with no non-determinism). The work of Bournez and Garnier [BG05] connected the notion of Lyapunov functions to almost-sure termination of certain probabilistic rewrite systems. They also coined the term positive termination for almost-sure termination in a finite expected number of steps.

The notion of a ranking supermartingale was, to our best knowledge, first used in the work of Chakarov and Sankaranarayanan [CS13]. They defined RSMs for imperative arithmetic programs in a way identical in spirit to what is presented in this thesis. One of their key contributions is an algorithm for the synthesis of linear RSMs supported by a given invariant. The algorithm is a straightforward syntactic extension of the procedure for the synthesis of linear scalar ranking functions presented in Section 4.1. The crucial observation is that as long as we restrict ourselves to linear arithmetic, due to the linearity of expected value it is still possible to express the defining properties of an RSM as a conjunction of formulae of the general form (3), i.e.
$$\forall \boldsymbol{\nu} \in \mathbb{R}^n\colon\ C \cdot \boldsymbol{\nu} \leq \boldsymbol{d} \implies \boldsymbol{a} \cdot \boldsymbol{\nu} < b$$
(for appropriate vectors $\boldsymbol{a}$, $\boldsymbol{d}$, matrix $C$ and scalar $b$), which can then be transformed into a conjunction of linear inequalities using the Farkas’ lemma. More precisely, recall that the defining conditions of a ranking function were encoded via the formulae NNEG (1) and DEC (2). Non-negativity (or, in general, lower-boundedness) of an RSM is a property independent of any probabilities, and hence the same formula NNEG can be used for RSMs. To capture the expected decrease property of RSMs, we can define the following P-DEC formula:
$$\text{P-DEC}_{\ell,\ell'}(f) \equiv \forall \nu\colon\ \nu \in I(\ell) \implies \mathbb{E}[f(\ell', \nu')] < b_\ell + \sum_{i=1}^{n} a^\ell_i \cdot \nu(x_i) - 1. \qquad (8)$$
(Recall that a linear RF/RSM $f$ can be represented by a set of linear coefficients $a^\ell_1, \ldots, a^\ell_n, b_\ell$.) If the command effecting the transition from $\ell$ to $\ell'$ is non-probabilistic, the distribution over the successor valuations is Dirac and (8) reduces to (2).
If the command is of the form $x_i := \mathtt{sample}(D)$ for some distribution $D$, then
$$\mathbb{E}[f(\ell', \nu')] = b_{\ell'} + a^{\ell'}_i \cdot \mathbb{E}[D] + \sum_{j=1,\, j \neq i}^{n} a^{\ell'}_j \cdot \nu(x_j)$$
(where $\mathbb{E}[D]$ is the expected value of $D$), in which case (8) fits the general form (3). For the if prob($p$) probabilistic branching command, the formula P-DEC is only parameterised by the source location $\ell$, since $\ell'$ is a random variable. If $\ell'$ and $\ell''$ are the possible destinations of the probabilistic if and else branches, respectively, then
$$\mathbb{E}[f(\ell', \nu')] = p \cdot \Big(b_{\ell'} + \sum_{i=1}^{n} a^{\ell'}_i \cdot \nu(x_i)\Big) + (1 - p) \cdot \Big(b_{\ell''} + \sum_{i=1}^{n} a^{\ell''}_i \cdot \nu(x_i)\Big),$$
again a linear expression (as long as $p$ is a constant hard-coded into the program), allowing a re-arrangement of (8) to fit the general form (3).

Chakarov and Sankaranarayanan not only utilised the above insights to get an algorithm for RSM synthesis, but also sketched the use of martingales for proving quantitative safety properties via Azuma’s inequality, a useful tool from martingale theory that we will keep encountering later. However, no automation of such safety proofs is discussed. The fundamental limitation of their work is that they only consider programs without non-determinism, i.e. Markov chains.

The work of Fioriti and Hermanns [FH15] aimed to overcome this limitation and lift RSM reasoning to programs with non-determinism. Their paper provides numerous ideas relevant for the subsequent research on the topic; however, its impact was somewhat affected by its use of rather non-standard semantics. Instead of treating the program as a (possibly general state-space) MDP, they consider a construction of a single probability space which captures the behaviour of all schedulers. The main motivation for such a construction seems to be a controversial claim that under the standard semantics, RSMs are not a complete proof certificate for proving finite termination. In other words, the paper claims that there are probabilistic programs with bounded non-determinism that terminate, under every scheduler, in a finite expected number of steps, but do not admit an RSM. However, the presented counterexample does not seem to demonstrate this. The accompanying discussion focuses on the optimality of schedulers w.r.t. maximising the value of a certain variable, not on termination per se; it is not clear why the worst-case expected termination time should not work as an RSM for the program. Moreover, the arXiv paper [CF17] claims that under the classical semantics, the counterexample program does admit an RSM, though the program used in [CF17] has a different syntax than the one in [FH15] (a key variable controlling a conditional branch has a Bernoulli distribution rather than the uniform one used in the original). Nevertheless, with the caveat of non-standard semantics, [FH15] does indeed prove Theorem 2 for programs with non-determinism. The work does not directly consider automation of RSM synthesis. Instead, it designs a proof system for compositional termination proving, where the a.s. termination of a whole program is inferred from the a.s. termination of its sub-components. Proofs derived in such a system can be viewed as (sort of) lexicographic termination arguments: by the design of the system, RSMs certifying termination of outer loops must not increase in expectation during an execution of the inner loop. While the proof system is very intuitive, Fioriti and Hermanns show that it is not sound in general.
The issue boils down to the probabilistic notion of uniform integrability. Intuitively, if the RSM-defining formula involves variables that are not uniformly integrable (UI), the RSM might, with non-zero probability, keep increasing at a super-polynomial pace while its expected value remains bounded. This presents an issue for compositional proof systems, and hence non-UI variables must be excluded from RSMs. Unfortunately, as noted already in [FH15], the task of deciding whether a variable is UI is highly non-trivial, since programs might produce very complex distributions over their variables. Instead, the paper provides a type system through which UI of some of the program variables can be certified. It is then claimed that a compositional termination proof in which the constituent RSMs only use UI-typed variables is sound for a.s. termination. This claim again proved controversial: [HFCG19] contends (to the author’s best knowledge, successfully) that the system of [FH15] is not sound under the standard semantics, likely since it does not require the constituent RSM to be integrable at the end of the while-loop it ranks. The counterexample does not involve any non-determinism, and hence the issue likely cannot be attributed to the non-standard semantics used in [FH15]. We suggest taking the aforementioned discrepancies as a reminder that probabilistic program analysis involves subtle intricacies already at the most fundamental level.

Martingales in Other Fields of Computer Science

We conclude this survey part by noting that probabilistic program analysis is by far not the only field of computer science where martingales are useful. (A fact that should not be surprising given the previously discussed usefulness of martingales in probabilistic verification.) Martingales play an important role in stochastic control theory. Already the classical book of Kushner [Kus71] details the use of martingales (and in particular, of Doob’s martingale inequality) in the stability analysis of stochastic processes. This theme has been further developed in more recent work, in which notions such as level sets [TA11] and barrier certificates [PJP04], which can be interpreted in the language of martingale theory, are used to prove various safety and liveness properties of stochastic dynamical systems. While the focus in control theory is more on the usage of such certificates rather than their automated synthesis, recent approaches consider automation via reduction to sums-of-squares programming [ST12].

A last but important category of work we mention here is the one on verification of one- and multi-counter Markov decision processes and games [BBEKW10; BBE10; BBEK11], including the work co-authored by the author of this thesis [BKNW12; BKKNK14; BKKN15]. These models can be imagined as finite-state Markov chains/Markov decision processes/games equipped with one or more integer-valued counters, which can in turn be viewed as a very restricted class of probabilistic programs. This restricted form ensures that the models always possess a martingale of a nice syntactic shape that can be used to analyse (and decide) most of the verification questions we study in this chapter. While the algorithmic aspects of these works are not readily transferable to general probabilistic programs, the martingale-based arguments therein were important stepping stones towards the more general techniques presented in the next chapter.
5 Code Safety: Author’s Contribution

The previous chapter summarised the state of the art in martingale methods for probabilistic program analysis at the time when the author of this thesis started contributing to this field. In this chapter, we present five papers (and one book chapter) from the period 2016–2021 which encompass the current state of this contribution.

5.1 Scalar RSMs: Algorithms, Complexity, Tail Bounds

The first paper we cover is the POPL’16 paper co-authored with Chatterjee, Fu, and Hasheminezhad, together with its journal version published in TOPLAS:

• K. Chatterjee, H. Fu, P. Novotný, and R. Hasheminezhad. “Algorithmic analysis of qualitative and quantitative termination problems for affine probabilistic programs”. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20–22, 2016. Ed. by R. Bodík and R. Majumdar. ACM, 2016, pp. 327–342. doi: 10.1145/2837614.2837639. url: http://doi.acm.org/10.1145/2837614.2837639

• K. Chatterjee, H. Fu, P. Novotný, and R. Hasheminezhad. “Algorithmic Analysis of Qualitative and Quantitative Termination Problems for Affine Probabilistic Programs”. In: ACM Trans. Program. Lang. Syst. 40.2 (2018), 7:1–7:45. doi: 10.1145/3174800. url: https://doi.org/10.1145/3174800

The paper extends both the work of Chakarov and Sankaranarayanan [CS13] and of Fioriti and Hermanns [FH15] by considering RSMs for programs with non-determinism under the standard MDP semantics. Furthermore, we consider programs with both angelic and demonic non-determinism, the algorithmic and complexity aspects of RSM synthesis, and the derivation of tail bounds on the termination time.

More precisely, the first contribution of the paper was the proof of Theorem 2 for affine probabilistic programs with demonic or angelic non-determinism, or even a combination thereof. Note that the formulation of Theorem 2 in this thesis considers only demonic non-determinism. Incorporating also angelic non-determinism entails changing the statement of item a) into “For every $s_0 \in \mathit{In}$ there exists an angelic scheduler $\sigma_A$ s.t. for all demonic schedulers $\sigma_D$ it holds that... (the program terminates a.s. under $\sigma_A$ and $\sigma_D$).” Similarly, item b) changes into $\inf_{\sigma_A} \sup_{\sigma_D} \mathbb{E}^{\sigma_A, \sigma_D}_{s_0}[\mathit{Time}_F] \leq f(s_0)$.

Next, we provided a careful definition of a linear RSM and of a Farkas’-based algorithm for its synthesis. While the formulation of the algorithm is a rather straightforward extension of the ideas of [CS13] and of the previous work on non-probabilistic ranking functions, we were, to our best knowledge, the first to provide such a detailed recipe on how to produce the individual constraints, along with the proof of their correctness. (Indeed, to our best knowledge, none of the previous works provided a complete description of the Farkas’ translation process, even to the level of detail provided in Section 4.1.) An entirely novel aspect of our formulation is the handling of angelic non-determinism, for which the Farkas’ transformation yields a system of quadratic constraints. Intuitively, this is because for angelic non-determinism, the formula (3) can have a disjunction of multiple linear inequalities on its right-hand side.
Indeed, in the angelic case we want to express the property “for each state in the invariant, at least one of the available transitions decreases the value of the RSM”, which yields a formula of the form
$$C \cdot \boldsymbol{\nu} \leq \boldsymbol{d} \implies \bigvee_{j=1}^{m} \boldsymbol{a}_j \cdot \boldsymbol{\nu} < b_j,$$
where $j$ ranges over the choices available to the angelic scheduler. The validity of the formula is equivalent to the unsatisfiability of the formula
$$C \cdot \boldsymbol{\nu} \leq \boldsymbol{d} \ \wedge\ \bigwedge_{j=1}^{m} -\boldsymbol{a}_j \cdot \boldsymbol{\nu} \leq -b_j,$$
which, using the Farkas’ transformation, yields a variant of (5) of the following form:
$$\boldsymbol{\mu}^T \cdot C - \sum_{j=1}^{m} \lambda_j \boldsymbol{a}_j = 0 \quad\text{and}\quad \boldsymbol{\mu}^T \cdot \boldsymbol{d} - \sum_{j=1}^{m} \lambda_j b_j < 0. \qquad (9)$$
The presence of multiple $\lambda$-variables makes it impossible to factor them out easily, and since $\boldsymbol{a}_j$ contains RSM template variables, (9) is a quadratic system. Of course, if the analysed program does not involve angelic non-determinism, the produced constraints are just linear.

Another contribution of the paper is the study of the complexity of RSM synthesis. More precisely, we consider the following decision problem: given an affine program P and its polyhedral invariant $I$, decide if there exists a linear scalar RSM for P supported by $I$. For programs without angelic non-determinism, the Farkas’ transformation reduces the problem to checking the feasibility of a system of linear inequalities, demonstrating its solvability in polynomial time. For programs with angelic non-determinism, we present a reduction from 3-SAT showing that the problem is NP-hard. This demonstrates that, unless P = NP, the quadratic system (9) cannot be replaced by a linear one.

Finally, we consider problems pertaining to the expected termination time. Theorem 2 gives an upper bound on the expected termination time, though there is no guarantee on how tight the bound is. For a class of programs with bounded updates, where there is a bound $\delta$ on the one-step change of any program variable, we can obtain further information about the distribution of the termination time via martingale concentration inequalities, such as Azuma’s inequality:

Theorem 3 (Azuma). Let $(X_i)_{i=0}^{\infty}$ be a supermartingale and let there be $c > 0$ s.t. $|X_{i+1} - X_i| \leq c$ for any $i \geq 0$. Then, for any $n \geq 0$ and any $t > 0$ it holds that
$$\mathbb{P}[X_n - X_0 \geq t] \leq \exp\left(-\frac{t^2}{2nc^2}\right).$$

Azuma’s inequality can be employed (for demonic non-determinism) as follows. Let $f$ be an RSM for our program, and suppose that we are able to derive some bound $\delta$ on the one-step changes of $f$. By the definition of an RSM, the expected value of $f$ decreases by at least one in every step (unless the process has already terminated). This decrease can be compensated for by adding one in every step without damaging the supermartingale property. In other words, the process $X_n = f(s_n) + \min\{n, \mathit{Time}\}$ (where $s_n$ is the state at time $n$ and $\mathit{Time}$ is the termination time) is a supermartingale in the mathematical sense, and its one-step change is bounded by $c = \delta + 1$. We then observe that for any $n \geq 0$, the event $\mathit{Time} > n$ is contained in the event $X_n - X_0 \geq n - f(s_0)$. This is because on the former event
$$X_n - X_0 = f(s_n) + n - f(s_0) \geq n - f(s_0),$$
where the equality holds since $\mathit{Time} > n$ and the inequality holds since $f$ is non-negative in non-terminal states. In particular, $\mathbb{P}^{\sigma}_{s_0}[\mathit{Time} > n] \leq \mathbb{P}^{\sigma}_{s_0}[X_n - X_0 \geq n - f(s_0)]$ under any scheduler $\sigma$. To bound the latter quantity, we observe that for large enough $n$, the quantity $n - f(s_0)$ is positive. Hence, we can plug the values $n$, $c = \delta + 1$, and $t = n - f(s_0)$ into Azuma’s inequality to get
$$\mathbb{P}^{\sigma}_{s_0}[\mathit{Time} > n] \leq \exp\left(-\frac{(n - f(s_0))^2}{2nc^2}\right) \leq H \cdot \exp\left(-\frac{n}{2c^2}\right),$$
where $H$ is a constant independent of $n$.
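For a quick numerical illustration (ours, with made-up parameter values), the derived bound can be evaluated directly:

import math

def azuma_tail_bound(n, f_s0, delta):
    # X_n = f(s_n) + min(n, Time) has (delta + 1)-bounded differences
    c = delta + 1
    if n <= f_s0:
        return 1.0          # the bound is only informative once n > f(s0)
    return min(1.0, math.exp(-(n - f_s0) ** 2 / (2 * n * c ** 2)))

# hypothetical program with initial RSM value f(s0) = 10, update bound delta = 2
for n in (20, 100, 500, 2500):
    print(n, azuma_tail_bound(n, f_s0=10, delta=2))

For these values the printed bounds shrink rapidly with growing $n$, in line with the exponential estimate involving $H$ above.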
Thus, we have obtained an exponentially decreasing tail bound on the termination time. The argument for programs with angelic non-determinism is essentially the same; we just need to fix an angelic scheduler which decreases $f$ in expectation. From an algorithmic perspective, the only issue to resolve is obtaining the bound $\delta$. But the requirement $-\delta \leq f(s) - f(s') \leq \delta$ (where $s'$ is a successor of $s$) can be encoded directly into the linear system through which we seek $f$, in a similar way as the non-negativity and decrease conditions. (In the constraints, $\delta$ is treated as a variable.) Naturally, not every program admits an RSM with bounded differences, but if a program admits a bounded-difference RSM supported by a given invariant, the aforementioned method is guaranteed to find it.

The exponentially decreasing tail bound, once obtained, can be (at least in theory) used to approximate the expected termination time up to arbitrary precision, provided that all probability distributions used in the program have finite support. The idea is to unfold the program into a tree of computations up to a depth $d$ computed in such a way that the probability of not terminating before time $d$ is extremely low, and thus the behaviour past step $d$ influences the expected termination time only in a marginal way. Note that computing such a $d$ is possible given the tail bound, since the probability of non-termination by time $n$ decreases exponentially fast, while the termination time increases only linearly (by 1) with increasing $n$. The termination time within the tree can then be computed exactly by standard techniques for Markov chains. This method is likely not too practical, since the required depth $d$ can be exponentially large, and the whole unfolding can thus have a doubly-exponential size.

We supplement this observation with a theoretical hardness result. For any constant $K$ we define the following decision problem TimeDec(K): given a program P and a number $N$ such that the termination time of P is either $\leq N$ or $\geq K \cdot N$, decide which of the two cases holds. We show that for any $K$, TimeDec(K) is PSPACE-hard, via a reduction from the membership problem for linearly bounded Turing machines. In other words, already approximating the termination time up to an arbitrary factor is difficult. The hardness result holds even if we restrict to programs P that are non-probabilistic, without non-determinism, with bounded variable updates, and admitting a linear RSM with bounded differences.

5.2 Repulsing Supermartingales: Quantitative Reachability, Safety, and More

Next, we cover the following POPL’17 paper co-authored with Chatterjee and Žikelić:

• K. Chatterjee, P. Novotný, and D. Žikelić. “Stochastic invariants for probabilistic termination”. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017). Ed. by G. Castagna and A. D. Gordon. ACM, 2017, pp. 145–160. doi: 10.1145/3009837. url: http://dl.acm.org/citation.cfm?id=3009873

The initial motivation of the paper was to study quantitative termination, i.e. computing good lower bounds on the termination probability in programs with demonic non-determinism. We proposed the method of stochastic invariants, which reduces the problem to safety analysis, and introduced the concept of repulsing supermartingales as a sound proof rule for quantitative safety properties. Stochastic invariants are defined as follows:

Definition 4.
Let $(\mathit{SI}, p)$ be a tuple such that $\mathit{SI}$ is a function mapping each program location to a set of variable valuations and $p \in [0, 1]$ is a probability. The tuple $(\mathit{SI}, p)$ is a stochastic invariant if, under every scheduler, the probability of reaching a state $(\ell, \boldsymbol{\nu})$ s.t. $\boldsymbol{\nu} \notin \mathit{SI}(\ell)$ is at most $1 - p$.

The name stochastic invariant captures the idea that while $\mathit{SI}$ might not be an invariant of the program, since its complement can be reached, it “behaves like an invariant with probability at least $p$.” The relationship between stochastic invariants and lower bounds on the termination probability is captured in the following theorem.

Theorem 4. Let P be a probabilistic program with a set of terminal states $F$ and $(\mathit{SI}, p)$ its stochastic invariant. Assume that there is a mapping $f$ from program states to real numbers satisfying the following conditions:

• $f$ is bounded from below; and
• for every transition $\tau = (s, \vec{\tau}) \in \Delta$ s.t. $s \in \mathit{SI}$ and $s \notin F$ it holds that $\mathbb{E}_{s' \sim \vec{\tau}}[f(s')] \leq f(s) - 1$.

Then, under any scheduler, P terminates with probability at least $p$.

The conditions on $f$ in Theorem 4 essentially require $f$ to be an RSM in a PTS obtained from the original program by restricting the set of states to $\mathit{SI}$. In particular, if P is affine and $\mathit{SI}$ is given as a mapping from program locations to convex polytopes, then the existence of a linear $f$ from Theorem 4 can be encoded into a set of linear constraints using the Farkas’ transformation, with the program invariant $I$ replaced by the stochastic invariant $\mathit{SI}$.

The above result shows that stochastic invariants can be used to obtain lower bounds on the termination probability. The question is how to obtain such stochastic invariants. In the paper, we first focus on the following problem: given an affine program P, its (classical) invariant $I$, and a polyhedral mapping $\mathit{SI}$, compute (a non-trivial value of) $p$ such that $(\mathit{SI}, p)$ is a stochastic invariant. This is essentially a quantitative safety verification problem: we want to find an upper bound $p$ on the probability of reaching the complement of $\mathit{SI}$. In previous work, various automated approaches to quantitative safety were considered, based, e.g., on optimised simulations [Sam+14], symbolic execution [SCG13], or finite-state abstractions [HWZ08; KKNP09]. In our paper, we stayed true to the focus on martingale-based techniques and proposed proving quantitative safety bounds via the new notion of repulsing supermartingales:

Definition 5. Let P be a probabilistic program with a state set $S$, $I$ its (classical) invariant, and $V$ a set of violating states. A map $f\colon S \to \mathbb{R}$ is a repulsing supermartingale (RepSM) for $V$ supported by $I$ if it satisfies the following:

• for every $s \in I \cap V$, $f(s) \geq 0$;
• for each initial state $s_0$ it holds that $f(s_0) < 0$; and
• for every $s \in I \cap (S \setminus V)$ and every transition $\tau = (s, \vec{\tau})$, $\mathbb{E}_{s' \sim \vec{\tau}}[f(s')] \leq f(s) - 1$.

Hence, for a RepSM $f$, we can view $-f(s)$ as a “distance” of a state $s$ from the “boundary” between the sets of violating and non-violating states. The definition of a RepSM requires that as long as we stay in the non-violating region, the distance tends to increase over time, and hence the program tends to stay safe. This intuition can be formalised to get guaranteed bounds on the probability of reaching $V$.

Theorem 5.
Let P be a probabilistic program, $c > 0$, $V$ a set of violating states, and let $f$ be a RepSM for P (supported by some invariant $I$) such that $f$ has $c$-bounded differences; that is, $|f(s) - f(s')| \leq c$ for each state-successor pair $s, s' \in I$. Then, under every scheduler $\sigma$ and for each initial state $s_0$ it holds that:
$$\mathbb{P}^{\sigma}_{s_0}[\text{a state of } V \text{ is reached}] \leq \frac{C \cdot \gamma^{\lceil |f(s_0)|/c \rceil}}{1 - \gamma},$$
where $\gamma = \exp\left(-\frac{1}{2(c+1)^2}\right)$ and $C = \exp\left(-\frac{|f(s_0)|}{(c+1)^2}\right)$.

Note that the bound in Theorem 5 is exponentially decreasing in the “distance” of $s_0$ from $V$. The theorem follows by applying Azuma’s inequality to the supermartingale $X_n = f(s_n) + n$ (stopped at the first point in time in which $V$ is reached). It is sufficient to observe that being inside $V$ at step $n$ entails $X_n - X_0 \geq n - f(s_0)$, and by Azuma’s inequality, the probability of this event is bounded by $\exp\left(-(n - f(s_0))^2 / (2n(c+1)^2)\right)$. Summing these bounds (as a geometric series) over all $n \geq \lceil |f(s_0)|/c \rceil$ (a lower bound on the number of steps needed to hit $V$) and simplifying yields the bound from the theorem. We note that this use of martingales for quantitative safety proving bears some similarity to the technique of barrier certificates in stochastic control theory [PJP04]. The latter are mostly based on Doob’s martingale inequality, which dispenses with the $c$-bounded-differences assumption at the expense of providing only linearly decreasing bounds. Our derivation of exponentially decreasing bounds is inspired by the analysis of one-counter systems [BKK14].

We now turn to the algorithmic aspects of stochastic invariants and repulsing supermartingales, for which we again restrict to affine probabilistic programs. Note that the definition of a RepSM is, from a high-level point of view, syntactically similar to the definition of an RSM: a list of requirements, each essentially of the form “for each state in a given invariant, some affine relationship must hold between the current value of the function and the expected next-step value.” Hence, given the set $V$ and the invariant $I$, the conditions in Definition 5 can be translated into a system of linear inequalities using the Farkas’ lemma. Similarly, once we have identified some stochastic invariant $(\mathit{SI}, p)$, checking whether it supports an RSM (and thus getting a lower bound on the probability of termination) reduces quite straightforwardly to solving a linear system. A more interesting algorithmic question is computing the termination certificate and the stochastic invariant at the same time: that is, given a classical invariant $I$, we want to compute:

• a map $\mathit{SI}$ mapping locations to convex polytopes;
• a RepSM $f$ for the set $V = S \setminus \bigcup_{\ell} (\{\ell\} \times \mathit{SI}(\ell))$ supported by $I$; and
• an RSM supported by $\mathit{SI} \vee I$.

Using $f$, we can derive (via Theorem 5) a bound $p$ s.t. $(\mathit{SI}, p)$ is a stochastic invariant, and the RSM then certifies that the program terminates with probability at least $p$. To compute all the required objects simultaneously, we can generalise the known technique for computing ranking functions together with their supporting invariants by means of quadratic programming [CSS03]. The idea is that if the invariant $I$ is unknown, then also its corresponding matrix $C$ in (3) is composed of unknown coefficients. (Of course, we need to know at least the dimension of $C$ so as to set up the system (3), which we can achieve by fixing in advance the number of half-spaces that define the invariant in each location.)
After the Farkas’ transformation, we get the system (6) in which $C$ is multiplied with the vector of dual variables $\boldsymbol{\mu}$, thus yielding a quadratic system. The idea generalises rather straightforwardly to computing an RSM, a RepSM, and a stochastic invariant at once. While non-linear constraint solving is algorithmically much harder than linear constraint solving, for smaller programs the approach is expected to be within the capabilities of modern solvers.

Our paper details several other uses of RepSMs, the chief of which is the refutation of a.s. and finite-time termination (in the weak sense that the refuted program does not, say, terminate a.s. under any scheduler). Let us first introduce a weaker notion of a RepSM:

Definition 6. Let P be a probabilistic program with a state set $S$, $I$ its (classical) invariant, and $V$ a set of violating states. A map $f\colon S \to \mathbb{R}$ is a weakly repulsing supermartingale (WRepSM) for $V$ supported by $I$ if it satisfies the first two items from the definition of a RepSM (Definition 5) and moreover, for every $s \in I \cap (S \setminus V)$ it holds that $\mathbb{E}_{s' \sim \vec{\tau}}[f(s')] \leq f(s)$.

Then, we can prove, using the optional stopping theorem from martingale theory, that bounded-difference RepSMs for the set of terminal states refute (existential) a.s. termination, while bounded-difference WRepSMs refute (existential) finite-time termination:

Theorem 6. Let $V$ be the set of terminal states of a program P.

1. If there exist $c > 0$ and a RepSM for $V$ with $c$-bounded differences supported by some invariant, then P terminates with probability strictly less than one under every scheduler.

2. If there exist $c > 0$ and a WRepSM for $V$ with $c$-bounded differences supported by some invariant, then the expected termination time of P is infinite under every scheduler.

5.3 Lexicographic Ranking Supermartingales

The previous papers on the use of martingales in probabilistic program analysis considered only scalar martingale-based certificates. In the following paper, co-authored with Agrawal and Chatterjee, we were the first to introduce the concept of lexicographic ranking supermartingales.

• S. Agrawal, K. Chatterjee, and P. Novotný. “Lexicographic ranking supermartingales: an efficient approach to termination of probabilistic programs”. In: Proc. ACM Program. Lang. 2 (Proceedings of POPL’18) (2018), 34:1–34:32. doi: 10.1145/3158122. url: https://doi.org/10.1145/3158122

The motivation for using vectorial certificates instead of scalar ones was already discussed for the non-probabilistic case. However, the process of extending lexicographic RFs to the probabilistic setting is quite intricate. Consider, for instance, one of the possible definitions of a LexRF, which [BG15] call Bradley-Manna-Sipma (BMS) LexRFs:

Definition 7. Let $f\colon S \to \mathbb{R}^n$ be a function and denote by $f_i$ the projection of $f$ to the $i$-th component. We say that $f$ is a BMS-LexRF supported by an invariant $I \subseteq S$ if for every reachable non-terminal state $s \in I$ and its successor $s' \in I$ there exists $1 \leq i \leq n$ such that all of the following conditions hold:
$$f_i(s) \geq 0, \qquad (10)$$
$$\forall 1 \leq j < i\colon\ f_j(s') \leq f_j(s), \qquad (11)$$
$$f_i(s') \leq f_i(s) - 1. \qquad (12)$$

The condition (10) is simply called non-negativity, the condition (11) is known as the unaffecting condition, and (12) as the ranking condition. It is not difficult to prove that if a program admits a BMS-LexRF, then it terminates.
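For illustration (our toy example, in the spirit of [BG15]), consider the non-probabilistic program

while 𝑥 ≥ 0 ∨ 𝑦 ≥ 0 do
  if 𝑥 ≥ 0 then 𝑥 := 𝑥 − 1 else 𝑦 := 𝑦 − 1 fi
od

The function $f(s) = (x, y)$ is a BMS-LexRF: a transition taken with $x \geq 0$ is ranked by the first component (which is then non-negative and decreases by one), while a transition taken with $x < 0$ is ranked by the second component (the guard then forces $y \geq 0$, $y$ decreases by one, and the first component is unaffected). Note that on the latter transitions the first component is negative; BMS-LexRFs demand non-negativity only of the ranking component, a point that will be crucial below.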
To lift the construction into the probabilistic setting, it would be tempting to follow the lead of the scalar setting and simply replace $f_j(s')$ and $f_i(s')$ in (11) and (12) with $\mathbb{E}_{s' \sim \vec{\tau}}[f_j(s')]$ and $\mathbb{E}_{s' \sim \vec{\tau}}[f_i(s')]$, respectively (requiring the conditions to hold for any transition outgoing from $s$). However, such a proof rule would not be sound for proving a.s. termination. As a counterexample, one can actually take the program from [FH15] demonstrating that the compositional approach to a.s. termination proving is not sound without uniform integrability constraints. The “level of unsoundness” is substantial: the program does not have a single terminating execution, yet it admits the naive probabilistic extension of a BMS-LexRF.

The problem with Definition 7 is that the non-negativity condition (10) is partial: for each transition, we only require non-negativity of the leftmost component decreased by that transition. In our paper, we showed that probabilistic extensions of LexRFs are sound if we require universal non-negativity:

Definition 8. Let P be a probabilistic program and $I$ its invariant. We say that a function $f\colon S \to \mathbb{R}^n$ is a lexicographic ranking supermartingale (LexRSM) if for every state $s \in I$ and every transition $\tau = (s, \vec{\tau})$ the following holds:

• $\forall i \in \{1, \ldots, n\}\colon f_i(s) \geq 0$; and
• if $s$ is non-terminal, then there exists $i \in \{1, \ldots, n\}$ such that
$$\forall 1 \leq j < i\colon\ \mathbb{E}_{s' \sim \vec{\tau}}[f_j(s')] \leq f_j(s), \qquad \mathbb{E}_{s' \sim \vec{\tau}}[f_i(s')] \leq f_i(s) - 1.$$

We prove that LexRSMs are sound for proving a.s. termination:

Theorem 7. Suppose that P admits a LexRSM $f$ supported by some invariant $I$. Then P terminates almost-surely under any scheduler.

Several comments on our result are in order. First, although LexRSMs are syntactically a rather straightforward extension of LexRFs (the universal non-negativity was previously used in the context of LexRFs in [ADFG10]), this does not hold for proving their soundness. As demonstrated by the unsoundness of partial non-negativity, the proof requires delicate reasoning about the underlying stochastic processes. As a matter of fact, in our paper we first provide an abstract definition of a LexRSM (in the language of general stochastic processes) and a proof of its soundness. Only then do we instantiate it in the domain of probabilistic program analysis. Hence, our result has the potential for applicability in other application domains as well.

Second, Theorem 7 does not guarantee that the program terminates in a finite expected number of steps. Indeed, we give an example of a linear-arithmetic program without non-determinism which has an infinite expected termination time and yet admits a LexRSM (and terminates almost-surely). While this could be viewed as a weakness of LexRSMs, we argue that it is actually a feature, as it demonstrates that LexRSMs are a more general proof rule than scalar RSMs, which can only exist for programs that do terminate in finite expected time under every scheduler. We also formulate some additional conditions under which LexRSMs entail finite (and polynomial) bounds on the expected termination time.

Finally, we comment on the relationship with the uniform integrability issue studied in [FH15]. One might suspect that the universal non-negativity condition is simply another way of ensuring uniform integrability, and that our method does not actually bring anything new over the typechecking-supported compositional termination rule from [FH15]. Neither is the case.
First, a sequence of non-negative random variables does not need to be uniformly integrable, and vice versa. So on the mathematical level, the two proof rules are orthogonal. One might wonder whether this orthogonality survives in the concrete world of affine probabilistic programs. To examine this, we designed a LexRSM-based compositional termination rule which replaces the uniform integrability requirement from [FH15] with non-negativity of the constituent RSMs (and thus dispenses with the need for typechecking). We show that this new rule is able to prove a.s. termination of programs that cannot be handled by the rule from [FH15]. We also note that while the soundness of the proof rule from [FH15] under the standard MDP semantics has been questioned [HFCG19], our proof rule was proven sound under this standard semantics. (And fortunately, it has so far survived all proposed counterexamples: the example from [HFCG19] permits some components of the proof certificate to become unboundedly negative, and hence does not contradict the soundness of LexRSMs.)

Finally, we comment on the algorithmic aspects of LexRSM synthesis, for which we again restrict to affine programs and focus on LexRSMs with linear components. Since LexRSMs are syntactically similar to the non-negative LexRFs studied in [ADFG10], we can employ the iterative Farkas’-based algorithm from that paper, modified to handle probabilistic programs in a similar way to how [CS13] extended algorithms for scalar RF synthesis to the probabilistic setting. The resulting algorithm runs in polynomial time and is guaranteed to find a linear LexRSM of the smallest degree supported by the input invariant $I$.

5.4 Beyond Non-Negative LexRSMs

As shown in the previous section, LexRSMs with relaxed non-negativity conditions are not necessarily sound certificates of a.s. termination. On the other hand, universal non-negativity might be too restrictive. Indeed, there is a spectrum of non-negativity conditions between the universal non-negativity of LexRSMs and the single-component non-negativity of Bradley-Manna-Sipma LexRFs, and there remains hope that some relaxation of universal non-negativity is possible while retaining soundness. Such relaxations were explored in the Formal Methods’21 paper co-authored with Chatterjee, Goharshady, Zárevúcky, and Žikelić:

• K. Chatterjee, E. K. Goharshady, P. Novotný, J. Zárevúcky, and D. Žikelić. “On Lexicographic Proof Rules for Probabilistic Termination”. In: Proceedings of FM’21. Ed. by M. Huisman, C. S. Pasareanu, and N. Zhan. Vol. 13047. LNCS. Springer, 2021, pp. 619–639

As a motivating example, consider the two programs in Figure 2.

ℓ0: while 𝑦 ≥ 0 do
      𝑥 := 𝑦;
ℓ1:   while 𝑥 ≥ 0 do
        𝑥 := 𝑥 − 1 + Norm(0, 1)
      od;
      𝑦 := 𝑦 − 1
    od
(a)

ℓ0: while 𝑥 ≥ 0 do
      if 𝑦 ≥ 0 then
        𝑦 := 𝑦 + Uni[−7, 1]
      else
        𝑥 := 𝑥 + Uni[−7, 1];
ℓ1:     𝑦 := 𝑦 + Uni[−7, 1]
      fi
    od
(b)

Figure 2: Motivating examples. Norm(𝜇, 𝜎) samples from the normal distribution with mean 𝜇 and std. deviation 𝜎. Uni[𝑎, 𝑏] samples uniformly from the interval [𝑎, 𝑏]. Location labels are the “ℓ𝑖”: one location per loop head and one additional location in (b) so as to have one assignment per transition (a technical requirement for our approach).

The program in Figure 2a does terminate a.s., as can be shown by a simple random-walk argument. A linear LexRSM proving this needs to have a component containing a positive multiple of 𝑥 at the head of the inner while-loop (ℓ1).
However, due to the sampling from the normal distribution, which has unbounded support, the value of 𝑥 inside the inner loop cannot be bounded from below. Hence, the program does not admit a linear LexRSM. In general, LexRSMs with strong non-negativity do not handle well programs with unbounded-support distributions. Now consider the program in Figure 2b. It can again be shown that it terminates a.s.; however, this cannot be witnessed by a linear LexRSM: to rank the “if-branch” transition, there must be a component with a positive multiple of 𝑦 in ℓ0. But 𝑦 can become arbitrarily negative within the else-branch, and cannot be bounded from below by a linear function of 𝑥. While it is possible that both programs do admit a non-linear LexRSM, restrictions to linear arithmetic are typically preferred due to easier automation.

We first consider a probabilistic generalisation of so-called Ben-Amram–Genaim (BG) LexRFs [BG15]. Intuitively, a BG-LexRSM replaces the universal non-negativity (the first item in Definition 8) with a weaker version requiring that if $s$ is non-terminal and $i$ is the leftmost component s.t. $\mathbb{E}_{s' \sim \vec{\tau}}[f_i(s')] \leq f_i(s) - 1$, then for all $1 \leq j \leq i$ we have $f_j(s) \geq 0$. Unfortunately, we show that already this relaxation is unsound. By examining the counterexample, we identify a supplementary condition which ensures soundness, yielding the notion of a generalised LexRSM (GLexRSM):

Definition 9. Let P be a probabilistic program and $I$ its invariant. We say that a function $f\colon S \to \mathbb{R}^n$ is a generalised lexicographic ranking supermartingale (GLexRSM) if for every non-terminal state $s \in I$ and every transition $\tau = (s, \vec{\tau})$ there exists $1 \leq i \leq n$ such that the following holds:
$$\forall 1 \leq j < i\colon\ \mathbb{E}_{s' \sim \vec{\tau}}[f_j(s')] \leq f_j(s), \qquad (13)$$
$$\mathbb{E}_{s' \sim \vec{\tau}}[f_i(s')] \leq f_i(s) - 1, \qquad (14)$$
$$\forall 1 \leq j \leq i\colon\ f_j(s) \geq 0, \qquad (15)$$
$$\forall 1 \leq j \leq i\colon\ \mathbb{E}_{s' \sim \vec{\tau}}[f_j(s') \cdot \mathbb{I}_{<j}(s')] \geq 0, \qquad (16)$$
where $\mathbb{I}_{<j}(s')$ is the indicator function of the set of all states in which a transition ranked by some component to the left of the $j$-th component is enabled.

We call the requirement (15) partial non-negativity and (16) expected leftward non-negativity (ELN). It is the ELN requirement that distinguishes GLexRSMs from a probabilistic generalisation of BG-LexRFs. We prove the soundness of GLexRSMs for proving a.s. termination, using a novel application of the Borel-Cantelli lemma.

Theorem 8. Suppose that P admits a GLexRSM $f$ supported by some invariant $I$. Then P terminates almost-surely under any scheduler.

In particular, GLexRSMs can prove a.s. termination of the programs in our motivating examples.

We now turn to the automation of GLexRSM synthesis. Unfortunately, even when restricted to linear-arithmetic programs, the ELN requirement (16) cannot be straightforwardly translated into a set of linear constraints, since it involves a rather complex integral. Hence, we consider adding further restrictions on the shape of the program or on the GLexRSM itself so that the integration can be replaced by simpler operations. While the resulting notions are less general than GLexRSMs, they are (in a linear form) still applicable to a wider class of programs than LexRSMs. First, we consider programs in which all samplings are from bounded-support distributions (such as the program in Figure 2b).
We show that in such programs, the existence of a linear GLexRSM is equivalent to the existence of a linear function $f\colon S \to \mathbb{R}^n$ whose definition differs from that of a GLexRSM in that:

• (16) is only required to hold for transitions corresponding to probabilistic branching, and
• for all other transitions, $f$ satisfies the following expected non-negativity (EN) property (recall that $i$ denotes the ranking component):
$$\forall 1 \leq j \leq i\colon\ \mathbb{E}_{s' \sim \vec{\tau}}[f_j(s')] \geq 0. \qquad (17)$$

The existence of such a function (supported by a given invariant) can be decided by adapting the iterative Farkas’-based algorithm for LexRSMs. For the modification, one needs to observe that (17) can be encoded into a linear constraint by the Farkas’ transformation. Encoding (16) for probabilistic branching locations is a more technical step that rests on two ideas:

• Each probabilistic state has just two successors, so the integral in (16) is a two-term sum involving the indicators $\mathbb{I}_{<j}(s')$.
• By the time we are synthesising the $j$-th component of $f$, we already know the previous components and thus can identify the transitions ranked by them (and thus also the valuations in which the guards of these transitions are satisfied, which can be described by a linear-arithmetic formula). Hence, we can remove the indicators from the sum and instead replace the universal quantification over “all states in the invariant” by “all states in the invariant whose successors satisfy a guard of some transition ranked by a previous component”. Since the latter set can be expressed as a union of convex polytopes, we get a formula that can be fed into the Farkas’ transformation.

We note that this modification of GLexRSMs is still sufficient to prove the a.s. termination of the program in Figure 2b.

We then study how to drop the bounded-support assumption without sacrificing efficient automation. First, we impose a mild syntactic restriction on the programs we analyse: if there are two transitions, one corresponding to probabilistic branching, the other to a sampling instruction, then these two transitions must not share a target location. Such a property can be ensured by using dummy skip statements. Second, we impose restrictions on the form of the linear “GLexRSMs” that we aim to synthesise. More specifically, we say that a linear function $f\colon S \to \mathbb{R}^n$ satisfies the UNBOUND condition if the following holds:

• Let $\tau$ be any transition which changes some variable $x$ by sampling from a distribution that has unbounded support (in particular, due to the syntax of PPs, all successor states of $\tau$ share the same location $\ell$). Moreover, let $i$ be the smallest index such that $\mathbb{E}_{s' \sim \vec{\tau}}[f_i(s')] \leq f_i(s) - 1$. Then for all $1 \leq j < i$, the coefficient of the variable $x$ in the linear function $f_j$ at the source location of $\tau$ is zero.

We prove that a program terminates a.s. whenever it admits a linear function $f\colon S \to \mathbb{R}^n$ that satisfies the UNBOUND condition and the definition of a GLexRSM, with the exception of:

• the ELN property (16) being only required for probabilistic branching transitions; and
• the EN property (17) being satisfied for all the other transitions.

The existence of such a function can again be decided by a modification of the iterative algorithm for LexRSM synthesis. This modification of GLexRSMs is sufficient to prove a.s.
termination of the program in Figure 2a.

5.5 Summary of Martingale-Based Techniques: A Book Chapter

The results from the previous three sections, as well as several results of other authors [MMKK18], were summarised in a chapter of Foundations of Probabilistic Programming co-authored with Chatterjee and Fu:

• K. Chatterjee, H. Fu, and P. Novotný. “Termination Analysis of Probabilistic Programs with Martingales”. In: Foundations of Probabilistic Programming. Ed. by G. Barthe, J.-P. Katoen, and A. Silva. Cambridge University Press, 2020. Chap. 7, pp. 221–258

5.6 Non-Probabilistic Tech Transfer: Proving Non-Termination by Program Reversal

Somewhat ironically, we conclude this section with a result from non-probabilistic program analysis. However, there is a probabilistic connection: the result spun off from preceding discussions on eventually valid a.s. termination certificates, e.g. RSMs that only start to behave as RSMs after some finite but possibly unbounded number of steps. This resulted in discussions of eventual invariants, i.e. sets of states that one is guaranteed to reach eventually. Such a concept can be seen as an instance of reachability (and thus also termination) analysis in non-probabilistic programs, which led us to have a closer look at the relationship between invariants and termination. A final product of this study was a new approach to the automated proving of non-termination in non-probabilistic programs, presented in the following PLDI’21 paper co-authored with Chatterjee, Goharshady, and Žikelić:

• K. Chatterjee, E. K. Goharshady, P. Novotný, and D. Žikelić. “Proving Non-Termination by Program Reversal”. In: Proceedings of PLDI’21. Ed. by S. N. Freund and E. Yahav. ACM, 2021, pp. 1033–1048

First, we clarify the motivation for proving non-termination. While termination is the property one typically desires to prove, if a termination proof fails, we do not know whether this is because the program does not terminate or because our termination-proving technique was too weak. In such a case, it is useful to employ a non-termination prover as a sort of bug-hunting procedure: if the non-termination proof is successful, we can be sure that the program indeed contains an error, and the proof might help us localise and repair it. Hence, proving non-termination of programs is a highly active area of research [GHMRX08; CCFNO14; LNORR14; Gie+17; LH18; FG19; VR08].

Fix a program (represented as a non-probabilistic PTS with a state set $S$) with a given set of initial states. As discussed previously, a set $I \subseteq S$ is an invariant for the program if each state reachable from some initial state is contained in $I$. A set $I$ is inductive if for every $s \in I$ the set $I$ contains all one-step successors $s'$ of $s$. Every inductive set that contains all initial states is an invariant, but not every invariant is inductive (since it might contain some unreachable state $s$ while missing some successor of $s$). There exist many techniques for the synthesis of non-trivial (inductive) invariants. In our paper, we employ the technique of [GSV08] based on quadratic constraint solving. To reason about non-termination, we introduce the notion of a backward invariant (BI).

Definition 10. Consider a program with a set of terminal states $F$. A set of states $\mathit{BI}$ is a backward invariant w.r.t. $F$ if every state $s$ from which $F$ is reachable is contained in $\mathit{BI}$. Moreover, we say that $\mathit{BI}$ is pre-inductive if for every $s' \in \mathit{BI}$ and every one-step predecessor $s$ of $s'$ it holds that $s \in \mathit{BI}$.
A good way to conceptualise backward invariants is to imagine a program with all transitions "reversed": if the original program contains a transition from state $s$ to $s'$, the reversed program contains a transition from $s'$ to $s$. The reversed program has $F$ as its set of initial states. (Note that we work with non-probabilistic transitions.) A backward invariant in the original program is then simply an invariant in the reversed program; similarly, a set is pre-inductive iff it is inductive in the reversed program.

The imperative arithmetic programs we consider can be translated into arithmetic transition systems, where each transition is represented as an arithmetical relation between the primed and unprimed variables. For these systems, the reversal can be done simply by swapping the primed and unprimed variables in the individual relations. We can then plug the reversed program into the aforementioned invariant-synthesis technique to obtain pre-inductive backward invariants for the original program. Backward invariants serve as sound certificates of non-termination:

Theorem 9. Let P be a non-probabilistic program. If P admits a pre-inductive backward invariant BI such that BI is not an invariant of P, then P admits a non-terminating execution.

The theorem is easy to prove: since BI is not an invariant, its complement $\neg$BI $= S \setminus$ BI is reachable, so there exists an execution that enters $\neg$BI. Since BI is pre-inductive, $\neg$BI is inductive, so the execution never leaves $\neg$BI once it reaches it. Finally, since BI is a backward invariant, $\neg$BI does not contain any terminal state, so the aforementioned execution is non-terminating.

We now sketch how to algorithmically find a BI satisfying the conditions of Theorem 9. The problem is that the constraint-based technique of [GSV08] cannot express the condition that BI is not an invariant. A naive solution to this issue would be to find any pre-inductive backward invariant and then check, using a safety prover, that its complement is reachable. However, if the check is only done after the synthesis, then nothing prevents the constraint solver from yielding the trivial backward invariant $S$, whose complement is empty. Hence, we separate the synthesis of BI into two steps. First, we check whether there exists a pre-inductive backward invariant BI which does not contain some initial state. This can be done using the constraint-based approach. Such a BI, if it exists, is clearly not an invariant, so in the case of a positive answer, we can report non-termination without the need for a reachability check. Otherwise, we check whether there exists a pre-inductive backward invariant BI which is not inductive. For such a BI, we know that any execution that reaches $\neg$BI is non-terminating, so we check whether the complement of BI is reachable using a safety checker. The idea is that by restricting the search for BI to non-inductive sets, the constraint-based approach becomes less likely to return trivial solutions. The overall two-stage procedure is sketched below.
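The following is a compressed, purely illustrative Python sketch of the procedure. The helpers synthesise_bi (standing for the [GSV08]-style constraint solving over the reversed program) and is_reachable (standing for an off-the-shelf safety prover) are hypothetical placeholders, as is the representation of transitions as (pre, post) relation pairs.

```python
def reverse(transitions):
    # Each transition is an arithmetic relation between unprimed (pre) and
    # primed (post) variables; reversal just swaps the two roles.
    return [(post, pre) for (pre, post) in transitions]

def prove_nontermination(program):
    reversed_transitions = reverse(program.transitions)
    # Stage 1: look for a pre-inductive backward invariant that misses some
    # initial state; such a BI is by definition not an invariant.
    bi = synthesise_bi(reversed_transitions, program.terminal_states,
                       must_exclude=program.initial_states)  # hypothetical
    if bi is not None:
        return "non-terminating"
    # Stage 2: look for a pre-inductive but non-inductive BI and let a
    # safety checker decide whether its complement is reachable.
    bi = synthesise_bi(reversed_transitions, program.terminal_states,
                       must_be_non_inductive=True)  # hypothetical
    if bi is not None and is_reachable(program, bi.complement()):
        return "non-terminating"
    return "unknown"
```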
Our method is arguably simple, which we deem to be an advantage. Many technical details were omitted from the above description. In particular, our method also works for programs with non-determinism, which is not universally true for alternative approaches to non-termination proving. We resolve non-determinism through polynomial-template schedulers (the synthesis of which can be incorporated into the constraint-solving step), which narrows down the set of possible behaviours and thus simplifies the search for a pre-inductive backward invariant.

To summarise, our approach has several benefits which, to the best of our knowledge, are not all present in any alternative approach:

• it handles non-determinism;
• it handles polynomial-arithmetic (as opposed to only linear-arithmetic) programs;
• it is able to prove non-termination of programs in which all non-terminating executions are aperiodic (i.e. non-lasso);
• it comes with relative completeness guarantees (i.e. if a non-termination witness of a certain shape exists, our method is guaranteed to find it).

We evaluated our approach on benchmarks from the TermComp'19 termination-proving competition [GRSWY19]. With a proper configuration, our tool proved non-termination of more programs than any state-of-the-art comparison tool, and it even proved non-termination of 2 programs that were not proven by any other tool. The vast majority of non-terminations were proved using the first stage of the algorithm (a BI not containing an initial configuration), though the second stage (with the safety prover) was also utilised for several programs, demonstrating its usefulness. The results show that very simple techniques can yield surprisingly good and efficient outcomes. It is worth noting, though, that these results were made possible by recent advances in SMT solving, which enabled us to solve the quadratic constraint systems efficiently.

6 Design Safety: Risk-Aware Decision Making under Uncertainty

In this chapter, we focus on the design safety of decision-making algorithms for autonomous agents operating under statistical uncertainty. The prime model for such stochastic decision-making is the Markov decision process (MDP), which is essentially a probabilistic transition system (PTS) as introduced in Chapter 2. In an MDP, the whole system (consisting of an agent and its environment) can be, in each time step, in one of many states, and in each step, the agent has the opportunity to choose from some set of actions enabled in the current state. Depending on the current state-action pair, the system probabilistically transitions into a new state, after which a new step begins, the process continuing in this fashion ad infinitum. By identifying actions with PTS transitions, we see that MDPs are indeed PTSs, with agent choices resolved by schedulers (which are called strategies or policies in the MDP setting). Hence, from now on we will use the terms PTSs/MDPs, transitions/actions, schedulers/policies, etc., interchangeably. In what follows, we focus on MDPs with finite (though possibly large) state and action spaces.

Similarly to the verification of PTSs, studied in the previous chapters, MDPs are typically equipped with objectives the agent aims to satisfy. The major difference between the PTS and MDP "narratives" is that in PTS verification, we aim to show that the system satisfies a given property (objective) for every scheduler (the adversarial/demonic view of non-determinism), while in MDPs we want to compute a single policy which makes the agent satisfy the objective (the controllable/angelic view of non-determinism).
In particular, if the underlying objective is quantitative (i.e., each policy is assigned some number), the MDP approach aims at the optimisation of this quantity rather than at merely verifying that the quantity surpasses or stays below some value for some or for every policy.

In a typical MDP, the task is to optimise the expected payoff, where the payoff is a suitable aggregation of per-step rewards. That is, each state-transition pair $(s, \tau)$ is assigned a numerical reward $r(s, \tau) \in \mathbb{R}$, and the step rewards $r_i = r(s_i, \tau_i)$ are aggregated over the whole run $\varrho = s_0 \tau_0 s_1 \tau_1 \ldots s_{i-1} \tau_{i-1} s_i \ldots$ using a suitable aggregation function. The aggregation can run either up to some finite decision horizon $H$ or over the whole infinite run, in which case $H = \infty$. The aggregation functions most commonly encountered in the literature are the discounted payoff, defined as $\mathit{disc}_\gamma(\varrho) = \sum_{i=0}^{H} \gamma^i r_i$ for some discount factor $\gamma \in (0,1)$; the mean payoff (also called limit average) $\mathit{mp}(\varrho) = \liminf_{i \to H} \frac{1}{i}\sum_{j=0}^{i-1} r_j$; and the total reward $\mathit{tot}(\varrho) = \sum_{i=1}^{H} r_i$, which is well-defined whenever $H$ is finite or the per-step rewards are non-negative (these functions are also illustrated in a short code sketch at the end of this passage). For a given aggregation function $\mathit{payoff} \in \{\mathit{disc}_\gamma, \mathit{mp}, \mathit{tot}\}$, the task is then to find a policy $\sigma$ maximising the expected payoff $\mathbb{E}^\sigma_s[\mathit{payoff}]$ from the initial state $s$.

The expected payoff maximisation in MDPs has been heavily studied from both mathematical [Put05] and computer-science [PT87] points of view. Over time, it became clear that the expected payoff is a rather crude optimisation criterion which is not suitable for every application scenario. For instance, a policy which always yields zero payoff is, under the expected-payoff criterion, equivalent to one which has two equiprobable outcomes: +100 and −100. The difference in the underlying payoff distribution is particularly important with respect to the agent's risk-willingness: a loss-averse agent would strictly prefer the constant policy over the second one. This reasoning introduced a notion of risk into MDP optimisation, and inevitably, various approaches to handling this notion eventually emerged. One approach replaces the classical expected payoff with various risk metrics of the payoff distribution, such as mean-variance, value at risk (VaR), or conditional value at risk (CVaR, also known as expected shortfall); see, e.g., [SLM12; PS12; Mai13; CGJP17; TGY20]. These approaches lead to single-dimensional optimisation, i.e. the goal is to minimise the risk metric. In this thesis, we focus on the alternative approach of constrained optimisation, where the agent aims to optimise the expected payoff subject to satisfying additional risk constraints on the policy.

The standard model of constrained probabilistic decision-making are constrained MDPs (CMDPs) [Alt99]. Here, apart from per-step rewards, each state-transition pair is assigned a per-step penalty $c(s, \tau)$. The penalties are again aggregated using a suitable function, and the task is to find a policy maximising the expected payoff subject to the constraint that the expected aggregated penalty stays below a given threshold. (The standard CMDP model allows multiple penalty functions, each with a separate threshold. For simplicity, we consider only single-dimensional penalties.) Various approaches to solving CMDPs were developed [Alt99; UH10; Pou+15; STW16], some of them for the more general partially observable setting (which we will discuss in more detail later on).
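As promised above, here is a minimal Python sketch of finite-horizon versions of the three aggregation functions, with a run represented simply as the list of its per-step rewards (this representation is our illustrative choice):

```python
def disc(rewards, gamma):
    # Discounted payoff: the sum of gamma^i * r_i over the run.
    return sum(gamma ** i * r for i, r in enumerate(rewards))

def mean_payoff(rewards):
    # Finite-horizon mean payoff; for H = infinity one takes the lim inf
    # of these averages instead.
    return sum(rewards) / len(rewards)

def total_reward(rewards):
    # Total reward; well-defined for finite H or non-negative rewards.
    return sum(rewards)

print(disc([1, 1, 1], 0.9))  # 1 + 0.9 + 0.81 = 2.71
```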
While the works cited above laid the theoretical groundwork for tackling the problem, the algorithms presented therein (based on classical planning techniques such as branch-and-bound or linear programming) do not match the performance of modern heuristic and learning algorithms for unconstrained payoff optimisation. Only relatively recently have techniques such as policy gradient and Monte Carlo tree search started to be applied to CMDPs with both perfect and partial observation [CGJP17; LKPK18]. The work of the thesis's author was a part of that trend. In the remainder of this chapter, we present three publications in which the author has aimed to achieve the following objectives:

• solve constrained optimisation problems for MDPs in which the constraint does not directly fit into the CMDP framework;
• design algorithms that are suitable for solving models with both perfect and partial observation; and
• design algorithms that scale well to very large models.

6.1 POMDP Optimisation under Worst-Case Payoff Constraints

We start with the following AAAI'17 paper co-authored with Chatterjee, Pérez, Raskin, and Žikelić:

• K. Chatterjee, P. Novotný, G. A. Pérez, J. Raskin, and D. Žikelić. "Optimizing Expectation with Guarantees in POMDPs". In: Proceedings of AAAI'17. AAAI Press, 2017, pp. 3725–3732. url: http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14354

The paper considers optimisation in partially observable MDPs (POMDPs). In a POMDP, the agent cannot directly observe the current state. Instead, upon entering some state $s$, the state emits an observation $o$ according to a fixed state-dependent distribution $O_s$. The agent receives the emitted observation and incorporates it into its decision. To this end, we assume that there is some global set of transition labels called actions, with transitions enabled in different states possibly sharing an action label. Making a decision then means selecting an action $a$ to "play," which results in performing a transition that is enabled in the current state and labelled with $a$. In line with this reasoning, the policies in POMDPs output distributions over actions and do not input state-transition histories $s_0 \tau_0 s_1 \tau_1 \ldots \tau_{i-1} s_i$, but observation-action histories $o_0 a_0 o_1 a_1 \ldots a_{i-1} o_i$, where $o_j \sim O_{s_j}$ and $a_j$ is the label of transition $\tau_j$. In what follows, we also assume that the agent can observe the step rewards, i.e. that these rewards can be framed as a function of the current observation and the selected action.

In the presented paper, we consider the problem of optimising the expected discounted payoff under the constraint that no run emitted by the policy has a discounted payoff smaller than a given threshold $b$. That is, we are solving the problem

$$\max_\sigma\ \mathbb{E}^\sigma_{s_0}[\mathit{disc}_\gamma] \quad \text{subject to} \quad \mathbb{P}^\sigma_{s_0}[\mathit{disc}_\gamma < b] = 0.$$

(For the discounted payoff, it can be proved that almost-sure satisfaction of the constraint is equivalent to its sure satisfaction.) This corresponds to the problem of beyond-worst-case (BWC) optimisation in perfectly observable MDPs [BFRR14; RRS15]. However, utilising the previous BWC approaches would first require us to exactly solve the unconstrained optimisation problem. While this is, in principle, possible in MDPs, where polynomial-time algorithms for the unconstrained problem are known, solving POMDPs exactly is generally considered to be intractable. Instead, various heuristic and statistical approaches are the method of choice for POMDP solving.
To take this into account, we take the following approach to the problem: take some efficient heuristic algorithm for unconstrained POMDPs and modify it so as to never select actions that might lead to a violation of the constraints. The idea is similar to the concepts of permissive controller synthesis [DFKPU14] and shielding [Als+18] in MDPs, which were emerging at the same time as our publication. In the terminology of the latter, our approach can be explained as consisting of two phases:

• In the first (offline) phase, we compute a shield: an algorithm which inputs the POMDP history (or some characteristic thereof) and outputs a list of allowed actions. The shield has the property that as long as the agent plays only actions allowed in the respective steps, the constraint is never violated. Conversely, no safe (i.e. constraint-satisfying) policy is prohibited by the shield: once the agent plays a disallowed action, he is guaranteed to eventually violate the constraint.

• Then, we run some state-of-the-art algorithm for unconstrained POMDP optimisation augmented with the shield. The shield keeps track of the agent's history and prohibits the usage of disallowed actions by the algorithm.

We briefly elaborate on our approach to the two aforementioned points. We prove that the shield only needs to keep track of the payoff accumulated so far and of the agent's belief support: the set of states in which the POMDP can currently be with non-zero probability. Belief supports are uniquely determined by the current history and can be efficiently updated from step to step (a minimal sketch of the update appears below). To compute the actions allowed for each payoff-belief-support pair, we solve a finite turn-based zero-sum discounted-payoff game whose states correspond to the possible belief supports. One player in the game corresponds to the agent, whereas the other resolves the probabilistic behaviour of the environment in an adversarial way. (The constraint in our problem is such that the exact transition probabilities do not matter; what matters are the worst possible outcomes of the probabilistic choices.) This way of computing the shield might seem inefficient: there is no known polynomial-time algorithm for discounted-payoff games (though they are known to be in NP ∩ coNP [ZP96]), and moreover, the number of belief supports can be exponential in the size of the POMDP. However, in practice, the number of belief supports reachable from the initial setup can be much lower due to regularities in the POMDP's structure. As for solving the resulting game, the classical value iteration algorithm [Put05] proved to work very well in our experiments.
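As an illustration of the belief-support update mentioned above, here is a minimal Python sketch. It assumes the POMDP is given by a dictionary trans mapping (state, action) pairs to lists of (successor, probability) pairs and a dictionary obs_prob mapping (state, observation) pairs to emission probabilities; these names and the representation are ours, not the paper's.

```python
def update_belief_support(support, action, observation, trans, obs_prob):
    # New support: states reachable with non-zero probability under `action`
    # from some state in the current support, restricted to the states that
    # can emit the received `observation`.
    new_support = set()
    for s in support:
        for successor, prob in trans[(s, action)]:
            if prob > 0 and obs_prob[(successor, observation)] > 0:
                new_support.add(successor)
    return frozenset(new_support)
```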
For the second, optimisation phase, we utilised the POMCP algorithm [SV10], which extends the Monte Carlo tree search (MCTS) planning algorithm for MDPs [KS06]. MCTS can be described as a heuristic search algorithm exploring a part of the infinite tree of all the model's histories (the history tree). It is an online algorithm which, in every decision step, computes a local approximation of the optimal policy so as to select the best action for the current situation. To achieve this, it iteratively builds a search tree: a finite connected sub-graph of the history tree, each node of which carries statistical information about the best payoff achievable by histories passing through that node. In each iteration, the current history (representing the root of the search tree) is extended with a finite suffix sampled according to a two-step process:

• As long as the suffix is in the current search tree, it is extended by selecting actions according to the UCT formula [KS06], which balances the exploration of new histories with the exploitation of promising ones. After each action selection, the next observation is sampled according to the selected action and the dynamics of the given POMDP.

• Once the suffix falls out of the search tree, the process continues similarly but with actions selected uniformly at random (the so-called rollout phase).

After the suffix is sampled, its shortest prefix not yet included in the search tree is added to the tree, and all nodes on the path from the newly added one to the tree root have their statistical information updated by the payoff of the sampled history. After a number of such iterations, the statistical information in the root is used to estimate the optimal action in the current step. This action is then performed, and a new observation is received, corresponding to one successor node $v$ of the root node. This $v$ then becomes the new root of the tree (i.e., all nodes outside of $v$'s sub-tree are pruned away), and the resulting tree is utilised in the next decision epoch.

We extended POMCP by storing, in each node, the information about the past accumulated payoff and the current belief support. During the building of the search tree and the final action selection, this information is passed to the shield so as to retrieve the list of actions from which POMCP can sample (a minimal sketch of this shielded selection step is given at the end of this section). We demonstrated the effectiveness of our approach on several classical POMDP benchmarks with up to ≈ 10,000 states.
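The interplay between the shield and the POMCP action selection can be illustrated as follows (a Python sketch with illustrative names; the exploration constant and the exact bookkeeping used by POMCP differ in details). In each tree node, selection applies the UCT rule, but only over the shield-allowed actions:

```python
import math

def shielded_uct_select(node, allowed_actions, exploration=1.4):
    # UCT: average payoff estimate plus an exploration bonus proportional
    # to sqrt(ln(total visits) / visits of the action) [KS06], maximised
    # over the actions permitted by the shield.
    total = sum(node.visits[a] for a in allowed_actions)
    def uct_score(a):
        if node.visits[a] == 0:
            return float("inf")  # explore untried allowed actions first
        bonus = exploration * math.sqrt(math.log(total) / node.visits[a])
        return node.mean_payoff[a] + bonus
    return max(allowed_actions, key=uct_score)
```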
6.2 From Worst-Case to Quantile Constraints

We continue with the following IJCAI-ECAI'18 paper co-authored with Chatterjee, Elgyütt, and Rouillé:

• K. Chatterjee, A. Elgyütt, P. Novotný, and O. Rouillé. "Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-Sum Objectives". In: Proceedings of IJCAI'18. Ed. by J. Lang. ijcai.org, 2018, pp. 4692–4699. doi: 10.24963/ijcai.2018/652

In the paper, we generalised the objective from the previous section so as to allow constraint violations with some given (ideally small) probability. That is, apart from the payoff threshold $b$, we are also given a risk threshold $\theta$ and solve the following problem:

$$\max_\sigma\ \mathbb{E}^\sigma_{s_0}[\mathit{disc}_\gamma] \quad \text{subject to} \quad \mathbb{P}^\sigma_{s_0}[\mathit{disc}_\gamma < b] \le \theta.$$

The rationale behind this objective is that in some application domains, satisfying the worst-case constraint for non-trivial values of $b$ might be impossible, since the agent could always experience some catastrophic failure. Hence, we relax the constraints by allowing a bounded probability of large shortfalls. This "quantile" constraint in the problem's formulation is conceptually similar to the notion of value-at-risk utilised in mathematical finance [DP97].

It is not possible to straightforwardly adapt the shielding approach from the previous section to this more general scenario, due to several factors. First, even deciding whether there exists a policy satisfying a given quantile constraint in a perfectly observable MDP has been proved to be highly intractable for several classes of payoff functions [BCFNS13; HK15]. Second, optimal policies for the quantile-constrained problem can be randomised: the agent might want to probabilistically mix risky but potentially high-yield actions with conservative low-yield actions so as to keep the actual risk (i.e. the probability that $\mathit{disc}_\gamma < b$) as close to $\theta$ as possible (otherwise he would be sacrificing the optimality of the expected payoff). Hence, the shield would need to disable mixtures of actions rather than individual actions. Third, the agent's risk threshold $\theta$ actually evolves during the decision-making process. To illustrate this, imagine that the agent, whose risk threshold is $\theta$, has chosen a risky action with two equiprobable outcomes, one of which eventually leads to a payoff that is surely below $b$. If such a bad outcome indeed happens, there is nothing to be done to save the agent; but if the agent is lucky and observes the alternative outcome, he needs to adjust his risk threshold: before the action outcome was observed, the probability of violating the payoff constraint was $\frac{1}{2} + \frac{1}{2}p$, where $p$ is the eventual constraint-violation probability in the alternative outcome. To ensure that this probability is at most $\theta$, we must have $p \le 2\theta - 1$, i.e. the right-hand side is the new risk threshold for the continuation of the process. Hence, the shields would additionally need to input the current level of $\theta$, which further complicates their computation.

Our solution to the aforementioned conundrum is to eschew formal guarantees and instead propose a heuristic approach to constrained optimisation, which is only guaranteed to find a constraint-satisfying policy in the limit. We dispose of the shield and instead directly modify POMCP to obtain a new algorithm called RAMCP ("Risk-aware POMCP"). Our modification can be described as follows:

• Whenever a history is sampled, its payoff is compared against the payoff threshold $b$. If the history has a payoff of at least $b$, we assign it risk value 0; otherwise, its risk value is 1. The statistics of the risk values are stored in the nodes of the search tree analogously to the payoff statistics.

• After the search-tree extension phase stops and the algorithm is about to select the actual action to play, we proceed as follows: we interpret the search tree as a perfectly observable constrained MDP with nodes as states and edges as transitions. To compute the transition probabilities, we compute the belief corresponding to each particular node, i.e. the probability distribution over the states of the POMDP indicating, for each state, the probability of being in that particular state given that the history corresponding to the tree node was observed. The belief computation proceeds by performing standard Bayesian updates [SV10] along the tree edges. Inside the tree, the per-step reward function over edges is computed in a similar way from the reward function of the underlying POMDP, while the penalty of every tree edge is zero. Finally, for each leaf node of the tree, we incur a one-time reward and penalty equal to the respective statistical estimates stored within the node (formally, this is done by adding dummy transitions to a special sink state). The resulting CMDP with the tree root as the initial state is solved via the usual linear-programming approach [Alt99] (a sketch of such a linear program is given after this description), which in particular yields a probability distribution over actions in the root. The action to be played is sampled according to this distribution, and a resulting observation is received.

Thereafter, the tree is pruned similarly to POMCP, and the agent's risk threshold is updated in the manner sketched above. The algorithm then proceeds to the next decision epoch.
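For completeness, the following is a minimal, self-contained sketch of the classical occupancy-measure linear program for discounted CMDPs in the spirit of [Alt99]; the tree-shaped CMDP constructed by RAMCP can be solved by a program of exactly this kind. The dictionary-based model representation and the use of SciPy are our illustrative choices, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

def solve_discounted_cmdp(P, reward, penalty, mu0, gamma, theta):
    """Occupancy-measure LP for a discounted CMDP (cf. [Alt99]).

    P[s][a] is a dict {successor: probability}; reward and penalty map
    (s, a) pairs to numbers; mu0 is the initial distribution; theta
    bounds the expected discounted penalty."""
    states = sorted(P)
    pairs = [(s, a) for s in states for a in sorted(P[s])]
    col = {pair: j for j, pair in enumerate(pairs)}
    row = {s: i for i, s in enumerate(states)}
    # Flow constraints: outflow(s) - gamma * inflow(s) = mu0(s).
    A_eq = np.zeros((len(states), len(pairs)))
    b_eq = np.array([mu0.get(s, 0.0) for s in states])
    for (s, a), j in col.items():
        A_eq[row[s], j] += 1.0
        for t, p in P[s][a].items():
            A_eq[row[t], j] -= gamma * p
    # Maximise the expected reward (linprog minimises, hence the sign flip)
    # subject to the expected penalty staying below theta; the default
    # variable bounds of linprog already enforce y >= 0.
    objective = np.array([-reward[pair] for pair in pairs])
    A_ub = np.array([[penalty[pair] for pair in pairs]])
    res = linprog(objective, A_ub=A_ub, b_ub=np.array([theta]),
                  A_eq=A_eq, b_eq=b_eq)
    if not res.success:
        return None  # no policy satisfies the constraint
    # The occupancy measure y induces a randomised policy.
    policy = {}
    for s in states:
        mass = sum(res.x[col[(s, a)]] for a in P[s])
        if mass > 1e-12:
            policy[s] = {a: res.x[col[(s, a)]] / mass for a in P[s]}
    return policy
```

The optimal occupancy measure $y$ induces a randomised policy via $\pi(a \mid s) \propto y(s,a)$, which is precisely the kind of distribution over root actions that RAMCP samples from.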
In essence, we use the POMCP sampling to sample a finite CMDP which forms a local approximation of the overall partially observable process, with "distant" behaviours estimated statistically in the leaf nodes. It is clear that this heuristic cannot be equipped with concrete formal guarantees. (Moreover, since the POMCP sampling is inherently finite-horizon, RAMCP solves only a finite-horizon approximation of the problem.) However, one can prove that as the number of search-tree iterations approaches infinity, the algorithm indeed converges to the optimal constrained action. Moreover, we validated the algorithm on a set of standard POMDP benchmarks with up to 67,000 states. In our experiments, the agent's risk never surpassed the initial risk threshold $\theta$, and hence the algorithm behaved in a conservative way.

6.3 Risk-Constrained Learning

We conclude the overview with the following AAAI'20 paper co-authored with Brázdil, Chatterjee, and Vahala:

• T. Brázdil, K. Chatterjee, P. Novotný, and J. Vahala. "Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes". In: The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020). AAAI Press, 2020, pp. 9794–9801. url: https://aaai.org/ojs/index.php/AAAI/article/view/6531

The aim of the paper was to scale the algorithm presented in the previous section to even larger state spaces by incorporating elements of machine learning. Since we anticipated non-trivial engineering challenges (in particular w.r.t. the setting of hyperparameters), we simplified the problem in two ways:

• We restricted ourselves to the perfectly observable setting, i.e. considered MDPs only; and

• we considered simpler state-based risk constraints. That is, the agent was given a set $F$ of failure states to avoid and a risk threshold $\theta$; his task was then to find a policy maximising the expected payoff while ensuring that the probability of reaching $F$ is below $\theta$:

$$\max_\sigma\ \mathbb{E}^\sigma_{s_0}[\mathit{disc}_\gamma] \quad \text{subject to} \quad \mathbb{P}^\sigma_{s_0}[F \text{ is reached}] \le \theta.$$

Since we stick to the MCTS approach to solving the problem, simplifying the constraint does not significantly decrease its expressive power: as seen in the previous section, quantile constraints, for example, can be expressed using state-based risk by extending each history node with the information on the payoff accumulated so far. Also, while perfectly observable CMDPs are known to be solvable by linear programming [Alt99], and thus in polynomial time, our goal was to provide an algorithm applicable to very large state spaces, in which already writing down the whole model, let alone constructing the linear program over it, would be computationally demanding.

The core idea behind our approach is to take RAMCP and replace the rollout-based estimation of payoffs and risks with an estimation via a predictor (function approximator) that iteratively learns the correct values by observing repeated agent-environment simulations. The resulting algorithm is called RAlph, since the predictor is used in a way similar to the well-known MCTS-based AlphaZero game-playing algorithm [Sil+18]. RAlph combines the basic structure of AlphaZero with the RAMCP approach of sampling a local constrained MDP to be solved by linear programming; a schematic sketch of this combination follows.
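The following schematic Python sketch conveys only the overall shape of such a loop; every name in it is a hypothetical placeholder, and the actual RAlph pipeline (predictor architecture, target construction, hyperparameters) is considerably more involved:

```python
def ralph_loop(env, predictor, num_episodes, theta):
    # Schematic only: constrained MCTS (as in RAMCP) with a learned
    # predictor replacing rollout-based leaf estimates, AlphaZero-style.
    dataset = []
    for _ in range(num_episodes):
        state, risk_budget = env.reset(), theta
        while not env.done():
            tree = build_search_tree(state, predictor)   # predicted leaves
            policy = solve_tree_cmdp(tree, risk_budget)  # LP over the tree
            action = sample(policy)
            dataset.append((state, tree.payoff_targets(), tree.risk_targets()))
            state = env.step(action)
            risk_budget = update_risk_budget(risk_budget, policy, state)
        predictor.fit(dataset)  # periodically retrain on collected targets
    return predictor
```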
While the conceptual idea behind RAlph is a simple-looking combination of known algorithms, the major difficulty in RAlph's development was the engineering aspect, i.e. designing the whole learning pipeline so that the learning process is stable and scalable. There were several degrees of freedom in the design of RAlph's specific components that needed to be investigated properly so as to make the correct choices. In the end, RAlph has been shown to handle MDPs with up to millions of states, and even on smaller benchmarks it outperformed RAMCP by a wide margin. The performance of RAlph can be further increased by automated parameter tuning [Pet22].

Notes

1. It might be argued that there was no real randomness used, just pseudo-random generators. However, we do not aim to delve into debates on the nature of randomness in this thesis. If an observer cannot efficiently distinguish the output of some procedure from a random output, we deem the procedure to be randomised.
2. We understand the phrase probabilistic program analysis as a shorthand for program analysis for probabilistic programs. I.e., the phrase does not imply that the analysis algorithms we develop are themselves probabilistic.
3. For continuous PTSs, the scheduler needs to satisfy an additional measurability condition for the semantics to be well-defined [NSK09].
4. In this work, we always consider mathematical numbers as opposed to machine ones.
5. Since the capabilities of modern SMT solvers have increased significantly over the last decade, SMT solving should be considered an efficient alternative for tackling quadratic systems.
6. We denote by $\mathbb{E}_{x \sim \mathcal{D}}[\ldots]$ the expected value in a random experiment consisting of sampling $x$ according to the distribution $\mathcal{D}$.
7. For ergodic systems, the i.i.d. behaviour can sometimes be recovered by considering a "big-step" process in which one time step corresponds to returning to some distinguished state. But the ergodicity assumption might be too strong in itself. Non-determinism further complicates the attempts to proceed in this direction.
8. The concept of an invariant supporting an RSM is defined analogously to an invariant supporting a ranking function.

Bibliography

[ACN18] S. Agrawal, K. Chatterjee, and P. Novotný. "Lexicographic ranking supermartingales: an efficient approach to termination of probabilistic programs". In: Proc. ACM Program. Lang. 2 (Proceedings of POPL'18) (2018), 34:1–34:32. doi: 10.1145/3158122.
[ADFG10] C. Alias, A. Darte, P. Feautrier, and L. Gonnord. "Multi-dimensional Rankings, Program Termination, and Complexity Bounds of Flowchart Programs". In: Proceedings of the 17th International Conference on Static Analysis. Ed. by R. Cousot and M. Martel. Perpignan, France: Springer-Verlag, 2010, pp. 117–133. url: http://dl.acm.org/citation.cfm?id=1882094.1882102.
[All08] L. J. S. Allen. "An Introduction to Stochastic Epidemic Models". In: Mathematical Epidemiology. Ed. by F. Brauer, P. van den Driessche, and J. Wu. Springer, 2008, pp. 81–130.
[Als+18] M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, and U. Topcu. "Safe Reinforcement Learning via Shielding". In: AAAI Conference on Artificial Intelligence (AAAI 2018). Ed. by S. A. McIlraith and K. Q. Weinberger. AAAI Press, 2018, pp. 2669–2678. url: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17211.
[Alt99] E. Altman. Constrained Markov Decision Processes. Chapman & Hall/CRC, 1999.
[BBE10] T. Brázdil, V. Brožek, and K. Etessami. "One-Counter Stochastic Games". In: Proceedings of FSTTCS'10. Ed. by K. Lodaya and M. Mahajan. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2010, pp. 108–119.
[BBEK11] T. Brázdil, V. Brožek, K. Etessami, and A. Kučera. "Approximating the termination value of one-counter MDPs and stochastic games". In: Proceedings of ICALP 2011, Part II. Vol. 6756. 2011, pp. 332–343.
[BBEKW10] T. Brázdil, V. Brožek, K. Etessami, A. Kučera, and D. Wojtczak. "One-Counter Markov Decision Processes". In: Proceedings of SODA 2010. 2010, pp. 863–874.
[BBKO10] T. Brázdil, V. Brožek, A. Kučera, and J. Obdržálek. "Qualitative Reachability in Stochastic BPA Games". In: Information and Computation 208.7 (2010), pp. 772–796.
[BCFNS13] T. Brázdil, T. Chen, V. Forejt, P. Novotný, and A. Simaitis. "Solvency Markov Decision Processes with Interest". In: Proceedings of FSTTCS 2013. Ed. by A. Seth and N. K. Vishnoi. Vol. 24. LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2013, pp. 487–499. doi: 10.4230/LIPIcs.FSTTCS.2013.487.
[BCIKP16] M. Brockschmidt, B. Cook, S. Ishtiaq, H. Khlaaf, and N. Piterman. "T2: Temporal Property Verification". In: Proceedings of TACAS 2016. Ed. by M. Chechik and J.-F. Raskin. Berlin, Heidelberg: Springer, 2016, pp. 387–393. doi: 10.1007/978-3-662-49674-9_22.
[BCNV20] T. Brázdil, K. Chatterjee, P. Novotný, and J. Vahala. "Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes". In: The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020). AAAI Press, 2020, pp. 9794–9801. url: https://aaai.org/ojs/index.php/AAAI/article/view/6531.
[BEKK12] T. Brázdil, J. Esparza, S. Kiefer, and A. Kučera. "Analyzing Probabilistic Pushdown Automata". In: Formal Methods in System Design 43.2 (2012), pp. 124–163.
[BFRR14] V. Bruyère, E. Filiot, M. Randour, and J.-F. Raskin. "Meet Your Expectations With Guarantees: Beyond Worst-Case Synthesis in Quantitative Games". In: Proceedings of STACS'14. Ed. by E. W. Mayr and N. Portier. Vol. 25. LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2014, pp. 199–213. doi: 10.4230/LIPIcs.STACS.2014.199.
[BG05] O. Bournez and F. Garnier. "Proving Positive Almost-Sure Termination". In: Proceedings of the 16th International Conference on Rewriting Techniques and Applications (RTA 2005). Ed. by J. Giesl. Vol. 3467. LNCS. Springer, 2005, pp. 323–337.
[BG15] A. M. Ben-Amram and S. Genaim. "Complexity of Bradley-Manna-Sipma Lexicographic Ranking Functions". In: Proceedings of CAV 2015, Part II. Ed. by D. Kroening and C. S. Păsăreanu. Springer, 2015, pp. 304–321. doi: 10.1007/978-3-319-21668-3_18.
[Bil95] P. Billingsley. Probability and Measure. 3rd. Wiley, 1995.
[BK08] C. Baier and J.-P. Katoen. Principles of Model Checking. MIT Press, Cambridge, Massachusetts, 2008, p. 984.
[BKK14] T. Brázdil, S. Kiefer, and A. Kučera. "Efficient Analysis of Probabilistic Programs with an Unbounded Counter". In: J. ACM 61.6 (Dec. 2014), 41:1–41:35. doi: 10.1145/2629599.
[BKKN15] T. Brázdil, S. Kiefer, A. Kučera, and P. Novotný. "Long-Run Average Behaviour of Probabilistic Vector Addition Systems". In: Proceedings of LICS'15. IEEE Computer Society, 2015, pp. 44–55.
[BKKNK14] T. Brázdil, S. Kiefer, A. Kučera, P. Novotný, and J.-P. Katoen. "Zero-Reachability in Probabilistic Multi-Counter Automata". In: Proceedings of CSL-LICS'14. Vienna, Austria: ACM, 2014, 22:1–22:10. doi: 10.1145/2603088.2603161.
[BKNW12] T. Brázdil, A. Kučera, P. Novotný, and D. Wojtczak. "Minimizing Expected Termination Time in One-Counter Markov Decision Processes". In: Proceedings of ICALP 2012, Part II. Vol. 7392. 2012, pp. 141–152.
[BMS05] A. R. Bradley, Z. Manna, and H. B. Sipma. "Linear Ranking with Reachability". In: Proceedings of CAV 2005. Ed. by K. Etessami and S. K. Rajamani. Springer, 2005, pp. 491–504. doi: 10.1007/11513988_48.
[CCFNO14] H.-Y. Chen, B. Cook, C. Fuhs, K. Nimkar, and P. O'Hearn. "Proving Nontermination via Safety". In: Proceedings of TACAS 2014. Ed. by E. Ábrahám and K. Havelund. Berlin, Heidelberg: Springer, 2014, pp. 156–171. doi: 10.1007/978-3-642-54862-8_11.
[CENR18] K. Chatterjee, A. Elgyütt, P. Novotný, and O. Rouillé. "Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-Sum Objectives". In: Proceedings of IJCAI'18. Ed. by J. Lang. ijcai.org, 2018, pp. 4692–4699. doi: 10.24963/ijcai.2018/652.
[CF17] K. Chatterjee and H. Fu. Termination of Nondeterministic Recursive Probabilistic Programs. 2017. doi: 10.48550/ARXIV.1701.02944. url: https://arxiv.org/abs/1701.02944.
[CFG16] K. Chatterjee, H. Fu, and A. K. Goharshady. "Termination Analysis of Probabilistic Programs Through Positivstellensatz's". In: Proceedings of CAV 2016, Part I. Ed. by S. Chaudhuri and A. Farzan. Vol. 9779. LNCS. Springer, 2016, pp. 3–22. doi: 10.1007/978-3-319-41528-4_1.
[CFN20] K. Chatterjee, H. Fu, and P. Novotný. "Termination Analysis of Probabilistic Programs with martingales". In: Foundations of Probabilistic Programming. Ed. by G. Barthe, J.-P. Katoen, and A. Silva. Cambridge University Press, 2020. Chap. 7, pp. 221–258.
[CFNH16] K. Chatterjee, H. Fu, P. Novotný, and R. Hasheminezhad. "Algorithmic analysis of qualitative and quantitative termination problems for affine probabilistic programs". In: Proceedings of POPL 2016. Ed. by R. Bodík and R. Majumdar. ACM, 2016, pp. 327–342. doi: 10.1145/2837614.2837639.
[CFNH18] K. Chatterjee, H. Fu, P. Novotný, and R. Hasheminezhad. "Algorithmic Analysis of Qualitative and Quantitative Termination Problems for Affine Probabilistic Programs". In: ACM Trans. Program. Lang. Syst. 40.2 (2018), 7:1–7:45. doi: 10.1145/3174800.
[CGJP17] Y. Chow, M. Ghavamzadeh, L. Janson, and M. Pavone. "Risk-Constrained Reinforcement Learning with Percentile Risk Criteria". In: Journal of Machine Learning Research 18.1 (2017), pp. 6070–6120.
[CGNŽ21] K. Chatterjee, E. K. Goharshady, P. Novotný, and D. Žikelić. "Proving Non-Termination by Program Reversal". In: Proceedings of PLDI'21. Ed. by S. N. Freund and E. Yahav. ACM, 2021, pp. 1033–1048.
[CGNZŽ21] K. Chatterjee, E. K. Goharshady, P. Novotný, J. Zárevúcky, and D. Žikelić. "On Lexicographic Proof Rules for Probabilistic Termination". In: Proceedings of FM'21. Ed. by M. Huisman, C. S. Pasareanu, and N. Zhan. Vol. 13047. LNCS. Springer, 2021, pp. 619–639.
[CNPRŽ17] K. Chatterjee, P. Novotný, G. A. Pérez, J. Raskin, and D. Žikelić. "Optimizing Expectation with Guarantees in POMDPs". In: Proceedings of AAAI'17. AAAI Press, 2017, pp. 3725–3732. url: http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14354.
[CNŽ17] K. Chatterjee, P. Novotný, and D. Žikelić. "Stochastic invariants for probabilistic termination". In: Proceedings of POPL 2017. Ed. by G. Castagna and A. D. Gordon. ACM, 2017, pp. 145–160. doi: 10.1145/3009837. url: http://dl.acm.org/citation.cfm?id=3009873.
[CPR06] B. Cook, A. Podelski, and A. Rybalchenko. "Termination Proofs for Systems Code". In: Proceedings of PLDI 2006. Ed. by M. I. Schwartzbach and T. Ball. ACM, 2006, pp. 415–426. doi: 10.1145/1133981.1134029.
[CPR11] B. Cook, A. Podelski, and A. Rybalchenko. "Proving program termination". In: Communications of the ACM 54.5 (2011), pp. 88–98. doi: 10.1145/1941487.1941509.
[CS01] M. Colón and H. Sipma. "Synthesis of Linear Ranking Functions". In: Proceedings of TACAS 2001. Ed. by T. Margaria and W. Yi. Springer, 2001, pp. 67–81. doi: 10.1007/3-540-45319-9_6.
[CS02] M. A. Colón and H. B. Sipma. "Practical Methods for Proving Program Termination". In: Proceedings of CAV 2002. Ed. by E. Brinksma and K. G. Larsen. Berlin, Heidelberg: Springer, 2002, pp. 442–454. doi: 10.1007/3-540-45657-0_36.
[CS13] A. Chakarov and S. Sankaranarayanan. "Probabilistic Program Analysis with Martingales". In: Proceedings of CAV 2013. Ed. by N. Sharygina and H. Veith. Vol. 8044. LNCS. Springer, 2013, pp. 511–526. doi: 10.1007/978-3-642-39799-8_34.
[CSS03] M. A. Colón, S. Sankaranarayanan, and H. B. Sipma. "Linear Invariant Generation Using Non-Linear Constraint Solving". In: Proceedings of CAV 2003. Springer, 2003, pp. 420–432.
[CSZ13] B. Cook, A. See, and F. Zuleger. "Ramsey vs. Lexicographic Termination Proving". In: Proceedings of TACAS 2013. Ed. by N. Piterman and S. A. Smolka. Springer, 2013, pp. 47–61.
[DFKPU14] K. Dräger, V. Forejt, M. Kwiatkowska, D. Parker, and M. Ujma. "Permissive Controller Synthesis for Probabilistic Systems". In: Proceedings of TACAS'14. Ed. by E. Ábrahám and K. Havelund. Berlin, Heidelberg: Springer, 2014, pp. 531–546.
[Dij76] E. W. Dijkstra. A Discipline of Programming. Prentice-Hall, 1976.
[DM79] N. Dershowitz and Z. Manna. "Proving Termination with Multiset Orderings". In: Proceedings of ICALP'79. Ed. by H. A. Maurer. Berlin, Heidelberg: Springer, 1979, pp. 188–202.
[DP97] D. Duffie and J. Pan. "An overview of value at risk". In: The Journal of Derivatives 4.3 (1997), pp. 7–49.
[ESY12] K. Etessami, A. Stewart, and M. Yannakakis. "Polynomial Time Algorithms for Branching Markov Decision Processes and Probabilistic Min(Max) Polynomial Bellman Equations". In: Automata, Languages, and Programming (ICALP 2012). Ed. by A. Czumaj, K. Mehlhorn, A. Pitts, and R. Wattenhofer. Vol. 7391. LNCS. Springer, 2012, pp. 314–326. doi: 10.1007/978-3-642-31594-7_27.
[ESY17] K. Etessami, A. Stewart, and M. Yannakakis. "A Polynomial Time Algorithm for Computing Extinction Probabilities of Multitype Branching Processes". In: SIAM J. Comput. 46.5 (2017), pp. 1515–1553. doi: 10.1137/16M105678X.
[EY12] K. Etessami and M. Yannakakis. "Model Checking of Recursive Probabilistic Systems". In: ACM Transactions on Computational Logic 13.2 (2012).
[Fel84] Y. A. Feldman. "A decidable propositional dynamic logic with explicit probabilities". In: Information and Control 63.1 (1984), pp. 11–38. doi: 10.1016/S0019-9958(84)80039-X.
[FG19] F. Frohn and J. Giesl. "Proving Non-Termination via Loop Acceleration". In: Proceedings of FMCAD 2019. 2019, pp. 221–230.
[FH15] L. M. F. Fioriti and H. Hermanns. "Probabilistic Termination: Soundness, Completeness, and Compositionality". In: Proceedings of POPL 2015. Ed. by S. K. Rajamani and D. Walker. ACM, 2015, pp. 489–501. doi: 10.1145/2676726.2677001.
[FH82] Y. A. Feldman and D. Harel. "A probabilistic dynamic logic". In: Proceedings of STOC 1982. ACM, 1982, pp. 181–195.
[Flo67] R. W. Floyd. "Assigning meanings to programs". In: Mathematical Aspects of Computer Science 19 (1967), pp. 19–33.
[Fos53] F. G. Foster. "On the Stochastic Matrices Associated with Certain Queuing Processes". In: The Annals of Mathematical Statistics 24.3 (1953), pp. 355–360.
[GHMRX08] A. Gupta, T. A. Henzinger, R. Majumdar, A. Rybalchenko, and R.-G. Xu. "Proving Non-termination". In: SIGPLAN Not. 43.1 (Jan. 2008), pp. 147–158. doi: 10.1145/1328897.1328459.
[Gie+17] J. Giesl, C. Aschermann, M. Brockschmidt, F. Emmes, F. Frohn, C. Fuhs, J. Hensel, C. Otto, M. Plücker, P. Schneider-Kamp, T. Ströder, S. Swiderski, and R. Thiemann. "Analyzing Program Termination and Complexity Automatically with AProVE". In: J. Autom. Reason. 58.1 (2017), pp. 3–31. doi: 10.1007/s10817-016-9388-y.
[Gri03] D. Gries. "Loop Invariant". In: Encyclopedia of Computer Science. GBR: John Wiley and Sons Ltd., 2003, pp. 1038–1040.
[GRSWY19] J. Giesl, A. Rubio, C. Sternagel, J. Waldmann, and A. Yamada. "The Termination and Complexity Competition". In: Tools and Algorithms for the Construction and Analysis of Systems: 25 Years of TACAS: TOOLympics, Part III. 2019, pp. 156–166. doi: 10.1007/978-3-030-17502-3_10.
[GSV08] S. Gulwani, S. Srivastava, and R. Venkatesan. "Program analysis as constraint solving". In: Proceedings of PLDI 2008. 2008, pp. 281–292.
[HFCG19] M. Huang, H. Fu, K. Chatterjee, and A. K. Goharshady. "Modular Verification for Almost-Sure Termination of Probabilistic Programs". In: Proc. ACM Program. Lang. 3.OOPSLA (Oct. 2019). doi: 10.1145/3360555.
[HK15] C. Haase and S. Kiefer. "The Odds of Staying on Budget". In: Proceedings of ICALP 2015. Ed. by M. M. Halldórsson, K. Iwama, N. Kobayashi, and B. Speckmann. Vol. 9135. LNCS. Springer, 2015, pp. 234–246. doi: 10.1007/978-3-662-47666-6_19.
[HWZ08] H. Hermanns, B. Wachter, and L. Zhang. "Probabilistic CEGAR". In: Proceedings of CAV 2008. Vol. 5123. LNCS. Springer, 2008, pp. 162–175.
[KKNP09] M. Kattenbelt, M. Kwiatkowska, G. Norman, and D. Parker. "Abstraction refinement for probabilistic software". In: Proceedings of VMCAI 2009. Springer, 2009, pp. 182–197.
[KM75] S. Katz and Z. Manna. "A Closer Look at Termination". In: Acta Informatica 5 (1975), pp. 333–352. doi: 10.1007/BF00264565.
[KNP06] M. Z. Kwiatkowska, G. Norman, and D. Parker. "Game-based Abstraction for Markov Decision Processes". In: Proceedings of QEST 2006. 2006, pp. 157–166.
[Kön27] D. König. "Über eine Schlussweise aus dem Endlichen ins Unendliche". In: Acta Sci. Math. (Szeged) 3.2-3 (1927), pp. 121–130.
[Koz83] D. Kozen. "A Probabilistic PDL". In: Proceedings of STOC '83. New York, NY, USA: ACM, 1983, pp. 291–297. doi: 10.1145/800061.808758.
[KP12] M. Z. Kwiatkowska and D. Parker. "Advances in Probabilistic Model Checking". In: Software Safety and Security: Tools for Analysis and Verification. Ed. by T. Nipkow, O. Grumberg, and B. Hauptmann. Vol. 33. NATO Science for Peace and Security Series D. IOS Press, 2012, pp. 126–151. doi: 10.3233/978-1-61499-028-4-126.
[KS06] L. Kocsis and C. Szepesvári. "Bandit Based Monte-Carlo Planning". In: Proceedings of ECML 2006. Ed. by J. Fürnkranz, T. Scheffer, and M. Spiliopoulou. Vol. 4212. LNCS. Springer, 2006, pp. 282–293. doi: 10.1007/11871842_29.
[Kus71] H. Kushner. Introduction to Stochastic Control. Holt, Rinehart and Winston, 1971.
[LH18] J. Leike and M. Heizmann. "Geometric Nontermination Arguments". In: Proceedings of TACAS 2018, Part II. 2018, pp. 266–283.
[LKPK18] J. Lee, G.-H. Kim, P. Poupart, and K.-E. Kim. "Monte-Carlo Tree Search for Constrained POMDPs". In: Neural Information Processing Systems (NeurIPS 2018). 2018, pp. 7934–7943.
[LMA98] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. "Planning and acting in partially observable stochastic domains". In: Artificial Intelligence 101.1 (1998), pp. 99–134.
[LNORR14] D. Larraz, K. Nimkar, A. Oliveras, E. Rodríguez-Carbonell, and A. Rubio. "Proving Non-termination Using Max-SMT". In: Proceedings of CAV 2014. Ed. by A. Biere and R. Bloem. Cham: Springer, 2014, pp. 779–796. doi: 10.1007/978-3-319-08867-9_52.
[LPS81] D. Lehmann, A. Pnueli, and J. Stavi. "Impartiality, Justice and Fairness: The Ethics of Concurrent Termination". In: Proceedings of ICALP'81. Ed. by S. Even and O. Kariv. Berlin, Heidelberg: Springer, 1981, pp. 264–277.
[Mai13] O.-A. Maillard. "Robust Risk-Averse Stochastic Multi-Armed Bandits". In: Algorithmic Learning Theory (ALT 2013). Ed. by S. Jain, R. Munos, F. Stephan, and T. Zeugmann. Berlin, Heidelberg: Springer, 2013, pp. 218–233.
[Man05] R. Mansuy. "The Origins of the Word "Martingale"". In: Electronic Journal for History of Probability and Statistics 5.1 (2005).
[Met87] N. C. Metropolis. "The Beginning of the Monte Carlo Method". In: Los Alamos Science 15 (1987), pp. 125–130.
[MG07] J. Matoušek and B. Gärtner. Understanding and Using Linear Programming. Springer, 2007.
[MJ84] F. L. Morris and C. B. Jones. "An Early Program Proof by Alan Turing". In: Annals of the History of Computing 6.2 (1984), pp. 139–143. doi: 10.1109/MAHC.1984.10017.
[MMKK18] A. McIver, C. Morgan, B. L. Kaminski, and J. Katoen. "A New Proof Rule for Almost-Sure Termination". In: Proc. ACM Program. Lang. 2 (Proceedings of POPL 2018) (2018), 33:1–33:28. doi: 10.1145/3158121.
[MT09] S. Meyn and R. Tweedie. Markov Chains and Stochastic Stability. 2009.
[MVV18] A. J. Menezes, P. C. Van Oorschot, and S. A. Vanstone. Handbook of Applied Cryptography. CRC Press, 2018.
[NSK09] M. Neuhäußer, M. Stoelinga, and J.-P. Katoen. "Delayed Nondeterminism in Continuous-Time Markov Decision Processes". In: Proceedings of FoSSaCS 2009. Vol. 5504. 2009, pp. 364–379.
[Pet22] J. Petrák. "Black-Box Hyperparameter Tuning for Risk-Constrained Reinforcement Learning Algorithm". MA thesis. Masaryk University, 2022.
[PJP04] S. Prajna, A. Jadbabaie, and G. Pappas. "Stochastic Safety Verification Using Barrier Certificates". In: Proceedings of CDC 2004. Vol. 1. IEEE, 2004, pp. 929–934. doi: 10.1109/CDC.2004.1428804.
[Pou+15] P. Poupart, A. Malhotra, P. Pei, K. Kim, B. Goh, and M. Bowling. "Approximate Linear Programming for Constrained Partially Observable Markov Decision Processes". In: Proceedings of AAAI 2015. AAAI Press, 2015, pp. 3342–3348. url: http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9742.
[PR04a] A. Podelski and A. Rybalchenko. "A Complete Method for the Synthesis of Linear Ranking Functions". In: Proceedings of VMCAI 2004. Ed. by B. Steffen and G. Levi. Springer, 2004, pp. 239–251. doi: 10.1007/978-3-540-24622-0_20.
[PR04b] A. Podelski and A. Rybalchenko. "Transition Invariants". In: Proceedings of LICS '04. Washington, DC, USA: IEEE Computer Society, 2004, pp. 32–41. doi: 10.1109/LICS.2004.50.
[PS12] M. Petrik and D. Subramanian. "An Approximate Solution Method for Large Risk-Averse Markov Decision Processes". In: CoRR abs/1210.4901 (2012).
[PT87] C. H. Papadimitriou and J. N. Tsitsiklis. "The Complexity of Markov Decision Processes". In: Mathematics of Operations Research 12.3 (1987), pp. 441–450. doi: 10.1287/moor.12.3.441.
[Put05] M. L. Puterman. Markov Decision Processes. Wiley-Interscience, 2005, p. 684.
[Put94] M. Puterman. Markov Decision Processes. 1994.
[Ros06] J. S. Rosenthal. A First Look at Rigorous Probability Theory. 2nd. World Scientific Publishing Company, 2006.
[RRS15] M. Randour, J.-F. Raskin, and O. Sankur. "Variations on the Stochastic Shortest Path Problem". In: Proceedings of VMCAI'15. Ed. by D. D'Souza, A. Lal, and K. G. Larsen. Vol. 8931. LNCS. Springer, 2015, pp. 1–18. doi: 10.1007/978-3-662-46081-8_1.
[Sam+14] A. Sampson, P. Panchekha, T. Mytkowicz, K. S. McKinley, D. Grossman, and L. Ceze. "Expressing and verifying probabilistic assertions". In: Proceedings of PLDI '14. Ed. by M. F. P. O'Boyle and K. Pingali. ACM, 2014, p. 14. doi: 10.1145/2594291.2594294.
[SB18] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
[SCG13] S. Sankaranarayanan, A. Chakarov, and S. Gulwani. "Static Analysis for Probabilistic Programs: Inferring Whole Program Properties from Finitely Many Paths". In: Proceedings of PLDI 2013. 2013, pp. 447–458.
[SEY13] A. Stewart, K. Etessami, and M. Yannakakis. "Upper Bounds for Newton's Method on Monotone Polynomial Systems, and P-Time Model Checking of Probabilistic One-Counter Automata". In: Proceedings of CAV 2013. Vol. 8044. 2013, pp. 495–510.
[Sil+18] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis. "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play". In: Science 362.6419 (2018), pp. 1140–1144. doi: 10.1126/science.aar6404.
[SLM12] A. Sani, A. Lazaric, and R. Munos. "Risk-Aversion in Multi-Armed Bandits". In: Advances in Neural Information Processing Systems (NIPS 2012). Vol. 25. 2012, pp. 3275–3283.
[ST12] J. Steinhardt and R. Tedrake. "Finite-Time Regional Verification of Stochastic Non-Linear Systems". In: The International Journal of Robotics Research 31.7 (2012), pp. 901–923. doi: 10.1177/0278364912444146.
[STW16] P. Santana, S. Thiébaux, and B. C. Williams. "RAO*: An Algorithm for Chance-Constrained POMDP's". In: AAAI Conference on Artificial Intelligence (AAAI 2016). AAAI Press, 2016, pp. 3308–3314.
[SV10] D. Silver and J. Veness. "Monte-Carlo Planning in Large POMDPs". In: Advances in Neural Information Processing Systems (NIPS 23). Curran Associates, Inc., 2010, pp. 2164–2172. url: http://papers.nips.cc/paper/4031-monte-carlo-planning-in-large-pomdps.pdf.
[TA11] I. Tkachev and A. Abate. "On Infinite-Horizon Probabilistic Properties and Stochastic Bisimulation Functions". In: Proceedings of CDC 2011. IEEE, 2011, pp. 526–531. doi: 10.1109/CDC.2011.6160617.
[TBF05] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). The MIT Press, 2005.
[TGY20] D. Troop, F. Godin, and J. Y. Yu. "Risk-Averse Action Selection Using Extreme Value Theory Estimates of the CVaR". In: CoRR abs/1912.01718 (2020).
[Tur38] A. M. Turing. "On Computable Numbers, with an Application to the Entscheidungsproblem. A Correction". In: Proceedings of the London Mathematical Society 2.1 (1938), pp. 544–546.
[Tur49] A. Turing. "Checking a Large Routine". In: Report of a Conference on High Speed Automatic Calculating Machines. 1949, pp. 67–69.
[UH10] A. Undurti and J. P. How. "An Online Algorithm for Constrained POMDPs". In: Proceedings of the International Conference on Robotics and Automation (ICRA 2010). IEEE, 2010, pp. 3966–3973.
[VR08] H. Velroyen and P. Rümmer. "Non-termination Checking for Imperative Programs". In: Proceedings of TAP 2008. Ed. by B. Beckert and R. Hähnle. Berlin, Heidelberg: Springer, 2008, pp. 154–170. doi: 10.1007/978-3-540-79124-9_11.
[Wil91] D. Williams. Probability with Martingales. Cambridge Mathematical Textbooks. Cambridge, UK: Cambridge University Press, 1991, p. 251.
[ZP96] U. Zwick and M. Paterson. "The Complexity of Mean Payoff Games on Graphs". In: Theoretical Computer Science 158.1–2 (1996), pp. 343–359.

Part II Collection of Applicant's Publications

7 List of Enclosed Publications

Below we list all the papers submitted as a part of this habilitation thesis. We note that they do not represent the applicant's entire bibliography. The rules of the habilitation procedure at Masaryk University require us to provide an estimate of the author's percentage contribution towards each of the papers. Since such a percentage is impossible to estimate accurately in practice, we use 100% divided by the number of authors as a proxy measure. The author's factual contribution is sketched under each record.

1. K. Chatterjee, H. Fu, P. Novotný, and R. Hasheminezhad. "Algorithmic Analysis of Qualitative and Quantitative Termination Problems for Affine Probabilistic Programs". In: ACM Trans. Program. Lang. Syst. 40.2 (2018), 7:1–7:45. doi: 10.1145/3174800. (25%)
• Contributed equally to all parts of the research and writing process.

2. K. Chatterjee, P. Novotný, and D. Žikelić. "Stochastic invariants for probabilistic termination". In: Proceedings of POPL 2017. Ed. by G. Castagna and A. D. Gordon. ACM, 2017, pp. 145–160. doi: 10.1145/3009837. (33.3%)
• Originator of the main idea of the paper, proved most of the key results, wrote a large majority of the resulting text, and oversaw the prototype implementation.

3. S. Agrawal, K. Chatterjee, and P. Novotný. "Lexicographic ranking supermartingales: an efficient approach to termination of probabilistic programs". In: Proc. ACM Program. Lang. 2 (Proceedings of POPL'18) (2018), 34:1–34:32. doi: 10.1145/3158122. (33.3%)
• Originator of the main idea of the paper, proved most of the key results, wrote a large majority of the resulting text, and oversaw the prototype implementation.

4. K. Chatterjee, E. K. Goharshady, P. Novotný, J. Zárevúcky, and D. Žikelić. "On Lexicographic Proof Rules for Probabilistic Termination". In: Proceedings of FM'21. Ed. by M. Huisman, C. S. Pasareanu, and N. Zhan. Vol. 13047. LNCS. Springer, 2021, pp. 619–639. (20%)
• Originator of the main idea of the paper, wrote approximately half of the paper.

5. K. Chatterjee, H. Fu, and P. Novotný. "Termination Analysis of Probabilistic Programs with martingales". In: Foundations of Probabilistic Programming. Ed. by G. Barthe, J.-P. Katoen, and A. Silva. Cambridge University Press, 2020. Chap. 7, pp. 221–258. (33.3%)
• Contributed equally to all parts of the research and writing process.

6. K. Chatterjee, E. K. Goharshady, P. Novotný, and D. Žikelić. "Proving Non-Termination by Program Reversal". In: Proceedings of PLDI'21. Ed. by S. N. Freund and E. Yahav. ACM, 2021, pp. 1033–1048. (25%)
• Significantly contributed to the conceptual discussions behind the paper, contributed equally to the writing of the paper.

7. K. Chatterjee, P. Novotný, G. A. Pérez, J. Raskin, and D. Žikelić. "Optimizing Expectation with Guarantees in POMDPs". In: Proceedings of AAAI'17. AAAI Press, 2017, pp. 3725–3732. url: http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14354 (20%)
• Contributed equally to all parts of the research and writing process.

8. K. Chatterjee, A. Elgyütt, P. Novotný, and O. Rouillé. "Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-Sum Objectives". In: Proceedings of IJCAI'18. Ed. by J. Lang. ijcai.org, 2018, pp. 4692–4699. doi: 10.24963/ijcai.2018/652. (25%)
• Originator of the main idea of the paper, wrote a large majority of the resulting text, and oversaw the prototype implementation.

9. T. Brázdil, K. Chatterjee, P. Novotný, and J. Vahala. "Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes". In: The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020). AAAI Press, 2020, pp. 9794–9801. url: https://aaai.org/ojs/index.php/AAAI/article/view/6531 (25%)
• Originator of the main idea of the paper, wrote a large majority of the resulting text, and oversaw the prototype implementation.

The pages containing the enclosed papers were removed from the public version to prevent copyright infringement.