Introduction: Combining Probability and Logic

Fabio Cozman, Rolf Haenni, Jan-Willem Romeijn, Federica Russo, Gregory Wheeler, Jon Williamson

Journal of Applied Logic 7 (2009) 131–135

This volume arose out of an international, interdisciplinary academic network on Probabilistic Logic and Probabilistic Networks involving four of us (Haenni, Romeijn, Wheeler and Williamson), called Progicnet and funded by the Leverhulme Trust from 2006–8. Many of the papers in this volume were presented at an associated conference, the Third Workshop on Combining Probability and Logic (Progic 2007), held at the University of Kent on 5–7 September 2007. The papers in this volume concern either the special focus on the connection between probabilistic logic and probabilistic networks or the more general question of the links between probability and logic. Here we introduce probabilistic logic, probabilistic networks, current and future directions of research, and the themes of the papers that follow.

1. What is probabilistic logic?

Probabilistic logic, or progic for short, can be understood in a broad sense as referring to any formalism that combines aspects of both probability theory and logic, or in a narrow sense as a particular kind of logic, namely one that incorporates probabilities in the language or metalanguage. In the latter case, if the probabilities are incorporated directly into the logical language we have what might be called an internal progic. An example is a first-order language where one or more of the function symbols are intended to refer to probability functions. Thus one can form expressions like $(P_1(Fa) = 0.2 \wedge P_2(Rab) \geq 0.5) \rightarrow Gb$. This kind of language is suitable for reasoning about probabilities and is explored by Halpern [5], for instance. If, on the other hand, the probabilities are incorporated into the metalanguage we have an external progic. For example, one might attach probabilities to sentences of a propositional language: $((p \wedge q) \rightarrow r)^{0.95}$. This kind of language is suitable for reasoning under uncertainty, and maintains a stricter distinction between the level of logic and the level of probability [see, e.g., 10]. A logic that incorporates probabilities both within the language and the metalanguage is a mixed progic.

The central question facing a progic is which conclusions to draw from given premisses. For an internal progic the question is which $\psi$ to conclude from given premisses $\varphi_1, \ldots, \varphi_n$, where these are sentences of a language involving probabilities. This is analogous to the question facing classical logic. But for an external (or mixed) progic, the question is rather different. Instead of asking what $\psi^Y$ to conclude from given premisses $\varphi_1^{X_1}, \ldots, \varphi_n^{X_n}$, one would normally ask what $Y$ to attach to a given $\psi$, when also given premisses $\varphi_1^{X_1}, \ldots, \varphi_n^{X_n}$. Note that, depending on the semantics in question, the $X_1, \ldots, X_n, Y$ might be probabilities or sets of probabilities.

Since the fundamental question of an external probabilistic logic differs from that of a non-probabilistic logic, different techniques may be required to answer it. In non-probabilistic logics one typically appeals to a proof theory to determine which $\psi$ to conclude from given $\varphi_1, \ldots, \varphi_n$.
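To make the external question concrete, here is a minimal sketch of how such a $Y$ can be computed for a small propositional language. It is our own illustration, not drawn from any paper in this volume: in the style of Nilsson-style probabilistic entailment, the admissible values of $P(\psi)$ are found by linear programming over the probability functions on truth assignments that satisfy the premiss constraints. The premiss sentences and the attached numbers are hypothetical, and scipy is assumed to be available.

```python
# A minimal sketch of the external-progic question (hypothetical premisses):
# given premisses phi_i with attached probabilities X_i, which interval Y of
# values can P(psi) take?  Computed by linear programming over the 2^3 truth
# assignments of {p, q, r}, in the style of Nilsson's probabilistic logic.
from itertools import product
from scipy.optimize import linprog

worlds = list(product([False, True], repeat=3))  # all assignments to (p, q, r)

# Premisses: P((p & q) -> r) = 0.95, P(p) = 0.8, P(q) = 0.7 (made-up numbers).
premisses = [
    (lambda p, q, r: (not (p and q)) or r, 0.95),
    (lambda p, q, r: p, 0.80),
    (lambda p, q, r: q, 0.70),
]
conclusion = lambda p, q, r: r  # the psi to which we want to attach a Y

# Each premiss pins the total probability of the worlds where it holds,
# and the world probabilities must sum to one.
A_eq = [[1.0 if f(*w) else 0.0 for w in worlds] for f, _ in premisses]
A_eq.append([1.0] * len(worlds))
b_eq = [x for _, x in premisses] + [1.0]

c = [1.0 if conclusion(*w) else 0.0 for w in worlds]
low = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
high = linprog([-ci for ci in c], A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
print(f"Y = [{low.fun:.3f}, {-high.fun:.3f}]")  # the set of admissible P(r)
```

Here $Y$ comes out as an interval of probabilities because many measures satisfy the premiss constraints; a proof theory alone does not deliver it.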
Such a proof-theoretic approach is not always appropriate in the case of a probabilistic logic. An external progic requires machinery for manipulating probabilities, not just tools for handling sentences. In Haenni et al. [4] it is suggested that the machinery of probabilistic networks can fruitfully be applied here.

2. What are probabilistic networks?

Probabilistic networks (or probabilistic graphical models) is a general term for various mathematical models in which probabilistic information is linked to network-based structural information. The network structure is usually formalized by a directed graph with nodes and arrows, where an arrow between two nodes is meant to represent some sort of influence or dependency between the variables associated with those nodes. This in turn means that the absence of an arrow between two nodes implies some sort of independence between the associated variables. As an example, we could represent our knowledge of the positive correlation between smoking and lung cancer by two network nodes S and L and an arrow from S towards L. We could then enhance the network with nodes and corresponding arrows for other (possibly independent) causes of lung cancer, for further smoking-related diseases, or for possible symptoms of lung cancer. To avoid circular dependencies, directed graphs are normally assumed to be acyclic.

Depending on the available probabilistic information, it is common to distinguish different types of probabilistic network. In the simplest case of so-called Bayesian (or belief) networks [11], it is assumed that the conditional probability of each network variable given its parents is fully known. In the example above, we could meet this requirement by specifying P(L = yes | S = yes) = 0.05 and P(L = yes | S = no) = 0.01 for the network variable L (which indicates that smoking increases the risk of lung cancer by a factor of 5) and by assuming a prior probability P(S = yes) = 0.3 for the network variable S (by which we express the assumption that about 30% of the population smokes). What makes Bayesian networks particularly attractive is the fact that the included structural and probabilistic information suffices to induce a unique joint probability function over all involved variables, which in turn can be used to answer all sorts of probabilistic queries. A non-trivial query in the example would be to compute the probability P(S = yes | L = yes), which can easily be derived from the joint probability function. Note that the explicit specification of a joint probability function would require exponentially many parameters, which is not feasible for practical applications. Bayesian networks are thus useful for specifying large joint probability functions efficiently and for reducing the complexity of the corresponding computations. The simplicity and efficiency of Bayesian networks explain their vast success over the last two decades in many different areas.

A second type of probabilistic network arises when the uniqueness assumption for the given probability values is relaxed. The resulting credal networks [1] are thus similar to Bayesian networks, except that they admit sets of probabilities instead of point-valued probabilities for each network variable. In the lung cancer example above, we could, for instance, express our ignorance of the precise percentage of smokers by 0.27 ≤ P(S = yes) ≤ 0.32.
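Both the Bayesian query and its credal variant fit in a few lines of code. The following is a minimal sketch of ours, not taken from any of the papers, using the numbers of the running example; note that the bounds under the interval-valued prior are attained at the interval's endpoints here because the posterior is monotone in the prior.

```python
# A minimal sketch of inference in the two-node smoking/lung-cancer network,
# using the conditional probabilities given in the text above.

P_L_GIVEN_S = {True: 0.05, False: 0.01}   # P(L=yes | S=yes), P(L=yes | S=no)

def posterior_smoker(prior_s: float) -> float:
    """P(S=yes | L=yes) by Bayes' theorem from the joint distribution."""
    joint_yes = prior_s * P_L_GIVEN_S[True]         # P(S=yes, L=yes)
    joint_no = (1 - prior_s) * P_L_GIVEN_S[False]   # P(S=no,  L=yes)
    return joint_yes / (joint_yes + joint_no)

# Bayesian network: a unique prior yields a unique answer.
print(f"P(S=yes | L=yes) = {posterior_smoker(0.30):.3f}")       # ~0.682

# Credal network: the prior is only known to lie in [0.27, 0.32].  Since the
# posterior is monotone in the prior, the endpoints deliver the bounds.
print(f"P(S=yes | L=yes) in [{posterior_smoker(0.27):.3f}, "
      f"{posterior_smoker(0.32):.3f}]")                         # ~[0.649, 0.702]
```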
In the simplest case of binary variables, the required sets of probabilities are always intervals, like [0.27, 0.32], but in general they are assumed to be closed convex sets of probabilities, so-called credal sets. The advantage of credal networks over Bayesian networks is their increased generality and flexibility, but the price for this is added computational complexity. Credal networks have been around for about ten years now, but a remarkable increase in attention has only been observed in the last few years.

Another type of probabilistic network results from relaxing the restriction to directed graphs. One class of such undirected probabilistic networks is known as Markov networks (or Markov random fields) [8,12]. They are similar to Bayesian networks in their representation of dependencies, but they can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies), and they cannot represent certain dependencies that a Bayesian network can. The probabilistic information of a Markov network comes in the form of so-called potential functions $\phi_k$, where $k$ is a clique (a set of pairwise connected nodes) of the undirected graph. The collection of potential functions represents a factorization of the joint probability function of all involved variables and can thus be used to answer arbitrary probabilistic queries (a small code sketch of such a factorization is given in Section 3 below). Both Bayesian networks and Markov networks are special cases of valuation systems [7] (or factor graphs [6]), a more general type of (probabilistic or non-probabilistic) graphical model.

3. Current directions

Research on probabilistic logic has, since its beginnings, explored trade-offs between expressivity and complexity in various kinds of internal and external progics. Generally these progics attach probabilities to logical constructs in a very flexible manner, possibly letting many measures satisfy a given set of probabilistic assessments (this "single-measure vs multiple-measure" debate is further discussed in Section 5). The current literature continues to investigate the merits and the applications of progics that do not impose any special structure on probabilistic assessments; take, for instance, the recent work of Ognjanović [9], and several papers on philosophical and psychological topics in this special issue: by Howson, by Leuridan, by Pfeifer and Kleiter, and by Sprenger.

The last fifteen years have witnessed the arrival of many progics where graphs are used to structure sentences and assessments, so as to lower the complexity of inference. Until a decade ago the work on "progics + graphs" focused on a few narrow strands; a noticeable change took place around 2000, and since then the literature has been growing at a staggering pace. Most of this recent literature employs the technology of Bayesian and Markov networks to produce progics where any set of well-formed formulas is satisfied by a single measure. The paper by Ng and Lloyd in this special issue gives examples where Bayesian networks are encoded through logic. A key reason for the dramatic growth of interest in single-model progics based on graphs, particularly in the artificial intelligence literature, is the maturity of inference algorithms for Bayesian and Markov networks. Today one can employ graphs so as to produce a progic with the power to handle problems of practical significance in reasonable computing time.
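As promised in Section 2, here is the Markov-network factorization in code form, before we continue the survey of current work. It is a minimal sketch with made-up potentials rather than anything drawn from the literature cited above: the joint distribution of an undirected three-variable chain A–B–C is the normalized product of the potentials on its two cliques.

```python
# A minimal sketch of the Markov-network factorization of Section 2, with
# hypothetical potentials: the joint P(A, B, C) for the undirected chain
# A - B - C is the normalized product of clique potentials phi_AB and phi_BC.
from itertools import product

# Any non-negative numbers will do as potentials; normalization happens below.
phi_AB = {(a, b): 2.0 if a == b else 1.0 for a in (0, 1) for b in (0, 1)}
phi_BC = {(b, c): 3.0 if b == c else 1.0 for b in (0, 1) for c in (0, 1)}

unnorm = {(a, b, c): phi_AB[a, b] * phi_BC[b, c]
          for a, b, c in product((0, 1), repeat=3)}
Z = sum(unnorm.values())                       # partition function
joint = {w: v / Z for w, v in unnorm.items()}  # P(A, B, C)

# Arbitrary probabilistic queries can now be answered from the joint,
# e.g. the marginal P(A = 1):
print(sum(pr for (a, _, _), pr in joint.items() if a == 1))
```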
Relatively little attention has been given to progics that are based on graphs and yet depart from the single-measure assumption (see, however, the papers by de Campos et al. and by Haenni in this issue).

Two further characteristics of the current work on "progics + graphs" deserve mention. First, research on progics based on probabilistic networks is heavily oriented towards applications: for instance, applications in data mining, where data are relational or structured (textual data, for example), or where it is advisable to use existing domain knowledge expressed as logical sentences [2]. Second, the research often focuses on learning logical and probabilistic sentences from data. Indeed, a key reason for the surge of such progics around 2000 is that, at that point, researchers began to combine Bayesian networks and logic so as to process relational data [3]. The emphasis on single-model progics is particularly strong when learning from data is considered.

4. This volume

In the first paper of this volume, Assembling a consistent set of sentences in relational probabilistic logic with stochastic independence, Cassio Polpo de Campos, Fabio G. Cozman, and José E. Ochoa Luna examine network-based representations for independence relations in probabilistic logics. The paper starts with a comprehensive survey of the literature on probabilistic logic and thus provides an ideal entry point for the topic of this volume. Next, it discusses independence relations in probabilistic logics with a focus on networks for relational probabilistic logic, and then it explores a possible application. The paper ends by pointing out some methodological precautions that should be taken when assembling knowledge bases.

In the second paper, on Probabilistic argumentation, Rolf Haenni assumes that the available knowledge is partly encoded as a set of logical premisses and partly as a fully specified probability space. This setting gets particularly interesting when some of the logical premisses include variables that are not part of the probability space. The two classical questions of the probability and the logical deducibility of a hypothesis can then be replaced by the more general question of the probability of a hypothesis being logically deducible from the premisses. The theory is thus an example of an external progic, which does not explicitly deal with sets of probabilities, but still allows a hypothesis to be judged by a pair of probability values.

In Can logic be combined with probability? Some observations, Colin Howson addresses one of the most fundamental questions underlying this special issue and, more generally, the Progicnet project. As Howson says, a positive answer to this question is of course possible, but could turn out to be trivial, as "logic can be combined with anything". A slightly more sophisticated answer would be that logic and probability can indeed be combined because both are formal languages: logic and probability share a common set of concepts and methods, and some of their semantic aspects also turn out to be significantly similar. But the interesting question is whether this similarity is merely formal, or whether there is a deeper conceptual affinity. This is exactly the challenge that Howson takes up in the paper. In the first part of the paper, the author closely investigates the extent to which Gaifman, Scott and Krauss succeeded in combining probability and logic.
Their attempt consisted in adapting the concepts, tools and procedures of model theory in modern logic, in particular the notions of consistency and consequence, to provide a corresponding model theory for the probability language. Howson shows that this account is not fully successful because expressiveness (i.e., the consideration of language systems whose expressive power is closer to the σ-algebras of mathematical probability) and effectiveness (i.e., the possibility of developing a proof theory) eventually pull in opposite directions. The pars construens of the paper develops along the lines of Scott's and Krauss's ideas, but differs in that it develops a formalism of epistemic probability, based on set-theoretic algebras, that is a generalisation of the classical logical concepts of model, consistency, and consequence. In particular, for assignments of real-valued probabilities to elements of a field or σ-field F of sets, Howson shows that three theorems follow which have analogue (meta)results in first-order logic: (i) absoluteness of consistency, (ii) non-ampliativity of consequence, and (iii) compactness.

The paper by Bert Leuridan, Causal discovery and the problem of ignorance, deals with the use of logic in determining the structure of a Bayesian network on the basis of a probability assignment. It discusses the problem that Pearl's IC algorithm, which determines a causal graph on the basis of a probability assignment over variables, requires us to have full knowledge of the probability assignment, in particular of all the conditional (in)dependence relations among the variables. Clearly, in many applications we do not have a full frequency table over all variables available, and hence we cannot take a unique probability assignment over all the variables as a starting point. The paper proposes a solution to this problem: it defines a so-called adaptive logic, which allows us to work with default settings for variables whose relation is unknown, and to revise the default settings once more information is obtained. The paper provides a proof theory, a semantics, and finally a soundness proof for this logic.

In the next paper, Framing human inference by coherence based probability logic, Niki Pfeifer and Gernot Kleiter establish a link from inference in probabilistic logic to psychological aspects of human deductive reasoning. The approach they adopt is to take probability logic based on the coherence approach of subjective probability as the basic reference theory. The paper first gives a brief overview of recent developments in combining logic and probability to build models of human deductive reasoning. It then proposes the coherence approach and highlights its advantages for psychological model building. The main claims of the paper are supported by results of experimental studies, which seem to be most consistent with coherence-based probability logic.

The paper by Kee Siong Ng and John Lloyd, Probabilistic reasoning in a classical logic, explores the combination of logical and probabilistic reasoning through higher-order logics. The paper argues against the view that one must add "special" features to classical logic so as to obtain a probabilistic logic: the authors contend that one can handle probabilistic reasoning within classical logic through higher-order machinery. Ng and Lloyd indeed present a language that does so, by allowing functions to return whole densities over domains of types.
Ng and Lloyd comment extensively on the expressivity and computational complexity of higher-order progics, and make connections with recent developments in logic and artificial intelligence research. The paper also discusses several examples, in particular an extended example that mixes individuals, relations, and Bayesian networks.

Finally, the paper by Jan Sprenger, Statistics between inductive logic and empirical science, is concerned with the logical status of statistical inference. More specifically, it poses the question of whether we can isolate a distinctly logical part of statistical inference that is universal to all statistical inferences, as separate from the rather more application-dependent choices of a model and, in the case of Bayesian inference, certain prior probabilities. By means of two examples, one on parameter estimation and one on model selection, Sprenger argues that statistical practice provides no grounds for any such separation between a unifying logical part and a diverse application-oriented part of statistical inference. Rather, the image emerges of statistics as closely tied up with the empirical sciences, requiring case-by-case optimisation instead of blanket solutions. Inductive logic may be interesting in its own right, but portraying statistical inference as inductive logic misses out on the juiciest parts of statistical inference.

5. Future directions

Progic 2007 was the third in a series of workshops on combining probability and logic. The special focus of the Kent workshop was the relationship between probabilistic logic and probabilistic networks, and a number of proposals to use network structures to simplify probabilistic calculation were discussed. The sheer variety of approaches naturally raises the question of which to choose, and why. In addressing this question at the workshop, three themes emerged, which suggest future directions of research.

The first issue, touched upon in Section 1, concerns the type of uncertainty to be managed. There is a natural distinction between reasoning about uncertainty and reasoning under conditions of uncertainty, which have formal analogues in internal and external progics. As we observed, each asks very different things from a probability logic, and some disagreements over approach trace back to the different types of representation and reasoning problems to be solved.

Second, there is a familiar trade-off in logic between the expressive capacity of the representational language and the inferential power of the logic: an increase in expressive capacity is nearly always accompanied by a decrease in the logic's capacity to draw out consequences effectively. One place this tension appears within probability logic is in discussions of the relative advantages of sharp probabilities versus interval-valued probabilities. A single distribution is an easier object to compute with than a set of distributions, so it is little surprise that point-valued approaches are favored from a computational point of view. But deferring to computational considerations unchecked would mean ditching probability for propositional logic altogether. The second issue thus concerns an inventory of the advantages and disadvantages of adopting sharp probabilities, compared to the advantages and disadvantages of using interval-valued probabilities.
In addition to practical and philosophical reasons for favoring imprecise or interval-valued probabilities over sharp values (e.g., sometimes you don't have enough information to construct a full joint distribution; then what?), there are also strong theoretical reasons for paying attention to results and methods developed within interval-valued frameworks. One theme of the workshop was to demonstrate that an interval-valued framework that includes one or another sharp-valued approach as a special case often gives a vivid picture of what is happening within that representation, and why. The upshot is that even those who are unmoved by the philosophical reasons for favoring interval-valued probabilities over sharp probabilities may nevertheless benefit from several mathematical insights that are gained from the shift.

A third theme concerns how to judge the correctness of a representation of uncertainty within a probabilistic logic. For many who take de Finetti as a starting point, the criteria for the representation of uncertain opinion derive from the behavioral consequences of opinion, such as buying a bet. One of the motivations for imprecise probability is that the lowest selling price for a bet may be, and often is, higher than the highest buying price, suggesting that the associated probability can be determined only up to an interval. However, this invites the question of whether a probabilistic logic should perhaps be supplemented with a decision theory, or even designed in conjunction with a decision theory from scratch. It may be that, as a normative theory, a stand-alone probabilistic logic is without a firm foundation.

References

[1] F.G. Cozman, Credal networks, Artificial Intelligence 120 (2) (2000) 199–233.
[2] P. Domingos, Toward knowledge-rich data mining, Data Mining and Knowledge Discovery 15 (1) (2007) 21–28.
[3] N. Friedman, L. Getoor, D. Koller, A. Pfeffer, Learning probabilistic relational models, in: International Joint Conference on Artificial Intelligence, 1999, pp. 1300–1309.
[4] R. Haenni, J.-W. Romeijn, G. Wheeler, J. Williamson, Probabilistic logic and probabilistic networks, 2008, in press.
[5] J.Y. Halpern, Reasoning about Uncertainty, MIT Press, Cambridge, MA, 2003.
[6] F.R. Kschischang, B.J. Frey, H.A. Loeliger, Factor graphs and the sum-product algorithm, IEEE Transactions on Information Theory 47 (2) (2001) 498–519.
[7] J. Kohlas, Information Algebras: Generic Structures for Inference, Springer, London, 2003.
[8] S.L. Lauritzen, Graphical Models, Oxford University Press, 1996.
[9] Z. Ognjanović, Discrete linear-time probabilistic logics: Completeness, decidability and complexity, Journal of Logic and Computation 16 (2) (2006) 257–285.
[10] J.B. Paris, The Uncertain Reasoner's Companion, Cambridge University Press, Cambridge, 1994.
[11] J. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, San Mateo, CA, 1988.
[12] J. Whittaker, Graphical Models in Applied Multivariate Statistics, John Wiley and Sons, 1990.