IA: Computational Logic
. Inductive Inference
Achim Blumensath blumens@fi.muni.cz
Faculty of Informatics, Masaryk University, Brno
Basic Concepts
Induction
learning general facts from examples:
Induction is the process of forming of a hypothesis (about a target
concept/function) based on observed data.
Example
What is the next number?
0, 1,
Induction
learning general facts from examples:
Induction is the process of forming of a hypothesis (about a target
concept/function) based on observed data.
Example
What is the next number?
0, 1, 1,
Induction
learning general facts from examples:
Induction is the process of forming of a hypothesis (about a target
concept/function) based on observed data.
Example
What is the next number?
0, 1, 1, 2,
Induction
learning general facts from examples:
Induction is the process of forming of a hypothesis (about a target
concept/function) based on observed data.
Example
What is the next number?
0, 1, 1, 2, 3,
Induction
learning general facts from examples:
Induction is the process of forming of a hypothesis (about a target
concept/function) based on observed data.
Example
What is the next number?
0, 1, 1, 2, 3, 5,
Induction
learning general facts from examples:
Induction is the process of forming of a hypothesis (about a target
concept/function) based on observed data.
Example
What is the next number?
0, 1, 1, 2, 3, 5, 8,… an = an−2 + an−1
Induction
learning general facts from examples:
Induction is the process of forming of a hypothesis (about a target
concept/function) based on observed data.
Example
What is the next number?
0, 1, 1, 2, 3, 5, 8,… an = an−2 + an−1
0, 0,
Induction
learning general facts from examples:
Induction is the process of forming of a hypothesis (about a target
concept/function) based on observed data.
Example
What is the next number?
0, 1, 1, 2, 3, 5, 8,… an = an−2 + an−1
0, 0, 0,
Induction
learning general facts from examples:
Induction is the process of forming of a hypothesis (about a target
concept/function) based on observed data.
Example
What is the next number?
0, 1, 1, 2, 3, 5, 8,… an = an−2 + an−1
0, 0, 0, 0,
Induction
learning general facts from examples:
Induction is the process of forming of a hypothesis (about a target
concept/function) based on observed data.
Example
What is the next number?
0, 1, 1, 2, 3, 5, 8,… an = an−2 + an−1
0, 0, 0, 0, 0,
Induction
learning general facts from examples:
Induction is the process of forming of a hypothesis (about a target
concept/function) based on observed data.
Example
What is the next number?
0, 1, 1, 2, 3, 5, 8,… an = an−2 + an−1
0, 0, 0, 0, 0, 0,… an = 0
or
Induction
learning general facts from examples:
Induction is the process of forming of a hypothesis (about a target
concept/function) based on observed data.
Example
What is the next number?
0, 1, 1, 2, 3, 5, 8,… an = an−2 + an−1
0, 0, 0, 0, 0, 0,… an = 0
or an = n(n − 1)(n − 2)(n − 3)(n − 4)(n − 5)?
Fundamental Problem
From a strictly logical point of view, induction is not possible: there
are always several possible explanations for the observed phenomena
and there is no rational basis for choosing one over the others. Hence,
a hypothesis can be falsified but never verified.
Consequently we need to make additional a priori assumptions (the
so-called inductive bias) regarding the target concept.
Inductive Learning Hypothesis
A hypothesis that approximates the target concept well over a
sufficiently large amount of training data will also approximate it well
over unobserved examples.
Occam’s Razor
Use the simplest hypothesis that matches the observations.
(What’s simple depends on our formalism.)
Philosophy of Science
Scientific Method
In the th century, Francis Bacon, René Descartes, and Isaac
Newton developed the scientific method based on induction.
Problem of Induction
David Hume was the first to point out that inductive inferences are
unprovable and always subject to falsification.
Falsifiability
Karl Popper argued that induction does not exist. Instead science is
based on conjecture and criticism. One should select hypotheses that
are the easiest to falsify.
Paradigm Shift
Thomas Kuhn viewed science as a social process. He emphasised the
role of paradigms and the way they are replaced when sufficiently
many observations point to problems with the current paradigm.
Machine Learning
Induction (and learning in general) works best if it is interactive:
▸ form a hypothesis based on the current data
▸ test the hypothesis on new data
▸ repeat
The question therefore is not whether the hypothesis is true, but how
well it predicts observations.
Most decent algorithms for inference use statistical methods and fall
outside the scope of this course.
Hypothesis Space
Hypothesis Space
Hypothesis Space
Hypothesis Space
Example

  
  

Example

  
  

x ∨ y
Example

  
  

x ∨ y
¬x ∨ ¬z
Boolean Functions
Boolean functions
In this lecture we will concentrate on learning boolean functions
f {0, 1}n
→ {0, 1}
(which can be encoded as propositional formulae)
Example
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 f (¯x)
0 1 0 1 1 1 1 0 0 1
√
1 0 1 0 0 0 0 1 1 1 ⨉⨉⨉
1 1 0 0 1 1 1 0 1 0 ⨉⨉⨉
0 0 0 0 1 0 0 0 1 0
√
0 0 0 1 1 0 0 1 1 0
√
0 1 1 1 0 1 1 0 1 1 ⨉⨉⨉
0 1 0 0 1 0 0 1 0 0
√
Conjunctive Hypotheses
Setting
Learning a boolean function f {0, 1}n
→ {0, 1} using as hypotheses
conjunctions η = xi ∧ ⋅⋅⋅ ∧ ¬xk of literals.
General-to-specific ordering
η is more specific than ζ if η ⊧ ζ.
Idea
Find the most specific hypothesis.
Conjunctive Hypotheses
Setting
Learning a boolean function f {0, 1}n
→ {0, 1} using as hypotheses
conjunctions η = xi ∧ ⋅⋅⋅ ∧ ¬xk of literals.
General-to-specific ordering
η is more specific than ζ if η ⊧ ζ.
Idea
Find the most specific hypothesis.
Conjunctive Hypotheses
Setting
Learning a boolean function f {0, 1}n
→ {0, 1} using as hypotheses
conjunctions η = xi ∧ ⋅⋅⋅ ∧ ¬xk of literals.
General-to-specific ordering
η is more specific than ζ if η ⊧ ζ.
Idea
Find the most specific hypothesis.
√
Conjunctive Hypotheses
Setting
Learning a boolean function f {0, 1}n
→ {0, 1} using as hypotheses
conjunctions η = xi ∧ ⋅⋅⋅ ∧ ¬xk of literals.
General-to-specific ordering
η is more specific than ζ if η ⊧ ζ.
Idea
Find the most specific hypothesis.
√
√
Conjunctive Hypotheses
Setting
Learning a boolean function f {0, 1}n
→ {0, 1} using as hypotheses
conjunctions η = xi ∧ ⋅⋅⋅ ∧ ¬xk of literals.
General-to-specific ordering
η is more specific than ζ if η ⊧ ζ.
Idea
Find the most specific hypothesis.
√
√
√
Conjunctive Hypotheses
Setting
Learning a boolean function f {0, 1}n
→ {0, 1} using as hypotheses
conjunctions η = xi ∧ ⋅⋅⋅ ∧ ¬xk of literals.
General-to-specific ordering
η is more specific than ζ if η ⊧ ζ.
Idea
Find the most specific hypothesis.
√
√
√
⨉⨉⨉
⨉⨉⨉
⨉⨉⨉
Conjunctive Hypotheses
Setting
Learning a boolean function f {0, 1}n
→ {0, 1} using as hypotheses
conjunctions η = xi ∧ ⋅⋅⋅ ∧ ¬xk of literals.
General-to-specific ordering
η is more specific than ζ if η ⊧ ζ.
Idea
Find the most specific hypothesis.
√
√
√
⨉⨉⨉
⨉⨉⨉
⨉⨉⨉
⨉⨉⨉
Find-S algorithm
▸ Start with η =
▸ Consider the next positive example ¯b
▸ If η(¯b) is true, continue.
▸ Otherwise, find the most specific ζ such that η ⊧ ζ and ζ(¯b) is
true.
▸ Continue with η = ζ.
This algorithm computes find the least conjunction with respect to the
⊧-ordering that covers all positive examples.
If any of the negative examples is also covered, the training data
cannot be described by a conjunction.
Example
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 f (¯x)
0 1 0 1 1 1 1 0 0 1
√
1 0 1 0 0 0 0 1 1 1 ⨉⨉⨉
1 1 0 0 1 1 1 0 1 0 ⨉⨉⨉
0 0 0 0 1 0 0 0 1 0
√
0 0 0 1 1 0 0 1 1 0
√
0 1 1 1 0 1 1 0 1 1 ⨉⨉⨉
0 1 0 0 1 0 0 1 0 0
√
η0 =
Example
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 f (¯x)
0 1 0 1 1 1 1 0 0 1
√
1 0 1 0 0 0 0 1 1 1 ⨉⨉⨉
1 1 0 0 1 1 1 0 1 0 ⨉⨉⨉
0 0 0 0 1 0 0 0 1 0
√
0 0 0 1 1 0 0 1 1 0
√
0 1 1 1 0 1 1 0 1 1 ⨉⨉⨉
0 1 0 0 1 0 0 1 0 0
√
η0 =
η1 = ¬x1 ∧ x2 ∧ ¬x3 ∧ x4 ∧ x5 ∧ x6 ∧ x7 ∧ ¬x8 ∧ ¬x9 ∧ x10
Example
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 f (¯x)
0 1 0 1 1 1 1 0 0 1
√
1 0 1 0 0 0 0 1 1 1 ⨉⨉⨉
1 1 0 0 1 1 1 0 1 0 ⨉⨉⨉
0 0 0 0 1 0 0 0 1 0
√
0 0 0 1 1 0 0 1 1 0
√
0 1 1 1 0 1 1 0 1 1 ⨉⨉⨉
0 1 0 0 1 0 0 1 0 0
√
η0 =
η1 = ¬x1 ∧ x2 ∧ ¬x3 ∧ x4 ∧ x5 ∧ x6 ∧ x7 ∧ ¬x8 ∧ ¬x9 ∧ x10
η2 = ¬x1 ∧ ¬x3 ∧ x5 ∧ ¬x8
Example
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 f (¯x)
0 1 0 1 1 1 1 0 0 1
√
1 0 1 0 0 0 0 1 1 1 ⨉⨉⨉
1 1 0 0 1 1 1 0 1 0 ⨉⨉⨉
0 0 0 0 1 0 0 0 1 0
√
0 0 0 1 1 0 0 1 1 0
√
0 1 1 1 0 1 1 0 1 1 ⨉⨉⨉
0 1 0 0 1 0 0 1 0 0
√
η0 =
η1 = ¬x1 ∧ x2 ∧ ¬x3 ∧ x4 ∧ x5 ∧ x6 ∧ x7 ∧ ¬x8 ∧ ¬x9 ∧ x10
η2 = ¬x1 ∧ ¬x3 ∧ x5 ∧ ¬x8
η3 = ¬x1 ∧ ¬x3 ∧ x5
Example
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 f (¯x)
0 1 0 1 1 1 1 0 0 1
√
1 0 1 0 0 0 0 1 1 1 ⨉⨉⨉
1 1 0 0 1 1 1 0 1 0 ⨉⨉⨉
0 0 0 0 1 0 0 0 1 0
√
0 0 0 1 1 0 0 1 1 0
√
0 1 1 1 0 1 1 0 1 1 ⨉⨉⨉
0 1 0 0 1 0 0 1 0 0
√
η0 =
η1 = ¬x1 ∧ x2 ∧ ¬x3 ∧ x4 ∧ x5 ∧ x6 ∧ x7 ∧ ¬x8 ∧ ¬x9 ∧ x10
η2 = ¬x1 ∧ ¬x3 ∧ x5 ∧ ¬x8
η3 = ¬x1 ∧ ¬x3 ∧ x5
η4 = ¬x1 ∧ ¬x3 ∧ x5
Hypothesis space
Goal Compute all hypotheses consistent with the data.
Let D ⊆ {0, 1}n
× {0, 1} be the observed data and H the set of all
hypotheses consistent with every datum in D.
We compute the sets H+
and H−
of maximal/minimal elements of H
(with respect to the general-to-specific order ⊧).
Candidate-Elimination Algorithm
▸ Start with H+
= {⊺} and H−
= { }.
▸ For each positive d ∈ D:
▸ Delete from H+
every hypothesis η with η(d) = 0.
▸ Replace every η ∈ H−
with η(d) = 0 by the set of all minimal ζ
such that
η ⊧ ζ , ζ(d) = 1 , and ζ ⊧ η′
, for some η′
∈ H+
.
▸ Remove from H−
all elements that are not minimal.
▸ For each negative d ∈ D: proceed analogously with H+
and H−
interchanged.
Example
x1 x2 x3 f (¯x)
1 1 0
√
0 0 1 ⨉⨉⨉
1 0 0
√
1 0 1 ⨉⨉⨉

  
  

Example
x1 x2 x3 f (¯x)
1 1 0
√
0 0 1 ⨉⨉⨉
1 0 0
√
1 0 1 ⨉⨉⨉

  
  

Step . H−
= { } H+
= {⊺}
Example
x1 x2 x3 f (¯x)
1 1 0
√
0 0 1 ⨉⨉⨉
1 0 0
√
1 0 1 ⨉⨉⨉

  
  

Step . H−
= { } H+
= {⊺}
Step . H−
= {x1 ∧ x2 ∧ ¬x3} H+
= {⊺}
Example
x1 x2 x3 f (¯x)
1 1 0
√
0 0 1 ⨉⨉⨉
1 0 0
√
1 0 1 ⨉⨉⨉

  
  

Step . H−
= { } H+
= {⊺}
Step . H−
= {x1 ∧ x2 ∧ ¬x3} H+
= {⊺}
Step . H−
= {x1 ∧ x2 ∧ ¬x3} H+
= {x1, x2, ¬x3}
Example
x1 x2 x3 f (¯x)
1 1 0
√
0 0 1 ⨉⨉⨉
1 0 0
√
1 0 1 ⨉⨉⨉

  
  

Step . H−
= { } H+
= {⊺}
Step . H−
= {x1 ∧ x2 ∧ ¬x3} H+
= {⊺}
Step . H−
= {x1 ∧ x2 ∧ ¬x3} H+
= {x1, x2, ¬x3}
Step . H−
= {x1 ∧ ¬x3} H+
= {x1, ¬x3}
Example
x1 x2 x3 f (¯x)
1 1 0
√
0 0 1 ⨉⨉⨉
1 0 0
√
1 0 1 ⨉⨉⨉

  
  

Step . H−
= { } H+
= {⊺}
Step . H−
= {x1 ∧ x2 ∧ ¬x3} H+
= {⊺}
Step . H−
= {x1 ∧ x2 ∧ ¬x3} H+
= {x1, x2, ¬x3}
Step . H−
= {x1 ∧ ¬x3} H+
= {x1, ¬x3}
Step . H−
= {x1 ∧ ¬x3} H+
= {¬x3}
Decision Trees
Decision Trees
Organise the function to be learned as a tree.
x1 x2 x3 x4 x5 x6 x7 f (¯x)
1 0 1 1 1 0 1
√
0 1 0 0 0 1 1 ⨉⨉⨉
1 1 1 1 1 1 0
√
0 0 1 0 0 1 0
√
0 0 0 1 1 0 1 ⨉⨉⨉
1 1 0 1 1 0 0
√
1 0 1 0 1 0 0
√
 
   
   
 
  
   
      
      
      
⨉⨉⨉
√
⨉⨉⨉
√ √ √ √
Decision Trees
Organise the function to be learned as a tree.
x1 x2 x3 x4 x5 x6 x7 f (¯x)
1 0 1 1 1 0 1
√
0 1 0 0 0 1 1 ⨉⨉⨉
1 1 1 1 1 1 0
√
0 0 1 0 0 1 0
√
0 0 0 1 1 0 1 ⨉⨉⨉
1 1 0 1 1 0 0
√
1 0 1 0 1 0 0
√
 
 
 
⨉⨉⨉
√
⨉⨉⨉
√
The order of the variables xi matters. Which one do we choose?
Ordered Binary Decision Diagrams (OBDDs)
▸ data structure to compactly represent a boolean function
▸ the arguments are ordered x1, . . . , xn
▸ the graph is reduced: merge isomorphic subgraphs and eliminate
unneeded vertices
(x1 ∧ x3) ∨ (x2 ∧ x3) ∨ ¬(x1 ∨ x2 ∨ x3)
x
x
x x
   


 
   