Probability

PA154 Statistické nástroje pro korpusy, Spring 2014
Pavel Rychlý

Source: Introduction to Natural Language Processing (600.465)
Dr. Jan Hajic, CS Dept., Johns Hopkins Univ.
hajic@cs.jhu.edu, www.cs.jhu.edu/~hajic

Experiments & Sample Spaces
■ Experiment, process, test, ...
■ Set of possible basic outcomes: sample space Ω (základní prostor obsahující možné výsledky)
  ■ coin toss (Ω = {head, tail}), die (Ω = {1..6})
  ■ yes/no opinion poll, quality test (bad/good) (Ω = {0,1})
  ■ lottery (|Ω| ≅ 10^7..10^12)
  ■ # of traffic accidents somewhere per year (Ω = N)
  ■ spelling errors (Ω = Z*), where Z is an alphabet and Z* is the set of possible strings over that alphabet
  ■ missing word (|Ω| ≅ vocabulary size)

Events
■ Event (jev) A is a set of basic outcomes
■ Usually A ⊆ Ω, and all A ∈ 2^Ω (the event space, jevové pole)
■ Ω is the certain event (jistý jev), ∅ is the impossible event (nemožný jev)
■ Example:
  ■ experiment: three coin tosses
  ■ Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
  ■ count cases with exactly two tails: then A = {HTT, THT, TTH}
  ■ all heads: A = {HHH}

Probability
■ Repeat the experiment many times, record how many times a given event A occurred ("count" c_1).
■ Do this whole series many times; remember all the counts c_i.
■ Observation: if repeated really many times, the ratios c_i/T_i (where T_i is the number of experiments run in the i-th series) are close to some (unknown but) constant value.
■ Call this constant the probability of A. Notation: p(A)

Estimating Probability
■ Remember: ... close to an unknown constant.
■ We can only estimate it:
  ■ from a single series (the typical case, as mostly the outcome of a series is given to us and we cannot repeat the experiment): set p(A) = c_1/T_1
  ■ otherwise, take the weighted average of all c_i/T_i (or, if the data allows, simply treat the set of series as a single long series).
■ This is the best estimate.
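To make the estimation procedure concrete, here is a minimal Python sketch (not part of the original slides; the helper run_series and the series sizes are illustrative assumptions) that estimates p(A) by relative frequency for the coin-toss event A = "exactly two tails", previewing the example that follows:

```python
import random

def run_series(trials):
    """Toss a fair coin three times per trial; return how many trials
    land in A = {HTT, THT, TTH}, i.e. show exactly two tails."""
    count = 0
    for _ in range(trials):
        tosses = [random.choice("HT") for _ in range(3)]
        if tosses.count("T") == 2:
            count += 1
    return count

# Single-series estimate: p(A) ~ c_1 / T_1
c1 = run_series(1000)
print(c1 / 1000)            # close to the true value 3/8 = 0.375

# Eight series pooled as one long series (the weighted average)
counts = [run_series(1000) for _ in range(8)]
print(sum(counts) / 8000)   # typically closer still to 0.375
```

Every run gives a slightly different single-series estimate; that spread is exactly what the counts (386, 373, 399, ...) in the example below illustrate.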
Example
■ Recall our example:
  ■ experiment: three coin tosses
  ■ Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
  ■ count cases with exactly two tails: A = {HTT, THT, TTH}
■ Run the experiment 1000 times (i.e. 3000 tosses)
■ Counted: 386 cases with two tails (HTT, THT or TTH)
  ■ estimate: p(A) = 386/1000 = .386
■ Run again: 373, 399, 382, 355, 372, 406, 359
  ■ p(A) = .379 (weighted average) or simply 3032/8000
■ Uniform distribution assumption: p(A) = 3/8 = .375

Basic Properties
■ Basic properties:
  ■ p: 2^Ω → [0,1]
  ■ p(Ω) = 1
  ■ Disjoint events: p(∪ A_i) = Σ_i p(A_i)
■ NB: the axiomatic definition of probability takes the above three conditions as axioms
■ Immediate consequences:
  ■ p(∅) = 0
  ■ p(Ā) = 1 − p(A)
  ■ A ⊆ B ⇒ p(A) ≤ p(B)
  ■ Σ_{a∈Ω} p(a) = 1

Joint and Conditional Probability
■ p(A,B) = p(A ∩ B)
■ p(A|B) = p(A,B) / p(B)
■ Estimating from counts:
  p(A|B) = p(A,B) / p(B) = (c(A ∩ B) / T) / (c(B) / T) = c(A ∩ B) / c(B)

Bayes Rule
■ p(A,B) = p(B,A), since p(A ∩ B) = p(B ∩ A)
■ therefore p(A|B) · p(B) = p(B|A) · p(A), and therefore:
  p(A|B) = p(B|A) × p(A) / p(B)

Independence
■ Can we compute p(A,B) from p(A) and p(B)?
■ Recall from the previous foil: p(A|B) × p(B) = p(B|A) × p(A), so p(A,B) = p(B|A) × p(A)
■ ... we're almost there: how does p(B|A) relate to p(B)?
  ■ p(B|A) = p(B) iff A and B are independent, and then (and only then) p(A,B) = p(A) × p(B)
■ Example: two coin tosses; weather today and weather on March 4th, 1789
■ Any two events for which p(B|A) = p(B)!

Chain Rule
p(A_1, A_2, A_3, A_4, ..., A_n) = p(A_1|A_2, A_3, A_4, ..., A_n) × p(A_2|A_3, A_4, ..., A_n) × p(A_3|A_4, ..., A_n) × ... × p(A_{n−1}|A_n) × p(A_n)
■ this is a direct consequence of the Bayes rule.

The Golden Rule (of Classic Statistical NLP)
■ Interested in an event A given B (where it is not easy, practical, or desirable to estimate p(A|B) directly):
■ take Bayes rule, max over all As:
  argmax_A p(A|B) = argmax_A (p(B|A) × p(A) / p(B)) = argmax_A p(B|A) × p(A)
■ ... as p(B) is constant when changing As

Random Variables
■ a random variable is a function X : Ω → Q
  ■ in general Q = R^n, typically R
  ■ easier to handle real numbers than real-world events
■ a random variable is discrete if Q is countable (i.e. also if finite)
■ Example: die: natural "numbering" [1,6]; coin: {0,1}
■ Probability distribution:
  ■ p_X(x) = p(X = x) =_df p(A_x), where A_x = {a ∈ Ω : X(a) = x}
  ■ often just p(x) if it is clear from context what X is

Expectation
■ is the mean of a random variable (weighted average)
  ■ E(X) = Σ_{x∈X(Ω)} x · p_X(x)
■ Example: one six-sided die: 3.5; two dice (sum): 7

Joint and Conditional Distributions
■ Joint and conditional distribution rules:
  ■ analogous to probability of events
■ Bayes: p_{X|Y}(x,y), in simpler notation p_{XY}(x|y) or just p(x|y), satisfies
  p(x|y) = p(y|x) · p(x) / p(y)
■ Chain rule: p(w,x,y,z) = p(z) · p(y|z) · p(x|y,z) · p(w|x,y,z)

Standard Distributions
■ Binomial (discrete)
  ■ outcome: 0 or 1 (thus binomial)
  ■ make n trials
  ■ interested in the (probability of the) number of successes r
■ Must be careful: it's not uniform!
  ■ p_b(r|n) = (n choose r) / 2^n (for equally likely outcomes)
  ■ (n choose r) counts how many possibilities there are for choosing r objects out of n:
    (n choose r) = n! / ((n − r)! r!)

Continuous Distributions
■ The normal distribution ("Gaussian")
  ■ p_norm(x|μ,σ) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
  ■ where:
    ■ μ is the mean (the x-coordinate of the peak) (0)
    ■ σ is the standard deviation (1)
■ other: hyperbolic, t
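The two closing formulas can be checked numerically. Below is a short sketch (not part of the original slides), assuming the equally-likely case p = 1/2 for the binomial and using only the Python standard library:

```python
from math import comb, exp, pi, sqrt

def p_b(r, n):
    """Binomial probability of r successes in n trials with equally
    likely outcomes (p = 1/2): (n choose r) / 2**n."""
    return comb(n, r) / 2**n

def p_norm(x, mu=0.0, sigma=1.0):
    """Normal (Gaussian) density with mean mu and standard deviation sigma."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

print(p_b(2, 3))      # 3/8 = 0.375: exactly two tails in three tosses
print(p_norm(0.0))    # 1/sqrt(2*pi) ≈ 0.3989: peak of the standard normal
```

Note how p_b(2, 3) reproduces the uniform-distribution value 3/8 used in the coin-toss example above.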