Probability

PA154 Statistické nástroje pro korpusy, Spring 2014
Pavel Rychlý

Source: Introduction to Natural Language Processing (600.465)
Dr. Jan Hajic, CS Dept., Johns Hopkins Univ.
hajic@cs.jhu.edu, www.cs.jhu.edu/~hajic

Experiments & Sample Spaces
■ Experiment, process, test, ...
■ Set of possible basic outcomes: sample space Ω (základní prostor obsahující možné výsledky)
  ■ coin toss (Ω = {head, tail}), die (Ω = {1..6})
  ■ yes/no opinion poll, quality test (bad/good) (Ω = {0,1})
  ■ lottery (|Ω| ≅ 10^7..10^12)
  ■ # of traffic accidents somewhere per year (Ω = N)
  ■ spelling errors (Ω = Z*), where Z is an alphabet and Z* is the set of possible strings over that alphabet
  ■ missing word (|Ω| ≅ vocabulary size)

Events
■ Event (jev) A is a set of basic outcomes
■ Usually A ⊆ Ω, and all A ∈ 2^Ω (the event space, jevové pole)
■ Ω is the certain event (jistý jev), ∅ is the impossible event (nemožný jev)
■ Example:
  ■ experiment: three coin tosses
  ■ Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
  ■ count cases with exactly two tails: then A = {HTT, THT, TTH}
  ■ all heads: A = {HHH}

Probability
■ Repeat the experiment many times, record how many times a given event A occurred ("count" c_1).
■ Do this whole series many times; remember all the counts c_i.
■ Observation: if repeated really many times, the ratios c_i/T_i (where T_i is the number of experiments run in the i-th series) are close to some (unknown but) constant value.
■ Call this constant the probability of A. Notation: p(A)

Estimating Probability
■ Remember: ... close to an unknown constant.
■ We can only estimate it:
  ■ from a single series (the typical case, as mostly the outcome of a series is given to us and we cannot repeat the experiment): set p(A) = c_1/T_1
  ■ otherwise, take the weighted average of all c_i/T_i (or, if the data allows, simply treat the set of series as a single long series).
■ This is the best estimate.
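To make the estimation procedure concrete, here is a minimal Python sketch (not part of the original slides; the helper run_series and the series sizes are illustrative assumptions) that estimates p(A) by relative frequency for the coin-toss event A = "exactly two tails", previewing the example that follows:

```python
import random

def run_series(trials):
    """Toss a fair coin three times per trial; return how many trials
    land in A = {HTT, THT, TTH}, i.e. show exactly two tails."""
    count = 0
    for _ in range(trials):
        tosses = [random.choice("HT") for _ in range(3)]
        if tosses.count("T") == 2:
            count += 1
    return count

# Single-series estimate: p(A) ~ c_1 / T_1
c1 = run_series(1000)
print(c1 / 1000)            # close to the true value 3/8 = 0.375

# Eight series pooled as one long series (the weighted average)
counts = [run_series(1000) for _ in range(8)]
print(sum(counts) / 8000)   # typically closer still to 0.375
```

Every run gives a slightly different single-series estimate; that spread is exactly what the counts (386, 373, 399, ...) in the example below illustrate.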
Example
■ Recall our example:
  ■ experiment: three coin tosses
  ■ Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
  ■ count cases with exactly two tails: A = {HTT, THT, TTH}
■ Run the experiment 1000 times (i.e. 3000 tosses)
■ Counted: 386 cases with two tails (HTT, THT or TTH)
  ■ estimate: p(A) = 386/1000 = .386
■ Run again: 373, 399, 382, 355, 372, 406, 359
  ■ p(A) = .379 (weighted average) or simply 3032/8000
■ Uniform distribution assumption: p(A) = 3/8 = .375

Basic Properties
■ Basic properties:
  ■ p: 2^Ω → [0,1]
  ■ p(Ω) = 1
  ■ Disjoint events: p(∪ A_i) = Σ_i p(A_i)
■ NB: the axiomatic definition of probability takes the above three conditions as axioms
■ Immediate consequences:
  ■ p(∅) = 0
  ■ p(Ā) = 1 − p(A)
  ■ A ⊆ B ⇒ p(A) ≤ p(B)
  ■ Σ_{a∈Ω} p(a) = 1

Joint and Conditional Probability
■ p(A,B) = p(A ∩ B)
■ p(A|B) = p(A,B) / p(B)
■ Estimating from counts:
  p(A|B) = p(A,B) / p(B) = (c(A ∩ B) / T) / (c(B) / T) = c(A ∩ B) / c(B)

Bayes Rule
■ p(A,B) = p(B,A), since p(A ∩ B) = p(B ∩ A)
■ therefore p(A|B) · p(B) = p(B|A) · p(A), and therefore:
  p(A|B) = p(B|A) × p(A) / p(B)

Independence
■ Can we compute p(A,B) from p(A) and p(B)?
■ Recall from the previous foil: p(A|B) × p(B) = p(B|A) × p(A), so p(A,B) = p(B|A) × p(A)
■ ... we're almost there: how does p(B|A) relate to p(B)?
  ■ p(B|A) = p(B) iff A and B are independent, and then (and only then) p(A,B) = p(A) × p(B)
■ Example: two coin tosses; weather today and weather on March 4th, 1789
■ Any two events for which p(B|A) = p(B)!

Chain Rule
p(A_1, A_2, A_3, A_4, ..., A_n) = p(A_1|A_2, A_3, A_4, ..., A_n) × p(A_2|A_3, A_4, ..., A_n) × p(A_3|A_4, ..., A_n) × ... × p(A_{n−1}|A_n) × p(A_n)
■ this is a direct consequence of the Bayes rule.

The Golden Rule (of Classic Statistical NLP)
■ Interested in an event A given B (where it is not easy, practical, or desirable to estimate p(A|B) directly):
■ take Bayes rule, max over all As:
  argmax_A p(A|B) = argmax_A (p(B|A) × p(A) / p(B)) = argmax_A p(B|A) × p(A)
■ ... as p(B) is constant when changing As

Random Variables
■ a random variable is a function X : Ω → Q
  ■ in general Q = R^n, typically R
  ■ easier to handle real numbers than real-world events
■ a random variable is discrete if Q is countable (i.e. also if finite)
■ Example: die: natural "numbering" [1,6]; coin: {0,1}
■ Probability distribution:
  ■ p_X(x) = p(X = x) =_df p(A_x), where A_x = {a ∈ Ω : X(a) = x}
  ■ often just p(x) if it is clear from context what X is

Expectation
■ is the mean of a random variable (weighted average)
  ■ E(X) = Σ_{x∈X(Ω)} x · p_X(x)
■ Example: one six-sided die: 3.5; two dice (sum): 7

Joint and Conditional Distributions
■ Joint and conditional distribution rules:
  ■ analogous to probability of events
■ Bayes: p_{X|Y}(x,y), in simpler notation p_{XY}(x|y) or just p(x|y), satisfies
  p(x|y) = p(y|x) · p(x) / p(y)
■ Chain rule: p(w,x,y,z) = p(z) · p(y|z) · p(x|y,z) · p(w|x,y,z)

Standard Distributions
■ Binomial (discrete)
  ■ outcome: 0 or 1 (thus binomial)
  ■ make n trials
  ■ interested in the (probability of the) number of successes r
■ Must be careful: it's not uniform!
  ■ p_b(r|n) = (n choose r) / 2^n (for equally likely outcomes)
  ■ (n choose r) counts how many possibilities there are for choosing r objects out of n:
    (n choose r) = n! / ((n − r)! r!)

Continuous Distributions
■ The normal distribution ("Gaussian")
  ■ p_norm(x|μ,σ) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
  ■ where:
    ■ μ is the mean (the x-coordinate of the peak) (0)
    ■ σ is the standard deviation (1)
■ other: hyperbolic, t
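The two closing formulas can be checked numerically. Below is a short sketch (not part of the original slides), assuming the equally-likely case p = 1/2 for the binomial and using only the Python standard library:

```python
from math import comb, exp, pi, sqrt

def p_b(r, n):
    """Binomial probability of r successes in n trials with equally
    likely outcomes (p = 1/2): (n choose r) / 2**n."""
    return comb(n, r) / 2**n

def p_norm(x, mu=0.0, sigma=1.0):
    """Normal (Gaussian) density with mean mu and standard deviation sigma."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

print(p_b(2, 3))      # 3/8 = 0.375: exactly two tails in three tosses
print(p_norm(0.0))    # 1/sqrt(2*pi) ≈ 0.3989: peak of the standard normal
```

Note how p_b(2, 3) reproduces the uniform-distribution value 3/8 used in the coin-toss example above.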