relation of each with IQ. For persons of the same IQ, there is little correlation between speed and comprehension. Hence, since an increase in reading speed will not increase IQ, it would be expected to have little effect on comprehension.

6. (a) $Y_i = .00186V_i + 2.51$ (b) $r_{YQ.V} = -.013$, $r_{YG.V} = -.122$; G will be selected since its absolute value is greatest (c) $z_Y = .379z_V - .121z_G$ (d) .258 (e) $Y_i = .00207V_i - .165G_i + 3.133$ (f) 4.06 (g) .34 (h) .12 or 12% (i) yes, .35

7. $\eta^2 = 1 - 701.4/1{,}041.6 = .3266$; hence, $\eta = \sqrt{.3266} \approx .57$

9 PROBABILITY

9.1 INTRODUCTION

Researchers often attempt to generalize from their observations. They make a tacit assumption that the set of data has some generalizability: if they gathered more data tomorrow, it would reflect the same general trend. Inferences differ in their likelihood of being correct, all the way from "extremely unlikely" to "almost certain." From the standpoint of logic, all inferences contain uncertainty. Statisticians have developed methods that assign probabilities to inferences.

Inferential reasoning is a principal method of science. The language of everyday life ("extremely unlikely" or "almost certain") lacks precision. Statisticians do not completely agree on how to assign probabilities to statements or on which statements should be assigned probabilities. Nonetheless, their preference for objectivity and quantification springs from values regarding the nature and methods of science.

The remaining chapters deal with assigning a probability value to an inference. The methods that statisticians have developed allow one to state, for example, "There is a positive relationship between IQ and grade-point average, and the probability of obtaining such a large r by chance is very small, only .01 if there really is no correlation in the population." Only certain basics of probability theory can be addressed in a single chapter; probability theory is a large and complex body of knowledge.
The finer points in probability theory are not needed for using and interpreting the probability statements used in statistical inference. An intuitive understanding of probability is, however, necessary to interpret the statistics of hypothesis testing and interval estimation.

9.2 PROBABILITY AS A MATHEMATICAL SYSTEM

Probability can be viewed as a system of definitions and operations pertaining to a sample space. The idea of a sample space is basic. Every probability statement is related to a sample space of some sort; indeed, statements of probability are statements about sample spaces.¹

Definition: A probability function is a rule of correspondence that associates with each event A in the sample space a number P(A) such that:
1. For any event A, 1 ≥ P(A) ≥ 0.
2. The sum of the probabilities for all distinct elementary events is 1.
3. If A and B are mutually exclusive events, that is, have no sample points in common, then P(A or B) = P(A) + P(B).

If it is assumed that the probability of every elementary event $a_i$ is 1/N, where N is the total number of sample points, then the probability of the event A that is composed of r sample points² is r/N (Equation 9.2).

A sample space can be defined as a set of points. These points can represent persons, businesses, cities, schools, et cetera. An event is an observable happening, like the appearance of heads when a coin is flipped, or that a person is watching television. There are usually many points in the sample space, each of which is an example of an event. For instance, the sample space may be a set of six white and three black balls in an urn. This sample space has nine points. An event might be "A ball is white." This event has six sample-space points. How many points in the sample space does the event "A ball is black" have? The event "A ball in this urn is red" has no sample points. "A ball in this urn is either white or black" is also an example of an event.
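The counting behind these urn events can be sketched in a few lines of Python. This is a minimal illustration, assuming every ball is equally likely to be drawn; the list `sample_space` and the helper `probability` are hypothetical names introduced here, not notation from the text.

```python
from fractions import Fraction

# Hypothetical encoding of the urn: one list entry per sample point (ball).
sample_space = ["white"] * 6 + ["black"] * 3


def probability(event, space):
    """Return r/N: the share of equally likely sample points that are examples of the event."""
    r = sum(1 for point in space if event(point))  # number of examples of the event
    return Fraction(r, len(space))                 # N is the size of the sample space


p_white = probability(lambda ball: ball == "white", sample_space)
p_black = probability(lambda ball: ball == "black", sample_space)
p_red = probability(lambda ball: ball == "red", sample_space)
p_white_or_black = probability(lambda ball: ball in ("white", "black"), sample_space)

print(p_white, p_black, p_red, p_white_or_black)
```

An event with no sample points (a red ball) gets probability 0, and an event that every sample point exemplifies (white or black) gets probability 1, which matches the bounds stated in the definition of a probability function.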
Notice that many different events can be defined on the same sample space. A statement of probability is made about the relative frequency of an event that is associated with a sample space. A capital letter, A, B, C, ..., will stand for an event; the "probability of the event A" will be denoted by P(A).

Definition: The probability of the event A, P(A), is the ratio of the number of sample points that are examples of A to the total finite number of sample points in the sample space, assuming all sample points are equally likely.

$$P(A) = \frac{\text{Number of Examples of } A}{\text{Total Number of Sample Points}} \tag{9.1}$$

Let A be the event "a 3 face of a die," where the sample space is the set of the six faces of a die. How many sample points are examples of the event A? Obviously, only one. The total number of sample points is six. Hence, the probability of the event A (3) is:

$$P(A) = \frac{1}{6}$$

In terms of elementary events, the probability of event A, P(A), is the ratio of the number, r, of sample points that are examples of A to the total number of sample points, N, that is, r/N:

$$P(A) = \frac{1}{N} + \frac{1}{N} + \cdots + \frac{1}{N} = \frac{r}{N} \tag{9.2}$$

Both routes bring us to the same definition for P(A). While the second definition might be the preference of the mathematician, the first definition of P(A) is more intuitive.

Combining Probabilities

Suppose an urn contains five red, three white, and two black balls. Three events might be of interest: (1) A, a ball is red, (2) B, a ball is white, or (3) C, a ball is black. These three events are mutually exclusive; each sample point is an example of one and only one event. The question arises, "What is the probability that a ball is red or white?" This event, the union of A and B, shall be denoted by the symbol A ∪ B and its probability by P(A ∪ B).

9.3 FIRST ADDITION RULE OF PROBABILITIES

When the events A and B are mutually exclusive, the probability of either A or B, P(A ∪ B), is the sum of their separate probabilities (Equation 9.3).

Returning to the die-tossing example: if B is the event "an even numeral," find P(B). B can be restated as "a 2 or 4 or 6."
Since B consists of three points in the sample space of six points, P(B) = 3/6 = 1/2. What is P(C) if C is the event 7? P(C) is 0/6 = 0, because 7 is not in the sample space of this problem. If D is the event "an even or odd numeral," what is P(D)? The answer is 6/6 = 1.

Suppose there is an urn that has four white balls in it and a finite, but unspecified, number of black balls. The probability of an event cannot be determined. A probability statement can be made only when the sample space is defined completely.

The definition of the probability of an event can be expressed using an alternative approach. Consider a sample space composed of a specified number of sample points. Denote each of the sample points by $a_i$: $a_1, a_2, \ldots, a_N$. Every event that is defined within the sample space is composed of a related set of sample points.

$$P(A \cup B) = P(A) + P(B) \tag{9.3}$$

From the urn example (five red, three white, and two black balls), we have:

$$P(A \cup B) = P(A) + P(B) = \frac{5}{10} + \frac{3}{10} = \frac{8}{10} \text{ or } .8$$

¹The notion of a sample space is actually a relatively recent development in probability theory, dating back only to the 1920s.
²The symbol conventionally used in probability theory for this purpose is r, not to be confused with the correlation coefficient.

Find P(A ∪ C). Since events A and C are mutually exclusive: P(A ∪ C) = P(A) + P(C) = .5 + .2 = .7. Similarly, the value of P(B ∪ C) = P(B) + P(C) = .3 + .2 = .5.

Non-mutually Exclusive Events

In some sample spaces, two events are not mutually exclusive: a single sample point may be an example of both events A and B. A playing card can be both an ace and a diamond. Consider the possible outcomes (heads or tails) of flipping a fair coin three times in a row, or three fair coins once. The eight possible outcomes make up the sample space:

1. HHH   2. HHT   3. HTH   4. HTT   5. THH   6. THT   7. TTH   8. TTT

Each of the eight outcomes is equally likely, that is, each has probability 1/8. What is the probability of heads on the first flip?
The answer is 4/8 or 1/2. What is the probability of heads on flips 1 and 2? The answer is 2/8 or 1/4. Now define two events, A and B, using the sample space just defined:

A: Heads on flips 1 and 2
B: Heads on flips 2 and 3

The sample points that are examples of event A are the first two outcomes (HHH and HHT) in the sample space. The first and fifth outcomes (HHH and THH) are the sample points corresponding to event B. The symbol A ∩ B shall denote the new event, the intersection of A and B. (Note that the symbols ∪ and ∩ are analogous to the words or and and.) In the example, A ∩ B is the event "heads on flips 1 and 2 and heads on flips 2 and 3." Since all the sample points are equally likely, the probability of the event A ∩ B is the ratio given in Equation 9.4.

FIGURE 9.1 Venn diagram of the intersecting events A and B in the sample space S.

9.4 SECOND ADDITION RULE OF PROBABILITIES

The probability of either event A, or event B, or both is expressed as:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

The Venn diagram in Figure 9.1 is a graphic portrayal of this situation, and should help clarify the meaning of the term P(A ∩ B). The events A and B are not mutually exclusive, that is, they have some sample points in common in the sample space S. The probability of event A is represented by the area of circle A; the probability of event B is represented by the area of circle B. The probability of A or B, or both, is that area of S that lies inside the boundary of A, of B, or of both. The shaded portion in Figure 9.1 is that set of sample points in both events A and B, that is, those points in the intersection A ∩ B. How does one find the entire area covered by A and B? First, find the area of A that is not shared by B.
Add to it the area of B not shared by A, and then add the area common to both, the intersection of A and B:

$$P(A \cup B) = [P(A) - P(A \cap B)] + [P(B) - P(A \cap B)] + P(A \cap B)$$

The first bracketed term following the equal sign, P(A) − P(A ∩ B), gives the area of A minus the area in common with B. P(B) − P(A ∩ B) gives the area of B minus its area in common with A. The desired area is found by adding in the area common to A and B, P(A ∩ B). The previous equation simplifies to

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{9.5}$$

Equation 9.5 establishes the second addition rule of probabilities. Notice that if one had simply added P(A) and P(B) to find P(A ∪ B), the portion in common to A and B, P(A ∩ B), would contribute twice to the sum, because A and B are not mutually exclusive areas. The intersection must be included only once; consequently it must be subtracted once, as shown in Equation 9.5.

$$P(A \cap B) = \frac{\text{Number of Sample Points that Are Examples of } A \cap B}{\text{Total Number of Sample Points}} \tag{9.4}$$

The total number of sample points is 8. Only one sample point, HHH, is an example of the event A, "heads on flips 1 and 2," and also an example of event B, "heads on flips 2 and 3." So the probability of the event A ∩ B is 1/8.

Look back at the first addition rule of probabilities and notice the condition that the two events A and B are mutually exclusive. In the example just discussed, A and B were not mutually exclusive. The outcome HHH was an example of both events A and B. But what is the probability of A or B, P(A ∪ B), when A and B are not mutually exclusive? The "Second Addition Rule of Probabilities" is needed to answer this question.

FIGURE 9.2 Venn diagram of the mutually exclusive events A and B in the sample space S.

Does the formal probability of an event correspond closely to its observed relative frequency? The answer to this question is the key to the relationship between probability theory and its application. The answer is yes when the underlying assumptions are met. Suppose an event A either does,
or does not, occur on every trial of an act. The probability that A will occur, P(A), is the same for all trials of the act. For example, the act may be flipping a fair coin, A may be the event "heads," and it is assumed that the probability of heads (1/2) is the same from one flip to the next. It is also assumed that every trial is independent of, that is, in no way affected by, every other trial. Now after N trials of the act, the proportion of times A has occurred is p. It can be proved that p gets closer and closer to P(A) as N becomes larger and larger. The proportion of times A occurs can be made closer and closer to P(A) (the probability calculated from the sample space) by performing the act an increasing number of times. So P(A) tells what will happen in the long run if the actions are actually performed under the conditions laid down previously.

The preceding paragraph is a statement of the law of large numbers. The law of large numbers is important for the application of probability, of which statistical inference is one example. The law of large numbers is closely related to the statistical property of consistency, encountered previously in Chapter 5.

⁴Actually the probability is slightly less than .49, but we will keep the numbers simple and use .5.

9.5 MULTIPLICATION RULE OF PROBABILITIES

Multiplication Rule of Probabilities: The probability that A, which has probability P(A), will occur r times in r independent trials is:

$$P(A) \cdot P(A) \cdots P(A) = P(A)^r \tag{9.6}$$

For five consecutive girls, with P(G) = 1/2, Equation 9.6 gives:

$$P(G)^5 = \left(\frac{1}{2}\right)^5 = \frac{1}{32} = .03125 \approx .03$$

There exists a multiplication rule for probabilities that is important for statistical inference. Suppose a coin is flipped five times in a row. Assume that the probability of heads is 1/2 on each flip and that the flips are independent (the outcomes are uncorrelated and have no influence on one another).
The multiplicative rule for probabilities states that the probability of getting five straight heads is (1/2)(1/2)(1/2)(1/2)(1/2) = 1/32. A general statement of the rule is given in Equation 9.6.

Illustrations

Assume that the probability that your next child will be a girl is⁴ 1/2 or .5 = P(G). Since the sex of one child has no effect on the sex of any subsequent child, the events are independent. Thus, the probability of five consecutive girls (or boys) from Equation 9.6 is 1/32, or about .03. In other words, of every 32 families that have five children, one (or about three of 100) would be expected to have all girls. Suppose a sixth child is now expected; what are the chances that it is a girl? The chances are 1 in 2; past events that are independent have no influence on future events. To believe otherwise is to believe in the "Gambler's Fallacy" (Section 9.12), and to be in jeopardy of wasting a great deal of your money.

Notice that the first addition rule of probabilities (Equation 9.3) is just a special case of the second rule (Equation 9.5), namely the case when P(A ∩ B) = 0. If A and B are mutually exclusive events in S, then they do not overlap. See Figure 9.2, where there is no area in common to A and B; hence P(A ∩ B) = 0. More generally, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Therefore, if A and B are mutually exclusive:

$$P(A \cup B) = P(A) + P(B) - 0 = P(A) + P(B)$$

In much the same manner as the mathematical systems of geometry and algebra, probability theory can be developed from a small set of axioms and definitions. Also in the same manner, probability theory can serve as a model for what is going on in a certain class of events in the world around us. James Bernoulli (1654-1705) was the first to relate probability statements to physical events.³ An example of the application of a formal probability statement to an actual set of actions will illustrate the relationship between theory and application.
Suppose an urn contains four white and six black balls. The balls are identical in size, shape, and weight and thoroughly mixed so that if one were to reach in and pull one out, it is equally likely that any one of the ten balls would be selected; each ball has one chance in ten of being chosen. A ball is taken out and its color is recorded. The ball is returned to the urn, the balls in the urn are stirred thoroughly, and the act is repeated under the same conditions. This act is performed many times, say, 10,000. After the 10,000th drawing of a ball, suppose a count is taken of the number of times a white ball was drawn. Intuition says that the ratio of the number of times a white ball is drawn to 10,000 will be very close to 4/10 = .4. It is unlikely that the ratio is exactly 4/10, but it will be very close. If the ten balls are regarded as a sample space, and if A is the event "a ball is white," then P(A) is exactly .4. The question arises, "Will the formal probability of an event as calculated from theory correspond closely to the relative frequency of the occurrence of the event?"

³The "Bernoulli distribution" is another name for the binomial distribution that you may have long forgotten from high school algebra. This distribution has only two classes of independent events in the sample space, for example, heads or tails, hit or miss, event A or not-event A.

TABLE 9.1 Sample Space of Outcomes of Tossing a Pair of Dice (One Red, One Green)

R,G    R,G    R,G    R,G    R,G    R,G
1,1    2,1    3,1    4,1    5,1    6,1
1,2    2,2    3,2    4,2    5,2    6,2
1,3    2,3    3,3    4,3    5,3    6,3
1,4    2,4    3,4    4,4    5,4    6,4
1,5    2,5    3,5    4,5    5,5    6,5
1,6    2,6    3,6    4,6    5,6    6,6

In Figure 9.1, note that the conditional probability P(B|A) (read "the probability of B given A") is represented by the ratio of the area A ∩ B to the area A. In the example in Section 9.5: P(A) = 1/6 and P(A ∩ B) = 1/36; thus, using Equation 9.7, the probability of B given A is 1/36 ÷ 1/6 = 1/6. Given A (1 on the red die),
the probability of B (1 on the green die) is 1/6.

Suppose that the sample space is American adults and B is "a woman" and A is "a college graduate." Given that a college graduate is selected (|A), what is the probability that it is a woman, that is, what is P(B|A)? Currently, the probability of selecting a college graduate is approximately .26 = P(A), and the probability of selecting a woman college graduate is .115 = P(A ∩ B). Given A, that a college graduate is selected, the probability of B, a woman, is

$$P(B \mid A) = \frac{.115}{.26} = .44$$

Notice that P(B|A) is not equal to P(A|B). In this example, given B, a woman, P(B) = .5, the probability of A, a college graduate, is:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{.115}{.5} = .23$$

9.7 BAYES'S THEOREM

Equation 9.7 is the simplest version of Bayes's theorem,⁶ a theorem that describes the relationship among various conditional probabilities. Equation 9.7 may be expressed alternatively as:

$$P(B \mid A) = \frac{P(A \mid B)P(B)}{P(A \mid B)P(B) + P(A \mid \bar{B})P(\bar{B})} \tag{9.8}$$

Suppose a toddler is attracted to your computer and pecks keys randomly on the keyboard. What is the probability that the first six characters are M-O-T-H-E-R? For simplicity, assume that there are 100 keys on your keyboard; the probability of a correct peck is, therefore, .01. Since there are six independent events, the probability that the first six random pecks will result in "mother" is $(.01)^6 = .000000000001$, or only one chance in a trillion⁵ (1/1,000,000,000,000)!

The following examples illustrate the probability rules developed so far. Consider a roll of a pair of dice; one die is red and the other is green. The sample space of possible outcomes has 36 points, as shown in Table 9.1. Suppose event A is "a 1 on the red die" and event B is "a 1 on the green die." P(A ∩ B) is found by dividing the number of sample points that are examples of A ∩ B (both A and B) by the total number of sample points (36). Verify that the probability that both A and B occur is equal to P(A) · P(B), (1/6)(1/6) = 1/36.
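The dice example lends itself to a brute-force check. The sketch below, which assumes the 36 red-green outcomes of Table 9.1 are equally likely, enumerates the sample space and counts sample points; the names `sample_space`, `probability`, `red_is_one`, and `green_is_one` are illustrative choices rather than anything defined in the text.

```python
from fractions import Fraction
from itertools import product

# Sample space of Table 9.1: every (red, green) outcome for a pair of dice.
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """Share of the 36 equally likely sample points that are examples of the event."""
    hits = sum(1 for point in sample_space if event(point))
    return Fraction(hits, len(sample_space))


def red_is_one(point):    # event A: a 1 on the red die
    return point[0] == 1


def green_is_one(point):  # event B: a 1 on the green die
    return point[1] == 1


p_a = probability(red_is_one)
p_b = probability(green_is_one)
p_a_and_b = probability(lambda pt: red_is_one(pt) and green_is_one(pt))
p_a_or_b = probability(lambda pt: red_is_one(pt) or green_is_one(pt))
p_b_given_a = p_a_and_b / p_a  # conditional probability, Equation 9.7

print(p_a, p_b, p_a_and_b)       # 1/6 1/6 1/36
print(p_a_and_b == p_a * p_b)    # True: A and B are independent
print(p_a_or_b == p_a + p_b - p_a_and_b)  # True: second addition rule
print(p_b_given_a)               # 1/6
```

Counting sample points directly, instead of applying the rules, is what makes this a check: the rules are only confirmed afterwards by the equality comparisons.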
Find P(A ∪ B), the probability of event A or event B, remembering that P(A ∪ B) = P(A) + P(B) − P(A ∩ B). P(A ∩ B) is 1/36 because only the point "1,1" is common to A and B; thus, P(A ∪ B) = 6/36 + 6/36 − 1/36 = 11/36. Two events are independent if and only if P(A ∩ B) = P(A) · P(B). Independence is an important concept in statistics and probability, and statistical independence is a much used concept in subsequent chapters.

9.6 CONDITIONAL PROBABILITY

If P(A) and P(A ∩ B) are known, the conditional probability of B given A, denoted as P(B|A), is:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \tag{9.7}$$

In Equation 9.8, $\bar{B}$ is read "not B." Let event B be "an automobile accident during the next year" and event A be "a course in driver education." What is the probability of event B (an accident in the next year), given A (driver education), that is, what is P(B|A)? Assume that P(B) = .1; thus $P(\bar{B}) = 1 - P(B) = .9$. Also, assume that P(A|B), the probability of having driver education given that a person has had an accident, is .50; and that $P(A \mid \bar{B})$, the probability of having driver education given that a person has not had an accident, is .7. Equation 9.8 gives the probability of an accident (B), given driver education (A), as .0735.

⁵In scientific notation this is $1.00 \times 10^{-12}$. A 17-key sequence, such as I- -L-O-V-E- -S-T-A-T-I-S-T-I-C-S, has a probability of only $(.01)^{17}$, or 1 in $10^{34}$; if one entered a random 17-key sequence once every second, this particular sequence would be expected only once every 317,000,000,000,000,000,000,000,000 (or $3.17 \times 10^{26}$) years! Scientists have estimated the age of the earth at about 5 billion ($5 \times 10^{9}$) years; thus, the probability that this 17-key sequence would have occurred even once since the beginning of the earth is about $(5 \times 10^{9})/(3.17 \times 10^{26}) = 1.58 \times 10^{-17}$ (on the order of 1 chance in 100,000,000,000,000,000)! If one billion individuals were each striking 17 keys randomly each second, the probability remains infinitesimal (on the order of 1 in 100,000,000).
Contrast the probability of the simple I- -L-O-V-E- -S-T-A-T-I-S-T-I-C-S to the incredibly greater complexity of a DNA molecule!

$$P(B \mid A) = \frac{(.5)(.1)}{(.5)(.1) + (.7)(.9)} = .0735$$

Thus, the probability of an accident has been reduced by 26.5%, from .10 to .0735, given that the person had driver education.

⁶The theorem is named for its originator, the English clergyman and mathematician Thomas Bayes (1702-1761), who first used probability inductively and established a mathematical basis for probabilistic inference.

TABLE 9.2 Permutations of Three Letters: A, B, and C

1st letter    2nd letter    3rd letter    Permutation
A             B             C             1. ABC
A             C             B             2. ACB
B             A             C             3. BAC
B             C             A             4. BCA
C             A             B             5. CAB
C             B             A             6. CBA

Suppose you set out to form every possible permutation, or arrangement, of a dozen eggs in the egg carton (12! = 479,001,600 arrangements). Assume you can make a new arrangement in ten seconds, which you do continuously during your eight-hour working day. If you keep at this job five days a week, fifty-two weeks a year, you would require approximately 640 years to make every possible arrangement! If you and each successive generation of your descendants donate 50 years to this task, it would be finished by your great, great, great, great, great, great, great, great, great, great, great, great grandchild. We suggest you take our word for it.

On the Wechsler intelligence tests, an examinee must put five cartoon pictures in the correct chronological order. What is the probability that an examinee will arrive at the correct order by chance? Since 5! = 120, the probability of a "lucky guess" is only 1/120.

9.9 COMBINATIONS

How many permutations are there for N = 4 objects, taking r = 3 at a time? Using Equation 9.9, the answer is found to be (4)(3)(2) = 24 permutations. There are r terms in this product, corresponding to the r objects selected. However, for each unique combination of r objects, there are r! permutations; in the example, r = 3 and r! = 3! = 6.
Hence, the number of combinations of r objects selected from N objects, ignoring the order among the r objects, is the number of permutations of the r objects selected from N objects (Equation 9.9) divided by the number of permutations within a combination (Equation 9.10).

The concept of combinations arises when one is selecting some number of objects r from a set of N objects. A combination of objects is a distinct set of objects in which order is not considered. When r = N, that is, N objects are selected from N objects, all the objects are selected and there is only one combination (although there are N! permutations). If r = 1, one object is selected from N objects, and there are N combinations. The problem is to find a general expression for the number of combinations that exist when r things are selected from N things.

Consider four objects, A, B, C, and D. How many different combinations can be made by selecting two letters at a time from these four? The answer is six: AB, AC, AD, BC, BD, CD. Notice that for combinations, order is not considered: AB is one combination, and BA is the same combination. (See the first two columns of Table 9.2, where the six permutations form three combinations.)

Suppose r objects are being selected from N objects. How many different combinations are there? For the time being, regard order as important, and then later combine all the sets that are different only because of order. If r objects are being selected from N, then there are N choices for the first object, (N − 1) choices for the second, (N − 2) for the third, (N − 3) for the fourth, and so on until there are (N − r + 1) choices for the rth object. So, the total number of different permutations of r objects from N objects, where order is considered, is the product given in Equation 9.9.

Bayes's theorem provides exact results, provided the prior probabilities, the probabilities entered into Equation 9.8, are accurate.
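How strongly the result of Equation 9.8 depends on the prior can be seen numerically. The sketch below uses the driver-education figures from the example (P(B) = .1, P(A|B) = .5, P(A|not-B) = .7); the function name `posterior` and the alternative priors .05 and .20 are assumptions added purely for illustration.

```python
def posterior(prior_b, p_a_given_b, p_a_given_not_b):
    """P(B|A) from Equation 9.8, given the prior P(B) and the two conditional probabilities of A."""
    numerator = p_a_given_b * prior_b
    denominator = numerator + p_a_given_not_b * (1 - prior_b)
    return numerator / denominator


# Driver-education example from the text.
p_accident_given_course = posterior(0.1, 0.5, 0.7)
print(round(p_accident_given_course, 4))  # 0.0735

# Hypothetical alternative priors: the answer moves with the prior.
for prior in (0.05, 0.10, 0.20):
    print(prior, round(posterior(prior, 0.5, 0.7), 4))
```

Holding the two conditional probabilities fixed and varying only the prior shows that the arithmetic of the theorem is exact, while the conclusion is only as good as the prior that is supplied.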
Obviously, the practical difficulty in the application of the theorem lies in knowing the prior probabilities. These probabilities have often been viewed as degrees of belief, or personal probabilities. The topic has occasioned much controversy between statisticians who favor a strict relative-frequency interpretation of probability and those who would allow for a more subjective interpretation. Discriminant analysis (Section 8.30) makes use of prior probabilities and Bayes's theorem in determining probabilities that a given case falls in a given category, given a prior probability of being in that category.

9.8 PERMUTATIONS

Two additional concepts that crop up repeatedly in probability illustrations are permutations and combinations (Section 9.9). A permutation of a set of objects (the letters A, B, and C, for example) is an arrangement of them in which order is considered. A different ordering of the objects is a different permutation. How many different permutations or orderings are there of the letters A, B, and C? To find out, one can set about the task of writing them down and counting them, as shown in Table 9.2. The first letter can be either A, B, or C. Suppose it is A, the top third of Table 9.2. If the first letter is A, the second letter can be either B or C. If the second letter is B, then the third letter must be C. So ABC is one possible permutation. There are three possible letters for the first position. After one letter is assigned to the first position, there are two possible letters for the second position, which leaves a single letter for the third. Hence, the number of possible permutations of the three letters A, B, and C is (3)(2)(1) = 6.

If there are N distinct objects, one can make N(N − 1)(N − 2) ... (2)(1) different permutations of them. This product can be denoted simply by N! (read "N factorial"). N! is the product of the numbers from 1 through N and equals the number of permutations of N distinct objects.⁷ (0! is defined mathematically to equal 1.) The value of N!
increases dramatically as N increases. Would you work a year for 12! pennies? (12! pennies = 479,001,600¢ = $4,790,016.) To illustrate the incredible size of 12!, imagine that you have one dozen eggs in a carton and that you want to form every possible arrangement of them.

⁷Many calculators have a factorial (!) key that provides almost instant answers to N! questions.

$$N(N-1)(N-2) \cdots (N-r+1) \tag{9.9}$$

$$\binom{N}{r} = \frac{N(N-1)(N-2) \cdots (N-r+1)}{r!} \tag{9.10}$$

The expression to the left of the equal sign in Equation 9.10 is read "the number of combinations of r things taken from N things." It can be shown⁸ that the number of permutations of r objects taken from N objects is:

$$N(N-1)(N-2) \cdots (N-r+1) = \frac{N!}{(N-r)!} \tag{9.11}$$

When this substitution is made in Equation 9.10, the number of combinations of N things taken r at a time is given by:

$$\binom{N}{r} = \frac{N!}{r!(N-r)!} \tag{9.12}$$

Here p is the probability of event A (success), and q = 1 − p denotes the probability of event B (failure).⁹ The probability of a particular permutation, for example, four A's (successes) followed by a B (failure), then another A (success), (A, A, A, A, B, A), is $p \cdot p \cdot p \cdot p \cdot q \cdot p = p^5q$. Notice that the result, $p^5q$, is the same no matter where in the sequence the outcome B falls. Thus, the probability of any sequence of N independent Bernoulli events depends only on the probability of event A on any trial (p), and the number of A's in the sequence (r). In other words, the probability of a given sequence of r successes in N events, where p is the probability of outcome A on any of the N independent trials and q = 1 − p is the probability of outcome B (a non-A event on any trial), is:

$$p^r q^{N-r} \tag{9.13}$$

When tossing a pair of fair dice, what is the probability of (7, 7, 7, not-seven, 7) in five tosses? In Table 9.1, we see that in the sample space, the proportion of 7's is 6/36 = 1/6 = p. The probability of a non-seven then is q = 1 − p = 1 − 1/6 = 5/6.
Thus, from Equation 9.13, the probability of this sequence is:

$$p^4 q^{5-4} = p^4 q = \left(\frac{1}{6}\right)^4 \left(\frac{5}{6}\right) = \frac{5}{7{,}776} = .000643$$

or roughly 6 chances in 10,000. Rarely would our interest be in a particular sequence or permutation; we would usually be more interested in the probability of obtaining four 7's in five tosses, for example. Note that there are five different permutations, each with the same probability (.000643), that would result in the same combination; consequently, the probability of one of the five occurring is 5(.000643) = .003215, or about 3 chances in 1,000. More generally, from Equation 9.12, note that there are N!/[r!(N − r)!] sequences that result in r successes in N trials. If N = 5 and r = 4, there are five sequences that result in this combination:

$$\frac{N!}{r!(N-r)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(4 \cdot 3 \cdot 2 \cdot 1)(1)} = 5$$

Examples

How many different combinations are there of r = 3 things taken from N = 5 things? Note from Equation 9.9 that there are (5)(4)(3) permutations of five things taken three at a time. Each combination has r! = 3! = 6 permutations; thus, the sixty permutations represent 60/6 = 10 combinations. Each of the ten combinations of five things taken three at a time has (3)(2)(1) = 6 permutations. Equation 9.12 gives the same answer:

$$\binom{5}{3} = \frac{5!}{3!(5-3)!} = \frac{5!}{3!\,2!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(2 \cdot 1)} = \frac{5 \cdot 4}{2 \cdot 1} = \frac{20}{2} = 10$$

Ten persons are eligible to serve on a committee. The committee must be composed of only five persons. How many different five-member (r = 5) committees could be formed from the ten available persons (N = 10)? Equation 9.12 gives 252 committees.

When we have a Bernoulli-type (two classes of independent events) sample space and the probability of each event class is known, we can find the probability of any outcome. Let A (success) denote one class of events; let B (failure) denote the other class. Further, let p represent the probability of event A (i.e., the proportion of the sample space occupied by A events), and let q = 1 − p represent the probability of event B.

Generalizing the above rationale, we come to the following conclusion.
Where the probability of the result A is p and the probability of a non-A result is 1 − p = q, the probability of observing result A in r of N independent occasions is:

$$\frac{N!}{r!(N-r)!}\left(p^r q^{N-r}\right) \tag{9.14}$$

Keep in mind that $p^r q^{N-r}$ is the probability of any one of the N!/[r!(N − r)!] sequences that result in r successes in N trials.

For the committee example (N = 10, r = 5), Equation 9.12 gives:

$$\binom{10}{5} = \frac{10!}{5!(10-5)!} = \frac{10!}{5!\,5!} = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = \frac{30{,}240}{120} = 252$$

⁸Note that N! = N(N − 1)(N − 2) ... [(N − r) + 1](N − r)[(N − r) − 1] ... (1). Write out N! in the numerator and (N − r)! in the denominator, cancel the terms common to both numerator and denominator, and express (N − r)[(N − r) − 1] ... (1) as (N − r)! to obtain Equation 9.11.
⁹To be completely consistent in notation, since a parameter is denoted, π rather than p should be used. We shall compromise in this application, however, and use the conventional symbols p and q.

9.10 BINOMIAL PROBABILITIES

Suppose there is a ten-item multiple choice test, with each item containing four options. What is the probability of obtaining a score of 80% from random guessing? In this situation, N = 10, r = 8, and the probability of a success on any item is p = 1/4. The probability of guessing correctly on 8 (not 8 or more) of the 10 items is:

$$\frac{10!}{8!(10-8)!}\left(\frac{1}{4}\right)^8\left(\frac{3}{4}\right)^2 = 45\left(\frac{1}{65{,}536}\right)\left(\frac{9}{16}\right) = \frac{405}{1{,}048{,}576} = .000386$$

or less than 4 chances in 10,000. What is the probability of answering 8 or more of the 10 items correctly? We need to find the probability of answering 9 items correctly, and the probability of answering all 10 questions correctly, and sum the three probabilities.
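Equation 9.14 can be evaluated mechanically, which is convenient when several terms must be summed. The following sketch assumes Python 3.8 or later (for `math.comb`) and uses exact fractions to avoid rounding; `binomial_probability` is a hypothetical helper name, and the values N = 10 and p = 1/4 are those of the guessing example.

```python
from fractions import Fraction
from math import comb


def binomial_probability(r, n, p):
    """Probability of exactly r successes in n independent trials (Equation 9.14)."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


p = Fraction(1, 4)  # chance of guessing a four-option item correctly

exactly_8 = binomial_probability(8, 10, p)
at_least_8 = sum(binomial_probability(r, 10, p) for r in range(8, 11))

print(exactly_8, float(exactly_8))
print(at_least_8, float(at_least_8))
```

`Fraction` prints results in lowest terms, so a sum may appear with a smaller denominator than 1,048,576 even though it is the same value.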
The probability of 9 from Equation 9.14 is $(10!/9!)[(1/4)^9(3/4)]$ = 10(1/262,144)(3/4) = 30/1,048,576; the probability of answering all ten correctly is $(1/4)^{10}$ = 1/1,048,576. Thus the probability of answering 8 or more of the 10 four-option questions correctly merely by guessing is only (405 + 30 + 1)/1,048,576 = 436/1,048,576 = .0004158.[10]

The binomial expansion is an application of the binomial theorem you were probably introduced to in high school algebra. It is a general expression that gives the probabilities for any number of outcomes of an event A in N independent Bernoulli trials, where p is the probability of event A on any one trial. The number of A's is the exponent for p in any of the N + 1 terms of the expansion:

$$(p+q)^N = p^N + \frac{N!}{(N-1)!\,1!}\,p^{N-1}q + \frac{N!}{(N-2)!\,2!}\,p^{N-2}q^2 + \cdots + \frac{N!}{1!\,(N-1)!}\,p\,q^{N-1} + q^N \qquad (9.15)$$

The binomial expansion or distribution can be used to compare any dichotomous set of observations with a theoretical distribution to answer such questions as, "In the distribution of male offspring to mothers who carry the gene for hemophilia, what is expected if hemophilia is a Mendelian recessive trait?" "Are more babies male than female?" "Do husbands score higher on need-for-achievement measures than do their wives?" "In a mental telepathy experiment, is the number of 'hits' greater than chance?" "Can you toss more than 50% heads in a series of coin tosses?" "Can a naive examinee beat the odds and obtain a score that is greater than can be expected from chance alone?" "When faced with true-false questions, are examinees more likely to guess T than F?" "Is the proportion of dropouts who are male greater than the proportion who are female?"

THE BINOMIAL AND SIGN TEST

The sign test is a special "non-parametric"[11] application of the binomial distribution in which there are N paired observations, such as matched pairs in an experiment (E vs. C).
If the treatment has no effect, we should expect the E pair-member to surpass the C pair-member in one-half (p = .5) of the N comparisons. Suppose only 5 matched pairs are available. We randomly assign one pair-member to get the E treatment, the other pair-member serving as the control. On the posttest, we find that in 4 of the 5 comparisons E outperformed C. How probable is it that we would observe that E outperformed C in 4 or more of the comparisons by chance? Based on Equation 9.15, the probability of exactly 4 "hits" is $5(1/2)^4(1/2) = 5(1/2)^5$ = 5/32; the probability of 5 "hits" is $(1/2)^5$ = 1/32; thus, the probability of 4 or more "hits" in 5 comparisons is 5/32 + 1/32 = 6/32 = .1875, or roughly 1 chance in 5. Our expectation that E is effective lacks strong evidence. However, if we continue to add cases with the same success ratio, our case will become much stronger.[12]

[10] Or 4.16 × 10^-4 in scientific notation (often appearing as 4.16E-4 in computer output).

[11] "Non-parametric" is an unfortunate term applied to statistical methods that apply to data that represent nominal or ordinal scales of measurement, or to a data set where normality and other assumptions are not made about parameters of the distribution of observations in the parent population (Conover, 1980, p. 92). "Distribution-free" is a synonym of "non-parametric."

In the sign test, and many other applications of the binomial test, when p = q = .5, the product of each of the N + 1 "pq" terms equals $p^N$, which greatly simplifies the computation. If a basketball player makes 50% of his free throws, and if he takes 12 free throws in a game, how likely is it that he will make 10 or more? Since $p^N = (1/2)^{12}$ = 1/4,096, the probability of making all 12 is 1/4,096; the probability of making 11 is 12/4,096; the probability of making 10 is [(12)(11)/2]/4,096 = 66/4,096.
Thus, the probability of making 10 or more of the 12 free throws is (1 + 12 + 66)/4,096 = 79/4,096 ≈ .02 (not very likely). Note, however, that missing 10 or more of the 12 has the same probability.[13] Table 0 in the Appendix gives the probability for observing r A's in N events when p = .5.

Suppose we want to investigate whether this generation is taller than the previous generation, as we have been led to believe. We ask the twenty-five members of the statistics class to compare their heights with those of their same-sex parents at the same age. If this generation is no taller than the previous generation, we would expect .5 = 50% of the students to be taller than their parents. We collect data and find that twenty of the twenty-five are taller than their same-sex parent. How probable is it that we would observe 20 A's out of N = 25 Bernoulli trials? In Table 0, find the row for N = 25 and the column for N - 5 = 25 - 5 = 20 A's (or for 5 B's). The intersection contains the value .00204, which is the probability that twenty or more A's would be observed in twenty-five independent events when p = .5. Since our result is so unlikely if this generation is no taller than the previous generation, we conclude that this generation is taller.[14]

INTUITION AND PROBABILITY

The study of probability can be interesting and entertaining. Historically, the concepts of probability evolved in connection with games of chance. Those who make use of probability theory are generally awed by the intricacy and excitement of the system and the way in which it produces results that are often quite in disagreement with intuition (unless one's intuition has been developed by experience with calculated probabilities). A few examples of surprising results will illustrate the untrustworthiness of intuition.
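Before turning to those examples, the sign-test probabilities worked out above can be checked with a short Python sketch. It assumes only that p = q = .5, so every sequence of N outcomes has probability $(1/2)^N$ and the tail probability is a count of sequences divided by $2^N$; the function name `upper_tail_half` is hypothetical, chosen for this illustration.

```python
from fractions import Fraction
from math import comb


def upper_tail_half(r, n):
    """P(r or more A's in n independent trials) when p = q = 1/2."""
    favorable = sum(comb(n, k) for k in range(r, n + 1))
    return Fraction(favorable, 2**n)


pairs = upper_tail_half(4, 5)          # E beats C in 4 or more of 5 pairs
free_throws = upper_tail_half(10, 12)  # 10 or more of 12 free throws
heights = upper_tail_half(20, 25)      # 20 or more of 25 students taller

print(float(pairs), float(free_throws), float(heights))
```

The three values correspond to the matched-pairs, free-throw, and height examples, in that order.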
First, the classic "birthday problem": What is the probability that at least two people in a group of twenty-three have the same birthday anniversary? Assume that the people are drawn randomly from a population of persons in which all 365 birthdays (not counting 29 February) are equally likely. One often obtains intuitive guesses that the probability is .10 or .001 or even smaller. Surprisingly, it is more likely than not: the probability that at least two people out of twenty-three have the same birthday is .507! Naturally the probability is even higher as the size of the group increases; it is practically certain that in a group of 150 persons at least two people will have the same birthday (Feller, 1957, pp. 31-32).

A certain TV show has three "windows." Behind one there is an expensive prize; there is nothing of real value behind the other two. The contestant keeps what is behind the window chosen. Suppose that, after the contestant selects a window, the host always raises one of the two other windows, behind which there is nothing of value, then gives the contestant the choice to remain with the initial choice or to change to the other unopened window. What is the wiser choice? You may feel that the odds started out even, and remain even, so that there is no advantage or disadvantage to remaining with the original choice. If your intuition is that your odds of winning are increased by changing, you are right. The chances of winning change from 1/3 to 2/3 when the contestant leaves the initial choice.

What is the probability that a student will obtain a score of 75 on a 100-item true-false test by guessing randomly on each question, given that the average[15] chance score, μ, is 50 and the standard deviation, σ, is 5? The score of 75 is z = 5 standard deviations above the chance mean. From Table A in the Appendix, the probability can be read as less than one chance in 1,000,000! What is the probability of a score above 60%? Since 60% is two standard deviations above the mean, only about one randomly guessing student out of fifty will obtain a score greater than 60.

The "gambler's fallacy" represents another example in which intuitive notions of probability often lead to erroneous conclusions. If the football captain has won the coin flip by calling heads for the first three games, should he change to tails on the next flip? If a craps shooter failed to throw a 7 on ten straight throws of a pair of dice, is he more likely to throw a 7 on the next throw than if he has thrown three 7's in a row? If the first four children in a family are boys, are the chances that the next child will be a girl different from what they would be if the four previous children had been two boys and two girls or all girls? If you think so, shame on you; you are guilty of the "gambler's fallacy."

If the probabilities of the event in question are independent, as they are in the examples above, the probability of a future event is unaffected by any pattern of past results. Whatever the number of heads prior to a given toss, the probability of heads on the next toss of a fair coin is .5. This is confirmed by the conditional probability equation (Equation 9.7). In the sample space of four tosses of a coin there are sixteen events or permutations: HHHH, HHHT, HHTH, HHTT, ..., TTTH, TTTT. If A is HHH for the first three tosses, then P(A) = 2/16 = 1/8, and if B is H for the fourth toss, then P(A ∩ B) = 1/16. The probability of a head (B) given three previous heads (A) is therefore:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{1/16}{1/8} = .5$$

[12] Can you confirm that the probability of 8 or more successes in 10 trials is (45 + 10 + 1)/1,024 = 56/1,024 = .055?

[13] This problem makes the dubious assumption that the probabilities are independent. If there is any validity to the notion that on a given occasion a player may get "hot," the model does not hold. Whether there are nonrandom sequences such as hitting streaks or slumps in baseball (or whether they are just unusual sequences that are expected infrequently) can be tested by the "runs" test (Dixon & Massey, 1983).

[14] In Chapter 11, you will see that we have just rejected the statistical hypothesis that the parameter π is .5.

[15] μ = k/a, where k is the number of items on the test and a is the number of response options per item. The standard deviation of the chance score is $\sigma = \sqrt{k\pi(1-\pi)}$, where π = 1/a (see Hopkins, Stanley, & Hopkins, 1990). For large k, the distribution will be approximately normal.

PROBABILITY AS AN AREA

The probabilities of observing values of continuous variables, for example, height, are conveniently represented by mathematical curves known as probability distributions. Suppose a continuous random variable X takes on values from 0 to 10. For example, X could be the time required for students to solve a certain puzzle. A student may solve it almost immediately, or she may take as long as ten minutes. Presumably, the length of time required to solve the problem is known for a large number of different subjects. Figure 9.3 represents a graph in which the "time to solution" is plotted against the "proportion of subjects requiring that time." The proportion of subjects requiring between two and four minutes to solve the puzzle can be regarded as the probability that a subject selected at random from the population will require that amount of time to solve the puzzle. The entire area under the curve in Figure 9.3 is set to 1, so the area under the curve between any two points X1 and X2 is the probability that a randomly selected subject will require between X1 and X2 minutes to solve the puzzle. The probability that a randomly selected subject will take more than 5.3 minutes is equal to the shaded area in Figure 9.3. What area corresponds to the probability that a randomly selected subject will take less than 1 minute? (The small area between 0 and 1.)[16] If the proportion of the area under the curve in Figure 9.3 between 6 and 10 were .17, then in a group of 100 randomly chosen subjects we would expect about seventeen persons to take between six and ten minutes to solve the puzzle.

[FIGURE 9.3 The Probability Density Function of the Variable X, Time Required to Solve a Puzzle. Horizontal axis: Time (min.), 0 to 10; vertical axis: P(x).]

The statistician frequently plots the values a continuous random variable can assume so that the area between any two values of the variable equals the probability that the variable will assume a value between those two values. The resulting graph is called a probability density function. The graph can often be expressed as a mathematical function so that the ordinate P(X) can be found by substituting any value of the random variable X. For example, assume X is a random variable that can take on any value between 0 and 2 with equal probability. If P(X) = 1/2 for all X, then the resulting graph (Figure 9.4) will be the probability density function of X. The area of the rectangle under the curve in Figure 9.4 is exactly 1 (i.e., 0.5 × 2.0). The lightly shaded area is the probability that X takes on a value between 0 and 1; the probability equals .5.

[FIGURE 9.4 Probability Density Function of the Variable X That Assumes All Possible Values between 0 and 2 with Equal Probability. Horizontal axis: X, 0 to 2; vertical axis: P(x).]

[16] Theoretically, the probability that a person will take exactly one minute (or any other precise value), for example, 1.000 ... minutes, is zero; in such situations, the height of the curve is the "probability density" of the value.

COMBINING PROBABILITIES

Suppose we perform a sign test to investigate our research hypothesis that husbands express a higher level of need-for-achievement than do their wives, and suppose the study was replicated by two other researchers. Suppose all three studies found a slight difference in favor of the husbands with probability levels of .09, .06, and .11. None of these studies separately would give convincing evidence of the research hypothesis, but does your intuition suggest that a combination of the three studies would lead to a different conclusion? There are several methods for addressing this situation, but the "most serviceable under the largest range of conditions is the method of adding z's" (Rosenthal, 1978).

The "method of adding z's," also known as the Stouffer method, is quite simple and direct: First, convert each of the p-values to its corresponding z-score from Table A. Second, sum the z's and divide by the square root of the number of probabilities being combined. Third, find the p-value of the obtained z-value from Table A. For our three p-values of .09, .06, and .11, the corresponding z-scores are 1.341, 1.555, and approximately[17] 1.225; their sum is 4.121, which, divided by the square root of three (1.732), is 2.379. Note from Table A that the proportion of the area above z = 2.38 in a normal distribution is .0087; thus, the probability of obtaining three independent p-values of .06, .09, and .11 is less than .01. Hence, taken as a composite set of information, the conclusion is warranted that husbands do indeed have a higher level of need-for-achievement than do wives. Note, however, that the p's must be independent: the procedure cannot be used within a single study having multiple measures.

[17] This is best done using a spreadsheet (e.g., using the NORMDIST function in EXCEL), although the results from Appendix Table A are quite acceptable.

EXPECTATIONS AND MOMENTS

Moments are characteristics of distributions defined in terms of expectations. The definition of the expectation of a random variable will be considered first.

Definition: If X is a discrete random variable that takes on the values X1, X2, ..., XN with probabilities p1, p2, ..., pN, then the expectation of X, denoted by E(X), is defined as:

$$E(X) = p_1X_1 + p_2X_2 + \cdots + p_NX_N = \sum_{j=1}^{N} p_jX_j \qquad (9.16)$$

where $p_1 + p_2 + \cdots + p_N = 1 = \sum_j p_j$. Another symbol denoting the expectation of X is μ, the Greek lowercase letter mu. E(X) = μ, the mean of the population of X's. The names "expectation" and "expected value" are synonymous. Some examples of expectations are as follows:

1. Suppose X is the random variable that has six possible values, 1, 2, ..., 6. The events of the sample space could be the six sides of a die. Assume that a probability of 1/6 is associated with each value of X. What is the value of E(X)? From Equation 9.16:

$$E(X) = \sum_{j=1}^{6} p_jX_j = \tfrac{1}{6}(1) + \tfrac{1}{6}(2) + \tfrac{1}{6}(3) + \tfrac{1}{6}(4) + \tfrac{1}{6}(5) + \tfrac{1}{6}(6) = \tfrac{1}{6}(1 + 2 + \cdots + 6) = \tfrac{21}{6} = 3.5$$

In this example, E(X) = μ = 21/6 = 3.5. In repeatedly rolling the die, one can expect to average 3.5 points.

2. A particular slot machine has payoffs of $0.00, $0.50, $1, $2, and $25. The probabilities associated with each of these occurrences are .80, .15, .04, .01, and .001, respectively. Define a random variable X that takes on the five values 0, 50, 100, 200, and 2500 cents with probabilities .80, .15, .04, .01, and .001. What is the value of E(X)?

μ = E(X) = .80(0) + .15(50) + .04(100) + .01(200) + .001(2500) = 0 + 7.5 + 4.0 + 2.0 + 2.5 = 16

If it costs 25 cents for each trial on this slot machine, would you like to play? If you feed $25 into the machine, how much can you expect to lose? ($9)

3. Let X be the random variable that corresponds to the number of heads in four flips of a fair coin. X can take on the values 0, 1, 2, 3, and 4. Find E(X). If you write out the sixteen equally probable events in the sample space, that is, HHHH, HHHT, ..., TTTT, you will find the probabilities associated with 0, 1, 2, 3, 4 are 1/16, 1/4, 3/8, 1/4, and 1/16, respectively. Thus:

$$E(X) = \sum_{j} p_jX_j = \tfrac{1}{16}(0) + \tfrac{1}{4}(1) + \tfrac{3}{8}(2) + \tfrac{1}{4}(3) + \tfrac{1}{16}(4) = 2$$

"In the long run," one can expect to average two heads in four random tosses of a fair coin.

If X is a continuous variable instead of a discrete one, then an algebraic function describes the form of its probability distribution. If X is continuous, one cannot assign a probability to a single value of X. Instead, statements about the probability that X lies in an interval are made. For these reasons, the definition given for E(X) in Equation 9.16 cannot be applied to a continuous random variable. Unfortunately for those without recourse to knowledge of integral calculus, it is difficult to define the expectation of a continuous variable.

Suppose X is a continuous random variable and the probability distribution of X looks like the one in Figure 9.5. There is an algebraic rule that gives the height of the curve in Figure 9.5 for every value of X. The area under the curve is 1 unit. The probability that X will assume a value between, for example, 2 and 3 is equal to the area under the curve between those two points.

[FIGURE 9.5 Probability Distribution of X. Horizontal axis: X; vertical axis: P(x).]

Definition: The expectation of the continuous random variable X is the sum of the products formed by multiplying each value that X can assume by the height of the probability function curve above that value of X.

Since X can take on infinitely many values with continuous random variables, you might wonder how you could multiply each of the separate values of X by the height of the curve at X to find its expectation. This is the problem that recourse to the integral calculus solves. If you are not familiar with calculus, take it on faith that it can be done, in a precise but somewhat indirect way, by "integration." The expectation of a continuous random variable X is denoted by E(X) or μ, as is the expectation of a discrete variable.

CHAPTER SUMMARY

The probability of event A can be viewed as the ratio of the number of points of A to the total number in the sample space. Often intuitive notions regarding probability are quite inaccurate. If events A and B are mutually exclusive, the probability of either A or B, P(A ∪ B), is P(A) + P(B). If events A and B are not mutually exclusive, P(A ∪ B) = P(A) + P(B) - P(A ∩ B), where the last term is the probability of both A and B. If event A has probability P(A), the probability that A will occur n times in n independent trials is $P(A)^n$.

The probability of event B, given event A, is the conditional probability, P(B|A), and equals P(A ∩ B)/P(A).

Each unique ordering or arrangement of n objects is a permutation. There are n! permutations of n objects. The set of objects irrespective of order is a combination. The number of combinations of n things taken r at a time is n!/[r!(n - r)!].

SUGGESTED COMPUTER ACTIVITY

From spreadsheet or statistical software, select a random sample of 25 numbers between 1 and 1,000.

MASTERY TEST

1. What is the probability of tossing a "6," P(6), with one toss of a die?
2. How many sample points are there in the sample space in question 1?
3. Are the sample points mutually exclusive?
4. What is the probability of not tossing a "6," P(not 6)?
5. What is the probability of tossing "6's" in two tosses of a die?
6. Given that one "6" has been tossed, what is the conditional probability that the second toss will also be a "6"?
7. On a five-option multiple choice test, what is the probability of selecting the correct answer from a random guess?
8. What is the probability of correctly guessing the right answer on all ten items of a five-option multiple choice test?
9. Although 651412 and 214165 are different permutations, they represent a single __.
10. What is the probability of correctly guessing ten of ten true-false questions?
11. How many permutations are there in the previous question?
(For example, T, T, F, T, T, F, F, F, T, T is one permutation.)
12. How many different doubles teams in tennis are possible with a class of twenty members?
13. The probability of throwing two consecutive "snake eyes" with the toss of a pair of dice is $(1/36)^2$ = 1/1,296. If you have just tossed "snake eyes," what is the probability the next toss will be "snake eyes"?
14. Which of these random variables are discrete and which are continuous: (a) number of students enrolled in a statistics class (b) running speed of ten-year-olds (c) result from the toss of a pair of dice (d) height of adult males
15. What is the expected number of girls in two-child families if the probabilities associated with 0, 1, and 2 girls are 1/4, 1/2, and 1/4, respectively?
16. Probability density pertains to (a) continuous random variable (b) discrete random variable
17. If 25% of the area in a probability distribution falls between 90 and 100, what is the probability that a case selected at random will fall between 90 and 100?
18. If event A influences the probability of event B, events A and B are not __.
19. What is the probability of drawing four aces without replacement from a deck of fifty-two cards? (Hint: probability of ace on first card is 4/52; on second, 3/51; etc.)

PROBLEMS AND EXERCISES

1. Let a pack of fifty-two playing cards be the sample space S of interest. Determine the probabilities of each of the following events:
(a) A is "a card is the ace of spades." Find P(A).
(b) B is "a card is an ace." Find P(B).
(c) C is the event that "a card is a spade." Find P(C).
(d) D is "a card is a diamond," and C is "a card is a spade." Find P(D ∩ C).
(e) C is "a card is a spade" and B is "a card is an ace." Find P(C ∪ B).
2. Suppose that in a certain locale 3% of the children of kindergarten age have severe perceptual problems and 6% of the children of the same age have emotional problems. Also, 1.5% of the same group of children have both perceptual and emotional problems. Children suffering from either problem or both must receive teaching apart from normal pupils. What is the probability that a child entering kindergarten will require special teaching, that is, will have either perceptual or emotional problems or both?
3. (a) Find 15!/13! (b) Find 6!/[3!(6 - 3)!] (c) What is the value of N if (N + 1)! is exactly ten times larger than N!?
4. An experimenter wishes to have subjects learn a list of paired associations in all possible orders of the six pairs. Each subject can learn the list only once. How many subjects would be required if a different subject is required for every possible ordering of the six pairs?
5. (a) How many combinations are there of six things taken four at a time; that is, find $\binom{6}{4}$. (b) Find $\binom{5}{1}$, $\binom{5}{2}$, $\binom{5}{3}$, $\binom{5}{4}$, and $\binom{5}{5}$.
6. The varsity basketball team has twelve members. How many possible "starting fives" (the five players who start the game) could the coach form from his team of twelve players?
7. (a) Verify that $\binom{4}{0} + \binom{4}{1} + \binom{4}{2} + \binom{4}{3} + \binom{4}{4}$ is equal to $2^4$. (In general, $\sum_{r=0}^{N}\binom{N}{r} = 2^N$.) (b) How many different combinations are there of six things, in groups of size 0 to 6 inclusively; that is, what is $\sum_{r=0}^{6}\binom{6}{r}$?
8. How many five-item tests can be formed by ten items split into two tests of five items each?
9. A student takes a ten-item true-false test but does not know the answers to five of the items. If he guesses randomly between true and false on each of the five items,
(a) What is the probability that he will earn a perfect score of 10?
(b) What is the probability that he will guess incorrectly on all five questions?
(c) What is the probability that his score will be 6, 7, 8, or 9? (Hint: Subtract probabilities for scores of 5 and 10 from 1.00.)
10. In a fictitious experiment, convicts volunteered for study of the causal relationship between smoking and lung cancer. The convicts were matched into five matched pairs so that both pair mates were of the same age. Within each pair of convicts, a coin was flipped to determine which convict would continue smoking two packs of cigarettes a day and which one would not be allowed cigarettes for the duration of the experiment. At the end of the ten-year experimental period, the five smokers had lung cancer; none of the nonsmokers had lung cancer. Suppose that at the outset of the experiment, a convict in each pair had undiagnosed lung cancer.
(a) What is the probability that the five initially cancerous convicts would be randomly assigned to be the five experimental smokers?
(b) If there had been ten pairs, rather than five, what is the probability that the cancerous convict in each of the ten pairs would have been assigned to the smoking group?
11. In the general population, Stanford-Binet IQ's are nearly normally distributed with a mean of 100 and a standard deviation of 16. By referring to Table A in the Appendix, determine the following probabilities:
(a) A randomly sampled person will have an IQ between 80 and 120.
(b) A randomly sampled person will have an IQ above 140.
(c) Three independently randomly sampled persons will all have IQ's above 92.
12. The variable X takes on the values 0, 1, 2, 3, and 4 with probabilities 0, 2/5, 1/5, 1/5, and 1/5, respectively. What is the value of E(X), the expected value of X?
13. The sample space for tossing a pair of dice is given in Table 9.1.
(a) Determine the probability for each value of X, for 2, 3, 4, ..., 12.
(b) What is the expected value of X, E(X)?
(c) What is the probability that X ≠ 7?
(d) What is the probability of "7" on three consecutive throws?
(e) Given three consecutive "7's," what is the probability of "7" on the next toss?
(f) What is the probability of "7's" on four of five tosses?
(g) What is the probability of "7's" on four or more of five tosses?
14. In an experiment comparing a new product with the standard product, the new product was preferred by 23 of the 30 subjects in the taste test. Is the new product better, or can this be explained by chance? Use Table 0 to learn the probability of 23 A's out of 30 Bernoulli events.

ANSWERS TO MASTERY TEST

1. 1/6
2. 6
3. yes
4. 5/6
5. $(1/6)^2$ = 1/36
6. P(B|A) = (1/36)/(1/6) = 1/6
7. 1/5
8. $(1/5)^{10}$ = .0000001
9. combination
10. $(1/2)^{10}$ = 1/1,024
11. 1,024
12. $\binom{20}{2} = \frac{20!}{2!(20-2)!} = 190$
13. 1/36
14. (a) and (c) are discrete; (b) and (d) are continuous.
15. E(X) = 1/4(0) + 1/2(1) + 1/4(2) = 1
16. (a)
17. .25
18. independent
19. (4/52)(3/51)(2/50)(1/49) = 24/6,497,400 = .0000037

ANSWERS TO PROBLEMS AND EXERCISES

1. (a) P(A) = 1/52 (b) P(B) = 4/52 = 1/13 (c) P(C) = 13/52 = 1/4 (d) P(D ∩ C) = 0/52 = 0 (e) P(C ∪ B) = 13/52 + 4/52 - 1/52 = 16/52 = 4/13
2. A: a child has perceptual problems. B: a child has emotional problems. A ∩ B: a child has both perceptual and emotional problems. P(A) = .03, P(B) = .06, P(A ∩ B) = .015. P(A ∪ B) = .03 + .06 - .015 = .075
3. (a) 210 (b) 20 (c) If N = 9, then (N + 1)! = 10! = 10(9!).
4. 6! = 720
5. (a) 15 (b) 5, 10, 10, 5, and 1
6. $\binom{12}{5}$ = 792
7. (a) 1 + 4 + 6 + 4 + 1 = 16; $2^4$ = 16 (b) $2^6$ = 64
8. $\binom{10}{5}$ = 252
9. (a) $P(A)^n = (1/2)^5$ = 1/32 or .03125 (b) 1/32 or .03125 (c) 1 - 1/32 - 1/32 = 30/32 = 15/16 or .9375
10. (a) $(1/2)^5$ = 1/32 (b) $(1/2)^{10}$ = 1/1,024
11. (a) .7888 (b) .0062 (c) $(.6915)^3$ = .3307
12. E(X) = 2 1/5, or 2.2
13. (a) 1/36, 1/18, 1/12, 1/9, 5/36, 1/6, 5/36, 1/9, 1/12, 1/18, 1/36 (b) E(X) = $\sum_j p_jX_j$ = 7 (c) 5/6 (d) $(1/6)^3$ = 1/216 (e) 1/6 (f) $\frac{N!}{r!(N-r)!}\,p^r q^{N-r}$, with $\frac{5!}{4!\,1!} = 5$; hence $5(1/6)^4(5/6)$ = 25/7,776 = .0032150 (g) $p^N = (1/6)^5$ = 1/7,776; thus, 26/7,776 = .0033436
14. The probability of observing 23 or more A's when N = 30 is only .00261, good news for the marketing department.
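Readers with access to software may wish to check some of these answers numerically instead of using the Appendix tables. The following Python sketch is one way to do so for Problems 11 and 14; it assumes an exactly normal IQ distribution with mean 100 and standard deviation 16, so its results can differ from the table-based answers in the fourth decimal place. The variable names are illustrative only.

```python
from math import comb
from statistics import NormalDist

# Problem 11: IQ treated as exactly normal with mean 100 and SD 16 (an assumption).
iq = NormalDist(mu=100, sigma=16)

p_between_80_120 = iq.cdf(120) - iq.cdf(80)   # part (a)
p_above_140 = 1 - iq.cdf(140)                 # part (b)
p_three_above_92 = (1 - iq.cdf(92)) ** 3      # part (c): independent draws multiply

# Problem 14: 23 or more preferences out of 30 when p = .5 (sign test).
p_23_or_more = sum(comb(30, k) for k in range(23, 31)) / 2**30

print(round(p_between_80_120, 4), round(p_above_140, 4), round(p_three_above_92, 4))
print(round(p_23_or_more, 5))
```

The small discrepancies from the printed answers reflect rounding in Table A, not a different method.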
STATISTICAL INFERENCE: SAMPLING AND INTERVAL ESTIMATION

OVERVIEW

In the preceding chapters, statistical inference has been of incidental concern, only a minor theme. Beginning with this chapter, we will focus on estimating parameters using inferential statistical methods. One of the primary purposes of statistical methods is to allow generalizations about populations using data from samples. This chapter introduces ideas that are of fundamental importance in all succeeding chapters.

Nearly all public opinion polls and surveys, such as the Gallup and Harris polls, involve selecting a sample, obtaining data on that sample, and then making inferences about the entire population. Rarely are all members of the population observed; usually only a small fraction of the elements in the population is sampled. The Nielsen ratings of the popularity of television programs are based on the viewing habits of a sample of less than one home in 10,000 (.01%) in the population. The computerized projections of winners in political elections are nothing more than sophisticated applications of the concepts of this chapter. Before considering the theory underlying statistical inference, some fundamental definitions and concepts must be reviewed.

POPULATIONS AND SAMPLES: PARAMETERS AND STATISTICS

The principal use of statistical inference in empirical research is to obtain knowledge about a large class of persons, or other statistical units, from a relatively small number of persons. Inferential statistical methods employ inductive reasoning: reasoning from the particular to the general and from the observed to the unobserved. Inferential statistical reasoning addresses such questions as: "What can I say about the age at which the average child in the United States (the population) first utters a sentence, given that the average was 202 weeks for a sample of twenty-five children?"
Any exhaustive (finite or infinite) set or collection of things (units) that we wish to study, or about which we wish to make inferences, is called a