commodate approximately 98% of the samples encountered in a clinical laboratory. Samples outside this range can usually be handled by suitable dilutions. Extensive performance testing of these devices generally reveals a good correlation between the data they produce and the results by standard procedures. Precision of 1 to 10% relative is reported depending upon the type of test, which again is comparable with the data from automated standard methods.

28E QUESTIONS AND PROBLEMS

28-1 List sequentially a set of laboratory unit operations that might be used to
(a) ascertain the presence or absence of lead in flakes of dry paint.
(b) determine the iron content of multiple vitamin/mineral tablets.
28-2 Sketch a flow-injection system that could be used for the determination of K+ and Na+ in blood based upon flame photometric measurements.
28-3 Sketch a flow-injection system that might be employed for determining lead in the aqueous effluent from an industrial plant based upon the extraction of lead ions with a carbon tetrachloride solution of dithizone, which reacts with lead ion to form an intensely colored product.
28-4 Sketch a flow-injection apparatus for the determination of sodium sulfite in aqueous samples.

Appendix 1 Evaluation of Analytical Data

This appendix describes the types of errors that are encountered in analytical chemistry and how their magnitudes are estimated and reported. Estimation of the probable accuracy of results is a vital part of any analysis because data of unknown reliability are essentially worthless.

a1A PRECISION AND ACCURACY

Two terms are widely used in discussions of the reliability of data: precision and accuracy.

a1A-1 Precision

Precision describes the reproducibility of results, that is, the agreement between numerical values for two or more replicate measurements, or measurements that have been made in exactly the same way. 
Generally, the precision of an analytical method is readily obtained by simply repeating the measurement. Three terms are widely used to describe the precision of a set of replicate data: standard deviation, variance, and coefficient of variation. These terms have statistical significance and are defined in Section a1B-1.

a1A-2 Accuracy

Accuracy describes the correctness of an experimental result. Strictly speaking, the only type of measurement that can be completely accurate is one that involves counting objects. All other measurements contain errors and give only an approximation of the truth.

Accuracy is a relative term in the sense that what is an accurate or inaccurate method very much depends upon the needs of the scientist and the difficulty of the analytical problem. For example, an analytical method that yields results that are within ±10%, or 1 part per billion, of the correct amount of mercury in a sample of fish tissue that contains 10 parts per billion of the metal would usually be considered reasonably accurate. In contrast, a procedure that yields results that are within ±10% of the correct amount of mercury in an ore that contains 20% of the metal would usually be deemed unacceptably inaccurate.

Accuracy is expressed in terms of either absolute error or relative error. The absolute error $E_a$ of the mean (or average) $\bar{x}$ of a small set of replicate analyses is given by the relationship

$E_a = \bar{x} - x_t$    (a1-1)

where $x_t$ is an accepted value of the quantity being measured. 
Often, it is useful to express the accuracy in terms of relative error, where

$\text{relative error} = \dfrac{\bar{x} - x_t}{x_t} \times 100\%$    (a1-2)

Frequently, the relative error is expressed as a percentage as shown; in other cases the quotient is multiplied by 1000 to give the error in parts per thousand (ppt). Note that both absolute and relative errors bear a sign, a positive sign indicating that the measured result is greater than its true value and a negative sign the reverse.

We will be concerned with two types of errors: random, or indeterminate, errors and systematic, or determinate, errors.1 The error in the mean of a set of replicate measurements is then the sum of these two types of errors:

$E_a = E_r + E_s$    (a1-3)

where $E_r$ is the random error associated with a measurement and $E_s$ is the systematic error.

TABLE a1-1 Replicate Absorbance Measurements*

Trial  Absorbance, A    Trial  Absorbance, A    Trial  Absorbance, A
  1    0.488             18    0.475             35    0.476
  2    0.480             19    0.480             36    0.490
  3    0.486             20    0.494†            37    0.488
  4    0.473             21    0.492             38    0.471
  5    0.475             22    0.484             39    0.486
  6    0.482             23    0.481             40    0.478
  7    0.486             24    0.487             41    0.486
  8    0.482             25    0.478             42    0.482
  9    0.481             26    0.483             43    0.477
 10    0.490             27    0.482             44    0.477
 11    0.480             28    0.491             45    0.486
 12    0.489             29    0.481             46    0.478
 13    0.478             30    0.469‡            47    0.483
 14    0.471             31    0.485             48    0.480
 15    0.482             32    0.477             49    0.483
 16    0.483             33    0.476             50    0.479
 17    0.488             34    0.483

Mean absorbance = 0.482
Standard deviation = 0.0056
* Data listed in the order obtained.
† Maximum value.
‡ Minimum value.

RANDOM ERRORS

Whenever analytical measurements are repeated on the same sample, the data obtained are scattered, as is shown in Table a1-1, because of the presence of random, or indeterminate, errors; that is, the presence of random errors is reflected in the imprecision of the data. 
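The scatter in Table a1-1 can also be summarized numerically. The following is a minimal Python sketch, not part of the original text: the 50 absorbances are transcribed from the table, the function names are illustrative, and the standard deviation is assumed to be the sample (N − 1) form. It recomputes the mean, standard deviation, and extremes listed with the table, and counts the data in contiguous cells 0.003 absorbance unit wide, the grouping that Table a1-2 uses.

```python
import statistics

# Absorbances transcribed from Table a1-1, in the order obtained (trials 1-50).
ABSORBANCES = [
    0.488, 0.480, 0.486, 0.473, 0.475, 0.482, 0.486, 0.482, 0.481, 0.490,
    0.480, 0.489, 0.478, 0.471, 0.482, 0.483, 0.488, 0.475, 0.480, 0.494,
    0.492, 0.484, 0.481, 0.487, 0.478, 0.483, 0.482, 0.491, 0.481, 0.469,
    0.485, 0.477, 0.476, 0.483, 0.476, 0.490, 0.488, 0.471, 0.486, 0.478,
    0.486, 0.482, 0.477, 0.477, 0.486, 0.478, 0.483, 0.480, 0.483, 0.479,
]


def summarize(values):
    """Return (mean, sample standard deviation, minimum, maximum)."""
    return (statistics.mean(values), statistics.stdev(values),
            min(values), max(values))


def cell_counts(values, first=0.469, width=0.003, n_cells=9):
    """Count the values falling in each contiguous cell of the given width.

    The arithmetic is done in integer thousandths so that values lying on a
    cell boundary are not misplaced by floating-point rounding.
    """
    start = round(first * 1000)
    step = round(width * 1000)
    counts = [0] * n_cells
    for v in values:
        counts[(round(v * 1000) - start) // step] += 1
    return counts


mean, s, lowest, highest = summarize(ABSORBANCES)
counts = cell_counts(ABSORBANCES)
print(f"mean = {mean:.3f}, s = {s:.4f}, min = {lowest:.3f}, max = {highest:.3f}")
print("cell counts:", counts)
```

Dividing each count by 50 gives the relative frequencies that are plotted as a histogram.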
The data in columns 2, 4, and 6 of the table are absorbances (Section 7A-2) obtained with a spectrophotometer on 50 replicate red solutions produced by treating identical aqueous samples containing 10 ppm of Fe(III) with an excess of thiocyanate ion. The measured absorbances are directly proportional to iron concentration.

The distribution of random errors in these data is more easily comprehended if they are organized into equal-size, contiguous data groups, or cells, as shown in Table a1-2. The relative frequency of occurrence of data in each cell is then plotted as in Figure a1-1A to give a bar graph called a histogram.

It is reasonable to suppose that if the number of analyses were much larger than that shown in Table a1-2 and if the size of the cells were made much smaller, then, ultimately, a smooth curve such as that shown in Figure a1-1B would be obtained. A smooth curve of this type is called a Gaussian curve, or a normal error curve. It is found empirically that the results of replicate chemical analyses are frequently distributed in an approximately Gaussian, or normal, form.

1 A third type of error occasionally encountered is gross error, which arises in most instances from the carelessness, ineptitude, laziness, or bad luck of the experimenter. Typical sources include transposition of numbers in recording data, spilling of a sample, using the wrong scale on a meter, accidental introduction of contaminants, and reversing the sign on a meter reading. A gross error in a set of replicate measurements appears as an outlier, a datum that is noticeably different from the other data in the set. We will not consider gross errors in this discussion.

The frequency distribution exhibited by a Gaussian curve has the following characteristics:
1. The most frequently observed result is the mean $\mu$ of the set of data.
2. The results cluster symmetrically around this mean value.
3. 
Small divergences from the central mean value are found more frequently than are large divergences.
4. In the absence of systematic errors, the mean of a large set of data approaches the true value.

Characteristic 4 means that, in principle, it is always possible to reduce the random error of an analysis to something that approaches zero. Unfortunately, it is seldom practical to achieve this goal, because to do so requires performing 20 or more replicate analyses. Ordinarily, a scientist can only afford the time for two or three replicated measurements, and a significant random error is to be expected for the mean of such a small number of replicate measurements.

Statisticians usually use $\mu$ to represent the mean of an infinite collection of data (see Figure a1-1B), and $\bar{x}$ for the mean of a small set of replicate data. The random error $E_r$ for the mean of the small set is then given by

$E_r = \bar{x} - \mu$    (a1-4)

It is found that the mean for a finite set of data rapidly approaches the true mean when the number of measurements N is greater than perhaps 20 or 30. Thus, as is shown in the following example, we can sometimes determine the random error in an individual datum or in the mean of a small set of data.

TABLE a1-2 Frequency Distribution of Data from Table a1-1

Absorbance Range    Number in Range, y    Relative Frequency, y/N†
0.469 to 0.471       3                    0.06
0.472 to 0.474       1                    0.02
0.475 to 0.477       7                    0.14
0.478 to 0.480       9                    0.18
0.481 to 0.483      13                    0.26
0.484 to 0.486       7                    0.14
0.487 to 0.489       5                    0.10
0.490 to 0.492       4                    0.08
0.493 to 0.495       1                    0.02

† N = total number of measurements = 50.

FIGURE a1-1 A, Histogram showing distribution of the 50 results in Table a1-1. B, Gaussian curve for data having the same mean and same standard deviation as the data in A.

EXAMPLE a1-1

Calculate the random error for (a) the second datum in Table a1-1 and (b) the mean for the first three entries in the table.

The mean for the entire set of data is 0.482, and because this mean is for 50 measurements, we may assume that the random error in it is approximately zero. Thus, the limiting mean $\mu$ can be taken as 0.482.

(a) Here, the random error for a single measurement $x_2$ is

$E_r = x_2 - \mu = 0.480 - 0.482 = -0.002$

(b) The mean $\bar{x}$ for the first three entries in the table is

$\bar{x} = \dfrac{0.488 + 0.480 + 0.486}{3} = 0.485$

Substituting into Equation a1-4 gives

$E_r = \bar{x} - \mu = 0.485 - 0.482 = +0.003$

The random nature of indeterminate errors makes it possible to treat these effects by the methods of statistics. Statistical techniques are considered in Section a1B.

SYSTEMATIC ERRORS: BIAS

Systematic errors have a definite value, an assignable cause, and are of the same sign and magnitude for replicate measurements made in exactly the same way. Systematic errors lead to bias in a measurement technique. Bias is illustrated by the two curves in Figure a1-2, which show the frequency distribution of replicate results in the analysis of identical samples by two methods that have random errors of identical size. Method A has no bias, so the limiting mean $\mu_A$ is the true value $x_t$. Method B has a bias that is given by

$\text{bias} = \mu_B - x_t = \mu_B - \mu_A$    (a1-5)

Note that bias affects all of the data in a set and that it bears a sign.

FIGURE a1-2 Illustration of bias: bias $= \mu_B - x_t$. (Abscissa: analytical result, $x_i$.)

Systematic errors have three sources: instrumental, personal, and method.

Instrumental Errors. Typical sources of instrumental errors include drift in electronic circuits, leakage in vacuum systems, temperature effects on detectors, currents induced in circuits from ac power lines, decreases in voltages of batteries with use, and calibration errors in meters, weights, and volumetric equipment.

Systematic instrument errors are commonly detected and corrected by calibration with suitable standards. 
Periodic calibration of instruments is always desirable because the response of most instruments changes with time as a consequence of wear, corrosion, or mistreatment.

Personal Errors. Personal errors are those introduced into a measurement by judgments that the experimentalist must make. Examples include estimating the position of a pointer between two scale divisions, the color of a solution at the end point in a titration, the level of a liquid with respect to a graduation in a pipet, or the relative intensity of two light beams. Judgments of this type are often subject to systematic, unidirectional uncertainties. For example, one person may read a pointer consistently high, another may be slightly slow in activating a timer, and a third may be less sensitive to color. Color blindness or other physical handicaps often exacerbate determinate personal errors.

Number bias is another source of personal systematic error that is widely encountered and varies considerably from person to person. The most common bias encountered in estimating the position of a needle on a scale is a preference for the digits 0 and 5. Also prevalent is a preference for small digits over large and even ones over odd.

A near-universal source of personal error is prejudice. Most of us, no matter how honest, have a natural tendency to estimate scale readings in a direction that improves the precision in a set of results or causes the results to fall closer to a preconceived notion of the true value for the measurement.

Most personal errors can be minimized by care and self-discipline. Thus, most scientists develop the habit of systematically double-checking instrument readings, notebook entries, and calculations. Robots, automated systems, computerized data collection, and computerized instrument control have the potential of minimizing or eliminating personal systematic errors.

Method Errors. 
Method-based errors are often introduced from nonideal chemical and physical behavior of reagents and reactions upon which an analysis is based. Possible sources include slowness or incompleteness of chemical reactions, losses by volatility, adsorption of the analyte on solids, instability of reagents, contaminants, and chemical interferences.

Systematic method errors are usually more difficult to detect and correct than are instrument and personal errors. The best and surest way involves validation of the method by employing it for the analysis of standard materials that resemble the samples to be analyzed both in physical state and in chemical composition. The analyte concentrations of these standards must, of course, be known with a high degree of certainty. For simple materials, standards can sometimes be prepared by blending carefully measured amounts of pure compounds. Unfortunately, more often than not, materials to be analyzed are sufficiently complex to preclude this simple approach.

The National Institute of Standards and Technology2 offers for sale a variety of standard reference materials (SRMs) that have been specifically prepared for the validation of analytical methods.3 The concentration of one or more constituents in these materials has been determined by (1) a previously validated reference method, (2) two or more independent, reliable measurement methods, or (3) analyses from a network of cooperating laboratories, technically competent and thoroughly familiar with the material being tested. Most standard reference materials are substances that are commonly encountered in commerce or in environmental, pollution, clinical, biological, and forensic studies. A few examples include trace elements in coal, fuel oil, urban particulate matter, sediments from estuaries, and water; lead in blood samples; cholesterol in human serum; drugs of abuse in urine; and a wide variety of elements in rocks, minerals, and glasses. In addition, several commercial supply houses now offer a variety of analyzed materials for method testing.4

2 In 1989, the name of the National Bureau of Standards (NBS) was changed to the National Institute of Standards and Technology (NIST). At this time, several of the NIST publications still bear the NBS label.
3 See U.S. Department of Commerce, NIST Standard Reference Materials Catalog 1990-91, NIST Special Publication 260. Washington: Government Printing Office, 1990. For a description of the NIST reference material program, see R. A. Alvarez, S. D. Rasberry, and G. A. Uriano, Anal. Chem., 1982, 54, 1226A; and G. A. Uriano, ASTM Standardization News, 1979, 7, 8.

a1B STATISTICAL TREATMENT OF RANDOM ERRORS

Randomly distributed data of the kind described in the section labeled "random errors" are conveniently analyzed by the techniques of statistics, which are considered in the next several sections.5

a1B-1 Populations and Samples

In the statistical treatment of data, it is assumed that the handful of replicate experimental results obtained in the laboratory is a minute fraction of the infinite number of results that could in principle be obtained given infinite time and an infinite amount of sample. Statisticians call the handful of data a sample and view it as a subset of an infinite population, or universe, of data that in principle exists. The laws of statistics apply strictly to populations only; when applying these laws to a sample of laboratory data, it is necessary to assume that the sample is truly representative of the population. Because there is no assurance this assumption is valid, statements about random errors are necessarily uncertain and must be couched in terms of probabilities.

DEFINITION OF SOME TERMS USED IN STATISTICS

Population Mean ($\mu$). 
The population mean, or limiting mean, of a set of replicate data is defined by the equation

$\mu = \lim_{N \to \infty} \dfrac{\sum_{i=1}^{N} x_i}{N}$    (a1-6)

where $x_i$ represents the value of the ith measurement. As indicated by this equation, the mean of a set of measurements approaches the population mean as N, the number of measurements, approaches infinity. It is important to add that in the absence of bias, $\mu$ is the true value for the quantity being measured.

Population Standard Deviation ($\sigma$) and the Population Variance ($\sigma^2$). The population standard deviation and the population variance provide statistically significant measures of the precision of a population of data. Thus,

$\sigma = \sqrt{\lim_{N \to \infty} \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$    (a1-7)

where $x_i$ is again the value for the ith measurement. Note that the population standard deviation is the root mean square of the individual deviations from the mean for the population.

Statisticians prefer to express the precision of data in terms of variance, which is simply the square of the standard deviation ($\sigma^2$), because variances combine additively. That is, if n independent sources of random error exist in a system, the total variance $\sigma_t^2$ is given by the relationship

$\sigma_t^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2$

4 See C. Veillon, Anal. Chem., 1986, 58, 851A.
5 For a more detailed treatment of statistics, see R. Calcutt and R. Boddy, Statistics for Analytical Chemistry. New York: Chapman and Hall, 1983; J. Mandel, in Treatise on Analytical Chemistry, 2nd ed., I. M. Kolthoff and P. J. Elving, Eds., Part I, Vol. 1, Chapter 5. New York: Wiley, 1978; J. K. Taylor, Quality Assurance of Chemical Measurements. Chelsea, Michigan: Lewis Publishers, Inc., 1987; and H. Mark and J. Workman, Statistics in Spectroscopy. San Diego: Academic Press, 1991.

FIGURE a1-4 Relative error in s as a function of N.

a1B-2 Confidence Limits (CL)

The true or population mean ($\mu$) of a measurement is a constant that must always remain unknown. However, in the absence of systematic errors, limits can be
set within which the population mean can be expected to lie with a given degree of probability. The limits obtained in this manner are called confidence limits.

The confidence limit, which is derived from the sample standard deviation, depends upon the certainty with which s is known. If there is reason to believe that s is a good approximation of $\sigma$, then the confidence limits can be significantly narrower than if the estimate of s is based upon only two or three measurements.

CONFIDENCE LIMIT WHEN s IS A GOOD APPROXIMATION OF $\sigma$

Figure a1-5 is a normal error curve in which the abscissa is the quantity z, which represents the deviation from the mean in units of the population standard deviation

[Figure a1-5: confidence level; z; ±0.67$\sigma$ ...]

... $\sigma$ = 0.006% ethanol.

(a) $\sum x_i = 0.084 + 0.089 + 0.079 = 0.252$

$\sum x_i^2 = 0.007056 + 0.007921 + 0.006241 = 0.021218$

$s = \sqrt{\dfrac{0.021218 - (0.252)^2/3}{3 - 1}} = 0.0050$

Here, $\bar{x} = 0.252/3 = 0.084$. Table a1-4 indicates that t = ±4.30 for two degrees of freedom and 95% confidence. Thus,

$95\%\ \text{CL} = \bar{x} \pm \dfrac{ts}{\sqrt{N}} = 0.084 \pm \dfrac{4.30 \times 0.0050}{\sqrt{3}} = 0.084 \pm 0.012$

(b) Because a good value of $\sigma$ is available,

$95\%\ \text{CL} = \bar{x} \pm \dfrac{z\sigma}{\sqrt{N}} = 0.084 \pm \dfrac{1.96 \times 0.006}{\sqrt{3}} = 0.084 \pm 0.007$

Note from Example a1-6 that a sure knowledge of $\sigma$ nearly halves the confidence interval.

a1B-3 Test for Bias

As noted in Section a1A-2, bias in an analytical method is generally detected by the analysis of one or more standard reference materials whose composition is known. In all probability, the experimental mean $\bar{x}$ of such an analysis will differ from the true value $\mu$ supplied for the standard. In this case, a judgment must be made whether this difference is the consequence of random error in the analysis of the reference material or of bias in the method used.

A common way of treating this problem statistically is to compare the experimental difference $\bar{x} - \mu$ with the difference that could be expected at a certain probability level if no bias existed. 
If the experimental $\bar{x} - \mu$ is larger than the calculated difference, bias is likely. If, on the other hand, the experimental value is equal to or smaller than the computed difference, the presence of bias has not been demonstrated.

This test for bias makes use of the t statistics discussed earlier. Here we rearrange Equation a1-21 to give

$\bar{x} - \mu = \pm \dfrac{ts}{\sqrt{N}}$    (a1-22)

where N is the number of replicate measurements employed in the test. (If a good estimate of $\sigma$ is available, the equation can be modified by replacing t with z and s with $\sigma$.) If the experimental value of $\bar{x} - \mu$ is larger than the value of $\bar{x} - \mu$ calculated from Equation a1-22, the presence of bias in the method is suggested. If, on the other hand, the value calculated using Equation a1-22 is larger, no bias has been demonstrated.

EXAMPLE a1-7

A new procedure for the rapid determination of sulfur in kerosenes was tested on a sample known from its method of preparation to contain 0.123% S. The results were % S = 0.112, 0.118, 0.115, and 0.119. Do the data indicate the presence of bias in the method?

$\sum x_i = 0.112 + 0.118 + 0.115 + 0.119 = 0.464$

$\bar{x} = 0.464/4 = 0.116\%\ \text{S}$

$\bar{x} - \mu = 0.116 - 0.123 = -0.007\%\ \text{S}$

$\sum x_i^2 = 0.012544 + 0.013924 + 0.013225 + 0.014161 = 0.053854$

$s = \sqrt{\dfrac{0.053854 - (0.464)^2/4}{4 - 1}} = \sqrt{\dfrac{0.000030}{3}} = 0.0032$

From Table a1-4, we find that at the 95% confidence level, t has a value of 3.18 for three degrees of freedom. Thus,

$\dfrac{ts}{\sqrt{N}} = \dfrac{3.18 \times 0.0032}{\sqrt{4}} = \pm 0.0051$

An experimental mean can be expected to deviate by ±0.0051 or greater no more frequently than 5 times in 100. Thus, if we conclude that $\bar{x} - \mu = -0.007$ is a significant difference and that bias is present, we will, on the average, be wrong fewer than 5 times in 100.

If we make a similar calculation employing the value for t at the 99% confidence level, $ts/\sqrt{N}$ assumes a value of 0.0093. Thus, if we insist upon being wrong no more often than 1 time in 100, we must conclude that no bias has been demonstrated. 
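The comparison in Example a1-7 can be sketched in a few lines of Python. This is an illustrative sketch rather than a general-purpose routine: the function name `bias_check` is hypothetical, the critical t values are typed in by hand (3.18 is quoted in the example; 5.84 is the usual tabulated value for three degrees of freedom at 99% confidence and is an assumption here), and s is the sample standard deviation.

```python
import math
import statistics


def bias_check(results, accepted, t_crit):
    """Apply Equation a1-22: compare |mean - accepted| with t*s/sqrt(N).

    Returns (difference, limit, bias_suggested).
    """
    n = len(results)
    difference = statistics.mean(results) - accepted
    limit = t_crit * statistics.stdev(results) / math.sqrt(n)
    return difference, limit, abs(difference) > limit


sulfur = [0.112, 0.118, 0.115, 0.119]   # % S, from Example a1-7
T_95 = 3.18   # three degrees of freedom, 95% confidence (from the example)
T_99 = 5.84   # three degrees of freedom, 99% confidence (assumed table value)

diff, limit_95, biased_95 = bias_check(sulfur, 0.123, T_95)
_, limit_99, biased_99 = bias_check(sulfur, 0.123, T_99)

print(f"difference = {diff:+.4f}")
print(f"95%: limit = {limit_95:.4f}, bias suggested: {biased_95}")
print(f"99%: limit = {limit_99:.4f}, bias suggested: {biased_99}")
```

Because s is carried at full precision instead of being rounded to 0.0032 first, the two limits come out marginally smaller than the hand-calculated 0.0051 and 0.0093, but the conclusions at both confidence levels are the same as in the example.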
Note that concluding that no bias has been demonstrated is different from saying that no bias exists.

EXAMPLE a1-8

Suppose we know from past experience that the method described in Example a1-7 had a population standard deviation of 0.0032% S. That is, s → $\sigma$ = 0.0032. Is the presence of bias suggested at the 99% confidence level?

Here we write that

$\bar{x} - \mu = \pm \dfrac{z\sigma}{\sqrt{N}} = \pm \dfrac{2.58 \times 0.0032}{\sqrt{4}} = \pm 0.00413$

The experimental difference of −0.007 is significantly larger than this number. Thus, bias is strongly suggested.

a1B-4 Propagation of Measurement Uncertainties

A typical instrumental method of analysis involves several experimental measurements, each of which is subject to an indeterminate uncertainty and each of which contributes to the net indeterminate error of the final result. For the purpose of showing how such indeterminate uncertainties affect the outcome of an analysis, let us assume that a result x is dependent upon the experimental variables p, q, r, . . . , each of which fluctuates in a random and independent way. That is, x is a function of p, q, r, . . . , so we may write

$x = f(p, q, r, \ldots)$    (a1-23)

The uncertainty $dx_i$ (the deviation from the mean) in the ith measurement of x will depend upon the size and sign of the corresponding uncertainties $dp_i$, $dq_i$, $dr_i$, . . . , and we may write

$dx_i = f(dp_i, dq_i, dr_i, \ldots)$

The variation in dx as a function of the uncertainties in p, q, r, . . . can be derived by taking the total differential of Equation a1-23. That is,

$dx = \left(\dfrac{\partial x}{\partial p}\right)_{q,r,\ldots} dp + \left(\dfrac{\partial x}{\partial q}\right)_{p,r,\ldots} dq + \left(\dfrac{\partial x}{\partial r}\right)_{p,q,\ldots} dr + \cdots$    (a1-24)

In order to develop a relationship between the standard deviation of x and the standard deviations of p, q, and r, it is necessary to square the foregoing equation. In doing so, we will drop the subscripts associated with all partial derivatives. Thus,

$(dx)^2 = \left[\left(\dfrac{\partial x}{\partial p}\right) dp + \left(\dfrac{\partial x}{\partial q}\right) dq + \left(\dfrac{\partial x}{\partial r}\right) dr + \cdots\right]^2$    (a1-25)

This equation must then be summed between the limits of i = 1 to i = N, where N again is the total number of replicate measurements. 
In squaring Equation a1-24, two types of terms from the right-hand side of the equation emerge: (1) square terms and (2) cross terms. Square terms take the form

$\left(\dfrac{\partial x}{\partial p}\right)^2 dp^2,\quad \left(\dfrac{\partial x}{\partial q}\right)^2 dq^2,\quad \left(\dfrac{\partial x}{\partial r}\right)^2 dr^2,\ \ldots$

Square terms are always positive and can, therefore, never cancel. In contrast, cross terms may be either positive or negative in sign. Examples are

$\left(\dfrac{\partial x}{\partial p}\right)\left(\dfrac{\partial x}{\partial q}\right) dp\,dq,\ \ldots$

If dp, dq, and dr represent independent and random uncertainties, some of the cross terms will be negative and others positive. Thus, the summation of all such terms should approach zero, particularly when N is large.8

As a consequence of the canceling tendency of cross terms, the sum of Equation a1-25 from i = 1 to i = N can be assumed to be made up exclusively of square terms. This sum then takes the form

$\sum (dx_i)^2 = \left(\dfrac{\partial x}{\partial p}\right)^2 \sum (dp_i)^2 + \left(\dfrac{\partial x}{\partial q}\right)^2 \sum (dq_i)^2 + \left(\dfrac{\partial x}{\partial r}\right)^2 \sum (dr_i)^2 + \cdots$    (a1-26)

Dividing through by N gives

$\dfrac{\sum (dx_i)^2}{N} = \left(\dfrac{\partial x}{\partial p}\right)^2 \dfrac{\sum (dp_i)^2}{N} + \left(\dfrac{\partial x}{\partial q}\right)^2 \dfrac{\sum (dq_i)^2}{N} + \left(\dfrac{\partial x}{\partial r}\right)^2 \dfrac{\sum (dr_i)^2}{N} + \cdots$    (a1-27)

From Equation a1-7, however, we see that

$\dfrac{\sum (dx_i)^2}{N} = \dfrac{\sum (x_i - \mu)^2}{N} = \sigma_x^2$

where $\sigma_x^2$ is the variance of x. Similarly,

$\dfrac{\sum (dp_i)^2}{N} = \sigma_p^2$

and so forth. Thus, Equation a1-27 can be written in terms of the variances of the quantities; that is,

$\sigma_x^2 = \left(\dfrac{\partial x}{\partial p}\right)^2 \sigma_p^2 + \left(\dfrac{\partial x}{\partial q}\right)^2 \sigma_q^2 + \left(\dfrac{\partial x}{\partial r}\right)^2 \sigma_r^2 + \cdots$    (a1-28)

The example that follows illustrates how Equation a1-28 is employed to give the variance of a quantity calculated from several experimental data.

8 If the variables are not independent, the cross terms must be kept regardless of the size of N. See S. L. Meyer, Data Analysis for Scientists and Engineers, New York: Wiley, 1975.

EXAMPLE a1-9

The number of plates N in a chromatographic column can be computed with Equation 24-17 (Chapter 24):

$N = 16\left(\dfrac{t_R}{W}\right)^2$

where $t_R$ is the retention time and W is the width of the chromatographic peak in the same units as $t_R$. The significance of these terms is explained in Figure 24-6.

Hexachlorobenzene exhibited a high-performance liquid chromatographic peak at a retention time of 13.36 min. The width of the peak at its base was 2.18 min. 
The standard deviation s for the two time measurements was 0.043 and 0.061 min, respectively. Calculate (a) the number of plates in the column and (b) the standard deviation for the computed result.

(a) $N = 16\left(\dfrac{13.36\ \text{min}}{2.18\ \text{min}}\right)^2 = 601\ \text{plates}$

(b) Substituting s for $\sigma$ in Equation a1-28 gives

$s_N^2 = \left(\dfrac{\partial N}{\partial t_R}\right)^2 s_{t_R}^2 + \left(\dfrac{\partial N}{\partial W}\right)^2 s_W^2$

Taking partial derivatives of the original equation gives

$\dfrac{\partial N}{\partial t_R} = \dfrac{32 t_R}{W^2}$ and $\dfrac{\partial N}{\partial W} = \dfrac{-32 t_R^2}{W^3}$

Substituting these relationships into the previous equation gives

$s_N^2 = \left(\dfrac{32 t_R}{W^2}\right)^2 s_{t_R}^2 + \left(\dfrac{-32 t_R^2}{W^3}\right)^2 s_W^2$

$s_N^2 = \left(\dfrac{32 \times 13.36\ \text{min}}{(2.18\ \text{min})^2}\right)^2 (0.061\ \text{min})^2 + \left(\dfrac{-32 (13.36\ \text{min})^2}{(2.18\ \text{min})^3}\right)^2 (0.043\ \text{min})^2 = 592.1$

$s_N = \sqrt{592.1} = 24.3 = 24\ \text{plates}$

Thus, $N = 6.0\,(\pm 0.2) \times 10^2$ plates.

a1B-5 Rounding Results from Arithmetic Calculations

Equation a1-28 is helpful in deciding how the results of arithmetical calculations should be rounded. For example, consider the case where the result x is computed by the relationship

$x = p + q - r$

where p, q, and r are experimental quantities having sample standard deviations of $s_p$, $s_q$, and $s_r$, respectively. Applying Equation a1-28 (using sample rather than population standard deviations) gives

$s_x^2 = \left(\dfrac{\partial x}{\partial p}\right)^2 s_p^2 + \left(\dfrac{\partial x}{\partial q}\right)^2 s_q^2 + \left(\dfrac{\partial x}{\partial r}\right)^2 s_r^2$

But

$\dfrac{\partial x}{\partial p} = \dfrac{\partial x}{\partial q} = 1$ and $\dfrac{\partial x}{\partial r} = -1$

Therefore, the variance of x is given by

$s_x^2 = (1)^2 s_p^2 + (1)^2 s_q^2 + (-1)^2 s_r^2$

or the standard deviation of the result is given by

$s_x = \sqrt{s_p^2 + s_q^2 + s_r^2}$

Thus, the absolute standard deviation of a sum or difference is equal to the square root of the sum of the squares of the absolute standard deviations of the numbers making up the sum or difference.

Proceeding in this same way yields the relationships shown in column 3 of Table a1-5 for other types of arithmetic operations. Note that in several calculations, relative variances such as $(s_x/x)^2$ and $(s_p/p)^2$ are combined rather than absolute standard deviations.

EXAMPLE a1-10

Calculate the standard deviation of the result of

$\dfrac{[14.3\,(\pm 0.2) - 11.6\,(\pm 0.2)] \times 0.050\,(\pm 0.001)}{[820\,(\pm 10) + 1030\,(\pm 5)] \times 42.3\,(\pm 0.4)} = 1.725\,(\pm ?) \times 10^{-6}$

where the numbers in parentheses are absolute standard deviations. 
First we must calculate the standard deviation of the sum and the difference. The standard deviation $s_p$ for the difference in the numerator is given by

$s_p = \sqrt{(\pm 0.2)^2 + (\pm 0.2)^2} = \pm 0.283$

For the sum in the denominator, the standard deviation $s_q$ is

$s_q = \sqrt{(\pm 10)^2 + (\pm 5)^2} = \pm 11.2$

We may then rewrite the equation as

$\dfrac{2.7\,(\pm 0.283) \times 0.050\,(\pm 0.001)}{1850\,(\pm 11.2) \times 42.3\,(\pm 0.4)} = 1.725\,(\pm ?) \times 10^{-6}$

The equation now contains only products and quotients, and Equation (2) of Table a1-5 applies:

$\dfrac{s_x}{x} = \sqrt{\left(\dfrac{0.283}{2.7}\right)^2 + \left(\dfrac{0.001}{0.050}\right)^2 + \left(\dfrac{11.2}{1850}\right)^2 + \left(\dfrac{0.4}{42.3}\right)^2} = \pm 0.107$

To obtain the absolute standard deviation, we write

$s_x = \pm 0.107\,x = \pm 0.107\,(1.725 \times 10^{-6}) = \pm 0.185 \times 10^{-6}$

and the answer is rounded to $1.7\,(\pm 0.2) \times 10^{-6}$.

EXAMPLE a1-11

Calculate the absolute standard deviations of the results of the following computations. The absolute standard deviation for each quantity is given in parentheses.

(a) $x = \log\,[2.00\,(\pm 0.02) \times 10^{-4}] = -3.6990 \pm ?$
(b) $x = \text{antilog}\,[1.200\,(\pm 0.003)] = 15.849 \pm ?$
(c) $x = \text{antilog}\,[45.4\,(\pm 0.3)] = 2.5119 \times 10^{45} \pm ?$

(a) Referring to Equation (4) in Table a1-5, we see that

$s_x = \pm 0.434 \times \dfrac{0.02 \times 10^{-4}}{2.00 \times 10^{-4}} = \pm 0.004$

Thus, $\log\,[2.00\,(\pm 0.02) \times 10^{-4}] = -3.699\,(\pm 0.004)$.

(b) Employing Equation (5) in Table a1-5, we obtain

$\dfrac{s_x}{x} = 2.303 \times (\pm 0.003) = \pm 0.0069$

$s_x = \pm 0.0069\,x = \pm 0.0069 \times 15.849 = \pm 0.109$

Therefore, $\text{antilog}\,[1.200\,(\pm 0.003)] = 15.8 \pm 0.1$.

(c) $\dfrac{s_x}{x} = 2.303\,(\pm 0.3) = \pm 0.691$

$s_x = \pm 0.691 \times 2.5119 \times 10^{45} = \pm 1.7 \times 10^{45}$

Therefore, $\text{antilog}\,[45.4\,(\pm 0.3)] = 2.5\,(\pm 1.7) \times 10^{45}$.

Example a1-11c demonstrates that a large absolute error is associated with the antilogarithm of a number with few digits beyond the decimal point. 
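The arithmetic in Examples a1-10 and a1-11 follows directly from the propagation rules for sums, products, logarithms, and antilogarithms. The Python sketch below is one way to check it; the helper names are hypothetical, and the constants 0.434 and 2.303 used in the hand calculation are replaced by 1/ln 10 and ln 10, of which they are rounded values.

```python
import math

LN10 = math.log(10)   # 2.303 when rounded; its reciprocal is 0.434


def sd_sum(*sds):
    """Absolute standard deviation of a sum or difference."""
    return math.sqrt(sum(s * s for s in sds))


def rel_sd_product(*terms):
    """Relative standard deviation of a product or quotient.

    Each term is a (value, absolute standard deviation) pair.
    """
    return math.sqrt(sum((s / v) ** 2 for v, s in terms))


def sd_log10(p, s_p):
    """Absolute standard deviation of log10(p)."""
    return s_p / (p * LN10)


def sd_antilog10(p, s_p):
    """Absolute standard deviation of antilog10(p) = 10**p."""
    return LN10 * s_p * 10 ** p


# Example a1-10: sums and differences first, then products and quotients.
num, s_num = 14.3 - 11.6, sd_sum(0.2, 0.2)
den, s_den = 820 + 1030, sd_sum(10, 5)
x = num * 0.050 / (den * 42.3)
s_x = x * rel_sd_product((num, s_num), (0.050, 0.001), (den, s_den), (42.3, 0.4))

# Example a1-11: logarithm and antilogarithms.
s_a = sd_log10(2.00e-4, 0.02e-4)
s_b = sd_antilog10(1.200, 0.003)
s_c = sd_antilog10(45.4, 0.3)

print(f"a1-10: x = {x:.3e}, s_x = {s_x:.2e}")
print(f"a1-11: (a) {s_a:.4f}  (b) {s_b:.3f}  (c) {s_c:.2e}")
```

Keeping the sum and product rules as separate helpers mirrors the two-stage hand calculation: absolute standard deviations are combined for the numerator difference and the denominator sum, and relative standard deviations are combined for the remaining products and quotients.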
The large uncertainty in part (c) of Example a1-11 arises from the fact that the numbers to the left of the decimal (the characteristic) serve only to locate the decimal point. The large error in the antilogarithm results from the relatively large uncertainty in the mantissa of the number (that is, 0.4 ± 0.3).

TABLE a1-5 Error Propagation in Arithmetic Calculations

Type of Calculation           Example*              Standard Deviation of x
Addition or Subtraction       x = p + q - r         s_x = sqrt(s_p^2 + s_q^2 + s_r^2)                        (1)
Multiplication or Division    x = p·q/r             s_x/x = sqrt((s_p/p)^2 + (s_q/q)^2 + (s_r/r)^2)          (2)
Exponentiation                x = p^y               s_x/x = y(s_p/p)                                         (3)
Logarithm                     x = log10 p           s_x = 0.434(s_p/p)                                       (4)
Antilogarithm                 x = antilog10 p       s_x/x = 2.303 s_p                                        (5)

* p, q, and r are experimental variables whose standard deviations are s_p, s_q, and s_r, respectively; y is a constant.

a1C METHOD OF LEAST SQUARES

Most analytical methods are based upon a calibration curve in which a measured quantity y is plotted as a function of the known concentration x of a series of standards. Figure a1-6 shows a typical calibration curve, which was derived for the chromatographic determination of isooctane in hydrocarbon samples. The ordinate (the dependent variable) is the area under the chromatographic peak for isooctane, and the abscissa (the independent variable) is the mole percent of isooctane. As is typical (and desirable), the plot approximates a straight line. Note, however, that because of the random errors in the measuring process, not all the data fall exactly on the line. Thus, we must try to fit a "best" straight line through the points. A common way of finding such a line is the method of least squares.

FIGURE a1-6 Calibration curve for determining isooctane in hydrocarbon mixtures. (Ordinate: peak area, y; abscissa: x, concentration of isooctane, mol %.)

In applying the method of least squares, we assume that there is a linear relationship between the area of the peaks (y) and the analyte concentration (x) as given by the equation

$y = mx + b$

where m is the slope of the straight line and b is the intercept. 
We also assume that any deviation of individual points from the straight line results from error in the area measurement and that there is no error in the values of $x$; that is, the concentrations of the standard solutions are known exactly. As illustrated in Figure a1-6, the vertical deviation of each point from the straight line is called a residual. The line generated by the least-squares method is the one that minimizes the sum of the squares of the residuals for all of the points.

For convenience, we define three quantities $S_{xx}$, $S_{yy}$, and $S_{xy}$ as follows:

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{N} \qquad \text{(a1-29)}$$

$$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{N} \qquad \text{(a1-30)}$$

$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\sum x_i \sum y_i}{N} \qquad \text{(a1-31)}$$

Here $x_i$ and $y_i$ are the coordinates of the individual data points, $N$ is the number of pairs of data used in preparation of the calibration curve, and $\bar{x}$ and $\bar{y}$ are the average values for the variables, or

$$\bar{x} = \frac{\sum x_i}{N} \quad \text{and} \quad \bar{y} = \frac{\sum y_i}{N}$$

Note that $S_{xx}$ and $S_{yy}$ are the sums of the squares of the deviations from the mean for the individual values of $x$ and $y$. The equivalent expressions shown to the far right in Equations a1-29, a1-30, and a1-31 are more convenient when a handheld calculator is being used.

Six useful quantities can be computed from $S_{xx}$, $S_{yy}$, and $S_{xy}$.

1. The slope of the line $m$:

$$m = \frac{S_{xy}}{S_{xx}} \qquad \text{(a1-32)}$$

2. The intercept $b$:

$$b = \bar{y} - m\bar{x} \qquad \text{(a1-33)}$$

3. The standard deviation $s_y$ of the residuals, which is given by:

$$s_y = \sqrt{\frac{S_{yy} - m^2 S_{xx}}{N - 2}} \qquad \text{(a1-34)}$$

4. The standard deviation of the slope $s_m$:

$$s_m = \frac{s_y}{\sqrt{S_{xx}}} \qquad \text{(a1-35)}$$

5. The standard deviation $s_b$ of the intercept:

$$s_b = s_y \sqrt{\frac{\sum x_i^2}{N \sum x_i^2 - (\sum x_i)^2}} = s_y \sqrt{\frac{1}{N - (\sum x_i)^2 / \sum x_i^2}} \qquad \text{(a1-36)}$$

6. The standard deviation $s_c$ for analytical results obtained with the calibration curve:

$$s_c = \frac{s_y}{m} \sqrt{\frac{1}{L} + \frac{1}{N} + \frac{(\bar{y}_c - \bar{y})^2}{m^2 S_{xx}}} \qquad \text{(a1-37)}$$

Equation a1-37 gives the standard deviation of a result obtained from the mean $\bar{y}_c$ of a set of $L$ replicate analyses when a calibration curve that contains $N$ points is used; recall that $\bar{y}$ is the mean value of $y$ for the $N$ calibration data.
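The six quantities listed above translate directly into a few lines of code. The following Python sketch is an illustration only and is not part of the original text; the function names `least_squares` and `result_std_dev` are hypothetical, and the sample data are assumed to be the five isooctane standards of Table a1-6. Because the code carries full precision throughout, its results can differ in the last digit from values worked by hand with rounded intermediates.

```python
import math


def least_squares(x, y):
    """Fit y = m*x + b and return the quantities of Equations a1-29 to a1-36."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    # Sums of squares about the means (Equations a1-29 to a1-31).
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    m = s_xy / s_xx                    # slope, Equation a1-32
    b = sum_y / n - m * sum_x / n      # intercept, Equation a1-33
    # max() guards against a tiny negative value caused by round-off.
    s_y = math.sqrt(max(s_yy - m ** 2 * s_xx, 0.0) / (n - 2))   # a1-34
    s_m = s_y / math.sqrt(s_xx)                                 # a1-35
    s_b = s_y * math.sqrt(sum_x2 / (n * sum_x2 - sum_x ** 2))   # a1-36
    return {"n": n, "m": m, "b": b, "s_y": s_y, "s_m": s_m, "s_b": s_b,
            "s_xx": s_xx, "y_mean": sum_y / n}


def result_std_dev(fit, y_c, replicates):
    """Standard deviation s_c of a result read from the line (Equation a1-37)."""
    term = (y_c - fit["y_mean"]) ** 2 / (fit["m"] ** 2 * fit["s_xx"])
    return fit["s_y"] / fit["m"] * math.sqrt(1 / replicates + 1 / fit["n"] + term)


# Assumed sample data: mole percent isooctane (x) and peak area (y).
x = [0.352, 0.803, 1.08, 1.38, 1.75]
y = [1.09, 1.78, 2.60, 3.03, 4.01]
fit = least_squares(x, y)
print(f"y = {fit['m']:.2f}x + {fit['b']:.2f}")
print(f"s_y = {fit['s_y']:.2f}, s_m = {fit['s_m']:.2f}, s_b = {fit['s_b']:.2f}")
```

Calling `result_std_dev(fit, 2.65, 1)` or `result_std_dev(fit, 2.65, 4)` then estimates the uncertainty of a concentration read from the line for a single measurement or for the mean of four. Returning the fit as a dictionary is simply a convenience for passing the intermediate sums to Equation a1-37.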
EXAMPLE a1-12
Carry out a least-squares analysis of the experimental data provided in the first two columns in Table a1-6 and plotted in Figure a1-6.

Columns 3, 4, and 5 of the table contain computed values for $x_i^2$, $y_i^2$, and $x_i y_i$; their sums appear as the last entry of each column. Note that the number of digits carried in the computed values should be the maximum allowed by the calculator or computer, and rounding should not be performed until the calculation is complete.

TABLE a1-6 Calibration Data for a Chromatographic Method for the Determination of Isooctane in a Hydrocarbon Mixture

Mole Percent Isooctane, $x_i$   Peak Area, $y_i$   $x_i^2$    $y_i^2$    $x_i y_i$
0.352                           1.09               0.12390    1.1881     0.38368
0.803                           1.78               0.64481    3.1684     1.42934
1.08                            2.60               1.16640    6.7600     2.80800
1.38                            3.03               1.90440    9.1809     4.18140
1.75                            4.01               3.06250    16.0801    7.01750
Sum: 5.365                      12.51              6.90201    36.3775    15.81992

We now substitute into Equations a1-29, a1-30, and a1-31 to obtain

$$S_{xx} = \sum x_i^2 - (\sum x_i)^2/N = 6.90201 - (5.365)^2/5 = 1.14537$$

$$S_{yy} = \sum y_i^2 - (\sum y_i)^2/N = 36.3775 - (12.51)^2/5 = 5.07748$$

$$S_{xy} = \sum x_i y_i - \sum x_i \sum y_i/N = 15.81992 - 5.365 \times 12.51/5 = 2.39669$$

Substituting these quantities into Equations a1-32 and a1-33 yields

$$m = 2.39669/1.14537 = 2.0925 = 2.09$$

$$b = \frac{12.51}{5} - 2.0925 \times \frac{5.365}{5} = 0.2567 = 0.26$$

Thus, the equation for the least-squares line is

$$y = 2.09x + 0.26$$

Substitution into Equation a1-34 yields the standard deviation for the residuals:

$$s_y = \sqrt{\frac{S_{yy} - m^2 S_{xx}}{N - 2}} = \sqrt{\frac{5.07748 - (2.0925)^2 \times 1.14537}{5 - 2}} = 0.14$$

and substitution into Equation a1-35 gives the standard deviation of the slope:

$$s_m = s_y/\sqrt{S_{xx}} = 0.14/\sqrt{1.14537} = 0.13$$

The standard deviation of the intercept is obtained from Equation a1-36. Thus

$$s_b = 0.14 \sqrt{\frac{1}{5 - (5.365)^2/6.90201}} = 0.16$$

EXAMPLE a1-13
The calibration curve derived in Example a1-12 was used for the chromatographic determination of isooctane in a hydrocarbon mixture. A peak area of 2.65 was obtained. Calculate the mole percent of isooctane and the standard deviation for the result if the area was (a) the result of a single measurement and (b) the mean of four measurements.

In either case,

$$x = \frac{y - 0.26}{2.09} = \frac{2.65 - 0.26}{2.09} = 1.14 \text{ mol \%}$$

(a) Substituting into Equation a1-37, we obtain

$$s_c = \frac{0.14}{2.09} \sqrt{\frac{1}{1} + \frac{1}{5} + \frac{(2.65 - 12.51/5)^2}{(2.09)^2 \times 1.145}} = 0.074 \text{ mol \%}$$

(b) For the mean of four measurements,

$$s_c = \frac{0.14}{2.09} \sqrt{\frac{1}{4} + \frac{1}{5} + \frac{(2.65 - 12.51/5)^2}{(2.09)^2 \times 1.145}} = 0.046 \text{ mol \%}$$

a1D QUESTIONS AND PROBLEMS

a1-1
Consider the following sets of data:

A        B       C        D
61.45    3.27    12.06    2.7
61.53    3.26    12.14    2.4
61.32    3.24             2.6
         3.24             2.9
         3.28
         3.23

Calculate:
(a) the mean for each data set, and decide how many degrees of freedom are associated with the calculation of $\bar{x}$;
(b) the absolute standard deviation of each set, and decide how many degrees of freedom are associated with the calculation of $s$;
(c) the standard error of the mean of each set;
(d) the coefficient of variation for the individual data points from the mean.

a1-2
The accepted value for the quantity that provided each of the sets of data in Problem a1-1 is: A 61.71, B 3.28, C 12.23, D 2.75. Calculate:
(a) the absolute error for the mean of each set;
(b) the percent relative error for each mean.

a1-3
A particular method for the analysis of copper yields results that are low by 0.5 mg. What will be the percent relative error due to this source if the weight of copper in a sample is
(a) 25 mg? (b) 100 mg? (c) 250 mg? (d) 500 mg?

a1-4
The method described in Problem a1-3 is to be used to analyze an ore that contains about 4.8% copper. What minimum sample weight should be taken if the relative error due to a 0.5-mg loss is to be smaller than
(a) 0.1%? (b) 0.5%? (c) 0.8%? (d) 1.2%?

a1-5
A certain instrumental technique has a standard deviation of 1.0%. How many replicate measurements are necessary if the standard error of the mean is to be 0.01%?

a1-6
A certain technique is known to have a mean of 0.500 and standard deviation of $1.84 \times 10^{-3}$. It is also known that Gaussian statistics apply. How many replicate determinations are necessary if the standard error of the mean is not to exceed 0.100%?

a1-7
A constant solubility loss of approximately 1.8 mg is associated with a particular method for the determination of chromium in geological samples. A sample containing approximately 18% Cr was analyzed by this method. Predict the relative error (in parts per thousand) in the results due to this systematic error, if the sample taken for analysis weighed 0.400 g.

a1-8
Following are data from a continuing study of calcium ion in the blood plasma of several individuals:

Subject   Mean Calcium Content, mg/100 mL   Number of Observations   Deviation of Individual Results from Mean Values
1         3.16                              5                        0.14, 0.09, 0.06, 0.00, 0.11
2         4.08                              4                        0.07, 0.12, 0.10, 0.01
3         3.75                              5                        0.13, 0.05, 0.08, 0.14, 0.07
4         3.49                              3                        0.10, 0.13, 0.07
5         3.32                              6                        0.07, 0.10, 0.11, 0.03, 0.14, 0.05

(a) Calculate $s$ for each set of values.
(b) Pool the data and calculate $s$ for the analytical method.

a1-9
A method for determining the particulate lead content of air samples is based upon drawing a measured quantity of air through a filter and performing the analysis on circles cut from the filter. Calculate the individual values for $s$ as well as a pooled value for the accompanying data.

Sample   µg Pb/m³ Air
1        1.5, 1.2, 1.3
2        2.0, 2.3, 2.3, 2.2
3        1.8, 1.7, 1.4, 1.6
4        1.6, 1.3, 1.2, 1.5, 1.6

a1-10
Estimate the absolute standard deviation and the coefficient of variation for the results of the following calculations. Round each result so that it contains only significant digits. The numbers in parentheses are absolute standard deviations.
(a) $y = 6.75(\pm 0.03) + 0.843(\pm 0.001) - 7.021(\pm 0.001) = 0.572$
(b) $y = 67.1(\pm 0.3) \times 1.03(\pm 0.02) \times 10^{-17} = 6.9113 \times 10^{-16}$
(c) $y = 243(\pm 1) \times \dfrac{760(\pm 2)}{1.006(\pm 0.006)} = 183578.5$
(d) $y = \dfrac{143(\pm 6) - 64(\pm 3)}{1249(\pm 1) + 77(\pm 8)} = 5.9578 \times 10^{-2}$
(e) $y = \dfrac{1.97(\pm 0.01)}{243(\pm 3)} = 8.106996 \times 10^{-3}$

a1-11
Estimate the absolute standard deviation and the coefficient of variation for the results of the following calculations. Round each result to include only significant figures. The numbers in parentheses are absolute standard deviations.
(a) $y = -1.02(\pm 0.02) \times 10^{-7} - 3.54(\pm 0.2) \times 10^{-8} = -1.374 \times 10^{-7}$
(b) $y = 100.20(\pm 0.08) - 99.62(\pm 0.06) + 0.200(\pm 0.004) = 0.780$
(c) $y = 0.0010(\pm 0.0005) \times 18.10(\pm 0.02) \times 200(\pm 1) = 3.62$
(d) $y = [33.33(\pm 0.03)]^3 = 37025.927$
(e) $y = \dfrac{1.73(\pm 0.03) \times 10^{\ldots}}{1.63(\pm 0.04) \times 10^{\ldots}} = 106.1349693$

a1-12
Based on extensive past experience, it is known that the standard deviation for an analytical method for gold in sea water is 0.025 ppb. Calculate the 99% confidence limit for an analysis using this method, based on
(a) a single measurement.
(b) three measurements.
(c) five measurements.

a1-13
An established method of analysis for chlorinated hydrocarbons in air samples has a standard deviation of 0.030 ppm.
(a) Calculate the 95% confidence limit for the mean of four measurements obtained by this method.
(b) How many measurements should be made if the 95% confidence limit is to be ±0.017?

a1-14
The standard deviation in a method for the analysis of carbon monoxide in automotive exhaust gases has been found, on the basis of extensive past experience, to be 0.80 ppm.
(a) Estimate the 90% confidence limit for a triplicate analysis.
(b) How many measurements would be needed for the 90% confidence limit for the set to be 0.50 ppm?

a1-15
The certified percentage of nickel in a particular NIST reference steel sample is 1.12%. A new spectrometric method for the determination of nickel produced the following percentages: 1.10, 1.08, 1.09, 1.12, 1.09. Is there an indication of bias in the method at the 95% level?

a1-16
A titrimetric method for the determination of calcium in limestone was tested by analysis of an NIST limestone containing 30.15% CaO. The mean result of four analyses was 30.26% CaO, with a standard deviation of 0.085%. By pooling data from several analyses, it was established that $s \to \sigma = 0.094\%$ CaO.
(a) Do the data indicate the presence of a determinate error at the 95% confidence level?
(b) Do the data indicate the presence of a determinate error at the 95% confidence level if no pooled value for $\sigma$ was available?

a1-17
In order to test the quality of the work of a commercial laboratory, duplicate analyses of a purified benzoic acid (68.8% C, 4.953% H) sample were requested. It is assumed that the relative standard deviation of the method is $s_r \to \sigma_r = 4$ ppt for carbon and 6 ppt for hydrogen. The means of the reported results are 68.5% C and 4.882% H. At the 95% confidence level, is there any indication of determinate error in either analysis?

a1-18
The diameter of a sphere has been found to be 2.15 cm, and the standard deviation associated with the mean is 0.02 cm. What is the best estimate of the volume of the sphere, and what is the standard deviation associated with the volume?

a1-19
A given pH meter can be read with a standard deviation of ±0.01 pH units throughout the range 2 to 12. Calculate the standard deviation of $[\text{H}_3\text{O}^+]$ at each end of this range.

a1-20
A solution is prepared by weighing 5.0000 g of compound X into a 100-mL volumetric flask. The balance could be used with a precision of 0.2 mg reported as a standard deviation, and the volumetric flask could be filled with a precision of 0.15 mL, also reported as a standard deviation. What is the estimated standard deviation of concentration in g/mL?

a1-21
Estimate the absolute standard deviation in the result derived from the following operations (the numbers in parentheses are absolute standard deviations for the numbers they follow). Report the result to the appropriate number of significant figures.
(a) $x = \log 878(\pm 4) = 2.94349$
(b) $x = \log 0.4957(\pm 0.0004) = -0.30478$
(c) $p = \text{antilog}\ 3.64(\pm 0.01) = 4365.16$
(d) $p = \text{antilog}\ {-7.191}(\pm 0.002) = 6.44169 \times 10^{-8}$

a1-22
The sulfate ion concentration in natural water can be determined by measuring the turbidity that results when an excess of BaCl₂ is added to a measured quantity of the sample. A turbidimeter, the instrument used for this analysis, was calibrated with a series of standard Na₂SO₄ solutions. The following data were obtained in the calibration:

mg SO₄²⁻/L, $c_x$   Turbidimeter Reading, $R$
0.00                0.06
5.00                1.48
10.00               2.28
15.0                3.98
20.0                4.61

Assume that a linear relationship exists between the instrument reading and concentration.
(a) Plot the data and draw a straight line through the points by eye.
(b) Derive a least-squares equation for the relationship between the variables.
(c) Compare the straight line from the relationship derived in (b) with that in (a).
(d) Calculate the standard deviation for the slope and the intercept for the least-squares line.
(e) Calculate the concentration of sulfate in a sample yielding a turbidimeter reading of 3.67. Calculate the absolute standard deviation of the result and the coefficient of variation.
(f) Repeat the calculations in (e) assuming that the 3.67 was a mean of six turbidimeter readings.

a1-23
The following data were obtained in calibrating a calcium ion electrode for the determination of pCa. A linear relationship between the potential $E$ and pCa is known to exist.

pCa     $E$, mV
5.00    -53.8
4.00    -27.7
3.00    +2.7
2.00    +31.9
1.00    +65.1

(a) Plot the data and draw a line through the points by eye.
(b) Derive a least-squares expression for the best straight line through the points. Plot this line.
(c) Calculate the standard deviation for the slope and the intercept of the least-squares line.
(d) Calculate the pCa of a serum solution in which the electrode potential was 20.3 mV. Calculate the absolute and relative standard deviations for pCa if the result was from a single voltage measurement.
(e) Calculate the absolute and relative standard deviations for pCa if the millivolt reading in (d) was the mean of two replicate measurements. Repeat the calculation based upon the mean of eight measurements.
(f) Calculate the molar calcium ion concentration for the sample described in (d).
(g) Calculate the absolute and relative standard deviations in the calcium ion concentration if the measurement was performed as described in (e).

a1-24
The following are relative peak areas for chromatograms of standard solutions of methyl vinyl ketone (MVK).

Concentration MVK, mmol/L   Relative Peak Area
0.500                       3.76
1.50                        9.16
2.50                        15.03
3.50                        20.42
4.50                        25.33
5.50                        31.97

(a) Derive a least-squares expression assuming the variables bear a linear relationship to one another.
(b) Plot the least-squares line as well as the experimental points.
(c) Calculate the standard deviation of the slope and intercept of the line.
(d) Two samples containing MVK yielded relative peak areas of 6.3 and 27.5. Calculate the concentration of MVK in each solution.
(e) Assume that the results in (d) represent a single measurement as well as the mean of four measurements. Calculate the respective absolute and relative standard deviations.