Environ. Sci. Technol. 1994, 28, 459-465

Group Contribution Method for Predicting Probability and Rate of Aerobic Biodegradation

Robert S. Boethling,† Philip H. Howard,*,‡ William Meylan,‡ William Stiteler,‡ Julie Beauman,‡ and Nestor Tirado†

Office of Pollution Prevention and Toxics 7406, U.S. Environmental Protection Agency, 401 M Street SW, Washington, D.C. 20460, and Syracuse Research Corporation, Merrill Lane, Syracuse, New York 13210

Two independent training sets were used to develop four mathematical models for predicting aerobic biodegradability from chemical structure. All four models are based on multiple regressions against counts of 36 preselected chemical substructures plus molecular weight. Two of the models, based on linear and nonlinear regressions, calculate the probability of rapid biodegradation and can be used to classify chemicals as rapidly or not rapidly biodegradable. The training set for these models consisted of qualitative summary evaluations of all available experimental data on biodegradability for 295 chemicals. The other two models allow semi-quantitative prediction of primary and ultimate biodegradation rates using multiple linear regression. The training set for these models consisted of estimates of primary and ultimate biodegradation rates for 200 chemicals, gathered in a survey of 17 biodegradation experts. The two probability models correctly classified 90% of the chemicals in their training set, whereas the two survey models calculated biodegradation rates for the survey chemicals with R² > 0.7. These four models are intended for use in chemical screening and in setting priorities for further review.

Introduction

Chemical scoring systems for identifying substances of priority concern have proliferated in concert with environmental legislation.
Much of the early impetus for developing such systems derived from the need to review Premanufacture Notifications (PMNs) under Section 5 of the Toxic Substances Control Act (TSCA) and from the TSCA-mandated screening of existing chemicals by the U.S. Interagency Testing Committee. However, virtually every EPA program is now involved in chemical scoring in one way or another. Examples include the reportable quantity (RQ) adjustment methodology under the Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA; Superfund); the Superfund Hazard Ranking System; the Office of Pesticide Programs' Inerts Ranking Program for "inert" components of pesticide formulations; and methodologies for listing chemicals on the Toxics Release Inventory (TRI). This list is by no means exhaustive. The characteristics of ranking systems vary, but the majority include explicit consideration of persistence, bioconcentration potential, and aquatic and human toxicity. For the majority of organic chemicals released to soil and water, persistence is primarily a function of biodegradability. This creates a problem for priority-setting exercises, because experimental biodegradation data are typically either lacking entirely or do not exist in a form that can be easily incorporated into automated screening methods.

† U.S. Environmental Protection Agency.
‡ Syracuse Research Corp.

0013-936X/94/0928-0459$04.50/0 © 1994 American Chemical Society

Because biodegradation data are highly variable, we responded to this need by first developing a weight-of-evidence procedure (1) for collecting and evaluating the available data. The data and evaluations were then made available on-line in BIODEG, a component of the Environmental Fate Data Base (2, 3) that now contains information on more than 800 discrete organics.
The records for each chemical in the BIODEG file constitute a comprehensive assessment of the experimental mixed-culture biodegradation data that exist for the chemical; pure-culture data are not included because they offer little insight into environmental biodegradation rates. Each test result, whether it reflects biochemical oxygen demand (BOD), CO2 production, loss of parent compound, or something else, is assigned a qualitative descriptor code such as BR (biodegrades rapidly) or BSA (biodegrades slowly even with acclimation). Aspects of biodegradation such as acclimation, microbial toxicity, and temperature are considered in the evaluation process. A reliability code (3, one test available; 2, two tests available; 1, three or more consistent tests available), which reflects the amount and consistency of the available data, is also assigned to each biodegradation summary code. There are summary evaluation codes for overall aerobic biodegradation, aerobic biodegradation in screening tests, biological treatment simulations, grab sample tests with soil or water, and field studies.

Subsequently, the summary evaluation codes for overall aerobic biodegradation were used to develop two models for predicting aerobic biodegradability from chemical substructures (4). These models, based on multiple linear and nonlinear regression against counts of 35 preselected substructures, calculated the probability of rapid biodegradation and correctly classified as rapidly or not rapidly biodegradable 90% of the 264 chemicals in the training set and of the 27 chemicals in an independent validation set. Klopman et al. (5) have also used this data set to develop a predictive model based on the computer-automated structure evaluation (CASE) methodology, and Gombar and Enslein (6) have described models for subsets (aliphatic and aromatic chemicals) of the BIODEG data. Although these models are based on carefully evaluated experimental data, their capabilities are limited to classification.
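The reliability codes described above follow a simple counting rule, sketched below as a small Python function. The helper name is hypothetical and is not part of BIODEG; because the text does not say which code a set of three or more inconsistent tests receives, the sketch leaves that case unassigned rather than guessing.

```python
from typing import Optional


def reliability_code(n_tests: int, consistent: bool = True) -> Optional[int]:
    """Hypothetical helper mirroring the BIODEG reliability codes.

    3 = one test available; 2 = two tests available;
    1 = three or more consistent tests available.
    """
    if n_tests < 1:
        raise ValueError("at least one test result is required")
    if n_tests == 1:
        return 3
    if n_tests == 2:
        return 2
    # Three or more tests earn code 1 only when they agree. The source does
    # not specify the code for inconsistent sets, so None marks "unassigned".
    return 1 if consistent else None


print(reliability_code(1), reliability_code(2), reliability_code(4))  # 3 2 1
```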
To provide a consistent set of data for quantitative modeling and to determine the feasibility of a biodegradation expert system, we conducted a survey in 1986 in which 22 biodegradation experts were asked to estimate rates and products of degradation for 50 organic chemicals (7). A screening-level model for predicting aerobic biodegradability was developed from the survey data (8), but the usefulness of that data set was limited by its small size.

In this paper, we describe four new screening-level biodegradability models. Two of these are enhancements of our previously described (4) linear and nonlinear BIODEG models, i.e., those based on experimental data. The other two models are based on data from a new and greatly expanded survey, in which a panel of 17 experts estimated rates of primary (loss of parent chemical identity) and ultimate (essentially, conversion to CO2 and water) degradation under aerobic conditions in aquatic environments for 200 chemicals. These models permit semi-quantitative prediction of aquatic biodegradation rates. The independent variables for all four models are a revised set of 36 chemical substructures from the original linear and nonlinear models (4) plus molecular weight. With the addition of molecular weight, predictions are possible for all chemicals, even those that contain none of the 36 structural fragments. More importantly, the successful fitting of a single set of chemical substructures to both the evaluated (BIODEG) and the survey data sets affirms the importance of these substructures in estimating biodegradability.

Methods

Biodegradation Database. Summary evaluation codes for overall aerobic biodegradation, used in the development of the linear and nonlinear BIODEG models, were retrieved from BIODEG, the Evaluated Biodegradation Database. The design and development of this file were described in detail in a previous publication (1) and briefly in the Introduction.
BIODEG is a component of the Environmental Fate Data Base (EFDB), available on-line or in a PC-compatible format from Syracuse Research Corp. (contact P. H. Howard for details).

Linear and Nonlinear Models. The basic approach in the development of the linear and nonlinear BIODEG models has been described (4). For this exercise, the training set and the independent validation set from the earlier work were combined to yield a new training set of 295 chemicals. Because the approach had already been validated, it was not considered necessary to keep a separate validation set. This data set consisted of 186 chemicals that received summary evaluations of "biodegrades rapidly" and 109 chemicals designated "does not biodegrade rapidly". An indicator variable was formed by assigning a value of 1 to chemicals in the rapid biodegradation category and a value of 0 to chemicals in the slow biodegradation category. The indicator variable was then used as the dependent variable in multiple linear and nonlinear regressions against 37 independent variables (36 fragment counts plus molecular weight). With this definition of the dependent variable, a regression model estimates the probability that a chemical is in the "biodegrades rapidly" group.

In our previously described (4) linear and nonlinear models, counts of 35 structural fragments (i.e., the number of times a substructure occurs in the molecule) constituted the independent variables. For this study, several changes were made in the set of independent variables used in the regression analyses. Two new fragments (trifluoromethyl, -CF3, and the unsubstituted phenyl group, -C6H5) were added, and three fragments were redefined. The latter include the quaternary carbon and tertiary alcohol fragments of the earlier models (4), which have now been eliminated and replaced with a single fragment (carbon with four single bonds and no hydrogens), and the unsubstituted linear alkyl chain LC4, which can now be used only if it is terminal (i.e., -CH2CH2CH2CH3).
Finally, molecular weight was added as a continuous variable, since it is well known that biodegradability generally decreases as molecular size increases (9, 10). In general, atoms were used only once; that is, if an atom is part of one fragment, it cannot be part of another. Table 1 lists these fragments and their regression-derived coefficients.

The linear model was defined as

$$Y_j = a_0 + a_1 f_1 + a_2 f_2 + \cdots + a_{36} f_{36} + a_m M_j + e_j \qquad (1)$$

where $Y_j$ is the probability that chemical j will biodegrade fast (or, for the survey models, the primary or ultimate biodegradation rate), $f_n$ is the number of times the nth substructure occurs in chemical j, $a_0$ is the intercept, $a_n$ is the regression coefficient for the nth substructure, $M_j$ is the molecular weight, $a_m$ is the regression coefficient for $M_j$, and $e_j$ is the error term (mean value zero). Regression coefficients were estimated by the method of least squares, using the REG procedure of the PC version of the Statistical Analysis System (SAS Institute, Cary, NC). Although the assumption of homogeneous variance does not hold when the dependent variable is defined as above, the least-squares method still yields unbiased estimates.

The logistic equation was used as the basis for the nonlinear model:

$$Y_j = \frac{\exp(a_0 + a_1 f_1 + a_2 f_2 + \cdots + a_{36} f_{36} + a_m M_j)}{1 + \exp(a_0 + a_1 f_1 + a_2 f_2 + \cdots + a_{36} f_{36} + a_m M_j)} \qquad (2)$$

This model estimates probabilities near 0.0 when the linear combination in the exponent takes large negative values, near 0.5 when that linear combination is near 0.0, and close to 1.0 when the linear combination takes large positive values. The maximum likelihood method, rather than the method of least squares, was used to estimate the coefficients for this model, because the model is not a linear function of the unknown coefficients. The estimates were obtained with the CATMOD procedure of PC-SAS. For each of the estimated regression coefficients, a standard error was computed, as well as a test statistic for evaluating the hypothesis that the true population value is 0.0.
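Eqs 1 and 2 can be illustrated with a short NumPy sketch. It fits eq 1 by ordinary least squares to a 0/1 indicator (a linear probability model) and evaluates eq 2 for a given coefficient vector. The toy data and function names are hypothetical, `numpy.linalg.lstsq` stands in for the SAS REG procedure, and the maximum-likelihood fitting of eq 2 (done with CATMOD in this work) is not shown.

```python
import numpy as np


def design_matrix(fragment_counts, mol_weights):
    """Columns follow eq 1: intercept, f_1 ... f_n, molecular weight M_j."""
    counts = np.asarray(fragment_counts, dtype=float)
    mw = np.asarray(mol_weights, dtype=float).reshape(-1, 1)
    intercept = np.ones((counts.shape[0], 1))
    return np.hstack([intercept, counts, mw])


def fit_linear_model(fragment_counts, mol_weights, y):
    """Ordinary least-squares estimates [a_0, a_1 ... a_n, a_m] for eq 1."""
    X = design_matrix(fragment_counts, mol_weights)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


def linear_combination(coef, counts, mol_weight):
    """a_0 + sum(a_n * f_n) + a_m * M_j, i.e. eq 1 without the error term."""
    return coef[0] + float(np.dot(coef[1:-1], counts)) + coef[-1] * mol_weight


def logistic_probability(coef, counts, mol_weight):
    """Eq 2, exp(z) / (1 + exp(z)), written as the equivalent 1 / (1 + exp(-z))."""
    z = linear_combination(coef, counts, mol_weight)
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical toy data: two fragment counts per chemical, molecular weight,
# and the 0/1 indicator (1 = "biodegrades rapidly").
counts = [[1, 0], [2, 0], [0, 1], [0, 2], [1, 1], [0, 0]]
mw = [100, 150, 200, 250, 180, 120]
rapid = [1, 1, 0, 0, 1, 1]

coef = fit_linear_model(counts, mw, rapid)
fitted = design_matrix(counts, mw) @ coef
print(np.round(fitted, 3))
```

Because eq 1 is fitted directly to a 0/1 variable, nothing constrains its fitted values to lie between 0 and 1, whereas eq 2 maps any linear combination into the interval (0, 1).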
The test statistic followed an asymptotic χ² distribution for the maximum likelihood estimates (nonlinear model) and an F distribution for the least-squares estimates (linear model). A p value was also calculated for each test statistic. These p values and statistics are not included in Table 1 for clarity, but are available from the authors. The standard errors and the test statistics (or their p values) were used only as an approximate indication of the contribution of a particular fragment, rather than as a basis for eliminating the fragment from the model. We took this approach because the objective was not to determine the most parsimonious subset of fragments for predicting biodegradation status but to keep the model as broadly applicable as possible. As a result, there are collinearities among some of the fragments that could affect the accuracy of some of the p values computed for the test statistics.

Table 1. Structural Fragments and Coefficients

                                            BIODEG models                 survey models
fragment or parameter                       freq(a) linear     nonlinear  freq(a) primary   ultimate
equation constant                           295     0.748      3.01       200     3.848     3.199
MW                                                  -0.000476  -0.0142            -0.00144  -0.00221
unsubstituted aromatic (≤3 rings)           2       0.319      7.191      1       -0.343    -0.586
phosphate ester                             5       0.314      44.409     6       0.465     0.154
cyanide/nitrile (C≡N)                       5       0.307      4.644      11      -0.065    -0.082
aldehyde (CHO)                              4       0.285      7.180      5       0.197     0.022
amide (C(=O)N or C(=S)N)                    9       0.210      2.691      13      0.205     -0.054
aromatic acid (C(=O)OH)                     24      0.177      2.422      6       0.0078    0.088
ester (C(=O)OC)                             23      0.174      4.080      25      0.229     0.140
aliphatic OH                                34      0.159      1.118      18      0.129     0.160
aliphatic NH2 or NH                         13      0.154      1.110      7       0.043     0.024
aromatic ether                              11      0.132      2.248      11      0.077     -0.058
unsubstituted phenyl group (C6H5)           25      0.128      1.799      22      0.0049    0.022
aromatic OH                                 46      0.116      0.909      21      0.040     0.056
linear C4 terminal alkyl (CH2CH2CH2CH3)     44      0.108      1.844      26      0.269     0.298
aliphatic sulfonic acid or salt             4       0.108      6.833      4       0.177     0.193
carbamate                                   4       0.080      1.009      6       0.194     -0.047
aliphatic acid (C(=O)OH)                    33      0.073      0.643      10      0.386     0.365
alkyl substituent on aromatic ring          36      0.055      0.577      36      -0.069    -0.075
triazine ring                               5       0.0095     -5.725     4       -0.058    -0.246
ketone (CC(=O)C)                            12      0.0068     -0.453     10      -0.022    -0.023
aromatic F                                  1       -0.810     -10.532    1       0.135     -0.407
aromatic I                                  2       -0.759     -10.003    2       -0.127    -0.045
polycyclic aromatic hydrocarbon (≥4 rings)  6       -0.657     -10.164    2       -0.702    -0.799
N-nitroso (NN=O)                            4       -0.525     -3.259     1       0.019     -0.385
trifluoromethyl (CF3)                       1       -0.520     -5.670     2       -0.274    -0.513
aliphatic ether                             11      -0.305     -3.429     16      -0.0097   -0.0087
aromatic NO2                                14      -0.347     -2.509     13      -0.108    -0.170
azo group (N=N)                             2       -0.242     -8.219     3       -0.053    -0.300
aromatic NH2 or NH                          32      -0.234     -1.907     23      -0.108    -0.135
aromatic sulfonic acid or salt              11      -0.224     -1.028     8       0.022     0.142
tertiary amine                              10      -0.205     -2.223     10      -0.288    -0.255
carbon with 4 single bonds and no H         9       -0.184     -1.723     32      -0.153    -0.212
aromatic Cl                                 40      -0.182     -2.016     27      -0.165    -0.207
pyridine ring                               18      -0.155     -1.638     8       -0.019    -0.214
aliphatic Cl                                12      -0.111     -1.853     14      -0.101    -0.173
aromatic Br                                 5       -0.110     -1.678     4       -0.154    -0.136
aliphatic Br                                5       -0.046     -4.443     2       0.035     0.029

(a) Number of compounds in the training set containing the fragment.

Biodegradation Survey. Information relating to the purpose, design, and implementation of an earlier survey of expert knowledge has been published (7). For this study, we developed a larger database of biodegradability estimates by conducting a second survey in which 17 experts evaluated 200 organic chemicals (there were 50 chemicals in the first survey). Each expert rated the primary and ultimate biodegradability of each chemical on a semi-quantitative scale, which used the terms hours, days, weeks, months, and longer than months to
indicate the approximate time they thought would be required for the process to proceed to completion. As the measure of central tendency, we calculated an arithmetic mean score for each chemical after assigning numerical scores to the individual responses as follows: 5 = hours; 4 = days; 3 = weeks; 2 = months; 1 = longer. The total number of responses for each chemical often exceeded 17, since many experts indicated a range of time by marking more than one term.

The 200 survey chemicals covered a very wide range of structure and molecular weight, and the majority were multifunctional. In general, chemicals were selected for the survey for the specific purpose of testing hypotheses regarding the effects of certain substructures on estimated biodegradability. Some examples follow. To explore postulated negative influences on estimated biodegradability, 50 of the 200 chemicals were halogenated, 17 had nitro groups, 18 had quaternary carbon atoms (defined as four single bonds to non-hydrogen atoms), 20 had three or more fused rings, and 35 had nitrogen-containing heterocycles of various types. With respect to expected positive influences on estimated biodegradability, 56 chemicals were biologically hydrolyzable or postulated to be so, and 15 chemicals had unsubstituted linear alkyl chains of C4 or larger. Of the 200 chemicals in the survey and 295 in the experimental (BIODEG) data set, only 20 were common to both sets.

Survey Models. Multiple linear regressions were performed using the mean scores for primary and ultimate biodegradation as dependent variables. The independent variables were the same as those used in the linear and nonlinear BIODEG models just described, i.e., counts of 36 structural fragments plus molecular weight. The fragments and regression-derived coefficients are listed in Table 1. Regression coefficients were estimated by the method of least squares, using the REG procedure of PC-SAS.
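The survey scoring rule described under Biodegradation Survey can be sketched in Python. The function name and the response format (one list of marked terms per expert) are assumptions made for illustration; each marked term is counted as a separate response, which is consistent with the observation that a chemical often received more than 17 responses.

```python
from statistics import mean

# Numerical scores assigned to the survey descriptors.
DESCRIPTOR_SCORES = {"hours": 5, "days": 4, "weeks": 3, "months": 2, "longer": 1}


def mean_survey_score(expert_responses):
    """Arithmetic mean over all marked terms for one chemical.

    expert_responses holds one list of marked terms per expert; an expert who
    marks a range (e.g. ["days", "weeks"]) contributes one score per term.
    """
    scores = [DESCRIPTOR_SCORES[term]
              for marked in expert_responses
              for term in marked]
    return mean(scores)


# Hypothetical responses from three experts for one chemical.
example = [["days"], ["days", "weeks"], ["weeks"]]
print(mean_survey_score(example))  # 3.5
```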
Primary or ultimate biodegradability is calculated for any chemical by multiplying the number of times (if any) each fragment occurs in the chemical by that fragment's coefficient, summing these products over all fragments present, and adding to this sum a constant determined for the entire training set plus the product of the chemical's molecular weight and the molecular weight coefficient ($a_m$). A file that lists the 200 survey chemicals, their predicted primary and ultimate biodegradation scores, and their CAS registry numbers is available from the authors.

Results

Biodegradation Survey. Table 2 contains summary statistics for the 200 survey chemicals and the experts' responses.

Table 2. Summary Statistics for Survey Data

                           minimum                                        maximum
parameter        mean      score   chemical                               score   chemical
primary          3.52      2.37    pentabromoethylbenzene                 4.57    ethylene glycol diacetate
SD (primary)     0.84      0.51    ethylene glycol diacetate              1.28    Vat Blue 4; ethylenediaminetetrakis(methylphosphonic acid)
ultimate         2.60      1.44    pentabromoethylbenzene                 3.89    ethylene glycol diacetate
SD (ultimate)    0.83      0.58    ethylene glycol diacetate; picloram;   1.15    maleic hydrazide
                                   pentabromoethylbenzene
pri - ult (a)    0.92      0.43    ε-caprolactone                         1.75    dacthal
MW               228.6     53.1    acrylonitrile                          697.6   tris(2,3-dibromopropyl) phosphate

(a) Primary degradation score minus ultimate degradation score for the same chemical.

Ethylene glycol diacetate was judged to be the most easily degraded chemical and pentabromoethylbenzene the least degradable, for both primary and ultimate degradation. The highest and lowest possible mean scores are 5 and 1, respectively, but neither value was observed for any of the 200 chemicals; either would have required the unanimous judgment of all 17 experts that biodegradation would occur in hours (= 5, by definition) or take longer than months (= 1, by definition). Using the standard deviation of the responses for a given chemical as a measure of agreement or disagreement, unanimity of judgment was also greatest for these two chemicals.
In contrast, the largest standard deviations were observed for Vat Blue 4 and ethylenediaminetetrakis(methylphosphonic acid) (primary) and maleic hydrazide (ultimate). On average, the primary and ultimate degradation scores for each chemical differed by almost 1 unit (0.92) on the ordinal scale used to score the experts' estimates. The largest difference (1.75) was observed for dacthal, a tetrachlorinated herbicide with two ester functions that were considered to be relatively easily hydrolyzed.

Biodegradation Models. Coefficients fitted by the regressions for all four models are listed in Table 1. With the BIODEG models, the probability of rapid biodegradation can be predicted by using the linear or nonlinear coefficients from Table 1 with eq 1 or eq 2, respectively. With the survey models, biodegradability can be predicted using the coefficients for primary or ultimate biodegradation in Table 1 and eq 1. To illustrate a typical estimation, we show the calculations for predicting the ultimate biodegradability of o-phenylphenol ($M_j$ = 170) with the survey model. Using Table 1 and eq 1, we have

$$\begin{aligned} Y_j &= \text{equation constant} + (-0.00221)(M_j) + (0.022)(\text{one unsubstituted phenyl group, C}_6\text{H}_5) + (0.056)(\text{one aromatic OH group}) \\ &= 3.199 + (-0.3757) + 0.022 + 0.056 = 2.90 \end{aligned}$$

The mean score for this chemical from the biodegradation experts was 3.08; the integer 3 corresponded to "weeks" in the tabulation of the individual survey responses.

Performance of the BIODEG models in classifying chemicals in their training set is summarized in Table 3. Each model classified the chemicals in its training set with about 90% accuracy overall, but results were slightly better for the nonlinear model. With either model, rapidly degraded chemicals were classified more accurately than slowly degraded chemicals.
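The o-phenylphenol calculation can be written as a short Python function that evaluates eq 1 without its error term. Only the coefficients quoted in the worked example are included; the function and constant names are hypothetical, and the fragment counts are entered by hand here, whereas the PC program described in the Discussion takes a SMILES string or CAS number as input.

```python
def survey_score(constant, mw_coeff, fragment_coeffs, fragment_counts, mol_weight):
    """Eq 1 without the error term: constant + sum(count * coeff) + mw_coeff * MW."""
    total = constant + mw_coeff * mol_weight
    for fragment, count in fragment_counts.items():
        total += count * fragment_coeffs[fragment]
    return total


# Ultimate-degradation survey model values used in the worked example (Table 1).
ULTIMATE_CONSTANT = 3.199
ULTIMATE_MW_COEFF = -0.00221
ULTIMATE_COEFFS = {
    "unsubstituted phenyl group (C6H5)": 0.022,
    "aromatic OH": 0.056,
}

# o-Phenylphenol: one phenyl group, one aromatic OH, molecular weight 170.
o_phenylphenol = {"unsubstituted phenyl group (C6H5)": 1, "aromatic OH": 1}
score = survey_score(ULTIMATE_CONSTANT, ULTIMATE_MW_COEFF, ULTIMATE_COEFFS,
                     o_phenylphenol, 170)
print(round(score, 2))  # 2.9
```

The same function serves the primary survey model and the linear BIODEG model; only the constant and coefficient dictionary change.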
The distributions of residuals from the survey models are shown in Figure 1, along with the R² values and the percentages of residuals with absolute values ≥0.1, ≥0.3, and ≥0.5.

Table 3. Performance of Biodegradability Models in Classifying Chemicals in Their Respective Training Sets

                                   BIODEG models                 survey models
parameter                          linear(a)      nonlinear(a)   primary(b)     ultimate(c)
total correct                      264/295        275/295        165/200        167/200
% correct, total                   89.5           93.2           82.5           83.5
% correct, fast biodegradation     97.3 (181/186) 97.3 (181/186) 84.9 (101/119) 93.5 (101/108)
% correct, slow biodegradation     76.1 (83/109)  86.2 (94/109)  79.0 (64/81)   71.7 (66/92)

(a) Fast biodegradation is defined as a predicted probability >0.5 for being classified as BR (biodegrades rapidly).
(b) Fast biodegradation is defined as a biodegradability score ≥3.5.
(c) Fast biodegradation is defined as a biodegradability score >2.5.

Figure 1. Distribution of residuals from biodegradation survey models. (A, top) Primary degradation model. (B, bottom) Ultimate degradation model. (Horizontal axis: residual.)

The mean residuals (absolute value) for primary and ultimate degradation for all 200 chemicals were 0.173 and 0.206, respectively. There were eight chemicals with residuals ≥0.6 in absolute value, and these are listed in Table 4.

Table 4. Poorly Predicted Survey Chemicals(a)

                                      predicted
chemical                              experts   model   residual
primary degradation
  silvex                              2.82      3.43    -0.60
  2,2,4,4,6,8,8-heptamethylnonane     2.43      3.06    -0.63
  di-tert-butyl dicarbonate           4.05      3.23     0.82
ultimate degradation
  ε-caprolactone                      3.70      3.09     0.61
  n-decanal                           3.80      3.17     0.63
  11-cyanoundecanoic acid             3.68      3.01     0.67
  hexachlorophene                     1.77      1.10     0.67
  ethylene glycol diacetate           3.89      3.16     0.73
  di-tert-butyl dicarbonate           3.18      2.29     0.89

(a) All survey chemicals with residuals (experts' score minus model-predicted score) of ≥0.6 in absolute value.
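The classification criteria in the Table 3 footnotes amount to simple thresholds, and the table entries are tallies of percent correct. The Python sketch below writes both out; the function names are hypothetical, and the demonstration only re-derives the linear BIODEG column of Table 3 from its published counts, not from per-chemical predictions.

```python
def is_rapid_biodeg(probability):
    """BIODEG models (Table 3, footnote a): fast if predicted probability > 0.5."""
    return probability > 0.5


def is_rapid_primary(score):
    """Primary survey model (footnote b): fast if score >= 3.5."""
    return score >= 3.5


def is_rapid_ultimate(score):
    """Ultimate survey model (footnote c): fast if score > 2.5."""
    return score > 2.5


def classification_summary(predicted_fast, observed_fast):
    """Percent correct overall and within the observed-fast and observed-slow groups."""
    pairs = list(zip(predicted_fast, observed_fast))

    def pct(subset):
        return round(100.0 * sum(p == o for p, o in subset) / len(subset), 1)

    fast = [pair for pair in pairs if pair[1]]
    slow = [pair for pair in pairs if not pair[1]]
    return {"total": pct(pairs), "fast": pct(fast), "slow": pct(slow)}


# Rebuild the linear BIODEG column of Table 3 from its counts:
# 181 of 186 fast chemicals and 83 of 109 slow chemicals classified correctly.
predicted = [True] * 181 + [False] * 5 + [False] * 83 + [True] * 26
observed = [True] * 186 + [False] * 109
print(classification_summary(predicted, observed))
# {'total': 89.5, 'fast': 97.3, 'slow': 76.1}
```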
The primary and ultimate survey models calculate a biodegradability score rather than a probability of rapid biodegradation, with integers corresponding to the descriptors (hours, days, weeks, etc.) used in the survey. This makes direct comparison of performance for the two types of models (BIODEG vs survey) difficult. One way to enable such a comparison is to evaluate the performance of the survey models in classifying chemicals in the survey training set. To accomplish this, we defined rapid primary degradation as a biodegradability score of ≥3.5, corresponding to the descriptor days-weeks. For ultimate degradation, we defined rapid biodegradation as a biodegradability score of >2.5, which corresponds to weeks-months. Using these criteria, the performance of the primary and ultimate survey models as classifiers (Table 3) was somewhat below that observed for the BIODEG models, with slightly more than 80% of the survey chemicals classified correctly by each model. As was true for the BIODEG models, rapidly degraded chemicals were classified more accurately than slowly degraded chemicals.

Accuracy of Experts' Estimates. To assess directly the accuracy of the experts' biodegradability estimates, we retrieved and reviewed experimental data for all survey chemicals that also had water grab sample data in the BIODEG database (11-34). Our assessments of the literature data for these 13 chemicals with respect to the approximate length of time required for complete degradation (defined here as six half-lives) are summarized in Table 5. For comparison, the mean survey scores, our interpretation of them relative to the biodegradability descriptors used in the survey, and the calculated (model) values are also presented. The experts' estimates of biodegradability in aquatic environments were generally consistent with the existing experimental data.
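Table 5 supports two small calculations: interpreting a survey score with the scheme in its footnote b, and the mean absolute residual between the survey scores and the model values. The Python sketch below transcribes the Table 5 values; the function names are hypothetical, and `fraction_remaining` assumes first-order decay (not stated in the text) to show that six half-lives leave about 1.6% of the parent compound.

```python
# (chemical, survey score, U or P, survey-model value) transcribed from Table 5.
TABLE5 = [
    ("eicosane", 4.19, "P", 3.98),
    ("dimethylformamide", 4.09, "P", 3.94),
    ("cumene", 3.68, "P", 3.61),
    ("propanil", 3.61, "P", 3.40),
    ("Acid Orange 6", 3.45, "P", 3.47),
    ("chlorothalonil", 2.39, "P", 2.68),
    ("acrylonitrile", 3.27, "U", 3.00),
    ("o-phenylphenol", 3.08, "U", 2.90),
    ("diphenyl ether", 2.79, "U", 2.81),
    ("di-2-cyanoethyl ether", 2.79, "U", 2.76),
    ("tert-butylbenzene", 2.62, "U", 2.72),
    ("hexachlorophene", 1.77, "U", 1.10),
    ("benzanthracene", 1.76, "U", 1.89),
]


def interpret_score(score):
    """Descriptor for a survey score, following the Table 5 footnote b scheme."""
    if score >= 4:
        return "<= days"
    if score >= 3:
        return "days-weeks"
    if score >= 2:
        return "weeks-months"
    return "> months"


def mean_abs_residual(kind):
    """Mean |survey score - model value| for primary ("P") or ultimate ("U") rows."""
    rows = [(survey, model) for _, survey, k, model in TABLE5 if k == kind]
    return sum(abs(s - m) for s, m in rows) / len(rows), len(rows)


def fraction_remaining(half_lives):
    """Parent fraction left after a number of half-lives (assumes first-order decay)."""
    return 0.5 ** half_lives


for kind in ("P", "U"):
    mae, n = mean_abs_residual(kind)
    print(kind, n, round(mae, 2))
# P 6 0.16
# U 7 0.2
print(interpret_score(2.39))  # weeks-months (chlorothalonil)
print(fraction_remaining(6))  # 0.015625
```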
Biodegradability scores calculated using the survey models (last column of Table 5) also tracked well with the experts' estimates, with mean residuals (absolute value) for these chemicals of 0.16 for primary degradation (n = 6) and 0.20 for ultimate degradation (n = 7).

Discussion

Our results demonstrate that a single set of chemical substructures plus molecular weight allows acceptably accurate prediction of both experimentally determined biodegradability, as reflected in the BIODEG evaluation codes, and experts' estimates of primary and ultimate biodegradation rates. This finding lends credence to the notion that these factors are important determinants of biodegradability. It also validates expert judgment, as reflected in the survey data and the models based on it. The models thus derived have been encoded in an IBM-compatible PC program (Biodegradation Probability Program, available from Syracuse Research Corp.) that predicts the probability of rapid biodegradation and the time required for primary and ultimate degradation. Only the chemical's SMILES notation (35) or CAS registry number is required as input.

Expert judgment is also validated by direct comparison of survey scores with grab sample biodegradation data (Table 5). Chlorothalonil seems to be an exception: the experts predicted that primary degradation would occur in weeks to months, whereas the experimental data suggest days to weeks. This is an unusual situation, however, since according to Davies (36), primary degradation is much faster than anticipated because the nitrile groups in chlorothalonil direct nucleophilic attack to the 4 and 6 positions on the ring. One lesson is that even the collective wisdom of experts may be in error when applied to specific chemical structures and should not be considered a substitute for adequate testing.

Our previous models (4) included a library of 35 structural fragments in order to make the models as broadly applicable as possible.
However, no predictions could be made for chemicals that did not contain any of these substructures. With the inclusion of the molecular weight parameter, no structures are excluded, although the reliability of predictions based on molecular weight alone is probably fairly low except for chemicals with very low or very high molecular weights. Among the five new or redefined substructures listed in Table 1, at least two also have clear mechanistic significance, since unsubstituted terminal alkyl groups (represented by the linear C4 fragment) and unsubstituted phenyl groups both provide sites for the initiation of well-known biodegradation pathways (20, 37).

The signs of the coefficients for the fragments and parameters listed in Table 1 are generally consistent with commonly accepted generalizations regarding the effects of chemical structure on biodegradability. For example, ester, alcohol, and carboxylic acid groups usually enhance biodegradability (9, 10), and all have positive signs in all four models. On the other hand, halogens, nitro groups, and quaternary carbons are assumed to make a chemical more resistant to degradation, and they generally have negative signs. However, there are also a number of fragments for which the signs differ among the four models. In some cases the coefficients are small in all four models, which suggests that the fragment may not be very important in determining biodegradability; the ketone fragment is an example. For other fragments, the signs of the coefficients are often inconsistent where the BIODEG and survey training sets contained only a few chemicals with that fragment. Confidence in those coefficients is therefore low, but could be raised by additional testing. Examples of fragments for which few data are available include the aromatic F, N-nitroso, and aliphatic Br fragments. Another phenomenon is that the primary and ultimate coefficients (survey models) are sometimes quite different in magnitude. This is to be expected.
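The sign comparisons discussed above can be checked mechanically. The Python sketch below transcribes the Table 1 coefficients for a handful of fragments (the dictionary and helper names are hypothetical) and reports whether the four models agree in sign and how far the primary and ultimate survey coefficients differ.

```python
# Table 1 coefficients for selected fragments: (linear, nonlinear, primary, ultimate).
COEFFICIENTS = {
    "ester (C(=O)OC)": (0.174, 4.080, 0.229, 0.140),
    "aliphatic OH": (0.159, 1.118, 0.129, 0.160),
    "aromatic NO2": (-0.347, -2.509, -0.108, -0.170),
    "ketone (CC(=O)C)": (0.0068, -0.453, -0.022, -0.023),
    "aromatic F": (-0.810, -10.532, 0.135, -0.407),
    "aldehyde (CHO)": (0.285, 7.180, 0.197, 0.022),
    "amide (C(=O)N or C(=S)N)": (0.210, 2.691, 0.205, -0.054),
    "carbamate": (0.080, 1.009, 0.194, -0.047),
}


def signs_agree(coeffs):
    """True when all four models give the fragment the same sign."""
    return all(c > 0 for c in coeffs) or all(c < 0 for c in coeffs)


def primary_minus_ultimate(coeffs):
    """Gap between the primary and ultimate survey-model coefficients."""
    return coeffs[2] - coeffs[3]


for fragment, coeffs in COEFFICIENTS.items():
    print(f"{fragment:28s} same sign: {signs_agree(coeffs)!s:5s} "
          f"primary - ultimate: {primary_minus_ultimate(coeffs):+.3f}")
```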
In the case of the aldehyde, amide, and carbamate fragments, for example, the primary coefficients are much larger than the ultimate coefficients, which suggests that experts consider these fragments likely sites of initial attack, but without major influence on rates of ultimate degradation. Conversely, triazine rings, azo bonds, and pyridine rings, for example, seem to be viewed as negative for ultimate but not necessarily for primary degradation.

Table 5. Comparison of Survey Data to Measured Biodegradability for Survey Chemicals with Water Grab Sample Data

chemical                survey score(a)  n(d)  U or P(e)  ref      model(f)
eicosane                4.19             3     P          11-13    3.98
dimethylformamide       4.09             1     P          14       3.94
cumene                  3.68             1     P          15       3.61
propanil                3.61             5     P          16-18    3.40
Acid Orange 6           3.45             1     P          19       3.47
chlorothalonil          2.39             5     P          20       2.68
acrylonitrile           3.27             2     U          21       3.00
o-phenylphenol          3.08             1     U          22       2.90
diphenyl ether          2.79             1     U          23       2.81
di-2-cyanoethyl ether   2.79             2     U          21, 24   2.76
tert-butylbenzene       2.62             1     U          23       2.72
hexachlorophene         1.77             2     U          25       1.10
benzanthracene          1.76             13    U          26-34    1.89

Interpretations (b, c), in reading order: mo; >mo; >d; wk; wk; mo; wk-mo; mo; mo; >wk; >mo; >d; >d; wk; >wk; >wk; wk-mo; mo; d; d-wk; d-wk; wk; wk; d-wk; >mo; >mo; >mo; >mo; >mo; >mo.

(a) Observed biodegradability score from the survey; the value given is for either ultimate or primary degradation, depending on the type of literature data.
(b) Interpretation of the survey score according to the following scheme (d = days; wk = weeks; mo = months): ≥4 = ≤d; <4 and ≥3 = d-wk; <3 and ≥2 = wk-mo; <2 = >mo.
(c) Interpretation of each study in terms of the approximate time required for complete degradation, defined as six half-lives for primary degradation and 60-70% of theoretical for ultimate degradation, in natural water grab samples.
(d) Number of studies.
(e) U = ultimate; P = primary.
(f) Predicted primary or ultimate degradation using the appropriate survey model.

Close inspection of the residuals from the survey models suggests several ways in which these models could be improved. In some cases the solution is obvious.
For example, the experts assumed that di-tert-butyl dicarbonate (Table 4) would be readily hydrolyzed, but our models lack a carbonate fragment. Alkyl chains represent a more subtle problem. Thirteen compounds in the survey had linear alkyl chains of C9 or longer, and 10 of these had positive residuals for both primary and ultimate degradation. This suggests that the experts viewed long alkyl chains as having a positive impact on biodegradability, but that the linear C4 terminal alkyl fragment does not adequately account for this effect. On the other hand, compounds with cycloalkane rings (six survey compounds) and aromatic rings with two nitrogens (i.e., pyrazines, pyrimidines, and pyridazines; six survey compounds) generally had negative residuals, suggesting that the experts considered these groups to increase resistance to biodegradation. It should be noted that both single- and three-nitrogen aromatics (i.e., pyridines and triazines) are already represented by fragments in our models, and all but one of their coefficients are negative. In spite of this, we did not add new fragments for cycloalkane or two-nitrogen heteroaromatic rings, because our approach required that such chemicals also be adequately represented in the experimental data (BIODEG) training set, and they were not. Additional testing will probably be required to establish an adequate database of measured values. This kind of analysis shows how the models may be used to identify chemical classes in need of testing.

There is no doubt that the fragment constant approach to biodegradability modeling that we have taken is somewhat simplistic; for example, it does not take into account possible interactions among fragments in multifunctional molecules. Nevertheless, the models described above meet our goal of providing quantitative or semi-quantitative estimates of biodegradation rate for use in chemical ranking schemes, in addition to estimates of the probability of rapid biodegradation.
Literature Cited
(1) Howard, P. H.; Hueber, A. E.; Boethling, R. S. Environ. Toxicol. Chem. 1987, 6, 1.
(2) Howard, P. H.; Sage, G. W.; LaMacchia, A.; Colb, A. J. Chem. Inf. Comput. Sci. 1982, 22, 38.
(3) Howard, P. H.; Hueber, A. E.; Mulesky, B. C.; Crisman, J. S.; Meylan, W.; Crosbie, E.; Gray, D. A.; Sage, G. W.; Howard, K. P.; LaMacchia, A.; Boethling, R.; Troast, R. Environ. Toxicol. Chem. 1986, 5, 977.
(4) Howard, P. H.; Boethling, R. S.; Stiteler, W. M.; Meylan, W. M.; Hueber, A. E.; Beauman, J. A.; Larosche, M. E. Environ. Toxicol. Chem. 1992, 11, 593.
(5) Klopman, G.; Balthasar, D. M.; Rosenkranz, H. S. Environ. Toxicol. Chem. 1993, 12, 231.
(6) Gombar, V. K.; Enslein, K. In Applied Multivariate Analysis in SAR and Environmental Studies; Devillers, J., Karcher, W., Eds.; Kluwer: Boston, MA, 1991; pp 377-414.
(7) Boethling, R. S.; Gregg, B.; Frederick, R.; Gabel, N. W.; Campbell, S. E.; Sabljic, A. Ecotoxicol. Environ. Saf. 1989, 18, 252.
(8) Boethling, R. S.; Sabljic, A. Environ. Sci. Technol. 1989, 23, 672.
(9) Alexander, M. Biotechnol. Bioeng. 1973, 15, 611.
(10) Scow, K. M. In Handbook of Chemical Property Estimation Methods; Lyman, W. J., Reehl, W. F., Rosenblatt, D. H., Eds.; McGraw-Hill: New York, 1982; pp 9-1-9-85.
(11) Bertrand, J. C.; Esteves, J. L.; Mulyono, M.; Mille, G. Chemosphere 1986, 15, 205.
(12) Matsumoto, G. Water Res. 1983, 17, 1803.
(13) Walker, J. D.; Calomiris, J. J.; Herbert, T. L.; Colwell, R. R. Mar. Biol. 1976, 34, 1.
(14) Dojlido, J. R. Investigations of Biodegradability and Toxicity of Organic Compounds, Final Report 1975-1979; Environmental Protection Agency: Cincinnati, OH, 1979; EPA-600/2-79-163.
(15) Walker, J. D.; Colwell, R. R. Prog. Water Technol. 1975, 7, 783.
(16) Call, D. J.; Brooke, L. T.; Kent, R. J.; Knuth, M. C.; Anderson, C.; Moriarty, C. Arch. Environ. Contam. Toxicol. 1983, 12, 175.
(17) El-Dib, M. A.; Aly, O. A. Water Res. 1976, 10, 1055.
(18) Paris, D. F.; Rogers, J. E. Appl. Environ. Microbiol. 1986, 51, 221.
(19) Michaels, G. B.; Lewis, D. L. Environ. Toxicol. Chem. 1986, 5, 161.
(20) Gibson, D. T.; Subramanian, V. In Microbial Degradation of Organic Compounds; Gibson, D. T., Ed.; Dekker: New York, 1984; pp 181-252.
(21) Ludzack, F. J.; Schaffer, R. B.; Bloomhuff, R. N.; Ettinger, M. B. Proc. 13th Ind. Waste Conf., Eng. Ext. Bull., Purdue Univ., Eng. Ext. Ser. 1958, 297.
(22) Gonsior, S. J.; Bailey, R. E.; Rhinehart, W. L.; Spence, M. W. J. Agric. Food Chem. 1984, 32, 593.
(23) Ludzack, F. J.; Ettinger, M. B. Eng. Ext. Ser. (Purdue Univ.) 1963, no. 115, 278.
(24) Cherry, A. B.; Gabaccia, A. J.; Senn, H. W. Sewage Ind. Wastes 1956, 28, 1137.
(25) Lee, R. F.; Ryan, C. In Microbial Degradation of Pollutants in Marine Environments; Bourquin, A. W., Pritchard, P. H., Eds.; Environmental Protection Agency: Gulf Breeze, FL, 1979; EPA-600/9-79-012; pp 443-450.
(26) Gardner, W. S.; Lee, R. F.; Tenore, K. R.; Smith, L. W. Water, Air, Soil Pollut. 1978, 11, 339.
(27) Herbes, S. E.; Schwall, L. R. Appl. Environ. Microbiol. 1978, 35, 306.
(28) Herbes, S. E.; Southworth, G. R.; Schaeffer, D. L.; Griest, W. G.; Maskarinec, M. P. In The Scientific Basis of Toxicity Assessment; Witschi, H., Ed.; Elsevier/North-Holland: New York, 1980; pp 113-128.
(29) Herbes, S. E. Appl. Environ. Microbiol. 1981, 41, 20.
(30) Hinga, K. R.; Pilson, M. E. Q.; Lee, R. F.; Farrington, J. W.; Tjessem, K.; Davis, A. C. Environ. Sci. Technol. 1980, 14, 1136.
(31) Lee, R. F. Proc. Oil Spill Conf. (API Publ.) 1977, 4284, 611.
(32) Lee, R. F.; Gardner, W. S.; Anderson, J. W.; Blaylock, J. W.; Barwell-Clarke, J. Environ. Sci. Technol. 1978, 12, 832.
(33) Lee, R. F.; Ryan, C. Can. J. Fish. Aquat. Sci. 1983, 40, 86.
(34) Roubal, G.; Atlas, R. M. Appl. Environ. Microbiol. 1978, 35, 897.
(35) Weininger, D. J. Chem. Inf. Comput. Sci. 1988, 28, 31.
(36) Davies, P. E. Bull. Environ. Contam. Toxicol. 1988, 40, 405.
(37) Britton, L. N. In Microbial Degradation of Organic Compounds; Gibson, D. T., Ed.; Dekker: New York, 1984; pp 89-129.
Received for review June 3, 1993. Revised manuscript received October 25, 1993. Accepted November 1, 1993.
Abstract published in Advance ACS Abstracts, December 15, 1993.