Case Study
Case Study: Biomass Estimation from Beeches
UGARTE, M.D.(*)
(*)Departamento de Estadística e I. O., Universidad Publica de Navarra, Pamplona, Spain
E-MAIL: LOLA@UNAVARRA.ES
Material from the book Probability and Statistics with R by Ugarte, Militino, and Arnholt. Chapman and Hall/CRC, 2008
Brno, December 2007
Lola Ugarte
Introduction
■  Data for this case study come from Gestión Ambiental de Viveros y Repobla-ciones de Navarra and Gobierno de Navarra, 2006.
■  To estimate the amount of carbon dioxide retained in a tree, its biomass needs to be known and multiplied by an expansion factor (there are several alternatives in the literature). To calculate the biomass, specific regression equations by species are frequently used. These regression equations, called allometric equations, estimate the biomass of the tree by means of some known characteristics, typically diameter and/or height of the stem and branches.
Brno, December 2007
1
Lola Ugarte
Introduction
File Data
The biomass file contains data of 42 beeches (Fagus Sylvatica) from a forest of Navarra (Spain) in 2006, where
■  Dn: diameter of the stem in centimeters
■  H: height of the tree in meters
■  PST: weight of the stem in kilograms
■  PSA: aboveground weight in kilograms
Brno, December 2007
2
Lola Ugarte
Regression Analysis with R
■  Make a scatterplot of PSA and Dn. Do you think is it possible to fit a regression line to explain the weight of the beech in terms of the diameter of the stem?
■  Make a scatterplot of log(PSA) and log( Dn). Do you think is it possible to fit a regression line to explain the weight of the beech in terms of the diameter of the stem?
Brno, December 2007
3
Lola Ugarte
Regression Analysis with R
library(PASWR)
attach(biomass)
par(mfrow=c(1,2))
plot (Dn,   PSA)    #   a   clear   non-linear   relationship
abline(lsfit(Dn,PSA),    col=2,    lwd=2)
plot(log(Dn),    log(PSA))       #the   relation   seems   to  be   linear
abline(lsfit(log(Dn),log(PSA)    ),    col=2,    lwd=2)
Brno, December 2007
4
Lola Ugarte
10             20             30             40             50             60             70                                                  2.5                      3.0                      3.5                      4.0
Dn                                                                                                                                        log(Dn)
Figura 1: Scatterplots
Brno, December 2007
5
Lola Ugarte
Regression Analysis with R
■  Fit the regression model
\og(PSÄ) = ßo + ßi\oQ(Dn)
■  Compute R2, R2a, and the residual variance
Brno, December 2007
6
Lola Ugarte
Solution in R
model.DN<-lm(log(PSA)~log(Dn)) > summary(model.DN)
Residuals:
Min IQ Median 3Q Max -0.48510 -0.12682 0.02701 0.10766 0.32104 Coefficients:
Estimate Std. Error t value Pr(>|t|) (Intercept)  -1.5015     0.1920  -7.822 1.38e-09 *** log(Dn)       2.2806     0.0542  42.076  < 2e-16 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1842 on 40 degrees of freedom
Multiple R-Squared: 0.9779,     Adjusted R-squared: 0.9774
F-statistic:  1770 on 1 and 40 DF,  p-value: < 2.2e-16
7 Brno, December 2007                                                                                                                   Lola Ugarte
Regression Analysis with R
■  Introduce H as an explanatory variable and fit the model
log (PSA) = ß0 + A log (Dn) + ß2H
■  Is H statistically significant?
Brno, December 2007
8
Lola Ugarte
Solution in R
> model.DNH<-lm(log(PSA)"log(Dn)+ H )
> summary(model.DNH)
Residuals:
Min IQ Median 3Q Max -0.29861 -0.11093 -0.01903 0.07141 0.38130 Coefficients:
Estimate Std. Error (Intercept) -1.691859 0.167001 log(Dn) 2.185244 0.050682 H                           0.023259   0.005487
Signif.   codes:      0   '***'    0.001   '**'    0.01   '*'    0.05   '.'    0.1   '    '    1 Residual   standard  error:    0.1544   on   39   degrees   of   freedom Multiple   R-Squared:   0.9849,           Adjusted  R-squared:    0.9841
F-statistic:      1270   on   2   and  39   DF,      p-value:   <   2.2e-16 Brno, December 2007                                                                                                 Lola Ugarte
t value Pr (>11|)
-10.131 1.77e-12   ***
43.117  < 2e-16   ***
4.239 0.000133   ***
Regression Analysis with R
■  Estimate the model's parameters and their standard errors. (See again sum-mary(model.DHN))
■  Provide an interpretation for the model's parameters.
Brno, December 2007
10
Lola Ugarte
Interpretation of the model's parameters
The fitted model is:
log (PSA) = -1,691859 + 2,185244 log (Dn) + 0,023259 H ßi can be interpreted as:
z _ %Ay
Pl "    %Ax
For a given H, if the diameter of the stem increases by 1 %, the weight of the stem
increases approximately 2,19%.
For a given diameter, each meter of increase in height produces an increase of
the weight of the stem of approximately 2,33%
Brno, December 2007
11
Lola Ugarte
Regression Analysis with R
Compute the variance-covariance matrix of the ßs
Solution in R
>  vcov(model.DNH)
(Intercept)                  log(Dn)                                  H
(Intercept)      0.0278894670   -0.0062149387   -0.0002463658
log(Dn)              -0.0062149387      0.0025686777   -0.0001234109
H                            -0.0002463658   -0.0001234109      0.0000301036
Brno, December 2007
12
Lola Ugarte
Regression Analysis with R
Provide 95 % confidence intervals for ß\ and /^ Solution in R
>   confint(model.DNH)
2.5   %               97.5   %
(Intercept)   -2.02965082   -1.35406639
log(Dn)                2.08272951      2.28775805
H                               0.01216104      0.03435673
Brno, December 2007
13
Lola Ugarte
Regression Analysis with R
Compute R2, R2a, and the residual variance Solution in R
summary(model.DNH)$r.squared
[1]    0.9848744
summary(model.DNH)$adj.r.squared
[1]    0.9840988
summary(model.DNH)$sigma"2
[1]    0.02383166
Brno, December 2007
14
Lola Ugarte
Regression Analysis with R
Construct a graph with the default diagnostics plots of R Solution in R
win.graph()
par(mfrow=c(2,2),pty="s")
plot(model.DNH)
Brno, December 2007
15
Lola Ugarte
Residuals vs Fitted
Normal Q-Q
-2-10        1        2 Theoretical Quantiles
Scale-Location
Residuals vs Leverage
o	015 013   o       ,0° O             ° o o        o    o
o o	------">   o   ° 8            o   ° co o
~i-------1-------1-------r
5         6         7
Fitted values
Fi gura 2: Diagnostics Plots
Brno, December 2007
16
Lola Ugarte
Regression Analysis with R
Can homogeneity of variance be assumed? Solution in R
library(lmtest) bptest(model.DNH)
studentized  Breusch-Pagan   test
data:      model.DNH
BP   =   7.626,   df   =   2,   p-value   =   0.02208
Brno, December 2007
17
Lola Ugarte
Regression Analysis with R
Do the residuals appear to follow a normal distribution? Solution in R
>   shapiro.test(rstandard(model.DNH))
Shapiro-Wilk   normality  test
data:      rstandard(model.DNH) W  =   0.9569,   p-value   =   0.1146
Brno, December 2007
18
Lola Ugarte
Regression Analysis with R
Are there any outliers in the data? Solution in R
a=model.DNH
win.graph()
plot(rstudent(a),type="n",xlab="",ylab="r_i"*")
text(rstudent(a))
abline(h=qt(0.025,   a$df.residual-1))
abline(h=qt(0.975,a$df.residual-1))
title ("c)    Studentized  Residuals")
Brno, December 2007
19
Lola Ugarte
c) Studentized Residuals
15
10 11
39
16
29
37
21
33
12
23                              38 43l
26   ,*-,  36   *1
3C2 30
20  24  28    3i
22   27 2Í5
17
19
S     42
14
13
~r~
10
~r~
30
20
40
Fi gura 3: Diagnostics Plots
Brno, December 2007
20
Lola Ugarte
Regression Analysis with R
Are there any influential observations in the data? Solution in R
#   Cook   distance a=model.DNH win.graph() par(mfrow=c(2,2)) cd.F<-cooks.distance(a)
plot(cd.F,   ylab="Cooks   Distance",   ylim=c(0,0.8)) iden(cd.F,    a=3)
crit.value<-qf(0.5,   ncol(X),   nrow(X)-ncol(X)) abline(h=crit.value,    lty=2)
Brno, December 2007
21
Lola Ugarte
#  Dffits
dffits.modelF<-dffits(a)
plot(dffits.modelF, ylab="Dffits", ylim=c(-1,1))
iden(dffits.modelF, a=3)
crit.value<-2*sqrt(ncol(X)/nrow(X))
abline(h=c(-crit.value, crit.value), lty=2)
Brno, December 2007
22
Lola Ugarte
-&
1f
i      D                      °° O                      n      00 °
oooo  °°  o     o °ooo00oo° °oo°  O O 00°
10
20
30
40
Figura 4: Dffits
Brno, December 2007
23
Lola Ugarte
#DFbetas
par(mfrow=c(2,2) )
dfbetas.modelF<-dfbetas(a)
plot(dfbetas.modelF[,1], ylab="dfbetas[,2]", ylim=c(-1,1))
iden(dfbetas.modelF[,1], a=3)
crit.value<-2/sqrt(nrow(X))
abline(h=c(-crit.value, crit.value), lty=2)
plot(dfbetas.modelF[,2], ylab="dfbetas[,2]", ylim=c(-1,1))
iden(dfbetas.modelF[,2], a=3)
crit.value<-2/sqrt(nrow(X))
abline(h=c(-crit.value, crit.value), lty=2)
plot(dfbetas.modelF[,3], ylab="dfbetas[,3]", ylim=c(-1,1))
iden(dfbetas.modelF[,3], a=3)
crit.value<-2/sqrt(nrow(X))
abline(h=c(-crit.value, crit.value), lty=2)
24 Brno, December 2007                                                                                                                   Lola Ugarte
^^	
o""5       "*"	
"Vo °°	° Cítř<S%í»)   °OtSS> 0Q>                □
.----------------Q.	
	38
10
20
30
40
ts

A w
~~I—
10
~~I—
20
~~I— 30
40
Figura 5: Dfbetas
Brno, December 2007
25
Lola Ugarte
Predictions
Obtain predictions of the aboveground biomass of trees with diameters Dn = seg(12,5,42,5,5) and heights H = seg(10,40,5). Note that the weight predictions are obtained from back transforming the logarithm. The bias correction is obtained by means of the log-normal distribution: if ypred is the prediction, the corrected (back-transformed) prediction Ýpred is given by
Ypred = exp(ypred + a2/2) where a2 is the variance of the error term.
Brno, December 2007
26
Lola Ugarte
Predictions with R
> Dn<-seq(12.5,42.5,5)
> H<-seq(10,40,5)
> newdata<-data.frame(Dn, H)
> predictions<-exp(predict.lm(model.DNH,newdata)+summary(model.DNH)$ sigma"2/2)
> predictions
12                      3                      4                      5                      6                      7
58.67295  137.48945  267.47468  465.83604  753.84143 1157.69554 1709.56309
Brno, December 2007
27
Lola Ugarte
A Model for the Stem
Fit the following regression model for the weight of the stem
PST = ßQ + ßlDn + ß2H
■  Display the default diagnostics plots. What does the fitted values vs. the residuals plot suggest?
■  Propose a model to correct the above problem
■  Does your new model correct the residuals problem detected?
Brno, December 2007
28
Lola Ugarte