Linear Models in Statistics Lecture notes by Andrea Kraus, Ph.D. i These lecture notes are intended for students of the 3rd year bachelor course M5120 Linear Models in Statistics I. This is a one-semester course offering an overview of linear statistical models as the fundamental tool of statistical analysis. The students encounter theory, software implementation, applications and interpretation. After the course the students are expected to recognize the situations that can be addressed by linear models, formulate and implement the model, and interpret the results. At the same time, the students are made aware of the limitations of the model and should be able to recognize and possibly avoid problems in a given situation. The lecture notes are based primarily on the following texts: Julian J. Faraway (2014). Linear Models with R. Second edition. Chapman& Hall/CRC. Simon N. Wood (2006). Generalized Additive Models; An introduction with R. Chapman& Hall/CRC. Jiˇr´ı Andˇel (2005). Z´aklady matematick´e statistiky. Matfyzpress. I would like to thank Professor Victor Panaretos for kindly sharing his teaching experience and course materials for his course on Regression at the Swiss Federal Institute of Technology in Lausanne. I would also like to thank my students for their feedback on various bits and pieces of these notes. Contents 1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3 Data analysis in practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.4 Descriptive statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.4.1 Types of variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.4.2 Relationships between variables . . . . . . . . . . . . . . . . . . . . 15 2 Linear algebra essentials 20 2.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1.1 Linear model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1.2 Task for this chapter . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.2 Linear mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.2.1 Associated subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.2.2 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.3 Matrix decompositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.3.1 Eigen-decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.3.2 Singular value decomposition . . . . . . . . . . . . . . . . . . . . . 28 2.3.3 QR decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.4 Pseudoinverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.4.1 Moore–Penrose pseudoinverse . . . . . . . . . . . . . . . . . . . . . 32 2.5 Orthogonal projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.6 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3 Normal distribution 37 3.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.1.1 Linear model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.1.2 Task for this chapter . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.2 Univariate normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
39 3.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.2.3 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.3 Multivariate normal distribution . . . . . . . . . . . . . . . . . . . . . . . . 42 ii CONTENTS iii 3.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.3.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.3.3 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4 Linear model 47 4.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.1.1 Linear model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.1.2 Task for this chapter . . . . . . . . . . . . . . . . . . . . . . . . . . 48 4.2 Estimating β . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 4.2.1 Orthogonal projection . . . . . . . . . . . . . . . . . . . . . . . . . 49 4.2.2 Least squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 4.2.3 Computing β . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.3 Quality of estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.3.1 Gauss–Markov theorem . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.4 Estimating σ2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.4.1 Estimating σ2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.5 Quality of model fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.5.1 Coefficient of determination . . . . . . . . . . . . . . . . . . . . . . 54 5 Normal linear model 56 5.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 5.1.1 Normal linear model . . . . . . . . . . . . . . . . . . . . . . . . . . 56 5.1.2 Task for this chapter . . . . . . . . . . . . . . . . . . . . . . . . . . 57 5.2 Estimating β and σ2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 5.2.1 Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 5.2.2 Matrix derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 5.2.3 Maximizing the likelihood . . . . . . . . . . . . . . . . . . . . . . . 60 5.3 Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 5.3.1 Distribution of the MLE . . . . . . . . . . . . . . . . . . . . . . . . 61 5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.4.1 Estimation in the normal linear model . . . . . . . . . . . . . . . . 63 6 Inference in normal linear model 64 6.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 6.1.1 Normal linear model . . . . . . . . . . . . . . . . . . . . . . . . . . 64 6.1.2 Task for this chapter . . . . . . . . . . . . . . . . . . . . . . . . . . 65 6.2 Estimators & distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 6.2.1 Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 6.2.2 Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 6.3 Confidence intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 6.4 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 6.5 Confidence bands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 6.6 Testing hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 CONTENTS iv 6.6.1 Simple hypothesis . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . 70 6.6.2 Composite hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . 71 6.7 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 7 Model selection 75 7.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 7.1.1 Normal linear model . . . . . . . . . . . . . . . . . . . . . . . . . . 75 7.1.2 Task for this chapter . . . . . . . . . . . . . . . . . . . . . . . . . . 77 7.2 Why consider various models? . . . . . . . . . . . . . . . . . . . . . . . . . 77 7.2.1 Should we leave out covariates that appear unnecessary? . . . . . . 77 7.2.2 What is the right form of the dependence on covariates? . . . . . . 79 7.3 Nested models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 7.4 Selecting the model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 7.4.1 Model selection tools . . . . . . . . . . . . . . . . . . . . . . . . . . 85 7.4.2 Model selection strategies . . . . . . . . . . . . . . . . . . . . . . . 87 8 Model diagnostics 90 8.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 8.1.1 Normal linear model . . . . . . . . . . . . . . . . . . . . . . . . . . 90 8.1.2 Task for this chapter . . . . . . . . . . . . . . . . . . . . . . . . . . 91 8.2 Random errors and residuals . . . . . . . . . . . . . . . . . . . . . . . . . . 92 8.3 Checking the assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 8.3.1 General principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 8.3.2 Assumptions on the expectation . . . . . . . . . . . . . . . . . . . . 95 8.3.3 Assumptions on the variance . . . . . . . . . . . . . . . . . . . . . . 97 8.3.4 Assumptions on the distribution . . . . . . . . . . . . . . . . . . . . 102 8.4 Influential and unusual observations . . . . . . . . . . . . . . . . . . . . . . 103 8.4.1 Observations to look at . . . . . . . . . . . . . . . . . . . . . . . . . 103 9 Reduced-rank design matrix and multicolllinearity 106 9.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 9.1.1 Normal linear model . . . . . . . . . . . . . . . . . . . . . . . . . . 106 9.1.2 Task for this chapter . . . . . . . . . . . . . . . . . . . . . . . . . . 108 9.2 Rank-deficient design matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 108 9.2.1 Rank-deficient design matrix . . . . . . . . . . . . . . . . . . . . . . 108 9.2.2 Identifiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 9.2.3 Choice of the solution . . . . . . . . . . . . . . . . . . . . . . . . . 112 9.3 Multicollinearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 10 Miscellanea and recap 119 10.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 10.1.1 Normal linear model . . . . . . . . . . . . . . . . . . . . . . . . . . 119 10.1.2 Task for this chapter . . . . . . . . . . . . . . . . . . . . . . . . . . 121 CONTENTS v 10.2 Linear regression in practice . . . . . . . . . . . . . . . . . . . . . . . . . . 121 10.2.1 Linear regression in practice . . . . . . . . . . . . . . . . . . . . . . 121 10.3 Notes on interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 10.3.1 Notes on the explanation . . . . . . . . . . . . . . . . . . . . . . . . 122 10.3.2 Notes on the prediction . . . . . . . . . . . . . . . . . . . . . . . . . 124 10.4 Transformations . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . 125 10.4.1 Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 10.5 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 10.5.1 Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 Chapter 1 Introduction 1.1 Motivation Is eating chocolate good for our health? Effects of chocolate ◦ it has been suggested that chocolate consumption is beneficial to cardiovascular health (effects on “bad” cholesterol, blood pressure, stroke, . . . ) lowers the risk of diabetes improves cognitive function & reduces memory decline . . . ◦ but it has also been suggested that chocolate consumption leads to obesity (risk for cardiovascular problems, diabetes) leads to dental problems decreases bone density . . . ◦ should be eaten in moderation . . . It’s an uncertain world . . . ◦ How much of chocolate and other goodies is good for our health? levels of bacteria, fertilizers, chemicals, . . . is safe? CHAPTER 1. INTRODUCTION 2 ◦ What is the right size for the height of a dam? insurance premium? mortgage interest? ◦ What is the average salary? public opinion on . . . ? results in upcoming elections? Sources of uncertainty ◦ we do not fully understand the phenomenon human body nature ◦ we do not know the future occurrence and size of a flood occurrence and size of insurance claims level of inflation ◦ we do not collect complete data average salary public opinion ◦ measurement error, human factor, . . . Statistics is all around us ◦ statistics is used to quantify the uncertainty ◦ Strategy 1. build a mathematical model, i.e. define what is known what is uncertain 2. build a probabilistic model for what is uncertain 3. use probability calculus to draw conclusions CHAPTER 1. INTRODUCTION 3 4. “translate” back to the original problem (interpret the results) ◦ uncertainty at the beginning imperfect answers at the end ◦ statistics is used for quantifying uncertainty, not for getting rid of it Notation ◦ random variable X, Y (n´ahodn´a veliˇcina) ◦ random vector/matrix X, Y (n´ahodn´y vektor/matice) ◦ density/probability mass function f (hustota/pravdˇepodobnostn´ı funkce) ◦ parameters θ, β, normal distribution N(µ, σ2 ) (parametry, norm´aln´ı rozdˇelen´ı) ◦ expectation E X, E X (stˇredn´ı hodnota) ◦ variance/covariance/variance-covariance matrix Var X, Cov(X, Y ), Var X (rozptyl/kovariance/kovarianˇcn´ı matice) Var X =     Var X1 Cov(X1, X2) . . . Cov(X1, Xn) Cov(X1, X2) Var X2 . . . Cov(X2, Xn) . . . . . . . . . . . . Cov(X1, Xn) Cov(X2, Xn) . . . Var Xn     Statistician’s TODO list 1. identify right questions 2. collect relevant data x1, . . . , xn 3. think of them as realisations of random variables X1, . . . , Xn with distributions (densities/frequency functions) f1, . . . , fn CHAPTER 1. INTRODUCTION 4 where fi is in fact fi(x, θ) 4. estimate θ/make inference about θ 5. use the results to answer the questions Example 1. Does consuming [amount of] chocolate decrease blood pressure [type, measurement]? ◦ Is chocolate good for our health? 2. design a trial, collect participants’ blood pressures x1, . . . , xn 3. suppose e.g. that Xi ∼ N(µi, σ2 ) ◦ µi: function of eating [amount of] chocolate, age, gender, . . . ◦ e.g. µi = β0 + β1xi,1 + . . . + βkxi,k ◦ xi,1 = 1 if the person eats [amount of] chocolate 0 otherwise 4. test H0 : β1 ≥ 0 versus H1 : β1 < 0 5. 
if we reject H0 in favour of H1 at α% level, we have shown that at α% level consuming [amount of] chocolate is associated with a lower blood pressure [type, measurement] ◦ if we do not reject H0 in favour of H1 at α% level, we have not shown that at α% level consuming [amount of] chocolate is associated with a lower blood pressure [type, measurement] Linear model ◦ model: Yi = β0 + β1xi,1 + . . . + βkxi,k + εi matrix notation: Y = Xβ + ε assumptions: E ε = 0, Var ε = σ2 I ∗ then E Y = Xβ, Var Y = σ2 I we often assume that ε ∼ N(0, σ2 I) ∗ then Y ∼ N(Xβ, σ2 I) ◦ parameter: θ = (β0, . . . , βk, σ2 ) = (β , σ2 ) estimation (point, interval) testing interpretation CHAPTER 1. INTRODUCTION 5 1.2 Statistics Example 1. Does consuming [amount of] chocolate decrease blood pressure [type, measurement]? ◦ Is chocolate good for our health? 2. design a trial, collect participants’ blood pressures x1, . . . , xn 3. suppose e.g. that Xi ∼ N(µi, σ2 ) ◦ µi: function of eating [amount of] chocolate, age, gender, . . . ◦ µi = β0 + β1xi,1 + . . . + βkxi,k ◦ xi,1 = 1 if the person eats [amount of] chocolate 0 otherwise 4. test H0 : β1 ≥ 0 versus H1 : β1 < 0 5. if we reject H0 in favour of H1 at α% level, we have shown that at α% level consuming [amount of] chocolate is associated with a lower blood pressure [type, measurement] ◦ if we do not reject H0 in favour of H1 at α% level, we have not shown that at α% level consuming [amount of] chocolate is associated with a lower blood pressure [type, measurement] Linear model ◦ model: Yi = β0 + β1xi,1 + . . . + βkxi,k + εi matrix notation: Y = Xβ + ε assumptions: E ε = 0, Var ε = σ2 I ∗ then E Y = Xβ, Var Y = σ2 I we often assume that ε ∼ N(0, σ2 I) ∗ then Y ∼ N(Xβ, σ2 I) ◦ parameter: θ = (β0, . . . , βk, σ2 ) = (β , σ2 ) estimation (point, interval) testing interpretation Parameter estimation CHAPTER 1. INTRODUCTION 6 ◦ we observe data with a distribution depending on a parameter ◦ we would like to use the data to estimate the value of the parameter ◦ estimator is a function of data (only!) ◦ for a one-dimensional parameter θ point estimator ˆθ confidence interval (ˆθL, ˆθU ) ◦ for a vector parameter θ point estimator ˆθ confidence region Methods of point estimation 1. method of moments ◦ “equate” theoretical and empirical moments E Y = 1 n n i=1 Yi E Y 2 = 1 n n i=1 Y 2 i . . . 2. maximum likelihood estimation ◦ maximize the likelihood with respect to θ ◦ likelihood probability of observing the data at hand under a given model ◦ very popular thanks to certain asymptotic optimality properties 3. other methods exist and we will see some Maximum likelihood estimation ◦ Y1, . . . , Yn ind. ∼ fi(y, θ) ◦ likelihood L(y1, . . . , yn; θ) = n i=1 fi(yi; θ) ◦ log-likelihood (y1, . . . , yn; θ) = n i=1 log{fi(yi; θ)} ◦ MLE ˆθMLE = argmaxθ (y1, . . . , yn; θ) CHAPTER 1. INTRODUCTION 7 ◦ usual computation score function U(y1, . . . , yn; θ) = n i=1 ∂ ∂θ log{fi(yi; θ)} score equation U(y1, . . . , yn; θ) = 0 find the solution ˆθMLE of the score equation observed Fisher information matrix J(y1, . . . , yn; θ) = − n i=1 ∂2 ∂θ∂θ log{fi(yi; θ)} show that J(y1, . . . , yn; ˆθMLE) is positive definite Fisher information matrix (under regularity conditions) I(y1, . . . , yn; θ) = Eθ J(y1, . . . 
, yn; θ) Properties of estimators ◦ parameter θ is a number but estimator ˆθ is a random variable ˆθ has a distribution important distribution summaries: E ˆθ, Var ˆθ Low variance High variance Low variance High variance HighbiasLowbias HighbiasLowbias ◦ ideally, estimation improves with sample size ◦ let ˆθn be an estimator of θ based on n data points ◦ we define desirable properties for the sequence {ˆθn}n∈N ◦ for a sequence of estimators ˆθn of a parameter θ 1. unbiasedness Eθ ˆθn = θ ∀θ 2. consistency ˆθn → θ as n → ∞ in Pθ ∀θ or a.s. CHAPTER 1. INTRODUCTION 8 3. usual asymptotic normality √ n(ˆθn − θ) → N(0, V (θ)) as n → ∞ in distribution 4. efficiency “small” Var ˆθ Properties of MLE ◦ under regularity conditions consistency ∗ ˆθMLE,n → θ a.s. as n → ∞ asymptotic normality ∗ √ n(ˆθMLE,n − θ) → N(0, V (θ)) as n → ∞ in distribution asymptotic efficiency ∗ V (θ) is the smallest possible bias ∗ ˆθMLE is often biased, with bias decreasing with n Interval estimation ◦ parameter θ is a number but estimator ˆθ is a random variable ◦ confidence interval (ˆθL, ˆθU) is a pair of random variables ◦ (1 − α) % confidence interval satisfies that Pθ{θ ∈ (ˆθL, ˆθU)} = 1 − α ∀θ ◦ note that randomness is in the borders, not in θ µ Confidence interval for µ in N(µ, σ2 ) with σ2 unknown ◦ properties coverage 1 − α length ˆθU − ˆθL CHAPTER 1. INTRODUCTION 9 ideally: a short interval with high coverage Testing hypotheses ◦ we observe data with a distribution depending on a parameter ◦ we would like to use the data to answer questions about the parameter is θ > 0, θ < 0, θ = 1, . . . ? ◦ to do so, we can test hypotheses about the parameter H0 : θ ≥ 0 vs. H1 : θ < 0 H0 : θ = 1 vs. H1 : θ = 1 . . . ◦ testing has two possible results 1. we reject H0 in favour of H1 we can say we have shown H1 (at the level α) 2. we do not reject H0 in favour of H1 we can say we have not shown H1 (at the level α) !!!we cannot say we have shown H0!!! ◦ the roles of H0 and H1 are not symmetric ◦ testing has two possible results 1. we reject H0 in favour of H1 2. we do not reject H0 in favour of H1 ◦ we can reach a wrong conclusion in two ways 1. when H0 is true and we reject H0 in favour of H1 “PH0 (reject H0) = α” α: type I. error, level of the test 2. when H1 is true and we do not reject H0 in favour of H1 “1 − PH1 (reject H0) = β” β: type II. error 1 − β: power of the test ◦ often impossible to keep both errors low at the same time ◦ when choosing a test, we keep the level α fixed and try to maximize the power β ◦ the roles of H0 and H1 are not symmetric ◦ the roles of H0 and H1 are not symmetric ◦ it is important to choose a good H0, H1 pair ◦ testing H0 : θ ≥ 0 vs. H1 : θ < 0 H0 : θ ≤ 0 vs. H1 : θ > 0 H0 : θ = 0 vs. H1 : θ = 0 answer different questions 1.3 Data analysis in practice Example: fev data ◦ from: http://www.statsci.org/data/general/fev.html ◦ question: association between the FEV[l] and Smoking, corrected for Age[years], Height[cm] and Gender ◦ data: FEV Age Height Gender Smoking 1.708 9 144.8 Female Non 1.724 8 171.5 Female Non 1.720 7 138.4 Female Non 1.558 9 134.6 Male Non . . . . . . . . . . . . . . . 3.727 15 172.7 Male Current 2.853 18 152.4 Female Non 2.795 16 160.0 Female Current 3.211 15 168.9 Female Non Getting the data to R ◦ data fev.txt ◦ for *.txt files: read.table(...) ◦ for *.csv files (from Excel) CHAPTER 1. INTRODUCTION 11 read.csv(...) 
◦ read the data and look at them

> fev <- read.table("fev.txt", header=TRUE)
>
> class(fev)
[1] "data.frame"
> dim(fev)
[1] 654 6
> names(fev)
[1] "ID" "Age" "FEV" "Height" "Sex" "Smoker"
>
> fev[1:3, ]
   ID Age   FEV Height    Sex Smoker
1 301   9 1.708   57.0 Female    Non
2 451   8 1.724   67.5 Female    Non
3 501   7 1.720   54.5 Female    Non
>
> fev <- fev[, -1]

Before we fit a model to data
◦ before we do the analysis, we need to
 get to know the variables
 get to understand the relationships among the variables
 identify possible problems for the analysis
 possibly spot obvious mistakes in data
⇒ first step in applied data analysis: descriptive statistics
 informal data descriptions (no model, no inference)
 ∗ numerical and graphical
 ∗ their choice depends on the type of variable(s) of interest

First look at the variables

> summary(fev)
      Age              FEV            Height          Sex
 Min.   : 3.000   Min.   :0.791   Min.   :46.00   Female:318
 1st Qu.: 8.000   1st Qu.:1.981   1st Qu.:57.00   Male  :336
 Median :10.000   Median :2.547   Median :61.50
 Mean   : 9.931   Mean   :2.637   Mean   :61.14
 3rd Qu.:12.000   3rd Qu.:3.119   3rd Qu.:65.50
 Max.   :19.000   Max.   :5.793   Max.   :74.00
    Smoker
 Current: 65
 Non    :589

First look at the relationships between the variables

> pairs(fev, col="deepskyblue", pch=19)

[Scatterplot matrix of Age, FEV, Height, Sex and Smoker produced by pairs(); figure not reproduced here.]
1.4 Descriptive statistics

1.4.1 Types of variables

Types of variables
1. in mathematical statistics:
 ◦ continuous (uncountably many possible values)
 ◦ discrete (at most countably many possible values)
2. in applied statistics:
 ◦ quantitative
 ◦ categorical
  nominal
  ordinal
3. in R:
 ◦ numeric
 ◦ factor
  ordered factor

Quantitative variable
◦ distribution
◦ characteristics of location
 mean
 maximum, minimum
 quantiles, in particular quartiles and median

> summary(fev$FEV)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.791   1.981   2.548   2.637   3.118   5.793

◦ characteristics of dispersion
 standard deviation
 interquartile range

> sd(fev$FEV)
[1] 0.8670591
> IQR(fev$FEV)
[1] 1.1375

Graphics for quantitative variable

> hist(fev$FEV, freq=FALSE,
+      main="FEV", xlab="FEV [l]",
+      col="gold", border="navyblue")
>
> boxplot(fev$FEV, horizontal=TRUE,
+         main="FEV", xlab="FEV [l]",
+         col="gold", border="navyblue", pch=19)

[Histogram and horizontal boxplot of FEV; figures not reproduced here.]
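The following snippet is not part of the original notes; it is a minimal sketch of how the three R representations listed in Section 1.4.1 (numeric, factor, ordered factor) can be inspected or created for the fev data. The outputs shown assume that character columns were converted to factors when the data were read in, which matches the summary output above; the variable age.group is a hypothetical illustration introduced only for this example.

> class(fev$FEV)
[1] "numeric"
> class(fev$Sex)
[1] "factor"
> levels(fev$Smoker)
[1] "Current" "Non"
>
> # a hypothetical ordinal variable (age groups), stored as an ordered factor
> age.group <- cut(fev$Age, breaks=c(0, 9, 12, 19), ordered_result=TRUE)
> class(age.group)
[1] "ordered" "factor"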
Categorical variable
◦ distribution
 counts of observations per category
 percentage of observations per category
 cumulative percentage of observations per category (for ordinal variables)
◦ characteristics
 mode (modus)

> summary(fev$Sex)
Female   Male
   318    336
> prop.table(table(fev$Sex))
   Female      Male
0.4862385 0.5137615
> cumsum(prop.table(table(fev$Sex)))
   Female      Male
0.4862385 1.0000000
> # not so interesting for a nominal variable

Graphics for categorical variable

> barplot(100*prop.table(table(fev$Sex)),
+         main="Gender distribution", ylab="Percentage",
+         col=c("gold", "navyblue"), border="navyblue")
>
> barplot(100*matrix(prop.table(table(fev$Sex)), nrow=2, ncol=1),
+         main="Gender distribution", ylab="(Cumulative) percentage",
+         col=c("gold", "navyblue"), border="navyblue")
> text(x=0.7, y=4, labels="F",
+      cex=2, pos=3)
> text(x=0.7, y=55, labels="M",
+      cex=2, pos=3)
>
> pie(summary(fev$Sex),
+     main="Gender distribution",
+     col=c("gold", "navyblue"), border="navyblue")

[Barplot, stacked barplot and pie chart of the gender distribution; figures not reproduced here.]

1.4.2 Relationships between variables

Quantitative vs quantitative

> plot(fev$FEV~fev$Age,
+      main="FEV by age",
+      ylab="FEV [l]", xlab="Age [y]",
+      pch=19, col="deepskyblue")
> lines(lowess(fev$FEV~fev$Age),
+       lwd=3, col="navyblue")

[Scatterplot of FEV against age with a lowess smooth; figure not reproduced here.]
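Not part of the original notes: besides the scatterplot, a simple numerical description of the relationship between two quantitative variables (the "numerical" counterpart to the graphical summary above) is the sample correlation coefficient, e.g.

> cor(fev$FEV, fev$Age)                      # Pearson correlation between FEV and age
> cor(fev$FEV, fev$Age, method="spearman")   # rank-based alternative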
Quantitative vs categorical

> boxplot(fev$FEV~fev$Sex,
+         main="FEV", ylab="FEV [l]",
+         col="gold", border="navyblue", pch=19)
>
> par(mfrow=c(1, 2))
> hist(fev$FEV[fev$Sex=="Female"], freq=FALSE,
+      xlim=c(min(fev$FEV)-1, max(fev$FEV)+1), ylim=c(0, 0.48),
+      main="FEV", xlab="FEV [l]",
+      col="gold", border="navyblue")
> hist(fev$FEV[fev$Sex=="Male"], freq=FALSE,
+      xlim=c(min(fev$FEV)-1, max(fev$FEV)+1), ylim=c(0, 0.48),
+      main="FEV", xlab="FEV [l]",
+      col="gold", border="navyblue")

[Boxplots of FEV by gender and side-by-side histograms of FEV for females and males; figures not reproduced here.]

Categorical vs categorical

> table(fev$Smoker, fev$Sex)
          Female Male
  Current     39   26
  Non        279  310
>
> prop.table(table(fev$Smoker, fev$Sex),
+            margin=1
+            )
             Female      Male
  Current 0.6000000 0.4000000
  Non     0.4736842 0.5263158
>
> prop.table(table(fev$Smoker, fev$Sex),
+            margin=2
+            )
              Female       Male
  Current 0.12264151 0.07738095
  Non     0.87735849 0.92261905
>
> barplot(height=prop.table(table(fev$Smoker, fev$Sex),
+                           margin=2), beside=F,
+         ylab="Percentage", xlab="Gender", main="Smoking by gender",
+         col=c("gold", "navyblue")
+         )
> text(x=1.2*c(0:2)+0.7, y=0, labels="Current",
+      col="navyblue", cex=2, pos=3)
> text(x=1.2*c(0:2)+0.7, y=0.8, labels="Non",
+      col="gold", cex=2, pos=3)

[Stacked barplot of smoking status within each gender; figure not reproduced here.]

> barplot(height=prop.table(table(fev$Smoker, fev$Sex),
+                           margin=2), beside=T,
+         ylab="Percentage", xlab="Gender", main="Smoking by gender",
+         col=c("gold", "navyblue")
+         )
>
> legend(x=3.3, y=0.8, legend=c("Current", "Non"),
+        col=c("gold", "navyblue"), pch=15)

[Grouped barplot of smoking status by gender with a legend; figure not reproduced here.]

> prop.table(table(fev$Sex, fev$Smoker),
+            margin=2
+            )
           Current       Non
  Female 0.6000000 0.4736842
  Male   0.4000000 0.5263158
>
> barplot(height=prop.table(table(fev$Sex, fev$Smoker),
+                           margin=2), beside=F,
+         ylab="Percentage", xlab="Smoking", main="Gender by smoking",
+         col=c("gold", "navyblue")
+         )
> text(x=1.2*c(0:2)+0.7, y=0.2, labels="Female", col="navyblue",
+      cex=2, pos=3)
> text(x=1.2*c(0:2)+0.7, y=0.75, labels="Male", col="gold",
+      cex=2, pos=3)

[Stacked barplot of gender within each smoking category; figure not reproduced here.]

> barplot(height=prop.table(table(fev$Sex, fev$Smoker),
+                           margin=2), beside=T,
+         ylab="Percentage", xlab="Smoking", main="Gender by smoking",
+         col=c("gold", "navyblue")
+         )
>
> legend(x=3, y=0.6, legend=c("Female", "Male"),
+        col=c("gold", "navyblue"), pch=15)

[Grouped barplot of gender by smoking status with a legend; figure not reproduced here.]

Chapter 2
Linear algebra essentials

2.1 The problem

2.1.1 Linear model

Chocolatey example
◦ Does consuming [amount of] chocolate decrease blood pressure [type, measurement]?
◦ collect blood pressures x1, . . . , xn
◦ suppose that Xi ∼ N(µi, σ2)
 µi = β0 + β1xi,1 + . . . + βkxi,k
 xi,1 = 1 if the person eats [amount of] chocolate, 0 otherwise
 xi,j, j ∈ {2, . . . , k}: age, gender, BMI, . . .
◦ test H0 : β1 ≥ 0 versus H1 : β1 < 0 to answer the question Linear model ◦ Yi = β0 + β1xi,1 + . . . + βkxi,k + εi, i ∈ {1, . . . , n} Yi: outcome, response, output, dependent variable ∗ random variable, we observe a realization yi ∗ (odezva, z´avisle promˇenn´a, regresand) xi,1, . . . , xi,k: covariates, predictors, explanatory variables, input, independent variables ∗ given, known CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 21 ∗ (nez´avisle promˇenn´e, regresory) β0, . . . , βk: coefficients ∗ unknown ∗ (regresn´ı koeficienty) εi: random error ∗ random variable, unobserved ◦ εi iid ∼ (0, σ2 ), i ∈ {1, . . . , n} E εi = 0: no systematic errors Var εi = σ2 : same precision ◦ we often assume that εi iid ∼ N(0, σ2 ), i ∈ {1, . . . , n} Linear model in the matrix form ◦ Yi = β0 + β1xi,1 + . . . + βkxi,k + εi, i ∈ {1, . . . , n} ◦ let Y =     Y1 Y2 . . . Yn     , X =     1 x1,1 . . . x1,k 1 x2,1 . . . x2,k . . . . . . . . . . . . 1 xn,1 . . . xn,k     , β =     β0 β1 . . . βk     , ε =     ε1 ε2 . . . εn     ◦ then Y = Xβ + ε, ε ∼ (0, σ2 I) and often ε ∼ N(0, σ2 I) X: design matrix ∗ (regresn´ı matice, matice pl´anu) ◦ let p = k + 1 ◦ then Y n×1 = X n×p β p×1 + ε n×1 ◦ we assume that n > p (and often think about n → ∞, p fixed) Example: bloodpress data ◦ from sites.stat.psu.edu/~lsimon/stat501wc/sp05/data/ ◦ association between the mean arterial blood pressure[mmHg] and age[years], weight[kg], body surface area[m2 ], duration of hypertension[years], basal pulse[beats/min], stress CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 22 ◦ data: BP Age Weight BSA DoH Pulse Stress 105 47 85.4 1.75 5.1 63 33 115 49 94.2 2.10 3.8 70 14 . . . . . . . . . . . . . . . . . . . . . 110 48 90.5 1.88 9.0 71 99 122 56 95.7 2.09 7.0 75 99 ◦ model: Y = Xβ + ε       105 115 . . . 110 122       =       1 47 85.4 1.75 5.1 63 33 1 49 94.2 2.10 3.8 70 14 . . . . . . . . . . . . . . . . . . . . . 1 48 90.5 1.88 9.0 71 99 1 56 95.7 2.09 7.0 75 99       ×   β0 . . . β6   +       ε1 ε2 . . . ε19 ε20       2.1.2 Task for this chapter Design matrix ◦ model:     Y1 Y2 . . . Yn     =     1 x1,1 . . . x1,k 1 x2,1 . . . x2,k . . . . . . . . . . . . 1 xn,1 . . . xn,k     ×   β0 . . . βk   +     ε1 ε2 . . . εn     ◦ design matrix: X =     1 x1,1 . . . x1,k 1 x2,1 . . . x2,k . . . . . . . . . . . . 1 xn,1 . . . xn,k     = (1 | x,1 | x,2 | . . . | x,k) =     x1, x2, . . . xn,     k covariates and 1 are the p columns of X n observations are the n rows of X Matrix algebra in a linear model ◦ model: Y = Xβ + ε ◦ coefficient vector β fixed but unknown p × 1 matrix β defines a mapping β : Rp → R CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 23 xi, ∈ Rp E Yi ∈ R ◦ design matrix X fixed and known n × p matrix defines a mapping X : Rp → Rn β ∈ Rp E Y ∈ Rn idea: when estimating β, how about choosing β so that X maps ˆβ as close to Y as possible? 2.2 Linear mapping Linear mapping from Rp to Rn ◦ function f : Rp → Rn such that f(x + y) = f(x) + f(y) . . . additivity f(αx) = αf(x) . . . homogeneity ◦ described by an n × p matrix A: f(x) = Ax → idea: ∀ x ∈ Rp can be written as x = p i=1 civi, where V = {v1, . . . , vp} is a basis of Rp → f(x) is determined by {f(v1), . . . , f(vp)} because f(x) = f( p i=1 civi) = p i=1 cif(vi) ∀ y ∈ Rn can be written as y = n i=1 ciwi, where W = {w1, . . . 
, wn} is a basis of Rn → just need to write each f(vi) in terms of W free choice of (W, V) → various A’s representing the same f ◦ operations f1 ◦ f2, f1 + f2, αf result into linear mappings represented by A1A2, A1 + A2, αA CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 24 2.2.1 Associated subspaces Kernel and image ◦ kernel (nullspace) ker(A) = ker(f) = {x ∈ Rp ; Ax = 0} = {x ∈ Rp ; f(x) = 0} subspace of Rp , dim(ker(A)): nullity of A ◦ image (range, column space) im(A) = im(f) = {y ∈ Rn ; ∃ x ∈ Rp : Ax = y} = {y ∈ Rn ; ∃ x ∈ Rp : f(x) = y} subspace of Rn , dim(im(A)): rank of A ◦ schematically 0 0 A : Rp → Rn Rp Rn ker(A) im(A) ◦ rank nullity theorem: dim(ker(A)) + dim(im(A)) = p Four fundamental subspaces associated to A ◦ kernel and image of A column space of A: im(A) = {y ∈ Rn ; ∃ x ∈ Rp : Ax = y} ∗ dim(im(A)) = rank(A) kernel of A: ker(A) = {x ∈ Rp ; Ax = 0} ∗ dim(ker(A)) = p − rank(A) ◦ kernel and image of A column space of A : im(A ): coimage of A ∗ dim(im(A )) = rank(A ) = rank(A) ∗ row space of A kernel of A : ker(A ): cokernel, left nullspace of A ∗ dim(ker(A )) = n − rank(A): corank of A CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 25 2.2.2 Orthogonality Inner product on Rn ◦ dot product: < x, y >= y x = n i=1 xiyi ◦ associated norm (length of x): ||x|| = √ < x, y > = n i=1 x2 i ◦ angle θ between x and y: cos θ = ||x|| ||y|| ◦ orthogonality for x = 0 and y = 0: < x, y >= 0 ◦ orthogonal complement W⊥ of a subspace W of Rn : W⊥ = {y ∈ Rn ; < x, y >= 0 for every x ∈ W} ∗ W⊥ is a subspace of Rn ∗ W⊥ ∩ W = {0} ∗ dim(W) + dim(W⊥ ) = n ◦ orthogonality between fundamental subspaces associated to A ker(A) = (im(A ))⊥ (in Rp ) ker(A ) = (im(A))⊥ (in Rn ) Orthogonal columns ◦ matrix with orthogonal columns: U = (u,1 | u,2 | . . . | u,p) < u,i, u,j >= 0 for i = j ◦ matrix with orthonormal columns: U = (u,1 | u,2 | . . . | u,p) < u,i, u,j >= 0 for i = j ||u,i|| = 1 for i ∈ {1, . . . , p} U U = I ⇒ mapping U : x → Ux preserves CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 26 ∗ inner product ∗ norm ∗ angles ∗ distances Orthogonal matrix ◦ square matrix R with orthonormal columns (and rows) R R = RR = I i.e. R = R−1 ◦ R−1 = R is also an orthogonal matrix ◦ product of orthogonal matrices is also an orthogonal matrix ◦ geometrically change of orthonormal basis (coordinate transformation) mapping R: x → Rx . . . rotation ∗ preserves the origin ∗ preserves angles ∗ preserves distances ∗ proper rotation if det R = 1 R R 2.3 Matrix decompositions 2.3.1 Eigen-decomposition Spectral decomposition (eigen-decomposition) ◦ let A be a symmetric p × p matrix eigenvalues λ1, . . . , λp eigenvectors u.,1, . . . , u.,p everything real Au.,i = λiu.,i f elongates/shrinks u.,i by λi A CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 27 ◦ eigen-decomposition: A = UΛU , where U = (u.,1 | u.,2 | . . . | u.,p) ∗ U is p × p orthogonal matrix Λ = diag{λ1, . . . , λp} ∗ Λ is p × p diagonal matrix convention ∗ λi is the ith largest eigenvalue of A ∗ u.,i is the eigenvector corresponding to λi Geometry for A 0 U Λ U A = UΛU CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 28 2.3.2 Singular value decomposition Singular value decomposition (SVD) ◦ let A be an n × p (n ≥ p) rectangular matrix singular values σ1, . . . , σp left singular vectors u.,1, . . . , u.,p right singular vectors v.,1, . . . , v.,p Av.,i = σiu.,i & A u.,i = σiv.,i everything real, singular values non-negative ◦ SVD: A = UΣV , where U is (n × n) orthogonal ∗ first p columns of U: u.,1, . . . , u.,p Σ (n × p) diagonal ∗ σ1, . . . , σp on the diagonal of Σ V is (p × p) orthogonal ∗ columns of V: v.,1, . 
. . , v.,p convention : singular values in the descending order Geometry for a square A CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 29 V Σ U A = UΣV SVD and spectral decomposition ◦ SVD: rectangular matrix A (n × p, n ≥ p) singular values and vectors (σ1, u.,1, v.,1), . . . , (σp, u.,p, v.,p) A = UΣV , where ∗ U is (n × n) and V is (p × p), both orthogonal ∗ Σ (n × p) diagonal with non-negative diagonal ◦ Spec. dec.: square symmetric matrix A (p × p) eigenvalues and eigenvectors (λ1, u.,1), . . . , (λp, u.,p) CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 30 A = UΛU , where ∗ U is (p × p) orthogonal ∗ Λ (p × p) diagonal ◦ for a square symmetric A, A 0: UΣV = UΛU ◦ for a rectangular matrix A (n × p, n ≥ p), A = UΣV A A = VΣ ΣV (p × p) ⇒ v.,i’s are eigenvectors of A A AA = UΣΣ U (n × n) ⇒ u.,i’s are eigenvectors of AA σi’s are square roots of non-zero λi’s of A A and AA Reduced SVD’s ◦ SVD: rectangular matrix A (n × p, n ≥ p) singular values and vectors (σ1, u.,1, v.,1), . . . , (σp, u.,p, v.,p) A = UΣV , where ∗ U is (n × n) and V is (p × p), both orthogonal ∗ Σ (n × p) diagonal with non-negative diagonal ◦ if n > p A = U1 | U2 Σ1 0 V → A = UΣV = U1Σ1V , where ∗ U1 (n × p) with orthogonal columns ∗ Σ1 is (p × p) ∗ thin SVD: A = U1Σ1V if r = rank(A) < p, Σ1 is (r × r) . . . compact SVD ◦ SVD writes A as a sum of multiples of rank-one matrices: A = p i=1 σiu.,iv.,i = r i=1 σiu.,iv.,i Geometry CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 31 V Σ U A = UΣV SVD and linear mapping ◦ SVD: rectangular matrix A (n × p, n ≥ p) singular values and vectors (σ1, u.,1, v.,1), . . . , (σp, u.,p, v.,p) A = UΣV , where ∗ U and V: orthonormal bases of Rn and Rp such that A maps the ith basis vector of Rp to a non-negative multiple of the ith basis vector of Rn , and sends the left-over basis vectors to zero ◦ ker(A) spanned by the v.,i corresponding to the null σi CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 32 ◦ im(A) spanned by the u.,i corresponding to the positive σi ◦ dim(ker(A)) + dim(im(A)) = p 2.3.3 QR decomposition QR decomposition (factorization) ◦ let A be a p × p matrix A = QR, where Q is p × p orthogonal R is p × p upper triangular ◦ let A be an n × p (n ≥ p) matrix A = QR, where Q is n × n orthogonal R is n × p upper triangular ◦ if n > p A = (Q1 | Q2) R1 0 = Q1 R1, where ∗ Q1 is n × p with orthogonal columns ∗ R1 is p × p upper triangular rank(A) = p ⇒ rank(R1) = p 2.4 Pseudoinverse 2.4.1 Moore–Penrose pseudoinverse Moore–Penrose pseudoinverse ◦ let A be a p × p matrix, rank(A) = p ◦ inverse A−1 is the p × p matrix satisfying AA−1 = I A−1 A = I ◦ let A be an n × p (n ≥ p) matrix ◦ Moore–Penrose pseudoinverse A+ is the p × n matrix satisfying CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 33 AA+ A = A (generalized inverse) A+ AA+ = A+ (generalized reflexive inverse) (AA+ ) = AA+ (A+ A) = A+ A ◦ A+ exists and is unique Construction of A+ ◦ let A be an n × p (n ≥ p) matrix if rank(A) = p ∗ A = UΣV (thin SVD, i.e. U is n × p & Σ is p × p) ∗ A+ = VΣ−1 U if rank(A) = r < p ∗ A = UΣV (compact SVD, i.e. U is n × r & Σ is r × r) ∗ A+ = VΣ−1 U ◦ let A be a p × p symmetric matrix A = UΛU (spectral decomposition) A+ = UΛ+ U , where ∗ Λ+ : diagonal with 1/λi on diagonal if λi = 0, 0 otherwise 2.5 Orthogonal projection Orthogonal projection Projection on Rn ◦ projection: linear mapping P : Rn → Rn such that PP = P P idempotent P is identity on im(P) ◦ ∀ x ∈ Rn : x = Px u + (I − P)x v u ∈ im(P) & v ∈ ker(P): unique decomposition ◦ (I − P) is a projection on ker(P) and ker(I − P) = im(P) CHAPTER 2. 
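A quick numerical illustration of these projection properties (a sketch in R; the matrix A below is an arbitrary example, and the construction P = U Uᵀ for a matrix U with orthonormal columns is the one described later in this section):

# build an orthogonal projection onto im(A) and check the defining properties
set.seed(1)
A <- matrix(rnorm(12), nrow = 4, ncol = 3)   # arbitrary 4 x 3 matrix, full column rank
U <- qr.Q(qr(A))                             # orthonormal basis of im(A)
P <- U %*% t(U)                              # projection matrix onto im(A)
all.equal(P %*% P, P)                        # idempotent: PP = P
all.equal(P, t(P))                           # symmetric (orthogonal projection)
x <- rnorm(4)
u <- P %*% x; v <- (diag(4) - P) %*% x       # x = Px + (I - P)x
all.equal(x, as.vector(u + v))               # unique decomposition recovered
sum(u * v)                                   # approximately 0: the two parts are orthogonal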
LINEAR ALGEBRA ESSENTIALS 34 p1 p2 u = Px x v = (I − P)x x(I − P)x u Px u1 u2 Px x (I − P)x ◦ orthogonal projection: projection with a symmetric P P is symmetric iff im(P) = ( im(I − P) )⊥ ∃!Px ∈ im(P) and ||x − Px||2 = miny∈im(P) ||x − y||2 if P and P1 are orthogonal projections and im(P1) ≤ im(P) ∗ PP1 = P1 = P1P Construction of P 1. orth. projection on im(u): ◦ P = 1 ||u||2 uu ◦ Px = ||u||2 u ∈ im(u) leaves cu unchanged annihilates the complementary basis 2. {u1, . . . , up} orthonormal, orth. projection on im(U), U = (u1|u2| . . . |up) ◦ P = UU ◦ Px = p i=1 < ui, x > ui ∈ im(U) leaves p i=1 ciui unchanged annihilates the complementary basis p1 p2 u = Px x v = (I − P)x x(I − P)x u Px u1 u2 Px x (I − P)x p1 p2 u = Px x v = (I − P)x x(I − P)x u Px u1 u2 Px x (I − P)x Orthogonal projection onto a column space ◦ let A be an n × p (n ≥ p) matrix 1. rank(A) = p ◦ columns a.,1, . . . , a.,p linearly independent ◦ P = A(A A)−1 A CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 35 ◦ A = UΣV (thin SVD, i.e. U is n × p & Σ is p × p) (A A)−1 = VΣ−2 V P = UU 2. rank(A) = r < p ◦ P = A(A A)+ A ◦ A = UΣV (compact SVD, i.e. U is n × r & Σ is r × r) (A A)+ = V Σ−2 V P = UU 2.6 Application to linear regression Application to linear regression Estimation in linear regression (theory) ◦ linear regression: Y = Xβ + ε, E ε = 0, Var ε = σ2 I ◦ we want to estimate β ◦ start with estimating µ = E(Y) = Xβ ∈ im(X) we look for ˆµ ∈ im(X) closest to Y ∗ we look to minimize ||ˆµ − Y||2 ⇒ ˆµ is the orthogonal projection of Y onto im(X) ˆµ = X(X X)−1 X Y = UU Y if rank(X) = p, X(X X)+ X Y = UU Y if rank(X) < p, ∗ where X = UΣV (thin/compact SVD) ◦ ˆµ ∈ im(X) ⇒ ∃ ˆβ such that ˆµ = Xˆβ ◦ if rank(X) = p ⇒ ∃ ! ˆβ such that ˆµ = Xˆβ Estimation in linear regression (practice in ) ◦ aim: minimize ||Y − Xβ||2 w.r.t. β ◦ use that X = QR Q is n × n orthogonal CHAPTER 2. LINEAR ALGEBRA ESSENTIALS 36 R is n × p upper triangular Q and Q rotations ◦ ||Y − Xβ||2 = ||Q (Y − QR β) ||2 = Q1 Q2 Y − R1 0 β 2 = Q1 Y − R1 β 2 + Q2 Y 2 minimize ||Y − Xβ||2 ⇔ minimize ||Q1 Y − R1β||2 ◦ if rank(X) = p R1 invertible ˆβ = R−1 1 Q1 Y Chapter 3 Normal distribution 3.1 The problem 3.1.1 Linear model Linear model ◦ Yi = β0 + β1xi,1 + . . . + βkxi,k + εi, i ∈ {1, . . . , n} Yi: outcome, response, output, dependent variable ∗ random variable, we observe a realization yi ∗ (odezva, z´avisle promˇenn´a, regresand) xi,1, . . . , xi,k: covariates, predictors, explanatory variables, input, independent variables ∗ given, known ∗ (nez´avisle promˇenn´e, regresory) β0, . . . , βk: coefficients ∗ unknown ∗ (regresn´ı koeficienty) εi: random error ∗ random variable, unobserved ◦ εi iid ∼ (0, σ2 ), i ∈ {1, . . . , n} E εi = 0: no systematic errors Var εi = σ2 : same precision ◦ we often assume that εi iid ∼ N(0, σ2 ), i ∈ {1, . . . , n} CHAPTER 3. NORMAL DISTRIBUTION 38 Example: bloodpress data ◦ from sites.stat.psu.edu/~lsimon/stat501wc/sp05/data/ ◦ association between the mean arterial blood pressure[mmHg] and age[years], weight[kg], body surface area[m2 ], duration of hypertension[years], basal pulse[beats/min], stress ◦ data: BP Age Weight BSA DoH Pulse Stress 105 47 85.4 1.75 5.1 63 33 115 49 94.2 2.10 3.8 70 14 . . . . . . . . . . . . . . . . . . . . . 110 48 90.5 1.88 9.0 71 99 122 56 95.7 2.09 7.0 75 99 ◦ model: Y = Xβ + ε       105 115 . . . 110 122       =       1 47 85.4 1.75 5.1 63 33 1 49 94.2 2.10 3.8 70 14 . . . . . . . . . . . . . . . . . . . . . 1 48 90.5 1.88 9.0 71 99 1 56 95.7 2.09 7.0 75 99       ×   β0 . . . 
β6   +       ε1 ε2 . . . ε19 ε20       3.1.2 Task for this chapter Normal distribution in a linear model ◦ model: Y = Xβ + ε ◦ assumptions of the normal linear model: X fixed and known β fixed unknown ε ∼ N(0, σ2 I) ⇒ Y ∼ N(Xβ, σ2 I) ◦ estimators of β and σ2 functions of Y ◦ test statistics concerning β and σ2 functions of Y ⇒ to make inference in normal linear model, we need to study multivariate normal distribution N(µ, Σ) distributions of functions of N(µ, Σ) CHAPTER 3. NORMAL DISTRIBUTION 39 3.2 Univariate normal distribution 3.2.1 Definition Normal distribution N(µ, σ2 ) ◦ let µ ∈ R and σ2 > 0 density f(x) = 1√ 2πσ2 exp − 1 2σ2 (x − µ)2 for the standard normal distribution (µ = 0, σ2 = 1): −4 −2 0 2 4 0.00.10.20.30.4 x f(x) ◦ if σ2 = 0 then X = µ a.s. 3.2.2 Properties Properties of N(µ, σ2 ) ◦ µ ∈ R and σ2 > 0 ◦ Let a, b ∈ R, X ∼ N(µ, σ2 ). Then aX + b ∼ N(aµ + b, a2 σ2 ). ◦ Let Z ∼ N(0, 1) and X = µ + σ Z. Then X ∼ N(µ, σ2 ). −4 −2 0 2 4 0.00.10.20.30.4 x f(x) CHAPTER 3. NORMAL DISTRIBUTION 40 ◦ Let ai, bi ∈ R, Xi ind. ∼ N(µi, σ2 i ) for i ∈ {1, . . . , n}. Then n i=1(aiXi + bi) ∼ N( n i=1(aiµi + bi), i=1 a2 i σ2 i ). 3.2.3 Related distributions χ2 (n) distribution ◦ let Z ∼ N(0, 1) Z2 ∼ χ2 (1) ◦ let Zi ind. ∼ N(0, 1) for i ∈ {1, . . . , n} X = n i=1 Z2 i ∼ χ2 (n) ◦ density 0 10 20 30 40 50 0.00.40.81.2 χ2 (1) 0 10 20 30 40 50 0.000.050.100.15 χ2 (5) 0 10 20 30 40 50 0.000.040.08 χ2 (10) 0 10 20 30 40 50 0.000.020.040.06 χ2 (20) ◦ E X = n, Var X = 2n Student’s t-distribution ◦ let Z ∼ N(0, 1) and X ∼ χ2 (n), Z ⊥⊥ X T = Z√ X/n ∼ t(n) ◦ density CHAPTER 3. NORMAL DISTRIBUTION 41 −6 −4 −2 0 2 4 6 0.000.100.200.30 t(1) −6 −4 −2 0 2 4 6 0.00.10.20.3 t(3) −6 −4 −2 0 2 4 6 0.00.10.20.30.4 t(5) ◦ E T = 0 for n > 1, Var T = n/(n − 2) for n > 2 Fisher–Snedecor distribution ◦ let X1 ∼ χ2 (n1) and X2 ∼ χ2 (n2), X1 ⊥⊥ X2 F = X1/n1 √ X2/n2 ∼ F(n1, n2) ◦ density 0 2 4 6 8 10 0.01.02.0 F(1, 5) 0 2 4 6 8 10 0.01.02.0 F(5, 1) 0 2 4 6 8 10 0.00.20.4 F(5, 5) 0 2 4 6 8 10 0.00.20.40.6 F(10, 5) ◦ E F = n2/(n2 − 2) for n2 > 2 CHAPTER 3. NORMAL DISTRIBUTION 42 3.3 Multivariate normal distribution 3.3.1 Definition Multivariate normal distribution N(µ, Σ) ◦ µ ∈ Rn , Σ is an n × n positive semidefinite matrix Definition. A random vector X : (Ω, A) → (Rn , B(Rn )) has multivariate normal distribution N(µ, Σ) if and only if a X ∼ N(a µ, a Σ a) for every a ∈ Rn . ◦ if rank(Σ) = n then N(µ, Σ) is non-degenerate has density f(x) = 1 (2π)n det(Σ) exp − 1 2 (x − µ) Σ−1 (x − µ) ◦ if rank(Σ) = r < n then N(µ, Σ) is degenerate a.s. “lives” in a subspace of Rn of dimension r no density w.r.t. Lebesgue measure on B(Rn ) Non-degenerate multivariate normal distribution ◦ density f(x) = 1 (2π)n det(Σ) exp − 1 2 (x − µ) Σ−1 (x − µ) ◦ Σ: square symmetric positive definite matrix spectral decomposition Σ = UΛU λ1 ≥ λ2 ≥ . . . ≥ λn > 0 Σ−1 = UΛ−1 U ◦ quadratic form (x − µ) Σ−1 (x − µ) can be written as (x − µ) UΛ−1 U (x − µ) = {U (x − µ)} Λ−1 {U (x − µ)} ◦ level sets of f(x), Ic = {x ∈ Rn ; f(x) = c} for c > 0: ellipsoids centred at µ directions of principal axes: u1,, . . . , un, lengths of principal semi-axes: √ dλ1, . . . , √ dλn Non-degenerate bivariate normal distribution ◦ N 0 0 , 1 0 0 1 x y 0.02 0.04 0.06 0.08 0.1 0.12 0.14 ◦ N −1 2 , 1 0 0 1 N 0 0 , 2 0 0 2 x y 0.02 0.04 0.06 0.08 0.1 0.12 0.14 x y 0.01 0.02 0.03 0.04 0.05 0.06 0.07 ◦ N 0 0 , 1 0.2 0.2 1 N 0 0 , 1 −0.8 −0.8 1 x y 0.02 0.04 0.06 0.08 0.1 0.12 0.14 x y 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 0.22 0.24 0. 
26 3.3.2 Properties Properties of N(µ, Σ) ◦ µ ∈ Rn , Σ is an n × n symmetric positive semidefinite matrix Theorem (MVN 1). Let X ∼ N(µ, Σ). Then E X = µ and Var X = Σ. Theorem (MVN 2). Let Z1, . . . , Zn iid ∼ N(0, 1) and Z = (Z1, . . . , Zn) . Then Z ∼ N(0, I). CHAPTER 3. NORMAL DISTRIBUTION 44 Theorem (MVN 3). Let X ∼ N(µ, Σ) and let A be an m × n real matrix and b ∈ Rm . Then AX + b ∼ N(Aµ + b, AΣA ). ◦ proofs are given during the lectures and can also be found in Jiˇr´ı Andˇel: Z´aklady matematick´e statistiky N(µ, Σ) seen through N(0, I) ◦ µ ∈ Rn , Σ is an n × n symmetric positive semidefinite matrix 1. if rank(Σ) = n ◦ spectral decomposition Σ = UΛU ◦ λ1 ≥ λ2 ≥ . . . ≥ λn > 0 ◦ Σ = UΛU = UΛ1/2 Σ Λ1/2 U = ΣΣ ◦ let Z = Σ −1 (X − µ) = Λ−1/2 U (X − µ) ⇒ Z ∼ N(0, I) (n-dimensional), X = µ + ΣZ and X ∼ N(µ, Σ) 2. if rank(Σ) = r < n ◦ spectral decomposition Σ = UΛU ◦ λ1 ≥ λ2 ≥ . . . ≥ λr > 0, λr+1 = λr+2 = . . . = λn = 0 Σ = UΛU = Un×r (u,1 | u,2 | ... | u,r) Λr×r diag{λ1,λ2,...,λr} Un×r = Un×rΛ 1/2 r×r Σ Λ 1/2 r×rUn×r = ΣΣ◦ ◦ let Z = Σ + (X − µ) = Λ −1/2 r×r Un×r(X − µ) ⇒ Z ∼ N(0, I) (r-dimensional), X = µ + ΣZ and X ∼ N(µ, Σ) Density of N(µ, Σ) ◦ µ ∈ Rn , Σ is an n × n symmetric positive definite matrix Theorem (MVN 4). Let X ∼ N(µ, Σ) where rank(Σ) = n. Then X has density f(x) w.r.t. Lebesgue measure on B(Rn ) and f(x) = 1 (2π)n det(Σ) exp − 1 2 (x − µ) Σ−1 (x − µ) . ◦ a proof is given during the lectures and can also be found in Jiˇr´ı Andˇel: Z´aklady matematick´e statistiky CHAPTER 3. NORMAL DISTRIBUTION 45 Characteristic function (reminder) Definition (Characteristic function of a random variable). Let X be a random variable. The function ψX : R → C defined by ψX(t) = E exp{i t X}, t ∈ R, is the characteristic function of X. Definition (Characteristic function of a random vector). Let X be an n-dimensional random vector. The function ψX : Rn → C defined by ψX(t) = E exp{i t X}, t ∈ Rn , is the characteristic function of X. Properties of characteristic function (reminder) Theorem (ChF 1). Let X ∼ N(µ, σ2 ). Then ψX(t) = exp i t µ − 1 2 σ2 t2 . Theorem (ChF 2). Let X be an n-dimensional random vector and X1 and X2 its subvectors such that X = (X1 , X2 ) . Then X1 ⊥⊥ X2 iff ψX(t) = ψX1 (t1) × ψX2 (t2) for every t = (t1 , t2 ) ∈ Rn . ◦ a proof can be found in Petr Lachout: Teorie pravdˇepodobnosti (1998). Nakladatelstv´ı Univerzity Karlovy Characteristic function of N(µ, Σ) ◦ µ ∈ Rn , Σ is an n × n symmetric positive semidefinite matrix Theorem (MVN 5). Let X ∼ N(µ, Σ). Then ψX(t) = exp i t µ − 1 2 t Σ t . ◦ a proof is given during the lectures and can also be found in Jiˇr´ı Andˇel: Z´aklady matematick´e statistiky Subvectors of N(µ, Σ) ◦ µ ∈ Rn , Σ is an n × n symmetric positive semidefinite matrix Theorem (MVN 6). Let X ∼ N(µ, Σ) and let k ∈ {1, . . . , n}. Then     X1 X2 . . . Xk     ∼ N         µ1 µ2 . . . µk     ,     σ1,1 σ1,2 . . . σ1,k σ2,1 σ2,2 . . . σ2,k . . . . . . . . . . . . σk,1 σk,2 . . . σk,k         . ◦ a proof is given during the lectures and can also be found in Jiˇr´ı Andˇel: Z´aklady matematick´e statistiky CHAPTER 3. NORMAL DISTRIBUTION 46 ◦ analogous statement is true for any sub-vector of X ◦ converse is not true (In)dependence in N(µ, Σ) ◦ µ ∈ Rn , Σ is an n × n symmetric positive semidefinite matrix Theorem (MVN 7). Let X ∼ N(µ, Σ) and let k ∈ {1, . . . , n − 1}. Denote X1 = (X1, . . . , Xk) , X2 = (Xk+1, . . . , Xn) and X1 ∼ N(µ1, Σ1,1), X2 ∼ N(µ2, Σ2,2). If Σ = Σ1,1 0 0 Σ2,2 then X1 ⊥⊥ X2. 
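A quick numerical illustration of this statement (a sketch in R; it uses mvrnorm() from the MASS package, and the particular block-diagonal Σ below is an arbitrary example): simulate from N(0, Σ) with Σ1,2 = 0 and check that the empirical cross-covariance between the two sub-vectors is close to the zero matrix, which for normal vectors is exactly the condition for independence.

library(MASS)                          # provides mvrnorm(); assumed installed
set.seed(1)
Sigma <- rbind(c(2,   0.5, 0,   0),
               c(0.5, 1,   0,   0),
               c(0,   0,   1,   0.3),
               c(0,   0,   0.3, 2))    # block-diagonal: off-diagonal block is 0
X <- mvrnorm(n = 10000, mu = rep(0, 4), Sigma = Sigma)
round(cov(X[, 1:2], X[, 3:4]), 2)      # entries close to 0, as expected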
◦ a proof is given during the lectures and can also be found in Jiˇr´ı Andˇel: Z´aklady matematick´e statistiky ◦ AX ⊥⊥ BX iff AΣB = 0 3.3.3 Related distributions Quadratic forms ◦ Let X ∼ N(µ, Σ), µ ∈ Rn , Σ is an n × n symmetric positive semidefinite matrix Theorem (QF 1). Let Z ∼ N(0, I). Then Z Z ∼ χ2 (n). Theorem (QF 2). Let X ∼ N(µ, Σ) where rank(Σ) = n. Then (X − µ) Σ−1 (X − µ) ∼ χ2 (n). Theorem (QF 3). Let X ∼ N(µ, Σ) where rank(Σ) = r < n. Then (X−µ) Σ+ (X−µ) ∼ χ2 (r). ◦ proofs are given during the lectures and analogous statements are proved in Jiˇr´ı Andˇel: Z´aklady matematick´e statistiky Quadratic forms ◦ Let X ∼ N(µ, Σ), µ ∈ Rn , Σ is an n × n symmetric positive semidefinite matrix Theorem (QF 4). Let Z ∼ N(0, I) and let P be an n × n projection matrix of rank r. Then Z P Z ∼ χ2 (r). ◦ a proof is given during the lectures and analogous statements are proved in Jiˇr´ı Andˇel: Z´aklady matematick´e statistiky Chapter 4 Linear model 4.1 The problem 4.1.1 Linear model Linear model ◦ Yi = β0 + β1xi,1 + . . . + βkxi,k + εi, i ∈ {1, . . . , n} Yi: outcome, response, output, dependent variable ∗ random variable, we observe a realization yi ∗ (odezva, z´avisle promˇenn´a, regresand) xi,1, . . . , xi,k: covariates, predictors, explanatory variables, input, independent variables ∗ given, known ∗ (nez´avisle promˇenn´e, regresory) β0, . . . , βk: coefficients ∗ unknown ∗ (regresn´ı koeficienty) εi: random error ∗ random variable, unobserved ◦ εi iid ∼ (0, σ2 ), i ∈ {1, . . . , n} E εi = 0: no systematic errors Var εi = σ2 : same precision ◦ we often assume that εi iid ∼ N(0, σ2 ), i ∈ {1, . . . , n} CHAPTER 4. LINEAR MODEL 48 Example: bloodpress data ◦ from sites.stat.psu.edu/~lsimon/stat501wc/sp05/data/ ◦ association between the mean arterial blood pressure[mmHg] and age[years], weight[kg], body surface area[m2 ], duration of hypertension[years], basal pulse[beats/min], stress ◦ data: BP Age Weight BSA DoH Pulse Stress 105 47 85.4 1.75 5.1 63 33 115 49 94.2 2.10 3.8 70 14 . . . . . . . . . . . . . . . . . . . . . 110 48 90.5 1.88 9.0 71 99 122 56 95.7 2.09 7.0 75 99 ◦ model: Y = Xβ + ε       105 115 . . . 110 122       =       1 47 85.4 1.75 5.1 63 33 1 49 94.2 2.10 3.8 70 14 . . . . . . . . . . . . . . . . . . . . . 1 48 90.5 1.88 9.0 71 99 1 56 95.7 2.09 7.0 75 99       ×   β0 . . . β6   +       ε1 ε2 . . . ε19 ε20       4.1.2 Task for this chapter Estimation in linear model ◦ model: Y = Xβ + ε outcome Y ∗ random vector, we observe a realization y predictors x,1, . . . , x,k ∗ vector of given (known) constants coefficients β ∗ vector of unknown constants error ε ∗ unknown random vector, we do not observe its realization assumptions: ε ∼ (0, σ2 I) ∗ E Y = Xβ: the expected value of Y is a linear function of β ∗ E ε = 0: no systematic errors ∗ Var ε = σ2 I: independence and same precision ◦ task: given the observed data y and known matrix X, find estimators β (and σ2) of β (and σ2 ) with desirable properties CHAPTER 4. LINEAR MODEL 49 4.2 Estimating β 4.2.1 Orthogonal projection β motivated by orthogonal projection ◦ model: Y = Xβ + ε, ε unknown, E ε = 0 ◦ idea: set ε ! = 0 and solve Y = Xβ w.r.t. β then Y n×1 ! = X n×p β p×1 n linear equations with p unknowns and n > p ⇒ a solution exists only if Y ∈ im(X) ◦ modified idea: find ˆY ∈ im(X) such that ||Y − ˆY||2 is the smallest possible and solve ˆY = Xβ w.r.t. 
β then ˆY is the orthogonal projection of Y onto im(X) projection matrix onto im(X) is H hat matrix = X(X X)+ X solving ˆY = Xβ is solving X (X X)+ X Y = Xβ estimate β by β = (X X)+ X Y but β is the unique solution of ˆY = X β iff rank(X) = p ∗ and then β = (X X)−1 X Y Geometric intuition ◦ model: Y = Xβ + ε, ε unknown, E ε = 0 ◦ fitted model: Y observed value = H Y fitted value ˆY + (I − H) Y residual e x,1 x,2 ˆY Ye ◦ < ˆY, e >= e ˆY = Y (I − H) H Y = 0, i.e. ˆY ⊥ e CHAPTER 4. LINEAR MODEL 50 4.2.2 Least squares β as least squares estimator ◦ model: Y = Xβ + ε, ε unknown, E ε = 0 ◦ idea: make the residuals as small as possible minimize ||ε||2 = n i=1 ε2 i w.r.t. β Least Squares Estimator (LSE) β = arg minβ n i=1 ε2 i also called the OLS (Ordinary Least Squares) solution ◦ computation: ε = Y − Xβ β = arg minβ ||Y − Xβ||2 = arg minβ(Y − Xβ) (Y − Xβ) ◦ look for the minimum by differentiating: ∂ ∂β (Y − Xβ) (Y − Xβ) ! = 0 −2 X Y + 2 X X β ! = 0 X X β ! = X Y normal equations ∂2 ∂β∂β (Y − Xβ) (Y − Xβ) ? 0 at β = β 2 X X 0 for all β convex function ⇒ minimum ◦ normal equations have unique solution iff rank(X) = p and then β = (X X)−1 X Y Geometric intuition ◦ model: Y = Xβ + ε, ε unknown, E ε = 0 ◦ fitted model: Y observed value = Xβ fitted value ˆY + (Y − Xβ) residual e ◦ least squares estimator minimizes the sum of squared vertical distances between the fitted and observed values CHAPTER 4. LINEAR MODEL 51 q q q q q q q q q q q q q q q q q q q q 0 5 10 15 20 05101520 Least squares line x Y q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q fitted value observation residual least squares fit 4.2.3 Computing β β = (X X)−1 X Y ◦ we have seen two approaches give the same β ◦ both approaches give unique β iff rank(X) = p ◦ both approaches would give infinitely many βs if rank(X) < p ◦ a rank-deficient design matrix means a problem in design/model formulation ◦ we need to fix that problem to obtain reasonable conclusions from our model ◦ from now on we assume that rank(X) = p ◦ we will get back to (nearly) rank-deficient X in Chapter 9 β the way it is computed in ◦ model: Y = Xβ + ε, ε unknown, E ε = 0 ◦ β minimizes ||Y − Xβ||2 w.r.t. β ◦ uses that X = QR (QR decomposition from Chapter 2) Q (n × n) orthogonal R (n × p) upper triangular X = Q R = (Q1 | Q2) R1 0 = Q1 R1 CHAPTER 4. LINEAR MODEL 52 does not allow rank(X) < p ⇒ rank(R1) = p Q and Q are rotations ∗ ||Y − Xβ||2 = ||Q (Y − Q R β) ||2 = Q1 Q2 Y − R1 0 β 2 = Q1 Y − R1 β 2 + Q2 Y 2 minimize ||Y − Xβ||2 ⇔ minimize ||Q1 Y − R1β||2 ˆβ = R−1 1 Q1 Y (compare with ˆβ = (X X)−1 X Y) Geometric intuition ◦ model: Y = Xβ + ε, ε unknown, E ε = 0 ◦ Y observed value = Xβ fitted value ˆY + (Y − Xβ) residual e Y = Xβ = Q Q QR β = Q R1 0 β = Q R1β 0 e = (Y − Xβ) = Q Q (Y − QR β) = Q Q1 Q2 Y − R1 0 β = Q 0 Q2 Y ◦ Q conveniently rotates Y and im(X) and Q rotates back ◦ Y ⊥ e Geometric intuition ◦ model: Y = Xβ + ε, ε unknown, E ε = 0 ◦ Y observed value = Xβ fitted value ˆY + (Y − Xβ) residual e = Q R1β 0 + Q 0 Q2 Y ◦ see Figure 1.5 on page 20 in Simon Wood’s Generalized additive models for a nice illustration CHAPTER 4. LINEAR MODEL 53 4.3 Quality of estimation 4.3.1 Gauss–Markov theorem Linear transformation of a random vector ◦ we want to study β = (X X)−1 X Y Theorem. Let X be an n-dimensional random vector with a finite variance-covariance matrix and let A be an m × n matrix. Then ◦ E(A X) = A E X; ◦ Var(A X) = A(Var X)A ; ◦ E(X X) = (E X) (E X) + tr(Var X). ◦ proof is a simple exercise Is β a reasonable estimator? 
◦ model: Y = Xβ + ε, ε unknown, E ε = 0 ◦ β = (X X)−1 X Y ◦ has a nice motivation but how about properties? E β = E(X X)−1 X Y = (X X)−1 X E Y = (X X)−1 X X β = β ⇒ unbiased Var β = Var (X X)−1 X Y = (X X)−1 X Var Y (X X)−1 X = σ2 (X X)−1 X X (X X)−1 = σ2 (X X)−1 ◦ how good is Var β? β is a linear estimator, i.e. β = AY for a matrix A β is an unbiased estimator, i.e. Eβ β = β for all β in fact, β is the best linear unbiased estimator of β, i.e. β has the smallest variance among all linear unbiased estimators of β Gauss–Markov theorem CHAPTER 4. LINEAR MODEL 54 Theorem (Gauss–Markov). Let Y = Xβ + ε where X is an n × p matrix, rank(X) = p, β ∈ Rp , and ε is an n-dimensional random vector with E ε = 0 and Var ε = σ2 I. Then β = (X X)−1 X Y is the best linear unbiased estimator of β, i.e. if β is a linear unbiased estimator of β then Var β − Var β 0. ◦ see the blackboard for a proof main steps ∗ show that if β = AY then AX = I ∗ show that Var β − Var β = σ2 A(I − H)(I − H) A 4.4 Estimating σ2 4.4.1 Estimating σ2 Estimating σ2 ◦ model: Y = Xβ + ε, ε unknown, E ε = 0, Var ε = σ2 I ◦ fitted model: Y = H Y ˆY + (I − H) Y e = Xβ ˆY + (Y − Xβ) e ◦ idea: estimate ε by e some care is needed . . . ∗ E e = E(Y − Xβ) = E Y − X E β = Xβ − Xβ = 0 ∗ Var e = Var((I − H) Y) = (I − H) Var Y (I − H) = σ2 (I − H) ∗ rank((I − H)) = n − rank(X) = n − p < n ⇒ dependence E(e e) = (E e) (E e) + tr(Var e) = tr(σ2 (I − H)) ∗ = σ2 (n − rank(X)) = σ2 (n − p) ∗ ∗: tr(P) = rank(P) for orthogonal projection matrices σ2 = 1 n−p e e = 1 n−p n i=1 e2 i = 1 n−p n i=1(Yi − ˆYi)2 ∗ unbiased estimator of σ2 4.5 Quality of model fit 4.5.1 Coefficient of determination Sums of squares CHAPTER 4. LINEAR MODEL 55 ◦ for β we obtain the minimal ||e||2 = ||Y − ˆY||2 = ||Y − Xβ||2 ◦ we have seen properties of β but how about ˆY? ◦ a question: how close ˆY actually is to Y? how well do the covariates in X explain what we see in Y? ◦ an answer: there is some variability in Yis for different i ∗ Total Sum of Squares TSS: n i=1(Yi − ¯Y )2 ∗ also called SST the model explains a part of the variability in Yis ∗ for different is there are different xi,s and so different ˆYis ∗ Explained Sum of Squares ESS: n i=1(ˆYi − ¯ˆY )2 = n i=1(ˆYi − ¯Y )2 ∗ also called Sum of Squares due to Regression but some variability remained unexplained by the model ∗ Residual Sum of Squares RSS: n i=1(Yi − ˆYi)2 ∗ also called Sum of Squared Residuals or Sum of Squared Errors Coefficient of determination R2 ◦ relationship among the sums of squares TSS = RSS + ESS ∗ ||Y − ¯Y 1||2 = ||Y ± ˆY − ¯Y 1||2 = ||Y − ˆY||2 + || ˆY − ¯Y 1||2 ∗ because < Y− ˆY, ˆY >= 0 =< Y− ˆY, 1 > as ˆY is the orthogonal projection of Y onto im(X) and 1 ∈ im(X) variability: total = unexplained + explained ◦ so how well do the covariates in X explain what we see in Y? coefficient of determination R2 = ESS TSS = 1 − RSS TSS ∗ proportion of variability explained by the model ∗ 0 ≤ R2 ≤ 1 and bigger is better adjusted coefficient of determination R2 adj = 1 − RSS/(n−p) TSS/(n−1) ∗ an alternative that takes the number of predictors into account ∗ RSS/(n − p) = σ2 from the linear regression, TSS/(n − 1) = σ2 without the linear regression Chapter 5 Normal linear model 5.1 The problem 5.1.1 Normal linear model Normal linear model ◦ Yi = β0 + β1xi,1 + . . . + βkxi,k + εi, i ∈ {1, . . . , n} Yi: outcome, response, output, dependent variable ∗ random variable, we observe a realization yi ∗ (odezva, z´avisle promˇenn´a, regresand) xi,1, . . . 
, xi,k: covariates, predictors, explanatory variables, input, independent variables ∗ given, known ∗ (nez´avisle promˇenn´e, regresory) β0, . . . , βk: coefficients ∗ unknown ∗ (regresn´ı koeficienty) εi: random error ∗ random variable, unobserved ◦ εi iid ∼ N(0, σ2 ), i ∈ {1, . . . , n} E εi = 0: no systematic errors Var εi = σ2 : same precision Example: bloodpress data CHAPTER 5. NORMAL LINEAR MODEL 57 ◦ from sites.stat.psu.edu/~lsimon/stat501wc/sp05/data/ ◦ association between the mean arterial blood pressure[mmHg] and age[years], weight[kg], body surface area[m2 ], duration of hypertension[years], basal pulse[beats/min], stress ◦ data: BP Age Weight BSA DoH Pulse Stress 105 47 85.4 1.75 5.1 63 33 115 49 94.2 2.10 3.8 70 14 . . . . . . . . . . . . . . . . . . . . . 110 48 90.5 1.88 9.0 71 99 122 56 95.7 2.09 7.0 75 99 ◦ model: Y = Xβ + ε       105 115 . . . 110 122       =       1 47 85.4 1.75 5.1 63 33 1 49 94.2 2.10 3.8 70 14 . . . . . . . . . . . . . . . . . . . . . 1 48 90.5 1.88 9.0 71 99 1 56 95.7 2.09 7.0 75 99       ×   β0 . . . β6   +       ε1 ε2 . . . ε19 ε20       5.1.2 Task for this chapter Estimation in normal linear model ◦ model: Y = Xβ + ε outcome Y ∗ random vector, we observe a realization y predictors x,1, . . . , x,k ∗ vector of given (known) constants coefficients β ∗ vector of unknown constants error ε ∗ unknown random vector, we do not observe its realization assumptions: ε ∼ N(0, σ2 I) ∗ E Y = Xβ: the expected value of Y is a linear function of β ∗ E ε = 0: no systematic errors ∗ Var ε = σ2 I: independence and same precision ◦ task: given the observed data y and known matrix X, find estimators β and σ2 of β and σ2 with desirable properties CHAPTER 5. NORMAL LINEAR MODEL 58 5.2 Estimating β and σ2 5.2.1 Likelihood Likelihood ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ε ∼ N(0, σ2 I) ⇒ Y ∼ N(Xβ, σ2 I) density of Y: f(y; β, σ2 ) = 1 √ 2πσ2 n exp − 1 2σ2 (y − Xβ) (y − Xβ) density is a function of y (parameters are fixed) ◦ likelihood: L(β, σ2 ; y) = 1 √ 2πσ2 n exp − 1 2σ2 (y − Xβ) (y − Xβ) ◦ likelihood is a function of the parameters (y is fixed) ◦ log-likelihood: (β, σ2 ; y) = − n 2 log(2π) − n 2 log(σ2 ) − 1 2σ2 (y − Xβ) (y − Xβ) Log-likelihood ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ◦ log-likelihood: (β, σ2 ; y) = − n 2 log(2π) − n 2 log(σ2 ) − 1 2σ2 (y − Xβ) (y − Xβ) CHAPTER 5. NORMAL LINEAR MODEL 59 x y 0.02 0.04 0.06 0.08 0.1 0.12 0.14 5.2.2 Matrix derivatives Matrix derivatives: definition ◦ let x ∈ Rn and y ∈ Rm ◦ denominator-layout notation: ∂ ∂x y =   ∂ ∂x1 y1 . . . ∂ ∂x1 ym . . . . . . . . . ∂ ∂xn y1 . . . ∂ ∂xn ym   ◦ if n = 1 ∂ ∂x y = ∂ ∂x y1, . . . , ∂ ∂x ym ◦ if m = 1 ∂ ∂x y =   ∂ ∂x1 y . . . ∂ ∂xn y   Matrix derivatives: useful formulae ◦ let A ∈ Rm×n and x ∈ Rn ∂ ∂x A x = A ∂ ∂x x A = A CHAPTER 5. NORMAL LINEAR MODEL 60 ∂ ∂x x A x = (A + A ) x 5.2.3 Maximizing the likelihood Score function ◦ log-likelihood (β, σ2 ; y) = − n 2 log(2π) − n 2 log(σ2 ) − 1 2σ2 (y − Xβ) (y − Xβ) ◦ score function (β-related part): U1:p(β, σ2 ; y) = ∂ ∂β − 1 2σ2 (y − X β) (y − X β) = ∂ ∂β − 1 2σ2 y y − y X β − β X y + β X X β = − 1 2σ2 −X y − X y + (X X + X X) β = 1 σ2 X y − X X β Score function ctd. 
◦ log-likelihood (β, σ2 ; y) = − n 2 log(2π) − n 2 log(σ2 ) − 1 2σ2 (y − Xβ) (y − Xβ) ◦ score function (σ2 -related part): Up+1(β, σ2 ; y) = ∂ ∂σ2 − n 2 log(σ2 ) − 1 2σ2 (y − X β) (y − X β) = − n 2 σ2 + 1 2σ4 (y − X β) (y − X β) = 1 2 σ2 −n + 1 σ2 (y − X β) (y − X β) Score equation ◦ score equation: U(β, σ2 ; y) = 1 σ2 X y − X X β 1 2 σ2 −n + 1 σ2 (y − X β) (y − X β) = 0 CHAPTER 5. NORMAL LINEAR MODEL 61 ◦ score equation for β 1 σ2 X y − X X β ! = 0 actually the normal equations βMLE = (X X)−1 X Y ◦ score equation for σ2 1 2 σ2 −n + 1 σ2 (y − X β) (y − X β) ! = 0 σ2 MLE = 1 n (Y − X βMLE) (Y − X βMLE) Fisher information ◦ observed Fisher information matrix: J(β, σ2 ; y) = 1 σ2 X X 1 σ2 X y − X X β 1 σ2 (X y − X X β) − 1 σ2 n 2 − 1 σ2 (y − X β) (y − X β) ◦ J(βMLE, σ2 MLE): 1 σ2 MLE X X 0 0 n 2 σ2 MLE 0 ◦ Fisher information matrix: I(β, σ2 ) = 1 σ2 X X 0 0 n 2 σ2 5.3 Distribution 5.3.1 Distribution of the MLE Distribution of βMLE ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ◦ Y ∼ N(Xβ, σ2 I) ◦ distribution of βMLE = (X X)−1 X Y? ◦ MVN 3: Let X ∼ N(µ, Σ) and let A be an m × n real matrix and b ∈ Rm . Then AX + b ∼ N(Aµ + b, AΣA ). CHAPTER 5. NORMAL LINEAR MODEL 62 ◦ βMLE ∼ N(β, σ2 (X X)−1 ) Distribution of σ2 MLE ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ◦ Y ∼ N(Xβ, σ2 I) ◦ distribution of σ2 MLE = 1 n (Y − X βMLE) (Y − X βMLE)? ◦ recall that ˆY = Xβ = HY = X (X X) −1 X Y (Y − ˆY) = e = (I − H) Y ∗ e ∼ N(0, σ2 (I − H)) (by MVN 3) ◦ QF 3: Let X ∼ N(µ, Σ) where rank(Σ) = r < n. Then (X − µ) Σ+ (X − µ) ∼ χ2 (r). ◦ (I − H)+ = (I − H) ⇒ 1 σ2 e (I − H) e ∼ χ2 (n − p) ◦ e (I − H) e = Y (I − H) (I − H)(I − H) Y = e e ◦ n σ2 σ2 MLE ∼ χ2 (n − p) Relationship between βMLE and σ2 MLE ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ◦ Y ∼ N(Xβ, σ2 I) ◦ βMLE ∼ N(β, σ2 (X X)−1 ) ◦ n σ2 MLE ∼ χ2 (n − p) ◦ joint distribution of βMLE and σ2 MLE? ◦ recall that β = (X X) −1 X Y (Y − ˆY) = e = (I − H) Y CHAPTER 5. NORMAL LINEAR MODEL 63 ◦ Corollary of MVN 7: Let X ∼ N(µ, Σ). Then AX ⊥⊥ BX iff AΣB = 0. ◦ (X X) −1 X (I − H) = (X X) −1 X − (X X) −1 X X (X X) −1 X = 0 ◦ β ⊥⊥ e and β ⊥⊥ σ2 MLE 5.4 Summary 5.4.1 Estimation in the normal linear model Estimation in the normal linear model Theorem. Let Y = Xβ + ε where X is an n × p matrix, rank(X) = p, β ∈ Rp , and ε ∼ N(0, σ2 I). Then the maximum likelihood estimators of β and σ2 are given by βMLE = (X X)−1 X Y and σ2 MLE = 1 n (Y − X βMLE) (Y − X βMLE). Their distributions are βMLE ∼ N(β, σ2 (X X)−1 ) and n σ2 σ2 MLE ∼ χ2 (n−p), and βMLE and σ2 MLE are independent. ◦ unbiased estimator of σ2 : σ2 = 1 n−p (Y − X βMLE) (Y − X βMLE) ◦ its distribution: (n−p) σ2 σ2 ∼ χ2 (n − p) and β ⊥⊥ σ2 Chapter 6 Inference in normal linear model 6.1 The problem 6.1.1 Normal linear model Normal linear model ◦ Yi = β0 + β1xi,1 + . . . + βkxi,k + εi, i ∈ {1, . . . , n} Yi: outcome, response, output, dependent variable ∗ random variable, we observe a realization yi ∗ (odezva, z´avisle promˇenn´a, regresand) xi,1, . . . , xi,k: covariates, predictors, explanatory variables, input, independent variables ∗ given, known ∗ (nez´avisle promˇenn´e, regresory) β0, . . . , βk: coefficients ∗ unknown ∗ (regresn´ı koeficienty) εi: random error ∗ random variable, unobserved ◦ εi iid ∼ N(0, σ2 ), i ∈ {1, . . . , n} E εi = 0: no systematic errors Var εi = σ2 : same precision Example: bloodpress data CHAPTER 6. 
INFERENCE IN NORMAL LINEAR MODEL 65 ◦ from sites.stat.psu.edu/~lsimon/stat501wc/sp05/data/ ◦ association between the mean arterial blood pressure[mmHg] and age[years], weight[kg], body surface area[m2 ], duration of hypertension[years], basal pulse[beats/min], stress ◦ data: BP Age Weight BSA DoH Pulse Stress 105 47 85.4 1.75 5.1 63 33 115 49 94.2 2.10 3.8 70 14 . . . . . . . . . . . . . . . . . . . . . 110 48 90.5 1.88 9.0 71 99 122 56 95.7 2.09 7.0 75 99 ◦ model: Y = Xβ + ε       105 115 . . . 110 122       =       1 47 85.4 1.75 5.1 63 33 1 49 94.2 2.10 3.8 70 14 . . . . . . . . . . . . . . . . . . . . . 1 48 90.5 1.88 9.0 71 99 1 56 95.7 2.09 7.0 75 99       ×   β0 . . . β6   +       ε1 ε2 . . . ε19 ε20       6.1.2 Task for this chapter Inference in normal linear model ◦ model: Y = Xβ + ε outcome Y ∗ random vector, we observe a realization y predictors x,1, . . . , x,k ∗ vector of given (known) constants coefficients β ∗ vector of unknown constants error ε ∗ unknown random vector, we do not observe its realization assumptions: ε ∼ N(0, σ2 I) ∗ E Y = Xβ: the expected value of Y is a linear function of β ∗ E ε = 0: no systematic errors ∗ Var ε = σ2 I: independence and same precision ◦ task: given the observed data y and known matrix X, draw conclusions about Y and the relationship between Y and X CHAPTER 6. INFERENCE IN NORMAL LINEAR MODEL 66 6.2 Estimators and distributions 6.2.1 Estimators Point estimation in the normal linear model ◦ model: Y = Xβ + ε X is an n × p matrix, rank(X) = p β ∈ Rp ε ∼ N(0, σ2 I) ◦ estimating β βMLE = βOLS = βMOM = β = (X X)−1 X Y ∗ BLUE ∗ distribution: β ∼ N(β, σ2 (X X)−1 ) ◦ estimating σ2 σ2 MLE = 1 n (Y − X β) (Y − X β) ∗ distribution: n σ2 σ2 MLE ∼ χ2 (n − p) σ2 = 1 n−p (Y − X β) (Y − X β) ∗ unbiased ∗ distribution: (n−p) σ2 σ2 ∼ χ2 (n − p) ◦ β ⊥⊥ σ2 MLE and β ⊥⊥ σ2 6.2.2 Distributions Distributions in normal linear model ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ◦ distributions of point estimators β ∼ N(β, σ2 (X X)−1 ) (n−p) σ2 σ2 ∼ χ2 (n − p) β ⊥⊥ σ2 ◦ let a ∈ Rp and A ∈ Rm×p CHAPTER 6. INFERENCE IN NORMAL LINEAR MODEL 67 a β ∼ N(a β, σ2 a (X X)−1 a) A β ∼ N(A β, σ2 A(X X)−1 A ) ∗ proofs: use MVN 3: Let X ∼ N(µ, Σ) and let A be an m × n real matrix and b ∈ Rm . Then AX + b ∼ N(Aµ + b, AΣA ). Distributions in normal linear model ctd. ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ◦ distributions of statistics a β ∼ N(a β, σ2 a (X X)−1 a) for a ∈ Rp A β ∼ N(A β, σ2 A(X X)−1 A ) for A ∈ Rm×p (n−p) σ2 σ2 ∼ χ2 (n − p) β ⊥⊥ σ2 ◦ for a ∈ Rp and A ∈ Rm×p , rank(A) = m a β − a β σ2 a (X X)−1a ∼ t(n − p) ∗ proof: verify that the definition of t(n − p) is satisfied 1 m σ2 (Aβ − Aβ) (A (X X)−1 A )−1 (Aβ − Aβ) ∼ F(m, n − p) ∗ proof: use QF2: X ∼ N(µ, Σ), rank(Σ) = n ⇒ (X − µ) Σ−1 (X − µ) ∼ χ2 (n) and verify that the definition of F(m, n − p) is satisfied 6.3 Confidence intervals Confidence intervals Interval estimation in normal linear model ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ◦ let a ∈ Rp CHAPTER 6. 
INFERENCE IN NORMAL LINEAR MODEL 68 a β − a β σ2 a (X X)−1a ∼ t(n − p) (1 − α) × 100 % confidence interval for a β: a β − t1−α/2(n − p) σ2 a (X X)−1a , a β + t1−α/2(n − p) σ2 a (X X)−1a ◦ (n−p) σ2 σ2 ∼ χ2 (n − p) (1 − α) × 100 % confidence interval for σ2 : (n − p) σ2 χ2 1−α/2(n − p) , (n − p) σ2 χ2 α/2(n − p) Confidence intervals for the components of β ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ◦ let a ∈ Rp such that ai = 1 and aj = 0 for j = i (1 − α) × 100 % confidence interval for βi: ˆβi − t1−α/2(n − p) σ2 (X X)−1 i,i , ˆβi + t1−α/2(n − p) σ2 (X X)−1 i,i ◦ let a ∈ Rp such that a1 = 1, ai = 1 and aj = 0 for j = i (1 − α) × 100 % confidence interval for β1 + βi: ˆβ1 + ˆβi − t1−α/2(n − p) σ2 (X X)−1 1,1 + 2(X X)−1 1,i + (X X)−1 i,i , ˆβ1 + ˆβi + t1−α/2(n − p) σ2 (X X)−1 1,1 + 2(X X)−1 1,i + (X X)−1 i,i and analogously for other sums of components of β 6.4 Prediction Prediction New covariate combinations CHAPTER 6. INFERENCE IN NORMAL LINEAR MODEL 69 ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ◦ what can we say about Y = β0 + β1x1 + . . . + βkxk + ε ? ◦ let x ∈ Rp such that x = (1, x1, . . . , xk) ◦ Y = x β + ε and E Y = x β ◦ we may estimate E Y by E Y = x β ◦ (1 − α) × 100 % confidence interval for E Y : x β − t1−α/2(n − p) σ2 x (X X)−1x , x β + t1−α/2(n − p) σ2 x (X X)−1x Prediction in normal linear model ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ◦ how can we estimate Y = β0 + β1x1 + . . . + βkxk + ε = x β + ε ? i.e. how do we predict new Y for new x? ◦ prediction ˆY = x β ◦ (1 − α) × 100 % confidence interval for Y (prediction interval): x β − t1−α/2(n − p) σ2 (1 + x (X X)−1x) , x β + t1−α/2(n − p) σ2 (1 + x (X X)−1x) CHAPTER 6. INFERENCE IN NORMAL LINEAR MODEL 70 6.5 Confidence bands Confidence bands Confidence regions in normal linear model ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ◦ let A ∈ Rm×p , rank(A) = m 1 m σ2 (A β − A β) (A (X X)−1 A )−1 (A β − A β) ∼ F(m, n − p) ◦ (1 − α) × 100 % confidence bands for A β: A β; 1 m σ2 (A β − A β) (A (X X)−1 A )−1 (A β − A β) ≤ F1−α(m, n − p) Confidence bands in normal linear model ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) Lemma 1. Let B ∈ Rm×m , B 0. Then for every x ∈ Rm x B x ≤ 1 ⇔ (b x)2 ≤ b B−1 b ∀ b ∈ Rm . ◦ a proof can be found in Jiˇr´ı Andˇel: Z´aklady matematick´e statistiky (2005). Matfyzpress; see also multiple comparisons and Scheff´e’s theorem next semester ◦ for A ∈ Rm×p , rank(A) = m: 1 − α = = P 1 m σ2 (A β − A β) (A (X X)−1 A )−1 (A β − A β) ≤ F1−α(m, n − p) = P b (A β − A β) 2 ≤ m F1−α(m, n − p) σ2 b A (X X)−1 A b; ∀b ∈ Rm 6.6 Testing hypotheses 6.6.1 Simple hypothesis Testing H0 : βi = 0 CHAPTER 6. INFERENCE IN NORMAL LINEAR MODEL 71 ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ◦ for a ∈ Rp a β − a β σ2 a (X X)−1a ∼ t(n − p) ◦ let a ∈ Rp such that ai = 1 and aj = 0 for j = i ◦ testing H0 : βi = 0 vs. H1 : βi = 0 ◦ test statistic Ti = ˆβi σ2 (X X)−1 i,i ∼ t(n − p) ◦ reject H0 in favour of H1 if |ti| > t1−α/2(n − p) ◦ analogously for linear combinations of elements of β ◦ analogously for testing H0 : βi = β0,i 6.6.2 Composite hypothesis Testing H0 : βi:p = 0 ◦ model: Y = Xβ + ε, ε ∼ N(0, σ2 I) ◦ for A ∈ Rm×p , rank(A) = m 1 m σ2 (Aβ − Aβ) (A (X X)−1 A )−1 (Aβ − Aβ) ∼ F(m, n − p) ◦ testing H0 : βi:p = 0 vs. H1 : βi:p = 0 ◦ test statistic Fi:p = 1 (p − i + 1) σ2 βi:p (X X)−1 i:p,i:p βi:p ∼ F(p − i + 1, n − p) CHAPTER 6. 
INFERENCE IN NORMAL LINEAR MODEL 72
◦ reject H0 in favour of H1 if fi:p > F1−α(p − i + 1, n − p)

Testing "the model"
◦ model: Y = Xβ + ε, ε ∼ N(0, σ²I)
◦ for A ∈ Rm×p, rank(A) = m:
  (1 / (m σ̂²)) (Aβ̂ − Aβ)ᵀ (A(XᵀX)⁻¹Aᵀ)⁻¹ (Aβ̂ − Aβ) ∼ F(m, n − p)
◦ testing H0 : β2:p = 0 vs. H1 : β2:p ≠ 0
◦ test statistic F = (1 / (k σ̂²)) β̂2:pᵀ ( [(XᵀX)⁻¹]2:p,2:p )⁻¹ β̂2:p ∼ F(k, n − p)
◦ reject H0 in favour of H1 if f > F1−α(k, n − p)

6.7 Interpretation
Interpretation of results for normal linear model

A model for fev data
◦ model: Y = Xβ + ε, ε ∼ N(0, σ²I)
◦ data: fev from http://www.statsci.org/data/general/fev.html
[Figure: scatterplot matrix of the fev data, i.e. all pairwise plots of Age, FEV, Height, Sex and Smoker.]
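The scatterplot matrix above can be reproduced in R. The following is a minimal sketch; it assumes the data have been saved locally as "fev.txt" from the page above (the file name and exact format are assumptions, adjust the read call as needed).

fev <- read.table("fev.txt", header = TRUE, stringsAsFactors = TRUE)
summary(fev)     # variables Age, FEV, Height, Sex, Smoker
pairs(fev)       # scatterplot matrix as in the figure above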
A model for fev data ctd.
◦ model: Y = Xβ + ε, ε ∼ N(0, σ²I)
◦ model FEV by Height and Sex
[Figure: FEV [l] against Height [cm] for females and males, and boxplots of FEV [l] by gender]

Fitted model for fev data
◦ model: Y = Xβ + ε, ε ∼ N(0, σ²I)
◦ model FEV by Height and Sex

> model.simple <- lm(FEV ~ Height + Sex, data = fev)
> summary(model.simple)

Call:
lm(formula = FEV ~ Height + Sex, data = fev)

Residuals:
    Min      1Q  Median      3Q     Max
-1.6763 -0.2505  0.0001  0.2347  2.0722

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.390263   0.180082 -29.932  < 2e-16 ***
Height       0.130231   0.002964  43.933  < 2e-16 ***
SexMale      0.125123   0.033801   3.702 0.000232 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4265 on 651 degrees of freedom
Multiple R-squared: 0.7587, Adjusted R-squared: 0.758
F-statistic: 1024 on 2 and 651 DF, p-value: < 2.2e-16

Fitted model for fev data ctd.
◦ model: Y = Xβ + ε, ε ∼ N(0, σ²I)
◦ model FEV by Height and Sex

> coefficients(model.simple)
(Intercept)      Height     SexMale
 -5.3902632   0.1302305   0.1251234

[Figure: FEV by height, fitted lines for girls and boys (Height [in] vs FEV [l])]

Chapter 7
Model selection

7.1 The problem
7.1.1 Normal linear model

Normal linear model
◦ Yi = β0 + β1 xi,1 + . . . + βk xi,k + εi, i ∈ {1, . . . , n}
  Yi: outcome, response, output, dependent variable
    ∗ random variable, we observe a realization yi
    ∗ (odezva, závisle proměnná, regresand)
  xi,1, . . . , xi,k: covariates, predictors, explanatory variables, input, independent variables
    ∗ given, known
    ∗ (nezávisle proměnné, regresory)
  β0, . . . , βk: coefficients
    ∗ unknown
    ∗ (regresní koeficienty)
  εi: random error
    ∗ random variable, unobserved
◦ εi iid∼ N(0, σ²), i ∈ {1, . . . , n}
  E εi = 0: no systematic errors
  Var εi = σ²: same precision

Example: bloodpress data
◦ from sites.stat.psu.edu/~lsimon/stat501wc/sp05/data/
◦ association between the mean arterial blood pressure [mmHg] and age [years], weight [kg], body surface area [m²], duration of hypertension [years], basal pulse [beats/min], stress
◦ data:
  BP   Age  Weight  BSA   DoH  Pulse  Stress
  105   47   85.4   1.75  5.1   63     33
  115   49   94.2   2.10  3.8   70     14
  ...
  110   48   90.5   1.88  9.0   71     99
  122   56   95.7   2.09  7.0   75     99
◦ model: Y = Xβ + ε, i.e. (105, 115, . . . , 110, 122)ᵀ = X (β0, . . . , β6)ᵀ + (ε1, ε2, . . . , ε19, ε20)ᵀ, where the rows of X are (1, 47, 85.4, 1.75, 5.1, 63, 33), (1, 49, 94.2, 2.10, 3.8, 70, 14), . . . , (1, 48, 90.5, 1.88, 9.0, 71, 99), (1, 56, 95.7, 2.09, 7.0, 75, 99)
https://ww2.amstat.org/publications/jse/v13n2/datasets.kahn.html

Example: fev data
◦ from: http://www.statsci.org/data/general/fev.html
◦ question: association between the FEV [l] and Smoking, corrected for Age [years], Height [cm] and Gender
◦ data:
  FEV    Age  Height  Gender  Smoking
  1.708    9   144.8  Female  Non
  1.724    8   171.5  Female  Non
  1.720    7   138.4  Female  Non
  1.558    9   134.6  Male    Non
  ...
  3.727   15   172.7  Male    Current
  2.853   18   152.4  Female  Non
  2.795   16   160.0  Female  Current
  3.211   15   168.9  Female  Non
◦ model: Y = Xβ + ε, i.e. (1.708, 1.724, 1.720, 1.558, . . . , 3.727, 2.853, 2.795, 3.211)ᵀ = X (β0, . . . , β5)ᵀ + (ε1, . . . , ε654)ᵀ, where the rows of X are (1, 9, 144.8, 0, 0), (1, 8, 171.5, 0, 0), (1, 7, 138.4, 0, 0), (1, 9, 134.6, 1, 0), . . . , (1, 15, 172.7, 1, 1), (1, 18, 152.4, 0, 0), (1, 16, 160.0, 0, 1), (1, 15, 168.9, 0, 0)

7.1.2 Task for this chapter

Model building/selection
◦ model: Y = Xβ + ε
  outcome Y
    ∗ random vector, we observe a realization y
  predictors x,1, . . . , x,k
    ∗ vectors of given (known) constants
  coefficients β
    ∗ vector of unknown constants
  error ε
    ∗ unknown random vector, we do not observe its realization
  assumptions: ε ∼ N(0, σ²I)
    ∗ E Y = Xβ: the expected value of Y is a linear function of β
    ∗ E ε = 0: no systematic errors
    ∗ Var ε = σ²I: independence and same precision
◦ task: given the observed data y and values of potential covariates, construct X
◦ Note: X should ideally be known a priori based on background knowledge and various optimality considerations but . . .

7.2 Why consider various models?
7.2.1 Should we leave out covariates that appear unnecessary?

Testing hypotheses about null coefficients
◦ model: Y = Xβ + ε, ε ∼ N(0, σ²I)
◦ testing H0: βi = 0 vs. H1: βi ≠ 0
◦ test statistic Ti = β̂i / √(σ̂² (XᵀX)⁻¹i,i) ∼ t(n − p)
◦ reject H0 in favour of H1 if |ti| > t1−α/2(n − p)
◦ testing H0: βi:p = 0 vs. H1: βi:p ≠ 0
◦ test statistic Fi:p = β̂i:pᵀ [(XᵀX)⁻¹i:p,i:p]⁻¹ β̂i:p / ((p − i + 1) σ̂²) ∼ F(p − i + 1, n − p)
◦ reject H0 in favour of H1 if fi:p > F1−α(p − i + 1, n − p)

What if we do not reject?
◦ model: Y = Xβ + ε, ε ∼ N(0, σ²I)
◦ testing H0: βi = 0 vs. H1: βi ≠ 0
◦ if |ti| < t1−α/2(n − p) at level α, we do not reject that βi = 0 in favour of βi ≠ 0
◦ testing H0: βi:p = 0 vs. H1: βi:p ≠ 0
◦ if fi:p < F1−α(p − i + 1, n − p) at level α, we do not reject βi:p = 0 in favour of βi:p ≠ 0
◦ if we do not reject that some components of β are 0, should we change the model?
  original model Y = Xβ + ε, ε ∼ N(0, σ²I)
  new model Y = X,1:(i−1) β1:(i−1) + ε, ε ∼ N(0, σ²I)
Example: bloodpress data
◦ original model
  Yi = β0 + β1 × Agei + β2 × Weighti + β3 × BSAi + β4 × Duri + β5 × Pulsei + β6 × Stressi + εi, 1 ≤ i ≤ 20

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -12.870476   2.556650  -5.034 0.000229 ***
Age           0.703259   0.049606  14.177 2.76e-09 ***
Weight        0.969920   0.063108  15.369 1.02e-09 ***
BSA           3.776491   1.580151   2.390 0.032694 *
Dur           0.068383   0.048441   1.412 0.181534
Pulse        -0.084485   0.051609  -1.637 0.125594
Stress        0.005572   0.003412   1.633 0.126491

Residual standard error: 0.4072 on 13 degrees of freedom

> coef.table <- summary(model.full)$coefficients
> V <- vcov(model.full)
> A <- diag(rep(1, 7))[5:7, ]
> F.stat <- t(A%*%coef.table[, 1])%*%solve(A%*%V%*%t(A))%*%(A%*%coef.table[, 1])/3
> 1-pf(F.stat, df1=3, df2=13)
          [,1]
[1,] 0.1950807

◦ should we rather use the new model?
  Yi = β0 + β1 × Agei + β2 × Weighti + β3 × BSAi + εi, 1 ≤ i ≤ 20

Example: bloodpress data
◦ original model
  Yi = β0 + β1 × Agei + β2 × Weighti + β3 × BSAi + β4 × Duri + β5 × Pulsei + β6 × Stressi + εi, 1 ≤ i ≤ 20
  Y = Xβ + ε, with rows of X of the form (1, 47, 85.4, 1.75, 5.1, 63, 33), (1, 49, 94.2, 2.10, 3.8, 70, 14), . . . , (1, 48, 90.5, 1.88, 9.0, 71, 99), (1, 56, 95.7, 2.09, 7.0, 75, 99)
◦ new model
  Yi = β0 + β1 × Agei + β2 × Weighti + β3 × BSAi + εi, 1 ≤ i ≤ 20
  Y = X̃β + ε, where X̃ keeps only the first four columns of X: rows (1, 47, 85.4, 1.75), (1, 49, 94.2, 2.10), . . . , (1, 48, 90.5, 1.88), (1, 56, 95.7, 2.09)
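The same comparison can be carried out without assembling the quadratic form by hand: anova() applied to two nested fits computes exactly this F statistic. A minimal sketch, assuming the data sit in a data frame called bloodpress with the response BP and covariate names as in the output above (only model.full appears in the notes; the other names are assumptions):

> # full and reduced bloodpress models; model.full matches the fit summarized above
> model.full  <- lm(BP ~ Age + Weight + BSA + Dur + Pulse + Stress, data = bloodpress)
> model.small <- lm(BP ~ Age + Weight + BSA, data = bloodpress)
> # F test of H0: beta_4 = beta_5 = beta_6 = 0; should reproduce the p-value 0.195 above
> anova(model.small, model.full)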
7.2.2 What is the right form of the dependence on covariates?

Specifying the form of dependence in the fev data
◦ basic model for the dependence of FEV on Height and Sex:
  Yi = β0 + β1 × Heighti + β2 × I{the ith person is male} + εi, 1 ≤ i ≤ 654
◦ does the basic model fit the data well enough?
[Figure: FEV by height and gender, a nonparametric fit, and the fit of the basic model (Height [cm] vs FEV [l], boys and girls)]
Example: fev data
◦ original model
  Yi = β0 + β1 × Heighti + β2 × I{the ith child is male} + εi, 1 ≤ i ≤ 654
  Y = Xβ + ε, with rows of X of the form (1, Heighti, I{male}), e.g. (1, 144.8, 0), (1, 171.5, 0), (1, 138.4, 0), (1, 134.6, 1), . . . , (1, 172.7, 1), (1, 152.4, 0), (1, 160.0, 0), (1, 168.9, 0)
◦ new model
  Yi = β0 + β1 × Heighti + β2 × Height²i + β3 × I{the ith child is male} + β4 × Heighti I{the ith child is male} + β5 × Height²i I{the ith child is male} + εi, 1 ≤ i ≤ 654
  Y = Xβ + ε, with rows of X of the form (1, Heighti, Height²i, I{male}, Heighti I{male}, Height²i I{male}), e.g. (1, 144.8, 20961.3, 0, 0, 0), (1, 171.5, 29395.1, 0, 0, 0), (1, 138.4, 19162.9, 0, 0, 0), (1, 134.6, 18122.5, 1, 134.6, 18122.5), . . . , (1, 172.7, 29832.2, 1, 172.7, 29832.2), (1, 152.4, 23225.8, 0, 0, 0), (1, 160.0, 25606.4, 0, 0, 0), (1, 168.9, 28530.6, 0, 0, 0)

Example: fev data
◦ fit of the new model

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)          -5.194e+00  2.740e+00  -1.895   0.0585 .
Height                5.611e-02  3.692e-02   1.520   0.1291
I(Height^2)          -3.977e-05  1.238e-04  -0.321   0.7482
SexMale               1.392e+01  3.423e+00   4.067 5.34e-05 ***
Height:SexMale       -1.903e-01  4.545e-02  -4.188 3.20e-05 ***
I(Height^2):SexMale   6.471e-04  1.501e-04   4.310 1.89e-05 ***

[Figure: FEV by height and gender, a nonparametric fit, and the fit of the quadratic model (Height [cm] vs FEV [l], boys and girls)]
7.3 Nested models

Submodel

Nested models
◦ Bigger model: Y = Xb βb + ε, ε ∼ (0, σ²I), Xb = (1 | x,1 | x,2 | . . . | x,k−1 | x,k)
  β̂b = (Xbᵀ Xb)⁻¹ Xbᵀ Y
  Ŷb = Xb β̂b = Hb Y
  eb = Y − Ŷb = (I − Hb) Y
  σ̂²b = ||eb||² / (n − p)
◦ Smaller model: Y = Xs βs + ε, ε ∼ (0, σ²I), Xs = (1 | x,1 | x,2 | . . . | x,k−r)
  β̂s = (Xsᵀ Xs)⁻¹ Xsᵀ Y
  Ŷs = Xs β̂s = Hs Y
  es = Y − Ŷs = (I − Hs) Y
  σ̂²s = ||es||² / (n − p + r)
◦ more generally, any Xs such that im(Xs) ≤ im(Xb)
  ∃ A ∈ R^{p×(p−r)} such that Xs = Xb A
  Xs = ( Σ_{i=1}^p ai,1 x,i | . . . | Σ_{i=1}^p ai,p−r x,i )

Relationship between the two models
◦ if the smaller model holds, so does the bigger one
◦ ∃ A ∈ R^{p×(p−r)} such that Xs = Xb A
  Xs = ( Σ_{i=1}^p ai,1 x,i | . . . | Σ_{i=1}^p ai,p−r x,i )
◦ bigger model: Y = Xb βb + ε
◦ smaller model: Y = Xs βs + ε = Xb (A βs) + ε
◦ the smaller model is the bigger model with a condition on βb
  βb (p×1) = A (p×(p−r)) βs ((p−r)×1), i.e. βb,i = Σ_{j=1}^{p−r} ai,j βs,j, 1 ≤ i ≤ p
  ∃ B ∈ R^{r×p} such that B βb = 0
◦ in the bigger normal linear model we may test for the validity of the smaller model by testing whether B βb = 0 (see Week 7)

Relationship between the fits of the two models
◦ difference between the fits
  Ŷb − Ŷs = (Hb − Hs) Y
◦ difference between the residuals
  es − eb = (I − Hs) Y − (I − Hb) Y = (Hb − Hs) Y
◦ comparison of the nested models’ fits
  ||es||² = ||eb||² + ||es − eb||²
    ∗ proof: realize that ⟨eb, es − eb⟩ = 0
    ∗ note: ||es||² ≥ ||eb||² ⇒ the fit of the bigger model is closer to the observed data
    ∗ note: ||es − eb||² = ||es||² − ||eb||²
◦ in the normal linear model (ε ∼ N(0, σ²I))
  eb ⊥⊥ (es − eb)
    ∗ proof: Corollary of MVN 7: Let X ∼ N(µ, Σ). Then AX ⊥⊥ BX iff AΣBᵀ = 0.

Does the bigger model fit significantly better?
◦ assume that both models hold (i.e. the smaller model holds) and that ε ∼ N(0, σ²I) (normal linear model)
◦ (1/σ²) ||eb||² ∼ χ²(n − p)
  proof: see Week 6
◦ (1/σ²) ||es − eb||² ∼ χ²(r)
  proof: MVN 3: Let X ∼ N(µ, Σ) and let A be an m × n real matrix and b ∈ Rᵐ. Then AX + b ∼ N(Aµ + b, AΣAᵀ).
  and QF 4: Let Z ∼ N(0, I) and let P be an n × n projection matrix of rank r. Then Zᵀ P Z ∼ χ²(r).
◦ ||eb||² ⊥⊥ ||es − eb||²
  proof: see the previous slide
◦ (||es − eb||²/r) / (||eb||²/(n − p)) = ((||es||² − ||eb||²)/r) / (||eb||²/(n − p)) ∼ F(r, n − p)
  proof: verify that the definition of F(r, n − p) is satisfied
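The F statistic above can be computed directly from the residual sums of squares of the two nested fits. A hedged sketch for the fev models of Section 7.2.2; the object names model.basic and model.basic.quad are assumptions, and the formula used for the quadratic model is one possible way to specify it:

> model.basic      <- lm(FEV ~ Height + Sex, data = fev)
> model.basic.quad <- lm(FEV ~ (Height + I(Height^2)) * Sex, data = fev)
> rss.s <- sum(resid(model.basic)^2)       # ||e_s||^2, smaller model
> rss.b <- sum(resid(model.basic.quad)^2)  # ||e_b||^2, bigger model
> n <- nrow(fev)
> p <- length(coef(model.basic.quad))      # dimension of the bigger model
> r <- p - length(coef(model.basic))       # number of constraints
> F.stat <- ((rss.s - rss.b) / r) / (rss.b / (n - p))
> 1 - pf(F.stat, df1 = r, df2 = n - p)     # same p-value as anova(model.basic, model.basic.quad)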
More submodels

Several models nested within one another
◦ Big model: Y = Xb βb + ε, ε ∼ (0, σ²I)
  Ŷb = Xb β̂b = Hb Y
  eb = Y − Ŷb = (I − Hb) Y
◦ Small model: Y = Xs βs + ε, ε ∼ (0, σ²I)
  Ŷs = Xs β̂s = Hs Y
  es = Y − Ŷs = (I − Hs) Y
  σ̂²s = ||es||² / (n − p + r)
◦ Super-small model: Y = Xss βss + ε, ε ∼ (0, σ²I)
  Ŷss = Xss β̂ss = Hss Y
  ess = Y − Ŷss = (I − Hss) Y
  σ̂²ss = ||ess||² / (n − p + s)
◦ im(Xss) ≤ im(Xs) ≤ im(Xb)

Relationship between the fits of the models
◦ difference between the fits
  Ŷb − Ŷs = (Hb − Hs) Y
  Ŷb − Ŷss = (Hb − Hss) Y
  Ŷs − Ŷss = (Hs − Hss) Y
◦ difference between the residuals
  es − eb = (I − Hs) Y − (I − Hb) Y = (Hb − Hs) Y
  ess − eb = (I − Hss) Y − (I − Hb) Y = (Hb − Hss) Y
  ess − es = (I − Hss) Y − (I − Hs) Y = (Hs − Hss) Y
◦ in the normal linear model (ε ∼ N(0, σ²I))
  eb ⊥⊥ (es − eb)
  eb ⊥⊥ (ess − eb)
  eb ⊥⊥ (ess − es)
    ∗ proof: Corollary of MVN 7: Let X ∼ N(µ, Σ). Then AX ⊥⊥ BX iff AΣBᵀ = 0.

How about the super-small model’s fit?
◦ assume that all three models hold (i.e. the super-small model holds) and that ε ∼ N(0, σ²I) (normal linear model)
◦ (1/σ²) ||eb||² ∼ χ²(n − p)
◦ (1/σ²) ||ess − es||² ∼ χ²(s − r)
  proof: MVN 3: Let X ∼ N(µ, Σ) and let A be an m × n real matrix and b ∈ Rᵐ. Then AX + b ∼ N(Aµ + b, AΣAᵀ).
  and QF 4: Let Z ∼ N(0, I) and let P be an n × n projection matrix of rank r. Then Zᵀ P Z ∼ χ²(r).
◦ ||eb||² ⊥⊥ ||ess − es||²
◦ (||ess − es||²/(s − r)) / (||eb||²/(n − p)) = ((||ess||² − ||es||²)/(s − r)) / (||eb||²/(n − p)) ∼ F(s − r, n − p)
  proof: verify that the definition of F(s − r, n − p) is satisfied

7.4 Selecting the model
7.4.1 Model selection tools

Model selection based on sequential testing
◦ statistical tests
  t test for testing βi = 0 vs. βi ≠ 0
  F test for testing Aβ = 0 vs. Aβ ≠ 0
  likelihood ratio test
    ∗ 2(maxθb ℓ(Big model) − maxθs ℓ(Small model)) ∼as. χ²(|θb| − |θs|)
    ∗ details next semester
◦ we may start with a big model and sequentially leave out terms that do not appear significant
  multiple testing ⇒ we do not keep the overall α
    ∗ often α > 0.05 is used at this stage (even α ≈ 0.2)
    ∗ the procedure is an ad-hoc one (rather than valid testing)
    ∗ “clean” ways exist (e.g. error-spending function)
  an approach of this kind is often applied when the interest is in β and the model is there to explain the phenomenon
◦ words of caution
  p > 0.05 does not guarantee the absence of the relationship
  significance of the terms in the final model may be amplified

Model selection based on “criteria”
◦ model selection criterion
  a number that describes the overall fit of the model
◦ often applied when the interest is in prediction
◦ focus is on ||e||² = ||Y − Ŷ||²
◦ already seen
  coefficient of determination
    R² = 1 − ||e||² / ||Y − Ȳ1||²
    ∗ always bigger for a bigger model
    ∗ bigger model is not necessarily better, so is the difference big enough to justify the use of the bigger model?
  adjusted coefficient of determination
    R²adj = 1 − (||e||²/(n − p)) / (||Y − Ȳ1||²/(n − 1))
    ∗ penalizes for the model complexity

Likelihood-based information criteria
◦ model fit versus model complexity trade-off
◦ Akaike information criterion
  AIC = −2 maxθ ℓ(model) + 2 × |θ|
  motivation
    ∗ information theory
    ∗ prediction
  favours bigger models
◦ Bayesian information criterion
  BIC = −2 maxθ ℓ(model) + log(n) × |θ|
  motivation
    ∗ Bayesian model comparison
    ∗ selection of covariates
  favours smaller models
◦ smaller is better
◦ can be used to compare non-nested models
◦ can be used for more general models (cf. next semester)
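The information criteria are available directly for fitted lm objects. A short sketch comparing the two fev models introduced above (object names as assumed earlier; AIC() and BIC() are base R):

> AIC(model.basic, model.basic.quad)       # smaller values are preferred
> BIC(model.basic, model.basic.quad)       # the log(n) penalty favours smaller models
> summary(model.basic.quad)$adj.r.squared  # adjusted R^2 as a complexity-penalized fit measure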
Mallows’s CP
◦ criterion specific for linear regression:
  suppose that the full model has β of length p
  describe the fit (focus on prediction) of its submodel with β of length P
  estimate the average mean square error of prediction (1/σ²) Σᵢ₌₁ⁿ E(Ŷi − E Yi)² by
    (1/σ̂²b) Σᵢ₌₁ⁿ (Ŷi − Yi)² = ||es||² / (||eb||²/(n − p))
◦ CP = ||es||² / (||eb||²/(n − p)) − n + 2P
  for the full model: CP = p
  models with CP ≈ P are considered good
  we may plot CP against P and choose a small model that has CP ≈ P (if small is preferred)
  related to the AIC

7.4.2 Model selection strategies

To leave out or not to leave out?
◦ setting βi = 0 if the true βi ≠ 0
  i.e. leaving out a covariate that should have been kept
  possible bias in the estimators of βj for j ≠ i
  possible invalidity of the resulting model (cf. Week 10)
◦ allowing βi ≠ 0 if the true βi ≈ 0
  i.e. keeping unnecessary covariates in the model
  possibly worse estimation of βj for j ≠ i and larger confidence intervals (cf. Week 11)
  possibility of overfitting
  sometimes/often simple explanations are preferable
◦ conclusion
  avoid blind automatic model selection procedures if possible

Model selection strategies
◦ step-wise procedures based on p-values of the t/F test
  backward selection
    ∗ start with the biggest model, leave out the covariate with the largest p-value, end when the p-values for all included covariates are smaller than αcrit
  forward selection
    ∗ start with the smallest model, add the covariate with the smallest p-value, end when the p-values of all non-included covariates are larger than αcrit
  step-wise selection
    ∗ a combination of forward and backward selection
  issues
    ∗ non-exhaustive search
    ∗ multiple testing; tests invalid unless the smaller model is true
    ∗ not recommended for prediction
◦ step-wise procedures with a model selection criterion
◦ exhaustive search with a model selection criterion
  e.g. plot CP or R² against the number of predictors (a short R sketch is given at the end of this chapter)

Notes on model selection
◦ hierarchical modelling
  powers of lower order should be kept in the model if powers of higher order are present
  main terms and interactions of lower order should be kept in the model if interactions of higher order are present
  there may be a good reason for a non-hierarchical model but such a model is not invariant to affine transformations and rotations of covariates
◦ several models may fit equally well
  if they give qualitatively different answers, reconsider the use of the data to answer the question
◦ avoid blind automatic model selection procedures if possible
  if impossible, choose a selection procedure to fit the purpose of the modelling and carefully examine the final model
◦ make sure that the models you considered were fitted to the same data

Concluding notes
◦ there is no best/foolproof way to do the model selection except for common sense and sound understanding of the phenomenon
◦ A model should be as simple as possible but no simpler. (Albert Einstein)
◦ All models are wrong but some are useful. (George Box)
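To close the chapter, a small R illustration of the criterion-based tools of Section 7.4 (referenced from the exhaustive-search bullet above): Mallows's CP computed from its defining formula for the bloodpress submodel with Age, Weight and BSA, and an automatic backward search by AIC with step(). The object names model.full and model.small are the ones assumed earlier; this is a sketch, not a recommended blind procedure.

> # Mallows's C_P for the submodel inside the 7-parameter full model (n = 20, p = 7)
> rss.sub  <- sum(resid(model.small)^2)              # ||e_s||^2
> sigma2.b <- sum(resid(model.full)^2) / (20 - 7)    # ||e_b||^2 / (n - p)
> P <- length(coef(model.small))
> C.P <- rss.sub / sigma2.b - 20 + 2 * P             # C_P close to P suggests a good submodel
> # automatic backward selection by AIC; examine the chosen model carefully afterwards
> step(model.full, direction = "backward")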
Chapter 8
Model diagnostics

8.1 The problem
8.1.1 Normal linear model

Normal linear model
◦ Yi = β0 + β1 xi,1 + . . . + βk xi,k + εi, i ∈ {1, . . . , n}
  (outcome Yi, covariates xi,1, . . . , xi,k, unknown coefficients β0, . . . , βk, unobserved random errors εi, as in Section 7.1.1)
◦ εi iid∼ N(0, σ²), i ∈ {1, . . . , n}
  E εi = 0: no systematic errors
  Var εi = σ²: same precision

Example: bloodpress data
◦ from sites.stat.psu.edu/~lsimon/stat501wc/sp05/data/
◦ association between the mean arterial blood pressure [mmHg] and age [years], weight [kg], body surface area [m²], duration of hypertension [years], basal pulse [beats/min], stress
◦ data and model Y = Xβ + ε as in Section 7.1.1

Example: fev data
◦ from: http://www.statsci.org/data/general/fev.html
◦ question: association between the FEV [l] and Smoking, corrected for Age [years], Height [cm] and Gender
◦ data and model Y = Xβ + ε as in Section 7.1.1

8.1.2 Task for this chapter

Checking the model assumptions
◦ model: Y = Xβ + ε, with outcome Y, predictors x,1, . . . , x,k, coefficients β and error ε as in Section 7.1.2
  assumptions: ε ∼ N(0, σ²I)
    ∗ E Y = Xβ: the expected value of Y is a linear function of β
    ∗ E ε = 0: no systematic errors
    ∗ Var ε = σ²I: independence and same precision
◦ task: do the assumptions appear to be satisfied?
◦ Note: if they are not, inference is not valid . . .

8.2 Random errors and residuals

Random errors

Random errors in the normal linear model
◦ model: Y = Xβ + ε
◦ assumptions
  E Y = Xβ: the expected value of Y is a linear function of β
  ε ∼ N(0, σ²I)
    ∗ E ε = 0: no systematic errors
    ∗ Var ε = σ²I: independence and the same precision
◦ we need to verify the assumptions on
  expectation: E Y = Xβ, i.e. E ε = 0
  variance: Var ε = σ²I
  distribution: ε ∼ N(0, σ²I)
◦ all assumptions are made on unobserved random errors ε
◦ fitted model: Y = Ŷ + (Y − Ŷ) = Xβ̂ + e
◦ residuals e sometimes seen as “estimates” of ε
  ε is an unobserved random vector, not a parameter (constant)
  e are not estimates in the usual sense

Residuals

Residuals in the normal linear model
◦ model: Y = Xβ + ε, ε ∼ N(0, σ²I)
◦ fitted model: Y = Ŷ + e = H Y + (I − H) Y
◦ e ∼ N(0, σ²(I − H))
  proof: cf. Week 6 or use MVN 3
  rank(I − H) = n − p if rank(X) = p
    ∗ e =d A Z for an (n − p)-dimensional Z ∼ N(0, I)
      I − H = U Λ Uᵀ (spectral decomposition) ⇒ A = U1 Λ1^{1/2}, with U1 of size n × (n − p) and Λ1 of size (n − p) × (n − p) (cf. Week 4 or use MVN 3)
◦ if the assumptions are satisfied, residuals are
  zero-mean
  with unequal variances: Var ei = σ²(1 − hi,i)
  with a degenerate normal distribution
  correlated: Cor(ei, ej) = −hi,j / √((1 − hi,i)(1 − hj,j))
◦ compare to ε ∼ N(0, σ²I) . . .
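The covariance structure σ²(I − H) of the residuals can be inspected numerically; a minimal sketch for the Chapter 6 fit model.simple (the only assumption is that this object is still available):

> X <- model.matrix(model.simple)
> H <- X %*% solve(t(X) %*% X) %*% t(X)              # hat matrix
> all.equal(diag(H), hatvalues(model.simple), check.attributes = FALSE)
> sigma2.hat <- summary(model.simple)$sigma^2
> head(sigma2.hat * (1 - diag(H)))                   # estimated Var e_i = sigma^2 (1 - h_{i,i})
> -H[1, 2] / sqrt((1 - H[1, 1]) * (1 - H[2, 2]))     # estimated Cor(e_1, e_2)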
Standardized residuals in the normal linear model
◦ model: Y = Xβ + ε, ε ∼ N(0, σ²I)
◦ fitted model: Y = Ŷ + e = H Y + (I − H) Y
  e ∼ N(0, σ²(I − H))
◦ to check the assumptions, we often use standardized residuals
  ri = ei / √(σ̂²(1 − hi,i)), 1 ≤ i ≤ n
◦ if the assumptions are satisfied
  we expect that ri ≈ N(0, 1)
    ∗ it can be shown that E ri = 0 and Var ri = 1 (some technical work needed to prove this)
    ∗ we did not derive the distribution of the ri’s
    ∗ we did not try to get rid of the correlation
◦ compare to ε ∼ N(0, σ²I) . . .

8.3 Model diagnostics I: checking the assumptions
8.3.1 General principles

Checking the assumptions
◦ Specifying the possible departures
  need to specify in what sense the assumption might be violated
  if the assumption is H0, need to specify H1
1. Graphical checking
◦ plots that allow us to “see” departures from the assumptions
◦ based on residuals (e or r)
2. Testing the validity of assumptions
◦ usually by fitting a more general model that allows them not to be satisfied and testing whether the generalization is needed
◦ useful as numerical indications BUT
◦ we cannot “prove the null hypothesis”
◦ problems with the validity of inference: chains of tests and multiple testing
  assumptions on assumptions
  we should *know* in advance they are satisfied

Overall check: residuals versus fitted values
◦ e ⊥ Ŷ by definition
◦ no systematic patterns should appear between e and Ŷ
◦ example: fev data
  basic model: Yi = β0 + β1 × Heighti + β2 × I{the ith child is male} + εi, 1 ≤ i ≤ 654
  quadratic model: Yi = β0 + β1 × Heighti + β2 × Height²i + β3 × I{the ith child is male} + β4 × Heighti I{the ith child is male} + β5 × Height²i I{the ith child is male} + εi, 1 ≤ i ≤ 654
[Figure: residuals versus fitted values for the basic model and for the quadratic model]
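The overall check can be drawn with a few lines of base R. A sketch for the two fev fits; model.basic and model.basic.quad are assumed object names (model.basic.quad is also the name that appears in the Shapiro-Wilk output later in this chapter), fitted as in Section 7.2.2:

> plot(fitted(model.basic), resid(model.basic),
+      xlab = "Fitted values", ylab = "Residuals",
+      main = "Residuals versus fitted values: basic model")
> abline(h = 0, lty = 2)
> plot(fitted(model.basic.quad), resid(model.basic.quad),
+      xlab = "Fitted values", ylab = "Residuals",
+      main = "Residuals versus fitted values: quadratic model")
> abline(h = 0, lty = 2)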
8.3.2 Assumptions on the expectation

Checking E ε = 0, i.e. E Y = Xβ
◦ suspected departures from the assumption
  incorrectly specified form of dependence
    ∗ plot e against the included covariates
    ∗ e ⊥ x,i, 1 ≤ i ≤ p, by definition
    ∗ no systematic patterns should appear between e and x,i
    ∗ a trend indicates a dependence not captured by the model
    ∗ a formal test: fit a more complicated dependence and test against the original model
  missing covariates
    ∗ plot e against covariates that are not included in the model
    ∗ no systematic patterns should appear
    ∗ a trend indicates a dependence not captured by the model
    ∗ a formal test: fit a larger model and test the effect of the additional covariate
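The plots just described correspond to the figures on the next slides; a hedged sketch producing them for the quadratic model (object names as before), together with one possible formal check for a missing covariate:

> plot(fev$Height, resid(model.basic.quad), xlab = "Height", ylab = "Residuals")  # included covariate
> plot(fev$Age, resid(model.basic.quad), xlab = "Age", ylab = "Residuals")        # covariate not in the model
> # formal check: add Age to the model and test whether it is needed
> anova(model.basic.quad, update(model.basic.quad, . ~ . + Age))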
Incorrectly specified form of dependence
◦ example: fev data
  basic model: Yi = β0 + β1 × Heighti + β2 × I{the ith child is male} + εi, 1 ≤ i ≤ 654
  quadratic model: Yi = β0 + β1 × Heighti + β2 × Height²i + β3 × I{the ith child is male} + β4 × Heighti I{the ith child is male} + β5 × Height²i I{the ith child is male} + εi, 1 ≤ i ≤ 654
[Figure: residuals versus height for the basic model and for the quadratic model]

Missing covariates
◦ example: fev data
◦ quadratic model: Yi = β0 + β1 × Heighti + β2 × Height²i + β3 × I{the ith child is male} + β4 × Heighti I{the ith child is male} + β5 × Height²i I{the ith child is male} + εi, 1 ≤ i ≤ 654
[Figure: residuals of the quadratic model versus age and versus smoking status (Current/Non)]

8.3.3 Assumptions on the variance

Checking Var ε = σ²I: homoskedasticity
◦ suspected departures from the assumption
  variance changing with fitted values (usually increasing)
    ∗ plot standardized residuals (usually the square root of their absolute value) against fitted values
    ∗ no pattern should appear
  variance changing with covariates
    ∗ plot standardized residuals (usually the square root of their absolute value) against covariates
    ∗ no pattern should appear
    ∗ a formal test: studentized Breusch–Pagan test
  subgroups with the same within-group variance
    ∗ plot boxplots of standardized residuals by groups
    ∗ boxes should be of approximately equal sizes
    ∗ a formal test: fit a more general model and test against the original model
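A sketch of the graphical checks just listed, using the standardized residuals of the quadratic model (rstandard() is the base R function also used later in these notes; object name as assumed before):

> r <- rstandard(model.basic.quad)
> plot(fitted(model.basic.quad), sqrt(abs(r)),
+      xlab = "Fitted values", ylab = "sqrt(|standardized residuals|)")
> boxplot(r ~ fev$Sex, ylab = "Standardized residuals")  # within-group spread should be comparable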
Breusch–Pagan test
◦ original model
  Y = Xβ + ε, ε ∼ N(0, σ²I)
◦ more general model
  Y = Xβ + ε, ε ∼ N(0, diag(σ²1, . . . , σ²n)), σ² = Xα
◦ Breusch–Pagan test: test α2:p = 0 in the more general model
◦ studentized Breusch–Pagan test
  less sensitive to the assumption of normality
◦ more general versions of the Breusch–Pagan test and more general tests exist

Checking Var ε = σ²I: homoskedasticity
◦ example: fev data
◦ quadratic model: Yi = β0 + β1 × Heighti + β2 × Height²i + β3 × I{the ith child is male} + β4 × Heighti I{the ith child is male} + β5 × Height²i I{the ith child is male} + εi, 1 ≤ i ≤ 654
[Figure: √|standardized residuals| of the quadratic model versus fitted values, and standardized residuals by gender]

Checking Var ε = σ²I: independence
◦ suspected departures from the assumption
  clustering
    ∗ suspected e.g. when several data points are collected from one individual (e.g. same individuals followed over time)
    ∗ plot boxplots of residuals by the suspected groups
    ∗ no pattern should appear
    ∗ a formal test: fit a more general model allowing for the within-group dependence and test against the original model
  serial correlation
    ∗ suspected when data are collected over time or space
    ∗ plot ei against ei−1
    ∗ no pattern should appear
    ∗ plot the (partial) autocorrelation function
    ∗ a formal test: fit a more general model and test against the original model
    ∗ a formal test: Durbin–Watson test

Durbin–Watson test
◦ original model
  Y = Xβ + ε, ε ∼ N(0, σ²I)
◦ more general model
  Y = Xβ + ε, εi = ρ εi−1 + wi, wi iid∼ (0, σ²), |ρ| < 1
  (autoregression of the first order on the error terms)
◦ Durbin–Watson test: test ρ = 0 against ρ > 0 in the more general model
◦ also possible to test ρ = 0 against ρ < 0 and ρ = 0 against ρ ≠ 0
◦ more general tests available
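Both formal tests have implementations in the lmtest package (not used elsewhere in these notes, so the tooling is an assumption rather than part of the course material): bptest() performs the studentized Breusch–Pagan test by default and dwtest() the Durbin–Watson test. The serial-correlation checks are only meaningful when the observations have a time order, which the fev data do not have; the calls below illustrate usage only.

> library(lmtest)
> bptest(model.basic.quad)      # studentized Breusch-Pagan test for heteroskedasticity
> dwtest(model.basic.quad)      # Durbin-Watson test for first-order serial correlation
> acf(resid(model.basic.quad))  # sample autocorrelation function of the residuals
> pacf(resid(model.basic.quad)) # sample partial autocorrelation function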
Time series models
◦ a time series is a random sequence {Xt, t ∈ Z}
  stationary if E Xt = µ, Var Xt = σ², Cov(Xt, Xt+s) = γ(s)
◦ The autocovariance function of a stationary random sequence {Xt, t ∈ Z} is defined as γ(h) = Cov(Xt, Xt+h), h ∈ Z.
◦ The autocorrelation function (ACF) is defined as ρ(h) = Cor(Xt, Xt+h) = γ(h)/γ(0), h ∈ Z.
◦ The partial autocorrelation function (PACF) is defined as α(1) = Cor(Xt, Xt+1) = ρ(1) and α(h) = Cor(Xt − X̂t, Xt+h − X̂t+h), h = 2, 3, . . . , where X̂t and X̂t+h are the fitted values from the linear regressions Xt ∼ Xt+1, . . . , Xt+h−1 and Xt+h ∼ Xt+1, . . . , Xt+h−1.

ACF and PACF for ARMA models
◦ special time series models
◦ Let {εt} iid∼ (0, σ²). Then {Xt, t ∈ Z} is
  AR(p) if
    ∗ Xt = φ1 Xt−1 + · · · + φp Xt−p + εt;
  MA(q) if
    ∗ Xt = εt + θ1 εt−1 + · · · + θq εt−q;
  ARMA(p, q) if
    ∗ Xt = φ1 Xt−1 + · · · + φp Xt−p + εt + θ1 εt−1 + · · · + θq εt−q.
◦ ACF and PACF for AR/MA
              ACF                     PACF
  AR(p)       exponential decay       cuts off after lag p
  MA(q)       cuts off after lag q    exponential decay
  ARMA(p, q)  exponential decay       exponential decay

ACFs and PACFs
◦ simulated ACFs
[Figure: sample ACFs of a white noise series and of some other series]
◦ theoretical ACF and PACF for an ARMA
[Figure: theoretical ACF and PACF of an ARMA process]

ACFs and PACFs
◦ theoretical ACF for MA(1)
[Figure: theoretical ACFs for Xt = εt + 0.4 εt−1 and Xt = εt − 0.9 εt−1]
◦ theoretical PACF for AR(2)
[Figure: theoretical PACFs for Xt = 0.2 Xt−1 + 0.6 Xt−2 + εt and Xt = −0.6 Xt−1 − 0.5 Xt−2 + εt]

Other types of dependence
◦ spatial correlation
  diagnosed via semivariogram
  for a stationary isotropic random field {Z(x); x ∈ R²}, the semivariogram is
    ∗ γ(x, y) = ½ Var(Z(x) − Z(y)) = ½ E(Z(x) − Z(y))² = γ(h), where h = ||x − y||₂
◦ clustering (boxplots of residuals by group)

8.3.4 Assumptions on the distribution

Checking ε ∼ N(0, σ²I)
◦ suspected departures from the assumption
  non-normal distribution
    ∗ skewed distribution
    ∗ heavy-tailed distribution
◦ plot a QQ plot for (standardized) residuals
◦ plot a histogram for (standardized) residuals
◦ formal tests: Shapiro–Wilk test, Kolmogorov–Smirnov test
  warning: valid for iid’s (and residuals are not iid’s)

QQ plot and histogram
◦ QQ plot (preferred)
  quantiles of N(0, 1) against empirical quantiles
  should be near a straight line
  problems to look for
    ∗ S shape (heavy tails)
    ∗ an arc (skewness)
[Figure: QQ plot and histogram of the standardized residuals from the quadratic model]
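The QQ plot and histogram shown above can be produced with base R; a sketch for the standardized residuals of the quadratic model (object name as before):

> r <- rstandard(model.basic.quad)
> qqnorm(r, main = "QQ plot for standardized residuals from the quadratic model")
> qqline(r)
> hist(r, freq = FALSE, xlab = "Standardized residuals",
+      main = "Histogram for standardized residuals from the quadratic model")
> curve(dnorm(x), add = TRUE)   # N(0, 1) density for comparison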
Shapiro–Wilk test and Kolmogorov–Smirnov test
◦ valid for iid’s (and residuals are not iid’s)
  Shapiro–Wilk test
    ∗ can be seen as a numerical summary of the QQ plot
    ∗ rather a strong one

> shapiro.test(rstandard(model.basic.quad))

        Shapiro-Wilk normality test

data:  rstandard(model.basic.quad)
W = 0.9865, p-value = 9.713e-06

> shapiro.test(rstandard(model.basic.quad)[sample(1:654, 50)])

        Shapiro-Wilk normality test

data:  rstandard(model.basic.quad)[sample(1:654, 50)]
W = 0.97011, p-value = 0.2338

  Kolmogorov–Smirnov test
    ∗ rather a weak one

Importance of the assumption
◦ large-sample distribution of β̂
  Let Xn, n ∈ N, be a sequence of n × p design matrices of full rank defining a sequence of linear models Yn = Xn β + εn with εn ∼ (0, σ²In). If max_{1≤i≤n} xi,ᵀ (Xnᵀ Xn)⁻¹ xi, → 0 as n → ∞, then (Xnᵀ Xn)^{1/2} (β̂n − β) →d N(0, σ²I) as n → ∞, where β̂n = (Xnᵀ Xn)⁻¹ Xnᵀ Yn.
◦ normality not crucial in large samples unless there are special observations

8.4 Model diagnostics II: influential and unusual observations
8.4.1 Observations to look at

Leverage
◦ Y = Ŷ + e = H Y + (I − H) Y
◦ Var Ŷi = σ² hi,i, where hi,i is called the leverage
◦ H = X (XᵀX)⁻¹ Xᵀ and rank(H) = tr(H) = p
◦ hi,i = xi,ᵀ (XᵀX)⁻¹ xi, and Σᵢ₌₁ⁿ hi,i = p
◦ the variance of Ŷi is determined by the corresponding covariates
◦ we want all observations to contribute ≈ equally to the fit
  we want that hi,i ≈ p/n
◦ if hi,i is much larger for some i, the fit may be influenced by (Yi, xi,) much more than by the other observations
◦ observations with hi,i > 2p/n should be checked

Potentially influential and influential observations
[Figure: least squares lines fitted to the blue points, to the blue and yellow points, and to the blue and red points]
◦ both points have a high leverage, but only one is influential

Model with an excluded observation
◦ consider a model Y[−i] = X[−i] β + ε[−i] without the ith observation
◦ fit the model
  compute β̂[−i] and σ̂²[−i]
◦ compute ŷ[−i] = xi,ᵀ β̂[−i]
  prediction of yi based on the model without the ith observation
◦ if yi − ŷ[−i] is large, the ith observation is an outlier
  how large is “too large”?
  Var(yi − ŷ[−i]) = σ² (1 + xi,ᵀ (X[−i]ᵀ X[−i])⁻¹ xi,)
  define jackknife residuals
    ti = (yi − ŷ[−i]) / √(σ̂²[−i] (1 + xi,ᵀ (X[−i]ᵀ X[−i])⁻¹ xi,))
  there is a simpler equivalent formula that does not require fitting n models with excluded observations

Influential and unusual observations
◦ in the normal linear model: ti ∼ t(n − p − 1)
◦ we can test whether an observation is an outlier
  heavy multiple testing
  Bonferroni correction
    ∗ use tn−p−1(1 − α/(2n)) instead of tn−p−1(1 − α/2)
◦ to evaluate whether the observation is influential
  Cook’s distance: di = (1/(p σ̂²)) ||Ŷ − Ŷ[−i]||² = (1/p) r²i hi,i/(1 − hi,i)
  how large is “too large”?
    ∗ rule of thumb:
      di ≥ 0.5 deserves some attention
      di ≥ 1 highly influential observation
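Leverages, jackknife residuals and Cook's distances are all available for lm fits; a sketch for the quadratic model (object name as assumed before; rstudent() uses the simpler equivalent formula mentioned above, so no refitting is needed):

> h <- hatvalues(model.basic.quad)                        # leverages h_{i,i}
> p <- length(coef(model.basic.quad)); n <- nrow(fev)
> which(h > 2 * p / n)                                    # observations with h_{i,i} > 2p/n
> t.jack <- rstudent(model.basic.quad)                    # jackknife residuals t_i
> which(abs(t.jack) > qt(1 - 0.05 / (2 * n), n - p - 1))  # Bonferroni-corrected outlier check
> d <- cooks.distance(model.basic.quad)
> which(d >= 0.5)                                         # rule of thumb: worth a closer look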
Chapter 9
Reduced-rank design matrix and multicollinearity

9.1 The problem
9.1.1 Normal linear model

Normal linear model
◦ Yi = β0 + β1 xi,1 + . . . + βk xi,k + εi, i ∈ {1, . . . , n}
  (outcome Yi, covariates xi,1, . . . , xi,k, unknown coefficients β0, . . . , βk, unobserved random errors εi, as in Section 7.1.1)
◦ εi iid∼ N(0, σ²), i ∈ {1, . . . , n}
  E εi = 0: no systematic errors
  Var εi = σ²: same precision

Example: bloodpress data
◦ from sites.stat.psu.edu/~lsimon/stat501wc/sp05/data/ (https://ww2.amstat.org/publications/jse/v13n2/datasets.kahn.html)
◦ association between the mean arterial blood pressure [mmHg] and age [years], weight [kg], body surface area [m²], duration of hypertension [years], basal pulse [beats/min], stress
◦ data and model Y = Xβ + ε as in Section 7.1.1

Example: fev data
◦ from: http://www.statsci.org/data/general/fev.html
◦ question: association between the FEV [l] and Smoking, corrected for Age [years], Height [cm] and Gender
◦ data and model Y = Xβ + ε as in Section 7.1.1

9.1.2 Task for this chapter

Rank-deficiency/near-rank-deficiency of X
◦ model: Y = Xβ + ε, with outcome Y, predictors x,1, . . . , x,k, coefficients β and error ε as in Section 7.1.2
  assumptions: ε ∼ N(0, σ²I)
    ∗ E Y = Xβ: the expected value of Y is a linear function of β
    ∗ E ε = 0: no systematic errors
    ∗ Var ε = σ²I: independence and same precision
◦ task: so far we have assumed that rank(X) = p
  What happens if rank(X) < p or “nearly so”?

9.2 Rank-deficient design matrix
9.2.1 Rank-deficient design matrix

Full-rank design matrix X
◦ design matrix X is n × p, n > p
◦ SVD: X = U Σ Vᵀ, with U n × n, Σ n × p, V p × p
◦ if all covariates are linearly independent
  rank(X) = p
  σ1 ≥ σ2 ≥ . . . ≥ σp > 0
  thin SVD: X = U1 Σ1 Vᵀ, with U1 n × p, Σ1 p × p, V p × p
  the columns generate a p-dimensional space im(X)
    ∗ {x,1, . . . , x,p} is a basis of im(X)
    ∗ {u,1, . . . , u,p} is an orthonormal basis of im(X)
    ∗ H = U1 U1ᵀ = X (XᵀX)⁻¹ Xᵀ is a projection matrix on im(X)

Rank-deficient design matrix X
◦ design matrix X is n × p, n > p
◦ SVD: X = U Σ Vᵀ, with U n × n, Σ n × p, V p × p
◦ if the covariates are not linearly independent
  rank(X) = r < p
  σ1 ≥ σ2 ≥ . . . ≥ σr > 0 = σr+1 = . . . = σp
Rank-deficient design matrix X
◦ design matrix X is n × p, n > p
◦ SVD: X = U Σ Vᵀ with U of size n × n, Σ of size n × p, V of size p × p
◦ if the covariates are not linearly independent
 ∗ rank(X) = r < p
 ∗ σ_1 ≥ σ_2 ≥ … ≥ σ_r > 0 = σ_{r+1} = … = σ_p
 ∗ compact SVD: X = U_1 Σ_1 V_1ᵀ with U_1 of size n × r, Σ_1 of size r × r, V_1ᵀ of size r × p
 ∗ the columns generate an r-dimensional space im(X)
   {u_{·,1}, …, u_{·,r}} is an orthonormal basis of im(X)
   H = U_1 U_1ᵀ = X (XᵀX)⁺ Xᵀ is the projection matrix onto im(X)

β̂ motivated by orthogonal projection (reminder)
◦ model: Y = Xβ + ε, ε unknown, E ε = 0
◦ idea: set ε ≐ 0 and solve Y = Xβ w.r.t. β
 ∗ then we require Y (n × 1) = X (n × p) β (p × 1)
 ∗ n linear equations with p unknowns and n > p ⇒ a solution exists only if Y ∈ im(X)
◦ modified idea: find Ŷ ∈ im(X) such that ‖Y − Ŷ‖² is the smallest possible, and solve Ŷ = Xβ w.r.t. β
 ∗ then Ŷ is the orthogonal projection of Y onto im(X)
 ∗ the projection matrix onto im(X) is the hat matrix H = X (XᵀX)⁺ Xᵀ
 ∗ solving Ŷ = Xβ means solving X (XᵀX)⁺ Xᵀ Y = Xβ
 ∗ estimate β by β̂ = (XᵀX)⁺ Xᵀ Y
 ∗ but β̂ is the unique solution of Ŷ = X β̂ iff rank(X) = p, and then β̂ = (XᵀX)⁻¹ Xᵀ Y

β̂ as least squares estimator (reminder)
◦ model: Y = Xβ + ε, ε unknown, E ε = 0
◦ idea: make the residuals as small as possible
 ∗ minimize ‖ε‖² = Σ_{i=1}^{n} ε_i² w.r.t. β
 ∗ Least Squares Estimator (LSE): β̂ = argmin_β Σ_{i=1}^{n} ε_i²
 ∗ also called the OLS (Ordinary Least Squares) solution
◦ computation: ε = Y − Xβ
 ∗ β̂ = argmin_β ‖Y − Xβ‖² = argmin_β (Y − Xβ)ᵀ(Y − Xβ)
◦ look for the minimum by differentiating:
\[
\frac{\partial}{\partial\beta}\,(Y - X\beta)^\top (Y - X\beta) \doteq 0
\;\;\Longrightarrow\;\;
-2\,X^\top Y + 2\,X^\top X \beta \doteq 0
\;\;\Longrightarrow\;\;
X^\top X \beta \doteq X^\top Y \quad \text{(normal equations)}
\]
◦ the normal equations have a unique solution iff rank(X) = p; then the solution is (XᵀX)⁻¹ Xᵀ Y
 ∗ ∂²/(∂β ∂βᵀ) (Y − Xβ)ᵀ(Y − Xβ) = 2 XᵀX ⪰ 0 for all β ⇒ the solution is the minimum
 ∗ ⇒ β̂ = (XᵀX)⁻¹ Xᵀ Y

If rank(X) = r < p
◦ orthogonal projection approach
 ∗ Ŷ exists and is unique
 ∗ a β̂ such that Ŷ = X β̂ is a vector of coordinates of Ŷ ∈ im(X) w.r.t. {x_{·,1}, …, x_{·,p}}
   if {x_{·,1}, …, x_{·,p}} is not a basis of im(X), β̂ is not unique
 ∗ {β̂; Ŷ = X β̂} is an affine subspace of ℝᵖ of dimension p − r
 ∗ neither Ŷ nor ‖Y − Ŷ‖² depends on the choice of β̂
◦ ordinary least squares approach
 ∗ the normal equations XᵀX β = Xᵀ Y are consistent: rank(XᵀX) = rank((Xᵀ Y | XᵀX))
 ∗ the normal equations have infinitely many solutions; they form the affine subspace of ℝᵖ of dimension p − r from above
 ∗ the minimum min_β ‖Y − Xβ‖² is attained at each of the solutions and its value, ‖Y − Ŷ‖², is the same for all of them
 ∗ proofs can be found in Anděl: Základy matematické statistiky

9.2.2 Identifiability

Identifiable parameters
◦ Ŷ and ‖Y − Ŷ‖² = min_β ‖Y − Xβ‖² do not depend on β̂
◦ any other quantities with such properties?

Theorem. Let Y = Xβ + ε, where X is an n × p matrix, rank(X) = r < p, β ∈ ℝᵖ, and ε is an n-dimensional random vector with E ε = 0 and Var ε = σ² I. Let c ∈ ℝᵖ and θ = cᵀβ. If θ is a linear combination of the components of Xβ, equivalently if c ∈ im(Xᵀ), then
(i) the value of θ̂ = cᵀβ̂, where β̂ is a solution of the normal equations, does not depend on the choice of the solution;
(ii) there exists a linear unbiased estimator of θ;
(iii) θ̂ = cᵀβ̂ is BLUE for θ.

◦ a parameter θ that is a linear combination of E Y is identifiable
◦ a proof can be found in Jiří Anděl: Základy matematické statistiky
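A small numerical illustration of the theorem (a sketch with a made-up, exactly collinear design matrix): two different solutions of the normal equations, the pseudoinverse solution (XᵀX)⁺ Xᵀ Y and the one lm() effectively returns after dropping an aliased column, differ as coefficient vectors but give the same Ŷ and the same value of an estimable cᵀβ with c ∈ im(Xᵀ).

library(MASS)                        # ginv(): Moore-Penrose pseudoinverse

set.seed(2)
n  <- 30
x1 <- rnorm(n)
x2 <- 2 * x1                         # exactly collinear => rank(X) = 2 < p = 3
X  <- cbind(1, x1, x2)
Y  <- 1 + x1 + rnorm(n)

## Solution 1: beta-hat = (X'X)^+ X'Y
beta.pinv <- drop(ginv(crossprod(X)) %*% crossprod(X, Y))

## Solution 2: lm() drops the aliased column x2 (its coefficient is reported as NA)
fit     <- lm(Y ~ x1 + x2)
beta.lm <- ifelse(is.na(coef(fit)), 0, coef(fit))    # treat the dropped column as 0

rbind(beta.pinv, beta.lm)                            # different coefficient vectors ...
max(abs(X %*% beta.pinv - X %*% beta.lm))            # ... but identical fitted values (~ 0)

c.est <- c(0, 1, 2)                                  # c in im(X'): rows of X span (1,0,0),(0,1,2)
c(sum(c.est * beta.pinv), sum(c.est * beta.lm))      # the estimable c'beta agrees as well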
Inference for identifiable parameters
◦ model: Y = Xβ + ε, ε ∼ N(0, σ² I), rank(X) = r < p
◦ E Y is identifiable
◦ Ŷ = X β̂ is BLUE for E Y for any β̂ that solves the normal equations
◦ it can be shown that
 ∗ (n − r)/σ² · σ̂² ∼ χ²_{n−r}, where σ̂² = ‖Y − Ŷ‖²/(n − r) is an unbiased estimator of σ²
 ∗ σ̂² ⊥⊥ β̂ for any β̂ that solves the normal equations
 ∗ the proofs are similar to the full-rank case and can be found in Jiří Anděl: Základy matematické statistiky (2005). Matfyzpress.
◦ inference for identifiable parameters and vectors is as in the full-rank model, but we need to adjust the degrees of freedom: n − r instead of n − p

9.2.3 Choice of the solution

Choice of β̂
◦ Ŷ and σ̂² do not depend on the choice of β̂
◦ {β̂; Ŷ = X β̂} is an affine subspace of ℝᵖ of dimension p − r
 ∗ we can choose β̂ by specifying p − r linear constraints
 ∗ choose a (p − r) × p matrix D with rank(D) = p − r and require that D β̂ = 0
◦ for a given D
 ∗ QR-decompose Dᵀ = (Q_1 | Q_2) (R_1ᵀ, 0)ᵀ = Q_1 R_1
 ∗ C_D = Q_2 is a p × r matrix, rank(C_D) = r
 ∗ X_D = X C_D is an n × r matrix, rank(X_D) = r
 ∗ fit the (full-rank) model Y = X_D β_D + ε
 ∗ β̂ = C_D β̂_D is the solution of the original normal equations satisfying the constraints given by D

Common example: factor variables (fev data)
◦ basic model: FEV ~ Height + Gender
 ∗ naïve parametrization
\[
Y_i = \beta_0 + \beta_H \times \text{Height}_i
    + \beta_M \times I\{\text{the $i$th child is male}\}
    + \beta_F \times I\{\text{the $i$th child is female}\}
    + \varepsilon_i, \quad 1 \le i \le 654
\]
\[
\begin{pmatrix} 1.708 \\ 1.724 \\ 1.720 \\ 1.558 \\ \vdots \\ 3.211 \end{pmatrix}
=
\begin{pmatrix}
1 & 144.8 & 0 & 1 \\
1 & 171.5 & 0 & 1 \\
1 & 138.4 & 0 & 1 \\
1 & 134.6 & 1 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
1 & 168.9 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_H \\ \beta_M \\ \beta_F \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \vdots \\ \varepsilon_{654} \end{pmatrix}
\]
 ∗ standard parametrization
\[
Y_i = \beta_0 + \beta_H \times \text{Height}_i
    + \beta_M \times I\{\text{the $i$th child is male}\}
    + \varepsilon_i, \quad 1 \le i \le 654
\]
◦ basic model with interaction: FEV ~ Height * Gender
 ∗ standard parametrization
\[
Y_i = \beta_0 + \beta_H \times \text{Height}_i
    + \beta_M \times I\{\text{the $i$th child is male}\}
    + \beta_{H:M} \times I\{\text{the $i$th child is male}\} \times \text{Height}_i
    + \varepsilon_i, \quad 1 \le i \le 654
\]

One-way ANOVA
◦ Y_{i,j} = µ + α_i + ε_{i,j}, ε_{i,j} i.i.d. ∼ N(0, σ²), i ∈ {1, …, I}, j ∈ {1, …, n_i}
◦ matrix form Y = X (µ, αᵀ)ᵀ + ε, ε ∼ N(0, σ² I)
\[
\begin{pmatrix} Y_{1,1}\\ \vdots\\ Y_{1,n_1}\\ Y_{2,1}\\ \vdots\\ Y_{2,n_2}\\ \vdots\\ Y_{I,1}\\ \vdots\\ Y_{I,n_I} \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 0 & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & & \vdots\\
1 & 1 & 0 & 0 & \cdots & 0\\
1 & 0 & 1 & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & & \vdots\\
1 & 0 & 1 & 0 & \cdots & 0\\
\vdots & & & & & \vdots\\
1 & 0 & 0 & 0 & \cdots & 1\\
\vdots & \vdots & \vdots & \vdots & & \vdots\\
1 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}
\begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2\\ \vdots\\ \alpha_I \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{1,1}\\ \vdots\\ \varepsilon_{1,n_1}\\ \varepsilon_{2,1}\\ \vdots\\ \varepsilon_{2,n_2}\\ \vdots\\ \varepsilon_{I,1}\\ \vdots\\ \varepsilon_{I,n_I} \end{pmatrix}
\]
◦ X is an n × (I + 1) matrix with rank(X) = I
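In R this rank deficiency and the choice of parametrization can be seen directly from model.matrix(). A small sketch with a made-up factor follows; the constraint α_1 = 0 corresponds to the default contr.treatment coding, a sum-to-zero constraint to contr.sum (these are two of the parametrizations listed below), and either choice gives the same fitted values.

## One-way ANOVA: the naive indicator coding is rank-deficient;
## the usual parametrizations are full-rank reparametrizations of it
set.seed(3)
g <- factor(rep(c("A", "B", "C"), times = c(4, 5, 6)))   # I = 3 groups
Y <- rnorm(length(g)) + rep(c(0, 1, 2), times = c(4, 5, 6))

X.naive <- cbind(1, model.matrix(~ g - 1))   # columns: 1, I{A}, I{B}, I{C}
dim(X.naive); qr(X.naive)$rank               # 15 x 4, but rank 3 = I

fit.treat <- lm(Y ~ g)                                        # alpha_1 = 0
fit.sum   <- lm(Y ~ g, contrasts = list(g = "contr.sum"))     # sum-to-zero constraint
coef(fit.treat); coef(fit.sum)                # different coefficients ...
max(abs(fitted(fit.treat) - fitted(fit.sum))) # ... same fitted values (~ 0)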
ANOVA
◦ one-way ANOVA
 ∗ Y_{i,j} = µ + α_i + ε_{i,j}, ε_{i,j} i.i.d. ∼ N(0, σ²), i ∈ {1, …, I}, j ∈ {1, …, n_i}
 ∗ matrix form Y = (1 | X_α) (µ, αᵀ)ᵀ + ε, ε ∼ N(0, σ² I)
 ∗ X is an n × (I + 1) matrix with rank(X) = I
◦ two-way ANOVA
 ∗ Y_{i,j,k} = µ + α_i + β_j + ε_{i,j,k}, ε_{i,j,k} i.i.d. ∼ N(0, σ²), i ∈ {1, …, I}, j ∈ {1, …, J}, k ∈ {1, …, n_{i,j}}
 ∗ matrix form Y = (1 | X_α | X_β) (µ, αᵀ, βᵀ)ᵀ + ε, ε ∼ N(0, σ² I)
 ∗ X is an n × (I + J + 1) matrix with rank(X) = I + J − 1
◦ two-way ANOVA with interactions
 ∗ Y_{i,j,k} = µ + α_i + β_j + γ_{i,j} + ε_{i,j,k}, ε_{i,j,k} i.i.d. ∼ N(0, σ²), i ∈ {1, …, I}, j ∈ {1, …, J}, k ∈ {1, …, n_{i,j}}
 ∗ matrix form Y = (1 | X_α | X_β | X_α ⊙ X_β) (µ, αᵀ, βᵀ, γᵀ)ᵀ + ε, ε ∼ N(0, σ² I)
   (⊙ denotes the component-wise products of columns forming the n × (I × J) interaction matrix)
 ∗ X is an n × (I + J + (I × J) + 1) matrix with rank(X) = I × J

ANOVA parametrizations
◦ one-way ANOVA: Y_{i,j} = µ + α_i + ε_{i,j}, ε_{i,j} i.i.d. ∼ N(0, σ²)
 ∗ parametrization: α_1 = 0
 ∗ other parametrizations: e.g. Σ_{i=1}^{I} n_i α_i = 0
◦ two-way ANOVA: Y_{i,j,k} = µ + α_i + β_j + ε_{i,j,k}, ε_{i,j,k} i.i.d. ∼ N(0, σ²)
 ∗ parametrization: α_1 = 0, β_1 = 0
 ∗ other: e.g. Σ_{i=1}^{I} α_i Σ_{j=1}^{J} n_{i,j} = 0, Σ_{j=1}^{J} β_j Σ_{i=1}^{I} n_{i,j} = 0
◦ two-way ANOVA with interactions: Y_{i,j,k} = µ + α_i + β_j + γ_{i,j} + ε_{i,j,k}, ε_{i,j,k} i.i.d. ∼ N(0, σ²)
 ∗ parametrization: α_1 = 0, β_1 = 0, γ_{1,j} = 0 ∀ j, γ_{i,1} = 0 ∀ i
 ∗ other: e.g. Σ_{i=1}^{I} α_i Σ_{j=1}^{J} n_{i,j} = 0, Σ_{j=1}^{J} β_j Σ_{i=1}^{I} n_{i,j} = 0, Σ_{i=1}^{I} n_{i,j} γ_{i,j} = 0 ∀ j, Σ_{j=1}^{J} n_{i,j} γ_{i,j} = 0 ∀ i

ANOVA parametrizations via matrices of contrasts
◦ one-way ANOVA Y = (1 | X_α) (µ, αᵀ)ᵀ + ε, ε ∼ N(0, σ² I)
 ∗ replace (1 | X_α) by (1 | X_α C_α), where C_α ∈ ℝ^{I×(I−1)} and rank((1 | X_α C_α)) = I
 ∗ estimate α by C_α α̂ from the fitted model
◦ two-way ANOVA Y = (1 | X_α | X_β) (µ, αᵀ, βᵀ)ᵀ + ε, ε ∼ N(0, σ² I)
 ∗ replace (1 | X_α | X_β) by (1 | X_α C_α | X_β C_β), where C_α ∈ ℝ^{I×(I−1)}, C_β ∈ ℝ^{J×(J−1)} and rank((1 | X_α C_α | X_β C_β)) = I + J − 1
 ∗ estimate α and β by C_α α̂ and C_β β̂ from the fitted model
◦ two-way ANOVA with interactions Y = (1 | X_α | X_β | X_α ⊙ X_β) (µ, αᵀ, βᵀ, γᵀ)ᵀ + ε, ε ∼ N(0, σ² I)
 ∗ replace (1 | X_α | X_β | X_α ⊙ X_β) by (1 | X_α C_α | X_β C_β | X_α C_α ⊙ X_β C_β), where C_α ∈ ℝ^{I×(I−1)}, C_β ∈ ℝ^{J×(J−1)} and the rank of the new matrix is I·J
 ∗ estimate α, β and γ by C_α α̂, C_β β̂ and (C_α ⊗ C_β) γ̂

9.3 Multicollinearity

Multicollinearity
◦ we have seen that if rank(X) = r < p, we do not lose anything by leaving out p − r columns
◦ but what if rank(X) = p, yet "only nearly so"?
 ∗ the columns of X are linearly independent, BUT
 ∗ x_{·,i}ᵀ x_{·,j} / (‖x_{·,i}‖ ‖x_{·,j}‖) ≈ ±1 for some (i, j) and/or for some linear combinations of the columns
◦ we would lose information by leaving out columns, but keeping them all is a problem as well
 ∗ XᵀX is ill-conditioned
   β̂ solves (XᵀX) β̂ = Xᵀ Y
   a small change in Y ⇒ a large change in β̂
   the fit is extremely sensitive to the errors ε
 ∗ large Var β̂
   imprecise estimation of β
   wide confidence intervals for the β's
   large p-values of the t-tests (not necessarily of the overall F-test)

Detecting multicollinearity
◦ pairwise relationships
 ∗ graphically: plot pairs of covariates one against another
 ∗ numerically: compute pairwise correlations
◦ pairwise and/or higher-order relationships
 ∗ regress each covariate in turn on all the others: large values of the corresponding R² are problematic
 ∗ compute the eigenvalues of XᵀX: large values of λ_1/λ_j are problematic
◦ other indications
 ∗ large p-values of the individual t-tests but a small p-value of the overall F-test
 ∗ estimates of β and Var(β̂) very sensitive to adding/leaving out covariates and/or to perturbing Y
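These numerical checks are one-liners in R. A sketch with a generic data frame dat, whose first column is the response and whose remaining columns are numeric covariates, follows; dat and the column handling are placeholders, not part of the notes.

## Numerical screening for multicollinearity (dat: response in column 1, covariates after)
Xcov <- as.matrix(dat[, -1])

round(cor(Xcov), 2)                       # pairwise correlations

## R^2 from regressing each covariate on all the others
r2 <- sapply(colnames(Xcov), function(v)
  summary(lm(reformulate(setdiff(colnames(Xcov), v), response = v), data = dat))$r.squared)
r2

## eigenvalues of X'X (intercept column included)
X  <- cbind(1, Xcov)
ev <- eigen(crossprod(X), symmetric = TRUE)$values
ev[1] / ev                                # large ratios lambda_1 / lambda_j are suspect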
Variance inflation factors
◦ fit lm(X_{·,j} ~ X_{·,1} + … + X_{·,j−1} + X_{·,j+1} + … + X_{·,p})
 ∗ R_j² … the corresponding coefficient of determination
◦ it can be shown that in lm(Y ~ X_{·,1} + … + X_{·,p})
\[
\widehat{\operatorname{Var}}(\hat\beta_j)
  = \frac{s^2}{(n-1)\,s^2_{X_{\cdot,j}}} \times \frac{1}{1 - R_j^2}
\]
◦ variance inflation factor VIF_j = 1/(1 − R_j²)
 ∗ measures the linear dependence of the j-th covariate on the other covariates
 ∗ interpretation: the standard error of β̂_j is ≈ √VIF_j times larger than it would be were the j-th covariate independent of the other covariates
 ∗ VIF_j = 1 for orthogonal covariates; large values indicate problems
 ∗ how big is "too big"? some consider VIF > 5 problematic; VIF > 10 is generally considered problematic
◦ a generalization, gVIF, exists for categorical variables

Example: fev data
◦ Cor(Age, Height) = 0.79

Figure: scatter plot "Age by height": Age [y] against Height [cm] for the fev data.

◦ R²_Age = 0.69
◦ VIF_Age = 3.24

Ill-conditioned XᵀX
◦ linear model: Y = Xβ + ε
◦ model fitting: (XᵀX) β̂ = Xᵀ Y, i.e. a p × p linear system A x = b
◦ solving for β̂ with machine precision
 ∗ if the error in b is ε, the error in the solution A⁻¹ b is A⁻¹ ε
 ∗ relative error in the solution divided by the relative error in b:
\[
\frac{\lVert A^{-1}\epsilon\rVert / \lVert A^{-1}b\rVert}{\lVert \epsilon\rVert / \lVert b\rVert}
\quad\text{for some norm } \lVert\cdot\rVert ,
\qquad \text{maximal value: } \lVert A^{-1}\rVert\,\lVert A\rVert
\]
 ∗ for the Euclidean/spectral norm: ‖A⁻¹‖ ‖A‖ = λ_1/λ_p (the ratio of the largest and smallest eigenvalue of A = XᵀX)
 ∗ its square root √(λ_1/λ_p), i.e. the ratio of the largest and smallest singular value of X, is called the condition number
 ∗ some consider a condition number ≥ 30 problematic
 ∗ the condition number depends also on the scales of the covariates (not only on their relationships)
 ∗ it can improve a lot if all covariates are on similar scales
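A sketch of both diagnostics for the fev example. It assumes the data have been read into a data frame fev with the variable names used above (FEV, Age, Height, Gender, Smoking); the exact covariate set behind the quoted R²_Age = 0.69 is not spelled out in the notes, so the regression below is only indicative.

## VIF for Age: regress Age on the remaining covariates of the model
r2.age  <- summary(lm(Age ~ Height + Gender + Smoking, data = fev))$r.squared
vif.age <- 1 / (1 - r2.age)      # compare with VIF_Age = 3.24 quoted above

## Condition number of the design matrix of FEV ~ Age + Height + Gender + Smoking
X <- model.matrix(FEV ~ Age + Height + Gender + Smoking, data = fev)
d <- svd(X)$d
d[1] / d[length(d)]              # sqrt(lambda_1 / lambda_p); >= 30 is taken as a warning sign
kappa(X, exact = TRUE)           # the same quantity computed by base R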
Tackling multicollinearity
◦ having independent covariates helps a lot, but inherent relationships cannot be circumvented
◦ with collinear covariates, the information does not increase with the number of covariates as we would expect
◦ "solutions"
 ∗ excluding covariates
   we avoid "repeating the same thing" but lose information
   keep covariates that are of interest and/or are easy to measure
   do not misinterpret leaving out a covariate as implying that it has no significant influence on the outcome
 ∗ orthogonalizing and/or standardizing the predictors
   more complicated interpretation
   not a problem for prediction (but then multicollinearity might not have been a big issue unless extrapolation was planned)
 ∗ a different method of estimation (e.g. ridge regression)
   we lose some nice properties of the estimators

Chapter 10
Miscellanea and recap

10.1 The problem

10.1.1 Normal linear model

Normal linear model
◦ Y_i = β_0 + β_1 x_{i,1} + … + β_k x_{i,k} + ε_i, i ∈ {1, …, n}
 ∗ Y_i: outcome, response, output, dependent variable; a random variable, we observe a realization y_i (odezva, závisle proměnná, regresand)
 ∗ x_{i,1}, …, x_{i,k}: covariates, predictors, explanatory variables, input, independent variables; given, known (nezávisle proměnné, regresory)
 ∗ β_0, …, β_k: coefficients; unknown (regresní koeficienty)
 ∗ ε_i: random error; a random variable, unobserved
◦ ε_i i.i.d. ∼ N(0, σ²), i ∈ {1, …, n}
 ∗ E ε_i = 0: no systematic errors
 ∗ Var ε_i = σ²: same precision

Example: bloodpress data
◦ from sites.stat.psu.edu/~lsimon/stat501wc/sp05/data/
◦ association between the mean arterial blood pressure [mmHg] and age [years], weight [kg], body surface area [m²], duration of hypertension [years], basal pulse [beats/min], stress
◦ data:

BP   Age  Weight  BSA   DoH  Pulse  Stress
105  47   85.4    1.75  5.1  63     33
115  49   94.2    2.10  3.8  70     14
...  ...  ...     ...   ...  ...    ...
110  48   90.5    1.88  9.0  71     99
122  56   95.7    2.09  7.0  75     99

◦ model: Y = Xβ + ε
\[
\begin{pmatrix} 105 \\ 115 \\ \vdots \\ 110 \\ 122 \end{pmatrix}
=
\begin{pmatrix}
1 & 47 & 85.4 & 1.75 & 5.1 & 63 & 33 \\
1 & 49 & 94.2 & 2.10 & 3.8 & 70 & 14 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & 48 & 90.5 & 1.88 & 9.0 & 71 & 99 \\
1 & 56 & 95.7 & 2.09 & 7.0 & 75 & 99
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \vdots \\ \beta_6 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_{19} \\ \varepsilon_{20} \end{pmatrix}
\]

Example: fev data
◦ from: http://www.statsci.org/data/general/fev.html (see also https://ww2.amstat.org/publications/jse/v13n2/datasets.kahn.html)
◦ question: association between the FEV [l] and Smoking, corrected for Age [years], Height [cm] and Gender
◦ data:

FEV    Age  Height  Gender  Smoking
1.708  9    144.8   Female  Non
1.724  8    171.5   Female  Non
1.720  7    138.4   Female  Non
1.558  9    134.6   Male    Non
...    ...  ...     ...     ...
3.727  15   172.7   Male    Current
2.853  18   152.4   Female  Non
2.795  16   160.0   Female  Current
3.211  15   168.9   Female  Non

◦ model: Y = Xβ + ε
\[
\begin{pmatrix} 1.708 \\ 1.724 \\ 1.720 \\ 1.558 \\ \vdots \\ 3.727 \\ 2.853 \\ 2.795 \\ 3.211 \end{pmatrix}
=
\begin{pmatrix}
1 & 9 & 144.8 & 0 & 0 \\
1 & 8 & 171.5 & 0 & 0 \\
1 & 7 & 138.4 & 0 & 0 \\
1 & 9 & 134.6 & 1 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
1 & 15 & 172.7 & 1 & 1 \\
1 & 18 & 152.4 & 0 & 0 \\
1 & 16 & 160.0 & 0 & 1 \\
1 & 15 & 168.9 & 0 & 0
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \vdots \\ \beta_5 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \vdots \\ \varepsilon_{651} \\ \varepsilon_{652} \\ \varepsilon_{653} \\ \varepsilon_{654} \end{pmatrix}
\]

10.1.2 Task for this chapter

Miscellanea & recap
◦ model: Y = Xβ + ε
 ∗ outcome Y: random vector, we observe a realization y
 ∗ predictors x_{·,1}, …, x_{·,k}: vectors of given (known) constants
 ∗ coefficients β: vector of unknown constants
 ∗ error ε: unknown random vector, we do not observe its realization
 ∗ assumptions: ε ∼ N(0, σ² I)
   E Y = Xβ: the expected value of Y is a linear function of β
   E ε = 0: no systematic errors
   Var ε = σ² I: independence and same precision
◦ task: miscellanea & recap

10.2 Linear regression in practice

10.2.1 Linear regression in practice

Statistical analysis with linear regression (a short R sketch of the four steps follows the list)
1. build a mathematical model, i.e. define
 ◦ what is known
 ◦ what is uncertain
 ∗ linear regression example: Y = Xβ + ε
2. build a probabilistic model for what is uncertain
 ∗ linear regression example: ε ∼ N(0, σ² I)
3. use probability calculus to draw conclusions
 ∗ linear regression example:
 ◦ β̂ ∼ N(β, σ² (XᵀX)⁻¹)
 ◦ (n − p)/σ² · σ̂² ∼ χ²_{n−p}
 ◦ β̂ ⊥⊥ σ̂²
 ◦ → confidence intervals & hypothesis testing
4. "translate back" to the original problem (interpret the results)
 ∗ linear regression example:
 ◦ β, aᵀβ, Aβ
 ◦ E Y, aᵀ(E Y), A(E Y)
 ◦ confidence intervals
 ◦ hypotheses testing
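A compact R walk-through of the four steps for the fev example; this is a sketch that assumes the data frame fev with the variables listed above (including the factor levels "Female" and "Non" shown in the data excerpt) and takes FEV ~ Age + Height + Gender + Smoking as the working model.

## Steps 1-2: mathematical and probabilistic model Y = X beta + eps, eps ~ N(0, sigma^2 I)
fit <- lm(FEV ~ Age + Height + Gender + Smoking, data = fev)

## Step 3: probability calculus done for us: t-tests, overall F-test, sigma^2-hat
summary(fit)
confint(fit, level = 0.95)        # confidence intervals for the beta's

## Step 4: translate back, e.g. expected FEV and a prediction for a new child
new <- data.frame(Age = 12, Height = 150, Gender = "Female", Smoking = "Non")
predict(fit, newdata = new, interval = "confidence")   # CI for E Y
predict(fit, newdata = new, interval = "prediction")   # CI for a new observation Y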
Usual additions to the basic analysis
1. find a suitable mathematical model
 ◦ propose a suitable functional dependence of Y on X
 ◦ propose a suitable model for the error
 ∗ linear regression example: model selection
2. build a probabilistic model for what is uncertain
 ∗ linear regression example: check the normality, potentially propose a different error distribution
3. use probability calculus to draw conclusions
 ◦ might need to adjust for multiple testing, post-hoc testing, poor design, …
4. "translate back" to the original problem (interpret the results)
 ∗ linear regression example:
 ◦ explanation
 ◦ prediction

10.3 Notes on interpretation

10.3.1 Notes on the explanation

Explanation using linear regression
◦ model: Y = Xβ + ε
◦ estimate β by β̂ = (XᵀX)⁻¹ Xᵀ Y
◦ estimate aᵀβ by aᵀβ̂
◦ (1 − α) × 100 % confidence interval for aᵀβ:
\[
\Bigl( a^\top\hat\beta - t_{1-\alpha/2}(n-p)\sqrt{\hat\sigma^2\, a^\top (X^\top X)^{-1} a},\;\;
       a^\top\hat\beta + t_{1-\alpha/2}(n-p)\sqrt{\hat\sigma^2\, a^\top (X^\top X)^{-1} a} \Bigr)
\]
◦ estimate Aβ by Aβ̂
◦ (1 − α) × 100 % confidence band for Aβ:
\[
\Bigl\{ A\beta;\;
\tfrac{1}{m\,\hat\sigma^2}\,(A\hat\beta - A\beta)^\top \bigl(A (X^\top X)^{-1} A^\top\bigr)^{-1} (A\hat\beta - A\beta)
\le F_{1-\alpha}(m, n-p) \Bigr\}
\]

Interpretation
◦ "keeping the values of all the other covariates fixed, a unit increase in x_i is associated with a β̂_i increase in E Y"
 ∗ suitably adapted for categorical predictors and potentially interactions, and dependent on the choice of the identifiability conditions
 ∗ polynomials need a more complex interpretation
◦ is it meaningful to imagine that one covariate changes while all the others remain fixed?

Be careful with
◦ confounding:
 ∗ suppose that the truth is Y_i = β_0 + β_E E_i + β_C C_i + ε_i
 ∗ we do not know about C and use Y_i = β_0 + β_E E_i + ε_i instead
 ∗ C and E are connected, e.g. E_i = γ_0 + γ_C C_i + ε̃_i
 ∗ then, if C has an effect on Y, we will (erroneously) attribute an effect on Y to E
 ∗ may be solved by a multiple regression model, provided the confounders and the form of their association with the outcome are known
◦ causality
 ∗ it is very hard to be confident about a causal relationship rather than a mere "association"
◦ both can be helped by a sound design

10.3.2 Notes on the prediction

Prediction from linear regression
◦ model: Y = Xβ + ε, ε ∼ N(0, σ² I)
◦ what can we say about Y = β_0 + β_1 x_1 + … + β_k x_k + ε for a new x = (1, x_1, …, x_k)ᵀ?
◦ Y = xᵀβ + ε and E Y = xᵀβ
◦ estimate E Y and Y by xᵀβ̂
◦ (1 − α) × 100 % confidence interval for E Y:
\[
\Bigl( x^\top\hat\beta - t_{1-\alpha/2}(n-p)\sqrt{\hat\sigma^2\, x^\top (X^\top X)^{-1} x},\;\;
       x^\top\hat\beta + t_{1-\alpha/2}(n-p)\sqrt{\hat\sigma^2\, x^\top (X^\top X)^{-1} x} \Bigr)
\]
◦ (1 − α) × 100 % confidence interval for Y (the prediction interval):
\[
\Bigl( x^\top\hat\beta - t_{1-\alpha/2}(n-p)\sqrt{\hat\sigma^2\bigl(1 + x^\top (X^\top X)^{-1} x\bigr)},\;\;
       x^\top\hat\beta + t_{1-\alpha/2}(n-p)\sqrt{\hat\sigma^2\bigl(1 + x^\top (X^\top X)^{-1} x\bigr)} \Bigr)
\]

Be careful with
◦ extrapolation
 ∗ predicting Y for an x that is far from the x_{i,·}'s in X
 ∗ predicting for different situations/populations than the one satisfying Y = Xβ + ε
◦ overfitting
 ∗ fitting a model Y = Xβ + ε that is "too close to the data"
 ∗ the estimated σ² is then misleadingly small
◦ not having seen enough data

10.4 Transformations

10.4.1 Transformations

Transformations of variables
◦ model: Y = Xβ + ε
◦ we have seen transformations of the predictors used to find a suitable functional dependence of Y on x
◦ how about transforming Y?
 ∗ done in practice to improve the functional dependence or to fix heteroskedasticity
 ∗ most common are log(Y) and √Y; some use other powers of Y
 ∗ this is a fundamental change to the model: it leaves the simple linear regression framework … (a short R sketch follows)
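A brief R sketch of the two transformations of the response discussed next: the log transform with back-transformed predictions, and the Box–Cox family. It again assumes the data frame fev; MASS::boxcox profiles the likelihood over λ and draws a confidence interval for it, which can be used as suggested below.

library(MASS)   # boxcox()

## Log transform of the response: fit on the log scale, predict back on the original scale
fit.log <- lm(log(FEV) ~ Age + Height + Gender + Smoking, data = fev)
new     <- data.frame(Age = 12, Height = 150, Gender = "Female", Smoking = "Non")
exp(predict(fit.log, newdata = new, interval = "prediction"))  # (exp(L), exp(U))

## Box-Cox: profile likelihood for lambda (lambda = 0 corresponds to log(Y))
fit <- lm(FEV ~ Age + Height + Gender + Smoking, data = fev)
bc  <- boxcox(fit, lambda = seq(-1, 1, 0.05))   # plots the profile with a 95% CI for lambda
bc$x[which.max(bc$y)]                           # ML estimate of lambda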
Log-transformation of the response
◦ original model: Y_i = β_0 + β_1 x_i + ε_i
◦ model after the log transform: log(Y_i) = β_0 + β_1 x_i + ε_i
 ∗ on the original scale: Y_i = exp{β_0} × exp{β_1 x_i} × exp{ε_i}
 ∗ the effects of the covariates are on the multiplicative scale
 ∗ the error enters multiplicatively, and the multiplicative error has a log-normal distribution
   exp{x} ≈ 1 + x for small x ⇒ Y_i ≈ exp{β_0} × exp{β_1 x_i} × (1 + ε_i) for small ε_i
   i.e. approximately a non-linear regression model with non-constant variance (the error standard deviation is proportional to the mean)
 ∗ prediction on the original scale: predict by exp{Ŷ}, with CI (exp{L}, exp{U})
 ∗ interpretation of β on the log scale
 ∗ problems with interpretation on the original scale
   log(E Y) ≠ E log(Y), but the median is preserved
   log(1 + x) ≈ x for small x …
   e.g. β̂_1 = 0.09 can be interpreted as a 9 % increase in med Y associated with a unit increase in x

Box–Cox transformation of the response
◦ original model: Y_i = β_0 + β_1 x_i + ε_i
◦ looking for a more general transform …
◦ suppose that Y > 0
◦ Box–Cox transformation:
\[
g_\lambda(y) =
\begin{cases}
\dfrac{y^\lambda - 1}{\lambda} & \lambda \ne 0 \\[1ex]
\log(y) & \lambda = 0
\end{cases}
\]
(a continuous function of λ)
 ∗ λ can be viewed as a parameter and λ̂ found by MLE; this also gives a CI for λ
 ∗ for prediction, you may use y^λ̂ directly
 ∗ for interpretation, you had better round λ̂ to the nearest interpretable value (check the CI)
 ∗ use the CI to see whether you need a transform at all

10.5 Concluding remarks

10.5.1 Reflection

It's an uncertain world … use statistics to decide
◦ How much chocolate and other goodies is good for our health? What levels of bacteria, fertilizers, chemicals, … are safe?
◦ What is the right size for the height of a dam? the insurance premium? the mortgage interest?
◦ What is the average salary? the public opinion on …? the result of the upcoming elections?
◦ uncertainty at the beginning → imperfect answers at the end
◦ statistics is used for quantifying uncertainty, not for getting rid of it

Statistics is collaboration
◦ "The best thing about being a statistician is that you get to play in everyone's backyard." (John Tukey)

Statistics does not guarantee the right answers
◦ if there is no uncertainty, there is no need for statistics
 → statistics might give a wrong answer, but we should not abuse this!
◦ only incompetent statisticians do not know how to lie with statistics
◦ good statisticians know the pitfalls and know they must be cautious

Ingredients of a statistical analysis
◦ mathematics, programming, communication …
◦ but above all: COMMON SENSE