Stata Textbook Examples Applied Regression Analysis by John Fox Chapter 15: Beyond Linear Least Squares Section 15. 1 Models for Dichotomous Data Figure 15.1 on page 440 on data file chile. use http://www.ats.ucla.edu/stat/stata/examples/ara/chile, clear gen voting=1 if (intvote==1) replace voting = 0 if (intvote==2) regress voting statquo predict y, xb logistic voting statquo predict pred, p ksm voting statquo, lowess gen(kp) bwidth(.4) nograph sort statquo graph voting y pred kp statquo, jitter(1) connect(.lll) symbol(Oiii) Table 15.1 and 15.2 on page 452 using data file womenlf. use http://www.ats.ucla.edu/stat/stata/examples/ara/womenlf, clear * Generating a new work status variable and an interaction variable gen ws = 1 if( workstat==1 | workstat==2) replace ws=0 if ( workstat==0) gen ik = husbinc*chilpres * Contrast between model 1 and 0 is the overall likelihood ratio of * model 1 and the p-value is the overall Prob > chi2 xi: logistic ws husbinc i.chilpres i.region ik /*model 1 */ (Some output omitted here.) Logit estimates Number of obs = 263 LR chi2(7) = 39.61 Prob > chi2 = 0.0000 Log likelihood = -158.27076 Pseudo R2 = 0.1112 (More output is omitted.) lrtest, saving(m1)/* saving for further contrast test */ display -2*e(ll) /* displaying deviance */ 316.54152 logistic ws /* model 0 */ display -2*e(ll)/* displaying deviance */ 356.15089 xi: logistic ws husbinc i.chilpres i.region /* model 2 */ display -2*e(ll) 317.30107 lrtest, saving(m2) lrtest, using(m1) /* contrast 2-1 */ Logistic: likelihood-ratio test chi2(1) = 0.76 Prob > chi2 = 0.3835 xi: logistic ws husbinc ik i.chilpres /* model 3 */ display -2*e(ll) 319.12422 lrtest, using(m1)/* constrast 3-1 */ Logistic: likelihood-ratio test chi2(4) = 2.58 Prob > chi2 = 0.6299 xi: logistic ws husbinc i.region /* model 4 */ display -2*e(ll) 347.84936 lrtest, using(m2)/* contrast 4-2 */ Logistic: likelihood-ratio test chi2(1) = 30.55 Prob > chi2 = 0.0000 xi: logistic ws i.chilpres i.region /* model 5 */ display -2*e(ll) 322.4267 lrtest, using(m2)/* contrast 5-2 */ Logistic: likelihood-ratio test chi2(1) = 5.13 Prob > chi2 = 0.0236 Figure 15.4 and formula on page 453. In order to add some text to the graph, we borrowed a program from Stata manual called addtext (see page 64 of Stata 6 Graphics Manual). xi: logistic ws i.chilpres husbinc i.chilpres Ichilp_0-1 (naturally coded; Ichilp_0 omitted) Logit estimates Number of obs = 263 LR chi2(2) = 36.42 Prob > chi2 = 0.0000 Log likelihood = -159.86627 Pseudo R2 = 0.1023 --------------------------------------------------------------------------- --- ws | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] ---------+----------------------------------------------------------------- --- Ichilp_1 | .2068734 .0604614 -5.391 0.000 .1166622 .3668421 husbinc | .9585741 .0189607 -2.139 0.032 .9221229 .9964661 --------------------------------------------------------------------------- --- logit Logit estimates Number of obs = 263 LR chi2(2) = 36.42 Prob > chi2 = 0.0000 Log likelihood = -159.86627 Pseudo R2 = 0.1023 --------------------------------------------------------------------------- --- ws | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------+----------------------------------------------------------------- --- Ichilp_1 | -1.575648 .2922629 -5.391 0.000 -2.148473 - 1.002824 husbinc | -.0423084 .0197801 -2.139 0.032 -.0810767 - .0035401 _cons | 1.33583 .3837632 3.481 0.000 .5836677 2.087992 --------------------------------------------------------------------------- --- predict pw, p xi: regress ws i.chilpres husbinc i.chilpres Ichilp_0-1 (naturally coded; Ichilp_0 omitted) Source | SS df MS Number of obs = 263 ---------+------------------------------ F( 2, 260) = 20.43 Model | 8.64321054 2 4.32160527 Prob > F = 0.0000 Residual | 55.0069796 260 .211565306 R-squared = 0.1358 ---------+------------------------------ Adj R-squared = 0.1291 Total | 63.6501901 262 .242939657 Root MSE = .45996 --------------------------------------------------------------------------- --- ws | Coef. Std. Err. t P>|t| [95% Conf. Interval] ---------+----------------------------------------------------------------- --- Ichilp_1 | -.3673736 .061906 -5.934 0.000 -.4892745 - .2454727 husbinc | -.0085375 .0039351 -2.170 0.031 -.0162863 - .0007887 _cons | .7936535 .0766814 10.350 0.000 .6426578 .9446491 --------------------------------------------------------------------------- --- predict lw, xb gen pw1=pw if(chilpres==1) gen pw2=pw if(chilpres==0) gen lw1=lw if(chilpres==1) gen lw2=lw if(chilpres==0) sort husbinc graph pw1 pw2 lw1 lw2 husbinc, connect(llll) symbol(iiii) program define addtext, rclass local y1 =.15 local x1 =10 local y2 =.75 local x2 =30 gph open graph local ay=r(ay) local ax=r(ax) local by=r(by) local bx=r(bx) local r1 = `ay'*`y1' + `by' local c1 = `ax'*`x1' + `bx' local r2 = `ay'*`y2' + `by' local c2 = `ax'*`x2' + `bx' gph pen 1 gph text `r1' `c1' 0 0 Children Present gph text `r2' `c2' 0 0 Children Absent gph close end addtext Figure 15.5 on page 459. We can't get the exact lowess smooth as in the book as it needs more than one iterative reweighting steps as there are outliers in the data and Stata's command ksm does not have an option on that yet. See the corresponding part of SAS for a better lowess smoothing result. logistic ws chilpres husbinc predict prob gen par=(ws-prob)/(prob*(1-prob))-.0423*husbinc /*creating partial residual*/ ksm par husbinc, lowess bwidth(0.9) gen(kprob) nog /*lowess smoothing*/ reg par husbinc predict y graph kprob y par husbinc, connect(ll.) symbol(iiO) Figure 15.6 on page 461. Notice that in Stata all the diagnostic statistics for logistic regression are adjusted for the number of covariate patterns, so called m-asymptotic instead of n- asymptotic, i.e., for the number of observations. So we have to do some calculations here in order to get the results in the text book. logistic ws chilpres husbinc predict yhat /*the predicted value*/ gen pr=(ws-yhat)/sqrt(yhat*(1-yhat)) /* generating Pearson residual on a case by case basis */ predict myhat, hat predict pattern, number egen count=count(obs), by(pattern) /*number of obs sharing a c.p.*/ gen nhat=myhat/count /*hat diagonal on per observation basis*/ gen sr = pr/sqrt(1-nhat)/*studentized pearson residual*/ graph sr nhat, l1(Studentized Residual) ylab(-2(2) 4) yline(-2 0 2) /* */ xlab(0(.01) .06) xline(0.0228 .034) b1("Hat-value") b2(" ") Figure 15.7 (a) and (b) on page 462. Stata does not have built-in command for Dfbeta. We use formula [15.21] to create the required statistics for these figures. This example is continuation of the previous one. gen id=1 set matsize 300 mkmat chilpres husbinc id, matrix(X) matrix d=e(V)*X' matrix dt=d' svmat dt, name(mydf) gen md1=mydf1*(ws-yhat)/(1-nhat)/*kidDfbeta*/ graph md1 obs, ylab(-0.04(0.04) 0.08) yline(0)l1(D*(B2) Presence of Children)/* */xlab(0(50) 300) gen md2=mydf2*(ws-yhat)/(1-nhat)/*husbincDfbeta*/ graph md1 obs, ylab(-0.005(0.005) 0.01) yline(0)l1(D*(B2) Husband's Income)/* */xlab(0(50) 300) Section 15. 2 Models for Polytomous Data Calculation and Figure 15.8 on page 468 and 469. We also show how to add text to a graph at the end of this example. Both Figure 15.8 (a) and Figure 15.8 (b) need to run the suitable program addtext1 to have the text shown. mlogit workstat husbinc chilpre, base(0) Iteration 0: log likelihood = -250.24628 Iteration 1: log likelihood = -214.24438 Iteration 2: log likelihood = -211.495 Iteration 3: log likelihood = -211.44102 Iteration 4: log likelihood = -211.44096 Multinomial regression Number of obs = 263 LR chi2(4) = 77.61 Prob > chi2 = 0.0000 Log likelihood = -211.44096 Pseudo R2 = 0.1551 --------------------------------------------------------------------------- --- workstat | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------+----------------------------------------------------------------- --- parttime | husbinc | .0068921 .0234548 0.294 0.769 -.0390784 .0528627 chilpres | .0214911 .4690366 0.046 0.963 -.8978037 .940786 _cons | -1.432307 .5924623 -2.418 0.016 -2.593512 - .2711023 ---------+----------------------------------------------------------------- --- fulltime | husbinc | -.0972307 .0280958 -3.461 0.001 -.1522975 - .0421639 chilpres | -2.558595 .362199 -7.064 0.000 -3.268492 - 1.848698 _cons | 1.982822 .4841771 4.095 0.000 1.033853 2.931792 --------------------------------------------------------------------------- --- (Outcome workstat==not work is the comparison group) predict p1 if e(sample), outcome(1) predict p2 if e(sample), outcome(2) predict p0 if e(sample), outcome(0) gen kp1=p1 if(chilpres==0) gen knp1=p1 if (chilpres ==1) gen kp2=p2 if(chilpres==0) gen knp2=p2 if(chilpres==1) gen kp0=p0 if(chilpres==0) gen knp0 =p0 if(chilpres==1) sort husbinc graph knp1 knp2 knp0 husbinc, connect(lll) symbol(iii) ylab(0(.25) .75) l1(Fitted Probability) graph kp2 kp1 kp0 husbinc, connect(lll) symbol(iii) l1(Fitted Probability) ylab(0(.25) 1) program define addtext1, rclass local y1 =1 local x1 =5 local y2 =.75 local x2 =12 local y3 =.65 local x3 =30 local y4 =0.15 local x4=20 gph open graph local ay=r(ay) local ax=r(ax) local by=r(by) local bx=r(bx) local r1 = `ay'*`y1' + `by' local c1 = `ax'*`x1' + `bx' local r2 = `ay'*`y2' + `by' local c2 = `ax'*`x2' + `bx' local r3 = `ay'*`y3' + `by' local c3 = `ax'*`x3' + `bx' local r4 = `ay'*`y4' + `by' local c4 = `ax'*`x4' + `bx' gph pen 1 gph text `r1' `c1' 0 0 Children Absent gph text `r2' `c2' 0 0 Full-Time gph text `r3' `c3' 0 0 Not Working gph text `r4' `c4' 0 0 Part-Time gph close end addtext1 Calculation on page 473. gen wk=(workstat==0) logistic wk husbinc chilpres Logit estimates Number of obs = 263 LR chi2(2) = 36.42 Prob > chi2 = 0.0000 Log likelihood = -159.86627 Pseudo R2 = 0.1023 --------------------------------------------------------------------------- --- wk | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] ---------+----------------------------------------------------------------- --- husbinc | 1.043216 .0206349 2.139 0.032 1.003546 1.084454 chilpres | 4.833875 1.412762 5.391 0.000 2.725968 8.57176 --------------------------------------------------------------------------- --- logit Logit estimates Number of obs = 263 LR chi2(2) = 36.42 Prob > chi2 = 0.0000 Log likelihood = -159.86627 Pseudo R2 = 0.1023 --------------------------------------------------------------------------- --- wk | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------+----------------------------------------------------------------- --- husbinc | .0423084 .0197801 2.139 0.032 .0035401 .0810767 chilpres | 1.575648 .2922629 5.391 0.000 1.002824 2.148473 _cons | -1.33583 .3837632 -3.481 0.000 -2.087992 - .5836677 --------------------------------------------------------------------------- --- lrtest, saving(0) logistic wk chilpres Logit estimates Number of obs = 263 LR chi2(1) = 31.59 Prob > chi2 = 0.0000 Log likelihood = -162.27945 Pseudo R2 = 0.0887 --------------------------------------------------------------------------- --- wk | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] ---------+----------------------------------------------------------------- --- chilpres | 4.781119 1.379609 5.422 0.000 2.71589 8.416797 --------------------------------------------------------------------------- --- lrtest /*test on the bottom of page 473*/ Logistic: likelihood-ratio test chi2(1) = 4.83 Prob > chi2 = 0.0280 gen ptime=. replace ptime = 0 if (workstat==1) replace ptime=1 if(workstat==2) logistic ptime husbinc chilpres Logit estimates Number of obs = 108 LR chi2(2) = 39.85 Prob > chi2 = 0.0000 Log likelihood = -52.247423 Pseudo R2 = 0.2761 --------------------------------------------------------------------------- --- ptime | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] ---------+----------------------------------------------------------------- --- husbinc | .898285 .0351699 -2.740 0.006 .8319318 .9699305 chilpres | .0705484 .0381719 -4.900 0.000 .0244301 .2037278 --------------------------------------------------------------------------- --- logit Logit estimates Number of obs = 108 LR chi2(2) = 39.85 Prob > chi2 = 0.0000 Log likelihood = -52.247423 Pseudo R2 = 0.2761 --------------------------------------------------------------------------- --- ptime | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------+----------------------------------------------------------------- --- husbinc | -.1072679 .0391522 -2.740 0.006 -.1840048 - .0305309 chilpres | -2.651456 .5410738 -4.900 0.000 -3.711941 - 1.59097 _cons | 3.477773 .7671069 4.534 0.000 1.974272 4.981275 --------------------------------------------------------------------------- --- lrtest, saving(p1) logistic ptime chilpres Logit estimates Number of obs = 108 LR chi2(1) = 30.87 Prob > chi2 = 0.0000 Log likelihood = -56.738094 Pseudo R2 = 0.2138 --------------------------------------------------------------------------- --- ptime | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] ---------+----------------------------------------------------------------- --- chilpres | .0869565 .04288 -4.953 0.000 .0330794 .2285846 --------------------------------------------------------------------------- --- lrtest, using(p1)/*test on the top of page 474*/ Logistic: likelihood-ratio test chi2(1) = 8.98 Prob > chi2 = 0.0027 Calculation on page 477. xi: ologit workstat husbinc i.chilpres i.chilpres Ichilp_0-1 (naturally coded; Ichilp_0 omitted) Iteration 0: log likelihood = -250.24628 Iteration 1: log likelihood = -221.36758 Iteration 2: log likelihood = -220.83242 Iteration 3: log likelihood = -220.83148 Ordered logit estimates Number of obs = 263 LR chi2(2) = 58.83 Prob > chi2 = 0.0000 Log likelihood = -220.83148 Pseudo R2 = 0.1175 --------------------------------------------------------------------------- --- workstat | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------+----------------------------------------------------------------- --- husbinc | -.0539007 .01949 -2.766 0.006 -.0921004 - .0157009 Ichilp_1 | -1.971957 .2869478 -6.872 0.000 -2.534364 - 1.40955 ---------+----------------------------------------------------------------- --- _cut1 | -1.852037 .3862995 (Ancillary parameters) _cut2 | -.9409253 .3699303 --------------------------------------------------------------------------- --- Section 15. 3 Discrete Independent Variables and Contingency Tables The analysis in this section is based on Table 15.3, which is based on data from an example from The American Voter (Campbell, et al., 1960). The first data set we create here is based on Table 15.3 for Figure 15.3. input logv1 logvc inten .847 .9 0 .904 1.318 1 .981 2.084 2 end label define scale 0 Weak 1 Medium 2 Strong label values inten scale label variable logv1 "One-Sided" label variable logvc "Close" graph logv1 logvc inten, connect(ll) b1(Intensity of Preference) Now we create another data set below to do the logistic regression and tests over different models. Thus we will have Table 15.4 and Table 15.5 on page 482. input perclose inten1 inten2 voted wv 0 0 0 1 91 0 0 0 0 39 0 1 0 1 121 0 1 0 0 49 0 0 1 1 64 0 0 1 0 24 1 0 0 1 214 1 0 0 0 87 1 1 0 1 284 1 1 0 0 76 1 0 1 1 201 1 0 1 0 25 end gen clspref1=perclose*inten1 gen clspref2=perclose*inten2 logistic voted perclose inten1 inten2 clspref1 clspref2 [fweight=wv] (Output omitted.) di -2*e(ll)/*deviance for model 1*/ 1356.4343 lrtest, saving(t1) logistic voted perclose inten1 inten2 [fweight=wv] (Output omitted.) di -2*e(ll) /*model 2*/ 1363.5529 lrtest, saving(t2) lrtest, using(t1)/*contrast 2-1*/ Logistic: likelihood-ratio test chi2(2) = 7.12 Prob > chi2 = 0.0285 logistic voted perclose clspref1 clspref2 [fweight=wv] (Output omitted.) . di -2*e(ll) /*model 3 */ 1356.6253 lrtest, saving(t3) lrtest, using(t1)/*contrast 3-1 */ Logistic: likelihood-ratio test chi2(2) = 0.19 Prob > chi2 = 0.9089 logistic voted inten1 inten2 clspref1 clspref2 [fweight=wv] (Output omitted.) di -2*e(ll) /*model 4*/ 1356.4869 lrtest, using(t1) Logistic: likelihood-ratio test chi2(1) = 0.05 Prob > chi2 = 0.8186 logistic voted perclose [fweight=wv] (Output omitted.) di -2*e(ll) /*model 5 */ 1382.6582 lrtest, using(t2) Logistic: likelihood-ratio test chi2(2) = 19.11 Prob > chi2 = 0.0001 logistic voted inten1 inten2 [fweight=wv] (Output omitted.) di -2*e(ll) /*model 6*/ 1371.8382 lrtest, using(t2) Logistic: likelihood-ratio test chi2(1) = 8.29 Prob > chi2 = 0.0040