Chapter 1
Introduction

1.1 Read me first!

This book is filled with examples I hope you will try on your own. This section shows how you can download the example datasets and supplemental programs.

1.1.1 Downloading the example datasets and programs

All the example datasets and programs can be downloaded from within Stata using the following commands:

. net from http://www.stata-press.com/data/sbs
. net install sbs
. net get sbs

The net install command downloads the programs I wrote for the book. These programs are listed below with a brief description and the chapters in which they are used:

• The showcoding command shows the coding of an original variable as compared with the recoded version. This program is used in chapter 10.
• The power multreg command computes power for simple and multiple regression analyses. It is used in chapter 19.
• The power nestreg command computes power for a nested regression. It is used in chapter 19.

The esttab command

This program creates formatted estimation tables. It is an extended version of the Stata estimates table command. You can download it by typing¹

. ssc install estout, replace

The esttab program is extensively used in chapter 16, showing how to create presentation-quality regression tables. This program was also written by Jann (2007b).

The extremes command

The extremes command displays extreme values for a variable. You can download it from the SSC repository using the ssc command as shown below:

. ssc install extremes, replace

This program, written by Nicholas J. Cox (2003), is used in chapter 18 on regression diagnostics.

I am grateful to Jann and Cox for their kind permission to use their programs in my book. These programs not only helped to make my book better but also showed the kinds of user-written programs that have been created by the skilled and generous members of the Stata community. I touch on this point in section 1.2.7 later in this chapter.
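The download steps described so far can be collected into a short setup script that you run once. This is only a sketch: it uses the package names and the URL given in this section and assumes a working Internet connection. The which commands at the end are an added check (not part of the original instructions) that Stata can locate the installed programs.

```stata
* One-time setup: programs and datasets for the book (URL as given in the text).
net from http://www.stata-press.com/data/sbs
net install sbs, replace         // programs written for the book
net get sbs, replace             // example datasets, copied to the working directory

* User-written programs from the SSC repository.
ssc install estout, replace      // provides the esttab command
ssc install extremes, replace

* Added check: report where each installed program was found.
which esttab
which extremes
```

The replace option lets the script be rerun without errors if a program or dataset is already present.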
The net get command downloads the example datasets. I encourage you to download these so that you can reproduce and extend the examples illustrated in this book.

1.1.2 Other user-written programs

The book also uses a number of user-written programs. All of these user-written programs are stored at the Statistical Software Components (SSC) repository and can be downloaded using the ssc command. The programs are described below along with the ssc command that you can use to download the program.

The fre command

This program shows frequencies of a variable with the values of the variable and the value labels. It is an alternative to the tabulate command for one-way tabulations. You can download it by typing

. ssc install fre, replace

The fre command, written by Ben Jann (2007a), is used in many chapters throughout the book.

1. The esttab command is part of the estout package of programs, so it is downloaded by typing ssc install estout.

1.2 Why use Stata?

I have extensively used and supported many statistical packages, both general and specialized. As someone whose background is in psychology and who feels strong connections to the behavioral sciences, I think there are many reasons for behavioral scientists to choose Stata as their first statistical package or to switch from their current package over to Stata, as described below.

1.2.1 ANOVA

Analysis of variance (ANOVA) is a cornerstone statistical technique of the behavioral sciences, especially factorial ANOVA with its ability to dissect the interactions to answer very meaningful substantive hypotheses. Stata offers exceptionally powerful (yet easy-to-use) tools that allow you to analyze and dissect results from ANOVA. Let me illustrate with an example.

Imagine a study in which subjects either are assigned to a control group or will receive a kind of therapy called optimism therapy, which is geared toward increasing optimism.
All subjects in the study were screened for depression, and only nondepressed subjects were included. At the conclusion of the study, the optimism of the participants was measured, and optimism was found to be greater in the optimism therapy group than in the control group. As we can see in figure 1.1, the optimism of those who received optimism therapy is 10 units greater than the optimism of the control group. The effect of optimism therapy is that it boosted optimism by 10 units.

Figure 1.1: Results of experiment 1

The experimenter decides to replicate the previous study but with a twist: in addition to recruiting nondepressed participants, the experimenter will also recruit participants who are depressed. The experimenter expects that among those who are nondepressed, he or she will find the same kind of pattern as was found in the first study (that is, shown in figure 1.1). The experimenter wonders how those who are depressed will respond to this therapy. Will they equally benefit from the therapy? Will they benefit from the therapy but not as much as those who are nondepressed? Will they not benefit from the therapy at all? These three possible patterns are described further below and also illustrated in figure 1.2.

Figure 1.2: Three possible patterns of results from a 2 by 2 factorial design (panels: Pattern 1, Pattern 2, Pattern 3)

Pattern 1. Optimism therapy benefits those who are depressed and those who are nondepressed equally. Among those who are nondepressed, the effect of optimism therapy is 10 units (30 versus 40), the same gain as for those who are depressed (51 versus 41).

Pattern 2. Optimism therapy boosts optimism for those who are depressed but does not yield the 10-point boost experienced by nondepressed participants. Optimism therapy is effective for those who are depressed but just not to the same degree as for those who are not depressed.

Pattern 3.
Optimism therapy has no effect for those who are depressed (yielding no change from the control group, 41 versus 41), while optimism therapy delivered a 10-point boost in optimism for nondepressed participants.

The comparisons to distinguish patterns 1, 2, and 3 involve tests of interactions and tests to dissect the specific pattern of the interaction. As illustrated in chapter 7, Stata has exceptionally powerful, yet easy-to-use, tools for dissecting interactions with surgical precision. But this is not the only reason I think Stata is an outstanding statistical package for behavioral scientists. Here are some other reasons.

1.2.2 Supercharging your ANOVA

Suppose the outcome of the previous study was changed to a binary outcome, necessitating the use of a logistic regression. Not to worry, because if you are using Stata, you can perform a two-by-two factorial logistic regression drawing upon the same concepts and tools as you could in performing a two-by-two ANOVA. The same can be said if you were using a count outcome and wanted to run a two-by-two Poisson regression model. In fact, Stata has brought ANOVA-like analysis technology to virtually all of its analysis commands. As a behavioral scientist, you can apply these familiar designs to an exceptionally wide variety of statistical models.

1.2.3 Stata is economical

Stata has very attractive academic pricing, including student pricing, that can put Stata into your hands (at the time this book was written) for $125 a year (or $54 a year if you use very small datasets). In addition, there is no extra price for extra modules. Some statistical software packages charge extra for modules that address missing data, the analysis of complex survey data, structural equation modeling, or bootstrapping. In fact, there are some specialized statistical packages that people buy to add these features to their existing statistical package.
With Stata, all of these features are included at no extra cost.

Note: Stat/Transfer

If you want the ability to translate datasets from a variety of formats (for example, from Stata to SAS, from SPSS to Stata), then I suggest you consider purchasing Stat/Transfer, which you can obtain directly from StataCorp. Stat/Transfer can convert virtually any kind of dataset from one format to another format (for example, from Stata to SAS, from SPSS to Stata, from SAS to SPSS, and from Access to Excel). I have used Stat/Transfer for many years and been consistently impressed with its ease of use and how well it works. Speaking for myself, I cannot imagine doing my daily work without Stat/Transfer.

1.2.4 Statistical powerhouse

Stata is a statistical powerhouse in terms of its statistical features. Rather than listing all the features, I will point you to http://www.stata.com/features/ so that you can look at them yourself.

Tip: Stata/MP for multiprocessing

If you want to really unleash the power of Stata, consider Stata/MP. Taking advantage of multicore and multiprocessor computers, Stata/MP speeds up computations by using a divide and conquer strategy. Using 8 cores, the median time to complete an estimation command is 4.1 times faster. The Stata/MP Performance Report (available at http://www.stata.com/statamp/statamp.pdf) describes the concepts involved in parallel processing and details the performance gains achieved across Stata commands. Consider these performance gains that can be achieved on a 16-core machine: a linear regression using the regress command runs 10.5 times faster, a 2-way ANOVA using the anova command runs 12.7 times faster, and a logistic regression using the logistic command runs 11.1 times faster.

1.2.5 Easy to learn

Looking at the list of statistical features, you might feel overwhelmed wondering how you could learn all of these different commands. The amazing thing is how similarly the commands work.
To run a regression predicting y from x1, x2, and x3, you type

. regress y x1 x2 x3

To run a robust regression predicting y from x1, x2, and x3, you type

. rreg y x1 x2 x3

To run a logistic regression predicting y from x1, x2, and x3, you type

. logistic y x1 x2 x3

To run a Poisson regression predicting y from x1, x2, and x3, you type

. poisson y x1 x2 x3

If Stata had a regression-style command called xyzreg, you would likely be able to use it to predict y from x1, x2, and x3 by typing

. xyzreg y x1 x2 x3

What could be simpler?

1.2.6 Simple and powerful data management

With its extensive statistical capabilities, I think many overlook the power of Stata for data management. Stata features many specialized commands that are like data management shortcuts, directly handling commonly difficult data management tasks. Example commands include reshape long, reshape wide, egen, collapse, and merge.

Even more information: Downloading user-written programs

The Getting Started manual has more information about finding and downloading user-written programs. Just type help gs, and see the chapter titled "Updating and extending Stata - Internet functionality".

1.2.7 Access to user-written programs

One of the greatest virtues of Stata is the way it dovetails the ease of developing add-on programs with a great support structure for finding and downloading these programs. This virtue has led to a rich and diverse network of user-written Stata programs that extend the capabilities of Stata. As a result, the power of Stata is greatly extended and enhanced by these user contributions. Such programs are easily found, downloaded, and installed with the search command. The search command connects to Stata's own search engine, which indexes user-written Stata programs from all around the world.
Typing, for example, search regression searches for and displays Stata resources associated with the keyword regression. The resources searched include the Stata online help, Stata frequently asked questions (FAQs), the Stata Journal and its predecessor, the Stata Technical Bulletin, and programs posted on the websites of Stata users from around the world. All of these results are culled together and displayed in the Stata Viewer window. You can then point to and click on the programs you want to download and install.

Video tutorial: Downloading user-written programs

See a video demonstration of how to find and download user-written programs at http://www.stata.com/sbs/user-written.

Many of these programs are hosted at the SSC archive. This repository makes it easy for people to contribute programs to the Stata community and makes it easy for end users like you and me to download such programs. You can see the newest additions to this archive by typing

. ssc new

You can also see the most popular downloads by typing

. ssc hot

1.2.8 Point and click or commands: Your choice

You can use Stata with a point-and-click interface or by typing in commands. In this book, I focus exclusively on showing commands, but at any time, you can explore Stata via the drop-down menus, which give you point-and-click access to data management commands (via the Data drop-down menu), graphics commands (via the Graphics drop-down menu), and statistics commands (via the Statistics drop-down menu). Whenever you execute commands via the menus, Stata will display the command equivalent in the output window, teaching you the commands even as you use the point-and-click interface.

1.2.9 Powerful yet simple

In life, you often face the dilemma of choosing power or simplicity. With Stata, you can have both. It
offers the simplicity of a point-and-click interface and commands that are simple to use; it also offers the power of being able to write your own programs using its Mata programming language.

1.2.10 Access to Stata source code

Stata is not just a statistical software program; it is a statistical programming environment. You can write your own Stata commands and programs, and you can view the source code for nearly all Stata commands. For example, would you like to see the source code for the ttest command? If so, just type

. viewsource ttest.ado

You can view the source code for nearly all Stata commands in this way. This allows you to see how each such command works. Moreover, it means that you could even make your own version of any such Stata command with your own personal customizations.

1.2.11 Online resources for learning Stata

Another reason to use Stata is that there are so many terrific online resources to help you learn Stata.

The Stata resources and support page provides a comprehensive list of online resources available for Stata. It lists official resources available from StataCorp as well as from the Stata community. See http://www.stata.com/support/.

The Stata Resource links page provides a list of resources created by the Stata community to help you learn and use Stata; see http://www.stata.com/links/. Among the links included there, I highly recommend the UCLA Institute for Digital Research and Education (IDRE) Stata web resources at http://www.ats.ucla.edu/stat/stata/, which include FAQs, annotated Stata output, textbook examples solved in Stata, and online classes and seminars about Stata.

The Video tutorials on using Stata page contains links to numerous videos illustrating a wide variety of topics about Stata. These videos uniquely exploit the ability to show you the use of Stata in a way that a written explanation cannot convey.
Further, the videos are brief and to the point (usually lasting between two and five minutes). While you can find the videos on the StataCorp YouTube channel at http://www.youtube.com/user/statacorp, the "Video tutorials on using Stata" page (at http://www.stata.com/links/video-tutorials/) shows the videos organized by topic.

The Stata Frequently Asked Questions page is special because it not only contains many frequently asked questions but also includes answers! The FAQs cover common questions (for example, How do I export tables from Stata?) as well as esoteric questions (for example, How are estimates of rho outside the bounds [-1,1] handled in the two-step Heckman estimator?). You can search the FAQs using keywords, or you can browse the FAQs by topic. See http://www.stata.com/support/faqs/.

The Stata Technical Bulletin is the predecessor of the Stata Journal. All of these issues are available for free online. Although many articles may be out of date, there are many gems that contain timeless information. For more information, see http://www.stata.com/bookstore/individual-stata-technical-bulletin-issues/.

Note: Help menu

It is easy to overlook or forget that the Stata Help drop-down menu is a central hub for directing you to many helpful resources, including the Stata documentation in PDF, the help files organized by content, and information about what's new in Stata. My favorite is the Resources item, which brings up Resources for learning more about Stata, a concise and comprehensive list of resources to help you learn and use Stata. You can also access this help file by typing help resources.

1.2.12 And yet there is more!

For even more information and more reasons why you would enjoy using Stata, see the Stata webpage titled "Why use Stata" at http://www.stata.com/why-use-stata/.

1.3 Overview of the book

The book is divided into five parts, described below.
Statalist is an independently run web forum that connects Stata users from all over the world. It began in 1994 as a listserv (hence the name "Statalist") and was relaunched in March 2014 as a web forum. The community is both extremely knowledgeable and friendly, welcoming questions from newbies and experts alike. Even if you never post a question, you can learn quite a bit by reading the questions and answers posted by others. You can visit the web forum at http://www.statalist.org/. And if you wish to read questions and answers that predate the web forum (going all the way back to 2002), you can visit the "Statalist archives" page at http://www.stata.com/statalist/archive/.

The Stata Blog covers many interesting and technical aspects of Stata. Entries are written by Stata's developers and technical support team; see http://blog.stata.com/.

The Stata Journal is published quarterly with articles that integrate various aspects of statistical practice with Stata. Although current issues and articles are available by subscription, articles over three years old are available for free online as PDF files. See http://www.stata-journal.com/.

1.3.1 Part I: Warming up

As implied by the title, this part warms us up for the more substantial parts. This includes the current chapter you are reading. The next chapter, chapter 2, covers descriptive statistics such as tabulations (frequency distributions), summary statistics, cross-tabulations, and summary statistics for specific subgroups. This part concludes with chapter 3, which introduces basic inferential statistics such as two-sample t tests, one-sample t tests, and one- and two-sample tests of proportions.

1.3.2 Part II: Between-subjects ANOVA models

This part covers between-subjects ANOVA models, beginning with chapter 4, which covers one-way between-subjects ANOVA. This is followed by chapter 5,
which illustrates contrasts that you can use for making comparisons among groups in a one-way ANOVA. Next, chapter 6 covers analysis of covariance (ANCOVA), illustrating its use in experimental designs (to increase power) and in nonexperimental designs (to attempt to statistically control for confounding variables). Chapter 7 introduces factorial designs, covering two-by-two designs, two-by-three designs, and three-by-three designs. This chapter emphasizes how to visualize and interpret the interactions. It also illustrates how to dissect two-way interactions using simple effects, simple contrasts, partial interactions, and interaction contrasts. The contrast command is illustrated for dissecting the interactions, and the margins and marginsplot commands are used to display and graph the means associated with the interactions. Chapter 8 illustrates ANCOVA-type analyses with the focus on interactions of the independent variable (IV) and the covariate (in other words, categorical by continuous variable interactions). Chapter 9 covers factorial models with three IVs. This chapter, like chapter 7, emphasizes the visualization and interpretation of the interactions, in this case focusing on the three-way interactions. Figures are used to visually understand the three-way interactions, and a variety of analytic and graphical methods is illustrated to also help understand them. Chapter 10 shows how you can extend the power of ANOVA by blending the ANOVA designs with regression commands. This chapter illustrates how you can extend your ANOVA design to analyze data that come from complex surveys, data that violate the homogeneity of variance assumption, or data with influential observations (via robust regression or quantile regression). This part concludes with chapter 11, which illustrates power analysis for ANOVA and ANCOVA.
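To make the factorial workflow summarized in this part more concrete, here is a minimal sketch of a two-by-two analysis using the anova, margins, marginsplot, and contrast commands named above. The data are simulated. The variable names (optimism, treat, depstat, optimistic) and the cell means are illustrative assumptions loosely modeled on the optimism therapy example in section 1.2.1; they are not the book's datasets or the exact commands used in chapter 7.

```stata
* Simulate a balanced two-by-two design (all names and values are hypothetical).
clear
set seed 12345
set obs 400
generate byte treat   = mod(_n, 2)           // 0 = control, 1 = optimism therapy
generate byte depstat = mod(ceil(_n/2), 2)   // 0 = nondepressed, 1 = depressed

* Therapy adds 10 points for nondepressed subjects and nothing for depressed ones.
generate optimism = 41 + 10*treat*(1 - depstat) + rnormal(0, 8)

* Two-way factorial ANOVA, including the treat-by-depstat interaction.
anova optimism treat##depstat

* Display and graph the cell means associated with the interaction.
margins treat#depstat
marginsplot

* Simple effects: the effect of treat at each level of depstat.
contrast treat@depstat

* The same two-by-two design with a binary outcome, fit by logistic regression.
generate byte optimistic = optimism > 45
logit optimistic i.treat##i.depstat
margins treat#depstat
```

The final three lines mirror the point made in section 1.2.2: the factorial specification and the margins command carry over unchanged when the estimation command changes.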
1.3.3 Part III: Repeated measures and longitudinal models

This part covers two different strategies for analyzing designs with multiple observations on the same subject. Chapter 12 covers repeated measures ANOVA designs. In such designs, participants are observed at more than one time point. All participants are observed according to the same time schedule. This chapter shows three examples illustrating the analysis of repeated measures designs. The first example includes a single repeated measures IV (see section 12.2). The second example illustrates a two-by-three between-within design where the between-subjects IV has two levels and the repeated measures IV has three levels (see section 12.3). The third example illustrates a three-by-three between-within design where the between-subjects IV has three levels and the repeated measures IV also has three levels (see section 12.4).

Chapter 13 covers longitudinal models. These models, in contrast to repeated measures designs, typically have a larger number of observations per subject, and the time gaps between the repeated measures can vary between people. This chapter includes four examples, all of which use multilevel modeling as the main analysis strategy. The first example models the dependent variable (DV) as a linear function of time (see section 13.2). The second example adds a between-subjects IV, which allows us to model the linear effect of time and explore the IV by time interaction (see section 13.3). This example is similar to an ANCOVA with a treatment by covariate interaction (for instance, like the examples in chapter 8). The third example includes time as the only predictor but uses a piecewise modeling strategy for the effect of time (see section 13.4). The fourth example adds a between-subjects IV to the third example, modeling the interaction of the IV with the piecewise effects of time (see section 13.5).

1.3.4 Part IV: Regression models
This part illustrates how to use Stata commands to fit regression models. The chapters are ordered like a meal in which you decide to eat dessert first. The sweet and delicious chapters are presented first, deferring nutritional topics such as regression diagnostics and power analysis to the end. This part begins with chapter 14, which shows you how to perform multiple regression using Stata (showing you how to fit a simple linear regression model and multiple regression models) and how to test multiple coefficients within a multiple regression model. Chapter 15 covers more details about using the regress command, showing options for customizing output and how to create summary statistics based on the sample of observations included in the most recent regression analysis. This chapter also shows you how to store results of regression models for use later in your Stata session. This feature is used in chapter 16, which shows tools that you can use to create formatted regression tables. The chapter illustrates how to create such tables for display on the screen and how to create customized formatted output that can be used within a word processor like Word to create presentation-quality regression tables. Chapter 17 illustrates model-building tools, showing you how to fit multiple models using the same sample of observations. The chapter also shows you how to fit nested regression models and perform stepwise regression models. Chapter 18 illustrates commands for performing regression diagnostics, demonstrating analytic and graphical methods for identifying outliers. The chapter also illustrates analytic and graphical methods that you can use for testing for nonlinearity and how you can detect multicollinearity, assess the homoskedasticity assumption, and evaluate the normality of the residuals. This part concludes with chapter 19, which illustrates how to perform power analysis for a simple regression model
and a multiple regression model and how to compute power for a nested multiple regression model.

1.3.5 Part V: Stata overview

The previous parts have focused on different statistical techniques, providing examples of how to perform analyses using those techniques in Stata. This part provides command-centric information, offering an overview of the use of Stata. The first chapter (chapter 20) shows common features of estimation commands. Even though Stata has a very large number of estimation commands, they share a number of common features. This is by design, not by accident. Because estimation commands work similarly, what you learn about the behavior of one estimation command translates over to the use of other estimation commands. This chapter is about the features (behaviors) that estimation commands share.

Chapter 21 discusses a special set of commands called postestimation commands. They are called this because they are used after an estimation command (for example, after the anova or regress command). In particular, this chapter provides additional details about the contrast, margins, marginsplot, and pwcompare commands.

Chapter 22 provides brief information about basic data management commands in Stata. It illustrates reading data into Stata, keeping and dropping variables and observations, labeling data, creating variables, appending datasets, merging datasets, and reshaping datasets wide to long and long to wide.

The final chapter, chapter 23, recognizes that many readers of this book might be familiar with IBM® SPSS®. If you are such a reader, you might find yourself asking, for a given SPSS command, What is the equivalent Stata command? To answer such questions, this chapter lists commonly used SPSS commands (in alphabetical order) and shows the equivalent (or near equivalent) Stata command along with a brief example of the Stata command.
1.3.6 The GSS dataset

One of the commonly used datasets in this book is based on the General Social Survey (GSS). The GSS dataset was created and is collected by the National Opinion Research Center (NORC). To learn more, see http://www.norc.uchicago.edu/GSS+Website/. The GSS is a unique survey and dataset. It contains numerous variables measuring demographics and societal trends from 1972 to 2012. This is a cross-sectional dataset; thus the data for each year represent different respondents. (Note that the GSS does have a panel dataset for 2006, 2008, and 2010, but this is not used here.) In some years, certain demographic groups were oversampled. For simplicity, I will overlook this and treat the sample as though simple random sampling was used.

The version of the dataset we will use for the book is based on the GSS from 2012. This dataset was accessed by visiting http://www3.norc.org/GSS+Website/Download/STATA+v8.0+Format/ and looking under the heading "Download Individual Year Data Sets (cross-section only)" and the subheading "GSS 1972-2012 Release 6". Clicking on the link 2012, I downloaded a file named 2012.stata.zip, and unzipping that file yielded the dataset named GSS2012.dta. I created a Stata do-file that subsets and recodes the variables to create the analytic data file we will use, named gss2012_sbs.dta. This dataset is used below.

. use gss2012_sbs

The describe command shows that the dataset contains 1,974 observations and 42 variables.

. describe, short
Contains data from gss2012_sbs.dta
  obs:         1,974
 vars:            42                          17 Jul 2015 09:41
 size:       157,920
Sorted by:

Note that http://www3.norc.org/GSS+Website/Download/STATA+v8.0+Format/ provides some key information about missing value codes. There are four missing values in the data:

• .c: Cannot choose.
• .i: Inapplicable. Respondents who are not asked to answer a specific question are assigned to IAP.
• .d: Don't know.
• .n: No answer.
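Stata stores codes such as .c and .i as extended missing values, which are treated as larger than every nonmissing number. The sketch below uses a small made-up variable to show how such codes can be tabulated and counted. The variable name happy and its values are assumptions for illustration only; they are not taken from the GSS data.

```stata
* A toy variable with three valid ratings and four extended missing values.
clear
input byte happy
1
2
3
.c
.i
.d
.n
end

* Show the missing value codes alongside the valid values.
tabulate happy, missing

* Count one specific code, then all missing values of any kind.
count if happy == .i
count if missing(happy)

* Statistical commands skip missing values; only the valid ratings are summarized.
summarize happy
```

Because all missing codes compare as larger than any number, a condition such as happy > 2 would also select the missing observations; adding & !missing(happy) guards against that.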
Applied to a rating of happiness, for example, the special missing value code .c indicates that the value is missing because the respondent could not choose a rating that reflected his or her happiness. The missing value code .i indicates that the response is missing because the question was not asked of the respondent. (Note that some groups of questions are asked of only some randomly chosen respondents.) The missing value code .d means that the answer is missing because the respondent did not know. Finally, the missing value code .n means that there is no answer (for example, the respondent preferred not to respond). For more documentation about the GSS, you can visit http://www3.norc.org/GSS+Website/Documentation/. You can also learn more about missing value codes in Stata by typing help missing.

1.3.7 Language used in the book

I would like to comment on the language used in this book. I use language from the tradition of experimental design and the behavioral sciences. Here are some of the terms that I will be using and my intended meaning.

Independent and dependent variables. In ANOVA, it is traditional to call the categorical predictor an IV and the outcome the DV. This usage reflects the tradition that ANOVA was most commonly used for the analysis of designed experiments. I will continue to describe the categorical predictor as the IV and the outcome as the DV even if the study is not a designed experiment and even if the design is not an ANOVA design. I choose this terminology to emphasize that the roles these variables play (in a statistical sense) are the same, even if they do not arise from a designed experiment and even if they are not used in a traditional ANOVA analysis.

Note: Factor variables

Sometimes, when referring to an IV or a categorical variable, I will also use the term "factor variable".
This is a Stata-specific term that refers to a categorical variable that has been entered into an ANOVA model or into a regression model by using the i. prefix (for example, i.race). Stata treats factor variables differently, knowing that they are categorical variables, and most Stata commands understand how to treat such variables differently from continuous variables (for example, age).

Covariate. In an ANCOVA design, a covariate is a continuous predictor, in addition to the IV, in the prediction of the DV.

Effect. It can be very parsimonious to talk about a "treatment effect" when talking about the difference in the means for a treatment versus control group. Or in the context of regression, it can be useful to call the regression coefficient for a variable (such as age) the "effect of age". Whenever I use this term, I am not using it in the context of cause and effect but to describe an observed statistical relationship.

Note: Internal validity

A key question in many studies is whether statistically significant associations reflect underlying causal relationships. The issue of causal inference concerns the scientific integrity of the study design, and the ability to draw such causal conclusions is often described as the "internal validity" of the study. In this book, I will sidestep such issues and instead refer you to your favorite book on experimental methods for more information about the conditions that are necessary for drawing causal conclusions regarding statistically significant findings.

1.3.8 Online resources for this book

The online resources for this book can be found at the book's website:

http://www.stata-press.com/books/sbs.html

Resources you will find there include the following:

• All the datasets used in the book. I encourage you to download the datasets, reproduce the examples, and try variations on your own. You can download all the datasets into your current working directory from within Stata by typing

. net from http://www.stata-press.com/data/sbs
. net get sbs

• Errata (which I hope will be short or blank). Although I have tried hard to make this book error free, I know that some errors will be found, and they will be listed in the errata.
• Other resources that may be placed on the site after this book goes to press. Be sure to visit the site to see what else may appear there.

1.4 Recommended resources and books

This book focuses on how to analyze data from a behavioral science perspective and is not a general purpose book about the overall use of Stata. It omits topics such as an overall introduction to Stata and general principles of using Stata and provides very little detail about data management or graphics. I made this deliberate choice because there are so many other resources that cover these topics and because the coverage of these topics is not specific to a behavioral scientist. Thus here I provide recommendations for resources to help you acquire this information.

1.4.1 Getting started

If you are new to Stata, I highly recommend the Stata Getting Started manual. There is a unique version of the Getting Started manual that shows what Stata will look like and how it works on your platform. (The manual comes in three separate versions written for Windows, Mac, and Unix.) You can access the Getting Started manuals by typing

. help gs

To get you started, I suggest you read the chapter titled "Introducing Stata - sample session". In addition, the chapters titled "The Stata user interface", "Using the Viewer", and "Getting help" should help you quickly feel comfortable in the Stata environment.

Video tutorial: The Stata interface

Take a video tour of the Stata interface at http://www.stata.com/sbs/interface.
1.4.2 Data management in Stata

The Getting Started manual shows how to get data into Stata in the chapters titled "Opening and saving Stata datasets" and "Importing data". It also covers "Creating new variables" and "Deleting variables and observations".

For more information about general topics in data management, I recommend the IDRE (formerly Academic Technology Services [ATS]) UCLA website. There is a special page devoted to the topic of data management at http://www.ats.ucla.edu/stat/stata/topics/data_management.htm.

For comprehensive coverage of the topic of data management (from basic tasks such as labeling variables and recoding variables to advanced tasks such as merging or reshaping datasets), I recommend my book titled Data Management Using Stata: A Practical Handbook, which is available at http://www.stata.com/bookstore/data-management-using-stata/.

Video tutorial: Getting help in Stata
See a video demonstration of how to get help in Stata at http://www.stata.com/sbs/help.

1.4.3 Reproducing your results

One topic that I have not addressed but that is extremely important concerns how you can reproduce your results. This book illustrates commands that you can use for performing your analyses, but it does not show you how to create a procedure for saving these commands so you can easily execute them again. A related topic is how to save the results of your commands so that you can refer to the results in the future.

The Getting Started manual has an excellent introduction to these topics. The chapter "Using the Do-file Editor—automating Stata" shows you how to save a sequence of Stata commands in a file called a do-file, which you can execute at a later time.2 The chapter "Saving and printing results by using logs" shows you how you can save your results in a log file, which provides you a transcript of your commands and output from previous analyses.
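As a rough illustration of what a do-file that keeps a log might contain, here is a minimal sketch. The file names example.do and example_log are hypothetical, and the auto dataset that ships with Stata stands in for your own data; the Getting Started manual remains the authoritative guide.

```stata
* example.do: a hypothetical do-file name, chosen only for illustration
capture log close                      // close any log left open by an earlier run
log using example_log, replace text    // start a plain-text log (hypothetical file name)

sysuse auto, clear                     // dataset shipped with Stata, used as a stand-in
summarize price mpg                    // your own analysis commands would go here

log close                              // stop logging; the transcript is in example_log.log
```

Saved under that name in the current working directory, the file could be run by typing do example in the Command window.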
These two features can be combined so that your do-files automatically generate log files. The topic of creating and using do-files is also covered in my book Data Management Using Stata: A Practical Handbook.

2. For IBM® SPSS® users, this is the equivalent of an SPSS syntax file.

Video tutorial: PDF documentation in Stata
Did you know that Stata has well over 12,000 pages of documentation that are just one click away? From the Help menu, click on PDF documentation. You can also see a video demonstration of how you can access PDF documentation within Stata at http://www.stata.com/sbs/pdf-documentation.

1.4.4 Recommended Stata Press books

StataCorp has a publishing arm called Stata Press, which issues books like this one. You can see a list of all the books at http://www.stata-press.com/catalog/. At that site, you can see a description of each book, including a detailed table of contents, comments from the Stata technical group, and a sample chapter. Every Stata Press book I have read has excelled in providing useful information about the use of Stata for researchers. Among these books, I particularly recommend the following as books that build upon what is presented in this book:

• Discovering Structural Equation Modeling Using Stata, Revised Edition by Alan C. Acock (2013). This book provides an excellent introduction to the use of structural equation modeling using Stata.

• An Introduction to Survival Analysis Using Stata, Third Edition by Mario Cleves, William Gould, Roberto G. Gutierrez, and Yulia V. Marchenko (2010). This book provides excellent and detailed information about how to perform survival analysis using Stata.

• Multilevel and Longitudinal Modeling Using Stata, Third Edition (Volumes I and II) by Sophia Rabe-Hesketh and Anders Skrondal (2012b). This two-volume set provides extensive and very detailed information about fitting multilevel and longitudinal models using Stata.
• An Introduction to Stata Programming by Christopher F. Baum (2009). This is an excellent book to help you learn and explore the power of Stata programming.

I would be remiss if I did not mention my other books published by Stata Press, listed below.

• Data Management Using Stata: A Practical Handbook by Michael N. Mitchell (2010). This book provides data management information to complement the statistical examples shown in the book you are holding.

• A Visual Guide to Stata Graphics, Third Edition by Michael N. Mitchell (2012b). This book visually illustrates the use of Stata graphics.

• Interpreting and Visualizing Regression Models Using Stata by Michael N. Mitchell (2012a). This book overlaps with the book you are holding, covering many of the same topics but discussing them in a way that would appeal to a more general Stata audience. It discusses modeling of categorical, continuous, and categorical and continuous interactions in a much more general fashion than the book you are holding.

2 Descriptive statistics

2.1 Chapter overview
2.2 Using and describing the GSS dataset
2.3 One-way tabulations
2.4 Summary statistics
2.5 Summary statistics by one group
2.6 Two-way tabulations
2.7 Cross-tabulations with summary statistics
2.8 Closing thoughts

2.1 Chapter overview

This chapter introduces how to perform descriptive statistics. It begins with section 2.2, which introduces the General Social Survey (GSS) dataset as well as some of the Stata commands you can use to become familiar with a new dataset. This is followed by a series of sections that illustrate how to perform different kinds of descriptive statistics, namely, one-way tabulations (section 2.3), summary statistics (section 2.4), summary statistics by one group (section 2.5), two-way tabulations (section 2.6), and summary statistics by two groups (section 2.7).

2.2 Using and describing the GSS dataset

The examples from this chapter are based on analyses of the GSS from the year 2012 using a dataset named gss2012_sbs.dta. Once you have downloaded the datasets for the book (as described in section 1.1), you can load this dataset into Stata by typing

. use gss2012_sbs

We can use the describe command to obtain information about the dataset, including the number of observations, the number of variables, and a listing of all the variables and labels. The command is shown below, but the output is very long, so I have omitted it to save space. I suggest you try the command so that you can see the output for yourself.
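When first exploring a dataset, a few variations on describe can be handy. The sketch below is only an illustration, not part of the book's examples; it assumes that gss2012_sbs.dta has already been downloaded into the current working directory as described in section 1.1.

```stata
* Assumes gss2012_sbs.dta is in the current working directory (see section 1.1)
use gss2012_sbs, clear

describe, short        // dataset-level summary only (observations and variables)
describe               // full listing: each variable with its storage type and label
misstable summarize    // counts of system (.) and extended (.a to .z) missing values
```

The misstable summarize command is included because the GSS variables use extended missing value codes such as .c, .d, .i, and .n; it reports how many observations fall under the system missing value and how many under the extended codes for each variable.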