Data in and out of Stata

OPENING A STATA DATA FILE

To open an existing Stata data file you can either:

(1) use the Open icon on the toolbar and browse to the file location;
(2) use the pull-down menu File -> Open and browse to the file location; or
(3) type in the Command window or a do file:

    use datafilename.dta, clear

If you use method (1) or (2) when another data set is already open, Stata will warn you if those data have been changed and ask you either to clear the current data without saving the changes or to cancel the opening of the new data set. If you do want to save the changes, cancel and save the open data before opening the new data.

In method (3), the clear option tells Stata to clear any existing data before opening the data file specified in the command. Only use the clear option when you are absolutely sure you want to clear out any open data in its current state. The clear option is not necessary when opening the first data file after launching Stata, when no other data are open. If clear is omitted and other data are open, Stata will return an error message and stop. If you haven't changed the default font colours, error messages are shown in red, and this one will say: no; data in memory would be lost. Version 10 uses the clear and clear all commands in different circumstances, and as you become more familiar with Stata you will be able to use the clear options more efficiently.

In (3) you do not have to specify the path to the data file if you have used the cd command to change to the directory where the data file is located. In some cases it is preferable to read the master data from one location, such as when using a read-only data library, and then store smaller data files for analysis and output results in another location.
For this kind of scenario we recommend using the cd command to point Stata to the directory you wish to use to store your analysis data files and results, and then specifying the whole path when opening the master data. For example, if you wish to read master data from a data library but put all other files in a project folder on your local drive, the series of commands would be:

    cd "C:\projects\project_a"
    use "M:\datalibrary\masterdata.dta", clear

The first line instructs Stata to use the project_a folder on your C drive as the default directory, so unless otherwise specified all use and save commands retrieve or place files in that folder. The second line retrieves the master data file from the data library and opens it in Stata.

You will notice that Stata repeats the cd "C:\projects\project_a" command in the Results window in yellow. Unlike messages returned in red, which are error messages, yellow messages are result messages which give you feedback on what you have just done. You will also see that the cd and use commands are shown in white in the Results window, as this is the default colour for inputs (what you type). When you start producing results in the Results window, you will see that most of the type there is green, as that is the default results colour.

After you have opened a data file you will see a list of variables in the Variables window. The variable names are listed on the left while the labels are on the right. Sometimes you have to widen the right-hand side of this window to see all the variable labels, particularly if they are quite long. Provided you have your windows set up in a similar way, the screen shot shows you how your screen may look after opening a data file. If the area for variable labels is blank, this means that no labels were assigned to the variables or that they were lost when you transferred your data into Stata.
We show you how to add these later in the book. Also note that the command to open the data is repeated in the Results window and in the Review window. What you will see in the Review window depends on whether you are typing the command into the Command window or using a command in a do file. If it is the former, the command you typed will appear there. If it is the latter, you will see a line in the Review window that starts with do, followed by a file path pointing to the location of the do file being used.

You can see that the data file demodata2.dta has 11 variables. With a small number of variables it is possible to manage the whole file quite easily. However, if you use large data files, such as survey data, and you do not need all of the variables, you can modify the use command to select only the variables you want. For example, if we only wanted to open the variables female and empstat from the demodata2.dta file, we would type in the Command window or do file:

    use female empstat using demodata2.dta, clear

For different ways of exploring your data, see Box 2.1.

Box 2.1: Inspecting your data

The codebook command presents you with information about your variables. By default it reports on all the variables in the data, so if you want information on only a few, specify the variables after the command. For example:

    codebook sex age

The command inspect is also useful for checking data accuracy. Again the default is to report on all variables, so specify those for which you require the information, for example:

    inspect sex age

Keep and drop

There are two commands that allow you to specify the variables you wish to keep in the open data set once you have opened the full data. The command keep is followed by a list of variables you wish to keep in the open data. For example:

    keep sex age educ

The command drop is also followed by a list of variables, but this time those that you wish to remove from the open data. For example:

    drop pid hid region-ghq

If there is a group of variables you wish to either keep or drop, you can just give the first and last variable as listed in the Variables window with a dash between them, and Stata will read this as including all variables between the two named, in the order they appear in the Variables window. This notation can be used in other commands as well.

OPENING OTHER TYPES OF DATA FILES

There are a number of commands to help you import data in other formats, but here we concentrate on probably the two most common formats: Excel spreadsheets and SPSS data files. See Box 2.2 for a software package which converts many different forms of data.

Box 2.2: Stat/Transfer

Stat/Transfer is a software package that converts data files from one format to another. There are too many formats to list here, but all commonly used spreadsheets (Excel, Access, dBase etc.) and statistical packages (Stata, SPSS, SAS, Epi Info, etc.) are covered. See www.stattransfer.com and www.stata.com/products/transfer.html

Excel spreadsheets

If you have data in an Excel spreadsheet and want to transfer it into Stata, from where you can save it as a Stata data file (.dta), you will have to go through a few intermediate steps. You can then either use the pull-down menus or use the insheet command in the Command window or a do file. These instructions assume you have your data organized in Excel with the variable names in the first row and then one case or respondent's data per row.

With your data open in Excel you need to save it as a text (tab delimited) file with a .txt extension. You can do this by using File -> Save As and then selecting the type of file to save.

If you choose to use the pull-down menus in Stata to open the data, use File -> Import -> ASCII data created by spreadsheet and then click Browse to go to the location of the .txt file. There are a few options available to you at this stage, but if your data are organized as above you do not need to change any of the default settings. When you browse to find the .txt file, make sure that Files of type: All (*.*) is set in the lower panel of the Open dialogue box. Click on OK and the data should open with your list of variables in the Variables window. You can check your data visually by clicking the Data Browser button (see Chapter 1) on the toolbar and inspecting the spreadsheet.

If you wish to use the Command window or put commands in a do file, use:

    insheet using datafile.txt, clear

This assumes the .txt file is in the default Stata directory; otherwise you can enter the path to the file.
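To try the insheet route without Excel, the sketch below writes a three-row tab-delimited file by hand, imitating what Excel's Text (Tab delimited) export produces, and then reads it back. The file name demo_excel.txt and the variables id, sex and age are hypothetical. In Stata 13 and later, import delimited supersedes insheet, so treat this as a sketch for the version covered here.

```stata
* Sketch only: demo_excel.txt and its contents are hypothetical.
* Write a small tab-delimited file with variable names in the first row
file open fh using "demo_excel.txt", write text replace
file write fh "id" _tab "sex" _tab "age" _n
file write fh "1" _tab "1" _tab "34" _n
file write fh "2" _tab "2" _tab "51" _n
file write fh "3" _tab "1" _tab "27" _n
file close fh

* Read it back: tab = tab-delimited, names = first row holds variable names
insheet using "demo_excel.txt", tab names clear
describe
list
```

After insheet, describe should report three numeric variables and list should show one row per case, just as the spreadsheet layout described above requires.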
To save your data in Stata format (.dta) see the Saving data section below.

SPSS data files

Until recently SPSS data files had to be converted manually, in a similar way to Excel spreadsheets, but now there is the command usespss, which allows you to open an SPSS for Windows (.sav) data file directly in Stata. Type:

    findit usespss

Then follow the instructions to install the command. The usespss command works in a similar way to the usual Stata commands for opening existing files (see above):

    usespss using "pathandfilename", clear

After opening the data it can be saved in Stata format (.dta); see the Saving data section below. If you are using version 9.0, you will need to obtain the free update to version 9.2 for the command to work. Type update query in the Command window and follow the instructions. The latest versions of SPSS can also save data in Stata format (.dta).

ENTERING YOUR OWN DATA INTO STATA

You may have raw data that you want to enter directly into Stata. As mentioned in Chapter 1, look for the two data icons on the toolbar: the one on the left is the Data Editor and the one on the right is the Data Browser. The Data Editor is the one you will want to open in order to enter your own data. Once you launch the Data Editor, you can proceed to add your own data. The variables go in columns, with each row representing a single observation. Here you can enter text and numbers, depending on the nature of your variables. To save the entered data, click Preserve in the top left-hand corner. You will see that the new variables are listed in the Variables window. You will need to label the variables and their categories (if applicable) using commands that are covered in Chapter 3.
SAVING DATA

To save your data at any point under a new file name you can either:

(1) use the pull-down menu File -> Save As, browse to the location where you wish to save the file and enter its new name; or
(2) type the following in the Stata Command window or do file, if you wish to save the data in the default directory or the one you have previously specified with the cd command:

    save newdata.dta

If you wish to overwrite an existing data file with a modified version, add the option replace:

    save newdata.dta, replace

When you are using a do file (and remember that we strongly recommend you progress to using them as soon as possible) it is preferable to use the replace option all the time. This is because if the option is not used and a data file with that name already exists, the save command will cause an error (returned in red and indicating that the file already exists) and the do file will stop at that point. If the option is used the first time a data file with that name is saved, Stata will report a (green) message stating that the file you indicated could not be found to be replaced. This isn't a problem: Stata is just telling you that it couldn't literally replace a file because one by that name didn't already exist, but it will save the file anyway and continue with the next command in the do file. The next time you run the do file and make changes to the data, Stata will overwrite the previous version because of the replace option. See Box 2.3 for a discussion of the importance of careful data file management.

Box 2.3: Overwriting your data

If you regularly use master data drawn from a read-only data library, you need only be concerned with whether or when to overwrite your analysis data files.
This is because the system will not allow you to overwrite the master data. If you have your own master data files on your local drive, you run the risk of accidentally overwriting those files, especially when you are in the early stages of getting to know a new software package. After going to all the trouble of collecting, coding and entering your own data, you need to guard against ruining that work. We recommend that you keep a master copy in a place detached from your local drive (CD-ROM, USB drive, or a networked remote drive) and, to be doubly sure, designate the copy of the master data that you are using on your local drive as read-only.

LOG FILES

We mentioned log files briefly in the previous chapter. A log file keeps a record of your commands and results during a Stata session. At first you may wonder why this is necessary, as the results are shown immediately in the Results window on your screen. The Results window has a limited capacity, and while you might initially find that this is sufficient for your use (or you may increase the buffer size to be much greater), you will quickly need to record your sessions permanently as your data manipulation and analysis become more complex.

You need to tell Stata explicitly when to open a log file and when to close it. This may seem odd to those familiar with SPSS, where an output window opens automatically when the first command is executed, but we believe it gives a greater degree of flexibility for complex series of analyses. In Stata you can open a log file at any time, close it, open it again to add further results, or open a new log file altogether. We find this particularly useful when we want to separate results from data manipulation, or when preparing tables for a report, where it is possible to produce a separate log file for the analysis behind each table rather than one larger log file.
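A minimal sketch of the one-log-per-table approach, using the log using and log close commands covered later in this section. It assumes Stata's bundled example data set (loaded with sysuse auto) in place of your own data, and the file names table1.log and table2.log are hypothetical:

```stata
* Sketch: one log file per report table (file names are hypothetical)
sysuse auto, clear
capture log close                   // close any log left open

log using table1.log, text replace  // results for Table 1 only
tabulate foreign
log close

log using table2.log, text replace  // results for Table 2 only
summarize price mpg
log close
```

Each table's results end up in its own small file, which keeps them easy to find when the report is revised.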
Of course, you may prefer one large log file with clear annotations separating the different stages of the analysis, and Stata has the flexibility to manage either.

Log files come in two formats, each with advantages and disadvantages. You specify the format you want by the extension you give the log file name, either .log or .smcl, when you tell Stata to open a log file. If you choose the .smcl format (this is the default, so if you do not specify a file extension you will get this format), you can view the file by using the View option from the File pull-down menu in Stata and browsing to the log file. The .smcl log files have the same properties as the results that are displayed in the Results window. This format also has the advantage that it can be copied and pasted easily into Excel and Word, a topic we return to later in this chapter.

Log files with a .log extension are text files that can be viewed in any word processor. The downside is that it is not as easy to copy and paste tables into Excel, and to view them correctly you need to use a font with equal spacing for each character, such as Courier. To view your log file in Word, remember to choose to display all file types when you are searching for your file, as its file extension is not .doc.

If you wish to record only what you type into the Command window, you can open, close, and turn on and off a command log file using cmdlog instead of log. You can have command logs open at the same time as normal log files if you wish.

Starting a log file

The command log using starts a log file, and you tell Stata the name you wish to give it (e.g. analysis) and which format you want.
Assuming you wish to save the log file in the directory you have previously specified using the cd command, you would use:

    log using analysis.log, replace

or

    log using analysis.smcl, replace

The replace option is used to overwrite the original file with a modified version. In a similar way to using the replace option with the save command, if replace is not specified and a file by that name exists, Stata will show an error and stop the do file. If replace is used and there is no file by that name, Stata will show a warning the first time the file is created but then carry on with the next command in the do file.

If you are using the pull-down menus or the Command window and still want to keep a log file of your commands and results, you can use the log file icon on the toolbar, in either version 9 or version 10, to open, suspend and close a log file.

When you open a log file, Stata automatically records the location of the log file, the type of log file and the time and date it was opened at the beginning of the file. For example:

    . log using "C:\Documents and Settings\project_a\analysis.log"
          log:  C:\Documents and Settings\project_a\analysis.log
     log type:  text
    opened on:  19 Oct 2007, 11:32:41

Closing a log file

When you close a log file it is saved to the location specified when you opened it with the log using command. To close the log file, simply type:

    log close

and Stata records the location of the log file, the type of log file and the time and date the log file was closed at the end of the file. For example:

    . log close
          log:  C:\Documents and Settings\project_a\analysis.log
     log type:  text
    closed on:  19 Oct 2007, 11:53:23

The log close command shuts the log file completely, and if you want to reopen it to add more information you need to type log using with the append option (as below).
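The sketch below shows the default format at work: with no extension on the file name, log using creates a .smcl file (assuming the default log type has not been changed with set logtype). It then reopens the same log with the append option described next. The name analysis_demo and the bundled auto data are stand-ins for your own files.

```stata
* Sketch: analysis_demo is a hypothetical file name
sysuse auto, clear
capture log close

log using analysis_demo, replace   // no extension: written as analysis_demo.smcl
summarize mpg
log close                          // shuts the log completely

log using analysis_demo, append    // reopen and add to the same file
tabulate foreign
log close
```

Because both log using commands name the same file without an extension, the second one finds analysis_demo.smcl and adds to it instead of starting afresh.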
However, if you just want to turn off the log temporarily, you can use log off and then log on to turn it back on later in your analysis or in your do file. After the log off command is used, Stata records when the log was paused:

    . log off
          log:  C:\Documents and Settings\project_a\analysis.log
     log type:  text
    paused on:  19 Oct 2007, 12:03:35

After the log on command, Stata records when the log was resumed:

    . log on
           log:  C:\Documents and Settings\project_a\analysis.log
      log type:  text
    resumed on:  19 Oct 2007, 12:13:51

Adding to your log file

If you want to reopen an existing log file and add further results to it, rather than overwrite it as with the replace option, you open the log file in the usual way but use the append option:

    log using analysis.log, append

and Stata records the location, type, and time and date the new results were added to the log file in the same way as when the log file was first opened.

COPYING RESULTS TO EXCEL AND WORD

Tables and other forms of analysis results can be copied from the Results window straight into Excel and Word to create tables in documents and spreadsheets. In this example, two variables from our example data, marital status (mastat) and gender (sex), were cross-tabulated using the tabulate command (see Chapter 6 for a further discussion of this command).

Use the cursor and mouse to highlight the table, then right-click and choose Copy Table, or use the Edit pull-down menu to do the same. In Excel, choose one cell, then right-click and choose Paste, or use the Edit pull-down menu to do the same. The data will be entered automatically in their own cells in the Excel spreadsheet.

In Word, paste the data into the document. It will appear spaced by tabs, and to convert it to a table you need to select the rows and go to Table -> Convert -> Text to Table.
The default settings will be fine for turning the pasted data into a table in Word.

FILE MANAGEMENT

In Figure 2.1 we illustrate the main routes for your Stata commands (whether these are generated by a pull-down menu, the Command window or a do file), data and results when you are working with master data in a read-only library. This could easily be adapted to the case where your master data are held in a read-only folder on your local drive (see our recommendations in Box 2.3).

Figure 2.1 Information flow schematic: commands, data and results passing between Stata, the data library and the local drive

There are probably as many ways of organizing analysis files as there are researchers, but we would like to suggest two main ways as a starting point for your own file management system. We would like to stress the importance of adopting a systematic approach to file management and archiving, whatever system you finally decide to use. This can pay dividends, as one of us (DP) found out (see Box 2.4).

The first approach is to group files by project. In this way all data files, do files and log files are in the same project folder. This allows you to specify the project folder in the cd command, knowing that all files for that project are there to access, overwrite or save. This system works well provided the number of files does not become unwieldy. One way of dealing with large numbers of files is to prefix your file names in order to group types of files. As Windows shows the files in a folder in alphabetical order of file name by default, if you use do_, data_ and log_ to start the names of do files, data files and log files respectively, they will be shown in those groups. We also recommend that you try to be systematic in the naming of files, especially do files and log files, so that it is obvious what stage of the project they relate to. For example, we might have one do file for extracting data from master files, one do file for manipulating those data to construct variables for analysis, and then three do files for analysis. In this case we would name the do files as follows:

    do_extraction.do
    do_construction.do
    do_analysis_1.do
    do_analysis_2.do
    do_analysis_3.do

If you do not like prefixing the file names, you can click on the Type column heading in Windows, when in List or Details view, and the files in that folder will be grouped by their extensions.

The second approach, which we use for larger projects, is to set up three subfolders in the project folder: one for do files, one for data, and one for log files. This makes it harder to use the cd command to specify a single folder, but long paths can easily be copied and pasted in the Do-file Editor. When using this system it is important to be able to identify which log files come from which do files. We use numbers to link them, so that analysis_1.do produces results_1.log, and if there is more than one log file from a single do file we use letters to distinguish them, such as results_1a.log and results_1b.log.

Box 2.4: The consequences of good file management

In 2003 I (DP) analysed data from an American panel survey (the National Longitudinal Study of Adolescent Health, or Add Health¹) for a paper which was eventually published in 2005.² In the summer of 2006 I was contacted by a researcher who was doing a meta-review of studies that had looked at cannabis use and depression.³ As this was one of the aspects covered in my paper, the researcher had questions about the measures I had used, and also asked for some additional results that were not published, such as bivariate odds ratios, as I had only published multivariate results. When I think back to when I started data analysis and my rather haphazard file management, such a request would then have cost me hours of work trawling back through badly named and annotated files to find the particular statistics needed for this meta-review. But this time, because I had adopted a file management system and had copiously annotated my do files, I was able to find the information and provide the unpublished results [...] sometime in the future. The time between submission to a journal and receiving reviews that may ask for additional analysis can be quite substantial, and you may have gone on to other projects in that time. Good annotations will allow you to get back into the analysis much faster than working through each command line trying to remember what you had done all those months ago.

¹ Udry, J.R. (2003) The National Longitudinal Study of Adolescent Health (Add Health), Waves I & II, 1994-1996; Wave III, 2001-2002 [machine-readable data file and documentation]. Chapel Hill, NC: Carolina Population Center, University of North Carolina at Chapel Hill.
² Wade, T.J. and Pevalin, D.J. (2005) Adolescent delinquency and health. Canadian Journal of Criminology and Criminal Justice, 47: 619-654.
³ Moore, T.H.M., Zammit, S., Lingford-Hughes, A. et al. (2007) Cannabis use and risk of psychotic or affective mental health outcomes: a systematic review. Lancet, 370 (9584): 319-328.

USING THESE COMMANDS IN A DO FILE

Below we provide a starting template for a standard beginning and end of a do file. This template is for a single do file producing a single log file. Do files can be much more complex than this basic example, and we would expect you to develop and tailor your do files as you gain more experience with Stata.
For example, you may have a number of log files open at the same time, in different formats, and then pause, resume and close each of them at separate times. In this case the name(logname) option is very useful, but for now it is enough to remember that this is possible.

The commands in the template are explained by comments written in the same ways you can use to annotate your own do files so that they are more meaningful. Comments in a do file explain the commands you have written. They can be handy when you go back to a do file months later and can't remember why you did certain things. Indeed, you probably will not appreciate the importance of commenting until you have experienced this type of frustration. It is certainly better to have too many comments than not enough! Any line that begins with an asterisk will be ignored by Stata as a command. If your comment extends over several lines, you will have to make sure that every line begins with *, or you can use a different technique for extensive commenting, writing /* at the beginning of the comment and */ at the end. Anything between these signposts is ignored. If you decide to use this latter method, it is important that you remember to put in the end-of-comment marker */, or everything after the opening /* will be regarded as a comment, possibly the rest of your do file! Both types of commenting technique are shown below. For additional commenting techniques, see help comments.

    set mem 10m
    version 10.1      /* not required but a good habit to get into
                         in case you have any version-specific commands */
    capture log close /* closes any log files still open */
    set more off      /* turns off the need to hit the space bar
                         when the Results window is full */
    cd M:\projects\project_a\analysis
    log using analysis_1.smcl, replace
    * the .smcl is not formally required as it is
    * the default format
    use id sex age mastat educ using ///
        "K:\datalibrary\data_file", clear
    * manipulation and analysis commands here
    save analysis_file.dta, replace
    * the .dta is not formally required
    log close

Note that this is the first time we have used the /// notation to indicate that a command continues on the next line. This is only used in do files and not in the Command window (see Interacting with Stata in Chapter 1).

Another way of organizing your do file would be to open the log file only when you have analysis results you wish to keep, and to exclude the data manipulation. In this case your do file might be structured as follows:

    set mem 10m
    version 10.1
    capture log close
    set more off
    cd M:\projects\project_a\analysis
    use id sex age mastat educ using ///
        "K:\datalibrary\data_file", clear
    * first set of manipulation commands here
    log using analysis_1.smcl, replace
    * first set of analysis commands here
    log off
    * second set of manipulation commands here
    log on
    * second set of analysis commands here and
    * be sure to close log
    log close
    save analysis_file.dta, replace
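The second template can be tried out as a runnable sketch. Here Stata's bundled auto data (sysuse auto) stands in for the data-library file, the two generate commands stand in for data manipulation, and the file names analysis_demo2.smcl and analysis_demo2.dta are hypothetical; only the two blocks of analysis commands end up in the log:

```stata
* Sketch of the second template (file names are hypothetical)
capture log close
set more off
sysuse auto, clear

* first set of manipulation commands (not logged)
generate price1000 = price/1000

log using analysis_demo2.smcl, replace
* first set of analysis commands (logged)
summarize price1000 mpg
log off

* second set of manipulation commands (not logged)
generate heavy = weight > 3000 if !missing(weight)

log on
* second set of analysis commands (logged)
tabulate heavy foreign
log close

save analysis_demo2.dta, replace
```

Opening analysis_demo2.smcl with File -> View afterwards should show the summarize and tabulate results, with a paused/resumed note in between, but neither generate command.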