From Excel to Stata

Below, I’m presenting tips and codes to start a Stata program (a.k.a. a DO file) based on data from an Excel spreadsheet, with procedures that you cannot find in the contextual menu of Stata.

Before using the code, you should note:

  • In Stata, columns are variables, while rows are observations;
  • I am using Stata 13: some of my manipulations might not be available in earlier versions of Stata (earlier than 10);

How is my Excel spreadsheet set up?

  1. Columns are variables, while rows are observations (like in Stata);
  2. First row: the names of my variables, best if less than 8 characters and numerated. E.g., “agestc01” as name for the variable “starting age of cohort 1
  3. Second row: the label of my variables, where I can provide a more readable description of the variable. E.g., “starting age of cohort 1” as label for the variable “agestc01

Set your working directory

You can open and import documents by searching them, browsing through folders. But the most efficient way is to have all files associated with a given project in a single folder: the program files, the Excel file, the output Stata data file, logs and any exported graph. Stata will designate this folder as your working folder for this project with the change directory command:

cd "C:\Users\Gatien\Desktop\Project1"

Simply replace the address in green with the one for your folder. Anything after this code will be imported/saved/exported to this folder.

Importing an Excel spreadsheet

import excel "name_of_Excel_file.xlsx", sheet("name_of_spreadsheet") firstrow clear

The import command is built-in, meaning that Stata will apply its own specific importation protocol once it recognizes the type of file it is importing. XLSX files are converted well. You will have to specify which spreadsheet you want to examine with Stata with sheet(“name_of_spreadsheet“). Replace the names of the Excel file and the spreadsheet with yours.

Because this can be a memory-intensive process, you should always add the command clear at the end of the line. It means that the memory used by Stata will be cleared from the process you just executed once it is completed. Think of it as regularly deleting temporary files in your browser so the computer does not slow down.

I like to get Stata to consider the first row of my spreadsheet as the name of my variables, thus adding the option firstrow. To make the second row of the Excel spreadsheet as the label of my variables in Stata, I found the following code:

foreach x of varlist * {
label var 'x' '"'='x'[1]'"' 
    }
drop in 1
foreach x of varlist * {
capture destring 'x', replace
    }
describe

The describe command at the end generates a list of the variable names and their associated label. Useful if you’d like to print out a glossary of labels.

For quick reference, you can also download the DO file (compressed).

Now you are set to using your Excel data table with the full analytical power of Stata.

Print Friendly, PDF & Email
The following two tabs change content below.

Gatien

Health economist at Johns Hopkins University

Latest posts by Gatien (see all)

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.