Programming Assignment 1
Using the two course datasets (stops and officer) or a dataset of your own choosing (instructor preapproval required), create your own script that cleans and processes the data. The datasets will be sent by email because they are too large to post in Canvas. Please turn in your Stata .do file (or the R equivalent) by Week 4, on or before the start of class at 9 AM on February 11. I will run your .do file on the raw data, and you will be evaluated on the final dataset it creates, your use of the commands discussed in class, and the organization of your code.
Sample data are located on Dropbox; see the announcement for the link. Email me if you have problems accessing them.
The script must contain the following elements:
• Best practices in cleaning: tabbing (indentation), logs, date stamps, organizational structure, etc.
• Create/use at least one temp file
• Create/use at least one macro
• Create/use at least one loop
• Define at least one program
• Execute at least one of the joins discussed in class
• Ensure all variables and values are labeled and stored in the least resource-intensive format possible
• Convert all date and time variables to Stata date/time formats
• Clean/standardize all data and variables
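As a rough illustration of how the required elements might fit together in one organized do-file, here is a minimal sketch. It is not a solution to the assignment: it assumes Stata 16 or newer, uses Stata's built-in auto dataset as a stand-in for the stops and officer data, and the program name clean_strings, the lookup table, the sale_date variable, and the output file names are all hypothetical.

```stata
* Skeleton cleaning do-file (sketch on Stata's built-in auto data)
version 16
clear all
set more off

* --- Date stamp and log ---------------------------------------------
* Builds e.g. 20220121 from today's date
local today : display %tdCCYYNNDD date(c(current_date), "DMY")
capture log close
log using "cleaning_`today'.smcl", replace

* --- Macro: name a group of variables once, reuse it below ----------
local sizevars weight length turn displacement

* --- Program: hypothetical helper that trims and lowercases strings -
capture program drop clean_strings
program define clean_strings
    quietly ds, has(type string)
    local svars `r(varlist)'
    foreach v of local svars {
        quietly replace `v' = lower(strtrim(`v'))
    }
end

sysuse auto, clear
clean_strings

* --- Loop: recode impossible (non-positive) sizes to missing --------
foreach v of local sizevars {
    quietly replace `v' = . if `v' <= 0
}

* --- Temp file + join: build a small lookup table, merge it back ----
tempfile lookup
preserve
    keep foreign
    duplicates drop
    generate str8 origin_name = cond(foreign == 1, "foreign", "domestic")
    label variable origin_name "Car origin (text)"
    save `lookup'
restore
merge m:1 foreign using `lookup', assert(match) nogenerate

* --- Dates: convert a hypothetical string date to a Stata date ------
generate str10 sale_str = "2015-01-0" + string(mod(_n, 9) + 1)
generate sale_date = date(sale_str, "YMD")
format sale_date %td
label variable sale_date "Sale date (hypothetical)"
drop sale_str

* --- Storage: smallest possible types, then save --------------------
compress
save "auto_clean_`today'.dta", replace
log close
```

The same layout (log and date stamp first, reusable pieces next, then cleaning, joins, dates, and compress last) scales to the course data; only the variable lists and file names would change.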
Matthew B. Ross, Ph.D.
Econometrics 1
Lecture 1: Intro & Stata Basics
Matthew B. Ross, Ph.D.
Economic Sciences Department
School of Social Science, Policy, & Evaluation
Claremont Graduate University
January 21, 2022
Game Plan for Today
Welcome/Intros
Review Syllabus
Review Expectations/Assignments
Today’s Lesson: Intro to Stata
Assignment 1 in more detail
Overview of Stata
Why Stata?
Really fast with matrices
Holds data in memory/RAM (an advantage for speed, a disadvantage when the data do not fit)
Designed for social scientists (the standard in the field)
Peer-reviewed journal (the Stata Journal) for user-written packages
Latest and greatest package development
Key Features of Stata
GUI vs. command line (Command window and do-files)
Local vs. server installations (various editions)
Key file types
scripts: .do
data: .dta
logs: .smcl (or plain-text .log)
graphs: .gph
Can read/write and import/export many other file types
The help command is amazing
User-written programs and .ado files (load permanently or for a single session)
Again, really fast (one dataset in memory at a time; frames, added in Stata 16, allow several)
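To make the file types above concrete, a short session might produce one of each. This is a sketch: the file names are hypothetical and are written to the current working directory, estout is just one example of a real SSC package (left commented out because installing needs internet access), and the frame lines require Stata 16 or newer.

```stata
* One of each key file type in a short session (sketch; auto data)
sysuse auto, clear

capture log close
log using "session_log.smcl", replace            // log: .smcl

save "auto_copy.dta", replace                    // data: .dta
export delimited using "auto_copy.csv", replace  // other formats via export

scatter mpg weight
graph save "mpg_weight.gph", replace             // graph: .gph

log close

* help opens the documentation for any command, e.g.:
* help import delimited

* User-written .ado packages install from SSC, e.g.:
* ssc install estout

* Frames (Stata 16+) hold a second dataset alongside the current one
frame create lookup
frame lookup: sysuse census, clear
frame lookup: describe, short
frame drop lookup
```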
Let’s briefly walk through Stata!
Syntax
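Newer versions of the Do-file Editor wrap long lines; older versions do not. Either way, long commands can be split across lines with delimiters.
Stata syntax is inconsistent and clunky; it is what it is.
Case sensitive; supports wildcards and arithmetic expressions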
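Each of the syntax points above can be seen in a few lines of a do-file. This sketch uses the built-in auto data; the variables Price, price_k, and heavy_long are hypothetical names created only for the demonstration. Note that /// and #delimit work in do-files, not in the interactive Command window.

```stata
* Syntax features in a do-file (sketch; auto data)
sysuse auto, clear

* Long commands: continue onto the next line with ///
summarize price mpg weight ///
    length turn

* ...or switch the delimiter to ; for a block, then switch back
#delimit ;
tabulate foreign
    rep78, missing;
#delimit cr

* Case sensitivity: Price and price are two different variables
generate Price = price
* (summarize PRICE would fail: no such variable)

* Wildcards: m* matches make and mpg
describe m*

* Arithmetic and logical expressions
generate price_k = price / 1000
generate heavy_long = (weight > 3000) & (length > 190)
```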
General Syntax Structure
[prefix:] command [varlist] [= expression] [if] [in] [weight] [using file] [, options]

*Comments look like this...*

/*And they also look
like this*/
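The general structure is easier to read with every piece filled in. Below is a sketch on the built-in auto data; price_per_lb, flag, and the tempfile extra are hypothetical names. Two gotchas appear in the comments: Stata treats missing values as larger than any number, so rep78 >= 3 alone would also keep missing rep78, and in cannot be combined with the by prefix.

```stata
* Each optional piece of the general syntax (sketch; auto data)
sysuse auto, clear

* [prefix:] command [varlist] [if] [weight] [, options]
* (exclude missing explicitly: missing counts as larger than 3)
bysort foreign: summarize price mpg if rep78 >= 3 & !missing(rep78) [aweight = weight], detail

* [in]: a range of observations (not allowed together with by)
list make price in 1/5

* command [varlist] [= expression] [if]
generate price_per_lb = price / weight if !missing(weight)

* [using file]: commands such as merge read a second file
tempfile extra
preserve
    keep make
    generate byte flag = 1
    save `extra'
restore
merge 1:1 make using `extra', nogenerate

* Comment styles:
* a star at the start of a line
// a double slash (do-files only)
summarize price   // or at the end of a line
/* a block comment
   spanning lines */
```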
Syntax Examples
Loading Flat Files
*Loading a CSV*
import delimited "TXDPS_2015_Stops.csv", clear ///
    stringcols(_all) delimiter(",") varnames(1)

*Loading an Excel file*
import excel "TX_TCOLE_Proficiency – Active.xlsx", ///
    sheet("Sheet1") firstrow allstring clear

use "analytical_sample" if n