INFO371 Problem Set: Diff-in-Diff Estimator
Your name:
January 15, 2024
Introduction
This week your task is (again!) to estimate the impact of the Progresa program, a government
social assistance program in Mexico, using real data. However, this time you will use the
differences-in-differences estimator.
This program, as well as the details of its impact, is described in Schultz (2004) (available on
Canvas). The data (progresa-sample.csv.gz, the same dataset as last week) is available on Canvas
in files/data. Please consult the explanations in last week's problem set for a description of the
program and variables. To put it briefly: from the beginning of 1998, the families who were considered
poor in certain villages (progresa villages) received a government subsidy provided that their children
attended school. The progresa/non-progresa villages were chosen randomly.
The goal of this problem set is to
• refresh your t-test;
• implement and use the double-differences estimator.
Your task is to estimate the impact of progresa subsidies on school attendance.
Please submit a) your code (rmd) and b) the lab in a final output form (html or pdf).
Always explain and comment on your results; do not expect the grader to be able to pick the
correct number out of many with no further explanation.
Working together is fun and useful but you have to submit your own work. Discussing the
solutions and problems with your classmates is all right but do not copy-paste their solution!
Please list the students you collaborated with.
1 Was the randomization done correctly? (40pt)
Your first task is to analyze whether randomization was performed correctly. Perfect randomization
ensures that the treatment group and the control group are similar. This is less important in
terms of observable characteristics, but very important for unobservables. Obviously, we
can only analyze the observables: are the pre-treatment (1997) demographic and village-related
characteristics for the poor equal (on average) in treatment and control villages?
1. (14pt) Present your results in a single table with the following columns and 14 (or so) rows.
The table should look something like this:
Variable name   Average (T)   Average (C)   Difference (T − C)   p-value
sex             0.519         0.505         0.014                0.012
indig                                                            0.246
dist-sec                      2.508                              0.246
You can see a very similar table in Adams-Prassl et al. (2020), Table A8, page 33 (just for
males/females, not for villages).
Suggestion: use a t-test to determine whether the difference between T and C villages is
statistically significant for each of the variables in the dataset. Focus only on the data
from 1997 for the poor. Ignore variables such as folnum and village that do not carry social
significance. A t-test can be done with t.test in R. For instance, t.test(x, y) compares two
unpaired vectors x and y, and outputs confidence intervals, the p-value, and other things.
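A minimal sketch of such a comparison is shown here. It uses simulated vectors, not the Progresa data; the names x_treat and x_control and the simulated values are assumptions made purely for illustration.

```r
# Simulated data for illustration only; this is not the Progresa sample.
set.seed(1)
x_treat   <- rnorm(200, mean = 0.52, sd = 0.5)  # hypothetical values, treatment villages
x_control <- rnorm(200, mean = 0.50, sd = 0.5)  # hypothetical values, control villages

# Unpaired two-sample t-test (t.test uses Welch's version by default)
res <- t.test(x_treat, x_control)

avg_t   <- mean(x_treat)
avg_c   <- mean(x_control)
diff_tc <- avg_t - avg_c   # difference T - C
p_val   <- res$p.value     # p-value of the test
ci      <- res$conf.int    # 95% confidence interval for the difference in means

print(c(avg_T = avg_t, avg_C = avg_c, diff = diff_tc, p = p_val))
```

The object returned by t.test is a list, so the individual pieces (p.value, conf.int, estimate) can be extracted by name and stored in a table row.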
Suggestion 2: There are many ways you can create this table, here is one suggestion you
may follow:
(a) create an empty data frame that contains the values you need: variable name, average
for T, average for C, their difference, and the p-value from the t-test. In R, you can also just
create a NULL object:
df <- NULL
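One possible way to grow such a table row by row is sketched below. The data frame dat is simulated, and the 1/0 coding of the treatment indicator progresa is an assumption made for illustration; the coding in the real dataset may differ, so adapt the filtering accordingly.

```r
# Simulated stand-in for the 1997 data on the poor (illustration only).
set.seed(2)
n <- 300
dat <- data.frame(
  progresa = rep(c(1, 0), each = n / 2),  # hypothetical treatment indicator (1 = T, 0 = C)
  sex      = rbinom(n, 1, 0.5),
  indig    = rbinom(n, 1, 0.3)
)

vars <- c("sex", "indig")  # variables to compare
df <- NULL                 # start from a NULL object and add one row per variable
for (v in vars) {
  xt <- dat[dat$progresa == 1, v]  # values in treatment villages
  xc <- dat[dat$progresa == 0, v]  # values in control villages
  row <- data.frame(
    variable   = v,
    avg_T      = mean(xt, na.rm = TRUE),
    avg_C      = mean(xc, na.rm = TRUE),
    difference = mean(xt, na.rm = TRUE) - mean(xc, na.rm = TRUE),
    p_value    = t.test(xt, xc)$p.value
  )
  df <- rbind(df, row)     # rbind(NULL, row) simply returns row
}
print(df)
```

Starting from NULL avoids having to declare column types in advance; for 14 or so rows the repeated rbind calls are cheap enough.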
