INFO371 Problem Set: Diff-in-Diff Estimator
Your name:
January 15, 2024
Introduction
This week your task is (again!) to estimate the impact of the Progresa program, a government
social assistance program in Mexico, using real data. However, this time you will use the differences-in-differences estimator.
This program, as well as the details of its impact, is described in Schultz (2004) (available on
Canvas). The data (progresa-sample.csv.gz, the same dataset as last week) is available on Canvas
in files/data. Please consult the explanations in last week's problem set for a description of the
program and variables. To put it briefly: from the beginning of 1998, families who were considered
poor in certain villages (progresa villages) received a government subsidy provided their children attended
school. The progresa/non-progresa villages were chosen randomly.
The goal of this problem set is to
• refresh your t-test;
• implement and use the double-differences estimator.
Your task is to estimate the impact of progresa subsidies on school attendance.
Please submit a) your code (rmd) and b) the lab in a final output form (html or pdf).
Always explain and comment on your results; do not expect the grader to be able to pick the
correct number out of many with no further explanation.
Working together is fun and useful but you have to submit your own work. Discussing the
solutions and problems with your classmates is all right but do not copy-paste their solution!
Please list the students you collaborated with.
1 Was the randomization done correctly? (40pt)
Your first task is to analyze whether randomization was performed correctly. Perfect randomization
ensures that the treatment group and the control group are similar. This is less important in
terms of observable characteristics, but very important for unobservables. Obviously, we
can only analyze the observables: are the pre-treatment (1997) demographic and village-related
characteristics for the poor equal (on average) in treatment and control villages?
1. (14pt) Present your results in a single table with the following columns and 14 (or so) rows.
The table should look something like this:
Variable name   Average (T)   Average (C)   Difference (T − C)   p-value
sex             0.519         0.505         0.014                0.012
indig           …             …             …                    0.246
dist-sec        …             2.508         …                    0.246
…
You can see a very similar table in Adams-Prassl et al. (2020), Table A8, page 33 (just for
males/females, not for villages).
Suggestion: use a t-test to determine whether the difference between T and C villages is
statistically significant for each of the variables in the dataset. Focus only on the data
from 1997 for the poor. Ignore variables such as folnum and village that do not carry social
significance. The t-test can be done with t.test in R. For instance, t.test(x, y) compares two
unpaired vectors x and y, and outputs confidence intervals, the p-value, and other things.
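As a minimal sketch of how the pieces of a t.test result can be pulled out, the example below runs the test on two synthetic vectors. The vectors x and y are made-up stand-ins for one variable in treatment and control villages, not Progresa data.

```r
# Synthetic stand-ins for one variable in treatment (x) and control (y)
# villages; these are made-up numbers, not Progresa data.
set.seed(1)
x <- rnorm(200, mean = 0.52, sd = 0.5)
y <- rnorm(180, mean = 0.50, sd = 0.5)

# Two-sample, unpaired t-test; by default R does not assume equal
# variances (Welch test).
tt <- t.test(x, y)

group_means <- tt$estimate                        # means of x and of y
difference  <- unname(group_means[1] - group_means[2])
p_value     <- tt$p.value                         # H0: the two means are equal
conf_int    <- tt$conf.int                        # 95% CI for the difference

print(group_means)
print(difference)
print(p_value)
print(conf_int)
```

The object returned by t.test is a list, so each number needed for a row of the table can be extracted by name instead of being copied from the printed output.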
Suggestion 2: There are many ways you can create this table, here is one suggestion you
may follow:
(a) create an empty data frame that will hold the values you need: variable name, average
for T, average for C, their difference, and the p-value from the t-test. In R, you can also just
create a NULL-object:
df
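One possible way to follow this suggestion is sketched below: start from a NULL object and append one row per variable with rbind. The data frame toy and its column names and codings (year, poor, treated, sex, indig) are hypothetical and only illustrate the pattern; the real progresa-sample.csv.gz may name and code these variables differently.

```r
# Toy data with hypothetical column names and codings; the real
# progresa-sample.csv.gz may name and code these differently.
set.seed(2)
n <- 400
toy <- data.frame(
  year    = rep(c(97, 98), each = n / 2),
  poor    = sample(c(TRUE, FALSE), n, replace = TRUE),
  treated = sample(c(TRUE, FALSE), n, replace = TRUE),
  sex     = rbinom(n, 1, 0.5),
  indig   = rbinom(n, 1, 0.3)
)

# Keep only the pre-treatment year and the poor
base <- toy[toy$year == 97 & toy$poor, ]

balance <- NULL                       # start from a NULL object
for (v in c("sex", "indig")) {        # variables to compare
  xT <- base[[v]][base$treated]       # values in treatment villages
  xC <- base[[v]][!base$treated]      # values in control villages
  tt <- t.test(xT, xC)                # unpaired two-sample t-test
  # Append one row per variable; rbind(NULL, row) simply returns row
  balance <- rbind(balance, data.frame(
    variable   = v,
    avgT       = mean(xT, na.rm = TRUE),
    avgC       = mean(xC, na.rm = TRUE),
    difference = mean(xT, na.rm = TRUE) - mean(xC, na.rm = TRUE),
    pvalue     = tt$p.value
  ))
}
print(balance)
```

Growing a data frame with rbind inside a loop is slow for large tables, but for the 14 or so rows needed here it keeps the code short and easy to follow.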
