EBS NBA Player Dataset Analysis Presentation

I need to create a PowerPoint presentation, nothing too fancy, that covers all the points in the guideline about a causal analysis using Stata (we can agree on which platform to use).


Guidelines

Final Project

Here are the instructions of the project. Below you can find a description of the datasets.

Create a presentation (PowerPoint slides) that should contain the following sections:


a) Motivation (Explain what made you conduct the study.)

b) Research question (What is the causal relationship you are interested in?)

c) Describing the empirical analysis (Description of the causal analysis you want to use. Should include the benchmark OLS regression.)

d) Data and main variables (Brief description of the dataset and the variables that you are using.)

e) Bias analysis (Discuss problems that could potentially bias the previous regression.)

f) Solution to the bias (Run an alternative regression using methods to fix the bias, e.g., by adding control variables or by using an instrumental variable. Discuss whether the results change.)

g) Results (Detailed discussion of the findings of the analysis)

h) Conclusion (Concluding remarks and potential shortcomings.)
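For section c), the benchmark OLS regression can be sketched in Stata along the lines below. This is a minimal sketch; the variable names (earnings, exper) are hypothetical placeholders, so substitute the actual variables from the dataset you choose.

* hypothetical benchmark OLS: effect of experience on NBA earnings
use nba, clear
* earnings and exper are placeholder names; check the codebook with: describe
reg earnings exper, vce(robust)

The coefficient on the explanatory variable is the benchmark estimate of the causal effect; sections e) and f) then ask whether that estimate is biased and how to fix it.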

You must submit the projects in a folder that contains:

a) The presentation slides

b) The do file (it can be in Stata or R) containing the code to create the tables and the figures, and,

c) The data file used for the analysis.

The following steps will help you to understand how to approach the project.

A) Select one of the datasets:

1) productivity.dta

2) fintech.dta

3) fintech-panel.dta

4) work_from_home.dta

5) airfair.dta

6) apple.dta

7) salaries.dta

8) crime.dta

9) fertility.dta

10) hprice.dta

11) children.dta

12) nba.dta

13) patent.dta

14) Optional: Your own dataset

B) Explain a causal effect you want to test with the data

C) Select variables from the dataset and run a regression to test the causal effect you want to study.

D) Discuss problems that could potentially bias the previous regression and suggest the means to overcome the bias.

E) Run an alternative regression using methods to fix the bias (if the data are available). Discuss whether the results change.
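Steps D) and E) can be sketched in Stata as follows. This is a minimal sketch with hypothetical variable names (y, x, control1, control2, z); the instrument z must be relevant and exogenous for the IV step to be valid.

* benchmark regression (y, x are placeholder names)
reg y x, vce(robust)
* alternative 1: add control variables to reduce omitted-variable bias
reg y x control1 control2, vce(robust)
* alternative 2: two-stage least squares, instrumenting x with z
ivregress 2sls y control1 control2 (x = z), vce(robust)
* check instrument relevance in the first stage
estat firststage

Compare the coefficient on x across the three specifications; if it moves substantially once controls or the instrument are added, the benchmark OLS estimate was likely biased.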

Description of datasets

1) Productivity: Data of employees’ performance. The data come from many companies in different countries.

2) Fintech: Data of household characteristics, property ownership and debt. The data come from the Spanish Household Finance Surveys of 5 different years.

3) Fintech-panel: Data of household characteristics, property ownership and debt, from the Spanish Household Finance Surveys. Panel data of the same households from 3 different years.

4) Work from home: Data of employees’ demographics and performance from one company. The data come from a six-month study of employees working from home versus employees working from the office.

5) Airfair: Panel data on airline flights.

6) Apple: Data on the quantity of ecologically friendly apples desired, from a survey of individuals.

7) Salaries: Salaries and proximity to college data

8) Crime: data on county level crime rates

9) Fertility: Data on number of living children, education, and demographic informationof a sample of women from Botswana

10) Hprice: Data on housing characteristics and prices

11) Children: Cross-sectional data on the number of children born and the mother’s work history and demographics

12) NBA: Data on earnings, position played and demographics of a sample of NBA players

13) Patent: Panel data on the number of patents sought and obtained by a sample of firms along with some firm-specific information.

* STATA Basics: Part 1: Basic Codes
* creating a log file to store output
*log using stata.output.txt, text replace
clear all
set more off
* change directory to folder with data files
mkdir /Users/rajdeep/Downloads/stata_TA
cd /Users/rajdeep/Downloads/stata_TA
*dir
* importing data from the internet
*use http://blablabla.com, clear
* Importing .csv files
*insheet using alcohol.csv, clear
*importing excel files
import excel "/Users/rajdeep/Downloads/Bond_spreads.xlsx", sheet("Sheet1") firstrow clear
*reading data file (.dta)
*use alcohol, clear
* to view the dataset
list
* to view some of the variables
list irma_exp harvey_exp hurricane_exp house_prices
list irma_exp harvey_exp hurricane_exp house_prices in 1/50
* summarizing the data
describe
summarize
summarize irma_exp harvey_exp
summarize hurricane_exp house_prices, detail
summarize if house_prices>=203000
summarize if harvey_exp==0.02
* summarizing the data by group
tab RiskinessGroup
bysort RiskinessGroup: summarize irma_exp harvey_exp hurricane_exp house_prices
tabstat irma_exp harvey_exp hurricane_exp house_prices, by(RiskinessGroup) stat(n mean sd)
* correlations
pwcorr hurricane_exp house_prices, sig
* Modifying the data
order RiskinessGroup
label variable RiskinessGroup "Risk"
rename house_prices housing_prices
gen score= irma_exp*RiskinessGroup
gen score2=score^2
gen Risk=1 if housing_prices>= 1000
replace Risk=0 if housing_prices==.
drop if score2>1
* Identifying duplicates based on all the variables
* unab fatality : _all
* sort `fatality'
* quietly by `fatality': gen dup = cond(_N==1,0,_n)
* dropping duplicates based on all the variables
* drop if dup>0
* dropping duplicates based on one variable (state)
* sort state
* quietly by state: gen dup1 = cond(_N==1,0,_n)
* drop if dup1>0
* Let's ask the question: Can we reduce the vehicle fatality rate by
* increasing the tax on beer?
reg mrall beertax, vce(r)
* Seems like we can. But our results might be misleading. First let's run a
* similar regression only for the year 1988 and only if the beer tax is below $1
reg mrall beertax if year==1988 & beertax<1, vce(r)
* Difference in differences (DID)
* Create a dummy variable to identify the post-treatment period. In this
* example, years from 1986 onward are in the post period (=1).
gen time = (year>=1986) & !missing(year)
* Create a dummy variable to identify the group exposed to the treatment. In
* this example let's assume that states with codes above 4 were treated
* (=1). States 1-4 were not treated (=0).
gen treated = (state>4) & !missing(state)
* Create an interaction between time and treated. We will call this
* interaction 'did'
gen did = time*treated
* Estimating the DID estimator
reg mrall time treated did, vce(r)
*###############################################################################
* The coefficient for 'did' is the differences-in-differences estimator.
* The effect is significant at 10% with the treatment having a negative effect.
*###############################################################################
*########################################################
* Another approach
*########################################################
* Estimating the DID estimator (using the hashtag method, no need to generate
* the interaction)
reg mrall time##treated, vce(r)
* The coefficient for '1.time#1.treated' is the differences-in-differences
* estimator ('did' in the previous example).
*########################################################
* Difference in differences (DID) using the command "diff"
*########################################################
*ssc install diff
diff mrall, t(treated) p(time)
* to check for more detailed information
return li
* Type "help diff" for more details/options
* Difference in differences Graph
graph twoway (scatter mrall year if state==24 & did==0) ///
    (lfit mrall year if state==24 & did==0) ///
    (scatter mrall year if state==24 & did==1) ///
    (lfit mrall year if state==24 & did==1), xline(1985)
* to check for the confidence interval between the plots
* declare the panel structure before using xtreg
xtset state year
xtreg mrall time##treated, fe
margins time, at(treated=(0 1)) noestimcheck
marginsplot, xdimension(time)
* STATA Basics: Part 2: Graphs
* Histogram
histogram house_prices
* Box plots
graph box house_prices
*Stem Plots
stem house_prices
*stem full
*scatter matrix
graph matrix irma_exp harvey_exp hurricane_exp in 1/50, half
* Scatter plots
scatter house_prices irma_exp in 1/50
* Scatter plot with fitted line
twoway (scatter house_prices irma_exp in 1/50) (lfit house_prices irma_exp)
* how to create a dotted line
scatter house_prices irma_exp in 1/100, connect(l) sort lcolor(red) lpattern(dot) title("My line")
* To obtain a local linear smooth of yvar on xvar
twoway (scatter house_prices irma_exp in 1/200) (lfit house_prices irma_exp) ///
    (lowess house_prices irma_exp in 1/200)
*to identify possible outliers
twoway (scatter house_prices irma_exp in 1/50, mlabel(CUSIP)) ///
    (lfit house_prices irma_exp)
*to check for the normality of the predictors
hist irma_exp
*histogram irma_exp in 1/50
*histogram with normal curve
hist irma_exp, normal bin(20)
* we use the xlabel() option to label the x-axis from 0 to 0.07 in increments
* of 0.01; similarly ylabel() labels the y-axis from 0 to 50 in increments of 5
histogram irma_exp, normal bin(20) xlabel(0(.01).07) ylabel(0(5)50)
*###########################################
* Plots with Confidence Intervals
*###########################################
* two way plot with Confidence intervals
twoway lfitci mrall beertax
* two way plot with Confidence intervals
twoway lfitci mrall beertax, level(99)
* two way plot with scatter plot. “stdf” is used to obtain a confidence interval
*based on the standard error of the forecast rather than the standard error of
* the mean. This is more useful for identifying outliers.
twoway lfitci mrall beertax, stdf || scatter mrall beertax
* with a confidence interval of your choice
twoway lfitci mrall beertax, level(99) stdf || scatter mrall beertax
* If you don’t want the shaded area
twoway lfitci mrall beertax, ciplot(rline)
* If you don’t want the shaded area but with a confidence interval of your choice
twoway lfitci mrall beertax, level(99) ciplot(rline)
* if you want to classify the relationship between two variables over different
* categories along with the overall relationship
twoway lfitci mrall beertax, stdf || scatter mrall beertax, by(year, total row(3))
* if you want to classify the relationship between two variables over different
* categories along with the overall relationship, but with a confidence
* interval of your choice
twoway lfitci mrall beertax, level(90) stdf || scatter mrall beertax, ///
    by(year, total row(3))
