The first file is the assignment file; the remaining four are data files.
World_Bank_2011-HW
The 2011 World Bank Teaching Data Set
The World Bank World Development Indicators (WDI) have been produced annually since 1960. The dataset we are using in IQM (‘World_Bank_2011IQMAssignment.sav’) has been prepared using just 15 of the indicators for 2011 (a small selection from the more than 600 social, economic, financial, natural resources and environmental indicators in the full database).
They can also be accessed directly from the World Bank’s web pages (see
http://data.worldbank.org/indicator/all)
Cases
A sample of 150 countries makes up the cases of this dataset.
The Variables
The dataset includes 16 variables: 15 national development indicators prepared by the World Bank (interval level) and one variable indicating ‘country’ (categorical). The ‘country’ variable is included for information but should not be used in the analysis for the IQM assignment (only interval level variables should be used for Part 2). The names and labels of all 16 variables are listed below.
List of Variables in the World_Bank_2011 Teaching dataset
No.  VARIABLE NAME                  VARIABLE LABEL
1    CountryGroup                   Country
2    Fertility_rate                 Fertility rate: total (births per woman)
3    GNI_per_capita                 GNI per capita: Atlas method (current US$)
4    GNI_per_capita_growth          GNI per capita growth (annual %)
5    Life_expectancy                Life expectancy at birth: total (years)
6    Infant_Mortality               Mortality rate: infant (per 1000 live births)
7    Population_growth              Population growth (annual %)
8    Primary_Gender_Ratio           Ratio of female to male primary enrollment (%)
9    Urban_population               Urban population (% of total)
10   Female_Labor_Force             Labor force: female (% of total labor force)
11   Health_expenditure_per_capita  Health expenditure per capita (current US$)
12   Armed_forces                   Armed forces personnel (% of total labor force)
13   ATMs_per_100000                Automated teller machines (ATMs) (per 100000 adults)
14   Exports                        Exports of goods and services (% of GDP)
15   Female_senior_officials        Female legislators, senior officials and managers (% of total)
16   Mobiles_per_100                Mobile cellular subscriptions (per 100 people)
Four important notes about the dataset…
1. Variable definitions
While the variable labels give a short summary of what each indicator measures, fuller definitions are available at
http://data.worldbank.org/indicator/all
2. Missing Cases
Nearly all the indicators in the World Bank dataset include some missing cases (where countries failed to provide the relevant data). For most of the indicators included in the IQM dataset fewer than 20 cases are missing, but the number is considerably higher for a few of the variables.
3. Weighting
No weighting is required for this dataset.
4. The Data Licence
As part of an open data initiative the World Bank datasets are freely available to all (there is no need to register with the UK Data Service to access this data).
SPSS v20-HW
The assignment is in two parts, each contributing 50% towards the total mark. Part 1 is based on analysis of the British Social Attitudes Survey (2011) and focuses on the analysis of categorical variables, using cross tabulation. Part 2 uses a dataset compiled from World Bank indicators (2011) and focuses on the analysis of interval level variables using correlation and simple regression. The overall word limit (for Parts 1 and 2 combined) is 3,000 words (with a leeway of +/- 10%, not counting tables, graphs or the bibliography).
– Please include the word count on the front cover of your work.
PART 1: An Analysis of the British Social Attitudes Survey (2011)
The 2011 BSA teaching dataset (‘BSA2011IQMassignment.sav’) is very similar to the 2009 BSA dataset you used in the practical classes. It includes a number of variables relating to attitudes across a range of topics. For this part of the assignment you will select ONE of these variables as your dependent (outcome) variable, together with a selection of explanatory variables, to carry out an analysis exploring variation in your dependent variable among the British population in 2011.
N.B. PLEASE ENSURE YOU USE THE 2011 VERSION OF THE BSA, NOT THE 2009 VERSION USED IN PRACTICAL CLASSES.
Detailed Instructions
Step 1. Define the analysis
Take time to study the variables in the 2011 BSA dataset (‘BSA2011IQMassignment.sav’). The data includes variables measuring attitudes across a range of topics, from climate change to views on the welfare state (see the handout ‘About the British Social Attitudes Survey 2011 IQM Teaching Data Set’ for a full list of variables). Select any ONE of these that interests you to use as the dependent variable.
Your research objective is to carry out a secondary analysis using TWO main explanatory variables of your choice (and ONE control variable) to investigate variation in your dependent variable. Note that all variables used in this part of the assignment should be categorical variables.
It is helpful to specify this as a clear research question or research objective which will guide your analysis. This might be defined in fairly broad terms, as befits an exploratory analysis: e.g. if you were interested in attitudes to capital punishment, your objective could be ‘an investigation of factors associated with support for capital punishment’. Alternatively, you might adopt a more focused question, perhaps designed to test some theory or hypothesis from the literature, e.g. ‘that support for capital punishment varies by social class’. (Note that attitudes to capital punishment are not in this dataset.)
For this assignment you should limit yourself to just TWO main explanatory variables (one additional variable may be used as a control variable; see stage 4 of the analysis below). In selecting your explanatory variables for the analysis it is important that you provide clear reasoning for their choice (with some reference to relevant literature).
Step 2. Carry out the Analysis
N.B. For all your analysis of the 2011 BSA, remember to apply the weighting variable to correct for sampling selection bias and non-response (as covered in practical 3).
1. Use appropriate methods to describe the univariate distributions of your dependent and explanatory variables.
2. For your dependent variable, calculate 95% confidence intervals around one of the percentage estimates (e.g. ‘% in favour of capital punishment’), using the method covered in lecture 5.
3. Explore the relationship between the dependent variable and each of your two explanatory variables separately with crosstabs (carrying out any variable recoding as appropriate*), and include a Chi-square and Cramér's V test of association for each table.
4. Finally, generate ONE three-way table where you take one of the crosstabs already produced in stage 3 and introduce a control variable (again include a Chi-square and Cramér's V test). For your control variable you may choose either a variable already used in stage 3 or a new variable from the dataset (you are advised to ensure the variable you use as a control has only a few categories, to keep your three-way table to a manageable size).
*Remember that using variables with many answer categories in crosstabs may result in tables that are hard to read and interpret (you may also find it leads to many cells of your table being empty or containing very few cases). In such situations you should consider recoding your variables to reduce the number of categories. Many of the variables in this dataset have already been recoded in this way (e.g. the variables for marital status, political party identification and newspaper readership are all condensed versions of the originals).
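The confidence interval asked for in item 2 can be illustrated with a short calculation. This is a minimal sketch in Python rather than SPSS, assuming the common normal-approximation formula p ± 1.96 × sqrt(p(1 − p)/n); the method covered in lecture 5 may differ in detail, and the counts below are invented purely for illustration.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Return (p, lower, upper): a normal-approximation CI for a proportion.

    z=1.96 corresponds to a 95% confidence level.
    """
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p, p - z * standard_error, p + z * standard_error

# Hypothetical figures (not from the BSA): 540 of 1,000 valid respondents agree.
p, lower, upper = proportion_ci(540, 1000)
print(f"Estimate {p:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
```

Showing the standard error and the multiplier separately, as here, is one way of "showing your working" when the interval is reported.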
Step 3. Write up the analysis
You should write up Part 1 of the assignment using the following report structure (please note the mark scheme and tips on reporting provided at the end of the document).
Introduction
Provide a concise introduction. This should include some background to the topic under study (with some reference to relevant literature/sources) and a statement of the research question/research objective that you will be investigating with the analysis.
You should also introduce, and give key details about, the dataset you will be using (2011 BSA).
The Analysis
Describe and justify the selection of variables used in the analysis (and describe any use of recoding). Briefly comment on any missing data if applicable.
Report your frequency tables and the confidence interval you calculated.
Present your crosstabulations (including statistical tests) with a clear reporting and interpretation of what they show*.
Conclude with a brief summary of your main findings and their limitations.
*Graphs (e.g. stacked or clustered bar charts) may be included in your report to accompany your tables but this is NOT a requirement for Part 1 of the assignment (well presented tables are sufficient).
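To make clear what the Chi-square and Cramér's V statistics reported with each crosstab measure, the sketch below computes both by hand for a small two-way table. It is written in Python as an illustration only (SPSS produces these figures, along with the p-value, directly), and the cell counts are hypothetical.

```python
import math

def chi_square_and_cramers_v(table):
    """Return (chi2, df, v) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    smaller_dimension = min(len(table), len(table[0]))
    v = math.sqrt(chi2 / (n * (smaller_dimension - 1)))
    return chi2, df, v

# Hypothetical 2x2 crosstab (rows: explanatory categories, columns: outcome).
observed_counts = [[30, 20],
                   [20, 30]]
chi2, df, v = chi_square_and_cramers_v(observed_counts)
print(f"Chi-square = {chi2:.2f}, df = {df}, Cramér's V = {v:.2f}")
```

Chi-square indicates whether an association is likely to exist in the population, while Cramér's V (ranging from 0 to 1) indicates how strong it is, so the two should be interpreted together.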
PART 2: An analysis of national development indicators (World Bank)
Countries rather than individuals make up the cases in the dataset used for Part 2 of the assignment. The dataset is based on World Bank development indicators (interval level variables) for a sample of 150 countries in 2011 (the SPSS data file ‘World_Bank_2011IQMAssignment.sav’ is available on Blackboard). You should select ONE of these interval variables as your dependent variable, and analyse the relationship between this variable and TWO ‘explanatory’ variables of your choice.
Step 1. Define the analysis
Take time to study the variables in the World Bank dataset (see the document ‘About the World Bank 2011 Teaching Data Set’ on Blackboard for a full list). Select ONE of these of interest to you as the dependent variable (note this must be interval level so do not choose ‘Country’).
Your research objective is to investigate the strength of the relationship between your chosen dependent variable and TWO other interval level explanatory variables in the data set (note again, this must be interval level so do not choose ‘Country’).
As in your analysis for Part 1, it is helpful to specify a clear research question or research objective which will guide your analysis. And in selecting your explanatory variables it is again important that you provide a clear reasoning for their choice (with some reference to literature).
Step 2. Carry out the Analysis
1. Use appropriate methods to describe the univariate distributions of your dependent and explanatory variables. Briefly comment on any missing data if applicable.
2. For your dependent variable, calculate 95% confidence intervals around the mean value.
3. Produce two scatterplots to show the relationship between your dependent variable and each of your explanatory variables.
4. Run a correlation analysis to show the relationship between your dependent variable and each of your two explanatory variables.
5. Finally, run a simple linear regression to model the relationship between your dependent variable and ONE of your explanatory variables (you should use the explanatory variable which showed the strongest correlation with your dependent variable in step 4).
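Steps 2 and 4 above can be sketched numerically as follows. This is an illustrative Python sketch, not SPSS output: it assumes a normal-approximation interval (mean ± 1.96 standard errors), whereas SPSS uses the t distribution, which gives a slightly wider interval in small samples. It also assumes missing values are stored as None and dropped pairwise for the correlation. All data values are invented.

```python
import math
import statistics

def mean_ci(values, z=1.96):
    """Return (mean, lower, upper) for the valid (non-missing) values."""
    valid = [v for v in values if v is not None]
    mean = statistics.mean(valid)
    standard_error = statistics.stdev(valid) / math.sqrt(len(valid))
    return mean, mean - z * standard_error, mean + z * standard_error

def pearson_r(xs, ys):
    """Pearson correlation, using only cases valid on both variables."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    valid_x = [x for x, _ in pairs]
    valid_y = [y for _, y in pairs]
    mean_x = statistics.mean(valid_x)
    mean_y = statistics.mean(valid_y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in valid_x)
    syy = sum((y - mean_y) ** 2 for y in valid_y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for six countries; one fertility figure is missing.
fertility = [1.5, 2.0, 2.5, 4.0, 5.0, None]
life_expectancy = [80, 76, 72, 62, 55, 70]

mean, lower, upper = mean_ci(life_expectancy)
r = pearson_r(fertility, life_expectancy)
print(f"Mean = {mean:.1f} (95% CI {lower:.1f} to {upper:.1f}), r = {r:.3f}")
```

Note that the number of valid cases can differ between the two calculations, which is one reason to comment on missing data when reporting results.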
Step 3. Write up the analysis
You should write up Part 2 of the assignment using the following report structure (please note the mark scheme and tips on reporting provided below).
Introduction
This should include a statement of the research question / objective that will guide your analysis, providing some brief background on the topic with some reference to literature.
You should also introduce, and give key details about, the dataset you are using.
The Analysis
Describe and justify the selection of variables used in the analysis
Present your results (tables, graphs and statistical tests) with a clear reporting and interpretation of what they show.
Conclude with a brief summary of your main findings and the limitations of your results.
Overall the examiners will be looking for reports that demonstrate:
(1) An understanding of techniques taught on the course, and the ability to apply them to a specific research brief
(2) The ability to report results clearly and concisely and to interpret their meaning and relate them to the research question.
(3) The ability to write up a secondary analysis project with a report structure and writing style that is clear and logical.
General tips for Report Writing:
Reports benefit from a clear structure and sensible use of numbered sub-headings will help achieve this.
Avoid simply pasting in tables and graphs without accompanying comment. Readers will want you to draw out the important elements of your output and to explain your results.
Good formulation and presentation of tables and graphs is important. A cut and paste table that goes off the end of the page is a bad table! (note you can edit tables in SPSS). Avoid overcomplex tables with too many rows and columns (resulting in lots of empty or near empty cells) – sensible use of recoding may help. Tables presented with only the raw counts are of little value – inclusion of appropriate row or column percentages will help you and the reader interpret the table.
Tables and figures included in the report should be numbered and given clear titles, and should always be referred to in the text.
Always aim to support your analysis with intelligent discussion. So for relationships you are reporting, can you explain the relationship? Is it what you expected? Do you think it’s real, or might it be due to another unexplored variable?
Focus your discussion and analysis on the research question being considered – does it support existing research/theory/your own hypotheses?
Specific tips on reporting the analysis for this assignment
In reporting your choice of explanatory variables (for Part 1 and 2) try and give properly reasoned justifications, supported where possible with reference to literature or other sources.
When reporting confidence intervals ensure you show your working and provide an explanation of how the confidence intervals should be interpreted.
Are the relationships shown in your crosstab tables statistically significant? Remember to include the results of Chi-square and Cramér's V tests in your report. These should be referred to and clearly interpreted (avoid just pasting in a table of results of statistical tests without comment or interpretation).
Good scatterplots have an appropriate scale on the axes and have the dependent variable on the y-axis.
When interpreting correlations, have you considered both the direction (positive or negative) and the strength of the relationship, and whether it is statistically significant?
For the regression modelling consider: What will a regression model tell you? Does your explanatory variable have a statistically significant effect on the response variable? What is the effect on the response variable of a unit increase in the explanatory variable? How good is the model? How much of the variation in your dependent variable is explained by the model?
In reporting simple regression, ensure you indicate the B values, the R² and the significance of the Bs. You should also express the model as an equation, and provide a worked example.
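The regression quantities mentioned in the tips above (the B values, R², the model equation and a worked example) can be illustrated with a least-squares fit computed by hand. This is a hedged Python sketch using invented data, not SPSS output; it does not compute the significance of the Bs, which SPSS reports in its coefficients table.

```python
def simple_regression(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                    # slope: change in y per unit increase in x
    b0 = mean_y - b1 * mean_x         # intercept: predicted y when x is 0
    ss_residual = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_residual / ss_total  # share of variation explained
    return b0, b1, r_squared

# Hypothetical values for five countries (not taken from the dataset).
fertility = [1.5, 2.0, 2.5, 4.0, 5.0]
life_expectancy = [80, 76, 72, 62, 55]

b0, b1, r_squared = simple_regression(fertility, life_expectancy)
print(f"life_expectancy = {b0:.2f} + ({b1:.2f} x fertility), R² = {r_squared:.3f}")

# Worked example: predicted life expectancy for a fertility rate of 3.0.
predicted = b0 + b1 * 3.0
print(f"Predicted life expectancy at fertility 3.0: {predicted:.1f} years")
```

Substituting a specific value into the fitted equation, as in the last two lines, is the kind of worked example the tip asks for.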
Output1-BSA 2011.spv
Output Log
GET
  FILE='C:\Users\Talal\Downloads\BSA2011IQMassignment.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
Output-World_Bank_2011.spv
Output Log
GET
  FILE='C:\Users\Talal\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.IE5\5NLKAYXK\World_Bank_2011IQMAssignment.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
2011 British Social Attitudes Survey-HW
About the 2011 British Social Attitudes Survey
(IQM Teaching Data Set)
(see also the official ‘British Social Attitudes 2011 User Guide’ available on Blackboard site)
‘The British Social Attitudes (BSA) survey series, which began in 1983, is designed to produce annual measures of attitudinal movements to complement information gathered from a) large-scale government surveys that deal largely with facts and behaviour patterns, and b) party political attitudes data produced by the polls. One of the main purposes of the BSA is to monitor patterns of continuity and change, and examine the relative rates at which social attitudes change over time. The BSA 2011 questionnaires included modules covering: attitudes to social welfare, education, health, transport, housing and politics.’
(taken from
http://discover.ukdataservice.ac.uk/catalogue/?sn=7237&type=Data%20catalogue )
About the 2011 BSA (IQM) dataset
Please note the dataset we are using, ‘BSA2011IQMAssignment.sav’, has been specially prepared for the IQM course and includes only a small selection of the variables available from the full 2011 BSA survey. All respondents from the survey are included (3,311 adults aged 18 and over).
The Variables
The dataset includes 57 variables (the full BSA 2011 has around 850) in addition to the case identifier. The names and labels of these variables, together with the case identifier, are listed below.
List of Variables included in the BSA 2011 (IQM assessment teaching dataset)
No.  VARIABLE NAME  VARIABLE LABEL
1    Case_ID        Case ID
2    GOR2           Government office region 2003 version :Q18
3    WtFactor       Final BSA weight :Q25
4    ABCVer         questionnaire version :Q30
5    Country        England, Scotland or Wales? :Q31
6    Househld       Number living in household, including respondent :Q47
7    RSex           Person 1 SEX :Q49
8    RAge           Person 1 age last birthday :Q50
9    RAgeCat2       Age of respondent(grouped)<6 category> dv :Q140
10   Married        Marital status <4 categories> dv :Q146
11   ChildHh        Whether respondent has any children in household: dv :Q151
12   HhType         Household type dv :Q163
13   paper2         Newspaper readership grouped (broadsheet & tabloid) – dv
14   PartyId2       Party political id(compressed) dv :Q250
15   Spend1         1st priority for extra Govt spending :Q262
16   HIncDif4       Which of the phrases on this card comes closest to your feelings about your households income these days? :Q274
17   Dole           R's view of the level of benefits for unemployed people. :Q270
18   TaxSpend       If govt had to choose which should choose
19   NHSSat         how satisfied or dissatisfied would you say you are with the way in which the NHS runs nowadays :Q357
20   TrfConc2       how concerned are you about the effect of transport on climate change :Q394
21   CarNum         How many, if any, cars or vans does your household own or have the regular use of? :Q398
22   CliCar         Does The current level of car use have a serious effect on climate change :Q421
23   CliPlane       Does The current level of air travel have a serious effect on climate change :Q422
24   Tenure2        Housing tenure
25   REconSum       Resp economic activity
26   RNSocCl        Respondent : social class[pre-SOC2000]best estimate dv :Q733
27   RClassGp       NS-SEC analytic classes [resp]
28   SRPrej         self rated level of racial prejudice :Q917
29   ResPres        Can I just check, would you describe the place where you live as :Q924
30   ReligSum       Religion of respondent
31   RlFamSum       what religion were you brought up in
32   RaceOri3       To which of these ethnic/racial groups do you consider you belong? :Q973
33   DisNew2        Do you have a long-standing physical or mental health condition or disability? :Q982
34   HEdQual        Highest educational qual obtained – dv :Q1058
35   HEdQual2       Highest educational qual obtained (postgrad separate) – dv :Q1059
36   HHIncQ         Household pre-tax income quartiles (dv) :Q1206
37   REarnQ         Respondent pre-tax earnings quartiles (dv) :Q1209
38   MigrStop       Migration to Britain should be stopped, even if hurts British economy A2.41bB2.10bC2.9b.
39   WelfFeet       If welfare benefits weren’t so generous, people would learn to stand on own feet A2.47fB2.24fC2.23f.
40   RUHappy2       Consider your life in general these days how happy or unhappy you are A2.1.
41   PrejNow        more racial prejudice in Britain now than there was 5 years ago, less, or about the same amount? :Q914
42   WelfHelp       The welfare state encourages people to stop helping each other A2.47aB2.24aC2.23a.
43   MoreWelf       The government should spend more money on welfare benefits for the poor A2.47bB2.24bC2.23b.
44   UnempJob       Around here, most unemployed people could find a job if they really wanted one A2.47cB2.24cC2.23c.
45   SocHelp        Many people who get social security don’t really deserve any help A2.47dB2.24dC2.23d.
46   DoleFidl       Most people on the dole are fiddling in one way or another A2.47eB2.24eC2.23e.
47   DamLives       Cutting benefits would damage too many people’s lives A2.47gB2.24gC2.23g.
48   ProudWlf       The creation of the welfare state is one of Britain’s proudest achievements A2.47hB2.24hC2.23h.
49   SpeCamMo       Speed cameras are mostly there to make money A2.B2.C2.5b.
50   BigBusnN       Big business benefits owners at the expense of workers A2.48bB2.25bC2.24b.
51   Wealth         Ordinary working people do not get their fair share of the nation’s wealth A2.48cB2.25cC2.24c.
52   RichLaw        There is one law for the rich and one for the poor A2.48dB2.25dC2.24d.
53   TradVals       Young people today don’t have enough respect for traditional British values A2.49aB2.26aC2.25a.
54   StifSent       People who break the law should be given stiffer sentences A2.49bB2.26bC2.25b.
55   BnMove         The current benefit system effectively encourages recipients to move off benefits A2.30eB2.1eC2.1e.
56   Obey           Schools should teach children to obey authority A2.49dB2.26dC2.25d.
57   WrongLaw       The law should always be obeyed, even if a particular law is wrong A2.49eB2.26eC2.25e.
58   Censor         Censorship of films and magazines is necessary to uphold moral standards A2.49fB2.26fC2.25f.
Four important notes about the dataset…
1. Variables and the original questions
While the variable labels give a short summary of what each variable measures, it is often helpful to go back to the original questionnaire to see exactly what was asked.
The 2011 BSA questionnaire is available on Blackboard (and also from the ESDS website at http://www.esds.ac.uk/doc/7237/mrdoc/pdf/7237questionnaires ). It is a huge document, but the electronic version can easily be searched to find the relevant questions. In the table above you will see that most variable labels end with a question reference (for example ‘:Q18’). This refers to the question number in the questionnaire, and you can use it to search the document for the relevant question (N.B. the handout for practical class 2 includes a worked example of how to do this).
2. Missing Cases
When carrying out analysis on the data you will notice that some of the variables have a large number of missing cases. In a frequency table these ‘missing’ cases may be given labels such as ‘Skip, not asked this version’.
These labels simply refer to cases where the respondent was not asked this question in the survey. The BSA survey divides the sample into groups A, B and C: all groups answer a set of core questions, but additional questions are specific to one or more of the groups. Hence ‘Skip, not asked this version’ means that some respondents were not asked the question (for example, where only group A was asked, those in groups B and C) and are consequently coded as ‘missing’ (not applicable).
Other types of missing case have also been defined. These include cases where a respondent refused to answer a question. However, cases where the respondent answered ‘Don’t know’ to an attitudinal question are generally left in as a ‘valid’ response (though if the number of ‘don’t knows’ is very small it may be sensible to code them to missing, so they are excluded from crosstabulations).
3. Weighting
As with many government surveys, the 2011 British Social Attitudes Survey data comes with a weighting variable, ‘WtFactor’ (it’s the third variable down in the list above).
This weighting variable has been constructed to correct for known bias in the sample, including bias in the sampling design and non-response bias. The weight should always be applied (switched on) before carrying out any analysis of the data. When applied, the weight will make results more representative of the true population.
How to apply the weight
Applying the weight variable is easy, and is covered in practical 3 – but the procedure is repeated here for information…
First open the dataset in SPSS. Then either press the weighting button on the toolbar at the top, or select from the menu: Data… Weight Cases…
The Weight Cases dialogue box will then appear.
Click on ‘Weight cases by’ and scroll down the list of variables to find and select ‘WtFactor’ (it’s the third one down the list).
Click OK to apply the weight.
To check that the weight is on, look at the bottom right corner of the Data Editor where it should display ‘Weight On’.
All your analysis will now be weighted to help make the results more representative of the national population.
If you want to see the impact caused by applying the weighting variable try running a frequency table of gender or age with the weight on and then with the weight switched off again – you’ll see the difference in the resulting tables is only small.
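The effect of switching the weight on can be seen in a toy calculation. The sketch below is an illustration in Python, not SPSS, and the five respondents and their weights are invented. It assumes the weight simply makes each respondent count as WtFactor cases instead of one case when percentages are tallied.

```python
from collections import defaultdict

def frequency_percentages(values, weights=None):
    """Percentage in each category; every case counts once if weights is None."""
    if weights is None:
        weights = [1.0] * len(values)
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight       # each case contributes its weight
    grand_total = sum(totals.values())
    return {category: 100 * total / grand_total
            for category, total in totals.items()}

# Hypothetical respondents and weights (not taken from the BSA file).
sex = ["Male", "Female", "Female", "Male", "Female"]
wt_factor = [1.2, 0.8, 0.9, 1.1, 1.0]

unweighted = frequency_percentages(sex)
weighted = frequency_percentages(sex, wt_factor)
print("Unweighted:", unweighted)
print("Weighted:  ", weighted)
```

Respondents from under-represented groups carry weights above 1 and those from over-represented groups carry weights below 1, which is why the weighted percentages shift slightly.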
4. The Data Licence
Staff and students registered at an HE institution can generally have access free of charge to Government survey data. However, it is important to stress that the data is provided under a strict end-user licence agreement which all users must sign up to. Essentially this involves agreeing not to pass on or misuse the data in any way (including any attempt to use it to try to identify individuals in the sample).
Normally all new users would register individually with ESDS (it’s a quick and easy on-line process; see http://www.esds.ac.uk/aandp/access/access.asp). However, when the data is being used for class teaching purposes (as in IQM) the procedure is simplified by a special Access Agreement for Teaching, whereby enrolled students simply need to sign a class list agreeing to the conditions of use.
N.B. If in the future you end up wanting to use Government survey data for your own research e.g. as part of a dissertation, you would need to complete the on-line registration process as an individual user.