Please read the attached handout carefully and follow the guidelines, along with the extra guidelines I uploaded for Parts B and C. Please view them all, follow the guidelines to complete this assignment, and stick to the marking criteria.
Due 16 May.
BA5001X Business Decision-Making
Final coursework: Lamberts Heating
Submission date: Friday 20th April 2013
Scenario
Lamberts Heating has two main divisions: one manufactures radiators and the other installs central heating systems. The manufacturing arm makes radiators for domestic and commercial central heating systems; these are either used by Lamberts' installation division or sold to other central heating installation companies. The machinery presently used to manufacture the radiators is rather old and the company wants to replace it. The company wants the new machinery to be in place as soon as possible and therefore wants to plan the procurement and installation now. However, it has yet to decide which piece of new machinery to buy, and so does not expect to place an order before 3rd June 2013.
Lamberts Heating's installation division supplies and fits domestic central heating systems. It uses the company's own radiators for this work, but buys in boilers and pipes from other companies. The company is keen to increase its profit margin in this division, but the business of supplying domestic heating systems is very competitive at present. It has decided that the best way to increase its margin is to reduce the cost of buying in boilers, and has conducted some research into the prices charged by three major suppliers: Apex Boilers, Brunswich Heating Supplies and Centrale.
Tasks
This coursework is in three parts. You should submit all three parts together as one document.
The document will consist of three reports, one for each part of the coursework, together with three appendices. The appendices will contain all the detailed work you completed in order to write the reports: all computer input and output, and the details of any other calculations you have made.
Part A: Planning and control of capital expenditure
Question / Tasks:
The senior management of Lamberts Heating are considering the replacement of one of the firm's machines used to make radiators in the factory. You are one of the managers involved in this decision. You have had some discussions with other members of the senior management about this proposed investment, and there appear to be three machines that the firm could purchase:
1. Alumier machine, a straight replacement for the present machine from the same supplier
2. Big EZ machine from an American supplier
3. Cial machine from Japan
The senior management team wish to carefully consider the alternatives. As a first step, it was decided to accurately estimate each of the alternative’s cash flows and the following estimated figures are available:
                     Alumier      Big EZ       Cial
                     Machine      Machine      Machine
                     £            £            £
Initial Outlay       500,000      500,000      500,000
Cash inflows
  Year 1              50,000      200,000      150,000
       2             100,000      150,000      150,000
       3             150,000      150,000      150,000
       4             150,000       50,000      150,000
       5             150,000       25,000      100,000
       6             170,000       25,000       50,000
Total inflows        770,000      600,000      750,000
The other members of the management team have asked you to appraise each of the above alternatives using the methods usually employed for capital investment decisions, and to report back to the whole team at their next meeting. Although the other members are aware of the usual investment appraisal methods, your report should assess the alternatives in whatever way you consider appropriate. It should include your recommendation of which machine to purchase, for the team to consider at that meeting, together with an explanation of the strengths and weaknesses of the appraisal methods you have used. A 10% discount rate is recommended, the yield for each machine should be calculated separately, and all calculations may be worked out on a pre-tax basis.
(50 Marks)
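The usual appraisal methods (payback period, net present value at the recommended 10% rate, and the internal rate of return, i.e. the yield) can be cross-checked against your spreadsheet with a short script. A minimal sketch, assuming only the cash-flow figures in the table above; the function names are illustrative:

```python
# Sketch of three standard appraisal methods applied to the cash flows
# above (figures in £; £500,000 outlay each; 10% discount rate as recommended).

def npv(rate, outlay, inflows):
    """Net present value: discounted inflows minus the initial outlay."""
    return sum(c / (1 + rate) ** t for t, c in enumerate(inflows, 1)) - outlay

def payback_years(outlay, inflows):
    """Payback period, interpolating within the year of recovery."""
    cumulative = 0
    for t, c in enumerate(inflows, 1):
        if cumulative + c >= outlay:
            return t - 1 + (outlay - cumulative) / c
        cumulative += c
    return None  # outlay never recovered

def irr(outlay, inflows, lo=0.0, hi=1.0, tol=1e-7):
    """Internal rate of return (the yield): the rate at which NPV = 0,
    found by bisection (valid here since NPV falls as the rate rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, outlay, inflows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

machines = {
    "Alumier": [50_000, 100_000, 150_000, 150_000, 150_000, 170_000],
    "Big EZ":  [200_000, 150_000, 150_000, 50_000, 25_000, 25_000],
    "Cial":    [150_000, 150_000, 150_000, 150_000, 100_000, 50_000],
}

for name, inflows in machines.items():
    print(f"{name}: NPV £{npv(0.10, 500_000, inflows):,.0f}, "
          f"payback {payback_years(500_000, inflows):.2f} yrs, "
          f"yield {irr(500_000, inflows):.1%}")
```

Note that payback ignores the time value of money and any cash flows after recovery, which is one of the strengths-and-weaknesses points the report is expected to cover.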
Note:
1. All references and books, etc., consulted must be included, including any internet sites used.
2. You are also expected to produce this assignment using information technology. For this part of the assignment, the use of Microsoft Word and Excel (or similar word-processing and spreadsheet software) is particularly expected.
A successful assignment (in Part A of this assignment) might include consideration or explanation of:
· consideration of each of the alternative proposals under at least three methods of evaluating capital investment decisions
· comparison of the inflows and outflows for each machine under each method
· the objective(s) of each method
· the advantages and disadvantages (strengths and weaknesses) of each of these methods
· consideration of any non-financial factors
· consideration of any other information you may require
· a recommendation of one of the proposals, including the reasons for your recommendation
· use of the expected word-processing and spreadsheet software
Note: The above is illustrative, covering only the main points of the assignment.
Suggested length: 1,250 words
This part of the assignment (Part A) represents 50% of your marks on this piece of coursework
Part B: Scheduling the installation of the new machinery
Having decided to buy a new piece of machinery to make the radiators, Lamberts Heating now has to consider the timeframe for getting it up and running. This will involve you developing the project plan: determining the critical path and the sequencing of the relevant activities.
The activities involved are shown in the table below with their preceding activities and their normal durations. Some of the activities can be shortened by using extra staff, but at extra cost. The minimum duration for each activity is shown in the final column; for a number of activities the minimum duration is the same as the normal duration, as these activities cannot be shortened.
The earliest that the company expects to start the process is Monday 3rd June 2013.
Activity                                                              Normal duration   Preceding   Minimum duration
                                                                      (weeks)           activity    (weeks)
[Each day saved costs £200]
A: Order new machinery                                                 2                None         2
B: Plan new physical layout of factory                                 3                None         3
C: Determine changes needed in existing machinery                      3                None         3
D: Receive new equipment                                              10                A            8
E: Hire new employee to supervise the operation of the new machinery   7                A            6
F: Make changes needed to accommodate new machinery                   15                B           13
G: Make changes needed in existing machinery                           9                C            7
H: Train existing employees to use new machinery                       7                D, E         6
I: Install new machinery                                               4                F            3
J: Disassemble old machinery                                           5                G            4
K: Conduct employee safety training on new installation                2                H, I         2
Please note that the company works a 5-day week: Monday to Friday.
Task:
Appendix
You may complete this task manually or by using MS Project.
· Assuming that the project starts on Monday 3rd June 2013, determine the shortest duration for the entire project, using the normal duration times. Specify the date at which the project can finish and the number of weeks required.
· Produce a Gantt chart showing the starting and finishing times of each activity
· Identify which activities form the critical path(s) of the project.
· If the project could be speeded up by a maximum of 3 weeks at a cost of £200 a day, where should this money be spent? Produce a new network diagram to show the new critical path(s) and total duration.
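Whether you do it manually or in MS Project, the schedule can be cross-checked with the standard critical path method: a forward pass for earliest times, a backward pass for latest times, and floats as the difference. A sketch in Python over the activity table above (re-running it with the minimum durations shows the effect of crashing):

```python
# Forward/backward pass of the critical path method (CPM) over the
# Part B activity table; durations in weeks.

activities = {  # name: (duration, predecessors)
    "A": (2, []), "B": (3, []), "C": (3, []),
    "D": (10, ["A"]), "E": (7, ["A"]), "F": (15, ["B"]),
    "G": (9, ["C"]), "H": (7, ["D", "E"]), "I": (4, ["F"]),
    "J": (5, ["G"]), "K": (2, ["H", "I"]),
}

def cpm(acts):
    # forward pass: earliest start/finish (dict order already
    # lists every activity after its predecessors here)
    es, ef = {}, {}
    for name, (dur, preds) in acts.items():
        es[name] = max((ef[p] for p in preds), default=0)
        ef[name] = es[name] + dur
    duration = max(ef.values())
    # backward pass: latest finish/start
    lf, ls = {}, {}
    for name in reversed(list(acts)):
        dur, _ = acts[name]
        succs = [s for s, (_, ps) in acts.items() if name in ps]
        lf[name] = min((ls[s] for s in succs), default=duration)
        ls[name] = lf[name] - dur
    floats = {n: ls[n] - es[n] for n in acts}
    critical = [n for n, f in floats.items() if f == 0]
    return duration, floats, critical

duration, floats, critical = cpm(activities)
print(f"duration {duration} weeks; critical activities: {', '.join(critical)}")
```

On the cost side, the arithmetic is direct: with a 5-day week at £200 per day saved, shortening the project by a full 3 weeks would cost 3 × 5 × £200 = £3,000, provided the weeks saved all lie on the (current) critical path.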
Report
Give details of the proposed calendar of works, making it clear which activities are critical and which activities have some slack. This section should be written so that the management can understand the advice being given without reference to the work in the appendix. Give details of the starting date, the finishing date and the duration of the project.
Explain how, by spending more money, the project duration can be shortened. Explain which activities have been shortened and which activities are now critical. Give details of the starting date, the finishing date and the duration of the project. Give details of the extra cost involved in shortening the project.
A successful assignment (Part B of this assignment) should include:
· Determining the shortest time in which the project can be completed using the “normal” durations
· Determining the shortest time in which the project can be completed using the “minimum” durations
· Producing a well-written, well-structured report that enables management to understand how to schedule the project. This report should include details of activities which must be completed on time and those for which there is some slack.
· The costs of shortening the project time should be included in the report as should the relevant start and finishing dates.
Note: The above is illustrative, covering only the main points of the assignment.
Suggested length: 500 words
This part of the assignment (Part B) represents 20% of your marks on this piece of coursework.
Part C: Cost of procuring boilers
This part of the coursework requires you to use linear programming techniques to determine the best way to minimise the cost of supplying boilers to Lamberts Heating.
As well as manufacturing radiators (as mentioned in Part A), Lamberts Heating is also reviewing the way it procures boilers to enable the installation of domestic heating systems.
The purchasing department has identified three possible suppliers of boilers: Apex, Brunswich and Centrale. It has also identified five kinds of domestic boiler that it wants to purchase so it can supply and fit boilers to a variety of sizes of homes.
The five boilers are known and coded by Italian numbers, as most are manufactured in Italy: Uno (1), Duo (2), Tre (3), Quattro (4) and Cinque (5).
The table below gives the cost of each boiler (in £s) from each supplier. The final row of the table specifies the minimum requirement of each type of boiler per year. Please note that not all boilers are available from each supplier.
               Uno     Duo     Tre     Quattro   Cinque
Apex           500     750     300     —         450
Brunswich      —       725     320     875       420
Centrale       480     775     310     900       —
Requirement    2000    1500    3000    2500      2200
(number of boilers)
There are certain limitations that have to be taken into account:
· Apex can supply no more than 1,000 of the Tre boiler each year.
· Brunswich can supply no more than 800 of the Duo boiler each year.
· Centrale can supply no more than 1,800 Uno boilers each year.
Lamberts Heating wants to meet its requirement for the number of boilers needed each year at the minimum cost.
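Before entering the problem into Excel, it helps to see its shape: one decision variable for each available supplier/boiler pair, a "greater than or equal" constraint for each boiler type's requirement, and a "less than or equal" constraint for each supply limit. A Python sketch that builds the objective and constraint rows from the data above (formulation only; the variable and row names are illustrative, and the resulting matrices can be fed to Excel Solver or any LP package):

```python
# Shape of the Part C linear programme: variables, objective, constraints.

costs = {  # (supplier, boiler): unit cost in £; missing pairs are unavailable
    ("Apex", "Uno"): 500, ("Apex", "Duo"): 750, ("Apex", "Tre"): 300,
    ("Apex", "Cinque"): 450,
    ("Brunswich", "Duo"): 725, ("Brunswich", "Tre"): 320,
    ("Brunswich", "Quattro"): 875, ("Brunswich", "Cinque"): 420,
    ("Centrale", "Uno"): 480, ("Centrale", "Duo"): 775,
    ("Centrale", "Tre"): 310, ("Centrale", "Quattro"): 900,
}
demand = {"Uno": 2000, "Duo": 1500, "Tre": 3000, "Quattro": 2500, "Cinque": 2200}
caps = {("Apex", "Tre"): 1000, ("Brunswich", "Duo"): 800, ("Centrale", "Uno"): 1800}

variables = sorted(costs)               # one decision variable per available pair
objective = [costs[v] for v in variables]  # minimise sum of cost * quantity

# demand rows: sum over suppliers of x[supplier, boiler] >= requirement
demand_rows = [([1 if v[1] == b else 0 for v in variables], demand[b])
               for b in demand]
# cap rows: x[supplier, boiler] <= limit for the three restricted pairs
cap_rows = [([1 if v == pair else 0 for v in variables], limit)
            for pair, limit in caps.items()]

print(len(variables), "variables,", len(demand_rows), "demand constraints,",
      len(cap_rows), "supply caps")
```

Writing the model out this way makes the appendix requirements (define the variables, state the objective, state the constraints) almost mechanical to satisfy.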
Task
You are expected to use MS Excel software for this task. However, we will accept printouts from any other Linear Programming software.
Appendix
· Define the meaning of any decision variables you are using.
· Formulate the situation described above as a linear programming problem.
· Include your computer input of the problem (and show any formulae used)
· Include the “Answer Report” and “Sensitivity Report” printouts
Report
Write a brief report to the management of Lamberts Heating detailing the cheapest way of meeting their need for boilers.
This report, written in a formal style of English, should include:
· The nature of the problem being solved
· The suggested purchasing plan
· The cost of the suggested purchasing plan.
· The robustness of the plan. Provide details of the ranges of costs of boilers from each suggested supplier for which your suggested plan remains optimal, and advise the company when they will have to generate a new plan. Do not just provide lists but consider your answers in the context of the question.
· At present, Apex can supply no more than 1,000 of the Tre boiler each year. If this limit were raised so that Apex could supply 1,100 Tre boilers, explain what effect this would have on the total minimum cost.
A successful assignment (for Part C of this assignment) should include:
· Well-defined decision variables and a correct formulation of the problem.
· Appropriate use of Excel (or other software) to produce a solution to the problem.
· Producing a well-written, well-structured report that enables management to understand how to procure boilers at the minimum price. This report should include details of your suggested purchasing plan and the cost of this plan.
· The advice to management should also enable them to determine the range of prices for each boiler for which your purchasing plan is valid.
· The implications of the increased capacity to supply by Apex should be clearly described.
Note: The above is illustrative, covering only the main points of the assignment.
Suggested length: 500 words
This part of the assignment (Part C) represents 30% of your marks on this piece of coursework.
BA5001 Final Coursework
Final coursework: Lamberts Heating
Submission date: Friday 24th May 2013
This coursework is worth 40% of your final mark for this module.
Part A: 50 marks
Part B: 20 marks
Part C: 30 marks
Part B : Installing the new machinery
Activity Normal Duration in weeks Preceding activity Minimum Duration
[Each day saved costs £200]
A: Order new machinery 2 None 2
B: Plan new physical layout of factory 3 None 3
C: Determine changes needed in existing machinery 3 None 3
D Receive new equipment 10 A 8
E Hire new employee to supervise the operation of the new machinery 7 A 6
F Make changes needed to accommodate new machinery 15 B 13
G: Make changes needed in existing machinery 9 C 7
H: Train existing employees to use new machinery 7 D,E 6
I: Install new machinery 4 F 3
J: Disassemble old machinery 5 G 4
K: Conduct employee safety training on new installation 2 H,I 2
Part B : Instructions
You may complete this task manually or by using MS Project.
Assuming that the project starts on Monday 3rd June 2013, determine the shortest duration for the entire project, using the normal duration times. Specify the date at which the project can finish and the number of weeks required.
Produce a Gantt chart showing the starting and finishing times of each activity
Identify which activities form the critical path(s) of the project.
Part B : Example from booklet
Estate Agency : pages 113 to 115 for manual example
Estate Agency: pages 124 to 127 for MS Project
Estate Agents: Network Diagram
[Network diagram rendered as a table: each node shows the activity's earliest start/finish above its duration and latest start/finish.]

Activity   Duration   ES   EF   LS   LF
A          9          0    9    0    9
B          7          0    7    4    11
C          5          9    14   10   15
D          3          9    12   9    12
E          3          12   15   12   15
F          4          7    11   11   15
G          1          15   16   15   16
The critical path is ADEG. The shortest finishing time for the project is 16 weeks.
Estate Agents: Floats
Float = Latest finish time (LFT) – Earliest finish time (EFT)
Activity   EFT   LFT   Float   Critical
A          9     9     0       yes
B          7     11    4       no
C          14    15    1       no
D          12    12    0       yes
E          15    15    0       yes
F          11    15    4       no
G          16    16    0       yes
Estate Agents: Gantt Chart
[Bar chart omitted; the underlying schedule data:]

Activity   Start   Duration   Float
A          0       9          0
B          0       7          4
C          9       5          1
D          9       3          0
E          12      3          0
F          7       4          4
G          15      1          0
Using MS Project: Network Diagram
Using MS Project: Gantt Chart
Part B : continued
If the project could be speeded up by a maximum of 3 weeks at a cost of £200 a day, where should this money be spent? Produce a new network diagram to show the new critical path(s) and total duration.
An example of crashing is on pages 117 to 120 (the coursework is easier than this).
Make sure you take note of the minimum durations.
Calculate the extra cost of speeding up the project.
Part B : Report (about 500 words)
Give details of the proposed calendar of works, making it clear which activities are critical and which activities have some slack. This section should be written so that the management can understand the advice being given without reference to the work in the appendix. Give details of the starting date, the finishing date and the duration of the project.
Explain how, by spending more money, the project duration can be shortened. Explain which activities have been shortened and which activities are now critical. Give details of the starting date, the finishing date and the duration of the project. Give details of the extra cost involved in shortening the project.
Final Coursework : part C
Cost of procuring boilers
This part of the coursework requires you to use linear programming techniques to determine the best way to minimise the cost of supplying boilers to Lamberts Heating
As well as manufacturing radiators (as mentioned in Part A), Lamberts Heating is also reviewing the way it procures boilers to enable the installation of domestic heating systems.
Part C : Identify the decision variables
The purchasing department has identified three possible suppliers of boilers: Apex, Brunswich and Centrale. It has also identified five kinds of domestic boiler that it wants to purchase so it can supply and fit boilers to a variety of sizes of homes.
The five boilers are known and coded by Italian numbers as most are manufactured in Italy ; Uno (1), Duo (2), Tre (3), Quattro (4) and Cinque (5).
Part C: Identify the constraints
The table below gives the cost of each boiler (in £s) from each supplier. The final row of the table specifies the minimum requirement of each type of boiler per year. Please note that not all boilers are available from each supplier.
Uno Duo Tre Quattro Cinque
Apex 500 750 300 — 450
Brunswich — 725 320 875 420
Centrale 480 775 310 900 —
Requirement (number of boilers)   2000   1500   3000   2500   2200
Part C: Identify the constraints
There are certain limitations that have to be taken into account:
Apex can supply no more than 1,000 of the Tre boiler each year.
Brunswich can supply no more than 800 of the Duo boiler each year.
Centrale can supply no more than 1,800 Uno boilers each year.
Lamberts Heating wants to meet its requirement for the number of boilers needed each year at the minimum cost.
Example from Module booklet page 105
Farm Production
A farmer has two farms, Home Farm and Meadow Farm, on which he grows corn and wheat.
Both farms are 40 acres in size.
To satisfy a contract with a local mill, the farmer must produce 7,000 barrels of corn and 11,000 barrels of wheat each year.
The farmer wishes to minimise the cost of meeting the contract.
The data for each farm is given below:

          Home Farm                      Meadow Farm
          Yield per acre   Cost per acre   Yield per acre   Cost per acre
Corn      400 barrels      £100            650 barrels      £120
Wheat     300 barrels      £90             350 barrels      £80
Definition of Decision variables
Variables: We need to distinguish between production at Home and Meadow Farm.
Let CH be the number of acres of corn planted at Home Farm
Let CM be the number of acres of corn planted at Meadow Farm
Let WH be the number of acres of wheat planted at Home Farm
Let WM be the number of acres of wheat planted at Meadow Farm
Formulation of problem
Minimise cost: 100CH + 90WH + 120CM + 80WM
subject to:
  corn yield (1):        400CH + 650CM ≥ 7,000
  wheat yield (2):       300WH + 350WM ≥ 11,000
  Home Farm area (3):    CH + WH ≤ 40
  Meadow Farm area (4):  CM + WM ≤ 40
  CH ≥ 0, CM ≥ 0, WH ≥ 0, WM ≥ 0
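The Solver answer reported for this formulation (CH = 0, CM = 10.77, WH = 2.56, WM = 29.23, cost £3,861.54) can be verified by substituting it back into the objective and the constraints. A quick Python check, using the exact fractions behind those rounded Answer Report values:

```python
# Substitute the Solver's planting plan back into the Farm Production
# formulation to confirm feasibility and the minimum cost of £3,861.54.
CH = 0.0         # acres of corn at Home Farm
CM = 140 / 13    # ≈ 10.77 acres of corn at Meadow Farm
WH = 100 / 39    # ≈ 2.56 acres of wheat at Home Farm
WM = 380 / 13    # ≈ 29.23 acres of wheat at Meadow Farm

cost = 100 * CH + 90 * WH + 120 * CM + 80 * WM

assert 400 * CH + 650 * CM >= 7000 - 1e-9    # corn contract met exactly
assert 300 * WH + 350 * WM >= 11000 - 1e-9   # wheat contract met exactly
assert CH + WH <= 40 and CM + WM <= 40       # both farms within 40 acres

print(round(cost, 2))  # → 3861.54
```

The same substitution check is a cheap way to sanity-test your own Part C solution before writing the report.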
Farm production : Input
Farm’s Answer Report
Farm’s Sensitivity Report
Coursework requirements: Appendix
Define the meaning of any decision variables you are using.
Formulate the situation as a linear programming problem.
Include your computer input of the problem (and show any formulae used)
Include the “Answer Report” and “Sensitivity Report” printouts
Coursework requirements: Report [about 500 words]
Write a brief report to the management of Lamberts Heating detailing the cheapest way of meeting their need for boilers.
This report, written in a formal style of English, should include:
The nature of the problem being solved
The suggested purchasing plan
The cost of the suggested purchasing plan.
The robustness of the plan. Provide details of the ranges of costs of boilers from each suggested supplier for which your suggested plan remains optimal, and advise the company when they will have to generate a new plan
coursework requirements
Do not just provide lists but consider your answers in the context of the question.
At present, Apex can supply no more than 1,000 of the Tre boiler each year. If this limit were raised so that Apex could supply 1,100 Tre boilers, explain what effect this would have on the total minimum cost.
Farm Production : Notes for a report
This is taken from page 106 of the module booklet
1. Planting Plan
In order to minimise the planting costs, the following should be implemented:
Plant Home Farm with 2.56 acres of wheat
Plant Meadow Farm with 10.77 acres of corn and 29.23 acres of wheat.
This will incur costs of £3,861.54, the minimum that can be achieved.
Farm Production : Notes
2. Implications of Suggested Planting Programme
The planting plan given above will result in exactly 7,000 barrels of corn and exactly 11,000 barrels of wheat being produced.
[S1 = 0, S2 = 0]
37.44 acres of Home Farm will not be planted, whilst all 40 acres of Meadow Farm will be used.
Farm Production : Notes
3. Scope of the recommendations
3.1 Change in Costs
The planting plan given above will result in a minimum cost whilst the following conditions hold:
* The cost of planting corn at Home Farm stays above £89.23 per acre.
* The cost of planting corn at Meadow Farm stays below £137.50 per acre.
* The cost of planting wheat at Home Farm stays in the range £68.57 to £105 per acre.
* The cost of planting wheat at Meadow Farm stays in the range £62.50 to £105 per acre.
Farm Production : Notes
If any one of the conditions above fails to hold, a new planting plan will be required. Note also that these ranges apply to one cost changing at a time; if two or more costs change together, the plan may need to be recomputed even if each cost stays within its range.
As these ranges are relatively large, the proposed plan should hold good for some time.
Farm Production : Notes
3.2 Change in Contract
The proposed plan exactly meets the contracts for 7,000 barrels of corn and 11,000 barrels of wheat.
If the requirement for corn was to increase by one barrel, the extra costs incurred in meeting this target would be 22.3 pence. This marginal cost of 22.3 pence per barrel applies for levels of production between 5,571.43 and 26,000 barrels.
If the minimum amount of wheat required was increased, the extra cost would be 30 pence per barrel. This extra cost per barrel will apply whilst the contract for wheat lies in the range 10,230.8 to 22,230.8 barrels.
If the minimum requirement for corn or wheat fell outside these ranges, a new planting plan would be required.
Farm Production : Notes
3.3 Changes in Farm Size
In total only 2.56 acres of Home Farm are being used. Clearly it would not make sense to consider increasing the size of Home Farm.
All 40 acres of Meadow Farm are being used. If the size of Meadow Farm could be increased by 1 acre the minimum cost could be reduced by £25. This marginal reduction in cost applies when the size of Meadow Farm lies in the range 10.77 acres to 42.2 acres.
Similarly, if the size of Meadow Farm is reduced by 1 acre, the minimum cost will rise by £25.
Farm Production : Notes
[If we could increase the size of Meadow Farm by 1 acre, we could produce an extra 350 barrels of wheat at Meadow Farm at a cost of £80. At the same time we would reduce the wheat yield from Home Farm by 350 barrels, i.e. by 350/300 = 1.1667 acres, saving 1.1667 × £90 = £105. Thus the net reduction in cost is £105 − £80 = £25.]
Marking Scheme: Part B (40 marks)

Appendix
Input data for original problem shown: 2
Correct network diagram produced: 4
Reduced-time input data: 2
Reduced-time correct network diagram: 2
Subtotal: 10

Report
Mention start date and finish date (2); mention duration = 24 weeks (2): 4
Mention critical activities, giving a full description of each activity for full marks (3), making it clear what "critical" means (2); list activities that have some slack (3), making it clear what "slack" means (2): 12
Advise the client how to reduce the project time: 2
Mention the new start date and new finish date, duration and cost: 4
Mention the new critical activities, giving a full description of each activity for full marks (2), making it clear what "critical" means (2): 8
Subtotal: 30

Total: 40
Marking Scheme: Part C (60 marks)

Appendix
Defining variables: 6
Stating objective function: 2
Stating constraints: 9
Showing correct Excel input data: 2
Showing all of the output data: 4
Subtotal: 23

Report
Explaining the problem: 5
Explaining the suggested purchasing plan: 8
Stating the minimum cost: 3
Robustness: this plan remains optimal whilst the costs of Uno from Apex are ???, Duo from Apex are ???, etc.: 17
If Apex could supply 100 more Tre boilers a year, how would the minimum cost change: 4
Subtotal: 37

Total: 60
Answer Report 1
Microsoft Excel 14.0 Answer Report
Worksheet: [Book1]Sheet1
Report Created: 05/12/2012 15:33:15
Result: Solver found a solution. All Constraints and optimality conditions are satisfied.
Solver Engine
Engine: Simplex LP
Solution Time: 0.032 Seconds.
Iterations: 4 Subproblems: 0
Solver Options
Max Time Unlimited, Iterations Unlimited, Precision 0.000001, Use Automatic Scaling
Max Subproblems Unlimited, Max Integer Sols Unlimited, Integer Tolerance 1%, Assume NonNegative
Objective Cell (Max)
Cell Name Original Value Final Value
$E$4 0 12750
Variable Cells
Cell Name Original Value Final Value Integer
$B$3 Alpha 0 750 Contin
$C$3 Beta 0 1000 Contin
$D$3 Gamma 0 1500 Contin
Constraints
Cell Name Cell Value Formula Status Slack
$E$5 total investments 100000 $E$5<=$G$5 Binding 0
$E$6 max Alpha 750 $E$6<=$G$6 Not Binding 250
$E$7 max Beta 1000 $E$7<=$G$7 Binding 0
$E$8 max Gamma 1500 $E$8<=$G$8 Binding 0
Sensitivity Report 1
Microsoft Excel 14.0 Sensitivity Report
Worksheet: [Book1]Sheet1
Report Created: 05/12/2012 15:33:15
Variable Cells
Final Reduced Objective Allowable Allowable
Cell Name Value Cost Coefficient Increase Decrease
$B$3 Alpha 750 0 7 0.2 7
$C$3 Beta 1000 0 3 1E+30 0.0833333333
$D$3 Gamma 1500 0 3 1E+30 0.6666666667
Constraints
Final Shadow Constraint Allowable Allowable
Cell Name Value Price R.H. Side Increase Decrease
$E$5 total investments 100000 0.1166666667 100000 15000 45000
$E$6 max Alpha 750 0 1000 1E+30 250
$E$7 max Beta 1000 0.0833333333 1000 1800 600
$E$8 max Gamma 1500 0.6666666667 1500 2250 750
fred
Alpha Beta Gamma
750 1000 1500
7 3 3 12750
total investments 60 25 20 100000 <= 100000
max Alpha 1 750 <= 1000
max Beta 1 1000 <= 1000
max Gamma 1 1500 <= 1500
Answer Report 2
Microsoft Excel 14.0 Answer Report
Worksheet: [LP problems.xlsx]farms
Report Created: 05/12/2012 15:48:05
Result: Solver found a solution. All Constraints and optimality conditions are satisfied.
Solver Engine
Engine: Simplex LP
Solution Time: 0.016 Seconds.
Iterations: 3 Subproblems: 0
Solver Options
Max Time Unlimited, Iterations Unlimited, Precision 0.000001, Use Automatic Scaling
Max Subproblems Unlimited, Max Integer Sols Unlimited, Integer Tolerance 1%, Assume NonNegative
Objective Cell (Min)
Cell Name Original Value Final Value
$F$5 minimise 0 3861.5384615385
Variable Cells
Cell Name Original Value Final Value Integer
$B$4 corn home 0 0 Contin
$C$4 corn meadow 0 10.7692307692 Contin
$D$4 wheat home 0 2.5641025641 Contin
$E$4 wheat meadow 0 29.2307692308 Contin
Constraints
Cell Name Cell Value Formula Status Slack
$F$6 corn 7000 $F$6>=$H$6 Binding 0
$F$7 wheat 11000 $F$7>=$H$7 Binding 0
$F$8 home farm 2.5641025641 $F$8<=$H$8 Not Binding 37.4358974359
$F$9 meadow farm 40 $F$9<=$H$9 Binding 0
Sensitivity Report 2
Microsoft Excel 14.0 Sensitivity Report
Worksheet: [LP problems.xlsx]farms
Report Created: 05/12/2012 15:48:05
Variable Cells
Final Reduced Objective Allowable Allowable
Cell Name Value Cost Coefficient Increase Decrease
$B$4 corn home 0 10.7692307692 100 1E+30 10.7692307692
$C$4 corn meadow 10.7692307692 0 120 17.5 145
$D$4 wheat home 2.5641025641 0 90 15 21.4285714286
$E$4 wheat meadow 29.2307692308 0 80 25 17.5
Constraints
Final Shadow Constraint Allowable Allowable
Cell Name Value Price R.H. Side Increase Decrease
$F$6 corn 7000 0.2230769231 7000 19000 1428.5714285714
$F$7 wheat 11000 0.3 11000 11230.7692307692 769.2307692308
$F$8 home farm 2.5641025641 0 40 1E+30 37.4358974359
$F$9 meadow farm 40 -25 40 2.1978021978 29.2307692308
farms worksheet (input)
corn home corn meadow wheat home wheat meadow
0 0 0 0
minimise 100 120 90 80 0
corn 400 650 0 >= 7000
wheat 300 350 0 >= 11000
home farm 1 1 0 <= 40
meadow farm 1 1 0 <= 40
farms
farms
corn home corn meadow wheat home wheat meadow
0 10.7692307692 2.5641025641 29.2307692308
minimise 100 120 90 80 3861.5384615385
corn 400 650 7000 >= 7000
wheat 300 350 11000 >= 11000
home farm 1 1 2.5641025641 <= 40
meadow farm 1 1 40 <= 40
Sheet3
Variable Cells
FinalReducedObjectiveAllowableAllowable
CellNameValueCostCoefficientIncreaseDecrease
$B$4corn home010.769230771001E+3010.76923077
$C$4corn meadow10.76923077012017.5145
$D$4wheat home2.5641025640901521.42857143
$E$4wheat meadow29.230769230802517.5
Constraints
FinalShadowConstraintAllowableAllowable
CellNameValuePriceR.H. SideIncreaseDecrease
$F$6corn70000.2230769237000190001428.571429
$F$7wheat110000.31100011230.76923769.2307692
$F$8home farm 2.5641025640401E+3037.43589744
$F$9meadow farm40-25402.19780219829.23076923
Answer Report 1
Microsoft Excel 14.0 Answer Report
Worksheet: [Book1]Sheet1
Report Created: 05/12/2012 15:33:15
Result: Solver found a solution. All Constraints and optimality conditions are satisfied.
Solver Engine
Engine: Simplex LP
Solution Time: 0.032 Seconds.
Iterations: 4 Subproblems: 0
Solver Options
Max Time Unlimited, Iterations Unlimited, Precision 0.000001, Use Automatic Scaling
Max Subproblems Unlimited, Max Integer Sols Unlimited, Integer Tolerance 1%, Assume NonNegative
Objective Cell (Max)
Cell Name Original Value Final Value
$E$4 0 12750
Variable Cells
Cell Name Original Value Final Value Integer
$B$3 Alpha 0 750 Contin
$C$3 Beta 0 1000 Contin
$D$3 Gamma 0 1500 Contin
Constraints
Cell Name Cell Value Formula Status Slack
$E$5 total investments 100000 $E$5<=$G$5 Binding 0
$E$6 max Alpha 750 $E$6<=$G$6 Not Binding 250
$E$7 max Beta 1000 $E$7<=$G$7 Binding 0
$E$8 max Gamma 1500 $E$8<=$G$8 Binding 0
Sensitivity Report 1
Microsoft Excel 14.0 Sensitivity Report
Worksheet: [Book1]Sheet1
Report Created: 05/12/2012 15:33:15
Variable Cells
Final Reduced Objective Allowable Allowable
Cell Name Value Cost Coefficient Increase Decrease
$B$3 Alpha 750 0 7 0.2 7
$C$3 Beta 1000 0 3 1E+30 0.0833333333
$D$3 Gamma 1500 0 3 1E+30 0.6666666667
Constraints
Final Shadow Constraint Allowable Allowable
Cell Name Value Price R.H. Side Increase Decrease
$E$5 total investments 100000 0.1166666667 100000 15000 45000
$E$6 max Alpha 750 0 1000 1E+30 250
$E$7 max Beta 1000 0.0833333333 1000 1800 600
$E$8 max Gamma 1500 0.6666666667 1500 2250 750
fred
Alpha Beta Gamma
750 1000 1500
7 3 3 12750
total investments 60 25 20 100000 <= 100000
max Alpha 1 750 <= 1000
max Beta 1 1000 <= 1000
max Gamma 1 1500 <= 1500
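The 'fred' sheet above encodes the LP that Solver reports on: maximise 7·Alpha + 3·Beta + 3·Gamma subject to 60·Alpha + 25·Beta + 20·Gamma ≤ 100000 and caps of 1000, 1000 and 1500 units. Because there is only one shared (budget) constraint plus simple upper bounds, the optimum can be reproduced outside Excel in a few lines of Python. This is a sketch, treating the objective coefficients as per-unit returns (the sheet does not name them); the greedy argument is valid only for this single-constraint structure:

```python
# Products from the 'fred' sheet: (name, cost per unit, return per unit, max units).
products = [("Alpha", 60, 7, 1000.0), ("Beta", 25, 3, 1000.0), ("Gamma", 20, 3, 1500.0)]
budget = 100000.0

# With one budget constraint and simple caps, filling products in decreasing
# order of return per pound spent is optimal (fractional knapsack).
plan, total_return = {}, 0.0
for name, cost, ret, cap in sorted(products, key=lambda p: p[2] / p[1], reverse=True):
    units = min(cap, budget / cost)   # buy as much as the cap and remaining budget allow
    plan[name] = units
    budget -= units * cost
    total_return += units * ret

print(plan)          # {'Gamma': 1500.0, 'Beta': 1000.0, 'Alpha': 750.0}
print(total_return)  # 12750.0
```

This matches the Solver output: Alpha 750, Beta 1000, Gamma 1500, objective 12750.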
Answer Report 2
Microsoft Excel 14.0 Answer Report
Worksheet: [LP problems.xlsx]farms
Report Created: 05/12/2012 15:48:05
Result: Solver found a solution. All Constraints and optimality conditions are satisfied.
Solver Engine
Engine: Simplex LP
Solution Time: 0.016 Seconds.
Iterations: 3 Subproblems: 0
Solver Options
Max Time Unlimited, Iterations Unlimited, Precision 0.000001, Use Automatic Scaling
Max Subproblems Unlimited, Max Integer Sols Unlimited, Integer Tolerance 1%, Assume NonNegative
Objective Cell (Min)
Cell Name Original Value Final Value
$F$5 minimise 0 3861.5384615385
Variable Cells
Cell Name Original Value Final Value Integer
$B$4 corn home 0 0 Contin
$C$4 corn meadow 0 10.7692307692 Contin
$D$4 wheat home 0 2.5641025641 Contin
$E$4 wheat meadow 0 29.2307692308 Contin
Constraints
Cell Name Cell Value Formula Status Slack
$F$6 corn 7000 $F$6>=$H$6 Binding 0
$F$7 wheat 11000 $F$7>=$H$7 Binding 0
$F$8 home farm 2.5641025641 $F$8<=$H$8 Not Binding 37.4358974359
$F$9 meadow farm 40 $F$9<=$H$9 Binding 0
Sensitivity Report 2
Microsoft Excel 14.0 Sensitivity Report
Worksheet: [LP problems.xlsx]farms
Report Created: 05/12/2012 15:48:05
Variable Cells
Final Reduced Objective Allowable Allowable
Cell Name Value Cost Coefficient Increase Decrease
$B$4 corn home 0 10.7692307692 100 1E+30 10.7692307692
$C$4 corn meadow 10.7692307692 0 120 17.5 145
$D$4 wheat home 2.5641025641 0 90 15 21.4285714286
$E$4 wheat meadow 29.2307692308 0 80 25 17.5
Constraints
Final Shadow Constraint Allowable Allowable
Cell Name Value Price R.H. Side Increase Decrease
$F$6 corn 7000 0.2230769231 7000 19000 1428.5714285714
$F$7 wheat 11000 0.3 11000 11230.7692307692 769.2307692308
$F$8 home farm 2.5641025641 0 40 1E+30 37.4358974359
$F$9 meadow farm 40 -25 40 2.1978021978 29.2307692308
farms
corn home corn meadow wheat home wheat meadow
0 10.7692307692 2.5641025641 29.2307692308
minimise 100 120 90 80 3861.5384615385
corn 400 650 7000 >= 7000
wheat 300 350 11000 >= 11000
home farm 1 1 2.5641025641 <= 40
meadow farm 1 1 40 <= 40
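As a quick sanity check, the Solver solution on the 'farms' sheet can be verified outside Excel by substituting the reported areas back into the constraints and the objective. A minimal sketch in Python, with coefficients read directly from the sheet above:

```python
# Reported Solver solution from the 'farms' sheet (area grown of each crop on each farm).
x = {"corn home": 0.0, "corn meadow": 10.7692307692,
     "wheat home": 2.5641025641, "wheat meadow": 29.2307692308}

# Objective: minimise 100*ch + 120*cm + 90*wh + 80*wm.
cost = (100 * x["corn home"] + 120 * x["corn meadow"]
        + 90 * x["wheat home"] + 80 * x["wheat meadow"])

corn = 400 * x["corn home"] + 650 * x["corn meadow"]     # corn produced
wheat = 300 * x["wheat home"] + 350 * x["wheat meadow"]  # wheat produced

assert corn >= 7000 - 1e-6 and wheat >= 11000 - 1e-6               # demands met
assert x["corn home"] + x["wheat home"] <= 40 + 1e-6               # home farm limit
assert x["corn meadow"] + x["wheat meadow"] <= 40 + 1e-6           # meadow farm limit
print(round(cost, 2))  # 3861.54
```

The cost agrees with the Solver objective of 3861.5384615385, and the binding constraints (corn, wheat, meadow farm) hold with zero slack, as in the Answer Report.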
Sheet3
Module Title: Business Decision Making
Module Code: BA 5001
Session: 2012/13
Teaching Period: Year long
Module Booklet Contents
Welcome to Business Decision Making
Details of the Staff teaching team
Name of Module Leader: Janet Geary and Maurice Pratt
Office Location:
Janet Geary: Stapleton House SH317
Maurice Pratt: Calcutta House CM 147
Email:
j.geary@londonmet.ac.uk
m.pratt@londonmet.ac.uk
Telephone:
Janet 0207 133 3839
Maurice 0207 320 3270
Office Hours:
Maurice: Monday 11am to 12, Thursday 12 to 1pm
Janet:
MODULE SPECIFICATION
1. Module title: Business Decision Making
2. Module code: BA 5001
3. Module Level: Level 5
4. Module Leader: Janet Geary & Maurice Pratt
5. Faculty: Business School
6. Teaching site(s) for course: cross-campus
7. Teaching period: year long (30 weeks)
8. Teaching mode: Day
9. Module Type: Year long
10. Credit rating: 30
11. Prerequisites and corequisites: BA 4002 Managing Information and Accounting
12. Module description
The Business Decision Making module is directed at students who are following a number of different business degree programmes and requires that students have previously completed the Level 4 module Managing Information and Accounting or its equivalent. The year-long module is designed to enable students to understand the role of quantitative and statistical techniques in managerial decision making and also to familiarise students with management accounting concepts and techniques with an emphasis on decision making.
Assessment: Data analysis and reporting on findings 30%, In-class tests 30%, and integrated assignment 40%
13. Module aims
The aims of the module are:
•To provide an introduction to business decision making through the application of selected quantitative/statistical techniques and associated specialist software.
•To develop an understanding of how such analyses fit into the wider business and management context.
•To enable students to assess the reliability and usefulness of any information generated by the analysis and hence justify decisions made.
•To develop students’ understanding of the major uses of accounting information by management in problem-solving, decision-making and planning and control.
•To familiarise students with the decision-making framework for internal users of accounting information in short term decision-making.
•To examine alternative techniques in decision-making situations, including capital investment appraisal.
•To enable students to design spreadsheet models and interpret the managerial accounting information outputs of such models.
•To prepare students, where appropriate, for Level 6 project work in this area.
14. Module learning outcomes
On successful completion of this module, students will be able to:
• Understand how uncertainty can be built into business decision making.
• Apply linear programming techniques to optimise constrained decision choices.
• Use multiple regression analysis to model business problems.
• Apply a range of approaches to collect and analyse survey data.
• Apply selected techniques to manage projects.
• Use specialist software (e.g. SPSS, MSProject) to support business decision making.
• Account for short-term decisions, allocating costs utilising traditional methods and activity-based costing systems.
• Calculate and interpret accounting information for relevant costs and benefits in short-term decision-making situations.
• Demonstrate knowledge and understanding of the budgeting process and how to construct budgets, including the importance of behavioural implications.
• Calculate and communicate financial and non-financial measures of attractiveness in investment appraisal decisions and understand the role of cost of capital.
15. Syllabus
Survey methods: sampling, collection and analysis, using SPSS; statistical hypothesis tests.
Managing projects: scheduling for efficient resourcing; using MSProject.
Using SPSS to model relationships between variables – multiple regression, hypothesis tests.
Linear programming models in situations of constrained optimisation; using QSB (or equivalent).
The value of accounting information and theories of decision-making.
Accounting for short-term decisions, the nature and classification of costs, relevant costing.
Application of cost-volume-profit (CVP) analysis in decision-making for organisations and risk and uncertainty in decision-making.
The budgeting process, responsibility accounting and the effects of budgeting on motivation.
Accounting for long-term decisions: objectives of capital budgeting; the notion of time value of money; compounding and discounting.
Determination of a company’s cost of capital and capital rationing, methods of evaluating investment projects: Payback, ARR, NPV and IRR.
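The appraisal methods listed in the syllabus are straightforward to prototype in code as well as in a spreadsheet. A minimal sketch of NPV and IRR in Python (the cashflows are hypothetical, and the bisection assumes a conventional project whose NPV decreases as the rate rises, i.e. one sign change in the cashflows):

```python
def npv(rate, cashflows):
    """Net present value: cashflows[t] is received at the end of year t (t = 0 is now)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows, lo=0.0, hi=1.0, tol=1e-9):
    """Internal rate of return by bisection; assumes NPV crosses zero once in [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if npv(mid, cashflows) > 0 else (lo, mid)
    return (lo + hi) / 2

flows = [-1000, 400, 400, 400]        # hypothetical project: outlay, then three receipts
print(round(npv(0.10, flows), 2))     # -5.26 : reject at a 10% cost of capital
print(round(irr(flows) * 100, 1))     # 9.7  : IRR is just below 10%
```

The decision rule follows directly: accept when NPV at the cost of capital is positive, equivalently when the IRR exceeds the cost of capital.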
16. Assessment Strategy
The assessment will be in three parts. Formal assessment will comprise:
Assessment 1 (30%): Students will be presented with case study material that enables them to model management data in order to facilitate decision making. Students will be required to report on the results of their analysis.
Assessment 2 (30%): In-class tests. Students will be set a number of in-class tests focussing on Management Accounting.
Assessment 3 (40%): Integrated assignment, combining decision-making and project management.
17. Summary description of assessment items
Assessment Type | Description of item | % Weighting | Tariff | Week Due
CWK | Report on the findings of data analysis | 30% | - | 13
- | In-class tests | 30% | - | 19, 23, 27
CWK | Report on a case study involving decision-making and project management | 40% | - | 30
18. Learning and teaching
Learning and Teaching strategy for the module including approach to blended learning, students’ study responsibilities and opportunities for reflective learning/pdp
The teaching will consist of 2½ hour blocks per week, some of which will be in I.T. labs and some in classrooms. This will enable students to develop skills in the software appropriate to the area of the syllabus being covered.
19. Bibliography
Indicative bibliography and key on-line resources
Atrill, P. and McLaney, E. (2010) Management Accounting for Decision Makers, 6th Edition, Financial Times Press.
Bryman, A. and Cramer, D. (2011) Quantitative Data Analysis with IBM SPSS 17, 18 & 19 – A Guide for Social Scientists, Routledge.
Dewhurst, F. (2006) Quantitative Methods for Business and Management, 2nd Edition, McGraw-Hill.
Drury, C. (2009) Management Accounting for Business, 4th Edition, Cengage Learning EMEA.
Horngren, C., Bhimani, A., Datar, S. and Foster, G. (2012) Management and Cost Accounting, 5th Edition, Financial Times Press.
Oakshott, L. (2012) Essential Quantitative Methods, 5th Edition, Palgrave.
Proctor, R. (2009) Managerial Accounting for Business Decisions, 3rd Edition, Financial Times Press.
Render, B., Stair, R. and Hanna, M. (2012) Quantitative Analysis for Management, 11th Edition, Pearson.
Rowntree, D. (2004) Statistics Without Tears: An Introduction for Non-Mathematicians, Penguin Books.
Swift, L. and Piff, S. (2010) Quantitative Methods for Business, Management and Finance, 3rd Edition, Palgrave.
Wisniewski, M. (2009) Quantitative Methods for Decision Makers, 5th Edition, Financial Times/Prentice Hall.
20. Approved to run from: September 2012
21. Module multivalency
22. Subject Standards Board
Weekly Programme Lecture Topics – Seminar/Workshop/Practical details
(Week: lecture topic; seminar/workshop/practical; assessments in brackets)
Week 1: Introduction to the Module; setting up data files in SPSS
Week 2: Normal Distribution; producing graphs and descriptive statistics in SPSS
Week 3: Hypothesis testing: Chi-squared; Chi-squared using SPSS
Week 4: Hypothesis testing: significance tests for means; t-test for population mean, two-sample t-test using SPSS
Week 5: Confidence intervals; using SPSS for confidence intervals
Week 6: Correlation and Regression; using SPSS for bivariate correlation and regression
Week 7: Student Activity week
Week 8: Multiple Regression; using SPSS for multiple regression
Week 9: Interpreting Multiple Regression output
Week 10: Graphical Linear Programming; using Win QSB for LP
Week 11: Linear programming: shadow prices and sensitivity analysis; using Win QSB for LP
Week 12: Linear Programming: extension to more than 2 variables; using Win QSB for LP
Week 13: Project management; using MS Project (Coursework 1, 30%)
Week 14: Project management; using MS Project
Week 15: Project management; using MS Project
Week 16: Intro. to accounting for management decisions
Week 17: Cost behaviour and cost-volume-profit (CVP) analysis
Week 18: Product costing – traditional overhead costing and activity-based costing (ABC) systems
Week 19: Management Accounting Test One (Test 1, 10%)
Week 20: Student Activity week
Week 21: Budgeting I – budgeting process and the preparation of sales and operational budgets
Week 22: Budgeting II – preparation of cash budgets and the master budget
Week 23: Management Accounting Test Two (Test 2, 10%)
Week 24: Relevant costs and benefits for decisions
Week 25: Relevant information for operating decisions: short-term decisions with scarce resources
Week 26: Risk and uncertainty in decision-making
Week 27: Management Accounting Test Three (Test 3, 10%)
Week 28: Long-term decisions II – methods of investment appraisal and capital rationing
Week 29: Revision Week
Week 30: Final coursework (40%)
Essential Books/on line resources including Weblearn/Blackboard
For the first 15 weeks, the module booklet provides most of the required material. Students should supplement their reading of the module booklet with the recommended texts.
Required/Weekly Reading/Practice/on line resources including any Weblearn/Blackboard
Students will need to look at WebLearn on a regular basis (twice a week, say) for:
1. PowerPoint slides
2. Data files
3. Answers to selected exercises
4. Additional readings
Additional/Weekly Reading/Practice/on line resources including any Weblearn/Blackboard
This module is supported by Weblearn – students are advised to access the site on a regular basis, at least once a week
Module Assessment Details, including Assessment Criteria for all elements of the assessment, including any examination
· Well–structured, well-written reports for courseworks
· Appropriate use of software packages
· Accurate interpretations of the output from software packages
· Accurate calculations
Assessment criteria for individual assessments will be provided along with the assessment guidelines.
Assessment completion dates/deadlines
Coursework 1: Friday of week 13
Test 1: In class during week 19
Test 2: In class during week 23
Test 3: in class during week 27
Coursework 2: Friday of week 30
Please ensure that coursework is handed in at the Assessments Unit not later than 5pm on the due date.
Practical session 1
In order to learn how to use SPSS, we will base our exercises around the results of a survey using the following questionnaire. We will not be analysing all the results.
Survey of Seabridge Fitness & Sports Centre
The Centre’s management wants to ensure that its members get full value for money so is undertaking this survey. Included are a number of potential developments to the Centre, and we would appreciate your help in determining the nature and priority of these developments, as well as your opinion on other aspects of the Centre. Please complete this questionnaire (which is both confidential & anonymous) and post it in the box at the reception desk.
1. Which sport/activity have you taken part in during this visit to the Centre?
Tick one box only (main sport/activity)
Swimming 1 Keep Fit/Aerobics 2 Judo 3
Badminton 4 Basketball 5 Gym Training 6
Specialist classes: Pilates 7 Alexander Technique 8
2. What is your main reason for taking part in this Sports/Activity?
Tick one box only
To get fit 1 Social 2
Competition 3 Skill development 4
3. What is your opinion on the range of Sports/Activities offered at the Centre? Tick one box only
Very good 1 Good 2 Average 3 Poor 4 Very poor 5
4. We are thinking of introducing a number of changes at the Centre and need to identify priorities.
Please show your preferences by ticking the two most important developments
Offering hot meals in the café area 1
Building a sauna adjacent to the swimming pool 2
Introducing longer opening hours (7am – 11pm) 3
Introducing family membership scheme 4
Introducing weekday only membership 5
Expanding the range of bookings for team sports 6
Expanding the range of specialist classes 7
Other please state …………………………………………………. 8
5. Are you: Male 1 Female 2
6. How old are you ________________years?
7. When do you normally attend the Centre?
Tick one box only, for each of a) & b) below
a) Time of day: Morning 1 Afternoon 2 Evening 3
b) Day: Weekday 1 Weekend 2
8. How long have you been a member of the Centre?
Please enter the length of your membership to the nearest number of years ______________
9. Please rate your fitness levels when you joined the Centre and now:
Tick one box only, for each of a) & b) below
Scale: Very Good 1, Good 2, Fairly Good 3, Average/Moderate 4, Fairly Poor 5, Poor 6, Very Poor 7
a) Fitness when I joined the Centre 1 2 3 4 5 6 7
b) Fitness now 1 2 3 4 5 6 7
10. What was the main reason for joining the Centre?
Tick one box only
Location of the Centre 1
Range of facilities available 2
Recommended by a friend or relative 3
Membership rates 4
Other please state …………………………………………………. 5
11. Please rate each aspect of service quality:
Tick one box only, for each of a) & b) below
Scale: Very Good 1, Good 2, Average 3, Poor 4, Very Poor 5
a) Helpfulness of the reception staff 1 2 3 4 5
b) Cleanliness of the changing rooms 1 2 3 4 5
12. Please use the space below to add any comments about the Centre.
Thank you for completing this questionnaire. Please post it in the box at the reception desk.
[Thanks to Richard Charlesworth for letting us use this questionnaire and associated data file ]
Janet Geary 2012 Page 2
Questionnaire responses:
Columns: Resp, Sport (Q1), Reason (Q2), Variety (Q3), Changes1 (Q4), Changes2 (Q4), Gender (Q5), Age (Q6), Time (Q7a), Day (Q7b), Howlong (Q8), Fitness1 (Q9a), Fitness2 (Q9b), Join (Q10), Helpfulness (Q11a), Cleanliness (Q11b). Missing-value codes: 999 = no response, 888 = multiple boxes ticked.
1 6 3 2 3 4 2 29 2 1 2 2 2 3 2 3
2 1 1 2 1 6 1 34 1 1 1 5 5 5 3 2
3 7 1 4 2 7 2 44 1 1 2 7 4 1 1 1
4 3 4 2 3 4 2 42 1 1 2 3 3 4 5 4
5 5 2 4 3 5 1 28 1 2 4 2 3 5 4 4
6 1 1 2 5 6 2 36 1 1 2 4 3 5 4 5
7 5 3 1 1 5 1 47 3 1 3 3 2 5 5 5
8 2 1 3 2 8 2 19 1 1 1 6 5 1 4 3
9 5 4 2 1 4 2 27 1 2 2 5 4 4 3 3
10 4 1 3 2 8 2 999 2 1 4 4 2 1 2 3
11 6 2 1 1 3 1 37 2 1 3 4 2 5 4 4
12 3 4 1 6 7 1 24 1 1 1 4 2 5 3 5
13 4 4 4 4 7 1 52 2 2 3 7 3 3 4 3
14 2 2 2 3 4 2 35 3 1 2 3 4 888 1 2
15 1 2 4 5 7 1 31 3 1 2 6 5 2 3 4
16 4 3 2 3 5 1 23 3 2 1 1 1 3 1 2
17 5 1 4 3 8 2 32 2 1 2 4 2 3 4 4
18 3 3 3 2 3 2 21 3 1 1 1 2 1 2 2
19 3 2 2 1 6 2 33 2 1 2 3 2 3 3 4
20 6 3 3 2 8 1 23 1 2 2 4 3 5 2 2
21 4 4 3 5 7 2 38 1 1 3 6 4 4 4 4
22 7 1 4 4 6 2 51 3 2 3 5 5 1 3 4
23 4 3 2 5 6 1 30 1 1 3 6 4 2 1 1
24 1 1 3 6 8 2 46 1 2 3 3 2 2 4 3
25 5 3 3 4 5 2 31 1 1 2 3 3 2 1 2
26 1 3 2 1 3 2 42 2 2 2 2 4 4 1 2
27 1 3 3 4 6 2 46 1 2 4 4 2 2 2 2
28 5 4 3 4 7 2 25 3 1 2 2 2 1 3 2
29 7 2 1 6 8 2 61 2 1 1 2 2 3 5 5
30 5 888 5 5 8 1 27 2 2 2 4 3 4 2 3
31 2 1 2 2 2 1 38 3 2 4 5 4 3 1 1
32 5 2 4 3 5 1 41 1 1 4 4 4 2 2 1
33 2 4 3 6 8 2 37 1 2 2 4 2 5 2 2
34 6 4 2 1 7 1 37 1 2 1 2 2 2 3 1
35 1 2 3 1 5 1 57 1 2 3 5 3 3 2 3
36 7 1 3 2 3 2 39 1 1 1 7 5 2 1 2
37 3 4 3 3 6 1 22 3 2 2 1 1 2 3 4
38 6 4 3 3 5 1 19 2 1 3 5 3 4 1 1
39 2 1 5 2 7 1 43 2 1 1 3 3 5 2 3
40 2 2 3 1 6 2 68 2 1 2 7 4 4 2 3
41 2 1 4 6 7 2 43 2 2 3 5 2 4 3 2
42 4 1 4 2 7 1 45 3 2 2 4 2 2 2 2
43 6 3 3 1 2 1 38 1 2 1 3 3 2 2 1
44 5 2 4 1 6 1 40 1 2 4 3 2 5 3 3
45 5 2 3 4 4 1 17 1 2 2 7 5 5 2 3
46 4 2 1 4 5 2 34 2 1 1 2 1 1 3 3
47 6 3 3 1 8 1 30 2 1 3 1 3 1 3 4
48 7 3 2 2 8 1 65 3 2 4 1 1 4 1 1
49 6 3 2 4 8 2 33 3 1 2 3 3 4 2 2
50 1 4 3 1 2 2 24 3 2 1 3 2 3 5 4
51 2 3 2 5 6 2 48 2 1 2 2 1 5 2 3
52 8 3 3 1 3 1 999 1 2 3 5 4 1 2 2
53 7 2 2 6 8 1 27 3 2 1 5 3 999 2 3
54 6 2 4 2 5 1 29 1 1 2 5 4 2 2 1
55 8 2 2 4 8 2 37 3 1 4 3 1 3 4 3
56 7 1 1 1 7 1 32 2 1 4 6 3 2 5 4
57 1 4 2 4 8 2 55 2 2 3 6 5 4 2 1
58 6 1 5 2 3 2 30 2 2 1 3 3 3 3 2
59 7 4 1 1 5 2 39 3 1 2 5 3 2 4 3
60 6 3 3 2 8 2 18 3 1 4 4 3 3 1 2
Tasks required in the first practical session.
1. Load the Excel data file seabridge.xls into SPSS
2. Set up the Variable View page
3. Save the file as Seabridge.sav
[Thanks to Richard Charlesworth for the notes for this session]
Using SPSS v18/19 to analyse survey data
These notes give an introduction to using SPSS (Statistical Package for the Social Sciences) for survey analysis. They are based around the questionnaire ‘Survey of Seabridge Fitness & Sports Centre’ and the survey data.
1. Entering data into SPSS
Data entry
Data can be entered in three ways:
(i) directly within SPSS (via the data editor);
(ii) copied from a spreadsheet (e.g. Excel) or Word table, and pasted into the SPSS data editor.
(iii) imported to SPSS from a previously created spreadsheet file (e.g. Excel).
Methods (i) and (ii) are the easiest and are described below.
Data entry using the SPSS data editor
If data is to be entered directly within SPSS, click on ‘Type in data’ (see Figure 1), then the ‘Data View’ tab at the bottom of the screen. This takes you into the SPSS data editor, which has the same format as a spreadsheet, with each column representing a different variable (a question) and each row the record of a different respondent (or ‘case’). Note, however, that from time to time it is necessary to use more than one column for a single question. For example, if multiple responses are permitted/required, a separate column is needed to record each response (e.g. see Q4 in the ‘Seabridge Fitness & Sports Centre’ questionnaire, which requires two answers).
Figure 1 – entering data directly into SPSS
Data entry by copying data from a spreadsheet or Word table
If we choose to simply copy data from a previously prepared spreadsheet such as Excel or a Word table to the SPSS data editor, remember to paste the data into the ‘Data View’ page, and not the ‘Variable View’ page. It is important to paste only the numerical data and not any variable names, which might also have been entered on the spreadsheet or table. SPSS will then automatically assign default variable names VAR00001, VAR00002 etc.
2. Defining the variables
Before undertaking any analysis it is important to fully define each variable. This is carried out on the ‘Variable View’ page. In Figure 2 the variable names have already been added – respondent, sport, reason etc, (note that ‘respondent’ is simply the questionnaire (or respondent/user number) whereas the remaining variables relate to specific questions). We are in the process of defining the variable ‘sport’: the ‘(Variable) Label’ enables the user to give a brief description so the variable ‘sport’ represents ‘Sport or activity undertaken’; similarly we can define the meaning of each of the possible numerical responses under ‘Value Labels’, so a response ‘1’ represents ‘Swimming’, ‘2’ represents ‘Keep Fit/Aerobics’ etc; under ‘Measure’ the user has already confirmed that ‘Sport’ is a ‘nominal’ (or categorical) variable.
Figure 2 – defining the variable ‘Sport’
Note with large questionnaires it is often helpful to insert the question number at the beginning of the ‘(Variable) Label’. So here we have entered ‘Q1 Sport or activity undertaken’ etc. This makes it easier to locate specific questions when conducting analyses.
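Outside SPSS the same idea, attaching value labels to numeric codes, is just a lookup table. A sketch in Python for Q1, with the labels taken from the questionnaire above:

```python
# Q1 'Sport or activity undertaken': value labels as defined on the Variable View page.
SPORT_LABELS = {1: "Swimming", 2: "Keep Fit/Aerobics", 3: "Judo", 4: "Badminton",
                5: "Basketball", 6: "Gym Training", 7: "Pilates", 8: "Alexander Technique"}

responses = [6, 1, 7, 3]  # coded answers from four respondents
print([SPORT_LABELS[r] for r in responses])  # ['Gym Training', 'Swimming', 'Pilates', 'Judo']
```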
The ‘Missing’ category enables us to record the code used for any missing values. For example, respondent No. 10 has failed to divulge his/her age, recorded here with a code of 999. Figure 3 shows ‘999’ being defined as the code for a missing response to Question 6 (‘Age’).
Figure 3 – defining 999 as the code for a missing value for Question 6 (‘age’)
Note that SPSS enables us to define up to three distinct missing value codes. The reason for this is that in addition to genuinely missing responses, we may choose to treat some ‘legitimate’ responses as though they were missing.
For instance, respondent number 14 has ticked more than one box for Question 10 (main reason for joining the Centre). Two or more reasons may have persuaded this respondent to join the Centre, so although this response may technically be a correct and honest reply, the ticking of only one box was requested (‘the main reason…’), and it would be inappropriate for us to choose one of these replies to code at the expense of the other, or to count them both. Instead we can introduce a second missing value code (here 888) to record multiple responses.
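The same convention can be applied when cleaning the raw data outside SPSS, e.g. in Python (a sketch; 999 and 888 are the codes used in this data set):

```python
MISSING_CODES = {999, 888}  # 999 = no response, 888 = multiple boxes ticked

def clean(value):
    """Map missing-value codes to None, mirroring the SPSS 'Missing' definitions."""
    return None if value in MISSING_CODES else value

ages = [29, 34, 999, 42]         # respondent 10's age is coded 999 (not divulged)
print([clean(a) for a in ages])  # [29, 34, None, 42]
```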
When you have finished defining all the variables, save the file. We will be using this file next week.
Topic 1 Normal Distribution
Probability Distributions
Consider the case where we have a number of customers and we have recorded their ages.
If we take age groups of 10 years and use probabilities rather than frequencies we get the histogram:
[the probability is the frequency of each group divided by the total frequency; the probabilities add to 1]
If we take age groups of 5 years we get:
With age groups of 2 years we get:
If we continue this process, we can draw a curve rather than a histogram. The curve will join the midpoints of the tops of the columns.
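The step from frequencies to probabilities can be sketched in a few lines of Python (the ages here are hypothetical; each class probability is the class frequency divided by the total frequency):

```python
from collections import Counter

ages = [23, 27, 28, 31, 34, 36, 42, 44, 47, 52]  # hypothetical customer ages
groups = Counter((a // 10) * 10 for a in ages)   # 10-year classes: 20-29, 30-39, ...
total = sum(groups.values())
probs = {g: n / total for g, n in sorted(groups.items())}

print(probs)  # {20: 0.3, 30: 0.3, 40: 0.3, 50: 0.1}
```

Summing the probabilities over all classes gives 1, exactly as the histogram argument requires.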
A continuous random variable, such as age measured to the nearest day, hour, or minute, has an infinite number of possible values, and it is not possible to list every one with its associated probability. Furthermore, the probability of each individual value will be so small that it must be considered approximately equal to zero. As a result we can only consider the probability that a value lies within a particular interval. In the graph above we have used classes such as 22 up to 24 to represent an interval.
For example, when measuring heights, the probability that an individual is exactly 1.63 metres tall is very small, but the probability that someone has a height between 1.625 and 1.635 metres is much easier, and more sensible, to determine.
The Normal Distribution is one of the most important in Statistics, not only because many data distributions conform to this pattern, but also because the Normal Distribution underpins the subject of statistical inference.
The Normal Distribution curve is shown below.
The total area under the curve represents the total probabilities, thus the total area equals 1.
Properties of the Normal Distribution
The Normal Curve has the following properties:
· It is symmetrical about the mean.
· The curve is bell-shaped.
· The total area under the curve equals 1
· 68.26% of the area under the curve lies within 1 standard deviation from the mean.
· 95.44% of the area under the curve lies within 2 standard deviations from the mean.
· 99.73% of the area under the curve lies within 3 standard deviations from the mean.
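These coverage figures can be checked numerically: for any Normal distribution, P(μ − kσ < X < μ + kσ) = erf(k/√2). A sketch using Python's math.erf, which agrees with the percentages above to within rounding:

```python
import math

def within(k):
    """Probability that a Normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within(k) * 100, 2))  # approx. 68.27, 95.45, 99.73
```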
The exact shape and position of the Normal curve depends upon the values of the mean and standard deviation. As Normal Distributions have an infinite combination of means and standard deviations, there is a problem in compiling tables for the probabilities.
This problem is solved by standardising the variables.
The formula below takes x values (actual measurements) and converts them into z values (for use in tables).
z = (x – μ) / σ, where μ is the mean and σ is the standard deviation.
z represents the number of standard deviations between x and the mean.
If actual measurements, x, are converted into z scores by use of the formula, then a Normal Distribution with mean μ and standard deviation σ is transformed into a Normal Distribution with mean 0 and standard deviation 1.
Example:
The examination marks for a large group of students followed a Normal Distribution. The mean mark was 50 and the standard deviation was 10 marks.
What percentage of students gained marks in the range:
a) Between 40 and 60
b) Between 30 and 70
c) above 70
a) mean = 50, s.d. = 10
x = 40: z = (40 – 50)/10 = –1, i.e. 1 standard deviation below the mean.
x = 60: z = (60 – 50)/10 = +1, i.e. 1 standard deviation above the mean.
Thus a value between 40 and 60 is equivalent to a value between 1 standard deviation above and 1 standard deviation below the mean.
Using the ‘Properties of the Normal Distribution’ above, the area is 68.26%
Thus we would expect 68.26% of students to gain marks in the range 40 to 60.
b) Between 30 and 70
mean = 50, s.d. = 10
x = 30: z = (30 – 50)/10 = –2, i.e. 2 standard deviations below the mean.
x = 70: z = (70 – 50)/10 = +2, i.e. 2 standard deviations above the mean.
Using the ‘Properties of the Normal Distribution’, the area between 2 standard deviations above and below the mean is 95.44%
Thus we would expect 95.44% of students to gain marks in the 30 to 70 range.
c) Above 70
From part b) of this question we found that 95.44% of students will gain marks in the range 30 to 70. This leaves 4.56% either above 70 or below 30. As the Normal curve is symmetrical, 2.28% will be above 70 and 2.28% will be below 30.
Thus we would expect 2.28% of students to gain marks above 70.
In this example, we could use the information from the ‘Properties of the Normal Distribution’ as the x values gave z values of 1 and 2. In most examples the z values will not be integers and we must use tables to find the areas and thus determine the probabilities.
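The worked example can also be reproduced without tables using the error function: for X ~ Normal(μ, σ), the upper-tail probability is P(X > x) = ½·erfc((x − μ)/(σ√2)). A sketch checking parts a) to c):

```python
import math

def p_above(x, mu, sigma):
    """Upper-tail probability P(X > x) for X ~ Normal(mu, sigma)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))

mu, sigma = 50, 10
a = p_above(40, mu, sigma) - p_above(60, mu, sigma)  # P(40 < X < 60)
b = p_above(30, mu, sigma) - p_above(70, mu, sigma)  # P(30 < X < 70)
c = p_above(70, mu, sigma)                           # P(X > 70)
print(round(a * 100, 2), round(b * 100, 2), round(c * 100, 2))  # approx. 68.27 95.45 2.28
```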
Use of tables for the Normal Distribution
There are different ways of expressing tables for the Normal Distribution. These notes refer to the tables that follow on the next page.
The tables give the area in the right-hand tail. This area represents the probability that a random variable is greater than z.
For example, the probability that z > 1.35 is 0.0885.
Tables for the Normal Distribution
(right-hand-tail areas, P(Z > z); rows give z to one decimal place, columns the second decimal place)
z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1  0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2  0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3  0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4  0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5  0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6  0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7  0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
0.8  0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9  0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0  0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1  0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2  0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3  0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4  0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5  0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6  0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7  0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8  0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0  0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
2.1  0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
2.2  0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.3  0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
2.4  0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
2.5  0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
2.6  0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
2.7  0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
2.8  0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
2.9  0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
3.0  0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
3.1  0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
3.2  0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
3.3  0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
3.4  0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
3.5  0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002
3.6  0.0002 0.0002 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
3.7  0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
When dealing with problems involving the use of Normal tables, it is advisable to draw sketches to identify the required areas.
1: Find the probability that z is greater than 1.
From the tables, looking up z = 1.00 gives 0.1587. Thus P( z > 1 ) = 0.1587
2: Find the probability that z is less than 2.01
Looking up z = 2.01 gives 0.0222 as the area above z = 2.01.
The required area is 1 – 0.0222 = 0.9778. Thus P( z < 2.01 ) = 0.9778
3: Find the probability that z is between 0.6 and 2.01
The required area is shown.
From tables, z = 0.6 gives 0.2743 and z = 2.01 gives 0.0222.
The required area is 0.2743 – 0.0222 = 0.2521, the difference between the area above 0.6 and the area above 2.01. Thus P( 0.6 < z < 2.01 ) = 0.2521
4: Find the probability that z is less than -1.5
From tables we cannot look up z = -1.5 directly as only positive values are listed.
However we can look up z = + 1.5 to find the area.
As the Normal curve is symmetrical, the area below z = -1.5 is the same as the area above z = + 1.5.
From tables z = 1.5 gives 0.0668
Thus the area below z = -1.5 is 0.0668. Thus P( z < -1.5 ) = 0.0668.
5. Find the probability that z lies between -1.5 and 0.6
From tables z = 0.6 gives 0.2743 From tables z = 1.5 gives 0.0668
The required area lies between z = -1.5 and z = 0.6
The required area is 1 – 0.2743 – 0.0668 = 0.6589. Thus P(-1.5 < z < 0.6) = 0.6589
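These table look-ups can also be reproduced by direct computation. Below is a minimal Python sketch (illustrative only, not part of the course software) using the standard library's complementary error function for the upper-tail area:

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard Normal variable: 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Reproducing the worked look-ups above:
print(round(upper_tail(1.35), 4))                        # 0.0885, the table value for z = 1.35
print(round(1 - upper_tail(2.01), 4))                    # 0.9778 = P(z < 2.01)
print(round(upper_tail(1.5), 4))                         # 0.0668 = P(z < -1.5), by symmetry
print(round(1 - upper_tail(1.5) - upper_tail(0.6), 4))   # 0.6589 = P(-1.5 < z < 0.6)
```

Rounded to four decimal places these agree with the tabulated values used in examples 1 to 5.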
Worked Example:
The mean weight of 500 schoolboys is 55 kg and the standard deviation is 4 kg.
Assuming that the weights are Normally distributed, find the percentages of boys that are expected to weigh:
a) more than 58 kg
b) between 58 and 60 kg
c) between 50 and 65 kg
μ = 55 σ = 4
a) more than 58 kg
x = 58, so z = (58 – 55)/4 = 0.75
From tables z = 0.75 gives 0.2266. Thus the required area is 0.2266
Thus we would expect 22.66% of the boys to weigh more than 58 kg
b) between 58 and 60 kg
x = 58, so z = 0.75 and tables give 0.2266. Thus Area 1 (above 0.75) = 0.2266
x = 60, so z = (60 – 55)/4 = 1.25 and tables give 0.1056. Thus Area 2 (above 1.25) = 0.1056
The area between z = 0.75 and z = 1.25 is Area 1 – Area 2
0.2266 – 0.1056 = 0.1210
Thus we would expect 12.1 % of the boys to weigh between 58 and 60 kg.
c) between 50 and 65 kg
x = 50, so z = (50 – 55)/4 = –1.25
From tables z = 1.25 gives 0.1056. Thus Area 1 (below –1.25) = 0.1056
x = 65, so z = (65 – 55)/4 = 2.5
From tables z = 2.5 gives 0.00621. Thus A2 (above 2.5) = 0.00621
The required area is Total Area – A1 – A2 = 1 – 0.1056 – 0.00621 = 0.88819
Thus we would expect 88.82% of the boys to weigh between 50 and 65 kg
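The three parts of this worked example can be checked by direct computation; here is a hedged Python sketch (illustrative only — tiny differences from the 4-decimal table values are rounding, e.g. part c comes out at 88.81% rather than 88.82%):

```python
import math

def upper_tail(z):
    # P(Z > z) for the standard Normal
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma = 55, 4
z = lambda x: (x - mu) / sigma   # standardise a weight x

p_a = upper_tail(z(58))                           # a) more than 58 kg
p_b = upper_tail(z(58)) - upper_tail(z(60))       # b) between 58 and 60 kg
p_c = 1 - upper_tail(-z(50)) - upper_tail(z(65))  # c) between 50 and 65 kg

print(f"a) {p_a:.2%}  b) {p_b:.2%}  c) {p_c:.2%}")
```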
Example: The weekly output of a production line varies according to the Normal distribution with a mean of 1,500 units and a standard deviation of 120 units.
The manager of the factory wants to know what the production output is for 95% of the time.
To answer this question:
We wish to find the limits of production output that encloses 95% of the area.
If we take this 95% symmetrically, we are left with two tails of 2.5% each.
Thus the area in each tail = 2.5% or 0.025
Scanning the tables for an area of 0.025, we find the z value to be 1.96.
Thus 95% of the area lies between z = -1.96 and z = + 1.96.
We can now use this statement to solve equations. Mean = 1500 sd = 120
Upper limit:
z = (x – mean)/s.d. = (x – 1500)/120 = 1.96
x – 1500 = 1.96 × 120 = 235.2
x = 1500 + 235.2 = 1735.2

Lower limit:
z = (x – mean)/s.d. = (x – 1500)/120 = –1.96
x – 1500 = –1.96 × 120 = –235.2
x = 1500 – 235.2 = 1264.8
Thus 95% of the time, the production output will be between 1265 and 1735 units a week.
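The two symmetric equations above reduce to mean ± 1.96 × s.d.; a quick Python check (illustrative only):

```python
mu, sigma = 1500, 120
z = 1.96   # cuts off 2.5% in each tail of the standard Normal

lower, upper = mu - z * sigma, mu + z * sigma
print(f"95% of weekly output lies between {lower:.1f} and {upper:.1f} units")
```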
Seminar Questions:
1. A thousand candidates sat an examination, the results of which were Normally distributed with a mean mark of 50 and a standard deviation of 10 marks.
How many candidates would be expected to score:
a) less than 75 marks
b) less than 25 marks
c) more than 60 marks
d) a grade C ( i.e. 50 to 59 inclusive)
2. A machine in a factory produces components whose lengths are Normally distributed with
mean = 102 mm and standard deviation = 1.5 mm.
a) Find the probability that, if a component is selected at random and measured, its length will be
i) Less than 100 mm
ii) Greater than 104 mm
b) If a component is only acceptable when its length lies in the range 100 mm to 104 mm, find the percentage of acceptable components.
3. As a result of tests on light bulbs, it was found that the life-time of a particular make of bulb was distributed Normally with an average life of 2040 hours and a standard deviation of 60 hours.
What percentage of bulbs are expected to last :
a) for more than 2150 hours
b) for more than 1960 hours
4. As a result of the introduction of Income Self-Assessment, at the end of the financial year, taxpayers may either have overpaid or underpaid their tax. Those taxpayers who have overpaid are entitled to a refund, whilst those who have underpaid owe the Tax Office money.
Past experience leads the Tax Office to believe that the amounts to be refunded or owed are Normally distributed.
This year, for one category of taxpayer, the mean of these amounts was a refund of 607, with a standard deviation of 320.
What proportion of taxpayers is entitled to a tax refund greater than 1,100?
What proportion of taxpayers owe money to the Tax Office?
What proportion of taxpayers is entitled to a refund of between 50 and 250?
For a separate class of taxpayers, the standard deviation is unknown. The mean refund for this class is 550, and 71.9% of these taxpayers have a refund greater than 300.
What is the standard deviation for this class of taxpayer?
5. A box of breakfast cereal states that it weighs 500 grams. This is the nominal weight.
When the person responsible for quality control in the company that produces the cereal measured a sample of 100 boxes of cereal, the mean weight was 504 gms and the standard deviation was 2 gms. These values were exactly correct according to the company's policy.
Why would the company plan to have a mean weight greater than the nominal weight?
What is the probability that a customer, buying one packet of cereal, would buy a packet:
a) weighing more than 503 gms
b) weighing below the nominal weight
Answers
1 a) 993.79 b) 6.21 c) 158.7 d) 315.9
2 a) i) 0.0918 ii) 0.0918 b) 81.64%
3 a) 3.36% b) 90.82%
4 a) 6.18% b) 2.87% c) 8.68% d) 431
5 a) 0.6915 b) 0.02275
Practical session 2 [Using the file seabridge.sav with all variables defined]
1. Producing descriptive statistics in SPSS
2. Drawing graphs in SPSS
Data summary & analysis [Thanks to Richard Charlesworth for some of these notes on SPSS]
Figure 4 shows the basic command sequence for an elementary statistical analysis. Click first on ‘Analyze’, then ‘Descriptive Statistics’ then ‘Frequencies’
Figure 4 (above) – using ‘Analyze’ to provide some elementary statistical analysis
Figure 5 (below) – requesting an analysis of the variable ‘sport’
In Figure 5 the researcher has requested an analysis of the variable ‘Sport’ and samples of output are in Figures 6 & 7. Had we wished to do so, we could have requested the same analysis of several variables, rather than just ‘Sport’. Note the left-hand window records the output requested so far – useful for quickly locating earlier analyses.
Figure 6 (above) – a frequency table of ‘Sport’
Figure 7 (below) – a bar chart of ‘Sport’
To produce descriptive statistics for age
The output produced is
Descriptive Statistics
                        N    Minimum   Maximum   Mean    Std. Deviation
Q6 age of respondent    58     17        68      36.19      11.607
Valid N (listwise)      58
4 Incorporating output in a document
The focus of these notes is on interacting with SPSS, and hence the figures illustrate the screen display. However, SPSS often produces more output than is needed (so you need to be selective); moreover this is not always in an appropriate style for inclusion in a document such as a dissertation or report.
Tables and graphs can easily be copied into a document using copy and paste commands. SPSS tables are copied to a Word document as a Word table. You may wish to reformat aspects of the table (e.g. readjusting column widths, changing lines/borders etc.) to get closer to the preferred style, and there is the added benefit that you can edit out any redundant or irrelevant information. SPSS graphs are copied as a picture, which can be adjusted for size and position to fit in the document.
5 Saving SPSS files
Remember to save both the data and output files before exiting SPSS (you will be reminded of this by SPSS if you fail to do so). In Figure 8 the user has given the name ‘Seabridge Fitness & Sports Centre survey’ to the output file; the full filename is ‘Seabridge Fitness & Sports Centre survey.spv’. Similarly if the data file is also called ‘Seabridge Fitness & Sports Centre survey’, the full filename is ‘Seabridge Fitness & Sports Centre survey.sav’.
Figure 8 – the output file is saved as ‘Seabridge Fitness & Sports Centre .spv’
Topic 2: Hypothesis Testing: Chi-Squared test
Chi-square test of association
This test is one of the most important tests used in survey analysis. It is used to test whether two types of classification (i.e. answers to two questions) are statistically independent.
The test is extremely useful in management research and can be used to explore whether, for instance, there is any relationship between gender and management grade, gender and motivation, stress and salary level etc.
In a survey, airline passengers were asked to answer the question "How did you book your flight?" and were also asked for their gender. The chi-squared test allows us to test whether there is any significant difference in the responses of the male and female passengers.
The resulting contingency table was:
                              gender
method of booking      male (1)   female (2)   Total
internet (1)              14          15         29
telephone (2)             16          24         40
travel agent (3)          22           9         31
Total                     52          48        100
Looking at this table, it would seem that women were more likely than men to use the telephone and that men were more likely to use travel agents. However, we need to perform the chi-squared test to see if these differences are statistically significant.
The chi-square test compares the 'observed' results with those that would be 'expected' if the two classifications were independent. The expected results are calculated by the formula:
Expected = (row total × column total) / grand total
As an example, the number of males who use the internet was observed to be 14.
The expected value for this cell:
Expected = (row total × column total) / grand total = (29 × 52) / 100 = 15.08
The table of observed and expected values is shown below:
                              gender
method of booking      male          female        Total
internet               14 [15.08]    15 [13.92]      29
telephone              16 [20.80]    24 [19.20]      40
travel agent           22 [16.12]     9 [14.88]      31
Total                  52            48             100
Formally we test whether our null hypothesis of independence is supported by the evidence of our survey. If not, we will reject the null hypothesis in favour of the alternative hypothesis which claims that method of booking and gender are not independent.
Formally:
Null hypothesis H0: method of booking is independent of gender
Alternative hypothesis H1: method of booking is not independent of gender
The overall comparative measure is provided by the test statistic:
χ² = Σ (o – e)² / e
This is calculated by:
observed, o   expected, e    o – e     (o – e)²   (o – e)²/e
    14           15.08       –1.08      1.1664      0.0773
    15           13.92        1.08      1.1664      0.0838
    16           20.80       –4.80     23.0400      1.1077
    24           19.20        4.80     23.0400      1.2000
    22           16.12        5.88     34.5744      2.1448
     9           14.88       –5.88     34.5744      2.3235
                                        total       6.9371
In this case χ2 = 6.9371
The degrees of freedom, df, can be found from the formula:
df = (number of rows-1) × (number of columns-1)
Note: Do not count the “totals” in the number of rows and columns.
In this case df = (3 – 1) × (2 – 1) = 2
Testing at a level of significance of 5%, the critical value of the chi-square distribution (with 2 df) is 5.991 (see tables at the end of this section). We compare our calculated test statistic with the tabulated value. The critical value is exceeded by the test statistic as can be clearly seen in the graph below.
The test statistic falls into the rejection (or critical) region (5.991 and above), and consequently we reject the null hypothesis that the method of booking and gender are independent. We therefore take this test result as evidence that there is an association between the method of booking and gender.
As the largest relative difference is for the use of travel agents, we can say that men are more likely to book through travel agents and women less likely to do so.
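The whole calculation, from observed counts to the test statistic, can be sketched in a few lines of Python (illustrative only; the data are those of the booking-method table above). As it happens, for 2 degrees of freedom the tail probability has a closed form, exp(–χ²/2), which reproduces the significance value SPSS reports below:

```python
import math

observed = [[14, 15],   # internet:     male, female
            [16, 24],   # telephone
            [22, 9]]    # travel agent

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # expected = row total x column total / grand total
        e = row_totals[i] * col_totals[j] / grand
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
p = math.exp(-chi2 / 2)   # exact tail area, valid for df = 2 only

print(round(chi2, 3), df, round(p, 3))   # 6.937, df = 2, p = 0.031
```

As 6.937 exceeds the critical value 5.991 (equivalently, p = 0.031 is below 0.05), the sketch reaches the same rejection of H0 as the hand calculation.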
The relevant SPSS printout is:
Case Processing Summary
                         Cases
           Valid          Missing        Total
           N    Percent   N    Percent   N    Percent
B6 * C3    100  100.0%    0    .0%       100  100.0%
B6 * C3 Crosstabulation
                                C3
                           1       2       Total
B6    1   Count            14      15       29
          Expected Count   15.1    13.9     29.0
      2   Count            16      24       40
          Expected Count   20.8    19.2     40.0
      3   Count            22      9        31
          Expected Count   16.1    14.9     31.0
Total     Count            52      48       100
          Expected Count   52.0    48.0     100.0
Chi-Square Tests
                               Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square             6.937    2    .031
Likelihood Ratio               7.109    2    .029
Linear-by-Linear Association   3.204    1    .073
N of Valid Cases               100
a 0 cells (.0%) have expected count less than 5. The minimum expected count is 13.92.
The value for ‘Pearson Chi-Square’ is given as 6.937.
Interpretation of the output:
This gives a calculated value of chi-squared of 6.937, which can be compared with the critical value of 5.9915 (from statistics tables using a 5% level of significance). As the calculated value is greater than the critical value, it lies in the 'reject' region of the curve (i.e. the area to the right of our calculated value is less than 5%). SPSS gives us the area to the right of the calculated value as 0.031. We could have used this directly to see that the calculated value of chi-squared lies in the tail, and thus we do not have to use statistics tables.
If the value in the last column for Pearson Chi-Square was greater than 5% (0.05), we would have concluded that the method of booking and gender were independent at the 5% level of significance.
In this example, we would conclude that there is evidence of an association between gender and method used to book the flight.
Some limitations of the chi-square test
There are some limitations of the chi-square test.
· the test identifies only the presence of relationships, not the focus or direction;
· the statistical properties of the test require that:
· the expected values must be ≥ 5. If this is not the case, appropriate rows or columns must be pooled together; however in some cases we simply do not have enough data (or density of data throughout the table), and combining rows and columns results in collating data into meaningless overarching classes rendering the test of little or no value.
Note: The often stated requirement above, that all expected values must be ≥ 5, is a safety first measure. In practice, we can accommodate up to 20% of the cells having expected values of between 1 and 5 (notice that the SPSS output provides a reminder of this). In the event of too many cells containing small expected frequencies we would need to combine rows or columns as detailed above.
· the χ2 formula needs to be modified slightly for 2 by 2 tables. We use the formula:
χ² = Σ ( |o – e| – 0.5 )² / e
This involves subtracting 0.5 from the absolute difference between o and e, before squaring the result and then proceeding as before. The problem arises because we are using the continuous chi-square distribution to approximate to the discrete cell frequencies; with a small number of cells (that is 2 by 2 tables) this approximation needs a ‘continuity correction’. In fact, if the o and e values are quite large, the original uncorrected result will be close to the adjusted value; however the problem becomes more marked when the frequencies are small, so it is wise to always use the modified formula with 2 by 2 tables. Notice that SPSS provides the continuity correction when a 2 by 2 table is analysed.
The test assumes a null hypothesis that ‘gender’ and ‘method of booking’ are statistically independent, and expected frequencies for each of the categories are calculated. These are compared with what actually was recorded in the survey, and a test statistic is computed. The value of the relevant test statistic above is 6.937, which would occur by chance with a probability of 0.031 (or 3.1%) when ‘gender’ and ‘method of booking’ are statistically independent.
In such testing we typically assume a probability value of less than 0.05 (or 5%) is evidence that the null hypothesis is incorrect. In this case we therefore have sufficient evidence to reject the null hypothesis and conclude that 'gender' and 'method of booking' are not independent. So males and females have used different methods of booking.
The chi-square test is used here because both variables are nominal. If both were ordinal the chi-square test could still be used.
Yates Correction for 2×2 Contingency Tables:
This is used for 2 by 2 contingency tables when the total number of items in the table is relatively small (say, < 50). With a 2×2 table the degrees of freedom = (r–1)×(c–1) = 1.
Example:
42 people were asked about their voting intentions in a forthcoming referendum on the single currency. The results are shown below:
           would vote for   would vote against   total
female           12                  7             19
male              8                 15             23
total            20                 22             42

With the expected values shown in brackets:

           would vote for   would vote against   total
female       12 (9.05)           7 (9.95)          19
male          8 (10.95)         15 (12.05)         23
total        20                 22                 42
We have calculated the expected values as usual, i.e. expected = (row total × column total) / grand total
The test statistic, with Yates Correction, is given by:
χ² = Σ ( |o – e| – 0.5 )² / e
Where |o-e| denotes the absolute value of a number (regardless of sign). So |3| = |-3| = 3 etc.
H0 Rows and columns are independent [voting does not depend on gender]
H1 some form of dependence exists [the genders vote differently]
level of the test: 5%
  o        e       o – e    |o – e| – 0.5   (|o – e| – 0.5)²   (|o – e| – 0.5)²/e
 12       9.05      2.95        2.45            6.0025              0.6633
  7       9.95     –2.95        2.45            6.0025              0.6033
  8      10.95     –2.95        2.45            6.0025              0.5482
 15      12.05      2.95        2.45            6.0025              0.4981
                                                total               2.3129
Since the test statistic, 2.3129, is less than the critical value of χ² with 1 degree of freedom at the 5% level, 3.8415 (from tables), we fail to reject H0 and conclude that, on the basis of the evidence available to us, rows and columns are independent. Thus there is no evidence to support the claim that males and females vote differently.
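The Yates-corrected statistic can be checked with a short Python sketch (illustrative only; working with unrounded expected values gives 2.3173 rather than the 2.3129 obtained above from 2-decimal expected values, and leads to the same conclusion):

```python
observed = [[12, 7],    # female: would vote for, would vote against
            [8, 15]]    # male

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand
        # Yates continuity correction: subtract 0.5 from |o - e| before squaring
        chi2 += (abs(o - e) - 0.5) ** 2 / e

print(round(chi2, 4))   # 2.3173, below the critical value 3.8415: do not reject H0
```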
Seminar Exercise
1. Number of defects found in a product:   0     1     2     3     4 or more
   Expected percentage:                    5%   15%   30%   30%   20%
We take a random sample of 35 items and observe the following numbers of units with the stated number of defects:
   Number of defects found in a product:   0     1     2     3     4 or more
   Numbers observed:                       3     8     4    11     9
Test to see if this data is consistent with the claimed proportions of defectives.
(Remember to ensure that all expected values are greater than or equal to 5)
2. A random sample of people reporting sick over a five day period was:
Monday   Tuesday   Wednesday   Thursday   Friday
   8        20         14         18        25
Is there any evidence to suggest that the reporting is not spread evenly through the week?
3. A firm uses three similar machines to produce a large number of components. On a particular day a random sample of 99 from the defective components produced on the early shift were traced back to the machine that produced them. The same was done with a random sample of 65 defectives from the late shift. The table below shows the number of defectives found to be from each machine on each shift:
              machine A   machine B   machine C
Early Shift      37          29          33
Late Shift       13          16          36
Test the hypothesis that the probability of a defective coming from a particular machine is independent of the shift in which it was produced.
4. Stapleton Electrics manufacture televisions at four different factories and quality control is of great interest to the company. The table below shows reliability of the machines in a particular month:
                   Factory A   Factory B   Factory C   Factory D
Needed repair          4          15           9          12
No repair needed       8          10           6           6
a) Test to see if there is any association between factory and reliability of machines.
b) Subsequently, it was discovered that the figures above had been divided by 10 to make the arithmetic easier.
Correct for this and repeat the test. Does it make any difference to the result?
5. Debtovia is a small country in the middle of a recession. As part of a nation-wide survey a particular town was selected at random and the question "Would you support an incomes policy?" was asked of a number of workers selected at random. Their answers (Yes, No or Don't Know) and their employment status (skilled/unskilled and Union/Non-Union) were recorded and the data is presented below:
              Skilled and   Skilled and    Unskilled and   Unskilled and
              in union      not in union   in union        not in union
Yes                7             7              9               12
No                24            21              9               11
Don't know        29            27             17               27
a) Test the hypothesis that there is no association between the responses to the above question and employment status.
b) Form a new 2×2 contingency table from the above data by omitting all the Don't know responses and then pooling the remaining responses to obtain one column for Skilled and one column for Unskilled workers. Test to see if there is any association evident in your new table.
c) Comment on your findings in both cases and compare the two situations.
Answers
1. χ2 test = 6.904 critical = 7.815 do not reject H0
The data is consistent with claimed proportions
2 χ2 test = 9.647 critical = 9.488 reject H0
Reporting sick is not evenly spread. Monday has fewer than expected, Friday has more.
3 χ2 test = 8.7346 critical = 5.991 reject H0
Number of defectives varies according to shift
4 a) χ2 test = 3.5773 critical = 7.815 do not reject H0
No relationship between factory and reliability
b) New χ2 test = 35.77 (i.e. 10 times as big) reject H0
5 a) χ2 test = 8.43 Critical = 12.592 do not reject H0
Conclude no association between response and employment status
b) χ2 test = 6.8727 critical = 3.841 reject H0
Conclude there is an association between response and employment
Chi-Squared Tables
degrees of                        area in one tail
freedom     0.100      0.050      0.025      0.010      0.005      0.001
  1         2.7055     3.8415     5.0239     6.6349     7.8794    10.8276
  2         4.6052     5.9915     7.3778     9.2103    10.5966    13.8155
  3         6.2514     7.8147     9.3484    11.3449    12.8382    16.2662
  4         7.7794     9.4877    11.1433    13.2767    14.8603    18.4668
  5         9.2364    11.0705    12.8325    15.0863    16.7496    20.5150
  6        10.6446    12.5916    14.4494    16.8119    18.5476    22.4577
  7        12.0170    14.0671    16.0128    18.4753    20.2777    24.3219
  8        13.3616    15.5073    17.5345    20.0902    21.9550    26.1245
  9        14.6837    16.9190    19.0228    21.6660    23.5894    27.8772
 10        15.9872    18.3070    20.4832    23.2093    25.1882    29.5883
 11        17.2750    19.6751    21.9200    24.7250    26.7568    31.2641
 12        18.5493    21.0261    23.3367    26.2170    28.2995    32.9095
 13        19.8119    22.3620    24.7356    27.6882    29.8195    34.5282
 14        21.0641    23.6848    26.1189    29.1412    31.3193    36.1233
 15        22.3071    24.9958    27.4884    30.5779    32.8013    37.6973
 16        23.5418    26.2962    28.8454    31.9999    34.2672    39.2524
 17        24.7690    27.5871    30.1910    33.4087    35.7185    40.7902
 18        25.9894    28.8693    31.5264    34.8053    37.1565    42.3124
 19        27.2036    30.1435    32.8523    36.1909    38.5823    43.8202
 20        28.4120    31.4104    34.1696    37.5662    39.9968    45.3147
 21        29.6151    32.6706    35.4789    38.9322    41.4011    46.7970
 22        30.8133    33.9244    36.7807    40.2894    42.7957    48.2679
 23        32.0069    35.1725    38.0756    41.6384    44.1813    49.7282
 24        33.1962    36.4150    39.3641    42.9798    45.5585    51.1786
 25        34.3816    37.6525    40.6465    44.3141    46.9279    52.6197
 26        35.5632    38.8851    41.9232    45.6417    48.2899    54.0520
 27        36.7412    40.1133    43.1945    46.9629    49.6449    55.4760
 28        37.9159    41.3371    44.4608    48.2782    50.9934    56.8923
 29        39.0875    42.5570    45.7223    49.5879    52.3356    58.3012
 30        40.2560    43.7730    46.9792    50.8922    53.6720    59.7031
 31        41.4217    44.9853    48.2319    52.1914    55.0027    61.0983
 32        42.5847    46.1943    49.4804    53.4858    56.3281    62.4872
 33        43.7452    47.3999    50.7251    54.7755    57.6484    63.8701
 34        44.9032    48.6024    51.9660    56.0609    58.9639    65.2472
 35        46.0588    49.8018    53.2033    57.3421    60.2748    66.6188
 36        47.2122    50.9985    54.4373    58.6192    61.5812    67.9852
 37        48.3634    52.1923    55.6680    59.8925    62.8833    69.3465
 38        49.5126    53.3835    56.8955    61.1621    64.1814    70.7029
 39        50.6598    54.5722    58.1201    62.4281    65.4756    72.0547
 40        51.8051    55.7585    59.3417    63.6907    66.7660    73.4020
 41        52.9485    56.9424    60.5606    64.9501    68.0527    74.7449
 42        54.0902    58.1240    61.7768    66.2062    69.3360    76.0838
 43        55.2302    59.3035    62.9904    67.4593    70.6159    77.4186
 44        56.3685    60.4809    64.2015    68.7095    71.8926    78.7495
 45        57.5053    61.6562    65.4102    69.9568    73.1661    80.0767
 46        58.6405    62.8296    66.6165    71.2014    74.4365    81.4003
Practical session 3. Using SPSS for the Chi-squared test and the Seabridge file
As an example, suppose we are interested to see if there is any association between gender (Q5) and the main reason for joining the Centre (Q10).
Using Analyze, Descriptive Statistics, Crosstabs:
Move "Q5 gender" into the rows and "Q10 main reason" into the columns.
Click on the Statistics button and choose Chi-squared.
From the Cells button, choose Observed and Expected.
Now choose Continue and OK.
The resulting output is:
Case Processing Summary
                                         Cases
                             Valid         Missing       Total
                             N    Percent  N    Percent  N    Percent
Q5 gender * Q10 main
reason for joining           58   96.7%    2    3.3%     60   100.0%
Q5 gender * Q10 main reason for joining Crosstabulation
                                   Q10 main reason for joining
                              location   range of    recommended by   membership
                              of centre  facilities  friend/relative  rates       other   Total
Q5 gender  male    Count          2          9             4              3          9      27
                   Expected
                   Count          4.2        6.5           5.6            5.1        5.6    27.0
           female  Count          7          5             8              8          3      31
                   Expected
                   Count          4.8        7.5           6.4            5.9        6.4    31.0
Total              Count          9         14            12             11         12      58
                   Expected
                   Count          9.0       14.0          12.0           11.0       12.0    58.0
Chi-Square Tests
                     Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square   10.300a   4    .036
Likelihood Ratio     10.682    4    .030
N of Valid Cases     58
a. 2 cells (20.0%) have expected count less than 5. The minimum expected count is 4.19.
As the area in the tail of the chi-squared distribution is 0.036 (see value in last column) and thus less than 5%, we conclude that there is an association between gender and the reason for joining the centre.
2 cells (20%) have expected values of less than 5. This is acceptable.
Topic 3 Significance Testing For Means
Notation: µ is the mean of the population
x̄ is the mean of the sample
σ is the standard deviation of the population
s is the best estimate of the population standard deviation from a sample
[Reminder: We use the formula σ = √( Σ(x – µ)² / N ) for the standard deviation of a population and the formula s = √( Σ(x – x̄)² / (n – 1) ) for the best estimate of the population standard deviation from a sample]
The Central Limit Theorem states that if random samples of the same size are repeatedly drawn from a population of any distribution, the means of those samples will be Normally distributed.
Additionally, the mean of the sample means will be the same as the population mean.
Mean of sample means = µ
If the population size is large relative to the sample size, n, then the standard deviation of the sample means is equal to the population standard deviation divided by the square root of the sample size:
s.d. of sample means = σ/√n
Z-test for a Population Mean
We use the z-test for a population mean when the standard deviation of the population is known.
Procedure:
Two-tailed test
1) Set up hypotheses H0 : µ= some value [null hypothesis]
H1 : µ ≠some value [alternative hypothesis]
2) Choose significance level 5% is commonly used
3) Calculate test statistic z = (x̄ – µ) / (σ/√n)
4) Find critical value(s) from tables. For a 2-tailed test, the area in each tail is half of the significance level.
5) Accept or reject hypothesis
6) Summarise findings
Example:
The lengths of metal bars produced by a particular machine are Normally distributed with a mean length of 420 cm and a standard deviation of 12 cm. After a recent service to the machine, a sample of 100 bars was taken and the mean length of this sample was found to be 423cm.
Janet Geary 2012 Page 38
Is there any evidence, at the 5% level, of a change in the mean length of the bars produced by the machine? Assume that the standard deviation remains the same after the service.
In this example: population mean µ = 420
Population standard deviation σ = 12
Sample mean x̄ = 423
Sample size n = 100
1) H0 : µ = 420 [mean length remains the same]
H1 : µ ≠ 420 [mean length changes, 2-tailed test]
2) significance level , α = 0.05 [5%]
3) Test statistic: z = (x̄ – µ) / (σ/√n) = (423 – 420) / (12/√100) = 3/1.2 = 2.5
4) Critical values:
In this example we are using a two-tailed test as the alternative hypothesis has two parts (less than 420 and greater than 420). We have to split the 5% significance level into two halves of 0.025 each. From the Normal tables (below), an area in a tail of 0.025 gives critical values of z = ±1.96.
5) The diagram below (to be completed in seminar) shows the “accept” and “reject” areas for H0.
As test z = 2.5 lies in the reject region, we reject H0
6) We conclude that there is evidence to suggest, at the 5% level, that the mean length of the metal bars has changed since the machine was serviced.
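Steps 1 to 6 can be scripted; below is a minimal Python sketch of this z-test with the figures from the metal-bars example (illustrative only, not part of the module software):

```python
import math

mu0, sigma = 420, 12    # hypothesised mean and known population s.d.
n, xbar = 100, 423      # sample size and sample mean

z = (xbar - mu0) / (sigma / math.sqrt(n))
print(round(z, 2))      # 2.5

z_crit = 1.96           # two-tailed critical value at the 5% level
if abs(z) > z_crit:
    print("Reject H0: evidence of a change in mean length")
else:
    print("Do not reject H0")
```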
N.B. In this example we assumed that the standard deviation had not changed although we were unsure about whether the mean length had changed. This is not very realistic. In general, if we know the population standard deviation, we are likely to know the population mean as well and thus have no need to test it.
It is much more usual that we do not know the population mean and the population standard deviation. This scenario is covered by the t-test.
If the sample size is large, we can use the t-test.
Critical values of the Normal Distribution
Alpha is the area in one tail.
alpha    z value      alpha    z value
0.005    2.5758       0.05     1.6449
0.010    2.3263       0.10     1.2816
0.015    2.1701       0.15     1.0364
0.020    2.0537       0.20     0.8416
0.025    1.9600       0.25     0.6745
0.030    1.8808       0.30     0.5244
0.035    1.8119       0.35     0.3853
0.040    1.7507       0.40     0.2533
0.045    1.6954       0.45     0.1257
0.050    1.6449       0.50     0.0000
The t-test for a population mean
The t-test is used to test the value of a population mean when the population standard deviation is not known.
Procedure:
1) Calculate the sample mean, x̄, and the estimated standard deviation, s.
2) Set up hypotheses H0 : µ = some value [null hypothesis]
H1 : µ ≠ some value [alternative hypothesis]
3) Choose significance level 5% is commonly used
4) Calculate test statistic t = (x̄ – µ) / (s/√n)
5) Find critical value(s) from tables degrees of freedom = n-1
6) Accept or reject hypothesis
7) Summarise findings
Example:
A random sample of 8 women yielded the following cholesterol levels:
3.1 2.8 1.5 1.7 2.4 1.9 3.3 1.6
Test whether the sample could be drawn from a population whose mean cholesterol level is 3.1.
Test at the 5% level of significance.
Population mean (to be tested): µ = 3.1
Sample size: n = 8
sample mean: x̄ = 2.2875 [from calculator]
estimated standard deviation: s = 0.7120 [ from calculator]
H0 : µ = 3.1
H1 : µ ≠ 3.1 [2-tailed test]
Significance level = 5%, thus the area in each tail is α = 0.05/2 = 0.025 [2-tailed test]
degrees of freedom = n-1 = 7 α = 0.025
From the t-tables (next page), the critical values are ± 2.3646
As the value of the test statistic, t = (x̄ – µ)/(s/√n) = (2.2875 – 3.1)/(0.7120/√8) = –3.2276, is outside the critical values, we reject H0
There is evidence to suggest that this sample is not drawn from a population with a mean cholesterol level of 3.1.
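The same t-test can be reproduced with Python's statistics module, whose stdev uses the n – 1 divisor and so matches the 'best estimate' s above (a sketch for checking the calculator values, not the module's prescribed tool):

```python
import math
import statistics

sample = [3.1, 2.8, 1.5, 1.7, 2.4, 1.9, 3.3, 1.6]
mu0 = 3.1               # hypothesised population mean

n = len(sample)
xbar = statistics.mean(sample)   # sample mean, 2.2875
s = statistics.stdev(sample)     # estimated s.d. (n - 1 divisor), about 0.7120

t = (xbar - mu0) / (s / math.sqrt(n))
print(round(t, 4))               # -3.2276, outside the critical values of +/-2.3646 (7 df)
```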
t-tables
degrees of                       area in one tail
freedom     0.1       0.05      0.025     0.01      0.005     0.0025
  1         3.0777    6.3138   12.7062   31.8205   63.6567  127.3213
  2         1.8856    2.9200    4.3027    6.9646    9.9248   14.0890
  3         1.6377    2.3534    3.1824    4.5407    5.8409    7.4533
  4         1.5332    2.1318    2.7764    3.7469    4.6041    5.5976
  5         1.4759    2.0150    2.5706    3.3649    4.0321    4.7733
  6         1.4398    1.9432    2.4469    3.1427    3.7074    4.3168
  7         1.4149    1.8946    2.3646    2.9980    3.4995    4.0293
  8         1.3968    1.8595    2.3060    2.8965    3.3554    3.8325
  9         1.3830    1.8331    2.2622    2.8214    3.2498    3.6897
 10         1.3722    1.8125    2.2281    2.7638    3.1693    3.5814
 11         1.3634    1.7959    2.2010    2.7181    3.1058    3.4966
 12         1.3562    1.7823    2.1788    2.6810    3.0545    3.4284
 13         1.3502    1.7709    2.1604    2.6503    3.0123    3.3725
 14         1.3450    1.7613    2.1448    2.6245    2.9768    3.3257
 15         1.3406    1.7531    2.1314    2.6025    2.9467    3.2860
 16         1.3368    1.7459    2.1199    2.5835    2.9208    3.2520
 17         1.3334    1.7396    2.1098    2.5669    2.8982    3.2224
 18         1.3304    1.7341    2.1009    2.5524    2.8784    3.1966
 19         1.3277    1.7291    2.0930    2.5395    2.8609    3.1737
 20         1.3253    1.7247    2.0860    2.5280    2.8453    3.1534
 21         1.3232    1.7207    2.0796    2.5176    2.8314    3.1352
 22         1.3212    1.7171    2.0739    2.5083    2.8188    3.1188
 23         1.3195    1.7139    2.0687    2.4999    2.8073    3.1040
 24         1.3178    1.7109    2.0639    2.4922    2.7969    3.0905
 25         1.3163    1.7081    2.0595    2.4851    2.7874    3.0782
 26         1.3150    1.7056    2.0555    2.4786    2.7787    3.0669
 27         1.3137    1.7033    2.0518    2.4727    2.7707    3.0565
 28         1.3125    1.7011    2.0484    2.4671    2.7633    3.0469
 29         1.3114    1.6991    2.0452    2.4620    2.7564    3.0380
 30         1.3104    1.6973    2.0423    2.4573    2.7500    3.0298
 31         1.3095    1.6955    2.0395    2.4528    2.7440    3.0221
 32         1.3086    1.6939    2.0369    2.4487    2.7385    3.0149
 33         1.3077    1.6924    2.0345    2.4448    2.7333    3.0082
 34         1.3070    1.6909    2.0322    2.4411    2.7284    3.0020
 35         1.3062    1.6896    2.0301    2.4377    2.7238    2.9960
 36         1.3055    1.6883    2.0281    2.4345    2.7195    2.9905
 37         1.3049    1.6871    2.0262    2.4314    2.7154    2.9852
 38         1.3042    1.6860    2.0244    2.4286    2.7116    2.9803
 39         1.3036    1.6849    2.0227    2.4258    2.7079    2.9756
 40         1.3031    1.6839    2.0211    2.4233    2.7045    2.9712
 41         1.3025    1.6829    2.0195    2.4208    2.7012    2.9670
 42         1.3020    1.6820    2.0181    2.4185    2.6981    2.9630
 43         1.3016    1.6811    2.0167    2.4163    2.6951    2.9592
 44         1.3011    1.6802    2.0154    2.4141    2.6923    2.9555
 45         1.3006    1.6794    2.0141    2.4121    2.6896    2.9521
 50         1.2987
1.6759
2.0086
2.4033
2.6778
2.9370
55
1.2971
1.6730
2.0040
2.3961
2.6682
2.9247
60
1.2958
1.6706
2.0003
2.3901
2.6603
2.9146
65
1.2947
1.6686
1.9971
2.3851
2.6536
2.9060
70
1.2938
1.6669
1.9944
2.3808
2.6479
2.8987
75
1.2929
1.6654
1.9921
2.3771
2.6430
2.8924
80
1.2922
1.6641
1.9901
2.3739
2.6387
2.8870
85
1.2916
1.6630
1.9883
2.3710
2.6349
2.8822
90
1.2910
1.6620
1.9867
2.3685
2.6316
2.8779
95
1.2905
1.6611
1.9853
2.3662
2.6286
2.8741
100
1.2901
1.6602
1.9840
2.3642
2.6259
2.8707
200
1.2858
1.6525
1.9719
2.3451
2.6006
2.8385
300
1.2844
1.6499
1.9679
2.3388
2.5923
2.8279
400
1.2837
1.6487
1.9659
2.3357
2.5882
2.8227
500
1.2832
1.6479
1.9647
2.3338
2.5857
2.8195
600
1.2830
1.6474
1.9639
2.3326
2.5840
2.8175
700
1.2828
1.6470
1.9634
2.3317
2.5829
2.8160
800
1.2826
1.6468
1.9629
2.3310
2.5820
2.8148
900
1.2825
1.6465
1.9626
2.3305
2.5813
2.8140
1000
1.2824
1.6464
1.9623
2.3301
2.5808
2.8133
Janet Geary 2012 Page 41
Seminar Exercise
1. The commissions earned by an estate agent last year were Normally distributed with a mean of £2,560 and a standard deviation of £310. A random sample of 24 sales from this year was examined and the mean commission found to be £2,690.
Assuming no change in the standard deviation, does this provide significant evidence of a change in the mean commission?
Test at the 5% level of significance.
2. A random sample of 40 sacks of animal feed is taken from a population whose mean is unknown but whose variance is 15.7 kg2. The mean weight of the sample is 86.3 kg.
Test the hypothesis that the mean weight of the sacks is 86 kg.
Test at the 5% significance level.
3. The mean time taken by a standard piece of software to run a particular task is 64 seconds. In order to compare a different piece of software it was given a sample of 15 tasks of the same type, the times being:
61.8 73.0 72.0 68.1 63.1 71.5 63.1 69.4 68.9 67.0
65.0 67.5 69.0 59.8 61.5
Does this data indicate a difference in the mean times between the two pieces of software? Test at the 5% level of significance.
4. You have been told by “experts” that the average weekly salary for a part-time secretary is £275. You decide to check this assertion by taking a random sample of 30 secretaries and obtain a sample mean of £270 and a standard deviation of £40.
Test at the 5% level to determine whether the experts are correct.
5. During 2011 the number of beds required at a hospital was Normally distributed with a mean of 1800 per day and a standard deviation of 190 per day.
During the first 50 days of 2012, the average daily requirement for beds was 1830. This data is considered to be a valid sample for 2012. A senior hospital manager has claimed that this gives evidence that the requirement for beds has changed since 2011. Would you agree?
Is the sampling method valid?
Answers:
1. Test statistic, z = 2.05 reject H0 mean has changed
2. Test statistic z = 0.4788 do not reject H0
3. Test statistic t = 2.54 reject H0 times are not the same
4. Test statistic t = -0.6847 do not reject H0 experts could be right
5. Test statistic z = 1.116 do not reject H0 requirement for beds has not changed
Sampling method is questionable as only covers winter (January and February) and thus will not represent the demand for beds for the whole year. Illnesses are often seasonal.
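If you want to check your working for question 1, the z statistic takes only a couple of lines; a minimal sketch (our own, not part of the handout):

```python
# Sketch checking the answer to question 1: a z-test, since the
# population standard deviation (310) is assumed unchanged.
import math

mu0, sigma = 2560, 310      # last year's mean and s.d.
xbar, n = 2690, 24          # this year's sample

z = (xbar - mu0) / (sigma / math.sqrt(n))
print(round(z, 2))          # 2.05 -> outside +/-1.96, reject H0
```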
One-tailed t-test for a Population Mean
In a one tailed test the alternative hypothesis is that the population mean is either greater than a particular value or it is less than a particular value. (With a two-tailed test it was simply “not equal”)
Example:
A jeweller was sold some silver wire that he was suspicious about. If the wire was pure silver it would give a reading of 1.5 ohms when tested for electrical resistance. If the wire was not pure silver the resistance would be increased.
The jeweller tested 5 pieces of the dubious wire and the following readings for electrical resistance were obtained:
1.51 1.49 1.54 1.52 1.54
Test at the 5% level the hypothesis that the wire is pure silver.
µ = 1.5 [to be tested] n = 5
x̄ = 1.52 [from calculator]
s = 0.0212 [from calculator]
H0 : µ = 1.5 [it is pure silver]
H1 : µ > 1.5 [it is not pure silver, resistance is increased]
Significance level = 5% thus α = 0.05 [1-tailed test]
degrees of freedom = n-1 = 4 α = 0.05
critical value = +2.1318 [we need the positive critical value for H1 > some value]
[if H1 had been < some value, we would have used the negative critical value]
Test statistic: t = (1.52 − 1.5)/(0.0212/√5) = 0.02/0.00948 = 2.110
As t = 2.110 is less than the critical value 2.1318, do not reject H0
There is no significant evidence to suggest that the silver wire was impure.
The jeweller should allay his suspicions.
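As a cross-check of the jeweller example, SciPy can run the same one-tailed test directly (a sketch, not part of the original materials; the `alternative` argument needs SciPy 1.6 or later):

```python
# Sketch of the jeweller's one-tailed test.
from scipy import stats

readings = [1.51, 1.49, 1.54, 1.52, 1.54]

# H0: mu = 1.5 against H1: mu > 1.5 (one-tailed)
t_stat, p_value = stats.ttest_1samp(readings, popmean=1.5,
                                    alternative="greater")
print(round(t_stat, 2))   # about 2.11, below the critical 2.1318
print(p_value > 0.05)     # True: do not reject H0
```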
Exercises One tailed t-test
1. A firm which manufactures panels tests a sample of 13 panels, loading them until they crack. The results of this test are the following loads (in Newtons):
2.7 4.6 3.1 4.2 3.3 6.7 8.9 8.6
7.2 8.6 8.9 7.3 8.9
An important customer of this firm asserts that the quality of the panels is getting worse and that the mean cracking load is now less than the previous value of 7 Newtons.
Does the evidence bear out the customer's claim?
Test at the 5% level.
2. A health clinic claims that people following its diet programme will lose, on average, at least 8 kilograms during the programme. A random sample of 41 people on the programme showed a mean weight loss of 7 kilograms. The sample standard deviation, s, was found to be 3.1 kilograms. Test at the 5% level of significance whether the company is exaggerating, i.e. the mean weight loss, in general, is less than 8 kilograms.
3. A manager in a university department claimed that the average number of hours members of staff work each week is 35. A random sample of 12 lecturers was taken and the number of hours each worked in a particular week recorded. The results are given below:
37, 41, 32, 45, 39, 35, 43, 38, 40, 42, 38, 34
The staff union claims that the average number of hours worked is greater than 35.
Does the evidence bear out the union's claim?
Test at the 5% level of significance.
Answers:
1. mean =6.3846, sd = 2.4535, Test statistic t = -0.9044, critical = -1.782
do not reject H0 Evidence does not support the claim
2. Test statistic t = -2.065, critical = -1.684
reject H0 Mean weight loss is less than 8 kg. Company is exaggerating.
3. Mean = 38.66, sd = 3.821, Test statistic t = +3.3237, critical = + 1.796
reject H0 Mean hours worked is greater than 35.
Evidence does bear out the union's claim.
Topic 4 Two-sample test and test for proportions
Two Sample t-Test for Population Means
Suppose we have two independent random samples:
Sample A: x1, x2, x3, …, xnx from a Normal distribution with mean µx and standard deviation σ, with sample mean x̄, sd = sx and sample size nx
Sample B: y1, y2, y3, …, yny from a Normal distribution with mean µy and standard deviation σ, with sample mean ȳ, sd = sy and sample size ny
Notice that we are assuming that the population standard deviations are the same. Since this standard deviation is unknown we have to find an estimate for it.
Notation
                              sample A    sample B
sample size                      nx          ny
sample mean                      x̄           ȳ
sample standard deviation        sx          sy
The estimate for the common standard deviation is given by s, where
s² = ((nx − 1)sx² + (ny − 1)sy²)/(nx + ny − 2)
The test statistic is
t = ((x̄ − ȳ) − (µx − µy))/(s √(1/nx + 1/ny))
which fits a t-distribution with nx + ny − 2 degrees of freedom
We usually use a two sample t-test to determine if the population means are equal.
i.e. (µx – µy =0)
In this case, the test statistic is t = (x̄ − ȳ)/(s √(1/nx + 1/ny))
Example:
As part of an investigation undertaken by a telephone company, a comparison of the weekly phone bills in two areas was undertaken. The figures are given below :
Area A: 5.7 12.0 10.1 13.7 11.9 11.7 10.4 7.3 5.3 6.8 11.8
Area B: 8.9 3.0 8.2 5.2 2.2 5.7 3.2 9.6 3.1 3.9
Test, at the 5% level to see if there is any difference between the mean phone bills in the two areas.
calculations
                    Area A                 Area B
sample means        x̄ = 9.7               ȳ = 5.3
sample s.d.         sx = 2.91067           sy = 2.7109
                    sx² = 8.47200          sy² = 7.3489
sample size         nx = 11                ny = 10

common variance, s² = (10 × 8.47200 + 9 × 7.3489)/19 = 7.940
common standard deviation, s = √7.940 = 2.8178
H0: µx = µy or µx – µy = 0 [population means are the same]
H1: µx ≠ µy [pop. means are not the same]
Choose α = 5%
Test Statistic is t = (9.7 − 5.3)/(2.8178 × √(1/11 + 1/10)) = 4.4/1.2312 = 3.574
If α = 5% for a 2-tailed test, each tail has 0.025, degrees of freedom = nx + ny -2 = 19 critical values = ± 2.0930
As the test statistic t = 3.574 is outside the “accept region “ (± 2.0930), Reject H0
The mean amount spent on telephone bills is not the same in the two areas
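The same pooled two-sample test can be reproduced with SciPy; `equal_var=True` matches the handout's common-variance assumption (this sketch is our own, not part of the handout):

```python
# Sketch reproducing the phone-bill example with a pooled t-test.
from scipy import stats

area_a = [5.7, 12.0, 10.1, 13.7, 11.9, 11.7, 10.4, 7.3, 5.3, 6.8, 11.8]
area_b = [8.9, 3.0, 8.2, 5.2, 2.2, 5.7, 3.2, 9.6, 3.1, 3.9]

t_stat, p_value = stats.ttest_ind(area_a, area_b, equal_var=True)
print(round(t_stat, 3))   # about 3.574
print(p_value < 0.05)     # True: reject H0 at the 5% level
```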
Exercise: 2 sample t-test
1. Independent random samples of current account balances at two branches of a particular bank yielded the following results:
Branch        number of accounts    sample          sample
              sampled               mean balance    standard deviation
Holloway             12                £1,000              £150
Islington            10                  £920               £120
Test , at the 5% level of significance, whether there is any difference between the mean balances in the two branches of the bank.
2. A firm is studying the delivery times of two raw material suppliers. The firm is basically satisfied with supplier A and is prepared to stay with that supplier if the mean delivery time is roughly the same as that of supplier B.
Independent samples gave the following results:
              sample size    sample mean      sample
                             delivery time    standard deviation
supplier A        50            14 days            3 days
supplier B        30            12.5 days          2 days
Test, at the 5% level of significance, whether there is any difference in the mean times for delivery.
3. In a wage discrimination case involving male and female employees, independent samples of male and female employees with five years' experience or more provided the hourly wage results shown in the table.
                    sample size    sample mean     sample
                                   (£ per hour)    standard deviation
male employees          44            £9.25              £1.00
female employees        32            £8.70              £0.80
Does wage discrimination appear to be present in this case?
Test at the 5% level of significance.
Test at the 1% level of significance.
Do your conclusions depend upon the level of significance?
4. A University careers officer decided to collect data on the starting salaries for graduates in different subjects ten years ago. She then wanted to compare the averages from ten years ago with those from last year's graduates. Among the areas investigated were accounting graduates and general business graduates.
The salaries (in £1,000 per annum) found from two samples are given below:
Accounting    Business
   14.4          13.2
   12.6          11.8
   13.1          12.5
   14.0          11.5
   13.5          14.9
   13.1          12.3
   14.1          14.5
   12.4          13.7
   12.6          12.2
   14.9          13.4
   14.6          13.1
   14.6
Use a 5% level of significance to test the hypothesis that there is no difference between the mean annual starting salary of Accounting graduates and the mean starting salary of Business graduates ten years ago.
What is your conclusion?
Answers:
1. Common s = 137.31 t = 1.36 critical = ± 2.0860 do not reject H0
No significant difference in means
2. Common s = 2.6723 t = 2.4306 critical = ± 2.0 (approx) reject H0
There is a difference in the means
3. Common s = 0.9215 t = 2.569
a) critical = ± 1.9921 (approx) reject H0 There is a difference in the means
b) critical = ± 2.6430 (approx) do not reject H0 There is no difference in the means
Difference in wages is significant at the 5% level but not at the 1% level.
4 Accounting mean = 13.6583 sample s.d. = 0.8867
Business mean = 13.0091 sample s.d. = 1.0784
common s = 0.9827 t = 1.5826 critical = ± 2.0796 do not reject H0
No difference in the mean starting salaries
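Question 1 can be verified from the summary statistics alone, since the pooled t-test needs only the sample sizes, means and standard deviations (a sketch; the variable names are our own):

```python
# Sketch verifying answer 1 (Holloway vs Islington) from summary
# statistics, without the raw data.
import math

n_x, xbar, s_x = 12, 1000, 150   # Holloway
n_y, ybar, s_y = 10, 920, 120    # Islington

# pooled (common) standard deviation
s2 = ((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / (n_x + n_y - 2)
s = math.sqrt(s2)

t = (xbar - ybar) / (s * math.sqrt(1 / n_x + 1 / n_y))
print(round(s, 2))   # 137.31
print(round(t, 2))   # 1.36 -> inside +/-2.0860, do not reject H0
```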
Test for a difference in proportions
Consider 2 random samples of sizes nx and ny with proportions of a success equal to px and py respectively. We wish to determine if there is any significant difference between the 2 proportions.
The first step is to calculate the common proportion P = (nx px + ny py)/(nx + ny); then Q = 1 − P.
Note: P can be remembered as P = (total number of successes)/(total number sampled).
The test statistic is
z = (px − py)/√(P Q (1/nx + 1/ny))
which fits a Normal distribution with mean 0 and standard deviation 1, i.e. the standardised Normal distribution as found in tables.
The null hypothesis, H0 , will be that there is no difference between the 2 proportions.
Example:
Two newspapers conducted opinion polls asking voters whether they would vote for Mr Whittington as the next Mayor of London.
In the Evening News poll, 325 voters out of 500 said that they would vote for Mr Whittington.
In the Morning Metro poll, 201 voters out of 300 polled said they would vote for Mr. Whittington.
Is there any difference in the results of the two polls?
Answer:
H0 : there is no difference between the proportions saying they will vote for Mr Whittington.
H1 : there is a significant difference between the proportions saying they will vote for Mr Whittington.
                       Evening News       Morning Metro       total
number of successes        325                 201              526
sample proportions     px = 325/500        py = 201/300
                          = 0.65              = 0.67
sample size              nx = 500            ny = 300           800
P = (nx px + ny py)/(nx + ny)    then Q = 1 − P
P = (500 × 0.65 + 300 × 0.67)/(500 + 300) = (325 + 201)/800 = 526/800 = 0.6575
Note : we could have calculated P directly from the totals column.
P = 0.6575 thus Q = 1 – P = 0.3425
The test statistic is
z = (0.65 − 0.67)/√(0.6575 × 0.3425 × (1/500 + 1/300)) = −0.02/0.03466 = −0.5771
Testing at the 5% level of significance, with a 2-tailed test, the critical values are ± 1.96
Do not reject H0 ,
Conclude: There is no significant difference between the results of the 2 opinion polls.
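The two-proportion test is simple enough to compute directly; a sketch of the poll example (our own, not from the handout):

```python
# Sketch of the two-proportion z-test for the opinion-poll example.
import math

x_succ, n_x = 325, 500   # Evening News
y_succ, n_y = 201, 300   # Morning Metro

p_x, p_y = x_succ / n_x, y_succ / n_y
P = (x_succ + y_succ) / (n_x + n_y)   # common proportion
Q = 1 - P

z = (p_x - p_y) / math.sqrt(P * Q * (1 / n_x + 1 / n_y))
print(round(P, 4))    # 0.6575
print(round(z, 4))    # about -0.5771, inside +/-1.96: do not reject H0
```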
Seminar Exercises Difference in proportions
1. A medical research unit decided to test two drugs, A and B, for reducing blood pressure. The drugs were given to 2 sets of volunteers. One group of 90 volunteers was treated with drug A and 60 of these volunteers reported lower blood pressure. The second group of 80 volunteers was treated with Drug B; of these 50 reported lower blood pressure. Test at the 5% level of significance if there is any difference between the 2 drugs ability to lower blood pressure.
2. A survey firm conducted door-to-door interviews on a new consumer product. Some individuals co-operated with the interviewers and completed the questionnaire whilst other individuals did not co-operate. The sample data is shown in the table below.
Testing at the 5% level of significance, test the hypothesis that the rate of co-operation is the same for both men and women.
sample size
number co-operating
men
200
110
women
300
210
3. Two universities were planning to merge. At one of the 2 universities, NLU, 200 staff were interviewed. Of these 44 said that they thought the new merged University would be “good for the local area”. At the second university, GLU, 48 of the 300 staff interviewed thought the new University would be “good for the local area”.
Test, at the 5% level of significance, whether there is any difference in the two proportions who think that the new university would be “good for the local area”.
Test at the 10% level of significance.
4. A check was conducted on a form completed in 2 offices. The first office yielded a random sample of 250 forms, of which 35 contained errors. The second office had a sample size of 300 and 27 forms that contained errors.
Test at the 10% level of significance whether there is any difference in the proportion of forms containing errors between the 2 offices.
Answers:
1. P = 0.647 z = 0.5679 No significant difference in the proportions.
2. P = 0.64 z = -3.423 There is a significant difference
3. P = 0.184 z = 1.69 No significant difference at 5% level
Significant difference at 10% level
4 P = 0.1127 z = 1.85 There is a significant difference
Practical session: Use of SPSS for a t-test for a population mean and a two-sample t-test.
One sample t-test
Example : We wish to test if the mean age of the users of the Seabridge sports centre could come from a population with mean age = 40.
Choose Analyse, Compare Means , One-Sample T-test
Choose age as the “test variable” and enter 40 as the “test value”
Using the Options button choose 95% as the Confidence Interval Percentage
Now choose Continue and OK
The SPSS output is :
One-Sample Statistics
                          N      Mean    Std. Deviation    Std. Error Mean
Q6 age of respondent      58     36.19       11.607             1.524
One-Sample Test
                                        Test Value = 40
                                                                95% Confidence Interval
                                                 Mean           of the Difference
                          t      df   Sig. (2-tailed)  Difference    Lower    Upper
Q6 age of respondent   -2.500    57        .015          -3.810      -6.86     -.76
In this case we would reject the hypothesis that the mean age of the population is 40 in favour of the alternative hypothesis that it is not 40 (the value in the tail is 0.015, which is less than 0.05). This is not surprising, as the sample mean is 36.19.
If we repeat this test with a test value of 36
We get :
One-Sample Test
                                        Test Value = 36
                                                                95% Confidence Interval
                                                 Mean           of the Difference
                          t      df   Sig. (2-tailed)  Difference    Lower    Upper
Q6 age of respondent    .124     57        .901           .190       -2.86     3.24
In this case we would not reject the null hypothesis that the mean age of the population is 36 as the value in the tail is 0.901 ( i.e. above 0.05)
Using SPSS for a two-sample t-test.
As an example we can assume that the male and female members were surveyed independently and thus can be considered as two independent samples . We can now test to see if there is a difference in the mean age of the male and female members.
Choose age as the Test Variable and gender as the Grouping Variable
Now define groups using the codes on the data file ( 1 and 2)
Choose Continue and then OK
The output takes the form of :
Group Statistics
                        Q5 gender     N     Mean    Std. Deviation    Std. Error Mean
Q6 age of respondent    male         27    34.67        11.622             2.237
                        female       31    37.52        11.619             2.087
Independent Samples Test
                                  Levene's Test for
                                  Equality of Variances            t-test for Equality of Means
                                                                          Sig.        Mean      Std. Error   95% CI of the Difference
                                     F       Sig.      t        df    (2-tailed)  Difference  Difference      Lower      Upper
Q6 age of    Equal variances
respondent   assumed               .037      .849    -.932      56       .356       -2.849       3.059       -8.977      3.278
             Equal variances
             not assumed                             -.932    54.907     .356       -2.849       3.059       -8.980      3.281
The values in the tails are 0.356 if we assume that the populations from which the samples were drawn have equal variances, and 0.356 if we do not make this assumption. In both cases the values are greater than 0.05 and thus we do not reject the hypothesis that they have equal means.
This is effectively saying that a female mean age of 37.52 and a male mean age of 34.67 are not significantly different.
Topic 5 Confidence Intervals
Point and Interval Estimates.
The mean rent paid by students in a provincial town is £102 per week with a standard deviation of £10. If we were ignorant of this fact (which we would be unless a census had been taken) we would have to conduct a survey to estimate the population mean.
If the results of the survey gave a sample mean of £100 we could use this to estimate the value of the population mean. This value of the sample mean, £100, is known as a point estimate because it consists of just one value. Point estimates can be very misleading as they give a false impression of accuracy. By themselves they do not recognise the fact that they are only estimates and thus are subject to a degree of uncertainty.
This question of uncertainty is addressed by using interval estimates such as confidence intervals.
A point estimate is: "The mean weekly rent is £100."
An interval estimate is: "The mean weekly rent is in the range £96 to £104."
Confidence Interval for a Population Mean – Population Standard Deviation Known
If the population standard deviation is known then we can use the Normal distribution in our calculation of the confidence interval.
The formula for a Confidence Interval is
x̄ ± Zα σ/√n
where Zα is found from the Normal tables (α is the area in one tail), n is the size of the sample and σ is the standard deviation of the population from which the sample is drawn.
For a 95% confidence interval, α = (100% − 95%)/2 = 0.025, Zα = 1.9600
For a 90% confidence interval, α = (100% − 90%)/2 = 0.05, Zα = 1.6449
For an 80% confidence interval, α = (100% − 80%)/2 = 0.10, Zα = 1.2816
Interpretation of a Confidence Interval
A 95 % confidence interval gives a range within which there is a 95% probability that the population mean lies.
If we took 100 samples and for each calculated the sample mean, standard deviation and hence the 95% confidence intervals, 95 of these intervals would contain the population mean.
Thus 5% of the time the population mean will NOT lie in the 95% confidence interval.
Example:
An accountant knows, from past experience that the standard deviation of the value of all invoices is £11.60. However he has forgotten the value of the mean.
In order to estimate the value of the mean, he takes a sample of 20 invoices and finds a sample mean of £51.41.
Determine a 95% confidence interval for the mean value of all invoices.
95% confidence interval: 51.41 ± 1.96 × 11.60/√20 = 51.41 ± 5.08
95% confidence interval is [£46.33, £56.49]
Thus there is a 95% confidence that the mean value of all invoices lies in the range £46.33 to £56.49.
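The invoice interval can be checked in a couple of lines (a sketch, not part of the handout):

```python
# Sketch of the invoice confidence interval (sigma known -> Normal).
import math

sigma, n, xbar = 11.60, 20, 51.41
z = 1.96                              # z for a 95% interval

margin = z * sigma / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(round(lower, 2), round(upper, 2))   # 46.33 56.49
```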
Confidence Interval for a Population Mean – Population Standard Deviation Unknown
If the population standard deviation is not known then we use the t distribution in our calculation of the confidence interval. We use the value s as the best estimate of the population standard deviation.
The formula for a Confidence Interval is
x̄ ± tα s/√n
where tα is found from the t-tables, with degrees of freedom = n − 1.
Example:
The heights, in cm, of 6 policemen were : 180 176 179 181 183 179
Calculate
a) a 90% Confidence Interval for the mean heights of all policemen
b) a 95% Confidence Interval for the mean heights of all policemen
c) If we had obtained the same sample statistics (mean and standard deviation) from a sample of 60 policemen what effect would this have on the 95% Confidence Interval?
Mean = 179.667 sample standard deviation s = 2.3381 [results from calculator]
a) 90% Confidence Interval
α = 0.05, degrees of freedom = n − 1 = 5, tα = 2.0150
179.667 ± 2.0150 × 2.3381/√6 = 179.667 ± 1.923
There is a 90% confidence that the mean height of all policemen lies in the range 177.74 cm to 181.59 cm.
b) 95% Confidence Interval
α = 0.025, degrees of freedom = n − 1 = 5, tα = 2.5706
179.667 ± 2.5706 × 2.3381/√6 = 179.667 ± 2.454, giving the interval (177.21 cm, 182.12 cm)
N.B. A higher percentage confidence interval gives a wider interval
c) 95% Confidence Interval with sample size = 60
α = 0.025, degrees of freedom = n − 1 = 59, tα = 2.0003 (using df = 60)
179.667 ± 2.0003 × 2.3381/√60 = 179.667 ± 0.604, giving the interval (179.06 cm, 180.27 cm)
N.B. Larger sample sizes give narrower confidence intervals.
Thus for a "more accurate" estimate of the mean (i.e. a narrower confidence interval), take a larger sample. I hope this was obvious before, but we have now shown it to be the case.
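The policemen example can be reproduced without printed tables by asking SciPy for the t quantile (a sketch; variable names are our own):

```python
# Sketch of the 90% interval for the policemen's heights.
import math
from scipy import stats

heights = [180, 176, 179, 181, 183, 179]
n = len(heights)
xbar = sum(heights) / n
s = math.sqrt(sum((h - xbar) ** 2 for h in heights) / (n - 1))

t_crit = stats.t.ppf(0.95, df=n - 1)      # 90% interval: 0.05 per tail
margin = t_crit * s / math.sqrt(n)
print(round(xbar - margin, 2), round(xbar + margin, 2))   # 177.74 181.59
```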
Confidence Interval for a Population Proportion
A proportion represents the fraction of a sample that has some attribute, and is mainly used when the data is categorical rather than numerical.
Let p = sample proportion (expressed as a decimal) then q = 1- p
If np > 5 and nq > 5 then we can calculate a Confidence Interval for a proportion by:
p ± Z α √( pq/n)
Example:
A random sample of 1,000 electors was polled and 400 of the electors said that they will vote Labour.
How accurate an estimate is this sample proportion with respect to how all electors will vote?
p = 400/1000 = 0.4, thus q = 0.6
For a 95% Confidence Interval, Z α = 1.96
95% Confidence Interval
p ± Zα √(pq/n) = 0.4 ± 1.96 × √((0.4 × 0.6)/1000)
= 0.4 ± 1.96 × √(0.00024)
= 0.4 ± 1.96 × 0.0155
= 0.4 ± 0.0304
95% Confidence Interval: (0.3696, 0.4304)
There is a 95% probability that the proportion of all electors who will vote Labour lies in the range 36.96% to 43.04%. More sensibly: 37% to 43% will vote Labour, with a confidence of 95%.
This is usually reported in the press as “ 40% will vote Labour with an error due to sampling of plus or minus 3%”
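A sketch of the voting-intention interval (our own, not part of the handout):

```python
# Sketch of the 95% confidence interval for a proportion.
import math

n, successes = 1000, 400
p = successes / n
q = 1 - p

margin = 1.96 * math.sqrt(p * q / n)   # 95% interval
print(round(p - margin, 4), round(p + margin, 4))   # 0.3696 0.4304
```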
Sample Size Required for a given level of accuracy
We can re-arrange the formula above to help us calculate the sample size required for a given level of accuracy.
Notation : let M be the size of the margin of error .
Then from above M = Z α √( pq/n)
For a 95% confidence interval Z α = 1.96
The size of the product pq will vary according to value of p. But as q = 1 – p, we can say that the quantity pq = p(1-p)
From the table below of y = p(1 − p) we can see that the largest possible value of pq is 0.25. This occurs when both p and q are 0.5.
  p     q = 1 − p    p(1 − p)
 0.0       1.0         0.00
 0.1       0.9         0.09
 0.2       0.8         0.16
 0.3       0.7         0.21
 0.4       0.6         0.24
 0.5       0.5         0.25
 0.6       0.4         0.24
 0.7       0.3         0.21
 0.8       0.2         0.16
 0.9       0.1         0.09
 1.0       0.0         0.00
For a 95% confidence interval we now have:
largest error: M = Zα √(pq/n)
               M = 1.96 √(0.25/n)
Rearranging this formula to make n the subject gives:
               M/1.96 = √(0.25/n)
               M²/1.96² = 0.25/n
               n = 0.25 × 1.96²/M² = 0.9604/M²
Thus if a 95% Confidence Interval required a maximum margin of error of 2%, then M = 0.02 and
               n = 0.9604/M² = 0.9604/(0.02)² = 2401
Some results: NB Sample size must be an integer.
margin M    0.9604/M²     minimum sample size
   1%          9604               9604
   2%          2401               2401
   3%        1067.1111            1068
   4%         600.25               601
   5%         384.16               385
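The sample sizes above can be regenerated in a short loop (a sketch; the round-then-ceiling step is our own guard against floating-point noise):

```python
# Sketch reproducing the minimum sample sizes for a 95% interval.
import math

for pct in (1, 2, 3, 4, 5):
    M = pct / 100                        # margin of error
    n = math.ceil(round(0.9604 / M**2, 6))
    print(f"{pct}% -> n = {n}")
```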
Seminar Exercise
1. An ambulance station is looking at its response times and takes a sample of 16 call-outs, finding that the mean of the sample is 17 minutes whilst the standard deviation of the sample is 5 minutes. What are the 99% confidence limits for the mean response time of all call-outs?
2. In order to evaluate the success of a television advertising campaign for a new product, a company interviewed 400 residents in the television area. 120 of them knew about the product. How accurately does this estimate the percentage of residents in the area who know about the product?
Calculate a 95% confidence interval.
3. A random sample of 400 rail passengers is taken and 55% are in favour of proposed new timetables. Calculate a 95% confidence interval for the proportion of all passengers that are in favour of the timetables.
4. A manufacturer needs to estimate the mean life of batteries they are producing. To do this they take a sample of 100 batteries and find that the mean life of this sample is 50 hours.
The standard deviation of all battery lifetimes is known to be 6 hours.
Calculate
a) a 95% Confidence Interval for the mean of all battery lifetimes.
b) a 99% Confidence Interval for the mean of all battery lifetimes.
5. The Human Resources department of a company is leading a campaign to reduce absenteeism of staff.
Last year 15% of the 59,202 hours which should have been worked over a 46 week year were lost due to absenteeism.
The campaign that was set up has now been running for 20 weeks. Absenteeism reports for the last 10 weeks of the campaign were examined. The weekly hours lost were:
week no. 11 12 13 14 15 16 17 18 19 20
hours lost 195 190 162 170 177 190 198 177 184 191
a) Considering these 10 weeks as a random sample of all the weeks to be worked, has there been a significant decrease in absenteeism over last year? You should carry out a test for the mean number of hours lost per week. Test at the 10% level of significance, and explain your choice of test statistic.
b. Calculate a 90% confidence interval for the mean number of hours lost per week after the campaign, and explain its meaning.
6. Last Christmas, the average value of purchases per customer at a toyshop was £36.00 with a standard deviation of £10.25.
The toyshop is anxious to know early on whether this year's average Christmas spend is different, so it takes a random sample of 15 customers. Their spend on toys is (in £):
22 48 36 45 35 11 22 69
86 45 57 43 22 24 17
The shop wishes to be 90% confident that the error due to sampling is no larger than £3.00.
Assuming that the population standard deviation remains at £10.25:
a) Is the sample taken sufficiently large to achieve the accuracy objective?
b) Test whether the average spend this Christmas is significantly different from last year’s. Test at the 10% level of significance.
c) Establish an 80% confidence interval for the average spend this year.
d) Are your answers to (b) and ( c) consistent? Explain your reasoning.
Answers
1. (13.32, 20.68)
2. (25.5%, 34.5%)
3. (50.1% , 59.9%)
4 a) (48.824, 51.176) b) (48.455 , 51.545)
5 sample mean = 183.4 sample sd = 11.6065 last year mean = 193.05
t = -2.629 critical = -1.383 reject H0 , Has been a decrease in hours lost
b) (176.67 , 190.13)
6 a) sample size is too small, it should be at least 32
b) z = 1.058 do not reject H0 ,
Mean is not significantly different from last year.
c) (35.41 , 42.19) d) both b) and c) imply that the value 36 would lie in a 90% confidence interval, i.e. are consistent
Practical session : Using SPSS to find Confidence Intervals
To find a 95% confidence of the length of membership:
Choose Analyze, Descriptive Statistics and then Explore
Select “length of membership” as the dependent list.
Now choose Statistics button and use 95% for the Confidence Interval for Mean
Now choose Continue and then OK
The output is :
Case Processing Summary
                                             Cases
                                Valid            Missing          Total
                             N    Percent      N    Percent     N    Percent
Q8 length of membership     60    100.0%       0      .0%      60    100.0%
Descriptives
                                                            Statistic   Std. Error
Q8 length of membership   Mean                                 2.32        .131
                          95% Confidence     Lower Bound       2.05
                          Interval for Mean  Upper Bound       2.58
                          5% Trimmed Mean                      2.30
                          Median                               2.00
                          Variance                            1.034
                          Std. Deviation                      1.017
                          Minimum                                 1
                          Maximum                                 4
                          Range                                   3
                          Interquartile Range                     1
                          Skewness                             .320        .309
                          Kurtosis                            -.956        .608
This gives a confidence interval of [2.05, 2.58]. We could report this as:
There is a 95% confidence that the mean length of membership is between 2 and 2½ years.
Topic 6: Correlation and regression
1. Pearson product moment coefficient of correlation
If we wish to measure the strength of linear relationship between two variables (x and y), and the data is cardinal, we use the Pearson product moment coefficient of correlation (r), which is defined as follows:
r = (NΣxy − ΣxΣy)/√((NΣx² − (Σx)²)(NΣy² − (Σy)²))
The value of r lies in the range −1 ≤ r ≤ +1. An r value of −1 signifies a perfect negative linear correlation, +1 a perfect positive linear correlation, and 0 no linear correlation. These scenarios are most easily demonstrated on a scatter graph. Note that the coefficient only measures the strength of relationship; it is not evidence of cause and effect.
In effect r measures how adequately a scatter of observations can be represented by a straight line. However it is more meaningful to interpret the strength of the relationship by looking at r² (known as the coefficient of determination, where 0 ≤ r² ≤ 1), which measures the proportion (or percentage) of variation in y which is explained by the variation in x. To focus on r tends to overstate the strength of the relationship (e.g. r = 0.7 seems to suggest a fairly strong relationship, but it explains less than 50% of the variation).
We can test the significance of the coefficient using the test statistic:

t = r√(N - 2) / √(1 - r²)

which has a t-distribution with N - 2 degrees of freedom.
Example
In a survey of 30 company employees, the correlation between length of service and age is 0.87207.
A scatter diagram of the data is shown below.
The details of the calculation for the value of the correlation coefficient are shown below:
service (y)   age (x)   y²     x²      xy
2             24        4      576     48
24            51        576    2601    1224
2             25        4      625     50
9             34        81     1156    306
12            40        144    1600    480
6             32        36     1024    192
1             23        1      529     23
5             28        25     784     140
3             21        9      441     63
4             33        16     1089    132
3             26        9      676     78
1             21        1      441     21
14            57        196    3249    798
5             27        25     729     135
4             23        16     529     92
13            45        169    2025    585
1             22        1      484     22
11            43        121    1849    473
8             35        64     1225    280
1             19        1      361     19
8             28        64     784     224
12            44        144    1936    528
13            40        169    1600    520
3             35        9      1225    105
9             48        81     2304    432
2             25        4      625     50
11            43        121    1849    473
4             24        16     576     96
5             30        25     900     150
6             35        36     1225    210
totals: Σy = 202   Σx = 981   Σy² = 2168   Σx² = 35017   Σxy = 7949
r = (30×7949 - 981×202) / √[(30×35017 - 981²) × (30×2168 - 202²)] = 40308 / √(88149 × 24236) = 0.87207
As the value of r = 0.87207, the value of r² is 0.7605. Thus 76% of the variation in the length of service can be explained by the variation in age.
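The hand calculation can be cross-checked in a few lines of Python, using only the column totals from the table above:

```python
import math

# Column totals from the worked table (N = 30 employees)
N = 30
sum_y, sum_x = 202, 981          # Σy (service), Σx (age)
sum_y2, sum_x2 = 2168, 35017     # Σy², Σx²
sum_xy = 7949                    # Σxy

# Pearson correlation from the computational formula
numerator = N * sum_xy - sum_x * sum_y
denominator = math.sqrt((N * sum_x2 - sum_x**2) * (N * sum_y2 - sum_y**2))
r = numerator / denominator       # ≈ 0.87207
r_squared = r**2                  # ≈ 0.7605
```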
The SPSS printout looks like:

Correlations
                                   AGE       SERVICE
AGE        Pearson Correlation     1         0.872**
           Sig. (1-tailed)         .         .000
           N                       30        30
SERVICE    Pearson Correlation     0.872**   1
           Sig. (1-tailed)         .000      .
           N                       30        30
**. Correlation is significant at the 0.01 level (1-tailed).
Thus r = 0.872071 and r² = 0.760508; therefore we have a strong positive correlation in which approximately 76% of the variation in length of service is explained by the variation in age. The remaining 24% will be due to other factors (e.g. not all employees will have worked their way up through the company; some will have joined from other companies, etc.). We can test the significance of this result by computing the test statistic and comparing it with critical values from the t-distribution. Our null hypothesis is that the true population correlation ρ = 0; our alternative hypothesis is that ρ > 0.
The test statistic is:

t = r√(N - 2) / √(1 - r²) = 0.87207 × √28 / √(1 - 0.7605) = 9.4294
Testing at a level of significance of 5%, the critical t-value is 1.701. Since the test statistic exceeds the critical value we can reject the null hypothesis, and conclude that we have evidence of a genuine linear relationship between length of service and age. Moreover, the fact that 9.4294 is much larger than 1.701 shows that we could adopt a much smaller level of significance and still reject the hypothesis of no relationship.
Note the output does not specifically provide the test statistic but does indicate that the correlation is statistically significant at a 1% level of significance.
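The arithmetic of this test can be checked with a short Python sketch (the one-tailed critical value 1.701 is taken from t-tables):

```python
import math

# Test H0: ρ = 0 against H1: ρ > 0 for the service/age data
N = 30
r = 0.87207

# Test statistic t = r√(N−2) / √(1−r²), with N−2 = 28 degrees of freedom
t_stat = r * math.sqrt(N - 2) / math.sqrt(1 - r**2)   # ≈ 9.43

# One-tailed critical value t(0.95, df = 28) ≈ 1.701, from tables
t_crit = 1.701
reject_h0 = t_stat > t_crit   # True: evidence of a genuine linear relationship
```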
Note we are using a one-tailed test of significance with alternative hypothesis ρ > 0; if r had been negative, our alternative hypothesis would be ρ < 0, the test statistic would also be negative, and the critical t-value would be on the negative side of the t-distribution. Thus the null hypothesis would be rejected if the test statistic were less than the critical value (i.e. closer to the tail). The test is therefore a complete mirror image of the example above.

Regression

Whilst correlation measures the extent to which there is a linear relationship between two variables, regression enables us to find the equation that describes that linear relationship. The equation of a regression line has the form:

y = a + bx

where y is the dependent variable (the one we wish to predict/explain) and x is the independent variable. The value a is known as the intercept of the line and b measures the gradient of the line. The relevant formulae are:

gradient:  b = (NΣxy - ΣxΣy) / (NΣx² - (Σx)²)
intercept: a = Σy/N - b × Σx/N

Looking at the calculation above, we can see that the numerator of b is the same as the numerator of r, and that the first bracket in the denominator of r is the same as the denominator of b. Thus:

b = 40308 / 88149 = 0.45727
a = 202/30 - 0.45727 × 981/30 = -8.2194

Thus the equation of the line linking length of service (y) and age (x) is:

y = -8.2194 + 0.45727x

This equation can then be used to make predictions. The SPSS regression output, with interpretation in italics, looks like:

Variables Entered/Removedb
Model   Variables Entered   Variables Removed   Method
1       agea                .                   Enter

Variables Entered/Removed
This simply tells us that age was the independent variable and service the dependent variable.

Model Summaryb
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .872a   .761       .752                2.629

Model Summary
The value of the correlation coefficient, r, was 0.872 and the value of r² was 0.761.

Coefficientsa
                 Unstandardized            Standardized                   95.0% Confidence Interval for B
Model            B        Std. Error       Beta         t        Sig.     Lower Bound   Upper Bound
1  (Constant)    -8.219   1.657                         -4.961   .000     -11.613       -4.826
   age           .457     .048             .872         9.429    .000     .358          .557
a. Dependent Variable: service

Coefficients
The unstandardized coefficients give us the values of a and b in the regression equation. Thus the equation here is y = -8.219 + 0.457x. The final column "Sig." gives values less than 0.01, thus we can say that the coefficients of the regression equation are significantly different from zero at the 1% level (and thus at the 5% level).

Casewise Diagnosticsa
Case Number   Std. Residual   service   Predicted Value   Residual
2             3.385           24        15.10             8.899
a. Dependent Variable: service

Casewise diagnostics
During the input dialogue, we asked for any standardised residuals outside the range -3 to +3. The output shows that one reading, case number 2, had a large standardised residual. This indicates that the point does not fit the general trend of the straight line and can be regarded as an outlier (i.e. an unusual reading). Case number 2 is an employee aged 51 with 24 years of service; on the scatter diagram, we can see that this point is a long way from the regression line.

Residuals Statisticsa
                       Minimum   Maximum   Mean   Std. Deviation   N
Predicted Value        .47       17.85     6.73   4.603            30
Residual               -4.785    8.899     .000   2.583            30
Std. Predicted Value   -1.361    2.414     .000   1.000            30
Std. Residual          -1.820    3.385     .000   .983             30
a. Dependent Variable: service

Residual Statistics
This table can be ignored for simple cases.

Rank Correlation

If we wish to measure the strength of relationship between two variables, and at least one of them is ordinal, we need to use a nonparametric measure of correlation in which the strength of relationship is based on the ranks of the data. There are two main measures of rank correlation, Kendall's τ statistic and Spearman's rank correlation coefficient; we shall focus on the latter. Observed values for x and y are replaced by their ranks, and the difference (d) in ranking between each set of paired observations is calculated. Spearman's coefficient (rs) is found from the following formula:

rs = 1 - 6Σd² / (N(N² - 1))

As with Pearson's coefficient, we can also test whether the result is statistically significant. Providing N ≥ 10, we can use the t statistic as before:

t = rs√(N - 2) / √(1 - rs²)

which has a t-distribution with N - 2 degrees of freedom. For larger samples (say N > 30) we can use the normal approximation z = rs√(N - 1).
Example 1 [thanks to Richard Charlesworth for this example]
A survey of MBA students included a question which asked the respondent to identify the most important factor in their choice of UNL for their MBA.
Factor                                                     Full-time   Part-time
MBA course programme/design                                34          25
Discussion with tutor at recruitment fair/open evening     9           10
Recommended by a colleague/friend                          22          18
Living in London                                           21          28
Credit transfer/flexible study                             0           9
Prior study at UNL                                         4           10
Photos of the staff in the brochure                        10          0
Factor                         FT (%)   PT (%)   FT rank   PT rank   d     d²
MBA programme/design           34       25       1         2         1     1
Discussion at fair/open eve    9        10       5         4.5       0.5   0.25
Recommendation                 22       18       2         3         1     1
Living in London               21       28       3         1         2     4
CATS/flexible study            0        9        7         6         1     1
Prior study at UNL             4        10       6         4.5       1.5   2.25
Photos of the staff            10       0        4         7         3     9
                                                                     Σd² = 18.50
Therefore rs = 1 - (6 × 18.5) / (7 × (49 - 1)) = 1 - 111/336 = 0.6696
Note that tied ranks are averaged (i.e. the scores of 10% for Part-time MBA jointly cover the rankings 4 and 5, so both take on the rank 4.5).
SPSS Output: Spearman's rank correlation coefficient

Nonparametric Correlations

Correlations
                                                    FT MBA   PT MBA
Spearman's rho   FT MBA   Correlation Coefficient   1        0.667
                          Sig. (1-tailed)           .        0.051
                          N                         7        7
                 PT MBA   Correlation Coefficient   0.667    1
                          Sig. (1-tailed)           0.051    .
                          N                         7        7
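The Spearman calculation can be checked in Python directly from the ranks in the worked table (the Part-time tie is averaged to 4.5, as noted above):

```python
# FT and PT ranks from the worked table (tie in PT averaged to 4.5)
ft_ranks = [1, 5, 2, 3, 7, 6, 4]
pt_ranks = [2, 4.5, 3, 1, 6, 4.5, 7]

n = len(ft_ranks)                                    # 7 factors
d2 = [(f - p) ** 2 for f, p in zip(ft_ranks, pt_ranks)]
sum_d2 = sum(d2)                                     # Σd² = 18.5

# Spearman's coefficient: rs = 1 − 6Σd² / (n(n² − 1))
rs = 1 - 6 * sum_d2 / (n * (n**2 - 1))               # ≈ 0.6696
```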
How to use SPSS for Correlation and Regression
Example: load up the SPSS file age&service.sav
To produce a correlation matrix
Choose Analyze, Correlate, Bivariate
Select age and service as the input variables. Check that Pearson is ticked. Choose OK
The resulting output looks like:
                               service   age
service   Pearson Correlation  1         .872**
          Sig. (2-tailed)                .000
          N                    30        30
age       Pearson Correlation  .872**    1
          Sig. (2-tailed)      .000
          N                    30        30
**. Correlation is significant at the 0.01 level (2-tailed).
To produce Regression output
Select Analyze, Regression, Linear
Select age as the independent variable
Select service as the dependent variable
Select enter as the method
Then select the statistics button
Select Casewise diagnostics
Outliers outside 3 standard deviations
tick Confidence Intervals, choose 95% as Level
then Continue
then OK
The output was shown previously
SPSS has the capacity to produce further analyses for regression, for example an analysis of the residuals is possible (and extremely useful).
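As a cross-check on the SPSS output, the regression coefficients can also be reproduced from the column totals of the age/service worked example in a few lines of Python:

```python
# Gradient and intercept from the column totals of the age/service example
N = 30
sum_y, sum_x = 202, 981        # Σy (service), Σx (age)
sum_x2, sum_xy = 35017, 7949   # Σx², Σxy

b = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x**2)  # gradient ≈ 0.45727
a = sum_y / N - b * sum_x / N                               # intercept ≈ -8.2194

# Prediction: expected length of service for an employee aged 35
predicted = a + b * 35
```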
Topic 7: Multiple Regression
In many situations we want to be able to use more than one independent variable to predict the value of the dependent variable. We use multiple regression in these cases.
For example, we might suspect that the price of a house depends not only on the number of bedrooms it has but also depends on the number of living rooms, number of bath/shower rooms, size of garden and so on. To predict the price of a house we would need a model that took into account all the significant factors.
We denote the independent variables as x1, x2, …, xk and the associated coefficients as b1, b2, …, bk. The constant of the regression equation is denoted by a.
Thus the regression equation is:
y = a + b1x1 + b2x2 + b3x3 + … + bkxk
The formulae for calculating the values of a , b1 , b2 , b3………. bk are too complicated to include here. We will focus on analysing the SPSS printouts.
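Although the formulae are too complicated to present by hand, the fit itself is just a least-squares calculation. The sketch below illustrates the idea in Python/NumPy on small made-up house-price figures (the data and the coefficient values 50, 30, 20 are purely hypothetical, constructed so the model fits exactly):

```python
import numpy as np

# Hypothetical data: price built to satisfy price = 50 + 30×bedrooms + 20×bathrooms
bedrooms  = np.array([2, 3, 3, 4, 4, 5], dtype=float)
bathrooms = np.array([1, 1, 2, 2, 3, 3], dtype=float)
price = 50 + 30 * bedrooms + 20 * bathrooms

# Design matrix with a column of ones for the constant a
X = np.column_stack([np.ones_like(bedrooms), bedrooms, bathrooms])
coeffs, *_ = np.linalg.lstsq(X, price, rcond=None)
a, b1, b2 = coeffs   # least squares recovers 50, 30, 20
```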
t-test on Regression Coefficients
We can use the t-test to decide if the coefficients of the regression line are significant
H0: βi = 0 [coefficient = 0 :no significant contribution to linear relationship]
H1: βi ≠ 0
The test statistic is given by: t = bi / (Std Error of bi)
We reject H0 if |t| > t(α/2, n-2) [found from statistics tables].
As before, we can reject H0 if the value of p < 0.025 in the SPSS printout.
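As a sketch of the mechanics, using the coefficient b = 0.457 and standard error 0.048 from the simple age/service regression earlier (n = 30; the two-tailed critical value is taken from t-tables):

```python
# t-test on a regression coefficient: t = b / SE(b)
b, se_b = 0.457, 0.048
n = 30

t_stat = b / se_b    # ≈ 9.52 (SPSS reports 9.429, computed from unrounded values)

# Two-tailed 5% critical value t(0.025, n−2 = 28) ≈ 2.048, from tables
t_crit = 2.048
reject_h0 = abs(t_stat) > t_crit   # coefficient significantly different from zero
```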
Multicollinearity
If a regression equation contains two or more “independent” variables that have a strong linear relationship between them, then the model has “multicollinearity”. This means that the independent variables are not really independent in the sense that they are related to each other.
This strong linear relationship between two or more of the independent variables may make the estimates of the coefficients unreliable and may, in fact, make some coefficients negative when they should be positive (or vice versa).
We can check for collinearity by looking at the correlation matrix.
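A quick way to screen for this in Python/NumPy is sketched below on made-up data; the variables and the 0.8 cut-off are illustrative assumptions, not part of the example that follows (x3 is deliberately constructed to be almost a multiple of x1):

```python
import numpy as np

# Hypothetical predictor columns; x3 is close to 2×x1, so the pair should be flagged
x1 = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0])
x2 = np.array([3.0, 7.0, 4.0, 9.0, 2.0, 6.0])
x3 = 2 * x1 + np.array([0.1, -0.2, 0.15, 0.0, -0.1, 0.05])

names = ["x1", "x2", "x3"]
corr = np.corrcoef([x1, x2, x3])   # 3×3 correlation matrix

# Flag any pair of "independent" variables with |r| above 0.8
flagged = [(names[i], names[j])
           for i in range(3) for j in range(i + 1, 3)
           if abs(corr[i, j]) > 0.8]
```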
Multiple Regression Example:
London Theatres
Data was obtained from the Society of London Theatres about various statistics recorded for London Theatres from 1986 to 2010. The data set was obtained from http://www.solt.co.uk/downloads/pdfs/theatreland/2010-Graph2
The full data file is shown overleaf.
year   attendances     gross box office       average no. of   no. of         no. of new
       (thousands)     revenue (£ thousands)  theatres open    performances   productions
1986   10236           112068                 42               16543          213
1987   10881           129589                 42               16603          212
1988   10897           139338                 43               16970          28
1989   10945           153251                 42               16436          237
1990   11321           177904                 40               15887          187
1991   10905           186790                 39               15508          192
1992   10900           194772                 41               15916          193
1993   11503           215619                 41               15922          198
1994   11163           217763                 41               16063          208
1995   11938           238741                 43               17163          208
1996   11179           229017                 41               16084          186
1997   11466           246082                 39               15568          195
1998   11925           257920                 41               16018          207
1999   11931           266565                 44               17089          265
2000   11555           286556                 43               16633          252
2001   11735           298989                 44               17035          264
2002   12064           327972                 44               17090          221
2003   11585           321485                 42               16664          225
2004   12025           343674                 43               17235          225
2005   12319           383942                 45               17406          221
2006   12351           400853                 43               16912          268
2007   13636           469939                 44               17455          243
2008   13892           483349                 45               18275          241
2009   14257           504984                 45               17923          260
2010   14152           512332                 46               18615          264
We are going to use multiple regression to produce a model that will enable the value of gross box office revenue to be explained by the other variables.
There are two basic approaches to multiple regression, top-down and bottom-up.
With top-down regression, we start by using all the independent variables in our model and then successively eliminate those whose values of bi are not significant (Sig. > 0.025) or that are highly correlated with other variables (multicollinearity).
Bottom-up approaches successively add variables to the model until no improvement can be made. The order in which the variables are added is often the order of the values of the correlation coefficients with the dependent variable (variable with the highest r is used first, then variable with second highest r is added and so on) excluding multicollinearity at each stage.
For both of these approaches we need to know what the values of the various correlation coefficients are. This can be obtained by using the correlate command in SPSS .
Note: SPSS highlights any correlations that are significant by the use of * and **.
We get:
Correlations
                                      attendance   revenue   theatresopen   performances   newproductions
attendance      Pearson Correlation   1            .954**    .720**         .803**         .489*
                Sig. (2-tailed)                    .000      .000           .000           .013
                N                     25           25        25             25             25
revenue         Pearson Correlation   .954**       1         .715**         .771**         .559**
                Sig. (2-tailed)       .000                   .000           .000           .004
                N                     25           25        25             25             25
theatresopen    Pearson Correlation   .720**       .715**    1              .955**         .397*
                Sig. (2-tailed)       .000         .000                     .000           .050
                N                     25           25        25             25             25
performances    Pearson Correlation   .803**       .771**    .955**         1              .367
                Sig. (2-tailed)       .000         .000      .000                          .071
                N                     25           25        25             25             25
newproductions  Pearson Correlation   .489*        .559**    .397*          .367           1
                Sig. (2-tailed)       .013         .004      .050           .071
                N                     25           25        25             25             25
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
From this correlation matrix we can see that:
· All of the other variables are correlated with revenue
· Attendance is highly correlated with the number of theatres open and the number of performances
As we want to use “revenue” as our dependent variable, we can guess that “attendance” will be in the model and at most one of “theatres open” and “performances”. We would not expect both “theatres open” and “performances” to be included as they are highly correlated and thus not really independent.
“Top-Down” Approach
The “top-down” approach initially involves using all four independent variables in the model.
The relevant printout is shown below with some comments in italics.
The SPSS commands are summarised below
The output is shown below:
Variables Entered/Removedb
Model   Variables Entered                                        Variables Removed   Method
1       newproductions, performances, attendance, theatresopen   .                   Enter
a. All requested variables entered.
b. Dependent Variable: revenue

Variables Entered/Removed
All four possible independent variables have been added using the "Enter" method; revenue is the dependent variable.
Model Summaryb
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .961a   .924       .909                36143.148
a. Predictors: (Constant), newproductions, performances, attendance, theatresopen
b. Dependent Variable: revenue

Model Summary
The value of r = 0.961 and r² = 0.924. However, for multiple regression models it is usual to use the adjusted R Square value of 0.909 for r², as this value takes into account the number of variables being used as well as the strength of the correlation.
ANOVAb
Model          Sum of Squares   df   Mean Square   F        Sig.
1  Regression  3.166E11         4    7.915E10      60.586   .000a
   Residual    2.613E10         20   1.306E9
   Total       3.427E11         24
ANOVA
The F statistic of 60.586 indicates that the model as a whole is significant (Sig. < 0.025).

Coefficientsa
                    Unstandardized                 Standardized
Model               B              Std. Error      Beta           t        Sig.
1  (Constant)       -1043627.767   177852.629                     -5.868   .000
   attendance       101.514        13.178          .913           7.703    .000
   theatresopen     12735.681      14486.506       .200           .879     .390
   performances     -28.361        39.178          -.191          -.724    .478
   newproductions   260.547        186.511         .104           1.397    .178
a. Dependent Variable: revenue

Coefficients
The regression equation is given by:

revenue = -1043627.767 + 101.514 × attendance + 12735.681 × theatresopen - 28.361 × performances + 260.547 × newproductions

Looking at the p values (Sig. column):
· the coefficient of the constant is significantly different from zero (p = 0.000 < 0.025)
· the coefficient of attendance is significantly different from zero (p = 0.000 < 0.025)
· the coefficient of theatres open is NOT significantly different from zero (p = 0.390)
· the coefficient of performances is NOT significantly different from zero (p = 0.478)
· the coefficient of new productions is NOT significantly different from zero (p = 0.178)

This implies that we should get a better model if we went through a process of deleting variables one by one. The first one to delete would be "performances", as it has the highest value of p and is correlated with other variables.

Residuals Statisticsa
                       Minimum      Maximum     Mean        Std. Deviation   N
Predicted Value        116690.98    536192.75   283979.76   114851.420       25
Residual               -50616.152   68157.070   .000        32994.029        25
Std. Predicted Value   -1.457       2.196       .000        1.000            25
Std. Residual          -1.400       1.886       .000        .913             25
a. Dependent Variable: revenue

Residual Statistics
The maximum standardised residual is 1.886 and the minimum is -1.400. Thus all points are fairly close to the regression line and there are no outliers.

Janet Geary 2012 Page 84

Conclusion: try another model that leaves out the variable "performances". However, we will not go through the whole procedure ourselves, as we can utilise a special facility in SPSS that uses a bottom-up procedure.

Bottom-Up Approach (SPSS STEPWISE facility)

From the correlation matrix, we can see that the correlation coefficients with revenue, in descending order, are:
1. attendance (0.954)
2. performances (0.771)
3. theatres open (0.715)
4. new productions (0.559)

Attendance is highly correlated with the number of theatres open and the number of performances, so we have a case of multicollinearity here: we are unlikely to use all three of these in the "best" model.

The rest of this printout is the result of the stepwise multiple regression command. NB: the user did not have to specify the order in which the variables were introduced. The "stepwise" command works by introducing the variables one at a time, based on the value of the correlation coefficients, and stops when it cannot find a "better" model. Thus the last model produced by stepwise is considered the best model to use for predictions. We can also choose to have some useful plots here. The stepwise process is summarised in the final output.

Variables Entered/Removeda
Model   Variables Entered   Variables Removed   Method
1       attendance          .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
a. Dependent Variable: revenue
The final (and best) model only uses “attendance” to predict the gross box office revenue.
Model Summaryb
Model
R
R Square
Adjusted R Square
Std. Error of the Estimate
1
.954a
.909
.905
36753.551
a. Predictors: (Constant), attendance
b. Dependent Variable: revenue
The value of adjusted r2 is high at 0.905
ANOVAb
Model          Sum of Squares   df   Mean Square   F         Sig.
1  Regression  3.116E11         1    3.116E11      230.702   .000a
   Residual    3.107E10         23   1.351E9
   Total       3.427E11         24
a. Predictors: (Constant), attendance
b. Dependent Variable: revenue
The regression model as a whole is significant as F has p=0.000
Coefficientsa
                 Unstandardized               Standardized
Model            B             Std. Error     Beta           t         Sig.
1  (Constant)    -974724.442   83195.461                     -11.716   .000
   attendance    106.037       6.981          .954           15.189    .000
a. Dependent Variable: revenue
The regression equation is revenue = -974724 + 106.037×attendance
Both of the coefficients of the constant and attendance are significantly different from zero.
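This stepwise model can be reproduced by an ordinary least-squares fit in Python/NumPy, using the attendance and revenue columns from the data table above (a sketch; SPSS reports b ≈ 106.037 and a ≈ -974724):

```python
import numpy as np

# Attendance (thousands) and gross box office revenue (£ thousands), 1986-2010
attendance = np.array([10236, 10881, 10897, 10945, 11321, 10905, 10900, 11503,
                       11163, 11938, 11179, 11466, 11925, 11931, 11555, 11735,
                       12064, 11585, 12025, 12319, 12351, 13636, 13892, 14257,
                       14152], dtype=float)
revenue = np.array([112068, 129589, 139338, 153251, 177904, 186790, 194772,
                    215619, 217763, 238741, 229017, 246082, 257920, 266565,
                    286556, 298989, 327972, 321485, 343674, 383942, 400853,
                    469939, 483349, 504984, 512332], dtype=float)

# Simple linear regression: revenue = a + b × attendance
b, a = np.polyfit(attendance, revenue, 1)

# Prediction at 2010's attendance level, in £ thousands
predicted_2010 = a + b * 14152
```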
Excluded Variablesb
                         Beta In   t       Sig.   Partial       Collinearity Statistics
Model                                             Correlation   Tolerance
1  theatresopen          .060a     .650    .523   .137          .482
   performances          .013a     .124    .902   .026          .355
   newproductions        .122a     1.770   .091   .353          .761
a. Predictors in the Model: (Constant), attendance
b. Dependent Variable: revenue
Residuals Statisticsa
                       Minimum      Maximum     Mean        Std. Deviation   N
Predicted Value        110668.88    537043.06   283979.76   113951.373       25
Residual               -52402.609   67772.398   .000        35979.706        25
Std. Predicted Value   -1.521       2.221       .000        1.000            25
Std. Residual          -1.426       1.844       .000        .979             25
a. Dependent Variable: revenue
The minimum standardised residual is -1.426 and the maximum is 1.844 so there are no outliers
The histogram of the residuals closely approximates to a normal distribution, thus the model is a good one.
The points on the p-p plot are quite close to the diagonal line and do not really exhibit any particular pattern.
A scatter graph of the standardised residuals and the standardised predicted values is evenly scattered above and below the zero line and there is no particular pattern here.
Topic 7: Interpreting Regression Output
Using the boats data for multiple regression in SPSS
Data was collected on the prices charged for weekly boat hire at Easter and in the Summer from a number of boatyards on the Norfolk Broads. For each boat, the fields shown below were recorded.
Data definitions:
We want to see how the maximum price charged during Easter for a week’s hire is related to the attributes of the boat (length, width, number of fixed berths, maximum number of berths )
The correlation matrix gives:
Correlations
                                              length   width    max      fixed    max Easter
                                                                berths   berths   price
length in metres        Pearson Correlation   1        .688**   .769**   .828**   .857**
                        Sig. (2-tailed)                .000     .000     .000     .000
                        N                     112      112      112      112      112
width in metres         Pearson Correlation   .688**   1        .498**   .492**   .537**
                        Sig. (2-tailed)       .000              .000     .000     .000
                        N                     112      112      112      112      112
maximum number          Pearson Correlation   .769**   .498**   1        .891**   .748**
of berths               Sig. (2-tailed)       .000     .000              .000     .000
                        N                     112      112      112      112      112
number of fixed         Pearson Correlation   .828**   .492**   .891**   1        .863**
berths                  Sig. (2-tailed)       .000     .000     .000              .000
                        N                     112      112      112      112      112
maximum Easter          Pearson Correlation   .857**   .537**   .748**   .863**   1
price                   Sig. (2-tailed)       .000     .000     .000     .000
                        N                     112      112      112      112      112
**. Correlation is significant at the 0.01 level (2-tailed).
There appears to be some multi-collinearity here as there are strong correlations between some of the “independent” variables. In particular, “number of fixed berths” is strongly correlated with “maximum number of berths”. Thus we would not expect both of these variables to be present in a “good” model.
Using the step-wise approach:
Variables Entered/Removeda
Model   Variables Entered        Variables Removed   Method
1       number of fixed berths   .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2       length in metres         .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
a. Dependent Variable: maximum Easter price
The first model just used “number of fixed berths” as this has the highest correlation with maximum Easter price. To this model, the variable “length in metres“ was added. No further additions were made as the other variables correlated strongly with these two.
Model Summaryc
                                            Std. Error of    Change Statistics
Model   R       R Square   Adj. R Square    the Estimate     R Square Change   F Change   df1   df2   Sig. F Change
1       .863a   .745       .743             67.792           .745              322.149    1     110   .000
2       .900b   .810       .806             58.845           .064              36.988     1     109   .000
a. Predictors: (Constant), number of fixed berths
b. Predictors: (Constant), number of fixed berths, length in metres
c. Dependent Variable: maximum Easter price
The value of adjusted r² was 0.743 for model 1 (number of fixed berths). This rose to 0.806 when "length in metres" was added to model 1 to form model 2.
Coefficientsa
                             Unstandardized          Standardized                     Collinearity Statistics
Model                        B         Std. Error    Beta           t        Sig.     Tolerance   VIF
1  (Constant)                263.311   15.076                       17.466   .000
   number of fixed berths    62.209    3.466         .863           17.949   .000     1.000       1.000
2  (Constant)                12.597    43.251                       .291     .771
   number of fixed berths    35.206    5.363         .489           6.564    .000     .315        3.178
   length in metres          33.623    5.528         .453           6.082    .000     .315        3.178
a. Dependent Variable: maximum Easter price
In model 1, the coefficient of number of fixed berths is significantly different from zero (p = 0.000 < 0.05). In model 2, both coefficients are significantly different from zero (p = 0.000 in both cases).
Casewise Diagnosticsa
Case Number   Std. Residual   maximum Easter price   Predicted Value   Residual
15            3.397           945                    745.12            199.877
a. Dependent Variable: maximum Easter price
The only real outlier is case number 15 as it is the only point with a standardised residual outside the range -3 to + 3.
Residuals Statisticsa
                       Minimum    Maximum   Mean     Std. Deviation   N
Predicted Value        328.79     806.65    508.26   120.382          112
Residual               -158.095   199.877   .000     58.313           112
Std. Predicted Value   -1.491     2.479     .000     1.000            112
Std. Residual          -2.687     3.397     .000     .991             112
a. Dependent Variable: maximum Easter price
The graph of the residuals does roughly resemble a normal distribution curve. Thus the assumption that the residuals are normally distributed seems reasonable.
The points are close to the straight line and there is no real pattern in the points, thus the assumption of normality seems reasonable.
The residuals are fairly evenly spread about the horizontal line through zero. The difference from the zero line does not change much as the standardised predictions increase.
Conclusion : This is a good model.
Compare this output with one that includes all possible variables.
Variables Entered/Removedb
Model   Variables Entered                                                                     Variables Removed   Method
1       maximum number of berths, width in metres, length in metres, number of fixed berths   .                   Enter
a. All requested variables entered.
b. Dependent Variable: maximum Easter price
Model Summaryb
                                            Std. Error of    Change Statistics
Model   R       R Square   Adj. R Square    the Estimate     R Square Change   F Change   df1   df2   Sig. F Change
1       .903a   .816       .809             58.396           .816              118.850    4     107   .000
a. Predictors: (Constant), maximum number of berths, width in metres, length in metres, number of fixed berths
b. Dependent Variable: maximum Easter price
ANOVAb
Model          Sum of Squares   df    Mean Square   F         Sig.
1  Regression  1621146.826      4     405286.707    118.850   .000a
   Residual    364878.665       107   3410.081
   Total       1986025.491      111
a. Predictors: (Constant), maximum number of berths, width in metres, length in metres, number of fixed berths
b. Dependent Variable: maximum Easter price
The F statistics is significant and thus indicates that there is a linear relationship between maximum Easter price and at least one of the other variables
Coefficientsa
                              Unstandardized          Standardized                     Collinearity Statistics
Model                         B         Std. Error    Beta           t        Sig.     Tolerance   VIF
1  (Constant)                 34.804    73.494                       .474     .637
   length in metres           36.055    6.704         .485           5.378    .000     .211        4.746
   width in metres            -8.502    27.532        -.018          -.309    .758     .503        1.987
   number of fixed berths     44.915    7.726         .623           5.814    .000     .149        6.696
   maximum number of berths   -10.983   5.921         -.172          -1.855   .066     .201        4.985
a. Dependent Variable: maximum Easter price
(If we were building a model ourselves, we would eliminate "width in metres" and "maximum number of berths": their coefficients are not significantly different from zero and their signs are not what we would expect, since we would expect wider boats with more berths to command a higher price.)
Casewise Diagnosticsa
Case Number   Std. Residual   maximum Easter price   Predicted Value   Residual
15            3.191           945                    758.64            186.357
a. Dependent Variable: maximum Easter price
Residuals Statisticsa
                       Minimum    Maximum   Mean     Std. Deviation   N
Predicted Value        336.28     780.69    508.26   120.851          112
Residual               -147.015   186.357   .000     57.334           112
Std. Predicted Value   -1.423     2.254     .000     1.000            112
Std. Residual          -2.518     3.191     .000     .982             112
a. Dependent Variable: maximum Easter price
Features of a Good Linear Regression Model [SPSS printout]

1. Model Summary: a high value of adjusted r². [A low value indicates a poor linear fit.]
2. Table of Correlations: the model chosen should not contain independent variables that are highly correlated with each other. [The model should not exhibit any multicollinearity.]
3. Coefficients table: all coefficients significantly different from zero, i.e. the Sig. column has values below 0.025. [If a coefficient is close to zero it adds nothing useful to the model.]
4. ANOVA: the F statistic is significantly different from zero, indicating that there is a relationship between the dependent variable and at least one of the independent variables.
5. Casewise diagnostics: only 5% of readings are outside the range -2 to +2 in the 'Std. Residual' column, and any outliers are outside the range -3 to +3. [If there are a number of outliers, then the data points could just be 'odd' cases or could indicate the lack of a linear relationship.]
6. Histogram of residuals: the histogram fits the superimposed Normal curve (or is close to it). [If the histogram does not approximate to the Normal curve, the residuals are not normally distributed and the model is not a good fit; consider the possibility of a non-linear relationship.]
7. Normal p-p plot: points are scattered about the diagonal line, close to the diagonal line, and do not exhibit any pattern. [If the points are not as stated, the model is not a good fit; consider the possibility of a non-linear relationship.]
Seminar Exercise
Using the SPSS data file boats.sav, produce the best multiple regression model for predicting the maximum summer prices.
Write a report on your printouts.
Topic 8 : Graphical Linear Programming
Linear Programming methods can be used to solve problems where we wish to maximise (or minimise) a linear function subject to a number of linear constraints.
Graphical linear programming can be used when there are only two variables.
The stages of graphical linear programming are:
1. Formulation of the problem.
This involves translating a description of a problem into a mathematical format. In particular, the linear constraints and objective function have to be generated.
2. Determination of the set of feasible solutions.
This involves drawing a graph to determine the region where all the constraints are met.
3. Finding the optimal solution.
This involves finding the “best” feasible solution.
Example: The Soft Toy Company
A manufacturer of expensive soft toys makes giant teddy bears and fluffy rabbits.
Each teddy bear has a contribution to profit of £30 whilst each rabbit has a contribution of £40. (They are very expensive.)
The manufacturer wants to determine which combination of teddy bears and rabbits should be made in order to maximise the contribution.
Each teddy bear requires 2 hours of machining and 2 hours of hand labour whilst each rabbit requires 1 hour of machining and 3 hours of hand labour.
Each teddy bear and each rabbit requires 1 kilogram of stuffing.
The manufacturer has certain limitations on the possible production. In particular, there are only 50 hours of machining and 90 hours of hand labour available each week.
Stuffing is in short supply, so the manufacturer can only rely on 40 kilogrammes each week.
Determine which combination of teddy bears and rabbits should be produced in order to maximise profit.
Problem Formulation : Summarising the details given above:
machine (hour)
labour (h)
stuffing(kg)
teddy bear
2
2
1
rabbit
1
3
1
total(week)
50
90
40
Defining the Variables:
Let x be the number of teddy bears produced and sold each week
Let y be the number of rabbits produced and sold each week.
Formulating the Objective Function:
Contribution is C = 30x + 40y
so our objective function is Max 30x + 40y
Formulate Constraints:
machine hours: 2x + y ≤ 50
hand labour: 2x + 3y ≤ 90
stuffing: x + y ≤ 40
As we cannot have a negative number of toys we should include: x ≥ 0 y ≥ 0
Formulation Summary:
max 30x + 40y
such that
2x + y ≤ 50
2x + 3y ≤ 90
x + y ≤ 40
x ≥ 0
y ≥ 0
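This formulation can also be checked with a solver. The notes use WinQSB; as an alternative sketch (an assumption, not part of the original materials), the same problem can be passed to SciPy's linprog. Note that linprog minimises, so the contribution is negated.

```python
from scipy.optimize import linprog

# Soft Toy formulation. linprog minimises, so negate the 30x + 40y
# contribution to turn maximisation into minimisation.
c = [-30, -40]
A_ub = [[2, 1],    # machine hours: 2x +  y <= 50
        [2, 3],    # hand labour:   2x + 3y <= 90
        [1, 1]]    # stuffing:       x +  y <= 40
b_ub = [50, 90, 40]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2,
              method="highs")
x, y = res.x
print(x, y, -res.fun)   # 15 teddy bears, 20 rabbits, contribution 1250
```

The solution agrees with the graphical answer found below: 15 teddy bears and 20 rabbits.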
Determining the Feasible Region
To draw a graph of the feasible region we have to consider each constraint in turn.
Machine hours: 2x + y ≤ 50
In order to draw the line 2x + y = 50, we need two points.
Choosing x = 0 and then y = 0 for our two points we get:
x = 0: 2(0) + y = 50, so y = 50
y = 0: 2x + 0 = 50, so x = 25
The graph now looks like:
We now have to decide which side of the line is required, i.e. on which side of the line is
2x + y actually less than 50.
In general, if the constraint in x and y includes ≤ the required area will be to the “bottom left” of the line. If the constraint includes ≥ the required area will be to the “top right”.
Hand Labour 2x + 3y ≤ 90
x = 0: 2(0) + 3y = 90, so y = 30
y = 0: 2x + 3(0) = 90, so x = 45
Again we will want the area under the line.
The graph now looks like:
Stuffing: x + y ≤ 40
x = 0: 0 + y = 40, so y = 40
y = 0: x + 0 = 40, so x = 40
We want the area under this line as well.
The two further constraints x ≥ 0 and y ≥ 0 can be included to give the graph shown:
In this graph the feasible region can be shaded. It is the region where all the constraints are satisfied.
Optimising
In this example we wish to maximise the objective function: 30 x + 40y
Method 1:
We can use the gradient of the line in order to draw an objective function line.
In general, any line with equation ax + by = c has gradient -a/b.
The line 30x + 40y = c therefore has gradient -30/40 = -3/4.
Thus we can draw any line with such a gradient (3 down and 4 along) as our initial objective function line. This line is then moved parallel to find the optimal solution.
Method 2
Linear programming theory tells us that the optimal solution will always lie at a vertex (corner) of the feasible region.
Trying each vertex in turn:
vertex               value of objective function
x = 0,  y = 0        30x + 40y = 0
x = 0,  y = 30       30x + 40y = 1200
x = 15, y = 20       30x + 40y = 1250
x = 25, y = 0        30x + 40y = 750
The highest value of the objective function is found at: x = 15, y = 20.
Production Plan.
In order to maximise the value of the weekly contribution, the manufacturer should produce and sell 15 teddy bears and 20 rabbits each week.
This will generate a weekly contribution of £1250
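Method 2 can also be carried out mechanically: intersect every pair of constraint lines, discard infeasible points, and evaluate the objective at each remaining vertex. A pure-Python sketch of this (not from the notes):

```python
# Vertex enumeration for the Soft Toy problem: each constraint is
# stored as a*x + b*y <= c, including x >= 0 and y >= 0.
from itertools import combinations

cons = [(2, 1, 50), (2, 3, 90), (1, 1, 40), (-1, 0, 0), (0, -1, 0)]

def feasible(x, y, tol=1e-9):
    return all(a*x + b*y <= c + tol for a, b, c in cons)

best = None
for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
    det = a1*b2 - a2*b1
    if det == 0:
        continue                       # parallel lines: no intersection
    x = (c1*b2 - c2*b1) / det          # Cramer's rule for the 2x2 system
    y = (a1*c2 - a2*c1) / det
    if feasible(x, y):
        value = 30*x + 40*y
        if best is None or value > best[0]:
            best = (value, x, y)

print(best)   # (1250.0, 15.0, 20.0)
```

Only feasible intersection points are kept, and the best of these is the optimal vertex, confirming the table above.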
Example: Camping Trip
A youth club is planning a camping trip.
Two sizes of tent are available 4-person and 8-person tents.
There are 64 people who want to go on the trip, but there is only room on the site for 13 tents. Only eight 4-person tents are available.
If each 4-person tent costs £15 a night and each 8-person tent costs £45 per night, how many of each type of tent minimises the nightly cost?
Formulation:
Let x be the number of 4-person tents used and let y be the number of 8-person tents used.
Minimise cost: Cost = 15x + 45y
Total number of tents: x + y ≤ 13
number of 4-person tents: x ≤ 8
people accommodated 4x + 8y ≥ 64
x ≥ 0,
y ≥ 0
Janet Geary 2012 Page 88
Drawing:
x + y = 13:    when x = 0, y = 13; when y = 0, x = 13
4x + 8y = 64:  when x = 0, y = 8;  when y = 0, x = 16
The graph looks like:
[WinQSB graph]
The objective function is 15x + 45y, which has gradient -15/45 = -1/3.
Thus we can draw a line anywhere with this gradient (1 down and 3 along).
As we wish to minimise the objective function, we must move the objective function line towards the origin.
The optimal solution is: x = 8, y = 4
The cost will be 15×8 + 45×4 = 300
Thus the trip organisers should book 8 4-person tents and 4 8-person tents.
This will cost £300 per night and will allow 64 people to be accommodated.
A total of 12 tents are used.
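As a cross-check (a sketch using SciPy rather than the WinQSB package the notes assume), the minimisation can be solved directly. The ≥ constraint for people accommodated is negated to fit linprog's A_ub x ≤ b_ub form.

```python
from scipy.optimize import linprog

# Camping trip: minimise nightly cost 15x + 45y.
c = [15, 45]
A_ub = [[ 1,  1],    # total tents: x + y <= 13
        [ 1,  0],    # 4-person tents: x <= 8
        [-4, -8]]    # people: 4x + 8y >= 64, written as -4x - 8y <= -64
b_ub = [13, 8, -64]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2,
              method="highs")
print(res.x, res.fun)   # 8 four-person tents, 4 eight-person tents, cost 300
```

This reproduces the graphical answer: 8 four-person and 4 eight-person tents at £300 per night.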
Seminar Sheet
1. A company manufactures two types of sweatshirts: hooded and round neck.
Each hooded sweatshirt makes a contribution to profit of £4 and each round neck sweatshirt makes a contribution of £3.
Each hooded sweatshirt requires 1 hour of labour and each round neck requires 2 hours. There are 110 hours of labour available each day. There are limitations in the production capacity so that only 70 hooded sweatshirts can be made in a day.
The company wishes to maximise the contribution from these sweatshirts.
a) Formulate as a linear programming problem.
b) Determine the number of hooded and round neck sweatshirts that should be made each day in order to maximise contribution.
c) What is the value of the maximum daily contribution?
2. A health enthusiast would like to organise his food consumption of two diet supplements Vita and Glow so that his minimum daily requirement of three basic nutrients A, B and C is satisfied.
The minimum daily requirements are 14 units of A, 12 units of B and 18 units of C.
Product Vita has 2 units of A and one unit each of B and C in each packet.
Product Glow has one unit each of A and B and 3 units of C in each packet.
The price of Vita is 20p and the price of Glow is 40p per packet.
The health enthusiast wants to determine the level of consumption of Vita and Glow that will minimise expenditure whilst satisfying the minimum daily requirements.
a) Formulate the description given above as a Linear Programming problem.
b) Advise the user on the best combination of Vita and Glow to use.
c) What is the minimum cost?
3. A furniture manufacturer makes two types of tables: Traditional and Modern.
Each traditional table requires 6 hours of cutting time, 5 hours of sanding time and 2 hours for staining.
Each traditional table sold gives a contribution to profit of £50.
Each modern table requires 2 hours of cutting time, 5 hours of sanding time and 4 hours of staining time.
Each modern table sold makes a contribution of £30.
Each day the manufacturer has available 36 hours of cutting time,40 hours of sanding time and 28 hours of staining time.
All other inputs are available as required. The company can sell all the tables it makes.
a) Find the number of each type of table that should be made in order to maximise the contribution.
Find the maximum contribution possible.
4. A clothes manufacturer offers two versions of a particular t-shirt; one printed and the other plain.
The manufacturing requirements are:
Cutting and printing time: The printed t-shirt takes 9 minutes each whilst each plain t-shirt takes only 3 minutes to cut and print. There are 360 minutes available for cutting and printing each day.
Each printed t-shirt takes 5 minutes for sewing and packing whilst each plain t-shirt takes only 3 minutes. There are 240 minutes available for sewing and packing of these t-shirts each day.
A contract with a local shop requires a minimum of 12 printed t-shirts to be produced each day.
The manufacturer makes a contribution to profit of £6 from the manufacture and sale of each printed t-shirt and £5 for each plain t-shirt.
The manufacturer wishes to maximise this contribution to profit.
Formulate the scenario described above as a linear programming problem. You should clearly indicate the meaning of each constraint and the meaning of any variables.
Using a graphical method, or otherwise, determine which combination of printed and plain t-shirt the manufacturer should produce and sell in order to maximise contribution to profit.
5. A small engineering company makes two types of engine parts, coded as part A and part B.
Part A has a contribution of £30 per unit and part B £40. The company wishes to establish the weekly production plan which maximises contribution.
Production data are as follows:
                           machining (hours)   labour (hours)   materials (kg)
part A                             4                 4                 1
part B                             2                 6                 1
total available per week         100               180                40
Because of a trade agreement, sales of part A are limited to a weekly maximum of 20 units and to honour an agreement with an old established customer at least 10 units of part B must be made each week.
Formulate the scenario described above as a linear programming problem. You should clearly indicate the meaning of each constraint and the meaning of any variables.
Using a graphical method, or otherwise, determine which combination of part A and part B the company should produce and sell in order to maximise contribution to profit.
What is the expected contribution to profit?
Answers:
1. a) max 4x + 3y
such that x + 2y ≤ 110
and x ≤70
b) 70 hooded and 20 round neck c) £340
2. a) min 20 x + 40y
such that 2x + y ≥ 14
and x + y ≥ 12
and x + 3y ≥18
b) 9 packets of Vita and 3 packets of Glow c) £3
3 max 50x + 30y
such that 6x + 2y ≤ 36
5x + 5y ≤ 40
2x + 4y ≤ 28
a) 5 traditional and 3 modern each day.
b) Maximum contribution: £340
4 max 6x + 5y
such that 9x + 3y ≤ 360
5x + 3y ≤ 240
x ≥12
Produce 12 printed and 60 plain t-shirts each day
5 Max 30A + 40B
such that 4A + 2B ≤ 100
4A + 6B ≤ 180
A + B ≤ 40
A ≤20 B ≥10
Produce 15 units of part A and 20 units of part B each week. The expected contribution is £1250 per week from the sale of these parts.
Topic 9 Linear Programming-Shadow Prices and Sensitivity Analysis
Consider the following problem:
A company makes two kinds of armchair; Model A with loose covers and Model B with fitted covers only.
The company estimates that it can sell as many of the armchairs as it can make. Management must now determine the production targets for the next few months in order to maximise profit.
The company knows that it will make a profit of £50 on each type A model and £40 on each type B model.
As Model A chairs require extra packing and loose covers, the company can only manage to make a maximum of 8 Model A chairs a day.
In the machining department, where the wood for the chairs is shaped, each Model A armchair requires 1 hour whilst each Model B requires 1.5 hours. There are 15 hours of shaping time available each day.
In the upholstery department each Model A requires 3 hours and each Model B requires 2 hours. There is a total of 30 hours available for upholstering each day.
Determine the optimal production plan.
Furthermore answer the following questions:
1. If an extra hour each day could be made available for shaping the wood, what would this be worth?
2. If an extra hour of upholstery time could be made available, by how much would profit be increased?
3. By how much could the profit on type A chairs alter before the original optimal plan changes?
4. If the original solution is to remain optimal, by how much could the profit from type B change?
Model:
Let x be the number of model A armchairs produced each day.
Let y be the number of model B armchairs produced each day.
maximise profit: max 50x + 40y
subject to:
max A: x ≤ 8
shaping: x + 1.5y ≤ 15
upholstery: 3x + 2 y ≤ 30
and x ≥ 0, y ≥ 0
The outline graph of the problem is given below:
The objective function 50x + 40y has gradient -50/40 = -5/4.
[Any line ax + by = c has gradient -a/b]
We can thus draw a profit line with this gradient and move it “top-right” to maximise.
The optimal solution is found at the point x = 6 and y = 6.
The optimal production plan is to make 6 model A armchairs and 6 model B armchairs each day.
This will produce a maximum profit of: Model A 6 @ £50 = £300
Model B 6 @ £40 = £240
Total = £540
Thus the maximum profit available is £540 a day.
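The armchair formulation can be verified with a solver. The following is a sketch using SciPy's linprog (an assumption; the notes use WinQSB):

```python
from scipy.optimize import linprog

# Armchair problem: maximise profit 50x + 40y (negated for linprog).
c = [-50, -40]
A_ub = [[1, 0],      # max model A:  x <= 8
        [1, 1.5],    # shaping:      x + 1.5y <= 15
        [3, 2]]      # upholstery:  3x + 2y   <= 30
b_ub = [8, 15, 30]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2,
              method="highs")
print(res.x, -res.fun)   # x = 6, y = 6, profit 540
```

The solver confirms the production plan of 6 Model A and 6 Model B chairs for a daily profit of £540.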
Slack and Binding Constraints.
Since the optimal solution lies on the boundary of two constraints, shaping and upholstery, these constraints are known as binding.
Considering each constraint in turn.
Shaping x + 1.5y ≤ 15
When x = 6 and y = 6, x + 1.5y = 6 + 1.5× 6 = 6 + 9 = 15
All of the available time for shaping has been used. Thus the slack on this constraint is 0.
Upholstery 3x + 2y ≤ 30
When x = 6 and y = 6 3x + 2y = 3 × 6 + 2 × 6 = 18 + 12 = 30
All of the time available for upholstering has been used. The slack on this constraint is 0.
Maximum model A x ≤ 8
when x = 6 and y = 6 Left hand side, LHS: x = 6
Right hand side, RHS 8
Slack = RHS - LHS = 2
Not all of the possible model A armchairs have been made. Thus there is a slack on this constraint of 2.
N.B. Binding constraints have zero slack.
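The slack calculations above are simple arithmetic (slack = RHS - LHS at the optimal point) and can be sketched as:

```python
# Slack check for the armchair plan x = 6, y = 6.
x, y = 6, 6
constraints = {
    "max A":      (x,           8),    # LHS, RHS of x <= 8
    "shaping":    (x + 1.5*y,  15),    # LHS, RHS of x + 1.5y <= 15
    "upholstery": (3*x + 2*y,  30),    # LHS, RHS of 3x + 2y <= 30
}
for name, (lhs, rhs) in constraints.items():
    print(name, "slack =", rhs - lhs)
# max A has slack 2; shaping and upholstery have slack 0 (binding)
```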
Shadow Prices
The shadow price ( or dual price) for a particular constraint shows the amount of improvement in the optimal objective value as the right hand side of that constraint is increased by one unit, with all other data held fixed
(Eppen, Gould and Schmidt, Introductory Management Science)
Thus if we are trying to maximise the objective function, the shadow price gives the increase in the value of the objective function. If we are seeking to minimise, the shadow price gives the decrease in the objective function value.
Shaping x + 1.5y ≤ 15
If the time available for shaping is increased by 1 hour the new constraint will be:
x + 1.5y ≤ 16
The new optimal solution will be where this line meets the upholstery line, (see graph).
New: shaping      x + 1.5y = 16   (× 3)   3x + 4.5y = 48
Old: upholstery  3x +   2y = 30           3x +   2y = 30
Subtracting:                                   2.5y = 18,  so y = 7.2
Substituting y = 7.2 into 3x + 2y = 30 gives 3x + 14.4 = 30, so 3x = 15.6 and x = 5.2
Thus the new optimal solution would be 5.2 of model A and 7.2 of model B per day
In practice this would mean producing 26 model A and 36 model B in a five day week.
New profit = 50x + 40y = 50 × 5.2 + 40 × 7.2 = 548
Old profit = 540
Extra profit = 8
One extra hour of shaping is worth £8. The shadow price for shaping is £8.
Upholstery 3x + 2y ≤ 30
If upholstery time is increased by 1 hour, the constraint becomes:
3x + 2y ≤ 31
and the new optimal solution will be where this constraint line meets the line for shaping.
New: upholstery  3x +   2y = 31           3x +   2y = 31
Old: shaping      x + 1.5y = 15   (× 3)   3x + 4.5y = 45
Subtracting:                                  -2.5y = -14,  so y = 5.6
Substituting y = 5.6 into 3x + 2y = 31 gives 3x + 11.2 = 31, so 3x = 19.8 and x = 6.6
New optimal solution is x = 6.6, y = 5.6
New profit 50x + 40y = 50 × 6.6 + 40 × 5.6 = 554
Old profit = 540
Extra profit = 14
One extra hour of time for upholstery is worth £14. The shadow price of upholstery is £14.
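Both shadow prices can be checked numerically by re-solving the problem with each right-hand side increased by one unit and comparing profits. A sketch using SciPy (an assumption; any LP solver would do):

```python
from scipy.optimize import linprog

c = [-50, -40]                       # maximise 50x + 40y (negated)
A = [[1, 0], [1, 1.5], [3, 2]]       # max A, shaping, upholstery
b = [8, 15, 30]

def max_profit(rhs):
    res = linprog(c, A_ub=A, b_ub=rhs, bounds=[(0, None)] * 2,
                  method="highs")
    return -res.fun

base = max_profit(b)                 # 540
shadow = {}
for i, name in enumerate(["max A", "shaping", "upholstery"]):
    bumped = list(b)
    bumped[i] += 1                   # one extra unit of this resource
    shadow[name] = round(max_profit(bumped) - base, 2)

print(shadow)   # shaping worth 8, upholstery worth 14, max A worth 0
```

The max A constraint has shadow price zero because it is slack: the company is not using all 8 chairs of capacity, so extra capacity earns nothing.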
Sensitivity Analysis on the Objective Coefficient Ranges
The objective coefficient ranges tell us the changes that can be made to the objective function coefficients without changing the optimal solution.
This is particularly useful as profits, costs etc are likely to change over time.
Our objective function line, profit = 50x + 40y, has gradient -50/40 = -5/4 = -1.25
Shaping constraint line x + 1.5y = 15 has gradient -1/1.5 = -0.667
Upholstery constraint 3x + 2y = 30 has gradient -3/2 = -1.5
The profit line will reach an optimal solution at the intersection of the shaping and upholstery lines whilst the gradient of the profit line lies between the gradients of these two binding constraints.
Thus the current solution will remain optimal whilst the gradient of the objective function lies between the gradient of the two binding constraints.
Model A
Let the profit from model A chairs change to a new amount called new.
The objective function will now look like: (new)x + 40y, which has gradient -new/40.
The current optimal solution will remain optimal whilst:
gradient of upholstery < gradient of profit line < gradient of shaping
-1.5 < -new/40 < -0.667
Multiplying through by -40 (which reverses the inequalities): 60 > new > 26.667
i.e. 26.667 < new < 60
Current solution remains optimal whilst the profit from model A lies in the range £26.67 to £60
This means that the profit could rise by an amount less than £60 - £50 = £10 or fall by an amount less than £50 - £26.67 = £23.33 without affecting the optimal solution.
If the profit from each model A armchair remains in the range £26.67 to £60 the current solution will remain optimal.
Model B
Let the profit from each model B armchair change to a new amount new.
The new objective function will be: 50x + (new)y, which has gradient -50/new.
The current solution will remain optimal whilst the gradient of the profit line is between the gradients of the two binding constraints:
-1.5 < -50/new < -0.667
Taking each side in turn (new is positive, so multiplying by -new reverses each inequality):
-1.5 < -50/new gives 1.5 × new > 50, so new > 50/1.5 = 33.333
-50/new < -0.667 gives 0.667 × new < 50, so new < 50/0.667 = 75
33.33 < new < 75
The current solution will remain optimal whilst the profit from each model B armchair lies in the range £33.33 to £75.
This is the same as saying:
The current solution will remain optimal whilst the value of the profit from each model B does not rise by more than £35 [£75 - £40] or fall by more than £6.67 [£40 - £33.33].
Thus continue to produce 6 of each model each day whilst the profit from each model B is between £33.33 and £75.
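These coefficient ranges can be verified empirically by re-solving with a perturbed profit and checking whether the plan changes. A sketch using SciPy (an assumption), testing the Model A range of £26.67 to £60:

```python
from scipy.optimize import linprog

A = [[1, 0], [1, 1.5], [3, 2]]       # max A, shaping, upholstery
b = [8, 15, 30]

def plan(profit_a, profit_b):
    # Return the optimal (x, y) for the given unit profits.
    res = linprog([-profit_a, -profit_b], A_ub=A, b_ub=b,
                  bounds=[(0, None)] * 2, method="highs")
    return tuple(round(v, 4) for v in res.x)

print(plan(50, 40))   # base case: (6, 6)
print(plan(59, 40))   # still (6, 6): 59 lies inside the range 26.67-60
print(plan(61, 40))   # plan changes: 61 exceeds the upper limit of 60
```

At a Model A profit of £61 the profit line becomes steeper than the upholstery line, so the optimum moves to a different vertex (x = 8, y = 3).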
The WinQSB input and output looks like :
Choose Edit and then Variable Names
Choose Edit and then Constraints
The expression can now be entered into the spreadsheet format
To get the output: Choose “Solve and Analyze” then “Solve the problem”
Solution is :
Notes for a report
The optimal production plan is to make 6 model A armchairs and 6 model B armchairs each day.
This will produce a maximum daily profit of £540
If the profit from each model A armchair remains in the range £26.67 to £60 the current solution will remain optimal
The current solution will remain optimal whilst the profit from each model B armchair lies in the range £33.33 to £75.
All of the available time for shaping has been used. Thus the slack on this constraint is 0.
One extra hour of shaping is worth £8. The shadow price for shaping is £8.
All of the time available for upholstering has been used. The slack on this constraint is 0.
One extra hour of time for upholstery is worth £14. The shadow price of upholstery is £14.
Not all of the 8 possible model A armchairs have been made; only 6 have been made, so there is a slack of 2 on this constraint.
N.B. Only one change at a time can be analysed in this way.
Seminar Exercise (the first part of each question was done last week)
1. A company manufactures two types of sweatshirts: hooded and round neck.
Each hooded sweatshirt makes a contribution to profit of £4 and each round neck sweatshirt makes a contribution of £3.
Each hooded sweatshirt requires 1 hour of labour and each round neck requires 2 hours. There are 110 hours of labour available each day. There are limitations in the production capacity so that only 70 hooded sweatshirts can be made in a day.
The company wishes to maximise the contribution from these sweatshirts.
a) Formulate as a linear programming problem.
b) Determine the number of hooded and round neck sweatshirts that should be made each day in order to maximise contribution.
c) What is the value of the maximum daily contribution?
d) If another hour of labour was available each day, what would be the maximum daily contribution?
e) If the limitations in the production process were changed so that 71 hooded sweatshirts could be made each day, what would be the effect on the maximum daily contribution?
f) Within what limits could the contribution from hooded sweatshirts lie, without the optimal production plan changing?
2. A health enthusiast would like to organise his food consumption of two diet supplements Vita and Glow so that his minimum daily requirement of three basic nutrients A, B and C is satisfied.
The minimum daily requirements are 14 units of A, 12 units of B and 18 units of C.
Product Vita has 2 units of A and one unit each of B and C in each packet.
Product Glow has one unit each of A and B and 3 units of C in each packet.
The price of Vita is 20p and the price of Glow is 40p per packet.
The health enthusiast wants to determine the level of consumption of Vita and Glow that will minimise expenditure whilst satisfying the minimum daily requirements.
a) Formulate the description given above as a Linear Programming problem.
b) Advise the user on the best combination of Vita and Glow to use.
c) What is the minimum cost?
d) If the minimum daily requirement for the nutrient A was increased by 1 unit, what would the minimum cost be?
e) If the minimum daily requirement for nutrient A was reduced by 1 unit, what would the minimum cost be?
f) Within what range could the price of Vita lie without changing the optimal combination of Vita and Glow?
3. A furniture manufacturer makes two types of tables: Traditional and Modern.
Each traditional table requires 6 hours of cutting time, 5 hours of sanding time and 2 hours for staining.
Each traditional table sold gives a contribution to profit of £50.
Each modern table requires 2 hours of cutting time, 5 hours of sanding time and 4 hours of staining time.
Each modern table sold makes a contribution of £30.
Each day the manufacturer has available 36 hours of cutting time, 40 hours of sanding time and 28 hours of staining time.
All other inputs are available as required. The company can sell all the tables it makes.
a) Find the number of each type of table that should be made in order to maximise the contribution.
Find the maximum contribution possible.
b) How will the company’s optimal mix of tables change if there were only 30 hours of cutting time available each day?
c) If an extra hour of cutting time became available each day, by how much would the maximum contribution change?
d) By how much could the contribution to profit of a Modern table increase before the optimal production plan changes?
4. A clothes manufacturer offers two versions of a particular t-shirt; one printed and the other plain.
The manufacturing requirements are:
Cutting and printing time: The printed t-shirt takes 9 minutes each whilst each plain t-shirt takes only 3 minutes to cut and print. There are 360 minutes available for cutting and printing each day.
Each printed t-shirt takes 5 minutes for sewing and packing whilst each plain t-shirt takes only 3 minutes. There are 240 minutes available for sewing and packing of these t-shirts each day.
A contract with a local shop requires a minimum of 12 printed t-shirts to be produced each day.
The manufacturer makes a contribution to profit of £6 from the manufacture and sale of each printed t-shirt and £5 for each plain t-shirt.
The manufacturer wishes to maximise this contribution to profit.
Formulate the scenario described above as a linear programming problem. You should clearly indicate the meaning of each constraint and the meaning of any variables.
Using a graphical method, or otherwise, determine which combination of printed and plain t-shirt the manufacturer should produce and sell in order to maximise contribution to profit.
Identify any binding constraints
If the time available for sewing and packing the t-shirts was to increase by 1 minute each day, what effect would this have on the maximum contribution that could be earned from the manufacture and sale of the t-shirts?
At present the contribution from a printed t-shirt is £6. By how much could this contribution increase before the production plan would need to be changed?
5. A small engineering company makes two types of engine parts, coded as part A and part B.
Part A has a contribution of £30 per unit and part B £40. The company wishes to establish the weekly production plan which maximises contribution.
Production data are as follows:
                           machining (hours)   labour (hours)   materials (kg)
part A                             4                 4                 1
part B                             2                 6                 1
total available per week         100               180                40
Because of a trade agreement, sales of part A are limited to a weekly maximum of 20 units and to honour an agreement with an old established customer at least 10 units of part B must be made each week.
Formulate the scenario described above as a linear programming problem. You should clearly indicate the meaning of each constraint and the meaning of any variables.
Using a graphical method, or otherwise, determine which combination of part A and part B the company should produce and sell in order to maximise contribution to profit.
What is the expected contribution to profit?
b) If the number of hours available for machining each week increased by 5, by how much would the maximum possible contribution change?
c) If the amount of material available each week increased by 1 kg, what effect would this have on the maximum contribution?
d) Within what range could the contribution from a part B lie, without altering the optimal solution found previously.
Answers:
1.
d) maximum contribution would increase by £1.50 to £341.50
e) maximum contribution would increase by £2.50
f) contribution from hooded must be above £1.50
2. d) minimum cost will not change i.e. £3
e) minimum cost will not change i.e. £3
f) price of Vita must lie in the range 13.3p to 40p
3
a) 5 Traditional and 3 Modern each day; maximum contribution £340
b) 3.5 Traditional and 4.5 modern each day (7 Traditional and 9 modern every 2 days)
c) maximum contribution would increase by £5
d) could increase by up to £20
4
Produce 12 printed and 60 plain t-shirts each day
binding constraints: sewing and order (printed >=12)
If sewing and packing time increases by 1 min, max contribution increases by £1.67
Contribution from a printed t-shirt could increase by up to £2.33 before the optimal solution changes.
5
Produce 15 units of part A and 20 units of part B each week. The expected contribution is £1250 per week from the sale of these parts.

b) maximum contribution would increase by 5 × £1.25 = £6.25
c) no effect, as the materials constraint is slack
d) £15 to £45
Topic 10: Linear Programming – Extension to more than two variables
There are many types of problems that can be solved using linear programming techniques.
We shall limit our consideration to just two examples: financial planning and farm production.
Example: Lottery Winner
A lottery winner, Fred, has instructed his financial adviser to invest £100,000 of his winnings in the best combination of three stocks, Alpha, Beta and Gamma.
The details on these stocks are given in the table below:
Stock     Price per share   Estimated annual return per share   Maximum possible investment
Alpha          £60                        £7                          £60,000
Beta           £25                        £3                          £25,000
Gamma          £20                        £3                          £30,000
Formulate, and solve, a linear programme to show how many shares of each stock Fred should purchase in order to maximise the estimated total annual return.
Advise Fred on the results of the sensitivity analysis.
Formulation
Let A be the number of Alpha shares purchased.
Let B be the number of Beta shares purchased.
Let C be the number of Gamma shares purchased.
Total estimated annual return is 7A + 3B + 3C
Total amount of money to be invested is £100,000
Thus 60A + 25B + 20C ≤ 100,000 [assuming we do not have to invest all of it]
Only £60,000 can be invested in Alpha, @ £60 per share, therefore A ≤ 1,000
Only £25,000 can be invested in Beta, @ £25 per share, therefore B ≤ 1,000
Only £30,000 can be invested in Gamma, @ £20 per share, therefore C ≤ 1,500
Problem: max 7A + 3B + 3C
subject to
total investment: 60A + 25B + 20C ≤ 100,000
max A: A ≤ 1,000
max B: B ≤ 1,000
max C: C ≤ 1,500
non-negativity A ≥ 0, B ≥ 0, C≥ 0
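With three variables a graphical method no longer applies, so a solver is required. The notes use WinQSB; as a sketch (an assumption), the same model can be solved with SciPy:

```python
from scipy.optimize import linprog

# Fred's portfolio: maximise 7A + 3B + 3C (negated for linprog).
c = [-7, -3, -3]
A_ub = [[60, 25, 20],     # total spend <= £100,000
        [ 1,  0,  0],     # max A: A <= 1,000 shares
        [ 0,  1,  0],     # max B: B <= 1,000 shares
        [ 0,  0,  1]]     # max C: C <= 1,500 shares
b_ub = [100_000, 1_000, 1_000, 1_500]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3,
              method="highs")
print(res.x, -res.fun)   # 750 Alpha, 1,000 Beta, 1,500 Gamma; return 12,750
```

The result matches the computer output reported below.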
Computer Input
Computer Output
Decision variable   Solution value
Alpha                     750
Beta                    1,000
Gamma                   1,500

Objective Function (max) = 12,750
Constraint          Slack or Surplus   Shadow Price
total investment            0             0.1167
max in A                  250             0
max in B                    0             0.0833
max in C                    0             0.6667
Optimal solution:
Buy 750 shares in Alpha, 1,000 shares in Beta and 1,500 shares in Gamma.
This will give a total estimated return of £12,750.
Slacks:
Constraint total investment has slack = S1 = 0. All of the £100,000 has been invested.
Constraint max in A has slack = S2 = 250.
The number of Alpha shares bought was 250 less than the number available.
[1000 were available but only 750 bought]
Constraint max in B has slack = S3 = 0.
The maximum number of Beta shares has been purchased.
Constraint max in C has slack = S4 = 0.
The maximum number of Gamma shares has been purchased.
Binding Constraints
The binding constraints are total investment, max in B and max in C (they all have zero slack).
Thus the maximum return is limited by the total amount of money that can be spent, the number of Beta shares available and the number of Gamma shares available.
Shadow Prices
Total Investment:
Total investment has a shadow price of 0.1167
If the total investment possible was increased by £1, a further £0.1167 could be earned.
Since any increased investment would be spent on Alpha shares, (Beta and Gamma shares are all used), every extra £1 spent on Alpha shares would generate £7/60 = £0.1167.
Max in A
The maximum Alpha constraint has shadow price equal to zero as we can earn nothing by increasing the availability of Alpha shares as there are already unused shares.
Max in B
If the total number of Beta shares available was increased by 1, a further £0.083 could be earned.
We could not predict this result as more Beta shares will reduce the number of Alpha and Gamma shares in an unknown manner.
Max in C
If the total number of Gamma shares available was increased by 1, a further £0.67 could be earned.
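Each of these shadow prices can be confirmed numerically by re-solving the problem with the relevant right-hand side increased by one unit. A sketch with SciPy (an assumption; the notes use WinQSB's reported shadow prices):

```python
from scipy.optimize import linprog

c = [-7, -3, -3]
A = [[60, 25, 20], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
b = [100_000, 1_000, 1_000, 1_500]
names = ["total investment", "max in A", "max in B", "max in C"]

def max_return(rhs):
    res = linprog(c, A_ub=A, b_ub=rhs, bounds=[(0, None)] * 3,
                  method="highs")
    return -res.fun

base = max_return(b)
shadow = {}
for i, name in enumerate(names):
    bumped = list(b)
    bumped[i] += 1                  # one more pound / one more share
    shadow[name] = round(max_return(bumped) - base, 4)

print(shadow)   # approx 0.1167, 0.0, 0.0833 and 0.6667 respectively
```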
Sensitivity Analysis
Decision Variable   Unit Cost or Profit   Allowable Min   Allowable Max
A                            7                 0               7.2
B                            3                 2.9167          M (infinity)
C                            3                 2.3333          M (infinity)
A shares
The current suggested portfolio of shares will remain optimal whilst the return on an Alpha share remains in the range 0 to £7.20, assuming the other returns remain constant.
B shares
The current suggested portfolio will remain optimal whilst the return on a Beta share remains above £2.92.
C shares
The current portfolio will remain optimal whilst the return on a Gamma share remains above £2.33.
Ranges for Shadow Prices
Constraint          Right Hand Side   Shadow Price   Allowable Min RHS   Allowable Max RHS
total investment        100,000          0.1167           55,000             115,000
max in A                  1,000          0                   750                 M
max in B                  1,000          0.0833              400               2,800
max in C                  1,500          0.6667              750               3,750
These values give the ranges within which the shadow prices apply.
Total investment
If the RHS of constraint total investment increases to 115,000 or less, we can say that for each extra £1 available for investment the return will increase by £0.1167. However if the increase takes the total investment available to above £115,000 we do not know what the extra return will be.
Similar arguments apply for reductions in the maximum total investment, provided the total available does not fall below £55,000.
Max in A
If the RHS of constraint max A, maximum number of Alpha shares, takes any value above 750, then for each additional share, the extra return will be zero. (This confirms that we only needed 750 Alpha shares for the original solution.)
Max in B
If the RHS of constraint max B, maximum number of Beta shares, is in the range 400 to 2,800 shares, the shadow price of £0.0833 will apply.
Max in C
As long as the maximum number of Gamma shares is in the range 750 to 3,750, the shadow price of £0.6667 will apply.
Farm Production
A farmer has two farms, Home Farm and Meadow Farm, in which he grows corn and wheat.
Both farms are 40 acres in size.
To satisfy a contract with a local mill, the farmer must produce 7,000 barrels of corn and 11,000 barrels of wheat each year.
The farmer wishes to minimise the cost of meeting the contract.
The data for each farm is given below:
Home Farm       yield per acre   cost per acre
corn              400 barrels        £100
wheat             300 barrels         £90

Meadow Farm     yield per acre   cost per acre
corn              650 barrels        £120
wheat             350 barrels         £80
Write a report to the farmer analysing your findings.
Formulation:
Variables: We need to distinguish between production at Home and Meadow Farm.
Let CH be the number of acres of corn planted at Home Farm
Let CM be the number of acres of corn planted at Meadow Farm
Let WH be the number of acres of wheat planted at Home Farm
Let WM be the number of acres of wheat planted at Meadow Farm
Minimise Cost:
Minimise 100CH + 90WH + 120CM + 80WM
subject to:
yield corn (1) 400CH + 650CM ≥ 7,000
wheat (2) 300WH + 350WM ≥ 11,000
area Home Farm (3) CH + WH ≤ 40
Meadow Farm (4) CM + WM ≤ 40
CH ≥ 0, CM ≥ 0, WH ≥0 WM ≥ 0
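This four-variable model must be solved by computer. As a sketch (an assumption; the notes use WinQSB), the ≥ yield constraints are negated to fit SciPy's A_ub x ≤ b_ub form:

```python
from scipy.optimize import linprog

# Variables: [CH, WH, CM, WM] = acres of corn/wheat at Home/Meadow Farm.
c = [100, 90, 120, 80]                       # planting cost per acre
A_ub = [[-400,    0, -650,    0],            # corn yield  >= 7,000 (negated)
        [   0, -300,    0, -350],            # wheat yield >= 11,000 (negated)
        [   1,    1,    0,    0],            # Home Farm   <= 40 acres
        [   0,    0,    1,    1]]            # Meadow Farm <= 40 acres
b_ub = [-7_000, -11_000, 40, 40]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4,
              method="highs")
print([round(v, 2) for v in res.x], round(res.fun, 2))
# [0.0, 2.56, 10.77, 29.23] at a minimum cost of 3861.54
```

The output matches the planting plan in the report below.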
The input for Farms is:
Report
1. Planting Plan
In order to minimise the planting costs, the following should be implemented:
Plant Home Farm with 2.56 acres of wheat
Plant Meadow Farm with 10.77 acres of corn and 29.23 acres of wheat.
This will incur costs of £3,861.54, the minimum that can be achieved.
2. Implications of Suggested Planting Programme
The planting plan given above will result in exactly 7,000 barrels of corn and exactly 11,000 barrels of wheat being produced.
[S1 = 0, S2 = 0]
37.44 acres of Home Farm will not be planted, whilst all 40 acres of Meadow Farm will be used.
3. Scope of the recommendations
3.1 Change in Costs
The planting plan given above will result in a minimum cost whilst the following conditions hold:
* The cost of planting corn at Home Farm stays above £89.23 per acre.
* The cost of planting corn at Meadow Farm stays below £137.50 per acre.
* The cost of planting wheat at Home Farm stays in the range £68.57 to £105 per acre.
* The cost of planting wheat at Meadow Farm stays in the range £62.50 to £105 per acre.
If any one of the conditions above fails to hold then a new planting plan will be required. Furthermore, these ranges apply to one cost changing at a time; if two or more of the present costs change simultaneously, the plan should be re-evaluated.
As these ranges are relatively large, the proposed plan should hold good for some time.
3.2 Change in Contract
The proposed plan exactly meets the contracts for 7,000 barrels of corn and 11,000 barrels of wheat.
If the requirement for corn was to increase by one barrel, the extra costs incurred in meeting this target would be 22.3 pence. This marginal cost of 22.3 pence per barrel applies for levels of production between 5,571.43 and 26,000 barrels.
If the minimum amount of wheat required was increased, the extra cost would be 30 pence per barrel. This extra cost per barrel will apply whilst the contract for wheat lies in the range 10,230.8 to 22,230.8 barrels.
If the minimum requirement for corn or wheat fell outside these ranges, a new planting plan would be required.
3.3 Changes in Farm Size
In total only 2.56 acres of Home Farm are being used. Clearly it would not make sense to consider increasing the size of Home Farm.
All 40 acres of Meadow Farm are being used. If the size of Meadow Farm could be increased by 1 acre, the minimum cost could be reduced by £25. This marginal reduction in cost applies when the size of Meadow Farm lies in the range 10.77 acres to 42.2 acres.
Similarly, if the size of Meadow Farm is reduced by 1 acre, the minimum cost will rise by £25.
[If we could increase the size of Meadow Farm by 1 acre, we could produce an extra 350 barrels of wheat at Meadow Farm at a cost of £80. At the same time we would reduce the wheat grown at Home Farm by 350 barrels, i.e. 350/300 = 1.1667 acres. This reduction would save 1.1667 × £90 = £105. Thus the net reduction would be £105 – £80 = £25.]
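The £25 calculation in the note above can be verified in a couple of lines of Python (variable names are my own):

```python
# One extra acre at Meadow Farm grows 350 barrels of wheat for £80;
# producing the same 350 barrels at Home Farm needs 350/300 acres
# at £90 per acre.
extra_meadow_cost = 80
home_saving = 350 / 300 * 90                  # £105.00
net_reduction = home_saving - extra_meadow_cost
print(f"net saving per extra Meadow acre: £{net_reduction:.2f}")   # £25.00
```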
Seminar Exercise
Question 1.
Formulate the problem below as a linear programming problem.
Solve the problem using QSB for Windows.
Write some notes on the interpretation of the output.
During the seminar discuss the formulation and the output.
A company has two factories A and B. Each factory makes two products, standard and deluxe. A unit of standard gives a profit contribution of £10, while a unit of deluxe gives a profit contribution of £15. The company wishes to maximise this profit contribution.
Each factory uses two processes, grinding and polishing, for producing its products.
Factory A has a grinding capacity of 80 hours per week and a polishing capacity of 60 hours per week. For factory B these capacities are 60 and 75 hours per week respectively.
The grinding and polishing times in hours for a unit of each type of product in each factory are given in the table below:
              Factory A                   Factory B
              standard     deluxe         standard     deluxe
grinding      4            2              5            3
polishing     2            5              5            6
It is possible, for example, that factory B has older machines than factory A, resulting in higher unit processing times.
In addition each unit of each product uses 4 kilograms of a raw material which we shall refer to as “raw”. The company has 120 kilograms of “raw” available per week.
(Example is taken from H.P.Williams Model Building in Mathematical Programming
3rd edition, Wiley 1990 page 45)
[9.166 units of standard in factory A, 8.33 units of deluxe in factory A, 12.5 units of deluxe
in factory B each week. Maximum contribution = £404.17 per week]
Question 2. Hitech Training
A small private college, Hitech Training, provides training in web-page design and development for companies. The companies send their employees to the college premises for short courses. Each course lasts 8 hours. The distribution of the 8 hours is flexible; a course may take place during one day or be spread over a number of days. It is also possible for one course to be spread over a number of weeks, e.g. 2 hours on Monday evening for 4 weeks.
The companies send employees in groups of 10 or less, as the college’s computer rooms can only accommodate 10 trainees at a time. If a company chooses to send less than 10 employees on a particular course they are not charged a reduced fee as the teaching costs are fixed. If the company wishes to send more than 10 employees, they must book more than one course.
The prices charged to the companies for a course are as follows:
Introductory Web Design £500 for an 8-hour course for up to 10 employees
Intermediate Web Design £600 for an 8-hour course for up to 10 employees
Advanced Web Design £700 for an 8-hour course for up to 10 employees
The college employs trainers to deliver these courses. The trainers are only paid for the hours they work. The rates of pay are as follows:
Introductory Web Design £18 an hour
Intermediate Web Design £21 an hour
Advanced Web Design £35 an hour
Demand for these training courses is such that the college is confident of having enough companies wanting courses to fill all the courses it provides. However, the number of courses that can be provided each week is limited by the fact that the college only has 200 trainer-hours available each week. [A trainer-hour is defined as one trainer teaching for one hour.]
The college has enough rooms and trainers to meet any time-tabling problems that could arise.
Of these 200 trainer-hours, only 80 are available for the Advanced Web Design course. A total of 160 trainer-hours are available for teaching the Intermediate or Advanced courses.
The trainers that can teach the Advanced courses can also teach the Intermediate courses. All trainers can teach the Introductory course.
The college’s analysis of the demand for its courses indicates that at least 5 of the courses offered each week should be at the Advanced level and at least 5 at the Intermediate level.
The college wishes to maximise its Net Revenue.
This is defined as: Net Revenue = Revenue from courses – Labour Costs.
Required:
· Formulate the scenario described on the previous page as a Linear Programme.
Keep a copy of any notes made during the formulation, as you may need these in the examination.
· Enter the problem into QSB for Windows (or a similar package)
Produce printouts of
The input data
Solution of the problem
Sensitivity Analysis
You will need to ensure that your printouts include: shadow prices, right-hand side ranges, ranges of objective coefficients.
· Bring these printouts and any notes to the seminar
Answer: Question 1.
Let AS be number of standard products made in factory A each week
Let BS be number of standard products made in factory B each week
Let AD be number of deluxe products made in factory A each week
Let BD be number of deluxe products made in factory B each week
max 10AS + 10BS + 15AD + 15BD
subject to:
grinding in A: 4AS + 2AD ≤ 80
polishing in A: 2AS + 5AD ≤ 60
grinding in B: 5BS + 3BD ≤ 60
polishing in B: 5BS + 6BD ≤ 75
raw: 4AS + 4BS + 4AD + 4BD ≤ 120
AS ≥ 0, BS ≥ 0, AD ≥ 0, BD ≥ 0
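As with the Farms problem, the formulation can be cross-checked with SciPy's linprog (a sketch, assuming SciPy is installed; the maximisation is handled by negating the objective):

```python
from scipy.optimize import linprog

# Decision variables, in order: [AS, AD, BS, BD].
# Maximise 10AS + 15AD + 10BS + 15BD, i.e. minimise the negation.
c = [-10, -15, -10, -15]
A_ub = [
    [4, 2, 0, 0],    # grinding in A  <= 80 hours
    [2, 5, 0, 0],    # polishing in A <= 60 hours
    [0, 0, 5, 3],    # grinding in B  <= 60 hours
    [0, 0, 5, 6],    # polishing in B <= 75 hours
    [4, 4, 4, 4],    # raw material   <= 120 kg
]
b_ub = [80, 60, 60, 75, 120]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4)
AS, AD, BS, BD = res.x
print(f"A: {AS:.3f} standard, {AD:.3f} deluxe")
print(f"B: {BS:.3f} standard, {BD:.3f} deluxe")
print(f"max contribution: £{-res.fun:.2f}")   # £404.17
```

This reproduces the bracketed answer above: about 9.17 standard and 8.33 deluxe units in factory A, 12.5 deluxe units in factory B.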
Question 2: Output.
Topic 11: Project Management: Critical Path
Introduction
In every industry there are concerns about how to manage large-scale, complicated projects effectively. Millions of pounds have been wasted in cost overruns caused by poor planning and control of projects.
Project planning
Planning the project requires that the objectives are clearly defined so the project team knows exactly what is required of them. All the activities involved in the project must be clearly identified. An activity is the performance of an individual job that requires labour, resources and time. Once the activities have been defined, their sequential relationships (which activity comes first, which activities must precede others) must be determined.
Once the activities and their relationships to each other are known, a network of activities can be drawn up. The time estimates for each activity then enable a total project completion time to be calculated. If this total completion time is longer than that specified in the objectives, means must be found to reduce the total project time (known as crashing). This reduction in time is usually achieved by assigning more resources to certain activities, which will reduce the overall project time.
Project Control
Once the planning process is complete and the work begun, the focus then moves onto controlling actual work involved. Controlling a project involves ensuring that the timetable set in the planning stage is adhered to and that the activities are completed in the appropriate sequence. Should any unforeseen problems arise at this stage, the project may have to be re-scheduled and additional resources allocated to ensure that the original completion time is adhered to.
We are going to cover:
· Understand how to plan, monitor and control projects with the use of CPM.
· Be able to determine the earliest start, earliest finish, latest finish and float times for each activity, along with the total project completion time.
· Produce Gantt Charts to facilitate monitoring of the project.
· Reduce the total project time, at the least total cost, by crashing the network.
Critical Path Method
The Critical Path Method (CPM) is a popular technique that helps managers plan, schedule, monitor and control large and complex projects.
The steps in the procedure are:
1. Define the project and all of its significant activities.
2. Decide which activities must precede others.
3. Draw the network connecting all of the activities.
4. Assign times and/or costs estimates to each activity.
5. Compute the critical path through the network.
6. Use the network to plan, schedule, monitor and control the project.
Example: to produce a text book
Activity code   Activity description                   Duration (months)   Preceding activities
A               Initial consultation with publisher    3                   –
B               Prepare proposal                       2                   A
C               Sign contract                          1                   B
D               Write material                         18                  B
E               First proof-read                       4                   C, D
F               Final proof-read and publish           6                   E
The CPM diagram consists of a network of boxes of the form
+-----------------+
|    Activity     |
|  EST   |  EFT   |
|    duration     |
|  LST   |  LFT   |
+-----------------+
EST = earliest start time LST = latest start time
EFT = earliest finish time LFT = latest finish time
[Network diagram: boxes for A (duration 3), B (2), C (1), D (18), E (4) and F (6), with the EST, EFT, LST and LFT entries still to be filled in.]
Forward Pass
The forward pass involves calculating the earliest time that each activity can start and finish.
Let activity A start at time 0 (its earliest start time)
As activity A takes 3 months its earliest finish time is 0+3
Thus earliest finish time (EF) = earliest start time (ES) + duration
B cannot start until A has completed, thus its earliest start time is the same as A's earliest finish time. Thus B has ES = 3. The duration is 2 and thus the earliest finish time = 5.
C follows on from B and thus its earliest start time = B's earliest finish time = 5. The earliest finish time is 5 + 1 = 6.
D follows on from B and thus its earliest start time is 5. Its earliest finish time is 5 + 18 = 23.
E follows from both C and D. As D has an earliest finish time of 23, E cannot start until 23. Thus ES = 23. Duration is 4, thus earliest finish time = 23 + 4 = 27.
F cannot start until E has finished, thus earliest start time = 27 and earliest finish time = 33.
Thus the project will take 33 months.
After the forward pass the network shows:

Activity   EST   EFT   Duration
A           0     3      3
B           3     5      2
C           5     6      1
D           5    23     18
E          23    27      4
F          27    33      6

(LST and LFT are still to be found by the backward pass.)
Backward Pass
The backward pass is used to determine which activities are “critical”. It works by considering the latest start time: latest start time (LS) = latest finish time (LF) – duration.
The latest finish time of F = 33 (as we do not want to extend the total time taken). The latest start time = 33 – duration (6) = 27.
As F can start at 27 months, this will be E's latest finish time. E has a latest start time of 27 – 4 = 23.
C has a latest finish time of 23 and a latest start time of 23 – 1 = 22.
D has a latest finish time of 23 and a latest start time of 23 – 18 = 5.
B must occur before C and D. Taking the smaller of their two latest start times, B has a latest finish time of 5. The latest start time is 5 – 2 = 3.
A has a latest finish time of 3 and a latest start time of 3 – 3 = 0.
After the backward pass the network shows:

Activity   EST   EFT   Duration   LST   LFT
A           0     3      3         0     3
B           3     5      2         3     5
C           5     6      1        22    23
D           5    23     18         5    23
E          23    27      4        23    27
F          27    33      6        27    33
The critical path is where the earliest finishing time equals the latest finishing time. In this example, the critical path is ABDEF. The minimum time the project can take is 33 months
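The forward and backward passes can be sketched as a small Python function (this is a sketch, not QSB or MS Project output; the dictionary layout is my own):

```python
def cpm(activities):
    """activities: {name: (duration, [predecessor names])}.
    Returns the float of each activity, the critical path and the duration."""
    eft, order = {}, []
    pending = dict(activities)
    while pending:                                 # forward pass
        for name, (dur, preds) in list(pending.items()):
            if all(p in eft for p in preds):
                est = max((eft[p] for p in preds), default=0)
                eft[name] = est + dur
                order.append(name)
                del pending[name]
    finish = max(eft.values())
    lft = {name: finish for name in activities}    # backward pass
    for name in reversed(order):
        dur, preds = activities[name]
        for p in preds:                            # LFT of p <= LST of its successor
            lft[p] = min(lft[p], lft[name] - dur)
    floats = {n: lft[n] - eft[n] for n in activities}
    critical = [n for n in order if floats[n] == 0]
    return floats, critical, finish

# The text-book project from the example above.
book = {"A": (3, []), "B": (2, ["A"]), "C": (1, ["B"]),
        "D": (18, ["B"]), "E": (4, ["C", "D"]), "F": (6, ["E"])}
floats, critical, months = cpm(book)
print("critical path:", "".join(critical), "| duration:", months, "months")
# critical path: ABDEF | duration: 33 months
```

The only non-critical activity is C, with a float of 23 – 6 = 17 months.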
Example: An Estate Agency is planning to open a new office in North London.
The main activities are as follows:
Activity   Description                  Preceding activity   Duration (weeks)
A          Find new office location     None                 9
B          Recruit new staff            None                 7
C          Make alterations to office   A                    5
D          Order equipment              A                    3
E          Install new equipment        D                    3
F          Train staff                  B                    4
G          Test operations              C, E, F              1
After the forward pass the network shows:

Activity   EST   EFT   Duration
A           0     9      9
B           0     7      7
C           9    14      5
D           9    12      3
E          12    15      3
F           7    11      4
G          15    16      1
Forward Pass
A: EST = 0 duration of A = 9 EFT = 0 + 9 = 9
B: EST = 0 duration of B = 7 EFT = 0 + 7 = 7
C: EST = 9 duration of C = 5 EFT = 9+ 5 = 14
D: EST = 9 duration of D = 3 EFT = 9 + 3 = 12
E: EST = 12 duration of E = 3 EFT = 12 + 3 = 15
F: EST = 7 duration of F = 4 EFT = 7 + 4 = 11
G: G cannot start until C, E and F have finished. These have EFT values of 14, 15 and 11.
Thus EST = 15, duration = 1, EFT = 15 + 1 = 16.
Thus the earliest finishing time of the entire project is 16 weeks.
Backward Pass
After the backward pass the network shows:

Activity   EST   EFT   Duration   LST   LFT
A           0     9      9         0     9
B           0     7      7         4    11
C           9    14      5        10    15
D           9    12      3         9    12
E          12    15      3        12    15
F           7    11      4        11    15
G          15    16      1        15    16
G: LFT = 16 [same as final EFT]; LST = LFT – duration = 16 – 1 = 15
E: LFT = 15; LST = LFT – duration = 15 – 3 = 12
C: LFT = 15; LST = LFT – duration = 15 – 5 = 10
D: LFT = 12; LST = LFT – duration = 12 – 3 = 9
F: LFT = 15; LST = LFT – duration = 15 – 4 = 11
A: LFT = 9 (the smaller of the LSTs of C and D); LST = LFT – duration = 9 – 9 = 0
B: LFT = 11; LST = LFT – duration = 11 – 7 = 4
The critical path is the route joining the nodes where EFT = LFT.
In this example, the critical path is ADEG. The shortest finishing time for the project is 16 weeks.
Floats
The floats measure the amount of time that an activity can over-run by without affecting the total completion time.
Float = Latest finish time (LFT) – Earliest finish time (EFT)
Considering the Estate Agency example used earlier:
[The network diagram from the backward pass is repeated here, with the EST, EFT, LST and LFT values for each activity; the floats are tabulated below.]
The critical path is ADEG. The shortest finishing time for the project is 16 weeks.
Activity   EFT   LFT   Float   Critical
A           9     9     0      yes
B           7    11     4
C          14    15     1
D          12    12     0      yes
E          15    15     0      yes
F          11    15     4
G          16    16     0      yes
Gantt Charts
Gantt Charts can be used by managers to chart the progress of a project. In these charts, time is measured along the horizontal axis, each activity is listed on the vertical axis, and a bar is drawn to show the duration of each activity. The variety of Gantt chart we will use shows each bar starting at its earliest start time (EST); the length of the bar represents the duration of the activity, with any ‘float’ shown at the end of the duration.
The Gantt Chart for the Estate Agency is:
Monitoring progress on a Gantt chart
A crucial step in meeting a completion target is to monitor a project’s progress. Gantt charts provide a visual means of tracking the progress of the project’s activities against the scheduled completion dates. The chart also lets the manager know where there is some ‘slack’, in the sense of the ‘float’ times.
For example, at the end of week 9, activity A should be completed and F should have been started. Activities D and C should begin the next week. Ideally activity B will have finished, but as long as it completes by week 11, the project can still complete on time.
Gantt Charts: Advantages and Disadvantages

Advantages:
1. The chart is easy to draw, particularly if appropriate software is used.
2. An earliest possible completion date is clearly visible.
3. For each activity the earliest possible start and finishing times are shown.

Disadvantages:
1. The chart only gives one possible sequence for the activities (the earliest possible completion time).
2. The precedence rules are not clear from the chart, so it is not obvious how a delay in one activity will affect the project.
Seminar Activity:
For the following cases, draw the network, find the critical path and shortest duration.
Identify the “float” for each activity.
a)
Activity   Preceding   Duration
A          None        3
B          None        4
C          A           4
D          A           7
E          B, C        2
b)
Activity   Preceding   Duration
A          None        4
B          None        6
C          A           4
D          A           3
E          C           5
F          B           2
G          D, F        8
H          B           5
c)
Activity   Preceding   Duration
A          None        4
B          A           12
C          A           11
D          None        20
E          D           6
F          B, C, E     7
G          F           10
H          E           5
I          G, H        4
Topic 11: Project Management: Crashing
Some definitions:
Normal Cost: The cost associated with the normal time estimate for the activity.
Crash Cost: The cost associated with the minimum possible time for an activity. Crash Costs, because of the extra wages, overtime payments etc., are higher than the normal costs.
Crash time: The minimum possible time that an activity can take.
Slope: The average cost of shortening an activity by one time unit.
Slope = increase in cost / decrease in time = (crash cost – normal cost) / (normal time – crash time)
Crashing: This process involves finding the least cost method of reducing the overall project duration.
This is done by reducing the time of the activity with the minimum ‘slope’ providing this activity is on the critical path.
The process is repeated until no further savings can be made.
Example:
Activity   Preceding   Normal     Crash      Normal     Crash
           activity    duration   duration   cost (£)   cost (£)
A          None        4          3          360        420
B          None        8          5          300        510
C          A           5          3          170        270
D          A           9          7          220        300
E          B, C        5          3          200        360
Total                                        1,250
Firstly: Draw network, find critical path and shortest duration.
Activity   EST   EFT   Duration   LST   LFT
A           0     4      4         0     4
B           0     8      8         1     9
C           4     9      5         4     9
D           4    13      9         5    14
E           9    14      5         9    14
Critical path = A, C, E; normal duration = 14 days; total cost = £1,250.
Crashing:
First Crash:
Step 1: Calculate the ‘slopes’ using:
Slope = increase in cost / decrease in time = (crash cost – normal cost) / (normal time – crash time)
Activity   Critical   Normal     Crash      Normal     Crash      Slope
                      duration   duration   cost (£)   cost (£)
A          yes        4          3          360        420        (420-360)/(4-3) = 60
B                     8          5          300        510
C          yes        5          3          170        270        (270-170)/(5-3) = 50
D                     9          7          220        300
E          yes        5          3          200        360        (360-200)/(5-3) = 80
Total                                       1,250
Thus activity C has the minimum slope and thus is the cheapest way of reducing the total duration by 1 day. Reducing C by 1 day costs £50.
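The slope calculation and the choice of the cheapest critical activity can be sketched in Python (data taken from the table above; variable names are my own):

```python
# Activity data: (normal time, crash time, normal cost, crash cost).
data = {
    "A": (4, 3, 360, 420), "B": (8, 5, 300, 510), "C": (5, 3, 170, 270),
    "D": (9, 7, 220, 300), "E": (5, 3, 200, 360),
}
# slope = (crash cost - normal cost) / (normal time - crash time)
slopes = {a: (cc - nc) / (nt - ct) for a, (nt, ct, nc, cc) in data.items()}
critical = {"A", "C", "E"}                 # critical path before the first crash
cheapest = min(critical, key=slopes.get)   # cheapest critical activity to crash
print(slopes)                              # A: 60, B: 70, C: 50, D: 40, E: 80
print("crash first:", cheapest, "at £%.0f per day" % slopes[cheapest])
```

Note that D has the smallest slope overall, but it is not on the critical path, so crashing it would not shorten the project at this stage.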
Step 2: New network:
Activity   EST   EFT   Duration   LST   LFT
A           0     4      4         0     4
B           0     8      8         0     8
C           4     8      4         4     8
D           4    13      9         4    13
E           8    13      5         8    13
Step 3: New critical path, total duration and cost
Critical paths = ACE, BE and AD; duration = 13 days.
Total Cost = £1,250 + £50 = £1,300
Second Crash: duration of C is now 4 and all activities are critical
Step 1: Calculate the ‘slopes’ using:
Slope = increase in cost / decrease in time = (crash cost – normal cost) / (normal time – crash time)
Activity   Critical   Normal     Crash      Normal      Crash      Slope
                      duration   duration   cost (£)    cost (£)
A          yes        4          3          360         420        60 [from before]
B          yes        8          5          300         510        (510-300)/(8-5) = 70
C          yes        4          3          170 (220)   270        50 [from before]
D          yes        9          7          220         300        (300-220)/(9-7) = 40
E          yes        5          3          200         360        80 [from before]
Total                                       1,250 (1,300)
Thus activity D has the minimum slope and thus is the cheapest way of reducing the total duration by 1 day. Reducing D by 1 day costs £40.
However, we cannot reduce D on its own as all paths are critical.
We have to consider combinations of reductions:
A and B extra cost = 60 + 70 = 130
D and E extra cost = 40 + 80 = 120
B,C and D extra cost = 70 + 50 + 40 = 160
A and E extra cost = 60 + 80 = 140
All of these combinations will reduce the total duration by 1 day.
The cheapest option is D and E.
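The comparison of combinations can be checked with a short sketch (slope values taken from the table above; the tuple layout is my own):

```python
# After the first crash every path (ACE, BE, AD) is critical, so shortening
# the project by one day needs a reduction on every path.
slopes = {"A": 60, "B": 70, "C": 50, "D": 40, "E": 80}
combos = [("A", "B"), ("D", "E"), ("B", "C", "D"), ("A", "E")]
costs = {combo: sum(slopes[a] for a in combo) for combo in combos}
for combo, cost in costs.items():
    print(" and ".join(combo), "extra cost =", cost)
print("cheapest:", min(costs, key=costs.get))   # ('D', 'E') at 120
```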
Step 2: New network:
Activity   EST   EFT   Duration   LST   LFT
A           0     4      4         0     4
B           0     8      8         0     8
C           4     8      4         4     8
D           4    12      8         4    12
E           8    12      4         8    12
Step 3: New critical path, total duration and cost
Critical paths = ACE, BE and AD; duration = 12 days.
Total Cost = £1,300 + £120 = £1,420
Third Crash: duration of C is now 4, D is 8 and E is 4 and all activities are critical
Step 1:
Activity   Critical   Current    Crash      Slope
                      duration   duration
A          yes        4          3          60
B          yes        8          5          70
C          yes        4          3          50
D          yes        8          7          40
E          yes        4          3          80
We cannot reduce one activity on its own as all paths are critical. We have to consider combinations of reductions:
A and B extra cost = 130
D and E extra cost = 120
B,C and D extra cost = 160
A and E extra cost = 140
All of these combinations will reduce the total duration by 1 day. The cheapest option is D and E.
Step 2: New network:
Activity   EST   EFT   Duration   LST   LFT
A           0     4      4         0     4
B           0     8      8         0     8
C           4     8      4         4     8
D           4    11      7         4    11
E           8    11      3         8    11
Step 3: New critical path, total duration and cost
Critical paths = ACE, BE and AD; duration = 11 days.
Total Cost = £1,420 + £120 = £1,540
Fourth Crash: duration of C is now 4, D is 7 and E is 3 and all activities are critical
Step 1:
Activity   Critical   Current    Crash      Slope
                      duration   duration
A          yes        4          3          60
B          yes        8          5          70
C          yes        4          3          50
D          yes        7          7          40
E          yes        3          3          80
We cannot reduce one activity on its own as all paths are critical and we cannot reduce activities D and E as they are at their ‘crash duration’.
We have to consider combinations of reductions:
A and B: extra cost = 130
D and E: D and E cannot be reduced
B, C and D: D cannot be reduced
A and E: E cannot be reduced
The only possible option left is to reduce A and B. This will reduce the total duration by 1 day.
Step 2: New network:
Activity   EST   EFT   Duration   LST   LFT
A           0     3      3         0     3
B           0     7      7         0     7
C           3     7      4         3     7
D           3    10      7         3    10
E           7    10      3         7    10
Step 3: New critical path, total duration and cost
Critical paths = ACE, BE and AD; duration = 10 days.
Total Cost = £1,540 + £130 = £1,670
Activity   Critical   Current    Crash      Slope
                      duration   duration
A          yes        3          3          60
B          yes        7          5          70
C          yes        4          3          50
D          yes        7          7          40
E          yes        3          3          80
The only activities that are not at their minimum duration are B and C.
As reducing B and C will not reduce the total duration, we cannot ‘crash’ any further.
Example: Klone Computers
Klone Computers is a small manufacturer of personal computers that is about to design, manufacture and market the Klone 2000 palmbook computer. The company faces three major tasks in introducing a new computer:
1. Manufacturing the new computer
2. Training staff and sales teams to operate the new computer
3. Advertising the new computer
When the proposed specifications for the new computer have been reviewed, the manufacturing phase begins with the design of a prototype computer. Once the design is determined, the required materials are purchased and the prototypes are manufactured. Prototype models are then tested and analysed by staff who have completed the staff training course. Based on their input, refinements are made to the prototype and an initial production run of computers is scheduled.
Staff training of company personnel begins once the computer is designed allowing staff to test the prototypes once they have been manufactured. After the computer design has been revised based on staff input, the sales force undergoes full-scale training.
Advertising is a two-phase process. First, a small group works closely with the design team so that once a product design has been chosen, the marketing team can begin an initial pre-production advertising campaign. Following this initial campaign and completion of the final design revisions, a larger advertising campaign team is introduced to the special features of the computer, and a full-scale advertising programme is launched.
The entire project is concluded when the initial production run is completed, the sales staff are trained, and the advertising campaign is underway.
Klone has come up with the following information to assist the planning of the project.
Phase           Activity   Description                       Immediate      Estimated completion
                                                             predecessors   time (days)
Manufacturing   A          Prototype model design            None           90
                B          Purchase of materials             A              15
                C          Manufacture of prototype models   B              5
                D          Revision of design                G              20
                E          Initial production run            D              21
Training        F          Staff training                    A              25
                G          Staff input on prototype models   C, F           14
                H          Sales training                    D              28
Advertising     I          Pre-production advertising        A              30
                J          Post-production advertising       D, I           45
[Based on an example in Lawrence, John & Pasternack, Barry, Applied Management Science, 2nd edition, Wiley 2002]
After the forward pass the network shows:

Activity   EST   EFT   Duration
A           0    90     90
B          90   105     15
C         105   110      5
D         129   149     20
E         149   170     21
F          90   115     25
G         115   129     14
H         149   177     28
I          90   120     30
J         149   194     45
Forward pass:
Node   Computation                                                     EFT
A      EST = 0; EFT = EST + duration of A = 0 + 90 = 90                 90
B      EST = 90; EFT = 90 + duration of B = 90 + 15 = 105              105
C      EST = 105; EFT = 105 + duration of C = 105 + 5 = 110            110
F      EST = 90; EFT = 90 + duration of F = 90 + 25 = 115              115
G      G cannot start until C and F have finished.
       EST = 115; EFT = 115 + duration of G = 115 + 14 = 129           129
D      EST = 129; EFT = 129 + duration of D = 129 + 20 = 149           149
E      EST = 149; EFT = 149 + duration of E = 149 + 21 = 170           170
H      EST = 149; EFT = 149 + duration of H = 149 + 28 = 177           177
I      EST = 90; EFT = 90 + duration of I = 90 + 30 = 120              120
J      J cannot start until both D and I have finished.
       EST = 149; EFT = 149 + duration of J = 149 + 45 = 194           194
After the backward pass the network shows:

Activity   EST   EFT   Duration   LST   LFT
A           0    90     90          0    90
B          90   105     15         95   110
C         105   110      5        110   115
D         129   149     20        129   149
E         149   170     21        173   194
F          90   115     25         90   115
G         115   129     14        115   129
H         149   177     28        166   194
I          90   120     30        119   149
J         149   194     45        149   194
Backward pass:
Node   Computation                                                              LST
J      LFT = 194; LST = 194 – duration of J = 194 – 45 = 149                    149
H      LFT = 194; LST = 194 – duration of H = 194 – 28 = 166                    166
E      LFT = 194; LST = 194 – duration of E = 194 – 21 = 173                    173
D      Via E: LFT = LST of E = 173. Via H: LFT = LST of H = 166.
       Via J: LFT = LST of J = 149.
       Thus LFT of D = 149; LST of D = 149 – duration of D = 149 – 20 = 129     129
G      LFT = LST of D = 129; LST of G = 129 – duration of G = 129 – 14 = 115    115
C      LFT = LST of G = 115; LST of C = 115 – duration of C = 115 – 5 = 110     110
B      LFT = LST of C = 110; LST of B = 110 – duration of B = 110 – 15 = 95     95
F      LFT = LST of G = 115; LST of F = 115 – duration of F = 115 – 25 = 90     90
I      LFT = LST of J = 149; LST of I = 149 – duration of I = 149 – 30 = 119    119
A      Via B: LFT = LST of B = 95. Via F: LFT = LST of F = 90.
       Via I: LFT = LST of I = 119.
       Thus LFT of A = 90; LST of A = 90 – duration of A = 90 – 90 = 0          0
Critical path is AFGDJ; minimum completion time is 194 days.
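As a cross-check of the forward and backward passes, a Python sketch (activity data taken from the Klone table; it assumes the activities are listed with every predecessor before its successors):

```python
# Klone activities as (duration, predecessors), in topological order.
klone = {"A": (90, []), "B": (15, ["A"]), "C": (5, ["B"]), "F": (25, ["A"]),
         "G": (14, ["C", "F"]), "D": (20, ["G"]), "E": (21, ["D"]),
         "H": (28, ["D"]), "I": (30, ["A"]), "J": (45, ["D", "I"])}
eft = {}
for name, (dur, preds) in klone.items():           # forward pass
    eft[name] = max((eft[p] for p in preds), default=0) + dur
finish = max(eft.values())
lft = {n: finish for n in klone}
for name in reversed(list(klone)):                 # backward pass
    dur, preds = klone[name]
    for p in preds:                                # LFT of p <= LST of successor
        lft[p] = min(lft[p], lft[name] - dur)
critical = [n for n in klone if lft[n] == eft[n]]  # zero-float activities
print("critical path:", "".join(critical), "| duration:", finish, "days")
# critical path: AFGDJ | duration: 194 days
```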
Floats
The floats measure the amount of time that an activity can over-run by without affecting the total completion time.
Float = Latest finish time (LFT) – Earliest finish time (EFT)
Activity   EFT   LFT   Float   Critical
A           90    90    0      yes
B          105   110    5
C          110   115    5
D          149   149    0      yes
E          170   194   24
F          115   115    0      yes
G          129   129    0      yes
H          177   194   17
I          120   149   29
J          194   194    0      yes
From the Gantt chart below, we can see the floats
In the Gantt chart the activities are presented at their EST with the duration represented in black and float in grey.
Seminar Question
A bank is planning to install a new computerised accounting system. The bank management has determined the activities required to complete the project, the precedence relationships, and activity time estimates (in weeks) as shown in the table below.
Activity   Description                          Predecessor   Normal   Crash   Normal   Crash
                                                              time     time    cost     cost
A          Recruit staff                        –             9        7       4,800    6,300
B          Systems development                  –             11       9       9,100    15,500
C          Systems training                     A             7        5       3,000    4,000
D          Equipment training                   A             10       8       3,600    5,000
E          Manual system test                   B, C          1        1       0        0
F          Preliminary system changeover        B, C          5        3       1,500    2,000
G          Computer-personnel interface         D, E          6        5       1,800    2,000
H          Equipment modification               D, E          3        3       0        0
I          Equipment testing                    H             1        1       0        0
J          System debugging and installation    F, G          2        2       0        0
K          Equipment changeover                 G, I          8        6       5,000    7,000
a. Determine the project completion time and the critical path.
b. Represent the project as a Gantt chart.
c. Crash the network to 26 weeks. Indicate how much this will cost the bank. Identify the new critical path.
[Example is based on an example from Taylor, Bernard, Introduction to Management Science, 7th edition, Prentice Hall, 2002]
How to use MS project 2010
Using the estate agents example we had before, we can use Microsoft project to schedule the task.
The main difference we will notice at first is that MS Project uses a calendar. To make the comparison with our manual example easy, let us start the project on 1st January 2013. We calculated that the project would take at least 16 weeks. 16 weeks from Tuesday 1st January would take us to Monday 22nd April.
The data entry looks like:
Note that only the first date, 1st January, was entered; the rest were calculated by the software.
Data entered:
As a network diagram this looks like:
As a Gantt chart we get:
Using the example above:
Load Up Microsoft Office Project
Choose File, New, Blank Project
To use the system with “auto scheduled” tasks we need to turn off manually scheduled tasks.
We do this by selecting File, Options, Schedule; set “New tasks created” to “Auto Scheduled”, then click OK.
To start to enter the project details:
The screen is split in two; you will need to use more of the left-hand side to add predecessors.
The first entry is Z: Start project with 0 duration. It is here that we set the start date of 1st January
The rest of the details can now be added:
Notes: a) only the first starting date is added by us
b) The predecessors have to be added by the activity number given in the first column
To get the different displays:
Using the View tab, choose the Gantt chart icon.
You will have to increase the size of the right-hand side of the screen to see the whole chart.
Using the View tab, choose the network icon. This looks like:
Seminar Activity
Use MS project to determine the critical path and the project duration for :
a. Klone Computers
b. Bank Accounting System
Answers
Critical path is A, F, G, D, J; start date = Tuesday 1st January, end date = Friday 27th September (38 weeks and 4 days = 38 × 5 + 4 = 194 working days).
Critical path = ADGK duration = 33 weeks
Estate Agents (Gantt chart data, in weeks)
Activity:   A   B   C   D   E   F   G
EST:        0   0   9   9  12   7  15
Duration:   9   7   5   3   3   4   1
Float:      0   4   1   0   0   4   0

Klone Computers (Gantt chart data, in days)
Activity:   A    B    C    D    E    F    G    H    I    J
EST:        0   90  105  129  149   90  115  149   90  149
Duration:  90   15    5   20   21   25   14   28   30   45
Float:      0    5    5    0   24    0    0   17   29    0
age of customers
20 to 30 30 to 40 40 to 50 50 to 60 60 to 70 70 to 80 0.03 0.12 0.34 0.34 0.12 0.05 age group
probability
ages of customers
20 to 25 25 to 30 30 to 35 35 to 40 40 to 45 45 to 50 50 to 55 55 to 60 60 to 65 65 to 70 70 to 75 75 to 80 0 0.01 0.02 0.05 7.0000000000000007E-2 0.12 0.22 0.22 0.12 0.08 0.04 0.03 0.02 age group
probability
ages of customers
20 to 22 22 to 24 24 to 26 26 to 28 28 to 30 30 to 32 32 to 34 34 to 36 36 to 38 38 to 40 40 to 42 42 to 44 44 to 46 46 to 48 48 to 50 50 to 52 52 to 54 54 to 56 56 to 58 58 to 60 60 to 62 62 to 64 64 to 66 66 to 68 69 to 70 70 to 72 72 to 74 74 to 76 76 to 78 78 to 80 0 0.01 0 0.01 0.01 0.01 0.02 0.02 0.02 0.05 0.05 0.05 7.0000000000000007E-2 0.09 0.08 0.11 0.08 0.08 0.06 0.03 0.04 0.02 0.02 0.02 0.01 0.02 0.01 0.01 0 0 age groups
probability
standard normal distribution
-3 -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.2999999999999998 -2.2000000000000002 -2.1 -2 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1000000000000001 -1 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1000000000000001 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 2.1 2.2000000000000002 2.2999999999999998 2.4 2.5000000000000102 2.6 2.7 2.80000000000001 2.9000000000000101 3.0000000000000102 4.4318484119380075E-3 5.9525324197758538E-3 7.9154515829799686E-3 1.0420934814422592E-2 1.3582969233685613E-2 1.752830049356854E-2 2.2394530294842899E-2 2.8327037741601186E-2 3.5474592846231424E-2 4.3983595980427191E-2 5.3990966513188063E-2 6.5615814774676595E-2 7.8950158300894149E-2 9.4049077376886947E-2 0.11092083467945554 0.12951759566589174 0.14972746563574488 0.17136859204780736 0.19418605498321295 0.21785217703255053 0.24197072451914337 0.26608524989875482 0.28969155276148273 0.31225393336676127 0.33322460289179967 0.35206532676429952 0.36827014030332333 0.38138781546052414 0.39104269397545588 0.39695254747701181 0.3989422804014327 0.39695254747701181 0.39104269397545588 0.38138781546052414 0.36827014030332333 0.35206532676429952 0.33322460289179967 0.31225393336676127 0.28969155276148273 0.26608524989875482 0.24197072451914337 0.21785217703255053 0.19418605498321295 0.17136859204780736 0.14972746563574488 0.12951759566589174 0.11092083467945554 9.4049077376886947E-2 7.8950158300894149E-2 6.5615814774676595E-2 5.3990966513188063E-2 4.3983595980427191E-2 3.5474592846231424E-2 2.8327037741601186E-2 2.2394530294842899E-2 1.7528300493568086E-2 1.3582969233685613E-2 1.0420934814422592E-2 7.915451582979743E-3 5.9525324197756795E-3 4.431848411937874E-3 standard deviations from the mean
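The density values plotted above follow the standard normal probability density function, which can be regenerated directly. A minimal sketch (the step size of 0.1 and range of -3 to 3 are taken from the chart's axis):

```python
import math

def std_normal_pdf(z: float) -> float:
    """Standard normal probability density at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Regenerate the plotted series for z = -3.0, -2.9, ..., 3.0.
zs = [round(-3 + 0.1 * i, 1) for i in range(61)]
densities = [std_normal_pdf(z) for z in zs]

# The curve peaks at z = 0 with density 1/sqrt(2*pi), roughly 0.3989.
print(max(densities))
```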
[Chart: normal distribution of employee ages, plotted over ages 30 to 72, shown alongside the standard normal curve.]
[Chart: pq = p(1 - p) plotted against the value of p for p from 0 to 1; the curve peaks at 0.25 when p = 0.5.]
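The plotted quantity pq = p(1 - p) is the variance term of a single Bernoulli trial, and its values can be reproduced in a few lines:

```python
# pq = p(1 - p) for p = 0, 0.1, ..., 1.0; this is the variance of a
# single Bernoulli trial, maximised at p = 0.5.
ps = [i / 10 for i in range(11)]
pqs = [p * (1 - p) for p in ps]

print(max(pqs))  # 0.25, reached at p = 0.5
```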
Employee data on age and length of service (30 employees):

Age:               24 51 25 34 40 32 23 28 21 33 26 21 57 27 23 45 22 43 35 19 28 44 40 35 48 25 43 24 30 35
Length of service:  2 24  2  9 12  6  1  5  3  4  3  1 14  5  4 13  1 11  8  1  8 12 13  3  9  2 11  4  5  6

[Chart: scatter plot of length of service against age for the 30 employees.]
[Chart: Soft toy company; the machine hours constraint line, plotted with teddy bears and rabbits on the axes.]
[Chart: Soft toy company; the machine hours and hand labour constraint lines, plotted with teddy bears and rabbits on the axes.]
[Chart: Soft toy company; the machine hours and hand labour constraint lines plus a third line from (0, 40) to (40, 0), plotted with teddy bears and rabbits on the axes.]
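In the graphical method charted above, the optimum of a linear programme lies at a corner of the feasible region, i.e. where two constraint boundaries intersect. A minimal sketch of finding such a corner, assuming each boundary is written from its axis intercepts as x/a + y/b = 1 (the intercepts used here, (45, 0)-(0, 30) for hand labour and (40, 0)-(0, 40) for the third line, are read off the chart, not given in the text):

```python
def line_from_intercepts(a: float, b: float):
    """Coefficients (A, B, C) of A*x + B*y = C for the line through (a, 0) and (0, b)."""
    return (b, a, a * b)

def intersect(l1, l2):
    """Intersection point of two lines given as (A, B, C) triples, via Cramer's rule."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel")
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# Hand labour boundary through (45, 0) and (0, 30); third boundary
# through (40, 0) and (0, 40), as plotted on the chart.
hand_labour = line_from_intercepts(45, 30)
third = line_from_intercepts(40, 40)
corner = intersect(hand_labour, third)
print(corner)  # (30.0, 10.0): a corner point of the feasible region
```

Evaluating the objective function at each such corner (and at the axis intercepts that satisfy all constraints) gives the optimal production mix.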
[Chart: camping trip; one line from (0, 13) to (13, 0) and another from (0, 8) to (16, 0), plotted with 4-person tents and 8-person tents on the axes.]