CAN ANY OF YOU COMPLETE THESE 13 QUESTIONS WITHIN 12 HRS , REPLY ASAP.
What to submit: Please submit a single Word file containing your numerical results, comments and graphics (if any) for all questions. Also submit the worksheets (if any) you used to produce the report – a total of two separate files (teaching assistants will review the worksheets in the event of errors in the report).
We have been tossing coins (or letting a computer toss them for us) to see what happens. One thing we learned is that we can get a slightly different answer each time we do it. There are cases in which it is possible to get an exact answer. This Assignment explores that. You can read the Exhaustion Method to see how you could generate a list of all possible outcomes for a coin tossing experiment. We suggest you use it to list the outcomes for seven tosses. We hope this will help you see that such a list is not too difficult to make if you go at it in a logical and organized way. However, you do not need to follow the process to do this Assignment. Please use the list of all 128 outcomes to answer questions 8-10 in the assignment and to check the list you made.
Statistics 1 Assignment 2 (40 points)
Q.1 (2 pts) In a survey of engineers at a hard drive manufacturer it was found that 18% were female, 7% were black, 35% had degrees in electrical or computer engineering, and 40% were under the age of 35. Would it make sense to present this information in a pie chart? Why or why not?
Q.2 (2 pts)
In basketball, some fouls result in “free throws” (unimpeded shots) by the player fouled. Over his career, a basketball player has scored on 1210 free throw attempts and missed 214 free throw attempts. What is his estimated probability of successfully scoring on a free throw attempt?
Q.3 (2 pts)
A political commentator makes the following observation in 1991: “From 1973 to 1982, the US economy grew at an annual rate of only 2%. From 1983 to 1990, the growth rate doubled to 4%. That’s a big difference.” Review the spreadsheet on GDP that was presented in this chapter and critique this statement, especially with respect to choice of comparison periods.
Q.4 (2 pts)
Consider the following data on the median home value in Boston neighborhoods (from the mid 20th century):
22
13.1
17.8
20.3
15.4
11.7
25.3
15.2
27.1
23.2
23.1
18.1
32.9
20.3
21.1
21.1
19.9
23.1
16.1
10.4
Find the standard normal score for the first value (22). (For purposes of calculating the standard deviation, you can consider this either as the entire population or as a sample.)
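As a rough illustration of the computation Q.4 describes, the sketch below computes a standard score, (value - mean) / standard deviation, for the first entry in the list. The helper name `z_score` and its `population` flag are illustrative choices, not part of the assignment; the flag switches between the population and the sample standard deviation, since the question allows either.

```python
import statistics

# Median home values copied from the list in Q.4.
values = [22, 13.1, 17.8, 20.3, 15.4, 11.7, 25.3, 15.2, 27.1, 23.2,
          23.1, 18.1, 32.9, 20.3, 21.1, 21.1, 19.9, 23.1, 16.1, 10.4]

def z_score(x, data, population=True):
    """Standard score of x relative to data (hypothetical helper)."""
    mean = statistics.fmean(data)
    # pstdev divides by n (population); stdev divides by n - 1 (sample).
    sd = statistics.pstdev(data) if population else statistics.stdev(data)
    return (x - mean) / sd

z_pop = z_score(values[0], values, population=True)
z_sample = z_score(values[0], values, population=False)
print(f"z using population sd: {z_pop:.3f}")
print(f"z using sample sd:     {z_sample:.3f}")
```

The two results differ only slightly because the sample standard deviation is a little larger than the population one for 20 values.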
Q.5 (3 pts)
Evidence has been produced that famous people are less likely to die in the month of their birthday than in other months. The (skeptical) hypothesis is that dying is equally likely in any month regardless of birthday.
Now suppose that out of 120 celebrity deaths, only 7 occurred in the month of their birthday.
Imagine a hat with 12 cards, each card a month, as well as a list of the 120 celebrity birthdays. We shuffle and pick a card, noting whether it matched the first celebrity birth month. We then repeat this (replacing the card each time, of course), each time noting whether the month picked from the hat matched the next birth month, etc., until we have gone all the way through the 120 names on the list.
Then we repeat this procedure 100 times, each time recording how many matches we got between the 120 picks from the hat, and the list of 120 birthdays. We got the following frequency distribution. What is your conclusion and why?
Number dying in birthday month   Frequency
 6                                1
 7                                3
 8                                9
 9                               20
10                               32
11                               25
12                                7
13                                1
14                                2
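The hat procedure in Q.5 can be sketched in a few lines. This is only an illustration under stated assumptions: the real list of 120 birthdays is not given, so `birth_months` is a made-up placeholder, and the seed and function name are arbitrary. The last part simply counts, from the frequency table in the question, how often a simulated run produced 7 or fewer matches.

```python
import random

def simulate_matches(birth_months, n_repeats=100, seed=0):
    """Repeat the hat procedure n_repeats times; return match counts."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_repeats):
        # Draw one month (with replacement) for each name on the list
        # and count how many draws equal that person's birth month.
        matches = sum(rng.randrange(12) == b for b in birth_months)
        counts.append(matches)
    return counts

# Placeholder birthdays (hypothetical): months coded 0-11.
placeholder_rng = random.Random(1)
birth_months = [placeholder_rng.randrange(12) for _ in range(120)]
counts = simulate_matches(birth_months)

# Frequency distribution reported in the question.
table = {6: 1, 7: 3, 8: 9, 9: 20, 10: 32, 11: 25, 12: 7, 13: 1, 14: 2}
observed = 7
tail = sum(f for k, f in table.items() if k <= observed) / sum(table.values())
print(f"Share of tabled runs with {observed} or fewer matches: {tail}")
```

Comparing `tail` with whatever cutoff the course uses is then a matter of interpretation, which the question leaves to you.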
Q.6 (4 pts) Rerun the CBC simulation that you already ran nine more times and report the ten p-values you obtain. So, you will (1) toss a coin ten times, (2) repeat step 1 a thousand times, recording what proportion of the 1000 got 7 or more heads, and then (3) do steps 1 and 2 nine more times, for a total of 100,000 tosses. NOTE: here is a link that will lead you to an Excel spreadsheet already set up to do this and another Excel for Windows spreadsheet using Box Sampler (you can download the macro-enabled workbook or you can install Box Sampler on your Windows computer). So all you really need to do is press a key or click on a couple of menu items ten times and write down what you get. Then make a nice statistical summary of the results and estimate the true p-value. Also give an estimate of how far off that value might be from the true value.
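For readers without the spreadsheets, the three steps in Q.6 might look like the following sketch. It is not the Box Sampler workbook; the function name, the seed, and the choice of the standard error of the mean as the "how far off" estimate are assumptions made for illustration.

```python
import random
import statistics

def estimate_p_value(rng, n_tosses=10, threshold=7, n_trials=1000):
    """Proportion of n_trials experiments with at least `threshold` heads."""
    hits = 0
    for _ in range(n_trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        if heads >= threshold:
            hits += 1
    return hits / n_trials

rng = random.Random(2024)  # arbitrary seed so the run is repeatable
p_values = [estimate_p_value(rng) for _ in range(10)]

mean_p = statistics.fmean(p_values)          # pooled estimate of the p-value
sd_p = statistics.stdev(p_values)            # spread among the ten runs
std_error = sd_p / len(p_values) ** 0.5      # rough uncertainty of mean_p

print("ten p-values:", p_values)
print(f"mean = {mean_p:.4f}, sd = {sd_p:.4f}, std. error = {std_error:.4f}")
```

Each run of 1000 experiments gives a slightly different p-value; averaging the ten runs and reporting their spread is one reasonable way to summarize them.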
Q.7 (5 pts) This exercise continues our work with the CBC story. In the text we remarked that cutting the number of major medical errors in half would have been more impressive if the number of errors had been larger. Redo question 6, but this time imagine we had 20 major medical errors to assign to years. If you use one of the spreadsheets we provided, you will need to make at least these changes:
a. Change the number of tosses from 10 to 20.
b. The formula that counts how many times 2008 came up will have to be changed to point to a range of 20 numbers rather than 10.
c. The table where you record the frequency distribution of the outcomes will have to expand to have 21 rows instead of 11. (Make good use of cut-and-paste here.)
d. Cutting the errors in half will now mean 14 or more errors in 2008 so you will have to change what you count in the frequency table when you compute the p-value.
Report a frequency table of outcomes and a p-value. Compare the p-value to what you got with just 10 medical errors.
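A simulated p-value can be sanity-checked against an exact count of outcomes. The sketch below is not the simulation the question asks for; it swaps in a direct binomial count for a fair coin, and the function name `upper_tail` is an illustrative assumption.

```python
from math import comb

def upper_tail(n_tosses, threshold):
    """Exact P(at least `threshold` heads) in n_tosses fair coin tosses."""
    favourable = sum(comb(n_tosses, k) for k in range(threshold, n_tosses + 1))
    return favourable / 2 ** n_tosses

p10 = upper_tail(10, 7)    # 7 or more out of 10
p20 = upper_tail(20, 14)   # 14 or more out of 20
print(f"10 tosses, 7 or more heads:  {p10:.4f}")
print(f"20 tosses, 14 or more heads: {p20:.4f}")
```

A simulated p-value that lands far from these figures would suggest a mistake in the spreadsheet changes listed in steps a to d.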
Q.8 (3 Points) Use the list of 128 outcomes for seven tosses of a coin (the link is in the assignment introduction) to make a table for the frequency and probability distribution of the random variable “number of heads in seven tosses”. (You can do this simply by counting.) There should be eight possible values, and each requires a probability. What do your probabilities add up to?
Q.9 (2 Points) Use the table you made above to compute the probability that the number of heads will be at least double the number of tails (this translates to “five or more heads”).
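The counting in Q.8 and Q.9 can also be done mechanically. The sketch below regenerates the 128 outcomes with `itertools.product` instead of reading the provided list, which is an assumption of convenience; exact fractions are used so the probabilities sum without rounding error.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**7 sequences of H and T.
outcomes = ["".join(t) for t in product("HT", repeat=7)]

# Frequency of each possible number of heads (0 through 7).
freq = Counter(o.count("H") for o in outcomes)
prob = {k: Fraction(freq[k], len(outcomes)) for k in sorted(freq)}

for k, p in prob.items():
    print(f"{k} heads: freq {freq[k]:3d}, prob {p}")

print("sum of probabilities:", sum(prob.values()))

# "At least double the number of tails" means five or more heads.
p_five_or_more = sum(p for k, p in prob.items() if k >= 5)
print("P(5 or more heads):", p_five_or_more)
```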
Q.10 (8 Points Total) Suppose you bought some really cheap blank DVDs at the dollar store. Then you look them up on the web and find that half these disks are dead on arrival and when data are recorded on the remainder, about half of those become unreadable within the first year, half of the survivors die in the second year, etc. Let’s see what happens over seven years. We can model the distribution of “time before failure” with a coin toss. Make a table for the probability distribution of the number of tosses before you got a head (=failure). For example this random variable assigns 3 to TTTHTHT and 0 to HTHTHTH, meaning one disk lasted three years and another was dead on arrival. (Make sure you get the right counts for these two examples before you continue.) If you never get a head (TTTTTTT), assign the value 7.
Use the “list of 128 outcomes” linked in the instructions for this homework to answer the following two questions.
a. (5 points) What is the probability that a disk will last 6 years (i.e. 7 tosses of the coin) before failing?
b. (3 points) Often of interest in such situations is the mean time before failure (MTBF). This is a common spec for computer hard drives. Find this for the distribution above.
Note: Could you use a simulation to solve these problems? Yes and a link will be included to a Box Sampler solution in the model answer. However, the setup is a bit involved and we are not asking you to do a simulation for this problem. This is the type of simulation more suited to a programming language than a statistical analysis package.
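The random variable in Q.10 is easy to get wrong by one, so a small check can help. The following sketch, with the hypothetical helper `tosses_before_first_head`, first confirms the two worked examples from the question and then tabulates the variable over all 128 outcomes. It enumerates outcomes exactly rather than simulating.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def tosses_before_first_head(outcome):
    """Number of tails before the first head; 7 if there is no head."""
    idx = outcome.find("H")
    return len(outcome) if idx == -1 else idx

# The two examples given in the question.
assert tosses_before_first_head("TTTHTHT") == 3
assert tosses_before_first_head("HTHTHTH") == 0

outcomes = ["".join(t) for t in product("HT", repeat=7)]
freq = Counter(tosses_before_first_head(o) for o in outcomes)
dist = {k: Fraction(freq[k], len(outcomes)) for k in sorted(freq)}

for k, p in dist.items():
    print(f"value {k}: freq {freq[k]:2d}, prob {p}")

# Mean of the tabulated distribution (values capped at 7).
mean_time = sum(k * p for k, p in dist.items())
print("mean:", mean_time)
```

Which row of the table answers part (a) depends on how you read "last 6 years"; the table itself is the same either way.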
Q.11 (1 pts) Here is a table of column percents for Department D in the Berkeley study of graduate admissions.
            Female    Male     All
Admitted     34.93   33.09   33.96
Rejected     65.07   66.91   66.04
All         100.00  100.00  100.00
What does this tell you about the relative admission rates of males and females in this department?
Q.12 (2 pts) Here is a table for Department F at UC Berkeley.
          Admitted  Rejected  All
Female          24       317  341
Male            22       351  373
All             46       668  714
Read it carefully and do an appropriate computation to compare the admission rates of males and females.
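One possible "appropriate computation" for Q.12 is the admission rate within each sex, that is, admitted divided by the row total. The sketch below is only one way to set it up; the dictionary layout is an illustrative assumption, and the counts are copied from the Department F table.

```python
# Counts from the Department F table (rows are sexes).
counts = {
    "Female": {"Admitted": 24, "Rejected": 317},
    "Male": {"Admitted": 22, "Rejected": 351},
}

# Admission rate = admitted / (admitted + rejected) within each row.
rates = {
    sex: c["Admitted"] / (c["Admitted"] + c["Rejected"])
    for sex, c in counts.items()
}

for sex, rate in rates.items():
    print(f"{sex}: {rate:.2%} admitted")
```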
Q.13 (4 pts) Here is a contingency table for the variables Dept. and Admit from Berkeley.
            A    B    C    D    E    F    All
Admitted  601  370  322  269  147   46   1755
Rejected  332  215  596  523  437  668   2771
All       933  585  918  792  584  714   4526
Find P(C), P(R) and P(C∩R). (Note: The last is the probability of C intersect R in case your browser does not show math. symbols. “R” means “rejected”)
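The three probabilities in Q.13 are each a count divided by the grand total. A minimal sketch, assuming "C" means "applied to Department C" and using variable names chosen only for illustration:

```python
from fractions import Fraction

# Counts copied from the contingency table.
admitted = {"A": 601, "B": 370, "C": 322, "D": 269, "E": 147, "F": 46}
rejected = {"A": 332, "B": 215, "C": 596, "D": 523, "E": 437, "F": 668}

total = sum(admitted.values()) + sum(rejected.values())

p_c = Fraction(admitted["C"] + rejected["C"], total)   # column total / grand total
p_r = Fraction(sum(rejected.values()), total)          # row total / grand total
p_c_and_r = Fraction(rejected["C"], total)             # single cell / grand total

print(f"P(C)     = {float(p_c):.4f}")
print(f"P(R)     = {float(p_r):.4f}")
print(f"P(C and R) = {float(p_c_and_r):.4f}")
```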
Possible Outcomes for Seven Tosses of a Fair Coin
HHHHHHH
THHHHHH
HTHHHHH
TTHHHHH
HHTHHHH
THTHHHH
HTTHHHH
TTTHHHH
HHHTHHH
THHTHHH
HTHTHHH
TTHTHHH
HHTTHHH
THTTHHH
HTTTHHH
TTTTHHH
HHHHTHH
THHHTHH
HTHHTHH
TTHHTHH
HHTHTHH
THTHTHH
HTTHTHH
TTTHTHH
HHHTTHH
THHTTHH
HTHTTHH
TTHTTHH
HHTTTHH
THTTTHH
HTTTTHH
TTTTTHH
HHHHHTH
THHHHTH
HTHHHTH
TTHHHTH
HHTHHTH
THTHHTH
HTTHHTH
TTTHHTH
HHHTHTH
THHTHTH
HTHTHTH
TTHTHTH
HHTTHTH
THTTHTH
HTTTHTH
TTTTHTH
HHHHTTH
THHHTTH
HTHHTTH
TTHHTTH
HHTHTTH
THTHTTH
HTTHTTH
TTTHTTH
HHHTTTH
THHTTTH
HTHTTTH
TTHTTTH
HHTTTTH
THTTTTH
HTTTTTH
TTTTTTH
HHHHHHT
THHHHHT
HTHHHHT
TTHHHHT
HHTHHHT
THTHHHT
HTTHHHT
TTTHHHT
HHHTHHT
THHTHHT
HTHTHHT
TTHTHHT
HHTTHHT
THTTHHT
HTTTHHT
TTTTHHT
HHHHTHT
THHHTHT
HTHHTHT
TTHHTHT
HHTHTHT
THTHTHT
HTTHTHT
TTTHTHT
HHHTTHT
THHTTHT
HTHTTHT
TTHTTHT
HHTTTHT
THTTTHT
HTTTTHT
TTTTTHT
HHHHHTT
THHHHTT
HTHHHTT
TTHHHTT
HHTHHTT
THTHHTT
HTTHHTT
TTTHHTT
HHHTHTT
THHTHTT
HTHTHTT
TTHTHTT
HHTTHTT
THTTHTT
HTTTHTT
TTTTHTT
HHHHTTT
THHHTTT
HTHHTTT
TTHHTTT
HHTHTTT
THTHTTT
HTTHTTT
TTTHTTT
HHHTTTT
THHTTTT
HTHTTTT
TTHTTTT
HHTTTTT
THTTTTT
HTTTTTT
TTTTTTT
Now you just need to go through the list and count to get a frequency (or probability) distribution. For example, here is a count of heads for the first five outcomes.
HHHHHHH 7
THHHHHH 6
HTHHHHH 6
TTHHHHH 5
HHTHHHH 6
The Method of Exhaustion
This exercise is intended to give you a feel for where the theoretical probabilities come from. It is the word processor equivalent of a tree diagram. It’s based on repeating a pattern, but the pattern is actually harder to see at the beginning, so we will start with a list of possible outcomes for two tosses of a fair coin (which is an approximate model for the sexes of children born in a family with two children). The possibilities are
HH
TH
HT
TT
We will use this to build the corresponding list for three tosses. In three tosses, we could have any of the above outcomes followed by a head plus any of the above outcomes followed by a tail, for a total of eight possibilities. Use Copy and Paste in your word processor to make two copies of the outcomes for two tosses.
HH
TH
HT
TT
HH
TH
HT
TT
Then add an H at the end of the top four and a T at the end of the bottom four to get these outcomes for three tosses.
HHH
THH
HTH
TTH
HHT
THT
HTT
TTT
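The copy-and-paste step can be written as a short function. This is a sketch of the same doubling idea, with the name `extend` chosen for illustration: make two copies of the current list, append H to the first copy and T to the second.

```python
def extend(outcomes):
    """One step of the method: copy the list twice, add H, then T."""
    return [o + "H" for o in outcomes] + [o + "T" for o in outcomes]

one_toss = ["H", "T"]
two_tosses = extend(one_toss)        # HH, TH, HT, TT
three_tosses = extend(two_tosses)    # the eight outcomes listed above

# Keep doubling until each outcome has seven tosses.
seven_tosses = three_tosses
for _ in range(4):
    seven_tosses = extend(seven_tosses)

print(three_tosses)
print(len(seven_tosses), "outcomes for seven tosses")
```

Because every step produces the list in the same order as the hand-built version, the result can be compared line by line with a list made in a word processor.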
You can continue in this pattern to create lists of possible outcomes for 4, 5, 6 and 7 or more tosses of a fair coin. The number of outcomes should double each time. You should find that this is repetitive but not difficult and not too time-consuming. This approach allows you to compute theoretical probabilities for a few tosses. If you print your document you will find that continuing to, say, 20 tosses, would use up a lot of trees. Still, you could answer any question by simple counting. If we wish to work with large numbers of tosses, a formula might be handy. The way to find a formula is to look for patterns in easy cases we can do by hand. For that purpose, please make a frequency table and some sort of display for the outcomes of 4, 5, 6 and 7 tosses of a fair coin. Here is the three-toss case done as an example:
# heads freq.
0 1
1 3
2 3
3 1
Here is a stem-and-leaf plot. If you wonder where the leaves came from, think of “1” as “1.0”.
0|0
1|000
2|000
3|0
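The same table and display can be produced for any number of tosses. The sketch below uses two hypothetical helpers, `heads_frequency` and `stem_and_leaf`, and reproduces the three-toss example before moving on to four tosses; it draws one "0" leaf per outcome, following the display above.

```python
from collections import Counter
from itertools import product

def heads_frequency(n_tosses):
    """Frequencies of 0, 1, ..., n_tosses heads over all outcomes."""
    counts = Counter(t.count("H") for t in product("HT", repeat=n_tosses))
    return [counts[k] for k in range(n_tosses + 1)]

def stem_and_leaf(freqs):
    """One row per number of heads, one '0' leaf per outcome."""
    return [f"{heads}|{'0' * f}" for heads, f in enumerate(freqs)]

three = heads_frequency(3)
print("# heads freq.")
for heads, f in enumerate(three):
    print(heads, f)
for row in stem_and_leaf(three):
    print(row)

# The same display for four tosses.
for row in stem_and_leaf(heads_frequency(4)):
    print(row)
```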