Hypothesis Testing

Assignment 2: Discussion

You are a data analyst with John and Sons Company. The company has a large number of manufacturing plants in the United States and overseas. The company plans to open a new manufacturing plant. It has to decide whether to open this plant in the United States or overseas.

What is an appropriate null hypothesis to compare the quality of the product manufactured in the overseas plants and the U.S. plants? Why? How would you choose an appropriate level of significance for your statistical test? What are the possible outcomes and limitations of your statistical test?

 

By Saturday, March 16, 2013, post to the Discussion Area the requested information and analysis. 

 

· From the textbook, Business Statistics in Practice, read the following chapters:

· Hypothesis Testing

· Decision Theory

CHAPTER 9: Hypothesis Testing

Chapter Outline


9.1 The Null and Alternative Hypotheses and Errors in Hypothesis Testing

9.2 z Tests about a Population Mean: σ Known

9.3 t Tests about a Population Mean: σ Unknown

9.4 z Tests about a Population Proportion

9.5 Type II Error Probabilities and Sample Size Determination (Optional)

9.6 The Chi-Square Distribution (Optional)

9.7 Statistical Inference for a Population Variance (Optional)

Hypothesis testing is a statistical procedure used to provide evidence in favor of some statement (called a hypothesis). For instance, hypothesis testing might be used to assess whether a population parameter, such as a population mean, differs from a specified standard or previous value. In this chapter we discuss testing hypotheses about population means, proportions, and variances.

In order to illustrate how hypothesis testing works, we revisit several cases introduced in previous chapters and also introduce some new cases:

The Payment Time Case: The consulting firm uses hypothesis testing to provide strong evidence that the new electronic billing system has reduced the mean payment time by more than 50 percent.

The Cheese Spread Case: The cheese spread producer uses hypothesis testing to supply extremely strong evidence that fewer than 10 percent of all current purchasers would stop buying the cheese spread if the new spout were used.

The Electronic Article Surveillance Case: A company that sells and installs EAS systems claims that at most 5 percent of all consumers would never shop in a store again if the store subjected them to a false EAS alarm. A store considering the purchase of such a system uses hypothesis testing to provide extremely strong evidence that this claim is not true.

The Trash Bag Case: A marketer of trash bags uses hypothesis testing to support its claim that the mean breaking strength of its new trash bag is greater than 50 pounds. As a result, a television network approves use of this claim in a commercial.

The Valentine’s Day Chocolate Case: A candy company projects that this year’s sales of its special valentine box of assorted chocolates will be 10 percent higher than last year. The candy company uses hypothesis testing to assess whether it is reasonable to plan for a 10 percent increase in sales of the valentine box.

9.1: The Null and Alternative Hypotheses and Errors in Hypothesis Testing

One of the authors’ former students is employed by a major television network in the standards and practices division. One of the division’s responsibilities is to reduce the chances that advertisers will make false claims in commercials run on the network. Our former student reports that the network uses a statistical methodology called hypothesis testing to do this.


Chapter 9

To see how this might be done, suppose that a company wishes to advertise a claim, and suppose that the network has reason to doubt that this claim is true. The network assumes for the sake of argument that the claim is not valid. This assumption is called the null hypothesis. The statement that the claim is valid is called the alternative, or research, hypothesis. The network will run the commercial only if the company making the claim provides sufficient sample evidence to reject the null hypothesis that the claim is not valid in favor of the alternative hypothesis that the claim is valid. Explaining the exact meaning of sufficient sample evidence is quite involved and will be discussed in the next section.

The Null Hypothesis and the Alternative Hypothesis

In hypothesis testing:

1 The null hypothesis, denoted H0, is the statement being tested. Usually this statement represents the status quo and is not rejected unless there is convincing sample evidence that it is false.

2 The alternative, or research, hypothesis, denoted Ha, is a statement that will be accepted only if there is convincing sample evidence that it is true.

Setting up the null and alternative hypotheses in a practical situation can be tricky. In some situations there is a condition for which we need to attempt to find supportive evidence. We then formulate (1) the alternative hypothesis to be the statement that this condition exists and (2) the null hypothesis to be the statement that this condition does not exist. To illustrate this, we consider the following case studies.

EXAMPLE 9.1: The Trash Bag Case

A leading manufacturer of trash bags produces the strongest trash bags on the market. The company has developed a new 30-gallon bag using a specially formulated plastic that is stronger and more biodegradable than other plastics. This plastic’s increased strength allows the bag’s thickness to be reduced, and the resulting cost savings will enable the company to lower its bag price by 25 percent. The company also believes the new bag is stronger than its current 30-gallon bag.

The manufacturer wants to advertise the new bag on a major television network. In addition to promoting its price reduction, the company also wants to claim the new bag is better for the environment and stronger than its current bag. The network is convinced of the bag’s environmental advantages on scientific grounds. However, the network questions the company’s claim of increased strength and requires statistical evidence to justify this claim. Although there are various measures of bag strength, the manufacturer and the network agree to employ “breaking strength.” A bag’s breaking strength is the amount of a representative trash mix (in pounds) that, when loaded into a bag suspended in the air, will cause the bag to rip or tear. Tests show that the current bag has a mean breaking strength that is very close to (but does not exceed) 50 pounds. The new bag’s mean breaking strength μ is unknown and in question. The alternative hypothesis Ha is the statement for which we wish to find supportive evidence. Because we hope the new bags are stronger than the current bags, Ha says that μ is greater than 50. The null hypothesis states that Ha is false. Therefore, H0 says that μ is less than or equal to 50. We summarize these hypotheses by stating that we are testing

H0: μ ≤ 50   versus   Ha: μ > 50

The network will run the manufacturer’s commercial if a random sample of n new bags provides sufficient evidence to reject H0: μ ≤ 50   in favor of   Ha: μ > 50.

EXAMPLE 9.2: The Payment Time Case

Recall that a management consulting firm has installed a new computer-based, electronic billing system for a Hamilton, Ohio, trucking company. Because of the system’s advantages, and because the trucking company’s clients are receptive to using this system, the management consulting firm believes that the new system will reduce the mean bill payment time by more than 50 percent. The mean payment time using the old billing system was approximately equal to, but no less than, 39 days. Therefore, if μ denotes the mean payment time using the new system, the consulting firm believes that μ will be less than 19.5 days. Because it is hoped that the new billing system reduces mean payment time, we formulate the alternative hypothesis as Ha: μ < 19.5 and the null hypothesis as H0: μ ≥ 19.5. The consulting firm will randomly select a sample of n invoices and determine if their payment times provide sufficient evidence to reject H0: μ ≥ 19.5 in favor of Ha: μ < 19.5. If such evidence exists, the consulting firm will conclude that the new electronic billing system has reduced the Hamilton trucking company’s mean bill payment time by more than 50 percent. This conclusion will be used to help demonstrate the benefits of the new billing system both to the Hamilton company and to other trucking companies that are considering using such a system.

EXAMPLE 9.3: The Valentine’s Day Chocolate Case

A candy company annually markets a special 18-ounce box of assorted chocolates to large retail stores for Valentine’s Day. This year the candy company has designed an extremely attractive new valentine box and will fill the box with an especially appealing assortment of chocolates. For this reason, the candy company subjectively projects—based on past experience and knowledge of the candy market—that sales of its valentine box will be 10 percent higher than last year. However, since the candy company must decide how many valentine boxes to produce, the company needs to assess whether it is reasonable to plan for a 10 percent increase in sales.

Before the beginning of each Valentine’s Day sales season, the candy company sends large retail stores information about its newest valentine box of assorted chocolates. This information includes a description of the box of chocolates, as well as a preview of advertising displays that the candy company will provide to help retail stores sell the chocolates. Each retail store then places a single (nonreturnable) order of valentine boxes to satisfy its anticipated customer demand for the Valentine’s Day sales season. Last year the mean order quantity of large retail stores was 300 boxes per store. If the projected 10 percent sales increase will occur, the mean order quantity, μ, of large retail stores this year will be 330 boxes per store. Therefore, the candy company wishes to test the null hypothesis H0: μ = 330 versus the alternative hypothesis Ha: μ ≠ 330.

To perform the hypothesis test, the candy company will randomly select a sample of n large retail stores and will make an early mailing to these stores promoting this year’s valentine box. The candy company will then ask each retail store to report how many valentine boxes it anticipates ordering. If the sample data do not provide sufficient evidence to reject H0: μ = 330 in favor of Ha: μ ≠ 330, the candy company will base its production on the projected 10 percent sales increase. On the other hand, if there is sufficient evidence to reject H0: μ = 330, the candy company will change its production plans.

We next summarize the sets of null and alternative hypotheses that we have thus far considered.

The alternative hypothesis Ha: μ > 50 is called a one-sided, greater than alternative hypothesis, whereas Ha: μ < 19.5 is called a one-sided, less than alternative hypothesis, and Ha: μ ≠ 330 is called a two-sided, not equal to alternative hypothesis. Many of the alternative hypotheses we consider in this book are one of these three types. Also, note that each null hypothesis we have considered involves an equality. For example, the null hypothesis H0: μ ≤ 50 says that μ is either less than or equal to 50. We will see that, in general, the approach we use to test a null hypothesis versus an alternative hypothesis requires that the null hypothesis involve an equality.

The idea of a test statistic

Suppose that in the trash bag case the manufacturer randomly selects a sample of n = 40 new trash bags. Each of these bags is tested for breaking strength, and the sample mean x̄ of the 40 breaking strengths is calculated. In order to test H0: μ ≤ 50 versus Ha: μ > 50, we utilize the test statistic

z = (x̄ − 50) / (σ/√n)

The test statistic z measures the distance between x̄ and 50. The division by σ/√n says that this distance is measured in units of the standard deviation of all possible sample means. For example, a value of z equal to, say, 2.4 would tell us that x̄ is 2.4 such standard deviations above 50. In general, a value of the test statistic that is less than or equal to zero results when x̄ is less than or equal to 50. This provides no evidence to support rejecting H0 in favor of Ha because the point estimate x̄ indicates that μ is probably less than or equal to 50. However, a value of the test statistic that is greater than zero results when x̄ is greater than 50. This provides evidence to support rejecting H0 in favor of Ha because the point estimate x̄ indicates that μ might be greater than 50. Furthermore, the farther the value of the test statistic is above 0 (the farther x̄ is above 50), the stronger is the evidence to support rejecting H0 in favor of Ha.
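As a rough sketch, this calculation can be carried out directly. The numeric inputs below are illustrative assumptions, not results reported at this point in the text:

```python
from math import sqrt

def z_statistic(x_bar, mu_0, sigma, n):
    # Distance of the sample mean from the hypothesized mean, measured
    # in standard deviations of the sampling distribution of x_bar.
    return (x_bar - mu_0) / (sigma / sqrt(n))

# Trash bag case: n = 40 bags, hypothesized mean 50 pounds.
# x_bar and sigma here are assumed values for illustration.
z = z_statistic(x_bar=50.575, mu_0=50, sigma=1.65, n=40)
print(round(z, 2))
```

A z of about 2.2 would say the observed sample mean lies roughly 2.2 standard deviations of x̄ above 50.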

Hypothesis testing and the legal system

If the value of the test statistic z is far enough above 0, we reject H0 in favor of Ha. To see how large z must be in order to reject H0, we must understand that a hypothesis test rejects a null hypothesis H0 only if there is strong statistical evidence against H0. This is similar to our legal system, which rejects the innocence of the accused only if evidence of guilt is beyond a reasonable doubt. For instance, the network will reject H0: μ ≤ 50 and run the trash bag commercial only if the test statistic z is far enough above 0 to show beyond a reasonable doubt that H0: μ ≤ 50 is false and Ha: μ > 50 is true. A test statistic that is only slightly greater than 0 might not be convincing enough. However, because such a test statistic would result from a sample mean that is slightly greater than 50, it would provide some evidence to support rejecting H0: μ ≤ 50, and it certainly would not provide strong evidence supporting H0: μ ≤ 50. Therefore, if the value of the test statistic is not large enough to convince us to reject H0, we do not say that we accept H0. Rather we say that we do not reject H0 because the evidence against H0 is not strong enough. Again, this is similar to our legal system, where the lack of evidence of guilt beyond a reasonable doubt results in a verdict of not guilty, but does not prove that the accused is innocent.

Type I and Type II errors and their probabilities

To determine exactly how much statistical evidence is required to reject H0, we consider the errors and the correct decisions that can be made in hypothesis testing. These errors and correct decisions, as well as their implications in the trash bag advertising example, are summarized in Tables 9.1 and 9.2. Across the top of each table are listed the two possible “states of nature.” Either H0: μ ≤ 50 is true, which says the manufacturer’s claim that μ is greater than 50 is false, or H0 is false, which says the claim is true. Down the left side of each table are listed the two possible decisions we can make in the hypothesis test. Using the sample data, we will either reject H0: μ ≤ 50, which implies that the claim will be advertised, or we will not reject H0, which implies that the claim will not be advertised.

Table 9.1: Type I and Type II Errors

Table 9.2: The Implications of Type I and Type II Errors in the Trash Bag Example

In general, the two types of errors that can be made in hypothesis testing are defined here:

Type I and Type II Errors

If we reject H0 when it is true, this is a Type I error.

If we do not reject H0 when it is false, this is a Type II error.

As can be seen by comparing Tables 9.1 and 9.2, if we commit a Type I error, we will advertise a false claim. If we commit a Type II error, we will fail to advertise a true claim.

We now let the symbol α (pronounced alpha) denote the probability of a Type I error, and we let β (pronounced beta) denote the probability of a Type II error. Obviously, we would like both α and β to be small. A common (but not the only) procedure is to base a hypothesis test on taking a sample of a fixed size (for example, n = 40 trash bags) and on setting α equal to a small prespecified value. Setting α low means there is only a small chance of rejecting H0 when it is true. This implies that we are requiring strong evidence against H0 before we reject it.

We sometimes choose α as high as .10, but we usually choose α between .05 and .01. A frequent choice for α is .05. In fact, our former student tells us that the network often tests advertising claims by setting the probability of a Type I error equal to .05. That is, the network will run a commercial making a claim if the sample evidence allows it to reject a null hypothesis that says the claim is not valid in favor of an alternative hypothesis that says the claim is valid with α set equal to .05. Since a Type I error is deciding that the claim is valid when it is not, the policy of setting α equal to .05 says that, in the long run, the network will advertise only 5 percent of all invalid claims made by advertisers.

One might wonder why the network does not set α lower—say at .01. One reason is that it can be shown that, for a fixed sample size, the lower we set α, the higher is β, and the higher we set α, the lower is β. Setting α at .05 means that β, the probability of failing to advertise a true claim (a Type II error), will be smaller than it would be if α were set at .01. As long as (1) the claim to be advertised is plausible and (2) the consequences of advertising the claim even if it is false are not terribly serious, then it is reasonable to set α equal to .05. However, if either (1) or (2) is not true, then we might set α lower than .05. For example, suppose a pharmaceutical company wishes to advertise that it has developed an effective treatment for a disease that has formerly been very resistant to treatment. Such a claim is (perhaps) difficult to believe. Moreover, if the claim is false, patients suffering from the disease would be subjected to false hope and needless expense. In such a case, it might be reasonable for the network to set α at .01 because this would lower the chance of advertising the claim if it is false. We usually do not set α lower than .01 because doing so often leads to an unacceptably large value of β. We explain some methods for computing the probability of a Type II error in optional Section 9.5. However, β can be difficult or impossible to calculate in many situations, and we often must rely on our intuition when deciding how to set α.
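The tradeoff between α and β can be illustrated with a small simulation. The values of σ, n, and the assumed true mean below are illustrative choices, not quantities taken from the text:

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)
n, sigma, mu_0 = 40, 1.65, 50.0            # assumed setup for illustration
z_crit = NormalDist().inv_cdf(0.95)        # one-sided critical value for alpha = .05

def reject_rate(true_mu, trials=20000):
    # Fraction of simulated samples whose z statistic exceeds z_crit.
    se = sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = random.gauss(true_mu, se)  # draw from sampling distribution of x_bar
        if (x_bar - mu_0) / se > z_crit:
            hits += 1
    return hits / trials

print(reject_rate(50.0))      # when H0 is true: roughly .05, i.e., alpha
print(1 - reject_rate(50.5))  # when true mu = 50.5: the Type II error rate beta
```

Lowering α (raising `z_crit`) shrinks the first number but inflates the second, which is exactly the α–β tradeoff described above.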

Exercises for Section 9.1

CONCEPTS

9.1 Which hypothesis (the null hypothesis, H0, or the alternative hypothesis, Ha) is the “status quo” hypothesis (that is, the hypothesis that states that things are remaining “as is”)? Which hypothesis is the hypothesis that says that a “hoped for” or “suspected” condition exists?

9.2 Which hypothesis (H0 or Ha) is not rejected unless there is convincing sample evidence that it is false? Which hypothesis (H0 or Ha) will be accepted only if there is convincing sample evidence that it is true?

9.3 Define each of the following:

a Type I error

b Type II error

c α

d β

9.4 For each of the following situations, indicate whether an error has occurred and, if so, indicate what kind of error (Type I or Type II) has occurred.

a We do not reject H0 and H0 is true.

b We reject H0 and H0 is true.

c We do not reject H0 and H0 is false.

d We reject H0 and H0 is false.

9.5 If we reject H0, what is the only type of error that we could be making? Explain.

9.6 If we do not reject H0, what is the only type of error that we could be making? Explain.

9.7 When testing a hypothesis, why don’t we set the probability of a Type I error to be extremely small? Explain.

METHODS AND APPLICATIONS

9.8 THE VIDEO GAME SATISFACTION RATING CASE VideoGame

Recall that “very satisfied” customers give the XYZ-Box video game system a rating that is at least 42. Suppose that the manufacturer of the XYZ-Box wishes to use the random sample of 65 satisfaction ratings to provide evidence supporting the claim that the mean composite satisfaction rating for the XYZ-Box exceeds 42.

a Letting μ represent the mean composite satisfaction rating for the XYZ-Box, set up the null and alternative hypotheses needed if we wish to attempt to provide evidence supporting the claim that μ exceeds 42.

b In the context of this situation, interpret making a Type I error; interpret making a Type II error.

9.9 THE BANK CUSTOMER WAITING TIME CASE WaitTime

Recall that a bank manager has developed a new system to reduce the time customers spend waiting for teller service during peak hours. The manager hopes the new system will reduce waiting times from the current 9 to 10 minutes to less than 6 minutes.

Suppose the manager wishes to use the random sample of 100 waiting times to support the claim that the mean waiting time under the new system is shorter than six minutes.

a Letting μ represent the mean waiting time under the new system, set up the null and alternative hypotheses needed if we wish to attempt to provide evidence supporting the claim that μ is shorter than six minutes.

b In the context of this situation, interpret making a Type I error; interpret making a Type II error.

9.10 An automobile parts supplier owns a machine that produces a cylindrical engine part. This part is supposed to have an outside diameter of three inches. Parts with diameters that are too small or too large do not meet customer requirements and must be rejected. Lately, the company has experienced problems meeting customer requirements. The technical staff feels that the mean diameter produced by the machine is off target. In order to verify this, a special study will randomly sample 100 parts produced by the machine. The 100 sampled parts will be measured, and if the results obtained cast a substantial amount of doubt on the hypothesis that the mean diameter equals the target value of three inches, the company will assign a problem-solving team to intensively search for the causes of the problem.

a The parts supplier wishes to set up a hypothesis test so that the problem-solving team will be assigned when the null hypothesis is rejected. Set up the null and alternative hypotheses for this situation.

b In the context of this situation, interpret making a Type I error; interpret making a Type II error.

c Suppose it costs the company $3,000 a day to assign the problem-solving team to a project. Is this $3,000 figure the daily cost of a Type I error or a Type II error? Explain.

9.11 The Crown Bottling Company has just installed a new bottling process that will fill 16-ounce bottles of the popular Crown Classic Cola soft drink. Both overfilling and underfilling bottles are undesirable: Underfilling leads to customer complaints and overfilling costs the company considerable money. In order to verify that the filler is set up correctly, the company wishes to see whether the mean bottle fill, μ, is close to the target fill of 16 ounces. To this end, a random sample of 36 filled bottles is selected from the output of a test filler run. If the sample results cast a substantial amount of doubt on the hypothesis that the mean bottle fill is the desired 16 ounces, then the filler’s initial setup will be readjusted.

a The bottling company wants to set up a hypothesis test so that the filler will be readjusted if the null hypothesis is rejected. Set up the null and alternative hypotheses for this hypothesis test.

b In the context of this situation, interpret making a Type I error; interpret making a Type II error.

9.12 Consolidated Power, a large electric power utility, has just built a modern nuclear power plant. This plant discharges waste water that is allowed to flow into the Atlantic Ocean. The Environmental Protection Agency (EPA) has ordered that the waste water may not be excessively warm so that thermal pollution of the marine environment near the plant can be avoided. Because of this order, the waste water is allowed to cool in specially constructed ponds and is then released into the ocean. This cooling system works properly if the mean temperature of waste water discharged is 60°F or cooler. Consolidated Power is required to monitor the temperature of the waste water. A sample of 100 temperature readings will be obtained each day, and if the sample results cast a substantial amount of doubt on the hypothesis that the cooling system is working properly (the mean temperature of waste water discharged is 60°F or cooler), then the plant must be shut down and appropriate actions must be taken to correct the problem.

a Consolidated Power wishes to set up a hypothesis test so that the power plant will be shut down when the null hypothesis is rejected. Set up the null and alternative hypotheses that should be used.

b In the context of this situation, interpret making a Type I error; interpret making a Type II error.

c The EPA periodically conducts spot checks to determine whether the waste water being discharged is too warm. Suppose the EPA has the power to impose very severe penalties (for example, very heavy fines) when the waste water is excessively warm. Other things being equal, should Consolidated Power set the probability of a Type I error equal to α = .01 or α = .05? Explain.

9.13 Consider Exercise 9.12, and suppose that Consolidated Power has been experiencing technical problems with the cooling system. Because the system has been unreliable, the company feels it must take precautions to avoid failing to shut down the plant when its waste water is too warm. Other things being equal, should Consolidated Power set the probability of a Type I error equal to α = .01 or α = .05? Explain.

9.2: z Tests about a Population Mean: σ Known

In this section we discuss hypothesis tests about a population mean that are based on the normal distribution. These tests are called z tests, and they require that the true value of the population standard deviation σ is known. Of course, in most real-world situations the true value of σ is not known. However, the concepts and calculations of hypothesis testing are most easily illustrated using the normal distribution. Therefore, in this section we will assume that—through theory or history related to the population under consideration—we know σ. When σ is unknown, we test hypotheses about a population mean by using the t distribution. In Section 9.3 we study t tests, and we will revisit the examples of this section assuming that σ is unknown.

Chapter 9

Testing a “greater than” alternative hypothesis by using a critical value rule

In Section 9.1 we explained how to set up appropriate null and alternative hypotheses. We also discussed how to specify a value for α, the probability of a Type I error (also called the level of significance) of the hypothesis test, and we introduced the idea of a test statistic. We can use these concepts to begin developing a seven-step hypothesis testing procedure. We will introduce these steps in the context of the trash bag case and testing a “greater than” alternative hypothesis.

Step 1: State the null hypothesis H0 and the alternative hypothesis Ha. In the trash bag case, we will test H0: μ ≤ 50 versus Ha: μ > 50. Here, μ is the mean breaking strength of the new trash bag.

Step 2: Specify the level of significance α. The television network will run the commercial stating that the new trash bag is stronger than the former bag if we can reject H0: μ ≤ 50 in favor of Ha: μ > 50 by setting α equal to .05.

Step 3: Select the test statistic. In order to test H0: μ ≤ 50 versus Ha: μ > 50, we will test the modified null hypothesis H0: μ = 50 versus Ha: μ > 50. The idea here is that if there is sufficient evidence to reject the hypothesis that μ equals 50 in favor of μ > 50, then there is certainly also sufficient evidence to reject the hypothesis that μ is less than or equal to 50. In order to test H0: μ = 50 versus Ha: μ > 50, we will randomly select a sample of n = 40 new trash bags and calculate the mean x̄ of the breaking strengths of these bags. We will then utilize the test statistic

z = (x̄ − 50) / (σ/√n)

A positive value of this test statistic results from an x̄ that is greater than 50 and thus provides evidence against H0: μ = 50 and in favor of Ha: μ > 50.

Step 4: Determine the critical value rule for deciding whether to reject H0. To decide how large the test statistic z must be to reject H0 in favor of Ha by setting the probability of a Type I error equal to α, we note that different samples would give different sample means and thus different values of z. Because the sample size n = 40 is large, the Central Limit Theorem tells us that the sampling distribution of z is (approximately) a standard normal distribution if the null hypothesis H0: μ = 50 is true. Therefore, we do the following:

Place the probability of a Type I error, α, in the right-hand tail of the standard normal curve and use the normal table (see Table A.3, page 863) to find the normal point zα. Here zα, which we call a critical value, is the point on the horizontal axis under the standard normal curve that gives a right-hand tail area equal to α.

Reject H0: μ = 50 in favor of Ha: μ > 50 if and only if the test statistic z is greater than the critical value zα. (This is the critical value rule.)

Figure 9.1 illustrates that since we have set α equal to .05, we should use the critical value zα = z.05 = 1.645 (see Table A.3). This says that we should reject H0 if z > 1.645 and we should not reject H0 if z ≤ 1.645.

Figure 9.1: The Critical Value for Testing H0: μ = 50 versus Ha: μ > 50 by Setting α = .05

To better understand the critical value rule, consider the standard normal curve in Figure 9.1. The area of .05 in the right-hand tail of this curve implies that values of the test statistic z that are greater than 1.645 are unlikely to occur if the null hypothesis H0: μ = 50 is true. There is a 5 percent chance of observing one of these values—and thus wrongly rejecting H0—if H0 is true. However, we are more likely to observe a value of z greater than 1.645—and thus correctly reject H0—if H0 is false. Therefore, it is intuitively reasonable to reject H0 if the value of the test statistic z is greater than 1.645.
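The critical value zα used in this rule is simply the standard normal inverse CDF evaluated at 1 − α. A minimal sketch:

```python
from statistics import NormalDist

def critical_value(alpha):
    # Normal point z_alpha: the point with right-hand tail area
    # alpha under the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha)

print(round(critical_value(0.05), 3))  # 1.645, matching Table A.3
print(round(critical_value(0.01), 2))  # 2.33
```

The critical value rule is then just `z > critical_value(alpha)`.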

Step 5: Collect the sample data and compute the value of the test statistic. When the sample of n = 40 new trash bags is randomly selected, the mean of the breaking strengths is calculated to be x̄ = 50.575. Assuming that σ is known to equal 1.65, the value of the test statistic is

z = (50.575 − 50) / (1.65/√40) = 2.20

Step 6: Decide whether to reject H0 by using the test statistic value and the critical value rule. Since the test statistic value z = 2.20 is greater than the critical value z.05 = 1.645, we can reject H0: μ = 50 in favor of Ha: μ > 50 by setting α equal to .05. Furthermore, we can be intuitively confident that H0: μ = 50 is false and Ha: μ > 50 is true. This is because, since we have rejected H0 by setting α equal to .05, we have rejected H0 by using a test that allows only a 5 percent chance of wrongly rejecting H0. In general, if we can reject a null hypothesis in favor of an alternative hypothesis by setting the probability of a Type I error equal to α, we say that we have statistical significance at the α level.

Step 7: Interpret the statistical results in managerial (real-world) terms and assess their practical importance. Since we have rejected H0: μ = 50 in favor of Ha: μ > 50 by setting α equal to .05, we conclude (at an α of .05) that the mean breaking strength of the new trash bag exceeds 50 pounds. Furthermore, this conclusion has practical importance to the trash bag manufacturer because it means that the television network will approve running commercials claiming that the new trash bag is stronger than the former bag. Note, however, that the point estimate of μ, x̄ = 50.575, indicates that μ is not much larger than 50. Therefore, the trash bag manufacturer can claim only that its new bag is slightly stronger than its former bag. Of course, this might be practically important to consumers who feel that, because the new bag is 25 percent less expensive and is more environmentally sound, it is definitely worth purchasing if it has any strength advantage. However, to customers who are looking only for a substantial increase in bag strength, the statistical results would not be practically important. This illustrates that, in general, a finding of statistical significance (that is, concluding that the alternative hypothesis is true) can be practically important to some people but not to others. Notice that the point estimate of the parameter involved in a hypothesis test can help us to assess practical importance. We can also use confidence intervals to help assess practical importance.

Considerations in setting α

We have reasoned in Section 9.1 that the television network has set α equal to .05 rather than .01 because doing so means that β, the probability of failing to advertise a true claim (a Type II error), will be smaller than it would be if α were set at .01. It is informative, however, to see what would have happened if the network had set α equal to .01. Figure 9.2 illustrates that as we decrease α from .05 to .01, the critical value zα increases from z.05 = 1.645 to z.01 = 2.33. Because the test statistic value z = 2.20 is less than z.01 = 2.33, we cannot reject H0: μ = 50 in favor of Ha: μ > 50 by setting α equal to .01. This illustrates the point that, the smaller we set α, the larger is the critical value, and thus the stronger is the statistical evidence that we are requiring to reject the null hypothesis H0. Some statisticians have concluded (somewhat subjectively) that (1) if we set α equal to .05, then we are requiring strong evidence to reject H0; and (2) if we set α equal to .01, then we are requiring very strong evidence to reject H0.

Figure 9.2: The Critical Values for Testing H0: μ = 50 versus Ha: μ > 50 by Setting α = .05 and .01

A p-value for testing a “greater than” alternative hypothesis

To decide whether to reject the null hypothesis H0 at level of significance α, steps 4, 5, and 6 of the seven-step hypothesis testing procedure compare the test statistic value with a critical value. Another way to make this decision is to calculate a p-value, which measures the likelihood of the sample results if the null hypothesis H0 is true. Sample results that are not likely if H0 is true are evidence that H0 is not true. To test H0 by using a p-value, we use the following steps 4, 5, and 6:

Step 4: Collect the sample data and compute the value of the test statistic. In the trash bag case, we have computed the value of the test statistic to be z = 2.20.

Step 5: Calculate the p-value by using the test statistic value. The p-value for testing H0: μ = 50 versus Ha: μ > 50 in the trash bag case is the area under the standard normal curve to the right of the test statistic value z = 2.20. As illustrated in Figure 9.3(b), this area is 1 − .9861 = .0139. The p-value is the probability, computed assuming that H0: μ = 50 is true, of observing a value of the test statistic that is greater than or equal to the value z = 2.20 that we have actually computed from the sample data. The p-value of .0139 says that, if H0: μ = 50 is true, then only 139 in 10,000 of all possible test statistic values are at least as large, or extreme, as the value z = 2.20. That is, if we are to believe that H0 is true, we must believe that we have observed a test statistic value that can be described as a 139 in 10,000 chance. Because it is difficult to believe that we have observed a 139 in 10,000 chance, we intuitively have strong evidence that H0: μ = 50 is false and Ha: μ > 50 is true.
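The right-tail area computed in Step 5 can be verified with a short calculation; this is a sketch using the standard library's `NormalDist`, with z = 2.20 taken from the text.

```python
from statistics import NormalDist

z_stat = 2.20
# p-value for Ha: mu > 50 is the area under the standard normal curve
# to the right of the observed test statistic value
p_value = 1 - NormalDist().cdf(z_stat)   # approximately .0139
```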

Figure 9.3: Testing H0: μ = 50 versus Ha: μ > 50 by Using Critical Values and the p-Value

Step 6: Reject H0 if the p-value is less than α. Recall that the television network has set α equal to .05. The p-value of .0139 is less than the α of .05. Comparing the two normal curves in Figures 9.3(a) and (b), we see that this implies that the test statistic value z = 2.20 is greater than the critical value z.05 = 1.645. Therefore, we can reject H0 by setting α equal to .05. As another example, suppose that the television network had set α equal to .01. The p-value of .0139 is greater than the α of .01. Comparing the two normal curves in Figures 9.3(b) and (c), we see that this implies that the test statistic value z = 2.20 is less than the critical value z.01 = 2.33. Therefore, we cannot reject H0 by setting α equal to .01. Generalizing these examples, we conclude that the value of the test statistic z will be greater than the critical value zα if and only if the p-value is less than α. That is, we can reject H0 in favor of Ha at level of significance α if and only if the p-value is less than α.

© NBC, Inc. Used with permission.

Note: This logo appears on an NBC advertising standards booklet. This booklet, along with other information provided by NBC and CBS, forms the basis for much of the discussion in this section.

Comparing the critical value and p-value methods

Thus far we have considered two methods for testing H0: μ = 50 versus Ha: μ > 50 at the .05 and .01 values of α. Using the first method, we determine if the test statistic value z = 2.20 is greater than the critical values z.05 = 1.645 and z.01 = 2.33. Using the second method, we determine if the p-value of .0139 is less than .05 and .01. Whereas the critical value method requires that we look up a different critical value for each different α value, the p-value method requires only that we calculate a single p-value and compare it directly with the different α values. It follows that the p-value method is the most efficient way to test a hypothesis at different α values. This can be useful when there are different decision makers who might use different α values. For example, television networks do not always evaluate advertising claims by setting α equal to .05. The reason is that the consequences of a Type I error (advertising a false claim) are more serious for some claims than for others. For example, the consequences of a Type I error would be fairly serious for a claim about the effectiveness of a drug or for the superiority of one product over another. However, these consequences might not be as serious for a noncomparative claim about an inexpensive and safe product, such as a cosmetic. Networks sometimes use α values between .01 and .04 for claims having more serious Type I error consequences, and they sometimes use α values between .06 and .10 for claims having less serious Type I error consequences. Furthermore, one network’s policies for setting α can differ somewhat from those of another. As a result, reporting an advertising claim’s p-value to each network is the most efficient way to tell the network whether to allow the claim to be advertised. For example, most networks would evaluate the trash bag claim by choosing an α value between .025 and .10. Since the p-value of .0139 is less than all these α values, most networks would allow the trash bag claim to be advertised.

A summary of the seven steps of hypothesis testing

For almost every hypothesis test discussed in this book, statisticians have developed both a critical value rule and a p-value that can be used to perform the hypothesis test. Furthermore, it can be shown that for each hypothesis test the p-value has been defined so that we can reject the null hypothesis at level of significance α if and only if the p-value is less than α. We now summarize a seven-step procedure for performing a hypothesis test.

The Seven Steps of Hypothesis Testing

1 State the null hypothesis H0 and the alternative hypothesis Ha.

2 Specify the level of significance α.

3 Select the test statistic.

Using a critical value rule:

4 Determine the critical value rule for deciding whether to reject H0. Use the specified value of α to find the critical value in the critical value rule.

5 Collect the sample data and compute the value of the test statistic.

6 Decide whether to reject H0 by using the test statistic value and the critical value rule.

Using a p-value:

4 Collect the sample data and compute the value of the test statistic.

5 Calculate the p-value by using the test statistic value.

6 Reject H0 at level of significance α if the p-value is less than α.

7 Interpret your statistical results in managerial (real-world) terms and assess their practical importance.

In the real world both critical value rules and p-values are used to carry out hypothesis tests. For example, NBC uses critical value rules, whereas CBS uses p-values, to statistically verify the validity of advertising claims. Throughout this book we will continue to present both the critical value and the p-value approaches to hypothesis testing.
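The seven-step procedure can be sketched as a single function for the σ-known z test. The function below is an illustrative Python version (its name and structure are my own, not from the text); it applies both the critical value rule and the p-value rule for each of the three kinds of alternative hypothesis and confirms that the two rules always agree.

```python
from statistics import NormalDist
from math import sqrt

def z_test(x_bar, mu0, sigma, n, alpha, alternative):
    """One-sample z test of H0: mu = mu0 when sigma is known.

    alternative: 'greater', 'less', or 'two-sided'.
    Returns (z, p_value, reject), where the critical value rule and
    the p-value rule are checked against each other.
    """
    nd = NormalDist()
    z = (x_bar - mu0) / (sigma / sqrt(n))               # test statistic
    if alternative == 'greater':
        p = 1 - nd.cdf(z)                               # right-tail area
        reject_cv = z > nd.inv_cdf(1 - alpha)           # z > z_alpha
    elif alternative == 'less':
        p = nd.cdf(z)                                   # left-tail area
        reject_cv = z < -nd.inv_cdf(1 - alpha)          # z < -z_alpha
    else:                                               # 'two-sided'
        p = 2 * (1 - nd.cdf(abs(z)))                    # twice the tail area
        reject_cv = abs(z) > nd.inv_cdf(1 - alpha / 2)  # |z| > z_{alpha/2}
    reject_p = p < alpha                                # p-value rule
    assert reject_cv == reject_p                        # the two rules agree
    return z, p, reject_p
```

For instance, with x̄ = 326 (the sample mean implied by z = −1 in the Valentine’s Day case), `z_test(326, 330, 40, 100, 0.05, 'two-sided')` gives z = −1 and p ≈ .317, so H0 is not rejected.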

Testing a “less than” alternative hypothesis

We next consider the payment time case and testing a “less than” alternative hypothesis:

Step 1: In order to study whether the new electronic billing system reduces the mean bill payment time by more than 50 percent, the management consulting firm will test H0: μ ≥ 19.5 versus Ha: μ < 19.5.

Step 2: The management consulting firm wishes to make sure that it truthfully describes the benefits of the new system both to the Hamilton, Ohio, trucking company and to other companies that are considering installing such a system. Therefore, the firm will require very strong evidence to conclude that μ is less than 19.5, which implies that it will test H0: μ ≥ 19.5 versus Ha: μ < 19.5 by setting α equal to .01.

Step 3: In order to test H0: μ ≥ 19.5 versus Ha: μ < 19.5, we will test the modified null hypothesis H0: μ = 19.5 versus Ha: μ < 19.5. The idea here is that if there is sufficient evidence to reject the hypothesis that μ equals 19.5 in favor of μ < 19.5, then there is certainly also sufficient evidence to reject the hypothesis that μ is greater than or equal to 19.5. In order to test H0: μ = 19.5 versus Ha: μ < 19.5, we will randomly select a sample of n = 65 invoices paid using the billing system and calculate the mean x̄ of the payment times of these invoices. Since the sample size is large, the Central Limit Theorem applies, and we will utilize the test statistic

z = (x̄ − 19.5) / (σ/√n)

A value of the test statistic z that is less than zero results when x̄ is less than 19.5. This provides evidence to support rejecting H0 in favor of Ha, because the point estimate x̄ indicates that μ might be less than 19.5.

Step 4: To decide how much less than zero the test statistic must be to reject H0 in favor of Ha by setting the probability of a Type I error equal to α, we do the following:

Place the probability of a Type I error, α, in the left-hand tail of the standard normal curve and use the normal table to find the critical value −zα. Here −zα is the negative of the normal point zα. That is, −zα is the point on the horizontal axis under the standard normal curve that gives a left-hand tail area equal to α.

Reject H0: μ = 19.5 in favor of Ha: μ < 19.5 if and only if the test statistic z is less than the critical value −zα. Because α equals .01, the critical value −zα is −z.01 = −2.33 [see Fig. 9.4(a)].

Figure 9.4: Testing H0: μ = 19.5 versus Ha: μ < 19.5 by Using Critical Values and the p-Value

Step 5: When the sample of n = 65 invoices is randomly selected, the mean x̄ of the payment times of these invoices is calculated. Assuming that σ is known to equal 4.2, the value of the test statistic is

z = (x̄ − 19.5) / (4.2/√65) = −2.67

Step 6: Since the test statistic value z = −2.67 is less than the critical value −z.01 = −2.33, we can reject H0: μ = 19.5 in favor of Ha: μ < 19.5 by setting α equal to .01.

Step 7: We conclude (at an α of .01) that the mean payment time for the new electronic billing system is less than 19.5 days. This, along with the fact that the sample mean is slightly less than 19.5, implies that it is reasonable for the management consulting firm to conclude that the new electronic billing system has reduced the mean payment time by slightly more than 50 percent (a substantial improvement over the old system).

A p-value for testing a “less than” alternative hypothesis

To test H0: μ = 19.5 versus Ha: μ < 19.5 in the payment time case by using a p-value, we use the following steps 4, 5, and 6:

Step 4: We have computed the value of the test statistic in the payment time case to be z = −2.67.

Step 5: The p-value for testing H0: μ = 19.5 versus Ha: μ < 19.5 is the area under the standard normal curve to the left of the test statistic value z = −2.67. As illustrated in Figure 9.4(b), this area is .0038. The p-value is the probability, computed assuming that H0: μ = 19.5 is true, of observing a value of the test statistic that is less than or equal to the value z = −2.67 that we have actually computed from the sample data. The p-value of .0038 says that, if H0: μ = 19.5 is true, then only 38 in 10,000 of all possible test statistic values are at least as negative, or extreme, as the value z = −2.67. That is, if we are to believe that H0 is true, we must believe that we have observed a test statistic value that can be described as a 38 in 10,000 chance.

Step 6: The management consulting firm has set α equal to .01. The p-value of .0038 is less than the α of .01. Therefore, we can reject H0 by setting α equal to .01.
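As a check on Steps 5 and 6, the left-tail area for z = −2.67 can be computed directly; this is a sketch using Python's `NormalDist`, with the α of .01 from the text.

```python
from statistics import NormalDist

z_stat = -2.67
# p-value for Ha: mu < 19.5 is the area to the left of the test statistic
p_value = NormalDist().cdf(z_stat)   # approximately .0038
reject_h0 = p_value < 0.01           # the consulting firm's alpha
```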

Testing a “not equal to” alternative hypothesis

We next consider the Valentine’s Day chocolate case and testing a “not equal to” alternative hypothesis.

Step 1: To assess whether this year’s sales of its valentine box of assorted chocolates will be ten percent higher than last year’s, the candy company will test H0: μ = 330 versus Ha: μ ≠ 330. Here, μ is the mean order quantity of this year’s valentine box by large retail stores.

Step 2: If the candy company does not reject H0: μ = 330 and H0: μ = 330 is false—a Type II error—the candy company will base its production of valentine boxes on a 10 percent projected sales increase that is not correct. Since the candy company wishes to have a reasonably small probability of making this Type II error, the company will set α equal to .05. Setting α equal to .05 rather than .01 makes the probability of a Type II error smaller than it would be if α were set at .01. Note that in optional Section 9.5 we will verify that the probability of a Type II error in this situation is reasonably small. Therefore, if the candy company ends up not rejecting H0: μ = 330 and therefore decides to base its production of valentine boxes on the ten percent projected sales increase, the company can be intuitively confident that it has made the right decision.

Step 3: The candy company will randomly select n = 100 large retail stores and will make an early mailing to these stores promoting this year’s valentine box of assorted chocolates. The candy company will then ask each sampled retail store to report its anticipated order quantity of valentine boxes and will calculate the mean x̄ of the reported order quantities. Since the sample size is large, the Central Limit Theorem applies, and we will utilize the test statistic

z = (x̄ − 330) / (σ/√n)

A value of the test statistic that is greater than 0 results when x̄ is greater than 330. This provides evidence to support rejecting H0 in favor of Ha, because the point estimate x̄ indicates that μ might be greater than 330. Similarly, a value of the test statistic that is less than 0 results when x̄ is less than 330. This also provides evidence to support rejecting H0 in favor of Ha, because the point estimate x̄ indicates that μ might be less than 330.

Step 4: To decide how different from zero (positive or negative) the test statistic must be in order to reject H0 in favor of Ha by setting the probability of a Type I error equal to α, we do the following:

Divide the probability of a Type I error, α, into two equal parts, and place the area α/2 in the right-hand tail of the standard normal curve and the area α/2 in the left-hand tail of the standard normal curve. Then use the normal table to find the critical values zα/2 and −zα/2. Here zα/2 is the point on the horizontal axis under the standard normal curve that gives a right-hand tail area equal to α/2, and −zα/2 is the point giving a left-hand tail area equal to α/2.

Reject H0: μ = 330 in favor of Ha: μ ≠ 330 if and only if the test statistic z is greater than the critical value zα/2 or less than the critical value −zα/2. Note that this is equivalent to saying that we should reject H0 if and only if the absolute value of the test statistic, | z |, is greater than the critical value zα/2. Because α equals .05, the critical values are z.025 = 1.96 and −z.025 = −1.96 [see Figure 9.5(a)]

Figure 9.5: Testing H0: μ = 330 versus Ha: μ ≠ 330 by Using Critical Values and the p-Value

Step 5: When the sample of n = 100 large retail stores is randomly selected, the mean of their reported order quantities is calculated to be x̄ = 326. Assuming that σ is known to equal 40, the value of the test statistic is

z = (326 − 330) / (40/√100) = −4/4 = −1

Step 6: Since the test statistic value z = −1 is greater than − z.025 = −1.96 (or, equivalently, since | z | = 1 is less than z.025 = 1.96), we cannot reject H0: μ = 330 in favor of Ha: μ ≠ 330 by setting α equal to .05.

Step 7: We cannot conclude (at an α of .05) that the mean order quantity of this year’s valentine box by large retail stores will differ from 330 boxes. Therefore, the candy company will base its production of valentine boxes on the ten percent projected sales increase.

A p-value for testing a “not equal to” alternative hypothesis

To test H0: μ = 330 versus Ha: μ ≠ 330 in the Valentine’s Day chocolate case by using a p-value, we use the following steps 4, 5, and 6:

Step 4: We have computed the value of the test statistic in the Valentine’s Day chocolate case to be z = −1.

Step 5: Note from Figure 9.5(b) that the area under the standard normal curve to the right of | z | = 1 is .1587. Twice this area—that is, 2(.1587) = .3174—is the p-value for testing H0: μ = 330 versus Ha: μ ≠ 330. To interpret the p-value as a probability, note that the symmetry of the standard normal curve implies that twice the area under the curve to the right of | z | = 1 equals the area under this curve to the right of 1 plus the area under the curve to the left of −1 [see Figure 9.5(b)]. Also, note that since both positive and negative test statistic values count against H0: μ = 330, a test statistic value that is either greater than or equal to 1 or less than or equal to −1 is at least as extreme as the observed test statistic value z = −1. It follows that the p-value of .3174 says that, if H0: μ = 330 is true, then 31.74 percent of all possible test statistic values are at least as extreme as z = −1. That is, if we are to believe that H0 is true, we must believe that we have observed a test statistic value that can be described as a 31.74 percent chance.

Step 6: The candy company has set α equal to .05. The p-value of .3174 is greater than the α of .05. Therefore, we cannot reject H0 by setting α equal to .05.
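The two-sided calculation in Steps 5 and 6 can be verified directly. This sketch (using Python's `NormalDist` and the figures from the text) computes the p-value as twice the tail area and applies the equivalent | z | critical value rule.

```python
from statistics import NormalDist

nd = NormalDist()
z_stat = -1.0
alpha = 0.05

p_value = 2 * (1 - nd.cdf(abs(z_stat)))   # twice the tail area, approx .3174
z_half = nd.inv_cdf(1 - alpha / 2)        # z_.025, approximately 1.96
reject_h0 = abs(z_stat) > z_half          # equivalently, p_value < alpha
```

Because | z | = 1 does not exceed 1.96 (and, equivalently, .3174 is not less than .05), H0 is not rejected.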

A general procedure for testing a hypothesis about a population mean

In the trash bag case we have tested H0: μ ≤ 50 versus Ha: μ > 50 by testing H0: μ = 50 versus Ha: μ > 50. In the payment time case we have tested H0: μ ≥ 19.5 versus Ha: μ < 19.5 by testing H0: μ = 19.5 versus Ha: μ < 19.5. In general, the usual procedure for testing a “less than or equal to” null hypothesis or a “greater than or equal to” null hypothesis is to change the null hypothesis to an equality. We then test the “equal to” null hypothesis versus the alternative hypothesis. Furthermore, the critical value and p-value procedures for testing a null hypothesis versus an alternative hypothesis depend upon whether the alternative hypothesis is a “greater than,” a “less than,” or a “not equal to” alternative hypothesis. The following summary box gives the appropriate procedures. Specifically, letting μ0 be a particular number, the summary box shows how to test H0: μ = μ0 versus either Ha: μ > μ0, Ha: μ < μ0, or Ha: μ ≠ μ0:

Testing a Hypothesis about a Population Mean when σ Is Known

Define the test statistic

z = (x̄ − μ0) / (σ/√n)

and assume that the population sampled is normally distributed, or that the sample size n is large. We can test H0: μ = μ0 versus a particular alternative hypothesis at level of significance α by using the appropriate critical value rule, or, equivalently, the corresponding p-value:

· Ha: μ > μ0: reject H0 if z > zα; the p-value is the area under the standard normal curve to the right of z.

· Ha: μ < μ0: reject H0 if z < −zα; the p-value is the area under the standard normal curve to the left of z.

· Ha: μ ≠ μ0: reject H0 if | z | > zα/2; the p-value is twice the area under the standard normal curve to the right of | z |.

Using confidence intervals to test hypotheses

Confidence intervals can be used to test hypotheses. Specifically, it can be proven that we can reject H0: μ = μ0 in favor of Ha: μ ≠ μ0 by setting the probability of a Type I error equal to α if and only if the 100(1 − α) percent confidence interval for μ does not contain μ0. For example, consider the Valentine’s Day chocolate case and testing H0: μ = 330 versus Ha: μ ≠ 330 by setting α equal to .05. To do this, we use the mean x̄ = 326 of the sample of n = 100 reported order quantities to calculate the 95 percent confidence interval for μ to be

[x̄ ± z.025(σ/√n)] = [326 ± 1.96(40/√100)] = [326 ± 7.84] = [318.16, 333.84]

Because this interval does contain 330, we cannot reject H0: μ = 330 in favor of Ha: μ ≠ 330 by setting α equal to .05.

Whereas we can use two-sided confidence intervals to test “not equal to” alternative hypotheses, we must use one-sided confidence intervals to test “greater than” or “less than” alternative hypotheses. We will not study one-sided confidence intervals in this book. However, it should be emphasized that we do not need to use confidence intervals (one-sided or two-sided) to test hypotheses. We can test hypotheses by using test statistics and critical values or p-values, and these are the approaches that we will feature throughout this book.
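The equivalence between the two-sided test and the confidence interval can be sketched numerically. The Valentine’s Day figures are used below, with x̄ = 326 taken as the sample mean implied by the reported test statistic z = −1 (an inferred value, not stated directly in this excerpt).

```python
from statistics import NormalDist
from math import sqrt

x_bar, mu0, sigma, n, alpha = 326, 330, 40, 100, 0.05

# 95 percent confidence interval: x_bar plus or minus z_.025 * sigma/sqrt(n)
half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)   # approx (318.16, 333.84)

# Two-sided test via the interval: reject H0 iff mu0 lies outside the CI
reject_h0 = not (ci[0] <= mu0 <= ci[1])
```

Since 330 falls inside the interval, H0: μ = 330 is not rejected at α = .05, matching the test statistic approach.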

Measuring the weight of evidence against the null hypothesis

We have seen that in some situations the decision to take an action is based solely on whether a null hypothesis can be rejected in favor of an alternative hypothesis by setting α equal to a single, prespecified value. For example, in the trash bag case the television network decided to run the trash bag commercial because H0: μ = 50 was rejected in favor of Ha: μ > 50 by setting α equal to .05. Also, in the payment time case the management consulting firm decided to claim that the new electronic billing system has reduced the Hamilton trucking company’s mean payment time by more than 50 percent because H0: μ = 19.5 was rejected in favor of Ha: μ < 19.5 by setting α equal to .01. Furthermore, in the Valentine’s Day chocolate case, the candy company decided to base its production of valentine boxes on the ten percent projected sales increase because H0: μ = 330 could not be rejected in favor of Ha: μ ≠ 330 by setting α equal to .05.

Although hypothesis testing at a fixed α level is sometimes used as the sole basis for deciding whether to take an action, this is not always the case. For example, consider again the payment time case. The reason that the management consulting firm wishes to make the claim about the new electronic billing system is to demonstrate the benefits of the new system both to the Hamilton company and to other trucking companies that are considering using such a system. Note, however, that a potential user will decide whether to install the new system by considering factors beyond the results of the hypothesis test. For example, the cost of the new billing system and the receptiveness of the company’s clients to using the new system are among other factors that must be considered. In complex business and industrial situations such as this, hypothesis testing is used to accumulate knowledge about and understand the problem at hand. The ultimate decision (such as whether to adopt the new billing system) is made on the basis of nonstatistical considerations, intuition, and the results of one or more hypothesis tests. Therefore, it is important to know all the information—called the weight of evidence—that a hypothesis test provides against the null hypothesis and in favor of the alternative hypothesis. Furthermore, even when hypothesis testing at a fixed α level is used as the sole basis for deciding whether to take an action, it is useful to evaluate the weight of evidence. For example, the trash bag manufacturer would almost certainly wish to know how much evidence there is that its new bag is stronger than its former bag.

The most informative way to measure the weight of evidence is to use the p-value. For every hypothesis test considered in this book we can interpret the p-value to be the probability, computed assuming that the null hypothesis H0 is true, of observing a value of the test statistic that is at least as extreme as the value actually computed from the sample data. The smaller the p-value is, the less likely are the sample results if the null hypothesis H0 is true, and therefore the stronger is the evidence that H0 is false and that the alternative hypothesis Ha is true. Experience with hypothesis testing has led statisticians to the following (somewhat subjective) conclusions:

Interpreting the Weight of Evidence against the Null Hypothesis

If the p-value for testing H0 is less than

• .10, we have some evidence that H0 is false.

• .05, we have strong evidence that H0 is false.

• .01, we have very strong evidence that H0 is false.

• .001, we have extremely strong evidence that H0 is false.

We will frequently use these conclusions in future examples. Understand, however, that there are no sharp borders between different weights of evidence. Rather, there is only increasingly strong evidence against the null hypothesis as the p-value decreases.

For example, recall that the p-value for testing H0: μ = 50 versus Ha: μ > 50 in the trash bag case is .0139. This p-value is less than .05 but not less than .01. Therefore, we have strong evidence, but not very strong evidence, that H0: μ = 50 is false and Ha: μ > 50 is true. That is, we have strong evidence that the mean breaking strength of the new trash bag exceeds 50 pounds. As another example, the p-value for testing H0: μ = 19.5 versus Ha: μ < 19.5 in the payment time case is .0038. This p-value is less than .01 but not less than .001. Therefore, we have very strong evidence, but not extremely strong evidence, that H0: μ = 19.5 is false and Ha: μ < 19.5 is true. That is, we have very strong evidence that the new billing system has reduced the mean payment time by more than 50 percent. Finally, the p-value for testing H0: μ = 330 versus Ha: μ ≠ 330 in the Valentine’s Day chocolate case is .3174. This p-value is greater than .10. Therefore, we have little evidence that H0: μ = 330 is false and Ha: μ ≠ 330 is true. That is, we have little evidence that the increase in the mean order quantity of the valentine box by large retail stores will differ from ten percent.
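The (admittedly subjective) cutoffs above can be encoded as a small helper; the function name is my own, and the labels follow the summary box in this section.

```python
def weight_of_evidence(p_value):
    """Map a p-value to the textbook's weight-of-evidence description."""
    if p_value < 0.001:
        return "extremely strong"
    if p_value < 0.01:
        return "very strong"
    if p_value < 0.05:
        return "strong"
    if p_value < 0.10:
        return "some"
    return "little"
```

Applied to the three cases in this section: .0139 (trash bag) gives "strong", .0038 (payment time) gives "very strong", and .3174 (Valentine’s Day) gives "little", matching the conclusions in the preceding paragraph.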

Exercises for Section 9.2

CONCEPTS

9.14 Explain what a critical value is, and explain how it is used to test a hypothesis.

9.15 Explain what a p-value is, and explain how it is used to test a hypothesis.

METHODS AND APPLICATIONS

In Exercises 9.16 through 9.22 we consider using a random sample of 100 measurements to test H0: μ = 80 versus Ha: μ > 80. If and σ = 20:

9.16 Calculate the value of the test statistic z.

9.17 Use a critical value to test H0 versus Ha by setting α equal to .10.

9.18 Use a critical value to test H0 versus Ha by setting α equal to .05.

9.19 Use a critical value to test H0 versus Ha by setting α equal to .01.

9.20 Use a critical value to test H0 versus Ha by setting α equal to .001.

9.21 Calculate the p-value and use it to test H0 versus Ha at each of α = .10, .05, .01, and .001.

9.22 How much evidence is there that H0: μ = 80 is false and Ha: μ > 80 is true?

In Exercises 9.23 through 9.29 we consider using a random sample of 49 measurements to test H0: μ = 20 versus Ha: μ < 20. If and σ = 7:

9.23 Calculate the value of the test statistic z.

9.24 Use a critical value to test H0 versus Ha by setting α equal to .10.

9.25 Use a critical value to test H0 versus Ha by setting α equal to .05.

9.26 Use a critical value to test H0 versus Ha by setting α equal to .01.

9.27 Use a critical value to test H0 versus Ha by setting α equal to .001.

9.28 Calculate the p-value and use it to test H0 versus Ha at each of α = .10, .05, .01, and .001.

9.29 How much evidence is there that H0: μ = 20 is false and Ha: μ < 20 is true?

In Exercises 9.30 through 9.36 we consider using a random sample of n = 81 measurements to test H0: μ = 40 versus Ha: μ ≠ 40. If and σ = 18:

9.30 Calculate the value of the test statistic z.

9.31 Use critical values to test H0 versus Ha by setting α equal to .10.

9.32 Use critical values to test H0 versus Ha by setting α equal to .05.

9.33 Use critical values to test H0 versus Ha by setting α equal to .01.

9.34 Use critical values to test H0 versus Ha by setting α equal to .001.

9.35 Calculate the p-value and use it to test H0 versus Ha at each of α = .10, .05, .01, and .001.

9.36 How much evidence is there that H0: μ = 40 is false and Ha: μ ≠ 40 is true?

9.37 THE VIDEO GAME SATISFACTION RATING CASE VideoGame

Recall that “very satisfied” customers give the XYZ-Box video game system a rating that is at least 42. Suppose that the manufacturer of the XYZ-Box wishes to use the random sample of 65 satisfaction ratings to provide evidence supporting the claim that the mean composite satisfaction rating for the XYZ-Box exceeds 42.

a Letting μ represent the mean composite satisfaction rating for the XYZ-Box, set up the null hypothesis H0 and the alternative hypothesis Ha needed if we wish to attempt to provide evidence supporting the claim that μ exceeds 42.

b The random sample of 65 satisfaction ratings yields a sample mean of . Assuming that σ equals 2.64, use critical values to test H0 versus Ha at each of α = .10, .05, .01, and .001.

c Using the information in part (b), calculate the p-value and use it to test H0 versus Ha at each of α = .10, .05, .01, and .001.

d How much evidence is there that the mean composite satisfaction rating exceeds 42?

9.38 THE BANK CUSTOMER WAITING TIME CASE WaitTime

Letting μ be the mean waiting time under the new system, we found in Exercise 9.9 that we should test H0: μ ≥ 6 versus Ha: μ < 6 in order to attempt to provide evidence that μ is less than six minutes. The random sample of 100 waiting times yields a sample mean of minutes. Moreover, Figure 9.6 gives the MINITAB output obtained when we use the waiting time data to test H0: μ = 6 versus Ha: μ < 6. On this output the label “SE Mean,” which stands for “the standard error of the mean,” denotes the quantity , and the label “Z” denotes the calculated test statistic. Assuming that σ equals 2.47:

a Use critical values to test H0 versus Ha at each of α = .10, .05, .01, and .001.

b Calculate the p-value and verify that it equals .014, as shown on the MINITAB output. Use the p-value to test H0 versus Ha at each of α = .10, .05, .01, and .001.

c How much evidence is there that the new system has reduced the mean waiting time to below six minutes?

Figure 9.6: MINITAB Output of the Test of H0: μ = 6 versus Ha: μ < 6 in the Bank Customer Waiting Time Case

Note: Because the test statistic z has a denominator that uses the population standard deviation σ, MINITAB makes the user specify an assumed value for σ.

9.39 Again consider the audit delay situation of Exercise 8.11. Letting μ be the mean audit delay for all public owner-controlled companies in New Zealand, formulate the null hypothesis H0 and the alternative hypothesis Ha that would be used to attempt to provide evidence supporting the claim that μ is less than 90 days. Suppose that a random sample of 100 public owner-controlled companies in New Zealand is found to give a mean audit delay of days. Assuming that σ equals 32.83, calculate the p-value for testing H0 versus Ha and determine how much evidence there is that the mean audit delay for all public owner-controlled companies in New Zealand is less than 90 days.

9.40 Consolidated Power, a large electric power utility, has just built a modern nuclear power plant. This plant discharges waste water that is allowed to flow into the Atlantic Ocean. The Environmental Protection Agency (EPA) has ordered that the waste water may not be excessively warm so that thermal pollution of the marine environment near the plant can be avoided. Because of this order, the waste water is allowed to cool in specially constructed ponds and is then released into the ocean. This cooling system works properly if the mean temperature of waste water discharged is 60°F or cooler. Consolidated Power is required to monitor the temperature of the waste water. A sample of 100 temperature readings will be obtained each day, and if the sample results cast a substantial amount of doubt on the hypothesis that the cooling system is working properly (the mean temperature of waste water discharged is 60°F or cooler), then the plant must be shut down and appropriate actions must be taken to correct the problem.

a Consolidated Power wishes to set up a hypothesis test so that the power plant will be shut down when the null hypothesis is rejected. Set up the null hypothesis H0 and the alternative hypothesis Ha that should be used.

b Suppose that Consolidated Power decides to use a level of significance of α = .05, and suppose a random sample of 100 temperature readings is obtained. If the sample mean of the 100 temperature readings is , test H0 versus Ha and determine whether the power plant should be shut down and the cooling system repaired. Perform the hypothesis test by using a critical value and a p-value. Assume σ = 2.

9.41 Do part (b) of Exercise 9.40 if .

9.42 Do part (b) of Exercise 9.40 if .

9.43 An automobile parts supplier owns a machine that produces a cylindrical engine part. This part is supposed to have an outside diameter of three inches. Parts with diameters that are too small or too large do not meet customer requirements and must be rejected. Lately, the company has experienced problems meeting customer requirements. The technical staff feels that the mean diameter produced by the machine is off target. In order to verify this, a special study will randomly sample 100 parts produced by the machine. The 100 sampled parts will be measured, and if the results obtained cast a substantial amount of doubt on the hypothesis that the mean diameter equals the target value of three inches, the company will assign a problem-solving team to intensively search for the causes of the problem.

a The parts supplier wishes to set up a hypothesis test so that the problem-solving team will be assigned when the null hypothesis is rejected. Set up the null and alternative hypotheses for this situation.

b A sample of 40 parts yields a sample mean diameter of inches. Assuming σ equals .016, use a critical value and a p-value to test H0 versus Ha by setting α equal to .05. Should the problem-solving team be assigned?

9.44 The Crown Bottling Company has just installed a new bottling process that will fill 16-ounce bottles of the popular Crown Classic Cola soft drink. Both overfilling and underfilling bottles are undesirable: Underfilling leads to customer complaints and overfilling costs the company considerable money. In order to verify that the filler is set up correctly, the company wishes to see whether the mean bottle fill, μ, is close to the target fill of 16 ounces. To this end, a random sample of 36 filled bottles is selected from the output of a test filler run. If the sample results cast a substantial amount of doubt on the hypothesis that the mean bottle fill is the desired 16 ounces, then the filler’s initial setup will be readjusted.

a The bottling company wants to set up a hypothesis test so that the filler will be readjusted if the null hypothesis is rejected. Set up the null and alternative hypotheses for this hypothesis test.

b Suppose that Crown Bottling Company decides to use a level of significance of α = .01, and suppose a random sample of 36 bottle fills is obtained from a test run of the filler. For each of the following three sample means, determine whether the filler’s initial setup should be readjusted. In each case, use a critical value and a p-value, and assume that σ equals .1.

9.45 Use the first sample mean in Exercise 9.44 and a confidence interval to perform the hypothesis test by setting α equal to .05. What considerations would help you to decide whether the result has practical importance?

9.46 THE DISK BRAKE CASE

National Motors has equipped the ZX-900 with a new disk brake system. We define the stopping distance for a ZX-900 as the distance (in feet) required to bring the automobile to a complete stop from a speed of 35 mph under normal driving conditions using this new brake system. In addition, we define μ to be the mean stopping distance of all ZX-900s. One of the ZX-900’s major competitors is advertised to achieve a mean stopping distance of 60 ft. National Motors would like to claim in a new television commercial that the ZX-900 achieves a shorter mean stopping distance.

a Set up the null hypothesis H0 and the alternative hypothesis Ha that would be used to attempt to provide evidence supporting the claim that μ is less than 60.

b A television network will permit National Motors to claim that the ZX-900 achieves a shorter mean stopping distance than the competitor if H0 can be rejected in favor of Ha by setting α equal to .05. If the stopping distances of a random sample of n = 81 ZX-900s have a mean of , will National Motors be allowed to run the commercial? Perform the hypothesis test by using a critical value and a p-value. Assume here that σ = 6.02.

9.47 Consider part (b) of Exercise 9.46, and calculate a 95 percent confidence interval for μ. Do the point estimate of μ and confidence interval for μ indicate that μ might be far enough below 60 feet to suggest that we have a practically important result?

9.48 Recall from Exercise 8.12 that Bayus (1991) studied the mean numbers of auto dealers visited by early and late replacement buyers.

a Letting μ be the mean number of dealers visited by early replacement buyers, suppose that we wish to test H0: μ = 4 versus Ha: μ ≠ 4. A random sample of 800 early replacement buyers yields a mean number of dealers visited of . Assuming σ equals .71, calculate the p-value and test H0 versus Ha. Do we estimate that μ is less than 4 or greater than 4?

b Letting μ be the mean number of dealers visited by late replacement buyers, suppose that we wish to test H0: μ = 4 versus Ha: μ ≠ 4. A random sample of 500 late replacement buyers yields a mean number of dealers visited of . Assuming σ equals .66, calculate the p-value and test H0 versus Ha. Do we estimate that μ is less than 4 or greater than 4?

9.3: t Tests about a Population Mean: σ Unknown

If we do not know σ (which is usually the case), we can base a hypothesis test about μ on the sampling distribution of

(x̄ − μ) / (s / √n)

If the sampled population is normally distributed, then this sampling distribution is a t distribution having n − 1 degrees of freedom.
This leads to the following results:

A t Test about a Population Mean: σ Unknown

Define the test statistic

t = (x̄ − μ0) / (s / √n)
and assume that the population sampled is normally distributed. We can test H0: μ = μ0 versus a particular alternative hypothesis at level of significance α by using the appropriate critical value rule, or, equivalently, the corresponding p-value.

Here tα, tα/2, and the p-values are based on n − 1 degrees of freedom.

In the rest of this chapter and in Chapter 10 we will present most of the hypothesis testing examples by using hypothesis testing summary boxes and the seven hypothesis testing steps given in the previous section. However, to be concise, we will not formally number each hypothesis testing step. Rather, for each of the first six steps, we will set out in boldface font a key phrase that indicates that the step is being carried out. Then, we will highlight the seventh step—the business improvement conclusion—as we highlight all business improvement conclusions in this book. After Chapter 10, we will continue to use hypothesis testing summary boxes, and we will more informally use the seven steps.

As illustrated in the following example, we will often first use a critical value rule to test the hypotheses under consideration at a fixed value of α and then use a p-value to assess the weight of evidence against the null hypothesis.

EXAMPLE 9.4

In 1991 the average interest rate charged by U.S. credit card issuers was 18.8 percent. Since that time, there has been a proliferation of new credit cards affiliated with retail stores, oil companies, alumni associations, professional sports teams, and so on. A financial officer wishes to study whether the increased competition in the credit card business has reduced interest rates. To do this, the officer will test a hypothesis about the current mean interest rate, μ, charged by U.S. credit card issuers. The null hypothesis to be tested is H0: μ = 18.8%, and the alternative hypothesis is Ha: μ < 18.8%. If H0 can be rejected in favor of Ha at the .05 level of significance, the officer will conclude that the current mean interest rate is less than the 18.8% mean interest rate charged in 1991. To perform the hypothesis test, suppose that we randomly select n = 15 credit cards and determine their current interest rates. The interest rates for the 15 sampled cards are given in Table 9.3. A stem-and-leaf display and MINITAB box plot are given in Figure 9.7. The stem-and-leaf display looks reasonably mound-shaped, and both the stem-and-leaf display and the box plot look reasonably symmetrical. It follows that it is appropriate to calculate the value of the test statistic t in the summary box. Furthermore, since Ha: μ < 18.8% is of the form Ha: μ < μ0, we should reject H0: μ = 18.8% if the value of t is less than the critical value −tα = −t.05 = −1.761. Here, −t.05 = −1.761 is based on n − 1 = 15 − 1 = 14 degrees of freedom, and this critical value is illustrated in Figure 9.8(a). The mean and the standard deviation of the n = 15 interest rates in Table 9.3 are x̄ = 16.827 and s = 1.538. This implies that the value of the test statistic is

t = (x̄ − μ0) / (s / √n) = (16.827 − 18.8) / (1.538 / √15) = −4.97

Figure 9.7: Stem-and-Leaf Display and Box Plot of the Interest Rates

Figure 9.8: Testing H0: μ = 18.8% versus Ha: μ < 18.8% by Using a Critical Value and a p-Value

Table 9.3: Interest Rates Charged by 15 Randomly Selected Credit Cards CreditCd

Since t = −4.97 is less than −t.05 = −1.761, we reject H0: μ = 18.8% in favor of Ha: μ < 18.8%. That is, we conclude (at an α of .05) that the current mean credit card interest rate is lower than 18.8%, the mean interest rate in 1991. Furthermore, the sample mean says that we estimate the mean interest rate is 18.8% − 16.827% = 1.973% lower than it was in 1991.

The p-value for testing H0: μ = 18.8% versus Ha: μ < 18.8% is the area under the curve of the t distribution having 14 degrees of freedom to the left of t = −4.97. Tables of t points (such as Table A.4, page 864) are not complete enough to give such areas for most t statistic values, so we use computer software packages to calculate p-values that are based on the t distribution. For example, the MINITAB output in Figure 9.9(a) and the MegaStat output in Figure 9.10 tell us that the p-value for testing H0: μ = 18.8% versus Ha: μ < 18.8% is .0001. Notice that both MINITAB and MegaStat round p-values to three or four decimal places. The Excel output in Figure 9.9(b) gives the slightly more accurate value of 0.000103 for the p-value. Because this p-value is less than .05, .01, and .001, we can reject H0 at the .05, .01, and .001 levels of significance. Also note that the p-value of .0001 on the MegaStat output is shaded dark yellow. This indicates that we can reject H0 at the .01 level of significance (light yellow shading would indicate significance at the .05 level, but not at the .01 level). As a probability, the p-value of .0001 says that if we are to believe that H0: μ = 18.8% is true, we must believe that we have observed a t statistic value (t = −4.97) that can be described as a 1 in 10,000 chance. In summary, we have extremely strong evidence that H0: μ = 18.8% is false and Ha: μ < 18.8% is true. That is, we have extremely strong evidence that the current mean credit card interest rate is less than 18.8%.
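The t statistic and p-value above can be reproduced in a few lines of Python. This is only a sketch, assuming SciPy is available; the values x̄ = 16.827, s = 1.538, and n = 15 are those reported in this example.

```python
from math import sqrt
from scipy import stats

x_bar, s, n, mu_0 = 16.827, 1.538, 15, 18.8

# Test statistic: t = (x_bar - mu_0) / (s / sqrt(n))
t = (x_bar - mu_0) / (s / sqrt(n))      # about -4.97

# Left-tailed p-value: area under the t curve (n - 1 = 14 df) to the left of t
p_value = stats.t.cdf(t, df=n - 1)      # about .0001
```

Because this p-value is far below .05, .01, and even .001, the computation agrees with the conclusion reached above.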

Figure 9.9: The MINITAB and Excel Outputs for Testing H0: μ = 18.8% versus Ha: μ < 18.8%

Figure 9.10: The MegaStat Output for Testing H0: μ = 18.8% versus Ha: μ < 18.8%

Recall that in three cases discussed in Section 9.2 we tested hypotheses by assuming that the population standard deviation σ is known and by using z tests. If σ is actually not known in these cases (which would probably be true), we should test the hypotheses under consideration by using t tests. Furthermore, recall that in each case the sample size is large (at least 30). In general, it can be shown that if the sample size is large, the t test is approximately valid even if the sampled population is not normally distributed (or mound shaped). Therefore, consider the Valentine’s Day chocolate case and testing
H0: μ = 330 versus Ha: μ ≠ 330
at the .05 level of significance. To perform the hypothesis test, assume that we will randomly select n = 100 large retail stores and use their anticipated order quantities to calculate the value of the test statistic t in the summary box. Then, since the alternative hypothesis Ha: μ ≠ 330 is of the form Ha: μ ≠ μ0, we will reject H0: μ = 330 if the absolute value of t is greater than tα/2 = t.025 = 1.984 (based on n − 1 = 99 degrees of freedom). Suppose that when the sample is randomly selected, the mean and the standard deviation of the n = 100 reported order quantities are calculated to be and s = 39.1. The value of the test statistic is

Since | t | = 1.023 is less than t.025 = 1.984, we cannot reject H0: μ = 330 by setting α equal to .05. It follows that we cannot conclude (at an α of .05) that this year’s mean order quantity of the valentine box by large retail stores will differ from 330 boxes. Therefore, the candy company will base its production of valentine boxes on the ten percent projected sales increase. The p-value for the hypothesis test is twice the area under the t distribution curve having 99 degrees of freedom to the right of | t | = 1.023. Using a computer, we find that this p-value is .3088, which provides little evidence against H0: μ = 330 and in favor of Ha: μ ≠ 330.
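The two-sided test just described can be sketched in Python (assuming SciPy is available), starting from the reported | t | = 1.023 and n − 1 = 99 degrees of freedom:

```python
from scipy import stats

abs_t, df = 1.023, 99

# Critical value t.025 for a two-sided test at alpha = .05
t_crit = stats.t.ppf(0.975, df)        # about 1.984

# Two-sided p-value: twice the area to the right of |t|
p_value = 2 * stats.t.sf(abs_t, df)    # about .3088

# Since |t| does not exceed the critical value, H0 is not rejected
reject = bool(abs_t > t_crit)
```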

As another example, consider the trash bag case and note that the sample of n = 40 trash bag breaking strengths has mean and standard deviation s = 1.6438. The p-value for testing H0: μ = 50 versus Ha: μ > 50 is the area under the t distribution curve having n − 1 = 39 degrees of freedom to the right of

Using a computer, we find that this p-value is .0164, which provides strong evidence against H0: μ = 50 and in favor of Ha: μ > 50. In particular, recall that most television networks would evaluate the claim that the new trash bag has a mean breaking strength that exceeds 50 pounds by choosing an α value between .025 and .10. It follows, since the p-value of .0164 is less than all these α values, that most networks would allow the trash bag claim to be advertised.

As a third example, consider the payment time case and note that the sample of n = 65 payment times has mean and standard deviation s = 3.9612. The p-value for testing H0: μ = 19.5 versus Ha: μ < 19.5 is the area under the t distribution curve having n − 1 = 64 degrees of freedom to the left of

Using a computer, we find that this p-value is .0031, which is less than the management consulting firm’s α value of .01. It follows that the consulting firm will claim that the new electronic billing system has reduced the Hamilton, Ohio, trucking company’s mean bill payment time by more than 50 percent.

To conclude this section, note that if the sample size is small (<30) and the sampled population is not mound-shaped, or if the sampled population is highly skewed, then it might be appropriate to use a nonparametric test about the population median. Such a test is discussed in Chapter 18.

Exercises for Section 9.3

CONCEPTS

9.49 What assumptions must be met in order to carry out the test about a population mean based on the t distribution?

9.50 How do we decide whether to use a z test or a t test when testing a hypothesis about a population mean?

METHODS AND APPLICATIONS

9.51 Suppose that a random sample of 16 measurements from a normally distributed population gives a sample mean of and a sample standard deviation of s = 6. Use critical values to test H0: μ ≤ 10 versus Ha: μ > 10 using levels of significance α = .10, α = .05, α = .01, and α = .001. What do you conclude at each value of α?

9.52 Suppose that a random sample of nine measurements from a normally distributed population gives a sample mean of and a sample standard deviation of s = .3. Use critical values to test H0: μ = 3 versus Ha: μ ≠ 3 using levels of significance α = .10, α = .05, α = .01, and α = .001. What do you conclude at each value of α?

9.53 THE AIR TRAFFIC CONTROL CASE AlertTime

Recall that it is hoped that the mean alert time, μ, using the new display panel is less than eight seconds. Formulate the null hypothesis H0 and the alternative hypothesis Ha that would be used to attempt to provide evidence that μ is less than eight seconds. The mean and the standard deviation of the sample of n = 15 alert times are and s = 1.0261. Perform a t test of H0 versus Ha by setting α equal to .05 and using a critical value. Interpret the results of the test.

9.54 THE AIR TRAFFIC CONTROL CASE AlertTime

The p-value for the hypothesis test of Exercise 9.53 can be computer calculated to be .0200. How much evidence is there that μ is less than eight seconds?

9.55 The bad debt ratio for a financial institution is defined to be the dollar value of loans defaulted divided by the total dollar value of all loans made. Suppose that a random sample of seven Ohio banks is selected and that the bad debt ratios (written as percentages) for these banks are 7%, 4%, 6%, 7%, 5%, 4%, and 9%. BadDebt

a Banking officials claim that the mean bad debt ratio for all Midwestern banks is 3.5 percent and that the mean bad debt ratio for Ohio banks is higher. Set up the null and alternative hypotheses needed to attempt to provide evidence supporting the claim that the mean bad debt ratio for Ohio banks exceeds 3.5 percent.

b Assuming that bad debt ratios for Ohio banks are approximately normally distributed, use critical values and the given sample information to test the hypotheses you set up in part a by setting α equal to .10, .05, .01, and .001. How much evidence is there that the mean bad debt ratio for Ohio banks exceeds 3.5 percent? What does this say about the banking officials' claim?

c Are you qualified to decide whether we have a practically important result? Who would be? How might practical importance be defined in this situation?

d The p-value for the hypothesis test of part (b) can be computer calculated to be .006. What does this p-value say about whether the mean bad debt ratio for Ohio banks exceeds 3.5 percent?

9.56 In the book Business Research Methods, Donald R. Cooper and C. William Emory (1995) discuss using hypothesis testing to study receivables outstanding. To quote Cooper and Emory:

…the controller of a large retail chain may be concerned about a possible slowdown in payments by the company’s customers. She measures the rate of payment in terms of the average number of days receivables outstanding. Generally, the company has maintained an average of about 50 days with a standard deviation of 10 days. Since it would be too expensive to analyze all of a company’s receivables frequently, we normally resort to sampling.

a Set up the null and alternative hypotheses needed to attempt to show that there has been a slowdown in payments by the company’s customers (there has been a slowdown if the average days outstanding exceeds 50).

b Assume approximate normality and suppose that a random sample of 25 accounts gives an average days outstanding of with a standard deviation of s = 8. Use critical values to test the hypotheses you set up in part a at levels of significance α = .10, α = .05, α = .01, and α = .001. How much evidence is there of a slowdown in payments?

c Are you qualified to decide whether this result has practical importance? Who would be?

9.57 Consider a chemical company that wishes to determine whether a new catalyst, catalyst XA-100, changes the mean hourly yield of its chemical process from the historical process mean of 750 pounds per hour. When five trial runs are made using the new catalyst, the following yields (in pounds per hour) are recorded: 801, 814, 784, 836, and 820. ChemYield

a Let μ be the mean of all possible yields using the new catalyst. Assuming that chemical yields are approximately normally distributed, the MegaStat output of the test statistic and p-value, and the Excel output of the p-value, for testing H0: μ = 750 versus Ha: μ ≠ 750 are as follows:

(Here we had Excel calculate twice the area under the t distribution curve having 4 degrees of freedom to the right of 6.942585.) Use the sample data to verify that the values of x̄, s, and t given on the output are correct.

b Use the test statistic and critical values to test H0 versus Ha by setting α equal to .10, .05, .01, and .001.

9.58 Consider Exercise 9.57. Use the p-value to test H0: μ = 750 versus Ha: μ ≠ 750 by setting α equal to .10, .05, .01, and .001. How much evidence is there that the new catalyst changes the mean hourly yield?

9.59 Whole Foods is an all-natural grocery chain that has 50,000 square foot stores, up from the industry average of 34,000 square feet. Sales per square foot of supermarkets average just under $400 per square foot, as reported by USA Today in an article on “A whole new ballgame in grocery shopping.” Suppose that sales per square foot in the most recent fiscal year are recorded for a random sample of 10 Whole Foods supermarkets. The data (sales dollars per square foot) are as follows: 854, 858, 801, 892, 849, 807, 894, 863, 829, 815. Let μ denote the mean sales dollars per square foot for all Whole Foods supermarkets during the most recent fiscal year, and note that the historical mean sales dollars per square foot for Whole Foods supermarkets in previous years has been $800. Below we present the MINITAB output obtained by using the sample data to test H0: μ = 800 versus Ha: μ > 800. WholeFoods

a Use the p-value to test H0 versus Ha by setting α equal to .10, .05, and .01.

b How much evidence is there that μ exceeds $800?

9.60 Consider Exercise 9.59. Do you think that the difference between the sample mean of $846.20 and the historical average of $800 has practical importance?

9.61 THE VIDEO GAME SATISFACTION RATING CASE VideoGame

The mean and the standard deviation of the sample of n = 65 customer satisfaction ratings are and s = 2.6424. Let μ denote the mean of all possible customer satisfaction ratings for the XYZ-Box video game system, and consider testing H0: μ = 42 versus Ha: μ > 42. Perform a t test of these hypotheses by setting α equal to .05 and using a critical value. Also, interpret the p-value of .0025 for the hypothesis test.

9.62 THE BANK CUSTOMER WAITING TIME CASE WaitTime

The mean and the standard deviation of the sample of 100 bank customer waiting times are and s = 2.475. Let μ denote the mean of all possible bank customer waiting times using the new system and consider testing H0: μ = 6 versus Ha: μ < 6. Perform a t test of these hypotheses by setting α equal to .05 and using a critical value. Also, interpret the p-value of .0158 for the hypothesis test.

9.4: z Tests about a Population Proportion

In this section we study a large sample hypothesis test about a population proportion (that is, about the fraction of population units that possess some characteristic). We begin with an example.

EXAMPLE 9.5: The Cheese Spread Case

Recall that the soft cheese spread producer has decided that replacing the current spout with the new spout is profitable only if p, the true proportion of all current purchasers who would stop buying the cheese spread if the new spout were used, is less than .10. The producer feels that it is unwise to change the spout unless it has very strong evidence that p is less than .10. Therefore, the spout will be changed if and only if the null hypothesis H0: p = .10 can be rejected in favor of the alternative hypothesis Ha: p < .10 at the .01 level of significance.

In order to see how to test this kind of hypothesis, remember that when n is large, the sampling distribution of

(p̂ − p) / √(p(1 − p)/n)

is approximately a standard normal distribution. Let p0 denote a specified value between 0 and 1 (its exact value will depend on the problem), and consider testing the null hypothesis H0: p = p0. We then have the following result:

A Large Sample Test about a Population Proportion

Define the test statistic

z = (p̂ − p0) / √(p0(1 − p0)/n)
If the sample size n is large, we can test H0: p = p0 versus a particular alternative hypothesis at level of significance α by using the appropriate critical value rule, or, equivalently, the corresponding p-value.

Here n should be considered large if both np0 and n(1 − p0) are at least 5.
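The large-sample condition above is easy to check in code. The following is a minimal sketch; the helper name is our own, not from the text.

```python
def large_sample_ok(n, p0):
    """Return True when both n*p0 and n*(1 - p0) are at least 5."""
    return n * p0 >= 5 and n * (1 - p0) >= 5

# For example, n = 1,000 with p0 = .10 easily qualifies,
# while n = 100 with p0 = .01 does not (n*p0 = 1).
```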

EXAMPLE 9.6: The Cheese Spread Case

We have seen that the cheese spread producer wishes to test
H0: p = .10 versus Ha: p < .10,
where p is the proportion of all current purchasers who would stop buying the cheese spread if the new spout were used. The producer will use the new spout if H0 can be rejected in favor of Ha at the .01 level of significance. To perform the hypothesis test, we will randomly select n = 1,000 current purchasers of the cheese spread, find the proportion of these purchasers who would stop buying the cheese spread if the new spout were used, and calculate the value of the test statistic z in the summary box. Then, since the alternative hypothesis Ha: p < .10 is of the form Ha: p < p0, we will reject H0: p = .10 if the value of z is less than −zα = −z.01 = −2.33. (Note that using this procedure is valid because np0 = 1,000(.10) = 100 and n(1 − p0) = 1,000(1 − .10) = 900 are both at least 5.) Suppose that when the sample is randomly selected, we find that 63 of the 1,000 current purchasers say they would stop buying the cheese spread if the new spout were used. Since p̂ = 63/1,000 = .063, the value of the test statistic is

z = (p̂ − p0) / √(p0(1 − p0)/n) = (.063 − .10) / √(.10(.90)/1,000) = −3.90

Because z = −3.90 is less than −z.01 = −2.33, we reject H0: p = .10 in favor of Ha: p < .10. That is, we conclude (at an α of .01) that the proportion of current purchasers who would stop buying the cheese spread if the new spout were used is less than .10. It follows that the company will use the new spout. Furthermore, the point estimate p̂ = .063 says we estimate that 6.3 percent of all current customers would stop buying the cheese spread if the new spout were used.

Although the cheese spread producer has made its decision by setting α equal to a single, prechosen value (.01), it would probably also wish to know the weight of evidence against H0 and in favor of Ha. The p-value is the area under the standard normal curve to the left of z = − 3.90. Table A.3 (page 862) tells us that this area is .00005. Because this p-value is less than .001, we have extremely strong evidence that Ha: p < .10 is true. That is, we have extremely strong evidence that fewer than 10 percent of current purchasers would stop buying the cheese spread if the new spout were used.
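The z statistic and p-value for this example can be sketched using only the Python standard library; the sample values (63 of 1,000 purchasers, p0 = .10) are those given above.

```python
from math import sqrt
from statistics import NormalDist

n, successes, p0 = 1000, 63, 0.10
p_hat = successes / n                           # .063

# Test statistic: z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)      # about -3.90

# Left-tailed p-value: area under the standard normal curve left of z
p_value = NormalDist().cdf(z)                   # about .00005
```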

EXAMPLE 9.7

Recent medical research has sought to develop drugs that lessen the severity and duration of viral infections. Virol, a relatively new drug, has been shown to provide relief for 70 percent of all patients suffering from viral upper respiratory infections. A major drug company is developing a competing drug called Phantol. The drug company wishes to investigate whether Phantol is more effective than Virol. To do this, the drug company will test a hypothesis about the true proportion, p, of all patients whose symptoms would be relieved by Phantol. The null hypothesis to be tested is H0: p = .70, and the alternative hypothesis is Ha: p > .70. If H0 can be rejected in favor of Ha at the .05 level of significance, the drug company will conclude that Phantol helps more than the 70 percent of patients helped by Virol. To perform the hypothesis test, we will randomly select n = 300 patients having viral upper respiratory infections, find the proportion of these patients whose symptoms are relieved by Phantol, and calculate the value of the test statistic z in the summary box. Then, since the alternative hypothesis Ha: p > .70 is of the form Ha: p > p0, we will reject H0: p = .70 if the value of z is greater than zα = z.05 = 1.645. (Note that using this procedure is valid because np0 = 300(.70) = 210 and n(1 − p0) = 300(1 − .70) = 90 are both at least 5.) Suppose that when the sample is randomly selected, we find that Phantol provides relief for 231 of the 300 patients. Since p̂ = 231/300 = .77, the value of the test statistic is

z = (p̂ − p0) / √(p0(1 − p0)/n) = (.77 − .70) / √(.70(.30)/300) = 2.65

Because z = 2.65 is greater than z.05 = 1.645, we reject H0: p = .70 in favor of Ha: p > .70. That is, we conclude (at an α of .05) that Phantol will provide relief for more than 70 percent of all patients suffering from viral upper respiratory infections. More specifically, the point estimate p̂ = .77 says that we estimate that Phantol will provide relief for 77 percent of all such patients. Comparing this estimate to the 70 percent of patients whose symptoms are relieved by Virol, we conclude that Phantol is somewhat more effective.

The p-value for testing H0: p = .70 versus Ha: p > .70 is the area under the standard normal curve to the right of z = 2.65. This p-value is (1.0 − .9960) = .004 (see Table A.3, page 863), and it provides very strong evidence against H0: p = .70 and in favor of Ha: p > .70. That is, we have very strong evidence that Phantol will provide relief for more than 70 percent of all patients suffering from viral upper respiratory infections.
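A sketch of the Phantol calculation using the standard library; the values (231 of 300 patients relieved, p0 = .70) are those reported in the example.

```python
from math import sqrt
from statistics import NormalDist

n, successes, p0 = 300, 231, 0.70
p_hat = successes / n                           # .77

# Test statistic for Ha: p > .70
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)      # about 2.65

# Right-tailed p-value: area under the standard normal curve right of z
p_value = 1 - NormalDist().cdf(z)               # about .004
```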

EXAMPLE 9.8: The Electronic Article Surveillance Case

Suppose that a company selling electronic article surveillance devices claims that the proportion, p, of all consumers who would never shop in a store again if the store subjected them to a false alarm is no more than .05. A store considering installing such a device is concerned that p is greater than .05 and wishes to test
H0: p = .05 versus Ha: p > .05.
To perform the hypothesis test, the store will calculate a p-value and use it to measure the weight of evidence against H0 and in favor of Ha. In an actual systematic sample, 40 out of 250 consumers said they would never shop in a store again if the store subjected them to a false alarm. Therefore, the sample proportion of lost consumers is p̂ = 40/250 = .16. Since np0 = 250(.05) = 12.5 and n(1 − p0) = 250(1 − .05) = 237.5 are both at least 5, we can use the test statistic z in the summary box. The value of the test statistic is

z = (p̂ − p0) / √(p0(1 − p0)/n) = (.16 − .05) / √(.05(.95)/250) = 7.98

Noting that Ha: p > .05 is of the form Ha: p > p0, the p-value is the area under the standard normal curve to the right of z = 7.98. The normal table tells us that the area under the standard normal curve to the right of 3.99 is (1.0 − .99997) = .00003. Therefore, the p-value is less than .00003 and provides extremely strong evidence against H0: p = .05 and in favor of Ha: p > .05. That is, we have extremely strong evidence that the proportion of all consumers who say they would never shop in a store again if the store subjected them to a false alarm is greater than .05. Furthermore, the point estimate p̂ = .16 says we estimate that the percentage of such consumers is 11 percent more than the 5 percent maximum claimed by the company selling the electronic article surveillance devices. A 95 percent confidence interval for p is

p̂ ± 1.96 √(p̂(1 − p̂)/n) = .16 ± 1.96 √(.16(.84)/250) = [.1146, .2054]

This interval says we are 95 percent confident that the percentage of consumers who would never shop in a store again if the store subjected them to a false alarm is between 6.46 percent and 15.54 percent more than the 5 percent maximum claimed by the company selling the electronic article surveillance devices. The rather large increases over the claimed 5 percent maximum implied by the point estimate and the confidence interval would mean substantially more lost customers and thus are practically important. Figure 9.11 gives the MegaStat output for testing H0: p = .05 versus Ha: p > .05. Note that this output includes a 95 percent confidence interval for p. Also notice that MegaStat expresses the p-value for this test in scientific notation. In general, when a p-value is less than .0001, MegaStat (and also Excel) express the p-value in scientific notation. Here the p-value of 7.77 E-16 says that we must move the decimal point 16 places to the left to obtain the decimal equivalent. That is, the p-value is .000000000000000777.
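The z statistic and the 95 percent confidence interval in this case can be sketched with the standard library; the values (40 of 250 consumers, p0 = .05) are those given above.

```python
from math import sqrt
from statistics import NormalDist

n, lost, p0 = 250, 40, 0.05
p_hat = lost / n                                 # .16

# Test statistic for Ha: p > .05
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)       # about 7.98

# Right-tailed p-value (smaller than any normal-table entry)
p_value = 1 - NormalDist().cdf(z)

# 95 percent confidence interval for p, based on p_hat
half_width = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)    # about (.1146, .2054)
```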

Figure 9.11: The MegaStat Output for Testing H0: p = .05 versus Ha: p > .05

Exercises for Section 9.4

CONCEPTS

9.63 If we test a hypothesis to provide evidence supporting the claim that a majority of voters prefer a political candidate, explain the difference between p and p̂.

9.64 If we test a hypothesis to provide evidence supporting the claim that more than 30 percent of all consumers prefer a particular brand of beer, explain the difference between p and p̂.

9.65 If we test a hypothesis to provide evidence supporting the claim that fewer than 5 percent of the units produced by a process are defective, explain the difference between p and p̂.

9.66 What condition must be satisfied in order to appropriately use the methods of this section?

METHODS AND APPLICATIONS

9.67 For each of the following sample sizes and hypothesized values of the population proportion p, determine whether the sample size is large enough to use the large sample test about p given in this section:

a n = 400 and p0 = .5.

b n = 100 and p0 = .01.

c n = 10,000 and p0 = .01.

d n = 100 and p0 = .2.

e n = 256 and p0 = .7.

f n = 200 and p0 = .98.

g n = 1,000 and p0 = .98.

h n = 25 and p0 = .4.

9.68 Suppose we wish to test H0: p ≤ .8 versus Ha: p > .8 and that a random sample of n = 400 gives a sample proportion p̂.

a Test H0 versus Ha at the .05 level of significance by using a critical value. What do you conclude?

b Find the p-value for this test.

c Use the p-value to test H0 versus Ha by setting α equal to .10, .05, .01, and .001. What do you conclude at each value of α?

9.69 Suppose we test H0: p = .3 versus Ha: p ≠ .3 and that a random sample of n = 100 gives a sample proportion p̂.

a Test H0 versus Ha at the .01 level of significance by using a critical value. What do you conclude?

b Find the p-value for this test.
c Use the p-value to test H0 versus Ha by setting α equal to .10, .05, .01, and .001. What do you conclude at each value of α?

9.70 Suppose we are testing H0: p ≤ .5 versus Ha: p > .5, where p is the proportion of all beer drinkers who have tried at least one brand of “cold-filtered beer.” If a random sample of 500 beer drinkers has been taken and if p̂ equals .57, how many beer drinkers in the sample have tried at least one brand of “cold-filtered beer”?

9.71 THE MARKETING ETHICS CASE: CONFLICT OF INTEREST

Recall that a conflict of interest scenario was presented to a sample of 205 marketing researchers and that 111 of these researchers disapproved of the actions taken.

a Let p be the proportion of all marketing researchers who disapprove of the actions taken in the conflict of interest scenario. Set up the null and alternative hypotheses needed to attempt to provide evidence supporting the claim that a majority (more than 50 percent) of all marketing researchers disapprove of the actions taken.

b Assuming that the sample of 205 marketing researchers has been randomly selected, use critical values and the previously given sample information to test the hypotheses you set up in part a at the .10, .05, .01, and .001 levels of significance. How much evidence is there that a majority of all marketing researchers disapprove of the actions taken?

c Suppose a random sample of 1,000 marketing researchers reveals that 540 of the researchers disapprove of the actions taken in the conflict of interest scenario. Use critical values to determine how much evidence there is that a majority of all marketing researchers disapprove of the actions taken.

d Note that in parts b and c the sample proportion is (essentially) the same. Explain why the results of the hypothesis tests in parts b and c differ.

9.72 Last year, television station WXYZ’s share of the 11 p.m. news audience was approximately equal to, but no greater than, 25 percent. The station’s management believes that the current audience share is higher than last year’s 25 percent share. In an attempt to substantiate this belief, the station surveyed a random sample of 400 11 p.m. news viewers and found that 146 watched WXYZ.

a Let p be the current proportion of all 11 p.m. news viewers who watch WXYZ. Set up the null and alternative hypotheses needed to attempt to provide evidence supporting the claim that the current audience share for WXYZ is higher than last year’s 25 percent share.

b Use critical values and the following MINITAB output to test the hypotheses you set up in part a at the .10, .05, .01, and .001 levels of significance. How much evidence is there that the current audience share is higher than last year’s 25 percent share?

c Find the p-value for the hypothesis test in part b. Use the p-value to carry out the test by setting α equal to .10, .05, .01, and .001. Interpret your results.

d Do you think that the result of the station’s survey has practical importance? Why or why not?

9.73 In the book Essentials of Marketing Research, William R. Dillon, Thomas J. Madden, and Neil H. Firtle discuss a marketing research proposal to study day-after recall for a brand of mouthwash. To quote the authors:

The ad agency has developed a TV ad for the introduction of the mouthwash. The objective of the ad is to create awareness of the brand. The objective of this research is to evaluate the awareness generated by the ad measured by aided- and unaided-recall scores.

A minimum of 200 respondents who claim to have watched the TV show in which the ad was aired the night before will be contacted by telephone in 20 cities.

The study will provide information on the incidence of unaided and aided recall.

Suppose a random sample of 200 respondents shows that 46 of the people interviewed were able to recall the commercial without any prompting (unaided recall).

a In order for the ad to be considered successful, the percentage of unaided recall must be above the category norm for a TV commercial for the product class. If this norm is 18 percent, set up the null and alternative hypotheses needed to attempt to provide evidence that the ad is successful.

b Use the previously given sample information to compute the p-value for the hypothesis test you set up in part a. Use the p-value to carry out the test by setting α equal to .10, .05, .01, and .001. How much evidence is there that the TV commercial is successful?

c Do you think the result of the ad agency’s survey has practical importance? Explain your opinion.

9.74 Quality Progress, February 2005, reports on the results achieved by Bank of America in improving customer satisfaction and customer loyalty by listening to the ‘voice of the customer’. A key measure of customer satisfaction is the response on a scale from 1 to 10 to the question: “Considering all the business you do with Bank of America, what is your overall satisfaction with Bank of America?”4 Suppose that a random sample of 350 current customers results in 195 customers with a response of 9 or 10 representing ‘customer delight.’

a Let p denote the true proportion of all current Bank of America customers who would respond with a 9 or 10, and note that the historical proportion of customer delight for Bank of America has been .48. Calculate the p-value for testing H0: p = .48 versus Ha: p > .48. How much evidence is there that p exceeds .48?

b Bank of America has a base of nearly 30 million customers. Do you think that the sample results have practical importance? Explain your opinion.

9.75 The manufacturer of the ColorSmart-5000 television set claims that 95 percent of its sets last at least five years without needing a single repair. In order to test this claim, a consumer group randomly selects 400 consumers who have owned a ColorSmart-5000 television set for five years. Of these 400 consumers, 316 say that their ColorSmart-5000 television sets did not need repair, while 84 say that their ColorSmart-5000 television sets did need at least one repair.

a Letting p be the proportion of ColorSmart-5000 television sets that last five years without a single repair, set up the null and alternative hypotheses that the consumer group should use to attempt to show that the manufacturer’s claim is false.

b Use critical values and the previously given sample information to test the hypotheses you set up in part a by setting α equal to .10, .05, .01, and .001. How much evidence is there that the manufacturer’s claim is false?

c Do you think the results of the consumer group’s survey have practical importance? Explain your opinion.

9.5: Type II Error Probabilities and Sample Size Determination (Optional)


As we have seen, we usually take action (for example, advertise a claim) on the basis of having rejected the null hypothesis. In this case, we know the chances that the action has been taken erroneously because we have prespecified α, the probability of rejecting a true null hypothesis. However, sometimes we must act (for example, use a day’s production of camshafts to make V6 engines) on the basis of not rejecting the null hypothesis. If we must do this, it is best to know the probability of not rejecting a false null hypothesis (a Type II error). If this probability is not small enough, we may change the hypothesis testing procedure. In order to discuss this further, we must first see how to compute the probability of a Type II error.

As an example, the Federal Trade Commission (FTC) often tests claims that companies make about their products. Suppose coffee is being sold in cans that are labeled as containing three pounds, and also suppose that the FTC wishes to determine if the mean amount of coffee μ in all such cans is at least three pounds. To do this, the FTC tests H0: μ ≥ 3 (or μ = 3) versus Ha: μ < 3 by setting α = .05. Suppose that a sample of 35 coffee cans yields x̄ = 2.9973. Assuming that σ equals .0147, we see that because

z = (x̄ − 3)/(σ/√n) = (2.9973 − 3)/(.0147/√35) = −1.09
is not less than −z.05 = −1.645, we cannot reject H0: μ ≥ 3 by setting α = .05. Since we cannot reject H0, we cannot have committed a Type I error, which is the error of rejecting a true H0. However, we might have committed a Type II error, which is the error of not rejecting a false H0. Therefore, before we make a final conclusion about μ, we should calculate the probability of a Type II error.

A Type II error is not rejecting H0: μ ≥ 3 when H0 is false. Because any value of μ that is less than 3 makes H0 false, there is a different Type II error (and, therefore, a different Type II error probability) associated with each value of μ that is less than 3. In order to demonstrate how to calculate these probabilities, we will calculate the probability of not rejecting H0: μ ≥ 3 when in fact μ equals 2.995. This is the probability of failing to detect an average underfill of .005 pounds. For a fixed sample size (for example, n = 35 coffee can fills), the value of β, the probability of a Type II error, depends upon how we set α, the probability of a Type I error. Since we have set α = .05, we reject H0 if

z = (x̄ − 3)/(.0147/√35) < −z.05 = −1.645

or, equivalently, if

x̄ < 3 − 1.645(.0147/√35) = 2.9959

Therefore, we do not reject H0 if x̄ ≥ 2.9959. It follows that β, the probability of not rejecting H0: μ ≥ 3 when μ equals 2.995, is

β = P(x̄ ≥ 2.9959 when μ = 2.995) = P(z ≥ (2.9959 − 2.995)/(.0147/√35)) = P(z ≥ .37) = 1 − .6443 = .3557
This calculation is illustrated in Figure 9.12. Similarly, it follows that β, the probability of not rejecting H0: μ ≥ 3 when μ equals 2.99, is

β = P(z ≥ (2.9959 − 2.99)/(.0147/√35)) = P(z ≥ 2.38) = 1 − .9913 = .0087
Figure 9.12: Calculating β When μ Equals 2.995

It also follows that β, the probability of not rejecting H0: μ ≥ 3 when μ equals 2.985, is

β = P(z ≥ (2.9959 − 2.985)/(.0147/√35)) = P(z ≥ 4.39)
This probability is less than .00003 (because z is greater than 3.99).

In Figure 9.13 we illustrate the values of β that we have calculated. Notice that the closer an alternative value of μ is to 3 (the value specified by H0: μ = 3), the larger is the associated value of β. Although alternative values of μ that are closer to 3 have larger associated probabilities of Type II errors, these values of μ have associated Type II errors with less serious consequences. For example, we are more likely to not reject H0: μ = 3 when μ = 2.995 (β = .3557) than we are to not reject H0: μ = 3 when μ = 2.99 (β = .0087). However, not rejecting H0: μ = 3 when μ = 2.995, which means that we are failing to detect an average underfill of .005 pounds, is less serious than not rejecting H0: μ = 3 when μ = 2.99, which means that we are failing to detect a larger average underfill of .01 pounds. In order to decide whether a particular hypothesis test adequately controls the probability of a Type II error, we must determine which Type II errors are serious, and then we must decide whether the probabilities of these errors are small enough. For example, suppose that the FTC and the coffee producer agree that failing to reject H0: μ = 3 when μ equals 2.99 is a serious error, but that failing to reject H0: μ = 3 when μ equals 2.995 is not a particularly serious error. Then, since the probability of not rejecting H0: μ = 3 when μ equals 2.99, which is .0087, is quite small, we might decide that the hypothesis test adequately controls the probability of a Type II error. To understand the implication of this, recall that the sample of 35 coffee cans, which has x̄ = 2.9973, does not provide enough evidence to reject H0: μ ≥ 3 by setting α = .05. We have just shown that the probability that we have failed to detect a serious underfill is quite small (.0087), so the FTC might decide that no action should be taken against the coffee producer. Of course, this decision should also be based on the variability of the fills of the individual cans.
Because x̄ = 2.9973 and σ = .0147, we estimate that 99.73 percent of all individual coffee can fills are contained in the interval [x̄ ± 3σ] = [2.9973 ± 3(.0147)] = [2.9532, 3.0414]. If the FTC believes it is reasonable to accept fills as low as (but no lower than) 2.9532 pounds, this evidence also suggests that no action against the coffee producer is needed.

Figure 9.13: How β Changes as the Alternative Value of μ Changes

Suppose, instead, that the FTC and the coffee producer had agreed that failing to reject H0: μ ≥ 3 when μ equals 2.995 is a serious mistake. The probability of this Type II error, which is .3557, is large. Therefore, we might conclude that the hypothesis test is not adequately controlling the probability of a serious Type II error. In this case, we have two possible courses of action. First, we have previously said that, for a fixed sample size, the lower we set α, the higher is β, and the higher we set α, the lower is β. Therefore, if we keep the sample size fixed at n = 35 coffee cans, we can reduce β by increasing α. To demonstrate this, suppose we increase α to .10. In this case we reject H0 if

z = (x̄ − 3)/(.0147/√35) < −z.10 = −1.282

or, equivalently, if

x̄ < 3 − 1.282(.0147/√35) = 2.9968

Therefore, we do not reject H0 if x̄ ≥ 2.9968. It follows that β, the probability of not rejecting H0: μ ≥ 3 when μ equals 2.995, is

β = P(z ≥ (2.9968 − 2.995)/(.0147/√35)) = P(z ≥ .73) = 1 − .7673 = .2327
We thus see that increasing α from .05 to .10 reduces β from .3557 to .2327. However, β is still too large, and, besides, we might not be comfortable making α larger than .05. Therefore, if we wish to decrease β and maintain α at .05, we must increase the sample size. We will soon present a formula we can use to find the sample size needed to make both α and β as small as we wish.
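The β calculations for the coffee fill test can be reproduced with a short Python sketch (my own, using the standard library's NormalDist; the function name is illustrative). Small differences from the text's values arise because the text rounds intermediate z values to two decimals:

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def beta_lower_tail(mu0, mu_a, sigma, n, z_alpha):
    """P(do not reject H0: mu >= mu0) when the true mean is mu_a,
    for the left-tailed test that rejects H0 when z < -z_alpha."""
    se = sigma / sqrt(n)
    cutoff = mu0 - z_alpha * se        # do not reject H0 if x-bar >= cutoff
    return 1 - Z.cdf((cutoff - mu_a) / se)

# Coffee fill test: H0: mu >= 3, sigma = .0147, n = 35, alpha = .05 (z.05 = 1.645)
print(round(beta_lower_tail(3, 2.995, .0147, 35, 1.645), 2))   # 0.36 (text: .3557)
print(round(beta_lower_tail(3, 2.99,  .0147, 35, 1.645), 4))   # 0.0087
# Raising alpha to .10 (z.10 = 1.282) lowers beta for mu_a = 2.995:
print(round(beta_lower_tail(3, 2.995, .0147, 35, 1.282), 2))   # 0.23 (text: .2327)
```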

Once we have computed β, we can calculate what we call the power of the test.

The
power
of a statistical test is the probability of rejecting the null hypothesis when it is false.

Just as β depends upon the alternative value of μ, so does the power of a test. In general, the power associated with a particular alternative value of μ equals 1 − β, where β is the probability of a Type II error associated with the same alternative value of μ. For example, we have seen that, when we set α = .05, the probability of not rejecting H0: μ ≥ 3 when μ equals 2.99 is .0087. Therefore, the power of the test associated with the alternative value 2.99 (that is, the probability of rejecting H0: μ ≥ 3 when μ equals 2.99) is 1 − .0087 = .9913.

Thus far we have demonstrated how to calculate β when testing a less than alternative hypothesis. In the following box we present (without proof) a method for calculating the probability of a Type II error when testing a less than, a greater than, or a not equal to alternative hypothesis:

Calculating the Probability of a Type II Error

Assume that the sampled population is normally distributed, or that a large sample will be taken. Consider testing H0: μ = μ0 versus one of Ha: μ > μ0, Ha: μ < μ0, or Ha: μ ≠ μ0. Then, if we set the probability of a Type I error equal to α and randomly select a sample of size n, the probability, β, of a Type II error corresponding to the alternative value μa of μ is (exactly or approximately) equal to the area under the standard normal curve to the left of

z* − |μ0 − μa|√n/σ
Here z* equals zα if the alternative hypothesis is one-sided (μ > μ0 or μ < μ0), in which case the method for calculating β is exact. Furthermore, z* equals zα/2 if the alternative hypothesis is two-sided (μ ≠ μ0), in which case the method for calculating β is approximate.
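The boxed formula translates directly into code. The sketch below (mine; names are illustrative) evaluates it for both the one-sided coffee fill test and a two-sided test:

```python
from math import sqrt
from statistics import NormalDist

def type2_error(mu0, mu_a, sigma, n, z_star):
    """Area under the standard normal curve to the left of
    z* - |mu0 - mu_a| * sqrt(n) / sigma  (the boxed formula)."""
    return NormalDist().cdf(z_star - abs(mu0 - mu_a) * sqrt(n) / sigma)

# Coffee fill test (one-sided, z* = z.05 = 1.645): beta for mu_a = 2.995.
print(round(type2_error(3, 2.995, .0147, 35, 1.645), 2))   # 0.36
# A two-sided test with mu0 = 330, sigma = 40, n = 100, z* = z.025 = 1.96,
# evaluated at the alternative mu_a = 315:
print(round(type2_error(330, 315, 40, 100, 1.96), 4))      # 0.0367
```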

EXAMPLE 9.9: The Valentine’s Day Chocolate Case

In the Valentine’s Day chocolate case we are testing H0: μ = 330 versus Ha: μ ≠ 330 by setting α = .05. We have seen the mean x̄ of the reported order quantities of a random sample of n = 100 large retail stores. Assuming that σ equals 40, it follows that because the test statistic

z = (x̄ − 330)/(40/√100)

is between −z.025 = −1.96 and z.025 = 1.96, we cannot reject H0: μ = 330 by setting α = .05. Since we cannot reject H0, we might have committed a Type II error. Suppose that the candy company decides that failing to reject H0: μ = 330 when μ differs from 330 by as many as 15 valentine boxes (that is, when μ is 315 or 345) is a serious Type II error. Because we have set α equal to .05, β for the alternative value μa = 315 (that is, the probability of not rejecting H0: μ = 330 when μ equals 315) is the area under the standard normal curve to the left of

z* − |μ0 − μa|√n/σ = 1.96 − |330 − 315|√100/40 = 1.96 − 3.75 = −1.79

Here z* = zα/2 = z.05/2 = z.025 since the alternative hypothesis (μ ≠ 330) is two-sided. The area under the standard normal curve to the left of −1.79 is 1 − .9633 = .0367. Therefore, β for the alternative value μa = 315 is .0367. Similarly, it can be verified that β for the alternative value μa = 345 is .0367. It follows, because we cannot reject H0: μ = 330 by setting α = .05, and because we have just shown that there is a reasonably small (.0367) probability that we have failed to detect a serious (that is, a 15 valentine box) deviation of μ from 330, that it is reasonable for the candy company to base this year’s production of valentine boxes on the projected mean order quantity of 330 boxes per large retail store.

In the following box we present (without proof) a formula that tells us the sample size needed to make both the probability of a Type I error and the probability of a Type II error as small as we wish:

Calculating the Sample Size Needed to Achieve Specified Values of α and β

Assume that the sampled population is normally distributed, or that a large sample will be taken. Consider testing H0: μ = μ0 versus one of Ha: μ > μ0, Ha: μ < μ0, or Ha: μ ≠ μ0. Then, in order to make the probability of a Type I error equal to α and the probability of a Type II error corresponding to the alternative value μa of μ equal to β, we should take a sample of size

n = (z* + zβ)²σ²/(μ0 − μa)²
Here z* equals zα if the alternative hypothesis is one-sided (μ > μ0 or μ < μ0), and z* equals zα/2 if the alternative hypothesis is two-sided (μ ≠ μ0). Also, zβ is the point on the scale of the standard normal curve that gives a right-hand tail area equal to β.

EXAMPLE 9.10

Again consider the coffee fill example and suppose we wish to test H0: μ ≥ 3 (or μ = 3) versus Ha: μ < 3. If we wish α to be .05 and β for the alternative value μa = 2.995 of μ to be .05, we should take a sample of size

n = (z* + zβ)²σ²/(μ0 − μa)² = (1.645 + 1.645)²(.0147)²/(3 − 2.995)² = 93.56, which we round up to 94
Here, z* = zα = z.05 = 1.645 because the alternative hypothesis (μ < 3) is one-sided, and zβ = z.05 = 1.645.

Although we have set both α and β equal to the same value in the coffee fill situation, it is not necessary for α and β to be equal. As an example, again consider the Valentine’s Day chocolate case, in which we are testing H0: μ = 330 versus Ha: μ ≠ 330. Suppose that the candy company decides that failing to reject H0: μ = 330 when μ differs from 330 by as many as 15 valentine boxes (that is, when μ is 315 or 345) is a serious Type II error. Furthermore, suppose that it is also decided that this Type II error is more serious than a Type I error. Therefore, α will be set equal to .05 and β for the alternative value μa = 315 (or μa = 345) of μ will be set equal to .01. It follows that the candy company should take a sample of size

n = (z* + zβ)²σ²/(μ0 − μa)² = (1.96 + 2.326)²(40)²/(330 − 315)² = 130.63, which we round up to 131
Here, z* = zα/2 = z.05/2 = z.025 = 1.96 because the alternative hypothesis (μ ≠ 330) is two-sided, and zβ = z.01 = 2.326 (see the bottom row of the t table on page 865).
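Both sample-size calculations can be checked numerically (a sketch of mine; the result is rounded up because n must be a whole number):

```python
from math import ceil

def sample_size(mu0, mu_a, sigma, z_star, z_beta):
    """n = (z* + z_beta)^2 * sigma^2 / (mu0 - mu_a)^2, rounded up."""
    return ceil(((z_star + z_beta) * sigma / (mu0 - mu_a)) ** 2)

# Coffee fill: alpha = beta = .05, so z* = z_beta = z.05 = 1.645 (one-sided test).
print(sample_size(3, 2.995, .0147, 1.645, 1.645))    # 94
# Valentine's Day: alpha = .05 (two-sided, z* = 1.96), beta = .01 (z.01 = 2.326).
print(sample_size(330, 315, 40, 1.96, 2.326))        # 131
```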

To conclude this section, we point out that the methods we have presented for calculating the probability of a Type II error and determining sample size can be extended to other hypothesis tests that utilize the normal distribution. We will not, however, present the extensions in this book.

Exercises for Section 9.5

CONCEPTS

9.76 We usually take action on the basis of having rejected the null hypothesis. When we do this, we know the chances that the action has been taken erroneously because we have prespecified α, the probability of rejecting a true null hypothesis. Here, it is obviously important to know (prespecify) α, the probability of a Type I error. When is it important to know the probability of a Type II error? Explain why.

9.77 Explain why we are able to compute many different values of β, the probability of a Type II error, for a single hypothesis test.

9.78 Explain what is meant by

a A serious Type II error.

b The power of a statistical test.

9.79 In general, do we want the power corresponding to a serious Type II error to be near 0 or near 1? Explain.

METHODS AND APPLICATIONS

9.80 Again consider the Consolidated Power waste water situation. Remember that the power plant will be shut down and corrective action will be taken on the cooling system if the null hypothesis H0: μ ≤ 60 is rejected in favor of Ha: μ > 60. In this exercise we calculate probabilities of various Type II errors in the context of this situation.

a Recall that Consolidated Power’s hypothesis test is based on a sample of n = 100 temperature readings and assume that σ equals 2. If the power company sets α = .025, calculate the probability of a Type II error for each of the following alternative values of μ: 60.1, 60.2, 60.3, 60.4, 60.5, 60.6, 60.7, 60.8, 60.9, 61.

b If we want the probability of making a Type II error when μ equals 60.5 to be very small, is Consolidated Power’s hypothesis test adequate? Explain why or why not. If not, and if we wish to maintain the value of α at .025, what must be done?

c The power curve for a statistical test is a plot of the power = 1 − β on the vertical axis versus values of μ that make the null hypothesis false on the horizontal axis. Plot the power curve for Consolidated Power’s test of H0: μ ≤ 60 versus Ha: μ > 60 by plotting power = 1 − β for each of the alternative values of μ in part a. What happens to the power of the test as the alternative value of μ moves away from 60?

9.81 Again consider the automobile parts supplier situation. Remember that a problem-solving team will be assigned to rectify the process producing the cylindrical engine parts if the null hypothesis H0: μ = 3 is rejected in favor of Ha: μ ≠ 3. In this exercise we calculate probabilities of various Type II errors in the context of this situation.

a Suppose that the parts supplier’s hypothesis test is based on a sample of n = 100 diameters and that σ equals .023. If the parts supplier sets α = .05, calculate the probability of a Type II error for each of the following alternative values of μ: 2.990, 2.995, 3.005, 3.010.

b If we want the probabilities of making a Type II error when μ equals 2.995 and when μ equals 3.005 to both be very small, is the parts supplier’s hypothesis test adequate? Explain why or why not. If not, and if we wish to maintain the value of α at .05, what must be done?

c Plot the power of the test versus the alternative values of μ in part a. What happens to the power of the test as the alternative value of μ moves away from 3?

9.82 In the Consolidated Power hypothesis test of H0: μ ≤ 60 versus Ha: μ > 60 (as discussed in Exercise 9.80) find the sample size needed to make the probability of a Type I error equal to .025 and the probability of a Type II error corresponding to the alternative value μa = 60.5 equal to .025. Here, assume σ equals 2.

9.83 In the automobile parts supplier’s hypothesis test of H0: μ = 3 versus Ha: μ ≠ 3 (as discussed in Exercise 9.81) find the sample size needed to make the probability of a Type I error equal to .05 and the probability of a Type II error corresponding to the alternative value μa = 3.005 equal to .05. Here, assume σ equals .023.

9.6: The Chi-Square Distribution (Optional)

Sometimes we can make statistical inferences by using the
chi-square distribution.
The probability curve of the χ2 (pronounced chi-square) distribution is skewed to the right. Moreover, the exact shape of this probability curve depends on a parameter that is called the number of degrees of freedom (denoted df). Figure 9.14 illustrates chi-square distributions having 2, 5, and 10 degrees of freedom.

Figure 9.14: Chi-Square Distributions with 2, 5, and 10 Degrees of Freedom


In order to use the chi-square distribution, we employ a chi-square point, which is denoted χ²α. As illustrated in the upper portion of Figure 9.15, χ²α is the point on the horizontal axis under the curve of the chi-square distribution that gives a right-hand tail area equal to α. The value of χ²α in a particular situation depends on the right-hand tail area α and the number of degrees of freedom (df) of the chi-square distribution. Values of χ²α are tabulated in a chi-square table. Such a table is given in Table A.17 of Appendix A (pages 877–878); a portion of this table is reproduced as Table 9.4. In the chi-square table, the rows correspond to the appropriate number of degrees of freedom (values of which are listed down the right side of the table), while the columns designate the right-hand tail area α. For example, suppose we wish to find the chi-square point that gives a right-hand tail area of .05 under a chi-square curve having 5 degrees of freedom. To do this, we look in Table 9.4 at the row labeled 5 and the column labeled χ².05. We find that this point is χ².05 = 11.0705 (see the lower portion of Figure 9.15).
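If a printed table is not at hand, a chi-square point can be computed numerically. The following stdlib-only Python sketch (my own implementation, not part of the textbook) evaluates the chi-square cdf by a power series for the regularized incomplete gamma function and inverts it by bisection; it reproduces the 5-degree-of-freedom point 11.0705:

```python
import math

def chi2_cdf(x, df):
    # P(df/2, x/2): regularized lower incomplete gamma via its power series.
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-12:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_point(alpha, df):
    # Point giving right-hand tail area alpha: invert the cdf by bisection.
    lo, hi = 0.0, df + 100.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Chi-square point with right-hand tail area .05 and 5 degrees of freedom:
print(round(chi2_point(.05, 5), 4))   # 11.0705
```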

Figure 9.15: Chi-Square Points

Table 9.4: A Portion of the Chi-Square Table

9.7: Statistical Inference for a Population Variance (Optional)

A vital part of a V6 automobile engine is the engine camshaft. As the camshaft turns, parts of the camshaft make repeated contact with engine lifters and thus must have the appropriate hardness to wear properly. To harden the camshaft, a heat treatment process is used, and a hardened layer is produced on the surface of the camshaft. The depth of the layer is called the hardness depth of the camshaft. Suppose that an automaker knows that the mean and the variance of the camshaft hardness depths produced by its current heat treatment process are, respectively, 4.5 mm and .2209 mm². To reduce the variance of the camshaft hardness depths, a new heat treatment process is designed, and a random sample of n = 30 camshaft hardness depths produced by using the new process has a mean of x̄ and a variance of s² = .0885. In order to attempt to show that the variance, σ², of the population of all camshaft hardness depths that would be produced by using the new process is less than .2209, we can use the following result:

Statistical Inference for a Population Variance

Suppose that s² is the variance of a sample of n measurements randomly selected from a normally distributed population having variance σ². The sampling distribution of the statistic (n − 1)s²/σ² is a chi-square distribution having n − 1 degrees of freedom. This implies that

1 A 100(1 − α) percent confidence interval for σ² is

[(n − 1)s²/χ²α/2 , (n − 1)s²/χ²1−(α/2)]

Here χ²α/2 and χ²1−(α/2) are the points under the curve of the chi-square distribution having n − 1 degrees of freedom that give right-hand tail areas of, respectively, α/2 and 1 − (α/2).

2 We can test H0: σ² = σ0² by using the test statistic

χ² = (n − 1)s²/σ0²

Specifically, if we set the probability of a Type I error equal to α, then we can reject H0 in favor of

a Ha: σ² > σ0² if χ² > χ²α

b Ha: σ² < σ0² if χ² < χ²1−α

c Ha: σ² ≠ σ0² if χ² > χ²α/2 or χ² < χ²1−(α/2)

Here χ²α, χ²1−α, χ²α/2, and χ²1−(α/2) are based on n − 1 degrees of freedom.

The assumption that the sampled population is normally distributed must hold fairly closely for the statistical inferences just given about σ² to be valid. When we check this assumption in the camshaft situation, we find that a histogram (not given here) of the sample of n = 30 hardness depths is bell-shaped and symmetrical. In order to compute a 95 percent confidence interval for σ², we need the points χ².025 and χ².975. Table A.17 (pages 877 and 878) tells us that these points—based on n − 1 = 29 degrees of freedom—are χ².025 = 45.7222 and χ².975 = 16.0471 (see Figure 9.16). It follows that a 95 percent confidence interval for σ² is

[(n − 1)s²/χ².025 , (n − 1)s²/χ².975] = [(29)(.0885)/45.7222 , (29)(.0885)/16.0471] = [.0561, .1599]
Figure 9.16: The Chi-Square Points χ².025 and χ².975

This interval provides strong evidence that σ2 is less than .2209.

If we wish to use a hypothesis test, we test the null hypothesis H0: σ² = .2209 versus the alternative hypothesis Ha: σ² < .2209. If H0 can be rejected in favor of Ha at the .05 level of significance, we will conclude that the new process has reduced the variance of the camshaft hardness depths. Since the histogram of the sample of n = 30 hardness depths is bell shaped and symmetrical, the appropriate test statistic is given in the summary box. Furthermore, since Ha: σ² < .2209 is of the form Ha: σ² < σ0², we should reject H0: σ² = .2209 if the value of χ² is less than the critical value χ²1−α = χ².95 = 17.7083. Here χ².95 is based on n − 1 = 30 − 1 = 29 degrees of freedom, and this critical value is illustrated in Figure 9.17. Since the sample variance is s² = .0885, the value of the test statistic is

χ² = (n − 1)s²/σ0² = (29)(.0885)/.2209 = 11.6184
Figure 9.17: Testing H0: σ2 = .2209 versus Ha: σ2 < .2209 by Setting α = .05

Since χ² = 11.6184 is less than χ².95 = 17.7083, we reject H0: σ² = .2209 in favor of Ha: σ² < .2209. That is, we conclude (at an α of .05) that the new process has reduced the variance of the camshaft hardness depths.
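The camshaft interval and test can be reproduced with a short Python sketch (mine; the chi-square points are the standard 29-degree-of-freedom values from Table A.17):

```python
# Camshaft hardness depths: n = 30, sample variance s^2 = .0885,
# testing against the old process variance sigma0^2 = .2209.
n, s2, sigma0_sq = 30, .0885, .2209

# 29-df chi-square points from a standard table (Table A.17 in the text):
chi2_025, chi2_975, chi2_95 = 45.7222, 16.0471, 17.7083

# 95 percent confidence interval for sigma^2:
ci = ((n - 1) * s2 / chi2_025, (n - 1) * s2 / chi2_975)
print(round(ci[0], 4), round(ci[1], 4))   # 0.0561 0.1599

# Test H0: sigma^2 = .2209 versus Ha: sigma^2 < .2209 at alpha = .05:
chi2_stat = (n - 1) * s2 / sigma0_sq
print(round(chi2_stat, 4))                # 11.6184
print(chi2_stat < chi2_95)                # True -> reject H0
```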

Exercises for Sections 9.6 and 9.7

CONCEPTS

9.84 What assumption must hold to use the chi-square distribution to make statistical inferences about a population variance?

9.85 Define the meaning of the chi-square points χ²α/2 and χ²1−(α/2). Hint: Draw a picture.

9.86 Give an example of a situation in which we might wish to compute a confidence interval for σ2.

METHODS AND APPLICATIONS

Exercises 9.87 through 9.90 relate to the following situation: Consider an engine parts supplier and suppose the supplier has determined that the variance of the population of all cylindrical engine part outside diameters produced by the current machine is approximately equal to, but no less than, .0005. To reduce this variance, a new machine is designed, and a random sample of n = 25 outside diameters produced by this new machine has a mean of and a variance of s2 = .00014. Assume the population of all cylindrical engine part outside diameters that would be produced by the new machine is normally distributed, and let σ2 denote the variance of this population.

9.87 Find a 95 percent confidence interval for σ2.

9.88 Test H0: σ2 = .0005 versus Ha: σ2 < .0005 by setting α = .05.

9.89 Find a 99 percent confidence interval for σ2.

9.90 Test H0: σ2 = .0005 versus Ha: σ2 ≠ .0005 by setting α = .01.

Chapter Summary

We began this chapter by learning about the two hypotheses that make up the structure of a hypothesis test. The null hypothesis is the statement being tested. Usually it represents the status quo and it is not rejected unless there is convincing sample evidence that it is false. The alternative, or research, hypothesis is a statement that is accepted only if there is convincing sample evidence that it is true and that the null hypothesis is false. In some situations, the alternative hypothesis is a condition for which we need to attempt to find supportive evidence. We also learned that two types of errors can be made in a hypothesis test. A Type I error occurs when we reject a true null hypothesis, and a Type II error occurs when we do not reject a false null hypothesis.

We studied two commonly used ways to conduct a hypothesis test. The first involves comparing the value of a test statistic with what is called a critical value, and the second employs what is called a p-value. The p-value measures the weight of evidence against the null hypothesis. The smaller the p-value, the more we doubt the null hypothesis. We learned that, if we can reject the null hypothesis with the probability of a Type I error equal to α, then we say that the test result has statistical significance at the α level. However, we also learned that, even if the result of a hypothesis test tells us that statistical significance exists, we must carefully assess whether the result is practically important. One good way to do this is to use a point estimate and confidence interval for the parameter of interest.
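The two approaches described above always agree, and the agreement can be sketched with a small z-test example. The sample values below are illustrative only, not taken from the text; only the Python standard library is used.

```python
from statistics import NormalDist  # standard-normal cdf/quantiles, Python 3.8+

# Hypothetical right-tailed z test: H0: mu <= 50 versus Ha: mu > 50.
# The sample numbers here are made up for illustration.
x_bar, mu0, sigma, n = 50.6, 50.0, 2.0, 100
alpha = 0.05

# Test statistic
z = (x_bar - mu0) / (sigma / n ** 0.5)

# Approach 1: compare z with the critical value z_alpha
z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 1.645
reject_by_critical_value = z > z_alpha

# Approach 2: compute the p-value, the probability (computed assuming H0
# is true) of a test statistic at least as extreme as the one observed
p_value = 1 - NormalDist().cdf(z)
reject_by_p_value = p_value < alpha

print(round(z, 2), round(p_value, 4),
      reject_by_critical_value, reject_by_p_value)
```

Both decisions are identical for any α: rejecting because z exceeds the critical value is the same event as the p-value falling below α.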

The specific hypothesis tests we covered in this chapter all dealt with a hypothesis about one population parameter. First, we studied a test about a population mean that is based on the assumption that the population standard deviation σ is known. This test employs the normal distribution. Second, we studied a test about a population mean that assumes that σ is unknown. We learned that this test is based on the t distribution. Figure 9.18 presents a flowchart summarizing how to select an appropriate test statistic to test a hypothesis about a population mean. Then we presented a test about a population proportion that is based on the normal distribution. Next (in optional Section 9.5) we studied Type II error probabilities, and we showed how we can find the sample size needed to make both the probability of a Type I error and the probability of a serious Type II error as small as we wish. We concluded this chapter by discussing (in optional Sections 9.6 and 9.7) the chi-square distribution and its use in making statistical inferences about a population variance.

Figure 9.18: Selecting an Appropriate Test Statistic to Test a Hypothesis about a Population Mean

Glossary of Terms

alternative (research) hypothesis:

A statement that will be accepted only if there is convincing sample evidence that it is true. Sometimes it is a condition for which we need to attempt to find supportive evidence. (page 347)

chi-square distribution:

A useful continuous probability distribution. Its probability curve is skewed to the right, and the exact shape of the probability curve depends on the number of degrees of freedom associated with the curve. (page 382)

critical value:

The value of the test statistic is compared with a critical value in order to decide whether the null hypothesis can be rejected. (pages 354, 358, 360)

greater than alternative:

An alternative hypothesis that is stated as a greater than ( > ) inequality. (page 349)

less than alternative:

An alternative hypothesis that is stated as a less than ( < ) inequality. (page 349)

not equal to alternative:

An alternative hypothesis that is stated as a not equal to ( ≠ ) inequality. (page 349)

null hypothesis:

The statement being tested in a hypothesis test. It usually represents the status quo and it is not rejected unless there is convincing sample evidence that it is false. (page 347)

one-sided alternative hypothesis:

An alternative hypothesis that is stated as either a greater than ( > ) or a less than ( < ) inequality. (page 349)

power (of a statistical test):

The probability of rejecting the null hypothesis when it is false. (page 379)


p-value (probability value):

The probability, computed assuming that the null hypothesis is true, of observing a value of the test statistic that is at least as extreme as the value actually computed from the sample data. The p-value measures how much doubt is cast on the null hypothesis by the sample data. The smaller the p-value, the more we doubt the null hypothesis. (pages 355, 358, 360, 362)

statistical significance at the α level:

When we can reject the null hypothesis by setting the probability of a Type I error equal to α. (page 354)

test statistic:

A statistic computed from sample data in a hypothesis test. It is either compared with a critical value or used to compute a p-value. (page 349)

two-sided alternative hypothesis:

An alternative hypothesis that is stated as a not equal to ( ≠ ) inequality. (page 349)

Type I error:

Rejecting a true null hypothesis. (page 350)

Type II error:

Failing to reject a false null hypothesis. (page 350)

Important Formulas and Tests

Hypothesis Testing steps: page 357

A hypothesis test about a population mean (σ known): page 361

A t test about a population mean (σ unknown): page 366

A large sample hypothesis test about a population proportion: page 371

Calculating the probability of a Type II error: page 379

Sample size determination to achieve specified values of α and β: page 380

Statistical inference about a population variance: page 383

Supplementary Exercises

9.91 The auditor for a large corporation routinely monitors cash disbursements. As part of this process, the auditor examines check request forms to determine whether they have been properly approved. Improper approval can occur in several ways. For instance, the check may have no approval, the check request might be missing, the approval might be written by an unauthorized person, or the dollar limit of the authorizing person might be exceeded.

a Last year the corporation experienced a 5 percent improper check request approval rate. Since this was considered unacceptable, efforts were made to reduce the rate of improper approvals. Letting p be the proportion of all checks that are now improperly approved, set up the null and alternative hypotheses needed to attempt to demonstrate that the current rate of improper approvals is lower than last year’s rate of 5 percent.

b Suppose that the auditor selects a random sample of 625 checks that have been approved in the last month. The auditor finds that 18 of these 625 checks have been improperly approved. Use critical values and this sample information to test the hypotheses you set up in part a at the .10, .05, .01, and .001 levels of significance. How much evidence is there that the rate of improper approvals has been reduced below last year’s 5 percent rate?

c Find the p-value for the test of part b. Use the p-value to carry out the test by setting α equal to .10, .05, .01, and .001. Interpret your results.

d Suppose the corporation incurs a $10 cost to detect and correct an improperly approved check. If the corporation disburses at least 2 million checks per year, does the observed reduction of the rate of improper approvals seem to have practical importance? Explain your opinion.
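Parts b and c of Exercise 9.91 can be sketched with the standard large-sample z test for a proportion, using only the Python standard library; the data (n = 625, 18 improper approvals, p₀ = .05) come from the exercise.

```python
from statistics import NormalDist  # Python 3.8+

# Exercise 9.91: H0: p = .05 versus Ha: p < .05 (left-tailed test)
n, x, p0 = 625, 18, 0.05
p_hat = x / n  # = .0288

# Large-sample z statistic; n*p0 = 31.25 and n*(1 - p0) = 593.75,
# so the normal approximation is reasonable
z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5

# Left-tailed p-value
p_value = NormalDist().cdf(z)

# Compare against each significance level from part c
for alpha in (0.10, 0.05, 0.01, 0.001):
    print(alpha, "reject H0" if p_value < alpha else "do not reject H0")

print(round(z, 2), round(p_value, 4))
```

The p-value (roughly .0075) is below .10, .05, and .01 but not .001, so there is very strong, though not extremely strong, evidence that the improper approval rate has fallen below 5 percent.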

9.92 THE CIGARETTE ADVERTISEMENT CASE ModelAge

Recall that the cigarette industry requires that models in cigarette ads must appear to be at least 25 years old. Also recall that a sample of 50 people is randomly selected at a shopping mall. Each person in the sample is shown a “typical cigarette ad” and is asked to estimate the age of the model in the ad.

a Let μ be the mean perceived age estimate for all viewers of the ad, and suppose we consider the industry requirement to be met if μ is at least 25. Set up the null and alternative hypotheses needed to attempt to show that the industry requirement is not being met.

b Suppose that a random sample of 50 perceived age estimates gives a mean of years and a standard deviation of s = 3.596 years. Use these sample data and critical values to test the hypotheses of part a at the .10, .05, .01, and .001 levels of significance.

c How much evidence do we have that the industry requirement is not being met?

d Do you think that this result has practical importance? Explain your opinion.

9.93 THE CIGARETTE ADVERTISEMENT CASE ModelAge

Consider the cigarette ad situation discussed in Exercise 9.92. Using the sample information given in that exercise, the p-value for testing H0 versus Ha can be calculated to be .0057.

a Determine whether H0 would be rejected at each of α = .10, α = .05, α = .01, and α = .001.

b Describe how much evidence we have that the industry requirement is not being met.

9.94 In an article in the Journal of Retailing, Kumar, Kerwin, and Pereira study factors affecting merger and acquisition activity in retailing. As part of the study, the authors compare the characteristics of “target firms” (firms targeted for acquisition) and “bidder firms” (firms attempting to make acquisitions). Among the variables studied in the comparison were earnings per share, debt-to-equity ratio, growth rate of sales, market share, and extent of diversification.

a Let μ be the mean growth rate of sales for all target firms (firms that have been targeted for acquisition in the last five years and that have not bid on other firms), and assume growth rates are approximately normally distributed. Furthermore, suppose a random sample of 25 target firms yields a sample mean sales growth rate of with a standard deviation of s = 0.12. Use critical values and this sample information to test H0: μ ≤ .10 versus Ha: μ > .10 by setting α equal to .10, .05, .01, and .001. How much evidence is there that the mean growth rate of sales for target firms exceeds .10 (that is, exceeds 10 percent)?

b Now let μ be the mean growth rate of sales for all firms that are bidders (firms that have bid to acquire at least one other firm in the last five years), and again assume growth rates are approximately normally distributed. Furthermore, suppose a random sample of 25 bidders yields a sample mean sales growth rate of with a standard deviation of s = 0.09. Use critical values and this sample information to test H0: μ ≤ .10 versus Ha: μ > .10 by setting α equal to .10, .05, .01, and .001. How much evidence is there that the mean growth rate of sales for bidders exceeds .10 (that is, exceeds 10 percent)?

9.95 A consumer electronics firm has developed a new type of remote control button that is designed to operate longer before becoming intermittent. A random sample of 35 of the new buttons is selected and each is tested in continuous operation until becoming intermittent. The resulting lifetimes are found to have a sample mean of hours and a sample standard deviation of s = 110.8.

a Independent tests reveal that the mean lifetime (in continuous operation) of the best remote control button on the market is 1,200 hours. Letting μ be the mean lifetime of the population of all new remote control buttons that will or could potentially be produced, set up the null and alternative hypotheses needed to attempt to provide evidence that the new button’s mean lifetime exceeds the mean lifetime of the best remote button currently on the market.

b Using the previously given sample results, use critical values to test the hypotheses you set up in part a by setting α equal to .10, .05, .01, and .001. What do you conclude for each value of α?

c Suppose that and s = 110.8 had been obtained by testing a sample of 100 buttons. Use critical values to test the hypotheses you set up in part a by setting α equal to .10, .05, .01, and .001. Which sample (the sample of 35 or the sample of 100) gives a more statistically significant result? That is, which sample provides stronger evidence that Ha is true?

d If we define practical importance to mean that μ exceeds 1,200 by an amount that would be clearly noticeable to most consumers, do you think that the result has practical importance? Explain why the samples of 35 and 100 both indicate the same degree of practical importance.

e Suppose that further research and development effort improves the new remote control button and that a random sample of 35 buttons gives hours and s = 102.8 hours. Test your hypotheses of part a by setting α equal to .10, .05, .01, and .001.

(1) Do we have a highly statistically significant result? Explain.

(2) Do you think we have a practically important result? Explain.

9.96 Again consider the remote control button lifetime situation discussed in Exercise 9.95. Using the sample information given in the introduction to Exercise 9.95, the p-value for testing H0 versus Ha can be calculated to be .0174.

a Determine whether H0 would be rejected at each of α = .10, α = .05, α = .01, and α = .001.

b Describe how much evidence we have that the new button’s mean lifetime exceeds the mean lifetime of the best remote button currently on the market.

9.97 Calculate and use an appropriate 95 percent confidence interval to help evaluate practical importance as it relates to the hypothesis test in each of the following situations discussed in previous review exercises. Explain what you think each confidence interval says about practical importance.

a The check approval situation of Exercise 9.91.

b The cigarette ad situation of Exercise 9.92.

c The remote control button situation of Exercise 9.95 a, c, and e.

9.98 Several industries located along the Ohio River discharge a toxic substance called carbon tetrachloride into the river. The state Environmental Protection Agency monitors the amount of carbon tetrachloride pollution in the river. Specifically, the agency requires that the carbon tetrachloride contamination must average no more than 10 parts per million. In order to monitor the carbon tetrachloride contamination in the river, the agency takes a daily sample of 100 pollution readings at a specified location. If the mean carbon tetrachloride reading for this sample casts substantial doubt on the hypothesis that the average amount of carbon tetrachloride contamination in the river is at most 10 parts per million, the agency must issue a shutdown order. In the event of such a shutdown order, industrial plants along the river must be closed until the carbon tetrachloride contamination is reduced to a more acceptable level. Assume that the state Environmental Protection Agency decides to issue a shutdown order if a sample of 100 pollution readings implies that H0: μ ≤ 10 can be rejected in favor of Ha: μ > 10 by setting α = .01. If σ equals 2, calculate the probability of a Type II error for each of the following alternative values of μ: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, and 11.0.
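Each β calculation in Exercise 9.98 follows the same pattern: find the rejection cutoff for the sample mean under H0, then find the probability that the sample mean falls below that cutoff when the true mean is the alternative value. A standard-library sketch using the exercise's numbers (n = 100, σ = 2, α = .01):

```python
from statistics import NormalDist  # Python 3.8+

mu0, sigma, n, alpha = 10.0, 2.0, 100, 0.01
std_error = sigma / n ** 0.5  # = 0.2

# Reject H0: mu <= 10 in favor of Ha: mu > 10 when x_bar exceeds this cutoff
z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 2.326
cutoff = mu0 + z_alpha * std_error          # about 10.465

def beta(mu):
    """Type II error probability: P(x_bar <= cutoff) when the true mean is mu."""
    return NormalDist(mu, std_error).cdf(cutoff)

# A few of the alternative values asked for in the exercise
for mu in (10.1, 10.5, 11.0):
    print(mu, round(beta(mu), 4))
```

As the true mean moves farther above 10, β shrinks rapidly: a small violation (μ = 10.1) is very likely to go undetected, while a large one (μ = 11.0) is almost always caught.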

9.99 THE INVESTMENT CASE InvestRet

Suppose that random samples of 50 returns for each of the following investment classes give the indicated sample mean and sample standard deviation:

a For each investment class, set up the null and alternative hypotheses needed to test whether the current mean return differs from the historical (1970 to 1994) mean return given in Table 3.11 (page 159).

b Test each hypothesis you set up in part a at the .05 level of significance. What do you conclude? For which investment classes does the current mean return differ from the historical mean?

9.100 THE UNITED KINGDOM INSURANCE CASE

Assume that the U.K. insurance survey is based on 1,000 randomly selected United Kingdom households and that 640 of these households spent money to buy life insurance in 1993.

a If p denotes the proportion of all U.K. households that spent money to buy life insurance in 1993, set up the null and alternative hypotheses needed to attempt to justify the claim that more than 60 percent of U.K. households spent money to buy life insurance in 1993.

b Test the hypotheses you set up in part a by setting α = .10, .05, .01, and .001. How much evidence is there that more than 60 percent of U.K. households spent money to buy life insurance in 1993?

9.101 How safe are child car seats? Consumer Reports (May 2005) tested the safety of child car seats in 30 mph crashes. They found “slim safety margins” for some child car seats. Suppose that Consumer Reports simulates the safety of the market-leading child car seat. Their test consists of placing the maximum claimed weight in the car seat and simulating crashes at higher and higher miles per hour until a problem occurs. The following data identify the speed at which a problem with the car seat first appeared (such as a strap breaking, the seat shell cracking, the strap adjuster breaking, or the seat detaching from its base): 31.0, 29.4, 30.4, 28.9, 29.7, 30.1, 32.3, 31.7, 35.4, 29.1, 31.2, 30.2. Let μ denote the true mean speed at which a problem with the car seat first appears. The following MINITAB output gives the results of using the sample data to test H0: μ = 30 versus Ha: μ > 30. CarSeat

How much evidence is there that μ exceeds 30 mph?

9.102 Consumer Reports (January 2005) indicates that profit margins on extended warranties are much greater than on the purchase of most products.5 In this exercise we consider a major electronics retailer that wishes to increase the proportion of customers who buy extended warranties on digital cameras. Historically, 20 percent of digital camera customers have purchased the retailer’s extended warranty. To increase this percentage, the retailer has decided to offer a new warranty that is less expensive and more comprehensive. Suppose that three months after starting to offer the new warranty, a random sample of 500 customer sales invoices shows that 152 out of 500 digital camera customers purchased the new warranty. Letting p denote the proportion of all digital camera customers who have purchased the new warranty, calculate the p-value for testing H0: p = .20 versus Ha: p > .20. How much evidence is there that p exceeds .20? Does the difference between and .2 seem to be practically important? Explain your opinion.

9.103 Fortune magazine has periodically reported on the rise of fees and expenses charged by stock funds.

a Suppose that 10 years ago the average annual expense for stock funds was 1.19 percent. Let μ be the current mean annual expense for all stock funds, and assume that stock fund annual expenses are approximately normally distributed. If a random sample of 12 stock funds gives a sample mean annual expense of with a standard deviation of s = .31%, use critical values and this sample information to test H0: μ ≤ 1.19% versus Ha: μ > 1.19% by setting α equal to .10, .05, .01, and .001. How much evidence is there that the current mean annual expense for stock funds exceeds the average of 10 years ago?

b Do you think that the result in part a has practical importance? Explain your opinion.

9.104: Internet Exercise

Are American consumers comfortable using their credit cards to make purchases over the Internet? Suppose that a noted authority suggests that credit cards will be firmly established on the Internet once the 80 percent barrier is broken; that is, as soon as more than 80 percent of those who make purchases over the Internet are willing to use a credit card to pay for their transactions. A recent Gallup Poll (story, survey results, and analysis can be found at http://www.gallup.com/poll/releases/pr000223.asp) found that, out of n = 302 Internet purchasers surveyed, 267 have paid for Internet purchases using a credit card. Based on the results of the Gallup survey, is there sufficient evidence to conclude that the proportion of Internet purchasers willing to use a credit card now exceeds 0.80? Set up the appropriate null and alternative hypotheses, test at the 0.05 and 0.01 levels of significance, and calculate a p-value for your test.

Go to the Gallup Organization website (http://www.gallup.com) and find the index of recent poll results (http://www.gallup.com/poll/index.asp). Select an interesting current poll and prepare a brief written summary of the poll or some aspect thereof. Include a statistical test for the significance of a proportion (you may have to make up your own value for the hypothesized proportion p0) as part of your report. For example, you might select a political poll and test whether a particular candidate is preferred by a majority of voters (p > 0.50).

Appendix 9.1: One-Sample Hypothesis Testing Using MINITAB

The first instruction block in this section begins by describing the entry of data into the MINITAB data window. Alternatively, the data may be loaded directly from the data disk included with the text. The appropriate data file name is given at the top of the instruction block. Please refer to Appendix 1.1 for further information about entering data, saving data, and printing results when using MINITAB.

Hypothesis test for a population mean in Figure 9.9(a) on page 368 (data file: CreditCd.MTW):

• In the Data window, enter the interest rate data from Table 9.3 (page 367) into a single column with variable name Rate.

• Select Stat: Basic Statistics : 1-Sample t.

• In the “1-Sample t (Test and Confidence Interval)” dialog box, select the “Samples in columns” option.

• Select the variable name Rate into the “Samples in columns” window.

• Place a checkmark in the “Perform hypothesis test” checkbox.

• Enter the hypothesized mean (here 18.8) into the “Hypothesized mean” window.

• Click the Options… button, select the desired alternative (in this case “less than”) from the Alternative drop-down menu, and click OK in the “1-Sample t-Options” dialog box.

• To produce a boxplot of the data with a graphical representation of the hypothesis test, click the Graphs… button in the “1-Sample t (Test and Confidence Interval)” dialog box, check the “Boxplot of data” checkbox, and click OK in the “1-Sample t—Graphs” dialog box.

• Click OK in the “1-Sample t (Test and Confidence Interval)” dialog box.

• The confidence interval is given in the Session window, and the boxplot is displayed in a graphics window.

A “1-Sample Z” test is also available in MINITAB under Basic Statistics. It requires a user-specified value of the population standard deviation, which is rarely known.

Hypothesis test for a population proportion in Exercise 9.72 on page 375:

• Select Stat: Basic Statistics : 1 Proportion

• In the “1 Proportion (Test and Confidence Interval)” dialog box, select the “Summarized data” option.

• Enter the sample number of successes (here equal to 146) into the “Number of events” window.

• Enter the sample size (here equal to 400) into the “Number of trials” window.

• Place a checkmark in the “Perform hypothesis test” checkbox.

• Enter the hypothesized proportion (here equal to 0.25) into the “Hypothesized proportion” window.

• Click on the Options… button.

• In the “1 Proportion—Options” dialog box, select the desired alternative (in this case “greater than”) from the Alternative drop-down menu.

• Place a checkmark in the “Use test and interval based on normal distribution” checkbox.

• Click OK in the “1 Proportion—Options” dialog box and click OK in the “1 Proportion (Test and Confidence Interval)” dialog box.

• The hypothesis test results are given in the Session window.

Appendix 9.2: One-Sample Hypothesis Testing Using Excel

The instruction block in this section begins by describing the entry of data into an Excel spreadsheet. Alternatively, the data may be loaded directly from the data disk included with the text. The appropriate data file name is given at the top of the instruction block. Please refer to Appendix 1.2 for further information about entering data, saving data, and printing results.

Hypothesis test for a population mean in Figure 9.9(b) on page 368 (data file: CreditCd.xlsx):

The Data Analysis ToolPak in Excel does not explicitly provide for one-sample tests of hypotheses. A one-sample test can be conducted using the Descriptive Statistics component of the Analysis ToolPak and a few additional computations using Excel.

Descriptive statistics:

• Enter the interest rate data from Table 9.3 (page 367) into cells A2.A16 with the label Rate in cell A1.

• Select Data: Data Analysis : Descriptive Statistics.

• Click OK in the Data Analysis dialog box.

• In the Descriptive Statistics dialog box, enter A1.A16 into the Input Range box.

• Place a checkmark in the “Labels in first row” check box.

• Under output options, select “New Worksheet Ply” to have the output placed in a new worksheet and enter the name Output for the new worksheet.

• Place a checkmark in the Summary Statistics checkbox.

• Click OK in the Descriptive Statistics dialog box.

The resulting block of descriptive statistics is displayed in the Output worksheet and the entries needed to carry out the test computations have been entered into the range D3.E6.

Computation of the test statistic and p-value:

• In cell E7, use the formula = (E3 − E4)/(E5/SQRT(E6)) to compute the test statistic t (= −4.970).

• Click on cell E8 and then select the Insert Function button on the Excel toolbar.

• In the Insert Function dialog box, select Statistical from the “Or select a category:” menu, select TDIST from the “Select a function:” menu, and click OK in the Insert Function dialog box.

• In the TDIST Function Arguments dialog box, enter abs(E7) in the X window.

• Enter 14 in the Deg_freedom window.

• Enter 1 in the Tails window to select a one-tailed test.

• Click OK in the TDIST Function Arguments dialog box.

• The p-value related to the test will be placed in cell E8.

Appendix 9.3: One-Sample Hypothesis Testing Using MegaStat

The instructions in this section begin by describing the entry of data into an Excel worksheet. Alternatively, the data may be loaded directly from the data disk included with the text. The appropriate data file name is given at the top of each instruction block. Please refer to Appendix 1.2 for further information about entering data and saving and printing results in Excel. Please refer to Appendix 1.3 for more information about using MegaStat.

Hypothesis test for a population mean in Figure 9.10 on page 368 (data file: CreditCd.xlsx):

• Enter the interest rate data from Table 9.3 (page 367) into cells A2.A16 with the label Rate in cell A1.

• Select Add-Ins: MegaStat: Hypothesis Tests : Mean vs. Hypothesized Value

• In the “Hypothesis Test: Mean vs. Hypothesized Value” dialog box, click on “data input” and use the autoexpand feature to enter the range A1.A16 into the Input Range window.

• Enter the hypothesized value (here equal to 18.8) into the Hypothesized Mean window.

• Select the desired alternative (here “less than”) from the drop-down menu in the Alternative box.

• Click on t-test and click OK in the “Hypothesis Test: Mean vs. Hypothesized Value” dialog box.

• A hypothesis test employing summary data can be carried out by clicking on “summary data,” and by entering a range into the Input Range window that contains the following—label; sample mean; sample standard deviation; sample size n.

A z test can be carried out (in the unlikely event that the population standard deviation is known) by clicking on “z-test.”

Hypothesis test for a population proportion shown in Figure 9.11 in the electronic article surveillance situation on pages 373 and 374:

• Select Add-Ins: MegaStat: Hypothesis Tests : Proportion vs. Hypothesized Value

• In the “Hypothesis Test: Proportion vs. Hypothesized Value” dialog box, enter the hypothesized value (here equal to 0.05) into the “Hypothesized p” window.

• Enter the observed sample proportion (here equal to 0.16) into the “Observed p” window.

• Enter the sample size (here equal to 250) into the “n” window.

• Select the desired alternative (here “greater than”) from the drop-down menu in the Alternative box.

• Check the “Display confidence interval” checkbox (if desired), and select or type the appropriate level of confidence.

• Click OK in the “Hypothesis Test: Proportion vs. Hypothesized Value” dialog box.

Hypothesis test for a population variance in the camshaft situation of Section 9.7 on pages 383 and 384:

• Enter a label (in this case Depth) into cell A1, the sample variance (here equal to .0885) into cell A2, and the sample size (here equal to 30) into cell A3.

• Select Add-Ins: MegaStat: Hypothesis Tests : Chi-square Variance Test

• Click on “summary input.”

• Enter the range A1.A3 into the Input Range window—that is, enter the range containing the data label, the sample variance, and the sample size.

• Enter the hypothesized value (here equal to 0.2209) into the “Hypothesized variance” window.

• Select the desired alternative (in this case “less than”) from the drop-down menu in the Alternative box.

• Check the “Display confidence interval” checkbox (if desired) and select or type the appropriate level of confidence.

• Click OK in the “Chi-square Variance Test” dialog box.

• A chi-square variance test may be carried out using data input by entering the observed sample values into a column in the Excel worksheet, and by then using the autoexpand feature to enter the range containing the label and sample values into the Input Range window.

1 This case is based on conversations by the authors with several employees working for a leading producer of trash bags. For purposes of confidentiality, we have agreed to withhold the company’s name.

2 Thanks to Krogers of Oxford, Ohio, for helpful discussions concerning this case.

3 Some statisticians suggest using the more conservative rule that both np0 and n(1 − p0) must be at least 10.

4 Source: “Driving Organic Growth at Bank of America” Quality Progress (February 2005), pp. 23–27.

5 Consumer Reports, January 2005, page 51.

(Bowerman 346)

Bowerman, Bruce L. Business Statistics in Practice, 5th Edition. McGraw-Hill Learning Solutions, 2008.

CHAPTER 19: Decision Theory

Chapter Outline


19.1 Bayes’ Theorem

19.2 Introduction to Decision Theory

19.3 Decision Making Using Posterior Probabilities

19.4 Introduction to Utility Theory

Every day businesses and the people who run them face a myriad of decisions. For instance, a manufacturer might need to decide where to locate a new factory and might also need to decide how large the new facility should be. Or, an investor might decide where to invest money from among several possible investment choices. In this chapter we study some probabilistic methods that can help a decision maker to make intelligent decisions. In Section 19.1 we present Bayes’ Theorem, which is useful for updating probabilities on the basis of newly obtained information that may help in making a decision. In Section 19.2 we formally introduce decision theory. We discuss the elements of a decision problem, and we present strategies for making decisions when we face various levels of uncertainty. We also show how to construct a decision tree, which is a diagram that can help us analyze a decision problem, and we show how the concept of expected value can help us make decisions. In Section 19.3 we show how to use sample information to help make decisions, and we demonstrate how to assess the worth of sample information in order to decide whether the sample information should be obtained. We conclude this chapter with Section 19.4, which introduces using utility theory to help make decisions.

Many of this chapter’s concepts are presented in the context of

The Oil Drilling Case: An oil company uses decision theory to help to decide whether to drill for oil on a particular site. The company can perform a seismic experiment at the site to obtain information about the site’s potential, and the company uses decision theory to decide whether to drill based on the various possible survey results. In addition, decision theory is employed to determine whether the seismic experiment should be carried out.

19.1: Bayes’ Theorem

Sometimes we have an initial or prior probability that an event will occur. Then, based on new information, we revise the prior probability to what is called a posterior probability. This revision can be done by using a theorem called Bayes’ Theorem.

EXAMPLE 19.1

To illustrate Bayes’ Theorem, consider the event that a randomly selected American has the deadly disease AIDS. We let AIDS represent this event, and we let represent the event that the randomly selected American does not have AIDS. Since it is estimated that .6 percent of the American population has AIDS,

A diagnostic test is used to attempt to detect whether a person has AIDS. According to historical data, 99.9 percent of people with AIDS receive a positive (POS) result when this test is administered, while 1 percent of people who do not have AIDS receive a positive result. That is,

P(POS | AIDS) = .999  and  P(POS | AIDS^c) = .01

If we administer the test to a randomly selected American (who may or may not have AIDS) and the person receives a positive test result, what is the probability that the person actually has AIDS? This probability is the posterior probability P(AIDS | POS).

The idea behind Bayes' Theorem is that we can find P(AIDS | POS) by thinking as follows. A person will receive a positive result (POS) if the person receives a positive result and actually has AIDS—that is, (AIDS ∩ POS)—or if the person receives a positive result and actually does not have AIDS—that is, (AIDS^c ∩ POS). Therefore,

P(POS) = P(AIDS ∩ POS) + P(AIDS^c ∩ POS) = P(AIDS)P(POS | AIDS) + P(AIDS^c)P(POS | AIDS^c) = (.006)(.999) + (.994)(.01) = .015934

This implies that

P(AIDS | POS) = P(AIDS ∩ POS)/P(POS) = (.006)(.999)/.015934 = .005994/.015934 = .3762

This probability says that, if all Americans were given an AIDS test, only 38 percent of the people who would react positively to the test would actually have AIDS. That is, 62 percent of Americans identified as having AIDS would actually be free of the disease! The reason for this rather surprising result is that, since so few people actually have AIDS, the majority of people who would receive a positive result are people who are free of AIDS and who erroneously test positive. This is why statisticians have frequently spoken against proposals for mandatory AIDS testing.
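As a quick numerical check, the calculation in this example can be reproduced in a few lines of Python (a sketch; the variable names are ours, and the three input probabilities are the ones stated above):

```python
# Bayes' Theorem check for the AIDS testing example (values from the text).
p_aids = 0.006               # prior: P(AIDS)
p_pos_given_aids = 0.999     # P(POS | AIDS)
p_pos_given_not = 0.01       # P(POS | no AIDS), the false-positive rate

# Total probability of a positive result
p_pos = p_aids * p_pos_given_aids + (1 - p_aids) * p_pos_given_not

# Posterior probability of AIDS given a positive result
p_aids_given_pos = p_aids * p_pos_given_aids / p_pos
print(round(p_aids_given_pos, 4))  # roughly .38, as the text notes
```

Even with a highly accurate test, the low base rate (.006) drags the posterior down to about .38, which is the point of the example.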

In the preceding example, there were two states of nature—AIDS and AIDS^c—and two outcomes of the diagnostic test—POS and POS^c (a negative result). In general, there might be any number of states of nature and any number of experimental outcomes. This leads to a general statement of Bayes' Theorem.

Bayes’ Theorem

Let S1, S2,…, Sk be k mutually exclusive states of nature, one of which must be true, and suppose that P(S1), P(S2),…, P(Sk) are the prior probabilities of these states of nature. Also, let E be a particular outcome of an experiment designed to help determine which state of nature is really true. Then, the
posterior probability
of a particular state of nature, say Si, given the experimental outcome E, is

P(Si | E) = P(Si)P(E | Si)/P(E)

where

P(E) = P(S1)P(E | S1) + P(S2)P(E | S2) + … + P(Sk)P(E | Sk)

We illustrate Bayes’ Theorem in the following case.
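The boxed formula translates directly into code. The sketch below is our own illustration (not part of the text); it computes the posterior probabilities for k mutually exclusive states, and the usage line applies it to the oil drilling priors (.7, .2, .1) and the high-reading frequencies (4/100, 8/400, 288/300) that appear in the example that follows.

```python
def posterior(priors, likelihoods):
    """Bayes' Theorem for k mutually exclusive states of nature:
    returns P(S_i | E) given priors P(S_i) and likelihoods P(E | S_i)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]
    p_e = sum(joints)                  # P(E), by the law of total probability
    return [j / p_e for j in joints]

# Oil drilling case: priors for none/some/much oil, and P(high | state)
post = posterior([0.7, 0.2, 0.1], [0.04, 0.02, 0.96])
print([round(p, 5) for p in post])  # [0.21875, 0.03125, 0.75]
```

This one function reproduces every column of a probability revision table: the joint probabilities, their sum P(E), and the posteriors.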

EXAMPLE 19.2: The Oil Drilling Case

An oil company is attempting to decide whether to drill for oil on a particular site. There are three possible states of nature:

1 No oil (state of nature S1, which we will denote as none)

2 Some oil (state of nature S2, which we will denote as some)

3 Much oil (state of nature S3, which we will denote as much)

Based on experience and knowledge concerning the site's geological characteristics, the oil company feels that the prior probabilities of these states of nature are as follows:

P(none) = .7  P(some) = .2  P(much) = .1

In order to obtain more information about the potential drilling site, the oil company can perform a seismic experiment, which has three readings—low, medium, and high. Moreover, information exists concerning the accuracy of the seismic experiment. The company’s historical records tell us that

1 Of 100 past sites that were drilled and produced no oil, 4 sites gave a high reading. Therefore,

P(high | none) = 4/100 = .04

2 Of 400 past sites that were drilled and produced some oil, 8 sites gave a high reading. Therefore,

P(high | some) = 8/400 = .02

3 Of 300 past sites that were drilled and produced much oil, 288 sites gave a high reading. Therefore,

P(high | much) = 288/300 = .96

Intuitively, these conditional probabilities tell us that sites that produce no oil or some oil seldom give a high reading, while sites that produce much oil often give a high reading. Figure 19.1(a) shows a tree diagram that illustrates the prior probabilities of no, some, and much oil and the above conditional probabilities. This figure also gives the conditional probabilities for medium and low readings of the seismic experiment given each of the states of nature (none, some, or much).

Figure 19.1: A Tree Diagram and Probability Revision Tables for Bayes’ Theorem in the Oil Drilling Example

Now, suppose that when the company performs the seismic experiment on the site in question, it obtains a high reading. The previously given conditional probabilities suggest that, given this new information, the company might feel that the likelihood of much oil is higher than its prior probability P(much) = .1, and that the likelihoods of some oil and no oil are lower than the prior probabilities P(some) = .2 and P(none) = .7. To be more specific, we wish to revise the prior probabilities of no, some, and much oil to what we call posterior probabilities. We can do this by using Bayes’ Theorem as follows.

If we wish to compute P(none | high), we first calculate

P(high) = P(none)P(high | none) + P(some)P(high | some) + P(much)P(high | much) = (.7)(.04) + (.2)(.02) + (.1)(.96) = .128

Then Bayes' Theorem says that

P(none | high) = P(none)P(high | none)/P(high) = (.7)(.04)/.128 = .028/.128 = .21875

These calculations are summarized in part (b) of Figure 19.1. This table, which is called a probability revision table, also contains the calculations of the revised probabilities P(some | high) = .03125 and P(much | high) = .75. The revised probabilities in part (b) of Figure 19.1 tell us that, given that the seismic experiment gives a high reading, the revised probabilities of no, some, and much oil are .21875, .03125, and .75, respectively.

Since the posterior probability of much oil is .75, we might conclude that we should drill on the oil site. However, this decision should also be based on economic considerations, and we will see in Section 19.3 how to combine Bayesian posterior probabilities with economic considerations to arrive at reasonable decisions.

In this section we have only introduced Bayes’ Theorem. There is an entire subject called
Bayesian statistics,
which uses Bayes’ Theorem to update prior belief about a probability or population parameter to posterior belief. The use of Bayesian statistics is controversial in the case where the prior belief is largely based on subjective considerations, because many statisticians do not believe that we should base decisions on subjective considerations. Realistically, however, we all do this in our daily lives. For example, how each of us viewed the evidence in the O. J. Simpson trial had a great deal to do with our prior beliefs about both O. J. Simpson and the police.

Exercises for Section 19.1

CONCEPTS

19.1 What is a prior probability? What is a posterior probability?

19.2 Explain the purpose behind using Bayes’ Theorem.

METHODS AND APPLICATIONS

19.3 Suppose that A1, A2, and B are events where A1 and A2 are mutually exclusive and

Use this information to find P(A1 | B) and P(A2 | B).

19.4 Suppose that A1, A2, A3, and B are events where A1, A2, and A3 are mutually exclusive and

Use this information to find P(A1 | B), P(A2 | B) and P(A3 | B).

19.5 Again consider the diagnostic test for AIDS discussed in Example 19.1 (page 831) and recall that P(POS | AIDS) = .999 and P(POS | AIDS^c) = .01, where POS denotes a positive test result. Assuming that the percentage of people who have AIDS is 1 percent, recalculate the probability that a randomly selected person has the AIDS virus, given that his or her test result is positive.

19.6 A department store is considering a new credit policy to try to reduce the number of customers defaulting on payments. A suggestion is made to discontinue credit to any customer who has been one week or more late with his/her payment at least twice. Past records show 95 percent of defaults were late at least twice. Also, 3 percent of all customers default, and 30 percent of those who have not defaulted have had at least two late payments.

a Find the probability that a customer with at least two late payments will default.

b Based on part a, should the policy be adopted? Explain.

19.7 A company administers an “aptitude test for managers” to aid in selecting new management trainees. Prior experience suggests that 60 percent of all applicants for management trainee positions would be successful if they were hired. Furthermore, past experience with the aptitude test indicates that 85 percent of applicants who turn out to be successful managers pass the test and 90 percent of applicants who turn out not to be successful managers fail the test.

a If an applicant passes the “aptitude test for managers,” what is the probability that the applicant will succeed in a management position?

b Based on your answer to part a, do you think that the “aptitude test for managers” is a valuable way to screen applicants for management trainee positions? Explain.

19.8 Three data entry specialists enter requisitions into a computer. Specialist 1 processes 30 percent of the requisitions, specialist 2 processes 45 percent, and specialist 3 processes 25 percent. The proportions of incorrectly entered requisitions by data entry specialists 1, 2, and 3 are .03, .05, and .02, respectively. Suppose that a random requisition is found to have been incorrectly entered. What is the probability that it was processed by data entry specialist 1? By data entry specialist 2? By data entry specialist 3?

19.9 A truth serum given to a suspect is known to be 90 percent reliable when the person is guilty and 99 percent reliable when the person is innocent. In other words, 10 percent of the guilty are judged innocent by the serum and 1 percent of the innocent are judged guilty. If the suspect was selected from a group of suspects of which only 5 percent are guilty of having committed a crime, and the serum indicates that the suspect is guilty of having committed a crime, what is the probability that the suspect is innocent?

19.10 A firm designs and builds automatic electronic control devices and installs them in customers’ plants. In shipment, a device has a prior probability of .10 of getting out of alignment. Before a control device is installed, test equipment is used to check the device’s alignment. The test equipment has two readings, “in” or “out” of alignment. If the control device is in alignment, there is a .8 probability that the test equipment will read “in.” If the control device is not in alignment, there is a .9 probability that the test equipment will read “out.”

a Draw a tree diagram illustrating the prior and conditional probabilities for this situation.

b Construct a probability revision table for calculating the probability that the test equipment reads “in,” and for calculating the posterior probabilities of the control device being in alignment and out of alignment given that the test equipment reads “in.” What is P(in | reads “in”)? What is P(out | reads “in”)?

c Construct a probability revision table for calculating the probability that the test equipment reads “out,” and for calculating the posterior probabilities of the control device being in alignment and out of alignment given that the test equipment reads “out.” What is P(in | reads “out”)? What is P(out | reads “out”)?

19.2: Introduction to Decision Theory

Suppose that a real estate developer is proposing the development of a condominium complex on an exclusive parcel of lakefront property. The developer wishes to choose between three possible options—building a large complex, building a medium-sized complex, and building a small complex. The profitability of each option depends on the level of demand for condominium units after the complex has been built. For simplicity, the developer considers only two possible levels of demand—high or low; the developer must choose whether to build a large, medium, or small complex based on her beliefs about whether demand for condominium units will be high or low.

The real estate developer’s situation requires a decision.
Decision theory
is a general approach that helps decision makers make intelligent choices. A decision theory problem typically involves the following elements:

1
States of nature:
a set of potential future conditions that affects the results of the decision. For instance, the level of demand (high or low) for condominium units will affect profits after the developer chooses to build a large, medium, or small complex. Thus, we have two states of nature—high demand and low demand.

2
Alternatives:
several alternative actions for the decision maker to choose from. For example, the real estate developer can choose between building a large, medium, or small condominium complex. Therefore, the developer has three alternatives—large, medium, and small.

3 Payoffs: a payoff for each alternative under each potential state of nature. The payoffs are often summarized in a
payoff table.
For instance, Table 19.1 gives a payoff table for the condominium complex situation. This table gives the profit1 for each alternative under the different states of nature. For example, the payoff table tells us that, if the developer builds a large complex and if demand for units turns out to be high, a profit of $22 million will be realized. However, if the developer builds a large complex and if demand for units turns out to be low, a loss of $11 million will be suffered.

Table 19.1: A Payoff Table for the Condominium Complex Situation

Once the states of nature have been identified, the alternatives have been listed, and the payoffs have been determined, we evaluate the alternatives by using a
decision criterion.
How this is done depends on the degree of uncertainty associated with the states of nature. Here there are three possibilities:

1
Certainty:
we know for certain which state of nature will actually occur.

2
Uncertainty:
we have no information about the likelihoods of the various states of nature.

3
Risk:
the likelihood (probability) of each state of nature can be estimated.

Decision making under certainty

In the unlikely event that we know for certain which state of nature will actually occur, we simply choose the alternative that gives the best payoff for that state of nature. For instance, in the condominium complex situation, if we know that demand for units will be high, then the payoff table (see Table 19.1) tells us that the best alternative is to build a large complex and that this choice will yield a profit of $22 million. On the other hand, if we know that demand for units will be low, then the payoff table tells us that the best alternative is to build a small complex and that this choice will yield a profit of $8 million.

Of course, we rarely (if ever) know for certain which state of nature will actually occur. However, analyzing the payoff table in this way often provides insight into the nature of the problem. For instance, examining the payoff table tells us that, if we know that demand for units will be low, then building either a small complex or a medium complex will be far superior to building a large complex (which would yield an $11 million loss).

Decision making under uncertainty

This is the exact opposite of certainty. Here we have no information about how likely the different states of nature are. That is, we have no idea how to assign probabilities to the different states of nature.

In such a case, several approaches are possible; we will discuss two commonly used methods. The first is called the
maximin criterion.

Maximin: Find the worst possible payoff for each alternative, and then choose the alternative that yields the maximum worst possible payoff.

For instance, to apply the maximin criterion to the condominium complex situation, we proceed as follows (see Table 19.1):

1 If a small complex is built, the worst possible payoff is $8 million.

2 If a medium complex is built, the worst possible payoff is $5 million.

3 If a large complex is built, the worst possible payoff is −$11 million.

Since the maximum of these worst possible payoffs is $8 million, the developer should choose to build a small complex.

The maximin criterion is a pessimistic approach because it considers the worst possible payoff for each alternative. When an alternative is chosen using the maximin criterion, the actual payoff obtained may be higher than the maximum worst possible payoff. However, using the maximin criterion assures a “guaranteed minimum” payoff.

A second approach is called the
maximax criterion.

Maximax: Find the best possible payoff for each alternative, and then choose the alternative that yields the maximum best possible payoff.

To apply the maximax criterion to the condominium complex situation, we proceed as follows (see Table 19.1):

1 If a small complex is built, the best possible payoff is $8 million.

2 If a medium complex is built, the best possible payoff is $15 million.

3 If a large complex is built, the best possible payoff is $22 million.

Since the maximum of these best possible payoffs is $22 million, the developer should choose to build a large complex.

The maximax criterion is an optimistic approach because we always choose the alternative that yields the highest possible payoff. This is a “go for broke” strategy, and the actual payoff obtained may be far less than the highest possible payoff. For example, in the condominium complex situation, if a large complex is built and demand for units turns out to be low, an $11 million loss will be suffered (instead of a $22 million profit).
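Both criteria reduce to a scan over the payoff table. Here is a minimal Python sketch (the dictionary layout is our own; payoffs are in $ millions, listed as [low demand, high demand] from Table 19.1):

```python
# Payoff table for the condominium complex situation ($ millions),
# each alternative listed as [payoff under low demand, payoff under high demand].
payoffs = {
    "small":  [8, 8],
    "medium": [5, 15],
    "large":  [-11, 22],
}

# Maximin: maximize the worst possible payoff (pessimistic).
maximin_choice = max(payoffs, key=lambda a: min(payoffs[a]))

# Maximax: maximize the best possible payoff (optimistic).
maximax_choice = max(payoffs, key=lambda a: max(payoffs[a]))

print(maximin_choice, maximax_choice)  # small large
```

The two criteria differ only in whether `min` or `max` summarizes each row, which makes their opposite temperaments easy to see.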

Decision making under risk

In this case we can estimate the probability of occurrence for each state of nature. Thus, we have a situation in which we have more information about the states of nature than in the case of uncertainty and less information than in the case of certainty. Here a commonly used approach is to use the
expected monetary value criterion.
This involves computing the expected monetary payoff for each alternative and choosing the alternative with the largest expected payoff.

The expected value criterion can be employed by using prior probabilities. As an example, suppose that in the condominium complex situation the developer assigns prior probabilities of .7 and .3 to high and low demands, respectively. We find the expected monetary value for each alternative by multiplying the probability of occurrence for each state of nature by the payoff associated with the state of nature and by summing these products. Referring to the payoff table in Table 19.1, the expected monetary values are as follows:

Small complex: Expected value = .3($8 million) + .7($8 million) = $8 million

Medium complex: Expected value = .3($5 million) + .7($15 million) = $12 million

Large complex: Expected value = .3(−$11 million) + .7($22 million) = $12.1 million

Choosing the alternative with the highest expected monetary value, the developer would choose to build a large complex.

Remember that the expected payoff is not necessarily equal to the actual payoff that will be realized. Rather, the expected payoff is the long-run average payoff that would be realized if many identical decisions were made. For instance, the expected monetary payoff of $12.1 million for a large complex is the average payoff that would be obtained if many large condominium complexes were built. Thus, the expected monetary value criterion is best used when many similar decisions will be made.
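The expected value computations above can be checked in a couple of lines (a sketch using the same hypothetical table layout as before, with the prior probabilities .3 for low demand and .7 for high demand):

```python
# Expected monetary value criterion for the condominium situation.
probs = [0.3, 0.7]   # P(low demand), P(high demand)
payoffs = {"small": [8, 8], "medium": [5, 15], "large": [-11, 22]}  # $ millions

# EMV of each alternative: probability-weighted sum of its payoffs.
emv = {a: sum(p * x for p, x in zip(probs, pay)) for a, pay in payoffs.items()}
best = max(emv, key=emv.get)
print(best, round(emv[best], 1))  # large 12.1
```

Note how close the medium ($12 million) and large ($12.1 million) alternatives are; small changes in the prior probabilities could reverse the decision.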

Using a decision tree

It is often convenient to depict the alternatives, states of nature, payoffs, and probabilities (in the case of risk) in the form of a
decision tree
or tree diagram. The diagram is made up of nodes and branches. We use square nodes to denote decision points and circular nodes to denote chance events. The branches emanating from a decision point represent alternatives, and the branches emanating from a circular node represent the possible states of nature. Figure 19.2 presents a decision tree for the condominium complex situation (in the case of risk as described previously). Notice that the payoffs are shown at the rightmost end of each branch and that the probabilities associated with the various states of nature are given in parentheses corresponding to each branch emanating from a chance node. The expected monetary values for the alternatives are shown below the chance nodes. The double slashes placed through the small complex and medium complex branches indicate that these alternatives would not be chosen (because of their lower expected payoffs) and that the large complex alternative would be selected.

Figure 19.2: A Decision Tree for the Condominium Complex Situation

A decision tree is particularly useful when a problem involves a sequence of decisions. For instance, in the condominium complex situation, if demand turns out to be low, it might be possible to improve payoffs by selling the condominiums at lower prices. Figure 19.3 shows a decision tree in which, after a decision to build a small, medium, or large condominium complex is made, the developer can choose to either keep the same prices or charge lower prices for condominium units. In order to analyze the decision tree, we start with the last (rightmost) decision to be made. For each decision we choose the alternative that gives the highest payoff. For instance, if the developer builds a large complex and demand turns out to be low, the developer should lower prices (as indicated by the double slash through the alternative of same prices). If decisions are followed by chance events, we choose the alternative that gives the highest expected monetary value. For example, again looking at Figure 19.3, we see that a medium complex should now be built because it has the highest expected monetary value ($14.1 million). This is indicated by the double slashes drawn through the small and large complex alternatives. Looking at the entire decision tree in Figure 19.3, we see that the developer should build a medium complex and should sell condominium units at lower prices if demand turns out to be low.

Figure 19.3: A Decision Tree with Sequential Decisions
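The "fold back from the right" procedure can be written as a short recursive function. The sketch below is our own illustration (the tuple-based tree encoding is an assumption); for brevity it rolls back the single-stage tree of Figure 19.2 rather than the sequential tree of Figure 19.3, but the same function handles any depth of alternating decision and chance nodes.

```python
# Roll back a decision tree: decision nodes take the best branch,
# chance nodes take the expected value, leaves return their payoff.
def rollback(node):
    kind = node[0]
    if kind == "payoff":
        return node[1]
    if kind == "chance":        # ("chance", [(prob, subtree), ...])
        return sum(p * rollback(sub) for p, sub in node[1])
    if kind == "decision":      # ("decision", {alternative: subtree})
        return max(rollback(sub) for sub in node[1].values())

# The tree of Figure 19.2 ($ millions; P(low) = .3, P(high) = .7)
tree = ("decision", {
    "small":  ("chance", [(0.3, ("payoff", 8)),   (0.7, ("payoff", 8))]),
    "medium": ("chance", [(0.3, ("payoff", 5)),   (0.7, ("payoff", 15))]),
    "large":  ("chance", [(0.3, ("payoff", -11)), (0.7, ("payoff", 22))]),
})
print(round(rollback(tree), 1))  # 12.1, the expected value of building large
```

Because the recursion evaluates the rightmost nodes first, it mirrors exactly the manual procedure described in the text.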

Sometimes it is possible to determine exactly which state of nature will occur in the future. For example, in the condominium complex situation, the level of demand for units might depend on whether a new resort casino is built in the area. While the developer may have prior probabilities concerning whether the casino will be built, it might be feasible to postpone a decision about the size of the condominium complex until a final decision about the resort casino has been made.

If we can find out exactly which state of nature will occur, we say we have obtained
perfect information.
There is usually a cost involved in obtaining this information (if it can be obtained at all). For instance, we might have to acquire an option on the lakefront property on which the condominium complex is to be built in order to postpone a decision about the size of the complex. Or perfect information might be acquired by conducting some sort of research that must be paid for. A question that arises here is whether it is worth the cost to obtain perfect information. We can answer this question by computing the
expected value of perfect information,
which we denote as the EVPI. The EVPI is defined as follows:

EVPI = expected payoff under certainty − expected payoff under risk

For instance, if we consider the condominium complex situation depicted in the decision tree of Figure 19.2, we found that the expected payoff under risk is $12.1 million (which is the expected payoff associated with building a large complex). To find the expected payoff under certainty, we find the highest payoff under each state of nature. Referring to Table 19.1, we see that if demand is low, the highest payoff is $8 million (when we build a small complex); we see that if demand is high, the highest payoff is $22 million (when we build a large complex). Since the prior probabilities of high and low demand are, respectively, .7 and .3, the expected payoff under certainty is .7($22 million) + .3($8 million) = $17.8 million. Therefore, the expected value of perfect information is $17.8 million − $12.1 million = $5.7 million. This is the maximum amount of money that the developer should be willing to pay to obtain perfect information. That is, the land option should be purchased if it costs $5.7 million or less. Then, if the casino is not built (and demand is low), a small condominium complex should be built; if the casino is built (and demand is high), a large condominium complex should be built. On the other hand, if the land option costs more than $5.7 million, the developer should choose the alternative having the highest expected payoff (which would mean building a large complex—see Figure 19.2).
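The EVPI arithmetic in this paragraph can be verified with a short sketch (our own layout; payoffs in $ millions, states ordered [low demand, high demand]):

```python
# EVPI = expected payoff under certainty - expected payoff under risk.
probs = [0.3, 0.7]   # P(low demand), P(high demand)
payoffs = {"small": [8, 8], "medium": [5, 15], "large": [-11, 22]}

# Under certainty: pick the best alternative for each state, then weight
# by the prior probability of that state.
best_per_state = [max(pay[i] for pay in payoffs.values()) for i in range(2)]
under_certainty = sum(p * b for p, b in zip(probs, best_per_state))   # 17.8

# Under risk: the best expected monetary value across alternatives.
under_risk = max(sum(p * x for p, x in zip(probs, pay))
                 for pay in payoffs.values())                         # 12.1

evpi = under_certainty - under_risk
print(round(evpi, 1))  # 5.7
```

The $5.7 million figure is a ceiling: any option or survey that costs more than the EVPI cannot pay for itself, no matter how informative it is.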

Finally, another approach to dealing with risk involves assigning what we call
utilities
to monetary values. These utilities reflect the decision maker’s attitude toward risk: that is, does the decision maker avoid risk or is he or she a risk taker? Here the decision maker chooses the alternative that maximizes expected utility. The reader interested in this approach is referred to Section 19.4.

Exercises for Section 19.2

CONCEPTS

19.11 Explain the differences between (a) decision making under certainty, (b) decision making under uncertainty, and (c) decision making under risk.

19.12 Explain how to use the (a) maximin criterion, (b) maximax criterion, and (c) expected monetary value criterion.

19.13 Explain how to find the expected value of perfect information.

METHODS AND APPLICATIONS

Exercises 19.14 through 19.19 refer to an example in the book Production/Operations Management by William J. Stevenson. The example involves a capacity-planning problem in which a company must choose to build a small, medium, or large production facility. The payoff obtained will depend on whether future demand is low, moderate, or high, and the payoffs are as given in the following table: CapPlan

19.14 Find the best alternative (and the resulting payoff) in the given payoff table if it is known with certainty that demand will be

a Low.

b Medium.

c High. CapPlan

19.15 Given the payoff table, find the alternative that would be chosen using the maximin criterion. CapPlan

19.16 Given the payoff table, find the alternative that would be chosen using the maximax criterion. CapPlan

19.17 Suppose that the company assigns prior probabilities of .3, .5, and .2 to low, moderate, and high demands, respectively. CapPlan

a Find the expected monetary value for each alternative (small, medium, and large).

b What is the best alternative if we use the expected monetary value criterion?

19.18 Construct a decision tree for the information in the payoff table assuming that the prior probabilities of low, moderate, and high demands are, respectively, .3, .5, and .2. CapPlan

19.19 For the information in the payoff table find

a The expected payoff under certainty.

b The expected value of perfect information, EVPI. CapPlan

19.20 A firm wishes to choose the location for a new factory. Profits obtained will depend on whether a new railroad spur is constructed to serve the town in which the new factory will be located. The following payoff table summarizes the relevant information: FactLoc

Determine the location that should be chosen if the firm uses

a The maximin criterion.

b The maximax criterion.

19.21 Refer to the information given in Exercise 19.20. Using the probabilities of .60 for a new railroad spur and .40 for no new railroad spur FactLoc

a Compute the expected monetary value for each location.

b Find the location that should be selected using the expected monetary value criterion.

c Compute the EVPI, expected value of perfect information.

19.22 Construct a decision tree for the information given in Exercises 19.20 and 19.21. FactLoc

19.23 Figure 19.4 gives a decision tree presented in the book Production/Operations Management by William J. Stevenson. Use this tree diagram to do the following:

a Find the expected monetary value for each of the alternatives (subcontract, expand, and build).

b Determine the alternative that should be selected in order to maximize the expected monetary value.

Figure 19.4: Decision Tree for Exercise 19.23

*Net present value in millions.

Source: Reprinted with permission from W. J. Stevenson, Production/Operations Management, 6th ed., p. 228. © 1999 by The McGraw-Hill Companies, Inc.

19.3: Decision Making Using Posterior Probabilities

We have seen that the expected monetary value criterion tells us to choose the alternative having the highest expected payoff. In Section 19.2 we computed expected payoffs by using prior probabilities. When we use the expected monetary value criterion to choose the best alternative based on expected values computed using prior probabilities, we call this
prior decision analysis.
Often, however, sample information can be obtained to help us make decisions. In such a case, we compute expected values by using posterior probabilities, and we call the analysis
posterior decision analysis.
In the following example we demonstrate how to carry out posterior analysis.

EXAMPLE 19.3: The Oil Drilling Case

Recall from Example 19.2 that an oil company wishes to decide whether to drill for oil on a particular site, and recall that the company has assigned prior probabilities to the states of nature S1 ≡ no oil, S2 ≡ some oil, and S3 ≡ much oil of .7, .2, and .1, respectively. Figure 19.5 gives a decision tree and payoff table for a prior analysis of the oil drilling situation. Here, using the prior probabilities, the expected monetary value associated with drilling is

.7(−$700,000) + .2($500,000) + .1($2,000,000) = −$190,000

while the expected monetary value associated with not drilling is

.7(0) + .2(0) + .1(0) = 0

Therefore, prior analysis tells us that the oil company should not drill.

Figure 19.5: A Decision Tree and Payoff Table for a Prior Analysis of the Oil Drilling Case

Next, remember that the oil company can obtain more information about the drilling site by performing a seismic experiment with three possible readings—low, medium, and high. Also recall that the accuracy of the seismic experiment is expressed by the conditional probabilities in part (a) of Figure 19.1 (page 834) and that we have used these conditional probabilities to update the prior probabilities of no oil, some oil, and much oil to posterior probabilities in the probability revision tables in parts (b), (c), and (d) of Figure 19.1. For instance, in part (b) of Figure 19.1 we found that

P(none | high) = .21875  P(some | high) = .03125  P(much | high) = .75

We also used the conditional probabilities in part (a) of Figure 19.1 to compute P(high) = .128, P(medium) = .226, and P(low) = .646, the probabilities of a high, medium, and low reading, respectively.

Figure 19.6 presents a decision tree for a posterior analysis of the oil drilling problem. The leftmost decision node represents the decision of whether to conduct the seismic experiment. The upper branch (no seismic survey) contains a second decision node representing the alternatives in our decision problem (that is, drill or do not drill). At the ends of the “drill” and “do not drill” branches we have chance nodes that branch into the three states of nature—no oil (none), some oil (some), and much oil (much). The appropriate payoff is placed at the rightmost end of each branch, and since this uppermost branch corresponds to “no seismic survey,” the probabilities in parentheses for the states of nature are the prior probabilities. The expected payoff associated with drilling (which we found to be −$190,000) is shown at the chance node for the “drill” branch, and the expected payoff associated with not drilling (which we found to be $0) is shown at the chance node for the “do not drill” branch.

Figure 19.6: A Decision Tree for a Posterior Analysis of the Oil Drilling Case

The lower branch of the decision tree (seismic survey) has an extra chance node that branches into the three possible outcomes of the seismic experiment—low, medium, and high. The probabilities of these outcomes are shown corresponding to the low, medium, and high branches. From the low, medium, and high branches, the tree branches into alternatives (drill and do not drill) and from alternatives into states of nature (none, some, and much). However, the probabilities in parentheses written beside the none, some, and much branches are the posterior probabilities that we computed in the probability revision tables in Figure 19.1. This is because advancing to the end of a particular branch in the lower part of the decision tree is conditional; that is, it depends on obtaining a particular experimental result (low, medium, or high).

We can now use the decision tree to determine the alternative (drill or do not drill) that should be selected given that the seismic experiment has been performed and has resulted in a particular outcome. First, suppose that the seismic experiment results in a high reading. Looking at the branch of the decision tree corresponding to a high reading, the expected monetary values associated with the “drill” and “do not drill” alternatives are

Drill: .21875(−$700,000) + .03125($500,000) + .75($2,000,000) = $1,362,500

Do not drill: .21875(0) + .03125(0) + .75(0) = $0

These expected monetary values are placed on the decision tree corresponding to the “drill” and “do not drill” alternatives. They tell us that, if the seismic experiment results in a high reading, then the company should drill and the expected payoff will be $1,362,500. The double slash placed through the “do not drill” branch (at the very bottom of the decision tree) blocks off that branch and indicates that the company should drill if a high reading is obtained.
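The expected monetary value at a chance node is simply the probability-weighted sum of the payoffs on its branches. As a quick check of the arithmetic above, here is a minimal sketch (ours, not the text's):

```python
# Expected monetary value at the "drill" chance node after a high reading.
payoffs = {"none": -700_000, "some": 500_000, "much": 2_000_000}
posterior_high = {"none": 0.21875, "some": 0.03125, "much": 0.75}

emv_drill = sum(posterior_high[s] * payoffs[s] for s in payoffs)
print(emv_drill)  # 1362500.0
```

Substituting the medium- or low-reading posteriors in place of `posterior_high` reproduces the $334,061 and −$680,959 figures computed below.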

Next, suppose that the seismic experiment results in a medium reading. Looking at the branch corresponding to a medium reading, the expected monetary values are

Drill: .15487(−$700,000) + .83186($500,000) + .01327($2,000,000) = $334,061

Do not drill: .15487($0) + .83186($0) + .01327($0) = $0

Therefore, if the seismic experiment results in a medium reading, the oil company should drill, and the expected payoff will be $334,061.

Finally, suppose that the seismic experiment results in a low reading. Looking at the branch corresponding to a low reading, the expected monetary values are

Drill: .98607(−$700,000) + .01238($500,000) + .00155($2,000,000) = −$680,959

Do not drill: .98607($0) + .01238($0) + .00155($0) = $0

Therefore, if the seismic experiment results in a low reading, the oil company should not drill on the site.

We can summarize the results of our posterior analysis as follows:

If we carry out the seismic experiment, we now know what action should be taken for each possible outcome (low, medium, or high). However, there is a cost involved when we conduct the seismic experiment. If, for instance, it costs $100,000 to perform the seismic experiment, we need to investigate whether it is worth it to perform the experiment. This will depend on the expected worth of the information provided by the experiment. Naturally, we must decide whether the experiment is worth it before our posterior analysis is actually done. Therefore, when we assess the worth of the sample information, we say that we are performing a preposterior analysis.

In order to assess the worth of the sample information, we compute the expected payoff of sampling. To calculate this quantity, we find the expected payoff and the probability of each sample outcome (that is, of each possible outcome of the seismic experiment). Looking at the decision tree in Figure 19.6, we find the following:

Therefore, the expected payoff of sampling, which is denoted EPS, is

EPS = .646($0) + .226($334,061) + .128($1,362,500) = $249,898

To find the worth of the sample information, we compare the expected payoff of sampling to the expected payoff of no sampling, which is denoted EPNS. The EPNS is the expected payoff of the alternative that we would choose by using the expected monetary value criterion with the prior probabilities. Recalling that we summarized our prior analysis in the tree diagram of Figure 19.5, we found that (based on the prior probabilities) we should choose not to drill and that the expected payoff of this action is $0. Therefore, EPNS = $0.

We compare the EPS and the EPNS by computing the expected value of sample information, which is denoted EVSI and is defined to be the expected payoff of sampling minus the expected payoff of no sampling. Therefore,

EVSI = EPS − EPNS = $249,898 − $0 = $249,898

The EVSI is the expected gain from conducting the seismic experiment, and the oil company should pay no more than this amount to carry out the seismic experiment. If the experiment costs $100,000, then it is worth the expense to conduct the experiment. Moreover, the difference between the EVSI and the cost of sampling is called the expected net gain of sampling, which is denoted ENGS. Here

ENGS = EVSI − $100,000 = $249,898 − $100,000 = $149,898

As long as the ENGS is greater than $0, it is worth it to carry out the seismic experiment. That is, the oil company should carry out the seismic experiment before it chooses whether or not to drill. Then, as discussed earlier, our posterior analysis says that if the experiment gives a medium or high reading, the oil company should drill, and if the experiment gives a low reading, the oil company should not drill.
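The preposterior bookkeeping in this section reduces to a few lines of arithmetic. This sketch (ours, not the text's) reuses the outcome probabilities and best expected payoffs already computed above:

```python
# Preposterior analysis: expected payoff of sampling (EPS), expected value
# of sample information (EVSI), and expected net gain of sampling (ENGS).
p_outcome = {"low": 0.646, "medium": 0.226, "high": 0.128}
best_payoff = {"low": 0, "medium": 334_061, "high": 1_362_500}  # from the posterior analysis

eps = sum(p_outcome[r] * best_payoff[r] for r in p_outcome)
epns = 0                 # best prior-analysis action was "do not drill"
evsi = eps - epns
engs = evsi - 100_000    # seismic survey cost of $100,000

print(round(eps), round(engs))  # 249898 149898
```

Because `engs` is positive, the code reaches the same conclusion as the text: the survey is worth performing.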

Exercises for Section 19.3

CONCEPTS

19.24 Explain what is meant by each of the following and describe the purpose of each:

a Prior analysis.

b Posterior analysis.

c Preposterior analysis.

19.25 Define and interpret each of the following:

a Expected payoff of sampling, EPS.

b Expected payoff of no sampling, EPNS.

c Expected value of sample information, EVSI.

d Expected net gain of sampling, ENGS.

METHODS AND APPLICATIONS

Exercises 19.26 through 19.31 refer to the following situation.

In the book Making Hard Decisions: An Introduction to Decision Analysis (2nd ed.), Robert T. Clemen presents an example in which an investor wishes to choose between investing money in (1) a high-risk stock, (2) a low-risk stock, or (3) a savings account. The payoffs received from the two stocks will depend on the behavior of the stock market—that is, whether the market goes up, stays the same, or goes down over the investment period. In addition, in order to obtain more information about the market behavior that might be anticipated during the investment period, the investor can hire an economist as a consultant who will predict the future market behavior. The results of the consultation will be one of the following three possibilities: (1) “economist says up,” (2) “economist says flat” (the same), or (3) “economist says down.” The conditional probabilities that express the ability of the economist to accurately forecast market behavior are given in the following table: InvDec

For instance, using this table we see that P(“economist says up” | market up) = .80. Figure 19.7 gives an incomplete decision tree for the investor’s situation. Notice that this decision tree gives all relevant payoffs and also gives the prior probabilities of up, flat, and down, which are, respectively, 0.5, 0.3, and 0.2. Using the information provided here, and any needed information on the decision tree of Figure 19.7, do the following:

Figure 19.7: An Incomplete Decision Tree for the Investor’s Decision Problem of Exercises 19.26 through 19.31

Source: From Making Hard Decisions: An Introduction to Decision Analysis, 2nd ed., by R. T. Clemen, © 1996. Reprinted with permission of Brooks/Cole, an imprint of the Wadsworth Group, a division of Thomson Learning. Fax 800-730-2215.

19.26 Identify and list each of the following for the investor’s decision problem: InvDec

a The investor’s alternative actions.

b The states of nature.

c The possible results of sampling (that is, of information gathering).

19.27 Write out the payoff table for the investor’s decision problem. InvDec

19.28 Carry out a prior analysis of the investor’s decision problem. That is, determine the investment choice that should be made and find the expected monetary value of that choice assuming that the investor does not consult the economist about future stock market behavior. InvDec

19.29 Set up probability revision tables to InvDec

a Find the probability that the “economist says up” and find the posterior probabilities of market up, market flat, and market down given that the “economist says up.”

b Find the probability that the “economist says flat,” and find the posterior probabilities of market up, market flat, and market down given that the “economist says flat.”

c Find the probability that the “economist says down,” and find the posterior probabilities of market up, market flat, and market down given that the “economist says down.”

d Reproduce the decision tree of Figure 19.7 and insert the probabilities you found in parts a, b, and c in their appropriate locations.

19.30 Carry out a posterior analysis of the investor’s decision problem. That is, determine the investment choice that should be made and find the expected monetary value of that choice assuming InvDec

a The economist says “market up.”

b The economist says “market flat.”

c The economist says “market down.”

19.31 Carry out a preposterior analysis of the investor’s decision problem by finding InvDec

a The expected monetary value associated with consulting the economist; that is, find the EPS.

b The expected monetary value associated with not consulting the economist; that is, find the EPNS.

c The expected value of sample information, EVSI.

d The maximum amount the investor should be willing to pay for the economist’s consulting advice.

Exercises 19.32 through 19.38 refer to the following situation.

A firm designs and manufactures automatic electronic control devices that are installed at customers’ plant sites. The control devices are shipped by truck to customers’ sites; while in transit, the devices sometimes get out of alignment. More specifically, a device has a prior probability of .10 of getting out of alignment during shipment. When a control device is delivered to the customer’s plant site, the customer can install the device. If the customer installs the device, and if the device is in alignment, the manufacturer of the control device will realize a profit of $15,000. If the customer installs the device, and if the device is out of alignment, the manufacturer must dismantle, realign, and reinstall the device for the customer. This procedure costs $3,000, and therefore the manufacturer will realize a profit of $12,000. As an alternative to customer installation, the manufacturer can send two engineers to the customer’s plant site to check the alignment of the control device, to realign the device if necessary before installation, and to supervise the installation. Since it is less costly to realign the device before it is installed, sending the engineers costs $500. Therefore, if the engineers are sent to assist with the installation, the manufacturer realizes a profit of $14,500 (this is true whether or not the engineers must realign the device at the site).

Before a control device is installed, a piece of test equipment can be used by the customer to check the device’s alignment. The test equipment has two readings, “in” or “out” of alignment. Given that the control device is in alignment, there is a .8 probability that the test equipment will read “in.” Given that the control device is out of alignment, there is a .9 probability that the test equipment will read “out.”

19.32 Identify and list each of the following for the control device situation:

a The firm’s alternative actions.

b The states of nature.
c The possible results of sampling (that is, of information gathering).

19.33 Write out the payoff table for the control device situation.

19.34 Construct a decision tree for a prior analysis of the control device situation. Then determine whether the engineers should be sent assuming that the piece of test equipment is not employed to check the device’s alignment. Also find the expected monetary value associated with the best alternative action.

19.35 Set up probability revision tables to

a Find the probability that the test equipment “reads in,” and find the posterior probabilities of in alignment and out of alignment given that the test equipment “reads in.”

b Find the probability that the test equipment “reads out,” and find the posterior probabilities of in alignment and out of alignment given that the test equipment “reads out.”

19.36 Construct a decision tree for a posterior and preposterior analysis of the control device situation.

19.37 Carry out a posterior analysis of the control device problem. That is, decide whether the engineers should be sent, and find the expected monetary value associated with either sending or not sending (depending on which is best) the engineers assuming

a The test equipment “reads in.”

b The test equipment “reads out.”

19.38 Carry out a preposterior analysis of the control device problem by finding

a The expected monetary value associated with using the test equipment; that is, find the EPS.

b The expected monetary value associated with not using the test equipment; that is, find the EPNS.

c The expected value of sample information, EVSI.

d The maximum amount that should be paid for using the test equipment.

19.4: Introduction to Utility Theory

Suppose that a decision maker is trying to decide whether to invest in one of two opportunities—Investment 1 or Investment 2—or not to invest in either of these opportunities. As shown in Table 19.2(a), (b), and (c), the expected profits associated with Investment 1, Investment 2, and no investment are $32,000, $28,000, and $0. Thus, if the decision maker uses expected profit as a decision criterion, and decides to choose no more than one investment, the decision maker should choose Investment 1. However, as discussed earlier, the expected profit for an investment is the long-run average profit that would be realized if many identical investments could be made. If the decision maker will make only a limited number of investments (perhaps because of limited capital), he or she will not realize the expected profit. For example, a single undertaking of Investment 1 will result in either a profit of $50,000, a profit of $10,000, or a loss of $20,000. Some decision makers might prefer a single undertaking of Investment 2, because the potential loss is only $10,000. Other decision makers might be unwilling to risk $10,000 and would choose no investment.

Table 19.2: Three Possible Investments and Their Expected Utilities

There is a way to combine the various profits, probabilities, and the decision maker’s individual attitude toward risk to make a decision that is best for the decision maker. The method is based on a theory of utility discussed by J. Von Neumann and O. Morgenstern in Theory of Games and Economic Behavior (Princeton University Press, Princeton, N.J., 1st ed., 1944, 2nd ed., 1947). This theory says that if a decision maker agrees with certain assumptions about rational behavior (we will not discuss the assumptions here), then the decision maker should replace the profits in the various investments by utilities and choose the investment that gives the highest expected utility. To find the utility of a particular profit, we first arrange the profits from largest to smallest. The utility of the largest profit is 1 and the utility of the smallest profit is 0. The utility of any particular intermediate profit is the probability, call it u, such that the decision maker is indifferent between (1) getting the particular intermediate profit with certainty and (2) playing a lottery (or game) in which the probability is u of getting the highest profit and the probability is 1 − u of getting the smallest profit. Table 19.2(d) arranges the profits in Table 19.2(a), (b), and (c) in increasing order and gives a specific decision maker’s utility for each profit. The utility of .95 for $40,000 means that the decision maker is indifferent between (1) getting $40,000 with certainty and (2) playing a lottery in which the probability is .95 of getting $50,000 and the probability is .05 of losing $20,000. The utilities for the other profits are interpreted similarly. Table 19.2(f), (g), and (h) show the investments with profits replaced by utilities. Since Investment 2 has the highest expected utility, the decision maker should choose Investment 2.
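The mechanics of the expected-utility criterion can be sketched in code. All of the numbers below (the utility assignment and the lotteries) are invented for illustration; they are not the entries of Table 19.2.

```python
# Expected-utility criterion: replace each payoff by its utility, then
# choose the alternative with the highest expected utility.
# All numbers below are hypothetical.
utility = {50_000: 1.0, 10_000: 0.75, 0: 0.60, -20_000: 0.0}  # a risk-averse assignment

alternatives = {  # each alternative is a lottery: (probability, payoff) pairs
    "Investment A": [(0.5, 50_000), (0.3, 10_000), (0.2, -20_000)],
    "No investment": [(1.0, 0)],
}

def expected_utility(lottery):
    return sum(p * utility[payoff] for p, payoff in lottery)

best = max(alternatives, key=lambda a: expected_utility(alternatives[a]))
print(best)  # Investment A
```

Note that ranking by expected utility can disagree with ranking by expected profit, which is exactly the point of the criterion for a risk-averse decision maker.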

Table 19.2(e) shows a plot of the specific decision maker’s utilities versus the profits. The curve connecting the plot points is the utility curve for the decision maker. This curve is an example of a risk averter’s curve. In general, a risk averter’s curve portrays a rapid increase in utility for initial amounts of money followed by a gradual leveling off for larger amounts of money. This curve is appropriate for many individuals or businesses because the marginal value of each additional dollar is not as great once a large amount of money has been earned. A risk averter’s curve is shown on the page margin, as are a risk seeker’s curve and a risk neutral’s curve. The risk seeker’s curve represents an individual who is willing to take large risks to have the opportunity to make large profits. The risk neutral curve represents an individual for whom each additional dollar has the same value. It can be shown that this individual should choose the investment having the highest expected profit.

A risk averter’s curve:

A risk seeker’s curve:

A risk neutral’s curve:

Exercises for Section 19.4

CONCEPTS

19.39 What is a utility?

19.40 What is a risk averter? A risk seeker? A risk neutral?

METHODS AND APPLICATIONS

19.41 Suppose that a decision maker has the opportunity to invest in an oil well drilling operation that has a .3 chance of yielding a profit of $1,000,000, a .4 chance of yielding a profit of $400,000, and a .3 chance of yielding a profit of −$100,000. Also, suppose that the decision maker’s utilities for $400,000 and $0 are .9 and .7. Explain the meanings of these utilities.

19.42 Consider Exercise 19.41. Find the expected utility of the oil well drilling operation. Find the expected utility of not investing. What should the decision maker do if he/she wishes to maximize expected utility?

Chapter Summary

We began this chapter by discussing Bayes’ Theorem. We learned that this theorem is used to revise prior probabilities to posterior probabilities, which are revised probabilities based on new information. We also saw how to use a probability revision table (and Bayes’ Theorem) to update probabilities in a decision problem. In Section 19.2 we presented an introduction to decision theory. We saw that a decision problem involves states of nature, alternatives, payoffs, and decision criteria, and we considered three degrees of uncertainty: certainty, uncertainty, and risk. In the case of certainty, we know which state of nature will actually occur; here we simply choose the alternative that gives the best payoff. In the case of uncertainty, we have no information about the likelihood of the different states of nature; here we discussed two commonly used decision criteria, the maximin criterion and the maximax criterion. In the case of risk, we are able to estimate the probability of occurrence for each state of nature, and we learned how to use the expected monetary value criterion. We also learned how to construct a decision tree in Section 19.2, and we saw how to use such a tree to analyze a decision problem. In Section 19.3 we learned how to make decisions by using posterior probabilities. We explained how to perform a posterior analysis to determine the best alternative for each of several sampling results. Then we showed how to carry out a preposterior analysis, which allows us to assess the worth of sample information. In particular, we saw how to obtain the expected value of sample information. This quantity is the expected gain from sampling, which tells us the maximum amount we should be willing to pay for sample information. We concluded this chapter with Section 19.4, which introduced using utility theory to help make decisions.

Glossary of Terms

alternatives:

Several alternative actions for a decision maker to choose from. (page 836)

Bayes’ Theorem:

A theorem (formula) that is used to compute posterior probabilities by revising prior probabilities. (page 832)

Bayesian statistics:

An area of statistics that uses Bayes’ Theorem to update prior belief about a probability or population parameter to posterior belief. (page 833)

certainty:

When we know for certain which state of nature will actually occur. (page 836)

decision criterion:

A rule used to make a decision. (page 836)

decision theory:

An approach that helps decision makers to make intelligent choices. (page 836)

decision tree:

A diagram consisting of nodes and branches that depicts the information for a decision problem. (page 838)

expected monetary value criterion:

A decision criterion in which one computes the expected monetary payoff for each alternative and then chooses the alternative yielding the largest expected payoff. (page 837)

expected net gain of sampling:

The difference between the expected value of sample information and the cost of sampling. If this quantity is positive, it is worth it to perform sampling. (page 845)

expected value of perfect information:

The difference between the expected payoff under certainty and the expected payoff under risk. (page 839)

expected value of sample information:

The difference between the expected payoff of sampling and the expected payoff of no sampling. This measures the expected gain from sampling. (page 845)

maximax criterion:

A decision criterion in which one finds the best possible payoff for each alternative and then chooses the alternative that yields the maximum best possible payoff. (page 837)

maximin criterion:

A decision criterion in which one finds the worst possible payoff for each alternative and then chooses the alternative that yields the maximum worst possible payoff. (page 837)

payoff table:

A tabular summary of the payoffs in a decision problem. (page 836)

perfect information:

Information that tells us exactly which state of nature will occur. (page 839)

posterior decision analysis:

Using a decision criterion based on posterior probabilities to choose the best alternative in a decision problem. (page 842)

posterior probability:

A revised probability obtained by updating a prior probability after receiving new information. (page 831)

preposterior analysis:

When we assess the worth of sample information before performing a posterior decision analysis. (page 844)

prior decision analysis:

Using a decision criterion based on prior probabilities to choose the best alternative in a decision problem. (page 842)

prior probability:

The initial probability that an event will occur. (page 831)

risk:

When the likelihood (probability) of each state of nature can be estimated. (page 836)

states of nature:

A set of potential future conditions that will affect the results of a decision. (page 836)

uncertainty:

When we have no information about the likelihoods of the various states of nature. (page 836)

utility:

A measure of monetary value based on an individual’s attitude toward risk. (pages 848–849)

Important Formulas

Bayes’ Theorem: page 832

Probability revision table: page 833

Maximin criterion: page 837

Maximax criterion: page 837

Expected monetary value criterion: page 837

Decision tree: page 838

Expected value of perfect information: page 839

Expected payoff of sampling: page 845

Expected payoff of no sampling: page 845

Expected value of sample information: page 845

Expected net gain of sampling: page 845

Expected utility: page 849

Supplementary Exercises

19.43 In the book Making Hard Decisions: An Introduction to Decision Analysis, Robert T. Clemen presents a decision tree for a research and development decision (note that payoffs are given in millions of dollars, which is denoted by M). Based on this decision tree (shown in Figure 19.8), answer the following:

a Should development of the research project be continued or stopped? Justify your answer by using relevant calculations, and explain your reasoning.

b If development is continued and if a patent is awarded, should the new technology be licensed, or should the company develop production and marketing to sell the product directly? Justify your answer by using relevant calculations and explain your reasoning.

Figure 19.8: A Decision Tree for a Research and Development Decision

Source: From Making Hard Decisions: An Introduction to Decision Analysis, 2nd ed., by R. T. Clemen, © 1996. Reprinted with permission of Brooks/Cole, an imprint of the Wadsworth Group, a division of Thomson Learning. Fax 800-730-2215.

19.44 On any given day, the probability that the Ohio River at Cincinnati is polluted by a carbon tetrachloride spill is .10. Each day, a test is conducted to determine whether the river is polluted by carbon tetrachloride. This test has proved correct 80 percent of the time. Suppose that on a particular day the test indicates carbon tetrachloride pollution. What is the probability that such pollution actually exists?

19.45 In the book Production/Operations Management, William J. Stevenson presents a decision tree concerning a firm’s decision about the size of a production facility. This decision tree is given in Figure 19.9 (payoffs are given in millions of dollars). Use the decision tree to determine which alternative (build small or build large) should be chosen in order to maximize the expected monetary payoff. What is the expected monetary payoff associated with the best alternative?

Figure 19.9: A Decision Tree for a Production Facility Decision

Source: Reprinted with permission from W. J. Stevenson, Production/Operations Management, 6th ed., p. 70. © 1999 by The McGraw-Hill Companies, Inc.

19.46 Consider the decision tree in Figure 19.9 on the next page and the situation described in Exercise 19.45. Suppose that a marketing research study can be done to obtain more information about whether demand will be high or low. The marketing research study will result in one of two outcomes: “favorable” (indicating that demand will be high) or “unfavorable” (indicating that demand will be low). The accuracy of marketing research studies like the one to be carried out can be expressed by the conditional probabilities in the following table:

For instance, P(favorable | high) = .9 and P(unfavorable | low) = .8. Given the prior probabilities and payoffs in Figure 19.9, do the following:

a Carry out a posterior analysis. Find the best alternative (build small or build large) for each possible study result (favorable or unfavorable), and find the associated expected payoffs.

b Carry out a preposterior analysis. Determine the maximum amount that should be paid for the marketing research study.

19.47 A marketing major will interview for an internship with a major consumer products manufacturer/distributor. Before the interview, the marketing major feels that the chances of being offered an internship are 40 percent. Suppose that of the students who have been offered internships with this company, 90 percent had good interviews, and that of the students who have not been offered internships, 50 percent had good interviews. If the marketing major has a good interview, what is the probability that he or she will be offered an internship?

19.48 THE OIL DRILLING CASE DrillTst

Again consider the oil drilling case that was described in Examples 19.2 and 19.3. Recall that the oil company wishes to decide whether to drill and that the prior probabilities of no oil, some oil, and much oil are P(none) = .7, P(some) = .2, and P(much) = .1. Suppose that, instead of performing the seismic survey to obtain more information about the site, the oil company can perform a cheaper magnetic experiment having two possible results: a high reading and a low reading. The past performance of the magnetic experiment can be summarized as follows:

Here, for example, P(low | none) = .8 and P(high | some) = .6. Recalling that the payoffs associated with no oil, some oil, and much oil are −$700,000, $500,000, and $2,000,000, respectively, do the following:

a Draw a decision tree for this decision problem.

b Carry out a posterior analysis. Find the best alternative (drill or do not drill) for each possible result of the magnetic experiment (low or high), and find the associated expected payoffs.

c Carry out a preposterior analysis. Determine the maximum amount that should be paid for the magnetic experiment.

19.49 In the book Making Hard Decisions: An Introduction to Decision Analysis, Robert T. Clemen presents an example in which he discusses the 1982 John Hinckley trial. In describing the case, Clemen says:

In 1982 John Hinckley was on trial, accused of having attempted to kill President Reagan. During Hinckley’s trial, Dr. Daniel R. Weinberger told the court that when individuals diagnosed as schizophrenics were given computerized axial tomography (CAT) scans, the scans showed brain atrophy in 30% of the cases compared with only 2% of the scans done on normal people. Hinckley’s defense attorney wanted to introduce as evidence Hinckley’s CAT scan, which showed brain atrophy. The defense argued that the presence of atrophy strengthened the case that Hinckley suffered from mental illness.

a Approximately 1.5 percent of the people in the United States suffer from schizophrenia. If we consider the prior probability of schizophrenia to be .015, use the information given to find the probability that a person has schizophrenia given that a person’s CAT scan shows brain atrophy.

b John Hinckley’s CAT scan showed brain atrophy. Discuss whether your answer to part a helps or hurts the case that Hinckley suffered from mental illness.

c It can be argued that .015 is not a reasonable prior probability of schizophrenia. This is because .015 is the probability that a randomly selected U.S. citizen has schizophrenia. However, John Hinckley is not a randomly selected U.S. citizen. Rather, he was accused of attempting to assassinate the president. Therefore, it might be reasonable to assess a higher prior probability of schizophrenia. Suppose you are a juror who believes there is only a 10 percent chance that Hinckley suffers from schizophrenia. Using .10 as the prior probability of schizophrenia, find the probability that a person has schizophrenia given that a person’s CAT scan shows brain atrophy.

d If you are a juror with a prior probability of .10 that John Hinckley suffers from schizophrenia and given your answer to part c, does the fact that Hinckley’s CAT scan showed brain atrophy help the case that Hinckley suffered from mental illness?

e If you are a juror with a prior probability of .25 that Hinckley suffers from schizophrenia, find the probability of schizophrenia given that Hinckley’s CAT scan showed brain atrophy. In this situation, how strong is the case that Hinckley suffered from mental illness?

19.50 In an exercise in the book Production/Operations Management, 5th ed. (1996), William J. Stevenson considers a theme park whose lease is about to expire. The theme park’s management wishes to decide whether to renew its lease for another 10 years or relocate near the site of a new motel complex. The town planning board is debating whether to approve the motel complex. A consultant estimates the payoffs of the theme park’s alternatives under each state of nature as shown in the following payoff table:

a What alternative should the theme park choose if it uses the maximax criterion? What is the resulting payoff of this choice?

b What alternative should the theme park choose if it uses the maximin criterion? What is the resulting payoff of this choice?

19.51 Again consider the situation described in Exercise 19.50, and suppose that management believes there is a .35 probability that the motel complex will be approved.

a Draw a decision tree for the theme park’s decision problem.

b Which alternative should be chosen if the theme park uses the maximum expected monetary value criterion? What is the expected monetary payoff for this choice?

c Suppose that management is offered the option of a temporary lease while the planning board decides whether to approve the motel complex. If the lease costs $100,000, should the theme park’s management sign the lease? Justify your answer.

1 Here profits are really present values representing current dollar values of expected future income minus costs.

(Bowerman 830)

Bowerman, Bruce L. Business Statistics in Practice, 5th ed. McGraw-Hill Learning Solutions, 2008.

Hypothesis Testing

The Right Hypothesis

In business, or any other discipline, once the question has been asked there must be a statement as to what will or will not occur through testing, measurement, and investigation. This process is known as formulating the right hypothesis. Broadly defined, a hypothesis is a statement that the conditions under which something is being measured or evaluated hold true or do not hold true. Further, a business hypothesis is an assumption that is to be tested through market research, data mining, experimental designs, and quantitative and qualitative research. A hypothesis gives the businessperson a path to follow and specific things to look for along the road.

If the research and statistical data analysis supports the hypothesis, the project is considered well done. If the data support only a modified version of the hypothesis, the project must be re-evaluated before continuing. If, however, the data contradict the hypothesis, the project is usually abandoned.

Hypotheses come in two forms: the null hypothesis and the alternative hypothesis. As a student of applied business statistics you can pick up any number of business statistics textbooks and find differing opinions on which form should be used in the business world. For the most part, however, the safest hypothesis to formulate from the research question is the null hypothesis. A null hypothesis states that the measurement data gathered will not show a difference, relationship, or effect between or among the variables being investigated. Seasoned research investigators are reluctant to accept a statement that no differences, relationships, or effects will occur, because when nothing takes place there is no possible reason that can be given as to why. This is where many business managers get into trouble when attempting to explain why something has not happened. Attempting to explain why something has not taken place is akin to debating how many angels can dance on the head of a pin: every answer is equally plausible. Business managers therefore need to interpret what has happened, not what has not happened.

Many businesspeople skirt the null hypothesis issue by setting an alternative hypothesis, which states that differences, effects, and relationships will occur between and among the variables being investigated if certain conditions apply. Unfortunately, this reverse position is just as problematic. The research investigator is safe if the data analysis detects differences, effects, or relationships, but what if it does not? In that case the business manager is back to square one, attempting to explain what has not happened. Although the hypothesis situation may seem confusing, there is light at the end of the tunnel.

The best-fit hypothesis strategy in business situations is to state a null hypothesis that no differences, effects, or relationships occur; collect the measurement data; subject the data to statistical analysis; and, if differences, effects, or relationships are detected, explain the possible reasons why. As stated earlier, if none are detected, a decision must be made about revamping the research situation or abandoning the program altogether.

In the preceding paragraph the phrase "if differences, effects, and/or relationships between and among variables are detected" is, for the unsuspecting business manager conducting research, an accident waiting to happen! In any research situation some differences, effects, and relationships will be found, however minimal or obscure, even by chance alone. The lingering question, therefore, is what the business research investigator can do to avoid the trap of having to explain every possible difference, effect, or relationship found. The answer is to set statistical limits of acceptance by fixing a significance level (equivalently, a confidence level) before testing.

In general, hypothesis testing involves the following steps:

· Identify the Null Hypothesis (H0). For example H0: Mean = 0

· Stipulate the Alternative Hypothesis (HA). For example HA: Mean ≠ 0

· Calculate the (appropriate) test statistic based on the sample data. The sampling distribution (if the Null Hypothesis is true) is assumed to be known.

· Select the rejection region (and its complementary acceptance region) by fixing a significance level, commonly .05 or .01, based upon the sampling distribution.

· Reject, or fail to reject, the Null Hypothesis H0 and draw the appropriate conclusions.
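The steps above can be sketched in code. The numbers below (sample size, sample mean, a known σ, and α = .05) are hypothetical illustrative values, not data from the text:

```python
from statistics import NormalDist
from math import sqrt

# Hypothetical inputs (assumed for illustration only).
n = 36
sample_mean = 0.45        # observed sample mean (assumed)
hypothesized_mean = 0.0   # Step 1 -- H0: Mean = 0  (Step 2 -- HA: Mean != 0)
sigma = 1.2               # assumed known population standard deviation
alpha = 0.05              # chosen significance level

# Step 3: compute the z test statistic.
z = (sample_mean - hypothesized_mean) / (sigma / sqrt(n))

# Step 4: two-sided rejection region |z| > z_crit at level alpha.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05

# Step 5: reject or fail to reject H0.
if abs(z) > z_crit:
    print(f"z = {z:.2f} > {z_crit:.2f}: reject H0")
else:
    print(f"z = {z:.2f} <= {z_crit:.2f}: fail to reject H0")
# With these assumed numbers, z = 2.25 > 1.96, so H0 is rejected.
```

With different assumed inputs the same five steps apply unchanged; only the computed statistic and the resulting decision change.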

We revisit Chapter 9 by learning about the two hypotheses that make up the structure of a hypothesis test.  The null hypothesis is the statement being tested.  Usually it represents the status quo, and it is not rejected unless there is convincing sample evidence that it is false.  The alternative, or research, hypothesis is a statement that is accepted only if there is convincing sample evidence that it is true and that the null hypothesis is false.  In some situations, the alternative hypothesis is a condition for which we need to attempt to find supportive evidence. We will also learn that two types of errors can be made in a hypothesis test.  A Type I error occurs when we reject a true null hypothesis, and a Type II error occurs when we do not reject a false null hypothesis.
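One way to see that α is the probability of a Type I error is to simulate many tests on data for which the null hypothesis is actually true; the fraction of (wrong) rejections should come out near α. A minimal simulation sketch, with all numbers assumed:

```python
import random
from statistics import NormalDist
from math import sqrt

random.seed(1)            # fixed seed so the run is reproducible
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
n, sigma, true_mean = 25, 1.0, 0.0   # H0: mean = 0 is actually true here

rejections = 0
trials = 10_000
for _ in range(trials):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    z = (sum(sample) / n - 0.0) / (sigma / sqrt(n))
    if abs(z) > z_crit:
        rejections += 1   # a Type I error: rejecting a true H0

print(rejections / trials)  # close to alpha = 0.05
```

Estimating the Type II error rate works the same way, except the samples are drawn with a true mean that differs from the hypothesized one and non-rejections are counted instead.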

We study two commonly used ways to conduct a hypothesis test.  The first involves comparing the value of a test statistic with what is called a critical value, and the second employs what is called a p-value.  The p-value measures the weight of evidence against the null hypothesis.  The smaller the p-value, the more we doubt the null hypothesis.  We will learn that, if we can reject the null hypothesis with the probability of a Type I error equal to α, then we say that the test result has statistical significance at the α level.  However, even if the result of a hypothesis test tells us that statistical significance exists, we must carefully assess whether the result is practically important.  One good way to do this is to use a point estimate and confidence interval for the parameter of interest.
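The p-value approach always reaches the same conclusion as the critical-value approach. Continuing with a hypothetical test statistic of z = 2.25, the two-sided p-value is the probability, under the null hypothesis, of a result at least this extreme:

```python
from statistics import NormalDist

z = 2.25                                       # assumed test statistic value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
print(round(p_value, 4))                       # about 0.024
# p < .05, so the result is statistically significant at the .05 level,
# but p > .01, so it is not significant at the .01 level.
```

This illustrates the weight-of-evidence reading: the smaller the p-value, the stronger the evidence against the null hypothesis, and comparing p to α reproduces the rejection decision.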

The specific hypothesis tests we will cover in this chapter all deal with a hypothesis about one population parameter.  First, we will study a test about a population mean that is based on the assumption that the population standard deviation σ is known.  This test employs the normal distribution.  Second, we will study a test about a population mean for the case in which σ is unknown.  We will learn that this test is based on the t distribution. Figure 9.18 presents a flowchart summarizing how to select an appropriate test statistic to test a hypothesis about a population mean.  Then we will present a test about a population proportion that is based on the normal distribution.  Next we will study Type II error probabilities, and how we can find the sample size needed to make both the probability of a Type I error and the probability of a serious Type II error as small as we wish.  We conclude by discussing the chi-square distribution and its use in making statistical inferences about a population variance.
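When σ is unknown, the sample standard deviation s takes its place and the statistic follows a t distribution with n − 1 degrees of freedom. A minimal sketch with ten hypothetical observations; the critical value 2.262 is the standard two-tailed t value for α = .05 and 9 degrees of freedom:

```python
from statistics import mean, stdev
from math import sqrt

# Ten hypothetical observations (assumed data, not from the text).
data = [5.1, 4.8, 5.3, 5.0, 4.9, 5.4, 5.2, 4.7, 5.0, 5.1]
hypothesized_mean = 4.8   # H0: mean = 4.8

n = len(data)
# t statistic: sample stdev s replaces the unknown sigma.
t = (mean(data) - hypothesized_mean) / (stdev(data) / sqrt(n))

t_crit = 2.262  # two-tailed critical value, alpha = .05, df = 9 (from t tables)
if abs(t) > t_crit:
    print(f"t = {t:.3f} > {t_crit}: reject H0")
else:
    print(f"t = {t:.3f} <= {t_crit}: fail to reject H0")
```

The only structural difference from the z test is the estimated standard deviation and the heavier-tailed t distribution; as n grows, the t critical value approaches the z critical value.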

In Chapter 19, we begin by discussing Bayes’ Theorem.  We will learn that this theorem is used to revise prior probabilities to posterior probabilities, which are revised probabilities based on new information.  We also see how to use a probability revision table (and Bayes’ Theorem) to update probabilities in a decision problem.  In Section 19.2 we present an introduction to decision theory.  We see that a decision problem involves states of nature, alternatives, payoffs, and decision criteria, and we will consider three degrees of uncertainty—certainty, uncertainty, and risk.  In the case of certainty, we learn which state of nature will actually occur.  Here we simply choose the alternative that gives the best payoff.  In the case of uncertainty, we have no information about the likelihood of the different states of nature.
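Bayes’ Theorem can be illustrated with a small, entirely hypothetical revision: a .35 prior that a state of nature occurs, and new sample information whose reliability figures are assumed. The posterior is the prior times the likelihood, divided by the total probability of the observed signal:

```python
# Hypothetical numbers chosen for illustration only.
prior_approved = 0.35           # P(approved): the prior probability
prior_not = 0.65                # P(not approved)
p_signal_given_approved = 0.80  # P(positive signal | approved), assumed
p_signal_given_not = 0.10       # P(positive signal | not approved), assumed

# Bayes' Theorem: posterior = prior * likelihood / total probability of signal
p_signal = (prior_approved * p_signal_given_approved
            + prior_not * p_signal_given_not)
posterior_approved = prior_approved * p_signal_given_approved / p_signal

print(round(posterior_approved, 3))  # -> 0.812, the revised probability
```

This is exactly the computation a probability revision table organizes row by row: prior, likelihood, their product (the joint probability), and the renormalized posterior.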

Here we will discuss two commonly used decision criteria: the maximin criterion and the maximax criterion.  In the case of risk, we are able to estimate the probability of occurrence for each state of nature.  In this case we will learn how to use the expected monetary value criterion.  We also learn how to construct a decision tree in Section 19.2, and we will see how to use such a tree to analyze a decision problem.  In Section 19.3 we will learn how to make decisions by using posterior probabilities.  We will also see how to perform a posterior analysis to determine the best alternative for each of several sampling results.  Then we will see how to carry out a preposterior analysis, which allows us to assess the worth of sample information.  In particular, we see how to obtain the expected value of sample information.  This quantity is the expected gain from sampling, which tells us the maximum amount we should be willing to pay for sample information.  We conclude by introducing utility theory to help make decisions.
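The three criteria can be sketched with a hypothetical payoff table: two alternatives, two states of nature, and made-up payoffs (these are illustrative numbers, not those of the Stevenson exercise above):

```python
# Hypothetical payoffs (in $1,000s) for two alternatives under two states of nature.
payoffs = {
    "renew lease": {"approved": 400, "not approved": 250},
    "relocate":    {"approved": 700, "not approved": -100},
}
p = {"approved": 0.35, "not approved": 0.65}   # assumed probabilities (risk case)

# Maximax (optimistic): pick the alternative whose best payoff is largest.
maximax = max(payoffs, key=lambda a: max(payoffs[a].values()))

# Maximin (pessimistic): pick the alternative whose worst payoff is largest.
maximin = max(payoffs, key=lambda a: min(payoffs[a].values()))

# Expected monetary value: probability-weighted payoff for each alternative.
emv = {a: sum(p[s] * v for s, v in states.items())
       for a, states in payoffs.items()}
best_emv = max(emv, key=emv.get)

print(maximax, maximin, best_emv, emv)
```

With these assumed numbers, maximax favors relocating (chasing the 700 payoff), maximin favors renewing the lease (its worst case, 250, beats −100), and the EMV criterion also favors renewing (302.5 versus 180). A decision tree organizes the same EMV computation graphically, with chance nodes averaging payoffs by probability and decision nodes taking the maximum.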
