TEST #1
Perform the following two-tailed hypothesis test, using a .05 significance level:
- Intrinsic by Gender
- State the null and an alternate statement for the test
- Use Microsoft Excel (Data Analysis Tools) to process your data and run the appropriate test.
- Copy and paste the results of the output to your report in Microsoft Word.
- Identify the significance level, the test statistic, and the critical value.
- State whether you are rejecting or failing to reject the null hypothesis statement.
- Explain how the results could be used by the manager of the company.
TEST #2
Perform the following two-tailed hypothesis test, using a .05 significance level:
- Extrinsic variable by Position Type
- State the null and an alternate statement for the test
- Use Microsoft Excel (Data Analysis Tools) to process your data and run the appropriate test.
- Copy and paste the results of the output to your report in Microsoft Word.
- Identify the significance level, the test statistic, and the critical value.
- State whether you are rejecting or failing to reject the null hypothesis statement.
- Explain how the results could be used by the manager of the company.
GENERAL ANALYSIS (Research Required)
Using your textbook or other appropriate college-level resources:
- Explain when to use a t-test and when to use a z-test. Explore the differences.
- Discuss why samples are used instead of populations.
The report should be well written and should flow well with no grammatical errors. It should include proper citation in APA formatting in both the in-text and reference pages and include a title page, be double-spaced, and in Times New Roman, 12-point font. APA formatting is necessary to ensure academic honesty.
Be sure to provide references in APA format for any resource you may use to support your answers.
Running head:  BUSN311 – Quantitative Methods and Analysis
Unit 4 – Hypothesis Testing & Variance
Type your Name Here
American InterContinental University
Abstract
This is a single paragraph; no indentation is required. The next page will be an abstract: “a brief, comprehensive summary of the contents of the article; it allows the readers to survey the contents of an article quickly” (Publication Manual, 2010). The length of this abstract should be 35-50 words (2-3 sentences). NOTE: The abstract must be on page 2, and the body of the paper will begin on page 3.
Introduction
Remember to always indent the first line of a paragraph (use the tab key). The introduction should be short (2-3 sentences). The margins, font size, spacing, and font type (italics or plain) are set in APA format. While you may change the names of the headings and subheadings, do not change the font or style of font.
Hypothesis Test #1 Looking at Intrinsic Satisfaction by Gender
Null and alternate hypotheses.
Write out a Null & Alternate Hypothesis (alpha = .05).
The test
Use Excel to perform the test. Paste the results in the document.
In a separate sentence, specifically identify the significance level (alpha), the test statistic, and the critical value.
State your decision
State whether you are rejecting or failing to reject the null hypothesis statement.
Explanation of decision made
Comment on why you are making your decision in terms of how the test statistic compares to the critical value or in terms of how the p value compares to the alpha.
Applications for managers
Discuss how the manager of the company could use this information specifically. Why is this information valuable?
Hypothesis Test #2 Looking at Extrinsic Satisfaction by Position
Null and alternate hypotheses
Write out a Null & Alternate Hypothesis (alpha = .05).
The test
Use Excel to perform the test. Paste the results in the document.
In a separate sentence, specifically identify the significance level (alpha), the test statistic, and the critical value.
State your decision
State whether you are rejecting or failing to reject the null hypothesis statement.
Explanation of decision made
Comment on why you are making your decision in terms of how the test statistic compares to the critical value or in terms of how the p value compares to the alpha.
Applications for managers
Discuss how the manager of the company could use this information specifically. Why is this information valuable?
Z and T Tests
Explain the difference between the Z and T test and when each one is used.
Samples and Populations
Explain the difference between a sample and the population. Why are samples used for hypothesis tests? Be sure to be specific.
Conclusion
Add some concluding remarks in about 2-3 sentences.
References
NOTE: The reference list starts on a new page after your conclusion.
For help with formatting citations and references using rules outlined in the APA Manual’s 6th Edition, please check out the AIU APA guide located under the Interactive Learning section on the left side of the course.
Examples:
American Psychological Association [APA]. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.
Association of Legal Writing Directors [ALWD]. (2005). ALWD citation manual: A professional system of citation (3rd ed.). New York, NY: Aspen Publishers.
Making Inferences
When data are collected, various summary statistics and graphs can be used for describing data; however, learning about what the data mean is where the power of statistics starts. For example, is there really a difference between two leading cola products? Hypothesis testing is an example of making these types of inferences on data sets.
Hypothesis Tests
Claims are made all the time, such as a particular light bulb will last a certain number of hours.
Claims like this are tested with hypothesis testing. It is a straightforward procedure that consists of the following steps:
1. A claim is made.
2. A value for probability of significance is chosen.
3. Data are collected.
4. The test is performed.
5. The results are analyzed.
Hypothesis tests are performed on the mean of the population, µ.
It is not possible to test the full population. For example, it would be impossible to test every light bulb. Instead, the hypothesis test is performed on a sample of the population.
Setting up a Hypothesis Test
When performing hypothesis testing, the test is set up with a null hypothesis (or claim) and the alternative hypothesis.
The null hypothesis (the claim) is
· what is being disputed
· represented by H0
The alternative hypothesis is
· what is being researched
· represented by H1
Example: If testing the claim that light bulbs last 3,500 hours, with the concern being that they last less than that, then the hypothesis is written as follows:
H0: µ = 3,500
H1: µ < 3,500
Question 1 – Multiple Choice: In the above example, which statement is the null hypothesis?
A.
B.
The correct answer is A. Choice A represents the null hypothesis; choice B represents the alternative hypothesis.
One-Tail and Two-Tail Hypothesis Testing
When hypothesis tests are set up, the researcher is either looking to see if there is a difference or if the values are too large or small. If the researcher is looking for a difference from what is being claimed, the test is a two-tail test. A two-tail test states that if the value is too far below or above the mean, then the null will be rejected. One-tail tests are tests that are concerned with values that are greater than (right-tail) or less than (left-tail) the mean.
In the light bulb example, which type of test is used when the alternative hypothesis is that the bulbs last less than 3,500 hours?
A. Two-tail
B. One-Tail (Right tail)
C. One-Tail (Left-tail)
The correct answer is C. One-Tail (Left-tail). For the light bulb test, the test results are only significant if the bulbs last less than 3,500 hours.
Outcomes of Hypothesis Testing
When a hypothesis test is performed, the claim is always assumed to be true unless proven otherwise. There are two outcomes for a hypothesis test:
Reject the claim
Fail to reject the claim 
Note: You never state that you accept the null hypothesis. You can only state that there is not enough evidence to reject it.
Level of Significance
Before a hypothesis test is made, one must decide the level of significance. To understand the level of significance, it is important to first understand the errors that can occur when testing. Remember that the sample is being tested, not the entire population.
There are two types of errors that can occur: Type I and Type II.
· Type I Error: Rejecting the null erroneously. The null should not have been rejected.
· Type II Error: Erroneously failing to reject the null. The null should have been rejected.
In the light bulb example, the testing showed that the light bulbs did not perform for 3,500 hours. The tester rejected the null hypothesis, which stated that the light bulbs last 3,500 hours.
Remember the statement:  
It was then found that the test was performed on a batch of faulty light bulbs. These light bulbs did not represent the average light bulb, so the test results were not accurate.
Which type of error occurred?
A. Type I
B. Type II
The correct answer is A. Type I. The null hypothesis was rejected; however, it was rejected erroneously. The tests were performed on a faulty batch of light bulbs that did not represent the average light bulb.
Level of Significance
The significance level of the test is the risk you are willing to take of rejecting the null hypothesis, when, in fact, it is true.
The level of significance is given in terms of alpha. Typical values are .01 and .05; .01 is the stricter of the two.
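The link between alpha and the Type I error rate can be checked with a small simulation. The sketch below is only an illustration: the population values (a true mean of 3,500 hours, a standard deviation of 100, samples of 30 bulbs) and the function name are assumptions made for the example, not values from the course material. Because the null hypothesis is true in every simulated sample, each rejection is a Type I error.

```python
import random
from statistics import NormalDist


def type_one_error_rate(alpha, trials=2000, n=30, mu=3500.0, sigma=100.0, seed=1):
    """Estimate how often a TRUE null hypothesis is rejected at level alpha."""
    rng = random.Random(seed)
    # Left-tail cutoff on the standard normal curve (about -1.645 for alpha = .05).
    critical_z = NormalDist().inv_cdf(alpha)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where the claim (mean = mu) really holds.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu) / (sigma / n ** 0.5)
        if z < critical_z:
            rejections += 1  # the null was true but was rejected: a Type I error
    return rejections / trials


rate_05 = type_one_error_rate(alpha=0.05)
rate_01 = type_one_error_rate(alpha=0.01)
print(f"alpha = .05: rejected a true null in {rate_05:.1%} of trials")
print(f"alpha = .01: rejected a true null in {rate_01:.1%} of trials")
```

The two rates should land near 5% and 1%, which is what it means for .01 to be the stricter level.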
Performing the Test
After the hypothesis has been set up and the significance level has been set, data are collected. After the data are collected, the sample mean is calculated and is tested.
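Testing includes finding the P-value, which is the probability of obtaining a sample value at least as extreme as the one observed if the claimed population mean is true.
P-values are calculated using the z-test. First, the value z is found with the following formula.
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Following are descriptions of the parts of this formula.
· $\bar{x}$ is the sample mean
· $\mu$ is the claimed (hypothesized) population mean
· $\sigma$ is the standard deviation
· $n$ is the sample size
The P-value is then found by looking up z on a normal table, or it is calculated using software.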
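As a rough illustration of the software route, the sketch below computes z and a left-tail P-value with Python's standard library. The claim of 3,500 hours comes from the light bulb example; the sample mean, standard deviation, and sample size are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, claimed_mean, std_dev, n):
    """z = (sample mean - claimed mean) / (standard deviation / sqrt(n))."""
    return (sample_mean - claimed_mean) / (std_dev / sqrt(n))


claimed_mean = 3500   # the claim being tested (from the light bulb example)
sample_mean = 3450    # hypothetical sample result
std_dev = 200         # hypothetical standard deviation
n = 64                # hypothetical sample size

z = z_statistic(sample_mean, claimed_mean, std_dev, n)
# Left-tail P-value: the area under the standard normal curve below z.
p_value = NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

With these assumed numbers, z works out to -2.00 and the P-value to about 0.0228, the same figure a standard normal table gives for z = -2.00.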
Using Z-test versus t-Test
When the sample size is 30 or greater, the normal table (z-test) that was just described is used; however, when the sample size is less than 30, the p-values are found in a t distribution table. When looking up values in a t table, the value called degrees of freedom is used. The degrees of freedom are the sample size (n) minus 1.
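The rule of thumb above can be written as a small helper. This is only a sketch of the sample-size rule as stated here; the function name `choose_test` is made up for the illustration.

```python
def choose_test(sample_size):
    """Return the test suggested by the sample-size rule and its degrees of freedom."""
    if sample_size < 2:
        raise ValueError("at least two observations are needed")
    if sample_size >= 30:
        return "z", None              # normal table; degrees of freedom not needed
    return "t", sample_size - 1       # t table with n - 1 degrees of freedom


print(choose_test(15))   # a sample of 15 uses the t table with 14 degrees of freedom
print(choose_test(30))   # a sample of 30 uses the normal table
```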
Hypothesis Test – Decision
When the P-value is calculated, it is compared with the level of significance to make the decision on whether to reject the null hypothesis. In the next exercise, Part 2, you will have a chance to see in detail how this decision is made.
Practice: One Population Parameter
The following data set gives the speeds taken this year from 15 random cars driving on a road with a speed limit of 55 mph. The authorities have given data that states the average speed recorded from the year before in the same area was 75 mph. You feel that the average is lower this year because so many speeders were seen receiving tickets the year before.
Using the sample data, we will set up a hypothesis test based on the claim. Carefully follow each of the steps involved in the test. We will use a significance level of 0.05.
This activity also gives instructions on how to use Microsoft Excel for solving this problem, so you can do this in Excel if you wish. This will be a big help in solving statistics equations and in future assignments.
Following is a list of the 15 recorded speeds: 60, 75, 80, 50, 45, 45, 55, 80, 75, 60, 50, 50, 50, 55, 55. In Excel, these should be put in a column labeled “Recorded Speed (mph)”
The first step in hypothesis testing is to set up the null and alternate hypothesis.
Question 1 – Multiple Choice: Based on the scenario just described, which of the two would be a null hypothesis?
A
B.
The correct answer is A.
Recall that the null hypothesis states the claim. In this example, the claim was that the average speed was 75 mph (H0: µ = 75).
The next step in hypothesis testing is setting up the alternative hypothesis.
Question 2 – Multiple Choice: Based on the scenario, which of the two choices given would be the alternative hypothesis?
A
B
The correct answer is A.
The dispute to the claim is represented by the alternative hypothesis. In this case, your prediction was that the average speed was less than 75 mph (H1: µ < 75).
Question 3: After setting your null and alternative hypothesis, you must calculate the sample mean.
The recorded speeds (mph) are (60, 75, 80, 50, 45, 45, 55, 80, 75, 60, 50, 50, 50, 55, 55). The sample mean is found by adding up all the speeds in the sample and dividing by the number (n) of speeds in the sample. Which of the following is the sample mean?
A. 59 mph
B. 885 mph
C. 50 mph
D. 80 mph
The correct answer is A. 59 mph. This is the answer when all 15 speeds are added together and divided by n – in this case, 15.
The mean can be calculated in Microsoft Excel. You would do this by typing the following formula into a cell: =AVERAGE(B3:B17). This formula shows that the dataset that contains the speeds starts in cell B3 and ends in B17.
When you hit “Enter”, the mean will appear. In this example, the mean is 59.
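For readers who want to double-check the Excel result, the same calculation can be sketched in Python using the 15 recorded speeds from this exercise.

```python
from statistics import mean

# The 15 recorded speeds (mph) from the exercise.
speeds = [60, 75, 80, 50, 45, 45, 55, 80, 75, 60, 50, 50, 50, 55, 55]

total = sum(speeds)          # add up all the speeds
n = len(speeds)              # number of speeds in the sample
sample_mean = mean(speeds)   # same as total / n

print(f"sum = {total}, n = {n}, sample mean = {sample_mean} mph")
```

The sum is 885 and the mean is 59, which matches answer A (answer B, 885, is the sum before dividing by n).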
Recall that there were 15 total speeds recorded. After finding the sample mean of the 15 speeds (59), the sample standard deviation can be found using the following formula:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Study this formula. Here is how it works.
To obtain the standard deviation, you make a list of each of the x values (in this case, the 15 recorded speeds) and subtract the mean (59 mph) from each x value. Then, square each of the values in the list and add them together to get the sum. For this example, the sum is 2160. Then, divide this sum by 14 (this is the degrees of freedom, which is calculated by subtracting 1 from the sample size of 15): 2160 / 14 = 154.29. Finally, take the square root of 154.29 to arrive at 12.42.
The steps to calculate the sample standard deviation can be completed in Microsoft Excel.
Here is an example of how to find the standard deviation.
Step 1: Type the following formula into a cell – “=B5-59”. This formula represents the X-Value recorded in cell B5 – which is 80 mph. 59 is the mean. When you hit enter, 59 will be subtracted from the value in cell B5 (80). This can be done for all 15 speeds.
Once you calculate the recorded speed minus the mean for all the recorded speeds, they should appear in Excel in their own column, with the heading (Recorded speed – x values) – Mean.
Step 2: The next step is to square the difference between the x values and the mean. This can be completed by filling out another column in Excel. The formula is “=C3^2”. C3 is the cell that holds the recorded speed minus the mean, and ^2 tells Excel to square that value. This formula can be copied and pasted into the remaining cells.
Step 3: The next step is to find the sum of the squared differences. To do this in Excel, you would find the sum of the column of values calculated in Step 2. The formula used is “=SUM(D3:D17)”. This shows that cells D3 through D17 contain the squared values. This formula would be typed into the cell beneath the dataset (D19). When this is done, the result is 2160.
Step 4: The next step in the standard deviation formula is to divide the sum just found by (n -1). Remember that the example has 15 recorded speeds, so n is equal to 15.
In Excel you can take the sum of 2160 and divide by (n – 1), or 14. The following formula would be typed into a blank cell: “=2160/14”. Excel will calculate the value to be 154.2857143.
Step 5: To complete the standard deviation formula, you need to find the square root of 154.2857143. In Excel, this would be done by typing “=SQRT(D22)”. Note that D22 contains the number calculated in the last step: 154.2857143. Excel then calculates the square root of this number to be 12.42118007. This can be rounded to 12.42, which is the sample standard deviation.
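The five Excel steps can be mirrored line by line in Python as a cross-check. This is a sketch that follows the steps above; the last line compares the result with Python's built-in sample standard deviation, which uses the same n – 1 divisor.

```python
from math import sqrt
from statistics import stdev

speeds = [60, 75, 80, 50, 45, 45, 55, 80, 75, 60, 50, 50, 50, 55, 55]
n = len(speeds)
sample_mean = sum(speeds) / n                        # 59

deviations = [x - sample_mean for x in speeds]       # Step 1: x value minus the mean
squared = [d ** 2 for d in deviations]               # Step 2: square each difference
sum_of_squares = sum(squared)                        # Step 3: sum of squared differences
variance = sum_of_squares / (n - 1)                  # Step 4: divide by n - 1 (14)
std_dev = sqrt(variance)                             # Step 5: take the square root

print(f"sum of squares = {sum_of_squares:.0f}")
print(f"variance = {variance:.7f}")
print(f"standard deviation = {std_dev:.2f}")
print(f"statistics.stdev gives {stdev(speeds):.2f}")
```

The intermediate values should match the worksheet: 2160, then 154.2857143, then 12.42.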
Calculating the t-Value
Once the standard deviation is found, you can use the following formula to find the t-value.
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
This formula says: t = (sample mean minus the mean of the population) divided by (the standard deviation divided by the square root of the sample size).
Following are the values that should be used in this formula:
The Sample Mean is 59
The Mean of the population is 75
The Standard Deviation is 12.42
The Sample size is 15.
This can be entered into Excel as follows:
=(59-75)/(12.42/SQRT(15))
The value for this formula is -4.99. When using the t-table for a one-tailed t-test with a 0.05 significance level and 14 degrees of freedom, you should arrive at a critical value of -1.761. Because -4.99 is less than -1.761, the null hypothesis is rejected.
Note: For a left, one-tailed test, as in this example, the t-table lists the critical value as a positive number. Put a negative sign in front of that number, because the left one-tailed test is testing in the negative direction.
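The whole speed example can be pulled together in a few lines of Python. This is a sketch, not a replacement for the Excel output the assignment asks for; the critical value of -1.761 is typed in from the t-table (one tail, 0.05 significance, 14 degrees of freedom) because Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

speeds = [60, 75, 80, 50, 45, 45, 55, 80, 75, 60, 50, 50, 50, 55, 55]
claimed_mean = 75      # last year's average speed (the null hypothesis)
critical_t = -1.761    # from the t-table: left tail, alpha = 0.05, 14 degrees of freedom

n = len(speeds)
t_value = (mean(speeds) - claimed_mean) / (stdev(speeds) / sqrt(n))

# Left-tail test: reject the null when t falls below the (negative) critical value.
reject_null = t_value < critical_t

print(f"t = {t_value:.2f}, critical value = {critical_t}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```

The t-value rounds to -4.99, well below -1.761, so the sample supports the prediction that the average speed is lower than 75 mph.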
Read It
Steps in Hypothesis Testing
· Step One: Set up the test parameters (mean)
· For example, state that there is no difference between what is claimed and what is examined with the data.
· The test is set up with a null (claim) and an alternative (being researched) hypothesis.
· Step Two: Decide on how significant the test needs to be.
· The more valuable test results will require a more strict rejection value.
· Step Three: Collect the data for examination.
· Representative sampling techniques are used for collecting the data.
· Step Four: Analyze the claim versus the actual data collected.
· Step Five: Once the data have been analyzed, the conclusion can be given.
Type I and Type II Errors
· Type I errors occur when the null hypothesis has been rejected incorrectly.
· The level of significance (alpha level) can be very strict to try not to have this type of error occur.
· However, the tradeoff is that a lower chance of a Type I error gives a higher chance of a Type II error.
· A Type II error occurs when the null hypothesis was incorrectly not rejected; that is, the null was false but was kept.
· Both errors can occur from faulty data or a poor sample.
Deciding the level of significance (alpha values)
· The level of significance is determined by the strictness of the test’s outcome.
· For example, if a test is designed to ensure the safety of an airplane, the results need to be stricter than a test that is determining if the correct amount of potato chips is in a particular bag.
P-values
P-values are typically more prevalent when reporting the results of a hypothesis test because they give the probability that a value at least as extreme as the sampled value would occur given the claim of the test. A higher probability means that the sampled data are consistent with the claim, whereas a small probability suggests that either the sampled data are bad or the claim is false. P-values are found either with a distribution table or with computer software.
One-Tail and Two-Tailed Hypothesis Tests
When hypothesis tests are set up, the researcher is either looking to see if there is a difference or if the values are too large or too small. If the researcher is looking for a difference from the claim, this is a two-tailed hypothesis test. The two-tailed test states that if the value is too far below or above the mean, then the null will be rejected. In a one-tail test, however, the null may be rejected only if the value is too far below the claim or, in a separate test, too far above the claim. The word tail refers to the part of the distribution that is cut off: both tails, only the top tail, or only the bottom tail. In the following figure, the shaded region signifies the rejection area:
For example, the researcher may want to know if there is a difference (two-tail) in a student’s score, either being very high or very low versus the rest of the class. However, for a light bulb, the researcher may only care if it is not lasting as long as the manufacturer reports (one-tail—left). Quality control may also be interested if an item has been overstocked in packaging (one-tail—right).
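The choice of tail changes the P-value that goes with the same test statistic. The sketch below shows this for a z value; the function name and the z value of -1.8 are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def p_value_from_z(z, tail):
    """P-value for a z statistic; tail is 'left', 'right', or 'two'."""
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z)                     # area below z
    if tail == "right":
        return 1 - std_normal.cdf(z)                 # area above z
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))      # area in both tails
    raise ValueError("tail must be 'left', 'right', or 'two'")


z = -1.8  # hypothetical test statistic
left = p_value_from_z(z, "left")
right = p_value_from_z(z, "right")
two = p_value_from_z(z, "two")

print(f"left-tail: {left:.4f}, right-tail: {right:.4f}, two-tail: {two:.4f}")
```

With this assumed z value and a 0.05 level of significance, the left-tail test rejects the null (about 0.036) while the two-tail test does not (about 0.072), which is why the tail must be decided when the test is set up.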
Z value
The z value is the test statistic that is calculated from sampled data. The z value is calculated by subtracting the hypothesized mean from the mean of the sample data and then dividing by the standard error, where the standard error is given by the standard deviation divided by the square root of the sample size, as in the following equation:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Once the test statistic has been calculated, the value is looked up in a standard normal table. The value that is found in the table gives the p-value. Then, the decision for rejecting or not rejecting the null hypothesis can be made by comparing the p-value from the table with the level of significance.
Difference Between z and t Distributions
The t distributions are used when the sample size is fewer than 30 observations. The t distribution requires looking up the value in a t table with the known degrees of freedom. The degrees of freedom are found by n – 1, or one less than the sample size.
Comparing Two Means for a Hypothesis Test
When comparing two means, the null hypothesis is set up as the two means being equal, meaning there is no statistically significant difference in their values, whereas the alternative can be set up as one-tail or two-tail, depending on whether the researcher is examining a difference or a high or low value. For example, the following hypothesis setup states that the alternative is claiming the mean values are not the same:
H0: µ1 = µ2
H1: µ1 ≠ µ2
The test statistic, or z value, is found by taking the difference of the sample means minus the hypothesized difference of the means and dividing by the standard error, as in the following equation:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Note that if there is not a difference in the hypothesized means, then the value of $\mu_1 - \mu_2 = 0$, meaning that $\mu_1 = \mu_2$.
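A two-mean comparison can be sketched in Python using one common form of the two-sample z statistic, in which the standard error is the square root of the sum of each group's variance divided by its sample size. The group means, standard deviations, and sample sizes below are hypothetical; they are not taken from the assignment's data set.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2, hypothesized_diff=0.0):
    """z for the difference of two sample means."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Hypothetical summary statistics for two groups.
z = two_sample_z(mean1=5.2, mean2=4.9, sd1=0.9, sd2=1.2, n1=36, n2=64)

# Two-tail P-value, because the alternative is "the means are not the same".
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
alpha = 0.05

print(f"z = {z:.3f}, two-tail P-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

With these assumed numbers, z is about 1.414 and the P-value is about 0.157, so the null hypothesis of equal means would not be rejected at the 0.05 level.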
Sample Problem 1
According to the city of Shumaker Falls, Texas, the average monthly rent for a one bedroom apartment is $875. A real estate agent claims that this average has decreased since the information was calculated. Determine the null and alternative hypothesis, and explain what is meant by making a Type I or Type II error.
Answer
Set up the hypothesis with the following equations:
H0: µ = 875
H1: µ < 875
Note that the alternative is less than because the real estate agent is claiming that the rent has decreased.
A Type I error would occur if the test concludes that the rent has decreased when it really has not. This may occur if the sample was not representative of the true population of apartment rentals in Shumaker Falls; for example, if it consisted only of low-rent apartments in a particular low-rent area.
A Type II error would occur if the test failed to show a decrease in the average monthly rent when there really was one. Again, this could occur if a poorly representative sample was taken; for example, one drawn mostly from higher-rent apartments.
Sample Problem 2
If the p-value for a hypothesis test is 0.0327 and the level of significance is 0.05, what is your decision based on the test? Interpret the p-value.
Answer
The p-value of 0.0327 is less than the level of significance of 0.05. This means that, if the claim were true, a sample result at least this extreme would occur only about 3.27% of the time, which is a very small probability. Thus, the null hypothesis for the test would be rejected.
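The decision rule used in this answer is simple enough to express as a short Python function. This is a sketch; the function name `decide` is made up for the illustration.

```python
def decide(p_value, alpha=0.05):
    """Compare the p-value with the level of significance and return the decision."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must be between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"


decision_05 = decide(0.0327, alpha=0.05)   # Sample Problem 2
decision_01 = decide(0.0327, alpha=0.01)   # same p-value, stricter level

print(f"alpha = .05: {decision_05}")
print(f"alpha = .01: {decision_01}")
```

The same p-value leads to rejection at the 0.05 level but not at the stricter 0.01 level, which is why the level of significance has to be chosen before the test is run.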
Read It
Hypothesis testing is one of the fundamental concepts in both scientific research and business decision making. It involves establishing a hypothesis (an educated guess) about the outcome of an event or experiment and then gathering evidence (data) to decide whether the hypothesis should be accepted or rejected. Another way to think of it is that the hypothesis is an opinion about the value of something for a population; then, data are gathered from a representative sample from that population to make a decision as to whether the opinion is true.
Here are some examples of business problems that can be addressed using hypothesis testing:
· Should we change our advertising campaign? Does the new campaign have higher recall scores than the current campaign?
· Do we need to change a critical machine part more often (because downtime exceeds a certain threshold)?
· Do the computers from Manufacturer A really run faster than the computers from Manufacturer B?
· Does Trucking Company D provide more reliable service than Trucking Company E (does it have more on-time deliveries)?
The first step in conducting a hypothesis test is to formulate a statement that describes an expectation or assumption to test. The next step is to derive a statement that is the opposite. The opposite statement is called the null hypothesis and is represented as H0 (read as H sub zero, H for hypothesis and 0 for no difference). The null hypothesis usually has “not” or “no different from” in it. The alternative hypothesis (H1, H sub one) is the expectation or assumption.
Here are the null and alternative hypotheses for the four examples above:
H0: Recall scores for the new ad campaign are not better or higher than those from the current campaign.
H1: Recall scores for the new ad campaign are better or higher than those from the current campaign.
H0: Machine X is not malfunctioning more often than y times per week.
H1: Machine X is malfunctioning more often than y times per week.
H0: Computers from Manufacturer A do not run faster than those from Manufacturer B.
H1: Computers from Manufacturer A run faster than those from Manufacturer B.
H0: Trucking Company D does not have more on-time deliveries than Trucking Company E.
H1: Trucking Company D has more on-time deliveries than Trucking Company E.
The null hypothesis is that which is tested. Keep in mind that the result of a hypothesis test is that you either reject or fail to reject the null hypothesis. Rejecting H0 does not prove that the alternative hypothesis is true, just that there is evidence beyond a reasonable doubt that it is true; that is, the result is unlikely to be due to mere chance. When the evidence is not that strong, conclusions are phrased as “the results do not allow us to reject H0” or “we fail to reject H0.” This means that there is not enough evidence to support the alternative hypothesis, not that the null hypothesis has been proven.
The significance level of the test is the risk you are willing to take of rejecting the null hypothesis when, in fact, it is true. This level must be decided before you collect data and is related to how large a sample you will need. Hypothesis testing relies on a sample and not on the entire population, so there is a chance for two kinds of errors:
· Type I error: rejecting the null hypothesis when it is true (we think there is a difference when there is not); this tends to result in lost profits because we make a decision to act (usually spending money and other resources) when we should not have. The test’s significance level measures the risk of this error (the most common level is 5%, which corresponds to 95% confidence).
· Type II error: failing to reject the null hypothesis when it is false (we think there is no difference when there really is); this tends to result in lost opportunity because we do not proceed with something we should have. The probability of this error is called beta, and the test’s power equals 1 – beta.
The following chart helps distinguish the Type I and Type II errors:
| Actual situation | Fail to reject H0 | Reject H0 |
| H0 is really true | Correct decision | Type I error | 
| H0 is really false | Type II error | Correct decision | 
Type I and Type II errors for the examples follow:
| Question | Type I | Type II |
| Should we change our advertising campaign? Does the new campaign have higher recall scores than the current campaign? | Test says new campaign is better, so we go ahead with that. It actually turns out to be worse, and we lose sales. | Test says new campaign is no better than the current one, so we do not change campaigns. The new campaign was actually better, and we would have had higher sales if we had changed. |
| Do we need to change a critical machine part more often (because downtime exceeds a certain threshold)? | Test says downtime exceeds the threshold, so we change the part more often. Downtime really does not exceed the threshold, so money and time are wasted changing the part more often than is necessary. | Test says downtime does not exceed the threshold, so part is not changed more often. Downtime actually does exceed the threshold, and the line is down more often than it would be if we changed the part at the right time. |
| Do the computers from Manufacturer A really run faster than the computers from Manufacturer B? | Test says computers from Mfr A run faster than those from Mfr B, so we switch to Mfr A. They really do not run any faster, so we wasted money switching to Mfr A. | Test says computers from Mfr A do not run faster than those from Mfr B, so we do not switch to Mfr A. We forgo productivity gains because computers from Mfr A really do run faster. |
| Does Trucking Company D provide more reliable service than Trucking Company E (do they have more on-time deliveries)? | Test says Trucking Company D has a higher proportion of on-time deliveries, so we switch to them. It turns out they really don’t have more on-time deliveries, so we switched from Trucking Company E for no reason and could be losing if Trucking Company D has a lower (as opposed to the same) percentage of on-time deliveries than Company E. | Test says Trucking Company D does not have a higher proportion of on-time deliveries than Trucking Company E, so we keep our business with Trucking Company E. Trucking Company D does have better on-time delivery performance, so we miss out on productivity gains. |
You must decide whether you need a one-tailed or two-tailed test when designing a hypothesis test. If you want to know if one thing is just different from the other (either greater or less, faster or slower, and so on), use a two-tailed test. If you are testing to see whether one is better or worse than the other, then use a one-tailed test. One-tailed tests are more common. (Statistics software will have options for one- or two-tailed tests, so you must choose the right one.)
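| Question | One- or two-tailed test? | Why? |
| Should we change our advertising campaign? Does the new campaign have higher recall scores than the current campaign? | One-tailed | Does the new campaign have higher scores than the current campaign, not higher or lower scores? |
| Do we need to change a critical machine part more often (because downtime exceeds a certain threshold)? | One-tailed | Is the downtime greater than some threshold, not just if it is different (greater or less) than that threshold? |
| Do the computers from Manufacturer A really run faster than the computers from Manufacturer B? | One-tailed | Do Mfr A computers run faster, not faster or slower? |
| Does Trucking Company D provide more reliable service than Trucking Company E (does it have more on-time deliveries)? | One-tailed | Does Trucking Company D have a higher proportion of on-time deliveries than Trucking Company E, not a higher or lower proportion? |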
Data Collection
Businesses and organizations of all types produce significant quantities of data. The majority of these data are generated from normal business operations. For example, each sales transaction of a beverage manufacturer will contain data about the date of the sale, the product sold, the amount of the sale, the customer, the region of the sale, and perhaps the salesperson. These are primary data, and collection of this type of data typically occurs automatically within the organization through the organization’s databases that store the transactions. Other primary data are collected to measure operational performance toward specific objectives. For example, a package delivery company may track driver safety performance over a period of time over certain routes during each day of the week. A copper mining company may collect information on the weight of ore hauled by each truck to understand if truckloads are optimized. A retail store may collect data on customer traffic in the store each hour. Credit card companies may measure how quickly customer service calls are answered.
In addition to company- and transaction-specific data, organizations and businesses frequently collect financial and market data about competitors, suppliers, customers, and others to guide effective decision making. This data collection effort occurs outside the business and might involve tools such as surveys, interviews, and questionnaires. For example, a manufacturer of appliances might ask its suppliers to produce data on financial health, safety records, and measures of product quality. A consumer goods company may distribute a new product in a defined region and follow up with telephone interviews to assess the product’s performance and attractiveness. National Family Opinion is a national organization that uses the Internet and mail surveys to understand consumer preferences and reactions to specific grocery and nongrocery items. Other sites collect data on political opinion for use by political campaigns and journalists. A financial services company that is interested in acquiring another financial institution would require data on the acquisition target’s customer base, branches, and deposits. If any of these data are collected by a third party, such as an independent market research group, or if the data are purchased (such as a customer list), the data are considered secondary data. Whether the data are primary or secondary, users must always be aware of the collection tools and data sources to ensure that the data are fairly measured and accurate. Having correctly measured data from unbiased samples or sources is critical to correct decision making.
Graphs, Charts, and Tables
Graphs, charts, and tables are tools to help organize and aggregate data upon collection. Each tool helps transform the data into information useful in decision making. Graphs and charts visually present summaries and relationships. For example, a pie chart could visually summarize the percentage of sales dollars in each sales region for a beverage distributor. A frequency distribution could identify the number of salespeople who achieved specific performance targets (e.g., sales or sales growth). A bar chart might compare the quantity of ore hauled by different types of trucks in dry versus wet weather to estimate the effect of weather on mining production. Line charts or graphs frequently show how performance changes over time. For example, a financial services company might plot a line chart of the growth in deposits by type over time for its acquisition target. Another graph might identify a relationship between the change in sales of soft drinks versus bottled water over the last 5 years, or the change in sales at different levels of marketing. Unlike the visual relationships presented in charts and graphs, tables present summary data in columns and rows, leaving the user to identify and understand relationships. Presentation of data in tables allows users to better understand the data points, range, and distribution characteristics. What is the preferred method of presenting collected data? A combination of visual charts and graphs and tabular presentations ideally describes a set of data and explains the relationships. Pictures minimize the need to explain relationships verbally. However, tabular presentations of data allow users to develop their own understanding of the data and relationships.
Numerical Measures
Numerical measures describe data samples and populations to better understand the central point and spread of the data. Numerical measures such as the mean, median, and mode describe the central tendency or common point of the data. Variance and standard deviation describe the spread of the data. The variance is the square of the standard deviation. Because the variance is expressed in squared units, the standard deviation is more commonly reported, since it is in the original units of the data.
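As a rough illustration of these measures, the sketch below uses Python's standard `statistics` module on a small set of made-up daily sales figures. The data values and variable names are assumptions for illustration only, and the sample (n - 1) versions of variance and standard deviation are used.

```python
import statistics

# Hypothetical daily sales in dollars (illustrative values only)
sales = [200, 220, 250, 250, 280]

mean_sales = statistics.mean(sales)      # arithmetic average
median_sales = statistics.median(sales)  # middle value of the sorted data
mode_sales = statistics.mode(sales)      # most frequent value
variance = statistics.variance(sales)    # sample variance, in dollars squared
std_dev = statistics.stdev(sales)        # sample standard deviation, in dollars

print(f"mean = {mean_sales}, median = {median_sales}, mode = {mode_sales}")
print(f"variance = {variance} (dollars squared)")
print(f"standard deviation = {std_dev:.2f} (dollars)")
```

Squaring the printed standard deviation recovers the variance, and only the standard deviation is in the same units (dollars) as the sales figures themselves.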
These measures together allow users to make inferences about the data that can be used to answer questions and make decisions with a statistical measurement of confidence. For example, consider a set of data collected to understand the effect of a smoking ban on restaurant sales. The mean sales is the average level of sales. The median sales is the sales point where half of all observations are greater than the median and half are below the median. The most common data point defines the mode. Typically, the mode is more commonly reported among discrete or categorical data rather than continuous data. For example, if restaurant patrons rate service quality on a scale of 1 (poor) to 5 (excellent) with the following distribution, then the mode is 4.
| Rating | Percent of Responses | 
| 1 | 5% | 
| 2 | 15% | 
| 3 | 30% | 
| 4 | 40% | 
| 5 | 10% | 
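The mode of the rating distribution above can be checked with a short Python sketch. The dictionary name `rating_share` is hypothetical; the percentages come from the table. The weighted mean is an extra calculation included only for comparison and is not part of the original example.

```python
# Percent of responses for each rating, taken from the table above
rating_share = {1: 5, 2: 15, 3: 30, 4: 40, 5: 10}

# The mode is the rating with the largest share of responses
mode_rating = max(rating_share, key=rating_share.get)

# Weighted mean rating, using each percentage as a weight (extra, for comparison)
total_share = sum(rating_share.values())
mean_rating = sum(rating * pct for rating, pct in rating_share.items()) / total_share

print(f"mode = {mode_rating}, weighted mean = {mean_rating:.2f}")
```

The mode (4) is an actual rating that respondents chose, while the weighted mean falls between two ratings, which is one reason the mode is often reported for categorical scales like this one.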
Although data sets may have similar central points, the variance and standard deviation describe how spread out each distribution is. Do a significant number of data points remain around the central tendency points, or are the data points spread out on one or both sides of the center point? For example, although average restaurant sales may be the same before and after a smoking ban, sales may be found to be more spread out before the ban than after it. Rather than concluding that the ban had no effect, a difference in the variance of sales may suggest that a greater variety of customers are attracted to the restaurants after the ban.
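The smoking-ban scenario can be illustrated with a minimal Python sketch. The two lists below are invented numbers chosen so that the means match while the spreads differ; they are not real restaurant data.

```python
import statistics

# Hypothetical weekly sales in thousands of dollars (illustrative values only)
before_ban = [30, 40, 50, 60, 70]
after_ban = [45, 48, 50, 52, 55]

# Compare center (mean) and spread (sample standard deviation) for each period
for label, sales in (("before ban", before_ban), ("after ban", after_ban)):
    mean_sales = statistics.mean(sales)
    spread = statistics.stdev(sales)
    print(f"{label}: mean = {mean_sales}, standard deviation = {spread:.2f}")
```

Both periods have the same mean, so a comparison of averages alone would show no change; the difference appears only in the standard deviations.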
