MAT 243 Project One Summary Report
[Full Name]
Notes:
• Replace the bracketed text on page one (the cover page) with your personal information.
• You will use your selected team for all three projects.
1. Introduction: Problem Statement
Discuss the statement of the problem in terms of the statistical analyses that are being performed. In
your response, you should address the following questions:
● What is the problem you are going to solve?
● What data set are you using?
● What statistical methods will you be using to do the analysis for this project?
Answer the questions in a paragraph response. Remove all questions and this note before
submitting! Do not include Python code in your report.
2. Introduction: Your Team and the Assigned Team
In this project, you picked a team and you were assigned a team to do comparative analysis.
See Steps 1 and 2 in the Python script to address the following items:
● What team did you pick and what years were picked to do the analysis?
● What team and range of years were you assigned for the comparative study? (Hint: This is called
  the assigned team in the Python script.) Present this information in a formatted table as shown
  below.

Table 1. Information on the Teams
              Name of Team          Assigned Years
1. Yours      Team (e.g. Knicks)    XXXX-YYYY (e.g. 2013 – 2015)
2. Assigned   Team (e.g. Bulls)     XXXX-YYYY (e.g. 2013 – 2015)

Answer the questions in a paragraph response. Remove all questions and this note (but not the
table) before submitting! Do not include Python code in your report.
3. Data Visualization: Points Scored by Your Team
In the Python script, you created a visualization for the distribution of points scored by your team.
See Step 3 in the Python script to address the following items in a paragraph response:
● In general, how is data visualization used to study data distributions and trends?
● In this activity, you were asked to pick one of the two plots that best describes the data
  distribution of the variable for your team. Include a screenshot of this plot in your report.
● Why did you pick this plot? Explain.
● What can you say about the distribution of the variable by visually inspecting this plot? What
  does this signify?
Answer the questions in a paragraph response. Remove all questions and this note before
submitting! Do not include Python code in your report.
4. Data Visualization: Points Scored by the Assigned Team
In the Python script, you created a visualization for the distribution of points scored by the assigned
team.
See Step 4 in the Python script to address the following items in a paragraph response:
● In this activity, you were asked to pick one of the two plots that best describes the data
  distribution of the variable for the assigned team. Include this plot in your report.
● Why did you pick this plot? Explain.
● What can you say about the distribution of the variable by visually inspecting this plot? What
  does this signify?
Answer the questions in a paragraph response. Remove all questions and this note before
submitting! Do not include Python code in your report.
5. Data Visualization: Comparing the Two Teams
In the Python script, you created a visualization for the difference in the distributions of points scored by
your team and the assigned team.
See Step 5 in the Python script to address the following items in a paragraph response:
● In general, how is data visualization used to compare two different data distributions?
● In this activity, you were asked to pick one of the two plots that best compares the data
  distributions of your team with the assigned team. Include a screenshot of this plot in your
  report.
● Why did you pick this plot? Explain.
● How do the two distributions compare to each other?
Answer the questions in a paragraph response. Remove all questions and this note before
submitting! Do not include Python code in your report.
6. Descriptive Statistics: Points Scored By Your Team in Home Games
In the Python script, you calculated descriptive statistics on the points scored by your team in games
played at its home venue. These included the mean, median, variance, and standard deviation of the
points scored by your team in home games.
See Step 6 in the Python script to address the following items:
● Summarize all statistics in a formatted table as shown below. Use one row for each statistic. You
  will need to add rows to the table in order to include all of your statistics.

Table 2. Descriptive Statistics for Points Scored by Your Team in Home Games
Statistic              Value
(for example, Mean)    X.XX
*Round off to 2 decimal places.

● In general, how are the measures of central tendency and variability used to analyze a data
  distribution?
● Interpret each statistic in detail and explain what it represents in this scenario.
● Use the mean and the median to describe the distribution of points scored by your team in home
  games.
  ○ Describe the skew: Is it left, right, or bell-shaped?
  ○ Explain which measure of central tendency is best to use to represent the center of the
    distribution based on its skew.
Answer the questions in a paragraph response. Remove all questions and this note (but not the
table) before submitting! Do not include Python code in your report.
7. Descriptive Statistics: Points Scored By Your Team in Away Games
In the Python script, you calculated descriptive statistics on the points scored by your team in games
played at the opponent’s venue (away). These included the mean, median, variance, and standard deviation
of the points scored by your team in away games.
See Step 7 in the Python script to address the following items:
● Summarize all statistics in a formatted table as shown below. Use one row for each statistic. You
  will need to add rows to the table in order to include all of your statistics.

Table 3. Descriptive Statistics for Points Scored by Your Team in Away Games
Statistic Name                   Value
Statistic (for example, Mean)    X.XX
*Round off to 2 decimal places.

● Interpret each statistic in detail and explain what it represents in this scenario.
● Use the mean and the median to describe the distribution of points scored by your team in away
  games.
  a. Describe the skew: Is it left, right, or bell-shaped?
  b. Explain which measure of central tendency is best to use to represent the center of the
     distribution based on its skew.
● Is your team performing better in games played at home than those played away? Use the mean
  and the standard deviation to answer this question. What can be deduced by comparing the
  standard deviation of points scored in home games and points scored in away games?
Answer the questions in a paragraph response. Remove all questions and this note (but not the
table) before submitting! Do not include Python code in your report.
8. Confidence Intervals for the Average Relative Skill of All Teams in Your Team’s Years
In the Python script, you calculated a 95% confidence interval for the average relative skill of all teams in
the league during the years of your team. Additionally, you calculated the probability that a given team
in the league has a relative skill level less than that of the team that you picked.
See Step 8 in the Python script to address the following items:
● Report the confidence interval in a formatted table as shown below.

Table 4. Confidence Interval for Average Relative Skill of Teams in Your Team’s Years
Confidence Level (%)      Confidence Interval
XX% (for example, 95%)    (X.XX, X.XX)
*Round off to 2 decimal places.

● Describe how confidence intervals are generally used in estimating the measures of central
  tendency for a population.
● Provide a detailed interpretation of the confidence interval in terms of the average relative skill
  of teams in the range of years that you picked.
● How would your interval be different if you had used a different confidence level?
● What is the probability that a given team in the league has a relative skill level less than that of
  the team that you picked? Is it unusual that a team has a skill level less than your team?
Answer the questions in a paragraph response. Remove all questions and this note (but not the
table) before submitting! Do not include Python code in your report.
9. Confidence Intervals for the Average Relative Skill of All Teams in the Assigned Team’s Years
In the Python script, you calculated a 95% confidence interval for the average relative skill of all teams in
the league during the years of the assigned team. Additionally, you calculated the probability that a
given team in the league has a relative skill level less than that of the assigned team.
See Step 9 in the Python script to address the following items:
● Report the confidence interval in a formatted table as shown below.

Table 5. Confidence Interval for Average Relative Skill of Teams in Assigned Team’s Years
Confidence Level (%)      Confidence Interval
XX% (for example, 95%)    (X.XX, X.XX)
*Round off to 2 decimal places.

● Provide a detailed interpretation of the confidence interval in terms of the average relative skill
  of teams in the assigned team’s range of years.
● Discuss how your interval would be different if you had used a different confidence level.
● How does this confidence interval compare with the previous one? What does this signify in
  terms of the average relative skill of teams in the range of years that you picked versus the
  average relative skill of teams in the assigned team’s range of years?
Answer the questions in a paragraph response. Remove all questions and this note (but not the
table) before submitting! Do not include Python code in your report.
10. Conclusion
Describe the results of your statistical analyses clearly, using proper descriptions of statistical terms and
concepts.
● What is the practical importance of the analyses that were performed?
● Describe what these results mean for the scenario.
Answer the questions in a paragraph response. Remove all questions and this note before
submitting! Do not include Python code in your report.
11. Citations
You were not required to use external resources for this report. If you did not use any resources, you
should remove this entire section. However, if you did use any resources to help you with your
interpretation, you must cite them. Use proper APA format for citations.
Insert references here in the following format:
Author’s Last Name, First Initial. Middle Initial. (Year of Publication). Title of book: Subtitle of book,
edition. Place of Publication: Publisher.
Project One: Data Visualization, Descriptive Statistics, Confidence Intervals
This notebook contains the step-by-step directions for Project One. It is very important to run
through the steps in order. Some steps depend on the outputs of earlier steps. Once you have
completed the steps in this notebook, be sure to write your summary report.
You are a data analyst for a basketball team and have access to a large set of historical data that
you can use to analyze performance patterns. The coach of the team and your management have
requested that you use descriptive statistics and data visualization techniques to study
distributions of key performance metrics that are included in the data set. These data-driven
analytics will help make key decisions to improve the performance of the team. You will use the
Python programming language to perform the statistical analyses and then prepare a report of
your findings to present to the team’s management. Since the managers are not data analysts,
you will need to interpret your findings and describe their practical implications.
There are four important variables in the data set that you will study in Project One.

Variable   What does it represent?
pts        Points scored by the team in a game
elo_n      A measure of the relative skill level of the team in the league
year_id    Year when the team played the games
fran_id    Name of the NBA team

The ELO rating, represented by the variable elo_n, is used as a measure of the relative skill of a
team. This measure is inferred based on the final score of a game, the game location, and the
outcome of the game relative to the probability of that outcome. The higher the number, the
higher the relative skill of a team.
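For intuition about how a rating of this kind relates to the probability of an outcome, here is a minimal sketch of the textbook Elo formulas. It is not the data set's exact rating procedure: the function names, the K-factor of 20, the optional home bonus, and the example ratings are all assumptions made for illustration, and the sketch ignores the final score, which the description above says also feeds into elo_n.

```python
def expected_score(rating_a, rating_b):
    """Textbook Elo win probability for team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating_a, rating_b, a_won, k=20.0, home_bonus=0.0):
    """Return team A's rating after one game.

    k and home_bonus are assumed values, used here only for illustration.
    """
    expected = expected_score(rating_a + home_bonus, rating_b)
    actual = 1.0 if a_won else 0.0
    # A win that was already likely moves the rating less than an upset does.
    return rating_a + k * (actual - expected)


# Hypothetical ratings, not taken from the data set.
print(round(expected_score(1600.0, 1500.0), 3))
print(round(update_rating(1600.0, 1500.0, a_won=True), 2))
print(round(update_rating(1600.0, 1500.0, a_won=False), 2))
```

The stronger team gains only a few points for a win it was expected to get and loses more for an unexpected loss, which is the sense in which the outcome is judged relative to its probability.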
In addition to studying data on your own team, your management has assigned you a second
team so that you can compare its performance with your own team’s.
Team            What does it represent?
Your Team       This is the team that has hired you as an analyst. This is the team that you will
                pick below. See Step 2.
Assigned Team   This is the team that the management has assigned to you to compare against your
                team. See Step 1.

Reminder: It may be beneficial to review the summary report template for Project One prior to
starting this Python script. That will give you an idea of the questions you will need to answer
with the outputs of this script.
Step 1: Data Preparation & the Assigned Team
This step loads the data set from a CSV file. It also selects the assigned team for this analysis.
Do not make any changes to the code block below.
1. The assigned team is the Chicago Bulls from the years 1996-1998.
Click the block of code below and hit the Run button above.
import numpy as np
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
from IPython.display import display, HTML

nba_orig_df = pd.read_csv('nbaallelo.csv')
nba_orig_df = nba_orig_df[(nba_orig_df['lg_id']=='NBA') & (nba_orig_df['is_playoffs']==0)]
columns_to_keep = ['game_id','year_id','fran_id','pts','opp_pts','elo_n','opp_elo_n',
                   'game_location', 'game_result']
nba_orig_df = nba_orig_df[columns_to_keep]

# The dataframe for the assigned team is called assigned_team_df.
# The assigned team is the Chicago Bulls from 1996-1998.
assigned_years_league_df = nba_orig_df[(nba_orig_df['year_id'].between(1996, 1998))]
assigned_team_df = assigned_years_league_df[(assigned_years_league_df['fran_id']=='Bulls')]
assigned_team_df = assigned_team_df.reset_index(drop=True)

display(HTML(assigned_team_df.head().to_html()))
print("printed only the first five observations…")
print("Number of rows in the data set =", len(assigned_team_df))
   game_id       year_id  fran_id  pts  opp_pts  elo_n      opp_elo_n  game_location  game_result
0  199511030CHI  1996     Bulls    105  91       1598.2924  1531.7449  H              W
1  199511040CHI  1996     Bulls    107  85       1604.3940  1458.6415  H              W
2  199511070CHI  1996     Bulls    117  108      1605.7983  1310.9349  H              W
3  199511090CLE  1996     Bulls    106  88       1618.8701  1452.8268  A              W
4  199511110CHI  1996     Bulls    110  106      1621.1591  1490.2861  H              W
printed only the first five observations…
Number of rows in the data set = 246
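To see what the two filters in this step do, the sketch below applies the same pattern to a small made-up table. The rows, values, and the names toy_df, in_range_df, and bulls_df are invented for illustration; only the column names and the between and == filters mirror the step above.

```python
import pandas as pd

# Made-up rows for illustration; not taken from nbaallelo.csv.
toy_df = pd.DataFrame({
    'year_id': [1995, 1996, 1997, 1998, 1998, 1999],
    'fran_id': ['Bulls', 'Bulls', 'Knicks', 'Bulls', 'Jazz', 'Bulls'],
    'pts':     [101, 105, 98, 110, 96, 89],
})

# between() is inclusive on both ends by default, so 1996 and 1998 are kept.
in_range_df = toy_df[toy_df['year_id'].between(1996, 1998)]

# A second boolean mask narrows the league-wide rows down to one franchise.
bulls_df = in_range_df[in_range_df['fran_id'] == 'Bulls'].reset_index(drop=True)

print(len(in_range_df), len(bulls_df))
```

The intermediate table plays the role of the league-wide dataframe for the chosen years, and the final table plays the role of the single-team dataframe.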
Step 2: Pick Your Team
In this step, you will pick your team. The range of years that you will study for your team is
2013-2015. Make the following edits to the code block below:
1. Replace ??TEAM?? with your choice of team from one of the following team names.
*Bucks, Bulls, Cavaliers, Celtics, Clippers, Grizzlies, Hawks, Heat, Jazz, Kings, Knicks,
Lakers, Magic, Mavericks, Nets, Nuggets, Pacers, Pelicans, Pistons, Raptors, Rockets,
Sixers, Spurs, Suns, Thunder, Timberwolves, Trailblazers, Warriors, Wizards*
Remember to enter the team name within single quotes. For example, if you picked the
Suns, then ??TEAM?? should be replaced with ‘Suns’.
After you are done with your edits, click the block of code below and hit the Run button above.
# Range of years: 2013-2015 (Note: The line below selects ALL teams within the
# three-year period 2013-2015. This is not your team's dataframe.)
your_years_leagues_df = nba_orig_df[(nba_orig_df['year_id'].between(2013, 2015))]

# The dataframe for your team is called your_team_df.
# ---- TODO: make your edits here ----
your_team_df = your_years_leagues_df[(your_years_leagues_df['fran_id']=='Lakers')]
your_team_df = your_team_df.reset_index(drop=True)

display(HTML(your_team_df.head().to_html()))
print("printed only the first five observations…")
print("Number of rows in the data set =", len(your_team_df))
   game_id       year_id  fran_id  pts  opp_pts  elo_n      opp_elo_n  game_location  game_result
0  201210300LAL  2013     Lakers   91   99       1541.7585  1533.9297  H              L
1  201210310POR  2013     Lakers   106  116      1531.7184  1460.7015  A              L
2  201211020LAL  2013     Lakers   95   105      1518.7981  1580.8679  H              L
3  201211040LAL  2013     Lakers   108  79       1527.5927  1409.0566  H              W
4  201211070UTA  2013     Lakers   86   95       1521.1603  1535.9674  A              L
printed only the first five observations…
Number of rows in the data set = 246
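A quick sanity check on the reported row count: one team over three seasons should contribute three seasons' worth of regular-season games. The sketch below assumes a full 82-game regular season in every year and no missing rows; the constant and the helper name are hypothetical.

```python
GAMES_PER_SEASON = 82  # assumed: a full NBA regular season


def expected_rows(first_year, last_year, games_per_season=GAMES_PER_SEASON):
    """Rows expected for one team over an inclusive range of seasons."""
    n_seasons = last_year - first_year + 1
    return n_seasons * games_per_season


print(expected_rows(2013, 2015))
```

If the count printed by the notebook differs from this figure, the team name or the year range in the filter is the first thing to check.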
Step 3: Data Visualization: Points Scored by Your Team
The coach has requested that you provide a visual that shows the distribution of points scored by
your team in the years 2013-2015. The code below provides two possible options. Pick ONE of
these two plots to include in your summary report. Choose the plot that you think provides the
best visual for the distribution of points scored by your team. In your summary report, you must
explain why you think your visual is the best choice.
Click the block of code below and hit the Run button above.
NOTE: If the plots are not created, click the code section and hit the Run button again.
import seaborn as sns

# Histogram
fig, ax = plt.subplots()
plt.hist(your_team_df['pts'], bins=20)
plt.title('Histogram of points scored by Your Team in 2013 to 2015', fontsize=18)
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
plt.show()
print("")

# Scatterplot
plt.title('Scatterplot of points scored by Your Team in 2013 to 2015', fontsize=18)
# x and y are passed by keyword; recent seaborn releases do not accept them positionally.
sns.regplot(x=your_team_df['year_id'], y=your_team_df['pts'], ci=None)
plt.show()
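As a rough illustration of what a histogram encodes, the sketch below counts made-up point totals into equal-width bins by hand. The sample_pts values, the bin_counts helper, and the choice of 4 bins are assumptions for illustration; the plotting call above does the equivalent counting internally with 20 bins.

```python
def bin_counts(values, n_bins):
    """Count values into n_bins equal-width bins spanning min..max."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        idx = min(int((v - low) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Made-up point totals, not taken from the data set.
sample_pts = [86, 91, 95, 95, 99, 101, 104, 106, 108, 126]
print(bin_counts(sample_pts, 4))
```

Each count becomes the height of one bar, so the shape of the bars shows where games cluster and how far the tails extend.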
Step 4: Data Visualization: Points Scored by the Assigned Team
The coach has also requested that you provide a visual that shows a distribution of points scored
by the Bulls from years 1996-1998. The code below provides two possible options. Pick ONE of
these two plots to include in your summary report. Choose the plot that you think provides the
best visual for the distribution of points scored by the assigned team. In your summary report, you will
explain why you think your visual is the best choice.
Click the block of code below and hit the Run button above.
NOTE: If the plots are not created, click the code section and hit the Run button again.
import seaborn as sns

# Histogram
fig, ax = plt.subplots()
plt.hist(assigned_team_df['pts'], bins=20)
plt.title('Histogram of points scored by the Bulls in 1996 to 1998', fontsize=18)
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
plt.show()

# Scatterplot
plt.title('Scatterplot of points scored by the Bulls in 1996 to 1998', fontsize=18)
# x and y are passed by keyword; recent seaborn releases do not accept them positionally.
sns.regplot(x=assigned_team_df['year_id'], y=assigned_team_df['pts'], ci=None)
plt.show()
Step 5: Data Visualization: Comparing the Two Teams
Now the coach wants you to prepare one plot that provides a visual of the differences in the
distribution of points scored by the assigned team and your team. The code below provides two
possible visuals. Choose the plot that allows for the best comparison of the data distributions.
Click the block of code below and hit the Run button above.
NOTE: If the plots are not created, click the code section and hit the Run button again.
import seaborn as sns

# Side-by-side boxplots
both_teams_df = pd.concat((assigned_team_df, your_team_df))
plt.title('Boxplot to compare points distribution', fontsize=18)
sns.boxplot(x='fran_id', y='pts', data=both_teams_df)
plt.show()
print("")

# Histograms
fig, ax = plt.subplots()
plt.hist(assigned_team_df['pts'], 20, alpha=0.5, label='Assigned Team')
plt.hist(your_team_df['pts'], 20, alpha=0.5, label='Your Team')
plt.title('Histogram to compare points distribution', fontsize=18)
plt.xlabel('Points')
plt.legend(loc='upper right')
plt.show()
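A boxplot is essentially a drawing of a few summary numbers. The sketch below computes them for two made-up samples using only the standard library. The sample values and the five_number_summary helper are hypothetical, and the quartile method chosen here is one common convention, so the numbers may differ slightly from what a plotting library draws.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method='inclusive')
    return min(values), q1, q2, q3, max(values)


# Made-up point totals for two hypothetical teams.
team_a = [88, 92, 95, 99, 101, 104, 110]
team_b = [79, 85, 90, 96, 103, 112, 125]

for name, pts in [('team_a', team_a), ('team_b', team_b)]:
    low, q1, med, q3, high = five_number_summary(pts)
    # The box spans Q1 to Q3; its height is the interquartile range (IQR).
    print(name, 'median =', med, 'IQR =', q3 - q1)
```

Comparing the medians shows which sample sits higher, and comparing the IQRs shows which one is more spread out. These are the two features a side-by-side boxplot makes visible at a glance.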
Step 6: Descriptive Statistics: Relative Skill of Your Team
The management of your team wants you to run descriptive statistics on the relative skill of your
team from 2013-2015. In this project, you will use the variable ‘elo_n’ to represent the relative
skill of the teams. Calculate descriptive statistics including the mean, median, variance, and
standard deviation for the relative skill of your team. Make the following edits to the code block
below:
1. Replace ??MEAN_FUNCTION?? with the name of the Python function that calculates the
mean.
2. Replace ??MEDIAN_FUNCTION?? with the name of the Python function that calculates
the median.
3. Replace ??VAR_FUNCTION?? with the name of the Python function that calculates the
variance.
4. Replace ??STD_FUNCTION?? with the name of the Python function that calculates the
standard deviation.
After you are done with your edits, click the block of code below and hit the Run button above.
print("Your Team's Relative Skill in 2013 to 2015")
print("-------------------------------------------------------")

# ---- TODO: make your edits here ----
mean = your_team_df['elo_n'].mean()
median = your_team_df['elo_n'].median()
variance = your_team_df['elo_n'].var()
stdeviation = your_team_df['elo_n'].std()

print('Mean =', round(mean,2))
print('Median =', round(median,2))
print('Variance =', round(variance,2))
print('Standard Deviation =', round(stdeviation,2))
Your Team's Relative Skill in 2013 to 2015
-------------------------------------------------------
Mean = 1440.49
Median = 1412.34
Variance = 6337.75
Standard Deviation = 79.61
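By default, the pandas var() and std() methods use the sample formulas, which divide by n - 1 rather than n. The sketch below shows the difference on made-up ratings using only the standard library. The toy_elo list and the variable names are invented for illustration.

```python
import math
import statistics

# Made-up ratings for illustration; not taken from the data set.
toy_elo = [1400.0, 1420.0, 1450.0, 1480.0, 1500.0]

mean = statistics.mean(toy_elo)
squared_dev = sum((x - mean) ** 2 for x in toy_elo)

sample_var = squared_dev / (len(toy_elo) - 1)   # divides by n - 1
population_var = squared_dev / len(toy_elo)     # divides by n

print('Mean =', round(mean, 2))
print('Sample variance =', round(sample_var, 2))
print('Population variance =', round(population_var, 2))
print('Sample standard deviation =', round(math.sqrt(sample_var), 2))
```

The standard deviation is the square root of the variance, which is also why the two values reported above agree: the square root of 6337.75 is about 79.61.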
Step 7: Descriptive Statistics: Relative Skill of the Assigned Team
The management also wants you to run descriptive statistics for the relative skill of the Bulls
from 1996-1998. Calculate descriptive statistics including the mean, median, variance, and
standard deviation for the relative skill of the assigned team.
You are to write this code block yourself.
Use Step 6 to help you write this code block. Here is some information that will help you write
this code block.
1. The dataframe for the assigned team is called assigned_team_df.
2. The variable ‘elo_n’ represents the relative skill of the teams.
3. Your statistics should be rounded to two decimal places.
Write your code in the code block section below. After you are done, click this block of code and
hit the Run button above. Reach out to your instructor if you need more help with this step.
print("Assigned Team's Relative Skill in 1996 to 1998")
print("-------------------------------------------------------")

mean_ = assigned_team_df['elo_n'].mean()
median_ = assigned_team_df['elo_n'].median()
variance_ = assigned_team_df['elo_n'].var()
stdeviation_ = assigned_team_df['elo_n'].std()

print("Mean = ", round(mean_,2))
print("Median = ", round(median_,2))
print("Variance = ", round(variance_,2))
print("Standard Deviation = ", round(stdeviation_,2))
Assigned Team's Relative Skill in 1996 to 1998
-------------------------------------------------------
Mean = 1739.8
Median = 1751.23
Variance = 2651.55
Standard Deviation = 51.49
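One common rule of thumb reads the direction of skew from the gap between the mean and the median, since extreme values pull the mean toward the longer tail. The sketch below applies that rule to the values reported in Steps 6 and 7. The skew_hint helper is hypothetical, and the rule is only a heuristic that can mislead for some distributions, so a histogram remains the better check.

```python
def skew_hint(mean, median):
    """Rule-of-thumb skew direction from the gap between mean and median."""
    if mean > median:
        return 'right-skewed (mean pulled above the median)'
    if mean < median:
        return 'left-skewed (mean pulled below the median)'
    return 'roughly symmetric'


# Values copied from the Step 6 and Step 7 outputs.
print('Your team:    ', skew_hint(1440.49, 1412.34))
print('Assigned team:', skew_hint(1739.80, 1751.23))
```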
Step 8: Confidence Intervals for the Average Relative Skill of All Teams in Your Team’s Years
The management wants you to calculate a 95% confidence interval for the average relative
skill of all teams in 2013-2015. To construct a confidence interval, you will need the mean and
standard error of the relative skill level in these years. The code block below calculates the mean
and the standard deviation. Your edits will calculate the standard error and the confidence
interval. Make the following edits to the code block below:
1. Replace ??SD_VARIABLE?? with the variable name representing the standard
deviation of relative skill of all teams from your years. (Hint: the standard deviation
variable is in the code block below)
2. Replace ??CL?? with the confidence level of the confidence interval.
3. Replace ??MEAN_VARIABLE?? with the variable name representing the mean relative
skill of all teams from your years. (Hint: the mean variable is in the code block below)
4. Replace ??SE_VARIABLE?? with the variable name representing the standard error.
(Hint: the standard error variable is in the code block below)
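In symbols, the interval this step builds is the sample mean plus or minus a critical value times the standard error. The notation below (the sample mean, the sample standard deviation s, the sample size n, and the critical value z*) is introduced here for illustration and does not appear in the notebook:

```latex
\[
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad
\mathrm{CI} = \left( \bar{x} - z^{*}\,\mathrm{SE},\; \bar{x} + z^{*}\,\mathrm{SE} \right),
\qquad z^{*} \approx 1.96 \ \text{for a 95\% confidence level.}
\]
```

Because n sits under a square root in the denominator, a larger sample gives a smaller standard error and therefore a narrower interval.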
The management also wants you to calculate the probability that a team in the league has a
relative skill level less than that of the team that you picked. Assuming that the relative skill of
teams is Normally distributed, Python methods for a Normal distribution can be used to answer
this question. The code block below uses two of these Python methods. Your task is to identify
the correct Python method and report the probability.
After you are done with your edits, click the block of code below and hit the Run button above.
print("Confidence Interval for Average Relative Skill in the years 2013 to 2015")
print("---------------------------------------------------------------------------")

# Mean relative skill of all teams from the years 2013-2015
mean = your_years_leagues_df['elo_n'].mean()

# Standard deviation of the relative skill of all teams from the years 2013-2015
stdev = your_years_leagues_df['elo_n'].std()
n = len(your_years_leagues_df)

# Confidence interval
# ---- TODO: make your edits here ----
stderr = stdev/(n ** 0.5)
conf_int_95 = st.norm.interval(0.95, mean, stderr)

print("95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 2013 to 2015 =", conf_int_95)
print("95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 2013 to 2015 = (", round(conf_int_95[0], 2),",", round(conf_int_95[1], 2),")")
print("\n")

print("Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of your team in the years 2013 to 2015")
print("------------------------------------------------------------------------------------------------------------------------")
mean_elo_your_team = your_team_df['elo_n'].mean()
choice1 = st.norm.sf(mean_elo_your_team, mean, stdev)
choice2 = st.norm.cdf(mean_elo_your_team, mean, stdev)

# Pick the correct answer.
print("Which of the two choices is correct?")
print("Choice 1 =", round(choice1,4))
print("Choice 2 =", round(choice2,4))
Confidence Interval for Average Relative Skill in the years 2013 to 2015
---------------------------------------------------------------------------
95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 2013 to 2015 = (1502.0236894390478, 1507.1824625533618)
95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 2013 to 2015 = ( 1502.02 , 1507.18 )

Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of your team in the years 2013 to 2015
------------------------------------------------------------------------------------------------------------------------
Which of the two choices is correct?
Choice 1 = 0.7147
Choice 2 = 0.2853
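The two Normal-distribution methods used in this step are complements: the cumulative distribution function gives the probability of a value at or below a point, and the survival function gives the probability of a value above it. The sketch below shows the relationship with the standard library's NormalDist. The league mean of 1500 and standard deviation of 100 are assumed values for illustration, not the ones computed by the notebook.

```python
from statistics import NormalDist

# Assumed league-wide parameters for illustration only.
league = NormalDist(mu=1500.0, sigma=100.0)
team_rating = 1440.49  # mean elo_n reported for the team in Step 6

lower_tail = league.cdf(team_rating)  # P(X < team_rating), like st.norm.cdf
upper_tail = 1.0 - lower_tail         # P(X > team_rating), like st.norm.sf

print('P(less than team)    =', round(lower_tail, 4))
print('P(greater than team) =', round(upper_tail, 4))
```

A rating below the league mean always has a lower-tail probability under 0.5, which gives a quick way to check which of two complementary answers is the "less than" one.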
Step 9: Confidence Intervals for the Average Relative Skill of All Teams in the Assigned Team’s Years
The management also wants you to calculate a 95% confidence interval for the average
relative skill of all teams in the years 1996-1998. Calculate this confidence interval.
You are to write this code block yourself.
Use Step 8 to help you write this code block. Here is some information that will help you write
this code block. Reach out to your instructor if you need help.
1. The dataframe for the years 1996-1998 is called assigned_years_league_df
2. The variable ‘elo_n’ represents the relative skill of teams.
3. Start by calculating the mean and the standard deviation of relative skill (ELO) in years
1996-1998.
4. Calculate n that represents the sample size.
5. Calculate the standard error which is equal to the standard deviation of Relative Skill
(ELO) divided by the square root of the sample size n.
6. Assuming that the population standard deviation is known, use Python methods for the
Normal distribution to calculate the confidence interval.
7. Your statistics should be rounded to two decimal places.
The management also wants you to calculate the probability that a team had a relative skill level
less than the Bulls in years 1996-1998. Assuming that the relative skill of teams is Normally
distributed, calculate this probability.
You are to write this code block yourself.
Use Step 8 to help you write this code block. Here is some information that will help you write
this code block.
1. Calculate the mean relative skill of the Bulls. Note that the dataframe for the Bulls is
called assigned_team_df. The variable ‘elo_n’ represents the relative skill.
2. Use Python methods for a Normal distribution to calculate this probability.
3. The probability value should be rounded to four decimal places.
Write your code in the code block section below. After you are done, click this block of code and
hit the Run button above. Reach out to your instructor if you need more help with this step.
print("Confidence Interval for Average Relative Skill in the years 1996 to 1998")
print("---------------------------------------------------------------------------")

# Mean relative skill of all teams from the years 1996-1998
mean = assigned_years_league_df['elo_n'].mean()

# Standard deviation of the relative skill of all teams from the years 1996-1998
stdev = assigned_years_league_df['elo_n'].std()
n = len(assigned_years_league_df)

# Confidence interval
# ---- TODO: make your edits here ----
stderr = stdev/(n ** 0.5)
conf_int_95 = st.norm.interval(0.95, mean, stderr)

print("95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 1996 to 1998 =", conf_int_95)
print("95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 1996 to 1998 = (", round(conf_int_95[0], 2),",", round(conf_int_95[1], 2),")")
print("\n")

print("Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of Bulls in the years 1996 to 1998")
print("------------------------------------------------------------------------------------------------------------------------")
mean_elo_assigned_team = assigned_team_df['elo_n'].mean()
answer1 = st.norm.cdf(mean_elo_assigned_team, mean, stdev)

# Pick the correct answer.
print("Answer =", round(answer1,4))
Confidence Interval for Average Relative Skill in the years 1996 to 1998
---------------------------------------------------------------------------
95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 1996 to 1998 = (1487.6565859527095, 1493.6465501840999)
95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 1996 to 1998 = ( 1487.66 , 1493.65 )

Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of Bulls in the years 1996 to 1998
------------------------------------------------------------------------------------------------------------------------
Answer = 0.9732
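To see how the choice of confidence level changes an interval, the sketch below rebuilds a Normal-based interval by hand at three levels. The mean of 1490.65 and standard error of 1.53 are assumed values chosen to roughly match the 1996 to 1998 output, not values printed by the notebook, and normal_interval is a hypothetical helper.

```python
from statistics import NormalDist


def normal_interval(confidence, mean, stderr):
    """Two-sided interval mean +/- z * stderr for a given confidence level."""
    # For 95% confidence the critical value is the 97.5th percentile of N(0, 1).
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * stderr, mean + z * stderr


# Assumed values, chosen to roughly match the 1996 to 1998 output.
mean, stderr = 1490.65, 1.53

for level in (0.90, 0.95, 0.99):
    low, high = normal_interval(level, mean, stderr)
    print(f'{level:.0%}: ({low:.2f}, {high:.2f}) width = {high - low:.2f}')
```

A higher confidence level uses a larger critical value, so the interval widens. A lower level narrows it, at the cost of being less sure that the interval captures the true average.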
End of Project One
Download the HTML output and submit it with your summary report for Project One. The
HTML output can be downloaded by clicking File, then Download as, then HTML. Do not
include the Python code within your summary report.