Hello, I had someone else do the assignment, but they did not follow the instructions. Luckily, they did complete the assignment, but it needs to be done in Jupyter and submitted as an HTML file. I have attached the work already done, as well as the data, so you can make sure it is accurate. The instructions are below.
---
Using Jupyter notebook, answer each of the following questions. This is an individual lab. You must submit your own work. You will do your work in Jupyter notebook and must show the code you used to get the answer if it is a code-based question. To complete the assignment you must download your Jupyter notebook with code and answers as an html file and upload the notebook in html format. All questions should be answered fully when a text answer is required and all code must be shown to indicate how you arrived at the result.
Your report submission should be formatted using Markdown and code in Jupyter notebook as follows:
The name of the assignment as an H1, your name as an H2, and the date as an H3, all in the first cell block. Add a horizontal line/rule after the cell. The questions should then be answered in individual cells indicated by Q1, Q2, Q3, etc. in H4 format and separated from other questions by horizontal lines/rules.
- Read in the city_sales20.xlsx data so that the data does not contain any additional text or notes, only the data on offices and sales
- Is the dataset in optimal form for analysis in which each column represents a variable/feature? Why or why not?
- Transform the dataframe so that it contains three columns (city, year, and sales) and save it as a new dataframe called d2
- Determine which office in the data had the highest mean sales between 2014 and 2020 (inclusive)
- Calculate and plot mean sales by year across all offices
- Plot (with altair or seaborn) the sales over time in a line plot and color the lines by office
- Every year, this dataset is updated with new data. Define a function called “clean_sales” that takes a file name, reads in the data so that there are no missing values and the result contains only the data on offices and sales and no additional text or notes, and then transforms the dataframe so that it contains three columns: city, year, and sales. The function should return the dataframe with three columns: city, year, sales
- Apply the function to another dataset, city_sales21.xlsx, and create a dataframe called d21 with the result. Return the last five rows of d21
- Use seaborn or altair to create a sorted (low to high) bar plot of the sales data by office from 2021. Which office has the highest sales?
- Using d21, what are the top five best performing year-city combinations?
I have attached the work done by the previous tutor. Please use it, but convert it so that it follows the instructions above. The file Answer 1 xlsx is the work already done.
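The reshape asked for in the third bullet can be sketched as follows. This is a minimal illustration, assuming pandas and a small hand-typed subset of the sales figures; the frame name `wide` is hypothetical and stands in for the cleaned sheet. In the wide layout the years sit in the column headers, so "year" is not yet a column of its own, and `melt` is one way to make it one.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned wide sheet:
# one row per office, one column per year.
wide = pd.DataFrame({
    "office": ["NY", "BOS", "MIA"],
    2013: [45, 34, 45],
    2014: [31, 23, 81],
})

# melt stacks the year columns into a single 'year' column,
# leaving one sales value per (office, year) row.
d2 = wide.melt(id_vars="office", var_name="year", value_name="sales")
d2 = d2.rename(columns={"office": "city"})
d2["year"] = d2["year"].astype(int)

print(d2)
```

The result has one row per office and year, which is the shape that grouping and plotting functions generally expect.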
Observations
1. Variable Representation:
Each column (from the third onwards) represents a year, which is a variable in the context of sales data. This is a good practice as it aligns with the principle of each column representing a variable.
The first column represents the office location, another important variable.
2. Observation Representation:
Each row, after the header rows, represents a distinct observation (in this case, an office’s annual sales data). This too alig
3. Header Structure:
The datasets have multi-level headers, which can complicate data analysis. Typically, a single row of headers, where each
4. Unnamed Columns:
Many columns are unnamed, which can lead to confusion during analysis. Properly naming these columns with the respec
5. Initial Rows with Titles/Explanations:
The initial rows that contain titles and explanations are not useful for direct analysis and should be separated from the ma
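One way to see the title, note, and blank rows described in the observations is to check which rows have a value in every column. The sketch below assumes pandas and a hypothetical frame `raw` shaped like a header-less read of the sheet; the exact number of title and blank rows in the real file is an assumption.

```python
import pandas as pd

# Hypothetical raw frame: two title rows, the header row, two office rows,
# a blank row, then the two note rows.
raw = pd.DataFrame([
    ["Sales figures for metropolitan locations", None, None],
    ["Figures are in 1000s of USD", None, None],
    ["office", 2013, 2014],
    ["NY", 45, 31],
    ["BOS", 34, 23],
    [None, None, None],
    ["Main departments included in collection: finance, accounting, marketing", None, None],
    ["Please refer to HP507 for additional details and figures for each office", None, None],
])

# A header or data row has a value in every column;
# titles, notes, and blank lines do not.
is_complete = raw.notna().all(axis=1)
complete_rows = raw.index[is_complete].tolist()

print(complete_rows)
```

Under these assumptions the first complete row is the real header and the remaining complete rows are the office data.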
| city | year | sales |
| --- | --- | --- |
| NY | 2013 | 45 |
| BOS | 2013 | 34 |
| MIA | 2013 | 45 |
| DC | 2013 | 12 |
| SEA | 2013 | 52 |
| PHI | 2013 | 43 |
| CHI | 2013 | 36 |
| NY | 2014 | 31 |
| BOS | 2014 | 23 |
| MIA | 2014 | 81 |
| DC | 2014 | 13 |
| SEA | 2014 | 48 |
| PHI | 2014 | 39 |
| CHI | 2014 | 35 |
| NY | 2015 | 34 |
| BOS | 2015 | 24 |
| MIA | 2015 | 43 |
| DC | 2015 | 15 |
| SEA | 2015 | 47 |
| PHI | 2015 | 38 |
| CHI | 2015 | 36 |
| NY | 2016 | 56 |
| BOS | 2016 | 26 |
| MIA | 2016 | 44 |
| DC | 2016 | 11 |
| SEA | 2016 | 41 |
| PHI | 2016 | 38 |
| CHI | 2016 | 39 |
| NY | 2017 | 67 |
| BOS | 2017 | 25 |
| MIA | 2017 | 41 |
| DC | 2017 | 11 |
| SEA | 2017 | 41 |
| PHI | 2017 | 33 |
| CHI | 2017 | 38 |
| NY | 2018 | 45 |
| BOS | 2018 | 34 |
| MIA | 2018 | 45 |
| DC | 2018 | 18 |
| SEA | 2018 | 42 |
| PHI | 2018 | 35 |
| CHI | 2018 | 33 |
| NY | 2019 | 67 |
| BOS | 2019 | 26 |
| MIA | 2019 | 51 |
| DC | 2019 | 19 |
| SEA | 2019 | 44 |
| PHI | 2019 | 41 |
| CHI | 2019 | 29 |
| NY | 2020 | 45 |
| BOS | 2020 | 22 |
| MIA | 2020 | 41 |
| DC | 2020 | 20 |
| SEA | 2020 | 42 |
| PHI | 2020 | 40 |
| CHI | 2020 | 27 |
| NY | 2021 | 35 |
| BOS | 2021 | 20 |
| MIA | 2021 | 74 |
| DC | 2021 | 20 |
| SEA | 2021 | 37 |
| PHI | 2021 | 32 |
| CHI | 2021 | 28 |

Main departments included in collection: finance, accounting, marketing
Please refer to HP507 for additional details and figures for each office
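For the last bullet (top five year-city combinations), one option is `DataFrame.nlargest`. The sketch below assumes pandas and rebuilds the city, year, and sales values listed above by hand, on the assumption that each year's seven sales values follow the office order NY, BOS, MIA, DC, SEA, PHI, CHI. The names `offices` and `sales_by_year` are hypothetical helpers.

```python
import pandas as pd

offices = ["NY", "BOS", "MIA", "DC", "SEA", "PHI", "CHI"]

# Sales values per year, in office order (assumed mapping, see lead-in).
sales_by_year = {
    2013: [45, 34, 45, 12, 52, 43, 36],
    2014: [31, 23, 81, 13, 48, 39, 35],
    2015: [34, 24, 43, 15, 47, 38, 36],
    2016: [56, 26, 44, 11, 41, 38, 39],
    2017: [67, 25, 41, 11, 41, 33, 38],
    2018: [45, 34, 45, 18, 42, 35, 33],
    2019: [67, 26, 51, 19, 44, 41, 29],
    2020: [45, 22, 41, 20, 42, 40, 27],
    2021: [35, 20, 74, 20, 37, 32, 28],
}

# Long-format frame: one row per (city, year).
d21 = pd.DataFrame(
    [(city, year, sales)
     for year, values in sales_by_year.items()
     for city, sales in zip(offices, values)],
    columns=["city", "year", "sales"],
)

# Five rows with the largest sales; ties keep their original row order.
top5 = d21.nlargest(5, "sales")
print(top5.to_string(index=False))
```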
| office | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NY | 31 | 34 | 56 | 67 | 45 | 67 | 45 |
| BOS | 23 | 24 | 26 | 25 | 34 | 26 | 22 |
| MIA | 81 | 43 | 44 | 41 | 45 | 51 | 41 |
| DC | 13 | 15 | 11 | 11 | 18 | 19 | 20 |
| SEA | 48 | 47 | 41 | 41 | 42 | 44 | 42 |
| PHI | 39 | 38 | 38 | 33 | 35 | 41 | 40 |
| CHI | 35 | 36 | 39 | 38 | 33 | 29 | 27 |
| Office | Mean Sales |
| --- | --- |
| NY | 49.28571429 |
| BOS | 25.71428571 |
| MIA | 49.42857143 |
| DC | 15.28571429 |
| SEA | 43.57142857 |
| PHI | 37.71428571 |
| CHI | 33.85714286 |
| Office | Mean Sales |
| --- | --- |
| MIA | 49.42857143 |
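The mean sales figures above can be cross-checked with a short group-by. This is a sketch only, assuming pandas and the 2014 to 2020 values typed in by hand in office order; `offices` and `sales_by_year` are hypothetical names.

```python
import pandas as pd

offices = ["NY", "BOS", "MIA", "DC", "SEA", "PHI", "CHI"]
sales_by_year = {
    2014: [31, 23, 81, 13, 48, 39, 35],
    2015: [34, 24, 43, 15, 47, 38, 36],
    2016: [56, 26, 44, 11, 41, 38, 39],
    2017: [67, 25, 41, 11, 41, 33, 38],
    2018: [45, 34, 45, 18, 42, 35, 33],
    2019: [67, 26, 51, 19, 44, 41, 29],
    2020: [45, 22, 41, 20, 42, 40, 27],
}
d2 = pd.DataFrame(
    [(city, year, sales)
     for year, values in sales_by_year.items()
     for city, sales in zip(offices, values)],
    columns=["city", "year", "sales"],
)

# Restrict to 2014-2020 inclusive, then average per office.
window = d2[d2["year"].between(2014, 2020)]
mean_sales = window.groupby("city")["sales"].mean()

# idxmax returns the index label (the office) with the largest mean.
best_office = mean_sales.idxmax()
print(best_office, round(mean_sales.max(), 2))
```

The `between` filter matters once the frame also holds 2013 or 2021 rows, since those years fall outside the requested range.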
| City | Year | Sales |
| --- | --- | --- |
| MIA | 2021 | 74 |
| DC | 2021 | 20 |
| SEA | 2021 | 37 |
| PHI | 2021 | 32 |
| CHI | 2021 | 28 |
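A possible shape for the `clean_sales` function from the instructions is sketched below, assuming pandas. The split into a hypothetical helper `tidy_sales` (which works on an already-loaded frame) and `clean_sales` (which reads the file) is a design choice made here so the reshaping can be tried without the Excel file. The layout of `raw`, and the assumption that the header row starts with the word "office", are not confirmed by the source.

```python
import pandas as pd

def tidy_sales(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn a header-less raw sheet into city/year/sales rows (hypothetical helper)."""
    # Assumes the real header row is the one whose first cell is 'office'.
    header_idx = raw.index[raw[0] == "office"][0]
    header = raw.iloc[header_idx].tolist()
    table = raw.iloc[header_idx + 1:].copy()
    table.columns = ["city"] + [int(year) for year in header[1:]]
    # Blank rows and note rows have missing cells, so dropna removes them.
    table = table.dropna(how="any")
    long = table.melt(id_vars="city", var_name="year", value_name="sales")
    return long.astype({"year": int, "sales": int}).reset_index(drop=True)

def clean_sales(file_name: str) -> pd.DataFrame:
    # header=None keeps the title rows so tidy_sales can locate the real header.
    return tidy_sales(pd.read_excel(file_name, header=None))

# Hypothetical raw frame with only the 2020 and 2021 columns.
raw = pd.DataFrame([
    ["Sales figures for metropolitan locations", None, None],
    ["Figures are in 1000s of USD", None, None],
    ["office", 2020, 2021],
    ["NY", 45, 35], ["BOS", 22, 20], ["MIA", 41, 74], ["DC", 20, 20],
    ["SEA", 42, 37], ["PHI", 40, 32], ["CHI", 27, 28],
    [None, None, None],
    ["Main departments included in collection: finance, accounting, marketing", None, None],
    ["Please refer to HP507 for additional details and figures for each office", None, None],
])

d21 = tidy_sales(raw)
print(d21.tail(5))
```

With the real file, the call would be `d21 = clean_sales("city_sales21.xlsx")`, which additionally needs an Excel engine such as openpyxl installed.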
Sales figures for metropolitan locations
Figures are in 1000s of USD

| office | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NY | 45 | 31 | 34 | 56 | 67 | 45 | 67 | 45 |
| BOS | 34 | 23 | 24 | 26 | 25 | 34 | 26 | 22 |
| MIA | 45 | 81 | 43 | 44 | 41 | 45 | 51 | 41 |
| DC | 12 | 13 | 15 | 11 | 11 | 18 | 19 | 20 |
| SEA | 52 | 48 | 47 | 41 | 41 | 42 | 44 | 42 |
| PHI | 43 | 39 | 38 | 38 | 33 | 35 | 41 | 40 |
| CHI | 36 | 35 | 36 | 39 | 38 | 33 | 29 | 27 |

Main departments included in collection: finance, accounting, marketing
Please refer to HP507 for additional details and figures for each office
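Mean sales by year across all offices can be computed directly from the wide layout. The sketch below assumes pandas and the 2013 to 2020 figures typed in by hand; the frame name `wide` is hypothetical.

```python
import pandas as pd

# Wide layout: one row per office, one column per year.
wide = pd.DataFrame(
    {
        2013: [45, 34, 45, 12, 52, 43, 36],
        2014: [31, 23, 81, 13, 48, 39, 35],
        2015: [34, 24, 43, 15, 47, 38, 36],
        2016: [56, 26, 44, 11, 41, 38, 39],
        2017: [67, 25, 41, 11, 41, 33, 38],
        2018: [45, 34, 45, 18, 42, 35, 33],
        2019: [67, 26, 51, 19, 44, 41, 29],
        2020: [45, 22, 41, 20, 42, 40, 27],
    },
    index=pd.Index(["NY", "BOS", "MIA", "DC", "SEA", "PHI", "CHI"], name="office"),
)

# axis=0 averages down each year column, i.e. across the seven offices.
mean_by_year = wide.mean(axis=0).rename("mean_sales")
mean_by_year.index.name = "year"

print(mean_by_year.round(2))
```

In a notebook, `mean_by_year.plot()` would draw the series as a line, provided matplotlib is installed.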
Sales figures for metropolitan locations
Figures are in 1000s of USD

| office | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NY | 45 | 31 | 34 | 56 | 67 | 45 | 67 | 45 | 35 |
| BOS | 34 | 23 | 24 | 26 | 25 | 34 | 26 | 22 | 20 |
| MIA | 45 | 81 | 43 | 44 | 41 | 45 | 51 | 41 | 74 |
| DC | 12 | 13 | 15 | 11 | 11 | 18 | 19 | 20 | 20 |
| SEA | 52 | 48 | 47 | 41 | 41 | 42 | 44 | 42 | 37 |
| PHI | 43 | 39 | 38 | 38 | 33 | 35 | 41 | 40 | 32 |
| CHI | 36 | 35 | 36 | 39 | 38 | 33 | 29 | 27 | 28 |

Main departments included in collection: finance, accounting, marketing
Please refer to HP507 for additional details and figures for each office
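For the sorted 2021 bar plot, the low-to-high order can be worked out in pandas before any plotting. This is a minimal sketch, assuming pandas and the 2021 values typed in by hand; the names `sales_2021` and `bar_order` are hypothetical.

```python
import pandas as pd

sales_2021 = pd.DataFrame({
    "city": ["NY", "BOS", "MIA", "DC", "SEA", "PHI", "CHI"],
    "sales": [35, 20, 74, 20, 37, 32, 28],
})

# A stable sort keeps tied offices (BOS and DC, both 20) in their original order.
ordered = sales_2021.sort_values("sales", kind="stable")
bar_order = ordered["city"].tolist()

# The last row after an ascending sort is the office with the highest sales.
top_office = ordered.iloc[-1]["city"]
print(bar_order, top_office)
```

The list `bar_order` can then be passed to seaborn, for example `sns.barplot(data=sales_2021, x="city", y="sales", order=bar_order)`, so the bars appear from lowest to highest.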