UCSD RStudio Quantitative Methods in Business Questions

Management 3: Quantitative Methods in Business
Session 3: Assignment (20 points)
Instructions: Download the citibike dataset from Canvas and import it into R (see the handout on importing data
to review the method demonstrated in class). Once it is loaded, check that the import worked: the object should
appear in your environment and contain the correct number of observations (50,000 rows) and variables
(18 columns).
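A minimal sketch of the import-and-check step, assuming the Canvas download is a CSV file. The actual file name and the in-class method (for example, RStudio's Import Dataset button) are not given here. So that the sketch runs on its own, it first writes a tiny stand-in CSV with three hypothetical rows to a temporary file. With the real file, you would pass its path to read.csv() instead and expect dim() to report 50000 rows and 18 columns.

```r
# Stand-in for the Canvas download: a tiny CSV with 3 of the 18 columns.
# (Hypothetical rows; with the real data, use the path of the downloaded file.)
path <- tempfile(fileext = ".csv")
writeLines(c("trip_id,weekday,trip_duration",
             "1,Monday,600",
             "2,Saturday,1200",
             "3,Sunday,300"), path)

# Import the CSV as a data frame; the object then shows up in the Environment pane.
citibike <- read.csv(path)

# Check the size of the import: rows and columns.
# For the real dataset, dim(citibike) should return 50000 18.
dim(citibike)
nrow(citibike)
ncol(citibike)
```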
Citi Bike is a bike-sharing company based in New York City. Customers rent a bicycle from a station and may ride
it anywhere in the city for as long as they like. At the end of the trip, the customer returns the bicycle to a
designated Citi Bike station. Customers pay a small fixed fee for the rental session plus a variable fee based on
the duration of the trip. Customers may register as subscribers of the service, which gives them small discounts
on trips and other special offers. The company is a predecessor of contemporary scooter-sharing companies like
Bird or Lime. In this dataset, each row represents one rental session of a Citi Bike bicycle. The variables in the
dataset are as follows:
trip_id: Primary key; a unique identifier of the rental session.
bike_id: A code identifying the bike rented for the session.
weekday: The day of the week on which the session occurred.
start_hour: The hour of the day (0-23) at which the session began.
start_time: The date and time at which the rental session began.
start_station_id: The code identifying the station at which the rental session began.
start_station_name: The cross streets identifying the station at which the rental session began.
start_station_latitude: The latitude of the station at which the rental session began.
start_station_longitude: The longitude of the station at which the rental session began.
end_time: The date and time at which the rental session ended.
end_station_id: The code identifying the station at which the rental session ended.
end_station_name: The cross streets identifying the station at which the rental session ended.
end_station_latitude: The latitude of the station at which the rental session ended.
end_station_longitude: The longitude of the station at which the rental session ended.
trip_duration: The length of the rental session, in seconds.
subscriber: An indicator of whether the customer who initiated the session was a subscriber to Citi Bike.
birth_year: The year in which the customer who initiated the rental session was born.
gender: The gender of the customer who initiated the rental session (0 = unknown, 1 = male, 2 = female).
© Ryan Wagner, 2020. Do not copy or distribute without permission.
Q1. Identify the data type of each of the following variables: (1/4 pt each, 3 pts total)
a. trip_id
b. bike_id
c. weekday
d. start_hour
e. start_station_id
f. start_station_name
g. start_station_latitude
h. start_station_longitude
i. trip_duration
j. subscriber
k. birth_year
l. gender
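One way to start on Q1 is to ask R how each column is stored. The sketch below uses a small made-up data frame (not the real dataset) that reuses a few of the Citi Bike column names. R's storage class is only a hint: a column of numeric codes can still represent a categorical variable, so the codebook above matters as much as the output of str().

```r
# Made-up rows that reuse a few Citi Bike column names.
rides <- data.frame(
  trip_id       = c(101, 102, 103),
  weekday       = c("Monday", "Saturday", "Sunday"),
  start_hour    = c(8L, 14L, 23L),
  trip_duration = c(600, 1200, 300),
  gender        = c(1L, 2L, 0L),
  stringsAsFactors = FALSE
)

# str() prints each column with its storage type (num, int, chr, ...).
str(rides)

# sapply() applies class() to every column and returns a named character vector.
sapply(rides, class)
```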
Q2. Write the command to generate a proportion table showing the proportion of sessions that were initiated by
subscribers vs. non-subscribers. What proportion of trips were initiated by subscribers? (1 pt)
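A minimal sketch of a proportion table, using a made-up subscriber vector and assuming subscribers are coded 1 and non-subscribers 0. The real column may use different codes or labels, so check it with table() first. table() counts each value, and prop.table() divides those counts by their total.

```r
# Hypothetical subscriber indicator for six sessions (1 = subscriber, 0 = not).
subscriber <- c(1, 1, 0, 1, 1, 0)

# Counts per value, then each count divided by the total number of sessions.
counts <- table(subscriber)
props  <- prop.table(counts)
props

# With the imported data the same pattern would be:
# prop.table(table(citibike$subscriber))
```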
Q3. Write the command to create a new variable called trip_minutes that converts the duration of the trip from
seconds to minutes. What is the average length of a trip in minutes? (2 pts)
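A sketch of the seconds-to-minutes conversion on a made-up stand-in for the dataset. The $ operator both reads an existing column and creates a new one when you assign to it.

```r
# Toy stand-in for the dataset: trip durations in seconds (made-up values).
citibike <- data.frame(trip_duration = c(600, 1200, 300, 900))

# New column: 60 seconds per minute.
citibike$trip_minutes <- citibike$trip_duration / 60

# Average trip length in minutes; add na.rm = TRUE if the column has missing values.
mean(citibike$trip_minutes)   # 12.5 for these toy values
```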
Q4. Using the aggregate() command, find the average trip length in minutes among subscribers vs. non-subscribers. (2 pts)
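A sketch of aggregate() with its formula interface on made-up data, again assuming a 0/1 subscriber code. The variable to the left of ~ is summarized, the variable to the right of ~ defines the groups, and FUN is the summary function applied within each group.

```r
# Made-up trip lengths (minutes): four subscriber and two non-subscriber sessions.
citibike <- data.frame(
  trip_minutes = c(10, 20, 5, 15, 30, 40),
  subscriber   = c(1, 1, 1, 1, 0, 0)
)

# Mean of trip_minutes within each value of subscriber.
avg_by_sub <- aggregate(trip_minutes ~ subscriber, data = citibike, FUN = mean)
avg_by_sub
```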
Q5. Write the command to create a new variable called weekend that flags all trips that occurred on either
Saturday OR Sunday. What proportion of trips occur on the weekend? (2 pts)
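A sketch of the weekend flag, assuming weekday holds full English day names such as "Saturday". Run unique(citibike$weekday) to confirm the actual spelling. The | operator is R's element-wise OR, which matches the Saturday-OR-Sunday rule. The result is a logical vector, and the mean of a logical vector is the share of TRUE values.

```r
# Made-up weekday values for five sessions.
citibike <- data.frame(
  weekday = c("Monday", "Saturday", "Sunday", "Wednesday", "Friday"),
  stringsAsFactors = FALSE
)

# TRUE for Saturday or Sunday, FALSE otherwise.
citibike$weekend <- citibike$weekday == "Saturday" | citibike$weekday == "Sunday"

# Share of weekend trips, computed two equivalent ways.
mean(citibike$weekend)
prop.table(table(citibike$weekend))
```

A more compact flag with the same result is citibike$weekday %in% c("Saturday", "Sunday").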
Q6. Write the command to create a crosstable of subscriber status by weekend status. Express the crosstable as
a proportion table, with proportions aggregated by row (you will need to include the margin parameter
demonstrated in class). Describe the patterns you see in the table: does there appear to be a difference in
bike usage on weekdays vs. weekends between subscribers and non-subscribers? (2 pts)
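A sketch of a row-wise proportion table on made-up data, assuming a 0/1 subscriber code. With margin = 1, prop.table() divides each cell by its row total, so each row of the result sums to 1 and describes how that group splits between weekdays and weekends. With margin = 2 it would divide by column totals instead.

```r
# Made-up sessions: 4 subscriber (1) and 4 non-subscriber (0) trips.
citibike <- data.frame(
  subscriber = c(1, 1, 1, 1, 0, 0, 0, 0),
  weekend    = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)
)

# Crosstable of counts: rows = subscriber status, columns = weekend status.
counts <- table(subscriber = citibike$subscriber, weekend = citibike$weekend)

# Row proportions: each cell divided by its row total.
row_props <- prop.table(counts, margin = 1)
row_props
```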
Q7. Using the information found in Q4 and Q6, offer a possible explanation for the differences you observe
between subscribers and non-subscribers in ride length and in weekend vs. weekday usage. Why do you think
each group is using the service? (2 pts)
Q8. Write the command to create a crosstable of subscriber status by gender. Express the crosstable as a
proportion table, with proportions aggregated by row (you will need to include the margin parameter
demonstrated in class). According to the table, does Citi Bike's subscriber base appear to skew male or
female? (2 pts)
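The same margin = 1 pattern applies to subscriber status by gender. This sketch uses made-up rows. It also converts the gender codes into labelled factor levels using the codebook above (0 = unknown, 1 = male, 2 = female), so the table columns are self-explanatory. The relabelling step is optional.

```r
# Made-up sessions: 5 subscriber (1) and 3 non-subscriber (0) trips.
citibike <- data.frame(
  subscriber = c(1, 1, 1, 1, 1, 0, 0, 0),
  gender     = c(1, 1, 1, 2, 0, 1, 2, 0)
)

# Optional: attach readable labels to the numeric gender codes.
citibike$gender_label <- factor(citibike$gender, levels = c(0, 1, 2),
                                labels = c("unknown", "male", "female"))

# Row proportions of gender within each subscriber group.
counts    <- table(subscriber = citibike$subscriber, gender = citibike$gender_label)
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)
```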
Note: R often expresses decimals using scientific notation. As a reminder, the suffix e+01 means "move the
decimal point one place to the right" (multiply by 10), and the suffix e-01 means "move the decimal point one
place to the left" (divide by 10). For example, 1.5e+01 is 15 and 2.5e-01 is 0.25.
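The rule can be checked directly in the console, because the number after e is a power of ten. The sketch also shows two standard ways to ask R for fixed notation: format(..., scientific = FALSE) for a single value, and options(scipen = ...) for the rest of the session. The value 999 is just a commonly used large penalty, not a required setting.

```r
# e+01 multiplies by 10^1; e-01 multiplies by 10^-1.
x <- 1.5e+01
y <- 2.5e-01
x == 15      # TRUE
y == 0.25    # TRUE

# Show one small number in fixed (non-scientific) notation.
format(0.000012, scientific = FALSE)

# Penalize scientific notation for the rest of the session.
options(scipen = 999)
```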
Q9. Write the command to create a variable called age that subtracts the year the rider was born from the current
year, and create a histogram of the age variable. Describe the distribution of ages shown in the data. Does
anything strike you as odd? (2 pts)
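A sketch of the age calculation and histogram on made-up birth years. It takes the current year from the system clock. Because the data predate 2020, you may prefer to hard-code a year instead. Which reference year the assignment intends is an assumption here.

```r
# Made-up birth years for five riders.
citibike <- data.frame(birth_year = c(1985, 1990, 1972, 1999, 1960))

# Current year from the system date, as a number.
current_year <- as.numeric(format(Sys.Date(), "%Y"))

# Age = current year minus birth year.
citibike$age <- current_year - citibike$birth_year

# Histogram of ages (appears in the Plots pane in RStudio), plus a numeric summary.
hist(citibike$age, main = "Rider age", xlab = "Age (years)")
summary(citibike$age)
```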
Q10. Use the aggregate() command to find the average age by gender. Does there appear to be a
meaningful difference in average rider age by gender? (2 pts)
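aggregate() works the same way as in Q4, with age to the left of ~ and gender to the right. The rows below are made up. With the real data, keep in mind that gender code 0 means unknown.

```r
# Made-up ages for six riders, two per gender code.
citibike <- data.frame(
  age    = c(30, 40, 25, 35, 50, 45),
  gender = c(1, 1, 2, 2, 0, 0)
)

# Mean age within each gender code (0 = unknown, 1 = male, 2 = female).
avg_age <- aggregate(age ~ gender, data = citibike, FUN = mean)
avg_age
```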
MGT 3: Quantitative Methods in Business
R. Wagner, Winter 2020
Session 3
The Case For R
© 2019 Ryan Wagner
R is an open-source programming language that has become a popular fixture in
the data mining community, especially in the social sciences.
Benefits:
• FREE! Unlike Excel, SPSS, MATLAB, SAS, STATA, etc.
• Very powerful and flexible.
   It can do essentially anything other statistical analysis software can do, and it allows for extra customization.
• Massive global community.
   CRAN (the package repository), Stack Overflow, etc.
Interface
• Source: where you write your code (typically in a script).
• Console: a log of all executed code and the resulting output.
• Environment: all the objects you've loaded or created.
• Misc: graphs, help files, history, etc.
Syntax Rules

• R is case sensitive: table() and Table() are different names.
• Strings must be wrapped in quotes, either single or double ("hello" or 'hello').
   If you don't wrap a string in quotes, R will assume you're referring to an object name.
• To run a line of code, place your cursor on that line and press Ctrl + Enter (Cmd + Enter on a Mac).
• Spacing is optional.
   Spaces between object names, numbers, and operators are generally used for readability but are not required.

Ex: x
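A few lines you can run to see these syntax rules in action. The object names are arbitrary. One caveat: spacing is almost always optional, but x<-5 is read as assigning 5 to x, while x < -5 compares x with -5.

```r
# Case sensitivity: total and Total are two different objects.
total <- 10
Total <- 20
total == Total                 # FALSE

# Single and double quotes create the same string.
greeting <- "hello"
identical(greeting, 'hello')   # TRUE

# Without quotes, R looks for an object called hello, which does not exist here.
result <- try(hello, silent = TRUE)
inherits(result, "try-error")  # TRUE

# Spacing around operators does not change the result...
a <- 5 + 3
b<-5+3
a == b                         # TRUE

# ...except where it changes how R reads the symbols: this line is a comparison.
x <- 1
x < -5                         # FALSE
```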
