CSUB MapReduce Programming

1. Write a MapReduce program that outputs the number of times each neighborhood appears in the Kaggle AirBNB dataset. You can download the dataset from here: https://www.kaggle.com/dgomonov/new-york-city-airb… The schema (columns) of the dataset is shown at the same link.

The file is a CSV (comma-separated values) dataset; a comma separates the fields in the dataset.
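One practical consequence of the CSV format is that a text field may itself contain a comma inside quotes, so splitting a line on commas can give a different field count than a CSV-aware parser. The short Python sketch below illustrates the difference; the sample row is made up for illustration and is not taken from the dataset.

```python
import csv

# Hypothetical sample row, invented for illustration only.
sample_line = '1,"Cozy room, near park",Brooklyn,Kensington,149'

# Naive approach: split on every comma, including the one inside the quotes.
naive_fields = sample_line.split(",")

# CSV-aware approach: the quoted text stays together as a single field.
parsed_fields = next(csv.reader([sample_line]))

print(len(naive_fields))   # 6
print(len(parsed_fields))  # 5
```

Which approach is appropriate depends on how the assignment intends fields to be counted, so treat this only as something to check against your own data.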

Use the WordCount approach to count the number of rentals in each neighborhood (use the neighbourhood field). The Reduce stage should also output the neighborhood group (e.g. Brooklyn), taken from the neighbourhood_group field. For each neighborhood encountered, your output should look like this (this is only an example):

Brooklyn Kensington 25

Brooklyn Clinton Hill 5

Manhattan Midtown 45…
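The WordCount pattern for this part can be sketched locally in plain Python, in the style of a streaming mapper and reducer. This is only an illustration of the idea, not the required Hadoop program. The column positions, the header check, the function names, and the sample rows are all assumptions made for the example; verify the positions against the dataset schema.

```python
import csv
from collections import defaultdict

# Assumed 0-based column positions; verify against the dataset schema.
GROUP_COL = 4          # neighbourhood_group
NEIGHBOURHOOD_COL = 5  # neighbourhood


def map_line(line):
    """Emit ((group, neighbourhood), 1) for one CSV line, or nothing if unusable."""
    fields = next(csv.reader([line]), [])
    # Assumption: the header row starts with "id"; short lines are skipped.
    if len(fields) <= NEIGHBOURHOOD_COL or fields[0] == "id":
        return []
    return [((fields[GROUP_COL], fields[NEIGHBOURHOOD_COL]), 1)]


def reduce_counts(pairs):
    """Sum the 1s for each key, as a WordCount reducer does."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Made-up rows for illustration; not real dataset values.
lines = [
    "id,name,host_id,host_name,neighbourhood_group,neighbourhood",
    '1,"Sunny room, quiet",10,Ann,Brooklyn,Kensington',
    "2,Loft,11,Bob,Brooklyn,Clinton Hill",
    "3,Studio,12,Cy,Brooklyn,Kensington",
]

pairs = [pair for line in lines for pair in map_line(line)]
counts = reduce_counts(pairs)
for (group, neighbourhood), total in sorted(counts.items()):
    print(group, neighbourhood, total)
```

Using the pair (group, neighbourhood) as the key lets the reducer print the group next to each neighborhood without a second lookup.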

To receive full credit, please hand in all of the following items:

-All code (please attach this homework zipped into one file).

-Your screenshots (the -cat of the output AND the job running)

2. Write MapReduce programs that further analyze the same Kaggle AirBNB dataset used in part 1 of this homework. Write WordCount-style MapReduce programs as indicated below:

2A Write a WordCount program to count the number of lines in the file. Name the program: CountLines.

The Reducer should output:

Total number of lines in AirBNB file: [number]
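One way to apply the WordCount approach to line counting is to have the mapper emit the same constant key with a value of 1 for every line, so the reducer receives a single key and sums its values. The Python sketch below simulates that locally; the function names, the key "lines", and the sample input are hypothetical.

```python
def map_line(line):
    """Emit one (key, 1) pair per input line; the key is the same for all lines."""
    return ("lines", 1)


def reduce_total(pairs):
    """Sum the values of all pairs that share the single key."""
    return sum(value for _key, value in pairs)


# Made-up input lines for illustration.
lines = ["a,b,c", "d,e", "f,g,h,i"]

total = reduce_total(map_line(line) for line in lines)
print(f"Total number of lines in AirBNB file: {total}")
```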

2B Write a MapReduce WordCount program to count all lines that are shorter than the ideal number of fields. Name the program: CountBadShortRecords

The Reducer should output:

Total number of short lines in AirBNB file: [number]

2C Write a WordCount program to count all lines that are longer than the ideal number of fields. Name the program: CountBadLongRecords

The Reducer should output:

Total number of long lines in AirBNB file: [number]

2D Write a MapReduce WordCount program to count all lines that contain the ideal number of fields. Name the program: CountGoodRecords

The Reducer should output:

Total number of good lines in AirBNB file: [number]
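Parts 2B through 2D share one idea: count the fields on each line and compare that count with the expected number. The Python sketch below shows this classification locally. It assumes that fields are counted by splitting on commas and takes the expected count as a parameter; the value 3 and the sample lines are made up for the demonstration, and the function name is hypothetical.

```python
from collections import Counter


def classify(line, expected_fields):
    """Label a line as 'short', 'long', or 'good' by its comma-separated field count."""
    n = len(line.split(","))
    if n < expected_fields:
        return "short"
    if n > expected_fields:
        return "long"
    return "good"


# Made-up lines and expected count for illustration only.
lines = ["a,b,c", "d,e", "f,g,h,i", "j,k,l"]
EXPECTED_FIELDS = 3

counts = Counter(classify(line, EXPECTED_FIELDS) for line in lines)
for label in ("short", "long", "good"):
    print(f"Total number of {label} lines in AirBNB file: {counts[label]}")
```

In the separate programs the assignment asks for, each mapper would emit a pair only for its own label (for example, only "short" in CountBadShortRecords) and the reducer would sum those pairs.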

To receive full credit, please hand in all of the following items:

A. All code (please attach this homework zipped into one file).

B. -cat each of the four output files at the hlog command prompt, and a screenshot of the job running, for all 4 results.
