Instructions
1. Write a MapReduce program that outputs the number of times each neighborhood appears in the Kaggle AirBNB dataset. You can download the dataset from here: https://www.kaggle.com/dgomonov/new-york-city-airb… You can see the schema (columns) of the dataset at the link above, too.
The file is a CSV (comma-separated values) dataset; a comma separates the fields in the dataset.
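A short sketch of why the field separator deserves care: a CSV field may itself contain a comma when it is wrapped in double quotes, so a plain split on "," can report more fields than the row really has. The sample row below is hypothetical (it is not taken from the dataset), and the assignment does not say which parsing approach it expects, so treat this as an illustration only.

```python
import csv
import io

def parse_line(line):
    """Parse one CSV line into a list of fields, honoring double quotes.

    Returns an empty list for an empty line.
    """
    return next(csv.reader(io.StringIO(line)), [])

# Hypothetical sample row: the quoted second field contains a comma.
sample = '101,"Cozy room, near park",55,Ann,Brooklyn,Kensington'

naive = sample.split(",")    # splits inside the quoted field too
parsed = parse_line(sample)  # keeps the quoted field whole

print(len(naive))   # 7
print(len(parsed))  # 6
print(parsed[1])    # Cozy room, near park
```

Whichever approach is used, it should be the same in every program of this homework, since the field counts in part 2 depend on it.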
Use the WordCount approach: in the Reduce stage, count the number of rentals in each neighborhood (use the neighbourhood field), and also output the neighborhood group (e.g. Brooklyn) from the neighbourhood_group field. For each neighborhood encountered, your output should look like this (this is only an example):
Brooklyn Kensington 25
Brooklyn Clinton Hill 5
Manhattan Midtown 45…
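The counting described above can be sketched as a mapper that emits a (group, neighborhood) key with a count of 1, and a reducer that sums the counts per key. This is a minimal local simulation in Python, not a Hadoop job: the sort stands in for the shuffle phase, the column positions are assumptions to check against the dataset schema, and the function names are hypothetical.

```python
import csv
import io
from itertools import groupby

# Assumed 0-based column positions; verify against the dataset schema.
GROUP_COL = 4          # neighbourhood_group
NEIGHBOURHOOD_COL = 5  # neighbourhood

def neighbourhood_mapper(lines):
    """Emit ((group, neighbourhood), 1) for each data row."""
    for line in lines:
        fields = next(csv.reader(io.StringIO(line)), [])
        if len(fields) <= NEIGHBOURHOOD_COL:
            continue  # too short to hold both fields
        if fields[NEIGHBOURHOOD_COL] == "neighbourhood":
            continue  # header row
        yield (fields[GROUP_COL], fields[NEIGHBOURHOOD_COL]), 1

def neighbourhood_reducer(pairs):
    """Sum the counts per key; sorting stands in for the shuffle phase."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

def run_neighbourhood_count(lines):
    """Return output lines in the form 'group neighbourhood count'."""
    pairs = neighbourhood_mapper(lines)
    return [f"{g} {n} {total}" for (g, n), total in neighbourhood_reducer(pairs)]
```

Using the pair (group, neighborhood) as the key keeps the group available in the reducer without a second lookup.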
To receive full credit, please hand in all of the following items:
-All code (please attach this homework zipped into one file).
-Your screenshots (-cat AND running the job)
2. Write a MapReduce program that further analyzes the same Kaggle AirBNB dataset used in part 1 of this homework. Write WordCount-style MapReduce programs as indicated below:
2A Write a WordCount program to count the number of lines in the file. Name the program: CountLines.
The Reducer should output:
Total number of lines in AirBNB file: [number]
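One way to fit line counting into the WordCount pattern is to have the mapper emit the same constant key for every line, so that a single reduce call receives all the counts. The sketch below simulates that locally in Python; the key name "lines" and the function names are hypothetical, and whether the header row should be counted is not stated in the assignment (here it is counted).

```python
def count_lines_mapper(lines):
    """Emit ("lines", 1) once per input line, whatever the line contains."""
    for _ in lines:
        yield "lines", 1

def count_lines_reducer(pairs):
    """Sum all counts for the single key and format the required message."""
    total = sum(count for _, count in pairs)
    return f"Total number of lines in AirBNB file: {total}"

# Hypothetical three-line input, header included.
sample_lines = ["id,name", "1,A", "2,B"]
print(count_lines_reducer(count_lines_mapper(sample_lines)))
# Total number of lines in AirBNB file: 3
```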
2B Write a MapReduce WordCount program to count all lines that are shorter than the ideal number of fields. Name the program: CountBadShortRecords
The Reducer should output:
Total number of short lines in AirBNB file: [number]
2C Write a WordCount program to count all lines that are longer than the ideal number of fields. Name the program: CountBadLongRecords
The Reducer should output:
Total number of long lines in AirBNB file: [number]
2D Write a MapReduce WordCount program to count all lines that contain the ideal number of fields. Name the program: CountGoodRecords
The Reducer should output:
Total number of good lines in AirBNB file: [number]
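Parts 2B, 2C and 2D differ only in how a line's field count compares with the ideal count, so one classification helper can serve all three. The sketch below is a local Python simulation under stated assumptions: EXPECTED_FIELDS = 16 is an assumed value that should be confirmed against the dataset schema, fields are counted with a quote-aware CSV parser, and all function names are hypothetical.

```python
import csv
import io
from collections import Counter

EXPECTED_FIELDS = 16  # assumption: confirm against the dataset schema

def classify(line, expected=EXPECTED_FIELDS):
    """Label a line 'short', 'long' or 'good' by its number of CSV fields."""
    n = len(next(csv.reader(io.StringIO(line)), []))
    if n < expected:
        return "short"
    if n > expected:
        return "long"
    return "good"

def classify_mapper(lines, expected=EXPECTED_FIELDS):
    """Emit (label, 1) for every input line."""
    for line in lines:
        yield classify(line, expected), 1

def classify_reducer(pairs):
    """Sum the counts per label."""
    totals = Counter()
    for label, count in pairs:
        totals[label] += count
    return totals

def report(lines, label, expected=EXPECTED_FIELDS):
    """Format the reducer message for one label ('short', 'long' or 'good')."""
    totals = classify_reducer(classify_mapper(lines, expected))
    return f"Total number of {label} lines in AirBNB file: {totals[label]}"
```

In the four separate programs the assignment asks for, each mapper would emit only the label it is responsible for; for example, CountBadShortRecords would emit a pair only when classify returns "short".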
To receive full credit, please hand in all of the following items:
A. All code (please attach this homework zipped into one file).
B. -cat each of the four output files at the hlog command prompt, plus a screenshot of the job running, for all 4 results.