Big Data Analytics using Spark

Part A: Clustering

1. Find a dataset on Kaggle or from any other source. Make sure that the dataset is at least 500 MB.

2. Write a detailed description of the dataset.

3. Preprocess the dataset.

4. Use the K-means algorithm to cluster the dataset.

5. Use the Elbow method and the Silhouette method to find the optimal K.
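As a rough illustration of what steps 4 and 5 compute, here is a minimal pure-Python sketch on a made-up toy dataset. It is not a Spark implementation: the points, the farthest-point initialisation, and all helper names are assumptions chosen so the example is small and deterministic. On the real 500 MB dataset, Spark MLlib's `KMeans` and `ClusteringEvaluator` would be the natural tools. The Elbow method looks at the within-cluster sum of squares (WCSS) as K grows, and the Silhouette method picks the K with the highest mean silhouette score.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_centers(points, k):
    """Deterministic farthest-point initialisation (an illustrative assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

def wcss(points, centers, labels):
    """Within-cluster sum of squares, the quantity plotted in the Elbow method."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

def silhouette(points, labels):
    """Mean silhouette score; points in singleton clusters score 0 by convention."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for m, q in enumerate(points):
            if m != i:
                dists.setdefault(labels[m], []).append(math.sqrt(sq_dist(p, q)))
        means = {lab: sum(d) / len(d) for lab, d in dists.items()}
        a = means.pop(labels[i], None)  # mean distance within the point's own cluster
        if a is None or not means:
            scores.append(0.0)
            continue
        b = min(means.values())  # mean distance to the nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

wcss_by_k, sil_by_k = {}, {}
for k in range(1, 6):
    centers, labels = kmeans(points, k)
    wcss_by_k[k] = wcss(points, centers, labels)
    if k >= 2:  # the silhouette score is undefined for a single cluster
        sil_by_k[k] = silhouette(points, labels)

best_k = max(sil_by_k, key=sil_by_k.get)
print("WCSS by K:", wcss_by_k)
print("Silhouette by K:", sil_by_k)
print("K with the highest silhouette:", best_k)
```

In this toy case the WCSS curve drops sharply up to K = 3 and the silhouette score peaks there as well. On real data the two methods may disagree, so it is worth reporting both.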

Part B: Regression

1. Find one or two datasets on Kaggle or from any other source. Make sure that each dataset is at least 500 MB.

2. Write a detailed description of each dataset.

3. Preprocess each dataset.

4. Divide each dataset into training and testing sets.

5. Build two regression models.

6. Test the models and compute their accuracy (for regression, an error metric such as RMSE or R²).
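The split, fit, and evaluate steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data are synthetic, the every-fifth-row split stands in for a random split, and the two models (a mean baseline and a least-squares line) and all helper names are hypothetical choices, not the models the assignment requires. On the real datasets, Spark's `randomSplit` and MLlib's `RegressionEvaluator` would play the same roles.

```python
import math

# Synthetic rows (x, y): a line y = 2x + 1 with a small alternating offset.
rows = [(float(x), 2.0 * x + 1.0 + (0.5 if x % 2 == 0 else -0.5)) for x in range(20)]

# Deterministic 80/20 split: every fifth row goes to the test set.
test = [r for i, r in enumerate(rows) if i % 5 == 0]
train = [r for i, r in enumerate(rows) if i % 5 != 0]

def fit_mean(data):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_line(data):
    """Ordinary least squares for one feature, using the closed-form solution."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    slope = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x, _ in data)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def rmse(model, data):
    """Root mean squared error of the model on the given rows."""
    return math.sqrt(sum((y - model(x)) ** 2 for x, y in data) / len(data))

def r2(model, data):
    """Coefficient of determination: 1 - SSE / SST on the given rows."""
    mean_y = sum(y for _, y in data) / len(data)
    sse = sum((y - model(x)) ** 2 for x, y in data)
    sst = sum((y - mean_y) ** 2 for _, y in data)
    return 1.0 - sse / sst

# Fit on the training rows only, then score on the held-out test rows.
baseline = fit_mean(train)
line = fit_line(train)

rmse_base, rmse_line = rmse(baseline, test), rmse(line, test)
r2_base, r2_line = r2(baseline, test), r2(line, test)
print("baseline: RMSE", rmse_base, "R2", r2_base)
print("line:     RMSE", rmse_line, "R2", r2_line)
```

Comparing each model against a trivial baseline on the same test set makes the reported error easier to interpret than a single number on its own.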

Part C: Classification

1. Find one or two datasets on Kaggle or from any other source. Make sure that each dataset is at least 500 MB.

2. Write a detailed description of each dataset.

3. Preprocess each dataset.

4. Divide each dataset into training and testing sets.

5. Build two classification models.

6. Test the models and compute their accuracy.
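For the classification steps, accuracy is the fraction of test rows whose predicted label matches the true label. The sketch below shows that computation for two simple models on synthetic data. Everything in it is an assumption made for illustration: the data generator, the every-fifth-row split, the majority-class baseline, the nearest-centroid classifier, and the helper names. With Spark, MLlib classifiers and `MulticlassClassificationEvaluator` would take their place.

```python
import math
from collections import Counter

# Synthetic rows ((x, y), label): class 1 sits near (5, 5), class 0 near the origin.
rows = []
for i in range(20):
    label = 1 if i % 3 == 0 else 0
    rows.append(((5.0 * label + 0.5 * (i % 4), 5.0 * label + 0.5 * (i % 2)), label))

# Deterministic 80/20 split: every fifth row goes to the test set.
test = [r for i, r in enumerate(rows) if i % 5 == 0]
train = [r for i, r in enumerate(rows) if i % 5 != 0]

def fit_majority(data):
    """Baseline model: always predict the most frequent training label."""
    majority = Counter(label for _, label in data).most_common(1)[0][0]
    return lambda features: majority

def fit_nearest_centroid(data):
    """Predict the label whose training centroid is closest to the point."""
    grouped = {}
    for features, label in data:
        grouped.setdefault(label, []).append(features)
    centroids = {
        label: tuple(sum(col) / len(members) for col in zip(*members))
        for label, members in grouped.items()
    }
    def predict(features):
        return min(centroids, key=lambda lab: math.dist(features, centroids[lab]))
    return predict

def accuracy(model, data):
    """Fraction of rows whose predicted label equals the true label."""
    correct = sum(1 for features, label in data if model(features) == label)
    return correct / len(data)

# Fit on the training rows only, then score on the held-out test rows.
majority_model = fit_majority(train)
centroid_model = fit_nearest_centroid(train)

acc_majority = accuracy(majority_model, test)
acc_centroid = accuracy(centroid_model, test)
print("majority baseline accuracy:", acc_majority)
print("nearest-centroid accuracy: ", acc_centroid)
```

The majority baseline is a useful reference point: on imbalanced data a model can reach a high accuracy simply by predicting the dominant class, so a real model should be judged by how far it improves on that figure.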

Deliverables:
