DSC 540 Grand Canyon University Computer Science Paper

Classification with K-Nearest Neighbors

The k-nearest neighbors (kNN) algorithm is used for pattern classification and regression, and is well suited to data mining. Real-world examples of its use include determining credit ratings, identifying who is likely to default on a loan, detecting unusual patterns in credit card usage, and predicting the future value of stocks.


Step 1: To perform a classification using the k-nearest neighbors algorithm, complete the following:

  1. Access https://archive.ics.uci.edu/datasets?Task=Classification&NumInstances=1000-inf&NumAttributes=10-100&skip=0&take=10&sort=desc&orderBy=NumHits&search=

Note: There are about 120 datasets that are suitable for use in a classification task. For this part of the exercise, you must choose one of these datasets, provided it includes at least 10 attributes and 10,000 instances.

  2. Discuss the origin of the data and assess whether it was obtained in an ethical manner.

Step 2: For your selected dataset, build a classification model as follows (this can be done in either RStudio or Python):

  1. Explain the dataset and the type of information you wish to gain by applying a classification method.
  2. Explain the k-nearest neighbors algorithm and how you will be using it in your analysis (list the steps, explain the intuition behind the mathematical representation, and address its assumptions). Assume k = 5 using the Euclidean distance, and explain the value of k (see the distance-formula sketch after this list).
  3. Import the necessary libraries, then read the dataset into a data frame and perform initial statistical exploration (see the loading sketch after this list).
  4. Clean the data and address data-quality issues (e.g., normalization, outliers, missing data, encoding); use illustrative diagrams and plots and explain them (see the cleaning sketch after this list).
  5. Formulate two questions that can be answered by applying the k-nearest neighbors classification method.
  6. Split the data into 80% training and 20% testing sets.
  7. Train the k-nearest neighbors classifier on the training set using the following parameters: k = 5, metric = 'minkowski', p = 2 (see the training sketch after this list, which also covers items 6 and 8).
  8. Make classification predictions.
  9. Interpret the results in the context of the questions you asked.
  10. Validate your model using a confusion matrix, accuracy score, ROC-AUC curves, and k-fold cross-validation, then explain the results (see the validation sketch after this list).
  11. Include all mathematical formulas used and graphs representing the final outcomes.
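
For item 2, a brief sketch of the distance and decision rule assumed above: the Minkowski distance with p = 2 is the Euclidean distance, and the classifier assigns the majority class among the k = 5 nearest training points.

```latex
% Minkowski distance between instances x and y with n attributes;
% setting p = 2 gives the Euclidean distance used in this exercise.
d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p} \right)^{1/p},
\qquad
d_2(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

% kNN decision rule: majority vote among the k = 5 nearest neighbors N_5(x).
\hat{y}(\mathbf{x}) = \arg\max_{c} \sum_{j \in N_5(\mathbf{x})} \mathbf{1}\{ y_j = c \}
```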
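
For item 3, a minimal Python sketch of loading and initial exploration, assuming the pandas library, a hypothetical CSV file name uci_dataset.csv, and a hypothetical class column named target; adapt these to your chosen dataset.

```python
import pandas as pd

# Hypothetical file name -- replace with the file for your chosen UCI dataset.
df = pd.read_csv("uci_dataset.csv")

# Initial statistical exploration.
print(df.shape)             # number of instances and attributes
print(df.dtypes)            # attribute data types
print(df.describe())        # summary statistics for numeric attributes
print(df.isna().sum())      # missing values per attribute
print(df["target"].value_counts())  # class balance (hypothetical target column)
```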
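
For item 4, a sketch of one possible cleaning approach, continuing from the data frame df above; the column names are hypothetical, and the imputation, encoding, and scaling choices should be justified against your own dataset.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical attribute names -- adjust to your dataset.
numeric_cols = ["attribute_a", "attribute_b"]
categorical_cols = ["attribute_c"]

# Missing data: impute numeric attributes with the median and categorical ones with the mode.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
df[categorical_cols] = df[categorical_cols].fillna(df[categorical_cols].mode().iloc[0])

# Encoding: one-hot encode categorical attributes so they can enter a distance calculation.
df = pd.get_dummies(df, columns=categorical_cols)

# Normalization: kNN is distance-based, so put numeric attributes on a common scale.
scaler = StandardScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])
```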
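
For items 6 through 8, a scikit-learn sketch of the split, training, and prediction steps, again assuming a hypothetical class column named target.

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical class column -- adjust to your dataset.
X = df.drop(columns=["target"])
y = df["target"]

# Item 6: 80% training / 20% testing split (stratified to preserve class balance).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Item 7: kNN classifier with the required parameters
# (k = 5 neighbors, Minkowski metric with p = 2, i.e., Euclidean distance).
knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
knn.fit(X_train, y_train)

# Item 8: classification predictions on the held-out test set.
y_pred = knn.predict(X_test)
```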
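
For item 10, a validation sketch under the assumption of a binary target encoded as 0/1; for a multi-class target, pass the full probability matrix and multi_class='ovr' to roc_auc_score instead.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             roc_auc_score, RocCurveDisplay)
from sklearn.model_selection import cross_val_score

# Confusion matrix and accuracy on the test set.
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))

# ROC-AUC curve, using predicted probabilities for the positive class (binary target assumed).
y_proba = knn.predict_proba(X_test)[:, 1]
print("ROC-AUC:", roc_auc_score(y_test, y_proba))
RocCurveDisplay.from_predictions(y_test, y_proba)
plt.show()

# 10-fold cross-validation accuracy over the full dataset.
cv_scores = cross_val_score(knn, X, y, cv=10, scoring="accuracy")
print("Mean CV accuracy:", cv_scores.mean())
```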

Step 3: Prepare a comprehensive technical report as an RMarkdown document or Jupyter notebook, including all code with comments, all outputs, plots, and analysis.


Make sure the project documentation contains:

a) Problem statement

b) Algorithm of the solution

c) Analysis of the findings

d) References
