- Pick any dataset relevant to your major that you would like to analyze. Avoid using a dataset that is the same as or similar to the one you use for your final project.
- Randomly divide it into two parts containing 80% and 20% of the records.
- Select the input variables (at least 2) that you will use for cluster analysis. Explain your reasoning for the selection.
- Use SPSS or another tool to apply an appropriate cluster analysis method to cluster the larger part of the dataset.
- Identify clusters and describe their centroids and business meaning.
- If the clusters are poorly identified by the analysis or their business meaning is hard to describe, change your variable selection and return to step 3.
- For at least 5 records from the remaining smaller part of the dataset, identify the closest cluster centroid. This is a prediction of which cluster each record belongs to. Because these records were not used to identify the clusters, the prediction qualifies as an example of predictive analytics.
- Submit a Word report describing each step and the result of this process. Include the relevant scripts and outputs produced by the tool you use.
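
For students working outside SPSS, the split, clustering, and closest-centroid steps above can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: k-means is used as one common clustering method (the assignment does not prescribe one), the two input variables and all data values are synthetic placeholders, and the function names `split_records`, `k_means`, and `nearest_centroid` are hypothetical helpers, not part of any tool.

```python
import math
import random


def split_records(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into two parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def nearest_centroid(record, centroids):
    """Return the index of the centroid closest to the record (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(record, centroids[i]))


def k_means(records, k, iterations=50, seed=0):
    """Basic k-means (assumed method): return a list of k centroids."""
    # Start from k records picked at random from the data.
    centroids = random.Random(seed).sample(records, k)
    for _ in range(iterations):
        # Assign every record to its closest centroid.
        groups = [[] for _ in range(k)]
        for r in records:
            groups[nearest_centroid(r, centroids)].append(r)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Synthetic placeholder data with two made-up input variables per record.
rng = random.Random(1)
records = [(rng.gauss(10, 1), rng.gauss(2, 0.5)) for _ in range(50)]
records += [(rng.gauss(100, 5), rng.gauss(20, 2)) for _ in range(50)]

train, holdout = split_records(records)      # 80% / 20% random split
centroids = k_means(train, k=2)              # cluster the larger part only
predictions = [nearest_centroid(r, centroids) for r in holdout[:5]]
print(centroids)
print(predictions)
```

With real data, variables measured on very different scales would usually be standardized before computing distances, and the number of clusters `k` would be chosen to match clusters whose business meaning can be described.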