Data Mining and Data Warehousi_PP

Prepare a presentation of 5 to 7 slides, with an explanation in the notes of each slide. Where needed, mark on the slide which part of the attachment is being explained.

Question One
Select one of the datasets from the UCI Machine Learning Repository
(http://archive.ics.uci.edu/ml/) OR Kaggle ( https://www.kaggle.com/datasets ),
OR use your own dataset if one is available from any source. Then give the name and link of
the selected dataset.

Dataset Name: Adult

Dataset Link: Adult Data Set
Question Two
Load the dataset in Weka or, if you prefer Python tools, in an environment such as Google
Colaboratory (https://research.google.com/colaboratory/).
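For the Python route, the following is a minimal sketch of loading the data, assuming a local copy in the comma-separated layout of the UCI Adult files (no header row, "?" for missing values). The column names follow the UCI documentation; the load_adult helper and the two sample rows are made up for illustration.

```python
import csv
import io

# Column names as listed in the UCI documentation for the Adult dataset.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]
NUMERIC = {"age", "fnlwgt", "education-num", "capital-gain",
           "capital-loss", "hours-per-week"}


def load_adult(handle):
    """Parse rows from an open text handle into a list of dicts.

    Numeric columns become int; the '?' placeholder becomes None.
    """
    rows = []
    for raw in csv.reader(handle, skipinitialspace=True):
        if len(raw) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = {}
        for name, value in zip(COLUMNS, raw):
            value = value.strip()
            if value == "?":
                row[name] = None
            elif name in NUMERIC:
                row[name] = int(value)
            else:
                row[name] = value
        rows.append(row)
    return rows


# Made-up rows in the same layout, used here instead of the real file.
SAMPLE = """\
41, Private, 150000, Bachelors, 13, Never-married, Sales, Not-in-family, White, Female, 0, 0, 40, United-States, <=50K
52, ?, 120000, HS-grad, 9, Married-civ-spouse, ?, Husband, White, Male, 0, 0, 45, United-States, >50K
"""

rows = load_adult(io.StringIO(SAMPLE))
print(len(rows), "rows,", len(COLUMNS), "attributes")
```

With a downloaded file, the same helper can be called on an open handle, for example open("adult.data", newline=""), where the path is an assumption about where the file was saved.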
Question Three
Understand and describe the nature and structure of the selected dataset.
- A brief description of the dataset: include the number of attributes, the number of instances,
and any outliers in the dataset.

Number of instances: 32,561 (training split; the full dataset, including the test split, has 48,842)

Number of attributes: 15

The dataset contains outliers, as expected for real-world census data covering a wide range of
income levels and demographics; the capital-gain attribute, for example, has a small number of
very large values.
The dataset includes both numeric and categorical attributes, such as age, workclass,
education, marital status, and occupation. The "income" attribute is the class variable,
indicating whether an individual's income is >50K or <=50K.
Question Four
Provide descriptive statistics for at least 2 attributes using statistical methods:
(1) Include measures of central tendency such as the mean, median, and mode. (2)
Describe the spread of your data. This may include the variance, standard
deviation, skewness, and kurtosis.
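The measures listed above can be sketched with the Python standard library. This is a minimal example, assuming the attribute has already been extracted as a list of numbers; the describe helper and the sample values are hypothetical. Skewness and kurtosis are computed from population central moments, with kurtosis reported as excess kurtosis (0 for a normal distribution).

```python
import statistics


def describe(values):
    """Return central tendency and spread measures for a numeric list.

    Assumes at least two values that are not all identical.
    """
    n = len(values)
    mean = statistics.fmean(values)
    # Population central moments of order 2, 3 and 4.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,              # excess kurtosis
    }


# Hypothetical values standing in for two numeric attributes.
age = [25, 38, 28, 44, 38, 52, 31]
hours_per_week = [40, 40, 50, 40, 30, 60, 40]

for name, column in (("age", age), ("hours-per-week", hours_per_week)):
    print(name)
    for measure, value in describe(column).items():
        print(f"  {measure}: {value:.3f}")
```

A positive skewness indicates a longer right tail, and a positive excess kurtosis indicates heavier tails than a normal distribution.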
Question Five
Do basic preprocessing of the dataset, such as data cleaning, data reduction, or normalization
(if applicable or required).
See the attached Python script.
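The attached script is not reproduced here. As an illustration only, the following sketch shows three common preprocessing steps (dropping rows with missing values, removing duplicate rows, and min-max normalization), assuming each record is a dict in which None marks a missing value. The helper names and the sample rows are hypothetical.

```python
def drop_missing(rows):
    """Keep only rows in which every attribute has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence."""
    seen = set()
    unique = []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def min_max_normalize(rows, column):
    """Return copies of the rows with one numeric column scaled to [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    scaled = []
    for r in rows:
        new_row = dict(r)
        # A constant column carries no information; map it to 0.0.
        new_row[column] = (r[column] - lo) / span if span else 0.0
        scaled.append(new_row)
    return scaled


# Hypothetical records with one missing value and one duplicate.
raw = [
    {"age": 25, "workclass": "Private", "hours-per-week": 40},
    {"age": 50, "workclass": None, "hours-per-week": 45},
    {"age": 25, "workclass": "Private", "hours-per-week": 40},
    {"age": 45, "workclass": "State-gov", "hours-per-week": 60},
    {"age": 35, "workclass": "Private", "hours-per-week": 20},
]

clean = drop_duplicates(drop_missing(raw))
clean = min_max_normalize(clean, "age")
clean = min_max_normalize(clean, "hours-per-week")
for row in clean:
    print(row)
```

Dropping rows is the simplest way to handle missing values; imputing the most frequent category is an alternative when too many rows would otherwise be lost.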
Question Six
Based on the dataset, run the Apriori algorithm with different support and confidence values.
Discuss the generated rules.
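To show how the support and confidence thresholds change the output, here is a small from-scratch sketch of Apriori, assuming each record has been turned into a set of "attribute=value" items. The transactions are made up, and the apriori and rules helpers are illustrative rather than the implementation used by Weka or any library.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules above the threshold."""
    found = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), sup, confidence))
    return found


# Hypothetical transactions built from three categorical attributes.
tx = [
    {"education=Bachelors", "sex=Male", "income=>50K"},
    {"education=Bachelors", "sex=Female", "income=>50K"},
    {"education=HS-grad", "sex=Male", "income=<=50K"},
    {"education=HS-grad", "sex=Female", "income=<=50K"},
    {"education=Bachelors", "sex=Male", "income=<=50K"},
]

for min_sup in (0.4, 0.6):
    for min_conf in (0.6, 1.0):
        found = rules(apriori(tx, min_sup), min_conf)
        print(f"support>={min_sup}, confidence>={min_conf}: {len(found)} rules")
        for lhs, rhs, sup, conf in found:
            print(f"  {sorted(lhs)} -> {sorted(rhs)} (sup={sup:.2f}, conf={conf:.2f})")
```

Raising the minimum support removes rare itemsets before any rule is formed, while raising the minimum confidence keeps only rules whose consequent almost always accompanies the antecedent.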
Question Seven
Based on your selected dataset, apply the SVM data mining algorithm.
Provide the results and accuracies of the algorithm and discuss them with supporting screenshots.
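One possible way to run this step in Python is with scikit-learn. The sketch below is hedged: it uses synthetic data from make_classification as a stand-in for the encoded dataset, and the choice of kernels, C value, and 70/30 split are assumptions, not settings given in the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the numeric-encoded dataset (assumption).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

svm_results = {}
for kernel in ("linear", "rbf"):
    # SVMs are sensitive to feature scale, so standardize first.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    model.fit(X_train, y_train)
    svm_results[kernel] = accuracy_score(y_test, model.predict(X_test))
    print(f"kernel={kernel}: test accuracy={svm_results[kernel]:.3f}")
```

On the real dataset, the categorical attributes would first need to be encoded as numbers (for example with one-hot encoding) before being passed to SVC.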
Question Eight
Based on your selected dataset, apply the decision tree data mining algorithm with
different parameter settings and record the accuracies.
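A sketch of recording accuracies under different parameter settings, again assuming scikit-learn and synthetic stand-in data. The parameters varied here (split criterion and maximum depth) are one reasonable choice among several; the assignment does not say which parameters to change.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the numeric-encoded dataset (assumption).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

tree_results = {}
for criterion in ("gini", "entropy"):
    for max_depth in (2, 5, None):  # None lets the tree grow fully
        tree = DecisionTreeClassifier(
            criterion=criterion, max_depth=max_depth, random_state=0
        )
        tree.fit(X_train, y_train)
        tree_results[(criterion, max_depth)] = tree.score(X_test, y_test)

print("criterion  max_depth  accuracy")
for (criterion, max_depth), accuracy in tree_results.items():
    print(f"{criterion:<10} {str(max_depth):<10} {accuracy:.3f}")
```

Comparing training accuracy with test accuracy for the unrestricted tree is a quick way to check for overfitting.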
Question Nine
Apply the K-means algorithm to the dataset (with k=4) and study the clusters formed.
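A minimal sketch of this step with scikit-learn, assuming numeric features only. The data comes from make_blobs as a stand-in, so the cluster sizes and centroids printed here say nothing about the real dataset; only k=4 is taken from the question.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature stand-in for the numeric attributes (assumption).
X, _ = make_blobs(n_samples=200, centers=4, cluster_std=0.6, random_state=0)

# K-means uses Euclidean distance, so put features on a common scale.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Study the clusters: size of each cluster and its centroid.
sizes = np.bincount(labels, minlength=4)
for cluster_id, (size, center) in enumerate(zip(sizes, kmeans.cluster_centers_)):
    print(f"cluster {cluster_id}: size={size}, centroid={np.round(center, 2)}")
print(f"within-cluster sum of squares: {kmeans.inertia_:.2f}")
```

To interpret the clusters on the real dataset, the class label can be cross-tabulated against the cluster assignment after fitting, since K-means itself does not use the label.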