Presentation

Introduce your data, including some background on the problem domain and explanation of the attributes/features. (10 pts)

Explicitly state the data science questions you are trying to answer. (10 pts)

Present the three approaches that you ran in SAS, including the results (trees, coefficients, confusion matrices, area under the ROC curve, clusters, etc.) and your interpretation. (10 pts)

Include one or two slides about your conclusions. (10 pts)

Add one slide on what you think was the hardest part of this project and why. (10 pts)

Project submission

Describe the data. Especially pay attention to the different attributes/features and their meaning. I should be able to have a detailed understanding of the data by reading your report, without actually looking at the data. (10 pts)

Explicitly list the data science questions that you are asking and will try to answer. (10 pts)

Try at least three different data mining approaches from SAS on your data. Present your results (trees, coefficients, confusion matrices, etc.) and interpret them. What insights did you gain? Are the results good or bad? (10 pts)

Use figures wherever you can but accompany them with an interpretation (text). (10 pts)

Include a section where you summarize the answers to your earlier questions (after presenting the results). (10 pts)

Oxygen Consumption Variation in Fitness
Name: Ozoya Ohilebo
Course code:
Course Name:
Course Instructor:
Date:
Project Outline
• My project uses a fitness prediction data set.
• The data set has 10 columns and 36 rows, its label is not available, and its
location is Cas_v4e051-default/GAINDS.
• The Data node defines all of the project’s input information.
Project Dataset
Image showing variable name, type, role, and level
Data Mining Problems
• This data mining analysis focuses on supervised learning with three models: a
Decision Tree, a Forest, and a Logistic Regression.
Question Nodes
• Which model is the project champion?
• On what basis was the champion model chosen?
• What are the five most important factors in my project?
Data Mining Process
Image showing data mining process
Data Mining Process Cont…
• Decision Tree: The Decision Tree node creates a tree model by applying a series
of simple rules. Each rule assigns an observation to a segment based on the value
of one input.
• Forest: The Forest node creates a model that consists of a number of decision
trees.
• Logistic Regression: The Logistic Regression node fits a logistic regression
model for a binary or nominal target.
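As a rough illustration of the three model types described above, the sketch below hand-codes three tiny trees, a majority-vote forest, and a logistic probability. The input names (`run_time`, `rest_pulse`, `age`), every threshold, and the coefficients are hypothetical and were not taken from the SAS results; the code only shows the mechanics, not the fitted models.

```python
import math
from collections import Counter

# Hypothetical rules: each test looks at exactly one input and routes
# the observation to a segment (a leaf label).
def tree_a(obs):
    if obs["run_time"] < 10.0:
        return "high" if obs["rest_pulse"] < 55 else "low"
    return "low"

def tree_b(obs):
    return "high" if obs["age"] < 45 else "low"

def tree_c(obs):
    return "high" if obs["rest_pulse"] < 50 else "low"

def forest_predict(obs, trees):
    """A forest combines several trees; here by simple majority vote."""
    votes = Counter(tree(obs) for tree in trees)
    return votes.most_common(1)[0][0]

def logistic_probability(obs, intercept, coefficients):
    """Binary logistic regression: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * obs[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical observation and coefficients, for illustration only.
obs = {"run_time": 9.2, "rest_pulse": 52, "age": 40}
print(forest_predict(obs, [tree_a, tree_b, tree_c]))
print(logistic_probability(obs, intercept=5.0, coefficients={"run_time": -0.5}))
```

An odd number of trees is used so that a two-class vote cannot tie.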
Results
• The intercept for the target level “168” with a t-value of 0.9514 is the most
significant parameter.
Describing Results
• The ROC curve is a plot of sensitivity (the true positive rate) against
1 − specificity (the false positive rate), which are both measures of
classification based on the confusion matrix.
• The KS cutoff line is drawn at the cutoff value 0, where the specificity value is
1.0, and the sensitivity value is $MathTool.roundTo (3,$sensV).
• The Event classification report is a visual representation of the confusion
matrix at various cutoff values for each partition.
• The nominal classification report displays either the percentage of or the
number of observations predicting each target level.
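The quantities in the bullets above can be sketched in a few lines. The example below computes a confusion matrix at each cutoff, the resulting sensitivity and specificity, and the KS (Youden) statistic, taken here as sensitivity + specificity − 1. The scored observations are hypothetical toy data, not output from the project.

```python
# Hypothetical toy data: (predicted event probability, actual label 1 or 0).
scored = [(0.95, 1), (0.85, 1), (0.70, 0), (0.65, 1),
          (0.40, 0), (0.30, 1), (0.20, 0), (0.10, 0)]

def confusion_at(cutoff, rows):
    """Confusion matrix counts when scores >= cutoff are classified as events."""
    tp = sum(1 for p, y in rows if p >= cutoff and y == 1)
    fn = sum(1 for p, y in rows if p < cutoff and y == 1)
    fp = sum(1 for p, y in rows if p >= cutoff and y == 0)
    tn = sum(1 for p, y in rows if p < cutoff and y == 0)
    return tp, fn, fp, tn

def roc_points(rows, cutoffs):
    """One (cutoff, sensitivity, specificity, Youden J) tuple per cutoff."""
    points = []
    for c in cutoffs:
        tp, fn, fp, tn = confusion_at(c, rows)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        points.append((c, sensitivity, specificity, sensitivity + specificity - 1))
    return points

cutoffs = [i / 10 for i in range(11)]
points = roc_points(scored, cutoffs)
best = max(points, key=lambda point: point[3])  # largest Youden J

for cutoff, sens, spec, j in points:
    print(f"cutoff={cutoff:.1f} sensitivity={sens:.2f} specificity={spec:.2f} J={j:.2f}")
print("KS (Youden) statistic:", round(best[3], 3))
```

Plotting sensitivity against 1 − specificity for these points gives the ROC curve; the KS cutoff is the cutoff at which J is largest.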
Describing Results Cont…
Images showing the t-value
Describing Results Cont…
• The results show that the target variable is maximum pulse and that the project
champion is Logistic Regression.
• The eight most important variables are determined by their relative importance.
• The drawn partition has a Cumulative Lift of 1.74 in the 10% quantile.
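To make the cumulative lift figure concrete, here is a minimal sketch of how such a value can be computed: the event rate among the top-scored fraction of observations divided by the overall event rate. The scores and labels are hypothetical toy data, and the helper name `cumulative_lift` is an assumption, not a SAS function.

```python
def cumulative_lift(rows, fraction):
    """Event rate in the top `fraction` of rows (by score) over the overall event rate."""
    ranked = sorted(rows, key=lambda row: row[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    overall_rate = sum(label for _, label in ranked) / len(ranked)
    return top_rate / overall_rate

# Hypothetical toy data: 20 (score, label) pairs, 8 of which are events.
scored = [
    (0.97, 1), (0.93, 1), (0.90, 0), (0.86, 1), (0.81, 1),
    (0.77, 0), (0.72, 1), (0.68, 0), (0.63, 1), (0.59, 0),
    (0.54, 0), (0.50, 1), (0.45, 0), (0.41, 0), (0.36, 1),
    (0.32, 0), (0.27, 0), (0.23, 0), (0.18, 0), (0.14, 0),
]

print(cumulative_lift(scored, 0.10))  # lift in the top 10% of scores
```

A lift above 1 in the 10% quantile means the highest-scored tenth of the observations contains events at a higher rate than the data as a whole.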
Describing Results Cont…
Images showing Logistic Regression result
Answers to Questions
• The project champion model is Logistic Regression.
• The model was chosen based on the KS (Youden) statistic for the test partition.
• The five most important factors are inputted age, gender, oxygen
consumption, performance, and rest-pulse.
Challenges
• The challenges I faced were finding an appropriate data set for this project,
defining the project scope, and managing the timing.
Final Project
Problem 1 (100 Pts): The problem below treats personal income as the independent variable (x) and
personal consumption expenditures as the dependent variable (y). Use the following data to:
1. Determine the regression equation of the line y = b1*x + b0.
2. Determine the coefficient of determination, r squared (note: you need SSE, SSR, and SST).
3. State what your r squared value means.
Personal income ($)    Personal consumption expenditures ($)
32,282                 26,848
33,872                 28,228
35,423                 29,818
37,723                 31,210
39,418                 32,551
40,156                 33,273
39,113                 32,853
23,310                 18,714
24,444                 19,569
25,657                 20,414
27,260                 21,434
28,336                 22,738
30,317                 24,227
31,162                 25,074
31,448                 25,865
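The three steps of Problem 1 can be sketched in plain Python as follows. This is one possible way to organize the calculation, using the standard least-squares formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄; the function name `fit_line` is an assumption made for this example.

```python
# Data transcribed from the problem statement:
# (personal income, personal consumption expenditures), both in dollars.
data = [
    (32282, 26848), (33872, 28228), (35423, 29818), (37723, 31210),
    (39418, 32551), (40156, 33273), (39113, 32853), (23310, 18714),
    (24444, 19569), (25657, 20414), (27260, 21434), (28336, 22738),
    (30317, 24227), (31162, 25074), (31448, 25865),
]

def fit_line(pairs):
    """Least-squares fit of y = b1*x + b0, plus SSE, SSR, SST and r squared."""
    n = len(pairs)
    x_mean = sum(x for x, _ in pairs) / n
    y_mean = sum(y for _, y in pairs) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pairs)
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in pairs)      # unexplained variation
    ssr = sum(((b0 + b1 * x) - y_mean) ** 2 for x, _ in pairs)  # explained variation
    sst = sum((y - y_mean) ** 2 for _, y in pairs)              # total variation
    return {"b1": b1, "b0": b0, "SSE": sse, "SSR": ssr, "SST": sst, "r2": ssr / sst}

result = fit_line(data)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

For step 3, r squared is the share of the variation in consumption expenditures that the fitted line explains, so a value close to 1 indicates that income accounts for nearly all of that variation.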
Advertising (x1)    Revenue (y)
5                   101.3
3                   51.9
4                   74.8
4.3                 126.2
3.6                 137.8
3.5                 101.4
5                   237.8
6.9                 219.6

Worksheet columns to fill in for each row:
X - Xmean
Y - Ymean
(X - Xmean) squared
(X - Xmean)(Y - Ymean)
y(hat) = b0 + b1*x
e = y - y(hat)
e squared
(Y - Ymean) squared
(y(hat) - Ymean) squared

Summary cells to fill in: Xmean, Ymean, b1, b0, SUM of each column, SSE, SSR, SST = SSE + SSR, r squared = SSR / SST

Part A: Find the equation of the line
Part B: Find the correlation of the line and comment
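The worksheet can also be filled in column by column, as in this sketch. It assumes the advertising and revenue values pair up row by row in the order listed, which the flattened source suggests but does not state outright; the dictionary keys are names chosen for this example.

```python
advertising = [5, 3, 4, 4.3, 3.6, 3.5, 5, 6.9]
revenue = [101.3, 51.9, 74.8, 126.2, 137.8, 101.4, 237.8, 219.6]

n = len(advertising)
x_mean = sum(advertising) / n
y_mean = sum(revenue) / n

# First pass: the deviation columns that do not depend on b0 and b1.
rows = []
for x, y in zip(advertising, revenue):
    dx, dy = x - x_mean, y - y_mean
    rows.append({"x": x, "y": y, "dx": dx, "dy": dy,
                 "dx2": dx * dx, "dxdy": dx * dy, "dy2": dy * dy})

# Part A: slope and intercept from the column sums.
b1 = sum(r["dxdy"] for r in rows) / sum(r["dx2"] for r in rows)
b0 = y_mean - b1 * x_mean

# Second pass: fitted values, residuals, and their squares.
for r in rows:
    r["y_hat"] = b0 + b1 * r["x"]
    r["e"] = r["y"] - r["y_hat"]
    r["e2"] = r["e"] ** 2
    r["reg2"] = (r["y_hat"] - y_mean) ** 2

# Part B: sums of squares and r squared.
sse = sum(r["e2"] for r in rows)
ssr = sum(r["reg2"] for r in rows)
sst = sum(r["dy2"] for r in rows)
r_squared = ssr / sst

print(f"y(hat) = {b0:.3f} + {b1:.3f} * x")
print(f"SSE={sse:.3f} SSR={ssr:.3f} SST={sst:.3f} r squared={r_squared:.4f}")
```

Two quick checks on a finished worksheet: the residual column e should sum to zero (up to rounding), and SSE + SSR should equal SST.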
