CS 250 01 Fall 2022
Homework Assignment 5 (Extra Credit)
Due on Moodle before class on Friday, October 21. Email submissions will not be accepted.
Submit a single Jupyter Notebook for all questions. For this homework assignment,
download the Possum Regression dataset from Kaggle (link:
https://www.kaggle.com/datasets/abrambeyer/openintro-possum) and do the following:
1.
a. Read the data into a dataframe. [1 point]
b. Remove all rows containing nulls. [1 point]
c. Drop the case and Pop columns. [2 points]
d. Separate the sex column into two new columns, male and female. (Find out how
to do this. If you can't, ask me.) [5 points]
e. Drop the original sex column. [1 point]
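The steps in question 1 can be sketched with pandas as follows. This is a minimal sketch, not a required solution: the file name possum.csv and the encoding of sex as the strings "m" and "f" are assumptions about the Kaggle download, and preprocess() and load_possum() are hypothetical helper names.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Steps 1b-1e: drop nulls, drop columns, split 'sex' into male/female."""
    # b. Remove every row that contains at least one null value.
    df = df.dropna()
    # c. Drop the 'case' and 'Pop' columns.
    df = df.drop(columns=["case", "Pop"])
    # d. Build two 0/1 indicator columns from 'sex'.
    #    Assumption: 'sex' holds the strings "m" and "f".
    df = df.assign(
        male=(df["sex"] == "m").astype(int),
        female=(df["sex"] == "f").astype(int),
    )
    # e. Drop the original 'sex' column.
    return df.drop(columns=["sex"])


def load_possum(path: str = "possum.csv") -> pd.DataFrame:
    # a. Read the CSV into a dataframe; "possum.csv" is an assumed file name.
    return preprocess(pd.read_csv(path))
```

In a notebook cell, possum = load_possum() would then hold the cleaned data. pd.get_dummies(df["sex"]) is another common way to do step d; it names the new columns after the values found in sex, so they would need renaming to male and female.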
2.
a. Split the dataframe and labels into 75% training and 25% testing sets using
train_test_split(). [2 points]
b. Train a Logistic Regression classifier on the training data. [2 points]
c. Print the accuracy score of the classifier. [1 point]
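Question 2 might look like the sketch below, where the input is the cleaned dataframe from question 1. It assumes the male column (1 = male, 0 = female) is the label; both male and female are removed from the features, because female is simply 1 - male and would leak the answer. The helper name train_and_score() and the random_state value are hypothetical choices, and max_iter is raised only to avoid convergence warnings on the unscaled measurements.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_score(df: pd.DataFrame, random_state: int = 42):
    # Assumption: 'male' is the label; 'female' is dropped from the
    # features as well, since it carries the same information.
    X = df.drop(columns=["male", "female"])
    y = df["male"]
    # a. 75% training / 25% testing split (fixed seed for reproducibility).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state
    )
    # b. Fit the classifier; max_iter is raised because the default (100)
    #    may not converge on unscaled measurements.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)
    # c. Accuracy on the held-out test set.
    acc = accuracy_score(y_test, clf.predict(X_test))
    return clf, X_test, y_test, acc
```

For example, clf, X_test, y_test, acc = train_and_score(possum) followed by print(acc) would print the test-set accuracy, with possum being the cleaned dataframe from question 1.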
3. Evaluate the performance on the test data using a confusion matrix (code given in class).
The matrix should have the class labels “Male” and “Female” on the axes. [5 points]
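The confusion-matrix code from class is not reproduced here; as a stand-in, the sketch below uses scikit-learn's confusion_matrix and ConfusionMatrixDisplay. It assumes the labels are encoded as 1 = Male and 0 = Female (as with a male label column), and the helper names are hypothetical. Passing labels=[1, 0] fixes the row and column order so that "Male" comes first on both axes.

```python
import pandas as pd
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Assumed encoding of the label column: 1 = Male, 0 = Female.
LABELS = [1, 0]
NAMES = ["Male", "Female"]


def labeled_confusion_matrix(y_true, y_pred) -> pd.DataFrame:
    # Rows are the actual classes, columns the predicted classes.
    cm = confusion_matrix(y_true, y_pred, labels=LABELS)
    return pd.DataFrame(
        cm,
        index=[f"Actual {n}" for n in NAMES],
        columns=[f"Predicted {n}" for n in NAMES],
    )


def plot_confusion_matrix(y_true, y_pred):
    # Draws the same matrix with Male/Female tick labels (needs matplotlib).
    cm = confusion_matrix(y_true, y_pred, labels=LABELS)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=NAMES)
    return disp.plot()
```

For example, plot_confusion_matrix(y_test, clf.predict(X_test)) draws the matrix with "Male" and "Female" on both axes, while labeled_confusion_matrix() returns the same counts as a dataframe that prints cleanly in a notebook.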