Project 2
RESOURCES ALLOWED: This project is to be completed individually. The only resources you may use
when completing the project are:
Material posted on the course Blackboard page
The course textbook and “further resources” listed in the Syllabus
If you are unsure about whether a resource is allowed, then please ask your instructor. You should
also consult the course syllabus for a further explanation of academic honesty.
WHAT TO SUBMIT: You will submit a single Python file project2.py to Gradescope.
The first lines of your file should include the following:
#
#MCS 260 Spring 2023 Project 2
#I hereby attest that I have adhered to the rules for projects as well as UIC’s Academic Integrity standards while completing this project.
The rest of your file should create the functions described below.
TERMINOLOGY: When we refer to a “word” below, we mean a string that only contains alphabetic
characters. For example, the strings “10” and “C-3PO” are not words. To make the project simpler, we
will also define strings with apostrophes or dashes as NOT being words, such as “doesn’t” or “deep-fried”.
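One way to read this definition in Python is through the built-in str.isalpha method, which is True only when a non-empty string consists entirely of letters. The helper name is_word below is hypothetical (it is not one of the required functions). Note also that isalpha accepts non-English letters such as "é", which this sketch assumes is acceptable.

```python
def is_word(s):
    """Return True if the string s is a "word": letters only, nothing else."""
    # str.isalpha() is False for the empty string and for any string that
    # contains a digit, apostrophe, dash, space, or other non-letter.
    return s.isalpha()


# Try the examples from the terminology paragraph.
for s in ["hello", "10", "C-3PO", "doesn't", "deep-fried"]:
    print(s, is_word(s))
```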
FUNCTIONS:
1. The LineList function
The LineList function will take as input a string F which represents the location of a text file. This
function will return a nested list L where L[i] is a list of the words on line i+1 of F, with all
punctuation removed (i.e. all characters that are neither alphabetic nor numeric) and all words
in lowercase.
For example, if F has the following lines:
Thursday post wasn’t on time.
Great work with your initial post!
How is Mr. Rogers doing?
Then L[0] = ["thursday", "post", "on", "time"] (notice “wasn’t” was ignored since it isn’t a word),
L[1] = ["great", "work", "with", "your", "initial", "post"], and
L[2] = ["how", "is", "mr", "rogers", "doing"].
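The per-line processing in this example can be sketched as follows. This is only one possible reading of the specification. It assumes that tokens are separated by whitespace, that punctuation is stripped only from the two ends of each token, and that a token is kept only if what remains is purely alphabetic. The helper name words_on_line is hypothetical, and the sketch works on a list of strings rather than on a file, so opening and reading F is left out.

```python
def words_on_line(line):
    """Return the lowercase words found on one line of text."""
    words = []
    for token in line.split():
        # Strip characters that are neither letters nor digits from both ends.
        start, end = 0, len(token)
        while start < end and not token[start].isalnum():
            start += 1
        while end > start and not token[end - 1].isalnum():
            end -= 1
        core = token[start:end]
        # Keep the token only if what remains is a word (letters only).
        if core.isalpha():
            words.append(core.lower())
    return words


example_lines = [
    "Thursday post wasn’t on time.",
    "Great work with your initial post!",
    "How is Mr. Rogers doing?",
]
L = [words_on_line(line) for line in example_lines]
print(L)
```

Under these assumptions, “wasn’t” is dropped because the apostrophe sits inside the token, while “time.” and “Mr.” lose only their trailing period.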
2. The SentenceCount function
The SentenceCount function will take as input a string F which represents the location of a text
file. This function will return an integer which represents the number of sentences in the file. A
sentence will end with either a period, exclamation point, or question mark. Note that periods
are also used for “Mr.”, “Mrs.”, “Ms.”, and “Dr.” and these periods should not be counted as the
end of a sentence. In the example above, there are 3 sentences.
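A minimal sketch of this counting rule, applied to a string that has already been read from the file, is shown below. It assumes that every period, exclamation point, or question mark other than the four listed titles ends a sentence, so ellipses, decimals, and other abbreviations are not handled. The name count_sentences_in_text is hypothetical.

```python
import re

# The four titles whose periods must not be counted, matched as whole tokens.
TITLE = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.")


def count_sentences_in_text(text):
    """Count sentence-ending marks in text, ignoring the periods in titles."""
    # Remove the titles first so that their periods are not counted.
    without_titles = TITLE.sub("", text)
    return sum(without_titles.count(mark) for mark in ".!?")


text = (
    "Thursday post wasn’t on time.\n"
    "Great work with your initial post!\n"
    "How is Mr. Rogers doing?\n"
)
print(count_sentences_in_text(text))
```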
3. The WordDict function
The WordDict function will take as input a string F which represents the location of a text file. The
function will return a dictionary where the key:value pairs are as follows. The key will be a word
(in lowercase form) in the file and the value will be the number of times that word appears in the
document (in any case). For example, if the file contains “Hello”, “hello”, and “HELLO”, and these
are the only instances of the word “hello”, then the key:value pair would be “hello”:3.
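The counting step can be illustrated with a plain dictionary. The sketch below assumes the words have already been extracted into a nested list (one inner list per line) and only shows how the counts are accumulated; the name count_words is hypothetical.

```python
def count_words(nested_words):
    """Return a dictionary mapping each lowercase word to its total count."""
    counts = {}
    for line in nested_words:
        for word in line:
            key = word.lower()
            # dict.get returns 0 the first time a word is seen.
            counts[key] = counts.get(key, 0) + 1
    return counts


print(count_words([["Hello", "hello"], ["HELLO", "world"]]))
```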
4. The SentimentCount function
You need to download the files negativesentimentwords.txt, positivesentimentwords.txt, and
ignorewords.txt from Blackboard, where they are zipped into a file called word documents.zip. The
SentimentCount function will take as input a dictionary D like the output of the WordDict function.
First, define three variables Negative, Positive, Neutral and set each equal to the integer 0. Then, for
each key (i.e. word) of D, determine which of the downloaded files above that word is in. If it is in
negativesentimentwords.txt (resp. positivesentimentwords.txt), then increase the Negative (resp.
Positive) count by the value associated to the given key. If the word is not in either sentiment text
file AND it is not in ignorewords.txt, then increase the Neutral count by the value associated to the
given key. After this is completed for each element of D, create a variable
Total = Positive + Negative + Neutral
Then calculate the percentage of each type by dividing the type by Total and rounding the
percentage to the nearest percent. Finally, in the same directory as your Python script (using a
relative reference), create a text file Results.txt with the following four lines of text:
Positive Sentiment Word Count: # (#%)
Negative Sentiment Word Count: # (#%)
Neutral Word Count: # (#%)
Note that the #’s should be replaced with their appropriate values and percentages, and the fourth
line is as follows: (i) if the positive percentage is greater than the negative
percentage, then the fourth line should be “This document has a positive sentiment”; or
(ii) if the negative percentage is greater than the positive percentage, then the fourth line should
be “This document has a negative sentiment”; or (iii) if the positive percentage is
equal to the negative percentage, then the fourth line should be “This document has a
neutral sentiment”. For example, if
D = {
    "acclaim": 3, #this is a positive word
    "absurd": 1, #this is a negative word
    "along": 5, #this is an ignore word
    "brother": 2, #this word isn’t in any of the three documents
    "capable": 4, #this is a positive word
}
then Positive = 7, Negative = 1, Neutral = 2 and Total = 10. Then Results.txt would have the following
lines:
Positive Sentiment Word Count: 7 (70%)
Negative Sentiment Word Count: 1 (10%)
Neutral Word Count: 2 (20%)
This document has a positive sentiment
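The arithmetic in this example can be checked with a short sketch. It makes several assumptions that are not stated in the project description. The three word lists are passed in as Python sets (reading them from the downloaded files is left out). The sentiment comparison uses the rounded percentages. A Total of 0 is reported as 0% to avoid dividing by zero. The name sentiment_lines is hypothetical, and the sketch returns the four lines instead of writing Results.txt.

```python
def sentiment_lines(D, positive_words, negative_words, ignore_words):
    """Return the four lines of the results file as a list of strings.

    D maps each word to its count; the other three arguments are sets of
    words (assumed to have been read from the three downloaded files).
    """
    Positive = Negative = Neutral = 0
    for word, count in D.items():
        if word in negative_words:
            Negative += count
        elif word in positive_words:
            Positive += count
        elif word not in ignore_words:
            Neutral += count
    Total = Positive + Negative + Neutral

    def percent(count):
        # Assumption: report 0% when Total is 0 to avoid dividing by zero.
        return round(100 * count / Total) if Total else 0

    pos_pct, neg_pct = percent(Positive), percent(Negative)
    if pos_pct > neg_pct:
        sentiment = "positive"
    elif neg_pct > pos_pct:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return [
        f"Positive Sentiment Word Count: {Positive} ({pos_pct}%)",
        f"Negative Sentiment Word Count: {Negative} ({neg_pct}%)",
        f"Neutral Word Count: {Neutral} ({percent(Neutral)}%)",
        f"This document has a {sentiment} sentiment",
    ]


D = {"acclaim": 3, "absurd": 1, "along": 5, "brother": 2, "capable": 4}
lines = sentiment_lines(D, {"acclaim", "capable"}, {"absurd"}, {"along"})
print("\n".join(lines))
```

Keep in mind that Python's built-in round rounds exact halves to the nearest even integer, so round(2.5) is 2. Writing the lines with open("Results.txt", "w") uses a relative path, which resolves against the directory the script is run from.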
EXAMPLES: More examples are to come.
GRADING SCHEME: We will use the following grading scheme for the project:
project2.py submitted with all four functions above defined (5%): You will receive these points for
submitting a file with the right name in which all four functions are defined.
LineList examples (15%): Your program will be tested with random inputs to the function
SentenceCount examples (15%): Your program will be tested with random inputs to the function
WordDict examples (25%): Your program will be tested with random inputs to the function
SentimentCount examples (30%): Your program will be tested with random inputs to the function
Manual Grading (10%): The program will be checked manually for docstrings and comments.
In particular, each function defined must have a docstring describing what the function’s
arguments are and what the function returns.
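As an illustration of the docstring requirement, a function header might look like the sketch below. The "Arguments" and "Returns" layout is one common convention rather than a stated course requirement, and the function body is deliberately omitted.

```python
def SentenceCount(F):
    """Count the sentences in a text file.

    Arguments:
        F: a string giving the location of a text file.

    Returns:
        An integer, the number of sentences in the file.
    """
    # The function body is intentionally left out of this sketch.
    raise NotImplementedError


# The docstring is what help(SentenceCount) displays.
print(SentenceCount.__doc__)
```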
FINAL NOTE: Please note that it is very simple to detect when two students have collaborated, so make sure you
are working alone on this assignment.