COMP – 10205 – Data Structures and Algorithms

You are to write a program that will count the words and the occurrences of words in the book War and Peace by Leo Tolstoy. In addition to counting words, you must spell check the document and report how many words are not found in the provided dictionary. You will also match occurrences of the word War with occurrences of the word Peace and find the average distance between the closest matched pairs of these words.
You must use a class called BookWord (provided as UML below) to store each word, both for the words in the novel and for the words in the provided dictionary. The words must be stored without regard to case (all characters must be converted to lowercase). A word is considered to be one or more characters separated by white space (space character, tab, newline character), a comma, period, exclamation point, question mark, double quote mark, either round bracket, underscore, hyphen, colon or semicolon.
The regular expression below can be used to implement this with the useDelimiter method of the Scanner object. Other punctuation marks such as single quote, apostrophe, forward slash and backslash should not be included in words; however, for the purpose of this assignment it is sufficient to exclude only words containing the single quote character. For example, it’s should not be considered a word and should be discarded from the input.
String regEx = "\\.|\\?|\\!|\\s|\"|\\(|\\)|\\,|\\_|\\-|\\:|\\;|\\n";
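A minimal sketch of reading words with this delimiter, assuming the text is read through a java.util.Scanner. Because every alternative in the regex matches a single character, two delimiters in a row (for example a comma followed by a space) can produce an empty token, so empty tokens are skipped. Treating the typographic apostrophe (U+2019) as a single quote is an assumption about how the text file is encoded; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Sketch: split text with the assignment's delimiter regex, lowercase each word,
// and drop empty tokens and words that contain a single quote.
class WordTokenizer {
    static final String REG_EX = "\\.|\\?|\\!|\\s|\"|\\(|\\)|\\,|\\_|\\-|\\:|\\;|\\n";

    static List<String> readWords(Scanner in) {
        in.useDelimiter(REG_EX);
        List<String> words = new ArrayList<>();
        while (in.hasNext()) {
            String token = in.next();
            if (token.isEmpty()) {
                continue; // two delimiters in a row (e.g. ", ") can yield an empty token
            }
            // Assumption: the typographic apostrophe U+2019 is treated as a single quote too.
            if (token.indexOf('\'') >= 0 || token.indexOf('\u2019') >= 0) {
                continue; // e.g. "it's" is discarded, as the assignment requires
            }
            words.add(token.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("War and peace, peace and war. It's war!");
        System.out.println(readWords(in)); // [war, and, peace, peace, and, war, war]
    }
}
```

For the novel itself, the Scanner would be built over the file rather than a string literal; the exact path depends on how the IntelliJ project is set up.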

BookWord Class UML

** These methods exist in the Object class and must be overridden in your BookWord class.
BookWord will be used to hold the characters for each unique word as a string, along with a count that will contain the number of occurrences of that word in the novel.
The main method will create an ArrayList of BookWord to hold the words for the novel. The starting code should be used to read the words from the file.
The US.txt file is a dictionary of words. You must use the dictionary to find the number of misspelled words (defined as words not present in the provided dictionary) in the novel.
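One way to turn the novel's word list into unique entries with counts is sketched below, using a hypothetical stand-in class Counted rather than the full BookWord: sort a copy of the words so equal words become adjacent, then collapse each run into one entry. This costs O(n log n) for n words, whereas searching the growing ArrayList once per word costs O(n · u) for u unique words. Whether this fits the provided starting code is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class WordCounter {
    // Hypothetical stand-in for BookWord: a word plus its occurrence count.
    static final class Counted {
        final String text;
        int count;
        Counted(String text) { this.text = text; this.count = 1; }
        @Override public String toString() { return text + "=" + count; }
    }

    // Sorting puts equal words next to each other, so one linear pass can count them.
    static List<Counted> countWords(List<String> words) {
        List<String> sorted = new ArrayList<>(words); // leave the caller's order intact
        Collections.sort(sorted);
        List<Counted> result = new ArrayList<>();
        for (String w : sorted) {
            Counted last = result.isEmpty() ? null : result.get(result.size() - 1);
            if (last != null && last.text.equals(w)) {
                last.count++;               // same word as the previous one
            } else {
                result.add(new Counted(w)); // first occurrence of this word
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("war", "and", "peace", "peace", "and", "war", "war");
        List<Counted> counts = countWords(words);
        System.out.println("Total words: " + words.size());   // 7
        System.out.println("Unique words: " + counts.size()); // 3
        System.out.println(counts);                           // [and=2, peace=2, war=3]
    }
}
```

The total word count is simply the size of the word list, and the number of unique words is the size of the collapsed list.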
You will need to create 2 dictionaries for this lab.
• The first must be an ArrayList. Sort the ArrayList using the Collections.sort method once the entire dictionary has been stored in the ArrayList.
• The second must be a SimpleHashSet. Do not use the HashSet or HashMap classes supplied as part of the Java Collections Framework.
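Collections.sort(list) only compiles when the elements implement Comparable; if BookWord does not, a comparator can be passed instead. A sketch, assuming US.txt holds whitespace-separated words and using a hypothetical Entry class in place of BookWord:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Locale;
import java.util.Scanner;

class DictionaryLoader {
    // Hypothetical stand-in for BookWord; only the text matters for the dictionary.
    static final class Entry {
        private final String text;
        Entry(String text) { this.text = text; }
        String getText() { return text; }
        @Override public String toString() { return text; }
    }

    // Lambda comparator; reuse this same ordering for Collections.binarySearch later.
    static final Comparator<Entry> BY_TEXT = (a, b) -> a.getText().compareTo(b.getText());

    static ArrayList<Entry> load(Scanner in) {
        ArrayList<Entry> dict = new ArrayList<>();
        while (in.hasNext()) {
            dict.add(new Entry(in.next().toLowerCase(Locale.ROOT)));
        }
        Collections.sort(dict, BY_TEXT); // sort once, after the whole dictionary is stored
        return dict;
    }

    public static void main(String[] args) {
        ArrayList<Entry> dict = load(new Scanner("peace\nWar\nand\n"));
        System.out.println(dict); // [and, peace, war]
    }
}
```

Keeping the comparator in one field makes it easy to hand the identical ordering to Collections.binarySearch, whose result is undefined if the list was sorted with a different ordering.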
Suggested Steps:
• Download SimpleHashSet.java, the novel (WarAndPeace.txt), and the dictionary (US.txt) from MyCanvas.
• Create a project in IntelliJ.
  o Add the US.txt and WarAndPeace.txt files to the src folder.
• Complete the BookWord class based on the UML provided. Notes:
  o Provide correct implementations of toString, equals and hashCode.
  o The hashCode method must calculate and return a hash code for each BookWord. The hashCode must only use constant components of the class for the calculation (i.e. the word text, not the count, which changes as occurrences are counted). It is suggested that you find a good hashing algorithm on the internet and implement it in hashCode. You must code this yourself and not use a built-in hashing method, i.e. do not use String.hashCode() or Object.hashCode() for this assignment.
  o Cite your reference for the hashing algorithm used in the source code.
  o Confirm that the selected algorithm produces good hashes. If you have less than 10% empty buckets in your SimpleHashSet on the dictionary words, your algorithm is acceptable. There are public methods in the SimpleHashSet class that will allow you to calculate this value.
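A sketch of BookWord under assumed names (getText, getCount and incrementCount are hypothetical, since the UML is not reproduced here). hashCode implements djb2, a widely used string hash attributed to Daniel J. Bernstein (h = h × 33 + c, starting at 5381); it is one possible choice, not a required one, and whichever algorithm is used still has to be cited in the source. Only the immutable text takes part in equals and hashCode, so incrementing the count never moves a word to a different bucket.

```java
import java.util.Locale;

class BookWord {
    private final String text; // constant: the only field used by equals and hashCode
    private int count;         // mutable: deliberately excluded from equals and hashCode

    BookWord(String text) {
        this.text = text.toLowerCase(Locale.ROOT); // words are stored in lowercase
        this.count = 1;
    }

    String getText() { return text; }
    int getCount() { return count; }
    void incrementCount() { count++; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BookWord)) return false;
        return text.equals(((BookWord) o).text);
    }

    // djb2 (attributed to Daniel J. Bernstein); cite the source you actually used.
    @Override
    public int hashCode() {
        int h = 5381;
        for (int i = 0; i < text.length(); i++) {
            h = ((h << 5) + h) + text.charAt(i); // h * 33 + c, wrapping on int overflow
        }
        return h;
    }

    @Override
    public String toString() {
        return text + " " + count; // output format is an assumption
    }
}

class BookWordDemo {
    public static void main(String[] args) {
        BookWord a = new BookWord("War");
        BookWord b = new BookWord("war");
        b.incrementCount();
        System.out.println(a.equals(b));                  // true: count is ignored
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(b);                            // war 2
    }
}
```

Whether djb2 keeps empty buckets under 10% for this dictionary depends on SimpleHashSet's table size and how it maps hash codes to buckets, so it still has to be checked with the provided SimpleHashSet methods.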

Part A – Hashing / Counting and Spelling

Ensure your application prints all of the following information to the standard output.
• Total # of words in the novel.
• Total # of unique words in the file.
• The list of the 15 most frequent words and their counts.
  o The data must be sorted to do this. Sort using the built-in Collections.sort method and supply a lambda expression to complete the sort. This needs to be a two-key sort (first by count, then alphabetically).
• The list of words that occur exactly 41 times in the file. These words must be listed in alphabetical order.
• The # of words that are not contained in the dictionary.
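A sketch of the two-key sort with a lambda and of the "exactly 41 times" filter, using a hypothetical record Counted in place of BookWord (records need Java 16 or later). Sorting by descending count with alphabetical tie-breaking is an assumption about the intended direction; the assignment only fixes the order of the keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FrequencyReport {
    // Hypothetical stand-in for BookWord: the word text and its count.
    record Counted(String text, int count) { }

    // Two-key sort via a lambda: count descending, then text ascending for ties.
    static void sortByFrequency(List<Counted> words) {
        Collections.sort(words, (a, b) -> {
            int byCount = Integer.compare(b.count(), a.count());
            return byCount != 0 ? byCount : a.text().compareTo(b.text());
        });
    }

    // Words that occur exactly n times, listed alphabetically.
    static List<String> wordsWithCount(List<Counted> words, int n) {
        List<String> result = new ArrayList<>();
        for (Counted w : words) {
            if (w.count() == n) result.add(w.text());
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<Counted> words = new ArrayList<>(List.of(
                new Counted("peace", 2), new Counted("war", 3), new Counted("and", 2),
                new Counted("the", 41), new Counted("army", 41)));
        sortByFrequency(words);
        // Print at most the first 15 entries (the "top 15" report).
        for (Counted w : words.subList(0, Math.min(15, words.size()))) {
            System.out.println(w.text() + " " + w.count());
        }
        System.out.println(wordsWithCount(words, 41)); // [army, the]
    }
}
```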
All three methods below should yield the same number of words. You must implement all three techniques and measure the performance of each. At the top of the code you must include a discussion of the results, including an analysis of the performance and whether it lines up with the theory for the data structures and algorithms used.
1. Use the contains method of ArrayList to find matches.
2. Use the binarySearch method of the Collections class (Collections.binarySearch) to search the ArrayList dictionary. Supply a lambda expression for the binary search of the dictionary.
3. Use the SimpleHashSet dictionary and the contains method it provides.
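A sketch of timing techniques 1 and 2 with System.nanoTime, simplified to String elements. With BookWord, ArrayList.contains relies on equals while Collections.binarySearch relies on the comparator, so the comparator must be the same one used to sort the dictionary. Technique 3 would follow the same timing pattern with the provided SimpleHashSet, whose API is not reproduced here. In theory, for n novel words and m dictionary words, the three techniques cost roughly O(n·m), O(n log m) and O(n) expected time respectively.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SpellCheckTiming {
    // Technique 1: ArrayList.contains scans the list linearly for every word.
    static int missingByContains(List<String> novel, List<String> dict) {
        int missing = 0;
        for (String w : novel) {
            if (!dict.contains(w)) missing++;
        }
        return missing;
    }

    // Technique 2: binary search; sortedDict must be sorted with the same comparator.
    static int missingByBinarySearch(List<String> novel, List<String> sortedDict,
                                     Comparator<String> cmp) {
        int missing = 0;
        for (String w : novel) {
            if (Collections.binarySearch(sortedDict, w, cmp) < 0) missing++; // negative = not found
        }
        return missing;
    }

    public static void main(String[] args) {
        Comparator<String> cmp = (a, b) -> a.compareTo(b); // lambda used for both sort and search
        List<String> dict = new ArrayList<>(List.of("war", "and", "peace", "a", "chance"));
        Collections.sort(dict, cmp);
        List<String> novel = List.of("war", "and", "peace", "bonaparte", "peace");

        long t0 = System.nanoTime();
        int viaContains = missingByContains(novel, dict);
        long t1 = System.nanoTime();
        int viaBinarySearch = missingByBinarySearch(novel, dict, cmp);
        long t2 = System.nanoTime();

        System.out.println("contains:     " + viaContains + " missing, " + (t1 - t0) + " ns");
        System.out.println("binarySearch: " + viaBinarySearch + " missing, " + (t2 - t1) + " ns");
    }
}
```

Single measurements on tiny inputs like this are likely to be dominated by JVM warm-up, so the timings only become meaningful on the full novel and dictionary.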

Part B – War and Peace
The words war and peace appear in the novel many times. Starting at the beginning of the novel, you are to match each occurrence of the word peace with the closest occurrence of the word war. War may appear before or after peace. Once a pair has been made, the occurrence of each word must not be used again in other matches. Note that the order can make a difference, so to match the expected results you must find peace first and then look for the closest occurrence of war.
One way to determine this is to perform a proximity search (https://en.wikipedia.org/wiki/Proximity_search_(text)). One method to calculate a numerical proximity is to determine the proximity distance between one word and another in terms of word position.
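Written as a formula (the symbols are introduced here for illustration): if peace occurs at word position p and its matched war at position w, the proximity distance is the absolute difference of the positions, and for k matched pairs the reported sum S and average are:

```latex
d(p, w) = \lvert p - w \rvert, \qquad
S = \sum_{i=1}^{k} \lvert p_i - w_i \rvert, \qquad
\bar{d} = \frac{S}{k}
```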
To do this, each word must be numbered starting from 1 in the order the words appear in the text. As an example, to find the proximity distance between war and peace in the text below, we would number each word starting at 1 from the beginning of the text:
War(1) and(2) peace(3), peace(4) and(5) war(6). War(7), War(8), War(9). Give(10) Peace(11) a(12) chance(13), we(14) all(15) want(16) peace(17)
We could then start pairing off each occurrence of peace with the closest unused war in the text. The first pair would be peace at 3 and war at 1, the second pair would be peace at 4 and war at 6, and the third pair would be peace at 11 and war at 9. The last pair would be peace at 17 and war at 8, since war at 9 has already been used. The occurrence of war at position 7 is ignored, as there is no remaining occurrence of peace to match it with.
Sample output for the test case shown above:
shortest distance between war at pos(3) and peace(1) = 2
shortest distance between war at pos(4) and peace(6) = 2
shortest distance between war at pos(11) and peace(9) = 2
shortest distance between war at pos(17) and peace(8) = 9
The total sum of distances between the matched pairs of war and peace = 15
The average distance between the matched pairs of war and peace = 3.75
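A sketch of the greedy matching described above: collect the 1-based positions of war and peace, then, for each peace in text order, take the nearest war that has not been used yet. The assignment does not say how to break ties when two unused wars are equally distant; this sketch picks the earlier one, which is an assumption. On the example text it produces the same four pairs, sum (15) and average (3.75) as the sample output, but prints its own line format; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WarPeaceMatcher {
    // One matched pair of 1-based word positions.
    record Pair(int peacePos, int warPos) {
        int distance() { return Math.abs(peacePos - warPos); }
    }

    // Greedy: each peace, in text order, takes the nearest unused war (before or after it).
    static List<Pair> match(List<String> words) {
        List<Integer> wars = new ArrayList<>();
        List<Integer> peaces = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i).toLowerCase(Locale.ROOT);
            if (w.equals("war")) wars.add(i + 1);            // positions start at 1
            else if (w.equals("peace")) peaces.add(i + 1);
        }
        boolean[] used = new boolean[wars.size()];
        List<Pair> pairs = new ArrayList<>();
        for (int p : peaces) {
            int best = -1;
            for (int j = 0; j < wars.size(); j++) {
                if (used[j]) continue;
                // Strict '<' keeps the earlier war on ties (assumed tie-break).
                if (best < 0 || Math.abs(p - wars.get(j)) < Math.abs(p - wars.get(best))) best = j;
            }
            if (best < 0) break;                              // no unused war left
            used[best] = true;
            pairs.add(new Pair(p, wars.get(best)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> words = List.of("War", "and", "peace", "peace", "and", "war", "War", "War",
                "War", "Give", "Peace", "a", "chance", "we", "all", "want", "peace");
        List<Pair> pairs = match(words);
        int sum = 0;
        for (Pair pr : pairs) {
            System.out.printf("peace at pos(%d) matched with war at pos(%d), distance = %d%n",
                    pr.peacePos(), pr.warPos(), pr.distance());
            sum += pr.distance();
        }
        System.out.println("sum = " + sum + ", average = " + (double) sum / pairs.size());
    }
}
```

The inner scan makes this O(P · W) for P occurrences of peace and W occurrences of war, which should be acceptable because both words are a tiny fraction of the novel.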

NOTE: This is a college project DO NOT use any AI tools to write the code. The code should be hand-written only. Please read and follow all the instructions properly. Please find attached files and one screen capture for the project. Some data and code are provided.