CMPU 102 - Assignment 9
Short Document Concordance
Assigned: Monday, Apr 28    Due: Monday, May 5
A Concordance is a listing of all of the words appearing in a short document, together with the line number(s) on which they appear. In this assignment you will:
- Implement a Concordance using the classes HashMap and PriorityQueue from the JCF (Java Collections Framework).
- Complete the classes Concordance and WordRecord as outlined in the starter project.
- Gain experience using classes from the JCF.
- Scan through the input file line by line, extract each word from each line, construct a WordRecord, and put it in the HashMap.
- Once all of the words in the document have been added to the Concordance, enter the contents of the HashMap into a PriorityQueue, then successively remove WordRecord objects from the PriorityQueue and write their contents to an output file.
- Use the submit102 script from the Linux prompt to hand in your project.
Summary: We have provided the starter project with the three classes you will need. You will complete the classes WordRecord and Concordance as instructed in the comments. You will submit your project and hand in a hard copy of the code and of the input and output files.
Download the starter project
- Click here to download the starter project: Assign9.zip.
- Save it in your cs102 directory.
- Unzip the file.
- You should be able to open this as an existing project in NetBeans (described next).
Launch NetBeans and Open Project
- From NetBeans, go to the File menu and select "Open Project", or click on the third button from the left on the button bar.
- Navigate to your cs102 folder and select the folder named "Assign9".
- The "Open Project Folder" button should be enabled. Click it to open the starter project.
- The classes you need to modify are WordRecord and Concordance.
The Concordance Class
The instance variables for this class are:
- HashMap<String, WordRecord> concord, and
- String textFile,
where concord will contain the words and their occurrences within the textFile input file.
You will implement the following methods:
- a constructor, Concordance(String fileName),
- void makeConcordance(), and
- void writeConcordance(String outputFileName).
In makeConcordance you will declare and initialize a Scanner to read the input file, textFile. You must wrap a Scanner around a File attached to the fileName. With this scanner you will repeatedly read a line from the file and increment the line number count.
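The read loop described above might look roughly like the sketch below. The class LineReaderSketch and the method numberedLines are hypothetical names used only for illustration, and starting the count at 1 is an assumption based on the sample output. This is not the starter project's code.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class for illustration; not part of the starter project.
final class LineReaderSketch {

    // Reads the file line by line and returns each line prefixed with its line number.
    static List<String> numberedLines(String fileName) throws FileNotFoundException {
        List<String> result = new ArrayList<>();
        int lineNumber = 0; // assumption: the first line of the file is line 1
        // try-with-resources closes the Scanner (and the underlying file) when done
        try (Scanner in = new Scanner(new File(fileName))) {
            while (in.hasNextLine()) {
                String line = in.nextLine();
                lineNumber++;
                result.add(lineNumber + ": " + line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Usage: java LineReaderSketch.java someFile.txt
        for (String s : numberedLines(args[0])) {
            System.out.println(s);
        }
    }
}
```

In makeConcordance the body of the while loop would process the words of each line rather than collect the lines in a list.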
Inside this outer loop you will use a StringTokenizer to extract each word from the line (a String). When you create your StringTokenizer you will supply two parameters: the String obtained by the scanner, and a second string listing the delimiters. The delimiters will be all of the punctuation characters, plus the space, tab, and newline characters. Specifying the delimiters prevents punctuation characters from being attached to the beginning or end of a word. You may declare a delimiter string as follows:
String delims = ",.?;:-!)(\"\n\t ";
Use the StringTokenizer to extract each
word from the input line.
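A minimal sketch of the tokenizing step, using the delimiter string given above. The class TokenizerSketch and the method wordsIn are hypothetical names. The sketch leaves the case of each word unchanged, since the text does not say whether words should be converted to lower case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class for illustration; not part of the starter project.
final class TokenizerSketch {

    // The delimiter string suggested in the assignment text.
    static final String DELIMS = ",.?;:-!)(\"\n\t ";

    // Splits one input line into words, dropping every delimiter character.
    static List<String> wordsIn(String line) {
        List<String> words = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line, DELIMS);
        while (tokens.hasMoreTokens()) {
            words.add(tokens.nextToken());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [To, be, or, not, to, be, that, is, the, question]
        System.out.println(wordsIn("To be, or not to be: that is the question."));
    }
}
```

Note that the apostrophe is not in the delimiter string, so a word such as "don't" would come out as a single token.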
Next you determine if the word (the key) is contained in the HashMap. If it is not present, you will form a WordRecord object and put it in the HashMap. If the word is already present, you will have to add the current lineNumber to the list of line numbers on which the word appears. This is done by first removing the WordRecord from the HashMap, adding the new line number to the object retrieved, and then putting it back in the HashMap.
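The insert-or-update step just described could be sketched as follows. The nested WordRecord here is a deliberately minimal stand-in (an assumption, not the starter project's class), and the names MapUpdateSketch and record are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical helper class for illustration; not part of the starter project.
final class MapUpdateSketch {

    // Minimal stand-in for the assignment's WordRecord (assumption: the real
    // class has more methods and skips duplicate line numbers).
    static final class WordRecord {
        final String word;
        final List<Integer> lines = new ArrayList<>();

        WordRecord(String word, int lineNumber) {
            this.word = word;
            lines.add(lineNumber);
        }

        void addNumber(int lineNum) {
            lines.add(lineNum);
        }
    }

    // Adds one occurrence of word on lineNumber to the map.
    static void record(HashMap<String, WordRecord> concord, String word, int lineNumber) {
        if (!concord.containsKey(word)) {
            // First time this word is seen: create its record.
            concord.put(word, new WordRecord(word, lineNumber));
        } else {
            // Word already present: remove, update, and put back, as the text describes.
            WordRecord rec = concord.remove(word);
            rec.addNumber(lineNumber);
            concord.put(word, rec);
        }
    }

    public static void main(String[] args) {
        HashMap<String, WordRecord> concord = new HashMap<>();
        record(concord, "be", 1);
        record(concord, "be", 2);
        System.out.println(concord.get("be").lines); // prints [1, 2]
    }
}
```

Because the map stores a reference to the record, calling concord.get(word).addNumber(lineNumber) would leave the map in the same state. The sketch follows the remove-then-put sequence only because that is how the step is described here.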
When all of the lines from the input file have been read and all of the words extracted and added to the HashMap, the method writeConcordance is called. You will need to create a PrintStream that wraps a FileOutputStream attached to a File with the fileName supplied as a parameter to this method. You will construct a PriorityQueue<WordRecord> and add the Collection obtained from the HashMap. While this priority queue is not empty, successively remove each WordRecord and add the word and its list of line numbers to the output file.
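One way the writing step might be arranged is sketched below. The nested WordRecord is again a simplified stand-in that stores its line list as a ready-made String, the "word: numbers" output format is an assumption taken from the sample output, and the names WriteSketch and write are hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.HashMap;
import java.util.PriorityQueue;

// Hypothetical helper class for illustration; not part of the starter project.
final class WriteSketch {

    // Simplified stand-in for WordRecord: ordered alphabetically by word.
    static final class WordRecord implements Comparable<WordRecord> {
        final String word;
        final String lineList;

        WordRecord(String word, String lineList) {
            this.word = word;
            this.lineList = lineList;
        }

        @Override
        public int compareTo(WordRecord other) {
            return word.compareTo(other.word);
        }

        @Override
        public String toString() {
            return word + ": " + lineList;
        }
    }

    // Writes every record to the named file in the order the priority queue yields them.
    static void write(HashMap<String, WordRecord> concord, String outputFileName)
            throws FileNotFoundException {
        // values() is the Collection of all records held by the map.
        PriorityQueue<WordRecord> queue = new PriorityQueue<>(concord.values());
        try (PrintStream out = new PrintStream(new FileOutputStream(new File(outputFileName)))) {
            while (!queue.isEmpty()) {
                out.println(queue.remove()); // smallest word first
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        HashMap<String, WordRecord> concord = new HashMap<>();
        concord.put("be", new WordRecord("be", "1, 2,"));
        concord.put("a", new WordRecord("a", "4, 7,"));
        write(concord, "Concordance.txt");
    }
}
```

The priority queue is what puts the output in alphabetical order: a HashMap makes no promise about iteration order, while successive removes from the queue always return the smallest remaining record according to compareTo.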
General note: The online Java API is your friend. Consult
it to determine which methods to use and how to use them correctly.
The WordRecord Class
Class WordRecord implements the Comparable interface. A WordRecord has a word (String) and an ArrayList<Integer> (for holding line numbers) as instance variables.
This class has the following methods:
- a constructor WordRecord(String word, int lineNumber),
- void addNumber(int lineNum),
- String toString(),
- int compareTo(Object other), and
- boolean equals(Object other).
All of the methods are very straightforward. Your implementation of toString should produce a string containing the word and a list of the line numbers on which it appears. When adding a new line number, you should check that it is not the same as the last line number that was previously added. You should not record duplicate line numbers when the word appears more than once on the same line.
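The following is a rough sketch of a class with this shape, not the starter project's WordRecord. It is named WordRecordSketch to make that clear. Two assumptions are worth flagging: it implements Comparable<WordRecordSketch> rather than the compareTo(Object) signature listed above, and its toString format ("word: n, n,") is inferred from the sample output.

```java
import java.util.ArrayList;

// Hypothetical illustration of a word record; the starter project's comments take precedence.
final class WordRecordSketch implements Comparable<WordRecordSketch> {

    private final String word;
    private final ArrayList<Integer> lines = new ArrayList<>();

    WordRecordSketch(String word, int lineNumber) {
        this.word = word;
        lines.add(lineNumber);
    }

    // Adds lineNum unless it equals the most recently added line number.
    void addNumber(int lineNum) {
        int last = lines.get(lines.size() - 1); // list is never empty after construction
        if (last != lineNum) {
            lines.add(lineNum);
        }
    }

    // Orders records alphabetically by word.
    @Override
    public int compareTo(WordRecordSketch other) {
        return word.compareTo(other.word);
    }

    // Two records are equal when they hold the same word.
    @Override
    public boolean equals(Object other) {
        return other instanceof WordRecordSketch
                && word.equals(((WordRecordSketch) other).word);
    }

    // Kept consistent with equals.
    @Override
    public int hashCode() {
        return word.hashCode();
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(word).append(": ");
        for (int n : lines) {
            sb.append(n).append(", ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        WordRecordSketch r = new WordRecordSketch("a", 4);
        r.addNumber(4);  // same line again: ignored
        r.addNumber(7);
        System.out.println(r); // prints a: 4, 7,
    }
}
```

Comparing only against the last stored number is enough here because the input is read top to bottom, so line numbers arrive in nondecreasing order.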
The Main Class
The main class has only a main method, which has been implemented for you. This method gets the names of the input and output files from the command line (args[0] and args[1]). Then it creates a Concordance object and, inside a try block, makes a concordance and then writes it to the output file.
Complete the partially implemented classes as detailed above. Follow the instructions included in the comments to implement each method.
Good luck!
When you complete your program and run it, a sample fragment of the output (in file Concordance.txt) should look like the listing below. The input text is from the beginning paragraphs of David Copperfield.
a: 4, 7, 12, 16,
acquainted: 9,
all: 11, 17,
and: 4, 5, 6, 7, 10, 18,
any: 8,
anybody: 2, 18,
as: 3, 11,
at: 4, 17,
attaching: 11,
baby: 17,
be: 1, 2, 9, 18,
because: 13,
becoming: 9,
been: 3, 18,
before: 8,
began: 5,
begin: 2,
beginning: 3,
…… … …
Submitting your solution
From a terminal window, type the following commands:
cd
cd cs102
submit102 Assign9