CS 161 Assignment #6

Due Monday, November 2nd at 11:59pm
(Not accepted after November 8th at 11:59pm)

Introduction

You did fine work on the ComboGuesser assignment! At the next poker night, your anonymous employer couldn't stop talking about how useful your program was, and how discreet you'd been. This got Bill's attention, and he has approached you for help with a delicate situation: someone has been sending his wife anonymous love letters, and he's trying to figure out who's responsible. Bill has suggested that an analysis of word frequencies might do the trick, since the frequencies with which words appear in a writing sample can help tie the writing to a particular author. He's collected writing samples from the prime suspects and wants you to determine how often each word in the samples appears. You could do the same for the love letters, and the results might help identify the author. (Bill suspects, for example, that the author uses "daffodil" more often than most writers, but he can't prove it yet.)

After thinking about the problem for a bit, you accept the offer. The recent talk about sorting in class gives you an idea: Assuming the words were represented as Strings in an array, you could sort them so that they were in alphabetical order. Once sorted, duplicate words would be adjacent to each other in the array, and you could make a pass through to count how many times each word appeared.
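To convince yourself that the idea works, here's a small sketch that uses the library method java.util.Arrays.sort as a stand-in for the selection sort you'll write yourself. The class name SortDemo, the helper sortedCopy, and the sample words are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class: shows that sorting places duplicate words side by side.
// Arrays.sort is a library stand-in for the selection sort you'll write yourself.
public class SortDemo {

    /** Returns a sorted copy of words, leaving the original array unchanged. */
    public static String[] sortedCopy(String[] words) {
        String[] copy = Arrays.copyOf(words, words.length);
        Arrays.sort(copy);  // orders Strings the same way compareTo does
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"daffodil", "rose", "daffodil", "tulip", "rose"};
        System.out.println(Arrays.toString(sortedCopy(words)));
        // prints [daffodil, daffodil, rose, rose, tulip]
    }
}
```

Once the array is in this order, a single left-to-right pass that compares each word with its neighbor is enough to count how many times each word occurs.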

Comparing Strings

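You can read up on the methods in the String class in the online documentation. You can't compare Strings with operators like < (which won't compile) or == (which compiles, but checks whether two variables refer to the same object rather than whether the Strings contain the same characters). Instead, Java provides the compareTo and equals methods. s1.compareTo(s2) returns a negative number if s1 comes before s2, zero if the two are equal, and a positive number if s1 comes after s2. The ordering is based on character codes, so it matches alphabetical order for words written in a single case, but every uppercase letter comes before every lowercase one (see the "Abc" example below). Note that both methods are case sensitive, but that's fine for our purposes. For example: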

> "abc".compareTo("xyz")
-23  (int)
> "abc".compareTo("abc")
0  (int)
> "xyz".compareTo("abc")
23  (int)
> "aardvark".compareTo("anteater")
-13  (int)
> "Abc".compareTo("abc")
-32  (int)
> "Java".equals("Java")
true  (boolean)
> "Java".equals("java")
false  (boolean)
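The rules above can be checked with a short program. CompareDemo and comesBefore are hypothetical names used only for this illustration; the program also shows why == is the wrong test when you care about a String's contents.

```java
// Hypothetical demo of String comparison with compareTo, equals, and ==.
public class CompareDemo {

    /** Returns true if a comes strictly before b in compareTo order. */
    public static boolean comesBefore(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        System.out.println(comesBefore("aardvark", "anteater")); // true: 'a' (97) < 'n' (110)
        System.out.println(comesBefore("abc", "Abc"));           // false: 'A' (65) sorts before 'a' (97)

        String literal = "Java";
        String built = new String("Java");          // a distinct object with the same characters
        System.out.println(literal.equals(built));  // true: same characters
        System.out.println(literal == built);       // false: different objects
    }
}
```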

The Assignment

I've created a new project to get you started, though for now it only contains the WordGetter class. You can use the methods in this class to do more extensive testing of your frequency-finding code: they find all of the words in a file or on a web page and return them as an array of Strings. Once you open the project, you'll need to do the following:
  1. Create a new class called FrequencyFinder to hold the code you write. Your class won't need any fields (state), just the two methods described below for finding word frequencies.
  2. Copy the selection sort method that we wrote in class into your FrequencyFinder class. (You could also use the version from Lab 7 if you prefer.) Name it sortStrings and edit it so that it takes an array of Strings as its argument, then make the necessary changes to the body of the code so that it compares and swaps Strings instead of integers. For full credit, this method should be private, though you may wish to leave it public until you've tested it thoroughly, as in this example:
    > FrequencyFinder ff = new FrequencyFinder();
    > String[] words = {"zoo", "aardvark", "java", "apple"};
    > ff.sortStrings(words);
    > words[0]
    "aardvark"  (String)
    > words[1]
    "apple"  (String)
    > words[2]
    "java"  (String)
    > words[3]
    "zoo"  (String)
    
  3. Define a new method called printFrequencies that takes an array of Strings as its input. It should call the string-sorting method you modified in the previous step, then traverse the sorted array and print out frequency information as shown below. (In my output, I printed a tab character ("\t") between each count and the corresponding word, to keep the columns nice and tidy.) Hint: On your final pass through the array, compare adjacent items. Each time you find adjacent items that differ, it's time to print a line of output. Don't forget the last word in the array: it has no following item to differ from, so its line has to be printed after the loop ends.

    The examples below show the correct output for several inputs. The final example processes the 198 "words" retrieved from a web page and prints a long list, so I'm only showing the last 40 or so lines to keep the assignment page to a manageable length. Some of the "words" don't look much like English; that's because the method that retrieves them from the web page grabs HTML markup and script code along with the page's text.

    > FrequencyFinder ff = new FrequencyFinder();
    > String[] word = {"hello"};
    > ff.printFrequencies(word);
    1	hello
    
    > String[] words = {"hello", "world", "hello"};
    > ff.printFrequencies(words);
    2	hello
    1	world
    
    > String[] words2 = {"hello", "world", "Hello"};
    > ff.printFrequencies(words2);
    1	Hello
    1	hello
    1	world
    
    > WordGetter g = new WordGetter();
    > g.fromURL("http://www.cs.ups.edu").length
    198  (int)
     
    > ff.printFrequencies(g.fromURL("http://www.cs.ups.edu"));
    [This is just the end of the output]
    2	like
    1	links
    1	looked
    1	looks
    1	maintained
    1	majors
    1	math
    1	more
    1	occupies
    7	of
    1	offer
    1	offered
    1	on
    1	other
    1	our
    1	pages
    1	photo
    1	programming
    1	programs
    1	renovation
    1	right
    1	sc_invisible=1;
    1	sc_partition=7;
    1	sc_project=874859;
    1	sc_security="4855c8f0";
    1	semester.
    1	specifics
    1	statistics,
    1	students
    12	the
    1	through
    6	to
    1	topics.
    1	tower,
    1	ups
    1	use
    4	var
    2	what
    2	with
    1	year
    1	year,
    
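For reference, here's one possible sketch of the finished class. It assumes a standard find-the-minimum-and-swap selection sort; the version we wrote in class or in Lab 7 may differ in its details. printFrequencies follows the hint in step 3, with a separate println after the loop for the last run of words.

```java
/**
 * A sketch of FrequencyFinder: sorts an array of words, then prints how
 * often each distinct word appears, one "count<TAB>word" line per word.
 */
public class FrequencyFinder {

    /**
     * Sorts the array in place using selection sort, ordering words by compareTo.
     * Private as required for full credit; make it public temporarily to test it.
     *
     * @param words the array to sort; it is modified in place
     */
    private void sortStrings(String[] words) {
        for (int i = 0; i < words.length - 1; i++) {
            // Find the index of the smallest word in words[i..end].
            int minIndex = i;
            for (int j = i + 1; j < words.length; j++) {
                if (words[j].compareTo(words[minIndex]) < 0) {
                    minIndex = j;
                }
            }
            // Swap the smallest word into position i.
            String temp = words[i];
            words[i] = words[minIndex];
            words[minIndex] = temp;
        }
    }

    /**
     * Sorts the words, then prints each distinct word with its number of occurrences.
     *
     * @param words the words to analyze; the array is sorted as a side effect
     */
    public void printFrequencies(String[] words) {
        if (words.length == 0) {
            return;  // nothing to print
        }
        sortStrings(words);
        int count = 1;  // occurrences of words[i] seen so far in the current run
        for (int i = 0; i < words.length - 1; i++) {
            if (words[i].equals(words[i + 1])) {
                count++;
            } else {
                // The run of words[i] ends here: report it and start a new run.
                System.out.println(count + "\t" + words[i]);
                count = 1;
            }
        }
        // The last run has no following item to differ from, so print it after the loop.
        System.out.println(count + "\t" + words[words.length - 1]);
    }
}
```

The early return for an empty array is an assumption (the assignment doesn't say what an empty array should print); without it, the final println would throw an ArrayIndexOutOfBoundsException.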

Extending the Assignment

Submitting

Before submitting, test each of your methods thoroughly and double-check that every method has a comment above it (including the @param and @return tags, where they apply). When you're convinced it's ready to go, submit the project electronically. If the Submit menu item is greyed out, save the "submission.defs" file to your desktop, quit BlueJ, move "submission.defs" into your project folder, restart BlueJ, and try submitting again. Submitting should now work from off campus as well as from machines on campus.


Brad Richards, 2009