Homework 4: Hashing

This assignment is in beta. You're welcome to do it. It won't change much. I'm in the middle of a trip to Japan and thought it'd be nice to revise the homework to remove some of the tedious stuff so the HW is a little shorter than Sp16's version. There's a chance that I broke something while paring things down. Lemme know by email if you spot anything broken.

Getting the Skeleton Files

As usual, run git pull skeleton master to get the skeleton files.

Introduction

In this lightweight HW, we'll work to better our understanding of hash tables. Given that we have a midterm Friday, we've tried to keep this homework very short and to the point. Make sure you're spending your extra time going through study guides, preferably by working through problems with other students in the class!

Simple Oomage

Your goal in this part of the assignment will be to write an equals and hashCode method for the SimpleOomage class, as well as tests for the hashCode method in the TestSimpleOomage class.

To get started on this assignment, open up the class SimpleOomage and take a quick look around. A SimpleOomage has three properties: red, green, and blue, and each may have any value that is a multiple of 5 between 0 and 255. For example, a SimpleOomage with red=35, green=90, and blue=215 would be valid, since all of these are multiples of 5 between 0 and 255. Try running SimpleOomage and you'll see four random Oomages drawn to the screen.

equals

Start by running TestSimpleOomage. You'll see that you fail the testEquals test. The problem is that two SimpleOomage objects are not considered equal, even if they have the same red, green, and blue values. This is because SimpleOomage is using the default equals method, which simply checks to see if the the ooA and ooA2 references point to the same memory location.

In homework 3, we suggested that you simply allow IntelliJ to create your equals method for you. However, for this homework, we suggest that you write one from scratch.

As mentioned in class, writing a proper equals method is a little tricker than it might sound at first blush. According to the Java language specification, your equals method should have the following properties to be in compliance:

One particularly vexing issue is that the argument passed to the equals method is of type Object, not of type SimpleOomage, so you will need to do a cast. However, doing a cast without verifying that the Object is a SimpleOomage wont' work, because you don't want your code to crash if someone calls .equals with an argument that is not a SimpleOomage. Thus, we'll need to use a new method of the Object class called getClass. For an example of a correct implementation of equals, see http://algs4.cs.princeton.edu/12oop/Date.java.html.

Override the equals method so that it works properly. Make sure to test your equals method by running the test again. After fixing equals, your code should now pass the test.

A Simple hashCode

In Java, it is critically important that if you override equals that you also override hashCode. Uncomment the testHashCodeAndEqualsConsistency method in TestSimpleOomage. Run it, and you'll see that it fails.

To see why this failure occurs, consider the code show below.

Two question to ponder when reading this code:

Answers:

The Java specification for equals mentions this danger as well: "Note that it is generally necessary to override the hashCodemethod whenever the equals method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes."

Uncomment the given hashCode method in SimpleOomage, which will return a hashCode equal to red + green + blue. Note that this hashCode is now consistent with equals, so you should now pass all of the TestSimpleOomage tests.

testHashCodePerfect

While the given hashCode method is ok, in the sense that it is consistent with equals and thus will pass testHashCodeAndEqualsConsistency, it is only using a tiny fraction of the possible space of hash codes, meaning it will have many unnecessary collisions.

Our next goal for the SimpleOomage class will be to write a perfect hashCode function. By perfect, we mean that two SimpleOomages may only have the same hashCode only if they have the exact same red, green, and blue values.

... but before we write it, fill in the testHashCodePerfect of TestSimpleOomage with code that tests to see if the hashCode function is perfect. Hint: Try out every possible combination of red, green, and blue values and ensure that you never see the same value more than once.

Run this test and it should fail, since the provided hashCode method is not perfect.

A Perfect hashCode

To make the hashCode perfect, in the else statement of hashCode, replace return 0 with a new hash code calculation that is perfect, and set the USE_PERFECT_HASH variable to true. Finally, run TestSimpleOomage and verify that your perfect hashCode method passes your test. Your TestSimpleOomage test might take a few seconds to complete execution.

Evaluating the perfect hashCode in JUnit

Fill in the haveNiceHashCodeSpread(List<Oomage> oomages, int M) in the OomageTestUtility method so that it returns true if the given Oomages would have a nice spread if thrown into M buckets using their hashCodes. We define a nice spread as:

For the purposes of converting an Oomage's hashCode to a bucket number, you should compute bucketNum = (o.hashCode() & 0x7FFFFFFF) % M, where o is an Oomage. You should not use Math.abs or Math.floorMod. See the FAQ for why.

Once you've written this method, try running TestJankyOomage and it should fail. Try running TestNiceSpreadOomage and it should pass.

Now uncomment testRandomOomagesHashCodeSpread in TestSimpleOomage and make sure USE_PERFECT_HASH is true in your SimpleOomage class. Most likely, your hash code will fail this test, even though it is perfect. To understand why, let's turn to visualiation. If by some chance your hash code passes, that's fine, you should still use the visualizer to see how your hashCode performs visually in the next section.

Evaluating the perfect hashCode Visually

To get a better understanding of how hash tables work, we will use a hash table visualizer. Specifically, we've provided a method with the signature visualize(List<Oomage> oomages, int M, double scale) that shows the distribution of the given oomages across the M buckets according to their hashcodes (and scale is a special parameter that controls the size of some aspects of the drawing).

To try it out, try running the HashTableVisualizer provided. Your visualization will probably look like this (if your hash code fails your test from the previous section):

visualizer

Or if your hash code is already good enough that everything spreads out nicely, it'll look like this:

visualizer

If you're getting a figure similar the first where everything is in two buckets, here's why: Even though your hashCode is perfect, it's always returning a multiple of 5. Try changing M to 9 in HashTableVisualizer, and you should see a nice spread of values. Understanding why this is happening is a great idea.

If you'd like to make your hashCodes work well for any number of buckets, you should ensure that it is not always a multiple of any number. One way to do this is to divide the red, green, and blue values by 5 before computing the hash code.

Experiment With the Visualizer (Optional)

Once you've got a nice spread, try out the visualizer with scale = 0.5, N = 2000, M = 100.

Try messing around with N and M and see how the visualizer behaves on your SimpleOomage class. If there isn't enough room to fit everything on screen, try resetting the scaling factor to a number less than one. Compare the distribution of items for the perfect vs. imperfect vs. default hashCodes. Does what you see match what you expect?

Complex Oomage

The ComplexOomage class is a more sophisticated beast. Instead of three instance variables representing red, green, and blue values, each ComplexOomage has an entire a list of ints between 0 and 255 (not necessarily multiples of 5). This list may be of any length.

This time, you won't change the ComplexOomage class at all. Instead, your job will be to write tests to find the flaw in the hashCode function.

Evaluating the ComplexOomage hashCode

The provided hashCode is valid, but it does a potentially bad job of distributing items in a hash table.

Start by visualizing the spread of random ComplexOomage objects using the visualizer you just built. Use the randomComplexOomage method to generate random ComplexOomages. You should find that this visual test shows no apparent problem in the distribution of randomComplexOomages.

Also try uncommenting testRandomOomagesHashCodeSpread in TestComplexOomage. You'll see that it passes testRandomOomagesHashCodeSpread just fine, reinforcing what you just saw in the visualizer. For totally random ComplexOomages, everything is fine.

testWithDeadlyParams and binary representations

In this case, we need to look at the hashCode to find the problem. Fill in the test testWithDeadlyParams such that the provided hashCode function fails due to poor distribution of ComplexOomage objects.

Given what we've learned in 61B so far, this is a really tricky problem! Consider how Java represents integers in binary (see lecture 23 for a review). For a hint, see Hint.java.

Your test should not fail due to an IllegalArgumentException.

Once you've written this test and ComplexOomage fails it, you're done with HW4!

Fix the hashCode (optional)

Consider how you might change the hashCode method of ComplexOomage so that testWithDeadlyParams passes. Are there other deadly parameters that might strike your hashCode method?

Submission

To give you some small amount of flexibility in the problems you want to focus on, we've set up the AG to give you full credit so long as you pass all but one test. Thus if you're having trouble with any particular part of the HW, feel free to skip it at no penalty.

FAQ

I'm failing the HashTableVisualizer test!

You must convert from hashCode to bucket number using (hashCode & 0x7FFFFFFF) % M. You should not use Math.abs(hashCode) % M.

Why can't I just use Math.abs or Math.floorMod?

Math.abs is bas because of what happens if you do Math.abs(-2147483648). Try it out. Math.floorMod is fine, but we wanted to give you a chance to see how the optional algorithms textbook does things, and also give you an example of a bitwise operation (0x7FFFFFF), we'll talk about these much later in the course towards the very end.

What does k & 0x7FFFFFF do?

It returns an integer equal to k, but with the top bit set to zero. We'll talk more about this at the very end of class.

I'm getting errors like file does not contain class hw4.hash.HashTableVisualizer in the autograder.

Your code must be part of the hw4.hash package, with the appropriate declaration at the top of the file.