TIL-24: Hashing in Java — Part 2

Recep İnanç
5 min readDec 21, 2019
Photo by Markus Spiske on Unsplash

“Today I learned what is Hashing, what is it in Java, why and how to do it correctly in Java.”

In the previous post, we’ve looked at what is hashing in general, why do we need that, what are the possible problems and what are our options against those problems. Today in this post we’ll answer the following question:

  • What is Hashing in Java?
  • What are Hashtable, HashMap, and HashSet?

Let’s get started!

What is Hashing in Java?

In Java, hashing is used in different data structures and supported by all objects.

hashCode()

The Object class has a hashCode() method which returns a hash code for the given instance. The method hashCode() implemented differently in each subclass of Object. For example from Java 1.2 for java.lang.String the implementation of hashCode is as follows:

h(s)=s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Multiply each character in the string with powers of 31. Keep in mind that, if the String is long enough hashCode may return a negative integer due to integer overflow.

How to do Hashing correctly?

Implementing a custom hashCode() for your class is possible too, but you should keep in mind that there’s a proper way of doing this to prevent pitfalls and get the best out of hashing. It is explained in detail in the book Effective Java by Joshua Bloch and you can see the step by step guide here.

Hashtable, HashMap, HashSet

The Set, Map interfaces in Java are two different collections types.

A Set is a collection of distinct items.

A Map is a collection of keys that corresponds to values.

Hashtable and HashMap both implement the Map interface, whereas HashSet implements the Set interface. They all use hashCode for performance improvement.

Hashtable vs. HashMap

Hashtable is a legacy class and thus should be avoided in favor of HashMap. The main difference between the two is that some methods in Hashtable are synchronized — thread-safe. If you are working with a HashMap in a multi-threaded environment you have to provide your own mechanism for thread-safety.

Hashtable is now there just for backward compatibility. Refer to this Stackoverflow Answer for a detailed explanation of why hashtable is not removed, but instead kept as a legacy class and hashmap is created.

  • HashMap class maintains no order.
  • HashMap class contains values based on the key.
  • Elements from a HashSet are retrieved using get(key).
  • HashMap permits a single null key and any number of null values.

HashSet

HashSet is an entirely different data structure that does not provide a map-like structure. It uses HashMap internally but does not provide such an interface to the user. Use HashSet when you don’t need to map keys to values.

From the official Oracle Docs:

This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance’s size (the number of elements) plus the “capacity” of the backing HashMap instance (the number of buckets). Thus, it’s very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

  • HashSet is a regular set — all objects in a set are distinct.
  • HashSet doesn’t maintain the insertion order.
  • HashSet is the best approach for search operations.
  • Elements from a HashSet are retrieved using an Iterator.
  • HashSet permits to have a single null value.

Internal Workings of HashMap and HashSet

Do you wonder what happens under the hood when we say HashMap.put(“Key”, “Value”), HashMap.get(“Key”), HashSet.add(“Value”)? Let’s find out!

HashMap.put(“Key”, “Value”)

Let’s see what happens when we get to the Line 9, and call “gamesMap.put()”.

The key here is: “Age of Empires 2”
The value here is: “ageOfEmpires >> Game@898”

1- Our key is hashed into hashedKey. Let’s say hashedKey’s value is 43.

2- A new non-tree Node() is created inside the HashMap class. The Node object we created contains the following data: key, value, hashedKey.

3- All nodes are stored inside a data structure called tab, which is an array of Node<K, V> objects with the capacity of n.

4- We apply positive modulo operation on our hashedKey so that it does not go out of the bounds of the tab array. The result we get from this operation is the index we are going to put our node.

Positive Modulo Operation: (n-1) & hash

5- We insert our Node() at the tab[position].

Perfect! We’ve put our data into our HashMap. Now what? Let’s try to get it back from there!

HashMap.get(“Key”)

As we do in Line 10:

We call gamesMap.get(key), and the key here is: “Age of Empires 2”.

1- Our key is hashed into hashedKey. Again, hashedKey’s value is 43.

2- We apply a positive modulo operation on our hashedKey to find out where it is stored in the tab. We call the result position.

3- We get the Node object at the tab[position].

4- Return the Node object’s value field.

We get our game object (Game@898) back! This is how we do put/get operations in a HashMap, now let’s see how do we do it in a HashSet.

HashSet.add(“Value”)

Looking at Line 14 we can see the syntax for adding an element to a HashSet.

When a HashSet’s constructor is called, the HashSet creates a HashMap object internally. HashMap requires two parameters to put a record into its internal array, a key and value however we only have the value here and it is “ageOfEmpires >> Game@898” object. What happens now?

1- HashSet treats the value we passed as if it was the key we passed to our HashMap.
So, for the internal HashMap key is: “ageOfEmpires >> Game@898”.

2- HashSet uses the constant String “PRESENT” as the value for the internal HashMap.
So, for the internal HashMap value is: “PRESENT”.

3- Now, HashSet calls HashMap.put(key=Game@898, value=”PRESENT”). The rest of the process is exactly the same as the HashMap.

An image is worth a thousand words — I know it is a bit confusing, but hopefully, this drawing will clarify things out for you.

Conclusion

Here we are again, gone pretty deep inside the Java hashing data structures this time, I hope that this post helps you understand the inner workings of hashing in Java because writing this post helped me a lot :). Hope to see you in the other posts!

Cheers!

References

--

--

Recep İnanç

Software Developer @Amazon. You can find me at recepinanc.com. My opinions are my own.