TIL-23: Hashing in Java — Part 1

Recep İnanç
6 min readDec 19, 2019
Photo by David Yu from Pexels

“Today I learned what is Hashing, what is it in Java, why and how to do it correctly in Java.”

Ahoy! This is Part 1 of my “Hashing in Java” posts. In this part we are going to be answering the following questions;

  • What is Hashing?
  • Why do we need Hashing?
  • What is a Hash Function?
  • What are Collisions?
  • How to Handle Collisions?

Let’s get into it!

What is Hashing?

Hashing is a technique to represent information using a shorter key that is generated by using that information.

The information can be anything from strings to integers to objects. The basic steps to apply hashing is as follows:

  • Take the information element and use it as the input of a function to generate the key.
  • Use that key to store than information for quick access later on.

Why do we need Hashing?

To answer this question, let’s look at a real-world example:

Imagine you own a Video Game Store where you have lots of different games. All of your games are put onto the shelves in random order. Someone comes in and says they want to buy the following games — Age of Empires, FIFA, Red Alert 2 and Max Payne. How would you find those games at your store?

Going through ALL of the games and checking if one matches the requested games is considered as “linear search” and it is — as it already sounds — exhaustive!

After you’ve spent hours to find those 4 games, you decided to put your games into shelves in a particular way you decided to so that no more than one game is put into the same position of the same shelf and record where you put them.

Here the information is the game itself and the key is the position of the game on the shelves. The “particular way” you used to decide where to put your game is called the hash function.

What is a Hash Function?

A hash function is a function when given the input, generates a hashed value for that input.

A good hash function should;

  • Uniformly distribute the keys (Each value equally likely for each input)
  • Be efficiently computable
  • Give the same hash value for two equal objects

Note: Two unequal objects do not always have to have different hash values.

A hash function that returns a unique hash number is called a universal hash function. Hopefully, we’ll cover this one in the next part of this series.

Simple Hash Functions:
- Division-remainder method
- Folding method
- Radix transformation method
- Digit rearrangement

More Complex Hash Functions:
- MD2
- MD4
- MD5
- Secure Hash Algorithm (SHA)

Getting back to our example of “Video Game Store”, your hash function could be something like;

  • Get the first character of every word of the game,
  • Find their position in the ASCII table,
  • Multiply them respectively with different prime numbers, (13, 17, 19, 31)
  • Add them together,
  • Apply modulo operator for n to find the position on the shelves for that game, where n is the maximum number of games you can have on your shelves.
  • Place the game into that position.

Let’s apply this to one of the games — Age of Empires:

  • First Characters: [A, o, E]
  • Positions in ASCII: [65, 111, 69]
  • Multiply with Prime Numbers: [65*13], [111*17], [69*19]
  • Add them up: (65*13) + (111*17) + (69*19) = 4043
  • Apply modulo: 4043 % 400 = 43 (assume that you can have 400 games at maximum)
  • Put the game into the 43rd position on the shelves.

Now, when somebody else comes into your store and asks you for “Age of Empires” you know that the game is put on the 43rd position, and you find that in what is called “in constant time”.

What are Collisions?

Everything is fine by now, you are putting your games on your shelves one-by-one and now you want to put the game “The Collision Game …”, took it through your hash function, and it generated 43, and suddenly realize that you’ve already put Age of Empires into 43rd position on the shelves! What happens now?

As we’ve said a good hash function expected to generate a unique hash value for every unique key, but in real-world that is not as easy as it sounds. Since a hash function gets us a small number (43) for a big input (Age of Empires), there is a possibility that two keys result in the same hash value. Getting the same hash value for two unequal objects is called the Collision.

How to Handle Collisions?

There are different approaches to solve the collision problems, these are called “Collision Resolution Techniques”.

Chaining

One way of handling collisions is instead of holding a single element at each position, having multiple elements stored in the same position. It is usually achieved by using a linked list for every position and when the hash function generates the same hash value for a different element, add that element on that list too.

For our case, the solution is to put “The Collision Game” on top/bottom of “Age of Empires”.

The problem with chaining occurs when too many objects collide into the same position. The best advantage of using hashing is to have constant-time access to every element, but imagine putting all games into the same position and asked to find “Age of Empires”. Now, again we have to look for all the games in that position, causing us to spend hours again doing “linear search”.

Java 8 brought an interesting enhancement to HashMap implementation — if a bucket size goes beyond the certain threshold, the linked list gets replaced with a tree map. This allows achieving O(logn) look up instead of pessimistic O(n).

Open Addressing

Open Addressing suggests no chaining, storing one element per slot. If a collision occurs, it suggests trying (probe) to put that element into different slots.

Linear Probing — If we can not put the element at position k, we try the next slot k+1. If that one is also occupied, we go to k+2, and so on.

Quadratic Probing — If we can not put the element at position k, we try the next slot k+i². If that one is also occupied, we go to k+(i+1)², and so on.

Double Hashing — If we can not put the element at position k, we use another hash function hash2(k) and look for k+i*hash2(k) slot. If that one is also occupied, we go to k+(i+1)*hash2(k) slot.

Conclusion

Hashing is amazing when done correctly!

This is it for this post, hope you all enjoyed it. In the next posts, we will dive deeper into hash functions and look hashing implementations in Java.

Cheers!

References

Sorry, I know these are a lot, but they were all great.

--

--

Recep İnanç

Software Developer @Amazon. You can find me at recepinanc.com. My opinions are my own.