Data Structures and Algorithms (DSA)  

What is Hash Tables in Data Structures with Example

Introduction

A Hash Table (also known as a Hash Map) is one of the most efficient and widely used data structures in computer science. It enables you to store and retrieve data quickly using a technique known as hashing. Unlike arrays or linked lists, which use indexes or sequential access, a hash table employs a key-value pair mechanism for fast lookups.

Hash tables are extremely useful in situations where you need constant-time access (O(1)) to data, such as implementing databases, caches, symbol tables, and dictionaries in programming languages.

What is a Hash Table?

A Hash Table is a data structure that stores data in key-value pairs. It uses a hash function to convert a key into an index, which determines where the corresponding value is stored in memory.

For example:

Key: "Name" → Hash Function → Index 5
Value: "John"

So, the hash table stores this data in slot 5 of the internal array.

This makes searching, inserting, and deleting elements very fast — often in O(1) time.

Why Use a Hash Table?

Here’s why hash tables are popular in programming:

  1. Fast Access: You can access data using keys instead of searching through all elements.

  2. Efficient Storage: Keys and values are stored in a compact way.

  3. Flexibility: You can use any data type as a key (string, number, object, etc.).

  4. Useful for Caching: Frequently accessed data can be retrieved instantly.

How Hashing Works

Hashing is the process of converting a key into a unique integer (called a hash code) using a hash function. This integer is then used to determine the index in the array where the value should be stored.

Example Process:

  1. You have a key: "apple".

  2. The hash function converts it into a number, say 1452.

  3. The index is determined using modulus operation: 1452 % table_size.

  4. Suppose the table size is 10 → 1452 % 10 = 2.

  5. So, the key-value pair ("apple", 100) is stored at index 2.

This process ensures fast data retrieval since the key’s hash determines exactly where the value is stored.

Common Hash Functions

A hash function determines how keys are converted into indices. A good hash function should distribute data evenly to avoid collisions.

Some common hash functions include:

  1. Division Method: h(key) = key % table_size

  2. Multiplication Method: h(key) = floor(m * (key * A % 1)) where A is a constant (0 < A < 1)

  3. Universal Hashing: Randomly chooses a hash function from a family of functions to reduce collisions.

Python’s built-in hash() function uses a sophisticated hashing algorithm that provides unique hash codes for different objects.

Example:

print(hash('apple'))
print(hash('banana'))

Output:

332543836  # (varies every run)
412938492  # (varies every run)

Collisions in Hash Tables

A collision occurs when two different keys are hashed to the same index. Since two values cannot occupy the same slot, we must handle collisions efficiently.

Example:

If both keys "apple" and "orange" map to index 2, the hash table needs a way to store both without losing data.

Common Collision Handling Techniques:

  1. Chaining (Linked List Method):
    Each index holds a linked list of key-value pairs that map to that index.

    Example:

    Index 2: ("apple", 100) → ("orange", 200)
  2. Open Addressing:
    When a collision occurs, the algorithm searches for the next empty slot.

    • Linear Probing: Move sequentially to the next available slot.

    • Quadratic Probing: Use a quadratic formula to find the next slot.

    • Double Hashing: Apply a second hash function to find another index.

Python Implementation of a Simple Hash Table

Here’s a basic hash table implementation using Python lists:

class HashTable:
    def __init__(self, size):
        self.size = size
        self.table = [None] * size

    def _hash(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self._hash(key)
        if self.table[index] is None:
            self.table[index] = [(key, value)]
        else:
            for pair in self.table[index]:
                if pair[0] == key:
                    pair = (key, value)
                    return
            self.table[index].append((key, value))

    def get(self, key):
        index = self._hash(key)
        if self.table[index] is not None:
            for pair in self.table[index]:
                if pair[0] == key:
                    return pair[1]
        return None

    def display(self):
        for i, data in enumerate(self.table):
            print(i, ":", data)

# Example usage
ht = HashTable(10)
ht.insert('apple', 100)
ht.insert('banana', 200)
ht.insert('grape', 300)
ht.display()
print("Value for 'banana':", ht.get('banana'))

Output:

0 : None
1 : None
2 : [('banana', 200)]
3 : [('apple', 100), ('grape', 300)]
...
Value for 'banana': 200

Real-Life Applications of Hash Tables

  1. Databases: Used in indexing to quickly locate data records.

  2. Compilers: Store variable names and associated information in symbol tables.

  3. Caching Systems: Web browsers and servers use hash maps for quick data retrieval.

  4. Password Verification: Hash functions store encrypted versions of passwords.

  5. Routing Tables: Used in networking to map addresses and routes.

  6. Search Engines: Used to index and retrieve web pages efficiently.

  7. Data Deduplication: Quickly identify and eliminate duplicate entries.

Advantages of Hash Tables

  1. Fast Access: Average O(1) lookup, insertion, and deletion time.

  2. Efficient Memory Usage: Compact storage of key-value pairs.

  3. Flexible Keys: Can use strings, numbers, or objects as keys.

  4. Scalable: Works efficiently even with large datasets.

  5. Widely Supported: Built-in implementation available in most languages (like Python dictionaries).

Limitations of Hash Tables

  1. Collisions: Can lead to performance degradation if not handled properly.

  2. Unordered Data: Data is not stored in sequence.

  3. Difficult Debugging: Collisions make debugging harder.

  4. High Memory Usage: Requires extra space to manage empty slots.

Time and Space Complexity

OperationAverage CaseWorst Case
SearchO(1)O(n)
InsertionO(1)O(n)
DeletionO(1)O(n)
SpaceO(n)O(n)

Summary

A Hash Table is a key-value data structure that provides constant-time access (O(1)) for insertion, search, and deletion using hashing. It converts keys into indices using a hash function, and in case of collisions, handles them using methods like chaining or open addressing. Hash tables are essential in areas like databases, caching, network routing, and symbol tables in compilers. They offer a balance of speed, efficiency, and scalability, making them one of the most fundamental structures in computer science and software engineering.