Skip to main content

How to Implement a Hash Map in Python

A hash map (also called a hash table) is a data structure that stores key-value pairs and provides fast access, insertion, and deletion (typically in O(1) average time). Hash maps are one of the most widely used data structures in programming, powering everything from database indexing to caching systems.

Python comes with a built-in hash map implementation called dict (dictionary). However, understanding how hash maps work internally, including hashing and collision handling, is invaluable for technical interviews, algorithm design, and performance optimization.

This guide covers both using Python's built-in dictionary and building a hash map from scratch.

How Hash Maps Work

A hash map uses a hash function to convert keys into array indices, allowing direct access to values without searching through all entries.

Core Concepts

  1. Hash Function: This converts a key into a numeric index. Python provides the built-in hash() function for this.
  2. Bucket Array: This is an internal array where each position (bucket) can store one or more key-value pairs.
  3. Collision Handling: When two different keys hash to the same index, a strategy is needed to store both. The most common approaches are:
    • Chaining: Each bucket holds a list of key-value pairs.
    • Open Addressing: This probes for the next available slot.
Key: "apple"
→ hash("apple") = 1234567890
→ 1234567890 % table_size = 2
→ Store ("apple", value) in bucket[2]

Using Python's Built-in Dictionary

Python's dict is a highly optimized hash map implementation. For most use cases, it's all you need:

# Create a hash map
scores = {}

# Insert key-value pairs
scores['Alice'] = 95
scores['Bob'] = 82
scores['Charlie'] = 91

# Access a value
print(scores['Alice'])

# Update a value
scores['Alice'] = 98
print(scores['Alice'])

# Delete a key
del scores['Bob']

# Check if a key exists
print('Charlie' in scores)

# Iterate over key-value pairs
for name, score in scores.items():
print(f"{name}: {score}")

Output:

95
98
True
Alice: 98
Charlie: 91
tip

For most practical Python programming, use dict. It's implemented in C, handles collision resolution automatically, dynamically resizes, and maintains insertion order (Python 3.7+). Building a custom hash map is primarily useful for learning and interview preparation.

Building a Hash Map from Scratch

To understand the internal mechanics, let's implement a hash map with chaining for collision handling.

Step 1: Define the Class Structure

class HashMap:
def __init__(self, size=10):
"""Initialize the hash map with a fixed number of buckets."""
self.size = size
self.buckets = [[] for _ in range(size)]
self.count = 0
  • self.size: The number of buckets in the hash table.
  • self.buckets: A list of empty lists, where each list is a bucket that can hold multiple key-value pairs (chaining).
  • self.count: Tracks the number of stored key-value pairs.

Step 2: Hash Function

    def _hash(self, key):
"""Compute the bucket index for a given key."""
return hash(key) % self.size

Python's built-in hash() generates an integer from the key. The modulo operation (% self.size) maps it to a valid bucket index.

Step 3: Insert or Update

    def put(self, key, value):
"""Insert a key-value pair, or update the value if the key exists."""
index = self._hash(key)
bucket = self.buckets[index]

for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value) # Update existing
return

bucket.append((key, value)) # Insert new
self.count += 1

The method first checks if the key already exists in the bucket. If found, it updates the value. Otherwise, it appends the new pair.

Step 4: Retrieve a Value

    def get(self, key):
"""Retrieve the value for a given key, or None if not found."""
index = self._hash(key)
bucket = self.buckets[index]

for k, v in bucket:
if k == key:
return v
return None

Step 5: Delete a Key

    def remove(self, key):
"""Remove a key-value pair. Returns True if found, False otherwise."""
index = self._hash(key)
bucket = self.buckets[index]

for i, (k, v) in enumerate(bucket):
if k == key:
bucket.pop(i)
self.count -= 1
return True
return False

Step 6: Display and Utility Methods

    def __len__(self):
return self.count

def __contains__(self, key):
return self.get(key) is not None

def __str__(self):
pairs = []
for bucket in self.buckets:
for key, value in bucket:
pairs.append(f"{key!r}: {value!r}")
return "{" + ", ".join(pairs) + "}"

Complete Implementation

class HashMap:
def __init__(self, size=10):
self.size = size
self.buckets = [[] for _ in range(size)]
self.count = 0

def _hash(self, key):
return hash(key) % self.size

def put(self, key, value):
index = self._hash(key)
bucket = self.buckets[index]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value)
return
bucket.append((key, value))
self.count += 1

def get(self, key):
index = self._hash(key)
bucket = self.buckets[index]
for k, v in bucket:
if k == key:
return v
return None

def remove(self, key):
index = self._hash(key)
bucket = self.buckets[index]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket.pop(i)
self.count -= 1
return True
return False

def __len__(self):
return self.count

def __contains__(self, key):
return self.get(key) is not None

def __str__(self):
pairs = []
for bucket in self.buckets:
for key, value in bucket:
pairs.append(f"{key!r}: {value!r}")
return "{" + ", ".join(pairs) + "}"

Usage Example

hm = HashMap(size=5)

# Insert key-value pairs
hm.put('apple', 10)
hm.put('banana', 20)
hm.put('cherry', 30)
print("After insertions:", hm)

# Retrieve values
print("Value for 'banana':", hm.get('banana'))
print("Value for 'apple':", hm.get('apple'))

# Update a value
hm.put('apple', 50)
print("After update:", hm)

# Delete a key
hm.remove('banana')
print("After deletion:", hm)

# Check existence
print("'cherry' exists:", 'cherry' in hm)
print("'banana' exists:", 'banana' in hm)
print("Size:", len(hm))

Output:

After insertions: {'cherry': 30, 'apple': 10, 'banana': 20}
Value for 'banana': 20
Value for 'apple': 10
After update: {'cherry': 30, 'apple': 50, 'banana': 20}
After deletion: {'cherry': 30, 'apple': 50}
'cherry' exists: True
'banana' exists: False
Size: 2

Understanding Collisions

Collisions occur when two different keys hash to the same bucket index. Our implementation handles this with chaining, storing multiple pairs in the same bucket as a list:

hm = HashMap(size=3)  # Only 3 buckets- collisions are likely

hm.put('a', 1)
hm.put('b', 2)
hm.put('c', 3)
hm.put('d', 4)
hm.put('e', 5)

# Visualize the buckets
for i, bucket in enumerate(hm.buckets):
print(f"Bucket {i}: {bucket}")

Output (example):

Bucket 0: [('a', 1), ('b', 2)]
Bucket 1: [('c', 3)]
Bucket 2: [('d', 4), ('e', 5)]

Multiple pairs share the same bucket, but lookups still work correctly by scanning the bucket's list.

Why bucket size matters

With too few buckets, many collisions occur and performance degrades toward O(n). With too many buckets, memory is wasted. A good hash map maintains a load factor (items/buckets) below 0.7 and resizes (doubles the buckets) when it exceeds this threshold.

Time Complexity

OperationAverage CaseWorst Case (many collisions)
Insert (put)O(1)O(n)
Lookup (get)O(1)O(n)
Delete (remove)O(1)O(n)
SpaceO(n)O(n)

The worst case occurs when all keys hash to the same bucket, turning the hash map into essentially a linked list. In practice, with a good hash function and appropriate load factor, operations are nearly always O(1).

Common Applications

ApplicationHow Hash Maps Help
CachingMap URLs or queries to cached responses
Counting frequenciesCount occurrences of words, characters, etc.
Database indexingFast lookup of records by key
DeduplicationTrack seen elements in O(1)
Two Sum / pattern matchingStore complements or patterns for O(n) algorithms

Conclusion

Hash maps are one of the most fundamental and powerful data structures in computer science.

  • Python's built-in dict is an excellent hash map implementation for everyday programming. It is fast, well-tested, and feature-rich.
  • Building a hash map from scratch deepens your understanding of hashing, collision resolution, and the trade-offs between bucket count, load factor, and performance.

Whether you're using Python's dictionary or implementing your own, mastering hash maps is essential for writing efficient, scalable code.