How to Implement a Hash Map in Python
A hash map (also called a hash table) is a data structure that stores key-value pairs and provides fast access, insertion, and deletion (typically in O(1) average time). Hash maps are one of the most widely used data structures in programming, powering everything from database indexing to caching systems.
Python comes with a built-in hash map implementation called dict (dictionary). However, understanding how hash maps work internally, including hashing and collision handling, is invaluable for technical interviews, algorithm design, and performance optimization.
This guide covers both using Python's built-in dictionary and building a hash map from scratch.
How Hash Maps Work
A hash map uses a hash function to convert keys into array indices, allowing direct access to values without searching through all entries.
Core Concepts
- Hash Function: This converts a key into a numeric index. Python provides the built-in
hash()function for this. - Bucket Array: This is an internal array where each position (bucket) can store one or more key-value pairs.
- Collision Handling: When two different keys hash to the same index, a strategy is needed to store both. The most common approaches are:
- Chaining: Each bucket holds a list of key-value pairs.
- Open Addressing: This probes for the next available slot.
Key: "apple"
→ hash("apple") = 1234567890
→ 1234567890 % table_size = 2
→ Store ("apple", value) in bucket[2]
Using Python's Built-in Dictionary
Python's dict is a highly optimized hash map implementation. For most use cases, it's all you need:
# Create a hash map
scores = {}
# Insert key-value pairs
scores['Alice'] = 95
scores['Bob'] = 82
scores['Charlie'] = 91
# Access a value
print(scores['Alice'])
# Update a value
scores['Alice'] = 98
print(scores['Alice'])
# Delete a key
del scores['Bob']
# Check if a key exists
print('Charlie' in scores)
# Iterate over key-value pairs
for name, score in scores.items():
print(f"{name}: {score}")
Output:
95
98
True
Alice: 98
Charlie: 91
For most practical Python programming, use dict. It's implemented in C, handles collision resolution automatically, dynamically resizes, and maintains insertion order (Python 3.7+). Building a custom hash map is primarily useful for learning and interview preparation.
Building a Hash Map from Scratch
To understand the internal mechanics, let's implement a hash map with chaining for collision handling.
Step 1: Define the Class Structure
class HashMap:
def __init__(self, size=10):
"""Initialize the hash map with a fixed number of buckets."""
self.size = size
self.buckets = [[] for _ in range(size)]
self.count = 0
self.size: The number of buckets in the hash table.self.buckets: A list of empty lists, where each list is a bucket that can hold multiple key-value pairs (chaining).self.count: Tracks the number of stored key-value pairs.
Step 2: Hash Function
def _hash(self, key):
"""Compute the bucket index for a given key."""
return hash(key) % self.size
Python's built-in hash() generates an integer from the key. The modulo operation (% self.size) maps it to a valid bucket index.
Step 3: Insert or Update
def put(self, key, value):
"""Insert a key-value pair, or update the value if the key exists."""
index = self._hash(key)
bucket = self.buckets[index]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value) # Update existing
return
bucket.append((key, value)) # Insert new
self.count += 1
The method first checks if the key already exists in the bucket. If found, it updates the value. Otherwise, it appends the new pair.
Step 4: Retrieve a Value
def get(self, key):
"""Retrieve the value for a given key, or None if not found."""
index = self._hash(key)
bucket = self.buckets[index]
for k, v in bucket:
if k == key:
return v
return None
Step 5: Delete a Key
def remove(self, key):
"""Remove a key-value pair. Returns True if found, False otherwise."""
index = self._hash(key)
bucket = self.buckets[index]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket.pop(i)
self.count -= 1
return True
return False
Step 6: Display and Utility Methods
def __len__(self):
return self.count
def __contains__(self, key):
return self.get(key) is not None
def __str__(self):
pairs = []
for bucket in self.buckets:
for key, value in bucket:
pairs.append(f"{key!r}: {value!r}")
return "{" + ", ".join(pairs) + "}"
Complete Implementation
class HashMap:
def __init__(self, size=10):
self.size = size
self.buckets = [[] for _ in range(size)]
self.count = 0
def _hash(self, key):
return hash(key) % self.size
def put(self, key, value):
index = self._hash(key)
bucket = self.buckets[index]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value)
return
bucket.append((key, value))
self.count += 1
def get(self, key):
index = self._hash(key)
bucket = self.buckets[index]
for k, v in bucket:
if k == key:
return v
return None
def remove(self, key):
index = self._hash(key)
bucket = self.buckets[index]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket.pop(i)
self.count -= 1
return True
return False
def __len__(self):
return self.count
def __contains__(self, key):
return self.get(key) is not None
def __str__(self):
pairs = []
for bucket in self.buckets:
for key, value in bucket:
pairs.append(f"{key!r}: {value!r}")
return "{" + ", ".join(pairs) + "}"
Usage Example
hm = HashMap(size=5)
# Insert key-value pairs
hm.put('apple', 10)
hm.put('banana', 20)
hm.put('cherry', 30)
print("After insertions:", hm)
# Retrieve values
print("Value for 'banana':", hm.get('banana'))
print("Value for 'apple':", hm.get('apple'))
# Update a value
hm.put('apple', 50)
print("After update:", hm)
# Delete a key
hm.remove('banana')
print("After deletion:", hm)
# Check existence
print("'cherry' exists:", 'cherry' in hm)
print("'banana' exists:", 'banana' in hm)
print("Size:", len(hm))
Output:
After insertions: {'cherry': 30, 'apple': 10, 'banana': 20}
Value for 'banana': 20
Value for 'apple': 10
After update: {'cherry': 30, 'apple': 50, 'banana': 20}
After deletion: {'cherry': 30, 'apple': 50}
'cherry' exists: True
'banana' exists: False
Size: 2
Understanding Collisions
Collisions occur when two different keys hash to the same bucket index. Our implementation handles this with chaining, storing multiple pairs in the same bucket as a list:
hm = HashMap(size=3) # Only 3 buckets- collisions are likely
hm.put('a', 1)
hm.put('b', 2)
hm.put('c', 3)
hm.put('d', 4)
hm.put('e', 5)
# Visualize the buckets
for i, bucket in enumerate(hm.buckets):
print(f"Bucket {i}: {bucket}")
Output (example):
Bucket 0: [('a', 1), ('b', 2)]
Bucket 1: [('c', 3)]
Bucket 2: [('d', 4), ('e', 5)]
Multiple pairs share the same bucket, but lookups still work correctly by scanning the bucket's list.
With too few buckets, many collisions occur and performance degrades toward O(n). With too many buckets, memory is wasted. A good hash map maintains a load factor (items/buckets) below 0.7 and resizes (doubles the buckets) when it exceeds this threshold.
Time Complexity
| Operation | Average Case | Worst Case (many collisions) |
|---|---|---|
Insert (put) | O(1) | O(n) |
Lookup (get) | O(1) | O(n) |
Delete (remove) | O(1) | O(n) |
| Space | O(n) | O(n) |
The worst case occurs when all keys hash to the same bucket, turning the hash map into essentially a linked list. In practice, with a good hash function and appropriate load factor, operations are nearly always O(1).
Common Applications
| Application | How Hash Maps Help |
|---|---|
| Caching | Map URLs or queries to cached responses |
| Counting frequencies | Count occurrences of words, characters, etc. |
| Database indexing | Fast lookup of records by key |
| Deduplication | Track seen elements in O(1) |
| Two Sum / pattern matching | Store complements or patterns for O(n) algorithms |
Conclusion
Hash maps are one of the most fundamental and powerful data structures in computer science.
- Python's built-in
dictis an excellent hash map implementation for everyday programming. It is fast, well-tested, and feature-rich. - Building a hash map from scratch deepens your understanding of hashing, collision resolution, and the trade-offs between bucket count, load factor, and performance.
Whether you're using Python's dictionary or implementing your own, mastering hash maps is essential for writing efficient, scalable code.