Skip to main content

How to Use Hash Sets in Python

A hash set is a data structure that stores unique elements in an unordered collection, providing O(1) average time for searching, inserting, and deleting elements. In Python, the built-in set type is a hash set implementation, powered by a hash table under the hood.

Hash sets are essential for tasks like removing duplicates, membership testing, set operations (union, intersection, difference), and any scenario where fast lookups matter more than element ordering.

How Hash Sets Work

Internally, Python sets use a hash table where each element is stored as a key (with no associated value). When you add an element:

  1. Python computes the element's hash value using hash().
  2. The hash determines the bucket (storage slot) where the element is placed.
  3. If a collision occurs (two elements hash to the same bucket), Python resolves it using open addressing.

This hashing mechanism is what makes lookups, insertions, and deletions so fast.

Key characteristics of hash sets
  • No duplicate elements - adding an existing element has no effect.
  • Unordered - elements are stored by hash value, not insertion order.
  • Mutable - you can add or remove elements after creation.
  • Only hashable elements - numbers, strings, and tuples are allowed; lists and dictionaries are not.

Creating a Hash Set

You can create a set using curly braces {} or the set() constructor:

# Using curly braces
fruits = {'apple', 'banana', 'cherry'}
print("Fruits:", fruits)

# Using the set() constructor
numbers = set([1, 2, 3, 3, 4, 4, 5])
print("Numbers:", numbers) # Duplicates are removed automatically

# Empty set (must use set(), not {})
empty = set()
print("Empty set:", empty)

Output:

Fruits: {'banana', 'cherry', 'apple'}
Numbers: {1, 2, 3, 4, 5}
Empty set: set()
Creating an empty set

Use set() to create an empty set - not {}. Empty curly braces create an empty dictionary, not a set:

not_a_set = {}
print(type(not_a_set)) # <class 'dict'>

actual_set = set()
print(type(actual_set)) # <class 'set'>

Adding Elements

Use add() to insert a single element, or update() to add multiple elements at once:

colors = {'red', 'green'}

# Add a single element
colors.add('blue')
print("After add:", colors)

# Adding a duplicate has no effect
colors.add('red')
print("After adding duplicate:", colors)

# Add multiple elements
colors.update(['yellow', 'purple', 'green'])
print("After update:", colors)

Output:

After add: {'blue', 'green', 'red'}
After adding duplicate: {'blue', 'green', 'red'}
After update: {'yellow', 'green', 'red', 'blue', 'purple'}

Removing Elements

Python sets provide several methods for removing elements, each with different behavior:

numbers = {1, 2, 3, 4, 5}

# remove(): raises KeyError if element not found
numbers.remove(3)
print("After remove(3):", numbers)

# discard(): does NOT raise an error if element not found
numbers.discard(10) # No error
print("After discard(10):", numbers)

# pop(): removes and returns an arbitrary element
popped = numbers.pop()
print(f"Popped: {popped}, Remaining: {numbers}")

# clear(): removes all elements
numbers.clear()
print("After clear:", numbers)

Output:

After remove(3): {1, 2, 4, 5}
After discard(10): {1, 2, 4, 5}
Popped: 1, Remaining: {2, 4, 5}
After clear: set()
remove() vs discard()
  • Use remove() when you're certain the element exists and want an error if it doesn't (helps catch bugs).
  • Use discard() when missing elements should be silently ignored.
s = {1, 2, 3}

s.discard(99) # No error
s.remove(99) # KeyError: 99

Membership Testing (The Main Strength)

The primary advantage of hash sets is O(1) membership testing - checking whether an element exists is nearly instantaneous, regardless of set size:

large_set = set(range(1_000_000))

# O(1) lookup: extremely fast
print(999_999 in large_set)
print(2_000_000 in large_set)

Output:

True
False

Compare this to a list, where membership testing is O(n):

# Set: O(1) lookup
my_set = {1, 2, 3, 4, 5}
print(3 in my_set) # Fast

# List: O(n) lookup
my_list = [1, 2, 3, 4, 5]
print(3 in my_list) # Slower for large collections

Set Operations

Hash sets support powerful mathematical set operations:

a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}

# Union: all elements from both sets
print("Union:", a | b)
print("Union:", a.union(b))

# Intersection: elements common to both
print("Intersection:", a & b)
print("Intersection:", a.intersection(b))

# Difference: elements in a but not in b
print("Difference:", a - b)
print("Difference:", a.difference(b))

# Symmetric Difference: elements in either, but not both
print("Symmetric Diff:", a ^ b)
print("Symmetric Diff:", a.symmetric_difference(b))

Output:

Union: {1, 2, 3, 4, 5, 6, 7, 8}
Union: {1, 2, 3, 4, 5, 6, 7, 8}
Intersection: {4, 5}
Intersection: {4, 5}
Difference: {1, 2, 3}
Difference: {1, 2, 3}
Symmetric Diff: {1, 2, 3, 6, 7, 8}
Symmetric Diff: {1, 2, 3, 6, 7, 8}

Set Comparison Operations

a = {1, 2, 3}
b = {1, 2, 3, 4, 5}

# Subset check
print(a.issubset(b)) # True (all elements of a are in b)
print(a <= b) # True

# Superset check
print(b.issuperset(a)) # True (b contains all elements of a)

# Disjoint check (no common elements)
c = {6, 7, 8}
print(a.isdisjoint(c)) # True (no overlap)

Output:

True
True
True
True

Practical Examples

Removing Duplicates from a List

data = [1, 5, 2, 1, 3, 2, 5, 4, 3]

unique = list(set(data))
print("Unique values:", unique)

Output:

Unique values: [1, 2, 3, 4, 5]

Fast Membership Checking

valid_users = {'alice', 'bob', 'charlie', 'diana'}

def check_access(username):
if username in valid_users: # O(1) lookup
return f"Access granted for {username}"
return f"Access denied for {username}"

print(check_access('bob'))
print(check_access('eve'))

Output:

Access granted for bob
Access denied for eve

Finding Common Elements Between Lists

list1 = [1, 2, 3, 4, 5, 6]
list2 = [4, 5, 6, 7, 8, 9]

common = set(list1) & set(list2)
print("Common elements:", common)

Output:

Common elements: {4, 5, 6}

Frozen Sets (Immutable Hash Sets)

If you need an immutable set (one that can't be modified after creation), use frozenset. Frozen sets are hashable, so they can be used as dictionary keys or elements of other sets:

# Frozenset (immutable set)
fs = frozenset([1, 2, 3, 4])
print("Frozenset:", fs)

# Can be used as a dictionary key
permissions = {
frozenset(['read', 'write']): 'editor',
frozenset(['read']): 'viewer'
}
print(permissions[frozenset(['read'])])

Output:

Frozenset: frozenset({1, 2, 3, 4})
viewer

What Can (and Can't) Be Stored in a Set

Sets can only contain hashable (immutable) elements:

# These work (hashable types)
valid_set = {42, 3.14, 'hello', (1, 2, 3), True, None}
print("Valid set:", valid_set)
# Output: Valid set: {True, 3.14, 42, (1, 2, 3), None, 'hello'}
# These raise TypeError (unhashable types)
invalid_set = {[1, 2, 3]} # TypeError: unhashable type: 'list'
invalid_set = {{'a': 1}} # TypeError: unhashable type: 'dict'
warning

If you need to store list-like data in a set, convert it to a tuple first:

data = [1, 2, 3]
my_set = {tuple(data)} # Works!
print(my_set)
# Output: {(1, 2, 3)}

Time Complexity Summary

OperationAverage CaseWorst Case
add(element)O(1)O(n)
remove(element)O(1)O(n)
element in setO(1)O(n)
union (|)O(n + m)O(n + m)
intersection (&)O(min(n, m))O(n * m)
difference (-)O(n)O(n)

The worst case occurs when all elements hash to the same bucket (extremely rare with Python's hash function).

Conclusion

Python's built-in set type is a powerful hash set implementation that provides O(1) average-time operations for adding, removing, and checking membership.

  • Use sets whenever you need unique elements, fast lookups, or mathematical set operations like union, intersection, and difference.
  • Remember that sets are unordered and can only store hashable elements.
  • For immutable sets, use frozenset.

Understanding hash sets is fundamental to writing efficient Python code, especially for problems involving deduplication, searching, and data comparison.