How to Convert String to Set in Python
Converting strings to sets enables efficient deduplication and fast membership testing. Sets automatically discard duplicates and offer average-case O(1) lookup, which makes them well suited to collecting and checking unique elements.
Extract Unique Characters with set()
Pass a string directly to set() to get unique characters.
text = "hello"
unique_chars = set(text)
print(unique_chars)
# Output: {'h', 'e', 'l', 'o'}  (element order may vary)
print(len(unique_chars))
# Output: 4
Analyze Character Composition
password = "aB3$aB3$xY"
unique = set(password)
print(f"Unique characters: {len(unique)}")
# Output: Unique characters: 6
# Check for required character types
has_digit = bool(unique & set('0123456789'))
has_upper = bool(unique & set('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
has_special = bool(unique & set('!@#$%^&*'))
print(f"Has digit: {has_digit}") # True
print(f"Has uppercase: {has_upper}") # True
print(f"Has special: {has_special}") # True
Output:
Unique characters: 6
Has digit: True
Has uppercase: True
Has special: True
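The hard-coded character sets above could also be taken from the standard library's string module. The following is a minimal sketch, not a password policy: the helper name character_classes is hypothetical, and it assumes string.punctuation is an acceptable definition of "special", which is broader than the eight symbols listed above.

```python
import string

def character_classes(password: str) -> dict:
    """Report which character classes appear in the password (illustrative helper)."""
    unique = set(password)
    return {
        # isdisjoint() is True when the two collections share no element
        "digit": not unique.isdisjoint(string.digits),
        "upper": not unique.isdisjoint(string.ascii_uppercase),
        "lower": not unique.isdisjoint(string.ascii_lowercase),
        "special": not unique.isdisjoint(string.punctuation),
    }

report = character_classes("aB3$aB3$xY")
print(report)
# {'digit': True, 'upper': True, 'lower': True, 'special': True}
```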
Sets are unordered collections. Converting a string to a set discards both the original character order and the duplicate counts, so the original string cannot be reconstructed from the set.
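To make that loss concrete, the short sketch below shows three different strings collapsing to the same set. The comparison with collections.Counter is an addition for illustration, not part of the approach above: it keeps character counts, although it still does not keep order.

```python
from collections import Counter

a, b, c = "listen", "silent", "enlists"

# All three strings collapse to the same set of characters.
print(set(a) == set(b) == set(c))  # True

# Counter remembers how often each character occurs, so it can tell
# "enlists" (two 's' characters) apart from the other two strings.
print(Counter(a) == Counter(b))  # True
print(Counter(a) == Counter(c))  # False
```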
Extract Unique Words with split()
Combine split() and set() to find unique words in text.
text = "apple orange banana apple orange grape"
unique_words = set(text.split())
print(unique_words)
# Output: {'banana', 'apple', 'grape', 'orange'}
print(f"Total words: {len(text.split())}")
# Output: Total words: 6
print(f"Unique words: {len(unique_words)}")
# Output: Unique words: 4
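Real text usually carries punctuation, and split() leaves it attached to the words, so "apple," and "apple" would count as different entries. One possible way to handle this, sketched here with a made-up sample string, is to strip leading and trailing punctuation from each token before building the set.

```python
import string

text = "apple, orange; apple. orange! grape"

# Strip punctuation from both ends of every whitespace-separated token.
unique_words = {word.strip(string.punctuation) for word in text.split()}
unique_words.discard("")  # drop tokens that consisted only of punctuation

print(sorted(unique_words))
# ['apple', 'grape', 'orange']
```

Note that strip() only touches the ends of each token, so punctuation inside a word (such as an apostrophe) is left alone.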
Split by Custom Delimiter
tags = "python,django,flask,python,api,django"
unique_tags = set(tags.split(","))
print(unique_tags)
# Output: {'python', 'django', 'flask', 'api'}
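When the input mixes several delimiters or has stray spaces around them, str.split(",") alone is not enough. A hedged sketch using re.split follows; the sample string and the choice of ",", ";" and "|" as delimiters are assumptions for illustration.

```python
import re

tags = "python, django;flask | python,api ; django"

# Split on a comma, semicolon, or pipe, swallowing any surrounding whitespace.
parts = re.split(r"\s*[,;|]\s*", tags.strip())

# Skip empty strings, which appear when two delimiters are adjacent.
unique_tags = {tag for tag in parts if tag}

print(sorted(unique_tags))
# ['api', 'django', 'flask', 'python']
```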
Handle Case Sensitivity
Normalize strings before conversion to treat different cases as equal.
text = "Apple APPLE apple aPpLe"
# Case-sensitive (default)
case_sensitive = set(text.split())
print(case_sensitive)
# Output: {'apple', 'aPpLe', 'APPLE', 'Apple'}
# Case-insensitive
case_insensitive = set(text.lower().split())
print(case_insensitive)
# Output: {'apple'}
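For text that may contain non-ASCII letters, str.casefold() is a more aggressive alternative to lower(). The sketch below uses the German "ß" as an example, where lower() leaves two distinct spellings but casefold() maps them to one.

```python
words = ["Straße", "STRASSE", "strasse"]

# lower() keeps 'ß' as is, so 'straße' and 'strasse' remain distinct.
print(len({w.lower() for w in words}))     # 2

# casefold() expands 'ß' to 'ss', so all three spellings match.
print(len({w.casefold() for w in words}))  # 1
```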
Preserve Original Case While Deduplicating
text = "Apple APPLE apple Banana BANANA"
# Keep first occurrence of each word (case-insensitive)
seen = set()
unique = []
for word in text.split():
    lower = word.lower()
    if lower not in seen:
        seen.add(lower)
        unique.append(word)
print(unique)
# Output: ['Apple', 'Banana']
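The same result can be reached with a single dictionary instead of a set plus a list. This is an alternative sketch rather than a replacement for the loop above; it relies on dictionaries keeping insertion order, which Python guarantees from version 3.7.

```python
text = "Apple APPLE apple Banana BANANA"

first_seen = {}
for word in text.split():
    # setdefault() stores the word only if its lowercase key is not present yet.
    first_seen.setdefault(word.lower(), word)

print(list(first_seen.values()))
# ['Apple', 'Banana']
```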
Perform Set Operations on String Data
Sets support powerful operations for comparing and combining collections.
text1 = "apple banana cherry"
text2 = "banana cherry date"
set1 = set(text1.split())
set2 = set(text2.split())
# Words in both texts (intersection)
common = set1 & set2
print(f"Common: {common}")
# Output: Common: {'cherry', 'banana'}
# Words in either text (union)
all_words = set1 | set2
print(f"All: {all_words}")
# Output: All: {'date', 'banana', 'cherry', 'apple'}
# Words only in first text (difference)
only_first = set1 - set2
print(f"Only in first: {only_first}")
# Output: Only in first: {'apple'}
# Words in one but not both (symmetric difference)
exclusive = set1 ^ set2
print(f"Exclusive: {exclusive}")
# Output: Exclusive: {'date', 'apple'}
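Besides the operators, sets offer method forms and subset tests. The sketch below, with made-up sample data, shows that the method forms accept any iterable (the operators require both sides to be sets) and how to check whether all required words are present.

```python
words = set("apple banana cherry".split())
required = {"banana", "cherry"}

# intersection() accepts a plain list, so no second set() call is needed.
common = words.intersection("banana cherry date".split())
print(common == {"banana", "cherry"})  # True

# Subset test: are all required words present?
print(required <= words)          # True
print(required.issubset(words))   # True, same check in method form

# isdisjoint() is True when the collections share no element.
print(words.isdisjoint(["date", "fig"]))  # True
```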
Check Membership Efficiently
Sets provide average-case constant-time membership testing.
allowed_commands = "start stop restart status"
allowed_set = set(allowed_commands.split())
def validate_command(cmd: str) -> bool:
    return cmd.lower() in allowed_set
print(validate_command("start")) # True
print(validate_command("delete")) # False
This is much faster than a list for large datasets:
- List: O(n) lookup
- Set: O(1) lookup on average
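The difference can be measured with the standard timeit module. The sketch below is only a rough illustration: the collection size, the repeat count, and the worst-case target are arbitrary choices, and the absolute timings depend on the machine.

```python
import timeit

items = [str(i) for i in range(10_000)]
as_list = items
as_set = set(items)
target = "9999"  # last element: the worst case for the list scan

# Time 200 membership tests against each container.
list_time = timeit.timeit(lambda: target in as_list, number=200)
set_time = timeit.timeit(lambda: target in as_set, number=200)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

Building the set is itself an O(n) step, so the benefit comes from creating it once and reusing it for many lookups.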
Parse and Deduplicate Structured Data
Handle real-world data with cleaning and deduplication.
# Email list with duplicates and formatting issues
raw_emails = """
alice@example.com, ALICE@EXAMPLE.COM,
bob@test.com, charlie@test.com,
BOB@TEST.COM
"""
# Clean, normalize, and deduplicate
emails = set(
    email.strip().lower()
    for email in raw_emails.replace('\n', ',').split(',')
    if email.strip()
)
print(emails)
# Output: {'alice@example.com', 'bob@test.com', 'charlie@test.com'}
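Once entries are normalized, sets can also serve as dictionary values to deduplicate within groups. The following sketch collects the unique local parts per domain; the variable names are illustrative, and it assumes every non-empty entry contains an "@".

```python
from collections import defaultdict

raw_emails = (
    "alice@example.com, ALICE@EXAMPLE.COM, "
    "bob@test.com, charlie@test.com, BOB@TEST.COM"
)

users_by_domain = defaultdict(set)
for entry in raw_emails.split(","):
    email = entry.strip().lower()
    if not email:
        continue  # skip blanks left by trailing commas
    local, _, domain = email.partition("@")
    users_by_domain[domain].add(local)  # the set ignores repeated names

for domain in sorted(users_by_domain):
    print(domain, sorted(users_by_domain[domain]))
# example.com ['alice']
# test.com ['bob', 'charlie']
```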
Convert Set Back to String
Join set elements to create a deduplicated string.
text = "the quick brown fox jumps over the lazy the dog"
# Deduplicate words
unique_words = set(text.split())
# Convert back to string
result = ' '.join(unique_words)
print(result)
# Output: quick over dog brown jumps fox the lazy
# Note: Order is not preserved!
# For sorted output
sorted_result = ' '.join(sorted(unique_words))
print(sorted_result)
# Output: brown dog fox jumps lazy over quick the
Output (word order in the first line varies between runs):
quick over dog brown jumps fox the lazy
brown dog fox jumps lazy over quick the
Sets are unordered, so converting back to a string produces unpredictable ordering. Use sorted() if you need consistent output order.
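If the original word order matters more than alphabetical order, a dictionary can stand in for the set. This is a swapped-in technique, not a set operation: dict.fromkeys() keeps only the first occurrence of each key, and dictionaries preserve insertion order from Python 3.7 onward.

```python
text = "the quick brown fox jumps over the lazy the dog"

# Keys are unique and keep the order in which they were first seen.
ordered_unique = list(dict.fromkeys(text.split()))

print(" ".join(ordered_unique))
# the quick brown fox jumps over lazy dog
```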
Quick Reference
| Goal | Method | Example |
|---|---|---|
| Unique characters | set(s) | set("hello") → {'h','e','l','o'} |
| Unique words | set(s.split()) | "a b a" → {'a', 'b'} |
| Case-insensitive | set(s.lower().split()) | Normalizes case first |
| Count unique | len(set(s)) | Returns unique count |
| Check membership | x in set(s) | Average O(1) lookup once the set is built |
Conclusion
Use set() for character-level deduplication and set(s.split()) for word-level deduplication. Normalize with lower() when case-insensitive comparison is needed. Sets excel at membership testing and comparing collections with operations like intersection and difference, but remember they do not preserve element order.