How to Convert String to Set in Python

Converting a string to a set removes duplicates and enables fast membership testing. Sets discard duplicate elements automatically and offer average-case O(1) lookups, which makes them well suited to finding and checking unique elements.

Extract Unique Characters with set()

Pass a string directly to set() to get unique characters.

text = "hello"

unique_chars = set(text)

print(unique_chars)
# Output: {'h', 'e', 'l', 'o'} (element order may vary)

print(len(unique_chars))
# Output: 4
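Because set(text) keeps every distinct character, including spaces and punctuation, it is often useful to restrict the result to a known alphabet. The sketch below is one possible way to do that; the helper name unique_letters is hypothetical, and it assumes only ASCII letters matter.

```python
import string

def unique_letters(text: str) -> set[str]:
    """Return the distinct ASCII letters in text, ignoring case."""
    # Intersecting with the alphabet drops spaces, digits, and punctuation
    return set(text.lower()) & set(string.ascii_lowercase)

sentence = "The quick brown fox jumps over the lazy dog"
letters = unique_letters(sentence)

print(len(letters))
# Output: 26

# A sentence is a pangram if it uses every letter of the alphabet
print(letters == set(string.ascii_lowercase))
# Output: True
```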

Analyze Character Composition

password = "aB3$aB3$xY"

unique = set(password)
print(f"Unique characters: {len(unique)}")
# Output: Unique characters: 6

# Check for required character types
has_digit = bool(unique & set('0123456789'))
has_upper = bool(unique & set('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
has_special = bool(unique & set('!@#$%^&*'))

print(f"Has digit: {has_digit}") # True
print(f"Has uppercase: {has_upper}") # True
print(f"Has special: {has_special}") # True

Output:

Unique characters: 6
Has digit: True
Has uppercase: True
Has special: True
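The same checks can be wrapped in a reusable function. This is a minimal sketch rather than a complete password policy: the function name character_classes is hypothetical, and it assumes the constants from the standard string module are acceptable stand-ins for the hand-written character lists (string.punctuation covers more symbols than '!@#$%^&*').

```python
import string

def character_classes(password: str) -> dict[str, bool]:
    """Report which character classes appear in password."""
    chars = set(password)
    # isdisjoint() returns True when the two collections share no elements,
    # so "not isdisjoint" means at least one character of that class is present
    return {
        "lower": not chars.isdisjoint(string.ascii_lowercase),
        "upper": not chars.isdisjoint(string.ascii_uppercase),
        "digit": not chars.isdisjoint(string.digits),
        "special": not chars.isdisjoint(string.punctuation),
    }

print(character_classes("aB3$aB3$xY"))
# Output: {'lower': True, 'upper': True, 'digit': True, 'special': True}
```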
Info: Sets are unordered collections. Converting a string to a set discards both the original character order and the number of times each character appeared, so the original string cannot be reconstructed from the set.

Extract Unique Words with split()

Combine split() and set() to find unique words in text.

text = "apple orange banana apple orange grape"

unique_words = set(text.split())

print(unique_words)
# Output: {'banana', 'apple', 'grape', 'orange'}

print(f"Total words: {len(text.split())}")
# Output: Total words: 6

print(f"Unique words: {len(unique_words)}")
# Output: Unique words: 4

Split by Custom Delimiter

tags = "python,django,flask,python,api,django"

unique_tags = set(tags.split(","))

print(unique_tags)
# Output: {'python', 'django', 'flask', 'api'}
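Delimited input often has stray spaces or empty fields, and split(",") alone would treat "python" and " python " as different tags. The following sketch strips each piece and skips empty ones; the helper name split_unique and the sample string are made up for illustration.

```python
def split_unique(text: str, sep: str = ",") -> set[str]:
    """Split text on sep, trim whitespace, and drop empty pieces."""
    return {part.strip() for part in text.split(sep) if part.strip()}

tags = "python, django ,flask,, python ,api"

# sorted() is used only to make the printed order predictable
print(sorted(split_unique(tags)))
# Output: ['api', 'django', 'flask', 'python']
```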

Handle Case Sensitivity

Normalize strings before conversion to treat different cases as equal.

text = "Apple APPLE apple aPpLe"

# Case-sensitive (default)
case_sensitive = set(text.split())
print(case_sensitive)
# Output: {'apple', 'aPpLe', 'APPLE', 'Apple'}

# Case-insensitive
case_insensitive = set(text.lower().split())
print(case_insensitive)
# Output: {'apple'}
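For text that is not limited to ASCII, lower() may leave some equivalent spellings distinct. One option is str.casefold(), which applies more aggressive, Unicode-aware case folding. The example below is a small illustration with German sample words chosen for this purpose.

```python
words = "Straße STRASSE strasse"

# lower() keeps 'ß' as-is, so two distinct spellings remain
lowered = set(words.lower().split())
print(len(lowered))
# Output: 2

# casefold() maps 'ß' to 'ss', so all three words collapse into one
folded = set(words.casefold().split())
print(folded)
# Output: {'strasse'}
```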

Preserve Original Case While Deduplicating

text = "Apple APPLE apple Banana BANANA"

# Keep first occurrence of each word (case-insensitive)
seen = set()
unique = []

for word in text.split():
    lower = word.lower()
    if lower not in seen:
        seen.add(lower)
        unique.append(word)

print(unique)
# Output: ['Apple', 'Banana']

Perform Set Operations on String Data

Sets support powerful operations for comparing and combining collections.

text1 = "apple banana cherry"
text2 = "banana cherry date"

set1 = set(text1.split())
set2 = set(text2.split())

# Words in both texts (intersection)
common = set1 & set2
print(f"Common: {common}")
# Output: Common: {'cherry', 'banana'}

# Words in either text (union)
all_words = set1 | set2
print(f"All: {all_words}")
# Output: All: {'date', 'banana', 'cherry', 'apple'}

# Words only in first text (difference)
only_first = set1 - set2
print(f"Only in first: {only_first}")
# Output: Only in first: {'apple'}

# Words in one but not both (symmetric difference)
exclusive = set1 ^ set2
print(f"Exclusive: {exclusive}")
# Output: Exclusive: {'date', 'apple'}
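Beyond the operators, sets also offer named methods and comparison checks. Unlike the operators, methods such as intersection() and isdisjoint() accept any iterable, so the second operand does not need to be converted to a set first. The sketch below uses made-up sample data to show a few of them.

```python
text = "apple banana cherry"
words = set(text.split())
required = {"banana", "cherry"}

# Subset test: are all required words present in the text?
print(required.issubset(words))
# Output: True
print(required <= words)  # operator form of issubset()
# Output: True

# Disjoint test: do the two collections share no words at all?
no_overlap = words.isdisjoint("date fig".split())
print(no_overlap)
# Output: True

# Method form of & that takes a plain list
shared = words.intersection("banana kiwi".split())
print(shared)
# Output: {'banana'}
```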

Check Membership Efficiently

Sets provide average-case constant-time membership testing, regardless of how many elements they hold.

allowed_commands = "start stop restart status"
allowed_set = set(allowed_commands.split())

def validate_command(cmd: str) -> bool:
    return cmd.lower() in allowed_set

print(validate_command("start")) # True
print(validate_command("delete")) # False
Note: For large collections, membership tests on a set are much faster than on a list:

  • List: O(n) lookup
  • Set: O(1) lookup on average
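The difference can be measured with the standard timeit module. This is a rough sketch, not a rigorous benchmark: the collection size, the number of repetitions, and the cmd-style item names are arbitrary choices, and the absolute timings depend on the machine, so no expected output is shown.

```python
import timeit

# Build the same 10,000 items as a list and as a set
items = [f"cmd{i}" for i in range(10_000)]
as_list = items
as_set = set(items)

# The last element is the worst case for a linear list scan
target = "cmd9999"

list_time = timeit.timeit(lambda: target in as_list, number=200)
set_time = timeit.timeit(lambda: target in as_set, number=200)

print(f"list: {list_time:.6f}s, set: {set_time:.6f}s")
```

On typical hardware the list timing should come out far larger, and the gap widens as the collection grows.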

Parse and Deduplicate Structured Data

Handle real-world data with cleaning and deduplication.

# Email list with duplicates and formatting issues
raw_emails = """
alice@example.com, ALICE@EXAMPLE.COM,
bob@test.com, charlie@test.com,
BOB@TEST.COM
"""

# Clean, normalize, and deduplicate
emails = set(
    email.strip().lower()
    for email in raw_emails.replace('\n', ',').split(',')
    if email.strip()
)

print(emails)
# Output: {'alice@example.com', 'bob@test.com', 'charlie@test.com'}
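When the input mixes several separators, replacing each one before splitting gets unwieldy. One alternative is re.split() with a character class. The sketch below assumes commas, semicolons, and any whitespace all act as separators; the sample addresses are invented for illustration.

```python
import re

raw = "alice@example.com; BOB@test.com\n bob@test.com,carol@test.com ;"

# Split on any run of commas, semicolons, or whitespace,
# then drop the empty string left by the trailing separator
emails = {
    part.lower()
    for part in re.split(r"[,;\s]+", raw)
    if part
}

print(sorted(emails))
# Output: ['alice@example.com', 'bob@test.com', 'carol@test.com']
```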

Convert Set Back to String

Join set elements to create a deduplicated string.

text = "the quick brown fox jumps over the lazy the dog"

# Deduplicate words
unique_words = set(text.split())

# Convert back to string
result = ' '.join(unique_words)
print(result)
# Output: quick over dog brown jumps fox the lazy
# Note: Order is not preserved!

# For sorted output
sorted_result = ' '.join(sorted(unique_words))
print(sorted_result)
# Output: brown dog fox jumps lazy over quick the

Output:

quick over dog brown jumps fox the lazy
brown dog fox jumps lazy over quick the
Warning: Sets are unordered, so joining a set back into a string produces an arbitrary word order that can differ between runs. Use sorted() when you need consistent output.
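If the original word order matters more than alphabetical order, a set is not the right tool. A common alternative, shown here as a sketch, is dict.fromkeys(): dictionaries keep insertion order (guaranteed since Python 3.7) and ignore repeated keys, so each word stays where it first appeared.

```python
text = "the quick brown fox jumps over the lazy the dog"

# dict keys are unique and remember the order in which they were first inserted
ordered_unique = " ".join(dict.fromkeys(text.split()))

print(ordered_unique)
# Output: the quick brown fox jumps over lazy dog
```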

Quick Reference

| Goal | Method | Example |
|---|---|---|
| Unique characters | set(s) | set("hello") → {'h', 'e', 'l', 'o'} |
| Unique words | set(s.split()) | "a b a" → {'a', 'b'} |
| Case-insensitive | set(s.lower().split()) | Normalizes case first |
| Count unique | len(set(s)) | Returns unique count |
| Check membership | x in set(s) | Average O(1) lookup once the set is built |

Conclusion

Use set() for character-level deduplication and set(s.split()) for word-level deduplication. Normalize with lower() when case-insensitive comparison is needed. Sets excel at membership testing and comparing collections with operations like intersection and difference, but remember they do not preserve element order.