How to Compare Strings Using lower() or casefold() in Python

String comparison is a fundamental operation in nearly every Python application, from user authentication and search functionality to data deduplication and form validation. Python provides two string methods for removing case distinctions: .lower() and .casefold(). At first glance they seem interchangeable, but choosing the wrong one can introduce subtle bugs that only surface when your application encounters international text.

This guide explains how each method works, where they differ, and how to choose the right one for your specific use case.

Understanding the Difference

Both .lower() and .casefold() return a copy of the string with case distinctions removed, but they are designed for different purposes:

Feature                    .lower()                        .casefold()
Primary purpose            Display formatting              Case-insensitive matching
Unicode handling           Lowercase mapping only          Full case folding
German "ß" (Eszett)        Remains ß                       Converts to ss
Greek "ς" (final sigma)    Remains ς                       Converts to σ
Underlying rule            Unicode lowercase conversion    Unicode case folding, intended for caseless matching

For purely English or ASCII text, both methods produce identical results. The differences only appear with certain Unicode characters that have special case-folding rules.

# With ASCII text, both methods produce the same result
text = "HELLO WORLD"
print(text.lower()) # hello world
print(text.casefold()) # hello world
print(text.lower() == text.casefold()) # True

Output:

hello world
hello world
True

The distinction becomes critical once your application handles text from languages such as German or Greek, where a letter can fold to a different character or to more than one character.

Why casefold() Matters for String Matching

The most well-known example is the German letter "ß" (Eszett). This character is already lowercase, so .lower() leaves it unchanged. However, "ß" is logically equivalent to "ss" for comparison purposes. The .casefold() method handles this correctly by applying the full Unicode case-folding rules.

user_input = "STRASSE"
stored_value = "straße"

# Using lower(): comparison fails
print(user_input.lower())
print(stored_value.lower())
print(user_input.lower() == stored_value.lower())

Output:

strasse
straße
False

The comparison fails because .lower() converts "STRASSE" to "strasse" but leaves "straße" unchanged, since "ß" is already lowercase.

Now compare with .casefold():

user_input = "STRASSE"
stored_value = "straße"

# Using casefold(): comparison succeeds
print(user_input.casefold())
print(stored_value.casefold())
print(user_input.casefold() == stored_value.casefold())

Output:

strasse
strasse
True

The .casefold() method converts "ß" to "ss", making the two strings match as expected.

Unicode Standard Compliance

Case folding is the mechanism the Unicode Standard defines for case-insensitive (caseless) string matching. It includes mappings that simple lowercasing misses, such as "ß" to "ss" and final sigma "ς" to "σ". Python's .casefold() applies the default, locale-independent folding, so language-specific rules such as the Turkish dotted and dotless I are not covered by it.
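Case folding on its own does not reconcile different Unicode representations of the same character, for example a precomposed "Ê" versus "e" followed by a combining circumflex. A minimal sketch of a comparison that also normalizes the text is shown here; the helper name caseless_equal is hypothetical, and the choice of NFD normalization before and after folding is an assumption that may not suit every application:

```python
import unicodedata


def caseless_equal(left: str, right: str) -> bool:
    """Compare two strings, ignoring case and composed/decomposed differences."""
    def fold(text: str) -> str:
        # Normalize, casefold, then normalize again, because folding
        # can produce text that is no longer in normalized form.
        decomposed = unicodedata.normalize("NFD", text)
        return unicodedata.normalize("NFD", decomposed.casefold())

    return fold(left) == fold(right)


precomposed = "\u00CA"   # "Ê" as a single code point
decomposed = "e\u0302"   # "e" followed by a combining circumflex

print(precomposed.casefold() == decomposed.casefold())  # False
print(caseless_equal(precomposed, decomposed))          # True
print(caseless_equal("STRASSE", "straße"))              # True
```

If your data is already stored in one normalization form, plain .casefold() may be enough.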

A Common Mistake: Using lower() for Comparisons

A frequent source of bugs is using .lower() for case-insensitive comparisons in code that might encounter international text. Consider a user login system:

# Wrong approach: using lower() for matching
def authenticate(input_email, stored_email):
    return input_email.lower() == stored_email.lower()

# This works for ASCII emails
print(authenticate("USER@EXAMPLE.COM", "user@example.com"))

# But fails for internationalized content
print(authenticate("STRASSE@EXAMPLE.COM", "straße@example.com"))

Output:

True
False

The second comparison fails silently, meaning a legitimate user could be locked out of their account. The correct approach is to use .casefold() for any comparison logic:

# Correct approach: using casefold() for matching
def authenticate(input_email, stored_email):
    return input_email.casefold() == stored_email.casefold()

print(authenticate("USER@EXAMPLE.COM", "user@example.com"))
print(authenticate("STRASSE@EXAMPLE.COM", "straße@example.com"))

Output:

True
True

Practical Usage Guidelines

The rule of thumb is straightforward: use .lower() when you are formatting text for display, and use .casefold() when you are comparing strings.

Display Formatting with lower()

When the goal is simply to present text in lowercase for visual consistency, .lower() is the appropriate choice:

username = "MaxMüller"
welcome_message = f"Welcome, {username.lower()}!"
print(welcome_message)

Output:

Welcome, maxmüller!

Database Lookups with casefold()

When searching for records in a database or any collection, use .casefold() to ensure matches work across languages:

def find_user_by_email(input_email, database):
    # Fold the input once, then compare against each folded record
    normalized_input = input_email.casefold()
    for record in database:
        if record["email"].casefold() == normalized_input:
            return record
    return None

database = [
    {"email": "straße@example.de", "name": "Hans"},
    {"email": "alice@example.com", "name": "Alice"},
]

result = find_user_by_email("STRASSE@EXAMPLE.DE", database)
print(result)

Output:

{'email': 'straße@example.de', 'name': 'Hans'}

Search Functionality with casefold()

Search features should always use .casefold() to handle international queries correctly:

def search_products(query, products):
    # Substring match on the case-folded query and product names
    query_normalized = query.casefold()
    return [p for p in products if query_normalized in p["name"].casefold()]

products = [
    {"name": "Weißbier", "price": 3.50},
    {"name": "Straße Map", "price": 12.99},
    {"name": "Coffee Mug", "price": 8.00},
]

results = search_products("WEISSBIER", products)
print(results)

Output:

[{'name': 'Weißbier', 'price': 3.5}]

Without .casefold(), searching for "WEISSBIER" would not match "Weißbier" because .lower() does not convert "ß" to "ss".

Additional Unicode Examples

The difference between .lower() and .casefold() is not limited to German text, although .casefold() does not resolve every language-specific case on its own.

Greek Sigma Variations

Greek has two forms of lowercase sigma: "σ" (used in the middle of a word) and "ς" (used at the end). The .casefold() method normalizes both to the same form:

word_upper = "ΌΣΟΣ"
word_lower = "όσος" # Ends with final sigma ς

print(f"lower: '{word_upper.lower()}'")
print(f"casefold: '{word_upper.casefold()}'")
print(f"lower match: {word_upper.lower() == word_lower}")
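print(f"casefold match: {word_upper.casefold() == word_lower.casefold()}")

Output:

lower: 'όσος'
casefold: 'όσοσ'
lower match: True
casefold match: True

Both comparisons succeed here because Python's .lower() converts a word-final "Σ" to "ς". The two methods diverge when the strings being compared place sigma in different positions, since only .casefold() maps every sigma to "σ".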
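As an illustration of where the two methods diverge for Greek, consider a prefix match in which the query ends in sigma but the matching text continues past it. The words below are chosen only for demonstration:

```python
query = "ΟΔΟΣ"          # the sigma is word-final in the query
text = "οδοστρωτήρας"   # the same letters appear mid-word here

# lower() turns the query's last letter into final sigma "ς",
# which does not equal the medial "σ" inside the longer word.
print(text.lower().startswith(query.lower()))        # False

# casefold() maps every sigma to "σ", so the prefix matches.
print(text.casefold().startswith(query.casefold()))  # True
```

The same effect applies to substring searches, so search and autocomplete features are the most likely place to run into it.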

Turkish Dotted and Dotless I

Turkish has four distinct "i" characters: dotted uppercase "İ", dotted lowercase "i", dotless uppercase "I", and dotless lowercase "ı". This is a notoriously tricky case in internationalization:

turkish_city = "İSTANBUL"

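print(f"lower: '{turkish_city.lower()}'")
print(f"casefold: '{turkish_city.casefold()}'")

Output:

lower: 'i̇stanbul'
casefold: 'i̇stanbul'

Both methods return the same result: "İ" becomes "i" followed by a combining dot above (U+0307), not a plain "i". Neither method is locale-aware, so .casefold() alone does not make "İSTANBUL" match "istanbul".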
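If you know that the text is Turkish, one possible workaround is to apply the Turkish mappings for the two uppercase letters before folding. The sketch below is an assumption-laden illustration: turkish_fold is a hypothetical helper, not part of Python, and it would give wrong results for text in other languages:

```python
# Map the Turkish uppercase letters to their Turkish lowercase forms.
# Escapes are used so the code points are unambiguous.
TURKISH_UPPER_TO_LOWER = str.maketrans({
    "\u0130": "i",       # İ (dotted capital I)  -> i
    "I": "\u0131",       # I (dotless capital I) -> ı (dotless small i)
})


def turkish_fold(text: str) -> str:
    """Case-fold text that is known to be Turkish (hypothetical helper)."""
    return text.translate(TURKISH_UPPER_TO_LOWER).casefold()


# Plain casefold() does not match these pairs
print("\u0130STANBUL".casefold() == "istanbul".casefold())    # False
print("ISPARTA".casefold() == "\u0131sparta".casefold())      # False

# With the Turkish mappings applied first, both pairs match
print(turkish_fold("\u0130STANBUL") == turkish_fold("istanbul"))  # True
print(turkish_fold("ISPARTA") == turkish_fold("\u0131sparta"))    # True
```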

Performance Consideration

The .casefold() method can be slightly slower than .lower() because it applies the full Unicode case-folding mappings, some of which expand one character into several. For typical application code, the difference is negligible. If you are processing very large volumes of text (gigabytes or more) and your data is guaranteed to be ASCII-only, you may want to benchmark both approaches on your own data. In all other cases, the correctness benefits of .casefold() far outweigh the minimal performance cost.
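A quick way to check the cost on your own machine is the standard-library timeit module. This is only a rough sketch: the sample text and repetition count are arbitrary assumptions, and the timings will vary with Python version, hardware, and the text being processed:

```python
import timeit

# Arbitrary ASCII sample; substitute text that resembles your real data
sample = "The Quick Brown Fox Jumps Over The Lazy Dog " * 1000

# Time 1,000 calls of each bound method on the same string
lower_seconds = timeit.timeit(sample.lower, number=1000)
casefold_seconds = timeit.timeit(sample.casefold, number=1000)

print(f"lower():    {lower_seconds:.4f} s")
print(f"casefold(): {casefold_seconds:.4f} s")
```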

Quick Reference

Use .lower() when:

  • Formatting text for display or presentation
  • Working exclusively with ASCII or English text
  • Visual appearance is the only concern

Use .casefold() when:

  • Comparing two strings for equality
  • Implementing search or filtering functionality
  • Validating user input against stored values
  • Building any system that may encounter international text
  • Deduplicating data from multilingual sources
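As an illustration of the deduplication case, the following sketch keeps the first spelling seen for each case-folded key. The function name dedupe_caseless and the sample data are hypothetical:

```python
def dedupe_caseless(values):
    """Return values with case-insensitive duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for value in values:
        key = value.casefold()  # comparison key only; the original text is preserved
        if key not in seen:
            seen.add(key)
            unique.append(value)
    return unique


streets = ["Straße", "STRASSE", "strasse", "Berlin", "BERLIN"]
print(dedupe_caseless(streets))  # ['Straße', 'Berlin']
```

Using the folded string only as a lookup key, rather than storing it, keeps the original spelling available for display.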

By defaulting to .casefold() for all comparison operations, you ensure your Python applications handle international text correctly and provide consistent behavior regardless of the user's language. The minor habit of reaching for .casefold() instead of .lower() when writing comparison logic can prevent an entire class of internationalization bugs from ever reaching production.