How to Encode and Decode Strings in Python

In Python 3, there is a strict distinction between human-readable text (Strings) and machine-readable binary data (Bytes). "Encoding" is the process of translating a string into bytes (for storage or network transmission), while "decoding" translates bytes back into a string.

This guide explains how to use the .encode() and .decode() methods, handle UnicodeEncodeError exceptions gracefully, and normalize Unicode text for consistent comparison.

Understanding Strings vs. Bytes

String (str): An immutable sequence of Unicode code points and the default text type in Python 3 (e.g., "Hello", "Café").
Bytes (bytes): An immutable sequence of integers from 0 to 255 representing raw binary data (e.g., b"Hello", b"Caf\xc3\xa9").

The Workflow: String → encode() → Bytes → decode() → String
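A minimal round trip makes the distinction concrete. The sketch below uses illustrative variable names. It encodes a short string, shows that the byte count can differ from the character count, and decodes the bytes back into the original string:

```python
text = "Caf\u00e9"                    # "Café": 4 characters
data = text.encode("utf-8")           # str -> bytes

print(len(text))                      # 4 (characters)
print(len(data))                      # 5 (bytes): 'é' needs two bytes in UTF-8
print(data[0])                        # 67: indexing bytes yields an int, not a str
print(data.decode("utf-8") == text)   # True: decoding restores the original
```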

Method 1: Basic Encoding with `.encode()`

To convert a string to bytes, call its .encode(encoding) method. If you omit the encoding, str.encode() uses UTF-8, which can represent every Unicode character, including accented letters and emoji.

text = "Python is powerful 🚀"

# ✅ Solution: Encode to UTF-8 (Default)
utf8_bytes = text.encode('utf-8')

print(f"Original: {text} (Type: {type(text)})")
print(f"Encoded:  {utf8_bytes} (Type: {type(utf8_bytes)})")

Output:

Original: Python is powerful 🚀 (Type: <class 'str'>)
Encoded:  b'Python is powerful \xf0\x9f\x9a\x80' (Type: <class 'bytes'>)
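Because UTF-8 is the default, calling .encode() with no argument gives the same bytes as .encode('utf-8'). Other codecs produce different bytes for the same text. A quick comparison, using "Café" as an illustrative input:

```python
text = "Caf\u00e9"   # "Café"

# encode() with no argument uses UTF-8, so these two results are identical
print(text.encode() == text.encode("utf-8"))   # True

# Different codecs turn the same text into different byte sequences
print(text.encode("utf-8"))      # b'Caf\xc3\xa9'
print(text.encode("latin-1"))    # b'Caf\xe9'
print(text.encode("utf-16-le"))  # b'C\x00a\x00f\x00\xe9\x00'
```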

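Note: ASCII is a 7-bit encoding limited to 128 characters (unaccented English letters, digits, and basic symbols). UTF-8 is a variable-width encoding that uses 1 to 4 bytes per character and can represent every Unicode character (the Unicode code space has 1,114,112 positions). Any ASCII text is also valid UTF-8 with identical bytes.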
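The variable width of UTF-8 is easy to observe by encoding single characters and counting the bytes. The characters chosen here are only examples:

```python
# Bytes needed per character in UTF-8: 1 for ASCII, up to 4 for emoji
samples = ["A", "\u00e9", "\u20ac", "\U0001F680"]   # A, é, €, 🚀
widths = [len(ch.encode("utf-8")) for ch in samples]
print(widths)   # [1, 2, 3, 4]

# Pure-ASCII text encodes to exactly the same bytes in ASCII and UTF-8
print("Hello".encode("ascii") == "Hello".encode("utf-8"))   # True
```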

Method 2: Handling Encoding Errors

A common error occurs when you try to encode non-ASCII characters (such as accented letters or emoji) with a restrictive codec like ASCII.

Error `UnicodeEncodeError`

text = "Café"

try:
    # ⛔️ Incorrect: ASCII cannot represent 'é'
    ascii_bytes = text.encode('ascii')
except UnicodeEncodeError as e:
    print(f"Error: {e}")

Output:

Error: 'ascii' codec can't encode character '\xe9' in position 3: ordinal not in range(128)
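The exception also records where the failure happened. The sketch below reads the standard start, end, object, and reason attributes of UnicodeEncodeError to report the offending character. It also shows str.isascii() (Python 3.7+) as a way to check before encoding:

```python
text = "Caf\u00e9"   # "Café"

try:
    text.encode("ascii")
except UnicodeEncodeError as e:
    # e.object is the original string; e.start and e.end delimit the bad slice
    bad = e.object[e.start:e.end]
    print(f"Cannot encode {bad!r} at index {e.start} ({e.reason})")

# Checking up front avoids the exception entirely
print(text.isascii())   # False
```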

Solution: Using the `errors` Argument

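You can pass an errors argument to control how Python handles characters that the target encoding cannot represent:

'strict': Raise a UnicodeEncodeError (the default).
'ignore': Silently drop the character.
'replace': Insert a placeholder (? when encoding).
'xmlcharrefreplace': Insert an XML numeric character reference, such as &#233;.
'namereplace': Insert a \N{...} escape containing the character's Unicode name.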

text = "Café 🚀"

# ✅ Solution 1: Replace invalid characters with '?'
safe_ascii = text.encode('ascii', errors='replace')
print(f"Replace: {safe_ascii}")

# ✅ Solution 2: Ignore invalid characters entirely
ignored_ascii = text.encode('ascii', errors='ignore')
print(f"Ignore:  {ignored_ascii}")

# ✅ Solution 3: Replace with XML character reference
xml_ascii = text.encode('ascii', errors='xmlcharrefreplace')
print(f"XML Ref: {xml_ascii}")

Output:

Replace: b'Caf? ?'
Ignore:  b'Caf '
XML Ref: b'Caf&#233; &#128640;'
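The 'namereplace' handler from the list, and the closely related standard 'backslashreplace' handler, keep enough information to identify the original character. A short sketch with the same input:

```python
text = "Caf\u00e9 \U0001F680"   # "Café 🚀"

# 'namereplace': \N{...} escape carrying the character's Unicode name
named = text.encode("ascii", errors="namereplace")
print(named)   # b'Caf\\N{LATIN SMALL LETTER E WITH ACUTE} \\N{ROCKET}'

# 'backslashreplace': Python-style \x, \u, or \U escape sequences
escaped = text.encode("ascii", errors="backslashreplace")
print(escaped)   # b'Caf\\xe9 \\U0001f680'
```

The doubled backslashes appear only in the printed bytes representation; each escape contains a single backslash byte.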

Method 3: Decoding Bytes to String

Data read from a network socket, or from a file opened in binary mode ('rb'), arrives as bytes. Call .decode() on it before working with it as text.

# Raw bytes (UTF-8)
raw_data = b'R\xc3\xa9sum\xc3\xa9'

# ✅ Solution: Decode back to string
decoded_text = raw_data.decode('utf-8')

print(decoded_text)

Output:

Résumé
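Decoding can fail too. If the bytes are not valid in the codec you choose, Python raises UnicodeDecodeError. The same errors argument applies, and 'replace' substitutes the U+FFFD replacement character. The input below is an illustrative Latin-1 byte string:

```python
raw_data = b"caf\xe9"   # "café" encoded as Latin-1, which is NOT valid UTF-8

try:
    raw_data.decode("utf-8")
except UnicodeDecodeError as e:
    print(f"Error: {e}")

# errors='replace' inserts U+FFFD for each undecodable sequence
replaced = raw_data.decode("utf-8", errors="replace")
print(replaced)   # caf followed by U+FFFD

# Decoding with the codec that actually produced the bytes works
print(raw_data.decode("latin-1"))   # café
```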

Warning: You must know which encoding produced the bytes. Decoding UTF-8 bytes as Latin-1 does not raise an error, because every byte value is a valid Latin-1 character. Instead, it silently produces "mojibake" (garbled text) such as RÃ©sumÃ©.
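The following sketch reproduces the mojibake described above. It also shows that the damage can be reversed when you know exactly which wrong codec was applied, which is often not the case with real-world data:

```python
original = "R\u00e9sum\u00e9"   # "Résumé"
utf8_bytes = original.encode("utf-8")

# Wrong codec: Latin-1 accepts every byte, so no error is raised
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)   # RÃ©sumÃ©

# Undo the mistake: re-encode with the wrong codec, decode with the right one
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired == original)   # True
```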

Advanced: Unicode Normalization

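The same visible character can have more than one Unicode representation. For example, é can be stored as a single precomposed code point (U+00E9), or as a plain e followed by a combining acute accent (U+0301). Two strings that look identical on screen can therefore compare as unequal.

Use unicodedata.normalize() to convert both strings to the same normalization form (NFC, NFD, NFKC, or NFKD) before comparing them.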

import unicodedata

# Two ways to write 'café'
str1 = "café"           # Precomposed character (NFC)
str2 = "cafe\u0301"     # Decomposed: 'e' + combining acute accent (NFD)

print(f"Looks same? {str1} vs {str2}")
print(f"Equal before normalizing? {str1 == str2}")  # False

# ✅ Solution: Normalize both to NFC (Normalization Form C, canonical composition)
norm1 = unicodedata.normalize('NFC', str1)
norm2 = unicodedata.normalize('NFC', str2)

print(f"Normalized equal? {norm1 == norm2}")

Output:

Looks same? café vs café
Equal before normalizing? False
Normalized equal? True
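The sketch below looks more closely at why the comparison fails and at the other normalization forms. The variable names are illustrative, and the characters are written as escapes so that the two representations stay distinct:

```python
import unicodedata

composed = "caf\u00e9"      # é as one code point (U+00E9)
decomposed = "cafe\u0301"   # e + COMBINING ACUTE ACCENT (U+0301)

# The two strings differ in length and in their UTF-8 bytes
print(len(composed), len(decomposed))   # 4 5
print(composed.encode("utf-8"))         # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))       # b'cafe\xcc\x81'

# NFD decomposes and NFC composes, so either form makes the two agree
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
print(unicodedata.normalize("NFC", decomposed) == composed)   # True

# NFKC also folds "compatibility" characters, such as the ligature U+FB01
print(unicodedata.normalize("NFC", "\ufb01") == "\ufb01")   # True (unchanged)
print(unicodedata.normalize("NFKC", "\ufb01"))              # fi
```

NFKC and NFKD discard some distinctions (the ligature becomes two ordinary letters), so they are better suited to searching and comparison than to storing the original text.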

Conclusion

To handle string encoding effectively in Python:

Encode: Use .encode('utf-8') to convert strings to bytes for storage or transmission.
Decode: Use .decode() with the encoding that produced the bytes (usually 'utf-8') to convert them back to strings for processing.
Handle Errors: Use errors='replace' (or another error handler) if you must force text into a restricted encoding like ASCII.
Normalize: Use unicodedata.normalize() when comparing strings from different sources to ensure consistency.
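One possible way to combine these rules is a pair of small helpers. The function names bytes_to_clean_text and text_to_bytes are hypothetical and exist only for illustration. The sketch assumes UTF-8 by default and normalizes to NFC on both the way in and the way out:

```python
import unicodedata


def bytes_to_clean_text(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode bytes, then normalize the text to NFC."""
    # errors="replace" substitutes U+FFFD instead of raising on bad input
    text = data.decode(encoding, errors="replace")
    return unicodedata.normalize("NFC", text)


def text_to_bytes(text: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: normalize the text to NFC, then encode it."""
    return unicodedata.normalize("NFC", text).encode(encoding)


# Decomposed input comes out composed, so later comparisons behave as expected
incoming = "cafe\u0301".encode("utf-8")
print(bytes_to_clean_text(incoming) == "caf\u00e9")   # True
print(text_to_bytes("cafe\u0301"))                    # b'caf\xc3\xa9'
```

Using errors="replace" trades strictness for robustness. If corrupted input should stop processing, drop that argument so that the default 'strict' handler raises UnicodeDecodeError.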
