How to Convert String to Bytes in Python

Python 3 enforces a strict separation between text (str, Unicode code points) and binary data (bytes, raw octets). Understanding how to convert between these types is essential for file I/O operations, cryptographic hashing, network programming, and working with binary protocols.

Why String to Bytes Conversion Matters

Strings in Python 3 are sequences of Unicode characters, while bytes are sequences of integers (0-255). Many operations require bytes rather than strings:

Writing to binary files
Sending data over network sockets
Computing hashes (MD5, SHA-256)
Encoding data for transmission (Base64)
Interfacing with C libraries or system calls

Info: Python 3 raises a TypeError when you pass a string to an API that requires bytes. It never converts implicitly, so you have to handle encoding explicitly.
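The behavior described above can be seen with a short sketch. It uses hashlib from the standard library as one example of a bytes-only API; the variable names (message, raised, digest) are illustrative and not part of the original text.

```python
import hashlib

message = "Hello Python"

# hashlib operates on bytes, so passing a str raises TypeError
try:
    hashlib.sha256(message)
    raised = False
except TypeError:
    raised = True

# Encoding first gives the hash function the bytes it expects
digest = hashlib.sha256(message.encode("utf-8")).hexdigest()

print(raised)       # True
print(len(digest))  # 64 (SHA-256 produces 64 hex characters)
```

The same pattern applies to sockets, binary files, and Base64: encode the string at the boundary where bytes are required.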

Convert String to Bytes Using `encode()`

The encode() method is the standard way to convert a string to bytes. It takes an optional encoding name that defaults to UTF-8, so text.encode() and text.encode("utf-8") produce the same result.

text = "Hello Python 🐍"

# Convert to bytes using UTF-8 encoding
data = text.encode("utf-8")

print(data)
# Output: b'Hello Python \xf0\x9f\x90\x8d'

print(type(data))
# Output: <class 'bytes'>

Output:

b'Hello Python \xf0\x9f\x90\x8d'
<class 'bytes'>

The b prefix indicates a bytes literal, and \xf0\x9f\x90\x8d represents the four-byte UTF-8 sequence for the snake emoji.
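As a small illustration of the difference between code points and bytes, the sketch below compares lengths before and after encoding. It reuses the string from the example above; nothing here goes beyond built-in functions.

```python
text = "Hello Python 🐍"
data = text.encode("utf-8")

# len() counts code points for str and octets for bytes
print(len(text))  # 14
print(len(data))  # 17

# Indexing or slicing bytes yields integers in the range 0-255
print(list(data[-4:]))  # [240, 159, 144, 141]
```

The three extra bytes come entirely from the emoji, which is one code point but four UTF-8 bytes.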

Common Encoding Options

text = "Héllo"

# UTF-8: Variable-width, supports all Unicode (recommended)
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'H\xc3\xa9llo'

# UTF-16: Used by Windows internally; the plain "utf-16" codec prepends a byte order mark (BOM)
utf16_bytes = text.encode("utf-16")
print(utf16_bytes)  # b'\xff\xfeH\x00\xe9\x00l\x00l\x00o\x00' (on a little-endian machine)

# Latin-1 (ISO-8859-1): Single-byte encoding for Western European languages
latin1_bytes = text.encode("latin-1")
print(latin1_bytes)  # b'H\xe9llo'

Output:

b'H\xc3\xa9llo'
b'\xff\xfeH\x00\xe9\x00l\x00l\x00o\x00'
b'H\xe9llo'
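To make the size and byte-order differences concrete, here is a hedged sketch that encodes the same string with explicit-endian UTF-16 codecs alongside UTF-8 and Latin-1. The "utf-16-le" and "utf-16-be" codec names are standard Python codecs that were not used in the example above; they fix the byte order and omit the BOM.

```python
text = "Héllo"

utf8 = text.encode("utf-8")
utf16_le = text.encode("utf-16-le")  # little-endian, no BOM
utf16_be = text.encode("utf-16-be")  # big-endian, no BOM
latin1 = text.encode("latin-1")

# Same five characters, three different byte counts
print(len(utf8), len(utf16_le), len(latin1))  # 6 10 5

print(utf16_le)  # b'H\x00\xe9\x00l\x00l\x00o\x00'
print(utf16_be)  # b'\x00H\x00\xe9\x00l\x00l\x00o'
```

Choosing an explicit byte order is useful when the receiving side expects a fixed layout and no BOM.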

Convert String to Bytes Using the `bytes()` Constructor

The bytes() constructor provides an alternative syntax that makes type conversion more explicit in your code.

text = "Data"

# Encoding argument is required
byte_data = bytes(text, "utf-8")

print(byte_data)
# Output: b'Data'

Warning: Unlike encode(), the bytes() constructor requires the encoding argument when given a string. It has no default value and raises a TypeError if the encoding is omitted.
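A minimal sketch of that difference follows. The variable name error_message is illustrative; the exact wording of the exception text may vary between Python versions, so the example only prints it.

```python
text = "Data"

# Both spellings produce an equal bytes object
same = bytes(text, "utf-8") == text.encode("utf-8")
print(same)  # True

# Omitting the encoding for a str argument raises TypeError
try:
    bytes(text)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```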

Create Mutable Byte Sequences with `bytearray`

Standard bytes objects are immutable. When you need to modify binary data in place, such as when processing data streams or building network packets, use bytearray.

text = "Hello"
mutable_bytes = bytearray(text, "utf-8")

print(mutable_bytes)
# Output: bytearray(b'Hello')

# Modify the first byte (H -> J)
mutable_bytes[0] = ord('J')

print(mutable_bytes)
# Output: bytearray(b'Jello')

# Append more data
mutable_bytes.extend(b' World')
print(mutable_bytes)
# Output: bytearray(b'Jello World')

Output:

bytearray(b'Hello')
bytearray(b'Jello')
bytearray(b'Jello World')
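Once the edits are done, a bytearray can be frozen into bytes or decoded straight back to a string. The sketch below continues the example above under that assumption; the names buffer, frozen, and immutable are illustrative.

```python
buffer = bytearray("Hello", "utf-8")
buffer[0] = ord("J")
buffer.extend(b" World")

frozen = bytes(buffer)         # immutable copy of the current contents
text = buffer.decode("utf-8")  # bytearray supports decode() as well

print(frozen)  # b'Jello World'
print(text)    # Jello World

# Item assignment works on bytearray but not on bytes
try:
    frozen[0] = ord("H")
    immutable = False
except TypeError:
    immutable = True

print(immutable)  # True
```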

Handle Encoding Errors Gracefully

When a string contains characters that the target encoding cannot represent, encode() raises a UnicodeEncodeError by default. The errors parameter selects an alternative strategy for handling those characters.

text = "Café 🍵"

# Strict mode (default): Raises an exception
try:
    text.encode("ascii")
except UnicodeEncodeError as e:
    print(f"Error: {e}")

# Replace unsupported characters with '?'
replaced = text.encode("ascii", errors="replace")
print(replaced)
# Output: b'Caf? ?'

# Ignore unsupported characters entirely
ignored = text.encode("ascii", errors="ignore")
print(ignored)
# Output: b'Caf '

# Use XML character references
xmlcharref = text.encode("ascii", errors="xmlcharrefreplace")
print(xmlcharref)
# Output: b'Caf&#233; &#127861;'

# Use Python's backslash escape sequences
backslash = text.encode("ascii", errors="backslashreplace")
print(backslash)
# Output: b'Caf\\xe9 \\U0001f375'

Output:

Error: 'ascii' codec can't encode character '\xe9' in position 3: ordinal not in range(128)
b'Caf? ?'
b'Caf '
b'Caf&#233; &#127861;'
b'Caf\\xe9 \\U0001f375'

Error Handling Strategies

Strategy	Behavior	Use Case
`strict`	Raises `UnicodeEncodeError`	Default; fail fast
`ignore`	Removes unencodable characters	Lossy but simple
`replace`	Substitutes with `?`	User-facing output
`xmlcharrefreplace`	Uses XML numeric references	HTML/XML content
`backslashreplace`	Uses Python escape sequences	Debugging

Convert Bytes Back to String

Use the decode() method to convert bytes back to a string.

byte_data = b'Hello Python \xf0\x9f\x90\x8d'

# Decode using UTF-8
text = byte_data.decode("utf-8")

print(text)
# Output: Hello Python 🐍

Tip: Always use the same encoding for both encode() and decode() operations to avoid data corruption or errors.
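The sketch below shows what a mismatch looks like in practice: decoding UTF-8 bytes as Latin-1 silently produces garbled text, while decoding them as ASCII fails outright. The variable names are illustrative.

```python
original = "Héllo"
encoded = original.encode("utf-8")  # b'H\xc3\xa9llo'

# Wrong codec, no exception: each UTF-8 byte is read as its own character
garbled = encoded.decode("latin-1")
print(garbled)  # HÃ©llo

# Wrong codec, exception: bytes above 127 are not valid ASCII
try:
    encoded.decode("ascii")
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)  # True

# Matching codec: the round trip is lossless
print(encoded.decode("utf-8") == original)  # True
```

The silent case is the more dangerous one, because the corrupted text can be stored or forwarded without any error being raised.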

Quick Reference

Method	Return Type	Mutable	Notes
`str.encode(encoding)`	`bytes`	No	Preferred method
`bytes(str, encoding)`	`bytes`	No	Explicit constructor
`bytearray(str, encoding)`	`bytearray`	Yes	For in-place modification
`b"literal"`	`bytes`	No	ASCII-only literals
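The "ASCII-only literals" note in the last row can be checked with a short sketch. It uses the built-in compile() to test a literal without crashing the script; the source string and the names escaped, matches, and literal_ok are illustrative.

```python
# Non-ASCII data in a bytes literal must be written with escapes
escaped = b"caf\xc3\xa9"
matches = escaped == "café".encode("utf-8")
print(matches)  # True

# A bytes literal containing a raw non-ASCII character is a syntax error
try:
    compile('b"café"', "<example>", "eval")
    literal_ok = True
except SyntaxError:
    literal_ok = False

print(literal_ok)  # False
```

For anything beyond plain ASCII, encoding a str is usually easier to read than hand-written escapes.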

Conclusion

Use str.encode('utf-8') for the vast majority of string-to-bytes conversions. UTF-8 is the web standard, handles all Unicode characters, and maintains compatibility with ASCII. Only switch to alternative encodings when interfacing with legacy systems that specifically require Latin-1, ASCII, or other formats. For scenarios requiring in-place byte manipulation, bytearray provides the necessary mutability while maintaining the same encoding interface.
