
How to Convert String to Bytes in Python

Python 3 enforces a strict separation between text (str, Unicode code points) and binary data (bytes, raw octets). Understanding how to convert between these types is essential for file I/O operations, cryptographic hashing, network programming, and working with binary protocols.

Why String to Bytes Conversion Matters

Strings in Python 3 are sequences of Unicode characters, while bytes are sequences of integers (0-255). Many operations require bytes rather than strings:

  • Writing to binary files
  • Sending data over network sockets
  • Computing hashes (MD5, SHA-256)
  • Encoding data for transmission (Base64)
  • Interfacing with C libraries or system calls

Python 3 will raise a TypeError if you attempt to use a string where bytes are required, forcing you to handle encoding explicitly.
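Hashing is a typical case: the constructors in Python's standard hashlib module accept bytes-like objects, not str. Here is a minimal sketch. The name `sha256_hex` is a hypothetical helper used only for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hypothetical helper: encode to UTF-8 first, then hash the bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Passing a str directly fails because hashlib needs bytes
try:
    hashlib.sha256("hello")
except TypeError as exc:
    print(f"TypeError: {exc}")

print(sha256_hex("hello"))
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```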

Convert String to Bytes Using encode()

The encode() method is the standard way to convert a string to bytes. It accepts an encoding format, with UTF-8 as the default.

text = "Hello Python 🐍"

# Convert to bytes using UTF-8 encoding
data = text.encode("utf-8")

print(data)
# Output: b'Hello Python \xf0\x9f\x90\x8d'

print(type(data))
# Output: <class 'bytes'>

Output:

b'Hello Python \xf0\x9f\x90\x8d'
<class 'bytes'>

The b prefix indicates a bytes literal, and \xf0\x9f\x90\x8d represents the four-byte UTF-8 sequence for the snake emoji.
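Because one character can map to several bytes, the lengths of the str and the resulting bytes object can differ. The short sketch below inspects the individual byte values. It also shows that calling encode() with no argument uses UTF-8.

```python
text = "Hello Python 🐍"
data = text.encode()  # no argument: UTF-8 is the default

print(len(text))  # 14 characters (the emoji counts as one)
print(len(data))  # 17 bytes (the emoji takes four)

# Iterating over a bytes object yields integers in the range 0-255
snake = "🐍".encode("utf-8")
print(list(snake))  # [240, 159, 144, 141]
print(snake.hex())  # f09f908d
```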

Common Encoding Options

text = "Héllo"

# UTF-8: Variable-width, supports all Unicode (recommended)
utf8_bytes = text.encode("utf-8")
print(utf8_bytes) # b'H\xc3\xa9llo'

# UTF-16: Used by Windows internally
utf16_bytes = text.encode("utf-16")
print(utf16_bytes) # b'\xff\xfeH\x00\xe9\x00l\x00l\x00o\x00'

# Latin-1 (ISO-8859-1): Single-byte encoding for Western European languages
latin1_bytes = text.encode("latin-1")
print(latin1_bytes) # b'H\xe9llo'

Output:

b'H\xc3\xa9llo'
b'\xff\xfeH\x00\xe9\x00l\x00l\x00o\x00'
b'H\xe9llo'
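The leading \xff\xfe in the UTF-16 output is a byte order mark (BOM). The generic "utf-16" codec prepends one, and on little-endian machines (most current hardware) it is b'\xff\xfe'. The explicit "utf-16-le" and "utf-16-be" codecs omit the BOM. The following sketch compares encoded sizes for the same text.

```python
import codecs

text = "Héllo"

# Same five characters, different byte counts per encoding
for encoding in ("utf-8", "utf-16", "utf-16-le", "utf-16-be", "latin-1"):
    encoded = text.encode(encoding)
    print(f"{encoding:<10} {len(encoded):>2} bytes  {encoded!r}")

# The generic "utf-16" codec starts with a BOM; the -le/-be variants do not
print(text.encode("utf-16")[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
print(text.encode("utf-16-le"))  # b'H\x00\xe9\x00l\x00l\x00o\x00'
```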

Convert String to Bytes Using the bytes() Constructor

The bytes() constructor provides an alternative syntax that makes type conversion more explicit in your code.

text = "Data"

# Encoding argument is required
byte_data = bytes(text, "utf-8")

print(byte_data)
# Output: b'Data'

Unlike encode(), the bytes() constructor requires the encoding argument: it has no default value and will raise a TypeError if omitted.
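This quick check confirms that the two forms produce the same result and that omitting the encoding really fails. The constructor also accepts an optional third argument for error handling, which works the same way as the errors argument of encode().

```python
text = "Data"

# Both forms produce equal bytes objects
print(bytes(text, "utf-8") == text.encode("utf-8"))  # True

# Omitting the encoding raises TypeError
try:
    bytes(text)
except TypeError as exc:
    print(f"TypeError: {exc}")

# Optional third argument: error handling, as with encode()
print(bytes("Café", "ascii", "replace"))  # b'Caf?'
```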

Create Mutable Byte Sequences with bytearray

Standard bytes objects are immutable. When you need to modify binary data in place, such as when processing data streams or building network packets, use bytearray.

text = "Hello"
mutable_bytes = bytearray(text, "utf-8")

print(mutable_bytes)
# Output: bytearray(b'Hello')

# Modify the first byte (H -> J)
mutable_bytes[0] = ord('J')

print(mutable_bytes)
# Output: bytearray(b'Jello')

# Append more data
mutable_bytes.extend(b' World')
print(mutable_bytes)
# Output: bytearray(b'Jello World')

Output:

bytearray(b'Hello')
bytearray(b'Jello')
bytearray(b'Jello World')
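As a slightly larger sketch of building a packet in place, the framing format below is a hypothetical example chosen for illustration, not a real protocol. It uses a 2-byte big-endian length prefix followed by the UTF-8 payload. The name `build_frame` is also hypothetical.

```python
def build_frame(message: str) -> bytes:
    """Hypothetical framing: 2-byte big-endian length prefix + UTF-8 payload."""
    payload = message.encode("utf-8")
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 2-byte length prefix")

    frame = bytearray()
    frame += len(payload).to_bytes(2, "big")  # append the length prefix in place
    frame += payload                          # append the payload bytes in place
    return bytes(frame)  # freeze into an immutable bytes object


print(build_frame("Hi"))  # b'\x00\x02Hi'
print(build_frame("é"))   # b'\x00\x02\xc3\xa9'
```

The length prefix counts bytes, not characters, which is why "é" gets a prefix of 2.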

Handle Encoding Errors Gracefully

When encoding strings containing characters outside the target encoding's range, Python raises a UnicodeEncodeError. The errors parameter provides several strategies to handle these situations.

text = "Café 🍵"

# Strict mode (default): Raises an exception
try:
    text.encode("ascii")
except UnicodeEncodeError as e:
    print(f"Error: {e}")

# Replace unsupported characters with '?'
replaced = text.encode("ascii", errors="replace")
print(replaced)
# Output: b'Caf? ?'

# Ignore unsupported characters entirely
ignored = text.encode("ascii", errors="ignore")
print(ignored)
# Output: b'Caf '

# Use XML character references
xmlcharref = text.encode("ascii", errors="xmlcharrefreplace")
print(xmlcharref)
# Output: b'Caf&#233; &#127861;'

# Use Python's backslash escape sequences
backslash = text.encode("ascii", errors="backslashreplace")
print(backslash)
# Output: b'Caf\\xe9 \\U0001f375'

Output:

Error: 'ascii' codec can't encode character '\xe9' in position 3: ordinal not in range(128)
b'Caf? ?'
b'Caf '
b'Caf&#233; &#127861;'
b'Caf\\xe9 \\U0001f375'

Error Handling Strategies

| Strategy | Behavior | Use Case |
|---|---|---|
| strict | Raises UnicodeEncodeError | Default; fail fast |
| ignore | Removes unencodable characters | Lossy but simple |
| replace | Substitutes with ? | User-facing output |
| xmlcharrefreplace | Uses XML numeric references | HTML/XML content |
| backslashreplace | Uses Python escape sequences | Debugging |
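Before choosing a lossy strategy, it can help to see exactly which characters the target encoding cannot represent. The minimal sketch below tries to encode each character under the default strict mode and records the failures. The name `find_unencodable` is a hypothetical helper.

```python
def find_unencodable(text: str, encoding: str) -> list[tuple[int, str]]:
    """Hypothetical helper: return (index, character) pairs `encoding` cannot represent."""
    problems = []
    for index, char in enumerate(text):
        try:
            char.encode(encoding)  # strict mode raises on failure
        except UnicodeEncodeError:
            problems.append((index, char))
    return problems


print(find_unencodable("Café 🍵", "ascii"))    # [(3, 'é'), (5, '🍵')]
print(find_unencodable("Café 🍵", "latin-1"))  # [(5, '🍵')]
```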

Convert Bytes Back to String

Use the decode() method to convert bytes back to a string. Like encode(), it uses UTF-8 when no encoding is given.

byte_data = b'Hello Python \xf0\x9f\x90\x8d'

# Decode using UTF-8
text = byte_data.decode("utf-8")

print(text)
# Output: Hello Python 🐍

Always use the same encoding for both encode() and decode() operations to avoid data corruption or errors.
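The sketch below starts with a clean round trip and then shows both kinds of mismatch. UTF-8 bytes read as Latin-1 decode without error into garbled text. Latin-1 bytes read as UTF-8 raise UnicodeDecodeError. The last line shows that decode() accepts the same errors argument as encode().

```python
text = "Héllo"
utf8_bytes = text.encode("utf-8")

# Matching encodings: lossless round trip
print(utf8_bytes.decode("utf-8") == text)  # True

# UTF-8 bytes decoded as Latin-1: no error, but garbled text
print(utf8_bytes.decode("latin-1"))  # HÃ©llo

# Latin-1 bytes decoded as UTF-8: invalid byte sequence
try:
    text.encode("latin-1").decode("utf-8")
except UnicodeDecodeError as exc:
    print(f"UnicodeDecodeError: {exc.reason}")

# decode() also takes an errors argument; "replace" inserts U+FFFD
print(text.encode("latin-1").decode("utf-8", errors="replace"))  # H\ufffdllo
```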

Quick Reference

| Method | Return Type | Mutable | Notes |
|---|---|---|---|
| str.encode(encoding) | bytes | No | Preferred method |
| bytes(str, encoding) | bytes | No | Explicit constructor |
| bytearray(str, encoding) | bytearray | Yes | For in-place modification |
| b"literal" | bytes | No | ASCII-only literals |

Conclusion

Use str.encode('utf-8') for the vast majority of string-to-bytes conversions. UTF-8 is the web standard, handles all Unicode characters, and maintains compatibility with ASCII. Only switch to alternative encodings when interfacing with legacy systems that specifically require Latin-1, ASCII, or other formats. For scenarios requiring in-place byte manipulation, bytearray provides the necessary mutability while maintaining the same encoding interface.