Skip to main content

How to Convert a String to Binary in Python

Converting strings to binary representation is a fundamental skill for developers working with cryptography, data compression, network protocols, or low-level data manipulation. Understanding how text maps to binary helps you work confidently with any system that operates at the byte level.

In this guide, you will learn how to convert strings to their binary representation and back, handle Unicode characters including emojis, and build reusable functions for production code.

Understanding the Basics​

Computers store text as numeric codes (ASCII or Unicode), which are ultimately represented as binary bits. Converting a string to binary means transforming each character into its binary equivalent.

For example, the character 'A' has a Unicode code point of 65, which is 01000001 in binary.

info

ASCII characters use 7 to 8 bits, while Unicode characters like accented letters, CJK characters, or emojis may require multiple bytes when encoded in UTF-8.

Converting a String to Binary Using format()​

The most straightforward approach iterates over each character, retrieves its Unicode code point with ord(), and formats it as an 8-bit binary string:

text = "ABC"

# ord(c) returns the Unicode code point (65 for 'A', 66 for 'B', etc.)
# format(n, '08b') converts to an 8-bit binary string
binary_list = [format(ord(c), '08b') for c in text]

binary_str = " ".join(binary_list)
print(binary_str)

Output:

01000001 01000010 01000011

Understanding the Format Specifier​

The '08b' format specifier consists of three parts:

ComponentMeaning
0Pad with leading zeros
8Total width of 8 characters
bBinary format
warning

Always use '08b' with zero-padding. Without it, shorter binary values lose their leading zeros, breaking byte alignment and making reliable decoding impossible:

# Without padding: inconsistent widths
print(format(ord('A'), 'b')) # 1000001 (7 bits)
print(format(ord('\n'), 'b')) # 1010 (4 bits)

# With padding: consistent 8-bit representation
print(format(ord('A'), '08b')) # 01000001
print(format(ord('\n'), '08b')) # 00001010

Output:

1000001
1010
01000001
00001010

Converting Binary Back to a String​

Reversing the process requires splitting the binary string into individual byte representations, parsing each one as a base-2 integer, and converting it back to a character:

binary_input = "01000001 01000010 01000011"

# int(b, 2) parses a base-2 string to an integer
# chr(n) converts the integer to its corresponding character
text = "".join(chr(int(b, 2)) for b in binary_input.split())

print(text)

Output:

ABC

Handling Unicode and Emojis with UTF-8 Encoding​

The ord() approach works for basic characters, but multi-byte Unicode characters like emojis do not fit in 8 bits. For full Unicode support, encode the string to bytes first:

text = "Hi 🐍"

# Encode to UTF-8 bytes to handle all Unicode characters
binary_str = " ".join(format(byte, '08b') for byte in text.encode('utf-8'))

print(binary_str)

Output:

01001000 01101001 00100000 11110000 10011111 10010000 10001101
tip

The snake emoji (🐍) requires 4 bytes in UTF-8 encoding, which is why you see four 8-bit sequences for a single character. The first three groups represent H, i, and a space, each using a single byte.

Decoding UTF-8 Binary Back to a String​

binary_input = "01001000 01101001 00100000 11110000 10011111 10010000 10001101"

# Convert binary strings to bytes, then decode from UTF-8
byte_values = bytes(int(b, 2) for b in binary_input.split())
text = byte_values.decode('utf-8')

print(text)

Output:

Hi 🐍

Why UTF-8 Encoding Matters​

The simple ord() approach and the UTF-8 byte approach produce different results for non-ASCII characters:

text = "Γ©"  # e with acute accent

# ord() gives the Unicode code point (233)
print(f"Code point: {ord(text)}")
print(f"ord() binary: {format(ord(text), '08b')}")

# UTF-8 encodes it as two bytes
utf8_bytes = text.encode('utf-8')
print(f"UTF-8 bytes: {list(utf8_bytes)}")
print(f"UTF-8 binary: {' '.join(format(b, '08b') for b in utf8_bytes)}")

Output:

Code point: 233
ord() binary: 11101001
UTF-8 bytes: [195, 169]
UTF-8 binary: 11000011 10101001

The ord() approach gives you the raw code point in binary, while UTF-8 encoding gives you the actual byte representation used for storage and transmission. For most real-world applications, the UTF-8 approach is what you need.

Reusable Conversion Functions​

For production code, wrap these operations in reusable functions with proper error handling:

def string_to_binary(text: str, encoding: str = 'utf-8') -> str:
"""Convert a string to a space-separated binary representation."""
return ' '.join(format(byte, '08b') for byte in text.encode(encoding))

def binary_to_string(binary_str: str, encoding: str = 'utf-8') -> str:
"""Convert a space-separated binary string back to text."""
byte_values = bytes(int(b, 2) for b in binary_str.split())
return byte_values.decode(encoding)

# Demonstrate round-trip conversion
original = "Hello, δΈ–η•Œ! 🌍"
binary = string_to_binary(original)
restored = binary_to_string(binary)

print(f"Original: {original}")
print(f"Binary: {binary[:56]}...")
print(f"Restored: {restored}")
print(f"Match: {original == restored}")

Output:

Original:  Hello, δΈ–η•Œ! 🌍
Binary: 01001000 01100101 01101100 01101100 01101111 00101100 00...
Restored: Hello, δΈ–η•Œ! 🌍
Match: True

The round-trip test confirms that the original string is perfectly preserved through the conversion and back.

Quick Reference​

GoalCode
String to binary (ASCII)' '.join(format(ord(c), '08b') for c in s)
String to binary (UTF-8)' '.join(format(b, '08b') for b in s.encode())
Binary to string (ASCII)''.join(chr(int(x, 2)) for x in binary.split())
Binary to string (UTF-8)bytes(int(x, 2) for x in binary.split()).decode()

Conclusion​

Converting strings to binary in Python is straightforward once you understand the relationship between characters, their numeric code points, and binary encoding. For ASCII-only text, the ord() and chr() functions with format(n, '08b') work perfectly. For modern applications that handle international text, accented characters, or emojis, always use UTF-8 byte encoding to ensure complete Unicode support and compatibility with how text is actually stored and transmitted in real systems.