How to Use TextDecoder and TextEncoder in JavaScript

Binary data becomes meaningful only when you know how to interpret it. A sequence of bytes like [72, 101, 108, 108, 111] is just numbers until you decode it as text and get "Hello". The reverse is equally important: when you need to send a string over a network connection, write it to a file, or store it in a binary buffer, you must encode it into bytes first.

JavaScript provides two built-in APIs for this: TextDecoder converts binary data (bytes) into strings, and TextEncoder converts strings into binary data. Together, they handle the critical bridge between the binary world of ArrayBuffer and typed arrays and the text world of JavaScript strings.

While the concept sounds simple, text encoding has real-world complexity: different systems use different encodings (UTF-8, Windows-1251, ISO-8859-1, Shift_JIS, and dozens more), multi-byte characters can be split across data chunks, and invalid byte sequences need to be handled gracefully. This guide covers the full API surface of both TextDecoder and TextEncoder, including streaming, error handling, and working with legacy encodings you will encounter in real applications.

TextDecoder: Binary to String

TextDecoder takes binary data and produces a JavaScript string. It supports a wide range of character encodings, making it the go-to tool for reading text from files, network responses, WebSocket messages, or any other binary source.

Basic Usage

// Create a decoder (defaults to UTF-8)
let decoder = new TextDecoder();

// Decode a Uint8Array
let bytes = new Uint8Array([72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33]);
let text = decoder.decode(bytes);
console.log(text); // "Hello, World!"

What You Can Decode

TextDecoder.decode() accepts any of the following as input:

let decoder = new TextDecoder();

// Uint8Array
let fromUint8 = decoder.decode(new Uint8Array([72, 105]));
console.log(fromUint8); // "Hi"

// ArrayBuffer
let buffer = new ArrayBuffer(2);
new Uint8Array(buffer).set([72, 105]);
let fromBuffer = decoder.decode(buffer);
console.log(fromBuffer); // "Hi"

// DataView
let view = new DataView(new ArrayBuffer(2));
view.setUint8(0, 72);
view.setUint8(1, 105);
let fromView = decoder.decode(view);
console.log(fromView); // "Hi"

// Any TypedArray (Int8Array, Uint16Array, etc.)
let fromInt8 = decoder.decode(new Int8Array([72, 105]));
console.log(fromInt8); // "Hi"

// No argument returns an empty string
let empty = decoder.decode();
console.log(empty); // ""
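One detail worth noting: decode() respects a typed-array view's byte offset and length, so you can decode just a slice of a larger buffer without copying it. A small illustration (the packet layout is made up for the example):

```javascript
let decoder = new TextDecoder();

// One buffer holding two text fields back to back
let packet = new Uint8Array([72, 105, 33, 63]); // "Hi!?"

// subarray creates a view, not a copy; decode reads only that view
console.log(decoder.decode(packet.subarray(0, 2))); // "Hi"
console.log(decoder.decode(packet.subarray(2)));    // "!?"
```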

The Constructor: new TextDecoder(encoding, options)

The constructor accepts two parameters:

// Parameter 1: encoding label (string, default: "utf-8")
// Parameter 2: options object

let decoder = new TextDecoder(encoding, {
  fatal: false,    // If true, throw on invalid data. Default: false (replace with �)
  ignoreBOM: false // If true, ignore Byte Order Mark. Default: false
});

Decoding UTF-8

UTF-8 is the default encoding and the most common on the modern web. It uses 1 to 4 bytes per character:

let decoder = new TextDecoder("utf-8"); // or just new TextDecoder()

// ASCII characters: 1 byte each
let ascii = new Uint8Array([74, 97, 118, 97, 83, 99, 114, 105, 112, 116]);
console.log(decoder.decode(ascii)); // "JavaScript"

// Latin accented characters: 2 bytes each
// "Café" = C(43) a(61) f(66) é(C3 A9)
let accented = new Uint8Array([67, 97, 102, 195, 169]);
console.log(decoder.decode(accented)); // "Café"

// Chinese characters: 3 bytes each
// "你好" = ä½ (E4 BD A0) 好(E5 A5 BD)
let chinese = new Uint8Array([0xE4, 0xBD, 0xA0, 0xE5, 0xA5, 0xBD]);
console.log(decoder.decode(chinese)); // "你好"

// Emoji: 4 bytes each
// "👋" = F0 9F 91 8B
let emoji = new Uint8Array([0xF0, 0x9F, 0x91, 0x8B]);
console.log(decoder.decode(emoji)); // "👋"

Handling Invalid Bytes: fatal Option

When the decoder encounters byte sequences that are not valid for the specified encoding, it has two behaviors:

Default (fatal: false): Replace with the Unicode replacement character U+FFFD (�):

let decoder = new TextDecoder("utf-8");             // fatal defaults to false

// 0xFF is not valid in UTF-8
let invalidBytes = new Uint8Array([72, 101, 0xFF, 108, 111]);
let result = decoder.decode(invalidBytes);
console.log(result); // "He�lo"

// An incomplete multi-byte sequence
let incomplete = new Uint8Array([0xE4, 0xBD]); // Only 2 of 3 bytes for a Chinese char
console.log(decoder.decode(incomplete)); // "�" (replaced)

// Mixed valid and invalid
let mixed = new Uint8Array([0xC0, 0xC1, 72, 105]); // 0xC0 and 0xC1 are never valid in UTF-8
console.log(decoder.decode(mixed)); // "��Hi"

Strict mode (fatal: true): Throw a TypeError on any invalid byte:

let strictDecoder = new TextDecoder("utf-8", { fatal: true });

try {
  let invalidBytes = new Uint8Array([72, 101, 0xFF, 108, 111]);
  strictDecoder.decode(invalidBytes);
} catch (error) {
  console.log(error instanceof TypeError); // true
  console.log(error.message); // Varies by browser, e.g., "The encoded data was not valid."
}

// Valid data works normally
let validBytes = new Uint8Array([72, 101, 108, 108, 111]);
console.log(strictDecoder.decode(validBytes)); // "Hello"

Tip: Use fatal: true when data integrity matters and you want to detect corruption early. Use the default fatal: false when you want to be lenient and display as much text as possible, even if some bytes are damaged.

// Strict: for parsing structured data where invalid encoding means corruption
let strictDecoder = new TextDecoder("utf-8", { fatal: true });

// Lenient: for displaying user content that might have encoding issues
let lenientDecoder = new TextDecoder("utf-8"); // fatal: false by default

Byte Order Mark (BOM) and ignoreBOM

A Byte Order Mark (BOM) is a special Unicode character (U+FEFF) sometimes placed at the beginning of a file to indicate its encoding and byte order. In UTF-8, the BOM is the three-byte sequence EF BB BF.

By default, TextDecoder recognizes and strips the BOM from the output. The ignoreBOM option controls this:

// UTF-8 BOM bytes
let withBOM = new Uint8Array([0xEF, 0xBB, 0xBF, 72, 101, 108, 108, 111]);

// Default: BOM is recognized and stripped
let decoder1 = new TextDecoder("utf-8");
console.log(decoder1.decode(withBOM)); // "Hello" (BOM stripped)

// ignoreBOM: true (BOM is kept as a character)
let decoder2 = new TextDecoder("utf-8", { ignoreBOM: true });
let result = decoder2.decode(withBOM);
console.log(result); // "\uFEFFHello" (BOM kept)
console.log(result.length); // 6 (BOM + 5 chars)
console.log(result.charCodeAt(0) === 0xFEFF); // true

In most cases, you want the default behavior (BOM stripped). Use ignoreBOM: true only when you specifically need to preserve or detect the BOM.
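If you do need to detect a BOM, ignoreBOM: true gives a simple way to do it without inspecting raw bytes. A minimal sketch (hasUTF8BOM is a hypothetical helper name):

```javascript
// Decode with the BOM preserved, then check the first code unit.
function hasUTF8BOM(bytes) {
  let keepBOM = new TextDecoder("utf-8", { ignoreBOM: true });
  return keepBOM.decode(bytes).charCodeAt(0) === 0xFEFF;
}

console.log(hasUTF8BOM(new Uint8Array([0xEF, 0xBB, 0xBF, 72, 105]))); // true
console.log(hasUTF8BOM(new Uint8Array([72, 105]))); // false
```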

Streaming Decoding

When data arrives in chunks (from a network stream, file reader, or WebSocket), a multi-byte character might be split across two chunks. Without streaming mode, each chunk is decoded independently, and the split character produces replacement characters or errors.

The stream option in decode() tells the decoder to keep incomplete multi-byte sequences buffered for the next call:

let decoder = new TextDecoder("utf-8");

// "Привет" in UTF-8: each Cyrillic letter is 2 bytes
// П(D0 9F) р(D1 80) и(D0 B8) в(D0 B2) е(D0 B5) т(D1 82)
// Full byte sequence: D0 9F D1 80 D0 B8 D0 B2 D0 B5 D1 82

// Simulate receiving data in chunks that split a character
let chunk1 = new Uint8Array([0xD0, 0x9F, 0xD1, 0x80, 0xD0]); // "Пр" + first byte of "и"
let chunk2 = new Uint8Array([0xB8, 0xD0, 0xB2, 0xD0, 0xB5, 0xD1, 0x82]); // rest of "ивет"

Without streaming (incorrect):

let decoder = new TextDecoder("utf-8");

let part1 = decoder.decode(chunk1);
console.log(part1); // "Пр�" (the trailing 0xD0 is invalid on its own)

let part2 = decoder.decode(chunk2);
console.log(part2); // "�вет" (0xB8 alone is invalid)

console.log(part1 + part2); // "Пр��вет" (corrupted!)

With streaming (correct):

let decoder = new TextDecoder("utf-8");

let part1 = decoder.decode(chunk1, { stream: true });
console.log(part1); // "Пр" (the trailing 0xD0 is buffered, not decoded yet)

let part2 = decoder.decode(chunk2); // No stream option on the last chunk
console.log(part2); // "ивет" (0xD0 from chunk1 + 0xB8 from chunk2 = "и")

console.log(part1 + part2); // "Привет" (correct!)

The rule is simple: pass { stream: true } for every chunk except the last one. On the last chunk, omit the option (or pass { stream: false }) so the decoder flushes any remaining buffered bytes.

Practical Streaming Example: Reading a Fetch Response

async function readStreamAsText(url) {
  let response = await fetch(url);
  let reader = response.body.getReader();
  let decoder = new TextDecoder("utf-8");
  let result = "";

  while (true) {
    let { done, value } = await reader.read();

    if (done) {
      // Flush any remaining buffered bytes
      result += decoder.decode();
      break;
    }

    // Decode chunk with stream: true to handle split characters
    result += decoder.decode(value, { stream: true });
  }

  return result;
}

// Usage
let text = await readStreamAsText("/api/large-text-file");
console.log(text);
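Where TextDecoderStream is available (modern browsers and Node.js 18+), the same chunk handling can be delegated to a decoding transform stream; split multi-byte sequences are buffered across chunks automatically. A sketch with a locally constructed stream (decodeStream is an illustrative helper):

```javascript
// Pipe byte chunks through TextDecoderStream and collect the text.
async function decodeStream(byteChunks) {
  let source = new ReadableStream({
    start(controller) {
      for (let chunk of byteChunks) controller.enqueue(chunk);
      controller.close();
    },
  });

  let reader = source.pipeThrough(new TextDecoderStream()).getReader();
  let text = "";
  while (true) {
    let { done, value } = await reader.read();
    if (done) break;
    text += value;
  }
  return text;
}

// The "Привет" chunks from earlier, split mid-character
decodeStream([
  new Uint8Array([0xD0, 0x9F, 0xD1, 0x80, 0xD0]),
  new Uint8Array([0xB8, 0xD0, 0xB2, 0xD0, 0xB5, 0xD1, 0x82]),
]).then((text) => console.log(text)); // "Привет"
```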

Reusing a Decoder

A TextDecoder instance can be reused for multiple independent decoding operations. Each call to decode() without { stream: true } resets any internal state:

let decoder = new TextDecoder("utf-8");

// Completely independent operations
let text1 = decoder.decode(new Uint8Array([72, 101, 108, 108, 111]));
console.log(text1); // "Hello"

let text2 = decoder.decode(new Uint8Array([87, 111, 114, 108, 100]));
console.log(text2); // "World"

However, when using streaming mode, the decoder maintains internal state between calls. If you start a streaming sequence and want to abort it, call decode() with no arguments to flush and reset:

let decoder = new TextDecoder("utf-8");

// Start streaming
decoder.decode(new Uint8Array([0xD0]), { stream: true });

// Oops, we want to abort this stream and start fresh
decoder.decode(); // Flush/reset

// Now we can start a new independent decode
let text = decoder.decode(new Uint8Array([72, 105]));
console.log(text); // "Hi"

TextEncoder: String to Binary

TextEncoder converts JavaScript strings into binary data. Unlike TextDecoder, which supports many encodings, TextEncoder always outputs UTF-8. This is by design: UTF-8 is the universal encoding for the web, and having a single consistent output encoding simplifies the API.

Basic Usage

let encoder = new TextEncoder();

let bytes = encoder.encode("Hello");
console.log(bytes); // Uint8Array(5) [72, 101, 108, 108, 111]
console.log(bytes.length); // 5
console.log(bytes instanceof Uint8Array); // true

encode(): Returns a New Uint8Array

The encode method always returns a new Uint8Array containing the UTF-8 encoded bytes:

let encoder = new TextEncoder();

// ASCII: 1 byte per character
let ascii = encoder.encode("Hello");
console.log(ascii); // Uint8Array [72, 101, 108, 108, 111]
console.log(ascii.length); // 5

// Accented characters: 2 bytes in UTF-8
let french = encoder.encode("Café");
console.log(french); // Uint8Array [67, 97, 102, 195, 169]
console.log(french.length); // 5 (not 4, é takes 2 bytes)

// CJK characters: 3 bytes in UTF-8
let japanese = encoder.encode("日本語");
console.log(japanese.length); // 9 (3 characters × 3 bytes each)

// Emoji: 4 bytes in UTF-8
let emoji = encoder.encode("Hello 👋🌍");
console.log(emoji.length); // 14 (5 + 1 space + 4 + 4)

// Empty string
let empty = encoder.encode("");
console.log(empty.length); // 0

Understanding UTF-8 Byte Lengths

Since TextEncoder always produces UTF-8, it is important to understand how many bytes different characters require:

let encoder = new TextEncoder();

function showByteLength(str) {
  let bytes = encoder.encode(str);
  console.log(`"${str}" → ${bytes.length} bytes (${str.length} JS chars)`);
  return bytes;
}

showByteLength("A"); // "A" → 1 byte (1 JS char)
showByteLength("ñ"); // "ñ" → 2 bytes (1 JS char)
showByteLength("€"); // "€" → 3 bytes (1 JS char)
showByteLength("你"); // "你" → 3 bytes (1 JS char)
showByteLength("👋"); // "👋" → 4 bytes (2 JS chars, surrogate pair)
showByteLength("🇺🇸"); // "🇺🇸" → 8 bytes (4 JS chars, two regional indicators)

// Practical: a tweet-length string
let tweet = "JavaScript is amazing! 🚀✨";
showByteLength(tweet);
// "JavaScript is amazing! 🚀✨" → 30 bytes (25 JS chars)

Code Point Range     | UTF-8 Bytes | Examples
U+0000 to U+007F     | 1 byte      | ASCII: A, z, 0, !, space
U+0080 to U+07FF     | 2 bytes     | ñ, ü, é, Σ, Д
U+0800 to U+FFFF     | 3 bytes     | 你, €, ₹, ∞
U+10000 to U+10FFFF  | 4 bytes     | 👋, 🎉, 🌍, 𝄞

encodeInto(): Encode Into an Existing Buffer

While encode() always creates a new Uint8Array, encodeInto() writes the encoded bytes into an existing Uint8Array. This avoids memory allocation and is more efficient for repeated encoding operations or when you are building binary data incrementally.

let encoder = new TextEncoder();
let buffer = new Uint8Array(20);

let result = encoder.encodeInto("Hello", buffer);
console.log(result);
// { read: 5, written: 5 }
// read: number of UTF-16 code units consumed from the source string
// written: number of UTF-8 bytes written to the buffer

console.log(buffer);
// Uint8Array [72, 101, 108, 108, 111, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

What happens when the buffer is too small:

let encoder = new TextEncoder();
let smallBuffer = new Uint8Array(3);

// "Hello" needs 5 bytes, but buffer only has 3
let result = encoder.encodeInto("Hello", smallBuffer);
console.log(result);
// { read: 3, written: 3 }
// Only "Hel" was encoded, the rest did not fit

console.log(smallBuffer); // Uint8Array [72, 101, 108]

// Multi-byte characters are never partially written
let tinyBuffer = new Uint8Array(1);
let result2 = encoder.encodeInto("€", tinyBuffer); // € needs 3 bytes
console.log(result2);
// { read: 0, written: 0 }
// Nothing was written, the character doesn't fit even partially

The key guarantee: encodeInto never writes a partial multi-byte character. If the remaining buffer space cannot fit the next character's complete UTF-8 sequence, it stops and reports how much it read and wrote.
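The read count is what makes resuming possible: encode as much as fits, hand off the filled buffer, then continue from the code unit where the encoder stopped. A minimal sketch (encodeInChunks and the 4-byte chunk size are illustrative; the buffer must be at least 4 bytes so every character can fit):

```javascript
// Encode a long string through a small fixed buffer, resuming from
// result.read after each pass. Chunks never split a character's bytes.
function encodeInChunks(str, chunkSize = 4) {
  let encoder = new TextEncoder();
  let buffer = new Uint8Array(chunkSize);
  let chunks = [];
  let pos = 0;

  while (pos < str.length) {
    let { read, written } = encoder.encodeInto(str.slice(pos), buffer);
    chunks.push(buffer.slice(0, written)); // copy out the bytes that fit
    pos += read; // skip the code units already consumed
  }
  return chunks;
}

let chunks = encodeInChunks("Héllo 👋");
console.log(chunks.map((c) => c.length)); // [4, 3, 4] — 11 UTF-8 bytes total
```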

Writing Multiple Strings Into One Buffer

let encoder = new TextEncoder();
let buffer = new Uint8Array(100);
let offset = 0;

function writeString(str) {
  let result = encoder.encodeInto(str, buffer.subarray(offset));
  offset += result.written;
  return result;
}

writeString("Name: ");
writeString("Alice");
writeString("\n");
writeString("Age: ");
writeString("30");

// Read back the combined result
let decoder = new TextDecoder();
let text = decoder.decode(buffer.subarray(0, offset));
console.log(text);
// "Name: Alice\nAge: 30"
console.log(`Total bytes written: ${offset}`); // 19

encode() vs. encodeInto() Comparison

Feature            | encode(string)                   | encodeInto(string, uint8Array)
Returns            | New Uint8Array                   | { read, written }
Memory allocation  | Allocates a new buffer every call| Writes into an existing buffer
Buffer too small?  | N/A (always fits)                | Writes as much as fits, reports remainder
Performance        | Slight overhead from allocation  | Better for frequent/repeated encoding
Simplicity         | Simpler to use                   | More code but more control

let encoder = new TextEncoder();

// Use encode() for simplicity when you just need the bytes
let bytes = encoder.encode("Quick and easy");

// Use encodeInto() for performance in hot paths
let reusableBuffer = new Uint8Array(1024);
function processMessage(msg) {
  let { written } = encoder.encodeInto(msg, reusableBuffer);
  sendBinaryData(reusableBuffer.subarray(0, written));
}

Calculating Buffer Size Before Encoding

If you need to know the exact byte size before encoding (to allocate the right buffer), you can calculate it:

function utf8ByteLength(str) {
  let encoder = new TextEncoder();
  return encoder.encode(str).byteLength;
}

// Or calculate without encoding (faster for large strings):
function utf8ByteLengthFast(str) {
  let bytes = 0;
  for (let i = 0; i < str.length; i++) {
    let code = str.charCodeAt(i);
    if (code <= 0x7F) {
      bytes += 1;
    } else if (code <= 0x7FF) {
      bytes += 2;
    } else if (code >= 0xD800 && code <= 0xDBFF) {
      // Surrogate pair (emoji and other characters above U+FFFF)
      bytes += 4;
      i++; // Skip the low surrogate
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

console.log(utf8ByteLengthFast("Hello")); // 5
console.log(utf8ByteLengthFast("Café")); // 5
console.log(utf8ByteLengthFast("你好")); // 6
console.log(utf8ByteLengthFast("👋")); // 4

Handling Different Encodings

The modern web is predominantly UTF-8, but you will still encounter other encodings when working with legacy systems, files from different regions, older databases, or binary protocols. TextDecoder supports a wide range of encodings to handle these situations.

Supported Encodings

TextDecoder supports all encodings listed in the Encoding Standard. Here are the most commonly encountered ones:

Unicode encodings:

Label       | Description                  | Common Use
"utf-8"     | Variable length, 1-4 bytes   | Web standard, modern files
"utf-16le"  | 2 or 4 bytes, little-endian  | Windows internals, some APIs
"utf-16be"  | 2 or 4 bytes, big-endian     | Java, some network protocols

Legacy single-byte encodings (Western):

Label                        | Description           | Common Use
"windows-1252"               | Western European      | Old Windows documents
"iso-8859-1" (aka "latin1")  | Western European      | HTTP headers, old HTML
"iso-8859-15"                | Western European + €  | Updated Latin-1
"ascii"                      | 7-bit ASCII           | Plain text, protocols

Legacy single-byte encodings (Other regions):

Label           | Description               | Common Use
"windows-1251"  | Cyrillic (Russian, etc.)  | Old Russian Windows files
"windows-1256"  | Arabic                    | Old Arabic Windows files
"iso-8859-2"    | Central European          | Old Czech, Polish files
"iso-8859-7"    | Greek                     | Old Greek files
"koi8-r"        | Russian                   | Unix/Linux Russian text

Legacy multi-byte encodings (East Asian):

Label                         | Description            | Common Use
"shift_jis"                   | Japanese               | Old Japanese web pages, games
"euc-jp"                      | Japanese               | Unix/Linux Japanese text
"iso-2022-jp"                 | Japanese               | Japanese email
"gb2312" / "gbk" / "gb18030"  | Chinese (Simplified)   | Chinese websites and files
"big5"                        | Chinese (Traditional)  | Taiwanese websites and files
"euc-kr"                      | Korean                 | Korean websites and files

Specifying an Encoding

// Windows-1251 (Cyrillic)
let cyrillicDecoder = new TextDecoder("windows-1251");
let cyrillicBytes = new Uint8Array([207, 240, 232, 226, 229, 242]);
console.log(cyrillicDecoder.decode(cyrillicBytes)); // "Привет"

// Shift_JIS (Japanese)
let japaneseDecoder = new TextDecoder("shift_jis");
let japaneseBytes = new Uint8Array([0x82, 0xB1, 0x82, 0xF1, 0x82, 0xC9, 0x82, 0xBF, 0x82, 0xCD]);
console.log(japaneseDecoder.decode(japaneseBytes)); // "こんにちは"

// ISO-8859-1 (Latin-1)
let latinDecoder = new TextDecoder("iso-8859-1");
let latinBytes = new Uint8Array([72, 101, 108, 108, 111, 44, 32, 87, 246, 114, 108, 100, 33]);
console.log(latinDecoder.decode(latinBytes)); // "Hello, Wörld!"

// GBK (Chinese Simplified)
let chineseDecoder = new TextDecoder("gbk");
let chineseBytes = new Uint8Array([0xC4, 0xE3, 0xBA, 0xC3]);
console.log(chineseDecoder.decode(chineseBytes)); // "你好"

Encoding Labels Are Case-Insensitive

The encoding label accepts various aliases and is case-insensitive:

// These all create a UTF-8 decoder
new TextDecoder("utf-8");
new TextDecoder("UTF-8");
new TextDecoder("utf8");

// These all create a Windows-1251 decoder
new TextDecoder("windows-1251");
new TextDecoder("cp1251");
new TextDecoder("x-cp1251");

// These are all accepted as ISO-8859-1 labels (note: per the Encoding
// Standard, browsers actually decode them as windows-1252, a superset)
new TextDecoder("iso-8859-1");
new TextDecoder("latin1");
new TextDecoder("iso8859-1");

Invalid Encoding Label

If you pass an unsupported encoding label, the constructor throws a RangeError:

try {
  let decoder = new TextDecoder("invalid-encoding");
} catch (error) {
  console.log(error instanceof RangeError); // true
  console.log(error.message); // Varies by engine, e.g., "'invalid-encoding' is not a supported encoding."
}

TextEncoder Only Supports UTF-8

Unlike TextDecoder, TextEncoder does not accept an encoding parameter. It always encodes to UTF-8:

let encoder = new TextEncoder();
console.log(encoder.encoding); // "utf-8"

// There is no way to encode to Windows-1251, Shift_JIS, etc. with TextEncoder

This is intentional. The Encoding Standard specifies that encoding (string to bytes) should only produce UTF-8 to encourage the web to standardize on UTF-8. For legacy encoding output, you would need a library.

Info: If you need to encode text into a non-UTF-8 encoding (which is rare on the modern web), you can use libraries like iconv-lite (Node.js) or implement manual encoding for simple single-byte encodings:

// Manual encoding for ASCII-compatible single-byte encodings.
// Minimal Windows-1251 sketch: handles ASCII plus Cyrillic А-я, Ё, ё only;
// for full coverage, prefer a library such as iconv-lite.
function encodeWindows1251(str) {
  let bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    let code = str.charCodeAt(i);
    if (code < 0x80) bytes[i] = code; // ASCII maps directly
    else if (code >= 0x410 && code <= 0x44F) bytes[i] = code - 0x410 + 0xC0; // А-я
    else if (code === 0x401) bytes[i] = 0xA8; // Ё
    else if (code === 0x451) bytes[i] = 0xB8; // ё
    else bytes[i] = 0x3F; // "?" for anything unmapped
  }
  return bytes;
}

In most cases, if you are producing binary output, UTF-8 is the correct choice.

Detecting the Encoding of Unknown Data

The browser does not provide a built-in encoding detection API. When you receive binary data without knowing its encoding, you have several options:

Check for a BOM (Byte Order Mark):

function detectBOM(bytes) {
  if (bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return "utf-8";
  }
  if (bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return "utf-16le";
  }
  if (bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return "utf-16be";
  }
  return null; // No BOM detected
}

let data = new Uint8Array([0xEF, 0xBB, 0xBF, 72, 101, 108, 108, 111]);
let encoding = detectBOM(data);
console.log(encoding); // "utf-8"

if (encoding) {
let decoder = new TextDecoder(encoding);
console.log(decoder.decode(data)); // "Hello" (BOM is stripped by default)
}

Try UTF-8 with fatal: true, fall back to another encoding:

function decodeWithFallback(bytes, fallbackEncoding = "windows-1252") {
  try {
    let decoder = new TextDecoder("utf-8", { fatal: true });
    return decoder.decode(bytes);
  } catch (e) {
    // Not valid UTF-8: try the fallback encoding
    console.warn("Not valid UTF-8, falling back to", fallbackEncoding);
    let decoder = new TextDecoder(fallbackEncoding);
    return decoder.decode(bytes);
  }
}

// Valid UTF-8
let utf8Data = new Uint8Array([67, 97, 102, 195, 169]); // "Café" in UTF-8
console.log(decodeWithFallback(utf8Data)); // "Café"

// Windows-1252 encoded data (not valid UTF-8)
let win1252Data = new Uint8Array([67, 97, 102, 233]); // "Café" in Windows-1252
console.log(decodeWithFallback(win1252Data)); // "Café" (via fallback)

Check the HTTP Content-Type header:

async function fetchWithEncoding(url) {
  let response = await fetch(url);

  // Try to get the encoding from the Content-Type header
  let contentType = response.headers.get("Content-Type") || "";
  let charsetMatch = contentType.match(/charset=([^\s;]+)/i);
  let encoding = charsetMatch ? charsetMatch[1] : "utf-8";

  console.log(`Detected encoding: ${encoding}`);

  let buffer = await response.arrayBuffer();
  let decoder = new TextDecoder(encoding);
  return decoder.decode(buffer);
}

Check the HTML <meta> tag (for HTML files):

function detectHTMLEncoding(bytes) {
  // Read the first 1024 bytes as ASCII to find a meta charset tag
  let asciiDecoder = new TextDecoder("ascii");
  let head = asciiDecoder.decode(bytes.slice(0, 1024));

  // Look for <meta charset="...">
  let match = head.match(/<meta\s+charset=["']?([^"'\s>]+)/i);
  if (match) return match[1];

  // Look for <meta http-equiv="Content-Type" content="...; charset=...">
  match = head.match(/charset=([^"'\s;>]+)/i);
  if (match) return match[1];

  return "utf-8"; // Default
}

Practical Example: Converting Between Encodings

To convert text from one encoding to another, decode from the source encoding and re-encode to UTF-8 (since TextEncoder only produces UTF-8):

function convertToUTF8(bytes, sourceEncoding) {
  // Decode from the source encoding
  let decoder = new TextDecoder(sourceEncoding);
  let text = decoder.decode(bytes);

  // Encode to UTF-8
  let encoder = new TextEncoder();
  return encoder.encode(text);
}

// Convert Windows-1251 Russian text to UTF-8
let win1251Bytes = new Uint8Array([207, 240, 232, 226, 229, 242]); // "Привет" in Windows-1251
let utf8Bytes = convertToUTF8(win1251Bytes, "windows-1251");
console.log(utf8Bytes);
// Uint8Array [208, 159, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130]
// "Привет" in UTF-8 (12 bytes vs 6 in Windows-1251)

// Verify
let decoder = new TextDecoder("utf-8");
console.log(decoder.decode(utf8Bytes)); // "Привет"

Practical Examples

Building a Text File Downloader

function downloadTextFile(text, filename) {
  let encoder = new TextEncoder();
  let bytes = encoder.encode(text);

  // Optionally add a UTF-8 BOM for Excel compatibility
  let bom = new Uint8Array([0xEF, 0xBB, 0xBF]);
  let withBOM = new Uint8Array(bom.length + bytes.length);
  withBOM.set(bom);
  withBOM.set(bytes, bom.length);

  let blob = new Blob([withBOM], { type: "text/plain;charset=utf-8" });
  let url = URL.createObjectURL(blob);

  let a = document.createElement("a");
  a.href = url;
  a.download = filename;
  a.click();

  URL.revokeObjectURL(url);
}

// Usage
downloadTextFile("Hello, World!\nLine 2", "output.txt");

Reading a Text File with Encoding Detection

async function readTextFile(file) {
  let buffer = await file.arrayBuffer();
  let bytes = new Uint8Array(buffer);

  // Step 1: Check for BOM
  let encoding = detectBOM(bytes);
  if (encoding) {
    let decoder = new TextDecoder(encoding);
    return { text: decoder.decode(bytes), encoding, method: "BOM" };
  }

  // Step 2: Try UTF-8 strict
  try {
    let decoder = new TextDecoder("utf-8", { fatal: true });
    let text = decoder.decode(bytes);
    return { text, encoding: "utf-8", method: "valid UTF-8" };
  } catch (e) {
    // Not valid UTF-8
  }

  // Step 3: Fall back to Windows-1252 (common for Western files)
  let decoder = new TextDecoder("windows-1252");
  return { text: decoder.decode(bytes), encoding: "windows-1252", method: "fallback" };
}

function detectBOM(bytes) {
  if (bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) return "utf-8";
  if (bytes[0] === 0xFF && bytes[1] === 0xFE) return "utf-16le";
  if (bytes[0] === 0xFE && bytes[1] === 0xFF) return "utf-16be";
  return null;
}

// Usage with a file input
document.getElementById("file-input").addEventListener("change", async (event) => {
  let file = event.target.files[0];
  if (!file) return;

  let result = await readTextFile(file);
  console.log(`Encoding: ${result.encoding} (${result.method})`);
  console.log(`Content: ${result.text.substring(0, 200)}...`);
});

Streaming Text Processing

async function processLargeTextStream(url, processLine) {
  let response = await fetch(url);
  let reader = response.body.getReader();
  let decoder = new TextDecoder("utf-8");
  let buffer = "";

  while (true) {
    let { done, value } = await reader.read();

    if (done) {
      // Flush any bytes still buffered in the decoder,
      // then process any remaining text in the buffer
      buffer += decoder.decode();
      if (buffer.length > 0) {
        processLine(buffer);
      }
      break;
    }

    // Decode the chunk (streaming mode to handle split characters)
    buffer += decoder.decode(value, { stream: true });

    // Process complete lines
    let lines = buffer.split("\n");
    // Keep the last incomplete line in the buffer
    buffer = lines.pop();

    for (let line of lines) {
      processLine(line);
    }
  }
}

// Usage: process a large CSV file line by line
await processLargeTextStream("/data/large-file.csv", (line) => {
  let fields = line.split(",");
  console.log(`Name: ${fields[0]}, Value: ${fields[1]}`);
});

WebSocket Binary Messages

let encoder = new TextEncoder();
let decoder = new TextDecoder();

let ws = new WebSocket("wss://example.com/socket");
ws.binaryType = "arraybuffer";

// Sending a text message as binary
function sendMessage(type, payload) {
  let payloadBytes = encoder.encode(JSON.stringify(payload));

  // Create a message: 1 byte type + payload
  let message = new Uint8Array(1 + payloadBytes.length);
  message[0] = type;
  message.set(payloadBytes, 1);

  ws.send(message.buffer);
}

// Receiving and decoding a binary message
ws.addEventListener("message", (event) => {
  let data = new Uint8Array(event.data);
  let type = data[0];
  let payloadBytes = data.subarray(1);
  let payload = JSON.parse(decoder.decode(payloadBytes));

  console.log(`Type: ${type}, Payload:`, payload);
});

// Usage
sendMessage(1, { action: "subscribe", channel: "updates" });

Comparing Strings at the Byte Level

let encoder = new TextEncoder();

function bytesEqual(str1, str2) {
  let bytes1 = encoder.encode(str1);
  let bytes2 = encoder.encode(str2);

  if (bytes1.length !== bytes2.length) return false;

  for (let i = 0; i < bytes1.length; i++) {
    if (bytes1[i] !== bytes2[i]) return false;
  }
  return true;
}

// Visually identical strings can differ at the byte level
// (due to Unicode normalization: é can be one character or e + combining accent)
let composed = "Caf\u00E9"; // é as a single code point
let decomposed = "Cafe\u0301"; // e + combining acute accent

console.log(composed === decomposed); // false (different JS strings)
console.log(bytesEqual(composed, decomposed)); // false (different bytes too)

// After normalization, they match
let normalized1 = composed.normalize("NFC");
let normalized2 = decomposed.normalize("NFC");
console.log(bytesEqual(normalized1, normalized2)); // true

Summary

TextDecoder and TextEncoder provide the bridge between binary data and text strings in JavaScript.

TextDecoder (Binary to String):

  • Constructor: new TextDecoder(encoding?, { fatal?, ignoreBOM? })
  • Supports many encodings: UTF-8, UTF-16, Windows-1251, Shift_JIS, GBK, ISO-8859-1, and dozens more.
  • decode(buffer, { stream? }) converts binary data to a string.
  • Default behavior on invalid bytes: replace with � (U+FFFD). Set fatal: true to throw instead.
  • Use { stream: true } when decoding data in chunks to handle split multi-byte characters.
  • Pass { stream: true } on every chunk except the last.
  • Encoding labels are case-insensitive and accept common aliases.

TextEncoder (String to Binary):

  • Always encodes to UTF-8 (no other encoding is supported).
  • encode(string) returns a new Uint8Array containing the UTF-8 bytes.
  • encodeInto(string, uint8Array) writes into an existing buffer and returns { read, written }.
  • encodeInto never writes partial multi-byte characters.
  • Use encodeInto for better performance when encoding repeatedly or into pre-allocated buffers.

Handling Different Encodings:

  • Check for BOM bytes to detect UTF-8, UTF-16LE, or UTF-16BE.
  • Try TextDecoder("utf-8", { fatal: true }) first, then fall back to legacy encodings.
  • Check HTTP Content-Type headers or HTML <meta charset> tags for declared encodings.
  • To convert between encodings: decode from the source encoding, then re-encode to UTF-8.
  • TextEncoder produces only UTF-8. For other output encodings, use a library.

Common UTF-8 byte sizes:

  • ASCII (A-Z, 0-9, punctuation): 1 byte per character
  • Latin accents, Cyrillic, Greek, Arabic, Hebrew: 2 bytes per character
  • CJK (Chinese, Japanese, Korean), most symbols: 3 bytes per character
  • Emoji, rare scripts, musical symbols: 4 bytes per character