How to Work with Strings in JavaScript

Strings are one of the most frequently used data types in JavaScript. Every piece of text in your application, from user names and error messages to HTML content and API responses, is a string. JavaScript provides an extensive set of built-in methods for searching, extracting, transforming, and comparing strings, making it one of the most text-capable languages in web development.

However, strings in JavaScript have characteristics that set them apart from many other languages. They are immutable, meaning they can never be changed in place. They are stored internally as UTF-16, which affects how they handle emoji and certain international characters. And the sheer number of available methods, some overlapping, some deprecated, can be overwhelming without a clear guide.

This article covers everything you need to know about strings: from the three types of quotes and template literals, through every major string method with practical examples, to the internal Unicode representation that affects how strings behave with modern characters.

String Immutability

The most fundamental characteristic of strings in JavaScript is that they are immutable. Once a string is created, it cannot be changed. Every operation that appears to modify a string actually creates and returns a new string, leaving the original untouched.

let greeting = "Hello";

greeting[0] = "J"; // Attempting to change the first character
console.log(greeting); // "Hello" (unchanged! No error, just silently ignored.)

let upper = greeting.toUpperCase();
console.log(greeting); // "Hello" (still the original)
console.log(upper); // "HELLO" (a brand-new string)

In strict mode, attempting to assign to a string index throws a TypeError:

"use strict";
let str = "Hello";
str[0] = "J"; // TypeError: Cannot assign to read only property '0' of string 'Hello'

This immutability means that when you "modify" a string, you are always creating a new one. Building strings through repeated concatenation in a loop can be inefficient for very large strings because each concatenation creates a new string object.

// Each += creates a new string
let result = "";
for (let i = 0; i < 5; i++) {
  result += i + " "; // New string created each iteration
}
console.log(result); // "0 1 2 3 4 "

For building complex strings, using an array and join() or template literals is often cleaner and can be more efficient.
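
The array-and-join approach mentioned above can be sketched roughly as follows. The helper name `buildList` and the sample data are made up for illustration; the idea is simply to collect the pieces first and concatenate once at the end:

```javascript
// Hypothetical helper: builds a comma-separated summary from an array of items.
function buildList(items) {
  const parts = [];
  for (const item of items) {
    // Collect each piece instead of growing one string with +=
    parts.push(`${item.name} (${item.qty})`);
  }
  // A single join produces the final string
  return parts.join(", ");
}

console.log(buildList([
  { name: "apple", qty: 3 },
  { name: "pear", qty: 1 },
]));
// "apple (3), pear (1)"
```

Modern engines tend to optimize += well, so the benefit here is often readability rather than raw speed; measure before assuming a difference.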

Quotes: Single, Double, Backticks

JavaScript supports three types of quotes for creating strings. Single and double quotes are functionally identical, while backticks (template literals) provide additional features.

Single and Double Quotes

let single = 'Hello, World!';
let double = "Hello, World!";

console.log(single === double); // true (identical strings)

There is no functional difference between single and double quotes. The choice is purely stylistic. Most modern style guides (Airbnb, StandardJS) prefer single quotes for consistency, reserving double quotes for strings that contain apostrophes:

let message = "It's a beautiful day";    // Double quotes avoid escaping
let alt = 'It\'s a beautiful day'; // Single quotes require escaping
let html = '<div class="container">'; // Single quotes for HTML strings

Backticks (Template Literals)

Backticks provide three features that single and double quotes do not:

// 1. String interpolation
let name = "Alice";
let greeting = `Hello, ${name}!`;
console.log(greeting); // "Hello, Alice!"

// 2. Multi-line strings
let poem = `Roses are red,
Violets are blue,
JavaScript is great,
And so are you.`;
console.log(poem);

// 3. Embedded expressions (any expression, not just a variable name)
let a = 5, b = 3;
console.log(`${a} + ${b} = ${a + b}`); // "5 + 3 = 8"

Single and double quotes cannot span multiple lines without escape sequences and do not support interpolation.

Template Literals: Interpolation, Multi-Line, Tagged Templates

Template literals (backtick strings) deserve a deeper look because they are one of the most powerful string features in modern JavaScript.

String Interpolation

Any JavaScript expression can be placed inside ${}:

let user = { name: "Alice", age: 30 };

console.log(`Name: ${user.name}`); // "Name: Alice"
console.log(`Age next year: ${user.age + 1}`); // "Age next year: 31"
console.log(`Is adult: ${user.age >= 18}`); // "Is adult: true"
console.log(`Uppercase: ${user.name.toUpperCase()}`); // "Uppercase: ALICE"

// Function calls inside interpolation
function greet(name) {
  return `Hello, ${name}!`;
}
console.log(`Message: ${greet("Bob")}`); // "Message: Hello, Bob!"

// Ternary expressions
let score = 85;
console.log(`Result: ${score >= 60 ? "Pass" : "Fail"}`); // "Result: Pass"

Multi-Line Strings

Template literals preserve line breaks exactly as written:

let html = `
<div class="card">
<h2>${user.name}</h2>
<p>Age: ${user.age}</p>
</div>
`;

console.log(html);
//
// <div class="card">
// <h2>Alice</h2>
// <p>Age: 30</p>
// </div>
//

Note that the leading and trailing newlines are included in the string. If you do not want them, start the content on the same line as the opening backtick.
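
As a small sketch of the two usual ways to avoid those extra newlines (the `user` object is sample data for illustration only), you can either start the content right after the opening backtick or keep the readable layout and call trim() on the result:

```javascript
// Sample data for illustration only
const user = { name: "Alice", age: 30 };

// Option 1: start the content right after the opening backtick
const tight = `<div class="card">
  <h2>${user.name}</h2>
</div>`;

// Option 2: keep the readable layout and trim the edges afterwards
const trimmed = `
<div class="card">
  <h2>${user.name}</h2>
</div>
`.trim();

console.log(tight === trimmed); // true
console.log(tight.startsWith("<div")); // true
```

Note that trim() only removes whitespace at the very start and end; indentation inside the template is kept as written.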

Tagged Templates

Tagged templates are an advanced feature where a function processes the template literal. The function receives the string parts and the interpolated values separately:

function highlight(strings, ...values) {
  let result = "";
  strings.forEach((str, i) => {
    result += str;
    if (i < values.length) {
      result += `**${values[i]}**`;
    }
  });
  return result;
}

let name = "Alice";
let role = "developer";

let message = highlight`Welcome ${name}, you are a ${role}!`;
console.log(message); // "Welcome **Alice**, you are a **developer**!"

The tag function receives:

  • strings: an array of string segments between interpolations (["Welcome ", ", you are a ", "!"])
  • ...values: the interpolated values (["Alice", "developer"])
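
To see these two arguments directly, a tag can simply hand back what it received. The `inspect` function below is a hypothetical helper written for this illustration, not a built-in:

```javascript
// Hypothetical tag that reports its inputs instead of building a string
function inspect(strings, ...values) {
  return { strings: [...strings], values };
}

const name = "Alice";
const role = "developer";
const parts = inspect`Welcome ${name}, you are a ${role}!`;

console.log(parts.strings); // ["Welcome ", ", you are a ", "!"]
console.log(parts.values);  // ["Alice", "developer"]

// There is always exactly one more string segment than there are values
console.log(parts.strings.length === parts.values.length + 1); // true
```

That length relationship is why the tag functions in this section can safely append `values[i]` after each `strings[i]` and guard only the last segment.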

Practical Tagged Template: HTML Escaping

function safeHTML(strings, ...values) {
  let result = "";
  strings.forEach((str, i) => {
    result += str;
    if (i < values.length) {
      // Escape & first so the entities added below are not double-escaped
      result += String(values[i])
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
    }
  });
  return result;
}

let userInput = '<script>alert("XSS")</script>';

let html = safeHTML`<div>${userInput}</div>`;
console.log(html);
// "<div>&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;</div>"

This pattern is used in real-world libraries to prevent XSS attacks when building HTML from user data.

Special Characters and Escape Sequences

Strings can contain special characters represented by escape sequences starting with a backslash (\):

| Escape Sequence | Character | Description |
| --- | --- | --- |
| \n | Newline | Line feed (LF) |
| \r | Carriage return | CR (Windows line endings use \r\n) |
| \t | Tab | Horizontal tab |
| \\ | Backslash | Literal backslash |
| \' | Single quote | Inside single-quoted strings |
| \" | Double quote | Inside double-quoted strings |
| \` | Backtick | Inside template literals |
| \0 | Null character | The null byte |
| \uXXXX | Unicode (BMP) | 4-digit hex code point |
| \u{XXXXX} | Unicode (full) | Any code point (1-6 hex digits) |
| \xXX | Latin-1 | 2-digit hex code |

// Common escape sequences
console.log("Line 1\nLine 2");
// Line 1
// Line 2

console.log("Column1\tColumn2\tColumn3");
// Column1 Column2 Column3

console.log("She said \"hello\"");
// She said "hello"

console.log("Path: C:\\Users\\Alice");
// Path: C:\Users\Alice

// Unicode escapes
console.log("\u00A9"); // © (copyright symbol)
console.log("\u{1F600}"); // 😀 (grinning face emoji)
console.log("\u{2764}"); // ❤ (heart)

Multi-Line with Escape Sequences vs. Template Literals

// Old way: escape sequences
let oldMultiLine = "Line 1\n" +
"Line 2\n" +
"Line 3";

// Modern way: template literals
let newMultiLine = `Line 1
Line 2
Line 3`;

console.log(oldMultiLine === newMultiLine); // true

Template literals are clearly more readable for multi-line strings.

String Length and Accessing Characters

The length Property

length returns the number of UTF-16 code units in the string, not the number of visible characters. For most text, these are the same, but they differ for emoji and certain symbols:

console.log("Hello".length);     // 5
console.log("".length); // 0
console.log(" ".length); // 1 (space is a character)
console.log("\n".length); // 1 (newline is one character)

// Emoji surprise
console.log("😀".length); // 2, not 1 (uses two UTF-16 code units)
console.log("👨‍👩‍👧‍👦".length); // 11 (a family emoji is many code units!)

length is a property, not a method. Do not use parentheses:

let str = "Hello";
console.log(str.length); // 5 ✅
// console.log(str.length()); // TypeError: str.length is not a function ❌

Accessing Individual Characters

Three ways to access characters by index:

let str = "Hello";

// Bracket notation (modern, preferred)
console.log(str[0]); // "H"
console.log(str[1]); // "e"
console.log(str[4]); // "o"
console.log(str[10]); // undefined (out of bounds)

// charAt() method
console.log(str.charAt(0)); // "H"
console.log(str.charAt(10)); // "" (empty string for out-of-bounds)

// at() method (ES2022) supports negative indices!
console.log(str.at(0)); // "H"
console.log(str.at(-1)); // "o" (last character)
console.log(str.at(-2)); // "l" (second to last)
console.log(str.at(10)); // undefined (out of bounds)

Comparison: [], charAt(), at()

| Feature | str[i] | str.charAt(i) | str.at(i) |
| --- | --- | --- | --- |
| Out-of-bounds | undefined | "" (empty string) | undefined |
| Negative index | undefined | "" | Counts from end |
| Recommended | Yes | Legacy | Yes (especially for negative) |

let word = "JavaScript";

// Getting the last character
console.log(word[word.length - 1]); // "t" (verbose)
console.log(word.at(-1)); // "t" (clean and modern)

Tip

Use str.at(-1) to access the last character of a string. It is far more readable than str[str.length - 1].

Iterating Over Strings

for...of Loop

The for...of loop iterates over a string one Unicode code point at a time, so characters stored as surrogate pairs (such as most emoji) arrive in one piece:

let word = "Hello";

for (let char of word) {
  console.log(char);
}
// H
// e
// l
// l
// o

for...of with Emojiโ€‹

let text = "Hi! ๐Ÿ˜€๐Ÿ‘";

// for...of correctly iterates character by character
for (let char of text) {
console.log(char);
}
// H
// i
// !
//
// ๐Ÿ˜€ (one iteration, correctly grouped)
// ๐Ÿ‘ (one iteration, correctly grouped)

// Compare with traditional for loop, breaks emoji!
for (let i = 0; i < text.length; i++) {
console.log(text[i]);
}
// H
// i
// !
//
// ๏ฟฝ (broken surrogate)
// ๏ฟฝ (broken surrogate)
// ๏ฟฝ (broken surrogate)
// ๏ฟฝ (broken surrogate)

The traditional for loop iterates over UTF-16 code units, which breaks multi-byte characters. Always use for...of when iterating over string characters.

Converting a String to an Array of Characters

// Spread operator (handles emoji correctly)
let chars = [..."Hello 😀"];
console.log(chars);
// ["H", "e", "l", "l", "o", " ", "😀"]

console.log(chars.length); // 7 (correct character count)

// Array.from (also handles emoji correctly)
let chars2 = Array.from("Hello 😀");
console.log(chars2); // ["H", "e", "l", "l", "o", " ", "😀"]

Searching in Strings

JavaScript provides several methods for finding text within strings.

indexOf and lastIndexOf

indexOf(searchString, startPosition) returns the index of the first occurrence, or -1 if not found:

let text = "Hello, World! Hello, JavaScript!";

console.log(text.indexOf("Hello")); // 0
console.log(text.indexOf("Hello", 1)); // 14 (search starting from index 1)
console.log(text.indexOf("World")); // 7
console.log(text.indexOf("Python")); // -1 (not found)
console.log(text.indexOf("hello")); // -1 (case-sensitive!)

lastIndexOf(searchString, startPosition) searches from the end:

let text = "Hello, World! Hello, JavaScript!";

console.log(text.lastIndexOf("Hello")); // 14 (last occurrence)
console.log(text.lastIndexOf("Hello", 13)); // 0 (searching backward from index 13)

Finding All Occurrences

let text = "the cat sat on the mat";
let search = "the";
let positions = [];
let pos = text.indexOf(search);

while (pos !== -1) {
  positions.push(pos);
  pos = text.indexOf(search, pos + 1);
}

console.log(positions); // [0, 15]

includes()

Returns true or false. Simpler than indexOf when you only need to know if a substring exists:

let text = "Hello, JavaScript!";

console.log(text.includes("JavaScript")); // true
console.log(text.includes("Python")); // false
console.log(text.includes("hello")); // false (case-sensitive)

// With start position
console.log(text.includes("Hello", 1)); // false (starts searching from index 1)
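
Because includes() is case-sensitive, a case-insensitive check needs both sides brought to the same case first. A minimal sketch, assuming plain lowercasing is good enough for your text (the helper name `includesIgnoreCase` is hypothetical, and locale-specific rules such as Turkish casing are not handled):

```javascript
// Hypothetical helper: case-insensitive substring check via lowercasing
function includesIgnoreCase(haystack, needle) {
  return haystack.toLowerCase().includes(needle.toLowerCase());
}

console.log(includesIgnoreCase("Hello, JavaScript!", "hello"));  // true
console.log(includesIgnoreCase("Hello, JavaScript!", "SCRIPT")); // true
console.log(includesIgnoreCase("Hello, JavaScript!", "python")); // false
```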

startsWith() and endsWith()

Check if a string begins or ends with a specific substring:

let filename = "report-2024.pdf";

console.log(filename.startsWith("report")); // true
console.log(filename.startsWith("Report")); // false (case-sensitive)
console.log(filename.endsWith(".pdf")); // true
console.log(filename.endsWith(".doc")); // false

// With position parameter
console.log(filename.startsWith("2024", 7)); // true (start checking at index 7)
console.log(filename.endsWith("report", 6)); // true (consider only first 6 characters)

Practical Use Cases

// File type validation
function isImage(filename) {
  let lower = filename.toLowerCase();
  return lower.endsWith(".jpg") ||
    lower.endsWith(".jpeg") ||
    lower.endsWith(".png") ||
    lower.endsWith(".gif") ||
    lower.endsWith(".webp");
}

console.log(isImage("photo.JPG")); // true
console.log(isImage("doc.pdf")); // false

// URL checking
function isSecureUrl(url) {
  return url.startsWith("https://");
}

console.log(isSecureUrl("https://example.com")); // true
console.log(isSecureUrl("http://example.com")); // false

Search Methods Comparison

| Method | Returns | Use When |
| --- | --- | --- |
| indexOf() | Index or -1 | You need the position |
| lastIndexOf() | Index or -1 | You need the last position |
| includes() | boolean | You only need to know if it exists |
| startsWith() | boolean | Checking the beginning |
| endsWith() | boolean | Checking the ending |

Extracting Substrings: slice, substring, substr

JavaScript has three methods for extracting parts of a string. In practice, slice is the one you should use.

slice(start, end)

Returns the portion of the string from start up to (but not including) end:

let str = "Hello, World!";

console.log(str.slice(0, 5)); // "Hello"
console.log(str.slice(7, 12)); // "World"
console.log(str.slice(7)); // "World!" (omit end = go to end)

// Negative indices count from the end
console.log(str.slice(-6)); // "World!"
console.log(str.slice(-6, -1)); // "World"

// Start after end returns empty string
console.log(str.slice(5, 2)); // ""

substring(start, end)

Similar to slice but with two differences:

  • Negative arguments are treated as 0
  • If start > end, the arguments are swapped

let str = "Hello, World!";

console.log(str.substring(0, 5)); // "Hello"
console.log(str.substring(7, 12)); // "World"

// Differences from slice:
console.log(str.substring(5, 2)); // "llo" (arguments swapped to (2, 5))
console.log(str.slice(5, 2)); // "" (returns empty string)

console.log(str.substring(-3)); // "Hello, World!" (negative treated as 0)
console.log(str.slice(-3)); // "ld!" (negative counts from end)

substr(start, length) (Deprecated)

substr takes a start position and a length instead of an end position. It is deprecated and should not be used in new code:

let str = "Hello, World!";

// ❌ Deprecated: do not use
console.log(str.substr(7, 5)); // "World"
console.log(str.substr(-6, 5)); // "World" (starts 6 from the end, takes 5)

// ✅ Use slice instead
console.log(str.slice(7, 7 + 5)); // "World"
console.log(str.slice(-6, -6 + 5)); // "World"
// Note: if start + length reaches 0 or more, omit the end instead: str.slice(-6)

Comparison Table

| Method | Negative args | start > end | Deprecated? |
| --- | --- | --- | --- |
| slice(start, end) | Counts from end | Returns "" | No |
| substring(start, end) | Treated as 0 | Swaps arguments | No |
| substr(start, length) | Start counts from end | N/A (uses length) | Yes |

Always Use slice()

slice() is the most versatile and predictable substring method. It supports negative indices and does not have the confusing argument-swapping behavior of substring, which makes it a sensible default for modern code.
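
As a sketch of how slice() combines with the search methods covered earlier, the two helpers below (both hypothetical names, written for this illustration) extract a file extension and shorten long text. The truncate helper assumes maxLength is at least 3 and counts UTF-16 code units, so it can still split an emoji:

```javascript
// Hypothetical helper: text after the last dot, or "" if there is none
function getExtension(filename) {
  const dot = filename.lastIndexOf(".");
  // dot <= 0 covers "no dot" (-1) and dotfiles such as ".gitignore" (0)
  return dot <= 0 ? "" : filename.slice(dot + 1);
}

// Hypothetical helper: shorten text to at most maxLength code units
// (assumes maxLength >= 3 so there is room for the "...")
function truncate(text, maxLength) {
  if (text.length <= maxLength) return text;
  return text.slice(0, maxLength - 3) + "...";
}

console.log(getExtension("report-2024.pdf")); // "pdf"
console.log(getExtension("archive.tar.gz"));  // "gz"
console.log(getExtension("README"));          // ""
console.log("report-2024.pdf".slice(-4));     // ".pdf"
console.log(truncate("Hello, World!", 8));    // "Hello..."
```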

Changing Case

toUpperCase() and toLowerCase()

Convert the entire string to uppercase or lowercase:

let str = "Hello, World!";

console.log(str.toUpperCase()); // "HELLO, WORLD!"
console.log(str.toLowerCase()); // "hello, world!"

// Original is unchanged (strings are immutable)
console.log(str); // "Hello, World!"

// Single character
console.log("a".toUpperCase()); // "A"

toLocaleUpperCase() and toLocaleLowerCase()

Handle locale-specific case conversions. Critical for certain languages:

// Turkish has a special case: lowercase 'i' → uppercase 'İ' (not 'I')
let turkishWord = "istanbul";

console.log(turkishWord.toUpperCase()); // "ISTANBUL" (wrong for Turkish!)
console.log(turkishWord.toLocaleUpperCase("tr-TR")); // "İSTANBUL" (correct!)

// German sharp s: 'ß' → 'SS'
let german = "straße";
console.log(german.toUpperCase()); // "STRASSE"
console.log(german.toLocaleUpperCase("de-DE")); // "STRASSE"

Capitalizing the First Letter

A common utility that JavaScript does not provide natively:

function capitalize(str) {
  if (!str) return str;
  return str[0].toUpperCase() + str.slice(1);
}

console.log(capitalize("hello")); // "Hello"
console.log(capitalize("javaScript")); // "JavaScript"
console.log(capitalize("")); // ""

// Capitalize every word
function capitalizeWords(str) {
  return str
    .split(" ")
    .map(word => capitalize(word))
    .join(" ");
}

console.log(capitalizeWords("hello world from javascript"));
// "Hello World From Javascript"

Trimming: trim, trimStart, trimEnd

Remove whitespace (spaces, tabs, newlines) from the edges of a string:

let padded = "   Hello, World!   ";

console.log(padded.trim()); // "Hello, World!" (both sides)
console.log(padded.trimStart()); // "Hello, World!   " (left side only)
console.log(padded.trimEnd()); // "   Hello, World!" (right side only)

// Removes all types of whitespace
let messy = " \t \n Hello \n \t ";
console.log(messy.trim()); // "Hello"

trimStart() and trimEnd() also have aliases trimLeft() and trimRight(), but the Start/End versions are the standard names.

Practical Use: Cleaning User Input

function cleanInput(input) {
  return input.trim();
}

let username = cleanInput(" alice_42 ");
console.log(username); // "alice_42"
console.log(username.length); // 8 (no leading/trailing spaces)

Padding: padStart, padEnd

Pad a string to a target length by adding characters to the beginning or end:

// padStart(targetLength, padString)
console.log("5".padStart(3, "0")); // "005"
console.log("42".padStart(5, "0")); // "00042"
console.log("hello".padStart(5, "0")); // "hello" (already >= 5 chars)
console.log("hi".padStart(10, ".-")); // ".-.-.-.-hi"
console.log("7".padStart(2)); // " 7" (default pad is space)

// padEnd(targetLength, padString)
console.log("hello".padEnd(10, ".")); // "hello....."
console.log("42".padEnd(6, "0")); // "420000"
console.log("hi".padEnd(10, "!")); // "hi!!!!!!!!"

Practical Use Cases

// Formatting numbers with leading zeros
function formatTime(hours, minutes, seconds) {
  return `${String(hours).padStart(2, "0")}:${String(minutes).padStart(2, "0")}:${String(seconds).padStart(2, "0")}`;
}

console.log(formatTime(9, 5, 3)); // "09:05:03"
console.log(formatTime(14, 30, 0)); // "14:30:00"

// Formatting IDs
function formatId(id) {
  return `ID-${String(id).padStart(6, "0")}`;
}

console.log(formatId(42)); // "ID-000042"
console.log(formatId(12345)); // "ID-012345"

// Creating a simple table
let items = [
  { name: "Apple", price: 1.5 },
  { name: "Banana", price: 0.75 },
  { name: "Cherry", price: 3.2 }
];

items.forEach(item => {
  console.log(
    `${item.name.padEnd(10)} $${item.price.toFixed(2).padStart(6)}`
  );
});
// Apple      $  1.50
// Banana     $  0.75
// Cherry     $  3.20

Repeating: repeat

Creates a new string by repeating the original string a specified number of times:

console.log("ha".repeat(3));       // "hahaha"
console.log("-".repeat(20)); // "--------------------"
console.log("abc".repeat(0)); // "" (empty string)
console.log("Hello! ".repeat(2)); // "Hello! Hello! "

// Practical: creating separators
function separator(char = "-", length = 40) {
  return char.repeat(length);
}

console.log(separator()); // "----------------------------------------"
console.log(separator("=", 30)); // "=============================="
console.log(separator("*-", 10)); // "*-*-*-*-*-*-*-*-*-*-"

repeat throws a RangeError for negative numbers or Infinity:

// "ha".repeat(-1);       // RangeError
// "ha".repeat(Infinity); // RangeError

Replacing: replace, replaceAll

replace()

Replaces the first occurrence of a pattern:

let text = "Hello, World! Hello, JavaScript!";

// Replace first occurrence only
console.log(text.replace("Hello", "Hi"));
// "Hi, World! Hello, JavaScript!"

// Case-sensitive
console.log(text.replace("hello", "Hi"));
// "Hello, World! Hello, JavaScript!" (no match, unchanged)

replace with Regular Expressions

To replace all occurrences with replace, use a regex with the g (global) flag:

let text = "Hello, World! Hello, JavaScript!";

// Replace ALL occurrences with regex + g flag
console.log(text.replace(/Hello/g, "Hi"));
// "Hi, World! Hi, JavaScript!"

// Case-insensitive replacement
console.log(text.replace(/hello/gi, "Hi"));
// "Hi, World! Hi, JavaScript!"

replaceAll()

Replaces all occurrences without needing a regular expression:

let text = "Hello, World! Hello, JavaScript!";

console.log(text.replaceAll("Hello", "Hi"));
// "Hi, World! Hi, JavaScript!"

// Useful for swapping delimiters
let csv = "one,two,three,four";
console.log(csv.replaceAll(",", " | "));
// "one | two | three | four"

Replacement with Functions

Both replace and replaceAll accept a function as the second argument, giving you full control over each replacement. The function's return value is converted to a string:

let text = "I have 3 cats and 12 dogs";

let result = text.replace(/\d+/g, (match) => {
  return Number(match) * 2; // match is a string, so convert it explicitly
});

console.log(result); // "I have 6 cats and 24 dogs"

// More complex: using capture groups
let template = "Hello, {name}! You are {age} years old.";
let data = { name: "Alice", age: 30 };

let filled = template.replace(/\{(\w+)\}/g, (fullMatch, key) => {
  return data[key] ?? fullMatch;
});

console.log(filled); // "Hello, Alice! You are 30 years old."
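
The examples above use replace; the same callback style works with replaceAll. A short sketch with made-up sample text, showing a string pattern and a regex pattern (a regex passed to replaceAll must carry the g flag, otherwise a TypeError is thrown):

```javascript
const text = "cat, dog, cat, bird";

// String pattern + function: the callback runs once per occurrence
let count = 0;
const numbered = text.replaceAll("cat", (match) => `${match}#${++count}`);
console.log(numbered); // "cat#1, dog, cat#2, bird"

// Regex pattern: the g flag is mandatory for replaceAll
const upper = text.replaceAll(/cat|dog/g, (match) => match.toUpperCase());
console.log(upper); // "CAT, DOG, CAT, bird"
```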

Splitting and Joining

split()

Divides a string into an array of substrings based on a separator:

let csv = "apple,banana,cherry,date";
let fruits = csv.split(",");
console.log(fruits); // ["apple", "banana", "cherry", "date"]

// Split by space
let words = "Hello World JavaScript".split(" ");
console.log(words); // ["Hello", "World", "JavaScript"]

// Split by empty string, individual characters
let chars = "Hello".split("");
console.log(chars); // ["H", "e", "l", "l", "o"]

// Split with a limit
let limited = csv.split(",", 2);
console.log(limited); // ["apple", "banana"]

// Split by regex
let text = "one1two2three3four";
let parts = text.split(/\d/);
console.log(parts); // ["one", "two", "three", "four"]

join() (Array Method)

The counterpart to split. Joins array elements into a string:

let words = ["Hello", "World", "JavaScript"];

console.log(words.join(" ")); // "Hello World JavaScript"
console.log(words.join(", ")); // "Hello, World, JavaScript"
console.log(words.join("-")); // "Hello-World-JavaScript"
console.log(words.join("")); // "HelloWorldJavaScript"
console.log(words.join()); // "Hello,World,JavaScript" (default is comma)

split and join Together: Common Patterns

// Reverse a string (simple cases, not emoji-safe!)
let reversed = "hello".split("").reverse().join("");
console.log(reversed); // "olleh"

// Convert between formats
let kebab = "my-component-name";
let camel = kebab
  .split("-")
  .map((word, i) => i === 0 ? word : word[0].toUpperCase() + word.slice(1))
  .join("");
console.log(camel); // "myComponentName"

// Clean up extra spaces
let messy = "  too   many    spaces  ";
let clean = messy.trim().split(/\s+/).join(" ");
console.log(clean); // "too many spaces"

// Create a slug from a title
function slugify(title) {
  return title
    .toLowerCase()
    .trim()
    .split(/\s+/)
    .join("-")
    .replace(/[^a-z0-9-]/g, "");
}

console.log(slugify("Hello World! How Are You?"));
// "hello-world-how-are-you"

String Comparison and Locales

Default Comparison (UTF-16 Code Units)

Strings are compared character by character using their UTF-16 code unit values, which match the Unicode code points for every character in the BMP:

console.log("a" > "b");           // false (97 > 98 is false)
console.log("b" > "a"); // true
console.log("apple" > "banana"); // false ('a' < 'b')

// Uppercase letters have LOWER code points than lowercase
console.log("A" > "a"); // false (65 > 97 is false)
console.log("Z" > "a"); // false (90 > 97 is false)

// This means uppercase sorts BEFORE lowercase
let words = ["banana", "Apple", "cherry"];
console.log(words.sort());
// ["Apple", "banana", "cherry"] (A (65) comes before b (98))

localeCompare()

For proper language-aware comparison, use localeCompare():

// localeCompare returns:
// negative if str < other
// 0 if str === other
// positive if str > other
// Only the sign is guaranteed; the exact value (-1, 1, ...) varies by engine.

console.log("a".localeCompare("b")); // negative (a comes before b)
console.log("b".localeCompare("a")); // positive (b comes after a)
console.log("a".localeCompare("a")); // 0 (equal)

// Case is significant by default; use the sensitivity option to ignore it
console.log("a".localeCompare("A")); // nonzero (the sign depends on locale and engine)

// Sorting with localeCompare
let words = ["banana", "Apple", "cherry", "äpple"];
words.sort((a, b) => a.localeCompare(b));
console.log(words); // ["Apple", "äpple", "banana", "cherry"] (typical for English locales)

localeCompare Options

// Case-insensitive comparison
let result = "Hello".localeCompare("hello", undefined, { sensitivity: "base" });
console.log(result); // 0 (treated as equal)

// Numeric sorting
let files = ["file10", "file2", "file1", "file20"];
files.sort((a, b) => a.localeCompare(b, undefined, { numeric: true }));
console.log(files); // ["file1", "file2", "file10", "file20"]

// Without numeric option:
let filesBad = ["file10", "file2", "file1", "file20"];
filesBad.sort();
console.log(filesBad); // ["file1", "file10", "file2", "file20"] (wrong!)

Intl.Collator for Performance

When sorting large arrays, creating an Intl.Collator is more efficient than calling localeCompare repeatedly:

let collator = new Intl.Collator("en", { numeric: true, sensitivity: "base" });

let files = ["file10", "file2", "file1", "file20", "file3"];
files.sort(collator.compare);
console.log(files); // ["file1", "file2", "file3", "file10", "file20"]

Unicode, UTF-16, Surrogate Pairs, and Emoji Handling

Understanding how JavaScript stores strings internally is essential for handling modern text correctly, especially emoji and international characters.

UTF-16 Internal Representation

JavaScript strings are sequences of UTF-16 code units, where each code unit is 16 bits (2 bytes). Characters in the Basic Multilingual Plane (BMP), which includes most common characters, fit in a single 16-bit code unit:

// BMP characters: one code unit each
console.log("A".length); // 1
console.log("€".length); // 1 (Euro sign: U+20AC)
console.log("中".length); // 1 (Chinese character: U+4E2D)

Surrogate Pairs

Characters outside the BMP (code points above U+FFFF), including most emoji, are represented as two code units called a surrogate pair:

// Characters outside BMP: two code units (surrogate pair)
console.log("😀".length); // 2
console.log("🎉".length); // 2
console.log("𝕳".length); // 2 (Mathematical bold Fraktur H, U+1D573)

// The two code units of 😀
console.log("😀".charCodeAt(0)); // 55357 (high surrogate: 0xD83D)
console.log("😀".charCodeAt(1)); // 56832 (low surrogate: 0xDE00)

// Getting the actual code point
console.log("😀".codePointAt(0)); // 128512 (U+1F600)
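
The arithmetic behind a surrogate pair can be sketched as follows. It follows the standard UTF-16 encoding rule; the function name `toSurrogatePair` is made up for illustration and assumes its input is a code point above U+FFFF:

```javascript
// Sketch: split a code point above U+FFFF into its two UTF-16 code units
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;    // leaves a 20-bit value
  const high = 0xD800 + (offset >> 10);  // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F600);
console.log(high.toString(16), low.toString(16)); // d83d de00

// Passing both halves to fromCharCode rebuilds the original character
console.log(String.fromCharCode(high, low) === String.fromCodePoint(0x1F600)); // true
```

In everyday code you rarely need this calculation, since codePointAt() and String.fromCodePoint() do the conversion for you.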

Creating Characters from Code Points

// String.fromCharCode: takes UTF-16 code units, so a single argument can only produce a BMP character
console.log(String.fromCharCode(65)); // "A"
console.log(String.fromCharCode(8364)); // "€"

// String.fromCodePoint: works for ALL characters
console.log(String.fromCodePoint(128512)); // "😀"
console.log(String.fromCodePoint(0x1F600)); // "😀"
console.log(String.fromCodePoint(65, 66, 67)); // "ABC"

The Problems with Surrogate Pairsโ€‹

String operations that work on code units can break emoji:

let emoji = "๐Ÿ˜€Hello";

// โŒ Bracket notation can return half a surrogate pair
console.log(emoji[0]); // "๏ฟฝ" (high surrogate, not a valid character)
console.log(emoji[1]); // "๏ฟฝ" (low surrogate)
console.log(emoji[2]); // "H"

// โŒ slice can split a surrogate pair
console.log(emoji.slice(0, 1)); // "๏ฟฝ" (broken!)

// โœ… Use spread or for...of for correct character handling
let chars = [...emoji];
console.log(chars[0]); // "๐Ÿ˜€" (correct!)
console.log(chars[1]); // "H"
console.log(chars.length); // 6 (correct character count)

// โœ… Correct slicing
console.log(chars.slice(0, 1).join("")); // "๐Ÿ˜€"

Reversing Strings with Emojiโ€‹

The classic split("").reverse().join("") breaks with emoji:

let text = "Hello ๐Ÿ˜€!";

// โŒ WRONG: Breaks surrogate pairs
let broken = text.split("").reverse().join("");
console.log(broken); // "!๏ฟฝ๏ฟฝolleH" (broken emoji)

// โœ… CORRECT: Use spread operator
let correct = [...text].reverse().join("");
console.log(correct); // "!๐Ÿ˜€ olleH"

Grapheme Clusters

Even for...of and the spread operator do not solve all problems. Some visual characters consist of multiple code points combined:

// Family emoji = multiple code points joined by Zero Width Joiner (ZWJ)
let family = "👨‍👩‍👧‍👦";
console.log(family.length); // 11
console.log([...family].length); // 7 (still not 1!)
console.log([...family]); // ["👨", "\u200D", "👩", "\u200D", "👧", "\u200D", "👦"] (ZWJ shown as an escape)

// Flag emoji = two regional indicator symbols
let flag = "🇮🇹";
console.log(flag.length); // 4
console.log([...flag].length); // 2 (not 1!)

// Accented characters can be single or combined
let cafe1 = "café"; // 'é' as single code point (U+00E9)
let cafe2 = "cafe\u0301"; // 'e' + combining accent (U+0301)
console.log(cafe1.length); // 4
console.log(cafe2.length); // 5 (different!)
console.log(cafe1 === cafe2); // false!

Unicode Normalization

To handle composed vs. decomposed characters, use normalize():

let cafe1 = "café"; // Precomposed é
let cafe2 = "cafe\u0301"; // e + combining acute accent

console.log(cafe1 === cafe2); // false
console.log(cafe1.normalize() === cafe2.normalize()); // true!

// NFC (default): Composes to shortest form
console.log(cafe2.normalize("NFC").length); // 4 (é as single code point)

// NFD: Decomposes to base + combining marks
console.log(cafe1.normalize("NFD").length); // 5 (e + combining accent)

Intl.Segmenter for True Grapheme Handling

For correctly counting and splitting visual characters (grapheme clusters), use Intl.Segmenter:

let text = "Hello 👨‍👩‍👧‍👦! 🇮🇹";
let segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

let graphemes = [...segmenter.segment(text)].map(s => s.segment);
console.log(graphemes); // ["H", "e", "l", "l", "o", " ", "👨‍👩‍👧‍👦", "!", " ", "🇮🇹"]

console.log(graphemes.length); // 10 (correct visual character count!)

When Unicode Matters

For most everyday text processing with ASCII and common characters, standard string methods work perfectly. You only need to worry about surrogate pairs, grapheme clusters, and normalization when your application handles emoji, international text with combining marks, or flag symbols. However, in any user-facing application, it is worth considering these edge cases.
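
One way to put this advice into practice is a pair of small helpers that work on grapheme clusters and fall back to code points where Intl.Segmenter is missing. This is a sketch under stated assumptions: the names `toGraphemes`, `countGraphemes`, and `reverseGraphemes` are hypothetical, and the fallback can still split ZWJ sequences and combining marks:

```javascript
// Sketch: split text into grapheme clusters, falling back to code points
// on runtimes that do not provide Intl.Segmenter.
function toGraphemes(text, locale = "en") {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
    return Array.from(segmenter.segment(text), (s) => s.segment);
  }
  return [...text]; // fallback: code points only
}

function countGraphemes(text) {
  return toGraphemes(text).length;
}

function reverseGraphemes(text) {
  return toGraphemes(text).reverse().join("");
}

console.log(countGraphemes("cafe\u0301"));    // 4 when Intl.Segmenter is available
console.log(reverseGraphemes("ab\u{1F600}")); // "😀ba"
```

Creating a segmenter has a cost, so code that processes many strings may prefer to build one instance and reuse it rather than constructing it on every call as this sketch does.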

Summary

  • Strings are immutable. Every string operation returns a new string; the original is never modified.
  • JavaScript supports three quote types: single quotes, double quotes (functionally identical), and backticks (template literals with interpolation, multi-line support, and tagged templates).
  • Template literals (`...`) support ${} interpolation, multi-line text, and tagged templates for custom processing.
  • Access characters with bracket notation (str[i]), charAt(i), or at(i) (supports negative indices). Use at(-1) for the last character.
  • Iterate over characters with for...of, which walks the string by code point and so keeps surrogate pairs intact, unlike index-based loops.
  • Search methods: indexOf/lastIndexOf (position), includes (boolean), startsWith/endsWith (boolean).
  • Extract substrings with slice(start, end). Prefer slice over substring (confusing argument swapping) and substr (deprecated).
  • Case conversion: toUpperCase(), toLowerCase(), and their locale-aware variants toLocaleUpperCase(), toLocaleLowerCase().
  • Trimming: trim(), trimStart(), trimEnd() remove whitespace from edges.
  • Padding: padStart(), padEnd() add characters to reach a target length.
  • Repeating: repeat(n) creates a string repeated n times.
  • Replacing: replace() (first or regex), replaceAll() (all occurrences). Both accept functions for dynamic replacement.
  • Split/Join: split() converts strings to arrays; join() converts arrays to strings.
  • Comparison: Default comparison uses UTF-16 code unit values. Use localeCompare() for language-aware sorting, especially with the numeric and sensitivity options.
  • Strings are stored as UTF-16. Characters outside the BMP (like emoji) use surrogate pairs (two code units), making length and index-based access unreliable for them. Use spread ([...str]) or for...of for correct character handling, and Intl.Segmenter for true grapheme cluster support.
