How to Work with Strings in JavaScript

Strings are one of the most frequently used data types in JavaScript. Every piece of text in your application, from user names and error messages to HTML content and API responses, is a string. JavaScript provides an extensive set of built-in methods for searching, extracting, transforming, and comparing strings.

However, strings in JavaScript have characteristics that set them apart from many other languages. They are immutable, meaning they can never be changed in place. They are stored internally as UTF-16, which affects how they handle emoji and certain international characters. And the sheer number of available methods, some overlapping, some deprecated, can be overwhelming without a clear guide.

This article covers everything you need to know about strings: from the three types of quotes and template literals, through every major string method with practical examples, to the internal Unicode representation that affects how strings behave with modern characters.

String Immutability

The most fundamental characteristic of strings in JavaScript is that they are immutable. Once a string is created, it cannot be changed. Every operation that appears to modify a string actually creates and returns a new string, leaving the original untouched.

let greeting = "Hello";

greeting[0] = "J";     // Attempting to change the first character
console.log(greeting); // "Hello" (unchanged! No error, just silently ignored.)

let upper = greeting.toUpperCase();
console.log(greeting); // "Hello" (still the original)
console.log(upper);    // "HELLO" (a brand-new string)

In strict mode, attempting to assign to a string index throws a TypeError:

"use strict";
let str = "Hello";
str[0] = "J"; // TypeError: Cannot assign to read only property '0' of string 'Hello'

This immutability means that when you "modify" a string, you are always creating a new one. Building strings through repeated concatenation in a loop can be inefficient for very large strings because each concatenation produces a new string value instead of extending the existing one.

// Each += creates a new string
let result = "";
for (let i = 0; i < 5; i++) {
  result += i + " "; // New string created each iteration
}
console.log(result); // "0 1 2 3 4 "

For building complex strings, using an array and join() or template literals is often cleaner and can be more efficient.
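
As a minimal sketch of the array-and-join approach, the hypothetical helper buildReport below collects each line in an array and joins once at the end. The function name and the shape of the input data are assumptions made for illustration, and whether this is measurably faster than += depends on the engine and the size of the input.

```javascript
// Hypothetical helper: builds a multi-line report without repeated +=.
function buildReport(items) {
  const lines = [];
  for (const item of items) {
    // Each line is a small template literal; nothing is concatenated yet.
    lines.push(`${item.name}: ${item.count}`);
  }
  // A single join produces the final string.
  return lines.join("\n");
}

const report = buildReport([
  { name: "apples", count: 3 },
  { name: "pears", count: 5 }
]);
console.log(report);
// apples: 3
// pears: 5
```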

Quotes: Single, Double, Backticks

JavaScript supports three types of quotes for creating strings. Single and double quotes are functionally identical, while backticks (template literals) provide additional features.

Single and Double Quotes

let single = 'Hello, World!';
let double = "Hello, World!";

console.log(single === double); // true (identical strings)

There is no functional difference between single and double quotes. The choice is purely stylistic. Most modern style guides (Airbnb, StandardJS) prefer single quotes for consistency, reserving double quotes for strings that contain apostrophes:

let message = "It's a beautiful day";    // Double quotes avoid escaping
let alt = 'It\'s a beautiful day';       // Single quotes require escaping
let html = '<div class="container">';    // Single quotes for HTML strings

Backticks (Template Literals)

Backticks provide three features that single and double quotes do not:

// 1. String interpolation
let name = "Alice";
let greeting = `Hello, ${name}!`;
console.log(greeting); // "Hello, Alice!"

// 2. Multi-line strings
let poem = `Roses are red,
Violets are blue,
JavaScript is great,
And so are you.`;
console.log(poem);

// 3. Embedded expressions (any expression, not just a variable name)
let a = 5, b = 3;
console.log(`${a} + ${b} = ${a + b}`); // "5 + 3 = 8"

Single and double quotes cannot span multiple lines without escape sequences and do not support interpolation.

Template Literals: Interpolation, Multi-Line, Tagged Templates

Template literals (backtick strings) deserve a deeper look because they are one of the most powerful string features in modern JavaScript.

String Interpolation

Any JavaScript expression can be placed inside ${}:

let user = { name: "Alice", age: 30 };

console.log(`Name: ${user.name}`);                        // "Name: Alice"
console.log(`Age next year: ${user.age + 1}`);            // "Age next year: 31"
console.log(`Is adult: ${user.age >= 18}`);               // "Is adult: true"
console.log(`Uppercase: ${user.name.toUpperCase()}`);     // "Uppercase: ALICE"

// Function calls inside interpolation
function greet(name) {
  return `Hello, ${name}!`;
}
console.log(`Message: ${greet("Bob")}`);                  // "Message: Hello, Bob!"

// Ternary expressions
let score = 85;
console.log(`Result: ${score >= 60 ? "Pass" : "Fail"}`);  // "Result: Pass"

Multi-Line Strings

Template literals preserve line breaks exactly as written:

let html = `
<div class="card">
  <h2>${user.name}</h2>
  <p>Age: ${user.age}</p>
</div>
`;

console.log(html);
// 
// <div class="card">
//   <h2>Alice</h2>
//   <p>Age: 30</p>
// </div>
// 

Note that the leading and trailing newlines are included in the string. If you do not want them, start the content on the same line as the opening backtick.

Tagged Templates

Tagged templates are an advanced feature where a function processes the template literal. The function receives the string parts and the interpolated values separately:

function highlight(strings, ...values) {
  let result = "";
  strings.forEach((str, i) => {
    result += str;
    if (i < values.length) {
      result += `**${values[i]}**`;
    }
  });
  return result;
}

let name = "Alice";
let role = "developer";

let message = highlight`Welcome ${name}, you are a ${role}!`;
console.log(message); // "Welcome **Alice**, you are a **developer**!"

The tag function receives:

strings: an array of string segments between interpolations (["Welcome ", ", you are a ", "!"])
...values: the interpolated values (["Alice", "developer"])

Practical Tagged Template: HTML Escaping

function safeHTML(strings, ...values) {
  let result = "";
  strings.forEach((str, i) => {
    result += str;
    if (i < values.length) {
      result += String(values[i])
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
    }
  });
  return result;
}

let userInput = '<script>alert("XSS")</script>';

let html = safeHTML`<div>${userInput}</div>`;
console.log(html); 
// "<div>&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;</div>"

This pattern is used in real-world libraries to prevent XSS attacks when building HTML from user data.

Special Characters and Escape Sequences

Strings can contain special characters represented by escape sequences starting with a backslash (\):

Escape Sequence	Character	Description
`\n`	Newline	Line feed (LF)
`\r`	Carriage return	CR (Windows line endings use `\r\n`)
`\t`	Tab	Horizontal tab
`\\`	Backslash	Literal backslash
`\'`	Single quote	Inside single-quoted strings
`\"`	Double quote	Inside double-quoted strings
`` \` ``	Backtick	Inside template literals
`\0`	Null character	The null byte
`\uXXXX`	Unicode (BMP)	4-digit hex code point
`\u{XXXXX}`	Unicode (full)	Any code point (1-6 hex digits)
`\xXX`	Latin-1	2-digit hex code

// Common escape sequences
console.log("Line 1\nLine 2");
// Line 1
// Line 2

console.log("Column1\tColumn2\tColumn3");
// Column1    Column2    Column3

console.log("She said \"hello\"");
// She said "hello"

console.log("Path: C:\\Users\\Alice");
// Path: C:\Users\Alice

// Unicode escapes
console.log("\u00A9");          // © (copyright symbol)
console.log("\u{1F600}");       // 😀 (grinning face emoji)
console.log("\u{2764}");        // ❤ (heart)

Multi-Line with Escape Sequences vs. Template Literals

// Old way: escape sequences
let oldMultiLine = "Line 1\n" +
                   "Line 2\n" +
                   "Line 3";

// Modern way: template literals
let newMultiLine = `Line 1
Line 2
Line 3`;

console.log(oldMultiLine === newMultiLine); // true

Template literals are clearly more readable for multi-line strings.

String Length and Accessing Characters

The length Property

length returns the number of UTF-16 code units in the string, not the number of visible characters. For most text, these are the same, but they differ for emoji and certain symbols:

console.log("Hello".length);     // 5
console.log("".length);          // 0
console.log(" ".length);         // 1 (space is a character)
console.log("\n".length);        // 1 (newline is one character)

// Emoji surprise
console.log("😀".length);        // 2, not 1 (uses two UTF-16 code units)
console.log("👨‍👩‍👧‍👦".length);       // 11 (a family emoji is many code units!)

length is a property, not a method. Do not use parentheses:

let str = "Hello";
console.log(str.length);      // 5 ✅
// console.log(str.length()); // TypeError: str.length is not a function ❌

Accessing Individual Characters

Three ways to access characters by index:

let str = "Hello";

// Bracket notation (modern, preferred)
console.log(str[0]);          // "H"
console.log(str[1]);          // "e"
console.log(str[4]);          // "o"
console.log(str[10]);         // undefined (out of bounds)

// charAt() method
console.log(str.charAt(0));   // "H"
console.log(str.charAt(10));  // "" (empty string for out-of-bounds)

// at() method (ES2022) supports negative indices!
console.log(str.at(0));       // "H"
console.log(str.at(-1));      // "o" (last character)
console.log(str.at(-2));      // "l" (second to last)
console.log(str.at(10));      // undefined (out of bounds)

Comparison: `[]`, `charAt()`, `at()`

Feature	`str[i]`	`str.charAt(i)`	`str.at(i)`
Out-of-bounds	`undefined`	`""` (empty string)	`undefined`
Negative index	`undefined`	`""`	Counts from end
Recommended	Yes	Legacy	Yes (especially for negative)

let word = "JavaScript";

// Getting the last character
console.log(word[word.length - 1]);  // "t" (verbose)
console.log(word.at(-1));            // "t" (clean and modern)

Tip: Use str.at(-1) to access the last character of a string. It is far more readable than str[str.length - 1].

Iterating Over Strings

`for...of` Loop

The for...of loop iterates over a string one Unicode code point at a time, so characters stored as surrogate pairs (like emoji) arrive as a single unit:

let word = "Hello";

for (let char of word) {
  console.log(char);
}
// H
// e
// l
// l
// o

for...of with Emoji

let text = "Hi! 😀👍";

// for...of correctly iterates character by character
for (let char of text) {
  console.log(char);
}
// H
// i
// !
//  
// 😀  (one iteration, correctly grouped)
// 👍  (one iteration, correctly grouped)

// Compare with traditional for loop, breaks emoji!
for (let i = 0; i < text.length; i++) {
  console.log(text[i]);
}
// H
// i
// !
//  
// � (broken surrogate)
// � (broken surrogate)
// � (broken surrogate)
// � (broken surrogate)

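The traditional for loop iterates over UTF-16 code units, which splits any character stored as a surrogate pair. Prefer for...of when you need to visit the characters of a string that may contain emoji or other characters outside the Basic Multilingual Plane.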

Converting a String to an Array of Characters

// Spread operator (handles emoji correctly)
let chars = [..."Hello 😀"];
console.log(chars);
// ["H", "e", "l", "l", "o", " ", "😀"]

console.log(chars.length); // 7 (correct character count)

// Array.from (also handles emoji correctly)
let chars2 = Array.from("Hello 😀");
console.log(chars2); // ["H", "e", "l", "l", "o", " ", "😀"]

Searching in Strings

JavaScript provides several methods for finding text within strings.

`indexOf` and `lastIndexOf`

indexOf(searchString, startPosition) returns the index of the first occurrence, or -1 if not found:

let text = "Hello, World! Hello, JavaScript!";

console.log(text.indexOf("Hello"));       // 0
console.log(text.indexOf("Hello", 1));    // 14 (search starting from index 1)
console.log(text.indexOf("World"));       // 7
console.log(text.indexOf("Python"));      // -1 (not found)
console.log(text.indexOf("hello"));       // -1 (case-sensitive!)

lastIndexOf(searchString, startPosition) searches from the end:

let text = "Hello, World! Hello, JavaScript!";

console.log(text.lastIndexOf("Hello"));     // 14 (last occurrence)
console.log(text.lastIndexOf("Hello", 13)); // 0 (searching backward from index 13)

Finding All Occurrences

let text = "the cat sat on the mat";
let search = "the";
let positions = [];
let pos = text.indexOf(search);

while (pos !== -1) {
  positions.push(pos);
  pos = text.indexOf(search, pos + 1);
}

console.log(positions); // [0, 15]
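
The loop above can be wrapped in a reusable function. The sketch below uses a hypothetical name, findAllIndexes, and adds one guard: with an empty search string, indexOf keeps returning a valid index, so the unguarded loop would never terminate. Returning an empty array in that case is a design choice made here, not a standard behavior.

```javascript
// Hypothetical helper: returns every index at which `search` occurs in `text`.
function findAllIndexes(text, search) {
  // An empty search string matches at every position and would make the
  // loop below run forever, so it is handled up front.
  if (search === "") return [];

  const positions = [];
  let pos = text.indexOf(search);
  while (pos !== -1) {
    positions.push(pos);
    // Advancing by 1 (not by search.length) also finds overlapping matches.
    pos = text.indexOf(search, pos + 1);
  }
  return positions;
}

console.log(findAllIndexes("the cat sat on the mat", "the")); // [0, 15]
console.log(findAllIndexes("aaaa", "aa"));                    // [0, 1, 2]
console.log(findAllIndexes("abc", ""));                       // []
```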

`includes()`

Returns true or false. Simpler than indexOf when you only need to know if a substring exists:

let text = "Hello, JavaScript!";

console.log(text.includes("JavaScript")); // true
console.log(text.includes("Python"));     // false
console.log(text.includes("hello"));      // false (case-sensitive)

// With start position
console.log(text.includes("Hello", 1));   // false (starts searching from index 1)

startsWith() and endsWith()

Check if a string begins or ends with a specific substring:

let filename = "report-2024.pdf";

console.log(filename.startsWith("report"));  // true
console.log(filename.startsWith("Report"));  // false (case-sensitive)
console.log(filename.endsWith(".pdf"));      // true
console.log(filename.endsWith(".doc"));      // false

// With position parameter
console.log(filename.startsWith("2024", 7)); // true (start checking at index 7)
console.log(filename.endsWith("report", 6)); // true (consider only first 6 characters)

Practical Use Cases

// File type validation
function isImage(filename) {
  let lower = filename.toLowerCase();
  return lower.endsWith(".jpg") ||
         lower.endsWith(".jpeg") ||
         lower.endsWith(".png") ||
         lower.endsWith(".gif") ||
         lower.endsWith(".webp");
}

console.log(isImage("photo.JPG"));               // true
console.log(isImage("doc.pdf"));                 // false

// URL checking
function isSecureUrl(url) {
  return url.startsWith("https://");
}

console.log(isSecureUrl("https://example.com")); // true
console.log(isSecureUrl("http://example.com"));  // false

Search Methods Comparison

Method	Returns	Use When
`indexOf()`	Index or `-1`	You need the position
`lastIndexOf()`	Index or `-1`	You need the last position
`includes()`	`boolean`	You only need to know if it exists
`startsWith()`	`boolean`	Checking the beginning
`endsWith()`	`boolean`	Checking the ending
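
All of these methods are case-sensitive. A simple way to search without regard to case is to lowercase both sides first, as in the sketch below. The helper name includesIgnoreCase is hypothetical, and plain toLowerCase() is an assumption that works for English text but is not correct for every language (Turkish is one exception).

```javascript
// Hypothetical helper: case-insensitive version of includes().
function includesIgnoreCase(text, search) {
  // Lowercasing both strings makes the comparison ignore letter case.
  return text.toLowerCase().includes(search.toLowerCase());
}

console.log(includesIgnoreCase("Hello, JavaScript!", "javascript")); // true
console.log(includesIgnoreCase("Hello, JavaScript!", "PYTHON"));     // false
```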

Extracting Substrings: `slice`, `substring`, `substr`

JavaScript has three methods for extracting parts of a string. In practice, slice is the one you should use.

`slice(start, end)`

Returns the portion of the string from start up to (but not including) end:

let str = "Hello, World!";

console.log(str.slice(0, 5));     // "Hello"
console.log(str.slice(7, 12));    // "World"
console.log(str.slice(7));        // "World!" (omit end = go to end)

// Negative indices count from the end
console.log(str.slice(-6));       // "World!"
console.log(str.slice(-6, -1));   // "World"

// Start after end returns empty string
console.log(str.slice(5, 2));     // ""

`substring(start, end)`

Similar to slice but with two differences:

Negative arguments are treated as 0
If start > end, the arguments are swapped

let str = "Hello, World!";

console.log(str.substring(0, 5));    // "Hello"
console.log(str.substring(7, 12));   // "World"

// Differences from slice:
console.log(str.substring(5, 2));    // "llo" (arguments swapped to (2, 5))
console.log(str.slice(5, 2));        // ""    (returns empty string)

console.log(str.substring(-3));      // "Hello, World!" (negative treated as 0)
console.log(str.slice(-3));          // "ld!" (negative counts from end)

`substr(start, length)` (Deprecated)

substr takes a start position and a length instead of an end position. It is deprecated and should not be used in new code:

let str = "Hello, World!";

// ❌ Deprecated: do not use
console.log(str.substr(7, 5));       // "World"
console.log(str.substr(-6, 5));      // "World"

// ✅ Use slice instead
console.log(str.slice(7, 7 + 5));    // "World"
console.log(str.slice(-6, -6 + 5));  // "World"

Comparison Table

Method	Negative args	`start > end`	Deprecated?
`slice(start, end)`	Counts from end	Returns `""`	No
`substring(start, end)`	Treated as `0`	Swaps arguments	No
`substr(start, length)`	Start counts from end	N/A (uses length)	Yes

Always Use slice()

slice() is the most versatile and predictable substring method. It supports negative indices and does not have the confusing argument-swapping behavior of substring. substring is not deprecated, but slice is the better default for modern code.
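
As an illustration of slice combined with lastIndexOf, the sketch below extracts a file extension. The helper name getExtension and its handling of edge cases (no dot, or a leading dot as in ".gitignore") are assumptions chosen for this example.

```javascript
// Hypothetical helper: returns the extension without the dot, or "" if none.
function getExtension(filename) {
  const dot = filename.lastIndexOf(".");
  // -1 means no dot at all; 0 means a leading dot such as ".gitignore",
  // which this sketch treats as "no extension".
  if (dot <= 0) return "";
  // Omitting the end argument makes slice run to the end of the string.
  return filename.slice(dot + 1).toLowerCase();
}

console.log(getExtension("report-2024.pdf")); // "pdf"
console.log(getExtension("archive.tar.GZ"));  // "gz"
console.log(getExtension(".gitignore"));      // ""
console.log(getExtension("README"));          // ""
```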

Changing Case

`toUpperCase()` and `toLowerCase()`

Convert the entire string to uppercase or lowercase:

let str = "Hello, World!";

console.log(str.toUpperCase()); // "HELLO, WORLD!"
console.log(str.toLowerCase()); // "hello, world!"

// Original is unchanged (strings are immutable)
console.log(str);               // "Hello, World!"

// Single character
console.log("a".toUpperCase()); // "A"

`toLocaleUpperCase()` and `toLocaleLowerCase()`

Handle locale-specific case conversions. Critical for certain languages:

// Turkish has a special case: lowercase 'i' → uppercase 'İ' (not 'I')
let turkishWord = "istanbul";

console.log(turkishWord.toUpperCase());               // "ISTANBUL" (wrong for Turkish!)
console.log(turkishWord.toLocaleUpperCase("tr-TR"));  // "İSTANBUL" (correct!)

// German sharp s: 'ß' → 'SS'
let german = "straße";
console.log(german.toUpperCase());                    // "STRASSE"
console.log(german.toLocaleUpperCase("de-DE"));       // "STRASSE"

Capitalizing the First Letter

A common utility that JavaScript does not provide natively:

function capitalize(str) {
  if (!str) return str;
  return str[0].toUpperCase() + str.slice(1);
}

console.log(capitalize("hello"));       // "Hello"
console.log(capitalize("javaScript"));  // "JavaScript"
console.log(capitalize(""));            // ""

// Capitalize every word
function capitalizeWords(str) {
  return str
    .split(" ")
    .map(word => capitalize(word))
    .join(" ");
}

console.log(capitalizeWords("hello world from javascript"));
// "Hello World From Javascript"

Trimming: `trim`, `trimStart`, `trimEnd`

Remove whitespace (spaces, tabs, newlines) from the edges of a string:

let padded = "   Hello, World!   ";

console.log(padded.trim());       // "Hello, World!" (both sides)
console.log(padded.trimStart());  // "Hello, World!   " (left side only)
console.log(padded.trimEnd());    // "   Hello, World!" (right side only)

// Removes all types of whitespace
let messy = " \t \n Hello \n \t ";
console.log(messy.trim());        // "Hello"

trimStart() and trimEnd() also have aliases trimLeft() and trimRight(), but the Start/End versions are the standard names.

Practical Use: Cleaning User Input

function cleanInput(input) {
  return input.trim();
}

let username = cleanInput("  alice_42  ");
console.log(username);        // "alice_42"
console.log(username.length); // 8 (no leading/trailing spaces)

Padding: `padStart`, `padEnd`

Pad a string to a target length by adding characters to the beginning or end:

// padStart(targetLength, padString)
console.log("5".padStart(3, "0"));       // "005"
console.log("42".padStart(5, "0"));      // "00042"
console.log("hello".padStart(5, "0"));   // "hello" (already >= 5 chars)
console.log("hi".padStart(10, ".-"));    // ".-.-.-.-hi"
console.log("7".padStart(2));            // " 7" (default pad is space)

// padEnd(targetLength, padString)
console.log("hello".padEnd(10, "."));    // "hello....."
console.log("42".padEnd(6, "0"));        // "420000"
console.log("hi".padEnd(10, "!"));       // "hi!!!!!!!!"

Practical Use Cases

// Formatting numbers with leading zeros
function formatTime(hours, minutes, seconds) {
  return `${String(hours).padStart(2, "0")}:${String(minutes).padStart(2, "0")}:${String(seconds).padStart(2, "0")}`;
}

console.log(formatTime(9, 5, 3));    // "09:05:03"
console.log(formatTime(14, 30, 0));  // "14:30:00"

// Formatting IDs
function formatId(id) {
  return `ID-${String(id).padStart(6, "0")}`;
}

console.log(formatId(42));           // "ID-000042"
console.log(formatId(12345));        // "ID-012345"

// Creating a simple table
let items = [
  { name: "Apple", price: 1.5 },
  { name: "Banana", price: 0.75 },
  { name: "Cherry", price: 3.2 }
];

items.forEach(item => {
  console.log(
    `${item.name.padEnd(10)} $${item.price.toFixed(2).padStart(6)}`
  );
});
// Apple      $  1.50
// Banana     $  0.75
// Cherry     $  3.20
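
Another use of padStart is masking all but the last few characters of a value. The following is a minimal sketch; the helper name maskExceptLast and its default arguments are hypothetical.

```javascript
// Hypothetical helper: keeps the last `visible` characters and masks the rest.
function maskExceptLast(value, visible = 4, maskChar = "*") {
  const text = String(value);
  // slice(-0) would return the whole string, so zero is handled explicitly.
  const tail = visible > 0 ? text.slice(-visible) : "";
  // padStart fills the left side until the original length is reached.
  return tail.padStart(text.length, maskChar);
}

console.log(maskExceptLast("4111111111111111")); // "************1111"
console.log(maskExceptLast("42"));               // "42" (shorter than the visible part)
console.log(maskExceptLast("abc", 0));           // "***"
```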

Repeating: `repeat`

Creates a new string by repeating the original string a specified number of times:

console.log("ha".repeat(3));       // "hahaha"
console.log("-".repeat(20));       // "--------------------"
console.log("abc".repeat(0));      // "" (empty string)
console.log("Hello! ".repeat(2));  // "Hello! Hello! "

// Practical: creating separators
function separator(char = "-", length = 40) {
  return char.repeat(length);
}

console.log(separator());          // "----------------------------------------"
console.log(separator("=", 30));   // "=============================="
console.log(separator("*-", 10));  // "*-*-*-*-*-*-*-*-*-*-"

repeat throws a RangeError for negative numbers or Infinity:

// "ha".repeat(-1);       // RangeError
// "ha".repeat(Infinity); // RangeError
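
Because a negative count throws, code that computes the count should keep it within range. The sketch below draws a text progress bar and clamps the input first; the helper name progressBar and the bar characters are assumptions for illustration.

```javascript
// Hypothetical helper: renders a fraction between 0 and 1 as a text bar.
function progressBar(fraction, width = 10) {
  // Clamp to [0, 1] so repeat() never receives a negative count.
  const clamped = Math.min(1, Math.max(0, fraction));
  const filled = Math.round(clamped * width);
  return "[" + "#".repeat(filled) + "-".repeat(width - filled) + "]";
}

console.log(progressBar(0.3));   // "[###-------]"
console.log(progressBar(1.5));   // "[##########]"
console.log(progressBar(-2, 5)); // "[-----]"
```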

Replacing: `replace`, `replaceAll`

`replace()`

When the pattern is a string, replace() changes only the first occurrence:

let text = "Hello, World! Hello, JavaScript!";

// Replace first occurrence only
console.log(text.replace("Hello", "Hi"));
// "Hi, World! Hello, JavaScript!"

// Case-sensitive
console.log(text.replace("hello", "Hi"));
// "Hello, World! Hello, JavaScript!" (no match, unchanged)

replace with Regular Expressions

To replace all occurrences with replace, use a regex with the g (global) flag:

let text = "Hello, World! Hello, JavaScript!";

// Replace ALL occurrences with regex + g flag
console.log(text.replace(/Hello/g, "Hi"));
// "Hi, World! Hi, JavaScript!"

// Case-insensitive replacement
console.log(text.replace(/hello/gi, "Hi"));
// "Hi, World! Hi, JavaScript!"

`replaceAll()`

Replaces all occurrences without needing a regular expression:

let text = "Hello, World! Hello, JavaScript!";

console.log(text.replaceAll("Hello", "Hi"));
// "Hi, World! Hi, JavaScript!"

// Useful for swapping one delimiter for another
let csv = "one,two,three,four";
console.log(csv.replaceAll(",", " | "));
// "one | two | three | four"

Replacement with Functions

Both replace and replaceAll accept a function as the second argument, giving you full control over each replacement:

let text = "I have 3 cats and 12 dogs";

let result = text.replace(/\d+/g, (match) => {
  return String(Number(match) * 2); // convert the matched text to a number, double it, return a string
});

console.log(result); // "I have 6 cats and 24 dogs"

// More complex: using capture groups
let template = "Hello, {name}! You are {age} years old.";
let data = { name: "Alice", age: 30 };

let filled = template.replace(/\{(\w+)\}/g, (fullMatch, key) => {
  return data[key] ?? fullMatch;
});

console.log(filled); // "Hello, Alice! You are 30 years old."
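
When a function is more than you need, the replacement string itself can refer to the match: $1, $2, and so on stand for capture groups, and $& stands for the whole match. The sketch below shows both; the helper name toDayFirst is hypothetical.

```javascript
// Reorders an ISO-style date (YYYY-MM-DD) into DD/MM/YYYY using capture groups.
function toDayFirst(isoDate) {
  return isoDate.replace(/^(\d{4})-(\d{2})-(\d{2})$/, "$3/$2/$1");
}

console.log(toDayFirst("2024-03-15")); // "15/03/2024"
console.log(toDayFirst("not a date")); // "not a date" (no match, unchanged)

// "$&" inserts the whole matched text.
const bracketed = "3 cats and 12 dogs".replace(/\d+/g, "[$&]");
console.log(bracketed);                // "[3] cats and [12] dogs"
```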

Splitting and Joining

`split()`

Divides a string into an array of substrings based on a separator:

let csv = "apple,banana,cherry,date";
let fruits = csv.split(",");
console.log(fruits);  // ["apple", "banana", "cherry", "date"]

// Split by space
let words = "Hello World JavaScript".split(" ");
console.log(words);   // ["Hello", "World", "JavaScript"]

// Split by empty string, individual characters
let chars = "Hello".split("");
console.log(chars);   // ["H", "e", "l", "l", "o"]

// Split with a limit
let limited = csv.split(",", 2);
console.log(limited); // ["apple", "banana"]

// Split by regex
let text = "one1two2three3four";
let parts = text.split(/\d/);
console.log(parts);   // ["one", "two", "three", "four"]

`join()` (Array Method)

The counterpart to split. Joins array elements into a string:

let words = ["Hello", "World", "JavaScript"];

console.log(words.join(" "));     // "Hello World JavaScript"
console.log(words.join(", "));    // "Hello, World, JavaScript"
console.log(words.join("-"));     // "Hello-World-JavaScript"
console.log(words.join(""));      // "HelloWorldJavaScript"
console.log(words.join());        // "Hello,World,JavaScript" (default is comma)

`split` and `join` Together: Common Patterns

// Reverse a string (simple cases, not emoji-safe!)
let reversed = "hello".split("").reverse().join("");
console.log(reversed);    // "olleh"

// Convert between formats
let kebab = "my-component-name";
let camel = kebab
  .split("-")
  .map((word, i) => i === 0 ? word : word[0].toUpperCase() + word.slice(1))
  .join("");
console.log(camel);       // "myComponentName"

// Clean up extra spaces
let messy = "  too   many    spaces  ";
let clean = messy.trim().split(/\s+/).join(" ");
console.log(clean);       // "too many spaces"

// Create a slug from a title
function slugify(title) {
  return title
    .toLowerCase()
    .trim()
    .split(/\s+/)
    .join("-")
    .replace(/[^a-z0-9-]/g, "");
}

console.log(slugify("Hello World! How Are You?"));
// "hello-world-how-are-you"

String Comparison and Locales

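Default Comparison (UTF-16 Code Units)

Strings are compared position by position using the numeric values of their UTF-16 code units. For characters in the Basic Multilingual Plane, these values are the same as the Unicode code points: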

console.log("a" > "b");           // false (97 > 98 is false)
console.log("b" > "a");           // true
console.log("apple" > "banana");  // false ('a' < 'b')

// Uppercase letters have LOWER code points than lowercase
console.log("A" > "a");           // false (65 > 97 is false)
console.log("Z" > "a");           // false (90 > 97 is false)

// This means uppercase sorts BEFORE lowercase
let words = ["banana", "Apple", "cherry"];
console.log(words.sort());
// ["Apple", "banana", "cherry"]  (A (65) comes before b (98))

localeCompare()

For proper language-aware comparison, use localeCompare():

// localeCompare returns:
// negative if str < other
// 0 if str === other
// positive if str > other

console.log("a".localeCompare("b"));      // negative, typically -1 (a comes before b)
console.log("b".localeCompare("a"));      // positive, typically 1 (b comes after a)
console.log("a".localeCompare("a"));      // 0 (equal)

// Case-sensitive by default: case acts as a tie-breaker between otherwise equal strings
console.log("a".localeCompare("A"));      // nonzero; negative in most locales (lowercase first)

// Sorting with localeCompare
let words = ["banana", "Apple", "cherry", "äpple"];
words.sort((a, b) => a.localeCompare(b));
console.log(words); // ["Apple", "äpple", "banana", "cherry"]

localeCompare Options

// Case-insensitive comparison
let result = "Hello".localeCompare("hello", undefined, { sensitivity: "base" });
console.log(result);    // 0 (treated as equal)

// Numeric sorting
let files = ["file10", "file2", "file1", "file20"];
files.sort((a, b) => a.localeCompare(b, undefined, { numeric: true }));
console.log(files);     // ["file1", "file2", "file10", "file20"]

// Without numeric option:
let filesBad = ["file10", "file2", "file1", "file20"];
filesBad.sort();
console.log(filesBad);  // ["file1", "file10", "file2", "file20"] (wrong!)
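
The sensitivity option can also be used for equality checks. With "base", accents are ignored along with case; with "accent", only case is ignored. The sketch below uses "accent"; the helper name equalsIgnoreCase and the default "en" locale are assumptions for illustration.

```javascript
// Hypothetical helper: true when two strings differ only by letter case.
// The "en" default is an assumption; pass another locale where appropriate.
function equalsIgnoreCase(a, b, locale = "en") {
  // sensitivity "accent": a = A, but a ≠ á
  return a.localeCompare(b, locale, { sensitivity: "accent" }) === 0;
}

console.log(equalsIgnoreCase("Hello", "hello"));             // true
console.log(equalsIgnoreCase("resume", "r\u00E9sum\u00E9")); // false (accents still count)
console.log(equalsIgnoreCase("apple", "apply"));             // false
```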

Intl.Collator for Performance

When sorting large arrays, creating an Intl.Collator is more efficient than calling localeCompare repeatedly:

let collator = new Intl.Collator("en", { numeric: true, sensitivity: "base" });

let files = ["file10", "file2", "file1", "file20", "file3"];
files.sort(collator.compare);
console.log(files); // ["file1", "file2", "file3", "file10", "file20"]

Unicode, UTF-16, Surrogate Pairs, and Emoji Handling

Understanding how JavaScript stores strings internally is essential for handling modern text correctly, especially emoji and international characters.

UTF-16 Internal Representation

JavaScript strings are sequences of UTF-16 code units, where each code unit is 16 bits (2 bytes). Characters in the Basic Multilingual Plane (BMP), which includes most common characters, fit in a single 16-bit code unit:

// BMP characters: one code unit each
console.log("A".length);      // 1
console.log("€".length);      // 1 (Euro sign: U+20AC)
console.log("中".length);     // 1 (Chinese character: U+4E2D)

Surrogate Pairs

Characters outside the BMP (code points above U+FFFF), including most emoji, are represented as two code units called a surrogate pair:

// Characters outside BMP: two code units (surrogate pair)
console.log("😀".length);         // 2
console.log("🎉".length);         // 2
console.log("𝕳".length);          // 2 (Mathematical bold Fraktur H: U+1D573)

// The two code units of 😀
console.log("😀".charCodeAt(0));  // 55357 (high surrogate: 0xD83D)
console.log("😀".charCodeAt(1));  // 56832 (low surrogate: 0xDE00)

// Getting the actual code point
console.log("😀".codePointAt(0)); // 128512 (U+1F600)

Creating Characters from Code Points

// String.fromCharCode: only works for BMP (single code unit)
console.log(String.fromCharCode(65));           // "A"
console.log(String.fromCharCode(8364));         // "€"

// String.fromCodePoint: works for ALL characters
console.log(String.fromCodePoint(128512));      // "😀"
console.log(String.fromCodePoint(0x1F600));     // "😀"
console.log(String.fromCodePoint(65, 66, 67));  // "ABC"
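
Going the other way, codePointAt combined with the spread operator gives a quick way to inspect what a string is made of. The helper name toCodePoints below is hypothetical; it is a debugging sketch, not a built-in.

```javascript
// Hypothetical helper: lists the code points of a string in U+XXXX notation.
function toCodePoints(str) {
  // Spreading iterates by code point, so surrogate pairs stay together.
  return [...str].map(ch =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const sample = "A\u20AC\u{1F600}"; // "A€😀"
console.log(toCodePoints(sample)); // ["U+0041", "U+20AC", "U+1F600"]

// Round trip: numeric code points back to the original string
const points = [...sample].map(ch => ch.codePointAt(0));
console.log(String.fromCodePoint(...points) === sample); // true
```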

The Problems with Surrogate Pairs

String operations that work on code units can break emoji:

let emoji = "😀Hello";

// ❌ Bracket notation can return half a surrogate pair
console.log(emoji[0]);          // "�" (high surrogate, not a valid character)
console.log(emoji[1]);          // "�" (low surrogate)
console.log(emoji[2]);          // "H"

// ❌ slice can split a surrogate pair
console.log(emoji.slice(0, 1)); // "�" (broken!)

// ✅ Use spread or for...of for correct character handling
let chars = [...emoji];
console.log(chars[0]);          // "😀" (correct!)
console.log(chars[1]);          // "H"
console.log(chars.length);      // 6 (correct character count)

// ✅ Correct slicing
console.log(chars.slice(0, 1).join("")); // "😀"
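
The spread-then-slice pattern above can be packaged as a small function. In this sketch, the hypothetical helper sliceByCodePoints takes indices that count code points rather than UTF-16 code units. It still does not keep multi-code-point sequences such as family or flag emoji together.

```javascript
// Hypothetical helper: like slice(), but indices count code points.
function sliceByCodePoints(str, start, end) {
  // Array.from splits the string by code point, keeping surrogate pairs whole.
  return Array.from(str).slice(start, end).join("");
}

const withEmoji = "\u{1F600}Hello"; // "😀Hello"
console.log(sliceByCodePoints(withEmoji, 0, 1)); // "😀"
console.log(sliceByCodePoints(withEmoji, 1, 3)); // "He"
console.log(sliceByCodePoints(withEmoji, -2));   // "lo"
```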

Reversing Strings with Emoji

The classic split("").reverse().join("") breaks with emoji:

let text = "Hello 😀!";

// ❌ WRONG: Breaks surrogate pairs
let broken = text.split("").reverse().join("");
console.log(broken);    // "!�� olleH" (broken emoji)

// ✅ CORRECT: Use spread operator
let correct = [...text].reverse().join("");
console.log(correct);   // "!😀 olleH"

Grapheme Clusters

Even for...of and the spread operator do not solve all problems. Some visual characters consist of multiple code points combined:

// Family emoji = multiple code points joined by Zero Width Joiner (ZWJ)
let family = "👨‍👩‍👧‍👦";
console.log(family.length);      // 11
console.log([...family].length); // 7 (still not 1!)
console.log([...family]);        // ["👨", "‍", "👩", "‍", "👧", "‍", "👦"]

// Flag emoji = two regional indicator symbols
let flag = "🇮🇹";
console.log(flag.length);          // 4
console.log([...flag].length);     // 2 (not 1!)

// Accented characters can be single or combined
let cafe1 = "café";                // 'é' as single code point (U+00E9)
let cafe2 = "cafe\u0301";          // 'e' + combining accent (U+0301)
console.log(cafe1.length);         // 4
console.log(cafe2.length);         // 5 (different!)
console.log(cafe1 === cafe2);      // false!

Unicode Normalization

To handle composed vs. decomposed characters, use normalize():

let cafe1 = "café";           // Precomposed é
let cafe2 = "cafe\u0301";     // e + combining acute accent

console.log(cafe1 === cafe2);                         // false
console.log(cafe1.normalize() === cafe2.normalize()); // true!

// NFC (default): Composes to shortest form
console.log(cafe2.normalize("NFC").length); // 4 (é as single code point)

// NFD: Decomposes to base + combining marks
console.log(cafe1.normalize("NFD").length); // 5 (e + combining accent)
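
In practice, normalization is usually applied to both sides just before comparing or searching. The sketch below wraps that idea in two hypothetical helpers, normalizedEquals and normalizedIncludes; choosing NFC for both is an assumption, and any single form works as long as it is applied consistently.

```javascript
// Hypothetical helpers: compare and search after normalizing both sides to NFC.
function normalizedEquals(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

function normalizedIncludes(text, search) {
  return text.normalize("NFC").includes(search.normalize("NFC"));
}

const precomposed = "caf\u00E9";  // é as a single code point
const decomposed = "cafe\u0301";  // e + combining acute accent

console.log(precomposed === decomposed);                            // false
console.log(normalizedEquals(precomposed, decomposed));             // true
console.log(normalizedIncludes("un " + decomposed, precomposed));   // true
```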

Intl.Segmenter for True Grapheme Handling

For correctly counting and splitting visual characters (grapheme clusters), use Intl.Segmenter:

let text = "Hello 👨‍👩‍👧‍👦! 🇮🇹";
let segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

let graphemes = [...segmenter.segment(text)].map(s => s.segment);
console.log(graphemes);         // ["H", "e", "l", "l", "o", " ", "👨‍👩‍👧‍👦", "!", " ", "🇮🇹"]

console.log(graphemes.length);  // 10 (correct visual character count!)
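
Building on the same idea, grapheme-aware reversing and truncating can be sketched as small helpers. The function names here are hypothetical, and the fallback to code points is an assumption for runtimes that lack Intl.Segmenter; in that case ZWJ sequences and flags will still be split.

```javascript
// Hypothetical helper: split a string into grapheme clusters.
function splitGraphemes(str, locale = "en") {
  if (typeof Intl === "undefined" || typeof Intl.Segmenter !== "function") {
    return [...str]; // Fallback: code points only (less accurate).
  }
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return Array.from(segmenter.segment(str), s => s.segment);
}

// Reverse by visible character instead of by code unit or code point.
function reverseGraphemes(str) {
  return splitGraphemes(str).reverse().join("");
}

// Keep at most `max` visible characters without cutting one in half.
function truncateGraphemes(str, max) {
  const parts = splitGraphemes(str);
  return parts.length <= max ? str : parts.slice(0, max).join("");
}

const sample = "Hi \u{1F1EE}\u{1F1F9}!"; // "Hi " + Italian flag + "!"

// Code point reversal swaps the two regional indicators, so the flag is lost.
console.log([...sample].reverse().join("") === "!\u{1F1F9}\u{1F1EE} iH"); // true

// Grapheme reversal keeps the flag intact.
console.log(reverseGraphemes(sample) === "!\u{1F1EE}\u{1F1F9} iH"); // true

const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(truncateGraphemes(family + " family", 3) === family + " f"); // true
```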

When Unicode Matters

For everyday text processing with ASCII and other common single-code-point characters, the standard string methods work as expected. Surrogate pairs, grapheme clusters, and normalization only become a concern when your application handles emoji, flag symbols, or international text with combining marks. Any application that accepts free-form user input will eventually receive such text, so these edge cases are worth planning for.
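
A quick way to judge whether a given string needs this extra care is to compare its three possible "lengths". This is a minimal sketch; measure is a hypothetical helper name, and it assumes Intl.Segmenter is available in your runtime.

```javascript
// Hypothetical helper: report the three ways a string can be "counted".
function measure(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    codeUnits: str.length,                         // UTF-16 code units
    codePoints: [...str].length,                   // Unicode code points
    graphemes: [...segmenter.segment(str)].length, // visible characters
  };
}

// Plain ASCII: all three agree, so standard methods are safe.
console.log(measure("Hello"));
// { codeUnits: 5, codePoints: 5, graphemes: 5 }

// Combining mark: code units and code points overcount.
console.log(measure("cafe\u0301"));
// { codeUnits: 5, codePoints: 5, graphemes: 4 }

// ZWJ family emoji: one visible character, many code units.
console.log(measure("\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}"));
// { codeUnits: 11, codePoints: 7, graphemes: 1 }
```

When all three numbers match, length and index-based access behave as you would expect; when they differ, prefer the code point or grapheme-aware techniques.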

Summary

Strings are immutable. Every string operation returns a new string; the original is never modified.
JavaScript supports three quote types: single quotes, double quotes (functionally identical), and backticks (template literals with interpolation, multi-line support, and tagged templates).
Template literals (`...`) support ${expression} interpolation, multi-line text, and tagged templates for custom processing.
Access characters with bracket notation (str[i]), charAt(i), or at(i) (supports negative indices). Use at(-1) for the last character.
Iterate over characters with for...of, which steps through code points and therefore keeps surrogate pairs (such as emoji) intact, unlike index-based loops.
Search methods: indexOf/lastIndexOf (position), includes (boolean), startsWith/endsWith (boolean).
Extract substrings with slice(start, end). Prefer slice over substring (confusing argument swapping) and substr (deprecated).
Case conversion: toUpperCase(), toLowerCase(), and their locale-aware variants toLocaleUpperCase(), toLocaleLowerCase().
Trimming: trim(), trimStart(), trimEnd() remove whitespace from edges.
Padding: padStart(), padEnd() add characters to reach a target length.
Repeating: repeat(n) creates a string repeated n times.
Replacing: replace() (first or regex), replaceAll() (all occurrences). Both accept functions for dynamic replacement.
Split/Join: split() converts strings to arrays; join() converts arrays to strings.
Comparison: Default comparison (<, >) works on UTF-16 code unit values, which matches code point order for characters inside the BMP. Use localeCompare() for language-aware sorting, especially with the numeric and sensitivity options.
Strings are stored as UTF-16. Characters outside the BMP (like emoji) use surrogate pairs (two code units), making length and index-based access unreliable for them. Use spread ([...str]) or for...of for correct character handling, and Intl.Segmenter for true grapheme cluster support.
