How to Use the Word Boundary \b in JavaScript Regular Expressions

When searching for text with regular expressions, finding the right substring is only half the challenge. You also need to ensure you are matching complete words rather than parts of longer words. Searching for "cat" without boundaries matches "cat" inside "catalog," "scattered," and "education." The word boundary anchor \b solves this problem by asserting that a match occurs at the edge of a word, where a word character meets a non-word character or the start/end of the string.

This guide explains what a word boundary is and how the regex engine determines boundary positions, then walks through practical use cases for whole-word matching in real applications.

What Is a Word Boundary?

The word boundary \b is an anchor, just like ^ and $. It does not match a character. It matches a position between characters where one side is a word character and the other side is not.

The Definition

A word boundary exists at a position where:

  • A word character (\w = [a-zA-Z0-9_]) is on one side, AND
  • A non-word character (\W = [^a-zA-Z0-9_]), or the start/end of the string, is on the other side

In simpler terms, \b matches at the edges of words.

// Visualizing word boundaries in "Hello, World!"
//
//   |Hello|, |World|!
//   ↑     ↑  ↑     ↑
//   \b    \b \b    \b
//
// Boundaries exist:
// - Before "H" (start of string → word char)
// - After "o" in "Hello" (word char → comma)
// - Before "W" (space → word char)
// - After "d" (word char → "!")

Let's verify this with code:

const text = 'Hello, World!';

// Replace each word boundary with a visible marker
const marked = text.replace(/\b/g, '|');
console.log(marked);
// "|Hello|, |World|!"

Every | marks a position where \b matches. There are boundaries on both sides of "Hello" and "World", but none between the comma and the space (both are non-word characters) and none after the exclamation mark (a non-word character followed by the end of the string).
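
One way to make the definition concrete is to spell \b out with lookarounds. The sketch below assumes an engine with lookbehind support (ES2018 or later); explicitBoundary is an illustrative name, not a built-in.

```javascript
// A hand-written equivalent of \b built from lookarounds (assumes lookbehind support).
// Either: a word character behind and none ahead (end of a word),
// or:     no word character behind and one ahead (start of a word).
const explicitBoundary = /(?<=\w)(?!\w)|(?<!\w)(?=\w)/g;

const sample = 'Hello, World!';
const viaLookarounds = sample.replace(explicitBoundary, '|');
const viaAnchor = sample.replace(/\b/g, '|');

console.log(viaLookarounds);               // "|Hello|, |World|!"
console.log(viaLookarounds === viaAnchor); // true
```

Both forms are zero-width: they consume no characters, which is why the replacement only inserts markers and never removes text.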

How the Engine Checks Boundaries

For each candidate position, the regex engine looks at the two characters on either side of it:

  1. The character before the position (or the start of the string at position 0)
  2. The character after the position (or the end of the string at the last position)

If exactly one of the two is a \w character, the position is a boundary. The start and end of the string count as non-word.

const text = 'abc def';

// Position 0: [start] vs "a" → start + word char = boundary ✓
// Position 1: "a" vs "b" → word + word = NOT boundary ✗
// Position 2: "b" vs "c" → word + word = NOT boundary ✗
// Position 3: "c" vs " " → word + non-word = boundary ✓
// Position 4: " " vs "d" → non-word + word = boundary ✓
// Position 5: "d" vs "e" → word + word = NOT boundary ✗
// Position 6: "e" vs "f" → word + word = NOT boundary ✗
// Position 7: "f" vs [end] → word + end = boundary ✓

console.log(text.replace(/\b/g, '|'));
// "|abc| |def|"
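
The position-by-position check above can also be written out by hand. This is a minimal sketch for illustration only; isWordChar and isWordBoundary are hypothetical helpers, not part of the language.

```javascript
// Hypothetical helpers that reimplement the \b check manually.
// A missing character (before the start or after the end) counts as non-word.
const isWordChar = (ch) => ch !== undefined && /^\w$/.test(ch);

function isWordBoundary(str, pos) {
  const before = isWordChar(str[pos - 1]); // str[-1] is undefined at the start
  const after = isWordChar(str[pos]);      // str[str.length] is undefined at the end
  return before !== after;                 // exactly one side is a word character
}

const input = 'abc def';
const positions = [];
for (let pos = 0; pos <= input.length; pos++) {
  if (isWordBoundary(input, pos)) positions.push(pos);
}

console.log(positions); // [0, 3, 4, 7]
```

The loop runs to input.length inclusive because a string of length n has n + 1 positions between and around its characters.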

Boundaries with Different Characters

Understanding which characters are \w (word characters) and which are not is essential:

// Word characters: letters, digits, underscore
console.log('hello_world'.replace(/\b/g, '|'));
// "|hello_world|" (underscore is a word character, no boundary there)

console.log('hello-world'.replace(/\b/g, '|'));
// "|hello|-|world|" (hyphen is NOT a word character, creates boundaries)

console.log('file.txt'.replace(/\b/g, '|'));
// "|file|.|txt|" (dot is NOT a word character)

console.log('user@email'.replace(/\b/g, '|'));
// "|user|@|email|" (@ is NOT a word character)

console.log('price$100'.replace(/\b/g, '|'));
// "|price|$|100|" ($ creates boundaries on both sides)

console.log('abc123'.replace(/\b/g, '|'));
// "|abc123|" (digits are word characters, no internal boundary)

console.log('abc 123'.replace(/\b/g, '|'));
// "|abc| |123|" (space creates boundaries)

\b at String Boundaries

\b matches at the start of the string if the first character is a word character, and at the end of the string if the last character is a word character:

// First character is a word char → boundary at start
console.log(/^\b/.test('hello')); // true
console.log(/^\b/.test(' hello')); // false (space is not \w)

// Last character is a word char → boundary at end
console.log(/\b$/.test('hello')); // true
console.log(/\b$/.test('hello!')); // false (! is not \w, so the end of the string is not a boundary)

The last case deserves a closer look. There is a boundary before "!", but not after it:

console.log('hello!'.replace(/\b/g, '|'));
// "|hello|!"
// Boundary at start (before "h") and after "o" (word → non-word)
// No boundary at the very end (! is non-word, end is non-word)

console.log(/\b$/.test('hello!')); // false (position at end: "!" vs end)
// non-word vs end = NOT a boundary

console.log(/\b$/.test('hello')); // true (position at end: "o" vs end)
// word vs end = IS a boundary
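
A few edge cases follow from the same rule. The short sketch below (variable names are illustrative) shows what happens when a string contains no word characters at all.

```javascript
// Strings with no word characters contain no word boundary at all.
const emptyHasBoundary = /\b/.test('');         // false
const punctuationHasBoundary = /\b/.test('?!'); // false

// \B does match in an empty string: neither side of position 0 is a word character.
const emptyHasNonBoundary = /\B/.test('');      // true

// A one-character word has a boundary on each side.
const singleChar = 'a'.replace(/\b/g, '|');     // "|a|"

console.log(emptyHasBoundary, punctuationHasBoundary, emptyHasNonBoundary, singleChar);
// false false true |a|
```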

The Non-Word Boundary \B

The uppercase \B is the inverse of \b. It matches at positions that are not word boundaries, meaning both sides are word characters or both sides are non-word characters:

console.log('hello world'.replace(/\B/g, '|'));
// "h|e|l|l|o w|o|r|l|d"
// \B matches between adjacent word characters (inside words).
// Nothing is inserted around the space: a word character sits on one side, so those positions are \b.

// \B matches positions where \b does NOT match
// Useful for matching within words, not at their edges

// Find "rain" only where it does not start at a word boundary
console.log('raining training straining'.match(/\Brain/g));
// ["rain", "rain"] (from "training" and "straining")
// "raining" is skipped because its "rain" starts at a word boundary

console.log('rain raining training'.match(/\Brain/g));
// ["rain"] (only from "training"; "rain" and "raining" both start at a boundary)

\B is used much less frequently than \b. Its main use is matching substrings that are explicitly part of a larger word.
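
One commonly cited use of \B is inserting thousands separators into a run of digits. The sketch below assumes the input is a plain string of digits; addThousandsSeparators is a hypothetical helper, and for real number formatting a locale-aware API such as Intl.NumberFormat is usually the better choice.

```javascript
// Hypothetical helper: inserts commas between groups of three digits.
// \B keeps a comma from being inserted at the very start of the number;
// the lookahead requires that the digits remaining form complete groups of three.
function addThousandsSeparators(digits) {
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

console.log(addThousandsSeparators('1234567')); // "1,234,567"
console.log(addThousandsSeparators('123456'));  // "123,456"
console.log(addThousandsSeparators('999'));     // "999"
```

Without \B, the position before the first digit of "123456" would also satisfy the lookahead and produce a leading comma.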

Use Cases: Whole Word Matching

The primary use of \b is ensuring you match complete words rather than substrings. Wrapping a pattern with \b on both sides, \bword\b, creates a whole-word match.

const text = 'The cat sat on the catalog near the caterpillar';

// ❌ Without boundaries: matches "cat" inside other words
console.log(text.match(/cat/g));
// ["cat", "cat", "cat"] (matches in "cat", "catalog", "caterpillar")

// ✅ With boundaries: matches only the standalone word "cat"
console.log(text.match(/\bcat\b/g));
// ["cat"] (only the standalone word)

This is the most common and important use of \b. Let's explore more scenarios:

const text = 'I like JavaScript, not just Java or JavaBeans';

// Without boundaries
console.log(text.match(/Java/g));
// ["Java", "Java", "Java"] (found in JavaScript, Java, JavaBeans)

// With boundaries: only "Java" as a standalone word
console.log(text.match(/\bJava\b/g));
// ["Java"] (only the standalone "Java")

// Match words starting with "Java"
console.log(text.match(/\bJava\w*/g));
// ["JavaScript", "Java", "JavaBeans"]

Search and Replace Whole Words

// Replace a specific word without affecting longer words
const text = 'The car was in the cargo area near the railcar';

// ❌ Without boundaries: damages "cargo" and "railcar"
console.log(text.replace(/car/g, 'truck'));
// "The truck was in the truckgo area near the railtruck"

// ✅ With boundaries: only replaces the standalone word
console.log(text.replace(/\bcar\b/g, 'truck'));
// "The truck was in the cargo area near the railcar"

// Censor a specific word
function censorWord(text, word) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`\\b${escaped}\\b`, 'gi');
  return text.replace(regex, '*'.repeat(word.length));
}

console.log(censorWord('The password is secret, keep it secret!', 'secret'));
// "The password is ******, keep it ******!"
// "password" is not affected even though it's related semantically

// Find every whole-word occurrence, ignoring case
function findWholeWord(text, word) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`\\b${escaped}\\b`, 'gi');
  return text.match(regex) || [];
}

console.log(findWholeWord('The THE the tHe', 'the'));
// ["The", "THE", "the", "tHe"]

console.log(findWholeWord('test testing tested TEST retest', 'test'));
// ["test", "TEST"] (only standalone "test")

Counting Word Occurrences

function countWord(text, word) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`\\b${escaped}\\b`, 'gi');
  return (text.match(regex) || []).length;
}

const article = 'The the THE cat sat on the mat with the cat';
console.log(countWord(article, 'the')); // 5
console.log(countWord(article, 'cat')); // 2
console.log(countWord(article, 'at')); // 0 ("at" doesn't appear as a standalone word)
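
Calling countWord once per word rescans the text each time. As a rough sketch, several words can be tallied in a single pass with one alternation. countWords is a hypothetical helper; it assumes a non-empty list of words made of \w characters and an engine that supports String.prototype.matchAll (ES2020).

```javascript
// Hypothetical helper: counts several whole words in one pass, ignoring case.
function countWords(text, words) {
  const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Start every requested word at zero so absent words still appear in the result.
  const counts = new Map(words.map((w) => [w.toLowerCase(), 0]));
  const regex = new RegExp(`\\b(?:${words.map(escapeRegex).join('|')})\\b`, 'gi');

  for (const match of text.matchAll(regex)) {
    const key = match[0].toLowerCase();
    counts.set(key, counts.get(key) + 1);
  }
  return counts;
}

const tally = countWords('The the THE cat sat on the mat with the cat', ['the', 'cat', 'at']);
console.log(Object.fromEntries(tally)); // { the: 5, cat: 2, at: 0 }
```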

Highlighting Search Terms

function highlightWord(text, searchTerm) {
  const escaped = searchTerm.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`\\b(${escaped})\\b`, 'gi');
  return text.replace(regex, '<mark>$1</mark>');
}

const result = highlightWord(
  'JavaScript is great. Learn JavaScript today!',
  'JavaScript'
);
console.log(result);
// "<mark>JavaScript</mark> is great. Learn <mark>JavaScript</mark> today!"

Highlighting multiple words:

function highlightWords(text, words) {
  const escaped = words.map(w => w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  const pattern = escaped.join('|');
  const regex = new RegExp(`\\b(${pattern})\\b`, 'gi');
  return text.replace(regex, '<mark>$1</mark>');
}

const result = highlightWords(
  'The cat and the dog sat on the mat',
  ['cat', 'dog', 'mat']
);
console.log(result);
// "The <mark>cat</mark> and the <mark>dog</mark> sat on the <mark>mat</mark>"

Matching at Word Start or Word End Only

You can use a single \b to match at the start or end of words:

const text = 'preview review view viewing overview';

// Words starting with "view"
console.log(text.match(/\bview\w*/g));
// ["view", "viewing"]

// Words ending with "view"
console.log(text.match(/\w*view\b/g));
// ["preview", "review", "view", "overview"]

// Words containing "view" (with boundaries on both sides of the full word)
console.log(text.match(/\b\w*view\w*\b/g));
// ["preview", "review", "view", "viewing", "overview"]

// Find words starting with a prefix
function wordsStartingWith(text, prefix) {
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`\\b${escaped}\\w*`, 'gi');
  return text.match(regex) || [];
}

console.log(wordsStartingWith('unhappy unlikely undo happy done', 'un'));
// ["unhappy", "unlikely", "undo"]

// Find words ending with a suffix
function wordsEndingWith(text, suffix) {
  const escaped = suffix.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`\\w*${escaped}\\b`, 'gi');
  return text.match(regex) || [];
}

console.log(wordsEndingWith('running jumping sitting walking', 'ing'));
// ["running", "jumping", "sitting", "walking"]

console.log(wordsEndingWith('singer ringer finger danger', 'nger'));
// ["singer", "ringer", "finger", "danger"]

Matching Specific Word Patterns

// Match words that are exactly N characters long
function wordsOfLength(text, length) {
  const regex = new RegExp(`\\b\\w{${length}}\\b`, 'g');
  return text.match(regex) || [];
}

console.log(wordsOfLength('I am a good JavaScript developer', 3));
// []

console.log(wordsOfLength('The cat sat on the big mat', 3));
// ["The", "cat", "sat", "the", "big", "mat"]

// Match words containing digits
const text = 'version2 is better than v1 and abc';
console.log(text.match(/\b\w*\d\w*\b/g));
// ["version2", "v1"]

// Match words that are all uppercase
const mixed = 'The API uses HTTP and REST for IO';
console.log(mixed.match(/\b[A-Z]{2,}\b/g));
// ["API", "HTTP", "REST", "IO"]

// Match capitalized words (first letter uppercase, rest lowercase)
const sentence = 'Alice and Bob went to New York City';
console.log(sentence.match(/\b[A-Z][a-z]+\b/g));
// ["Alice", "Bob", "New", "York", "City"]

Validating Specific Word Formats

// Check if a string is a valid variable name (simplified)
function isValidVarName(str) {
  // ^ and $ already enforce a full match, so \b is not needed here.
  // Adding it would even reject names that start or end with "$",
  // because "$" is not a \w character.
  return /^[a-zA-Z_$][a-zA-Z0-9_$]*$/.test(str);
}

// Better use case: check whether a keyword appears as a whole word in code
function containsKeyword(code, keyword) {
  const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`\\b${escaped}\\b`);
  return regex.test(code);
}

console.log(containsKeyword('let x = 10', 'let')); // true
console.log(containsKeyword('letter = "abc"', 'let')); // false! (\b catches this)
console.log(containsKeyword('const fn = () => {}', 'const')); // true
console.log(containsKeyword('constant = 5', 'const')); // false

Matching Numbers as Whole Words

const text = 'There are 42 items and item42 costs $100';

// Match standalone numbers (not part of a larger word)
console.log(text.match(/\b\d+\b/g));
// ["42", "100"]

// The same pattern on a second string:
console.log('item42 has 42 items'.match(/\b\d+\b/g));
// ["42"] (only the standalone "42")
// Because in "item42", there's no boundary between "m" and "4" (both are \w characters)

// To match "42" even inside "item42", don't use \b:
console.log('item42 has 42 items'.match(/\d+/g));
// ["42", "42"]
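
Because the dot is not a word character, \b\d+\b splits decimal numbers at the decimal point. A possible way around this, sketched below with an illustrative sample string, is to make the fractional part an optional piece of the pattern.

```javascript
const measurements = 'Pi is about 3.14 and the answer is 42';

// \b\d+\b treats the dot as a separator, so "3.14" is split in two.
const integersOnly = measurements.match(/\b\d+\b/g);
console.log(integersOnly); // ["3", "14", "42"]

// Allowing an optional ".digits" part keeps the decimal number together.
const withDecimals = measurements.match(/\b\d+(?:\.\d+)?\b/g);
console.log(withDecimals); // ["3.14", "42"]
```

This sketch does not handle signs, exponents, or thousands separators.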

The Hyphenated Word Challenge

A common issue with \b is that hyphens are not word characters, so hyphenated words are treated as separate words:

const text = 'The well-known self-driving state-of-the-art car';

// \b treats each hyphen-separated part as its own word
console.log(text.match(/\bself\b/g));
// ["self"] (matches "self" in "self-driving")

console.log(text.match(/\bself-driving\b/g));
// ["self-driving"] (this works because \b only checks the outer edges:
// before "s" (space → word = boundary ✓) and after "g" (word → space = boundary ✓))

// The hyphens inside are just matched literally

When you need to match whole hyphenated terms:

// Match hyphenated words as complete units
function findHyphenatedWord(text, word) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`\\b${escaped}\\b`, 'gi');
  return text.match(regex) || [];
}

console.log(findHyphenatedWord(
  'The state-of-the-art system is state-of-the-art',
  'state-of-the-art'
));
// ["state-of-the-art", "state-of-the-art"]
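
When hyphenated compounds should count as single words, two small patterns can help. The sketch below is one possible approach, not the only one: the first pattern tokenizes text with hyphenated terms kept whole, and the second uses lookarounds (ES2018 or later) so that a word is rejected when a hyphen sits next to it. The sample strings are illustrative.

```javascript
const phrase = 'The well-known self-driving state-of-the-art car';

// Treat runs of word characters joined by single hyphens as one token.
const tokens = phrase.match(/\b\w+(?:-\w+)*\b/g);
console.log(tokens);
// ["The", "well-known", "self-driving", "state-of-the-art", "car"]

const advice = 'Be true to your self, not your self-image';

// Plain \b also matches the "self" inside "self-image".
const looseMatches = advice.match(/\bself\b/g);
console.log(looseMatches); // ["self", "self"]

// Rejecting a neighboring word character OR hyphen keeps only the standalone word.
const strictMatches = advice.match(/(?<![\w-])self(?![\w-])/g);
console.log(strictMatches); // ["self"]
```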

The Unicode Limitation

The \b boundary is based on \w, which only covers ASCII word characters ([a-zA-Z0-9_]). This means \b does not recognize accented letters, characters from non-Latin scripts, or other Unicode letters as word characters:

// \b doesn't work correctly with accented characters
const text = 'The café serves crème brûlée';

console.log(text.match(/\bcafé\b/g));
// null! Because é is not a \w character, the position after "é" has
// non-word characters on both sides (é and the space), so the closing \b fails

console.log(text.replace(/\b/g, '|'));
// "|The| |caf|é |serves| |cr|è|me| |br|û|l|é|e|"
// Boundaries appear in the middle of words at accented characters!

For Unicode-aware word boundaries, you need alternative approaches:

// Approach 1: Use Unicode property escapes with lookahead/lookbehind
// Match word boundaries for Unicode letters
function unicodeWordMatch(text, word) {
  // Use lookbehind and lookahead to check for non-letter boundaries
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(
    `(?<=^|[^\\p{Letter}\\p{Mark}])${escaped}(?=[^\\p{Letter}\\p{Mark}]|$)`,
    'giu'
  );
  return text.match(regex) || [];
}

console.log(unicodeWordMatch('The café serves good café au lait', 'café'));
// ["café", "café"]

// Approach 2: Simpler but less precise: only check that the surrounding characters are not letters
function findUnicodeWord(text, word) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Match word when surrounded by non-letters or string boundaries
  const regex = new RegExp(`(?<![\\p{Letter}])${escaped}(?![\\p{Letter}])`, 'giu');
  return text.match(regex) || [];
}

console.log(findUnicodeWord('Привет мир Приветствие', 'Привет'));
// ["Привет"] (only the standalone word, not inside "Приветствие")

Tip: If your application handles international text, be aware that \b only understands ASCII word boundaries. For correct word boundary detection with Unicode text, use lookahead and lookbehind with \p{Letter} and the u or v flag as shown above.
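
The same lookaround idea can be turned into a general stand-in for \b itself. The sketch below is an assumption about what should count as a "word character" (letters, combining marks, numbers, and underscore); adjust the class to your own needs. WORD and unicodeBoundary are illustrative names.

```javascript
// Assumed definition of a Unicode "word character" for this sketch.
const WORD = '[\\p{Letter}\\p{Mark}\\p{Number}_]';

// Zero-width pattern: a word character on exactly one side of the position.
const unicodeBoundary = new RegExp(
  `(?<=${WORD})(?!${WORD})|(?<!${WORD})(?=${WORD})`,
  'gu'
);

const menu = 'The café serves crème brûlée';

const asciiMarked = menu.replace(/\b/g, '|');
const unicodeMarked = menu.replace(unicodeBoundary, '|');

console.log(asciiMarked);   // "|The| |caf|é |serves| |cr|è|me| |br|û|l|é|e|"
console.log(unicodeMarked); // "|The| |café| |serves| |crème| |brûlée|"
```

Including \p{Mark} keeps words intact even when accents are stored as separate combining characters rather than precomposed letters.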

Building a Complete Word Search Utility

Here is a practical, reusable word search function that handles common edge cases:

class WordSearcher {
  constructor(text) {
    this.text = text;
  }

  // Find whole word occurrences
  findWord(word, caseSensitive = false) {
    const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const flags = caseSensitive ? 'g' : 'gi';
    const regex = new RegExp(`\\b${escaped}\\b`, flags);
    const matches = [];
    let match;

    while ((match = regex.exec(this.text)) !== null) {
      matches.push({
        word: match[0],
        index: match.index,
        context: this.getContext(match.index, match[0].length)
      });
    }

    return matches;
  }

  // Replace whole words only
  replaceWord(oldWord, newWord, caseSensitive = false) {
    const escaped = oldWord.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const flags = caseSensitive ? 'g' : 'gi';
    const regex = new RegExp(`\\b${escaped}\\b`, flags);
    return this.text.replace(regex, newWord);
  }

  // Get surrounding context
  getContext(index, length, padding = 20) {
    const start = Math.max(0, index - padding);
    const end = Math.min(this.text.length, index + length + padding);
    let context = this.text.slice(start, end);
    if (start > 0) context = '...' + context;
    if (end < this.text.length) context = context + '...';
    return context;
  }

  // Count occurrences
  countWord(word, caseSensitive = false) {
    return this.findWord(word, caseSensitive).length;
  }
}

// Usage
const searcher = new WordSearcher(
  'The cat catalog has a categorical list of cat breeds and caterpillars'
);

console.log(searcher.findWord('cat'));
// Two matches for standalone "cat" only, with context

console.log(searcher.countWord('cat'));
// 2

console.log(searcher.replaceWord('cat', 'dog'));
// "The dog catalog has a categorical list of dog breeds and caterpillars"
// Only standalone "cat" is replaced
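
As a possible variation, the exec loop in findWord can be replaced with String.prototype.matchAll (ES2020). The sketch below is a standalone function rather than a change to the class; findWordPositions is a hypothetical name, and it omits the context field for brevity.

```javascript
// Hypothetical standalone variant of findWord built on matchAll.
function findWordPositions(text, word, caseSensitive = false) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`\\b${escaped}\\b`, caseSensitive ? 'g' : 'gi');
  // matchAll requires the g flag and yields one match object per occurrence.
  return Array.from(text.matchAll(regex), (m) => ({ word: m[0], index: m.index }));
}

const hits = findWordPositions(
  'The cat catalog has a categorical list of cat breeds and caterpillars',
  'cat'
);
console.log(hits);
// [ { word: 'cat', index: 4 }, { word: 'cat', index: 42 } ]
```

One practical difference: matchAll advances past zero-length matches on its own, whereas a manual exec loop never terminates if the pattern can match an empty string (for example, when the search word is empty), because lastIndex stops moving forward.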

Keyword Detection in Code

// Detect JavaScript keywords in a code string
const JS_KEYWORDS = [
  'const', 'let', 'var', 'function', 'return', 'if', 'else',
  'for', 'while', 'do', 'switch', 'case', 'break', 'continue',
  'class', 'extends', 'new', 'this', 'super', 'import', 'export',
  'default', 'try', 'catch', 'finally', 'throw', 'async', 'await',
  'yield', 'typeof', 'instanceof', 'in', 'of', 'void', 'delete',
  'true', 'false', 'null', 'undefined'
];

function findKeywords(code) {
  const pattern = JS_KEYWORDS
    .map(k => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('|');
  const regex = new RegExp(`\\b(${pattern})\\b`, 'g');
  return code.match(regex) || [];
}

const code = 'const result = await fetchData(); if (result) return result;';
console.log(findKeywords(code));
// ["const", "await", "if", "return"]
// "result" is not a keyword, correctly excluded

Summary

The word boundary \b is an essential anchor for precise text matching in JavaScript regular expressions:

  • \b matches a position, not a character. It asserts that one side is a word character (\w = [a-zA-Z0-9_]) and the other side is a non-word character or the start/end of the string.
  • \bword\b is the standard pattern for whole-word matching, ensuring the pattern is not part of a longer word. This is critical for search, replace, and validation operations.
  • \B is the inverse, matching positions that are not word boundaries (inside words or between non-word characters).
  • Use \b at the start only (\bprefix) to match words beginning with a pattern, or at the end only (suffix\b) to match words ending with a pattern.
  • Hyphens and punctuation create word boundaries because they are not \w characters. Hyphenated terms like "state-of-the-art" can still be matched literally with \b at the outer edges.
  • \b is ASCII-only: it does not understand accented characters or non-Latin scripts as word characters. For Unicode-aware boundaries, use lookahead and lookbehind with \p{Letter} and the u flag.
  • Always escape user input before inserting it into a \b-bounded regex pattern to prevent special characters from breaking the regex.