How to Use Regular Expression Character Classes in JavaScript
When writing regular expressions, you rarely want to match a single specific character. Most of the time, you need to match a category of characters: any digit, any letter, any whitespace, or any character at all. Character classes are the regex feature that makes this possible. Instead of listing every individual character you want to match, you use a shorthand that represents an entire group.
JavaScript regular expressions provide built-in character classes for the most common categories (digits, word characters, whitespace) along with their inverse counterparts. The dot (.) serves as the broadest character class, matching almost any character. Understanding how each character class works, what it includes and excludes, and how the u flag changes their behavior is fundamental to writing effective regular expressions.
\d, \s, \w: Digits, Whitespace, Word Characters
JavaScript provides three built-in character classes that cover the most frequently needed character categories. Each is written as a backslash followed by a lowercase letter.
\d: Any Digit
The \d class matches a single digit character from 0 to 9. It is equivalent to the character set [0-9].
const text = 'Order #4521 placed on 2024-03-15';
// Find individual digits
console.log(text.match(/\d/g));
// ["4", "5", "2", "1", "2", "0", "2", "4", "0", "3", "1", "5"]
// Find sequences of digits
console.log(text.match(/\d+/g));
// ["4521", "2024", "03", "15"]
\d is one of the most commonly used character classes. Here are practical examples:
// Extract a phone number
const message = 'Call me at 555-123-4567 today';
const phone = message.match(/\d{3}-\d{3}-\d{4}/);
console.log(phone[0]); // "555-123-4567"
// Validate that a string contains only digits
function isNumeric(str) {
return /^\d+$/.test(str);
}
console.log(isNumeric('12345')); // true
console.log(isNumeric('123a5')); // false
console.log(isNumeric('')); // false
// Extract a price from text
const price = 'Total: $29.99 USD'.match(/\d+\.\d{2}/);
console.log(price[0]); // "29.99"
In JavaScript, \d matches only ASCII digits (0-9), with or without the u or v flag. It does not match digits from other scripts such as Arabic-Indic (٠١٢٣) or Devanagari (०१२३). To match those, use a Unicode property escape, which requires the u or v flag: \p{Number} matches any Unicode numeric character (including fractions and Roman numerals), while the narrower \p{Nd} matches only decimal digits:
const text = 'Price: ١٢٣';
console.log(text.match(/\d+/g)); // null (ASCII digits only)
console.log(text.match(/\p{Number}+/gu)); // ["١٢٣"] (any Unicode numeric character)
console.log(text.match(/\p{Nd}+/gu)); // ["١٢٣"] (decimal digits from any script)
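As a rough illustration of how these classes differ, the sketch below checks a few sample characters against \d, \p{Nd} (the Decimal_Number category), and \p{Number}. The helper name classifyNumeric and the sample characters are illustrative choices, not part of the examples above.

```javascript
// Hypothetical helper: reports which numeric classes a single character matches.
function classifyNumeric(ch) {
  return {
    asciiDigit: /^\d$/.test(ch),         // ASCII 0-9 only
    decimalDigit: /^\p{Nd}$/u.test(ch),  // decimal digits in any script
    anyNumber: /^\p{Number}$/u.test(ch), // digits, fractions, Roman numerals, ...
  };
}

console.log(classifyNumeric('7'));
// { asciiDigit: true, decimalDigit: true, anyNumber: true }
console.log(classifyNumeric('\u0667')); // Arabic-Indic digit seven
// { asciiDigit: false, decimalDigit: true, anyNumber: true }
console.log(classifyNumeric('\u2167')); // Roman numeral eight
// { asciiDigit: false, decimalDigit: false, anyNumber: true }
console.log(classifyNumeric('\u00BD')); // vulgar fraction one half
// { asciiDigit: false, decimalDigit: false, anyNumber: true }
```

If the matched text will be passed to Number() or parseInt(), note that those functions only understand ASCII digits, so non-ASCII digits would need to be converted first.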
\s: Any Whitespace Character
The \s class matches a single whitespace character. This includes:
- Space ( )
- Tab (\t)
- Newline (\n)
- Carriage return (\r)
- Form feed (\f)
- Vertical tab (\v)
- Other Unicode whitespace characters (non-breaking space, etc.)
It is roughly equivalent to [ \t\n\r\f\v] plus additional Unicode whitespace.
const text = 'Hello World\tJavaScript\nRegex';
// Find all whitespace characters
console.log(text.match(/\s/g));
// [" ", "\t", "\n"]
// Split on any whitespace
console.log(text.split(/\s+/));
// ["Hello", "World", "JavaScript", "Regex"]
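To make the "other Unicode whitespace" point concrete, the following sketch tests a few code points against \s. The list of sample characters is an illustrative selection, not an exhaustive one.

```javascript
// Sample characters to probe; the labels are for display only.
const whitespaceSamples = [
  ['space', ' '],
  ['non-breaking space', '\u00A0'],
  ['line separator', '\u2028'],
  ['ideographic space', '\u3000'],
  ['byte order mark', '\uFEFF'],
  ['zero-width space', '\u200B'], // a format character, not whitespace for \s
];

const whitespaceResults = whitespaceSamples.map(([label, ch]) => [label, /\s/.test(ch)]);
console.log(whitespaceResults);
// [["space", true], ["non-breaking space", true], ["line separator", true],
//  ["ideographic space", true], ["byte order mark", true], ["zero-width space", false]]
```

One practical consequence: splitting or collapsing on \s will not remove a zero-width space, so pasted text can still contain invisible characters after normalization.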
Common uses for \s:
// Trim extra whitespace (normalize spaces)
const messy = ' Hello World ';
const clean = messy.trim().replace(/\s+/g, ' ');
console.log(clean); // "Hello World"
// Check if a string contains only whitespace
function isBlank(str) {
return /^\s*$/.test(str);
}
console.log(isBlank('')); // true
console.log(isBlank(' ')); // true
console.log(isBlank(' \t\n ')); // true
console.log(isBlank(' hi ')); // false
// Match text between whitespace boundaries
const csv = 'Alice 30 Engineer';
const fields = csv.match(/\S+/g);
console.log(fields); // ["Alice", "30", "Engineer"]
\s is particularly useful for parsing structured text where fields are separated by variable amounts of whitespace:
// Parse a log line with variable spacing
const logLine = '2024-03-15 08:30:22 INFO Server started on port 3000';
const parts = logLine.split(/\s+/);
console.log(parts);
// ["2024-03-15", "08:30:22", "INFO", "Server", "started", "on", "port", "3000"]
const [date, time, level, ...messageParts] = parts;
console.log(date); // "2024-03-15"
console.log(time); // "08:30:22"
console.log(level); // "INFO"
console.log(messageParts.join(' ')); // "Server started on port 3000"
\w: Any Word Character
The \w class matches a single "word" character. In JavaScript, this means:
- Letters a to z and A to Z
- Digits 0 to 9
- The underscore _
It is equivalent to [a-zA-Z0-9_].
const text = 'hello_world-123 foo@bar';
// Find all word characters
console.log(text.match(/\w/g));
// ["h","e","l","l","o","_","w","o","r","l","d","1","2","3","f","o","o","b","a","r"]
// Find sequences of word characters
console.log(text.match(/\w+/g));
// ["hello_world", "123", "foo", "bar"]
\w is commonly used for matching identifiers, variable names, and tokens:
// Validate a username (word characters only)
function isValidIdentifier(name) {
return /^\w+$/.test(name);
}
console.log(isValidIdentifier('user_name')); // true
console.log(isValidIdentifier('userName123')); // true
console.log(isValidIdentifier('user-name')); // false (hyphen is not \w)
console.log(isValidIdentifier('user name')); // false (space is not \w)
// Extract variable names from code
const code = 'let myVar = someFunc(arg1, arg2);';
const identifiers = code.match(/\b[a-zA-Z_]\w*\b/g);
console.log(identifiers);
// ["let", "myVar", "someFunc", "arg1", "arg2"]
\w only matches ASCII word characters. It does not match accented letters, characters from non-Latin scripts, or any letters outside the basic English alphabet:
const text = 'café résumé naïve über straße';
console.log(text.match(/\w+/g));
// ["caf", "r", "sum", "na", "ve", "ber", "stra", "e"]
// Accented characters split the words!
// To match Unicode word characters, use \p{Letter} with the u flag
console.log(text.match(/[\p{Letter}\p{Mark}]+/gu));
// ["café", "résumé", "naïve", "über", "straße"]
This is one of the most common surprises with \w. If your application handles international text, \w is likely insufficient.
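A minimal sketch of a Unicode-aware counterpart to /^\w+$/, assuming that "word character" should mean any letter, combining mark, decimal digit, or underscore. The names unicodeWord and isUnicodeWord are hypothetical.

```javascript
// \p{L} = letters, \p{M} = combining marks, \p{Nd} = decimal digits.
// Property escapes require the u (or v) flag.
const unicodeWord = /^[\p{L}\p{M}\p{Nd}_]+$/u;

function isUnicodeWord(str) {
  return unicodeWord.test(str);
}

console.log(isUnicodeWord('café'));      // true
console.log(isUnicodeWord('straße'));    // true
console.log(isUnicodeWord('user_name')); // true
console.log(isUnicodeWord('user-name')); // false (hyphen is still excluded)
console.log(/^\w+$/.test('café'));       // false (ASCII-only \w rejects "é")
```

Including \p{M} matters for decomposed text, where an accented letter is stored as a base letter followed by a combining mark.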
Combining Character Classes
Character classes can be combined with other regex features to build more complex patterns:
// Match a date format: YYYY-MM-DD
const dateRegex = /\d{4}-\d{2}-\d{2}/;
console.log(dateRegex.test('2024-03-15')); // true
// Match a simple email pattern
const emailRegex = /\w+@\w+\.\w+/;
console.log(emailRegex.test('user@example.com')); // true
// Match a hex color code
const hexColor = /#[0-9a-fA-F]{6}\b/g;
const css = 'color: #ff5733; background: #1a2b3c;';
console.log(css.match(hexColor)); // ["#ff5733", "#1a2b3c"]
// Match time in HH:MM or HH:MM:SS format
const timeRegex = /\d{2}:\d{2}(:\d{2})?/g;
const log = 'Started at 08:30, finished at 14:45:22';
console.log(log.match(timeRegex)); // ["08:30", "14:45:22"]
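The patterns above are unanchored and only check shape, so /\d{4}-\d{2}-\d{2}/ also accepts strings such as '2024-13-45' or 'x2024-03-15y'. The following is a sketch of a stricter check, assuming a hypothetical helper named isIsoDate: it anchors the pattern with ^ and $ and then lets Date verify the month and day ranges.

```javascript
// Hypothetical helper: true only for real calendar dates written as YYYY-MM-DD.
function isIsoDate(str) {
  // ^ and $ force the whole string to match, not just a substring.
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(str);
  if (m === null) return false;

  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);

  // Date.UTC rolls invalid values over (e.g. Feb 30 becomes Mar 1),
  // so a round-trip comparison detects impossible dates.
  const d = new Date(Date.UTC(year, month - 1, day));
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}

console.log(isIsoDate('2024-03-15'));   // true
console.log(isIsoDate('2024-02-29'));   // true (leap year)
console.log(isIsoDate('2023-02-29'));   // false
console.log(isIsoDate('2024-13-45'));   // false
console.log(isIsoDate('x2024-03-15y')); // false
```

The regex handles the format and the Date round-trip handles the semantics; trying to encode month lengths in the pattern itself quickly becomes unreadable.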
Character Classes with Quantifiers
Character classes become especially powerful when combined with quantifiers:
// \d{n} - exactly n digits
console.log('12345'.match(/\d{3}/)); // ["123"]
// \d{n,m} - between n and m digits
console.log('12345'.match(/\d{2,4}/)); // ["1234"]
// \d+ - one or more digits
console.log('abc123def456'.match(/\d+/g)); // ["123", "456"]
// \d* - zero or more digits
console.log('abc'.match(/\d*/)); // [""] (matches zero digits at the start)
// \d? - zero or one digit
console.log('A1B'.match(/[A-Z]\d?/g)); // ["A1", "B"]
// \s+ - one or more whitespace (common for splitting)
console.log('hello world'.split(/\s+/)); // ["hello", "world"]
// \w{3,} - word characters, 3 or more (filter short words)
const text = 'I am a JS developer';
console.log(text.match(/\w{3,}/g)); // ["developer"]
A fuller example that filters words by length:
// Filter words by length
const sentence = 'The quick brown fox jumps over the lazy dog';
const longWords = sentence.match(/\w{4,}/g);
console.log(longWords);
// ["quick", "brown", "jumps", "over", "lazy"]
\D, \S, \W: Inverse Classes
Every character class has an inverse (or negated) counterpart, written with an uppercase letter. The inverse matches any character that the lowercase version does not match.
| Class | Matches | Inverse | Inverse matches |
|---|---|---|---|
| \d | Digits [0-9] | \D | Non-digits [^0-9] |
| \s | Whitespace | \S | Non-whitespace |
| \w | Word characters [a-zA-Z0-9_] | \W | Non-word characters [^a-zA-Z0-9_] |
\D: Any Non-Digit
\D matches any character that is not a digit. It is equivalent to [^0-9]:
const text = 'Order #4521 on 2024-03-15';
// Find all non-digit characters
console.log(text.match(/\D+/g));
// ["Order #", " on ", "-", "-"]
// Remove all non-digits (extract only numbers)
const digitsOnly = text.replace(/\D/g, '');
console.log(digitsOnly); // "452120240315"
Practical uses:
// Clean a phone number: remove everything that's not a digit
function cleanPhone(phone) {
return phone.replace(/\D/g, '');
}
console.log(cleanPhone('(555) 123-4567')); // "5551234567"
console.log(cleanPhone('+1-555-123-4567')); // "15551234567"
console.log(cleanPhone('555.123.4567')); // "5551234567"
// Verify a string contains at least one non-digit
function hasNonDigit(str) {
return /\D/.test(str);
}
console.log(hasNonDigit('12345')); // false
console.log(hasNonDigit('123a5')); // true
console.log(hasNonDigit('hello')); // true
\S: Any Non-Whitespace
\S matches any character that is not whitespace. It is equivalent to [^\s]:
const text = ' Hello World ';
// Find all non-whitespace sequences
console.log(text.match(/\S+/g));
// ["Hello", "World"]
// Check if a string has any visible content
function hasContent(str) {
return /\S/.test(str);
}
console.log(hasContent(' ')); // false
console.log(hasContent(' hi ')); // true
console.log(hasContent('')); // false
console.log(hasContent('\t\n')); // false
\S is frequently used as a quick way to match "any visible character":
// Match a simple URL pattern (any non-whitespace after http)
const text = 'Visit https://example.com/path?q=1 for details';
const url = text.match(/https?:\/\/\S+/);
console.log(url[0]); // "https://example.com/path?q=1"
// Extract words (sequences of non-whitespace)
const line = 'Alice 30 Engineer NYC';
const columns = line.match(/\S+/g);
console.log(columns); // ["Alice", "30", "Engineer", "NYC"]
\W: Any Non-Word Character
\W matches any character that is not a word character. It is equivalent to [^a-zA-Z0-9_]. This includes spaces, punctuation, special characters, and any non-ASCII characters:
const text = 'hello_world! foo@bar.com (test)';
// Find all non-word characters
console.log(text.match(/\W/g));
// ["!", " ", "@", ".", " ", "(", ")"]
// Find sequences of non-word characters
console.log(text.match(/\W+/g));
// ["! ", "@", ".", " (", ")"]
Practical uses:
// Split text into words, removing punctuation
const sentence = "Hello, world! How's it going? Fine - thanks.";
const words = sentence.split(/\W+/).filter(Boolean);
console.log(words);
// ["Hello", "world", "How", "s", "it", "going", "Fine", "thanks"]
// Slugify a string (convert to URL-friendly format)
function slugify(text) {
return text
.toLowerCase()
.replace(/\W+/g, '-') // Replace non-word chars with hyphens
.replace(/^-+|-+$/g, ''); // Trim leading/trailing hyphens
}
console.log(slugify('Hello World!')); // "hello-world"
console.log(slugify(' JavaScript & Regex ')); // "javascript-regex"
console.log(slugify('What is This?!')); // "what-is-this"
// Count punctuation characters
const essay = 'Well, hello there! How are you? I am fine.';
const punctuation = essay.match(/[^\w\s]/g);
console.log(punctuation); // [",", "!", "?", "."]
console.log(`Punctuation count: ${punctuation.length}`); // 4
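Because \W is ASCII-based, the slugify function above also treats accented letters as separators: slugify('Café') returns "caf". One possible workaround, sketched here under the assumption that stripping diacritics is acceptable for your URLs, is to decompose the text with String.prototype.normalize before replacing. The name slugifyAscii is hypothetical.

```javascript
// Hypothetical variant of slugify that folds accented letters to ASCII first.
function slugifyAscii(text) {
  return text
    .normalize('NFD')               // split "é" into "e" + combining accent
    .replace(/\p{M}+/gu, '')        // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')    // replace everything else with hyphens
    .replace(/^-+|-+$/g, '');       // trim leading/trailing hyphens
}

console.log(slugifyAscii('Café Résumé!')); // "cafe-resume"
console.log(slugifyAscii('Über uns'));     // "uber-uns"
console.log(slugifyAscii('Hello World!')); // "hello-world"
```

This only helps for letters that decompose into an ASCII base plus marks. Characters without such a decomposition are still replaced by hyphens, so non-Latin scripts need a different strategy.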
Using a Class and Its Inverse Together
A character class combined with its inverse matches everything. This technique is useful as an alternative to the dot when you need to match any character including newlines:
// [\s\S] - any whitespace OR any non-whitespace = any character at all
// This was the classic workaround before the s flag existed
const html = `<div>
<p>Hello</p>
</div>`;
// . doesn't match newlines (without s flag)
console.log(html.match(/<div>.*<\/div>/)); // null
// [\s\S] matches everything including newlines
console.log(html.match(/<div>[\s\S]*<\/div>/));
// ["<div>\n <p>Hello</p>\n</div>"]
// Modern alternative: use the s flag
console.log(html.match(/<div>.*<\/div>/s));
// ["<div>\n <p>Hello</p>\n</div>"]
Other "match anything" combinations:
// All of these match any character including newlines:
/[\s\S]/ // whitespace or non-whitespace
/[\d\D]/ // digit or non-digit
/[\w\W]/ // word char or non-word char
/./s // dot with dotall flag (modern, preferred)
Summary Table
| Class | Meaning | Equivalent Set | Inverse | Inverse Meaning |
|---|---|---|---|---|
| \d | Any digit | [0-9] | \D | Any non-digit |
| \s | Any whitespace | [ \t\n\r\f\v...] | \S | Any non-whitespace |
| \w | Word character | [a-zA-Z0-9_] | \W | Non-word character |
The Dot . and the s Flag (Dotall)
The dot (.) is the most general built-in character class. It matches "almost any" character, with one important exception by default.
Default Behavior: Everything Except Newlines
Without any flags, . matches any single character except newline characters (\n, \r, line separator \u2028, paragraph separator \u2029):
console.log(/h.t/.test('hat')); // true (. matches "a")
console.log(/h.t/.test('hot')); // true (. matches "o")
console.log(/h.t/.test('h2t')); // true (. matches "2")
console.log(/h.t/.test('h!t')); // true (. matches "!")
console.log(/h.t/.test('h t')); // true (. matches " ")
console.log(/h.t/.test('h\tt')); // true (. matches tab)
console.log(/h.t/.test('h\nt')); // false (. does NOT match newline)
console.log(/h.t/.test('ht')); // false (. requires exactly one character)
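Two further edge cases are worth checking. The dot also rejects the other line terminators, and without the u flag it matches one UTF-16 code unit rather than one full code point. The sketch below uses an emoji purely as a sample character outside the Basic Multilingual Plane.

```javascript
// The dot rejects every line terminator, not just \n.
console.log(/^.$/.test('\r'));     // false
console.log(/^.$/.test('\u2028')); // false
console.log(/^.$/.test('\u2029')); // false

// U+1F600 is stored as a surrogate pair: two UTF-16 code units.
const emoji = '\u{1F600}';
console.log(emoji.length);        // 2
console.log(/^.$/.test(emoji));   // false (one dot covers only half of the pair)
console.log(/^..$/.test(emoji));  // true  (two dots, one per code unit)
console.log(/^.$/u.test(emoji));  // true  (u flag: the dot matches a whole code point)
```

If a pattern counts or captures "characters" with the dot and the input may contain emoji or other astral characters, adding the u flag avoids splitting them in half.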
The dot is commonly used when you know a character should be there but do not care what it is:
// Match any three characters that sit between word boundaries (digits count too)
const text = 'cat bat 123 sat mat';
console.log(text.match(/\b...\b/g));
// ["cat", "bat", "123", "sat", "mat"]
// Match a date in any separator format
const dates = ['2024-03-15', '2024/03/15', '2024.03.15'];
dates.forEach(d => {
console.log(/\d{4}.\d{2}.\d{2}/.test(d)); // true for all
});
// Match a file extension
const filename = 'report.pdf';
const ext = filename.match(/\.(.+)$/);
console.log(ext[1]); // "pdf"
The Problem with Multiline Content
The dot's inability to match newlines causes problems when working with text that spans multiple lines:
const html = `<p>
First paragraph
with multiple lines
</p>`;
// Trying to match everything between <p> tags
console.log(html.match(/<p>(.+)<\/p>/));
// null - the . can't cross the newlines
Before the s flag was introduced, developers used workarounds:
// Workaround 1: [\s\S] - matches any whitespace or non-whitespace
console.log(html.match(/<p>([\s\S]+)<\/p>/));
// Match found: "\nFirst paragraph\nwith multiple lines\n"
// Workaround 2: [\d\D] - matches any digit or non-digit
console.log(html.match(/<p>([\d\D]+)<\/p>/));
// Same result
// Workaround 3: [^] - empty negated class (matches anything)
// Note: this is valid in JavaScript, but most other regex engines reject it or treat it differently
console.log(html.match(/<p>([^]+)<\/p>/));
// Same result, but prefer [\s\S] or the s flag if the pattern may be ported to another engine
The s Flag (Dotall Mode)
The s flag, also called "dotall" mode, makes the dot match every character, including newlines:
const html = `<p>
First paragraph
with multiple lines
</p>`;
// With the s flag: . matches newlines too
console.log(html.match(/<p>(.+)<\/p>/s));
// Match found: "\nFirst paragraph\nwith multiple lines\n"
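The name "dotall" means "dot matches all." The flag was standardized in ES2018 and is available in all modern browsers and Node.js. For example, it makes extracting a multiline front-matter block straightforward: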
const template = `
---
title: Hello World
date: 2024-03-15
---
Content goes here
and here too
`;
// Extract content between --- markers (multiline)
const frontMatter = template.match(/---(.*?)---/s);
console.log(frontMatter[1].trim());
// "title: Hello World\ndate: 2024-03-15"
When to Use the s Flag
Use the s flag when your pattern needs to span across line breaks:
// Extracting multiline blocks
const code = `function hello() {
console.log("Hello");
console.log("World");
}`;
// Extract function body
const body = code.match(/\{(.*)\}/s);
console.log(body[1].trim());
// 'console.log("Hello");\n console.log("World");'
// Extracting HTML comments (which can span multiple lines)
const html = `<!-- This is
a multiline
comment -->
<p>Content</p>`;
const comment = html.match(/<!--(.+?)-->/s);
console.log(comment[1].trim());
// "This is\na multiline\ncomment"
// Extracting JSON blocks from mixed text
const response = `Some text before
{
"name": "Alice",
"age": 30
}
Some text after`;
const jsonBlock = response.match(/\{.*\}/s);
console.log(JSON.parse(jsonBlock[0]));
// { name: "Alice", age: 30 }
When NOT to Use the s Flag
Sometimes you specifically want . to stop at newlines. This is common when processing line-by-line content where newlines are meaningful boundaries:
const csv = `name,age,city
Alice,30,NYC
Bob,25,LA
Charlie,35,Chicago`;
// Without s: . stops at each line - correct behavior for line-based parsing
console.log(csv.match(/^\w+,.+$/gm));
// ["name,age,city", "Alice,30,NYC", "Bob,25,LA", "Charlie,35,Chicago"]
// With s: . would cross lines - wrong for line-based parsing
console.log(csv.match(/^\w+,.+$/gms));
// One match containing the entire string, because .+ now runs across the newlines
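For line-oriented data like this, the m flag without s pairs well with capture groups. The following is a sketch only, assuming a fixed three-column layout with a numeric second column; parseRows and rowRegex are hypothetical names, and real CSV (quoted fields, embedded commas) needs a proper parser.

```javascript
const csvText = `name,age,city
Alice,30,NYC
Bob,25,LA
Charlie,35,Chicago`;

// Hypothetical helper: one match per line, thanks to ^/$ with the m flag
// and a dot that cannot cross newlines.
function parseRows(text) {
  const rowRegex = /^(\w+),(\d+),(.+)$/gm;
  return Array.from(text.matchAll(rowRegex), ([, name, age, city]) => ({
    name,
    age: Number(age),
    city,
  }));
}

console.log(parseRows(csvText));
// [
//   { name: "Alice", age: 30, city: "NYC" },
//   { name: "Bob", age: 25, city: "LA" },
//   { name: "Charlie", age: 35, city: "Chicago" }
// ]
```

The header row is skipped here only because its second field ("age") is not made of digits, which is a side effect of using \d+ for that column.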
The Dot Is Greedy
By default, .+ and .* are greedy, meaning they match as many characters as possible. This can lead to unexpected results:
const html = '<b>Hello</b> and <b>World</b>';
// Greedy: .+ matches as much as possible
console.log(html.match(/<b>.+<\/b>/));
// ["<b>Hello</b> and <b>World</b>"]
// Matched from the FIRST <b> to the LAST </b>
// Lazy: .+? matches as little as possible
console.log(html.match(/<b>.+?<\/b>/g));
// ["<b>Hello</b>", "<b>World</b>"]
// Matched each <b>...</b> pair individually
This greedy vs. lazy distinction is covered in detail in a later guide, but it is important to be aware of when using the dot. Adding ? after + or * makes the quantifier lazy, matching the shortest possible string.
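A lazy quantifier is not the only fix. When the content between the delimiters cannot contain a particular character, a negated character set expresses that directly. A minimal sketch, assuming the bold text never contains a < character:

```javascript
const boldHtml = '<b>Hello</b> and <b>World</b>';

// [^<]+ matches one or more characters that are not "<",
// so it cannot run past the start of the closing tag.
const boldTags = boldHtml.match(/<b>[^<]+<\/b>/g);
console.log(boldTags); // ["<b>Hello</b>", "<b>World</b>"]

// Capture just the inner text with matchAll.
const boldText = Array.from(boldHtml.matchAll(/<b>([^<]+)<\/b>/g), (m) => m[1]);
console.log(boldText); // ["Hello", "World"]
```

The trade-off is that the negated set fails on nested markup such as <b>a <i>b</i></b>, where the lazy dot would still find a match.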
Dot vs. Specific Character Classes
The dot is convenient but imprecise. Using a more specific character class often produces more correct results:
// ❌ Too broad: . matches anything
const priceRegex = /\$.\../;
console.log(priceRegex.test('$9.99')); // true
console.log(priceRegex.test('$a.bc')); // true - probably not intended
// ✅ More precise: \d matches only digits
const betterPriceRegex = /\$\d\.\d\d/;
console.log(betterPriceRegex.test('$9.99')); // true
console.log(betterPriceRegex.test('$a.bc')); // false - correctly rejected
// ❌ Too broad: .+ in a URL pattern
const urlRegex = /https?:\/\/.+/;
console.log(urlRegex.test('http://example.com valid')); // true - matches the space and "valid" too
// ✅ More precise: \S+ stops at whitespace
const betterUrlRegex = /https?:\/\/\S+/;
const match = 'Visit http://example.com for info'.match(betterUrlRegex);
console.log(match[0]); // "http://example.com" - stops at the space
Use the dot when you genuinely do not care what character is in a position. When you have a reasonable expectation about what characters should appear, use a more specific class like \d, \w, \S, or a custom character set [a-z]. More specific patterns produce fewer false matches and are easier to debug.
Practical Patterns Using Character Classes
Here are complete, real-world examples that combine everything covered in this guide.
Parsing Key-Value Pairs
const config = `
host = localhost
port = 3000
debug = true
name = My Application
`;
const pairs = {};
const regex = /(\w+)\s*=\s*(.+)/g;
let match;
while ((match = regex.exec(config)) !== null) {
const key = match[1].trim();
const value = match[2].trim();
pairs[key] = value;
}
console.log(pairs);
// { host: "localhost", port: "3000", debug: "true", name: "My Application" }
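One caveat with the pattern above: \s also matches newlines, so a key with an empty value lets \s* run onto the next line and capture that line as the value. A possible line-anchored variant, sketched with a hypothetical parseConfig helper, uses [ \t]* for the padding and the m flag so that ^ and $ match at line boundaries.

```javascript
// Hypothetical helper: parses "key = value" lines, one pair per line.
function parseConfig(text) {
  const result = {};
  // [ \t]* allows spaces or tabs only, so a match never crosses a newline.
  const lineRegex = /^[ \t]*(\w+)[ \t]*=[ \t]*(.*)$/gm;
  for (const [, key, value] of text.matchAll(lineRegex)) {
    result[key] = value.trim();
  }
  return result;
}

const sampleConfig = `
host = localhost
empty =
name = My Application
`;

console.log(parseConfig(sampleConfig));
// { host: "localhost", empty: "", name: "My Application" }

// For comparison, the \s-based pattern swallows the line after "empty =".
const naiveKeys = Array.from(sampleConfig.matchAll(/(\w+)\s*=\s*(.+)/g), (m) => m[1]);
console.log(naiveKeys); // ["host", "empty"]
```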
Extracting Data from Structured Text
// Extract timestamps and messages from a log
const log = `
[2024-03-15 08:30:22] INFO Server started
[2024-03-15 08:30:23] DEBUG Connection pool initialized
[2024-03-15 08:31:01] ERROR Failed to connect to database
`;
const entries = [];
const logRegex = /\[(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\]\s(\w+)\s+(.+)/g;
for (const match of log.matchAll(logRegex)) {
entries.push({
timestamp: match[1],
level: match[2],
message: match[3]
});
}
console.log(entries);
// [
// { timestamp: "2024-03-15 08:30:22", level: "INFO", message: "Server started" },
// { timestamp: "2024-03-15 08:30:23", level: "DEBUG", message: "Connection pool initialized" },
// { timestamp: "2024-03-15 08:31:01", level: "ERROR", message: "Failed to connect to database" }
// ]
Input Sanitization
// Remove control characters and anything else outside an ASCII allowlist
function sanitize(input) {
// Keep word characters, whitespace, and common punctuation
return input.replace(/[^\w\s.,!?@#$%&*()\-+=:;"'\/\\]/g, '');
}
console.log(sanitize('Hello\x00World\x07!')); // "HelloWorld!"
// Normalize whitespace
function normalizeWhitespace(text) {
return text
.replace(/\s+/g, ' ') // Collapse multiple whitespace to single space
.trim(); // Remove leading/trailing whitespace
}
console.log(normalizeWhitespace(' Hello \t World \n '));
// "Hello World"
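The allowlist in sanitize is ASCII-only, so it also strips accented letters and non-Latin text. If the goal is specifically to drop control characters, one alternative is the Unicode property escape \p{Cc} (the Control category) with the u flag. This is a sketch; the name stripControlChars and the decision to keep tab, newline, and carriage return are assumptions.

```javascript
// Hypothetical helper: removes control characters but leaves other text intact.
function stripControlChars(input) {
  // (?![\t\n\r]) is a negative lookahead that spares common whitespace controls;
  // \p{Cc} then matches any remaining control character.
  return input.replace(/(?![\t\n\r])\p{Cc}/gu, '');
}

console.log(stripControlChars('Hello\x00World\x07!')); // "HelloWorld!"
console.log(stripControlChars('café\x00 ñandú'));      // "café ñandú"
console.log(stripControlChars('a\tb') === 'a\tb');     // true (tab is kept)
```

A denylist like this is more permissive than the allowlist approach, so which one fits depends on whether unknown characters should be kept or dropped by default.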
Redacting Sensitive Information
// Mask credit card numbers, keeping only the last 4 digits
function maskCardNumber(text) {
return text.replace(/\b(\d{4})\s?\d{4}\s?\d{4}\s?(\d{4})\b/g,
(match, first, last) => `****-****-****-${last}`
);
}
console.log(maskCardNumber('Card: 4532 1234 5678 9012'));
// "Card: ****-****-****-9012"
// Redact email addresses
function redactEmails(text) {
return text.replace(/\S+@\S+\.\S+/g, '[EMAIL REDACTED]');
}
console.log(redactEmails('Contact alice@example.com or bob@test.org'));
// "Contact [EMAIL REDACTED] or [EMAIL REDACTED]"
// Mask phone numbers
function maskPhone(text) {
return text.replace(/\d{3}[-.]?\d{3}[-.]?\d{4}/g, (match) => {
const digits = match.replace(/\D/g, '');
return `***-***-${digits.slice(-4)}`;
});
}
console.log(maskPhone('Call 555-123-4567 or 555.987.6543'));
// "Call ***-***-4567 or ***-***-6543"
Password Strength Checker
function checkPasswordStrength(password) {
const checks = {
length: password.length >= 8,
uppercase: /[A-Z]/.test(password),
lowercase: /[a-z]/.test(password),
digit: /\d/.test(password),
special: /\W/.test(password), // Non-word character (but includes space)
noSpaces: !/\s/.test(password), // No whitespace
noRepeating: !/(.)\1{2,}/.test(password) // No 3+ repeated characters
};
const passed = Object.values(checks).filter(Boolean).length;
const total = Object.keys(checks).length;
let strength;
if (passed <= 3) strength = 'weak';
else if (passed <= 5) strength = 'medium';
else strength = 'strong';
return { checks, passed, total, strength };
}
console.log(checkPasswordStrength('Abc123!@'));
// { checks: { length: true, uppercase: true, ... }, passed: 7, total: 7, strength: "strong" }
console.log(checkPasswordStrength('password'));
// { checks: { length: true, uppercase: false, ... }, passed: 4, total: 7, strength: "medium" }
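Note that /\W/ counts a space as "special" but not an underscore, because _ is a word character. If that is not what you want, a custom negated set is one option. The sketch below assumes "special" means anything other than ASCII letters, digits, and whitespace; hasSpecialChar is a hypothetical name.

```javascript
// Hypothetical alternative to the /\W/ check.
function hasSpecialChar(password) {
  // Negated set: any character that is not a letter, digit, or whitespace.
  return /[^A-Za-z0-9\s]/.test(password);
}

console.log(/\W/.test('Abc_1234'));      // false (underscore is a word character)
console.log(hasSpecialChar('Abc_1234')); // true
console.log(/\W/.test('Abc 1234'));      // true  (space is a non-word character)
console.log(hasSpecialChar('Abc 1234')); // false
console.log(hasSpecialChar('Abc123!@')); // true
```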
Summary
Character classes are the foundation of practical regular expression patterns in JavaScript:
- \d matches any ASCII digit (0-9). Use it for numbers, dates, phone numbers, and any numeric data. Its inverse \D matches any non-digit.
- \s matches any whitespace character (space, tab, newline, and others). Use it for splitting text, normalizing whitespace, and detecting blank strings. Its inverse \S matches any non-whitespace and is useful for matching visible content.
- \w matches ASCII word characters (letters, digits, underscore). Use it for identifiers, usernames, and tokens. Be aware that it does not match accented or non-Latin characters. Its inverse \W matches non-word characters like punctuation and spaces.
- The dot . matches any character except newline by default. Use the s flag (dotall mode) to make it match newlines too. Prefer specific character classes over the dot when you know what characters to expect.
- A class paired with its inverse ([\s\S], [\d\D], [\w\W]) matches any character at all, which was the standard workaround before the s flag existed.
- For international text, \d and \w only cover ASCII. Use Unicode property escapes (\p{Letter}, \p{Number}) with the u or v flag for full Unicode support.