
How to Use Regular Expression Character Classes in JavaScript

When writing regular expressions, you rarely want to match a single specific character. Most of the time, you need to match a category of characters: any digit, any letter, any whitespace, or any character at all. Character classes are the regex feature that makes this possible. Instead of listing every individual character you want to match, you use a shorthand that represents an entire group.

JavaScript regular expressions provide built-in character classes for the most common categories (digits, word characters, whitespace) along with their inverse counterparts. The dot (.) serves as the broadest character class, matching almost any character. Understanding how each character class works, what it includes and excludes, and how the u flag changes their behavior is fundamental to writing effective regular expressions.

\d, \s, \w: Digits, Whitespace, Word Characters

JavaScript provides three built-in character classes that cover the most frequently needed character categories. Each is written as a backslash followed by a lowercase letter.

\d: Any Digit

The \d class matches a single digit character from 0 to 9. It is equivalent to the character set [0-9].

const text = 'Order #4521 placed on 2024-03-15';

// Find individual digits
console.log(text.match(/\d/g));
// ["4", "5", "2", "1", "2", "0", "2", "4", "0", "3", "1", "5"]

// Find sequences of digits
console.log(text.match(/\d+/g));
// ["4521", "2024", "03", "15"]

\d is one of the most commonly used character classes. Here are practical examples:

// Extract a phone number
const message = 'Call me at 555-123-4567 today';
const phone = message.match(/\d{3}-\d{3}-\d{4}/);
console.log(phone[0]); // "555-123-4567"

// Validate that a string contains only digits
function isNumeric(str) {
  return /^\d+$/.test(str);
}

console.log(isNumeric('12345')); // true
console.log(isNumeric('123a5')); // false
console.log(isNumeric('')); // false

// Extract a price from text
const price = 'Total: $29.99 USD'.match(/\d+\.\d{2}/);
console.log(price[0]); // "29.99"

Note: \d matches only ASCII digits (0-9), with or without the u or v flag. It does not match digits from other scripts such as Arabic-Indic (٠١٢٣) or Devanagari (०१२३). To match decimal digits from any script, use \p{Nd} (Decimal_Number) with the u or v flag. The broader \p{Number} also matches other numeric characters, such as fractions (½) and Roman numerals (Ⅳ).

const text = 'Price: ١٢٣';

console.log(text.match(/\d+/g)); // null (ASCII digits only)
console.log(text.match(/\p{Nd}+/gu)); // ["١٢٣"] (decimal digits from any script)
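The three digit-related classes are easiest to compare side by side. The following is a minimal sketch rather than a definitive reference: the `classify` helper and the sample characters are illustrative assumptions, and \p{Nd} is the Unicode Decimal_Number category.

```javascript
// Hypothetical helper: reports which digit-related classes match one character.
function classify(ch) {
  return {
    asciiDigit: /^\d$/.test(ch),          // ASCII 0-9 only
    decimalDigit: /^\p{Nd}$/u.test(ch),   // decimal digits in any script
    anyNumber: /^\p{Number}$/u.test(ch),  // digits, fractions, letter-like numerals
  };
}

const digitSamples = ['7', '\u0667', '\u00BD', '\u2163']; // 7, Arabic-Indic seven, ½, Ⅳ
for (const ch of digitSamples) {
  console.log(ch, classify(ch));
}
// "7" matches all three classes
// Arabic-Indic seven matches \p{Nd} and \p{Number}, but not \d
// "½" and "Ⅳ" match only \p{Number}
```

If the goal is "something a user would type as a digit", \p{Nd} is usually the closer fit; \p{Number} is the better choice only when fractions and numeral letters should count as well.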

\s: Any Whitespace Character

The \s class matches a single whitespace character. This includes:

  • Space ( )
  • Tab (\t)
  • Newline (\n)
  • Carriage return (\r)
  • Form feed (\f)
  • Vertical tab (\v)
  • And other Unicode whitespace characters (non-breaking space, etc.)

It is roughly equivalent to [ \t\n\r\f\v] plus additional Unicode whitespace.
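To make the "additional Unicode whitespace" concrete, the sketch below tests a handful of characters against \s. The selection of characters is an illustrative assumption, not a complete list. Note the zero-width space (U+200B): despite its name, it is a format character and \s does not match it.

```javascript
// Characters chosen for illustration; each entry is [description, character].
const whitespaceCandidates = [
  ['space', ' '],
  ['non-breaking space', '\u00A0'],
  ['line separator', '\u2028'],
  ['ideographic space', '\u3000'],
  ['byte order mark', '\uFEFF'],
  ['zero-width space', '\u200B'],
];

const whitespaceReport = whitespaceCandidates.map(
  ([name, ch]) => `${name}: ${/\s/.test(ch)}`
);
console.log(whitespaceReport);
// Every entry reports true except "zero-width space: false"
```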

const text = 'Hello   World\tJavaScript\nRegex';

// Find all whitespace characters
console.log(text.match(/\s/g));
// [" ", " ", " ", "\t", "\n"]

// Split on any whitespace
console.log(text.split(/\s+/));
// ["Hello", "World", "JavaScript", "Regex"]

Common uses for \s:

// Trim extra whitespace (normalize spaces)
const messy = '   Hello     World   ';
const clean = messy.trim().replace(/\s+/g, ' ');
console.log(clean); // "Hello World"

// Check if a string contains only whitespace
function isBlank(str) {
  return /^\s*$/.test(str);
}

console.log(isBlank('')); // true
console.log(isBlank(' ')); // true
console.log(isBlank(' \t\n ')); // true
console.log(isBlank(' hi ')); // false

// Match text between whitespace boundaries
const csv = 'Alice 30 Engineer';
const fields = csv.match(/\S+/g);
console.log(fields); // ["Alice", "30", "Engineer"]

\s is particularly useful for parsing structured text where fields are separated by variable amounts of whitespace:

// Parse a log line with variable spacing
const logLine = '2024-03-15 08:30:22 INFO Server started on port 3000';
const parts = logLine.split(/\s+/);
console.log(parts);
// ["2024-03-15", "08:30:22", "INFO", "Server", "started", "on", "port", "3000"]

const [date, time, level, ...messageParts] = parts;
console.log(date); // "2024-03-15"
console.log(time); // "08:30:22"
console.log(level); // "INFO"
console.log(messageParts.join(' ')); // "Server started on port 3000"

\w: Any Word Character

The \w class matches a single "word" character. In JavaScript, this means:

  • Letters a to z and A to Z
  • Digits 0 to 9
  • The underscore _

It is equivalent to [a-zA-Z0-9_].

const text = 'hello_world-123 foo@bar';

// Find all word characters
console.log(text.match(/\w/g));
// ["h","e","l","l","o","_","w","o","r","l","d","1","2","3","f","o","o","b","a","r"]

// Find sequences of word characters
console.log(text.match(/\w+/g));
// ["hello_world", "123", "foo", "bar"]

\w is commonly used for matching identifiers, variable names, and tokens:

// Validate a username (word characters only)
function isValidIdentifier(name) {
  return /^\w+$/.test(name);
}

console.log(isValidIdentifier('user_name')); // true
console.log(isValidIdentifier('userName123')); // true
console.log(isValidIdentifier('user-name')); // false (hyphen is not \w)
console.log(isValidIdentifier('user name')); // false (space is not \w)

// Extract variable names from code
const code = 'let myVar = someFunc(arg1, arg2);';
const identifiers = code.match(/\b[a-zA-Z_]\w*\b/g);
console.log(identifiers);
// ["let", "myVar", "someFunc", "arg1", "arg2"]

Warning: \w only matches ASCII word characters. It does not match accented letters, characters from non-Latin scripts, or any letters outside the basic English alphabet:

const text = 'café résumé naïve über straße';

console.log(text.match(/\w+/g));
// ["caf", "r", "sum", "na", "ve", "ber", "stra", "e"]
// Accented characters split the words!

// To match Unicode word characters, use \p{Letter} with the u flag
console.log(text.match(/[\p{Letter}\p{Mark}]+/gu));
// ["café", "résumé", "naïve", "über", "straße"]

This is one of the most common surprises with \w. If your application handles international text, \w is likely insufficient.
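For applications that do handle international text, one possible replacement for /^\w+$/ is a character set built from Unicode property escapes. This is a sketch under stated assumptions: the `isUnicodeWord` name is hypothetical, and the choice of categories (letters, combining marks, decimal digits, underscore) is one reasonable definition of a "word character", not the only one.

```javascript
// Hypothetical validator: letters, combining marks, decimal digits,
// and underscore from any script. Requires the u flag.
const unicodeWordPattern = /^[\p{L}\p{M}\p{Nd}_]+$/u;

function isUnicodeWord(text) {
  return unicodeWordPattern.test(text);
}

console.log(isUnicodeWord('stra\u00DFe'));      // true  ("straße")
console.log(isUnicodeWord('\u540D\u524D_1'));   // true  ("名前_1")
console.log(isUnicodeWord('user-name'));        // false (hyphen is excluded)
console.log(/^\w+$/.test('stra\u00DFe'));       // false (ß is not an ASCII word character)
```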

Combining Character Classes

Character classes can be combined with other regex features to build more complex patterns:

// Match a date format: YYYY-MM-DD
const dateRegex = /\d{4}-\d{2}-\d{2}/;
console.log(dateRegex.test('2024-03-15')); // true

// Match a simple email pattern
const emailRegex = /\w+@\w+\.\w+/;
console.log(emailRegex.test('user@example.com')); // true

// Match a hex color code
const hexColor = /#[0-9a-fA-F]{6}\b/g;
const css = 'color: #ff5733; background: #1a2b3c;';
console.log(css.match(hexColor)); // ["#ff5733", "#1a2b3c"]

// Match time in HH:MM or HH:MM:SS format
const timeRegex = /\d{2}:\d{2}(:\d{2})?/g;
const log = 'Started at 08:30, finished at 14:45:22';
console.log(log.match(timeRegex)); // ["08:30", "14:45:22"]
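A pattern such as \d{4}-\d{2}-\d{2} checks only the shape of a date, so it also accepts strings like 2024-13-45. The sketch below pairs the pattern with a numeric range check. The `parseIsoDate` helper is a hypothetical name, and the check is deliberately simple: it does not account for the number of days in each month.

```javascript
// Named groups label each \d run; the ^ and $ anchors reject surrounding text.
const isoDatePattern = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

// Hypothetical helper: returns { year, month, day } or null.
function parseIsoDate(text) {
  const match = isoDatePattern.exec(text);
  if (match === null) return null;

  const year = Number(match.groups.year);
  const month = Number(match.groups.month);
  const day = Number(match.groups.day);

  // \d{2} accepts 00-99, so the ranges still need a numeric check.
  if (month < 1 || month > 12 || day < 1 || day > 31) return null;
  return { year, month, day };
}

console.log(parseIsoDate('2024-03-15'));    // { year: 2024, month: 3, day: 15 }
console.log(parseIsoDate('2024-13-45'));    // null
console.log(parseIsoDate('on 2024-03-15')); // null
```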

Character Classes with Quantifiers

Character classes become especially powerful when combined with quantifiers:

// \d{n} - exactly n digits
console.log('12345'.match(/\d{3}/)); // ["123"]

// \d{n,m} - between n and m digits
console.log('12345'.match(/\d{2,4}/)); // ["1234"]

// \d+ - one or more digits
console.log('abc123def456'.match(/\d+/g)); // ["123", "456"]

// \d* - zero or more digits
console.log('abc'.match(/\d*/)); // [""] (matches zero digits at the start)

// \d? - zero or one digit
console.log('A1B'.match(/[A-Z]\d?/g)); // ["A1", "B"]

// \s+ - one or more whitespace (common for splitting)
console.log('hello world'.split(/\s+/)); // ["hello", "world"]

// \w{3,} - word characters, 3 or more (filter short words)
const text = 'I am a JS developer';
console.log(text.match(/\w{3,}/g)); // ["developer"]

A clearer example of the same idea filters words by length:

// Filter words by length
const sentence = 'The quick brown fox jumps over the lazy dog';
const longWords = sentence.match(/\w{4,}/g);
console.log(longWords);
// ["quick", "brown", "jumps", "over", "lazy"]

\D, \S, \W: Inverse Classes

Every character class has an inverse (or negated) counterpart, written with an uppercase letter. The inverse matches any character that the lowercase version does not match.

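| Class | Matches | Inverse | Inverse matches |
|-------|---------|---------|-----------------|
| \d | Digits [0-9] | \D | Non-digits [^0-9] |
| \s | Whitespace | \S | Non-whitespace |
| \w | Word characters [a-zA-Z0-9_] | \W | Non-word characters [^a-zA-Z0-9_] |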

\D: Any Non-Digit

\D matches any character that is not a digit. It is equivalent to [^0-9]:

const text = 'Order #4521 on 2024-03-15';

// Find all non-digit characters
console.log(text.match(/\D+/g));
// ["Order #", " on ", "-", "-"]

// Remove all non-digits (extract only numbers)
const digitsOnly = text.replace(/\D/g, '');
console.log(digitsOnly); // "452120240315"

Practical uses:

// Clean a phone number: remove everything that's not a digit
function cleanPhone(phone) {
  return phone.replace(/\D/g, '');
}

console.log(cleanPhone('(555) 123-4567')); // "5551234567"
console.log(cleanPhone('+1-555-123-4567')); // "15551234567"
console.log(cleanPhone('555.123.4567')); // "5551234567"

// Verify a string contains at least one non-digit
function hasNonDigit(str) {
  return /\D/.test(str);
}

console.log(hasNonDigit('12345')); // false
console.log(hasNonDigit('123a5')); // true
console.log(hasNonDigit('hello')); // true

\S: Any Non-Whitespace

\S matches any character that is not whitespace. It is equivalent to [^\s]:

const text = '  Hello   World  ';

// Find all non-whitespace sequences
console.log(text.match(/\S+/g));
// ["Hello", "World"]

// Check if a string has any visible content
function hasContent(str) {
  return /\S/.test(str);
}

console.log(hasContent(' ')); // false
console.log(hasContent(' hi ')); // true
console.log(hasContent('')); // false
console.log(hasContent('\t\n')); // false

\S is frequently used as a quick way to match "any visible character":

// Match a simple URL pattern (any non-whitespace after http)
const text = 'Visit https://example.com/path?q=1 for details';
const url = text.match(/https?:\/\/\S+/);
console.log(url[0]); // "https://example.com/path?q=1"

// Extract words (sequences of non-whitespace)
const line = 'Alice 30 Engineer NYC';
const columns = line.match(/\S+/g);
console.log(columns); // ["Alice", "30", "Engineer", "NYC"]

\W: Any Non-Word Character

\W matches any character that is not a word character. It is equivalent to [^a-zA-Z0-9_]. This includes spaces, punctuation, special characters, and any non-ASCII characters:

const text = 'hello_world! foo@bar.com (test)';

// Find all non-word characters
console.log(text.match(/\W/g));
// ["!", " ", "@", ".", " ", "(", ")"]

// Find sequences of non-word characters
console.log(text.match(/\W+/g));
// ["! ", "@", ".", " (", ")"]

Practical uses:

// Split text into words, removing punctuation
const sentence = "Hello, world! How's it going? Fine - thanks.";
const words = sentence.split(/\W+/).filter(Boolean);
console.log(words);
// ["Hello", "world", "How", "s", "it", "going", "Fine", "thanks"]

// Slugify a string (convert to URL-friendly format)
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/\W+/g, '-') // Replace non-word chars with hyphens
    .replace(/^-+|-+$/g, ''); // Trim leading/trailing hyphens
}

console.log(slugify('Hello World!')); // "hello-world"
console.log(slugify(' JavaScript & Regex ')); // "javascript-regex"
console.log(slugify('What is This?!')); // "what-is-this"

// Count punctuation characters
const essay = 'Well, hello there! How are you? I am fine.';
const punctuation = essay.match(/[^\w\s]/g);
console.log(punctuation); // [",", "!", "?", "."]
console.log(`Punctuation count: ${punctuation.length}`); // 4
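Because \W treats every non-ASCII character as a non-word character, the slugify function above turns "Café au Lait!" into "caf-au-lait". One possible way around this, sketched below, is to decompose accented letters with String.prototype.normalize and strip the combining marks before slugifying. The `slugifyAccents` name is hypothetical, and the sketch assumes plain ASCII slugs are wanted; letters that have no decomposition (such as ß) are still replaced by a hyphen, and unlike \W the [^a-z0-9] set also replaces underscores.

```javascript
// Hypothetical variant of slugify that keeps the base letter of accented characters.
function slugifyAccents(text) {
  return text
    .normalize('NFD')               // decompose "é" into "e" + combining accent
    .replace(/\p{M}+/gu, '')        // remove the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')    // replace everything else with hyphens
    .replace(/^-+|-+$/g, '');       // trim leading/trailing hyphens
}

console.log(slugifyAccents('Caf\u00E9 au Lait!'));          // "cafe-au-lait"
console.log(slugifyAccents('Cr\u00E8me br\u00FBl\u00E9e')); // "creme-brulee"
```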

Using a Class and Its Inverse Together

A character class combined with its inverse matches everything. This technique is useful as an alternative to the dot when you need to match any character including newlines:

// [\s\S] - any whitespace OR any non-whitespace = any character at all
// This was the classic workaround before the s flag existed

const html = `<div>
  <p>Hello</p>
</div>`;

// . doesn't match newlines (without s flag)
console.log(html.match(/<div>.*<\/div>/)); // null

// [\s\S] matches everything including newlines
console.log(html.match(/<div>[\s\S]*<\/div>/));
// ["<div>\n  <p>Hello</p>\n</div>"]

// Modern alternative: use the s flag
console.log(html.match(/<div>.*<\/div>/s));
// ["<div>\n  <p>Hello</p>\n</div>"]

Other "match anything" combinations:

// All of these match any character including newlines:
/[\s\S]/ // whitespace or non-whitespace
/[\d\D]/ // digit or non-digit
/[\w\W]/ // word char or non-word char
/./s // dot with dotall flag (modern, preferred)
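The equivalence of these four forms can be checked directly. The sketch below runs each one against a two-line string; the variable names are illustrative only.

```javascript
const twoLines = 'line one\nline two';

const matchAnything = {
  whitespacePair: /^[\s\S]+$/,
  digitPair: /^[\d\D]+$/,
  wordPair: /^[\w\W]+$/,
  dotAll: /^.+$/s,
};

for (const [name, pattern] of Object.entries(matchAnything)) {
  console.log(name, pattern.test(twoLines)); // true for all four
}

// For contrast, a plain dot cannot cross the newline.
console.log(/^.+$/.test(twoLines)); // false
```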

Summary Table

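| Class | Meaning | Equivalent Set | Inverse | Inverse Meaning |
|-------|---------|----------------|---------|-----------------|
| \d | Any digit | [0-9] | \D | Any non-digit |
| \s | Any whitespace | [ \t\n\r\f\v...] | \S | Any non-whitespace |
| \w | Word character | [a-zA-Z0-9_] | \W | Non-word character |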

The Dot . and the s Flag (Dotall)

The dot (.) is the most general built-in character class. It matches "almost any" character, with one important exception by default.

Default Behavior: Everything Except Newlines

Without any flags, . matches any single character except newline characters (\n, \r, line separator \u2028, paragraph separator \u2029):

console.log(/h.t/.test('hat')); // true (. matches "a")
console.log(/h.t/.test('hot')); // true (. matches "o")
console.log(/h.t/.test('h2t')); // true (. matches "2")
console.log(/h.t/.test('h!t')); // true (. matches "!")
console.log(/h.t/.test('h t')); // true (. matches " ")
console.log(/h.t/.test('h\tt')); // true (. matches tab)
console.log(/h.t/.test('h\nt')); // false (. does NOT match newline)
console.log(/h.t/.test('ht')); // false (. requires exactly one character)

The dot is commonly used when you know a character should be there but do not care what it is:

// Match any three characters that sit between word boundaries (digits count too)
const text = 'cat bat 123 sat mat';
console.log(text.match(/\b...\b/g));
// ["cat", "bat", "123", "sat", "mat"]

// Match a date in any separator format
const dates = ['2024-03-15', '2024/03/15', '2024.03.15'];
dates.forEach(d => {
  console.log(/\d{4}.\d{2}.\d{2}/.test(d)); // true for all
});

// Match a file extension
const filename = 'report.pdf';
const ext = filename.match(/\.(.+)$/);
console.log(ext[1]); // "pdf"

The Problem with Multiline Content

The dot's inability to match newlines causes problems when working with text that spans multiple lines:

const html = `<p>
First paragraph
with multiple lines
</p>`;

// Trying to match everything between <p> tags
console.log(html.match(/<p>(.+)<\/p>/));
// null - the . can't cross the newlines

Before the s flag was introduced, developers used workarounds:

// Workaround 1: [\s\S] - matches any whitespace or non-whitespace
console.log(html.match(/<p>([\s\S]+)<\/p>/));
// Match found: "\nFirst paragraph\nwith multiple lines\n"

// Workaround 2: [\d\D] - matches any digit or non-digit
console.log(html.match(/<p>([\d\D]+)<\/p>/));
// Same result

// Workaround 3: [^] - empty negated class (matches any character)
// Note: this is valid in JavaScript but not portable to most other regex flavors
console.log(html.match(/<p>([^]+)<\/p>/));
// Same result, but prefer [\s\S] or the s flag for readability and portability

The s Flag (Dotall Mode)

The s flag, also called "dotall" mode, makes the dot match every character, including newlines:

const html = `<p>
First paragraph
with multiple lines
</p>`;

// With the s flag: . matches newlines too
console.log(html.match(/<p>(.+)<\/p>/s));
// Match found: "\nFirst paragraph\nwith multiple lines\n"

The name "dotall" means "dot matches all." The flag was added in ES2018 and is available in all modern browsers and Node.js.

const template = `
---
title: Hello World
date: 2024-03-15
---
Content goes here
and here too
`;

// Extract content between --- markers (multiline)
const frontMatter = template.match(/---(.*?)---/s);
console.log(frontMatter[1].trim());
// "title: Hello World\ndate: 2024-03-15"
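Whether a given regular expression has the flag set can be inspected through the standard `dotAll` and `flags` properties. A short sketch, with illustrative variable names:

```javascript
const withoutFlag = /a.b/g;
const withFlag = /a.b/gs;

console.log(withoutFlag.dotAll); // false
console.log(withFlag.dotAll);    // true
console.log(withFlag.flags);     // "gs"

// The flag can also be passed as the second argument of the RegExp constructor.
const built = new RegExp('a.b', 's');
console.log(built.test('a\nb')); // true
```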

When to Use the s Flag

Use the s flag when your pattern needs to span across line breaks:

// Extracting multiline blocks
const code = `function hello() {
  console.log("Hello");
  console.log("World");
}`;

// Extract function body
const body = code.match(/\{(.*)\}/s);
console.log(body[1].trim());
// 'console.log("Hello");\n  console.log("World");'

// Extracting HTML comments (which can span multiple lines)
const html = `<!-- This is
a multiline
comment -->
<p>Content</p>`;

const comment = html.match(/<!--(.+?)-->/s);
console.log(comment[1].trim());
// "This is\na multiline\ncomment"

// Extracting JSON blocks from mixed text
const response = `Some text before
{
  "name": "Alice",
  "age": 30
}
Some text after`;

const jsonBlock = response.match(/\{.*\}/s);
console.log(JSON.parse(jsonBlock[0]));
// { name: "Alice", age: 30 }

When NOT to Use the s Flag

Sometimes you specifically want . to stop at newlines. This is common when processing line-by-line content where newlines are meaningful boundaries:

const csv = `name,age,city
Alice,30,NYC
Bob,25,LA
Charlie,35,Chicago`;

// Without s: . stops at the end of each line - correct behavior for line-based parsing
console.log(csv.match(/^\w+,.+$/gm));
// ["name,age,city", "Alice,30,NYC", "Bob,25,LA", "Charlie,35,Chicago"]

// With s: . crosses newlines - wrong for line-based parsing
console.log(csv.match(/^\w+,.+$/gms));
// ["name,age,city\nAlice,30,NYC\nBob,25,LA\nCharlie,35,Chicago"]
// A single match: the greedy .+ runs to the end of the input
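For line-based data, a negated character set can replace the dot entirely, which makes the pattern behave the same with or without the s flag. The sketch below assumes a simple three-column input with no quoted fields or embedded commas; the variable names are illustrative.

```javascript
const table = 'name,age,city\nAlice,30,NYC\nBob,25,LA';

// [^,\n]+ matches one field: any run of characters that is neither a comma nor a newline.
const rowPattern = /^([^,\n]+),([^,\n]+),([^,\n]+)$/gm;

const rows = [...table.matchAll(rowPattern)].map((m) => m.slice(1));
console.log(rows);
// [["name", "age", "city"], ["Alice", "30", "NYC"], ["Bob", "25", "LA"]]

// Adding the s flag changes nothing here, because the pattern contains no dot.
const rowsWithS = [...table.matchAll(/^([^,\n]+),([^,\n]+),([^,\n]+)$/gms)].map((m) => m.slice(1));
console.log(rowsWithS.length); // 3
```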

The Dot Is Greedy

By default, .+ and .* are greedy, meaning they match as many characters as possible. This can lead to unexpected results:

const html = '<b>Hello</b> and <b>World</b>';

// Greedy: .+ matches as much as possible
console.log(html.match(/<b>.+<\/b>/));
// ["<b>Hello</b> and <b>World</b>"]
// Matched from the FIRST <b> to the LAST </b>

// Lazy: .+? matches as little as possible
console.log(html.match(/<b>.+?<\/b>/g));
// ["<b>Hello</b>", "<b>World</b>"]
// Matched each <b>...</b> pair individually

This greedy vs. lazy distinction is covered in detail in a later guide, but it is important to be aware of when using the dot. Adding ? after + or * makes the quantifier lazy, matching the shortest possible string.
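A negated character set is a common alternative to a lazy quantifier. The sketch below assumes the tag content contains no nested tags, so "anything except <" is a safe description of it; the variable names are illustrative.

```javascript
const markup = '<b>Hello</b> and <b>World</b>';

// [^<]+ cannot run past the next "<", so each match stays inside one tag pair.
const boldPattern = /<b>([^<]+)<\/b>/g;

const boldTexts = [...markup.matchAll(boldPattern)].map((m) => m[1]);
console.log(boldTexts); // ["Hello", "World"]
```

The quantifier here is still greedy; the character set is what stops it. This tends to be easier to reason about than .+? when the delimiter is a single character.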

Dot vs. Specific Character Classes

The dot is convenient but imprecise. Using a more specific character class often produces more correct results:

// ❌ Too broad: . matches anything
const priceRegex = /\$.\../;
console.log(priceRegex.test('$9.99')); // true
console.log(priceRegex.test('$a.bc')); // true - probably not intended

// ✅ More precise: \d matches only digits
const betterPriceRegex = /\$\d\.\d\d/;
console.log(betterPriceRegex.test('$9.99')); // true
console.log(betterPriceRegex.test('$a.bc')); // false - correctly rejected

// ❌ Too broad: .+ in a URL pattern
const urlRegex = /https?:\/\/.+/;
console.log(urlRegex.test('http://example.com valid')); // true - matches the space and "valid" too

// ✅ More precise: \S+ stops at whitespace
const betterUrlRegex = /https?:\/\/\S+/;
const match = 'Visit http://example.com for info'.match(betterUrlRegex);
console.log(match[0]); // "http://example.com" - stops at the space

Tip: Use the dot when you genuinely do not care what character is in a position. When you have a reasonable expectation about what characters should appear, use a more specific class like \d, \w, \S, or a custom character set [a-z]. More specific patterns produce fewer false matches and are easier to debug.

Practical Patterns Using Character Classes

Here are complete, real-world examples that combine everything covered in this guide.

Parsing Key-Value Pairs

const config = `
host = localhost
port = 3000
debug = true
name = My Application
`;

const pairs = {};
const regex = /(\w+)\s*=\s*(.+)/g;
let match;

while ((match = regex.exec(config)) !== null) {
  const key = match[1].trim();
  const value = match[2].trim();
  pairs[key] = value;
}

console.log(pairs);
// { host: "localhost", port: "3000", debug: "true", name: "My Application" }
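One caveat with the pattern above: \s also matches newlines, so a key with an empty value lets \s* after the "=" run onto the next line. The sketch below shows the effect and one possible line-aware alternative that uses [ \t] for horizontal whitespace together with the m flag. The `settings` input and the `loose`/`strict` names are illustrative assumptions.

```javascript
const settings = 'host = localhost\nname =\nport = 3000\n';

// Pattern from the example above: \s* after "=" can swallow the newline.
const loose = {};
for (const m of settings.matchAll(/(\w+)\s*=\s*(.+)/g)) {
  loose[m[1]] = m[2].trim();
}
console.log(loose);
// { host: "localhost", name: "port = 3000" }

// Line-aware sketch: [ \t] matches only horizontal whitespace, and the
// m flag anchors ^ and $ to each line.
const strict = {};
for (const m of settings.matchAll(/^[ \t]*(\w+)[ \t]*=[ \t]*(.*)$/gm)) {
  strict[m[1]] = m[2].trim();
}
console.log(strict);
// { host: "localhost", name: "", port: "3000" }
```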

Extracting Data from Structured Text

// Extract timestamps and messages from a log
const log = `
[2024-03-15 08:30:22] INFO Server started
[2024-03-15 08:30:23] DEBUG Connection pool initialized
[2024-03-15 08:31:01] ERROR Failed to connect to database
`;

const entries = [];
const logRegex = /\[(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\]\s(\w+)\s+(.+)/g;

for (const match of log.matchAll(logRegex)) {
  entries.push({
    timestamp: match[1],
    level: match[2],
    message: match[3]
  });
}

console.log(entries);
// [
// { timestamp: "2024-03-15 08:30:22", level: "INFO", message: "Server started" },
// { timestamp: "2024-03-15 08:30:23", level: "DEBUG", message: "Connection pool initialized" },
// { timestamp: "2024-03-15 08:31:01", level: "ERROR", message: "Failed to connect to database" }
// ]

Input Sanitization

// Strip everything outside an ASCII allowlist
// (this removes control characters, but also emoji and accented letters)
function sanitize(input) {
  // Keep word characters, whitespace, and common punctuation
  return input.replace(/[^\w\s.,!?@#$%&*()\-+=:;"'\/\\]/g, '');
}

console.log(sanitize('Hello\x00World\x07!')); // "HelloWorld!"

// Normalize whitespace
function normalizeWhitespace(text) {
  return text
    .replace(/\s+/g, ' ') // Collapse multiple whitespace to single space
    .trim(); // Remove leading/trailing whitespace
}

console.log(normalizeWhitespace(' Hello \t World \n '));
// "Hello World"

Redacting Sensitive Information

// Mask credit card numbers, keeping only the last 4 digits
function maskCardNumber(text) {
  // Only the last group is captured; the first twelve digits are discarded
  return text.replace(/\b\d{4}\s?\d{4}\s?\d{4}\s?(\d{4})\b/g,
    (match, last) => `****-****-****-${last}`
  );
}

console.log(maskCardNumber('Card: 4532 1234 5678 9012'));
// "Card: ****-****-****-9012"

// Redact email addresses
function redactEmails(text) {
  return text.replace(/\S+@\S+\.\S+/g, '[EMAIL REDACTED]');
}

console.log(redactEmails('Contact alice@example.com or bob@test.org'));
// "Contact [EMAIL REDACTED] or [EMAIL REDACTED]"

// Mask phone numbers
function maskPhone(text) {
  return text.replace(/\d{3}[-.]?\d{3}[-.]?\d{4}/g, (match) => {
    const digits = match.replace(/\D/g, '');
    return `***-***-${digits.slice(-4)}`;
  });
}

console.log(maskPhone('Call 555-123-4567 or 555.987.6543'));
// "Call ***-***-4567 or ***-***-6543"
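The \S+ in the email pattern above is convenient but broad: it also consumes adjacent punctuation such as angle brackets or a sentence-ending period. A possible tighter variant is sketched below. The `redactEmailsStrict` name is hypothetical, and the character sets are a simplification for illustration, not a full email address grammar.

```javascript
// Hypothetical stricter variant of redactEmails; the character sets are a
// simplification, not a full address grammar.
function redactEmailsStrict(text) {
  return text.replace(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, '[EMAIL REDACTED]');
}

const note = 'Mail <alice@example.com>, or bob.smith+news@mail.test.org.';

console.log(note.replace(/\S+@\S+\.\S+/g, '[EMAIL REDACTED]'));
// "Mail [EMAIL REDACTED] or [EMAIL REDACTED]"
// (the "<", the ">," and the final "." are swallowed)

console.log(redactEmailsStrict(note));
// "Mail <[EMAIL REDACTED]>, or [EMAIL REDACTED]."
```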

Password Strength Checker

function checkPasswordStrength(password) {
  const checks = {
    length: password.length >= 8,
    uppercase: /[A-Z]/.test(password),
    lowercase: /[a-z]/.test(password),
    digit: /\d/.test(password),
    special: /\W/.test(password), // Non-word character (but includes space)
    noSpaces: !/\s/.test(password), // No whitespace
    noRepeating: !/(.)\1{2,}/.test(password) // No 3+ repeated characters
  };

  const passed = Object.values(checks).filter(Boolean).length;
  const total = Object.keys(checks).length;

  let strength;
  if (passed <= 3) strength = 'weak';
  else if (passed <= 5) strength = 'medium';
  else strength = 'strong';

  return { checks, passed, total, strength };
}

console.log(checkPasswordStrength('Abc123!@'));
// { checks: { length: true, uppercase: true, ... }, passed: 7, total: 7, strength: "strong" }

console.log(checkPasswordStrength('password'));
// { checks: { length: true, uppercase: false, ... }, passed: 4, total: 7, strength: "medium" }
// (length, lowercase, noSpaces, and noRepeating pass)
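The `special: /\W/` check inherits the ASCII-only definition of \w: an accented letter counts as "special", while an underscore does not. If that is not the intended behavior, one possible alternative is a set built from Unicode property escapes, as sketched below. The `specialPattern` name and its definition of "special" (anything that is not a letter, mark, number, or whitespace) are assumptions for illustration.

```javascript
// Hypothetical stricter check: a "special" character is anything that is not
// a letter, combining mark, number, or whitespace in any script.
const specialPattern = /[^\p{L}\p{M}\p{N}\s]/u;

const accented = 'p\u00E4ssword'; // "pässword"
const underscored = 'pass_word';

console.log(/\W/.test(accented));            // true  (ä counts as a non-word character)
console.log(specialPattern.test(accented));  // false (ä is a letter)

console.log(/\W/.test(underscored));           // false (underscore is a word character)
console.log(specialPattern.test(underscored)); // true  (underscore is treated as special)
```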

Summary

Character classes are the foundation of practical regular expression patterns in JavaScript:

  • \d matches any ASCII digit (0-9). Use it for numbers, dates, phone numbers, and any numeric data. Its inverse \D matches any non-digit.
  • \s matches any whitespace character (space, tab, newline, and others). Use it for splitting text, normalizing whitespace, and detecting blank strings. Its inverse \S matches any non-whitespace and is useful for matching visible content.
  • \w matches ASCII word characters (letters, digits, underscore). Use it for identifiers, usernames, and tokens. Be aware that it does not match accented or non-Latin characters. Its inverse \W matches non-word characters like punctuation and spaces.
  • The dot . matches any character except newline by default. Use the s flag (dotall mode) to make it match newlines too. Prefer specific character classes over the dot when you know what characters to expect.
  • A class paired with its inverse ([\s\S], [\d\D], [\w\W]) matches any character at all, which was the standard workaround before the s flag existed.
  • For international text, \d and \w only cover ASCII. Use Unicode property escapes (\p{Letter}, \p{Number}) with the u or v flag for full Unicode support.