How to Use Character Sets and Ranges in JavaScript Regular Expressions
Regular expressions become truly powerful when you can match groups of characters rather than exact sequences. Character sets (also called character classes) let you define a collection of characters to match against a single position in a string. Combined with ranges, you can express patterns like "any lowercase letter," "any digit," or "any vowel" in a compact and readable syntax.
This guide covers everything you need to know about character sets [abc], ranges [a-z], negated sets [^abc], and the special rules about escaping characters inside square brackets.
What Are Character Sets?
A character set is defined by placing characters inside square brackets [ ]. It matches exactly one character from the listed set at that position in the string.
The syntax is straightforward:
const pattern = /[abc]/;
This pattern matches a single character that is either a, b, or c.
const regex = /[abc]/;
console.log(regex.test("apple")); // true (matches 'a')
console.log(regex.test("banana")); // true (matches 'b')
console.log(regex.test("cherry")); // true (matches 'c')
console.log(regex.test("date")); // false (no 'a', 'b', or 'c')
Character Sets Match One Position
A critical point that beginners often overlook is that [abc] does not match the three-character sequence "abc". It matches one character from the set at a single position.
const regex = /[abc]/g;
const str = "abcdef";
console.log(str.match(regex));
// Output: ['a', 'b', 'c']
Each character in the set is matched individually. The g flag finds all occurrences, so it returns three separate one-character matches.
Using Character Sets in Longer Patterns
Character sets are most useful when combined with other pattern elements:
// Match a color hex code starting with # followed by exactly 6 hex characters
const hexColor = /#[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]/;
console.log(hexColor.test("#ff6600")); // true
console.log(hexColor.test("#FF6600")); // true
console.log(hexColor.test("#gggggg")); // false
Of course, you would normally use quantifiers to shorten this, but the principle is clear: each [0-9a-fA-F] matches one hexadecimal character.
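As a rough sketch of the quantifier form mentioned above, the six repeated sets can collapse into one set followed by {6}. The name hexColorShort and the ^ and $ anchors are additions made for this illustration; without the anchors, the pattern would also accept longer strings that merely contain a valid code.

```javascript
// Same idea as the six repeated sets, written with a quantifier.
// hexColorShort is a name chosen for this sketch; the anchors are an added assumption.
const hexColorShort = /^#[0-9a-fA-F]{6}$/;

console.log(hexColorShort.test("#ff6600"));  // true
console.log(hexColorShort.test("#FF6600"));  // true
console.log(hexColorShort.test("#gggggg"));  // false
console.log(hexColorShort.test("#ff66001")); // false (seven hex characters)
```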
// Match words that start with a vowel
const startsWithVowel = /\b[aeiou]\w*/gi;
const text = "An apple and an orange exist everywhere";
console.log(text.match(startsWithVowel));
// Output: ['An', 'apple', 'and', 'an', 'orange', 'exist', 'everywhere']
Character Sets with Quantifiers
You can apply quantifiers to character sets just like any other pattern element:
// Match one or more vowels in a row
const vowelCluster = /[aeiou]+/gi;
const word = "beautiful";
console.log(word.match(vowelCluster));
// Output: ['eau', 'i', 'u']
// Match a simple username: 3 to 16 alphanumeric characters or underscores
const username = /^[a-zA-Z0-9_]{3,16}$/;
console.log(username.test("john_doe")); // true
console.log(username.test("ab")); // false (too short)
console.log(username.test("valid_user123")); // true
console.log(username.test("no spaces!")); // false
Ranges Inside Character Sets
Instead of listing every character individually, you can specify a range using a hyphen - between two characters. The range is based on Unicode code point order.
Common Ranges
// [a-z] matches any lowercase letter from a to z
const lowercase = /[a-z]/;
// [A-Z] matches any uppercase letter from A to Z
const uppercase = /[A-Z]/;
// [0-9] matches any digit from 0 to 9
const digit = /[0-9]/;
// [a-zA-Z] matches any letter (upper or lower)
const anyLetter = /[a-zA-Z]/;
// [a-zA-Z0-9] matches any alphanumeric character
const alphanumeric = /[a-zA-Z0-9]/;
const digitPattern = /[0-9]/g;
const str = "Order #4521 has 3 items";
console.log(str.match(digitPattern));
// Output: ['4', '5', '2', '1', '3']
Combining Multiple Ranges
You can combine multiple ranges and individual characters inside a single set:
// Match hex digits: 0-9, a-f, A-F
const hexDigit = /[0-9a-fA-F]/;
console.log(hexDigit.test("f")); // true
console.log(hexDigit.test("g")); // false
console.log(hexDigit.test("9")); // true
console.log(hexDigit.test("B")); // true
// Match letters, digits, hyphens, and underscores (common in slugs)
const slugChar = /^[a-zA-Z0-9_-]+$/;
console.log(slugChar.test("my-blog-post")); // true
console.log(slugChar.test("post_123")); // true
console.log(slugChar.test("has spaces")); // false
console.log(slugChar.test("special@char")); // false
Partial Ranges
Ranges don't have to span the entire alphabet or all digits:
// Match only digits 1 through 5
const oneToFive = /[1-5]/;
console.log(oneToFive.test("3")); // true
console.log(oneToFive.test("7")); // false
// Match only letters a through f (like hex)
const aToF = /[a-f]/;
console.log(aToF.test("c")); // true
console.log(aToF.test("z")); // false
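Partial ranges are handy when each position of a value has its own limits. The following is a minimal sketch, assuming a 24-hour "HH:MM" format; isValidTime and time24 are hypothetical names used only for this example.

```javascript
// Hours: [01][0-9] covers 00-19 and 2[0-3] covers 20-23.
// Minutes: [0-5][0-9] covers 00-59.
const time24 = /^(?:[01][0-9]|2[0-3]):[0-5][0-9]$/;

// Hypothetical helper for this sketch.
function isValidTime(value) {
  return time24.test(value);
}

console.log(isValidTime("09:30")); // true
console.log(isValidTime("23:59")); // true
console.log(isValidTime("24:00")); // false (hour out of range)
console.log(isValidTime("12:60")); // false (minute out of range)
```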
How Ranges Work Under the Hood
Ranges rely on Unicode code point values. The character on the left side of the hyphen must not have a higher code point than the character on the right side; otherwise the engine throws a SyntaxError.
// This works: 'a' (97) to 'z' (122)
const valid = /[a-z]/;
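// This throws an error: 'z' (122) to 'a' (97)
// new RegExp() is used here because an invalid regex literal fails when the
// script is parsed, before any try/catch has a chance to run
try {
  const invalid = new RegExp("[z-a]");
} catch (e) {
  console.log(e.message);
  // Output (V8; wording varies by engine):
  // Invalid regular expression: /[z-a]/: Range out of order in character class
}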
The start of a range must come before the end in Unicode order. [a-z] is valid, but [z-a] throws a SyntaxError. This applies to digits too: [0-9] is valid, [9-0] is not.
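To make the code point comparison concrete, here is a small sketch of what a range effectively checks for a single character. inRange is a hypothetical helper written for this illustration, not something the regex engine exposes.

```javascript
// Hypothetical helper that mimics the test a range like [a-z] performs on one character.
function inRange(ch, start, end) {
  const code = ch.codePointAt(0);
  return code >= start.codePointAt(0) && code <= end.codePointAt(0);
}

console.log("a".codePointAt(0), "z".codePointAt(0)); // 97 122
console.log(inRange("m", "a", "z")); // true
console.log(inRange("M", "a", "z")); // false ('M' is 77)
console.log(inRange("5", "0", "9")); // true
```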
Using the Hyphen as a Literal Character
If you want to match a literal hyphen inside a character set, place it at the beginning or end of the set, or escape it with a backslash:
// Hyphen at the start
const withHyphenStart = /[-abc]/;
// Hyphen at the end
const withHyphenEnd = /[abc-]/;
// Escaped hyphen in the middle
const withHyphenEscaped = /[a\-c]/;
console.log(withHyphenStart.test("-")); // true
console.log(withHyphenEnd.test("-")); // true
console.log(withHyphenEscaped.test("-")); // true
// Practical example: matching arithmetic operators
const operator = /[+\-*/]/;
console.log(operator.test("+")); // true
console.log(operator.test("-")); // true
console.log(operator.test("*")); // true
console.log(operator.test("/")); // true
console.log(operator.test("%")); // false
Negated Character Sets
By placing a caret ^ as the first character inside the square brackets, you create a negated set. This matches any character that is not in the listed set.
// [^abc] matches any character EXCEPT a, b, or c
const notABC = /[^abc]/;
console.log(notABC.test("a")); // false
console.log(notABC.test("d")); // true
console.log(notABC.test("z")); // true
console.log(notABC.test("1")); // true
Negated Ranges
You can negate ranges just as easily:
// Match any character that is NOT a digit
const nonDigit = /[^0-9]/;
console.log(nonDigit.test("5")); // false
console.log(nonDigit.test("a")); // true
console.log(nonDigit.test("!")); // true
// Match any character that is NOT a lowercase letter
const nonLower = /[^a-z]/;
console.log(nonLower.test("a")); // false
console.log(nonLower.test("A")); // true
console.log(nonLower.test("5")); // true
Practical Examples with Negated Sets
// Remove all non-digit characters from a phone number
const phone = "+1 (555) 123-4567";
const digitsOnly = phone.replace(/[^0-9]/g, "");
console.log(digitsOnly);
// Output: "15551234567"
// Extract only alphabetic characters
const messy = "H3ll0 W0rld!";
const lettersOnly = messy.replace(/[^a-zA-Z]/g, "");
console.log(lettersOnly);
// Output: "HllWrld"
// Validate that a string contains only alphanumeric characters and spaces
const cleanInput = /^[a-zA-Z0-9 ]+$/;
console.log(cleanInput.test("Hello World 123")); // true
console.log(cleanInput.test("Hello@World")); // false
console.log(cleanInput.test("no-hyphens")); // false
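A negated set can also tell you which characters caused a validation failure. This is a minimal sketch, assuming the same allowed characters as cleanInput above (letters, digits, and spaces); findInvalidChars is a hypothetical helper name.

```javascript
// Hypothetical helper: lists the distinct characters that the allow-list rejects.
function findInvalidChars(input) {
  // match() returns null when nothing matches, so fall back to an empty array
  const matches = input.match(/[^a-zA-Z0-9 ]/g) || [];
  return [...new Set(matches)];
}

console.log(findInvalidChars("Hello World 123"));   // []
console.log(findInvalidChars("Hello@World"));       // ['@']
console.log(findInvalidChars("no-hyphens--here!")); // ['-', '!']
```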
The Caret Position Matters
The ^ only acts as a negation operator when it is the very first character after the opening bracket. In any other position, it is treated as a literal caret character.
// ^ as negation (first position)
const negated = /[^abc]/;
console.log(negated.test("a")); // false (negated set, 'a' is excluded)
// ^ as literal character (not first position)
const literal = /[a^bc]/;
console.log(literal.test("^")); // true (matches literal '^')
console.log(literal.test("a")); // true (matches 'a')
[^abc] means "NOT a, b, or c." But [a^bc] means "a, ^, b, or c." The ^ is only special when it appears immediately after [.
Negated Sets vs. Shorthand Character Classes
JavaScript provides built-in shorthand character classes, each of which is equivalent to a character set or a negated set:
| Shorthand | Equivalent Set | Meaning |
|---|---|---|
| \d | [0-9] | Any digit |
| \D | [^0-9] | Any non-digit |
| \w | [a-zA-Z0-9_] | Any word character |
| \W | [^a-zA-Z0-9_] | Any non-word character |
| \s | [\t\n\r\f\v ] (and more) | Any whitespace |
| \S | [^\t\n\r\f\v ] (and more) | Any non-whitespace |
// These are equivalent:
const withSet = /[^0-9]/g;
const withShorthand = /\D/g;
const str = "abc123def";
console.log(str.match(withSet)); // ['a', 'b', 'c', 'd', 'e', 'f']
console.log(str.match(withShorthand)); // ['a', 'b', 'c', 'd', 'e', 'f']
Use \d, \w, \s and their negations when they match your need exactly. Use custom character sets [...] when you need more specific groups, like [aeiou] for vowels or [0-9a-fA-F] for hex digits.
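If you want to check the equivalences in the table yourself, one approach is to compare a shorthand with its bracket form across every UTF-16 code unit. The sketch below assumes no flags are set; agreesEverywhere is a hypothetical helper written for this check.

```javascript
// Hypothetical helper: true when both patterns give the same answer
// for every single UTF-16 code unit (0x0000 through 0xFFFF).
function agreesEverywhere(shorthand, set) {
  for (let code = 0; code <= 0xffff; code++) {
    const ch = String.fromCharCode(code);
    if (shorthand.test(ch) !== set.test(ch)) return false;
  }
  return true;
}

console.log(agreesEverywhere(/^\d$/, /^[0-9]$/));         // true
console.log(agreesEverywhere(/^\w$/, /^[a-zA-Z0-9_]$/));  // true
console.log(agreesEverywhere(/^\s$/, /^[\t\n\r\f\v ]$/)); // false (\s also covers other Unicode spaces)
```

The last result is why the table says "and more" for \s: it also matches characters such as the non-breaking space (U+00A0).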
Special Characters Inside Sets
One of the most convenient features of character sets is that most special regex characters lose their special meaning inside square brackets. You do not need to escape them.
Characters That Do NOT Need Escaping Inside [ ]
Inside a character set, these characters are treated as literals and do not need a backslash:
. (dot), * (asterisk), + (plus), ? (question mark), ( and ) (parentheses), { and } (curly braces), | (pipe), $ (dollar sign), ^ (caret, except at the first position)
// The dot is literal inside [ ], not "any character"
const dotLiteral = /[.]/;
console.log(dotLiteral.test(".")); // true
console.log(dotLiteral.test("a")); // false (dot is NOT "any char" here)
// Compare with dot outside [ ]
const dotWild = /./;
console.log(dotWild.test("a")); // true (dot IS "any char" here)
// Matching common punctuation without escaping inside [ ]
const punctuation = /[.!?;:,]/g;
const sentence = "Hello! How are you? Fine, thanks.";
console.log(sentence.match(punctuation));
// Output: ['!', '?', ',', '.']
// Matching math operators - most don't need escaping inside [ ]
const mathOps = /[+*./()=]/g;
const expression = "(3 + 5) * 2 = 16.0";
console.log(expression.match(mathOps));
// Output: ['(', '+', ')', '*', '=', '.']
Characters That STILL Need Escaping Inside [ ]
Only a few characters retain their special meaning inside character sets and must be escaped if you want to match them literally:
| Character | Why It's Special | How to Escape |
|---|---|---|
| ] | Closes the character set | \] |
| \ | Escape character | \\ |
| ^ | Negation (at first position) | \^ or place after first position |
| - | Range operator (between chars) | \- or place at start/end |
// Matching a literal backslash inside a set
const backslash = /[\\]/;
console.log(backslash.test("\\")); // true
// Matching a literal closing bracket
const closeBracket = /[\]]/;
console.log(closeBracket.test("]")); // true
// Matching all four special-inside-set characters
const specialInSet = /[\]\\\^\-]/g;
const str = "test ] \\ ^ - end";
console.log(str.match(specialInSet));
// Output: [']', '\\', '^', '-']
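These four characters matter most when a set is built from a string at runtime. The following is a minimal sketch, assuming the pattern is used without the v flag (which reserves additional characters inside sets); buildCharSet is a hypothetical helper, not a built-in.

```javascript
// Hypothetical helper: builds a character set from an arbitrary string of characters,
// escaping only the four that stay special inside [ ]: \ ] ^ -
function buildCharSet(chars) {
  const escaped = chars.replace(/[\\\]^-]/g, "\\$&");
  return new RegExp("[" + escaped + "]", "g");
}

const brackets = buildCharSet("[]^-\\");
console.log(brackets.source);                // [[\]\^\-\\]
console.log("a[b]c^d-e\\f".match(brackets)); // ['[', ']', '^', '-', '\\']
```

Escaping ^ and - everywhere, rather than only where they would be special, keeps the helper simple and does not change what the set matches.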
Common Mistake: Unnecessary Escaping
Many developers escape characters inside sets out of habit even when it is not required. While unnecessary escaping does not break the regex, it reduces readability.
// ❌ Overly escaped (works, but cluttered)
const overEscaped = /[\.\+\*\?\$\(\)]/g;
// ✅ Clean version (same behavior)
const clean = /[.+*?$()]/g;
const str = "Price: $5.00 (discounted?)";
console.log(str.match(overEscaped));
// Output: ['$', '.', '(', '?', ')']
console.log(str.match(clean));
// Output: ['$', '.', '(', '?', ')']
Only escape characters inside [ ] when they are genuinely special there: ], \, ^ (at the start), and - (between characters). Everything else can be written as-is. This keeps your regular expressions clean and easier to read.
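One caveat is worth checking in your own environment: with the u flag, the engine is stricter about escapes. Escaping regex syntax characters such as . + * is still accepted, but escaping an ordinary character such as ! or @ becomes a SyntaxError. The sketch below uses a hypothetical compiles helper to show the difference.

```javascript
// Hypothetical helper: reports whether a pattern/flags pair compiles.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(compiles("[\\.\\+\\*]", ""));  // true
console.log(compiles("[\\.\\+\\*]", "u")); // true (syntax characters may still be escaped)
console.log(compiles("[\\!\\@]", ""));     // true (tolerated without the u flag)
console.log(compiles("[\\!\\@]", "u"));    // false (invalid escape under the u flag)
```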
Shorthand Classes Inside Sets
You can combine shorthand character classes with custom characters inside a set:
// Match digits or a dot (useful for decimal numbers)
const digitOrDot = /[\d.]+/g;
const str = "Total: $42.99 and 3 items";
console.log(str.match(digitOrDot));
// Output: ['42.99', '3']
// Match word characters or hyphens (for hyphenated names)
const nameChar = /[\w-]+/g;
const names = "Mary-Jane, John O'Brien, Ana-Maria";
console.log(names.match(nameChar));
// Output: ['Mary-Jane', 'John', 'O', 'Brien', 'Ana-Maria']
// Note: the apostrophe split "O'Brien" into two matches
// Match anything that is NOT a word character and NOT whitespace
const specialChars = /[^\w\s]/g;
const text = "Hello, World! How's it going? #great";
console.log(text.match(specialChars));
// Output: [',', '!', "'", '?', '#']
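Shorthand classes inside sets combine well with replace(). Here is a minimal sketch of a URL slug generator, assuming ASCII input; slugify is a hypothetical helper name chosen for this example.

```javascript
// Hypothetical slugify helper, assuming ASCII input.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^\w\s-]/g, "")  // drop anything that is not a word character, whitespace, or hyphen
    .trim()
    .replace(/[\s_-]+/g, "-"); // collapse runs of whitespace, underscores, and hyphens
}

console.log(slugify("Hello, World! How's it going?")); // "hello-world-hows-it-going"
console.log(slugify("  Sets & Ranges -- a guide  "));  // "sets-ranges-a-guide"
```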
Real-World Examples
Validating an Email Address (Simplified)
// A simplified email validation using character sets and ranges
const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
console.log(emailPattern.test("user@example.com")); // true
console.log(emailPattern.test("first.last@company.co")); // true
console.log(emailPattern.test("bad@.com")); // false
console.log(emailPattern.test("missing-at-sign.com")); // false
This simplified email regex covers most common cases but does not handle all valid email formats defined in RFC 5322. For production applications, consider using a well-tested validation library or the HTML5 type="email" input validation.
Extracting Hex Color Codes
const hexColorPattern = /#[0-9a-fA-F]{3,8}/g;
const css = "color: #ff6600; background: #333; border: #aabbccdd;";
console.log(css.match(hexColorPattern));
// Output: ['#ff6600', '#333', '#aabbccdd']
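Note that {3,8} also accepts five or seven hex digits, which CSS does not define as color lengths. If you need to validate a single value rather than scan a stylesheet, a stricter sketch could list the allowed lengths explicitly; strictHexColor is a name invented for this example.

```javascript
// Accept only 3, 4, 6, or 8 hex digits after the #.
// strictHexColor is a hypothetical name for this sketch.
const strictHexColor = /^#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/;

console.log(strictHexColor.test("#333"));      // true
console.log(strictHexColor.test("#ff6600"));   // true
console.log(strictHexColor.test("#aabbccdd")); // true
console.log(strictHexColor.test("#12345"));    // false (five digits)
console.log(strictHexColor.test("#gggggg"));   // false
```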
Sanitizing Input by Removing Unwanted Characters
// Allow only letters, numbers, spaces, and basic punctuation
function sanitize(input) {
return input.replace(/[^a-zA-Z0-9 .,!?'-]/g, "");
}
console.log(sanitize("Hello <script>alert('xss')</script>!"));
// Output: "Hello scriptalert'xss'script!"
console.log(sanitize("Normal text, with punctuation. Fine!"));
// Output: "Normal text, with punctuation. Fine!"
Parsing CSV-Like Data
// Split on commas, semicolons, or pipe characters
const delimiters = /[,;|]/;
const data = "apple,banana;cherry|date";
console.log(data.split(delimiters));
// Output: ['apple', 'banana', 'cherry', 'date']
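Real data often has stray spaces around the delimiters. A small variation, sketched here with made-up sample data, wraps the set in \s* so the surrounding whitespace is consumed as part of the separator.

```javascript
// looseDelimiters and row are names invented for this sketch.
const looseDelimiters = /\s*[,;|]\s*/;
const row = "apple , banana;cherry | date";

console.log(row.split(looseDelimiters));
// Output: ['apple', 'banana', 'cherry', 'date']
```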
Password Strength Validation
function validatePassword(password) {
const hasLower = /[a-z]/.test(password);
const hasUpper = /[A-Z]/.test(password);
const hasDigit = /[0-9]/.test(password);
const hasSpecial = /[!@#$%^&*()_+{}[\]:;"'<>,.?/\\|~`-]/.test(password);
const isLongEnough = password.length >= 8;
return {
hasLower,
hasUpper,
hasDigit,
hasSpecial,
isLongEnough,
isStrong: hasLower && hasUpper && hasDigit && hasSpecial && isLongEnough
};
}
console.log(validatePassword("Str0ng!Pass"));
// Output: { hasLower: true, hasUpper: true, hasDigit: true,
// hasSpecial: true, isLongEnough: true, isStrong: true }
console.log(validatePassword("weak"));
// Output: { hasLower: true, hasUpper: false, hasDigit: false,
// hasSpecial: false, isLongEnough: false, isStrong: false }
Common Mistakes and Pitfalls
Mistake 1: Confusing [abc] with abc
// [abc] matches ONE character: either a, b, or c
// abc matches the exact sequence a-b-c
const set = /[abc]/;
const sequence = /abc/;
console.log(set.test("cab")); // true (contains 'c', 'a', or 'b')
console.log(sequence.test("cab")); // false (does not contain "abc" in sequence)
console.log(set.test("a")); // true
console.log(sequence.test("a")); // false
Mistake 2: Invalid Range Order
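// ❌ Wrong: end of range comes before start in Unicode
// (built with new RegExp() so the error can be caught; an invalid regex
// literal fails at parse time, before try/catch can run)
try {
  const bad = new RegExp("[Z-A]");
} catch (e) {
  console.log("Error:", e.message);
  // Error (V8; wording varies by engine):
  // Invalid regular expression: /[Z-A]/: Range out of order in character class
}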
// ✅ Correct
const good = /[A-Z]/;
Mistake 3: Forgetting That Negated Sets Match Everything Else
// [^0-9] does NOT mean "match nothing" (it means "match anything that's NOT a digit")
const nonDigit = /[^0-9]/;
console.log(nonDigit.test("\n")); // true (newline is not a digit)
console.log(nonDigit.test(" ")); // true (space is not a digit)
console.log(nonDigit.test("!")); // true (exclamation is not a digit)
This is important for input validation. A pattern like /^[^0-9]+$/ does not validate that the string is "text only." It validates that the string has no digits, but it still allows spaces, symbols, emoji, and control characters.
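To see the difference in practice, compare the deny-list pattern with an allow-list that names exactly what is permitted. This is a sketch with hypothetical pattern names and sample strings.

```javascript
const noDigits = /^[^0-9]+$/;            // deny-list: rejects digits only
const lettersAndSpaces = /^[a-zA-Z ]+$/; // allow-list: accepts only what is listed

console.log(noDigits.test("Jane_Doe!"));          // true (symbols slip through)
console.log(lettersAndSpaces.test("Jane_Doe!"));  // false
console.log(noDigits.test("Jane\tDoe"));          // true (a tab slips through)
console.log(lettersAndSpaces.test("Jane\tDoe"));  // false
console.log(noDigits.test("Jane Doe"));           // true
console.log(lettersAndSpaces.test("Jane Doe"));   // true
```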
Mistake 4: Unintended Range with Misplaced Hyphen
// ❌ Misleading: [1-a] is not the three literal characters '1', '-', and 'a'.
// JavaScript reads it as a range from '1' (U+0031) to 'a' (U+0061).
// The range is in order, so no error is thrown, but it matches many unexpected characters.
const ambiguous = /[1-a]/;
console.log(ambiguous.test("A")); // true! 'A' (U+0041) is between '1' and 'a'
console.log(ambiguous.test("-")); // false ('-' is U+002D, outside the range)
// ✅ If you want to match '1', '-', and 'a', put the hyphen at the end or start
const correct = /[1a-]/;
console.log(correct.test("1")); // true
console.log(correct.test("-")); // true
console.log(correct.test("a")); // true
console.log(correct.test("A")); // false
Always place a literal hyphen at the start [-abc] or end [abc-] of a character set, or escape it [a\-c]. A hyphen between two characters creates a range, which may not be what you intend.
Mistake 5: Using [.] Expecting "Any Character"
// ❌ Wrong: [.] matches only a literal dot
const dotInSet = /[.]/;
console.log(dotInSet.test("a")); // false
// ✅ Correct: use . outside a set for "any character"
const dotOutside = /./;
console.log(dotOutside.test("a")); // true
Sets and the Unicode u and v Flags
When working with Unicode characters beyond the basic ASCII range, the u flag ensures that character sets handle surrogate pairs correctly:
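// Without the 'u' flag, each emoji is read as two UTF-16 code units (a surrogate pair),
// so the set contains a low-surrogate-to-high-surrogate range that is out of order.
// new RegExp() is used because an invalid regex literal cannot be caught by try/catch.
try {
  new RegExp("[😀-🙏]");
} catch (e) {
  console.log(e.name); // SyntaxError
}
// With the 'u' flag, each emoji is one code point and the range works as intended
// (😀 to 🙏 is the Emoticons block, U+1F600 to U+1F64F)
const withU = /[😀-🙏]/u;
console.log(withU.test("😃")); // true
console.log(withU.test("😎")); // true
console.log(withU.test("🤔")); // false (outside range)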
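The underlying issue is easy to see on a single character. In this short sketch, the emoji is written with a code point escape so the example does not depend on file encoding; smiley is a name chosen for illustration.

```javascript
const smiley = "\u{1F603}"; // 😃, a code point above U+FFFF

console.log(smiley.length);       // 2 (two UTF-16 code units)
console.log([...smiley].length);  // 1 (one code point)
console.log(/^.$/.test(smiley));  // false (without u, the dot sees one code unit at a time)
console.log(/^.$/u.test(smiley)); // true
console.log(/^[\u{1F600}-\u{1F64F}]$/u.test(smiley)); // true (range written with code point escapes)
```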
The newer v flag (ES2024) provides even more powerful Unicode set operations, including set intersection and subtraction using && and -- syntax inside character classes:
// v flag allows set subtraction (remove specific chars from a range)
// Match Greek letters except alpha (α) and beta (β)
const greekExceptAlphaBeta = /[\p{Script=Greek}--[αβ]]/v;
console.log(greekExceptAlphaBeta.test("γ")); // true
console.log(greekExceptAlphaBeta.test("α")); // false
The v flag is a newer feature. Make sure to check compatibility for your target environments before using it. The u flag is widely supported in all modern browsers and Node.js.
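One way to handle the compatibility question at runtime is a feature check with a fallback. The sketch below is an assumption about how you might structure it: supportsVFlag and greekExceptAlphaBetaSafe are hypothetical names, and the fallback approximates set subtraction with a negative lookahead under the u flag.

```javascript
// Hypothetical feature check: constructing a RegExp with an unsupported flag throws a SyntaxError.
function supportsVFlag() {
  try {
    new RegExp("", "v");
    return true;
  } catch (e) {
    return false;
  }
}

// Use set subtraction when available; otherwise exclude α and β with a lookahead.
// The v-flag pattern is built from a string so older engines can still parse this file.
const greekExceptAlphaBetaSafe = supportsVFlag()
  ? new RegExp("[\\p{Script=Greek}--[αβ]]", "v")
  : /(?![αβ])\p{Script=Greek}/u;

console.log(greekExceptAlphaBetaSafe.test("γ")); // true
console.log(greekExceptAlphaBetaSafe.test("α")); // false
```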
Quick Reference Summary
| Syntax | Meaning | Example |
|---|---|---|
| [abc] | Match any one of a, b, or c | /[aeiou]/ matches any vowel |
| [a-z] | Match any character from a to z | /[a-z]/ matches lowercase letters |
| [A-Z] | Match any character from A to Z | /[A-Z]/ matches uppercase letters |
| [0-9] | Match any digit from 0 to 9 | /[0-9]/ matches digits |
| [a-zA-Z0-9] | Match any alphanumeric character | Combines multiple ranges |
| [^abc] | Match any character NOT a, b, or c | /[^0-9]/ matches non-digits |
| [^a-z] | Match any character NOT in range | /[^a-zA-Z]/ matches non-letters |
| [-abc] or [abc-] | Match literal hyphen plus a, b, c | Hyphen at start or end |
| [a\-c] | Match a, literal hyphen, or c | Escaped hyphen in middle |
| [.] | Match a literal dot (not "any char") | Special chars are literal inside sets |
Character sets are one of the most frequently used features in regular expressions. They give you precise control over which characters to match at each position, keep your patterns readable, and combine naturally with quantifiers, anchors, and groups to build powerful matching logic.