How to Prevent Catastrophic Backtracking in JavaScript Regular Expressions
A single regular expression can freeze your entire application. Not because of a bug in the engine, but because certain patterns, when matched against specific input, cause the regex engine to explore an astronomical number of possibilities before giving up. This phenomenon is called catastrophic backtracking, and it is one of the most dangerous performance pitfalls in JavaScript.
Attackers can exploit vulnerable patterns to launch ReDoS (Regular Expression Denial of Service) attacks, feeding crafted input that makes your server or browser hang for minutes or even hours. In this guide, you will learn exactly why this happens, how to recognize vulnerable patterns, and how to fix or replace them with safe alternatives.
What Is Catastrophic Backtracking?
To understand catastrophic backtracking, you first need to understand how the JavaScript regex engine works.
How the Regex Engine Backtracks
JavaScript uses a backtracking regex engine (also called NFA, Non-deterministic Finite Automaton). When a pattern contains quantifiers like *, +, or {n,m}, the engine makes choices at each step: should it match one more character with the quantifier, or should it move on to the next part of the pattern?
When a choice leads to a dead end, the engine backtracks: it goes back to the last choice point and tries a different option. This is normal and usually fast.
Let's see normal backtracking in action:
// Pattern: /a+b/ matched against "aaac"
// Step 1: a+ greedily matches "aaa"
// Step 2: Try to match "b" → finds "c" → FAIL
// Step 3: Backtrack: a+ gives back one "a", now matches "aa"
// Step 4: Try to match "b" → finds "a" → FAIL
// Step 5: Backtrack: a+ gives back another "a", now matches "a"
// Step 6: Try to match "b" → finds "a" → FAIL
// Step 7: Backtrack: a+ gives back last "a", now matches "" → FAIL (need at least one)
// Step 8: Overall match fails
let result = /a+b/.test("aaac");
console.log(result); // false (7 steps, perfectly fine)
Output:
false
This is linear backtracking. For a string of length n, the engine tries roughly n possibilities. Fast and harmless.
When Backtracking Becomes Catastrophic
The problem arises when the pattern structure creates exponential possibilities. Consider a pattern with nested quantifiers or overlapping alternatives:
// DANGEROUS PATTERN: DO NOT USE IN PRODUCTION
let pattern = /^(a+)+$/;
// Against 26 a's followed by "b"
console.time("regex");
let result = pattern.test("a".repeat(26) + "b");
console.timeEnd("regex");
This seemingly simple pattern can take seconds, minutes, or longer to fail. Why?
The outer + quantifier repeats the group (a+). The inner a+ matches one or more a characters. For a string of 26 a characters followed by a b, the engine must figure out every possible way to divide those 26 a characters into groups:
- (26 a's in one group)
- (25 a's) + (1 a)
- (24 a's) + (2 a's)
- (24 a's) + (1 a) + (1 a)
- (23 a's) + (3 a's)
- (23 a's) + (2 a's) + (1 a)
- ...and so on
Each ordered way of splitting n characters into groups is called a composition, and there are exactly 2^(n-1) of them. For 26 a characters, that is over 33 million combinations. For 30 characters, it is over 500 million. Each one must be tried before the engine concludes that the b at the end prevents a match.
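To make that growth concrete, here is a minimal sketch that imitates how a backtracking engine explores /^(a+)+$/ and counts its attempts. countSteps is a hypothetical helper written for this illustration, not how V8 works internally; real engines apply optimizations, so absolute counts differ, but the doubling is the same.

```javascript
// Hypothetical model of a backtracking engine running /^(a+)+$/
function countSteps(input) {
  let steps = 0;

  // Try to match one more (a+) group starting at pos,
  // then either repeat the group or test the end anchor.
  function group(pos) {
    // Greedy: find the longest run of "a" starting at pos
    let max = pos;
    while (max < input.length && input[max] === "a") max++;

    // Try the longest run first, then give characters back one at a time
    for (let end = max; end > pos; end--) {
      steps++;
      if (group(end)) return true;           // outer + tries another group first
      if (end === input.length) return true; // otherwise $ must match here
    }
    return false;
  }

  const matched = group(0);
  return { matched, steps };
}

console.log(countSteps("a".repeat(10)));             // { matched: true, steps: 1 }
console.log(countSteps("a".repeat(10) + "b"));       // { matched: false, steps: 1023 }
console.log(countSteps("a".repeat(20) + "b").steps); // 1048575
```

A matching input succeeds on the first attempt, while a failing input of n a's costs 2^n - 1 attempts in this model, because every split of every prefix is explored.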
// Let's measure with increasing lengths
let pattern = /^(a+)+$/;
for (let len = 15; len <= 25; len++) {
let str = "a".repeat(len) + "b";
let start = performance.now();
pattern.test(str);
let elapsed = (performance.now() - start).toFixed(1);
console.log(`Length ${len}: ${elapsed}ms`);
}
Output (approximate, varies by machine):
Length 15: 0.5ms
Length 16: 1.0ms
Length 17: 2.1ms
Length 18: 4.3ms
Length 19: 8.5ms
Length 20: 17.1ms
Length 21: 34.2ms
Length 22: 68.5ms
Length 23: 137.0ms
Length 24: 274.1ms
Length 25: 548.3ms
Notice how the time doubles with each additional character. This is the signature of exponential growth. Extrapolating from these sample timings, length 30 would take about 17 seconds, length 40 about five hours, and length 50 more than half a year.
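The doubling can be checked with a little arithmetic. This sketch computes the ratio between consecutive sample timings and extrapolates to longer inputs; growthRatios and extrapolateMs are hypothetical helper names, and the extrapolation assumes the ratio stays constant.

```javascript
// Sample timings copied from the output above (lengths 15 through 25), in ms
const timings = [0.5, 1.0, 2.1, 4.3, 8.5, 17.1, 34.2, 68.5, 137.0, 274.1, 548.3];

// Ratio between each measurement and the one before it
function growthRatios(times) {
  return times.slice(1).map((t, i) => t / times[i]);
}

// Extrapolate, assuming time multiplies by `ratio` for each extra character
function extrapolateMs(baseMs, baseLen, targetLen, ratio = 2) {
  return baseMs * Math.pow(ratio, targetLen - baseLen);
}

const ratios = growthRatios(timings);
console.log(ratios.map((r) => r.toFixed(2)).join(" ")); // every ratio is close to 2

console.log((extrapolateMs(548.3, 25, 30) / 1000).toFixed(1) + " s");    // 17.5 s
console.log((extrapolateMs(548.3, 25, 40) / 3600000).toFixed(1) + " h"); // 5.0 h
```

A ratio that stays near a constant greater than 1 points to exponential growth. If the ratio drifts toward 1 as the input grows, the growth is polynomial.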
What Is ReDoS?
ReDoS (Regular Expression Denial of Service) is an attack where a malicious user submits crafted input designed to trigger catastrophic backtracking in a vulnerable regex pattern. If your server validates user input with a vulnerable pattern, a single request can lock up a CPU core, effectively taking down your service.
Real-world ReDoS incidents have affected major platforms:
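- Stack Overflow went down in 2016 when a whitespace-trimming regex ran against a post containing a very long run of whitespace
- Cloudflare experienced a global outage in 2019 when a regex in a newly deployed firewall rule backtracked excessively and exhausted CPU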
- Node.js has had multiple CVEs related to ReDoS in core modules
ReDoS is a real security threat. Any regex that processes user-supplied input must be reviewed for catastrophic backtracking vulnerability. A single vulnerable pattern can bring down an entire server.
Identifying Vulnerable Patterns
Catastrophic backtracking requires a specific combination of conditions. Learning to spot these conditions is the most important skill for writing safe regex.
The Three Conditions for Catastrophic Backtracking
All three must be present:
- A quantifier applied to a sub-pattern that can match in multiple ways (ambiguity)
- The quantified sub-pattern is itself repeated (nested or sequential quantifiers)
- The match ultimately fails (backtracking is triggered only on failure)
If the engine finds a match early, it stops and returns it. Catastrophic backtracking shows up when the engine has to exhaust every possibility, which is exactly what happens when it is looking for a match that does not exist.
Pattern 1: Nested Quantifiers
The most obvious danger sign is a quantifier inside a quantifier:
// DANGEROUS: nested quantifiers on overlapping patterns
/^(a+)+$/ // a+ inside ()+
/^(a*)*$/ // a* inside ()*
/^(a+)*$/ // a+ inside ()*
/^(a{1,10})+$/ // a{1,10} inside ()+
All of these are vulnerable because the inner and outer quantifiers create ambiguity about how many characters the inner group should match on each repetition.
// SAFE: no ambiguity, single quantifier
/^a+$/ // Simple, no nesting
/^(ab)+$/ // "ab" as a unit, no overlap possible
Pattern 2: Overlapping Alternatives
When alternatives inside a group can match the same characters, combined with a quantifier, backtracking explodes:
// DANGEROUS: both alternatives can match "a"
/^(a|a)+$/
/^(a|b|ab)+$/ // "ab" can match as one alternative or as "a" followed by "b"
/^(\w|\d)+$/ // \d is a subset of \w (they overlap!)
// SAFE: alternatives don't overlap
/^(a|b)+$/ // "a" and "b" are distinct
/^(\d|[a-f])+$/ // digits and the letters a-f are disjoint sets
Let's verify the overlapping alternative problem:
// DANGEROUS: \w and \d overlap (digits match both)
let bad = /^(\w|\d)+$/;
let start = performance.now();
bad.test("1".repeat(25) + "!"); // slow: every digit fits both alternatives, and "!" forces failure
let elapsed = performance.now() - start;
console.log(`Elapsed: ${elapsed.toFixed(1)}ms`);
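A quick way to check whether two single-character alternatives overlap is to probe them with sample characters. The sketch below is a minimal version of that idea; sharedCharacters is a hypothetical helper, and it only probes printable ASCII, which is an assumption that suits the classes used in this guide.

```javascript
// Hypothetical helper: lists the printable ASCII characters that both
// single-character patterns accept. Pass patterns anchored with ^ and $.
function sharedCharacters(first, second) {
  const shared = [];
  for (let code = 32; code < 127; code++) {
    const ch = String.fromCharCode(code);
    if (first.test(ch) && second.test(ch)) shared.push(ch);
  }
  return shared;
}

console.log(sharedCharacters(/^\w$/, /^\d$/).join(""));  // 0123456789
console.log(sharedCharacters(/^\d$/, /^[a-f]$/).length); // 0
console.log(sharedCharacters(/^a$/, /^b$/).length);      // 0
```

Any character that appears in the result can be matched by either alternative, which is the ambiguity a surrounding quantifier multiplies.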
Pattern 3: Adjacent Quantified Patterns That Overlap
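Even without nesting, two adjacent quantifiers that can match the same characters force the engine to try every split point between them. The cost grows quadratically rather than exponentially, so it only becomes noticeable on long inputs, but it is still avoidable work: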
// DANGEROUS: both \w+ sequences can match word characters
/^\w+\w+$/
// This seems harmless but consider: where does \w+ end and \w+ begin?
// For "aaab": is it "aaa" + "b"? "aa" + "ab"? "a" + "aab"?
// All must be tried if the overall match fails.
// DANGEROUS: same overlap issue
/^\w+\d+$/
// For "123456x": \w+ and \d+ both match digits.
// How many digits go to \w+ vs \d+? All splits must be tried.
However, if the overall match succeeds, the engine stops immediately and no problem occurs. The danger is when the string almost matches but fails at the end:
// This matches instantly (success on first try)
console.log(/^\w+\d+$/.test("abc123")); // true (fast)
// This takes time (failure at the end forces the engine to try every split point)
let start = performance.now();
console.log(/^\w+\d+$/.test("1".repeat(20000) + "!")); // false
let elapsed = performance.now() - start;
console.log(`Elapsed: ${elapsed.toFixed(1)}ms`);
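The cost of adjacent overlapping quantifiers can be estimated by counting split points. The following is a simplified model, not a trace of a real engine, and countSplitAttempts is a hypothetical name. It models /^\w+\d+$/ running against n digits followed by a character that makes the match fail.

```javascript
// Simplified model of /^\w+\d+$/ on n digits followed by "!"
function countSplitAttempts(n) {
  let attempts = 0;
  // \w+ first takes all n digits, then gives them back one at a time
  for (let wordLen = n; wordLen >= 1; wordLen--) {
    // \d+ takes whatever is left, then shrinks too; every length fails at $
    for (let digitLen = n - wordLen; digitLen >= 1; digitLen--) {
      attempts++;
    }
  }
  return attempts;
}

console.log(countSplitAttempts(25));   // 300
console.log(countSplitAttempts(50));   // 1225
console.log(countSplitAttempts(2000)); // 1999000
```

Doubling the input roughly quadruples the work, which is quadratic growth: far milder than the exponential cases above, but still a problem when inputs reach tens of thousands of characters.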
Recognizing Dangerous Structures at a Glance
Here is a quick reference of red flags:
| Pattern Structure | Why It Is Dangerous |
|---|---|
| (a+)+ | Nested quantifiers, same character |
| (a*)* | Nested quantifiers, empty match possible |
| (a\|a)+ | Overlapping alternatives with quantifier |
| (\w\|\d)+ | Subset overlap with quantifier |
| (.*.*)+ | Dot matches everything, total ambiguity |
| \w+\w+ on failure | Adjacent quantifiers over same characters |
| (a\|b?)+ | b? can match empty, creating ambiguity |
Real-World Vulnerable Patterns
These are patterns you might encounter in actual codebases:
// Email validation (DANGEROUS: the optional dot lets a run of letters split between iterations)
/^([a-zA-Z0-9]+\.?)+[a-zA-Z]{2,}$/
// Input: "aaaaaaaaaaaaaaaaaaaaaaaa!"
// URL validation (DANGEROUS: the repeated path group can split a run of "/" in many ways)
/^(https?:\/\/)?([\w-]+\.)+[\w-]+(\/[\w-./?%&=]*)*$/
// Input: a valid host followed by a long run of "/" and then an invalid character
// HTML tag matching (DANGEROUS)
/<(\w+)(\s+\w+=".*")*>/
// Input: tag with many attributes that doesn't close properly
// CSV parsing (DANGEROUS)
/^("([^"]*|"")*"|[^,]*)(,("([^"]*|"")*"|[^,]*))*$/
// Input: an opening quote followed by a long run of characters and no closing quote
Let's demonstrate the problem with an email-style pattern whose separator is optional:
// DANGEROUS: the optional dot (\.?) means a run of letters can be split
// between iterations of the group in many different ways
let emailPattern = /^([a-zA-Z0-9]+\.?)+[a-zA-Z]{2,}$/;
// Normal input (fast)
console.log(emailPattern.test("user.name.test.com")); // true, and fast
// Crafted input (slow!)
let malicious = "a".repeat(25) + "!";
let start = performance.now();
emailPattern.test(malicious);
let elapsed = performance.now() - start;
console.log(`Elapsed: ${elapsed.toFixed(1)}ms`); // Could be seconds
The group ([a-zA-Z0-9]+\.?)+ has a nested quantifier, and because the dot is optional, nothing marks where one iteration ends and the next begins. When the input is a long run of alphanumeric characters ending with an invalid character, the engine tries every possible way to redistribute those characters between iterations. With a mandatory dot, as in ([a-zA-Z0-9]+\.)+, each segment can end in only one place, so the same kind of input fails quickly even though the quantifiers are still nested.
Fixing Vulnerable Patterns
There are several strategies to eliminate catastrophic backtracking, ranging from pattern rewrites to engine-level protections.
Strategy 1: Remove Ambiguity
The root cause is always ambiguity: the engine has multiple ways to match the same substring. Remove the ambiguity, and the problem disappears.
// DANGEROUS: (a+)+ (inner and outer quantifiers overlap)
/^(a+)+$/
// FIXED: Just use a single quantifier (same meaning, no ambiguity)
/^a+$/
// Both match one or more "a" characters, but the second has no nested quantifiers.
// DANGEROUS: (\w|\d)+ ( overlapping alternatives)
/^(\w|\d)+$/
// FIXED: \w already includes \d, so just use \w
/^(\w)+$/
// Or even simpler:
/^\w+$/
// DANGEROUS: \w+\d+
/^\w+\d+$/
// FIXED: Be specific about what each part matches
/^[a-zA-Z_]+\d+$/
// Now the first part matches only letters/underscore, the second only digits.
// No overlap, no ambiguity.
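Before swapping a pattern for a simpler one, it helps to confirm whether the rewrite accepts the same strings. One way to do that, sketched below, is to compare both patterns on every short string over a small alphabet. allStrings and disagreements are hypothetical helpers, and the alphabet and length limit are arbitrary choices that keep the slow patterns fast.

```javascript
// Build every string of length 0..maxLen over the given alphabet
function allStrings(alphabet, maxLen) {
  let current = [""];
  const result = [""];
  for (let len = 1; len <= maxLen; len++) {
    const next = [];
    for (const prefix of current) {
      for (const ch of alphabet) next.push(prefix + ch);
    }
    result.push(...next);
    current = next;
  }
  return result;
}

// Return the inputs on which the two patterns give different answers
function disagreements(original, rewritten, inputs) {
  return inputs.filter((s) => original.test(s) !== rewritten.test(s));
}

const inputs = allStrings(["a", "1", "!"], 6);

console.log(disagreements(/^(a+)+$/, /^a+$/, inputs).length);     // 0
console.log(disagreements(/^(\w|\d)+$/, /^\w+$/, inputs).length); // 0
console.log(disagreements(/^\w+\d+$/, /^[a-zA-Z_]+\d+$/, inputs).slice(0, 3));
// [ '11', '1a1', '111' ]
```

The first two rewrites are drop-in replacements on this sample. The third is deliberately stricter: it no longer accepts digits in the leading part, so make sure that change is acceptable for your data before adopting it.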
Strategy 2: Rewrite with Precise Character Classes
Instead of broad patterns like .* or \w+, use character classes that match exactly what you expect and nothing more:
// DANGEROUS: Greedy .* inside a group, repeated
/^"(.*")*$/
// FIXED: Match anything except the delimiter
/^"([^"]*")*$/
// [^"]* can never match the quote, so there's no ambiguity about where to stop
// DANGEROUS email domain pattern: the optional dot lets segments overlap
/^([a-zA-Z0-9]+\.?)+[a-zA-Z]{2,}$/
// FIXED: require the dot so each segment can end in only one place,
// and bound the repetitions to realistic sizes
/^([a-zA-Z0-9]{1,63}\.){1,10}[a-zA-Z]{2,10}$/
// The class [a-zA-Z0-9] cannot match the dot, so segments never overlap
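The difference between .* and a negated class is easy to see on a small input. This is a minimal illustration with a made-up sample string:

```javascript
const text = 'say "hi" and "bye"';

// Greedy .* runs to the last quote, so the engine has many places to stop
console.log(text.match(/".*"/)[0]);    // "hi" and "bye"

// [^"]* cannot cross a quote, so each match has exactly one possible end
console.log(text.match(/"[^"]*"/)[0]); // "hi"
console.log(text.match(/"[^"]*"/g));   // [ '"hi"', '"bye"' ]
```

Because the negated class gives the engine a single stopping point per quoted string, there is nothing to redistribute when a later part of the pattern fails.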
Strategy 3: Use Bounded Quantifiers
Replace unbounded quantifiers (+, *) with bounded ones ({1,n}) to cap the maximum number of iterations:
// DANGEROUS: unbounded nested quantifiers over overlapping segments
/^(\w+\.?)+$/
// SAFER: mandatory delimiter and bounded quantifiers
/^(\w{1,63}\.){1,10}\w{2,10}$/
// The bounds cap how much input the pattern can consume, so the
// worst case no longer grows with input length. Bounds alone do not
// remove ambiguity, so keep them small and keep segments distinct.
// Demonstrating the difference
let dangerous = /^(\w+\.?)+$/;
let safer = /^(\w{1,63}\.){1,10}\w{2,10}$/;
let input = "a".repeat(25) + "!";
console.time("dangerous");
dangerous.test(input); // Could be very slow
console.timeEnd("dangerous");
console.time("safer");
safer.test(input); // Fast (bounded and unambiguous)
console.timeEnd("safer");
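Bounded quantifiers also act as a length limit built into the pattern. The sketch below exercises the bounded pattern from this section on a few edge cases; the sample inputs are made up for illustration.

```javascript
const bounded = /^(\w{1,63}\.){1,10}\w{2,10}$/;

console.log(bounded.test("www.example.com"));        // true
console.log(bounded.test("a".repeat(64) + ".com"));  // false (segment longer than 63)
console.log(bounded.test("a.".repeat(11) + "com"));  // false (more than 10 dotted segments)
console.log(bounded.test("a".repeat(100000) + "!")); // false (gives up after 63 characters)
```

On the very long input, the first segment can consume at most 63 characters before it needs a dot, so the engine abandons the match after a small, fixed amount of work.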
Strategy 4: Make Patterns More Specific at Boundaries
Use anchors and specific delimiters to reduce backtracking opportunities:
// VAGUE: unanchored and able to match the empty string, so it accepts any input
/(\s+\w+)*/
// FIXED: Anchor and be specific
/^(\s+\w+)*$/
// Better yet, require at least one non-space to start:
/^\w+(\s+\w+)*$/
Strategy 5: Use Atomic Workarounds
JavaScript does not natively support atomic groups or possessive quantifiers (which exist in languages like Java, Perl, and .NET). These constructs tell the engine "once you have matched this far, do not backtrack."
However, you can simulate atomic behavior using a lookahead followed by a backreference:
// Simulating an atomic group in JavaScript
// Concept: (?=(pattern))\1 (the lookahead matches, the backreference consumes)
// The backreference matches exactly what the lookahead captured,
// and since it's a fixed string, there's nothing to backtrack into.
// DANGEROUS:
/^(a+)+$/
// SIMULATED ATOMIC GROUP:
/^(?=(a+))\1$/
// But this doesn't fully solve the problem for nested cases.
// The real fix is to simplify the pattern itself.
The atomic group workaround with lookahead + backreference is a known technique, but it has limitations and can be confusing. In most cases, rewriting the pattern to remove ambiguity is a better solution than simulating atomic groups.
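For completeness, here is a minimal sketch of the lookahead-plus-backreference idea applied to the nested case. It relies on the fact that JavaScript does not backtrack into a lookahead once it has succeeded; the variable names are illustrative.

```javascript
const nested = /^(a+)+$/;               // backtracks exponentially on failure
const atomicLike = /^(?:(?=(a+))\1)+$/; // lookahead captures the run, \1 consumes it

// Same answers on short inputs
console.log(nested.test("aaa"), atomicLike.test("aaa"));   // true true
console.log(nested.test("aaab"), atomicLike.test("aaab")); // false false

// The emulated version stays fast on input that would stall the nested one
console.log(atomicLike.test("a".repeat(5000) + "b")); // false
console.log(atomicLike.test(""));                     // false
```

Each iteration grabs the whole run of a characters in one step and cannot give any of them back, so a failing input is rejected after a single pass. The plain /^a+$/ rewrite does the same job and is much easier to read.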
Strategy 6: Split Complex Patterns into Steps
Instead of one complex regex, use multiple simple ones or combine regex with procedural code:
// HARD TO AUDIT: one complex regex for email validation
let dangerousEmail = /^([a-zA-Z0-9._%+-]+)@(([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,})$/;
// SAFE: Split into steps
function validateEmail(email) {
// Step 1: Basic structure check
let parts = email.split("@");
if (parts.length !== 2) return false;
let [local, domain] = parts;
// Step 2: Validate local part (simple, no nesting)
if (!/^[a-zA-Z0-9._%+-]{1,64}$/.test(local)) return false;
// Step 3: Validate domain (simple, bounded)
if (!/^([a-zA-Z0-9-]{1,63}\.){1,10}[a-zA-Z]{2,10}$/.test(domain)) return false;
// Step 4: Additional rules
if (local.startsWith(".") || local.endsWith(".")) return false;
if (domain.startsWith("-") || domain.includes("-.") || domain.includes(".-")) return false;
return true;
}
console.log(validateEmail("user@example.com")); // true
console.log(validateEmail("bad..user@example.com")); // true (simplified)
console.log(validateEmail("user@.example.com")); // false
console.log(validateEmail("a".repeat(100) + "@b.com")); // false (fast!)
Output:
true
true
false
false
This approach is safer, faster, and easier to maintain than a single monolithic regex.
Using the v Flag and Other Mitigations
The v Flag (Unicode Sets)
The v flag (introduced in ES2024) provides enhanced Unicode support and stricter syntax validation. While it does not directly prevent catastrophic backtracking, its stricter parsing rejects some ambiguous constructs that might otherwise be written accidentally:
// The v flag enables set operations and stricter parsing
let pattern = /[\w&&[^\d]]/v; // Word characters minus digits = letters + underscore
console.log(pattern.test("a")); // true
console.log(pattern.test("1")); // false
console.log(pattern.test("_")); // true
Output:
true
false
true
The v flag helps write more precise character classes through set operations (&& for intersection, -- for subtraction), which can reduce the character overlap that leads to backtracking:
// Without v flag: redundant class members
let old = /[\w\d]+/; // \d is redundant since \w includes \d
// With v flag: precise set operations
let modern = /[\w--[_]]/v; // \w minus underscore
// More precise patterns = less overlap = less backtracking risk
Input Length Limits
The simplest and most effective mitigation is to limit input length before applying regex:
function safeMatch(input, pattern, maxLength = 1000) {
if (input.length > maxLength) {
throw new Error(`Input exceeds maximum length of ${maxLength}`);
}
return pattern.test(input);
}
// With very short input, even a vulnerable pattern finishes quickly
let vulnerable = /^(a+)+$/;
console.log(safeMatch("aaa", vulnerable)); // true
console.log(safeMatch("aaab", vulnerable)); // false (fast, only 4 chars)
try {
safeMatch("a".repeat(10000), vulnerable);
} catch (e) {
console.log(e.message);
}
Output:
true
false
Input exceeds maximum length of 1000
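Input length limiting is your first line of defense. Even well-written patterns can have edge cases, and a cap keeps worst-case backtracking bounded. Choose the cap with the pattern in mind: a limit of 1000 characters tames quadratic patterns, but an exponential pattern such as (a+)+ is already impractical at around 30 to 40 characters, so a length limit complements fixing the pattern rather than replacing it.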
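To pick a sensible cap, you can work backwards from a step budget. This is a rough sketch under a simplified cost model (2^n steps for exponential patterns, n^2 for quadratic ones); maxLengthForBudget and the budget of one million steps are assumptions made for illustration.

```javascript
// Largest input length whose modeled worst case stays within the step budget
function maxLengthForBudget(stepBudget, growth) {
  if (growth === "exponential") return Math.floor(Math.log2(stepBudget));
  if (growth === "quadratic") return Math.floor(Math.sqrt(stepBudget));
  throw new Error(`Unknown growth model: ${growth}`);
}

console.log(maxLengthForBudget(1e6, "exponential")); // 19
console.log(maxLengthForBudget(1e6, "quadratic"));   // 1000
```

With the same budget, a quadratic pattern tolerates inputs about fifty times longer than an exponential one, which is why a generic 1000-character limit is not enough for patterns like (a+)+.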
Timeout Mechanisms
For server environments, you can set timeouts on regex operations. Node.js does not have a built-in regex timeout, but you can use worker threads or the vm module:
// Using vm module in Node.js to set a timeout
const vm = require("vm");
function safeRegexTest(pattern, input, timeoutMs = 1000) {
const script = new vm.Script(`result = pattern.test(input)`);
const context = vm.createContext({ pattern, input, result: false });
try {
script.runInContext(context, { timeout: timeoutMs });
return context.result;
} catch (e) {
if (e.code === "ERR_SCRIPT_EXECUTION_TIMEOUT") {
throw new Error("Regex execution timed out - possible ReDoS");
}
throw e;
}
}
Static Analysis Tools
Several tools can analyze your regex patterns for ReDoS vulnerability before your code runs:
safe-regex (npm package):
const safeRegex = require("safe-regex");
console.log(safeRegex(/^(a+)+$/)); // false (DANGEROUS)
console.log(safeRegex(/^a+$/)); // true (safe)
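console.log(safeRegex(/^(\w+\.)+\w+$/)); // false (flagged: nested quantifiers)
console.log(safeRegex(/^\w+(\.\w+)*$/)); // false (also flagged: \w+ sits inside a repeated group)
Output:
false
true
false
false
safe-regex is a heuristic that rejects any quantifier nested inside another, so it can flag patterns whose delimiters make them unambiguous, and it can miss problems such as overlapping alternatives. Treat its result as a prompt for review rather than a verdict.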
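To see what a nesting heuristic looks at, here is a simplified sketch in the same spirit as a star-height check. hasNestedQuantifier is a hypothetical helper, not the code of any of the tools listed here, and it is not a full regex parser: it only tracks groups, character classes, and the quantifiers *, +, and {n,m}.

```javascript
// Hypothetical, simplified check: is a quantifier applied to a group
// that already contains a quantifier?
function hasNestedQuantifier(source) {
  const stack = [];      // one flag object per currently open group
  let inClass = false;   // true while inside [...]
  let justClosed = null; // group closed by the previous character, if any

  for (let i = 0; i < source.length; i++) {
    const c = source[i];
    if (c === "\\") { i++; justClosed = null; continue; } // skip escaped character
    if (inClass) { if (c === "]") inClass = false; continue; }
    if (c === "[") { inClass = true; justClosed = null; continue; }
    if (c === "(") { stack.push({ quantified: false }); justClosed = null; continue; }
    if (c === ")") {
      justClosed = stack.pop() || null;
      // A quantifier inside this group is also inside the enclosing group
      if (justClosed && justClosed.quantified && stack.length > 0) {
        stack[stack.length - 1].quantified = true;
      }
      continue;
    }
    const isQuantifier =
      c === "*" || c === "+" || (c === "{" && /\d/.test(source[i + 1] || ""));
    if (isQuantifier) {
      if (justClosed && justClosed.quantified) return true; // e.g. (a+)+
      if (stack.length > 0) stack[stack.length - 1].quantified = true;
    }
    justClosed = null;
  }
  return false;
}

console.log(hasNestedQuantifier(/^(a+)+$/.source));       // true
console.log(hasNestedQuantifier(/^a+$/.source));          // false
console.log(hasNestedQuantifier(/^(ab)+$/.source));       // false
console.log(hasNestedQuantifier(/^\w+(\.\w+)*$/.source)); // true
```

The last result shows the limitation of this kind of check: the pattern is flagged because of its shape, even though the literal dot keeps its segments from overlapping.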
Other tools:
| Tool | Type | Usage |
|---|---|---|
| safe-regex | npm package | Runtime check on pattern objects |
| rxxr2 | CLI tool | Academic-grade ReDoS detection |
| eslint-plugin-security | ESLint plugin | Flags unsafe regex in code |
| vuln-regex-detector | npm package | Detects vulnerable patterns |
| regex101.com | Web tool | Shows backtracking step count |
Running Regex in a Terminable Worker (Conceptual)
While JavaScript does not natively support aborting a regex execution, you can move regex work to a Web Worker or Worker Thread and terminate it if it takes too long:
// Browser: Using a Web Worker for regex with timeout
function regexWithTimeout(pattern, input, timeoutMs) {
return new Promise((resolve, reject) => {
const workerCode = `
self.onmessage = function(e) {
const { pattern, flags, input } = e.data;
const regex = new RegExp(pattern, flags);
const result = regex.test(input);
self.postMessage({ result });
};
`;
const blob = new Blob([workerCode], { type: "application/javascript" });
const worker = new Worker(URL.createObjectURL(blob));
const timer = setTimeout(() => {
worker.terminate();
reject(new Error("Regex timed out"));
}, timeoutMs);
worker.onmessage = (e) => {
clearTimeout(timer);
worker.terminate();
resolve(e.data.result);
};
worker.postMessage({
pattern: pattern.source,
flags: pattern.flags,
input
});
});
}
// Usage
// regexWithTimeout(/^(a+)+$/, "aaa...b", 1000)
// .then(result => console.log(result))
// .catch(err => console.error(err.message));
Rewriting Common Vulnerable Patterns
Here are before-and-after examples of frequently encountered dangerous patterns:
Email Validation
// DANGEROUS
let bad = /^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})*$/;
// SAFE: bounded, no nested quantifiers over overlapping chars
let good = /^[a-zA-Z0-9._%+-]{1,64}@[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\.[a-zA-Z]{2,10}$/;
// Or simply split and validate separately (recommended)
URL Validation
// DANGEROUS
let bad = /^(https?:\/\/)?([\w-]+\.)+[\w-]+(\/[\w-./?%&=]*)*$/;
// SAFE: bounded, specific characters per segment
let good = /^(https?:\/\/)?([a-zA-Z0-9-]{1,63}\.){1,10}[a-zA-Z]{2,10}(\/[a-zA-Z0-9_.~:/?#[\]@!$&'()*+,;=-]{0,2000})?$/;
// Or better: use the URL constructor
function isValidURL(str) {
try {
new URL(str);
return true;
} catch {
return false;
}
}
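Keep in mind that the URL constructor accepts any well-formed scheme, including ones you may not want to allow. A minimal sketch that also restricts the protocol could look like this; isHttpUrl is a hypothetical helper name, and the allowed protocols are an assumption about what your application needs.

```javascript
// Accept only absolute http(s) URLs
function isHttpUrl(str) {
  let url;
  try {
    url = new URL(str);
  } catch {
    return false; // not parseable as an absolute URL
  }
  return url.protocol === "http:" || url.protocol === "https:";
}

console.log(isHttpUrl("https://example.com/path?q=1")); // true
console.log(isHttpUrl("javascript:alert(1)"));          // false
console.log(isHttpUrl("example.com"));                  // false
```

The parsing work is done by the platform's URL parser, so no regex runs against the user-supplied string at all.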
HTML Attribute Matching
// DANGEROUS: .* inside repeated group
let bad = /<\w+(\s+\w+=".*")*>/;
// SAFE: use negated character class instead of .*
let good = /<\w+(\s+\w+="[^"]*")*>/;
// [^"]* cannot overshoot the closing quote, eliminating ambiguity
Whitespace Normalization
// DANGEROUS once anything that can fail follows it (for example, $): overlapping alternatives
let bad = /(\s+|\t+)+/;
// SAFE: single quantifier, no alternatives needed
let good = /\s+/;
// \s already includes tabs, so the alternatives were redundant and dangerous
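Here is how the safe pattern is typically used for normalization. normalizeWhitespace is a hypothetical helper name, and the trailing trim() is an extra step assumed to be wanted here.

```javascript
// Collapse every run of whitespace into one space, then trim the ends
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

console.log(JSON.stringify(normalizeWhitespace("  a \t\t b\n\nc  "))); // "a b c"
```

A single quantifier over a single class leaves the engine only one way to consume each run of whitespace.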
Repeated Word Detection
// DANGEROUS: if the \w+ parts overlap with surrounding context
let risky = /(\w+)\s+\1/;
// SAFE: word boundaries prevent ambiguous overlap
let safe = /\b(\w+)\s+\1\b/;
// The \b anchors eliminate ambiguity about word boundaries
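The word boundaries also fix a correctness problem, which a short test makes visible. The sample sentences are made up for illustration.

```javascript
const risky = /(\w+)\s+\1/;
const safe = /\b(\w+)\s+\1\b/;

// Without \b, the "is" inside "this" pairs up with the following "is"
console.log(risky.test("this is a test")); // true (false positive)
console.log(safe.test("this is a test"));  // false

// A genuine repeated word is still found
console.log(safe.exec("this is is a test")[1]); // is
```

With \b in place, the capture can only start at the beginning of a word and the repeat must end at the end of one, so the engine has far fewer candidate positions to try.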
Testing Your Patterns
Manual Testing Approach
Create a simple test that increases input length and watches for exponential time growth:
function testRegexPerformance(pattern, charToRepeat, suffix, maxLen = 30) {
console.log(`Testing: ${pattern}`);
console.log(`Input: "${charToRepeat}" × n + "${suffix}"\n`);
for (let len = 5; len <= maxLen; len += 5) {
let input = charToRepeat.repeat(len) + suffix;
let start = performance.now();
pattern.test(input);
let elapsed = performance.now() - start;
let warning = elapsed > 100 ? " ⚠️ SLOW!" : "";
console.log(` Length ${len.toString().padStart(2)}: ${elapsed.toFixed(1).padStart(8)}ms${warning}`);
// Bail out if it's taking too long
if (elapsed > 2000) {
console.log(" ⛔ ABORTING - pattern is vulnerable to catastrophic backtracking!");
break;
}
}
console.log("");
}
// Test a dangerous pattern
testRegexPerformance(/^(a+)+$/, "a", "b");
// Test the fixed version
testRegexPerformance(/^a+$/, "a", "b");
Output (approximate):
Testing: /^(a+)+$/
Input: "a" × n + "b"
Length 5: 0.0ms
Length 10: 0.1ms
Length 15: 0.5ms
Length 20: 17.0ms
Length 25: 548.0ms ⚠️ SLOW!
Length 30: 17562.0ms ⚠️ SLOW!
⛔ ABORTING - pattern is vulnerable to catastrophic backtracking!
Testing: /^a+$/
Input: "a" × n + "b"
Length 5: 0.0ms
Length 10: 0.0ms
Length 15: 0.0ms
Length 20: 0.0ms
Length 25: 0.0ms
Length 30: 0.0ms
The contrast is stark. The dangerous pattern shows exponential growth while the fixed pattern completes in microseconds regardless of input length.
Automated Checking Function
function isRegexSafe(pattern, testChar = "a", maxTestLen = 25, maxTimeMs = 100) {
let suffix = "!"; // A character that should cause match failure
for (let len = 10; len <= maxTestLen; len++) {
let input = testChar.repeat(len) + suffix;
let start = performance.now();
pattern.test(input);
let elapsed = performance.now() - start;
if (elapsed > maxTimeMs) {
return {
safe: false,
failedAt: len,
timeMs: elapsed,
message: `Pattern took ${elapsed.toFixed(1)}ms at length ${len}`
};
}
}
return { safe: true, message: "Pattern appears safe up to tested length" };
}
console.log(isRegexSafe(/^(a+)+$/));
console.log(isRegexSafe(/^a+$/));
console.log(isRegexSafe(/^(\w+\.)+\w+$/));
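Output (approximate, varies by machine):
{ safe: false, failedAt: 22, timeMs: 134.2, message: 'Pattern took 134.2ms at length 22' }
{ safe: true, message: 'Pattern appears safe up to tested length' }
{ safe: true, message: 'Pattern appears safe up to tested length' }
The third pattern passes because each of its segments must end at a literal dot, so a run of "a" characters gives the engine nothing to redistribute. A probe like this exercises only one input shape, so a passing result is a good sign rather than proof of safety.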
Checklist for Writing Safe Patterns
Use this checklist every time you write a regex that processes user input:
Before writing the pattern:
- Can you limit the input length before applying the regex?
- Can you use a non-regex approach (string methods, URL constructor, split)?
- Is the pattern applied to user-controlled data?
While writing the pattern:
- Avoid nested quantifiers on the same characters: (a+)+, (a*)*
- Avoid overlapping alternatives inside quantifiers: (\w|\d)+
- Use negated character classes instead of .* when possible: [^"]* instead of .*
- Use bounded quantifiers {1,n} instead of unbounded + and * where appropriate
- Make character classes as specific as possible
After writing the pattern:
- Test with increasing input lengths to check for exponential time growth
- Use static analysis tools (safe-regex, eslint-plugin-security)
- Test with adversarial input: long strings of matching characters followed by a non-matching character
- If the pattern will run server-side, add a timeout mechanism
Summary
| Concept | Description |
|---|---|
| Catastrophic Backtracking | Exponential time growth when the regex engine tries all possible match combinations |
| ReDoS | Attack exploiting vulnerable regex to cause denial of service |
| Root Cause | Ambiguity from nested quantifiers or overlapping alternatives |
| Primary Fix | Remove ambiguity by simplifying patterns |
| Defense in Depth | Input length limits, timeouts, static analysis tools |
| Atomic Groups | Prevent backtracking (not natively supported in JavaScript) |
| v Flag | Enables stricter syntax and set operations for more precise patterns |
Key takeaways:
- Catastrophic backtracking happens when nested quantifiers or overlapping alternatives create exponential match possibilities on failing input
- The time to fail grows exponentially with input length, making even moderate inputs (30+ characters) dangerous
- Remove ambiguity as the primary fix: use single quantifiers, non-overlapping character classes, and negated sets
- Limit input length before applying any regex to user-controlled data
- Test patterns with increasing input lengths and adversarial suffixes to detect exponential behavior
- Use static analysis tools like safe-regex to catch vulnerable patterns automatically
- For server-side code, consider timeout mechanisms using worker threads
- When a regex grows too complex, split validation into multiple simple steps or use non-regex alternatives like the URL constructor or split() with procedural checks
Every regex that touches user input is a potential attack surface. Writing safe patterns is not just about correctness. It is about protecting your application from a class of denial-of-service attacks that can be triggered by a single carefully crafted string.