How to Remove All Non-Alphanumeric Characters from a String in JavaScript
A common data sanitization task is to "clean" a string by removing all characters that are not letters or numbers. This is useful for creating search-friendly slugs, generating safe filenames, or simplifying user input for processing. The most powerful and concise way to achieve this is by using the String.prototype.replace() method with a regular expression.
This guide will teach you how to use a regular expression to strip all non-alphanumeric characters from a string, how to modify the pattern to include or exclude the underscore, and how to preserve other specific characters like spaces or hyphens.
The Core Method: replace() with a Regular Expression
The most flexible and direct way to remove a set of unwanted characters is to use replace() with a regex that matches them.
Problem: you have a string containing various symbols, and you want to keep only the letters and numbers.
// Problem: Remove all symbols from this string.
let messyString = 'User@_Name!#123-Test';
Solution: this regular expression finds any character that is not a letter or a number and replaces it with an empty string.
function sanitizeAlphanumeric(str) {
  // The regex /[^a-z0-9]/gi finds any character that is NOT a-z or 0-9.
  return str.replace(/[^a-z0-9]/gi, '');
}
// Example Usage:
let messyString = 'User@_Name!#123-Test';
let cleanString = sanitizeAlphanumeric(messyString);
console.log(cleanString); // Output: UserName123Test
The Regular Expression Explained
The regex /[^a-z0-9]/gi is the key to this operation. Let's break it down:
- / ... /: These are the delimiters that mark the beginning and end of the regular expression pattern.
- [...]: This is a character set. It defines a group of characters to match.
- ^: When ^ is the first character inside a character set, it acts as a negation. It inverts the set, matching any character that is not in the list.
- a-z0-9: This defines two ranges: all lowercase letters from 'a' to 'z' and all digits from '0' to '9'.
- g: This is the global flag. It is crucial. It tells the replace() method to replace all matches it finds, not just the first one.
- i: This is the case-insensitive flag. By including it, our a-z range will also match uppercase letters A-Z.
So, /[^a-z0-9]/gi translates to: "Find every character that is not an ASCII letter or digit, and do it for the whole string." The replace() method then replaces these matches with an empty string, effectively deleting them. Note that accented and other non-English letters, such as é, fall outside a-z and are removed as well.
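To see why the g flag matters, the short sketch below runs the same pattern with and without it. The helper names stripFirstOnly and stripAll are made up for this illustration.

```javascript
// Hypothetical helpers for comparison; only the flags differ.
function stripFirstOnly(str) {
  // Without the g flag, replace() stops after the first match.
  return str.replace(/[^a-z0-9]/i, '');
}

function stripAll(str) {
  // With the g flag, every match in the string is replaced.
  return str.replace(/[^a-z0-9]/gi, '');
}

const sample = 'User@_Name!#123-Test';
console.log(stripFirstOnly(sample)); // Output: User_Name!#123-Test
console.log(stripAll(sample));       // Output: UserName123Test
```

Without g, only the first offending character (the @) is removed, which is rarely what a sanitizer should do.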
The "Word Character" Shortcut (\W) and the Underscore
Regular expressions provide a shorthand for alphanumeric characters, but it's important to understand what it includes.
- \w: Matches any "word" character. This is equivalent to [A-Za-z0-9_]. Crucially, this includes the underscore (_).
- \W: The inverse of \w. It matches any "non-word" character. This is equivalent to [^A-Za-z0-9_].
Solution: if you want to keep letters, numbers, and underscores, \W is a very concise alternative.
function sanitizeWithUnderscore(str) {
  // The regex /\W/g finds any character that is NOT a letter, number, or underscore.
  return str.replace(/\W/g, '');
}
// Example Usage:
let messyString = 'User@_Name!#123-Test';
let cleanString = sanitizeWithUnderscore(messyString);
console.log(cleanString); // Output: User_Name123Test
Choose the right tool:
- To keep only letters and numbers, use /[^a-z0-9]/gi.
- To keep letters, numbers, AND underscores, use /\W/g.
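Both patterns above only recognize ASCII letters, so accented or non-Latin letters are stripped too. If that matters for your data, one possible variation (not part of the patterns above) is to use Unicode property escapes with the u flag. The sketch below assumes an engine that supports ES2018 or later, and the function name sanitizeUnicodeAlphanumeric is hypothetical.

```javascript
// Hypothetical helper: keeps letters (\p{L}) and numbers (\p{N}) from any script.
// Assumes an ES2018+ engine, which is required for \p{...} with the u flag.
function sanitizeUnicodeAlphanumeric(str) {
  // normalize('NFC') composes sequences such as "e" + combining accent into a
  // single letter, so the accent is not stripped separately as a non-letter mark.
  return str.normalize('NFC').replace(/[^\p{L}\p{N}]/gu, '');
}

const dessert = 'Crème brûlée #2024!';
console.log(dessert.replace(/[^a-z0-9]/gi, ''));   // Output: Crmebrle2024
console.log(sanitizeUnicodeAlphanumeric(dessert)); // Output: Crèmebrûlée2024
```

Keep in mind that \p{N} is broader than 0-9: it also matches digits from other scripts and characters such as fractions, so the ASCII pattern is still the safer choice when the output must be plain ASCII.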
How to Preserve Additional Characters (e.g., Spaces, Hyphens)
To keep other specific characters, you simply add them to the negated character set.
Problem: you want to sanitize a string but preserve spaces and hyphens.
// Problem: Remove symbols but keep letters, numbers, spaces, and hyphens.
let sentence = 'This is my file-name-123. (final).txt';
Solution: add the space and hyphen to the character set after the ^ negation character.
function sanitizeWithWhitelist(str) {
  // The regex now says "anything that is NOT a-z, 0-9, a space, or a hyphen".
  return str.replace(/[^a-z0-9 -]/gi, '');
}
let sentence = 'This is my file-name-123. (final).txt';
let cleanSentence = sanitizeWithWhitelist(sentence);
console.log(cleanSentence); // Output: This is my file-name-123 finaltxt
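Note: inside a character set ([]), most special characters, such as the dot (.), lose their special meaning and do not need to be escaped. The hyphen is the exception: place it first or last in the set (as in the pattern above) or escape it, because between two characters it defines a range.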
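If the set of characters to preserve varies from call to call, the pattern can be built at runtime. The following is a minimal sketch under that assumption: sanitizeKeeping and its extraAllowed parameter are hypothetical names, and the helper escapes the few characters that stay special inside a character set (\, ], ^, and -) before inserting them.

```javascript
// Hypothetical helper: keeps ASCII letters, digits, and any characters
// listed in extraAllowed.
function sanitizeKeeping(str, extraAllowed = '') {
  // Escape the characters that keep a special meaning inside [...]: \ ] ^ -
  const escaped = extraAllowed.replace(/[\\\]^-]/g, '\\$&');
  // Build the negated character set from the fixed ranges plus the extras.
  const pattern = new RegExp(`[^a-z0-9${escaped}]`, 'gi');
  return str.replace(pattern, '');
}

console.log(sanitizeKeeping('This is my file-name-123. (final).txt', ' -'));
// Output: This is my file-name-123 finaltxt
console.log(sanitizeKeeping('report_v2.final-draft!.pdf', '._'));
// Output: report_v2.finaldraft.pdf
```

Escaping the extras means a caller can pass a hyphen anywhere in extraAllowed without accidentally creating a range.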
Practical Example: Creating a URL Slug
A common use case is to convert a blog post title into a URL-friendly "slug." This typically involves converting to lowercase, replacing spaces with hyphens, and removing all other non-alphanumeric characters.
function createSlug(title) {
  return title
    .toLowerCase()
    .trim()
    .replace(/\s+/g, '-')        // 1. Replace each run of whitespace with a single hyphen
    .replace(/[^a-z0-9-]/g, ''); // 2. Remove everything that is not a letter, digit, or hyphen
}
let postTitle = 'My First Post! (An Introduction)';
let slug = createSlug(postTitle);
console.log(slug); // Output: my-first-post-an-introduction
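One edge case worth knowing: a title that already contains hyphens or leading punctuation, such as 'Hello - World!', comes out of createSlug as hello---world. A possible variation, sketched below under the hypothetical name createTidySlug, turns every run of non-alphanumeric characters into a single hyphen and then trims hyphens from both ends.

```javascript
// Hypothetical variant of createSlug that avoids doubled or dangling hyphens.
function createTidySlug(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // Each run of non-alphanumerics becomes one hyphen
    .replace(/^-+|-+$/g, '');    // Strip hyphens left at the start or end
}

console.log(createTidySlug('My First Post! (An Introduction)'));
// Output: my-first-post-an-introduction
console.log(createTidySlug('Hello - World!'));
// Output: hello-world
```

The trade-off is that this version treats all punctuation as a word separator, so "don't" becomes don-t rather than dont.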
Conclusion
The String.prototype.replace() method, combined with a simple regular expression, provides a powerful and concise way to remove non-alphanumeric characters from a string.
- To remove everything except letters and numbers, use str.replace(/[^a-z0-9]/gi, '').
- To also keep the underscore, the \W shorthand is a great alternative: str.replace(/\W/g, '').
- To preserve other characters, like spaces or hyphens, add them to the negated character set: str.replace(/[^a-z0-9 -]/gi, '').