How to Split a String by Multiple Special Characters in JavaScript
A common text parsing task is to split a string into an array on several different delimiters or special characters. For example, you might need to break a string apart at any punctuation mark, space, or symbol. A concise way to do this is to call String.prototype.split() with a single regular expression that contains a character set.
This guide will teach you how to use a regular expression with a character set ([...]) to split a string by multiple, different special characters at once.
The Core Method: split() with a Character Set
The String.prototype.split() method can accept a regular expression as its argument. To split by multiple different characters, you can define all of those characters inside a single character set ([...]).
Problem: you have a string where words are separated by a mix of different delimiters, and you want to get an array of the words.
// Problem: How to split this string by '.', '_', and '-' all at once?
let messyString = 'word1.word2_word3-word4';
Solution:
let messyString = 'word1.word2_word3-word4';
// The regex /[._-]/ matches any single character that is a dot, underscore, or hyphen.
let words = messyString.split(/[._-]/);
console.log(words);
Output:
['word1', 'word2', 'word3', 'word4']
This is the most direct way to solve the problem: a single split() call handles every delimiter, so no chaining or intermediate arrays are needed.
How the Character Set Works
Let's break down the pattern /[._-]/:
- / ... /: These forward slashes mark the beginning and end of the regular expression.
- [._-]: This is the character set. It tells the split() method to break the string wherever it finds any single character that is inside the brackets. In this case, it will split on a literal dot (.), an underscore (_), or a hyphen (-).
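An important note on escaping: Inside a character set [...], most special characters (like ., *, +) lose their special meaning and do not need to be escaped with a backslash. A few characters still do: the backslash itself (written \\), the closing bracket (written \]), a caret (^) placed first in the set, and a hyphen (-) that isn't at the beginning or end of the set. An unescaped hyphen between two characters defines a range, as in a-z.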
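To make the hyphen rule concrete, here is a small sketch comparing an unescaped and an escaped hyphen in the middle of a set. The sample string and variable names are made up for illustration.

```javascript
// Illustrative input: a hyphen, a comma, and a digit between letters.
const sample = 'a-b,c5d';

// Unescaped: ,-_ is read as a RANGE from ',' (U+002C) to '_' (U+005F).
// That range also contains '-', the digits, and the uppercase letters,
// so the digit 5 is treated as a delimiter too.
const asRange = sample.split(/[,-_]/);
console.log(asRange); // ['a', 'b', 'c', 'd']

// Escaped: \- is a literal hyphen, so the set is just ',', '-', and '_'.
const asLiteral = sample.split(/[,\-_]/);
console.log(asLiteral); // ['a', 'b', 'c5d']
```

Placing the hyphen first or last in the set, as in /[._-]/, avoids the problem without a backslash.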
Example with more characters
let str = 'a.b,c-d_e=f\\g/h';
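// This set includes many common delimiters.
// The hyphen is escaped (\-) so it is a literal hyphen, not the range from ',' to '_'.
// The g flag is omitted because split() always splits on every match.
let result = str.split(/[.,\-_=\\/\s]/);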
console.log(result);
Output:
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
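When the delimiters are only known at runtime, the character set can be built from a string instead of written as a literal. The following is a minimal sketch, not a standard API: splitByChars is a hypothetical helper name, and it assumes each delimiter is a single character.

```javascript
// Hypothetical helper: split `text` on any single character found in `delimiters`.
function splitByChars(text, delimiters) {
  // Escape the characters that are special inside [...]: \ ] ^ -
  const escaped = delimiters.replace(/[\\\]^-]/g, '\\$&');
  // Wrap the escaped characters in a character set and split on it.
  return text.split(new RegExp('[' + escaped + ']'));
}

// In the string literals below, '\\' is one backslash character.
const pieces = splitByChars('a.b,c-d_e=f\\g/h', '.,-_=\\/');
console.log(pieces); // ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
```

Escaping every special character up front means the order of the delimiters in the input string does not matter, so a hyphen in the middle cannot accidentally form a range.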
A Practical Alternative: Splitting by Anything That Isn't a Letter or Number
A more common real-world scenario is not to define what the delimiters are, but what they are not. For example, you might want to split a string by any character that is not a letter or a number.
Problem: you want to extract all the "words" from a string, and a "word" is defined as any sequence of letters or numbers.
let messyString = 'User ID: user-123, Role: admin!';
Solution: use a negated character set ([^...]) with a + quantifier.
let messyString = 'User ID: user-123, Role: admin!';
// This regex matches one or more characters that are NOT letters or numbers.
let words = messyString.split(/[^a-zA-Z0-9]+/);
console.log(words);
Output:
['User', 'ID', 'user', '123', 'Role', 'admin', '']
How It Works
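- [^a-zA-Z0-9]: The ^ at the beginning of the character set negates it, so it matches any character that is not a letter or a digit.
- +: The + is a quantifier that means "one or more." This is important because it treats a sequence of special characters (like the colon and space in "ID: user") as a single delimiter.
The empty string at the end of the output appears because the input ends with a delimiter (!): split() returns the empty text that follows the last match.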
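The effect of the + quantifier, and one way to deal with empty strings in the result, can be sketched as follows. The variable names are illustrative, and filtering or using match() are common options rather than the only ones.

```javascript
const messy = 'User ID: user-123, Role: admin!';

// Without +, every single delimiter splits, so ': ' and ', ' leave empty strings behind.
const withoutPlus = messy.split(/[^a-zA-Z0-9]/);
console.log(withoutPlus);
// ['User', 'ID', '', 'user', '123', '', 'Role', '', 'admin', '']

// With +, each run of delimiters counts once; only the trailing '' remains.
const withPlus = messy.split(/[^a-zA-Z0-9]+/);

// Option 1: drop empty strings after splitting.
const cleaned = withPlus.filter((part) => part !== '');
console.log(cleaned); // ['User', 'ID', 'user', '123', 'Role', 'admin']

// Option 2: match the words directly instead of splitting on the gaps.
// match() returns null when nothing matches, hence the fallback to [].
const matched = messy.match(/[a-zA-Z0-9]+/g) || [];
console.log(matched); // ['User', 'ID', 'user', '123', 'Role', 'admin']
```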
Conclusion
Splitting a string by multiple special characters is a task perfectly suited for a regular expression.
- The recommended approach is to use string.split(/[...]/) with a character set ([...]) that contains all the delimiters you want to split by.
- For the common task of extracting words, it's often easier to use a negated character set to split by anything that is not a letter or number: string.split(/[^a-zA-Z0-9]+/).
- This approach is more concise and readable than chaining multiple split() calls or performing multiple replace() operations.
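For comparison, here is a rough sketch of the chained approach next to the single regular expression, using the same input as the first example. The chained version is just one possible way to write it and relies on Array.prototype.flatMap (ES2019).

```javascript
const input = 'word1.word2_word3-word4';

// Chained: split on one delimiter, then re-split every piece on the next one.
const chained = input
  .split('.')
  .flatMap((piece) => piece.split('_'))
  .flatMap((piece) => piece.split('-'));

// Single regular expression with a character set.
const single = input.split(/[._-]/);

console.log(chained); // ['word1', 'word2', 'word3', 'word4']
console.log(single);  // ['word1', 'word2', 'word3', 'word4']
```

Both produce the same array, but the chained version grows by one call per delimiter, while the character set only grows by one character.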