How to Group a List of Strings by First Character in Python
Grouping strings by their first character is a common operation in data processing, search indexing, building alphabetical directories, and organizing categorized content. For example, grouping ['apple', 'ant', 'banana', 'berry'] by first letter produces {'a': ['apple', 'ant'], 'b': ['banana', 'berry']}.
In this guide, you'll learn the most effective methods to group a list of strings by their initial character in Python, with clear examples and guidance on choosing the right approach.
Understanding the Problem
Given a list of strings, the goal is to organize them into groups where all strings in a group share the same first character.
Input:
words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']
Expected output (as a dictionary):
{'a': ['an', 'a'], 't': ['tutorialreference'], 'f': ['for', 'free'], 'g': ['g']}
Or as a list of lists:
[['an', 'a'], ['tutorialreference'], ['for', 'free'], ['g']]
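For comparison, this is the most literal way to produce both forms: a plain dict with an explicit membership check. It is a baseline sketch, not a recommended method. The approaches below exist mainly to remove the "if key not in groups" step.

```python
words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']

groups = {}
for word in words:
    key = word[0]            # the first character is the grouping key
    if key not in groups:    # explicit existence check
        groups[key] = []
    groups[key].append(word)

print(groups)
# {'a': ['an', 'a'], 't': ['tutorialreference'], 'f': ['for', 'free'], 'g': ['g']}

print(list(groups.values()))
# [['an', 'a'], ['tutorialreference'], ['for', 'free'], ['g']]
```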
Using defaultdict (Recommended)
The defaultdict from the collections module is the most efficient and cleanest approach. It automatically initializes missing keys with an empty list, eliminating the need for existence checks:
from collections import defaultdict
words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']
groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)
print(dict(groups))
Output:
{'a': ['an', 'a'], 't': ['tutorialreference'], 'f': ['for', 'free'], 'g': ['g']}
How it works:
- defaultdict(list) creates a dictionary that automatically initializes any new key with an empty list.
- For each word, word[0] extracts the first character.
- The word is appended to the list associated with that character.
- No need to check if the key exists: defaultdict handles it.
defaultdict runs in O(n) time with a single pass through the list. It preserves the original order of words within each group and requires no sorting. For most grouping tasks, this is the go-to solution.
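The problem statement also asks for a list of lists, which is simply the dictionary's values. A minimal sketch, relying on dicts keeping insertion order (guaranteed since Python 3.7):

```python
from collections import defaultdict

words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']

groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)

# Keys appear in the order their first character was first seen
print(list(groups))           # ['a', 't', 'f', 'g']

# The list-of-lists form is just the values, in the same order
print(list(groups.values()))  # [['an', 'a'], ['tutorialreference'], ['for', 'free'], ['g']]
```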
Using setdefault() (No Imports Needed)
If you prefer to avoid imports entirely, the built-in dict.setdefault() method provides similar functionality:
words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']
groups = {}
for word in words:
    groups.setdefault(word[0], []).append(word)
print(groups)
Output:
{'a': ['an', 'a'], 't': ['tutorialreference'], 'f': ['for', 'free'], 'g': ['g']}
setdefault(key, default) returns the value for key if it exists; otherwise, it sets the key to default and returns it. This provides the same one-pass efficiency as defaultdict without any imports.
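The sketch below exercises both branches of setdefault(). One detail: the [] argument is evaluated on every call, even when the key already exists and the new list is simply discarded.

```python
groups = {'a': ['an']}

# Key exists: setdefault returns the existing list and ignores the default
same = groups.setdefault('a', [])
print(same is groups['a'])   # True

# Key missing: setdefault stores the default and returns that same object
new = groups.setdefault('b', [])
new.append('banana')
print(groups)                # {'a': ['an'], 'b': ['banana']}
```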
Using sorted() with itertools.groupby()
The itertools.groupby() function groups consecutive elements that share the same key. Since it only groups adjacent elements, the list must be sorted first:
from itertools import groupby
words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']
sorted_words = sorted(words, key=lambda w: w[0])
groups = {key: list(group) for key, group in groupby(sorted_words, key=lambda w: w[0])}
print(groups)
Output:
{'a': ['an', 'a'], 'f': ['for', 'free'], 'g': ['g'], 't': ['tutorialreference']}
How it works:
- sorted() arranges the words alphabetically by their first character.
- groupby() then iterates through the sorted list, grouping consecutive words with the same first character.
- A dictionary comprehension collects the groups.
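Because Python's sort is stable and the sort key here is only the first character, words keep their original relative order within each group ('an' still precedes 'a'). What changes is the order of the groups: the keys come out sorted ('a', 'f', 'g', 't') rather than in order of first appearance. If you need first-appearance order, use defaultdict or setdefault() instead.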
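To see the difference between sorting by the first character and sorting by the whole string, compare the two results below. This is a small illustration on the same sample list; first is a helper defined here for readability.

```python
from itertools import groupby

words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']

def first(w):
    """Grouping key: the first character of a string."""
    return w[0]

# Sort by the first character only: the sort is stable, so 'an' stays ahead of 'a'
stable_groups = {k: list(g) for k, g in groupby(sorted(words, key=first), key=first)}
print(stable_groups)
# {'a': ['an', 'a'], 'f': ['for', 'free'], 'g': ['g'], 't': ['tutorialreference']}

# Sort by the whole string: words inside each group come out alphabetical
alpha_groups = {k: list(g) for k, g in groupby(sorted(words), key=first)}
print(alpha_groups)
# {'a': ['a', 'an'], 'f': ['for', 'free'], 'g': ['g'], 't': ['tutorialreference']}
```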
Common Mistake: Using groupby() Without Sorting
A frequent error is using groupby() on an unsorted list. Since groupby() only groups consecutive elements with the same key, unsorted input produces incorrect results:
Wrong approach: unsorted input.
from itertools import groupby
words = ['an', 'tutorialreference', 'a', 'for', 'g', 'free'] # 'a' words are not adjacent
groups = {key: list(group) for key, group in groupby(words, key=lambda w: w[0])}
print(groups)
Output:
{'a': ['a'], 't': ['tutorialreference'], 'f': ['free'], 'g': ['g']}
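Words starting with 'a' are split into separate groups because 'an' and 'a' are not adjacent, and the same happens to 'for' and 'free'. Since a dictionary holds only one value per key, each later group overwrites the earlier one, so 'an' and 'for' are silently lost.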
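To see exactly what groupby() produced before the dictionary overwrote anything, collect the (key, group) pairs into a list instead. A short sketch on the same unsorted input:

```python
from itertools import groupby

words = ['an', 'tutorialreference', 'a', 'for', 'g', 'free']

# A list of (key, group) pairs keeps every run, including repeated keys
runs = [(key, list(group)) for key, group in groupby(words, key=lambda w: w[0])]
print(runs)
# [('a', ['an']), ('t', ['tutorialreference']), ('a', ['a']), ('f', ['for']), ('g', ['g']), ('f', ['free'])]
```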
Correct approach: sort first.
from itertools import groupby
words = ['an', 'tutorialreference', 'a', 'for', 'g', 'free']
sorted_words = sorted(words, key=lambda w: w[0])
groups = {key: list(group) for key, group in groupby(sorted_words, key=lambda w: w[0])}
print(groups)
Output:
{'a': ['an', 'a'], 'f': ['for', 'free'], 'g': ['g'], 't': ['tutorialreference']}
Case-Insensitive Grouping
In real-world data, you may have mixed-case strings. To group "Apple" and "ant" together, normalize the first character:
from collections import defaultdict
words = ['Apple', 'ant', 'Banana', 'berry', 'avocado', 'Blueberry']
groups = defaultdict(list)
for word in words:
    groups[word[0].lower()].append(word)
print(dict(groups))
Output:
{'a': ['Apple', 'ant', 'avocado'], 'b': ['Banana', 'berry', 'Blueberry']}
Using .lower() on the first character ensures case-insensitive grouping while preserving the original casing of each word.
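For contrast, here is the same loop without normalization, on a shorter sample list chosen for illustration. Uppercase and lowercase first letters become separate keys:

```python
from collections import defaultdict

words = ['Apple', 'ant', 'Banana', 'berry']

strict = defaultdict(list)
for word in words:
    strict[word[0]].append(word)   # 'A' and 'a' are different keys

print(dict(strict))
# {'A': ['Apple'], 'a': ['ant'], 'B': ['Banana'], 'b': ['berry']}
```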
Handling Edge Cases
Empty Strings in the List
If your list might contain empty strings, accessing word[0] will raise an IndexError:
from collections import defaultdict
words = ['an', '', 'tutorialreference', '', 'free']
groups = defaultdict(list)
for word in words:
    if word:  # Skip empty strings
        groups[word[0]].append(word)
print(dict(groups))
Output:
{'a': ['an'], 't': ['tutorialreference'], 'f': ['free']}
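If you would rather keep empty strings than drop them, slicing is an alternative: word[:1] returns the first character, or '' for an empty string, instead of raising. A sketch of that variant, which collects empty strings under an '' key:

```python
from collections import defaultdict

words = ['an', '', 'tutorialreference', '', 'free']

groups = defaultdict(list)
for word in words:
    # Slicing never raises: ''[:1] is '', so empty strings share the '' key
    groups[word[:1]].append(word)

print(dict(groups))
# {'a': ['an'], '': ['', ''], 't': ['tutorialreference'], 'f': ['free']}
```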
Empty List
from collections import defaultdict
words = []
groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)
print(dict(groups))
Output:
{}
No iteration occurs, so an empty dictionary is returned safely.
Creating a Reusable Function
A complete, production-ready function with options for case sensitivity and output format:
from collections import defaultdict

# The "X | Y" return annotation requires Python 3.10 or later
def group_by_first_char(
    words: list[str],
    case_sensitive: bool = True,
    as_dict: bool = True,
) -> dict[str, list[str]] | list[list[str]]:
    """Group strings by their first character.

    Args:
        words: List of strings to group.
        case_sensitive: If False, groups 'Apple' and 'ant' together.
        as_dict: If True, returns a dict. If False, returns a list of lists.

    Returns:
        Grouped strings as a dictionary or list of lists.
        Empty strings are skipped.
    """
    groups = defaultdict(list)
    for word in words:
        if not word:
            continue  # skip empty strings to avoid IndexError on word[0]
        key = word[0] if case_sensitive else word[0].lower()
        groups[key].append(word)

    if as_dict:
        return dict(groups)
    return list(groups.values())
# Usage examples
words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']
print(group_by_first_char(words))
print(group_by_first_char(words, as_dict=False))
mixed = ['Apple', 'ant', 'Banana', 'berry']
print(group_by_first_char(mixed, case_sensitive=False))
Output:
{'a': ['an', 'a'], 't': ['tutorialreference'], 'f': ['for', 'free'], 'g': ['g']}
[['an', 'a'], ['tutorialreference'], ['for', 'free'], ['g']]
{'a': ['Apple', 'ant'], 'b': ['Banana', 'berry']}
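The introduction mentions alphabetical directories. Because these approaches keep keys in first-appearance order, a directory view still needs a sort. A minimal sketch with a made-up word list, grouping case-insensitively and skipping empty strings as the function above does:

```python
from collections import defaultdict

words = ['banana', 'Apple', 'cherry', 'avocado', 'Blueberry']

groups = defaultdict(list)
for word in words:
    if word:                                  # skip empty strings
        groups[word[0].lower()].append(word)  # case-insensitive key

# Sort the keys, and the words inside each group, for display
directory = [(letter.upper(), sorted(groups[letter], key=str.lower))
             for letter in sorted(groups)]

for letter, entries in directory:
    print(letter, ', '.join(entries))
# A Apple, avocado
# B banana, Blueberry
# C cherry
```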
Quick Comparison of Methods
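| Method | Time Complexity | Group Order | Import Required | Best For |
|---|---|---|---|---|
| defaultdict | O(n) | ✅ First appearance | collections | Most use cases (recommended) |
| setdefault() | O(n) | ✅ First appearance | None | No-import solutions |
| sorted() + groupby() | O(n log n) | ❌ Sorted by key | itertools | When sorted output is desired |

All three methods keep words in their original relative order within each group; they differ only in the order of the groups.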
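A quick way to confirm the table is to run all three methods on the sample list and compare. This sketch checks that they hold the same groups and differ only in key order:

```python
from collections import defaultdict
from itertools import groupby

words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']

def first(w):
    """Grouping key: the first character of a string."""
    return w[0]

# 1. defaultdict
dd = defaultdict(list)
for w in words:
    dd[first(w)].append(w)

# 2. setdefault()
sd = {}
for w in words:
    sd.setdefault(first(w), []).append(w)

# 3. sorted() + groupby()
gb = {k: list(g) for k, g in groupby(sorted(words, key=first), key=first)}

# Dict equality ignores key order, so all three hold the same groups...
print(dict(dd) == sd == gb)    # True
# ...but only the first two keep first-appearance key order
print(list(dd), list(sd), list(gb))
# ['a', 't', 'f', 'g'] ['a', 't', 'f', 'g'] ['a', 'f', 'g', 't']
```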
Conclusion
Grouping strings by their first character in Python is straightforward with the right approach:
- defaultdict is the recommended method: it is fast (O(n)), clean, preserves insertion order, and handles the grouping logic elegantly.
- dict.setdefault() achieves the same result without any imports.
- itertools.groupby() works well when you also need sorted output, but remember it requires the list to be sorted first.
- Always handle edge cases like empty strings and mixed-case characters.
- For case-insensitive grouping, normalize the key with .lower() while preserving the original strings.