How to Group a List of Strings by First Character in Python

Grouping strings by their first character is a common operation in data processing, search indexing, building alphabetical directories, and organizing categorized content. For example, grouping ['apple', 'ant', 'banana', 'berry'] by first letter produces {'a': ['apple', 'ant'], 'b': ['banana', 'berry']}.

In this guide, you'll learn the most effective methods to group a list of strings by their initial character in Python, with clear examples and guidance on choosing the right approach.

Understanding the Problem

Given a list of strings, the goal is to organize them into groups where all strings in a group share the same first character.

Input:

words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']

Expected output (as a dictionary):

{'a': ['an', 'a'], 't': ['tutorialreference'], 'f': ['for', 'free'], 'g': ['g']}

Or as a list of lists:

[['an', 'a'], ['tutorialreference'], ['for', 'free'], ['g']]

Using defaultdict (Recommended)

The defaultdict class from the collections module is usually the cleanest approach, and it groups the list in a single pass. It automatically initializes missing keys with an empty list, eliminating the need for existence checks:

from collections import defaultdict

words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']

groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)

print(dict(groups))

Output:

{'a': ['an', 'a'], 't': ['tutorialreference'], 'f': ['for', 'free'], 'g': ['g']}

How it works:

  1. defaultdict(list) creates a dictionary that automatically initializes any new key with an empty list.
  2. For each word, word[0] extracts the first character.
  3. The word is appended to the list associated with that character.
  4. No need to check if the key exists: defaultdict handles it.

Why this is the best approach

defaultdict runs in O(n) time with a single pass through the list. It preserves the original order of words within each group and requires no sorting. For most grouping tasks, this is the go-to solution.
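For comparison, here is a minimal sketch of the same grouping with a plain dict and an explicit existence check, which is the boilerplate that defaultdict removes. The variable name first is only illustrative. The last line shows that the list-of-lists form is simply the dictionary's values.

```python
# Manual grouping with a plain dict, for comparison with defaultdict.
words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']

groups = {}
for word in words:
    first = word[0]
    if first not in groups:  # the existence check defaultdict makes unnecessary
        groups[first] = []
    groups[first].append(word)

print(groups)
# {'a': ['an', 'a'], 't': ['tutorialreference'], 'f': ['for', 'free'], 'g': ['g']}

# The list-of-lists form is the dictionary's values, in insertion order.
print(list(groups.values()))
# [['an', 'a'], ['tutorialreference'], ['for', 'free'], ['g']]
```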

Using setdefault() (No Imports Needed)

If you prefer to avoid imports entirely, the built-in dict.setdefault() method provides similar functionality:

words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']

groups = {}
for word in words:
    groups.setdefault(word[0], []).append(word)

print(groups)

Output:

{'a': ['an', 'a'], 't': ['tutorialreference'], 'f': ['for', 'free'], 'g': ['g']}

Note: setdefault(key, default) returns the value for key if it exists; otherwise, it sets the key to default and returns it. This provides the same one-pass grouping as defaultdict without any imports.
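The return-value behavior of setdefault() is easiest to see in isolation. The following is a small sketch; the names first and second are only illustrative.

```python
groups = {}

# Key is missing: setdefault stores the default and returns that same list.
first = groups.setdefault('a', [])
first.append('an')

# Key exists: the new default is ignored and the stored list is returned.
second = groups.setdefault('a', [])
second.append('a')

print(first is second)  # True
print(groups)           # {'a': ['an', 'a']}
```

One design note: the default argument ([] here) is evaluated on every call, even when the key already exists, so a throwaway empty list is created each time. For most lists this cost is negligible.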

Using sorted() with itertools.groupby()

The itertools.groupby() function groups consecutive elements that share the same key. Since it only groups adjacent elements, the list must be sorted first:

from itertools import groupby

words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']

sorted_words = sorted(words, key=lambda w: w[0])
groups = {key: list(group) for key, group in groupby(sorted_words, key=lambda w: w[0])}

print(groups)

Output:

{'a': ['an', 'a'], 'f': ['for', 'free'], 'g': ['g'], 't': ['tutorialreference']}

How it works:

  1. sorted() arranges words alphabetically by their first character.
  2. groupby() then iterates through the sorted list, grouping consecutive words with the same first character.
  3. A dictionary comprehension collects the groups.
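
Sorting changes the order

Note that sorting changes the order of the groups: keys appear in sorted order rather than in the order they were first seen. Because Python's sort is stable, words within each group keep their original relative order (for example, 'an' still precedes 'a'). If the original order of the groups matters, use defaultdict or setdefault() instead.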
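When alphabetical group order is the goal, one alternative to sorting the whole list is to group first and then sort only the keys. The sketch below assumes the defaultdict loop from earlier; the name directory is hypothetical.

```python
from collections import defaultdict

words = ['tutorialreference', 'for', 'an', 'g', 'a', 'free']

groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)

# Sort only the (key, group) pairs. Keys are unique, so the lists
# themselves are never compared and each group's order is unchanged.
directory = dict(sorted(groups.items()))
print(directory)
# {'a': ['an', 'a'], 'f': ['for', 'free'], 'g': ['g'], 't': ['tutorialreference']}
```

This sorts one entry per distinct first character instead of one entry per word, which may matter for long lists with few distinct initials.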

Common Mistake: Using groupby() Without Sorting

A frequent error is using groupby() on an unsorted list. Since groupby() only groups consecutive elements with the same key, unsorted input produces incorrect results:

Wrong approach: unsorted input.

from itertools import groupby

words = ['an', 'tutorialreference', 'a', 'for', 'g', 'free'] # 'a' words are not adjacent

groups = {key: list(group) for key, group in groupby(words, key=lambda w: w[0])}
print(groups)

Output:

{'a': ['a'], 't': ['tutorialreference'], 'f': ['free'], 'g': ['g']}

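Words starting with 'a' are split into separate groups because 'an' and 'a' are not adjacent, and the same happens to 'for' and 'free'. Each repeated key overwrites the earlier entry, so only the last group for each key survives in the dictionary and 'an' and 'for' are silently lost.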
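To see exactly what groupby() produces on the unsorted list, it can help to collect the runs as a list of pairs instead of a dictionary, so repeated keys stay visible. This is a small diagnostic sketch; the name runs is only illustrative.

```python
from itertools import groupby

words = ['an', 'tutorialreference', 'a', 'for', 'g', 'free']

# Materialize each consecutive run as (key, members) so that
# repeated keys are not overwritten as they would be in a dict.
runs = [(key, list(group)) for key, group in groupby(words, key=lambda w: w[0])]

for key, members in runs:
    print(key, members)
# a ['an']
# t ['tutorialreference']
# a ['a']
# f ['for']
# g ['g']
# f ['free']
```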

Correct approach: sort first.

from itertools import groupby

words = ['an', 'tutorialreference', 'a', 'for', 'g', 'free']

sorted_words = sorted(words, key=lambda w: w[0])
groups = {key: list(group) for key, group in groupby(sorted_words, key=lambda w: w[0])}
print(groups)

Output:

{'a': ['an', 'a'], 'f': ['for', 'free'], 'g': ['g'], 't': ['tutorialreference']}

Case-Insensitive Grouping

In real-world data, you may have mixed-case strings. To group "Apple" and "ant" together, normalize the first character:

from collections import defaultdict

words = ['Apple', 'ant', 'Banana', 'berry', 'avocado', 'Blueberry']

groups = defaultdict(list)
for word in words:
    groups[word[0].lower()].append(word)

print(dict(groups))

Output:

{'a': ['Apple', 'ant', 'avocado'], 'b': ['Banana', 'berry', 'Blueberry']}

Using .lower() on the first character ensures case-insensitive grouping while preserving the original casing of each word.
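The same normalization can be applied to the sorted() + groupby() approach, provided the sort key and the grouping key are the same function. Here is a minimal sketch; first_lower is a hypothetical helper name, and it assumes no empty strings in the list.

```python
from itertools import groupby

words = ['Apple', 'ant', 'Banana', 'berry', 'avocado', 'Blueberry']

def first_lower(word):
    """Hypothetical helper: lowercased first character of a non-empty string."""
    return word[0].lower()

# Sort and group with the SAME key so that equal keys end up adjacent.
sorted_words = sorted(words, key=first_lower)
groups = {key: list(group) for key, group in groupby(sorted_words, key=first_lower)}

print(groups)
# {'a': ['Apple', 'ant', 'avocado'], 'b': ['Banana', 'berry', 'Blueberry']}
```

If the list were sorted without a key, uppercase letters would sort before lowercase ones, 'Apple' and 'ant' would no longer be adjacent, and groupby() would split them into separate runs.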

Handling Edge Cases

Empty Strings in the List

If your list might contain empty strings, accessing word[0] will raise an IndexError. Skip them with a truthiness check:

from collections import defaultdict

words = ['an', '', 'tutorialreference', '', 'free']

groups = defaultdict(list)
for word in words:
    if word:  # Skip empty strings
        groups[word[0]].append(word)

print(dict(groups))

Output:

{'a': ['an'], 't': ['tutorialreference'], 'f': ['free']}
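If empty strings should be kept rather than skipped, one possible variation (an assumption, not part of the approach above) is to use the slice word[:1] as the key. Slicing an empty string returns an empty string instead of raising, so empty entries are collected under the '' key.

```python
from collections import defaultdict

words = ['an', '', 'tutorialreference', '', 'free']

groups = defaultdict(list)
for word in words:
    # word[:1] is '' for an empty string, so no IndexError is raised.
    groups[word[:1]].append(word)

print(dict(groups))
# {'a': ['an'], '': ['', ''], 't': ['tutorialreference'], 'f': ['free']}
```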

Empty List

from collections import defaultdict

words = []

groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)

print(dict(groups))

Output:

{}

No iteration occurs, so an empty dictionary is returned safely.

Creating a Reusable Function

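The following function combines the techniques above, with options for case sensitivity and output format. It skips empty strings, and the | in its return annotation requires Python 3.10 or later: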

from collections import defaultdict

def group_by_first_char(
    words: list[str],
    case_sensitive: bool = True,
    as_dict: bool = True,
) -> dict[str, list[str]] | list[list[str]]:
    """Group strings by their first character.

    Args:
        words: List of strings to group.
        case_sensitive: If False, groups 'Apple' and 'ant' together.
        as_dict: If True, returns a dict. If False, returns a list of lists.

    Returns:
        Grouped strings as a dictionary or list of lists.
    """
    groups = defaultdict(list)
    for word in words:
        if not word:
            continue  # Skip empty strings, which have no first character
        key = word[0] if case_sensitive else word[0].lower()
        groups[key].append(word)

    if as_dict:
        return dict(groups)
    return list(groups.values())


# Usage examples
words = ['an', 'a', 'tutorialreference', 'for', 'g', 'free']

print(group_by_first_char(words))
print(group_by_first_char(words, as_dict=False))

mixed = ['Apple', 'ant', 'Banana', 'berry']
print(group_by_first_char(mixed, case_sensitive=False))

Output:

{'a': ['an', 'a'], 't': ['tutorialreference'], 'f': ['for', 'free'], 'g': ['g']}
[['an', 'a'], ['tutorialreference'], ['for', 'free'], ['g']]
{'a': ['Apple', 'ant'], 'b': ['Banana', 'berry']}

Quick Comparison of Methods

| Method | Time Complexity | Preserves Order | Import Required | Best For |
| --- | --- | --- | --- | --- |
| defaultdict | O(n) | ✅ Yes | collections | Most use cases (recommended) |
| setdefault() | O(n) | ✅ Yes | None | No-import solutions |
| sorted() + groupby() | O(n log n) | ❌ Groups sorted by key | itertools | When sorted output is desired |
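As a quick sanity check on the comparison, the sketch below runs all three methods on the same input and confirms that they produce the same groups, differing only in key order. The helper first_char and the by_* variable names are hypothetical.

```python
from collections import defaultdict
from itertools import groupby

words = ['an', 'tutorialreference', 'a', 'for', 'g', 'free']

def first_char(word):
    """Hypothetical helper: first character of a non-empty string."""
    return word[0]

by_defaultdict = defaultdict(list)
for word in words:
    by_defaultdict[first_char(word)].append(word)

by_setdefault = {}
for word in words:
    by_setdefault.setdefault(first_char(word), []).append(word)

by_groupby = {
    key: list(group)
    for key, group in groupby(sorted(words, key=first_char), key=first_char)
}

# Dictionary equality ignores key order, so the contents can be compared directly.
print(dict(by_defaultdict) == by_setdefault == by_groupby)  # True

# Key order is where the methods differ.
print(list(by_setdefault))  # ['a', 't', 'f', 'g']
print(list(by_groupby))     # ['a', 'f', 'g', 't']
```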

Conclusion

Grouping strings by their first character in Python is straightforward with the right approach:

  • defaultdict is the recommended method: it is fast (O(n)), clean, preserves insertion order, and handles the grouping logic elegantly.
  • dict.setdefault() achieves the same result without any imports.
  • itertools.groupby() works well when you also need sorted output, but remember it requires the list to be sorted first.
  • Always handle edge cases like empty strings and mixed-case characters.
  • For case-insensitive grouping, normalize the key with .lower() while preserving the original strings.