What data structure should I use in Python? Dataclass vs NamedTuple vs Class
Choosing the right data structure for storing related values reduces boilerplate code and improves clarity. Python offers several options, each with distinct characteristics.
Quick Comparison
| Feature | Regular Class | NamedTuple | Dataclass |
|---|---|---|---|
| Mutability | Mutable | Immutable | Mutable (configurable) |
| Boilerplate | High | Low | Low |
| Default values | Manual | Limited | Easy |
| Type hints | Optional | Required | Required |
| Inheritance | Full support | Limited | Full support |
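| Hashable | Yes, by identity (lost if __eq__ is defined without __hash__) | Yes | No by default (configurable) |
| Memory | Higher | Lower | Same as regular class (lower with slots=True) |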
Regular Class
Traditional classes require manual implementation of common methods:
class User:
    def __init__(self, id: int, name: str, email: str = ""):
        self.id = id
        self.name = name
        self.email = email

    def __repr__(self):
        # !r quotes string values, matching the output shown below
        return f"User(id={self.id!r}, name={self.name!r}, email={self.email!r})"

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented  # lets Python try the reflected comparison
        # Compare every field, including email
        return (self.id, self.name, self.email) == (other.id, other.name, other.email)

user = User(1, "Alice", "alice@example.com")
print(user)  # User(id=1, name='Alice', email='alice@example.com')
Best for objects with significant behavior beyond data storage.
NamedTuple
Immutable, memory-efficient records with tuple compatibility:
from typing import NamedTuple

class User(NamedTuple):
    id: int
    name: str
    email: str = ""

user = User(1, "Alice", "alice@example.com")

# Tuple-like access
print(user[0])    # 1
print(user.name)  # Alice

# Unpacking works (user_id avoids shadowing the built-in id)
user_id, name, email = user

# Immutable - this raises an error
# user.name = "Bob"  # AttributeError
NamedTuples are hashable as long as every field value is hashable, which makes them suitable as dictionary keys or set members.
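As a minimal sketch of the hashability point above, the following uses a NamedTuple as a dictionary key and shows _replace, the standard way to derive a modified copy from an immutable record. The GridCell class and the visits dictionary are hypothetical names chosen only for illustration.

```python
from typing import NamedTuple


class GridCell(NamedTuple):  # hypothetical example type
    row: int
    col: int


# Instances with equal field values hash the same, so they work as dict keys
visits = {GridCell(0, 0): 3, GridCell(0, 1): 1}
visits[GridCell(0, 0)] += 1

# _replace returns a new instance; the original is left untouched
origin = GridCell(0, 0)
moved = origin._replace(col=5)

print(visits[origin])  # 4
print(moved)           # GridCell(row=0, col=5)
print(origin)          # GridCell(row=0, col=0)
```

Because a NamedTuple is still a tuple, GridCell(0, 0) also compares equal to the plain tuple (0, 0), which is worth keeping in mind when mixing key types in one dictionary.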
Dataclass
The modern standard for data containers with sensible defaults:
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str
    email: str = ""

user = User(1, "Alice")
user.email = "alice@example.com"  # Mutable by default
print(user)  # User(id=1, name='Alice', email='alice@example.com')
The decorator generates the __init__, __repr__, and __eq__ methods automatically.
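To illustrate what the generated methods do in practice, here is a small sketch that assumes the default decorator settings (eq=True, frozen=False). Account is a hypothetical class name. One consequence worth knowing is that a default dataclass defines __eq__ and therefore has __hash__ set to None, so its instances cannot be used as dictionary keys unless the class is configured otherwise.

```python
from dataclasses import dataclass


@dataclass
class Account:  # hypothetical example type
    id: int
    owner: str


a = Account(1, "Alice")
b = Account(1, "Alice")

# Generated __eq__ compares field values, not object identity
print(a == b)  # True
print(a is b)  # False

# Generated __repr__ lists every field
print(repr(a))  # Account(id=1, owner='Alice')

# With eq=True and frozen=False, __hash__ is set to None
try:
    hash(a)
except TypeError as exc:
    print(f"unhashable: {exc}")
```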
Dataclass Configuration Options
Customize behavior with decorator parameters:
from dataclasses import dataclass, field

@dataclass(frozen=True)  # Makes it immutable
class Point:
    x: float
    y: float

@dataclass(order=True)  # Adds comparison methods
class Priority:
    level: int
    task: str

@dataclass
class Config:
    name: str
    values: list = field(default_factory=list)  # Mutable default

    def __post_init__(self):
        # Called after __init__
        self.name = self.name.upper()
Available options:
- frozen=True: Makes instances immutable and hashable
- order=True: Generates __lt__, __le__, __gt__, __ge__
- slots=True (Python 3.10+): Uses __slots__ for memory efficiency
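The effect of the frozen and order options can be sketched as follows, reusing the Point and Priority shapes from the example above. The task strings are made-up sample data.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(order=True)
class Priority:
    level: int
    task: str


p = Point(1.0, 2.0)
try:
    p.x = 5.0  # assignment is blocked on a frozen instance
except FrozenInstanceError:
    print("cannot modify a frozen Point")

# frozen=True together with the default eq=True makes instances hashable
seen = {p, Point(1.0, 2.0)}
print(len(seen))  # 1

# order=True compares instances field by field, in declaration order
tasks = [Priority(2, "write docs"), Priority(1, "fix bug"), Priority(1, "add tests")]
ordered = sorted(tasks)
print([t.task for t in ordered])  # ['add tests', 'fix bug', 'write docs']
```

Note that ordering uses every field in declaration order, so ties on level fall through to an alphabetical comparison of task.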
Default Values with Mutable Types
NamedTuple and dataclass handle mutable defaults differently:
from dataclasses import dataclass, field
from typing import NamedTuple

# Dataclass - use field() for mutable defaults
@dataclass
class Team:
    name: str
    members: list = field(default_factory=list)

# NamedTuple - no default_factory, and a mutable default would be shared
class TeamRecord(NamedTuple):  # renamed so it does not replace Team above
    name: str
    members: tuple = ()  # Use immutable type instead
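Do not write a mutable default such as members: list = [] in a dataclass. The decorator rejects list, dict, and set defaults with a ValueError at class definition time, because every instance would otherwise share the same object. Use field(default_factory=list) instead.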
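The following sketch shows both halves of that rule: what happens when a list literal is used as a default, and how default_factory gives each instance its own list. BadTeam is a hypothetical class that exists only to trigger the error, and the exact wording of the error message may differ between Python versions.

```python
from dataclasses import dataclass, field

error_message = None
try:
    @dataclass
    class BadTeam:  # hypothetical class used only to trigger the error
        name: str
        members: list = []
except ValueError as exc:
    error_message = str(exc)
    print(f"rejected: {error_message}")


@dataclass
class Team:
    name: str
    members: list = field(default_factory=list)  # a fresh list per instance


red = Team("red")
blue = Team("blue")
red.members.append("Alice")

print(red.members)   # ['Alice']
print(blue.members)  # []
```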
Inheritance
Dataclasses support inheritance naturally:
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

@dataclass
class Employee(Person):
    employee_id: str
    department: str = "General"

emp = Employee("Alice", 30, "E001", "Engineering")
print(emp)  # Employee(name='Alice', age=30, employee_id='E001', department='Engineering')
NamedTuple inheritance is more limited: a subclass of a NamedTuple class can add methods, but it cannot add new fields.
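As a sketch of that limitation, the example below subclasses a NamedTuple and tries to add a field the same way the dataclass version does. The class names mirror the dataclass example, and the label method is a hypothetical addition for illustration.

```python
from typing import NamedTuple


class Person(NamedTuple):
    name: str
    age: int


class Employee(Person):
    # This annotation does NOT become a tuple field; it is a plain class attribute
    department: str = "General"

    def label(self) -> str:
        return f"{self.name} ({self.department})"


print(Employee._fields)  # ('name', 'age')

emp = Employee("Alice", 30)
print(emp.label())  # Alice (General)

try:
    Employee("Alice", 30, "Engineering")  # a third positional value is rejected
except TypeError:
    print("Employee still accepts only name and age")
```

If the subclass genuinely needs extra fields, defining a new NamedTuple that lists all fields, or switching to a dataclass, is usually the simpler route.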
Performance Comparison
from dataclasses import dataclass
from typing import NamedTuple
import sys

class UserClass:
    def __init__(self, id, name):
        self.id = id
        self.name = name

class UserTuple(NamedTuple):
    id: int
    name: str

@dataclass
class UserData:
    id: int
    name: str

@dataclass(slots=True)
class UserSlots:
    id: int
    name: str

# Memory comparison
instances = [
    UserClass(1, "A"),
    UserTuple(1, "A"),
    UserData(1, "A"),
    UserSlots(1, "A"),
]

for obj in instances:
    print(f"{type(obj).__name__}: {sys.getsizeof(obj)} bytes")
Example output (exact values vary by Python version and platform):
UserClass: 56 bytes
UserTuple: 56 bytes
UserData: 56 bytes
UserSlots: 48 bytes
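sys.getsizeof reports only the shallow size of each object and excludes the per-instance __dict__ that UserClass and UserData carry, so their real footprint is larger than these numbers suggest. NamedTuple and the slotted dataclass have no per-instance __dict__, which makes them the compact options; in this output the slotted dataclass is the smallest.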
Conversion Between Types
from dataclasses import dataclass, asdict, astuple
from typing import NamedTuple

@dataclass
class User:
    id: int
    name: str

user = User(1, "Alice")

# To dictionary
user_dict = asdict(user)  # {'id': 1, 'name': 'Alice'}

# To tuple
user_tuple = astuple(user)  # (1, 'Alice')

# From dictionary
user_from_dict = User(**user_dict)
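The same idea extends to NamedTuple, which provides the _asdict and _make methods. The sketch below converts in both directions; UserRecord is a hypothetical NamedTuple assumed to have the same fields as the User dataclass.

```python
from dataclasses import asdict, astuple, dataclass
from typing import NamedTuple


@dataclass
class User:
    id: int
    name: str


class UserRecord(NamedTuple):  # hypothetical NamedTuple with the same fields
    id: int
    name: str


user = User(1, "Alice")

# Dataclass -> NamedTuple, by keyword or by position
record = UserRecord(**asdict(user))
same_record = UserRecord._make(astuple(user))

# NamedTuple -> dictionary -> dataclass
record_dict = record._asdict()
user_again = User(**record_dict)

print(record)              # UserRecord(id=1, name='Alice')
print(record_dict)         # {'id': 1, 'name': 'Alice'}
print(user_again == user)  # True
```

Keep in mind that asdict and astuple recurse into nested dataclasses and copy the values, so they can be slow on large nested structures.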
When to Use Each
Use Regular Class When:
- Object has significant behavior (methods)
- Complex initialization logic required
- Need full control over all aspects
Use NamedTuple When:
- Immutability is required
- Need tuple unpacking or indexing
- Memory efficiency is critical
- Using as dictionary keys
Use Dataclass When:
- Primary purpose is storing data
- Need mutable instances (default) or immutable (frozen=True)
- Want automatic method generation
- Need default values and type hints
from dataclasses import dataclass
from typing import NamedTuple

# NamedTuple: Coordinates that shouldn't change
class Point(NamedTuple):
    x: float
    y: float

# Dataclass: Configurable settings
@dataclass
class Settings:
    theme: str = "dark"
    font_size: int = 12

# Regular class: Complex behavior
class DatabaseConnection:
    def __init__(self, host, port):
        self.host = host
        self.port = port
        self._connection = None

    def connect(self):
        # Connection logic
        pass
Python 3.10+ dataclasses with slots=True approach NamedTuple memory efficiency while retaining mutability and easier inheritance.
Default to dataclass for most data-holding needs. Choose NamedTuple when immutability and tuple compatibility are essential. Reserve regular classes for objects with complex behavior.