Python PyYAML: How to Use Custom Tags in PyYAML
YAML's flexibility extends beyond basic data types through custom tags, which map YAML content directly onto Python objects. Instead of parsing generic dictionaries and instantiating classes by hand, you can have PyYAML construct specialized objects such as configuration classes, data models, or domain-specific types. This guide shows how to register constructors and representers so those objects survive a round trip between YAML and Python.
Understanding Custom Tags
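A custom tag labels a YAML node so that PyYAML can hand it to a function you register. A tag like !server signals that the mapping following it should become a ServerConfig object rather than a plain dictionary. This only happens if a constructor for !server has been registered; otherwise loading fails.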
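To see what a tag does on its own, the sketch below first composes a tagged document with PyYAML's stock SafeLoader. Composing parses the document into nodes without constructing Python objects. The sketch then tries to load the same document. Because no constructor is registered, the node keeps its !server tag but construction fails. The variable names are illustrative only.

```python
import yaml
from yaml.constructor import ConstructorError

doc = """
!server
host: api.example.com
port: 443
"""

# Composing builds the node graph; the tag is kept even though nothing handles it yet
node = yaml.compose(doc, Loader=yaml.SafeLoader)
print(node.tag)  # !server

# Constructing Python objects fails because no constructor is registered for '!server'
try:
    yaml.load(doc, Loader=yaml.SafeLoader)
    problem = None
except ConstructorError as exc:
    problem = exc.problem
print(problem)  # could not determine a constructor for the tag '!server'
```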
Implementing a Complete Custom Tag
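Register both a representer (Python → YAML) and a constructor (YAML → Python) for full round-trip support. Called without a Loader argument, yaml.add_constructor() registers the tag on yaml.Loader, yaml.FullLoader and yaml.UnsafeLoader (PyYAML 5.1 and later) but not on yaml.SafeLoader, so this example loads with FullLoader: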
import yaml
class ServerConfig:
    """Configuration class for server settings."""
    def __init__(self, host, port, ssl=False):
        self.host = host
        self.port = port
        self.ssl = ssl
    def __repr__(self):
        return f"ServerConfig(host={self.host}, port={self.port}, ssl={self.ssl})"
def server_representer(dumper, data):
    """Convert ServerConfig object to YAML representation."""
    return dumper.represent_mapping('!server', {
        'host': data.host,
        'port': data.port,
        'ssl': data.ssl
    })
def server_constructor(loader, node):
    """Construct ServerConfig object from YAML node."""
    values = loader.construct_mapping(node)
    return ServerConfig(**values)
# Register the custom tag handlers
yaml.add_representer(ServerConfig, server_representer)
yaml.add_constructor('!server', server_constructor)
# Loading from YAML
yaml_content = """
!server
host: api.example.com
port: 443
ssl: true
"""
config = yaml.load(yaml_content, Loader=yaml.FullLoader)
print(config)  # ServerConfig(host=api.example.com, port=443, ssl=True)
print(f"Connecting to {config.host}:{config.port}")
# Dumping back to YAML
output = yaml.dump(config)
print(output)
Output:
ServerConfig(host=api.example.com, port=443, ssl=True)
Connecting to api.example.com:443
!server
host: api.example.com
port: 443
ssl: true
Tags provide type safety and encapsulation. Your application receives fully formed objects that can carry methods and validation logic, rather than raw dictionaries that every caller has to unpack by hand.
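The sketch below illustrates the validation point, assuming you want malformed configs rejected at load time. The constructor checks the port range and raises ConstructorError with the node's start mark, so the error message can point at the offending position in the YAML. It also passes deep=True to construct_mapping(). With the default shallow construction, a nested list such as aliases can still be empty when __init__ runs, because PyYAML fills it in afterwards. ValidatingLoader and the aliases field are hypothetical additions, not part of the example above.

```python
import yaml
from yaml.constructor import ConstructorError

class ServerConfig:
    def __init__(self, host, port, ssl=False, aliases=()):
        self.host = host
        self.port = port
        self.ssl = ssl
        self.aliases = list(aliases)  # copied, so a not-yet-filled list would stay empty

class ValidatingLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass used only for this sketch."""

def validated_server_constructor(loader, node):
    # deep=True builds nested values (e.g. the aliases list) before we use them
    values = loader.construct_mapping(node, deep=True)
    port = values.get("port")
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        # node.start_mark records where the tagged node begins in the source
        raise ConstructorError(None, None, f"invalid port {port!r}", node.start_mark)
    return ServerConfig(**values)

ValidatingLoader.add_constructor("!server", validated_server_constructor)

good = yaml.load(
    "!server {host: api.example.com, port: 443, aliases: [api1, api2]}",
    Loader=ValidatingLoader,
)
print(good.port, good.aliases)  # 443 ['api1', 'api2']

try:
    yaml.load("!server {host: api.example.com, port: 70000}", Loader=ValidatingLoader)
    problem = None
except ConstructorError as exc:
    problem = exc.problem
print(problem)  # invalid port 70000
```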
Multiple Custom Tags
Register multiple tags for different domain objects:
import yaml
class DatabaseConfig:
    def __init__(self, engine, name, host="localhost"):
        self.engine = engine
        self.name = name
        self.host = host
class CacheConfig:
    def __init__(self, backend, ttl=3600):
        self.backend = backend
        self.ttl = ttl
# Representer and constructor for DatabaseConfig
def db_representer(dumper, data):
    return dumper.represent_mapping('!database', {
        'engine': data.engine, 'name': data.name, 'host': data.host
    })
def db_constructor(loader, node):
    return DatabaseConfig(**loader.construct_mapping(node))
# Representer and constructor for CacheConfig
def cache_representer(dumper, data):
    return dumper.represent_mapping('!cache', {
        'backend': data.backend, 'ttl': data.ttl
    })
def cache_constructor(loader, node):
    return CacheConfig(**loader.construct_mapping(node))
# Register all tags
yaml.add_representer(DatabaseConfig, db_representer)
yaml.add_constructor('!database', db_constructor)
yaml.add_representer(CacheConfig, cache_representer)
yaml.add_constructor('!cache', cache_constructor)
# Configuration file with several tagged sections
config_yaml = """
database: !database
  engine: postgresql
  name: app_production
  host: db.example.com
cache: !cache
  backend: redis
  ttl: 7200
"""
config = yaml.load(config_yaml, Loader=yaml.FullLoader)
print(f"Database: {config['database'].engine}")
print(f"Cache TTL: {config['cache'].ttl}")
Output:
Database: postgresql
Cache TTL: 7200
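The two representer/constructor pairs above differ only in class and tag name. A hypothetical register_tag() helper can generate both, assuming each constructor parameter has the same name as the instance attribute it sets (true for both classes here). This sketch registers them on SafeLoader and SafeDumper subclasses (AppLoader and AppDumper, also hypothetical). It then round-trips the configuration through yaml.dump() and yaml.load().

```python
import yaml

class AppLoader(yaml.SafeLoader):
    """Hypothetical loader for this sketch; yaml.SafeLoader itself is unchanged."""

class AppDumper(yaml.SafeDumper):
    """Hypothetical dumper for this sketch; yaml.SafeDumper itself is unchanged."""

def register_tag(cls, tag):
    """Hypothetical helper: register a representer and a constructor for cls under tag."""
    def representer(dumper, obj):
        # Dump the instance attributes as a tagged mapping
        return dumper.represent_mapping(tag, vars(obj))

    def constructor(loader, node):
        # Rebuild the object, assuming attribute names match __init__ parameters
        return cls(**loader.construct_mapping(node, deep=True))

    AppDumper.add_representer(cls, representer)
    AppLoader.add_constructor(tag, constructor)
    return cls

class DatabaseConfig:
    def __init__(self, engine, name, host="localhost"):
        self.engine = engine
        self.name = name
        self.host = host

class CacheConfig:
    def __init__(self, backend, ttl=3600):
        self.backend = backend
        self.ttl = ttl

register_tag(DatabaseConfig, "!database")
register_tag(CacheConfig, "!cache")

original = {
    "database": DatabaseConfig("postgresql", "app_production", "db.example.com"),
    "cache": CacheConfig("redis", 7200),
}
text = yaml.dump(original, Dumper=AppDumper)
restored = yaml.load(text, Loader=AppLoader)
print(restored["database"].host, restored["cache"].ttl)  # db.example.com 7200
```

Each new config class then needs one register_tag() call instead of two hand-written functions. The trade-off is that every attribute is dumped, including any you might prefer to keep out of the file.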
Implicit Resolvers for Pattern-Based Detection
Automatically detect types based on patterns without explicit tags:
import yaml
import re
from datetime import date
# Automatically recognize DD/MM/YYYY dates. ISO dates such as 2024-06-15 are
# already claimed by PyYAML's built-in timestamp resolver, which is tried first,
# so a custom ISO pattern would never be used.
yaml.add_implicit_resolver(
    '!dmydate',
    re.compile(r'^\d{2}/\d{2}/\d{4}$'),
    first=list('0123456789')
)
def dmydate_constructor(loader, node):
    value = loader.construct_scalar(node)
    day, month, year = map(int, value.split('/'))
    return date(year, month, day)
yaml.add_constructor('!dmydate', dmydate_constructor)
# Dates are parsed automatically without explicit tags
data = yaml.load("""
event: Conference
start_date: 15/06/2024
end_date: 17/06/2024
""", Loader=yaml.FullLoader)
print(f"Type: {type(data['start_date'])}")  # <class 'datetime.date'>
print(f"Event starts: {data['start_date']}")
Output:
Type: <class 'datetime.date'>
Event starts: 2024-06-15
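The first= keyword argument of add_implicit_resolver() lists the characters a scalar must start with before its regular expression is tried. This saves PyYAML from running every pattern against every value. Resolvers are tried in registration order, and the built-in ones (bool, int, float, null, timestamp and a few others) are registered first. A custom pattern therefore only takes effect for scalars that none of them already match.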
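The registration-order rule can be checked directly. This sketch uses a throwaway SafeLoader subclass named DemoLoader (hypothetical). It registers an ISO-date resolver and a DD/MM/YYYY resolver after the built-ins, then asks yaml.compose() which tag each scalar receives. Composing resolves tags without constructing any objects, so no constructors are needed here.

```python
import re
import yaml

class DemoLoader(yaml.SafeLoader):
    """Hypothetical loader subclass so the global loaders stay untouched."""

digits = list("0123456789")
# Registered after the built-in timestamp resolver, which also matches ISO dates
DemoLoader.add_implicit_resolver("!isodate", re.compile(r"^\d{4}-\d{2}-\d{2}$"), digits)
# A pattern that none of the built-in resolvers match
DemoLoader.add_implicit_resolver("!dmydate", re.compile(r"^\d{2}/\d{2}/\d{4}$"), digits)

iso_tag = yaml.compose("2024-06-15", Loader=DemoLoader).tag
dmy_tag = yaml.compose("15/06/2024", Loader=DemoLoader).tag
print(iso_tag)  # tag:yaml.org,2002:timestamp
print(dmy_tag)  # !dmydate
```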
Scalar Value Tags
When a tag decorates a single scalar value, read it with loader.construct_scalar() and return whatever the application needs:
import yaml
import os
# Environment variable substitution
def env_constructor(loader, node):
    """Replace !env tag with environment variable value."""
    var_name = loader.construct_scalar(node)
    return os.environ.get(var_name, f"<{var_name} not set>")
yaml.add_constructor('!env', env_constructor)
# Configuration with environment references
config_yaml = """
database:
  password: !env DATABASE_PASSWORD
  host: !env DATABASE_HOST
"""
os.environ['DATABASE_PASSWORD'] = 'secret123'
os.environ['DATABASE_HOST'] = 'localhost'
config = yaml.load(config_yaml, Loader=yaml.FullLoader)
print(config['database']['password'])  # secret123
Output:
secret123
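The fallback string in env_constructor hides missing variables. The stricter variant below assumes a missing variable should abort loading, so !env NAME raises ConstructorError when NAME is unset. It also accepts a hypothetical !env [NAME, default] sequence form for optional values, telling the two forms apart by node type. EnvLoader and the APP_DB_* variable names are illustrative only.

```python
import os
import yaml
from yaml.constructor import ConstructorError
from yaml.nodes import ScalarNode

class EnvLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass for this sketch."""

def env_constructor(loader, node):
    if isinstance(node, ScalarNode):
        # !env NAME: the variable must exist
        name = loader.construct_scalar(node)
        if name not in os.environ:
            raise ConstructorError(
                None, None, f"environment variable {name!r} is not set", node.start_mark
            )
        return os.environ[name]
    # !env [NAME, default]: fall back to the default when the variable is unset
    name, default = loader.construct_sequence(node)
    return os.environ.get(name, default)

EnvLoader.add_constructor("!env", env_constructor)

os.environ["APP_DB_HOST"] = "db.internal"
os.environ.pop("APP_DB_PORT", None)
os.environ.pop("APP_DB_PASSWORD", None)

settings = yaml.load(
    "host: !env APP_DB_HOST\nport: !env [APP_DB_PORT, 5432]\n",
    Loader=EnvLoader,
)
print(settings)  # {'host': 'db.internal', 'port': 5432}

try:
    yaml.load("password: !env APP_DB_PASSWORD\n", Loader=EnvLoader)
    missing_error = None
except ConstructorError as exc:
    missing_error = exc.problem
print(missing_error)  # environment variable 'APP_DB_PASSWORD' is not set
```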
Safe Custom Loaders
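For production use, subclass yaml.SafeLoader and register constructors on the subclass. It accepts only the standard YAML tags plus the ones you add, and the global loaders are left unchanged: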
import yaml
class SafeConfigLoader(yaml.SafeLoader):
    """Custom loader with application-specific tags."""
    pass
# ServerConfig and yaml_content are defined in the first example
def server_constructor(loader, node):
    values = loader.construct_mapping(node)
    return ServerConfig(**values)
# Add constructor to the custom loader only; yaml.SafeLoader itself is unchanged
SafeConfigLoader.add_constructor('!server', server_constructor)
# Use the safe custom loader
config = yaml.load(yaml_content, Loader=SafeConfigLoader)
print(config)  # ServerConfig(host=api.example.com, port=443, ssl=True)
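A SafeLoader subclass only covers loading. For the dump direction, a SafeDumper subclass can be given the representer in the same way. The stock yaml.safe_dump() still refuses the object, because it has no representer for it. SafeConfigDumper is a hypothetical name, and ServerConfig is redefined so the block runs on its own.

```python
import yaml
from yaml.representer import RepresenterError

class ServerConfig:
    def __init__(self, host, port, ssl=False):
        self.host = host
        self.port = port
        self.ssl = ssl

class SafeConfigDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass that also knows how to write ServerConfig."""

def server_representer(dumper, data):
    return dumper.represent_mapping(
        "!server", {"host": data.host, "port": data.port, "ssl": data.ssl}
    )

# Register on the subclass only; yaml.SafeDumper keeps rejecting unknown objects
SafeConfigDumper.add_representer(ServerConfig, server_representer)

config = ServerConfig("api.example.com", 443, ssl=True)
text = yaml.dump(config, Dumper=SafeConfigDumper)
print(text)

# The stock SafeDumper has no representer for ServerConfig
try:
    yaml.safe_dump(config)
    refused = False
except RepresenterError:
    refused = True
print(refused)  # True
```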
Method Reference
| Method | Purpose | Direction |
|---|---|---|
| add_representer() | Convert objects of one exact Python type → YAML | Dump |
| add_constructor() | Convert a tagged YAML node → Python object | Load |
| add_implicit_resolver() | Assign a tag to untagged scalars matching a regex | Load |
| add_multi_representer() | Convert a class and all of its subclasses → YAML | Dump |
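add_multi_representer() is listed above without an example. The sketch below assumes a hypothetical BaseConfig hierarchy and uses a single representer for every subclass, deriving the tag from the class name. With add_representer(), each subclass would need its own registration, because that lookup matches the exact type only. ConfigDumper is likewise a hypothetical name.

```python
import yaml

class BaseConfig:
    """Hypothetical base class; subclasses are dumped with a tag named after the class."""

class WebConfig(BaseConfig):
    def __init__(self, port):
        self.port = port

class WorkerConfig(BaseConfig):
    def __init__(self, threads):
        self.threads = threads

class ConfigDumper(yaml.SafeDumper):
    """Hypothetical dumper so yaml.SafeDumper itself is unchanged."""

def config_representer(dumper, data):
    # One function covers every subclass of BaseConfig
    tag = "!" + type(data).__name__
    return dumper.represent_mapping(tag, vars(data))

ConfigDumper.add_multi_representer(BaseConfig, config_representer)

text = yaml.dump([WebConfig(8080), WorkerConfig(4)], Dumper=ConfigDumper)
print(text)
```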
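Never call yaml.load() without specifying a Loader. Before PyYAML 5.1, the default was the unsafe yaml.Loader, which can construct arbitrary Python objects and run code embedded in the document; since PyYAML 6.0 the Loader argument is required. Choose the loader by how much you trust the input:
- yaml.SafeLoader for untrusted input
- yaml.FullLoader for trusted application configs
- Custom SafeLoader subclasses for production systems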
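The sketch below shows why SafeLoader is the right default for untrusted input. It feeds SafeLoader a document that requests a Python call through a !!python/object/apply tag. SafeLoader has no constructor for python/* tags, so loading raises ConstructorError instead of running anything. The payload is deliberately harmless.

```python
import yaml
from yaml.constructor import ConstructorError

# A document that asks for a Python callable to be invoked while loading
payload = "!!python/object/apply:builtins.print ['this should never run']"

try:
    yaml.load(payload, Loader=yaml.SafeLoader)
    blocked = False
except ConstructorError as exc:
    blocked = True
    print(exc.problem)
print(blocked)  # True
```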
By implementing custom tags, you create expressive, type-safe configuration systems that bridge YAML's human-readable format with Python's object-oriented design patterns.