How to Parse a YAML File in Python
YAML (YAML Ain't Markup Language) is a human-readable data serialization format commonly used for configuration files, deployment manifests, CI/CD pipelines, and data exchange. Its clean, indentation-based syntax makes it more readable than JSON or XML for many use cases.
Python's PyYAML library provides a straightforward way to read (parse), write, and manipulate YAML files. In this guide, you will learn how to install PyYAML, parse single and multi-document YAML files, access specific values, and handle YAML data safely.
Installing PyYAML
Install the library using pip:
pip install pyyaml
Then import it in your Python script:
import yaml
Sample YAML Files
We will use the following YAML files throughout this guide.
config.yml: A single-document YAML file:
UserName: TutorialReference
Password: Password@123
Phone: 1234567890
Website: tutorialreference.com
Skills:
- Python
- SQL
- Django
- JavaScript
multi_docs.yml: A multi-document YAML file (documents separated by ---):
---
UserName: TutorialReference
Password: Password@123
Website: tutorialreference.com
...
---
UserName: Google
Password: google@123
Website: google.com
...
---
UserName: Yahoo
Password: yahoo@123
Website: yahoo.com
...
Parsing a Single YAML Document
Using yaml.safe_load(): The Recommended Approach
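safe_load() parses a YAML document and returns the corresponding Python object, which is a dictionary when the top level of the document is a mapping, as in config.yml. It is the safest loading function because it only constructs basic Python objects (strings, numbers, lists, dicts) and refuses tags that could trigger arbitrary code execution: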
import yaml
with open('config.yml', 'r') as f:
    data = yaml.safe_load(f)
print(data)
print(type(data))
Output:
{'UserName': 'TutorialReference', 'Password': 'Password@123', 'Phone': 1234567890,
'Website': 'tutorialreference.com', 'Skills': ['Python', 'SQL', 'Django', 'JavaScript']}
<class 'dict'>
YAML keys become dictionary keys, YAML lists become Python lists, and YAML values are automatically converted to their appropriate Python types (strings, integers, booleans, etc.).
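To see the safety claim in practice, the sketch below hands safe_load() a document that carries a Python-specific tag. The payload string is a made-up illustration, not something from the sample files. With the safe loader, the tag should be rejected with a YAML error instead of being acted on.

```python
import yaml

# Hypothetical malicious payload: the tag asks the loader to call os.system.
payload = "!!python/object/apply:os.system ['echo pwned']"

try:
    yaml.safe_load(payload)
    rejected = False
except yaml.YAMLError as exc:
    # safe_load() has no constructor for this tag, so it raises
    # instead of running the command.
    rejected = True
    print(f"Rejected with {type(exc).__name__}")

print(rejected)
```

The command in the payload never runs; the loader stops as soon as it meets a tag it does not recognize.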
Accessing Specific Values
Once parsed, you work with the data as a standard Python dictionary:
import yaml
with open('config.yml', 'r') as f:
    data = yaml.safe_load(f)
print(f"Username: {data['UserName']}")
print(f"Website: {data['Website']}")
print(f"Skills: {', '.join(data['Skills'])}")
Output:
Username: TutorialReference
Website: tutorialreference.com
Skills: Python, SQL, Django, JavaScript
Use .get() for safe access when a key might not exist:
email = data.get('Email', 'Not specified')
print(f"Email: {email}")
Output:
Email: Not specified
Parsing Multi-Document YAML Files
Some YAML files contain multiple documents separated by ---. Use yaml.safe_load_all() to parse all documents:
import yaml
with open('multi_docs.yml', 'r') as f:
    documents = list(yaml.safe_load_all(f))

for i, doc in enumerate(documents):
    print(f"Document {i + 1}: {doc['UserName']} : {doc['Website']}")
Output:
Document 1: TutorialReference : tutorialreference.com
Document 2: Google : google.com
Document 3: Yahoo : yahoo.com
safe_load_all() returns a generator that reads documents lazily, so wrap it in list() if you need to access documents multiple times or after the file has been closed. Alternatively, iterate directly while the file is still open:
with open('multi_docs.yml', 'r') as f:
    for doc in yaml.safe_load_all(f):
        print(doc['UserName'])
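The one-pass nature of the generator can be checked with a small sketch. It uses an inline string as a shortened stand-in for multi_docs.yml (two documents, fewer keys) so that it runs without any file on disk.

```python
import yaml

# Inline stand-in for multi_docs.yml, shortened to two documents.
multi_doc_text = """\
---
UserName: Google
Website: google.com
---
UserName: Yahoo
Website: yahoo.com
"""

docs = yaml.safe_load_all(multi_doc_text)

# The first loop consumes the generator.
first_pass = [doc['UserName'] for doc in docs]
# The second loop finds nothing left to read.
second_pass = [doc['UserName'] for doc in docs]

print(first_pass)   # ['Google', 'Yahoo']
print(second_pass)  # []
```

If the documents are needed more than once, converting the generator to a list up front avoids the empty second pass.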
Parsing a YAML String
You can also parse YAML directly from a string:
import yaml
yaml_string = """
database:
  host: localhost
  port: 5432
  name: mydb
  credentials:
    user: admin
    password: secret123
"""
data = yaml.safe_load(yaml_string)
print(f"Host: {data['database']['host']}")
print(f"Port: {data['database']['port']}")
print(f"User: {data['database']['credentials']['user']}")
Output:
Host: localhost
Port: 5432
User: admin
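Chained indexing such as data['database']['credentials']['user'] raises a KeyError as soon as one level is missing. One possible way to guard against that is a small helper that walks the keys one at a time. The name get_nested and its signature are hypothetical, introduced only for this sketch; it is not part of PyYAML.

```python
import yaml


def get_nested(mapping, *keys, default=None):
    """Walk nested dicts key by key; return default if any level is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


config = yaml.safe_load("""
database:
  host: localhost
  credentials:
    user: admin
""")

print(get_nested(config, 'database', 'credentials', 'user'))
print(get_nested(config, 'database', 'credentials', 'token', default='Not specified'))
```

The first call returns the stored user name, and the second falls back to the default because the token key does not exist.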
Understanding the Different Load Functions
PyYAML provides several loading functions. Here is when to use each:
| Function | Safety | Use Case |
|---|---|---|
| yaml.safe_load() | ✅ Safe: only basic Python types | Recommended for all standard use cases |
| yaml.safe_load_all() | ✅ Safe | Multi-document files |
| yaml.full_load() | ⚠️ Moderate: loads most Python types | Trusted input that needs Python-specific types |
| yaml.load(f, Loader=yaml.SafeLoader) | ✅ Safe (with SafeLoader) | Explicit loader specification |
| yaml.load(f, Loader=yaml.FullLoader) | ⚠️ Moderate | Explicit full loading |
| yaml.unsafe_load() | ❌ Unsafe: can execute arbitrary code | Only for fully trusted input; never for untrusted input |
yaml.load() without a Loader
Calling yaml.load() without a Loader argument is unsafe on old releases and unsupported on current ones. Before PyYAML 5.1 it used the unsafe loader, which can execute arbitrary code. Versions 5.1 through 5.4 emitted a warning and fell back to FullLoader. Since PyYAML 6.0 the Loader argument is required, and the call raises a TypeError:
# ❌ Missing Loader: warns on PyYAML 5.x, raises TypeError on 6.0+
data = yaml.load(f)
# ✅ Safe: always specify a Loader
data = yaml.load(f, Loader=yaml.SafeLoader)
# ✅ Even better: use safe_load() directly
data = yaml.safe_load(f)
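The relationship between these functions can be checked with a short sketch, assuming PyYAML 5.1 or later (where full_load() and FullLoader exist). The !!python/tuple tag is used purely as an illustration of a Python-specific type that the full loader accepts and the safe loader refuses.

```python
import yaml

text = "name: demo\nport: 8080\n"

# safe_load() is shorthand for load() with the SafeLoader.
via_load = yaml.load(text, Loader=yaml.SafeLoader)
via_safe_load = yaml.safe_load(text)
print(via_load == via_safe_load)

# A Python-specific tag: accepted by the full loader...
tuple_text = "!!python/tuple [1, 2]"
full_value = yaml.full_load(tuple_text)
print(full_value)

# ...but refused by the safe loader.
try:
    yaml.safe_load(tuple_text)
    safe_rejected = False
except yaml.YAMLError:
    safe_rejected = True
print(safe_rejected)
```

For plain configuration data, both safe variants give the same result, so the extra types of the full loader are rarely needed.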
Writing Python Data to a YAML File
To convert a Python dictionary to YAML and write it to a file, use yaml.dump():
import yaml
data = {
    'database': {
        'host': 'localhost',
        'port': 5432,
        'name': 'production_db'
    },
    'features': ['auth', 'logging', 'caching'],
    'debug': False
}

with open('output.yml', 'w') as f:
    yaml.dump(data, f, default_flow_style=False, sort_keys=False)
print("YAML file written successfully.")
# Verify by reading it back
with open('output.yml', 'r') as f:
    print(f.read())
Output:
YAML file written successfully.
database:
  host: localhost
  port: 5432
  name: production_db
features:
- auth
- logging
- caching
debug: false
| Parameter | Description |
|---|---|
| default_flow_style=False | Use block style (indented) instead of inline {key: value} |
| sort_keys=False | Preserve the original key order |
| allow_unicode=True | Allow Unicode characters in the output |
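The effect of these parameters is easier to see side by side. The sketch below dumps to a string rather than a file and assumes PyYAML 5.1 or later, where sort_keys is available. The sample dictionary is made up for the comparison.

```python
import yaml

# Made-up sample data with out-of-order keys and a non-ASCII value.
data = {'zeta': 1, 'alpha': ['x', 'y'], 'name': 'café'}

# Defaults: keys are sorted and non-ASCII characters are escaped.
default_text = yaml.dump(data)

# Tuned: block style, insertion order kept, Unicode written as-is.
tuned_text = yaml.dump(
    data,
    default_flow_style=False,
    sort_keys=False,
    allow_unicode=True,
)

# Flow style puts the whole structure on one line.
flow_text = yaml.dump(data, default_flow_style=True, allow_unicode=True)

print(default_text)
print(tuned_text)
print(flow_text)

# Whatever the formatting, the data survives a round trip.
restored = yaml.safe_load(tuned_text)
print(restored == data)
```

The formatting options change only how the text looks; loading any of the three strings gives back the same dictionary.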
Handling Common YAML Data Types
PyYAML automatically converts unquoted YAML scalars to the appropriate Python types:
# YAML file
string_value: "hello"
integer_value: 42
float_value: 3.14
boolean_true: true
boolean_false: false
null_value: null
date_value: 2024-01-15
list_value:
- item1
- item2
nested:
  key1: value1
  key2: value2
import yaml
yaml_string = """
string_value: "hello"
integer_value: 42
float_value: 3.14
boolean_true: true
boolean_false: false
null_value: null
date_value: 2024-01-15
"""
data = yaml.safe_load(yaml_string)
for key, value in data.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
Output:
string_value: 'hello' (str)
integer_value: 42 (int)
float_value: 3.14 (float)
boolean_true: True (bool)
boolean_false: False (bool)
null_value: None (NoneType)
date_value: datetime.date(2024, 1, 15) (date)
PyYAML follows YAML 1.1, which treats many unquoted words as booleans: yes, no, on, off, true, and false (including their capitalized and upper-case forms) are all converted to True or False. This can cause unexpected behavior:
# ❌ These are parsed as booleans, not strings
country: no # Parsed as False
answer: yes # Parsed as True
switch: on # Parsed as True
# ✅ Quote strings that could be misinterpreted
country: "no"
answer: "yes"
switch: "on"
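The difference between quoted and unquoted values can be confirmed directly. The version key below is an extra illustration that is not part of the earlier sample files; it shows the same pitfall for a value that looks like a number.

```python
import yaml

# Unquoted: "no" becomes a boolean and 1.10 becomes a float.
unquoted = yaml.safe_load("country: no\nversion: 1.10\n")

# Quoted: both values stay strings.
quoted = yaml.safe_load('country: "no"\nversion: "1.10"\n')

print(unquoted)  # {'country': False, 'version': 1.1}
print(quoted)    # {'country': 'no', 'version': '1.10'}
```

Quoting is the simplest defense whenever a value must remain text, such as country codes or version strings.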
Complete Example: Parsing a Configuration File
import yaml
import sys
def load_config(filepath):
    """Load and validate a YAML configuration file."""
    try:
        with open(filepath, 'r') as f:
            config = yaml.safe_load(f)
        if config is None:
            print(f"Warning: {filepath} is empty.")
            return {}
        return config
    except FileNotFoundError:
        print(f"Error: File '{filepath}' not found.")
        sys.exit(1)
    except yaml.YAMLError as e:
        print(f"Error parsing YAML file: {e}")
        sys.exit(1)

# Usage
config = load_config('config.yml')
print(f"User: {config.get('UserName', 'Unknown')}")
print(f"Site: {config.get('Website', 'N/A')}")
skills = config.get('Skills', [])
print(f"Skills ({len(skills)}): {', '.join(skills)}")
Output:
User: TutorialReference
Site: tutorialreference.com
Skills (4): Python, SQL, Django, JavaScript
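The yaml.YAMLError branch only runs when the file is malformed, which the sample config.yml is not. As a rough sketch of what that path looks like, the snippet below parses a deliberately broken string. The broken_text value is invented for the demonstration, and problem_mark is read defensively because not every YAML error carries a position.

```python
import yaml

# Deliberately malformed input: the flow list is never closed.
broken_text = "Skills: [Python, SQL"

parse_failed = False
error_line = None

try:
    yaml.safe_load(broken_text)
except yaml.YAMLError as exc:
    parse_failed = True
    # Marked errors expose a zero-based position; convert to a 1-based line.
    mark = getattr(exc, 'problem_mark', None)
    if mark is not None:
        error_line = mark.line + 1
    print(f"Invalid YAML (line {error_line}): {exc}")

print(parse_failed)
```

Reporting the line number alongside the message tends to make configuration mistakes quicker to locate.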
Conclusion
Parsing YAML files in Python is straightforward with the PyYAML library. Always use yaml.safe_load() for single documents and yaml.safe_load_all() for multi-document files: these functions prevent arbitrary code execution and safely convert YAML data into Python dictionaries, lists, and primitive types. Use yaml.dump() to write Python data back to YAML format. Remember to handle FileNotFoundError and yaml.YAMLError exceptions for robust file processing, and quote YAML string values that could be misinterpreted as booleans or other types.