How to Pretty Print XML in Python
XML (Extensible Markup Language) is a widely used format for encoding structured data in a way that is both human-readable and machine-parsable. However, raw or minified XML: where all tags are compressed into a single line: can be extremely difficult to read and debug. Pretty printing is the process of formatting XML with proper indentation and line breaks, making it far easier to inspect, edit, and understand.
In this guide, you will learn multiple ways to pretty print XML in Python, including the built-in xml.dom.minidom module, the popular BeautifulSoup library, and the high-performance lxml library. Each method is demonstrated with clear examples and their corresponding outputs.
Understanding the Problem
Consider the following minified XML string:
raw_xml = '<?xml version="1.0" ?><catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>'
print(raw_xml)
Output:
<?xml version="1.0" ?><catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>
As you can see, everything is crammed onto one line. This is perfectly valid XML, but it is hard to read. Pretty printing transforms it into a well-indented, multi-line format.
Pretty Printing XML Using xml.dom.minidom
The xml.dom.minidom module is part of Python's standard library, so no external installation is required. It provides a minimal implementation of the DOM (Document Object Model) interface and includes a convenient toprettyxml() method.
From a String
import xml.dom.minidom
raw_xml = '<?xml version="1.0" ?><catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>'
parsed = xml.dom.minidom.parseString(raw_xml)
pretty_xml = parsed.toprettyxml(indent=" ")
print(pretty_xml)
Output:
<?xml version="1.0" ?>
<catalog>
<book>
<title>Python 101</title>
<author>Jane Doe</author>
</book>
</catalog>
You can customize the indentation by changing the indent parameter. For example, use indent=" " for two-space indentation or indent="\t" for tabs.
From an XML File
You can also pretty print XML stored in an external file. Suppose you have a file called catalog.xml:
<?xml version="1.0" ?><catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>
import xml.dom.minidom
with open("catalog.xml", "r") as f:
parsed = xml.dom.minidom.parseString(f.read())
pretty_xml = parsed.toprettyxml(indent=" ")
print(pretty_xml)
Output:
<?xml version="1.0" ?>
<catalog>
<book>
<title>Python 101</title>
<author>Jane Doe</author>
</book>
</catalog>
Removing Extra Blank Lines
A common annoyance with toprettyxml() is that it can insert unwanted blank lines, especially when the original XML already contains some whitespace. Here is how to handle that:
import xml.dom.minidom
import os
raw_xml = """<?xml version="1.0" ?>
<catalog>
<book>
<title>Python 101</title>
</book>
</catalog>"""
parsed = xml.dom.minidom.parseString(raw_xml)
pretty_xml = parsed.toprettyxml(indent=" ")
# Remove extra blank lines
cleaned_xml = os.linesep.join(
line for line in pretty_xml.splitlines() if line.strip()
)
print(cleaned_xml)
Output:
<?xml version="1.0" ?>
<catalog>
<book>
<title>Python 101</title>
</book>
</catalog>
Pretty Printing XML Using BeautifulSoup
BeautifulSoup is a popular third-party library primarily known for HTML parsing, but it handles XML parsing equally well when used with the lxml-xml parser.
Installation
pip install beautifulsoup4 lxml
Example
from bs4 import BeautifulSoup
raw_xml = '<?xml version="1.0" ?><catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>'
soup = BeautifulSoup(raw_xml, "xml")
pretty_xml = soup.prettify()
print(pretty_xml)
Output:
<?xml version="1.0" encoding="utf-8"?>
<catalog>
<book>
<title>
Python 101
</title>
<author>
Jane Doe
</author>
</book>
</catalog>
BeautifulSoup's prettify() uses a single-space indentation by default and places the text content of each element on its own line. This can change the visual layout compared to minidom. Be aware that the text nodes (like Python 101) appear on a separate line from their tags.
From an XML File
from bs4 import BeautifulSoup
with open("catalog.xml", "r") as f:
soup = BeautifulSoup(f, "xml")
print(soup.prettify())
Output:
<?xml version="1.0" encoding="utf-8"?>
<catalog>
<book>
<title>
Python 101
</title>
<author>
Jane Doe
</author>
</book>
</catalog>
Pretty Printing XML Using lxml
The lxml library is a high-performance, feature-rich library for processing XML and HTML in Python. It wraps the C libraries libxml2 and libxslt, making it significantly faster than pure-Python alternatives for large documents.
Installation
pip install lxml
From a File
from lxml import etree
tree = etree.parse("catalog.xml")
pretty_xml = etree.tostring(tree, pretty_print=True, encoding="unicode")
print(pretty_xml)
Output:
<catalog>
<book>
<title>Python 101</title>
<author>Jane Doe</author>
</book>
</catalog>
Notice that lxml does not include the XML declaration (<?xml version="1.0" ?>) by default when using encoding="unicode". If you need it, pass xml_declaration=True along with a byte-string encoding:
pretty_xml = etree.tostring(
tree, pretty_print=True, encoding="UTF-8", xml_declaration=True
)
print(pretty_xml.decode("UTF-8"))
From a String
from lxml import etree
raw_xml = '<catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>'
root = etree.fromstring(raw_xml)
pretty_xml = etree.tostring(root, pretty_print=True, encoding="unicode")
print(pretty_xml)
Output:
<catalog>
<book>
<title>Python 101</title>
<author>Jane Doe</author>
</book>
</catalog>
Using xml.etree.ElementTree with indent() (Python 3.9+)
Starting with Python 3.9, the standard library's xml.etree.ElementTree module includes a built-in indent() function, providing a simple and dependency-free way to pretty print XML.
import xml.etree.ElementTree as ET
raw_xml = '<catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>'
root = ET.fromstring(raw_xml)
ET.indent(root, space=" ")
print(ET.tostring(root, encoding="unicode"))
Output:
<catalog>
<book>
<title>Python 101</title>
<author>Jane Doe</author>
</book>
</catalog>
The ET.indent() function is only available in Python 3.9 and later. If you are using an older version, you will get an AttributeError. In that case, use one of the other methods described above.
Comparison of Methods
| Method | External Dependency | XML Declaration | Custom Indent | Python Version |
|---|---|---|---|---|
xml.dom.minidom | No | ✅ Yes | ✅ Yes | All |
BeautifulSoup | Yes (beautifulsoup4, lxml) | ✅ Yes | ❌ No (1 space) | All |
lxml | Yes (lxml) | Optional | ✅ Yes (2 spaces default) | All |
ET.indent() | No | Optional | ✅ Yes | 3.9+ |
Writing Pretty Printed XML to a File
Regardless of which method you use, you can write the formatted output to a file:
import xml.dom.minidom
raw_xml = '<catalog><book><title>Python 101</title></book></catalog>'
parsed = xml.dom.minidom.parseString(raw_xml)
pretty_xml = parsed.toprettyxml(indent=" ")
with open("output.xml", "w", encoding="utf-8") as f:
f.write(pretty_xml)
print("Pretty printed XML saved to output.xml")
Output:
<?xml version="1.0" ?>
<catalog>
<book>
<title>Python 101</title>
</book>
</catalog>
Summary
Pretty printing XML in Python is straightforward and can be accomplished with several approaches depending on your needs:
- Use
xml.dom.minidomfor a quick, zero-dependency solution available in all Python versions. - Use
xml.etree.ElementTree.indent()if you are on Python 3.9+ and want the simplest standard library approach. - Use
BeautifulSoupwhen you are already using it in your project or need flexible parsing capabilities. - Use
lxmlfor maximum performance and the most control over serialization options, especially with large XML documents.
Each method produces clean, readable XML output: choose the one that best fits your project's requirements and dependencies.