Skip to main content

How to Pretty Print XML in Python

XML (Extensible Markup Language) is a widely used format for encoding structured data in a way that is both human-readable and machine-parsable. However, raw or minified XML: where all tags are compressed into a single line: can be extremely difficult to read and debug. Pretty printing is the process of formatting XML with proper indentation and line breaks, making it far easier to inspect, edit, and understand.

In this guide, you will learn multiple ways to pretty print XML in Python, including the built-in xml.dom.minidom module, the popular BeautifulSoup library, and the high-performance lxml library. Each method is demonstrated with clear examples and their corresponding outputs.

Understanding the Problem

Consider the following minified XML string:

raw_xml = '<?xml version="1.0" ?><catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>'
print(raw_xml)

Output:

<?xml version="1.0" ?><catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>

As you can see, everything is crammed onto one line. This is perfectly valid XML, but it is hard to read. Pretty printing transforms it into a well-indented, multi-line format.

Pretty Printing XML Using xml.dom.minidom

The xml.dom.minidom module is part of Python's standard library, so no external installation is required. It provides a minimal implementation of the DOM (Document Object Model) interface and includes a convenient toprettyxml() method.

From a String

import xml.dom.minidom

raw_xml = '<?xml version="1.0" ?><catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>'

parsed = xml.dom.minidom.parseString(raw_xml)
pretty_xml = parsed.toprettyxml(indent=" ")

print(pretty_xml)

Output:

<?xml version="1.0" ?>
<catalog>
<book>
<title>Python 101</title>
<author>Jane Doe</author>
</book>
</catalog>
tip

You can customize the indentation by changing the indent parameter. For example, use indent=" " for two-space indentation or indent="\t" for tabs.

From an XML File

You can also pretty print XML stored in an external file. Suppose you have a file called catalog.xml:

catalog.xml
<?xml version="1.0" ?><catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>
import xml.dom.minidom

with open("catalog.xml", "r") as f:
parsed = xml.dom.minidom.parseString(f.read())
pretty_xml = parsed.toprettyxml(indent=" ")

print(pretty_xml)

Output:

<?xml version="1.0" ?>
<catalog>
<book>
<title>Python 101</title>
<author>Jane Doe</author>
</book>
</catalog>

Removing Extra Blank Lines

A common annoyance with toprettyxml() is that it can insert unwanted blank lines, especially when the original XML already contains some whitespace. Here is how to handle that:

import xml.dom.minidom
import os

raw_xml = """<?xml version="1.0" ?>
<catalog>
<book>
<title>Python 101</title>
</book>
</catalog>"""

parsed = xml.dom.minidom.parseString(raw_xml)
pretty_xml = parsed.toprettyxml(indent=" ")

# Remove extra blank lines
cleaned_xml = os.linesep.join(
line for line in pretty_xml.splitlines() if line.strip()
)

print(cleaned_xml)

Output:

<?xml version="1.0" ?>
<catalog>
<book>
<title>Python 101</title>
</book>
</catalog>

Pretty Printing XML Using BeautifulSoup

BeautifulSoup is a popular third-party library primarily known for HTML parsing, but it handles XML parsing equally well when used with the lxml-xml parser.

Installation

pip install beautifulsoup4 lxml

Example

from bs4 import BeautifulSoup

raw_xml = '<?xml version="1.0" ?><catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>'

soup = BeautifulSoup(raw_xml, "xml")
pretty_xml = soup.prettify()

print(pretty_xml)

Output:

<?xml version="1.0" encoding="utf-8"?>
<catalog>
<book>
<title>
Python 101
</title>
<author>
Jane Doe
</author>
</book>
</catalog>
note

BeautifulSoup's prettify() uses a single-space indentation by default and places the text content of each element on its own line. This can change the visual layout compared to minidom. Be aware that the text nodes (like Python 101) appear on a separate line from their tags.

From an XML File

from bs4 import BeautifulSoup

with open("catalog.xml", "r") as f:
soup = BeautifulSoup(f, "xml")

print(soup.prettify())

Output:

<?xml version="1.0" encoding="utf-8"?>
<catalog>
<book>
<title>
Python 101
</title>
<author>
Jane Doe
</author>
</book>
</catalog>

Pretty Printing XML Using lxml

The lxml library is a high-performance, feature-rich library for processing XML and HTML in Python. It wraps the C libraries libxml2 and libxslt, making it significantly faster than pure-Python alternatives for large documents.

Installation

pip install lxml

From a File

from lxml import etree

tree = etree.parse("catalog.xml")
pretty_xml = etree.tostring(tree, pretty_print=True, encoding="unicode")

print(pretty_xml)

Output:

<catalog>
<book>
<title>Python 101</title>
<author>Jane Doe</author>
</book>
</catalog>
info

Notice that lxml does not include the XML declaration (<?xml version="1.0" ?>) by default when using encoding="unicode". If you need it, pass xml_declaration=True along with a byte-string encoding:

pretty_xml = etree.tostring(
tree, pretty_print=True, encoding="UTF-8", xml_declaration=True
)
print(pretty_xml.decode("UTF-8"))

From a String

from lxml import etree

raw_xml = '<catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>'

root = etree.fromstring(raw_xml)
pretty_xml = etree.tostring(root, pretty_print=True, encoding="unicode")

print(pretty_xml)

Output:

<catalog>
<book>
<title>Python 101</title>
<author>Jane Doe</author>
</book>
</catalog>

Using xml.etree.ElementTree with indent() (Python 3.9+)

Starting with Python 3.9, the standard library's xml.etree.ElementTree module includes a built-in indent() function, providing a simple and dependency-free way to pretty print XML.

import xml.etree.ElementTree as ET

raw_xml = '<catalog><book><title>Python 101</title><author>Jane Doe</author></book></catalog>'

root = ET.fromstring(raw_xml)
ET.indent(root, space=" ")

print(ET.tostring(root, encoding="unicode"))

Output:

<catalog>
<book>
<title>Python 101</title>
<author>Jane Doe</author>
</book>
</catalog>
caution

The ET.indent() function is only available in Python 3.9 and later. If you are using an older version, you will get an AttributeError. In that case, use one of the other methods described above.

Comparison of Methods

MethodExternal DependencyXML DeclarationCustom IndentPython Version
xml.dom.minidomNo✅ Yes✅ YesAll
BeautifulSoupYes (beautifulsoup4, lxml)✅ Yes❌ No (1 space)All
lxmlYes (lxml)Optional✅ Yes (2 spaces default)All
ET.indent()NoOptional✅ Yes3.9+

Writing Pretty Printed XML to a File

Regardless of which method you use, you can write the formatted output to a file:

import xml.dom.minidom

raw_xml = '<catalog><book><title>Python 101</title></book></catalog>'
parsed = xml.dom.minidom.parseString(raw_xml)
pretty_xml = parsed.toprettyxml(indent=" ")

with open("output.xml", "w", encoding="utf-8") as f:
f.write(pretty_xml)

print("Pretty printed XML saved to output.xml")

Output:

<?xml version="1.0" ?>
<catalog>
<book>
<title>Python 101</title>
</book>
</catalog>

Summary

Pretty printing XML in Python is straightforward and can be accomplished with several approaches depending on your needs:

  • Use xml.dom.minidom for a quick, zero-dependency solution available in all Python versions.
  • Use xml.etree.ElementTree.indent() if you are on Python 3.9+ and want the simplest standard library approach.
  • Use BeautifulSoup when you are already using it in your project or need flexible parsing capabilities.
  • Use lxml for maximum performance and the most control over serialization options, especially with large XML documents.

Each method produces clean, readable XML output: choose the one that best fits your project's requirements and dependencies.