How to Convert XML to Pandas DataFrame in Python
XML remains a common format for data exchange, configuration files, and API responses. Since Pandas 1.3, the read_xml() function provides native XML parsing, converting hierarchical structures into flat DataFrames efficiently.
Parse XML Directly with pd.read_xml()
The simplest approach uses Pandas' built-in XML parser with XPath selection.
import pandas as pd
# Sample XML structure:
# <catalog>
#   <book id="1">
#     <title>Python Basics</title>
#     <author>Alice</author>
#     <price>29.99</price>
#   </book>
#   <book id="2">
#     <title>Data Science</title>
#     <author>Bob</author>
#     <price>39.99</price>
#   </book>
# </catalog>
df = pd.read_xml("books.xml", xpath=".//book")
print(df)
Output:
   id          title author  price
0   1  Python Basics  Alice  29.99
1   2   Data Science    Bob  39.99
Parse XML from String
import pandas as pd
xml_string = """
<users>
  <user id="1">
    <name>Alice</name>
    <email>alice@example.com</email>
  </user>
  <user id="2">
    <name>Bob</name>
    <email>bob@example.com</email>
  </user>
</users>
"""
df = pd.read_xml(xml_string, xpath=".//user")
print(df)
Output:
   id   name              email
0   1  Alice  alice@example.com
1   2    Bob    bob@example.com
Install lxml for significantly faster parsing: pip install lxml. Pandas uses it automatically when available.
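You can also choose the backend explicitly via the parser argument. A minimal sketch with a made-up sample (the StringIO wrapper is what newer Pandas versions expect when passing a literal string):

```python
import pandas as pd
from io import StringIO

xml = """
<rows>
  <row><a>1</a><b>x</b></row>
  <row><a>2</a><b>y</b></row>
</rows>
"""

# parser="lxml" is the default when lxml is installed;
# parser="etree" forces the slower standard-library backend.
df = pd.read_xml(StringIO(xml), xpath=".//row", parser="etree")
print(df)
```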
Filter Data with XPath Expressions
XPath allows selective loading, reducing memory usage for large files.
import pandas as pd
xml_data = """
<products>
  <product category="electronics">
    <name>Laptop</name>
    <price>999</price>
  </product>
  <product category="books">
    <name>Python Guide</name>
    <price>49</price>
  </product>
  <product category="electronics">
    <name>Phone</name>
    <price>699</price>
  </product>
</products>
"""
# Load only electronics products
df = pd.read_xml(xml_data, xpath=".//product[@category='electronics']")
print(df)
Output:
      category    name  price
0  electronics  Laptop    999
1  electronics   Phone    699
Common XPath Patterns
| XPath Expression | Description |
|---|---|
| .//element | All elements with this name |
| .//element[@attr='value'] | Filter by attribute value |
| .//parent/child | Direct child elements |
| .//element[position()<=10] | First 10 elements |
Pandas uses XPath 1.0, which has limited filtering capabilities. For complex queries, load the data first and filter with Pandas operations.
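For example, a compound condition that XPath 1.0 expresses awkwardly is straightforward as a Pandas filter. A sketch reusing the products sample from above (StringIO wraps the literal string, as newer Pandas versions expect):

```python
import pandas as pd
from io import StringIO

xml_data = """
<products>
  <product category="electronics"><name>Laptop</name><price>999</price></product>
  <product category="books"><name>Python Guide</name><price>49</price></product>
  <product category="electronics"><name>Phone</name><price>699</price></product>
</products>
"""

# Load everything once, then filter with ordinary boolean indexing,
# which is more expressive than XPath 1.0 predicates.
df = pd.read_xml(StringIO(xml_data), xpath=".//product")
cheap_electronics = df[(df["category"] == "electronics") & (df["price"] < 800)]
print(cheap_electronics)
```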
Handle Nested XML Structures
Deeply nested XML requires specifying the correct path or flattening manually.
import pandas as pd
nested_xml = """
<company>
  <department name="Engineering">
    <employees>
      <employee>
        <name>Alice</name>
        <role>Developer</role>
      </employee>
      <employee>
        <name>Bob</name>
        <role>Designer</role>
      </employee>
    </employees>
  </department>
</company>
"""
# Target the deeply nested employee elements
df = pd.read_xml(nested_xml, xpath=".//employee")
print(df)
Output:
    name       role
0  Alice  Developer
1    Bob   Designer
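Note that the .//employee shortcut drops the department context. To keep a parent attribute such as the department name, flatten manually — a sketch using the standard library on the same sample:

```python
import xml.etree.ElementTree as ET
import pandas as pd

nested_xml = """
<company>
  <department name="Engineering">
    <employees>
      <employee><name>Alice</name><role>Developer</role></employee>
      <employee><name>Bob</name><role>Designer</role></employee>
    </employees>
  </department>
</company>
"""

root = ET.fromstring(nested_xml)
rows = []
for dept in root.findall(".//department"):
    for emp in dept.findall(".//employee"):
        rows.append({
            "department": dept.get("name"),  # carried down from the parent
            "name": emp.findtext("name"),
            "role": emp.findtext("role"),
        })
df = pd.DataFrame(rows)
print(df)
```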
Extract Attributes and Elements
import pandas as pd
xml_with_attrs = """
<orders>
  <order id="1001" status="shipped">
    <customer>Alice</customer>
    <total>150.00</total>
  </order>
  <order id="1002" status="pending">
    <customer>Bob</customer>
    <total>89.50</total>
  </order>
</orders>
"""
# Attributes become columns automatically
df = pd.read_xml(xml_with_attrs, xpath=".//order")
print(df)
Output:
     id   status customer  total
0  1001  shipped    Alice  150.0
1  1002  pending      Bob   89.5
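If you want only one side, read_xml's attrs_only and elems_only flags restrict the result to attributes or child elements. A sketch reusing the orders sample (StringIO wraps the literal string, as newer Pandas versions expect):

```python
import pandas as pd
from io import StringIO

xml_with_attrs = """
<orders>
  <order id="1001" status="shipped"><customer>Alice</customer><total>150.00</total></order>
  <order id="1002" status="pending"><customer>Bob</customer><total>89.50</total></order>
</orders>
"""

# attrs_only=True keeps only attributes; elems_only=True keeps only child elements.
attrs = pd.read_xml(StringIO(xml_with_attrs), xpath=".//order", attrs_only=True)
elems = pd.read_xml(StringIO(xml_with_attrs), xpath=".//order", elems_only=True)
print(attrs.columns.tolist())
print(elems.columns.tolist())
```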
Parse Complex XML with BeautifulSoup
For malformed XML, inconsistent structures, or custom extraction logic, use BeautifulSoup.
from bs4 import BeautifulSoup
import pandas as pd
xml_content = """
<items>
  <item id="1">
    <name>Widget A</name>
    <specs>
      <weight unit="kg">2.5</weight>
      <dimensions>10x20x5</dimensions>
    </specs>
  </item>
  <item id="2">
    <name>Widget B</name>
    <specs>
      <weight unit="kg">1.8</weight>
      <dimensions>8x15x4</dimensions>
    </specs>
  </item>
</items>
"""
soup = BeautifulSoup(xml_content, "xml")
data = []
for item in soup.find_all("item"):
    data.append({
        "id": item.get("id"),
        "name": item.find("name").text,
        "weight": float(item.find("weight").text),
        "weight_unit": item.find("weight").get("unit"),
        "dimensions": item.find("dimensions").text,
    })
df = pd.DataFrame(data)
print(df)
Output:
  id      name  weight weight_unit dimensions
0  1  Widget A     2.5          kg    10x20x5
1  2  Widget B     1.8          kg     8x15x4
Handle Missing Elements Safely
from bs4 import BeautifulSoup
import pandas as pd
xml_with_missing = """
<products>
  <product>
    <name>Item A</name>
    <price>10.00</price>
  </product>
  <product>
    <name>Item B</name>
    <!-- price is missing -->
  </product>
</products>
"""
soup = BeautifulSoup(xml_with_missing, "xml")
data = []
for product in soup.find_all("product"):
    name_tag = product.find("name")
    price_tag = product.find("price")
    data.append({
        "name": name_tag.text if name_tag else None,
        "price": float(price_tag.text) if price_tag else None,
    })
df = pd.DataFrame(data)
print(df)
Output:
     name  price
0  Item A   10.0
1  Item B    NaN
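For well-formed XML, pd.read_xml() gives the same safety with less code: a child element absent from a record simply becomes NaN. A sketch on the same sample (StringIO wraps the literal string, as newer Pandas versions expect):

```python
import pandas as pd
from io import StringIO

xml_with_missing = """
<products>
  <product><name>Item A</name><price>10.00</price></product>
  <product><name>Item B</name></product>
</products>
"""

# Columns are unioned across records; missing values become NaN automatically.
df = pd.read_xml(StringIO(xml_with_missing), xpath=".//product")
print(df)
```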
Use Standard Library for Zero Dependencies
The built-in xml.etree.ElementTree module requires no external packages.
import xml.etree.ElementTree as ET
import pandas as pd
xml_string = """
<records>
  <record>
    <id>1</id>
    <value>100</value>
  </record>
  <record>
    <id>2</id>
    <value>200</value>
  </record>
</records>
"""
root = ET.fromstring(xml_string)
data = []
for record in root.findall(".//record"):
    data.append({
        "id": record.find("id").text,
        "value": int(record.find("value").text),
    })
df = pd.DataFrame(data)
print(df)
Output:
  id  value
0  1    100
1  2    200
Handle Namespaces in XML
XML namespaces require special handling in XPath expressions.
import pandas as pd
namespaced_xml = """
<root xmlns:ns="http://example.com/ns">
  <ns:item>
    <ns:name>Product 1</ns:name>
  </ns:item>
  <ns:item>
    <ns:name>Product 2</ns:name>
  </ns:item>
</root>
"""
# Define namespace mapping
namespaces = {"ns": "http://example.com/ns"}
df = pd.read_xml(
    namespaced_xml,
    xpath=".//ns:item",
    namespaces=namespaces,
)
print(df)
Output:
        name
0  Product 1
1  Product 2
Namespace handling can be tricky. If read_xml() fails, try BeautifulSoup: with its xml parser, soup.find_all("item") matches tags by their local name, so namespace prefixes can be ignored entirely.
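For instance, the namespaced document above parses without any namespace map — a sketch, assuming beautifulsoup4 and lxml are installed:

```python
from bs4 import BeautifulSoup
import pandas as pd

namespaced_xml = """
<root xmlns:ns="http://example.com/ns">
  <ns:item><ns:name>Product 1</ns:name></ns:item>
  <ns:item><ns:name>Product 2</ns:name></ns:item>
</root>
"""

# BeautifulSoup's xml parser stores the prefix separately from the tag name,
# so "item" matches ns:item with no namespace mapping required.
soup = BeautifulSoup(namespaced_xml, "xml")
names = [item.find("name").text for item in soup.find_all("item")]
df = pd.DataFrame({"name": names})
print(df)
```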
Quick Reference
| Method | Best For | Speed | Dependencies |
|---|---|---|---|
| pd.read_xml() | Standard XML files | ⚡ Fast | pandas, lxml (optional) |
| BeautifulSoup | Malformed/complex XML | 🐢 Moderate | beautifulsoup4, lxml |
| xml.etree | Simple XML, no dependencies | 🚀 Fast | None (standard library) |
Conclusion
Use pd.read_xml() for most XML parsing tasks. It's fast, concise, and handles attributes automatically. Upgrade Pandas to the latest version and install lxml for optimal performance. For malformed XML or custom extraction logic, BeautifulSoup provides flexible parsing. Reserve xml.etree.ElementTree for environments where external dependencies aren't available.