How to Count HTML Tags with BeautifulSoup in Python

Counting HTML tags is a fundamental task in web scraping and content analysis. Whether you are auditing a page's structure, validating that certain elements exist, or gathering statistics about document composition, BeautifulSoup makes it straightforward to locate, filter, and count any tag in an HTML document.

In this guide, you will learn how to count specific tags, generate frequency distributions of all tag types, filter by attributes and CSS selectors, and scope your counts to specific sections of a page. Each method is demonstrated with clear examples and output.

Setup

Before getting started, install BeautifulSoup and a fast HTML parser if you have not already:

pip install beautifulsoup4 lxml

beautifulsoup4 provides the BeautifulSoup API, and lxml is a high-performance parser it can use under the hood. Python also ships with html.parser, which works without any extra installation but is slower on large documents.
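
As a quick sanity check, either parser gives the same counts for well-formed HTML. A minimal sketch (assumes lxml is installed alongside the built-in html.parser):

```python
from bs4 import BeautifulSoup

html = "<div><p>One</p><p>Two</p></div>"

# html.parser ships with Python; lxml must be installed separately
for parser in ("html.parser", "lxml"):
    soup = BeautifulSoup(html, parser)
    print(parser, len(soup.find_all("p")))  # both report 2
```

Parsers can differ on malformed markup, so pin one parser explicitly rather than relying on BeautifulSoup's default choice.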

Counting Specific Tags with find_all()

The most common approach uses find_all() to locate every instance of a given tag and then checks the length of the resulting list:

from bs4 import BeautifulSoup

html = """
<html>
<body>
<div class="content">
<p>First paragraph</p>
<p>Second paragraph</p>
<span>Some text</span>
</div>
<div class="sidebar">
<p>Sidebar paragraph</p>
</div>
</body>
</html>
"""

soup = BeautifulSoup(html, "lxml")

paragraph_count = len(soup.find_all("p"))
div_count = len(soup.find_all("div"))

print(f"Paragraphs: {paragraph_count}")
print(f"Divs: {div_count}")

Output:

Paragraphs: 3
Divs: 2
Note: find_all() searches the entire document tree by default, including deeply nested elements. It returns a list of all matching Tag objects, so wrapping it with len() gives you the count.
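
To see that recursive behavior in action, here is a small sketch with nested <div> elements; all three are counted even though two are nested, and recursive=False restricts the search to direct children:

```python
from bs4 import BeautifulSoup

html = "<div><div><div>Deeply nested</div></div></div>"
soup = BeautifulSoup(html, "lxml")

# find_all() descends into every level of the tree by default
print(len(soup.find_all("div")))  # 3

# recursive=False counts only direct children of the tag you call it on;
# here <body> has just one direct <div> child
body = soup.find("body")
print(len(body.find_all("div", recursive=False)))  # 1
```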

Using CSS Selectors with select()

The select() method lets you use CSS selector syntax for more complex queries. This is especially useful when you need to target tags based on class names, IDs, or their position in the document hierarchy:

from bs4 import BeautifulSoup

html = """
<ul class="menu">
<li>Home</li>
<li>About</li>
<li>Contact</li>
</ul>
<ul class="footer-links">
<li>Privacy</li>
<li>Terms</li>
</ul>
"""

soup = BeautifulSoup(html, "lxml")

# Count <li> elements inside .menu only
menu_items = len(soup.select("ul.menu > li"))
print(f"Menu items: {menu_items}")

# Count all <li> elements regardless of parent
all_list_items = len(soup.select("li"))
print(f"All list items: {all_list_items}")

Output:

Menu items: 3
All list items: 5
Common CSS Selector Patterns

CSS selectors provide powerful and flexible filtering:

  • div.classname selects <div> elements with a specific class
  • #myid selects the element with a specific ID
  • parent > child selects only direct children
  • ancestor descendant selects all nested descendants at any depth
  • a[href] selects <a> tags that have an href attribute
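
Each of these patterns can be passed to select() and counted the same way. A short sketch (the HTML, class name, and ID are illustrative):

```python
from bs4 import BeautifulSoup

html = """
<div id="main" class="box">
  <a href="/home">Home</a>
  <a>No link</a>
  <ul><li><a href="/deep">Deep</a></li></ul>
</div>
"""
soup = BeautifulSoup(html, "lxml")

print(len(soup.select("div.box")))  # 1: <div> with class "box"
print(len(soup.select("#main")))    # 1: element with id "main"
print(len(soup.select("div > a")))  # 2: <a> tags that are direct children of a <div>
print(len(soup.select("div a")))    # 3: <a> descendants at any depth
print(len(soup.select("a[href]")))  # 2: <a> tags that have an href attribute
```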

Counting All Tags in a Document

To get a total count of every element in the HTML document, call find_all() without any arguments:

from bs4 import BeautifulSoup

html = """
<html>
<head><title>Test</title></head>
<body>
<h1>Hello</h1>
<p>World</p>
</body>
</html>
"""

soup = BeautifulSoup(html, "lxml")

total_tags = len(soup.find_all())
print(f"Total elements: {total_tags}")

Output:

Total elements: 6

To exclude non-content tags like <script>, <style>, or <meta>, filter them out with a list comprehension:

from bs4 import BeautifulSoup

html = """
<html>
<head><title>Test</title></head>
<body>
<script>console.log("analytics");</script>
<h1>Hello</h1>
<p>World</p>
</body>
</html>
"""

soup = BeautifulSoup(html, "lxml")
excluded = {"script", "style", "meta", "link"}

content_tags = [tag for tag in soup.find_all() if tag.name not in excluded]
print(f"Content elements: {len(content_tags)}")

Output:

Content elements: 6

Generating a Tag Frequency Distribution

To see how many times each tag type appears in a document, combine find_all() with collections.Counter:

from collections import Counter
from bs4 import BeautifulSoup

html = """
<div>
<p>Text</p>
<p>More text</p>
<span>Inline</span>
<a href="#">Link 1</a>
<a href="#">Link 2</a>
<a href="#">Link 3</a>
</div>
"""

soup = BeautifulSoup(html, "lxml")

tag_counts = Counter(tag.name for tag in soup.find_all())

print(tag_counts)
print(tag_counts.most_common(3))

Output:

Counter({'a': 3, 'p': 2, 'html': 1, 'body': 1, 'div': 1, 'span': 1})
[('a', 3), ('p', 2), ('html', 1)]

This is particularly useful for auditing large HTML pages and quickly identifying which element types dominate the document structure.

Counting Tags with Specific Attributes

You can narrow your count to tags that match specific attribute criteria by passing keyword arguments or lambda functions to find_all():

from bs4 import BeautifulSoup

html = """
<div>
<a href="https://example.com">External</a>
<a href="/page">Internal</a>
<a href="https://other.com">External</a>
<img src="photo.jpg" alt="Photo">
<img src="icon.png">
</div>
"""

soup = BeautifulSoup(html, "lxml")

# Count links whose href starts with https
external_links = len(
soup.find_all("a", href=lambda h: h and h.startswith("https"))
)
print(f"External links: {external_links}")

# Count images that have an alt attribute
imgs_with_alt = len(soup.find_all("img", alt=True))
print(f"Images with alt text: {imgs_with_alt}")

Output:

External links: 2
Images with alt text: 1
Note: the lambda receives the attribute value for each tag, or None when the tag lacks the attribute. The h and h.startswith("https") expression short-circuits on None, so tags without an href are handled safely instead of raising an AttributeError.

Counting Tags Within a Specific Container

To count tags inside a particular section of the page rather than the entire document, first locate the container element and then call find_all() on it:

from bs4 import BeautifulSoup

html = """
<article>
<p>Article paragraph 1</p>
<p>Article paragraph 2</p>
</article>
<aside>
<p>Sidebar paragraph</p>
</aside>
"""

soup = BeautifulSoup(html, "lxml")

article = soup.find("article")
article_paragraphs = len(article.find_all("p"))

print(f"Paragraphs in article: {article_paragraphs}")

Output:

Paragraphs in article: 2
A Common Mistake: Forgetting to Scope the Search

A frequent error is calling find_all() on the top-level soup object when you only want results from a specific section. This returns matches from the entire document:

# Wrong: counts ALL <p> tags in the document
total_p = len(soup.find_all("p"))
print(total_p) # 3 (includes sidebar paragraph)

# Correct: counts only <p> tags inside <article>
article_p = len(soup.find("article").find_all("p"))
print(article_p) # 2

Always call find_all() on the specific container element when you need scoped results.

Counting Tags from a Live Web Page

In real-world scenarios, you will often fetch HTML from a URL rather than working with a hardcoded string. Here is a complete example using the requests library:

import requests
from collections import Counter
from bs4 import BeautifulSoup

response = requests.get("https://example.com")
soup = BeautifulSoup(response.text, "lxml")

tag_counts = Counter(tag.name for tag in soup.find_all())

print(f"Total tags: {sum(tag_counts.values())}")
print(f"Unique tag types: {len(tag_counts)}")
print(f"Most common: {tag_counts.most_common(5)}")
Note: install requests with pip install requests if you have not already, and be mindful of a website's robots.txt and terms of service before scraping.
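
For anything beyond a one-off script, it is worth adding a timeout and a status check before parsing. A sketch under those assumptions (the 10-second timeout is an arbitrary choice, and the URL is a placeholder for your own target):

```python
import requests
from bs4 import BeautifulSoup


def count_tags_in_html(html: str) -> int:
    """Count every element in an HTML string."""
    return len(BeautifulSoup(html, "lxml").find_all())


def count_tags(url: str) -> int:
    # timeout= prevents the request from hanging indefinitely;
    # raise_for_status() turns 4xx/5xx responses into exceptions
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return count_tags_in_html(response.text)


print(count_tags("https://example.com"))
```

Separating the fetch from the parse also makes the counting logic easy to test against hardcoded HTML strings.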

Method Comparison

Goal                        Method                       Example
Specific tag count          find_all()                   len(soup.find_all("p"))
CSS-based filtering         select()                     len(soup.select("div.class > p"))
All tags in document        find_all() with no args      len(soup.find_all())
Tag frequency distribution  Counter + find_all()         Counter(t.name for t in soup.find_all())
Filter by attribute         find_all() + kwargs/lambda   find_all("a", href=True)
Scoped to container         find() then find_all()       soup.find("article").find_all("p")

Conclusion

The len(soup.find_all(...)) pattern is the standard and most versatile approach for counting HTML tags with BeautifulSoup.

  • Use find_all() with tag names and attribute filters for straightforward queries, and switch to select() with CSS selectors when you need more precise control over element hierarchy and class-based filtering.
  • For full document profiling, combine find_all() with collections.Counter to generate a complete frequency distribution of every tag type.
  • Always remember to scope your search to a specific container element when you only care about a particular section of the page.