How to Count HTML Tags with BeautifulSoup in Python
Counting HTML tags is a fundamental task in web scraping and content analysis. Whether you are auditing a page's structure, validating that certain elements exist, or gathering statistics about document composition, BeautifulSoup makes it straightforward to locate, filter, and count any tag in an HTML document.
In this guide, you will learn how to count specific tags, generate frequency distributions of all tag types, filter by attributes and CSS selectors, and scope your counts to specific sections of a page. Each method is demonstrated with clear examples and output.
Setup
Before getting started, install BeautifulSoup and a fast HTML parser if you have not already:
pip install beautifulsoup4 lxml
beautifulsoup4 is the parsing library itself, and lxml is a high-performance parser that BeautifulSoup can use under the hood. Python also ships with html.parser, which works without any extra installation but is slower on large documents.
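If lxml may not be installed everywhere your script runs, one option is to fall back to the built-in parser. The following is a minimal sketch; make_soup is a hypothetical helper name chosen for illustration, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse with lxml when it is installed, otherwise use html.parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # BeautifulSoup raises FeatureNotFound when the requested
        # parser is not available in the current environment.
        return BeautifulSoup(html, "html.parser")


soup = make_soup("<div><p>One</p><p>Two</p></div>")
print(len(soup.find_all("p")))
```

The count of p tags is 2 under either parser here, but totals can differ, because lxml adds html and body elements around a fragment while html.parser does not.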
Counting Specific Tags with find_all()
The most common approach uses find_all() to locate every instance of a given tag and then checks the length of the resulting list:
from bs4 import BeautifulSoup
html = """
<html>
<body>
<div class="content">
<p>First paragraph</p>
<p>Second paragraph</p>
<span>Some text</span>
</div>
<div class="sidebar">
<p>Sidebar paragraph</p>
</div>
</body>
</html>
"""
soup = BeautifulSoup(html, "lxml")
paragraph_count = len(soup.find_all("p"))
div_count = len(soup.find_all("div"))
print(f"Paragraphs: {paragraph_count}")
print(f"Divs: {div_count}")
Output:
Paragraphs: 3
Divs: 2
find_all() searches the entire document tree by default, including deeply nested elements. It returns a ResultSet, which is a list subclass containing every matching Tag object, so wrapping it with len() gives you the count.
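find_all() also accepts a list of tag names and a recursive flag, which helps when you want to count several tag types at once or only direct children. A short sketch with made-up markup, using html.parser so that it runs without lxml:

```python
from bs4 import BeautifulSoup

html = """
<body>
<h1>Title</h1>
<div><h2>Nested heading</h2><p>Nested paragraph</p></div>
<h2>Top-level heading</h2>
<p>Top-level paragraph</p>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# A list of names matches any of the listed tags.
heading_count = len(soup.find_all(["h1", "h2"]))

# recursive=False restricts the search to direct children of <body>,
# so the <p> nested inside the <div> is not counted.
direct_paragraphs = len(soup.body.find_all("p", recursive=False))

print(f"Headings: {heading_count}")
print(f"Direct <p> children of body: {direct_paragraphs}")
```

This prints 3 headings and 1 direct paragraph for the sample above.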
Using CSS Selectors with select()
The select() method lets you use CSS selector syntax for more complex queries. This is especially useful when you need to target tags based on class names, IDs, or their position in the document hierarchy:
from bs4 import BeautifulSoup
html = """
<ul class="menu">
<li>Home</li>
<li>About</li>
<li>Contact</li>
</ul>
<ul class="footer-links">
<li>Privacy</li>
<li>Terms</li>
</ul>
"""
soup = BeautifulSoup(html, "lxml")
# Count <li> elements inside .menu only
menu_items = len(soup.select("ul.menu > li"))
print(f"Menu items: {menu_items}")
# Count all <li> elements regardless of parent
all_list_items = len(soup.select("li"))
print(f"All list items: {all_list_items}")
Output:
Menu items: 3
All list items: 5
CSS selectors provide powerful and flexible filtering:
- div.classname selects <div> elements with a specific class
- #myid selects the element with a specific ID
- parent > child selects only direct children
- ancestor descendant selects all nested descendants at any depth
- a[href] selects <a> tags that have an href attribute
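To make these selector forms concrete, here is a sketch that counts matches for one example of each. The markup and the IDs and class names in it are invented for illustration, and html.parser is used so the snippet runs without lxml:

```python
from bs4 import BeautifulSoup

html = """
<div id="main" class="content">
  <p>Intro with a <a href="/docs">link</a> and an <a>anchor without href</a>.</p>
  <section><p>Nested paragraph</p></section>
</div>
<div class="sidebar"><p>Sidebar paragraph</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

selectors = [
    "div.content",  # class selector
    "#main",        # ID selector
    "#main > p",    # direct children only
    "#main p",      # descendants at any depth
    "a[href]",      # attribute presence
]

# Map each selector to the number of elements it matches.
counts = {selector: len(soup.select(selector)) for selector in selectors}

for selector, count in counts.items():
    print(f"{selector}: {count}")
```

Note the difference between the two paragraph selectors: the child combinator matches only the introductory paragraph (1), while the descendant form also picks up the one nested in the section (2).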
Counting All Tags in a Document
To get a total count of every element in the HTML document, call find_all() without any arguments:
from bs4 import BeautifulSoup
html = """
<html>
<head><title>Test</title></head>
<body>
<h1>Hello</h1>
<p>World</p>
</body>
</html>
"""
soup = BeautifulSoup(html, "lxml")
total_tags = len(soup.find_all())
print(f"Total elements: {total_tags}")
Output:
Total elements: 6
To exclude non-content tags such as <script>, <style>, <meta>, or <link>, filter them out with a list comprehension. The sample document below contains none of these tags, so the count stays at 6:
from bs4 import BeautifulSoup
html = """
<html>
<head><title>Test</title></head>
<body>
<h1>Hello</h1>
<p>World</p>
</body>
</html>
"""
soup = BeautifulSoup(html, "lxml")
excluded = {"script", "style", "meta", "link"}
content_tags = [tag for tag in soup.find_all() if tag.name not in excluded]
print(f"Content elements: {len(content_tags)}")
Output:
Content elements: 6
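To see the filter change the result, here is a sketch with a document that does include such tags. The markup is made up for illustration, and html.parser is used so it runs without lxml:

```python
from bs4 import BeautifulSoup

html = """
<html>
<head>
<title>Test</title>
<meta charset="utf-8">
<style>p { color: red; }</style>
</head>
<body>
<h1>Hello</h1>
<p>World</p>
<script>console.log("hi");</script>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
excluded = {"script", "style", "meta", "link"}

# Every element in the document, including <meta>, <style>, and <script>.
all_tags = soup.find_all()

# The same elements with the excluded tag names dropped.
content_tags = [tag for tag in all_tags if tag.name not in excluded]

print(f"Total elements: {len(all_tags)}")
print(f"Content elements: {len(content_tags)}")
```

Here the total is 9 and the filtered count is 6, since the meta, style, and script elements are dropped.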
Generating a Tag Frequency Distribution
To see how many times each tag type appears in a document, combine find_all() with collections.Counter:
from collections import Counter
from bs4 import BeautifulSoup
html = """
<div>
<p>Text</p>
<p>More text</p>
<span>Inline</span>
<a href="#">Link 1</a>
<a href="#">Link 2</a>
<a href="#">Link 3</a>
</div>
"""
soup = BeautifulSoup(html, "lxml")
tag_counts = Counter(tag.name for tag in soup.find_all())
print(tag_counts)
print(tag_counts.most_common(3))
Output:
Counter({'a': 3, 'p': 2, 'html': 1, 'body': 1, 'div': 1, 'span': 1})
[('a', 3), ('p', 2), ('html', 1)]
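Note that html and body appear in the counts even though the snippet does not contain them: the lxml parser wraps a fragment in <html> and <body> elements. A distribution like this is particularly useful for auditing large HTML pages and quickly identifying which element types dominate the document structure.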
Counting Tags with Specific Attributes
You can narrow your count to tags that match specific attribute criteria by passing keyword arguments or lambda functions to find_all():
from bs4 import BeautifulSoup
html = """
<div>
<a href="https://example.com">External</a>
<a href="/page">Internal</a>
<a href="https://other.com">External</a>
<img src="photo.jpg" alt="Photo">
<img src="icon.png">
</div>
"""
soup = BeautifulSoup(html, "lxml")
# Count links whose href starts with https
external_links = len(
soup.find_all("a", href=lambda h: h and h.startswith("https"))
)
print(f"External links: {external_links}")
# Count images that have an alt attribute
imgs_with_alt = len(soup.find_all("img", alt=True))
print(f"Images with alt text: {imgs_with_alt}")
Output:
External links: 2
Images with alt text: 1
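The lambda function receives the attribute value for each tag. The h and ... guard handles tags without the attribute, for which the value is None, so startswith() is never called on None. Passing alt=True matches any tag that has the attribute at all, including an empty alt="".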
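A few other filter forms are worth knowing when counting by attribute. The sketch below uses invented markup and attribute names (such as data-track), and html.parser so that it runs without lxml:

```python
import re

from bs4 import BeautifulSoup

html = """
<div>
<a class="btn primary" href="/signup">Sign up</a>
<a class="btn" href="/login">Log in</a>
<a href="/help" data-track="footer">Help</a>
<img src="a.png" alt="">
<img src="b.jpg" alt="Photo">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches any tag whose class list contains "btn".
buttons = len(soup.find_all("a", class_="btn"))

# attrs= handles attribute names that cannot be written as Python
# keyword arguments, such as data-* attributes.
tracked = len(soup.find_all(attrs={"data-track": True}))

# A compiled regular expression is searched for in the attribute value.
png_images = len(soup.find_all("img", src=re.compile(r"\.png$")))

# A function can require a non-empty alt value instead of mere presence.
described = len(soup.find_all("img", alt=lambda v: bool(v and v.strip())))

print(buttons, tracked, png_images, described)
```

For this sample the four counts are 2, 1, 1, and 1.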
Counting Tags Within a Specific Container
To count tags inside a particular section of the page rather than the entire document, first locate the container element and then call find_all() on it:
from bs4 import BeautifulSoup
html = """
<article>
<p>Article paragraph 1</p>
<p>Article paragraph 2</p>
</article>
<aside>
<p>Sidebar paragraph</p>
</aside>
"""
soup = BeautifulSoup(html, "lxml")
article = soup.find("article")
article_paragraphs = len(article.find_all("p"))
print(f"Paragraphs in article: {article_paragraphs}")
Output:
Paragraphs in article: 2
A frequent error is calling find_all() on the top-level soup object when you only want results from a specific section. This returns matches from the entire document:
# Wrong: counts ALL <p> tags in the document
total_p = len(soup.find_all("p"))
print(total_p) # 3 (includes sidebar paragraph)
# Correct: counts only <p> tags inside <article>
article_p = len(soup.find("article").find_all("p"))
print(article_p) # 2
Always call find_all() on the specific container element when you need scoped results.
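One caveat with chaining: find() returns None when the container is missing, and calling find_all() on None raises an AttributeError. A defensive sketch, where count_in is a hypothetical helper name and html.parser is used so it runs without lxml:

```python
from bs4 import BeautifulSoup


def count_in(soup, container_name, tag_name):
    """Count tag_name elements inside the first container_name element."""
    container = soup.find(container_name)
    if container is None:
        # No such container in the document, so there is nothing to count.
        return 0
    return len(container.find_all(tag_name))


html = """
<article>
<p>Article paragraph 1</p>
<p>Article paragraph 2</p>
</article>
<aside>
<p>Sidebar paragraph</p>
</aside>
"""

soup = BeautifulSoup(html, "html.parser")
print(count_in(soup, "article", "p"))  # container exists
print(count_in(soup, "nav", "p"))      # container is missing
```

The first call reports 2 and the second reports 0 instead of failing.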
Counting Tags from a Live Web Page
In real-world scenarios, you will often fetch HTML from a URL rather than working with a hardcoded string. Here is a complete example using the requests library:
import requests
from collections import Counter
from bs4 import BeautifulSoup
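# Set a timeout so the request cannot hang indefinitely
response = requests.get("https://example.com", timeout=10)
# Stop early on HTTP errors such as 404 or 500
response.raise_for_status()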
soup = BeautifulSoup(response.text, "lxml")
tag_counts = Counter(tag.name for tag in soup.find_all())
print(f"Total tags: {sum(tag_counts.values())}")
print(f"Unique tag types: {len(tag_counts)}")
print(f"Most common: {tag_counts.most_common(5)}")
Make sure to install requests with pip install requests if you have not already. Also be mindful of a website's robots.txt and terms of service before scraping.
Method Comparison
| Goal | Method | Example |
|---|---|---|
| Specific tag count | find_all() | len(soup.find_all("p")) |
| CSS-based filtering | select() | len(soup.select("div.class > p")) |
| All tags in document | find_all() no args | len(soup.find_all()) |
| Tag frequency distribution | Counter + find_all() | Counter(t.name for t in soup.find_all()) |
| Filter by attribute | find_all() + kwargs/lambda | len(soup.find_all("a", href=True)) |
| Scoped to container | Chain find() + find_all() | len(soup.find("article").find_all("p")) |
Conclusion
The len(soup.find_all(...)) pattern is the standard and most versatile approach for counting HTML tags with BeautifulSoup.
- Use find_all() with tag names and attribute filters for straightforward queries, and switch to select() with CSS selectors when you need more precise control over element hierarchy and class-based filtering.
- For full document profiling, combine find_all() with collections.Counter to generate a complete frequency distribution of every tag type.
- Always remember to scope your search to a specific container element when you only care about a particular section of the page.