How to Find an HTML Tag That Contains Certain Text Using BeautifulSoup in Python
When scraping websites or parsing HTML documents, one of the most common tasks is locating HTML elements that contain specific text content. Whether you need to find a link with a particular label, a heading with a known title, or any tag containing a keyword, BeautifulSoup makes this straightforward.
This guide demonstrates multiple techniques to find HTML tags by their text content using BeautifulSoup, from simple exact matches to advanced pattern-based searches with regular expressions.
Prerequisites
Install BeautifulSoup and the HTML parser:
pip install beautifulsoup4
BeautifulSoup also requires a parser. The built-in html.parser works for most cases. For more complex or malformed HTML, install lxml:
pip install lxml
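Usage is identical to the default parser; only the parser name passed to BeautifulSoup changes. A minimal sketch (assuming lxml is installed) shows why it helps with malformed markup:

```python
from bs4 import BeautifulSoup

# lxml tolerates sloppy HTML, e.g. unclosed <p> tags, and repairs it
html = "<p>First paragraph<p>Second paragraph"
soup = BeautifulSoup(html, "lxml")

# lxml closes each <p> implicitly, yielding two separate paragraphs
print([p.get_text() for p in soup.find_all("p")])
```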
Sample HTML File
We'll use the following HTML file (example.html) throughout this guide:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Sample Page</title>
</head>
<body>
<a href="https://www.example.com/">Learn Python</a>
<a href="https://www.example.com/java">Learn Java</a>
<a href="https://www.example.com/rust">Learn Rust</a>
<h1>Welcome</h1>
<h1>Python Programming</h1>
<h2>Getting Started with Python</h2>
<span class="highlight">Python Programming</span>
<span class="note">Important Note</span>
<li class="item-1">Python Programming</li>
<li class="item-2">Python Code Examples</li>
<p>This paragraph mentions Python in a longer sentence.</p>
<table>
<tr><td>Course Name</td></tr>
</table>
</body>
</html>
Loading the HTML File
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    content = file.read()

soup = BeautifulSoup(content, "html.parser")
Method 1: Finding a Specific Tag with Exact Text Using find()
The find() method returns the first tag that matches the given criteria. Use the string parameter (or text in older versions) to search by text content:
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find the first <a> tag with exact text "Learn Python"
tag = soup.find("a", string="Learn Python")
print(tag)
Output:
<a href="https://www.example.com/">Learn Python</a>
The string parameter matches the exact, complete text of the tag. A tag won't be found if its text contains extra whitespace, if it has child elements, or if your search string matches only part of the text. See Method 4 for partial matching.
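A quick sketch (using an inline HTML string rather than example.html) shows how strict this matching is. Note the extra whitespace around "Learn Rust" in the hypothetical markup:

```python
from bs4 import BeautifulSoup

# Inline HTML for illustration; the second link has padded whitespace
html = '<a href="#">Learn Python</a><a href="#">  Learn Rust  </a>'
soup = BeautifulSoup(html, "html.parser")

print(soup.find("a", string="Learn"))         # partial text: no match (None)
print(soup.find("a", string="Learn Rust"))    # whitespace differs: no match (None)
print(soup.find("a", string="Learn Python"))  # exact, complete text: matches
```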
Method 2: Finding All Matching Tags with find_all()
Use find_all() to retrieve every tag that matches your criteria, returned as a list:
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find ALL <a> tags with exact text "Learn Python"
tags = soup.find_all("a", string="Learn Python")
print(tags)
Output:
[<a href="https://www.example.com/">Learn Python</a>]
Searching Across Different Tag Types
You can search for the same text across different HTML tags:
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
target_text = "Python Programming"
# Search in <h1> tags
h1_tags = soup.find_all("h1", string=target_text)
print(f"<h1> matches: {h1_tags}")
# Search in <span> tags
span_tags = soup.find_all("span", string=target_text)
print(f"<span> matches: {span_tags}")
# Search in <li> tags
li_tags = soup.find_all("li", string=target_text)
print(f"<li> matches: {li_tags}")
Output:
<h1> matches: [<h1>Python Programming</h1>]
<span> matches: [<span class="highlight">Python Programming</span>]
<li> matches: [<li class="item-1">Python Programming</li>]
Limiting the Number of Results
Use the limit parameter to cap the number of returned results:
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find only the first 2 anchor tags (regardless of text)
first_two = soup.find_all("a", limit=2)
for tag in first_two:
    print(tag)
Output:
<a href="https://www.example.com/">Learn Python</a>
<a href="https://www.example.com/java">Learn Java</a>
Method 3: Using Regular Expressions for Pattern Matching
For flexible text matching - partial matches, case-insensitive searches, or pattern-based queries - use Python's re module with the string parameter:
import re
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find all <a> tags where text contains "Learn"
tags = soup.find_all("a", string=re.compile("Learn"))
for tag in tags:
    print(tag)
Output:
<a href="https://www.example.com/">Learn Python</a>
<a href="https://www.example.com/java">Learn Java</a>
<a href="https://www.example.com/rust">Learn Rust</a>
Case-Insensitive Search
import re
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Case-insensitive search for "python" in any tag
tags = soup.find_all(string=re.compile("python", re.IGNORECASE))
for text in tags:
    print(f"'{text.strip()}' in <{text.parent.name}>")
Output:
'Learn Python' in <a>
'Python Programming' in <h1>
'Getting Started with Python' in <h2>
'Python Programming' in <span>
'Python Programming' in <li>
'Python Code Examples' in <li>
'This paragraph mentions Python in a longer sentence.' in <p>
When you pass a regex to string without specifying a tag name, BeautifulSoup searches the text content of all tags. Each result is a NavigableString object - use .parent to access the enclosing tag.
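A minimal sketch (with a hypothetical inline snippet) confirms what such a search returns:

```python
import re
from bs4 import BeautifulSoup

html = '<h2>Python basics</h2>'
soup = BeautifulSoup(html, "html.parser")

result = soup.find(string=re.compile("python", re.IGNORECASE))
print(type(result).__name__)  # NavigableString: the text node itself
print(result.parent.name)     # h2: the enclosing tag
```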
Method 4: Partial Text Matching with Lambda Functions
The string parameter requires an exact match against the tag's direct text. For tags where the text is part of a longer string or mixed with child elements, use a lambda function:
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find any tag containing "Python" anywhere in its text
tags = soup.find_all(lambda tag: tag.string and "Python" in tag.string)
for tag in tags:
    print(f"<{tag.name}>: {tag.string}")
Output:
<a>: Learn Python
<h1>: Python Programming
<h2>: Getting Started with Python
<span>: Python Programming
<li>: Python Programming
<li>: Python Code Examples
<p>: This paragraph mentions Python in a longer sentence.
Finding Tags with Text in Nested Content
Some tags contain text spread across child elements. Use .get_text() instead of .string for these cases:
from bs4 import BeautifulSoup
html = '<p>This is <strong>important</strong> text about Python.</p>'
soup = BeautifulSoup(html, "html.parser")
# .string returns None for tags with mixed content
p_tag = soup.find("p")
print(f".string: {p_tag.string}")
# .get_text() combines all text from the tag and its children
print(f".get_text(): {p_tag.get_text()}")
# Use get_text() in a lambda for reliable partial matching
tags = soup.find_all(lambda tag: "Python" in tag.get_text())
for tag in tags:
    print(f"<{tag.name}>: {tag.get_text()}")
Output:
.string: None
.get_text(): This is important text about Python.
<p>: This is important text about Python.
.string returns the text only if the tag contains a single text node (no child elements); it returns None otherwise. .get_text() returns all text from the tag and its descendants, concatenated together.
When searching for text in tags that might have nested elements, always use .get_text():
# ❌ May miss matches: .string is None for tags with children
soup.find_all(lambda tag: tag.string and "Python" in tag.string)
# ✅ Reliable: works with nested content
soup.find_all(lambda tag: "Python" in tag.get_text())
Method 5: Searching Any Tag Type for Specific Text
If you don't care about the tag type and want to find any element containing certain text, omit the tag name:
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find the first tag of any type containing "Course Name"
tag = soup.find(string="Course Name")
print(f"Text: '{tag}'")
print(f"Parent tag: {tag.parent}")
Output:
Text: 'Course Name'
Parent tag: <td>Course Name</td>
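When the immediate parent isn't the element you want, find_parent() climbs to the nearest ancestor of a given type. A sketch using an inline table:

```python
from bs4 import BeautifulSoup

html = '<table><tr><td>Course Name</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

cell_text = soup.find(string="Course Name")
print(cell_text.parent.name)                # td: the immediate parent
print(cell_text.find_parent("tr").name)     # tr: nearest matching ancestor
print(cell_text.find_parent("table").name)  # table
```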
Method 6: Combining Text Search with Attribute Filters
You can search for tags matching both text content and attributes simultaneously:
import re
from bs4 import BeautifulSoup

with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")

# Find <span> tags with class="highlight" AND text "Python Programming"
tag = soup.find("span", class_="highlight", string="Python Programming")
print(tag)

# Find <li> tags with class="item-2" containing "Code"
tag = soup.find("li", class_="item-2", string=re.compile("Code"))
print(tag)
Output:
<span class="highlight">Python Programming</span>
<li class="item-2">Python Code Examples</li>
Practical Example: Extracting Links with Specific Text
A common real-world use case is finding URLs associated with specific link text:
import re
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find all links that contain "Learn" in their text
learn_links = soup.find_all("a", string=re.compile("Learn"))
print("Links containing 'Learn':")
for link in learn_links:
    text = link.get_text()
    url = link.get("href", "N/A")
    print(f"  {text} -> {url}")
Output:
Links containing 'Learn':
Learn Python -> https://www.example.com/
Learn Java -> https://www.example.com/java
Learn Rust -> https://www.example.com/rust
Quick Reference
| Task | Code |
|---|---|
| Find first tag with exact text | soup.find("a", string="exact text") |
| Find all tags with exact text | soup.find_all("a", string="exact text") |
| Partial match with regex | soup.find_all("a", string=re.compile("partial")) |
| Case-insensitive match | soup.find_all(string=re.compile("text", re.IGNORECASE)) |
| Lambda for partial match | soup.find_all(lambda t: "text" in t.get_text()) |
| Text + attribute filter | soup.find("span", class_="cls", string="text") |
| Limit number of results | soup.find_all("a", limit=3) |
| Any tag with text | soup.find(string="text").parent |
Conclusion
BeautifulSoup provides flexible and powerful methods to find HTML tags by their text content:
- The string parameter with find() or find_all() handles exact text matching.
- Regular expressions (re.compile()) enable partial matches, case-insensitive searches, and pattern-based queries.
- Lambda functions provide the most flexibility, especially with .get_text() for tags containing nested elements.
- Combining text and attribute filters lets you precisely target specific elements.
For most use cases, soup.find_all("tag", string=re.compile("pattern")) strikes the best balance between simplicity and flexibility. When dealing with complex nested HTML, switch to lambda functions with .get_text() for reliable results.