How to Find an HTML Tag That Contains Certain Text Using BeautifulSoup in Python
When scraping websites or parsing HTML documents, one of the most common tasks is locating HTML elements that contain specific text content. Whether you need to find a link with a particular label, a heading with a known title, or any tag containing a keyword, BeautifulSoup makes this straightforward.
This guide demonstrates multiple techniques to find HTML tags by their text content using BeautifulSoup, from simple exact matches to advanced pattern-based searches with regular expressions.
Prerequisites
Install BeautifulSoup and the HTML parser:
pip install beautifulsoup4
BeautifulSoup also requires a parser. The built-in html.parser works for most cases. For more complex or malformed HTML, install lxml:
pip install lxml
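Usage is identical to the default parser; only the parser name passed to BeautifulSoup changes. A minimal sketch (assuming lxml is installed) shows why it helps with malformed markup:

```python
from bs4 import BeautifulSoup

# lxml tolerates sloppy HTML, e.g. unclosed <p> tags, and repairs it
html = "<p>First paragraph<p>Second paragraph"
soup = BeautifulSoup(html, "lxml")

# lxml closes each <p> implicitly, yielding two separate paragraphs
print([p.get_text() for p in soup.find_all("p")])
```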
Sample HTML File
We'll use the following HTML file (example.html) throughout this guide:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Sample Page</title>
</head>
<body>
<a href="https://www.example.com/">Learn Python</a>
<a href="https://www.example.com/java">Learn Java</a>
<a href="https://www.example.com/rust">Learn Rust</a>
<h1>Welcome</h1>
<h1>Python Programming</h1>
<h2>Getting Started with Python</h2>
<span class="highlight">Python Programming</span>
<span class="note">Important Note</span>
<li class="item-1">Python Programming</li>
<li class="item-2">Python Code Examples</li>
<p>This paragraph mentions Python in a longer sentence.</p>
<table>
<tr><td>Course Name</td></tr>
</table>
</body>
</html>
Loading the HTML File
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    content = file.read()

soup = BeautifulSoup(content, "html.parser")
Method 1: Finding a Specific Tag with Exact Text Using find()
The find() method returns the first tag that matches the given criteria. Use the string parameter (or text in older versions) to search by text content:
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find the first <a> tag with exact text "Learn Python"
tag = soup.find("a", string="Learn Python")
print(tag)
Output:
<a href="https://www.example.com/">Learn Python</a>
The string parameter matches the exact, complete text of the tag. A tag won't be found if its text contains extra whitespace, if it has child elements, or if your search string matches only part of the text. See Method 4 for partial matching.
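A quick sketch (using an inline HTML string rather than example.html) shows how strict this matching is. Note the extra whitespace around "Learn Rust" in the hypothetical markup:

```python
from bs4 import BeautifulSoup

# Inline HTML for illustration; the second link has padded whitespace
html = '<a href="#">Learn Python</a><a href="#">  Learn Rust  </a>'
soup = BeautifulSoup(html, "html.parser")

print(soup.find("a", string="Learn"))         # partial text: no match (None)
print(soup.find("a", string="Learn Rust"))    # whitespace differs: no match (None)
print(soup.find("a", string="Learn Python"))  # exact, complete text: matches
```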
Method 2: Finding All Matching Tags with find_all()
Use find_all() to retrieve every tag that matches your criteria, returned as a list:
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find ALL <a> tags with exact text "Learn Python"
tags = soup.find_all("a", string="Learn Python")
print(tags)
Output:
[<a href="https://www.example.com/">Learn Python</a>]
Searching Across Different Tag Types
You can search for the same text across different HTML tags:
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
target_text = "Python Programming"
# Search in <h1> tags
h1_tags = soup.find_all("h1", string=target_text)
print(f"<h1> matches: {h1_tags}")
# Search in <span> tags
span_tags = soup.find_all("span", string=target_text)
print(f"<span> matches: {span_tags}")
# Search in <li> tags
li_tags = soup.find_all("li", string=target_text)
print(f"<li> matches: {li_tags}")
Output:
<h1> matches: [<h1>Python Programming</h1>]
<span> matches: [<span class="highlight">Python Programming</span>]
<li> matches: [<li class="item-1">Python Programming</li>]
Limiting the Number of Results
Use the limit parameter to cap the number of returned results:
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find only the first 2 anchor tags (regardless of text)
first_two = soup.find_all("a", limit=2)
for tag in first_two:
    print(tag)
Output:
<a href="https://www.example.com/">Learn Python</a>
<a href="https://www.example.com/java">Learn Java</a>
Method 3: Using Regular Expressions for Pattern Matching
For flexible text matching - partial matches, case-insensitive searches, or pattern-based queries - use Python's re module with the string parameter:
import re
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find all <a> tags where text contains "Learn"
tags = soup.find_all("a", string=re.compile("Learn"))
for tag in tags:
    print(tag)
Output:
<a href="https://www.example.com/">Learn Python</a>
<a href="https://www.example.com/java">Learn Java</a>
<a href="https://www.example.com/rust">Learn Rust</a>
Case-Insensitive Search
import re
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Case-insensitive search for "python" in any tag
tags = soup.find_all(string=re.compile("python", re.IGNORECASE))
for text in tags:
    print(f"'{text.strip()}' in <{text.parent.name}>")
Output:
'Learn Python' in <a>
'Python Programming' in <h1>
'Getting Started with Python' in <h2>
'Python Programming' in <span>
'Python Programming' in <li>
'Python Code Examples' in <li>
'This paragraph mentions Python in a longer sentence.' in <p>
When you pass a regex to string without specifying a tag name, BeautifulSoup searches the text content of all tags. Each result is a NavigableString object - use .parent to access the enclosing tag.
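A minimal sketch (with a hypothetical inline snippet) confirms what such a search returns:

```python
import re
from bs4 import BeautifulSoup

html = '<h2>Python basics</h2>'
soup = BeautifulSoup(html, "html.parser")

result = soup.find(string=re.compile("python", re.IGNORECASE))
print(type(result).__name__)  # NavigableString: the text node itself
print(result.parent.name)     # h2: the enclosing tag
```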
Method 4: Partial Text Matching with Lambda Functions
The string parameter requires an exact match against the tag's direct text. For tags where the text is part of a longer string or mixed with child elements, use a lambda function:
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find any tag containing "Python" anywhere in its text
tags = soup.find_all(lambda tag: tag.string and "Python" in tag.string)
for tag in tags:
    print(f"<{tag.name}>: {tag.string}")
Output:
<a>: Learn Python
<h1>: Python Programming
<h2>: Getting Started with Python
<span>: Python Programming
<li>: Python Programming
<li>: Python Code Examples
<p>: This paragraph mentions Python in a longer sentence.
Finding Tags with Text in Nested Content
Some tags contain text spread across child elements. Use .get_text() instead of .string for these cases:
from bs4 import BeautifulSoup
html = '<p>This is <strong>important</strong> text about Python.</p>'
soup = BeautifulSoup(html, "html.parser")
# .string returns None for tags with mixed content
p_tag = soup.find("p")
print(f".string: {p_tag.string}")
# .get_text() combines all text from the tag and its children
print(f".get_text(): {p_tag.get_text()}")
# Use get_text() in a lambda for reliable partial matching
tags = soup.find_all(lambda tag: "Python" in tag.get_text())
for tag in tags:
    print(f"<{tag.name}>: {tag.get_text()}")
Output:
.string: None
.get_text(): This is important text about Python.
<p>: This is important text about Python.
.string returns the text only if the tag contains a single text node (no child elements); it returns None otherwise. .get_text() returns all text from the tag and its descendants, concatenated together.
When searching for text in tags that might have nested elements, always use .get_text():
# ❌ May miss matches: .string is None for tags with children
soup.find_all(lambda tag: tag.string and "Python" in tag.string)
# ✅ Reliable: works with nested content
soup.find_all(lambda tag: "Python" in tag.get_text())
Method 5: Searching Any Tag Type for Specific Text
If you don't care about the tag type and want to find any element containing certain text, omit the tag name:
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find the first tag of any type containing "Course Name"
tag = soup.find(string="Course Name")
print(f"Text: '{tag}'")
print(f"Parent tag: {tag.parent}")
Output:
Text: 'Course Name'
Parent tag: <td>Course Name</td>
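When the immediate parent isn't the element you want, find_parent() climbs to the nearest ancestor of a given type. A sketch using an inline table:

```python
from bs4 import BeautifulSoup

html = '<table><tr><td>Course Name</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

cell_text = soup.find(string="Course Name")
print(cell_text.parent.name)                # td: the immediate parent
print(cell_text.find_parent("tr").name)     # tr: nearest matching ancestor
print(cell_text.find_parent("table").name)  # table
```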
Method 6: Combining Text Search with Attribute Filters
You can search for tags matching both text content and attributes simultaneously:
import re
from bs4 import BeautifulSoup

with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")

# Find <span> tags with class="highlight" AND text "Python Programming"
tag = soup.find("span", class_="highlight", string="Python Programming")
print(tag)

# Find <li> tags with class="item-2" containing "Code"
tag = soup.find("li", class_="item-2", string=re.compile("Code"))
print(tag)
Output:
<span class="highlight">Python Programming</span>
<li class="item-2">Python Code Examples</li>
Practical Example: Extracting Links with Specific Text
A common real-world use case is finding URLs associated with specific link text:
import re
from bs4 import BeautifulSoup
with open("example.html", "r") as file:
    soup = BeautifulSoup(file, "html.parser")
# Find all links that contain "Learn" in their text
learn_links = soup.find_all("a", string=re.compile("Learn"))
print("Links containing 'Learn':")
for link in learn_links:
    text = link.get_text()
    url = link.get("href", "N/A")
    print(f"  {text} -> {url}")
Output:
Links containing 'Learn':
Learn Python -> https://www.example.com/
Learn Java -> https://www.example.com/java
Learn Rust -> https://www.example.com/rust
Quick Reference
| Task | Code |
|---|---|
| Find first tag with exact text | soup.find("a", string="exact text") |
| Find all tags with exact text | soup.find_all("a", string="exact text") |
| Partial match with regex | soup.find_all("a", string=re.compile("partial")) |
| Case-insensitive match | soup.find_all(string=re.compile("text", re.IGNORECASE)) |
| Lambda for partial match | soup.find_all(lambda t: "text" in t.get_text()) |
| Text + attribute filter | soup.find("span", class_="cls", string="text") |
| Limit number of results | soup.find_all("a", limit=3) |
| Any tag with text | soup.find(string="text").parent |
Conclusion
BeautifulSoup provides flexible and powerful methods to find HTML tags by their text content:
- The string parameter with find() or find_all() handles exact text matching.
- Regular expressions (re.compile()) enable partial matches, case-insensitive searches, and pattern-based queries.
- Lambda functions provide the most flexibility, especially with .get_text() for tags containing nested elements.
- Combining text and attribute filters lets you precisely target specific elements.
For most use cases, soup.find_all("tag", string=re.compile("pattern")) strikes the best balance between simplicity and flexibility. When dealing with complex nested HTML, switch to lambda functions with .get_text() for reliable results.