How to Extract Percentages from a String in Python

Extracting percentage values from text is a common task in data mining, report parsing, web scraping, and natural language processing. Whether you're processing financial reports, survey results, or log files, you often need to pull out values like "85%", "99.9%", or "100%" from unstructured text.

In this guide, you will learn several ways to extract percentages from strings in Python, mainly with regular expressions, using patterns that handle integers, decimals, negative values, and spaces before the % sign.

Understanding the Problem

Given a string containing text mixed with percentage values:

text = "The success rate is 85% with 99.9% accuracy and 100% coverage"

Extract all percentages:

["85%", "99.9%", "100%"]

Method 1: Using re.findall()

The most reliable approach uses re.findall() with a regex pattern that matches numbers followed by the % symbol:

import re

text = "tutorialreference is 100% way to get 200% success"

percentages = re.findall(r"\d+%", text)

print(percentages)

Output:

['100%', '200%']

How it works:

  • \d+ matches one or more digits.
  • % matches the literal percent sign.
  • findall() returns all non-overlapping matches as a list.
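When the position of each percentage matters (for example, to highlight it in the source text), re.finditer() can be used in place of findall(). The following is a minimal sketch that reuses the sample string above; the variable name found is only illustrative.

```python
import re

text = "tutorialreference is 100% way to get 200% success"

# finditer() yields match objects, so each value comes with its start index
found = [(m.group(), m.start()) for m in re.finditer(r"\d+%", text)]

for value, start in found:
    print(f"{value} starts at index {start}")
```

Each tuple pairs the matched string with its offset, so text[start:start + len(value)] gives back the same match.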

Handling Decimal Percentages

The basic \d+% pattern mishandles decimal percentages: for 99.9% it matches only the digits after the decimal point and returns 9%. Use an extended pattern:

import re

text = "Accuracy is 99.9% and precision is 87.5%, with 100% recall"

percentages = re.findall(r"\d+\.?\d*%", text)

print(percentages)

Output:

['99.9%', '87.5%', '100%']

Pattern breakdown:

  • \d+ - one or more digits (integer part)
  • \.? - optional decimal point
  • \d* - zero or more digits (decimal part)
  • % - literal percent sign
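The breakdown above can also live inside the pattern itself. This sketch, offered as one possible style rather than a required one, compiles the same pattern with re.VERBOSE so that each piece carries its own comment; the name PERCENT_RE is an assumption made for the example.

```python
import re

# Same pattern as r"\d+\.?\d*%", written out with inline comments.
# In VERBOSE mode, whitespace in the pattern is ignored and # starts a comment.
PERCENT_RE = re.compile(
    r"""
    \d+     # integer part: one or more digits
    \.?     # optional decimal point
    \d*     # fractional part: zero or more digits
    %       # literal percent sign
    """,
    re.VERBOSE,
)

text = "Accuracy is 99.9% and precision is 87.5%, with 100% recall"
percentages = PERCENT_RE.findall(text)
print(percentages)
```

Compiling once is also convenient when the same pattern is applied to many strings.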

Handling Spaces Before %

Some text formats include a space before the percent sign (e.g., "100 %"):

import re

text = "Success rate is 85 % with 99.9% accuracy"

# Match numbers with optional space before %
percentages = re.findall(r"\d+\.?\d*\s*%", text)

print(percentages)

Output:

['85 %', '99.9%']

Tip: Use \s* (zero or more whitespace characters) before % to handle both "85%" and "85 %" formats. To normalize the output when the separator is a plain space:

# Normalize: remove spaces before %
cleaned = [p.replace(" ", "") for p in percentages]
print(cleaned) # ['85%', '99.9%']
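Because \s also matches tabs and non-breaking spaces, str.replace(" ", "") may leave some separators behind. The sketch below assumes input that mixes several whitespace characters (the sample string is made up for illustration) and strips them all with re.sub().

```python
import re

# Hypothetical input: a non-breaking space, a tab, and a plain space before %
text = "Load: 85\u00a0% CPU, 40\t% memory, 12 % disk"

raw = re.findall(r"\d+\.?\d*\s*%", text)

# Remove every whitespace character that \s* may have captured
cleaned = [re.sub(r"\s+", "", p) for p in raw]
print(cleaned)
```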

Method 2: Using split() and Filtering

A non-regex approach splits the string into words and filters for those containing %:

text = "tutorialreference is 100% way to get 200% success"

words = text.split()
percentages = [word for word in words if "%" in word]

print(percentages)

Output:

['100%', '200%']

This works for simple cases but fails when the % is separated from the number by a space (only the bare "%" token is kept), and it keeps any punctuation or other characters attached to the token, as in "87.5%,".
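These limitations are easy to reproduce. The sketch below uses a made-up sentence and a best-effort cleanup with str.strip(); it is meant to show where the approach breaks, not to recommend it.

```python
import string

text = "Accuracy reached 87.5%, up from 80 % (target: 90%)."

# Plain split-and-filter keeps punctuation and a bare "%" token
tokens = [word for word in text.split() if "%" in word]
print(tokens)

# Best-effort cleanup: strip surrounding punctuation (except %) and
# drop tokens that contain no digit at all
strip_chars = string.punctuation.replace("%", "")
cleaned = [
    token.strip(strip_chars)
    for token in tokens
    if any(ch.isdigit() for ch in token)
]
print(cleaned)
```

The cleanup recovers "87.5%" and "90%", but "80 %" is still lost because the number and the sign are separate tokens. Stripping punctuation would also remove a leading minus sign.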

Method 3: Extracting Numeric Values Without the % Sign

Sometimes you need the numeric values as actual numbers rather than strings with %:

import re

text = "The rates are 15.5%, 22%, and 8.75% respectively"

# Extract just the numbers (without %)
values = [float(x) for x in re.findall(r"(\d+\.?\d*)\s*%", text)]

print(values)

Output:

[15.5, 22.0, 8.75]

How it works:

  • The parentheses (\d+\.?\d*) create a capturing group that captures only the numeric part.
  • When the pattern contains a single capturing group, findall() returns that group's content instead of the whole match, so the % is dropped.
  • float() converts each match to a number.
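If the values feed into financial or other exact arithmetic, decimal.Decimal may be preferable to float. The following sketch, which assumes you want fractions (15.5% becomes 0.155), converts the captured strings directly without going through float.

```python
import re
from decimal import Decimal

text = "The rates are 15.5%, 22%, and 8.75% respectively"

# Capture only the numeric part, as in the example above
numbers = re.findall(r"(\d+\.?\d*)\s*%", text)

# Build Decimal values from the strings and scale them to fractions
fractions = [Decimal(n) / 100 for n in numbers]
print(fractions)
```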

Comprehensive Regex Pattern

Here's a robust pattern that handles all common percentage formats:

import re

def extract_percentages(text):
    """Extract all percentages from text, handling various formats."""
    # Matches: 100%, 99.9%, 0.5%, .5%, 85 %, and negatives like -3.5%
    pattern = r"-?\d*\.?\d+\s*%"
    matches = re.findall(pattern, text)
    # Normalize: drop any whitespace between the number and %
    return [re.sub(r"\s+", "", m) for m in matches]


# Test with various formats
tests = [
    "Growth is 15.5% and decline is -3.2%",
    "Rates: 100%, 0.5%, .75%",
    "Score is 85 % out of 100 %",
    "No percentages here",
]

for text in tests:
    print(f"Input: {text}")
    print(f"Output: {extract_percentages(text)}\n")

Output:

Input: Growth is 15.5% and decline is -3.2%
Output: ['15.5%', '-3.2%']

Input: Rates: 100%, 0.5%, .75%
Output: ['100%', '0.5%', '.75%']

Input: Score is 85 % out of 100 %
Output: ['85%', '100%']

Input: No percentages here
Output: []
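The same pattern can be combined with a capturing group to return signed numbers instead of strings. This is a small sketch; extract_percent_values is a hypothetical helper name, not part of any library.

```python
import re


def extract_percent_values(text):
    """Return percentages as floats, keeping the sign (hypothetical helper)."""
    # The group captures the optional minus sign and the digits, not the %
    return [float(n) for n in re.findall(r"(-?\d*\.?\d+)\s*%", text)]


print(extract_percent_values("Growth is 15.5% and decline is -3.2%"))
print(extract_percent_values("Rates: 100%, 0.5%, .75%"))
```

float() accepts strings such as "-3.2" and ".75", so no extra cleanup is needed before conversion.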

Common Mistake: Matching Digits Embedded in Other Text

A pattern without a boundary check also matches digits that belong to a longer token, such as an ID:

Problem: pattern too broad

import re

text = "ID12345% discount"

# This matches "12345%" which isn't a meaningful percentage
result = re.findall(r"\d+%", text)
print(result) # ['12345%']

Fix: add word boundaries to match standalone numbers

import re

text = "ID12345% discount is 25% off"

# \b requires a word boundary before the first digit, so digits that
# continue a word (as in ID12345) are skipped
result = re.findall(r"\b\d+\.?\d*\s*%", text)
print(result)

Output:

['25%']

Caution: Use word boundaries (\b) when you need to match only standalone percentage values and avoid capturing numbers embedded in identifiers or codes.
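A word boundary still exists after a dot or a hyphen, so \b alone accepts the "25%" inside "v1.25%" and does not combine well with an optional minus sign. One possible alternative, sketched here under the assumption that a percentage should never start right after a letter, digit, underscore, or dot, is a negative lookbehind. The name STANDALONE and the sample strings are illustrative.

```python
import re

# Assumption: a valid percentage is not preceded by a word character or a dot
STANDALONE = re.compile(r"(?<![\w.])-?\d*\.?\d+\s*%")

samples = [
    "ID12345% discount is 25% off",
    "v1.25% build, change of -3.5%",
    "between 10-20% and .5%",
]

results = [STANDALONE.findall(s) for s in samples]
for sample, result in zip(samples, results):
    print(f"{sample!r} -> {result}")
```

In the last sample the hyphen in "10-20%" is read as a range separator rather than a minus sign, because it directly follows a digit. Whether that is the right interpretation depends on your data.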

Extracting Percentages from Multi-Line Text

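For processing documents, reports, or log files, the same patterns work across line breaks. Keep in mind that the regex reads only the number: "decreased by 3.2%" in the report below is extracted as a positive 3.2, because the text contains no minus sign.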

import re

report = """
Q1 Revenue: increased by 12.5%
Q2 Revenue: decreased by 3.2%
Q3 Revenue: stable at 0%
Q4 Revenue: increased by 8.75%
Overall growth: 18.05%
"""

percentages = re.findall(r"-?\d+\.?\d*%", report)

print("All percentages found:", percentages)
print(f"Count: {len(percentages)}")

# Convert to numbers for analysis
values = [float(p.rstrip("%")) for p in percentages]
print(f"Average: {sum(values) / len(values):.2f}%")
print(f"Max: {max(values)}%")
print(f"Min: {min(values)}%")

Output:

All percentages found: ['12.5%', '3.2%', '0%', '8.75%', '18.05%']
Count: 5
Average: 8.50%
Max: 18.05%
Min: 0.0%
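For reports with a "label: text" layout like this one, it can help to keep each value next to its label. The sketch below assumes every line has the form "Label: ... N%"; the pattern and the name LINE_RE are illustrative and would need adjusting for other layouts.

```python
import re

report = """
Q1 Revenue: increased by 12.5%
Q2 Revenue: decreased by 3.2%
Q3 Revenue: stable at 0%
Q4 Revenue: increased by 8.75%
Overall growth: 18.05%
"""

# With re.MULTILINE, ^ matches at the start of every line.
# label: everything before the first colon; value: first number followed by %
LINE_RE = re.compile(
    r"^(?P<label>[^:\n]+):.*?(?P<value>-?\d+\.?\d*)\s*%",
    re.MULTILINE,
)

by_label = {
    m["label"].strip(): float(m["value"])
    for m in LINE_RE.finditer(report)
}

for label, value in by_label.items():
    print(f"{label}: {value}%")
```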

Comparison of Methods

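Method | Handles Decimals | Handles Spaces | Handles Negatives | Robustness
re.findall(r"\d+%") | ❌ No | ❌ No | ❌ No | Basic
re.findall(r"\d+\.?\d*%") | ✅ Yes | ❌ No | ❌ No | Good
re.findall(r"-?\d*\.?\d+\s*%") | ✅ Yes | ✅ Yes | ✅ Yes | Robust
split() + filtering | ✅ Yes (whole tokens) | ❌ No | ✅ Yes (whole tokens) | Fragile (keeps attached punctuation)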

Summary

Extracting percentages from strings in Python is best accomplished with regular expressions. Key takeaways:

  • Use re.findall(r"\d+\.?\d*%", text) for the most common use case - matching integer and decimal percentages.
  • Add \s* before % to handle formats with spaces between the number and percent sign.
  • Add -? at the start to capture negative percentages.
  • Use capturing groups (\d+\.?\d*) when you need only the numeric values without the % sign.
  • Use word boundaries (\b) to avoid matching numbers embedded in identifiers.
  • Convert results to float with float(p.rstrip("%")) for numerical analysis.