
How to Extract Email Addresses from a Text File in Batch Script

Pulling contact information out of a large text document or log file is a common administrative task. Whether you are building a mailing list from a backup of configuration files or auditing a server log for unauthorized communication, email extraction lets you isolate every instance of the username@domain.com pattern. Batch is not designed for complex pattern matching, but the built-in findstr tool handles quick filtering, and a PowerShell bridge provides higher accuracy.

In this guide, we will demonstrate how to extract email addresses using regular expressions.

Method 1: The Fast Filter (FINDSTR)

The findstr command supports basic regular expressions. We can search for the "at" symbol (@) surrounded by alphanumeric characters.

Implementation Script

@echo off
setlocal

set "Source=Contacts.txt"
set "Output=EmailList.txt"

echo Extracting emails from %Source%...

:: /R treats the search string as a regular expression
:: /I makes the search case-insensitive
:: findstr has its own limited regex flavor: character classes such as [a-z]
:: work, but quantifiers like + or {2,} are not supported.
:: To require "one or more", write the class once and then again with *.
:: A bare * would also accept an empty username or domain.
findstr /R /I "[a-zA-Z0-9._-][a-zA-Z0-9._-]*@[a-zA-Z0-9-][a-zA-Z0-9.-]*\.[a-zA-Z][a-zA-Z]*" "%Source%" > "%Output%"

echo [DONE] Matching lines saved to %Output%.

endlocal
pause

Warning: findstr returns entire lines that contain a match, not just the email address itself. If a line reads Contact John at john@example.com for details, the full sentence is written to the output file. For extracting only the email addresses, use Method 2.

Note: findstr does not support the + quantifier (one or more). The pattern \.[a-zA-Z][a-zA-Z]* is the findstr-compatible way of requiring at least one character after the dot, since \.[a-zA-Z]* would also match a bare dot with nothing after it.
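
The following is a minimal sketch of the "class, then class with *" idiom on throwaway data. The sample file name and its three lines are hypothetical and exist only for this demonstration. Because /I is set, the classes are shortened to [a-z].

```batch
@echo off
setlocal

rem Hypothetical sample file, created only for this demonstration
set "Sample=%TEMP%\findstr_demo.txt"

> "%Sample%" echo Contact John at john@example.com for details
>> "%Sample%" echo No address on this line
>> "%Sample%" echo Missing domain suffix: user@localhost

rem Each class appears twice (X then X*) to emulate the unsupported + quantifier
findstr /R /I "[a-z0-9._-][a-z0-9._-]*@[a-z0-9-][a-z0-9.-]*\.[a-z][a-z]*" "%Sample%"

del "%Sample%"
endlocal
```

Only the first sample line should be printed, and it is printed in full, which illustrates both the emulated quantifier and the whole-line behavior described in the warning.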

Method 2: The Modern Regex Bridge (PowerShell)

The findstr method is fast, but it always returns the entire line containing the email rather than just the address itself. For clean extraction (getting exactly the email and nothing else), PowerShell's regex engine is significantly more powerful.

Implementation Script

@echo off
setlocal

set "file=log.txt"
set "output=emails_only.txt"

echo Performing clean email extraction...

:: This PowerShell one-liner reads the file as a single string,
:: finds all regex matches, and writes the unique results one per line.
:: The continuation lines are indented on purpose: after a trailing ^,
:: cmd escapes the first character of the next line, and that character
:: must not be the opening quote.
powershell -NoProfile -Command ^
    "$content = Get-Content -Path '%file%' -Raw;" ^
    "$regex = '[a-zA-Z0-9._%%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}';" ^
    "[regex]::Matches($content, $regex) | ForEach-Object { $_.Value } | Sort-Object -Unique | Set-Content -Path '%output%'"

echo [DONE] Unique emails saved to %output%.

endlocal
pause

Note: The % character must be escaped as %% inside a Batch script so that cmd passes a literal % to PowerShell. The -Raw switch on Get-Content (available in PowerShell 3.0 and later) reads the file as a single string, so [regex]::Matches scans the whole text in one pass. Set-Content is used instead of the > redirection operator to avoid encoding issues between PowerShell and the Batch shell.
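
To see the percent-sign escaping in isolation, a small sketch like the one below can be saved as a .bat file and run. It only echoes the character class from the regex and does not call PowerShell. The behavior applies to script files, not to commands typed at an interactive prompt.

```batch
@echo off
rem In a batch file, a doubled percent sign collapses to one literal percent sign
echo Pattern as written in the script:  [a-zA-Z0-9._%%%%+-]+
echo Pattern as PowerShell receives it: [a-zA-Z0-9._%%+-]+
```

The first line prints the class with two percent signs and the second prints it with one, which is the form the regex engine ends up seeing.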

Why Extract Emails via Script?

  1. Auditing: Identifying which users are mentioned in system alerts or error notifications.
  2. Database Seeding: Extracting user contacts from legacy plain-text inventories to import them into a modern CRM.
  3. Communication Logs: Stripping metadata from an exported chat or mail log to get a clean list of participants.

Best Practices

  1. Deduplication: Many text files contain the same email multiple times. Always pipe your results through a unique-filter (as seen in Method 2) to avoid bloating your list.
  2. Handling Special Characters: Email addresses can contain + (for sub-addressing) and _. Ensure your regex accounts for these characters to avoid breaking the address in half.
  3. Encoding: If the source file uses an encoding other than the one PowerShell assumes by default, the extraction might miss or garble addresses. Use the -Encoding parameter of Get-Content (for example, -Encoding UTF8) so the text is read correctly.
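
As a hedged illustration of the encoding advice, the sketch below repeats the Method 2 bridge with an explicit -Encoding on both the read and the write. It assumes the source file is UTF-8; the file names are the same hypothetical ones used earlier, and a different encoding name would be needed for other inputs.

```batch
@echo off
setlocal

rem Hypothetical file names, used only for illustration
set "file=log.txt"
set "output=emails_only.txt"

rem Assumption: the source file is UTF-8 encoded.
rem Continuation lines are indented so the caret does not escape the opening quote.
powershell -NoProfile -Command ^
    "$content = Get-Content -Path '%file%' -Raw -Encoding UTF8;" ^
    "$regex = '[a-zA-Z0-9._%%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}';" ^
    "[regex]::Matches($content, $regex) | ForEach-Object { $_.Value } | Sort-Object -Unique | Set-Content -Path '%output%' -Encoding UTF8"

endlocal
```

Stating the encoding on both sides keeps the output file consistent with the input instead of relying on the defaults of whichever PowerShell version is installed.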

Conclusion

Extracting email addresses is a practical way to turn unstructured text into a structured contact inventory. Native Batch commands provide a quick way to find lines containing emails, while the PowerShell regex bridge offers the precision needed to isolate the addresses themselves. Automating the extraction saves hours of manual searching and makes the resulting contact lists easier to keep accurate and free of duplicates.