How to Generate a Word Frequency Count from a Text File in Batch Script
Finding the most frequently used words in a document, such as the most common error message in a log file or the recurring keywords in a report, is a key step in data profiling. A word frequency count tells you not just whether something is happening, but how often. Batch has no native "Frequency" tool, but we can build one from loops and variable counters, or hand the grouping and sorting to PowerShell.
In this guide, we will demonstrate how to build a frequency analyzer using Batch and a high-performance PowerShell bridge.
Method 1: The "Counter Array" (Native Batch)
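This method iterates through every word and uses it as part of a variable name (count_<word>) that stores a running total for that word.
This method is best suited for small files with a limited vocabulary. Each unique word becomes an environment variable, so a file with thousands of unique words bloats the CMD environment and slows the script down considerably. Results are listed in the order SET reports the variables (alphabetically), not by frequency. Words containing characters that CMD treats specially, such as %, !, ^, &, * or ?, may be miscounted or skipped.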
Implementation Script
@echo off
:: Keep delayed expansion OFF while reading, so "!" characters in the text survive
setlocal disabledelayedexpansion

:: 1. Clear any counters left over from a previous run
for /f "delims==" %%A in ('set count_ 2^>nul') do set "%%A="

set "Source=Log.txt"
if not exist "%Source%" (
    echo [ERROR] Source file "%Source%" not found.
    pause
    exit /b 1
)

echo Analyzing word frequency (this may take a moment^)...

:: 2. Iterate through every word in the file.
::    The inner FOR splits each line on spaces, tabs, commas, semicolons and "=".
::    Tokens containing * or ? are treated as file masks, so such words are usually lost.
for /f "usebackq tokens=* delims=" %%L in ("%Source%") do (
    for %%W in (%%L) do call :CountWord "%%W"
)

:: 3. Display results (SET lists the variables alphabetically)
echo.
echo WORD              ^| FREQUENCY
echo ------------------^|----------
:: Delayed expansion is needed here to build the padded output string
setlocal enabledelayedexpansion
for /f "tokens=1,* delims==" %%A in ('set count_ 2^>nul') do (
    set "varname=%%A"
    set "varvalue=%%B"
    rem Strip exactly the first 6 characters, i.e. the count_ prefix
    set "cleanname=!varname:~6!"
    rem Append 18 spaces, then cut to a fixed 18-character column
    set "padded=!cleanname!                  "
    echo !padded:~0,18!^| !varvalue!
)
endlocal
pause
exit /b 0

:CountWord
:: Delayed expansion is still OFF here, so "!" inside a word is kept literally
set "word=%~1"
:: Strip common punctuation. The "if defined" guards skip the remaining
:: substitutions once a token such as "..." has been reduced to nothing.
if defined word set "word=%word:.=%"
if defined word set "word=%word:,=%"
if defined word set "word=%word:;=%"
if defined word set "word=%word:?=%"
if defined word set "word=%word:"=%"
if not defined word exit /b

:: Enable delayed expansion ONLY for the arithmetic
setlocal enabledelayedexpansion
set "current=0"
if defined count_%word% set "current=!count_%word%!"
set /a current+=1
:: Carry the incremented value across the ENDLOCAL barrier via %%V
for %%V in (!current!) do (
    endlocal
    set "count_%word%=%%V"
)
exit /b
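To see what the counter-array logic does independently of CMD's parsing quirks, here is a minimal Python model of it. It is a sketch under stated assumptions, not a port: count_words_like_method1 is a hypothetical name, the inner FOR's splitting is approximated with a regular expression on spaces, tabs, commas, semicolons and "=", and quoting and wildcard handling are ignored. Keys are lower-cased to mimic case-insensitive variable names.

```python
import re

# Approximation of how a plain FOR splits a line: spaces, tabs, commas, semicolons, "=".
FOR_DELIMS = re.compile(r"[ \t,;=]+")
# The characters that :CountWord strips from each token.
STRIP_CHARS = str.maketrans("", "", '.,;?"')


def count_words_like_method1(lines):
    """Approximate Method 1: one case-insensitive counter per word.

    Keys are lower-cased because CMD variable names are case-insensitive;
    the first spelling seen is kept for display.
    """
    counters = {}  # lower-cased word -> [display_name, count]
    for line in lines:
        for token in FOR_DELIMS.split(line):
            word = token.translate(STRIP_CHARS)
            if not word:
                continue  # same role as "if not defined word exit /b"
            entry = counters.setdefault(word.lower(), [word, 0])
            entry[1] += 1
    # Like the SET listing, results come back in alphabetical order, not by count.
    return [tuple(counters[key]) for key in sorted(counters)]


if __name__ == "__main__":
    sample = ["Error: disk full.", "error, retrying; ERROR again"]
    for name, count in count_words_like_method1(sample):
        print(f"{name:<18}| {count}")
```

With this sample, "error" and "ERROR" merge into one counter, while "Error:" keeps its colon and is counted separately, which is exactly the kind of gap that punctuation stripping is meant to close.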
Method 2: The PowerShell "Group-Object" Bridge (Recommended)
The Batch method is slow on large files because every single word triggers a CALL plus a SETLOCAL/ENDLOCAL pair, and these are expensive operations in CMD. PowerShell's Group-Object is built for exactly this task and processes the whole file in one pipeline, which is much faster.
This method strips punctuation, normalizes case, groups case-insensitively, and sorts the output in a single pipeline. It is the recommended approach for any file larger than a few hundred lines.
Implementation Script
@echo off
setlocal
set "Source=report.txt"
if not exist "%Source%" (
    echo [ERROR] Source file "%Source%" not found.
    pause
    exit /b 1
)
echo Generating top 10 most frequent words...
:: Each continuation line starts with a space: a caret at the end of a line also
:: escapes the first character of the next line, and an escaped quote breaks quoting.
powershell -NoProfile -Command ^
  "$text = Get-Content -Path '%Source%';" ^
  "$words = $text -split '\s+' | ForEach-Object { ($_ -replace '[^\w]','').ToLower() } | Where-Object { $_ -ne '' };" ^
  "$words | Group-Object | Sort-Object Count -Descending | Select-Object -First 10 Name, Count | Format-Table -AutoSize"
endlocal
pause
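For readers who want to check the pipeline's logic outside PowerShell, the same steps (split on whitespace, strip non-word characters, lower-case, drop empties, group, sort descending, take the top N) can be modeled in Python. This is an approximation, not the PowerShell code itself: top_words is a hypothetical helper, Python's \w is close to but not identical to the .NET definition, and ties between equal counts may be ordered differently than Sort-Object orders them.

```python
import re
from collections import Counter

NON_WORD = re.compile(r"[^\w]")


def top_words(text, limit=10):
    """Approximate Method 2: split, strip, lower-case, group, sort, take top N."""
    # -split '\s+' followed by ($_ -replace '[^\w]','').ToLower()
    words = (NON_WORD.sub("", token).lower() for token in re.split(r"\s+", text))
    # Where-Object { $_ -ne '' } followed by Group-Object
    counts = Counter(word for word in words if word)
    # Sort-Object Count -Descending | Select-Object -First N;
    # equal counts keep first-seen order here, which Sort-Object may not.
    return counts.most_common(limit)


if __name__ == "__main__":
    log = "Timeout on node1.\nTIMEOUT on node2!\nAccess denied (timeout?)"
    for name, count in top_words(log, limit=3):
        print(f"{name:<10} {count}")
```

Here "Timeout", "TIMEOUT" and "(timeout?)" all collapse into timeout, mirroring what the -replace and .ToLower() steps do in the PowerShell pipeline.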
Why Generate a Frequency Count?
- Log Diagnostics: Finding out that "Timeout" appears 5,000 times while "Access Denied" appears twice immediately tells you where to focus your repair efforts.
- SEO & Metadata: Identifying the core keywords in a document before publishing it online.
- Data Quality: Spotting "junk" words or unexpected entries that appear far more frequently than they should, indicating a bug in your data collection.
Important Considerations
- Case Sensitivity: Batch variable names are case-insensitive, so Method 1 always merges Error and ERROR into one count and cannot count case-sensitively. In Method 2, Group-Object is also case-insensitive by default, and the .ToLower() call makes the displayed words consistently lowercase. For case-sensitive counting, remove .ToLower() and add the -CaseSensitive switch to Group-Object; removing .ToLower() alone is not enough.
- Punctuation: A token like "Server." is technically different from "Server". For accurate results, strip common punctuation before counting (as both methods do). Method 1 removes only periods, commas, semicolons, question marks and double quotes, so "error:" is still counted separately from "error". Method 2 removes every non-word character, which also deletes hyphens and apostrophes inside words.
- Variable Limits: Since Method 1 stores each count in its own environment variable, files with thousands of unique words bloat the CMD environment and slow the script considerably. Use Method 2 for large-scale analysis.
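The difference between the two punctuation rules is easiest to see on a few sample tokens. The sketch below assumes two hypothetical Python helpers, normalize_method1 and normalize_method2, that approximate the stripping done by :CountWord and by the -replace '[^\w]','' step respectively; neither is part of the original scripts.

```python
import re

# Characters that :CountWord removes (approximation of Method 1).
METHOD1_CHARS = str.maketrans("", "", '.,;?"')
# Anything that is not a letter, digit or underscore (approximation of Method 2).
NON_WORD = re.compile(r"[^\w]")


def normalize_method1(token):
    """Remove only periods, commas, semicolons, question marks and double quotes."""
    return token.translate(METHOD1_CHARS)


def normalize_method2(token):
    """Remove every non-word character, as the PowerShell -replace step does."""
    return NON_WORD.sub("", token)


if __name__ == "__main__":
    for token in ["Server.", "Server", "error:", "node-01", "don't"]:
        print(f"{token!r:>10} -> {normalize_method1(token)!r:>10} | {normalize_method2(token)!r}")
```

Method 1 leaves "error:" distinct from "error", while Method 2 merges them but also rewrites "node-01" as "node01" and "don't" as "dont". Which trade-off is acceptable depends on whether identifiers like host names matter in your data.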
Conclusion
A word frequency count transforms a mountain of raw text into a prioritized list of insights. By identifying the most common patterns in your data, you can move away from guessing and toward evidence-based decision making. Whether you use the native counter loop for quick local checks or the PowerShell grouping bridge for large log audits, these scripts provide the statistical clarity needed for professional system administration.