How to Generate a Word Frequency Count from a Text File in Batch Script

Analyzing the most frequently used words in a document, such as identifying the most common error message in a log file or finding recurring keywords in a report, is a key step in data profiling. A word frequency count tells you not just whether something is happening, but how often. Batch has no built-in frequency tool, but loops, variable counters, and sorting can be combined to build one.

In this guide, we will demonstrate how to build a frequency analyzer using Batch and a high-performance PowerShell bridge.

Method 1: The "Counter Array" (Native Batch)

This method iterates through every word and uses it as a variable name to store a running total.

note

This method is best suited for small files with a limited vocabulary. Because each unique word becomes an environment variable, a file with thousands of unique words can strain the CMD environment and slow the script considerably.
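
The core trick can be seen in isolation before reading the full script. The following is a minimal sketch, assuming a hard-coded word list (hypothetical sample data) in place of a file. It relies on set /a treating an undefined variable as 0, so no explicit initialization is needed.

```bat
@echo off
setlocal

rem Hypothetical sample words; the full script reads them from a file instead.
for %%W in (error warning error info error) do (
    rem The word becomes part of the variable name: count_error, count_info, ...
    rem set /a treats an undefined variable as 0, so the first hit yields 1.
    set /a "count_%%W+=1"
)

rem "set count_" prints every variable whose name begins with count_.
set count_

endlocal
```

Run as a .bat file, this should print one line per distinct word, including count_error=3. The full script below uses a longer increment routine because real input can contain characters that set /a would misread.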

Implementation Script

@echo off
:: Start with delayed expansion OFF so exclamation marks in the file are read literally
setlocal disabledelayedexpansion

:: 1. Clean up any leftover variables from previous runs safely
for /f "delims==" %%A in ('set count_ 2^>nul') do set "%%A="

set "Source=Log.txt"

if not exist "%Source%" (
    echo [ERROR] Source file "%Source%" not found.
    pause
    exit /b 1
)

echo Analyzing word frequency (this may take a moment^)...

:: 2. Iterate through every word in the file.
:: A plain FOR splits on spaces, tabs, commas, semicolons, and equals signs,
:: and treats tokens containing * or ? as file-name wildcards.
for /f "usebackq tokens=* delims=" %%L in ("%Source%") do (
    for %%W in (%%L) do call :CountWord "%%W"
)

:: 3. Display results (listed alphabetically, the order in which SET prints variables)
echo.
echo WORD              ^| FREQUENCY
echo ------------------^|----------

:: We need delayed expansion here to build the output string
setlocal enabledelayedexpansion
for /f "tokens=1,* delims==" %%A in ('set count_ 2^>nul') do (
    set "varname=%%A"
    set "varvalue=%%B"
    rem Strip EXACTLY the first 6 characters ("count_")
    set "cleanname=!varname:~6!"
    rem Pad with 18 spaces, then cut to an 18-character column
    set "padded=!cleanname!                  "
    echo !padded:~0,18!^| !varvalue!
)
endlocal

pause
exit /b 0

:CountWord
set "word=%~1"
:: Strip common punctuation. Each step is guarded because %word:x=% on an
:: undefined (emptied) variable expands to literal text instead of nothing.
if defined word set "word=%word:.=%"
if defined word set "word=%word:,=%"
if defined word set "word=%word:;=%"
if defined word set "word=%word:?=%"
if defined word set "word=%word:"=%"

if not defined word exit /b

:: Enable delayed expansion ONLY for the safe math operation
setlocal enabledelayedexpansion
set "current=0"
if defined count_%word% set "current=!count_%word%!"
set /a current+=1
:: Pass the newly incremented variable back over the endlocal barrier
for %%V in (!current!) do (
    endlocal
    set "count_%word%=%%V"
)
exit /b
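
The set count_ listing used for display is ordered alphabetically by word, not by frequency. One way to rank the results is to zero-pad each count and run the lines through sort /r. This is a minimal sketch, not part of the script above: the three count_ variables are hypothetical stand-ins for the counters Method 1 builds, and the temporary file name is an assumption.

```bat
@echo off
setlocal enabledelayedexpansion

rem Hypothetical counters standing in for the ones Method 1 creates.
set "count_timeout=12"
set "count_denied=2"
set "count_error=105"

rem Assumed scratch file; any writable path works.
set "tmp_file=%TEMP%\wordfreq_%RANDOM%.tmp"

rem Write "COUNT WORD" lines, left-padding each count to 8 digits so that
rem a plain text sort orders the lines numerically.
(for /f "tokens=1,* delims==" %%A in ('set count_') do (
    set "name=%%A"
    set "num=0000000%%B"
    echo !num:~-8! !name:~6!
)) > "%tmp_file%"

rem sort /r reverses the order, so the highest count comes first.
sort /r "%tmp_file%"
del "%tmp_file%"

endlocal
```

The output keeps the leading zeros (for example 00000105 error). The block is redirected to a file instead of piped into sort because each side of a CMD pipe runs in a child process where delayed expansion is off by default.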

Method 2: The PowerShell Bridge

The Batch method is slow on large files because every word costs a separate subroutine call. PowerShell's Group-Object cmdlet is built for exactly this task and handles large inputs far faster.

tip

This method handles case-insensitive grouping, punctuation stripping, and sorted output in a single pipeline. It is the recommended approach for any file larger than a few hundred lines.

Implementation Script

@echo off
setlocal

set "Source=report.txt"

if not exist "%Source%" (
    echo [ERROR] Source file "%Source%" not found.
    pause
    exit /b 1
)

echo Generating top 10 most frequent words...

:: Keep the leading spaces on the continuation lines: a caret at the end of a
:: line also escapes the first character of the next line, and an escaped
:: opening quote would expose the pipes to CMD.
powershell -NoProfile -Command ^
    "$text = Get-Content -Path '%Source%';" ^
    "$words = $text -split '\s+' | ForEach-Object { ($_ -replace '[^\w]','').ToLower() } | Where-Object { $_ -ne '' };" ^
    "$words | Group-Object | Sort-Object Count -Descending | Select-Object -First 10 Name, Count | Format-Table -AutoSize"

endlocal
pause

Why Generate a Frequency Count?

  1. Log Diagnostics: Finding out that "Timeout" appears 5,000 times while "Access Denied" appears twice immediately tells you where to focus your repair efforts.
  2. SEO & Metadata: Identifying the core keywords in a document before publishing it online.
  3. Data Quality: Spotting "junk" words or unexpected entries that appear far more frequently than they should, indicating a bug in your data collection.

Important Considerations

warning

Batch variable names are case-insensitive, so Method 1 naturally merges Error and ERROR into one count. The PowerShell method calls .ToLower() to make the same normalization explicit. If you need case-sensitive counting in Method 2, remove the .ToLower() call and add the -CaseSensitive switch to Group-Object, which otherwise groups without regard to case.

  1. Case Sensitivity: Batch variable names are case-insensitive, so Error and ERROR count toward the same total. The PowerShell method normalizes to lowercase explicitly for consistency.
  2. Punctuation: The token "Server." (with a trailing period) is technically different from "Server". For accurate results, your script should strip common punctuation, as both methods do.
  3. Variable Limits: Because Method 1 stores each count in its own variable, files with thousands of unique words can strain the CMD environment and slow the script considerably. Use Method 2 for large-scale analysis.
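
If counts need to distinguish Error from ERROR, a case-sensitive variant of the PowerShell bridge might look like the sketch below. It assumes the same report.txt input as Method 2, drops the lowercase normalization, and passes -CaseSensitive to Group-Object, which otherwise groups strings without regard to case.

```bat
@echo off
setlocal

rem Assumption: report.txt is the same sample input used in Method 2.
set "Source=report.txt"

if not exist "%Source%" (
    echo [ERROR] Source file "%Source%" not found.
    exit /b 1
)

rem No .ToLower() here, and -CaseSensitive keeps Error and ERROR in separate groups.
rem The continuation lines start with spaces so the caret does not escape the quotes.
powershell -NoProfile -Command ^
    "$words = (Get-Content -Path '%Source%') -split '\s+' | ForEach-Object { $_ -replace '[^\w]','' } | Where-Object { $_ -ne '' };" ^
    "$words | Group-Object -CaseSensitive | Sort-Object Count -Descending | Select-Object -First 10 Name, Count | Format-Table -AutoSize"

endlocal
```

Method 1 has no equivalent switch, since the variable names that hold its counts are case-insensitive by nature.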

Conclusion

A word frequency count turns a mass of raw text into a ranked list of what actually occurs in it. By identifying the most common patterns in your data, you can replace guesswork with evidence. Use the native counter loop for quick checks on small files, and the PowerShell grouping bridge for large log audits.