
How to Count Words in a File in Batch Script

Knowing the word count of a file is a standard way to measure the volume of data or the progress of a documentation project. While Windows has the find command to count lines, it does not have a native wc (word count) tool like Linux. To count words in Batch, we have to iterate through every line and count every individual token (word) separated by spaces.

In this guide, we will demonstrate how to count words using a for loop and a more efficient PowerShell alternative.

Method 1: The Loop-and-Count (Native Batch)

This method iterates through every line and then through every word in that line, incrementing a counter.

Implementation Script

@echo off
setlocal disabledelayedexpansion

set "target=document.txt"
set "wordCount=0"

:: Verify source file exists
if not exist "%target%" (
echo [ERROR] File "%target%" not found.
pause
exit /b 1
)

echo Counting words in "%target%"...

:: Read file line-by-line and process each in a robust sub-routine
:: This avoids nested-block parsing errors
for /f "usebackq delims=" %%L in ("%target%") do (
set "line=%%L"
call :countWords
)

echo.
echo ==========================================
echo TOTAL WORDS: %wordCount%
echo ==========================================
pause
exit /b 0

:: Safe sub-routine for counting words in a single line
:countWords
setlocal enabledelayedexpansion
:: Replace standard Batch delimiters with spaces
set "line=!line:,= !"
set "line=!line:;= !"
set "line=!line:== !"

set "c=0"
:: Quote each token so special characters do not break the loop
:: (wildcards * and ? are still expanded against the current directory)
for %%W in ("!line: =" "!") do (
set "w=%%~W"
if defined w set /a c+=1
)
:: Export the count from setlocal back to the global wordCount
for /f %%C in ("!c!") do endlocal & set /a "wordCount+=%%C"
exit /b
Warning: The inner for %%W in (...) loop splits each line on spaces after the script has replaced commas, semicolons, and equals signs with spaces, so all of these characters act as word separators. The loop also expands wildcard characters (* and ?) against filenames in the current directory, even when the token is quoted. A line containing *.txt would be expanded into the matching filenames, each counted as a separate word. For precise word counting, use the PowerShell method (Method 2).

Tip: The script uses the delayed expansion toggle pattern: each line is set with delayed expansion disabled (set "line=%%L") to preserve literal ! characters in the file content. The inner loop then uses !line! with delayed expansion enabled to safely handle &, |, >, <, and other special characters that would break for %%W in (%%L) if expanded directly.
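The toggle pattern can be tried in isolation. The following is a minimal sketch, assuming a hard-coded sample string in place of a line read from a file; the sample text is purely illustrative.

```batch
@echo off
:: Delayed expansion is OFF here, so the ! in the sample text stays literal
setlocal disabledelayedexpansion
set "line=Stop! Wait & see"

:: Switch delayed expansion ON only for the part that reads the variable.
:: The value of !line! is not re-parsed, so & and ! are printed as-is.
setlocal enabledelayedexpansion
echo !line!
endlocal

endlocal
```

If the string were assigned while delayed expansion was already enabled, the ! character would be consumed during the assignment, which is why the script enables it only inside the sub-routine.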

How It Works

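The inner for %%W in ("!line: =" "!") loop replaces every space in the line with " ", which wraps each space-separated segment in its own pair of quotes. The loop then visits each quoted segment, %%~W strips the surrounding quotes, and if defined w skips the empty segments produced by consecutive spaces. Every non-empty segment increments the counter by 1.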
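To see the splitting step on its own, here is a small sketch that applies the same loop to one hard-coded line. The sample text (with a deliberate double space) is an assumption made for illustration.

```batch
@echo off
setlocal enabledelayedexpansion

:: Sample line with two consecutive spaces between the first two words
set "line=alpha  beta gamma"
set "c=0"

:: "!line: =" "!" becomes "alpha" "" "beta" "gamma"
for %%W in ("!line: =" "!") do (
    set "w=%%~W"
    rem The empty token from the double space leaves w undefined and is skipped
    if defined w set /a c+=1
)

echo Words: !c!
endlocal
```

The empty quoted segment is the reason the script checks if defined w before incrementing, rather than counting every iteration.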

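Method 2: The PowerShell Bridge

For large files, the Batch method is slow because it processes every word individually in a nested loop. PowerShell's Measure-Object -Word is built for this task: it splits on whitespace only, does not expand wildcards, and is not affected by the special characters that complicate Method 1.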

Implementation Script

@echo off
setlocal

set "target=log.txt"

:: Verify source file exists
if not exist "%target%" (
echo [ERROR] File "%target%" not found.
pause
exit /b 1
)

echo Counting words in "%target%"...

:: Measure-Object -Word counts whitespace-separated tokens
:: It does not expand wildcards or treat punctuation as delimiters
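:: Keep the continuation lines indented: a trailing caret escapes the first
:: character of the next line, which must not be the opening quote
powershell -NoProfile -Command ^
    "$stats = Get-Content -Path '%target%' | Measure-Object -Word -Line -Character; " ^
    "Write-Host (' Lines: ' + $stats.Lines); " ^
    "Write-Host (' Words: ' + $stats.Words); " ^
    "Write-Host (' Characters: ' + $stats.Characters)"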

if %errorlevel% neq 0 (
echo [ERROR] Word count failed.
pause
exit /b 1
)
pause
exit /b 0
Info: The Measure-Object cmdlet can count lines, words, and characters in a single pass by specifying -Word -Line -Character together. This provides a complete file statistics summary similar to the Linux wc command without needing three separate operations.
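If the rest of the Batch script needs the number rather than a printed report, the PowerShell result can be captured with for /f. This is a sketch under a few assumptions: the variable name words is hypothetical, the file name is the same placeholder used above, and the path contains no apostrophes (they would end the single-quoted PowerShell string).

```batch
@echo off
setlocal

:: Placeholder file name; replace with your own path
set "target=log.txt"
set "words="

:: PowerShell prints only the word count, and for /f stores it in %%N
for /f %%N in ('powershell -NoProfile -Command "(Get-Content -LiteralPath '%target%' | Measure-Object -Word).Words"') do set "words=%%N"

if not defined words (
    echo [ERROR] Could not read a word count for "%target%".
    exit /b 1
)

echo Words: %words%
exit /b 0
```

Checking whether the variable was set is a simple way to detect a failed call, since the loop body never runs when PowerShell prints nothing to standard output.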

Comparisons: Word Count vs. Line and Character Count

Metric | Command | What It Measures
Line Count | find /c /v "" < "file.txt" | Number of lines (height of the file)
Word Count | Method 1 or 2 above | Number of whitespace-separated tokens (content density)
Character Count | Measure-Object -Character | Raw size including spaces and punctuation
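The line-count command from the table can also feed a variable. The sketch below assumes a placeholder file name and a hypothetical variable called lines; note that the redirection symbol has to be escaped with a caret inside the for /f command.

```batch
@echo off
setlocal

:: Placeholder file name; replace with your own path
set "target=file.txt"

if not exist "%target%" (
    echo [ERROR] File "%target%" not found.
    exit /b 1
)

:: find /c /v "" counts every line, including blank ones
for /f %%N in ('find /c /v "" ^< "%target%"') do set "lines=%%N"

echo Lines: %lines%
exit /b 0
```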

Why Count Words in Batch?

  1. Validation: If you are scraping data and expect a specific number of inputs, counting words can verify that the extraction was successful.
  2. Quotas: Automated report generators can use word counts to ensure that a mandatory description or entry meets a minimum length requirement.
  3. Audit Logs: Quickly checking if a log file is unusually large or small based on word density can signal a system change.

Best Practices

  1. Wildcard Expansion: The Batch method (Method 1) expands wildcard characters (* and ?) against the current directory's file listing. A word like *.log in the text would be expanded into actual filenames, inflating the count. If your file may contain wildcard characters, use the PowerShell method.
  2. Delimiter Differences: The Batch method treats commas, semicolons, and equals signs as word separators in addition to spaces and tabs. PowerShell's Measure-Object -Word uses only whitespace as a delimiter. Be aware of this difference when comparing counts between methods.
  3. Verify Source File: Always check that the input file exists before processing. If the file is missing, the for /f loop only prints a "cannot find the file" message, and the script carries on and reports a count of zero.
  4. Performance: If your file is larger than 1 MB, the Batch loop will take a noticeable amount of time. Use the PowerShell method for production-level tasks.
  5. Blank Lines: Both methods correctly skip blank lines without counting them as words. However, the Batch for /f method also skips lines beginning with ; (the default eol character), which would cause those lines' words to be uncounted.
  6. Special Characters: Always use the delayed expansion toggle pattern in Method 1. Without it, lines containing &, |, >, or < will cause the inner for loop to break or produce incorrect counts.
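
The semicolon issue from the Blank Lines item can be worked around by clearing the eol character. The sketch below uses the escaped, unquoted option syntax that is commonly used to set both delims and eol to nothing. It only counts the lines that the loop reads, and the file name and the lineCount variable are placeholders.

```batch
@echo off
setlocal disabledelayedexpansion

:: Placeholder file name; replace with your own path
set "target=document.txt"
set "lineCount=0"

:: The carets escape the spaces and equals signs in the option string.
:: With eol set to nothing, lines starting with ; are no longer skipped.
for /f usebackq^ delims^=^ eol^= %%L in ("%target%") do (
    set /a lineCount+=1
)

echo Lines read: %lineCount%
endlocal
```

Blank lines are still skipped by for /f with this option string, so the behavior described for empty lines does not change.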

Conclusion

Counting words in a Batch script is a clever use of tokenization. By breaking down lines into their component parts, you can gain a deeper understanding of the volume and structure of your text data. Whether you use a native loop for local scripts or a PowerShell bridge for high-speed analysis, knowing your word count is an essential part of effective data management and reporting.