How to Remove Duplicate Lines from a Text File in Batch Script

Duplicate data in text files, such as lists of IP addresses, duplicate email addresses, or repeating log entries, can cause bloat and lead to errors in automated processing. Cleaning up a text file by removing duplicate lines is a common administrative task. While Batch does not have a single "unique" command, we can achieve this efficiently by combining the sort command with a comparison loop or using a PowerShell bridge.

In this guide, we will demonstrate how to strip duplicates using both native Batch and modern PowerShell methods.

Method 1: The "Comparison" Loop (Native Batch)

This method involves sorting the file first so that all duplicates are grouped together, and then checking if each line is the same as the previous one.

Implementation Script

@echo off
setlocal DisableDelayedExpansion

set "source=InputList.txt"
set "dest=UniqList.txt"
set "tempSorted=%TEMP%\sorted_temp_%RANDOM%.txt"

:: Verify source file exists
if not exist "%source%" (
    echo [ERROR] Source file "%source%" not found.
    pause
    exit /b 1
)

:: 1. Sort file so duplicates become adjacent
sort "%source%" /o "%tempSorted%"

:: 2. Remove duplicates (adjacent comparison)
set "prevLine="

(
    for /f "usebackq delims=" %%A in ("%tempSorted%") do (
        rem Capture the line with delayed expansion DISABLED so "!" survives
        set "currentLine=%%A"
        rem Enable delayed expansion only to compare and print safely
        setlocal EnableDelayedExpansion
        if "!currentLine!" neq "!prevLine!" (
            echo(!currentLine!
        )
        endlocal
        rem Update prevLine outside the enabled scope, again preserving "!"
        set "prevLine=%%A"
    )
) > "%dest%"

:: 3. Cleanup
del "%tempSorted%" >nul 2>&1

echo Duplicates removed. Saved to "%dest%".
pause
endlocal
exit /b 0

For example, with this input file InputList.txt:

apple
banana
apple
orange
banana
banana
grape
kiwi
orange
mango
grape
pineapple
kiwi
apple

the output will be UniqList.txt:

apple
banana
grape
kiwi
mango
orange
pineapple
warning

The for /f command inherently skips blank lines and lines beginning with ; (the default eol character). If your input file relies on blank lines as separators or contains lines starting with ;, those lines will be silently removed from the output regardless of duplication.
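If you need to keep blank lines and ;-prefixed lines, a common workaround is to run the file through findstr /n first, which prefixes every line with its line number and a colon. No prefixed line is ever empty or starts with ;, so for /f processes all of them, and the prefix is stripped afterwards with a substring replacement. A minimal sketch of the technique (reading a hypothetical InputList.txt and printing every line, including blanks):

```batch
@echo off
setlocal DisableDelayedExpansion

:: findstr /n "^" matches every line and prefixes it with "N:",
:: so for /f never sees a blank line or a ;-led line.
for /f "delims=" %%A in ('findstr /n "^" "InputList.txt"') do (
    set "line=%%A"
    setlocal EnableDelayedExpansion
    rem Strip everything up to and including the first ":" (the findstr prefix)
    echo(!line:*:=!
    endlocal
)
```

Note that this changes the sort order implications as well: once lines carry a numeric prefix they should be deduplicated on the stripped value, not the raw %%A token.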

tip

The script toggles delayed expansion on and off inside the loop. The variable is set with delayed expansion disabled (set "currentLine=%%A" and set "prevLine=%%A"), which preserves literal ! characters in the text. It is then read with delayed expansion enabled (!currentLine! and !prevLine!), which safely handles &, |, >, <, and other special characters during comparison and output. This toggle pattern prevents both classes of corruption.

Method 2: The PowerShell One-Liner (Fastest and Most Precise)

If you are on Windows 10 or later, the most reliable and high-performance way to do this is to call PowerShell from within your Batch file.

Implementation Script

@echo off
setlocal

set "source=InputList.txt"
set "dest=CleanList.txt"

:: Verify source file exists
if not exist "%source%" (
    echo [ERROR] Source file "%source%" not found.
    pause
    exit /b 1
)

echo Removing duplicates from "%source%"...

:: Sort and deduplicate in a single pipeline
powershell -NoProfile -Command "Get-Content '%source%' | Sort-Object -Unique | Set-Content '%dest%'"

if %errorlevel% equ 0 (
    echo [SUCCESS] Duplicates removed. Saved to "%dest%".
) else (
    echo [ERROR] PowerShell command failed.
)
pause
exit /b 0

For example, with this input file InputList.txt:

apple
banana
apple
orange
banana
banana
grape
kiwi
orange
mango
grape
pineapple
kiwi
apple

the output will be CleanList.txt:

apple
banana
grape
kiwi
mango
orange
pineapple
info

The -NoProfile flag prevents PowerShell from loading the user's profile scripts, which speeds up execution and avoids unexpected side effects from custom profile configurations.

Handling Case Sensitivity

  • Batch Method: The if "!currentLine!" neq "!prevLine!" comparison is case-sensitive by default. Apple and apple are treated as different lines and both would be kept. To perform case-insensitive deduplication, change the comparison to if /i "!currentLine!" neq "!prevLine!".
  • PowerShell Method: Sort-Object -Unique is case-insensitive by default. Apple and apple would be treated as duplicates and only the first occurrence would be kept. To preserve case-sensitive distinctions, use Sort-Object -Unique -CaseSensitive.

Critical Considerations

  1. Sorting Requirement: Deduplication logic in Method 1 requires the file to be sorted first. If you attempt to remove duplicates from an unsorted file, you will not catch duplicates that are separated by other lines. The PowerShell method handles this internally.
  2. Blank Lines: The Batch for /f loop skips blank lines entirely, so they will be removed from the output. The PowerShell method preserves blank lines but deduplicates them to a single occurrence.
  3. Special Characters: Lines containing &, >, <, or | can break a Batch for loop if the values are echoed without delayed expansion protection. The toggle pattern in Method 1 handles these safely. The PowerShell method (Method 2) is inherently safe for all character types.
  4. Semicolon-Prefixed Lines: The for /f command treats ; as the default end-of-line comment character. Any line in the input file that begins with ; will be silently skipped in Method 1. The PowerShell method does not have this limitation.
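Both methods above reorder the file as a side effect of sorting. If the original line order matters, PowerShell's Select-Object -Unique keeps only the first occurrence of each line in its original position, with no sort step. A sketch using the same file names as Method 2 (note that, unlike Sort-Object -Unique, Select-Object -Unique compares case-sensitively):

```batch
:: Deduplicate while preserving the original line order
powershell -NoProfile -Command "Get-Content '%source%' | Select-Object -Unique | Set-Content '%dest%'"
```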

Best Practices

  1. Verify Source File: Always check that the input file exists before processing. Otherwise the script may continue past a failed sort step and silently produce an empty or missing output file.
  2. Keep the Original: Never overwrite your source file directly. Always output to a new file so you can verify the results before replacing the original.
  3. Audit the Count: At the end of your script, display the line counts of both files so the user can see how many duplicates were removed:
    echo --- Audit ---
    find /c /v "" "%source%"
    find /c /v "" "%dest%"
  4. Temp File Hygiene: Use the %TEMP% directory for intermediate files and include %RANDOM% in the filename to avoid collisions when multiple instances run simultaneously.
  5. Choose the Right Method: Use Method 1 when PowerShell is unavailable or restricted by policy. Use Method 2 for maximum reliability, especially when the input file may contain special characters, blank lines, or semicolons.
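The audit in practice 3 can go one step further and report how many duplicates were removed. Piping the file through find /c /v "" prints a bare count (without the file-name banner that the direct form produces), which can be captured into a variable with for /f. A sketch, assuming the %source% and %dest% variables from the scripts above:

```batch
:: type | find prints just the number, so for /f can capture it
for /f %%C in ('type "%source%" ^| find /c /v ""') do set "srcCount=%%C"
for /f %%C in ('type "%dest%" ^| find /c /v ""') do set "dstCount=%%C"
set /a removed=srcCount-dstCount
echo Removed %removed% duplicate line(s) (%srcCount% -^> %dstCount%).
```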

Conclusion

Removing duplicate lines is an essential step in maintaining clean, reliable datasets. While native Batch allows you to build a functional deduplication loop, the PowerShell bridge provides a more robust and high-performance solution for modern environments. By incorporating these cleaning steps into your scripts, you ensure that your downstream automation processes are working with accurate, non-redundant information.