How to Remove Duplicates from an Array in Batch Script
Arrays often accumulate redundant data, especially when merging several lists (like a list of computer names from Active Directory and a list from a network scan). Deduplication (removing duplicates) is the process of ensuring that every value in your array is unique. This makes your scripts more efficient by preventing the same task from being performed twice on the same target.
In this guide, we will demonstrate how to deduplicate an array using a "Unique Key" methodology.
The Strategy: The Dictionary Check
The most efficient way to remove duplicates in Batch is not to compare every item against every other item (which is slow). Instead, we treat each value as a variable name. Since a given name can only be defined once in the environment, checking whether it already exists before copying the value into a results array automatically filters out repeats.
This method is case-insensitive because CMD environment variables are case-insensitive. Apple and apple will be treated as the same item, and only the first occurrence will be kept.
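You can verify this behavior directly. The minimal sketch below defines a flag once and then checks it with different casing:

@echo off
set "SEEN_Apple=1"
rem Variable names are case-insensitive, so this check succeeds
if defined SEEN_APPLE echo Same key, despite the different case

Running this prints the message, confirming that SEEN_Apple and SEEN_APPLE refer to the same variable.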
Implementation Script
@echo off
setlocal enabledelayedexpansion
:: 1. Define Array with Duplicates
set "size=7"
set "ARR_1=Apple"
set "ARR_2=Orange"
set "ARR_3=Apple"
set "ARR_4=Banana"
set "ARR_5=Orange"
set "ARR_6=Grapes"
set "ARR_7=Banana"
echo Original: !ARR_1!, !ARR_2!, !ARR_3!, !ARR_4!, !ARR_5!, !ARR_6!, !ARR_7!
:: 2. Deduplication Logic
set "uniqueCount=0"
for /L %%i in (1,1,%size%) do (
    rem Read the current element via delayed expansion
    set "currentValue=!ARR_%%i!"
    rem Use the value itself as a unique key in a temporary namespace
    if not defined SEEN_!currentValue! (
        rem This is a new unique item
        set "SEEN_!currentValue!=1"
        set /a "uniqueCount+=1"
        set "UNIQUE_!uniqueCount!=!currentValue!"
    )
)
:: 3. Display Results
echo.
echo Unique Items (!uniqueCount! found^):
for /L %%i in (1,1,!uniqueCount!) do (
    echo [%%i] !UNIQUE_%%i!
)
:: 4. Cleanup temporary namespace
for /L %%i in (1,1,%size%) do (
    set "SEEN_!ARR_%%i!="
)
endlocal
pause
Output:
Original: Apple, Orange, Apple, Banana, Orange, Grapes, Banana
Unique Items (4 found):
[1] Apple
[2] Orange
[3] Banana
[4] Grapes
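The same unique-key technique works on file input. The following is a hedged sketch, not part of the original script: the filename servers.txt is a placeholder, and it assumes each line contains no spaces or special shell characters.

@echo off
setlocal enabledelayedexpansion
rem Print each distinct line of servers.txt, keeping first occurrences
for /f "usebackq delims=" %%L in ("servers.txt") do (
    set "line=%%L"
    if not defined SEEN_!line! (
        set "SEEN_!line!=1"
        echo %%L
    )
)
endlocal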
Why Deduplicate an Array?
- Automation Safety: If your script deletes folders based on an array, a duplicate entry can cause the script to error out when it tries to delete a folder that was already removed in an earlier iteration.
- Performance: Reducing a 1,000-line log file to 50 unique error codes makes your reporting much faster and easier for humans to read.
- Mailing/Messaging: Ensuring that an email notification list doesn't contain the same address twice, preventing users from being spammed.
Important Considerations
- Case Sensitivity: By default, Batch variables are case-insensitive. This means Apple and apple will be considered the same and one will be removed.
- Special Characters: If your array contains symbols like &, |, or ^, the if not defined SEEN_!currentValue! check may break, because CMD interprets those characters as operators. For "dirty" data, use the PowerShell deduplication bridge below.
- Memory Management: Always clear your temporary namespace after the loop is done to free up environment variable memory, as demonstrated in the cleanup step above.
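As an alternative to looping over the original array during cleanup, you can enumerate and clear every SEEN_ variable in one pass. This sketch relies on the fact that "set PREFIX" lists all variables whose names start with PREFIX:

rem Clear all variables whose names begin with SEEN_
for /f "tokens=1 delims==" %%v in ('set SEEN_ 2^>nul') do set "%%v="

The 2^>nul suppresses the error message when no SEEN_ variables exist.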
The PowerShell Bridge (One-Liner)
For very large lists or those with special characters, PowerShell's Sort-Object -Unique is much safer:
@echo off
:: Use PowerShell to sort and deduplicate a list
set "list=Apple,Orange,Apple,Banana,Orange,Grapes,Banana"
powershell -NoProfile -Command "'%list%'.Split(',') | Sort-Object -Unique"
pause
Output:
Apple
Banana
Grapes
Orange
If you need to preserve the original insertion order instead of sorting alphabetically, replace Sort-Object -Unique with Select-Object -Unique in the PowerShell command. This keeps the first occurrence of each item in its original position.
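For example, applying the order-preserving variant to the same list:

@echo off
set "list=Apple,Orange,Apple,Banana,Orange,Grapes,Banana"
powershell -NoProfile -Command "'%list%'.Split(',') | Select-Object -Unique"
pause

Output:
Apple
Orange
Banana
Grapes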
Conclusion
Removing duplicates is a key step in "Sanitizing" your data. It turns cluttered, redundant lists into clean, actionable inventories. By utilizing the "Unique Key" methodology, you can perform high-speed deduplication without complex nested loops. This ensures that your automation remains lean, fast, and focused on unique tasks, saving time and preventing redundant processing errors.