How to Set Service Recovery Options (Failure Actions) in Batch Script
Creating a Windows Service is only half the battle. To ensure high availability and reliability, you must also configure the service's behavior in the event of a crash. Windows allows you to define "Recovery Options" (or Failure Actions) that can automatically restart the service, reboot the computer, or even run a separate script when the service fails.
This guide will explain how to use the sc failure command in a Batch script to automate these recovery settings, ensuring your critical background processes remain resilient.
The SC FAILURE Command
The sc (Service Control) utility includes a sub-command called failure specifically for managing these recovery rules. Using this command, you can set actions for the first, second, and subsequent failures.
Basic Syntax
sc failure "ServiceName" reset= [Seconds] actions= [Action1]/[Delay1]/[Action2]/[Delay2]/...
The Space Requirement.
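As with other sc commands, the equals sign is part of the option name, and a space must separate it from the value (for example, actions= restart/60000, not actions=restart/60000). Without this space, sc does not recognize the option and the command fails, typically by printing its usage text.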
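As a minimal sketch of the spacing rule, the two lines below differ only in the space after each equals sign. The service name MyCriticalSvc is a placeholder chosen for illustration.

```bat
@echo off
:: Hypothetical service name, used only for illustration
set "SvcName=MyCriticalSvc"

:: Correct: each option ends with "=" and is followed by a space, then the value
sc failure "%SvcName%" reset= 86400 actions= restart/60000

:: Incorrect (kept commented out): no space after "=", so sc cannot parse the options
:: sc failure "%SvcName%" reset=86400 actions=restart/60000
```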
Configuring the Actions
The actions= parameter is a slash-separated string that defines what happens and when.
Available Action Types:
restart: Restarts the service.
run: Runs a program or script.
reboot: Restarts the computer.
"": Does nothing (no action).
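The action types can be mixed in a single actions= string to build an escalation path. The sketch below assumes a placeholder service name and illustrative delays. It relies on the reboot= option, which sets the message broadcast before a reboot action, and on actions= "" to clear all actions, both as described in Microsoft's sc failure reference.

```bat
@echo off
:: Hypothetical service name, used only for illustration
set "SvcName=MyCriticalSvc"

:: 1st failure: restart after 5 s; 2nd: restart after 30 s; 3rd and later: reboot after 60 s
:: reboot= sets the message broadcast before the reboot
sc failure "%SvcName%" reset= 86400 reboot= "MyCriticalSvc failed repeatedly; rebooting in 60 seconds." actions= restart/5000/restart/30000/reboot/60000

:: To clear all failure actions instead, pass an empty string (kept commented out):
:: sc failure "%SvcName%" reset= 0 actions= ""
```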
Example: Standard Restart Logic
Here is a script that configures a service to restart after 1 minute on the first failure, and after 2 minutes on every failure after that.
@echo off
set "SvcName=MyCriticalSvc"
:: Verify the service exists before configuring recovery
sc query "%SvcName%" >nul 2>&1
if %errorlevel% neq 0 (
echo [ERROR] Service '%SvcName%' does not exist.
pause
exit /b 1
)
echo [ACTION] Configuring recovery options for %SvcName%...
:: reset= 86400: Reset the failure count after 1 day (86400 seconds)
:: actions= restart/60000/restart/120000:
:: 1st failure: Restart after 60 seconds (60000ms)
:: 2nd and subsequent: Restart after 120 seconds (120000ms)
sc failure "%SvcName%" reset= 86400 actions= restart/60000/restart/120000
if %errorlevel% equ 0 (
echo [SUCCESS] Recovery options applied.
) else (
echo [ERROR] Failed to configure recovery. Ensure you are running as Administrator.
)
pause
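After applying the settings, it can help to read them back. The sketch below uses sc qfailure, which prints the reset period and the configured failure actions. The service name is the same placeholder used in the script above.

```bat
@echo off
:: Hypothetical service name; replace with the service you configured
set "SvcName=MyCriticalSvc"

:: sc qfailure prints the reset period and the configured failure actions
sc qfailure "%SvcName%"
if %errorlevel% neq 0 (
    echo [ERROR] Could not read recovery options for '%SvcName%'.
    exit /b 1
)
```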
Running a Custom Script on Failure
One of the most powerful options is the run action, which makes the Service Control Manager launch a program or script, such as a "sanity check" or cleanup script, when the service fails.
@echo off
set "TargetSvc=MyDatabaseProxy"
set "FixScript=C:\Scripts\fix_proxy.bat"
:: Verify the service exists
sc query "%TargetSvc%" >nul 2>&1
if %errorlevel% neq 0 (
echo [ERROR] Service '%TargetSvc%' does not exist.
pause
exit /b 1
)
:: Verify the recovery script exists
if not exist "%FixScript%" (
echo [ERROR] Recovery script not found: %FixScript%
pause
exit /b 1
)
:: Set up a failure action that runs a cleanup script after 5 seconds
:: The command= parameter specifies which program to run
:: reset= accompanies actions= (here: reset the failure count after 1 day)
sc failure "%TargetSvc%" reset= 86400 command= "%FixScript%" actions= run/5000
if %errorlevel% equ 0 (
echo [SUCCESS] Service will now run %FixScript% 5 seconds after a crash.
) else (
echo [ERROR] Failed to configure recovery. Ensure you are running as Administrator.
)
pause
The command= parameter.
When you use the run action in sc failure, the path to the script you want to execute is specified using the command= parameter on the same sc failure command line.
Best Practices and Rules for Recovery Configuration
1. Administrative Privileges
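Recovery options are part of the service's configuration, which Windows stores in the registry under HKLM\SYSTEM\CurrentControlSet\Services. Your Batch script must therefore be run as an Administrator (from an elevated prompt). By default, standard users do not have permission to modify service failure actions.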
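One common way to fail early when the script is not elevated is to probe with a command that only succeeds for administrators. The sketch below uses net session for that purpose. This is a widely used idiom rather than an official elevation check, and it assumes the Server service is running.

```bat
@echo off
:: "net session" fails with a nonzero exit code when the prompt is not elevated.
:: Assumption: the Server service is running; otherwise this check reports a false negative.
net session >nul 2>&1
if %errorlevel% neq 0 (
    echo [ERROR] This script must be run as Administrator.
    exit /b 1
)
echo [OK] Running with administrative privileges.
```

Placing this check at the top of a recovery script avoids a partially applied configuration when a later sc command is refused.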
2. Time in Milliseconds
In the actions= string, the delays are measured in milliseconds, not seconds.
1000 = 1 second
60000 = 1 minute
3600000 = 1 hour
3. Reset Counter
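The reset= value is in seconds. It is the length of time the service must run without failing before the failure count returns to zero. A reasonable reset period (such as 24 hours) keeps a service that crashes only once a week from ever reaching its "subsequent failures" action. Microsoft's sc failure reference also states that reset= and actions= must be used together, so include both whenever you set actions.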
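Because actions= takes milliseconds while reset= takes seconds, it can be less error-prone to start from human-friendly units and let the script convert them. The sketch below assumes a placeholder service name and illustrative values of 90 seconds and 24 hours.

```bat
@echo off
:: Hypothetical values chosen for illustration
set "SvcName=MyCriticalSvc"
set /a "DelaySeconds=90"
set /a "ResetHours=24"

:: actions= expects milliseconds; reset= expects seconds
set /a "DelayMs=DelaySeconds*1000"
set /a "ResetSeconds=ResetHours*3600"

:: Prints: Delay: 90000 ms, reset: 86400 s
echo Delay: %DelayMs% ms, reset: %ResetSeconds% s
sc failure "%SvcName%" reset= %ResetSeconds% actions= restart/%DelayMs%
```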
How to Avoid Common Errors
Wrong Way: Formatting the actions string incorrectly
The actions string must strictly follow the type/time format.
Wrong Approach:
:: Missing slash, and the delay is given in seconds instead of milliseconds
sc failure "Svc" reset= 86400 actions= restart 60
Correct Way:
sc failure "Svc" reset= 86400 actions= restart/60000
Best Practice: Combining with "Automatic Start"
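Recovery options only help while the service is expected to be running, so they are most effective when the service is also set to start= auto. The recovery settings themselves persist across reboots, but an automatic start type ensures the service is started again at boot, where the recovery logic can protect it.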
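A minimal sketch of combining the two settings follows. It uses sc config to set the start type and then applies recovery options, with a placeholder service name and illustrative delays.

```bat
@echo off
:: Hypothetical service name, used only for illustration
set "SvcName=MyCriticalSvc"

:: Start the service automatically at boot
sc config "%SvcName%" start= auto
if %errorlevel% neq 0 (
    echo [ERROR] Could not change the start type. Run as Administrator.
    exit /b 1
)

:: Then apply the recovery options
sc failure "%SvcName%" reset= 86400 actions= restart/60000
```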
Real-World Use Case: The "Unstoppable" Agent
Imagine a security agent that must be running at all times.
@echo off
set "SvcName=SafeAgent"
:: Verify the service exists
sc query "%SvcName%" >nul 2>&1
if %errorlevel% neq 0 (
echo [ERROR] Service '%SvcName%' does not exist. Install it first.
pause
exit /b 1
)
echo [STEP 1] Setting 3-tier recovery...
:: 1st failure: Restart after 1 second (1000ms)
:: 2nd failure: Restart after 10 seconds (10000ms)
:: 3rd and subsequent: Restart after 1 minute (60000ms)
:: Reset the failure counter after 12 hours (43200 seconds)
sc failure "%SvcName%" reset= 43200 actions= restart/1000/restart/10000/restart/60000
if %errorlevel% neq 0 (
echo [ERROR] Failed to set recovery options. Ensure you are running as Administrator.
pause
exit /b 1
)
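echo [STEP 2] Enabling recovery actions for stops that report an error...
:: failureflag 1: also trigger recovery when the service stops with a nonzero
:: exit code, not only when its process terminates unexpectedly
sc failureflag "%SvcName%" 1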
if %errorlevel% equ 0 (
echo.
echo [SUCCESS] Recovery hardening complete for '%SvcName%'.
echo 1st failure: Restart after 1 second
echo 2nd failure: Restart after 10 seconds
echo 3rd+ failure: Restart after 1 minute
) else (
echo [WARNING] Recovery options were set but the failure flag could not be configured.
)
pause
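To check that the recovery tiers configured above actually fire, one option is to simulate a crash on a test machine by killing the service's process. The sketch below is one possible approach, not the only one. It assumes the sc queryex output contains a line of the form "PID : 1234" and reuses the SafeAgent name from the script above.

```bat
@echo off
setlocal
:: Hypothetical service name; run this only on a test machine
set "SvcName=SafeAgent"
set "SvcPid="

:: sc queryex prints a line such as "PID : 1234"; the third token is the number
for /f "tokens=3" %%P in ('sc queryex "%SvcName%" ^| findstr /c:"PID"') do set "SvcPid=%%P"

if not defined SvcPid (
    echo [ERROR] Could not determine the PID of '%SvcName%'.
    exit /b 1
)
if "%SvcPid%"=="0" (
    echo [ERROR] '%SvcName%' is not running.
    exit /b 1
)

:: Force-kill the process to simulate a crash
taskkill /f /pid %SvcPid%

:: Give the first recovery action (1 second in the script above) time to run
timeout /t 5 /nobreak >nul
sc query "%SvcName%"
```

If the final sc query reports the service as running again, the first-failure action worked. Repeating the test exercises the later tiers.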
Conclusions
Configuring service recovery options is a small step that pays off in reliability. By using the sc failure command, you turn your Windows services from fragile processes into resilient, self-healing background agents. Always test your failure triggers in a development environment, and confirm that anything launched by the run action has the permissions it needs to execute when the main service dies.