How Can We Optimize BC/DR KPIs? (Part II: RTO)

This post continues the discussion of how to optimize the DR KPIs that serve as the benchmarks against which you measure your disaster readiness.

In part one, I talked about how you can improve RPO. In this post, I am going to identify strategies to optimize Recovery Time Objective (RTO). Consider the following strategies to improve your RTO:

1) Document everything 

It’s important to document every aspect of both your system and your restore/recovery process so that you can take a proactive approach. Your engineers shouldn’t be forced to figure out what to do on the fly. Instead, they should be able to simply copy and paste the required commands from the DR procedure document.

2) Implement a monitoring system and have a support team

How can you start recovery if you don’t even know that something has failed?
How can you start recovery if you have no one available to perform it?
As I mentioned in one of my previous posts, monitoring and support are essential to building a recovery process. So, be sure to have a system in place that keeps track of everything, and a team or person who can do the actual work.
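To make the detection side concrete, here is a minimal sketch in Python. It assumes your service exposes an HTTP health endpoint and that a few consecutive failed checks should count as an outage. The names `http_probe` and `FailureDetector`, the threshold, and the alert callback are all hypothetical and only serve as illustration; a real monitoring product will do much more than this.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts and HTTP error statuses all count as unhealthy.
        return False


class FailureDetector:
    """Send one alert after `threshold` consecutive failed health checks."""

    def __init__(self, threshold: int, alert: Callable[[str], None]) -> None:
        self.threshold = threshold  # assumed value, tune it for your system
        self.alert = alert          # e.g. a function that pages the support team
        self.failures = 0
        self.alerted = False

    def record(self, healthy: bool) -> None:
        if healthy:
            # Any successful check resets the streak and re-arms the alert.
            self.failures = 0
            self.alerted = False
            return
        self.failures += 1
        if self.failures >= self.threshold and not self.alerted:
            self.alerted = True
            self.alert(f"{self.failures} consecutive health checks failed")
```

In practice you would call something like `detector.record(http_probe(health_url))` on a fixed schedule, where `health_url` is whatever endpoint your system provides. Requiring several failures in a row is a deliberate trade-off: it delays detection slightly but avoids waking the support team for a single dropped request.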

3) Utilize replication and re-synchronization technologies 

As I mentioned in the previous post, if you can just switch over to a working backup infrastructure, you don’t need to spend time restoring or recovering from a backup.

4) Automate

Automation is one of the best strategies to improve your operational efficiency and business continuity efforts. Identify all the processes that can be automated with a tool.

Use Infrastructure as Code and configuration management tools wherever feasible. They will drastically decrease the chances of human error and speed up your operations.

When implemented the right way, automation solutions can improve RTO by up to 50% and optimize DR workflows.

If applicable, automate the switchover and make it dependent on system metrics. Why? Because an automated process can react to a failing metric immediately and consistently, without waiting for a person to notice the problem, diagnose it, and act.
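As an illustration of a metric-driven switchover, the following Python sketch triggers a failover once the error rate or latency stays above a limit for several samples in a row. Everything here is an assumption made for the example: the `Sample` fields, the thresholds, the window size, and the `switch_over` callback, which stands in for whatever actually redirects traffic in your environment.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass
class Sample:
    error_rate: float  # fraction of failed requests, 0.0 to 1.0
    latency_ms: float  # e.g. p95 latency for the sampling interval


class FailoverController:
    """Call `switch_over` once when `window` consecutive samples breach a limit."""

    def __init__(
        self,
        switch_over: Callable[[], None],
        max_error_rate: float = 0.05,    # hypothetical threshold
        max_latency_ms: float = 2000.0,  # hypothetical threshold
        window: int = 3,
    ) -> None:
        self.switch_over = switch_over
        self.max_error_rate = max_error_rate
        self.max_latency_ms = max_latency_ms
        self.recent: Deque[bool] = deque(maxlen=window)
        self.failed_over = False

    def observe(self, sample: Sample) -> bool:
        """Record one sample and return True once failover has been triggered."""
        breached = (
            sample.error_rate > self.max_error_rate
            or sample.latency_ms > self.max_latency_ms
        )
        self.recent.append(breached)
        window_full = len(self.recent) == self.recent.maxlen
        if not self.failed_over and window_full and all(self.recent):
            # Latch the state so the switchover runs only once.
            self.failed_over = True
            self.switch_over()
        return self.failed_over
```

The controller latches after the first trigger on purpose. Switching back to the primary site is usually a separate, deliberate step, so this sketch leaves failback to a human.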

5) Test your BC/DR solutions regularly 

This one is the most important of all. Make sure your business continuity and disaster recovery systems are up and running throughout the year. It’s recommended to test them at least twice a year, and quarterly if you can. If something goes wrong during the DR execution, it will significantly increase your actual recovery time and may push it past the RTO. You can’t be sure about your solution until it’s tested.
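One simple way to get value out of each test is to time every step and compare the total against your RTO. The Python sketch below shows the arithmetic. The step names, durations, and the two-hour target are made-up numbers for illustration, and `DrillStep` and `evaluate_drill` are hypothetical helpers rather than part of any tool.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, List


@dataclass
class DrillStep:
    name: str
    duration: timedelta


def evaluate_drill(steps: List[DrillStep], rto_target: timedelta) -> Dict[str, object]:
    """Summarize a DR drill; assumes `steps` contains at least one step."""
    total = sum((step.duration for step in steps), timedelta())
    slowest = max(steps, key=lambda step: step.duration)
    return {
        "total": total,
        "met_rto": total <= rto_target,
        "margin": rto_target - total,  # negative when the drill overran the RTO
        "slowest_step": slowest.name,  # the first place to look for savings
    }


# Made-up drill results measured against an assumed 2-hour RTO.
drill = [
    DrillStep("detect and escalate", timedelta(minutes=10)),
    DrillStep("restore database", timedelta(minutes=95)),
    DrillStep("verify and reopen traffic", timedelta(minutes=25)),
]
report = evaluate_drill(drill, rto_target=timedelta(hours=2))
```

With these sample numbers the drill takes 130 minutes, which misses the two-hour target by 10 minutes, and the database restore stands out as the step to optimize first.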

