KPIs for Disaster Recovery and Business Continuity

I briefly talked about designing a Business Continuity (BC) Plan in my previous post. Let’s now talk about Business Continuity and Disaster Recovery KPIs.

We understand that data is king when it comes to continuity and recovery. Reporting on the right metrics is one of the ways to know whether your solution is working, and to figure out what you are trying to build in the first place.

However, it can be a challenge for business continuity and DR managers to implement strong KPIs that clearly articulate the value of their actions.

Here are four metrics that I usually recommend to define and measure the completion and performance of a BC/DR program:

1) Recovery Point Objective (RPO) 

It is defined as the maximum amount of data you can afford to lose when recovering from a disaster, usually expressed as a span of time. It can range from days to absolutely no loss. Depending on the solution, RPO can determine how frequently you need to back up your data, or even which BC/DR solution you should choose.

An average Oracle DB system can reach an RPO of «several minutes» pretty easily; going lower than that requires more effort and investment.
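To make the link between RPO and backup frequency concrete, here is a minimal Python sketch. It assumes that the worst-case data loss equals the longest gap between two consecutive backups (including the gap between the last backup and now). The function names and the sample timestamps are hypothetical and not tied to any particular backup tool.

```python
from datetime import datetime, timedelta

def worst_case_data_loss(backup_times, now):
    """Longest gap between consecutive backups, including the gap up to `now`.

    Assumes `backup_times` contains at least one timestamp.
    """
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))

def meets_rpo(backup_times, now, rpo):
    """True if a disaster at any moment would lose no more than `rpo` of data."""
    return worst_case_data_loss(backup_times, now) <= rpo

# Hypothetical sample data: backups every 6 hours, then an 8-hour gap.
backups = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 6, 0),
    datetime(2024, 1, 1, 12, 0),
]
now = datetime(2024, 1, 1, 20, 0)

print(worst_case_data_loss(backups, now))
print(meets_rpo(backups, now, timedelta(hours=6)))
```

With this sample data the last backup is eight hours old, so a six-hour RPO is not met even though the schedule itself runs every six hours.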

2) Recovery Time Objective (RTO)

RTO defines the maximum amount of time you’re allowed to spend recovering. It is the time within which your business must be restored after any possible disaster.

Depending on the solution size and system performance, a typical Oracle system almost effortlessly achieves an RTO of «several hours», while reaching something closer to «seconds» requires considerable investment.
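One way to report on RTO is to compare the durations of past recovery drills against the objective. The sketch below assumes you record how long each drill took; the function name and the drill durations are made up for illustration.

```python
from datetime import timedelta

def rto_compliance(drill_durations, rto):
    """Fraction of recovery drills that finished within the RTO.

    Assumes `drill_durations` is a non-empty list of timedelta values.
    """
    met = sum(1 for duration in drill_durations if duration <= rto)
    return met / len(drill_durations)

# Hypothetical drill results.
drills = [
    timedelta(hours=3),
    timedelta(hours=5),
    timedelta(hours=2, minutes=30),
    timedelta(hours=4),
]

print(rto_compliance(drills, timedelta(hours=4)))
```

A ratio below 1.0 means at least one drill exceeded the objective, which is usually worth investigating before a real incident does the same.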

3) Recovery Window 

How far back in time you might need to retrieve data from. This value can be dictated by regulatory rules, by prior experience, or by prior user requests for data.
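A simple check for this metric is whether your oldest retained backup reaches back at least as far as the recovery window. The sketch below assumes the window is measured back from the current moment; the function name and the dates are hypothetical.

```python
from datetime import datetime, timedelta

def covers_recovery_window(backup_times, now, window):
    """True if the oldest retained backup is at least `window` old.

    Assumes `backup_times` is non-empty.
    """
    return min(backup_times) <= now - window

# Hypothetical retained backups: one per month.
retained = [
    datetime(2024, 1, 1),
    datetime(2024, 2, 1),
    datetime(2024, 3, 1),
]
now = datetime(2024, 3, 15)

print(covers_recovery_window(retained, now, timedelta(days=30)))
print(covers_recovery_window(retained, now, timedelta(days=90)))
```

Here a 30-day window is covered, while a 90-day window is not, because the oldest backup is only about two and a half months old.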

4) System Redundancy 

How many copies of data/software/hardware do you want to have? This can scale from two to infinity, with robustness scaling alongside. However, the more copies, the higher the costs.

A good practice is the popular «3-2-1 rule»: keep three copies of your data (the original plus two backups), on two different types of storage media, with one copy offsite. This is not a silver bullet, but something to start with.
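The 3-2-1 rule is easy to verify against an inventory of your copies. The sketch below uses the common formulation of the rule (at least three copies in total, on at least two different types of storage media, with at least one copy offsite). The `DataCopy` class, its fields, and the sample inventory are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataCopy:
    name: str      # label for the copy, e.g. "production"
    media: str     # type of storage media, e.g. "disk", "tape", "cloud"
    offsite: bool  # True if stored away from the primary site

def satisfies_3_2_1(copies):
    """Check the inventory against the 3-2-1 rule."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite

# Hypothetical inventory.
inventory = [
    DataCopy("production", media="disk", offsite=False),
    DataCopy("local backup", media="tape", offsite=False),
    DataCopy("remote backup", media="cloud", offsite=True),
]

print(satisfies_3_2_1(inventory))
```

Dropping the remote backup from this inventory makes the check fail twice over: only two copies remain, and none of them is offsite.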

