Today, IT DR is more important than ever given the increasing number of disasters we experience each year. In addition, a rapidly changing mix of variables such as increased system complexity, the emergence of the next-generation workforce, climate change, and the need to reach a global customer base all make the CIO’s challenges greater than ever. As we are informed nearly every day, the threat landscape is changing in ways we never expected. However, those very complexities are what IT DR programs strive to address. All of this points to system resiliency. I embraced that philosophy when it was adopted by the business continuity and DR industry several years ago. Today this discipline has multiple designations such as “Business Continuity,” “Disaster Recovery,” “Business Resumption,” and now “Business Resiliency.” I prefer the last designation because this really is a matter of addressing the longevity of any business, and that means resiliency.
The different designations were also adopted to help this industry move away from the traditional “not this year” DR investment mindset. To be successful in this arena, you must prepare for the worst-case scenario and realize it is not “if” but “when” disaster strikes your infrastructure. Today there are thousands of documented cases of companies whose leaders believed catastrophic IT events could not happen to them. As mentioned earlier, in the past this need was given low priority by most organizations, often trumped by other initiatives that could more readily impact the bottom line. That thinking changed dramatically when the government stepped in to address the financial crisis, and even more so when the government intervened in the healthcare industry, mandating that organizations address disaster preparedness. The government has imposed, and continues to impose, large fines on organizations that do not exercise due diligence when it comes to information security and recoverability. So, if you are in either of these two industries and think compliance is expensive, try non-compliance!
I could further elaborate on why it is important to have a solid business resumption program in place, but for now let me share a few tips for those of you trying to get your arms around DR and move quickly:
Three Steps to help establish a Disaster Recovery Planning program:
1. List all your applications and prioritize them by criticality level (Tier 1, 2, 3, etc.)
a. Tier 1: Applications that must be recovered within 24 hours of a disaster.
b. Tier 2: Applications that could be recovered during days two through five.
2. Below is a table of Risk Categories you can structure for your specific business to help you group and discuss risks associated with each application or business function.
3. Consider these additional important items when devising and implementing a disaster recovery/resiliency program:
• Who can declare a disaster?
• Which systems are most critical?
• Where do we recover?
• Who is the DR coordinator?
• What is the systems recovery order?
• Who are the recovery teams?
• How do we reach the recovery team members?
• Where do the recovery plans reside?
• When was the last DR test?
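Step 1 above can be sketched in a few lines of Python. This is only an illustration: the `Application` class, the `max_downtime_hours` field, and the sample inventory are hypothetical, and the Tier 3 rule is an assumption, since only Tier 1 and Tier 2 are defined above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    name: str
    # Hypothetical field: the longest outage the business can tolerate, in hours.
    max_downtime_hours: float


def assign_tier(app: Application) -> int:
    """Map tolerable downtime to a criticality tier."""
    if app.max_downtime_hours <= 24:
        return 1  # must be recovered within 24 hours of a disaster
    if app.max_downtime_hours <= 5 * 24:
        return 2  # can be recovered during days two through five
    return 3  # assumption: everything else lands in a lower-priority tier


def recovery_order(apps: list[Application]) -> list[tuple[int, str]]:
    """Return (tier, name) pairs with the most critical applications first."""
    return sorted((assign_tier(app), app.name) for app in apps)


# Made-up inventory, purely for illustration.
inventory = [
    Application("payroll", 72),
    Application("order-entry", 4),
    Application("intranet-wiki", 240),
    Application("email", 24),
]

for tier, name in recovery_order(inventory):
    print(f"Tier {tier}: {name}")
```

A list like this also gives a first answer to two of the questions above: which systems are most critical, and what the systems recovery order is.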
If you have no disaster recovery (DR) plans in place, and have not performed any DR testing, you are at much greater risk today than in the past, so get on with it. As Benjamin Franklin said, “By failing to plan you are planning to fail.” In the business resiliency arena, many agree that your ability to recover from a disaster is only as good as your last recovery test. When a crisis strikes and a disaster is declared, that is not the time to begin reading your recovery plans. The best approach is to convene your recovery team(s) immediately and repeat your last DR exercise, which ensures that the people involved are experienced. This approach will get your organization on its way to recovery more quickly.
Also, please note that large, 50-plus page DR plans are quite useless since you will not have time to read and comprehend them during the chaos. The more you test, the better prepared you will be to recover from an IT disaster.
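If your ability to recover is only as good as your last test, it helps to know which systems have gone too long without one. The sketch below is one minimal way to flag them. The 180-day threshold, the function name, and the sample dates are all assumptions for illustration; pick an interval that fits your own program.

```python
from datetime import date
from typing import Optional

# Assumption: illustrative threshold only, not a recommendation from any standard.
MAX_TEST_AGE_DAYS = 180


def overdue_tests(last_tested: dict[str, Optional[date]], today: date) -> list[str]:
    """Return systems that were never tested or whose last DR test is too old."""
    overdue = []
    for system, tested_on in last_tested.items():
        if tested_on is None or (today - tested_on).days > MAX_TEST_AGE_DAYS:
            overdue.append(system)
    return sorted(overdue)


# Made-up records: None means no DR test has ever been run for that system.
records = {
    "order-entry": date(2024, 1, 15),
    "payroll": date(2023, 3, 1),
    "email": None,
}

print(overdue_tests(records, today=date(2024, 3, 1)))  # ['email', 'payroll']
```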
Below is a recent chart depicting disaster root causes, although the leading cause hasn’t changed for years: human error.
Today a number of offerings are available to assist you, such as moving DR to the cloud to help keep costs down. These technology options are helpful, but to take advantage of them, look closely at your systems, their criticality level, and how they are constructed.
In most cases, some restructuring or perhaps a complete redesign of your system is the best way to leverage the real power and value of the cloud. In addition, be sure to engineer access to the cloud in a robust and redundant fashion. I would suggest using more than one carrier link to the cloud and perhaps leveraging multiple service providers, with your traffic load-balanced across them. Consider this a great opportunity to make your systems more resilient!
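To make the idea of redundant, load-balanced links concrete, here is a toy sketch of round-robin selection that skips links marked as down. The `LinkBalancer` class and the carrier names are hypothetical. In practice this job is done by your network equipment or your provider's load-balancing service, not by application code like this.

```python
class LinkBalancer:
    """Round-robin over redundant links, skipping any that are marked down."""

    def __init__(self, links):
        if not links:
            raise ValueError("at least one link is required")
        self._links = list(links)
        self._healthy = set(self._links)
        self._next = 0  # index of the next link to try

    def mark_down(self, link):
        self._healthy.discard(link)

    def mark_up(self, link):
        if link in self._links:
            self._healthy.add(link)

    def pick(self):
        """Return the next healthy link, or raise if every link is down."""
        for _ in range(len(self._links)):
            link = self._links[self._next]
            self._next = (self._next + 1) % len(self._links)
            if link in self._healthy:
                return link
        raise RuntimeError("no healthy links available")


# Hypothetical carrier names, for illustration only.
balancer = LinkBalancer(["carrier-a", "carrier-b"])
print(balancer.pick())  # carrier-a
print(balancer.pick())  # carrier-b
balancer.mark_down("carrier-a")
print(balancer.pick())  # carrier-b
```

With a single carrier link, the first `mark_down` would leave nothing to fall back on, which is the argument for more than one link.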
When pursuing this direction, be sure to ask the cloud providers how they handle DR for their cloud infrastructure and how quickly they will have your environment restored should they experience a disaster at the site where your recovery systems reside. Ask them when they performed their last DR test and what the results were.
Also, ask for recorded documentation of their DR testing and do not accept a verbal response to this question; you will want to have the historical record. Whenever possible, ask to be included when the tests are scheduled and performed.
On a final note, a new international standard (ISO 22301) is now available that can help you address preparedness and implement a program throughout your organization. In addition, a companion document (ISO 22313) provides specific implementation guidance on the new ISO 22301 standard. These guidelines can help you establish a resiliency program where DR is solidly and uniquely embedded within the culture of your organization.