When a Data Center Disaster Strikes

Having a disaster recovery plan to deal with an unexpected IT infrastructure calamity is a significant focus in the data center industry, but what constitutes a data center disaster? Which major organizations have suffered these disasters? What is the potential for financial damage if a data center disaster occurs? How do we mitigate the damage when they do occur?

We asked these questions and more as part of our series of interviews on data center topics with Jeff Gilmer of Excipio Consulting. Jeff also helps define the differences between disaster recovery, business continuity, and emergency management. As Jeff shares his experience in preparing dozens of enterprises and government agencies for data center disasters, you’ll gain a better understanding of what preparations are required to enable your organization to bounce back quickly when disaster strikes.

The audio of this interview can be listened to in the player above, and the full transcript of the discussion is below.


Kevin O’Neill, Data Center Spotlight: This is Kevin O’Neill with Data Center Spotlight, and we are here today with Jeff Gilmer, senior partner at Excipio Consulting. This is part of our continuing series of discussions with Jeff about various issues in the data center, cloud computing, and overall IT infrastructure world. Jeff, good to be with you again today.

Jeff Gilmer, Excipio Consulting: Yes, thank you, Kevin, good morning.

Data Center Spotlight: Jeff, today’s topic: will your data center emergency go as planned? Now, Jeff, we talk about disaster recovery, which is DR, quite a bit in the data center world. Now, I’ve got a simple question for you. What constitutes a data center disaster?

Jeff Gilmer: Well, that’s always an interesting question, Kevin. We get asked that frequently, and there’s a wide range of things that make up a disaster. People always think of, oh, it’s a major issue where we’ve got a hurricane, or we’ve got an earthquake, or we’ve got a tornado, or a fire, or something that completely destroys the data center. But in reality, the most common disasters within the data center are really more single applications, or single services that have been lost, whether it’s something through cyberattacks that we see today, whether it’s through equipment failures, or whether it’s just a pure lack of ability to recover: they do a patch or an upgrade, and they can’t go back to a version, or they can’t reinstitute what they had prior. And actually the most common one is human error, making an error within the data center. Patching, upgrading, making a migration, updating the network connectivity, any of those issues can cause a disaster within the data center itself.

Data Center Spotlight: Well, we know a lot of people do have problems in their data center, and I guess some level of disaster is tough to avoid in the data center world, and I know you’re not someone who’s looking to kick a company or organization when they’re down, Jeff, but do you have any recent examples of the impact to an organization from a data center disaster?

Jeff Gilmer: Sure, there’s a lot of them that have been publicly produced for organizations, people like Google, Amazon, NASDAQ, Microsoft, they need to publish these, being a public company, if they have an issue. So, some examples, one organization was down for five minutes, and all their services went down, and in that five minutes of time, they had over $500,000 in claims and damages, just from a five-minute loss, is one example. Then you take and extrapolate that out, another company had a situation where they could not access their data center, their clients could not access their data center for just under an hour, actually, I think it was 49 or 50 minutes’ time, and they lost over $5 million combined between costs, between claims of some of their clients who were losing revenue. So, it can be significant, I mean, just a few minutes can accumulate to a significant amount of time with any of these particular entities.

Public sector is another one, we’ve worked with some public sector clients, and there are times where if their data center is down, and they can’t publish certain information by a certain time period, they are fined by the federal government up to $1 million per day based on the lack of information. So, it’s not just private corporations, but multiple entities could have a huge impact by a disaster today.
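As a rough illustration of how the two private-sector figures Jeff cites scale, the following Python sketch converts each outage into a per-minute cost. The dollar amounts are the lower bounds mentioned in the interview ("over $500,000", "over $5 million"), the 50-minute duration is an assumption taken from his "49 or 50 minutes", and the function name is hypothetical.

```python
def cost_per_minute(total_loss_usd: float, outage_minutes: float) -> float:
    """Average loss per minute of downtime (simple division, no weighting)."""
    if outage_minutes <= 0:
        raise ValueError("outage_minutes must be positive")
    return total_loss_usd / outage_minutes


# Figures quoted in the interview, treated as lower bounds.
five_minute_outage = cost_per_minute(500_000, 5)
# Assumption: 50 minutes, from "49 or 50 minutes".
near_hour_outage = cost_per_minute(5_000_000, 50)

print(f"Five-minute outage: ${five_minute_outage:,.0f} per minute")
print(f"Near-hour outage:   ${near_hour_outage:,.0f} per minute")
```

Under these assumptions both incidents work out to at least roughly $100,000 per minute, which is consistent with Jeff's point that a few minutes of downtime can add up quickly.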

Data Center Spotlight: And it seems like disasters, today, Jeff, are more damaging than before. We have such an assumption of availability. We’re all so dependent upon these services, and these applications, and it seems like it used to be, we would accept a certain level of downtime, and I don’t think that’s the case anymore.

Jeff Gilmer: Well, you’re exactly correct. I mean, how many people today have a smartphone, and they expect that smartphone to be able to access their information on the internet, or their email, at any given time, 24/7/365, and if that goes down, that’s a critical impact to people’s lives, whether it’s an emergency situation or not, and that’s damaging to any type of organization.

Data Center Spotlight: Yeah, absolutely. So, Jeff, let’s get to some of the terminology, and I know the terms most people are familiar with in the data center world are disaster recovery, business continuity, and emergency management. Can you outline the differences between those terms, for example, how a disaster recovery plan differs from a business continuity plan, and how a business continuity plan differs from an emergency management plan?

Jeff Gilmer: Yeah, sure. Let’s take them one at a time, so we can be clear, because they do impact each other to a certain degree, but they’re clearly very different types of plans which you need to have in place in an organization. So, an emergency management plan is a plan that you put in place in the case that you have an emergency, where you’re really dealing with the safety of your employees, or your assets, or your facilities. It’s, what do I do if an emergency occurs? You can think of it this way, and the easiest way for me to explain to people is, you’re just sitting there in 5th grade with your teacher, and all of a sudden, that fire alarm goes off in that school. Well, the emergency management plan is, those teachers take their students to exit out which doors, to what locations, to be accounted for. That’s what an emergency management plan is. It’s to make sure that you safely get away from, or exit from that particular emergency. It could be weather-related, whether a hurricane or a tornado, it could be a fire. It could be, unfortunately, terrorist attack type things that happen today, but it is a plan to really protect the people, protect the assets, and protect the facilities for your particular organization.

Data Center Spotlight: Okay, so that’s an emergency management plan. What’s a business continuity plan, Jeff?

Jeff Gilmer: So, a business continuity plan, in reality, an organization should have multiple business continuity plans, Kevin, and those business continuity plans roll up into the company’s business continuance plan, and I know we’re getting a little play on words here, but in the disaster recovery, and the recoverability world, those are two different things. But basically, a business continuance plan is, in the case of an emergency, how do we continue our operations, and it’s from a business perspective, so that’s where you’ve heard of things such as the calling tree. We’ve got an emergency, there’s a calling tree of who we call. What are the roles of those people that we’re calling in their part of the organization? What are their responsibilities? What are the actions that they need to put in place? And it can be everything from, how do we deal with the public exposure to how do we continue to function internally? How do we continue to provide what we need to provide to our customers and our clients? All the way through, but it’s really defining it from the business aspects of it. Now, IT is a portion of that, but IT is going to be a subset of the business continuance plan for the organization. What IT needs to do is to support the roles or responsibilities and actions of the company.

Data Center Spotlight: Okay, very well. All right, so that’s a business continuation plan. Could you define for us, Jeff, disaster recovery, and how disaster recovery differs from emergency management and business continuation?

Jeff Gilmer: Right, so if we go back and just summarize, emergency management, a disaster is occurring, what do we do? Business continuance, we’ve had a disaster, how do we continue operations? Disaster recovery, the disaster is now full-blown, we’ve had an impact, how do we recover and get the organization back up and functioning? That’s really the difference between the three, and I’ll focus a little bit more on the disaster recovery around the data center, since that’s our focus with Data Center Spotlight, is to talk about the data center.

So, a disaster recovery plan around the data center includes the policies, the procedures, and the actions to recover the organization, which then flow down to the data center itself. Your first goal is to minimize the effects of that disaster, and then your second goal is to recover and resume all of your functions, starting with your mission critical functions, and eventually also working through all the functions that are critical or supported, technically, by the data center. So, it may be as simple as, someone’s out there with a backhoe and they cut your fiber line, and you have to restore your application, and it may not sound like it’s a big deal, but what if you’re a hospital, and that’s the network connectivity that goes to your ambulances to direct them, when they’re coming to an emergency? Or what if you’re police and fire, and that’s your 911 call line that’s now been broken? Or other issues along those lines related to compliance or regulatory: it’s the end of the day and you’ve got to file all your stock exchanges from an insurance or financial company by 5 PM, or you’re going to risk some significant fines.

Those are the type of common-day disasters that happen out there, and your IT plan has to understand where all that goes, and I won’t get into all the specifics, but what really happens is, you have to understand your services, you have to have them mapped back to your applications, you have to have that primary application mapped to all its dependent applications, and databases, and other things to bring up. You have to have that mapped to your infrastructure, which is to your servers, to your storage, and ultimately, to your network connectivity. If all that is mapped together, and you define that as a particular group within your disaster recovery plan, once you have that disaster, you can then recover and bring that service back, and get functional, whether it’s one service, or maybe an entire facility.
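The mapping Jeff describes (service to primary application, application to its dependencies, and each of those to infrastructure) can be sketched as a small dependency graph. The Python below is a minimal illustration, not a description of any Excipio tool; every service, application, and infrastructure name in it is hypothetical.

```python
# Hypothetical maps: service -> primary application,
# application -> applications/databases it depends on,
# application -> servers, storage, and network links it runs on.
SERVICE_TO_APP = {"ambulance-dispatch": "dispatch-app"}

APP_DEPENDENCIES = {
    "dispatch-app": ["gis-db", "auth-service"],
    "auth-service": ["directory-db"],
    "gis-db": [],
    "directory-db": [],
}

APP_INFRASTRUCTURE = {
    "dispatch-app": ["server-01", "wan-link-1"],
    "auth-service": ["server-02"],
    "gis-db": ["server-03", "storage-a"],
    "directory-db": ["server-02", "storage-a"],
}


def recovery_order(service: str) -> list[str]:
    """Applications for a service, ordered so dependencies come up first."""
    ordered: list[str] = []
    seen: set[str] = set()

    def visit(app: str) -> None:
        if app in seen:
            return
        seen.add(app)
        for dependency in APP_DEPENDENCIES.get(app, []):
            visit(dependency)
        ordered.append(app)  # appended only after all its dependencies

    visit(SERVICE_TO_APP[service])
    return ordered


def infrastructure_for(service: str) -> list[str]:
    """Every server, storage unit, and network link the service relies on."""
    items = set()
    for app in recovery_order(service):
        items.update(APP_INFRASTRUCTURE.get(app, []))
    return sorted(items)


print(recovery_order("ambulance-dispatch"))
print(infrastructure_for("ambulance-dispatch"))
```

Grouping a service with its ordered applications and the infrastructure beneath them gives one recovery group of the kind Jeff mentions, which can then be restored on its own or alongside others.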

Data Center Spotlight: Well, Jeff, we’re talking about all these issues, and they’re all important elements of how an organization mitigates risk. You have a lot of experience dealing with large organizations, companies, government agencies. Are you finding that these companies and government agencies are a little better prepared for issues, at least within the data center, than they were a few years ago, or are there still a lot of organizations out there that have a lot of blind spots, and weaknesses, when it comes to their overall disaster recovery, emergency management, and business continuation plans?

Jeff Gilmer: I would say, from Excipio’s experience in dealing with clients today, what we’re finding is the production side of the data center has become pretty efficient and pretty optimized. A lot of people have moved as far as they can with virtualization. They understand their storage, they understand their network connectivity, they may have redundant network providers. The part that now they’re starting to focus on is the ability to recover. While they have plans and other things in place, the true ability to implement those plans, to truly provide functionality, and the ability to recover is not there. The biggest issue is, they haven’t completed what we call a criticality analysis, meaning taking every service that that company delivers, and ranking that service from 1 to 100, and identifying all the applications and infrastructure that support that, and that is critical as your first step before you get into a disaster recovery plan. While we probably don’t have time to get into that today, Kevin, I think that would be a great subject for us to maybe continue this conversation at another date, and we can talk about the steps that you’d really go through to put in place a functional disaster recovery plan for your data center.
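A criticality analysis of the kind Jeff outlines could start as a simple ranked list. The Python sketch below assumes that a higher score on the 1 to 100 scale means a more critical service; the interview does not say which direction the scale runs, and the service names and scores are hypothetical.

```python
def rank_by_criticality(scores: dict[str, int]) -> list[tuple[str, int]]:
    """Return (service, score) pairs, most critical first.

    Assumption: scores run from 1 to 100 and higher means more critical.
    """
    for service, score in scores.items():
        if not 1 <= score <= 100:
            raise ValueError(f"{service}: score {score} is outside 1-100")
    # Sort by score descending, then by name so ties are ordered predictably.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical services and scores, for illustration only.
example_scores = {
    "911 call line": 100,
    "regulatory filing": 85,
    "email": 60,
    "intranet wiki": 20,
}

for position, (service, score) in enumerate(rank_by_criticality(example_scores), start=1):
    print(f"{position}. {service} (criticality {score})")
```

The ranked list then determines which services have their applications and infrastructure identified and recovered first.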

Data Center Spotlight: Well, that sounds like a great idea. It sounds like today is leading us to the next topic in our series perhaps, Jeff. Well, Jeff, I think after our conversation today, we have a little bit better understanding of some of these terms in the data center world. How can someone reach out to you for more detail on these issues, or if not directly to you, how can someone learn more about these issues, someone who wants to develop a better understanding of some of these big picture issues, and issues of disaster recovery, and just overall risk management within the data center?

Jeff Gilmer: Sure, so obviously, if you can go to our website, Excipio.net, you’re going to find different solutions surrounding the data center, and other areas, and you’ll find different components related to disaster recovery and business continuance and emergency plans within those pieces of documentation. You also can find my contact information, you can contact me directly if you have any questions, or if you’d like any industry comparative information, we have that available, and probably a third place is in early October here, the 3rd, 4th, and 5th of October, I’m actually going to be speaking with other people at the Critical Facilities Summit, specifically in the data center disaster recovery series. So, I’ll be there, and I’ll be happy to meet with you in person, or just attend any of the sessions that they have related to disaster recovery.

Data Center Spotlight: All right, terrific. Jeff, as always, I appreciate your time, some illuminating information as always. Now, I would encourage anyone who’s been listening to these, Jeff, you just gave them your email address, they can also reach out to me at [email protected] with any follow-up questions for you, if there are any topics they’d like us to address, and again, if you could give your email address again, Jeff.

Jeff Gilmer: Yes, so my email is [email protected]

Data Center Spotlight: Jeff, again, thank you so much for your time and your expertise today. I look forward to our next visit.

Jeff Gilmer: Thank you, Kevin.