The State of Data Center Outages So Far in 2023
For any data center, an outage is the worst-case scenario, one that companies try to avoid at all costs. Unfortunately, outages still occur, even with IT spending growing consistently and enterprises investing heavily in keeping their data centers up and running.
The Uptime Institute’s Annual Outages Analysis 2023 report provides a glance into the state of outages. It provides insights that can help organizations make their mission-critical infrastructure more resilient. The report provides data on outage frequency, severity, cost, and causes.
In this article, we will provide the following:
- The key findings of the Uptime Institute’s outage report.
- An outline of how the report suggests enterprises can take the right steps to prevent outages.
- Failing that, an explanation of how they can reduce their frequency and cost.
Key Takeaways From Uptime Institute’s Report on Outages
Before we discuss the main takeaways from the report, it’s essential to examine the reliability of the data on outages. In its report, Uptime Institute pointed out that data on outages is often unreliable, which it attributed to a lack of transparency and of reporting standards.
For this report, Uptime Institute tapped into its own Abnormal Incident Report (AIR) database, the most reliable of its sources. These records can be trusted because the incident reports come directly from data centers.
Furthermore, they’ve used public data, such as press releases on incidents or outage trackers. They’ve also conducted surveys with different professionals with knowledge of the matter.
Assuming they’ve managed to compile data that provides an authentic look into infrastructure outages, here are the key takeaways from the report.
Outage Rates Are Reducing
Over the years, outages have reduced. Although the aggregate numbers from the last few years show an upward trend of such incidents, Uptime Institute claims the overall outage numbers are falling.
While the number of outage incidents seems to be rising yearly, the number of data centers is also increasing. When adjusted for the expansion of the data center industry, the outage numbers show a slow decline year-on-year.
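To make the adjustment concrete, here is a minimal sketch of how a raw outage count can rise while the rate per facility falls. The yearly figures and the function name outages_per_100_sites are hypothetical, chosen only for illustration; they are not taken from the Uptime Institute report.

```python
def outages_per_100_sites(outages: int, sites: int) -> float:
    """Normalize a raw outage count by the number of data centers."""
    if sites <= 0:
        raise ValueError("sites must be positive")
    return 100 * outages / sites


# Hypothetical figures (NOT from the report): year -> (outages, data centers)
yearly = {
    2020: (400, 1000),
    2021: (430, 1150),
    2022: (450, 1300),
}

# Rate per 100 data centers for each year
rates = {year: outages_per_100_sites(o, s) for year, (o, s) in yearly.items()}

for year, rate in rates.items():
    outages, sites = yearly[year]
    print(f"{year}: {outages} outages across {sites} sites, {rate:.1f} per 100 sites")
```

In this made-up example, the raw count climbs every year, yet the normalized rate drops because the number of facilities grows faster than the number of incidents.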
It has surveyed four data centers since 2020. In 2022, 60 percent of respondents reported having an outage, compared to 69 percent in 2021 and 78 percent in 2020. The sample size for this particular survey is small, but it still indicates a slowdown in the frequency of such incidents, which is a good sign.
Outage Cost Is Rising
The frequency may have reduced, but the cost of an outage is on the rise.
Uptime Institute’s survey found that outages that cost $100,000 or more are increasing. In 2022, outages with costs under $100,000 decreased to 39 percent from 60 percent in 2019. That’s not all: 25 percent of respondents reported outages that cost over a whopping $1 million.
As the digital footprint of the world is growing, so is the dependency on infrastructure. That’s why outages are costing more than they used to. An outage can be even more devastating for larger companies with different subsidiaries interconnected through the same infrastructure.
Another factor contributing to the rise in recovery costs is the price hikes for equipment and services due to inflation. It’s becoming more and more expensive to fix and replace equipment.
Outage Severity Is Decreasing
Another positive finding from the report is that the severity of outages seems to be reducing. Very few data centers that experienced outages in the last three years had Level 4 or 5 outages. Level 4 and 5 outages are severe and debilitating, involving significant disruptions in operations, financial losses of millions, non-compliance fines, and a mark on the company’s reputation.
In 2022, Level 4 and 5 outages dropped to 14 percent from the 20 percent average in prior years.
That is primarily due to the shift in the architecture of data centers and other digital infrastructures, where a problem in one area doesn’t necessarily bring the entire system down. While systems remain interconnected and, at times, dependent on one another, IT experts at large companies have figured out how to isolate an incident so it doesn’t cause an outage across the organization.
Increased redundancy is also helpful. Power backup keeps the servers running and data flowing even during an outage.
Human Error Is the Top Contributor to Outages
Unsurprisingly, human error tops the list of common causes behind outages. In most instances, it’s not the sole reason for an outage but a contributor. The data suggests that human error is involved in up to 80 percent of all outages.
Common causes behind such outages include poor procedures, not following procedures, installation problems, lack of staffing, poor maintenance, and design flaws.
The same holds for data breaches, where human error often creates the vulnerability that attackers exploit. From misconfiguration to weak passwords, there are many ways negligence by personnel leaves systems open to attack by malicious parties.
That said, this is a challenge that organizations can address with training. Ensuring all staff receive training on policies and procedures is necessary to prevent accidental outages caused by human error.
Third-Party Providers Face Most Prominent Outage Incidents
Many companies work with third-party providers for different services. And as these providers work with different enterprise customers, they are even more exposed to threats. Uptime Institute found that third-party providers such as cloud or services providers accounted for 66 percent of all outages since 2016.
The number rises yearly as outages impact cloud and hosting companies the most. In 2022, it stands at a whopping 81 percent.
Outages experienced by third-party providers impact many companies that rely on their services and infrastructure.
That’s one of the reasons why there’s been a renewed interest in on-premises infrastructure.
Network Complexities Are Another Notable Cause
Uptime Institute’s survey on network outages revealed that 44 percent of companies experienced outages because of a network issue. Among the various network issues, configuration/change management failures, third-party provider failures, and hardware failures were the top three causes.
One could blame the growing complexity of networks. While networking technology is constantly changing for the better, the constant changes in equipment, architecture, and configuration are only increasing complexity. So, it becomes harder to figure out the cause and resolve it in time when something goes wrong.
Even minor errors and faults can have a domino effect across the whole network, resulting in unexpected outages.
Newer networking technologies, such as network observability and monitoring tools, are addressing some of these concerns. Similarly, equipment maintenance is vital to ensuring devices are in good health and refreshed on time.
The Bottom Line
Outages are a serious issue for data centers, as they disrupt operations and leave an indelible mark on reputation, especially if it’s a public outage that makes the news. Besides revenue losses, data centers may also be liable for compliance issues.
The Uptime Institute’s report shows that the number of outages and their severity are decreasing. At the same time, the cost of an outage is higher than ever. The decrease in such incidents shows that enterprises’ efforts to make their data centers more resilient are working. However, more needs to be done to keep that trend going.
Maintenance is critical to preventing network outages. As many data centers rely on legacy equipment, it’s essential to ensure such devices are properly maintained. One can achieve this through third-party maintenance (TPM) providers like OneCall. Learn why OneCall is trusted by so many enterprises!