You may have heard of the bathtub curve, a failure-rate graph often used to describe the reliability of products over their lifetime. It has been touted as a dependable and accurate model for many kinds of products, including hardware and other technology.
But the bathtub curve may not accurately describe hardware, particularly enterprise equipment such as servers and storage. That’s not an assumption: there’s data and expert opinion to back it up.
The bathtub curve is a graph used in reliability engineering to show how a product’s failure rate changes over its lifetime.
The curve is shaped like a bathtub with both ends raised, hence the name. It suggests that reliability is lower when a product is new to the market, because manufacturing defects surface early and are then discovered and fixed.
In the middle of the product’s life, reliability goes up, meaning the failure rate goes down. As the product ages and nears the end of its lifespan, reliability drops again and the failure rate rises.
To reflect this theory, the graph is divided into three sections: the infant mortality section, the normal life section, and the wear-out section. The first and third sections indicate higher failure rates, whereas the second (middle) section indicates lower failure rates.
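In reliability engineering, a bathtub curve is often modeled as the sum of three failure modes, each described by a Weibull hazard function h(t) = (β/η)(t/η)^(β−1). A shape β below 1 gives a falling failure rate (infant mortality), β = 1 gives a constant one (normal life), and β above 1 gives a rising one (wear-out). The Python sketch below assumes that model; the parameter values are invented purely to draw the shape and don’t describe any real product.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate h(t) of a Weibull distribution, for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical (shape, scale) pairs, time in years, chosen only to draw a bathtub.
INFANT_MORTALITY = (0.5, 20.0)  # shape < 1: failure rate falls as early defects are weeded out
RANDOM_FAILURES = (1.0, 25.0)   # shape = 1: constant failure rate during normal life
WEAR_OUT = (5.0, 12.0)          # shape > 1: failure rate climbs as components age


def bathtub_hazard(t: float) -> float:
    """Total failure rate at age t: the sum of three independent failure modes."""
    modes = (INFANT_MORTALITY, RANDOM_FAILURES, WEAR_OUT)
    return sum(weibull_hazard(t, shape, scale) for shape, scale in modes)


def lowest_risk_age(start: float = 0.1, stop: float = 15.0, step: float = 0.01) -> float:
    """Grid-search the age (the bottom of the tub) where the combined rate is lowest."""
    n = int(round((stop - start) / step))
    ages = [start + i * step for i in range(n + 1)]
    return min(ages, key=bathtub_hazard)


if __name__ == "__main__":
    for age in (0.25, 1, 2, 4, 6, 8, 10):
        print(f"age {age:>5} y: failure rate {bathtub_hazard(age):.3f} per year")
    print(f"lowest failure rate at about {lowest_risk_age():.1f} years")
```

With these made-up parameters, the combined failure rate bottoms out at a little over four years; changing the scale of the wear-out mode moves that bottom earlier or later.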
The bathtub curve is built from product failure reports plotted against time in service. Typically, you need a lot of data collected over a long period to see whether a product exhibits a bathtub curve. Other factors also play a role, such as how the product is used or whether units are retired early.
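To see how failure reports and time turn into a curve, here is a minimal sketch of the life-table (actuarial) method, one common way to estimate a yearly failure rate from field data. It assumes you know how many units were in service at the start and how many failed or were retired early each year; units retired early are counted as half a unit at risk in the year they leave. The `YearRecord` type and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class YearRecord:
    """Hypothetical yearly field report for one cohort of identical units."""
    failures: int  # units that failed during the year
    retired: int   # units withdrawn early (still working) during the year


def annual_failure_rates(initial_units: int, records: list[YearRecord]) -> list[float]:
    """Actuarial (life-table) estimate of the failure rate in each year of service.

    Units retired early are assumed to leave halfway through the year, so they
    count as half a unit "at risk" for that year.
    """
    at_risk = initial_units
    rates = []
    for rec in records:
        effective_at_risk = at_risk - rec.retired / 2
        if effective_at_risk <= 0:
            break  # no units left to observe
        rates.append(rec.failures / effective_at_risk)
        at_risk -= rec.failures + rec.retired
    return rates


if __name__ == "__main__":
    # Invented numbers for a cohort of 1,000 units; not real field data.
    history = [YearRecord(50, 0), YearRecord(19, 100), YearRecord(41, 22)]
    for year, rate in enumerate(annual_failure_rates(1000, history), start=1):
        print(f"year {year}: {rate:.1%}")
```

For the invented cohort, the estimates come out at 5.0, 2.1, and 5.0 percent: high, then low, then high again. Confirming a real bathtub pattern would take far more units and years than this.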
The bathtub curve, in general, can be useful for understanding the performance and reliability of different products. However, not all products exhibit the failure rates that form a typical bathtub shape.
It’s not uncommon for both hardware and software products to show faults when first introduced to the market. A recent example is the iPhone 15, which was reportedly overheating for many users. However, Apple quickly resolved the issue with a software update.
Such early-stage bugs in equipment and gadgets are often fixed with firmware or software updates, so they don’t affect the device’s overall reliability much. Devices also rarely fail completely in the early stages of their lifespan.
As for data center hardware, there’s little data to suggest that such equipment follows the bathtub curve. The bathtub curve would also suggest that servers and storage should be replaced as they enter the wear-out stage. That’s not always the case either, even after support from the manufacturer ends.
And if the thriving pre-owned (legacy) server market is anything to go by, this type of equipment can last as long as a decade.
So the bathtub curve doesn’t accurately represent the reliability or longevity of data center hardware. There are exceptions, of course, but assuming that all equipment shows higher failure rates both early and late in its life would be misleading.
Another issue with the bathtub curve is that it only projects the equipment’s failure rate; it doesn’t determine the actual lifespan. A piece of hardware may become more susceptible to failure as it ages, but that doesn’t mean it has reached the end of its life.
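The difference between failure rate and lifespan is easy to show with a little arithmetic. In this sketch, each yearly rate is the chance that a unit still running at the start of that year fails during it, so the share of a cohort still in service is the running product of (1 − rate). The rates are hypothetical and climb after year three, as they would in a wear-out phase.

```python
def survival_curve(annual_failure_rates: list[float]) -> list[float]:
    """Fraction of a cohort still running at the end of each year.

    Each rate is the probability that a unit which survived to the start of
    that year fails during it, so survival multiplies year by year.
    """
    surviving = 1.0
    curve = []
    for rate in annual_failure_rates:
        surviving *= 1.0 - rate
        curve.append(surviving)
    return curve


if __name__ == "__main__":
    # Hypothetical rates that climb after year 3; not measured data.
    rates = [0.05, 0.04, 0.04, 0.05, 0.07, 0.10, 0.14]
    for year, share in enumerate(survival_curve(rates), start=1):
        print(f"end of year {year}: {share:.1%} still in service")
```

With these rates, about 60 percent of the cohort is still running at the end of year seven, even though year seven has the highest failure rate of the period.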
As we’ve often seen, even the end-of-life (EOL) and end-of-service life (EOSL) milestones are not truly indicative of a device’s lifespan. Critical equipment like servers and even non-critical appliances like switches may last beyond EOSL with upkeep.
While original equipment manufacturers (OEMs) put the service life of equipment at around three to seven years, it can last up to 10 years.
That said, it’s important to understand that how equipment is used and maintained over the years strongly affects its reliability and failure rate. So not every device will last a long time, especially one that has needed repairs in the past.
According to the Statista Research Department, a server’s failure rate rises to 18 percent by the time it is seven years old. While this rate is high, it doesn’t necessarily mean the end of the server.
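For a sense of scale, assume the 18 percent figure means that roughly 18 out of every 100 seven-year-old servers fail within a year (the exact definition behind the statistic isn’t stated here). This minimal sketch works out the expected result for a hypothetical fleet of 250 such servers.

```python
def expected_fleet_outcome(fleet_size: int, annual_failure_rate: float) -> tuple[float, float]:
    """Expected number of failures and survivors over one year for identical units."""
    if not 0.0 <= annual_failure_rate <= 1.0:
        raise ValueError("annual_failure_rate must be between 0 and 1")
    failures = fleet_size * annual_failure_rate
    return failures, fleet_size - failures


if __name__ == "__main__":
    # Hypothetical fleet of 250 seven-year-old servers at the cited 18 percent rate.
    failed, running = expected_fleet_outcome(250, 0.18)
    print(f"expected failures: {failed:.0f}, still running: {running:.0f}")
```

Under that assumption, about 45 of the 250 servers would be expected to fail that year, while about 205 would keep running.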
Whether or not the bathtub curve accurately represents the reliability and longevity of servers, storage, and networking equipment, every device has a finite life. However, as budgets shrink and CapEx is cut, enterprises are under pressure to keep running the equipment they already have.
Even if you’re not constrained by budget, in some cases, such as when other systems depend heavily on a device, you may want to extend its life beyond EOSL.
High-ticket assets such as servers can last a long time. Here are some best practices for making sure your data center equipment does:
Modern IT equipment is resilient enough to last a long time, especially when well maintained, and the bathtub curve isn’t always a reliable guide to when it will fail. While OEMs may have you believe your refresh cycle should be as short as three years, that’s not always necessary. More importantly, it’s not what many enterprises can afford.
Third-party maintenance (TPM) is a reliable way to extend the life of your equipment while also making sure it doesn’t fail on you. It can reduce CapEx by delaying the refresh cycle, or help you avoid downtime when you depend heavily on legacy equipment.
If TPM were a house, OneScan would be the foundation. We do the complex thinking for you when it comes to setting the foundation for an optimized maintenance strategy, saving you valuable time.
Learn how OneCall can help your equipment avoid the traditional bathtub curve failure rate!