
How Accurate the Reliability Bathtub Curve Is for Hardware

You may have heard of the bathtub curve, a failure-rate graph often used to describe how reliable a product is over its lifetime. It has been touted as a dependable and accurate model for many kinds of products, including hardware and other technology.

But in fact, the bathtub curve may be inaccurate regarding hardware, particularly enterprise equipment such as servers and storage. That’s not an assumption – there’s data and expert opinion to back it up. 

In this article, we will explore the following:

  • The bathtub curve, with an example and its link with data center hardware.
  • Whether it is applicable to critical equipment such as servers.
  • Whether it correctly reflects the lifespan of the equipment.
  • Ultimately, how it can impact your capital expenditure (CapEx).

 

What Is the Bathtub Curve?

The bathtub curve is a graph used in reliability engineering to show how a product's failure rate changes over its lifetime.

Bathtub curve depicting the hardware and software lifetimes of HPC systems

(Data: source)

The curve is shaped like a bathtub, with both ends raised, hence the name. It suggests that a product is less reliable when it is new to the market, while manufacturing defects are still being discovered and fixed.

In the middle of the product's life, reliability goes up; in other words, the failure rate goes down. As the product ages and nears the end of its lifespan, reliability drops again and the failure rate climbs.

To reflect this theory, the graph is divided into three sections: the infant mortality section, the normal life section, and the wear-out section. The first and third sections indicate higher failure rates, whereas the second (middle) section indicates lower failure rates. 
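The three sections can be illustrated with a minimal sketch. One common textbook way to produce a bathtub shape, offered here as an assumption rather than the method behind the figure above, is to add three Weibull hazard (failure rate) terms: one that falls with age, one that stays constant, and one that rises with age. The shape and scale values are made up for illustration and are not measured hardware data.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull distribution at age t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Sum of three hazard terms, one per section of the curve.

    All parameters are illustrative assumptions, with t in years.
    """
    infant_mortality = weibull_hazard(t, shape=0.5, scale=20.0)  # falls with age
    normal_life = weibull_hazard(t, shape=1.0, scale=50.0)       # constant
    wear_out = weibull_hazard(t, shape=5.0, scale=10.0)          # rises with age
    return infant_mortality + normal_life + wear_out


if __name__ == "__main__":
    for age in (0.1, 1, 2, 4, 6, 8, 10):
        print(f"age {age:>4} years: failure rate {bathtub_hazard(age):.3f}")
```

With these assumed parameters, the printed rate is high at the start, lowest in the middle years, and high again at year ten. A product whose real data lacks the first or third term would not trace a bathtub at all.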

Plotting a bathtub curve requires product failure reports collected over time. Typically, you need a lot of data over a long period to see whether a product exhibits a bathtub curve. Other factors, such as how the product is used or whether it is retired early, also affect the result.
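As a rough sketch of how failure reports turn into a curve, the function below counts failures per year of age and divides by the number of units still in service at the start of that year. The record format, the function name `yearly_failure_rates`, and the sample data are all hypothetical. Units retired in working order are counted as in service for the whole year in which they leave, which is a simplification.

```python
from collections import Counter


def yearly_failure_rates(records, max_years):
    """Estimate the failure rate for each year of age.

    records: list of (years_in_service, failed) tuples. failed=False means
    the unit was retired while still working (early retirement).
    Returns a list where index i is the rate during year i of service,
    or None if no units were left in service.
    """
    failures = Counter()
    exits = Counter()
    for years, failed in records:
        bucket = int(years)  # year of age in which the unit left service
        exits[bucket] += 1
        if failed:
            failures[bucket] += 1

    at_risk = len(records)
    rates = []
    for year in range(max_years):
        rates.append(failures[year] / at_risk if at_risk else None)
        at_risk -= exits[year]  # failed and retired units both leave the pool
    return rates


# Hypothetical fleet of ten units, for illustration only.
sample = [
    (0.2, True), (0.5, True), (2.5, False), (3.1, False), (4.0, True),
    (5.5, False), (6.5, True), (6.9, True), (7.0, False), (7.2, False),
]

if __name__ == "__main__":
    for year, rate in enumerate(yearly_failure_rates(sample, 8)):
        print(year, rate)
```

With only ten units, one failure moves the rate by ten points or more, which is why a large data set collected over many years is needed before the shape of the curve can be trusted.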

Does It Apply to Hardware? 

The bathtub curve, in general, can be useful for understanding the performance and reliability of different products. However, not all products exhibit the failure rates that form a typical bathtub shape. 

It’s not uncommon for both hardware and software products to show faults when first introduced to the market. A recent example is the iPhone 15, which was reportedly overheating for many users. However, Apple quickly resolved the issue with a software update. 

These early-stage bugs in equipment and gadgets are often fixed with firmware updates, so they don't have much impact on the overall reliability of the device. Devices also rarely fail completely in the early stages of their lifespan.

As for hardware for data centers, there’s not much data available to suggest that such equipment follows the bathtub curve. Also, the bathtub curve would suggest that servers and storage should be replaced as they enter their wear-out stage. That’s not always the case, either, even when the support from the manufacturer ends. 

Also, if the thriving pre-owned (legacy) server market is anything to go by, this type of equipment can easily last up to a decade. 

So, the bathtub curve doesn’t necessarily represent the reliability or longevity of data center hardware. Of course, some equipment does follow it, but assuming that all equipment shows higher failure rates early in its life and again late in its life would be misleading.

How Long Can Data Center Hardware Last? 

Another issue with the bathtub curve is that it only projects the failure rate of the equipment and doesn’t determine its actual lifespan. Hardware may become more susceptible to failure as it ages, but that doesn’t mean it has reached the end of its lifespan.

As we’ve often seen, even the end-of-life (EOL) and end-of-service life (EOSL) milestones are not truly indicative of a device’s lifespan. Critical equipment like servers and even non-critical appliances like switches may last beyond EOSL with upkeep. 

While original equipment manufacturers (OEMs) put the service life of equipment at around three to seven years, it can last up to 10 years.

That said, it’s important to understand that how equipment is used and maintained over the years strongly affects its reliability and failure rate. Not every device will last that long, especially if it has been repaired in the past.

According to the Statista Research Department, a server’s failure rate rises to 18 percent by the time it is seven years old. While this rate is high, it doesn’t necessarily indicate the demise of the server.

Frequency of server failure based on the age of the server

(Data: source)
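To put the 18 percent figure in perspective, here is a small arithmetic sketch. It assumes the figure is the chance that a given seven-year-old server has a failure within one year and that servers fail independently; the fleet size of 50 is hypothetical.

```python
def expected_failures(fleet_size, annual_failure_rate):
    """Average number of servers expected to have a failure in one year."""
    return fleet_size * annual_failure_rate


def probability_no_failure(annual_failure_rate, years=1):
    """Chance that one server has no failure over the given number of years,
    assuming the rate stays constant (a simplification)."""
    return (1.0 - annual_failure_rate) ** years


if __name__ == "__main__":
    rate = 0.18   # figure cited above for seven-year-old servers
    fleet = 50    # hypothetical fleet size
    print(f"Expected failures per year: {expected_failures(fleet, rate):.1f}")
    print(f"Chance a server runs a year without failing: "
          f"{probability_no_failure(rate):.0%}")
```

Under these assumptions, most seven-year-old servers in the fleet still get through the year without a failure, and a failure that does occur may be repairable rather than terminal.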

Best Practices for Extending the Life of Your Equipment

Whether or not the bathtub curve accurately represents the reliability and longevity of servers, storage, and other networking equipment, every device has a finite life. However, as budgets dwindle and CapEx spending sees cuts, enterprises are pushed to keep working with the equipment they have. 

Even if you’re not constrained by budgetary limits, in certain cases, such as when other systems depend heavily on the equipment, you may want to extend its life beyond EOSL.

High-ticket assets such as servers can last a long time. Here are some of the best practices for ensuring your data center equipment lasts a considerable time:

  • Timely Software Updates: Ensure that any firmware/software updates or security patches are applied as soon as the vendor releases them. This is all the more important in the early stages, as these updates are designed to fix problems and improve performance. 

  • Adequate Cooling at All Times: Avoid overheating incidents by keeping the premises within the vendor's stipulated operating temperature range. Overheating may take a toll on the intricate components of the device and weaken them, making it likelier to fail early in its lifetime.

  • Monitor Equipment Health Data: Obtain performance and health data for the devices. The operating system on most appliances may provide this data, which can help you understand how the device is aging. 

  • Third-Party Maintenance (TPM): Lastly, and most importantly, TPM is your best bet to extend the life of servers and other equipment in your data center, particularly when support from the OEM stops. It’s a viable option to ensure you have support for problems, or better yet, a spare, should the equipment fail.

Trust OneCall

Modern IT equipment is resilient enough to last a long time, especially when maintained well, but you can’t always trust the bathtub curve. While OEMs may have you believe your refresh cycle should be as short as three years, that’s not always the case. More importantly, that’s not what many enterprises can afford.

TPM is a reliable way to extend the life of your equipment while also ensuring it doesn’t fail on you. It can essentially save your CapEx by delaying the refresh cycle or help you avoid downtime when dependency on legacy equipment is high. 

If TPM were a house, OneScan would be the foundation. We do the complex thinking for you when it comes to setting the foundation for an optimized maintenance strategy, saving you valuable time. 

Learn how OneCall can help your equipment avoid the traditional bathtub curve failure rate!
