MTBF vs. MTFF vs. MTTR: What’s the Difference?


In IT management, there has long been confusion about the terms mean time between failures (MTBF), mean time to first failure (MTFF), and mean time to repair (MTTR), and how they relate to product reliability. Let’s take a few minutes to look at these terms and how you would use them for data center management in an IT service management (ITSM) environment.

Here are the basic definitions for each term and how they are derived.

Mean time between failures (MTBF) is a prediction based on prior observations, designating a product’s average time between failures. An MTBF value can be defined by the following equation:

        MTBF = total operational uptime between failures / number of failures

To illustrate, let’s assume a manufacturer has recorded the following data points between product failures for one of its copier models:

Failure number    Operational uptime before failure (hours)
1                 10,000
2                  9,500
3                 11,000
4                  9,000
Total             39,500

The MTBF would be 9,875 hours ((10,000 + 9,500 + 11,000 + 9,000) / 4), meaning that on average, this copier can run for 9,875 hours before experiencing a failure.
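Here is a quick Python sketch of the same arithmetic, using the copier data from the table. The function name `mtbf` is our own choice for illustration and isn't taken from any particular tool.

```python
def mtbf(uptimes_hours: list[float]) -> float:
    """Mean time between failures: total operational uptime / number of failures.

    Each entry is the operational uptime recorded before one failure.
    """
    if not uptimes_hours:
        raise ValueError("need at least one recorded failure")
    return sum(uptimes_hours) / len(uptimes_hours)


# Uptime (hours) before each of the four copier failures in the table.
copier_uptimes = [10_000, 9_500, 11_000, 9_000]
print(mtbf(copier_uptimes))  # 9875.0
```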

Meaningful MTBF calculations require many more data points than this. The more failures observed for a product, the more reliable its MTBF estimate. A product made up of many subsystems (such as a server with disk drives, fans, a motherboard, etc.) also requires a more complicated MTBF calculation. But for our purposes, this simple example illustrates the idea behind MTBF, which is:

        MTBF predicts the average time between product failures
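For products with many subsystems, one common simplification (our assumption here, not something this article prescribes) treats the product as a series system. Every subsystem must work for the product to run, each fails independently, and each has a constant failure rate of 1 / MTBF. Under those assumptions, the product’s failure rate is the sum of its subsystems’ rates. The component MTBFs below are hypothetical.

```python
def series_mtbf(subsystem_mtbfs_hours: list[float]) -> float:
    """MTBF of a product whose subsystems are all required for it to run.

    Assumes each subsystem fails independently at a constant rate of
    1 / MTBF, so the product's failure rate is the sum of the subsystem
    rates and its MTBF is the reciprocal of that sum.
    """
    if not subsystem_mtbfs_hours or any(m <= 0 for m in subsystem_mtbfs_hours):
        raise ValueError("need one or more positive subsystem MTBFs")
    total_failure_rate = sum(1.0 / m for m in subsystem_mtbfs_hours)  # failures per hour
    return 1.0 / total_failure_rate


# Hypothetical subsystem MTBFs (hours) for a server: disk, fan, motherboard.
server_parts = [100_000, 50_000, 200_000]
print(round(series_mtbf(server_parts)))  # 28571
```

The combined MTBF comes out lower than that of the least reliable part. Redundant designs, such as mirrored disks or dual fans, don't fit this simple model and need a different calculation.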

MTBF is commonly used to designate failure rates for both repairable and replaceable (non-repairable) products.

Mean time to first failure (MTFF) also predicts failure rates for a product. Unlike MTBF, MTFF is used only for replaceable (non-repairable) products, such as keyboards, mice, batteries, desk telephones, and motherboards. MTFF calculations generally use the same equation as MTBF, but they record only one data point for each failed item: the time until its first (and only) failure.
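A minimal sketch of that idea keys the data by item, so each failed unit can contribute exactly one lifetime. The serial numbers and lifetimes below are hypothetical.

```python
def mtff(first_failure_hours: dict[str, float]) -> float:
    """Mean time to first failure for non-repairable items.

    Keys are item serial numbers; each value is that item's operating hours
    until its first (and only) failure, so every item counts exactly once.
    """
    if not first_failure_hours:
        raise ValueError("need at least one failed item")
    return sum(first_failure_hours.values()) / len(first_failure_hours)


# Hypothetical: operating hours at which five mice each failed.
mouse_failures = {"M-001": 8_000, "M-002": 12_000, "M-003": 10_000,
                  "M-004": 9_000, "M-005": 11_000}
print(mtff(mouse_failures))  # 10000.0
```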

MTFF measurements can refer to two specific types of replaceable products:

  1. Replaceable products that can’t be repaired. The product’s first failure is its only failure, and it must be replaced.

    A simple example of a replaceable MTFF item is a computer mouse, which is always replaced and never repaired. MTFFs for replaceable items give you an idea of how many replacements to keep in stock and the turnover you can expect for that item. The same holds true for other replaceable items, such as keyboards and desk telephones. It’s also worth noting that some network appliances are replaceable rather than repairable. Certain firewalls, switches, modems, and other key networking equipment are sealed units that run for years and are replaced rather than repaired when they fail.
  2. MTFF may also refer to the first failure rate for a replaceable subsystem inside a repairable product. In this scenario, the failure occurs inside another product, and the repair is to replace the failed subsystem.

    For example, if you have a UPS in your data center, its key replaceable components are its batteries. The UPS unit itself may have an exceptionally long MTBF of up to 10 years, but each UPS battery may have an MTFF of only 3 years. Because MTFF is an average, some batteries will fail before reaching it, so you should budget to replace all UPS batteries before they reach their MTFF (say, every 2.5 years). Here, the MTFF serves a budgeting function. A similar situation exists for data center air conditioners and generators, which also have replaceable components that can fail.


    The most common replaceable computer components are hard drives, which wear out with age. If you know the MTFF of your disk drives, you can determine an appropriate number of replacement drives to keep on hand in case of failure.
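The UPS battery example is easy to turn into a replacement schedule and budget. This sketch makes two simplifying assumptions: it treats the UPS’s 10-year figure as its service life, and it uses a hypothetical $4,000 cost per battery set. The 2.5-year interval is the article’s own example margin below the 3-year battery MTFF.

```python
def battery_replacement_years(ups_life_years: float, interval_years: float) -> list[float]:
    """Years after installation at which the whole battery set is replaced.

    A replacement that would land on or after the end of the UPS's service
    life is skipped, since the whole unit is retired at that point.
    """
    if interval_years <= 0:
        raise ValueError("interval must be positive")
    years = []
    k = 1
    while k * interval_years < ups_life_years:
        years.append(k * interval_years)  # multiply rather than add, to avoid float drift
        k += 1
    return years


def battery_budget(ups_life_years: float, interval_years: float, cost_per_set: float) -> float:
    """Battery spend over the UPS's life, excluding the set it shipped with."""
    return len(battery_replacement_years(ups_life_years, interval_years)) * cost_per_set


print(battery_replacement_years(10, 2.5))  # [2.5, 5.0, 7.5]
print(battery_budget(10, 2.5, 4_000))      # 12000
```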

 

Mean time to repair (MTTR) represents the average time to repair or replace a failed product or subsystem of a product.
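As a sketch, MTTR is simply the average outage length across repair events. This example assumes you measure the clock from the moment of failure until the product is running again. Some teams start the clock at detection instead, so treat that choice as an assumption. The repair log is hypothetical.

```python
from datetime import datetime, timedelta


def mttr(repairs: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to repair: average of (service restored - failure) over all repairs.

    Each tuple is (time the failure occurred, time the product was running again).
    """
    if not repairs:
        raise ValueError("need at least one repair")
    total = sum((restored - failed for failed, restored in repairs), timedelta())
    return total / len(repairs)


# Hypothetical repair log for one server: 2 h, 4 h, and 3 h outages.
log = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 11, 0)),
    (datetime(2024, 4, 10, 14, 0), datetime(2024, 4, 10, 18, 0)),
    (datetime(2024, 6, 2, 8, 30), datetime(2024, 6, 2, 11, 30)),
]
print(mttr(log))  # 3:00:00
```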

MTBF items are generally repairable products. Repair occurs as quickly as possible, sometimes by replacing a subsystem component. An MTBF product’s MTTR is a measurable value for how long it takes to get the product running again.

Disk drives, fans, motherboards, and other replaceable items inside a computing device are MTFF items that are replaced rather than repaired. The same goes for stand-alone replaceable products such as mice, keyboards, batteries, telephones, and sealed network equipment. MTFF products have no MTTR of their own because they cannot be fixed, only replaced. When an MTFF part fails inside an MTBF product, the time needed to swap it counts toward that product’s MTTR.

Practical MTBF and MTFF

Even though MTBF can be used for repairable or replaceable products, it’s best to designate only repairable products as MTBF items. MTBF items are longer-lived products such as servers, computers, air conditioners, UPS systems, and generators. MTBF products are usually capital items that must be budgeted for when you are ready to upgrade or replace them.

MTFF products are generally either parts for MTBF items, such as batteries and motherboards, or inexpensive lower-end components, such as mice and keyboards. They are usually expensed items. MTFF values can help you decide how many replacements of each product to stock as they wear out.
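One way to turn an MTFF into a stocking level is sketched below. It is a simplified model and not a method from this article. It assumes each unit fails at a constant rate of 1 / MTFF, that failed units are replaced promptly so the fleet size stays fixed, and that the number of failures in a period follows a Poisson distribution. The fleet size and drive MTFF are hypothetical.

```python
import math


def expected_failures(fleet_size: int, period_hours: float, mtff_hours: float) -> float:
    """Expected failures across a fleet during one period.

    Assumes a constant failure rate of 1 / MTFF per unit and that failed
    units are replaced promptly, so the fleet size stays the same.
    """
    return fleet_size * period_hours / mtff_hours


def spares_needed(fleet_size: int, period_hours: float, mtff_hours: float,
                  confidence: float = 0.95) -> int:
    """Smallest spare count s such that P(failures <= s) >= confidence.

    Models failures in the period as Poisson with mean expected_failures(...).
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    lam = expected_failures(fleet_size, period_hours, mtff_hours)
    if lam > 700:
        raise ValueError("mean too large for this simple Poisson loop (exp underflows)")
    spares = 0
    prob = math.exp(-lam)  # P(exactly 0 failures)
    cumulative = prob
    while cumulative < confidence:
        spares += 1
        prob *= lam / spares  # P(exactly k) computed from P(exactly k - 1)
        cumulative += prob
    return spares


# Hypothetical: 200 drives, one year (8,760 h) of operation, 1,000,000-hour MTFF.
print(round(expected_failures(200, 8_760, 1_000_000), 3))  # 1.752
print(spares_needed(200, 8_760, 1_000_000))                 # 4
```

Stocking only the expected 1.752 failures (rounded up to 2) would leave you short in roughly one year out of four under this model. The confidence parameter lets you trade spare-parts cost against that risk.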

Measuring your MTBF, MTFF, and MTTR

Purchasing software that tracks MTBF, MTFF, and MTTR history for each product in your data center can help improve your data center and service desk performance. Many packages offer reports that detail the individual failure rates and repair cycles of your MTBF and MTFF products. These reports can help improve performance in several areas, including creating product baselines to measure improvement, predicting failures before they occur, reducing MTTR, and increasing MTBF and MTFF. Please feel free to contact us at BMC for more information on using MTBF, MTFF, and MTTR to improve performance in your own environment.
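Commercial packages produce these reports for you. The sketch below only illustrates the underlying per-product arithmetic on a hypothetical failure log. The record layout (`FailureRecord`, `ProductStats`, and their fields) is our assumption and not the schema of BMC or any other product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FailureRecord:
    product_id: str
    uptime_before_failure_h: float  # operating hours since install or the previous repair
    repair_h: float                 # hours from failure until the product ran again


@dataclass
class ProductStats:
    failures: int
    mtbf_h: float
    mttr_h: float


def reliability_report(records: list[FailureRecord]) -> dict[str, ProductStats]:
    """Group a failure log by product and compute per-product MTBF and MTTR.

    Products with no recorded failures do not appear; a real tool would also
    need to account for units that are still running.
    """
    by_product: dict[str, list[FailureRecord]] = defaultdict(list)
    for rec in records:
        by_product[rec.product_id].append(rec)

    report = {}
    for product_id, recs in by_product.items():
        n = len(recs)
        report[product_id] = ProductStats(
            failures=n,
            mtbf_h=sum(r.uptime_before_failure_h for r in recs) / n,
            mttr_h=sum(r.repair_h for r in recs) / n,
        )
    return report


# Hypothetical log: two failures on srv-01, one on srv-02.
log = [
    FailureRecord("srv-01", 4_000, 2.0),
    FailureRecord("srv-01", 6_000, 4.0),
    FailureRecord("srv-02", 9_000, 1.5),
]
for product_id, stats in sorted(reliability_report(log).items()):
    print(product_id, stats)
# srv-01 ProductStats(failures=2, mtbf_h=5000.0, mttr_h=3.0)
# srv-02 ProductStats(failures=1, mtbf_h=9000.0, mttr_h=1.5)
```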
