Estimating Drive Reliability in Desktop Computers
and Consumer Electronics Systems
Introduction
Historically, desktop computers have been the primary application for hard disc storage devices.
However, the market for disc drives in consumer electronic devices is growing rapidly. This paper presents
a method for estimating drive reliability in desktop computers and consumer electronics devices
using the results of Seagate’s standard laboratory tests. It provides a link between Seagate’s published
reliability specifications and real-world drive reliability as experienced by the end-user.
Definitions
Seagate estimates the mean time between failures (MTBF) for a drive as the number of power-on hours
(POH) per year divided by the first-year annualized failure rate (AFR). This is a suitable approximation
for small failure rates, and we intend it to represent a first year MTBF. The annualized failure rate for a
drive is derived from time-to-fail data collected during a reliability-demonstration test (RDT). Factory
reliability-demonstration tests (FRDT) are similar but are performed on drives pulled from the volume
production line. For the purposes of this paper, we assume that any concept that applies to an RDT also
applies to an FRDT.
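The definition above can be sketched in a few lines. The function names are illustrative, and the constant-failure-rate (exponential) comparison is an assumption introduced here only to show why the approximation holds for small failure rates; it is not part of Seagate's procedure.

```python
import math


def mtbf_from_afr(poh_per_year: float, afr: float) -> float:
    """MTBF in hours using the approximation MTBF = POH per year / AFR."""
    if not 0.0 < afr < 1.0:
        raise ValueError("afr must be a fraction between 0 and 1")
    return poh_per_year / afr


def mtbf_exponential(poh_per_year: float, afr: float) -> float:
    """MTBF under an assumed constant failure rate: AFR = 1 - exp(-POH / MTBF)."""
    if not 0.0 < afr < 1.0:
        raise ValueError("afr must be a fraction between 0 and 1")
    return -poh_per_year / math.log(1.0 - afr)


# Compare the two estimates at 2,400 POH per year for several failure rates.
for afr in (0.01, 0.05, 0.20):
    approx = mtbf_from_afr(2400.0, afr)
    exact = mtbf_exponential(2400.0, afr)
    gap = (approx - exact) / exact
    print(f"AFR {afr:.0%}: POH/AFR = {approx:,.0f} h, exponential = {exact:,.0f} h, gap = {gap:.1%}")
```

At an AFR near 1 percent the two estimates differ by well under 1 percent, while at 20 percent the gap exceeds 10 percent, which is why the simple ratio is reserved for small failure rates.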
Seagate Reliability Tests
At Seagate’s Personal Storage Group in Longmont, Colorado, desktop disc drive reliability tests are normally
conducted in ovens at 42ºC ambient temperature to provide accelerated failure rates. In addition,
the drives are operated at the highest possible duty cycle (a drive’s duty cycle is defined by the number
of seeks, reads and writes it performs over a specific time period). We do this to discover as many failure
modes as possible during the product development cycle. By fixing any problems we may see at this
stage, we can make sure that our customers won’t see the same problems.
Estimating Weibull Parameters
Let’s assume we have an RDT with 500 drives, all run for 672 hours at 42ºC ambient temperature.
During this test, further assume that we observe three failures (at 12, 133 and 232 hours). This
means that, of the 500 drives tested, 497 ran the entire test without failing. To analyze and extrapolate
from the test results, we perform Weibull modeling using SuperSmith software from Fulton Findings [1].
Specifically, we use the Maximum Likelihood method to estimate the Weibull-distribution parameters
Beta (a shape parameter) and Eta (a scale parameter).
In tests with five or fewer failures, the Beta parameter cannot be well defined by the test data.
Because such cases are common in drive testing, we analyze the data using a WeiBayes [2] approach.
This approach requires that we estimate the Beta parameter using historical data. In the desktop-products
lab, we are currently assuming that Beta = 0.55. This value is based on the manufacturing
data shown in the following table, which includes all desktop products tested prior to March 1999.
1. SuperSmith, Fulton Findings, WinSMITH and WinSMITH Weibull are trademarks of Fulton Findings, 1251 W. Sepulveda Blvd., #800,
Torrance, CA 90502, USA
2. Abernethy, Dr. Robert B., The New Weibull Handbook, Second Edition, published by the author, 1996, Chapter 5.
From: Gerry Cole
Seagate Personal Storage Group
Longmont, Colorado
Date: November 2000
Number: TP-338.1
Corporate Headquarters: Scotts Valley, California, USA, +1-831-438-6550
Asia/Pacific Headquarters: Singapore, +65-488-7200
Europe, Middle East and Africa Headquarters: Boulogne-Billancourt, France, +33 1-41 86 10 00
The graph below shows the results of both the Weibull and WeiBayes analyses. The solid line in the figure
shows the Weibull fit (Beta = 0.443, Eta = 69,331,860 hours) estimated using the Maximum
Likelihood [3] (MLE) approach on only 3 failures out of 500 drives. As mentioned before, with so few failures
these results are considered less accurate than those of the WeiBayes method.
The results of the WeiBayes method (with Beta = 0.55) are shown as a dashed line in the figure below.
Because 672 test hours at 42ºC should be a sufficiently long run time for an RDT, we use our internal
test exit confidence level [4] of 63.2 percent for the WeiBayes analysis. The WeiBayes calculations indicate that,
at 42ºC, given a historical Beta = 0.55, a reasonable value for Eta is 3,787,073 hours.
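The WeiBayes step can be illustrated with a minimal sketch. It uses one textbook form of the method: with Beta fixed, Eta = (sum of t^Beta over all units / r)^(1/Beta), where r is the number of failures, and a lower confidence bound is obtained by replacing r with half the chi-square quantile at 2r + 2 degrees of freedom. Whether SuperSmith applies the 63.2 percent confidence level in exactly this way is an assumption, since this paper does not describe the software's internals, and all function names are illustrative.

```python
import math


def chi2_sf_even(x: float, dof: int) -> float:
    """Survival function of a chi-square variable with even dof (closed-form Poisson sum)."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def chi2_quantile_even(conf: float, dof: int) -> float:
    """Value x with P(chi2 <= x) = conf, located by bisection."""
    if not 0.0 < conf < 1.0:
        raise ValueError("conf must be between 0 and 1")
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_sf_even(hi, dof) < conf:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_sf_even(mid, dof) < conf:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def weibayes_eta(failures, suspensions, beta, conf=None):
    """WeiBayes Eta for a fixed Beta; with conf set, a lower confidence bound (assumed form)."""
    total = sum(t ** beta for t in failures) + sum(t ** beta for t in suspensions)
    r = len(failures)
    if conf is None:
        if r == 0:
            raise ValueError("a point estimate needs at least one failure")
        denom = float(r)
    else:
        denom = chi2_quantile_even(conf, 2 * r + 2) / 2.0
    return (total / denom) ** (1.0 / beta)


# RDT example from the text: 3 failures, 497 drives surviving 672 hours at 42 C.
failures = [12.0, 133.0, 232.0]
suspensions = [672.0] * 497
eta_point = weibayes_eta(failures, suspensions, beta=0.55)
eta_lower = weibayes_eta(failures, suspensions, beta=0.55, conf=0.632)
print(f"Point estimate of Eta: {eta_point:,.0f} hours")
print(f"63.2% lower bound:     {eta_lower:,.0f} hours")
```

Under these assumptions the lower bound lands in the neighborhood of the 3,787,073 hours reported above. An exact match is not expected, because the software's procedure may differ in detail from this sketch.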
The next step in the analysis is to convert the value for Eta that was based on tests at 42ºC to a value that
reflects our specified operational temperature (25ºC). Using the Arrhenius Model [5], an acceleration factor of
2.2208 can be used to account for this difference in temperature. Therefore, the value for Eta at 25ºC (Eta25)
is assumed to be equal to the value for Eta at 42ºC (Eta42) times 2.2208, or 8,410,332 hours.
3. Abernethy, Dr. Robert B., The New Weibull Handbook, Second Edition, published by the author, 1996, Appendix D.
4. Earlier in the RDT, a larger confidence level would be used to reflect the uncertainty in Weibull parameter estimation
due to the limited run time.
5. Nelson, Wayne, Applied Life Data Analysis, John Wiley & Sons, 1982.
Desktop drive site     Database           Mean Beta   Standard Deviation of Beta
Longmont               37 RDT, 5 FRDT     0.546       0.176
Perai                  2 RDT, 4 FRDT      0.617       0.068
Wuzi                   1 RDT              0.388       n/a
Pooled desktop data    49 tests           0.552       0.167
[Figure: Example of Weibull and WeiBayes Analysis. Cumulative percent failure (0.1 to 90) versus test time at 42ºC (10 to 1E+09 hours). Solid line: observed Weibull fit via MLE, Eta = 69,331,860, Beta = 0.443, n/s = 500/497. Dashed line: WeiBayes fit at 63.2 percent confidence, Eta = 3,787,073, Beta = 0.55.]
Applying the Estimated Weibull Parameters to Estimate First-Year MTBF
Using the temperature-adjusted estimated values of the Weibull Beta and Eta parameters, we can calculate the
cumulative-percent-failure rate at any time. By subtracting the cumulative-percent-failure rates for two different
times (t1 and t2), and using appropriate values for Beta and Eta25, we can estimate the percent of drives
that are likely to fail at 25ºC during any time interval from t1 to t2.
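Written out, and assuming the standard two-parameter Weibull form (the symbols F, T and the Greek letters are notation introduced here for Beta and Eta25), the calculation described above is:

```latex
F(t) = 1 - \exp\!\left[-\left(\frac{t}{\eta_{25}}\right)^{\beta}\right],
\qquad
P(t_1 < T \le t_2) = F(t_2) - F(t_1)
```

Here F(t) is the cumulative fraction of drives failed by time t at 25ºC, and the difference gives the fraction expected to fail between t1 and t2.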
To estimate the AFR for the first year of drive operation in a desktop computer setting, we assume that the
drive is used at a rate of 2,400 power-on-hours (POH) per customer year. In addition, we assume that drives
are subjected to a 24-POH integration period by the device manufacturer. Because any drives that fail during
this period are returned to Seagate and are not shipped to the end-user, they are not counted in the first year
AFR and MTBF.
Based on these assumptions (100% duty cycle, Eta25 = 8,410,332 hours, Beta = 0.55, and 2,400 POH per
year) the percent failure rate in the first customer year after integration can be calculated as the percent failure
rate between 24 hours (t1) and 2,424 hours (t2). The results of this calculation are shown in the table
below, which derives a first-year MTBF from the RDT data.
Input area:
  POH per year:                        2,400
  Weibull shape factor (Beta):         0.55
  Weibull scale factor (Eta25):        8,410,332 hours

P(fail), 0 to 2,424 POH:               1.123%
P(fail), 0 to 24 POH:                – 0.089%
First-year AFR:                      = 1.0338% (before rounding)

POH per year:                          2,400
First-year AFR:                      ÷ 0.010338
First-year Weibull MTBF:             = 232,140 hours
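The table can be reproduced with a short script. This is a sketch that assumes the standard two-parameter Weibull cumulative distribution; the function and constant names are illustrative.

```python
import math


def weibull_cdf(t_hours: float, beta: float, eta: float) -> float:
    """Cumulative fraction failed by t_hours for a two-parameter Weibull."""
    return 1.0 - math.exp(-((t_hours / eta) ** beta))


def first_year_afr(beta: float, eta: float, poh_per_year: float,
                   integration_hours: float = 24.0) -> float:
    """Fraction failing between the end of integration and one customer year later."""
    t1 = integration_hours
    t2 = integration_hours + poh_per_year
    return weibull_cdf(t2, beta, eta) - weibull_cdf(t1, beta, eta)


BETA = 0.55
ETA_25C = 8_410_332.0   # hours, temperature-adjusted scale parameter from the text
POH_PER_YEAR = 2_400.0

p_2424 = weibull_cdf(2_424.0, BETA, ETA_25C)
p_24 = weibull_cdf(24.0, BETA, ETA_25C)
afr = first_year_afr(BETA, ETA_25C, POH_PER_YEAR)
mtbf = POH_PER_YEAR / afr

print(f"P(fail), 0 to 2,424 POH: {p_2424:.3%}")
print(f"P(fail), 0 to 24 POH:    {p_24:.3%}")
print(f"First-year AFR:          {afr:.4%}")
print(f"First-year MTBF:         {mtbf:,.0f} hours")
```

Changing `integration_hours` or `POH_PER_YEAR` shows how sensitive the first-year figure is to those two assumptions.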
Accounting for Actual User Conditions
The calculations above suggest that if a customer were to use our drive at 25ºC and 2,400 POH per year, the
expected customer MTBF in the first year would be 232,140 hours. However, these conditions may not always apply
to the consumer electronics environment. For example, in some consumer devices, the drive may be powered
on almost 100 percent of the time and yearly usage rates may be much higher than 2,400 POH. In other
devices, such as video game players, the POH per year may be relatively low. The following section describes
how we can adjust the calculated MTBF so that it applies to various usage levels, duty cycles and ambient
temperatures.
Usage Levels
To account for variation in MTBF due to different levels of usage, we may use the MTBF adjustment curve
shown below. For example, to adjust an MTBF from 2,400 POH per year to a maximum usage rate of
8,760 POH per year, the MTBF would be multiplied by about 1.8. Conversely, for low-usage environments,
as in some video games, the MTBF may be decreased by as much as a factor of two.
[Figure: Adjusted MTBF as Function of Expected POH per Year. MTBF spec. multiplier (0.00 to 2.00) versus expected POH per year (492 to 8,760).]
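The paper does not state how the usage adjustment curve was generated. One reconstruction that is consistent with the multipliers quoted in this paper is to take MTBF as POH divided by the Weibull cumulative failure fraction at that POH, without the 24-hour integration offset, and to normalize to 2,400 POH per year. This is an inference from the numbers rather than a documented Seagate procedure, and the names below are illustrative.

```python
import math

BETA = 0.55
ETA_25C = 8_410_332.0  # hours, from the first-year MTBF example


def weibull_cdf(t_hours: float, beta: float = BETA, eta: float = ETA_25C) -> float:
    """Cumulative fraction failed by t_hours for a two-parameter Weibull."""
    return 1.0 - math.exp(-((t_hours / eta) ** beta))


def mtbf_at_usage(poh_per_year: float) -> float:
    """First-year MTBF as POH / F(POH); the integration period is ignored (assumption)."""
    return poh_per_year / weibull_cdf(poh_per_year)


def usage_multiplier(poh_per_year: float, reference_poh: float = 2_400.0) -> float:
    """MTBF multiplier relative to the 2,400 POH-per-year specification."""
    return mtbf_at_usage(poh_per_year) / mtbf_at_usage(reference_poh)


for poh in (492, 1128, 2400, 2920, 8760):
    print(f"{poh:>5} POH/year -> multiplier {usage_multiplier(poh):.2f}")
```

With these assumptions the multiplier comes out near 1.8 at 8,760 POH per year and near 0.5 at 492 POH per year, in line with the values described above.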
Temperature
Next let’s look at the effects of elevated operating temperature. The same Arrhenius Model that we used to
develop an acceleration factor may also be used to generate an MTBF temperature derating-factor (DF) curve.
The following table shows the decrease in first-year MTBF (at 100% duty cycle) as ambient temperature
increases above 25ºC.
Temp (ºC)   Acceleration Factor   Derating Factor   Adjusted MTBF (hours)
25          1.0000                1.00              232,140
26          1.0507                0.95              220,533
30          1.2763                0.78              181,069
34          1.5425                0.65              150,891
38          1.8552                0.54              125,356
42          2.2208                0.45              104,463
46          2.6465                0.38              88,213
50          3.1401                0.32              74,284
54          3.7103                0.27              62,678
58          4.3664                0.23              53,392
62          5.1186                0.20              46,428
66          5.9779                0.17              39,464
70          6.9562                0.14              32,500
From the table above, it is clear that as the ambient temperature rises, the derating factor and the adjusted
MTBF become significantly smaller. For example, at 42ºC, we find the 2.2208 acceleration factor referred to
previously in this analysis. Its reciprocal, 0.45, is the DF value, which indicates that the MTBF at 42ºC is less
than half as long as the MTBF at 25ºC.
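The acceleration factors in the table can be approximated with a single-activation-energy Arrhenius model. The paper does not state the activation energy, so this sketch back-solves the ratio Ea/k from the one published pair (2.2208 between 25ºC and 42ºC) and treats it as constant; that constancy, the 273.15 Kelvin offset and the function names are assumptions.

```python
import math

KELVIN_OFFSET = 273.15


def fit_ea_over_k(known_af: float, use_c: float, stress_c: float) -> float:
    """Back-solve Ea/k (in kelvin) from one known acceleration factor."""
    t_use = use_c + KELVIN_OFFSET
    t_stress = stress_c + KELVIN_OFFSET
    return math.log(known_af) / (1.0 / t_use - 1.0 / t_stress)


def acceleration_factor(temp_c: float, ea_over_k: float, ref_c: float = 25.0) -> float:
    """Arrhenius acceleration factor at temp_c relative to ref_c."""
    t_ref = ref_c + KELVIN_OFFSET
    t = temp_c + KELVIN_OFFSET
    return math.exp(ea_over_k * (1.0 / t_ref - 1.0 / t))


EA_OVER_K = fit_ea_over_k(2.2208, use_c=25.0, stress_c=42.0)
BASE_MTBF = 232_140.0  # hours at 25 C, from the first-year example

for temp_c in (25, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62, 66, 70):
    af = acceleration_factor(temp_c, EA_OVER_K)
    df = 1.0 / af  # derating factor is the reciprocal of the acceleration factor
    print(f"{temp_c:>3} C  AF = {af:.4f}  DF = {df:.2f}  MTBF = {BASE_MTBF * df:,.0f} h")
```

The adjusted MTBF printed here uses the unrounded derating factor, so it differs slightly from the table, which multiplies by the derating factor rounded to two decimals.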
Duty Cycle
Most disc drives in PCs are operated at duty cycles of 20 percent to 30 percent. However, consumer electronics
devices may have lower or higher duty cycles. Seagate has measured average daily data-transfer rates on existing
consumer electronics devices and found duty cycles as low as 2.5 percent.
To compare the effect of a 2.5 percent duty cycle with that of a 100 percent duty cycle (used in RDT testing),
we can examine the effect of duty-cycle-dependent components in the drive relative to other components.
The number of duty-cycle-dependent components in a hard disc drive is proportional to the number of
discs in the drive. The relationship between disc count and AFR is shown in the following figure. In this graph,
the area below the dotted line indicates the “base” or non-duty-cycle-dependent failure rate for a hypothetical
drive with no discs (or a drive that is not reading, writing or seeking). The solid line indicates estimated failure
rates as a function of the number of discs present.
[Figure: Effect of Disc Count on Total and Base AFR. Normalized AFR (0 to 1.6) versus disc count (0 to 4; 4 is max). Solid line: total AFR. Dotted line: base AFR.]
From the previous graph it is clear that reducing a drive’s duty cycle reduces only the duty-cycle-dependent
failures (those between the dotted and solid line). Using the ratio between duty-cycle-dependent and total
failures, we can estimate the effect of duty cycle on AFR. For example, consider a four-disc drive with a total
AFR of 1.4 percent and a base AFR of 0.6 percent. Reducing the duty cycle would reduce the failures by the
factor [(1.4 – 0.6)/1.4] = 57 percent. In accounting for reduced duty cycle on a four-disc drive, therefore, we
can only reduce 57 percent of the failures; the remainder are treated as independent of duty cycle.
The resulting MTBF multipliers for drives with different numbers of discs are shown in the following figure.
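The four-disc example can be turned into a small calculation. The sketch assumes that the duty-cycle-dependent part of the AFR scales linearly with duty cycle while the base part stays fixed; the paper states only that 57 percent of the failures are duty-cycle dependent, so the linear scaling and the function names are assumptions.

```python
def duty_dependent_fraction(total_afr: float, base_afr: float) -> float:
    """Share of failures that depends on duty cycle, e.g. (1.4 - 0.6) / 1.4."""
    return (total_afr - base_afr) / total_afr


def afr_at_duty_cycle(total_afr: float, base_afr: float, duty_cycle: float) -> float:
    """AFR at a given duty cycle (0 to 1), scaling only the duty-dependent part."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return base_afr + (total_afr - base_afr) * duty_cycle


def mtbf_multiplier(total_afr: float, base_afr: float, duty_cycle: float) -> float:
    """MTBF multiplier relative to 100% duty cycle (MTBF is inversely proportional to AFR)."""
    return total_afr / afr_at_duty_cycle(total_afr, base_afr, duty_cycle)


TOTAL_AFR, BASE_AFR = 1.4, 0.6  # percent, four-disc example from the text

print(f"Duty-cycle-dependent share: {duty_dependent_fraction(TOTAL_AFR, BASE_AFR):.0%}")
for duty in (1.0, 0.5, 0.3, 0.1):
    print(f"duty cycle {duty:.0%}: MTBF multiplier {mtbf_multiplier(TOTAL_AFR, BASE_AFR, duty):.2f}")
```

Because the base AFR never goes away, the multiplier is bounded: even at zero duty cycle it cannot exceed the ratio of total to base AFR.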
Combining Multiple Factors
To continue the analysis, we combine a range of duty cycles and temperature derating factors (DF) for several
different drives. The figure on the left shows MTBF multipliers at a variety of duty cycles and temperatures for
a high-capacity, 4-disc drive. The figure on the right shows the same multipliers as applied to a drive with only
one disc. As shown in these figures, depending on the duty cycle and the ambient temperature of the drive in
the customer’s PC, the first-year effective MTBF may be greater than, equal to, or less than the MTBF that we
estimate based on in-house testing. For the one-disc drive, the effects of varying duty cycles are less significant
and the MTBF multipliers tend to be significantly smaller.
[Figure: MTBF Multiplier vs Duty Cycle and Platter Count. MTBF multiplier (1.00 to 2.20) versus duty cycle (100% down to 10%), with one curve each for 1-disc (minimum capacity), 2-disc, 3-disc and 4-disc (maximum capacity) drives.]
[Figure, left: Thermal Derating for a Range of Duty Cycles (for maximum-capacity, 4-disc drive). MTBF multiplier (DF, 0.00 to 2.50) versus ambient temperature (26ºC to 70ºC), with curves for 100%, 30%, 20%, 10%, 5% and 1% duty cycle.]
[Figure, right: Thermal Derating for a Range of Duty Cycles (for minimum-capacity, 1-disc drive). MTBF multiplier (DF, 0.00 to 1.40) versus ambient temperature (26ºC to 70ºC), with curves for 100%, 30%, 20%, 10%, 5% and 1% duty cycle.]
Reliability after the First Year
The Weibull distribution of time-to-failure, with a Beta less than one, is a distribution of decreasing failure
probability over time. Because of this, MTBF values for a drive’s first year in the field are likely to be lower
than for subsequent years. What would the failure rate or MTBF look like if averaged over the entire useful
lifetime of the drive? Three possible methods for estimating reliability over a drive lifetime are listed below:
• We could use the Weibull [Beta, Eta25] analysis to estimate failures after the first year. However, this
would require extending the RDT test results up to an order of magnitude beyond the duration of the
test. This would not be a very conservative practice.
• We could use data from the Seagate warranty-return database, from which we may estimate the returns
in the second and third years relative to the number of drives returned in the first year. This data is only
applicable to the first three years, which is the limit of most current Seagate desktop-drive warranties,
but it has the advantage of being based on only Seagate desktop products.
• We could assume a model that would “flatline,” or maintain a constant failure rate after the end of the
first year. In other words, we could assume that after the first year, all yearly failure rates would be
equal to the second-year failure rate. Since failure rates would, if anything, decline over time, this would
be a conservative estimate of averaged MTBF for the life of the drive.
These models are compared in the table below.
MODEL:                     Weibull                 Warranty Data (OEM only)   Flatline Model
Year   Cumulative          Yearly     Cumulative   Yearly     Cumulative      Yearly     Cumulative
       power-on hours      failure    failure      failure    failure         failure    failure
                           rate       rate         rate       rate            rate       rate
1      2,400               1.20%      1.20%        1.20%      1.20%           1.20%      1.20%
2      4,800               0.55%      1.75%        0.78%      1.98%           0.55%      1.75%
3      7,200               0.43%      2.18%        0.39%      2.37%           0.55%      2.30%
4      9,600               0.37%      2.55%        n/a        n/a             0.55%      2.86%
5      12,000              0.33%      2.88%        n/a        n/a             0.55%      3.41%
6      14,400              0.30%      3.18%        n/a        n/a             0.55%      3.96%
7      16,800              0.28%      3.46%        n/a        n/a             0.55%      4.51%
8      19,200              0.26%      3.72%        n/a        n/a             0.55%      5.06%
9      21,600              0.24%      3.96%        n/a        n/a             0.55%      5.62%
10     24,000              0.23%      4.19%        n/a        n/a             0.55%      6.17%
To further illustrate the differences between these models, let’s look at the cumulative percent failure rates for
the three different models, each assuming a 200,000-hour first-year MTBF:
As the graph of these models shows, the “flatline” model is less aggressive than the pure Weibull model, and comes
close to the model based on Seagate warranty returns in the first three years. For simplicity, and to provide
a conservative estimate, we have chosen to use the flatline model for our calculations.
Using the flatline model, the results of lifetime-averaged MTBF compared to first-year MTBF may be summarized
as follows:
Average values for years 1 through 3:
  Failures per year: 0.768%
  MTBF: 312,500 hours
  Improvement over noncorrected first-year MTBF (200,000 hours): 1.56
Average values for years 1 through 5:
  Failures per year: 0.682%
  MTBF: 352,113 hours
  Improvement over noncorrected first-year MTBF (200,000 hours): 1.76
Average values for years 1 through 10:
  Failures per year: 0.617%
  MTBF: 389,105 hours
  Improvement over noncorrected first-year MTBF (200,000 hours): 1.95
These calculations indicate that you can multiply the first-year MTBF (at the appropriate duty cycle and ambient temperature)
by 1.56 to estimate the averaged MTBF over a three-year drive lifetime. Similarly, to estimate the average
MTBF over a drive lifetime of five or ten years, multiply the first-year MTBF by 1.76 or 1.95, respectively.
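The lifetime multipliers follow from simple averaging under the flatline model. The sketch below uses the rounded yearly rates from the comparison table (1.20 percent in year one, 0.55 percent in every later year), so its results differ in the third decimal from the figures above, which appear to be based on unrounded rates. The function names are illustrative.

```python
def flatline_average_afr(first_year_afr: float, later_year_afr: float, years: int) -> float:
    """Average yearly failure rate when every year after the first uses later_year_afr."""
    if years < 1:
        raise ValueError("years must be at least 1")
    return (first_year_afr + later_year_afr * (years - 1)) / years


def lifetime_multiplier(first_year_afr: float, later_year_afr: float, years: int) -> float:
    """Ratio of lifetime-averaged MTBF to first-year MTBF at a fixed POH per year."""
    return first_year_afr / flatline_average_afr(first_year_afr, later_year_afr, years)


FIRST_YEAR_AFR = 0.0120   # 1.20%, i.e. a 200,000-hour first-year MTBF at 2,400 POH
LATER_YEAR_AFR = 0.0055   # 0.55%, second-year rate held constant (flatline)
POH_PER_YEAR = 2_400.0

for years in (3, 5, 10):
    avg = flatline_average_afr(FIRST_YEAR_AFR, LATER_YEAR_AFR, years)
    mult = lifetime_multiplier(FIRST_YEAR_AFR, LATER_YEAR_AFR, years)
    print(f"{years:>2} years: average AFR {avg:.3%}, "
          f"MTBF {POH_PER_YEAR / avg:,.0f} h, multiplier {mult:.2f}")
```

The multiplier depends only on the ratio of the later-year rate to the first-year rate, which is why the same factors can be applied to any first-year MTBF.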
[Figure: Cumulative Yearly Failure Rate by Customer Year, Weibull and Flatline Models Compared to Warranty Returns. Cumulative failure rate per customer year (0.00% to 7.00%) versus customer year (1 to 10), with curves for the Weibull analysis, the “flatline” model, and the model based on OEM drive warranty data.]
Putting it All Together
By combining the multipliers and derating factors described above, we can convert the Seagate-specified
MTBF (first year, at 25ºC ambient temperature, 2,400 POH per year and 100 percent duty cycle) into an
MTBF that applies to a drive in a customer’s device at an appropriate ambient temperature and duty cycle.
We can then estimate the average MTBF over the drive’s lifetime.
The following example demonstrates the calculation of first-year and drive-lifetime MTBF for a drive operated
at 2,400 POH per year at an ambient operating temperature of 38ºC, a duty cycle of 30 percent and a five-year
useful life.
First-year MTBF:               232,140 hours (based on Weibull parameters: Beta, Eta25)
                             × 0.90 (temp derating for 38ºC and 30% duty cycle)
Customer first-year MTBF:      208,926 hours

Customer first-year MTBF:      208,926 hours
                             × 1.76 (factor for averaging over five-year lifetime)
Customer drive-lifetime MTBF:  367,710 hours
As a final example, consider the case of a 1-disc Seagate drive with a specified first-year MTBF of 500,000
hours, which is being operated in a consumer electronics device at a usage rate of 2,920 POH per year (eight
hours a day, seven days a week), an ambient temperature of 42ºC and a duty cycle of 5 percent, with the MTBF
averaged over a ten-year drive lifetime.
First-year MTBF:         500,000 hours (based on Weibull parameters: Beta, Eta25)
                       × 1.09 (adjustment for 2,920 POH per year)
                       × 0.59 (derating for temperature of 42ºC and 5% duty cycle)
                       × 1.95 (factor for averaging over 10-year drive lifetime)
Customer average MTBF:   627,023 hours
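Both worked examples reduce to multiplying a specified first-year MTBF by a chain of factors. The sketch below takes the factors exactly as quoted above rather than recomputing them; the function name and dictionary keys are illustrative.

```python
import math


def adjusted_mtbf(spec_mtbf_hours: float, factors: dict) -> float:
    """Multiply a specified first-year MTBF by every adjustment factor supplied."""
    return spec_mtbf_hours * math.prod(factors.values())


# Example 1: 2,400 POH per year, 38 C, 30% duty cycle, five-year life.
example_1 = adjusted_mtbf(232_140.0, {
    "temperature_and_duty_cycle": 0.90,
    "five_year_lifetime": 1.76,
})

# Example 2: 1-disc drive, 2,920 POH per year, 42 C, 5% duty cycle, ten-year life.
example_2 = adjusted_mtbf(500_000.0, {
    "usage_2920_poh": 1.09,
    "temperature_and_duty_cycle": 0.59,
    "ten_year_lifetime": 1.95,
})

print(f"Example 1 drive-lifetime MTBF: about {example_1:,.0f} hours")
print(f"Example 2 average MTBF:        about {example_2:,.0f} hours")
```

Keeping the factors in a named mapping makes it easy to see which adjustment contributes most, and to leave one out when it does not apply.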
Conclusion
The method outlined above allows us to use Seagate laboratory test data to estimate the reliability of drives
in desktop computers and consumer electronic devices in real-world settings. The method can be summarized
as follows:
• Use Weibull or historical RDT/FRDT test data to estimate Weibull parameters for drive tests.
• Use WeiBayes analysis of test data for a specific type of drive to estimate first-year AFR and
MTBF under RDT test conditions.
• Correct for any differences from the assumed usage rate of 2,400 POH per year.
• Correct these values to take into account differences between RDT conditions and the real-life
temperature and duty-cycles experienced by the drive after it reaches the customer.
• Extend the first-year customer reliability estimates over a three- to ten-year drive lifetime, using the
conservative assumption that failure rates remain constant after the drive’s first year in the field.
In conclusion, this approach provides a mathematically reasonable way to use Seagate test results to
estimate drive reliability in consumer electronics.