Reliability Engineering

Physics of Failure

Physics of Failure (PoF) is an alternative approach to reliability that focuses on failure mechanisms, failure sites & root cause analysis, instead of the more conventional approach that looks at failure modes & effects alone. The PoF approach characterizes reliability through lifetime distributions (the probability distribution of failures vs. time) instead of hazard rates (failure rate vs. time). The PoF approach involves the following steps:
1. Study of the hardware configuration: geometry, design, materials, structure
2. Study of life cycle loads: operational loads (power, voltage, bias, duty cycle) & environmental loads (temperature, humidity, vibration, shock)
3. Stress analysis: Stress-strength distributions/interference, cumulative damage assessment & endurance interference, FMEA, hypothesizing failure mechanisms, failure sites & associated failure models, root cause analysis, calculating RPNs to rank & prioritize failures.
4. Reliability assessment: Reliability metrics characterization, life estimation, operating/design margin estimation.
5. Interpret & apply results: Design tradeoffs & optimization, ALT planning & development, PHM & HUMS planning.
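
The stress-strength interference idea in step 3 can be illustrated with a minimal sketch. It assumes stress and strength are independent and normally distributed, which is an assumption made here for illustration and is not stated in the notes. The function names and the numbers are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interference_reliability(mu_strength, sd_strength, mu_stress, sd_stress):
    """P(strength > stress) for independent, normally distributed stress and strength."""
    z = (mu_strength - mu_stress) / math.sqrt(sd_strength ** 2 + sd_stress ** 2)
    return normal_cdf(z)


# Hypothetical values (same units for stress and strength, e.g. MPa)
reliability = interference_reliability(500.0, 40.0, 350.0, 30.0)
print(f"Reliability = {reliability:.5f}")
```

The overlap between the two distributions shrinks as the mean margin grows or as either spread tightens, which is why this view is useful for reasoning about design margin.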

PoF approach to reliability
HW config definition: Design/Architecture, Materials, Geometry
Life cycle load assessment: Op loads (Voltage, Power, Current, Frequency, Duty Cycle); Env loads (Temp, RH, shock/vib from S&H, storage conditions)
Stress analysis: Stress-strength interference; Failure modes/mechanisms/locations; FMEA; Understand / hypothesize root cause; Failure models / LSR
Reliability assessment: Develop life distribution; Predict useful life & calculate other metrics
Results: Rank / prioritize failure mechanisms & risks; Design tradeoffs, understand margins; Risk mitigation; Reliability metrics, ALT / PHM planning


Use HW configs and life cycle loading to understand & prioritize failure mechanisms/root causes, then use a failure model (LSR) to develop a lifetime distribution (Weibull, exponential, lognormal, etc.) from which reliability metrics are calculated.

PoF helps prioritize risks/failures, plan risk mitigation, evaluate design tradeoffs/margins and estimate reliability metrics.

PoF defines reliability as the probability of meeting a given lifetime / service life at a given confidence level.


Life Stress Relationship | Stress | Model Example | Failure Mechanism
Exponential Model | Thermal | Arrhenius |
Inverse Power Law (IPL) | Non thermal (humidity, mech) | Coffin-Manson | LCF
 | | Peck's Model |
Exponential + IPL | | Norris-Landzberg | LCF
 | | Black's | EM
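
For reference, the commonly quoted forms of the life-stress relationships listed above are sketched below. These are textbook forms added here, not taken from the notes. The symbols are assumptions: L is life, N_f is cycles to failure, T is absolute temperature, k is Boltzmann's constant, E_a is activation energy, S is a generic non-thermal stress, RH is relative humidity, f is cycling frequency, J is current density, and A, C, K, n, m are fitted constants that differ from model to model.

```latex
\begin{align*}
\text{Arrhenius (exponential):} \quad & L = A \exp\!\left(\frac{E_a}{k T}\right) \\
\text{Inverse power law:} \quad & L = \frac{1}{K S^{n}} \\
\text{Coffin-Manson:} \quad & N_f = C \, (\Delta \varepsilon_p)^{-n} \\
\text{Peck:} \quad & L = A \, (RH)^{-n} \exp\!\left(\frac{E_a}{k T}\right) \\
\text{Norris-Landzberg:} \quad & N_f = C \, f^{m} \, (\Delta T)^{-n} \exp\!\left(\frac{E_a}{k T_{\max}}\right) \\
\text{Black:} \quad & \mathrm{MTTF} = A \, J^{-n} \exp\!\left(\frac{E_a}{k T}\right)
\end{align*}
```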

Reliability / Life Estimation

Run ALT & collect TTF data.
Fit the TTF data to a life distribution (e.g., 2P Weibull) to estimate eta (test) and beta (assumed equal in test and field, since beta is constant for a given failure mechanism).
Use eta as the life characteristic (the characteristic life, i.e. the time by which 63.2% of units have failed; it equals the MTTF only when beta = 1), and model eta (test) as a function of the test stress using an LSR.
Estimate the LSR model parameters.
Knowing the LSR, set stress = use case / field condition and calculate eta (field).
Use eta (field) and beta (test = field) to model the life distribution under the use case / field condition, and predict useful life and other reliability metrics.
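
The distribution-fitting step above can be sketched as follows. The notes do not name a fitting method, so this example assumes median rank regression with Bernard's approximation on complete (uncensored) data. The function names and TTF values are hypothetical.

```python
import math


def fit_weibull_mrr(ttf):
    """Fit a 2P Weibull to complete TTF data by median rank regression.

    Linearization: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta).
    """
    times = sorted(ttf)
    n = len(times)
    xs, ys = [], []
    for rank, t in enumerate(times, start=1):
        median_rank = (rank - 0.3) / (n + 0.4)  # Bernard's approximation
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - median_rank)))
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                      # slope = shape parameter
    intercept = y_bar - beta * x_bar      # intercept = -beta * ln(eta)
    eta = math.exp(-intercept / beta)     # scale = characteristic life
    return eta, beta


def weibull_mttf(eta, beta):
    """Mean life of a 2P Weibull; equals eta only when beta = 1."""
    return eta * math.gamma(1.0 + 1.0 / beta)


# Hypothetical TTF data (hours) from one accelerated stress level
eta_test, beta_test = fit_weibull_mrr([410.0, 620.0, 790.0, 950.0, 1180.0, 1420.0])
print(f"eta = {eta_test:.0f} h, beta = {beta_test:.2f}, MTTF = {weibull_mttf(eta_test, beta_test):.0f} h")
```

With censored data, a maximum likelihood fit would normally be preferred over rank regression.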

Use PoF to understand failure mechanism > Use LSR [Nf = f(stress)] > Lifetime distribution > Calculate rel metrics

ALT to get TTF > TTF distribution (Test) > Estimate LSR model parameters > Lifetime distribution > Calculate rel metrics

Extrapolating Accelerated Test TTF to Predict Use Level Lifetime

Collect TTF at multiple accelerated stress levels.
Fit the TTF data to a life distribution (e.g., 2P Weibull) for each stress level and estimate the parameters - each distribution has a different eta but the same beta (for the same failure mechanism).

The life distribution characteristic (eta) is a function of stress and is related to the stress level through a given Life-Stress Relationship (LSR), e.g. the Eyring model.
Estimate the model constants from the fitted eta values.

Knowing the model constants, estimate eta at various stress levels, including the use level. With beta known (for the given failure mechanism), the use-level life distribution can then be predicted.
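
The extrapolation can be sketched as below. For simplicity the example swaps in the Arrhenius model, eta = A * exp(Ea / (k * T)), in place of the Eyring model named above, and fits it by least squares on ln(eta) vs. 1 / (k * T). All numbers and function names are hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def fit_arrhenius(temps_c, etas):
    """Least-squares fit of ln(eta) = ln(A) + Ea / (k * T). Returns (A, Ea in eV)."""
    xs = [1.0 / (K_BOLTZMANN_EV * (tc + 273.15)) for tc in temps_c]
    ys = [math.log(eta) for eta in etas]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ea = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    ln_a = y_bar - ea * x_bar
    return math.exp(ln_a), ea


def eta_at(temp_c, a, ea):
    """Characteristic life predicted by the fitted Arrhenius model at temp_c."""
    return a * math.exp(ea / (K_BOLTZMANN_EV * (temp_c + 273.15)))


def weibull_reliability(t, eta, beta):
    """R(t) for a 2P Weibull."""
    return math.exp(-((t / eta) ** beta))


# Hypothetical eta values (hours) fitted at three test temperatures, common beta
a, ea = fit_arrhenius([125.0, 150.0, 175.0], [7000.0, 2100.0, 720.0])
eta_use = eta_at(55.0, a, ea)
beta = 2.0  # assumed constant for the failure mechanism
print(f"Ea = {ea:.2f} eV, eta(use) = {eta_use:.3g} h")
print(f"R(10 years) = {weibull_reliability(10 * 8760.0, eta_use, beta):.4f}")
```

The same structure applies to any other LSR: linearize the model, regress on the fitted eta values, then evaluate the model at the use-level stress.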


Environmental Stress Screens (ESS)

Phase | Test | Stress | Response | Metric | ESS | Notes
Product Qual | HTOL | Elevated | ELFR / IM & Useful Life | FIT or dppm | HALT | Applies during design / development
Production/ORM | BI on SLT | Elevated | ELFR / IM | FIT or dppm | HASS | Applies during production, stresses closer to operating envelopes
Production/Quality | SLT | Normal | Failure rate | AQL or LTPD | | <AQL = Definition of Good Lot; >LTPD = Definition of Bad Lot








Product Qualification

BI/HTOL/LU/ESD(HBM/CDM)/THB or BHAST/PCT

BI/HTOL is used to calculate ELFR & DPPM during useful life, which can then be used to generate the FIT rate

Device level : ESD/LU, HTOL, EM
Pkg level: TCG/B/J, UHAST, BHAST, HTS, THB (85/85), Autoclave, Preconditioning, MR/Hammer
BLRT: TC, bend, drop/shock, vibration
Functional: Acoustic, Electromagnetic, Electrical compliance

BLRT

Strain gauges, electrically monitor resistance of daisy chained balls
Temperature cycling: -40 to 125C
Vibration: Axes, RMS acceleration/displacement/velocity, 3G, time per axis
Drop / Shock: Axes, no of pulses, 200G -1500G, ms of pulse
Monotonic or cyclic bend
Dye & Pry



Reliability Standards
JEDEC, IPC, J-STD
JEP, JESD
MIL-883 / PRF 38535
NEMI, ASTM, JIS, EIA



Fails - Exact or Censored

Failure times are either EXACT or CENSORED

Right censored or suspended = Unit has survived when test is stopped

Interval censored = Unit has failed between "Last Inspection" and now

Left censored = Similar to Interval Censoring but "Last Inspection"/Start is 0
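
One way to see why the censoring type matters is how each observation enters a likelihood. The sketch below assumes a 2P Weibull life distribution and a hypothetical (kind, t_low, t_high) tuple layout for observations. Neither is prescribed by the notes.

```python
import math


def weibull_cdf(t, eta, beta):
    """F(t): probability of failure by time t."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def weibull_pdf(t, eta, beta):
    """f(t): failure density at time t."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))


def log_likelihood(observations, eta, beta):
    """Sum log-likelihood contributions for exact and censored observations.

    Each observation is (kind, t_low, t_high):
      exact:    failure at t_high (t_low unused)        -> ln f(t)
      right:    still surviving at t_low (t_high unused) -> ln R(t)
      interval: failed between t_low and t_high          -> ln [F(t_high) - F(t_low)]
      left:     failed before first inspection at t_high -> ln F(t_high)
    """
    total = 0.0
    for kind, t_low, t_high in observations:
        if kind == "exact":
            total += math.log(weibull_pdf(t_high, eta, beta))
        elif kind == "right":
            total += math.log(1.0 - weibull_cdf(t_low, eta, beta))
        elif kind == "interval":
            total += math.log(weibull_cdf(t_high, eta, beta) - weibull_cdf(t_low, eta, beta))
        elif kind == "left":
            total += math.log(weibull_cdf(t_high, eta, beta))
        else:
            raise ValueError(f"unknown observation kind: {kind}")
    return total


# Hypothetical test readout (hours)
data = [
    ("exact", None, 820.0),
    ("interval", 500.0, 750.0),
    ("left", 0.0, 250.0),
    ("right", 1000.0, None),
]
print(log_likelihood(data, eta=900.0, beta=1.8))
```

Maximizing this function over eta and beta gives the maximum likelihood fit. A left-censored point is the interval case with t_low = 0.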



FIT Rates

h(t) = hazard rate = f(t)/R(t)
The hazard rate is the instantaneous failure rate. In practice, the average failure rate over a given time period (e.g., failures per hour) is usually more useful.

FIT rate is defined as ppm failure per 1000 hours or number of fails per 10^9 device hours.

Multiply hourly fail-rate by 10^9 to generate FIT rate.

Hourly fail rate = No of rejects / (No of devices x No of hours x AF)

FIT rate = Hourly fail rate x 10^9

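The number of rejects used in the calculation is the upper confidence bound on the observed reject count r, taken from the chi-square distribution as chi^2(CL, 2r + 2) / 2, with degrees of freedom 2r + 2 and a confidence level CL that is commonly 60%.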
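
The FIT calculation can be sketched with the standard library only. For the even degrees of freedom used here (2r + 2), the chi-square quantile divided by 2 equals the Poisson mean m that satisfies sum_{i=0..r} exp(-m) m^i / i! = 1 - CL, so the sketch solves for m by bisection instead of calling a statistics package. The function names and sample numbers are hypothetical.

```python
import math


def effective_rejects(r, confidence=0.60):
    """Return chi^2(confidence, 2r + 2) / 2 for r observed rejects."""
    target = 1.0 - confidence

    def poisson_cdf(m):
        # P(X <= r) for a Poisson variable with mean m (decreasing in m)
        return sum(math.exp(-m) * m ** i / math.factorial(i) for i in range(r + 1))

    lo, hi = 0.0, 1.0
    while poisson_cdf(hi) > target:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):             # bisection
        mid = 0.5 * (lo + hi)
        if poisson_cdf(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_rate(r, n_devices, hours, af, confidence=0.60):
    """FIT = failures per 10^9 device-hours at use conditions."""
    hourly_fail_rate = effective_rejects(r, confidence) / (n_devices * hours * af)
    return hourly_fail_rate * 1e9


# Hypothetical HTOL result: 0 rejects, 231 devices, 1000 hours, AF = 78
print(f"FIT = {fit_rate(0, 231, 1000.0, 78.0):.1f}")
```

With zero rejects the effective reject count is -ln(1 - CL), about 0.92 at 60% confidence, which is why a clean test still yields a non-zero FIT rate.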

Power cycling

Motherboard, mux board & fan controller board - design & fabrication, DAQ system, Labview software, package + socket + heat sink + wind tunnel/flow channel, system integration, system setup & debug, calibration & testing.

Failure Mechanisms

Overstress: Stress/Strength interference
Mechanical: Brittle fracture, plastic yielding
Electrical: EOS/ESD, dielectric breakdown
Thermal: LU/Thermal runaway, glass transition

Wearout: Cumulative Damage / Endurance Limit
Mechanical: Solder joint fatigue, Creep
Electrical: EM
Thermal: IMC formation / Kirkendall voiding
Chemical: Corrosion, ECM, CFF
Thermo-mechanical: Delamination

Package failure mechanisms
UF delamination
UF fillet cracking
Solder joint fatigue
Bulk die cracking
ILD (ULK/ELK) delamination
UF voids / solder creep / extrusion
Corrosion


LCF: Plastic / high strain range
Coffin-Manson: Nf = f(strain range)
Engelmaier: Nf = f(damage from plastic & viscoplastic deformation) = f(temp swing)
Norris-Landzberg: Nf = Exponential / Arrhenius term x Engelmaier model
HCF: Elastic / low strain range
Basquin's Law: Nf = f(stress amplitude)


The Coffin-Manson model predicts lifetime for LCF failure of solder joints, assuming an IPL model for non-thermal mechanical stress (the strain range per cycle).

Engelmaier improves on the CM model by showing that LCF lifetime is related to the damage per cycle from plastic and viscoplastic deformation, which in turn is related to the temperature swing per cycle.

Norris-Landzberg adds an Arrhenius / exponential term to the Engelmaier model to show that LCF is also temperature dependent, i.e. a thermo-mechanical failure mechanism.

In short, the N-L model adds an exponential / Arrhenius term to the Engelmaier model (which is itself based on the CM / IPL model) to predict Nf for LCF failure in solder joint fatigue.
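
As an illustration, the commonly quoted acceleration-factor form of the Norris-Landzberg model combines a temperature-swing (IPL) term, a cycling-frequency term and an Arrhenius term on the peak temperature. The sketch below uses that form. The default constants (n = 1.9, m = 1/3, Ea/k = 1414 K) are the values usually cited for SnPb solder and are an assumption here, not taken from the notes. Other solder alloys need their own fitted constants.

```python
import math


def norris_landzberg_af(dt_test, dt_use, f_test, f_use, tmax_test_c, tmax_use_c,
                        n=1.9, m=1.0 / 3.0, ea_over_k=1414.0):
    """Acceleration factor AF = Nf(use) / Nf(test) for thermal cycling.

    dt_*   : temperature swing per cycle (deg C or K)
    f_*    : cycling frequency (same units for test and use, e.g. cycles/day)
    tmax_* : peak cycle temperature in deg C
    n, m, ea_over_k : assumed SnPb constants (ea_over_k in kelvin)
    """
    swing_term = (dt_test / dt_use) ** n
    frequency_term = (f_use / f_test) ** m
    tmax_test_k = tmax_test_c + 273.15
    tmax_use_k = tmax_use_c + 273.15
    arrhenius_term = math.exp(ea_over_k * (1.0 / tmax_use_k - 1.0 / tmax_test_k))
    return swing_term * frequency_term * arrhenius_term


# Hypothetical conditions: test cycles -40 to 125 C at 24 cycles/day,
# field sees a 40 C swing peaking at 60 C at 2 cycles/day
af = norris_landzberg_af(dt_test=165.0, dt_use=40.0, f_test=24.0, f_use=2.0,
                         tmax_test_c=125.0, tmax_use_c=60.0)
print(f"AF = {af:.1f}")
```

Multiplying the cycles to failure observed in test by AF gives the estimated cycles to failure in the field, under the assumption that the same failure mechanism operates in both conditions.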





Failure Analysis Techniques: Resolution

STM, AFM, EELS, SIMS, TEM (Angstroms) < AES (2 nm) < XPS (5 nm) < SEM (10 nm) < BSE/EBSD (30 nm) < EDX/WDX/XRF (0.3 µm) < FTIR (3 µm)



Failure Analysis: Tools & Techniques

Microstructural analysis:
Topography: SEM (low voltage, inelastic collisions, higher resolution, low contrast) /BSE (high voltage, elastic collisions, lower resolution, high contrast)
Morphology: (lattice geometry, crystallographic structure) EBSD/TEM/AFM/STM

Material analysis:
Elemental: EDX/WDX/XRF
Chemical: (structural bonds, oxidation states) AES/XPS/EELS/SIMS/FTIR

Microstructural
Topography: SEM, BSE
Morphology (lattice geometry, crystallographic structure): TEM, EBSD, AFM, STM
Material
Elemental: EDX, XRF, WDX
Chemical (oxidation states, bonding): FTIR, Auger, XPS, SIMS/EELS

Interaction between primary electrons & matter: SEM, TEM, BSE, EBSD, EDX & WDX
Interaction between primary X-Rays & matter: XPS, AES, XRF

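Incident Beam | Resultant emission | FA Method
Primary electrons | Secondary electrons | SEM
Primary electrons | Backscattered electrons | BSE / EBSD
Primary electrons | X-Rays | EDX, WDX
Primary electrons | Transmission electrons | TEM
X-Rays | Secondary X-rays | XRF
X-Rays | Photoelectrons | XPS
X-Rays | Auger | AES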

SEM: Inelastic collisions, low energy, low contrast, higher resolution
BSE:  Elastic collisions, high energy, high contrast, low resolution


Other techniques: Opticals, X-Rays, CSAM, Curve Trace, TDR, IR & thermal imaging, SQUID, LSM (LIVA/OBIC for opens & TIVA/OBIRCH for shorts), x-sections, P-laps & FIB cuts

Flow: C/T, TDR, Delid/optical, X-Ray, CSAM, XS, SEM/EDX, FIB, P-lap, SQUID, LSM




