Monday, November 2, 2015

Hypothesis Testing in JMP


One sample t-test
Used to compare mean of 1 population/sample to a hypothesized value
Null hypothesis states that population/sample mean is equal to hypothesized value
[Analyze-> Distribution, select TEST MEAN]


Two sample t-test
Used to compare means of 2 populations/samples with each other. Continuous variable (Y) v/s 2-level single categorical variable/factor (X)
Null hypothesis states that means of the 2 populations/samples are equal to each other
[Analyze -> Fit Y by X, select MEANS/ANOVA/POOLED-T or T-TEST]


One Way Anova
Used to compare means of 3 or more populations/samples. Continuous variable (Y) v/s single categorical variable with 3 or more levels (X)
Null hypothesis states that means of all populations/samples/levels are equal to each other.
[Analyze -> Fit Y by X, select MEANS/ANOVA + COMPARE MEANS]


Two Way (Factorial) ANOVA
Used to study effects of 2 categorical variables/factors and their interaction on single response. Continuous variable (Y) v/s two categorical variables (X1, X2)
Null hypothesis states that factors do not have a significant effect
[Analyze -> Fit Model, select LS MEANS PLOT]
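For a quick cross-check outside JMP, the t statistics behind the first two tests can be computed by hand. A minimal pure-Python sketch (the sample data is made up; compare |t| to the critical value for the chosen alpha and df):

```python
import math
import statistics

def one_sample_t(data, mu0):
    """t statistic for H0: population mean == mu0 (hypothesized value)."""
    n = len(data)
    return (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))

def pooled_two_sample_t(a, b):
    """Pooled-variance t statistic for H0: mean(a) == mean(b)."""
    na, nb = len(a), len(b)
    # Pooled sample variance, df = na + nb - 2
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical data, for illustration only
t1 = one_sample_t([9.8, 10.2, 10.1, 9.9, 10.4], 10.0)          # df = 4
t2 = pooled_two_sample_t([9.8, 10.2, 10.1], [11.0, 11.3, 10.9])  # df = 4
```

This mirrors what JMP's TEST MEAN and MEANS/ANOVA/POOLED-T options report for the test statistic; JMP additionally supplies the p-value from the t distribution.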

Thursday, October 29, 2015

JMP Data Types & Graphs


Variables are either Continuous or Categorical (Nominal or Ordinal).
Continuous (Y) v/s Continuous (X) --> BIVARIATE
Continuous (Y) v/s Categorical (X) --> ONE WAY ANOVA

Categorical(Y) v/s Categorical (X) --> CONTINGENCY
Categorical (Y) v/s Continuous (X) --> LOGISTIC

Monday, October 5, 2015

HPC & CoProcessors


Traditionally, the HPC environment pairs non-bootable coprocessors or accelerators with x86 server processors to offload tasks, interfacing through PCIe. These are typically GPU-class SoCs, since they provide high-bandwidth performance. The GPU SoC needs GDDR5 while the server processors need their own memory (DDR4), which increases system complexity and power consumption. Although this approach is good for certain applications, it has severe limitations in other cases, especially with a large mix of single-threaded, serial & serial/parallel mixed applications. NVIDIA Tesla (K40) & Intel Knights Corner are examples of this implementation. Intel's Knights Landing is a paradigm shift in HPC, since it eliminates the need for a coprocessor: a fully bootable x86 server/accelerator fused with high-capacity DDR4 controllers on a single SoC, packaged as an MCM with 8 stacks of HMC-based multi-channel DRAM (up to 16GB). This reduces system complexity and improves system efficiency.

Sunday, August 30, 2015

Bump Electromigration


Run kinetic study at multiple currents & temperatures to estimate Ea (activation energy) & n (current density exponent) in Black's equation:
MTTF = A J^(-n) exp[Ea/(kT)], where k is Boltzmann's constant and T is absolute temperature
Resistance of the EM device is measured using 4-wire Kelvin structure, and an increase in R over a predefined threshold is considered to be a failure.
As a first step, the temperature sensor (that may be an on-die resistor or diode) is calibrated in the oven (set at a certain temp, for a low constant current) to generate R v/s temp (for the resistor) or V v/s temp (for the diode) to estimate TCR.
Then the oven is set to stress temperatures, and EM test structure is powered at stress current, to note initial change in R. Knowing TCR, delta R is equated to delta T. This is 'Joule Heating'.
Device temp = Oven Temp + DeltaT (Joule heating)
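With the TCR from calibration, converting delta R to delta T is simple arithmetic. A minimal sketch, where the R0, TCR and oven values are illustrative assumptions, not measured data:

```python
def joule_heating_delta_t(r_stress, r0, tcr):
    """Convert the resistance rise under stress current to a temperature rise.
    tcr is the temperature coefficient of resistance (1/degC) from oven calibration."""
    return (r_stress - r0) / (r0 * tcr)

# Illustrative values (assumed)
r0 = 10.00         # ohms at oven temp, measured with low sense current
r_stress = 10.35   # ohms under stress current
tcr = 3.5e-3       # 1/degC, from the R v/s temp calibration
oven_temp = 150.0  # degC (stress temperature)

delta_t = joule_heating_delta_t(r_stress, r0, tcr)  # Joule heating, degC
device_temp = oven_temp + delta_t                   # Device temp = Oven temp + deltaT
```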
TTF data is collected and fit to an appropriate statistical distribution, which is then used to estimate MTTF from CDF plot (Cumulative % fails v/s time). Common distributions used to model EM data are Lognormal & Weibull.
Lognormal: sigma is the shape parameter, a measure of distribution width.
Weibull: beta is the shape parameter, eta is the characteristic life (time by which 63.2% fail).
MTTF obtained at multiple stress conditions (current & temp), based on the fitted distribution parameters, is used to estimate Ea & n in Black's equation. Once Ea & n have been estimated, the lifetime/usage requirement (e.g. 95C device temp & 5-year life) is used to predict Imax.
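The last step, going from fitted Black's equation parameters to a current limit, can be sketched numerically. The A, n, Ea values below are hypothetical placeholders, not fitted data:

```python
import math

K_BOLTZMANN = 8.617e-5  # eV/K

def mttf_black(j, temp_c, a, n, ea):
    """Black's equation: MTTF = A * J^-n * exp(Ea / (k*T)), T in kelvin."""
    t_k = temp_c + 273.15
    return a * j ** (-n) * math.exp(ea / (K_BOLTZMANN * t_k))

def jmax_black(mttf_target, temp_c, a, n, ea):
    """Invert Black's equation for the max current density meeting an MTTF target."""
    t_k = temp_c + 273.15
    return (a * math.exp(ea / (K_BOLTZMANN * t_k)) / mttf_target) ** (1.0 / n)

# Hypothetical fitted parameters and a 95C / 5-year use condition
a, n, ea = 1e-3, 1.8, 0.85   # A in units consistent with J and hours; Ea in eV
five_years_h = 5 * 365 * 24  # target lifetime in hours
jmax = jmax_black(five_years_h, 95.0, a, n, ea)
```

Imax then follows from Jmax and the bump cross-sectional area.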

Local TIM Resistance Extraction


Run experiment on TTV at given P, Ta - and measure Tc (at lid center) and Tj at multiple die locations(local).
Set up a thermal/FEA model and adjust the heat transfer coefficient until the same Tc is obtained for the given Ta and P.
The model predicts Tj (local) by estimating deltaT as P x Theta-jc (local), where Theta-jc (local) = [Tj(local) - Tc]/P from the experiment.
Tj (local) = Tc + delta T
Force Tjlocal predicted from model to match Tjlocal from experiment, by adjusting local TIM1 thermal resistance.
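The extraction arithmetic above reduces to two small formulas. A sketch with illustrative (assumed) P, Tc, Tj values, not real TTV measurements:

```python
def theta_jc_local(tj_local, tc, power):
    """Local junction-to-case resistance from the TTV experiment (degC/W)."""
    return (tj_local - tc) / power

def tj_local_predicted(tc, power, theta_jc):
    """Model-side prediction: Tj(local) = Tc + deltaT, with deltaT = P x Theta-jc(local)."""
    return tc + power * theta_jc

# Illustrative TTV readings (assumed)
power, tc, tj_meas = 100.0, 55.0, 72.0        # W, degC, degC
theta = theta_jc_local(tj_meas, tc, power)    # degC/W at this die location
tj_model = tj_local_predicted(tc, power, theta)
```

In the actual flow, the local TIM1 resistance in the FEA model is adjusted until tj_model matches the measured Tj at each location.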

Sunday, August 23, 2015

Thick & thin lids : Impact on Thermal R


Theta-ja (local) = Theta jc (l) + Theta cs (l) + Theta sa (l)
From center to corner, Theta jc (l) increases, but due to 3D heat flow/ lateral heat spreading - theta cs (l) & theta sa (l) actually decrease. The resultant impact is that theta ja (l) also diminishes from center to corner.
This effect is more pronounced on thicker lids v/s thinner lids, since thicker lids allow more lateral heat spreading
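The local stack-up is a simple series sum. Hypothetical center/corner values (assumptions, for illustration) show how theta-ja can drop toward the corner even as theta-jc rises:

```python
def theta_ja(theta_jc, theta_cs, theta_sa):
    """Series sum of local junction-to-case, case-to-sink, sink-to-ambient resistances (degC/W)."""
    return theta_jc + theta_cs + theta_sa

# Hypothetical local values: theta-jc rises from center to corner, but lateral
# heat spreading lowers theta-cs and theta-sa by more, so theta-ja falls.
center = theta_ja(0.15, 0.10, 0.30)   # 0.55 degC/W
corner = theta_ja(0.22, 0.07, 0.20)   # 0.49 degC/W
```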

Board components


SoC/Processor, logic chipsets (Southbridge), memory modules (DIMM, SODIMM, etc), PCI (network, storage)/AGP (graphic cards/GPU), SATA/IDE, power supplies & PMIC, heat sink, fans, wind tunnel / flow channel

Saturday, August 22, 2015

MS&V


Vibration tests: JESD22-B103B

Swept sine tests: SHM with logarithmic sweep of frequency range from min-to-max, 4 times (4 mins each time) for each of X, Y, Z axes.
Peak acceleration (G), Peak displacement, crossover freq, freq range -for each use condition.
Example: Service condition 1, 20 Hz-2 kHz, 4 mins/cycle, 4 cycles/axis, 3 axes.

Random vibration: vibration applied for 30 minutes in each of the 3 orthogonal axes, X, Y, Z
RMS acceleration (G), RMS displacement, RMS velocity, - for each use condition
For each use condition, the selected test parameters above result in PSD values for different frequencies, generating plots/profiles showing PSD (intensity of acceleration power, measured in G^2/Hz) variation across a range of frequencies (2-500Hz).
The area under the PSD-frequency plot is the mean-square acceleration (G^2); its square root is the RMS acceleration, Grms (G)
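Grms follows from numerically integrating the PSD profile and taking the square root. A minimal sketch with a hypothetical flat 0.01 G^2/Hz profile over 2-500 Hz (real test profiles are piecewise, often specified on log-log axes, where segment-slope integration is more accurate than trapezoids):

```python
def grms_from_psd(freqs_hz, psd_g2_per_hz):
    """RMS acceleration (G) from a PSD profile: Grms = sqrt(area under PSD v/s frequency).
    Simple trapezoidal integration over linear frequency."""
    area = 0.0
    for i in range(1, len(freqs_hz)):
        df = freqs_hz[i] - freqs_hz[i - 1]
        area += 0.5 * (psd_g2_per_hz[i] + psd_g2_per_hz[i - 1]) * df
    return area ** 0.5

# Hypothetical flat profile: 0.01 G^2/Hz from 2 to 500 Hz
grms = grms_from_psd([2.0, 500.0], [0.01, 0.01])  # sqrt(4.98) ~ 2.23 G
```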

Mechanical Shock JESD22-B110B:
Component or sub-assembly free-state: 5 shocks x 3 axes (X, Y, Z) x 2 directions/axis, minimum (total) 30 shocks
Sub-assembly mounted: 2 shocks x 3 axes (X, Y, Z) x 2 directions/axis, minimum (total) 12 shocks
Peak acceleration (G), pulse duration, velocity change, equivalent drop height.
Example: Service condition B, 1500G, 0.5 ms, 5 shocks/direction, 6 directions (3 axes x 2 directions/axis)
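For an ideal half-sine shock pulse, the velocity change and equivalent drop height follow directly from peak acceleration and pulse duration. A sketch for the 1500G / 0.5 ms example (idealized pulse shape assumed):

```python
import math

G0 = 9.80665  # standard gravity, m/s^2

def half_sine_velocity_change(peak_g, duration_s):
    """Velocity change of an ideal half-sine pulse: dV = (2/pi) * a_peak * t."""
    return (2.0 / math.pi) * peak_g * G0 * duration_s

def equivalent_drop_height(delta_v):
    """Free-fall height producing the same velocity change (no rebound): h = dV^2 / (2g)."""
    return delta_v ** 2 / (2.0 * G0)

# Service condition B: 1500 G peak, 0.5 ms half-sine
dv = half_sine_velocity_change(1500.0, 0.5e-3)  # ~4.68 m/s
h = equivalent_drop_height(dv)                  # ~1.12 m
```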

Sunday, August 16, 2015

TCNCP process profile


Tpre-heat -> Tstage -> initial high-speed approach up to search height -> approach at search speed -> contact (min force) -> force ramp to max force (pressurization), with bond head temp increased from stage (standby) temp to Tcontact -> temp ramp to Tpeak or Tcure -> cooldown and head disengagement.
Tbond-head(standby) = T stage -> to minimize any risk of NCP entrapment
Tbond-head(standby) = T contact -> for best throughput
Tbond-head(standby) > T contact -> risk of NCP entrapment

TCNCP with OSP based surface finish


TCNCP requires an OSP surface finish (low cost, fine-pitch process margin v/s Sn-based finishes, controlled solder wetting v/s EPIG finish). However, this requires fluxing agents in the NCP material to remove oxides during thermo-compression bonding. It is also critical to remove any residual OSP from the pad, since OSP can combine with by-products of the fluxing reaction and lead to NCP entrapment. This therefore requires an OSP pre-clean step, comprising plasma treatment & chemical deflux, ahead of TCB.

Thursday, August 13, 2015

Power cycling


Motherboard, mux board & fan controller board - design & fabrication, DAQ system, Labview software, package + socket + heat sink + wind tunnel/flow channel, system integration, system setup & debug, calibration & testing.

Wednesday, August 12, 2015

PoP Evolution


PoP is a popular package configuration for smartphones, since this format allows integration of AP/baseband logic with DDR (DRAM) under tight space constraints (footprint & height) while maximizing performance (high speed & bandwidth for memory). Over time, it has evolved from WB-PoP to FC-PoP, and now from BD-PoP to MLP-PoP.

Increasing performance requirements drive larger logic die and package body sizes, and thin-core or coreless/ETS substrates, all of which increase warpage and make meeting tightening coplanarity requirements very challenging. Package height constraints further necessitate thin die/substrates, compounding the coplanarity/warpage concerns.

In addition, higher memory performance drives the need for fine-pitch memory, which requires smaller ball sizes; this translates to smaller collapsed height, or clearance, between the top & bottom packages.

MLP-PoP is an approach that (1) alleviates the above warpage & coplanarity concerns by using an overmold that adds structural robustness to the package, and (2) enables fine-pitch memory by improving the clearance between the 2 packages without needing excessive die thinning.

This may be MLP-ED (exposed die) or MLP-OM (overmold). ED reduces overall package thickness, but slightly higher coplanarity/warpage is the resulting tradeoff. Further, this may be MLP-CUF v/s MLP-MUF.

However, MLP-PoP requires additional molding processes and comes at a cost premium. A lower-cost alternative is BD-PoP with CuBOL for the bottom package, which increases package-to-package clearance by reducing die-to-substrate standoff.

Saturday, August 8, 2015

L-Gate methodology


L-Gate: Technology/Product Development
L-1: Explore / PC1
L 0: Define / PC2 & T/O
L 1: Enable/BKM determination
L 2: Implement/BKM optimization & corner
L 3: Qualify/BKM validate
L 4: Ramp/PRU
L 5: Production/HVM

Saturday, August 1, 2015

TSV process

Front side:
Etch/Dielectric liner/barrier/seed/fill/RDL/passivation/landing pad

Back side:
Temp bond/backgrind & TSV reveal/MEOL/passivation/bump & debond

2.5D flows: CoW v/s CoS

2 primary flows: CoS and CoW (or CoC)
CoW/CoC may be chip-first (attach before interposer MEOL) or chip-last (after interposer MEOL)
Chip-first requires committing expensive die to the interposer without knowing interposer yield, but allows chip attach on full-thickness wafers. [Concern: Assembly yield]
Chip-last uses KGD & a finished interposer (or KGI) and therefore promises higher assembly yield, but requires thin interposer wafer handling (WSS), which increases assembly cost. [Concern: Assembly cost]
CoS leverages existing flip-chip assembly infrastructure and allows test insertion before committing expensive BOM (logic/ASIC/memory die), but attaching the large interposer to the substrate first generates warpage concerns that may challenge ASIC/logic/memory die attach to the interposer. [Concern: Assembly yield for large die]

Friday, July 31, 2015

2.5D adoption


CMOS image sensors/DSP, DRAM/flash memory stacks -> early adopters (Sony, SK Hynix, Micron, Samsung, Toshiba)
High performance FPGA integration (Xilinx Virtex series)
High performance graphics (AMD Fiji/Fury)
Servers & networking (Altera, IBM, Huawei, Cisco)

Thursday, July 30, 2015

Copper Pillar: why, when & where


Mass reflow w/CUF (CuP) or Thermo compression bond (TCB/TCFC) -> TC-CUF, TC-NCP, TC-NCF
Electrical performance (current-carrying capability, EM, fine-pitch requirements) is driving CuP.
Mass reflow w/ Cu pillar (CuP) maximizes UPH and therefore minimizes assembly cost. However, for large-die packages, MR generates warpage concerns that can result in ELK/CPI issues. To counter this and lower package stresses, TCB or TCFC are process alternatives (but are low-UPH options that increase assembly cost/complexity):
TC-CUF: relatively lower cost alternative, but capillary UF flow may need vacuum underfill or pressure curing
TC-NCP: relatively medium cost alternative
TC-NCF: highest cost, application limited to thin die/ smaller FF packages/ ultra fine pitch
Shrinking pitch & die size: MR + Pb-free solder, MR + CuP/CuBOL, TC-CUF, TCNCP, TCNCF

Wednesday, July 29, 2015

AMD


Opteron 3000: low power (25-65W), low cost, 1S server, Piledriver core
Delhi, Orochi AM3

Opteron 4000: mainstream (35-95W), 1S or 2S, Piledriver core
Seoul, Orochi C32

Opteron 6000: high performance (100-150W), 2S or 4S, Piledriver core
Abu Dhabi, Orochi G34 -> Warsaw refresh

Opteron X: Kyoto, low power (11-22W), microserver (Jaguar core / mobile class)
Opteron A: Seattle, ARM based server(25W), ARMv8 64-bit Cortex A57
Roadmap:
Berlin (Steamroller core) & Toronto (Excavator core)
Next-gen microarchitectures: Zen (14/16nm x86 core) and K12 (ARMv8)

Intel


New u-architecture = TOCK; process node shrink = TICK
Haswell (22nm TOCK), Broadwell (14nm TICK)
Atom (Silvermont 22nm) -mobile SoC based core, optimized for low power, energy-efficient, scaleout applications
Xeon (Haswell 22nm or Broadwell 14nm) -> D (low-end/high-density servers/microservers), E3 (desktop-client class/1S), E5 (mainstream or enhanced performance-EP/2S or 4S), E7 (mission critical or EX-expandability, 4S+)
Itanium (65nm & 32nm), implementing the IA-64 instruction set, which is not x86-compatible
uP specs: clock speed GHz, cache MB, DRAM bandwidth GB/s, TDP W, # cores & threads, bit width
SKYLAKE: 14nm TOCK (microarchitecture change), CANNONLAKE: 10nm TICK (node shrink)
Denverton (Airmont 14nm) is the follow-on to Atom (Silvermont 22nm)

Microservers


Low power rack-mount high density modular approach, with each module or "sled" or "cartridge" being a full server, that may either be single or dual socketed
More suited to "scaleout" applications that rely on increasing/deploying more nodes & modules instead of "scaleup" that depend on boosting performance by increasing clock speed or adding more CPU cores. Cloud based solutions rely on distributed applications/computing and therefore tend to be supported by microservers - that improve power efficiency, redundancy, availability & cost management.

Tuesday, July 28, 2015

GSM, GPRS, EDGE (2.5G) -> TDMA
CDMA, WCDMA, EVDO(3G), HSPA(3.5G)
LTE (3.9G)

ARMv8 64-bit


ARMv8 is a new ISA (Instruction Set Architecture) that enables 64-bit computing, improving CPU performance and paving the path for ARM-based SoCs to compete with x86 in servers. Cortex A57 and Cortex A72 are standard ARMv8 cores, but several custom ARMv8 cores are being developed across the industry.

Server 101


Server system technology
Platform = SoC processor + logic chipsets (North Bridge/South Bridge) + DIMM + Baseboard Management Controller + PCB + Graphic Cards

North Bridge = Memory controller + PCIe controller
South Bridge = SATA (HDD & SSD)/USB/Ethernet/MAC/Flash/SPI (serial peripheral interface)/LPC(low pin count)
SoC = CPU [+ GPU] + memory (cache) + memory controller + I/O controller (PCIe, SATA, USB, Gbe) + system bus
Baseboard management controller provides for remote monitoring/management of server boards by system admin
RAID: Redundant Array of Inexpensive Disks -> applied to HDD/SSD -> connect to processor through SATA interface

Server processor performance attributes: multicore (parallel processing), multithreading (virtualization), clock speed, cache, memory bandwidth & latency, logic chipset, memory subsystem, system bus, technology node (14nm FinFET v/s 16nm), ISA (instruction set architecture: x86 v/s ARM), ECC (error correction code), RAS (reliability, availability & serviceability)

Rack & blade servers: chassis of racks, with each rack supporting servers, switches, networking & storage, peripherals, adapters & cards, power supplies, fans & cooling equipment
1RU or 1U is 1.75" high, 19" wide
Blade server can fit into 19" rack and may be 4U to 10U (or larger) in height - and can support multiple blades, with each blade being processor + logic chipsets + memory + storage + network +I/O controllers/interfaces

Blade servers also include fabric modules + switch modules + power management modules, sometimes on separate cards in a rack-mount chassis

Software: Windows Server supports x86 only, not ARM
Commercially available Linux platforms:
Ubuntu (Canonical) -> supports x86, ARM
Suse Linux Enterprise Server /SLES (Novell) & Red Hat -> supports x86, not validated for ARM
Linaro: industry consortium to develop open source Linux software for ARM based SoC's

Smartphone Components

Antenna + Switch & RFFE, Filter, Duplexer, Amplifier, Transceiver, Baseband, Application Processor [SOC + LPDDR3], Memory [Flash / SSD...