Post Silicon Platform Validation


This involves functional testing & performance validation on pre-production silicon during the SoC/product development NPI cycle, where the SoC is tested across a wide range of operating conditions (temperature, voltage, frequency) targeting multiple use cases & corner conditions (to stress-test the circuit). The goal is to ensure a robust, reliable and secure SoC design that performs & functions as intended per the design. Validation should always be run across a large quantity of devices so that variation within the silicon population is well represented.
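As a rough illustration of how such a condition matrix grows, the sketch below enumerates the corner combinations a test plan would have to schedule. The voltage, temperature and frequency values are assumed for illustration, not taken from the text:

```python
# Hypothetical sketch: enumerating test corners across operating conditions.
from itertools import product

voltages = [0.9, 1.0, 1.1]        # V, nominal +/- 10% (assumed values)
temps = [-40, 25, 125]            # degrees C (assumed corners)
freqs = [800, 1000, 1200]         # MHz (assumed)

# Every combination of voltage, temperature and frequency is one corner.
corners = list(product(voltages, temps, freqs))
print(len(corners))               # 27 corner conditions to schedule
```

Even three values per axis yields 27 corners; adding axes (process skew, use cases) multiplies the run count, which is part of why post-Si validation is so cycle-rich.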

The tests run during post-Si validation may be random in nature or directed/focused & feature-oriented. The objective of these tests is to ensure functional correctness while verifying performance envelopes within power constraints & noise margins. This may test the ISA/microarchitecture, run memory stack validation, and check cache coherency, memory sequencing & synchronization, I/O concurrency, bus & interface speeds, loads & bandwidths, and hardware/software/firmware issues. SI/PDN issues are analyzed and any digital or analog bugs are identified & fixed. Tests include power-on validation, logic validation, electrical validation, hardware/software compatibility validation & signal path validation (to identify frequency-limited circuits).

This is an iterative, cycle-rich process that includes platform-level interactions and provides real-time results. On the flip side, however, it is one of the most significant bottlenecks during SoC product development due to its time & resource requirements. Since validation is run post-Si, it comes late in the NPI cycle, where development schedules are up against an aggressive time to market and any rework/design respins can be extremely costly. Analyzing results from post-silicon validation (including identification of bugs, debug & troubleshooting) can be resource intensive, which is further compounded by the complexity of today's SoC designs: multiple IP blocks, billions of transistors, several communication fabrics, interfaces and interconnects.

Other challenges in SoC validation:

- There are many ways signals can run through complex circuits, but responses can only be tracked/monitored at a limited number of observation points (package pins, test pads or memory locations), which have to be pre-designed & allocated during pre-silicon design & development. These are typically assigned based on design experience/expertise, but if observation points for any key signals are missed during design, valuable information is lost during validation and/or expensive rework is needed.

Speed of execution: Post-Si validation > pre-Si hardware emulation (FPGA prototyping) > pre-Si software simulation (RTL models, which run the slowest)
Observability & controllability: Software/RTL sims (best) > hardware emulation > post-Si validation

- During pre-silicon verification, bugs that are found may be resolved sequentially. During validation, however, sequential troubleshooting is not the preferred path due to aggressive scheduling needs. Instead, bugs, once found, need to be contained with workarounds that allow validation activities to continue, suppressing the current bug without masking other bugs from being discovered or generating any undesirable product behaviour.

- Bugs may not always be reproducible, since they can be masked by statistical variation or noise tolerances around the signal.

- Security assurance & power management features tend to have competing/conflicting needs with validation. DFT structures (needed for debug & troubleshooting) can be targeted by hackers and thus pose a security risk. Power management switches off sections of the circuit that may have an interaction with, or dependency on, the occurrence of specific bugs, which limits troubleshooting visibility during validation.

- Application software used for data analysis during validation runs on an unstable platform (chip motherboard) that has not fully matured and is still under development. This can result in a high level of interaction between the analysis software and the underlying platform, generating false fails and debug issues.


Debug Test Methodology

1. Test Plan
This documents what tests are to be run, what functionalities need to be tested, the validation methodologies & test vectors, the specifications and performance benchmarks, what coverage is needed, and the target use cases and corner conditions. The test plan is developed from the very get-go, early in the design cycle when only high-level architectural specifications may be available, but needs to be progressively fleshed out as the design develops and matures.

2. Test Execution
Run the test; if fails are found, run sanity checks to verify each is a real / true fail.

3. Pre Sighting Analysis
Investigate repeatability and reproducibility of the bugs found, and confirm these are valid bugs. Check if the bugs can be generated across a range of operating conditions --> confirmed & valid bug / sighting.

4. Bug Disposition
Develop a PoA on bug resolution, consult with the SoC design/architecture & verification teams, and generate a workaround / containment plan to continue validation activities while suppressing the effect of the bugs found.

5. Bug Resolution
Root cause analysis & corrective action (CA). Fault isolation: hardware vs software, digital fault vs analog issue. Group issues with similar root causes together, and run a behavioral representation of the issue/bug on a pre-silicon verification model (such as an RTL/HDL simulation) to aid root cause analysis and bug-fixing. Once the CA is identified, validate the fix.


On Chip Test Hardware for Debug & Troubleshooting (DFT Structures)

DFT/DFD structures help analyze both the effects and the root causes of bugs / errors.

- Scan chains (digital flip-flops are all connected in a linear chain, forming a giant shift register. BSDL: Boundary Scan Description Language, a JTAG implementation of I/O boundary scan)

- Trace buffers (used to store values of certain preselected signals during run-time, which can be transported off-chip to aid debug; characterized by trace width: the number of signals that can be tracked, and trace depth: the number of clock cycles over which the signals can be tracked). The size of the trace buffer, for instance 128x2048, indicates 128 signals can be traced over 2048 clock cycles.

- Signal tracing (ATPG: Automatic Test Pattern Generation) generates signal patterns during every clock cycle, which are routed to certain predesignated pins. By continuously monitoring the response at these pins, any deviations from normal behaviour can be tracked and reported.

- Structures / instrumentation to allow data transfer from on-chip memory registers to off-chip using standard communication interfaces such as USB

- Structures / instrumentation to allow on-chip configuration control, such as allowing correction codes and system overrides.
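The 128x2048 trace buffer figure mentioned above translates directly into an on-chip storage cost; a quick back-of-the-envelope sketch:

```python
# Sketch: trace buffer capacity from width x depth (figures from the text).
trace_width = 128     # signals tracked per clock cycle
trace_depth = 2048    # clock cycles recorded
bits = trace_width * trace_depth
kib = bits // 8 // 1024
print(bits, kib)      # 262144 bits -> 32 KiB of on-chip memory
```

Trace width and depth therefore trade off directly against silicon area budgeted for debug.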


There can be millions or billions of signals generated during run time, but observability is finite, limited to a few test pins or memory locations, and therefore only a limited number of signals can be traced/tracked. Tests and DFT structures (scan chains, trace buffers, signal tracing) should be selected so that they can identify/activate bugs and propagate them to the chosen observation points.
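One illustrative (not source-prescribed) way to frame this selection problem is greedy set cover: within the observation-point budget, pick the signals that together expose the most suspected bug scenarios. The signal names and coverage map below are made up purely for the sketch:

```python
# Hypothetical sketch: choosing which signals to route to the few available
# observation points. coverage maps signal -> set of bug scenarios it exposes.
def pick_signals(coverage, budget):
    """Greedy set cover over a signal -> {scenarios} map (mutates coverage)."""
    covered, chosen = set(), []
    for _ in range(budget):
        # pick the signal exposing the most not-yet-covered scenarios
        best = max(coverage, key=lambda s: len(coverage[s] - covered), default=None)
        if best is None or not (coverage[best] - covered):
            break
        chosen.append(best)
        covered |= coverage.pop(best)
    return chosen, covered

coverage = {
    "bus_req":   {"deadlock", "arb_starve"},   # illustrative names
    "cache_hit": {"coherency"},
    "pll_lock":  {"clk_glitch", "deadlock"},
}
chosen, covered = pick_signals(coverage, budget=2)
print(chosen, covered)
```

Real selection is driven by design expertise and coverage analysis; this only sketches the underlying budget-constrained trade-off.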

Scanning provides "frozen observability": the current status, or a snapshot in time.
Tracing provides "streaming observability": a more dynamic picture.
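A minimal sketch of the "frozen observability" idea: flip-flop states captured as a snapshot, then shifted out serially through the scan-out pin, one bit per clock:

```python
# Minimal sketch of a scan chain: flip-flop states shifted out serially
# through a single scan-out pin, one bit per clock cycle.
def scan_out(flop_states):
    """Shift captured flip-flop states out of the chain, starting with
    the flop nearest the scan-out pin (end of the list)."""
    out = []
    chain = list(flop_states)          # copy the captured snapshot
    for _ in range(len(flop_states)):
        out.append(chain.pop())        # bit at the scan-out end leaves the chip
        chain.insert(0, 0)             # scan-in pin shifts in a 0
    return out

captured = [1, 0, 1, 1]                # snapshot of internal state
print(scan_out(captured))              # [1, 1, 0, 1]
```

The whole snapshot takes one clock per flop to unload, which is why scan gives a static picture rather than run-time streaming.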


Debug software

- Specialized OS that may have been developed from the ground up by the validation team, or an off-the-shelf OS adapted/modified into a highly custom software stack, purely intended for validation
- Drivers that allow off-chip transport of info from memory registers using standard communication interfaces (transport software)
- Tools that allow querying/configuring/controlling on-chip test instrumentation such as memory registers & trace buffers, and/or triggering signal tracing (configuration software)
- Application software used to analyze the raw stream of signals from the SoC for higher-level debug data, such as analysis of signal patterns, traffic congestion, power management (analysis software)

Debug testing needs specialized test boards, custom test cards, complex test equipment, tools & peripherals.

DUT, Logic Analyzer, Oscilloscope, Probes, Spectrum Analyzer, TDR, Signal Generator, Sequence Controller, DAQ system


Signal Integrity 

SI depends on signal frequency / speeds, signal path design (impedance loading / matching), power distribution, clock distribution, transmission line effects.

At high speeds & frequencies, circuit behaviour progressively changes from resistive to capacitive to inductive (transmission line effects).
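A common rule of thumb (an assumption, not from the text) for when transmission-line effects kick in: a trace needs transmission-line treatment once its propagation delay exceeds roughly one-sixth of the signal rise time. Assuming ~150 ps/inch for an FR-4 microstrip:

```python
# Rule-of-thumb sketch: critical trace length beyond which a trace must be
# treated as a transmission line. The 150 ps/inch FR-4 figure is an
# approximation assumed for illustration.
def critical_length_in(rise_time_ps, prop_delay_ps_per_in=150.0):
    return rise_time_ps / (6.0 * prop_delay_ps_per_in)

print(round(critical_length_in(900), 2))   # ~1 inch for a 900 ps edge
```

Faster edges shrink the critical length, which is why SI analysis becomes unavoidable as data rates climb.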

Digital issues may be due to analog problems in the signal:
1. Transients such as amplitude droop
2. Noise or jitter
3. Edge issues (overshoot, undershoot, preshoot)
4. Transmission line effects (EMI, coupling, x-talk, reflections, ground bounce)


Signal degradation may be due to flaws in digital logic - or analog issues in the signal.


A good starting point for analyzing digital issues in complex systems with multiple I/Os and buses/interconnects is a Logic Analyzer, which can typically support high channel counts. It generates a timing diagram of the digital pulses and allows timing analysis of the signal.

A logic analyzer functions in state (synchronous) mode, where the DUT's clock determines the sampling rate/frequency, or in timing (asynchronous) mode, where the analyzer's own internal clock does. Logic analyzers are characterized by their sampling rate (or timing resolution), memory depth (info captured per slice during each sampling cycle), and their event-detection and trigger-flexibility features.
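The timing-resolution trade-off can be sketched as follows: a hypothetical sampler (not a real instrument API) running at a fixed internal-clock period can miss a pulse narrower than that period entirely:

```python
# Illustrative sketch of timing-mode sampling: a glitch narrower than the
# sampling period can fall between samples and go unseen.
def sample(transitions, period, duration):
    """transitions: sorted list of (time, new_level); returns sampled levels."""
    level, idx, out = 0, 0, []
    t = 0.0
    while t <= duration:
        # apply all transitions that occurred up to this sample instant
        while idx < len(transitions) and transitions[idx][0] <= t:
            level = transitions[idx][1]
            idx += 1
        out.append(level)
        t += period
    return out

glitch = [(10.2, 1), (10.4, 0)]                    # 0.2 ns pulse
samples = sample(glitch, period=1.0, duration=15.0)
print(samples)                                     # all zeros: glitch missed
```

This is why sampling rate (timing resolution) is a headline spec for a logic analyzer.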

Logic analyzer probes should allow the same probe to sample info in both state & timing modes and capture both the digital and analog components of the signal, so that the analog information can be routed to a digitizing oscilloscope that provides real-time analysis of the analog waveform while the LA runs timing analysis of the digital signal.


An oscilloscope is used to debug analog issues by analyzing waveform of the signal, and can shed light on issues such as noise, transients, edge effects and transmission line effects. Once a digital fault has been found with a logic analyzer, an oscilloscope is used to isolate and identify analog issues/causes behind that digital fault.

Oscilloscopes should support high bandwidth and high frequency/ sampling rate to troubleshoot designs with high data rates & fast rise times. Probes for oscilloscopes should provide for high fidelity during data acquisition, maximizing signal preservation and minimizing any loss/distortion, and hence should have low capacitance & inductance.
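The widely used rule of thumb linking required scope bandwidth to the fastest edge is BW ≈ 0.35 / rise time (with sample rate typically several times the bandwidth). A quick sketch:

```python
# Standard rule-of-thumb: analog bandwidth needed to capture a given edge rate.
def required_bandwidth_ghz(rise_time_ps):
    return 0.35 / (rise_time_ps * 1e-12) / 1e9

bw = required_bandwidth_ghz(100)   # 100 ps rise time
print(round(bw, 2))                # 3.5 GHz of analog bandwidth
```

So a design with 100 ps edges already calls for multi-GHz scope bandwidth, before any margin is added.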

A Mixed Signal Oscilloscope may be used to concurrently analyze digital and analog signals  for simpler designs, when a high level of interaction is expected between digital and analog signals. However, for more complex designs, it makes more sense to use a logic analyzer in combination with an oscilloscope.

In order to catch highly rare / more elusive events, such as transients in signals with fast rise times, clock phase slip, PLL settling, or a low-level bug masked by signal noise, data may need to be sampled at frequencies higher than what an oscilloscope can support. This is when a Spectrum Analyzer is used, which typically provides information in the frequency domain.

A Real Time Spectrum Analyzer (RTSA) uses a DSP to run real-time processing of the signal before committing it to memory storage, and thereby allows information to be displayed in both the time & frequency domains. An RTSA also provides higher dynamic range on the signal.

Ideally, a logic analyzer should be combined with an oscilloscope with event-detection capability to trigger a Real Time Spectrum Analyzer in the frequency mode, that in turn captures a correlated time record of the signal - and can be used to display information in both time & frequency domains.

Time Domain Reflectometry (TDR) is a tool/technique used to study the signal propagation path by measuring impedances in the time domain as the signal flows through traces, connectors and cables. It is used to analyze impedance-related issues such as impedance matching, reflections at terminals, loss in amplitude, etc.
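The basic TDR relation converts the reflection coefficient rho measured at each point along the path into the local impedance: Z = Z0 * (1 + rho) / (1 - rho). A minimal sketch, assuming a 50-ohm reference:

```python
# Sketch of the basic TDR relation: measured reflection coefficient rho
# -> local impedance seen by the incident step (z0 = reference impedance).
def impedance_from_rho(rho, z0=50.0):
    return z0 * (1 + rho) / (1 - rho)

print(impedance_from_rho(0.0))    # 50.0 -> matched, no reflection
print(impedance_from_rho(0.2))    # 75.0 -> impedance step up
```

A positive rho flags an impedance increase (e.g. an open-ended stub trends toward rho = 1), a negative rho an impedance dip.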

A Signal Generator is used to feed external signals into the DUT, to replicate a missing signal, simulate an external stimulus or intentionally generate distorted / worst-case signals to stress-test circuits. Signal generators typically provide digital pulses, but analog signals can also be fed using Waveform Generators. Mixed-signal and/or data patterns combining digital and analog signals can be generated with a Function Generator.

A Sequence Controller is used to program a virtual set or a series of digital pulses/analog waveforms or mixed signal data patterns that can be fed to the DUT.


EYE DIAGRAM

A visual tool for quick analysis of signal integrity in high-speed digital circuits, providing a representation of the signal's key electrical parameters.
Generated by an oscilloscope, it is the result of overlaying the waveforms of many logic transitions in a digital circuit, plotting amplitude vs time, and creates a statistical average of the signal that can be analyzed for SI issues: timing/amplitude issues, noise/jitter, etc.


Critical parameters for SI analysis:
Eye opening / eye width: corresponds to one bit period or Unit Interval (UI)
Eye amplitude: the difference between the mean 1 and 0 levels
Eye height: the ratio of eye height to eye amplitude is a measure of the signal-to-noise ratio
Eye crossing percentage: the ideal value is 50%; deviations signify distortion in duty cycle / pulse symmetry
Rise and fall times
Jitter: time deviations of the rise and fall transitions at the signal edge
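A small sketch of how a couple of these figures are derived from overlaid samples. The voltage values are made up, and the 3-sigma bands used for eye height are a common convention assumed here rather than taken from the text:

```python
# Illustrative eye-diagram figures from voltage samples taken at the eye
# centre. ones/zeros values are invented for the sketch.
import statistics

ones  = [1.02, 0.98, 1.01, 0.99, 1.00]
zeros = [0.02, -0.01, 0.00, 0.01, -0.02]

mu1, s1 = statistics.mean(ones), statistics.stdev(ones)
mu0, s0 = statistics.mean(zeros), statistics.stdev(zeros)

eye_amplitude = mu1 - mu0                        # mean 1 minus mean 0
eye_height = (mu1 - 3 * s1) - (mu0 + 3 * s0)     # inner opening after noise
print(round(eye_amplitude, 3), round(eye_height, 3))  # ~1.0, ~0.905
```

The gap between eye amplitude and eye height is the noise penalty, which is why their ratio tracks signal-to-noise quality.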


SHMOO PLOT

A shmoo plot is a graphical representation of system response (such as pass/fail) when operating conditions (such as frequency, voltage, temperature, etc.) are varied. These are commonly used to show results of testing of complex electronic systems such as microprocessors (and used to display operating envelopes / range of operating conditions of those devices).

The process of varying conditions/inputs during electrical testing to alter the system response is termed shmooing. ATE includes software that allows automatic shmooing of the part to determine its range of operating conditions.
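A toy sketch of shmooing: sweep voltage and frequency, record pass/fail at each point, and print the grid. The linear Fmax(V) pass model below is invented purely for illustration:

```python
# Illustrative shmoo sweep: 'P' = pass, '.' = fail at each (V, F) point.
def passes(v_mv, f_mhz):
    # assumed linear Fmax(V) model: 600 MHz at 800 mV, +2 MHz per extra mV
    return f_mhz <= 600 + 2 * (v_mv - 800)

voltages_mv = [800, 900, 1000, 1100]
freqs = [600, 800, 1000, 1200]

for v in reversed(voltages_mv):                 # highest voltage on top
    row = " ".join("P" if passes(v, f) else "." for f in freqs)
    print(f"{v} mV  {row}")
```

The passing region grows with voltage, tracing out the device's operating envelope just as a real shmoo plot does.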


https://en.wikipedia.org/wiki/File:Figure_5._Shmoo_plots_with_test_period_power_supply_test_on_a_few_FE-I4_devices.png


Plotting V vs F, Fmax is the maximum frequency attainable at Vmin.
Performance is optimal at Fmax @ Vmin.

