Digital debug difficulties caused by pin card differences

Question asked by t_stinchcombe on Aug 27, 2008
Latest reply on Apr 3, 2009 by uncletom
We have just experienced another strange digital test failure which highlights yet again (we've seen this sort of thing before!) that the hybrid pin cards, which are nominally functionally equivalent, can sometimes behave quite differently and cause 'faults' that are difficult to track down, and which then present a dilemma over how to work around them or remove the 'failure'. It would be nice to hear whether other forum members have experienced similar problems and, if so, how they overcame them.

The current 'fault' concerns an SRAM chip: we have made a little over 22,000 of these boards over the past two years, with apparently little trouble with the test for this chip. One of our CEMs recently built, and apparently passed on their HP3070, a batch of about 300 boards, but when we tested them, we got 11 failures on this chip! We have 3 HP3070s, and most of the failing boards fail more often than not (using 'recycle to fail' in Debug) on all 3 machines. Surprisingly, the card fit on our CEM's machine is quite close to one of ours: the one slot that is different has only one or two resources used in the test of the 'failing' chip. Across all 4 machines we have a good mix of E4000-66540, -66550, -69540 & -69550 hybrid cards, so both fixed- and programmable-speed, and new and 'refurbished' cards (that being my current understanding of the -69xxx).

(The test itself is fairly common fare for a memory chip: walk a '1' up the address lines and write '55' at each byte; then read the '55' from the first location again, and immediately write in 'AA', repeating this for the other addresses by walking the '1' up the address lines again; finally walk up the lines a third time, reading back the 'AA'.)
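For anyone who wants to see the vector sequence spelled out, here is a rough Python model of the three passes described above. It is only a sketch and not the actual 3070 test source: the function names are made up for illustration, the 13 address lines are an assumption (the post only tells us that A12 exists), and address 0 is left out because the description only mentions walking a '1'.

```python
def walking_one_addresses(n_lines):
    """Addresses with exactly one address line high: A0, A1, ... in turn."""
    return [1 << bit for bit in range(n_lines)]


def generate_vectors(n_lines=13):
    """Build the three passes as (operation, address, data) tuples.

    n_lines=13 is an assumption, not a figure taken from the real test.
    """
    addresses = walking_one_addresses(n_lines)
    vectors = []
    # Pass 1: write 0x55 at each walking-one address.
    for addr in addresses:
        vectors.append(("write", addr, 0x55))
    # Pass 2: read back 0x55, then immediately write 0xAA to the same address.
    for addr in addresses:
        vectors.append(("read", addr, 0x55))
        vectors.append(("write", addr, 0xAA))
    # Pass 3: read back 0xAA.
    for addr in addresses:
        vectors.append(("read", addr, 0xAA))
    return vectors


def run_vectors(vectors, memory=None):
    """Apply the vectors to a simple dict-based memory model.

    Returns a list of (vector index, address, expected, actual) for
    every read that did not return the expected data.
    """
    memory = {} if memory is None else memory
    failures = []
    for index, (operation, addr, data) in enumerate(vectors):
        if operation == "write":
            memory[addr] = data
        else:
            actual = memory.get(addr)
            if actual != data:
                failures.append((index, addr, data, actual))
    return failures
```

Against an ideal memory model, run_vectors(generate_vectors()) returns an empty list; on the real fixture the interest is of course in which vector index the first failure lands on.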

The 'fault' itself appears to me (in my few years' experience in the game) to be due to normal 'process variation' in the chip manufacture, so no surprises there, but the manner of alleviating it is most bizarre. 'Diagnose faults' quickly established that increasing/decreasing all drive high/low levels by +400mV/-400mV (respectively) would remove the 'fault' on our 3 machines. By a simple process of elimination I narrowed this down to 1 address line ('A12'), which needed the 'drive high' level increased by 200mV to get rid of the problem. And here is the first stumper: every failing vector across the 11 boards has line A12 set low, so why should increasing the drive high level have the effect of passing the test?

I can also remove the fault by reducing the 'slew rate' on just this one address line, A12, from the family default of 100 down to 50. (The vector cycle/receive delay is set to 1u/900n, and increasing or decreasing either of these had little effect.)

I then decided to look at the data lines: most were failing because one of the two high bits in the low-order nibble wasn't high as expected, so it looked as though the 'receive high' level on this pin was too high. Thus I reduced it, eventually by a substantial amount (like both 'rh' and 'rl' on this pin at zero!), and still the best I got was that occasionally the test would fail a few vectors further on. However, if I decide to simply ignore the value of just that pin in just that vector, by setting an 'X' as the receive state, the test now runs all the way through and passes! This difference in behaviour itself, I think, points to some very strange interaction between the pin card drivers/receivers and what is going on in the board under test, and suggests that this behaviour is different in the 4 machines, despite the cards nominally being 'the same'.
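To make the 'X' receive state concrete, here is a small Python sketch of a per-pin compare with don't-care positions. It only models the comparison logic, not how the 3070 actually implements receive states; the function name is hypothetical, and the choice of the '55' pattern with D2 as the misbehaving bit is an assumption made purely for the example.

```python
def mismatched_bits(expected, received):
    """Compare received data against an expected pattern, MSB first.

    expected: string of '0', '1' or 'X' (don't care), e.g. '01010X01'
    received: string of '0' or '1' of the same length
    Returns the data-bit numbers (D0 = least significant) that mismatch.
    """
    if len(expected) != len(received):
        raise ValueError("expected and received must be the same width")
    width = len(expected)
    return [
        width - 1 - position
        for position, (exp, rec) in enumerate(zip(expected, received))
        if exp != "X" and exp != rec
    ]


# Hypothetical example: expecting 0x55, but D2 reads low.
print(mismatched_bits("01010101", "01010001"))  # [2]
# Same received data with D2 masked as don't-care in this vector.
print(mismatched_bits("01010X01", "01010001"))  # []
```

Masking the pin this way only hides the comparison at that one vector; it says nothing about why the remaining vectors then pass.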

Any thoughts or comments from anyone?

I've not yet quite decided on the best strategy for what to 'tweak' in order to get the boards to pass on our machines (and I should add we did look at one in some detail and are quite convinced that the board is OK!), but at the moment I'm thinking of a blanket reduction of the slew rate on all drivers.