Numbnut's guide to debugging prototypes

by andy@warmcat.com 

It didn't just work.  Now what?

     

 

Executive Summary

Okay, so you carefully built your prototype, and - breathless anticipation - the moment comes when you must test it.  WTF!?!?!  It doesn't work!  What kind of shit design is this!  OMG!  OMG!  (sound of head exploding).



The special piechart above explains your position.  The green vomit colour represents the various and manifold ways that the prototype might be WRONG.  There are a lot of ways it might be wrong.  Imagine that you listed all the pins and tracks on your prototype, and then imagine what the list of every possible WRONG combination of opens and shorts between those pins would look like.  For a nontrivial design it would contain more entries than there are atoms in the universe.

So you can see that the white line, representing the ONE way that your prototype can be put together right, is being pretty generous with its multipixel width.  The odds are against you.  It's not just a matter of buying the parts and following instructions.  Let's just say that again.  

  • Trillions of ways to be wrong.  
  • One way to be right.  

And if you have not yet found The One True Way, my son, read on.

Why things aren't that bad

Alright, things aren't that bad.  In fact only a tiny number of the 'wrong' scenarios are likely.  Generally, with shorts for example, only next-door pins or tracks that are next to each other on the PCB can realistically be shorted together.  So that improves the odds by several trillion trillion already, leaving only several zillion ways to go wrong.
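To put rough numbers on this, here is a minimal sketch that counts short patterns only (opens are ignored).  It assumes a hypothetical 44-pin part with its pins laid out in a single row, treats each pair of pins as either shorted or not, and uses the commonly quoted rough estimate of 10^80 atoms in the observable universe.  The function names are made up for illustration.

```python
from math import comb

ATOMS_IN_UNIVERSE = 10**80  # commonly quoted rough estimate, an assumption here


def all_pair_short_patterns(pins: int) -> int:
    """Patterns where any pair of pins may independently be shorted or not."""
    return 2 ** comb(pins, 2)


def adjacent_short_patterns(pins: int) -> int:
    """Patterns where only next-door pins in a single row can be shorted."""
    return 2 ** (pins - 1)


pins = 44  # hypothetical pin count
print(comb(pins, 2))                                      # 946 distinct pin pairs
print(all_pair_short_patterns(pins) > ATOMS_IN_UNIVERSE)  # True
print(adjacent_short_patterns(pins))                      # 2**43, about 8.8e12
```

Exactly one of those patterns is the one with no shorts at all.  Restricting shorts to neighbouring pins shrinks the list enormously, but it is still far too long to check by trial and error.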

With care, readily visible errors will be spotted and corrected.  So that is another large population of potential errors that can be resolved with just time and care (including the absolute worst case, which is that every pin on your PCB is shorted together... you would probably notice that I hope).  However, that still leaves thousands of credible faults for even a small PCB like Milksop.

Things that lead to errors

We will be looking at how to avoid or fix these problems in the next section.

Inexperience

Component misorientation

So, it doesn't matter which way around a resistor goes, or a nonpolarized capacitor.  But it does matter which way around a diode or an electrolytic capacitor goes (you may get an exploding capacitor if you get that wrong).  And you probably figured that it matters which way around the ICs go, right?  Did you check?

How about resistor packs?  On some respacks it does matter, but the ones I have used on these pages do not care.

Miswiring

Inexperience leads to a feeling of being out of your depth and the suspicion that it is all IMPOSSIBLE.  That leads to people taking shortcuts with their brain when they really ought to stop and bring up Google.  This includes an unnamed chap on XBOXHACKER who did not check the pinout of his PLCC device and just assumed that pin one was in a corner.  Which corner?  We do not know, it was his favourite corner.  In fact PLCC packaged devices have pin one in the top middle (there is a small dent to show where) and count around anticlockwise.

The point is not that he ''made a mistake'', but that he blithely assumed that he would just be right with his choice, that ''it will all work out''.  This is a dangerous frame of mind, and an easy one to fall into when you are fatigued and really quite worried you aren't going to make it: you try to pretend there wasn't a choice there at all.

What?  It needs power?

Inexperience causes problems when more experienced people leave out ''the obvious''.  It IS obvious to more experienced people because they do it every day and don't even think about it any more.  But for newbies, it is not obvious, for example, that external power has to be applied to a CPLD when attempting to program it over JTAG.  To a newbie it seems perfectly possible that the PC might provide 100mA out of its printer port; after all, everything else seems to work by some magic.

Solder shorts

Invisible bridging

Solder is quite capable of making ''thinner than hairline'' bridges which are not visible to the naked eye.

Pin bridges that are not easy to see

Strands of molten solder which bridge two pins are often visible to the eye.  However, if they are at the point where the pin enters the package, or on the underside of the pins, or on the corner bend of pins, this may not be so apparent.

Under-package shorts

If you have been over-generous with your solder, it can form islands under the IC package which cannot be reached by solder braid (because the heat from the iron cannot reach the solder).

Shunted pins

If you do not take care of the pins on your devices, they may become laterally pushed together, causing shorts.  Not noticing this is simple carelessness.

Misaligned pins

If you do not align pins with care, especially if the package is slightly rotated from normal, on fine pitch ICs the pins will gradually get more and more out of alignment as you go along a row, until they start to short to the wrong pads.

PCB shorts

PCB Manufacturing defects - bridges

You can't trust your PCB.  Hair or dust may fall on the board during manufacturing - depending on which manufacturing process step this happens on it could cause opens too - leading to arbitrary shorts in your tracks.

''Solder resist was too expensive''

Oh boy.  Put solder resist on all your PCBs.  Otherwise the solder you are using on surface mount or through-hole pads will bridge on to nearby tracks.  That is a miserable situation: the thin tracks cannot stand direct rework and will lift off the fibreglass substrate of the PCB, leaving you with an interesting new coaster for hot coffee, and an unusual configuration of bruises on your forehead where you tried to stick your head through the office wall.

Chaff

See the Bare Board Test section.

Opens

SMT conformance

SMT chips have a figure for pin conformance (usually listed as coplanarity) in their package data.  This is a tolerance - usually stupendously tiny - for how much the bottoms of the pins may differ in height from each other.  If you have not taken care with your device, and some pins have been bent, they may no longer touch the pad on the PCB while all the other pins are seated firmly, or the bent pin may be lifting all of the other pins up off their pads slightly.

Levitating pins

Much harder to see is the feared failure mode where the solder flux gets under a pin and boils while you are doing the ''flood the pins with solder'' thing.  The boiling action stops any solder from entering under the pin.  You end up with a pin that looks like it is perfectly soldered, when it is sitting on an insulating layer of solidified flux and no solder even got near it.

Levitating balls on PTH

The related failure mode for PTH pins (Plated Through Hole, or pins that go through the board) is to solder a flattened sphere of solder which again looks perfectly credible as a well-soldered joint.  However the dreaded boiling flux of doom has been a-dancing on your pad, sniggering behind its lacy handkerchief, and there is in fact no connection.

PCB Opens

PCB Manufacturing defects - opens

You can't trust your PCB.  Hair or dust may fall on the board during manufacturing - depending on which manufacturing process step this happens on it could cause shorts too - leading to arbitrary opens in your tracks.

Excessive track currents, or ''the wisp of foul smoke''

This one happens when you begin testing.  A careless moment with a scope probe, a pre-existing solderblob between 0V and Vcc, and your PCB becomes an unusual and exciting low ohm resistor hooked up to your power supply.  An inch of 6 thou track presents around 0.15R DC resistance.  Where that inch is accidentally hooked between 0V and 5V we are therefore attempting the amazing feat of passing ~30A through an inch of track not much thicker than a hair.  It reacts to this honour in the following way.  First, it heats up.  Then a few milliseconds later it gets really very hot, white hot.  The solder resist above the track chars into carcinogenic flakes, releasing what would under other circumstances be a beautiful curl of thick grey smoke, which rises into the air gracefully and steadily, driven by the heat.  This curl is of very defined size - because a few hundred ms later the track burns out and the fault area begins to cool.  If you happen to be leaning over the PCB, breathing heavily with anticipation and fear, this toxic plume will gladly be drawn fully and directly into your tender lungs, where it will sear them and, as you stumble from the room cursing, retching and choking, take up residence until it later kills you with lung cancer.
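The arithmetic behind that ~30A figure is plain Ohm's law.  A minimal sketch using the numbers above (5V across 0.15R), which assumes the supply can actually deliver that much current and ignores the rise in copper resistance as the track heats up:

```python
def fault_current(volts: float, ohms: float) -> float:
    """Current through a dead short, by Ohm's law: I = V / R."""
    return volts / ohms


def fault_power(volts: float, ohms: float) -> float:
    """Power dissipated in the track: P = V^2 / R."""
    return volts ** 2 / ohms


supply_v = 5.0   # Vcc rail from the text
track_r = 0.15   # one inch of 6 thou track, figure from the text

print(f"{fault_current(supply_v, track_r):.1f} A")  # 33.3 A
print(f"{fault_power(supply_v, track_r):.0f} W")    # 167 W
```

A real supply may limit or sag well before this, so treat the result as an upper bound.  Even a fraction of it is plenty for a hair-thin track.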

Overvoltage

The MOSFETs used in modern silicon have a breakdown voltage that is not a million miles away from their operating voltage.  If a pin, or the Vcc rail, comes into contact with a voltage that is double the rated maximum, it will almost certainly prove fatal.

Overcurrent/Heat

It's actually fairly hard to kill devices by overcurrent if no excessive voltage is involved.  However, shorts can cause large currents to flow in output stages: a 74HC pin shorted to 5V but driving low will pass considerable current (>100mA), and over time this heating will cause fatal trouble.

Static Discharge

I have never had a problem I could say was down to static discharge.  However, I live in England, which is not known for its dry atmospheric conditions.

Mysterious problems caused by protection diodes

Every pin on a CMOS device has a pair of protection diodes.  These provide a low resistance path to 0V for voltages below about -0.5V, and to Vcc for voltages above Vcc+0.5V.  The unforeseen consequence of this protection is that low current devices can ''sort of operate'' even if there is no 0V or Vcc connection to them.  In the case of no Vcc, consider that any high signals will be shunted on to the Vcc rail by their protection diodes, raising Vcc to around the High level - 0.5V.  But then the Vcc level is dependent on the number of high signals coming to the chip!  If you have truly bizarre operation, this should be checked for.
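As a rough illustration of the missing-Vcc case, the sketch below estimates the ''phantom'' rail voltage from whatever input levels happen to be present.  It assumes an ideal 0.5V diode drop and negligible load current, and the function name is made up for this example.

```python
DIODE_DROP_V = 0.5  # approximate protection diode forward drop, from the text


def phantom_vcc(input_levels_v: list[float]) -> float:
    """Estimate the rail voltage of a chip with no Vcc connection.

    The highest input signal feeds the rail through its protection diode,
    so the rail sits roughly one diode drop below that signal.
    """
    if not input_levels_v:
        return 0.0
    return max(0.0, max(input_levels_v) - DIODE_DROP_V)


print(phantom_vcc([5.0, 0.0, 5.0]))  # 4.5: the chip appears to run
print(phantom_vcc([0.0, 0.0, 0.0]))  # 0.0: all inputs low, the chip loses power
```

The second case is what makes the symptoms so strange: the chip's supply comes and goes with the data being fed to it.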

Avoiding avoidable errors

Okay, that was almost all the depressing stuff.  Now you understand the difficulty.  What can be done about it?  How does anyone build anything that works?

Fully avoidable problems

  • Always use solder resist on your PCBs.
  • Bare Board Test production batches.  This is where a special jig is made to verify that all pads are connected only to the pads they are meant to be connected to, and no others.  Boards that fail this test are thrown away and are never manufactured further.  HOWEVER, if your PCB manufacturer is a BASTARD, like one manufacturer in Bedfordshire I had the misfortune to use about ten years ago, a BASTARD, yes, BASTARD I said, you may order 100 PCBs, with Bare Board Testing, only to find that around 50% of the boards fail with shorts when they are later built.  Closer examination showed swarf (PCB filings with bare copper, from when the boards were cut) UNDER the solder resist!!!!  So much for Bare Board Testing at that establishment.  I later heard he'd gone bust.  BASTARD, BASTARD.... ... ... ... BASTARD!
  • Look up datasheets for the devices you will be using.  Make sure you have an understanding of the basics for the chip before you use it: which pins are power, where they are physically, what voltage it operates at, how much power it wants, etc.  There's no excuse when data is a Google away.

Largely avoidable problems

  • Take care with component orientation.  Make sure you KNOW if a device can go in any orientation, like a resistor.  You may still make mistakes, but by taking care, they will be very few.
  • Do not allow your SMT device's pins to become misaligned, bent, or anything other than perfect.
  • Do not flood the device with a really excessive amount of solder when soldering.
  • Align your devices on the PCB pads with extreme care.  Use an Intel QX3 microscope if you can get one.
  • Get recommendations for a GOOD PCB manufacturer.  That is NOT necessarily the cheapest, or the most expensive.  GOOD is unrelated to the money metric.  I recommend Swift Circuits, they have served me excellently for several years now.

Working with unavoidable errors

You WILL have unintended shorts and opens on your prototype.  Get used to it.  You aren't as perfect as your Mum thinks you are.  You need tools and strategies for dealing with the inevitable.

Tools

  • You need an oscilloscope.  An old surplus one will do at a pinch; you can pick these up cheap at surplus shows and suchlike.  Even a crappy old one is better than nothing, but if you have deep pockets I can personally recommend those little Tektronix LCD scopes.  For design work you do need the 1GS/s scopes, but for fault tracing you are generally seeing gross, silly errors, like 5V square waves squished down to 10mV by a short.
  • You need a multimeter with continuity testing.  I have a great machine called a Polar 850 Shorts Locator instead.  It's a super continuity tester which makes a different pitch sound depending on the resistance, and it can see milliohms.  That means you can move the probes along a track and hear if you are getting closer to or further away from the short.  Great, but a multimeter will do.
  • You need a sewing pin, with a sharp end.  In fact I use the special probes on the shorts tester, it's great for that too.
  • If you can get hold of an Intel QX3 USB Microscope (it is a kid's toy microscope, but if you pull it apart it works great for this too) that will help you.  Nowadays we are working below the sizes that the human eye can readily resolve unaided, so having tools to help you with that makes a lot of sense.

Strategies

  • Your morale is important.  Find someone to talk to about your problems.  The moment you give up and throw it in the garbage is the moment when your chances of getting it working fall to zero; other than that there is always some chance, big or small.
  • Check all voltage rails and any clock signals, eg, from Clock Oscillators, etc.
  • Of all of the possible problems, it is unlikely that you have blown up a chip unless there has been heat or overvoltage involved.  It is almost certainly something else causing your problem.  Act accordingly.
  • Don't be afraid to stick a finger on the top of the IC packages on your design regularly.  Some chips run hot even when everything is fine, eg, the 5V Xilinx CPLDs.  But most chips will be room temperature if all is well.  If they are showing signs of running a temperature, that's a big hint.
  • Measure the resistance between 0V and Vcc on your prototype BEFORE you apply power.  Every time you do some soldering on it, test it again.  This will reduce the number of foul smoke wisps you get.
  • Print out the schematics for the prototype and look at the flow of signals in and out of the design.  Force whatever generates the initial signals to provide a constant stimulus (ie, put the driver program in a batch file loop) and trace the flow of data through the circuit.  This will OFTEN lead to discovering opens and shorts, with 'impossible' situations like a signal going into an inverter but having the wrong or a constant value coming out.
  • Have a probe around with your continuity tester when you are stumped.  Look at pins and pads that look well soldered as well as suspicious ones.  Preferably look at the connection indirectly, ie, if a via some distance away is meant to be connected, put one probe on the via, so it is not physically changing what you are measuring.
  • Flex the board gently while it is operating, see if that makes a difference - suggestive of opens if it does.
  • If you're SURE that all the pins on a surface mount chip are well soldered, but it ain't working, get your sewing pin and run it along the legs of the chip with a little force.  You were SURE, right?  Well, no harm in putting a little lateral force on the pins then.  Do NOT push the pins back towards the package though.  Lateral, sideways force, and not much of it.  Unsoldered pins will shunt on to their next door neighbour, whereas soldered ones will stand firm.  We are NOT talking about a lot of force here, an unsoldered pin will move with a very small amount of force, you just want slightly more than that.  Look how thin the pins are and be gentle.  If you discover an unsoldered pin, use the sewing pin to straighten it by bringing it down in the gap between two pins, and resolder it.
  • Another method that is worth considering for SMT is the downward reheat.  Clean your soldering iron bit on a wet sponge, then gently use it to reheat the pins of your suspect surface mount device.  Do not drag the iron along the pins, it will be a disaster.  Instead bring the iron down on to the foot of the pins, and with gentle downward pressure to force the pins to the PCB, bring the iron back gently towards you away from the pin.  Clean your iron tip as necessary.  You should not make any bridges this way as you are not bringing any new solder to the party.  If you do, add some more fresh solder and use clean braid to get rid of it all.
  • Another strategy that pays off very well is the ''dumb shift'' test.  In this test you put your logic into a special mode (easy with a CPLD or FPGA) which drives only one of the output pins high at a time, changing which one all the time.  Shorts turn up as a 1/3rd height pulse on two lines simultaneously.  You can also look for the signal at the other component pins it is meant to be connected to; if you see nothing, you know that signal is shorted to a power rail or the pin is open.
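
The stimulus side of the ''dumb shift'' test is just a walking-one pattern.  The sketch below generates that pattern and flags suspect lines from a set of scope readings.  In a real design the pattern would live in the CPLD or FPGA logic; the 5V swing, the thresholds and all the names here are assumptions for illustration.

```python
from typing import Iterator


def walking_one(width: int) -> Iterator[list[int]]:
    """Yield one step per output pin, with only that pin driven high."""
    for active in range(width):
        yield [1 if pin == active else 0 for pin in range(width)]


def classify_step(active: int, volts: list[float], vcc: float = 5.0) -> dict[int, str]:
    """Flag lines whose measured level looks wrong for this step.

    volts[i] is the level seen on line i while only line `active` is
    driven high.  The thresholds are assumed, not measured values.
    """
    suspects = {}
    for line, v in enumerate(volts):
        if line == active and v < 0.1 * vcc:
            suspects[line] = "no pulse: open pin or shorted to a rail"
        elif 0.1 * vcc <= v < 0.8 * vcc:
            suspects[line] = "reduced-height pulse: likely shorted to another line"
    return suspects


print(list(walking_one(3)))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Hypothetical readings with line 1 driven high and lines 1 and 2 shorted.
print(classify_step(1, [0.0, 1.7, 1.7, 0.0]))
```

With the made-up readings above, lines 1 and 2 are both flagged in the same step, which is the ''pulse on two lines simultaneously'' signature of a short between them.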

Conclusion

This is only a sampling of the problems and solutions you need to build a modern prototype.  It is not an easy job.  I hope this advice will be of some help.