USB Hub Bug Hunting & Lessons Learned

This tale begins with a customer reporting this cute little 2 port USB hub wasn’t working with Teensy 3.6.

In this article I’ll show how a protocol analyzer is used, how my instincts turned out to be very wrong, and along the way dive into arcane USB details you probably won’t see explained anywhere else.

First Instincts: Fail

This article is being written from the perspective of hindsight.  You can see much of the process as it actually happened on the forum thread where this problem was reported.

Most of my first troubleshooting instincts revolved around these 2 questions:

1: This hub works on Linux, Mac & Windows.  What are they doing that Teensy isn’t?

2: All other hubs work on Teensy, so what’s different about this particular hub?

I believe most people would start with these 2 questions.  In the end, both turned out to involve very wrong assumptions that were ultimately a distraction from finding the real problem.

Protocol Analyzer: Fail (Software Crash)

Usually the first step in debugging USB problems is to hook up the protocol analyzer.  I have this one, the Beagle480 from Total Phase.

The way this works is pretty simple.  On the left side where I drew the 2 blue arrows, your USB communication passes through.  To whatever you’re monitoring, it just looks like a longer cable.  But it makes a copy of all communication happening in either direction and sends it to the port on the right (green arrow).

If you’re just making a USB device that plugs into your PC, usually you can run software to capture the data from your PC’s driver.  But when you’re making an embedded USB host like Teensy 3.6’s 2nd USB port, this is the only way to actually see what’s really happening.  The downside is the cost.  Total Phase sells this one for $1200.

Here’s the test setup, with the Beagle480 inserted between the Teensy 3.6 and 2-port hub.  Usually quite a clutter of cables is involved, but this little hub plugs right into the side of the Beagle480.  The hub has 1 keyboard connected.  A Mac laptop is watching the USB communication, and a Linux desktop is running the Arduino IDE to upload code to Teensy.

Despite the high price, it’s not made of magic.  Inside it has a fairly small buffer memory.  It has to send a copy of everything out that right side port, which can be problematic during periods of sustained fast data flow.

This case turned out to be one of those problematic situations.  Unfortunately Total Phase’s software does not handle this situation well.  The Linux version simply crashes.

The Mac version is better.  It hangs with the infamous Mac beachball.  A Force quit is needed.

Fortunately having the software lock up means at least something can be seen on the screen, even if I could not scroll up to look at any other data.   Whatever was going wrong with this hub was causing a SETUP token to be repeatedly tried.  The timestamps show it’s happening about every 8 to 12 microseconds!

What’s Different About This Hub?

With the protocol analyzer not helping much, I decided to first focus on question #2.  All the other hubs work.  What’s different about this one?

In USBHost_t36.h on line 59 is an option to turn on lots of verbose debug printing.  But it doesn’t print much about the USB descriptors.  I decided this was a good time to fix that (rather than just plug into a PC and run a program to view), optimistically hoping the info would show some stark difference between this bad hub and all the good ones.  I spent a few hours adding this code to print descriptor info.

Here is the configuration descriptor for this 2 port hub.

Configuration Descriptor:
  09 02 29 00 01 01 00 E0 01 
    NumInterfaces = 1
    ConfigurationValue = 1
  09 04 00 00 01 09 00 01 00 
    Interface = 0
    Number of endpoints = 1
    Class/Subclass/Protocol = 9(Hub) / 0 / 1(Single-TT)
  07 05 81 03 01 00 0C 
    Endpoint = 1 IN
    Type = Interrupt
    Max Size = 1
    Polling Interval = 12
  09 04 00 01 01 09 00 02 00 
    Interface = 0
    Number of endpoints = 1
    Class/Subclass/Protocol = 9(Hub) / 0 / 2(Multi-TT)
  07 05 81 03 01 00 0C 
    Endpoint = 1 IN
    Type = Interrupt
    Max Size = 1
    Polling Interval = 12

Turns out this little 2 port hub is a Multi-TT type.  I’ll talk more about the transaction translators soon.  Most hubs have only a single TT, but 2 of my test hubs are Multi-TT.  Here’s one of their descriptors.

Configuration Descriptor:
  09 02 29 00 01 01 00 E0 32 
    NumInterfaces = 1
    ConfigurationValue = 1
  09 04 00 00 01 09 00 01 00 
    Interface = 0
    Number of endpoints = 1
    Class/Subclass/Protocol = 9(Hub) / 0 / 1(Single-TT)
  07 05 81 03 01 00 0C 
    Endpoint = 1 IN
    Type = Interrupt
    Max Size = 1
    Polling Interval = 12
  09 04 00 01 01 09 00 02 00 
    Interface = 0
    Number of endpoints = 1
    Class/Subclass/Protocol = 9(Hub) / 0 / 2(Multi-TT)
  07 05 81 03 01 00 0C 
    Endpoint = 1 IN
    Type = Interrupt
    Max Size = 1
    Polling Interval = 12

Turns out they’re an exact binary match, except for the last byte in the configuration descriptor.  I can’t emphasize enough how this process involves looking stuff up in the USB 2.0 PDF, in this case page 266.  This byte is just the hub telling the PC how much power it might draw, in units of 2 mA.  The 2 port hub says 01, meaning only 2 mA which can’t be right.  But I know nothing in the code on Teensy (which I wrote) ever looks at this byte.  Maybe someday we’ll detect power issues, but for now it’s ignored.
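To make that 9 byte header less abstract, here is a minimal sketch of decoding it.  The field meanings follow the USB 2.0 configuration descriptor layout, but the ConfigHeader struct and parseConfigHeader() are names made up for this illustration.  They are not part of USBHost_t36.

```cpp
#include <cstdint>

// Hypothetical helper type for illustration, not part of USBHost_t36.
struct ConfigHeader {
    uint16_t totalLength;       // wTotalLength: bytes in the whole descriptor set
    uint8_t  numInterfaces;     // bNumInterfaces
    uint8_t  configValue;       // bConfigurationValue
    bool     selfPowered;       // bmAttributes bit 6
    bool     remoteWakeup;      // bmAttributes bit 5
    uint16_t maxPowerMilliamps; // bMaxPower, which the device reports in 2 mA units
};

// Decode the first 9 bytes of a configuration descriptor.
// Returns false if the length or type byte is not what a
// configuration descriptor starts with.
bool parseConfigHeader(const uint8_t *d, ConfigHeader &out) {
    if (d[0] != 9 || d[1] != 0x02) return false;
    out.totalLength       = static_cast<uint16_t>(d[2] | (d[3] << 8));
    out.numInterfaces     = d[4];
    out.configValue       = d[5];
    out.selfPowered       = (d[7] & 0x40) != 0;
    out.remoteWakeup      = (d[7] & 0x20) != 0;
    out.maxPowerMilliamps = static_cast<uint16_t>(d[8] * 2);
    return true;
}
```

Fed the two headers shown above, this gives 2 mA for the 2 port hub (last byte 01) and 100 mA for the other hub (last byte 32).  Everything else decodes identically.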

Hubs also have a special descriptor for their capability.  It’s documented on pages 417-418 of the USB 2.0 PDF.  Here the 3 different hubs did differ.

2 port (not working)
09 29 02 09 00 32 01 00 FF


4 port, ioGear, GUH274 (works)
09 29 04 89 00 32 64 00 FF


4 port, no-name blue color, UH-BSC4-US (works)
09 29 04 E0 00 32 64 00 FF

While I now know these differences are meaningless, and really only the 3rd byte (the number of ports) matters, at the moment I found these it seemed like this just had to explain the problem.  That 2 port hub is different.

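Turns out the 4th byte describes features like which LEDs, over-current protection and other features the hub has.  The 6th byte says how long we’re supposed to wait between enabling power and accessing a port, in 2 ms units, and all 3 hubs give the same 0x32 (100 ms).  The 7th byte is the current the hub’s own controller electronics may draw, in mA.  The bad 2-port hub says only 1 mA, much less than the 100 mA of the other 2 working hubs.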

What Would Linux Do?

With nothing apparently different between the hubs, I returned to the idea that Teensy must be missing some initialization or other step that hasn’t mattered for all the other hubs, but does for this one.

It’s easy to look at what Linux does, since nothing goes wrong that crashes the Beagle480’s software.  Here’s most of the hub’s enumeration process on Linux:

I quickly noticed Linux is sending this Set Interface control transfer.  I did testing with all 7 of the hubs I have.  This doesn’t happen on the USB 1.1 hub or the 3 that are Single-TT, but Linux does do it for all the Multi-TT hubs.

Adding this to the USBHost_t36 library took time.  The hub driver was only looking at the first 16 bytes, to see if the interface and endpoint are compatible.  I had to rewrite the hub driver to look for all the possible interfaces.  Then when it was able to detect if 2 or more exist and parse which is the best to use, I added code to actually send that Set Interface command.
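The actual hub driver changes are more involved than fits here, but the core idea of walking every descriptor and looking for a Multi-TT alternate setting can be sketched roughly like this.  findMultiTTAltSetting() is a hypothetical name for this illustration, not a function in USBHost_t36, and it assumes "best" simply means the first interface with class 9 and protocol 2.

```cpp
#include <cstddef>
#include <cstdint>

// Walk a full configuration descriptor and return the bAlternateSetting of
// the first hub interface that advertises Multi-TT (class 9, protocol 2).
// Returns -1 if none is found or the data is malformed.
int findMultiTTAltSetting(const uint8_t *desc, size_t len) {
    size_t pos = 0;
    while (pos + 2 <= len) {
        uint8_t dlen = desc[pos];      // bLength of this descriptor
        uint8_t type = desc[pos + 1];  // bDescriptorType
        if (dlen < 2 || pos + dlen > len) return -1;  // malformed or truncated
        if (type == 0x04 && dlen >= 9) {              // interface descriptor
            uint8_t altSetting = desc[pos + 3];       // bAlternateSetting
            uint8_t ifClass    = desc[pos + 5];       // bInterfaceClass
            uint8_t protocol   = desc[pos + 7];       // bInterfaceProtocol
            if (ifClass == 9 && protocol == 2) return altSetting;
        }
        pos += dlen;  // step to the next descriptor
    }
    return -1;
}
```

For the 41 byte descriptor shown earlier this returns 1, which is the value that would go in the Set Interface request.  A hub with only a Single-TT interface returns -1, so no Set Interface is needed.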

It didn’t make any difference.  I’d poured a few more hours into coding, which in the long term was probably worthwhile, but still no closer to solving the problem!

Working Around Data Capture Software Lockup

I seriously considered resigning this problem to my low-priority bug list, since all the other hubs work.  I considered restructuring the enumeration code to even more closely match Linux’s approach.  I looked over the descriptor data yet again, hoping to find the answer, but no.

Clearly I needed to get around the problem with Total Phase’s software locking up.  At first I just added a “while (1) ; // die here” right before calling new_Device() to start the enumeration process after the hub detects the keyboard.  But that didn’t reveal anything useful.

I needed to let the problem happen, but then shut off the USB port before the Beagle480’s buffer could overfill (or whatever other problem causes the Total Phase software to lock up).  Ultimately I did this:

    if (state == PORT_RECOVERY) {
        port_doing_reset = 0;
        print("PORT_RECOVERY, port=", port);
        println(", addr=", device->address);
static IntervalTimer stoptimer;
stoptimer.begin(panic, 1000);
        // begin enumeration process
        uint8_t speed = port_doing_reset_speed;
        devicelist[port-1] = new_Device(speed, device->address, port);

When I put in terrible debug code, I like to give it very odd indenting so I’ll see it when/if I later forget to remove it!  In this case, a hardware timer is started to generate an interrupt in 1 millisecond, which calls panic().

The panic() function looked like this:

void panic()
{
    GPIOE_PCOR = (1<<6); // turn off USB host power
    Serial.printf("Panic stop\n");
    while (1) ;
}

With this I was finally able to see the beginning of the problem.

Somehow the bad 2 port hub was indeed working, at least for the first control transfer which reads 8 bytes of the device descriptor.  But then everything goes wrong and nothing works.  The host controller keeps trying to resend the Set Address command… at least for 1 millisecond, until the panic() function abruptly shuts off the USB port’s power.

Realizing What’s Really Wrong

I wish I could tell you there’s a reliable process to go from actually being able to observe the problem to understanding it.  There isn’t!

In this case I spent several hours re-reading the USB 2.0 spec, staring at that screenful of info, and basically just guessing.  For a while I considered there might be some sort of electrical interference problem unique to this hub, somehow triggered by completing that first transfer.

It literally took hours to focus on this 1 byte which turned out to be the cause of so much trouble.

To understand this byte, you need to know about USB split transactions, which are probably among the most arcane of USB features.  This next part will be quite technical, so perhaps skip to the end if you’re not in the mood for intensely low-level USB protocol details.

Split transfers are used only between USB hosts and hubs.  If you’re making a USB device, you’ll never see this type of USB communication.  They’re used only when a host at 480 Mbit speed needs to communicate with devices at 12 or 1.5 Mbit.

When USB 3.x is used, all the 5+ Gbps communication happens on dedicated wires.  The cables & hubs have a redundant 480 Mbit path. When you use a 12 or 1.5 Mbit device on a USB 3.0 hub, that slow communication is done using split transactions on the 480 Mbit line.  Slow speeds are never translated to the 5 Gbps lines.

USB has a lot of very specific terminology.  From smallest to largest, communication is in Tokens, then Transactions, then Transfers.  I’m not going into details here, other than it’s important to remember we’re dealing with 3 starts-with-T chunks of data.  Tokens are the smallest.  Transactions are bigger, made up of Tokens.  Transfers are the largest, made up of Transactions & Tokens.

These SPLIT tokens communicate with a Transaction Translator (TT) inside the hub.  When the host wants to send a transaction at 12 or 1.5 Mbit, it sends a SPLIT-START token to the hub’s TT.  If the TT isn’t busy, it acknowledges and begins working on the slow communication.  That sequence of tokens is a transaction, also called split start (maybe confusing if you don’t know the context).  Later the host sends SPLIT-COMPLETE.  If the TT has finished, it replies with ACK, and if it’s still busy it replies with NYET.

Here’s the diagram from the USB 2.0 spec on page 202 showing this for OUT transactions.  Sadly there isn’t a diagram specifically for SETUP transactions, but my guess is this should be pretty close.  Did I mention this sort of work is all about guesswork and re-reading technical specs over and over?

Total Phase’s software is very nice.  It normally shows you one Transfer per line.  If you want to see the Transactions that made up that transfer, you click the little triangle to expand.  Likewise if you want to drill down to see the actual tokens, you can.  Here’s the first successful transfer.  Hopefully you can see this more-or-less corresponds to Figure 8-9, except SETUP instead of OUT.  Of course, we only see the stuff from circles #1 & #3 since we’re connected between Teensy and the hub.

In this case, we can see Teensy sent 58 SPLIT-COMPLETE tokens while the hub slowly sent that SETUP token at 1.5 Mbit to the keyboard, where the hub replied with NYET to the first 57, then finally an ACK.

This is important, because the process of figuring out why something isn’t working almost always involves first looking at the deeper details of what it’s supposed to do when it does work.

With that in mind, here’s an expanded view of the first Set Address transfer that fails.

Remember the software is showing us nesting of Tokens inside Transactions inside Transfers.  So the 3rd line is actually the first real data.  This badness begins with the bytes 78 01 82 B4.

Earlier 78 01 82 10 was considered ok.  But 78 01 82 B4 is not.  So what does that last byte mean?  Again the answer is digging into the nitty gritty details in the USB 2.0 document.  It happens to be on the same page as that diagram above.

Even though going from 10 to B4 looks like a big change, in fact the top 5 bits are a CRC check which is expected to be totally different if any of the checked bits changed.  So really this means just one of the “ET” bits flipped from a 0 to 1.

The meaning of those ET bits is documented a few pages later.

These bits were 00 for Control, but somehow became 10 for Bulk.  That’s definitely not right!  We’re sending a SETUP token, and that token has the endpoint number encoded within (also documented elsewhere in chapter 8….)
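For anyone who wants to check the bit twiddling, here is a small sketch that decodes the 3 bytes following the 0x78 SPLIT PID and recomputes the 5 bit CRC.  The field order and the CRC5 polynomial (x^5 + x^2 + 1, seeded with all ones, result inverted) come from chapter 8 of the USB 2.0 spec.  The SplitToken struct and both function names are invented for this example.

```cpp
#include <cstdint>

// Fields of a SPLIT token, decoded from the 3 bytes after the 0x78 PID.
// Hypothetical helper for illustration, not part of USBHost_t36.
struct SplitToken {
    uint8_t hubAddress;    // 7 bits
    bool    completeSplit; // SC: false = start-split, true = complete-split
    uint8_t port;          // 7 bits
    bool    sBit;          // S: 1 = low speed for control/bulk/interrupt
    bool    eBit;          // E (reserved, called U, in a complete-split)
    uint8_t endpointType;  // ET: 0 control, 1 isochronous, 2 bulk, 3 interrupt
    uint8_t crcField;      // top 5 bits of the last byte, as captured
};

SplitToken decodeSplit(const uint8_t b[3]) {
    SplitToken t;
    t.hubAddress    = b[0] & 0x7F;
    t.completeSplit = (b[0] & 0x80) != 0;
    t.port          = b[1] & 0x7F;
    t.sBit          = (b[1] & 0x80) != 0;
    t.eBit          = (b[2] & 0x01) != 0;
    t.endpointType  = (b[2] >> 1) & 0x03;
    t.crcField      = b[2] >> 3;
    return t;
}

// Recompute the CRC5 over the 19 non-CRC bits, in the order they go out on
// the wire (least significant bit of each byte first), and return it in the
// same bit order the analyzer shows in the top 5 bits of the last byte.
uint8_t expectedCrcField(const uint8_t b[3]) {
    uint32_t bits = b[0] | (b[1] << 8) | ((b[2] & 0x07u) << 16);
    uint8_t crc = 0x1F;                       // seed: all ones
    for (int i = 0; i < 19; i++) {
        uint8_t in  = (bits >> i) & 1;
        uint8_t top = (crc >> 4) & 1;
        crc = static_cast<uint8_t>((crc << 1) & 0x1F);
        if (in ^ top) crc ^= 0x05;            // polynomial x^5 + x^2 + 1
    }
    crc ^= 0x1F;                              // the inverted remainder is sent
    uint8_t field = 0;                        // CRC goes out MSB first, so
    for (int i = 0; i < 5; i++) {             // reverse it to match the
        if (crc & (1 << (4 - i))) field |= static_cast<uint8_t>(1 << i);
    }
    return field;
}
```

Both 01 82 10 and 01 82 B4 decode to hub address 1, port 2, low speed, start-split.  The only field that differs is ET (0 versus 2), and in both cases the recomputed CRC matches the captured one, so the hardware really was sending a well-formed token with the wrong endpoint type.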

Where This 1 Bit Went So Wrong

The good news is this last part, going from an understanding of what’s wrong to finding the mistake causing it, is relatively easy.  It took only about an hour.  While I didn’t know exactly what controlled those ET bits, I was pretty sure it was something in the EHCI controller’s QH or qTD data structures.

USB controllers implementing EHCI are very unlike traditional embedded peripherals, where a handful of fixed register addresses control everything.  EHCI gets almost everything from main system RAM, where it’s regularly using DMA to read linked lists and binary tree data structures to find out what work you want it to perform.

For each pipe/endpoint of each USB device, you create a QH structure (except complexities for isochronous… but trying to keep this simple).  Here’s the EHCI diagram for the QH structure.

The first field is used for the linked lists or inverted tree pointers.  The next two are where you actually configure how the hardware will communicate.  The big yellow area is written by the hardware, copying other info there temporarily as it works on the Transfers you want.  (Yes, the hardware deals in Transfers and automatically does the Transactions & Tokens)

While there’s a lot of small fields in those 64 bits, the “C” bit turned out to be the key.  The EHCI spec documents it as:

Control Endpoint Flag (C). If the QH.EPS field indicates the endpoint is not a high-speed device, and the endpoint is an control endpoint, then software must set this bit to a one. Otherwise it should always set this bit to a zero.

Fortunately the code uses C structs with distinctive names to represent the QH and other EHCI data structures.  A quick grep of all the code turned up everything that accesses this field.  Only a few of them actually write to it.

Ultimately the bug turned out to be just this 1 line function called pipe_set_maxlen()

pipe->qh.capabilities[0] = (pipe->qh.capabilities[0] & 0x8000FFFF) | (maxlen << 16);

Of course if you look at the 2nd line in the QH documentation, this is incorrectly zeroing too many bits, destroying the C bit and part of the RL field.  It should be:

pipe->qh.capabilities[0] = (pipe->qh.capabilities[0] & 0xF800FFFF) | (maxlen << 16);
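To see exactly which bits each mask keeps, here is a standalone sketch of the two versions operating on a plain 32 bit value.  The bit positions in the comment are from the EHCI spec’s QH layout.  The function and constant names are made up for this example, and the real code works on pipe->qh.capabilities[0] rather than a bare integer.

```cpp
#include <cstdint>

// First QH "endpoint characteristics" word, per the EHCI spec:
//   31:28 RL, 27 C, 26:16 max packet length, 15 H, 14 DTC,
//   13:12 EPS, 11:8 endpoint number, 7 I, 6:0 device address
constexpr uint32_t QH_C_BIT       = 1u << 27;
constexpr uint32_t QH_MAXLEN_MASK = 0x7FFu << 16;

// Original mask: keeps only bit 31 and bits 15:0, so it also clears
// RL bits 30:28 and the C bit along with the old max packet length.
uint32_t setMaxlenBuggy(uint32_t cap0, uint32_t maxlen) {
    return (cap0 & 0x8000FFFF) | (maxlen << 16);
}

// Fixed mask: clears only the 11 bit max packet length field.
uint32_t setMaxlenFixed(uint32_t cap0, uint32_t maxlen) {
    return (cap0 & 0xF800FFFF) | (maxlen << 16);
}
```

Starting from a low speed control endpoint with RL = 15, C = 1 and an 8 byte max packet length, changing the length to 64 with the buggy mask leaves RL = 8 and C = 0.  The fixed mask leaves RL and C alone.  Note 0xF800FFFF is simply the inverse of the max packet length field mask.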

Indeed applying this fix made the “bad” 2-port hub work quite nicely.

Headslap Moment

After I found the fix, I looked one last time at the question “All other hubs work on Teensy, so what’s different about this particular hub?”

Here’s the communication with that keyboard through one of the “good” hubs!

Turns out the code has been sending improper SPLIT-START tokens all along, but every hub I own and the numerous hubs other people have used all were able to work anyway!  The difference was this particular hub was actually rejecting the bad tokens.

In all this time, developing this library earlier this year and all this recent work to track down this bug, apparently I never actually connected the protocol analyzer and watched the “good” hubs really communicating with low speed devices!  Could have saved a *lot* of work.  Now I know, in hindsight.

Why APA102 LEDs Have Trouble At 24 MHz

It’s well known that long APA102 LED strips have trouble at 24 MHz, usually after 150 to 250 LEDs.  But why?  Here’s my attempt to investigate.

One tempting explanation is signal quality problems due to horribly messy unterminated wiring carrying high speed signals, as in this photo!  But it turned out to be a more fundamental timing problem.

In this test, a Teensy 3.2 runs the FastLED “Cylon” example with this line:

 LEDS.addLeds<APA102,11,13,RGB,DATA_RATE_MHZ(24)>(leds,NUM_LEDS);

NUM_LEDS was set to 160, and I connected a strip of 144.  The oscilloscope traces are the signals which arrive at the end of the 144 LED strip.

First, I set the clock speed to only 2 MHz to see the “normal” waveforms output by the last APA102 LED.

The main thing to observe here is the APA102 output changes its data line (blue) at the falling edge of the clock (red).  You might notice a slight delay from the falling edge of the clock to the change in data, but it’s tiny relative to the slow 2 MHz clock cycle.

At 24 MHz, the delay is much more significant.  In this case I measured approximately 15 ns delay from clock to the data changing.

You might also notice the red trace doesn’t look like the 50% duty cycle SPI clock signal.  I believe this, together with the data delay, is the main cause of APA102 issues on long LED strips.

Here’s another measurement of the clock.

Each APA102 LED is supposed to regenerate the clock signal.  Ideally this is supposed to allow a very long LED strip.  But it appears each APA102 lengthens the clock high time and shortens the clock low time slightly.  This might be internal to the APA102 control chip, or it could be simply due to the clock output driver having a faster fall time than rise time, causing the following APA102 to receive a slightly different clock high time.  Perhaps the APA102 controller chip has a better N-channel transistor for pulling the clock output low than the P-channel transistor for driving it high?

After 144 LEDs, the clock low time on this strip has shrunk from 20.83 ns to approximately 18 ns.

With the data output delayed 15 ns after the falling edge, this leaves only 3 ns before the next APA102 LED captures the data on the rising edge of the clock.  As the strip gets longer, each APA102 reduces the clock low time, until it’s shorter than the clock to data delay.

FastLED defaults to 12 MHz SPI clock for APA102 LEDs on Teensy 3.x, which should allow for several hundred LEDs before this clock duty cycle change becomes a problem.

This test was just one APA102 strip I purchased about a year ago.  The Chinese semiconductor manufacturers making these LEDs have a history of changing the silicon without any notice.  I also only tested at room temperature, using only 1 example program which doesn’t drive the LEDs anywhere near 100% duty cycle (more heating).  I powered the 144 LEDs with 5V from both ends, but didn’t make any measurements of the voltage near the middle of the strip.  Power supply voltage might matter.  In other words, your mileage may vary.

But hopefully this helps with understanding what’s really going on, why short APA102 LED strips work so well with the fast clock speeds, but fail when using very long strips, even though the LEDs are supposed to regenerate the clock and data as they pass it down the strip.

Fast pulse counting with interrupts and why nested priority really helps

A question was recently asked on the forum, how fast can attachInterrupt count pulses.  I did some testing to find out, and made this video.

Turns out Teensy 3.2 can run about 1.25 million interrupts per second.  Teensy 3.6 can do about 2.55 million per second.  But these depend on assigning a top priority to the interrupt, as explained in the video.

Of course, both boards can use a timer to count pulses at very high rates, at least 30 MHz.

Non-Blocking WS2812 LED Library

Last weekend I wrote a new WS2812 LED library featuring non-blocking performance.

A common problem with WS2812 / NeoPixel LEDs is that creating their control signal with precise timing conflicts with other timing-sensitive software.  Adafruit NeoPixel completely blocks all interrupts.  FastLED can be configured to allow other interrupts, but any other library using interrupts for more than several microseconds can disrupt the WS2812 signal.

OctoWS2811 has offered non-blocking performance on Teensy 3.x since early 2013.  But it consumes 8 pins and places restrictions on 1 or 2 others, which makes it difficult to use in many projects needing some of those pins.  OctoWS2811 is designed for large LED projects (500 to 6000 LEDs), which is “overkill” for many projects using only dozens or even a few hundred LEDs.

Especially for projects using NeoPixel products with the Teensy Audio Library, or trying to receive incoming serial data (especially DMX lighting control), we have long needed a simple, single-pin, easy-to-use library that doesn’t interfere with interrupts.  For a long time I’ve meant to write this library, and this recent forum conversation finally gave me the push to get it working to truly solve the NeoPixels+Audio issue!

Inverted Serial Transmit

WS2812Serial uses one of the hardware serial ports to actually transmit the WS2812 data.  This idea certainly isn’t new.  This message is the oldest reference I could find of the basic idea.

The serial port is configured to run at 4 Mbit/sec, which is exactly 5 times the 800 kbit/sec speed WS2812 LEDs expect.  Every 5 data bits becomes one cycle of the WS2812 signal.

Standard 8N1 format serial sends 1 start bit, 8 data bits in least-significant-bit-first order, and then 1 stop bit.  In this case, the signal is inverted from the usual TTL level output.  Teensy LC & 3.x have hardware built in to invert the signal.

Since the start bit is always high, to send a zero bit to WS2812 the first 4 data bits are configured low.  To send a one bit, the first 3 are configured high and the 4th low.  The other half of the byte becomes the next WS2812 bit.  Bit 4 must always be high, and bits 5 to 7 control the data seen by WS2812.  The stop bit is always low, which automatically completes the 2nd WS2812 data.

Originally I tried using only 3 bits per WS2812 time slot, with 2.4 Mbit/sec serial baud rate.  Many of the WS2812 datasheets say the timing allows up to 450 ns pulse width, so in theory this 417 ns pulse should work.  In practice it did work with some WS2812 LEDs, but not others.  In the end, I changed to 4 Mbit/sec which allows it to work with all WS2812 / NeoPixel LEDs.
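Here is a minimal sketch of the 4 Mbit/sec encoding described above, written as standalone C++ so it can be tried on a PC.  It is an illustration of the bit pattern, not a copy of the library’s code.  encodeWs2812() and wireWaveform() are names invented here, and the color is assumed to already be in the byte order the LED expects.

```cpp
#include <cstdint>
#include <string>

// Encode one 24 bit color (most significant bit sent first) into 12 serial
// bytes, two WS2812 bits per byte.  Values are as written to the UART.
// The hardware inverts them on the pin.
void encodeWs2812(uint32_t color, uint8_t out[12]) {
    for (int i = 0; i < 12; i++) {
        uint8_t x = 0x08;                      // bit 3 = 1: low on the wire, ends the 1st time slot
        if (!(color & 0x00800000)) x |= 0x07;  // WS2812 "0": bits 0-2 go low on the wire
        if (!(color & 0x00400000)) x |= 0xE0;  // same for the 2nd WS2812 bit, bits 5-7
        color <<= 2;                           // move on to the next 2 color bits
        out[i] = x;
    }
}

// Show what one serial byte looks like on the inverted TX pin, one
// character per 250 ns bit time: start bit, 8 data bits, stop bit.
std::string wireWaveform(uint8_t x) {
    std::string w = "H";                       // inverted start bit is high
    for (int bit = 0; bit < 8; bit++) {        // data bits go out LSB first
        w += (x & (1 << bit)) ? 'L' : 'H';     // inverted: 1 -> low, 0 -> high
    }
    w += 'L';                                  // inverted stop bit is low
    return w;
}
```

A byte of 0x08 comes out as HHHHL HHHHL, which is two WS2812 one bits.  0xEF comes out as HLLLL HLLLL, two zero bits.  Bit 4 is never set, so the 2nd time slot always begins high.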

Direct Memory Access (DMA)

To achieve non-blocking performance, and to run efficiently at 4 Mbit/sec baud rate, DMA is used to copy the data directly from memory to the serial port.

The result is a perfectly continuous WS2812 output which does not require any interrupts and leaves the processor free to run other libraries or your program to compute the next frame of LED data.

The need to compute all of the serial data before each update does lead to the one major drawback of this non-blocking approach: memory usage.  Normally with FastLED or Adafruit NeoPixel, only 3 bytes of memory are used per LED.  WS2812Serial requires 15 bytes, the normal 3 for drawing, and 12 for composing the serial data.
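The 15 bytes per LED figure falls straight out of the encoding: 24 WS2812 bits at 2 per serial byte is 12 bytes, plus the usual 3 for drawing.  A tiny sketch for sizing a project, with names made up for this example.  The frame time uses the 800 kbit/sec rate mentioned earlier and ignores the idle gap the LEDs need to latch.

```cpp
#include <cstddef>

constexpr size_t kDrawBytesPerLed   = 3;       // one byte each for R, G, B
constexpr size_t kSerialBytesPerLed = 24 / 2;  // 24 WS2812 bits, 2 per serial byte

// Total RAM needed for numLeds LEDs: drawing buffer plus serial buffer.
constexpr size_t bufferBytes(size_t numLeds) {
    return numLeds * (kDrawBytesPerLed + kSerialBytesPerLed);
}

// Time to shift out one frame, in microseconds.  Each WS2812 bit takes
// 1.25 us at 800 kbit/sec.  The latch gap between frames is not included.
constexpr double frameMicros(size_t numLeds) {
    return numLeds * 24 * 1.25;
}
```

For example, 300 LEDs need 4500 bytes of RAM and take 9000 microseconds per frame to transmit, during which the CPU is free.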

Fortunately the code is fairly simple.  Here is the entire show() function which updates the LEDs.  In the middle you can see “x = 0x08” which sets the begin bit for the 2nd half of each byte’s WS2812 output, and then the two logical OR operations which control the groups of 3 bits which are the shaded portion of the drawing above.

In this code sample you can also see my DMAChannel.h abstraction layer for DMA transfers.  It is my attempt to make DMA simple to use, like other Arduino libraries.  Obviously things are not quite there yet, especially for Teensy LC where you can see I had to resort to directly programming the DMA controller registers, rather than using the functions to configure the source, destination, transfer size and count.

At some point I intend to write a detailed article about how DMA works.  Mike from Hackaday has been asking me to do this for years!  If you’d also like to see it, remind me too….

One other possible idea for this library might involve using two DMA channels and their interrupts, to allow a smaller serial buffer.  The basic idea would involve rendering only part of the output, and configuring each DMA channel to send half.  As each completes and generates an interrupt, another chunk of the output could be generated and the just-finished DMA channel could be quickly reconfigured to send the next chunk.  Ideally, this could allow a relatively small memory buffer.  It would require interrupts, but if they are delayed by other libraries or code, hopefully a user could make a trade-off between memory usage and allowable interrupt latency.

For now, WS2812Serial simply requires a big frame buffer and gives completely non-blocking performance.  No interrupts are ever used.  That does consume extra RAM, but the huge benefit is compatibility with other code or libraries that require interrupts or CPU time while the LEDs update.

Pilot Light Flame Sensor for Burning Man Art

I recently built flame sensing electronics for Martin “Moltensteelman” Montesano’s “Three Wishes” art project to be shown at Burning Man.

As a safety precaution, certain types of fire art installations are required to use sensors on their pilot lights.  If wind blows out the pilot, the gas must automatically turn off and other valves controlling high pressure propane directed at that pilot must be prevented from opening.

In this article, I’d like to share with you details of how I built a circuit board for Martin’s project to meet these requirements.

Safety First  Disclaimer First

This Information (web page and related hardware and software files) are provided “as is” and without warranty of any kind, expressed or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.  In no event shall PJRC.COM, LLC or Paul Stoffregen be liable for any direct, indirect, incidental, special, exemplary, or consequential damages arising in any way out of the use of this Information, even if advised of the possibility of such damage.

My hope is you’ll find this article interesting.  If you do make use of it, remember it’s something you found for free on the Internet.  Ultimately it’s your responsibility to manage and accept responsibility for the risks of any project you build.

How To Sense A Flame

There are at least 3 ways to sense fire.

  1. Infrared Light: If you search for flame sensors and Arduino, nearly all tutorials and inexpensive products use a simple IR diode or photo-transistor circuit.  Infrared light also comes from many other natural and man-made sources, making this approach very difficult to implement reliably.  The absolute last thing you want in this application is a false reading allowing high pressure propane to flow!
  2. Heat / Temperature: Thermocouples and Thermopiles are commonly used in water heaters and gas fireplaces.  However, temperature sensing can be quite a challenge for an outdoor application exposed to wind and made with metals that can remain hot for quite some time.
  3. Conductivity / Flame Rectification: Modern gas furnaces use this highly reliable technique.  Gas flames are conductive and act as a rectifier, similar to a diode.  The rectification effect is supposed to be due to a difference in the mobility of the positive ionized particles (burning fuel) and free electrons.  Applying a voltage to the flame can move the electrons much moreso than the positively charged ions.  The best explanation I’ve found is in this paper by Andreas Möllberg.

Martin had already purchased this flame sensor rod (Rheem 62-23543-01), based on the fact that it’s the sensor used in a common furnace, that it has a convenient L-shape which works nicely for his art’s pilot light design, and that it’s only $7.  So the path of flame rectification sensing was chosen…

He also rigged up 1 of the Three Wishes pilot lights to a regulator and fitting so I could run it in my back yard from my barbeque’s propane tank!

With this sensor and test gear, I started on electronics to make it actually work.

Flame Rectification Circuitry

I must confess, before this project I had never even heard of flame rectification.  Reading Möllberg’s paper, it was clear the flame acts like a diode with a very high value resistor in series.  Many guides exist for training furnace technicians with good info.  So the basic idea is to apply a substantial AC voltage to the flame and then look for a tiny pulsed DC current.

After a couple false starts, a little more searching turned up several patents.  I tried a couple of these, ultimately settling on US5439374, partly because it allows an actual analog reading (many circuits are just flame vs no-flame indicators), and partly because it expired in 2013.

US5439374’s basic idea is current can only take 2 paths to get to the flame, either through capacitor #24, or through the resistors and transistor.  Since capacitors can only pass AC current, if any net DC current flows through the flame, it must also flow through the transistor’s base-emitter junction.

An amplified copy of the DC current flows through the transistor and diode, causing capacitor #50 to charge up.  Higher DC flame conduction causes this capacitor to charge up faster.

I built an adaptation of the US5439374 circuit on a breadboard.

Much of this circuitry generates the AC voltage, because it’s meant to run on 12 volts DC.  From left to right, a Teensy LC creates an AC waveform using its DAC output.  It goes through a low-pass filter to smooth the stairstep edges, and an amplifier with gain of 3 to turn it into a 10 volt peak-to-peak waveform.  The amplifier uses a pair of transistors to buffer its output, which then drives a small transformer with approximately a 1:11 ratio.  The result is about 40 volts AC (rms), or about 112 volts peak-to-peak.  Rather than use 50 or 60 Hz as in a furnace, the Teensy generates 1kHz AC, which allows for faster sensing and a smaller transformer and capacitors.
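The voltage numbers can be sanity checked with a few lines of arithmetic.  This sketch assumes a clean sine wave and an ideal transformer with exactly 11 times step-up, which the real parts only approximate, so it lands slightly below the measured figures.  The function names are made up for this example.

```cpp
#include <cmath>

// DAC swing needed so that an amplifier with the given gain outputs ampVpp.
double dacPeakToPeak(double ampVpp, double gain) {
    return ampVpp / gain;
}

// Secondary peak-to-peak voltage of an ideal transformer.
double secondaryPeakToPeak(double primaryVpp, double turnsRatio) {
    return primaryVpp * turnsRatio;
}

// RMS value of a sine wave with the given peak-to-peak voltage.
double sineRms(double vpp) {
    return vpp / (2.0 * std::sqrt(2.0));
}
```

A 10 volt peak-to-peak drive needs about 3.3 volts peak-to-peak from the DAC, and after the transformer becomes 110 volts peak-to-peak or roughly 39 volts rms, close to the approximate 112 and 40 volt figures quoted above.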

The parts in the upper right area of the breadboard are the US5439374 circuit.  I used a 2N5087 transistor and after some fiddling settled on 1nF & 10nF capacitors.  On the far right side are 15M, 10M, 10M resistors and a diode meant to simulate the flame.  As you can imagine, I tried many combinations of these, and in this photo a wire is testing what happens when the output shorts to ground.  It turns out the US5439374 circuit works extremely well, able to measure diode plus 35M ohm resistance quite well.

I added one more NPN transistor across the capacitor, which you can see just to the bottom right of the transformer.  The other half of the dual opamp chip turns on this transistor if the capacitor’s voltage rises above 3.3V.  It turns out this circuit can gradually charge the capacitor to about 20 volts, so this protects the Teensy’s analog input pin, in case the code isn’t running and regularly discharging the capacitor before it can accumulate too much voltage.

First Test With Fire

No amount of testing with resistors and diodes could really satisfy my lingering doubt.  Would this really work with actual fire?  So it was time to run a first test with the breadboard circuit connected to a real flame.

Martin came over and we ran it for the first time with real fire.  The Teensy had just a simple program measuring the elapsed microseconds for the capacitor to charge to 2 volts.
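The measurement idea can be sketched roughly like this.  It is not the actual test program.  The readVolts and nowMicros callbacks are hypothetical stand-ins for reading the analog pin and a microsecond counter, and the timeout is an assumption so that "no flame" gives a definite result instead of waiting forever.  The caller is assumed to have discharged the capacitor first.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Result of one flame measurement.
struct ChargeReading {
    bool reachedThreshold;   // false means timeout: no flame signal seen
    std::uint32_t micros;    // elapsed time, or the timeout value on failure
};

// Wait for the sense capacitor to reach thresholdVolts, giving up after timeoutMicros.
// readVolts and nowMicros are hypothetical hooks for the hardware.
ChargeReading measureChargeTime(const std::function<double()>& readVolts,
                                const std::function<std::uint32_t()>& nowMicros,
                                double thresholdVolts,
                                std::uint32_t timeoutMicros) {
    assert(timeoutMicros > 0);
    const std::uint32_t start = nowMicros();
    for (;;) {
        // Unsigned subtraction still gives the right answer if the counter wraps.
        std::uint32_t elapsed = nowMicros() - start;
        if (readVolts() >= thresholdVolts) return {true, elapsed};
        if (elapsed >= timeoutMicros) return {false, timeoutMicros};
    }
}
```

A stronger flame charges the capacitor faster, so a smaller number means a stronger signal, and hitting the timeout means no flame was detected.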

I’m happy to say the flame gave a really strong signal.  I had done a lot of testing with 10M to 35M resistors.  Even with the gas turned down very low, with the flame just barely flickering, we got readings (low numbers) printing in the serial monitor.

With the circuit working, it was time to turn it into a usable design to control the pilot light and other valves.

Circuit Board

I quickly designed a circuit board with the flame sensing circuit, a power mosfet for controlling the pilot light gas valve, a relay to enable the other valves, a special timer circuit I’ll describe in more detail in the next section, and all the usual power input conditioning you’d expect.

Since time was running short, I ordered boards from Sunstone Circuits.  They have a very rapid service.  The quick turn prices are better if you choose the option for no solder mask or silk screen.  I put the order in for these boards on Wednesday for a 2-day turn.  Sunstone shipped 1 day early, so I actually got these on Friday!  Thanks Sunstone.  🙂

This was the first board I built.  As you can see, it has a smaller transformer.  I only had 4 of the larger, better transformers, so I wanted to try this one first.

Luckily, the board worked.  No major mistakes.  However, during testing I did discover one small oversight.  The B130 clamping diode has a small leakage current.  It’s nowhere near enough to turn on a solenoid, but it does make the output measure about 10 volts with a multimeter (having about 10M input impedance).  On the final boards I added a 4.7K resistor, so the output is nearly zero volts when off.  The rest of the info in this section has this extra 4.7K resistor added.
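Here is a back-of-envelope check of why 4.7K is plenty.  This is only a simplified model, not a measurement.  It assumes the leakage path acts like a single resistance from the 12 volt supply to the output, sized so a 10M multimeter reads 10 volts.  The function names are made up for this example.

```cpp
#include <cassert>

// Parallel combination of two resistances, in ohms.
double parallelOhms(double a, double b) {
    assert(a > 0.0 && b > 0.0);
    return (a * b) / (a + b);
}

// Voltage across loadOhms when fed from supplyVolts through seriesOhms.
double dividerVolts(double supplyVolts, double seriesOhms, double loadOhms) {
    assert(seriesOhms >= 0.0 && loadOhms > 0.0);
    return supplyVolts * loadOhms / (seriesOhms + loadOhms);
}

// Series resistance that would make a meter of meterOhms read meterVolts
// from supplyVolts.  This is the assumed stand-in for the leakage path.
double impliedLeakageOhms(double supplyVolts, double meterVolts, double meterOhms) {
    assert(meterVolts > 0.0 && meterVolts < supplyVolts);
    return meterOhms * (supplyVolts - meterVolts) / meterVolts;
}
```

Under this model the leakage path looks like about 2M ohms.  Putting 4.7K in parallel with the 10M meter and feeding it through that 2M gives roughly 0.03 volts, which agrees with the output reading nearly zero once the resistor is added.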

Circuit Board Parts:

  1  Teensy LC
  3  LED, Green
  1  LED, Red
  1  Relay, RT424012, Mouser 655-RT424012
  1  Terminal Block Header, 4 pin, Phoenix 1755752, 277-1152-ND
  1  Terminal Block Header, 6 pin, Phoenix 1755778, 277-1154-ND
  1  Connector, 2 pin right angle header, Molex 22-05-3021
  1  LMC6482A Opamp, LMC6482AIMX
  1  74VHC123 Dual One-shot, 74VHC123AMXCT-ND
  1  LM2940IMPX-5.0 Voltage Regulator, LM2940IMPX-5.0/NOPBCT-ND
  1  Transformer, Audio 1K:8ohm, Mouser/Xicon TU003, Mouser 42TU003-RC
  2  IRFR5305 Mosfet, P-channel, IRFR5305PBFCT-ND
  2  PNP Transistor, 2N5087, TO-92, 2N5087-ND, 2N5087CS-ND
  2  Resistor, 10 ohm, 1%, 0805
  3  Resistor, 220 ohm, 1%, 0603
  2  Resistor, 470 ohm, 1%, 0603
  6  Resistor, 1K, 1%, 0603
  1  Resistor, 4.7K, 1%, 0805 (add between gas solenoid output pins)
  8  Resistor, 10K, 1%, 0603
  1  Resistor, 22K, 1%, 0603
  2  Resistor, 47K, 1%, 0603
  1  Resistor, 100K, 1%, 0603
  2  Resistor, 220K, 1%, 0805
  1  Resistor, 470K, 1%, 0805
  5  Resistor, 1M, 1%, 0805
  3  Capacitor, 4.7nF, C0G, 0805, 1276-6729-1-ND
  1  Capacitor, 1nF, 100V, Polyester Film, 493-3476-ND
  1  Capacitor, 10nF, 50V, Polyester Film, 493-3455-ND
  2  Capacitor, 0.1uF, 50V, X7R, 0603
  7  Capacitor, 1uF, 35V, X7R, 0805
  1  Capacitor, 100uF, radial, 6.3mm diameter, 493-13394-ND
  1  Capacitor, 100uF, 6.3V Tantalum
  1  Capacitor, 470uF, axial, 4053PHCT-ND or 4054PHBK-ND
  3  Diode, Dual Common Cathode, MMBD4148, MMBD4148CCCT-ND, MMBD4148CC-TPMSCT-ND
  1  Diode, Dual Schottky Common Cathode, BAT54C, BAT54C-FDICT-ND
  1  Diode, Schottky, 1A, 30V, B130, B130-FDICT-ND
  1  Diode, Zener, 12V, 1SMB5927, 1SMB5927BT3GOSCT-ND
  4  NPN Transistor, MMBT3904, MMBT3904FSCT-ND
  1  PNP Transistor, MMBT3906, MMBT3906FSCT-ND
  1  Pushbutton, 401-1426-1-ND (optional)
  2  header, 14 pins
  1  header, 4 pins (ok to cut from longer header)
  2  socket, 14 pins
  1  socket, 4 pins (ok to cut from longer socket, may need to shave side to fit)
  0  Connector, 3 pin right angle header, Molex 22-05-3031 --- not used
  0  MAX3483 RS485 Transceiver --- not used

If you’d like to get some of these boards made, I’ve shared the design on OSH Park.

Here is the parts placement diagram for building the board.

When bringing up this board for the first time, there’s quite a lot of hardware to test.  First, I applied 12V power without a Teensy installed, only to check the 5V power.  If something is wrong with the power input, discovering it early can save the Teensy and most of the other circuitry.

I created this hardware test sketch to help verify most of the board’s hardware.  While running, the gas solenoid output turns on for 1/2 second and then off for 1 second.  It’s best to check the timing with a logic analyzer or oscilloscope, but if the timer circuitry isn’t working the likely failures are no changes or 3 rapid pulses every 1.5 seconds.

Holding the pushbutton changes the output to the relay, to allow testing the other timer, the relay & its transistor, and of course the pushbutton.

The test program also generates a 1.2 volt AC waveform (3.3 volts peak-to-peak) at Teensy LC’s DAC pin.  The opamp circuitry is supposed to increase it to 3.6 volts AC (10.3 volts peak-to-peak) at the transformer input.  The transformer should step it up to approximately 40 volts AC (112 volts peak-to-peak on an oscilloscope).  When testing these voltages, oscilloscope probes should be in 10X mode.
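If you’re checking these with a multimeter on its AC range rather than an oscilloscope, it helps to convert between the two ways of stating the voltage.  This small sketch assumes a clean sine wave.  The helper names are just for illustration.

```cpp
#include <cassert>
#include <cmath>

// RMS value of a sine wave with the given peak-to-peak amplitude.
double sineRmsFromPeakToPeak(double vpp) {
    assert(vpp >= 0.0);
    return vpp / (2.0 * std::sqrt(2.0));
}

// Peak-to-peak amplitude of a sine wave with the given RMS value.
double sinePeakToPeakFromRms(double vrms) {
    assert(vrms >= 0.0);
    return vrms * 2.0 * std::sqrt(2.0);
}
```

Plugging in the numbers above, 3.3 volts peak-to-peak is about 1.17 volts rms, 10.3 is about 3.64, and 112 is about 39.6, which match the rounded figures in the text.  The ratio of 112 to 10.3 is also close to the transformer’s approximate 1:11 step-up.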

The final test to check the US5439374 flame sensor circuit was done with the board programmed with the final code.  It prints the flame measurement to the Arduino Serial Monitor.  Testing is pretty easy to do with resistors and diodes.  A fast non-Schottky diode is best.  I used a UF1003-T diode.  Smaller numbers are stronger signal levels.  The circuit should be able to measure at least 25M in series with the diode, but without the diode it should see no signal (highest number printed).

Building and testing these boards took most of a weekend.  Here are the final 4 boards.

Safety Features

The primary safety feature in Martin’s project is use of normally closed solenoid valves and plumbing rated for high pressure propane.  All failure modes need to avoid keeping power applied to the solenoid valves.  When the valves are off, the propane is contained.

Rather than allowing the Teensy pins to directly control voltage to the valves, I added hardware timers using a 74VHC123 chip.  This part works essentially the same way as the familiar 555 timer chip, with voltage comparators and set-reset latches.  But unlike the 555, it is designed to be retriggerable and it has logic for true edge triggering.  These can be added to a 555 with extra parts, but each adds design compromises.  The 74VHC123 gives both features in a highly reliable chip.

The timer turns on its output when it sees the rising edge of a trigger pulse.  The edge trigger is important.  Microcontroller failure modes tend to result in the pins stuck in one state.  The timer does not respond to its input stuck high or low.  Only the low-to-high transition begins the timing cycle.

This oscilloscope screenshot shows the waveforms when running the hardware test program.  The timer keeps its output high for 0.25 seconds following any rising edge trigger.  The test gives 3 pulses, then waits.  The waveform after this 3rd pulse tests a failure where Teensy might crash or lock up or otherwise fail.  The hardware timer automatically turns off the propane solenoid valve after 0.25 seconds without another pulse.
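The behavior is easier to reason about with a small software model.  To be clear, this is only a simulation of the idea, not how the 74VHC123 works internally, and the class name is made up.  It just shows the 2 properties that matter: only a rising edge starts the timing, and each new edge restarts it.

```cpp
#include <cassert>
#include <cstdint>

// Software model of a retriggerable, edge-triggered one-shot timer.
class RetriggerableOneShot {
public:
    explicit RetriggerableOneShot(std::uint32_t pulseMillis, bool initialLevel = false)
        : pulseMillis_(pulseMillis), lastLevel_(initialLevel) {
        assert(pulseMillis > 0);
    }

    // Feed the current trigger level and time; returns the output state.
    bool update(bool triggerLevel, std::uint32_t nowMillis) {
        if (triggerLevel && !lastLevel_) {   // rising edge only
            active_ = true;
            startMillis_ = nowMillis;        // retrigger: restart the timing cycle
        }
        lastLevel_ = triggerLevel;
        // A level stuck high or low never restarts the cycle, so the output expires.
        if (active_ && nowMillis - startMillis_ >= pulseMillis_) active_ = false;
        return active_;
    }

private:
    std::uint32_t pulseMillis_;
    std::uint32_t startMillis_ = 0;
    bool lastLevel_;
    bool active_ = false;
};
```

With a 250 ms pulse length, an input that goes high and stays high keeps the output on for only 250 ms.  The output stays on longer only if fresh rising edges keep arriving, which a crashed program can’t produce.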

The code Teensy runs also implements extra safety features.  The main one is a count of good flame readings.  Pulses are only sent to the timer if the previous 4 sensor readings all show a flame signal.  While the US5439374 circuit works very reliably, this extra check is meant to have the gas turn off in the case where only intermittent measurements are made.  Only when a consistent pattern of good flame readings is seen can the timers be triggered to turn on the outputs.
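The good reading count can be sketched as a simple counter that resets on any bad reading.  This is an illustration of the idea rather than the project’s actual code, and the class and method names are hypothetical.

```cpp
#include <cassert>

// Tracks consecutive good flame readings; any bad reading starts the count over.
class FlameConfirm {
public:
    explicit FlameConfirm(unsigned required) : required_(required) {
        assert(required > 0);
    }

    // Record one reading.  Returns true only when the most recent
    // `required` readings all showed a flame.
    bool addReading(bool flameSeen) {
        if (flameSeen) {
            if (count_ < required_) ++count_;   // saturate so the count can't overflow
        } else {
            count_ = 0;                         // one miss discards all prior good readings
        }
        return count_ >= required_;
    }

private:
    unsigned required_;
    unsigned count_ = 0;
};
```

With required set to 4, a single missed reading means the next 3 good readings still aren’t enough to send a pulse, so an intermittent signal lets the hardware timer run out.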

The code also measures the time while the manual start button is pressed.  If this button were to fail (in the pressed state), and the pilot light is out, we don’t want to turn on any gas.  A 20 second time limit is used, and the measured time is initialized to 100 seconds at startup, so a button already held down at power-up counts as expired.  The manual start only works if the button is first observed as not pressed, to reset this time limit, and then can only keep the gas turned on for 20 seconds.  If some unexpected mechanical failure results in an object falling onto the button, the worst case scenario is 20 seconds of pilot light gas.
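Here is a rough sketch of that button logic.  It is not the project’s actual code.  The names are made up, and it assumes the function is called regularly with the time elapsed since the previous call.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kLimitMillis = 20000;     // 20 second manual start limit
constexpr std::uint32_t kStartupMillis = 100000;  // startup value, already past the limit

// Limits how long a held button may keep the pilot gas on.
class ManualStartLimiter {
public:
    // Call regularly.  Returns true while manual start is allowed to run the gas.
    bool update(bool pressed, std::uint32_t deltaMillis) {
        if (!pressed) {
            heldMillis_ = 0;          // seeing the button released resets the limit
            return false;
        }
        // Saturating add keeps heldMillis_ at or below kStartupMillis.
        if (deltaMillis > kStartupMillis - heldMillis_) heldMillis_ = kStartupMillis;
        else heldMillis_ += deltaMillis;
        return heldMillis_ < kLimitMillis;
    }

private:
    // Starts expired, so a button stuck down at power-up never enables gas.
    std::uint32_t heldMillis_ = kStartupMillis;
};
```

The design choice worth noticing is the startup value.  Because the held time begins beyond the limit, the only way to get gas from the button is for the code to first see it released.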

Dust Resistant Construction

Anyone who’s been to Burning Man knows of the extreme dust.  So I started the enclosure design around this IP67 rated pushbutton, which has real threads & a nut (so many are the flimsy press-in type).

This button photo is from the Digikey page, and really, I’m a bit curious how they manage to get such perfect photos with the parts all standing?  Must be tricks real photographers know…

With the project coming together on a tight timeframe, I laser cut the enclosure from acrylic sheets I had on hand.  Here is the lid made from thin material, due to the depth of the button’s threads.

Some time ago, I purchased samples of rubber material to someday try to make gaskets for a waterproof box.  This was the day…

Again, a real photographer could have probably captured this without so much blur.  Maybe I could have with a tripod?  Maybe not?  The camera has a lot of trouble focusing on a black sheet.

Here is a first fit check of the gasket against an early attempt at the base.

One of the many little lessons I learned is the squeezed rubber exerts force against the plastic, enough to make it bend with this thin piece.  I ended up discarding this part and cutting bases from the thicker material to (hopefully) avoid gaps from the base bending.

The gasket seals the lid against the base, and the PCB against the base where the 2 connectors fit.  For the lids, I just glued pieces together with normal acrylic cement and then used silicone caulking compound along the inside seams.

Here are the 4 completed lids with the buttons installed, and the final thicker base, ready to be assembled.

I’d like to remember this part of the project going smoothly, but the truth is I ended up making a few different base designs.  Silly mistakes.  Here’s the final fit check.

One other important piece was the height of these standoffs.  The side walls are made to be about 0.015 inch taller than the combined standoff+PCB height, so there’s a slight squeeze applied to the rubber gasket.

Here’s the final assembly with the screws.  It stands off the table slightly, since the connectors protrude from the bottom.  Maybe not the most conventional design, but this was done pretty quickly.

One thing I considered but didn’t do was buy special sealing machine screws.  Hopefully the ordinary screws will work.  I guess we’ll soon know if this design resists the Playa dust of Burning Man, though Martin will probably also end up installing these inside a box that offers another layer of protection.

Here is a list of all the materials I used to make these enclosures.

  1  Terminal Block, 4 pin, Phoenix 1757035, 277-1013-ND
  1  Terminal Block, 6 pin, Phoenix 1757051, 277-1015-ND
  1  Pushbutton, IP67 waterproof/dustproof, GPB556A05BR, CW158-ND
  1  Connector housing, Molex 22-01-3027, WM2000-ND
  2  Connector crimp contact, Molex 08-50-0114, WM1114-ND
  5  Laser cut acrylic, 0.212 thick
  1  Laser cut acrylic, 0.100 thick
  1  Laser cut rubber, 0.063 thick
  4  Hex standoff, 4-40 thread, nylon, 36-1902D-ND
  4  Machine screw, 4-40, 1/4 inch
  4  Machine screw, 4-40, 1/2 inch
  2  #26 wire
  2  printed label, attach to Terminal Blocks

The laser cut design was created with Corel Draw v6.  You can find the file flame_sensor.cdr in this Github repository for the code.  These were cut with an Epilog laser, which has a driver that ignores the thicker lines when only vector cutting.

One final addition was printing these labels for the connectors.  Troubleshooting electronics in the desert can be frustrating.  Hopefully these will help a bit.

The flame_sensor_labels.pdf file for printing these is also in the Github repository.

Final Test With Fire

With everything built, we did another test with actual fire to make sure it really works.  Throughout the latter part of this project I actually became pretty comfortable testing with the diode & resistors, which is very consistent and repeatable.  But testing with the actual flame is fun.

Here are the final assembled units, one for each of the Three Wishes, plus a spare.

I’m excited to see the Three Wishes project come together.  I hope to update this page later with photos or a link to the project.  Photos from the first Three Wishes test have been added below.

If you do use any of this info in your own projects, please think carefully about safety and remember the disclaimer from above.

Three Wishes Art Installation Test (Aug 20, 2017)

Three Wishes photos by Jason Whitson, at Moltensteelman Headquarters, August 21, 2017.

Photo by Jay Bird, at Moltensteelman Headquarters, August 21, 2017.

Photo by JaMarcus LaKrantz, at Burning Man 2017