Black Friday Sale 2021

The Audio shield was on sale for Black Friday (sale has ended), with sockets and a flash memory chip included.

Teensyduino 1.54 Released

Teensyduino 1.54 was released last week.

This article is a detailed look at the new & improved features version 1.54 brings.

MicroMod Teensy Support

Sparkfun recently released MicroMod Teensy, which is a product of collaboration between PJRC and Sparkfun.

MicroMod Teensy has hardware similar to Teensy 4.0 (600 MHz ARM Cortex-M7, 1MB RAM), but with larger 16MB flash memory.

Teensyduino 1.54 brings full support for MicroMod Teensy. Simply select it from Arduino’s Tools > Boards menu.

Teensy has traditionally focused on DIY electronics experimentation & building, particularly with solderless breadboards. While a few shields exist, like the Audio board, PJRC doesn’t have the resources to create a large ecosystem of shields & accessories. Almost all projects involve DIY building.

Sparkfun’s MicroMod form factor aims to give you an ecosystem of “just plug parts together” carrier boards and Qwiic I2C modules for a wide variety of projects. If any of Sparkfun’s growing collection of carrier boards is a close fit to your project’s needs, and you’d rather buy something than experience the joy and learning (but also frustration) of DIY building, MicroMod carrier boards may save you a lot of time.

MicroMod can also be useful for low volume production. Obviously at high volume you would create a fully custom design from the ground up, and at medium volume a custom PCB using the Teensy 4 bootloader chip can make good sense. But for low volume production, keeping up-front costs low and minimizing money tied up in inventory is often the best path to success. In those cases, you would manufacture a simple 2-layer PCB with as few expensive parts as possible, then populate it with Teensy and other complex modules at the time you actually sell.

Traditional Teensy requires through-hole header pins, which are simple but labor intensive to solder. MicroMod form factor allows you to build your PCB using an inexpensive M.2 socket. Then when you’re ready to sell a product, just buy the MicroMod board and plug it in.

Like all Teensy models, MicroMod Teensy is supported by Teensy Loader.

Teensy Loader is a stand alone program which can be used without the rest of Teensyduino and the Arduino IDE. When you need to program a large number of boards with the same firmware, load the HEX file and turn on Teensy Loader’s “Auto” mode. Then plug each board into a USB cable and press its Program / Boot button, or momentarily touch pads together if your custom PCB is made without a pushbutton. Auto mode immediately uploads your firmware without requiring any keystrokes or mouse clicks!

Arduino 1.8.15 Support

Arduino released Arduino IDE 1.8.15 on May 15, 2021, and also version 1.8.14 several days earlier.

Teensyduino 1.54 brings full support for these 2 latest (non-beta) versions of the Arduino software.

If you are unsure which version you have, click Help > About to check. On Macintosh, the About menu item is found under the program name rather than Help.

SD Cards Using SdFat with SD Library Compatibility

Until now, 2 different libraries were used to access SD cards. The old Arduino SD library is used by most programs. Bill Greiman’s SdFat library offers higher performance, support for large cards, and long filenames rather than being limited to the 8.3 DOS filename format.

Teensyduino 1.54 has completely removed the very old Arduino SD library, which internally included an ancient version of SdFat.

SD.h has been replaced by a thin compatibility layer to allow all programs written to use the old SD library to still compile and function properly, but all SD card access is now actually performed with the SdFat library.

You can even mix usage of the traditional SD library and SdFat’s more capable API in the same program. To do this, use “SD.sdfs” to access the SdFat instance within the SD compatibility layer.

One of the most useful ways to mix APIs is to call SdFat’s begin() through SD.sdfs.begin(), rather than the simple SD.begin(), which offers only a choice of which pin to use for CS. SdFat’s begin() lets you configure several hardware access optimizations.

After calling either form of begin(), the SD card can be accessed using either SD library functions, or SdFat functions, or a mix of both. For more detail, open File > Examples > SD > SdFat_Usage.

LittleFS for Files on Flash Chips

A LittleFS library is now included, providing littlefs published by ARM with a set of drivers for common memory chips and a high-level API compatible with the Arduino SD library.

Before 1.54, only the SerialFlash library was used for flash memory chips on Teensy. SerialFlash provides essentially raw access to the underlying flash memory, which gives best performance, but comes with many limitations. In particular, files must be created with a fixed size and can not grow in size beyond their original allocation.

LittleFS provides a traditional filesystem. Files may be written as you would with the Arduino SD library. LittleFS has flash memory wear leveling and copy-on-write behavior to protect against filesystem corruption if power is lost while writing, so LittleFS is a good choice for data logging or other applications which regularly write. However, the copy-on-write behavior does impose overhead, making LittleFS slower than SerialFlash.

LittleFS supports flash memory chips on both the QSPI memory area of Teensy 4.1 and connected to the SPI ports. FRAM memory chips are only supported on SPI ports. Teensy’s built in memory can also be used.

These are the supported memory types with LittleFS in Teensyduino 1.54.

Winbond W25Q16JV*IQ / W25Q16FV
Winbond W25Q32JV*IQ / W25Q32FV
Winbond W25Q64JV*IQ / W25Q64FV
Winbond W25Q128JV*IQ / W25Q128FV
Winbond W25Q256JV*IQ
Winbond W25Q512JV*IQ
Winbond W25Q64JV*IM (DTR)
Winbond W25Q128JV*IM (DTR)
Winbond W25Q256JV*IM (DTR)
Winbond W25Q512JV*IM (DTR)
Adesto/Atmel AT25SF041
Spansion S25FL208K
Winbond W25N01G
Winbond W25N02G
Winbond W25M02
Cypress CY15B108QN-40SXI
Cypress FM25V10-G
Cypress FM25V10-G rev1
Cypress CY15B104Q-SXI
ROHM MR45V100A
Teensy 4.0, 4.1, MicroMod Program Flash
RAM Disk, internal RAM or PSRAM on Teensy 4.1

Use of program memory comes with some caveats. While writing, your program may briefly stall when the flash memory is busy with an erase or write operation. Uploading new programs fully erases the flash memory, destroying any saved files. Future Teensyduino may provide a way to preserve the files across code uploading, but in 1.54 all files are lost. Of course, this applies only to program flash. Uploading code doesn’t alter data stored in other flash memory connected to the SPI ports or QSPI memory expansion.

To get started using LittleFS, check out the examples in File > Examples > LittleFS. Documentation and more example code can be found in the LittleFS README file.

Audio Bandwidth Limited Waveforms

Audio waveform synthesis now supports bandwidth limited sawtooth, square, triangle and pulse waveforms, thanks to a contribution from Mark T.

These are the waveform types now supported by the waveform synthesis object.

Bandwidth limited waveforms are useful when another waveform rapidly modulates frequency, phase or amplitude. Modulation creates additional spectral content. With standard full bandwidth waveforms, this new sound content can suffer from Nyquist aliasing. Using bandwidth limited waveforms can avoid those aliasing problems.
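To make the aliasing concern concrete, here is a minimal sketch of where a tone lands once it exceeds the Nyquist limit. The function name and the 44100 Hz sample rate used in the note below are assumptions chosen for illustration. Neither is taken from the audio library.

```cpp
#include <cmath>

// Frequency at which a tone of `freqHz` is actually heard after being
// sampled at `sampleRateHz`. Content above half the sample rate (Nyquist)
// folds back into the audible band.
double aliasedFrequency(double freqHz, double sampleRateHz) {
    // Distance from the nearest integer multiple of the sample rate.
    double nearestMultiple = std::round(freqHz / sampleRateHz) * sampleRateHz;
    return std::fabs(freqHz - nearestMultiple);
}
```

For example, a 5 kHz sawtooth has a fifth harmonic at 25 kHz. With the assumed 44100 Hz sample rate, aliasedFrequency(25000, 44100) gives 19100 Hz, which is not a multiple of 5 kHz and so is heard as an unrelated tone. Tones below Nyquist are returned unchanged.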

Some people also feel the bandwidth limited waveforms simply sound better, or “more musical”.

Now you can have either standard or bandwidth limited waveforms, simply by changing the waveform name. However, the bandwidth limited versions do require more CPU time. Teensy 4.0, 4.1 or MicroMod are best for polyphonic synthesis where many bandwidth limited waveforms may need to be synthesized simultaneously.

Audio Ladder Filter

The audio library now includes a low-pass ladder filter, thanks to a contribution by Richard van Hoesel, based on the paper Oscillator and Filter Algorithms for Virtual Analog Synthesis by Vesa Välimäki and Antti Huovilainen, published in Computer Music Journal, volume 30, number 2 (2006), page 19.

This ladder filter attempts to provide classic Moog sound when used with subtractive synthesis, where complex waveforms are generated with rich sound spectrum and then filtered, often using other waveforms to dynamically alter the filter’s parameters.

The ladder filter has 3 inputs, for the signal to filter, a control signal which can modulate the filter’s corner frequency, and another control signal which can modulate the filter’s resonance.

Like the original analog Moog filter, resonance can be adjusted or modulated all the way to cause self oscillation. Many classic synthesizer sounds make use of this self oscillation feature.

Internally, this ladder filter uses hyperbolic tangent approximation and 4X oversampling to simulate the non-linear behavior of the original analog Moog filters. An inputDrive() function is available to adjust the response from a very “clean” sound to the “dirty” overdrive mode popular with the Minimoog Model D synthesizer.

To quickly hear the ladder filter, open File > Examples > Audio > Synthesis > LadderFilter.

The ladder filter is numerically intensive. While it technically can run on Teensy 3.5 & 3.6, for practical usage Teensy 4.0 should be considered the minimum required hardware.

RGBW LEDs

Both WS2812Serial and OctoWS2811 now fully support WS2812, WS2812B, SK6812 or “NeoPixel” RGBW addressable LEDs. The additional white LED allows creation of pastel-like colors, and reliable white at greatly reduced power consumption.

These libraries use Direct Memory Access (DMA) to avoid blocking your program from running while the LED data is transmitted at 800 kbit/sec. The non-blocking behavior gives your program more time to compute complex animations between LED updates. It also prevents interference with audio, serial communication, or other libraries requiring interrupts.

On Teensy 4.0, 4.1, and MicroMod, OctoWS2811 can use any combination of digital pins, even while using RGBW LEDs, rather than only a fixed set of 8 pins as with older Teensy models. For details, open File > Examples > OctoWS2811 > Teensy4_PinList. Both libraries now include examples for RGBW. If switching from RGB to RGBW, be sure to copy or update the buffer memory definition, as 4 bytes are needed for each LED rather than only 3.
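The buffer size change can be sketched as simple arithmetic. The helper below is hypothetical and only counts bytes of pixel data. The examples shipped with each library remain the authority on how their buffers are actually declared.

```cpp
#include <cstddef>

constexpr std::size_t kBytesPerRgbLed  = 3;  // red, green, blue
constexpr std::size_t kBytesPerRgbwLed = 4;  // red, green, blue, white

// Bytes of pixel data for `numPins` strips of `ledsPerStrip` LEDs each.
constexpr std::size_t ledBufferBytes(std::size_t ledsPerStrip,
                                     std::size_t numPins,
                                     std::size_t bytesPerLed) {
    return ledsPerStrip * numPins * bytesPerLed;
}
```

With 8 strips of 100 LEDs, the data grows from 2400 bytes for RGB to 3200 bytes for RGBW, so a buffer still sized for RGB would be too small.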

FlexIO_t4 Library

FlexIO provides a sort of construction kit for building custom communication peripherals, using 32 bit shift registers, 8 & 16 bit timers, special control logic, and configurable signal routing to I/O pins.

Each FlexIO port provides 8 shift registers and 8 timers. The IMXRT1062 chip on Teensy 4.0 has 3 of these FlexIO ports.

Teensyduino 1.54 now includes the FlexIO_t4 library written by Kurt E.

First, this library provides objects for FlexSerial and FlexIOSPI classes, so you can create even more SPI and Serial ports, beyond the regular 8 serial ports and 3 SPI ports!

Second, the library provides dynamic resource management for the shifter & timer resources within each FlexIO port. The goal is for libraries using FlexIO, such as TMM-HM01B0-Camera and TriantaduoWS2811, and FlexSerial & FlexIOSPI, to coordinate their usage of the shifter & timer resources. This resource coordination is still a work-in-progress. Ideally, in future versions you will be able to use any combination of these and other FlexIO libraries, and they will automatically resolve which library instances use each shifter & timer within the FlexIO ports.

Additional Serial Buffer Memory

Each hardware serial object, Serial1 to Serial8, has buffers for storing data recently received or ready to transmit. For certain applications, you might like to increase the size of these buffers.

For example, if your loop() function performs tasks which need to happen regularly for proper functioning of your project, but also needs to occasionally transmit a large message on a serial port, the fixed buffer size can become an issue.

When this simple program is run, indeed the messages appear on the serial TX1 pin (yellow trace below) every 100ms. But at the beginning of each message, pin 4 (the green trace below) stops changing for several milliseconds because Serial1.write() must wait for space in the transmit buffer.

The usual but complex solution to this problem is Serial1.availableForWrite(), which tells you how many bytes can be written without incurring extra delay. However, this requires restructuring your program to write the outgoing message in multiple pieces. If the message contains data which may change over time, you may also need to make a copy of all the data. Doable, but complicated. Everything would be so much simpler if you could increase the Serial1 buffer size.

Now you can increase the serial buffer sizes on all 32 bit Teensy boards, thanks to contributions from Kurt E. To increase the Serial1 transmit buffer, you use Serial1.addMemoryForWrite().

You create an array of bytes to serve as the extra memory. The array MUST be either static or global scope, because Serial1 will continue using it long after the current function ends.

When this program is run with extra buffer memory, the green trace never stops. The loop() function never stalls waiting on Serial1.write() because the buffer is now large enough to hold the entire outgoing message.

Of course, you can also increase the receive buffer with Serial1.addMemoryForRead(). If your program spends time doing lengthy computations or other work while serial data is arriving at a high baud rate, extra buffer memory can help avoid the loss of incoming bytes.

More Serial RX & CTS Pin Choices

Teensyduino 1.54 also brings more choices for which pins to use to receive serial data and flow control signals, on Teensy 4.0, 4.1, and MicroMod. Thanks to another contribution from Kurt E, you can now use pins 0, 1, 2, 3, 4, 5, 7, 8, 30, 31, 32, 33 with any of the Serial1 to Serial8 setRX(pin) and attachCts(pin) functions.

Normally the serial signals connect to the pins through a multiplexer which allows 1 of 8-10 peripherals to control the pin. This new feature automatically routes the serial receive and CTS receive signals through the “XBAR” (crossbar trigger network) peripheral to open up many more choices than could be configured using only the normal pin multiplexers.

This is particularly valuable for use of hardware RTS / CTS flow control, which allows fast baud rates up to 6 Mbit/sec to be used reliably with “normal” programs which may not always read incoming data rapidly. Serial1 to Serial8 support transmitting RTS on any pin, but with prior software, only Serial3 and Serial5 on Teensy 4.x could receive CTS (and Serial5 only supported CTS on a SD card pin). Now RTS / CTS flow control can be used easily with any of the 8 serial ports.

One caveat, however, is each serial port may only route one input signal through the XBAR peripheral. For each serial port, you may use either setRX() or attachCts() with these special XBAR pins, but not both.
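As a rough illustration, the pin list above can be captured in a small lookup, followed by one possible use of attachCts(). The helper name, the choice of Serial4, pin 5 and the baud rate are assumptions. The helper also ignores the case where a pin happens to be a port's normal RX or CTS pin and therefore needs no XBAR routing.

```cpp
#include <stdint.h>

// Pins listed above as usable with setRX() and attachCts() through XBAR.
constexpr uint8_t kXbarPins[] = {0, 1, 2, 3, 4, 5, 7, 8, 30, 31, 32, 33};

// True if `pin` is one of the XBAR-capable pins from the list above.
bool isXbarPin(uint8_t pin) {
  for (uint8_t candidate : kXbarPins) {
    if (candidate == pin) return true;
  }
  return false;
}

#if defined(TEENSYDUINO)  // the sketch below only builds for Teensy
#include <Arduino.h>

void setup() {
  Serial4.begin(2000000);  // assumed baud rate
  // Route CTS through XBAR. Because each port may route only one input
  // this way, setRX() is not also pointed at an XBAR pin here.
  Serial4.attachCts(5);
}

void loop() {
}
#endif
```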

Improved map()

Arduino’s map() function provides a convenient way to scale a number from one range to another. One common use is scaling the analog input pins, which give 0 to 1023 with default settings, to another range, such as 1 to 100.

Traditionally, map() has worked only with integer numbers. For example, mapping 1-10 to 4-21:

Sometimes you may want more precision. With Teensyduino 1.54, map() now automatically uses floating point math, if the input variable is float or double type. You can store your number into a float variable, or convert it to float right at the map input.

Of course, if you want the traditional integer-based map(), simply use an integer type for the input. Internally, a little C++ template magic is used to automatically use the integer version when your input is an integer type, or the float version when your input is float or double. If using 64 bit double, the math is done with full 64 bit precision.

Arduino map() has a long history of suffering integer round-off errors. For most applications, these errors are small and never noticed. But they are errors nonetheless.

Teensyduino 1.54 fixes the integer map() equation, so the integer results agree with the precise floating point results, when the float numbers are properly rounded to the nearest integer.
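The difference between truncating and properly rounded results can be shown with a small standalone sketch. This is not Teensyduino's implementation. The function names are hypothetical and the rounded version simply computes in double and rounds to nearest.

```cpp
#include <cmath>

// Textbook linear scaling, usable with integer or floating point types.
template <typename T>
T scaleLinear(T x, T inMin, T inMax, T outMin, T outMax) {
    return (x - inMin) * (outMax - outMin) / (inMax - inMin) + outMin;
}

// Traditional integer behavior: the division truncates toward zero.
long mapTruncating(long x, long inMin, long inMax, long outMin, long outMax) {
    return scaleLinear<long>(x, inMin, inMax, outMin, outMax);
}

// Integer result that agrees with the floating point answer after
// rounding to the nearest integer.
long mapRounded(long x, long inMin, long inMax, long outMin, long outMax) {
    return std::lround(scaleLinear<double>(x, inMin, inMax, outMin, outMax));
}
```

Mapping 5 from the range 1-10 to the range 4-21 gives about 11.56 in floating point. The truncating version returns 11, while the rounded version returns 12.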

delayNanoseconds() on all Teensy Models

All Teensy models now have a delayNanoseconds() function, even on Teensy 2.0. When you need a very short delay, this function is simple and easy to use.

The actual delay will often be slightly longer than requested. Just like delayMicroseconds() and delay() for milliseconds, it will always delay for at least the time specified.

In this simple example, a 140ns delay is used between two changes of a pin. The actual resulting pulse measures 159ns.

Internally, delayNanoseconds() is implemented differently on Teensy 2 & 3 than on Teensy 4. The older boards use a simple busy loop with “nop” instructions. This gives the best performance on those older & simpler architectures. But if an interrupt occurs during the delay time, the delay continues after the interrupt without any awareness its total time was lengthened.

Teensy 4 has a very different delayNanoseconds() implementation. Because Cortex-M7 relies on caching both data and instructions, and uses multiple types of memory with very different latency for cache misses, busy looping is not as predictable as with the older architectures. Instead, the desired nanoseconds are converted to CPU cycles, and then a wait loop polls the ARM DWT cycle counter. While this adds overhead before the delay begins, the result is highly predictable timing regardless of memory & cache. An added benefit is resilience to interrupts which may occur during the delay.
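The cycle counting approach can be sketched in portable C++. This is an illustration under stated assumptions, not the actual Teensy core code: the function names are hypothetical, and the counter is passed in as a callable so the logic can be tried on any machine.

```cpp
#include <cstdint>

// Convert nanoseconds to CPU cycles, rounding up so the delay is never
// shorter than requested.
constexpr uint32_t nsToCycles(uint32_t ns, uint32_t cpuHz) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(ns) * cpuHz + 999999999ULL) / 1000000000ULL);
}

// Busy-wait until at least `cycles` have elapsed on a free-running 32 bit
// counter. Unsigned subtraction keeps the comparison correct when the
// counter wraps around during the wait.
template <typename ReadCounter>
void waitCycles(uint32_t cycles, ReadCounter readCounter) {
    const uint32_t start = readCounter();
    while (static_cast<uint32_t>(readCounter() - start) < cycles) {
        // keep polling
    }
}
```

At 600 MHz, nsToCycles(140, 600000000) is 84 cycles. Because the loop compares against elapsed cycles rather than counting iterations, time spent in an interrupt still counts toward the delay. On Teensy 4, readCounter would read the DWT cycle counter, which I believe the core exposes as ARM_DWT_CYCCNT. Treat that name as an assumption to check against the core source.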

Normally delayNanoseconds() is used with a constant input for the nanosecond time. Inline code is used, where the compiler is able to better optimize results when the input is a constant. However, variables not known as constants at compile time may be used. All 32 bit boards still give fairly good results. On Teensy 2.0, use of variable input results in delay precision of about 1 microsecond.

Detailed Memory Usage Report

Prior Teensyduino versions used Arduino’s default memory usage summary after compiling.

For Teensy 4, this simple memory report leaves much to be desired.

Teensy 4.0 & MicroMod have 3 distinct memories. Teensy 4.1 has 4 distinct memory areas, if PSRAM chip(s) have been added.

Arduino’s simple memory usage also assumes RAM is only used for variables and flash is only used for code. On Teensy 4 boards, the RAM1 memory is partially used for code. Because the hardware supports partitioning RAM between Instruction Tightly Coupled Memory (ITCM) and Data TCM in 32K blocks, a portion of the last 32K block of ITCM is unused padding.
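The padding arithmetic can be sketched as follows. The helper names and the 40000 byte code size in the note are hypothetical. Only the 32K block size comes from the text above.

```cpp
#include <cstdint>

constexpr uint32_t kTcmBlockBytes = 32 * 1024;  // partition granularity

// Number of 32K blocks that must be assigned to ITCM to hold the code.
constexpr uint32_t itcmBlocks(uint32_t codeBytes) {
    return (codeBytes + kTcmBlockBytes - 1) / kTcmBlockBytes;
}

// Unused bytes at the end of the last ITCM block.
constexpr uint32_t itcmPaddingBytes(uint32_t codeBytes) {
    return itcmBlocks(codeBytes) * kTcmBlockBytes - codeBytes;
}
```

For example, 40000 bytes of code placed in ITCM needs 2 blocks (65536 bytes), leaving 25536 bytes of padding that can be used for neither code nor variables.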

Teensyduino 1.54 brings a more detailed memory usage summary which shows how each of the 3 or 4 memory regions is actually used.

Flash memory usage is given in more detail. One common issue is large arrays of constant data, such as lookup tables to speed computations or digital media like sound clips & fonts, can be mistakenly allocated to RAM. The report now shows how much is data versus code, so you can easily see the change in data stored in flash while working with such const data arrays.

A portion of flash memory usage is reported as “headers”. Slightly over 4K at the beginning of flash memory is reserved for data used by the IMXRT startup process. 3K at the end of your program is reserved for a digital signature. Up to 2K of padding may be used to align these areas to 1K boundaries. By reporting these as “headers”, they’re not mixed in with the code and data your program uses, giving you a more accurate view of how your program is really using the flash memory.

Fault Recovery & CrashReport

Teensy 4.0, 4.1 & MicroMod have a Memory Protection Unit (MPU) built into the ARM Cortex-M7 processor. The MPU allows permissions and caching configuration to be established on up to 16 memory regions. This is much simpler than a Memory Management Unit (MMU) found in PCs, which translates addresses to implement virtual memory. The MPU only enforces permissions, such as whether reading, writing or code execution is allowed in each memory region.

Since the earliest Teensy 4.0 beta tests, the MPU has been configured with security conscious settings, such as only allowing code to execute from flash and ITCM memory regions and disallowing any access from a NULL pointer. But if your program “crashed” by violating any MPU permissions or otherwise caused a fault exception, the default fault handler was essentially empty. Teensy 4.0 & 4.1 would stop responding and a press of the pushbutton was needed to recover.

Teensyduino 1.54 includes an improved default fault handler to recover from problems and give you information to help diagnose what went wrong. The new fault handler performs these actions.

Log processor state & other useful info to RAM
Reduce CPU speed to 198 MHz
Complete USB transmission, so all Serial.print() data buffered before the fault (hopefully) arrives at the Arduino Serial Monitor
Try to remain USB responsive to Arduino’s Upload button
Automatically reboot after 8 seconds

Because this fault handler is built into all programs, and because it runs when something has gone very wrong, its code is meant to be kept simple & small. Diagnostic info is written to a fixed location in RAM (which is preserved when restarting), rather than communicated at the time of the fault.

The automatic reboot may be controversial. Rebooting tends to hide the fact a problem occurred. For most applications where Teensy always runs an application specific program, automatically restarting to recover from a problem is probably better than remaining forever unresponsive.

The 8 second delay was chosen for 2 reasons. Traditionally watchdog timers are used for automatic rebooting. If any of the 3 watchdogs are in use, they can function normally if configured for less than 8 seconds. The 8 second delay is also meant to prevent a rapid infinite rebooting loop in the case where the program quickly causes another fault. Hopefully 8 seconds is a good balance between promptly recovering, avoiding difficult to understand reboot-loop behavior, and allowing normal watchdog usage.

After Teensy reboots, your program can access the logged information with CrashReport. The CrashReport code is only built into your program if you actually use it. While your program can use CrashReport at any time, it is meant to be used early in your program’s startup, hopefully before anything has gone wrong, so you have full access to a freshly rebooted system.

The simplest CrashReport usage is with a single Serial.print() line.

If your program previously crashed with a fault or unused interrupt which logged information, CrashReport shows you a summary.

CrashReport can be tested as true/false, to tell you whether information was logged. This is useful in a program that rapidly prints to the serial monitor: show the info and wait for input before continuing, because otherwise the program’s own output would quickly scroll the CrashReport info off your screen.

CrashReport is not limited to use with only the Arduino Serial Monitor. It can be used with any interface inheriting the Arduino Print class. For example, you can store CrashReport to a file on a SD card or a flash chip with LittleFS, to keep a record of every time a fault was detected.
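The usage described above can be sketched as a small helper plus a sketch that calls it. The helper name, the 4 second wait for the serial monitor and the prompt text are assumptions. The Teensy-specific part is guarded so it only builds for Teensy.

```cpp
// Print a crash report to `out` if one was logged. Returns true if it
// printed. `Report` must be testable as true/false and `out` must accept
// print(report), as described above for CrashReport and Print objects.
template <typename Report, typename Out>
bool printIfCrashed(Report &report, Out &out) {
  if (!report) return false;  // nothing was logged before this boot
  out.print(report);
  return true;
}

#if defined(TEENSYDUINO)  // the sketch below only builds for Teensy
#include <Arduino.h>

void setup() {
  Serial.begin(9600);
  while (!Serial && millis() < 4000) {
    // assumed: give the serial monitor a few seconds to open
  }
  if (printIfCrashed(CrashReport, Serial)) {
    Serial.println("Send any character to continue");
    while (!Serial.available()) {
      // hold here so the report stays on screen
    }
  }
}

void loop() {
  Serial.println(millis());  // stand-in for a program that prints rapidly
}
#endif
```

Because the helper only needs print(), the same call should work with a file opened on an SD card or LittleFS in place of Serial.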

The default fault handler and CrashReport can not handle every possible error. The MyFault library was created as a collection of test cases for fault recovery and CrashReport, and also has a typical usage example.

Bug Fixes and Minor Improvements

Teensyduino 1.54 also includes many bug fixes, minor features & improvements, and updated libraries. A detailed list of all changes can be found on this forum announcement thread.

PCM1802 Breakout Board Needs Small Hack

PCM1802 is an impressive audio A/D converter, specified for 105 dB signal to noise (A weighted). But people have reported problems using very cheap PCM1802 breakout boards. Today I made it work.

The cheap PCM1802 boards are sold by many Chinese companies, usually for just a few dollars.

Update: since this article was written, new PCM1802 breakout boards have appeared on the market which may have another design problem with the configuration pins. PJRC has not yet tested any new PCM1802 with this issue.

The PCM1802 tested in this article came from this AliExpress vendor.

The main issue with these PCM1802 boards involves configuring the data format. Teensy and most microcontrollers use I2S format. The board comes with no documentation, but the PCM1802 datasheet shows how to configure the chip in table 6.

On the bottom side of the PCM1802 breakout are little solder pads with names that match up with the datasheet.

If you power the board, the FMT0 and FMT1 pads measure 0 volts. Without power, they also measure about 9K ohms to GND.

To configure for I2S, you would expect to just solder the 2 pads next to FMT0 together. But there is a problem…

These 5 pads labeled “+” connect to each other, but they do NOT connect to 3.3V or anything else on the circuit board! Soldering the FMT0 pads together has no effect.

To make this work, I soldered a wire to those pads.

There is no location on the bottom of the board to access 3.3V power. I considered using the OSR pad. But the pullup resistor is only 10K. The PCM1802 has 50K pulldown resistors, according to the datasheet. Indeed with power applied, I measured 2.8 volts at the OSR pad.

So to get 3.3V, I ran the wire to the top side and soldered it to the 3.3V pin.

The SOT23 part in the lower left corner of this photo is a 3.3V regulator. This 3.3V pin is an output, not an input, which I also verified with my voltmeter.

Fortunately, all of the other pins in this PCM1802 board are wired correctly for use with Teensy’s I2S. In its default mode, only DOUT is an output. All of the other signals are inputs.

These are the required connections between Teensy 3.6 and the PCM1802 breakout board.

PCM1802   Teensy 3.6
+5V       VIN
3.3V
GND       GND
DOUT      DIN (13)
BCK       BCLK (9)
FSY       3.3V
LRCK      LRCLK (23)
POW       3.3V
SCK       MCLK (11)

The POW pin is the only name which doesn’t match up with the PCM1802 datasheet. I used my ohmmeter to verify it really is connected to the PDWN pin.

The FSY pin (connected to FSYNC) is also a bit unusual. PCM1802 expects it to be logic high while you transmit data, so just connect to 3.3V. In the other modes, it sends a signal on this pin which is high during data bits and low during the zero padding bits. But it does not require that signal as input. FSYNC just connects to 3.3V to use PCM1802 with Teensy.

For a simple test, I programmed Teensy 3.6 with minimal code to just route the I2S input data to the two DAC pins.

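#include <Audio.h>

// I2S input from the PCM1802, stereo output on the two DAC pins
AudioInputI2S i2s1; //xy=152,100
AudioOutputAnalogStereo dacs1; //xy=316,117
AudioConnection patchCord1(i2s1, 0, dacs1, 0); // left channel
AudioConnection patchCord2(i2s1, 1, dacs1, 1); // right channel

void setup() {
  AudioMemory(10); // memory blocks the audio library uses to pass data
}

void loop() {
  // nothing to do here, the audio library moves the data in the background
}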

With FMT0 correctly configured using a mod wire, and those connections, PCM1802 works great with Teensy. Here are closer photos of the wiring.

If you need a high quality audio A/D and you can find these cheap PCM1802 breakout boards, hopefully this tip about the FMT0 hack and known-good wiring can save you from some frustration and get your project up and running quickly.

1 Bit Video on Sharp Memory LCD

Nic Magnier created this 1 bit dithered video player using a Sharp Memory LCD.

Normally memory displays aren’t known for speed. Nic explains that the display actually allows you to update only specific lines. His approach converts the video with blue noise dithering and some forward diffusion to keep pixels from moving too much between frames. Then the video is encoded to a custom format, with only the lines which change between frames.

Nic’s video conversion tool, written in Lua using Dear Imgui for the user interface, was designed to quickly experiment with dithering and tweaking values with side-by-side comparison of results.

Nic started with Adafruit’s library and added optimizations for good performance when running on Teensy 3.5. Using this dithering technique and crafty optimizations results in impressive looking video on these displays not normally considered capable of such feats. Awesome work Nic!

SmartLED Shield for Teensy 4

Pixelmatix has made a new SmartLED Shield capable of driving large 128×64 LED panels at 240 Hz refresh & 36 bit color!

This shield is currently being made available on Crowd Supply.

The SmartMatrix library offers amazing features. 36 and 48 bit color can be used, or 24 bit color can be automatically expanded with gamma curves for color correction, good contrast, smooth gradients. Larger 128×128 HUB75 LED panels can also be used at lower refresh rates.

Much of this is made possible by Eric Eason’s work with FlexIO and DMA on Teensy 4.

Complete details & discussion are on this forum thread and the Crowd Supply project page.

Panel Mount & Free Standing Brackets

Graema Gill created a pair of awesome 3D printed brackets to hold Teensy 3.2.

Both the panel mount version and free standing bracket are published on Thingiverse. Both look great and could really come in handy for projects!

uLisp on Teensy 4.x

David Johnson-Davies ported his uLisp interpreter to Teensy 4. This blinks Teensy’s LED in the Lisp language!

David also tested Teensy’s uLisp speed relative to other boards.

Full benchmark details are available for several types of computationally intensive tasks.

If you’re a fan of the Lisp language, now you can run it on Teensy 4.0 & 4.1.

Teensy 4.1 Released

Teensy 4.1 is now available, with access to more I/O and memory expansion.

Teensy 4.1 features a 10/100 Mbit Ethernet PHY.

The Ethernet port also has IEEE1588 precision packet timestamping.

Fast Ethernet opens up the possibility of low latency & high bandwidth Artnet LED projects, streaming audio, Open Sound Control and other Ethernet-based protocols that were difficult to accomplish with a traditional SPI-based Ethernet shield. Traditionally, using Ethernet in a project has meant choosing between single board computers, which have high bandwidth Ethernet but run systems not designed for low latency tasks, and microcontrollers, which are designed for real-time tasks but have slow Ethernet or lack the performance to handle Ethernet’s speed. Teensy 4.1 with a Cortex-M7 processor at 600 MHz opens up the possibility to use the high bandwidth and low latency of Ethernet on a microcontroller designed for real-time tasks.

Teensy 4.1 includes a USB host port, supporting 480 Mbit/sec high speed USB. While Teensy 4.0 has those USB host data signals on surface mount pads, Teensy 4.1 adds the hot-plugging power management needed to simply connect a USB host cable and be able to plug in a USB device. Or a USB hub can be used to connect many USB devices.

USB devices are supported by the USBHost_t36 library, which is installed into the Arduino IDE automatically by the Teensyduino installer.

Another feature technically present on Teensy 4.0, but difficult to use, is native SDIO for fast data transfer to a SD card. Teensy 4.1 includes the SD socket, so you can easily use a micro SD card with the card’s native SDIO protocol rather than slow SPI access.

Teensy 4.1 also includes locations to solder additional memory chips. The larger space is meant for a QSPI flash memory and the smaller space is intended for an 8MB PSRAM chip.

The IMXRT1062 microcontroller on Teensy 4.0 and 4.1 has 1MB of RAM built in. For many real-time control projects, the internal RAM is plenty. But some projects can greatly benefit from the added memory. Frame buffers for high resolution TFT displays are the most common use, allowing advanced graphics rendering. Large memory is also very useful for special audio effects like advanced reverb, buffering fast incoming data before logging to a SD card, and for emulation of retro computer systems like classic arcade games.

These extra memory chips have a dedicated QSPI bus, which is independent from Teensy 4.1’s main program memory. When a flash memory chip is added for storage of files or other data, this dedicated bus means it does not interfere with normal program memory access. This is especially valuable while writing data or erasing sectors in the extra flash memory. For real-time projects controlling motors, synthesizing or processing real-time audio, communicating high speed data or other tasks requiring low latency, flash writing can be done to the extra flash chip without blocking access to the normal program memory.

Teensy 4.1’s larger form factor also brings more I/O pins, and turns the difficult-to-access bottom pads from Teensy 4.0 into easy to access, breadboard-friendly pins. The new I/O pins bring an 8th serial port, more analog inputs and PWM outputs, and access to 16 contiguous native port pins, useful for projects needing fast parallel I/O.

Not every project requires so much I/O or extra memory. Teensy 4.0 fills those needs. But when you do need more I/O, more memory, fast Ethernet, USB host connectivity, or fast SD card access, the larger Teensy 4.1 brings this extra I/O capability to a platform designed for real-time use with fast 600 MHz M7 performance.

Audio Shield For Teensy 4.0

The Teensy Audio Shield has been updated (Rev D) for Teensy 4.0.

The signals are now placed for direct connection to Teensy 4.0.

The audio shield’s circuitry is identical to the previous Rev C version for Teensy 3.0 to 3.6. Only the pin assignments have been changed, so it can be used easily with Teensy 4.0.

Improving Arduino Serial Monitor Performance

Recently I’ve been working to improve the Arduino Serial Monitor. Here it is running with Teensyduino 1.48-beta1.

Previously, if a board sent data this fast (as Teensy 4.0 can), Java would run out of memory and the Arduino IDE would crash.

Teensy 4.0’s USB code is not yet fully optimized, so we can expect even greater speeds later this year. The Arduino Serial Monitor needs improvement to handle these faster data rates!

Deja Vu From 2014

This isn’t the first time Teensy has crashed Arduino by sending too rapidly to the Serial Monitor. Back in 2014, this same problem existed with Teensy 3.1. Serial.print() without delay on Teensy 3.1 would cause Java to run out of memory and crash the Arduino IDE.

Arduino Due was also capable of crashing the Arduino IDE this way. The Arduino developers had tried in October 2014 to solve it by limiting the buffered data size, which helped, but still Java would eventually run out of memory and lock up.

On December 6, 2014, I finally managed to work around the problem well enough for Arduino to handle sustained USB full speed (12 Mbit/sec) incoming data. My solution worked around the terrible slowness of adding and removing data from the JTextArea component by collecting incoming data to a buffer and using a timer to add data at only 30 Hz rate. It also limited the rate of removal to only once every 150 adds, and removed by the number of characters rather than the number of lines. 4 days later, the Arduino developers adapted my solution and merged it into Arduino. This code has been in every version of Arduino since 1.6.0.
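
The buffering idea can be illustrated with a minimal sketch. This is not the code that was merged into Arduino; the class name BatchedAppender and its methods are hypothetical, and the sink stands in for something like JTextArea's append method. Incoming chunks accumulate in a StringBuffer, and a periodic flush hands them to the display as one batch.

```java
import java.util.function.Consumer;

// Hypothetical sketch: collect incoming text and pass it on in batches,
// so the slow text component is updated far less often than data arrives.
class BatchedAppender {
    private final StringBuffer pending = new StringBuffer(); // thread-safe accumulator
    private final Consumer<String> sink;                     // e.g. text -> textArea.append(text)

    BatchedAppender(Consumer<String> sink) {
        this.sink = sink;
    }

    // Called from the receiving thread for every chunk of incoming data.
    void receive(String chunk) {
        pending.append(chunk);
    }

    // Called periodically (about 30 times per second) on the GUI thread.
    // Returns the number of characters handed to the sink.
    int flush() {
        String batch;
        synchronized (pending) { // same lock StringBuffer uses internally
            if (pending.length() == 0) {
                return 0;
            }
            batch = pending.toString();
            pending.setLength(0);
        }
        sink.accept(batch);
        return batch.length();
    }
}
```

In a Swing program, flush() could be driven by a javax.swing.Timer with a delay of roughly 33 ms, which runs its listener on the event dispatch thread.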

At the time, I wrote this explanation of the details and rant about Java performance. Back then I wrote “Java is pretty horrible”. Now with the benefit of hindsight, I realize I was equating Swing’s JTextArea and JTextComponent classes (and the complicated data storage infrastructure lurking behind them) with Java in general. I also wrote “if dramatically faster hardware is made … in the future, this buffer might need to grow”.

Now with Teensy 4.0 bringing that dramatically faster hardware, and some hindsight, I’ve learned that much more than merely increasing the size of an intermediate buffer is needed to support sustained data transfer at such speed.

What’s Really Using So Much Memory?

I quickly discovered the terrible slowness inside JTextArea & JTextComponent scaled up (or “down”) rapidly with data size. Keeping the same 30 updates per second but with larger data would not work. It failed spectacularly. Under the load of Teensy 4.0 printing without delay, the Arduino IDE would run slowly for a matter of seconds, then on Windows and Mac start throwing OutOfMemoryError exceptions and ultimately lock up. On Linux, it would keep running, but unusably slow and consume many gigabytes of memory. Not good.

To start digging into the problem, I ran VisualVM, which is a Java profiler. It’s one of the programs bundled with the Java SDK. If you have the SDK and a JAVA_HOME environment variable (the usual setup for compiling Arduino from source), it can be run from the command line with “$JAVA_HOME/bin/jvisualvm”.

VisualVM is very easy to use. Every Java-based program running on your machine shows up in the “Local” group. Arduino appears as “processing.app.Base (pid [number])”. Clicking it connects the profiler to the running Arduino IDE. Then clicking the “Profiler” tab lets you see which Java classes are using so much memory.

This screenshot shows the memory use after only several seconds of Teensy 4.0 printing rapidly to the Serial Monitor. While 100 megabytes is used by raw character data, the really startling result is nearly 2 million live instances of GapContent$MarkData, GapContent$UndoPosRef, AbstractDocument$LeafElement, and GapContent$StickyPosition. Yikes!

Even on Linux, with the extra burden of the VisualVM profiler, Java quickly crashes under the strain of Teensy 4.0 printing without delays. But the profiler served its purpose, to shine light on what’s consuming such an insane amount of memory. GapContent appeared to be the culprit.

Flexible or Bloated, A Matter of Perspective?

Java has a pretty amazing amount of good documentation. Google searches always turn up Oracle’s reference material. Usually searches turn up many nice Java tutorials and well answered questions on sites like Stack Overflow. But from the lack of non-reference material, not even unanswered questions (other than people trying to use JTextArea or JTextComponent as a terminal or live log file display and hitting these same memory use problems), it seems this part of Java is a seldom traveled path. That’s much of the reason I’m taking some time to write this lengthy blog article, to share with you what I’ve learned on this optimization journey.

I spent a lot of time reading the Java reference pages. A lot of time…

Internally a number of Java classes are used in a rather modular way. This modularity could be seen as a highly flexible or highly bloated system, either a blessing or a curse, depending on your perspective. In the end, the modular nature turned out to be quite useful. But first, let’s look at how it’s structured.

This diagram from the JTextComponent reference best sums up the way things really work under the hood.

If you compare with Oracle’s page, you’ll see I’ve added the GapContent part to this diagram. It turns out the Document class actually outsources all the data storage to GapContent. At one point I had imagined just replacing GapContent with something more efficient. But sadly, the GapContent API is designed around the assumption that the data can grow to any arbitrary size. I wanted to replace all the storage with a fixed-size FIFO circular buffer and avoid *any* dynamic allocation of objects or large data on the heap during the sustained processing of data.

FifoDocument Class

GapContent had to go, so I started work on a new FifoDocument class which would hold all the Serial Monitor text in a fixed size array.

The idea was simple. Since we only add new lines at the end, and delete the oldest lines from the beginning, this ought to be simple, right?

Elements & Events

At first I did not understand the purpose of the Element class. I imagined just sending a DocumentEvent output with a single Element representing all the text. If you read only that Element reference, perhaps you can see how it seems to imply that might work? At least that’s what I incorrectly assumed.

The Document reference is the only other page (which I found) describing Elements. But it only describes how they might be used in a generic way. This Element structure image is completely wrong for the use case of JTextComponent.

It turns out JTextComponent expects a single top-level Element as a container for the entire document, which has child Elements representing each line. Near the end of the explanation on that Document reference page is a dead link to “see ~~The Swing Connection~~ and most particularly the article, ~~The Element Interface~~“. Every indication is Oracle deleted “The Swing Connection” blog many years ago, and dead links automatically redirect to the generic Java page.

Fortunately I did find a copy of The Element Interface article archived at an academic site. This article is essential to understand what Element structure the various Java Document classes actually use, if you want to craft your own custom Document class to replace one of them. For the Arduino Serial Monitor case, it’s the PlainDocument structure.

My initial hope to use a single Element was replaced by adding a large, fixed size array of FifoElementLine instances which keep track of where the individual lines of data are located within the big FIFO circular buffer.
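
To illustrate the idea of line records pointing into a circular character buffer, here is a minimal sketch. It is not the actual FifoDocument or FifoElementLine code; the class LineRing and all of its members are hypothetical, and it only handles whole lines. Both the character storage and the line records are fixed-size arrays allocated once, and the oldest lines are dropped when either one fills up.

```java
// Hypothetical sketch: fixed-size character ring plus a fixed-size ring of
// line records (start index and length) locating each line in the storage.
class LineRing {
    private final char[] chars;     // circular character storage
    private final int[] lineStart;  // start index of each line within chars
    private final int[] lineLen;    // length of each line
    private int charHead = 0;       // index of the oldest stored character
    private int charCount = 0;      // characters currently stored
    private int lineHead = 0;       // slot of the oldest line record
    private int lineCount = 0;      // lines currently stored

    LineRing(int maxChars, int maxLines) {
        chars = new char[maxChars];
        lineStart = new int[maxLines];
        lineLen = new int[maxLines];
    }

    int lineCount() {
        return lineCount;
    }

    void appendLine(String line) {
        if (line.length() > chars.length) {
            throw new IllegalArgumentException("line longer than buffer");
        }
        // Discard the oldest lines until the characters and the record fit.
        while (lineCount == lineStart.length
                || charCount + line.length() > chars.length) {
            dropOldestLine();
        }
        int start = (charHead + charCount) % chars.length;
        for (int i = 0; i < line.length(); i++) {
            chars[(start + i) % chars.length] = line.charAt(i); // may wrap around
        }
        int slot = (lineHead + lineCount) % lineStart.length;
        lineStart[slot] = start;
        lineLen[slot] = line.length();
        charCount += line.length();
        lineCount++;
    }

    // Returns the line at the given index, where 0 is the oldest stored line.
    String getLine(int index) {
        if (index < 0 || index >= lineCount) {
            throw new IndexOutOfBoundsException("no such line: " + index);
        }
        int slot = (lineHead + index) % lineStart.length;
        StringBuilder sb = new StringBuilder(lineLen[slot]);
        for (int i = 0; i < lineLen[slot]; i++) {
            sb.append(chars[(lineStart[slot] + i) % chars.length]);
        }
        return sb.toString();
    }

    private void dropOldestLine() {
        int len = lineLen[lineHead];
        charHead = (charHead + len) % chars.length;
        charCount -= len;
        lineHead = (lineHead + 1) % lineStart.length;
        lineCount--;
    }
}
```

A real Document implementation would also need to report each added or dropped line to its listeners, which this sketch leaves out.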

With this addition and many trial-and-error tests to figure out which functions actually get called, *finally* the Serial Monitor window started responding to the DocumentEvent notifications and displayed the text.

An early experiment also showed that FifoDocument could delete data without receiving any input from JTextArea. By simply sending an event to notify JTextArea that data has been deleted, indeed the Serial Monitor would update properly. This highly flexible event-based design that could be seen as bloat turned out to be very useful for implementing an efficient FIFO that automatically discards old data.

Later this ability for the Document to notify the GUI of changes (which it didn’t initiate or participate in any way) turned out to be even more useful for directly adding data into the FIFO.

Bugs & Thread Safety

Without the help of several great people on the PJRC Forum, especially Tim (Defragster), FifoDocument probably never would have reached a usable state. The DocumentEvent interface involves many complex requirements which are only scantily documented. Many tries were needed to get everything right. Defragster found pretty much every bug very quickly.

Even after the “easy” bugs were fixed, thorny problems with threads remained. I ended up making almost all the public methods of FifoDocument synchronized. The thread which adds new data into FifoDocument is also called using SwingUtilities.invokeAndWait(). These are less than optimal. Perhaps later even better performance could be possible?

Direct Write Into FIFO Memory

Even with all these optimizations, Java would still run out of memory on some Macintosh machines. Arduino’s traditional implementation of the serial monitor makes multiple copies of incoming data before it finally ends up stored in FifoDocument (or GapContent).

First, the Serial class has an event handler which receives incoming characters into a buffer, which it allocates on the heap. Then that buffer is converted to a String and passed to a message() function, which is the abstraction allowing different classes to receive data. The contents of that String are then copied into a StringBuffer instance, which is the improvement I contributed 5 years ago. At a rate of 30 Hz, the StringBuffer is then copied to another String instance, which is passed to JTextArea, which then passes it to the Document storing the data.

I replaced all this copying of data with a path directly from arriving characters into the FifoDocument character buffer.

Unfortunately this means overriding the usual path data takes between the abstract serial monitor classes. Instead, a single loop waits for data to arrive. When data is ready to read, it requests the maximum number of bytes FifoDocument can accept, and the offset where that data goes inside FifoDocument’s fixed size buffer. It then reads incoming characters directly from the input stream to FifoDocument’s buffer. No extra copies are made in other buffers, or String instances to pass the data between abstraction layers.
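
A minimal sketch of this direct-read loop follows. It is not the code from TeensyPipeMonitor or FifoDocument; the class DirectFifo and its methods (storage, writeOffset, maxWritable, commit, discardOldest) are hypothetical names chosen for illustration. The loop asks the FIFO where incoming characters go and how many it can take, then lets Reader.read() store them straight into the FIFO's own array.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical sketch: a fixed-size character FIFO which exposes its own
// storage so a reader can fill it without intermediate buffers or Strings.
class DirectFifo {
    private final char[] buf;
    private int tail = 0;   // index of the oldest stored character
    private int count = 0;  // characters currently stored

    DirectFifo(int capacity) {
        buf = new char[capacity];
    }

    char[] storage() { return buf; }

    int size() { return count; }

    // Index within storage() where the next incoming character belongs.
    int writeOffset() { return (tail + count) % buf.length; }

    // Largest contiguous run which can be written starting at writeOffset().
    int maxWritable() {
        int free = buf.length - count;
        int untilEnd = buf.length - writeOffset();
        return Math.min(free, untilEnd);
    }

    // Record that n characters were written at writeOffset().
    void commit(int n) { count += n; }

    // Drop the n oldest characters to make room for new data.
    void discardOldest(int n) {
        tail = (tail + n) % buf.length;
        count -= n;
    }

    // Character at logical position i, where 0 is the oldest.
    char charAt(int i) { return buf[(tail + i) % buf.length]; }

    // Read until the stream ends or the FIFO is full; returns characters stored.
    static int pump(Reader in, DirectFifo fifo) {
        int total = 0;
        try {
            while (true) {
                int room = fifo.maxWritable();
                if (room == 0) break;          // FIFO full
                int n = in.read(fifo.storage(), fifo.writeOffset(), room);
                if (n < 0) break;              // end of stream
                fifo.commit(n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

Because maxWritable() never crosses the end of the array, a single read never has to wrap around. The next pass through the loop simply continues from index 0.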

With this final optimization, even older Macs could continuously receive from Teensy 4.0 without running out of memory or locking up.

However, Java’s InputStreamReader class is still used to convert the raw bytes from UTF8 format to Java’s internal handling of all characters, and the Document event API still uses heap-based temporary allocations for some features. These do still cause Java’s memory usage to grow gradually, then suddenly shrink when Java’s garbage collection runs. But at least initial testing appears as if this overhead is acceptable.

Auto Scroll Behavior

While developing the FifoDocument with a truly fixed size buffer, I was forced to make some hard decisions about how to handle the Serial Monitor’s auto-scroll checkbox.

Since Arduino 1.6.0, the Serial Monitor has used a target size of 4,000,000 characters for its buffer (half of the maxChars size in the TextAreaFIFO class). But this is only checked every 150 updates, which can happen no faster than 30 Hz. Regardless of the auto-scroll checkbox, if the stored data has grown longer than 4,000,000 characters, the oldest data is deleted so only 4,000,000 characters remain.

FifoDocument has a fixed size buffer, rather than the flexible storage of GapContent, so I was not able to preserve this functionality. I’m also not convinced this existing approach is really correct, since rapidly arriving data can cause whatever the user is trying to read to be deleted. The TextAreaFIFO class has a flag to tell whether to trim the oldest data, but in all modern versions of Arduino it is never used or changed.

For FifoDocument, I implemented a buffer management policy where the buffer is allowed to fill to 60% capacity while scrolling. During sustained scrolling, 40% of the buffer remains free.

When auto-scroll is disabled, FifoDocument allows new data to completely fill the buffer. Once the buffer is 100% full, FifoDocument discards new data. This may be a controversial decision. The idea is to allow the user to read anything already within the buffer, and everything new which arrives until the buffer becomes 100% full. The Serial Monitor window can’t jump to go blank if too much new data pours in while the user is reading. But once the fixed size buffer is full, nothing more can be captured until auto-scroll is turned back on.
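
The two policies can be expressed as a small calculation. This is only a sketch of the rules as described above, not the actual FifoDocument logic; the class FillPolicy, its method names, and the exact rounding are assumptions. With auto-scroll on, enough old data is discarded to keep the buffer at or below 60% after new data is added. With auto-scroll off, nothing is discarded and new data is accepted only while space remains.

```java
// Hypothetical sketch of the buffer management policy described above.
class FillPolicy {
    static final int SCROLL_FILL_PERCENT = 60; // fill limit while auto-scrolling

    // How many of the oldest characters to discard before storing new data.
    static int charsToDiscard(int capacity, int stored, int incoming, boolean autoScroll) {
        if (!autoScroll) {
            return 0; // never delete what the user may be reading
        }
        int limit = (int) ((long) capacity * SCROLL_FILL_PERCENT / 100);
        long excess = (long) stored + incoming - limit;
        if (excess <= 0) {
            return 0;
        }
        return (int) Math.min(excess, stored);
    }

    // How many incoming characters can be stored; the rest are dropped.
    static int charsToAccept(int capacity, int stored, int incoming, boolean autoScroll) {
        int remaining = stored - charsToDiscard(capacity, stored, incoming, autoScroll);
        return Math.min(incoming, capacity - remaining);
    }
}
```

For example, with a 100 character buffer holding 90 characters and auto-scroll off, only 10 of 20 incoming characters would be accepted, and nothing more once the buffer is full.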

FifoDocument currently implements a 10,000,000 character buffer, and a maximum of 1,000,000 lines. This allows more data to be captured, but the behavior is not exactly the same as current versions of Arduino when auto-scroll is turned off.

Performance

Despite my frustrated words 5 years ago, indeed Java is capable of implementing the serial monitor at these higher speeds. But the pervasive approach of most Java code, allocating new objects while passing data between many abstraction layers, puts far too much burden on Java’s memory management and garbage collection. Using a fixed size buffer, with incoming data stored directly to the buffer without allocating copies on Java’s heap is indeed fast enough.

In a somewhat surprising result, the most efficient system for this use is Microsoft Windows 10.

The Java virtual machine appears to run Java code more efficiently on Windows than it does on Macintosh and Linux. Perhaps Oracle (or Sun) invested more work to optimize the JRE for Windows?

In this screenshot, you can see the “teensy_serialmon” program running, also with low CPU usage. This helper program talks to Teensy using Windows native WIN32 functions and then passes the data to Java using stdin & stdout streams. On Windows 10, both the built in serial driver and the lightweight anonymous pipes used for stdin/stdout appear to be very efficient when accessed with native WIN32 functions.

While Linux is not far behind Windows, unfortunately Macintosh appears to have considerable CPU overhead for accessing serial devices.

This screenshot was taken on the same Macbook Air as the Windows 10 test above (running natively, via Bootcamp dual boot).

However, the Macintosh USB drivers allow Teensy 4.0 to transmit about 60% faster than the drivers on Windows and Linux. When running the USB serial print speed benchmark, Linux and Windows usually sustain about 200,000 lines/second. Macintosh usually runs just over 300,000 lines/second.

These differences are very likely an artifact caused by less-than-optimal USB driver code on Teensy 4.0, interacting with subtle timing differences in the USB host drivers on each system. I have been delaying work for USB optimizations on Teensy 4.0 until the Serial Monitor is capable of handling the incoming speed without crashing the Arduino IDE. Now, with these improvements, I can start to focus my effort on optimizing the Teensy side!

Arduino Contribution

As always, my intention is to contribute Teensy-inspired improvements to the Arduino IDE back to the Arduino project. Several weeks ago I exchanged a few emails with the Arduino developers about this Serial Monitor optimization work, so they are aware of my effort. Part of the reason to write this lengthy article is to document this work from a “high level” perspective, which isn’t really possible in the code comments, which give more low-level detail.

All of this source code is published on Github. These are the files:

FifoDocument.java – All the FIFO and Document code
TeensyPipeMonitor.java – Teensy’s “Pluggable” Serial Monitor using FifoDocument. This file creates the listener threads which receive stdin data directly into FifoDocument’s buffer, and parse stderr for status updates.
FifoEvent.java – The event info FifoDocument sends to JTextComponent. Methods just call the actual implementation in FifoDocument.
FifoElementLine.java – The Element instances representing individual lines of text. Methods just call the actual implementation in FifoDocument.
FifoElementRoot.java – The top-level Element required by JTextComponent.
FifoPosition.java – The Position interface which JTextComponent requires for users to select text and copy to clipboard. Positions are actually implemented with a 64 bit total-historic offset, as explained in this comment.
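
The idea of a total-historic offset can be shown with a minimal sketch. This is not the code from FifoPosition.java; the class HistoricFifo and its members are hypothetical, and only the javax.swing.text.Position interface with its getOffset() method is the real API. Each position remembers how many characters had ever passed through the FIFO up to the marked spot, so its current offset can be recomputed after old data is discarded, without updating every position on each deletion.

```java
import javax.swing.text.Position;

// Hypothetical sketch: positions stored as 64 bit offsets counted from the
// very first character ever received, rather than from the buffer start.
class HistoricFifo {
    private long discarded = 0; // total characters ever removed from the front
    private int length = 0;     // characters currently stored

    void append(int n) {
        length += n;
    }

    void discardOldest(int n) {
        discarded += n;
        length -= n;
    }

    // Create a Position which tracks the character currently at 'offset'.
    Position createPosition(int offset) {
        final long historic = discarded + offset;
        return new Position() {
            @Override
            public int getOffset() {
                long current = historic - discarded;
                if (current < 0) {
                    return 0; // the marked text was discarded; clamp to the start
                }
                return (int) Math.min(current, length);
            }
        };
    }
}
```

Clamping to offset 0 once the marked text is gone is an assumption made for this sketch.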

This work could be considered 2 separate contributions, even though their code is now pretty tightly integrated. The other work could be called “Pluggable Serial Monitor”, similar in concept to Pluggable Discovery. The concept, like with discovering ports, is a board can provide a program like “teensy_serialmon” which does the actual communication and makes it available to the Arduino IDE using stdin & stdout streams. Teensy has used this approach in beta testing since early 2017, and it was released in Teensyduino 1.42 (June 2018). The port discovery portion was accepted by Arduino around that time. The serial monitor part will hopefully become an official Arduino feature at some point in the future.

Whether this serial monitor performance improvement may ever be accepted by Arduino is unclear. Arduino has announced at Maker Faire and Arduino Day (March 2018) a next-gen Arduino IDE which will no longer use Java. How they feel about merging such a large change to the existing Java code is unclear, especially when Teensy 4.0 is (probably) the only board available today which can transmit at these speeds which crash the existing IDE.

Still, my hope is this code may eventually find its way into an official Arduino release. Eventually more microcontrollers will appear on the market with 480 Mbit or faster USB, and fast enough processors and USB code capable of sustained printing at these speeds and perhaps much faster.

If anyone is later interested in this Serial Monitor optimization, hopefully this lengthy article and the code on GitHub will help.
