This is where I post projects from the past and present in the hope that someone else (my future self included) will find them useful. Also, one day I really would like to be featured on Hackaday and become internet-famous. You can contact me on social media, or at “mail” on this domain.
Embedded Rust Toolchains
I recently started learning Embedded Rust. As I mentioned at the top of the last post, there are a couple of toolchain options:
- OpenOCD + GDB
- probe-rs + cargo-embed
Turns out there is also a third one that I just discovered:
- probe-rs + probe-run
This was a little confusing at first: I was unsure which option was better and how these projects all connect to each other. Embedded Rust seems to be moving fast, so this might get outdated, but here is a basic summary if you are just getting into Embedded Rust as well.
The Embedded Rust way
Rust-embedded is a working group in the official Rust organization. Among other things, they maintain The Embedded Rust Book, which you may have come across. In the book, they describe what I would call the “official” toolchain, using OpenOCD and ARM-compatible GDB (gdb-multiarch). This is probably the way to go if you need to do serious development today. OpenOCD and GDB are stable and mature projects, and for a Rust programmer this is the most fully-featured and reliable option right now.
Enter probe-rs
As an alternative to those external (non-Rust) dependencies, a team has formed around an ambitious project to replace it all with software written in Rust – probe-rs. Here is an illustration (borrowed from the video below):

Here is a very informative talk by one of the people behind probe-rs:
My main takeaway from this talk was that probe-rs is really a library. Other projects, like cargo-embed and probe-run, are built on top of it.
Cargo-embed
The probe-rs team built cargo-embed to show off the capabilities of probe-rs. As such, it was the first tool I came across when I found the official probe-rs website. One might imagine that, being built by the same team, cargo-embed will stay closer to the latest features of probe-rs and have a shorter path to getting new features in. But this is just speculation.
To build and upload programs, you simply run cargo embed --release (see my last post about why --release is important for timing). It is possible to do logging with rtt, a debugger-based mechanism that writes to an internal buffer which gets read out by the debugger instead of, for example, printing over UART. Debugging is also supported (though currently not at the same time as rtt), either from the command line or visually in something like Visual Studio Code, by hooking into the GDB stubs that probe-rs provides. This is an interface that (to the best of my understanding) mimics GDB’s, but actually talks directly to probe-rs.
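On the target side, RTT logging is just a crate away; with the rtt_target crate it looks roughly like this (a minimal sketch, assuming a Cortex-M target with cortex-m-rt and some panic handler already set up):

#![no_std]
#![no_main]

use panic_halt as _;                        // any panic handler will do
use rtt_target::{rtt_init_print, rprintln}; // RTT logging macros

#[cortex_m_rt::entry]
fn main() -> ! {
    rtt_init_print!();                    // set up the RTT control block
    rprintln!("Hello from the target!");  // shows up in the cargo-embed RTT terminal
    loop {}
}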
Configuration is done in a new file called Embed.toml. There you configure what chip you are using and whether to use rtt or GDB debugging, or set up separate profiles for each.
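For example, an Embed.toml for the Bluepill with rtt enabled might look something like this (written from memory, so double-check the key names against the cargo-embed documentation):

# Embed.toml
[default.general]
chip = "STM32F103C8"

[default.rtt]
enabled = true

[default.gdb]
enabled = false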
The vision of probe-rs is to offer a full development environment for embedded Rust, so they are also working on a VSCode plugin. It is still in alpha, and I have not tried it yet.
Probe-run
Ferrous Systems is a company that pops up everywhere in embedded Rust. They are a consultancy specializing in Rust for embedded applications and are also very active in open source development for embedded Rust. They started a project called Knurling dedicated to improving the experience working with embedded Rust.
Knurling has many sub-projects and probe-run is one of them. Built on top of probe-rs, it gives you the same features as cargo-embed, but in slightly different packaging. The philosophy is that embedded development should work the same way as native development, so instead of introducing a new cargo command, probe-run is a so-called Cargo runner. This means you configure the “usual” cargo run command to use probe-run under the hood. And there is no new configuration file to keep track of, just the regular Cargo.toml and .cargo/config.toml. Does it matter? Up to you.
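Setting that up is basically one line in .cargo/config.toml, something like this (the chip name here is my guess for the Bluepill; probe-rs keeps its own list of chip names):

# .cargo/config.toml
[target.thumbv7m-none-eabi]
runner = "probe-run --chip STM32F103C8"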
Knurling also has an interesting logging framework: defmt. Instead of doing string formatting on the embedded device, it relies on a tool on the host side and generates a list of strings at compile time that is kept on the host. The embedded device then simply sends the index into that list (using rtt), causing much less overhead.
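In code, defmt looks much like any other logging macro. Roughly like this, assuming the defmt-rtt transport and the panic-probe handler are pulled in as dependencies (a sketch, not taken from a real project):

#![no_std]
#![no_main]

use defmt_rtt as _;    // the defmt transport: log frames go out over RTT
use panic_probe as _;  // panic handler that reports panics through defmt

#[cortex_m_rt::entry]
fn main() -> ! {
    // Only the index of the format string and the raw value are sent;
    // the host-side tool does the actual formatting.
    defmt::info!("sensor value = {}", 42);
    loop {}
}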
I do like the idea of keeping the main Rust interface unchanged, which speaks in favor of probe-run, but I’m not sure about plans for integrating probe-run with VSCode, or debugging with breakpoints. As I learn more, I hope to find a favorite and maybe also start contributing myself.
Embedded Rust: Timer Timeout Problem
TL;DR: When doing timing-critical stuff, use the --release flag to get a faster binary! For example: cargo embed --release.
I’m learning Embedded Rust on an STM32 Bluepill board (with an STM32F103 microcontroller). At the time of writing there seem to be two toolchain options:
- The “official” Embedded Rust way, using OpenOCD and ARM-compatible GDB.
- Up-and-coming probe-rs, which is working on having everything in Rust and installable via cargo. Their tool cargo-embed basically replaces OpenOCD and GDB.
OpenOCD + GDB is tried and tested, but a lot more work to set up. Probe-rs is literally just cargo install cargo-embed, but it is a work in progress and far from feature-complete. I tried both, but this particular thing caught me while using cargo-embed, so that’s the command I will be showing.
The Timer Problem
I wanted to talk to a ws2812 addressable RGB LED (also known as NeoPixel). I found the crate smart-leds, which seemed perfect. It comes with several “companion crates”: device drivers that support different LEDs and offer several options for driving the ws2812, like ws2812-spi and ws2812-timer-delay.
The SPI crate unfortunately did not work in my attempts so far. It manages to write to my LED once, then panics with the error “Overrun”. Probably I’m using a newer version of the embedded-hal and/or stm32f1xx-hal than it was written for. Maybe a topic for another day.
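Either way, the application code looks the same for both drivers, since they implement a common write trait from smart-leds: you just hand the driver an iterator of colors. A rough sketch (the helper function and color values are made up for illustration; constructing the driver from a HAL timer or SPI peripheral and a pin is left out):

use smart_leds::{SmartLedsWrite, RGB8};

// Push a single dim teal pixel through any smart-leds driver,
// e.g. ws2812_timer_delay::Ws2812 or ws2812_spi::Ws2812.
fn show_pixel<D>(driver: &mut D) -> Result<(), D::Error>
where
    D: SmartLedsWrite<Color = RGB8>,
{
    let pixels = [RGB8 { r: 0, g: 16, b: 16 }];
    driver.write(pixels.iter().cloned())
}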
The Timer Delay crate also did not work at first. I broke out my Analog Discovery 2 to look at the data signal:

The time between bits was around 200 us. For comparison, I fired up a PlatformIO project for the same STM32 Bluepill board and imported Adafruit’s NeoPixel library. Now, the LED of course worked perfectly and the problem was obvious:

The time between the bits was now only around 1.4 us. I will spare you the details of all the things I tried while wrongly thinking either the entire MCU or the timer was running at the wrong frequency.
The solution turns out to be almost silly: Rust binaries can be really slow if you do not compile them in release mode. Just add the --release flag and all is well! 💩
Solution:
cargo embed --release
There is apparently a way to override this per-dependency in Cargo.toml, which might be worth a try if you need it.
Update:
I tried adding the following to Cargo.toml to make all dependencies build with the highest optimization level, but this still was not enough to make the LED work in my case.
# Cargo.toml
[profile.dev.package."*"]
opt-level = 3
I also tried increasing the optimization level for the whole dev profile. This already worked from level 2:
# Cargo.toml
[profile.dev]
opt-level = 2
Stepping through code compiled like this with a debugger might not work as well though, so you might as well use the release profile all the time and only drop down to dev for debugging.
NB-IoT and LTE-M Coverage Maps
Here are some links to coverage maps for NB-IoT and LTE-M in Scandinavia. The GSMA also has a global deployment map here:
https://www.gsma.com/iot/deployment-map/
Denmark
Finland
Norway
Sweden
Talk: Cellular Connectivity for IoT
In 2018, I had the great honor to speak at the NDC conference in Oslo. At the time, I was working with cellular connectivity for IoT at the Nordic mobile operator Telia, and I titled the talk accordingly. NDC is mainly a developer conference, so the talk was intended as an introduction to cellular IoT for the “Raspberry Pi and Arduino crowd” that I anticipated would show up. I went into the differences between NB-IoT and LTE-M as well as between chips, modules and boards. The best part, however, if I may say so myself, was the last one, where I showed a live demo of working with a couple of development kits.
LED bed light
I have been working on my own custom wakeup-light on and off for several years (part 1, part 2, part 3). After getting Philips Hue lights, however, I have not gotten around to setting it up in my new apartment. So recently, when the need came along, I figured the quarter round rod I used for that project might also make a pretty nice looking bed light.
Version 1
I cut one of the old rods to length and taped it to the headboard to see how it would look:
This could work! I borrowed the old driver board from the wakeup light and hooked up a couple of potentiometers to an Arduino to try the whole thing out:
Version 2
After convincing myself (and subsequently the significant other) that this was a good idea, I set about designing and 3D-printing some mounting brackets to replace the silver tape that held the v1 in place.
The final version of the bracket was long enough to hold the rod and had holes to let the cable escape invisibly out the back of the headboard:
Looking way better already!
Version 3
The next thing to work on was the electronics. Using potentiometers and analog inputs was not really ideal: the light would flicker and change just from touching the knobs or the cables, and it required frequent polling of the inputs in the code. Also, the potentiometers I had were only single-turn.
To make the controls more reliable, I ordered some EC11 and EC12 “coded switches”, or rotary encoders, from AliExpress. These turn just like potentiometers, but really act as switches, opening and closing in a sequence that lets you determine which way they are rotating. The main advantage is that there is no need to sacrifice an entire analog input channel just for a rotary knob; furthermore, the pins can be hooked up to pin interrupts, removing the need for polling altogether. The ones I got can also be pushed like a button, which I could use to switch between brightness and color mode instead of having two separate knobs like before.
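For the curious, the decoding itself boils down to a small lookup table indexed by the previous and current pin states. Here is a sketch of the idea (written in Rust just for illustration; the actual light runs on an Arduino, and the sign convention is only one common variant):

// Quadrature decoding in a nutshell: each step of the knob produces a
// Gray-code sequence on the two pins, and the order of the transitions
// tells you the direction.
fn decode(prev: (bool, bool), curr: (bool, bool)) -> i32 {
    // Pack previous and current A/B levels into a 4-bit index.
    let index = ((prev.0 as usize) << 3)
        | ((prev.1 as usize) << 2)
        | ((curr.0 as usize) << 1)
        | (curr.1 as usize);
    // +1 / -1 = one step in either direction, 0 = no change or invalid transition.
    const TABLE: [i32; 16] = [0, -1, 1, 0, 1, 0, 0, -1, -1, 0, 0, 1, 0, 1, -1, 0];
    TABLE[index]
}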
After getting the encoders in the mail, I designed yet another version of the holder that would also house the control knob:
The knob looks a bit out of place and is bigger than is really needed. I will probably try making a smaller one that fits the style of the whole thing a bit better for version 4. And then it would also be neat if the light could be controlled over wifi of course… But first I will figure out a way to hide the electronics so that they are not on top of the headboard!
The 3D-printed parts I made fit the IKEA Malm bed really well but are fairly simple in design. If you would still like me to share them for your own project, drop a comment below!
Kodama Trinus 3D-printer upgrades
At work, we recently got the Kodama Trinus combined 3D-printer and laser engraver. I’m pretty happy with the overall quality of the printer so far, but for our use I immediately identified some areas of improvement:
- No power switch – the only way to turn the printer off is by unplugging the cord.
- No lights inside the enclosure – we got the additional enclosure, but it came without any lights inside.
- No network interface – you have to connect to the printer via USB directly or go get the SD-card. Also no way to control or monitor it remotely.
So to fix this, I wanted to put a switch on the back, put in some lights and add a Raspberry Pi with a camera and Octoprint inside the printer. I started by printing a faceplate for the hole in the back of the enclosure:
This was then fitted with a switch and a DC jack for the power supply. (Make sure their ratings are equal to or higher than that of the power supply!) The jack lets me split off power to the Raspberry Pi and also fixes the slight annoyance of having to reach into the very back of the enclosure to connect the cable to the printer.
I then soldered and crimped some cables to wire the switch in series with the DC jack and connected two Wago cage clamps for distributing power to the multiple things inside the enclosure. I also made a short DC cable to connect to the printer. The faceplate has a flange to keep it from moving around in the hole, and is less than half the thickness of the enclosure wall, so I printed a second copy of it, put one on from either side and simply secured the whole thing with the locking nut of the DC jack.
Next, I added some LED-strips on the sides. Note that the enclosure is laying upside-down, so they are actually in the ceiling.
The power supply gives out 12 volts, which is fine for connecting to the printer and the particular LED-strips that I had. The Raspberry Pi however, requires a 5 V power supply, so I wired in a step-down converter and a micro-USB cable.
Finally, I put the enclosure back on and connected the DC-jack and the Raspberry Pi. Here is what it looks like in the back of the printer now.
For now, the Raspberry Pi is just laying inside the enclosure and the camera is simply stuck to the back wall with double-sided tape. I might come up with something smarter in the future, but it works for now.
The LED-lighting made a huge improvement! It was also completely necessary for the Raspberry Pi camera to be useful.
To sum up, these improvements took less than a day to do in total and were fairly inexpensive, but provide a huge step up in usability. We can now upload prints and monitor the progress from our desks instead of going to the printer all the time. Octoprint also has a Cura plugin so that you can simply upload STL-files directly without the need for everyone to have a slicer installed locally. This also means we can have optimized settings on the printer and not have to distribute settings to each individual using the printer.
One caveat is that the Trinus LCD display does not work with Octoprint, meaning that you cannot stop the print or use any of the other features on the front panel but have to run back to the computer to stop a failed print. I might replace the LCD with a small touch screen connected to the Raspberry Pi instead and/or wire in an emergency stop button to the GPIO pins. Also, the LED lights flicker quite a bit as the printer draws more or less power, probably due to the poor-quality power supply. I might try to fix it with some decoupling capacitors and/or a new power supply.
Let me know if you did similar upgrades, have some good ideas for the Raspberry Pi and camera or if you just want some pointers on doing this to your own printer!
Alarm Clock v0.1.0
We wanted to try banning phones from the bedroom (you should try it, I recommend it!). Clearly, a suitable hardware replacement for the alarm clock app was needed. Having thought about building my own alarm clock for a while, I quickly determined that just going out and buying one was not a viable option – there simply did not exist a model with all the features I had thought of and now needed to have, for example:
- Weekly schedule (no alarm on weekends)
- Smarter snooze (configurable and longer)
- Integrated with wakeup-lights and the rest of the apartment
- Configurable from other devices
- Programmable/extendable with future ideas
For the first prototype, I used some parts I had laying around:
- Raspberry Pi A+ with USB wifi dongle
- 1.8″ TFT display (check Ebay for “HY-1.8 SPI”)
- Some prototyping board, connectors and pushbuttons
- Small speakers with 3.5 mm jack
The first step was to connect the display. The one at hand communicated over SPI, which all Pis support, and hooking it up was not too difficult. Then, however, I spent quite some time trying to make the Pi recognize it as a screen rather than handling the SPI commands to it directly in my code. (Doing that would mean the interface could be a webpage, for example, which would make it easier to develop.)
Using an SPI TFT as a monitor had been achieved already and made quite a buzz on Hackaday back in 2012 or so, but unfortunately it was not so easy to reproduce. At the time of building this in late 2016, most documentation I could find was still from 2012-2013 and talked about compiling the kernel from scratch and a frame buffer driver called fbtft. But, once I found its official Github repository, the first thing in the readme was (and still is) a message from early 2015 saying the driver has moved into Linux staging and that development there has ceased:
2015-01-19 The FBTFT drivers are now in the Linux kernel staging tree [...] Development in this github repo has ceased.
I could not find any indication of whether fbtft is now actually part of Raspbian, nor any comprehensible documentation on how to set it up, so I ran out of patience and decided to go with direct SPI control for the first version, meaning less fancy graphics for now. For direct SPI, finding examples was a little easier, and by learning from the code on w8bh.net, I finally got something working.
The Pi now runs a fairly simple Python script, listening to the buttons and updating the display every minute, playing an mp3 file at increasing volume if it is wakeup time. The display shows the current time and the time of the alarm (which only runs Monday – Friday). Two of the buttons are used to move the alarm time back or forth in 15 minute intervals. This can be used to change the alarm, snooze or skip it in the morning. The third button stops the alarm and the fourth toggles the Philips Hue lights in the bedroom on/off. The Hue lights are actually controlled via MQTT, from a Node-RED server running on a separate Raspberry Pi, acting as a hub for this and some other “smart home” features which might be a topic for a future post.
All in all, we are pretty happy with this first version; it has been in live use for four months now without any major malfunctions, and basically it just needs an enclosure. It does lack some obvious features, like adding an alarm on the weekend and actually moving the time the wakeup-light starts along with the alarm time. Also, when I make a new version, I will probably add another button for starting the coffee maker as well. 😁
Autonomous RC Racing
Here is a project I have been working on, on and off, for about a year – Alvin the autonomous RC car. The robot is built to comply with the rules of two Swedish robot competitions, Robot SM and Stockholm Robot Championship, where the objective is to race against three other robots around a track without any form of remote control. Rules vary somewhat, but each heat typically lasts until one robot reaches 7 laps, or for a maximum of 3 minutes. The robots are then given points according to the number of laps they have completed at that point. The participants can also signal the organizer to flip or turn the robot if it crashes or gets stuck, at the cost of one point.
Here is Alvin in action, racing some other robots from the Norbot team at a recent meetup:
Parts
- Wltoys A222 RC car
- Texas Instruments TM4C123G Launchpad microcontroller evaluation board
- My own custom Launchpad protoboard booster-pack
- Sharp GP2Y0A21YK0F analog distance sensors from Ebay
- “10A Brushed ESC Motor Speed Controller for RC Car without Brake” off Ebay
- A3144 hall effect switch, also from Ebay
- Rare earth magnets (D4x2mm), Ebay as well
Operation
Since Alvin is built on an RC car, both the steering servo and the throttle ESC have a maximum update rate of once per 20 ms, or 50 times per second. For the first implementation, therefore, the processor simply runs the algorithms for calculating new steering and throttle values once every 20 ms and then just waits in between.
For the steering, Alvin uses two Sharp distance sensors to measure the distance to the side walls and a third to detect obstacles in front. To stay in the middle of the track, it simply compares the two wall distances and adjusts the steering to make them equal. This rather simple approach could probably be improved a lot to gain more speed.
Since the track contains a hump, the power to the motor cannot be hardcoded – it needs to increase in the uphill. To control the speed, a hall effect sensor reads two magnets mounted on the drive shaft. The algorithm converting pulses from this sensor into a throttle setting has so far proven to be the hardest to implement, mainly because there are only two pulses per revolution of the drive shaft. This means there are normally just 0-2 new pulses recorded every time the algorithm runs, making it hard to determine the actual speed. To work around this, the delta in revolutions is instead calculated against the value from 10 iterations back. The resulting speed value is then used as input to a PI controller which calculates the throttle.
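To make that concrete, here is a rough sketch of the speed estimation and PI step. It is not the actual Launchpad firmware (that is plain C), and the names, gains and constants are made up for illustration:

const HISTORY: usize = 10;         // look this many 20 ms iterations back
const PULSES_PER_REV: f32 = 2.0;   // two magnets on the drive shaft

struct SpeedControl {
    pulse_history: [u32; HISTORY], // running pulse counts, one slot per iteration
    index: usize,
    integral: f32,
}

impl SpeedControl {
    // Called once every 20 ms with the total pulse count from the hall sensor interrupt.
    fn update(&mut self, total_pulses: u32, target_rps: f32) -> f32 {
        // Compare against the count from 10 iterations (200 ms) ago; with only
        // 0-2 new pulses per iteration, a single-iteration delta is too coarse.
        let old = self.pulse_history[self.index];
        self.pulse_history[self.index] = total_pulses;
        self.index = (self.index + 1) % HISTORY;

        let revs = total_pulses.wrapping_sub(old) as f32 / PULSES_PER_REV;
        let speed_rps = revs / (HISTORY as f32 * 0.020); // revolutions per second

        // Simple PI controller on the speed error gives the new throttle value.
        let error = target_rps - speed_rps;
        self.integral += error * 0.020;
        0.8 * error + 0.3 * self.integral // placeholder gains
    }
}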
Results
Alvin raced for the first time last year at Stockholm Robot Championship and I was very happy to finish 6th without ever having tuned it on a full-size track before! I have since worked on the throttle algorithm to make it as responsive as in the video above, so I think it could do even better today.
Future improvements
- Extend front bumper to go around the wheels to avoid getting stuck against the wall.
- Add a speed sensor with more pulses per revolution to allow better throttle control.
- Add more distance sensors to enable a more advanced steering algorithm.
- Add roll cage or body to protect the electronics and improve the looks.
Conclusion
Autonomous robot racing is a fun and fairly affordable way to put your combined engineering skills to the test, spanning mechanical, electrical and software components. Being new to my city, it has also been a way to meet like-minded people and a reason to go down to the local hackerspace.
Finally, here is a playlist from Stockholm Robot Championship 2016 if you want to see some more action!
Periodic Python Scheduler
Since the days were getting shorter, I thought it was about time to set up the wakeup light in my new apartment. This got me looking at the code and wondering about improvements. One of the first things I wanted to fix was the ugly way it handled fading the light by setting a new intensity once per second:
while(1):
    client.loop()

    # Do next fade step if needed
    if (rgb.fadeState != 0):
        if (time() - lastTime >= rgb.fadeTickSeconds):
            lastTime = time()
            rgb.fade()
At work, I have spent some time programming with TI-RTOS lately. Like any RTOS, it has the ability to set up periodic tasks. TI-RTOS calls them clock objects and the idea is pretty simple: you assign the clock object a period and a callback function and just start it. Clearly there should be a way to do this with a single line of Python? Not that I could find! There is one way to do it, apparently, in something called Celery. Some comments on Stack Overflow convinced me that it is way overkill for what I am trying to achieve though, requiring setting up queues and handlers and stuff.
Python does provide the event scheduler sched. It can schedule a single event to happen some time in the future. Another Stack Overflow comment pointed out you can use sched to recursively schedule new events and thus get a periodic scheduler:
def periodic(scheduler, interval, action, actionargs=()):
    scheduler.enter(interval, 1, periodic,
                    (scheduler, interval, action, actionargs))
    action(*actionargs)
I went with this idea and added a few features to it. First, I wanted the task to be able to stop itself if it wants to. Second, it should also be possible to stop the task from the caller’s context. Third, I wanted to be able to schedule only a specific number of periodic events. Fourth, I wanted it to be non-blocking. sched can run in non-blocking mode since Python 3.3, but I was using 2.7, so I resorted to wrapping the scheduler in its own thread instead. Here it is (only 44 lines!):
import sched
import time
import thread
import threading

# Periodic task scheduler
class periodicTask:
    def __init__(self, periodSec, task, taskArguments = (), repetitions = 0):
        self.thread = threading.Thread(target = self.threadEntryPoint,
                                       args = (periodSec, task, taskArguments, repetitions))
        self.thread.start()

    def threadEntryPoint(self, periodSec, task, taskArguments, repetitions):
        self.repetitions = repetitions
        self.scheduler = sched.scheduler(time.time, time.sleep)
        self.periodic(self.scheduler, periodSec, task, taskArguments)
        self.scheduler.run()

    def periodic(self, scheduler, delaySec, task, taskArguments):
        # Schedule another recursive event
        self.nextEvent = self.scheduler.enter(delaySec, 1, self.periodic,
                                              (self.scheduler, delaySec, task, taskArguments))
        # Do task and get return status
        stopTask = task(*taskArguments)
        # Stop task if it returned true
        if (stopTask == True):
            self.stop()
        self.repetitions = self.repetitions - 1
        # Stop if we ran through all repetitions
        # If repetitions was initialized to 0, run forever
        # (or at least until integer underflow?)
        if (self.repetitions == 0):
            self.stop()

    def stop(self):
        self.scheduler.cancel(self.nextEvent)
        thread.exit()
Now I can define a task and then run it three times with a 1 second interval in a single line of code(!):
def testTask():
    print "Test"
    return False

myTask = periodicTask(periodSec = 1, task = testTask, repetitions = 3)
I thought this might come in handy for other projects, so I put it in a separate GitHub project. The next step will be to include the scheduler in the wakeup light code. It will probably require adding some more functionality, like separating creating the task from starting it so that it can be re-used. We’ll see.
Learning Proper Coding: For-loop optimization
“What’s wrong with this code?”
for ( int i=0; i<10; i++ ) {
    //do stuff
}
As it turns out, I will soon start working as an embedded software developer. It seems, then, that now would be a good time to start learning how to program properly. I’ll be putting up some tidbits here for my own reference and hopefully for others to learn something as well!
Several months ago, during a (the) job interview, I was told about reverse loops. Supposedly, computer scientists will scoff at this, but since I never learnt about it until interviewing for my first job after five years of electrical engineering studies and even more time doing hobby-level coding, I thought I might just put one more mention of it on the internetz.
So, if you find yourself writing code for a system with heavy resource constraints, it would seem the school-book example of a for-loop above has some room for improvement. Assuming the loop body does not depend on i incrementing upwards, it would be better to do this:
for ( int i=10; i>=0; i-- ) {
    //do stuff
}
Why? It all depends on what instructions the code compiles down to. When evaluating whether or not to stay in the loop, the processor first has to put the number 10 into a register and then do a comparison to see which one is larger. The comparison, in turn, is usually implemented by subtracting and checking the sign of the result. Since most architectures have a designated “zero register”, the reverse loop may skip loading the value into a register and go straight to comparing. Before we move any further, however, there is one more thing we can do:
for ( int i=10; i--; ) {
    //do stuff
}
Since the loop criterion has to be fulfilled in order to stay in the loop, we are looking at the zero flag (Z) to find out when to break out of it. This means we might as well use the decrement itself as the loop criterion, saving us the compare instruction altogether!
Sweet! I was nearly going to leave it at that, but of course you should not have to take my word for it. Let’s try to verify this on an MSP430 using msp430-gcc. First, let’s implement the three loop variants:
#include <msp430.h>

//Normal loop
void loop1() {
    for ( int i=0; i<10; i++ ) {
        __delay_cycles( 1 );
    }
}

//Slightly optimized
void loop2() {
    for ( int i=10; i>=0; i-- ) {
        __delay_cycles( 1 );
    }
}

//Super optimized!
void loop3() {
    for ( int i=10; i--; ) {
        __delay_cycles( 1 );
    }
}

void main() {
    loop1();
    loop2();
    loop3();
}
Next, I compiled the code and fed it through msp430-objdump to get the assembly output. Note that compiler optimizations were set to the lowest level with the -O0 flag, and variable declaration inside the for-statement was made possible with -std=c99 (let’s not get into that today):
$ msp430-gcc -O0 -std=c99 main.c -mmcu=msp430g2553 -o main.elf
$ msp430-objdump -DS main.elf > main.lst
Now this is where it gets interesting. Watch what happens in the relevant parts of the output:
0000c058 <loop1>:
    c058:  04 12        push  r4
    c05a:  04 41        mov   r1, r4
    c05c:  24 53        incd  r4
    c05e:  21 83        decd  r1
    c060:  84 43 fc ff  mov   #0, -4(r4)   ;r3 As==00, 0xfffc(r4)
    c064:  03 3c        jmp   $+8          ;abs 0xc06c
    c066:  03 43        nop
    c068:  94 53 fc ff  inc   -4(r4)       ;0xfffc(r4)
    c06c:  b4 90 0a 00  cmp   #10, -4(r4)  ;#0x000a, 0xfffc(r4)
    c070:  fc ff
    c072:  f9 3b        jl    $-12         ;abs 0xc066
    c074:  21 53        incd  r1
    c076:  34 41        pop   r4
    c078:  30 41        ret

0000c07a <loop2>:
    c07a:  04 12        push  r4
    c07c:  04 41        mov   r1, r4
    c07e:  24 53        incd  r4
    c080:  21 83        decd  r1
    c082:  b4 40 0a 00  mov   #10, -4(r4)  ;#0x000a, 0xfffc(r4)
    c086:  fc ff
    c088:  03 3c        jmp   $+8          ;abs 0xc090
    c08a:  03 43        nop
    c08c:  b4 53 fc ff  add   #-1, -4(r4)  ;r3 As==11, 0xfffc(r4)
    c090:  84 93 fc ff  tst   -4(r4)       ;0xfffc(r4)
    c094:  fa 37        jge   $-10         ;abs 0xc08a
    c096:  21 53        incd  r1
    c098:  34 41        pop   r4
    c09a:  30 41        ret

0000c09c <loop3>:
    c09c:  04 12        push  r4
    c09e:  04 41        mov   r1, r4
    c0a0:  24 53        incd  r4
    c0a2:  21 83        decd  r1
    c0a4:  b4 40 0a 00  mov   #10, -4(r4)  ;#0x000a, 0xfffc(r4)
    c0a8:  fc ff
    c0aa:  01 3c        jmp   $+4          ;abs 0xc0ae
    c0ac:  03 43        nop
    c0ae:  5f 43        mov.b #1, r15      ;r3 As==01
    c0b0:  84 93 fc ff  tst   -4(r4)       ;0xfffc(r4)
    c0b4:  01 20        jnz   $+4          ;abs 0xc0b8
    c0b6:  4f 43        clr.b r15
    c0b8:  b4 53 fc ff  add   #-1, -4(r4)  ;r3 As==11, 0xfffc(r4)
    c0bc:  4f 93        tst.b r15
    c0be:  f6 23        jnz   $-18         ;abs 0xc0ac
    c0c0:  21 53        incd  r1
    c0c2:  34 41        pop   r4
    c0c4:  30 41        ret
See that? The most optimized loop, loop3(), has the most instructions! How can that be!? Let’s try again with -Os for size optimization:
0000c054 <loop1>:
    c054:  3f 40 0a 00  mov  #10, r15  ;#0x000a
    c058:  03 43        nop
    c05a:  3f 53        add  #-1, r15  ;r3 As==11
    c05c:  fd 23        jnz  $-4       ;abs 0xc058
    c05e:  30 41        ret

0000c060 <loop2>:
    c060:  3f 40 0b 00  mov  #11, r15  ;#0x000b
    c064:  03 43        nop
    c066:  3f 53        add  #-1, r15  ;r3 As==11
    c068:  fd 23        jnz  $-4       ;abs 0xc064
    c06a:  30 41        ret

0000c06c <loop3>:
    c06c:  3f 40 0b 00  mov  #11, r15  ;#0x000b
    c070:  01 3c        jmp  $+4       ;abs 0xc074
    c072:  03 43        nop
    c074:  3f 53        add  #-1, r15  ;r3 As==11
    c076:  fd 23        jnz  $-4       ;abs 0xc072
    c078:  30 41        ret
Sorry, loop3() is still losing!!
So what have we learned from this exercise? Unless I’m missing something, this seems to be a typical case of what shall henceforth be known as “the-people-who-wrote-the-compiler-were-smarter-than-you”. With size optimization turned on, the compiler reverses the loop by itself, since it detects that the loop variable is not used. Supposedly, a more complex loop body would change this. Still, I’m not sure why they don’t all compile down to the same code then. Maybe one day I will have become smart enough to fix it!
While the theory behind the reverse loop as I presented it above seems sound, it appears the compiler optimizes the “normal” code best anyway, at least on this one platform with this one compiler. Until I actually have code to shoehorn into a resource-constrained, low-power device, I’ll just keep writing my loops as I used to. This also has another major advantage: readability. A traditional loop statement is simply easier to understand. Also, I don’t feel like explaining that my loops are not broken, but brilliant. Especially when they are, in fact, not even brilliant.