The compiler will not save you
…at least not in an embedded environment.
There’s a commonly held myth in modern software development that compilers are smarter than people at optimising code for its eventual runtime environment. By extension there’s no point in writing efficient code, because your idea of efficient code might not actually be all that efficient, and any time spent on doing that is wasted because the compiler will know better. This post is about one of those times when that’s not true. Functionally identical code can end up compiling into very different binaries, and that can lead to significant differences in resource usage on the target environment. That might not matter on a PC with a multi core multi GHz CPU and many GB or RAM, but it can make the difference between working and not working in an embedded environment – the kind of environment that the Internet of Things is being built out of.
The back story
A couple of years ago I did a simple project with my kids to make little boards that would flash their names in morse code. It used the cheap ($4.30) and popular TI MSP430 LaunchPad development kit. The code to do the morse bits was fugly but it worked.
Some time later my friend Andy Bennett asked me why I’d coded the project the way I had. The honest answer was that I’d been pretty lazy and cut’n’pasted the whole thing together, but for good measure I threw in a comment that I thought it would be easy for the compiler to digest and turn into an efficient binary for the highly constrained target environment of a low end MSP430 chip.
I’m still not quite sure whether Andy set out to prove me right or wrong, but I’m impressed by the effort and rigour that he put into his subsequent investigations. Some time later he presented his findings at OSHUG #25 (pdf slides).
The (more elegant) alternatives
Andy created three additional versions of the morse code flasher program (I’ve put all four into this Github repo – EmbeddedMorse). I think anybody with the slightest sense of smell around code would argue that all three are better than my original (#2). #1 and #3 make much better use of C data structures, #3 has some clever bit mapping of morse characters, and #4 uses a switch statement instead of many extra functions.
If the myth of compiler omnipotence held true then all four examples would have compiled down into the same binary for the MSP430.
They didn’t – it turns out that the compiler can be led astray.
A quick detour into a limited memory map
The MSP430G2211 device that I used has 2kB of Flash and 128 Bytes of RAM – so anything that’s going to change at runtime (variables) needs to fit into that 128 Bytes and everything else can be accommodated in the more generous 2048 Bytes of Flash. The actual memory map for an MSP430 device (or any other microcontroller for that matter) is a bit more complex, due to memory mapped peripherals etc., but that matters less when it comes to squeezing some code in.
Memory sections of compiled programs
A compiler will generate binary code to fit into various different sections such as .text (machine instructions), .data (static data such as constants) and .bss (space for variables).
For the compiled program to fit onto the device then .bss must go inside 128 Bytes and .text and .data combined can’t exceed 2048 Bytes.
Here’s what the compiler actually produced
The first notable thing is that they’re all different – the compiler hasn’t spotted that the code is functionally identical, and it’s compiled it into different shaped binaries.
The second notable thing is that they all should work – none of the code uses any .bss, so that tiny 128 Bytes of RAM shouldn’t be an issue.
The only one that does work is #2, and it’s pretty obvious why that is – the entire compiled output is .text, which can go into flash.
It’s less clear why the other three examples don’t work:
#1 – if .data goes into RAM rather than flash then it’s clearly going to break past the 128 Byte limit
#4 – if .rodata goes into RAM rather than flash (which it totally doesn’t need to) then it will also break the 128 Byte limit
#3 – even if the .data and .rodata go into RAM then it’s only 70 Bytes, so there are 58 Bytes left. In theory that should be plenty, but in practice… some further analysis needed to figure out why it breaks.
Compilers aren’t as smart as people make out. Functionally identical source code can lead to very different binaries, and that can matter a lot in a constrained environment.
I didn’t know any of this when I wrote my initial implementation, and it’s just good luck that I picked an approach that worked first time. If it hadn’t then I might have done battle with the compiler for some time, or I might have given up and found an easier platform to get it working on.
Filed under: code, software | 2 Comments
Tags: binary, c, compiler, microcontroller, msp430, optimisation, size