Of stack overflows on embedded systems

I spent much of the last week investigating a random, pretty much unreproducible bug on an embedded system (RL78/G14). It had probably been seen a few times before and dismissed. It has now been seen a few more times, and this time it presented in a way that stuck around once it occurred.

This fault caused quite a few hours of staring at code, trying to find an edge case where it had gone wrong and got stuck in a bad state in its state machine. A *LOT* of staring. And then some more.

This code ran pretty much all the time; it had a common function and duplicated logic for two states.

Nothing was suggesting an actual software bug, nothing was suggesting a compiler bug (of which we’ve had a few on this project!).

Soooo… I had noted as a risk earlier that we had used 76% of ROM and 52% of RAM on this chip. That sounds like there is a bit left, but we only had 5.4KB of RAM to start with, which means only ~2000 bytes of stack space. Depending on frame size (roughly 65 to 130 bytes per call at those numbers), that could be as few as 15 function calls deep or as many as 30. Neither is a big number.

Investigating the generated map file, I noted that the data for the area where we had seen most issues sat near the top of working memory (not the very first thing, though). I also noted, as I kind of already knew, that the stack was allocated from the top of RAM, so if it grew too far down it would trash the working memory.

This isn’t your security-type stack overflow; this is literally running out of stack space and writing over your working variables. Why does it happen on such a random basis? Well, perhaps we are operating with 1 or 2 levels spare at the very deepest calls, which happen only under certain logic and only for short periods. If you then get an interrupt (or even an interrupt within that one), it could well be the straw that breaks the camel’s back.

I guess I got complacent. On big systems the MMU would have saved us (well, it would have crashed us) if we even managed to get there. Most of the time you run out of ROM before you run out of RAM.

But the question remains: what can we do about it? Worse still, all we have is a lot of evidence pointing to this as the cause of an error, but it’s not repeatable via steps or timing.

The obvious answer is get a bigger chip, and it’s pretty likely this will happen, however our pipeline means we may have to work with what we have for a period.

We can reduce RAM/Stack usage. I’ve looked at this a few times and there isn’t much in the way of reduction without removing functionality.

The next thought is that if this happens so rarely that we have seen maybe 4 or 5 of these in normal usage in 6 months then perhaps we can fix up the routine to break the state machine deadlock. Except, if we trashed the working memory we can’t be sure that everything else is actually OK. It’s a house of cards, once that has occurred we can’t trust anything.

So, I’m left with trying to detect and rectify on the run. 

Step 1 – move the stack below the working memory

On this platform the RAM starts at a non-zero address and attempting to write at addresses below this will cause a reset with an illegal memory access. This gets us back to safe conditions and also allows us to log the reset cause – we should get an idea of just how frequently this is actually happening.
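As a rough sketch of the reset-cause logging mentioned above: RL78 parts expose a reset control flag register (RESF) that firmware can read at startup. The bit masks below are assumptions that must be checked against the hardware manual for your exact part, and the enum and function names are hypothetical. The decoding is kept separate from the register read so it can be exercised off-target.

```c
#include <stdint.h>

/* Assumed RESF bit masks; verify against the hardware manual for your part. */
#define RESF_TRAP   0x80u  /* illegal instruction executed */
#define RESF_WDTRF  0x10u  /* watchdog timer reset */
#define RESF_RPERF  0x04u  /* RAM parity error */
#define RESF_IAWRF  0x02u  /* illegal memory access */
#define RESF_LVIRF  0x01u  /* low-voltage detector */

/* Hypothetical names, for illustration only. */
typedef enum {
    RESET_POWER_ON_OR_PIN,      /* no flag set */
    RESET_ILLEGAL_ACCESS,       /* the one a stack overrun should produce */
    RESET_WATCHDOG,
    RESET_ILLEGAL_INSTRUCTION,
    RESET_RAM_PARITY,
    RESET_LOW_VOLTAGE
} reset_cause_t;

/* Turn a raw flag byte into a single cause. Illegal access is checked
 * first because it is the case we are hunting for. */
reset_cause_t decode_reset_cause(uint8_t resf)
{
    if (resf & RESF_IAWRF) return RESET_ILLEGAL_ACCESS;
    if (resf & RESF_WDTRF) return RESET_WATCHDOG;
    if (resf & RESF_TRAP)  return RESET_ILLEGAL_INSTRUCTION;
    if (resf & RESF_RPERF) return RESET_RAM_PARITY;
    if (resf & RESF_LVIRF) return RESET_LOW_VOLTAGE;
    return RESET_POWER_ON_OR_PIN;
}
```

On target, the idea would be to read the register once, early in startup, pass the value to decode_reset_cause() and bump a counter in non-volatile storage whenever it reports RESET_ILLEGAL_ACCESS.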

Luckily, this build toolchain uses LD, the GNU linker. Unfortunately, it turns out you can’t figure out the size of a section without defining it.

This matters because we want:

  • 0xF0000 – end of stack
  • 0xF???? – start of stack
  • 0xF???? + 1 – start of working RAM
  • 0xFFFFF – end of working RAM

In other words, we want to allocate the working memory based on its size from the end of RAM.

Long story short, the linker allocation pointer can only be moved forwards, so you need to know the .data and .bss sizes first, then allocate .stack, then allocate the .data and .bss sections.

My first guess was to list a fake RAM duplicate section, allocate a fake .data and .bss (named .fdata and .fbss since I was feeling creative) with a NOLOAD attribute.

I now had the size, except when I now allocated the actual .data and .bss their size was zero.

Turns out the linker is clever, it could see I had already included the objects for .data, .bss and COMMON into the two fake sections and so didn’t/wouldn’t include them again.

As far as I can see there is no trick to doing this, I tried DSECT, KEEP and a bunch of other attributes but it would still not include them.

I then tried calculating the sizes they needed to be and moving the allocation pointer (the ‘.’ ) on by the needed sizes. That worked.

Well, it worked for making the right size, but I kept getting a reset and debugging showed what I believe to be the symbol table not being relocated, so it wasn’t actually correct. Probably because all those symbols were in the discarded (NOLOAD) section of .fdata and .fbss.

I admit defeat at this point, I can’t see a way to automatically size stack and working RAM in this way without using some clever stuff such as double link step to get the section size, extract, and relink again with the non-fake block.

Step 1 – move the stack below working memory (again!)

Step 1, attempt 1 failed, but I should at least be able to statically size the stack and do this, right?

Yes, this is actually pretty easy:

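stack_size = 0x900;

    /* Stack first: it starts at the bottom of RAM and _stack marks its top. */
    .stack 0xFE900 (NOLOAD) :
    {
        . += stack_size;
        _stack = .;
    } > RAM

    /* Input section patterns below are assumed; match them to your own script. */
    .data : AT(_mdata)
    {
        . = ALIGN(2);
        _data = .;
        *(.data)
        . = ALIGN(2);
        _edata = .;
    } > RAM
    PROVIDE(__romdatacopysize = SIZEOF(.data));

    .bss :
    {
        . = ALIGN(2);
        _bss = .;
        *(.bss)
        . = ALIGN(2);
        *(COMMON)
        . = ALIGN(2);
        _ebss = .;
        _end = .;
        /* Fail the link if working memory no longer fits above the stack. */
        ASSERT((_end < 0xFFEDC), "Not enough space in RAM for working Memory");
    } > RAM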

The only things to note here are:

  • You need to define the max stack size, and it has to leave enough room for the working RAM. That’s pretty much trial and error.
  • You need to ensure the ‘AT (_mdata)’ (in my case) is used to get the right size and linkage.
  • You really want to include that ASSERT at the end to know when it goes wrong.
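One common way to take some of the guesswork out of sizing the stack is "stack painting": fill the stack region with a known byte at startup, then later scan for how much of the pattern survives. This is not something we did here, just a minimal sketch of the technique; the function names and the fill value are hypothetical, and on target the region would be the .stack section (0xFE900 up to _stack) rather than an ordinary array.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u  /* arbitrary but recognizable fill byte */

/* Fill the whole stack region with the pattern. On target this has to
 * run before the stack is in real use, e.g. from the startup code. */
void stack_paint(uint8_t *base, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        base[i] = STACK_FILL;
    }
}

/* The stack grows down from base + size, so the untouched part is the
 * run of fill bytes starting at the lowest address. Returns how many
 * bytes have been overwritten at some point since painting. */
size_t stack_high_water(const uint8_t *base, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && base[untouched] == STACK_FILL) {
        untouched++;
    }
    return size - untouched;
}
```

The result can undercount slightly if the deepest stack bytes happen to equal the fill value, so treat it as an estimate and leave some margin.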

I don’t like this solution, it’s an extra step that the customer will need to know about.


We did manage to get our stack to sit below working memory, and a quick test with a stack-eating function showed that we do get our reset as planned.
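A stack-eating test function can be as simple as bounded recursion with a buffer in every frame. The sketch below is illustrative rather than the exact function we used; the name, the 32-byte pad and the depth limit are all assumptions.

```c
#include <stdint.h>

/* Recurse until max_depth is reached, placing a 32-byte buffer on the
 * stack in every frame. Returns the depth actually reached. */
uint16_t eat_stack(uint16_t depth, uint16_t max_depth)
{
    /* volatile stops the compiler from optimizing the buffer away */
    volatile uint8_t pad[32];
    pad[0] = (uint8_t)depth;

    if (depth >= max_depth) {
        return depth;
    }

    uint16_t reached = eat_stack((uint16_t)(depth + 1u), max_depth);

    /* Touch the buffer after the call so this cannot become a tail call */
    pad[1] = (uint8_t)reached;
    return reached;
}
```

On target, calling it with a max_depth well beyond what the stack can hold should push the stack pointer below the start of RAM and trigger the reset; with a small max_depth it simply returns.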

I think the vendor of our toolset and platform would be well advised to provide this as a default linker script, with the stack sized at maybe 50% or even 90% of RAM to force the developer to examine the settings early.

The result of this change would be that, during development, a stack overflow (is this really an underflow, since it was going down?) would be immediately obvious.

The RL78/G14 is not exactly a hobby micro, so an unfriendly stack setting of 90%, placed below working RAM, would seem reasonable to me. A stack overflow can have unpredictable effects otherwise.

We still aren’t convinced this was our problem, but it was potentially A problem, so we’ve fixed it. The good side is that I had an interesting issue to diagnose and fix and a challenging fight with a new tool I hadn’t paid much attention to before now.

Lastly, googling showed that this is a common enough issue. I’d single out this blog post and this one from Embedded Gurus for confirming what I suspected, and for the consolation that it was not just me who had found this one of the harder bugs to verify.

Android Drawable Resources

While testing SmsDroidway on different Android SDK/platform levels, I noticed that some of the nice default icons were missing from my UI. A quick google showed that this was not uncommon and that the best practice was to import the resources into your project locally to make yourself platform agnostic, though you then don’t necessarily match the look of the platform.

The Android Eclipse project folder structure has a ‘res’ folder that contains drawables, layouts, menus and values. However, the drawables tend to be split into High, Medium and Low DPI folders; in fact, some of the ADT tools create icons for you in these folders. This means that to import from the SDK folders you have to go and grab the images from each folder individually, which struck me as too much effort/pain.

So… I weighed up creating an Eclipse plugin to steal them from the SDK folders and push them into your project, and, to be honest, this would have been my preferred method. But I don’t have time to get up to speed with the plugin framework to create a UI, to access the ADT preferences and all the other stuff it should do.

However, I do have time to write a quick (and admittedly a bit dirty) C# WinForms application that can do this.

So that is what I did, this is the wonderfully named “Android Resource Transferer”. I could probably brand it better 🙂

At the top you select your destination Eclipse Android folder; the shell browse dialog is validated on exit to confirm the folder looks Android-like. This was a pain: the FolderBrowserDialog doesn’t support custom validation or subclassing, so I had to just put it in a loop with a validation delegate passed in.

Next you can choose to either use the SDK as the source or a Donor Project. The Tab selection is your active method.

The SDK Browse process includes some checks to make sure it looks like the SDK root folder. The SDK includes a platform level on the left, you can see my Android internals platforms listed there. You can then use whatever icon you think looks best.

A big difference from the Eclipse ADT tool here is that the search box treats your search string as a “contains” match, so you can focus on keywords rather than whether the name starts with ic_ or whatever, as you must in ADT.

The list of resources includes a mnemonic of [HMLN] to indicate which DPIs are available for a particular resource, being [H]igh, [M]edium, [L]ow, [N]eutral. The selected resource is previewed at the highest DPI on the right.

The final step is to hit import and the drawable will be put in your project, it will even create the right folder if it doesn’t exist.

Back in Eclipse you will have to refresh to see your results and might have to do a clean to get the drawable IDs propagating through the project.

You can download the tool here. I can probably downgrade it from .NET 4 if needed, and I expect it to run under Mono, since there isn’t anything too complex in there. We won’t be releasing the source to this tool, as it is a quick aid to our own development work and isn’t up to the standard we normally work to. On the other hand, we haven’t obfuscated it at all, so if you run Reflector over it you will see that it isn’t deliberately malicious 🙂

Accessing the Android Internal Classes

As part of our mobile work I have been peering into the Android source and Kernel, which is great, but we also wanted to see what we could get access to without making our own ROM and/or Kernel.

This provides a middle ground between more functionality and less compatibility when we want to offer some of our items for download to any old phone.

I found this posting, “Using com.android.internal classes”, from DevMaze. It was really helpful, and in the final sections he discusses how the ADT Eclipse plugin restricts use of the internal classes, even if you can access them via a custom platform.

Well, I manually carried out the hex edit and it is all working, but I was concerned that if I let Eclipse update I would have to go and do it again. So, I give you http://code.google.com/p/adthiddentweak/, a very simple application that carries out the ‘hard work’ of doing the modification.

It just:

  • Copies the plugin to a folder under your documents to work on
  • Unpacks using DotNetZip (http://dotnetzip.codeplex.com/) to a folder
  • Maps the file that needs to be modified into memory (first time using memory mapped files in .NET, frankly not as useful as in C/C++) for alteration
  • Copies the plugin as <file>.bkp just in case
  • Creates the Jar (a zip file)
  • Copies back over the original Jar plugin file

Nothing really amazing but it did allow me to mess with two things.

Visual Studio 2012.

Yesssss….. Not bowled over by this experience. I wasn’t asked what developer profile I prefer, so ended up with a .NET profile, I’ve always used C++ (I’ve been using VS since v6) and got used to the layout.

Color scheme, blinded by the light, though thankfully I found the dark mode.

Icons.. I’ve only had 15 years to get used to the old ones, so I guess making me learn new ones might cause a bit of frustration…

I don’t feel especially disposed to upgrading right away.

Memory Mapped Files

I was keen to see these in .NET as the application here was a textbook example of their use.

In C++ you map the file, get a pointer and get to treat it like a big block of memory. It makes the C programmer in me happy :).
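For comparison, here is roughly what that looks like in plain C. This is a minimal sketch using the POSIX mmap call rather than the Win32 mapping functions, and patch_byte is a hypothetical helper, not code from the tool.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Overwrite one byte of a file in place through a memory mapping.
 * Returns 0 on success, -1 on any failure. */
int patch_byte(const char *path, size_t offset, unsigned char value)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0 ||
        offset >= (size_t)st.st_size) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) {
        return -1;
    }

    p[offset] = value;  /* the file is now just a block of memory */

    int rc = msync(p, len, MS_SYNC);  /* flush the change to the file */
    munmap(p, len);
    return rc == 0 ? 0 : -1;
}
```

Once the mapping exists, the edit itself is a single array assignment, which is exactly the pointer-style access I was hoping for.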

In C# I thought I would get a massive array of bytes that I could mess with, instead I got what really looked like a stream. I might as well have just opened the file R/W via File IO and worked on it.

Today was a day to want my pointers back :-p

Why software engineers are bad for electronics

Since I’m responsible for the hardware side of CASL I thought I’d share this article from Rugged Circuits about how easy it can be to kill one of the Arduino boards we use for happy prototyping. As we say about each other, we each know enough of the other’s expertise to be dangerous!
This is why I try to make sure our designs can withstand software engineers, after that users should be the easy ones…

10 Ways to Destroy an Arduino


CASL has been working on Atmel AVR programming using the excellent Arduino rapid prototype platform. Conveniently we’ve been using www.freetronics.com supplied boards as we could source them at Jaycar, just a few minutes drive away.

My own previous experience has been some PIC solutions and a bit of FPGA work (I’ve also got some really old stuff like 6800 and MCS48…), so it was interesting trying out another processor architecture and the provided development environment, the Arduino IDE.

I liked the boards, I liked the prevalence of libraries, but I hated the IDE… I’m coming from a Visual Studio or Emacs background, so I’m used to actually seeing my errors and having some extra control.

It wasn’t long until I had Visual Micro Arduino installed in VS2010, but even that wasn’t great (good plugin, but still using the FDH toolchain of the IDE). So I downloaded AVR Studio.

Now we were talking: I was seeing warnings, errors and lots of new stuff. I’ve since spent a bit of time configuring templates for Arduino development, cleaning up the libraries and making new additions to the core (fancy subscribing to Pin Change Interrupts? How about multiple subscribers to interrupt handlers?).
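To give a flavour of the "multiple subscribers" idea, here is a minimal sketch of a fixed-size callback table with a dispatcher. It is not the CASL code; every name in it is hypothetical, and the two example subscribers exist only to show that each one gets called.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SUBSCRIBERS 4u

/* A subscriber receives a bit mask of the pins that changed. */
typedef void (*pin_change_cb)(uint8_t pins_changed);

static pin_change_cb subscribers[MAX_SUBSCRIBERS];

/* Register a callback. Returns 0 on success, -1 if cb is NULL or the
 * table is full. */
int pin_change_subscribe(pin_change_cb cb)
{
    if (cb == NULL) {
        return -1;
    }
    for (size_t i = 0; i < MAX_SUBSCRIBERS; i++) {
        if (subscribers[i] == NULL) {
            subscribers[i] = cb;
            return 0;
        }
    }
    return -1;
}

/* Meant to be called from the real interrupt handler, which works out
 * which pins changed and passes the mask in. */
void pin_change_dispatch(uint8_t pins_changed)
{
    for (size_t i = 0; i < MAX_SUBSCRIBERS; i++) {
        if (subscribers[i] != NULL) {
            subscribers[i](pins_changed);
        }
    }
}

/* Two example subscribers that just record what they were told. */
uint8_t last_seen_a;
uint8_t last_seen_b;
void subscriber_a(uint8_t pins) { last_seen_a = pins; }
void subscriber_b(uint8_t pins) { last_seen_b = pins; }
```

On a real board the dispatcher would be called from the pin change interrupt vector, and subscribing would need to happen with interrupts disabled (or before they are enabled) so the table is never read half-updated. Keeping the callbacks short matters too, since each one adds to the stack depth and time spent inside the interrupt.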

I should be going back and doing a QA on the code, when that’s done CASL will release an AVR Studio template for Arduino boards and also look to push the architecture changes back into the mainline of the Arduino sources.

Hello world!

We’ve set up a blog here on the website so that we can add more dynamic technical articles. Hopefully we will be sharing information on bits of technology we have been working on and what we are thinking about.

We’ve used WordPress, since that was offered easily with our hosting, and hopefully it is the content, not the medium (and perhaps even some of the presentation), that is important.

We look forward to sharing some of our work with you!