The Evils of Arduino Strings

Due to WordPress’s abysmal handling of code blocks this blog post is now hosted at https://majenko.co.uk/blog/

Everyone, when they’re starting out on the Arduino and similar boards, learns to use the String object for working with text. Or they think they do.

Well, you should forget all you think you have learned about using Strings on the Arduino, because it is all wrong.

Strings are a bit of a tricky area on the Arduino. The String object was created to make working with blocks of text easier for people that don’t really know what they are doing when it comes to low level C++ programming. However, if you don’t know what you’re doing when it comes to low level C++ programming then it is very easy to abuse the String object in such a way that it makes everything about your sketch fragile and unstable.

So I’m here to tell you firstly why you shouldn’t use String objects, and secondly, if you really must use them, how you go about using them properly.

Ok. First a little bit of computer theory – especially to do with memory.

The Arduino’s RAM is split up into different chunks for different purposes. There’s a chunk where all the global and static variables are stored (aka BSS and data areas). There’s the stack where local variables created within functions are stored, and finally there’s the heap, which is where dynamic variables are stored (more on those in a moment).

If you want to know more about how these chunks of memory relate to each other you can read more on Wikipedia here: https://en.wikipedia.org/wiki/Data_segment

The main part we are concerned with here is the heap.

So let’s describe the stack and the heap so you can grasp the difference.

Imagine you have a load of coins. Each one represents a variable in your program. Local variables in functions are placed on the stack. As its name suggests this is like a single stack of coins. You place a coin on the top of the stack, and you can remove a coin from the top of the stack. Variables are placed on the stack, and the stack grows. Variables are removed from the stack, and the stack shrinks. Just like placing coins on top of the stack of coins and then removing them, you have to remove them in the opposite order that you placed them on. If you tried taking a coin out from the middle of the stack the whole stack would fall over.

Now the heap is completely different. Again, you have a load of coins. The coins are of different sizes. Some big, some small, and maybe you even have some notes (if you’re lucky). The heap is more like you are laying the coins on the table side by side in a line. You can place a coin at the end of the line, and you can remove coins from the middle of the line. If there is a gap in the line big enough for you to fit a new coin in then you can place it in there. If the coin is smaller than the gap you end up with a tiny gap between that new coin and the next one. That gap’s too small to fit another coin into, so the next time you want to place a coin down it has to go on the end of the line. After a while of adding and removing coins of different sizes you end up with a line of coins full of gaps that’s much longer than it needs to be.

This is called Heap Fragmentation and is a real problem when you only have a very limited amount of memory. All those gaps in the heap are wasted memory that is very hard to re-use.

Dynamic variables are variables that are created by asking the memory management routines to give you a chunk of memory to work with. Once you have finished working with that memory you are supposed to hand it back to the memory management routines so that it can be re-used by another part of your program. The main functions used here are malloc(), free() and realloc(). The first asks for a block of memory, the second gives it back again. The third says “This block of memory I have is too small. Give me a bigger one instead”.

When you ask for a block of memory it tries to find a hole in the heap that it can use for your request. If there is one it will use part or all of it for your request. If there isn’t it will add it to the end of the heap. If you ask for a bigger amount of memory with realloc() and there isn’t room where your current allocation is, it will move your allocation elsewhere, leaving a hole behind.
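To make the coin analogy a bit more concrete, here is a rough sketch of a toy heap. It is only an illustration: ToyHeap and everything in it are made-up names, it uses a simple "first gap that fits" rule, and a real malloc() keeps far more elaborate bookkeeping than this.

```cpp
#include <cstddef>

// Toy model of a heap: 16 slots in a row, each either free ('.') or
// marked with the tag of the block that owns it. Hypothetical, for
// illustration only; a real malloc() works very differently inside.
struct ToyHeap {
    static const std::size_t kSize = 16;
    char slots[kSize + 1];  // extra byte so the row can be printed as a C string

    ToyHeap() {
        for (std::size_t i = 0; i < kSize; i++) slots[i] = '.';
        slots[kSize] = '\0';
    }

    // First fit: claim the first run of `len` adjacent free slots.
    // Returns false when no single gap is big enough.
    bool allocate(char tag, std::size_t len) {
        if (len == 0) return false;
        std::size_t run = 0;
        for (std::size_t i = 0; i < kSize; i++) {
            run = (slots[i] == '.') ? run + 1 : 0;
            if (run == len) {
                for (std::size_t j = i + 1 - len; j <= i; j++) slots[j] = tag;
                return true;
            }
        }
        return false;
    }

    // Hand every slot owned by `tag` back to the heap.
    void release(char tag) {
        for (std::size_t i = 0; i < kSize; i++) {
            if (slots[i] == tag) slots[i] = '.';
        }
    }

    // Total free slots, whether or not they sit next to each other.
    std::size_t freeSlots() const {
        std::size_t count = 0;
        for (std::size_t i = 0; i < kSize; i++) {
            if (slots[i] == '.') count++;
        }
        return count;
    }
};
```

Allocate four blocks of 4 slots tagged A, B, C and D, then release A and C. The row now reads "....BBBB....DDDD": half the heap is free, yet a request for 6 slots fails because no single gap is big enough.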

So you can see the problem of why dynamic memory allocation in systems with very small amounts of RAM can be a problem.

And the String class uses dynamic memory in abundance.

The biggest problem is when you perform many common operations with Strings you inadvertently create new String objects that you don’t need to. Every one of those potentially creates holes in the heap.

For instance, take this simple snippet:

String hi = "Hello";
String name = "Fred";
String greeting = hi + " " + name;

How many Strings do you count there? Nope, there’s four, not three. You have the String “hi” which has allocated RAM to store the word “Hello”, a second one that stores the name “Fred”, and a third that stores the results of adding the others together. But there is a fourth. You see, to build up the results of “greeting” it has to do it in stages. First it takes the string “hi” and adds a space to the end of it. That is placed into a new String object. Then to that new String object it adds the contents of the “name” object. So you have what is called a temporary. Just that, a temporary. Not a temporary variable or anything, just simply a temporary. It’s created in the process of doing the work and then thrown away again afterwards. And of course that has the potential of leaving behind it a hole.

So what should you do if you want to avoid these temporaries? Well, the String class has a handy function concat() which will add things to the end of an existing string.

So you could have something more like:

String greeting = hi;
greeting.concat(" ");
greeting.concat(name);

BUT there is another little gotcha there. Every time you add to the end of the String it has to make more space to store the extra text in it. If the hole it’s currently in is too small it will end up moving elsewhere and leaving behind it a hole, and the heap will grow. So even that is not a good solution.

So the best solution is not to use the String class at all, but to do all the work in proper native “C strings”. I’ll cover working with those a little later on.

So let’s look at another surprising little example. Take this bit of code:

void PrintVal(String tag, String value) {
    Serial.print(tag);
    Serial.print(" = ");
    Serial.println(value);
}

A little function which takes two Strings and prints them to the serial port separated by =.

You call the function with two Strings. Say you call it with:

String tempname = "Temperature";
String temp = "23C";
PrintVal(tempname, temp);

You have two Strings already, tempname and temp. You then call the function, and in doing so you inadvertently create copies of both tempname and temp. Those copies are called tag and value in the function. So suddenly you have twice as much heap used by Strings as you did before. So you have made, without thinking about it, 2 extra Strings that you didn’t really mean to. And that has a real noticeable effect on the integrity of your heap.

By now it’s looking like Swiss cheese.

So how do you avoid those inadvertent copies all over the place? Well, the trick here is to pass the String as references instead of copies. That calls for the & reference operator:

void PrintVal(String &tag, String &value) {
    Serial.print(tag);
    Serial.print(" = ");
    Serial.println(value);
}

Now when you call the function the Strings tempname and tag are the exact same Strings. The same for temp and value. You have simply given the Strings new names for the function use. That’s saved two whole extra copies.
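If you want to see the difference for yourself, the sketch below counts copies on a desktop compiler. CountedText is a hypothetical stand-in that merely counts how often it gets copied; it is not the Arduino String class, and the function names are invented for the example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a text object. It does not own any heap
// memory; it only counts how many times it has been copied.
struct CountedText {
    static int copies;
    const char *text;

    explicit CountedText(const char *t) : text(t) {}
    CountedText(const CountedText &other) : text(other.text) { copies++; }
};
int CountedText::copies = 0;

// Pass by value: the parameter is a brand new copy of the caller's object.
std::size_t lengthByValue(CountedText t) {
    return std::strlen(t.text);
}

// Pass by reference: the parameter is just another name for the caller's object.
std::size_t lengthByReference(const CountedText &t) {
    return std::strlen(t.text);
}
```

Calling lengthByValue() with an existing CountedText bumps the counter by one; calling lengthByReference() with the same object leaves it alone. The const in the reference version is my own addition: it promises the function will not change the text, which is usually what you want for a function that only prints or measures.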

So now you can see why the String class, which was created for use by people who don’t have advanced programming skills, is not a good thing for people who don’t have advanced programming skills to use.

But even with advanced programming skills you can never work around all the shortcomings of the String class.

So instead you really need to learn how to do without the String class altogether. And that means using “C strings”.

First a little bit of anatomy.

A C string is simply an array of characters, but it is an array of characters that must obey certain rules.

The biggest rule of C strings is that they are NULL terminated. That means that the very last character of every C string must be ASCII character 0.

The internal C string manipulation functions I will be introducing in a moment all look for that final NULL character as a marker to show where the end of the string is. The reason is because in C an array, although you may have specified a size at compile time, doesn’t have that size stored as part of it, and neither does a C string. It is perfectly possible to say you’re working with a string of 10 characters and then fill it with 20 characters instead. That is a very bad thing to do, so you must learn to take care of these things. The result of that is known as a buffer overrun and is one of the most prevalent hack attacks used by cybercriminals – fill up an input buffer with more data than it can handle until you end up writing your data over part of the program that is being run – and then your data (which could quite happily be instructions for a program) would get executed, thus compromising the device. So care must really be taken to avoid that.

Creating a C string is as simple as:

char string[30];

That will create an array of up to 30 characters. If you do it globally it will be stored in the BSS area mentioned above. If you do it in a function it will be stored in the stack. Not a hint of it even going near the heap.

Don’t forget that the 30 character space that you have reserved includes the NULL character, so actually you only have room for 29 characters if you are to be able to follow the rules for a C string.
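A quick way to keep the two numbers apart is to compare sizeof (the space you reserved) with strlen (the text currently in it). The function names below are just made up for this sketch:

```cpp
#include <cstddef>
#include <cstring>

// How much space was reserved for the array, terminator included.
std::size_t capacityOfBuffer() {
    char string[30] = "Hello";
    return sizeof(string);
}

// How many characters are in use, NOT counting the terminator.
std::size_t lengthOfText() {
    char string[30] = "Hello";
    return std::strlen(string);
}

// How many more characters could be added while still leaving
// room for the terminating zero.
std::size_t charactersStillFree() {
    char string[30] = "Hello";
    return sizeof(string) - 1 - std::strlen(string);
}
```

Note that sizeof only reports the reserved size when it is applied to the array itself. Once the string has been handed to a function as a pointer, sizeof gives you the size of the pointer instead.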

Getting things into C strings is a little more tricky though. You can specify some content right at the start if you like:

char string[30] = "This is a string";

And that’s simple enough. But what about if you want to change the content on the fly? Unlike with the String class you can’t just do this:

string = "New content";

Instead you have to change each of the characters in the array individually. You see, in an expression like that, string just refers to where the text is in memory, and it can’t be reassigned as a whole. So you need to manipulate the memory that it refers to, not the name itself.

So to change what is in the string you can use the strcpy() function:

strcpy(string, "New content");

That will iterate character by character over the second string and place those characters into the first string’s memory.

Another thing you can’t do with C strings is adding them together. This will not work:

char hi[7] = "Hello ";
char name[5] = "Fred";
char all[14] = hi + name;

Remember, the variables hi and name just point to locations in memory where those strings are stored. So in fact what you are doing there is adding two addresses (numbers) together and ending up with some bigger number which you then try to assign to an array (which doesn’t work).

Instead you need to use the handy strcat() function:

char hi[7] = "Hello ";
char name[5] = "Fred";
char all[14] = "";
strcat(all, hi);
strcat(all, name);

The strcat function, like the strcpy function, copies the memory content character by character from the right hand string to the left hand one. Unlike strcpy though, strcat starts from the end of the first string, not the start.
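Since strcat will happily run off the end of the destination, it pays to check for room first. Here is a minimal sketch; appendIfRoom is a hypothetical helper of my own, not a library function:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: append src to dest only if the result,
// including the terminating zero, fits in destSize bytes.
// Returns false (and leaves dest untouched) if it would not fit.
bool appendIfRoom(char *dest, std::size_t destSize, const char *src) {
    if (std::strlen(dest) + std::strlen(src) + 1 > destSize) {
        return false;
    }
    std::strcat(dest, src);
    return true;
}
```

With char all[14] = ""; you can append "Hello " and then "Fred" and end up with "Hello Fred". Try the same with an 8 byte buffer and the second call is refused rather than overrunning the array.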

Now, when it comes to buffer overruns, there is a special variation of many of the string handling functions available. Many C string functions have an n variant, for instance strcpy has strncpy and strcat has strncat. These variations will operate on up to n characters. That allows you to limit the maximum number of characters you will work with, and thus helps you to prevent buffer overruns from occurring. One caveat: if strncpy runs out of room before it reaches the end of the source string it does not add the terminating NULL character, so you must add it yourself.
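As a rough sketch of how an n variant gets used in practice, here is a bounded copy built on strncpy. copyBounded is a hypothetical name of mine; the important detail is that strncpy does not write a terminator when the source is too long, so the helper always writes one itself.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy src into dest, never writing more than
// destSize bytes and always leaving dest terminated.
// Returns true if the whole of src fitted, false if it was cut short.
bool copyBounded(char *dest, std::size_t destSize, const char *src) {
    if (destSize == 0) {
        return false;  // nowhere to put even the terminator
    }
    std::strncpy(dest, src, destSize - 1);
    dest[destSize - 1] = '\0';  // strncpy leaves this out when src is too long
    return std::strlen(src) < destSize;
}
```

Copying "Hello" into a 6 byte buffer succeeds. Copying "Hello, world" into the same buffer returns false and leaves the buffer holding a properly terminated "Hello".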

One of the hardest parts of working with C strings, though, is that of working out what is in a string. You can’t just compare strings:

char a[10] = "Part A";
char b[10] = "Part B";

if (a == b) {
    // ...
}

That kind of comparison is merely comparing the pointers to the memory where the strings are stored. Instead you must compare the content of the memory character by character.

Fortunately, again, there are functions to do that for you. strcmp is a good one to start with.

strcmp will take two strings and compare them character by character. If the strings are the same it will return 0. If the first string is logically “less” than the second (in that “a” is less than “g”) it will return a negative number. If the first string is greater than the second it will return a positive number. Most of the time you don’t care about greater than or less than, only is it equal. So to compare the two strings above you would use:

if (strcmp(a, b) == 0) {
    // ...
}

Another useful variation on that function is strcasecmp. This does the exact same job but it doesn’t care about upper or lowercase letters. So “Hello” would equal “hello” with strcasecmp but not with strcmp.

Again there are n variants of both those functions available, strncmp and strncasecmp.
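Wrapping the comparisons in small, well-named functions keeps the "== 0" detail out of the rest of your code. The helper names here are hypothetical. One assumption to flag: on a desktop POSIX system strcasecmp lives in <strings.h>, whereas avr-libc declares it in <string.h>.

```cpp
#include <cstring>
#include <strings.h>  // strcasecmp on POSIX desktops (an assumption about your platform)

// True when both strings hold exactly the same characters.
bool sameText(const char *a, const char *b) {
    return std::strcmp(a, b) == 0;
}

// True when the strings match if upper and lower case are treated alike.
bool sameTextIgnoringCase(const char *a, const char *b) {
    return strcasecmp(a, b) == 0;
}

// True when text begins with prefix: strncmp only looks at the
// first strlen(prefix) characters.
bool startsWith(const char *text, const char *prefix) {
    return std::strncmp(text, prefix, std::strlen(prefix)) == 0;
}
```

So sameText("Hello", "hello") is false, sameTextIgnoringCase("Hello", "hello") is true, and startsWith("Part A", "Part") is true.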

So far I have only shown you one way of creating strings: char name[num] = "content"; (where the initial content is optional), but there are others, and they each have special meaning.

char *string;

That format just creates a pointer to a string. It doesn’t actually point to any string at all – it’s like a string with no size to it whatsoever. You may think that’s useless, but it’s not – it’s incredibly useful. You can use it to point to any other string, and since it’s really just a number, you can move it around anywhere within a string. More on pointer manipulation later.

char *string = "This is text";

That is creating a pointer to some text in memory. That memory could well be Flash memory, or it may have been copied into RAM first, depending on your architecture. It is never safe to change the content of that string since it may be in read-only memory. The size of the string is determined by the length of the content at compile time.

char string[] = "This is text";

Just like above this will create a block of memory whose size is equal to the length of the content with the content in it and point the string at it. However, this differs from the one above in that it will always be copied into RAM first. It is perfectly safe to change the content of the string.

Note the subtle difference between those last two. The first is supposed to be read only and could cause untold problems if you try to change it – the second, though it looks almost identical, is safe to change. In the former it’s normal to prefix the declaration with const which tells the compiler “I will never change this. Don’t let me even try”.

const char *string = "This is text";
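The difference between the array form and the pointer form shows up clearly if you ask for their sizes. This is a small sketch with invented function names:

```cpp
#include <cstddef>

// Array form: the variable IS the block of memory, so sizeof reports
// the 12 characters of the text plus the terminating zero.
std::size_t arrayBytes() {
    char text[] = "This is text";
    return sizeof(text);
}

// Pointer form: the variable only holds an address, so sizeof reports
// the size of a pointer, whatever the length of the text.
std::size_t pointerBytes() {
    const char *text = "This is text";
    return sizeof(text);
}

// The array is a private, writable copy, so changing it is fine.
char firstCharAfterEdit() {
    char text[] = "This is text";
    text[0] = 't';
    return text[0];
}
```

arrayBytes() gives 13. pointerBytes() gives the size of a pointer on whatever you compile for, which is 2 on an 8-bit AVR and typically 8 on a 64-bit desktop. Trying text[0] = 't' on the const char * version is refused by the compiler, which is exactly the protection you want.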

Now onto that aforementioned pointer manipulation. Passing C strings to functions.

Unlike a String object you cannot pass the entire string to a function as a parameter. Instead all you pass is the pointer to the memory where the string data is located. So the example function PrintVal that we looked at way up the page would look like this:

void PrintVal(char *tag, char *value) {
    Serial.print(tag);
    Serial.print(" = ");
    Serial.println(value);
}

Just a tiny change, but one with lots to it.

Now you’re passing the address in memory of where the two strings are stored, and those are then passed on to the print function which knows how to handle them. It then, starting at the address given, works character by character printing each one in turn up until it reaches that all important NULL character.

However, all is not well there. Imagine you call the function as:

char temp[] = "23C";
PrintVal("Temperature", temp);

You’d think that would be fine. And most of the time it would be. However, the compiler isn’t completely happy with it. If you allow the compiler to show warnings (turned off by default in most Arduino and Arduino-like IDEs) you would see it moaning, simply because you have specified the parameters as “char *”, which means, as parameters, “Pointers to memory that I can edit”, yet you are passing the first parameter as a string literal “Temperature”. That’s not a string in memory that you can edit, it’s a literal string in Flash memory. So it moans.

So the rule of thumb, any char pointers that you are passing into a function that you know will not be modified by the function must be done as const char pointers:

void PrintVal(const char *tag, const char *value) {
    Serial.print(tag);
    Serial.print(" = ");
    Serial.println(value);
}

Now the compiler knows that you’re never going to modify the memory pointed to by the pointers you pass, and so it is perfectly happy for you to pass a read-only string.

I have mentioned a few times phrases like “iterates over each character until it reaches the NULL character”. But what do I mean and, more importantly, how does it do it? Well, let’s take a little look at an example – printing a string to Serial character by character.

There’s a number of ways this can be done, but I’m going to show you the pointer way of doing it. Here’s a little function to whet your appetite. I have purposely broken it down into lots of small steps so we can analyse what is going on:

void PrintString(const char *str) {
    const char *p;
    p = str;
    while (*p) {
        Serial.print(*p);
        p++;
    }
}

I know, you’re thinking “WTF?!”, right? Well, don’t worry, it’s all quite simple.

Firstly you should recognise that this is a function to which we’re passing a pointer to a block of memory that we know we won’t be modifying (const char *str). So we have a string pointer str to work with.

Next we are creating a new string pointer, but it’s not pointing anywhere and not got any size. That “useless” one from before, remember? (const char *p)

Next we are pointing that new variable to the same area of memory that str is pointing to (p = str) – so p now contains the same address as str does, and they both point to the same piece of memory – that is, the start of the string we want to print.

Now comes a while loop, with the enigmatic test “*p”. The * in front of the p means “Give me the value that is stored in the memory that p is pointing at”. Initially, then, that means “Give me the first character from the string” since p is pointing to the start of the string. Now the magic here is that the NULL character at the end of the string is 0, which is the same as FALSE, so when *p equates to 0 the while loop will finish.

Next we use that same * operator again to get the character that is currently pointed to and print it.

Finally, the last operation in the while loop, is to increment p (p++). Because p is just a number (the memory address), incrementing p causes it to point to the next address in memory. That means, p is now pointing to the next character in the string.

And so it continues until the character pointed to by p is NULL.

So you can now see the importance of that NULL character – without it how would functions like these know when to stop?
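The same pointer walk is the basis of most other string routines. Here are two more sketches built on exactly that loop. Both names are invented for the example, and countChars does the same job as the standard strlen.

```cpp
#include <cctype>
#include <cstddef>

// Walk the pointer forward until it reaches the terminating zero,
// then work out how far it travelled.
std::size_t countChars(const char *str) {
    const char *p = str;
    while (*p) {
        p++;
    }
    return static_cast<std::size_t>(p - str);
}

// Same walk, but this time writing through the pointer, so the
// parameter must be a plain char * to memory we are allowed to edit.
void toUpperInPlace(char *str) {
    for (char *p = str; *p; p++) {
        *p = static_cast<char>(std::toupper(static_cast<unsigned char>(*p)));
    }
}
```

countChars("Fred") gives 4. toUpperInPlace needs a writable array such as char s[] = "23c"; and must never be handed a string literal directly.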

There are far more functions for working with C strings than I can cover here, but hopefully I have shown you the basics and you can now delve in and discover more for yourself and finally do away with that String class that causes so many problems.

For further reading you might like to check out these links:

http://www.tutorialspoint.com/c_standard_library/string_h.htm
http://www.tutorialspoint.com/c_standard_library/c_function_strtok.htm
https://en.wikibooks.org/wiki/C_Programming/Strings

73 thoughts on “The Evils of Arduino Strings”

Aram Perez March 22, 2016 at 10:08 am

If you have the warnings on, your PrintString function has an error. It should be written as follows:

void PrintString(const char *str) {
    const char *p;
    p = str;
    while (*p) {
        Serial.print(*p);
        p++;
    }
}

1. majenko Post author March 22, 2016 at 10:13 am
  
  I forgot the const. Yeah. Thanks for spotting that, I have corrected it.
  
Mvaldez May 10, 2016 at 5:56 am

Hi. Great explanations. I really appreciate you included examples (lots of posts in other blogs don’t try that hard). I just finished a project with an ATMega328p for a device that must stay on for months without human supervision and so, after adding an external watchdog (just in case) I decided to refactor the code to move away from the String class. It was not that difficult.

However, for another project (already installed on the field) I used an ATMega1284p and could not get rid of the String class. This project connects with a couple of remote servers and has to process some XML data and then build some JSON answers. Doing simple things like space trimming, case conversion, substring searching/splitting, etc. is very painful using C-style strings (I actually introduced some nasty bugs while using pointers). So I decided to use some fixed size char array buffers where needed but kept using the String class in a couple of places where I needed some of its functions. Call me chicken.

BTW, your state machines post also inspired me to refactor another project code. Eventually I moved from the big switch statement to a table of states/functions. I have to say that it improved the performance of the device but also simplified a lot the code. It is a monitor for another machine, with multiple states and exceptions. I must admit, previously, some of those exceptions were not being properly handled because of the mess the code became. But now it is quite clean and easier to debug.

Regards, MValdez.

Thomas May 18, 2016 at 8:29 pm

As a hobbyist, I have been pulling my hair out with trying to manage conversions, comparisons, assignments… of strings and char arrays. I have worked with saving to SD, displaying on LCDs, time modules, GPS modules, serial/internet control of robotics… all dealing with characters and strings. I usually found a hack to get it to work, but never could precisely understand what I was doing wrong. I searched for answers until I came upon your post. The whole time I was searching, I was asking myself why isn’t this written about more, especially in the Arduino forums. They talk around the issue, but I have never gotten clarity until you explained this. Self taught, sometimes I just need a little explanation and I can take it from there. I will probably use this article in teaching. Great title. Great explanation. Just thanks.

Libor June 12, 2016 at 12:12 pm

Excellent post!

dvmrp July 1, 2016 at 12:26 am

Thanks. Excellent post!

I’ve a question. If the same String variable is repeatedly modified, would that create a new heap hole on each iteration?

For example,
String a;
a = "This is a test";
a = "This is a long test";
a = "This is a longer test";
….
…

1. majenko Post author July 1, 2016 at 10:20 am
  
  Possibly, possibly not. It depends on the implementation of malloc() (actually sbrk()) in the C library in use. Some sbrk() implementations will allocate memory in a number of multiples of a pre-defined block size. That can be wasteful when allocating small amounts, but it does mean that there is an extra bit of space set aside for realloc() to expand into, which can make expanding your memory area avoid any new allocations and thus any new heap holes. If sbrk() doesn’t do that, and just allocates the exact requested amount, and the currently allocated area is blocked by another bit of allocated memory then it will have to allocate the entire amount again and leave a hole in the heap.
  
  However, using your simplistic example, chances are the initial allocation will end up at the end of the heap (not much else gets on the heap) and so the subsequent ones will just be extending that allocation into unused space. No guarantees of that of course.
  
  And that’s really the underlying point with heap use: there’s no guarantees. You can’t really predict just what is happening, or what is going to happen. Over-use of the heap may work perfectly for months at a time and then suddenly start failing. Or it may work fine (seemingly) forever. Or it may fail within seconds. It’s pretty much impossible to predict.
  
dw August 28, 2016 at 4:36 pm

sprintf ?

1. majenko Post author August 28, 2016 at 4:37 pm
  
  What about sprintf?
  
dw August 28, 2016 at 5:00 pm

I would use something like this instead of strcpy and strcat. I think this would work OK.

char szBuff[64];
char szTemp[8];
char szHum[8];
double tempF = (DHT.temperature * 1.8) + 32.0;
double humidity = DHT.humidity;
dtostrf(tempF, 4, 2, szTemp);
dtostrf(humidity, 4, 2, szHum);
sprintf(szBuff, "{\"temp\":%s,\"humidity\":%s}", szTemp, szHum);
Serial.println(szBuff);

1. majenko Post author August 28, 2016 at 5:10 pm
  
  The problem with sprintf() and similar functions is they are very heavyweight. They are very complex functions that take a lot of flash. If you are already using sprintf() for pretty formatting of strings with numbers etc then by all means you can use it for simple concatenations. If that’s all you’re using it for then you’re both wasting flash memory and CPU cycles – especially if your sprintf() supports floating point.
  
2. majenko Post author August 28, 2016 at 5:11 pm
  
  Also I must slap you around the face with a wet fish for using hungarian notation. No one has used that seriously since the 80’s. Even Microsoft, the last real proponents of it, ditched it years ago and recommend never using it.
  
  1. DroneMann September 16, 2019 at 8:52 pm
    
    The whole point of the thread is to get newbies to stop using String class. We will work on “proper” naming later. (That is, never. Whatever works for the programmer is fine).
    
dw August 28, 2016 at 5:26 pm

Haha, That’s how long ago I last used C. All Java now.

Bruce Boyes September 10, 2016 at 11:45 pm

That’s a very helpful post… thanks.

Jake Sparling October 12, 2016 at 5:11 pm

Great post, I really appreciate the simple examples and stepping through the variants. Thank you!

Wynand Meijer October 26, 2016 at 7:47 pm

This article must be the 1st google result on an arduino char/string search. Thank you very very much for the detailed article, examples and explanations. Thank you for sharing your knowledge in such a brain digestible manner =]

kallelindberg October 31, 2016 at 7:49 pm

This blog was mentioned in the Arduino forum and it provided a nice solution for reading from a web server … Thanks a million!

1. pfabri November 1, 2016 at 9:27 am
  
  Would you mind sharing a link to the forum post you mention, please?
  
Gustavo November 24, 2016 at 9:30 am

I was looking for exactly this level of explanation. A straight heads up of how to replace String for someone who has been working with Arduino code and are now getting serious and wanting stability, and more dynamic memory available.

I have since refactored my sketches to eliminate strings…

Thank you

Anderson January 12, 2017 at 1:48 am

Respect!! Your explanation is way better than all of my teachers and lectures combined. Big up man… Much thank

Jindra Širůček March 25, 2017 at 9:58 am

Hi, thanks for the awesome article! I’m a JavaScripter so I’m kind of dying from the C/C++ way of handling Strings. But I’ve started to program Arduino, so I have to work with it.
I have one simple question for you. Why is there not some kind of function like defragmentHeap()? What is the problem with building such a thing at the low level of the language?
It would not solve the problem of small SRAM on embedded devices, but it would sort out the problem of wasted heap space (the Emmental cheese problem). Thanks 🙂

1. majenko Post author March 25, 2017 at 11:56 am
  
  Writing such a function would be nigh on impossible – especially for an embedded system with no virtual memory system.
  
  The problem is because there are two aspects to such a procedure. The first is simple enough – move the blocks of memory around to fit them together nicely like a jigsaw. If you have enough free memory that’s just a case of copy it all to a free area then copy it back again into the heap, keeping the heap indexing system intact. Harder if memory is low – you’d be copying stuff all over the place back and forth, a bit like defragmenting a disk that is full.
  
  The second part is the killer though. Your program uses pointers to point directly to memory addresses within the heap. How can a defragmentation function know about the pointer variables in your program to change them to the new addresses after the heap has been defragmented? It can’t. So you end up with your program now with all its pointers pointing at the wrong place.
  
  So unfortunately, while it would be a nice thing to be able to do, it’s just not practical.
  
  1. Jindra Širůček March 25, 2017 at 1:34 pm
    
    Thank you very much for the explanation.
    2) I was wondering whether the pointers could be changed too, if they were saved in something like a hash table.
    1) Moving memory blocks: I was thinking about it this way: start at the first place in memory, find the first empty block, find the next data, move it to the start of that empty block, then move on to the next free space, and so on. It would be moving a bigger and bigger free space towards the free end of the heap, a little bit like a bubble sort (just a little bit :-)).
    
    I know it is an academic discussion. I guess neither you nor I will program such a thing even if it were possible, but I’m discussing it just to understand the topic a bit more. Thank you
    
Thomas Weeks May 22, 2017 at 2:03 pm

I need more coffee. 🙂

okierie June 7, 2017 at 7:25 am

Reblogged this on okierie and commented:
Useful, must-read item

Yaron Kaplan June 15, 2017 at 2:50 pm

Are those Strings really so bad?

I wrote a sketch with the intention of wasting memory, and I didn’t see any memory leaks.
The memory behaves in a very weird way and it’s pretty wasteful, but is consistent.

While I could use c-strings, it would be probably a bad idea for me as I’m not used to it. And I need to do some string processing.

I used the MemoryFree library: https://github.com/maniacbug/MemoryFree

Note: changing “pig2 = returnStupidString(length);” to “String pig2 = returnStupidString(length);” (and deleting the “String pig2;” declaration from above) saves a lot of memory.

Here’s my code (added a malloc call in order to cause an intentional memory leak, which happens if you comment-out the ‘free’ call below).
This program just accepts a digit from the serial and creates a string with the length of the digit specified.

#include

void setup() {
    Serial.begin(9600);
    Serial.print("hi! free memory: "); Serial.println(freeMemory());
}

String returnStupidString(byte length) {
    String ret = "";
    ret.reserve(length);
    for (int j=1; j<=length; j++) {
        ret += 'a';
        //ret[j-1] = 'a';
    }

    return ret;
}

void loop() {
    char *p;
    while (true) {
        String pig;
        //String pig2;

        p = (char *)malloc(1024);
        if (p == NULL) Serial.println ("null!");

        while (!Serial.available()) {}; // do nothing…
        char ch = Serial.read();
        byte length = (byte)ch - (byte)'0';

        pig = "stupid string";
        String pig2 = returnStupidString(length);
        Serial.print(pig2);
        Serial.print(' ');
        Serial.println(pig2.length());

        Serial.print("free memory: "); Serial.println(freeMemory());

        free((void *)p);
    }
}

1. majenko Post authorJune 17, 2017 at 3:46 pm
  
  Fairly sequential operations like the ones you are performing aren’t so bad. It’s when you start doing random string manipulations that problems occur. I have had to fix problematic programs for people many times which suffered from random lockups, and every time stripping out all the String operations fixed it. That is the biggest symptom: random crashes at random times. The problems that String causes compound over time until the system can’t cope any more.
  
  1. Bruce Boyes January 26, 2018 at 9:05 pm
    
    You would hope for some runtime memory analysis, or at least a monitor, in Arduino, if only to report the current size of the heap. There is a write-up on this at https://learn.adafruit.com/memories-of-an-arduino/measuring-free-memory. Then you would know when things were getting dangerous and could rewrite your code, or reboot. We write our code with static allocation, which is a safe but not efficient use of SRAM (we have enough, so it’s OK in this case). I have seen a commercial system which automatically rebooted every night at midnight in order to start with a fresh heap and stack (it was Java). That’s a crutch IMHO, but I guess if you don’t trust your memory manager extreme measures might be justified. No one noticed that for over 10 years, since the retail stores it was in were always closed at that time. There were some bad side effects of the reboot, which also went undetected or unreported… so not a good crutch in this case.
    
Visitor3838 September 4, 2017 at 10:20 pm

Finally, finally, finally, someone explains the problem with using Strings. I have been warned about Strings in the Arduino forums off and on for several years. Never once have any of the forum experts actually explained why. Never once have any of the forum experts actually corrected the other forum experts who are constantly pushing the use of Strings. So big applause to you for actually sitting down and taking time to explain!

Neil B September 21, 2017 at 4:59 pm

In most C compilers, and as far as I know, the Arduino compiler is one, if you declare MyString[30] the first location is in MyString[0], the null is in MyString[30], so the compiler has created space for a 30-character string in 31 char spaces, which is not quite what you said.
No harm done, though, everything is nicely explained and accurate.

1. majenko Post authorSeptember 21, 2017 at 5:05 pm
  
  No, with MyString[30] you get exactly 30 bytes of memory reserved (29 characters, 1 NULL, if you use it as a C string). The slices are numbered 0-29. It’s a common mistake that if you create it with [30] you can access right up to [30], but you can’t. You can only access up to [29] since you have created only 30 entries which are numbered starting from 0. Don’t confuse the declaration with the access – the two are very different things.
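
The sizing rule in this reply can be checked on a desktop compiler. The following is only a small sketch: `fillThirty` is a hypothetical helper name, and `snprintf` is used here simply because it truncates safely at the buffer size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Copy src into a 30-byte array, truncating if it is too long.
// Returns the number of characters stored, not counting the terminator.
std::size_t fillThirty(char (&dest)[30], const char *src) {
    assert(src != nullptr);
    // snprintf writes at most sizeof(dest) - 1 characters plus the terminator
    std::snprintf(dest, sizeof(dest), "%s", src);
    return std::strlen(dest);
}
```

Passing a 40-character source leaves 29 characters in indexes [0] to [28] and the terminator in [29], which is the last valid index.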
  
strontiumXnitrate December 14, 2017 at 7:33 pm

Excellent work majenko, very well explained.

However, if I’m not mistaken, the terminating NULL in a C style string is an actual zero (b00000000) and not an ASCII zero character (‘0’, b00110000, 0x30) as you mentioned in the article. It makes sense since if you wanted to include ‘0’ in your string you would end up terminating your own string inadvertently.

I’m using String in my code for the convenience of using readStringUntil() for reading strings of unknown length from Serial1 (MEGA2560). Would you recommend dropping String in favor of char array and writing a manual version of readStringUntil() instead? And what is the best strategy for allocating memory for the char array if I do?

1. majenko Post authorDecember 14, 2017 at 10:06 pm
  
  ASCII character 0 is ‘\0’ not ‘0’. I only mention ‘0’ when subtracting it from a number character to get an integer.
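
To make the distinction concrete, here is a minimal sketch. The helper names `digitValue` and `isTerminator` are made up for illustration, and the numeric value of '0' assumes an ASCII character set.

```cpp
#include <cassert>

// Convert a digit character to its numeric value by subtracting '0'.
int digitValue(char c) {
    assert(c >= '0' && c <= '9');
    return c - '0';
}

// True only for the string terminator '\0' (value 0), never for the digit '0'.
bool isTerminator(char c) {
    return c == '\0';
}
```

So `digitValue('7')` gives 7, while only the byte after the last character of a string satisfies `isTerminator`.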
  
Justin December 15, 2017 at 8:17 pm

I’ve had unexplained problems in my Arduino code for years that are now revealed by this article. Thank you so much for explaining it well. I converted all of my String objects into char arrays and wrote my own manipulation functions because I didn’t think that strcpy(), strcat(), and the like were versatile enough. But using the examples that the author gave, I was able to jump right in and write my own functions to manipulate the char arrays the way I wanted. THANK YOU!!!

Pingback: Adding Sensors to an Arduino Data Logger | Underwater Arduino Data Loggers
Pingback: Waiter Please Bot - BlueXanh.Com
Brett January 25, 2018 at 2:47 am

Great article thanks.

Tomasz Ścisłowicz January 28, 2018 at 2:36 pm

After reading this post I decided to create a simple string library that is as easy to use as the Arduino String and doesn’t use malloc internally. The size needs to be known upfront, though.

https://github.com/toomasz/ArduinoFixedString

Let me know what you think!

Dave Hardy January 29, 2018 at 1:05 pm

great work – helped refresh my knowledge from a number of years ago.

Armand Aanekre February 28, 2018 at 10:35 pm

Thank you very much for this thorough and well explained walkthrough. It perfectly explained why some of my Arduino sketches mysteriously crash 🙂

renatoa May 16, 2018 at 7:53 pm

Isn’t it possible to have a memory compacting mechanism that can be called in a loop to solve all the issues of the String object?

1. majenko Post authorMay 16, 2018 at 8:52 pm
  
  It could be done, but it’s a lot of work to both implement it and run it, and if you don’t have enough free memory to move strings around you run into problems… Simplest to just stick to C arrays.
  
  1. renatoa May 17, 2018 at 7:04 am
    
    How severe could this phenomenon be on the new processors/boards that have 100k or more of RAM instead of the 2k of the antique Atmels?
    Sure, it will still happen, but later, maybe never in the program’s life cycle…
    And with so much RAM available I can even afford the luxury of compacting when the heap reaches 50-75% of RAM, so there is a lot of RAM available for moving things around.
    
  2. majenko Post authorMay 17, 2018 at 10:08 am
    
    The amount of memory in the chip is not that relevant. What is relevant is how much of that memory is available to the heap. If you have a chip with 100k of SRAM and you make an array to store sample data (for example) of 99k you then only get 1k for the stack and the heap. The memory size doesn’t matter – only what you do with it. With smaller chips like the AVR, its default (and only) state is to have not much room for the heap. With bigger ARM chips the default state is to have plenty of room, so normally it’s no problem at all. However, as soon as you start using big chunks of that memory (you get a chip with lots of memory, normally, because you want to *use* that memory for something) the problem starts to appear again. And then you have the PIC32 compiler for chipKIT – that artificially limits the heap to 2k (I have campaigned with Microchip for years to remove that limit, but they won’t – I provide my own version for UECIDE with no limit imposed) even if you have 512kB of SRAM. So yes, with more memory comes easier heap management opportunities and fewer fragmentation issues under most basic circumstances, but it’s not that clear-cut.
    
Pingback: Arduino String object woes and resulting ESP8266 stability issues – Mikkel's Private Blog
Hardi June 5, 2018 at 10:44 am

Great article, but you don’t mention String reserve. It’s supposed to be the right way to use Strings while limiting heap fragmentation. Your opinion?

Libor July 13, 2018 at 7:13 am

This is the best article I have ever read about C strings.

Jose Henrique July 26, 2018 at 11:20 am

Great article!
Thanks for the time you spent writing it.

What about functions that expect Arduino String arguments?
Can I pass a C string to them?
I ask because I’m using the Universal Telegram Bot library: https://github.com/witnessmenow/Universal-Arduino-Telegram-Bot

All the functions provided by this library expect a String.

1. majenko Post authorJuly 26, 2018 at 1:03 pm
  
  No, you would have to convert them to String first. Ideally the author of the library should code in support for standard c strings instead of forcing the use of the String object.
  
2. majenko Post authorJuly 26, 2018 at 1:07 pm
  
  OMG, I can’t believe my eyes. I just read the source to that library. It really is poorly coded by someone that really doesn’t understand C++. It needs a complete rewrite. It will really make Swiss cheese of your heap…
  
  1. Jose Henrique July 27, 2018 at 12:10 am
    
    That’s what I thought.
    After reading your article, I was worried.
    But I found a library that was forked from that first link and does not use String: https://github.com/J-Rios/Universal-Arduino-Telegram-Bot
    I still do not know if it’s worth rewriting the whole program to replace the Arduino Strings with C strings, though.
    
Richard August 20, 2018 at 3:05 pm

After reading this article I tried to update my Arduino code by removing all Strings. In some cases I could not find a solution. What do you think is the best way to combine a char* and, for example, a double?

double temperature = 22.05;
char* text = "The temperature outside: ";
char* combined = ?

Thanks in advance for your help

1. majenko Post authorAugust 20, 2018 at 3:30 pm
  
  You could use dtostrf() to convert it to a char * (if you have that in your libc) then you are just combining strings. Or you could (if your libc supports it) use sprintf to format a %f.
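
As a rough sketch of the second option, the label and the value can be formatted into a fixed buffer in one call. `formatReading` is a hypothetical helper, not part of any library, and the sketch assumes a libc whose printf family supports %f (a desktop libc does; some microcontroller libcs leave it out by default, which is where dtostrf() comes in).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Write "<label><value with two decimals>" into out.
// Returns true if the whole text fitted, false if it was truncated.
bool formatReading(char *out, std::size_t outSize, const char *label, double value) {
    assert(out != nullptr && label != nullptr && outSize > 0);
    // snprintf never writes more than outSize bytes, terminator included
    const int needed = std::snprintf(out, outSize, "%s%.2f", label, value);
    // On truncation the stored length is shorter than the length needed
    return needed >= 0 && std::strlen(out) == static_cast<std::size_t>(needed);
}
```

With a 40-byte buffer, `formatReading(combined, sizeof(combined), "The temperature outside: ", 22.05)` leaves the combined text in `combined` without touching the heap.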
  
2. renatoa August 20, 2018 at 5:05 pm
  
  This is how my code looks for such a scenario, combining a string/char* with various values:
  
  userCmdl = "W   ";
  dtostrf(pwmdPcnt, 3, 0, &userCmdl[1]);
  
  After the W there are 3 spaces as a placeholder, to host the value converted by dtostrf.
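
The placeholder idea can be tried on a desktop compiler with a sketch like the one below. It is not the commenter's code: dtostrf() is specific to avr-libc, so `snprintf` with "%3.0f" stands in for it, and `userCmd` and `buildCommand` are hypothetical names. The buffer is declared as a char array rather than a pointer to a string literal so that it is legal to write into it.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// 'W', three placeholder characters and the terminator: 5 bytes in total.
static char userCmd[5];

// Fill the three placeholder characters with the rounded value, right-justified.
const char *buildCommand(double percent) {
    assert(percent >= 0.0 && percent <= 999.0);
    std::strcpy(userCmd, "W   ");  // reset the template each time
    std::snprintf(&userCmd[1], sizeof(userCmd) - 1, "%3.0f", percent);
    return userCmd;
}
```

The command always stays 4 characters long, so nothing needs to be allocated or resized when the value changes.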
  
Indrek September 22, 2018 at 8:01 pm

It is OK to use String temporarily in a function scope if you don’t allocate any additional memory before all the String objects fall out of scope.

When all the String objects fall out of scope then all generated Swiss cheese in heap gets cleaned up. Although this means that you have to have enough memory available in the heap during that time.

For example I needed a generic function that prints any value in the middle of the screen:

void displayVal( uint8_t textY, uint8_t textSize, String & val ) {
  uint8_t textW = (val.length() * 6 * textSize);
  uint8_t textX = tft.width() / 2 - textW / 2 + textSize / 2;
  tft.setTextSize(textSize);
  tft.setCursor(textX, textY);
  tft.fillRect(0, textY, textX, 7 * textSize, ST7735_BLACK);
  tft.fillRect(textX + textW, textY, tft.width() - textX - textW, 7 * textSize, ST7735_BLACK);
  tft.print(val);
}

Now calling it with String parameter is really convenient:
displayVal(100, 3, String(frequencyValue) + "Hz");
displayVal(125, 2, "(" + String((char)('A'+frequencyIndex)) + ")");

Now let’s say that frequencyValue=400 and frequencyIndex=1.
It will display this in the middle of the screen:
400Hz
(B)

I am not exactly sure if those String values get freed immediately after “displayVal” returns or at the end of the function that called “displayVal” but it doesn’t really matter since my program generally doesn’t allocate memory dynamically and there is enough of free memory for run time. At this particular instance ease of use to me is worth the overhead it creates.
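
On the question of when those temporaries are freed: in standard C++ a temporary created as a function argument is destroyed at the end of the full expression that contains the call, which means before the next statement of the caller runs. The sketch below shows this on a desktop compiler with a made-up `Tracer` class standing in for String; it illustrates the language rule only and does not measure anything on an Arduino.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Record of what happened, in order.
static std::vector<std::string> events;

// Stand-in for a String temporary: logs its construction and destruction.
struct Tracer {
    explicit Tracer(const char *n) : name(n) { events.push_back(std::string("ctor ") + name); }
    ~Tracer() { events.push_back(std::string("dtor ") + name); }
    const char *name;
};

void show(const Tracer &t) { events.push_back(std::string("use ") + t.name); }

void caller() {
    show(Tracer("temp"));               // the temporary lives until the end of this statement
    events.push_back("next statement"); // by now the temporary has been destroyed
}
```

After `caller()` runs, the recorded order is construction, use, destruction, and only then the next statement.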

John Forde November 18, 2018 at 5:32 pm

It’s really only a problem if you have a small amount of memory, which as you point out is usually the case with Arduino. If you use micros with more memory, it really isn’t a problem.

Bernard Nyacuma April 5, 2019 at 7:53 am

You are a great teacher! Now I can tell the difference between C strings and the C++ String object.
Now working on the ARM-Duino is easier. Thanks so much.

R.A. Ghosh April 21, 2019 at 1:30 pm

I am writing a string concatenation code as below line:
SMS_Msg = “tuno=BRITVTS001&data=“;

where Lat, Lon, N_S, E_W, NewDate, nwTime are String type variables. But when I print the variable SMS_Msg, I get only the last part of the string “,9,86,0,0,0,0,0> on my serial console. I am new to Arduino and C. Where am I going wrong?

Pingback: ESP8266 und ESP8285 Module Anleitung – Rolands Home
Stefan Ludwig April 12, 2020 at 3:43 pm

Hi Majenko,

While searching I came across this library called StackString, which claims to avoid eating up and then screwing up RAM:
https://arjenstens.com/an-alternative-to-the-memory-fragmenting-string-class-for-arduino/

what do you think?
best regards

Stefan

1. majenko Post authorApril 12, 2020 at 3:48 pm
  
  Looks nice. A good, convenient wrapper around the standard C library string routines. Nice find.
  
  1. Stefan Ludwig April 12, 2020 at 4:54 pm
    
    Hi Majenko,
    
    thank you very much for answering so quickly.
    I’m familiar with Delphi but a newbie to C++.
    If I try to assign a StackString in this way
    
    StackString TextLine_SS = StackString("DemoIdentifier=1234567890");
    
    FileName_SS = "/MyParameters.ini";
    
    File file3 = SPIFFS.open(FileName_SS.c_str() );
    
    TextLine_SS = file3.readString() ;
    
    I get the error-message
    no match for ‘operator=’ (operand types are ‘Stack::StackString’ and ‘String’)
    
    best regards
    
    Stefan
    
  2. majenko Post authorApril 12, 2020 at 5:49 pm
    
    Looks like that’s not how to assign a new value to a StackString. Use FileName_SS.clear(); and FileName_SS.append("/MyParameters.ini"); etc.
    
  3. Stefan Ludwig April 12, 2020 at 6:00 pm
    
    first of all thank you for answering. And before we go on:
    do you enjoy answering my questions? If no just be honest.
    
    Then I will try asking in the Arduino-Forum.
    
    If yes I have some more questions.
    Everything compiles as long as I use hardcoded strings like "1234Test".
    
    As soon as I try to pass an argument like
    myString_SS.append(TextLine);
    where TextLine is a variable of type String,
    the compiler complains:
    no matching function for call to ‘Stack::StackString::append(String&)’
    
    Do I have to do a type cast? And if yes, what does it look like?
    
    best regards
    
    Stefan
    
  4. majenko Post authorApril 12, 2020 at 6:12 pm
    
    I spend all day answering questions – yours are no more trouble than them 😉 Although this isn’t the best place to ask them, as answering isn’t easy on here. You’ll find me on arduino.stackexchange.com where I answer questions like this all day every day.
    
    But to answer your current question – use MyString.c_str() in your assignment. StackString doesn’t accept String objects as parameters, but does accept C strings.
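
The shape of that fix can be sketched on a desktop compiler. Everything below is a stand-in: `FixedText` is a hypothetical fixed-capacity buffer, not the StackString API, and `std::string` plays the role of the Arduino String. The only point is that a type whose `append` takes a C string can still be fed from a string object through `c_str()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical fixed-capacity text buffer: storage lives inside the object, no malloc.
template <std::size_t N>
class FixedText {
public:
    FixedText() { clear(); }
    void clear() { len_ = 0; buf_[0] = '\0'; }
    // Append a C string; returns false and leaves the content unchanged if it does not fit.
    bool append(const char *s) {
        assert(s != nullptr);
        const std::size_t n = std::strlen(s);
        if (len_ + n > N) return false;
        std::memcpy(buf_ + len_, s, n + 1);  // n characters plus the terminator
        len_ += n;
        return true;
    }
    const char *c_str() const { return buf_; }
    std::size_t length() const { return len_; }
private:
    char buf_[N + 1];  // N characters plus the terminator
    std::size_t len_;
};

// Replace the buffer's content from a string object by going through c_str().
bool copyFromStringObject(FixedText<32> &dest, const std::string &src) {
    dest.clear();
    return dest.append(src.c_str());
}
```

The capacity is fixed at compile time, so an over-long value is rejected instead of growing the heap.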
    
Matthew June 26, 2020 at 11:34 pm

There is a new Arduino SafeString library available which solves all the problems noted here.
SafeString implements almost all of the String methods and operators but never causes heap fragmentation, and it comes with extensive debugging and error messages.
A detailed tutorial is available at
https://www.forward.com.au/pfod/ArduinoProgramming/SafeString/index.html

teksatan December 8, 2020 at 3:14 pm

Out of curiosity, how would using structures or classes affect the memory layout in one of these MCUs? In theory, would encapsulating blocks of data within a class or structure keep the data within the heap ordered (assuming you aren’t storing pointers)? Or does the functionality of classes and structures differ between MCUs and PC architectures, given the limitations of an MCU? This may be a good topic to expand this article with.

Pingback: Designfehler | wer bastelt mit?
Pingback: Conversia uint8_t în String - rezultat neașteptat
Pingback: SRAM Management • Wolles Elektronikkiste
