Fast, Efficient Data Storage on an Arduino

Logging data on an Arduino is very much a trivial everyday task. Connect an SD card, open a file, and start printing data to it.

For many people that is good enough. It results in nice, easily readable (by us humans) data.

But it’s not fast. It’s not efficient. It’s perfectly fine for things like logging temperature every hour, or barometric pressure every 5 minutes. But when you have large amounts of data to store very rapidly, you have to think a little differently.

I came across a situation recently where it was necessary to store lots of data values very rapidly and be able to read them back again at a similar speed. That could be done by writing the values to the file as text, maybe as a Comma Separated Values (CSV) file, which is simple enough. But when it comes to reading that data back in on an Arduino, things get decidedly complex, and complex means lots of code and large processing overheads.

So what is needed is a way of storing the data so that it is trivial to read back in. For this I am going to give you two small phrases that sum everything up quite neatly:

  • Human Readable
  • Machine Readable

Those describe two types of file. Human Readable files are things like text files, CSV files, etc. You can open them in a simple text editor and understand what they are; it’s just text. However, a computer has a hard time understanding them. At the opposite end of the spectrum are Machine Readable files. These can’t be understood by a (normal) human being: open one in a text editor and all you see is gibberish. It takes a special computer program to interpret the file and display a representation of the data that a human can make head or tail of. A good example is a graphics file, say a PNG file. Here is a PNG file opened in GEdit on Linux:

[Screenshot: a PNG file opened in GEdit]

As you can see, it’s just nonsense. However, open it with a graphics program and that program reads the file and uses it to create a picture you can see.

Such data is said to be binary. It is important to note, of course, that binary includes text: text files are just a subset of binary files. In a binary file each entry (byte) can contain a value between 0 and 255. Text files just map the letters we all know and love to numbers within that range. So there is no fundamental difference between a text file and a binary file; it is just that a binary file can also contain values outside the range of human understanding. For instance, in the PNG file above you can see the word “PNG”, and “IDAT”, and other letters and numbers among the stuff that you can’t understand. A binary file that contains only bytes in the human readable range (also known as ASCII values, from the American Standard Code for Information Interchange) is what we choose to call a text file.

So how does this help us store data more efficiently on an Arduino? Well, simply by stepping away from the limiting factors of the human interpretation of data and using a purely machine readable file.
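To make the difference concrete, here is a minimal desktop C++ sketch (not Arduino code) that stores the same reading, 1023, both ways: as ASCII text and as the raw bytes of a uint16_t. The helper names encodeAsText() and encodeAsBinary() are hypothetical, made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Human readable: write the value as ASCII digits (e.g. "1023") into buf.
// Returns the number of characters written, excluding the terminating NUL.
size_t encodeAsText(uint16_t value, char *buf, size_t bufSize) {
    int n = snprintf(buf, bufSize, "%u", (unsigned)value);
    assert(n >= 0 && (size_t)n < bufSize);  // the text must fit in buf
    return (size_t)n;
}

// Machine readable: copy the value's raw bytes into buf, unchanged.
// Always returns sizeof(uint16_t), i.e. 2 bytes.
size_t encodeAsBinary(uint16_t value, uint8_t *buf) {
    memcpy(buf, &value, sizeof(value));
    return sizeof(value);
}
```

As text, 1023 takes four bytes ('1', '0', '2', '3', which are 0x31 0x30 0x32 0x33) plus whatever separator you add, and reading it back means parsing digits. As binary it is always exactly two bytes (0xFF 0x03 on a little-endian machine such as an AVR-based Arduino or an x86 PC), and reading it back is just a copy.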

For this we are going to use a struct. In C a struct is a method of grouping different variables together into one single unit. You can think of it as a bit like an entry in a database, where you might have a name, address, town, postal code, all as different fields within it.  In C those fields are variables, and the struct is the record.  Let’s take an example:

struct datastore {
    uint16_t adc1;
    uint16_t adc2;
    float voltage;
    float current;
};

There we have defined a structure that contains four different values. Each value has its own associated data type, just like normal variables. The structure itself is also a new data type, and you can make new variables from it, like:

struct datastore myData;

You now have a new variable called myData which itself has 4 sub-variables. You access those using a “.” and the name:

myData.adc1 = analogRead(0);
myData.adc2 = analogRead(1);
myData.voltage = myData.adc1 / 1023.0 * 5.0;
myData.current = myData.adc2 / 10000.0 * 23.4429;

It’s a useful technique in its own right for grouping related variables together, but its real power comes when you get under the hood and look at what is actually happening. Not only are the sub-variables grouped together under an umbrella name like that, but they are also grouped together in memory, and in a very specific way as well: the order they are specified in the structure is the order they are held in memory. For instance, the struct above might look like this in memory:

[Diagram: struct datastore in memory, one square per byte]

Each square is one byte in memory. As you can see, the uint16_t values (the same as an unsigned int on an AVR-based Arduino such as the Uno; I’ll cover why using uint16_t and not unsigned int is important a little later) use two bytes each, and the float values use four bytes each. That gives a total of 12 bytes. And of course it is perfectly possible to access those raw bytes of data should you wish to.

And we wish to – although not directly.

There is another very useful tool in C called “sizeof(var)”. Strictly speaking it is an operator rather than a function, and it tells you how big a variable is in bytes. For instance, it returns 2 for a uint16_t and 4 for a float. For our struct it returns 12.
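The sizes quoted above can be checked by the compiler itself. This desktop C++ sketch uses static_assert (C++11) together with offsetof from <cstddef> to confirm both the total size of struct datastore and where each field starts. The expected numbers assume an AVR Arduino or a typical desktop compiler, where float is 4 bytes and this particular struct needs no padding; kRecordSize is a hypothetical name.

```cpp
#include <cstddef>  // offsetof, size_t
#include <cstdint>  // uint16_t

struct datastore {
    uint16_t adc1;
    uint16_t adc2;
    float voltage;
    float current;
};

// Individual type sizes: 2 bytes for uint16_t, 4 bytes for float.
static_assert(sizeof(uint16_t) == 2, "uint16_t should be 2 bytes");
static_assert(sizeof(float) == 4, "float should be 4 bytes");

// The whole record: 2 + 2 + 4 + 4 = 12 bytes, with no padding.
static_assert(sizeof(datastore) == 12, "datastore should be 12 bytes");

// offsetof() gives the byte position where each field starts; the
// positions follow the order in which the fields are declared.
static_assert(offsetof(datastore, adc1) == 0, "adc1 starts at byte 0");
static_assert(offsetof(datastore, adc2) == 2, "adc2 starts at byte 2");
static_assert(offsetof(datastore, voltage) == 4, "voltage starts at byte 4");
static_assert(offsetof(datastore, current) == 8, "current starts at byte 8");

// One constant to use wherever the record length is needed.
const size_t kRecordSize = sizeof(datastore);
```

If a later change to the struct, or a different compiler, altered the layout, the build would stop with an error instead of silently producing files that older code can no longer read.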

So what if we were to write those 12 raw bytes directly to the SD card instead of a textual representation of the numbers? We would end up with a file that was 12 bytes long. Write it twice and we would have a file that was 24 bytes long; three times and it would be 36 bytes long.
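Here is a desktop C++ sketch of that growth, using standard C stdio as a stand-in for the SD library: fwrite() plays the role of the SD file’s write call, and tmpfile() provides a scratch file for the usage example. The appendRecords() helper is hypothetical, and the 12-byte figure assumes the struct has no padding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

struct datastore {
    uint16_t adc1;
    uint16_t adc2;
    float voltage;
    float current;
};

// Append `count` records to an open binary file as raw bytes, then return
// the file's total size in bytes (or -1 if a write fails).
long appendRecords(FILE *fp, int count) {
    assert(fp != nullptr);
    for (int i = 0; i < count; i++) {
        datastore rec = {(uint16_t)i, (uint16_t)(2 * i), i * 0.5f, i * 0.25f};
        // Write exactly sizeof(rec) bytes: no formatting, no separators.
        if (fwrite(&rec, sizeof(rec), 1, fp) != 1) {
            return -1;
        }
    }
    fseek(fp, 0, SEEK_END);
    return ftell(fp);
}
```

Calling appendRecords() once with one record on an empty file reports 12 bytes; calling it again with two more records reports 36.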

The SD library on the Arduino supports that kind of writing perfectly well. You don’t have to do anything special when creating or opening the file. All the magic happens when you tell it to just write a bunch of bytes instead of text:

myFile.write((const uint8_t *)&myData, sizeof(myData));

Yes, I know, that looks a little cryptic, so I’ll break it down for you so you can see what is going on here.

“&myData” gets the address in memory where the data is stored. It is intrinsically a “struct datastore *” type. The write function doesn’t accept that type, so we need to convert it. That is called casting, and we want to cast it to a pointer to unsigned bytes, so we prepend it with:

(const uint8_t *) &myData

The write function now sees it as an array of bytes. Clever, eh? Alongside that we need to tell the write function how many bytes to write, and for that we can use the handy sizeof() I mentioned before.
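To see what the cast buys you, the desktop C++ sketch below defines a hypothetical writeBytes() function with the same parameter shape as the write(buffer, size) call above, except that it appends to a std::vector instead of a file. Like the real call, it only knows about bytes, so the struct has to be cast before it can be passed in. logBuffer, writeBytes() and logRecord() are illustrative names, not part of the SD library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct datastore {
    uint16_t adc1;
    uint16_t adc2;
    float voltage;
    float current;
};

// Hypothetical byte sink standing in for a file: it accepts only a
// pointer to bytes plus a byte count, and appends them to logBuffer.
std::vector<uint8_t> logBuffer;

size_t writeBytes(const uint8_t *buf, size_t size) {
    assert(buf != nullptr || size == 0);
    logBuffer.insert(logBuffer.end(), buf, buf + size);
    return size;
}

// &rec has type const datastore *, which writeBytes() will not accept
// without a cast. (const uint8_t *) makes it look like an array of bytes,
// and sizeof(rec) says how many of those bytes to take.
size_t logRecord(const datastore &rec) {
    return writeBytes((const uint8_t *)&rec, sizeof(rec));
}
```

Each call to logRecord() adds exactly sizeof(datastore) bytes to the buffer, and those bytes are an exact copy of the struct as it sits in memory.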

So let’s roll that all into a complete example:

#include <SPI.h>
#include <SD.h>

const int chipSelect = 4;
File dataFile;

struct datastore {
    uint16_t adc1;
    uint16_t adc2;
    float voltage;
    float current;
};

void setup() {
    Serial.begin(9600);
    Serial.print("Initializing SD card...");
    pinMode(10, OUTPUT);

    if (!SD.begin(chipSelect)) {
        Serial.println("Card failed, or not present");
        return;
    }

    Serial.println("card initialized.");
    dataFile = SD.open("datalog.dat", FILE_WRITE);
}

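void loop() {
    // Skip logging if the card or the file could not be opened in setup()
    if (!dataFile) {
        return;
    }

    struct datastore myData;
    myData.adc1 = analogRead(0);
    myData.adc2 = analogRead(1);
    myData.voltage = myData.adc1 / 1023.0 * 5.0;
    myData.current = myData.adc2 / 10000.0 * 23.4429;

    // Write the raw bytes of the struct straight to the file
    dataFile.write((const uint8_t *)&myData, sizeof(myData));

    // The file size on the card is only updated by flush() or close(), so
    // flush every 20 records; flushing after every record would be slower.
    static uint16_t recordCount = 0;
    if (++recordCount % 20 == 0) {
        dataFile.flush();
    }

    delay(50);
}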

So now we are filling our SD card with raw binary data. But what can we do with it? We can’t look at it; it would just be meaningless to us. So we need the Arduino to read it for us, and that is just as simple. There is a “read” equivalent to the “write” function we used above, which reads bytes into an array, and that array can be our struct, cast as before:

myFile.read((uint8_t *)&myData, sizeof(myData));

That will read the 12 bytes from the SD card and reconstruct your structure for you, magically, without you needing to do any interpreting of numbers or symbols. So we can turn the example we already have into a reading example very simply:

#include <SPI.h>
#include <SD.h>

const int chipSelect = 4;
File dataFile;

struct datastore {
    uint16_t adc1;
    uint16_t adc2;
    float voltage;
    float current;
};

void setup() {
    Serial.begin(9600);
    Serial.print("Initializing SD card...");
    pinMode(10, OUTPUT);

    if (!SD.begin(chipSelect)) {
        Serial.println("Card failed, or not present");
        return;
    }

    Serial.println("card initialized.");
    dataFile = SD.open("datalog.dat", FILE_READ);
}

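void loop() {
    // Only read when a complete record is left, so a truncated
    // record at the end of the file is never half-read
    if (dataFile.available() >= (int)sizeof(struct datastore)) {
        struct datastore myData;
        // Copy the next sizeof(myData) raw bytes straight into the struct
        dataFile.read((uint8_t *)&myData, sizeof(myData));
        analogWrite(5, map(myData.adc1, 0, 1023, 0, 255));
        analogWrite(6, map(myData.adc2, 0, 1023, 0, 255));
        Serial.print(myData.voltage, 4);
        Serial.print(" ");
        Serial.println(myData.current, 4);
        delay(50);
    }
}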

So simple. No need to try and understand the data, the Arduino already knows what it is.
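For comparison, here is the reading side as a desktop C++ sketch, again with stdio standing in for the SD library: fread() plays the role of the read call above, copying the bytes straight into the struct. The readAllRecords() helper is hypothetical. Because fread() is asked for one whole record at a time, a truncated record at the end of the file (for example after a power cut mid-write) is simply ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

struct datastore {
    uint16_t adc1;
    uint16_t adc2;
    float voltage;
    float current;
};

// Read every complete record from the start of fp, storing the most recent
// one in *last, and return how many complete records were read.
int readAllRecords(FILE *fp, datastore *last) {
    assert(fp != nullptr && last != nullptr);
    rewind(fp);
    datastore rec;
    int count = 0;
    // fread() returns 1 only when a full sizeof(rec) bytes were copied into
    // rec, so a short trailing record ends the loop instead of being used.
    while (fread(&rec, sizeof(rec), 1, fp) == 1) {
        *last = rec;
        count++;
    }
    return count;
}
```

No parsing happens anywhere: the numbers come back exactly as they were written, including the floats.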

There are a few gotchas with this method, though.

  1. The data structure mustn’t change, or you won’t be able to read old data. The method relies on the struct always being the same size and containing the same variables. If you suspect that you may want to add more variables to the structure at a later date, you should reserve room for them in the structure right from the start.
  2. Different chips, boards and computers treat different variable types in different ways. For instance, on an Arduino Uno an int is 2 bytes, but on a Due it’s 4 bytes. That is why it is important to use fixed-width types like uint16_t instead of “unsigned int”: they tell the compiler precisely how big a variable to use, so all systems will use the same size.
  3. Carrying on from 2 is the problem of endianness. Not only do different systems have different sizes for different variables, but there is more than one way of arranging a variable’s bytes in memory: the uint16_t has two bytes, but which one comes first? The two common byte orders are big-endian and little-endian, and converting between them when data moves between systems is vital, or your data will just come out as nonsense. For instance, take the Arduino Yun. Its ATmega32U4 chip is little-endian, which means that in a 2-byte variable like the uint16_t it stores the least significant byte first. The Linux portion, though, happens to be big-endian, which means it stores the most significant byte first. So to read the data written by the ATmega32U4 on the Linux side, you will have to manually swap the bytes around when reading from the structure.
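For the endianness problem, one approach, sketched below in C++, is to decode each field from the raw bytes with explicit shifts instead of copying the bytes into a struct. It treats the file as little-endian (as written by an AVR-based Arduino) and gives the same answer on a little- or big-endian host, such as the Yun’s Linux side. swap16(), readLE16(), readLE32(), readLEFloat(), decodeRecord() and the Record type are hypothetical names, and the float decoding assumes both sides use 32-bit IEEE-754 floats, so only the byte order differs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "32-bit float assumed");

// Swap the two bytes of a 16-bit value (little-endian <-> big-endian).
uint16_t swap16(uint16_t v) {
    return (uint16_t)((v >> 8) | (v << 8));
}

// Build values from little-endian bytes (least significant byte first).
// Shifts operate on numeric values, so the result is the same on any host.
uint16_t readLE16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t readLE32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Rebuild the float's 32-bit pattern, then copy it into a float.
float readLEFloat(const uint8_t *p) {
    uint32_t bits = readLE32(p);
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

// Hypothetical host-side copy of struct datastore's fields.
struct Record {
    uint16_t adc1;
    uint16_t adc2;
    float voltage;
    float current;
};

// Decode one 12-byte record written by a little-endian Arduino, using the
// field offsets 0, 2, 4 and 8 of struct datastore.
Record decodeRecord(const uint8_t *raw) {
    assert(raw != nullptr);
    Record r;
    r.adc1 = readLE16(raw + 0);
    r.adc2 = readLE16(raw + 2);
    r.voltage = readLEFloat(raw + 4);
    r.current = readLEFloat(raw + 8);
    return r;
}
```

swap16() is the literal "swap the bytes" operation for a single uint16_t (swap16(0x1234) gives 0x3412), but it only gives the right answer if you already know the host’s byte order; decodeRecord() sidesteps that question by always reading the file as little-endian.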

15 thoughts on “Fast, Efficient Data Storage on an Arduino”

  1. apicus

    you probably mean :

    analogWrite(5, map(myData.adc1, 0, 1023, 0, 255));
    analogWrite(6, map(myData.adc2, 0, 1023, 0, 255));

  2. Bertie

    Hi, would the approach be similar for saving a class instead of a struct?

    I have a main class that has attributes which are ints, floats, structs, strings and pointers to other classes.

    This is all new to me (Arduino, C++) so excuse any naivety. I’m also still waiting on my SD module so I haven’t had the chance to play around with this yet. Anyway, my worries/thoughts/questions are:

    Is the size of each instance of my class fixed? My gut tells me “no”, and that this precludes mimicking the example above.

    If I am right then I am thinking that I need to get the actual size of each instance of myClass and save that to the SD as well so that I can read back the same data that was saved…. Am I on the right lines?

    Great write-up BTW. Better than most of the explanations I have read so far today. Thanks.

  3. Pingback: A DIY Arduino data logger: Build Instructions – Part 4 (Power Optimization) | Arduino based underwater sensors

  4. Reda

    Hi,
    Thanks for your post.
    I’m trying to build an electronic door access system with an Arduino Mega, so I decided to store users’ data in a binary file on the SD card attached to an Adafruit TFT touch screen.
    My problem is:
    when I open the file to read and search for a user, it works and retrieves the related data, but when I try to open the file again to search for another user I get this message:
    “Card failed, or not present”
    and even the screen stops responding, but everything works again after restarting the Mega board.

    1. majenko Post author

      Unfortunately comments here aren’t the best place for debugging code. I would suggest you ask a question on arduino.stackexchange.com where you can post your code so we can see what is wrong with it. One thing I would suggest, though, is not to open and close the file repeatedly, but to open it only once at the beginning of your program (in setup()) and then use seek() to jump to different places within it. For instance, “rewinding” the file to the beginning can be done with “myFile.seek(0);”. You can also jump to a specific record number within the file with “myFile.seek(sizeof(struct myDataStruct) * recordNumber);”

  5. El

    You are my personal hero.
    I was thinking about that for days, and you not only gave a solution… I understand it 🙂
    I was dealing with arrays and huge for loops on my Arduino, and now my save/load functions are clean, easy to read, powerful and have 20 lines 😉
    Thank you very much!

  6. tingeman

    Thanks, this was very helpful! Upon trying this with a struct containing one uint32_t and three uint16_t members, each record written ends up having two extra bytes of value 0x00. I assume this has to do with padding of the struct. Do you observe this as well? Can I expect the padding to always be at the end of the struct, so that I could simply always write 10 bytes (instead of using “sizeof”)?

    1. majenko Post author

      On an 8-bit system I wouldn’t expect to see any padding. However you can “pack” the struct using `struct blah { … } __attribute__((packed));` which will eradicate any padding bytes and also save you some space.

      1. tingeman

        Thanks, I’m using a 32-bit NavSpark-GL Arduino-compatible board, so that probably explains the padding. The “__attribute__((packed))” works perfectly!
        I’m not concerned about memory or SD storage capacity – more about speed, as I am trying to obtain high-frequency logging from an external sensor. Do you know of any downsides to packing the struct? (there must be a good reason it is not packed by default).
        Thanks again for your help!

      2. majenko Post author

        Packing can introduce a small amount of extra processing to access the structure content as it has to extract parts of words and recombine them again. Some architectures provide byte and half-word access instructions which speeds things up somewhat. Not sure what the NavSpark has though.

  7. oldmanegan

    You have a small typo… see last line, rest here for context and help finding…
    “So now we are filling our SD card with raw binary data. But what can we do with it? We can’t look at it, it will just be meaningless to us. So we need the Arduino to read it for us. And that is just as simple. There is a “read” equivalent to the “write” function we used above where we can tell it to read bytes into an array – and that array can be our struct cast as before:

    myFile.read((uint8_t *)&myData, sizeof(mydata));”. Should be myData, not my data…

    Great work and many thanks!

  8. Pingback: Arduino Data Logger: 2017 Build Update | Arduino based underwater sensors
