Tuesday, June 29, 2010

More tools to write better C

EDIT: 15.09.2011 Another SVN working now. Sources can be found from




are covered here.

Oh my oh my. It has been ages since I last annoyed you my dear readers. Well, I figured I could now do that again and share some bits (pun intended) which may come handy.


With bitset I mean a simple way of decreasing memory usage while storing information. There's plenty of situations where we need to store some state for multiple things. A simple (and very much schooly) example could be student's course proceedings in a school. Eg, let's imagine a school where we have 10000 students. Each student needs to pass set of courses, and these courses are either passed or not passed. Now, we could assign an ID-number for each student, starting from 0, and ending up to 9999. (Gosh how fast these classes grow nowadays. What does this tell about us parents...).

Now first idea how to simply store this status is to create an array for each course, index it with student IDs, and set the element of array to 1 if student has passed the course, or 0 if he/she has not. Eg. we might end up in a solution like:

int biology1[10000];

int has_student_passed_biology1(int student_id)
if(student_id<0 || student_id>10000)
return -1;
return biology1[student_id];

Lets ponder a lil while. An integer on x86 machine takes up 32 bits of space. How many bits do we need? Naturally, we only need 1 bit, (student can have passed the course (1) or not passed (0). (Of course there is cases when we do not need even this many bits, for example the introduction to quantum mechanics was such a course for 1.st years physics students... No one passed, but let's ignore that now :D) So for each course we end up wasting 31 bits of memory/student. This is 310000 bits (we had 10000 students), Eg. roughly 37.8 kilobytes. Well, when we think of modern PCs, this might not be significant. But on the other hand, when we think of all the possible places where we can do this wasting... Well, have you ever wondered M$ product's memory usage? :D And if we think of embedded systems with limited amount of memory.........

To tackle this problem we could change ints to chars. That would divide the loss by 4. But it would still be a waste. Especially in already bloated world of SW.

So my (and many other's) solution is a bitset. In a simplest form it is just a way to read/write state of single bit. And basically this is what my bitset does. Eg, at the beginning, you initialize my bitset by telling how many bits you need, and later you set/unset/get the state of N.th bit of bitset.

My bitset implementation can be obtained from my bot project's repository:
The files are MbotBitset.h and MbotBitset.c

NOTE! I've not tested this bit set on 64 bit architectures, and it may fail there. (It may also work though). If you test it, please report the results in this blog for example (as a comment), or by email (Mazziesaccount@gmail.com).

Packed Array:

Well, if we continue playing with thoughts of students, we can go back in good ol' times where grades from courses at university had only 4 possibilities. Failed, 1,2 or 3. Now if we think of storing these numbers, we note following:
1. bitset cannot really be easily used, since it only allows 2 states / Id. (since it only uses 1 bit to represent the value, but representing 4 states requires 2 bits, 00, 01, 10, 11)
2. If we use char array, only 2 of 8 bits is used / student.

To tackle this one I wrote a packed array.

Basically my packed array just calculates amount of bits needed to represent the largest value (told at array initialization), and then allocates the amount of memory that is needed for array with X members (also told at initialization). Then it splits the memory to slots and allows writing/reading values stored in N bits wide slots.

My packed array implementation can be found from same place as the bitset, files MbotPackedArray.h and MbotPackedArray.c

NOTE! Packed array has not been tested on 64-bit machine (and I assume it will not work there). Also 32 bit environment has not been so throughoutly tested, but at least I've not noticed that there is bugs left...
Anyways, I'd be gratefull if you reported results of all experiments with the packed array / bitset.

(To my mail (Mazziesaccount@gmail.com) Bugs, success stories, improvements...
or in this blog as a comment.Bugs, success stories, improvements...)

1 comment:

  1. Oh, I forgot to say... The bitset and packed array presented in here are quite messy. The reason is simple and good, but it does not make the code maintenance/understanding any simple :) However my excuse is real need for speed optimizations in some places I've used this. This need for speed made me to introduce couple of quite ugly macros (to avoid function calls && jumps+stack handling it creates) and to replace (often) pretty timetaking division and modulo operations with faster bit sifting.

    However if you do not have such a need for speed, or you have a processor capable to do fast divisions, you may want to make the code a bit cleaner && more maintainable... :)