Friday, February 15, 2013

Circular buffer algorithm rates on Arduino

For my chorded keyboard, I need a mode switch function. These modes increment and decrement in a loop. For example, if there are four modes (0, 1, 2, and 3), and I want to go to the next mode, 3 should loop back around to 0. I'm sure there is a name for these systems, but I'm not sure what it is. Apparently, they are called wki.pe/Circular_buffers.

This is an easy problem, and there are many ways to solve it (some more elegant than others). As I thought about all the ways to do this, I wondered which was the fastest. So I worked up 5 algorithms of varying elegance and methods and ran them against each other.

F1 increments, and then moves 4 to 0.
byte f1(byte a){
  a++;
  if(a>=4){
    a=0;
  }
  return a;
}

F2 uses modulo:
byte f2(byte a){
  a++;
  return a % 4;
}

F3 is exactly the same as F2. I was curious wither it would be faster or slower to do it in one line. It is certainly more elegant.
byte f3(byte a){
  return ++a % 4;
}

F4 is a case-switch:
byte f4(byte a){
  switch(a){
    case 0:
      return 1;
      break;
    case 1:
      return 2;
      break;
    case 2:
      return 3;
      break;
    case 3:
      return 0;
      break;
  }
}

F5 is if-esleif-else:
byte f5(byte a){
  if(a==0){
    return 1;
  }else if(a==1){
    return 2;
  }else if(a==2){
    return 3;
  }else{
    return 0;
  }
}

I suspected F4 and F5 to be very slow, as they are a lot of lines. I figured modulo is probably a longer function, so my hunch was F1 would win in terms of speed, modulo's elegance aside. The speed of the algorithms would depend on the number going in: F1(0) should be faster than F1(3). Each algorithm is run for each of {0,1,2,3} 100 times, and the microseconds to do that are returned. I repeated this for all 5 algorithms 10 times. Here is the test code:

byte a;
byte b;
byte c;
unsigned long starts;
unsigned long ends;
unsigned long delta;

void setup() {
  Serial.begin(9600);
  delay(2000);
}

void loop() {
  //f1
  for(a=0; a<=3; a++){
    starts = micros();
    for(b=0; b<=100; b++){
      c=f1(b);
    }
    ends = micros();
    delta = ends - starts;
    Serial.print(delta);
    Serial.print("\t");
  }
  Serial.println(");
  delay(2000);

  //f2
  for(a=0; a<=3; a++){
    starts = micros();
    for(b=0; b<=100; b++){
      c=f2(b);
    }
    ends = micros();
    delta = ends - starts;
    Serial.print(delta);
    Serial.print("\t");
  }
  Serial.println(");
  delay(2000);
  
  //f3
  for(a=0; a<=3; a++){
    starts = micros();
    for(b=0; b<=100; b++){
      c=f3(b);
    }
    ends = micros();
    delta = ends - starts;
    Serial.print(delta);
    Serial.print("\t");
  }
  Serial.println(");
  delay(2000);
  
  //f4
  for(a=0; a<=3; a++){
    starts = micros();
    for(b=0; b<=100; b++){
      c=f4(b);
    }
    ends = micros();
    delta = ends - starts;
    Serial.print(delta);
    Serial.print("\t");
  }
  Serial.println(");
  delay(2000);
  
  //f5
  for(a=0; a<=3; a++){
    starts = micros();
    for(b=0; b<=100; b++){
      c=f5(b);
    }
    ends = micros();
    delta = ends - starts;
    Serial.print(delta);
    Serial.print("\t");
  }
  Serial.println(");
  delay(2000);
}

The full data is available here. The average time in microseconds for each algorithm:

F1 F2 F3 F4 F5
0.662 0.539 0.539 1.101 0.979

It is worth noting that the modulus function is faster than any of the functions with conditionals, including the first. It is also interesting that there is really no difference between incrementing on a separate line vs. on the same line (this is probably because they compile to the same thing). Here is the chart of the results. Each group is for the inputs {0, 1, 2, 3}. There was no real change across inputs between each algorithm, which I thought was surprising. 

This is a case where the most elegant solution is also the fastest.