see this and this for related fun stuff on logic and gates
UPDATE 4/5/06

George Hickey submitted this very cool version...

   int add1(int val)
   {
       return 0xFFFFFFFF * (val ^ 0xFFFFFFFF);
   }

And for fun I did this hack to remove the mult

int add_1(int i)
{
   return ((long long)(i ^ (0xffffffff)) << 32) - (i^0xffffffff)
}


There was a challenge recently to write a routine to add '1' to a number without using the + or - symbols.

Here's the most simple version of the routine..

int add1(int value)
{
   /* retval has to be unsigned, else the right shift
    *  will pad with ones
    */
   unsigned int retval = 0;
   unsigned int counter = 0xFFFFFFFF;
   int carry = 1;

   while (counter)
   {
      retval >>= 1;
      if (carry)
      {  if (value & 0x01)
         {  carry = 1;
         }
         else
         {  retval = retval | 1 << 31;
            carry = 0;
      }  }
      else
      {
         if (value & 0x01)
         {  retval = retval | 1 << 31;
            carry = 0;
         }
         else
         {  carry = 0;
      }  }
      value >>= 1;
      counter >>= 1;
   }
   return retval;
}

Putting it through its paces ( gcc -O0, Pentium 3 1 Ghz), I get a speed of 1.66 calls per millisecond. Calling a routine that does nothing but increment and return gives me 63 calls per second. And using the -O3 flag, I get 2.83.

Well, if you look at the outer else clause you set carry to zero in either case. We can simplify like this...


      else
      {  carry = 0;
         if (value & 0x01)
         {  retval = retval | 1 << 31;
      }  }

...and now we get 1.74 calls per second. Of course, the compiler should detect this for the non -O0 case. (Though it turns out it does, but not perfectly...with -O3 I get 2.81/2.83 or a 1% improvement).

What if we reverse the logic of the loop...

      if (value & 0x01)
      {  if (carry)
         {  carry = 1;
         }
         else
         {  carry = 0;
            retval = retval | 1 << 31;
         }
      }
      else
      {  if (carry)
         {  carry = 0;
            retval = retval | 1 << 31;
         }
      }

...now we get 1.79. A bit of an improvement. Now, what if we remove the conditionals around setting carry like this...

      if (value & 0x01)
      {  if (carry)
         {
         }
         else
         {  retval = retval | 1 << 31;
         }
      }
      else
      {  if (carry)
         {  retval = retval | 1 << 31;
         }
      }
      carry = (value & 0x01) & carry;

This turns out to be worse - we're down to 1.49. Ok, let's refactor some more and make the code really consufing...

      if (  (value & 0x01 && !carry) ||
            (!(value & 0x01) && carry) )
      {
         retval = retval | 1 << 31;
      }

      carry = (value & 0x01) & carry;

...and this, again, is worse. However, I did do a run with -O3, and now I'm up to 13.7 !! That's nearly 5x where I was before. Still, I wondered why the -O0 case is so awful. Well, I defined a temp variable to try and help and clean things up...

      temp = value & 0x01;

      if (  (temp && !carry) ||
            (!temp && carry) )
      {
         retval = retval | 1 << 31;
      }

      carry = temp & carry;

...and this made the -O0 case still worse at 1.25. Weird! The -O3 case is still ok, though.

Well, duh, I then realized that we are simply xor-ing temp and carry, so we can make this a lot simpler.


      if (  temp ^ carry  )
      {   retval = retval | 1 << 31;
      }

...and this again helped the -O0 case to 1.56, but the -O3 case stayed the same. Ok, so if I remove the temp variable, I end up with

      if (  (value & 0x01) ^ carry  )
      {
         retval |= 1 << 31;
      }

      carry = (value & 0x01) & carry;

...and I'm back to 1.78 for -O0, and -O3 is unchanged at 13.7.

Ok, time to get bizarre. I realized I can use the same variable for the returned value as I do for the source number. I just shift new values to the left as I roll them off to the right. But I need an intermediate variable. I have to transfer to an unsigned for value though, because signed ints fill in their MSB with a '1' when you do a right shift.

int add1(int value_in)
{
   unsigned int value = value_in;
   unsigned int counter = 0xFFFFFFFF;
   int carry = 1;
   int set_bit = 0;

   while (counter)
   {
      set_bit = (value & 0x01) ^ carry;
      carry = (value & 0x01) & carry;
      value >>= 1;
      value |= set_bit << 31;
      counter >>= 1;
   }
   return value;
}

Well, this insane method in fact does nothing to help or hurt performance. So let's go back to our more sensible looking version...Now I've removed a conditional and made some of the variables registers just for fun.

int add1(int value)
{
   register unsigned int retval = 0;
   register unsigned int counter = 0xFFFFFFFF;
   int carry = 1;

   while (counter)
   {
      retval >>= 1;

      retval |= ((value & 0x01) ^ carry ) << 31;
      carry = (value & 0x01) & carry;

      value >>= 1;
      counter >>= 1;
   }
   return retval;
}

Now, I'm up to a whopping 2.44! Even without the register spec, I'm still up at 1.95.

Oooh...starting to see weird binary relationships, eh? Wait a minute...I don't need to do the & 0x01, do I? carry always has only its first bit set anyway! So now my loop has boiled down to...

   while (counter)
   {
      retval  >>= 1;
      retval   |= (value ^ carry ) << 31;
      carry     = value & carry;

      value   >>= 1;
      counter >>= 1;
   }

...and my performance has gone up to around 2.5 for -O0. But still, no change for -O3 - we seem to be stuck at 13.68. Note...if I changed just one of the affected lines, the performance got worse. I had to change both to make the improvement.

So let's recap...for a Pentium 3 1 Ghz, we see...

Simple spelled out code Tweaked version Speedup
gcc -O0 1.66 2.50 1.5x
gcc -O3 2.83 13.68 4.8x

I've read enough Abrash to know there are undoubtedly still better versions, even in C. Heck, there's probably some slick binary arithmetic trick I could have used to avoid the loop. But this was a good deal of fun in any case. I think the most fun was refactoring to peel the loop down from 15 lines of code to 5.

Here's the whole program alone with some timing stuff.

-- MattWalsh - 03 Aug 2004

Topic revision: r4 - 06 Apr 2006 - MattWalsh
 
This site is powered by the TWiki collaboration platformCopyright © 2008-2012 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback