see this and this for related fun stuff on logic and gates
UPDATE 4/5/06
George Hickey submitted this very cool version...
int add1(int val)
{
return 0xFFFFFFFF * (val ^ 0xFFFFFFFF);
}
And for fun I did this hack to remove the mult
int add_1(int i)
{
return ((long long)(i ^ (0xffffffff)) << 32) - (i^0xffffffff)
}
There was a challenge recently to write a routine to add '1' to a number without using the
+ or
- symbols.
Here's the most simple version of the routine..
int add1(int value)
{
/* retval has to be unsigned, else the right shift
* will pad with ones
*/
unsigned int retval = 0;
unsigned int counter = 0xFFFFFFFF;
int carry = 1;
while (counter)
{
retval >>= 1;
if (carry)
{ if (value & 0x01)
{ carry = 1;
}
else
{ retval = retval | 1 << 31;
carry = 0;
} }
else
{
if (value & 0x01)
{ retval = retval | 1 << 31;
carry = 0;
}
else
{ carry = 0;
} }
value >>= 1;
counter >>= 1;
}
return retval;
}
Putting it through its paces (
gcc -O0, Pentium 3 1 Ghz), I get a speed of
1.66 calls per millisecond. Calling a routine that does nothing but increment and return gives me
63 calls per second. And using the
-O3 flag, I get
2.83.
Well, if you look at the outer
else clause you set
carry to zero in either case. We can simplify like this...
else
{ carry = 0;
if (value & 0x01)
{ retval = retval | 1 << 31;
} }
...and now we get
1.74 calls per second. Of course, the compiler should detect this for the non
-O0 case. (Though it turns out it does, but not perfectly...with
-O3 I get
2.81/2.83 or a 1% improvement).
What if we reverse the logic of the loop...
if (value & 0x01)
{ if (carry)
{ carry = 1;
}
else
{ carry = 0;
retval = retval | 1 << 31;
}
}
else
{ if (carry)
{ carry = 0;
retval = retval | 1 << 31;
}
}
...now we get
1.79. A bit of an improvement. Now, what if we remove the conditionals around setting
carry like this...
if (value & 0x01)
{ if (carry)
{
}
else
{ retval = retval | 1 << 31;
}
}
else
{ if (carry)
{ retval = retval | 1 << 31;
}
}
carry = (value & 0x01) & carry;
This turns out to be worse - we're down to
1.49. Ok, let's refactor some more and make the code really consufing...
if ( (value & 0x01 && !carry) ||
(!(value & 0x01) && carry) )
{
retval = retval | 1 << 31;
}
carry = (value & 0x01) & carry;
...and this, again, is worse. However, I did do a run with
-O3, and now I'm up to
13.7 !! That's nearly 5x where I was before. Still, I wondered why the -O0 case is so awful. Well, I defined a temp variable to try and help and clean things up...
temp = value & 0x01;
if ( (temp && !carry) ||
(!temp && carry) )
{
retval = retval | 1 << 31;
}
carry = temp & carry;
...and this made the
-O0 case
still worse at
1.25. Weird! The
-O3 case is still ok, though.
Well, duh, I then realized that we are simply xor-ing
temp and
carry, so we can make this a lot simpler.
if ( temp ^ carry )
{ retval = retval | 1 << 31;
}
...and this again helped the
-O0 case to
1.56, but the
-O3 case stayed the same. Ok, so if I remove the
temp variable, I end up with
if ( (value & 0x01) ^ carry )
{
retval |= 1 << 31;
}
carry = (value & 0x01) & carry;
...and I'm back to
1.78 for
-O0, and
-O3 is unchanged at
13.7.
Ok, time to get bizarre. I realized I can use the same variable for the returned value as I do for the source number. I just shift new values to the left as I roll them off to the right. But I need an intermediate variable. I have to transfer to an unsigned for
value though, because signed
ints fill in their MSB with a '1' when you do a right shift.
int add1(int value_in)
{
unsigned int value = value_in;
unsigned int counter = 0xFFFFFFFF;
int carry = 1;
int set_bit = 0;
while (counter)
{
set_bit = (value & 0x01) ^ carry;
carry = (value & 0x01) & carry;
value >>= 1;
value |= set_bit << 31;
counter >>= 1;
}
return value;
}
Well, this insane method in fact does nothing to help or hurt performance. So let's go back to our more sensible looking version...Now I've removed a conditional and made some of the variables registers just for fun.
int add1(int value)
{
register unsigned int retval = 0;
register unsigned int counter = 0xFFFFFFFF;
int carry = 1;
while (counter)
{
retval >>= 1;
retval |= ((value & 0x01) ^ carry ) << 31;
carry = (value & 0x01) & carry;
value >>= 1;
counter >>= 1;
}
return retval;
}
Now, I'm up to a whopping
2.44! Even without the
register spec, I'm still up at
1.95.
Oooh...starting to see weird binary relationships, eh? Wait a minute...I don't need to do the
& 0x01, do I?
carry always has only its first bit set anyway! So now my loop has boiled down to...
while (counter)
{
retval >>= 1;
retval |= (value ^ carry ) << 31;
carry = value & carry;
value >>= 1;
counter >>= 1;
}
...and my performance has gone up to around
2.5 for
-O0. But still, no change for
-O3 - we seem to be stuck at
13.68. Note...if I changed just
one of the affected lines, the performance got worse. I had to change both to make the improvement.
So let's recap...for a Pentium 3 1 Ghz, we see...
| |
Simple spelled out code |
Tweaked version |
Speedup |
gcc -O0 |
1.66 |
2.50 |
1.5x |
gcc -O3 |
2.83 |
13.68 |
4.8x |
I've read enough Abrash to know there are undoubtedly still better versions, even in C. Heck, there's probably some slick binary arithmetic trick I could have used to avoid the loop. But this was a good deal of fun in any case. I think the most fun was refactoring to peel the loop down from 15 lines of code to 5.
Here's the
whole program alone with some timing stuff.
--
MattWalsh - 03 Aug 2004