I can't seem to find a way to do this fast...
I need to increment an 8-bit value in memory with saturation. The fastest method I found so far is:
movzx eax, byte ptr [...]
inc eax
cmp eax, 0xFF
cmovg eax, [constFFh]
mov [...], al
That's five slow instructions! This code is quite critical to me, so I wondered if there was a faster way to do it.
Thanks for any ideas!
I always find solutions right after I post something... ;)
This should be faster:
movsx eax, byte ptr [...]
inc eax
cmovz eax, [constFFh]
mov [...], al
I'm already much more satisfied now, since every instruction has a clear purpose: load, increment, saturate and store.
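For anyone wondering why the movsx version works without a compare: sign extension turns 0xFF into -1, so the inc produces zero (and sets ZF) only for that one input. Below is a rough C model of the sequence, just to check the logic over all 256 byte values. The function name is made up for illustration, and the C says nothing about timing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of:
 *   movsx eax, byte ptr [...]
 *   inc   eax
 *   cmovz eax, [constFFh]
 *   mov   [...], al
 */
static uint8_t sat_inc_movsx(uint8_t m)
{
    /* movsx: sign-extend the byte, so 0x80..0xFF become -128..-1 */
    int32_t eax = (m >= 128) ? (int32_t)m - 256 : (int32_t)m;

    eax += 1;            /* inc eax: result is zero only when the byte was 0xFF */

    if (eax == 0)        /* cmovz eax, [constFFh] */
        eax = 0xFF;

    return (uint8_t)eax; /* mov [...], al: only the low byte is stored */
}

/* Exhaustive check against the plain definition of a saturating increment. */
static void check_sat_inc_movsx(void)
{
    for (unsigned v = 0; v < 256; ++v) {
        uint8_t expected = (v == 255) ? 255 : (uint8_t)(v + 1);
        assert(sat_inc_movsx((uint8_t)v) == expected);
    }
}
```

Calling `check_sat_inc_movsx()` from a test program should pass silently if the model is right.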
Try this:
add byte ptr [...], 1
sbb byte ptr [...], 0
Or this:
cmp byte ptr [...], 255
adc byte ptr [...], 0
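Both sequences rely on the carry flag. In the first, the add wraps 255 to 0 and sets CF, and the sbb then subtracts that carry to bring the value back to 255. In the second, the cmp sets CF exactly when the value is below 255, and the adc adds that carry. Here is a small C sketch that models the flag explicitly and checks every byte value. The function names are invented for this example, and the model only covers correctness, not speed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of: add m, 1 / sbb m, 0 */
static uint8_t sat_inc_add_sbb(uint8_t m)
{
    unsigned sum = (unsigned)m + 1u;   /* add m, 1 */
    unsigned cf  = sum >> 8;           /* CF = carry out of bit 7 */
    m = (uint8_t)sum;                  /* 255 wraps to 0 here */
    return (uint8_t)(m - 0u - cf);     /* sbb m, 0 computes m - 0 - CF */
}

/* Hypothetical model of: cmp m, 255 / adc m, 0 */
static uint8_t sat_inc_cmp_adc(uint8_t m)
{
    unsigned cf = (m < 255u);          /* cmp sets CF on unsigned borrow, i.e. m < 255 */
    return (uint8_t)(m + 0u + cf);     /* adc m, 0 computes m + 0 + CF */
}

/* Compare both models with the straightforward definition. */
static void check_sat_inc(void)
{
    for (unsigned v = 0; v < 256; ++v) {
        uint8_t expected = (v == 255) ? 255 : (uint8_t)(v + 1);
        assert(sat_inc_add_sbb((uint8_t)v) == expected);
        assert(sat_inc_cmp_adc((uint8_t)v) == expected);
    }
}
```

Note that the first sequence writes memory twice while the second reads it twice and writes once, which may matter when timing them.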
Wow, that's short! Thanks! I hope the read-modify-write operations aren't slower. I'll try it out...
Sorry, my method was nearly twice as fast on my Pentium M.
Edit: I was too hasty. Your second method with the cmp is about equally fast (it's not a read-modify-write operation). I'll make some more accurate measurements...
Edit: Congratulations! Your second method is 10% faster than mine. Thanks!
I found the equivalent for decrement with saturation:
cmp byte ptr [...], 1
adc byte ptr [...], -1
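The decrement works the same way in mirror image: the cmp against 1 sets CF only when the value is 0, and adding -1 plus that carry leaves 0 unchanged. A quick C model to confirm it, with a made-up function name and the carry written out as a variable:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of:
 *   cmp byte ptr [...], 1
 *   adc byte ptr [...], -1
 */
static uint8_t sat_dec_cmp_adc(uint8_t m)
{
    unsigned cf = (m < 1u);              /* cmp m, 1: CF set only when m == 0 */
    return (uint8_t)(m + 0xFFu + cf);    /* adc m, -1: m + 0xFF + CF, truncated to 8 bits */
}

/* Exhaustive check: the result must be max(m - 1, 0). */
static void check_sat_dec(void)
{
    for (unsigned v = 0; v < 256; ++v) {
        uint8_t expected = (v == 0) ? 0 : (uint8_t)(v - 1);
        assert(sat_dec_cmp_adc((uint8_t)v) == expected);
    }
}
```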
Also try doing the operation in a register instead of using RMW instructions, and see if it makes a difference.
For saturated inc-by-1, what about:
add <whatever>, 1
sbb <whatever>, 0
Same logic for dec-by-1:
sub <whatever>, 1
adc <whatever>, 0
For larger-than-1 increments, scali suggested the following:
add <whatever>, <amount>
sbb <reg>, <reg>
or <whatever>, <reg>
Your responsibility to time the stuff...
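If you want to convince yourself of the sub/adc and add/sbb/or variants before timing them, here is a rough C model. It assumes `<reg>` is an 8-bit register matching the operand size, so that sbb reg, reg yields 0x00 or 0xFF. The function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of: sub m, 1 / adc m, 0 */
static uint8_t sat_dec_sub_adc(uint8_t m)
{
    unsigned cf = (m < 1u);            /* sub m, 1 borrows (CF = 1) only when m == 0 */
    m = (uint8_t)(m - 1u);             /* 0 wraps to 255 here */
    return (uint8_t)(m + 0u + cf);     /* adc m, 0 adds the borrow back: 255 + 1 wraps to 0 */
}

/* Hypothetical model of: add m, amount / sbb reg, reg / or m, reg */
static uint8_t sat_add(uint8_t m, uint8_t amount)
{
    unsigned sum  = (unsigned)m + amount;     /* add m, amount */
    unsigned cf   = sum >> 8;                 /* CF = carry out of bit 7 */
    uint8_t  mask = cf ? 0xFF : 0x00;         /* sbb reg, reg: reg = 0 - CF */
    return (uint8_t)((uint8_t)sum | mask);    /* or m, reg: all ones on overflow */
}

/* Exhaustive checks against the plain definitions. */
static void check_sat_variants(void)
{
    for (unsigned v = 0; v < 256; ++v) {
        uint8_t dec_expected = (v == 0) ? 0 : (uint8_t)(v - 1);
        assert(sat_dec_sub_adc((uint8_t)v) == dec_expected);

        for (unsigned a = 0; a < 256; ++a) {
            unsigned wide = v + a;
            uint8_t add_expected = (wide > 255) ? 255 : (uint8_t)wide;
            assert(sat_add((uint8_t)v, (uint8_t)a) == add_expected);
        }
    }
}
```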