I'm just quoting the theoretical information from Intel. Practice shows that the only way to optimize something is to test all the possibilities on all possible machines :)
Posted on 2005-12-16 14:23:56 by ti_mo_n
Specifically, it says:

The inc and dec instructions modify only a subset of the bits in the flag register. This creates a dependence on all previous writes of the flag register. This is especially problematic when these instructions are on the critical path because they are used to change an address for a load on which many other instructions depend.
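The false dependency in that quote can be made concrete with a toy model. This is a minimal sketch, assuming a simplified machine in which every flag-writing instruction has to merge in the flags it does *not* write from whichever earlier instruction last wrote them. It is not a description of how the P4 actually renames flags. The names `FLAGS_WRITTEN` and `flag_dependencies` are hypothetical, and only the six status flags and a few flag-writing opcodes are modelled. inc/dec leave CF untouched, which is the partial write the quote refers to.

```python
# Six x86 status flags.
ALL_FLAGS = frozenset({"CF", "PF", "AF", "ZF", "SF", "OF"})

# Flags each (flag-writing) instruction overwrites in this toy model.
FLAGS_WRITTEN = {
    "add": ALL_FLAGS,
    "sub": ALL_FLAGS,
    "cmp": ALL_FLAGS,
    "inc": ALL_FLAGS - {"CF"},  # inc leaves CF unchanged
    "dec": ALL_FLAGS - {"CF"},  # dec leaves CF unchanged
}


def flag_dependencies(program):
    """For each instruction, return the indices of earlier instructions whose
    flag results it must merge with, because it does not overwrite them."""
    last_writer = {}  # flag name -> index of the most recent writer
    deps = []
    for i, op in enumerate(program):
        written = FLAGS_WRITTEN[op]
        preserved = ALL_FLAGS - written
        # A partial write depends on whoever last produced the preserved flags.
        deps.append({last_writer[f] for f in preserved if f in last_writer})
        for f in written:
            last_writer[f] = i
    return deps


if __name__ == "__main__":
    print(flag_dependencies(["cmp", "inc", "inc"]))  # [set(), {0}, {0}]
    print(flag_dependencies(["cmp", "add", "add"]))  # [set(), set(), set()]
```

In the inc version both increments are tied to the earlier cmp through CF, while the add version has no flag dependencies at all. That is the situation Rule 42 is trying to avoid.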

Assembly/Compiler Coding Rule 42. (M impact, H generality) inc and dec instructions should be replaced with an add or sub instruction, because add and sub overwrite all flags, whereas inc and dec do not, therefore creating false dependencies on earlier instructions that set the flags.
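Applied mechanically, the rule is just a peephole rewrite. This is a minimal sketch, assuming Intel-syntax source lines with a plain register operand. The regex and `rewrite_inc_dec` are hypothetical helpers, not part of any Intel tool. The one case where the rewrite changes behaviour is code that relies on inc/dec leaving CF intact, such as a multi-precision adc loop that uses inc for its counter. The sketch does not check for that, so it should only be applied where CF is not read afterwards.

```python
import re

# "inc reg" / "dec reg" in Intel syntax, register operand only. Lines with a
# size keyword, memory operand, or AT&T suffix (e.g. "incl") do not match.
_INC_DEC = re.compile(r"^(\s*)(inc|dec)\s+([A-Za-z0-9]+)\s*$", re.IGNORECASE)


def rewrite_inc_dec(line: str) -> str:
    """Rewrite 'inc r' -> 'add r, 1' and 'dec r' -> 'sub r, 1'.

    Unsafe if later code expects CF to survive the inc/dec."""
    m = _INC_DEC.match(line)
    if m is None:
        return line  # leave anything else untouched
    indent, op, reg = m.groups()
    new_op = "add" if op.lower() == "inc" else "sub"
    return f"{indent}{new_op} {reg}, 1"


if __name__ == "__main__":
    for src in ["inc eax", "    dec ecx", "inc dword [ebx]", "mov eax, 1"]:
        print(repr(rewrite_inc_dec(src)))
```

Memory operands are deliberately left alone here to keep the pattern simple. A real tool would parse the operand properly rather than rely on a regex.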

Interestingly, though, according to Table C-8 in the Intel P4 optimisation guide, the latency of add/sub seems to have gone up on newer P4s from 0.5 cycles to 1 cycle, the same as inc/dec. When I tested add/sub versus inc/dec, it was on an early P4 (a Willamette). Maybe the difference isn't as big on Prescotts?
Posted on 2005-12-16 16:28:54 by stormix