is there some undocumented way of performing such an operation? or can anyone post an alternative that runs in three clocks or less? i'm updating one of those big number addition routines for MMX, which i'll post in addition to bitRAKE's 32-bit algo (re: big number routines). thanks :)

edit: you can use up to 3 MMX registers if so required.
Posted on 2002-04-01 13:42:11 by jademtech
Wouldn't a PADDUS or PADDUSQ (seems redundant) be better?
Posted on 2002-04-01 23:24:26 by bitRAKE
yes, yes it would be nice :) but i was thinking i could adapt the code, anyway :)

here's a rough version of what i've got (in pseudo-asm) for BigAdd (paddusq and pcmpgtq are nonexistent instructions). i also know i didn't do big/little endian conversions:

;some init code above
@@:
movq MM0,
movq MM1,

movq MM2,MM0
movq MM3,MM1

paddq MM0,MM1
paddusq MM2,MM3

movq MM4,MM2
pcmpgtq MM2,MM0 ;overflowed?

;Carry bit in MM4
paddq MM0,MM4
paddq MM4,MM4

pcmpgtq MM4,MM0
sub esi,8

por MM4,MM2
sub edi,8

psrlq MM4,63
dec ecx

movq ,MM0
jne @B
Posted on 2002-04-02 21:40:54 by jademtech
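For reference, the carry logic that the pseudo-asm is reaching for can be modelled in plain C. This is only a sketch under assumptions, not the routine from the post: the names `ugt_via_signed` and `big_add` are made up for illustration, limbs are stored least significant first, and the compiler is assumed to use two's complement conversion for the casts. It shows two things: a carry out of a wrapping 64-bit add can be detected by checking whether the sum came out smaller than an operand, and an unsigned compare can be built from a signed one (the only kind the MMX PCMPGT family offers) by flipping the top bit of both operands first.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: unsigned a > b using only a signed compare.
   Flipping the top bit of both operands maps unsigned order onto
   signed order (assumes two's complement conversion on the cast). */
static int ugt_via_signed(uint64_t a, uint64_t b)
{
    int64_t sa = (int64_t)(a ^ 0x8000000000000000ULL);
    int64_t sb = (int64_t)(b ^ 0x8000000000000000ULL);
    return sa > sb;
}

/* Hypothetical helper: dst = a + b over n 64-bit limbs, least
   significant limb first. Returns the carry out of the top limb. */
static uint64_t big_add(uint64_t *dst, const uint64_t *a,
                        const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = a[i] + b[i];                       /* wraps mod 2^64, like paddq */
        uint64_t c1 = (uint64_t)ugt_via_signed(a[i], s); /* sum < operand: it wrapped */
        uint64_t t  = s + carry;                         /* fold in the incoming carry */
        uint64_t c2 = (uint64_t)ugt_via_signed(s, t);    /* wrapped when adding the carry */
        dst[i] = t;
        carry  = c1 | c2;                                /* at most one of the two is set */
    }
    return carry;
}
```

The two carry flags can never both be set for the same limb: if `a[i] + b[i]` wraps, the wrapped sum is at most 2^64 - 2, so adding a carry of 1 cannot wrap again. That is why a plain OR is enough to combine them.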