is there some undocumented way of performing such an operation? or can anyone post an alternative that runs in three clocks or less? i'm updating my big number addition routines for MMX, and i'll post them alongside bitRAKE's 32-bit algo (re: big number routines). thanks :)
edit: you can use up to 3 MMX registers if so required.
Wouldn't a PADDUS or PADDUSQ (seems redundant) be better?
Posted on 2002-04-01 23:24:26 by bitRAKE
yes, yes it would be nice :) but i was thinking i could adapt the code, anyway :)
here's a rough version of what i've got (in pseudo-asm) for BigAdd (nonexistent instructions in italics). i also know i didn't do big/little endian conversions:
;some init code above
@@:
movq MM0,
movq MM1,
movq MM2,MM0
movq MM3,MM1
paddq MM0,MM1
paddusq MM2,MM3
movq MM4,MM2
pcmpgtq MM2,MM0 ;overflowed?
;Carry bit in MM4
paddq MM0,MM4
paddq MM4,MM4
pcmpgtq MM4,MM0
sub esi,8
por MM4,MM2
sub edi,8
shrlq MM4,63
dec ecx
movq ,MM0
jne @B
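for anyone following along, here's a minimal sketch in plain C (not MMX) of the carry logic the loop above is trying to get at. it assumes 64-bit limbs stored least significant first, which may not match the layout in the pseudo-asm, and the names `ugt_via_signed` and `big_add` are made up for illustration. the point is that an unsigned add wrapped exactly when the result is smaller than an operand, and that an unsigned compare can be built from a signed one (the MMX pcmpgt instructions are signed) by flipping the top bit of both sides first:

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned a > b using only a signed compare. Flipping the top bit of
   both operands maps unsigned order onto signed order. The cast to
   int64_t assumes the usual two's-complement conversion. */
static int ugt_via_signed(uint64_t a, uint64_t b)
{
    int64_t sa = (int64_t)(a ^ 0x8000000000000000ULL);
    int64_t sb = (int64_t)(b ^ 0x8000000000000000ULL);
    return sa > sb;
}

/* dst = a + b over n 64-bit limbs, least significant limb first
   (an assumption of this sketch). Returns the final carry (0 or 1). */
static uint64_t big_add(uint64_t *dst, const uint64_t *a,
                        const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = a[i] + b[i];                       /* wrapping add */
        uint64_t c1 = (uint64_t)ugt_via_signed(a[i], s); /* wrapped iff s < a[i] */
        uint64_t t  = s + carry;                         /* add carry in */
        uint64_t c2 = (uint64_t)ugt_via_signed(s, t);    /* wrapped iff t < s */
        dst[i] = t;
        carry = c1 | c2;  /* the two adds can never both wrap */
    }
    return carry;
}
```

the two carries are OR'd the same way the loop above ORs its two compare results. they can't both be set, because if the first add wrapped then `s` is at most 2^64 - 2 and adding a carry of 1 can't wrap again.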