I would like to double a packed BCD number.

E.g. 1234 decimal can be represented thusly in 32 bits:
\$01020304 BCD (each digit stored in a byte)
or \$00001234 packed BCD (each digit stored in a nibble)
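
To make the two layouts concrete, here is a minimal C sketch that builds both encodings from an ordinary integer. The helper names `to_unpacked_bcd` and `to_packed_bcd` are hypothetical and only serve as illustration.

```c
#include <stdint.h>

/* Hypothetical helper: one decimal digit per byte (n must be 0..9999). */
uint32_t to_unpacked_bcd(uint32_t n)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        r |= (n % 10u) << shift;   /* lowest digit goes into the lowest byte */
        n /= 10u;
    }
    return r;
}

/* Hypothetical helper: one decimal digit per nibble (n must be 0..99999999). */
uint32_t to_packed_bcd(uint32_t n)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 4) {
        r |= (n % 10u) << shift;   /* lowest digit goes into the lowest nibble */
        n /= 10u;
    }
    return r;
}
```

With these, `to_unpacked_bcd(1234)` gives 01020304h and `to_packed_bcd(1234)` gives 00001234h.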

A complete routine for adding two packed BCD numbers is given here http://www.asmcommunity.net/board/viewtopic.php?t=13215

What I want to do now is multiply a packed BCD number by 2. The following code works (and incorporates carry in and out). It simplifies one aspect of the addition algorithm: when a binary number is added to itself, you would normally just shift....

Can it be sped up any? Thanks.

``````
mov ([esi+mda], eax);
mov (eax, edx);
xor (-1, edx);
and ($88888888, edx);
shr (1, ebx);              // carry in: bit 0 of ebx -> CF
rcl (1, eax);
rcl (1, ebx);              // carry out: CF -> bit 0 of ebx
sub (edx, eax);
shr (2, edx);
``````
Posted on 2004-09-02 20:03:20 by V Coder
Okay, but I'm going to write this in a normal way:
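``````
mov ecx,[esi+mda]
lea eax,[ecx+0x33333333]   ; no nibble overflows, since 9+3 < 16
and eax,0x88888888         ; bit 3 of each nibble set where digit >= 5
shr eax,1
add ecx,eax                ; ecx = x + 4 per digit >= 5
shr eax,1
add eax,ecx                ; eax = x + 6 per digit >= 5
``````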
Posted on 2004-09-04 12:37:27 by Sephiroth3
The '0x33' hex notation belongs to C (or other HLLs).

Most assemblers have two hex notations, "33h" and "\$33": 33h is what an asm programmer writes in source, while \$33 is what online disassemblers display.
``````
mov ecx, [esi+mda]
lea eax, [ecx+33333333h]
and eax, 88888888h
shr eax, 1
add ecx, eax
shr eax, 1
add eax, ecx
``````
Posted on 2004-09-05 05:50:46 by Kestrel
Ah, sorry, I had just been using NASM and grown accustomed to writing 0x for some reason :P
Posted on 2004-09-05 08:16:38 by Sephiroth3
Thanks guys. But it does not work. With input 123456h, the output is eax=1234BCh, ecx=12349Ah. The result should be 246912h. Please also check out the other links. I like the brevity of your routine though, so I'm looking at it further. Do you have a source for it, or material on packed BCD arithmetic? Thanks.
Posted on 2004-09-05 23:22:24 by V Coder
Ah, you're right, I missed an instruction!

``````
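mov ecx, [esi+mda]
lea eax, [ecx+33333333h]   ; no nibble overflows, since 9+3 < 16
and eax, 88888888h         ; bit 3 of each nibble set where digit >= 5
add ecx,ecx                ; binary doubling
add ecx,eax                ; +8 per digit >= 5
shr eax,2
sub ecx,eax                ; -2 per digit >= 5, net correction +6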
``````

I now also made it return the value in ECX to save one byte and one shift instruction.
Posted on 2004-09-06 10:54:48 by Sephiroth3
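
The arithmetic behind the corrected doubler can be modeled in C. This is a minimal sketch, assuming the input is valid packed BCD and reading the routine as "binary double, then add 6 to every digit that was 5 or more"; the function name `bcd_double` is hypothetical.

```c
#include <stdint.h>

/* Double an 8-digit packed BCD value; the carry out of the top digit is dropped. */
uint32_t bcd_double(uint32_t x)
{
    /* Adding 3 to each digit sets bit 3 of its nibble exactly when digit >= 5.
       No nibble overflows into its neighbour because 9 + 3 < 16. */
    uint32_t t = (x + 0x33333333u) & 0x88888888u;

    /* Binary doubling, plus 8 - 2 = 6 for every digit that was >= 5. */
    return x + x + t - (t >> 2);
}
```

For the test value from the thread, `bcd_double(0x123456)` yields 246912h: the binary double is 2468ACh, adding the mask 88h gives 246934h, and subtracting 22h leaves 246912h.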
That is great. Four instructions fewer than mine!!! How did you do it? BTW one thing is missing though: how do we get the carry please, in the case of doubling eight-digit numbers?

Another question. How would you extend this to a generic add for packed BCD please? Also with carry?

Do you have any other links on the topic please? Thanks.

I got it. This handles the carry...
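``````
mov ecx, [esi+mda]
lea eax, [ecx+33333333h]
and eax, 88888888h         ; SF = 1 when the top digit >= 5
lea ecx, [ecx*2+ebx]       ; double and add carry in (lea leaves the flags alone)
sets bl                    ; carry out, taken from the sign flag of the AND
add ecx,eax                ; +8 per digit >= 5
shr eax,2
sub ecx,eax                ; -2 per digit >= 5, net correction +6
``````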

Still, how would you extend this to generic packed BCD add please. Thanks.
Posted on 2004-09-06 17:08:59 by V Coder
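
The carry handling can be sketched in C as well. This is only a model, assuming valid packed BCD input and a carry of 0 or 1; the name `bcd_double_carry` is hypothetical. The carry out needs no extra arithmetic: doubling the top digit overflows exactly when that digit is 5 or more, which is the top bit of the mask.

```c
#include <stdint.h>

/* Double an 8-digit packed BCD value with carry in and carry out.
   *carry is read as the carry in (0 or 1) and overwritten with the carry out. */
uint32_t bcd_double_carry(uint32_t x, unsigned *carry)
{
    uint32_t t = (x + 0x33333333u) & 0x88888888u;  /* bit 3 set where digit >= 5 */
    uint32_t r = x + x + (*carry & 1u) + t - (t >> 2);

    *carry = t >> 31;   /* top digit >= 5 means a decimal carry out */
    return r;
}
```

Doubling 99999999h with a carry in of 1 gives 99999999h with a carry out of 1, matching 2 × 99999999 + 1 = 199999999.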
I think I did it in about the same way as you did, but with reversed logic.

Maybe you can add like this?

``````
mov eax,[valuea]
mov ecx,[valueb]
mov edx,eax
rcl ebx,1
xor ecx,edx
mov edx,eax
xor eax,ecx
shr ebx,1
rcr eax,3
and eax,22222222h
lea eax,[eax*2+eax]
shl eax,2
; Result in EDX, upper bit of EBX cleared
``````

The doubler with carry can be implemented like this:
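``````
mov ecx, [esi+mda]
lea eax,[ecx+33333333h]
adc ecx,ecx                ; double and add carry in, before AND clears CF
and eax,88888888h          ; bit 3 of each nibble set where digit >= 5
add ecx,eax                ; +8 per digit >= 5
shr eax,2
sub ecx,eax                ; -2 per digit >= 5, net correction +6
shl eax,3                  ; CF = carry out (mask bit of the top digit)
; Result in ECX
``````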

I'm afraid I don't know of any literature about BCD numerical processing.
Posted on 2004-09-07 12:57:19 by Sephiroth3
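
For the generic add asked about above, one well-known approach is to bias every digit by 6 so that decimal carries become binary nibble carries, then remove the bias from the digits that did not carry. The C sketch below shows that technique; it is not necessarily how the assembly in this thread does it. The name `bcd_add` is hypothetical, and 64-bit intermediates are an assumption made here to keep the carry out of the top digit visible.

```c
#include <stdint.h>

/* Add two 8-digit packed BCD values.
   *carry is read as the carry in (0 or 1) and overwritten with the carry out. */
uint32_t bcd_add(uint32_t a, uint32_t b, unsigned *carry)
{
    const uint64_t CARRY_BITS = 0x111111110ull;  /* bits 4, 8, ..., 32 */

    uint64_t t1 = (uint64_t)a + 0x66666666u;     /* bias every digit by 6 */
    uint64_t t2 = t1 + b + (*carry & 1u);        /* digit sums >= 10 now carry out of the nibble */

    /* sum bit = x ^ y ^ carry-in, so this recovers the carry into each nibble boundary */
    uint64_t carries = (t1 ^ b ^ t2) & CARRY_BITS;

    /* a 1 in the low bit of every nibble that did NOT produce a carry */
    uint32_t no_carry = (uint32_t)((~carries & CARRY_BITS) >> 4);

    *carry = (unsigned)(carries >> 32);          /* carry out of the top digit */
    return (uint32_t)t2 - no_carry * 6u;         /* take the bias back where it was not used up */
}
```

Calling `bcd_add(x, x, &c)` should agree with the doubling routines, which makes it handy as a cross-check.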
I combined the two techniques above to produce:
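``````
mov ecx, [esi+mda]
lea eax,[ecx+33333333h]
adc ecx,ecx                ; double and add carry in
rcr eax,2                  ; mask bit moves from bit 3 to bit 1 of each nibble
and eax,22222222h          ; 2 per digit >= 5
lea eax,[eax*2+eax]        ; times 3 = 6 per digit >= 5
add ecx,eax
shl eax,2                  ; CF = carry out
``````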

However, it is one quarter the speed of
``````
mov ecx, [esi+mda]
lea eax,[ecx+33333333h]
adc ecx,ecx
and eax,88888888h
add ecx,eax
shr eax,2
sub ecx,eax
shl eax,3
``````

and
``````
mov ecx, [esi+mda]
lea eax, [ecx+33333333h]
and eax, 88888888h
lea ecx, [ecx*2+ebx]
sets bl
add ecx,eax
shr eax,2
sub ecx,eax
``````

Apparently the flags dependency after adc slows down the rcr - and we don't even use this carry!!! Changing rcr to shr brings it almost up to speed.
``````
mov ecx, [esi+mda]
lea eax,[ecx+33333333h]
adc ecx,ecx
shr eax,2
and eax,22222222h
lea eax,[eax*2+eax]
add ecx,eax
shl eax,2
``````
Posted on 2004-09-08 07:15:22 by V Coder
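
The two styles compared above differ only in how they build the "+6 per digit that was 5 or more" correction: one masks with 88888888h and subtracts a quarter of the mask, the other shifts first, masks with 22222222h and multiplies by 3. A small C sketch, with hypothetical function names, shows that both produce the same correction word for valid packed BCD input:

```c
#include <stdint.h>

/* Correction built as 8 - 2 per digit >= 5 (mask 88888888h, then subtract mask/4). */
uint32_t bcd_fix_sub(uint32_t x)
{
    uint32_t t = (x + 0x33333333u) & 0x88888888u;
    return t - (t >> 2);
}

/* Correction built as 2 * 3 per digit >= 5 (shift, mask 22222222h, then times 3). */
uint32_t bcd_fix_mul(uint32_t x)
{
    uint32_t m = ((x + 0x33333333u) >> 2) & 0x22222222u;
    return m * 3u;
}
```

Both return 66h for the input 123456h, since only the digits 5 and 6 need correcting.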
Maybe you could just keep the code I posted, then :P Or is any of the other variants faster?
Posted on 2004-09-08 10:34:33 by Sephiroth3
Consider the four routines above (Sep 8) to be labelled a, b, c, d. They rank as follows:

1st place <b> (Sephiroth3 - Sep 7 with carry in carry flag) and <c> (V Coder - Sep 6 with carry added in ebx & sets bl after Sephiroth3 - Sep 6) calculation in 5 seconds (both approx 5.75 seconds but truncated to 5). Total with startup and save=8.5 seconds
2nd place <d> (V Coder - Sep 8 with carry in carry flag after Sephiroth3 - Sep 7) calculation in 7 seconds
3rd place <not posted> (V Coder - a variation of the initial Sep 2 code using setc bl) calculation in 7 seconds
4th place <initial code> (V Coder Sep 2) calculation in 10 seconds
5th place <a> (V Coder - Sep 8 with carry in carry flag after Sephiroth3 - Sep 7) calculation in 25 seconds

Total time with startup & save measured with Randy's howlong.

With a terminal dataset twice as large, b and c completed the calculation in 23 seconds (4 times as long), and the initial code took 44 seconds. Including startup and save, b took 25.7 seconds and c 26.1 seconds according to howlong. b is therefore just under twice as fast as the original routine. Furthermore, all things being equal, b should save about a minute every hour over c.
Posted on 2004-09-08 23:03:10 by V Coder
Update:
Arithmetic (add, sub, ...), logic (and, or, ..., which zero the carry) and shift (shr, rcl, ...) instructions all modify the carry flag; only lea, dec and inc leave it alone. After routines a, b and d, none of the flag-modifying instructions can therefore be used until the carry has been consumed. Loops that deal with long numbers have to use lea and dec to test for the termination condition.

By contrast, the routine with sets bl stores the carry in a register straight away, so that the faster add (index)/sub (number of iterations) can be used to test for the loop condition.

But this is not enough to edge <c> just ahead of <b>.
Posted on 2004-12-30 22:43:08 by V Coder
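
The loop over a long number can be sketched in C, where the carry lives in an ordinary variable much like the sets bl variant keeps it in a register. This is only a model of the idea, assuming the number is stored as 32-bit packed BCD words with the least significant word first; the name `bcd_double_long` and that layout are assumptions, not taken from the posts.

```c
#include <stddef.h>
#include <stdint.h>

/* Double a multi-word packed BCD number in place.
   words[0] holds the eight least significant digits.
   Returns the carry out of the most significant word. */
unsigned bcd_double_long(uint32_t *words, size_t count, unsigned carry)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t x = words[i];
        uint32_t t = (x + 0x33333333u) & 0x88888888u;  /* bit 3 set where digit >= 5 */

        words[i] = x + x + (carry & 1u) + t - (t >> 2);
        carry = t >> 31;            /* top digit >= 5 carries into the next word */
    }
    return carry;
}
```

For example, the two words {99999999h, 00000004h} (the number 499999999) become {99999998h, 00000009h} with a carry out of 0.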