I would like to double a packed BCD number.

E.g., 1234 decimal can be represented thusly in 32 bits:

$000004D2 hexadecimal conversion

$01020304 BCD (each digit stored in a byte)

or $00001234 packed BCD (each digit stored in a nibble)
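To check results outside the assembler, the three encodings can be produced with a few lines of C. This is only a sketch: the function names are made up here, and digits are assumed to be stored least significant first, as in the examples above.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a binary value (at most 8 decimal digits) into packed BCD:
   one digit per 4-bit nibble, least significant digit in the low nibble. */
static uint32_t to_packed_bcd(uint32_t n)
{
    assert(n <= 99999999u);
    uint32_t bcd = 0;
    for (unsigned shift = 0; n != 0; shift += 4, n /= 10)
        bcd |= (n % 10u) << shift;
    return bcd;
}

/* Unpacked BCD: one digit per byte, so only 4 digits fit in 32 bits. */
static uint32_t to_unpacked_bcd(uint32_t n)
{
    assert(n <= 9999u);
    uint32_t bcd = 0;
    for (unsigned shift = 0; n != 0; shift += 8, n /= 10)
        bcd |= (n % 10u) << shift;
    return bcd;
}

/* Inverse of to_packed_bcd: read the nibbles back as decimal digits. */
static uint32_t from_packed_bcd(uint32_t bcd)
{
    uint32_t n = 0;
    for (int shift = 28; shift >= 0; shift -= 4)
        n = n * 10u + ((bcd >> shift) & 0xFu);
    return n;
}
```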

A complete routine for adding two packed BCD numbers is given here http://www.asmcommunity.net/board/viewtopic.php?t=13215

What I want to do now is multiply a packed BCD number by 2. The following code works (and handles carry in and out). It exploits one simplification of the addition algorithm: when a binary number is added to itself, one normally just shifts....

Can it be sped up any? Thanks.


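```
mov ([esi+mda], eax);    // eax = x, the packed BCD input
add ($33333333, eax);    // +3 per digit: a nibble reaches 8 iff its digit is >= 5
mov (eax, edx);
xor (-1, edx);           // edx = not eax
and ($88888888, edx);    // edx = 8 in each nibble whose digit is < 5
shr (1, ebx);            // carry in -> CF
rcl (1, eax);            // eax = 2*(x + $33333333) + carry in
rcl (1, ebx);            // carry out (top digit >= 5) -> bit 0 of ebx
sub (edx, eax);          // 2d + 6 - 8 where the digit is < 5 ...
shr (2, edx);            // edx = 2 in each such nibble
add (edx, eax);          // ... + 2 leaves 2d there; digits >= 5 keep 2d + 6
```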
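Modelled in C, the routine adds 3 to every digit, doubles, and then takes 6 back from each digit that was below 5. This is a sketch for checking the arithmetic, not part of the original post; `double_bcd_orig` is a made-up name, and the carry out is returned in bit 32 so it reads as a ninth BCD digit.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the HLA routine: x is a valid 8-digit packed BCD value and
   carry_in is 0 or 1 (the bit in ebx). Returns (carry_out << 32) | result. */
static uint64_t double_bcd_orig(uint32_t x, uint32_t carry_in)
{
    assert(carry_in <= 1u);
    uint32_t eax = x + 0x33333333u;         /* +3 per digit: nibble >= 8 iff digit >= 5 */
    uint32_t edx = ~eax & 0x88888888u;      /* 8 in each nibble whose digit is < 5      */
    uint32_t carry_out = eax >> 31;         /* the bit rcl shifts out of eax            */
    eax = (eax << 1) | carry_in;            /* rcl: 2*(x + 0x33333333) + carry in       */
    eax -= edx;                             /* 2d + 6 - 8 where the digit is < 5 ...    */
    eax += edx >> 2;                        /* ... + 2 = 2d; digits >= 5 keep 2d + 6    */
    return ((uint64_t)carry_out << 32) | eax;
}
```

For example, `double_bcd_orig(0x50000000u, 1)` gives `0x100000001`, i.e. 2 * 50000000 + 1 = 100000001.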

Okay, but I'm going to write this in a normal way:

```
mov ecx,[esi+mda]
lea eax,[ecx+0x33333333]
and eax,0x88888888
shr eax,1
add ecx,eax
shr eax,1
add eax,ecx
```

The '0x33' hex notation belongs to C (and other HLLs).

Almost every assembler's hex notation comes in two forms, "33h" and "$33": 33h is what an asm programmer writes in source, while $33 is what online disassemblers display.

```
mov ecx, [esi+mda]
lea eax, [ecx+33333333h]
and eax, 88888888h
shr eax, 1
add ecx, eax
shr eax, 1
add eax, ecx
```

Ah, sorry, I had just been using NASM and grown accustomed to writing 0x for some reason :P

Thanks guys. But it does not work. With input 123456h, the output is eax=1234BCh, ecx=12349Ah; it should be 246912h. Please also check out the other links. I like the brevity of your routine though, so I'm looking at it further. Do you have a source for it, or material on packed BCD arithmetic? Thanks.
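Transcribing that version into C reproduces exactly the eax/ecx values reported here: it applies the decimal adjustment to x but never doubles x. A sketch only; `struct regs` and `buggy_double` are names made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Both register values are returned, since both were reported. */
struct regs { uint32_t eax, ecx; };

/* C transcription of the routine that was missing an instruction. */
static struct regs buggy_double(uint32_t x)
{
    struct regs r;
    r.ecx = x;                                    /* mov ecx,[esi+mda]           */
    r.eax = (r.ecx + 0x33333333u) & 0x88888888u;  /* lea + and: 8 per digit >= 5 */
    r.eax >>= 1;                                  /* 4 per digit >= 5            */
    r.ecx += r.eax;                               /* x + 4: x is never doubled   */
    r.eax >>= 1;                                  /* 2 per digit >= 5            */
    r.eax += r.ecx;                               /* x + 6 per digit >= 5        */
    return r;
}
```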

Ah, you're right, I missed an instruction!

I now also made it return the value in ECX to save one byte and one shift instruction.

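```
mov ecx, [esi+mda]          ; ecx = x, the packed BCD input
lea eax, [ecx+33333333h]    ; +3 per digit: a nibble reaches 8 iff its digit is >= 5
and eax, 88888888h          ; eax = 8 in each nibble whose digit is >= 5
add ecx,ecx                 ; ecx = 2x (plain binary doubling)
add ecx,eax                 ; +8 per digit >= 5 ...
shr eax,2                   ; eax = 2 per digit >= 5
sub ecx,eax                 ; ... -2: net +6, the decimal adjust; result in ECX
```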
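The idea behind this version: after adding 3, a nibble has bit 3 set exactly when its digit is 5 or more, which is exactly when 2d exceeds 9 and needs the +6 decimal adjust. The 6 is then built as 8 - 2 from that mask. A C sketch of the same steps, with a slow digit-by-digit reference for cross-checking (both function names are made up here):

```c
#include <assert.h>
#include <stdint.h>

/* The corrected doubler in C: x is a valid 8-digit packed BCD number;
   the carry out of the top digit is dropped, as in the asm. */
static uint32_t double_bcd(uint32_t x)
{
    uint32_t t = (x + 0x33333333u) & 0x88888888u; /* 8 in each nibble whose digit is >= 5 */
    uint32_t r = x + x;                            /* binary doubling: 2d per digit       */
    r += t;                                        /* +8 per digit >= 5 ...               */
    r -= t >> 2;                                   /* ... -2: net +6, the decimal adjust  */
    return r;
}

/* Slow digit-by-digit reference, handy for cross-checking. */
static uint32_t double_bcd_ref(uint32_t x)
{
    uint32_t r = 0, carry = 0;
    for (unsigned shift = 0; shift < 32; shift += 4) {
        uint32_t d = 2u * ((x >> shift) & 0xFu) + carry;
        assert(d <= 19u);                          /* holds for valid BCD input */
        carry = d >= 10u;
        r |= (d - 10u * carry) << shift;
    }
    return r;
}
```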


That is great. Four instructions fewer than mine!!! How did you do it? BTW, one thing is missing though: how do we get the carry, please, in case of doubling eight-digit numbers?


Another question. How would you extend this to a generic add for packed BCD please? Also with carry?

Do you have any other links on the topic please? Thanks.

I got it. This handles the carry...

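```
mov ecx, [esi+mda]          ; ecx = x
lea eax, [ecx+33333333h]
and eax, 88888888h          ; SF = bit 31 = top digit >= 5, i.e. the carry out
lea ecx, [ecx*2+ebx]        ; ecx = 2x + carry in (ebx must be 0 or 1); flags untouched
; add ecx,ecx
sets bl                     ; bl = carry out, ready for the next word
add ecx,eax
shr eax,2
sub ecx,eax                 ; result in ECX
```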
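The carry out here is bit 31 of the masked value, which `and eax,88888888h` leaves in the sign flag for `sets bl`; the carry in comes from ebx through the `lea`. A C sketch of that, plus a hypothetical two-word (16-digit) doubler showing how the carry is passed from the low word to the high word. Names are made up; the carry out is returned in bit 32.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the lea/sets version: returns (carry_out << 32) | result. */
static uint64_t double_bcd_carry(uint32_t x, uint32_t carry_in)
{
    assert(carry_in <= 1u);                         /* ebx must hold 0 or 1          */
    uint32_t t = (x + 0x33333333u) & 0x88888888u;   /* 8 per digit >= 5              */
    uint32_t carry_out = t >> 31;                   /* what sets bl reads from SF    */
    uint32_t r = x * 2u + carry_in;                 /* lea ecx,[ecx*2+ebx]           */
    r += t;                                         /* add ecx,eax                   */
    r -= t >> 2;                                    /* shr/sub: net +6 per digit >= 5 */
    return ((uint64_t)carry_out << 32) | r;
}

/* 16 digits in one uint64_t: low word first, its carry feeds the high word. */
static uint64_t double_bcd16(uint64_t x)
{
    uint64_t lo = double_bcd_carry((uint32_t)x, 0);
    uint64_t hi = double_bcd_carry((uint32_t)(x >> 32), (uint32_t)(lo >> 32));
    return (hi << 32) | (uint32_t)lo;               /* final carry out is dropped */
}
```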

Still, how would you extend this to generic packed BCD add please. Thanks.

I think I did it in about the same way as you did, but with reversed logic.


Maybe you can add like this?

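```
mov eax,[valuea]
mov ecx,[valueb]
mov edx,eax
adc eax,ecx          ; binary sum s = a + b + CF (carry in)
rcl ebx,1            ; save the binary carry out in bit 0 of ebx
xor ecx,edx          ; ecx = a xor b
mov edx,eax          ; edx = s
add eax,66666666h    ; s + 6 in every digit
adc ebx,0            ; bit 0 of ebx = decimal carry out of the top digit
xor eax,ecx          ; bit 4k = decimal carry out of digit k-1
shr ebx,1            ; CF = decimal carry out (ebx shifted back)
rcr eax,3            ; carries to bits 4k+1; the one from CF lands in bit 29
and eax,22222222h    ; eax = 2 per digit that produced a decimal carry
lea eax,[eax*2+eax]  ; 6 per such digit
add edx,eax          ; decimal-adjust s
shl eax,2            ; CF = bit 30 = decimal carry out
; Result in EDX, upper bit of EBX cleared
```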
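Why this works: adding 6 to every digit makes a digit pair carry out of its nibble exactly when the decimal sum is 10 or more, and `u ^ a ^ b` exposes those carries at bits 4k (the 6 has a 0 there). Each digit that carried then needs +6 on the plain binary sum. A C sketch following the asm step by step; `add_bcd` is a made-up name, and the two 64-bit additions stand in for the hardware carry flag.

```c
#include <assert.h>
#include <stdint.h>

/* a, b: valid 8-digit packed BCD; carry_in: 0 or 1 (CF on entry).
   Returns (decimal carry out << 32) | sum. */
static uint64_t add_bcd(uint32_t a, uint32_t b, uint32_t carry_in)
{
    assert(carry_in <= 1u);
    uint64_t wide = (uint64_t)a + b + carry_in;
    uint32_t s = (uint32_t)wide;                   /* adc eax,ecx: binary sum         */
    uint32_t c1 = (uint32_t)(wide >> 32);          /* rcl ebx,1: binary carry out     */
    uint64_t wide2 = (uint64_t)s + 0x66666666u;
    uint32_t u = (uint32_t)wide2;                  /* add eax,66666666h               */
    uint32_t cout = c1 + (uint32_t)(wide2 >> 32);  /* adc ebx,0: at most one is set  */
    uint32_t carries = u ^ a ^ b;                  /* bit 4k = carry out of digit k-1 */
    /* rcr eax,3 + and: bit 4k+4 -> bit 4k+1, top carry (from CF) -> bit 29 */
    uint32_t two = ((carries >> 3) & 0x22222222u) | (cout << 29);
    uint32_t six = two * 3u;                       /* lea eax,[eax*2+eax]             */
    return ((uint64_t)cout << 32) | (uint32_t)(s + six); /* add edx,eax             */
}
```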

The doubler with carry can be implemented like this:

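```
mov ecx, [esi+mda]
lea eax,[ecx+33333333h]
adc ecx,ecx           ; ecx = 2x + carry in (CF)
and eax,88888888h     ; eax = 8 per digit >= 5 (clears CF, already consumed)
add ecx,eax
shr eax,2             ; eax = 2 per digit >= 5
sub ecx,eax           ; net +6 per digit >= 5
shl eax,3             ; CF = old bit 29 = carry out of the top digit
; Result in ECX
```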
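The final `shl eax,3` is there only for its side effect: the last bit shifted out is bit 29, which holds the "2" of the top digit, so CF ends up as the carry out. A C sketch of the same sequence (`double_bcd_cf` is a made-up name; the carry is returned in bit 32):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t double_bcd_cf(uint32_t x, uint32_t carry_in)
{
    assert(carry_in <= 1u);
    uint32_t t = (x + 0x33333333u) & 0x88888888u;  /* lea + and: 8 per digit >= 5   */
    uint32_t r = x + x + carry_in;                 /* adc ecx,ecx                   */
    uint32_t two = t >> 2;                         /* shr eax,2: 2 per digit >= 5   */
    r += t;                                        /* add ecx,eax                   */
    r -= two;                                      /* sub ecx,eax: net +6           */
    uint32_t carry_out = (two >> 29) & 1u;         /* shl eax,3: last bit out is 29 */
    return ((uint64_t)carry_out << 32) | r;
}
```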

I'm afraid I don't know of any literature about BCD numerical processing.

I combined the two techniques above to produce:


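```
mov ecx, [esi+mda]
lea eax,[ecx+33333333h]
adc ecx,ecx           ; ecx = 2x + carry in
rcr eax,2             ; bit 4k+3 -> bit 4k+1 (also rotates CF into bit 30)
and eax,22222222h     ; eax = 2 per digit >= 5
lea eax,[eax*2+eax]   ; eax = 6 per digit >= 5
add ecx,eax           ; decimal adjust
shl eax,2             ; CF = old bit 30 = carry out
```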

However, it is one quarter the speed of

```
mov ecx, [esi+mda]
lea eax,[ecx+33333333h]
adc ecx,ecx
and eax,88888888h
add ecx,eax
shr eax,2
sub ecx,eax
shl eax,3
```

and

```
mov ecx, [esi+mda]
lea eax, [ecx+33333333h]
and eax, 88888888h
lea ecx, [ecx*2+ebx]
; add ecx,ecx
sets bl
add ecx,eax
shr eax,2
sub ecx,eax
```

Apparently the flags dependency after adc slows down the rcr, and we don't even use this carry!!! Changing rcr to shr brings it almost up to speed.

```
mov ecx, [esi+mda]
lea eax,[ecx+33333333h]
adc ecx,ecx
shr eax,2
and eax,22222222h
lea eax,[eax*2+eax]
add ecx,eax
shl eax,2
```
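Swapping rcr for shr is safe here because `rcr eax,2` and `shr eax,2` differ only in bits 30 and 31 (where rcr rotates in CF and the old bit 0), and `and eax,22222222h` throws both away. A C sketch modelling both variants; `rcr2` is a hypothetical emulation of the x86 rotate-through-carry, and `use_rcr` selects which variant runs.

```c
#include <assert.h>
#include <stdint.h>

/* Emulate x86 "rcr r32, 2": a 33-bit rotate right through CF (cf is 0 or 1). */
static uint32_t rcr2(uint32_t v, uint32_t cf)
{
    for (int i = 0; i < 2; i++) {
        uint32_t out = v & 1u;
        v = (v >> 1) | (cf << 31);
        cf = out;
    }
    return v;
}

/* Model of the rcr variant (use_rcr != 0) and the shr variant (use_rcr == 0).
   Returns (carry_out << 32) | result. */
static uint64_t double_bcd_ad(uint32_t x, uint32_t carry_in, int use_rcr)
{
    assert(carry_in <= 1u);
    uint32_t y = x + 0x33333333u;                /* lea: bit 4k+3 set iff digit k >= 5 */
    uint32_t cf = x >> 31;                       /* CF left by adc ecx,ecx             */
    uint32_t r = x + x + carry_in;               /* adc ecx,ecx                        */
    uint32_t e = use_rcr ? rcr2(y, cf) : y >> 2; /* rcr or shr eax,2                   */
    uint32_t two = e & 0x22222222u;              /* bits 30/31 masked off              */
    uint32_t six = two * 3u;                     /* lea eax,[eax*2+eax]                */
    r += six;                                    /* add ecx,eax                        */
    uint32_t carry_out = (six >> 30) & 1u;       /* shl eax,2: last bit out is 30      */
    return ((uint64_t)carry_out << 32) | r;
}
```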

Maybe you could just keep the code I posted, then :P Or is any of the other variants faster?

Consider the four routines above (Sep 8) to be labelled a, b, c and d. They rank as follows:

- **1st place:** b (Sephiroth3, Sep 7, carry in the carry flag) and c (V Coder, Sep 6, carry added from ebx and saved with sets bl, after Sephiroth3's Sep 6 code): calculation in 5 seconds (both approx. 5.75 seconds, truncated to 5). Total with startup and save: 8.5 seconds.
- **2nd place:** d (V Coder, Sep 8, carry in the carry flag, after Sephiroth3's Sep 7 code): calculation in 7 seconds.
- **3rd place:** not posted (V Coder, a variation of the initial Sep 2 code using setc bl): calculation in 7 seconds.
- **4th place:** the initial code (V Coder, Sep 2): calculation in 10 seconds.
- **5th place:** a (V Coder, Sep 8, carry in the carry flag, after Sephiroth3's Sep 7 code): calculation in 25 seconds.

Total time with startup and save was measured with Randy's *howlong*. With a terminal dataset twice as large, b and c completed the calculation in 23 seconds (4 times as long), and the initial code took 44 seconds. b took 25.7 seconds and c 26.1 seconds according to howlong. b is therefore just under twice as fast as the original routine. Furthermore, all things being equal, b should save a minute every hour over c.

**Update:**

Arithmetic instructions (add, sub, ... set the carry), logic instructions (and, or, ... clear it) and shift instructions (shr, rcl, ...) all affect the carry flag; lea, dec and inc do not. So none of the former can be used until the carry from routines a, b and d has been consumed. Loops that deal with long numbers therefore have to use lea (to advance the index) and dec (to count down and test for termination).

By contrast, routine c (the one with sets bl) moves the carry out of the flags, so the faster add (for the index) and sub (for the number of iterations) can be used to test the loop condition.

But this is not enough to edge c just ahead of b.
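As a rough illustration of such a loop, here is a C sketch that doubles a long packed BCD number word by word, keeping the carry in an ordinary variable between iterations, much as routine c keeps it in bl. The array layout (least significant word first) and the name `double_bcd_array` are assumptions for illustration; C hides the flag question entirely, so this only shows the data flow, not the instruction selection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Doubles the packed BCD number in words[0..n-1] (least significant word
   first) in place and returns the carry out of the most significant word. */
static uint32_t double_bcd_array(uint32_t *words, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t x = words[i];
        uint32_t t = (x + 0x33333333u) & 0x88888888u; /* 8 per digit >= 5          */
        uint32_t next = t >> 31;                       /* carry out (sets bl)       */
        uint32_t r = x * 2u + carry;                   /* lea ecx,[ecx*2+ebx]       */
        words[i] = r + t - (t >> 2);                   /* net +6 per digit >= 5     */
        carry = next;
    }
    return carry;
}
```

For example, doubling 4999999999999999 stored as `{0x99999999, 0x49999999}` leaves `{0x99999998, 0x99999999}` with no carry out.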