A simple fast help dlg, to change div by const to mul,

it makes whole operation ~ 4 time faster.

Some time ago it was new method wich was independenty discovered by several people, including me.

Now I submit it in easy to use calculation prog for anybody who cares.

Enjoy. Examples in window is just examples, but Magic number and number of shifts are real.

Source included so you can understand all this simple magic :)

it makes whole operation ~ 4 time faster.

Some time ago it was new method wich was independenty discovered by several people, including me.

Now I submit it in easy to use calculation prog for anybody who cares.

Enjoy. Examples in window is just examples, but Magic number and number of shifts are real.

Source included so you can understand all this simple magic :)

Very interesting it using the logic here it sped up some of algo's quite a bit.

Thanks

Thanks

Added it to my RadASM tools menu, thank you.

Conversion done prior.

```
@powerof2:
```

invoke dwtoa,shift,offset bstring

This *invoke*is duplicate?Conversion done prior.

```
invoke dwtoa,shift,offset bstring
```

invoke uqw2a,offset MagicNumber,offset magic

pop eax

mov esi,offset msgp2

sahf ;???????? ????????? ????????? ? ?????

je @powerof2

mov esi,offset buffer

mov edi,offset AlgoStringB

jc @lower

mov edi,offset AlgoStringC

@lower:

mov byte ptr [esi],0

invoke szMultiCat,3,esi,offset StrMagic,edi,offset shiftstr

@powerof2:

invoke SetDlgItemText,hWnd,11,esi

jmp @r

@DivBy0:

invoke MessageBox,hWnd,offset msgerr,0,0

xor esi,esi

jmp @powerof2

Also, I have not tested this, but it looks good:```
.data
```

magic dword ?

shift dword ?

r dd 0,80000000h,3fffh + 32 ; TBYTE 1.0 * 2^32

.code

; these three lines fill in the constant above in [r]

; fld1

; fstp tbyte ptr [r]

; add [r+8],32

mov [r+8],3fffh + 32 ; to make code re-entrant

bsr ecx,eax

push eax

add [r+8],ecx

mov shift,ecx

fld tbyte ptr [r]

fidiv dword ptr [esp] ;???????? ?? x

fst st(1) ;???????? ????????? ? st1 ??? ????????? ? ???????????

fistp [magic] ;???????? magic

fild [magic] ;??????? ??? ??? ?????????

fcompp ;??????? ??? ?? ????? ???????

fstsw ax ;

pop ecx

push eax

Cute icon! :grin:The Svin, nice tool!:alright:

Could you please explain how it works for me, because I don't really understand the source-code (nor the russian comments ;))...

Thanks,

Delight

Could you please explain how it works for me, because I don't really understand the source-code (nor the russian comments ;))...

Thanks,

Delight

At all simplicity this utility has many errors.

At the first number which has occurred.

Divider:2222222222 (84746B8Eh)

MagicNumber = 4449834257 (1093B1511h - cheerful 32-bit arithmetic)

mov eax, X

mov edx, MagicNumber

inc eax

mul edx

SHR edx, 31

As it is visible, that X it can not be equal 0FFFFFFFFh - a source of the future gluks the program.

Be cautious.

At the first number which has occurred.

Divider:2222222222 (84746B8Eh)

MagicNumber = 4449834257 (1093B1511h - cheerful 32-bit arithmetic)

mov eax, X

mov edx, MagicNumber

inc eax

mul edx

SHR edx, 31

As it is visible, that X it can not be equal 0FFFFFFFFh - a source of the future gluks the program.

Be cautious.

Yes, Nexo. X can not be 0FFFF FFFFh.

I'll take your notions to test and improve the app in a future.

Thanx for spending your time with it.

I'll take your notions to test and improve the app in a future.

Thanx for spending your time with it.

Nexo,

with divider >= 8000 0000h (including FFFF FFFFh)

result of division maybe only 1 or 0.

So I can add code to my app wich desplay such solution in the case:

(app check for signed bit in typed divider and if SF goes to display such text:)

It will cover all "glucs" :)

What do you think?

with divider >= 8000 0000h (including FFFF FFFFh)

result of division maybe only 1 or 0.

So I can add code to my app wich desplay such solution in the case:

(app check for signed bit in typed divider and if SF goes to display such text:)

```
```

mov eax,X

mov edx,1

cmp eax,divider

sbb edx,0

It will cover all "glucs" :)

What do you think?

bitRAKE,

My code is very akward :)

So if you feel to optimize it - just do it.

It was coded in a minute as if I did some batch file for temp usage.

I just got bored when again faced the fact that I need again calculate those numbers manualy and wrote some premitive code so that in next time it would be computer problem not mine :)

My code is very akward :)

So if you feel to optimize it - just do it.

It was coded in a minute as if I did some batch file for temp usage.

I just got bored when again faced the fact that I need again calculate those numbers manualy and wrote some premitive code so that in next time it would be computer problem not mine :)

I handled all glucs mentioned by Nexo, and removed one needless invoke mentioned by bitRake.

No mentioned errors anymore ;)

At all simplicity this utility has many errors.

No mentioned errors anymore ;)

I do:

```
```

cmp eax,Divider

sbb eax,eax

inc eax

The Svin, you algo wrong :)

Second try and I get follow result

Divider=2147483647 - nearest divide bug

Divider is power of 2

To divide X by the divider use:

mov edx,X

SHR edx, 30

With my prog I recive:

May be it is identical ;)

Second try and I get follow result

Divider=2147483647 - nearest divide bug

Divider is power of 2

To divide X by the divider use:

mov edx,X

SHR edx, 30

With my prog I recive:

```
```

;eax-any unsigned

mov edx,080000001h

mul edx

add eax,080000001h

adc edx,0

shr edx,30

May be it is identical ;)

The Svin, you algo wrong

You are just amazing Nexo with your tries to find magic number

for mul replacement with dividers close to 8000 0000h :)

Had you understood the algo you would have never wrote such nonsence :)

Algo is matemicaly right the problem is precision in FPU using fcomp cause in the case

x = 2147483649,0000000004656612875246

y = 2147483649

it gives the result x = y.

I wouldn't use mul to divide x by anything close to 8000 0000h

I would use sub.

But since I submit the app written for myself for anybody use

I promice to write one more solution for divider close to 80000000h :)

```
```

cmp eax,Divider

sbb eax,eax

inc eax

Yes, that's better. Thanx.

**Svin**, this is an interesting discussion started by your tool. Is there an easy way to get the remainder without DIV? I understand we can use two MUL, but there are many dependancies.

bitRake, I don't know universal way to do it

but in particular situation (as in dwtoa from M32lib) I would use

relpacement for mul by a few lea , add (and\or) sh operation.

Since we know divider in design time we can use fast replacement

for mul to do with result and then sub it from X.

example

In case when divider in power of 2 we use and to get reminder.

Nexo, I'v changed my mind.

I'll stick with my example.

It's a little bigger but I clock faster.

in

all in dependency wich make it impossible to do in parralel.

I think, you can find one better solution following the same logic.

but in particular situation (as in dwtoa from M32lib) I would use

relpacement for mul by a few lea , add (and\or) sh operation.

Since we know divider in design time we can use fast replacement

for mul to do with result and then sub it from X.

example

```
```

ebx = X

divider = 3

MagicNumber = 2863311531

mov eax,ebx

mov edx, MagicNumber

mul edx

SHR edx, 1

lea ecx,[edx*2][edx]

sub ebx,ecx ; ebx = reminder edx = integer part of X\3

In case when divider in power of 2 we use and to get reminder.

Nexo, I'v changed my mind.

I'll stick with my example.

It's a little bigger but I clock faster.

in

```
```

mov eax,X

cmp eax,Divider

sbb eax,eax

inc eax

all in dependency wich make it impossible to do in parralel.

I think, you can find one better solution following the same logic.

OK I include one more case for values:

for from 2147483626 to 2147483647;

that handles all cases that had problem with FPU precision

while comparing.

Curious if Nexo can find anithing else :)

for from 2147483626 to 2147483647;

that handles all cases that had problem with FPU precision

while comparing.

Curious if Nexo can find anithing else :)

Here it is not necessary to search for bad magic numbers. Weak places it is visible at once on algorithm. The algorithm is always coupled at the system of operations of the processor. It not problem FPU. It is incorrect interpreting of execution of FPU operations. I observe algorithm on your source codes, because this direct reflections of a mathematical model. Therefore the initial algorithm is wrong.

For the last version of the program in some cases DIV works faster in 2,725 times faster. For example, for the divider equal 134217727. The code is more preferable:

Works on 25 % faster on P6 and K7 CPU. As it is easier for building into expressions on a context. For example, a=b/222222222222+2c-3.

I is simpl want to warn the others from them CPU of errors in rate of operation.

For the last version of the program in some cases DIV works faster in 2,725 times faster. For example, for the divider equal 134217727. The code is more preferable:

```
```

mov eax, 8000001h

mul edx

add eax, 8000001h

adc edx, 0

shr edx, 22

```
```

cmp eax, Divider

sbb eax, eax

inc eax

Works on 25 % faster on P6 and K7 CPU. As it is easier for building into expressions on a context. For example, a=b/222222222222+2c-3.

```
```

cmp ebx, 222222222222

sbb ebx, ebx

lea eax, [ebx+1+2*ecx-3]

I is simpl want to warn the others from them CPU of errors in rate of operation.

I is simpl want to warn the others from them CPU of errors in rate of operation.

?= I simply want to warn others of errors from CPU in rate of operation.
I've simply converted the sentence to the meaning I've assumed it to mean.

**Nexo**, I think your trying to say a lot in one sentence. :) What is 'rate of operation'? I'd like to understand the depth of what you mean. There is very nice work here and else where by you - it would be nice if we could communicate more fluidly.

Nexo, why do you for the second time comparing your 3 instruction code:

with my 4 instruction?

add to your code

mov eax,X

and it will be slower at any processor.

My code among other things loading X into register, your not.

Include loading code then compare.

About math model - it's nothing but hollow words.

Algo needs to know if fraction is > 0.5.

If you knew how to do it right way you would say something

like(what I guessed without your super unexplicit help)

"Svin, if fcomp returns equal - in your algo it yet would mean that

fraction < 0.5"

Or would show different method of comparing.

You did nothing of the above.

You said nothing practical of what math part was wrong, just unprecise words that it was wrong.

So don't bull**** me about one glance understanding ;)

Write your own app without Stepan procs with right math with the

same perpose and then we'll talk.

He uses Granlund T. - Montgomery P. method wich is close to mine but a little bit different. At least I understand method

from math point of view, the only problem trunslate math words in English (in what I not good at).

So next time you talk of math - write math proof to clearify what yor mean if you really can see it in source.

The only thing you did - showed errors, for that - thank you,

the rest has absolutly no meaning until you clearify it.

below is corrected fcomp.

```
```

cmp eax, Divider

sbb eax, eax

inc eax

with my 4 instruction?

add to your code

mov eax,X

and it will be slower at any processor.

My code among other things loading X into register, your not.

Include loading code then compare.

About math model - it's nothing but hollow words.

Algo needs to know if fraction is > 0.5.

If you knew how to do it right way you would say something

like(what I guessed without your super unexplicit help)

"Svin, if fcomp returns equal - in your algo it yet would mean that

fraction < 0.5"

Or would show different method of comparing.

You did nothing of the above.

You said nothing practical of what math part was wrong, just unprecise words that it was wrong.

So don't bull**** me about one glance understanding ;)

Write your own app without Stepan procs with right math with the

same perpose and then we'll talk.

He uses Granlund T. - Montgomery P. method wich is close to mine but a little bit different. At least I understand method

from math point of view, the only problem trunslate math words in English (in what I not good at).

So next time you talk of math - write math proof to clearify what yor mean if you really can see it in source.

The only thing you did - showed errors, for that - thank you,

the rest has absolutly no meaning until you clearify it.

below is corrected fcomp.

If to observe on clock ticks both alternatives are equal. If to observe this code in environment of calculations my code is fulfilled faster because it is shorter. I made calculations in a context of the various equations and have received appropriate result. As the side benefit, was remarked possibility usage of one register, that else sometimes in addition improved the resulting code.

Yes, I mustered your code using built - in DIV utility in RAD assembler application from Stepan Polovnikov. What for to me to create that is already created. When I is simpl can use built - in assembler macros divisions (FASTDIV) ;)

I think, that people will use your utility, and additional problems in their code are not necessary for them, only therefore to have to train your brain.

Test for power of 2:

Use only one addition register. Does not change the source register. Shorter. Faster.

Good luck.

Yes, I mustered your code using built - in DIV utility in RAD assembler application from Stepan Polovnikov. What for to me to create that is already created. When I is simpl can use built - in assembler macros divisions (FASTDIV) ;)

I think, that people will use your utility, and additional problems in their code are not necessary for them, only therefore to have to train your brain.

Test for power of 2:

```
```

lea ecx,[eax-1]

test eax,ecx

je power2

Use only one addition register. Does not change the source register. Shorter. Faster.

Good luck.

**bitRAKE**, thank you. You correctly have understood me... Can be :) Sometimes I use a WEB-translator. I really speak much in one sentence - on the average 3 lines on sentence ;) In the future I shall try to communicate more fluidly.