A simple fast help dlg, to change div by const to mul,
it makes whole operation ~ 4 time faster.
Some time ago it was new method wich was independenty discovered by several people, including me.
Now I submit it in easy to use calculation prog for anybody who cares.
Enjoy. Examples in window is just examples, but Magic number and number of shifts are real.
Source included so you can understand all this simple magic :)
Posted on 2002-04-19 17:46:56 by The Svin
Very interesting it using the logic here it sped up some of algo's quite a bit.

Posted on 2002-04-19 18:28:41 by Betrayed
Added it to my RadASM tools menu, thank you.

invoke dwtoa,shift,offset bstring
This invoke is duplicate?
Conversion done prior.
	invoke dwtoa,shift,offset bstring

invoke uqw2a,offset MagicNumber,offset magic
pop eax
mov esi,offset msgp2
sahf ;???????? ????????? ????????? ? ?????
je @powerof2
mov esi,offset buffer
mov edi,offset AlgoStringB
jc @lower
mov edi,offset AlgoStringC
mov byte ptr [esi],0
invoke szMultiCat,3,esi,offset StrMagic,edi,offset shiftstr

invoke SetDlgItemText,hWnd,11,esi
jmp @r
invoke MessageBox,hWnd,offset msgerr,0,0
xor esi,esi
jmp @powerof2
Also, I have not tested this, but it looks good:

magic dword ?
shift dword ?

r dd 0,80000000h,3fffh + 32 ; TBYTE 1.0 * 2^32
; these three lines fill in the constant above in [r]
; fld1
; fstp tbyte ptr [r]
; add [r+8],32

mov [r+8],3fffh + 32 ; to make code re-entrant
bsr ecx,eax
push eax
add [r+8],ecx
mov shift,ecx
fld tbyte ptr [r]
fidiv dword ptr [esp] ;???????? ?? x
fst st(1) ;???????? ????????? ? st1 ??? ????????? ? ???????????
fistp [magic] ;???????? magic
fild [magic] ;??????? ??? ??? ?????????
fcompp ;??????? ??? ?? ????? ???????
fstsw ax ;
pop ecx
push eax
Cute icon! :grin:
Posted on 2002-04-19 18:54:27 by bitRAKE
The Svin, nice tool!:alright:
Could you please explain how it works for me, because I don't really understand the source-code (nor the russian comments ;))...

Posted on 2002-04-20 05:26:23 by Delight
At all simplicity this utility has many errors.
At the first number which has occurred.
Divider:2222222222 (84746B8Eh)

MagicNumber = 4449834257 (1093B1511h - cheerful 32-bit arithmetic)
mov eax, X
mov edx, MagicNumber
inc eax
mul edx
SHR edx, 31

As it is visible, that X it can not be equal 0FFFFFFFFh - a source of the future gluks the program.
Be cautious.
Posted on 2002-04-20 07:50:05 by Nexo
Yes, Nexo. X can not be 0FFFF FFFFh.
I'll take your notions to test and improve the app in a future.
Thanx for spending your time with it.
Posted on 2002-04-20 08:42:13 by The Svin
with divider >= 8000 0000h (including FFFF FFFFh)
result of division maybe only 1 or 0.
So I can add code to my app wich desplay such solution in the case:
(app check for signed bit in typed divider and if SF goes to display such text:)

mov eax,X
mov edx,1
cmp eax,divider
sbb edx,0

It will cover all "glucs" :)
What do you think?
Posted on 2002-04-20 09:02:41 by The Svin
My code is very akward :)
So if you feel to optimize it - just do it.
It was coded in a minute as if I did some batch file for temp usage.
I just got bored when again faced the fact that I need again calculate those numbers manualy and wrote some premitive code so that in next time it would be computer problem not mine :)
Posted on 2002-04-20 09:17:19 by The Svin
I handled all glucs mentioned by Nexo, and removed one needless invoke mentioned by bitRake.
At all simplicity this utility has many errors.

No mentioned errors anymore ;)
Posted on 2002-04-20 09:33:44 by The Svin
I do:

cmp eax,Divider
sbb eax,eax
inc eax
Posted on 2002-04-20 10:01:23 by Nexo
The Svin, you algo wrong :)
Second try and I get follow result
Divider=2147483647 - nearest divide bug

Divider is power of 2
To divide X by the divider use:
mov edx,X
SHR edx, 30

With my prog I recive:

;eax-any unsigned
mov edx,080000001h
mul edx
add eax,080000001h
adc edx,0
shr edx,30

May be it is identical ;)
Posted on 2002-04-20 10:43:03 by Nexo
The Svin, you algo wrong

You are just amazing Nexo with your tries to find magic number
for mul replacement with dividers close to 8000 0000h :)
Had you understood the algo you would have never wrote such nonsence :)
Algo is matemicaly right the problem is precision in FPU using fcomp cause in the case
x = 2147483649,0000000004656612875246
y = 2147483649
it gives the result x = y.
I wouldn't use mul to divide x by anything close to 8000 0000h
I would use sub.
But since I submit the app written for myself for anybody use
I promice to write one more solution for divider close to 80000000h :)

cmp eax,Divider
sbb eax,eax
inc eax

Yes, that's better. Thanx.
Posted on 2002-04-20 12:24:48 by The Svin
Svin, this is an interesting discussion started by your tool. Is there an easy way to get the remainder without DIV? I understand we can use two MUL, but there are many dependancies.
Posted on 2002-04-20 12:36:21 by bitRAKE
bitRake, I don't know universal way to do it
but in particular situation (as in dwtoa from M32lib) I would use
relpacement for mul by a few lea , add (and\or) sh operation.
Since we know divider in design time we can use fast replacement
for mul to do with result and then sub it from X.

ebx = X
divider = 3
MagicNumber = 2863311531
mov eax,ebx
mov edx, MagicNumber
mul edx
SHR edx, 1
lea ecx,[edx*2][edx]
sub ebx,ecx ; ebx = reminder edx = integer part of X\3

In case when divider in power of 2 we use and to get reminder.

Nexo, I'v changed my mind.
I'll stick with my example.
It's a little bigger but I clock faster.

mov eax,X
cmp eax,Divider
sbb eax,eax
inc eax

all in dependency wich make it impossible to do in parralel.
I think, you can find one better solution following the same logic.
Posted on 2002-04-20 12:50:19 by The Svin
OK I include one more case for values:
for from 2147483626 to 2147483647;
that handles all cases that had problem with FPU precision
while comparing.
Curious if Nexo can find anithing else :)
Posted on 2002-04-20 14:47:35 by The Svin
Here it is not necessary to search for bad magic numbers. Weak places it is visible at once on algorithm. The algorithm is always coupled at the system of operations of the processor. It not problem FPU. It is incorrect interpreting of execution of FPU operations. I observe algorithm on your source codes, because this direct reflections of a mathematical model. Therefore the initial algorithm is wrong.
For the last version of the program in some cases DIV works faster in 2,725 times faster. For example, for the divider equal 134217727. The code is more preferable:

mov eax, 8000001h
mul edx
add eax, 8000001h
adc edx, 0
shr edx, 22

cmp eax, Divider
sbb eax, eax
inc eax

Works on 25 % faster on P6 and K7 CPU. As it is easier for building into expressions on a context. For example, a=b/222222222222+2c-3.

cmp ebx, 222222222222
sbb ebx, ebx
lea eax, [ebx+1+2*ecx-3]

I is simpl want to warn the others from them CPU of errors in rate of operation.
Posted on 2002-04-20 15:54:49 by Nexo
I is simpl want to warn the others from them CPU of errors in rate of operation.
?= I simply want to warn others of errors from CPU in rate of operation.

I've simply converted the sentence to the meaning I've assumed it to mean. Nexo, I think your trying to say a lot in one sentence. :) What is 'rate of operation'? I'd like to understand the depth of what you mean. There is very nice work here and else where by you - it would be nice if we could communicate more fluidly.
Posted on 2002-04-20 16:57:51 by bitRAKE
Nexo, why do you for the second time comparing your 3 instruction code:

cmp eax, Divider
sbb eax, eax
inc eax

with my 4 instruction?
add to your code
mov eax,X
and it will be slower at any processor.
My code among other things loading X into register, your not.
Include loading code then compare.

About math model - it's nothing but hollow words.
Algo needs to know if fraction is > 0.5.
If you knew how to do it right way you would say something
like(what I guessed without your super unexplicit help)
"Svin, if fcomp returns equal - in your algo it yet would mean that
fraction < 0.5"
Or would show different method of comparing.
You did nothing of the above.
You said nothing practical of what math part was wrong, just unprecise words that it was wrong.
So don't bull**** me about one glance understanding ;)
Write your own app without Stepan procs with right math with the
same perpose and then we'll talk.
He uses Granlund T. - Montgomery P. method wich is close to mine but a little bit different. At least I understand method
from math point of view, the only problem trunslate math words in English (in what I not good at).
So next time you talk of math - write math proof to clearify what yor mean if you really can see it in source.
The only thing you did - showed errors, for that - thank you,
the rest has absolutly no meaning until you clearify it.
below is corrected fcomp.
Posted on 2002-04-20 19:08:12 by The Svin
If to observe on clock ticks both alternatives are equal. If to observe this code in environment of calculations my code is fulfilled faster because it is shorter. I made calculations in a context of the various equations and have received appropriate result. As the side benefit, was remarked possibility usage of one register, that else sometimes in addition improved the resulting code.

Yes, I mustered your code using built - in DIV utility in RAD assembler application from Stepan Polovnikov. What for to me to create that is already created. When I is simpl can use built - in assembler macros divisions (FASTDIV) ;)

I think, that people will use your utility, and additional problems in their code are not necessary for them, only therefore to have to train your brain.

Test for power of 2:

lea ecx,[eax-1]
test eax,ecx
je power2

Use only one addition register. Does not change the source register. Shorter. Faster.

Good luck.
Posted on 2002-04-21 01:38:08 by Nexo
bitRAKE, thank you. You correctly have understood me... Can be :) Sometimes I use a WEB-translator. I really speak much in one sentence - on the average 3 lines on sentence ;) In the future I shall try to communicate more fluidly.
Posted on 2002-04-21 01:59:25 by Nexo