I don't get it.
Could you explain?
I thought of integer values as three groups.
1. x <> 0 and s bit = 0 (x>0)
2. x <> 0 and s bit = 1 (x<0)
3. x = 0
And that f(x) should return {1,-1,0} respectivly.
If I was wrong - what was the task?
And what format is data it eax?
Could you explain?
I thought of integer values as three groups.
1. x <> 0 and s bit = 0 (x>0)
2. x <> 0 and s bit = 1 (x<0)
3. x = 0
And that f(x) should return {1,-1,0} respectivly.
If I was wrong - what was the task?
And what format is data it eax?
Svin, the goal was for floating point numbers: -0.0 = 80000000h
Thanx for claryfication, bitRake.
We can do it with four instructions also,
and very few bytes of opcode:
We can do it with four instructions also,
and very few bytes of opcode:
add eax,eax ;2
je @F ;2
or al,1 ;2
salc ;1 = 7 bytes
@@:
add eax,eax ;2
je @F ;2
mov al,1 ;2
salc ;1 = 7 bytes
@@:
IIRC, salc will clear al if CF=0. Then, this code will return -1 in al for negative numbers and 0 for non-negative numbers. BTW, we've been trying _not_ to use Jcc in this thread. :)
<edit>
I think this will make Svin's idea work without increasing the size:
add eax,eax ;2
je @F ;2
salc ;1
or al,1 ;2 = 7 bytes
@@:
</edit>
We edit our posts sulmanteniously :)
And about not jcc -
why people don't like jcc - misprediction
You know chances of misprediction in this case?
1/ (2^31)
:)
And about not jcc -
why people don't like jcc - misprediction
You know chances of misprediction in this case?
1/ (2^31)
:)
/nod. If I were using this code for my own projects, I would use jcc, but the goal here was to do it without jcc. I rarely have the guts to contribute to these kinds of challenges but I thought my attempt was pretty good. Oh well.
Good jobs everyone. :alright:
Good jobs everyone. :alright:
Well, if here was such a condition (no jcc) then you may
consider my code as "out of the challange free-style idea".
:)
consider my code as "out of the challange free-style idea".
:)
add eax, eax
je _2
sbb eax, eax
or eax, 1
_2:
...might perform better on some CPU's?IMHO, from practicle point of view your code is for sure improvement from the original.
1. sbb would run faster then salc
2. Return in eax is better for use f(x) as index pointer.
For example f(x) returns 3 values -1,0,1 This array intself
has relation as a[0]=-1 and a=a[-n]+1
so we have leanear function where int can be used as index.
For example we have 3 different procs or 3 different blocks
of code that respectively handle (x<0),(x=0),(x>0) cases.
caseXless0:
.....
caseXeq0:
.....
caseXgr0:
;we can:
handles dd caseXless0, caseXeq0,caseXgr0
.....
.code
add eax,eax
je @F
sbb eax,eax
or eax,1
@@:
jmp
......
In code with salc the return is in al
we would need additinal movsx eax,al
to use the return for index.
The only difference in favor about salc that code with
salc is 1 byte shorter. But it is true only if return value
is not used as pointer. Otherwize using sbb again would
finally lead to shorter code.
1. sbb would run faster then salc
2. Return in eax is better for use f(x) as index pointer.
For example f(x) returns 3 values -1,0,1 This array intself
has relation as a[0]=-1 and a=a[-n]+1
so we have leanear function where int can be used as index.
For example we have 3 different procs or 3 different blocks
of code that respectively handle (x<0),(x=0),(x>0) cases.
caseXless0:
.....
caseXeq0:
.....
caseXgr0:
;we can:
handles dd caseXless0, caseXeq0,caseXgr0
.....
.code
add eax,eax
je @F
sbb eax,eax
or eax,1
@@:
jmp
......
In code with salc the return is in al
we would need additinal movsx eax,al
to use the return for index.
The only difference in favor about salc that code with
salc is 1 byte shorter. But it is true only if return value
is not used as pointer. Otherwize using sbb again would
finally lead to shorter code.
For just branching instance:
add eax, eax
je Zero
jc Minus
jmp Plus
...and could reconstruct EAX if needed (i.e. non-destructive):
Minus:
rcr eax, 1
...
...this is quite slower on average!
add eax, eax
je Zero
jc Minus
jmp Plus
...and could reconstruct EAX if needed (i.e. non-destructive):
Minus:
rcr eax, 1
...
...this is quite slower on average!
I don't have the time to search for this right now,
but do a search on the web for "superoptimizer".
About 10-15 years ago someone used the signum
function has input to their superoptimizer program
and came up with a three-instruction sequence (IIRC)
to do this trick. Very efficient on CPUs of that day.
I suspect it's still pretty good on modern CPUs.
It took the value in EAX and left the result in EDX,
as I recall.
Cheers,
Randy Hyde
but do a search on the web for "superoptimizer".
About 10-15 years ago someone used the signum
function has input to their superoptimizer program
and came up with a three-instruction sequence (IIRC)
to do this trick. Very efficient on CPUs of that day.
I suspect it's still pretty good on modern CPUs.
It took the value in EAX and left the result in EDX,
as I recall.
Cheers,
Randy Hyde
cdq
cmp edx,eax
adc edx,0
Thanks Randy!
http://groups.google.com/groups?q=superoptimizer+x86+eax&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=7c1d0h%24is1%241%40winter.news.rcn.net&rnum=2
Terje Mathisen, "This is a fairly well-known sequence, I'd guess it is one of those that could be located by the (in)famous Superoptimizer."
Another solution in thread at link above:
cdq
neg eax
adc edx, edx
...this one does not preserve EAX - whereas the one above does!It returns -1 with 80000000h in eax
Svin, I guess I confused things by not stating those two bits of code are just for integer sng(x) --- does not work for floating point. It would be more fitting in ( this ) thread.
FPU signum, how?
OK, i'm capable of answering my question and i'm writing these for you to benefit from;)
X-Calibre's 'fpu comparisons' helped much:alright:
.data?
FPUStatusWord WORD ?
.code
fsgn MACRO
fldz ;0,x
fcomp st(1) ;x
push eax
fstsw [FPUStatusWord]
mov ah, BYTE PTR [FPUStatusWord+1]
sahf
pop eax
jz @F
fld st
fabs
fdiv
@@:
;donothing because x=0 already and at top of the stack if jumped here.
endm
X-Calibre's 'fpu comparisons' helped much:alright:
1. You can use FTST to check if something is 0 or unordered. FTST will make your fn shorter.
2. FNSTSW AX would make your fn faster and shorter.
3. Sometimes, using AND instead of SAHF is faster. This is one of those cases.
4. Finally, you can use all of the above discussion to create the result -1,0,1 and, you can just
at the end, which is certainly faster than using FDIV.
2. FNSTSW AX would make your fn faster and shorter.
3. Sometimes, using AND instead of SAHF is faster. This is one of those cases.
4. Finally, you can use all of the above discussion to create the result -1,0,1 and, you can just
push eax
fild dword ptr [esp]
pop eax
at the end, which is certainly faster than using FDIV.
Thanks for answer but i didn't understood #4:confused:
fsgn MACRO
ftst
fnstsw AX
and ah,40h
jnz @F ; (the zero flag is inverted!)
fld st
fabs
fdiv
@@:
;donothing because x=0 already and at top of the stack if jumped here.
endm
Is this better? I think it is. But i don't understand how to eliminate fdiv.
What I meant was; you can eliminate fdiv by using the previously discussed codes with integer instructions.
Example 1:
Step 1: Pick one of your favorite from the previously discussed codes.
Step 2: Find out which register holds the result - some have it in eax, others have it in edx, and even in al.
Step 3: push the result, and fild (and, of course, clear the stack afterwards.)
Example 2: (modifying your code)
fld st/fabs/fdiv sequence can be reimplemented by
It is larger for sure, but avoiding fdiv is expected to pay off.
<edit>
If you don't like fild in this example, you can do this
and much larger than the previous one. :)
</edit>
Example 1:
Step 1: Pick one of your favorite from the previously discussed codes.
Step 2: Find out which register holds the result - some have it in eax, others have it in edx, and even in al.
Step 3: push the result, and fild (and, of course, clear the stack afterwards.)
Example 2: (modifying your code)
fld st/fabs/fdiv sequence can be reimplemented by
push eax ; make room
; we only need the sign bit.
; we don't need to worry about
; denormalized numbers here.
fstp dword ptr [esp]
mov eax,[esp]
sar eax,31
or eax,1
mov [esp],eax
fild dword ptr [esp]
pop eax ; clear the stack
It is larger for sure, but avoiding fdiv is expected to pay off.
<edit>
If you don't like fild in this example, you can do this
mov eax,[esp]
and eax,80000000h
or eax,3f800000h
mov [esp],eax
fld dword ptr [esp]
and much larger than the previous one. :)
</edit>