I don't get it.
Could you explain?
I thought of integer values as three groups.
1. x <> 0 and s bit = 0 (x>0)
2. x <> 0 and s bit = 1 (x<0)
3. x = 0
And that f(x) should return {1,-1,0} respectivly.

If I was wrong - what was the task?
And what format is data it eax?
Posted on 2003-07-12 15:23:14 by The Svin
Svin, the goal was for floating point numbers: -0.0 = 80000000h
Posted on 2003-07-12 15:38:13 by bitRAKE
Thanx for claryfication, bitRake.
We can do it with four instructions also,
and very few bytes of opcode:


add eax,eax ;2
je @F ;2
or al,1 ;2
salc ;1 = 7 bytes
@@:
Posted on 2003-07-12 16:46:20 by The Svin



add eax,eax ;2
je @F ;2
mov al,1 ;2
salc ;1 = 7 bytes
@@:



IIRC, salc will clear al if CF=0. Then, this code will return -1 in al for negative numbers and 0 for non-negative numbers. BTW, we've been trying _not_ to use Jcc in this thread. :)

<edit>
I think this will make Svin's idea work without increasing the size:


add eax,eax ;2
je @F ;2
salc ;1
or al,1 ;2 = 7 bytes

@@:

</edit>
Posted on 2003-07-12 17:55:21 by Starless
We edit our posts sulmanteniously :)
And about not jcc -
why people don't like jcc - misprediction
You know chances of misprediction in this case?
1/ (2^31)
:)
Posted on 2003-07-12 18:24:14 by The Svin
/nod. If I were using this code for my own projects, I would use jcc, but the goal here was to do it without jcc. I rarely have the guts to contribute to these kinds of challenges but I thought my attempt was pretty good. Oh well.

Good jobs everyone. :alright:
Posted on 2003-07-12 19:50:54 by iblis
Well, if here was such a condition (no jcc) then you may
consider my code as "out of the challange free-style idea".
:)
Posted on 2003-07-12 20:37:06 by The Svin
	add	eax, eax

je _2
sbb eax, eax
or eax, 1
_2:
...might perform better on some CPU's?
Posted on 2003-07-13 11:29:06 by bitRAKE
IMHO, from practicle point of view your code is for sure improvement from the original.
1. sbb would run faster then salc
2. Return in eax is better for use f(x) as index pointer.
For example f(x) returns 3 values -1,0,1 This array intself
has relation as a[0]=-1 and a=a[-n]+1
so we have leanear function where int can be used as index.
For example we have 3 different procs or 3 different blocks
of code that respectively handle (x<0),(x=0),(x>0) cases.

caseXless0:
.....
caseXeq0:
.....
caseXgr0:

;we can:

handles dd caseXless0, caseXeq0,caseXgr0
.....
.code

add eax,eax
je @F
sbb eax,eax
or eax,1
@@:
jmp

......

In code with salc the return is in al
we would need additinal movsx eax,al
to use the return for index.

The only difference in favor about salc that code with
salc is 1 byte shorter. But it is true only if return value
is not used as pointer. Otherwize using sbb again would
finally lead to shorter code.
Posted on 2003-07-13 14:22:57 by The Svin
For just branching instance:

add eax, eax
je Zero
jc Minus
jmp Plus

...and could reconstruct EAX if needed (i.e. non-destructive):

Minus:
rcr eax, 1
...


...this is quite slower on average!
Posted on 2003-07-13 17:44:12 by bitRAKE
I don't have the time to search for this right now,
but do a search on the web for "superoptimizer".
About 10-15 years ago someone used the signum
function has input to their superoptimizer program
and came up with a three-instruction sequence (IIRC)
to do this trick. Very efficient on CPUs of that day.
I suspect it's still pretty good on modern CPUs.
It took the value in EAX and left the result in EDX,
as I recall.
Cheers,
Randy Hyde
Posted on 2003-07-13 22:31:29 by rhyde
cdq

cmp edx,eax
adc edx,0
Thanks Randy!

http://groups.google.com/groups?q=superoptimizer+x86+eax&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=7c1d0h%24is1%241%40winter.news.rcn.net&rnum=2

Terje Mathisen, "This is a fairly well-known sequence, I'd guess it is one of those that could be located by the (in)famous Superoptimizer."

Another solution in thread at link above:
	cdq

neg eax
adc edx, edx
...this one does not preserve EAX - whereas the one above does!
Posted on 2003-07-13 23:47:33 by bitRAKE
It returns -1 with 80000000h in eax
Posted on 2003-07-17 20:34:51 by The Svin
Svin, I guess I confused things by not stating those two bits of code are just for integer sng(x) --- does not work for floating point. It would be more fitting in ( this ) thread.
Posted on 2003-07-17 21:22:13 by bitRAKE
FPU signum, how?
Posted on 2003-07-18 03:00:33 by inFinie
OK, i'm capable of answering my question and i'm writing these for you to benefit from;)


.data?
FPUStatusWord WORD ?
.code

fsgn MACRO
fldz ;0,x
fcomp st(1) ;x

push eax
fstsw [FPUStatusWord]
mov ah, BYTE PTR [FPUStatusWord+1]
sahf
pop eax
jz @F

fld st
fabs
fdiv
@@:
;donothing because x=0 already and at top of the stack if jumped here.
endm

X-Calibre's 'fpu comparisons' helped much:alright:
Posted on 2003-09-05 07:53:39 by inFinie
1. You can use FTST to check if something is 0 or unordered. FTST will make your fn shorter.

2. FNSTSW AX would make your fn faster and shorter.

3. Sometimes, using AND instead of SAHF is faster. This is one of those cases.

4. Finally, you can use all of the above discussion to create the result -1,0,1 and, you can just


push eax
fild dword ptr [esp]
pop eax

at the end, which is certainly faster than using FDIV.
Posted on 2003-09-05 15:54:53 by Starless
Thanks for answer but i didn't understood #4:confused:
Posted on 2003-09-06 05:08:06 by inFinie
fsgn MACRO

ftst
fnstsw AX
and ah,40h
jnz @F ; (the zero flag is inverted!)
fld st
fabs
fdiv
@@:
;donothing because x=0 already and at top of the stack if jumped here.

endm

Is this better? I think it is. But i don't understand how to eliminate fdiv.
Posted on 2003-09-06 17:11:47 by inFinie
What I meant was; you can eliminate fdiv by using the previously discussed codes with integer instructions.

Example 1:
Step 1: Pick one of your favorite from the previously discussed codes.
Step 2: Find out which register holds the result - some have it in eax, others have it in edx, and even in al.
Step 3: push the result, and fild (and, of course, clear the stack afterwards.)

Example 2: (modifying your code)
fld st/fabs/fdiv sequence can be reimplemented by


push eax ; make room
; we only need the sign bit.
; we don't need to worry about
; denormalized numbers here.
fstp dword ptr [esp]
mov eax,[esp]
sar eax,31
or eax,1
mov [esp],eax
fild dword ptr [esp]
pop eax ; clear the stack

It is larger for sure, but avoiding fdiv is expected to pay off.

<edit>
If you don't like fild in this example, you can do this


mov eax,[esp]
and eax,80000000h
or eax,3f800000h
mov [esp],eax
fld dword ptr [esp]

and much larger than the previous one. :)
</edit>
Posted on 2003-09-06 18:58:59 by Starless