I am recently trying to learn assembly and am playing around with compiling things in C/C++ and looking at the generated assembly.  I am very confused about the asm generated by sqrt(double).  I am running gcc under linux fedora core 6.  Here is my C code:

#include <stdio.h>
#include <math.h>

int main()
{
    double x, y;

    printf("Enter Number: ");

    scanf("%lf", &x);

    y = sqrt(x);

    printf("x = %12.40e\n", x);
    printf("y = %12.40e\n", y);

    return (0);
}


I have generated assembly for this using the command:

/usr/libexec/gcc/i386-redhat-linux/4.1.2/cc1 -quiet -v testsqrt.c -quiet -dumpbase testsqrt.c -masm=intel -mtune=generic -auxbase testsqrt -O -version -o /tmp/testsqrt.s

The generated asm is:

        .file  "testsqrt.c"
        .intel_syntax
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "Enter Number: "
.LC1:
        .string "%lf"
.LC2:
        .string "x = %12.40e\n"
.LC3:
        .string "y = %12.40e\n"
        .text
.globl main
        .type  main, @function
main:
        lea    %ecx, [%esp+4]
        and    %esp, -16
        push    DWORD PTR [%ecx-4]
        push    %ebp
        mov    %ebp, %esp
        push    %ecx
        sub    %esp, 52
        mov    DWORD PTR [%esp], OFFSET FLAT:.LC0
        call    printf
        lea    %eax, [%ebp-16]
        mov    DWORD PTR [%esp+4], %eax
        mov    DWORD PTR [%esp], OFFSET FLAT:.LC1
        call    scanf
        fld    QWORD PTR [%ebp-16]
        fld    %st(0)
        fsqrt
        fst    QWORD PTR [%ebp-32]
        fucomp  %st(0)
        fnstsw  %ax
        sahf
        jp      .L5
        je      .L6
.L5:
        fstp    QWORD PTR [%esp]
        call    sqrt
        fstp    QWORD PTR [%ebp-32]
        .p2align 4,,4
        jmp    .L2
.L6:
        fstp    %st(0)
.L2:
        fld    QWORD PTR [%ebp-16]
        fstp    QWORD PTR [%esp+4]
        mov    DWORD PTR [%esp], OFFSET FLAT:.LC2
        call    printf
        fld    QWORD PTR [%ebp-32]
        fstp    QWORD PTR [%esp+4]
        mov    DWORD PTR [%esp], OFFSET FLAT:.LC3
        call    printf
        mov    %eax, 0
        add    %esp, 52
        pop    %ecx
        pop    %ebp
        lea    %esp, [%ecx-4]
        ret
        .size  main, .-main
        .ident  "GCC: (GNU) 4.1.2 20070925 (Red Hat 4.1.2-33)"
        .section        .note.GNU-stack,"",@progbits

It seems to me that the fsqrt command should be able to compute this square root.  In the assembly code the command fsqrt appears, afterwards there is an fucomp, a jp, and a je.  Two lines after .L5 there is a "call sqrt".  I don't understand this logic  at all.  Are there cases where fsqrt gives an incorrect or inaccurate result that requires further processing?  Any help would be appreciated.  Thanks.
Posted on 2009-01-30 11:48:15 by mengfanke
Well, the couple of lines that immediately follow the fsqrt are getting the fpu StateFlags onto the cpu, so that we can then use some conditional jumps..

        fsqrt                                     ;calculate st(0) = square root
        fst     QWORD PTR [%ebp-32]    ;store st(0), but dont actually unload it
        fucomp  %st(0)                      ;compare st(0) with ITSELF, and unload it from st(0)
        fnstsw  %ax                           ;set ax = fpu state flags
        sahf                                      ;set fpu state flags = ah
        jp      .L5                               ;if TRUE
        je      .L6                               ;if FALSE


Clearly, this is a test for NaN..
One can therefore test whether a variable has a NaN value by comparing it to itself, thus if x = x gives false then x is a NaN code


The fpu is designed to operate apon 'real' numbers, it's not suitable for returning the square root of a negative number (other than zero), as these are 'imaginary', or 'complex' numbers, which take a different form to our familiar 'real' numbers.

If we ask for the square root of say, -49, the fpu will return NAN (Not A Number), an error code, we'll need to calculate the square roots of complex numbers the hard way.

Seriously, this is a poorly crafted test, we should have just looked at the high bit of the input term to see if its negative, then switched to fsqrt or call sqrt based on that.

Posted on 2009-01-30 18:42:52 by Homer
The more you learn about assembly, the more you will observe how much bloating most of those HLL compilers spit out even for simple functions. Your example is just the tip of the iceberg.
Posted on 2009-01-30 22:20:25 by Raymond
Homer and Raymond:

Thank you for your responses.  I would have thought that fsqrt returning NaN is sufficient, it really is surprinsing how much additional computation is performed after the call to fsqrt.  Again, I am fairly new to assembly and appreciate your responses.
Posted on 2009-02-02 20:56:24 by mengfanke
Try compiling the code with Visual C++ and "/fp:fast"...
fast - "fast" floating-point model; results are less predictable
. Not the smartest stack management code, but it does FSQRT without too much weirdness.
Posted on 2009-02-03 00:45:34 by f0dder