Getting tired of searching for an effective algorithm which calculates the number of digits of a 64-Bit-unsigned-integer but
finding only long-winded binary-to-string-to-binary conversion routines I wrote my own one.

The decimal part uses the coprocessor using this simple formula:

numdigits = int(log(number64))+1

The hexadecimal part uses the powerful BSR ("bit scan reverse") instruction, which can also be considered as binary logarithm.

Input: EAX=Pointer qword ptr
Output: EAX=number of digits
Changed: EAX, EDX, FPU(S0...S2)

Written for TASM, 32Bit.

Thanks for any comments or suggestions.

fpu_ctrlword	record fpucw_$1:3,fpucw_ic:1,fpucw_rc:2,fpucw_pc:2,fpucw_ie:1,fpucw_$2:1,fpucw_pm:1,fpucw_um:1,fpucw_om:1,fpucw_zm:1,fpucw_dm:1,fpucw_im:1

;number64 dq 12345678901234
number64 dq 01fh

main proc @type_main argc:dword,argv:dword,argenv:dword
mov eax,offset number64
fninit

;----Start Decimal part----------------------------------------------------------------------------
push dx ;make room for FPU control word
fnstcw ;save FPU control word
push word ptr ;create a copy of FPU control word
setflag word ptr,3 shl fpucw_rc ;set FPU rounding mode: truncate
fldcw ;set the FPU rounding moude
fld1 ;"1"
fldlg2 ;log(2) = 1/log(2,10)
fild qword ptr ;load number64
ftst ;test if number64 equals to zero
fnstsw ax
sahf ;set processor flags to FPU result for evaluation
.if zero?
fcompp ;throw away s0 and s1, leave "1"
.else
fyl2x ;s1 * log(2,s0) = log(2,s0) / log(2,10) = log(s0)
fadd ;= log(s0)+1
.endif
fistp word ptr
xor eax,eax
pop ax ;load EAX with num of digits
fldcw ;restore FPU control word
pop dx ;throw away FPU control word space
;----End Decimal part----------------------------------------------------------------------------

mstr s1,<"The number %I64u contains %d decimal digits.",10>
call printf,offset s1,dword ptr number64+4 dword ptr number64,eax


mov eax,offset number64

;----Start Hexadecimal part----------------------------------------------------------------------------
xor edx,edx ;initialize digit counter value in case number64 equals to zero
bsr edx,dword ptr ;search for the first set bit in HIGH_DWORD
.if zero? ;no bit was set
bsr edx,dword ptr ;search the first set bit in LOW_DWORD
.else
add edx,32
.endif
xchg eax,edx
shr eax,2
inc eax
;----End Hexadecimal part----------------------------------------------------------------------------

mstr s2,<"The number %I64Xh contains %d hexadecimal digits.">
call printf,offset s2,dword ptr number64+4 dword ptr number64,eax

endmain: ret
endp
Posted on 2008-09-16 18:06:55 by TasmDev
the number of digits of a 64-Bit-unsigned-integer

numdigits = int(log(number64))+1


One small word of caution: You can load only a 64-Bit-signed-integer to the FPU. Your procedure should thus be restricted to a 63-Bit-unsigned-integer if you want to use it only for unsigned integers.

If you intend to use it for 64-Bit-signed-integers, be sure to use "fabs" before you attempt to compute the log.
Posted on 2008-09-16 22:11:30 by Raymond
That's fully correct. I overlooked this case. But assuming that such a large 64Bit unsigned number will never be committed to the procedure and a programmer pays attention to the argument I think "fabs" can be omitted for code space.
Posted on 2008-09-16 22:33:49 by TasmDev
Funny, you can load 128 bits to the gpu and do math and return a meaningful result via a texture.
Am I missing something?
Posted on 2008-09-17 10:09:36 by Homer

Funny, you can load 128 bits to the gpu and do math and return a meaningful result via a texture.
Am I missing something?

Yes, certainly. You are in the wrong thread. But if you are able to program your "gpu" with this "texture" please post it here to suggest a better solution or leave this thread. Thank you.
Posted on 2008-09-17 19:36:53 by TasmDev
I didn't see a DO NOT DISTURB sign on the door, sorry about that.

Actually its becoming quite common to use the gpu for general purpose processing - its a powerful number cruncher, and allows for 1x128 bit, 2x64 bit or 4x32bit parallel processing - you have this muscle that you never flex unless you're playing games, its a waste - search for GPGPU.

Thank you, have a nice day.
Posted on 2008-09-17 20:25:49 by Homer
Are you serious? If I get you right, a user shall buy a multi-monster-high-end-PC with a super-3D-power-wasting-multitexture-water-cooled graphic card just only to run a program which calculates the number of digits from a shitty integer? That's like breaking a butterfly on a wheel. But thanks for the proposal.
Posted on 2008-09-17 21:18:52 by TasmDev
TasmDev, it's true that for one simple calculation you don't need anything like GPGPU, super-uber parallelism, SSE78 optimization, etc. But what you asked for was a fast way to do this (or at least that's what I understand from your first post) and Homer suggested one. If you want to "calculate the number of digits from a shitty integer" then you can very well use the "long-winded binary-to-string-to-binary conversion routines". Please state clearly what speeds are you aiming at so you won't get frustrated next time someone suggests you a solution.
Posted on 2008-09-17 21:26:53 by ti_mo_n
How about something like this:

mov edx,dword ptr ; load hi-dword
mov eax,dword ptr ; load lo-dword

.if edx==0
     cmp eax,10
     jl _numDigitsIs1
     cmp eax,100
     jl _numDigitsIs2
     cmp eax,1000
     jl _numDigitsIs3
     cmp eax,10000
     jl _numDigitsIs4
...
.else
   .. here in a bit more complicated way compare to 64-bit values
.endif

Just an idea. May be faster, may be slower, may be optimized, may lead to tricks


Homer: GPUs can only handle single-precision floats, only 2 recent ones added double-precision floats. You can't even do bit-fiddling ops like XOR/AND/OR. Maybe, just maybe that functionality is exposed/emulated in CUDA, but GL_NV_fragment_program4's doc only rejects that notion.
Whoops, it's documented in GL_NV_gpu_program4  - it has xor/and/or/not instructions
Posted on 2008-09-19 09:58:57 by Ultrano
Shader constants are one or more 128 bit float, typically addressed as 4xreal4.
The trick then is to structure operations to take advantage of this ability, and to feed crafted data to the shader.
Posted on 2008-09-19 11:37:44 by Homer

But what you asked for was a fast way to do this (or at least that's what I understand from your first post) and Homer suggested one.


Quite the contrary, the GPU would not be fast for this at all. Way too much latency to seriously consider it for doing non-parallel work.

Using the slow-assed div instruction and dividing by 10 until the quotient is 0 would be many times faster that trying to shuttle data over to the GPU, running a shader, and then shuttling a result back.

I'm sure that GPGPU is all cool and shit, but come on... be realistic... it aint cool this time.. it would be a novel solution with no practical value at all
Posted on 2008-12-02 03:38:26 by Rockoon
You are using the BSR for hex numbers. Why don't you use that procedure also for decimal numbers and use the result to pick the number of decimal digits from a look-up table which should take you less time to prepare with your calculator than the time it took to write your initial algo.

I doubt you can devise a faster procedure. AND it would work for the full range of unsigned 64-bit integers.
Posted on 2008-12-02 20:43:26 by Raymond

You are using the BSR for hex numbers. Why don't you use that procedure also for decimal numbers and use the result to pick the number of decimal digits from a look-up table which should take you less time to prepare with your calculator than the time it took to write your initial algo.


Consider that a 10-bit number can range from 512..1023

..so is a 10-bit number 3 digits, or 4? How will a lookup table help while still fitting on a practical storage medium (a 1TB drive wouldn't be big enough, would it?)

Note the relationship between base-2 and base-16 that makes his hex techique possible.


Posted on 2008-12-03 07:58:08 by Rockoon
Okay I wrote this based on Ultrano's suggestion. The code is written in BASM:

Function GetDigitCount (Number : DWORD) : DWORD; Assembler; Register;
asm
  (* EAX = Number *)
  PUSH      EDI
  MOV      EDI , 10
  MOV      EDX , EAX
  XOR      EAX , EAX
  MOV      ECX , 10 (* The number that EDX has to be compared to *)
  MOV      EBX , 1  (* The number of digits that have to be returned *)
  @@Loop:
    CMP      EDX , ECX
    MOV      EAX , EBX
    JL        @@End
    IMUL      ECX , 10
    INC      EBX
    DEC      EDI
    JNZ      @@Loop
  @@End:
  POP      EDI
end;


I compiled and tested it in Delphi 6 for 32-bit DWORD values.
Posted on 2008-12-03 09:42:55 by XCHG
MASM

.data
??pow10 textequ <1>
POW10TABLE dq ??pow10
rept 19
??pow10 CatStr ??pow10,<0>
%dq ??pow10
endm

.code
UInt64Log10 proc uses ebx edi esi uint64:qword
mov eax,dword ptr uint64
mov edx,dword ptr uint64+4
lea ebx,POW10TABLE
.if edx
bsr ecx,edx
add ecx,32+1
imul ecx,ecx,1233
shr ecx,12
mov esi,eax
mov edi,edx
sub esi,
sbb edi,
@@: shr edi,31
sub ecx,edi
mov eax,ecx
ret
.endif
bsr ecx,eax
add ecx,1
imul ecx,ecx,1233
shr ecx,12
mov edi,eax
sub edi,
jmp @B

UInt64Log10 endp

UInt64Log2 proc uint64:qword
mov edx,dword ptr uint64+4;hi
mov eax,dword ptr uint64;lo
cmp edx,1; if (edx<1) { carry=1 } else { carry=0 }
cmovnc eax,edx
sbb edx,edx; set to -1 if carry
bsr eax,eax
and edx,-32
lea eax,
ret
UInt64Log2 endp
Posted on 2008-12-03 10:55:09 by drizz
Here's a procedure using BSR and a lookup table. Fast and efficient.

- lpsrc is the memory address of the qword
- The number of digits in the decimal representation of the qword is returned in EAX
- Usage: invoke Ddigits, ADDR qword_source
- The source qword is not modified

Ddigits proc uses ebx lpsrc :DWORD

.data
ALIGN 4
digits      db     1, 1, 1, 1, 2, 2, 2, 3
            db     3, 3, 4, 4, 4, 4, 5, 5
            db     5, 6, 6, 6, 7, 7, 7, 7
            db     8, 8, 8, 9, 9, 9,10,10
            db    10,10,11,11,11,12,12,12
            db    13,13,13,13,14,14,14,15
            db    15,15,16,16,16,16,17,17
            db    17,18,18,18,19,19,19,19
lo_limit    dd    -1,-1,-1,9,-1,-1,63h,-1
            dd    -1,3E7h,-1,-1,-1,270Fh,-1,-1
            dd    1869Fh,-1,-1,0F423Fh,-1,-1,-1,98967Fh
            dd    -1,-1,5F5E0FFh,-1,-1,3B9AC9FFh,-1,-1
hi_limit    dd    -1,540bE3FFh,-1,-1,4876E7FFh,-1,-1,0D4A50FFFh
            dd    -1,-1,-1,4E729FFFh,-1,-1,107A3FFFh,-1
            dd    -1,0A4C67FFFh,-1,-1,-1,6FC0FFFFh,-1,-1
            dd    5D89FFFFh,-1,-1,0A763FFFFh,-1,-1,-1,89E7FFFFH
            dd    2,2,8,10h,17h,40h,80h,0E8h
            dd    200h,400h,800h,918h,2000h,4000h,5AF3h,10000h
            dd    20000h,38D7Eh,80000h,100000h,200000h,2386F2h,800000h,1000000h
            dd    1634578h,4000000h,8000000h,0DE0B6B3h,20000000h,40000000h,80000000h,8AC72304h

.code

    mov   ebx,lpsrc
    mov   eax,
    mov   edx,
    bsr   ecx,edx                 ;check high dword first
    .if   ZERO?                   ;only a 32-bit number
        bsr   ecx,eax
        .if   ZERO?
            xor   ebx,ebx
        .else
            movzx ebx,digits
            .if   eax > lo_limit
                  inc   ebx
            .endif
        .endif
    .else
        add   ecx,32
        movzx ebx,digits
        cmp   edx,hi_limit
        .if   ZERO?
            .if   eax > lo_limit
                  inc   ebx
            .endif
        .elseif !CARRY?
              inc   ebx
        .endif
    .endif
    mov   eax,ebx
    ret

Ddigits endp


I have tested it with all possible values of 2n, 2n-1, 10n and 10n-1, and all returned values were correct.
Posted on 2008-12-04 20:33:54 by Raymond
That's the best solution I have ever seen!
Posted on 2009-02-07 14:14:12 by TasmDev
You could have done it easily by yourself if you had put your mind to it after I gave you the clue about a lookup table.

Chalk it up to experience for your next challenge. ;)
Posted on 2009-02-07 20:35:47 by Raymond

Funny, you can load 128 bits to the gpu and do math and return a meaningful result via a texture.
Am I missing something?


Neither nVidia nor ATi support 128 bit floats, as far as I know.
Their hardware normally works with 32-bit floats, but with GPGPU tasks they can process 64-bit precision at reduced performance.
I also don't know of any textures formats that would support 128 bit float.
4x real4 is not the same as 1x real16. You couldn't do a 128 bit log operation on the data.
Posted on 2009-03-10 07:09:57 by Scali
A32R32G32B32 pixel format used as a vertex texture , and indexed via a pixelshader.
Posted on 2009-03-10 08:57:59 by Homer