Getting tired of searching for an effective algorithm which calculates the number of digits of a 64-Bit-unsigned-integer but
finding only long-winded binary-to-string-to-binary conversion routines I wrote my own one.

The decimal part uses the coprocessor using this simple formula:

numdigits = int(log(number64))+1

The hexadecimal part uses the powerful BSR ("bit scan reverse") instruction, which can also be considered as binary logarithm.

Input: EAX=Pointer qword ptr
Output: EAX=number of digits
Changed: EAX, EDX, FPU(S0...S2)

Written for TASM, 32Bit.

Thanks for any comments or suggestions.

``fpu_ctrlword	record fpucw_\$1:3,fpucw_ic:1,fpucw_rc:2,fpucw_pc:2,fpucw_ie:1,fpucw_\$2:1,fpucw_pm:1,fpucw_um:1,fpucw_om:1,fpucw_zm:1,fpucw_dm:1,fpucw_im:1;number64	dq 12345678901234number64 dq 01fhmain		proc @type_main argc:dword,argv:dword,argenv:dword	mov eax,offset number64	fninit;----Start Decimal part----------------------------------------------------------------------------	push dx		;make room for FPU control word	fnstcw	;save FPU control word	push word ptr	;create a copy of FPU control word	setflag word ptr,3 shl fpucw_rc		;set FPU rounding mode: truncate	fldcw	;set the FPU rounding moude	fld1		;"1"	fldlg2	;log(2) = 1/log(2,10)	fild qword ptr 	;load number64	ftst		;test if number64 equals to zero	fnstsw ax	sahf		;set processor flags to FPU result for evaluation	.if zero?		fcompp	;throw away s0 and s1, leave "1"	.else		fyl2x		;s1 * log(2,s0) = log(2,s0) / log(2,10) = log(s0)		fadd		;= log(s0)+1	.endif	fistp word ptr	xor eax,eax	pop ax	;load EAX with num of digits	fldcw	;restore FPU control word	pop dx	;throw away FPU control word space;----End Decimal part----------------------------------------------------------------------------	mstr s1,<"The number %I64u contains %d decimal digits.",10>	call printf,offset s1,dword ptr number64+4 dword ptr number64,eax	mov eax,offset number64;----Start Hexadecimal part----------------------------------------------------------------------------	xor edx,edx		;initialize digit counter value in case number64 equals to zero	bsr edx,dword ptr	;search for the first set bit in HIGH_DWORD	.if zero?	;no bit was set		bsr edx,dword ptr	;search the first set bit in LOW_DWORD	.else		add edx,32	.endif	xchg eax,edx	shr eax,2	inc eax;----End Hexadecimal part----------------------------------------------------------------------------		mstr s2,<"The number %I64Xh contains %d hexadecimal digits.">	call printf,offset s2,dword ptr number64+4 dword ptr number64,eaxendmain:	retendp``
Posted on 2008-09-16 18:06:55 by TasmDev
the number of digits of a 64-Bit-unsigned-integer

numdigits = int(log(number64))+1

One small word of caution: You can load only a 64-Bit-signed-integer to the FPU. Your procedure should thus be restricted to a 63-Bit-unsigned-integer if you want to use it only for unsigned integers.

If you intend to use it for 64-Bit-signed-integers, be sure to use "fabs" before you attempt to compute the log.
Posted on 2008-09-16 22:11:30 by Raymond
That's fully correct. I overlooked this case. But assuming that such a large 64Bit unsigned number will never be committed to the procedure and a programmer pays attention to the argument I think "fabs" can be omitted for code space.
Posted on 2008-09-16 22:33:49 by TasmDev
Funny, you can load 128 bits to the gpu and do math and return a meaningful result via a texture.
Am I missing something?
Posted on 2008-09-17 10:09:36 by Homer

Funny, you can load 128 bits to the gpu and do math and return a meaningful result via a texture.
Am I missing something?

Yes, certainly. You are in the wrong thread. But if you are able to program your "gpu" with this "texture" please post it here to suggest a better solution or leave this thread. Thank you.
Posted on 2008-09-17 19:36:53 by TasmDev
I didn't see a DO NOT DISTURB sign on the door, sorry about that.

Actually its becoming quite common to use the gpu for general purpose processing - its a powerful number cruncher, and allows for 1x128 bit, 2x64 bit or 4x32bit parallel processing - you have this muscle that you never flex unless you're playing games, its a waste - search for GPGPU.

Thank you, have a nice day.
Posted on 2008-09-17 20:25:49 by Homer
Are you serious? If I get you right, a user shall buy a multi-monster-high-end-PC with a super-3D-power-wasting-multitexture-water-cooled graphic card just only to run a program which calculates the number of digits from a shitty integer? That's like breaking a butterfly on a wheel. But thanks for the proposal.
Posted on 2008-09-17 21:18:52 by TasmDev
TasmDev, it's true that for one simple calculation you don't need anything like GPGPU, super-uber parallelism, SSE78 optimization, etc. But what you asked for was a fast way to do this (or at least that's what I understand from your first post) and Homer suggested one. If you want to "calculate the number of digits from a shitty integer" then you can very well use the "long-winded binary-to-string-to-binary conversion routines". Please state clearly what speeds are you aiming at so you won't get frustrated next time someone suggests you a solution.
Posted on 2008-09-17 21:26:53 by ti_mo_n
``mov edx,dword ptr ; load hi-dwordmov eax,dword ptr ; load lo-dword.if edx==0     cmp eax,10     jl _numDigitsIs1     cmp eax,100     jl _numDigitsIs2     cmp eax,1000     jl _numDigitsIs3     cmp eax,10000     jl _numDigitsIs4....else   .. here in a bit more complicated way compare to 64-bit values.endif``

Just an idea. May be faster, may be slower, may be optimized, may lead to tricks

Homer: GPUs can only handle single-precision floats, only 2 recent ones added double-precision floats. You can't even do bit-fiddling ops like XOR/AND/OR. Maybe, just maybe that functionality is exposed/emulated in CUDA, but GL_NV_fragment_program4's doc only rejects that notion.
Whoops, it's documented in GL_NV_gpu_program4  - it has xor/and/or/not instructions
Posted on 2008-09-19 09:58:57 by Ultrano
Shader constants are one or more 128 bit float, typically addressed as 4xreal4.
The trick then is to structure operations to take advantage of this ability, and to feed crafted data to the shader.
Posted on 2008-09-19 11:37:44 by Homer

But what you asked for was a fast way to do this (or at least that's what I understand from your first post) and Homer suggested one.

Quite the contrary, the GPU would not be fast for this at all. Way too much latency to seriously consider it for doing non-parallel work.

Using the slow-assed div instruction and dividing by 10 until the quotient is 0 would be many times faster that trying to shuttle data over to the GPU, running a shader, and then shuttling a result back.

I'm sure that GPGPU is all cool and shit, but come on... be realistic... it aint cool this time.. it would be a novel solution with no practical value at all
Posted on 2008-12-02 03:38:26 by Rockoon
You are using the BSR for hex numbers. Why don't you use that procedure also for decimal numbers and use the result to pick the number of decimal digits from a look-up table which should take you less time to prepare with your calculator than the time it took to write your initial algo.

I doubt you can devise a faster procedure. AND it would work for the full range of unsigned 64-bit integers.
Posted on 2008-12-02 20:43:26 by Raymond

You are using the BSR for hex numbers. Why don't you use that procedure also for decimal numbers and use the result to pick the number of decimal digits from a look-up table which should take you less time to prepare with your calculator than the time it took to write your initial algo.

Consider that a 10-bit number can range from 512..1023

..so is a 10-bit number 3 digits, or 4? How will a lookup table help while still fitting on a practical storage medium (a 1TB drive wouldn't be big enough, would it?)

Note the relationship between base-2 and base-16 that makes his hex techique possible.

Posted on 2008-12-03 07:58:08 by Rockoon
Okay I wrote this based on Ultrano's suggestion. The code is written in BASM:

``Function GetDigitCount (Number : DWORD) : DWORD; Assembler; Register;asm  (* EAX = Number *)  PUSH      EDI  MOV       EDI , 10  MOV       EDX , EAX  XOR       EAX , EAX  MOV       ECX , 10 (* The number that EDX has to be compared to *)  MOV       EBX , 1  (* The number of digits that have to be returned *)  @@Loop:    CMP       EDX , ECX    MOV       EAX , EBX    JL        @@End    IMUL      ECX , 10    INC       EBX    DEC       EDI    JNZ       @@Loop  @@End:  POP       EDIend;``

I compiled and tested it in Delphi 6 for 32-bit DWORD values.
Posted on 2008-12-03 09:42:55 by XCHG
MASM

``.data??pow10 textequ <1>POW10TABLE dq ??pow10rept 19	??pow10 CatStr ??pow10,<0>	%dq ??pow10endm.codeUInt64Log10 proc uses ebx edi esi uint64:qword	mov eax,dword ptr uint64	mov edx,dword ptr uint64+4	lea ebx,POW10TABLE	.if edx		bsr ecx,edx		add ecx,32+1		imul ecx,ecx,1233		shr ecx,12		mov esi,eax		mov edi,edx		sub esi,		sbb edi,@@:		shr edi,31		sub ecx,edi		mov eax,ecx		ret	.endif	bsr ecx,eax	add ecx,1	imul ecx,ecx,1233	shr ecx,12	mov edi,eax	sub edi,	jmp @B	UInt64Log10 endpUInt64Log2 proc uint64:qword	mov edx,dword ptr uint64+4;hi	mov eax,dword ptr uint64;lo	cmp edx,1; if (edx<1) { carry=1 } else { carry=0 } 	cmovnc eax,edx	sbb edx,edx; set to -1 if carry	bsr eax,eax	and edx,-32	lea eax,	retUInt64Log2 endp``
Posted on 2008-12-03 10:55:09 by drizz
Here's a procedure using BSR and a lookup table. Fast and efficient.

- lpsrc is the memory address of the qword
- The number of digits in the decimal representation of the qword is returned in EAX
- Usage: invoke Ddigits, ADDR qword_source
- The source qword is not modified

``Ddigits proc uses ebx lpsrc :DWORD.dataALIGN 4digits      db     1, 1, 1, 1, 2, 2, 2, 3            db     3, 3, 4, 4, 4, 4, 5, 5            db     5, 6, 6, 6, 7, 7, 7, 7            db     8, 8, 8, 9, 9, 9,10,10            db    10,10,11,11,11,12,12,12            db    13,13,13,13,14,14,14,15            db    15,15,16,16,16,16,17,17            db    17,18,18,18,19,19,19,19lo_limit    dd    -1,-1,-1,9,-1,-1,63h,-1            dd    -1,3E7h,-1,-1,-1,270Fh,-1,-1            dd    1869Fh,-1,-1,0F423Fh,-1,-1,-1,98967Fh            dd    -1,-1,5F5E0FFh,-1,-1,3B9AC9FFh,-1,-1hi_limit    dd    -1,540bE3FFh,-1,-1,4876E7FFh,-1,-1,0D4A50FFFh            dd    -1,-1,-1,4E729FFFh,-1,-1,107A3FFFh,-1            dd    -1,0A4C67FFFh,-1,-1,-1,6FC0FFFFh,-1,-1            dd    5D89FFFFh,-1,-1,0A763FFFFh,-1,-1,-1,89E7FFFFH            dd    2,2,8,10h,17h,40h,80h,0E8h            dd    200h,400h,800h,918h,2000h,4000h,5AF3h,10000h            dd    20000h,38D7Eh,80000h,100000h,200000h,2386F2h,800000h,1000000h            dd    1634578h,4000000h,8000000h,0DE0B6B3h,20000000h,40000000h,80000000h,8AC72304h.code    mov   ebx,lpsrc    mov   eax,    mov   edx,    bsr   ecx,edx                 ;check high dword first    .if   ZERO?                   ;only a 32-bit number        bsr   ecx,eax        .if   ZERO?            xor   ebx,ebx        .else            movzx ebx,digits            .if   eax > lo_limit                  inc   ebx            .endif        .endif    .else        add   ecx,32        movzx ebx,digits        cmp   edx,hi_limit        .if   ZERO?            .if   eax > lo_limit                  inc   ebx            .endif        .elseif !CARRY?              inc   ebx        .endif    .endif    mov   eax,ebx    retDdigits endp``

I have tested it with all possible values of 2n, 2n-1, 10n and 10n-1, and all returned values were correct.
Posted on 2008-12-04 20:33:54 by Raymond
That's the best solution I have ever seen!
Posted on 2009-02-07 14:14:12 by TasmDev
You could have done it easily by yourself if you had put your mind to it after I gave you the clue about a lookup table.

Chalk it up to experience for your next challenge. ;)
Posted on 2009-02-07 20:35:47 by Raymond

Funny, you can load 128 bits to the gpu and do math and return a meaningful result via a texture.
Am I missing something?

Neither nVidia nor ATi support 128 bit floats, as far as I know.
Their hardware normally works with 32-bit floats, but with GPGPU tasks they can process 64-bit precision at reduced performance.
I also don't know of any textures formats that would support 128 bit float.
4x real4 is not the same as 1x real16. You couldn't do a 128 bit log operation on the data.
Posted on 2009-03-10 07:09:57 by Scali
A32R32G32B32 pixel format used as a vertex texture , and indexed via a pixelshader.
Posted on 2009-03-10 08:57:59 by Homer