Hi all (and merry Christmas).
I am confused, my head is a mess....

Well, I am trying to show some special characters (like the Spanish 'ñ') in a listview, but the result is unreadable.

What are the steps for full UTF-8 support in my program (edits, listviews, etc.)?

Thanks

Morlok
Posted on 2005-12-21 04:35:50 by The Morlok
I don't think the Windows API functions take UTF-8 directly - Windows has 8-bit ANSI (with regional codepages) and 16-bit WIDE (UTF-16) support. To use the Unicode functions, you need to use calls like MessageBoxW instead of MessageBoxA and so on (by default MessageBox expands to MessageBoxA).
Posted on 2005-12-21 04:50:46 by f0dder
My program maintains an HTTP connection with a server on the net. The server replies in UTF-8, and I want to show part of the message
in a listview (LVM_SETITEM), but some special characters are unreadable.

Is there some way to show them correctly?

(If I put the incoming message in a buffer and try
    invoke MessageBoxW, hWnd, addr Buffer, NULL,MB_OK  ----> I get Chinese characters  :shock: )

The goal is to put the message in a listview, but I am lost.

Thanks.
Posted on 2005-12-21 05:48:15 by The Morlok
You can't use MessageBoxW with UTF-8; you must convert your UTF-8 string to UTF-16 before you try to display it...
Posted on 2005-12-21 06:21:38 by f0dder
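The Chinese characters have a simple explanation: a W function reads the buffer two bytes at a time as UTF-16, so each pair of plain ASCII bytes becomes one code unit that usually lands in the CJK range. A minimal C sketch of that effect follows; the helper names are made up for illustration and are not Windows APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Reinterpret consecutive byte pairs as little-endian UTF-16 code units,
   which is effectively what a W function does with a UTF-8 buffer.
   Returns the number of code units written to out. */
size_t bytes_as_utf16le(const unsigned char *bytes, size_t n, uint16_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i + 1 < n; i += 2)
        out[count++] = (uint16_t)(bytes[i] | (bytes[i + 1] << 8));
    return count;
}

/* CJK Unified Ideographs block: U+4E00..U+9FFF. */
int is_cjk_ideograph(uint16_t unit)
{
    return unit >= 0x4E00 && unit <= 0x9FFF;
}
```

Fed the four bytes of "Hola" (48 6F 6C 61), this yields the code units 6F48h and 616Ch, and both fall inside the CJK block.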
I think MultiByteToWideChar (with CP_UTF8 as the code page) is what you need here
Posted on 2005-12-21 10:05:20 by stormix
I converted the UTF-8 string to UTF-16 and it is OK.

I am trying to write a UTF-8 -> UTF-16 translator. Any suggestions are welcome.

Thanks.

Morlok
Posted on 2005-12-21 13:16:30 by The Morlok
Hi, Morlok (hi, everyone)
My first post and first suggestion... you could use my funcs:
to_UTF8 - which converts from UCS-2 to UTF-8
UTF8_to - which converts from UTF-8 to UCS-2
I actually didn't like the Multi.. Wide.. APIs because of their large number of arguments. BTW I happily pass input as UTF-8 strings using these two in Scintilla :)

to_UTF8 proc uses esi edi ebx sour:DWORD,dest:DWORD
LOCAL b1:BYTE
LOCAL b2:BYTE

xor    eax,eax
mov    esi,sour
mov    edi,dest
@@:
mov    ax,word ptr [esi]    ;next UCS-2 character
;cmp    ax,0FFFFh
;jg    _4bytes
cmp    ax,0
je    fin
cmp    ax,07FFh
ja    _3bytes    ;unsigned compare, so 8000h..0FFFFh is handled too
cmp    ax,007Fh
ja    _2bytes
;----1_byte
mov    byte ptr [edi],al
inc    edi
jmp    next_char
_2bytes:
mov    ebx,080h
mov    ecx,eax
and    ecx,3Fh
or    ecx,ebx
mov    edx,ecx    ;edx = 10xxxxxx (low 6 bits)
;---------------------
mov    ebx,0C0h
mov    ecx,eax
shr    ecx,6
or    ecx,ebx    ;ecx = 110xxxxx (high 5 bits)
;---------------------
mov    byte ptr [edi],cl
mov    byte ptr [edi+1],dl
add    edi,2
jmp    next_char
_3bytes:
mov    ebx,080h
mov    ecx,eax
and    ecx,3Fh
or    ecx,ebx
mov    b1,cl    ;b1 = 10xxxxxx (low 6 bits)
;--------------
mov    ebx,080h
mov    ecx,eax
and    ecx,0FFFh
shr    ecx,6
or    ecx,ebx
mov    b2,cl    ;b2 = 10xxxxxx (middle 6 bits)
;--------------
mov    ebx,0E0h
mov    ecx,eax
shr    ecx,12
or    ecx,ebx    ;ecx = 1110xxxx (high 4 bits)
;--------------
mov    byte ptr [edi],cl
mov    cl,b2
mov    byte ptr [edi+1],cl
mov    cl,b1
mov    byte ptr [edi+2],cl
add    edi,3
_4bytes: ;WE'LL_IGNORE_IT_AS_MOST_OF_API_DOES
next_char:
add    esi,2
jmp    @B
fin:
mov    byte ptr [edi],0    ;terminate the UTF-8 string

ret

to_UTF8 endp
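
For comparison, the same encoding can be sketched in portable C. This is an illustration, not ramguru's code: `utf16_to_utf8` is a hypothetical name, and unlike the assembly above it also combines surrogate pairs into 4-byte sequences.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a NUL-terminated UTF-16 string as UTF-8.
   Returns the number of bytes written (excluding the terminator),
   or (size_t)-1 if dst (cap bytes) is too small.
   Unpaired surrogates are encoded as 3 bytes, as the assembly does. */
size_t utf16_to_utf8(const uint16_t *src, unsigned char *dst, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return (size_t)-1;
    while (*src) {
        uint32_t cp = *src++;
        /* Combine a high+low surrogate pair into one code point. */
        if (cp >= 0xD800 && cp <= 0xDBFF && *src >= 0xDC00 && *src <= 0xDFFF)
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*src++ - 0xDC00);

        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + len + 1 > cap)          /* keep room for the terminator */
            return (size_t)-1;

        if (len == 1) {
            dst[n++] = (unsigned char)cp;
        } else if (len == 2) {
            dst[n++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            dst[n++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[n++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    dst[n] = 0;
    return n;
}
```

With this sketch, 'ñ' (00F1h) becomes C3 B1 and '€' (20ACh) becomes E2 82 AC.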

UTF8_to proc uses esi edi ebx sour:DWORD,dest:DWORD

xor    eax,eax
mov    esi,sour
mov    edi,dest
@@:
mov    al,byte ptr [esi]    ;lead byte of the next UTF-8 sequence
;cmp    ax,0FFFFh
;jg    _4bytes
cmp    al,0
je    fin
mov    cl,al
and    cl,0E0h
cmp    cl,0E0h    ;111xxxxx (note: 4-byte leads also match here)
je    _3bytes
mov    cl,al
and    cl,0C0h
cmp    cl,0C0h    ;110xxxxx
je    _2bytes
;----1_byte
mov    byte ptr [edi],al
mov    byte ptr [edi+1],0
inc    esi
jmp    next_char
_2bytes:
mov    cl,al
shl    cl,6
mov    bl,byte ptr [esi+1]
and    bl,3Fh
or    cl,bl
mov    byte ptr [edi],cl    ;low byte of the UCS-2 character
;-------------------------
mov    cl,al
and    cl,1Fh
shr    cl,2
mov    byte ptr [edi+1],cl    ;high byte
add    esi,2
jmp    next_char
_3bytes:
mov    cl,byte ptr [esi+1]
and    cl,3
shl    cl,6
mov    bl,byte ptr [esi+2]
and    bl,3Fh
or    cl,bl
mov    byte ptr [edi],cl    ;low byte of the UCS-2 character
;-------------------------
mov    cl,byte ptr [esi+1]
and    cl,3Fh
shr    cl,2
mov    bl,byte ptr [esi]
shl    bl,4
or    cl,bl
mov    byte ptr [edi+1],cl    ;high byte
add    esi,3
_4bytes: ;WE'LL_IGNORE_IT_AS_MOST_OF_API_DOES
next_char:
add    edi,2
jmp    @B
fin:
mov    byte ptr [edi],0    ;16-bit terminator
mov    byte ptr [edi+1],0

ret

UTF8_to endp

P.S. characters greater than 0FFFFh are rarely used (maybe never) so I didn't bother implementing them...
Posted on 2005-12-21 15:32:14 by ramguru
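As a cross-check for the UTF-8 to UTF-16 direction, here is a portable C sketch. It is not ramguru's code: `utf8_to_utf16` is a hypothetical name, it rejects stray or missing continuation bytes, and it turns 4-byte sequences into surrogate pairs. For brevity it does not reject overlong forms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a NUL-terminated UTF-8 string to UTF-16.
   Returns the number of 16-bit units written (excluding the terminator),
   or (size_t)-1 on malformed input or if dst (cap units) is too small. */
size_t utf8_to_utf16(const unsigned char *src, uint16_t *dst, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return (size_t)-1;
    while (*src) {
        unsigned char b = *src++;
        uint32_t cp;
        int extra;                       /* continuation bytes expected */

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;          /* stray continuation or bad lead */

        while (extra-- > 0) {
            /* Also catches a string that ends mid-sequence (NUL byte). */
            if ((*src & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*src++ & 0x3F);
        }
        if (cp > 0x10FFFF)
            return (size_t)-1;

        size_t units = cp >= 0x10000 ? 2 : 1;
        if (n + units + 1 > cap)         /* keep room for the terminator */
            return (size_t)-1;

        if (units == 2) {                /* split into a surrogate pair */
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            dst[n++] = (uint16_t)cp;
        }
    }
    dst[n] = 0;
    return n;
}
```

The bytes C3 B1 decode to 00F1h ('ñ'), matching what UTF8_to produces for 2-byte sequences.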
Thanks, these funcs work very well.

Only one thing...  :oops:, the output format is XX 00 XX 00 XX 00 XX 00, and LVM_SETITEM inserts a null-terminated string into the listview, so
only the first character is displayed.

I made a little func that skips these 00s, and it works OK, but is there a more elegant method (like MessageBoxW, which displays the output string correctly)?


Morlok


Posted on 2005-12-21 17:47:38 by The Morlok
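The "only the first character" symptom can be reproduced without any Windows API. In this minimal C sketch (both function names are made up for illustration), an ANSI-style reader stops at the first 00 byte, while a wide reader stops only at a 16-bit zero.

```c
#include <stddef.h>

/* Length as an ANSI (A) message sees the buffer:
   bytes up to the first 00 byte. */
size_t ansi_length(const unsigned char *buf)
{
    size_t n = 0;
    while (buf[n] != 0)
        n++;
    return n;
}

/* Length as a wide (W) message sees the same buffer:
   16-bit units up to the first 00 00 pair. */
size_t wide_length(const unsigned char *buf)
{
    size_t n = 0;
    while (buf[2 * n] != 0 || buf[2 * n + 1] != 0)
        n++;
    return n;
}
```

For the UTF-16 bytes of "Hola" (48 00 6F 00 6C 00 61 00 00 00), `ansi_length` returns 1 and `wide_length` returns 4. Stripping the 00 bytes only helps while every character is below 0100h, because anything higher has a non-zero high byte.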
Create a 'Wide' listview (CreateWindowExW).
Posted on 2005-12-21 17:56:32 by ti_mo_n
Don't forget to declare the class name in Unicode as well - uni$("SysListView32") or CADD("S",0,"y",0,"s",0,"L",0...)
Posted on 2005-12-21 18:18:13 by ramguru
Unicode Listview Revisited.... :roll:


Here is a small program that opens a UTF-8 file (only a few lines) and puts the text into an Edit, a ListBox and finally a ListView.
(Here I use ramguru's funcs to translate the UTF-8 strings.)

Well, the Edit and the ListBox display the text correctly, but the ListView does not.
	
    invoke UTF8_to, addr LocalBuff2, addr LocalBuff1 ;THANKS ramguru FOR YOUR FUNCS (UTF8_to AND to_UTF8)

    invoke SendDlgItemMessageW, hWnd, IDC_EDT1, WM_SETTEXT, 0, addr LocalBuff1

    invoke SendDlgItemMessageW, hWnd, IDC_LST1, LB_INSERTSTRING, 0, addr LocalBuff1

    mov Item.imask,  LVIF_TEXT
    mov Item.cchTextMax,256
    lea edi, LocalBuff1
    mov Item.pszText,edi
    mov Item.iSubItem,0
    mov Item.iItem, 0
    invoke SendDlgItemMessageW, hWnd, IDC_LSV1, LVM_INSERTITEM, 0, addr Item


What is the ListView's trick for displaying Unicode strings?

Thanks

Morlok
Posted on 2005-12-23 19:46:13 by The Morlok
From the C header file (commctrl.h)

#define LVM_INSERTITEMA  (LVM_FIRST + 7)
#define LVM_INSERTITEMW  (LVM_FIRST + 77)

Listview is not a first generation control, so you must use different message numbers.

I'm not familiar with the MASM32 inc files. Check to see if the W version already exists or not. If it doesn't exist, the simplest way to create the W version is:

LVM_INSERTITEMW equ LVM_INSERTITEM + 70
Posted on 2005-12-23 21:45:54 by tenkey
Current WINDOWS.INC.

LVM_GETITEM    equ LVM_FIRST + 5
LVM_GETITEMW    equ LVM_FIRST + 75
LVM_SETITEM    equ LVM_FIRST + 6
LVM_SETITEMW    equ LVM_FIRST + 76
LVM_INSERTITEM  equ LVM_FIRST + 7
LVM_INSERTITEMW equ LVM_FIRST + 77
LVM_DELETEITEM  equ LVM_FIRST + 8
Posted on 2005-12-23 22:27:30 by hutch--
LVM_INSERTITEMW it is.
Now everything works.

Thanks


Merry Christmas
Posted on 2005-12-24 01:30:37 by The Morlok