I am considering how to enhance GoAsm's and GoRC's Unicode support and would welcome the views of members of this forum.

Both GoAsm and GoRC are for use with Windows, so what follows is for that platform only.

Reading Unicode source scripts
In order to offer proper support for all languages and alphabets, an assembler and resource compiler ought to be able to read source scripts written in Unicode. It is probably fair to insist that all instructions, directives and mnemonics are in English. However, a developer may wish to use Unicode text in any language within quoted strings, and such text will end up in menus, dialogs, files and messages to the user. Since I am not experienced in Unicode, my question is this: if GoAsm and GoRC could read UTF-16 text files, would this suffice? In other words, can Unicode developers who program for Windows save their source scripts in UTF-16 text format? From my investigation so far, this format seems more appropriate in the Windows environment than UTF-8 or UTF-32, since UTF-16 is the form of Unicode used by the "W" APIs of Windows NT/2000/XP.
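Plain-text editors on Windows usually mark a UTF-16 file with a byte order mark (BOM). For example, Notepad's "Unicode" option writes UTF-16 little-endian, beginning with the bytes FF FE. As a sketch of how an assembler might decide how to read a source script, assuming it relies on the BOM alone and treats a file without one as Ansi, here is a detector in C. The names `detect_encoding` and `src_encoding` are hypothetical.

```c
#include <stddef.h>

/* Encodings a tool might meet when opening a source script (hypothetical names). */
enum src_encoding { SRC_ANSI, SRC_UTF8, SRC_UTF16LE, SRC_UTF16BE };

/* Inspect the first bytes of a file and report its likely encoding.
   *skip receives the length of any byte order mark to skip over.
   This sketch does not check for UTF-32 BOMs. */
enum src_encoding detect_encoding(const unsigned char *buf, size_t len, size_t *skip)
{
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) { *skip = 2; return SRC_UTF16LE; }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) { *skip = 2; return SRC_UTF16BE; }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *skip = 3;
        return SRC_UTF8;
    }
    *skip = 0;
    return SRC_ANSI;   /* no BOM: assume an Ansi (code page) file */
}
```

Relying on the BOM keeps the check cheap; a file saved as UTF-16 without a BOM would be misread as Ansi under this assumption.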

Microsoft Layer for Unicode
In order to make an English-language application compatible both with Windows 9x and ME, which do not inherently use Unicode, and with Windows NT/2000/XP, which do, developers have tended to make just one Ansi-only version of their application. Such applications call the Ansi ("A") versions of the APIs irrespective of the platform on which they are running. However, when an application running under NT/2000/XP calls an Ansi API that handles text, the system has to translate the Ansi strings to Unicode, call the Unicode API internally, and translate any returned strings back again. This extra processing offends some assembler programmers and has given rise to some debate on this forum.
It seems that Microsoft is aiming to change this by making development in Unicode the norm. It has introduced the "Microsoft Layer for Unicode", which includes a redistributable DLL called Unicows.dll that exports the "W" versions of the APIs. This DLL appears to call the correct "W" or "A" version of the API in the system DLLs, depending on whether the machine is running NT/2000/XP or 9x/ME.
Under this new regime, an application designed for all platforms would call the Unicode versions of the APIs and pass Unicode strings. If the system is running 9x or ME, the DLL must then translate the Unicode strings to Ansi so that the "A" API can be called, and must translate any strings returned by the API back to Unicode before returning to the application.
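To make the translation step concrete, here is a minimal C sketch of what such a "W" wrapper might do on 9x/ME: narrow the Unicode input, call the "A" function, then widen the result. It is not how Unicows.dll is actually implemented. `UpperA` and `UpperW` are hypothetical stand-ins for a real API pair, and the narrowing keeps only characters below 80h (anything else becomes '?') instead of using a real code page.

```c
#include <stddef.h>
#include <stdint.h>

#define THUNK_BUF 256   /* hypothetical fixed size for the temporary Ansi buffers */

/* Hypothetical "A" API: copies an Ansi string to out, upper-casing a-z. */
static void UpperA(const char *in, char *out, size_t cap)
{
    size_t i = 0;
    for (; in[i] != '\0' && i + 1 < cap; i++)
        out[i] = (in[i] >= 'a' && in[i] <= 'z') ? (char)(in[i] - 'a' + 'A') : in[i];
    out[i] = '\0';   /* assumes cap >= 1 */
}

/* Hypothetical "W" wrapper: Unicode -> Ansi, call "A", Ansi -> Unicode. */
void UpperW(const uint16_t *in, uint16_t *out, size_t cap)
{
    char ain[THUNK_BUF], aout[THUNK_BUF];
    size_t i = 0;

    for (; in[i] != 0 && i + 1 < THUNK_BUF; i++)       /* narrow the input */
        ain[i] = in[i] < 0x80 ? (char)in[i] : '?';
    ain[i] = '\0';

    UpperA(ain, aout, THUNK_BUF);                      /* call the "A" API */

    for (i = 0; aout[i] != '\0' && i + 1 < cap; i++)   /* widen the result */
        out[i] = (uint16_t)(unsigned char)aout[i];
    out[i] = 0;   /* assumes cap >= 1 */
}
```

Both conversions happen on every call, which is the same kind of overhead described above for Ansi applications on NT/2000/XP, only in the opposite direction.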

So, with the Microsoft Layer for Unicode, it seems that developers are more likely in future to make just one version of their products, a Unicode version, and to ship Unicows.dll with it to ensure that it runs properly on all Windows platforms.

Existing GoAsm Unicode support
Existing GoAsm support is adequate to deal with the above changes for English applications, since all calls can be made expressly to the Unicode ("W") APIs. Calls to generic APIs (which do not have "A" or "W" versions) can be made in the normal way. Quoted strings in English are converted automatically by GoAsm, using the API MultiByteToWideChar, if you use either DUS ("declare Unicode string") or L"...". Both can be used in the data section; L"..." can also be used in the code section, after PUSH or as a parameter after INVOKE, to push a pointer to the string.
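For English (7-bit ASCII) text the conversion is simple: each character's Unicode value equals its ASCII code, so each byte just becomes a 16-bit word, stored little-endian on x86. The hypothetical C function below sketches the bytes a declaration such as DUS "Hi" would lay down. It zero-extends directly rather than calling MultiByteToWideChar, which gives the same result only for ASCII characters.

```c
#include <stddef.h>

/* Emit the UTF-16LE bytes for a 7-bit ASCII string: one 16-bit word
   per character, low byte first. No terminator is added here.
   Returns the number of bytes written to out. */
size_t emit_utf16le(const char *s, unsigned char *out)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        out[n++] = (unsigned char)*s;   /* low byte: the ASCII code */
        out[n++] = 0;                   /* high byte: zero for ASCII */
    }
    return n;
}
```

So "Hi" occupies four bytes: 48h 00h 69h 00h.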

Ansi/Unicode switched source code
Despite the changes now introduced by Microsoft, I appreciate that developers will still sometimes want to make two different versions of their program, one in Ansi and one in Unicode. GoAsm supports this mainly through conditional assembly. APIs can be switched using, eg., DefWindowProc=DefWindowProc##AW, where AW is defined as either A or W depending on whether the UNICODE flag is on. Conditional assembly can also be used to switch between Ansi and Unicode strings.
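C programmers will recognise the same technique from the C preprocessor. As a rough analogue of the GoAsm line above, this sketch defines AW as A or W and pastes it onto the API name. The two-level `PASTE` macro forces AW to be expanded before `##` joins the tokens. `ShowA` and `ShowW` are hypothetical stand-ins for DefWindowProcA and DefWindowProcW. Each API still needs its own line, just as each needs its own entry in an include file.

```c
/* Defined when building the Unicode version, eg. with -DUNICODE. */
#ifdef UNICODE
#define AW W
#else
#define AW A
#endif

/* Two-level paste so that AW becomes A or W before ## joins the tokens. */
#define PASTE2(a, b) a##b
#define PASTE(a, b) PASTE2(a, b)

/* Hypothetical "A" and "W" versions of one API. */
const char *ShowA(void) { return "Ansi version"; }
const char *ShowW(void) { return "Wide version"; }

/* One line per API, like DefWindowProc=DefWindowProc##AW. */
#define Show PASTE(Show, AW)

/* Calls ShowA or ShowW depending on UNICODE. */
const char *call_show(void) { return Show(); }
```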

GoAsm syntax enhancements
To help with creating two different versions of a program, I am considering adding the following extensions, which would switch depending on whether the source script at that point was UNICODE or ANSI:-
DAWS ("declare Ansi or Wide string") for use directly in the data section or for use in a structure;
PAWS ("push Ansi or Wide string") for pushing a pointer to the string before an API call; and
AW"..." ("declare or push Ansi or Wide string"), an alternative to the above which could also be used as a parameter after INVOKE.
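In C, the nearest equivalent is a macro that yields either an Ansi literal or a UTF-16 literal, much as the proposed DAWS and AW"..." would. The sketch below assumes a C11 compiler (for u"..." literals); `AWSTR`, `aw_char` and the helper functions are hypothetical names. The declared array takes 6 bytes in an Ansi build and typically 12 in a Unicode build.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef UNICODE
/* Unicode build: C11 u"..." literals hold UTF-16 code units. */
typedef uint_least16_t aw_char;
#define AWSTR(s) u##s
#else
/* Ansi build: ordinary char literals. */
typedef char aw_char;
#define AWSTR(s) s
#endif

/* Data-section use, in the spirit of the proposed DAWS. */
static const aw_char greeting[] = AWSTR("Hello");

/* Parameter use, like PAWS or AW"...": takes a pointer to either kind. */
size_t aw_len(const aw_char *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Size in bytes of the declared string, including its terminator. */
size_t greeting_bytes(void) { return sizeof greeting; }
```

A call such as `aw_len(AWSTR("Hi"))` then compiles in either build without change.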

I am also considering adding an automatic switch to the APIs to avoid the need for any lists of APIs in the include file, for example:-
CALL MessageBox#AW

Generic APIs would still be called normally eg. GetBkMode.
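The proposed CALL MessageBox#AW moves the switch to the call site, so no per-API list is needed. A comparable C sketch attaches the suffix where the function is called. `NoticeA` and `NoticeW` are hypothetical stand-ins for MessageBoxA and MessageBoxW, and `CALL_AW` is an illustrative macro name, not a proposed GoAsm spelling.

```c
#ifdef UNICODE
#define AW_SUFFIX W
#else
#define AW_SUFFIX A
#endif

/* Expand AW_SUFFIX first, then glue it onto the API name. */
#define AW_CAT2(name, sfx) name##sfx
#define AW_CAT(name, sfx) AW_CAT2(name, sfx)

/* Call-site switch, in the spirit of CALL MessageBox#AW. */
#define CALL_AW(name) AW_CAT(name, AW_SUFFIX)

/* Hypothetical "A"/"W" pair standing in for MessageBoxA/MessageBoxW. */
char NoticeA(void) { return 'A'; }
char NoticeW(void) { return 'W'; }

/* Resolves to NoticeA() or NoticeW() with no per-API definition. */
char which_notice(void) { return CALL_AW(Notice)(); }
```

A generic API such as GetBkMode would simply be called by its own name, without CALL_AW.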

And if GoAsm is to read Unicode source scripts, I ought also to add A"...", which would convert a quoted Unicode string to Ansi using the API WideCharToMultiByte.
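As an illustration of what A"..." involves, the sketch below converts UTF-16 to a single-byte string. It assumes ISO-8859-1 (Latin-1) as the target, where code points up to FFh map to the same byte value. A real implementation would call WideCharToMultiByte with the chosen code page, whose mapping differs (Windows-1252, for example, differs in the range 80h to 9Fh). `utf16_to_latin1` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a null-terminated UTF-16 string to Latin-1. Characters with
   no mapping become '?', much as WideCharToMultiByte typically
   substitutes a default character. Surrogate pairs are not decoded;
   each half becomes '?'. Returns the length written, excluding the
   terminator. */
size_t utf16_to_latin1(const uint16_t *in, char *out, size_t cap)
{
    size_t i = 0;
    if (cap == 0)
        return 0;
    for (; in[i] != 0 && i + 1 < cap; i++)
        out[i] = in[i] <= 0xFF ? (char)in[i] : '?';
    out[i] = '\0';
    return i;
}
```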

On this basis, plain quoted strings without A, L or AW in front would not be changed at all.

Other possible Unicode issues are:-

Special Unicode characters
One example is the paragraph separator (2029h), but clearly there are many such characters. Should I add support for them using, for example, escape sequences? Hopefully this is not necessary, since such characters can already be inserted using, eg.:
DUS "I am a Unicode string"
DW 2029h
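If escapes were added after all, one possibility would be to borrow the C and Java form \u2029 inside quoted strings. This is purely a hypothetical syntax for GoAsm. The C sketch below shows how little parsing it would need: a backslash, u and exactly four hex digits form one UTF-16 code unit, and anything malformed is copied unchanged. Characters above FFFFh would need two escapes, one per surrogate.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Translate the body of a quoted string into UTF-16 words, treating
   \uXXXX as an escape. Returns the number of words written
   (no terminator added). */
size_t unescape_utf16(const char *s, uint16_t *out)
{
    size_t n = 0;
    while (*s != '\0') {
        if (s[0] == '\\' && s[1] == 'u') {
            int v = 0, k;
            for (k = 0; k < 4; k++) {      /* stops at the terminator too */
                int h = hexval(s[2 + k]);
                if (h < 0)
                    break;
                v = v * 16 + h;
            }
            if (k == 4) {                  /* well-formed escape */
                out[n++] = (uint16_t)v;
                s += 6;
                continue;
            }
        }
        out[n++] = (uint16_t)(unsigned char)*s++;   /* ordinary character */
    }
    return n;
}
```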

Parsing Unicode strings
In WALK32, Sven Schreiber offers a series of pseudo-instructions which, for example, move by a word if UNICODE is "on" and by a byte if it is "off". To my mind this would make code unnecessarily complex; it would be better for the developer to write completely separate parsers where necessary, the correct one being called either at compile time (using conditional assembly) or at run time.
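A minimal C sketch of the separate-parsers approach: two word counters, one for Ansi and one for UTF-16 strings, with the choice made at run time by a flag. The function names are hypothetical; in a conditionally assembled (or compiled) build, one call would simply be selected when the program is built.

```c
#include <stddef.h>
#include <stdint.h>

/* Ansi parser: count space-separated words in a null-terminated string. */
size_t count_words_a(const char *s)
{
    size_t n = 0;
    int inword = 0;
    for (; *s != '\0'; s++) {
        int sp = (*s == ' ');
        if (!sp && !inword)
            n++;
        inword = !sp;
    }
    return n;
}

/* Unicode parser: the same logic, stepping by 16-bit words. */
size_t count_words_w(const uint16_t *s)
{
    size_t n = 0;
    int inword = 0;
    for (; *s != 0; s++) {
        int sp = (*s == ' ');
        if (!sp && !inword)
            n++;
        inword = !sp;
    }
    return n;
}

/* Run-time selection between the two parsers. */
size_t count_words(const void *s, int is_unicode)
{
    return is_unicode ? count_words_w(s) : count_words_a(s);
}
```

Each parser stays simple because it only ever handles one character width; the only switched code is the single dispatch.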

I would appreciate members' comments.
Posted on 2002-12-07 17:17:06 by jorgon