I'm sorry for posting something not related to asm, but no one answered my question at the MySQL forum. :mad: I think maybe there's someone who can help me here :D
I use libmysql.dll to connect to a MySQL database from Visual C++. I need to work with Unicode strings, but I only see the ANSI version of the function, mysql_query(MYSQL *, char *). I must update my database with some Unicode strings, so how do I do that?

thanks.
Posted on 2007-04-07 08:54:47 by secmask
I don't think full "Unicode" is what you want. Try using UTF-8.

If you don't know what UTF-8 is exactly, well... it is best described as a variable-length character set, whereas standard ASCII/Unicode are fixed-length. The standard ASCII 0x00-0x7F characters are encoded just the same and thus still only take one byte. Characters 0x80-0xFF actually need two bytes, because the most significant bit of the first byte in a character sequence indicates whether that sequence is an expansion.
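
A small sketch of how the first byte signals the length of a sequence. The function name utf8_sequence_length is made up for this example, and it only looks at the bit pattern of the lead byte; it does not reject overlong or out-of-range lead bytes such as 0xC0 or 0xF5.

```cpp
#include <cstddef>

// Returns how many bytes a UTF-8 sequence occupies, judging only by the
// bit pattern of its first byte. Returns 0 for a byte that cannot start
// a sequence (a continuation byte 10xxxxxx, or 11111xxx).
// utf8_sequence_length is a hypothetical helper, not a library call.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)           return 1; // 0xxxxxxx: plain ASCII, one byte
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx: two-byte sequence
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx: three-byte sequence
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx: four-byte sequence
    return 0;                            // not a valid lead byte
}
```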

This is exactly why websites with UTF-8 work so well. I also intend to support UTF-8 as the default character set in DynatOS.

Anyhow, Wikipedia has a better explanation of UTF-8 than I could give in this thread.

Good luck.
Posted on 2007-04-07 09:56:58 by SpooK
Oh, yeah, that's UTF-8; MySQL doesn't use full Unicode. So how do I make a query containing a UTF-8 string with MySQL? I only see the ANSI version of the mysql_query function.
Posted on 2007-04-07 10:12:43 by secmask

Due to the nature of UTF-8, I think it works automatically: a UTF-8 string is still an ordinary zero-terminated char * string, so mysql_query can take it as-is. Just feed it a test UTF-8 string and see what happens :)
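
For reference, a rough sketch of the conversion step, assuming the text starts out as UTF-16 (what the Windows *W functions use). The helper name utf16_to_utf8 is made up for this example, and replacing unpaired surrogates with U+FFFD is one possible policy, not the only one.

```cpp
#include <cstddef>
#include <string>

// Converts UTF-16 text to UTF-8 using standard C++ only.
// utf16_to_utf8 is a hypothetical helper written for this example.
// Unpaired surrogates are replaced with U+FFFD (an assumed policy).
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine it with the low surrogate that follows.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD; // high surrogate with no partner
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // stray low surrogate
        }

        if (cp < 0x80) {            // 1 byte: identical to ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The result's c_str() is a plain char * that can be handed to mysql_query. On Windows, WideCharToMultiByte with CP_UTF8 does the same conversion. Whether the server then treats the bytes as UTF-8 depends on the connection and column character sets, which this thread does not cover.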
Posted on 2007-04-07 12:17:30 by SpooK
Oh, cool :D When I used the string as full Unicode I got an error; after converting it to UTF-8 it runs fine. As I guessed, I could get help here. Thanks, SpooK.
Posted on 2007-04-07 13:21:36 by secmask

Well, you got lucky this time... I've had my fair share of dealing with MySQL and UTF-8 over the years :P
Posted on 2007-04-07 14:52:44 by SpooK

I don't think full "Unicode" is what you want. Try using UTF-8.

If you don't know what UTF-8 is exactly, well... it is best described as a variable-length character set, whereas standard ASCII/Unicode are fixed-length....


Sorry to rip on you again, but you explained it a little wrongly.

Unicode is mostly a mapping of characters to numbers. It says only as much as 0x2C25 = GLAGOLITIC CAPITAL LETTER SMALL YUS WITH TAIL, etc.

The problem is that Unicode defines values up to 0x10FFFF. This means that to encode all Unicode values in a fixed-width variable you would need (practically) 32 bits, and that is quite a waste of space. (This is called the UTF-32 encoding.) So there are alternative encodings.
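
The arithmetic behind "practically 32 bits" can be checked with a few lines; bits_needed is a made-up helper name for this sketch.

```cpp
#include <cstdint>

// Number of bits needed to represent a value (0 needs 0 bits here).
// bits_needed is a hypothetical helper for illustration only.
constexpr unsigned bits_needed(std::uint32_t value)
{
    return value == 0 ? 0u : 1u + bits_needed(value >> 1);
}

// The largest code point, 0x10FFFF, needs 21 bits: too many for a single
// 16-bit unit, so the next convenient fixed width is 32 bits.
static_assert(bits_needed(0x10FFFF) == 21, "largest code point needs 21 bits");
static_assert(bits_needed(0x10FFFF) > 16, "one 16-bit unit is not enough");
```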

UTF-8 is one of them. The good thing about it is that a string encoded in UTF-8 is ASCII-compatible.

Another popular format is UTF-16, used by WinAPI for example (the *W functions). It is the 16-bit analogue of UTF-8. It is still variable-width: a character takes either 2 or 4 bytes.
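
A minimal sketch of the "2 or 4 bytes" rule, assuming the input is a valid code point (at most 0x10FFFF and not itself in the surrogate range 0xD800-0xDFFF). The name utf16_encode is invented for this example.

```cpp
#include <string>

// Encodes one Unicode code point as UTF-16 code units.
// utf16_encode is a hypothetical helper; it assumes cp <= 0x10FFFF and
// that cp is not in the surrogate range 0xD800-0xDFFF.
std::u16string utf16_encode(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        // Fits in one 16-bit unit (2 bytes).
        out += static_cast<char16_t>(cp);
    } else {
        // Needs a surrogate pair: two 16-bit units (4 bytes).
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 + (cp >> 10));   // high surrogate
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // low surrogate
    }
    return out;
}
```

With this, 0x2C25 comes out as a single unit, while a musical symbol such as 0x1D11E comes out as the pair 0xD834, 0xDD1E.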

So Unicode cannot be "fixed length". Unicode is just a table of numbers and corresponding characters. What you mean by "full Unicode" is probably Unicode encoded as UTF-32. AFAIK that one is used only for "wchar_t" in gcc/glibc on Linux. (wchar_t on Windows is usually 16-bit.)
Posted on 2007-04-17 17:10:54 by vid
vid: are you sure that the 16-bit UNICODE format used by NT is actually variable-width UTF-16? I was under the impression that it was designed as fixed-width, since NT was planned and written before the Unicode committee got very far with their work.

Imho UTF-8 is a storage format, while UTF-32 (treated as fixed-width, klingon lobbyists go bury yourself) is the working format. Makes life easier and safer.
Posted on 2007-04-17 17:26:29 by f0dder

vid: are you sure that the 16-bit UNICODE format used by NT is actually variable-width UTF-16? I was under the impression that it was designed as fixed-width, since NT was planned and written before the Unicode committee got very far with their work.

Never really tried it. Originally it was for sure UCS-2 (fixed-width), but now it should be real UTF-16.

I was trying to find some hard evidence, which was quite a problem. The best I could work out was this.

EDIT: found it! here

Imho UTF-8 is a storage format, while UTF-32 (treated as fixed-width, klingon lobbyists go bury yourself) is the working format. Makes life easier and safer.
I would personally worry especially about those Klingon lobbyists... ;) Having simplistic "Unicode support" like you suggest surely doesn't make life easier for those who like using character modifiers...
Posted on 2007-04-17 17:45:37 by vid
Isn't there a "canonical form" that fits in UCS-4, at least if you exclude lame crap like klingon?

Unicode is a mess anyway - it's so lame that they couldn't come up with something simple, efficient and unambiguous. Grmbl.
Posted on 2007-04-17 17:49:58 by f0dder
Isn't there a "canonical form" that fits in UCS-4, at least if you exclude lame crap like klingon?
anything for you: http://en.wikipedia.org/wiki/ISO-10646

Unicode is a mess anyway - it's so lame that they couldn't come up with something simple, efficient and unambiguous. Grmbl.

For programmers it is hell, surely. But after reading the Unicode standard, I am afraid it just isn't possible to do it simply for all languages.
Posted on 2007-04-17 17:56:00 by vid
Oh well, we'll just have to nuke china, japan, the middle east and... a few more places. Then life will be simpler ^_^
Posted on 2007-04-17 18:02:21 by f0dder
Don't forget to nuke a few dead civilizations (oh, done that already?) and IPA.
Posted on 2007-04-17 18:18:14 by vid
Well, as long as we stick to languages with an alphabet (as opposed to ideograms/whatever) and left-to-right read order, unicode isn't all that bothersome, so we can do selective nuking... ;)
Posted on 2007-04-17 18:20:29 by f0dder
Posted on 2007-04-17 18:36:59 by vid

Sorry to rip on you again, but you explained it a little wrongly.


Perhaps, but I do not attempt to overshoot my targets. He got the solution he needed, along with a suggestion to read an outside source, such as Wikipedia, which will most likely have the correct information and/or links to the correct information. I don't presume to know everything nor do I try to over-cite sources that people can simply read themselves.

Since I do like you and your enthusiasm, vid... I am going to give you some friendly advice.

You are pretty good at pushing precision, vid, I'll give you that... but you have much to learn in the application of the other part of the human equation... that being an appeal to *when* people actually care.

To highlight a general purpose example: When you are asked for the "time" you tend to respond with the hour and minute... but not the second. Such consistent and unnecessary "precision" will, more likely than not, ANNOY people. Also, despite common belief, your brain/balls don't seem any bigger to the people asking the questions. The same application applies to when people ask any other questions. Pretend everyone is a Police Officer or an Interviewer... don't tell them more than they need to know... let *them* ask *you* more questions ;)
Posted on 2007-04-17 19:24:56 by SpooK

I was under the impression that it was designed as fixed-width, since NT was planned and written before the Unicode committee got very far with their work.

WideChar contains all characters used nowadays, and it gets extended to 32-bit only for some ancient/rare Chinese characters, and musical scores. Thus, WideChar is practically fixed-width, 16-bit.
Posted on 2007-04-17 20:05:43 by Ultrano

Perhaps, but I do not attempt to overshoot my targets. He got the solution he needed, along with a suggestion to read an outside source, such as Wikipedia, which will most likely have the correct information and/or links to the correct information. I don't presume to know everything nor do I try to over-cite sources that people can simply read themselves.

Sure, your answer was certainly helpful. My point was just that he was using the wrong vocabulary, and he MIGHT want to have it corrected. If not, it's very easy to ignore my post and continue using the wrong terms.

Of course it was also possible to find out from wikipedia, but that is way more reading than my post.

Since I do like you and your enthusiasm, vid... I am going to give you some friendly advice.

You are pretty good at pushing precision, vid, I'll give you that... but you have much to learn in the application of the other part of the human equation... that being an appeal to *when* people actually care.

That is a matter of view. I think that if there is some important problem to point out, it should be pointed out even if the author doesn't care or doesn't want to hear it. For example, reminding people to do error checking after calling WinAPI procedures. Most people really are annoyed when I remind them of this. But maybe the next time they (and other readers of this post) code something, they will remember and do it properly.

To highlight a general purpose example: When you are asked for the "time" you tend to respond with the hour and minute... but not the second. Such consistent and unnecessary "precision" will, more likely than not, ANNOY people. Also, despite common belief, your brain/balls don't seem any bigger to the people asking the questions.

That is a bad example. This was as if you had answered "It's 17:50 AM" and I corrected you: "either say 17:50, or 5:50 PM". Sure, the questioner got his answer and figured out what it means anyway, and you are both annoyed by me. But otherwise he may get into the bad habit of expressing time incorrectly. I consider it good to annoy people a little for such a purpose.

WideChar contains all characters used nowadays, and it gets extended to 32-bit only for some ancient/rare Chinese characters, and musical scores. Thus, WideChar is practically fixed-width, 16-bit.

Not really, see this. Those ancient characters seem to be used.
And anyway, your software can't even be sold in China if you don't support characters above 10000h; AFAIK there's a law about it.
Posted on 2007-04-18 04:38:26 by vid


I was under the impression that it was designed as fixed-width, since NT was planned and written before the Unicode committee got very far with their work.

WideChar contains all characters used nowadays, and it gets extended to 32-bit only for some ancient/rare Chinese characters, and musical scores. Thus, WideChar is practically fixed-width, 16-bit.

Problems arise if you expect the string to be fixed-width and code that way, while in reality the APIs are UTF-16 instead of UCS-2, and then suddenly one day somebody decides to use stupid musical notes, ancient chinese, or whatever. And *b00m*, there goes your software.
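
To make the failure mode concrete, here is a small sketch that counts code points instead of 16-bit units. The name count_code_points is invented for this example, and it assumes well-formed UTF-16 (every high surrogate is followed by a low one).

```cpp
#include <cstddef>
#include <string>

// Counts code points in UTF-16 text by counting every code unit except
// low (trailing) surrogates, so each surrogate pair is counted once.
// count_code_points is a hypothetical helper; well-formed input is assumed.
std::size_t count_code_points(const std::u16string& s)
{
    std::size_t n = 0;
    for (char16_t unit : s) {
        if (unit < 0xDC00 || unit > 0xDFFF) {
            ++n; // not a low surrogate, so it starts a new code point
        }
    }
    return n;
}
```

For a string holding 'a', the musical symbol 0x1D11E and 'b', size() reports 4 units while this function reports 3 characters. Code that indexes or truncates by units can split the pair in the middle.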
Posted on 2007-04-18 04:59:35 by f0dder
Quite realistic circumstances  :roll:.
Maybe someone could show me a shred of hope that software in China isn't 100% pirated?
Posted on 2007-04-18 10:41:49 by Ultrano