I'm beginning to understand those "hardcore" assembly programmers who hate compilers... :mad:
I was coding in C++ today, and my program was crashing. After quite some debugging I found out what the problem was: my checksum function was *somehow* returning negative values (it was supposed to produce positives only), and that was causing a GPF later.
I scratched my head and wondered: how come this code can return a negative number?
unsigned int checksum(const char* name)
{
    unsigned int sum = 0;
    for( int i = 0; name[i]; i++ )
    {
        sum += (unsigned int)name[i];
    }
    return sum;
}
It couldn't be an integer overflow, because the strings I was feeding it were very short (under 256 chars long). I managed to isolate what kind of strings produced negative results, and it turned out they all contained characters above 127 (the high bit set).
Something fishy was going on there :/ so I took a look at the disassembly, and saw this line:
MOVSX ECX, BYTE PTR [...]
Huh?! :?:
I have a theory: I think the compiler doesn't define a conversion from char (signed by default) to unsigned int, but it does define one to int (also signed by default). So this statement...
(unsigned int)name[i]
...was converting the char to int first, and only then to unsigned int. Since the last conversion doesn't change the bits, in practice the char was being sign-extended as if it were a signed integer! :mad:
I had to fix the code like this:
unsigned int checksum(const char* name)
{
    unsigned int sum = 0;
    for( int i = 0; name[i]; i++ )
    {
        sum += (unsigned int)((unsigned char)name[i]);
    }
    return sum;
}
It can still be overflowed, but now it needs a string full of 0xFF chars and at least 8,421,505 bytes long to do it. I think I'm safe with my 256 chars limit. ;)
Anyway, is it possible to disable these automatic type conversions? Or at least get a warning when they take place? I'd hate to run into another one like this... :sad:
Hm, I'm not sure if there's some obscure thing in The Standard that explains this, but it does smell like a compiler bug. You should state which compiler you use - it's reproducible with vc2003. I would either pass unsigned chars (since signed chars have often been problematic in various circumstances), or rewrite it to something like this:
unsigned int checksum(const void* a_name)
{
    const unsigned char* const name = static_cast<const unsigned char* const>( a_name );
    unsigned int sum = 0;
    for( int i = 0; name[i]; i++ )
    {
        sum = sum + static_cast<unsigned>( name[i] );
    }
    return sum;
}
BIG smile, :mrgreen:
I'm beginning to understand those "hardcore" assembly programmers who hate compilers..
Ve Vill conVert Yoooo !
IMO it is not a bug!
The matter is that your signed character already holds a negative value, and that value is preserved when promoting to integer, independently of whether the destination integer is signed or unsigned. That peculiarity (signed vs. unsigned) is only taken into account in the operations you perform afterwards on the promoted integer.
Just another example:
    signed int si = -13;                  // mov dword ptr [si],0FFFFFFF3h
    unsigned int ui = (unsigned int)si;   // mov eax,dword ptr [si] + mov dword ptr [ui],eax
So the value inside ui will be the same as the value inside si:
you cannot expect that the cast operation will take off the negative sign from -13...
Best regards, bilbo
P.S. This is anyway a common programmer's trap: I saw a similar discussion on a Microcontrollers forum,
hxxp://www.cygnal.org/ubb/Forum7/HTML/000056.html?
To solve such problems in my C code, I use pointers - at least I know they will be copied properly :). Here's how I'd fix your code:
Note that I also use "long" instead of "int", because with some compilers "int" is 2 bytes, and "long" is always 4 bytes.
I use extra round brackets at tricky conversion-bound places, to assure the compiler will understand what exactly I need (and to spare me from learning the rock-paper-scissors logic in C/C++ compilers - that is not always implemented as ANSI states).
unsigned long checksum(const char* sname)
{
    unsigned long sum = 0;
    unsigned char *name = (unsigned char*)sname;
    while(*name)
    {
        sum += (unsigned int)(*name); // extra round brackets
        name++;
    }
    return sum;
}
Ve Vill conVert Yoooo !
I'm scared... :D
As bilbo pointed out, this is not a bug, though I'd call it a design flaw at any rate. Since there seems to be no way to make the compiler at least warn me when it's about to do something unexpected, I guess I'll have to always use unsigned chars from now on. :P
BTW, Ultrano, your fix won't work. The compiler will try to convert the signed char into signed integer first, no matter how many brackets you use... :(
What about the absolute value of the signed variable?
OK, maybe because I made a typo:
sum += (unsigned int)(*name); // ^^"
should be "long".
But if that doesn't work either, try
sum += ((unsigned long)(*name)) & 255;
So, which compiler do you use ???
If you're going to treat plain char as numbers, you should always strip away any potential sign extension.
Here is the historical reason: the char to int conversion is supposed to use the most efficient conversion. On some machines, like the Digital machines, it was more efficient if char was treated as signed. On other machines, like the IBM mainframes, it was more efficient to treat char as unsigned. The big Unix port to at least three other machines (the original machines were Digital models) pointed out little things like this.
When C began to migrate to microprocessors, there was no signed keyword. So if you wanted (implied) sign extension in your compiler, you made plain char be equivalent to modern signed char. (Even if signed conversion was less efficient than unsigned.)
Another reason - what I remember was that everyone with access to Unix machines wanted to mimic them, which meant mimicking the Digital versions of software.
With vc2003 you can use the /J switch to cl.exe to make char be treated as unsigned, which may help.
:shock: impressive, tenkey :)
interesting to know
Wow! A lot of nice info, thanks to tenkey and stormix!
By the way, MSVC 6.0 also supports the /J switch!
Regards, bilbo
Another way to do this (squeezed into one line for extra l33tness :mrgreen:)
for (int i = 0; name[i]; sum += (unsigned char)name[i++]);
Which is basically what tenkey indicated ;)
So, which compiler do you use ???
Both VC 6 and GCC 3. Since one of my goals is portability, I'd rather stay away from long and keep using int.
@tenkey: Very interesting! That explains this oddity in the type conversion logic. :)
With vc2003 you can use the /J switch to cl.exe to make char be treated as unsigned, which may help.
Never never never ever depend on this, it's for being able to compile broken source code. If you're depending on 'char' being unsigned, you might as well code in assembly, as your code will not be portable.