Zero-terminated strings are pretty inflexible, often lead to either over- or under-sized strings, and are the main cause of those pesky buffer overflow problems. Not to mention that the usual usage patterns of zero-terminated strings are often slower than need be, since you constantly scan for the string length.

Here's my initial work on dynamically sized and length-counted strings, which should be both safer and faster than traditional zero-terminated static buffers.
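To make the idea concrete, here is a minimal C sketch of a dynamically sized, length-counted string. It is not the library's code or layout: the names `cstr`, `cstr_append` and `cstr_free`, the field order and the doubling growth policy are all assumptions made for illustration, and size overflow is not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout, not the library's actual one: length and capacity
   are stored next to the data pointer. */
typedef struct {
    char  *data;  /* heap buffer, kept zero-terminated for convenience */
    size_t len;   /* bytes in use, not counting the terminator */
    size_t cap;   /* bytes allocated, not counting the terminator */
} cstr;

/* Append n bytes, growing the buffer when needed. Returns 0 on success,
   -1 if the allocation fails (the string is left unchanged). */
int cstr_append(cstr *s, const char *src, size_t n)
{
    if (s->data == NULL || n > s->cap - s->len) {
        size_t need = s->len + n;
        size_t cap = s->cap ? s->cap : 16;
        char *p;
        while (cap < need)
            cap *= 2;              /* assumed growth policy: doubling */
        p = realloc(s->data, cap + 1);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = cap;
    }
    memcpy(s->data + s->len, src, n);  /* length is known, no scanning */
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}

/* Release the buffer and reset the string to the empty state. */
void cstr_free(cstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
    s->cap = 0;
}
```

A string starts out zero-initialized (`cstr s = {0};`), and the length is always available as `s.len` without a scan.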

Any feedback is welcome, especially bug reports (there might be some "knock head into table" bugs as well as off-by-one errors, let me know).

Ideas and/or code for additional functions or macros are welcome, as well as optimization ideas (mostly those that give a measurable benefit, but anything goes :)). The included example is *very* basic, but should be enough to get you started.
Posted on 2005-08-19 06:36:19 by f0dder
I don't have any bugreports but I've only glanced at the source.

My only initial suggestion would be that, as a sister library, someone should put together a complete set of conversion functions to and from the various numerical types. Source code for those functions already exists, so it should be possible to write a set which uses your dyn strings natively rather than wrapping existing ones with zstring to dstr and vice versa conversions.

Regardless, though, I do want to say that I think this is an excellent library and is perhaps one which has been needed for a long time.

P.S. I normally don't correct spelling/grammer mistakes seeing as I make so many myself, but I suspect you ment to use the word misrepresented as opposed to misinterpreted in the copyright notice on dstring.inc.
Posted on 2005-08-19 07:07:47 by Eóin
Works like a charm.  Assembled clean & runs smooth.

-#2pencil-
Posted on 2005-08-19 07:53:48 by number2pencil

P.S. I normally don't correct spelling/grammer mistakes seeing as I make so many myself, but I suspect you ment to use the word misrepresented as opposed to misinterpreted in the copyright notice on dstring.inc.


Did you do that on purpose?  ;)
Posted on 2005-08-19 09:39:40 by roticv
Heh, I thought I had written misrepresented - interesting :) (I hope the code doesn't have that kind of brainfarts... I did double-check it and have a few runs with OllyDbg, but again - more testing for the win).

Number conversion could be pretty decent - initially handling up to 64-bit signed/unsigned and hexadecimal conversions. sprintf-style formatting would also be nifty.

Once the base library is done, I'll probably write some C++ wrapper class for it; std::string is pretty swell and all that, but it doesn't always give me what I need. I'll probably also convert the library to FASM, to get rid of the dependency on a legacy assembler like MASM (though I'll still continue full interface compatibility with MASM).

I'll be leaving for a 12-hour night shift in a couple hours, that should give me plenty of coding time :D
Posted on 2005-08-19 11:02:03 by f0dder
"knock head into table" bugs


In fact, there is one in the dstr_free procedure ... the memory pointer gets zeroed before it is tested to see whether it points to memory that should be deallocated.
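The thread does not show the dstr_free source, so the following C sketch only illustrates the class of bug described, under assumed names (`handle`, `free_buggy`, `free_fixed` are hypothetical). Clearing the pointer before testing it means the test always sees a null pointer and the buffer is never released.

```c
#include <stdlib.h>

/* Hypothetical string handle, used only for this illustration. */
typedef struct {
    char  *data;
    size_t len;
} handle;

/* Buggy ordering: the pointer is cleared first, so the test below never
   sees the original value and the buffer leaks.
   Returns 1 if a buffer was released, 0 otherwise. */
int free_buggy(handle *h)
{
    char *p;
    h->data = NULL;        /* original pointer is lost here */
    p = h->data;           /* always NULL by now */
    if (p != NULL) {
        free(p);
        return 1;
    }
    return 0;
}

/* Fixed ordering: read the pointer, then clear the field, then test. */
int free_fixed(handle *h)
{
    char *p = h->data;     /* take the pointer before clearing it */
    h->data = NULL;
    h->len = 0;
    if (p != NULL) {
        free(p);
        return 1;
    }
    return 0;
}
```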
Posted on 2005-08-20 13:29:51 by Frank
Thanks for reporting - dunno what I was thinking when I wrote the code. Fixed in source, will update the .zip somewhat later (want to rewrite the stuff in FASM first).
Posted on 2005-08-20 16:40:06 by f0dder
roticv, I wish I did that on purpose but unfortunatly no :sad: .
Posted on 2005-08-20 17:46:05 by Eóin
f0dder, it was my pleasure  :-)

Keep up the good work. I'm looking forward to the FASM version.
Posted on 2005-08-20 22:38:10 by Frank
Whoo~hoo. I like this. I have been confused by string algorithms.


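; CTEXT: place a zero-terminated string literal in the CONST segment
; and expand to its offset.
CTEXT MACRO y:VARARG
    LOCAL sym

CONST segment
    IFIDNI <y>,<>
        sym db 0            ; no argument: emit an empty string
    ELSE
        sym db y,0          ; literal followed by the terminator
    ENDIF
CONST ends

    EXITM <OFFSET sym>      ; hand the address back to the caller
ENDM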
Posted on 2005-08-21 23:35:37 by realvampire

Thanks for reporting - dunno what I was thinking when I wrote the code. Fixed in source, will update the .zip somewhat later (want to rewrite the stuff in FASM first).


f0dder, are you going to replace your ZIP file any time soon? Of course the bug is trivial and people reading the whole thread can correct it on their own. But then, some won't read the whole thread and, after downloading, will run into the memory leak.
Posted on 2005-08-27 23:26:09 by Frank

Zero-terminated strings are pretty inflexible, often lead to either over- or under-sized strings, and are the main cause of those pesky buffer overflow problems. Not to mention that the usual usage patterns of zero-terminated strings are often slower than need be, since you constantly scan for the string length.


What do you mean by inflexible? Why do they often lead to over- or under-sized strings? Why do they lead to buffer overflows? These aren't direct consequences of the data structure. I agree that caching string length is often a good idea. I can, however, see a few notable advantages to plain zero-terminated strings:


  • Many string operations can work more efficiently with zero-terminated strings: with counted-length strings, you need to allocate a register for the length (or read it from memory, ugh).

  • They incur minimal space overhead per string: This can be important when you deal with many short strings, which isn't an exceptional case.

  • They are ubiquitous and portable: they are the C concept of a string. With counted-length strings you need to think about endianness and machine word size; other modules need to agree about these in order to communicate efficiently.
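A rough C sketch of the space and portability points above. The header layout (`counted_header` with a length and a capacity field) is an assumption for illustration, not any particular library's format, and the fixed little-endian encoding is just one way modules could agree on how a length is exchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Bookkeeping cost of a zero-terminated string: just the terminator. */
size_t zstr_overhead(void)
{
    return 1;
}

/* Hypothetical counted-string header with length and capacity fields;
   its size depends on the machine word size. */
typedef struct {
    size_t len;
    size_t cap;
} counted_header;

size_t counted_overhead(void)
{
    return sizeof(counted_header);
}

/* Writing the length in a fixed width and byte order (32-bit
   little-endian here) avoids the endianness and word-size question
   when strings cross module or machine boundaries. */
void put_len_le32(unsigned char out[4], uint32_t len)
{
    out[0] = (unsigned char)(len & 0xFF);
    out[1] = (unsigned char)((len >> 8) & 0xFF);
    out[2] = (unsigned char)((len >> 16) & 0xFF);
    out[3] = (unsigned char)((len >> 24) & 0xFF);
}

uint32_t get_len_le32(const unsigned char in[4])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

For a three-character string, the header in this sketch costs more than the text itself, which is the short-string concern raised in the second point.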



Posted on 2005-08-28 05:29:03 by death
While I am more than pleased to see f0dder writing something to support FASM, it's worth noting that the OS already has strings of this type in OLE strings in both ANSI and UNICODE, so there may be some duplication of capability in trying to do it another way. Windows can live with both concepts: the zero-terminated types are simpler, while OLE strings and similar systems, though easier for inexperienced users, are inherently slower because of the additional overhead.
Posted on 2005-08-28 06:31:09 by hutch--
Frank, I hope to upload a new version later tonight; the code is modularized and translated to FASM, but I want to write up a makefile before I move on. It's been some 3-4 years since I manually wrote a makefile, and I'm currently pondering whether to use GNU make or MS nmake or pelle's pomake.

Death, those statements were made based on observations of how a lot of C and assembly programmers use zero-terminated strings: off-by-one errors, over/undersizing, etc. zstrings aren't bad in and of themselves, but they require more effort to get right.


Many string operations can work more efficiently with zero-terminated strings: with counted-length strings, you need to allocate a register for the length (or read it from memory, ugh).

With zstrings, you need some additional branching logic, though - and we all know that branches are slow, while memory in the L1 cache (i.e., a few locals) isn't that bad wrt. speed.

hutch--, this library is going to be fairly portable - you only need to change the dynamic memory allocation functions around. Besides, you don't have access to the implementation of the Windows BSTRINGs, so you cannot make guarantees about size/speed across Windows versions.
Posted on 2005-08-28 11:01:51 by f0dder

With zstrings, you need some additional branching logic, though - and we all know that branches are slow, while memory in the L1 cache (i.e., a few locals) isn't that bad wrt. speed.


I don't understand: you still need branching even with counted-length strings.
Posted on 2005-08-28 13:02:47 by death


With zstrings, you need some additional branching logic, though - and we all know that branches are slow, while memory in the L1 cache (i.e., a few locals) isn't that bad wrt. speed.


I don't understand: you still need branching even with counted-length strings.



Consider strncpy - you need to check for length as well as zero. With a length-counted string, you don't need the check-for-zero branch, and you don't even need a bytecopy loop; you can use whatever efficient form of memcpy you want.
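A small C sketch of that difference. The function names `copy_zstr` and `copy_counted` are made up for this example, and unlike the real strncpy both always terminate the destination and do no zero padding; both assume the destination size is at least 1.

```c
#include <stddef.h>
#include <string.h>

/* Zero-terminated source: every byte costs two tests, one against the
   destination limit and one against the terminator.
   Returns bytes copied, not counting the terminator. */
size_t copy_zstr(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;
    while (i + 1 < dst_size && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}

/* Counted source: the length is known up front, so a single clamp is
   followed by one block copy with no per-byte terminator test. */
size_t copy_counted(char *dst, size_t dst_size,
                    const char *src, size_t src_len)
{
    size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```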
Posted on 2005-08-28 13:30:12 by f0dder

Consider strncpy - you need to check for length as well as zero. With a length-counted string, you don't need the check-for-zero branch, and you don't even need a bytecopy loop; you can use whatever efficient form of memcpy you want.


Yeah, but consider a general case where you want to process character by character, for example simplistic case conversion. There are also other efficiencies (e.g., in-place substrings, mostly used in tokenization). I have no doubt length-counted strings are more efficient in many common operations; just pointing out that zero-terminated strings do have their uses and goodies.
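For illustration, a C sketch of both cases (the function names `upcase_zstr` and `split_spaces` are made up for this example): a per-character loop that needs nothing but the pointer, and in-place tokenization where each separator is overwritten with a terminator so that tokens become valid strings without any copying.

```c
#include <ctype.h>
#include <stddef.h>

/* Upper-case a zero-terminated string in place; the loop needs only the
   pointer, no separate length counter. */
void upcase_zstr(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/* Split on spaces in place: each separator that ends a token is
   overwritten with a terminator, so every token is a zero-terminated
   substring of the original buffer. Stores up to max token pointers
   and returns how many were found. */
size_t split_spaces(char *s, char **tokens, size_t max)
{
    size_t n = 0;
    while (*s != '\0' && n < max) {
        while (*s == ' ')              /* skip leading separators */
            s++;
        if (*s == '\0')
            break;
        tokens[n++] = s;               /* token starts here */
        while (*s != '\0' && *s != ' ')
            s++;
        if (*s == ' ')
            *s++ = '\0';               /* terminate the token in place */
    }
    return n;
}
```

With a counted representation, the same tokens would need either a copy each or a separate (pointer, length) pair per token.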

Posted on 2005-08-28 17:17:31 by death
Updates should be coming along soon - I've finally set up a version/source control system (http://subversion.tigris.org), and I'm re-teaching myself GNU make.
Posted on 2005-09-17 17:38:03 by f0dder