There are times when you want an array of strings, in order or not, without a big production like linked lists. For example, for error code lookups you generally have an error number and a message, and given the number you would normally create an array of pointers to strings. That seems a bit much for such a simple task, and handling each case on an ad-hoc basis is a pain, so I came up with this. It is designed to generate error messages, so it is not fast, nor does it need to be...

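.data

; Each entry is a key byte (080h + index) followed by a zero-terminated string
STRINGBLOCK     DB 080h,"String Entry 0",0
                DB 081h,"String Entry 1",0
                DB 082h,"String Entry 2",0
                DB 083h,"String Entry 3",0
                DB 084h,"String Entry 4",0
                DB 085h,"String Entry 5",0
ENDSTRINGBLOCK:
                DB "No entry found",0       ; default, directly after the block

LENSTRINGBLOCK  EQU ENDSTRINGBLOCK-STRINGBLOCK

.code
GetString FRAME iString
    uses edi
    mov al,[iString]
    add al,080h                 ; index -> key byte
    mov ecx,LENSTRINGBLOCK      ; scan at most the whole block
    mov edi,OFFSET STRINGBLOCK
    repne scasb                 ; edi stops one past the key, or at ENDSTRINGBLOCK
    mov eax,edi                 ; eax = string after the key, or the default
    ret
ENDF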


When you want string 0 you simply invoke GetString,0 and it will return the offset of the first string, zero terminated and ready to copy or display or whatever.

The strings are keyed with the index number (beginning at 080h so the keys don't interfere with the content). The proc just adds 080h to the index and scans for that byte value, returning the offset of the character immediately following it. For values that do not exist, the pointer ends up one past the end of the array, which is where "No entry found" sits, so that is what gets returned. Like I said, it is not fast, but it is simple and pretty much fool-proof, and in the context it is intended for it is more than efficient enough.
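For readers who want to see the scan outside of assembler, here is a minimal C model of the same idea. It is only a sketch: the names `string_block`, `FALLBACK`, `KEYED_LEN` and `get_string` are made up for illustration, and the block is cut down to three entries.

```c
#include <stddef.h>

#define FALLBACK "No entry found"

/* Hypothetical model of STRINGBLOCK: key byte (0x80 + index), then a
   zero-terminated string. The fallback text follows the last entry. */
static const unsigned char string_block[] =
    "\x80" "String Entry 0\0"
    "\x81" "String Entry 1\0"
    "\x82" "String Entry 2\0"
    FALLBACK;

/* Length of the keyed part only, like LENSTRINGBLOCK. */
#define KEYED_LEN (sizeof(string_block) - sizeof(FALLBACK))

const char *get_string(unsigned index)
{
    /* Same byte-wide add as "add al,080h". */
    unsigned char key = (unsigned char)(0x80u + index);
    size_t i = 0;

    /* Like "repne scasb": stop one past the first match, or at the end. */
    while (i < KEYED_LEN) {
        if (string_block[i++] == key)
            break;
    }
    /* i is either just past the key byte or at the fallback text. */
    return (const char *)&string_block[i];
}
```

One thing the model makes easy to see: the add is byte-wide, so an index of 80h or more wraps to a value below 80h and can match ordinary string content or a terminator. Indexes should stay in the range 0 to 7Fh.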
Posted on 2004-02-23 17:30:30 by donkey
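    mov eax, strings
    mov ecx, string_number
_0: movzx edx, byte ptr [eax]   ; length byte of the current entry
    dec ecx
    lea eax, [edx+1]            ; as first posted: drops the base address (corrected in the replies)
    jns _0                      ; lea leaves the flags from dec intact
    sub eax, edx                ; step back to the start of the wanted string
    ret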

strings:
DB 8
DB "1234567",0
DB 8
DB "1234567",0
DB 8
DB "1234567",0

A macro could automate string length calculation.
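As a rough C model of the length-prefixed layout (not bitRAKE's code; `nth_string` and the distinct sample strings are made up for illustration), each hop skips one length byte plus the number of bytes it records. Like the assembler version, the sketch does no bounds checking.

```c
#include <stddef.h>

/* Each entry: one length byte (characters + terminator), then the text. */
static const unsigned char strings[] =
    "\x08" "1234567\0"
    "\x08" "abcdefg\0"
    "\x08" "ABCDEFG";           /* the compiler adds the last terminator */

const char *nth_string(unsigned n)
{
    const unsigned char *p = strings;

    /* Hop over n entries: 1 length byte + p[0] bytes of string data. */
    while (n-- > 0)
        p += 1 + p[0];

    return (const char *)(p + 1);   /* skip the length byte itself */
}
```

The cost is one hop per skipped string rather than one compare per byte, which is why this layout is the faster of the two for in-order tables.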
Posted on 2004-02-23 20:15:03 by bitRAKE
I'm not sure exactly what yours does.

You move the offset of the strings into EAX, so the first value in EDX (I assume the movzx is moving a byte) would be 8. You then add 1 to it (making 9) and attempt to read that as an address. I would think it would GPF after the first iteration.



Shouldn't it be:

lea eax, [eax+edx+1]

to add the length to the address?

But besides that, mine is meant for error codes and I didn't want to scan in order. I wanted the string found by its code no matter what order it was in. That way I can just add new errors to the end of the array and not have to worry about sequentially numbering the errors or keeping them in order.

For example:

STRINGBLOCK     DB 080h,"String Entry 0",0
                DB 081h,"String Entry 1",0
                DB 082h,"String Entry 2",0
                DB 083h,"String Entry 3",0
                DB 084h,"String Entry 4",0
                DB 087h,"String Entry 7",0


7 would still give string 7, but 5 and 6 would remain unfound and give the default message. In my app a negative return value always indicates an error, so I just send it straight to the lookup (no +80h). Each type of error has a different high nibble, though. For example, there may be only one error in Cx but 5 in Dx and 3 in Ex, etc.
Posted on 2004-02-23 20:28:38 by donkey
Yeah, I meant LEA EAX,[EAX+EDX+1].

Okay, I see now how neatly that fits your use. Error codes are usually not time critical.
Posted on 2004-02-23 21:22:31 by bitRAKE
Yeah,

I like yours though and will definitely find it useful. Mine is pretty much specific to looking up things by index that are not time critical but I thought somebody might find a use for my solution to the problem. Like I had said though it is for error messages, I doubt I would use it elsewhere, maybe messageboxes with predefined messages or some other thing where speed was pointless.

It is the great advantage of assembly that we can tailor everything to its specific application and not be herded into one-size-fits-all solutions.
Posted on 2004-02-23 21:34:04 by donkey

Originally posted by donkey
It is the great advantage of assembly that we can tailor everything to its specific application and not be herded into one-size-fits-all solutions.
To me, this puts more value in the programmer and stresses the importance of understanding the problem. Whereas HLLs tend to overstress reuse: devaluing the programmer and moving too quickly to a solution prior to really understanding the problem. Or maybe I've just known too many bad HLL programmers and project managers to be objective.

Posted on 2004-02-23 22:19:27 by bitRAKE
What about the following? IMO it's rather easy to manage. Personally, I would probably choose pairs of string IDs and string pointers, so I can add/remove strings as I see fit - this also encourages the use of some ID_ERR_* (or whatever name) equates, so 'magic numbers' don't crop up in your code.

The scheme could be changed somewhat. For instance, with changed logic in the geterror proc, the first table entry could be used to hold the number of string entries (still computed automatically at build-time), so you only have to pass the index number and table start. Or macros could be written for the current routine etc.

Error table lookup isn't really a critical task, so you might not want to spend four bytes per string for the table pointers. The idea of 80h+string-id is cute enough, but of course it limits the number of strings and excludes non-English OEM charsets :)



CTEXT MACRO y:VARARG
LOCAL sym

CONST segment
IFIDNI <y>,<>
sym db 0
ELSE
sym db y,0
ENDIF
CONST ends

EXITM <OFFSET sym>
ENDM

.data
errorstrings_s:
dd 0 ; first entry = out-of-bounds value
dd CTEXT("Message 1 - you'll never see this")
dd CTEXT("Uh oh, the printer's on fire")
dd CTEXT("Critical Surface Error")
dd CTEXT("Ugh. Stone-age tactics")
errorstrings_e:

.code
geterror PROC idx:dword, tblstart:dword, tblend:dword
    mov eax, [idx]
    mov ecx, [tblend]
    mov edx, [tblstart]

    ; bounds checking: turn the table size in bytes into an entry count
    sub ecx, edx
    shr ecx, 2
    cmp eax, ecx
    jb @@inbounds       ; unsigned compare, so negative indexes fail too

    ; string entry 0 = out-of-bounds value to return
    xor eax, eax

@@inbounds:
    ; return string from table
    mov eax, [edx + eax*4]
    ret
geterror ENDP
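The same lookup can be sketched in C, which makes the bounds rule explicit: the index has to be compared against the number of entries, which is the byte distance between the two table labels divided by the pointer size. This is only a model under that assumption; `error_strings` and `get_error` are illustrative names, and the C pointer subtraction does the division for us.

```c
#include <stddef.h>

static const char *const error_strings[] = {
    NULL,   /* entry 0 = out-of-bounds value */
    "Message 1 - you'll never see this",
    "Uh oh, the printer's on fire",
    "Critical Surface Error",
    "Ugh. Stone-age tactics",
};

const char *get_error(size_t idx,
                      const char *const *tbl_start,
                      const char *const *tbl_end)
{
    /* Pointer subtraction yields entries, not bytes. */
    size_t count = (size_t)(tbl_end - tbl_start);

    /* size_t is unsigned, so a "negative" index is rejected as well. */
    if (idx >= count)
        idx = 0;

    return tbl_start[idx];
}
```

A call such as `get_error(2, error_strings, error_strings + 5)` returns the printer message, while any index of 5 or more falls back to entry 0.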
Posted on 2004-02-24 00:59:40 by f0dder
Donkey,

Maybe I have missed something, but isn't it simpler to write the array and the set of strings, then just access them through the array?


.data
st0 db "string0",0
st1 db "string1",0
st2 db "string2",0
st3 db "string3",0
st4 db "string4",0
st5 db "string5",0
st6 db "string6",0
st7 db "string7",0
st8 db "string8",0
st9 db "string9",0

sarr dd st0,st1,st2,st3,st4,st5,st6,st7,st8,st9

; call proc
invoke GetString,4,OFFSET sarr
invoke SetWindowText,hWnd,eax

; procedure
GetString proc num:DWORD,lparray:DWORD

mov edx, num
mov ecx, lparray
mov eax, [ecx+edx*4]

ret

GetString endp
Posted on 2004-02-24 05:25:43 by hutch--
Hi Hutch,

The idea was to have the ability to skip indexes. I will leave holes in my error codes for future expansion while still remaining in the structure I have laid out. So there may very well be quite a few holes in the array. For example, this is a portion of the HTTPUpdate error code equates:

/*

#################################################
ERROR CODES
#################################################
90 = Exit code error from main application
91 = Main app not responding (timeout)

Internet connection codes
A0 = InternetOpen failed
A1 = URL ping failed
A2 = InternetOpenUrl failed
A3 = Could not allocate memory
A4 = InternetReadFile Failed
A5 = zero bytes read
A6 = Invalid script format or syntax error
A7 = Zero bytes written to file or byte counts don't match
A8 = Query file size failed

Merge registry error codes
B0 = Could not update registry key
B1 = Problem in termination of RegEdit.exe

Download error codes
C0 = CRC did not match

D0 = General download error
D1 = Could not create temp file

Decompression error codes
E1 = Z_ERRNO
E2 = Z_STREAM_ERROR
E3 = Z_DATA_ERROR
E4 = Z_MEM_ERROR
E5 = Z_BUF_ERROR
E6 = Z_VERSION_ERROR

F1 = Could not open compressed file
F2 = Could not create compressed heap
F3 = Could not create uncompressed heap
F4 = Could not create uncompressed file
*/


As you can see, I needed a solution tailored to the way I like to define my error codes. Each high-order nibble is indicative of the procedure that the error occurred in.
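To illustrate the high-nibble grouping, a small C sketch follows. It is not part of HTTPUpdate: `error_group` is a hypothetical helper, the group names are taken from the headings in the listing, and "Main application" for the 9x codes is a guess, since that group has no heading of its own.

```c
/* Hypothetical helper: picks the group from the high nibble only. */
const char *error_group(unsigned char code)
{
    switch (code >> 4) {
    case 0x9: return "Main application";     /* assumed name for 90/91 */
    case 0xA: return "Internet connection";
    case 0xB: return "Merge registry";
    case 0xC:
    case 0xD: return "Download";
    case 0xE:
    case 0xF: return "Decompression";
    default:  return "Unknown";
    }
}
```

Because only the high nibble matters here, new codes can be added inside a group without touching this function.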
Posted on 2004-02-24 05:38:50 by donkey
Originally posted by bitRAKE
To me, this puts more value in the programmer and stresses the importance of understanding the problem. Whereas HLLs tend to overstress reuse: devaluing the programmer and moving too quickly to a solution prior to really understanding the problem.


From my experience this usually happens when people think and code in the wrong order. Of course asm has the advantage here, that it takes so much longer to code stuff that most people inevitably start to think at some point :grin:
Posted on 2004-02-24 05:45:09 by Jibz
hutch, your sample is basically the same as mine - except for the manual overhead of defining 'sarr' and the lack of bounds checking. It's much nicer to use a macro for the string pointer table; this gives the additional benefit that it would be a matter of changing the macro (perhaps with an assemble-time equate) to get UNICODE support, too :)

Last time I needed stringtables was for an engineering app a friend and I wrote for a Danish company. The app needed to be (and was) localized for at least Danish, English and German - perhaps French, too. In the scale of 'real world' things it was a sorta small application, but I guess a lot of people here would think of it as 'big'.

The approach was to parse an ini file with "stringid=string" lines into a binary format - header, pairs of <stringid,format> DWORDs, raw stringdata. Very fast to load, very fast to search... while not really a big deal with a few hundred strings as I think we had, I did implement a binary search just for the heck of it :)

The advantage over using resources was that we didn't have to build any resource DLLs, that it was easy to write a tool to define the strings (we just used notepad though), and that using a string at runtime was a matter of calling the getstring routine with the ID - no need to LoadString into a buffer first.

Might be overkill for what you need though, but it was pretty nice and supported whatever OEM character sets we could think of.
Posted on 2004-02-24 08:54:25 by f0dder
f0dder,

The technique you used sounds fine for small counts and any dynamic technique is more flexible in terms of being able to use external files for the data.

The bounds check for the code I posted is simple if you need it: just check the number against the maximum and disallow it if it's bigger.

For future applications especially where you need multilingual support, probably the OLE string capacity for either ANSI or UNICODE is the most flexible as it uses string handles where the data pairs can comfortably be written to a single string.

In an application like the one Donkey has in mind, I would probably allocate a dynamic array, fill it with a default address for each member, then add the specific addresses for the strings that have been added so far, so you have the capacity to simply add extra data.
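A minimal C sketch of that idea, with made-up names (`make_table`, `lookup`, `default_msg`) and an assumed table size chosen by the caller: every slot starts out pointing at a default message, and real strings are dropped into the slots for the codes that exist.

```c
#include <stdlib.h>

static const char default_msg[] = "No entry found";

/* Allocate a table of `count` slots, each pointing at the default. */
const char **make_table(size_t count)
{
    const char **tbl = malloc(count * sizeof *tbl);

    if (tbl != NULL) {
        for (size_t i = 0; i < count; i++)
            tbl[i] = default_msg;
    }
    return tbl;     /* caller frees; NULL if the allocation failed */
}

/* Direct index, with the default for anything out of range. */
const char *lookup(const char *const *tbl, size_t count, size_t code)
{
    return code < count ? tbl[code] : default_msg;
}
```

With 256 slots, something like `tbl[0xA1] = "URL ping failed";` fills one hole, and every unfilled code still yields the default. The trade-off against the scanning version is a pointer per possible code in exchange for a constant-time lookup.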
Posted on 2004-02-24 18:43:38 by hutch--
Hutch, the asm code you posted was basically the same as mine - only that I used a macro to automate the generation of the "sarr" table and had bounds checking in the getstring - not really a big deal, I think the macroized approach is a bit cleaner and less work to manage, but whatever :)

As for the "getstring library" I described, it's overkill for a lot of situations - but it's pretty beautiful :P

It's very convenient to use, you call the getstring routine and it returns a string pointer; no need for loading into a buffer first, as with the WIN32 LoadString.

It uses DWORD identifiers, so you can have a lot of strings. The identifiers can be discontiguous, like donkey needs (it's pretty nice when you need to add some error message you hadn't thought of, heh). And since the table is id-sorted at buildtime, binary search can be used for looking up strings. This is pretty fast, and I think the code would scale pretty well even for a lot of strings.
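The lookup side of such a table can be sketched in C as below. This is not the library's code: the real format stores DWORD pairs plus raw string data loaded from a file, whereas this sketch assumes an in-memory table of id/pointer pairs, and `string_entry`, `string_table`, `STRING_COUNT` and `find_string` are illustrative names. The sample ids and texts are borrowed from donkey's error list.

```c
#include <stddef.h>
#include <stdint.h>

struct string_entry {
    uint32_t    id;
    const char *text;
};

/* Must be sorted by id; ids may have gaps. */
static const struct string_entry string_table[] = {
    { 0x90, "Exit code error from main application" },
    { 0xA0, "InternetOpen failed" },
    { 0xA1, "URL ping failed" },
    { 0xC0, "CRC did not match" },
    { 0xE3, "Z_DATA_ERROR" },
};
#define STRING_COUNT (sizeof string_table / sizeof string_table[0])

/* Binary search over the half-open range [lo, hi). */
const char *find_string(const struct string_entry *tbl, size_t count,
                        uint32_t id)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tbl[mid].id == id)
            return tbl[mid].text;
        if (tbl[mid].id < id)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;    /* id not present */
}
```

The returned pointer refers straight into the table's data, so nothing has to be copied into a buffer first.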

One of the biggest advantages, as we saw it, is that the string data files are more compact and easier to distribute than string resource DLLs... besides, they're portable :grin: which might actually matter, as my friend has betrayed Windows somewhat in favour of Linux (that fool ;)).
Posted on 2004-02-25 10:19:16 by f0dder