OK, I have been trying to get the hang of strings for a very long time. The other day I got a tutorial on strings from NOP-erator's Assembly World - great site.

Well, anyway, I read it and thought I understood it. Apparently I didn't, because when I try to manipulate strings I usually crash the app, get output of strange characters that weren't in the string in the first place, or receive nothing at all. This is amazingly frustrating to me.

If anyone knows of any good tutorials, or maybe something graphical that would be good for teaching a complete moron (lol), they would be greatly appreciated.

Thank You,

HeXeN :confused:
Posted on 2002-12-19 04:21:53 by HeXeN
Hmmm.... Post your question here then. BTW, it crashes either because of a pointer error or because you overwrite the allocated buffer.
Posted on 2002-12-19 04:35:51 by roticv
What kind of manipulation are you interested in?
How far have you got at the moment?

There are two kinds of strings supported in Windows: ASCII and UNICODE.
ASCII (strictly speaking, the 8-bit "ANSI" character sets Windows uses; pure ASCII only defines 128 characters) uses 1 byte (8 bits) per character, so it supports 255 characters plus a zero terminator.
UNICODE uses 2 bytes (1 word, or 16 bits) per character, so it supports 65535 characters plus a terminator, and it is the native string format on NT. It gives a much bigger range of characters, so it is useful when dealing with languages which do not use the standard "western" alphabet.

You should work out which kind you are dealing with. Obviously, writing an ASCII string over a UNICODE string will cause problems (and vice versa)!
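A minimal C sketch of the difference (C is used here because its char pointers behave just like ESI/EDI pointing at bytes; `uint16_t` stands in for Windows' 16-bit WCHAR, and the names are made up for illustration). The same five letters take 6 bytes as ASCII and 12 bytes as UNICODE, and on x86 the UNICODE version has a zero byte after every letter, so a routine expecting ASCII stops after the first character:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASCII/ANSI: one byte per character plus a 1-byte zero terminator. */
static const char ansi_hallo[] = "Hallo";

/* UNICODE (UTF-16 on NT): two bytes per character plus a 2-byte zero
   terminator. uint16_t stands in for Windows' WCHAR type. */
static const uint16_t wide_hallo[] = { 'H', 'a', 'l', 'l', 'o', 0 };

/* What an ASCII routine "sees" when handed the UNICODE string.
   On a little-endian CPU such as x86 the bytes are 'H',0,'a',0,...
   so the scan for the terminator stops right after the 'H'. */
size_t ansi_length_of_wide(void) {
    return strlen((const char *)wide_hallo);
}
```

Here `sizeof ansi_hallo` is 6, `sizeof wide_hallo` is 12, and `ansi_length_of_wide()` returns 1, which is exactly the "I only get part of my string (or nothing)" symptom.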

Mirno
Posted on 2002-12-19 04:41:23 by Mirno
Mostly, my question is: why don't I receive any output even remotely close to the string?

I'm trying to do any string manipulation that will give me at least part of the string; then at least I'll know I'm on the right path.

-HeXeN
Posted on 2002-12-19 04:47:27 by HeXeN
BTW, I'm one of those Visual Basic programmers most people don't like. I'm used to high-level languages.

-HeXeN
Posted on 2002-12-19 04:48:54 by HeXeN

BTW, I'm one of those Visual Basic programmers most people don't like. I'm used to high-level languages.
Posted on 2002-12-19 05:49:54 by Maverick

You're being surrounded.. SURRENDER OR DIE!!!

:eek:

lol... and the good thing is - those vb programmers can't even run away from us, cause they're so damn slow :grin:

HeXeN, why don't you post some of your source? We'll have a look.
Posted on 2002-12-19 06:05:33 by Tola
HeXeN, strings aren't really anything special... they're just a sequence of bytes (words if UNICODE).

All string functions will take a pointer to the string.
If you aren't getting what you expected, then you should check the following:

1) you do not actually have a pointer to the string...

2) the string data isn't initialised...

3) you're expecting the wrong thing! :grin: (all bugs WILL be solved) :grin:

In most cases the problem is 1.
Anyway, here's a small RadASM project which should get you going. It doesn't use any exotic instructions, just the basic mov.
It will reverse and capitalise an input string.
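The attachment itself isn't reproduced here. As a rough idea of what a reverse-and-capitalise routine boils down to, here is a C sketch (the function name `reverse_and_upper` is hypothetical and not taken from the RadASM project): two pointers walk inward from both ends swapping bytes, then a second pass capitalises. Each `*p` read or write corresponds to a plain `mov` through a register holding an address.

```c
#include <string.h>

/* Hypothetical helper (not the attached RadASM project):
   reverse a zero-terminated ANSI string in place, then capitalise a-z.
   Only the bytes before the terminator move; the 0 stays at the end. */
void reverse_and_upper(char *s) {
    size_t len = strlen(s);          /* number of bytes before the 0 */
    if (len > 0) {
        char *left = s;              /* first character */
        char *right = s + len - 1;   /* last character */
        /* Swap bytes from both ends, moving inward until they meet. */
        while (left < right) {
            char tmp = *left;
            *left++ = *right;
            *right-- = tmp;
        }
    }
    /* Capitalise the way an asm loop typically would: any byte in
       'a'..'z' gets 32 ('a' - 'A') subtracted from it. */
    for (char *p = s; *p != 0; p++) {
        if (*p >= 'a' && *p <= 'z')
            *p = (char)(*p - ('a' - 'A'));
    }
}
```

For example, `char buf[] = "Hallo world"; reverse_and_upper(buf);` leaves `"DLROW OLLAH"` in `buf`. Note that `buf` must be a writable array; passing a string literal directly would try to modify read-only data.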

Hope it helps you.


Damn! I forgot to attach the file. Check the post below :rolleyes:
Posted on 2002-12-19 07:25:21 by MArtial_Code
Here it is..
Posted on 2002-12-19 07:27:49 by MArtial_Code
HeXeN,

Part of the difference is that in most languages you pass the address of a string, not the string itself. In most dialects of BASIC, when you pass a string to a sub or function, the compiler actually passes the address of that string; in BASIC this is called passing by reference.

In assembler, you can handle strings in various ways but at its simplest, you put a string in the .DATA section,


.DATA
MyString db "This is my string",0 ; zero terminate it.

Then in your code you can work on the string by using its address.
LOCAL buffer[128]:BYTE ; LOCAL must immediately follow the PROC line

mov esi, OFFSET MyString ; put address in ESI
lea edi, buffer ; load local address into EDI
mov ecx, LENGTHOF MyString ; length in ECX
add ecx, esi ; calculate an exit condition

lbl:
mov al, [esi] ; get the BYTE at address in ESI
inc esi ; increment ESI
; -- -----------------------------
; do what you need to do with each
; byte in the AL register
; -------------------------------
mov [edi], al ; write BYTE to address in EDI
inc edi ; increment EDI
cmp esi, ecx ; test if esi = ecx
jne lbl ; jump back if not

This loop can be made more efficient but the basics are there to read a string, modify it if you need to and copy it back to a buffer. All you need to remember is that you are working with the ADDRESS of a string and you access the content BYTE by BYTE in the AL register and write it back to the buffer.
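Here is the same loop written in C, line for line, as a sketch (the name `copy_bytes` is made up; the comments show which MASM instruction each line stands for):

```c
#include <stddef.h>

/* Mirrors the MASM loop: src plays ESI, dst plays EDI, end plays ECX.
   count should include the terminator (LENGTHOF does) so the copy is
   itself zero-terminated. dst must have room for count bytes. */
void copy_bytes(char *dst, const char *src, size_t count) {
    const char *end = src + count;   /* mov ecx, LENGTHOF / add ecx, esi */
    while (src != end) {             /* cmp esi, ecx / jne lbl */
        char al = *src++;            /* mov al, [esi] / inc esi */
        /* ...do what you need to do with each byte in al here... */
        *dst++ = al;                 /* mov [edi], al / inc edi */
    }
}
```

One small design difference: the MASM loop tests at the bottom, so it always runs at least once, while this `while` loop tests first, so a count of zero copies nothing instead of running off the end of memory.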

Regards,

hutch@movsd.com
Posted on 2002-12-19 08:14:49 by hutch--
HeXeN

You should tell us which program you are using to assemble and compile your code. We would then be in a much better position to give you some basic real examples which you could try and build on.

Raymond
Posted on 2002-12-19 19:32:40 by Raymond
Quick Editor with the syntax highlighting plugin.

-HeXeN
Posted on 2002-12-19 19:50:18 by HeXeN
Quick Editor is only a text editor. It could be used with any assembler. Must I assume that you are using MASM32 to assemble whatever you type with QEditor?

Raymond
Posted on 2002-12-19 21:30:27 by Raymond
Yes Raymond, you are correct. I'm using the MASM32 v7 package.

-HeXeN
Posted on 2002-12-20 00:16:14 by HeXeN
Now it will be easier to help you and get you started.

Just copy and paste the following code in the Qeditor. If needed, modify the include directives to reflect your masm32 path.

Save that as an .asm file (such as Trial1.asm), then under the 'Project' tab choose 'Console Assemble & Link'. If no assembling error is reported, choose the 'Run Program' option under the 'Project' tab. You will see two of the many variations on how to display strings in Win32asm.

;######################

.386
.model flat, stdcall
option casemap :none ; case sensitive

include \masm32\include\windows.inc
include \masm32\include\user32.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc

includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\masm32.lib

.data
mystring db "I am using MASM32",0

.code

start:

invoke StdOut,ADDR mystring
invoke MessageBox,0,ADDR mystring,0,MB_OK

invoke ExitProcess,0

end start

;###################

The user32.lib is required for the MessageBox function.
The kernel32.lib is required for the ExitProcess function.
The masm32.lib is required for the StdOut function.

If you do not already have the WIN32.HLP file, it is an absolute necessity if you intend to use and understand the Windows functions.

Raymond
Posted on 2002-12-20 11:04:18 by Raymond
No offense, but I knew that already. The thing I'm having a problem with is that I'm used to Visual Basic.

In VB there are string functions like:

Mid
Left
Right
Instr

I know equivalents of these are included in the MASM32 v7 package.

But I'm not really sure how those functions actually work.

I apologize if "No offense, but I knew that already." offended you.

-HeXeN
Posted on 2002-12-20 19:25:44 by HeXeN
Well, the basics of it all are very simple.

At this low level you must forget any encapsulation that you're used to, and either implement your own or at least understand how others implemented it for you.


A string is best compared to scrabble stones on a numbered board.


   1 2 3 4 5 6 7 8 9
1
2
3  H a l l o 0
4
5
6

So if you have the ASCIIZ string "Hallo" (ASCII, zero-terminated: the type of string Windows usually wants), your string takes up 6 bytes: 5 for the letters and 1 for the terminator (0). So on the scrabble board you literally place 6 stones.

Now, because your memory is huge and filled with lots of data, it's impossible for your functions to 'guess' where the string you wish to use is. HLLs such as VB let you solve this with just the label, but internally Windows wants the address in memory of your string. So our example string starts at address 31 (row 3, column 1) and ends at address 36: 6 scrabble stones of 1 byte each :)

This starting address is where it's all at. If you do a Mid in VB, you say: from string <label> I want x chars, starting at position Y.

So what you actually tell the VB compiler is: starting with the address of the string, add Y - 1 bytes (VB counts positions from 1) to find the starting position, and make me a new string buffer that contains the next x bytes starting from that newfound address (string + Y - 1).
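A C sketch of that Mid recipe, assuming VB's 1-based start position (the helper name `vb_mid` is hypothetical, not a MASM32 or VB API). It also stops at the source's terminator, so asking for more characters than exist does not read past the end of the string:

```c
#include <stddef.h>

/* Hypothetical helper behaving like VB's Mid(src, start, count).
   start is 1-based, so the first byte copied is at src + (start - 1).
   dst must have room for count + 1 bytes (the chars plus a 0). */
void vb_mid(char *dst, const char *src, size_t start, size_t count) {
    const char *p = src;
    /* Advance to the start address, but never past the terminator. */
    for (size_t i = 1; i < start && *p != 0; i++)
        p++;
    /* Copy up to count bytes, stopping early at src's terminator. */
    while (count > 0 && *p != 0) {
        *dst++ = *p++;
        count--;
    }
    *dst = 0;   /* the new string gets its own terminator */
}
```

So `vb_mid(out, "Hallo", 2, 3)` puts `"all"` in `out`, just like `Mid("Hallo", 2, 3)` in VB.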

Left is just the same thing. Left(Hallostring, 3) is just:

take the address of Hallostring (31 for our fictional string), make a buffer of 4 bytes (3 chars + 1 terminator 0) and copy all chars from position Hallostring (31) to position Hallostring + 2 (3 scrabble stones, ending at 33) :)
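And Left as a C sketch (again, `vb_left` is a made-up name), which is simply Mid with the start fixed at the string's own address:

```c
#include <stddef.h>

/* Hypothetical helper behaving like VB's Left(src, n): copy the bytes
   at src + 0 .. src + n - 1 into dst and add a terminator, so dst needs
   n + 1 bytes ("Hallo", 3 -> a 4-byte buffer holding "Hal",0).
   Stops early if src is shorter than n. */
void vb_left(char *dst, const char *src, size_t n) {
    size_t i = 0;
    while (i < n && src[i] != 0) {
        dst[i] = src[i];
        i++;
    }
    dst[i] = 0;
}
```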

That's all there is to it, really.
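Instr, the other function asked about above, works on the same addresses: try each starting address in turn and compare byte by byte. A brute-force C sketch, assuming VB's convention of returning a 1-based position or 0 when nothing is found (`vb_instr` is a hypothetical name):

```c
#include <stddef.h>

/* Hypothetical helper in the spirit of VB's InStr(haystack, needle):
   returns the 1-based position of the first match, or 0 if none.
   Brute force: try every start address, compare byte by byte. */
size_t vb_instr(const char *haystack, const char *needle) {
    for (const char *start = haystack; *start != 0; start++) {
        const char *h = start;
        const char *n = needle;
        /* Walk both strings while the bytes agree. Hitting haystack's
           terminator ends the walk because 0 never equals a needle char. */
        while (*n != 0 && *h == *n) {
            h++;
            n++;
        }
        if (*n == 0)   /* reached the end of needle: it matched here */
            return (size_t)(start - haystack) + 1;
    }
    return 0;          /* no match (or empty haystack) */
}
```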

I hope this helped a bit. If you have trouble visualizing it, try real physical ways that you can rearrange (like scrabbleboards ;) ) to show it to you.
Posted on 2002-12-20 20:50:45 by Hiroshimator
HeXeN

You did not offend me at all. As my signature says, whenever you assume something, you risk being wrong.

At least, now we know what you want.

I do agree with you that the source code of the MASM32 library procedures related to strings could certainly be upgraded, at the very least to explain (i) the purpose of the procedure (which is not even mentioned in many cases) and (ii) what type of parameters are expected.

I'll give you one example and you can probably figure out the others by analogy. All string procedures assume zero-termination and restore such in the destination string.

szRight proc lpszSource:DWORD,lpszTarget:DWORD,ln:DWORD

This procedure would take the right most number of characters specified by "ln" from the string which is located at the address specified by "lpszSource" and copy those characters starting at the address specified by lpszTarget, adding a terminating zero at the end.

You would then call this procedure as follows, substituting the proper variable names for Source and Target and specifying the correct number of characters instead of the 10.

invoke szRight, ADDR Source, ADDR Target, 10
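To make that description concrete, here is a C sketch of the same behaviour (the name `right_chars` is hypothetical; the parameter order follows the `szRight` prototype above, but this is an illustration, not the MASM32 library source). It assumes that when `ln` is larger than the string, the whole string is copied; check the library source for what `szRight` actually does in that case:

```c
#include <string.h>

/* Sketch of the behaviour described for szRight: copy the right-most
   ln characters of src to dst and zero-terminate. If src is shorter
   than ln, the whole string is copied (an assumption, see above).
   dst needs room for ln + 1 bytes. */
void right_chars(const char *src, char *dst, size_t ln) {
    size_t len = strlen(src);
    if (ln > len)
        ln = len;
    /* The right-most ln chars start ln bytes before the terminator;
       copy them together with the terminator itself (ln + 1 bytes). */
    memcpy(dst, src + len - ln, ln + 1);
}
```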

Let us know if you need more.

Raymond
Posted on 2002-12-20 21:32:12 by Raymond
HeXeN,

I know it's a bit of a pain when you first shift from a language structured like BASIC to assembler, but at the most basic level string data is not that different. Modern BASIC dialects use OLE strings, where each string is a separately allocated piece of memory.


a$ = "This is a basic dynamic string"

This is a string handle "a$" that has some literal text assigned to it. a$ is actually the address of the memory that the literal string data is stored in.

With assembler at its most simple, the storage of string data is done a bit differently, the compiled file is broken up into what is called sections and one of them is called the "initialised data" section. This is where you can store string data. The notation is pretty simple,


.data
MyString db "This is a zero terminated string",0

This means that the string data is stored in the .DATA section and when you want to do something with the string, you refer to it by the NAME you selected when you stored the string data.

Now what makes it a bit different is that, when it's in the .DATA section, its address is already known when the file is being assembled, so it is referred to as an offset in the file.

In MASM, you use the OFFSET keyword to refer to that string.


mov eax, OFFSET MyString

This puts the address of the string from the .DATA section into the EAX register so you can do something with it.

Some string data is obtained at runtime so it is not contained within the file at all. Just for example, if you wanted to select a file name from a common dialog box, you have to allocate a buffer big enough first so that you can work on the file name after you have selected it.

This is usually done within a procedure so you can allocate the buffer for the file name on the stack as a LOCAL variable.


LOCAL buffer[260]:BYTE

This gives you a 260 byte long buffer to load a file name into. Because it is a stack variable, you do not have to bother to deallocate it because that is automatically done when the procedure is finished.
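Overrunning a fixed buffer like this is one of the crash causes mentioned earlier in the thread, so any copy into it should be bounded. A C sketch of a bounded copy into a 260-byte buffer (the name `copy_name` and its return convention are assumptions for illustration):

```c
#include <stddef.h>

#define NAME_BUFFER_SIZE 260   /* same size as the LOCAL buffer above */

/* Copy a zero-terminated string into a fixed-size buffer without ever
   writing past its end. Returns 1 if the whole string fit, 0 if it had
   to be cut short; dst is zero-terminated either way. */
int copy_name(char *dst, size_t dstsize, const char *src) {
    size_t i = 0;
    if (dstsize == 0)
        return 0;                      /* no room even for the 0 */
    while (src[i] != 0 && i < dstsize - 1) {
        dst[i] = src[i];
        i++;
    }
    dst[i] = 0;
    return src[i] == 0;                /* did we reach src's terminator? */
}
```

Inside a function, `char buffer[NAME_BUFFER_SIZE];` lives on the stack just like the LOCAL buffer and disappears when the function returns, so never hand its address back to a caller.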

Now something worth noting, because it is created at runtime on the stack, it is not an OFFSET like string data in the .DATA section so you use a different instruction to place its address into a register.


lea eax, buffer

LEA (load effective address) is the instruction to use rather than MOV with a LOCAL variable that is created on the stack.

Don't worry about it being a bit different to basic string handling, once you get the swing of it, zero terminated strings are actually a bit simpler than basic dynamic strings and are often a bit faster.

Regards,

hutch@movsd.com
Posted on 2002-12-21 04:33:44 by hutch--
HeXeN:
VB is irrelevant. With assembly language, you are no longer the conductor of an orchestra. You are now responsible for writing the music, along with tuning the violins and draining the spit from the trombones.

Now for the 16-bit demo (i.e. Start->Run->debug). Note: ; comments are valid.


e130 "Goodbye Cruel World." 0
a100
cld ; String ops count upward
mov di, 130 ; PTR to string to search
mov al, 0 ; Look for terminating Null
mov cx, ffff ; Max count, else CX is whatever DEBUG left there
repne scasb ; Do Search until Null is found
mov cx, di ; Calc Length of String (incl. Null)
sub cx, 130 ; by Subtracting Base PTR
mov [14e], cx ; Save Length
mov si, 130 ; Source
mov di, 150 ; Destination
rep movsb ; Copy String (and Null) to 0x150
mov ah, 40 ; DOS WriteFile
mov bx, 01 ; to StdOut
mov cx, [14e] ; Length
dec cx ; but don't print the Null
mov dx, 150 ; Address
int 21 ; Print!
ret ; Done
; notice extra CRLF


Something is FUBAR on my machine, but the above should work.

Type ? and CR for help, or refer to your MS-DOS 4.0 User's Guide.
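For comparison, here is the "scan for the Null, then subtract the base pointer" step from the demo, sketched in C (the name `length_by_scan` is made up). Like the DEBUG code, the count it returns includes the terminator:

```c
#include <stddef.h>

/* Hypothetical helper mirroring the DEBUG demo: di starts at the base
   address, the scan leaves it one byte past the Null (as REPNE SCASB
   does), and di - base is the length including the Null. */
size_t length_by_scan(const char *base) {
    const char *di = base;
    while (*di++ != 0)
        ;                          /* step over each non-Null byte */
    return (size_t)(di - base);    /* includes the terminator */
}
```

For `"Goodbye Cruel World."` this gives 21: 20 characters plus the Null, which is why the demo has to drop one byte before printing.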
Posted on 2002-12-21 05:39:36 by eet_1024