Hi all,

I'm about to plunge into my next program, and it's a bit more ambitious, and I have come to realize that I'm not really sure how to go about it. Here's what I'm thinking, and I'd love it if any of you would point out things I could or should do differently.

My next teach-myself-assembly program is a Win32 assembly port of the Unix wc command, which does word counts on a file. I've written this program already in C, and am hoping to use that familiarity as I write it in assembly. As I prepare to write, however, I notice that I'm thinking at a very high level, and am trying to code it that way. I think it would be better if I could learn the "right" way to do it in assembler. So, comments are welcome.

Here's what I need to do:

1. Get and parse the command line. At first blush this seemed easy; GetCommandLine returns the command line. But then I noticed that it returns the entire command line as a single string. To break it into separate arguments, I need to split the string up on whitespace and quotes. Ugh, that's an assembly programming challenge for me right off the bat. I'm used to getting an array of pointers to the strings, already split up. Is there a proper way to do this in assembler, like making a different call of which I am not aware, or perhaps using some already written code?

If I *do* write this from scratch, what's the best way to do it? My thinking would be to have a dword buffer prepared to hold a list of pointers to each argument. Next, I would scan through the string looking for dividing points--whitespace between arguments, and/or quotes. At each dividing point I would write a zero byte over the whitespace or quote character, and record the address of the chunk of string that follows in my pointer buffer. When I was done, I would write the number of pointers somewhere--possibly to the first dword in the buffer, which I would have reserved in advance. Is this a sane and assemblylike approach? One quick thought that worries me is the possibility that the buffer returned by GetCommandLine may not be intended to have me writing zeroes into it. If so, I'll need to copy the string first.
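
For reference, the in-place splitting described above might look roughly like this in C, which can then be ported to assembly more or less line by line. This is only a sketch under some assumptions: the names split_args and max_args are made up for illustration, the caller is assumed to pass a writable copy of the command line rather than the GetCommandLine pointer itself, and only spaces, tabs and plain double quotes are handled (the backslash-escaped quotes that the C runtime's parser understands are ignored).

```c
#include <stddef.h>
#include <string.h>

/* Split a writable, NUL-terminated command line in place.
 * Each argument start is stored in argv[]; the separator that ends an
 * argument is overwritten with a zero byte. Returns the number of
 * arguments stored (at most max_args).
 * split_args and max_args are hypothetical names for this sketch. */
size_t split_args(char *buf, char **argv, size_t max_args)
{
    size_t argc = 0;
    char *p = buf;

    while (*p != '\0') {
        /* Skip whitespace between arguments. */
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0' || argc == max_args)
            break;

        if (*p == '"') {
            /* Quoted argument: starts after the opening quote and runs
             * to the closing quote (or to the end if it is missing). */
            p++;
            argv[argc++] = p;
            while (*p != '\0' && *p != '"')
                p++;
        } else {
            /* Plain argument: runs to the next whitespace. */
            argv[argc++] = p;
            while (*p != '\0' && *p != ' ' && *p != '\t')
                p++;
        }

        /* Terminate this argument unless we are already at the end. */
        if (*p != '\0')
            *p++ = '\0';
    }
    return argc;
}
```

A caller would copy the command line into its own buffer first and then call something like split_args(copy, argv, 80); returning the count separately avoids having to reserve the first dword of the pointer buffer for it.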

2. Once that's all done, I should have the filename. I'll need to check to see if it exists (PathFileExists in Shlwapi.dll, right?) and then open it and read it. What's the proper assembler way to do this in Win32? Should I use CreateFile, or is there a simpler way? (CreateFile, in C++ at least, is notoriously overcomplex.)

3. Those are the two hard parts. After that, it's just a matter of reading a chunk and scanning it, updating the totals, and repeating until end of file. Here my thinking is to simply scan for characters. If the first character I get is alphanumeric, I count myself as being "in a word". If not, I'm outside a word. I then scan forward byte by byte, looking for a character of the opposite type from the one I'm currently in. Each time I toggle to "in a word" I increase the word count. Each time I find \n while outside a word I increase the line count. For each byte read I increase the byte count. (Actually, I probably won't do that last one unless it's essentially free as a byproduct of my scanning loops, because functions exist that will get the file size anyway.)
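
The scanning loop described above could be sketched in C as follows, again as a starting point for the assembly version rather than a finished design. The struct and function names (wc_totals, wc_scan) are invented for this sketch. It follows the "alphanumeric means in a word" rule from the description, which is not quite what Unix wc does (wc splits words on whitespace), and it keeps the in-word flag between calls so that a word straddling two chunks is counted only once.

```c
#include <ctype.h>
#include <stddef.h>

/* Running totals; in_word carries the scanner state across chunks.
 * wc_totals and wc_scan are hypothetical names for this sketch. */
struct wc_totals {
    unsigned long lines;
    unsigned long words;
    unsigned long bytes;
    int in_word;
};

/* Scan one chunk of the file and update the totals. */
void wc_scan(struct wc_totals *t, const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (isalnum(c)) {
            /* Toggle from "outside" to "in a word": count a new word. */
            if (!t->in_word) {
                t->in_word = 1;
                t->words++;
            }
        } else {
            t->in_word = 0;
            if (c == '\n')
                t->lines++;
        }
    }
    /* The byte count comes for free from the chunk length. */
    t->bytes += (unsigned long)len;
}
```

The totals would be zeroed once before the first chunk and wc_scan called once per chunk read, so the same loop works whether the data comes from a block read or from a single memory-mapped view.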

I know this is only barely a Win32 program :) but it'll go a long way towards helping me learn to think in assembler.

Thoughts/ideas?

Thanks!

-Chalain
Posted on 2002-04-18 23:11:21 by Chalain
Chalain,

To answer your question 1:

You could use the GetCL routine included with Masm32
Version 7, supplied in Masm32\M32LIB, where you supply
the argument number and a buffer for the returned
argument. Alternatively, the routine I wrote below can
be run once at the beginning of a program; it checks
for single quotes ' or double quotes " around literals,
including a quoted program name. I have fixed this
since the last time I posted the routine. The result is
an array of pointers to the command line arguments,
cl_args.

farrier



.386 ; forces 32 bit assembly
.model flat, stdcall ; memory model and calling convention
option casemap :none ; case sensitive code

include \masm32\include\windows.inc ; always first
include dbmacros.asm
include \masm32\include\user32.inc ; system include
include \masm32\include\kernel32.inc ; file next
include \masm32\include\gdi32.inc

includelib \masm32\lib\user32.lib ; matching system
includelib \masm32\lib\kernel32.lib ; libraries after that
includelib \masm32\lib\gdi32.lib

get_cl PROTO STDCALL
pro_q PROTO STDCALL :BYTE

.data
ALIGN 4
err_msg db "error in get_cl", 0
head_get_cl db "GET_CL PROG", 0

.data?
ALIGN 4
hInstance dword ?
lpCmdLine LPSTR ? ;address of command line string
end_of_cl dword ? ;address of end of command line
len_cl dword ? ;length of command line
num_args dword ? ;number of cl arguments not counting program
cl_args dword 80 dup (?) ;addresses of up to 80 arguments

.code
start:
invoke GetModuleHandle, NULL
mov hInstance, eax
call get_cl
.if (eax != 0)
invoke MessageBox, NULL, addr err_msg, addr head_get_cl, MB_OK ;first parameter is an owner window handle, not hInstance
.else
mov ebx, 0
.while (ebx < num_args)
invoke MessageBox, NULL, cl_args[ebx * 4], addr head_get_cl, MB_OK ;show each argument in turn
inc ebx
.endw
.endif
invoke ExitProcess, 0

get_cl PROC uses ebx edi esi

; Requires the following routines
;skip_sp PROTO STDCALL
;pro_q PROTO STDCALL :BYTE
; Requires the following data
;cl_args dword 80 dup (?) ;addresses of up to 80 arguments
;end_of_cl dword ? ;address of end of command line
;lpCmdLine LPSTR ? ;address of command line string
;len_cl dword ? ;length of command line
;num_args dword ? ;numbers of cl arguments not counting program

invoke GetCommandLine ; get the address of the command line
mov lpCmdLine, eax ;store the address
xor al, al ;zero al
mov ecx, 10000 ;ecx will hold count down from 10000
mov edi, lpCmdLine ;destination of scan in DI
cld ;advance after each comparison
repne scasb ;compare command line bytes to 0
dec edi
mov end_of_cl, edi ;store add. of end of command line
mov eax, 10000 ;original count in ecx
sub eax, ecx ;total length of command line including program
mov len_cl, eax ;store total length of command line
mov ecx, eax
mov num_args, 0
xor ebx, ebx ;pointer to array in cl_args
mov edi, lpCmdLine
skip_bl: ;skip leading spaces
cmp byte ptr [edi], ' '
jne check_q
inc edi
jmp skip_bl
;Check for ' or " enclosing program name
check_q: ;check for ' or " enclosing program directory
cmp byte ptr [edi], '"'
je check_d
cmp byte ptr [edi], "'"
jne check_a
check_s: ;first non-blank char is '
inc edi
cmp edi, end_of_cl
jne @f
mov eax, 2
jmp iep
@@:
cmp byte ptr [edi], "'" ;look for another '
jne check_s
skip_s_ag:
cmp byte ptr [edi], ' '
jne check_a
inc edi
jmp skip_s_ag
check_d: ;first non-blank char is "
inc edi
cmp edi, end_of_cl
jne @f
mov eax, 2
jmp iep
@@:
cmp byte ptr [edi], '"' ;look for another "
jne check_d
check_a: ;check for a space before end of CL
mov al, ' ' ;now check for a ' ' space
; mov edi, lpCmdLine
mov ecx, end_of_cl
sub ecx, edi ;bytes left from edi to end of CL, so the scan cannot run past it
repne scasb
jecxz no_args ;if ECX is 0, no args
pro_arg:
; call skip_sp ;skip all extra spaces, edi returns pointing
;to first non-space after program
cmp byte ptr [edi], ' '
jne @f
inc edi
jmp pro_arg
; mov edi, ecx
@@:
cmp byte ptr [edi], 0 ;see if this is end of command line
je no_mo_args
cmp byte ptr [edi], 34 ;is this next character a double quote "
jne c_sq ;if not a dq check for a single quote
push 34
call pro_q ;process a double quote
mov edi, ecx
mov ebx, edx
.IF (eax == FALSE)
jmp pro_ra
.ENDIF
jmp pro_arg ;process another argument
c_sq:
cmp byte ptr [edi], 39 ;is this next character a single quote '
jne pro_ra ;if not a sq process regular argument
push 39
call pro_q ;process a single quote
mov edi, ecx
mov ebx, edx
.IF ( eax == FALSE )
jmp pro_ra
.ENDIF
jmp pro_arg ;process another argument
pro_ra: ;edi points to something other than
;0 or ' ' (possibly an unmatched quote)
mov [cl_args + ebx], edi ;store beginning of this arg in cl_args
inc num_args
add ebx, 4 ;use ebx to point to next pointer in cl_args
mov al, ' ' ;look for a space in the rest of the cl
mov ecx, end_of_cl
sub ecx, edi ;number of bytes left to process
repne scasb ;scan for a space, still in al
jne no_mo_args ;jump if a space is not found (ZF clear after the scan)
mov byte ptr [edi - 1], 0
jmp pro_arg
no_mo_args:
mov eax, 0
cmp num_args, 0
je no_args
; call disp_args
jmp iep
no_args:
; call disp_no_args
mov eax, 0
iep:
ret
get_cl ENDP

pro_q PROC uses ebx edi esi, type_q:BYTE
;edi points to an opening type_q quote
;scan to see if there is a matching type_q quote
;if not, consider it part of an arg
push edi ;store edi in case there is no match
mov ecx, end_of_cl
sub ecx, edi ;number of bytes to process
inc edi ;point to next char
mov al, type_q
repne scasb ;scan for type_q, still in al
jecxz psq_ra ;jump if type_q not found in rest of cl
mov byte ptr [edi - 1], 0 ;replace second type_q with a 0 for end of string
pop esi ;pop address of first type_q
mov byte ptr [esi], 0 ;replace first type_q with a 0
inc esi ;point to first non-type_q char
mov [cl_args + ebx], esi ;beginning of first arg in cl_args
inc num_args ;we have another arg
add ebx, 4 ;use ebx to point to next pointer in cl_args
; set edi to end of this arg!!!
mov ecx, edi ;return edi in ecx
mov edx, ebx
mov eax, TRUE ;return TRUE
jmp get_out
psq_ra:
pop ecx ;return edi in ecx
mov edx, ebx
mov eax, FALSE ;second quote not found, process as a regular argument
get_out:
ret
pro_q ENDP

end start
Posted on 2002-04-18 23:51:52 by farrier
It all sounds good to me... go do it, and come back when stumped :tongue:

No seriously, heed farrier's advice and make use of the masm32.lib. Use his source if it helps, but my suggestion is to remember that your program is to have a single purpose, so maybe generic functions are overkill? If the params don't fit in the right order, simply output an error.

I also think it would be a good exercise in itself to have you parse the line yourself for what you want (definitely a lot of opportunity to practice the @@:, @F, and @B jumping; I know because I wrote a parser for this same purpose once).

Also, before you jump into CreateFile (which is a correct API to use), look at Iczelion's file mapping tutorial (the tuts are at the top of the main section). You may find it would be better to file map instead (seeing as some files might be quite large).

Also, on the whole, I think this next project is a good one to tackle (seeing as you have Windows experience under your belt; this is why I suggest you write the parser yourself, after all, asm algos are what you're really asking to learn).

Best of luck..
:alright:
NaN
Posted on 2002-04-19 01:26:57 by NaN
If you want to go about commandline parsing manually (good exercise,
although the m32lib GetCL will save you some time), you probably shouldn't
write to the pointer returned by GetCommandLine. The argv-style pointer array
is a good approach, don't worry if it's "assembly-like" or not, it's about
suiting *your* needs. Note that commandline parsing can be tricky, there's
a bunch of different situations with spaces, quotes and stuff. While you're
testing your code, you should try running your app in a lot of different
ways (from a shell, double-clicking, dragging a file to the .exe, . . .).

Don't bother checking if file exists before you open it. Especially not
using those ugly shell functions ;). Just CreateFile - if this fails, you
can GetLastError to see what the error was (or just abort saying there was
a problem opening the file, which should be sufficient). CreateFile isn't
overcomplex, it just offers a lot of flexibility. You don't need all this
flexibility most of the time though, so it would make sense writing a simple
"openFile" wrapper that takes fewer parameters and adds the extra parms.

If you want, you could use Memory Mapped Files, that way you simply get a
pointer and can treat the file as if it was a memory buffer. There's a few
more API calls involved, but it's not hard to use, and you don't have to
do a block-based algorithm. Memory mapped files are somewhat slower than
"raw" IO (because they're backed via the PageFault mechanism), but the
advantage of simplified programming usually makes up for this performance
degradation. You might not even be able to notice the performance difference.

Also, don't worry if this program isn't very "win32ish", it's assembly
language you're out to learn :). This program will be more "assembly" than
Iczelion's tutorials, since icz' tuts are more focused on teaching the win32
api than assembly.

Go for it :alright:
Posted on 2002-04-19 07:51:32 by f0dder
GetCL will do fine, but there's a small problem when you pass a lot of files to the program. In my PNG viewer (written in C but without libc so I had to write my own command line parser) I first used GetCL to get the PNG files specified in the command line of the program. However when you feed it a lot of files, the function crashes. That's why I wrote my own.

Thomas
Posted on 2002-04-19 11:02:19 by Thomas