I am making a program that reads a text file script... but after writing a lot of code, one thing has me confused...

How is the end of a line marked in a text file (after pressing Enter)?

I ran some tests with a .txt file on Windows:

When I put just an ENTER in the file, the size was 2 bytes...(?)

Then when I put an A and an ENTER, the size of the file came to 3 bytes....

So my conclusion is that the ENTER key writes 2 bytes (I am a genius!)... so...

Which are those 2 bytes, so that I can detect the end of a line in a loop in my program?

When a loop fills a buffer with a string read from a text file, I have to subtract the bytes already read (to check for the end of the file), but when I reach the end of a line, do I have to subtract 2 bytes?
Posted on 2003-03-02 17:20:10 by DrBios
Actually, when you create an empty text file, the filesize will always be 1 byte. That is because an empty text file has 1 character in it, the null character 00H. This is the character you check for to tell of the end of file. Open your empty text file in a hex editor and you will see it has 1 character which is 00H. Now, add a few characters to that file say:

I am line 1
I am line 2
and you will see that after line 1 there will be 0Dh 0Ah (decimal 13, 10) for carriage return and line feed, and after line 2 you will see 00h

So, to read and do something with each line in the text file I use:


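GetNewLine:
@@:
    mov al, byte ptr [esi]      ; esi -> next unread byte of my file buffer
    inc esi
    cmp al, 0DH                 ; CR
    je AddItem
    cmp al, 00H                 ; 00H marks the end
    je AddLastItem
    mov byte ptr [edi], al      ; copy the character to the line buffer
    inc edi
    jmp @B

AddItem:
    ; Do something with the line here
    cmp byte ptr [esi], 0AH     ; skip the LF that follows the CR
    jne GetNewLine
    inc esi
    jmp GetNewLine

AddLastItem:
    ; Do something with the last line here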


This is off the top of my head....
Posted on 2003-03-02 17:28:46 by Gunner
Good point about THE END OF FILE, I hadn't noticed that a 00H is read at the end.... so now I can forget about subtracting the bytes already read :grin:

... thanks...
Posted on 2003-03-02 18:48:42 by DrBios
Just to clarify what Gunner said...
I've never seen a text editor which automatically wrote a null character to a file. I don't think it happens, at least not on Windows.

What you can do, DrBios, is this: if you know you are opening a text file, then when you read it into memory, allocate a buffer which is 1 byte larger than the file size. After reading in the file, put a null byte at the end of the buffer. That way, when processing the buffer, you know you've processed all valid bytes once you read a null byte.
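
For what it's worth, here is a minimal sketch of that idea in C rather than assembly, just to keep it short. The name load_text_file and its out_size parameter are made up for illustration; the only point is the size+1 allocation and the 0 byte stored after the data. It assumes the whole file fits in memory and that ftell gives a usable size for a file opened in binary mode.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a freshly allocated buffer
   that is one byte larger than the file, and stores a 0 byte after the
   last byte read. Returns NULL on failure; the caller frees the buffer. */
char *load_text_file(const char *path, size_t *out_size)
{
    FILE *f = fopen(path, "rb");    /* binary mode: 0DH/0AH arrive untouched */
    if (f == NULL) return NULL;

    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);           /* step 1: get the file size */
    if (size < 0) { fclose(f); return NULL; }
    rewind(f);

    char *buf = malloc((size_t)size + 1);   /* step 2: filesize + 1 */
    if (buf == NULL) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)size, f);   /* step 3: read it in */
    fclose(f);

    buf[got] = '\0';                /* step 4: terminator after the data */
    if (out_size != NULL) *out_size = got;
    return buf;
}
```

With a file that holds just an A and an ENTER typed on Windows, the buffer should contain 41H 0DH 0AH followed by the 00H added here, which matches the 3 bytes DrBios measured.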

Also be aware that the CR (carriage return) byte 0DH may not always be in a text file (it's not in Unix text files). To be certain you've reached the end of a line, you need to detect the LF (line feed) byte 0AH.

Here are the steps to take:
1: get the file size (filesize)
2: allocate a buffer with size bufsize = filesize+1
3: read the file into the buffer
4: set the last byte of the buffer to NULL
5: process the buffer; each 0AH char marks the end of a line
6: while processing the buffer, if the byte == NULL then you are finished

I suggest you start by writing a routine which counts the number of lines in a file. Test it on both Unix and Windows type files. Once you've done this you shouldn't have a problem.
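
Something along these lines would do as a first version of such a routine, again sketched in C and not in assembly. count_lines is just a name picked for the example, and it assumes the buffer already ends with a 0 byte as in the steps above. One choice made here that you may want differently: a last line with no 0AH after it still counts as a line.

```c
#include <stddef.h>

/* Hypothetical helper: counts lines in a 0-terminated buffer.
   A line ends at 0AH (LF); 0DH (CR) is ignored, so Windows and
   Unix files give the same count. */
size_t count_lines(const char *buf)
{
    size_t lines = 0;
    int pending = 0;               /* non-zero while inside an unfinished line */

    for (const char *p = buf; *p != '\0'; ++p) {
        if (*p == 0x0A) {          /* LF: the line is complete */
            ++lines;
            pending = 0;
        } else if (*p != 0x0D) {   /* any byte except CR is line content */
            pending = 1;
        }
    }
    return lines + (pending ? 1 : 0);   /* count a final line without LF */
}
```

Running it over the same text saved once with 0DH 0AH endings and once with plain 0AH endings should give the same number both times.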
Posted on 2003-03-02 18:56:14 by MArtial_Code
Actually, I had already figured out how to load the script into memory, but I hadn't thought of putting the 00H at the end, which makes it very easy to find the end of the file.. :grin: ....

And the Unix text part is a good idea :grin:, this way reading the file is more reliable, however it was made.

To be sure, I'm going to check for both 0AH and 0DH so that neither of them gets added to my buffer while I fill it.
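
Roughly what I have in mind, written as a C sketch before I translate it to assembly. copy_line is only a name I made up, and it assumes the file buffer already has the 00H at the end. Lines longer than the destination buffer are simply cut short here.

```c
#include <stddef.h>

/* Hypothetical helper: copies one line from src into dst, leaving out
   0DH and 0AH. Writes at most dst_size-1 bytes plus a terminating 0.
   Returns a pointer to the start of the next line, or to the final 0
   when the buffer is used up. */
const char *copy_line(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    while (*src != '\0' && *src != 0x0A) {        /* stop at LF or end of buffer */
        if (*src != 0x0D && n + 1 < dst_size) {   /* drop CR, respect dst size */
            dst[n++] = *src;
        }
        ++src;
    }
    if (dst_size > 0) dst[n] = '\0';
    if (*src == 0x0A) ++src;                      /* step over the LF itself */
    return src;
}
```

Calling it in a loop until the returned pointer points at a 00H walks through the whole script one line at a time.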
Posted on 2003-03-02 19:24:56 by DrBios