I would like to know why, when I open a .exe, .jpg, or any other non-text file, I see garbage, and what do these symbols mean?
Posted on 2013-02-19 17:06:07 by Snake4eva
Nonsense! Just gotta have the right executable!

; assemble with "nasm -o myfile.com myfile.asm"
; view with "type myfile.com"

dec ax
inc bp
dec sp
dec sp
dec di
sub al,20h
push di
dec di
push dx
dec sp
inc sp
or cl,

int 20h

That isn't intended to run, but to be viewed with... well, Notepad ought to work...
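
For anyone who wants to check the trick without assembling anything, here is a rough sketch in Python (not assembly, purely to show the bytes). The opcode table is hand-copied from the 16-bit x86 opcode map, and the names OPCODES and assembled_bytes are made up for illustration. The incomplete "or cl," line and the final "int 20h" are left out.

```python
# 16-bit x86 encodings for the instructions in the listing above.
# Hand-copied from the opcode map; the incomplete "or cl," line and
# "int 20h" are deliberately omitted.
OPCODES = [
    ("dec ax", b"\x48"),
    ("inc bp", b"\x45"),
    ("dec sp", b"\x4C"),
    ("dec sp", b"\x4C"),
    ("dec di", b"\x4F"),
    ("sub al,20h", b"\x2C\x20"),
    ("push di", b"\x57"),
    ("dec di", b"\x4F"),
    ("push dx", b"\x52"),
    ("dec sp", b"\x4C"),
    ("inc sp", b"\x44"),
]


def assembled_bytes(listing):
    """Concatenate the machine-code bytes of each instruction."""
    return b"".join(code for _, code in listing)


if __name__ == "__main__":
    # Every byte here happens to fall in the printable ASCII range.
    print(assembled_bytes(OPCODES).decode("ascii"))
```

The point is only that the same bytes are "code" to the CPU and "text" to a viewer; which one you get depends on who is reading them.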

Seriously, "printable" ASCII (American Standard Code for Information Interchange) characters run from 20h (space) to 7Fh... well, 7Eh really - 7Fh is "delete". Numbers less than 20h are "control codes" - backspace, tab, carriage return, linefeed, beep, etc. These days, they confuse the issue with Unicode and stuff...

The garbage characters are... well, code or image data. With the exception of .com files, executables have a header or more than one - some of which may be readable. Windows executables should start with "MZ" (I'm told "ZM" works too). Jpegs have a header, too. A few bytes in, you should see "JFIF". That's a "signature" (Linux folk call it "magic" sometimes). This enables software intended to read 'em to identify it as "their" kind of file. Text editors aren't intended to read 'em, so they don't know what to make of the binary data. It's all just bytes!

Best,
Frank

Posted on 2013-02-19 23:55:01 by fbkotler
To be exact, 'MZ' is what an MS-DOS executable starts with (as in .exe, not .com).
A Windows executable is basically an MS-DOS executable, with an extra header tacked on, which can be recognized by 'PE'. The MZ header specifies where the PE header starts (if any).
This is why when you try to run a Windows executable from DOS, it displays the message: "This program cannot be run in DOS mode."
The MZ-part of a PE file is literally a simple MS-DOS program that prints that message.
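
A minimal sketch of that lookup, in Python rather than assembly so it is easy to try. It assumes the usual layout where the dword at offset 3Ch of the MZ header (e_lfanew) holds the file offset of the PE signature; the function name pe_header_offset is made up for illustration.

```python
import struct


def pe_header_offset(data):
    """Return the file offset of the 'PE\\0\\0' signature, or None.

    None means the file starts with 'MZ' but has no PE header,
    i.e. it looks like a plain MS-DOS executable.
    """
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    if len(data) < 0x40:
        return None  # too short to carry the e_lfanew field
    # e_lfanew: little-endian dword at offset 3Ch of the MZ header
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00":
        return e_lfanew
    return None
```

Calling it as pe_header_offset(open("some.exe", "rb").read()) would be the typical use; the file name is just a placeholder.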
Posted on 2013-02-20 03:49:03 by Scali
I would like to know how to read an image file from disk in assembly language, parse the file for the image, discard the metadata, and write only the image bytes back to the file. The whole operation doesn't have to be spelled out; the key points I want are:
1. Opening the image file
2. Locating where the image starts
3. Manipulating pixels
4. Obtaining image width, height, and other metadata

I would like to do some image processing, and I would like to know how to just read the files and locate where the images start. My intention is to do it in assembly. I've searched online for high-level implementations, but they all include external header files and libraries with predefined transformations. What I want to do is write a simple version of such a library to manipulate image files.
Posted on 2013-02-20 18:58:57 by Snake4eva

Data is data. Don't get caught up in a particular representation of data, it's all binary at the computer level.

Grab a particular image format file structure and have at it. Reading a BMP file is probably a decent introduction as it doesn't involve compression.
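
As a rough sketch of what "have at it" looks like for BMP (in Python, just to show the offsets; the same reads translate directly to assembly). It assumes the common 40-byte BITMAPINFOHEADER variant and, for the pixel lookup, uncompressed 24-bit data. The names parse_bmp and pixel_bgr are made up for illustration.

```python
import struct


def parse_bmp(data):
    """Pull the basic fields out of a BMP with a BITMAPINFOHEADER."""
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    # Offset 10: dword giving where the pixel data starts in the file
    (pixel_offset,) = struct.unpack_from("<I", data, 10)
    # Offset 18: width, height (signed dwords), planes, bits per pixel
    # (words), compression (dword); all little-endian
    width, height, planes, bpp, compression = struct.unpack_from(
        "<iiHHI", data, 18)
    # Each row is padded up to a multiple of 4 bytes
    row_size = (bpp * width + 31) // 32 * 4
    return {"pixel_offset": pixel_offset, "width": width,
            "height": height, "bpp": bpp,
            "compression": compression, "row_size": row_size}


def pixel_bgr(data, info, x, y):
    """Return the 3 bytes (blue, green, red) of pixel (x, y), top-left origin."""
    if info["bpp"] != 24 or info["compression"] != 0:
        raise ValueError("sketch only handles uncompressed 24-bit BMPs")
    height = abs(info["height"])
    # A positive height means rows are stored bottom-up
    row = y if info["height"] < 0 else height - 1 - y
    start = info["pixel_offset"] + row * info["row_size"] + x * 3
    return data[start:start + 3]
```

With those two pieces, "manipulate pixels" is just reading or overwriting three bytes at the computed offset.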
Posted on 2013-02-20 19:14:49 by SpooK
This is a portion of a jpeg file that I opened in Notepad:
JFIF  ` `  Adobe d    ]Exif  MM *   2     b;     vGF      GI    ?                i     }  2009:03:12 13:46:42 Corbis            Ӓ   54    54             2008:03:14 13:59:26 2008:03:14 13:59:26             )     1     9           H    H   JFIF      C  

(1#%(:3=<9387@H\N@DWE78PmQW_bghg>Mqypdx\egc

That's what I call gibberish. What do the symbols mean, and how did Notepad know how to generate them?
Posted on 2013-02-22 08:04:55 by Snake4eva
Notepad expects to be fed printable ascii characters. That's not what you're feeding it. Notepad "knows" to produce those symbols 'cause that's what you get if you treat those bytes as ascii... or unicode(?)...
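
A small sketch of what that amounts to (Python, for illustration only). It takes the first few bytes of a typical JFIF file and shows them two ways: as a hex dump, and decoded as if they were text. Which encoding Notepad actually uses is an assumption here; Latin-1 stands in because every byte value maps to some character in it. The names dump_line and as_text are made up.

```python
def dump_line(raw):
    """Hex values on the left, printable ASCII (or '.') on the right."""
    hex_part = " ".join(f"{b:02X}" for b in raw)
    text_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in raw)
    return f"{hex_part}  {text_part}"


def as_text(raw, encoding="latin-1"):
    """Decode bytes the way a naive text viewer might (encoding is assumed)."""
    return raw.decode(encoding, errors="replace")


# Start of a typical JFIF file: SOI marker, APP0 marker, length, "JFIF\0"
JPEG_START = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10]) + b"JFIF\x00"

if __name__ == "__main__":
    print(dump_line(JPEG_START))
    print(repr(as_text(JPEG_START)))
```

Each "symbol" is just whatever glyph the chosen encoding assigns to that byte value; only the run 4A 46 49 46 happens to spell something.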

http://www.wotsit.org has more information than you wanna know about file formats, including JPEG (Joint Photographic Experts Group), and including source code. (I haven't looked at it - HLL I presume...)

It was a long time ago I looked at jpeg at all. There were several compression/decompression formats in use, some of which (Discrete Cosine Transformation, e.g.) were patented. This is like patenting long division, IMHO, but that's how it is/was. Some of these patents have probably expired.

I concluded that it was way too complicated for a beginner like me to tackle. The "sane" way to do it is to call an existing library. This isn't too difficult to do from assembly language, but there isn't much point to it, either.

Once you've found the jpeg format, open the file and read (some of) it into a buffer. Probably worthwhile to mmap it(?). Then by treating the ascii parts as ascii, the words (if any) as words, the dwords as dwords, etc. you should be able to begin to make some sense of it.
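
A hedged sketch of that first pass over the buffer (Python, only to show the structure). It assumes a simple baseline JPEG: the file starts with the SOI marker FFh D8h, and every segment up to the scan is FFh, a marker byte, then a big-endian word giving the length (which counts the two length bytes themselves). Fill bytes, standalone markers, and progressive files with several scans are not handled. The names walk_jpeg and frame_size are made up.

```python
import struct

# A few marker bytes of interest (the byte that follows FFh)
APP0, DQT, SOF0, DHT, SOS = 0xE0, 0xDB, 0xC0, 0xC4, 0xDA


def walk_jpeg(data):
    """Return ([(marker, offset, payload), ...], offset_of_scan_data)."""
    if data[:2] != b"\xFF\xD8":
        raise ValueError("missing SOI marker")
    pos = 2
    segments = []
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # JPEG words are big-endian, unlike the little-endian x86 habit
        (length,) = struct.unpack_from(">H", data, pos + 2)
        payload = data[pos + 4:pos + 2 + length]
        segments.append((marker, pos, payload))
        pos += 2 + length
        if marker == SOS:
            break  # the entropy-coded image data starts right here
    return segments, pos


def frame_size(segments):
    """Return (width, height) from the baseline SOF0 segment, if present."""
    for marker, _, payload in segments:
        if marker == SOF0:
            # precision (byte), height (word), width (word)
            _, height, width = struct.unpack_from(">BHH", payload, 0)
            return width, height
    return None
```

In this layout, DQT segments carry the quantization tables, DHT segments carry the Huffman tables, SOF0 carries the width and height, and whatever follows the SOS segment is the compressed image data itself.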

I attempted a .gif reader for dos once. (.gif was patented at the time, but I wasn't too worried about the FBI coming to my house). By simplifying it to a known file of known size and graphics mode (13h), and known to fit into a 64k buffer, I managed to display it. As I recall, there was a "shortcut" in the decompression loop that "should have worked", but I never got it to work. I eventually lost interest and moved on to something else, without ever being "satisfied" with it. (I'm not a very ambitious student)

By starting simple (BMP probably), and taking small steps, you can probably figure it out - see what to send off to a library routine, at least. I doubt if figuring out why Notepad does what it does is going to be much help.

Best,
Frank

Posted on 2013-02-22 11:58:21 by fbkotler
fbkotler, how long did it take you to write the .gif reader?
Posted on 2013-02-22 19:14:52 by Snake4eva
That's an easy one: I don't remember.

I could compare the file dates between "gv01.asm" and "gv09.asm"... but they've all been copied from drive to drive and all have the same date. As I recall, I maintained interest in the project for maybe two or three weeks (?) but it wasn't anything like a "full time job" for that length of time. The slow part was collecting information and figuring out what I needed to do. Writing the code, once you've figured out what needs to be done, doesn't take that long - longer than HLL of course, since we need to tell the CPU "Every Single Thing" - but not that bad. I recall that I was using a Pascal file as guidance (I didn't - and don't - know Pascal). There may have been C files involved, and some text documentation. I don't think I used any compiler-generated code, but was just trying to figure out what the "steps" were...

My main goal was to figure out how such a thing as "lossless compression" was even possible! I haven't retained it entirely, but at the time I understood "how it worked" well enough to satisfy myself... and that was as far as I got.
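
For the curious: GIF's lossless scheme is LZW, and the core idea fits in a few lines. This is a stripped-down sketch in Python, not the code from the "gv" files and not GIF's actual bitstream (no variable-width codes, clear codes, or data sub-blocks). The function names are made up for illustration.

```python
def lzw_compress(data):
    """Turn bytes into a list of integer codes using a growing dictionary."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate          # keep extending the match
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # learn the new string
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the bytes; the dictionary is re-derived, never transmitted."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The one code the decoder has not defined yet
            entry = previous + previous[:1]
        else:
            raise ValueError("bad LZW code")
        out += entry
        table[len(table)] = previous + entry[:1]
        previous = entry
    return bytes(out)
```

The trick that makes it "possible" is that both sides build the same dictionary from the data already seen, so repeated strings shrink to single codes and nothing is lost.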

I can post the code - I'd have to boot to dos and test what actually worked... I think "gv09.asm" didn't work, I see some "debugging" code wherein I stop and wait for a key after every pixel (a really slow way to display an image!)... maybe "gv08.asm" or "gv08a.asm"... But I don't think it'll help you much, being dos code, and 16-bit code, and for a specific .gif image... Quite a different thing. A different area of the Forum would probably be more appropriate anyway...

Doesn't really matter how long it took me - it will take you a different amount of time (more or less), depending on where you're starting from, amongst other things. "Until done"... or until you get bored with it. :)

Best,
Frank

Posted on 2013-02-23 12:21:24 by fbkotler
Thanks again, fbkotler. I would really like you to post the source code from gv01.asm to gv09.asm; it will probably help me out. I am trying to write an image object detector as part of my final year project. I have already learnt how to open and manipulate files in C++, but I want to use assembly language to write the transformations involved. I understand the different image file formats and the different compression types. I also get how an image can be represented in memory and all of the mathematics involved in image processing. My major problem is understanding the JPEG format. I recently read an article that made it clearer; however, I still don't see how to open the file and move through it in assembly and then write the result to the video buffer. I also want to know how to locate the YCbCr image data, as distinct from the DCT or Huffman tables, inside the image. I know that both the JFIF and JPEG documentation are online, and I've read them for hours, but it still seems sketchy how to access the image data. I can learn with relative ease how to open and traverse a file in assembly using BIOS and OS interrupt services, and the video write I might be able to figure out, but I just need someone to tell me how I can look for the DCT and Huffman tables and the YCbCr data in the image. I'll write the code; I just need guidance as to how.
Posted on 2013-02-23 18:23:37 by Snake4eva
Well, I think "dos and bios" or perhaps "algorithms and source code" might be a more appropriate place for this, but you asked for it here, so I'll put it here. Mods can move it or delete it entirely once you've had a chance to download it. Besides the "gv??.asm" series, there's "test.gif" which they're intended to work on. I threw in examples of "hi-res" graphics from int 10h. "garfvid.asm" is the bank-switching one (two methods, one of which is commented out) and m107ph.asm and j107ph.asm need "Flat Real Mode", which requires starting from "real Real Mode" and may not work for you. They aren't very good examples, as they ASSume that the video mode is available and don't do much error checking. Studying RBIL is a better bet. It is what it is. Good luck!

Best,
Frank

Attachments:
Posted on 2013-02-25 19:57:22 by fbkotler