I'm kind of a newbie to this PE stuff, so some things are unclear to me.
In Iczelion's tutorial on this topic he wrote a program that checks
whether a given file is a PE.
I saw that he used a memory-mapped file
to map the selected file into memory and then check whether it's a PE.

So my question is: in order to check if a file is a valid PE, is it really necessary
to memory map the file, or is it possible to do it by checking the file on the HDD,
something like: open the file for reading, do checks at certain offsets to compare
for PE characteristics (like PE\0\0 and others)?

PS. One more thing: when a file is memory mapped, it's loaded into memory exactly as it
is on the hard disk, right?
OK, so I read somewhere that the kernel loader uses this MMF technique when loading
PE exe files, and I also read that a PE on the HDD and in memory are
not the same, so how come? Does the kernel loader use some different kind of MMF
specially designed for loading PE files, or what?
Posted on 2002-07-11 17:04:55 by Mikky
It is absolutely not necessary to memory map the file. Simply open the file and read the IMAGE_DOS_HEADER structure. Verify that its signature is equal to "MZ"; if not, abort. Then SetFilePointer to IMAGE_DOS_HEADER.e_lfanew and read the IMAGE_NT_HEADERS structure. Verify that its signature is "PE" followed by two zero bytes.
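For illustration, a minimal sketch of those steps, written in Python instead of the Win32 calls named above (an assumption made only to keep it short). The function name is_pe and the in-memory test file are made up for this example; f.seek stands in for SetFilePointer.

```python
import io
import struct

def is_pe(f):
    """Return True if the binary stream `f` starts with an MZ header whose
    e_lfanew field points at the signature 'PE' followed by two zero bytes."""
    dos_header = f.read(64)                  # IMAGE_DOS_HEADER is 64 bytes long
    if len(dos_header) < 64 or dos_header[:2] != b"MZ":
        return False                         # no MZ signature: abort
    # e_lfanew is the last field of IMAGE_DOS_HEADER, at offset 3Ch
    (e_lfanew,) = struct.unpack_from("<I", dos_header, 0x3C)
    f.seek(e_lfanew)                         # plays the role of SetFilePointer
    return f.read(4) == b"PE\0\0"            # first dword of IMAGE_NT_HEADERS

# Tiny in-memory stand-in for a file: MZ header, then the PE signature at 80h.
fake = bytearray(0x84)
fake[:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\0\0"
print(is_pe(io.BytesIO(bytes(fake))))
```

For a real file you would pass open(path, "rb") instead of the BytesIO object. This only checks the two signatures; it does not validate the rest of the headers.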
Posted on 2002-07-11 17:11:28 by comrade
Are you sure about this?
I mean, all those structures that describe the PE headers, do they apply to the PE file in memory or on the hard disk?
Is there any difference between them (the PS question in my first post)?

I also found in some other places (like the famous Matt Pietrek PE tutorial) that he also used the MMF method for this in his sample program.
Posted on 2002-07-11 20:51:53 by Mikky
When you use file mapping, the file's content is exactly the same as it is on the hard disk. The sections of a PE file that change when it is loaded into memory are only changed if the PE file was loaded by the Windows PE loader.
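A quick way to convince yourself of this, sketched in Python with its standard mmap module (an assumption for brevity, not the Win32 CreateFileMapping/MapViewOfFile calls themselves). The helper name mapped_equals_read and the temporary test file are made up for this example.

```python
import mmap
import os
import tempfile

def mapped_equals_read(path):
    """Compare a read-only mapping of the file with the bytes a plain read returns."""
    with open(path, "rb") as f:
        plain = f.read()
        # Length 0 maps the whole file; ACCESS_READ gives a read-only view.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[:] == plain

# Throwaway test file (its contents are arbitrary).
fd, path = tempfile.mkstemp()
os.write(fd, b"MZ" + bytes(1000) + b"PE\0\0")
os.close(fd)
print(mapped_equals_read(path))
os.remove(path)
```

An ordinary mapping like this is a flat view of the file, so header checks done on the mapped bytes and on bytes read with normal file I/O see the same data.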
Posted on 2002-07-11 21:04:26 by Kudos
hi all
lots of info 'bout pe
Microsoft Portable Executable and Common Object File Format Specification
http://www.microsoft.com/hwdev/download/hardware/PECOFF.pdf
later
Posted on 2002-07-11 22:54:04 by b0z0
Mikky,

Comrade has told you the right way to test if a file is a 32 bit PE file. Read the "e_lfanew" member of the MZ header to determine if the file is a PE file.

An even easier way is to read the first 500 bytes or so of the file and search for the string PE with two ASCII zeros appended.

If it's there, the file is a PE file; if it's not, it's not a PE file.

Regards,

hutch@movsd.com
Posted on 2002-07-11 23:35:04 by hutch--
hutch--,

I've seen some files with a very big DOS part, so your advice is not completely correct.
If the dword at offset 3Ch from the start (e_lfanew) points to P,E,0,0 (50h, 45h, 00h, 00h), then it is a PE.
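To make the difference concrete, here is a small Python sketch (the language and all helper names are assumptions for this example only). It builds a fake file with a DOS part of a chosen size and compares a 500-byte scan with the e_lfanew check.

```python
import struct

def build_stub_file(stub_size):
    """Hypothetical helper: MZ header, zero padding up to stub_size (>= 40h),
    then the PE signature. e_lfanew at 3Ch is set to stub_size."""
    data = bytearray(stub_size)
    data[:2] = b"MZ"
    struct.pack_into("<I", data, 0x3C, stub_size)
    return bytes(data) + b"PE\0\0"

def scan_prefix(data, limit=500):
    """The 'easy' method: search the first `limit` bytes for PE,0,0."""
    return b"PE\0\0" in data[:limit]

def check_e_lfanew(data):
    """The header method: follow the dword at 3Ch and compare four bytes."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[offset:offset + 4] == b"PE\0\0"

small = build_stub_file(0x80)      # usual small DOS stub
large = build_stub_file(0x1000)    # very big DOS part
print(scan_prefix(small), check_e_lfanew(small))
print(scan_prefix(large), check_e_lfanew(large))
```

With the small stub both methods agree; with the 1000h-byte stub the signature lies beyond the first 500 bytes, so only the e_lfanew check finds it.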
Posted on 2002-07-12 00:26:53 by masquer
in order to check if a file is a valid PE, is it really necessary
to memory map the file, or is it possible to do it by checking the file on the HDD,
something like: open the file for reading, do checks at certain offsets to compare
for PE characteristics (like PE\0\0 and others)?


It makes absolutely no difference which method you use to read the file contents.
The way the OS loads the file into memory is different, but for checking whether a file is a PE or not that doesn't matter.

PS. One more thing: when a file is memory mapped, it's loaded into memory exactly as it
is on the hard disk, right?
OK, so I read somewhere that the kernel loader uses this MMF technique when loading
PE exe files, and I also read that a PE on the HDD and in memory are
not the same, so how come?


Right and wrong at the same time.
A PE consists of a number of sections. Sections have, among other things, these characteristics:
RawOffset (PointerToRawData in IMAGE_SECTION_HEADER)
VirtualOffset (VirtualAddress)
RawSize (SizeOfRawData)
VirtualSize
...


Sections are aligned, but the alignment can be different on the HDD and in memory.
There are SectionAlignment and FileAlignment fields in the optional header.
SectionAlignment is the alignment in memory.
FileAlignment is the alignment on the HDD.
The alignment in memory normally can't be smaller than the page size (4 KB on x86), because sections have different access rights and protection is applied per page:
.text - executable & readable
.data - readable & writable
and so on.
But on the HDD sections can have a smaller alignment (to save disk space).
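As a small worked example of what the two alignments do to a section's size, here is a Python sketch. The helper align_up and the numbers (1234h bytes of section data, FileAlignment 200h, SectionAlignment 1000h) are assumptions picked for illustration.

```python
def align_up(value, alignment):
    """Round `value` up to the next multiple of `alignment`."""
    return (value + alignment - 1) // alignment * alignment

FILE_ALIGNMENT = 0x200       # assumed FileAlignment
SECTION_ALIGNMENT = 0x1000   # assumed SectionAlignment (x86 page size)

data_size = 0x1234           # bytes of real data in the section
print(hex(align_up(data_size, FILE_ALIGNMENT)))      # space taken in the file
print(hex(align_up(data_size, SECTION_ALIGNMENT)))   # space taken in memory
```

Under these assumptions the section takes 1400h bytes on disk but 2000h bytes in memory, which is where the disk-space saving comes from.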

So the RawOffset of a section is its offset from the beginning of the file,
but its VirtualOffset is its offset from the image base in memory.
If SectionAlignment != FileAlignment, then in general
RawOffset != VirtualOffset.
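Because the two offsets differ, an address relative to the image base (an RVA) has to be translated before it can be used on the raw file. A minimal Python sketch of that translation follows; the Section tuple, the function name and the sample section table are all made up for this example.

```python
from collections import namedtuple

# Field names follow the list above, not the official IMAGE_SECTION_HEADER names.
Section = namedtuple("Section", "name virtual_offset virtual_size raw_offset raw_size")

def rva_to_file_offset(rva, sections):
    """Translate an offset from the image base into an offset in the file.
    Returns None if no section has file data for that address."""
    for s in sections:
        # Only the first raw_size bytes of a section are backed by the file.
        if s.virtual_offset <= rva < s.virtual_offset + s.raw_size:
            return rva - s.virtual_offset + s.raw_offset
    return None

sections = [
    Section(".text", 0x1000, 0x1234, 0x400, 0x1400),
    Section(".data", 0x3000, 0x500, 0x1800, 0x200),
]
print(hex(rva_to_file_offset(0x1010, sections)))
print(rva_to_file_offset(0x3300, sections))
```

The second lookup returns None because that part of .data exists only in memory (VirtualSize is larger than RawSize), so there is nothing to read from the file.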

The PE header is needed mostly by the loader itself.
It stays mapped at the image base (that is what a module handle points to), but once the PE is loaded the program rarely needs it.
The .text, .rdata and .rsrc sections are mapped as is, because they are read only.
The .data section stays as it is on the HDD until the executable itself first writes to it.
.reloc is also needed only by the loader itself, and only if it can't map the PE at its preferred ImageBase.
Also, in every PE that imports something there is an import directory.
At load time the OS loader changes the part of the import directory called the IAT (Import Address Table).
This happens only if the imports are not bound.
So, as you can see, there are some small differences.
There are also other section types.

does the kernel loader use some different kind of MMF
specially designed for loading PE files, or what


AFAIK, the loader uses the same MMF mechanism; it just maps each section at its virtual address instead of mapping the file as one flat block.

PS: There are a bunch of functions in Imagehlp.dll (look in your %System% directory) for working with PE files:
ImageNtHeader
ImageRvaToSection
ImageDirectoryEntryToData
etc...
Consult your API reference.

PPS: I have also seen some files with a nonstandard DOS stub size.
Posted on 2002-07-12 02:58:46 by Four-F
masquer,

You are right of course as the suggestion I made for a simplified method did not take into account programs that are designed to run in both DOS and win32.

If a simplified method is required, just scan the whole file to find P,E,0,0. The MZ header member "e_lfanew" never fails but it requires loading the MZ header into a structure first.

Regards,

hutch@movsd.com
Posted on 2002-07-12 05:28:51 by hutch--
Thanks for the answers guys, now I'm beginning to get something ;)
I have a few new questions.

1. OK, so which method is better, or let's say faster?
If we use MMF here to load a big exe file, is that going to be
slower than checking the file on the HDD, which involves reading,
let's say, no more than 1 KB from the file?

2. ImageBase is a member of the optional header structure.
This is from Iczelion's PE tut:

It's the preferred load address for
the PE file. For example, if the value in this field is 400000h, the PE
loader will try to load the file into the virtual address space starting
at 400000h. The word "preferred" means that the PE loader may
not load the file at that address if some other module already occupied
that address range.


I don't understand this. If every process runs in its own 4 GB memory space,
doesn't this mean that it has all that memory for itself, so the loader can load
the PE at its preferred address every time, because there is nothing else there
but our program...??
How can "some other module already occupied that address range."
if there is no other program in my program's memory space? Or maybe this
preferred address applies to the real physical address in RAM,
not the virtual one that Win32 creates for our programs?

3. I don't understand all that alignment stuff,
like FileAlignment and SectionAlignment (members of the optional header struct).
What are they for? As if things weren't complicated enough.

4. PE is portable, so it can execute on different processors, right?
OK, but how is that possible if we compiled the PE file on x86, so it will have x86 instructions in it? How can that file with x86 instructions execute on e.g. an Alpha processor??
All those zeros and ones in the file will represent something completely different on an Alpha processor.
Posted on 2002-07-12 15:54:16 by Mikky
1. It doesn't matter. Even if your file is big, MMF will only load the parts that are needed.

2. Every process runs in a separate address space, and every process has its own address 00400000h.

3. FileAlignment and SectionAlignment are the alignment in the file on disk and in memory respectively :rolleyes: I think it is needed for portability reasons.

4. If you use x86 instructions, sure, you can't run them on Alpha and vice versa. :) PE just provides a unified container for code and data, whose contents can vary on each platform.
Posted on 2002-07-12 16:32:55 by masquer
Well, there are many things in there besides your program. However, in Windows 95 and higher any address between 0x400000 and 0x80000000 is free. But for DLLs, there may be another module at the preferred address. Thus, a DLL could end up at a different address in another address space (but the system will make another copy of the data if a page in a non-shared section is changed).
Posted on 2002-07-12 20:19:12 by Sephiroth3
In NT and its derivatives you can find DLLs even at 0x60000000 and such. See this post for a routine to reserve the largest block possible of your process' address space.
Posted on 2002-07-13 03:10:04 by Maverick
2. ImageBase is a member of the optional header structure; this is from Iczelion's PE tut

It's from the COFF (PE) format.
how can "some other module already occupied that address range."

DLLSkeleton.dll in Tut #17 has ImageBase=10000000.
That's the linker default. If your proggy uses two or more DLLs compiled with the default image base, the loader can't map them all at the same address.
It has to relocate all but one of them to some other free memory.
And the .reloc section in every DLL is there for this job.
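Roughly, what a relocation does is add the difference between the actual and the preferred base to every absolute address stored in the image. A minimal Python sketch of one such fixup for a 32-bit absolute address (the kind the PE spec calls IMAGE_REL_BASED_HIGHLOW) follows; the function name and the addresses are assumptions chosen for illustration, and parsing of the .reloc section itself is left out.

```python
import struct

def apply_relocation(image, rva, preferred_base, actual_base):
    """Patch the 32-bit absolute address stored at offset `rva` in `image`
    (a bytearray laid out as in memory) for the new load address."""
    delta = actual_base - preferred_base
    (old,) = struct.unpack_from("<I", image, rva)
    struct.pack_into("<I", image, rva, (old + delta) & 0xFFFFFFFF)

# A DLL built for 10000000h contains a pointer to its own address 10001234h...
image = bytearray(16)
struct.pack_into("<I", image, 4, 0x10001234)
# ...but 10000000h is taken, so it gets loaded at 00A50000h instead.
apply_relocation(image, 4, 0x10000000, 0x00A50000)
print(hex(struct.unpack_from("<I", image, 4)[0]))
```

The .reloc section is essentially the list of places (the rva values here) where such a fixup has to be applied.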
i dont understand all those alignments stuff...

Icz's tuts are good, but they only cover very basic knowledge.
If you really want to understand how all this stuff works, don't ask here.
It takes too much time to explain. Look for a description of the PE format and read it.
One link is above, another is at the bottom of Iczelion's PE tut #1 (Luevelsmeyer).
You can also find something interesting with the help of google.com.
IMO it's the best way to learn PE.
Posted on 2002-07-13 04:05:16 by Four-F
Hutch, hrm, simplified method? Might be a little less code
to read in "whatever amount" of bytes and do PE,0,0 scanning...
but it's not "simpler". Furthermore, as already mentioned,
it can fail to identify PEs because of large DOS stubs, and
it can even give false positives. And there's no reason
whatsoever not to do PE identification the "right" way...
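A false positive is easy to provoke. In this small Python sketch (language and names are assumptions for the example) a plain data file that merely contains the bytes P,E,0,0 passes the scan but not the header check.

```python
import struct

def scan_whole_file(data):
    """The scanning method: report a PE if PE,0,0 appears anywhere."""
    return b"PE\0\0" in data

def check_headers(data):
    """The 'right' way: MZ signature, then follow e_lfanew at 3Ch."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[offset:offset + 4] == b"PE\0\0"

# Not an executable at all, just data that happens to contain the bytes.
not_a_pe = b"some record: TYPE\0\0 and more padding" + bytes(64)
print(scan_whole_file(not_a_pe), check_headers(not_a_pe))
```

The scan reports a PE here while the header check correctly rejects the file.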

I've attached a simple example with a simplistic (read: not
enough error handling) way of detecting PE files. If you're
worried about using too much stack for the MZ header (heh),
you could make do with a DWORD instead of the MZ struct, and
do a few more SetFilePointer calls.

There's a bunch of different ways to handle PE manipulation.
The approach typically used (and shown by iczelion in his
PE tutorials) involves file mapping. If you're going to use
that approach, you might as well do the PE checking as part
of setting up the filemapping (instead of calling a "isPEFile"
and only doing work if it returns true, call openPEFile and
bail out if it's not a PE). I usually map in the PE files as
the windows loader does, so I can use RVAs directly as pointers;
I'm not coding malware, so using "an amount" of memory is not a
problem for me.
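To show what "mapping as the loader does" amounts to, here is a rough Python sketch. It is not the attached example, just an illustration under assumptions: the function name is made up, and the header size, image size and section table are passed in by hand instead of being parsed from the headers.

```python
def map_like_loader(file_bytes, size_of_headers, size_of_image, sections):
    """Lay the file out as it would sit in memory.
    `sections` is a list of (virtual_offset, raw_offset, raw_size) tuples."""
    image = bytearray(size_of_image)              # zero-filled, like fresh pages
    headers = file_bytes[:size_of_headers]
    image[:len(headers)] = headers                # headers go to offset 0
    for virtual_offset, raw_offset, raw_size in sections:
        chunk = file_bytes[raw_offset:raw_offset + raw_size]
        image[virtual_offset:virtual_offset + len(chunk)] = chunk
    return image

# Fake file: 200h bytes of "headers" followed by one 200h-byte section.
file_bytes = b"H" * 0x200 + b"T" * 0x200
image = map_like_loader(file_bytes, 0x200, 0x2000, [(0x1000, 0x200, 0x200)])
rva = 0x1000
print(image[rva:rva + 4])                         # an RVA is now a direct index
```

The price is memory: the buffer is SizeOfImage bytes even though the file is smaller, which is the "amount" of memory mentioned above.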

As for which approach is faster? Dunno if you can feel any
difference on such a small operation... it's not going to
matter much. MMF *is* slower than normal ReadFile I/O, both
in setup time and access speed, but it's not going to be
noticeable if you are *only* going to check if a file is a
valid PE file.

Mikky, about ImageBase... 0x400000 is usually :) free in a
process address space, so your app can be loaded there. However,
other Imagebases might not be free, especially on 9x that is
split into user/shared/kernel. There's also issues when loading
DLLs, but fortunately those usually have relocations (even if
most developers are too ignorant to set a different imagebase
than linker default).

Alignment... section alignment has to do with x86 page protection.
file alignment is as far as I can tell to make paging operations
faster, and the minimum file alignment of 0x200 corresponds nicely
with the IDE sector size (afair, if you open a file in uncached
mode, you must read sector-aligned and sector-multiple sizes).

The PE *format* is portable across processors, but that doesn't
mean you can execute x86 on an alpha. Just that you can use the
same file format, and thus that a fair amount of the loader code
can be kept from machine to machine.
Posted on 2002-07-13 07:08:30 by f0dder