I've some problems understanding the way API calls are made in Windows.

1. Each process runs in its own 4 GB address space. It would be a waste of resources to put a copy of all API functions into the address space of each process. Or does Windows just map the physical memory where the API implementation resides into each process's virtual address space, so that many processes can access the same physical memory (only the code)?

2. How does the program know where to find the entry point of an API function? Is there a fixed location in (virtual, of course) memory? When I debug my programs, I see that the function call jumps to a jump instruction at the end of my program, which then leads to the function's entry point.

I hope my questions have been understandable. I'm not quite sure how to express them.
Thanks
Posted on 2003-05-06 05:58:55 by Compuholic
2. How does the program know where to find the entry point of an API function? Is there a fixed location in (virtual, of course) memory? When I debug my programs, I see that the function call jumps to a jump instruction at the end of my program, which then leads to the function's entry point.
As you probably know, DLLs are fully relocatable, and each function in a DLL starts at a fixed offset from the start of that DLL. Each DLL also has what is called a "base address", which is where the DLL prefers to be loaded (this can be configured by the developer). If the loader cannot place the DLL at that base address, it relocates it to another position. The loader then goes through the import tables of the modules that reference functions in that DLL and "fixes" them to point to the correct locations (the address of a function is the DLL base address + the function offset).

The reason your exe calls a spot near its own end before jumping to the referenced function is the same mechanism that makes DLLs and relocation work in the first place. The area that jmp instruction resides in can be called the "vector table" or "jump table", and there is one vector per referenced function. Each jmp goes indirectly through an address slot in the exe's import table, so when the DLL is loaded, only those slots need to be fixed and the whole exe just works. Without that table, the loader would have to scan the whole exe looking for references to functions in the newly loaded DLL, and would have to fix up hundreds or thousands of call instructions to point to the right address.
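
A rough sketch of the idea in Python, as a toy model only: the names `Dll`, `Exe`, `load_dll`, `bind_imports` and `call_via_stub`, as well as "demo.dll" and all addresses, are made up for illustration and are not how the real loader is written. It only shows that the loader fills one table slot per imported function with base address + offset, and that calls go through that slot.

```python
from dataclasses import dataclass, field


@dataclass
class Dll:
    name: str
    preferred_base: int
    exports: dict   # function name -> offset from the DLL base
    base: int = 0   # actual load address, set by load_dll


@dataclass
class Exe:
    imports: list                            # (dll name, function name) pairs
    iat: dict = field(default_factory=dict)  # one slot per imported function


def load_dll(dll, occupied_bases, fallback_base):
    """Use the preferred base, or relocate if that address is already taken."""
    if dll.preferred_base in occupied_bases:
        dll.base = fallback_base
    else:
        dll.base = dll.preferred_base
    occupied_bases.add(dll.base)
    return dll.base


def bind_imports(exe, loaded):
    """Fill each table slot with DLL base address + function offset."""
    for dll_name, func in exe.imports:
        dll = loaded[dll_name]
        exe.iat[(dll_name, func)] = dll.base + dll.exports[func]


def call_via_stub(exe, dll_name, func):
    """Stand-in for the jmp stub: fetch the target address from the table."""
    return exe.iat[(dll_name, func)]


# Hypothetical example: the preferred base is taken, so the DLL is relocated.
demo = Dll("demo.dll", 0x10000000, {"DemoFunc": 0x1234})
exe = Exe([("demo.dll", "DemoFunc")])
load_dll(demo, {0x10000000}, 0x20000000)
bind_imports(exe, {"demo.dll": demo})
print(hex(call_via_stub(exe, "demo.dll", "DemoFunc")))  # 0x20001234
```

Note that the code of the exe itself never changes here; only the table entries do, which is the point of the jump table.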
Posted on 2003-05-06 06:25:45 by sluggy
For your question #1...

(note I'm talking about NT here - 9x is somewhat different because it sucks ;)).

DLLs are shared across processes. A DLL is only mapped into your process if you use it ("use" meaning that you use it directly or use a DLL that uses it, recursively). As you guessed, memory is shared via the x86 paging mechanism - however, both data and code are shared. Modification is handled via copy-on-write: a process that writes to a shared page gets its own private copy of that page.
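
Copy-on-write can be tried out from user code with a file mapping. The following is only a small sketch using Python's standard `mmap` module, not what the DLL loader itself does; the function name `cow_demo` and the file contents are made up. One view is mapped with `ACCESS_COPY` (private, copy-on-write), the other read-only.

```python
import mmap
import os
import tempfile


def cow_demo():
    """Write through a copy-on-write view and report who sees the change."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"shared code and data")
        os.close(fd)
        with open(path, "rb") as f:
            private_view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
            other_view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            # The write only touches this view's private copy of the page.
            private_view[0:6] = b"SHARED"
            seen = (private_view[:6], other_view[:6])
            private_view.close()
            other_view.close()
        with open(path, "rb") as f:
            on_disk = f.read(6)
        return seen + (on_disk,)
    finally:
        os.remove(path)


print(cow_demo())  # (b'SHARED', b'shared', b'shared')
```

Only the writing view sees the modification; the second view and the file itself keep the original bytes.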

You should find yourself a copy of "Inside Windows 2000"; it's a very interesting book - but of course feel free to ask further questions :)
Posted on 2003-05-06 07:08:02 by f0dder
Thank you very much...
Posted on 2003-05-06 08:30:56 by Compuholic
no problem - and be sure to ask if there's anything that remains unclear or if you have further questions.
Posted on 2003-05-06 08:35:25 by f0dder
I just ran into another question. In order to understand the layout of an executable in Memory I looked a little bit closer into a PE-Header.

Address of Entry Point 00016514

This seems to be the offset into the file where the CPU should begin execution. Although I'm wondering why no virtual address is given, I can imagine that the position of the executable in memory is relocatable, so it wouldn't make much sense to use a VA here. I am comparing it to the ELF header used in *NIX operating systems, where it is possible to describe the whole layout of the program in memory.



Section  VirtSize  RVA       PhysSize  PhysOfs   Flags     Info  Percent of file
-------  --------  --------  --------  --------  --------  ----  ---------------
CODE     00015588  00001000  00015600  00000400  60000020  CER   39,6%


But this piece of the section header is very confusing.
1. Why is VirtSize not aligned but PhysSize is? IMHO the opposite would be right.
2. What does PhysOffset describe?
Posted on 2003-06-15 06:20:41 by Compuholic


Section  VirtSize  RVA       PhysSize  PhysOfs   Flags     Info  Percent of file
-------  --------  --------  --------  --------  --------  ----  ---------------
CODE     00015588  00001000  00015600  00000400  60000020  CER   39,6%

The naming is extremely confusing. I prefer to call it
Section  VirtSize  RVA       RawSize   FileOffset  Flags     Info  Percent of file
-------  --------  --------  --------  ----------  --------  ----  ---------------
CODE     00015588  00001000  00015600  00000400    60000020  CER   39,6%


1. Why is VirtSize not aligned but PhysSize is? IMHO the opposite would be right.

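No. VirtualSize is the actual size of the code. RawSize is the size of the section's data in the file, rounded up to the file alignment (200h). When the section is mapped into memory, the loader rounds its size up to the section alignment instead.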
2. What does PhysOffset describe?

Just call it the file offset and you will understand what it does: it is the position in the file where the section's raw data starts.
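
To make the numbers concrete, here is a small Python sketch using the values from the CODE row. The helper names `align_up` and `rva_to_file_offset` are made up, and the section alignment of 1000h is an assumption, since it does not appear in the dump. The entry point 00016514 from the question is treated as an RVA here, which is how the PE format defines the entry point field (relative to the image base).

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


# Values copied from the CODE section row above.
CODE = {
    "virt_size": 0x15588,
    "rva": 0x1000,
    "raw_size": 0x15600,
    "file_offset": 0x400,
}

FILE_ALIGNMENT = 0x200      # the 200h mentioned above
SECTION_ALIGNMENT = 0x1000  # assumption: not shown in the dump


def rva_to_file_offset(rva, section):
    """Translate an RVA inside a section to its position in the file."""
    start = section["rva"]
    if not start <= rva < start + section["virt_size"]:
        raise ValueError("RVA is outside this section")
    return rva - start + section["file_offset"]


# VirtSize rounded up to the file alignment gives the RawSize from the table.
print(hex(align_up(CODE["virt_size"], FILE_ALIGNMENT)))     # 0x15600
# Space the section would occupy in memory under the assumed alignment.
print(hex(align_up(CODE["virt_size"], SECTION_ALIGNMENT)))  # 0x16000
# Where the entry point's bytes sit in the file.
print(hex(rva_to_file_offset(0x16514, CODE)))               # 0x15914
```

So the unaligned VirtSize is the real size, and the aligned sizes are derived from it, once for the file and once for memory.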
Posted on 2003-06-15 06:57:49 by roticv