I'm creating a debugger, or at least I'm trying. I got the memory window, the register window is halfway there, so I'm up to the disassembly window. I've given it soe thought over the past few days, and I'm coming up with more and more problems. The only thing that will be straight-forward is the actual diassembling of the instructions, cause the Intel manual has it all. I had in mind, that I should disassemble a portion of it and display it, and as the user scrolls, I should disassemble some more and display it when needed. This poses the problem of how I would know when the edit control got to the top/bottom. Also going down is no big deal cause Intel tells you if you have a certain first byte, this or that will follow, but if i'm going up, how do I know where the instruction starts? Then, there's the problem of data in the code segment. The only option I see for that is a recursive algorithm to process all jumps, and I don't really see what I would look for and how to store info on the jumps. Also, that means I have to disassemble the whole thing. What would I do with a 10 MB program? Those are the only problems I can see so far, but I'm sure there are more. Does anyone know how I should approach such a task? I don't suppose any of you have a DLL that can diassemble, or something. :)
I don't think that disassembling on-the-fly in-the-small is feasible -- basically, disassembly is a global problem because of jumps. And as you point out, there's the problem of parsing the instruction stream; you'd need a way to save the parsing information, perhaps as an array of offsets giving the start of each byte beginning a valid instruction. What I would try to do is to disassemble starting at the entry point of the program, following the calls and jumps and see how far I get. This should minimize the problem of data in the code section, since data is never (well, almost never) executed. Also, the start-up routine for every program has boilerplate code -- you know where the calls go and what to expect -- and can therefore serve as a good test of the correctness of your method. A lot of disassemblers are totally befuddled by data and try to disassemble right through it. The best disassembler so far, IDA Pro by Ilfak Guilfanov, traces the actual execution flow. It seems to work pretty much as a human would (that's why it's "interactive"), but much faster. I suppose any top-notch disassembler must be of this type. Are you trying to compete with Guilfanov? That would be a tall order! Good luck!:)
For a debugger, a global analysis is not feasible. It's too slow. What would I do? I have to find out, since I'm about to write a kernel-level debugger for my homebrewn OS-kernel. The easy way would be to always have the line of code @ EIP on the top line of your window, disallowing scrolling. This would not be so very useful. Another method would be increasing (or decreasing, depending whether you scroll up or down) the disasm offset as long as the byte at EIP is "in the middle of an instruction". You might want to have a look at mklasson.cjb.net and download DazmIt. It's for tasm as far as I remember, but a clever pro- grammer should be able to see what parts he needs to learn from, rather than just ripping code =)
Why not try this:- Have a buffer, say 4K of memory, when the edit control hits the top, run up your pointer back by 4K then attempt to translate the instructions forwards, you'll hit a known instruction at some point, then you can translate down the the end. Insert these instructions onto the top of the buffer, and allow the user to scroll from there. Actually I have Visual Studio here, which has debugger. It creates a custom window, there is a scroll bar running from address 0 to 0XFFFFFFFF when the program breaks it positions the cursor into the buffer at the memory location of the program break, scrolling to the top/bottom takes you all the way through memory, so it must be doing it dynamically, cos you can't have a 4Gb window! umbongo
Actually, I remember yesterday I was lurking around the MSJ site ( www.msj.com) and they have an number of articles on Writing a debugger (windbg it called) Take a look there for some ideas. umbongo :)
Argh. The microsoft drives me so d*mn crazy! :). I can never seem to find the information I need. If you can remember the exact URL of those articles, it would be a great help. Lazy me, but who likes searching for needles in haystacks? =)
F0dder-- try http://www.microsoft.com/msj/code.asp Type debugger in the Search box, and in the resulting list of files, choose the first one (by John Robbins).
umbongo is right about the dynamic method of backing up some bytes, but you only have to back-up (the longest instruction length) X (number of lines in the window). The longest instruction allowed I believe is 15 bytes. You have to make assumptions - IDA does even. BTW, IDA allows you to browse while it's tracing the code to clean things up. bitRAKE
Yeah, i think the only real way to it is to dissasemble the entire file, and keep it in memory, while your are scrolling through it. This is what w32dasm does, quite a good dissasembler. It even offer you the ability to save the file after you load it in and it dissasembles it, so that it wont have to dissassemble next time. I just wish w32dasm had a close option, so that you dont have to quit to close the file.
I've given your ideas some thought. If I back up by a certain number of bytes, and disassemble, there still is a possible problem. If I have 0E, it means push es, while if I have 8F0E, it means pop dword ptr , so I still wouldn't know I started at the right place. As for disassembling the whole file, which would be the best way, I'm doing a debugger not a disassembler, so I can't wait an hour like with IDA and Wdasm to disasm the whole thing. I wonder how WinDbg does it. Then again, it's made by the all-knowing Micro$oft.
As I said earlier... In a debugger you know the current EIP. So, when disassembling, you should where the instruction boundaries are. If you get past EIP without having an instruction starting *exactly* at EIP, you will have to increase/decrease the start- offset of the disassembly. This will be pretty cumbersome if you disasm far away from the EIP, but at least it should work, and can be extended to be more useful (last-known good opcode boundary or something like that).
Hmmm.. I will have a think about this Hel, and get back to you :-) umbongo
I looked at Windbg, and it's horibble. After a few hundred bytes, up or down, it produces garbage. I didn't look at the Visual Studio debugger though. Therefore, I'm feeling tempted to cut corners.
The fact of the matter is that in real code the boundaries between instructions will syncronize after a few bad disassembled instructions. bitRAKE p.s. That is unless they are trying to make it hard to disassemble :P
Hel, I've given this some thought, and now I think I know the answer..... Visual Studio always compilies with certain alignments, 4,8,16 etc bytes, so it will know where the instructions start - thats how it can disassemble the code, you get a 1 byte instruction, but it inserts 3 NOP's to fill it out, by using a 4 bytes alignment you can always firgure out where the next instruction is. So if you start at address 0 and find a 1 byte instruction then the next one is at address 4 etc etc. You'll need to know the alignment of the instructions, but I think it tells you in the exe header. umbongo.
No, that alignment is for data only. Nop-aligning opcodes would be stupid and slow. Besides, 4-byte alignment wouldn't work, as many opcodes are pretty much longer.
>> No, that alignment is for data only. Nop-aligning opcodes >> would be stupid and slow Well we are talking about MS here..... I was thinking more of edit and continue which loads the application with int3's so it can insert extra instructions in if you chnage the code. Maybe the only was is to disassemble the whole program, if you're looking through memory you'll just have to guess. umbongo
As far as I know, edit and continue only does padding on a function boundary - it *does* add a lot, though. As for disassembling (in a debugger, not a hardcore project as IDA), keep a sync buffer, and do as I've described above.
Concerning the debugger that comes with Visual C -- it has access to your source code and debug information in the exe file which makes its task MUCH easier. It's relatively easy given this information to disassemble on the fly and display the results in a window. This explains why Microsoft is "all-knowing", at least in its own environment. If you don't have the source code, the task is much harder. Thus IDA (and SoftIce) have a tougher job. If you're writing a debugger and expect to have access to source code, you'll need to understand the DEBUG part of the PE file as generated by the Microsoft compiler and linker. Here's a funny thing that happened to me when using the VC debugger. I was building a project that uses WM_COPYDATA for simple IPC, following the code in A. Williams' book "Windows 2000 System Programming". There's a bug in Williams' code that caused a Privileged Instruction Exception. This was truly weird, so I checked the Code pane in the VC debugger and sure enough, it pointed to an in (read port) opcode instruction. This must have been totally bogus -- and it was! All I had to do was scroll above the in opcode and the debugger reconsidered its parsing and displayed the correct codes. This shows two things: 1) certain bugs can cause code to be executed with an incorrect reading frame; 2) code parsing by the debugger is nontrivial, even with access to the source code.
you will just have to store each address that you find each instruction at. This means you will have a DWORD address for each line of code you disasimble. using this info you can always go backwards by just getting your next address DWORD. if you did not understand this I will try it again if you still need help on this issue..