Does anyone know a good way to disassemble in reverse, i.e. backwards from (say) 402000, while staying as close as possible to the intended disassembly?

I've tried a few methods, but none of them is particularly brilliant, and they almost always get out of sync very easily.

The first approach I tried is to go back from the current address by 16 bytes (a safe upper bound, since an x86 instruction is at most 15 bytes long) and then come forwards, attempting to disassemble at each offset, until I get a valid instruction whose size equals the number of bytes I went back. E.g. I go back 16 bytes and don't find a 16-byte instruction, so I go back 15 bytes instead, and keep going until I eventually go 2 bytes back and find a two-byte instruction. However, this gets out of sync easily: for example, a trailing 00 byte of one instruction can combine with a push that follows it and decode as a different instruction.
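A minimal Python sketch of this first approach, with offsets into a byte buffer standing in for addresses like 402000. It assumes a hypothetical toy length decoder `insn_len` that only knows a handful of fixed-length 32-bit x86 opcodes; a real tool would need a full decoder that handles prefixes, ModRM and SIB bytes. The names `LENGTHS`, `insn_len` and `prev_insn_scan` are illustrative only.

```python
# Hypothetical toy length decoder: only a few fixed-length 32-bit x86 opcodes
# (no prefixes, no ModRM/SIB). A real tool needs a full instruction decoder.
LENGTHS = {0x90: 1, 0xC3: 1, 0xCC: 1, 0x6A: 2, 0xEB: 2, 0x68: 5, 0xE8: 5, 0xE9: 5}
LENGTHS.update({op: 1 for op in range(0x50, 0x60)})  # push/pop r32
LENGTHS.update({op: 2 for op in range(0x70, 0x80)})  # Jcc rel8
LENGTHS.update({op: 5 for op in range(0xB8, 0xC0)})  # mov r32, imm32


def insn_len(code, off):
    """Length of the instruction at code[off], or None if invalid or truncated."""
    if off < 0 or off >= len(code):
        return None
    n = LENGTHS.get(code[off])
    if n is None or off + n > len(code):
        return None
    return n


def prev_insn_scan(code, cur, max_back=16):
    """Try start offsets from cur - max_back towards cur - 1 and return the
    first one whose instruction ends exactly at cur (None if none does)."""
    for back in range(max_back, 0, -1):
        start = cur - back
        if insn_len(code, start) == back:
            return start
    return None


# push 0x332211B8; push eax. The true previous instruction for offset 6 is at 5,
# but the push's immediate also decodes as "mov eax, 0x50332211" starting at 1.
code = bytes([0x68, 0xB8, 0x11, 0x22, 0x33, 0x50])
print(prev_insn_scan(code, 6))  # prints 1, not 5
```

Because the longest match wins, an operand byte that happens to look like an opcode can hide the real previous instruction, which is the kind of desync described above.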

The second method was to go 16 bytes back and disassemble forwards from there, keeping a list of all the addresses seen up to the current address, and then simply take the last address seen as the previous instruction. This doesn't work too well: if the byte 16 bytes back is operand data belonging to another instruction rather than an opcode, the last instruction decoded is likely to be wrong too. The listing gradually gets back in sync as you scroll further up, but it looks ugly while it does so.
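A minimal Python sketch of this second method, again using a hypothetical toy decoder `insn_len` (a few fixed-length 32-bit x86 opcodes only). How undecodable bytes are listed is not stated above; the sketch assumes they are shown as a one-byte `db` entry and recorded like any other address. `prev_insn_walk` is an illustrative name.

```python
# Hypothetical toy length decoder: only a few fixed-length 32-bit x86 opcodes
# (no prefixes, no ModRM/SIB). A real tool needs a full instruction decoder.
LENGTHS = {0x90: 1, 0xC3: 1, 0xCC: 1, 0x6A: 2, 0xEB: 2, 0x68: 5, 0xE8: 5, 0xE9: 5}
LENGTHS.update({op: 1 for op in range(0x50, 0x60)})  # push/pop r32
LENGTHS.update({op: 2 for op in range(0x70, 0x80)})  # Jcc rel8
LENGTHS.update({op: 5 for op in range(0xB8, 0xC0)})  # mov r32, imm32


def insn_len(code, off):
    """Length of the instruction at code[off], or None if invalid or truncated."""
    if off < 0 or off >= len(code):
        return None
    n = LENGTHS.get(code[off])
    if n is None or off + n > len(code):
        return None
    return n


def prev_insn_walk(code, cur, window=16):
    """Disassemble forwards from cur - window and return the last address
    seen before cur. Undecodable bytes count as one-byte 'db' entries
    (an assumption)."""
    off = max(0, cur - window)
    last = None
    while off < cur:
        last = off                      # every address the listing would show
        n = insn_len(code, off)
        off += n if n is not None else 1
    return last


code = bytes([0x68, 0xB8, 0x11, 0x22, 0x33, 0x50, 0x90])  # push imm32; push eax; nop
print(prev_insn_walk(code, 6))            # 5: window starts on a real opcode
print(prev_insn_walk(code, 6, window=5))  # 1: window starts inside the push's operand
```

With `window=5` the walk starts on an operand byte and the misalignment carries all the way to offset 6, matching the "data from a command" problem described above.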

Does anyone have any alternatives to these?
Posted on 2003-06-15 17:10:05 by squidge
Given a long enough string of bytes to disassemble, the algo will stabilize to the actual instructions, but this assumes all the bytes belong to instructions. A simple way is what you are already doing, but I'd back up further. Assuming the average instruction length is 8 bytes (pessimistic on purpose), you're only backing up 2-3 instructions, which is not enough room for a misalignment to synchronize. Go back 64 bytes. Begin disassembling, and if the bytes at the present position are not an instruction, increment the pointer and try again. This will synchronize to the instruction boundaries.
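A minimal Python sketch of backing up 64 bytes and skipping bytes that don't decode, built on a hypothetical toy decoder `insn_len` (a few fixed-length 32-bit x86 opcodes only). `boundaries_from` and `prev_insn_resync` are illustrative names; the demo starts the walk at several offsets to show the boundaries converging.

```python
# Hypothetical toy length decoder: only a few fixed-length 32-bit x86 opcodes
# (no prefixes, no ModRM/SIB). A real tool needs a full instruction decoder.
LENGTHS = {0x90: 1, 0xC3: 1, 0xCC: 1, 0x6A: 2, 0xEB: 2, 0x68: 5, 0xE8: 5, 0xE9: 5}
LENGTHS.update({op: 1 for op in range(0x50, 0x60)})  # push/pop r32
LENGTHS.update({op: 2 for op in range(0x70, 0x80)})  # Jcc rel8
LENGTHS.update({op: 5 for op in range(0xB8, 0xC0)})  # mov r32, imm32


def insn_len(code, off):
    """Length of the instruction at code[off], or None if invalid or truncated."""
    if off < 0 or off >= len(code):
        return None
    n = LENGTHS.get(code[off])
    if n is None or off + n > len(code):
        return None
    return n


def boundaries_from(code, start, stop):
    """Forward-disassemble code[start:stop]; when nothing decodes at the
    current position, step one byte forward and try again."""
    seen = []
    off = start
    while off < stop:
        n = insn_len(code, off)
        if n is None:
            off += 1          # not an instruction here: increment and retry
            continue
        seen.append(off)
        off += n
    return seen


def prev_insn_resync(code, cur, window=64):
    """Back up `window` bytes, walk forwards, return the last boundary before cur."""
    seen = boundaries_from(code, max(0, cur - window), cur)
    return seen[-1] if seen else None


code = bytes([0x68, 0xB8, 0x11, 0x22, 0x33,   # push 0x332211B8
              0x50, 0x90, 0x6A, 0x01,          # push eax; nop; push 1
              0xE8, 0, 0, 0, 0, 0xC3])         # call $+5; ret
for start in range(3):
    print(start, boundaries_from(code, start, 14))
print(prev_insn_resync(code, 14))
```

Starting at offset 0, 1 or 2 gives different early boundaries (offset 1 shows a bogus `mov`), but all three agree from offset 6 onwards, so the previous instruction for offset 14 comes out as 9 in every case.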

With entry points padded for alignment and data stored alongside code, the above method produces garbage on many occasions. A better approach is to do some analysis of the code: look for entry points (CALL/JMP/Jcc targets) and cache them.
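A minimal Python sketch of caching branch targets and then starting the backwards search from the nearest known entry point. It assumes a hypothetical toy decoder `insn_len` (a few fixed-length 32-bit x86 opcodes only) and collects targets with a simple linear sweep; `collect_targets` and `prev_insn_anchored` are illustrative names, not an established API.

```python
# Hypothetical toy length decoder: only a few fixed-length 32-bit x86 opcodes
# (no prefixes, no ModRM/SIB). A real tool needs a full instruction decoder.
LENGTHS = {0x90: 1, 0xC3: 1, 0xCC: 1, 0x6A: 2, 0xEB: 2, 0x68: 5, 0xE8: 5, 0xE9: 5}
LENGTHS.update({op: 1 for op in range(0x50, 0x60)})  # push/pop r32
LENGTHS.update({op: 2 for op in range(0x70, 0x80)})  # Jcc rel8
LENGTHS.update({op: 5 for op in range(0xB8, 0xC0)})  # mov r32, imm32
REL8 = {0xEB} | set(range(0x70, 0x80))  # jmp short, Jcc short
REL32 = {0xE8, 0xE9}                    # call rel32, jmp rel32


def insn_len(code, off):
    """Length of the instruction at code[off], or None if invalid or truncated."""
    if off < 0 or off >= len(code):
        return None
    n = LENGTHS.get(code[off])
    if n is None or off + n > len(code):
        return None
    return n


def collect_targets(code, start=0):
    """Linear sweep that records in-buffer CALL/JMP/Jcc targets as entry points."""
    targets = set()
    off = start
    while off < len(code):
        n = insn_len(code, off)
        if n is None:
            off += 1
            continue
        if code[off] in REL8 or code[off] in REL32:
            # Signed little-endian displacement, relative to the next instruction.
            rel = int.from_bytes(code[off + 1:off + n], "little", signed=True)
            tgt = off + n + rel
            if 0 <= tgt < len(code):
                targets.add(tgt)
        off += n
    return targets


def prev_insn_anchored(code, cur, entries, window=64):
    """Walk forwards from the nearest cached entry point below cur (falling
    back to cur - window) and return the last instruction start before cur."""
    lo = max(0, cur - window)
    anchors = [e for e in entries if lo <= e < cur]
    off = max(anchors) if anchors else lo
    last = None
    while off < cur:
        n = insn_len(code, off)
        if n is None:
            off += 1
            continue
        last = off
        off += n
    return last


# jmp short +3 over three data bytes, then push eax; nop; ret
code = bytes([0xEB, 0x03, 0x68, 0x00, 0x00, 0x50, 0x90, 0xC3])
entries = collect_targets(code)
print(entries)                                # {5}
print(prev_insn_anchored(code, 7, entries))   # 6
print(prev_insn_anchored(code, 7, set()))     # 2: data decoded as push imm32
```

The cache is derived from the bytes, so in a program whose code can be modified at run time it has to be rebuilt (or at least invalidated) after every change.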

No algorithmic method is without errors.
Posted on 2003-06-15 18:15:38 by bitRAKE
Thanks bitRAKE, going back further seems to solve the problem in most cases, and I can add a manual adjustment for the cases where it doesn't work. Since I know the address of the current instruction, I know that the previous instruction must end 1 byte before it; if it doesn't, the disassembly hasn't synchronised properly and needs to restart. Scrolling back through data will still be a nightmare, but I don't see a way around that without analysing the code, and due to the nature of the program, the analysed data could become invalid at any time because of modifications.
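A minimal Python sketch of that end-of-instruction check, using a hypothetical toy decoder `insn_len` (a few fixed-length 32-bit x86 opcodes only). The restart policy is an assumption: on a mismatch the sketch simply moves the starting byte forward by one and walks again. `prev_insn_checked` is an illustrative name.

```python
# Hypothetical toy length decoder: only a few fixed-length 32-bit x86 opcodes
# (no prefixes, no ModRM/SIB). A real tool needs a full instruction decoder.
LENGTHS = {0x90: 1, 0xC3: 1, 0xCC: 1, 0x6A: 2, 0xEB: 2, 0x68: 5, 0xE8: 5, 0xE9: 5}
LENGTHS.update({op: 1 for op in range(0x50, 0x60)})  # push/pop r32
LENGTHS.update({op: 2 for op in range(0x70, 0x80)})  # Jcc rel8
LENGTHS.update({op: 5 for op in range(0xB8, 0xC0)})  # mov r32, imm32


def insn_len(code, off):
    """Length of the instruction at code[off], or None if invalid or truncated."""
    if off < 0 or off >= len(code):
        return None
    n = LENGTHS.get(code[off])
    if n is None or off + n > len(code):
        return None
    return n


def prev_insn_checked(code, cur, window=64):
    """Walk forwards from successive start offsets; accept a result only if the
    last decoded instruction ends exactly at cur, else restart one byte later."""
    for start in range(max(0, cur - window), cur):
        off, last, last_end = start, None, None
        while off < cur:
            n = insn_len(code, off)
            if n is None:
                off += 1          # skip a byte that doesn't decode
                continue
            last, last_end = off, off + n
            off += n
        if last_end == cur:       # in sync: previous instruction ends at cur
            return last
    return None


# Two data bytes (0x68, 0x90), then push eax; nop; push 5; ret
code = bytes([0x68, 0x90, 0x50, 0x90, 0x6A, 0x05, 0xC3])
print(prev_insn_checked(code, 4))  # 3 (the walk from 0 straddles offset 4 and is rejected)
print(prev_insn_checked(code, 6))  # 4
```

The check rejects walks that straddle the current address, but a wrong decode that happens to end exactly at it still passes, so it narrows the errors rather than removing them.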
Posted on 2003-06-16 14:03:04 by squidge