Hi,
I would like to discuss here an optimal heuristic algo that enables finding of all instructions, which directly fill a specific item in the stack, using reverse stack scanning. I would like to speak about general issues, not about concrete assembly implementation.
I started writing this algo as I needed to find all labels of RETN instruction initially - I believe this is the best approach instead of watching the stack after each CALL or PUSH OFFSET or similar instructions. I generalized the algo later on.
I wrote the algo with general code in mind and that's why it tests, for instance, far jumps too.

Note: This topic is quite sensible here, but is authorised by the moderators. It may be deleted when it turns in the wrong direction, so please sticky to the topic.

Well... I hope the algo works, even though I didn't code it and test it.
I think the algo has only one drawback - it probably will be very slow, because it tests if is stack-handling instruction inside a loop, because the loop multiplies the effect on the stack. The algo tests for the loop until it gets last instruction of the code or some important instruction (see GetStackLoop proc). I think of limitation of maximal count of scanned instructins or so...

Notes:
Do not depend on given examples. These are just for illustration.
The algo is intended for scanning protected-mode code with 32bit stack segment.

1st example: get the label(s) of the RET instruction:


push eax stdcall argument
call proc ESP=ESP-4
...
proc: ... ESP without changes
[B]retn 4[/B] find_nest: +4 (== 0 + 4)

2nd example: get the instruction, which fills the item at ESP+8


push eax ESP=ESP-4
push ebx ESP=ESP-4
push ecx ESP=ESP-4
[B]mov edx,[esp+8][/B] find_nest: +12 (== 8 + 4)


Base theses:

    [*]In general, the scanner watches the changes of ESP (e.g. PUSH means ESP=ESP-4). This change is added to the find_nest number until the result is zero - then it gets the filling instruction. Computing the find_nest: at the start of scanning, we know how nested is seeking item (e.g. the nest is zero in case of 1st example, or +8 in case of 2nd example) relative to starting ESP. To this nest has to be added +4 and we get find_nest, because we add each relative change of ESP to this find_nest (see examples above).
    [*]The main difficulty lies in correct recognition of stack-handling in loop, e.g.:
    
    
    [B]@@: push eax[/B]
    inc eax
    cmp eax,ebx
    [B]jne @B[/B] ; conditional loop
    retn ; start scanning - origin address [I]OA[/I]

    The GetStackLoop function tests it.


    Assumptions and simplifications:

      [*]The code is fully disassembled into list of entries (i.e. we know the lenght of any instruction, so reverse scanning is possible)
      [*]Data between code and other junks were correctly recognized.
      [*]No other instruction indirectly rewrites the stack (e.g. MOV EAX,ESP - MOV ,xyz)
      [*]The manipulations with stack are always DWORD-aligned
      [*]To prevent multiple scanning of the same code, scanned addresses are remembered along with appropriate find_nest values and these informations are tested while scanning. This saving/testing is not included in the algo below.


      Algo:
      
      
      [B]Procedure StackScan[/B](Arg1: Origin Address OA, Arg2: nest)
      // returns record of addresses of instructions, which fills the item
      01 Compute and remember find_nest: find_nest = nest + 4;
      02 [I][B]Scan reverse through the code, watch for:[/B][/I]
      03 [B]Unconditional jump[/B]:
      04 Stop going this way;
      // there has to be label at the next instruction - jumps using this label are remembered already (see next line)

      05 [B]Label[/B]:
      06 Remember all jump instructions using this label, with actual find_nest;
      // label means independent branch that has to be scanned later extra

      07 [B]Instruction modifies ESP[/B]:
      [QUOTE]
      08 mov esp,405008
      09 push ebx ; ESP=405004
      10 push ecx ; ESP=405000
      11 [B]mov esp,405004[/B]
      12 pop eax ; [i]OA[/i] EAX=EBX (here is nest = 0, find_nest = +4)
      [/QUOTE]
      13 If is ESP filled with absolute value (e.g. MOV) then
      14 If is value of ESP at the address of the previous instruction (line 10) known then
      15 If is new value of ESP known (line 11) then
      16 Compute new find_nest: find_nest = find_nest + subtraction; // in example 4+4=+8
      17 Goto check_nest (line 56);
      18 Else return from proc with error "unknown value of top of stack";
      19 Else return from proc with error "unknown stack contains"; // can't compute the subtraction
      20 EndIf;

      // ESP is filled relative way: need for recognition if is this instruction inside a loop
      // this recognition is not needed, if is founded stack-pair instructions, e.g. PUSH-POP and if one of them
      // is not in loop relative to the companion - see example:
      [QUOTE]
      21 @@: [B]push [ebx][/B] ; ESP=ESP-4 [B]companion[/B]
      22 xor [esp],eax ; code data
      23 jnz @lbl
      24 push eax ; ESP=ESP-4
      25 mov eax,[ecx] ; if key==0 and data==0, fill with [ecx]
      26 xor [esp+4],eax
      27 pop eax ; ESP=ESP+4
      28 @lbl: [B]pop [ebx][/B] ; ESP=ESP+4 [B]first[/B]
      29 add ebx,4
      30 loop @B ; loop may be overpassed - contains only stack-pair instructions
      31 retn ; [i]OA[/i] find_nest=+4
      [/QUOTE]
      32 Remember this ESP change (line 28, +4);
      33 Save actual find_nest (line 28, +4);
      34 [I][B]Scan reverse through the code, watch for:[/B][/I]
      35 [B]Instruction modifies ESP[/B]:
      36 Add this ESP change to find_nest;
      37 If result == 0 then break this local scanning; // perhaps stack-pair instructions (see example), check it
      38 [B]Near jump[/B]:
      39 If is its label over the original instruction (over line 28) then goto call_get_loop (line 52);
      // this jump breaks the loop (if any), so break this local scanning and check, if there is a loop
      40 [B]Far jump[/B]:
      41 Goto call_get_loop (line 52);
      // this jump breaks the loop (if any), so break this local scanning and check, if there is a loop
      42 [B]Label[/B]:
      43 Remember all jump instructions using this label; // why see line 47
      [B]First instruction of the code[/B]:
      44 Break scanning this way; // the founded item is not stored within the code
      45 [I][B]Else run scanning on;[/B][/I]

      // find_nest is here 0 - actual address is line 21
      46 Return saved find_nest;
      47 If non of remembered jump instructions has labels outside original address (line 28)
      and actual address (line 21) then
      // means non of them breaks the potential loop
      48 If ESP change between each of jump instruction and actual address (line 21)
      is equal to ESP change between its label and original address (line 28) then
      // these two conditions mean that such jump doesn't change the sense of stack handling;
      // in example ESP change between lines 23 and 21 is equal to ESP change between lines 28 and 28
      49 Resume the main scanning loop from instruction previous to actual address;
      // Overpass instructions between original and actual addresses, because ESP is returned
      50 EndIf;
      51 EndIf;
      52 [I]Label call_get_loop:[/I]
      53 Call GetStackLoop(OA, actual address); // actual address means instruction at line 28
      54 If unknown loop count, return from proc with error "can't get the right instruction";
      55 Multiply remembered ESP change in actual instruction by loop count;
      56 [I]Label check_nest:[/I]
      57 Add the result to find_nest;
      58 If find_nest <= 0 then
      59 Remember actual address; [B]// it got the instruction[/B]
      60 Break scanning this way; // i.e. run scanning on from remembered jump from line 5, if any
      61 EndIf;

      62 [B]First instruction of the code[/B]: // check this at the end of main scanning loop
      63 Break scanning this way; // the founded item is not stored within the code

      64 [I][B]Else run main scanning on;[/B][/I]

      [B]StackScan ENDP[/B];

      [B]Procedure GetStackLoop[/B](Arg1: Origin Address OA, Arg2: Actual Address AA)
      // returns loop_count, one if not in loop, or error when the loop count was not deducted
      // Checks if instruction at AA is in loop relative to the OA

      65 [I][B]Scan through the code from AA, watch for:[/B][/I]
      66 [B]OA[/B]:
      [QUOTE]
      67 push eax ; AA
      68 retn ; OA
      [/QUOTE]
      69 Return from proc with loop_count = 1; // there is no loop

      70 [B]AA[/B]:
      [QUOTE]
      71 [B]@@: push eax[/B] ; AA
      72 inc eax
      73 cmp eax,ebx
      74 [B]jne @B[/B] ; see line 90
      75 retn ; OA
      [/QUOTE]
      76 If is loop condition unknown then return from proc with the error
      77 Else return from proc with loop_count = number of loop iterations;

      78 [B]Last remembered linking label[/B]: // remembered from line 96
      [QUOTE]
      79 push eax ; AA
      80 @lbl: cmp eax,ebx
      81 [B]jne @F[/B] ; this jump doesn't loop around AA
      82 retn ; OA
      83 @@: add eax,ecx
      84 jmp @lbl
      [/QUOTE]
      85 Run scanning on at the next instruction; // no loop, run next code on

      86 [B]Far jump[/B]:
      87 Return from proc with loop_count = 1; // there is no loop

      88 [B]Unconditional jump[/B]:
      89 Run scanning on at its label;

      90 [B]Conditional jump[/B]:
      [QUOTE]
      91 push eax ; AA
      92 inc eax
      93 cmp eax,ebx
      94 [B]jne @B/F[/B]
      95 retn ; OA
      [/QUOTE]
      96 Remember address of this jump instruction; // scanning will run later on at the next instruction
      97 Run scanning on at label of this jump;
      // this remembered address is tested at line 78
      // this system enables dispatching of nested loops (lines 99 to 101):
      [QUOTE]
      98 @BB: add eax,ebx
      99 nest: or eax,esi
      100 add eax,edx
      101 [B]ja nest[/B]
      102 @@: cmp eax,ecx
      103 [B]jne @BB[/B]
      104 push eax ; AA
      105 inc eax
      106 cmp eax,ebx
      107 [B]jne @B[/B]
      108 retn ; OA
      [/QUOTE]
      109 [B]Last instruction of the code[/B]:
      110 Return from proc with loop_count = 1; // there is no loop

      111 [I][B]Else run main scanning on;[/B][/I]

      [B]GetStackLoop ENDP[/B];


      edited line 33: +8 corrected to +4
Posted on 2004-04-19 23:13:17 by MazeGen