Hi,
I would like to discuss here an optimal heuristic algo that enables finding of all instructions, which directly fill a specific item in the stack, using reverse stack scanning. I would like to speak about general issues, not about concrete assembly implementation.
I started writing this algo as I needed to find all labels of RETN instruction initially - I believe this is the best approach instead of watching the stack after each CALL or PUSH OFFSET or similar instructions. I generalized the algo later on.
I wrote the algo with general code in mind and that's why it tests, for instance, far jumps too.

Note: This topic is quite sensible here, but is authorised by the moderators. It may be deleted when it turns in the wrong direction, so please sticky to the topic.

Well... I hope the algo works, even though I didn't code it and test it.
I think the algo has only one drawback - it probably will be very slow, because it tests if is stack-handling instruction inside a loop, because the loop multiplies the effect on the stack. The algo tests for the loop until it gets last instruction of the code or some important instruction (see GetStackLoop proc). I think of limitation of maximal count of scanned instructins or so...

Notes:
Do not depend on given examples. These are just for illustration.
The algo is intended for scanning protected-mode code with 32bit stack segment.

1st example: get the label(s) of the RET instruction:
``````
push	eax		stdcall argument
call	proc		ESP=ESP-4
...
proc:	...			ESP without changes
[B]retn 4[/B]			find_nest: +4 (== 0 + 4)
``````

2nd example: get the instruction, which fills the item at ESP+8
``````
push	eax		ESP=ESP-4
push	ebx		ESP=ESP-4
push	ecx		ESP=ESP-4
[B]mov	edx,[esp+8][/B]	find_nest: +12 (== 8 + 4)
``````

Base theses:

[*]In general, the scanner watches the changes of ESP (e.g. PUSH means ESP=ESP-4). This change is added to the find_nest number until the result is zero - then it gets the filling instruction. Computing the find_nest: at the start of scanning, we know how nested is seeking item (e.g. the nest is zero in case of 1st example, or +8 in case of 2nd example) relative to starting ESP. To this nest has to be added +4 and we get find_nest, because we add each relative change of ESP to this find_nest (see examples above).
[*]The main difficulty lies in correct recognition of stack-handling in loop, e.g.:
``````
[B]@@:	push	eax[/B]
inc	eax
cmp	eax,ebx
[B]jne	@B[/B]		; conditional loop
retn			; start scanning - origin address [I]OA[/I]
``````

The GetStackLoop function tests it.

Assumptions and simplifications:

[*]The code is fully disassembled into list of entries (i.e. we know the lenght of any instruction, so reverse scanning is possible)
[*]Data between code and other junks were correctly recognized.
[*]No other instruction indirectly rewrites the stack (e.g. MOV EAX,ESP - MOV ,xyz)
[*]The manipulations with stack are always DWORD-aligned
[*]To prevent multiple scanning of the same code, scanned addresses are remembered along with appropriate find_nest values and these informations are tested while scanning. This saving/testing is not included in the algo below.

Algo:
``````
[B]Procedure StackScan[/B](Arg1: Origin Address OA, Arg2: nest)
// returns record of addresses of instructions, which fills the item
01  Compute and remember find_nest: find_nest = nest + 4;
02  [I][B]Scan reverse through the code, watch for:[/B][/I]
03   [B]Unconditional jump[/B]:
04    Stop going this way;
// there has to be label at the next instruction - jumps using this label are remembered already (see next line)

05   [B]Label[/B]:
06    Remember all jump instructions using this label, with actual find_nest;
// label means independent branch that has to be scanned later extra

07   [B]Instruction modifies ESP[/B]:
[QUOTE]
08 		mov	esp,405008
09	 	push	ebx			; ESP=405004
10 		push	ecx			; ESP=405000
11	 	[B]mov	esp,405004[/B]
12 		pop	eax			; [i]OA[/i]	EAX=EBX (here is nest = 0, find_nest = +4)
[/QUOTE]
13    If is ESP filled with absolute value (e.g. MOV) then
14     If is value of ESP at the address of the previous instruction (line 10) known then
15      If is new value of ESP known (line 11) then
16       Compute new find_nest: find_nest = find_nest + subtraction;	// in example 4+4=+8
17       Goto check_nest (line 56);
18      Else return from proc with error "unknown value of top of stack";
19     Else return from proc with error "unknown stack contains";	// can't compute the subtraction
20    EndIf;

// ESP is filled relative way: need for recognition if is this instruction inside a loop
// this recognition is not needed, if is founded stack-pair instructions, e.g. PUSH-POP and if one of them
// is not in loop relative to the companion - see example:
[QUOTE]
21   	@@:	[B]push	[ebx][/B]			; ESP=ESP-4 [B]companion[/B]
22   		xor	[esp],eax			; code data
23   		jnz	@lbl
24   			push	eax		; ESP=ESP-4
25   			mov	eax,[ecx]		; if key==0 and data==0, fill with [ecx]
26   			xor	[esp+4],eax
27   			pop	eax		; ESP=ESP+4
28   	@lbl:	[B]pop	[ebx][/B]			; ESP=ESP+4 [B]first[/B]
30   		loop	@B			; loop may be overpassed - contains only stack-pair instructions
31   		retn				; [i]OA[/i] find_nest=+4
[/QUOTE]
32    Remember this ESP change (line 28, +4);
33    Save actual find_nest (line 28, +4);
34    [I][B]Scan reverse through the code, watch for:[/B][/I]
35     [B]Instruction modifies ESP[/B]:
36      Add this ESP change to find_nest;
37      If result == 0 then break this local scanning;	// perhaps stack-pair instructions (see example), check it
38     [B]Near jump[/B]:
39      If is its label over the original instruction (over line 28) then goto call_get_loop (line 52);
// this jump breaks the loop (if any), so break this local scanning and check, if there is a loop
40     [B]Far jump[/B]:
41      Goto call_get_loop (line 52);
// this jump breaks the loop (if any), so break this local scanning and check, if there is a loop
42     [B]Label[/B]:
43      Remember all jump instructions using this label;	// why see line 47
[B]First instruction of the code[/B]:
44      Break scanning this way;	// the founded item is not stored within the code
45    [I][B]Else run scanning on;[/B][/I]

// find_nest is here 0 - actual address is line 21
46    Return saved find_nest;
47    If non of remembered jump instructions has labels outside original address (line 28)
and actual address (line 21) then
// means non of them breaks the potential loop
48     If ESP change between each of jump instruction and actual address (line 21)
is equal to ESP change between its label and original address (line 28) then
// these two conditions mean that such jump doesn't change the sense of stack handling;
// in example ESP change between lines 23 and 21 is equal to ESP change between lines 28 and 28
49      Resume the main scanning loop from instruction previous to actual address;
// Overpass instructions between original and actual addresses, because ESP is returned
50     EndIf;
51    EndIf;
52    [I]Label call_get_loop:[/I]
53    Call GetStackLoop(OA, actual address);	// actual address means instruction at line 28
54    If unknown loop count, return from proc with error "can't get the right instruction";
55    Multiply remembered ESP change in actual instruction by loop count;
56    [I]Label check_nest:[/I]
57    Add the result to find_nest;
58    If find_nest <= 0 then
59     Remember actual address;	[B]// it got the instruction[/B]
60     Break scanning this way;	// i.e. run scanning on from remembered jump from line 5, if any
61    EndIf;

62   [B]First instruction of the code[/B]:	// check this at the end of main scanning loop
63     Break scanning this way;	// the founded item is not stored within the code

64  [I][B]Else run main scanning on;[/B][/I]

[B]StackScan ENDP[/B];

// returns loop_count, one if not in loop, or error when the loop count was not deducted
// Checks if instruction at AA is in loop relative to the OA

65  [I][B]Scan through the code from AA, watch for:[/B][/I]
66   [B]OA[/B]:
[QUOTE]
67  		push	eax		; AA
68  		retn			; OA
[/QUOTE]
69    Return from proc with loop_count = 1;	// there is no loop

70   [B]AA[/B]:
[QUOTE]
71 	[B]@@:	push	eax[/B]		; AA
72 		inc	eax
73 		cmp	eax,ebx
74 		[B]jne	@B[/B]		; see line 90
75 		retn			; OA
[/QUOTE]
76    If is loop condition unknown then return from proc with the error
77    Else return from proc with loop_count = number of loop iterations;

78   [B]Last remembered linking label[/B]:	// remembered from line 96
[QUOTE]
79  		push	eax		; AA
80  	@lbl:	cmp	eax,ebx
81  		[B]jne	@F[/B]		; this jump doesn't loop around AA
82 		retn			; OA
84  		jmp	@lbl
[/QUOTE]
85    Run scanning on at the next instruction;	// no loop, run next code on

86   [B]Far jump[/B]:
87    Return from proc with loop_count = 1;	// there is no loop

88   [B]Unconditional jump[/B]:
89    Run scanning on at its label;

90   [B]Conditional jump[/B]:
[QUOTE]
91  		push	eax		; AA
92  		inc	eax
93  		cmp	eax,ebx
94  		[B]jne	@B/F[/B]
95  		retn			; OA
[/QUOTE]
96    Remember address of this jump instruction;	// scanning will run later on at the next instruction
97    Run scanning on at label of this jump;
// this remembered address is tested at line 78
// this system enables dispatching of nested loops (lines 99 to 101):
[QUOTE]
99  	nest:	or	eax,esi
101 		[B]ja	nest[/B]
102 	@@:	cmp	eax,ecx
103 		[B]jne	@BB[/B]
104 		push	eax			; AA
105 		inc	eax
106 		cmp	eax,ebx
107 		[B]jne	@B[/B]
108 		retn				; OA
[/QUOTE]
109  [B]Last instruction of the code[/B]:
110   Return from proc with loop_count = 1;	// there is no loop

111 [I][B]Else run main scanning on;[/B][/I]

[B]GetStackLoop ENDP[/B];
``````

edited line 33: +8 corrected to +4
Posted on 2004-04-19 23:13:17 by MazeGen