After reading this thread, I felt like writing a stack probe routine :)
I needed it for my own small C runtime anyway, and thought it would be nicer to write my own rather than steal Microsoft's. The MS version also seemed more complicated than it needs to be :)

I guess the main focus ought to be size rather than speed: how often do you call routines that use huge amounts of local storage? That said, there's no reason to make it overly slow, either. On my machine (P4) the first routine clocks somewhat slower than the MS version, and the second is about the same speed. Nope, I didn't do very comprehensive tests :)

I'm posting this mainly for fun, but somebody might find it useful. And if you feel like improving size or speed or both :), it would be fun to see what you can come up with; my code is fairly lame ^_^

; Probe each page of stack memory for routines with large stack frames,
; thus utilizing the guard page mechanism to get all the memory committed.
;Entry: EAX = size of local frame
;Exit: ESP = new stackframe, if successful
;Uses: EAX
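Why the pages have to be touched in order can be shown with a toy model in C. It is a simplification, not how Windows is actually implemented: the stack is an array of pages, the committed region ends at a single guard page, touching the guard page commits it and moves the guard one page down, and touching any page below the guard counts as an access violation. `NPAGES`, `stack_model`, and `stack_touch` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16  /* hypothetical stack reserve, in pages */

/* Page states in this toy model of a downward-growing stack. */
enum page_state { RESERVED, GUARD, COMMITTED };

struct stack_model {
    enum page_state page[NPAGES];  /* page[NPAGES-1] is the top of the stack */
};

/* Start with the top page committed and the page below it as the guard. */
static void stack_init(struct stack_model *s)
{
    for (size_t i = 0; i < NPAGES; i++)
        s->page[i] = RESERVED;
    s->page[NPAGES - 1] = COMMITTED;
    s->page[NPAGES - 2] = GUARD;
}

/* Touch one page. Returns false for an access violation (an uncommitted,
   non-guard page). Touching the guard page commits it and moves the
   guard one page further down. */
static bool stack_touch(struct stack_model *s, size_t idx)
{
    switch (s->page[idx]) {
    case COMMITTED:
        return true;
    case GUARD:
        s->page[idx] = COMMITTED;
        if (idx > 0)
            s->page[idx - 1] = GUARD;
        return true;
    default:
        return false;  /* skipped past the guard page */
    }
}
```

In this model, writing to a page two or more below the guard (as a large `sub esp, N` followed by a store near the bottom of the frame would) returns false, while stepping down one page at a time never does.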
Posted on 2004-04-19 17:53:18 by f0dder
Can't this be done with a couple of ENTER instructions, or one in a loop? ;)
Posted on 2004-04-19 18:34:50 by bitRAKE

Doesn't ENTER just set up EBP without actually touching the pages on the stack (except, of course, for the EBP push)?

But perhaps you're thinking of saving ebp, and using "ENTER 0, 4096" in a loop? Hm, let's see... :p
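To make the ENTER question concrete, here is a C model of 32-bit ENTER transcribed from the pseudocode in Intel's instruction-set reference (operand order is size, then nesting level). It assumes 32-bit operand and stack size and a tiny dword-addressed memory array; the `cpu` struct and helper names are made up for the sketch. With nesting level 0 the only store is the push of EBP, after which ESP simply drops by the size operand, so `ENTER PAGE_SIZE-4, 0` moves ESP down exactly one page while writing a single dword into the new page.

```c
#include <stdint.h>

#define MEM_WORDS 4096            /* 16 KiB toy address space */

struct cpu {
    uint32_t esp, ebp;
    uint32_t mem[MEM_WORDS];      /* dword-addressed: mem[addr / 4] */
    unsigned writes;              /* number of dword stores performed */
    uint32_t lowest_write;        /* lowest address stored to */
};

static void push32(struct cpu *c, uint32_t v)
{
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
    c->writes++;
    if (c->esp < c->lowest_write)
        c->lowest_write = c->esp;
}

/* ENTER size, level with 32-bit operand and stack size. */
static void enter32(struct cpu *c, uint16_t size, uint8_t level)
{
    level %= 32;                        /* only 5 bits of the level are used */
    push32(c, c->ebp);                  /* the only store when level == 0 */
    uint32_t frame_temp = c->esp;
    for (unsigned i = 1; i < level; i++) {
        c->ebp -= 4;                    /* copy outer frame pointers */
        push32(c, c->mem[c->ebp / 4]);
    }
    if (level > 0)
        push32(c, frame_temp);          /* push the new frame pointer */
    c->ebp = frame_temp;
    c->esp -= size;                     /* allocate; no memory is touched */
}
```

With nesting level 2, as in the `enter PAGESIZE - 4*3, 2` variant later in the thread, three dwords are pushed (EBP, one copied frame pointer, and the new frame pointer), which is why that version subtracts 4*3 from the page size to keep the step at one page.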
Posted on 2004-04-20 00:32:46 by f0dder
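; ensure stack pages committed for large local data

TouchStack MACRO iSize:REQ
LOCAL tag                   ; unique label for each expansion
lea eax, [esp - (iSize)]    ; eax = target esp
tag: enter PAGE_SIZE - 4, 0 ; push ebp touches the next page, esp drops one page
cmp eax, esp
jc tag                      ; loop while target is below esp
mov esp, eax                ; final esp = target
ENDM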
...really small if you don't care about EBP. :)
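The TouchStack loop can be checked with a C simulation that tracks only ESP and the address each ENTER stores to. This is a sketch under the assumption that the page holding the initial ESP is already committed; `touch_stack`, `touch_result`, and `PAGE_BYTES` are illustrative names, and no memory is actually touched.

```c
#include <stdint.h>

#define PAGE_BYTES 4096u   /* PAGE_SIZE in the macro above */

struct touch_result {
    uint32_t final_esp;
    unsigned iterations;   /* number of ENTERs executed */
    uint32_t max_gap;      /* largest distance between consecutive stores */
};

/* Simulates: lea eax,[esp-size] / tag: enter PAGE_SIZE-4,0 / cmp eax,esp /
   jc tag / mov esp,eax. Each ENTER stores EBP at esp-4, then lowers esp
   by a further PAGE_SIZE-4, so esp moves down exactly one page. */
static struct touch_result touch_stack(uint32_t esp, uint32_t size)
{
    struct touch_result r = {0, 0, 0};
    uint32_t target = esp - size;          /* lea eax, [esp - size] */
    uint32_t last_store = esp;             /* initial page assumed committed */
    do {
        uint32_t store = esp - 4;          /* the push of ebp inside ENTER */
        if (last_store - store > r.max_gap)
            r.max_gap = last_store - store;
        last_store = store;
        esp -= PAGE_BYTES;                 /* 4 (push) + PAGE_SIZE-4 (sub) */
        r.iterations++;
    } while (target < esp);                /* cmp eax, esp / jc tag */
    r.final_esp = target;                  /* mov esp, eax */
    return r;
}
```

For a frame of three pages plus 100 bytes the loop runs four times, consecutive stores are exactly one page apart, and ESP ends at the target. The bytes left below the last store are fewer than a page, so they fall in the committed page or the new guard page rather than beyond it.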
Posted on 2004-04-20 00:47:00 by bitRAKE
I think it's a requirement of _chkstk to only touch EAX though :/ - but yes, if you don't care about EBP, that is pretty small :)

(and slow ;))
Posted on 2004-04-20 00:50:28 by f0dder
Here's a silly one: slower and larger than _chkstk2. I probably missed something very obvious :stupid:

_chkstk3 PROC ; 1Bh bytes
push ebp
push eax
mov eax, esp

@@touchloop:
enter PAGESIZE-4, 0 ; touch (because of push ebp) + sub stack
sub dword ptr [eax], 4096 ; one less page to go
jns @@touchloop

sub esp, [eax] ; sub negative value = re-increase ESP

add esp, 12 ; adjust for push ebp + push eax + ret-eip
mov ebp, [eax+4]
jmp dword ptr [eax+8] ; ret
_chkstk3 ENDP
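The counter arithmetic in _chkstk3 is easy to get off by a few bytes, so here is a C transcription of just its ESP bookkeeping (no memory is touched). It assumes a 4 KB page and takes the caller's ESP as it was before the `call`; `chkstk3` and `chk3_result` are names made up for the sketch.

```c
#include <stdint.h>

#define PAGE_BYTES 4096u   /* the 4096 / PAGESIZE used in _chkstk3 */

struct chk3_result { uint32_t final_esp; unsigned iterations; };

static struct chk3_result chkstk3(uint32_t caller_esp, int32_t size)
{
    struct chk3_result r = {0, 0};
    uint32_t esp = caller_esp - 12;      /* ret-eip, push ebp, push eax */
    int32_t remaining = size;            /* the saved EAX at [eax] */
    do {
        esp -= PAGE_BYTES;               /* enter PAGESIZE-4, 0 */
        remaining -= (int32_t)PAGE_BYTES; /* sub dword ptr [eax], 4096 */
        r.iterations++;
    } while (remaining >= 0);            /* jns @@touchloop */
    /* sub esp, [eax]: remaining is negative, so esp goes back up
       (unsigned wrap-around gives the same result as the CPU) */
    esp -= (uint32_t)remaining;
    esp += 12;                           /* add esp, 12 */
    r.final_esp = esp;
    return r;
}
```

An exact multiple of the page size costs one extra iteration, because `jns` keeps looping when the counter reaches exactly zero; that is wasted work, but it never leaves an untouched gap.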
Posted on 2004-04-20 00:53:31 by f0dder
If you want to create a small executable that is compressed, then use a fully unrolled version, producing code like...
mov eax, ebp

enter PAGE_SIZE-4, 0
...(n times)...
enter PAGE_SIZE-4, 0
enter iSize - n*(PAGE_SIZE)-4, 0
mov ebp, eax
...of course saving EBP only if you need it.

Also, a compressor that supports scaling should be used for maximum compression. The scaling can be in the compression algorithm itself (Sequitur) or through some type of preprocessing (n-RLE) that collapses instruction patterns of variable length.
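The operand of the last ENTER in the unrolled form follows from each ENTER lowering ESP by its size operand plus 4. A small C generator makes that arithmetic explicit; the names and the choice n = (iSize - 4) / PAGE_SIZE are assumptions for the sketch (it requires iSize >= 4), picked so the final size operand stays between 0 and PAGE_SIZE - 1.

```c
#include <stdio.h>

#define PAGE_BYTES 4096u   /* PAGE_SIZE in the unrolled snippet */

/* Splits a frame of isize bytes (isize >= 4) into n full-page ENTERs plus
   one final ENTER whose size operand is *last. Every ENTER lowers ESP by
   its size operand + 4 (the implicit push of EBP), so
   n * PAGE_BYTES + (*last + 4) == isize. */
static void plan_unrolled(unsigned isize, unsigned *n, unsigned *last)
{
    *n = (isize - 4) / PAGE_BYTES;
    *last = isize - *n * PAGE_BYTES - 4;   /* 0 .. PAGE_BYTES-1 */
}

/* Prints the instruction sequence for a given frame size. */
static void print_unrolled(unsigned isize)
{
    unsigned n, last;
    plan_unrolled(isize, &n, &last);
    printf("mov eax, ebp\n");
    for (unsigned i = 0; i < n; i++)
        printf("enter %u, 0\n", PAGE_BYTES - 4);
    printf("enter %u, 0\n", last);
    printf("mov ebp, eax\n");
}
```

For example, `print_unrolled(10000)` should print `mov eax, ebp`, two `enter 4092, 0` lines, `enter 1804, 0`, and `mov ebp, eax`.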
Posted on 2004-04-20 01:01:12 by bitRAKE
Cute trick :) - got any ideas on improving the version that takes the frame size in eax? Here's another early-morning version based on your idea :)

_chkstk4 PROC ; 1Ah bytes
neg eax
lea eax, [esp+eax+4] ; target esp

push ebp ; save ebp for later
push eax
mov eax, esp

@@touchloop:
enter PAGESIZE-4, 0 ; touch (because of push ebp) + sub stack
cmp dword ptr [eax], esp ; below target yet?
jc @@touchloop

mov esp, [eax] ; set up final esp
mov ebp, [eax+4] ; restore ebp
jmp dword ptr [eax+8] ; ret
_chkstk4 ENDP
Posted on 2004-04-20 01:10:17 by f0dder
I played with it a little, but haven't tested:
sub eax, esp

push ebp
lea ebp, [esp+8]
neg eax
@@touchloop:
enter PAGESIZE - 4*3, 2
cmp eax, esp
jc @@touchloop
mov ebp, [ebp-8]
mov [eax], ebp
mov ebp, [esp+PAGESIZE - 4*2]
mov esp, eax
Too tricky, and even slower. :( I don't like supporting such rigid interfaces at run time. They should collapse into optimal code when the routines are this small.
Posted on 2004-04-20 15:55:42 by bitRAKE

I don't think _chkstk2 is too bad... but yes, you could get it smaller if you inlined the code in the function body and relied on a compile-time known frame size. I guess they opted for the "mov eax, size" + "call _chkstk" method to save some bytes in large programs with many functions that have a large amount of locals :rolleyes:.

I think the speed of the algorithm itself is swamped by the cost of the guard-page exceptions it causes. Then again, I don't know whether stack pages are ever decommitted, so it might only be a "few-times" hit: when entering more deeply nested functions, or the next function with a larger stack frame.
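The "few-times hit" idea can be made concrete with a toy counter in C, under the unverified assumption that committed stack pages stay committed: a guard-page fault is charged only for each page beyond the deepest point the thread has reached so far. `count_faults` and its page-depth inputs are hypothetical, and one fault per newly committed page is a simplification.

```c
/* Counts guard-page faults for a sequence of calls, where depth[i] is the
   number of stack pages call i needs. Assumes pages, once committed, are
   never decommitted. */
static unsigned count_faults(const unsigned *depth, unsigned ncalls,
                             unsigned initially_committed)
{
    unsigned committed = initially_committed, faults = 0;
    for (unsigned i = 0; i < ncalls; i++) {
        if (depth[i] > committed) {
            faults += depth[i] - committed;  /* one guard-page hit per new page */
            committed = depth[i];            /* new high-water mark */
        }
    }
    return faults;
}
```

Under this assumption, a call that goes no deeper than an earlier one costs nothing extra, so repeated deep calls pay the guard-page cost only once.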

I guess it's also nice having control of the _chkstk function: perhaps one could consider using VirtualAlloc to commit all the pages in one go, instead of touching each one and possibly causing a lot of page faults? Dunno if it's a good idea, or whether there are speed benefits from this in the long run :)

Of course, _chkstk can be turned off completely if it's not required for your target platform (like when using a DOS extender with PE file support).

It might have been a good idea to support a stack-probe switch that would insert the probing code into each function.
Posted on 2004-04-20 16:31:56 by f0dder