I was recently pondering how I could use CMOV in my applications but still be backward compatible with older processors. The usual answer is to set up multiple execution paths depending on the processor type. That is a very laborious and space-wasting solution, as you have to have multiple copies of code blocks for each eventuality. I decided there must be a better way. I wanted to ensure that the code would run optimally on newer processors and was not concerned with speed on older ones, so I thought: why not use the exception handler to do this? In the following example I have used UD2 (undefined opcode 2), but you can easily set it up for any opcode.

The example uses the final exception handler to trap and analyze the error, then sets EAX to -1 and continues. If the error is not what we expect, it passes the exception on to the next handler (usually the JIT debugger).

// Set the exception handler (this should be at the entry point)
invoke SetUnhandledExceptionFilter,OFFSET FinalHandler

....

FinalHandler FRAME pExceptInfo
uses esi,edi

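// pExceptInfo points to EXCEPTION_POINTERS: ExceptionRecord at +0, ContextRecord at +4
mov eax,[pExceptInfo]
mov edi,[eax+4]    // edi = CONTEXT
mov esi,[eax]      // esi = EXCEPTION_RECORD

// EXCEPTION_ILLEGAL_INSTRUCTION
// Check to see if this is an undefined opcode
mov eax,[esi]      // ExceptionCode is the first member
cmp eax,EXCEPTION_ILLEGAL_INSTRUCTION
jne >>.NEXTHANDLER

// The 2 bytes at EIP should be UD2 (0B0Fh)
// Note that B90Fh is also guaranteed to be undefined
mov eax,[edi+0B8h] // CONTEXT.Eip (32-bit x86 layout)
mov eax,[eax]
and eax,0FFFFh
cmp eax,0B0Fh
jne >>.NEXTHANDLER

// Increment the instruction pointer past the undefined opcode
mov eax,[edi+0B8h]
add eax,2
mov [edi+0B8h],eax

// Set the return value of EAX to 0FFFFFFFFh
mov D[edi+0B0h],0FFFFFFFFh // CONTEXT.Eax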

// continue execution
mov eax,EXCEPTION_CONTINUE_EXECUTION
ret

.NEXTHANDLER
// Pass the exception to the next handler
mov eax,EXCEPTION_CONTINUE_SEARCH
ret

ENDF


I have not coded the complete CMOV instruction yet, but it should be an interesting project. Of course it will be VERY slow on processors that do not support the instruction, but on processors that do support it no exception is raised, so the code runs at full speed there.
Posted on 2006-05-07 09:38:43 by donkey
I'm a bit confused wrt. your use of UD2... is this only as an example?

I assume your method (as opposed to the sample) is to just CMOV all you want, and trap CMOV instructions on old CPUs, and handle them in the exception handler? Sounds doable, although (as you point out yourself) quite slow on the older machines, which will be slow enough to start with :)

Now, if you NOP-padded all CMOV instructions to be "large enough", you could set up an elaborate JIT that compiled code so you don't have to rely on exceptions... but that's probably quite overkill ^_^
Posted on 2006-05-07 11:05:42 by f0dder
Hi f0dder,

I used UD2 only for the example as I have not finished coding the CMOV stuff (as I said). I have WinExplorer users who are running it on an old 486 box, with no RDTSC (it's privileged) and no CMOV. Since these boxes are getting rarer every day, it seems to be overkill to set up execution paths, but I would still like to support them, albeit at a snail's pace, hence my little idea. It allows me to cut and paste a routine to make my apps backward compatible without much effort.

CMOV is an excellent instruction to work with since it is always 3 bytes long (0F/cc/rm), as it only works with 32 bit registers and no memory or immediate operands. RDTSC also makes a great target for this as it does not use operands at all and is fixed length.
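As a rough illustration of the decode step for that 3-byte register-to-register form, here is a minimal sketch in portable C rather than GoAsm. It is not the finished handler: the names `cond_true`, `emulate_cmov_rr` and the `regs` array are made up for the example, and in a real handler the register values and EFLAGS would be read from (and written back to) the CONTEXT record.

```c
#include <stdint.h>
#include <stdbool.h>

/* EFLAGS bits consulted by the sixteen x86 condition codes. */
#define FLAG_CF 0x0001u
#define FLAG_PF 0x0004u
#define FLAG_ZF 0x0040u
#define FLAG_SF 0x0080u
#define FLAG_OF 0x0800u

/* Evaluate condition code cc (the low nibble of the second opcode byte). */
static bool cond_true(unsigned cc, uint32_t eflags)
{
    bool cf = (eflags & FLAG_CF) != 0;
    bool pf = (eflags & FLAG_PF) != 0;
    bool zf = (eflags & FLAG_ZF) != 0;
    bool sf = (eflags & FLAG_SF) != 0;
    bool of = (eflags & FLAG_OF) != 0;
    bool r;

    switch (cc >> 1) {
    case 0:  r = of;                break;  /* O  */
    case 1:  r = cf;                break;  /* B  */
    case 2:  r = zf;                break;  /* E  */
    case 3:  r = cf || zf;          break;  /* BE */
    case 4:  r = sf;                break;  /* S  */
    case 5:  r = pf;                break;  /* P  */
    case 6:  r = (sf != of);        break;  /* L  */
    default: r = zf || (sf != of);  break;  /* LE */
    }
    return (cc & 1) ? !r : r;               /* odd codes are the negated forms */
}

/* Emulate one 3-byte register-to-register CMOVcc (0F 40+cc, ModRM with mod=11).
 * regs[] is a hypothetical register file in x86 encoding order:
 * EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI.
 * Returns the number of bytes consumed, or 0 if the bytes are not that form. */
static int emulate_cmov_rr(const uint8_t *code, uint32_t regs[8], uint32_t eflags)
{
    uint8_t modrm;
    unsigned dst, src;

    if (code[0] != 0x0F || (code[1] & 0xF0) != 0x40)
        return 0;                           /* not a CMOVcc opcode */
    modrm = code[2];
    if ((modrm >> 6) != 3)
        return 0;                           /* memory source: not handled here */

    dst = (modrm >> 3) & 7;                 /* reg field is the destination */
    src = modrm & 7;                        /* r/m field is the source */
    if (cond_true(code[1] & 0x0F, eflags))
        regs[dst] = regs[src];
    return 3;
}
```

The return value is what the handler would add to EIP before continuing; a return of 0 means the exception should be passed on to the next handler.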
Posted on 2006-05-07 11:16:26 by donkey
It should be noted as well that this method can be used to obfuscate code quite well: using UD2 to execute a procedure, as opposed to calling it, is very effective in making code less traceable. Since you control the value of EIP, you can even use operands to indicate which proc to execute.
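To make the "operands" idea concrete, here is a small simulation in portable C. It assumes a made-up convention in which the single byte following UD2 is an index into a procedure table; the `fake_ctx` struct, the table and both procs are hypothetical stand-ins for the CONTEXT record and real routines.

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint32_t (*proc_fn)(uint32_t);

/* Two placeholder procedures, purely for illustration. */
static uint32_t proc_double(uint32_t x) { return x * 2u; }
static uint32_t proc_negate(uint32_t x) { return 0u - x; }

static const proc_fn proc_table[] = { proc_double, proc_negate };

/* Stand-in for the parts of CONTEXT that the handler touches. */
struct fake_ctx {
    const uint8_t *eip;
    uint32_t eax;
};

/* If the bytes at EIP are UD2 (0F 0B) followed by a valid index byte,
 * run the selected proc on EAX, step EIP past all three bytes and
 * return true. Otherwise leave the context alone and return false. */
static bool dispatch_ud2(struct fake_ctx *ctx)
{
    const uint8_t *p = ctx->eip;
    uint8_t index;

    if (p[0] != 0x0F || p[1] != 0x0B)
        return false;
    index = p[2];
    if (index >= sizeof proc_table / sizeof proc_table[0])
        return false;

    ctx->eax = proc_table[index](ctx->eax);
    ctx->eip = p + 3;                       /* UD2 plus the index byte */
    return true;
}
```

A false return corresponds to EXCEPTION_CONTINUE_SEARCH in the handler, a true return to EXCEPTION_CONTINUE_EXECUTION.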
Posted on 2006-05-07 11:22:21 by donkey

CMOV is an excellent instruction to work with since it is always 3 bytes long (0F/cc/rm), as it only works with 32 bit registers and no memory or immediate operands. RDTSC also makes a great target for this as it does not use operands at all and is fixed length.

The source operand could be 16/32-bit register or 16/32-bit memory.
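With a memory source the instruction is no longer fixed length, so a handler would have to work out how many bytes the ModRM operand occupies before it can step EIP past it. A minimal sketch in portable C, assuming 32-bit addressing (no 67h prefix); the function name `modrm_length32` is made up for the example:

```c
#include <stdint.h>

/* Number of bytes taken by ModRM plus any SIB byte and displacement,
 * for 32-bit addressing. p points at the ModRM byte. */
static int modrm_length32(const uint8_t *p)
{
    uint8_t modrm = p[0];
    unsigned mod = modrm >> 6;
    unsigned rm = modrm & 7;
    int len = 1;                            /* the ModRM byte itself */

    if (mod == 3)
        return len;                         /* register operand, nothing follows */

    if (rm == 4) {                          /* SIB byte follows */
        len += 1;
        if (mod == 0 && (p[1] & 7) == 5)
            len += 4;                       /* SIB base=101 with mod=00: disp32 */
    }

    if (mod == 0 && rm == 5)
        len += 4;                           /* disp32 with no base register */
    else if (mod == 1)
        len += 1;                           /* disp8 */
    else if (mod == 2)
        len += 4;                           /* disp32 */

    return len;
}
```

The total length of an unprefixed CMOVcc is then 2 (the 0F 4x opcode bytes) plus this value.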
Posted on 2006-05-11 03:12:44 by MazeGen
And, like many other instructions, it can have up to four 1-byte prefixes in no particular order.
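A handler that wants to cope with prefixes would have to step over them before looking for the 0F 4x opcode. A minimal sketch in portable C; the helper names are made up for the example, and it simply scans until the first non-prefix byte rather than enforcing any limit on the number of prefixes:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* True for the legacy one-byte x86 prefixes. */
static bool is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:            /* LOCK, REPNE, REP */
    case 0x2E: case 0x36: case 0x3E: case 0x26: /* CS, SS, DS, ES overrides */
    case 0x64: case 0x65:                       /* FS, GS overrides */
    case 0x66: case 0x67:                       /* operand-size, address-size */
        return true;
    default:
        return false;
    }
}

/* Returns the offset of the first non-prefix byte within code[0..len).
 * *opsize16 is set to true if a 66h operand-size prefix was seen,
 * which for CMOV selects the 16-bit form. */
static size_t skip_prefixes(const uint8_t *code, size_t len, bool *opsize16)
{
    size_t i = 0;

    *opsize16 = false;
    while (i < len && is_legacy_prefix(code[i])) {
        if (code[i] == 0x66)
            *opsize16 = true;
        i++;
    }
    return i;
}
```

The opcode check would then start at `code + skip_prefixes(...)`, and the prefix count has to be included when EIP is advanced.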
Posted on 2006-05-11 03:33:16 by ti_mo_n


CMOV is an excellent instruction to work with since it is always 3 bytes long (0F/cc/rm), as it only works with 32 bit registers and no memory or immediate operands. RDTSC also makes a great target for this as it does not use operands at all and is fixed length.

The source operand could be 16/32-bit register or 16/32-bit memory.


Hi Mazegen,

I should have qualified that with "for my purposes". To date I only use CMOV for register-to-register moves; that may change in the future, but for now I have only written an implementation for r32.

Donkey
Posted on 2006-05-11 12:45:02 by donkey