Conditional Statements

From ASM Book

Revision as of 05:10, 19 October 2009 by SpooK

Conditional statements are constructs such as "if...else...endif", "switch...case", "while...wend" and "do...while". Statements that branch (choose between separate paths) based on a condition are called conditional branching statements. Statements that repeat a block of code based on a condition, without the code having to be written out again, are called conditional looping statements.

However, assembly language has no native high-level representation of these statements. Instead, instructions set bits in the EFLAGS register, and specialized jump instructions then branch depending on the state of those bits. High-level constructs can be built on top of these instructions, but they are mostly either macros or directives built into the assembler.

This chapter focuses on teaching you how we implement these constructs using plain-vanilla assembly instructions.


The Flags

In assembler, conditional statements revolve around one thing: the EFLAGS register (more commonly known as the flags register). All instructions can be classified into two groups: those that modify EFLAGS and those that do not. The former group can be classified further according to which flags each instruction modifies. The most important flags are the carry flag (CF), overflow flag (OF), zero flag (ZF), sign flag (SF) and parity flag (PF). Another somewhat important flag is the direction flag (DF), but it is only used by the string instructions and is normally changed with cld (clear direction flag) and std (set direction flag).

Opcodes relating to Conditional Statements

The instructions most commonly seen in conditional statements in assembler are the following:
Instruction      Description
JMP              Unconditional jump
Jcc              Jump if condition is met
JCXZ/JECXZ       Jump if cx/ecx equals 0
LOOP             Decrement the count register and jump if it is not 0
LOOPZ/LOOPE      As LOOP, but jump only while zero/equal (ZF = 1)
LOOPNZ/LOOPNE    As LOOP, but jump only while not zero/not equal (ZF = 0)
SETcc            Set byte on condition
CMOVcc           Conditional move
TEST             Logical compare (bit-wise AND, result discarded)
CMP              Compare 2 operands (subtraction, result discarded)

For Jcc, CMOVcc and SETcc there is actually a whole family of opcodes. The "cc" stands for the condition, which is encoded in the tttn (condition test) field of the opcode. Some conditions have aliases, that is, different mnemonics that assemble to the same opcode (for example, JZ is the same as JE).

The tttn conditions are as follows (listed in encoding order):
  • O (Overflow) OF = 1
  • NO (No overflow) OF = 0
  • C/B/NAE (Carry, Below, Not above or equal) CF = 1
  • NC/NB/AE (No carry, Not below, Above or equal) CF = 0
  • E/Z (Equal, Zero) ZF = 1
  • NE/NZ (Not equal, Not zero) ZF = 0
  • BE/NA (Below or equal, Not above) CF = 1 or ZF = 1
  • NBE/A (Not below or equal, Above) CF = 0 and ZF = 0
  • S (Sign) SF = 1
  • NS (Not sign) SF = 0
  • P/PE (Parity, Parity even) PF = 1
  • NP/PO (Not parity, Parity odd) PF = 0
  • L/NGE (Less than, Not greater than or equal to) SF <> OF
  • NL/GE (Not less than, Greater than or equal to) SF = OF
  • LE/NG (Less than or equal to, Not greater than) ZF = 1 or SF <> OF
  • NLE/G (Not less than or equal to, Greater than) ZF = 0 and SF = OF
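The flag conditions in the list above can be checked with a small C model. This is only a sketch under my own assumptions: the Flags struct and the cmp32 and cond_* names are hypothetical helpers invented for illustration, and only four of the sixteen conditions are shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the status flags that CMP updates. */
typedef struct {
    bool cf, zf, sf, of;
} Flags;

/* Model of "cmp a, b": compute a - b, keep the flags, drop the result. */
Flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    Flags f;
    f.cf = a < b;                               /* unsigned borrow */
    f.zf = (r == 0);
    f.sf = (r >> 31) != 0;                      /* top bit of the result */
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;    /* signed overflow */
    return f;
}

bool cond_a(Flags f) { return !f.cf && !f.zf; }          /* A / NBE */
bool cond_b(Flags f) { return f.cf; }                    /* B / C / NAE */
bool cond_g(Flags f) { return !f.zf && f.sf == f.of; }   /* G / NLE */
bool cond_l(Flags f) { return f.sf != f.of; }            /* L / NGE */
```

Comparing 0FFFFFFFFh with 1 is a useful probe: cond_a holds (a huge unsigned number is above 1) while cond_g does not (-1 is not greater than 1), so the same flags give different answers depending on which condition is read from them.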

One might wonder what the difference between ja and jg is. The difference is that ja is "jump if above" (intended for unsigned numbers), while jg is "jump if greater" (intended for signed numbers). The list above can therefore be classified into conditional jumps for signed numbers, conditional jumps for unsigned numbers, and others.

Conditional Jumps for signed numbers
  • JL/JNGE
  • JNL/JGE
  • JLE/JNG
  • JNLE/JG
Conditional Jumps for unsigned numbers
  • JC/JB/JNAE
  • JNC/JNB/JAE
  • JBE/JNA
  • JNBE/JA
Others
  • JO
  • JNO
  • JE/JZ
  • JNE/JNZ
  • JS
  • JNS
  • JP/JPE
  • JNP/JPO

"JMP" is an unconditional jump. Two forms of JMP are commonly used: the near relative jump, whose displacement is relative to the next instruction, and the near absolute indirect jump, whose target address is given in the operand. Jcc is similar to JMP, except that the jump is only taken if the condition is met (for example, for JZ label, the processor will only jump to label if ZF = 1). JCXZ/JECXZ jumps if cx/ecx (depending on the opcode used) is zero. Take note that the displacement for JCXZ and JECXZ is only 1 byte, so the target must lie within -128 to +127 bytes of the instruction that follows JCXZ/JECXZ.

The LOOP/LOOPxx instructions make use of ecx or cx as the counter. Each time a LOOP instruction is executed, ecx or cx (depending on the address size) is decremented; then, if the counter != 0, execution jumps to the label. So in short, LOOP label is the same as the following, except that LOOP leaves the flags untouched while dec modifies them:
label:
     ; loop body goes here
     dec    ecx     ; decrement counter in count register
     jnz    label   ; go back to label if ecx is not zero

LOOPZ and LOOPNZ are similar to LOOP, but the jump also depends on the value of the zero flag. For LOOPZ, execution jumps to the label if counter != 0 and the zero flag is set (1). For LOOPNZ, execution jumps to the label if counter != 0 and the zero flag is clear (0). Do take note that Intel does not recommend LOOP/LOOPZ/LOOPNZ because they are complex instructions; it is usually better to use the dec/jnz pair shown above instead of LOOP. LOOP also has a displacement of only 1 byte, so the target must lie within -128 to +127 bytes of the next instruction.
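The order of operations in LOOP (decrement first, test afterwards) is easy to get wrong, so here is a rough C model of it. The function names are hypothetical and the zero flag is simply passed in as a parameter; this is a sketch of the behaviour described above, not of how the processor implements it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Each function models one execution of the instruction:
   decrement the counter first, then decide whether to jump. */
bool loop_taken(uint32_t *ecx)            { *ecx -= 1; return *ecx != 0; }
bool loopz_taken(uint32_t *ecx, bool zf)  { *ecx -= 1; return *ecx != 0 && zf; }
bool loopnz_taken(uint32_t *ecx, bool zf) { *ecx -= 1; return *ecx != 0 && !zf; }

/* Counts how often a loop body that ends in LOOP runs for a start count. */
uint64_t body_runs(uint32_t ecx)
{
    uint64_t runs = 0;
    do {
        runs++;                 /* stands in for the loop body */
    } while (loop_taken(&ecx));
    return runs;
}
```

Under this model a start count of 0 does not mean "run zero times": the decrement wraps around to 0FFFFFFFFh first, so the body would run 2^32 times. That is one situation where a JECXZ in front of the loop is handy.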

SETcc sets a byte to 1 if the condition is met and to 0 otherwise. Bear in mind that SETcc only accepts an 8-bit register or 8-bit memory operand; there is no form for 16-bit or 32-bit operands. If you want a 32-bit result from SETcc, zero-extend the 8-bit register with the MOVZX instruction. CMOVcc is only available on 686 (P6 family) and later processors, but I personally think it is more useful than SETcc. With CMOVcc, if the condition is met, data is moved from register to register or from memory to register; the destination is always a register, so there is no register-to-memory form. Do take note that the conditional move only works with 16-bit and 32-bit operands. A conditional move with 8-bit operands is not supported.

The CMP instruction compares the first operand with the second operand and sets the status flags in the EFLAGS register according to the result. The comparison is performed by subtracting the second operand from the first and setting the status flags in the same manner as SUB, but the result itself is discarded, so neither operand is modified. The TEST instruction computes the bit-wise logical AND of the first and second operands and sets the SF, ZF and PF status flags according to the result (CF and OF are cleared). The result is then discarded.
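To make the TEST description concrete, the following C sketch computes the flags that TEST would leave behind for two 32-bit operands. TestFlags and test32 are hypothetical names of mine; the one detail worth noticing is that PF only looks at the low byte of the result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the flags that TEST writes. */
typedef struct {
    bool cf, zf, sf, of, pf;
} TestFlags;

/* Model of "test a, b": AND the operands, keep only the flags. */
TestFlags test32(uint32_t a, uint32_t b)
{
    uint32_t r = a & b;            /* the AND result is used for flags only */
    uint32_t low = r & 0xFFu;      /* PF looks at the low byte only */
    TestFlags f;
    low ^= low >> 4;               /* fold the byte down to one parity bit */
    low ^= low >> 2;
    low ^= low >> 1;
    f.pf = (low & 1u) == 0;        /* PF = 1 for an even number of set bits */
    f.zf = (r == 0);
    f.sf = (r >> 31) != 0;
    f.cf = false;                  /* TEST always clears CF and OF */
    f.of = false;
    return f;
}
```

For instance, test32(eax, eax) sets zf exactly when eax is 0, which is why "test eax, eax" can stand in for "cmp eax, 0".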

Now that the more commonly used opcodes for conditional statements have been introduced, let's dive into the topic itself.

How to implement conditional statement in assembler

In this section, I will give some pseudo code and then show how it would look in assembler.


IF statement

Pseudo code:
IF eax < 25
//do something here
ENDIF
 
Assembler:
cmp eax, 25
jnc _out
;do something here
_out:
 
HLA (low-level syntax):
 
cmp( eax, 25 );
jnb _out;
  // do something here
_out:
 
 
MASM/TASM (high-level syntax):
 
.if eax < 25
  ;do something here
.endif
 
HLA (high-level syntax):
 
if( eax < 25 ) then
  // do something here
endif;
Comment
I generally prefer jnc to jnb because jnc reads as "jump if the carry flag is not set", as opposed to jnb, which reads as "jump if not below". Both mnemonics assemble to the same opcode.


When using high-level control structures like HLA's "if" and MASM's .if, you have to be careful when comparing registers against values. By default, most high-level assemblers assume you're using unsigned comparisons. The following rarely does what the author expects:
if( eax > -1 ) then
   // do something
endif;
 
// equivalent low-level code:
 
cmp( eax, -1 );
jna endOfIf;
 // do something
endOfIf:

The problem is that -1 is equivalent to $ffff_ffff (0ffffffffh) and EAX, when treated as an unsigned value, is never greater than this value (hence the expression above is always false). You'll have to explicitly tell the assembler if you want to do a signed comparison, e.g.,

if( (type int32 eax) > -1 ) then
   // do something
endif;
 
// equivalent low-level code:
 
cmp( eax, -1 );
jng endOfIf;
  // do something
endOfIf:

Always be aware of the differences between signed and unsigned comparisons!
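The same trap can be reproduced in C, which makes it easy to experiment with. The two function names below are made up for this sketch, and the cast from uint32_t to int32_t assumes the usual two's complement behaviour that mainstream compilers provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned view (what ja/jna test): -1 becomes 0xFFFFFFFF,
   and nothing is above that, so this is false for every input. */
bool above_minus1(uint32_t eax) { return eax > (uint32_t)-1; }

/* Signed view (what jg/jng test): assumes a two's complement cast. */
bool greater_minus1(uint32_t eax) { return (int32_t)eax > -1; }
```

above_minus1 corresponds to the jna version and greater_minus1 to the jng version: only the second one is true for 0 and false for 80000000h.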


Pseudo code:
 
IF eax == 0
//do something here
ENDIF
 
HLA high-level syntax:
 
if( !eax ) then
  // do something here
endif;
 
Assembler:
 
test eax, eax ;set the flags
jnz _out
;do something here
_out:
 
HLA low-level syntax:
 
test( eax, eax );
jnz _out;
  // do something here
_out:
 
 
or
 
or eax, eax ;set the flags
jnz _out
;do something here
_out:
 
HLA syntax:
 
or( eax, eax );
jnz _out;
  // do something here
_out:
 
or
 
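xchg eax, ecx ;value to test is now in ecx (and the old ecx is in eax)
jecxz _out    ;careful: jecxz jumps when the value IS zero
;do something here (as written, this runs when the value was non-zero)
_out:
 
HLA syntax:
 
xchg( eax, ecx );
jecxz _out;   // careful: jumps when the value IS zero
  // do something here (as written, this runs when the value was non-zero)
_out:
Comment
This is probably one of the more common pieces of code seen in assembler (quite a number of Windows API functions return 0 in eax on error). "cmp eax, 0" is not used because test eax, eax and or eax, eax are shorter than the cmp. Call it size optimisation. The xchg/jecxz variant is 1 byte smaller still than the test and or variants, because xchg eax, reg is only 1 byte. It has drawbacks, however: the displacement must be within -128 to +127, the values of eax and ecx are swapped, and because jecxz jumps when the register is zero (there is no "jump if ecx is not zero" counterpart), the code as written skips the block when the value is zero, so it really implements IF eax != 0. The test instruction can also be used to test whether a bit is set. For example: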
Pseudo code:
 
IF eax is odd
edx++
ENDIF
 
Assembler code:
 
test eax, 1
jz _even
inc edx
_even:
 
HLA syntax:
 
test( 1, eax );
jz _even;
inc( edx );
_even:
 
 
or
 
shr eax, 1
jnc _even
inc edx
_even:
 
HLA syntax:
 
shr( 1, eax );
jnc _even;
inc( edx );
_even:
 
 
or
 
bt eax, 0
jnc _even
inc edx
_even:
 
HLA syntax:
 
bt( 0, eax );
jnc _even;
inc( edx );
_even:
 
or
 
shr eax, 1
adc edx, 0
 
HLA syntax:
 
shr( 1, eax );
adc( 0, edx );
 
or
 
bt eax, 0
adc edx, 0
 
HLA syntax:
 
bt( 0, eax );
adc( 0, edx );
Comment
The above codes are just examples of testing for "odd-ness", that is, whether the lowest bit is set or not. The test version should be easy to understand. The shr versions make use of the fact that the carry flag contains the last bit shifted out, while the bt versions make use of the instruction bt, which copies the selected bit into the carry flag. The adc variants avoid the jump altogether by adding the carry flag straight into edx. All in all, the test and bt versions do not destroy the value in eax, but the shr versions do. If you want to preserve the value, go for the test version or the bt version.
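The difference between the preserving and the destroying variants can be sketched in C as follows. Both function names are hypothetical; the second one takes eax by pointer precisely to show that the shift changes it.

```c
#include <stdint.h>

/* Model of "test eax, 1 / jz / inc edx": eax is left untouched. */
uint32_t add_if_odd_test(uint32_t eax, uint32_t edx)
{
    if ((eax & 1u) != 0)
        edx++;
    return edx;
}

/* Model of "shr eax, 1 / adc edx, 0": branchless, but eax is changed. */
uint32_t add_if_odd_shr(uint32_t *eax, uint32_t edx)
{
    uint32_t cf = *eax & 1u;   /* the bit shifted out lands in the carry flag */
    *eax >>= 1;
    return edx + cf;           /* adc edx, 0 adds the carry */
}
```

Both return the same edx for any input; they only differ in what is left in eax afterwards.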


Pseudo code:
 
IF eax > 47
edx = eax
ENDIF
 
MASM syntax (high-level):
 
.if eax > 47
mov edx, eax
.endif
 
HLA Syntax (high-level):
 
if( eax > 47 ) then
   mov( eax, edx );
endif;
 
Assembler:
 
cmp eax, 47
cmova edx, eax
 
HLA syntax:
 
cmp( eax, 47 );
cmova( eax, edx );
 
 
or
 
cmp eax, 47
jna @F
mov edx, eax
@@:
 
HLA syntax:
 
cmp( eax, 47 );
jna atF;
mov( eax, edx );
atF:
Comment
This is just an example of how the instruction CMOVcc could be used (sweet and short, huh?). It could be replaced by a conditional jump (as shown in the second example), but mispredicted jumps take up a lot of cycles. Do take note that CMOVcc was introduced with the P6 family of processors and may not be supported on all IA-32 processors.


Pseudo code:
 
IF eax == 9
ecx = 1
ELSE
ecx = 0
ENDIF
 
Assembler:
 
cmp eax, 9
setz cl
movzx ecx, cl
 
HLA syntax:
 
cmp( eax, 9 );
setz( cl );
movzx( cl, ecx );
Comment
This is just an example of how the instruction SETcc could be used. The setz stores 1 in cl if eax equals 9 and 0 otherwise, and the movzx zero-extends the value in cl to ecx. Note that ecx is therefore always written: it ends up as 0 when eax is not 9.


FOR statement (C version)

Pseudo code:
 
FOR (ecx=0;ecx<=9;ecx++){
array[ecx] = array2[ecx]
}
 
HLA high-level syntax:
 
for( xor( ecx, ecx ); ecx <= 9; inc( ecx )) do
  mov( array2[ ecx*4 ], eax );
  mov( eax, array[ ecx*4 ] );
endfor;
 
 
Assembler:
 
xor ecx, ecx
_loopstart:
mov eax, array2[ecx*4]
mov array[ecx*4], eax
inc ecx
cmp ecx, 9
jbe _loopstart
 
HLA syntax:
 
xor( ecx, ecx );
_loopstart:
  mov( array2[ ecx*4 ], eax );
  mov( eax, array[ ecx*4 ] );
  inc( ecx );
  cmp( ecx, 9 );
  jbe _loopstart;
 
or
 
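mov ecx,9
_loopstart:
mov eax, array2[ecx*4]
mov array[ecx*4], eax
dec ecx
jns _loopstart ;loop while ecx >= 0, so that index 0 is copied too
 
HLA syntax:
 
mov( 9, ecx );
_loopstart:
  mov( array2[ ecx*4 ], eax );
  mov( eax, array[ ecx*4 ] );
  dec( ecx );
  jns _loopstart;  // loop while ecx >= 0, so that index 0 is copied too
Comment
Both examples do the same thing in this context, but the second example is one instruction shorter than the first: it counts down from 9 to 0 and lets dec set the flags, so no cmp is needed. The jump is jns rather than jnz because a jnz would leave the loop as soon as ecx reached 0, before element 0 had been copied. Also, in both examples ecx is used as the counter. This need not be the case; you can use any of the other registers as the counter.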
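When turning a count-up loop into a count-down loop it is worth checking that both touch the same elements. The C sketch below models a loop that counts up and compares against the bound, and one that counts down and stops once the counter goes negative (which is what dec followed by jns does). COUNT, copy_up and copy_down are names of my own choosing.

```c
#include <stdint.h>

#define COUNT 10u   /* elements 0..9, as in the pseudo code */

/* Counts up and compares against the upper bound (inc / cmp / jbe). */
void copy_up(uint32_t *dst, const uint32_t *src)
{
    uint32_t i = 0;
    do {
        dst[i] = src[i];
        i++;
    } while (i <= COUNT - 1u);
}

/* Counts down and stops once the counter goes negative (dec / jns). */
void copy_down(uint32_t *dst, const uint32_t *src)
{
    int32_t i = (int32_t)COUNT - 1;
    do {
        dst[i] = src[i];
        i--;
    } while (i >= 0);
}
```

A count-down loop that instead stops as soon as the counter reaches zero would never copy element 0, so the exit condition matters more than it may seem.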


IF-THEN-ELSE statement

Pseudo code:
 
IF (ecx<eax)
edx = 8
ELSE
edx = 16
ENDIF
 
Assembler:
 
cmp ecx, eax
sbb edx, edx  ;edx = -1 if ecx < eax (carry set), else 0
and edx, -8   ;edx = -8 or 0
add edx, 16   ;edx = 8 or 16
 
HLA syntax:
 
cmp( ecx, eax );
sbb( edx, edx );
and( -8, edx );
add( 16, edx );
 
or
 
cmp ecx, eax
jnc @F
mov edx, 8
jmp _@@
@@:
mov edx, 16
_@@:
 
HLA syntax:
 
cmp( ecx, eax );
jnb notLessThan;
  mov( 8, edx );
  jmp endOfIF;
 
notLessThan:
  mov( 16, edx );
 
endOfIF:
Comment
Personally I prefer the first code because there is no branching. Instead, the carry flag is used to set edx to -1 (ecx < eax) or 0. The and instruction then turns this into -8 or 0, and finally the add instruction adjusts it to the desired number: 8 or 16.
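Mask tricks of this kind are easy to get backwards, so it helps to model them in C before trusting them. In the sketch below, borrow_mask stands in for what cmp followed by sbb edx, edx leaves in edx; all three function names are hypothetical.

```c
#include <stdint.h>

/* Model of "cmp ecx, eax / sbb edx, edx": all ones if ecx < eax, else 0. */
uint32_t borrow_mask(uint32_t ecx, uint32_t eax)
{
    return (ecx < eax) ? 0xFFFFFFFFu : 0u;
}

/* and with -8, then add 16: 8 when ecx < eax, otherwise 16. */
uint32_t select_8_or_16(uint32_t ecx, uint32_t eax)
{
    return (borrow_mask(ecx, eax) & (uint32_t)-8) + 16u;
}

/* and with 8, then add 8: the two values come out the other way round. */
uint32_t select_16_or_8(uint32_t ecx, uint32_t eax)
{
    return (borrow_mask(ecx, eax) & 8u) + 8u;
}
```

The choice of constants decides which branch of the IF gets which value: masking with -8 and adding 16 gives 8 for the "below" case, while masking with 8 and adding 8 gives 16 for it.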


Advanced IF statements

Pseudo code:
 
IF EAX>="0" && EAX<="9"
;do something
ENDIF
 
HLA high-level syntax:
 
if( eax >= '0' && eax<= '9' ) then
  // do something
endif;
 
also:
 
if( eax in '0'..'9' ) then
  // do something
endif;
 
Assembler:
 
cmp eax, "0"
jc @F
cmp eax, "9"
ja @F
;do something
@@:
 
HLA low-level syntax:
 
cmp( eax, '0' );
jnae atF;
cmp( eax, '9' );
jnbe atF;
  // do something
atF:
 
 
or
 
lea ecx, [eax-"0"]
cmp ecx, "9"-"0"
ja @F
;do something
@@:
 
HLA syntax:
 
lea( ecx, [eax-@byte('0')]);
cmp( ecx, 9 );
jnbe atF;
  // do something
atF:
 
or
 
sub( '0', eax );  // or xor( '0', eax );
cmp( eax, 9 );
jnbe atF;
  // do something
atF:
Comment
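The second code makes use of one extra register, but it needs only one conditional jump, while the first code needs two. It works because subtracting "0" maps the range "0".."9" onto 0..9, and any value below "0" wraps around to a very large unsigned number, so a single unsigned comparison covers both bounds. Generally, the best optimised code is code that needs no conditional jumps at all. Kudos to Nexo for coming up with the second code.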
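The subtract-then-compare trick carries over directly to C, where unsigned arithmetic wraps around in the same way. The two function names below are made up for this sketch; the second one mirrors the lea/cmp/ja sequence.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two comparisons, as in the cmp/jc/cmp/ja version. */
bool is_digit_two_tests(uint32_t eax)
{
    return eax >= (uint32_t)'0' && eax <= (uint32_t)'9';
}

/* One comparison: values below '0' wrap around to huge unsigned numbers. */
bool is_digit_one_test(uint32_t eax)
{
    uint32_t ecx = eax - (uint32_t)'0';   /* lea ecx, [eax-"0"] */
    return ecx <= 9u;                     /* cmp ecx, 9 / ja */
}
```

The two functions agree for every 32-bit input; the character just below "0" (the slash) is the interesting case, since it becomes 0FFFFFFFFh after the subtraction.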


SWITCH-CASE statements

Pseudo code:
 
SWITCH eax
case 0:
   mov ecx, 7
   break
case 1:
   mov edx, 8
   break
case 2:
   mov ecx, 9
   break
case 3:
   mov edx, 10
   break
case 4:
   mov ecx, 11
   break
default:
   or ecx, -1
   break
END SWITCH
 
HLA switch macro (from the HLA standard library):
 
switch( eax )
  case( 0 )
   mov( 7, ecx );
 
  case( 1 )
   mov( 8, edx );
 
  case( 2 )
   mov( 9, ecx );
 
  case( 3 )
   mov( 10, edx );
 
  case( 4 )
   mov( 11, ecx );
 
  default
   or( -1, ecx );
 
endswitch;
 
 
Assembler:
 
.data
jmptable dd offset _0, offset _1, offset _2, offset _3, offset _4
.code
cmp eax, 4
ja _default
jmp jmptable[eax*4]
_0:
mov ecx, 7
jmp @F
_1:
mov edx, 8
jmp @F
_2:
mov ecx, 9
jmp @F
_3:
mov edx, 10
jmp @F
_4:
mov ecx, 11
jmp @F
_default:
or ecx, -1
@@:
 
HLA Syntax:
 
static
  jmptable: dword[] := [&_0, &_1, &_2, &_3, &_4 ];
endstatic;
 
cmp( eax, 4 );
ja _default;
jmp( jmptable[eax*4] );
 
_0:
  mov( 7, ecx );
  jmp atF;
 
_1:
  mov( 8, edx );
  jmp atF;
 
_2:
  mov( 9, ecx );
  jmp atF;
 
_3:
  mov( 10, edx );
  jmp atF;
 
_4:
  mov( 11, ecx );
  jmp atF;
 
_default:
  or( -1, ecx );
 
atF:
Comment
The above example just introduces you to the idea of a jump table. One may wonder why a single conditional jump is enough to test whether eax is within range. This is because we are using an unsigned comparison, under which negative numbers look "bigger" than any valid case number (this is the same concept as in Nexo's code in the previous example).
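For readers more at home in C, a jump table corresponds roughly to an array of function pointers guarded by one unsigned bounds check. The sketch below is only an analogy under my own assumptions: Regs, CaseHandler, dispatch and the case functions are hypothetical names, and real jump tables hold code addresses rather than function pointers.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two registers the cases write to. */
typedef struct {
    uint32_t ecx, edx;
} Regs;

typedef void (*CaseHandler)(Regs *);

static void case0(Regs *r) { r->ecx = 7; }
static void case1(Regs *r) { r->edx = 8; }
static void case2(Regs *r) { r->ecx = 9; }
static void case3(Regs *r) { r->edx = 10; }
static void case4(Regs *r) { r->ecx = 11; }
static void case_default(Regs *r) { r->ecx |= 0xFFFFFFFFu; }   /* or ecx, -1 */

/* Plays the role of jmptable: one entry per case value. */
static const CaseHandler jmptable[5] = { case0, case1, case2, case3, case4 };

void dispatch(uint32_t eax, Regs *r)
{
    if (eax > 4u) {            /* cmp eax, 4 / ja _default */
        case_default(r);
        return;
    }
    jmptable[eax](r);          /* jmp jmptable[eax*4] */
}
```

Because eax is treated as unsigned, a value such as -1 arrives as 0FFFFFFFFh and is sent to the default case by the same check that rejects 5 and above.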