Hi mates !
I'm working on a project that uses several time-critical functions that I've tried to optimize as much as possible. I'll post one of them here in case more optimizations can be done. Maybe someone can also recommend a good book on optimization!?
Here is the function:
push eax
push edx
lea esi,buffer
movzx eax,[esi+7]
shl eax,1
test ah, ah
jz __cont
xor eax,0x0163
__cont:
xor eax,[esi+5]
xor eax,[esi+2]
mov edx,[esi+3]
mov [esi+4],edx
mov edx,[esi]
mov [esi+1],edx
movzx edx, word ptr crypt
xor al,dl
mov [esi],al
pop edx
pop eax
Hope someone can take some time and help me ...
Br;)
www.agner.org is pretty good - and if you want the full gory details, with no friendly interpretation, there's the Pentium documentation at http://developer.intel.com/design/Pentium4/documentation.htm (there are PDFs for a couple of earlier models as well). AMD also has some docs.
Your code looks pretty bad from a quick glimpse :) (partial register usage, non-aligned data reference, conditional jmp) - oh, and you should perhaps give a short description of the code rather than just the code itself. Not too bad in this example since it's so short, but it's always helpful...
DANNY
1- Which assembler syntax are you using, MASM?
2- Have you tried assembling the code you posted?
For example, movzx eax,[esi+7] should be rejected by most assemblers as incomplete, because the size of the memory operand is not specified.
Numerous instructions give the impression you are trying to work with memory BYTES, although the instruction itself would actually use memory DWORDS (e.g. xor eax,[esi+5]).
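To make the BYTE vs DWORD point concrete: a 32-bit memory operand at buffer + k reads four consecutive bytes, buffer[k] through buffer[k+3], and on little-endian x86 buffer[k] ends up in the low byte (AL). A small C sketch of what such a load fetches (the helper name load_le32 is hypothetical, used only for illustration):

```c
#include <stdint.h>

/* What a DWORD memory operand at p reads on x86 (little-endian):
   four bytes, with p[0] in the low byte (AL) and p[3] in the top byte.
   load_le32 is a hypothetical helper name. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

So a DWORD XOR at offset k also touches the three bytes after buffer[k], even if only AL is used afterwards, and the buffer must be long enough for all four bytes to be readable.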
f0dder's suggestion of providing a short description of what you are trying to achieve would certainly be to your advantage.
Raymond
Hi !
I used that function as inline asm in BCB 6.0, and indeed the buffer pointed to by esi is a byte array. The function modifies the buffer in place; in C it would look like this:
void SUB_24B77A(unsigned char buffer[8], unsigned short crypt) /* indices 0..7 are used */
{
    unsigned short r14;
    short i;
    r14 = buffer[7]*2;
    if ( (r14 >> 8) ) r14 = r14 ^ 0x0163;
    r14 = (buffer[5] ^ r14) ^ buffer[2];
    for(i=6;i>=0;i--) buffer[i+1] = buffer[i]; /* shift bytes 0..6 up by one */
    buffer[0] = (unsigned char) (crypt ^ r14);
}
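For anyone who wants to compile and test the C version, here is a minimal sketch of the same routine using <stdint.h> types, assuming an 8-byte buffer (the code indexes buffer[0] through buffer[7]). The name sub_24b77a_ref is hypothetical, and the shift loop is written with memmove, which performs the same overlapping copy:

```c
#include <stdint.h>
#include <string.h>

/* Reference version of SUB_24B77A, assuming an 8-byte buffer.
   sub_24b77a_ref is a hypothetical name used for illustration. */
void sub_24b77a_ref(uint8_t buffer[8], uint16_t crypt)
{
    uint16_t r14 = (uint16_t)(buffer[7] * 2);       /* at most 0x1FE */
    if (r14 >> 8)                                   /* top bit of buffer[7] was set */
        r14 ^= 0x0163;
    r14 = (uint16_t)((buffer[5] ^ r14) ^ buffer[2]);
    /* Same as: for (i = 6; i >= 0; i--) buffer[i + 1] = buffer[i];
       memmove is defined for overlapping regions. */
    memmove(buffer + 1, buffer, 7);
    buffer[0] = (uint8_t)(crypt ^ r14);             /* only the low byte is kept */
}
```

Usage: with buffer = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x80} and crypt = 0x0011, the feedback is 0x66 and the buffer becomes {0x77, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07}.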
Br;)
Firstly, notice that all the operations work on byte-sized portions; in fact, if you use the carry flag you can avoid the 16-bit accesses altogether.
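To illustrate the carry-flag point, here is a small C sketch (function names are hypothetical) comparing the 16-bit feedback step from the C version with a byte-only one. Doubling the byte in 8 bits drops bit 8, which is exactly the carry out, and the 16-bit XOR with 0x0163 just clears that bit and XORs the low byte with 0x63:

```c
#include <stdint.h>

/* 16-bit feedback step, as in the C version of SUB_24B77A. */
uint8_t feedback16(uint8_t b7)
{
    uint16_t r = (uint16_t)(b7 * 2);
    if (r >> 8)
        r ^= 0x0163;      /* clears bit 8 and XORs the low byte with 0x63 */
    return (uint8_t)r;    /* bit 8 is always 0 at this point */
}

/* Byte-only version: the bit shifted out of the top (the carry flag
   after a byte-sized shift or add) decides whether to XOR with 0x63. */
uint8_t feedback8(uint8_t b7)
{
    uint8_t carry = (uint8_t)(b7 >> 7);   /* the bit that becomes CF */
    uint8_t r = (uint8_t)(b7 << 1);       /* low 8 bits of b7 * 2 */
    if (carry)
        r ^= 0x63;
    return r;
}
```

Both functions agree on all 256 input bytes, so only the low byte plus the carry is ever needed.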
As you are only shifting by 1, I'd advise using add instead of the shift: on the Pentium 4 architecture this executes on the double-pumped ALU, effectively taking only half a clock.
mov al, [esi + 7]
add al, al
Now the conditional XOR can be done using "jnc" instead, or, without a branch at all, you can do this:
sbb ah, ah
and ah, 063h
xor al, ah
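The sbb/and/xor sequence turns the carry into a mask without branching: sbb ah, ah computes ah - ah - CF, i.e. 0xFF if CF was set and 0x00 otherwise, and the AND then leaves 0x63 or 0. A rough C equivalent (the function name is hypothetical, and whether a given compiler emits exactly this instruction sequence is not guaranteed):

```c
#include <stdint.h>

/* Branch-free feedback step mirroring:
       add al, al      ; al = low byte of b7 * 2, CF = old bit 7
       sbb ah, ah      ; ah = CF ? 0xFF : 0x00
       and ah, 063h    ; ah = CF ? 0x63 : 0x00
       xor al, ah                                      */
uint8_t feedback_branchless(uint8_t b7)
{
    uint8_t al = (uint8_t)(b7 << 1);
    uint8_t ah = (uint8_t)(0u - (unsigned)(b7 >> 7));  /* 0xFF or 0x00, like sbb ah, ah */
    ah &= 0x63;
    return (uint8_t)(al ^ ah);
}
```

For example, an input of 0x80 gives 0x63 and 0xFF gives 0xFE ^ 0x63 = 0x9D, with no conditional jump to mispredict.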
Then continue xoring as before, except only using a byte at a time.
Once you've done all that, take your code and shuffle it about a bit to make it difficult to read (remember not to comment it):
mov al, [esi + 7]
mov cl, [esi + 5]
mov ch, [esi + 2]
add al, al
mov edx, [esi + 3]
sbb ah, ah
xor al, cl
mov [esi + 4], edx
and ah, 63h
xor al, ch
mov edx, [esi]
xor al, ah
mov [esi + 1], edx
xor al, byte ptr crypt
mov [esi], al
You can of course make it smaller by removing the movs to CL and CH, but they remove the dependency on AL, so it should be a bit quicker (in theory).
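Putting the pieces together, here is a C sketch of what Mirno's version computes, assuming an 8-byte buffer: the feedback uses only byte reads plus the carry, the XOR with crypt uses only its low byte (which is all that ends up in buffer[0]), and the shift is done as two overlapping 4-byte copies in the same order as the two mov edx pairs. The function name sub_24b77a_bytes is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Byte-only version of SUB_24B77A, assuming an 8-byte buffer.
   sub_24b77a_bytes is a hypothetical name. */
void sub_24b77a_bytes(uint8_t buffer[8], uint16_t crypt)
{
    /* Read the feedback bytes before the copies overwrite them. */
    uint8_t b7 = buffer[7], b5 = buffer[5], b2 = buffer[2];
    uint8_t al = (uint8_t)(b7 << 1);                     /* add al, al */
    uint8_t mask = (uint8_t)(0u - (unsigned)(b7 >> 7));  /* sbb ah, ah */
    al ^= b5;                                            /* xor al, cl */
    al ^= b2;                                            /* xor al, ch */
    al ^= (uint8_t)(mask & 0x63);                        /* and ah, 63h / xor al, ah */

    uint8_t tmp[4];
    memcpy(tmp, buffer + 3, 4);   /* mov edx, [esi + 3] */
    memcpy(buffer + 4, tmp, 4);   /* mov [esi + 4], edx */
    memcpy(tmp, buffer, 4);       /* mov edx, [esi]     */
    memcpy(buffer + 1, tmp, 4);   /* mov [esi + 1], edx */

    buffer[0] = (uint8_t)(al ^ (uint8_t)crypt);          /* low byte of crypt only */
}
```

Doing the upper copy first matters: byte 3 is moved to byte 4 before the lower copy rewrites bytes 1..4, so every byte ends up shifted by exactly one position, as in the C loop.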
Mirno
Hrm, "SUB_24B77A" ...
I'm working on emulating part of a GSM device; I reversed these functions, but the trouble is they're too slow for my needs.
@Mirno:
Thank you very much for taking the time to explain things to me.
Br;)
Originally posted by DANNY
I also have an assembler optimization tips page you might find useful. It has more tips than agner's.
http://www.visionx.com/markl/optimization_tips.htm
Thank you!
I really appreciate your generous help, even for newer members like me!
Br;)