Hi,

I'm learning asm with Delphi 2005 and using it to optimize Delphi code. But it seems that Delphi generates faster code than my asm does. Can someone explain why Delphi's code is faster than mine, which uses rep stosd?

Here is the code:

function LetBitmap(filename: string): TBitmap;
var
  FDIB: TDIBSection;
  src: pointer;
  w, h, t1, t2, x, y, i: integer;
  p: pinteger;
begin
  result := TBitmap.Create;
  result.LoadFromFile(filename);
  result.PixelFormat := pf32bit;
  GetObject(result.Handle, sizeof(FDIB), @FDIB);
  src := FDIB.dsBm.bmBits;
  w := result.Width;
  h := result.Height;

  t1 := GetTickCount;
  for i := 1 to 100 do
    for y := 0 to h-1 do
    begin
      p := result.ScanLine[y];
      for x := 0 to w-1 do
      begin
        p^ := $ffff00;
      end;
    end;
  t2 := GetTickCount;
  ShowMessage(IntToStr(t2-t1));

  t1 := GetTickCount;
  for i := 1 to 100 do
    asm
      push edi           // an asm statement must preserve EDI
      mov edi, src       // Move the pixel array pointer into EDI,
      mov edx, h         // which is used by rep stosd.

    @innerloop:
      mov ecx, w
      mov eax, $ff00ff   // EAX holds the pixel value that is stored when
      rep stosd          // rep stosd executes.
      dec edx            // EDX plays the role of y and counts down from h to zero.
      jnz @innerloop     // If EDX has not reached zero yet, go back to innerloop.
      pop edi
    end;
  t2 := GetTickCount;
  ShowMessage(IntToStr(t2-t1));
end;
Posted on 2004-12-24 16:59:49 by brunoavila
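For comparison, here is a minimal C sketch of what the asm loop above does, assuming a top-down buffer of 32-bit pixels with no row padding. `rep stosd` stores EAX at [EDI], advances EDI by 4 bytes and decrements ECX until it reaches zero. The function names `rep_stosd` and `fill_rows` are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates "rep stosd": stores `value` ECX times at EDI, advancing EDI by
   one 32-bit dword per store. Returns the final EDI. */
static uint32_t *rep_stosd(uint32_t *edi, uint32_t value, size_t ecx)
{
    while (ecx > 0) {
        *edi++ = value;
        ecx--;
    }
    return edi;
}

/* Mirrors the asm loop: EDX counts rows down from h, and each row is one
   rep stosd of w pixels. EDI is never reset between rows, so this assumes
   the rows are contiguous (no padding), as with 32-bit pixels. */
static void fill_rows(uint32_t *src, size_t w, size_t h, uint32_t value)
{
    assert(h > 0); /* like "dec edx / jnz": the body runs at least once */
    uint32_t *edi = src;
    size_t edx = h;
    do {
        edi = rep_stosd(edi, value, w);
        edx--;
    } while (edx != 0);
}
```

Note that this version writes every one of the w * h pixels, so its cost grows with the size of the whole bitmap.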
What are the time results?
Probably the "for i := 1 to 100 do" part slows things down a bit. You should look at the asm both versions produce, and move the outer loop into asm, too.
Posted on 2004-12-25 05:10:58 by Ultrano
I think the value 100 is too small. Also, can you give us your results? I don't have a Delphi compiler to test your code.

Anyway, I can assure you that the Delphi compiler does not produce optimised code. I have looked at its output in a disassembler before. Yuck.
Posted on 2004-12-25 05:24:32 by roticv
On a Pentium 4 3.2 GHz with HT: 250 ms for the first loop and 937 ms for the second.

The loop "for i := 1 to 100 do" is only for measuring purposes.

Stepping through the Delphi code in the debugger, I see it doesn't use rep stosd. It implements the two for loops and uses mov , $ff00ff to assign the pixel's color.

I just don't understand. :(
Posted on 2004-12-25 21:22:51 by brunoavila
It could be that rep stosd is slower because the number of iterations is too small. What is the value of 'w'?

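The asm code can also be simplified to a single rep stosd with no loop, since with 32-bit pixels the rows have no padding and the bitmap is one contiguous block of w * h dwords: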

asm
  push edi
  mov edi, src
  mov ecx, h
  imul ecx, w        // ecx = w * h
  mov eax, $ff00ff
  rep stosd
  pop edi
end;
Posted on 2004-12-26 03:46:24 by Dr. Manhattan
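A minimal C sketch of why the single-pass fill is equivalent, assuming 32-bit pixels stored top-down: when the row stride equals w, filling w * h consecutive dwords touches exactly the same pixels as a row-by-row fill. The names `fill_whole` and `fill_by_rows` (and the `stride` parameter, in pixels) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Single-pass fill, like "mov ecx, h / imul ecx, w / rep stosd": the whole
   image is treated as one run of w * h dwords. */
static void fill_whole(uint32_t *src, size_t w, size_t h, uint32_t value)
{
    size_t count = w * h; /* ecx = w * h */
    for (size_t i = 0; i < count; i++)
        src[i] = value;
}

/* Row-by-row reference fill with an explicit stride in pixels. With
   stride == w it matches fill_whole; with stride > w the padding between
   rows is left untouched, which a single pass could not do. */
static void fill_by_rows(uint32_t *src, size_t w, size_t h, size_t stride,
                         uint32_t value)
{
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            src[y * stride + x] = value;
}
```

The single pass only works because 32-bit DIB rows never need padding; for formats whose rows are padded to a 4-byte boundary, a per-row loop is still required.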
  t1 := GetTickCount;
  for i := 1 to 100 do
    for y := 0 to h-1 do
    begin
      p := result.ScanLine[y];
      for x := 0 to w-1 do
      begin
        p^ := $ffff00;
        inc(p); // <<<-----------------
      end;
    end;
  t2 := GetTickCount;
  ShowMessage(IntToStr(t2-t1));


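I had missed this line, and that caused the big difference in time: without inc(p), p never advances, so the loop kept writing the first pixel of each row instead of filling the whole bitmap. And yes, I also arrived at the same single rep stosd code that you proposed.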

Thanks for the help.

Bruno
Posted on 2004-12-26 18:26:00 by brunoavila
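A minimal C sketch of the bug found above, assuming a top-down buffer of 32-bit pixels with no row padding (Delphi's ScanLine on a bottom-up DIB walks rows in reverse memory order, which does not change the count). The function names are hypothetical. Without the pointer increment, every store in a row lands on the same pixel, so the original Pascal loop did far less memory work than rep stosd.

```c
#include <stddef.h>
#include <stdint.h>

/* The first Pascal loop as originally posted: p points at the start of
   each row but is never incremented, so all w stores hit the same pixel. */
static void fill_without_inc(uint32_t *src, size_t w, size_t h, uint32_t value)
{
    for (size_t y = 0; y < h; y++) {
        uint32_t *p = src + y * w;  /* p := ScanLine[y] */
        for (size_t x = 0; x < w; x++)
            *p = value;             /* missing inc(p) */
    }
}

/* The corrected loop: advancing p after each store writes every pixel. */
static void fill_with_inc(uint32_t *src, size_t w, size_t h, uint32_t value)
{
    for (size_t y = 0; y < h; y++) {
        uint32_t *p = src + y * w;
        for (size_t x = 0; x < w; x++)
            *p++ = value;           /* p^ := value; inc(p) */
    }
}

/* Counts how many of the n pixels equal value. */
static size_t count_pixels(const uint32_t *src, size_t n, uint32_t value)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (src[i] == value);
    return count;
}
```

With w = 4 and h = 3, fill_without_inc sets only 3 of the 12 pixels (one per row), while fill_with_inc sets all 12.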
Phew :) I was actually starting to have some (even tiny) doubts about coding speed (ease) vs. code speed here. Good that this mystery was solved :-D
Posted on 2004-12-26 18:32:08 by Ultrano