Here's some code to skin a window by copying the pixel data directly from one bitmap to another.
The source bitmap is a 96x96 image made up of 32x32 skin tiles. My version doesn't require the destination bitmap's size to be a multiple of 32, but that doesn't really matter.
I know this looks a bit... messy. ;)



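    invoke GetObject,bMain,sizeof bminfo,addr bminfo
    mov esi,bminfo.bmBits           ; esi -> skin pixels (96x96, 32 bits per pixel)
    invoke GetObject,hDib,sizeof bminfo,addr bminfo
    mov edi,bminfo.bmBits           ; edi -> destination DIB pixels
    push 0                          ; section line counts, popped in reverse order:
    push 32                         ;   0 = done, 32 = last tile band,
    sub bminfo.bmHeight,64
    push bminfo.bmHeight            ;   bmHeight-64 = stretched middle section

    mov ebx,32                      ; first section: 32 lines from the first tile band
    jmp SectionLoop
BlockLoop:
    sub esi,4*32*32*3               ; rewind 32 source rows to repeat the middle band
SectionLoop:
    mov eax,32                      ; eax = lines in this block
    sub ebx,32
    jns LineLoop
    mov eax,ebx                     ; partial block: eax = ebx+32 lines
    ;neg eax
    add eax,32
LineLoop:
    ; Draw first 32 pixels
    mov ecx,32
    rep movsd
    ; Draw middle pixels
    mov ecx,bminfo.bmWidth
    sub ecx,64                      ; ecx = pixels in the stretched middle part
    mov edx,32                      ; edx = pixels left in the middle tile's row
MiddleLoop:
    movsd
    dec edx
    jnz @F
    mov edx,32                      ; reached the end of the middle tile's row:
    sub esi,32*4                    ;   wrap back to its first pixel
@@:
    dec ecx
    jnz MiddleLoop
    shl edx,2                       ; skip the unused rest of the middle tile's row
    add esi,edx
    ; Draw last 32 pixels
    mov ecx,32
    rep movsd
    dec eax
    jnz LineLoop
    cmp ebx,0
    jg BlockLoop                    ; more lines left in this section
    jns @F
    neg ebx                         ; partial block: skip the ebx source rows
    mov eax,4*32*3                  ;   that weren't drawn (384 bytes per row)
    mul ebx
    add esi,eax
@@:
    pop ebx                         ; next section's line count (0 = done)
    or ebx,ebx
    jnz SectionLoop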
Posted on 2002-09-07 05:36:43 by Qweerdy
What about this?


    [B]mov ebx, offset bminfo[/B]
    invoke GetObject,bMain,sizeof bminfo,[B]ebx[/B]
    mov esi,bminfo.bmBits
    invoke GetObject,hDib,sizeof bminfo,[B]ebx[/B]
    mov edi,bminfo.bmBits
    [B]mov ebx,32[/B]
    push 0
    push [B]ebx[/B]
    sub bminfo.bmHeight,64
    push bminfo.bmHeight

    jmp SectionLoop
BlockLoop:
    sub esi,4*32*32*3
SectionLoop:
    mov eax,32
    sub ebx,32
    jns LineLoop
    mov eax,ebx
    ;neg eax
    add eax,32
LineLoop:
    ; Draw first 32 pixels
    mov ecx,32
    rep movsd
    ; Draw middle pixels
    mov [COLOR=deeppink]ecx[/COLOR],bminfo.bmWidth
    [B]mov [COLOR=deeppink]edx[/COLOR],32
    sub [COLOR=deeppink]ecx[/COLOR],64[/B]
MiddleLoop:
    movsd
    dec edx
    jnz @F
    mov edx,32
    sub esi,32*4
@@:
    dec ecx
    jnz MiddleLoop
    shl edx,2
    add esi,edx
    ; Draw last 32 pixels
    mov ecx,32
    rep movsd
    dec eax
    jnz LineLoop
    cmp ebx,0
    jg BlockLoop
    jns @F
    neg ebx
    mov eax,4*32*3
    mul ebx
    add esi,eax
@@:
    pop ebx
    or ebx,ebx
    jnz SectionLoop

Pushing registers is faster than pushing immediates (reg 1 clk, immed 2-3 clks), and I think the lines with the registers in pink pair now. I think the jumps could be faster if they're short, but when I looked in a reference it appeared as if they're slower :confused: .
Posted on 2002-09-07 06:52:24 by scientica
Well,

    mov eax,ebx
    add eax,32

can be changed to

    lea eax,[ebx+32]
Posted on 2002-09-07 11:08:29 by Eóin
OK, good suggestions. Nothing really new, though; I just didn't look hard enough :) Yeah, yeah, they always say that...

Anyway, thanks guys :alright:
Posted on 2002-09-09 09:48:43 by Qweerdy
I'm not sure about this, but IMHO:

    mov edi,bminfo.bmBits
    mov ebx,32
    push 0
    [B]sub bminfo.bmHeight,64
    push ebx[/B]
    push bminfo.bmHeight

I suppose Pentium processors can take advantage of their architecture here: with the code ordered like this, it should run faster, because the sub and the push are data-independent, so the two instructions can execute together in the same clock cycle instead of one after another as in the previous version.

Feel free to blame me ;)
Posted on 2002-10-08 02:36:40 by Adderek
Hmmm, yeah, OK, but that code isn't inside the loop, so I guess it won't matter that much. But an optimization is an optimization, even if you only win 2 cycles...
Posted on 2002-10-08 07:10:59 by Qweerdy
I think I've read somewhere that push/push and pop/pop can pair. I don't know where I read that, or whether it's true, but I don't think they pair, since both modify the same register (esp).

2 cycles is like two years to some people... :eek: :grin:
(and two cycles in a loop executed a "few" times add up to quite a few cycles...).

I haven't made any calculations, but I think if one used two registers for the 32 and 64 values, the loop would be a few cycles faster (and smaller: IIRC, moving from register to register encodes smaller than moving an immediate to a register). edi could be used to store 32 (it's used quite frequently).

BTW, size can matter too: if the code fits in a cache line, it's faster. I don't know how large a cache line is, but I think it was something like 32 <something>.
Posted on 2002-10-08 08:23:14 by scientica
This looks optimal, but doesn't allow a pattern effect in the center:
Posted on 2002-10-12 16:06:04 by bitRAKE