Has anyone got a fast alpha-blitting routine for 16-bit mode? I mean one that uses a per-pixel alpha channel rather than a single alpha value for the whole image.

My own routine is pretty fast, but I doubt it is optimal. It blits 512 32x32 images per frame in 640x480x16 at 32 fps on my 500 MHz PIII with a lousy Matrox G200 card. That figure also includes the physics for the objects I'm blitting.
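For reference, here is a plain C sketch of the operation being asked about: a 565 source blended over a 565 destination using a separate 8-bit alpha plane. It is not the routine described above, only a slow baseline to compare optimized versions against. The function names, the separate alpha plane, and a destination pitch measured in pixels are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One channel: (s*a + d*(255-a)) / 255, rounded to nearest. */
static inline uint32_t blend_channel(uint32_t s, uint32_t d, uint32_t a) {
    return (s * a + d * (255u - a) + 127u) / 255u;
}

/* Blend one RGB565 source pixel over one RGB565 destination pixel. */
uint16_t blend565(uint16_t src, uint16_t dst, uint8_t alpha) {
    uint32_t r = blend_channel((uint32_t)src >> 11, (uint32_t)dst >> 11, alpha);
    uint32_t g = blend_channel((src >> 5) & 0x3Fu, (dst >> 5) & 0x3Fu, alpha);
    uint32_t b = blend_channel(src & 0x1Fu, dst & 0x1Fu, alpha);
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Blit a w*h sprite. src and alpha are assumed tightly packed (w entries
 * per row); dst_pitch_px is the destination row stride in pixels. */
void alpha_blit565(uint16_t *dst, size_t dst_pitch_px,
                   const uint16_t *src, const uint8_t *alpha,
                   size_t w, size_t h) {
    for (size_t y = 0; y < h; ++y) {
        for (size_t x = 0; x < w; ++x) {
            uint16_t *p = &dst[y * dst_pitch_px + x];
            *p = blend565(src[y * w + x], *p, alpha[y * w + x]);
        }
    }
}
```

The three divisions per pixel are what an optimized version would replace with shifts or packed multiplies.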
Posted on 2001-10-12 01:03:47 by gliptic
Nobody seems to be interested in Alphablitting. How strange. I think it's very cool.

I've made an MMX version of the alphablitting routine and it runs four times faster than the original.
Posted on 2001-10-17 00:56:25 by gliptic
Post your code :)
Posted on 2001-10-17 06:08:41 by f0dder
Maybe. I haven't got it here. Just wanted to check if you had some cool algorithms for it. I can't make it faster without another algorithm.
Posted on 2001-10-17 08:51:27 by gliptic
Again I'm a bit late...
16bpp 565 effects are just discouraging to optimize because of the pixel format. I've been fighting this problem on both x86 and ARM, and the only optimization that can be done is in (pre)fetching the data from RAM.
Sure, on x86 MMX gets things done a bit better, but if you use 32bpp internally (and you don't have crap RAM bandwidth), MMX can be much more helpful.

Btw, try using additive blending more: it's much faster, looks better in most cases, and it's easy to fade/control its strength (via an additional multiplication). MMX supports it perfectly at 24/32bpp :)
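A rough scalar sketch of additive blending with a fade factor on 32bpp pixels, to show the arithmetic. The function names and the XRGB8888 layout are assumptions for illustration. On MMX the saturating add maps onto PADDUSB, which handles eight bytes at once; the loop below only does one pixel.

```c
#include <stdint.h>

/* Unsigned saturating add of two 8-bit values held in 32-bit ints. */
static inline uint32_t add_sat8(uint32_t a, uint32_t b) {
    uint32_t s = a + b;
    return s > 255u ? 255u : s;
}

/* dst + src*fade/255 per colour channel, clamped at 255.
 * Pixels are assumed to be XRGB8888; the top byte is returned as zero. */
uint32_t additive_blend_xrgb(uint32_t dst, uint32_t src, uint8_t fade) {
    uint32_t out = 0;
    for (int shift = 0; shift < 24; shift += 8) {
        uint32_t s = ((src >> shift) & 0xFFu) * fade / 255u; /* faded source */
        uint32_t d = (dst >> shift) & 0xFFu;
        out |= add_sat8(d, s) << shift;
    }
    return out;
}
```

There is no destination multiply at all, which is where most of the speed advantage over ordinary alpha blending comes from.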

On newer PCs (especially ones with an ATi card), custom drawing directly on DDraw surfaces is slow (your crappy video card behaves better there). In my DDraw projects I draw into a custom-allocated buffer with custom routines (no DDSurface::Blt), and finally splash it onscreen.
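The "splash it onscreen" step can be sketched roughly like this in C, assuming a 16bpp system-memory buffer and a surface pointer and pitch obtained by locking the surface. `present_buffer` and its parameters are made-up names, not part of DirectDraw.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a tightly packed 16bpp off-screen buffer to a locked surface.
 * surface_pitch is the surface row stride in BYTES and may be larger
 * than width * 2, so rows are copied one at a time. */
void present_buffer(uint8_t *surface, size_t surface_pitch,
                    const uint16_t *buffer, size_t width, size_t height) {
    size_t row_bytes = width * sizeof(uint16_t);
    for (size_t y = 0; y < height; ++y) {
        memcpy(surface + y * surface_pitch, buffer + y * width, row_bytes);
    }
}
```

All blending then reads and writes ordinary RAM, and video memory only sees one sequential write per frame.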

Most people have moved to 2D-via-3D. It's the best way, though I hate to admit it.
Posted on 2006-03-24 18:49:24 by Ultrano
Here is one from my trainer engine:

push ebp
mov ,esp

sub edi,2

mov eax,
add eax,eax
mov edx,
sub ,eax
mov ,edx

.line: mov eax,
mov ,eax
.pixel: movzx edx,byte ; A---          u
mov eax, ; -R-B
mov ecx, ; --G-
and eax,$00FF00FF
and ecx,$0000FF00
imul eax,edx
imul ecx,edx
mov ebx,eax
shr ecx,13
shr eax,11
shr ebx,16
and ecx,$07E0
and eax,$001F
and ebx,$F800
xor edx,$FF
add ecx,eax
add edi,2
add ecx,ebx

shr edx,2

movzx eax,word ; -R-B
movzx ebx,word ; --G-
and eax,$F81F
and ebx,$07E0
imul eax,edx
imul ebx,edx
and eax,$F81F shl 6
and ebx,$07E0 shl 6
add eax,ebx
add esi,4
shr eax,6
add eax,ecx

mov ,ax
jg .pixel
add edi,
jg .line

mov esp,
pop ebp

esi -> source frame
edi -> destination
fwidth -> width of source, in pixels
fheight -> height of source, in pixels
lockrc.Pitch -> width of a single scanline, in bytes (this is usually fwidth*number_of_bytes_per_pixel aligned to a dword boundary)
x -> temporary variable
y -> temporary variable
stackp -> temporary variable
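For anyone who finds the listing hard to follow, here is my reading of the per-pixel arithmetic as a C sketch. It assumes the source pixel is 32-bit with alpha in the top byte (going by the `A---` / `-R-B` / `--G-` comments) and the destination is RGB565; the function name is invented, and this is an interpretation, not comrade's code.

```c
#include <stdint.h>

/* Blend one 32-bit ARGB source pixel (layout assumed) over an RGB565 pixel. */
uint16_t blend_argb_over_565(uint32_t src, uint16_t dst) {
    uint32_t a = src >> 24;                    /* 8-bit alpha, 0..255 */

    /* Source: scale R and B with one multiply and G with another,
     * then shift each product down into its 565 field. */
    uint32_t rb = (src & 0x00FF00FFu) * a;
    uint32_t g  = (src & 0x0000FF00u) * a;
    uint32_t s565 = ((rb >> 16) & 0xF800u)     /* red   */
                  | ((g  >> 13) & 0x07E0u)     /* green */
                  | ((rb >> 11) & 0x001Fu);    /* blue  */

    /* Destination: scale by a 6-bit inverse alpha (0..63) in place. */
    uint32_t inv6 = (a ^ 0xFFu) >> 2;
    uint32_t drb = ((dst & 0xF81Fu) * inv6) & (0xF81Fu << 6);
    uint32_t dg  = ((dst & 0x07E0u) * inv6) & (0x07E0u << 6);

    return (uint16_t)(s565 + ((drb + dg) >> 6));
}
```

One thing that falls out of this arithmetic: the inverse alpha tops out at 63/64, so a fully transparent source pixel still darkens the destination very slightly.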
Posted on 2006-03-26 11:08:32 by comrade
You can always use MMX to manipulate the individual color components using fixed point.

EDIT: Since you already wrote an MMX version, maybe try the cache control instructions?
Instead of writing with MOVQ, use MOVNTQ. I think you may be limiting support to SSE-capable chips (sorry to the Thunderbird dudes), though I think my 900 MHz Athlon executed that instruction; I don't remember.

It did provide a rather large (definitely noticeable) increase in performance on a 1.3 GHz Duron when I made use of it.
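A rough C sketch of the same idea with compiler intrinsics. Note that it uses the 128-bit SSE2 streaming store (`_mm_stream_si128`, i.e. MOVNTDQ) rather than MOVNTQ itself, and falls back to `memcpy` when SSE2 is unavailable or the destination is not 16-byte aligned. The function name is made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy bytes from src to dst using non-temporal (cache-bypassing) stores
 * where possible. Useful when dst will not be read again soon. */
void copy_row_streaming(void *dst, const void *src, size_t bytes) {
#if defined(__SSE2__)
    if (((uintptr_t)dst & 15u) == 0) {          /* stream stores need alignment */
        uint8_t *d = (uint8_t *)dst;
        const uint8_t *s = (const uint8_t *)src;
        size_t i = 0;
        for (; i + 16 <= bytes; i += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
            _mm_stream_si128((__m128i *)(d + i), v);
        }
        _mm_sfence();                           /* make streamed data visible */
        memcpy(d + i, s + i, bytes - i);        /* leftover tail, if any */
        return;
    }
#endif
    memcpy(dst, src, bytes);
}
```

Whether this helps depends on the access pattern: it tends to pay off for the final write to the frame buffer, and can hurt if the destination is read back right away, as a blend would do.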
Posted on 2006-03-27 15:58:55 by x86asm