Hi guys, I want to set up DirectDraw for hardware flipping. My SMS emulator is starting to come to life and runs many games, but it slows down at 640x480 and speeds up considerably when I drop the resolution to 320x240 or 320x200. I guess software flipping isn't going to work.

I read somewhere that DirectDraw doesn't copy the whole buffer but instead swaps the front- and back-buffer pointers that the hardware uses to scan out the image, which sounds like a good idea. If that is how it works I could save a lot of memory bandwidth; even if the GPU does the copying, I think it would still beat my code.
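A rough model of the difference, as a sketch only: `FlipChain`, `flip_by_copy` and `flip_by_swap` are hypothetical names, not DirectDraw API. The copy version moves every byte of the frame, so its cost grows with the resolution; the swap version only exchanges two pointers, which is conceptually what a hardware flip does when the display is told to scan out from the other surface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a two-surface flip chain. On real hardware the
   "front" pointer is the address the display controller scans out. */
typedef struct {
    uint32_t *front;   /* surface currently being displayed */
    uint32_t *back;    /* surface the program draws into     */
    size_t    pixels;  /* pixels per surface                 */
} FlipChain;

/* Software "flip": copy every pixel of the back buffer into the front
   buffer. Cost grows with frame size (pixels * 4 bytes moved). */
static void flip_by_copy(FlipChain *c)
{
    assert(c->front != c->back);
    memcpy(c->front, c->back, c->pixels * sizeof(uint32_t));
}

/* Hardware-style flip: exchange the two pointers. No pixel data moves,
   so the cost is the same at 320x240 and at 640x480. */
static void flip_by_swap(FlipChain *c)
{
    uint32_t *tmp = c->front;
    c->front = c->back;
    c->back = tmp;
}
```

After `flip_by_swap` the old back buffer is displayed and the old front buffer becomes the new drawing target, with its previous contents still in it.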

So how do I set up a hardware flip surface (how do I tell DirectDraw that I want the hardware to flip the front and back buffers)? Also, for alpha blending I will be reading the back buffer, so should I place it in system memory by using the system-memory flag?

Thanks guys

If you have optimization tips, here is the code I use to flip my surface (MMX). Please remember that I only have an Athlon (non-XP), so I can't use SSE instructions; I would also like the software to run on older processors:



;Copy the back buffer (backptr) to the locked surface (lpvidmem),
;then unlock the surface. vidsize is the frame size in bytes.
mov al,mmx              ;MMX flag, initialized at load time
mov ecx,vidsize
mov edi,lpvidmem        ;destination: locked surface memory
mov esi,backptr         ;source: back buffer
.IF al==1
shr ecx,3               ;number of 8-byte blocks (remainder bytes are not copied)
mmxcopy:
movq mm0,[esi]
add esi,8
movq [edi],mm0
add edi,8
dec ecx
jnz mmxcopy
emms                    ;empty the MMX state before any x87 FPU code runs
.ELSE
;Plain CPU copying
shr ecx,2               ;number of dwords (remainder bytes are not copied)
rep movsd               ;repeated dword move
.ENDIF
DDSINVOKE Unlock, lpSurf, NULL  ;unlock the surface before returning
ret



Basically, I check the MMX flag, which is initialized at load time: if it is 1 (MMX is supported) the MMX loop copies the buffer, otherwise the REP MOVSD path is used.
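The same dispatch written in C, as a sketch under a few assumptions: `has_mmx` stands in for the `mmx` flag detected at load time (the CPUID detection itself is not shown), and `copy_qwords`/`copy_dwords` are hypothetical names that mimic the access pattern of the MOVQ loop and of REP MOVSD; a C compiler will not necessarily emit those exact instructions. Both routines also copy the leftover bytes, which the assembly drops when vidsize is not a multiple of 8 (or 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8 bytes per iteration (like the MOVQ loop), then the leftover tail. */
static void copy_qwords(void *dst, const void *src, size_t bytes)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(dst != NULL && src != NULL);
    for (size_t blocks = bytes / 8; blocks > 0; blocks--) {
        uint64_t q;
        memcpy(&q, s, 8);   /* alignment-safe 8-byte load  */
        memcpy(d, &q, 8);   /* alignment-safe 8-byte store */
        s += 8;
        d += 8;
    }
    for (size_t i = 0; i < bytes % 8; i++)   /* tail bytes */
        d[i] = s[i];
}

/* 4 bytes per iteration (like REP MOVSD), then the leftover tail. */
static void copy_dwords(void *dst, const void *src, size_t bytes)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(dst != NULL && src != NULL);
    for (size_t blocks = bytes / 4; blocks > 0; blocks--) {
        uint32_t v;
        memcpy(&v, s, 4);
        memcpy(d, &v, 4);
        s += 4;
        d += 4;
    }
    for (size_t i = 0; i < bytes % 4; i++)   /* tail bytes */
        d[i] = s[i];
}

typedef void (*CopyFn)(void *dst, const void *src, size_t bytes);

/* Choose the routine once at load time, like the asm's `mmx` flag. */
static CopyFn select_copy(int has_mmx)
{
    return has_mmx ? copy_qwords : copy_dwords;
}
```

Selecting a function pointer once avoids re-testing the flag on every frame, which is the same idea as checking `mmx` only at load time.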
Posted on 2003-12-30 13:36:57 by x86asm
In order to use the hardware flipper you must:

a) Be in exclusive full-screen mode (it will not work in windowed mode)

1) Create the primary surface in video RAM with these caps flags:
DDSCAPS_PRIMARYSURFACE OR
DDSCAPS_FLIP OR
DDSCAPS_COMPLEX OR
DDSCAPS_VIDEOMEMORY
and set ddsd.dwFlags to DDSD_CAPS OR DDSD_BACKBUFFERCOUNT,
with ddsd.dwBackBufferCount = 1

Then use ::GetAttachedSurface (with DDSCAPS_BACKBUFFER) to get a pointer to the back buffer

You will ONLY write to this one from now on :grin:

In your main loop, use ::Flip to swap the front and back buffers

However, you can NOT read from the video back buffer, because it is in VIDEO memory and reads from there are extremely slow
SO no alpha blending on it :P
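A minimal sketch of blending where reads are cheap, assuming 32-bit 0x00RRGGBB pixels and a constant alpha (check the surface's DDPIXELFORMAT before relying on that layout); `blend_row` and `blend_channel` are hypothetical helpers. The destination row belongs to a frame kept in system memory, so reading it is an ordinary cached memory access; the finished frame is then written once to the locked video back buffer before the flip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blend one 8-bit channel: (src*a + dst*(255-a)) / 255, rounded. */
static uint32_t blend_channel(uint32_t s, uint32_t d, uint32_t a)
{
    return (s * a + d * (255u - a) + 127u) / 255u;
}

/* Blend n pixels of src over dst with constant alpha (0..255).
   dst is READ as well as written, so it must live in system memory. */
static void blend_row(uint32_t *dst, const uint32_t *src, size_t n,
                      uint32_t alpha)
{
    assert(alpha <= 255u);
    for (size_t i = 0; i < n; i++) {
        uint32_t s = src[i], d = dst[i];
        uint32_t r = blend_channel((s >> 16) & 0xFFu, (d >> 16) & 0xFFu, alpha);
        uint32_t g = blend_channel((s >> 8) & 0xFFu, (d >> 8) & 0xFFu, alpha);
        uint32_t b = blend_channel(s & 0xFFu, d & 0xFFu, alpha);
        dst[i] = (r << 16) | (g << 8) | b;
    }
}
```

For example, blending pure red (0x00FF0000) over pure blue (0x000000FF) with alpha 128 gives 0x0080007F.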

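Replace rep movsd with a loop like this:
mov esi,offset source_image   ;source frame in system memory
mov edi,lp_backbuffer         ;destination: locked back buffer
mov ecx,nr_dwords_to_copy
@@loop:
mov eax,[esi]                 ;read a dword from system memory
mov [edi],eax                 ;write it to video memory (write only, never read)
add esi,4
add edi,4
dec ecx
jnz @@loop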

It is faster than rep movsd on big images;
for small sizes rep movsd is faster.

Do not forget that you CAN NOT READ from video memory, or you will pay a penalty of around 100X

Also, do not forget to use lPitch (the surface's row length in bytes) as the row stride, NOT your own image width
(otherwise it might look funny on some video boards or video modes)
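A sketch of a pitch-aware copy. `copy_to_surface` is a hypothetical helper; `surface` and `pitch` are assumed to come from a locked surface (`lpSurface` and `lPitch` in the DDSURFACEDESC filled in by Lock), and pixels are assumed to be 4 bytes. Copying row by row with the pitch as the destination stride keeps the image straight when the driver pads each row; a single flat copy of vidsize bytes is only correct when lPitch happens to equal width times bytes per pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a width x height image of 4-byte pixels into a locked surface.
   src_stride and pitch are both in BYTES; pitch may exceed width * 4. */
static void copy_to_surface(uint8_t *surface, long pitch,
                            const uint8_t *src, size_t src_stride,
                            size_t width, size_t height)
{
    size_t row_bytes = width * 4;
    assert(pitch > 0 && (size_t)pitch >= row_bytes);
    for (size_t y = 0; y < height; y++) {
        /* Destination rows start every `pitch` bytes, not every row_bytes. */
        memcpy(surface + y * (size_t)pitch, src + y * src_stride, row_bytes);
    }
}
```

The padding bytes at the end of each surface row are left untouched.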
Posted on 2003-12-30 14:42:33 by BogdanOntanu

Do not forget that you CAN NOT READ from video memory or you will get a 100X slower penalty

Sorry for hijacking this thread, but do you know a solution that gives hardware flips and allows fast blending?

I've been thinking about a quad-buffering scheme: two buffers in system memory and two in video memory. The two in system memory ensure there is always a render target: while one is being hardware-blitted to video memory, you can keep rendering into the other without waiting. The two in video memory are needed for regular double buffering, to avoid tearing.
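The bookkeeping of the proposed scheme can be modelled as a small state machine. This is only a sketch with hypothetical names (`QuadState`, `quad_frame_done`, `quad_blit_done`); it tracks which buffer plays which role and makes no DirectDraw calls, so it says nothing about whether the blit can really overlap with rendering.

```c
#include <assert.h>

/* Two system-memory buffers (sys 0/1) and a two-surface video flip
   chain (vid 0/1). */
typedef struct {
    int render_sys;  /* system buffer the CPU is drawing into            */
    int blit_sys;    /* system buffer being blitted to video, -1 if none */
    int front_vid;   /* video surface currently displayed                */
} QuadState;

static void quad_init(QuadState *q)
{
    q->render_sys = 0;
    q->blit_sys = -1;
    q->front_vid = 0;
}

static int quad_back_vid(const QuadState *q) { return q->front_vid ^ 1; }

/* CPU finished a frame: hand it to the blitter and switch to the other
   system buffer. Returns 0 if the previous blit is still running, in
   which case the CPU has to wait before reusing a buffer. */
static int quad_frame_done(QuadState *q)
{
    if (q->blit_sys != -1)
        return 0;
    q->blit_sys = q->render_sys;
    q->render_sys ^= 1;
    return 1;
}

/* Blit into the video back surface completed: flip it to the front. */
static void quad_blit_done(QuadState *q)
{
    assert(q->blit_sys != -1);
    q->blit_sys = -1;
    q->front_vid ^= 1;
}
```

The `return 0` case shows the limit of the scheme: if the CPU finishes two frames before one blit completes, it stalls anyway.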

Would this approach work? And can it be implemented? Thanks!
Posted on 2003-12-30 15:30:06 by C0D1F1ED


Hey, that's a good idea! But you have to switch the two front buffers; it can be done, but it will take some thought.
Posted on 2003-12-30 17:51:35 by x86asm
Bogdan, is it OK if I do this?
Can I keep the back buffer locked at all times, except while flipping?

For example, after I obtain the back buffer object I lock it and keep it locked until I want to flip; once the flip completes, I lock it again.
Posted on 2003-12-30 19:18:29 by x86asm
Hi, x86asm. I like AMD too ;)
And they provide a tutorial on fast copying.


C0D1F1ED
I like your idea, but I'm still using one buffer in local (system) memory and two buffers in video memory.
BTW, I'm not sure whether the RAM bus is free for CPU access while the GPU is doing a BltFast from local to video RAM.
Also, if we use integrated graphics (like the Intel 845G), where is video memory located?
I think the best approach is to benchmark different buffer locations at run time.
But it's not so easy to code :(
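One way to start on such a run-time benchmark, as a sketch: `pick_fastest_destination` is a hypothetical helper that times repeated copies of one frame into each candidate destination and returns the fastest. In a real program each destination would be a locked surface created in a different memory location; here plain buffers stand in for them, so this only shows the measuring logic. It uses the C11 `timespec_get` for wall-clock time; on Windows, `QueryPerformanceCounter` is the usual high-resolution alternative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Copy `frame` into each destination `reps` times and return the index
   of the destination with the lowest total time. */
static size_t pick_fastest_destination(void *const *dsts, size_t n,
                                       const void *frame, size_t bytes,
                                       int reps)
{
    assert(n > 0 && reps > 0);
    size_t best = 0;
    double best_t = 0.0;
    for (size_t i = 0; i < n; i++) {
        double t0 = now_seconds();
        for (int r = 0; r < reps; r++)
            memcpy(dsts[i], frame, bytes);
        double t = now_seconds() - t0;
        if (i == 0 || t < best_t) {  /* keep the smallest total time */
            best = i;
            best_t = t;
        }
    }
    return best;
}
```

This times only the CPU-side copy; a more faithful test would time complete frames (render, copy and flip) for each buffer layout.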
Posted on 2003-12-31 04:42:22 by S.T.A.S.
Would this approach work? And can it be implemented? Thanks!

The problem is that you would have to handle the flip asynchronously, which, as far as I know, is not possible through DirectDraw (HBL/VBL interrupts were such a lovely thing on the good old Amiga).
So I don't think there's much difference between blitting to the back buffer, waiting for VBL and then flipping, or waiting for VBL and then blitting to the front buffer, except that the latter needs one less surface in video memory.
Posted on 2004-01-21 03:38:50 by Jan