man, what a disappointment lol, I'm just joking it looks like I may be donig something wrong, I set up DirectDraw surfaces for Hardware flipping, and it does work, but I get like 5 FPS, and I am not reading from the backbuffer!!
My software flipper is outperforming it CONSIDERABLY (something is definitely wrong).
Anyways here is what I did:
(I tried doing DDSCAPS_COMPLEX, but it failed!!)

Create Primary Surface with flags: DDSCAPS_PRIMARYSURFACE or DDSCAPS_FRONTBUFFER
Created another buffer with these flags:

descback.dwFlags, DDSD_CAPS or DDSD_WIDTH or DDSD_HEIGHT
descback.ddsCaps.dwCaps,DDSCAPS_BACKBUFFER

Than I attached the surfaces using AddAttachedSurface, function call returned 0, so it is all good.
I keep the back buffer locked and only unlock it when flipping, to Flip I do the following:



DDSINVOKE Unlock, lpBack, NULL

DDSINVOKE Flip, lpBack, NULL,NULL

DDSINVOKE mLock, lpBack, NULL, ADDR backsurf, DDLOCK_SURFACEMEMORYPTR or DDLOCK_NOSYSLOCK, NULL


But in my emu I get 5 FPS ( I implemented FPS counter), not sure why, I'm gonna keep trying and let you know if I can fix it, but please give ur suggestions that are most welcomed!
Posted on 2004-06-15 12:37:08 by x86asm

man, what a disappointment lol, I'm just joking it looks like I may be donig something wrong, I set up DirectDraw surfaces for Hardware flipping, and it does work, but I get like 5 FPS, and I am not reading from the backbuffer!!
My software flipper is outperforming it CONSIDERABLY (something is definitely wrong).
Anyways here is what I did:
(I tried doing DDSCAPS_COMPLEX, but it failed!!)

Create Primary Surface with flags: DDSCAPS_PRIMARYSURFACE or DDSCAPS_FRONTBUFFER
Created another buffer with these flags:

descback.dwFlags, DDSD_CAPS or DDSD_WIDTH or DDSD_HEIGHT
descback.ddsCaps.dwCaps,DDSCAPS_BACKBUFFER

Than I attached the surfaces using AddAttachedSurface, function call returned 0, so it is all good.
I keep the back buffer locked and only unlock it when flipping, to Flip I do the following:



DDSINVOKE Unlock, lpBack, NULL

DDSINVOKE Flip, lpBack, NULL,NULL

DDSINVOKE mLock, lpBack, NULL, ADDR backsurf, DDLOCK_SURFACEMEMORYPTR or DDLOCK_NOSYSLOCK, NULL


But in my emu I get 5 FPS ( I implemented FPS counter), not sure why, I'm gonna keep trying and let you know if I can fix it, but please give ur suggestions that are most welcomed!

Resolution that was set up is 640x480x32-bit and my video card is a Radeon 8500LE 128MB.
Posted on 2004-06-15 12:42:52 by x86asm
use dx9
Posted on 2004-06-15 12:51:03 by comrade

use dx9

what do u mean?
Posted on 2004-06-15 13:18:56 by x86asm
Your software flipper? Flipping is a hardware-feature, and cannot be emulated in software.
Anyway, unless I see your code, I cannot judge what is wrong with it...
I will just attach code for my old 486 3d engine, which does proper flipping. It is 320x200 8 bit, but should easily be changeable to 640x480 32 bit. Perhaps you can find your error that way.
Posted on 2004-06-15 16:18:23 by Scali

Your software flipper? Flipping is a hardware-feature, and cannot be emulated in software.
Anyway, unless I see your code, I cannot judge what is wrong with it...
I will just attach code for my old 486 3d engine, which does proper flipping. It is 320x200 8 bit, but should easily be changeable to 640x480 32 bit. Perhaps you can find your error that way.

Thanks a lot Scali your code helped me out!, it seems that DirectDraw had a problem with me locking the backbuffer, with current conditions I do see an improvement in speed!
Posted on 2004-06-15 19:42:56 by x86asm
Yes, in theory flipping takes no time at all.
It never copies any memory. It just swaps the front and backbuffers around. This means the videocard just has to swap two pointers. But ofcourse this only works if both buffers are in videomemory, and the videochip supports it.
Perhaps you created your backbuffer with wrong arguments, so it could not be stored in videomemory, and had to divert to sysmem. In that case, the CPU will have to pump the entire buffer over PCI/AGP every frame instead of doing a real flip-operation, and for anything above 640x480 that is pretty much suicide, even on the fastest of systems.
Posted on 2004-06-16 02:26:41 by Scali
In that case, the CPU will have to pump the entire buffer over PCI/AGP every frame instead of doing a real flip-operation, and for anything above 640x480 that is pretty much suicide, even on the fastest of systems.


??

I wouldnt say that.
have you ever tried it?

of course its slower than a few onboard register swap, but its far from being that slow.
i once wrote a dos .com program (using flat real mode, and my code even still was 16 bit) using vesa 2.0, where you get a pointer to the vram (mapped somewhere in the last Gig of adressable physical mem) and i was drawing things to a buffer in ram, then copying them to the LFB each frame, with no sync.

On my P133 with a s3 1Mb PCI card, in 640.480.16bits(even though i converted from 32b RAM backbuffer to the 16b VRAM LFB on the fly with my regular x86 code) i had something between 10 and 20 fps, but it prolly could ve been improved.

On my Athlon 1700+ with Geforce4MX, in 640.480.32, i did reach 70 fps. the copy was a regular "a32 rep movsd". (btw i wondered why it was 70 fps(i didnt sync), maybe the board wouldnt accept more vram writes than the vbl??)

i think you get decent framerates far before a 1.5Ghz processor, but couldnt test.
i also thought it was faster to write in RAM before if you are likely to overwrite pixels..and you can also read them fast.

in fact i was doing this because with my mem model and my system i wasnt able to call the flip func (an software bios int).

modern processors are truely capable of processing that much data realtime(well, not by far, but more than 5fps).

and i m not talking of SIMD (though i suspect mem bandwidth is the bottleneck here).

bye
Posted on 2004-06-16 15:11:18 by HeLLoWorld
btw tinyPTC for win32 does this.
(well, in fact,soft blit from your data seg RGBquad array to a ddraw VRAM surface, then ddraw flip)
x86asm, you might want to look at it, its the lib that made me come to win32 stuff.
Posted on 2004-06-16 15:14:50 by HeLLoWorld



??

I wouldnt say that.
have you ever tried it?

of course its slower than a few onboard register swap, but its far from being that slow.
i once wrote a dos .com program (using flat real mode, and my code even still was 16 bit) using vesa 2.0, where you get a pointer to the vram (mapped somewhere in the last Gig of adressable physical mem) and i was drawing things to a buffer in ram, then copying them to the LFB each frame, with no sync.

On my P133 with a s3 1Mb PCI card, in 640.480.16bits(even though i converted from 32b RAM backbuffer to the 16b VRAM LFB on the fly with my regular x86 code) i had something between 10 and 20 fps, but it prolly could ve been improved.

On my Athlon 1700+ with Geforce4MX, in 640.480.32, i did reach 70 fps. the copy was a regular "a32 rep movsd". (btw i wondered why it was 70 fps(i didnt sync), maybe the board wouldnt accept more vram writes than the vbl??)

i think you get decent framerates far before a 1.5Ghz processor, but couldnt test.
i also thought it was faster to write in RAM before if you are likely to overwrite pixels..and you can also read them fast.

in fact i was doing this because with my mem model and my system i wasnt able to call the flip func (an software bios int).

modern processors are truely capable of processing that much data realtime(well, not by far, but more than 5fps).

and i m not talking of SIMD (though i suspect mem bandwidth is the bottleneck here).

bye


Scali is right though, it is suicide to go over 640x480, even on a reasonably fast system the overhead of copying the data would be too much. 640x480 is fine, but try going over that? I have optimized my copy routine to use the MMX and SSE instruction movntps and still 800x600x32 is a little too much for my Duron 1.3Ghz, which is why I opted to keep my emulator to 640x480x32, but I'm not keeping it there, I'm thinking of dropping the resolution again and switching to 8-bit color mode (the SMS VDP can only display 16 colors on the screen at the same time, why do I need more thn 256?!?!)
Posted on 2004-06-16 15:15:11 by x86asm
I wouldnt say that.
have you ever tried it?


Yes I have, that's why I say it ;)
Just take a look at the bandwidth required for eg 800x600 32 bit at 100 fps. That would be 800*600*4*100 = 192000000 bytes/s.
So 183 mb/s.
A PCI bus would already be out of the question, since it has a theoretical maximum of 133 mb/s.
AGP has more bandwidth, it actually has enough to deliver. But you generally do not get the maximum theoretical bandwidth in practice due to a number of factors. At any rate, what it comes down to is that you will spend a significant amount of time just copying the image from memory to screen. And then you still have to draw in the remaining time. Things get slow quickly. I want to spend most of my time drawing, not getting it on screen.

modern processors are truely capable of processing that much data realtime(well, not by far, but more than 5fps).


It's not about the processors, as said before. It's about the PCI/AGP bottleneck. If the processor was actually located on the videocard, with direct access to the high-speed videomemory, you'd get much better performance. That's more or less why PCs have display accelerator cards with videomem anyway. Most other systems have one big pool of memory, and a few processors that use it (the XBox for example). If you look at the bad performance of integrated chipsets for PCs... it's mainly because they are using the slow mainmemory, rather than dedicated graphics memory. And then they still have a direct path to it, no PCI/AGP bus.
Posted on 2004-06-16 15:28:05 by Scali
I do 800x600x16 software flip and many other things in HE at framerates of 20-50FPS or so ;) without any optimizations whatsoever.

IF i just flipp and do NOT do other paints (like units,shadows,full screen alpha FOW, terrain,effects,interfaces,trees etc) ... and add a little algorithmic optimization like dirty scan lines: I get 300-500fps easy.

You are not setting your buffer corectly OR you are reading from a surface located in video memory

Using hardware accelerated 2D FLIP/Blt (DX7) and keeping all stuff in video RAM you should get 1000+fps or even more

Of course if you sync with monitor you can only get 60...120 or so. but you can use the extra time for other tasks/logic
Posted on 2004-06-16 15:33:53 by BogdanOntanu

I do 800x600x16 software flip and many other things in HE at framerates of 20-50FPS or so ;) without any optimizations whatsoever.

IF i just flipp and do NOT do other paints (like units,shadows,full screen alpha FOW, terrain,effects,interfaces,trees etc) ... and add a little algorithmic optimization like dirty scan lines: I get 300-500fps easy.

You are not setting your buffer corectly

Ya I wasn't setting it up properly, framerates improved drastically!
Posted on 2004-06-16 15:34:55 by x86asm
hmmm, okay, sorry.

still, i had 70fps on my athlon, and each frame i was doing
ram src bmp soft copy to ram backbuff,+
ram backbuff soft copy to lfb vram buff.

i think i tried 800 and 1024 too, and it wasnt horrible, but its true i havent investigated much.

and for ram-ram image filters, do you think its the same?
are the ram modules wired on the pci bus nowaday?or?
Posted on 2004-06-16 15:41:57 by HeLLoWorld
You guys do 16 bit. I cannot stand 16 bit :)
Banding issues, and alphablending is pretty much impossible. You don't have enough accuracy for more than one or two blendpasses. You could use dithering, but that doesn't really make it look better.
32 immediately requires twice the bandwidth.
Posted on 2004-06-16 15:46:39 by Scali
still, i had 70fps on my athlon, and each frame i was doing
ram src bmp soft copy to ram backbuff,+
ram backbuff soft copy to lfb vram buff.


So you copied some bitmaps around?
What about 3d, texturemapping, bilinear filtered sprites, raytracing, post-processing (blur etc?), to name but a few things that I may like to actually DO rather than copying a bitmap to screen as fast as possible? :)
Posted on 2004-06-16 15:52:57 by Scali
my engine wasnt that advanced...:)
maybe i ll work again on it one of these days...
when i left it it just could copy bitmap on screen ... at 70 fps:)

(i dont do 16 bit, but i was forced to final-stage-convert on my 1Mb s3trio which had just 16 bits)
Posted on 2004-06-16 16:01:10 by HeLLoWorld
Well, I am fiddling with a software engine... And when just updating the screen, at 800x600x32, for example, I get 184fps. It depends a lot on the CPU, system ram speed, video card... I tried it on many computers and the discrepancy can be huge even on similar computers, even AGP drivers can make quite a difference!

The bottom line comes to this:

First, when using win32 GDI Blt, which is the fastest most of the time, believe me, there is some kind of parallelism in there, because I can also do about one full screen software blit (in system memory only) and still get the same speed as when doing nothing but Blt'ing. BUT, as soon as you begin to draw something, framerates drop significantly, especially when it's alpha blended in software for example.

Take this example, completely random numbers:

Blt: 4ms (sys->vram)

If you only do that, you can get 1000/4 fps, which is 250fps. That's huge! But just add a BG blit plus some alpha blended sprites for example, and I'll ignore the parallelism for the sake of simplicity:

Blt: 4ms (sys->vram)
BG Blt: 2ms (sys->sys)
100 Alpha Sprites: 5ms (sys->sys)

We get 1000/(4+2+5) which is 90 fps... The hit is huge now.

Ok, sorry to get so lengthy. If y'all already knew this, sorry for my post, otherwise, it's just what I noticed making a 2d software endine. But there is one thing certain, with modern computers, it is possible to make something 2d really cool in software :)
Posted on 2004-06-24 19:48:19 by persil

Well, I am fiddling with a software engine... And when just updating the screen, at 800x600x32, for example, I get 184fps. It depends a lot on the CPU, system ram speed, video card... I tried it on many computers and the discrepancy can be huge even on similar computers, even AGP drivers can make quite a difference!

The bottom line comes to this:

First, when using win32 GDI Blt, which is the fastest most of the time, believe me, there is some kind of parallelism in there, because I can also do about one full screen software blit (in system memory only) and still get the same speed as when doing nothing but Blt'ing. BUT, as soon as you begin to draw something, framerates drop significantly, especially when it's alpha blended in software for example.

Take this example, completely random numbers:

Blt: 4ms (sys->vram)

If you only do that, you can get 1000/4 fps, which is 250fps. That's huge! But just add a BG blit plus some alpha blended sprites for example, and I'll ignore the parallelism for the sake of simplicity:

Blt: 4ms (sys->vram)
BG Blt: 2ms (sys->sys)
100 Alpha Sprites: 5ms (sys->sys)

We get 1000/(4+2+5) which is 90 fps... The hit is huge now.

Ok, sorry to get so lengthy. If y'all already knew this, sorry for my post, otherwise, it's just what I noticed making a 2d software endine. But there is one thing certain, with modern computers, it is possible to make something 2d really cool in software :)


This topic is welcomed to be discussed in this thread, no appology is necessary! Thanks for the info!
Posted on 2004-06-24 21:16:33 by x86asm