Hi, I have a problem. I'm using DirectDraw to get access to the screen buffer, and for some reason I can't fill the whole buffer: only a portion of the copied result appears at the top of the screen. To see the whole image I have to bump the resolution up very high, which causes a small but noticeable slowdown and an unwanted black border around the displayed area. Here is what I do to initialize DirectDraw:



invoke DirectDrawCreate, NULL, ADDR lpDD, NULL
DDINVOKE SetCooperativeLevel,lpDD, winh, DDSCL_EXCLUSIVE or DDSCL_FULLSCREEN
mov edx,xwid
mov ebx,ywid
DDINVOKE SetDisplayMode,lpDD,edx,ebx, 32 ;Set Display mode
mov ecx,SIZEOF DDSURFACEDESC
xor eax,eax
mov edi,OFFSET ddsurf
rep stosb ;Zero out

mov ddsurf.dwSize, SIZEOF DDSURFACEDESC
mov ddsurf.dwFlags, DDSD_CAPS
mov eax,xwid
mov ebx,ywid
mov ddsurf.dwWidth,eax
mov ddsurf.dwHeight,ebx
mov ddsurf.ddsCaps.dwCaps, DDSCAPS_PRIMARYSURFACE
DDINVOKE CreateSurface, lpDD, ADDR ddsurf, ADDR lpSurf, NULL
.IF EAX!=DD_OK
DDINVOKE SetDisplayMode, lpDD, NULL, NULL,NULL
mov eax,FALSE
ret
.endif
invoke memfill, ADDR ddsurf, SIZEOF DDSURFACEDESC, NULL ;Zero Out STRUCT
mov ddsurf.dwSize, SIZEOF DDSURFACEDESC
mov ddsurf.dwFlags, DDSD_PITCH
DDSINVOKE mLock, lpSurf, NULL, ADDR ddsurf, NULL, NULL
mov eax,ddsurf.lpSurface ;Grab pointer to surface
mov lpvidmem,eax ;Store in RAM
mov eax,ddsurf.lPitch
mov vidpitch,eax
mov ebx,yres
xor edx,edx
mul ebx
;EAX==Add XRes*BPP to get memory
mov edx,xres
shl edx,2 ;4 bytes per pixel,
add eax,edx
mov vidsize,eax
DDSINVOKE Unlock,lpSurf, NULL



I use my own backbuffer, and when I want to flip data to the front buffer, this is the code that is executed:





mov ddsurf.dwSize, SIZEOF DDSURFACEDESC
mov ddsurf.dwFlags, DDSD_PITCH


DDSINVOKE mLock, lpSurf, NULL, ADDR ddsurf, NULL , NULL

.IF waitvb==WVB_WAIT
.WHILE EAX!=DD_OK
DDINVOKE WaitForVerticalBlank, lpDD, DDWAITVB_BLOCKBEGIN, 0
.endw
.endif

xor eax,eax
xor edx,edx
mov al,mmx
mov ecx,vidsize
.IF AL==1
shr ecx,6
mov edi,lpvidmem
mov esi,backptr ;Set up pointers
mmxcopy:
movq mm0,[esi]
add esi,8
movq [edi],mm0


add edi,8
dec ecx
jnz mmxcopy
emms
DDSINVOKE Unlock, lpSurf, NULL
ret
;Unlock buffer before returning
.ELSE
;Plain CPU copying
mov edi,lpvidmem
mov esi,backptr
;Initialize Source and Destination Index Pointers
mov ecx,vidsize
shr ecx,2 ;Divide ECX by 4
rep movsd ;Repeated Move function
DDSINVOKE Unlock, lpSurf, NULL
ret
.endif






I'm not sure what the problem is, because this code used to work without any problems. Does anyone know what could be going wrong? Thanks
Posted on 2003-11-18 16:03:35 by x86asm
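A note on the flip routine above: it copies vidsize bytes in one linear run, which only lines up with the screen if the backbuffer rows are spaced exactly lPitch bytes apart. The following is a minimal sketch in C (not the poster's code) of a row-by-row copy that respects the pitch, assuming a tightly packed 32-bit backbuffer. copy_rows, surface_bytes and their parameters are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a tightly packed 32-bit back buffer into a locked surface whose rows
   start `pitch` bytes apart. All names here are hypothetical. */
static void copy_rows(uint8_t *dst, size_t pitch,
                      const uint8_t *src, size_t width, size_t height)
{
    const size_t row_bytes = width * 4;   /* 4 bytes per pixel at 32 bpp */
    assert(pitch >= row_bytes);           /* pitch may include padding, never less */
    for (size_t y = 0; y < height; ++y) {
        /* Destination rows advance by pitch, source rows by row_bytes. */
        memcpy(dst + y * pitch, src + y * row_bytes, row_bytes);
    }
}

/* Bytes spanned by the locked surface: one pitch per row. */
static size_t surface_bytes(size_t pitch, size_t height)
{
    return pitch * height;
}
```

With this layout the locked surface spans pitch * height bytes, so a single linear copy of that many bytes is only equivalent to copy_rows when pitch equals width * 4.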
n/m I fixed it, :D
Posted on 2003-11-18 17:31:45 by x86asm
so tell us where the problem was - it'll warn other ppl trying to play with dd
Posted on 2003-12-04 13:40:39 by ti_mo_n

so tell us where the problem was - it'll warn other ppl trying to play with dd


Probably it was some stupid bug not worth discussing. It happens to me all the time :tongue:
Posted on 2003-12-05 12:59:08 by AceEmbler
By the way, instead of using your own backbuffer, you could have DDraw create a surface in sysmem.
This way you can use the Blt() functions and the driver will choose the fastest way to blt the surface to screen.
(Like for example having the display card DMA the pixels into memory itself, and having the CPU free).
Posted on 2003-12-05 16:30:12 by Bruce-li
Yeah

Unfortunately a standard ASM memcpy plus some optimizations can beat DX by 2x or 3x in this case,
actually making your game faster...

Also, I do not think DMA would help here, as DMA is far too slow for such high-speed video operations;
most video boards never use it.

One more problem is that you will have to deal with pitch and friends if you decide to ::Lock this system surface.

There are good things too:
::Blt() might do the pixel format conversions for you, and you can also use a color key.
AGP might also help here, but only when blitting to the video back buffer.

At first we did it the way Bruce suggested in HE, but now I am seriously considering using plain memory buffers instead. At least I would not have to use ::Lock anymore, and I could set up a fixed pitch==0, which might free one register in the inner loops :P
Posted on 2003-12-05 18:33:52 by BogdanOntanu
Also, I do not think DMA would help here, as DMA is far too slow for such high-speed video operations;
most video boards never use it.


Well perhaps not DMA as you know it, but something like AGP texturing. I call it DMA, then again, I am an ex-Amiga coder.
And you can't code a faster routine than that on CPU, because it will just go as quickly as the AGP bus/GPU can handle. And on the bright side, your CPU is now idle.
Then again, it of course depends on the hardware (and driver) used. Perhaps it doesn't work properly on some systems. I didn't have problems with it back in the day when I used it, though (then again, that was on a 486 with VLB :)).
These days I simply use the GPU (D3D) for everything, and I suggest everyone do the same, if they can. It's simple, it's fast, it's robust, and it's filtered.
Anyway, I suggest you try all methods, and use what works best for you.
Posted on 2003-12-05 18:40:48 by Bruce-li
First

Let us NOT confuse starters :grin:

It is kind of "nice" that you call AGP "DMA", but they are somewhat different things, so I suggest we call them what they are. I liked the Amiga, BTW

And I did mention that AGP might help you transfer bitmaps or textures to video memory

To be more clear:
-----------------------------
1) A CPU ASM routine can easily beat the DX ::Blt() routines ONLY in SYSTEM RAM operations
let us be kind to DX and assume you can beat them because DX has to deal with more pixelformats :tongue:

2) CPU ASM can marginally (5%-10%) beat DX at SYSTEM MEMORY to VIDEO MEMORY writes. Because they use the same AGP? Or because the drivers are somehow badly done? I do not know why yet

3) The CPU can NEVER beat VIDEO to VIDEO transfers; they are actually 10x up to 1000x faster when done by the GPU, but this is expected and logical

4) Sometimes, very rarely, the CPU can beat the GPU at some 3D operations, only because the CPU runs at 3 GHz and the GPU at 500 MHz, but those situations are too complicated to present here


Other stuff
------------------
On the other issue of using 3D scanline texture hardware to do everything: yeah, I somewhat agree, because this is IMPOSED and I have no other OPTIONS.
I just wonder: since when has having no options and no freedom become such a good thing?

Also, the fact that I am obliged to draw textured triangles for a simple 2D bitmap is just plain lame IMHO
A simple 2D bitmap should be --and in fact is-- much faster to draw in hardware than the above...
I see no real reason for this other than the stupidity of the human race and its mind-driven assumptions

Also, I and others :grin: think that the future of 3D is in hardware-accelerated realtime raytracing, so the current 3D scanline algorithms will be dropped in a few years

Also, I like having the freedom to test such new algorithms for both 3D and 2D, and I would love to have more freedom and control over the hardware caps of the video boards, instead of the video board having control over me :D
Posted on 2003-12-05 19:46:46 by BogdanOntanu
It is kind of "nice" that you call AGP "DMA", but they are somewhat different things, so I suggest we call them what they are. I liked the Amiga, BTW


DMA stands for Direct Memory Access in my book. In other words, hardware accessing memory directly, without requiring the CPU to read the data and pump it through the hardware.
If you have a better name for what eg AGP texturing does, let me know, but I just used DMA, because I didn't know any more specific name (it's not really AGP texturing either, because this can also be done on PCI systems for example).

let us be kind to DX and assume you can beat them because DX has to deal with more pixelformats


Well firstly, if you use the same pixelformat for both surfaces, DX will skip the pixelformat conversion.
Secondly, a lot of hardware can do this pixelformat conversion by itself, so if you make all sysmem surfaces the same format, and then get it in videomemory and have it do the conversion there, it may be the fastest solution.

2) CPU ASM can marginally (5%-10%) beat DX at SYSTEM MEMORY to VIDEO MEMORY writes. Because they use the same AGP? Or because the drivers are somehow badly done? I do not know why yet


Depends on the hardware or drivers, I suppose. Sometimes it works, sometimes it doesn't.
At any rate, this should not be the bottleneck, unless you are rendering something very trivial anyway.

3) The CPU can NEVER beat VIDEO to VIDEO transfers; they are actually 10x up to 1000x faster when done by the GPU, but this is expected and logical


I would like to make this even stronger and say: NEVER READ FROM VIDEOMEMORY WITH THE CPU, EVER!
:)
PCI/AGP is a one-way street. Reading back videomemory is so slow, that it's useless for anything realtime (well, you could probably get away with reading just a few pixels per frame. But usually there's a way around this).

4) Sometimes, very rarely, the CPU can beat the GPU at some 3D operations, only because the CPU runs at 3 GHz and the GPU at 500 MHz, but those situations are too complicated to present here


The problem is that hardware-rendering and software rendering cannot be combined efficiently or robustly.
Reading videomemory is slow, as said. And reading or writing the zbuffer is completely impossible on most hardware. You can use a trick with a pixelshader that can render a z from a texture though, so you can write to it, but reading is a big problem.
And of course hardware-rendering has to complete before any software rendering can take place. If you interrupt hardware-rendering, do some software rendering, and go back to hardware rendering during a frame, you will get very poor performance, because you lose the asynchronous advantage of hardware-rendering and have to wait for it to complete while the CPU sits idle.

A simple 2D bitmap should be --and in fact is-- much faster to draw in hardware than the above...
I see no real reason for this other than the stupidity of the human race and its mind-driven assumptions


Well, not all hardware actually implements 2d bitmap operations anymore. They just use 'emulation' with the 3d pipeline and textures anyway. This way the hardware can be made simpler and cheaper, which is a good reason, I think :)

Also, I and others think that the future of 3D is in hardware-accelerated realtime raytracing, so the current 3D scanline algorithms will be dropped in a few years


I have given my opinion on this in another thread recently... Somewhere here: http://www.asmcommunity.net/board/index.php?topic=16128&perpage=15&pagenumber=9
Somewhere at the end of the thread, you will also find a link to an article with information on DirectX Next.
It seems like ps4.0 will bring us a few steps closer to hardware-accelerated raytracing again.

Also, I like having the freedom to test such new algorithms for both 3D and 2D, and I would love to have more freedom and control over the hardware caps of the video boards, instead of the video board having control over me :D


Tried applying for a job at NVIDIA, ATi, Matrox, XGI, 3DLabs? :)
Posted on 2003-12-05 20:11:33 by Bruce-li
Mea culpa Bruce

I just re-read the old DX7 SDK on the AGP issue and found that they also equate AGP with DMA, so I must be living in a dream world

Yes, DMA means the same to me: Direct Memory Access, i.e. devices that read and write data from one memory location to another, very fast if possible (at least 10x CPU speed), and do this without using the CPU (besides initialization and an interrupt at the end of the operation) <-- this was it by my standards

But I mean direct access to any memory, while as you say AGP (damn, PCI also? I really hope not) is only a method to quickly upload textures to the video board...

So being unidirectional, based only on the north bridge, and AFAIK making a kind of direct connection between the CPU, the video memory and the RAM bus, it hardly meets my standards for DMA. But now I am forced to agree that it could be considered a special form of DMA :eek: yeah, it looks more like an upgraded VL bus now

Yes, NOT reading from video memory was a painful first lesson I learned in my early DX testing :D and since then I have recommended it to everybody as well.

Even if I dislike this decision to make reading video memory impossible... I do not think this is the real problem when using software rendering, as we all know we can use a system-memory backbuffer to cache those reads.


No, I did not try to apply for a job at NVIDIA or the other video manufacturers :grin: I do not see the use... waiting 10 years before I get the chance to do exactly what my boss wants me to?

I do not have the time or energy to debate all the other aspects you presented, and honestly I do not know if it would help any newbie... because the world we live in is correctly presented by you

However I will mention that I am somewhat against it ...

And I will add that the fact that some things are observed to be a certain way today (not by God's intervention but by stupid human design) doesn't make them right or correct either.

I wonder if you have tested any modern video board to see if it lacks 2D acceleration hardware?

I have just tested, and I found that the latest NVIDIA boards are able to run 2D IDirectDraw1 interfaces (aka pre-DX3) at super high speeds for 2D ::Blt() video to video; same goes for ATI...

I am sure that if you let them know, they will add a flag in the drivers that will eventually prevent this, just to make the world better organized....

Whatever they claim, making 2 triangles, setting matrices, setting textures, etc. for rendering a simple plain 2D bitmap on screen is pathetic by absolute standards, but OK for capitalism

And i agree that 99% of population is happy with this and since they make the rules, i usually obey and wait to be raped :tongue:...

however i will not close my eyes.... i will keep them WIDE OPEN
Posted on 2003-12-06 00:38:04 by BogdanOntanu
But I mean direct access to any memory, while as you say AGP (damn, PCI also? I really hope not) is only a method to quickly upload textures to the video board...


Well, if you must know... I have an old PCI accelerator, a PowerVR PCX2 (cousin of the chip used in the Sega DreamCast) to be exact. It's an add-on board, like the old VooDoo cards. But the funny thing is, this card does not need an external cable that loops to the 2d display card output. You just plug the card in, and it works, as long as you have a DirectDraw-compatible 2d card.
Guess how it works? It creates a DirectDraw surface on the 2d card... Then it renders its 3d image into its own local memory (it is a tile rasterizer), and when it is done, it blits the image into the 2d card's surface over the PCI bus. Is that cool or what? :)

But I mean direct access to any memory, while as you say AGP (damn, PCI also? I really hope not) is only a method to quickly upload textures to the video board...


Well, strictly speaking, a texture is just an array. So if you can upload a texture, you could also fill the 'texture' with other info instead, and upload that. And a DirectDraw surface is basically a texture to the hardware as well :)

I wonder if you have tested any modern video board to see if it lacks 2D acceleration hardware?


No, then again, I can't say I'm bothered with the speed of 2D operations of modern boards anyway :)
Not sure how you would test it either. You can't tell much from the speed of the operations, I think.

Whatever they claim, making 2 triangles, setting matrices, setting textures, etc. for rendering a simple plain 2D bitmap on screen is pathetic by absolute standards, but OK for capitalism


Well, I think it's better to do this, and only have a 3D unit in hardware, than to build a separate 2D unit into hardware as well, when the 3D unit can do the same and much more. This way they save some transistors that they can use for more interesting features than 2D instead :)
I don't think it matters much for speed either. A blt-operation is memory-limited anyway, so while rendering triangles may in theory be a more complex and slower operation, in practice it's the memory that determines the speed.
And I prefer triangles anyway, they allow you sub-pixel movement with bilinear filtering, so you can make your 2d stuff smoother and more accurate :)

But if I look at the plans for Windows Longhorn, it seems that 2D hardware will disappear completely, if it has not disappeared yet. Longhorn will have a GUI running on Direct3D, and it will use the 3d hardware for filtering, shading, blending and whatever else they can think of.
And so, if even the GUI itself is '3D' (well it looks 2D, but runs on 3D functions), there will be no reason at all to implement 2D functions in hardware anymore.
Posted on 2003-12-06 04:10:47 by Bruce-li
Yes Bruce you are right, please forgive my ramblings
Posted on 2003-12-06 12:33:46 by BogdanOntanu
Sorry I didn't respond for a long time. In my copy routine I had something like this:



copylp:
movq mm0,[esi]
movq mm1,[esi+8]
movq mm2,[esi+16]
.
.
.
movq [edi],mm0
movq [edi+8],mm1
dec ecx
jnz copylp



Later on I found that this was just stupid, probably stalling my Athlon CPU instead of speeding up the copy loop, so I replaced it with a single register load and store. Before reaching this copy loop,
I divided the counter by 64 (since I moved 64 bytes at a time). I changed the copy loop to this:



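copylp:
movq mm0,[esi] ;Load 8 bytes from the backbuffer
add esi,8
movq [edi],mm0 ;Store them to the surface
add edi,8
dec ecx ;ECX must hold byte count / 8
jnz copylp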



But here I forgot to update the part that divides the counter, so my copy loop was exiting early: I should have been dividing the counter by 8 instead of 64.

haha I feel stupid.
Posted on 2003-12-06 17:00:37 by x86asm
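The fix described above comes down to keeping the loop counter in step with the number of bytes moved per pass. Below is a minimal sketch in C of that relationship, not the original MMX routine. passes_for and copy_qwords are hypothetical helpers, and the 512-byte size used in the note after the code is an arbitrary example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of loop passes needed when each pass moves `chunk` bytes. */
static size_t passes_for(size_t total_bytes, size_t chunk)
{
    assert(chunk != 0 && total_bytes % chunk == 0);
    return total_bytes / chunk;
}

/* Copy `passes` chunks of 8 bytes, mirroring one movq load/store per pass.
   Returns the number of bytes actually copied. */
static size_t copy_qwords(uint8_t *dst, const uint8_t *src, size_t passes)
{
    for (size_t i = 0; i < passes; ++i) {
        memcpy(dst + i * 8, src + i * 8, 8);
    }
    return passes * 8;
}
```

For a 512-byte buffer, 512 >> 6 gives 8 passes, and 8 passes of 8 bytes copy only 64 bytes, one eighth of the data. That matches the symptom of only the top strip of the screen being filled. passes_for(512, 8) gives the 64 passes that are actually needed.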

By the way, instead of using your own backbuffer, you could have DDraw create a surface in sysmem.
This way you can use the Blt() functions and the driver will choose the fastest way to blt the surface to screen.
(Like for example having the display card DMA the pixels into memory itself, and having the CPU free).


I'll test out Blt() whenever I can, thanks. The reason I laid off it was that I had an ol' Matrox Millennium 2MB, and whenever I played a DDraw H/W accelerated game it actually ran slower for some reason!

Hey, that PowerVR PCX2 is pretty neat! I never EVER heard of any other card that does that! I like PowerVR's tile technique. I had their Kyro 1 (Hercules 3D Prophet 4000XT), and I was quite shocked that in most games this card gave performance close to my GeForce2 with a fraction of the fillrate and clock speed. Same with my Sega Dreamcast: games look quite good on it.
Posted on 2003-12-06 17:04:12 by x86asm
Yes, PowerVR makes nice stuff :)

Anyway, you could try using movntq to write to videomemory.
This instruction is a non-temporal store: it writes around the cache instead of pulling the data into it.
Since videomemory should not be cached anyway, there's no reason to update the L1/L2 caches in the CPU either.
This can make writing to videomemory a lot faster.
You could also try replacing the dec ecx with sub ecx, 1.
This is faster on P4s, and also sometimes on Athlons.
On all other CPUs (PII/III/Celeron etc) it should be as fast.
Posted on 2003-12-06 18:08:36 by Bruce-li
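For anyone who wants to try the non-temporal store idea from C, here is a rough sketch. It swaps in the SSE2 intrinsic _mm_stream_si128 (the 128-bit counterpart of movntq) and falls back to memcpy when SSE2 is not available at compile time. stream_copy is a hypothetical name, and the alignment and size requirements are assumptions of this sketch. Whether it is actually faster depends on the hardware and should be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy `bytes` (a multiple of 16) to a 16-byte-aligned destination using
   non-temporal stores where SSE2 is available, plain memcpy otherwise. */
static void stream_copy(void *dst, const void *src, size_t bytes)
{
    assert(bytes % 16 == 0);
    assert(((uintptr_t)dst & 15u) == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    /* Count down to zero, one 16-byte block per pass. */
    for (size_t n = bytes / 16; n != 0; --n) {
        _mm_stream_si128(d++, _mm_loadu_si128(s++));
    }
    _mm_sfence();   /* make the streamed stores visible before returning */
#else
    memcpy(dst, src, bytes);
#endif
}
```

The source is read with an unaligned load, so only the destination needs to be aligned here.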
bruce:

Well, if you must know... I have an old PCI accelerator, a PowerVR PCX2 (cousin of the chip used in the Sega DreamCast) to be exact. It's an add-on board, like the old VooDoo cards. But the funny thing is, this card does not need an external cable that loops to the 2d display card output. You just plug the card in, and it works, as long as you have a DirectDraw-compatible 2d card.
Guess how it works? It creates a DirectDraw surface on the 2d card... Then it renders its 3d image into its own local memory (it is a tile rasterizer), and when it is done, it blits the image into the 2d card's surface over the PCI bus. Is that cool or what?

Didn't the 3dfx Voodoo use the link cable ONLY for vsync? The board didn't have any DAC or ADC, I think...
Which card did you plug your screen into again?
Posted on 2003-12-15 05:53:10 by HeLLoWorld
I think the VooDoo had its own dac, and it basically merged the 2d image with the 3d one. I recall that the link cable degraded the video quality anyway.
My PowerVR card does not have any connections at all, except for the PCI bus.
As I say, it copied the 3d image into a DirectDraw surface on the host card.
Posted on 2003-12-15 06:01:24 by Bruce-li