another_old_member,

I did some tests with trueSpace/Cinema 4D: they both do subpixel correction, and it looks smoother.
Yes, it should be worth putting in (especially for lower resolutions).
I will put it on my to-do list.

Thanks for the (at first glance not so obvious) hint!


fodder,

I don't have that option in my (Matrox) drivers. I hope somebody can tell me how to disable it in the DirectDraw section of the source code.
Anybody?


VShader
Posted on 2004-04-29 06:59:13 by VShader
Found some useful info for turning off VSync:


Enum CONST_DDFLIPFLAGS
DDFLIP_DONOTWAIT = 32
DDFLIP_EVEN = 2
DDFLIP_INTERFVAL2 = 536870912
DDFLIP_INTERFVAL3 = 805306368
DDFLIP_INTERFVAL4 = 1073741824
DDFLIP_NOVSYNC = 8
DDFLIP_ODD = 4
DDFLIP_WAIT = 1 ;only had this one
DDFLIP_STEREO = 16
End Enum


VShader
Posted on 2004-04-29 07:15:22 by VShader
WOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA!!!!

i t s   t r u l y   i n c r e d i b l e!

you deserve a medal! and a beer!

what a beautiful engine!

it looks like the engine i always dreamed i could create one day!!!!!

still got 70 fps in 800x600!
cant believe it.
i had seen the screenshots a while ago but couldnt see it working because of the pitch problem (msdn says useful things sometimes :) )
now it works... great!
its so true that a slow machine pushes you to optimise far better than otherwise.
congratulations!
respect.

cant you look around with the mouse? so you use z buffer? can you disable distance clipping? lots of keys do lots of things...

(maybe i m gonna get flamed but... I LOVE FLAT SHADING! ... or flatcolor gouraud)
did you spend much much time on the polyfiller? (would be interesting to see how it manages with texture support and your skills, but i love so much it that way!)

bye
Posted on 2004-04-29 08:14:28 by HeLLoWorld
flatshading is nice, so retro :)
Posted on 2004-04-29 08:48:13 by f0dder
>> Remember that this engine is MMX-based (16-bit signed fixed point used for the 3D transform), so the question is whether the accuracy of the transform is high enough that subpixel correction is worth implementing (or whether the transform is so inaccurate that you wouldn't notice subpixel correction).


Most 3d hardware has 4 bit subpixel accuracy.
I suppose you might be able to get away with that? Even with just 16 bit, you would still have 12 bits to spare for the actual resolution, so 1024x768 would just work.
Or you could use fewer subpixel bits. Even when using only 1 bit, you already get quite a difference.
But I would suggest using floats for all transforms, and using fixed point only during the rasterizing process.
If you can use SSE/SSE2, matrix/vector operations will be faster anyway than with MMX.
And if you use the regular FPU, it will still perform quite well, and it will give your engine a lot more flexibility (larger worlds, bigger range of scaling operations etc). I suppose you will usually spend most time rasterizing rather than doing T&L or poly setup.
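
To make the 12.4 idea concrete, here is a minimal sketch in Python (for readability only; the engine itself is MMX assembly). It assumes screen coordinates are stored as signed 16-bit words with 4 fractional bits, and the helper names (`to_fixed`, `first_scanline`, `edge_start_x`) are made up for this illustration. Subpixel correction here means pre-stepping the edge from its true start to the first scanline it actually covers.

```python
SUBPIXEL_BITS = 4                 # 4 fractional bits, as mentioned above
ONE = 1 << SUBPIXEL_BITS          # 16 subpixel steps per pixel

def to_fixed(v):
    """Convert a float screen coordinate to 12.4 fixed point."""
    f = int(round(v * ONE))
    # Assumption: coordinates must fit in a signed 16-bit word.
    if not -32768 <= f <= 32767:
        raise OverflowError("coordinate does not fit in 16 bits")
    return f

def first_scanline(y_fixed):
    """First integer scanline at or below a 12.4 y coordinate (ceiling)."""
    return (y_fixed + ONE - 1) >> SUBPIXEL_BITS

def edge_start_x(x0, y0, x1, y1):
    """Subpixel-corrected x (in pixels) where an edge crosses its first scanline.

    All inputs are 12.4 fixed point and y1 must be greater than y0.
    """
    y_start = first_scanline(y0)
    prestep = (y_start << SUBPIXEL_BITS) - y0   # 0..15 subpixel units
    slope = (x1 - x0) / (y1 - y0)               # pixels per scanline
    return (x0 + slope * prestep) / ONE
```

An edge from (0.0, 10.5) to (10.0, 20.5) first touches scanline 11, so the corrected start is x = 0.5 instead of 0. Without the pre-step the edge would start at x = 0 and jitter as the vertex moves within a pixel.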
Posted on 2004-04-29 12:08:56 by Scali

>> PMMX200 now, AMD64 soon: Not unlikely that I will add support for SSE2 then. (World can then be much bigger.)

:-) - I wonder how regular float would do, compared to fixed point, on that CPU?
Posted on 2004-04-29 12:41:55 by f0dder

>> Yes, hardwareflipping.
>>
>> Rip out all the 3D-stuff from the source in this thread and use it for your emulator.
>>
>> Please tell me how you disable VSync to get 393 FPS.
>>
>> VShader

LOL! I have an audio engine DLL which can load and play WAVs, but it's not very fast and may kill the performance of your engine. You are more than welcome to try it; I may be able to give it to ya if you promise not to laugh at the horribly messed up code :D. Another thing is that it only uses secondary buffers.
Posted on 2004-04-29 15:40:19 by x86asm
>> can you look around with the mouse?
No.
Use NumPad.
Use Ctrl+Num8 and Ctrl+Num/ to turn the camera down/up.
Ctrl+Num-, Num-, Num+ to alter LookFrom/LookAt/camera.

>>so you use z buffer?
No zbuffer, just zsorting of triangles with bucketsort.

>>can you disable distance clipping?
No, but PageUp and PageDown move the far plane.

>>lots of keys do lots of things...
Be careful with Q,W,E,R,T,Z (you can draw over the borders of the screen in video memory); the other keys should be no problem.


>>(maybe i m gonna get flamed but... I LOVE FLAT SHADING! ... or flatcolor gouraud) did you spend much much time on the polyfiller? (would be interesting to see how it manages with texture support and your skills, but i love so much it that way!)

I like fast flat shading too - I don't think texture mapping is worth trying (that's better handled by hardware and filtering).
The MMX polyfiller is in the code section "I hope I never have to touch it again", but another_old_member has motivated me enough to do more research so I can get an even cleaner, more stable look with flat shading.

VShader
Posted on 2004-04-30 03:31:15 by VShader
>> No zbuffer, just zsorting of triangles with bucketsort.


I'm not a big fan of z-sorting. There are cases where there is no proper way to sort two triangles, which causes errors in the image.
Also, if your polycount gets relatively high, sorting may become more expensive than z-buffering.
The same goes when you use multiple shaders/textures/etc (many state-changes between polygons/cache misses) or 'heavy' pixelshaders.
It may be cheaper to use a z-buffer that short-circuits the pixelshader on occluded pixels.
Just some things to think about if you move into other areas with your engine.

>> I don't think texture mapping is worth trying (that's better handled by hardware and filtering).


Well, some people even do multitexturing, per-pixel lighting and texture filtering in Java, in software... ;)
Such as this one: http://www.pouet.net/prod.php?which=10808
So on modern CPUs you can get away quite well with all this stuff.
Especially if you also optimize it with MMX/SSE/SSE2.
And this thing is also cute, it compiles DX9 shaders to x86 code, and runs everything in software, realtime: http://sourceforge.net/projects/sw-shader

So while hardware may be better at it (hardware is also better at flatshading anyway :)), it's still quite doable in software.
Posted on 2004-04-30 03:57:51 by Scali
i dont understand!
i cant understand how you do this kind of distance clipping if you dont have a zbuffer!
there seems to be a plane that cuts all polys much more sharply than just drawpoly/dontDrawPoly...
???

besides, i also will be using zbuffer in the engine i ll ... probably write :) one of these days
... dunno, but sorting polys is maybe O(n log n), and zbuffer, while seeming more expensive, is O(n), n being polys... am i wrong?

you just have a more complicated drawpoly routine...

besides, its simpler in concept i think, and thats a big argument in my eyes. i wrote a small 3d engine in pascal/asm years ago and i timed the parts, and the sort was eating a lot of the time.

.. but it was a sucking sort i made myself :), never seen it anywhere (or is it a derivative from something well known?) it was the most intuitive sort i could think of: ( a bit like bubblesort after all ):

{
unmark all array items
1:
scan the unmarked items of the array for the smallest one AND MARK IT
append it to the "result" array
goto 1 (repeat the loop numberOfItems times)
}

haha...( and it was assembly)
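
For reference, the procedure described above is essentially a selection sort that writes into a second array, so it takes O(n^2) comparisons. A minimal Python sketch (the function name is made up for this illustration, and it is not the original Pascal/asm code):

```python
def selection_sort_marked(items):
    """Repeatedly pick the smallest unmarked item and append it to the result."""
    marked = [False] * len(items)
    result = []
    for _ in range(len(items)):
        best = None
        # Scan only the items that have not been picked yet.
        for i, value in enumerate(items):
            if not marked[i] and (best is None or value < items[best]):
                best = i
        marked[best] = True
        result.append(items[best])
    return result
```

The inner scan runs n times over n items, which is why it showed up so heavily in the timings.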

dunno bucketsort...

with zbuffer you can write your polys in the order you want... thats why i wondered why to still use bsp when you have a zbuffer, like in HW accel... anyone could explain? i heard it was because you still draw the poly but in front->back order, in order (haha) to minimize pixel overdraw in the backbuffer... so you can avoid calculations of texture/lighting for this pixel if its hidden... is this right? please.

also bsp has other uses like cutting parts of scene quickly i think...


btw while explaining to my brother (explaining is a wonderful way to understand something far better yourself, even if you thought you understood it well already)...

i had the idea of using a zbuffer to test if some region of the screen space was already used by a closer poly (everything normal till now), but i would store in this zbuffer only one z value per poly, say, the z value of the first vertex you find, or maybe the mean of the z of the 3 vertices... and it would do something like your sort would do. it would look worse than a classic zbuffer (where you write the real z value of each "voxel" of your triangle, and you must calculate this value in the triangle routine for each drawn pix), and you still couldnt have triangles intersecting properly, but it would be faster (you still have to test every pixel; you cant skip the whole poly even if the 1st pixel is hidden) and you wouldnt have to modify much in your code (one load of the z value at the beginning of the triangle and a test for each pix of the scanline before the write).

i really hope you understand what i mean.

another old member:
croissant 9 simply rules!
and the sw-shader is also impressive! dynamic compilation of custom optimized routines! bilinear filtering in SW!
Posted on 2004-04-30 12:23:42 by HeLLoWorld
>> i dont understand!
>> i cant understand how you do this kind of distance clipping if you dont have a zbuffer!
>> there seems to be a plane that cuts all polys much more sharply than just drawpoly/dontDrawPoly...
>> ???


Uhh yes, that's what clipping is :)
You have a viewport with 6 planes, and all polys are either clipped to fit in the viewport, or discarded. The common algorithm for this is called Sutherland-Hodgman, I believe. A z-buffer is not supposed to do near/far clipping, since firstly it's less efficient (per-pixel operation vs per-poly), and secondly, it means that you need to reserve a 'range' to allow for clipping, decreasing the remaining accuracy, while ideally you want to use the entire range of 0..1 for your zbuffer, and everything outside that range should be clipped beforehand.
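
A minimal sketch of one Sutherland-Hodgman step, clipping a convex polygon against a single far plane, in Python for readability. The function name and the convention that "inside" means z <= far are assumptions for this illustration; a full clipper would repeat the same loop for each of the 6 planes.

```python
def clip_to_far_plane(polygon, far):
    """Clip a convex polygon (list of (x, y, z) tuples) to the half-space z <= far."""
    out = []
    for i, cur in enumerate(polygon):
        prev = polygon[i - 1]                 # wraps to the last vertex when i == 0
        cur_in = cur[2] <= far
        prev_in = prev[2] <= far
        if cur_in != prev_in:
            # The edge prev->cur crosses the plane: emit the intersection point.
            t = (far - prev[2]) / (cur[2] - prev[2])
            out.append(tuple(p + t * (c - p) for p, c in zip(prev, cur)))
        if cur_in:
            out.append(cur)
    return out
```

A triangle with one vertex beyond the plane comes back as a quad, which is the sharp cut you see at the far plane; a triangle entirely beyond it comes back empty and is discarded.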

>> dunno, but sorting polys is maybe O(n log n), and zbuffer, while seeming more expensive, is O(n), n being polys... am i wrong?


Sorting can be done in O(n) with radixsort (aka bucketsort).
z-buffering is done per-pixel, so you can't express its complexity in terms of polys.
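
As a rough illustration of linear-time depth sorting, here is a single-pass bucket sort in Python. It is only a sketch under stated assumptions: each triangle is a `(depth, payload)` pair, depths lie roughly between `z_near` and `z_far`, and triangles that land in the same bucket keep their input order, so the result is only as exact as the bucket width. The function name and parameters are made up for this example.

```python
def bucket_sort_by_depth(triangles, z_near, z_far, num_buckets=1024):
    """Return triangles ordered back to front (largest depth first)."""
    buckets = [[] for _ in range(num_buckets)]
    scale = (num_buckets - 1) / (z_far - z_near)
    for tri in triangles:
        index = int((tri[0] - z_near) * scale)
        index = min(max(index, 0), num_buckets - 1)   # clamp out-of-range depths
        buckets[index].append(tri)
    ordered = []
    for bucket in reversed(buckets):                  # far buckets first
        ordered.extend(bucket)
    return ordered
```

The cost is one pass over the triangles plus one pass over the buckets, so it is O(n + number of buckets) regardless of the input order.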

>> with zbuffer you can write your polys in the order you want... thats why i wondered why to still use bsp when you have a zbuffer, like in HW accel... anyone could explain? i heard it was because you still draw the poly but in front->back order, in order (haha) to minimize pixel overdraw in the backbuffer... so you can avoid calculations of texture/lighting for this pixel if its hidden... is this right? please.


Yes, front-to-back is the most efficient way to draw with z-buffer. Sometimes it's even faster to do a first-pass which updates only the z-buffer, and then do a second pass where you actually draw the pixels.
This way you get maximum occlusion.
Not many games use BSP anymore for visualization. Octrees are quite popular. BSP is still nice for eg collision tests though.
Problem with full BSP is that you get the drawing list poly-for-poly, which is very inefficient for T&L hardware. So usually you get 'leafy BSPs', where a group of polys is stored, and order/visibility are not 100% exact anyway. And generally you have multiple sectors, each with their own BSP tree. Octree is just another way of sectoring, and instead of using BSP trees, all polys are stored per-sector, for more efficient drawing with hardware.

>> croissant 9 simply rules!


I think it is the most advanced software-renderer ever used in any demo. It can compete well with sw-shader in terms of quality, and even speed, I think :)
It also uses a shader system, but it's not based on D3D-shaders. They are written in Java.
Posted on 2004-04-30 12:40:26 by Scali
btw i read somewhere that for sorting triangles by z, quicksort was unsuited and bubblesort was good, because between frames the camera doesnt move much, so the tris are pretty much in the same order as one frame before, so bubblesort is good (that assumes you keep your structure in the way it is when youve ended the sort, and you reuse it the next frame) ...

i had never thought of that by myself.

(it seems quicksort is bad on almost sorted arrays)
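
A small Python sketch of why this works: a bubble sort with an early exit needs only as many passes as the list is "out of order", so a list that is already sorted from the previous frame costs a single pass. The function name is made up, and it sorts plain depth values instead of real triangles.

```python
def bubble_sort_passes(depths):
    """Sort the list in place and return how many passes were needed."""
    passes = 0
    swapped = True
    while swapped:
        swapped = False
        passes += 1
        for i in range(len(depths) - 1):
            if depths[i] > depths[i + 1]:
                depths[i], depths[i + 1] = depths[i + 1], depths[i]
                swapped = True
    return passes
```

An unchanged order costs one pass, one swapped neighbour pair costs two, and a fully reversed list of n items costs n passes, which is the O(n^2) worst case.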
Posted on 2004-04-30 12:41:46 by HeLLoWorld
btw i dont fully understand quicksort but one day i looked at a pascal code doing it, and i tried to do the same thing without recursion (i m not fond of recursion) and that day i think i managed it... but i didnt write it. maybe i should try again.

does it already exist and has it another name?

(i think i didnt need much more data space with my algo, maybe 2 times the space needed to store the array, and i think i hadnt anything dynamic )
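
This is usually just called iterative (or non-recursive) quicksort: the recursion is replaced by an explicit stack of sub-ranges still to be sorted. A minimal Python sketch, using the same middle-pivot partition found in many Pascal textbooks (the function name is made up, and this is not the lost code mentioned above):

```python
def quicksort_iterative(items):
    """Sort a list in place without recursion, using an explicit stack of ranges."""
    stack = [(0, len(items) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = items[(lo + hi) // 2]
        i, j = lo, hi
        # Partition: move smaller values left of the pivot, larger ones right.
        while i <= j:
            while items[i] < pivot:
                i += 1
            while items[j] > pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        stack.append((lo, j))
        stack.append((i, hi))
    return items
```

In this simple version the stack can grow with the input in bad cases; the extra space is for range bounds only, not for copies of the array.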
Posted on 2004-04-30 12:47:39 by HeLLoWorld
>> and all polys are either clipped to fit in the viewport, or discarded. The common algorithm for this is called Sutherland-Hodgman, I believe.



so its impossible to have only triangles, i mean, either you have a fixed number of polys that can dynamically grow their number of edges, or you must add some triangles to your triangle list in case a triangle is cut and becomes a quad... right or not? complicated.

>> Problem with full BSP is that you get the drawing list poly-for-poly, which is very inefficient for T&L hardware. So usually you get 'leafy BSPs', where a group of polys is stored, and order/visibility are not 100% exact anyway. And generally you have multiple sectors, each with their own BSP tree. Octree is just another way of sectoring, and instead of using BSP trees, all polys are stored per-sector, for more efficient drawing with hardware.


yes, a team in germany doing research on realtime raytracing (SaarCOR) was also talking of doing hierarchical bsp trees with local sorted trees that you would then merge to form a global big bsp, because RTRT relies more heavily on bsp and dynamic scenes are a problem, especially nonlinear space transformation of groups of triangles (animating a robot is ok (translate/rotate parts of the body) but a real skin is not (all polys distorting simultaneously)).

anyone understood what i meant with the simplified zbuffer triangle "sorting"?
Posted on 2004-04-30 13:11:34 by HeLLoWorld
>> so its impossible to have only triangles, i mean, either you have a fixed number of polys that can dynamically grow their number of edges, or you must add some triangles to your triangle list in case a triangle is cut and becomes a quad... right or not? complicated.


However you clip a triangle (or any convex polygon for that matter) with a set of (infinite) planes, it will always remain a convex polygon.
And any convex polygon can be subdivided into a set of triangles in a trivial way (triangle fan).
So you clip the triangles, and then feed the resulting convex poly as a set of triangles to the rasterizer.
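
The triangle-fan step is short enough to sketch in a few lines of Python (the function name is made up for this illustration). A convex polygon with n vertices always yields n - 2 triangles that share the first vertex.

```python
def triangle_fan(polygon):
    """Split a convex polygon (list of vertices in order) into a fan of triangles."""
    return [(polygon[0], polygon[i], polygon[i + 1])
            for i in range(1, len(polygon) - 1)]
```

So a triangle that gets clipped into a quad `[a, b, c, d]` is drawn as the two triangles `(a, b, c)` and `(a, c, d)`, and the rasterizer never has to know about anything but triangles.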

>> anyone understood what i meant with the simplified zbuffer triangle "sorting"?


From what I understood, you divide the screen up in sectors, and for each sector, you sort the polys. Basically you divide the problem up into a set of smaller problems.
I don't think this would work very well though. It takes time to figure out which poly goes into which sector, and then you have to sort and draw all sectors separately (and how do you handle polygons that are in more than one sector at a time?).
I think it's much simpler and faster to just sort all polys at once.

What you could do, however... is to subdivide all objects into purely convex meshes. This way you know that the meshes themselves can be drawn entirely without sorting (if backface culling is on, you can draw them as-is, and if culling is disabled, you could sort by first rendering backfaces, and then frontfaces, since all backfaces will be behind all frontfaces by definition).
Then you only need to sort on a per-mesh basis rather than per-poly. Of course the problem here is when two meshes are intersecting. You could test for possibly intersecting meshes by checking the intersection of their bounding volumes, and resorting to per-poly sort or zbuffer for these meshes.
Such an approach is often used for handling translucent objects on 3d-hardware (per-poly sort is incredibly slow on T&L hardware).
Posted on 2004-04-30 13:28:31 by Scali
>> So you clip the triangles, and then feed the resulting convex poly as a set of triangles to the rasterizer.


okay

>> From what I understood, you divide the screen up in sectors, and for each sector, you sort the polys. Basically you divide the problem up into a set of smaller problems.


haha... i knew it was hard to explain, but its very simple.

it really almost is like zbuffer.
you dont need to sort anything , you just draw all the polys.
when you draw a poly, for each pix, you calculate x, y (coords in screen space) and z with your rasterizer routine, and if z>zbuf(x,y) then backbuff(x,y)=color, exactly like with true zbuffer, but here, in fact you dont calculate z for each pixel, you just take each time the same value, the middle z of the triangle for instance, so you just have a test between each pixel, you save the interpolation of z between the edges.(but you still need to interpolate x and y (well, just x in your current scanline) ).

i hope its clearer.
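
To pin the idea down, a tiny Python sketch of one scanline span with a single z value per triangle. All names are made up for this illustration, and it keeps the test exactly as written above (a larger z wins), which assumes larger z means closer to the viewer.

```python
def draw_span_flat_z(zbuf, backbuf, y, x_start, x_end, tri_z, color):
    """Fill pixels x_start..x_end-1 of row y, using one z value for the whole triangle."""
    z_row, color_row = zbuf[y], backbuf[y]
    for x in range(x_start, x_end):
        if tri_z > z_row[x]:          # per-pixel test, but no z interpolation
            z_row[x] = tri_z
            color_row[x] = color

# One 8-pixel row; the z-buffer starts "infinitely far away" under this convention.
zbuf = [[float("-inf")] * 8]
backbuf = [["."] * 8]
draw_span_flat_z(zbuf, backbuf, 0, 0, 6, 1.0, "A")
draw_span_flat_z(zbuf, backbuf, 0, 4, 8, 2.0, "B")
draw_span_flat_z(zbuf, backbuf, 0, 0, 8, 1.5, "C")
```

The draw order does not matter for the result, but each pixel still needs one read and possibly one write of the z-buffer.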
Posted on 2004-04-30 13:42:04 by HeLLoWorld
>> it really almost is like zbuffer.
>> you dont need to sort anything , you just draw all the polys.
>> when you draw a poly, for each pix, you calculate x, y (coords in screen space) and z with your rasterizer routine, and if z>zbuf(x,y) then backbuff(x,y)=color, exactly like with true zbuffer, but here, in fact you dont calculate z for each pixel, you just take each time the same value, the middle z of the triangle for instance, so you just have a test between each pixel, you save the interpolation of z between the edges.(but you still need to interpolate x and y (well, just x in your current scanline) ).


So you still have to read the z-value for each pixel you rasterize, and write the z-value for each pixel you draw, else it doesn't work, right?
But this value is not correct, so you may still get sorting bugs.
Somehow I think that this method is barely faster than a true zbuffer (interpolating z is quite cheap, especially if you are interpolating many other factors, such as light, texture, 1/w etc per-pixel anyway), while it will barely look better than a real sort.

I would prefer to take the true z-buffer then, because I think it will barely be slower (the z-buffer comparison is more expensive than the interpolation of the z per-pixel anyway, so you're not eliminating the most important disadvantage to a polysort), and it will give pixel-perfect intersections, and it will render any polymodel correctly (that alone is enough for me not to use sorting anyway. Z-buffer is 'fast enough' on modern hardware).
Posted on 2004-04-30 13:52:08 by Scali
you must be right... i still need to read from zbuffer and sometimes write, so some more adds dont make much difference...
Posted on 2004-05-03 06:17:49 by HeLLoWorld
Hi guys,

Wrote an exporter-plugin for Cinema4d 6.3.0.2

You can export polygon objects. They can have arbitrary colors on arbitrary selections (only the color is used, for flat shading).
Further instructions are in the plugin.

The engine can now be used as a fly-through module in C4D.



Changes to the engine:

# Movement speed is now independent of the frame rate.
# Drastically faster import (scenes with more than 100000 triangles now load in seconds; they took about 15 min before...)
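
One common way to get frame-rate-independent movement is to scale every step by the time the last frame took. The Python sketch below only illustrates that idea; it is not the engine's actual code, and the function names and units are made up.

```python
def advance(position, speed_units_per_second, frame_time_seconds):
    """Move by speed * elapsed time instead of a fixed step per frame."""
    return position + speed_units_per_second * frame_time_seconds

def simulate(fps, seconds, speed):
    """Fly straight ahead for a while at a constant frame rate."""
    position = 0.0
    for _ in range(int(fps * seconds)):
        position = advance(position, speed, 1.0 / fps)
    return position
```

With this scheme, two seconds of flight at 70 fps and at 393 fps both cover (up to rounding) the same 20 units at speed 10.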

Here is the complete list of movement keys:

------------------------------
Num5: Forward
Num2: Back
Num1: Strafe left
Num3: Strafe right
Num7: Down
Num9: Up
Num4: Turn left
Num6: Turn right
Num8: Turn down
Num/: Turn up
Ctrl+Num8: Turn down but fly straight
Ctrl+Num/: Turn up but fly straight
Ctrl+Num7: Roll (?) left
Ctrl+Num9: Roll (?) right
Num* and Ctrl+Num*: Reset turn down/up

Num+ once: Only move "LookAt" (like watching after a remote controlled car)
Num+ again: Only move "LookFrom" (look at a fixed point)
Num+ again: Normal camera (LookFrom and LookAt are attached to a "car").

Num-: Set LookFrom (move to the desired point first)
Ctrl+Num-: Set LookAt (move to the desired point first)
------------------------------

btw: while toying around with landscape objects I realized that 16-bit vertex data isn't *that* bad: in this real landscape scene (southern Germany, heightfield extracted from a map program, about 40,000 triangles), which covers perhaps 25x25 km^2, I use only 10% of the side of the max. cube, or 1% of the max. area. So at this resolution the area in 16 bit could be 250x250 km^2!
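
The arithmetic behind that estimate, as a small Python sketch. It assumes the "max. cube" spans the full 16-bit coordinate range (65536 steps per axis); the function names are made up for this illustration.

```python
def max_extent_km(scene_side_km, fraction_of_cube_side):
    """Side length the full coordinate cube would cover at the scene's current scale."""
    return scene_side_km / fraction_of_cube_side

def grid_step_m(max_side_km, bits=16):
    """Distance between two neighbouring coordinate values, in metres."""
    return max_side_km * 1000.0 / (2 ** bits)

side = max_extent_km(25.0, 0.10)      # about 250 km
step = grid_step_m(side)              # about 3.8 m per coordinate step
area_fraction = 0.10 ** 2             # about 0.01, i.e. 1% of the max. area
```

So at this scale vertices snap to a grid of roughly 3.8 m, which is fine for a 25 km landscape but would be coarse for small objects in the same coordinate space.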


VShader
Posted on 2004-06-29 13:17:17 by VShader
And here are the engine and the C4D plugin.

VShader
Posted on 2004-06-29 13:18:19 by VShader