12 cycles (data from/to register) on PMMX,
real-world transform performance of > 9,000,000 vertices on my P200MMX.

"Edit Again: I think the: punpckldq mm5,mm5 should be punpckhdq mm5,mm5? And you might have to mask off the high word of the MM3 before it is stored, if you need that value zero? "
Not necessary: the piece of code in your attachment works correctly.


Posted on 2002-02-21 14:05:28 by VShader
VShader, can you email me a copy of what you're working on when you're done, or a URL where it's posted? Just curious. Nice to hear it works!
Posted on 2002-02-21 14:11:56 by bitRAKE
Yes, of course.

I'll attach it here right now, because I don't believe it will ever "be done"; I just do it for fun.

It's basically an asm-only, flat-shaded 3D engine for a racing game that uses the MMX instructions extensively (even for the 3D transform, you guessed it!).

Of course no documentation, just try every key on your keyboard (Num Lock!) and press and move the mouse.

The source is included too (MASM 6.14); it should be useful for anybody who wants to start graphics with front/backbuffer rendering under Windows 95 (or later?).

The "world" can be imported as a 3D Studio ASCII file.

I cannot guarantee that 16-bit colors are correct on your machine, but a 32-bit color exe is there too.

*If* it runs on your machine then please do me a favor:
Navigate to a point so that the whole scenery is on screen, press F1 and later F2, press the "PRINT" key on your keyboard and exit.
The screenshot is then in the clipboard.

Please mail these to and tell me your processor-specs (it runs at 60 Hz on my P200MMX).

Just curious too!


PS: The octree is not working yet, but you can visualize it (M). Has anybody here implemented an octree? I could use some hints...
Posted on 2002-02-22 11:09:03 by VShader
The output responds pretty fast to movements etc., but it looks like garbage:

Framerate is 75 fps (fixed), probably because the screen is switched to that refresh rate.

(Athlon TB 1.4Ghz, 256MB ddr)

Posted on 2002-02-22 11:36:15 by Thomas
"looks like garbage: "

true. (damn)

please try the 32bit-color version.

If it runs:
F1 - screenshot - exit.
F3 (sorry, not F2) - screenshot - exit

Posted on 2002-02-22 12:30:05 by VShader
The above screenshots were from the 32-bit versions, both versions give the same problem.
Here's the F3 output (32-bit):
Posted on 2002-02-22 12:34:09 by Thomas
It should look like this.

Don't complain about the clipping.
I will do 3D-Clipping when the octree is in place.

Posted on 2002-02-22 12:50:39 by VShader
Both exes seem to work fine on my pc (PIII 700, 128ram, Win2k)

And might I ask you a question? You say you use MMX a lot; does that mean you're dealing with integer coordinates for the 3D models? I tried looking at the source, but without English comments I'm kinda lost.

It's a beautiful program though. And 14,000 lines of code is no mean feat.
Posted on 2002-02-22 19:12:41 by Eóin
It worked great on the computers at my job, but at home I get the same thing Thomas posted. This might be DX8, or the GeForce3 drivers?

A combination of fixed point and floating point is used.

Very impressive work VShader!
Posted on 2002-02-22 20:29:16 by bitRAKE
I get the same problem on my machine... AMD Duron 800, 256 MB RAM, GeForce 2...

But I'm inclined to agree with bitRAKE, as my OS is currently using DX8.0 as well...

Hope this is some help... (wish it worked, the pics look *real* impressive!).

Good Luck..
Posted on 2002-02-23 01:16:24 by NaN
There are 3 renderers to choose from at runtime:

F7: mmx-Slopes
F6: Edgetables with midpointalgorithm
F8: GDI-Fill polygon

The GDI version should work on every machine, because the text (timing etc.) uses GDI too and it displays correctly.

But it is dead slow. It nearly made me cancel the project, because I could not believe that the Matrox drivers were that slow at filling a flat-shaded polygon, and I wanted a scene with more than 300 triangles.

With full scene on screen I get 2 fps with the GDI-filler and > 60 fps with the mmx filler.
And this isn't optimized yet: in 16-bit color mode I only write words, where I could (in larger tris) write dwords and qwords, which should be much faster.
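
A minimal sketch of that idea, written in C rather than MASM so it is easy to try out. It is not taken from the engine source; `fill_span16` is a hypothetical helper, and the assumption is that a span is a run of 16-bit pixels in one scanline. It writes single words until the pointer is 8-byte aligned, then stores four pixels per 64-bit write (the job a `movq` store would do), then finishes the tail with words again.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill `count` 16-bit pixels starting at dst with `color`.
   Hypothetical helper for illustration, not the engine's routine. */
void fill_span16(uint16_t *dst, size_t count, uint16_t color)
{
    /* Head: single word writes until dst is 8-byte aligned. */
    while (count > 0 && ((uintptr_t)dst & 7u) != 0) {
        *dst++ = color;
        count--;
    }

    /* Body: replicate the color into all four 16-bit lanes and
       store four pixels per 64-bit write. */
    uint64_t q = (uint64_t)color * 0x0001000100010001ULL;
    while (count >= 4) {
        memcpy(dst, &q, sizeof q);   /* compiles to one 8-byte store */
        dst += 4;
        count -= 4;
    }

    /* Tail: at most three pixels are left. */
    while (count > 0) {
        *dst++ = color;
        count--;
    }
}
```

Short spans never reach the 64-bit loop, which fits the remark that the gain is in larger triangles.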

I write directly to video RAM. If I used a backbuffer in main RAM it could be even faster on faster machines.

Hey, if I had one of your monster machines I could throw in a scene 7 times more complex!!! But everybody knows that the code wouldn't be that optimized then...

---quote Eoin-----
does that mean you're dealing with integer coordinates for the 3d models?

Setup of the matrices and movement of the camera is floating point.
Transform is mmx-integer fixed point.
Perspective is floating point (Can I avoid this?!?!).
Rendering is mmx-integer fixed point.

At this demo I use:

7 bits after the binary point, 9:7 format (+-256 m, ... // 1 bit: 0.78 cm)
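
To make the 9:7 numbers concrete, here is a small C sketch. The helper names are made up for illustration and the rounding and saturation behaviour is an assumption; the engine may convert differently. With 7 fractional bits there are 128 steps per metre, so one step is 1/128 m = 0.78125 cm, and a signed 16-bit word covers -256 m up to just under +256 m.

```c
#include <stdint.h>

#define FRAC_BITS 7                    /* 9:7 format: 7 fractional bits */
#define STEPS_PER_METRE (1 << FRAC_BITS)   /* 128 */

/* Convert metres to 9:7 fixed point. Rounds to nearest and saturates
   to the signed 16-bit range (both choices are assumptions). */
int16_t to_fix9_7(double metres)
{
    double scaled = metres * STEPS_PER_METRE;
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;    /* round half away from zero */
    if (scaled > 32767.0)  return 32767;
    if (scaled < -32768.0) return -32768;
    return (int16_t)scaled;                    /* cast truncates toward zero */
}

/* Convert a 9:7 value back to metres. */
double from_fix9_7(int16_t v)
{
    return (double)v / STEPS_PER_METRE;
}
```

For example, `to_fix9_7(123.0)` gives 15744, so the 123 m scene uses a bit less than half of the positive range.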

The scene is about 123m long.
There are accuracy issues with the placement of the (fixed) point if you move away from the center of the scene with your camera. They show up during rotation and scaling, not during movement.

This effect of fixed-point-transform can easily be studied while scaling the whole scene with:

U: Scale up
I: Scale down

(But you must first move away from the center)

One way would be to preserve the 32-bit accuracy after the pmaddwd: you don't pack back to 16 bit but go on with 32-bit fixed point. The MMX transform could also save 1 or 2 cycles that way (but you have to store 2x the data).
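
A scalar C model of that trade-off, as a sketch only. The formats are assumptions, not taken from the source: matrix coefficients in 2:14 and vertices in 9:7, so the 32-bit sum that pmaddwd plus one add produces carries 14 + 7 = 21 fractional bits. The function names are hypothetical.

```c
#include <stdint.h>

/* One matrix row times one vertex: two pmaddwd lane pairs, then added.
   Assumed formats: row in 2:14, vertex in 9:7, result in 11:21.
   Inputs are assumed small enough that the sum fits in 32 bits. */
int32_t row_dot_32(const int16_t row[4], const int16_t v[4])
{
    int32_t lo = (int32_t)row[0] * v[0] + (int32_t)row[1] * v[1];
    int32_t hi = (int32_t)row[2] * v[2] + (int32_t)row[3] * v[3];
    return lo + hi;
}

/* Variant A: shift back to 9:7 and keep 16 bits. The low 14 bits are
   dropped. (Right shift of a negative value is arithmetic on the
   usual compilers, like psrad.) */
int16_t pack_to_9_7(int32_t acc)
{
    return (int16_t)(acc >> 14);
}

/* Variant B: keep all 32 bits; this only converts 11:21 to metres
   so the two variants can be compared. */
double fix21_to_metres(int32_t acc)
{
    return (double)acc / 2097152.0;   /* 2^21 */
}
```

With a coefficient of 0.5 (8192) and a coordinate of 129 steps, the 32-bit value is exactly 0.50390625 m, while packing to 9:7 yields 64 steps = 0.5 m. That lost half step is the kind of error that shows up when rotating or scaling far from the center.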

Does anybody guess which game inspired me? (played it YEARS ago)

Posted on 2002-02-23 03:38:58 by VShader
Hmm, Annihilation Tank? I doubt it though, your 3d graphics are a thousand times better.
Posted on 2002-02-23 07:19:06 by Eóin

Does anybody guess which game inspired me? (played it YEARS ago)

On my PC (Athlon-XP 1800+ , ATI 8500 Radeon, C2N DataSette :tongue: ) MMX3D_32.exe works, but MMX3D.exe shows that gfx problem (intuitively I'd say wrong modulo/pitch).

About which game.. hmm.. Test Drive III? ;)

Posted on 2002-02-23 08:16:20 by Maverick
..or Hard Drivin' maybe? ;)
Posted on 2002-02-23 08:21:22 by Maverick
All wrong (shame!)

Of course it is STUNT CAR RACER (Geoff Crammond).

This should become a clone with a drive-anywhere approach (I hope I can start on the physics soon...)

Posted on 2002-02-23 15:45:07 by VShader
Don't blame me.. I know Stunt Car Racer VERY well.. played it on the Amiga for years and not only there, it's surely one of my favourite racing games ever.

Hey, your game looks good but doesn't look like Stunt Car Racer ;) You added trees, a church.. it looks more like Test Drive III or (although less) Hard Drivin'.

SCR roxx :)

Posted on 2002-02-23 21:41:56 by Maverick
Ironically (and you don't have to believe me), it was my first guess... :)

Too bad I never got my chance to post....

The posted screens look very much alike... the raised "track" is what cued me to it (as I remember countless times falling off and watching the *slow* death... )

Posted on 2002-02-24 03:37:22 by NaN
On holiday and hard at work again ... (the project rested for some months because of too much work in real life)

Just wanted to show you this really cool dog and the progress my asm-only 3D-engine makes.

To keep it assembler-related I will give some insights into the pics.

1) A nice dog.
2) For debugging reasons I wrote a short routine which rendered the just-implemented normal vector per triangle (culling *BEFORE* transforming). I loaded the dog model into the engine, altered the routine to show 4 normal vectors per tri and animated them, and it looks really cool. A very easy way to make 3D fur which moves in the wind.
4) Each triangle gets 4 normal vectors.
5) Preparing for the octree, I wrote a routine which divides the longest triangles in the scene (the spheres of the octree must be kept small).
6) The last level of the octree without perspective ...
7) ... and with perspective.
8) Testing a point (the center of an octree-sphere) is very easy and *fast* with mmx:

; d = Distance = n0 . (Q - P1) = mm7 . (mm1 - mm6)

                      ; mm1: MMX point = Q
psubw   mm1, mm6      ; mm6: MMX plane base point = P1
pmaddwd mm7, mm1      ; mm7: MMX clip-plane normal = n0
movq    mm0, mm7      ; copy both 32-bit partial sums
psrlq   mm7, 32       ; move the high partial sum down
paddd   mm0, mm7      ; add the two partial sums
movd    eax, mm0      ; eax: distance point-plane, +- 17:14
                      ; Distance is negative if *behind* plane.

With these perhaps 6 cycles you can easily test a sphere against a plane and throw it away (with all triangles in it) if it is behind the plane.

RED: out => do nothing.

9) YELLOW: Cut by plane => clip the triangles.
10) GREEN: In => render all triangles.
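
The plane test and the three colors above can be modelled in scalar C as follows. This is a sketch, not the engine's code: the names are made up, the radius comparison is an assumption about how the RED/YELLOW/GREEN decision is made, and the radius is assumed to be given in the same fixed-point scale as the distance d (which requires a normalized plane normal).

```c
#include <stdint.h>

typedef enum { SPHERE_OUT, SPHERE_CUT, SPHERE_IN } SphereClass;

/* Scalar model of psubw / pmaddwd / paddd:
   d = n0 . (Q - P1), four 16-bit lanes per operand.
   Inputs are assumed small enough that d fits in 32 bits. */
int32_t plane_distance(const int16_t n0[4], const int16_t q[4],
                       const int16_t p1[4])
{
    int32_t d = 0;
    for (int i = 0; i < 4; i++) {
        /* psubw wraps to 16 bits, so the model wraps too. */
        int16_t diff = (int16_t)(uint16_t)((uint16_t)q[i] - (uint16_t)p1[i]);
        d += (int32_t)n0[i] * diff;
    }
    return d;
}

/* Classify a sphere by the signed distance of its center. */
SphereClass classify_sphere(int32_t d, int32_t radius)
{
    if (d < -radius) return SPHERE_OUT;  /* RED: entirely behind the plane */
    if (d >  radius) return SPHERE_IN;   /* GREEN: entirely in front */
    return SPHERE_CUT;                   /* YELLOW: straddles the plane */
}
```

Only the YELLOW spheres need per-triangle clipping; the other two cases are decided for all their triangles at once.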

And one point I learned:
If your project gets more complex it is much easier to read something like

mov eax,
mov ebx,
mov cx,

than ...

mov eax,
mov ebx,
mov cx, word ptr

And ... you can even save comments! (?)
Posted on 2002-08-19 19:17:27 by VShader
Posted on 2003-02-24 16:27:45 by VShader
That's amazing! Great work! Only problem I have is that it does not draw polygons unless they fully fit on screen. Clipping problem?
Posted on 2003-05-17 12:37:36 by comrade