Thinking about it again...
We also heard the same stories about 10 years ago, when Microsoft introduced .NET.
Oh it would kill off Win32, and you would never be able to write native code again...
Never happened.

Before that, we had the DOS-to-Windows migration.
"Oh no, now you can't access the hardware directly anymore, and you'll get all this overhead because of the multitasking with things running in the background and such".
Well, those people were nothing more than a bunch of luddites.
I think this thread has made it quite obvious that I'm at home when it comes to accessing the hardware directly and all that... But I never saw that as the best possible way, let alone the only way to write code.
I adopted DirectDraw at an early stage (the Win32ASM plasma tutorial that Ewald and I wrote was one of the first ddraw tutorials ever in ASM), and quickly realized just how wrong these people were. Even on a modest 486, the overhead of the DDraw API and multitasking, compared to direct/exclusive system access in DOS, was negligible. Those people never even bothered to try things for themselves.

So yes, perhaps Metro will be more restrictive than the Win32 API... but the real question is: does it matter? The Win32 API was more restrictive than DOS as well. But apparently today this is the 'gold standard'.
Reminds me of another blog I wrote recently, about how Windows XP is the 'gold standard' to some luddites/fanboys, when people ignore everything that went before XP, and also XP's own rocky start: http://scalibq.wordpress.com/2012/04/17/windows-xp-the-gold-standard-of-windows-oses/
It's all so... arbitrary. And so... short-sighted...
Posted on 2012-09-01 12:59:06 by Scali
Whoops, I actually missed an obscure graphics mode that Dosbox and PCem support.
Namely the 160x200 16-colour mode which is a side-effect of NTSC colour aliasing when you use a composite signal.
I've added support for it now.
Attachments:
Posted on 2012-09-05 18:04:26 by Scali
One of the last things that was still missing was a polygon clipper.
As long as your objects would fit entirely inside the screen area, that was not a problem... But if you want slightly more fancy 3d environments than just a single object in the center of the screen, having the ability to clip is pretty much required.
So, there we go: http://youtu.be/V7jrBYEGlD4
Posted on 2012-09-08 09:53:31 by Scali
Some background information, to put things in perspective:
The 286 donut consists of 128 quads: http://www.youtube.com/watch?v=V7jrBYEGlD4
The 486 donut consists of 750 triangles: http://www.youtube.com/watch?v=xE9iifKXvY4
The PowerVR donut (with geometry processing done on Pentium Pro 200 MHz) consists of 2800 triangles: http://www.youtube.com/watch?v=1BWbuUg8yvA

Because of vertex reuse, you only need one unique vertex per quad (i.e. per two triangles), so this translates to the following:
- The 286 transforms 128 unique vertices per frame
- The 486 transforms 375 unique vertices per frame
- The PPro transforms 1400 unique vertices per frame

All this can be done in realtime, as can be seen in the videos. In the case of the 286 and 486, the actual bottleneck is not so much the geometry processing as the rasterization, which is all done on the CPU as well.
The 286 and 486 code does not use an FPU for these operations. The Pentium Pro code uses the regular x87 FPU; no SIMD extensions are available on that CPU.

These transforms are more or less matrix*vector operations.
If we want a ballpark figure of matrix*matrix performance, we can treat a matrix as 4 vectors, and divide the above figures by 4:
- The 286 workload is roughly equivalent to 32 matrix*matrix operations
- The 486 workload is roughly equivalent to 94 matrix*matrix operations
- The PPro workload is roughly equivalent to 350 matrix*matrix operations
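The equivalence used above can be sketched in C: a 4x4 matrix*matrix product is just four matrix*vector transforms, one per column of the right-hand matrix. (A rough illustrative sketch in floats; the actual 286/486 code is fixed-point assembly, and these names are mine.)

```c
typedef struct { float m[4][4]; } Mat4;
typedef struct { float v[4]; } Vec4;

/* One matrix*vector transform: 16 multiplies, 12 adds. */
static Vec4 transform(const Mat4 *m, Vec4 in)
{
    Vec4 out;
    for (int r = 0; r < 4; r++) {
        out.v[r] = m->m[r][0] * in.v[0] + m->m[r][1] * in.v[1]
                 + m->m[r][2] * in.v[2] + m->m[r][3] * in.v[3];
    }
    return out;
}

/* A matrix*matrix product is four matrix*vector transforms:
   column j of (a*b) is column j of b transformed by a. */
static Mat4 multiply(const Mat4 *a, const Mat4 *b)
{
    Mat4 out;
    for (int j = 0; j < 4; j++) {
        Vec4 col = { { b->m[0][j], b->m[1][j], b->m[2][j], b->m[3][j] } };
        col = transform(a, col);
        for (int r = 0; r < 4; r++)
            out.m[r][j] = col.v[r];
    }
    return out;
}
```

So the 486's 375 unique vertices per frame cost roughly the same as 94 of these 4-column products.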

Some comparisons to a recent mainstream system:
- On a 486DX2-80, the demonstrated code runs at about 50 fps. On a Core i7 860 the same code runs at over 5000 fps. Roughly a factor 100 difference in performance.
- On a Pentium Pro, the demonstrated code runs at about 50 fps. On a Core i7 860 the same code runs at 3200 fps. Roughly a factor 64 difference in performance.
Posted on 2012-09-19 14:03:33 by Scali
Decidedly less oldskool perhaps, but still related, is the work I'm currently doing on the GamePark GP2X.
This portable game console dates from 2005, but it has no 3d capabilities whatsoever, so I'm converting my 286 and 486 rasterizing code to the GP2X. The device has two 200 MHz ARM processors, but it isn't all that fast really. So far the performance seems to be somewhere between a 486-66 and a Pentium 75, depending on what it is you're doing.
Anyway, I've managed to get it to render textured objects and flatshaded ones, and I've just managed to import my claw object from 3dsmax onto the device.
I'll be porting some of this code back to the 286, 486 and Amiga codebases, so they can benefit from the improved lighting routines and the imported objects as well. The claw is a bit too high-poly for those machines though :)
Posted on 2012-11-02 19:41:42 by Scali
Some extra info:
The processors in the GP2X do not have an FPU. This is the main reason why the 286 and 486 rasterizers are so appropriate for this machine (although the 486 technically has an FPU, I do not use it in any time-critical code, because the performance of the 487 FPU is very poor compared to fixed-point arithmetic).
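The fixed-point idea in a nutshell (a generic 16.16 sketch, not the actual renderer code): numbers are stored as integers scaled by 2^16, so multiplication and division need only integer instructions, which FPU-less CPUs like these ARM cores execute quickly.

```c
#include <stdint.h>

typedef int32_t fx;              /* 16.16 fixed-point number */
#define FX_ONE  (1 << 16)

static fx fx_from_int(int i)     { return (fx)i << 16; }
static fx fx_from_float(float f) { return (fx)(f * FX_ONE); }

/* Multiply: widen to 64 bits so the intermediate doesn't overflow,
   then shift the binary point back into place. */
static fx fx_mul(fx a, fx b)
{
    return (fx)(((int64_t)a * b) >> 16);
}

/* Divide: pre-shift the dividend up by 16 to preserve the fraction. */
static fx fx_div(fx a, fx b)
{
    return (fx)(((int64_t)a << 16) / b);
}
```

On a real 286 you'd do this with 16/32-bit register pairs instead of int64_t, but the principle is the same.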

Aside from that, the second CPU only supports a subset of the ARM instruction set. It does not do division, for example. Its main purpose is to be a sort of DSP coprocessor for things like video decoding. It can probably be used to offload at least some of the rendering duties (it could do batch-processing of matrix*vec operations for example, or perform the innerloop of the rasterizer), but so far I've concentrated only on the host CPU.
Posted on 2012-11-03 07:07:14 by Scali
I've added Phong shading to the GP2X renderer now.
Attachments:
Posted on 2012-11-09 17:17:29 by Scali
Our (Quebarium, DESiRE, TRSI) GP2X demo 'The Chalcogens' won first place at the Recursion 2012 demo compo!
It's taking a while to release a final version and YouTube capture of the demo. But here is a short preview that I've made during development of the demo:
http://youtu.be/iyMvQzeIAIE
Posted on 2012-11-28 05:09:37 by Scali
Our demo is on Pouet now: http://pouet.net/prod.php?which=60788
The release contains both the GP2X binaries, and a Win32 port.
And here is the YouTube capture: http://www.youtube.com/watch?v=An9FnRiBn9g
Posted on 2012-11-29 01:44:40 by Scali

I also managed to obtain an SDK for PowerSGL, the native rendering API for this chip. I might port the donut from Direct3D to PowerSGL at some point.


I've had a bit of spare time to play around with PowerSGL.
Currently I managed to render the donut with a texture, using the PowerSGL Direct API: http://youtu.be/azfQHgUKOrk
This works much like a software renderer, where you do all T&L on the CPU, and send the triangles to the hardware in screenspace. However, since the PowerVR is a tile-based deferred renderer, there is no need to sort the polygons on depth, and you don't need to clip against the screen edges either, since this hardware can do that for free.
It can also cull backfaces in screenspace for free, at least on the hardware-side, but it's often better to cull in software, because then you can discard vertices at an earlier stage, and skip T&L altogether to save valuable CPU-cycles.
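For reference, the screenspace variant of that test is just a signed-area check (a generic sketch with my own names, not the PowerSGL interface): after projection, the winding of the triangle tells you whether you see its front or its back.

```c
/* Returns nonzero if the projected triangle should be kept.
   Which sign counts as "front" depends on your winding order and
   whether the y axis points down; here CW-on-screen is front. */
static int is_front_facing(float x0, float y0,
                           float x1, float y1,
                           float x2, float y2)
{
    /* Twice the signed area, via the 2D cross product of two edges. */
    float area2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0);
    return area2 < 0.0f;
}
```

Culling earlier, in object space against the eye position, is what lets you skip the T&L for those vertices entirely.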

Currently I just perform T&L on the entire mesh and leave culling and clipping entirely to PowerSGL. In that case I get about 47 fps, more or less the same as I got with Direct3D (I have no idea how 'smart' the Direct3D implementation is, as in whether it tries to save CPU-cycles, or just dumps the mesh to the hardware directly).

There is also a more highlevel API to PowerSGL, which is scenegraph-oriented, with lights, materials, meshes and such. Comparable to OpenGL or Direct3D.
I will try to port my code to this API as well, in which case the API should take care of the above things regarding efficient culling and such.
I may also try to implement some CPU-based culling for the PowerSGL Direct code to see if I can get better framerates that way, or if I am just completely limited by the hardware at this point.
Posted on 2012-12-07 16:54:10 by Scali
PowerVR is a tile renderer my ass?
I use this as a full emulation layer for rendering gles 2 on windows
wondering what universe u live in
Posted on 2012-12-08 02:04:11 by Homer

PowerVR is a tile renderer my a**?
I use this as a full emulation layer for rendering gles 2 on windows
wondering what universe u live in


Okay, that made no sense whatsoever?
Firstly, it is common knowledge that PowerVR uses TBDR technology, as you can read on their site for example: http://www.imgtec.com/powervr/powervr-graphics-technology.asp
PowerVR graphics technology is based on a concept called Tile Based Deferred Rendering (TBDR). In contrast to Immediate Mode Rendering (IMR) used by most graphics engines in the PC and games console worlds, TBDR focuses on minimising the processing required to render an image as early in the processing of a scene as possible, so that only the pixels that actually will be seen by the end user consume processing resources. This approach minimizes memory and power while improving processing throughput but it is more complex. Imagination Technologies has refined this challenging technology to the point where it dominates the mobile markets for 3D graphics rendering, backed up by an extensive patent portfolio.


I mean, how can you NOT know this? TBDR is the raison d'être of Imagination Technologies. My PowerVR PCX2 is an update of the PCX1, which was the world's first commercially available TBDR-based 3d accelerator. The PCX1 however is quite rare, and even more rare is its elusive predecessor, codenamed 'Midas 3', which was only found as an OEM part in some Compaq Presario machines. The PCX2 was sold as a regular add-on board (both by VideoLogic themselves, which is now ImgTec, and by Matrox as their M3D card).

Secondly, what does TBDR have to do with GLES2? Obviously the newer PowerVR chips are all fully GLES2-compatible (mine isn't obviously, since it dates from 1997, long before GLES was invented). It's not like TBDR and GLES are somehow mutually exclusive. All the PowerVR chips aimed at the Windows market also supported Direct3D and OpenGL (or at least MiniGL in the case of the PCX1/2, since back in 1997 full OpenGL support was not yet possible on consumer-level hardware).
From the same page:
All popular APIs and OS are supported by all SGX cores, including OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0 and DirectX 9/10.1 on Symbian, Linux, Android, WinCE/Windows Mobile and Windows 7/Vista/XP.


So I wonder what universe *you* live in (aside from the fact that you try to lump in a 1997 accelerator with today's state-of-the-art)?
Posted on 2012-12-08 05:50:36 by Scali

There is also a more highlevel API to PowerSGL, which is scenegraph-oriented, with lights, materials, meshes and such. Comparable to OpenGL or Direct3D.
I will try to port my code to this API as well, in which case the API should take care of the above things regarding efficient culling and such.
I may also try to implement some CPU-based culling for the PowerSGL Direct code to see if I can get better framerates that way, or if I am just completely limited by the hardware at this point.


Okay, so it was not *quite* like D3D and OpenGL. More similar to OpenGL than to D3D though, since it is based on display lists.
Anyway, I managed to get my donut working with this API as well. Still quite rough around the edges, but anyway: http://youtu.be/0P6MW1mn0Eg
The framerate is roughly 48 fps once again. So either PowerSGL does not do a whole lot of optimizations on the CPU-side either, or the D3D and PowerSGL Direct attempts were pushing the hardware to the limits already.

This is still an approach using a conventional polygon mesh however (yes, real polygons as in n-gons, so I use quads for the donut, not triangles). The hardware also supports 'infinite planes' geometry. I will try to create a donut using this technique, and see what happens. A donut won't be a very good case for infinite planes, seeing as it's not a convex object. So I'll have to end up making convex 'tube sections', and connecting those together to form the actual donut shape.
I wonder if the end result will be better or worse than a regular mesh in this case.
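The idea behind the infinite-plane representation (a generic half-space sketch; the actual PowerSGL structures differ): a convex solid is the set of points on the inner side of all of its planes, which is exactly why the donut has to be split into convex tube sections first.

```c
typedef struct { float nx, ny, nz, d; } Plane;  /* nx*x + ny*y + nz*z + d = 0 */

/* A point is inside a convex solid iff it lies on the negative side
   of every bounding plane (normals pointing outward). */
static int inside_convex(const Plane *planes, int count,
                         float x, float y, float z)
{
    for (int i = 0; i < count; i++) {
        float dist = planes[i].nx * x + planes[i].ny * y
                   + planes[i].nz * z + planes[i].d;
        if (dist > 0.0f)
            return 0;  /* outside this half-space, so outside the solid */
    }
    return 1;
}
```

A concave shape like a torus can't be written as one such intersection, so it becomes a union of convex pieces.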
And since the high-level PowerSGL API does not provide any performance improvements either, I am tempted to optimize the PowerSGL Direct code, and filter out backfaces before handing them off to the hardware.
Posted on 2012-12-11 07:28:55 by Scali
Right, the donut is not a very good case for infinite planes AT ALL!
Since the donut is relatively high-poly, I run into the limitations of PowerSGL quickly. It only accepts 100 planes per object.
My donut is defined as 40 tube sections of 35 quads each. So each tube section already needs 35 planes for the outside, and then two more planes for the caps.
This means that I should create each tube section as a separate object, in order to avoid the 100 plane limit.

The result was single-digit framerates, sadly: http://youtu.be/sSUBAT7St00
Nevertheless, it was quite an interesting exercise to define the donut in terms of planes/half-spaces. And it's quite interesting how the hardware is actually capable of rendering these infinite planes. Doing a ground plane in a game is trivial with this.

Oh well, next experiment will be to optimize the backface culling on the CPU, to see if the hardware can be pushed just a tad further than the 48 fps I've had so far, with my 1400 polygon donut (yea, that would be 2800 triangles for those newfangled triangle-based GPUs).
Posted on 2012-12-12 10:21:44 by Scali
You're too talented to be floating around in these backwaters, why screw with 48 fps soft? I don't understand you.
Posted on 2012-12-13 01:39:34 by Homer
It's all about having fun, off the beaten path, experimenting, trying to be creative and all that.
Posted on 2012-12-13 13:31:23 by Scali

Oh well, next experiment will be to optimize the backface culling on the CPU, to see if the hardware can be pushed just a tad further than the 48 fps I've had so far, with my 1400 polygon donut (yea, that would be 2800 triangles for those newfangled triangle-based GPUs).


Done that experiment as well, and sadly I have to report that filtering out the backfaces on the CPU side doesn't make any difference at all, still stuck at 48 fps. Apparently it's entirely limited by the rasterizing overhead. This is as good as it's going to get for the poor old PowerVR PCX2 it seems. I'm out of ideas :P
Posted on 2012-12-16 15:47:38 by Scali
I documented the PowerVR experiments in more detail in this blog: http://scalibq.wordpress.com/2012/12/18/just-keeping-it-real-part-6/
Posted on 2012-12-18 16:38:57 by Scali
Another side-project appeared, where I can combine my 286 rasterizer with some of the PowerVR technology.
Namely, I know a guy who is working with embedded microcontrollers and such. He has hooked up a low-end ARM core to an LCD screen. I gave him my 286 rasterizer code, which he adapted to run on his contraption: http://youtu.be/smz2cU3FfFk

As you can see though, it only uses a part of the screen. This is because there are some limitations on the hardware:
1) The display is 320x240 with 16-bit pixels, with its own frontbuffer (150kb). But image data can only be transferred with 64kb blocks at a time.
2) The embedded ARM only has 128kb ram in total, so there simply is not enough room for an entire backbuffer.

So the first try simply uses a 128x128 section of the screen. But, we are not satisfied with that, obviously. So how do we circumvent these limitations? Well, we take a leaf out of the PowerVR book, and render in tiles!
So I made a few modifications to my code so I could set the clipper to any given tile, rather than having it hardcoded to the screen rectangle, and to offset my rasterizer so it can render different parts of the screen in the same tile buffer.
In the process, I also ran into some minor precision issues, so I improved my clipper's accuracy as well: I now make sure that it always clips in the same direction, so that any rounding errors are also in the same direction, and clipped polygon edges will always fit together.
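The rounding trick can be sketched like this (illustrative code, not the actual clipper): by putting the two endpoints in a canonical order before interpolating, the shared edge of two adjacent polygons produces bit-identical intersection points, no matter which winding each polygon visits it in.

```c
/* Clip the edge (x0,y0)-(x1,y1) against the vertical line x = xc and
   return the intersection's y. Sorting the endpoints first makes the
   arithmetic identical for (a,b) and (b,a), so adjacent polygons
   sharing the edge get exactly the same clipped vertex. */
static float clip_edge_y(float x0, float y0, float x1, float y1, float xc)
{
    if (x1 < x0 || (x1 == x0 && y1 < y0)) {   /* canonical order */
        float tx = x0; x0 = x1; x1 = tx;
        float ty = y0; y0 = y1; y1 = ty;
    }
    float t = (xc - x0) / (x1 - x0);
    return y0 + t * (y1 - y0);
}
```

Without the ordering step, the two polygons may round the intersection differently by one ULP, which shows up as pixel cracks along shared edges.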
In the attached screenshot, I used 3 tiles for the screen (which is what we intend to do on the target hardware). I chose to set the tiles 1 scanline apart, so it is easy to visually inspect the correctness.
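The resulting tile loop looks roughly like this (function names and stubs are placeholders, not the actual embedded code): the 320x240 screen is covered by three 320x80 tiles, each 320*80*2 = 50 KB, small enough for both the 128 KB RAM budget and the 64 KB transfer limit.

```c
#define SCREEN_W   320
#define SCREEN_H   240
#define TILE_H     80                       /* 3 tiles of 320x80 */
#define TILE_BYTES (SCREEN_W * TILE_H * 2)  /* 50 KB of 16-bit pixels */

static unsigned short tile_buffer[SCREEN_W * TILE_H];
static int blitted_bytes = 0;

/* Stand-ins for the real renderer/display interface. */
static void set_clip_rect(int x0, int y0, int x1, int y1)
{ (void)x0; (void)y0; (void)x1; (void)y1; }
static void render_scene(unsigned short *target, int y_offset)
{ target[0] = (unsigned short)y_offset; }
static void blit_to_lcd(const unsigned short *src, int y, int bytes)
{ (void)src; (void)y; blitted_bytes += bytes; }

static void render_frame(void)
{
    for (int y = 0; y < SCREEN_H; y += TILE_H) {
        set_clip_rect(0, y, SCREEN_W, y + TILE_H);  /* clip to this tile  */
        render_scene(tile_buffer, y);               /* rasterize into it  */
        blit_to_lcd(tile_buffer, y, TILE_BYTES);    /* push tile to LCD   */
    }
}
```

One reusable 50 KB tile buffer stands in for the full 150 KB backbuffer that wouldn't fit in RAM.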
Attachments:
Posted on 2012-12-26 09:09:51 by Scali
I first wrapped up the remaining Amiga-stuff, mostly about the second generation of Amigas, featuring the AGA chipset: http://scalibq.wordpress.com/2013/02/08/just-keeping-it-real-part-7/

And now I've started on a new chapter, the Commodore 64: http://scalibq.wordpress.com/2013/03/08/just-keeping-it-real-part-8/

I've also made an overview page, as this series of blogs has gotten quite large already:
http://scalibq.wordpress.com/just-keeping-it-real-a-series-of-articles-on-oldskoolretro-programming/
Posted on 2013-03-09 08:02:10 by Scali