A cube map is more expensive than a sphere map using polar coordinates, and warps unless you counteract it. Sky domes exist because sky boxes are crap. You could write a single mapping function for spherical texturing, it's not that hard.
Posted on 2012-01-11 01:53:38 by Homer
No offense, but really?...
Anyway, this is not about skydomes or skyboxes. This is about projecting onto a REAL dome, as opposed to a 2d screen. As I said, like imax/omnimax.
I suppose a picture says more than a thousand words: http://paulbourke.net/exhibition/domeinstall/

There are various problems with spherical maps...
Most notably:
- They can only represent about 180 degrees maximum. A full surround view is not possible.
- The pixels are warped to varying degrees. In some areas the resolution is extremely poor.

We specifically use cubemaps because we want a full 360-degree world, with a resolution that is as uniformly divided as possible. Cubemaps are a very good format for this. Spherical maps obviously are not. You'd need two spherical maps, and at higher resolution than a cubemap face... Aside from that, cubemaps can be rendered to very easily by the hardware. Spherical maps would require a render-to-cubemap step first, and then another render-to-texture pass for projection onto spherical format.
Effectively, cubemaps are the cheaper option for our needs.
Anyway, as I say, the player we have 'unwraps' a movie to cubemap format first.
The movies may come in a variety of formats. One that is quite popular is a 'fisheye' variation, commonly referred to as 'fulldome':

We have successfully played back this movie in the dome.
Now, in theory one could warp it directly from the movie format to the required projection... But we chose the cubemap because it makes the projection very simple: we can just reflect it onto a geometric shape of the projection surface. This can easily be modeled in 3dsmax and exported.
The same goes for the unwrap-part: We can build the proper geometry with texture coordinates in 3dsmax, and export it. Then the movie is played on that geometry, and the cubic camera is placed inside, giving us a full 360 degree view.
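
For readers who want the math behind such an unwrap: the sketch below is not the code used in the player (which does the mapping with geometry modelled in 3dsmax). It assumes an equidistant ('angular') fisheye with the zenith in the centre of the frame, which is one common fulldome layout but not necessarily the one used here. The names fisheyeToDirection and dominantFace are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Hypothetical helper: maps a point (u, v) of a fisheye frame, with both
// coordinates in [-1, 1] and the zenith at the centre, to a unit direction.
// Assumption: equidistant fisheye, i.e. the distance from the centre is
// proportional to the angle from the zenith. +Y is taken as the zenith.
// Returns false for points outside the image circle.
bool fisheyeToDirection(double u, double v, double fovDegrees, Vec3& dir) {
    assert(fovDegrees > 0.0 && fovDegrees <= 360.0);
    const double kPi = 3.14159265358979323846;
    const double r = std::sqrt(u * u + v * v);
    if (r > 1.0) return false;
    const double phi = r * (fovDegrees * 0.5) * kPi / 180.0; // angle from zenith
    const double theta = std::atan2(v, u);                   // angle around zenith
    dir.x = std::sin(phi) * std::cos(theta);
    dir.y = std::cos(phi);
    dir.z = std::sin(phi) * std::sin(theta);
    return true;
}

enum CubeFace { PosX, NegX, PosY, NegY, PosZ, NegZ };

// Picks the cube face a direction points at: the axis with the largest
// absolute component decides the face, its sign decides positive/negative.
CubeFace dominantFace(const Vec3& d) {
    const double ax = std::fabs(d.x), ay = std::fabs(d.y), az = std::fabs(d.z);
    if (ax >= ay && ax >= az) return d.x >= 0.0 ? PosX : NegX;
    if (ay >= az) return d.y >= 0.0 ? PosY : NegY;
    return d.z >= 0.0 ? PosZ : NegZ;
}
```

With a 180-degree fisheye, the centre of the frame ends up on the +Y face and the rim of the image circle on the side faces, which is why a single fulldome frame touches five of the six cubemap faces.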

It's very easy and intuitive to set up and tweak the installation this way. No need for any precalc or expensive pre-processing. Which is what most people do: they set up the warping, then render the entire movie in the warped format. But then it is set-in-stone. You cannot move it around, and you cannot easily do composition with other movies, objects etc.
Posted on 2012-01-11 08:07:20 by Scali

But for our own software we now have a pretty nice high-performance solution.
We are currently using the 'borderless window' trick as well. Although DirectX 9 seems to work okay with multiple fullscreen windows on Windows 7, it's still rather delicate. So just to be on the safe side for now, we stick to the fake fullscreen trick.
We'll experiment with fullscreen mode in DX9 a bit more after the gig. Then we'll also work out the multithreading. I already did a quick-and-dirty test where all GPUs were being fed by their own thread (and thus core), so that they would all render in parallel. The idea works, but it was still very rough around the edges... window procs weren't working yet, etc. Threads would also fight for CPU time, one display would starve the others for some reason (could be a C#-related threading issue, should really do threading in native code).

Things never quite go as planned... So we didn't really look into this part of the code again, until now.
But I have worked it out...
First try was to get multiple windows each running in their own thread, rendering as quickly as possible. After some debugging, I concluded that Windows doesn't schedule multiple message loops fairly enough to keep one from starving the other. It will probably work fine if you use a GetMessage()-based loop... However, for graphics you generally want to use PeekMessage() and run your graphics code whenever the message queue is empty.
The problem here is that the threads will be hammering the system constantly, which results in message queues of background windows and such getting flooded, and the entire system becoming unresponsive. This is probably a side-effect of the Windows-scheduler not aiming to be fair, but rather trying to boost a window's priority when it receives messages.

Initially I tried a Sleep(0) after rendering a frame, so that the thread would give up the rest of its time slice and other threads would get some time to process their messages as well. However, this does not quite work. Apparently it does not free up enough time for the other threads. So I tried Sleep(1).
This prevents the system from going haywire, but it has a nasty side-effect: Sleep(1) is generally a lot more than just one ms, because the thread scheduler does not run with 1 ms granularity, but generally somewhere around 10-20 ms. As a result, my windows were now limited to rendering ~100 fps max. Which is not acceptable. I might want to drive a 100 Hz or 120 Hz display/projector at full frame rate.
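
The arithmetic behind that cap, as a tiny sketch. It assumes every frame ends with a sleep that really lasts one full scheduler tick; maxFpsWithSleep is a made-up name, and the tick lengths are the 10-20 ms range mentioned above, not measured values.

```cpp
#include <cassert>

// Upper bound on the frame rate when each frame consists of renderMs of
// actual work followed by a sleep that lasts one scheduler tick (tickMs).
// Both arguments are in milliseconds.
double maxFpsWithSleep(double renderMs, double tickMs) {
    assert(renderMs >= 0.0 && tickMs > 0.0);
    return 1000.0 / (renderMs + tickMs);
}
```

Even with zero render time, a 10 ms tick gives at most 100 fps and a 20 ms tick at most 50 fps, so a 120 Hz display cannot be fed at full rate this way.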

What I really wanted is a Sleep() that has better granularity.
So I decided to try the following: Call Sleep(0) (or SwitchToThread()) in a loop. For each call, if there are other threads to process, they will get time... If not, you will return immediately, and call again. So calling Sleep(0) a few times when there is no work takes very little time. And if there is work to do... well, there is work to do, so it doesn't matter if it takes a while longer.
After a bit of testing, I concluded that calling Sleep(0) 100 times in a loop works quite well. Each window can still update at very high framerates, so very little CPU time is wasted, and the system remains well-behaved. Each window also runs at more or less the same framerate, so it balances the load quite nicely.
And there we are:

Both windows render the same scene here, but only because I was too lazy to make two separate scenes. They are two completely independent instances of the same scene, and may as well have been completely different scenes... or at least two different views on the same scene.
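
The Sleep(0)-in-a-loop idea can be sketched in portable C++ roughly like this. This is not the engine code: std::this_thread::yield() stands in for Sleep(0)/SwitchToThread(), the 'rendering' is only a counter, and the names yieldBurst and runRenderThreads are made up for the example.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Portable stand-in for calling Sleep(0) / SwitchToThread() in a loop.
// Each yield returns almost immediately when no other thread is ready
// to run, so a burst is cheap on an otherwise idle system.
void yieldBurst(int count) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) std::this_thread::yield();
}

// Runs one thread per "window"; each renders `frames` frames and yields
// `yieldsPerFrame` times after every frame. Returns the number of frames
// completed by each thread.
std::vector<int> runRenderThreads(int windows, int frames, int yieldsPerFrame) {
    std::vector<int> completed(windows, 0);
    std::vector<std::thread> threads;
    for (int w = 0; w < windows; ++w) {
        threads.emplace_back([&completed, w, frames, yieldsPerFrame] {
            for (int f = 0; f < frames; ++f) {
                ++completed[w];             // placeholder for rendering one frame
                yieldBurst(yieldsPerFrame); // let other threads handle their messages
            }
        });
    }
    for (auto& t : threads) t.join();       // join before reading the counters
    return completed;
}
```

Calling runRenderThreads(2, 1000, 100) mirrors the two-window test with 100 yields per frame. How well the load balances in practice depends on the OS scheduler, so the count of 100 should be treated as a tuning value rather than a rule.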

However, this is probably not the way we will be using multiple renderers in practice.
Namely, we will want to synchronize the threads with the vertical sync... And we want to synchronize the threads with the business logic. So the threads will be in a wait state at some point anyway, which might mean that the Sleep()-calls are no longer required.
In fact, even the entire PeekMessage()-system can be dropped... The business logic could just post a WM_PAINT message or something like that, to start a new frame on all renderers. In that case, a regular GetMessage()-loop would be good enough.
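
A rough sketch of that 'business logic starts a frame on all renderers' idea. It swaps the WM_PAINT message for a plain condition variable, so it only illustrates the waiting pattern, not the actual message-based design. FrameGate and runSynced are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// The business logic bumps a frame counter; renderers block until the
// counter moves past the last frame they have seen.
class FrameGate {
public:
    // Called by the business logic: releases all waiting renderers.
    void startFrame() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++frame_;
        }
        wake_.notify_all();
    }

    // Called by a renderer: waits for a frame newer than lastSeen and
    // returns its number. A slow renderer may skip intermediate frames.
    int waitForFrame(int lastSeen) {
        std::unique_lock<std::mutex> lock(mutex_);
        wake_.wait(lock, [&] { return frame_ > lastSeen; });
        return frame_;
    }

private:
    std::mutex mutex_;
    std::condition_variable wake_;
    int frame_ = 0;
};

// Starts `renderers` threads, issues `totalFrames` frames, and returns
// the last frame number each renderer saw before exiting.
std::vector<int> runSynced(int renderers, int totalFrames) {
    assert(renderers > 0 && totalFrames > 0);
    FrameGate gate;
    std::vector<int> lastSeen(renderers, 0);
    std::vector<std::thread> threads;
    for (int r = 0; r < renderers; ++r) {
        threads.emplace_back([&gate, &lastSeen, r, totalFrames] {
            int seen = 0;
            while (seen < totalFrames) {
                seen = gate.waitForFrame(seen); // idle here, no Sleep() needed
                // ...render frame number `seen` here...
            }
            lastSeen[r] = seen;
        });
    }
    for (int f = 0; f < totalFrames; ++f) gate.startFrame();
    for (auto& t : threads) t.join();
    return lastSeen;
}
```

Because the renderers block inside waitForFrame(), they use no CPU time between frames, which is the reason the Sleep() calls may become unnecessary in such a setup.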

And there's plenty of other ideas that we have, which we will work on after the gig.
One idea I particularly like is to use the VideoLAN code to broadcast live video data over the network, and render with multiple PCs at a time.

To follow up on this as well: It didn't quite work out the way we hoped.
It seems that VideoLAN insists on re-encoding a stream before broadcasting it on the network. This means it takes considerable CPU time, and also introduces quite a bit of latency. So we may just need to implement our own network broadcasting scheme for this, at some point.
Posted on 2012-02-11 12:30:42 by Scali
Here's a fun thingie:

Rendering with 10 instances of the engine on a triplehead setup. Still getting 500 fps per window. And rendering nicely in sync.
Seems like the framework is becoming quite solid. It should be able to scale quite well by just adding more GPUs and CPU cores to the system.
Posted on 2012-03-01 12:31:31 by Scali
The overhaul of the framework, adding multithreading and such was done to fill up the 'dead time' between projects.
Now that the framework is cleaned up and quite efficient/scalable, we have been working on making the technology easy to use.

I have updated the ancient mechanisms in my D3D engine which managed dynamic resources such as rendertargets and vertexbuffers etc. These now form the basis for some easy-to-use objects, which make things like render-to-texture child's play.
The material/shader system has also been prepared to make it easy to compile materials and shaders on-the-fly, and hot-swap them.

The next step is to make render passes easily configurable/editable directly from the UI, so that we can move from just developing technology to actually making the cool visual shows we are aiming for.

However, there will be some projects first. We have a simple gig this weekend where our software will be used as a sort of video mixer that can map the visuals of two VJs onto a stage with 3 screens. This should show off some of the power of our realtime video processing in software, since we plan not to use any actual video mixers or other 'analog' devices at all. The output of the two VJ's computers will be fed straight into our machine, and we let our software do crossfades and colour processing, and drive the 3 outputs directly.

Next up will be our first 'real-world' test with the dome projection. We are going to set up the dome at an art gallery. Inside the dome we will have a bio-feedback system. People can pay for a 15-minute session, where their heart and breathing rhythms will be transformed into audiovisual feedback. By controlling their bodies, they can control the feedback.
Posted on 2012-04-05 07:40:42 by Scali
Had 295-based triple head for some time.
You won't convince me of the validity of any fps number without showing me a millisecond graph anymore. Plus, the screen lag _cannot_ be ignored. People are talking about 3-10 frames. I also have a stereo system and people are saying that 120Hz LCDs are only beginning to catch up on CRTs, and I believe them.

Regarding your strange entertainment system, it reminds me that I've had an OCZ NIA for some time now too.
Maybe you could consider doing things with it in addition to your heartbeat sensor and such. I don't even know if you can still buy a new one though.

It kind of works, but obviously it's just the beginning. Last time I checked people were still struggling to use it at the API level or even getting raw data from it.

Other brain-computer interface devices are coming or available, with far more sensors.
Processing power is increasing and understanding of the signals is progressing.

I think this is going to work.
Posted on 2012-04-26 05:38:43 by HeLLoWorld
We got an Emotiv:
Posted on 2012-04-26 15:51:42 by Scali
That's it :)
Posted on 2012-04-26 17:07:05 by HeLLoWorld
If you are into this stuff, I thought I'd share this huge article written by one of the creators of the NIA.

There are twenty pages but I found them worthwhile.
It is the most insightful thing I've read about it, but I think you need to have some background or experience to digest it.

Posted on 2012-04-30 18:31:16 by HeLLoWorld


Looks cool, have to sit down and read it more thoroughly some time. Perhaps I should pass it on to the bio-feedback guy as well.

I have just completed yet another extension of the rendering framework. So far, we could support multiple windows, but we did that through multiple D3D devices, each running in their own thread. This approach works great with multiple GPUs, but it is not ideal when you have a single GPU rendering multiple windows.

Firstly, you're depending on the multiplexing capabilities of the driver (which I have to say worked better than I expected). Secondly, you cannot easily share resources between windows. Especially with things like video textures, this is far from ideal, as you end up having to have separate instances for each device, and they have to be synchronized with some CPU overhead.

So I have also added support for multiple swap chains per device now. This means that a single device can now render to multiple windows, and you can re-use the resources easily. It's very useful when you want some kind of preview window for example. In this case, I want to preview the incoming video for example, to see if the video input is working correctly, and to trim off any noisy edges in the case of analog signals.

Other than that, there's just some final cleanup to do for the framework, and then it's mostly developing extra functionality using the framework from there. Which should go nice and quick. The framework is making it very easy to manipulate rendertargets, renderpasses, shaders, materials etc. And pretty much everything you want to do is some kind of combination of those.

I've put the OpenGL code on hold for now... The problem is that our UI works with .NET, and the D3D renderer is C++. We use a wrapper in C++.NET to glue the two together. Now, the UI should be reasonably easy to port to other systems using Mono. And the OpenGL renderer in C++ is already multiplatform. The problem is however, that there is no C++.NET in Mono, so connecting the two together will require some extra effort. So I have decided that it is not important enough at this time. We will concentrate on developing the Windows version first.
Then we'll stop further development for a while, and work on an OpenGL port for Mac OS/linux/BSD full time.

Oh by the way, we do actually use the above C# compiling trick atm. We have a startup script system, which is actually just a C# class. But the application can take a .cs file from the commandline and compile it on-the-fly, then execute it. Gives you very powerful customization options.
Posted on 2012-05-19 07:36:05 by Scali
One aspect that we think is very important, is that our software should run on old/low-end hardware as well.
When building a network of rendering machines, it will keep the cost down considerably, if you can just use old surplus boxes, or low-end machines with just integrated graphics.

To that end, I have dusted off my old Athlon 1400 machine, with a Radeon 8500 128 mb AGP card in there.

This made things rather interesting: can my code still run on DX8-class hardware? The current codebase has a bloodline that can be traced back to DX8. My first fully 3d-accelerated engine was written with Direct3D 8. The code has been updated to Direct3D 9 (although I still used a GeForce2 GTS and later this Radeon 8500), and that code still lives on in the application today.

In theory, it should still be able to run on all the old hardware that it once ran on. I had done some tests with my old laptop earlier. It has a Radeon IGP340M, which offers VS1.1 but no pixel shaders. The code ran, and the vertex shaders worked. However, because of the missing pixel shaders, the rendering was not correct. The code is still there to create fixed-function pixelshading, but I couldn't be bothered to write replacement code for all our shaders. I was happy enough just to see the code starting up and rendering something that was reasonably close to the intended graphics.

Now the Radeon has proper PS1.4 shaders, which should be powerful enough to run most of our shaders. But first I had to update the software on the machine in order to run our application.

This proved more difficult than I thought: I had problems installing the latest DirectX runtime. The error message was completely unhelpful, but as I figured out, the problem is that the Athlon doesn't have SSE. The latest runtime that managed to install on the system was from 2007. Apparently at least some parts of the runtime (more specifically the XAct-portions) require SSE since that time.
I also noticed that recent versions of Paint.NET won't install for the same reason. Apparently developers now assume that SSE is available, and don't bother to build in fallback paths.

While theoretically I could compile my code against an old version of the runtime, it would probably prove to be more trouble than it's worth. Namely, I rely on the fact that the new compiler can compile shaders with D3D10+ syntax for the D3D9 API. This way I can share the same shaders between all API versions.

So I gave up on that Athlon... I also have *another* Athlon, which is an XP1800+, which *does* have SSE. It was just in a pile of components, since I used its case for another computer. I decided to take the old Athlon apart, and put the XP1800+ in its case. It used to have a Radeon 9600XT 256 mb AGP card in it, but I'm not entirely sure where that card went. So for the time being I stuck the Radeon 8500 in there.

And indeed, with SSE, I could now install the latest DX runtime without a problem. Did that mean that my code worked? Well no. I had overlooked some minor details when using the new shader compiler. Apparently the new compiler cannot compile pixelshaders to anything less than PS2.0. It is source-compatible with PS1.x shaders, but it always generates at least a PS2.0 binary blob. Since my laptop had no pixel shaders, the problem did not surface on that machine (VS1.1 can be handled just fine by the new compiler).

So, the result was the same as on my laptop with no pixelshaders: I was looking at the same semi-broken renderings.
When I looked through the API reference, I noticed that you can specify a flag that forces it to use an old D3D compiler library from 2006. This one still supports PS1.x. So I decided to write a simple workaround in my code that reverts back to the old compiler when you specify a PS1.x profile to compile to. This still meant that I had to rewrite my shaders from D3D10+ syntax to D3D9 syntax, but since it was only the pixelshaders, it was not that much of a problem. I just had to remove the cbuffer declarations and make them regular global variables. And I had to change the output semantics. No big deal.
The vertex shaders were still handled by the new compiler, so I could leave that code untouched.
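
The profile check behind such a workaround could look roughly like the sketch below. It is an illustration, not the engine's code: kUseLegacyCompiler is a placeholder bit, and the function names are made up. The post does not name the actual flag; in the D3DX9 headers I believe it is D3DXSHADER_USE_LEGACY_D3DX9_31_DLL, but check the SDK documentation before relying on that.

```cpp
#include <cassert>
#include <string>

// Placeholder bit for illustration only. A real build would use the
// legacy-compiler flag from the D3DX9 headers instead of this value.
const unsigned kUseLegacyCompiler = 1u << 0;

// True for pixel shader profiles below ps_2_0 (ps_1_1 up to ps_1_4),
// which the newer compiler no longer targets.
bool isLegacyPixelProfile(const std::string& profile) {
    return profile.compare(0, 5, "ps_1_") == 0;
}

// Adds the legacy-compiler bit only for PS1.x profiles, so vertex shaders
// and PS2.0+ pixel shaders keep going through the new compiler.
unsigned compilerFlagsFor(const std::string& profile, unsigned baseFlags) {
    assert(!profile.empty());
    return isLegacyPixelProfile(profile) ? (baseFlags | kUseLegacyCompiler)
                                         : baseFlags;
}
```

Keeping the decision in one small function means only the PS1.x pixel shaders need the D3D9-style source, while everything else can stay in the shared D3D10+ syntax.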

And lo and behold, my code actually worked! I was able to run the full rendering algorithm for the sphemir dome projection that we do. And the performance was actually quite good too! We use a 1024x1024 cubemap to render to, with mipmaps generated on-the-fly. And this old card still managed to churn out some 500 fps in a simple test scene! That is actually better than some of the newer machines we used. This card may only be a DX8.1 card, but it was a high-end card. In fact, it was the fastest card on the market at its introduction. And that means it came with a lot of memory bandwidth. Something that even today's IGPs and low-end cards seem to struggle with. Not a good thing when rendering to large cubemaps.

Another old card of mine, the GeForce2 GTS, would probably also do quite well still. It may not have had programmable shading, but its hardware T&L and pixel processing were extremely fast at the time. In fact, the GeForce3 and Radeon 8500 were barely faster than the fastest GeForce2 cards.
So who knows, I may actually try for a fixedfunction version at some point, just to see what the GeForce2 can do.
Posted on 2012-07-12 01:02:24 by Scali
Well, old boxes are nice...
This toying around inspired me to work on the material handling some more. I had wrapped part of it for .NET use, but only the bare essentials.
I've now extended the wrapper so it can make full use of all DX11 shaders.
I've also added support for fixedfunction processing and other legacy D3D9 features.
While I was at it, I also found some bugs/not-so-nice-things and fixed those as well.

Another issue I ran into was with Windows XP. While I generally test the D3D9 code, I do it on Windows 7, and I have recently upgraded the code to make use of D3D9Ex whenever possible.
The nice thing about D3D9Ex is that it no longer has the distinction between managed and default pool resources, and devices no longer get lost/need to be reset, same as in D3D10 and higher.
The downside of that is that my reset code for regular D3D9 has not been tested in a while, since I've only run the engine in D3D9Ex mode. So when I tried to resize the window in XP, it hung.
Well, not anymore. I've fixed up all the minor resource handling issues that snuck in there on Windows 7, and it works flawlessly on XP as well now. Even on a DirectX 8 device :)

All this has also been a nice preparation for one of the next tasks on our to-do list: a material/shader editor.
Posted on 2012-07-16 05:09:58 by Scali

Because you spent time writing the code? :)

Yea, good point...
It's even worse because the code is in an svn repository anyway. If I ever wanted it back, it would be easy.
But yea, there are a number of reasons why I could hang on to the D3D10 code... none of them are very good reasons, but still :P

Well, things took a turn for the worse: instead of removing my D3D10 code altogether (its only purpose is for supporting Vista installations which have not been updated to Direct3D 11), I have decided to add even MORE D3D10 code.
Namely, I had replaced the vanilla D3D10 code with D3D10.1 early on. However, that gives a similar problem to D3D11 on Vista machines: D3D10.1 does not ship with Vista, but is added in SP1. So if a machine is not up-to-date, it won't be able to run D3D10.1.

I figured I'd just see if I could add vanilla D3D10 support back in, just for kicks. Turned out to be not too much of an issue. Also helped me to inspect and modify some pieces of code that I hadn't looked at in a while. Always good.
Posted on 2012-07-19 06:30:15 by Scali

This proved more difficult than I thought: I had problems installing the latest DirectX runtime. The error message was completely unhelpful, but as I figured out, the problem is that the Athlon doesn't have SSE. The latest runtime that managed to install on the system was from 2007. Apparently at least some parts of the runtime (more specifically the XAct-portions) require SSE since that time.
I also noticed that recent versions of Paint.NET won't install for the same reason. Apparently developers now assume that SSE is available, and don't bother to build in fallback paths.

While theoretically I could compile my code against an old version of the runtime, it would probably prove to be more trouble than it's worth. Namely, I rely on the fact that the new compiler can compile shaders with D3D10+ syntax for the D3D9 API. This way I can share the same shaders between all API versions.

So I gave up on that Athlon...

Well, I took that issue back up... Namely, I had a Pentium II box sitting around here, with no PSU. I managed to find another PSU for it, so I tried firing it up. It used to be my home server, so it had an old installation of FreeBSD on it.
I figured it would be more useful if I would install Windows XP on it. So I decided to take the opportunity to do a clean install on that box, and see how far I'd get when trying to run my applications. That way I'd have a good idea of the prerequisites.

Now obviously a Pentium II lacks SSE, just like the Athlon. But I found out that I only need two files to run my code, namely D3DX9_43.DLL and D3DCompiler_43.DLL. If I just drop those two files in the application directory, it will run just fine.

The Pentium II had a Matrox G450 card in there. And much to my surprise it actually worked too! Yes, it was horribly slow, mostly because my code runs with vertex shaders, and they need to be emulated on the slow CPU, but it rendered everything as expected.
So I decided to go find my Radeon 9600XT card. It won't fit in the Pentium II, but I can bump the Radeon 8500 down to the PII, and put the 9600XT in the Athlon instead.
That way the PII at least has a card with shaders. And after my recent code updates, it is supported by my app as well.

And indeed, where the G450 managed to pump out only about 3 fps with the skinned claw animation, the Radeon 8500 rendered it at about 375 fps, which is about the same as what I'd get in the Athlon. That's a nice confirmation of how lightweight my engine code really is, and how it offloads all the heavy work to the GPU.

Now the PII will probably be too slow for the rest of the application... Capturing a video stream will still require quite a bit of bandwidth (not to mention a PCI-card to add USB 2.0) and CPU load. So the PII would probably severely limit the resolutions and framerates for video streams.
But the Athlon 1400 is about as fast as the XP1800+ (which runs at 1533 MHz). And the XP1800+ ran the application without a problem, complete with a 720p stream, even on the Radeon 8500 card. So the Athlon 1400 will be decent enough, now that the SSE-problem is solved.
Posted on 2012-07-26 08:46:20 by Scali
I had bought Windows 8 Pro a few months ago, when they had the special upgrade offer for 30 bucks. However, so far, I had not bothered to prepare a partition on my PC and install it.
Over the weekend I finally did, and I've been playing around with Win8 somewhat. First getting it set up for daily work, by installing Visual Studio and various other development tools.
Then I did some reading on what's new in the Windows 8 SDK.
I also installed Visual Studio 2012 Express for Windows 8, and figured I'd have a look at creating Metro apps. I noticed that their Direct3D examples were basically just using regular C++ code for the D3D part. They then used managed C++ to link it to the Metro UI environment, which is .NET-based.

Well, that is interesting! My engine is already interfaced with .NET. So all I need to do is replace the code that connects it to a hWnd, and use the new CoreWindow instead.
Effectively, it comes down to changing only a single function call, the one that creates the swap chain:

So I will just be having some fun with that. I've just finished touching up my code so that it works correctly with the Windows 8 SDK (DirectX June 2010 SDK is now legacy, and will not receive any new updates). Also had to fix some QuickTime headers, because they defined a GetProcessInformation() function, which is also in the Windows 8 API, and so it clashed:

Despite all the FUD, all my code works fine in desktop mode, obviously. No need to even recompile anything. As long as you're in desktop mode, you barely notice the difference between Win7 and Win8.

Another interesting tidbit in the new Win8 APIs was this little flag: http://msdn.microsoft.com/en-us/library/windows/desktop/hh404455(v=vs.85).aspx
Yes, apparently Microsoft now allows you to detect whether you run on a TBDR (read: PowerVR hardware), so you can make special optimizations. Very nice!
Posted on 2013-04-08 09:42:24 by Scali
Okay, it's not quite THAT simple... That is, the WinRT environment uses C# and managed C++, but it is not the same .NET environment as on the desktop.
The underlying framework is different, so legacy .NET code will not work.

So far I have just taken my C++ code, and interfaced that with WinRT directly, because I could not re-use the .NET wrapper to C#. It does some things that are not allowed in WinRT, and it also relies on SlimDX, which is not WinRT-compliant either.

Aside from that, D3DX is deprecated. There never was an ARM-version of the DirectX SDK, and therefore there is no D3DX for ARM either.
Currently my engine works from within a CoreWindow ('Metro') application, but it still relies on D3DX, so I can only build x86/x64 binaries at this point.

There are basically three different tasks I perform with D3DX:
- Shader compilation
- Texture loading
- Mathematics

I've been working on upgrading the D3DX code to the new interfaces.
Shaders should now be compiled with the D3DCompiler library, which is included in the Windows 8 SDK:
I've replaced my D3DX code with these new functions.

There is no library for textures included in the Windows 8 SDK. However, Microsoft has released an open source project by the name of DirectXTK (Tool kit):
This handles DDS loading for all devices, and except on Windows Phone, it also handles the 'legacy' formats (jpg, png, gif, tga etc). On Windows Phone, the codecs for these formats are not included.
I have also replaced my D3DX texture loading code with this library.

For mathematics, there is the DirectXMath library, which is included in the Windows 8 SDK:
I am currently rewriting my code to replace the last bits of D3DX with DirectXMath.
Sadly though, DirectXMath will require SSE2 as a minimum, so if I replace this code, I will no longer be able to use the D3D9 code on my old Pentium and Athlon systems.
Although, I think there is a way to disable its use of intrinsics.
Posted on 2013-04-10 08:54:54 by Scali
Well, I have not finished removing the D3DXMath stuff from my code yet, so currently it is still limited to x86/x64. But I hacked together a simple test app, which generates a donut (what else), puts a jpg texture on it, and does some per-pixel shading.
I've shown it tiled, with the App Store on the side. You can grab the window while it's running, and move it around. All very simple and seamless.
Posted on 2013-04-12 17:20:36 by Scali
Well, I have removed all dependencies on D3DX from the D3D11 code.
The D3D9/10/10.1 code still needs D3DX for texture loading. D3D9 still needs D3DX for shader compiling as well. D3D10/10.1 use the new D3DCompiler for D3D11.

In theory I could also expand the DirectXTK texture loading code to work with D3D9/10/10.1 as well, but I don't see much of a point in that, really.

The code has cleaned up nicely so far.
I can now also get a better insight into what I need to do to make it work on ARM. There were thousands of compile errors with D3DX. Now there are hundreds of compile errors for DirectShow. I will disable that next (will write a replacement with whatever media API they have for WinRT later). And by the looks of things, the only other remaining errors were related to file access (CreateFile/ReadFile etc), and some synchronization (CriticalSection etc), for which there probably are suitable WinRT replacements.
Posted on 2013-04-15 07:16:48 by Scali
The new media API since Vista is Media Foundation.
I've written some basic code to interface with capture devices, as a replacement for DirectShow. Should work within WinRT and WinPhone8 environments as well.
Friend of mine has recently bought a Windows 8 phone, so we'll probably be having some fun with that sometime soon, and see if we can actually get the whole D3D11 engine, complete with streaming video, running on an ARM device.
Posted on 2013-06-07 19:35:58 by Scali