Thanks, but we already have a hardware-based solution for that: a Matrox TripleHead2Go.
Besides, it doesn't solve the problem that a single GPU needs to render all screens.
We'll get a nice solution implemented eventually (multiple videocards per PC and multiple PCs in a network), it's just not very high on the list right now.
We're planning for a show in a few months, where we will use 3 displays max, so we can rely on the TripleHead2Go, if we don't have a better solution by then.
For now we are concentrating mostly on getting the required rendering tricks implemented, and building a good user interface for that. Once we have that done, we will spend the remaining time on the standalone player, and the networking functionality.

We also want to do some tests with multiple videocards... especially with anisotropic filtering, quality can differ quite a lot between different types of cards. And the colours may also be off. So we'd like to see what the actual effect will be in practice, and how much of that we may be able to compensate by tweaking the colours and filter parameters per system.
Posted on 2011-04-25 09:41:50 by Scali
If I'm not mistaken, the TH2Go can't go ultra-high in resolution; plus it's not exactly cheap. BTW, there seem to be alternatives to this (Chinese hardware, I think), maybe with higher specs.

There are forums about this, and there seems to have been quite some trouble getting these things to work, depending on your monitors, your OS and your luck.

SoftTH needs tweaking too, but it was my first bang with multiscreen rendering, and I owe it that.
Oh, and I think SoftTH might have a networking subsystem already, so you might not want to miss that.
With the (user, not dev) experience I had, I would absolutely not say implementing your own multi-GPU, multi-board multiscreen solution, be it single or multiple rendertargets, will be a walk in the park... at all.

I'd say a 295-style (tri-output) card, or two NVIDIA 4xx or 5xx cards in SLI with 2D Surround, will provide the most robust ("professional") setup. Don't forget you ~SHOULD (...) then be using all the power you have under the hood if needed, since the NVIDIA driver automagically balances the load across the SLI. If you're not lucky, not just any three monitors will work, though. There are workarounds, but they're not absolutely guaranteed. I succeeded, but I battled for a very long time.

Then you've got the red side of GPUs, which has the promising and potentially more powerful Eyefinity, but me, sir, I don't wander into those types of places :)

Welcome to the land of freaking huge resolutions. It's indeed quite an amazing blast. Google Maps, Google Earth-style, in five mega f___ing pixels, or FSX at 5040 pixels wide, or Just Cause 2... Oh my, JC2 has eaten so many of my nights I shouldn't even tell.

Virtual worlds they say.

Billions of transistors of floating-point FMADDs and sheer processing power giving life to somehow-real universes.

Posted on 2011-04-29 13:49:03 by HeLLoWorld
Well, I'm not sure what your definition of 'ultra high' is... but my friend/co-coder on this project has a TH2Go (which I've now borrowed to play around with... by "we already have a hardware-based solution for that" I meant that we, as in my friend and I, have the TH2Go, not that "a device exists in general" :) ), and he drives 3 screens of 1280x1024 with it, for a virtual 3840x1024 screen. Not sure if it can go any higher than that, but 1280x1024 is quite a high resolution for beamer-related stuff. Generally the VJ-ing is done at PAL resolution (720x576) or lower, and beamers that can do high resolutions are very expensive, especially if they also need to be powerful (large distances/large projection surfaces).
So it's looking good so far.

Thing with SLI is that it is not very cost-efficient... The videocards themselves need to be reasonably high-end, because the cheap ones don't support it. Then there are the motherboards that need to support it. And you'll need a more powerful CPU to drive the two cards as well.
So if we can achieve the same with a number of cheaper systems in a network environment, that would be very interesting. Imagine a network of low-end off-the-shelf systems with just an Intel or AMD IGP, each driving one beamer.

It's also more interesting from a logistical point of view: we may be doing shows at very large venues. The beamers need to be at fixed positions. Having a central computer system driving all the beamers poses a big problem: HDMI/DVI can't really travel distances of more than ~5 metres. VGA can cover larger distances, but still isn't recommended over, say, ~15 metres, as the cables become too expensive and the quality dropoff is too large.
Generally in such situations they will convert the signal to a composite PAL signal, because it's easy to transport that over large distances. This loses quality and resolution however.
With a network, you could just place a small slave PC next to the beamer, and run a network cable up to it: cheap, reliable and with no loss of quality.

Anyway, we're just going to work out our current plan first. If it fails, we can always go back to more expensive solutions with SLI or Crossfire.
Posted on 2011-04-29 14:46:14 by Scali
Of course, your needs are different. I have three 22'' screens at 1680x1050; that's more than 5 MP.
However, nowadays you can find ultra-massive setups on the net that easily make this pale; I remember a forum post from years back where someone said a friend had a triple 30'' setup.
But the same 3D world on such a wide surface, one that takes up _more_ than _your_ viewing angle, is a blast. FSX was breathtaking the first few times, as in, you don't breathe.

The TH2Go does maybe tri-1680 but probably not tri-1920, though I reckon for projectors you don't care.
However, keep in mind that the TH2Go seems to cost around 300, and secondhand 295s seem to be leaving eBay at a bit more than 150, though not many have 3 outputs. But this card is indeed a very special case (great). It also only needs one PCIe slot. It also draws a HUGE amount of electric power :), which means you need a corresponding PSU, and in the cases where SLI can't operate, you lose a bunch of processing power. But a unique card nonetheless.

Apparently one 460 would be around 120 now, so you won't convince me that the TH2Go is a good deal against such an SLI setup (unless I missed something about the price of the device). Of course you still need the motherboard and the PSU.
But I see your point, you don't need tri-1920 Crysis.
And I can see how three little notebooks near three projectors, linked by RJ45, is appealing.

I don't know whether what you're displaying is 3D worlds or Winamp-style effects.
If it's 3D and the three screens form an angle between them, you can indeed get a big advantage from developing your own rendering middleware with three rendertargets. That way you could even do the rendering on your several low-end PCs, and only the graphics commands and textures would need to be transmitted across the three machines, with one master, instead of transmitting the video stream (but then only your application could work, instead of any 3D app). In fact, I'm not sure you could transfer that much good-looking video stream at high framerates. There's a reason why signal cables can't be that long. But I would say, coding things yourself maybe won't be that easy.

I'm thinking of something else: lag. Even with SoftTH and three screens on two non-SLI cards, where you blit the last third of the framebuffer from the main GeForce's RAM to the secondary card across the bus, you barely had enough bandwidth to avoid desynchronisation. It depends on what you're displaying, but a lagging 3D world (especially on big projected screens) must alter the experience. Then add a network to that... I suspect a few milliseconds will hurt quite a lot. Ah, someone just tells me you were planning to do large, fast rotozooms?
Then again maybe lag's not a concern to you.

Then again (again), tri-720 is another story than tri-full-HD bandwidth-wise, so I might have it wrong. Just thinking out loud.

Have a nice night.
Posted on 2011-04-29 18:17:09 by HeLLoWorld
Well, we already have the th2go, so at this point it's a good thing to use :)
One problem with a card like the GTX295 is that it's very large. I can't fit it in my PC case without removing the entire HDD assembly and putting the HDDs in the 5.25" slots with adapters. I guess in some cases it won't fit at all. I specifically selected my last two graphics cards by their length :)
I wanted one that allowed my HDDs to sit in their regular place.

What we will be displaying is a combination of 3D worlds and winamp-style effects. We already have a system to stream live videocaptures on 3D geometry in realtime. Basically we can do a variety of things with our software. We can use it to replace a large number of physical devices, such as video mixers, colour correctors and various effect devices. We just use video capture and shaders to do the same in realtime.

We can actually 'host' applications as well. We have developed a system to capture a texture from a desktop window (originally developed for hosting Flash, but it works on anything). So it can work for other applications, although the size of the window limits the performance. But if all else fails, we can also run the application on a second computer and just capture its output onto a texture on the main machine. So any 2D or 3D app can work.
The only problem is if you want to use a capture on more than one PC. The easiest way is probably to duplicate the signal by analog means, and have all PCs capture it simultaneously. But that is probably not something we'll be using often. Having a capture on only one of the displays should be good enough most of the time.
In most cases, we just stream pre-captured videos to textures and render them into 3D scenes etc. All the content will be stored locally, so we only need to send commands to the slave PCs from the master.

We will use a network timing scheme where the master broadcasts a clock signal, and the clients compensate for both lag and drift. We should be able to get all of them within the same frame at 50 fps (PAL refresh), which leaves a window of 20 ms per frame. Network latency will probably be < 1 ms, as we can use a small dedicated gbit LAN with just 1 switch and no other traffic than our own. The lag will probably be negligible anyway; we'd mostly be compensating for drift.
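Roughly, a minimal sketch of that scheme could look like this (C#; the port, the smoothing factor and the method names are just placeholders, not the actual code):

using System;
using System.Diagnostics;
using System.Net;
using System.Net.Sockets;

static class ClockSync
{
    // Master: broadcast the current clock a few times per second.
    public static void RunMaster()
    {
        var udp = new UdpClient { EnableBroadcast = true };
        var target = new IPEndPoint(IPAddress.Broadcast, 9050); // placeholder port
        var clock = Stopwatch.StartNew();
        while (true)
        {
            byte[] packet = BitConverter.GetBytes(clock.ElapsedMilliseconds);
            udp.Send(packet, packet.Length, target);
            System.Threading.Thread.Sleep(100);
        }
    }

    // Slave: estimate the offset between local and master time, and slowly correct for drift.
    public static void RunSlave()
    {
        var udp = new UdpClient(9050);
        var remote = new IPEndPoint(IPAddress.Any, 0);
        var local = Stopwatch.StartNew();
        double offset = 0.0;          // estimated (master - local) time in ms
        const double smoothing = 0.1; // small factor: nudge towards the master, don't jump

        while (true)
        {
            byte[] packet = udp.Receive(ref remote);
            long masterMs = BitConverter.ToInt64(packet, 0);
            double sample = masterMs - local.ElapsedMilliseconds; // ignores one-way latency (< 1 ms on our LAN)
            offset += smoothing * (sample - offset);
            double masterTimeNow = local.ElapsedMilliseconds + offset; // drive frame timing from this
        }
    }
}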
I'm pretty sure it's all going to work out in the end :)
Posted on 2011-04-29 19:25:57 by Scali
You seem to have thought quite a bit about it :)
Posted on 2011-04-29 22:02:13 by HeLLoWorld
Well, mostly my friend :)
He's been a VJ for years, and has developed tools for himself.
I'm more of an 'enabler of technology', as I am more experienced with C++ and Direct3D and all that. So most of the ideas come from him, I just try to integrate them into my engine code.
Posted on 2011-04-30 09:17:39 by Scali
Okay, I've installed my old GeForce 7600GT as a second adapter in my system, and did some experimenting with driving two GPUs from a single application.

I already did most of the groundwork in an earlier attempt to run on two GPUs. I had removed all global variables from my engine, so that each instance of the engine would be a completely self-contained resource management system for that particular GPU. However, at the earlier attempt, we had two identical GPUs, and it was quite difficult to tell what was going on exactly.
This time the framerate alone would be a tell-tale sign of which GPU was being used to render, as the 7600GT is at least 3 times as slow as the GTX460 on even the simplest stuff.

And although the last attempt didn't actually work, we did read up on the documentation, to find out why it didn't work, and what should go where.
So this time around, I actually got it working in Direct3D9 quite quickly. I just had to move a few things around and introduce a few singleton variables, because some things have to be shared between all adapters. I also had to make proper use of the window handles: the Direct3D9 object is connected to a 'focus window', while the devices/swapchains are connected to 'device windows'. In most cases you just ignore the difference, since you only have one window.
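In (SlimDX-flavoured) C#, the relevant part boils down to something like this; I'm sketching from memory here, so take the exact names and parameters with a grain of salt:

using System.Windows.Forms;
using SlimDX.Direct3D9;

static class MultiAdapter
{
    public static Device[] CreateDevices(Form focusWindow, Form[] outputWindows)
    {
        // One Direct3D9 object for the whole application, one shared focus window...
        var d3d = new Direct3D();
        var devices = new Device[d3d.AdapterCount];

        // ...but one device per adapter, each with its own device window.
        for (int adapter = 0; adapter < d3d.AdapterCount; adapter++)
        {
            var pp = new PresentParameters
            {
                Windowed = false,
                BackBufferWidth = 1280,              // per-output mode, hardcoded for the sketch
                BackBufferHeight = 1024,
                BackBufferFormat = Format.X8R8G8B8,
                SwapEffect = SwapEffect.Discard,
                DeviceWindowHandle = outputWindows[adapter].Handle
            };
            devices[adapter] = new Device(d3d, adapter, DeviceType.Hardware,
                                          focusWindow.Handle,  // same focus window for all devices
                                          CreateFlags.HardwareVertexProcessing, pp);
        }
        return devices;
    }
}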

Anyway, with all that in place, I managed to get two devices going, which each could be switched to fullscreen on the connected monitor at the same time.
I also tried it in XP, and although it's a bit more fussy there (it resets your devices more often), it does work.
Another funny thing in XP is that windowed mode also works... just not very well :)
That is, the D3D device will always render on the same GPU... but if you move the window to a monitor connected to another GPU, it will do a very slow copy, and your framerate will drop like mad. In Windows 7, the copying is far more efficient. It's still not as fast as the window being on the 'right' monitor. But if I drag a window rendered on my GTX460 over to the monitor on my 7600GT, the framerate is still more than twice as high as when the 7600GT renders it itself :) In XP you're lucky to get more than 100 fps with even the simplest of windows. A fullscreen window will drop to below 30 fps.

Anyway, I figured D3D9 would be the most difficult one... so from here on it should be a cakewalk to also support D3D10 and D3D11.
But no... No matter what I did, I couldn't get both windows to stay fullscreen. The rendering on two GPUs worked okay... rendering on a single GPU and switching windowed-fullscreen also worked okay... I could even get a single GPU to render fullscreen while the other was rendering windowed... But no matter what I tried, I couldn't get them both to stay fullscreen. The window also didn't quite switch back to windowed mode properly, most of the time. So you'd get a borderless, non-movable popup-style window with no mouse cursor, somewhere on your desktop.

After a lot of searching, we found this:
http://forums.create.msdn.com/forums/t/79803.aspx
This is a known issue with DXGI 1.1 I'm afraid...

It is not possible to create two exclusive full-screen devices each with their own output monitor on two different adapters. One of them will always be forced to windowed mode. This limitation does not apply to drive two outputs from the same adapter, or multi-adapter scenarios that are virtualized to a single device (SLI/Crossfire).


Well, crap!
Apparently it just is not possible. As soon as you switch the second window to fullscreen, the first one is forced back to windowed mode.
However, the way the guy phrased it, I was wondering: is that specific to Win7 then? So I fired up my copy of Vista x64 and tested there... Well, crap! It worked in one go! My code was working all along, Win7 just couldn't do it. I think I'll file a bug report on this, just in case. I hope it will be fixed soon.

Other than that, I guess my work here is done: I can render with multiple GPUs to multiple fullscreen monitors.
I haven't actually implemented multiple swapchains on a single GPU yet, to drive multiple monitors that way, but that should be trivial, as it's barely any different from render-to-texture: Render screen 1 to buffer 1, render screen 2 to buffer 2, present.
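For reference, the idea is roughly this (again a sketch in SlimDX terms, not verified code):

using System.Windows.Forms;
using SlimDX.Direct3D9;

static class MultiSwapChain
{
    // Create an additional swapchain on the same device, for a second monitor on the same GPU.
    public static SwapChain CreateSecondOutput(Device device, Form window2)
    {
        var pp = new PresentParameters
        {
            Windowed = true,                  // sizes/format are placeholders
            BackBufferWidth = 1280,
            BackBufferHeight = 1024,
            BackBufferFormat = Format.X8R8G8B8,
            SwapEffect = SwapEffect.Discard,
            DeviceWindowHandle = window2.Handle
        };
        return new SwapChain(device, pp);
    }

    public static void RenderFrame(Device device, SwapChain swapChain2)
    {
        device.BeginScene();
        using (Surface target1 = device.GetBackBuffer(0, 0))   // implicit swapchain's backbuffer
        {
            device.SetRenderTarget(0, target1);
            // ... clear + render view 1 ...
        }
        using (Surface target2 = swapChain2.GetBackBuffer(0))  // additional swapchain's backbuffer
        {
            device.SetRenderTarget(0, target2);
            // ... clear + render view 2 ...
        }
        device.EndScene();
        device.Present();                  // present the implicit swapchain
        swapChain2.Present(Present.None);  // present the additional one
    }
}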
Posted on 2011-05-03 18:05:47 by Scali
Things are looking up.
We have a gig planned this Friday. This will be the debut of our newly developed technology.

In short we will be using two capture devices to capture two analog streams in realtime. These will be streamed onto textures in Direct3D.
Then they are put into a cubemap, which is applied to some 3D objects.
We will be driving 4 beamers to project the images onto 4 objects.

For this purpose we have built a machine with 3 videocards with 2 outputs each. Two videocards will drive the beamers, and the third will be for the desktop, so we can have the controller UI on there.

We haven't implemented multithreading yet, so all GPUs are currently driven by a single core in a sequential fashion. Early tests showed that we could still get about 700 fps with 4 outputs, so performance is not an issue at this time. We will probably postpone multithreading until after the gig.

When driving all 6 outputs, we got 150 fps, but that probably has more to do with the fact that the third videocard is my old GeForce 7600GT, which is considerably slower than the other two Radeon 4850s. If I implement an asynchronous mode in the multithreading scheme, the slowest videocard won't necessarily have to pull down the performance of the others. That may be interesting as a benchmarking tool: just have each GPU render as quickly as possible.
Posted on 2011-05-30 06:12:37 by Scali
Here's a quick recording of the app running with two capture devices in a feedback loop, on 4 screens simultaneously:
http://www.youtube.com/watch?v=4IooujifRaQ
Posted on 2011-05-30 18:33:57 by Scali
And here we have them projected on actual half-sphere objects: http://www.youtube.com/watch?v=bjZ8hlFPu0I
Posted on 2011-05-31 14:45:06 by Scali
Very interesting.
Posted on 2011-06-01 16:08:22 by HeLLoWorld
With what you've done, you could maybe have something similar to what the SoftTH guy did.
Render on one card into a large buffer and present on three outputs.
But for this you have to make a fake d3d_.dll that redirects all the calls the application makes to your layer; I don't know how difficult this is. Plus it seems there are several ways to do the copy, and subtleties with the mouse cursor. By the way, SoftTH manages to have multi-fullscreen on Windows 7; I think it's a trick using a topmost borderless windowed viewport.

Then with your system you could also render locally on each card separately, with different viewport matrices, which could be a big benefit, minus the fact that the CPU has to send all commands N times. But then you could, for example, have a different camera orientation on each display for "free", and this, my friend, would be awesome and unprecedented. Although I suspect there would be many things in the way of making this work, like the application doing too many things by itself regarding the camera, or postprocessing effects assuming the center of the screen is the center of the view, or a HUD on each display, or... things I didn't think of.

But with the advent of stereoscopic 3D, it is likely that there will be an effort from NVIDIA and the developers to have more separate, distinct layers in the rendering pipeline, so that the lower layers "know" wtf they are doing (this is an object, this is the world, this is the camera, maybe this is the HUD, this is a light), and the programmer has less freedom to write his own engine and do tricks to render things in a "creative" (bad, dirty!) way. I don't know if this is a good thing. I also think the evolution of GPUs (more and more genericity) actually goes against this separation of layers: raytracing, raymarching, point cloud rendering, screen-space effects, distance field rendering, deferred rendering, etc. I think the best future way of doing things seems less certain now than it did a few years ago. And with all these different ways of doing things, the hardware, or even the lower layers of the software stack, has no way of figuring out wtf you're trying to do and what the numbers you're crunching represent. Only at the end do you have something nice in a buffer.

Anyway, NVIDIA Surround support seems to be improving, but still, I'm not certain it fully makes use of all the GPUs.
Must not be a trivial thing, though.

Reminds me of these things
http://www.wideview.it/steve/
hehe. Times they are a-changin' :)

(man, were those 90s towers ugly)
Posted on 2011-06-01 18:03:39 by HeLLoWorld
Well yes, we could do something like SoftTH, but our focus is on developing our own software. We aren't aiming at running existing software in higher resolutions or anything.
And yes, it's going to be quite difficult to make anything more advanced work for a large set of applications. When using shaders, you never quite know where your matrices are going. They are just sent directly to some shader variables. The classic SetTransform() API is bypassed altogether.

But for our own software we now have a pretty nice high-performance solution.
We are currently using the 'borderless window' trick as well. Although DirectX 9 seems to work okay with multiple fullscreen windows on Windows 7, it's still rather delicate. So just to be on the safe side for now, we stick to the fake fullscreen trick.
We'll experiment with fullscreen mode in DX9 a bit more after the gig. Then we'll also work out the multithreading. I already did a quick-and-dirty test where all GPUs were being fed by their own thread (and thus core), so that they would all render in parallel. The idea works, but it was still very rough around the edges... window procs weren't working yet, etc. Threads would also fight for CPU time; one display would starve the others for some reason (could be a C#-related threading issue, I should really do the threading in native code).
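The quick-and-dirty version is basically one render loop per device, something like this (just a sketch; IGpuEngine and its methods are made-up names for our per-GPU engine instances, and the real thing still needs a message pump and proper synchronisation, which is exactly the rough part):

using System.Threading;

interface IGpuEngine
{
    bool Running { get; }
    void RenderFrame();   // BeginScene / draw / EndScene on this device only
    void Present();
}

static class RenderThreads
{
    public static void Start(IGpuEngine[] engines)
    {
        foreach (var engine in engines)
        {
            var e = engine;   // capture a local copy for the closure
            var thread = new Thread(() =>
            {
                while (e.Running)
                {
                    e.RenderFrame();
                    e.Present();
                }
            });
            thread.IsBackground = true;
            thread.Start();
        }
    }
}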

And there's plenty of other ideas that we have, which we will work on after the gig.
One idea I particularly like is to use the VideoLAN code to broadcast live video data over the network, and render with multiple PCs at a time.
Posted on 2011-06-02 08:35:28 by Scali
Right, getting back to the D3D side of things...
I'm currently trying to build some more interaction into the engine, so that you can tweak shaders, materials, textures etc on the fly. In a way it will be something like RenderMonkey... only not as limited.
One of the biggest problems is regarding the inputs and outputs of shaders. Eg, texture lookups implicitly encode the type of texture... so if you write your shader to do a lookup in a cubemap, it can only use a cubemap for that texture input.
Then there are the constant buffers in the shader... What kind of inputs does it expect from the engine, and where?

I am thinking of designing some comment tags that the engine can recognize. So it will scan the shader code to learn what the shader wants. Then it can give the user some hints as to what he should be using as inputs.
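To give an idea, the tags could be as simple as this (the tag syntax here is completely made up for the example):

using System;
using System.Text.RegularExpressions;

static class ShaderHints
{
    // Hypothetical tag convention inside the HLSL source, e.g.:
    //   // @texture  EnvMap : cube     "environment reflection"
    //   // @constant World  : float4x4 "world transform, set per object"
    public static void Scan(string shaderSource)
    {
        var tag = new Regex(@"//\s*@(?<kind>texture|constant)\s+(?<name>\w+)\s*:\s*(?<type>\w+)(?:\s+""(?<hint>[^""]*)"")?");
        foreach (Match m in tag.Matches(shaderSource))
        {
            // The engine now knows what the shader expects for this input,
            // and can show that as a hint in the UI.
            Console.WriteLine("{0} {1} ({2}): {3}",
                m.Groups["kind"].Value, m.Groups["name"].Value,
                m.Groups["type"].Value, m.Groups["hint"].Value);
        }
    }
}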

C# has a nice feature in that the compiler is always available, and can be invoked from your own application. So I have made a simple proof-of-concept where I write a bit of code in a textbox of the application, and then compile it on-the-fly, and instance the resulting object.
This seems to work quite nicely, doesn't even require me to have the compiler write the output files to disk... and it could be used as an advanced scripting tool in the engine (the custom-written code can have full access to the engine's internals, and perform any D3D function you may desire).
The engine could generate part of the code to drive the shaders, based on the hints it scanned from the shader comments. Then it can allow the user to inspect and edit the code... and once it's done, it can be compiled and used right away.
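The proof-of-concept is essentially this, minus error handling and caching (the names are just for the example):

using System;
using System.CodeDom.Compiler;
using System.Reflection;
using Microsoft.CSharp;

static class ScriptHost
{
    public static object CompileAndInstantiate(string source, string typeName)
    {
        var provider = new CSharpCodeProvider();
        var options = new CompilerParameters
        {
            GenerateInMemory = true,      // no output files on disk
            GenerateExecutable = false    // build a library, not an .exe
        };
        options.ReferencedAssemblies.Add("System.dll");
        // Reference the host itself, so the generated code can call into the engine:
        options.ReferencedAssemblies.Add(Assembly.GetExecutingAssembly().Location);

        CompilerResults results = provider.CompileAssemblyFromSource(options, source);
        if (results.Errors.HasErrors)
            throw new InvalidOperationException(results.Errors[0].ToString());

        return Activator.CreateInstance(results.CompiledAssembly.GetType(typeName));
    }
}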

If I can get that working nicely in practice, it should make writing new shaders and effects much nicer.
Posted on 2011-07-18 11:05:03 by Scali
C# has a nice feature in that the compiler is always available, and can be invoked from your own application. So I have made a simple proof-of-concept where I write a bit of code in a textbox of the application, and then compile it on-the-fly, and instance the resulting object.
This seems to work quite nicely, doesn't even require me to have the compiler write the output files to disk...


Do I understand correctly?

This is f**cking SMC brought to the masses... Just not self-modifying machine code or bytecode... It's in the execution environment.
Maybe we're going to witness never-seen heights of obfuscation...

Maybe it'll bring new levels of genericity and abstraction to the source...
The revenge of the preprocessor over the templates... Just, it's not only a preprocessor anymore... Same language, and you can mix and match what exists and what you just created... again and again... till the thing wakes up and wants to kill you :)
Less and less difference between the compiler and the execution environment... and your program, which can be seen as an extension of the compiler, compiling higher-level concepts when it generates the C# code.

In a way, one could say you can ship open-source programs now by only supplying your source.

Of course it's still not like you can insert a bunch of bytecodes or C# statements just before the execution point and jump back, but still...

Wondering.
Posted on 2011-07-18 17:37:09 by HeLLoWorld

Do I understand correctly?

This is f**cking SMC brought to the masses... Just not self-modifying machine code or bytecode... It's in the execution environment.
Maybe we're going to witness never-seen heights of obfuscation...


Well, it's somewhere between scripting and SMC I suppose.


and your program, which can be seen as an extension of the compiler, compiling higher-level concepts when it generates the C# code.


Yes, I think one interesting application would be to generate C# code to optimize your application at runtime.
Say, rather than having to check a number of state variables at every iteration, you just generate C# code for the current state once, then execute it without any conditional jumps.
Something a bit like what SoftWire used to do, except you use C# code instead of an assembly macro language. The end result is still going to be native code, with optimizations applied.
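For example, rather than branching on a couple of state flags every iteration, you could emit a specialised loop once for the current state and feed it to the same on-the-fly compiler (a toy example, names made up):

using System.Text;

static class StateCodeGen
{
    // Generate a specialised C# method for the current state; the hot loop
    // in the generated code contains no conditionals at all.
    public static string GenerateProcessSource(bool useBlend, bool useFog)
    {
        var sb = new StringBuilder();
        sb.AppendLine("public class Generated {");
        sb.AppendLine("  public static void Process(float[] data) {");
        sb.AppendLine("    for (int i = 0; i < data.Length; i++) {");
        if (useBlend) sb.AppendLine("      data[i] *= 0.5f;   // blend path baked in");
        if (useFog)   sb.AppendLine("      data[i] += 0.1f;   // fog path baked in");
        sb.AppendLine("    }");
        sb.AppendLine("  }");
        sb.AppendLine("}");
        return sb.ToString();
    }
}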


In a way, one could say you can ship open-source programs now by only supplying your source.


You could, but I wouldn't go there myself.
You'd still need to provide a simple program that does the compiling for you (I don't think you get the actual csc.exe compiler executable if you don't have the .NET SDK, so you'd need to build your own compiler).
Aside from that, source code is considerably larger than the compiled code, and it also takes much longer to compile from source than to compile the assembly bytecode to native.
So for convenience and bandwidth savings, I'd still distribute a compiled version, and have the source code as an optional download (unless, as in the case of BHM, the source code itself is what it's about, and the binaries are just included for convenience... much like the DirectX SDK, for example: full source code for the samples is included, but prebuilt binaries are included as well, so you can just point-and-click to see what it does).

Semi-related: Microsoft is working on their own open-source repository. Since Windows ports of open-source programs are generally buggy and perform relatively poorly, Microsoft has decided to do something similar to what FreeBSD does: maintain their own ports.
This includes a build environment for Windows, so users can download and build the latest version from source.
So if this project takes off, then a standardized build environment for Windows may become a reality, and then it will be more feasible to distribute Windows programs in source form.
I made a topic on that a while ago: http://www.asmcommunity.net/board/index.php?topic=30181.0
Haven't really followed it since, but who knows.
Posted on 2011-07-19 03:11:08 by Scali
Yo guys, this is the perfect place for me to show you my 3D rendering engine. I'll rearrange the code so it is neat enough and will not harm your eyes. I think it is good enough for an average low-end card.

http://ompldr.org/vYmZnZw/OGLE.zip (The demo, about 5 MB)



I'll tell you all I know. I heard OpenGL is obsolete, but care not, I'll invent a new one if necessary.
Posted on 2011-12-03 04:53:35 by Farabi
1998 called, they want their demo back!
Sorry, had to do it :)
Posted on 2011-12-11 18:13:42 by HeLLoWorld
Small update on the D3D stuff...
We're now working on dome projections (sorta like IMAX/OMNIMAX theaters).
I have developed a player which can play 360 degree movie formats in realtime.
http://www.youtube.com/watch?v=y2imUeEMoq0
It first 'unwraps' the movie to a cubemap, and then it can be projected on any shape with the use of a mesh and some shaders. In this case I use a sphere. Our dome projection works by using a spherical mirror (sphemir) in the center of the dome, and a conventional projector aimed at it. So the sphere is the correct mapping to project onto the dome.
But many other configurations are possible.
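For the curious: the 'unwrap' step is just a direction-to-texture-coordinate mapping per cubemap texel (done in a shader in practice). Assuming an equirectangular source movie, the math is roughly this; other 360 formats (e.g. domemaster) need a different mapping, and the orientation convention may differ:

using System;

static class Equirect
{
    // Map a cubemap texel's direction to (u,v) in an equirectangular frame.
    public static void DirectionToUV(float x, float y, float z, out float u, out float v)
    {
        float len = (float)Math.Sqrt(x * x + y * y + z * z);
        x /= len; y /= len; z /= len;

        u = (float)(Math.Atan2(x, -z) / (2.0 * Math.PI) + 0.5); // longitude -> [0..1]
        v = (float)(Math.Acos(y) / Math.PI);                    // latitude  -> [0..1], 0 at the 'up' pole
    }
}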
Posted on 2011-12-31 14:58:23 by Scali