Grr, I had posted a lengthy reply, which got lost by the proxy.
Short version... I think I'm going to move to a system with dynamically linked MFC, where the EXE contains the main CWinApp object, and each DLL will be an MFC Extension DLL.
It's a miracle that it actually works in the current state (the framework was originally designed as a monolithic, statically linked MFC app, and I just rather crudely cut it in two for the EXE/DLL split to make the dynamic loading and dependency checking work, breaking some MFC rules in the process).
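For reference, the entry point of an MFC extension DLL needs roughly this boilerplate (essentially what the AppWizard generates; the module name here is made up):

#include <afxwin.h>
#include <afxdllx.h>

static AFX_EXTENSION_MODULE EngineDLL = { NULL, NULL };

extern "C" int APIENTRY DllMain(HINSTANCE hInstance, DWORD dwReason, LPVOID lpReserved)
{
    UNREFERENCED_PARAMETER(lpReserved);

    if (dwReason == DLL_PROCESS_ATTACH)
    {
        // Initialize the extension module and hook it into the CWinApp of the EXE.
        if (!AfxInitExtensionModule(EngineDLL, hInstance))
            return 0;
        new CDynLinkLibrary(EngineDLL);
    }
    else if (dwReason == DLL_PROCESS_DETACH)
    {
        AfxTermExtensionModule(EngineDLL);
    }
    return 1;
}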
Posted on 2010-01-22 09:18:04 by Scali
Well, apparently that was easier than I thought. It took a while to wade through the MSDN documentation and figure out exactly which types of DLLs are allowed to do what, and which #defines and compiler options you need to compile your DLL and EXE the proper way... but after that, my code compiled in one go.
I'll have to do an analysis of the code to see if everything is where I want it to be now... but at least it compiles without problems again, for the time being.
Going to dynamically linked MFC also shrunk the file size considerably. The loader EXE is now down to 39 KB, and the actual engine DLL is 77 KB.
Posted on 2010-01-22 12:06:55 by Scali
Going to dynamically linked MFC also shrunk the file size considerably. The loader EXE is now down to 39 KB, and the actual engine DLL is 77 KB.
How many megabytes of MFC DLLs does this end up depending upon, though? :)
Posted on 2010-01-22 13:46:15 by f0dder

Going to dynamically linked MFC also shrunk the file size considerably. The loader EXE is now down to 39 KB, and the actual engine DLL is 77 KB.
How many megabytes of MFC DLLs does this end up depending upon, though? :)


A quick estimate is about 5-6 MB.
Then again, I wonder if that's really a problem. Since I use OpenMP, I already need the VC++ redistributable. If that contains the MFC DLLs as well (which it seems it does: http://www.microsoft.com/downloads/details.aspx?familyid=A5C84275-3B97-4AB7-A40D-3802B2AF5FC2&displaylang=en), then I'm not adding any extra dependencies, and the space is effectively free. It's only a 4.0 MB download too.

I don't think there are any attractive alternatives anyway...
I could either strip all MFC code from the EXE and put the entire application into each DLL, so that I could still use static linking... or I could drop MFC altogether. But building complex dialogs and such is bad enough with MFC... without MFC it's maintenance hell :P
Posted on 2010-01-22 16:27:35 by Scali
I've been doing some cleanup in the code... Trying to remove API-specific code from my engine objects as much as possible.
F0dder and I are also trying to come up with some kind of scheme where different classes exist for the different APIs, but with the actual implementation shared as much as possible. Most of the time it's just one or two lines in a single function that differ from one API to the next.
You could copy/paste the implementation into different source files, but then you'd have to maintain the same code three times, which is exactly what I'd like to avoid.
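Just to sketch one direction we've been looking at (nothing final, and all names here are made up): the shared implementation could live in a template, with a tiny policy class supplying only the lines that actually differ per API:

#include <cstdio>

// Hypothetical policy classes: each one supplies only the API-specific call.
struct D3D9Policy  { static void drawIndexed(int count) { std::printf("DrawIndexedPrimitive(%d)\n", count); } };
struct D3D10Policy { static void drawIndexed(int count) { std::printf("DrawIndexed(%d)\n", count); } };

// All the shared logic lives here exactly once; only the differing line is delegated.
template <typename Api>
class MeshBase
{
public:
    void render(int indexCount)
    {
        // ...shared setup code would go here, identical for every API...
        Api::drawIndexed(indexCount);   // the one line that differs per API
        // ...shared cleanup code would go here...
    }
};

typedef MeshBase<D3D9Policy>  D3D9Mesh;
typedef MeshBase<D3D10Policy> D3D10Mesh;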

However, the BHM claw object now actually loads for all three APIs. So there is some progress. The next step will be making a shader to perform the skinning, so the animation can be played once again. I suspect that this is going to be one of the harder things to make API-independent...
One major difference is that D3D9 handles each shader constant individually, whereas D3D10/11 let you upload an entire struct of data to the shader in one go.
That means the handle management in D3D9 needs to be automated somehow, so that it looks like D3D10/11 from the outside. I guess I have to create my own fake 'ID3DBuffer' object which stores the handles internally, so that it can update the struct of shader constants one member at a time.
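Something along these lines is what I have in mind (just a rough sketch with made-up names, using the D3DX constant table to store the handles):

#include <d3dx9.h>
#include <vector>

struct ConstantMember
{
    D3DXHANDLE handle;   // handle of the constant in the shader's constant table
    size_t     offset;   // offset of the member inside the CPU-side struct
    size_t     size;     // size of the member in bytes
};

// Emulates a D3D10/11-style constant buffer on top of D3D9 constant handles.
class FakeConstantBuffer
{
    ID3DXConstantTable*         table;
    std::vector<ConstantMember> members;
public:
    FakeConstantBuffer(ID3DXConstantTable* t) : table(t) {}

    // Register one member of the CPU-side struct by its name in the shader.
    void addMember(const char* name, size_t offset, size_t size)
    {
        ConstantMember m = { table->GetConstantByName(NULL, name), offset, size };
        members.push_back(m);
    }

    // Upload the whole struct, member by member, mimicking a single buffer update.
    void update(IDirect3DDevice9* device, const void* data)
    {
        for (size_t i = 0; i < members.size(); i++)
            table->SetValue(device, members[i].handle,
                            (const char*)data + members[i].offset, (UINT)members[i].size);
    }
};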

Related issues are texture/stage state handling. Those are also completely different in D3D9 vs D3D10/11. I had already made my own state blocks, so that I could store a bunch of states and update them in one go... But they don't resemble the D3D10/11 objects much. I'm not sure if I can shoehorn them into that.
And of course there are vertex declarations in D3D9 vs input element declarations in D3D10/11. Similar, but not quite the same. I'm thinking of storing the D3D10/11 declarations and creating a function that translates them into D3D9 vertex declarations. Not sure how feasible that is at this point though.
Posted on 2010-01-26 04:30:14 by Scali
Hum, last night I finally got OpenCL to behave reasonably on my Radeon HD5770 1 GB. So I tried the OpenCL GPU-accelerated samples in GPU Caps Viewer.
Today I figured I could also run them on my PC at work, which has a GeForce 9800GTX+ with 512 MB.
The 9800GTX+ actually seems to outperform the HD5770 in pretty much all tests. That's quite amazing, considering that AMD is pushing OpenCL hard in the media, and the HD5770 is a far newer and more advanced card than the aging 9800GTX+. In gaming, the 5770 is considerably faster than the 9800.
I'd be lying if I said I was surprised, though. AMD is usually 'all talk'. They've been talking about GPU-accelerated physics since the introduction of the Radeon X1800, and they still can't pull it off.

Oh yeah, and I actually run my 5770 at pretty heavily overclocked settings, 925 MHz on the core and 1425 MHz on the memory, iirc. The 9800 is running stock.
Some quick specs:
HD5770:
- 40 nm production process
- 1 GB GDDR5 memory
- 850 MHz core speed
- 800 stream processors

9800GTX+:
- 55 nm production process
- 512 MB GDDR3 memory
- 738 MHz core speed
- 128 stream processors
Posted on 2010-01-26 08:40:50 by Scali
Okay, time for a quick-list, so I don't forget what it is that I was trying to do next... :)
- Try to build a D3D10/11 input decl to D3D9 vertex decl conversion routine.
- Revive the D3D9 skinning code from the previous-gen engine and plug it into the current D3D9 engine.
- Devise D3D10/11 skinning code from the now-working D3D9 code.

Once I've got that, my engine should be mostly back in business, as it can then load, render and animate objects from disk.
Going from there I will probably want to:
- Plug the CPUInfo library into the engine, as a kind of free promotion for the project, and for the user to see some random data regarding his system.
- Build a minimal OpenGL engine to load, render and animate the BHM object, as an example for the BHM file format project on SourceForge.net (again plugging a gratuitous CPUInfo window in there somewhere).
- Build new-and-improved shadowing routines into the engine.
- Make some kind of demo, with audio-synchronized effects.
- Play around with DirectCompute, for some kind of REYES rendering, or perhaps raytracing/hybrid rendering.

Posted on 2010-01-27 04:46:54 by Scali
You just now got dx9 skinning code? :)
Yeah ok, it took me 6 years to get it working, it's complicated huh?
Why does it have to be so complicated?
:)
Posted on 2010-01-28 02:25:49 by Homer

You just now got dx9 skinning code? :)
Yeah ok, it took me 6 years to get it working, it's complicated huh?
Why does it have to be so complicated?
:)



No, I had it years ago (including skinned shadowvolumes): http://bohemiq.scali.eu.org/forum/viewtopic.php?f=4&t=35
But that was a DX9-only engine, and I'm now building a new engine around DX9/10/11. As I said, I'm going to revive the code from the previous gen engine (that would be the above code, now almost 6 years old) and plug it into the new one.
Posted on 2010-01-28 03:29:56 by Scali
I doubt the transition will be painful, or the implementation simplified.
How do the new vertex declarations differ from dx9?
Posted on 2010-01-28 03:38:30 by Homer
How do the new vertex declarations differ from dx9?


Well, apart from the obvious difference in the actual syntax (they use strings rather than flags for the semantic usage, the more generic DXGI formats, etc.) and the structures used for declaring a format, there's a semantic difference as well.
In D3D9, the vertex declaration was used 'adhoc'... so when you called DrawPrimitive(), it would check the current shader, vertexbuffer and vertex declaration (or FVF), and see if it could make them all fit.
In D3D10/11, the vertex declaration is tied to your compiled shader (after all, you define the input structure in your HLSL code). You then need to create an input layout object for your shader. It was redesigned this way for improved performance: validation at runtime is simpler, and the input mapping isn't done at every Draw call, but only once.
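To illustrate why (parameter names are just examples), creating the layout requires the compiled shader's bytecode:

#include <d3d10.h>

// The input layout can only be created against a compiled shader's input signature,
// which is why it logically belongs with the shader rather than with the mesh.
ID3D10InputLayout* createLayout(ID3D10Device* device,
                                const D3D10_INPUT_ELEMENT_DESC* elements, UINT count,
                                ID3D10Blob* shaderBytecode)
{
    ID3D10InputLayout* layout = NULL;
    device->CreateInputLayout(elements, count,
                              shaderBytecode->GetBufferPointer(),
                              shaderBytecode->GetBufferSize(),
                              &layout);
    return layout;
}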

This means that I needed to change my logic... I had always put the FVF or vertex declaration with my mesh object, which contains the vertex buffers (so it declares how the data is stored in the buffer). In D3D10/11 it 'belongs' to the shader: it declares how data will be input to the shader, which also means that every shader needs its own input layout object, even if the vertex declaration itself is the same, because of the input mapping to the shader. And you can't build an input layout object without a compiled shader object.

So I now put the vertex declaration into the object that stores my shaders, textures, materials and so on. It has a function to compile a shader, and it generates the input layout (D3D10/11) or vertex declaration (D3D9) on the fly. Because of the translation routine that I made yesterday, I can supply D3D10/11-format declarations to the compile function, so the difference between D3D9 and D3D10/11 in this respect is hidden from the outside. It will convert the declaration to D3D9 format on-the-fly, and build an IDirect3DVertexDeclaration9 object instead of an ID3D10/11 InputLayout object.
This way I don't have to build separate D3D9 and D3D10/11 declarations for the same geometry data. It saves me a lot of work, and makes it all a lot less error-prone.
Of course it only works for lowest-common-denominator stuff... not all D3D10/11 declarations can be translated back to D3D9, obviously, as D3D9 doesn't support all features and formats. But those will be exceptional cases anyway: if you can't feed the data into D3D9, you can't use the shader in D3D9 either.
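A very stripped-down sketch of what such a conversion can look like (not the actual engine code; only a few formats and semantics are handled here, and error checking is omitted):

#include <d3d9.h>
#include <d3d10.h>
#include <cstring>
#include <vector>

// Map a DXGI format to the corresponding D3D9 declaration type (subset only).
static BYTE dxgiToDeclType(DXGI_FORMAT fmt)
{
    switch (fmt)
    {
    case DXGI_FORMAT_R32G32_FLOAT:       return D3DDECLTYPE_FLOAT2;
    case DXGI_FORMAT_R32G32B32_FLOAT:    return D3DDECLTYPE_FLOAT3;
    case DXGI_FORMAT_R32G32B32A32_FLOAT: return D3DDECLTYPE_FLOAT4;
    case DXGI_FORMAT_R8G8B8A8_UINT:      return D3DDECLTYPE_UBYTE4;
    default:                             return D3DDECLTYPE_UNUSED;
    }
}

// Map a D3D10/11 semantic name string to the D3D9 usage enum (subset only).
static BYTE semanticToUsage(const char* name)
{
    if (!strcmp(name, "POSITION"))     return D3DDECLUSAGE_POSITION;
    if (!strcmp(name, "NORMAL"))       return D3DDECLUSAGE_NORMAL;
    if (!strcmp(name, "BLENDWEIGHT"))  return D3DDECLUSAGE_BLENDWEIGHT;
    if (!strcmp(name, "BLENDINDICES")) return D3DDECLUSAGE_BLENDINDICES;
    return D3DDECLUSAGE_TEXCOORD;
}

std::vector<D3DVERTEXELEMENT9> toVertexDecl(const D3D10_INPUT_ELEMENT_DESC* desc, UINT count)
{
    std::vector<D3DVERTEXELEMENT9> decl;
    for (UINT i = 0; i < count; i++)
    {
        D3DVERTEXELEMENT9 e;
        e.Stream     = (WORD)desc[i].InputSlot;
        e.Offset     = (WORD)desc[i].AlignedByteOffset;
        e.Type       = dxgiToDeclType(desc[i].Format);
        e.Method     = D3DDECLMETHOD_DEFAULT;
        e.Usage      = semanticToUsage(desc[i].SemanticName);
        e.UsageIndex = (BYTE)desc[i].SemanticIndex;
        decl.push_back(e);
    }
    D3DVERTEXELEMENT9 end = D3DDECL_END();   // terminator element
    decl.push_back(end);
    return decl;
}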
Posted on 2010-01-28 03:58:16 by Scali
Okay, a small update...
The input layout-to-vertex declaration routine is up and running now, I've been debugging it, and it works at least for my standard skinned and unskinned vertex formats (where I found that in some cases my input layout declarations weren't entirely flawless either).
I've now made a typedef to pretend that an IDirect3DVertexDeclaration9 is the same thing as an ID3DInputLayout. Technically it isn't, because the D3D9 type is more generalized (as I said before, in D3D9 it is shader-independent)... but this way around it will work, and my code will look nice and clean, performing the exact same actions for D3D9/10/11 in as many places as possible.
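Roughly like this (the preprocessor define is made up):

#ifdef USE_D3D9
    #include <d3d9.h>
    typedef IDirect3DVertexDeclaration9 ID3DInputLayout;
#else
    #include <d3d10.h>
    typedef ID3D10InputLayout ID3DInputLayout;   // or ID3D11InputLayout in the D3D11 build
#endif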

I've also played around a bit with FVF stuff. It has no meaning in D3D10/11, obviously, but I wanted to keep FVF support in D3D9, so the new engine would still have the full functionality, for backward compatibility. For some reason I've always had a soft spot for toying around with shading and trickery on the fixed-function pipeline. I managed to get it working again, which was fun.

I've copy-pasted the old skinned shader stuff into the new engine, to try and resurrect the D3D9 mode at least. I wasn't too successful: although the code compiles, it crashed because some matrices didn't get allocated during loading. I'll have to compare with the old engine to see where exactly I allocated them, and figure out why that code somehow didn't make it into the new engine yet.
Once I have it working in D3D9, it should just be a question of rewriting the shader interface code for D3D10/11, and I'll have skinned animation in all APIs. All the matrix handling and animation code is (or should be) API-independent, so once that part works in D3D9, it should automatically work in D3D10/11 as well.

Speaking of that interface code, I haven't tackled things like textures yet. D3D10/11 work in a completely different way, with very generic 'ID3DBuffer' objects, which can contain a number of things, including texture data. You then need to create a 'resource view' to use it as a texture in a shader (again, much like the input layout scheme, the resource view will move some of the validation and mapping code to creation time, rather than when the texture is actually bound to the pipeline, for improved performance). I haven't gotten round to making a nice abstracted way of loading textures in D3D9/10/11 yet. Once the textures are loaded, they work the same in the engine, regardless of the API used, though. So it's just at creation time that it's still a tad messy at this point.
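For illustration, here's a minimal way of hiding that difference behind a single call, using D3DX on both sides (the define and names are made up, and this is not the engine's actual loader):

#ifdef USE_D3D9
    #include <d3dx9.h>
    typedef IDirect3DTexture9* TextureHandle;

    TextureHandle loadTexture(IDirect3DDevice9* device, const char* file)
    {
        IDirect3DTexture9* tex = NULL;
        // D3D9: the texture interface itself is what gets bound to a sampler stage.
        D3DXCreateTextureFromFileA(device, file, &tex);
        return tex;
    }
#else
    #include <d3dx10.h>
    typedef ID3D10ShaderResourceView* TextureHandle;

    TextureHandle loadTexture(ID3D10Device* device, const char* file)
    {
        ID3D10ShaderResourceView* srv = NULL;
        // D3D10: the shader resource view, not the underlying resource, is what the shader sees.
        D3DX10CreateShaderResourceViewFromFileA(device, file, NULL, NULL, &srv, NULL);
        return srv;
    }
#endif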

Similarly, the render state handling is very different between D3D9 and D3D10/11. I haven't yet thought about how to abstract that... or whether I should even try. I have made a nice caching mechanism for D3D10/11 though. You can just update any states at will, and just before render time, the engine will automatically check for changed states and build and set a new state block on-the-fly if required. That means you don't have to care about previous states or anything in your own code, much like how D3D9 worked. You can read back or modify any individual state, and leave the rest as-is. Very convenient to use.
The state blocks are of course cached and re-used where possible, to maximize performance and avoid memory leaks. So in a way, your state changes are automatically 'compiled' and optimized (it's probably much like how D3D9 drivers work internally anyway).
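As a toy example of the caching idea for a single state type (blend state, D3D10 flavour; the real cache covers all the state types, and these names are made up):

#include <d3d10.h>
#include <map>
#include <cstring>

// Orders blend descriptions bytewise so they can be used as a map key.
// (Descriptions should be zero-initialized before filling, so padding compares equal.)
struct DescLess
{
    bool operator()(const D3D10_BLEND_DESC& a, const D3D10_BLEND_DESC& b) const
    {
        return memcmp(&a, &b, sizeof(a)) < 0;
    }
};

class BlendStateCache
{
    std::map<D3D10_BLEND_DESC, ID3D10BlendState*, DescLess> cache;
public:
    // Returns a state object for the given description, creating it only the first
    // time that exact combination of states is requested.
    ID3D10BlendState* get(ID3D10Device* device, const D3D10_BLEND_DESC& desc)
    {
        std::map<D3D10_BLEND_DESC, ID3D10BlendState*, DescLess>::iterator it = cache.find(desc);
        if (it != cache.end())
            return it->second;

        ID3D10BlendState* state = NULL;
        device->CreateBlendState(&desc, &state);
        cache[desc] = state;
        return state;
    }
};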

Another thing I should take care of is the profile to compile the shaders against. Currently I still hardcode the string ("vs_2_0", "vs_4_0", etc.), but sadly there is no lowest common denominator. While D3D11 has a Direct3D 9-level hardware compatibility mode, it doesn't use the same profile names as D3D9 does, so compiling a shader for "1_1" or "2_0" won't work; you need something like "vs_4_0_level_9_1".
I think the best way to solve that is to query the highest possible shader level that the hardware supports, and then store those profiles for all compiling that is to be done at a later time. That way I no longer need to make different calls for different APIs. It will also ensure that the shaders are compiled with all the possible optimizations for the hardware.
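A sketch of how that query could look (a hypothetical wrapper, not the engine's actual code; the define is made up):

#ifdef USE_D3D9
    #include <d3dx9.h>

    void getProfiles(IDirect3DDevice9* device, const char** vs, const char** ps)
    {
        // D3DX returns the best profile for the device, e.g. "vs_3_0" / "ps_3_0".
        *vs = D3DXGetVertexShaderProfile(device);
        *ps = D3DXGetPixelShaderProfile(device);
    }
#else
    #include <d3d11.h>

    void getProfiles(ID3D11Device* device, const char** vs, const char** ps)
    {
        // Map the D3D11 feature level to the highest profile it supports.
        switch (device->GetFeatureLevel())
        {
        case D3D_FEATURE_LEVEL_11_0: *vs = "vs_5_0";           *ps = "ps_5_0";           break;
        case D3D_FEATURE_LEVEL_10_1: *vs = "vs_4_1";           *ps = "ps_4_1";           break;
        case D3D_FEATURE_LEVEL_10_0: *vs = "vs_4_0";           *ps = "ps_4_0";           break;
        case D3D_FEATURE_LEVEL_9_3:  *vs = "vs_4_0_level_9_3"; *ps = "ps_4_0_level_9_3"; break;
        default:                     *vs = "vs_4_0_level_9_1"; *ps = "ps_4_0_level_9_1"; break;
        }
    }
#endif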

So, if I can find the time tonight, with a bit of luck the skinning can be completed.
Posted on 2010-01-29 06:37:47 by Scali
Okay, the skinned claw animation can be loaded and played in D3D9 mode again, in the new engine.
So that means that my loading code, my animation code and my skinning code are all in order.
In D3D10/11 mode it currently plays the animation with static meshes only. Basically that means the fingers of the claw don't move, but otherwise it looks okay.

It was actually easier than I thought... At first I thought I had all the code in place, but I didn't see anything. That was because I didn't use the camera from the BHM file yet, and my own camera was way too close... Then I saw everything but the claw... That one was the trickiest: I loaded the skinned shaders, but further down in the code the regular shaders were loaded, replacing the skinned shaders and messing up my whole skinned material system.

Should all be downhill from here, just shoehorning the D3D9 shaders into the new D3D10/11 material system. Basically that should only mean the setting of the shader constants. I already use the D3D10/11 input layout declarations in D3D9 mode.
Posted on 2010-01-29 18:11:18 by Scali
Aha, success at last!
It turned out that it wasn't JUST a case of setting the vertex shader constants.
I also had to do some shader debugging... Although the shader compiled and worked correctly in D3D9, it had a side-effect of creating an extra constant buffer, which in D3D10/11 meant that I was updating the wrong constant buffer, and wondering why nothing appeared on my screen :)

Then I also had problems reading the bone indices in D3D10/11, because as it turned out, I didn't pass the correct DXGI format for the blendindices field. Apparently that didn't affect D3D9, even though it built its vertex declaration from the same definition: D3D9 implicitly split the 4 packed bytes into a uint4 type, while D3D10/11 just stuffed all 4 bytes into the first component of the uint4.
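The element in question presumably ends up something like this (the offset is just an example): with the byte format the four indices arrive as a proper uint4 in the shader, whereas with a 32-bit integer format the whole packed DWORD lands in the first component only.

// Hypothetical input element for the blend indices (offset made up for illustration).
D3D10_INPUT_ELEMENT_DESC blendIndices =
{
    "BLENDINDICES", 0, DXGI_FORMAT_R8G8B8A8_UINT, 0, 28,
    D3D10_INPUT_PER_VERTEX_DATA, 0
};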

The fun part was... once I had an idea of what may be going wrong, I used the bitwise operations available in SM4.0 to manually extract the bytes with some shifting and anding. It actually worked that way too :)
But in the end it wasn't an option: it wasn't a very clean solution, and it would also break when compiled for D3D9.

But I got it working now, with a single shader working for all three APIs.
Posted on 2010-01-30 12:33:48 by Scali
I guess the next task is the decision between forward, deferred and light-prepass :) ?
And CHC++ vs basic BVH queries, portals or bsp :)
Posted on 2010-01-30 16:35:52 by Ultrano

I guess the next task is the decision between forward, deferred and light-prepass :) ?
And CHC++ vs basic BVH queries, portals or bsp :)


No
Posted on 2010-01-30 16:41:30 by Scali
I've cleaned up most of the code now. I've added the detection of maximum shader profiles supported for the actual hardware.
This also includes the detection of compute shader functionality in D3D11 on 'downlevel' hardware (10.0 and 10.1).
At that point I realized that I never bothered to extend my state cache and other core engine parts to support the new shader types and related states in D3D11, such as the compute shader, domain shader and hull shader.
So I have added those as well now, allowing me to fully leverage the D3D11 functionality.
It still needs a bit of cleaning up though. There are some operations that could be grouped into single functions. With 3 types of shaders in D3D10 it wasn't THAT big of a deal, but now that there are 6 types of shaders, some code is currently virtually duplicated 6 times.
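For example, the constant buffer binding calls have identical signatures for all six stages in D3D11, so they could be folded into a single helper, something along these lines (a sketch, not the actual code):

#include <d3d11.h>

enum ShaderStage { STAGE_VS, STAGE_HS, STAGE_DS, STAGE_GS, STAGE_PS, STAGE_CS };

// One function instead of six near-identical code paths.
void bindConstantBuffer(ID3D11DeviceContext* ctx, ShaderStage stage, UINT slot, ID3D11Buffer* buf)
{
    switch (stage)
    {
    case STAGE_VS: ctx->VSSetConstantBuffers(slot, 1, &buf); break;
    case STAGE_HS: ctx->HSSetConstantBuffers(slot, 1, &buf); break;
    case STAGE_DS: ctx->DSSetConstantBuffers(slot, 1, &buf); break;
    case STAGE_GS: ctx->GSSetConstantBuffers(slot, 1, &buf); break;
    case STAGE_PS: ctx->PSSetConstantBuffers(slot, 1, &buf); break;
    case STAGE_CS: ctx->CSSetConstantBuffers(slot, 1, &buf); break;
    }
}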

The code doesn't render *exactly* the same at this point, but I'm not that bothered about it. I think tonight I'll just compile a release package, so it can be tested on various computers, running on various hardware, with various flavours of Windows and Direct3D APIs.

I'm also thinking of removing D3D10 support from my sourcecode now. To be exact, it is D3D10.1 at present. I had already dropped D3D10 in favour of D3D10.1. This makes Vista SP1 mandatory, but I don't think that's a big deal. Dropping D3D10.1 in favour of D3D11 will make SP2 mandatory. But it will cut down the size and complexity of my codebase.
Posted on 2010-02-02 05:00:15 by Scali
Oh, by the way... I've casually mentioned various differences between D3D9 and D3D10/11, and how I tried to bridge the gap between them.
If anyone is interested, I could do a more elaborate summary/overview of the various differences between the APIs, and the solutions I have used to overcome them. It may help others in migrating from D3D9 to a newer API.
Posted on 2010-02-02 07:27:20 by Scali
Here is today's release: http://bohemiq.scali.eu.org/Engine20100202.zip
It should look something like this:


I haven't bothered to add 64-bit binaries, since I'm on my laptop. It doesn't have a 64-bit OS, and the 64-bit cross-development environment isn't set up either. So 32-bit only will have to do this time :)
Posted on 2010-02-02 14:09:56 by Scali
If anyone is interested, I could do a more elaborate summary/overview of the various differences between the APIs, and the solutions I have used to overcome them. It may help others in migrating from D3D9 to a newer API.
Please do that - I pretty much reached a standstill while refactoring the code you sent me, after realizing I didn't know enough about the various API versions to efficiently untangle the #ifdef soup :)

But with some additional abstraction, I believe substantial parts of the engine can be generalized and re-used, without #ifdef hell, while providing a relatively clean interface.
Posted on 2010-02-02 14:22:37 by f0dder