Wow, sounds like overkill, but hey, it works; that's what matters.

Posted on 2010-02-11 05:25:06 by Homer
Probably the same problem.
Here's a quick fix:
Go to the Data directory, and open hlsl_shader.vsh.
Look for the following structure:
struct VS_OUTPUT
{
float4 Position : SV_Position; // vertex position
float3 Diffuse : COLOR0; // diffuse lightvector
float3 Normal : NORMAL; // vertex normal
float3 Specular : TEXCOORD0; // specular lightvector
float3 Distance : TEXCOORD1; // distance vector to light in worldspace, for attenuation
float Attenuate : TEXCOORD2; // per-vertex distance scalar
float2 Texcrd : TEXCOORD3; // Texture coordinates
};


Now I think two possible solutions will work, but I only tried the first:
1) Replace COLOR0 with TEXCOORD0, and increase all the following TEXCOORDn by one (so TEXCOORD0 -> TEXCOORD1, etc).
Then open hlsl_shader.psh and do the same.
Or:
2) Replace 'float3 Diffuse' with 'float4 Diffuse'. Then open hlsl_shader.psh and replace 'half3 Diffuse' with 'half4 Diffuse'.

It didn't work; I'm posting the results of both and what I changed.

1) The claw is now visible, but it's white.


hlsl_shader.vsh

struct VS_OUTPUT
{
float4 Position : POSITION; // vertex position
float3 Diffuse : TEXCOORD0; // diffuse lightvector
float3 Normal : TEXCOORD1; // vertex normal
float3 Specular : TEXCOORD2; // specular lightvector
float3 Distance : TEXCOORD3; // distance vector to light in worldspace, for attenuation
float Attenuate : TEXCOORD4; // per-vertex distance scalar
float2 Texcrd : TEXCOORD5; // Texture coordinates
};


2) Crashes the program after pressing OK in the settings window:
AppName: engine32.bin	 AppVer: 0.0.0.0	 ModName: engine32d3d9.dll
ModVer: 0.0.0.0 Offset: 000080be


hlsl_shader.vsh
struct VS_OUTPUT
{
float4 Position : POSITION; // vertex position
float4 Diffuse : COLOR0; // diffuse lightvector
float3 Normal : TEXCOORD0; // vertex normal
float3 Specular : TEXCOORD1; // specular lightvector
float3 Distance : TEXCOORD2; // distance vector to light in worldspace, for attenuation
float Attenuate : TEXCOORD3; // per-vertex distance scalar
float2 Texcrd : TEXCOORD4; // Texture coordinates
};


hlsl_shader.psh
struct VS_OUTPUT
{
half4 Position : POSITION; // vertex position
half4 Diffuse : COLOR0; // diffuse lightvector
half3 Normal : TEXCOORD0; // vertex normal
half3 Specular : TEXCOORD1; // specular lightvector
half3 Distance : TEXCOORD2; // distance vector to light in worldspace, for attenuation
half Attenuate : TEXCOORD3; // per-vertex distance scalar
half2 Texcrd : TEXCOORD4; // Texture coordinates
};


half4 main( in VS_OUTPUT v ) : SV_Target
{
half4 Diffuse = v.Diffuse*2.0 - 1.0;
half3 Specular = quicknrm(v.Specular);
half3 Normal = //tex2D( tex, v.Texcrd )*2.0 - 1.0;/*normalize(v.Normal);*/
quicknrm(v.Normal);

//half spec /*: register(r4)*/ = tex1D( power, saturate(dot(Specular, Normal)) );
half spec = pow( saturate(dot(Specular, Normal)), 16 );

half4 color = ambient + (diffuse*saturate(dot(Diffuse, Normal)) + spec*specular)*v.Attenuate;

return color*tex2D( tex, v.Texcrd );
}
Posted on 2010-02-11 05:29:33 by Azura

Wow, sounds like overkill, but hey, it works; that's what matters.


I moved as much of the processing as possible into the vertex shaders.
You could use fewer interpolated values, but then you'd have to do a lot more math in the pixel shaders, making the whole thing slower.
And these shaders were originally written for SM1.x (which they can still be compiled to, under D3D9), where vertex shaders were considerably more powerful than pixel shaders anyway, so doing it in a pixel shader was not an option. You had neither the precision nor the instruction count.
Posted on 2010-02-11 05:43:43 by Scali

It didn't work; I'm posting the results of both and what I changed.

1) The claw is now visible, but it's white.



For 1), did you change the VS_OUTPUT struct in both hlsl_shader.vsh and hlsl_shader.psh? (They need to match). I couldn't quite make this out from your post.
In case you did, then I'm not sure what is wrong.
By the way, you shouldn't need to change the NORMAL flag, I think.
The correct struct should be:
struct VS_OUTPUT
{
float4 Position : SV_Position; // vertex position
float3 Diffuse : TEXCOORD0; // diffuse lightvector
float3 Normal : NORMAL; // vertex normal
float3 Specular : TEXCOORD1; // specular lightvector
float3 Distance : TEXCOORD2; // distance vector to light in worldspace, for attenuation
float Attenuate : TEXCOORD3; // per-vertex distance scalar
float2 Texcrd : TEXCOORD4; // Texture coordinates
};


And its equivalent in hlsl_shader.psh:
struct VS_OUTPUT
{
half4 Position : SV_Position; // vertex position
half3 Diffuse : TEXCOORD0; // diffuse lightvector
half3 Normal : NORMAL; // vertex normal
half3 Specular : TEXCOORD1; // specular lightvector
half3 Distance : TEXCOORD2; // distance vector to light in worldspace, for attenuation
half Attenuate : TEXCOORD3; // per-vertex distance scalar
half2 Texcrd : TEXCOORD4; // Texture coordinates
};


Edit: Checking the shaders I ran on the Q35... I actually DID replace the NORMAL flag with TEXCOORD as well, so it's the same as what you posted.
Posted on 2010-02-11 05:52:25 by Scali
I guess I forgot to mention that I use the engine you posted on page 7, the one from 20100205, which was the last one I found posted.
In there, the NORMAL flag was declared that way.

It works now when changing both hlsl_shader.vsh and hlsl_shader.psh to:

struct VS_OUTPUT
{
float4 Position : POSITION; // vertex position
float3 Diffuse : TEXCOORD0; // diffuse lightvector
float3 Normal : TEXCOORD1; // vertex normal
float3 Specular : TEXCOORD2; // specular lightvector
float3 Distance : TEXCOORD3; // distance vector to light in worldspace, for attenuation
float Attenuate : TEXCOORD4; // per-vertex distance scalar
float2 Texcrd : TEXCOORD5; // Texture coordinates
};

struct VS_OUTPUT
{
half4 Position : POSITION; // vertex position
half3 Diffuse : TEXCOORD0; // diffuse lightvector
half3 Normal : TEXCOORD1; // vertex normal
half3 Specular : TEXCOORD2; // specular lightvector
half3 Distance : TEXCOORD3; // distance vector to light in worldspace, for attenuation
half Attenuate : TEXCOORD4; // per-vertex distance scalar
half2 Texcrd : TEXCOORD5; // Texture coordinates
};


But it crashes when making the changes you suggested.

Edit: Saw your edit; then I guess everything is the way it should be now.
Posted on 2010-02-11 06:02:30 by Azura

I guess I forgot to mention that I use the engine you posted on page 7, the one from 20100205, which was the last one I found posted.
In there, the NORMAL flag was declared that way.


Oh yeah, could be. I'm at work here, and only had the Engine20100204 on a memory stick. Not sure what exactly I changed between those two versions.


It works now when changing both hlsl_shader.vsh and hlsl_shader.psh


Okay, that's good to know.
I guess the Intel IGP problems are now solved then :)
I'll have to remember to fix the shader code at home, so the next release will also come with the correct shader code out-of-the-box.
Posted on 2010-02-11 06:10:12 by Scali
I’ve done captures of our old Bohemiq demos in 720p quality, and put them on Youtube:
Art Nouveau
Croissant 9

And some other stuff too:
Reflecting Spheres
Prosaic
Posted on 2010-02-19 09:46:04 by Scali
What video card are you using, Homer?


I would still like an answer to this question...
Posted on 2010-02-22 12:43:44 by Scali
Just a 8800 (GTX) for now.
Posted on 2010-02-22 15:58:46 by Homer

Just a 8800 (GTX) for now.


Okay, thanks.
That rules out the possibility that the hanging on your system was related to the pixel format as well (that was probably an independent issue on the Intel Q35 chip). It must have been in the MFC initialization/deinitialization code for loading the extension DLLs. I hope that was solved in the later builds.
Posted on 2010-02-23 09:52:34 by Scali
I can't find exactly where I mentioned it, but I am quite sure that I mentioned somewhere that the D3D9 version of the engine was the fastest, despite the attempts to reduce the CPU overhead in the new D3D10/11 API and driver model.

Well, while working on the OpenGL code, I also did some work on the D3D code, to clean it up a bit (the OpenGL code starts from a clean slate, so sometimes I think 'hey, this is cleaner than what I did in D3D, I'll change it there as well').
And I discovered something: Suddenly I saw over 7000 fps in Windows 7, while running the D3D11 variation of the engine. Those are D3D9-like figures (and yes, D3D9 on XP).
So I verified with an earlier build, and indeed, the stuff that I built in February is running faster as well now. So it's not the changes to my code that made the difference.

It appears that the D3D10/11 drivers are now maturing, and the overhead is finally dropping below D3D9 in practice, which is the main reason why the API and driver model was redesigned in the first place.
I'm also quite happy with the performance of my D3D11 code. In theory it has more overhead than the D3D10 version, since it has to manage more shaders and states (there are three new types of shaders: compute, hull and domain). But I don't see a performance difference with the D3D10 version. So the management code is pretty 'lightweight'.

So it seems that D3D11 on Windows 7 can now outperform D3D9 on Windows 7, and get pretty much the same framerates as D3D9 does in XP.
This is with my Radeon 5770 and Catalyst 10.3 though. I'm not entirely sure where nVidia stands currently.
Posted on 2010-04-06 03:15:59 by Scali
On another note...


One major difference is that D3D9 handles each shader constant individually, where D3D10/11 allow you to update an entire struct of data to the shader in one go.
That means the handle management in D3D9 needs to somehow be automated to look more like D3D10/11 from the outside. I guess I have to create my own fake 'ID3DBuffer' object which stores the handles internally, so that it can update the struct with shader constants one member at a time.


I still haven't come up with a solution to this one... It reared its ugly head again in OpenGL, since OpenGL uses a system similar to D3D9 (which has considerably more overhead than the new D3D10/11 way)... and from what I've seen, OpenGL 4.0 doesn't solve this. Edit: Indeed, OpenGL 4.0 doesn't solve it, because it was already solved in OpenGL 3.1, by introducing ARB_uniform_buffer_object... which of course doesn't work on my Intel driver, blah.

The problem is basically that D3D9/OpenGL view each constant as an independent entity in the shader.
You query the compiled shader for each constant by its name.
So eg if you have this in your shader:
float4x4 myMatrix;
float3 myVector;

You would do something like this to access it (pseudocode):
Matrix myMatrix;
Vector myVector;
myMatrixHandle = GetConstantHandle("myMatrix");
SetConstantMatrix4x4(myMatrixHandle, myMatrix);
myVectorHandle = GetConstantHandle("myVector");
SetConstantFloat3(myVectorHandle, myVector);

So in order to set all the constants in a shader, you need to know all the names of the variables as they are defined in the shader source, and you need to know their types, in order to select the proper setter-function.

In D3D10/11 by contrast it works something like this...
In the shader you can define constant buffers which contain a number of constants, more or less like a struct:
cbuffer
{
float4x4 myMatrix;
float3 myVector;
}

Then in your application code, you do something like this:
struct MyStruct
{
Matrix myMatrix;
Vector myVector;
};

MyStruct* pMyStruct = (MyStruct*)ConstantBuffer->Map();
pMyStruct->myMatrix = myMatrix;
pMyStruct->myVector = myVector;
ConstantBuffer->Unmap();

Now how would one unify this elegantly? What I would like is an automated simulation of Map()/Unmap() for D3D9/OpenGL. Going the other way around would just make the code less efficient on D3D10/11 and you'd still have the same clumsy way of accessing shader variables that I never liked to begin with.
What I'd like to do is just fill the struct, then somehow iterate through its members automatically, and use their names and types to access the shader variables.
But I'm not sure how you can do that.
I suppose I will need to do something similar to vertex declarations... where you use some macro functions to describe how a structure is built up.
Then in D3D10/11 you can just ignore the extra data, and only use the struct itself. And in D3D9/OpenGL you can use the extra data to get info on the variable name as a string, and its type.
I think I can't access the members by name directly, in that way... but I'd have to use a byte-offset...
I can't really think of a way to 'connect' them though... as in having the struct definition built automatically from the extended description. That's not what you do with vertex declarations either though... but perhaps there is a clever way to do it. Perhaps some kind of template with recursive subclassing? Adding one new member variable at every iteration?
Posted on 2010-04-06 04:37:57 by Scali
I've made a first attempt at an automated struct-to-D3D9-constanttable setup, so that it would behave much like D3D10/11 do.
I took the input declaration syntax as an example, and made my own constant declaration stuff... Looks something like this:

struct PS_CONST_BUFFER
{
D3DXCOLOR ambient;
D3DXCOLOR diffuse;
D3DXCOLOR specular;
};

D3D_CONST_ELEMENT_DESC psDesc[] =
{
{ "ambient", 1, D3D_CONST_FORMAT_VECTOR, 0 },
{ "diffuse", 1, D3D_CONST_FORMAT_VECTOR, 16 },
{ "specular", 1, D3D_CONST_FORMAT_VECTOR, 32 },
};


The struct can now be shared between D3D9/10/11. D3D9 requires an additional description... I can just ignore that in D3D10/11, so I can use one codepath for all.
I then just create my own const buffer:

pPshConstBuffer = new ID3DConstBuffer( psDesc, _countof(psDesc), pPshConstTable, p3D->pD3DDevice );


This will initialize itself by allocating a struct internally, and getting the handles to all constants named in the declaration.
You then use it exactly like you would in D3D10/11:

PS_CONST_BUFFER* pPshConstData;
p3D->Map( pPshConstBuffer, D3D_MAP_WRITE_DISCARD, (void**)&pPshConstData);

// Set pixelshader constants
...

p3D->Unmap(pPshConstBuffer);


Map() will return a pointer to the internal structure.
Currently, I have the Unmap() function set the variables directly... but technically it shouldn't do anything yet, and there should be an additional call to set the constant buffers, like VSSetConstantBuffers() and PSSetConstantBuffers() in D3D10/11.

At any rate, it makes the handling of shaders much more elegant. I don't need to track the handles to each variable manually anymore, and I don't need to worry about the datatypes either, as the correct setter function will be called automatically. The code is now simpler and more elegant, at virtually no performance cost, and only slightly less flexible than the original way (I don't support the pointer-functionality anymore, or transposing matrices).
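To illustrate the mechanism described above (this is a minimal sketch, not the engine's actual implementation), the descriptor-driven buffer can be simulated in plain C++: Map() hands out an internally allocated struct, and Unmap() walks the declaration, locating each member by its byte offset and calling the setter selected by its format. The `Color`, `FakeConstantTable`, and `ConstBuffer` names here are hypothetical stand-ins for the D3DX types and the D3D9 constant table:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Stand-in for D3DXCOLOR (four floats).
struct Color { float r, g, b, a; };

enum ConstFormat { CONST_FORMAT_VECTOR };

// One entry per struct member, like the D3D_CONST_ELEMENT_DESC above.
struct ConstElementDesc
{
    const char* name;   // constant name as declared in the shader source
    ConstFormat format; // selects which setter function to call
    size_t offset;      // byte offset of the member within the struct
};

// Fake 'constant table' that just records the setter calls, so the
// mechanism can be demonstrated without a D3D device.
struct FakeConstantTable
{
    std::vector<std::pair<std::string, Color>> calls;
    void SetVector(const char* name, const Color& c) { calls.push_back({name, c}); }
};

class ConstBuffer
{
public:
    ConstBuffer(const ConstElementDesc* d, size_t n, size_t structSize, FakeConstantTable* t)
        : desc(d), count(n), storage(structSize), table(t) {}

    // Map() hands out the internally allocated struct, as in D3D10/11.
    void* Map() { return storage.data(); }

    // Unmap() walks the declaration and pushes every member to the
    // constant table, choosing the setter by the format field.
    void Unmap()
    {
        for (size_t i = 0; i < count; ++i)
        {
            const char* p = storage.data() + desc[i].offset;
            if (desc[i].format == CONST_FORMAT_VECTOR)
            {
                Color c;
                std::memcpy(&c, p, sizeof(c));
                table->SetVector(desc[i].name, c);
            }
            // ...matrix and other formats would be handled here
        }
    }

private:
    const ConstElementDesc* desc;
    size_t count;
    std::vector<char> storage;
    FakeConstantTable* table;
};

// The struct shared between APIs, plus its declaration; offsetof supplies
// the byte offsets instead of hand-written constants.
struct PS_CONST_BUFFER
{
    Color ambient;
    Color diffuse;
};

static const ConstElementDesc psDesc[] =
{
    { "ambient", CONST_FORMAT_VECTOR, offsetof(PS_CONST_BUFFER, ambient) },
    { "diffuse", CONST_FORMAT_VECTOR, offsetof(PS_CONST_BUFFER, diffuse) },
};
```

In the D3D10/11 path the same struct would be mapped into a real constant buffer and the descriptor array simply ignored, which is what allows one codepath for all APIs.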
Posted on 2010-04-09 04:08:31 by Scali
So, I was wondering what the nVidia drivers would do now (with the 197.13 drivers), seeing that the Radeon drivers got quite a boost in Windows 7 recently...
The biggest shock came when I ran the D3D9 binary in XP x64:


I thought that nVidia's 8000 fps was about as good as it gets (with the 196.21 drivers), and I was happy to see my Radeon getting reasonably close with its ~7200 fps... But nVidia apparently found a bunch of optimizations as well, and is now nearing the 9000 fps mark.
Not to mention that this is on an old Pentium 4 HT machine, at 3.8 GHz, with a GeForce 9800GTX (yes, the original 65 nm one, not the GTX+, which is a 55 nm shrink). In theory both my Core2 Duo at 3 GHz and my Radeon 5770 (which is 40 nm) should be quite a bit faster.

Windows 7 was not quite as much of a success though... It only got around 3000 fps there in D3D9 and D3D11. D3D10 was faster, around the 3700 fps mark... but a far cry from the figures in XP x64.
But I suppose this is where the age of the Pentium 4 shows (or it could be that this is a Windows 7 RC installation, not the RTM).
Posted on 2010-04-11 12:21:16 by Scali
I'm playing around with some geometry shaders now.
The thing with using normal maps is that you need the tangent-space matrix at every vertex (at least for skinned objects... for static objects you could create normal maps in object space, but that means you have to use separate maps for all objects; tangent space is more convenient).
This means that you need to have this stored in the geometry... or do a precalc of tangent space while loading objects.

So I had this idea... with a geometry shader I can access all three vertices of a triangle at the same time, and then calculate the tangent-space matrix on-the-fly.
I could do this after skinning is performed, so I don't have to 'skin' the tangent-space matrix. Might be more efficient. It would also mean I have to store less geometry data in video memory.
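For reference, the per-triangle computation such a geometry shader would perform can be sketched in plain C++ (the standard tangent-from-UV-gradients formula; `Vec3`, `Vec2` and `triangleTangent` are made-up names, and a real shader would also build the bitangent and orthonormalize against the vertex normal):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
struct Vec2 { float u, v; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

// Per-triangle tangent from the three (post-skinning) vertex positions and
// their texture coordinates: solve the UV-gradient equations for the vector
// that follows the direction of increasing u across the triangle.
Vec3 triangleTangent(const Vec3 p[3], const Vec2 t[3])
{
    Vec3 e1 = sub(p[1], p[0]);
    Vec3 e2 = sub(p[2], p[0]);
    float du1 = t[1].u - t[0].u, dv1 = t[1].v - t[0].v;
    float du2 = t[2].u - t[0].u, dv2 = t[2].v - t[0].v;
    float r = 1.0f / (du1 * dv2 - dv1 * du2); // assumes non-degenerate UVs
    Vec3 tan = { r * (dv2 * e1.x - dv1 * e2.x),
                 r * (dv2 * e1.y - dv1 * e2.y),
                 r * (dv2 * e1.z - dv1 * e2.z) };
    float len = std::sqrt(tan.x * tan.x + tan.y * tan.y + tan.z * tan.z);
    return { tan.x / len, tan.y / len, tan.z / len };
}
```

Since the geometry shader sees the whole primitive, this runs once per triangle after skinning, instead of being stored per vertex or skinned separately.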

It's the first time I'm actually using geometry shaders in this engine. I had added support from the moment I started converting the D3D9 code to D3D10, just adding geometry shader code for all vertex/pixelshader code that I was converting to D3D10. But I had never actually tried to compile and use a geometry shader to see if it works.
Well, it worked on the first try, so that's good :)
Hopefully the same goes for the other shaders that I've added, but never tested so far (compute shader, hull shader, domain shader).
Posted on 2010-05-16 05:22:39 by Scali
Lol, I knew geometry shaders could cause a performance hit... but this Intel IGP is ridiculous... It's dropped from 100-ish fps to 5 fps. Either the hardware support for geometry shaders is really poor, or the whole vertex processing pipeline is forced to software emulation. Oh well :)
Posted on 2010-05-16 09:40:08 by Scali
I think it's time to compile a new todo-list, to get some kind of focus back in this project...
Problem is currently that I'm torn between D3D and OpenGL. Although my personal preference lies with D3D, the recent update to OpenGL 4.0 made the difference between the two a lot smaller again. Add OpenCL to that, and the differences become very small.
OpenGL has the platform independence going for it... which I am normally not interested in, but I have some plans which might make a Mac-version desirable. These plans are still in their infancy, so I won't disclose anything just yet.

At any rate, some of the things I'd like to work on:
- Finish the routine that calculates tangent space in the geometry shader stage.
- Implement state-of-the-art shadow mapping (the current shadow mapping is still based on the ancient hack that I made for my GF2, using the alpha channel of a texture as a pseudo depth texture... Yes, it's only 8 bits of precision, but it actually works remarkably well. Various Nintendo games also use this trick. I never bothered to update the code).
- Write a new exporter for a more up-to-date modeler (the current BHM exporter is for 3dsmax6).
- Work on the BigMesh code, possibly turning it into a library, which might be used by the new exporter.
- Do a remake of the Croissant 9 demo, making use of the new engine, with state-of-the-art lighting/shading/shadowing.
- Continue work on the code I started with the Prosaic demo, using spectrum analysis over the audio to drive various effect parameters in realtime.
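As a rough illustration of that last item (a sketch only; a real implementation would use an FFT, windowing and temporal smoothing, and all names here are hypothetical), driving an effect parameter from a frequency band could look like this:

```cpp
#include <cmath>
#include <vector>

// Magnitude of one frequency bin over a block of samples, via a naive DFT
// (O(n) per bin; fine for a handful of bins, an FFT is better for many).
float binMagnitude(const std::vector<float>& samples, int bin)
{
    const float pi = 3.14159265358979f;
    float re = 0.0f, im = 0.0f;
    int n = (int)samples.size();
    for (int i = 0; i < n; ++i)
    {
        float phase = 2.0f * pi * bin * i / n;
        re += samples[i] * std::cos(phase);
        im -= samples[i] * std::sin(phase);
    }
    return std::sqrt(re * re + im * im) / n;
}

// Sum a band's bin magnitudes and clamp to [0,1], so the result can drive
// an effect parameter (e.g. bloom strength or camera shake) directly.
float effectParam(const std::vector<float>& samples, int loBin, int hiBin)
{
    float e = 0.0f;
    for (int b = loBin; b <= hiBin; ++b)
        e += binMagnitude(samples, b);
    return e > 1.0f ? 1.0f : e;
}
```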
Posted on 2010-06-07 06:40:15 by Scali
Hum, new June 2010 DirectX SDK is out: http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=3021d52b-514e-41d3-ad02-438a3ba730ba

One thing that caught my eye: they are dropping VS2005 support. That had a bit of a psychological impact on me... I'm still supporting VS2005 in my opensource projects. And theoretically I still have some pre-DX9 code in VS2005.
But then I had to think rationally again... VS2005 is 5 years old... perhaps it's time to stop supporting it, and support only VS2008 and VS2010 instead.
Aside from that, it really doesn't impact me. Most of the pre-DX9 code is completely irrelevant, as it had been assimilated in the DX9 codebase anyway. The DX9 codebase was then assimilated into the new DX10/11 codebase, and that all works with VS2008. I haven't used VS2005 for DX9 code in quite a while.
As for the opensource projects, none of them use DX in the first place, so they aren't affected.

So I guess I can just install this SDK. I just have to think about moving to VS2010 before VS2008 support is dropped... but I probably still have about 3 years for that. I suppose my opensource projects could move to VS2010 Express, as they don't need MFC... except for the simple example program for CPUInfo, but I can rewrite that with .NET WinForms or something.
Posted on 2010-06-14 04:26:20 by Scali
I am lucky enough to have access to all MS software for free. VS2010 IDE is much cleaner and faster and a bit better organized than 2008 (esp. VC#), but aside from that I didn't notice any big differences. Still, I recommend you to switch to 2010 if you can - it's somewhat nice to use and will be supported for at least 5 years from now ^^

And thanks for the hint about DXSDK!
Posted on 2010-06-14 16:46:34 by ti_mo_n
Yeah, I know... I use VS2010 at work. For large C# projects it's much faster and more stable than VS2008. I'll have to ask if it's okay to use a home copy as well. We all have an MSDN account, but I'm not sure about the exact licensing terms.
I've tried VS2010 Express at home. Works okay, but I have the problem that I rely on some MFC code, which Express doesn't support. It doesn't support x64 either, which would be a shame.
I suppose theoretically I could convert my MFC code to .NET WinForms. That way it will be 'VS Express'-compliant. I suppose x64 support for VS Express will be added at some point... Perhaps with Windows 8, where there will not be a 32-bit version of the OS, I believe.

Edit: Btw, I am having trouble getting the June 2010 DX SDK installed on my laptop... it has a Dutch 32-bit version of Vista. Perhaps it has something to do with that. It aborts halfway, without any useful error message... it just says I have to try reinstalling later.
On my Win7 x64 machine it installed just fine.
Posted on 2010-06-15 02:55:50 by Scali