It was most certainly an issue in the DX11 component of the demo. I didn't take the time to check where the loop was occurring, but I am willing to do so.


That could be useful, thanks.
Also, could you check if the 20100203 version also has the problem (or if it now gives an error message)?
I haven't been able to reproduce the problem on the Vista machine since installing D3D11. Now it works correctly even if I rename the d3d11.dll to something weird. Then again, I had to run a LOT of updates to get D3D11 installed. The machine was still running Vista SP1, and hadn't seen an update since November 2008. I had to install about 60 updates before I was even allowed to install SP2, after which I could FINALLY install the Vista Platform Update which contains D3D11. So it could have been any one of those updates that fixed the problem on that PC.
Posted on 2010-02-04 02:39:30 by Scali
"Don't use x87, use SSE2, as context switches don't preserve the full x87 state".
I always thought it was more severe than just saving slightly esoteric exception-related registers, hmm.
Posted on 2010-02-04 02:53:11 by f0dder

"Don't use x87, use SSE2, as context switches don't preserve the full x87 state".
I always thought it was more severe than just saving slightly esoteric exception-related registers, hmm.

MSDN to the rescue:
"The x87, MMX, and 3DNow! instruction sets are deprecated in 64-bit modes. The instructions sets are still present for backward compatibility for 32-bit mode; however, to avoid compatibility issues in the future, their use in current and future projects is discouraged."
In other words:
"Early reports claimed that the operating system scheduler would not save and restore the x87 FPU machine state across thread context switches. Observed behavior shows that this is not the case: the x87 state is saved and restored, except for kernel-mode-only threads (a limitation that exists in the 32-bit version as well). The most recent documentation available from Microsoft states that the x87/MMX/3DNow! instructions may be used in long mode, but that they are deprecated and may cause compatibility problems in the future."

So I suppose it works now, but don't count on it.
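
As an aside (a minimal sketch of my own, not anything from the engine): if you do take the SSE2 route on 32-bit Windows, the runtime check is a one-liner with IsProcessorFeaturePresent:

#include <windows.h>
#include <stdio.h>

int main()
{
    // PF_XMMI_INSTRUCTIONS_AVAILABLE = SSE, PF_XMMI64_INSTRUCTIONS_AVAILABLE = SSE2
    BOOL sse  = IsProcessorFeaturePresent(PF_XMMI_INSTRUCTIONS_AVAILABLE);
    BOOL sse2 = IsProcessorFeaturePresent(PF_XMMI64_INSTRUCTIONS_AVAILABLE);
    printf("SSE: %d, SSE2: %d\n", sse, sse2);
    return 0;
}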
Posted on 2010-02-04 03:08:58 by Scali
OMG, last night I made a new build, trying to tackle a few problems related to display format and such (it now automatically tries a few common formats, rather than just bruteforcing R8G8B8A8):
http://bohemiq.scali.eu.org/Engine20100204.zip
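
To give an idea of what "tries a few common formats" means in practice, here is a rough sketch of the D3D9 flavour of such a check. This is not the engine's actual code (the engine also has a DXGI path); pD3D is assumed to be an existing IDirect3D9 interface:

// Walk a small list of common display formats and take the first one the adapter accepts.
D3DFORMAT candidates[] = { D3DFMT_X8R8G8B8, D3DFMT_R5G6B5, D3DFMT_X1R5G5B5 };
D3DFORMAT chosen = D3DFMT_UNKNOWN;

for (int i = 0; i < 3; i++)
{
    if (SUCCEEDED(pD3D->CheckDeviceType(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL,
                                        candidates[i], candidates[i], FALSE)))
    {
        chosen = candidates[i];
        break;
    }
}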

And guess what... It actually WORKED on the Intel Q35:
[screenshot: the D3D11 engine running on the Intel Q35]
It actually got pretty decent framerates too, compared to my X3100.
So there you have it, my D3D11 engine running on an Intel Q35 integrated graphics chip, with only pixelshader 2.0 support and no vertexshaders in hardware. I don't think you can get any lower than that, hardware-wise :)
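
For anyone wondering how D3D11 runs on SM2.0-class hardware in the first place: that is what the downlevel feature levels are for. A minimal sketch of device creation with a 9_x fallback (not the engine's actual init code):

#include <d3d11.h>

D3D_FEATURE_LEVEL levels[] =
{
    D3D_FEATURE_LEVEL_11_0, D3D_FEATURE_LEVEL_10_1, D3D_FEATURE_LEVEL_10_0,
    D3D_FEATURE_LEVEL_9_3,  D3D_FEATURE_LEVEL_9_2,  D3D_FEATURE_LEVEL_9_1   // 9_1/9_2 are roughly SM2.0-class
};

ID3D11Device* device = NULL;
ID3D11DeviceContext* context = NULL;
D3D_FEATURE_LEVEL obtained;

// Ask for the best feature level the hardware and driver can give us
HRESULT hr = D3D11CreateDevice(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, 0,
                               levels, 6, D3D11_SDK_VERSION,
                               &device, &obtained, &context);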

I also tried to re-enable the software vertexprocessing in my D3D9 engine, but without success. The D3D9 engine still doesn't display the claw. I'll have to debug that to see what is actually happening.

Edit: Found the problem with the D3D9 swvp... I accidentally have a '==' where I meant '=', so the proper software vertexshader profile is never passed to the shader compiler. D'oh... so close, yet so far away, since I can't fix that problem in the code right now. I'll have to fix it later, and then I won't be able to test it on that machine until Monday. Oh well. At least the D3D11 code worked, which is actually far cooler anyway :)
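
For the curious, the bug is of this general shape (hypothetical names, not the actual engine code):

// The software profile never gets selected, because '==' compares and discards the result:
const char* vsProfile = "vs_3_0";

if (!hardwareVertexShaders)          // hypothetical flag
    vsProfile == "vs_3_sw";          // bug: should be   vsProfile = "vs_3_sw";

// The compiler is then always handed the hardware profile,
// so the software vertexprocessing path never gets a matching shader.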
Posted on 2010-02-05 02:24:17 by Scali
I've fixed the software vertexprocessing bug, and did some other small fine-tuning of the code:
http://bohemiq.scali.eu.org/Engine20100205.zip

While testing this code in software mode on my laptop, I noticed the secret behind the Q35's performance. The Q35 doesn't have hardware vertexprocessing, so it will always default to software. When forcing my X3100 to software processing, I got about 315 fps out of it.
So it seems that while my X3100 has REAL vertexprocessing in hardware (some hardware used to report hardware vp for compatibility reasons, but would still use a CPU path internally), the hardware is so low-end that it's actually slower than software vp.
Not too surprising in retrospect. After all, the X3100 has only 8 unified shaders. With software vp, that means it has 8 dedicated pixel pipelines. With hardware vp, it has to share the 8 shader pipes between vertex and pixel processing. The concept of unified shaders is a good one, but it rests on the assumption that a significant portion of the pipes are idle, so that they can be re-used for other tasks. That works fine when you have dozens of pipelines, but when you have only 8, they are never idle.

Perhaps I should add an option to let the user force software vp, if that suits their hardware better.
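
In D3D9 that would just be a matter of which behaviour flag goes into CreateDevice; a sketch of such an option (hypothetical names, not the engine's actual code; pD3D, hWnd and presentParams assumed to exist):

// userVpMode: hypothetical user setting (0 = hardware vp, 1 = mixed, 2 = force software vp)
DWORD behaviour;
switch (userVpMode)
{
case 2:  behaviour = D3DCREATE_SOFTWARE_VERTEXPROCESSING; break;
case 1:  behaviour = D3DCREATE_MIXED_VERTEXPROCESSING;    break;
default: behaviour = D3DCREATE_HARDWARE_VERTEXPROCESSING; break;
}

IDirect3DDevice9* pDevice = NULL;
HRESULT hr = pD3D->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hWnd,
                                behaviour, &presentParams, &pDevice);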
Posted on 2010-02-05 13:01:44 by Scali
I thought the link was supposed to have something to do with how many SSE/SSE2 machines were on the market.


Oh, but why didn't I think of that earlier?
Steam survey can tell us: http://store.steampowered.com/hwsurvey
Apparently 98% of the machines have SSE2.
Posted on 2010-02-06 04:02:42 by Scali
Stats of the gaming machines of gamers who bother with Steam.
Posted on 2010-02-06 04:09:19 by Ultrano

Hi,
Three suggestions, if I may:
1) The 'Adapter' label should say 'display' since it shows all display options.
2) The adapter combobox should say "display name @ adapter name" not the other way around, since we are selecting displays, not adapters.


These were good suggestions, pointing out that the concept of adapter and display is generally seen differently from how I originally intended it in my dialog design. I wasn't trying to ignore them, but I didn't get around to implementing them until now.
Currently it looks like this:
[screenshot: the updated settings dialog]
I will have to make the dialog a bit larger too, as apparently the display names can be very long and verbose.
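
For reference, the "display @ adapter" pairing itself falls straight out of DXGI. A rough sketch (the long, friendly monitor names probably have to come from elsewhere, e.g. GDI, and the actual dialog code obviously does more than this):

#include <dxgi.h>
#include <stdio.h>

void ListDisplays()
{
    IDXGIFactory* factory = NULL;
    CreateDXGIFactory(__uuidof(IDXGIFactory), (void**)&factory);

    IDXGIAdapter* adapter = NULL;
    for (UINT a = 0; factory->EnumAdapters(a, &adapter) != DXGI_ERROR_NOT_FOUND; a++)
    {
        DXGI_ADAPTER_DESC ad;
        adapter->GetDesc(&ad);                  // ad.Description = adapter name

        IDXGIOutput* output = NULL;
        for (UINT o = 0; adapter->EnumOutputs(o, &output) != DXGI_ERROR_NOT_FOUND; o++)
        {
            DXGI_OUTPUT_DESC od;
            output->GetDesc(&od);               // od.DeviceName = display, e.g. \\.\DISPLAY1
            wprintf(L"%s @ %s\n", od.DeviceName, ad.Description);
            output->Release();
        }
        adapter->Release();
    }
    factory->Release();
}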
Posted on 2010-02-06 06:28:52 by Scali
I've fixed the software vertexprocessing bug


After running the code on the Q35-equipped PC again today, I have to conclude that this was a bit premature. It still doesn't render the claw.
I wonder what the exact problem is. I tried to simulate a vertexshader-less card on my own hardware, so it would be forced to take the same path... Either I didn't do that 100% correctly, or there is a deeper issue.
Posted on 2010-02-08 11:05:32 by Scali
It's probably something simple, it usually is.
Go back over microsoft's demo source and check yours against it (for animation/rendering stuff).
It took me 6 years to make software rendering work, and another year to get the other three methods working, including shader based animation as the final stage.
Posted on 2010-02-09 00:57:06 by Homer

It's probably something simple, it usually is.
Go back over microsoft's demo source and check yours against it (for animation/rendering stuff).
It took me 6 years to make software rendering work, and another year to get the other three methods working, including shader based animation as the final stage.


Well, I have my own mechanism for automatically falling back to the software pipeline in D3D9 when trying to run a shader that is compiled for a greater version than the hardware supports.
By default this will fall back to software on hardware that has no vertex shader capabilities at all. I wrote it way back in the days when I still had my trusty GF2. Back then it worked (fixedfunction stuff would go with hardware T&L, shaders would run in software), but now the fallback doesn't seem to kick in on the Q35. At least, it renders the animation, except for the claw.
It's difficult to figure out what exactly goes wrong without being able to actually step through the code with a debugger (or getting feedback from the D3D9 debug runtime).
When I simulate it on my own hardware, it works (I can just set vs_3_sw as the vertex shader profile when compiling; that yields a version greater than vs_3_0, which kicks in the software pipeline).
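The check behind that fallback is basically a version comparison; a simplified sketch with made-up names (not the actual engine code; pD3D assumed to exist):

// Compare the vertex shader version the hardware reports against the version a shader needs.
D3DCAPS9 caps;
pD3D->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps);

DWORD requiredVS = D3DVS_VERSION(3, 0);                         // version this particular shader targets
BOOL useSoftwareVP = (caps.VertexShaderVersion < requiredVS);   // e.g. D3DVS_VERSION(0, 0) on a Q35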

Edit: I have run capsviewer on the machine again, and noticed that it doesn't have ANY hardware vertex processing at all, not fixedfunction either (unlike my GF2, for example). I have only tested with a D3D device created in mixed processing mode, allowing me to switch between software and hardware at runtime. But on the Q35 I cannot create a mixed device in the first place; it will fall back to software directly. I will have to try forcing a complete software vertexprocessing device on my own hardware, and see what happens then. It may have slightly different behaviour from a mixed vertexprocessing device in software mode, which is what I've tested on my hardware.
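In API terms the difference is this (rough sketch, not the engine code; pDevice assumed to exist):

// On a device created with D3DCREATE_MIXED_VERTEXPROCESSING the pipeline can be flipped at runtime:
pDevice->SetSoftwareVertexProcessing(TRUE);    // subsequent draws use software vp
// ... draw skinned geometry ...
pDevice->SetSoftwareVertexProcessing(FALSE);   // back to hardware vp

// A device created with D3DCREATE_SOFTWARE_VERTEXPROCESSING (all the Q35 will give me)
// is always in software mode, so the toggle above simply does not apply there.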
Posted on 2010-02-09 03:28:13 by Scali

I will have to try forcing a complete software vertexprocessing device on my own hardware, and see what happens then. It may have slightly different behaviour from a mixed vertexprocessing device in software mode, which is what I've tested on my hardware.


Well, I've tried with a software vertexprocessing device, but it didn't behave differently on my hardware. I've tried with maximum D3D9 debugging level, and it didn't report anything weird, and it all just rendered fine. I have also tried refrast, no problem.
So the problem may not be related to software vertexprocessing directly, but some Q35-specific detail. I'll just give up on it for now.
If anyone has a Q35 system or similar chip that doesn't render the claw in D3D9 mode, and is willing to assist with debugging, let me know.
Posted on 2010-02-09 15:38:42 by Scali
You could add a little snippet to your boneframe render function and make sure your claw frame is actually being reached?
Microsoft democode assumes that there's only one MeshContainer in the frame hierarchy.
This is almost never the case.
Posted on 2010-02-09 23:35:47 by Homer

You could add a little snippet to your boneframe render function and make sure your claw frame is actually being reached?
Microsoft democode assumes that there's only one MeshContainer in the frame hierarchy.
This is almost never the case.


I don't use Microsoft's mesh code. All storage, animation and rendering code is my own. The code itself should be fine: it runs in D3D9, D3D10 and D3D11 on various hardware, there are no error messages or warnings even with maximum validation in the debug runtime, and it works in refrast. So the problem is not in the mesh code or my shader code itself.
My theory currently is that I have been approaching it from the wrong angle. Yes, there was a typo in the software vp fallback path, but that was not what plagued the Q35. I now think that perhaps the vertexshader is okay, but there is a problem at the pixelshading stage. I may have set some texture states that are not supported. I will have to verify those. The texture states for the non-skinned parts of the scene work, so I'll have to see if they are any different from the skinned material.
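
One quick way to test that theory would be ValidateDevice; just the general idea, this is not in the engine (yet), and pDevice is assumed to exist:

// After binding the textures, sampler/texture stage states and shaders for the skinned material:
DWORD numPasses = 0;
HRESULT hr = pDevice->ValidateDevice(&numPasses);
if (FAILED(hr))
{
    // e.g. D3DERR_UNSUPPORTEDTEXTUREFILTER or D3DERR_CONFLICTINGTEXTUREFILTER:
    // the current state combination is not supported by this hardware.
}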
Posted on 2010-02-10 02:57:19 by Scali
Okay, I installed the debug D3D9 runtime on the Q35 machine.
It gave me some useful info... or maybe not?
What it said was something like this: "The output of the current vertex shader cannot be used, because it cannot be mapped to a valid FVF".

Now, I'm not sure what it's trying to tell me...
I've found this with Google... It seems to be somewhat related:
http://doc.51windows.net/Directx9_SDK/?url=/Directx9_SDK/graphics/programmingguide/gettingstarted/vertexdeclaration/vertexdeclaration.htm
DirectX 9.0 Drivers without Pixel Shader Version 3 Support
The input declaration must be translatable to a valid FVF (have the same order of vertex elements and their data types).
Gaps in texture coordinates are allowed.


Since this machine has only SM2.0, I suppose this applies... Then I wonder, though: why did it work on my Radeon 9600? Perhaps because the DRIVER understands PS3.0, even though my specific hardware doesn't? It may not enforce the restriction, and the hardware may not have required the restriction in the first place.

But I suppose if I rewrite the output of the vertexshader and the input of the pixelshader, it MAY fix the problem.
Posted on 2010-02-11 03:55:59 by Scali
Victory at last!
There was indeed some problem with the mapping of the vertex shader output to the pixel shader input (apparently only on certain SM2.0 devices).
Since I knew that the non-skinned pixelshader worked, I hacked the skinned shader to output its data in the same format.
Now it worked:
[screenshot: the claw now rendering on the Q35]
Looks like you have to be EXTRA careful when writing shaders for SM2.0 hardware. The funny thing is that the exact same shaders DID work in D3D11 mode on the same machine. Weird, I wonder if it is a driver bug... The hardware can handle the shaders, apparently... Also, I am not quite sure why the original shaders wouldn't map to an FVF anyway. It was just a 4d position and 5 sets of texture coordinates. I wonder if it was something as simple as the variable names I used, which weren't recognized.

Edit: Yup, the driver was right, technically. There was a 'relic of the past' in my code. Namely, I used to pack certain vectors into the COLOR0 or COLOR1 registers. On old hardware, these were integer types, whereas texcoords were float types. With SM1.x hardware you only had 4 sets of coordinates, so with this trick you could use two extra per-pixel interpolators (where you have to be careful that colours are not necessarily interpolated in a perspective-correct fashion on all hardware, unlike texcoords).
The problem however was that I defined it as a float3. A color type is ARGB, so it would expect float4 instead (or technically uint4).
On SM2.0 you have 8 sets of texcoords, so I just replaced COLOR0 with TEXCOORD0, moved the other 4 sets of texcoords up one index, and it worked.
Apparently SM3.0 hardware doesn't care, because everything is float/texcoord anyway, including COLOR0 and COLOR1. The same goes for my Radeon 9600.
So, that's that mystery solved.
Posted on 2010-02-11 04:26:20 by Scali
The claw doesn't render on my laptop either; it's an Intel 945 system running Win XP SP3.
Posted on 2010-02-11 04:27:55 by Azura
Just curious, why the hell did you need 5 sets of texture coords?
Posted on 2010-02-11 04:35:04 by Homer

The claw doesn't render on my laptop either; it's an Intel 945 system running Win XP SP3.


Probably the same problem.
Here's a quick fix:
Go to the Data directory, and open hlsl_shader.vsh.
Look for the following structure:
struct VS_OUTPUT
{
    float4 Position : SV_Position;  // vertex position
    float3 Diffuse : COLOR0;        // diffuse lightvector
    float3 Normal : NORMAL;         // vertex normal
    float3 Specular : TEXCOORD0;    // specular lightvector
    float3 Distance : TEXCOORD1;    // distance vector to light in worldspace, for attenuation
    float Attenuate : TEXCOORD2;    // per-vertex distance scalar
    float2 Texcrd : TEXCOORD3;      // Texture coordinates
};


Now I think two possible solutions will work, but I only tried the first:
1) Replace COLOR0 with TEXCOORD0, and increase all the following TEXCOORDn by one (so TEXCOORD0 -> TEXCOORD1, etc).
Then open hlsl_shader.psh and do the same.
Or:
2) Replace 'float3 Diffuse' with 'float4 Diffuse'. Then open hlsl_shader.psh and replace 'half3 Diffuse' with 'half4 Diffuse'.
Posted on 2010-02-11 04:46:34 by Scali

Just curious, why the hell did you need 5 sets of texture coords?


Shader code is included in the Data directory (see above).
I think the output struct pretty much answers the question already.
I need a per-pixel interpolated normal, a per-pixel interpolated specular lightvector, a per-pixel distance from eye to the current position, a per-pixel attenuation factor for the lightsource, and finally the actual texture coordinate.
So it's basically a per-pixel local lighting system (point light), based on the Blinn-Phong equation.
Posted on 2010-02-11 04:49:34 by Scali