I was looking for some info on my Intel IGP, when I stumbled across this suite from Intel:
http://software.intel.com/en-us/articles/intel-gpa/

It's a very powerful tool, and it's free too! What's even nicer: for most of the functionality, you don't even need an Intel IGP. I installed it on my PC with a Radeon 5770, and the Direct3D API call and shader analysis functionality appeared to work just fine.

For people familiar with Microsoft's PIX, this is a very similar tool, except that I think it has a much nicer interface and more functionality. I don't think PIX can show you how much time is spent in the VS and PS respectively; I have never found that option, anyway. PIX seems to concentrate mainly on the API calls themselves.

So anyway, it's a very nice tool if you want to get some info on the basic performance characteristics of your Direct3D code:
- It can tell you which calls were made during a frame
- It can tell you how many state changes were made
- It can visualize which call drew what on screen
- It can tell you how many times a certain pixel is drawn, and by which calls
- It can simulate the performance you would get with 2x2 textures, or with a simple pixel shader
- It can show you the shaders used, in HLSL and as their compiled assembly listing
- It can show you the textures used in the frame
etc.

I found some interesting things... For example, my D3D9 code made 82 state changes per frame. The same code compiled for D3D10.1 made 21 state changes per frame. That's the more efficient D3D10 driver model and API for you.
I also found that the reason why the D3D10 code is so much slower than the D3D9 code on my Intel IGP seems to lie in the driver itself.
Namely, the VS load is considerably higher in D3D10 mode, while the compiled assembly listing for the shader is virtually identical (except that D3D10 uses slightly different instructions here and there, e.g. it can do an integer shift (ishl) where D3D9 only has mul).
It just looks like the performance is lost between the D3D10 compiled shader and the driver compiling it to IGP-native instructions. I guess the D3D9 driver's compiler optimizes far better (I saw a similar thing with the GLSL compiler, whose output was way slower than my manually assembled code).
In theory I could install an old D3D9-only driver on my Intel IGP, and then run D3D11 in downlevel mode. Then it should use the D3D9 driver and compiler, and that may deliver better performance than the D3D10 driver.
Posted on 2010-05-14 09:28:43 by Scali
About the D3D9 state changes: we ended up implementing our own state manager layer, which vastly reduced the number of state changes, and was even better than D3D9's 'stateblock' switching scheme.
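A minimal sketch of what such a state manager layer could look like. This is not the actual implementation: FakeDevice and StateCache are made-up names, and FakeDevice merely stands in for the real D3D9 device by counting the calls that would reach the driver.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the real device: it only counts the calls
// that would actually be passed on to the driver.
struct FakeDevice {
    int calls = 0;
    void SetRenderState(std::uint32_t /*state*/, std::uint32_t /*value*/) { ++calls; }
};

// Remembers the last value set for each state and forwards a call to the
// device only when the value really changes.
class StateCache {
public:
    explicit StateCache(FakeDevice& device) : device_(device) {}

    // Returns true if the call was forwarded, false if it was redundant.
    bool Set(std::uint32_t state, std::uint32_t value) {
        auto it = current_.find(state);
        if (it != current_.end() && it->second == value) {
            return false;  // same value as last time: skip the device call
        }
        current_[state] = value;
        device_.SetRenderState(state, value);
        return true;
    }

private:
    FakeDevice& device_;
    std::unordered_map<std::uint32_t, std::uint32_t> current_;
};
```

Setting the same state to the same value twice in a row then costs one device call instead of two. A real layer would also have to cover textures, shaders, sampler states and so on.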
Posted on 2010-05-14 11:08:22 by Homer

Obviously I did the same thing (I'm not exactly a beginner)... most of the remaining state changes go into setting shader constants.
In D3D9 you need one call per constant. D3D10+ lets you update all of a shader's constants in one call, via a constant buffer. That's where the biggest gain is.
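A rough sketch of that difference, using a mock device rather than the real API. FakeDevice, SetConstant, UpdateBuffer and FrameConstants are all made-up names: SetConstant models the D3D9-style "one call per constant" path, UpdateBuffer models a D3D10-style update of a whole block of constants.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

struct Float4 { float x, y, z, w; };

// Hypothetical device: stores the constants and counts the API calls.
struct FakeDevice {
    int calls = 0;
    std::vector<Float4> registers = std::vector<Float4>(16);

    // D3D9-style model: one call writes one float4 constant.
    void SetConstant(std::size_t reg, const Float4& value) {
        ++calls;
        registers[reg] = value;
    }

    // D3D10-style model: one call copies a whole block of constants.
    void UpdateBuffer(const void* data, std::size_t bytes) {
        ++calls;
        std::memcpy(registers.data(), data, bytes);
    }
};

// All constants for one shader grouped in a single block, laid out in
// register order (assumed layout, for illustration only).
struct FrameConstants {
    Float4 worldViewProj[4];
    Float4 lightDir;
    Float4 color;
};
static_assert(sizeof(FrameConstants) == 6 * sizeof(Float4), "unexpected padding");

// One device call per constant: 6 calls for this layout.
inline void UploadPerConstant(FakeDevice& dev, const FrameConstants& c) {
    for (std::size_t i = 0; i < 4; ++i) {
        dev.SetConstant(i, c.worldViewProj[i]);
    }
    dev.SetConstant(4, c.lightDir);
    dev.SetConstant(5, c.color);
}

// The same data in a single device call.
inline void UploadAsBlock(FakeDevice& dev, const FrameConstants& c) {
    dev.UpdateBuffer(&c, sizeof(c));
}
```

Both paths leave the same data in the registers; only the number of calls differs, and that is what shows up as state changes in the per-frame count.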
Posted on 2010-05-14 12:56:44 by Scali
Looks like a nice tool. Could you please compare it to nvidia's PerfHUD?
Posted on 2010-05-15 19:44:58 by ti_mo_n

I don't have an nVidia card.
I think it's safe to assume that nVidia's tools are best though... If you can use them, use them.
But the point of this is that it works for non-nVidia hardware.
Posted on 2010-05-16 05:07:50 by Scali
For OpenGL, see GeDebugger: similar functionality (tracing, timing, warning log etc.), but for OpenGL.
Posted on 2012-09-01 06:22:26 by Homer

That's gDEBugger: http://www.gremedy.com/
And yes, they made it free a while ago, making it quite an interesting tool.
Previously you could only evaluate it for a limited period before you had to buy it.
Posted on 2012-09-01 06:43:24 by Scali
Very useful. I've used it to find bugs in some of my code that I thought was perfect, so that's a heads-up.
Posted on 2012-09-03 06:59:01 by Homer

Always remember: working code is not bug-free code.
It's the most important thing any programmer can ever learn :)
Posted on 2012-09-03 08:24:15 by Scali