I've given GLM a quick look, but it looks like it's less useful than cml...
With cml there's the option to specify 'external storage', which is very useful when you extract the data from a file, or from an OpenGL call or such.
You can just wrap a cml object over the raw data that way.
It looks like GLM shares cml's main disadvantage, in that it is completely template-based and doesn't directly interface with external storage... but unlike cml, it doesn't seem to offer a workaround for that.
I think at this point I will try to expand my current datatypes to use cml with the external storage functionality. That way I should easily be able to mimic the 'ease of use' of D3DX in OpenGL. I'll just 'wrap' and 'unwrap' cml stuff silently, hidden from the user. The user can then just cast an array of floats directly to a math object, and do simple operations on it with a basic procedural API.
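To illustrate the idea, here's a minimal sketch of what that wrapping looks like, assuming cml 1.x's external<> storage selector (the exact type names are from memory, so check the cml docs):

#include <cml/cml.h>

// hypothetical typedefs: one vector type that owns its storage, one that wraps external floats
typedef cml::vector< float, cml::fixed<3> >    vector3f;
typedef cml::vector< float, cml::external<3> > vector3f_e;

float lightIntensity(float* rawNormal, const vector3f& lightDir)
{
    vector3f_e n(rawNormal);        // wraps the raw floats in place, no copy made
    return cml::dot(n, lightDir);   // and can then be used like any other cml vector
}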
Perhaps at a later time it can be expanded into a generic standalone library such as D3DX... perhaps even in co-operation with Homer, using assembly-optimized inner workings.
I've picked up the code again tonight, after a short hiatus.
I've now implemented the basic keyframe animation.
The skeleton is animated correctly, so I have all the bone matrices calculated.
The last step is to take the bone matrices and feed them to a vertex shader to perform the actual skinning.
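The GLSL side of that last step is just an array of mat4 uniforms; the upload boils down to something like this (a sketch, with a hypothetical uniform name and a hypothetical boneMatrices array):

// the vertex shader would declare e.g.:  uniform mat4 boneMatrix[50];
glUseProgram(program);
GLint loc = glGetUniformLocation(program, "boneMatrix");
glUniformMatrix4fv(loc, boneCount, GL_FALSE, &boneMatrices[0][0]);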
Almost there now... bear with me just a bit longer...
I've added some basic GLSL support to my OpenGL framework. Everything can run on shaders for the non-skinned stuff now.
I'll have to translate the HLSL shaders to GLSL and extend the framework to feed the bone matrices, and then it should work.
I'll also want to add a simple texture so it looks exactly like the D3D version. Sadly OpenGL doesn't support loading of textures from disk, so I'll have to wrap some portable JPEG library in order to load the texture I used for the D3D version.
You just take so much for granted when you're used to D3D... like loading textures, or performing basic math :P
Dang... I hadn't even thought about this yet, but the OpenGL pipeline is completely inadequate for vertex skinning.
In Direct3D, vertex skinning was already possible with the fixedfunction pipeline, albeit in a limited way, and D3D can store up to 256 world matrices.
OpenGL doesn't have the concept of a world matrix in the first place. It only has the modelview matrix, which is the world matrix and the view (camera) matrix combined.
Pretty useless, since you generally want to apply skinning in world space, not view space.
In D3D I just had direct access to the world matrices via the transform pipeline. This meant that my shading code didn't have to know anything about the actual objects being rendered. It just copied the matrix palette from the pipeline into the shader, and then used the matrix indices stored in the vertices, and that was that.
In OpenGL I'll have to bypass the transform pipeline and build an alternative pipeline myself, one that supports world matrices explicitly (and more than just one).
At least in D3D10/11 you know beforehand that you need to build your own transform pipeline, since there is no pipeline at all (because they got rid of all the fixedfunction legacy).
In OpenGL there is a pipeline, and they even made the matrices accessible in GLSL automatically... but it doesn't do you any good. You still have to reinvent the wheel if you want to do anything more than just basic object rendering.
It's not such a big deal - your animation matrices are relative to model space anyway - the final transform from model space to world space for each model instance can be tacked on, which is a better way to treat instances of animated models anyway.
Well, it IS a big deal... because OpenGL's pipeline doesn't allow you to do this (you cannot access the world and view matrices separately, so you can't retrieve whatever matrix you want to 'tack on').
There's a 'hack' though... Instead of applying the view matrix on the modelview matrix stack, you can also apply it on the projection matrix stack.
This way, you can treat the modelview matrix as just your world matrix. You get free 'tacking on' because of the way the OpenGL pipeline works...
It is even compatible with GLSL shaders via the gl_ModelViewProjectionMatrix built-in.
Obviously that doesn't solve the problem that matrix palette skinning requires an array of world matrices... but at least it's an easy way to have global access to your world matrix, while remaining compatible with OpenGL's pipeline.
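In code the hack is just a matter of where you multiply the view matrix in (a sketch with fixed-function calls; viewMatrix, worldMatrix and aspect are placeholders):

// Once per frame: make GL_PROJECTION hold projection*view...
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(60.0, aspect, 0.1, 1000.0);   // the actual projection
glMultMatrixf(viewMatrix);                   // ...with the view (camera) matrix baked in

// ...so GL_MODELVIEW is free to hold just the world matrix, per object:
glMatrixMode(GL_MODELVIEW);
glLoadMatrixf(worldMatrix);

// GLSL still sees the full transform through gl_ModelViewProjectionMatrix,
// and gl_ModelViewMatrix now happens to be the world matrix.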
I've added some basic texturing support now, via the FreeImage library, which supports a reasonably wide range of image formats (I had to patch the code first to make it work in 64-bit Windows, but hey, you get what you pay for). It's basically a wrapper for the common libjpeg/libpng/libtiff etc libraries, providing a single interface, and an easy way to detect file type.
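Roughly the kind of helper this boils down to (a sketch with error handling omitted; I'm assuming FreeImage hands back BGRA data after ConvertTo32Bits, which is the case on little-endian machines):

#include <FreeImage.h>
#include <GL/gl.h>

GLuint loadTexture(const char* filename)
{
    // detect the file type from the file contents, fall back to the extension
    FREE_IMAGE_FORMAT fif = FreeImage_GetFileType(filename, 0);
    if (fif == FIF_UNKNOWN)
        fif = FreeImage_GetFIFFromFilename(filename);

    FIBITMAP* dib = FreeImage_Load(fif, filename, 0);
    FIBITMAP* dib32 = FreeImage_ConvertTo32Bits(dib);
    FreeImage_Unload(dib);

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    // GL_BGRA needs GL 1.2+ headers (glext.h) on Windows
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8,
                 FreeImage_GetWidth(dib32), FreeImage_GetHeight(dib32),
                 0, GL_BGRA, GL_UNSIGNED_BYTE, FreeImage_GetBits(dib32));

    FreeImage_Unload(dib32);
    return tex;
}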
By now I have a reasonable collection of 'helper' functions, that make the relatively raw OpenGL functionality for shaders and textures easier to use, more like D3DX, where you can load a shader or texture directly from disk. It also makes dealing with vectors, matrices and colours a bit more userfriendly, by wrapping them into simple classes, and offering basic math operations on these classes.
I've decided to move these into a separate library, which I intend to call 'GLUX' (GL Useful eXtensions), modeled after D3DX. I will release it under the BSD license.
I've based my code around FreeImage and CML, both opensource and portable solutions, so my library itself will also be opensource and portable.
Hopefully that will help new OpenGL users on their way more easily, not having to reinvent the wheel at every turn.
I've also moved to using the aforementioned 'hack' of having GL_MODELVIEW == world matrix and GL_PROJECTION == view*perspective matrix.
Last night was rather productive... I fixed some bugs that prevented the skin shader from working properly, and debugged all the code handling the skin matrices.
I've also changed the code to avoid the OpenGL pipeline, and work more like how my Java engine works (which I don't think I ever backported to the D3D engine by the way, I'll have to do that at some point).
Namely, I solve the object->world space matrices for all objects first, without actually drawing any geometry. This effectively 'flattens' the scene graph: all objects can then be rendered in any order, because you already know how to get from their object space into world space. You can then do things like culling, sorting, and perform animation that is dependent on other objects... which is the case with skinning.
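In pseudo-C++ terms, the pre-pass looks something like this (node and field names are hypothetical; the matrix typedef assumes cml):

#include <vector>
#include <cml/cml.h>

struct Node
{
    cml::matrix44f_c   localMatrix;   // transform relative to the parent
    cml::matrix44f_c   worldMatrix;   // filled in by the pre-pass
    std::vector<Node*> children;
};

// Walk the graph once, resolving every object->world matrix up front.
// After this, nodes can be culled, sorted and rendered in any order,
// and dependent animation (like skinning) has all the matrices it needs.
void flatten(Node* node, const cml::matrix44f_c& parentWorld)
{
    node->worldMatrix = parentWorld * node->localMatrix;
    for (size_t i = 0; i < node->children.size(); ++i)
        flatten(node->children[i], node->worldMatrix);
}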
I have now verified that the bone matrices in my skin shader are correct. I have however not implemented the actual weighted blending yet, so I just picked a single bone matrix at a time, and used that as the world matrix. By also rendering the bones themselves, I could clearly see the object 'attached' to one particular bone.
I verified that for a number of bones, and they all worked nicely.
There are now only a few more things to wrap up:
1) Set up the vertex attributes (matrix indices and blend weights) that are in the vertexbuffer, so they are passed to the shader.
2) Use the vertex attributes to perform the actual matrix palette skinning.
3) Implement per-pixel lighting in the shaders.
4) Do general cleanup of the code, add comments and information where necessary, and make it ready for release.
I think I can have a rough working skinned animation ready tonight... if I only fix 1) and 2), I can post a binary, a proof-of-concept, so to speak.
Once everything is nicely cleaned up, I'll release the source code.
After that, I will see how difficult it will be to make the code work on my FreeBSD machine, with its Intel G31 chip. It cannot do GLSL, so I will have to rewrite the vertex shader in the old assembly-style vertex program stuff. The fragment shader will have to be replaced with legacy fixedfunction.
Perhaps a challenge for the more experienced OpenGL coders, when the code is released:
Currently my OpenGL code is much slower than the D3D-based code on the same machine. I invite everyone to download the code, optimize it and try to beat the D3D-versions, under the condition that you share the sourcecode of your improved versions.
Okay, I had to fight with the vertex shader for quite a while... it seems to work best when you use vec4 for your indices, and cast them to int when necessary.
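For reference, the C++ side of that looks roughly like this (attribute locations, stride and offsets are made up here; the indices are uploaded as plain floats, so the shader receives a vec4 and casts per component):

glBindBuffer(GL_ARRAY_BUFFER, vbo);

glEnableVertexAttribArray(1);   // 3 blend weights -> vec4 attribute, w defaults to 1.0
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, stride, (const void*)weightOffset);

glEnableVertexAttribArray(6);   // bone indices, stored as floats -> vec4 in the shader
glVertexAttribPointer(6, 4, GL_FLOAT, GL_FALSE, stride, (const void*)indexOffset);

// in the GLSL vertex shader the indices are then cast where needed:
//   mat4 m = boneMatrix[int(indices.x)];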
But now it works at last... BHM skinning in OpenGL:

You can download the early binaries here: http://bohemiq.scali.eu.org/OpenGL-BHMSample20100414.zip
Hum, on my PC at home, the performance wasn't exactly stellar. My 3 GHz Core2 Duo with Radeon 5770 managed about 1900 fps in Windows 7.
A far cry from the 7200 fps that it can get in Direct3D.
However, I just ran the code on my work PC, a 3 GHz Core2 Duo with GeForce 9800GTX+. It clocks about 6200 fps in Windows XP.
So... while my code may not be the most optimal OpenGL code in the world, it seems I'm not the only one responsible for the lackluster performance on the Radeon. nVidia makes my code look a whole lot more favourable. Then again, nVidia also does the same with my D3D code, which runs at nearly 9000 fps, so still almost 50% faster than my OpenGL code (while actually having heavier shaders, which also do per-pixel phong lighting, and using anisotropic texture filtering).
Given the amount of state changes required to set up vertex buffers and shaders, I wouldn't be surprised if OpenGL just isn't as efficient as D3D is (contrary to popular belief... which is probably based on the situation in the pre-T&L era... zealots love to dwell on the past, after all). But as I said, once I release the sourcecode, I encourage everyone to try and optimize it as far as you can, and prove me wrong.
I also tried it on my laptop with Intel IGP, but I could not get it to work. It used to be able to run the shader-based version, but apparently I have introduced some GLSL code in the final stages of completing the skinning that the Intel driver doesn't understand.
I will have to take a look at the GLSL compiler's error log to see if I can figure out what the trouble is, and if I can somehow work around it.
Okay, I figured out the problems with the Intel chip...
Apparently it didn't support uint, only int. Shouldn't matter in this case.
It also didn't support mat4x3() apparently... I've just used mat4x4 then. I'll have to find a nice reference on what types are available where...
And lastly, it didn't allow me to modify the vertex attributes inside the shader. I only store 3 of the 4 blendweights in the vertex buffer. The last blendweight can easily be calculated from the first 3 with a simple dot4()... after all, you know that all four have to add up to 1, so the last one is just 1 - (x + y + z). I've just made a local copy and modified that instead.
Funny thing is that I use mat34() in D3D9/10/11 on the same hardware without a problem, and I also modify the vertex attributes directly. I've written the original shaders ages ago, I think on a Radeon 8500... so it's always been possible in D3D, as far as I know. Weird that OpenGL doesn't support it.
Updated code here: http://bohemiq.scali.eu.org/OpenGL-BHMSample20100415.zip
An interesting discovery...
When I ran the new code on my Radeon, I magically got 'nVidia-like' framerates... more than twice as fast as before.
Initially I thought it was a problem with the shader compiler... that it somehow managed to optimize the Intel-friendly code much better (although technically it has to do slightly more work).
But as I dug deeper, I found the real culprit: It's the freeglut library!
The one I use on my main PC is built from source, as I couldn't find a pre-built x64 library.
On my laptop I only have a 32-bit OS, so I only installed the prebuilt binaries there.
For some reason, the prebuilt binary from 2001(!) makes my application run much better than when I build it myself with VS2008. Since I don't know exactly what source the prebuilt binary is made of, I can't be sure that the problem is just the compiler. It could be that updates to the sourcecode have made it much slower, at least on ATi drivers.
I've tried the original GLUT binaries, and they perform better as well. I'll have to see if I can build a 64-bit version of those from source.
Edit: I've built new 32-bit and 64-bit libraries from the original GLUT sources, and they both perform just fine. So it looks like the problem is in the sourcecode, not the compiler. So much for freeglut, then...
Edit: It looks like the DLL on my laptop wasn't even freeglut at all, but a renamed copy of the original glut32.dll.
Another thing... I tried the code on my FreeBSD box, and it seems that upgrading to FreeBSD 8.0 has also solved the problem of crashes when you leave VBOs bound during glut-calls.
When you are trying to write multi-platform portable code, it is always a good idea to actually try your code on multiple platforms from time to time.
In fact, I read that during the development of Windows NT, Microsoft actually used special development machines based on Intel i860 and MIPS processors. This guaranteed that although the main target of Windows NT would be the x86 architecture, no x86-specific code would be able to slip through.
So I went back to the FreeBSD system, and tried to make the code compile again. Gcc complained about a few issues that MSVC apparently didn't care about... So I fixed up the code.
But obviously there was still this big problem of the FreeBSD system not being able to run the GLSL code. There is an automatic fallback to the legacy fixedfunction pipeline, and that works okay. In theory the GLSL code should also work on FreeBSD/linux, but I cannot verify that myself, with this installation.
Then I figured this would be as good a time as any to try and get into the older assembly language ARB extensions. Those are supported on my FreeBSD system. So I subclassed a new material and gave the extensions a try. Remarkably, they seemed to be even easier to get going than the GLSL extensions. There are fewer steps involved.
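For comparison, getting an ARB vertex program going really is only a handful of calls (a sketch; the program text is assumed to be loaded into a string already, and the entry points come from GLEW or manual extension loading):

#include <cstring>
#include <cstdio>

GLuint createVertexProgram(const char* source)
{
    GLuint prog = 0;
    glGenProgramsARB(1, &prog);
    glBindProgramARB(GL_VERTEX_PROGRAM_ARB, prog);
    glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(source), source);

    if (glGetError() != GL_NO_ERROR)
    {
        GLint errPos = -1;
        glGetIntegerv(GL_PROGRAM_ERROR_POSITION_ARB, &errPos);
        fprintf(stderr, "Vertex program error at %d: %s\n", errPos,
                glGetString(GL_PROGRAM_ERROR_STRING_ARB));
    }
    return prog;
}

// at draw time:
//   glEnable(GL_VERTEX_PROGRAM_ARB);
//   glBindProgramARB(GL_VERTEX_PROGRAM_ARB, prog);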
It is all very limited and archaic though... gave me flashbacks of assembly shaders in DirectX 8, back in 2002. Those were happy days though, fond memories.
Besides, this is an assembly forum after all, so it's nice to actually have some assembly code in this project at last :)
Anyway, here is a binary compiled for Windows: http://bohemiq.scali.eu.org/OpenGL-BHM-asm.zip
I've removed the GLSL path, so it will only use the assembly shaders, or fall back to the legacy pipeline.
The programs I've written are very basic, they don't do skinning and lighting yet, just texturing.
But it's a start, a proof-of-concept... my FreeBSD system DOES run programmable shaders now, and it WILL be able to perform the skinning on the GPU. Implementing the full shading and skinning should be merely a formality from this point on. But it will be last on the list. I want to finish the GLSL shaders first, and clean up and comment the code.
As intrigued as I was with the whole assembly shader thing, and programmable shading actually working on my FreeBSD box, I couldn't quite let go of it just yet.
I figured that it wasn't *quite* a formality to implement skinning yet, as I hadn't looked at how to actually index the matrices in the shader...
So I played around with it some more, and now I actually have the skinning working in assembly.
Here's a binary: http://bohemiq.scali.eu.org/OpenGL-ASMSkinning.zip
And for completeness, the assembly sourcecode of the vertex program:
!!ARBvp1.0
# Matrix palette skinning. The projection PARAM holds projection*view
# (see the GL_PROJECTION 'hack' earlier), so the bone matrices go
# straight from model space to world space.
PARAM projection[4] = { state.matrix.projection };
PARAM bone[80] = { program.env[0..79] };   # 20 bone matrices, 4 rows each
PARAM calcw = { -1, -1, -1, 1 };
TEMP position;
TEMP temp;
TEMP weights;
TEMP indices;
ADDRESS addr;
# Only 3 weights are stored in the vertex; attrib[1].w defaults to 1.0,
# so one DP4 against calcw reconstructs the 4th weight as 1 - (x + y + z).
MOV weights, vertex.attrib[1];
DP4 weights.w, weights, calcw;
# Scale the matrix indices by 4, since each bone matrix takes 4 PARAM rows.
MUL indices, vertex.attrib[6], 4;
# Bone 1
ARL addr.x, indices.x;
DP4 temp.x, bone[addr.x], vertex.position;
DP4 temp.y, bone[addr.x + 1], vertex.position;
DP4 temp.z, bone[addr.x + 2], vertex.position;
DP4 temp.w, bone[addr.x + 3], vertex.position;
MUL position, temp, weights.x;
# Bone 2
ARL addr.x, indices.y;
DP4 temp.x, bone[addr.x], vertex.position;
DP4 temp.y, bone[addr.x + 1], vertex.position;
DP4 temp.z, bone[addr.x + 2], vertex.position;
DP4 temp.w, bone[addr.x + 3], vertex.position;
MAD position, temp, weights.y, position;
# Bone 3
ARL addr.x, indices.z;
DP4 temp.x, bone[addr.x], vertex.position;
DP4 temp.y, bone[addr.x + 1], vertex.position;
DP4 temp.z, bone[addr.x + 2], vertex.position;
DP4 temp.w, bone[addr.x + 3], vertex.position;
MAD position, temp, weights.z, position;
# Bone 4
ARL addr.x, indices.w;
DP4 temp.x, bone[addr.x], vertex.position;
DP4 temp.y, bone[addr.x + 1], vertex.position;
DP4 temp.z, bone[addr.x + 2], vertex.position;
DP4 temp.w, bone[addr.x + 3], vertex.position;
MAD position, temp, weights.w, position;
# Transform the blended world-space position by projection*view
DP4 result.position.x, projection[0], position;
DP4 result.position.y, projection[1], position;
DP4 result.position.z, projection[2], position;
DP4 result.position.w, projection[3], position;
MOV result.color, vertex.color;
MOV result.texcoord[0], vertex.texcoord;
END
And the fragment shader:
!!ARBfp1.0
TEMP color;
TEX color, fragment.texcoord[0], texture[0], 2D;
MOV result.color, color;
END
Looks pretty elegant, as far as assembly code goes :)
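For completeness, feeding the bone matrices into program.env[] from the C++ side comes down to something like this (a sketch; boneMatrices is a hypothetical array of row-major 4x4 matrices, matching the per-row DP4s in the program):

// PARAM bone[80] = { program.env[0..79] }; -> 4 consecutive env rows per bone,
// which is why the program multiplies the matrix index by 4 before the ARL.
void uploadBones(const float (*boneMatrices)[16], int boneCount)
{
    for (int i = 0; i < boneCount; ++i)
        for (int row = 0; row < 4; ++row)
            glProgramEnvParameter4fvARB(GL_VERTEX_PROGRAM_ARB,
                                        i * 4 + row,
                                        &boneMatrices[i][row * 4]);
}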
And here is an actual screenshot from my FreeBSD machine:


Works fine, you just forgot to reset the projection matrix when the window is resized... aspect ratio ;)
Thanks for testing.
And yea, it's still a bit rough around the edges. I'll add this to the todo-list, so it will be addressed before the source code is released.
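For what it's worth, the fix is just a GLUT reshape callback along these lines (a sketch; the fov and clip planes are placeholders, and with the GL_PROJECTION 'hack' the view matrix has to be re-applied after the perspective):

void onReshape(int width, int height)
{
    glViewport(0, 0, width, height);

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(60.0, (double)width / (double)height, 0.1, 1000.0);
    // ...re-apply the view matrix here, since GL_PROJECTION holds projection*view

    glMatrixMode(GL_MODELVIEW);
}

// registered with: glutReshapeFunc(onReshape);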
A funny thing I noticed, by the way... On my laptop, with Intel GM965 chipset, I can run both the GLSL and the assembly shaders. I have implemented the same simple diffuse lighting in assembly as in GLSL now, and also applied some small optimizations to the assembly code (I can do a 3x4 or 4x3 matrix easily in assembly :)).
I included all the 'heavy' stuff, such as skinning both position and normal, and renormalizing the normal per-pixel.
The GLSL code runs about 100 fps... the asm code runs at about 160 fps(!).
So it looks like Intel's GLSL compiler has very poor optimization, and in this case, the assembly really pays off.
The D3D9 version runs at about 170 fps, and that is HLSL. I haven't tried an assembly version in D3D, but apparently the Microsoft HLSL compiler already does a great job.
Downside to using assembly is that only nVidia has updated the language with extensions over the years (as a back-end for their Cg compiler). Others have abandoned the assembly language after the introduction of GLSL, so the assembly language is still stuck at SM2.0-level. Then again, on Intel drivers, so is the GLSL.
Funny thing is, the more I try to clean up and abstract the OpenGL code, the more my code starts to resemble D3D.
For example, I am thinking of making a reference-counted baseclass, to wrap up the OpenGL resources, and have them cleaned up after the last user releases it.
And I obviously use interleaved vertexbuffers, so I need to keep track of all the offsets of all vertex attributes.
Initially I stored that in every mesh object... but since vertex formats can be reused by multiple meshes, I will make a separate vertex declaration class.
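Something along these lines would do the job (hypothetical names; it basically mirrors the D3D vertex declaration idea):

#include <vector>

struct VertexElement
{
    GLuint  attribIndex;   // shader attribute location
    GLint   components;    // 1..4
    GLenum  type;          // GL_FLOAT etc.
    GLsizei offset;        // byte offset within the interleaved vertex
};

struct VertexDeclaration
{
    std::vector<VertexElement> elements;
    GLsizei stride;        // size of one interleaved vertex

    void apply() const     // call with the vertex buffer bound
    {
        for (size_t i = 0; i < elements.size(); ++i)
        {
            const VertexElement& e = elements[i];
            glEnableVertexAttribArray(e.attribIndex);
            glVertexAttribPointer(e.attribIndex, e.components, e.type, GL_FALSE,
                                  stride, (const char*)0 + e.offset);
        }
    }
};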
I've built a nice reference-counting baseclass, which should be thread-safe and multiplatform... albeit with some limitations. I use intrinsics for atomic operations, InterlockedIncrement()/Decrement() for MSVC, and __sync_add/sub_and_fetch() for gcc. So different compilers are not supported, and in the case of gcc, the intrinsics may not be supported on all versions and architectures.
But I think in practice it will be good enough, as it works on x86, and the popular OSes such as linux, FreeBSD, Solaris and OS X all use gcc as the standard compiler.
So at this point I don't feel like making more workarounds with pthread mutex objects or such. I'll leave that as an exercise to people with 'unsupported' compilers/architectures.
On top of the reference-counter I have built a template class, which stores a 'resource' of the templated type. With this template you can easily wrap OpenGL resources such as textures, shaders and vertex buffers. You just have to implement the destructor for each given type, so that it knows how to clean up the resource after the last reference is released.
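A rough sketch of the shape of it (names are made up, and the texture specialization is just an example of how a destructor would clean up its GL object):

#include <GL/gl.h>
#ifdef _MSC_VER
#include <windows.h>        // InterlockedIncrement/Decrement
#endif

class RefCounted
{
public:
    RefCounted() : refCount(1) {}
    virtual ~RefCounted() {}

    void AddRef()
    {
#ifdef _MSC_VER
        InterlockedIncrement(&refCount);
#else
        __sync_add_and_fetch(&refCount, 1);
#endif
    }

    void Release()
    {
#ifdef _MSC_VER
        if (InterlockedDecrement(&refCount) == 0)
#else
        if (__sync_sub_and_fetch(&refCount, 1) == 0)
#endif
            delete this;
    }

private:
#ifdef _MSC_VER
    volatile LONG refCount;
#else
    volatile long refCount;
#endif
};

// The resource wrapper on top: each specialization implements the destructor,
// so it knows how to free the underlying OpenGL object when the last reference goes.
template <typename T>
class Resource : public RefCounted
{
public:
    explicit Resource(T handle) : handle(handle) {}
    ~Resource();                   // implemented per resource type
    T Get() const { return handle; }

private:
    T handle;
};

// example: a texture handle from glGenTextures()
template <> Resource<GLuint>::~Resource() { glDeleteTextures(1, &handle); }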
That may sound and look a lot like D3D, but that was not my original goal. It just seemed like the nicest solution to me. I've also discussed it with a game developer friend of mine, and he came to the same conclusion. Reference counting just seems to make a lot of sense in a rendering framework, where you want to share resources as much as possible. A lot is just based on 'instancing'. Eg, you use an instance of a texture on various materials. You use instances of a material on various objects. You may use various instances of an object in the world...
I still need to finish wrapping up all OpenGL resources, currently I only use them for textures, as a proof-of-concept... and I still need to simplify/refactor some of the code, but it's really starting to shape up now.
I think I will just add the code to the repository once I've finished the cleanup/refactor.
Then I will update the repository later with a full lighting model, and completely commented code. But I've tried to make the design as clean and simple as possible, so hopefully the code mostly 'speaks for itself'. I think it's mainly the shaders that need commenting, and some of the preprocessing of the data loaded from the file. The actual rendering framework should be trivial.
I've spent quite a bit of time going over the sourcecode, trying to make everything as clean and simple as possible, and also found a few minor bugs here and there...
I think the code is pretty much 'done' right now. I've built some new Windows binaries of the code in its current state:
http://bohemiq.scali.eu.org/BHMSample-20100424.zip
And here is a screenshot of the latest build on FreeBSD:

Sadly I cannot say exactly how well it performs on FreeBSD, as it looks like vsync simply cannot be disabled on the Intel driver. I get a consistent 60 or 75 fps, depending on the refresh rate of my desktop, and about 160 fps when minimized.
So next on the agenda is:
- Making a COPYRIGHT file describing the license (BSD), and adding some basic info as comment at the top of each source file.
- Adding the code to the repository on SourceForge.net
- Finish commenting the code where necessary.
- Writing a readme file that describes the dependencies on third-party libraries, and explains the file format and animation code more globally.