Hello forum,

I am writing an assembly function that multiplies two 4x4 single-precision matrices. I wrote two versions, one using SSE and the other using SSE4.1. What surprised me is that the SSE4.1 version fails to beat the SSE version; it is in fact slightly slower.

If anyone is interested in helping, I can post some code.

Thanks, nick
Posted on 2011-03-11 16:53:22 by nicolasbock
Yep, show your work :)
Posted on 2011-03-12 01:10:25 by Homer