Hi,

As I've continued to develop my project (see First Foray into ASM) I've got a bit sidetracked into looking at colour matrices. I've written and optimised the C++ version, and whilst I'm pretty happy with the results I'd like to look at implementing an MMX version. Sadly I know very little MMX and have not really found many good sources online. I've got the basic idea, but the wealth of instructions, packing, unpacking etc. has left me a little dizzy.

A colour matrix is simply the application of a matrix to manipulate each pixel in a given image. The original source code for this used floating point (I know I can look into SSE/SSE2 but I'd like to work my way up to these) and as such was pretty slow, though very accurate. I then converted this to an integer version (using the methodology of fixed point, but not caring where the point is ;) - well, the code worked and that's all that was important at the time); this was about 40% faster. I then converted this to a lookup table version, which gave almost a 60% performance gain over the original FP version.

I've posted the fixed point version here to act as a basis for the MMX version, since I believe it's the most appropriate.

```cpp
void TStdXtra_IMoaMmXScript::ncp_ColourMatrixImage_FixedPoint(unsigned long* src, unsigned long* dst, MoaUlong uiWidth, MoaUlong uiHeight, MoaDouble* mMat)
{
    MoaLong   iRed, iGreen, iBlue;
    MoaLong   ir, ig, ib;
    MoaLong   newMat[16];
    MoaUlong  ui1, i;
    MoaUlong  iImageSize = uiWidth * uiHeight;

    // Convert matrix to fixed point using *256
    for (i = 0; i < 16; i++)
        newMat[i] = (MoaLong)(mMat[i] * 256.0f);

    for (i = 0; i < iImageSize; i++)
    {
        // This appears to be the fastest method of grabbing the components
        ui1 = *src++;
        ir  = (MoaLong)((ui1 >> 16) & 0xFF);
        ig  = (MoaLong)((ui1 >> 8)  & 0xFF);
        ib  = (MoaLong)((ui1)       & 0xFF);

        // Use fixed point matrix values - have to divide through at end
        iRed   = (ir*newMat[0] + ig*newMat[4] + ib*newMat[8]  + newMat[12]) / 256;
        iGreen = (ir*newMat[1] + ig*newMat[5] + ib*newMat[9]  + newMat[13]) / 256;
        iBlue  = (ir*newMat[2] + ig*newMat[6] + ib*newMat[10] + newMat[14]) / 256;

        // bound checks - yuk!  C < 0 = 0    C > 255 = 255
        // < snipped for shorter code and it should be removed by using MMX >

        *dst++ = (unsigned long)( (byte)(iBlue) | ((byte)(iGreen) << 8) | ((byte)(iRed) << 16));
    }
}
```
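For anyone wanting to experiment with the arithmetic outside the Xtra, here is a minimal standalone sketch of the same per-pixel calculation. It is not the snipped code from the routine above: `transformPixel` and `clamp255` are hypothetical names, the Moa types are swapped for `<cstdint>` types, and the clamp is simply one plain way to write the "C < 0 = 0, C > 255 = 255" bound check. The matrix is assumed to be already scaled by 256 and laid out like `newMat`.

```cpp
#include <cstdint>

// Clamp a value to the 0..255 range of an 8-bit colour channel.
// (Hypothetical helper; stands in for the snipped bound checks.)
static inline std::int32_t clamp255(std::int32_t v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

// Apply a colour matrix, pre-scaled by 256, to one 0x00RRGGBB pixel.
// fixedMat uses the same layout as newMat in the post: indices 0/4/8/12
// feed red, 1/5/9/13 feed green, 2/6/10/14 feed blue.
std::uint32_t transformPixel(std::uint32_t pixel, const std::int32_t fixedMat[16])
{
    const std::int32_t r = (pixel >> 16) & 0xFF;
    const std::int32_t g = (pixel >> 8)  & 0xFF;
    const std::int32_t b =  pixel        & 0xFF;

    // Multiply-accumulate in fixed point, then divide the scale back out.
    const std::int32_t red   = clamp255((r*fixedMat[0] + g*fixedMat[4] + b*fixedMat[8]  + fixedMat[12]) / 256);
    const std::int32_t green = clamp255((r*fixedMat[1] + g*fixedMat[5] + b*fixedMat[9]  + fixedMat[13]) / 256);
    const std::int32_t blue  = clamp255((r*fixedMat[2] + g*fixedMat[6] + b*fixedMat[10] + fixedMat[14]) / 256);

    return static_cast<std::uint32_t>(blue)
         | (static_cast<std::uint32_t>(green) << 8)
         | (static_cast<std::uint32_t>(red)   << 16);
}
```

With an identity matrix (256 on the diagonal, 0 elsewhere) the pixel comes back unchanged. On the MMX side, the saturating pack instruction PACKUSWB clamps signed words to 0..255 as part of the pack, which is why the explicit bound checks are expected to disappear.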

So from my understanding of MMX so far, I can set up the maths as such

```
// MMX   Words       a      c      e      g
//                   *      *      *      *
//       Words       b      d      f      h
// Result dWord      a*b+c*d       e*f+g*h
```
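The multiply-and-add pattern in that diagram matches what the MMX instruction PMADDWD does (exposed as the `_mm_madd_pi16` intrinsic in `<mmintrin.h>`). To make the semantics concrete without needing MMX hardware, here is a small scalar model in plain C++. `pmaddwdModel` is a hypothetical name, and element 0 of each array is assumed to correspond to the leftmost word in the diagram.

```cpp
#include <array>
#include <cstdint>

// Scalar model of PMADDWD on one pair of 64-bit MMX registers:
// four signed 16-bit words in each operand, two signed 32-bit dwords out.
// Adjacent word products are summed pairwise.
// Note: the real instruction wraps in the single case where all four
// words of a pair are -32768; this model does not try to reproduce that.
std::array<std::int32_t, 2> pmaddwdModel(const std::array<std::int16_t, 4>& x,
                                         const std::array<std::int16_t, 4>& y)
{
    return {{
        std::int32_t(x[0]) * y[0] + std::int32_t(x[1]) * y[1],  // a*b + c*d
        std::int32_t(x[2]) * y[2] + std::int32_t(x[3]) * y[3]   // e*f + g*h
    }};
}
```

For example, words `{1, 2, 3, 4}` against `{5, 6, 7, 8}` give the two dwords `1*5 + 2*6` and `3*7 + 4*8`.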

Which in the case of the red component resolves to

```
// Load mm1 with matrix          m[0]  m[4]         m[8]  m[12]
// Load mm2 with Components       ir    ig           ib     1
// Multiply mm1 with mm2
// Results                  m[0]*ir+m[4]*ig    m[8]*ib + m[12]*1
```
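The two dword results still have to be added together and divided by 256 to get the final red value. A rough sketch of that, using the MMX intrinsics from `<mmintrin.h>` where the compiler advertises MMX and a scalar fallback elsewhere, might look like this. `redComponent` is a hypothetical name, and the sketch assumes the scaled coefficients have been narrowed to signed 16-bit words, which means a translation scaled by 256 must stay under 128 to fit.

```cpp
#include <cstdint>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

// Red output for one pixel. mat holds the matrix already scaled by 256
// and narrowed to 16-bit words, in the post's layout (red uses 0, 4, 8, 12).
std::int32_t redComponent(std::int16_t ir, std::int16_t ig, std::int16_t ib,
                          const std::int16_t mat[16])
{
#if defined(__MMX__)
    // _mm_set_pi16 takes the highest word first, so word 0 is the last argument.
    const __m64 coeffs = _mm_set_pi16(mat[12], mat[8], mat[4], mat[0]);
    const __m64 comps  = _mm_set_pi16(1, ib, ig, ir);

    // PMADDWD: low dword = m[0]*ir + m[4]*ig, high dword = m[8]*ib + m[12]*1
    const __m64 pairs = _mm_madd_pi16(coeffs, comps);

    // Bring the high dword down (PSRLQ) and add it to the low one (PADDD).
    const __m64 high = _mm_srli_si64(pairs, 32);
    const __m64 sum  = _mm_add_pi32(pairs, high);

    const std::int32_t total = _mm_cvtsi64_si32(sum);  // MOVD: read low dword
    _mm_empty();                                       // EMMS: release the FPU state
#else
    // Scalar fallback computing the same sum.
    const std::int32_t total = ir*mat[0] + ig*mat[4] + ib*mat[8] + mat[12];
#endif
    return total / 256;  // remove the *256 fixed point scale
}
```

This handles one component of one pixel, so it is only meant to show the data flow. A real loop would want to keep the coefficients in registers and avoid building the operands word by word for every pixel.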

For the moment I'm ignoring alpha and so m[12]*1 represents the translation of the colour component. At some stage I'll introduce alpha and do the component translation later.