Hi,

As I've continued to develop my project (see First Foray into ASM) I've got a bit sidetracked into looking at colour matrices. I've written and optimised the C++ version, and whilst I'm pretty happy with the results, I'd like to look at implementing an MMX version. Sadly I know very little MMX and have not really found many good sources online. I've got the basic idea, but the wealth of instructions, packing, unpacking etc. has left me a little dizzy.

A colour matrix is simply the application of a matrix to manipulate each pixel in a given image. The original source code for this used floating point (I know I could look into SSE/SSE2, but I'd like to work my way up to those) and as such was pretty slow, though very accurate. I then converted this to an integer version (using the methodology of fixed point, but not caring where the point is ;) - well, the code worked, and that's all that was important at the time), which was about 40% faster. I then converted this to a lookup table version, which was almost 60% faster than the original FP version.
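For anyone curious what a lookup table version might look like, here is a minimal sketch. It is not the code from my project: the names `ColourLut`, `buildLut` and `applyLut` are made up for illustration, and it assumes a row-major 4x4 matrix (rows 0 to 2 produce red, green and blue, column 3 is the translation) with the same *256 scaling as the fixed point version. Alpha is ignored.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical structure: one 256-entry table per (output channel, input channel) pair.
struct ColourLut {
    int32_t t[3][3][256];  // t[out][in][value] = value * coefficient, scaled by 256
    int32_t offset[3];     // translation column, scaled by 256
};

// Precompute every product once, so the per-pixel loop needs no multiplies.
// Assumes m points at 16 doubles in row-major order.
ColourLut buildLut(const double* m) {
    ColourLut lut;
    for (int out = 0; out < 3; ++out) {
        for (int in = 0; in < 3; ++in) {
            const int32_t coeff = static_cast<int32_t>(m[out * 4 + in] * 256.0);
            for (int v = 0; v < 256; ++v)
                lut.t[out][in][v] = v * coeff;
        }
        lut.offset[out] = static_cast<int32_t>(m[out * 4 + 3] * 256.0);
    }
    return lut;
}

// Transform one 0x00RRGGBB pixel: three table reads and three adds per channel.
uint32_t applyLut(const ColourLut& lut, uint32_t pixel) {
    const int r = (pixel >> 16) & 0xFF;
    const int g = (pixel >> 8) & 0xFF;
    const int b = pixel & 0xFF;
    uint32_t result = 0;
    for (int c = 0; c < 3; ++c) {
        int32_t v = (lut.t[c][0][r] + lut.t[c][1][g] + lut.t[c][2][b] + lut.offset[c]) / 256;
        v = std::min<int32_t>(255, std::max<int32_t>(0, v));  // clamp to 0..255
        result |= static_cast<uint32_t>(v) << (16 - 8 * c);   // c = 0 is red, 2 is blue
    }
    return result;
}
```

The table is built once per matrix, so the cost only pays off when the image is large compared with the 9 x 256 entries being filled in.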

I've posted the fixed point version here to act as a basis for the MMX version, since I believe it's the most appropriate.

```
void TStdXtra_IMoaMmXScript::ncp_ColourMatrixImage_FixedPoint(unsigned long* src, unsigned long* dst, MoaUlong uiWidth, MoaUlong uiHeight, MoaDouble * mMat)
{
	MoaLong		iRed, iGreen, iBlue;
	MoaLong		ir, ig, ib;
	MoaLong		newMat[16];
	MoaUlong	ui1, i;
	MoaUlong	iImageSize   = uiWidth*uiHeight;

	// Convert matrix to fixed point using *256
	for (i=0; i<16; i++)
		newMat[i] = (MoaLong)(mMat[i]*256.0f);

	for(i=0;i<iImageSize;i++)
	{
		// This appears to be the fastest method of grabbing the components
		ui1	= *src++;
		ir	= (MoaLong)((ui1 >> 16)&0xFF);
		ig	= (MoaLong)((ui1 >> 8) &0xFF);
		ib	= (MoaLong)((ui1)      &0xFF);

		// Use fixed point matrix values - have to divide through at end
		// Matrix taken as row-major 4x4: rows 0-2 give R, G, B; column 3 is the translation
		iRed	= (ir*newMat[0] + ig*newMat[1] + ib*newMat[2]  + newMat[3])  / 256;
		iGreen	= (ir*newMat[4] + ig*newMat[5] + ib*newMat[6]  + newMat[7])  / 256;
		iBlue	= (ir*newMat[8] + ig*newMat[9] + ib*newMat[10] + newMat[11]) / 256;

		// bound checks - yuk!  C < 0 = 0    C > 255 = 255
		// < snipped for shorter code and it should be removed by using MMX >

		*dst++ = (unsigned long)( (byte)(iBlue) | ((byte)(iGreen) << 8) | ((byte)(iRed) << 16));
	}
}
```

So from my understanding of MMX so far, I can set up the maths as follows:

```
// MMX   Words       a      c      e      g
//                   *      *      *      *
//       Words       b      d      f      h
// Result dWord      a*b+c*d       e*f+g*h
```
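The layout above matches what the PMADDWD instruction does (exposed as the `_mm_madd_pi16` intrinsic in `<mmintrin.h>`). To check my own understanding, here is a plain C++ model of it; `pmaddwdModel` is just a name I made up, and this is a sketch of the arithmetic rather than real MMX code.

```cpp
#include <array>
#include <cstdint>

// Scalar model of PMADDWD: four signed 16-bit words are multiplied pairwise,
// and each adjacent pair of 32-bit products is summed into one dword.
// The single wrap-around case of the real instruction (all four inputs of a
// pair equal to -32768) is not modelled here.
std::array<int32_t, 2> pmaddwdModel(const std::array<int16_t, 4>& x,
                                    const std::array<int16_t, 4>& y) {
    return { static_cast<int32_t>(x[0]) * y[0] + static_cast<int32_t>(x[1]) * y[1],
             static_cast<int32_t>(x[2]) * y[2] + static_cast<int32_t>(x[3]) * y[3] };
}
```

So with words (1, 2, 3, 4) and (5, 6, 7, 8) the two result dwords are 1*5+2*6 and 3*7+4*8.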

Which in the case of the red component resolves to

```
// Load mm1 with matrix               m[0]    m[1]      m[2]    m[3]
// Load mm2 with components            ir      ig        ib      1
// Multiply mm1 with mm2
// Results                       m[0]*ir+m[1]*ig    m[2]*ib+m[3]*1
```
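The multiply leaves two partial sums, so there are still a few steps before there is a red byte: add the two dwords together, shift out the 8 fractional bits, and clamp to 0..255. A scalar sketch of that whole chain, assuming the four red coefficients are already scaled by 256 and fit in signed 16-bit words (so real values roughly between -128 and 128); `redFromPmaddwd` is a hypothetical name:

```cpp
#include <algorithm>
#include <cstdint>

// Models one PMADDWD on (m[0..3]) x (ir, ig, ib, 1), then the follow-up steps.
uint8_t redFromPmaddwd(const int16_t m[4], uint8_t ir, uint8_t ig, uint8_t ib) {
    const int16_t comp[4] = { ir, ig, ib, 1 };

    // The two dwords the multiply-add would produce.
    const int32_t lo = static_cast<int32_t>(m[0]) * comp[0] + static_cast<int32_t>(m[1]) * comp[1];
    const int32_t hi = static_cast<int32_t>(m[2]) * comp[2] + static_cast<int32_t>(m[3]) * comp[3];

    // Horizontal add, then drop the 8 fractional bits. An arithmetic shift
    // rounds negatives down where "/ 256" rounds towards zero, but every
    // negative result is clamped to 0 below, so the final byte is the same.
    const int32_t sum = (lo + hi) >> 8;

    // Saturate to 0..255; in MMX an unsigned saturating pack could do this.
    return static_cast<uint8_t>(std::min<int32_t>(255, std::max<int32_t>(0, sum)));
}
```

If that is right, the bound checks from the C++ version fall out of the saturating pack for free, which is the part I'm most hoping MMX will buy me.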

For the moment I'm ignoring alpha, so m[3]*1 represents the translation of the colour component. At some stage I'll introduce alpha and do the component translation later.