Hi, I have the following procedure for looping through a list of structures, reading a Real4 floating-point value from each and checking to find the biggest. It finishes holding the biggest value found in R4t and the index of the structure containing it in DDt.
The loop works grand, I'm just curious whether I could optimise it as, ideally, it might need to scan through a large number of structures, i.e. 100,000 to 1,000,000. All suggestions, as always, will be appreciated. Thanks in advance. Zadkiel
    LOCAL R4t:Real4, DDt:DWORD

    mov ecx, Total
    mov ebx, sizeof( SampleStruct )
    mov DDt, 0
    fldz
NxtSS:
    xor edx, edx
    mov eax, ecx
    dec eax
    mul ebx
    mov edx, eax
    fcom SStruct.Desire
    fstsw ax
    sahf
    setb al
    and al, 0FFh
    jnz More
    dec ecx
    jnz NxtSS
    jmp FinLp
More:
    fstp R4t
    fld SStruct.EgValue
    mov DDt, ecx
    dec ecx
    jnz NxtSS
FinLp:
    fstp R4t
Right now this code is calculating the index on every iteration by doing a

    xor edx, edx
    mov eax, ecx
    dec eax
    mul ebx
    mov edx, eax

Rather, load the address of the first structure into edi once before the loop. Then, for each iteration, a single

    add edi, STRUCTURE_SIZE

will avoid the complex index calculation logic. Jones.
That's an excellent idea, thanks. Any more, anyone?
Since you're testing the B(elow) condition with SETB, why not use the condition directly with JB or JNB?

    fstsw ax
    sahf
    setb al
    and al, 0FFh
    jnz More

should reduce to

    fstsw ax
    sahf
    jb More
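Putting that together with the pointer suggestion earlier in the thread, a rough sketch of the whole loop might look something like the following. This is only an illustration, not the loop as actually rewritten: it assumes SStruct labels the first structure in the array, that EgValue is the field being compared (the posted code mentions both Desire and EgValue), and that Total is at least 1. It walks the array from the last structure to the first so that DDt keeps the same meaning as in the posted loop, and it inverts the branch (JAE instead of JB) so the two exits of the loop can be merged.

```asm
    LOCAL R4t:Real4, DDt:DWORD

    mov ecx, Total                      ; structures left to scan (assumed >= 1)
    mov eax, ecx
    dec eax
    imul eax, eax, SIZEOF SampleStruct  ; the only multiply, done once outside the loop
    lea edi, SStruct[eax]               ; edi -> last structure (assumes SStruct is the array start)
    mov DDt, 0                          ; stays 0 if nothing bigger than 0.0 is found
    fldz                                ; st(0) = biggest value so far
NxtSS:
    fcom [edi].SampleStruct.EgValue     ; compare st(0) with this structure's value
    fstsw ax
    sahf                                ; C0 ends up in CF
    jae Skip                            ; st(0) >= value, keep the current biggest
    fstp st(0)                          ; pop the old biggest without storing it
    fld [edi].SampleStruct.EgValue      ; load the new biggest
    mov DDt, ecx                        ; same position value the posted loop stores
Skip:
    sub edi, SIZEOF SampleStruct        ; step back to the previous structure
    dec ecx
    jnz NxtSS
    fstp R4t                            ; store the biggest value found
```

The inner loop now has no MUL and a single conditional branch after the compare. Walking forwards with add edi would work just as well if DDt is then derived from the remaining count or from edi after the loop.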
Thanks, I rewrote the loop following both of your suggestions and managed to knock off about 230 clock cycles for a 64-iteration test case. That worked out as 15% less than the total time it was taking. Incredible. Note that I hadn't posted the full loop, just the relevant parts that I felt needed optimising, in case you think the total quoted clock cycles, i.e. 1565 for the original 64-iteration loop, seems too long for what I posted.