There is an excellent optimization tutorial by Mark Larson at

www.visionx.com/markl/optimization_tips.htm

Apologies to those of you who already know about this.
Posted on 2004-11-20 09:56:38 by penang
Thanks for posting the link. I actually posted it when I first wrote the page. However, I should probably post it from time to time.

I have a copy up on masmforum as well, because Hutch-- was kind enough to give me webspace.
http://masmforum.com/website/mark/index.htm


I've also written some assembler optimization tutorials, in case anyone is interested. They are all on masmforum.

Using SSE2 for quaternions (used in game programming):
http://www.masmforum.com/viewtopic.php?t=3469&highlight=quaternions

Mersenne Twister random number generator optimization tutorial. The original author's C code for the Mersenne Twister runs in 258 cycles. Agner Fog's P4 SSE2 code runs in 44 cycles. My ALU code runs in 25 cycles (10 times faster than the author's code, and almost twice as fast as Agner's SSE2 code). Yes, you read that right: my ALU code runs faster than Agner's SSE2 code. That's because I optimized it specifically for the P4, where you can execute up to 4 ALU instructions in parallel if you do it right. I then wrote an SSE2 version that runs in 14 cycles (18.5 times faster than the author's code, and 3.1 times faster than Agner's SSE2 code).
http://www.masmforum.com/viewtopic.php?t=3565&highlight=mersenne+twister


How to optimize C code into fast assembler code. This was the first tutorial I did. It runs to 6 pages because of all the replies I got from people. Jibz was kind enough to offer some better optimized C code to compare against. I took the original code from a book on optimizing C. I wanted to show how to speed up already highly optimized C code using assembler.
http://www.masmforum.com/viewtopic.php?t=3329&highlight=optimization+tutorial


My account got messed up on masmforum and I had to get a new one, so some of my old posts now say hutch-- and some say marklarson. I participated in the MD5CRK project (http://www.md5crk.com). You can see where Jean-luc gave me credit here: http://www.md5crk.com/?sec=aboutmd5client (search for "larson"). My code runs 10 times faster than the standard C code. I also posted the code on masmforum, but it is listed under hutch-- because of the account problem mentioned above.
http://www.masmforum.com/viewtopic.php?t=2921&highlight=md5
Posted on 2004-11-22 16:06:36 by mark_larson
Excellent work Mark! :alright: I will be using the Twister code extensively.
Posted on 2004-11-22 23:18:39 by bitRAKE
I forgot this. I wrote some macros for MASM 6.14 (the version that comes with MASM32) to use SSE3. They are useful if you have a Prescott processor.

I tried uploading the file but the board is not giving me that option. So I am cutting and pasting instead.



;SSE3 macros Written By: Mark Larson, Mark_Larson@dell.com
;These macros support all 13 of the new SSE3 instructions.
;List of Supported SSE3 instructions
;01) ADDSUBPD
;02) ADDSUBPS
;03) HADDPD
;04) HADDPS
;05) HSUBPD
;06) HSUBPS
;07) MOVDDUP
;08) MOVSLDUP
;09) MOVSHDUP
;10) LDDQU
;11) MONITOR
;12) MWAIT
;13) FISTTP

ADDSUBPD macro reg1:req, reg2:req
local do_override
local start_modify
local end_modify
local reg_sub

do_override = 0 ;Assume no segment override until one is found.
reg_sub textequ @SubStr(reg2,1,3) ;Get the first 3 characters of the second parameter
;Check whether there is a segment override
IFIDNI reg_sub, <cs:> ;CS override?
do_override = 1
ELSEIFIDNI reg_sub, <ds:>
do_override = 1
ELSEIFIDNI reg_sub, <es:>
do_override = 1
ELSEIFIDNI reg_sub, <fs:>
do_override = 1
ELSEIFIDNI reg_sub, <gs:>
do_override = 1
ELSEIFIDNI reg_sub, <ss:>
do_override = 1
ENDIF


;A segment override prefix is emitted in front of the 0F opcode byte. So the byte we modify to convert the
; ADDPD to an ADDSUBPD moves by 1 byte if a segment override has been used in a memory access.

db 066h ;This forces the following ADDPS to really be an ADDPD.
start_modify equ $
addps reg1,reg2 ;I am using ADDPS since it is supported with MASM 6.14, which comes with MASM32
end_modify equ $
org (start_modify+1+do_override);Go back to the "58" and change it to a "D0"
db 0D0h ;Change the ADDPD from a "66 0f 58 /r" to a "66 0f D0 /r" which is an ADDSUBPD
org (end_modify) ;Go to the last byte in the opcode.
endm
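
;Illustration only (not part of the original macros): the byte sequences the ADDSUBPD macro
; above is expected to leave behind. They were worked out by hand from the opcodes quoted in
; the comments, not captured from a MASM 6.14 listing, so treat them as a sketch.
;
; ADDSUBPD xmm0, xmm1
;   db 066h + addps emits : 66 0F 58 C1
;   after the org/db patch: 66 0F D0 C1      (do_override = 0, patched byte is start_modify+1)
;
; ADDSUBPD xmm0, fs:[eax]
;   db 066h + addps emits : 66 64 0F 58 00
;   after the org/db patch: 66 64 0F D0 00   (do_override = 1, patched byte is start_modify+2)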

ADDSUBPS macro reg1:req, reg2:req
local do_override
local start_modify
local end_modify
local reg_sub

do_override = 0 ;Assume no segment override until one is found.
reg_sub textequ @SubStr(reg2,1,3) ;Get the first 3 characters of the second parameter
;Check whether there is a segment override
IFIDNI reg_sub, <cs:> ;CS override?
do_override = 1
ELSEIFIDNI reg_sub, <ds:>
do_override = 1
ELSEIFIDNI reg_sub, <es:>
do_override = 1
ELSEIFIDNI reg_sub, <fs:>
do_override = 1
ELSEIFIDNI reg_sub, <gs:>
do_override = 1
ELSEIFIDNI reg_sub, <ss:>
do_override = 1
ENDIF


;A segment override prefix is emitted in front of the 0F opcode byte. So the byte we modify to convert the
; ADDPS to an ADDSUBPS moves by 1 byte if a segment override has been used in a memory access.

db 0F2h ;We add an F2h in front since the opcode for ADDSUBPS is "F2,0F,D0,/r"
start_modify equ $
addps reg1,reg2 ;I am using ADDPS since it is supported with MASM 6.14, which comes with MASM32
end_modify equ $
org (start_modify+1+do_override);Go back to the "58" and change it to a "D0"
db 0D0h ;Change the ADDPS from a "0f 58 /r" to a "F2,0F,D0,/r" which is an ADDSUBPS
org (end_modify) ;Go to the last byte in the opcode.
endm

HADDPD macro reg1:req, reg2:req
local do_override
local start_modify
local end_modify
local reg_sub

do_override = 0 ;Assume no segment override until one is found.
reg_sub textequ @SubStr(reg2,1,3) ;Get the first 3 characters of the second parameter
;Check whether there is a segment override
IFIDNI reg_sub, <cs:> ;CS override?
do_override = 1
ELSEIFIDNI reg_sub, <ds:>
do_override = 1
ELSEIFIDNI reg_sub, <es:>
do_override = 1
ELSEIFIDNI reg_sub, <fs:>
do_override = 1
ELSEIFIDNI reg_sub, <gs:>
do_override = 1
ELSEIFIDNI reg_sub, <ss:>
do_override = 1
ENDIF


;A segment override prefix is emitted in front of the 0F opcode byte. So the byte we modify to convert the
; ADDPD to a HADDPD moves by 1 byte if a segment override has been used in a memory access.

db 066h ;This forces the following ADDPS to really be an ADDPD.
start_modify equ $
addps reg1,reg2 ;I am using ADDPS since it is supported with MASM 6.14, which comes with MASM32
end_modify equ $
org (start_modify+1+do_override);Go back to the "58" and change it to a "7C"
db 07Ch ;Change the ADDPD from a "66 0f 58 /r" to a "66,0F,7C,/r" which is a HADDPD
org (end_modify) ;Go to the last byte in the opcode.
endm

HADDPS macro reg1:req, reg2:req
local do_override
local start_modify
local end_modify
local reg_sub

do_override = 0 ;Assume no segment override until one is found.
reg_sub textequ @SubStr(reg2,1,3) ;Get the first 3 characters of the second parameter
;Check whether there is a segment override
IFIDNI reg_sub, <cs:> ;CS override?
do_override = 1
ELSEIFIDNI reg_sub, <ds:>
do_override = 1
ELSEIFIDNI reg_sub, <es:>
do_override = 1
ELSEIFIDNI reg_sub, <fs:>
do_override = 1
ELSEIFIDNI reg_sub, <gs:>
do_override = 1
ELSEIFIDNI reg_sub, <ss:>
do_override = 1
ENDIF


;A segment override prefix is emitted in front of the 0F opcode byte. So the byte we modify to convert the
; ADDPS to a HADDPS moves by 1 byte if a segment override has been used in a memory access.

db 0F2h ;We add an F2h in front since the opcode for HADDPS is "F2,0F,7C,/r"
start_modify equ $
addps reg1,reg2 ;I am using ADDPS since it is supported with MASM 6.14, which comes with MASM32
end_modify equ $
org (start_modify+1+do_override);Go back to the "58" and change it to a "7C"
db 07Ch ;Change the ADDPS from a "0f 58 /r" to a "F2,0F,7C,/r" which is a HADDPS
org (end_modify) ;Go to the last byte in the opcode.
endm

HSUBPD macro reg1:req, reg2:req
local do_override
local start_modify
local end_modify
local reg_sub

do_override = 0 ;Assume no segment override until one is found.
reg_sub textequ @SubStr(reg2,1,3) ;Get the first 3 characters of the second parameter
;Check whether there is a segment override
IFIDNI reg_sub, <cs:> ;CS override?
do_override = 1
ELSEIFIDNI reg_sub, <ds:>
do_override = 1
ELSEIFIDNI reg_sub, <es:>
do_override = 1
ELSEIFIDNI reg_sub, <fs:>
do_override = 1
ELSEIFIDNI reg_sub, <gs:>
do_override = 1
ELSEIFIDNI reg_sub, <ss:>
do_override = 1
ENDIF


;A segment override prefix is emitted in front of the 0F opcode byte. So the byte we modify to convert the
; ADDPD to a HSUBPD moves by 1 byte if a segment override has been used in a memory access.

db 066h ;This forces the following ADDPS to really be an ADDPD.
start_modify equ $
addps reg1,reg2 ;I am using ADDPS since it is supported with MASM 6.14, which comes with MASM32
end_modify equ $
org (start_modify+1+do_override);Go back to the "58" and change it to a "7D"
db 07Dh ;Change the ADDPD from a "66 0f 58 /r" to a "66,0F,7D,/r" which is a HSUBPD
org (end_modify) ;Go to the last byte in the opcode.
endm

HSUBPS macro reg1:req, reg2:req
local do_override
local start_modify
local end_modify
local reg_sub

do_override = 0 ;Assume no segment override until one is found.
reg_sub textequ @SubStr(reg2,1,3) ;Get the first 3 characters of the second parameter
;Check whether there is a segment override
IFIDNI reg_sub, <cs:> ;CS override?
do_override = 1
ELSEIFIDNI reg_sub, <ds:>
do_override = 1
ELSEIFIDNI reg_sub, <es:>
do_override = 1
ELSEIFIDNI reg_sub, <fs:>
do_override = 1
ELSEIFIDNI reg_sub, <gs:>
do_override = 1
ELSEIFIDNI reg_sub, <ss:>
do_override = 1
ENDIF


;A segment override prefix is emitted in front of the 0F opcode byte. So the byte we modify to convert the
; ADDPS to a HSUBPS moves by 1 byte if a segment override has been used in a memory access.

db 0F2h ;We add an F2h in front since the opcode for HSUBPS is "F2,0F,7D,/r"
start_modify equ $
addps reg1,reg2 ;I am using ADDPS since it is supported with MASM 6.14, which comes with MASM32
end_modify equ $
org (start_modify+1+do_override);Go back to the "58" and change it to a "7D"
db 07Dh ;Change the ADDPS from a "0f 58 /r" to a "F2,0F,7D,/r" which is a HSUBPS
org (end_modify) ;Go to the last byte in the opcode.
endm

MOVDDUP macro reg1:req, reg2:req
local do_override
local start_modify
local end_modify
local reg_sub

do_override = 0 ;Assume no segment override until one is found.
reg_sub textequ @SubStr(reg2,1,3) ;Get the first 3 characters of the second parameter
;Check whether there is a segment override
IFIDNI reg_sub, <cs:> ;CS override?
do_override = 1
ELSEIFIDNI reg_sub, <ds:>
do_override = 1
ELSEIFIDNI reg_sub, <es:>
do_override = 1
ELSEIFIDNI reg_sub, <fs:>
do_override = 1
ELSEIFIDNI reg_sub, <gs:>
do_override = 1
ELSEIFIDNI reg_sub, <ss:>
do_override = 1
ENDIF


;A segment override prefix is emitted in front of the 0F opcode byte. So the byte we modify to convert the
; ADDPS to a MOVDDUP moves by 1 byte if a segment override has been used in a memory access.

db 0F2h ;We add an F2h in front since the opcode for MOVDDUP is "F2,0F,12,/r"
start_modify equ $
addps reg1,reg2 ;I am using ADDPS since it is supported with MASM 6.14, which comes with MASM32
end_modify equ $
org (start_modify+1+do_override);Go back to the "58" and change it to a "12"
db 012h ;Change the ADDPS from a "0f 58 /r" to a "F2,0F,12,/r" which is a MOVDDUP
org (end_modify) ;Go to the last byte in the opcode.
endm

MOVSLDUP macro reg1:req, reg2:req
local do_override
local start_modify
local end_modify
local reg_sub

do_override = 0 ;Assume no segment override until one is found.
reg_sub textequ @SubStr(reg2,1,3) ;Get the first 3 characters of the second parameter
;Check whether there is a segment override
IFIDNI reg_sub, <cs:> ;CS override?
do_override = 1
ELSEIFIDNI reg_sub, <ds:>
do_override = 1
ELSEIFIDNI reg_sub, <es:>
do_override = 1
ELSEIFIDNI reg_sub, <fs:>
do_override = 1
ELSEIFIDNI reg_sub, <gs:>
do_override = 1
ELSEIFIDNI reg_sub, <ss:>
do_override = 1
ENDIF


;A segment override prefix is emitted in front of the 0F opcode byte. So the byte we modify to convert the
; ADDPS to a MOVSLDUP moves by 1 byte if a segment override has been used in a memory access.

db 0F3h ;We add an F3h in front since the opcode for MOVSLDUP is "F3,0F,12,/r"
start_modify equ $
addps reg1,reg2 ;I am using ADDPS since it is supported with MASM 6.14, which comes with MASM32
end_modify equ $
org (start_modify+1+do_override);Go back to the "58" and change it to a "12"
db 012h ;Change the ADDPS from a "0f 58 /r" to a "F3,0F,12,/r" which is a MOVSLDUP
org (end_modify) ;Go to the last byte in the opcode.
endm

MOVSHDUP macro reg1:req, reg2:req
local do_override
local start_modify
local end_modify
local reg_sub

do_override = 0 ;Assume no segment override until one is found.
reg_sub textequ @SubStr(reg2,1,3) ;Get the first 3 characters of the second parameter
;Check whether there is a segment override
IFIDNI reg_sub, <cs:> ;CS override?
do_override = 1
ELSEIFIDNI reg_sub, <ds:>
do_override = 1
ELSEIFIDNI reg_sub, <es:>
do_override = 1
ELSEIFIDNI reg_sub, <fs:>
do_override = 1
ELSEIFIDNI reg_sub, <gs:>
do_override = 1
ELSEIFIDNI reg_sub, <ss:>
do_override = 1
ENDIF


;A segment override prefix is emitted in front of the 0F opcode byte. So the byte we modify to convert the
; ADDPS to a MOVSHDUP moves by 1 byte if a segment override has been used in a memory access.

db 0F3h ;We add an F3h in front since the opcode for MOVSHDUP is "F3,0F,16,/r"
start_modify equ $
addps reg1,reg2 ;I am using ADDPS since it is supported with MASM 6.14, which comes with MASM32
end_modify equ $
org (start_modify+1+do_override);Go back to the "58" and change it to a "16"
db 016h ;Change the ADDPS from a "0f 58 /r" to a "F3,0F,16,/r" which is a MOVSHDUP
org (end_modify) ;Go to the last byte in the opcode.
endm

;The MONITOR instruction requires certain registers to be set up PRIOR to calling the instruction.
MONITOR macro
db 0Fh,01h,0C8h ;Opcodes for MONITOR
endm

;The MWAIT instruction requires certain registers to be set up PRIOR to calling the instruction.
MWAIT macro
db 0Fh,01h,0C9h ;Opcodes for MWAIT
endm
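
;Illustration only (not part of the original macros): a sketch of the register setup that the
; two comments above refer to, following the Intel documentation for MONITOR and MWAIT.
; "wake_flag" and "awake" are hypothetical names. Both instructions are normally only usable
; in ring 0, so this pattern belongs in driver or OS code.
;
;   mov eax, offset wake_flag ;EAX = address of the memory to monitor
;   xor ecx, ecx              ;ECX = extensions (none)
;   xor edx, edx              ;EDX = hints (none)
;   MONITOR                   ;arm the monitor on the address range holding wake_flag
;   cmp wake_flag, 0          ;re-check after arming so an earlier write is not missed
;   jne awake
;   xor eax, eax              ;EAX = hints (none)
;   xor ecx, ecx              ;ECX = extensions (none)
;   MWAIT                     ;wait for a write to the monitored range or an interrupt
; awake: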

;The LDDQU instruction does not support an XMM register as its source operand. So if you pass in an XMM register
; as the second parameter, the macro prints an error message and does an .ERR
LDDQU macro reg1:req, reg2:req
local do_override
local start_modify
local end_modify
local reg_sub

;we do not allow the second parameter to be a register. It has to be a memory location
IFIDNI <reg2>, <xmm0> ;XMM register passed as the source?
echo ERROR: The second parameter ( source) has to be a memory location
.err
ELSEIFIDNI <reg2>, <xmm1> ;XMM register passed as the source?
echo ERROR: The second parameter ( source) has to be a memory location
.err
ELSEIFIDNI <reg2>, <xmm2> ;XMM register passed as the source?
echo ERROR: The second parameter ( source) has to be a memory location
.err
ELSEIFIDNI <reg2>, <xmm3> ;XMM register passed as the source?
echo ERROR: The second parameter ( source) has to be a memory location
.err
ELSEIFIDNI <reg2>, <xmm4> ;XMM register passed as the source?
echo ERROR: The second parameter ( source) has to be a memory location
.err
ELSEIFIDNI <reg2>, <xmm5> ;XMM register passed as the source?
echo ERROR: The second parameter ( source) has to be a memory location
.err
ELSEIFIDNI <reg2>, <xmm6> ;XMM register passed as the source?
echo ERROR: The second parameter ( source) has to be a memory location
.err
ELSEIFIDNI <reg2>, <xmm7> ;XMM register passed as the source?
echo ERROR: The second parameter ( source) has to be a memory location
.err
ENDIF

do_override = 0 ;Assume no segment override until one is found.
reg_sub textequ @SubStr(reg2,1,3) ;Get the first 3 characters of the second parameter
;Check whether there is a segment override
IFIDNI reg_sub, <cs:> ;CS override?
do_override = 1
ELSEIFIDNI reg_sub, <ds:>
do_override = 1
ELSEIFIDNI reg_sub, <es:>
do_override = 1
ELSEIFIDNI reg_sub, <fs:>
do_override = 1
ELSEIFIDNI reg_sub, <gs:>
do_override = 1
ELSEIFIDNI reg_sub, <ss:>
do_override = 1
ENDIF


;A segment override prefix is emitted in front of the 0F opcode byte. So the byte we modify to convert the
; ADDPS to a LDDQU moves by 1 byte if a segment override has been used in a memory access.

db 0F2h ;We add an F2h in front since the opcode for LDDQU is "F2,0F,F0,/r"
start_modify equ $
addps reg1,reg2 ;I am using ADDPS since it is supported with MASM 6.14, which comes with MASM32
end_modify equ $
org (start_modify+1+do_override);Go back to the "58" and change it to a "F0"
db 0F0h ;Change the ADDPS from a "0f 58 /r" to a "F2,0F,F0,/r" which is a LDDQU
org (end_modify) ;Go to the last byte in the opcode.
endm

;FISTTP works on 3 different memory operand sizes: word, dword, and qword. Don't forget to specify the appropriate pointer when
; using it with memory. If you don't specify a pointer, or specify one of an unsupported size, the macro prints an error message and forces an .ERR.
FISTTP macro reg1:req
local do_override
local start_modify
local end_modify
local reg_sub
local fisttp_size
local fisttp_type
local cur_offset
local last_offset

do_override = 0 ;Assume no segment override until one is found.

last_offset = @SizeStr(reg1) ;Get size of passed in argument.
last_offset = last_offset - 3 ;We look at 3 bytes in the argument at a time.
cur_offset = 1 ;Start at the first byte of the argument.

;I could not get InStr to work properly so I had to do it this way by scanning through the passed in value a byte at a time
; looking for an override.
WHILE cur_offset LE last_offset
reg_sub textequ @SubStr(reg1,cur_offset,3) ;Get the 3 characters of the parameter that start at cur_offset

IFIDNI reg_sub, <cs:> ;CS override?
do_override = 1
ELSEIFIDNI reg_sub, <ds:>
do_override = 1
ELSEIFIDNI reg_sub, <es:>
do_override = 1
ELSEIFIDNI reg_sub, <fs:>
do_override = 1
ELSEIFIDNI reg_sub, <gs:>
do_override = 1
ELSEIFIDNI reg_sub, <ss:>
do_override = 1
ENDIF

cur_offset = cur_offset + 1
endm


;the TYPE operator returns the following when used in REG1 which is passed to the macro
;1 if no pointer used ( it assumes byte)
;1 if byte pointer used
;2 if word pointer used
;4 if dword pointer used
;8 if qword pointer used
fisttp_size equ TYPE reg1 ;1 if byte pointer, 2 if word pointer, 4 if dword pointer, and 8 if qword pointer
if fisttp_size EQ 1
echo ERROR: you have to pass in "word ptr", "dword ptr" or "qword ptr".
.err
elseif fisttp_size EQ 2
fisttp_type equ 0DFh ;opcode for using WORD sized fisttp
elseif fisttp_size EQ 4
fisttp_type equ 0DBh ;opcode for using DWORD sized fisttp
elseif fisttp_size EQ 8
fisttp_type equ 0DDh ;opcode for using QWORD sized fisttp
else
echo ERROR: you have to pass in "word ptr", "dword ptr" or "qword ptr".
.err
endif

;A segment override prefix is emitted in front of the opcode byte. So the byte we modify to convert the
; FIMUL to a FISTTP moves by 1 byte if a segment override has been used in a memory access.

start_modify equ $
fimul word ptr reg1 ;adds on the segment override prefix if one is present.
;Also forces the user to pass in a valid memory address
end_modify equ $
org (start_modify+do_override);Go back to the opcode byte of the FIMUL
db fisttp_type ;Change the FIMUL byte to a FISTTP byte
org (end_modify) ;Go to the last byte in the opcode.
endm
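
A short usage sketch may help here. It assumes the macros above have been saved to a file called sse3.inc (a made-up name) and that the CPU is a Prescott or later. Memory operands for the SSE3 macros are passed without a size override, since those macros only look at the first 3 characters of the second parameter when checking for a segment override. The names vec, val, result and demo are also hypothetical, and the byte values in the comments were worked out by hand from the opcodes listed in the macro comments, not taken from an assembler listing.

```asm
; Usage sketch only. sse3.inc, vec, val, result and demo are hypothetical names.
        .686
        .xmm
        .model flat, stdcall

        include sse3.inc            ; the SSE3 macros from this post

        .data
vec     real4 1.0, 2.0, 3.0, 4.0    ; four packed singles
val     real4 2.75                  ; value to truncate with FISTTP
result  dd 0

        .code
demo    proc
        mov      esi, offset vec
        movups   xmm0, [esi]        ; xmm0 = 1.0, 2.0, 3.0, 4.0 (low to high)
        HADDPS   xmm0, xmm0         ; F2 0F 7C C0 -> xmm0 = 3.0, 7.0, 3.0, 7.0
        HADDPS   xmm0, xmm0         ; all four lanes now hold 10.0

        MOVSHDUP xmm1, [esi]        ; F3 0F 16 0E -> xmm1 = 2.0, 2.0, 4.0, 4.0
        LDDQU    xmm2, [esi]        ; F2 0F F0 16 -> unaligned 16-byte load

        mov      edi, offset result
        fld      val                ; st(0) = 2.75
        FISTTP   dword ptr [edi]    ; DB 0F -> stores 2 (truncated) and pops st(0)
        ret
demo    endp
        end
```

FISTTP always truncates toward zero regardless of the FPU rounding mode, which is why 2.75 is stored as 2 here without touching the control word.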

Posted on 2004-12-01 15:11:59 by mark_larson