I got a little tired of commenting my code
in English, so I decided to submit the whole, slightly
optimized code.
It will be more interesting for you to compare it
with the original and figure out why I changed some parts.
I don't post the rc file because it's the same.
And it is in no way a super optimization. Not even close.
Basics! Only the common ineffective and illogical parts are optimized,
the ones common to 90% of the asm source code I have seen.
Maybe we'll discuss some of this in detail later.
; #########################################################################
;
; The framing effects are drawn on the client area by a single procedure,
; "Frame3D". In the WndProc message handling proc, the WM_PAINT message
; calls another proc called "Paint_Proc", which contains the procedure calls
; to "Frame3D".
;
; #########################################################################
.386
.model flat, stdcall
option casemap :none ; case sensitive
; #########################################################################
include C:\masm32\include\windows.inc
include C:\masm32\include\user32.inc
include C:\masm32\include\kernel32.inc
include C:\masm32\include\gdi32.inc
includelib user32.lib
includelib kernel32.lib
includelib gdi32.lib
; #########################################################################
;=============
; Local macros
;=============
szText MACRO Name, Text:VARARG
LOCAL lbl
jmp lbl
Name db Text,0
lbl:
ENDM
;=================
; Local prototypes
;=================
WinMain PROTO
WndProc PROTO :DWORD,:DWORD,:DWORD,:DWORD
Paint_Proc PROTO :DWORD, hDC:DWORD
Frame3D PROTO :DWORD,:DWORD,:DWORD,:DWORD,:DWORD,:DWORD,:DWORD,:DWORD
PushButton PROTO :DWORD,:DWORD,:DWORD,:DWORD,:DWORD,:DWORD,:DWORD
.data
szDisplayName db "3D Frames",0
hWnd dd 0
hInstance equ 400000h
wc WNDCLASSEX
.code
start:
invoke WinMain
; invoke ExitProcess,eax is only needed when the exit code is used to communicate
; between processes. If you don't write multi-process code, you need not care about
; the exit code and can just call ExitProcess.
call ExitProcess
; #########################################################################
WinMain proc
;====================
; Put LOCALs on stack
;====================
LOCAL msg :MSG
;==================================================
; Fill WNDCLASSEX structure with required variables
;==================================================
xor ebx,ebx
invoke LoadIcon,hInstance,500 ; icon ID
mov wc.hIcon, eax
invoke LoadCursor,ebx,IDC_ARROW
mov wc.hCursor, eax
invoke RegisterClassEx, ADDR wc
push eax
;================================
; Centre window at following size
;================================
invoke GetSystemMetrics,SM_CXSCREEN
mov esi,eax ;esi == screen X
invoke GetSystemMetrics,SM_CYSCREEN ; eax == screen Y
mov ecx,eax
shr esi,1
shr ecx,1
sub esi,500/2
sub ecx,350/2
szText szClassName,1 ;1 byte for class name :)
pop edx ;edx = atom ID
invoke CreateWindowEx,WS_EX_LEFT,
edx,
ADDR szDisplayName,
WS_OVERLAPPEDWINDOW or WS_VISIBLE,
esi,ecx,500,350,
ebx,ebx,
hInstance,ebx
mov hWnd,eax
invoke LoadMenu,hInstance,600 ; menu ID
invoke SetMenu,hWnd,eax
Alex,
There are two problems with the approach you have taken to this
piece of example code. Writing structures in the DATA section
is a bad practice, you can get away with it in a small test app
but if you wrote a larger app with all of the structures loaded
into the .DATA section, you will blow the size of the app out
for no purpose.
If speed matters with the structure, which it rarely does, having
it on the stack increases the access speed, as the stack is faster
to access than the .DATA section. I am simply citing the Intel data
here: stack data is in cache in most instances, whereas .DATA
section data is a longer and slower fetch.
With the framing procedure, optimisation at close range with assembler
mnemonics is a waste of time that makes the code a lot harder to
read. Include one API call in an algorithm and all of the speed
optimisations are lost. That algorithm has multiple recursive API
calls, so the gain is of no use but the loss of readability is a
sizeable loss.
I agree with your approach when it comes to designing assembler algorithms,
as there are very good performance advantages in doing so, but with
API code it's a waste of time that leaves the code much more difficult
to read.
Regards,
hutch@pbq.com.au
This message was edited by hutch--, on 3/30/2001 7:49:19 PM
Well, let's see the figures.
Let them judge us.
Putting structures in the .DATA section is good practice:
1. Because it is faster.
2. Because it DECREASES size, it does NOT INCREASE it.
Compile my example and you'll see that it is 1 KB smaller than yours.
The size of an app is the sum of everything in the file: data + code.
Look at the code needed TO FILL WNDCLASSEX in locals.
It's from your exe.
Start address:
.004010B3: C745D030000000 mov d,[-0030],000000030
.004010BA: C745D403200000 mov d,[-002C],000002003
.004010C1: C745D8F5114000 mov d,[-0028],0004011F5
.004010C8: C745DC00000000 mov d,[-0024],000000000
.004010CF: C745E000000000 mov d,[-0020],000000000
.004010D6: FF7508 push d,[00008]
.004010D9: 8F45E4 pop d,[-001C]
.004010DC: C745F010000000 mov d,[-0010],000000010
.004010E3: C745F400000000 mov d,[-000C],000000000
.004010EA: C745F856114000 mov d,[-0008],000401156
.004010F1: 68F4010000 push 0000001F4
.004010F6: FF7508 push d,[00008]
.004010F9: E83E050000 call .00040163C -------- (1)
.004010FE: 8945E8 mov [-0018],eax
.00401101: 68007F0000 push 000007F00
.00401106: 6A00 push 000
.00401108: E829050000 call .000401636 -------- (2)
.0040110D: 8945EC mov [-0014],eax
.00401110: C745FC00000000 mov d,[-0004],000000000
-----------------------
Here we pass the parameter to RegisterClassEx (end address):
.00401117: 8D45D0 lea eax,[-0030]
117h - 0B3h = 64h = 100 bytes!
You need 100 bytes of code to fill it on the stack.
I put it in the .data section.
You can see in the code above that SIZEOF WNDCLASSEX = 30h (first line).
And I need to fill just a few members in code.
Start address:
.0040107E: 68F4010000 push 0000001F4
.00401083: 6800004000 push 000400000
.00401088: E87D040000 call .00040150A -------- (3) ;call for LoadIcon
.0040108D: A326204000 mov [000402026],eax
.00401092: 68007F0000 push 000007F00
.00401097: 53 push ebx
.00401098: E867040000 call .000401504 -------- (4) ;call for LoadCursor
.0040109D: A32A204000 mov [00040202A],eax
------------------------------------------
Here we pass the parameter to RegisterClassEx (address):
.004010A2: 680E204000 push 00040200E
0A7h - 7Eh = 29h (counting through the end of the 5-byte push at .004010A2)
29h (code) + 30h (data) = 59h = 89 bytes
That's why all those big C procs are so fat.
My proc saves 11 bytes on one structure.
And with every structure it will save more bytes against a program that fills the same struct in locals.
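For illustration, the static layout being compared here might look like the sketch below. It is a fragment meant to sit in the listing at the top of the thread, so it assumes the same include files and the WndProc prototype from there. The initializer values are inferred from the disassembly above (30h, 2003h, 10h); the class name string and the proc name RegisterStatic are hypothetical and not taken from the original source.

```asm
; Sketch only: assumes windows.inc / user32.inc and "WndProc PROTO" as in the listing.
hInstance equ 400000h                   ; link-time image base, as in the listing

.data
szClassName db "Frame3D_Class",0        ; hypothetical class name
; Every member known at assembly time is initialized here, so it costs
; data bytes (30h in total) but no code bytes.
wc WNDCLASSEX <SIZEOF WNDCLASSEX, \
               CS_HREDRAW or CS_VREDRAW or CS_BYTEALIGNWINDOW, \
               OFFSET WndProc, 0, 0, hInstance, 0, 0, \
               COLOR_BTNFACE+1, 0, OFFSET szClassName, 0>

.code
RegisterStatic proc                     ; hypothetical name
    ; Only the two handles that exist at run time are stored by code.
    invoke LoadIcon, hInstance, 500     ; icon ID from the rc file
    mov wc.hIcon, eax
    invoke LoadCursor, 0, IDC_ARROW
    mov wc.hCursor, eax
    invoke RegisterClassEx, ADDR wc     ; eax = class atom, or 0 on failure
    ret
RegisterStatic endp
```

The two stores to wc.hIcon and wc.hCursor correspond to the two `mov [...],eax` instructions in the shorter dump above.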
I'll write about the rest later; I'm tired of my English for today :)
Steve, is it possible to make Frame3D round or oval?
The Svin.
Alex,
Here is where I see the problem with the approach you are using,
to get the size reduction, you have used a constant for the
preferred load address of a PE EXE file but it will not work in
a DLL for the obvious reason.
wc WNDCLASSEX
Next you have not loaded the class cursor or the application's Icon
which is a reduction in code size but a reduction in performance
as well.
The approach I take to get the most efficient use of repeated
structures is used in the template generated by the latest version
of Prostart, it writes a procedure that receives different parameters
that are loaded into a WNDCLASSEX structure to register as many
classes as you require.
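A procedure of the kind described above could be sketched roughly as follows. This is not the code Prostart generates; the name RegisterWinClass, its parameter list, and the chosen style and background values are assumptions made for illustration. It assumes windows.inc and user32.inc are included as in the listing at the top of the thread.

```asm
; Hypothetical helper: fills a WNDCLASSEX on the stack from its parameters
; and registers it, so one body of code serves any number of classes.
RegisterWinClass proc lpWndProc:DWORD, lpClassName:DWORD, Icon:DWORD, Cursor:DWORD, hInst:DWORD

    LOCAL wcx :WNDCLASSEX

    mov wcx.cbSize, SIZEOF WNDCLASSEX
    mov wcx.style, CS_HREDRAW or CS_VREDRAW
    mov eax, lpWndProc
    mov wcx.lpfnWndProc, eax
    mov wcx.cbClsExtra, 0
    mov wcx.cbWndExtra, 0
    mov eax, hInst
    mov wcx.hInstance, eax
    mov eax, Icon
    mov wcx.hIcon, eax
    mov wcx.hIconSm, eax                ; reuse the same icon as the small icon
    mov eax, Cursor
    mov wcx.hCursor, eax
    mov wcx.hbrBackground, COLOR_BTNFACE+1
    mov wcx.lpszMenuName, 0
    mov eax, lpClassName
    mov wcx.lpszClassName, eax

    invoke RegisterClassEx, ADDR wcx    ; eax = class atom, or 0 on failure
    ret

RegisterWinClass endp
```

The fill code is paid for once; each additional class then costs only the pushes for one call.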
You will not see it with one structure but you will see it with
multiple structures loaded statically in the .DATA section, once
you get past the original 512 byte section size, it will start to
blow out in size.
I am guilty of size optimisation in a toy like TheGun which has
been hammered to death to keep its size down but it is done at the
level of architecture, you simply cannot get the size reduction
by close range mnemonic selection.
Regards,
hutch@pbq.com.au
Steve,
1. My size decreased for a simple reason: in any situation it is shorter
to initialize data at the compile stage than at run time.
I say it again: the size of data + code with the struct in the data section will be less
than just the size of the code needed to init this struct on the stack.
2. 400000h is the default image base for .exe files. It's not a preferred loading address,
it's the image base, which is known at the link stage.
For a DLL the default will be 10000000h.
Anyway, you can change it with the BASE option at any time.
But it will be you, not black magic, who changes it, so it will be the same guy who writes
the code, and if he knows what he is doing he sets the appropriate constant for hInstance according
to the image base his type of PE is supposed to have and the BASE option he
sets for the linker.
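To make the two options in point 2 concrete, here is a minimal sketch. The variable name hInst and the proc name GetInst are hypothetical; the fragment assumes windows.inc and kernel32.inc as in the listing at the top of the thread.

```asm
; Option 1: link-time constant. Correct only while the image is actually
; loaded at this address, so it has to match the linker's /BASE setting.
hInstance equ 400000h

; Option 2: ask the loader at run time. Costs a call and a DWORD of
; uninitialized data, but stays correct whatever the base turns out to be.
.data?
hInst dd ?                          ; hypothetical variable name

.code
GetInst proc                        ; hypothetical name
    invoke GetModuleHandle, NULL    ; NULL = the module that created the process
    mov hInst, eax
    ret
GetInst endp
```

With option 1 the handle is an immediate operand (for example `push 400000h`); with option 2 it is a memory operand.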
3. I DID load the cursor and the icon. I loaded them at run time. I loaded everything that you load
into WNDCLASSEX in your program. And while comparing, I counted (added) both the size of the code
I need and the size in the .data section, and used the sum against your size of code.
4. You write about the 512-byte granularity, but the same rule applies to the .code section, and
with every new class you fill you'll get the same result. The only difference will be that
my file will grow because of the .data section, and yours because of the .code section.
5. You can reuse the structure in .data too.
6. Anyway, in this particular program I factually optimized size and speed.
And while optimizing, we do not optimize ALL PROGRAMS, we optimize this particular one.
And the facts are: it becomes shorter and faster.
You are not sure whether it would be useful in some programs with multiple classes?
Let's check it out then!
We both put knowledge and facts above anything else. I'll be glad if you prove me wrong;
I gain either way.
So give me a source where putting class structures in locals DOES matter, and
I'll try to do the same with them in the .data section.
The result will judge us.
I hope we respect each other well enough to keep our feelings out of it.
At least I do.
The Svin.
Swin
Any test will show that accessing data is *not* slower in .data
than it is when the data are on the stack.
There is no reason to declare data on the stack when they can be declared
in .data, apart from saving data room in the PE file (the naming-conflict
argument is purely theoretical nonsense).
Declaring structure data on the stack is:
. A waste of time
. Unreadable
. Longer to write (initialising)
. Slower to run (initialising)
The only case where declaring structures on the stack has some
advantage is when the structure is not to be initialized by your
app but by Windows, to give you information.
This is the case, for example, with the PAINTSTRUCT required for
BeginPaint calls (such cases are not many...).
As for the wasted data size, this is *not* a problem, as structures
in a huge PE are usually only a very tiny part of the lot of other
data you will usually have to carry, whereas the wasted code size
is a problem because it turns Asm into just another inefficient C-style coding.
The reason why we see so many Asm examples with stack structures
is that 1) they are often translated from C, and 2) they are often
written by people who first learned C and do not ask themselves whether
this is appropriate or not. Trying to prove to them that they are wrong is
usually a waste of time too... as you can see above, even after you prove
that 1+1=2.
betov.
Alex,
I have a very different approach to size optimisation that is based on the
architecture of the EXE file, change the .DATA section to a .DATA? section
.data?
CommandLine dd ?
hWnd dd ?
hInstance dd ?
Put the string data in the code section,
jmp @F
szDisplayName db "3D Frames",0
@@:
And voila ! it drops by one section in size.
Build it with this batch file and it drops another section in size,
@echo off
if exist 3dframes.obj del 3dframes.obj
if exist 3dframes.exe del 3dframes.exe
\masm32\bin\rc /v rsrc.rc
\masm32\bin\cvtres /machine:ix86 rsrc.res
\masm32\bin\ml /c /coff /nologo 3dframes.asm
\masm32\bin\Link /SUBSYSTEM:WINDOWS /MERGE:.rdata=.text 3dframes.obj rsrc.obj > nul
dir 3dframes.*
pause
And this is so far without any internal code size reduction. The next trick
is to write the complete text data into one of the spaces between sections
and address the offset manually.
Targeting an introductory example that is written as simply as possible is
not really where the action is. A complete rewrite with size in mind would
drop its size further, but the intelligibility would reduce to ZERO.
Rene,
I am glad you have risen from the dead, if it was not after midnight I would
find you the reference in the Intel data about why you prefer stack to
global data in the data section but you are a big boy now and can look it up
yourself. If you are into SIMD instructions, you could fix the speed of global
access if it bothered you by using the "prefetch" mnemonics.
Regards,
hutch@pbq.com.au
to betov:
Well said, thanks. I had the same thoughts but no words to express them.
Though I believe Steve is a sensible man and may some day consider it
worthwhile to study the issue a little more carefully.
to Steve:
Yes, I know these techniques. I wouldn't call it a different approach, because they don't
exclude each other.
In particular programs I use some of them or all of them.
Sometimes I put all data and code in one section and make it EWR (executable, writable, readable).
Though I wouldn't put writable data in a critical section of code (critical in the sense of speed,
not as a thread issue).
Mind that the Pentium has separate level-1 caches for code and data, with separate prefetch
queues, and if you don't balance data and code between them, it may harm performance.
Prefetch is not a stack issue, it's a data usage frequency issue.
For example, change your stack pointer to the address of the .data section.
What is the data in the .data section now? Data in the .data section? Data on the stack? Prefetched? Not prefetched?
Both? :)
Remember when we discussed RevStr, while we worked together on your libraries?
There were some strange results. And all of them were explained by testing as the effect of
data having been read or not, in other words, whether the data is in cache or not.
The Svin.
May I point out something: what the bloody hell is the point of all of this? I mean, Windows is slow, and the only point of optimization is to gain speed and save size. Optimization of registering a window class is really up to you.
But while we are on this topic: szText = BAD
Let's look at 'szText':
szText SumText,"Declared"
jmp @F ;extra bytes for the jmp, just to declare a string
SumText BYTE "Declared",0
@@:
Whereas you could use segments to achieve the same thing:
.data
SumText BYTE "Declared",0
.code
The string is appended onto the end of the data section, with no extra overhead, which is where dsText comes in!!!
DO NOT HARDCODE THE INSTANCE HANDLE OF A DLL !!!
The hInstance of a DLL is the starting location of the DLL in memory. If you load two DLLs (without unloading them) with the same base address, one of them will be relocated, and their hInstance values will be different.
To be safe, if you hardcode the EXE hInstance, you must remove ALL relocation information. I don't know if there is a flag in the EXE for signalling a non-relocatable executable.
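For the DLL case, the loader passes the actual load address to the entry point, so there is no need for a constant. A minimal sketch, assuming windows.inc is included; the names LibMain and hDllInstance are illustrative (the entry point is whatever symbol follows `end`):

```asm
.data?
hDllInstance dd ?                   ; hypothetical variable name

.code
; DLL entry point: hInstDLL is the address this DLL was really loaded at,
; which differs from the link-time base if the DLL was relocated.
LibMain proc hInstDLL:DWORD, reason:DWORD, reserved:DWORD
    .if reason == DLL_PROCESS_ATTACH
        mov eax, hInstDLL
        mov hDllInstance, eax       ; remember it for later resource loads
    .endif
    mov eax, TRUE                   ; nonzero = initialization succeeded
    ret
LibMain endp

end LibMain
```

Any later LoadIcon or LoadMenu call inside the DLL would then use hDllInstance instead of a hardcoded base.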
There is no real speed loss in initializing a PAINTSTRUCT, because the initialization is performed in all cases by BeginPaint.
If we want to truly save space, allocate PAINTSTRUCT on the stack only when we receive a WM_PAINT message.
Reason:
DefWindowProc will use SendMessage on some messages and cause recursion.
For example, WM_CLOSE will call DestroyWindow which will SendMessage(WM_DESTROY) to the same window. In this case, if you allocate a PAINTSTRUCT before dispatching the messages, you will create (at least) two instances which are not used.
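The point about recursion can be sketched like this: WndProc itself declares no LOCALs, and the PAINTSTRUCT lives only in a helper that runs on WM_PAINT. The helper name OnPaint is hypothetical, and the fragment assumes windows.inc and user32.inc as in the listing at the top of the thread.

```asm
OnPaint PROTO :DWORD                ; hypothetical helper

.code
WndProc proc hWin:DWORD, uMsg:DWORD, wParam:DWORD, lParam:DWORD
    ; No LOCALs here, so a nested WndProc call (for example WM_DESTROY sent
    ; from inside WM_CLOSE handling) costs only the plain stack frame.
    .if uMsg == WM_PAINT
        invoke OnPaint, hWin
        xor eax, eax                ; message handled
        ret
    .elseif uMsg == WM_DESTROY
        invoke PostQuitMessage, 0
        xor eax, eax
        ret
    .endif
    invoke DefWindowProc, hWin, uMsg, wParam, lParam
    ret
WndProc endp

OnPaint proc hWin:DWORD
    LOCAL ps :PAINTSTRUCT           ; stack space reserved only while painting
    invoke BeginPaint, hWin, ADDR ps    ; Windows fills ps; eax = hDC
    ; ... drawing calls that use the hDC would go here ...
    invoke EndPaint, hWin, ADDR ps
    ret
OnPaint endp
```

This has the same shape as the Paint_Proc split described in the header comment of the original listing.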
to Tank:
1. About relocation: it is stripped by default.
2. You are right about DLL relocation. BUT it's me who coded it, not black magic.
In such a case I wouldn't define the hInstance of a DLL as a constant.
The matter is that while doing a wonderful job of showing how easy it is to code in Win32 ASM,
we are stealing from beginners (who get used to copy and paste, copying meanwhile some
"safe" nonsense from our examples, hPrevInst for example) the opportunity to understand
what and why they are coding. They are compiling programs which have controls, look nice,
etc., and at the same time the authors have no idea what is going on at the low level, and some
of them don't even know the basic opcodes.
And all the advantages of low-level coding (1. Clear understanding 2. Speed 3. Size)
eventually disappear from their creations.
I'm glad that I wrote something that makes us discuss something that is usually out of sight.
My job involves a lot of multiprocessing and multithreading, so I wouldn't get caught by
incorrect usage of this kernel-mode object.
And I write for those who want to understand, not just copy and paste my code.
So I'm not going to give templates for anything and everything.
I'll help those who want to become creators, and I don't care about the rest.
So one of the rules if you want to optimize your code: while
finding common parts and formalizing them in procs and macros, don't make
one uniform that is meant to fit every purpose. It's the same as blinders.
The Svin.