Since mov value1, value2 is a no-no...

what's the preferred method:

mov eax,value2
mov value1,eax

or

push value2
pop value1

is the stack slower than mov'ing?
Posted on 2003-09-05 02:56:19 by drarem
If I remember correctly, moving memory via registers is faster. However, using push/pop does not clobber any registers (esp ends up with the same value after the pair). Anyway, it has been a long time since I last moved memory to memory.
Posted on 2003-09-05 03:30:10 by roticv
drarem, if you're interested in speed "don't settle for just byte or two..."

Anyway, it has been a long time since I last moved memory to memory.
roticv, don't you use masm's invoke? ;) (I'm joking :grin: )
Posted on 2003-09-05 05:14:47 by S.T.A.S.
Thanks for the link, STAS, but for the moment I am taking in mouse x, y and button and working through a 'sketchpad' tutorial, which is coming along nicely so far. I am now trying to figure out how to do antialiasing of lines, which as I understand it means drawing at 2x and then scaling back to 1x via StretchBlt... still working on that one.

The AMD in the link you gave, is that the new 64-bit AMD? Are those actual qword registers? And I just bought a new Intel 3.06 GHz with that hyperthreading B.S... did I get screwed with 32 bits and soon-to-be-outdated technology without knowing it? I just paid a fortune for it too >\

Please excuse my dumb questions but I just have to know...
Posted on 2003-09-05 06:12:02 by drarem
Hi, drarem!
I don't know the 'sketchpad' tutorial :(
Still, I think that if I'm only moving "a few bytes", I don't care about it. But it seems to me that push/pop will sometimes cause less register overhead.
This depends on many things, because the CPU core frequency is sometimes 10x faster than RAM.
I'm not sure whether AMD CodeAnalyst will work on a P4, but you can try Intel's performance analyzer.

This link worked well some time ago on my Athlon 1000 (better than on a P3 1200).

It would have been somewhat better to buy an Intel 3.0C GHz + i865PE.
Your computer will be good for ~1.5 years, if Moore's law holds :)


I don't think I'll see a difference between I32 & A64 for at least a year, but I'm still waiting for the Athlon64 ;)
Posted on 2003-09-05 07:07:21 by S.T.A.S.

roticv , don't you use masm's invoke? ;) (I'm joking :grin: )

I rarely use it, since masm does not like call _label with invoke, which seems to litter my code. :grin: However, could you explain how invoke is related to moving memory to memory?
Posted on 2003-09-05 10:05:55 by roticv

However, could you explain how invoke is related to moving memory to memory?

00001DA2 33 C0 xor EAX, EAX

invoke CreateWindowEx, MAIN.dwExStyle, MAIN.lpClassName,\
MAIN.lpWindowName, MAIN.dwStyle,\
[EBP].x, [EBP].y,\
EAX, EAX,\
EAX, EAX,\
MAIN.hInstance, EAX
00001DA4 50 * push eax
00001DA5 FF 75 B0 * push dword ptr [ebp]+0FFFFFFB0h
00001DA8 50 * push eax
00001DA9 50 * push eax
00001DAA 50 * push eax
00001DAB 50 * push eax
; this moves data from [EBP+x] to [ESP]
00001DAC FF 75 5C * push dword ptr [ebp]+00000005Ch
00001DAF FF 75 58 * push dword ptr [ebp]+000000058h
00001DB2 FF 75 D4 * push dword ptr [ebp]+0FFFFFFD4h
00001DB5 FF 75 D0 * push dword ptr [ebp]+0FFFFFFD0h
00001DB8 FF 75 C4 * push dword ptr [ebp]+0FFFFFFC4h
00001DBB FF 75 CC * push dword ptr [ebp]+0FFFFFFCCh
00001DBE FF 15 00000000 E * call _imp__CreateWindowExA@48
Posted on 2003-09-07 18:46:28 by S.T.A.S.
Hehe,



:0040152D C70584AC400048954000 mov dword ptr [0040AC84], 00409548
:00401537 0FB616 movzx edx, byte ptr [esi]
:0040153A 52 push edx
:0040153B C1EA03 shr edx, 03
:0040153E 83E207 and edx, 00000007
:00401541 8B149554964000 mov edx, dword ptr [4*edx+00409654]
:00401548 891588AC4000 mov dword ptr [0040AC88], edx
:0040154E 893D8CAC4000 mov dword ptr [0040AC8C], edi
:00401554 5A pop edx
:00401555 33C9 xor ecx, ecx
:00401557 E827660000 call 00407B83
:0040155C E9BDFBFFFF jmp 0040111E



inc esi
mov _opcode.field1, offset _adc
movzx edx, byte ptr[esi]
push edx
shr edx, 3
and edx, 111y
mov edx, [reg8+edx*4]
mov _opcode.field2, edi
mov _opcode.field3, edx
pop edx
xor ecx, ecx
call decodemodrm
jmp _ret

I've been passing parameters via registers :) As for API calls, I don't think there's anything interesting there, except for a few pushes of the hdlg parameter for some APIs.
Posted on 2003-09-08 07:58:53 by roticv
is it possible to copy memory by DMA?
Posted on 2003-09-08 08:20:03 by etn
is it possible to copy memory by DMA?

Possible? Yes. Efficient (or faster)? Maybe not.
The same idea was discussed on the NetBSD mailing list a couple of months ago. It could be better than other methods when communicating with a device, but for general-purpose use it is worse than the usual memcpy() implementation under any OS working in protected mode.
Posted on 2003-09-08 16:10:43 by Starless
Originally posted by roticv
:0040152D C70584AC400048954000    mov dword ptr [0040AC84], 00409548


I care about size more than the enter/leave stuff ;) so I prefer to use EBP as a pointer to my global data, and ESP for locals... As you can see, this allows me to use just 3 bytes to pass a DWORD, not only for CALL but for MOV too.
There's really nothing interesting in an API call, except that we can save some (~50%) space ;)
Especially if we don't use stuff like "mov wc.lpszClassName, offset szClassName", but already have predefined values in the .data section...

BTW, is it possible to replace
" call decodemodrm
jmp _ret"
with "jmp decodemodrm"?
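To show what I mean, here is a rough sketch. It assumes the code at _ret is nothing more than a bare RET with no stack cleanup or register restores, which I can't tell from the listing (there the jmp goes to 0040111E). handler_a and handler_b are made-up labels.

```asm
; Hypothetical sketch: only valid if _ret is a plain RET
handler_a:
        call    decodemodrm     ; pushes a return address, runs the routine
        jmp     _ret            ; then jumps to the shared exit

handler_b:
        jmp     decodemodrm     ; tail jump: decodemodrm's own RET returns
                                ; straight to whoever called handler_b

_ret:
        ret                     ; assumed: nothing else happens here
```

The jmp form drops the 5-byte call and one return-address push/pop, but it stops being equivalent as soon as _ret does any work after the call.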
Posted on 2003-09-08 20:02:06 by S.T.A.S.
Yes, I do use predefined wndclassex and other structures. It does save some bytes. Also, I usually define my hinstance as 400000h via an equate.

Well, technically it is possible, it's just that I am another lazy coder. That function is the most-called routine in my code. Oh yes, for some other instructions the immediate follows the modrm, for example shift and imul. I was too lazy to code another separate function, so I reused it.

Actually, I think passing parameters via registers produces much faster code: fewer clocks than push. Anyway, it only works best in your own code.
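A minimal sketch of the two styles, for comparison. add_stack and add_regs are made-up routines, not taken from my code. The register version only works when caller and callee agree on which register carries what, which is why it is mostly useful inside your own code.

```asm
caller:
        ; stack passing
        push    2               ; second argument
        push    1               ; first argument
        call    add_stack       ; returns eax = 3, callee pops the arguments

        ; register passing
        mov     eax, 1
        mov     edx, 2
        call    add_regs        ; returns eax = 3
        ret

add_stack:
        mov     eax, [esp+4]    ; first argument (just above the return address)
        add     eax, [esp+8]    ; second argument
        ret     8               ; discard the 8 bytes of arguments

add_regs:
        add     eax, edx        ; arguments are already in registers
        ret
```

The stack version costs two stores in the caller and two loads in the callee; the register version touches memory only for the return address.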
Posted on 2003-09-08 20:32:33 by roticv
3 out of 10 i am off base

What does mov mem to mem mean? Is it

mov esi, offset this_BUFFER ; source
mov edi, offset THAT_BUFFER ; destination
mov ecx, BUFFER_LEN ; byte count (BUFFER_LEN is a placeholder name)
rep movsb

or is it Jen_Cat_Buffers (StringsLen PLUS)

They are all darn fast enough, unless it's for a fast a** game or painting a BIG bitmap or something, I guess.

I am sure that in a regular app it will be finished before the user gets to click the next option.

48,000 instructions per second beginning with any 386 i read.

Can you type that many lines for ONE function..

I try everyday or i enjoy trying.
Posted on 2003-09-08 20:44:06 by cmax

Also, I usually define my hinstance as 400000h via an equate.
Very cool, roticv! I'm usually afraid to do this, but I don't know why... Now I see that this is possible? :)
Actually, I think passing parameters via registers produces much faster code: fewer clocks than push. Anyway, it only works best in your own code.

I'm not so sure about registers... usually it's faster, but when I do PUSH VALUE, VALUE goes to the cache first (and to memory later); and if I use PUSH , the data is also stored in some shadow register of the CPU, so when some proc uses this data, I think it's already in the CPU. (Also, CALL isn't a very quick instruction, AFAIK.)

Anyway, you're using PUSH EDX, then POP EDX before the CALL... why pass data in a register if it's already on the stack?

When we use instructions like "mov dword ptr [0040AC84], 00409548", the CPU must decode the address where the data is to be stored and read the data itself from the instruction cache, too.

Somewhere I've read about "register overhead" (maybe that's the wrong term); this may slow down execution if you have many calculations and all registers are in use (even if no data is being moved from memory).

I sometimes see really strange things in AMD CodeAnalyst...

But one thing can be really "free": I use the CARRY flag, so I don't have to trash a DWORD register when I need to return an "error code". This was well known on earlier CPUs, but nowadays there's C everywhere...
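For example, something like this. parse_digit is a made-up routine just to show the convention I mean: CF clear means success, CF set means error, and EAX carries only the result.

```asm
; in:  al = ASCII character
; out: CF = 0 and eax = 0..9 for a digit, CF = 1 otherwise
parse_digit:
        sub     al, '0'         ; '0'..'9' become 0..9, anything else becomes >= 10
        cmp     al, 10          ; CF = 1 when al < 10 (unsigned compare)
        cmc                     ; flip it: CF = 0 for a digit, CF = 1 for an error
        movzx   eax, al         ; MOVZX does not modify the flags
        ret

caller:
        mov     al, [esi]
        call    parse_digit
        jc      not_a_digit     ; error path: no register was spent on a status code
        ; eax holds the digit value here
not_a_digit:
        ret
```

This only works between routines you control; the usual Win32 API and C calling conventions promise nothing about the arithmetic flags on return.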
Posted on 2003-09-08 21:34:08 by S.T.A.S.

48,000 instructions per second beginning with any 386 i read.

And you can get back to this speed on a modern CPU: just turn off the cache in the BIOS...
On modern computers this is sometimes impossible, though, but it worked "well" on P1/P2 ;)

Moving data from mem to mem can be _very_ slow, because RAM is the bottleneck
(if we compare its speed with the execution of a "NOP", for example)
Posted on 2003-09-08 21:50:02 by S.T.A.S.