Hi...
I want to fill a 32 bit register with a byte value...

For example, say I have the byte 8F (in fact, I only know the byte value at run time). I want to fill the register like this:
8F8F8F8F...

Here are my solutions :



mov al, 8fh
mov ah, 8fh
bswap eax
mov al, 8fh
mov ah, 8fh


Another one (perhaps faster : I don't have my assembler here to benchmark)



mov al, 8fh
mov ah, 8fh
ror eax, 30
mov al, 8fh
mov ah, 8fh


Is there a way to do something equivalent better/smaller/faster ?

Thanks in advance...
Posted on 2002-02-27 13:03:20 by JCP
I'd do this
mov  ecx, 4

loopLabel:
mov al, 8Fh
rol eax, 8
loop loopLabel
Not sure about speed or size advantages, but you're guaranteed to get the same value in each byte.:alright:
Posted on 2002-02-27 13:15:15 by The Worrier King
Assuming CL contains the byte you want:


mov al, cl
mov ch, cl
mov ah, cl
shl ecx, 16
or eax, ecx


This takes ~2.9 clock cycles on my Athlon TB, while both of yours took ~4.8 (also using CL instead of 8F). However, I don't know if the benchmark is correct, as such a small piece of code is hard to benchmark. I put it in a big loop (1000000 iterations), repeated 10 times with the repeat macro... I don't know if that's okay.

btw shouldn't 'ror eax, 30' be 'ror eax, 16' in your code?

Thomas
Posted on 2002-02-27 13:21:35 by Thomas
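A note on Thomas's snippet above: the final or only produces the replicated value when the upper 16 bits of EAX are already zero, because mov al and mov ah leave them untouched. A minimal sketch of a variant that does not depend on the previous contents of EAX, still assuming the byte is in CL and that ECX may be destroyed (the use of movzx here is a suggestion, not something benchmarked in the thread):

```asm
; Assumption: the byte to replicate (bb) is in CL.
; The other bits of EAX and ECX may hold anything on entry.
mov   ch, cl        ; CX  = bbbb
movzx eax, cx       ; EAX = 0000bbbb (upper half explicitly cleared)
shl   ecx, 16       ; ECX = bbbb0000
or    eax, ecx      ; EAX = bbbbbbbb
```

Like the original, this destroys ECX; it trades the two byte moves into AL/AH for a single movzx.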
Thanks for your reply...
IMHO, your solution will be slower because of the loop (I had already considered this option...) and the LOOP instruction (relatively slow since the 486).

Another derived version:

mov ecx, 4
looplabel:
mov al, 8fh
rol eax, 8
dec ecx
jnz looplabel

About size :

My first solution : 10 bytes.
My second solution : 11 bytes.
The Worrier King's solution : 12 bytes.
The Worrier King's derived solution : 13 bytes.
Posted on 2002-02-27 13:30:11 by JCP
FourOnes dd 1010101h


; BYTE in EAX is put in all four bytes of DWORD
EveryByte MACRO
mul FourOnes
ENDM
This is maybe the shortest. :) Fast on CPUs with a fast MUL.
Posted on 2002-02-27 13:31:09 by bitRAKE
Thomas, nice code, but it has the drawback of modifying another register...
*EDIT*: But in my actual implementation it suits my optimization problem very well, as I have the byte in question in dl and I don't mind modifying edx entirely...
Many thanks ! :)

btw shouldn't 'ror eax, 30' be 'ror eax, 16' in your code?


I started by writing ror eax, 16 but I got the wrong result, then I tried 30 and the result was correct...
(I am currently doing my testing in OllyDBG ^^).

BitRAKE: Yes, I found this "mathematical" relation too, but I was wondering whether doing a mul there would really be slower... and you are cheating: the register must be 0 except for al, which contains the byte to fill the register with. :rolleyes:
Posted on 2002-02-27 13:43:33 by JCP
could always do an "and eax, 0000000FFh" before the mul.
Posted on 2002-02-27 13:59:05 by f0dder
or a movzx eax, al... I don't know which is fastest... (the movzx solution is smaller, though)

What I meant is that, to behave like the other ones, it is not as small as it looks in the macro... tricky Ricky. ;)
Posted on 2002-02-27 14:08:52 by JCP
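Putting the last few posts together, here is a minimal sketch of the multiply approach that accepts any garbage in the upper bits of EAX. It assumes the byte is in AL. The three-operand imul is a substitution for bitRAKE's mul (it takes the constant as an immediate and leaves EDX alone, whereas mul writes the high half of the product to EDX), and no timing claim is made for it:

```asm
; Assumption: the byte to replicate (bb) is in AL; the rest of EAX is unknown.
movzx eax, al               ; EAX = 000000bb
imul  eax, eax, 01010101h   ; EAX = bb * 01010101h = bbbbbbbb
```

The product never exceeds 0FFh * 01010101h = 0FFFFFFFFh, so no bits are lost in the 32-bit result.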
Readiosys:
It's a very good exercise for the positional number systems math topic.
But please state your conditions more explicitly.
Where do you get this byte from? A register or a memory variable?
Can we change registers?
In your example you use an immediate. That's why it's not clear enough. Because then we could just:
mov eax,8f8f8f8fh

I know you said the value is only known at run time, but your code uses an immediate.
Posted on 2002-02-27 14:18:24 by The Svin

Readiosys:
It's a very good exercise for the positional number systems math topic.


Yup, I enjoy this topic a lot... even if we only use basic instructions. ;)


But please state your conditions more explicitly.
Where do you get this byte from? A register or a memory variable?
Can we change registers?
In your example you use an immediate. That's why it's not clear enough. Because then we could just:
mov eax,8f8f8f8fh


Sorry if I wasn't clear enough; in fact I scattered the clues over different posts of the thread...

To be more precise and gather the elements together: I have the byte in dl, and I don't mind modifying edx completely (but the code may be subject to change... though I don't think so... you can change edx at will)...
Not destroying any other registers would be the cherry on the cake... but that is only a style exercise... the priority is performance ;)
I only know the value of the byte (it comes from a file) at run time... (Otherwise it would have been too easy :rolleyes: ).

For information : it is an early step to do a memfill routine filling the memory with one byte, but processing 4 bytes at a time...

Thanks for your interest.
Posted on 2002-02-27 14:29:55 by JCP
and edx,0FFh

; something useful...
mov edx, [MyTable][edx*4] ; :)
No surprise. Meets your criteria?
To be more precise and gather the elements together: I have the byte in dl, and I don't mind modifying edx completely (but the code may be subject to change... though I don't think so... you can change edx at will)...
This paragraph is funny. :)
Can I change EDX, or not?
For information : it is an early step to do a memfill routine filling the memory with one byte, but processing 4 bytes at a time...
MMX. How much memory do you want to fill? How fast do you want to fill it?
Posted on 2002-02-27 14:36:37 by bitRAKE
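MyTable is not defined anywhere in the thread. One way to generate it at assembly time, as a sketch assuming MASM syntax and a table of 256 dwords (1 KB) in which entry i holds the byte i repeated four times:

```asm
; Fragment only: assumes it sits inside a program that already has a .model.
.data
MyTable LABEL DWORD              ; 256 entries of 4 bytes each
i = 0
REPEAT 256
    dd i * 01010101h             ; entry i = byte i in all four positions
    i = i + 1
ENDM

.code
    movzx edx, dl                ; EDX = 000000bb (same effect as and edx,0FFh)
    mov   edx, [MyTable + edx*4] ; EDX = bbbbbbbb
```

The lookup itself is a single load, at the cost of 1 KB of data that has to be in cache for it to be fast.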
Wouldn't you have to use that piece of code only once then? Of course we always want all code to be as fast as possible :) but it's only a little overhead compared to the loop that probably follows it.
MMX is certainly a great way to do a memfill; Athlons have some nice instructions to speed up access to large memory blocks (like movntq).

Thomas
Posted on 2002-02-27 15:02:16 by Thomas
I personally like Thomas's solution, as it is a very good compromise between size and speed.
If you need the best speed or the best size for fast memory filling,
all three good solutions came from bitRAKE.
The smallest uses mul (but is slow).
The fastest uses a table (but is big).
The best way to fill a big region with the same byte is to fill
not just a 32-bit register but a 64-bit (MMX) register, and to store it in chunks,
unrolling the loop with movq:
; eax = pointer
movq [eax],mm0
movq [eax+8*1],mm0
movq [eax+8*2],mm0
movq [eax+8*3],mm0
etc..
movq [eax+8*8],mm0
add eax,8*9
; check if finished; if not, jump to fill again
Posted on 2002-02-27 15:05:02 by The Svin
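The movq stores in the post above assume MM0 already holds the fill byte in all eight positions. A minimal sketch of that setup, assuming EAX already contains the byte replicated four times (by any of the methods earlier in the thread) and an MMX-capable CPU:

```asm
; Assumption: EAX = bbbbbbbb on entry.
movd      mm0, eax      ; MM0 = 00000000:bbbbbbbb (upper dword zeroed)
punpckldq mm0, mm0      ; MM0 = bbbbbbbb:bbbbbbbb (low dword copied to high)
; ... movq stores as in the post above ...
emms                    ; reset the MMX/FPU state before any later FPU code
```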

and edx,0FFh

; something useful...
mov edx, [MyTable][edx*4] ; :)
No surprise. Meets your criteria? This paragraph is funny. :)
Can I change EDX, or not? MMX. How much memory do you want to fill? How fast do you want to fill it?


Thanks...
The memfill is not the problem. I know fairly well how I want to code it...
I know I can process 8 bytes at a time using MMX... or even the FPU, if I remember correctly. But the maximum size to be filled is only 255 bytes, and the average would be near 10 or 15, so (tell me if I'm wrong) it is not very useful to process more than 4 bytes at a time, IMHO. (I also don't really like to use MMX, as my programs must run on the widest possible range of PCs, even if nowadays most people have an MMX-capable processor.)

Yes, you can change EDX... When I talked about changes in the code, I meant the code that puts the byte in dl... maybe after such a change the byte won't be in a register but accessed through a pointer in memory... but I don't think it will really change now...
I'm sorry if you misunderstood me, but I'm very tired these days (really busy at work and not sleeping much), and as a non-native English speaker it is sometimes hard to explain in a foreign language things that would come naturally in your own, and to keep them clear, especially when you are exhausted. ;)

Thank you all for your advice and submissions.
Posted on 2002-02-27 16:49:49 by JCP
I'm sorry if you misunderstood me, but I'm very tired these days

Readiosys:
Clarifications are an absolutely normal thing in tech echoes.
We should all get used to being questioned and to clarifying each other's
points.
So it's absolutely OK.
Make it clear, clearer, even clearer than crystal - it's normal practice.
A lot of people who have answers and ideas about algo and math topics never voice them just because they are not sure they understand the question and are reluctant to ask for more explicit explanations.

I say it again - it is absolutely normal, don't be sorry :)
Posted on 2002-02-27 17:00:22 by The Svin
Svin is absolutely correct! Readiosys, I am not questioning anything about you personally - I just want to know the environment of the problem. I respect you, and that never comes into question. See, we have a great deal of information about the problem now. :)

Only 10 - 15 bytes average, then you might want to just fill them manually:
mov [mem+0], dl

mov [mem+1], dl
etc...
Are you sure the count is an even number? ...or multiple of four?

The environment of the problem usually dictates the method, especially when looking at the problem this close - a small piece of code. When you are looking at the problem from farther away, you are deciding the environment in which these small pieces of code must execute. Dependencies flow in both directions - the large scale problem solving creates dependencies for the small scale, and small scale problem solving indicates possible large scale changes (less so, or the problem doesn't reach a stable solution, IMHO).
Posted on 2002-02-27 17:39:28 by bitRAKE
Thomas: I found out why my ror eax, 16 didn't work: remember, I was testing in a debugger...
I assembled ror eax, 16, but the 16 was read as hex: argh! I have been using my assembler too much, I think, and the habits are still there. ;)
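
For reference, here is how the three counts behave. On the 386 and later, the rotate count is masked to its low 5 bits, which is why a count typed as 30 and read as hex by the debugger still gives the intended result:

```asm
ror eax, 16     ; decimal 16: swaps the upper and lower halves of EAX (intended)
ror eax, 16h    ; 16h = 22: what the debugger made of "16" (wrong result)
ror eax, 30h    ; 30h = 48, masked to 48 AND 31 = 16: same as the first line
```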

The Svin, thanks for your understanding. ;)
While we are talking about testing algos in debuggers, I seem to recall you said it is a very good way to see directly what happens in the processor...
I totally agree with you: I tested this way because I didn't have my assembler, but it is very easy and clear to see what happens in the processor instruction by instruction... it is great for getting a deeper view of assembly language... I have nothing to add to what you said about this; I just wanted to back your point of view, as it was the first time I used a debugger to code an "algo" and experienced the advantages of this method...

BitRAKE: Thanks, and I know: I just wanted to explain myself, because I think that in a normal situation I would have been clearer...
(I'm not quite sure of it, but I guess that after some hours of sleep my English is already better than yesterday :rolleyes: ).
In fact, I had the idea of turning this into a "challenge" to find the best way to code byte filling, with the fewest drawbacks and the maximum performance, rather than keeping the topic at the level of my own implementation, which would be a bit selfish...
I respect you too, and thanks again for your great submission.

Only 10 - 15 bytes average, then you might want to just fill them manually:
mov [mem+0], dl
mov [mem+1], dl
etc...
Are you sure the count is an even number? ...or multiple of four?


Now, since we are going to discuss the memfill routine, I will explain the code environment and the implementation's limitations:

1: I don't know how many times the loop will run; the maximum value is 255...
Imagine that I have to fill 63 bytes: 63/4 = 15.75
I don't mind filling, for example, 16*4 bytes. In my case the overflow is not a problem, as my buffer is 64 KB large and the data previously in the buffer doesn't have to be reused... so I can destroy it at will...
The 10-15 bytes figure is only an observation I made; it can be any value between 2 and 255...
esi must be preserved (I thought of using mov dword ptr with ecx as a relative displacement, doing a sub ecx, 4 after each iteration until 0... I will fill the buffer backwards...), so we can't use movsd.
I have a fairly good idea of what the proc will look like... maybe if I get home not too late today I will have the time to code it and submit it here.

I hope to have been clear enough this time. ;)
Thank you.
Posted on 2002-02-28 01:27:10 by JCP
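The loop described above could be sketched as follows. The register assignment is an assumption, not JCP's actual code: EDI points at the area to fill, ECX holds the byte count (2 to 255), and EAX holds the fill byte replicated four times. The count is rounded up to a multiple of four, which relies on the stated tolerance for writing a few bytes past the end:

```asm
; Assumptions (hypothetical register use):
;   EDI = start of the area to fill
;   ECX = number of bytes to fill (2..255)
;   EAX = fill byte in all four positions
; ESI is not touched.
    add  ecx, 3
    and  ecx, -4                  ; round the count up to a multiple of 4
FillLoop:
    sub  ecx, 4                   ; step backwards; sets ZF when ECX reaches 0
    mov  dword ptr [edi+ecx], eax ; mov leaves the flags from sub intact
    jnz  FillLoop                 ; the last store was at offset 0 when ZF is set
```

For a count of 63 this performs 16 dword stores, at offsets 60 down to 0, writing one byte beyond the requested length.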