hi all!

i'm currently about to write a gui for my "os". when moving a window i don't want to redraw everything; i want to move the image inside the window rectangle instead. it isn't fast anymore, even at 300*200 px. this is what i have:



movewindow:
;param
; ebx | loword: source x-pos
; | hiword: source y-pos
; ecx | loword: destination x-pos
; | hiword: destination y-pos
; edx | loword: width
; | hiword: height
s_x equ [ebp+8]
s_y equ [ebp+10]
d_x equ [ebp+4]
d_y equ [ebp+6]
width equ [ebp]
height equ [ebp+2]

push ebx
push ecx
push edx
push ebp
lea ebp,[esp+4]

xor eax,eax
movewindow.y:
push eax
xor eax,eax
movewindow.x:
push eax
;x-counter: [esp]
;y-counter: [esp+4]
mov ax,s_y
cmp ax,d_y
jge movewindow.bottom2top
mov si,word [esp+4]
shl esi,16 ;y-offset to higher word

mov ax,s_x
cmp ax,d_x
jge movewindow.top2bottom.left2right
mov si,word [esp] ;x-offset to lower word
jmp movewindow.draw
movewindow.top2bottom.left2right:
mov si,width
sub si,word [esp]
jmp movewindow.draw
movewindow.bottom2top:
mov si,word [esp+4]
shl esi,16

mov ax,s_x
cmp ax,d_x
jge movewindow.bottom2top.left2right
mov si,word [esp]
jmp movewindow.draw
movewindow.bottom2top.left2right:
mov si,width
sub si,word [esp]
movewindow.draw:
mov ebx,dword s_x
add ebx,esi
call getpixel
mov ecx,eax
mov ebx,dword d_x
add ebx,esi
call putpixel
pop eax
inc eax
cmp ax,width
jl movewindow.x
pop eax
inc eax
cmp ax,height
jl movewindow.y
pop ebp
pop edx
pop ecx
pop ebx
ret


note that putpixel and getpixel take the coordinate in ebx, y in the hiword, x in the loword. these functions check the screen range and calculate the position in memory for the pixel. i know this is the slow part. i'd love to use rep movsd, but i can't because of the range checking.
so does anybody have an idea how to access the lfb directly (not through put- and getpixel) while keeping the range checking? it would be orders of magnitude faster if i could reach the next pixel just by adding 4 to a counter.
btw, is there maybe a hardware way to do that? i've heard of the pm-interface of vesa, but i actually don't know how to use it.
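
One way to keep the range check but drop the per-pixel address calculation is to clip a whole span once and then walk a pointer. Below is a minimal C sketch of that idea, assuming a 32-bit mode. The `Lfb` struct and `lfb_hspan` are hypothetical names for illustration, not part of the code above.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical description of a 32-bit linear frame buffer. */
typedef struct {
    uint32_t *base;    /* address of pixel (0,0) */
    int width, height; /* visible size in pixels */
    int pitch;         /* pixels per scanline, may exceed width */
} Lfb;

/* Write `len` pixels starting at (x, y). The span is clipped once,
 * then the inner loop only advances a pointer (4 bytes per pixel).
 * Returns the number of pixels actually written. */
static int lfb_hspan(Lfb *fb, int x, int y, int len, uint32_t color)
{
    if (y < 0 || y >= fb->height)
        return 0;
    if (x < 0) {                 /* cut off the part left of the screen */
        len += x;
        x = 0;
    }
    if (x + len > fb->width)     /* cut off the part right of the screen */
        len = fb->width - x;
    if (len <= 0)
        return 0;

    uint32_t *p = fb->base + (size_t)y * fb->pitch + x;
    for (int i = 0; i < len; i++)
        *p++ = color;
    return len;
}
```

The same clip-then-loop shape applies to copying a span; once the bounds are known, the inner loop is the kind of thing `rep movsd` (or `rep stosd` for a fill) can do.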

greets, hartyl
Posted on 2003-05-20 14:11:53 by hartyl
oh man, nobody out there who can help me?

i've played around with it today. i noticed that when i move a window partially off the screen, the moving is incredibly fast (almost realtime). i thought "ok, put- and getpixel do range checking, and the functions only calculate the memory address if i'm in range". so i changed the code so that it always calculates the memory address, then does the range check, and only does the final move to the lfb if the check succeeds. and guess what: it was as slow as before. so the calculation isn't the slow part, the actual pixel access is.

2 questions:
could the problem be that i have to synchronize with the monitor (wait for retrace)? how would i do that?
where is the lfb memory really? on the gfx-card or in ram? is it slow to write there?
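
On the retrace question: on VGA-compatible hardware, bit 3 of input status register #1 (port 0x3DA) is set while the vertical retrace is in progress. Below is a minimal C sketch of the usual two-loop wait. The port read is passed in as a hypothetical `PortRead` hook (in a real kernel it would be an `in al, dx` wrapper), and `fake_inb` only exists so the loop can be tried on a hosted system. Whether the register behaves this way in a given VESA mode is an assumption to check per card.

```c
#include <stdint.h>

#define VGA_INPUT_STATUS_1 0x3DA /* VGA input status register #1 */
#define VGA_VRETRACE_BIT   0x08  /* bit 3: set during vertical retrace */

typedef uint8_t (*PortRead)(uint16_t port); /* hypothetical port-read hook */

static void wait_for_vretrace(PortRead inb)
{
    /* If a retrace is already running, let it finish first... */
    while (inb(VGA_INPUT_STATUS_1) & VGA_VRETRACE_BIT) { }
    /* ...then wait for the start of the next one. */
    while (!(inb(VGA_INPUT_STATUS_1) & VGA_VRETRACE_BIT)) { }
}

/* Stand-in for real hardware: replays a fixed sequence of status bytes. */
static int fake_reads;
static uint8_t fake_inb(uint16_t port)
{
    static const uint8_t seq[] = { 0x08, 0x08, 0x00, 0x00, 0x08 };
    (void)port;
    return seq[fake_reads++ % 5];
}
```

Waiting for retrace only avoids tearing; it does not make the copy itself any faster.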
Posted on 2003-05-22 14:37:56 by hartyl
Man,

1)
Unfortunately, on current "modern" (hardware accelerated) video boards you are not supposed to READ from video memory. It is possible, BUT doing so gives you a BIG slowdown.

Look at my OS: I always redraw everything, but it is a one-way operation. I always PUT data into video memory and NEVER read it back from there.

2) Do NOT make 2 CALLs inside your INNER loop, man... inline getpixel/putpixel there.

3) The whole thing looks WAY too complicated and general... such code looks more like an HLL construct. SIMPLIFY!

4) IMHO, do NOT play with the stack THAT way: you do not NEED it (especially when you make 2 CALLs inside the inner loop). Go for standard ebp usage for params instead, and keep the code simple until you can see that it is right.


The LFB memory is in the video card's own RAM.
Writing there is SLOWER than writing to system RAM ... but not nearly as slow as READING!!!
I will look at it more closely when I have time, but my general personal impression is that the whole thing is far too unoptimized.

Keep inner loop as small as possible
Posted on 2003-05-22 15:50:09 by BogdanOntanu
neither the loop nor the 2 calls inside are that slow; as you told me, the reading is. i just wanted to test my code, that's why it's unoptimized and simple. hey, it's temporary code. the conditions only check whether i have to move from left to right or from right to left, and the same is done for top and bottom. real speed optimizing comes later.
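
The direction checks matter because source and destination rectangles usually overlap during a move. Below is a minimal C sketch of the same decision, assuming the pixels sit in an ordinary buffer with `pitch` pixels per row. The name `move_rect` is hypothetical, and no clipping is done here. `memmove` takes care of the left/right case within a row, so only the row order has to be chosen by hand.

```c
#include <stdint.h>
#include <string.h>

/* Move a w*h block from (sx, sy) to (dx, dy) inside one pixel buffer.
 * Rows are copied top-down when moving up (or sideways) and bottom-up
 * when moving down, so no source row is overwritten before it is read. */
static void move_rect(uint32_t *buf, int pitch,
                      int sx, int sy, int dx, int dy, int w, int h)
{
    size_t row_bytes = (size_t)w * sizeof(uint32_t);

    if (dy <= sy) {
        for (int y = 0; y < h; y++)
            memmove(buf + (size_t)(dy + y) * pitch + dx,
                    buf + (size_t)(sy + y) * pitch + sx, row_bytes);
    } else {
        for (int y = h - 1; y >= 0; y--)
            memmove(buf + (size_t)(dy + y) * pitch + dx,
                    buf + (size_t)(sy + y) * pitch + sx, row_bytes);
    }
}
```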

ok... reading from the lfb is the problem... but i have to... i just had an idea:
what about keeping a complete copy of the lfb in memory? putpixel writes to both buffers, but getpixel reads from the copy. but this alone would need 1024*768*4 bytes = 3MB, and the size would change with the screen size too.
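
The shadow-copy idea can be sketched in a few lines of C. This is only an illustration under stated assumptions: a 32-bit mode, a scanline pitch equal to the width, and hypothetical names (`Screen`, `put_pixel`, `get_pixel`, `shadow_bytes`).

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t *lfb;     /* video memory: only ever written */
    uint32_t *shadow;  /* copy in system RAM: all reads come from here */
    int width, height; /* assumption: pitch == width */
} Screen;

/* Memory needed for the shadow copy. */
static size_t shadow_bytes(int width, int height, int bytes_per_pixel)
{
    return (size_t)width * height * bytes_per_pixel;
}

static void put_pixel(Screen *s, int x, int y, uint32_t color)
{
    if (x < 0 || y < 0 || x >= s->width || y >= s->height)
        return;
    size_t i = (size_t)y * s->width + x;
    s->shadow[i] = color; /* keep the copy in sync */
    s->lfb[i] = color;    /* write-only access to the card */
}

static uint32_t get_pixel(const Screen *s, int x, int y)
{
    if (x < 0 || y < 0 || x >= s->width || y >= s->height)
        return 0;
    return s->shadow[(size_t)y * s->width + x]; /* never touches the lfb */
}
```

`shadow_bytes(1024, 768, 4)` gives 3,145,728 bytes, which matches the 3MB estimate.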

any other ideas?
Posted on 2003-05-23 11:20:27 by hartyl
I can see no other solution but to use a hardware accelerated BLIT to do the move. Blitting from video memory to video memory via the hardware blitter is extremely fast...

Unfortunately, in order to do this you must have a special driver for every video board out there, as there is NO standard and no documentation. Or rather, the VBE/AF standard is so secret that its specs cost approx. a few thousand US$.

Personally I use a memory buffer for the LFB in system RAM and ONLY write from this buffer to video memory.
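
A minimal C sketch of that approach, assuming a 32-bit mode and a tightly packed back buffer. All drawing (and any reading) happens in system RAM, and the card only ever sees writes. The function name `present` and its parameters are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy the finished frame from the back buffer (system RAM) to the LFB.
 * `lfb_pitch` is the LFB scanline length in pixels, which may be larger
 * than the visible width, so the copy goes row by row. */
static void present(uint32_t *lfb, int lfb_pitch,
                    const uint32_t *back, int width, int height)
{
    for (int y = 0; y < height; y++)
        memcpy(lfb + (size_t)y * lfb_pitch,
               back + (size_t)y * width,
               (size_t)width * sizeof(uint32_t));
}
```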

I still think your code is too complicated for my taste, but I recognise it might be OK for you. And for sure you underestimate the damage that 2 CALLs will do to your inner loop ... IMHO of course.

I must also point out that with such code you will still need to redraw the windows UNDER the window you are moving, which makes this kind of move useless in the long run: you still need a function that redraws/regenerates each window's elements via a call to code.

This redraw of the window (usually involving ONLY writes) MIGHT just be fast enough to make a READ+WRITE COPY useless (even for smaller sizes).
Posted on 2003-05-23 12:09:32 by BogdanOntanu
you're right, i have to redraw everything below the window, but that's not the problem. i wanted to really "move" the window because i thought it would be easy and fast, and i wouldn't have to care about redrawing the window itself. furthermore, i don't want to drag a frame and then draw the window there when the mouse is released.

since i've scaled back my plans so that a window can't be moved off the screen, i can use the hardware bitblt, and the moved window doesn't have to be redrawn. i think the pm-interface of vesa has some standards for this. i've found information about it before in Ralf Brown's interrupt list, i'll try it when i have time.

actually i turned away from my buffer-idea. i don't like wasting 8megs of memory just as a copy of the screen.

i'll tell you when i'm finished
Posted on 2003-05-23 13:22:34 by hartyl
1024x768 at 32 bits comes to 3,145,728 bytes, so about 3 megabytes.

If the target system is low on RAM, I guess it can go down to 800x600 or 640x480.

If the target system is an embedded system, then it is possible that reading from the video device is not that slow (or writing not that fast -- hehe).

Double buffering is a good idea for some other things too, like flicker-free animations, window moves and translucent windows :P

But indeed, on a system with very limited RAM I would also go for direct access to video device memory.

Clipping the blits is not THAT hard, especially IF only rectangles are involved.

I suggest the DX method of using 2 rectangles, one for source and one for destination. Both can very easily be clipped to new rectangles that DO NOT cross the screen borders, IMHO.
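
A minimal C sketch of the two-rectangle clipping, with hypothetical names (`Rect`, `clip_blit`). The idea is that whatever is trimmed from one rectangle is trimmed from the other as well, so both stay the same size and both end up fully on screen.

```c
typedef struct { int x, y, w, h; } Rect;

static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Clip a same-sized source/destination pair against a screen of
 * screen_w * screen_h pixels. Returns 0 if nothing is left to copy. */
static int clip_blit(Rect *src, Rect *dst, int screen_w, int screen_h)
{
    /* How many pixels stick out on each side, taking the worse of the two. */
    int left   = max3(0, -src->x, -dst->x);
    int top    = max3(0, -src->y, -dst->y);
    int right  = max3(0, src->x + src->w - screen_w, dst->x + src->w - screen_w);
    int bottom = max3(0, src->y + src->h - screen_h, dst->y + src->h - screen_h);

    int w = src->w - left - right;
    int h = src->h - top - bottom;
    if (w <= 0 || h <= 0)
        return 0;

    src->x += left;  dst->x += left;
    src->y += top;   dst->y += top;
    src->w = dst->w = w;
    src->h = dst->h = h;
    return 1;
}
```

After this, the copy loop needs no per-pixel range checks at all.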


So I decided to spend the memory on a double buffer in SOLAR OS for translucent windows :P. Most games do that all the time anyway to avoid flicker.

RAM is a resource that modern CPU/OS/PCs should have plenty of .... IMHO

Good luck...
Posted on 2003-05-23 13:33:59 by BogdanOntanu
initially i also wanted to do translucent windows, but the way i've gone about the gui, i can't do that. redrawing everything for every frame is a very slow process that i wanted to avoid.
i've only planned for rectangular windows (in all my time programming for windows i've never needed other shapes...)
this weekend i'll take the time and do what i mentioned: implement a hardware-bitblt-function.
Posted on 2003-05-24 09:11:45 by hartyl
If you succeed in using the hardware BLIT in your own OS, please let me know. I am interested, even if it only works on a few well known video boards like nVidia and ATI :D

Redrawing all windows all the time is not as slow as you expect. My OS can do this decently on P2 400+ CPUs, especially because of the WRITE ONLY aspect of today's video memory discussed above.

But I agree that most of the time there is no need to do this, namely when nothing changes on screen. However, when a lot of content changes during each frame (like in games, animations or very active content), this method is quite effective... and I must mention: very simple to implement and understand.

I expect to develop some dirty rectangles algorithm myself and update only the active parts each frame, once I get to the optimization phase in SOLAR OS...

... currently it is just OK as it is: not too fast, but very simple and easy to understand, which is good while developing, IMHO.
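
A very small version of the dirty-rectangle idea, sketched in C with a single bounding box instead of a list of rectangles. The names (`Dirty`, `dirty_add`, `dirty_flush`) are hypothetical, and the sketch assumes the rectangles are already clipped to the screen and that both buffers use the same pitch.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Bounding box of everything that changed this frame (half-open). */
typedef struct { int x0, y0, x1, y1; } Dirty;

static void dirty_reset(Dirty *d)
{
    d->x0 = d->y0 = INT_MAX;
    d->x1 = d->y1 = INT_MIN;
}

/* Grow the box so that it also covers the given rectangle. */
static void dirty_add(Dirty *d, int x, int y, int w, int h)
{
    if (w <= 0 || h <= 0)
        return;
    if (x < d->x0) d->x0 = x;
    if (y < d->y0) d->y0 = y;
    if (x + w > d->x1) d->x1 = x + w;
    if (y + h > d->y1) d->y1 = y + h;
}

/* Copy only the dirty box from the back buffer to the front buffer. */
static void dirty_flush(Dirty *d, uint32_t *front, const uint32_t *back, int pitch)
{
    if (d->x0 < d->x1 && d->y0 < d->y1) {
        size_t n = (size_t)(d->x1 - d->x0) * sizeof(uint32_t);
        for (int y = d->y0; y < d->y1; y++) {
            size_t off = (size_t)y * pitch + d->x0;
            memcpy(front + off, back + off, n);
        }
    }
    dirty_reset(d);
}
```

A single box over-copies when two small changes are far apart; a list of rectangles avoids that at the cost of more bookkeeping.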
Posted on 2003-05-24 11:19:11 by BogdanOntanu
hardware blitting can't be that difficult. here is what i already know from ralf brown's interrupt list, and it seems easy to use and understand:

- use int 10 with subfunction 0x4f0b to get a "device context buffer". you get a list of pointers you can call from pm.
- the pointer at offset 0x24 is the bitblt function. usually parameters are passed in registers, but in this case there are too many, so you pass a pointer to a parameter block:

ES:EDI -> device context buffer
DS:ESI -> BitBlt parameter block
BL = mix operation
    00h replace
    01h XOR
    02h OR
    03h AND

with the following format:
Offset  Size  Description
00h     WORD  left coordinate of source rectangle
02h     WORD  top coordinate of source rectangle
04h     WORD  right coordinate of source rectangle
06h     WORD  bottom coordinate of source rectangle
08h     WORD  left coordinate of destination rectangle
0Ah     WORD  top coordinate of destination rectangle
0Ch     BYTE  horizontal direction: 00h = decrement X, 01h = increment X
0Dh     BYTE  vertical direction: 00h = decrement Y, 01h = increment Y
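
For reference, the parameter block as listed above maps onto a 14-byte C struct. This is only a sketch of the quoted layout: the field names and the `blt_set_direction` helper are hypothetical, and whether the right/bottom coordinates are inclusive is not stated in the listing, so filling them in is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* BitBlt parameter block, offsets as in the listing. */
typedef struct {
    uint16_t src_left;   /* 00h */
    uint16_t src_top;    /* 02h */
    uint16_t src_right;  /* 04h */
    uint16_t src_bottom; /* 06h */
    uint16_t dst_left;   /* 08h */
    uint16_t dst_top;    /* 0Ah */
    uint8_t  x_dir;      /* 0Ch: 00h = decrement X, 01h = increment X */
    uint8_t  y_dir;      /* 0Dh: 00h = decrement Y, 01h = increment Y */
} BitBltParams;

_Static_assert(offsetof(BitBltParams, dst_left) == 0x08, "layout");
_Static_assert(offsetof(BitBltParams, x_dir) == 0x0C, "layout");
_Static_assert(sizeof(BitBltParams) == 14, "layout");

/* Pick copy directions that are safe for overlapping rectangles:
 * count upwards when the destination lies before the source, and
 * downwards when it lies after it. */
static void blt_set_direction(BitBltParams *p)
{
    p->x_dir = (p->dst_left <= p->src_left) ? 1 : 0;
    p->y_dir = (p->dst_top  <= p->src_top)  ? 1 : 0;
}
```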
Posted on 2003-05-26 13:47:32 by hartyl
i didn't read the small print.


Note: this interface description is derived from the draft VBE/AF proposal
(version 1.0P, document revsion 0.12P, dated 13jan95)

...
i tried this function to get a device context buffer, but subfunction 0x4f0b is actually listed twice. the second entry is "get nearest pixel clock". so when i call it, i get a pixel clock in ecx, not a device context buffer.
so... i'll have to redraw the window itself when moving, too... and i gotta tell you, i don't like it that way.
i'll take some time right now to find a good screen-to-screen bitblt...
Posted on 2003-05-27 13:20:34 by hartyl
First things first: NEVER trust the documentation.

I have had a hard time explaining to generation after generation of DX programmers that alpha blending is NOT implemented in DirectDraw EVEN IF it is mentioned and promised at each DX release :D. Even today, when GDI+ can do alpha blending in hardware, DirectDraw still cannot do it (well, as of the latest version, DX7 :P). The fact that you have to use 3D to do it is just plain pathetic, since THEY obviously use it in GDI and any normal (2D) GUI will need it ... but doh ...


I would also like to get hold of good VBE/AF documentation.... but I do not have the required money :(
Info about accelerated 2D and 3D functions at the hardware level for NVidia, ATI and other video boards is my greatest dream... not to mention network boards, AGP etc.

Things are hard for OS developers. Information is kept secret etc., and we have only ourselves and the community to help. Ah ..... and also the Linux source code :P
Posted on 2003-05-29 09:11:51 by BogdanOntanu