Hey All,
I'm in the architectural design and manufacturing business, and I have been building an app to support my operations... it involves drag'n'drop objects, etc... AutoCAD is not an option, because the .dxf format makes it too complicated to drag'n'fit (so to speak)...

so, I've written many, many of the routines in PB, and I want to speed them up... (even though they're pretty fast as is)... in optimizing a simple circle func w/ PB asm, I've found that the WinAPI Arc func is still 3 times faster...

so I gather that GDI func's use Graphic card support?...
but I need more control than GDI offers, for some surfacing stuff I've written... (I need to know where/ and have access to every pixel in func)
so, question: should I use other platform than GDI?... which is fastest? ...
which is better? etc ..

I really don't like DirectX.. (too big and complicated)
OpenGL ?
GDI+ ?
?
or, is there a standard for graphics card access, or do I need to accommodate all cards?

Thanks,
Brad
Posted on 2003-05-16 08:48:19 by Brad
OpenGL is as large and scary as DX is.
GDI functions can do what you want, but are not lightning fast.
Calling them from asm will be faster than using the interpreted basic.
Posted on 2003-05-17 10:43:47 by Homer
Hey EvilH,

Thx, just for reference PB is a compiler that supports inline asm, so once tuned it's very fast, but I've found that the SetPixelV API is pretty slow, and am now looking into DIBs,
which seem like they might do the trick... I really want to write my own funcs and not use DX or OpenGL...

Brad
Posted on 2003-05-17 11:02:03 by Brad
Afternoon, brad.

Using SetPixel will always be the slowest way to draw graphics, since it would have to lock and unlock memory every time you set a single pixel.

Always better/faster to lock a buffer and draw all you need before unlocking again.

It seems that GDI is what would be better to use for what you're doing.
Create a DIB section and make a proc/function for drawing an arc/circle/whatever.

Cheers,
Scronty
Posted on 2003-05-17 22:37:54 by Scronty
Scronty,

yes, thx for the response, I've been playing with DIBSections and they are extremely fast :grin:

maybe as fast as the api lineto function... the only problem I've run into is that... It doesn't clip itself... so If I try to set a pixel that isn't there it crashes ... unless I'm doing something wrong...
it's a bit of extra work to test if pixel is in bitmap area... for scrolling and zooming windows??

Thanks, Brad
Posted on 2003-05-17 23:06:08 by Brad
Afternoon, Brad.

When working with DIBs, you're actually working with the memory area itself, so if you try to read/write a memory location which you aren't supposed to have access to, it's just like accessing any other invalid memory location.

So you're right in thinking that you have to supply clipping yourself.

Since you're doing this within a window, I imagine that you're obtaining the client rect for the window, and using the rect.right and rect.bottom for creating the DIB to draw to?

One way to cheat at clipping, though not the fastest/best method to use, is to create a DIB with the width and height as large as your PC can handle (or... as large as you think it could ever possibly be) and draw all you need onto it.
After that, just bitblt or stretchblt (for zooming) the correct area of the DIB to the DC (or memDC).
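
To make the "draw big, then copy out the visible part" idea concrete, here is a rough sketch in plain C. It only stands in for what BitBlt would do for you; copy_viewport32 and its parameters are made-up names for illustration, it assumes 32-bit pixels, and it assumes the viewport lies fully inside the big buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copies a dst_w x dst_h window
   whose top-left corner is (vx, vy) out of a larger 32-bit buffer.
   The caller must make sure the window lies inside the source buffer. */
static void copy_viewport32(const uint32_t *src, size_t src_stride,
                            size_t vx, size_t vy,
                            uint32_t *dst, size_t dst_w, size_t dst_h)
{
    for (size_t row = 0; row < dst_h; ++row) {
        /* One memcpy per scanline; src_stride is the big buffer's width in pixels. */
        memcpy(dst + row * dst_w,
               src + (vy + row) * src_stride + vx,
               dst_w * sizeof *dst);
    }
}
```

Scrolling then only changes vx and vy. The price is the memory for the oversized buffer, which is why this is a shortcut rather than the fastest approach.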

Cheers,
Scronty
Posted on 2003-05-17 23:50:30 by Scronty
hi, Brad,

GDI behaves differently with different video cards. Whenever the card supports DirectDraw (not emulated), you will have some of the GDI functions accelerated. To see which ones, go to ControlPanel->DirectX->DirectDraw-Capabilities. Also click on the button "Advanced".
On my GeForce2 only a few main functions are accelerated: BitBlt, StretchBlt, and the xxxRect functions. I'm pretty sure lines are not handled by hardware. But BitBlt is 4x faster than if you did it with memory buffers.
Creating and maintaining a DIB is very easy. In your custom "SetPixel proc x,y,color" make use of


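cmp x,Const_Width       ; unsigned compare: a negative x looks huge
jae _bad
cmp y,Const_Height
jae _bad
; now put color
mov eax,y
mov ecx,Const_Width
mul ecx                 ; eax = y*width (mul also clobbers edx)
mov ecx,DIB_Buffer_Ptr
add eax,x               ; eax = y*width + x
shl eax,2               ; 4 bytes per pixel
add eax,ecx             ; eax = address of the pixel
mov ecx,color
mov [eax],ecx
_bad:
ret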

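When x is negative, the "jae" (an unsigned compare) treats it as a very large number, greater than Const_Width, so a single compare per axis rejects both negative and too-large coordinates. Bear in mind that DIBs are bottom-up by default: the first row in memory is the bottom row of the image.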
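
For anyone not reading asm, the same unsigned-compare trick might look roughly like this in C. This is only a sketch: DibSurface and dib_put_pixel are made-up names, 32-bit pixels are assumed, and unlike the asm above it also flips y on the assumption that the DIB is bottom-up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types and names, not from the thread. */
typedef struct {
    uint32_t *pixels;   /* width * height 32-bit pixels */
    uint32_t  width;
    uint32_t  height;
} DibSurface;

/* Returns 1 if the pixel was written, 0 if it was clipped away. */
static int dib_put_pixel(DibSurface *s, int32_t x, int32_t y, uint32_t color)
{
    /* Casting to unsigned turns a negative coordinate into a huge value,
       so one compare per axis covers both "< 0" and ">= size". */
    if ((uint32_t)x >= s->width || (uint32_t)y >= s->height)
        return 0;

    /* Assumption: bottom-up DIB, so screen row y is stored (height-1-y) rows in. */
    uint32_t row = s->height - 1u - (uint32_t)y;
    s->pixels[(size_t)row * s->width + (uint32_t)x] = color;
    return 1;
}
```

If the DIB is created top-down instead, the flip goes away and the index is simply y * width + x.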
Good luck :alright:
Posted on 2003-05-20 11:47:44 by Ultrano
Ultrano, Thanks Much!!!

I'll try it out :)
Brad
Posted on 2003-05-20 15:35:18 by Brad
Hey, do any of you know how many clock cycles it takes for modern CPUs to execute an integer multiply instruction? I believe this will be valuable information for Brad and anyone else
Posted on 2003-05-20 17:37:40 by x86asm
From Hutch's Opcode Help File, in his MASM32 package :grin:

ADDED:.. just realized that it doesn't include pentiums :rolleyes:



MUL - Unsigned Multiply
Usage: MUL src
Modifies flags: CF OF (AF,PF,SF,ZF undefined)

Unsigned multiply of the accumulator by the source. If "src" is
a byte value, then AL is used as the other multiplicand and the
result is placed in AX. If "src" is a word value, then AX is
multiplied by "src" and DX:AX receives the result. If "src" is
a double word value, then EAX is multiplied by "src" and EDX:EAX
receives the result. The 386+ uses an early out algorithm which
makes multiplying any size value in EAX as fast as in the 8 or 16
bit registers.

                       Clocks                       Size
Operands   808x           286    386     486      Bytes
reg8       70-77          13     9-14    13-18    2
reg16      118-133        21     9-22    13-26    2
reg32      -              -      9-38    13-42    2-4
mem8       (76-83)+EA     16     12-17   13-18    2-4
mem16      (124-139)+EA   24     12-25   13-26    2-4
mem32      -              -      12-41   13-42    2-4

F6 /4   MUL r/m8    Unsigned multiply (AX <- AL * r/m8)
F7 /4   MUL r/m16   Unsigned multiply (DX:AX <- AX * r/m16)
F7 /4   MUL r/m32   Unsigned multiply (EDX:EAX <- EAX * r/m32)
:rolleyes:
Posted on 2003-05-20 17:52:07 by Brad
From Agner Fog's optimize document, also in Hutch's package:

An integer multiplication takes approximately 9 clock cycles on PPlain and PMMX and 4 on PPro, PII and PIII. It is therefore often advantageous to replace a multiplication by a constant with a combination of other instructions such as SHL, ADD, SUB, and LEA.

Example:

IMUL EAX,10

can be replaced with

MOV EBX,EAX / ADD EAX,EAX / SHL EBX,3 / ADD EAX,EBX

or

LEA EAX,[EAX+4*EAX] / ADD EAX,EAX

Floating point multiplication is faster than integer multiplication on PPlain and PMMX, but the time spent on converting integers to float and converting the product back again is usually more than the time saved by using floating point multiplication, except when the number of conversions is low compared with the number of multiplications. MMX multiplication is fast, but is only available with 16-bit operands.
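
The two replacements for a multiply by 10 can be written out in C to check the arithmetic. This is just an illustration, and the function names are made up; an optimizing C compiler will normally make this kind of substitution itself when multiplying by a constant, so in C there is usually no need to do it by hand.

```c
#include <stdint.h>

/* Hypothetical helper: x * 10 as 8x + 2x (the SHL / ADD variant). */
static uint32_t mul10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Hypothetical helper: x * 10 as (x + 4x) doubled (the LEA / ADD variant). */
static uint32_t mul10_lea_style(uint32_t x)
{
    uint32_t five_x = x + (x << 2);   /* what a scaled-index LEA computes */
    return five_x + five_x;           /* the final ADD */
}
```

Both wrap around modulo 2^32 exactly like a 32-bit unsigned multiply, so they agree with x * 10 for every input.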
Posted on 2003-05-20 18:07:07 by Brad
On my K6-2 a 'mul' takes 5 cycles, so you shouldn't worry about that much. Make the PutPixel() as a macro, if you want to optimize. Only making your own Rect() will be tough. In the image below, see the red rectangles - with 2 borders and fill color. This was hard to optimize, but still doable :alright:
The rectangle (138,29)-(758,391) is a DIB. I post the image to give some inspiration. :alright:

Characteristics: uses "mul" to find the first pixel when drawing a line/rectangle. Uses "add ebx,WIDTH / mov esi,ebx" to find the next row. Draws the DIB background (see, it's granular!) + all guide lines + the keyboard on the left in only 2 ms! It first fills in the blue/dark blue granular surface, then draws all lines. Drawing the notes (these red rectangles) + everything else takes about 1 ms. My RAM is 64MB at 66MHz, so you can expect better results on your PC. Draws > 466,000 notes on the screen in 100 ms!!! :cool:
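
The "one mul for the first pixel, then just add the width for each next row" idea could be sketched in C like this. It is not the actual code behind the screenshot: fill_rect32 and its parameters are invented for illustration, 32-bit pixels are assumed, and the caller is assumed to have clipped the rectangle to the buffer already.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread). stride_px is the buffer width
   in pixels. The rectangle must already be clipped to the buffer. */
static void fill_rect32(uint32_t *pixels, size_t stride_px,
                        size_t x, size_t y, size_t w, size_t h,
                        uint32_t color)
{
    /* The only multiply: locate the first pixel of the rectangle. */
    uint32_t *row = pixels + y * stride_px + x;

    for (size_t j = 0; j < h; ++j) {
        for (size_t i = 0; i < w; ++i)
            row[i] = color;
        row += stride_px;   /* next row: a plain add, no multiply */
    }
}
```

Clipping the rectangle once up front, instead of testing every pixel, is what keeps the inner loop this small.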

btw, I think "shl eax,1" is faster than "add eax,eax": in hardware it should take about 16 times less time, though it still occupies 1 cycle :P
Posted on 2003-05-20 20:58:52 by Ultrano

yo, your project is going pretty well, I can't wait for the new version :D
Posted on 2003-05-20 21:28:24 by x86asm
Ultrano, cool coding! :)
Posted on 2003-05-20 23:04:49 by bitRAKE