A few weeks ago, I started a small project to reshape a form to the shape of a (mask) bitmap, usually black and white. Mainly using GetPixel, to return the color value of each pixel on the form, compare it to a "transparency" color and then create a form region. My first experiment, was VERY slow. Going through each pixel on the form, and creating a 1x1 pixel size region for every valid pixel. Then, looking at some other code, I found one could make an extended region, eg. 5x1 size region. This improved my load speed tremendously. This is the common method of redoing a forms shape. I didn't know it at the time, but MOB has a version of the same algorithm written in Assembler on the forum.

Well, on my P4 2.8GHz, an application load delay of 3 seconds to do a 565x565 bitmap was unacceptable, imagine load time on a P1! So I measured speeds of most of the Windows API system calls I was making, and although GetPixel was not the slowest, (CombineRgn() was slowest), GetPixel(), with the shear amount of times it was called, accounted for MOST of the slow speed, because it was being used on each and every pixel.

So, I tried to find a GetPixel() replacement in C++ or Assembler code, bessides this forum I tried SourceForge, CodeGuru and PlanetSourceCode, with no luck. I did however read a lot of comments on the theory behind GetPixel(), what it does and how it works and why it's so slow. So, I endeavoured to optimize GetPixel(). I believed the main reason it's so slow (my opinion), is because it extracts and loads the bitmap from the DC into memory, return the color value, and then destroy the memory resident bitmap it created and extract. This is only my speculation which comes from my experience with creating my replacement because that's what I would have had to do to create an EXACT duplicate. In my opinion, this initial setup was very time consuming.

So, what I decided to do was, write a "setup" function, called EnterGetPixel(), allow you to pass it the DC of your Bitmap as you would GetPixel(), it will extract the bitmap, copy the color bits to memory with a header (with the height and width etc. of the bitmap and a pointer to the memory location of the color bits), and it returns a pointer to the header. Then, every call to my new GetPixel() function, takes that pointer (returned from EnterGetPixel), instead of the DC, finds the RGB color value and returns it. With each new GetPixel call, there is no setup overhead! Once you're done with the new GetPixel function, you just call ExitGetPixel(), pass it the Headers memory location and it frees the memory after destroying the Bitmap object. The main restriction to my code, is that it only works on Bottom-to-top Bitmaps, which is the only kind I've ever seen! That's a bitmap with a positive Height, eg. 565, not -565! Note: Paint stores Bitmaps in Bottom-to-top format!

In my example code, I've left the original algorithm (similar to MOB's code!), which uses the Windows GetPixel() function, and the same algorithm that demonstrates the use of my NewGetPixel() function(check the DocRet.inc file). Over the course of 5 executions of each algorithm (on the same machine simply by changing functions from old to new), my code was 19.985 times faster than the original which uses the Microsoft version of GetPixel(). Please note, the Algorithm remains the same, the only difference is which function is called, old or new! To change between the functions, comment/uncomment the section in the WndProc.inc file!

Please feel free to comment, ask questions or make suggestions!

Included is the full RadASM project, coded in MASM syntax!
Posted on 2004-03-10 02:20:40 by SubEvil
That's quite an improvement in speed - around 16 times here at least. The method is a fine way of doing it and preserving the original algorithm (I didn't look closely at your source, so I'm just commenting on the idea and not the code itself).

However, the algorithm itself is sortof bad :) - most everything revolving around calls to putpixel is 'suboptimal thinking'. For instance consider a sprite blitter - you can spend very much time optimizing putpixel to have the fastest width*y+x calculation and clipping, but instead do a little pointer setup and you will have an amazingly simple (and fast!) innerloop without any putpixel calls.

The same goes for this code - you ought to work directly on the pixel data, it would save a getpixel call and some calculations for each and every pixel.
Posted on 2004-03-10 05:47:32 by f0dder
Also, instead of constantly creating temp regions and using COmbineRgn, what about constructing the region list yourself and finally using ExtCreateRegion? You basically need to create a header and an array of RECT structures - the code to setup a rect and add it to a linked list or dynamically growing array should take a lot less clocks than creating+combining regions, so it might give substantial time savings if ExtCreateRegion isn't too slow. At the expense of some temp memory overhead, of course.
Posted on 2004-03-10 06:16:08 by f0dder
2.53ghz P4, win2k servicepack 3.

Old function took: 562ms
New function took: 31ms
Smart function took: 515ms

All computed region size data require 9440 bytes, while
the temporary "Smart" function uses 13552 bytes temporarily
for it's RECT list + RGNDATAHEADER. I'd say this indicates
that both ExtCreateRegion and the CombineRgn methods optimize
the created regions.

The "Smart" function is written in plain C++ code, based
directly on the assembly "old" function. The original translation
was a bit faster than the assembly routine already :).
The current version uses ExtCreateRegion, and includes
HeapAlloc+HeapFree overhead for the temp RECT list,
the overhead of storing the rect coords in the list,
and setting up the RGNDATAHEADER fields.

Note: you should use GetDC+ReleaseDC instead of Begin+EndPaint,
as the latter should only be used inside a WM_PAINT handler.

The next step would be integrating the ExtCreateRegion code
with the "smarter GetPixel", or even better, work directly
on the bitmap data without call overhead.

Other optimizations could include working on the mask as a
8bit grayscale image (if that bitmap format is supported
under windows - would be too much bother settings up a palette :)).
Oh, and when moving to "work directly on bitmap data",
a smarter strip-finding algorithm could probably be used.
Posted on 2004-03-10 08:39:24 by f0dder
Hi fOdder,

Thanks for the comments and suggestions! I've looked into the ExtCreateRegion() functionality a few days ago, but by profiling my code, I found the slowest function was the GetPixel() calls. So I decided to optimize them first. My next step is to create my own regions, as you suggest!

By the way, I don't use PutPixel() or SetPixel() at all in my code, and I do work directly on the pixel data (within my GetPixel() function). I could, as you suggest, use it to get the pixel data directly in my code, but I wanted people who currently use GetPixel() to be able to almost directly replace their GetPixel() calls with mine. It literally took me about 30 seconds to change the code to use my new GetPixel() function, and I'm very happy with the results.

All that's needed in order to use it is an initial call to EnterGetPixel(), giving it the DC with your Bitmap (As you would give the original GetPixel()). It will return a pointer which needs to be given to the new GetPixel() function, instead of the DC given to the original, along with your X and Y co-ords. Then a call to ExitGetPixel() (with the pointer) when you're done!

Thanks again!
Posted on 2004-03-10 09:05:22 by SubEvil
Yeah, your code is an okay step in the right direction :)

But if you work directly on a top-down bitmap, you save a lot of instructions - push+call for the NewGetPixel, and basically all the code inside... basically getpixel could become "lodsd" (of course implemented with manual mov + add). I think the speedup would be noticable.

Not to degrade your code though, it's quite a performance boost already :alright:.

Of course the fastest of all would be to store the region data alongside the bitmap - around 9k for this semi-complex shape isn't too bad I guess? Might also be worth looking at the final region data visually, to see if it's optimized as much as possible.
Posted on 2004-03-10 09:22:57 by f0dder
It took me about 3 hours to write the ExtCreateRegion() functionality, but I'm very happy with the result. Should be stable and without leaks or bugs!

What I did was create an initial linked list of RECT structures, go through the Mask bitmap and populate them, creating a new blank one each time. Once I have all my RECT's, I consolidate all of them into the final RGNDATA structure needed by ExtCreateRegion().

The result: It's 4.444 times faster than my previous code, each tested consecutively over 5 runs.

My next step is to write a DLL for general use!

PS: I've replaced the initial code with the new version, which includes all previous function versions!
Posted on 2004-03-11 12:35:40 by SubEvil
I've now finished the DLL. Took me about 1 hour to make several modifications.

It has 1 function: TakeShape PROC hWnd :DWORD, hbmMask :DWORD, TransColor :DWORD

TakeShape requires a Window Handle (your form), an HBITMAP, usually returned by a call to LoadBitmap, and a "Transparent" color. If 0FFFFFFFFh is selected for the Transparent color, the function will take the color of the top left pixel (the first pixel, at co-ords 0,0) and use that.

I've included it in the above file. The file has all versions of the code, the instructions to them are just uncommented. Only the DLL version is now enabled by default. If you wanna test the others, uncomment the relevant code in the wndproc.inc file

It is done!
Posted on 2004-03-11 13:35:01 by SubEvil
Improvement over linked list approach: one single heap memory block, initially of sizeof(RGNDATAHEADER) + n*sizeof(RECT) bytes. Rather than adding list nodes, you store directly, and re-allocate the block (with room for n new items, instead of single one). More conservative on memory, probably a bit faster, and saves you from building the block at the end of stuff since you're creating it in-place.

This would only be a minor optimization though, and is probably more of a "conceptual" optimization than a "real" one (probably won't make too much of a difference speedwise, and memory-wise it shouldn't matter too much - region data can be counted in the tens of kilobytes).

Posted on 2004-03-11 15:35:25 by f0dder
I would have used this approach, of course, but I don't don't know how many RECT structures I'll need! I'd have to go through the bitmap once, calculate the number of RECT's, allocate the memory, then go through the bitmap again and fill the RECT's. My method uses only one pass through the Bitmap, as it finds a valid RECT, it allocates the memory and assignes the values then continues.
Posted on 2004-03-11 22:11:42 by SubEvil
You still only need one pass - you do need to keep a couple of counters, like currentItems and maxItems. When currentItems equal maxItems, you add "deltaSize" to maxItems and call HeapReAlloc.

I think the standard rule of thumb in comp.sci. for dynamically growing arrays is to double the size when the array is full, but for a lot of things I like "bumping by a delta-value" better. Ie, start with room for 256 RECTs, and allocate space for 256 new RECTs every time you run out of space.

Btw, did you re-upload the zip to include the ExtCreateRegion stuff?
Posted on 2004-03-12 00:51:30 by f0dder
I did ulpload my latest version!
Thanks for the suggestion!
Posted on 2004-03-12 01:45:03 by SubEvil
Hm, you uploaded the new version? Probably my proxy that has the old version cached, then.

Nice to have helped you gain even an extra bit of performance :)
Posted on 2004-03-12 05:46:07 by f0dder
Have a look at this, I finished up my version (C/C++ though) - I think it works pretty well :), around 20ms to create the region data on an athlon700. There's full source + a readme with some explanations in the archive.
Posted on 2004-03-19 16:05:01 by f0dder
Sorry to bump such an old topic (although a very good one) does anyone have f0dder's final C++ code. I already have SubEvil's.

Of course, now that I need the files, they aren't here anymore. Lesson learned, Always grab files that seem usefull today, tomorrow they may not be there.
BTW, Are old zip files and attachment functionality planned on being restored?

If can help me, please e-mail me the file(s) at goggles8@hotmail.com

Thanks a lot,? :)
Posted on 2005-03-23 19:17:52 by goggles99
Have a look at hxxp://flork.dk/junk/customshape_20040319.rar the bottom of this thread :)
Posted on 2005-03-24 09:43:27 by f0dder
Wow, Thanks F0dder...
much appreciated.... :D
Posted on 2005-03-24 14:15:39 by goggles99
No problem - hope it's of use to you.
Posted on 2005-03-25 07:26:47 by f0dder
Posted on 2006-06-19 07:10:06 by f0dder
I realize this is an insanely old topic, but bumped into it and decided to add the sources just for the sake of completeness - the reteam site might not always be available :)
Posted on 2010-11-04 14:25:26 by f0dder