Since here are some graphic experts, I am pretty sure you can help me:

Whats the fastest way to replace Pixels (lets say RGB 0,0,0 with RGB FF,00,00) in a image (bitmap)?
Is there a better way than going through all pixels in two nested loops?
Posted on 2003-05-12 14:41:38 by bazik
I'm no gfx expert but I don't think you can do better than 2 nested loops(I'm assuming the rgb color to replace: 0, is random). If there is a pattern on rgb color 0, you can whip up some algo there.

Though you could do some parallel processing by using MMX (or SSE) to speed up the code.(I'm not sure!!!)

or you can combine the 2 loops into 1 and still achieve the same time(n) but you will save some registers in the process. :)

just my 2c! :)
Posted on 2003-05-12 15:02:25 by arkane
I am also no expert, thats why I am asking ;)

Isnt there a way to use masking?
Posted on 2003-05-12 15:04:40 by bazik
I can think of a Finite State Machine(FSM) solution but this is just on top of my head. :grin:

FSM is a memory hog because it consumes memory the same size as the bitmap. I'll look into it later, since I also need it. :grin:
Posted on 2003-05-12 15:15:03 by arkane
You could create 16MB LUT where lut = i, but lut[0] = FF0000.
Posted on 2003-05-12 15:15:05 by comrade
which format do you have the pixels in (windows bitmap, raw pixels, DIB, ...)
are we talking dynamic or fixed size image?
which color depth? (I assume 24 or 32?)

there's plenty of ways to go about it - MMX will probably give optimal result (or SSE2/MMX on P4's). For fun you could also do a CMOVcc version ;), or regular pentium-ish masking stuff...

Do avoid two nested loops (you should be able to process the image in one go), and do avoid conditional jumps if you can. I'm sure there's some clever people that can come up with something nice if you post more info, or I can ridicule myself trying.
Posted on 2003-05-12 15:39:31 by f0dder
please read the readme... :grin:
Posted on 2003-05-12 19:54:09 by arkane
bug report #1: seems to affect only the last few values...

0500 0000 0300 0000 0400 0000 0500 0000 

0500 0000 0500 0000 0200 0000 0400 0000
0400 0000 0500 0000 0400 0000 0500 0000
0200 0000 0000 0000 0400 0000 0200 0000
0500 0000 0300 0000 0400 0000 0500 0000 

0500 0000 0500 0000 0200 0000 0400 0000
0400 0000 0500 0000 0400 0000 0500 0000
0200 0000 [color=blue]0000 FF00[/color] [color=red]0000 FF00 0000 FF00[/color]
the value colored in red should not be there it should mimic the last 2 values of the source

the blue colored text is correct, the same thing goes of the black...

The error happens only if the last value/s is/aren't 0... Posted on 2003-05-12 20:05:29 by arkane

which color depth? (I assume 24 or 32?)

What's the difference? Besides step and alignment (32 would be faster), I don't see any.
Posted on 2003-05-12 23:00:01 by comrade
Simple CMOVcc:

lpMem - pointer to a memory
len - length
value - value to replace 0
ReplaceMem PROC USES ebx esi edi lpMem:DWORD, len:DWORD, value:DWORD

mov esi, lpMem
xor eax, eax
mov ebx, len
mov edi, value

mov edx, DWORD PTR [esi+eax*4]
test edx, edx
mov ecx, edx
cmovz ecx, edi
mov DWORD PTR [esi+eax*4], ecx
inc eax
cmp eax, ebx
jb @b


ReplaceMem ENDP
Posted on 2003-05-13 00:53:59 by arkane

What's the difference? Besides step and alignment (32 would be faster), I don't see any.

It's easier to stuff alpha in 32bpp :) - other than that, 32bpp would mainly be because it's more comfortable to work with, and you don't get unaligned reads/writes.

bazik, give more info :) - do you want just the value zero, or do you want to replace arbitrary color values?
Posted on 2003-05-13 01:43:19 by f0dder
ReplaceMemEx PROC lpMem:DWORD, len:DWORD, value:DWORD

mov eax, value
movd MM1, eax
movd MM2, eax
punpckldq MM2, MM1
pcmpeqd MM3, MM3
mov edx, lpMem
mov eax, len
sub eax, 2
pxor MM0, MM0
movq MM1, QWORD PTR [edx+eax*4]
pcmpeqd MM0, MM1
pxor MM0, MM3
pandn MM0, MM2
pxor MM0, MM1
movq QWORD PTR [edx+eax*4], MM0
sub eax, 2
jns @b
cmp eax, -1
jl @f
mov eax, DWORD PTR [edx]
test eax, eax
jnz @f
mov eax, value
mov DWORD PTR [edx], eax
ReplaceMemEx ENDP
I have a shorter version but the array length must be even which is restrictive IMO. :grin:
Posted on 2003-05-13 02:12:58 by arkane
Here is fastest way:

mov edi, OFFSET lut
mov ecx, 16*1024*1024 shr 2
xor eax, eax
rep stosd
mov dword ptr [lut+00000000h], 00FF0000h

mov ebx, OFFSET lut
mov edx, OFFSET bitmap
mov ecx, width*height
next: mov eax, [edx]
mov eax, [ebx+eax*4]
mov [edx], eax
add edx, 4
loop next
Posted on 2003-05-13 08:15:18 by comrade
fastest? you use a 16meg LUT, that's hardly good for cache. You use loop, which is slow on a whole bunch of processors. Assuming 32bpp, you ought to mask off alpha bits (and if you planned to use that code on 24bpp... whoa).

Besides, there's other flaws. Like, you need a (2^24)*4 LUT, not one size (2^4)/4 . And your initialization is wrong too :). Come on comrade, you can do better than jokes like this :)
Posted on 2003-05-13 08:46:39 by f0dder
What's a LUT ?

Posted on 2003-05-13 09:40:29 by JCP
A Look-Up Table.
Posted on 2003-05-13 09:41:19 by f0dder
Uhm, obvious. :)

Thanks ! :alright:
Posted on 2003-05-13 09:42:40 by JCP
Sorry, was a bit busy today.

The images are about 1100x800 pixel as 256 color GIF images. But dont care about the file format please because that isnt the problem ;)
Posted on 2003-05-13 13:39:25 by bazik
Ohh GIF - 8bpp! 1 byte/pixel, so we can handle a lot of pixels with one MMX instruction (8 with MMX, 16 with SSE2/MMX :P). How do you load the gif? One big block, or are the horizontal lines packed in any way?

Are there any guarantee on image size? (Ie, width*height will always be a multiple of 8/16 ;)) - not that it would be too troublesome handling the last few pixels with regular code.
Posted on 2003-05-13 14:39:49 by f0dder
Lets assume I have a hDC.

The X/Y pixels are not fixed. They can be up to ~3000x~2400.
I've never worked with graphics before, except saving a hDC as a JPEG ;)

What I currently do is (pseudo code)

for x = 0 to pic_width
for y = 0 to pic_height
if Getpixel(hdc, x, y) = RBG(0,0,0) then
Setpixel(hdc, x, y, RGB(0,0,ff))
end if
next y
next x

Which takes about 2 minutes to do the following replacements:

White -> Red
Black -> White
Red -> Black
Cyan -> Purple
Posted on 2003-05-13 14:59:25 by bazik