Hey everybody!

I'm using a bilinear filter in a graphics program I'm working on. A quick search of the board didn't turn up anything significant on the topic, so I'd like to see other people's approaches to it.

Here it goes:

Bilinear interpolation of X' from points X_00,X_01,X_10,X_11 equally spaced on a grid.
X'=X_00(1-u)(1-v) + X_01(1-u)v + X_10u(1-v) + X_11(uv)
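
As a sanity check, the formula can be written out directly in floating point. This is only a reference sketch in C; the function name bilinear_ref and the use of double are assumptions made for illustration, not part of the routine that follows.

```c
/* Reference bilinear blend in floating point (illustrative only).
   x00, x01, x10, x11 are the four corner samples; u and v are the
   fractional offsets in [0, 1]. The weights match the formula above. */
double bilinear_ref(double x00, double x01, double x10, double x11,
                    double u, double v)
{
    return x00 * (1.0 - u) * (1.0 - v)
         + x01 * (1.0 - u) * v
         + x10 * u * (1.0 - v)
         + x11 * u * v;
}
```

With u = v = 0 this returns X_00, and with u = v = 0.5 it returns the average of the four corners.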

If we let:

a = X_00 + u(X_10 - X_00)
b = X_01 + u(X_11 - X_01)

then X' = a + v(b - a)

;uses fixed point math
;0<=u,v<1 but are stored as a byte so are multiplied by 256
;X_ij are byte values (RGB values in this case)
;PROC returns byte value in eax

Bilinear PROC uses edx ebx X_00,X_01,X_10,X_11,u,v:DWORD
mov eax, X_10 ;eax=X_10
mov ebx, X_00 ;ebx=X_00
mov edx, u ;edx=u <<8
sub eax,ebx ;eax=X_10-X_00
shl ebx,8 ;ebx=X_00 << 8
imul eax,edx ;eax=u(X_10-X_00)
add eax,ebx ;eax=a<<8
push eax ;save eax for a moment

mov eax, X_11 ;eax=X_11
mov ebx, X_01 ;ebx=X_01
sub eax,ebx ;eax=X_11-X_01
shl ebx,8 ;ebx=X_01 << 8
imul eax,edx ;eax=u(X_11-X_01)
add eax,ebx ;eax=b<<8
pop edx ;edx=a<<8

sub eax,edx ;eax=b-a <<8
shl edx,8 ;edx=a << 16
imul eax,v ;eax=v(b-a) <<16
add eax,edx ;eax=a+v(b-a) << 16
shr eax,16 ;eax=a+v(b-a)
adc eax,0 ;round up if necessary (CF=last bit shifted out)
ret ;result in eax
Bilinear ENDP
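
For anyone who wants to check the fixed-point steps without an assembler, here is a rough C model of the same arithmetic. The name bilinear_fixed is made up for this sketch; it assumes corner values in 0..255 and u, v in 0..255 (the fraction times 256), as the comments at the top of the routine describe.

```c
#include <stdint.h>

/* C model of the fixed-point routine (illustrative, not the original code).
   Corner values are bytes 0..255; u and v are fractions scaled by 256. */
uint32_t bilinear_fixed(int32_t x00, int32_t x01, int32_t x10, int32_t x11,
                        int32_t u, int32_t v)
{
    int32_t a = (x00 << 8) + u * (x10 - x00);  /* a << 8 */
    int32_t b = (x01 << 8) + u * (x11 - x01);  /* b << 8 */
    int32_t r = (a << 8) + v * (b - a);        /* (a + v(b-a)) << 16 */
    /* shr 16 followed by adc 0 adds bit 15 back in: round to nearest */
    return ((uint32_t)r + 0x8000u) >> 16;
}
```

With these ranges a, b and r never go negative, so the final unsigned shift is safe.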
Posted on 2002-04-12 23:00:52 by chorus

; X_ij - mmx_word[0:R:G:B]

; MMX version
; u,v=0..32767
movd mm5,[u]
punpcklwd mm5,mm5
punpckldq mm5,mm5

movq mm0,[qword X_00]
movq mm2,[qword X_10]
psubw mm2,mm0
paddw mm2,mm2
pmulhw mm2,mm5
paddw mm0,mm2 ; mm0=a

movq mm3,[qword X_01]
movq mm4,[qword X_11]
psubw mm4,mm3
paddw mm4,mm4
pmulhw mm4,mm5
paddw mm3,mm4 ; mm3=b

movd mm6,[v]
punpcklwd mm6,mm6
punpckldq mm6,mm6

psubw mm3,mm0
paddw mm3,mm3
pmulhw mm3,mm6
paddw mm0,mm3
; mm0=color
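
To see why the difference is doubled before pmulhw, it may help to model one 16-bit lane in C. The helper names mulhi_s16 and lerp_q15 are hypothetical. The sketch assumes channel values in 0..255 and a weight in 0..32767: pmulhw keeps the high 16 bits of a signed product, so 2*d*u >> 16 works out to d*u/32768.

```c
#include <stdint.h>

/* One lane of pmulhw: signed 16x16 multiply, keep the high word.
   The floor is written out by hand so the sketch does not depend on
   how the compiler shifts negative values. */
int16_t mulhi_s16(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    return (int16_t)(p >= 0 ? p >> 16 : -((-p + 65535) >> 16));
}

/* x0 + u*(x1 - x0) for one channel, u in 0..32767 standing for 0..1 */
int16_t lerp_q15(int16_t x0, int16_t x1, int16_t u)
{
    int16_t d = (int16_t)(x1 - x0);  /* psubw */
    d = (int16_t)(d + d);            /* paddw: double, since u only has 15 bits */
    return (int16_t)(x0 + mulhi_s16(d, u));  /* pmulhw, paddw */
}
```

Because the multiply is signed, a negative difference (x1 < x0) needs no special handling here.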

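; PIII,Athlon version
; u,v=0..65535
; pmulhuw is unsigned, so each signed difference gets a sign fix:
; subtract the weight wherever the difference is negative
pshufw mm5,[u],0
movq mm0,[qword X_00]
movq mm2,[qword X_10]
psubw mm2,mm0
movq mm1,mm2
psraw mm1,15 ; FFFFh where the difference is negative
pand mm1,mm5 ; u in those words, else 0
pmulhuw mm2,mm5
psubw mm2,mm1
paddw mm0,mm2 ; mm0=a
movq mm3,[qword X_01]
movq mm4,[qword X_11]
psubw mm4,mm3
movq mm1,mm4
psraw mm1,15
pand mm1,mm5
pmulhuw mm4,mm5
psubw mm4,mm1
paddw mm3,mm4 ; mm3=b
pshufw mm6,[v],0
psubw mm3,mm0
movq mm1,mm3
psraw mm1,15
pand mm1,mm6
pmulhuw mm3,mm6
psubw mm3,mm1
paddw mm0,mm3
; mm0=color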
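
One thing to watch with pmulhuw: it multiplies both operands as unsigned words, while a psubw difference can be negative. A C model of one lane shows what that does. The helper names mulhi_u16 and mulhi_su16 are hypothetical, and the correction shown is a common trick for a signed-by-unsigned high multiply, offered here as a sketch.

```c
#include <stdint.h>

/* One lane of pmulhuw: unsigned 16x16 multiply, keep the high word. */
uint16_t mulhi_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* High word of (signed d) * (unsigned u), built on the unsigned multiply.
   A negative d has the bit pattern d + 65536, so the raw result comes out
   too large by exactly u; subtracting u in that case restores the signed
   product. */
int16_t mulhi_su16(int16_t d, uint16_t u)
{
    uint16_t raw = mulhi_u16((uint16_t)d, u);
    uint16_t fix = (d < 0) ? u : 0;
    return (int16_t)((int32_t)raw - (int32_t)fix);
}
```

For d = -10 and u = 32768 (one half), the raw unsigned result is 32763, while the corrected value is -5.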

You may reorder the instructions for speed.
Posted on 2002-04-14 02:49:11 by Nexo
That's kind of funny that you have to create different code optimizations for different processors, lol. When will the world unify down to one code set?
Posted on 2002-04-14 14:55:45 by Volcano_88101
This isn't optimization for different processors. Different processors have different instruction sets, so the implementations differ. It has always been that way, from the 8086 up to today's processors. I don't find anything funny here.
Posted on 2002-04-15 09:44:07 by Nexo
Personally, it's not even a matter of different processors. It's just a matter of making the best code with the tools you have available. If that means different processors and different instruction sets, then it's more fun to see what I can do and what other people can do :)

Furthermore, a lot of the actual implementation has nothing to do with processors. Consider the different implementations of Bubble Sort and Heap Sort or Quick Sort. They all accomplish the same thing and you can code each for any given instruction set. But the *concept* is different for each. They are not simply different implementations of the same thing, they are each different in concept. The variation between my code and another person's code stems from a far deeper reason than instruction sets: it stems from being able to do the same thing using entirely different concepts.

To be honest, that's more than half the reason I post here, or code at all for that matter. Optimizations are all fine and dandy, but it's the concepts that are worth my time. I think other people would agree: just check out the StrLen thread. 10 pages of people looking for a byte! Are they just trying to squeeze out 1 more millisecond? I think it's just because we want to find all the ways we can find that zero byte. And that's all.
Posted on 2002-04-16 21:31:50 by chorus