Hey everybody!

I'm currently using a bilinear filter in a graphics program I'm writing. I did a quick search of the board and couldn't find anything significant on the topic, so I'd like to see other people's approaches.

Here it goes:

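Bilinear interpolation of X' from the points X_00, X_01, X_10, X_11, equally spaced on a grid. Here u is the fractional position from X_0j towards X_1j, and v is the fractional position from X_i0 towards X_i1.

Formula:

$$X' = X_{00}(1-u)(1-v) + X_{01}(1-u)v + X_{10}\,u(1-v) + X_{11}\,uv$$

if we let:

$$a = X_{00} + u(X_{10} - X_{00})$$

$$b = X_{01} + u(X_{11} - X_{01})$$

then:

$$X' = a + v(b - a)$$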

; uses fixed-point math

; 0 <= u,v < 1, but each is stored as a byte, i.e. multiplied by 256 (0..255)

; X_ij are byte values (RGB components in this case)

; the PROC returns the byte result in eax

```
Bilinear PROC uses edx ebx, X_00:DWORD, X_01:DWORD, X_10:DWORD, X_11:DWORD, u:DWORD, v:DWORD
mov eax, X_10     ;eax=X_10
mov ebx, X_00     ;ebx=X_00
mov edx, u        ;edx=u (fraction << 8)
sub eax,ebx       ;eax=X_10-X_00
shl ebx,8         ;ebx=X_00 << 8
imul eax,edx      ;eax=u(X_10-X_00) << 8
add eax,ebx       ;eax=a << 8
push eax          ;save a<<8 for a moment
mov eax, X_11     ;eax=X_11
mov ebx, X_01     ;ebx=X_01
sub eax,ebx       ;eax=X_11-X_01
shl ebx,8         ;ebx=X_01 << 8
imul eax,edx      ;eax=u(X_11-X_01) << 8
add eax,ebx       ;eax=b << 8
pop edx           ;edx=a << 8
sub eax,edx       ;eax=(b-a) << 8
shl edx,8         ;edx=a << 16
imul eax,v        ;eax=v(b-a) << 16
add eax,edx       ;eax=(a+v(b-a)) << 16
shr eax,16        ;eax=a+v(b-a), CF = bit 15 (the 0.5 bit)
adc eax,0         ;round to nearest
ret
Bilinear ENDP
```

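```
; X_ij - mmx_word[0:R:G:B]
; MMX version
; u,v=0..32767
movd mm5,[u]
punpcklwd mm5,mm5
punpckldq mm5,mm5       ; broadcast u to all four words
movq mm0,[qword X_00]
movq mm2,[qword X_10]
psubw mm2,mm0           ; X_10-X_00
paddw mm2,mm2           ; double it: u is only 15 bits, pmulhw keeps the high word
pmulhw mm2,mm5          ; u(X_10-X_00)
paddw mm0,mm2           ; mm0=a
movq mm3,[qword X_01]
movq mm4,[qword X_11]
psubw mm4,mm3
paddw mm4,mm4
pmulhw mm4,mm5
paddw mm3,mm4           ; mm3=b
movd mm6,[v]
punpcklwd mm6,mm6
punpckldq mm6,mm6       ; broadcast v
psubw mm3,mm0           ; b-a
paddw mm3,mm3
pmulhw mm3,mm6
paddw mm0,mm3
; mm0=color (words 0:R:G:B); pack/store it, and execute emms
; before any later x87 FPU code

; PIII,Athlon version
; u,v=0..65535
; pmulhuw multiplies unsigned, so in words where the difference is
; negative the high word comes out too large by exactly u (or v);
; the psraw/pand/psubw lines subtract it back
pshufw mm5,[u],0        ; broadcast u
movq mm0,[qword X_00]
movq mm2,[qword X_10]
psubw mm2,mm0
movq mm1,mm2
psraw mm1,15            ; 0FFFFh where X_10 < X_00
pmulhuw mm2,mm5
pand mm1,mm5
psubw mm2,mm1
paddw mm0,mm2           ; mm0=a
movq mm3,[qword X_01]
movq mm4,[qword X_11]
psubw mm4,mm3
movq mm1,mm4
psraw mm1,15
pmulhuw mm4,mm5
pand mm1,mm5
psubw mm4,mm1
paddw mm3,mm4           ; mm3=b
pshufw mm6,[v],0        ; broadcast v
psubw mm3,mm0
movq mm1,mm3
psraw mm1,15
pmulhuw mm3,mm6
pand mm1,mm6
psubw mm3,mm1
paddw mm0,mm3
; mm0=color
```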

You may reorder the instructions for extra speed.

That's kind of funny, that you have to write different code optimizations for different processors, lol. When will the world unify on one instruction set?

It isn't a matter of optimizing for different processors. Different processors have different **instruction sets**, so they need different **implementations**. It has always been that way, from the 8086 up to today's processors. I don't find anything funny here.

Personally, it's not even a matter of different processors. It's just a matter of making the best code with the tools you have available. If that means dealing with different processors and different instruction sets, then it's all the more fun to see what I can do and what other people can do :)

Furthermore, a lot of the actual implementation has nothing to do with processors. Consider Bubble Sort, Heap Sort and Quick Sort. They all accomplish the same thing, and you can code each of them for any given instruction set, but the *concept* behind each is different. They are not simply different implementations of the same thing; they differ in concept. The variation between my code and another person's code stems from a far deeper reason than instruction sets: it comes from being able to do the same thing using entirely different concepts.

To be honest, that's more than half the reason I post here, or code at all for that matter. Optimizations are all fine and dandy, but it's the concepts that are worth my time. I think other people would agree: just check out the StrLen thread. Ten pages of people looking for a byte! Are they just trying to squeeze out one more millisecond? I think it's because we want to find every possible way of finding that zero byte. And that's all.
