I have one array of values (real4) with at least 12 values, and I want to convert that array into another array of exactly 12 values, which represent the averages of each part of the original array.

A simple example: Imagine the first array has 36 values, I want to convert that array into an array of 12 values, where each values is the avarage of 3 values from the original array..

Problems arise when the first array isn't a multiple of 12. I still want accurate averages, so the data has to be interpolated.

I'm looking for a simple algorithm to do this. I can think of some myself but I was wondering if there's a good one around.

Thomas

A simple example: Imagine the first array has 36 values, I want to convert that array into an array of 12 values, where each values is the avarage of 3 values from the original array..

Problems arise when the first array isn't a multiple of 12. I still want accurate averages, so the data has to be interpolated.

I'm looking for a simple algorithm to do this. I can think of some myself but I was wondering if there's a good one around.

Thomas

A good metaphor would be the line draw algorithm.

Think of that algorithm, and you should see the simple

solution - if you have no idea what I'm talking about,

I'll have to add more when I'm at home - too busy at work. ;)

Think of that algorithm, and you should see the simple

solution - if you have no idea what I'm talking about,

I'll have to add more when I'm at home - too busy at work. ;)

This appears to be a working algo:

```
; arrDest Pointer to buffer of size lenDest*4. This buffer will hold an
```

; array of REAL4 values returned from proceedure.

; arrSrc Pointer to buffer of size lenSrc*4. This buffer holds lenSrc REAL4

; values to be averaged by the proceedure.

;

NonUniformAverages PROC uses esi edi, arrDest:DWORD, arrSrc:DWORD, lenDest:DWORD, lenSrc:DWORD

mov esi,arrSrc

mov edi,arrDest

mov edx,lenDest

cmp edx,lenSrc

jge Error

xor eax,eax

div lenSrc

fild lenSrc

fild lenDest

fdiv ; number of source positions per destination position (Note: >1)

fldz ; init

test edx,edx

jne @F

Main: fadd REAL4 PTR [esi]

add esi,4

@@: add edx,eax

jnc Main

; load next value and split it into parts

; the fractional part in EDX indicates how much of next

; value goes into this and next position of destination

mov lenSrc,edx ; we use this for temp storage

fild lenSrc

fdiv fpc(4294967296.0) ; 2^32

fld REAL4 PTR [esi]

fld st ; two copies

add esi,4

fmul st(0),st(2)

fadd st(3),st(0)

fsubp st(1),st(0)

fxch st(2)

fstp st(1)

fdiv st(0),st(2)

fstp REAL4 PTR [edi]

add edi,4

dec lenDest

jne @B

ret

Error: jne @F

; this isn't exactly an error ;)

mov ecx,edx

rep movsd

or eax,-1

ret

@@: xor eax,eax ; error checking isn't a bad thing

ret

NonUniformAverages ENDP

I'd write some test code and work through the algo a bit more thoroughly before claiming it worked...but it looks good. ;)Thanks, looks good. I'll look at it later, don't have time right now.

Thomas

Thomas

Well it works most of the time but it has some strange behaviour.. Sometimes it produces outputs above 1.0 while all inputs are 1.0 or less...

Look at the attachment below, the inputs are almost all at the maximum (black line), but the averages (blended color) seems to fluctuate quite a bit.

Look at the attachment below, the inputs are almost all at the maximum (black line), but the averages (blended color) seems to fluctuate quite a bit.

Here's some code I tried, it produces invalid output values sometimes though.. Probably because it reads invalid values somewhere but I didn't found it yet. The algorithm I figured out is described in the attachment, the code only calculates the extended area, and does not yet substract the overflow parts. It's also not optimized in any way, I will when it's working..

Thomas

Thomas

```
```

[size=9]

NonUniformAverages PROC uses esi ebx edi arrDest:DWORD, arrSrc:DWORD, lenDest:DWORD, lenSrc:DWORD

LOCAL leftRest:DWORD

LOCAL rightRest:DWORD

LOCAL temp:DWORD

finit

mov esi,arrSrc

mov edi,arrDest

fild lenSrc

fidiv lenDest ;lenSrc/lenDest

fld1

fadd st(0),st(0) ;st(0) = 2, st(1)=lenSrc/lenDest

xor ecx, ecx

.WHILE ecx<lenDest

; calculate (lenSrc/lenDest) * ecx, rounded down.

mov eax, lenSrc

mul ecx

xor edx, edx ;not really necessary

div lenDest

mov ebx, eax

mov leftRest, edx

; calculate (lenSrc/lenDest) * (ecx+1), rounded up.

mov eax, lenSrc

inc ecx

mul ecx

dec ecx

xor edx, edx ;not really necessary

div lenDest

or edx, edx

jz @F

inc eax

@@:

mov rightRest, edx

cmp eax, ebx

je @lr_eq

fldz

mov edx, ebx

inc edx

.WHILE edx<=eax

fadd real4 ptr [esi+4*edx-4]

fadd real4 ptr [esi+4*edx]

inc edx

.ENDW

fdiv st(0),st(1)

jmp @F

@lr_eq:

fld real4 ptr [esi+4*eax]

@@:

.IF leftRest

;calculate and substract left overflow

;yet to do

.ENDIF

.IF rightRest

;calculate and substract right overflow

;yet to do

.ENDIF

;st(0)=area

;st(1)=2

;st(2)=lenSrc/lenDest

fdiv st(0),st(2)

fstp real4 ptr [edi+4*ecx] ; area/width = average

inc ecx

.ENDW

fstp st(0) ;kill 2

fstp st(0) ;kill lenSrc/lenDest

ret

NonUniformAverages endp

[/SIZE]

I do the same algo, but since were are dealing with an array, we know that the values substracted at the end of the algo are added to the next partition. I didn't test with values <1, but will - it should work and be fast. ;)

I do the same algo, but since were are dealing with an array, we know that the values substracted at the end of the algo are added to the next partition

Clever, didn't think of that..

I don't know if the problems only occur with small values, but I need the algo for values between -1 and 1.

Here's an example that causes problems:

```
```

inData REAL4 0.0, -0.1788022, -0.7071068, -0.9645574

REAL4 -0.9862856, -1.0, -0.9964929, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0

REAL4 -0.9958844, -0.8571673, -0.5150281, 0.0

...

invoke NonUniformAverages, addr outData, addr inData, 12, 17

I know the inputs are a bit weird but they are ripped from a memory dump and I don't know which values cause the problems.

All values are between -1 and 1 so the averages should lie between those values as well.

However, the 4th output value is -1.189916, the 8th is -1.702422, which are both out of range.

Could it be the presision of the algorithm? I tried to move the division by 2^32 so it would multiply first, then divide (more accurate) but didn't help. And as the FPU uses 80-bit registers for internal use it shouldn't be a problem.

Thomas

Presision shouldn't be a problem because the size of the arrays are so small - must be the algo - I threw it together really quickly. I didn't even work through the math - which I should have done. I just patched things together until the number output were what I expected. :tongue: Thanks for the values - I will surely try them when I get home.

Well, I found a few errors. I'm really surprised the presision is so bad - must still be an error some where? This algo seems right to me. I'll optimize it and see if I can find some bugs. Created serveral tests and they appear to work, but then the data above still produces values out of range? (but close) Let me know if you find something.

Just for clarification, here is a picture of what my algo is doing:

Escentially, EAX is the width of a source item, and when EDX overflows it means we've passed another destination width. At the time of overflow EDX holds the width of the second piece of the split source item. The split itself is when EDX=0 (red line: second picture).

Escentially, EAX is the width of a source item, and when EDX overflows it means we've passed another destination width. At the time of overflow EDX holds the width of the second piece of the split source item. The split itself is when EDX=0 (red line: second picture).

Oh, and the working algo - almost forgot.

Still needs to be optimized. ;)

Still needs to be optimized. ;)

```
; arrDest Pointer to buffer of size lenDest*4. This buffer will hold an
```

; array of REAL4 values returned from proceedure.

; arrSrc Pointer to buffer of size lenSrc*4. This buffer holds lenSrc REAL4

; values to be averaged by the proceedure.

;

NonUniformAverages PROC uses esi edi, arrDest:DWORD, arrSrc:DWORD, lenDest:DWORD, lenSrc:DWORD

LOCAL fTemp:QWORD

mov esi,arrSrc

mov edi,arrDest

mov edx,lenDest

cmp edx,lenSrc

jge Error

xor eax,eax

div lenSrc

and DWORD PTR [fTemp+4],0

; 1/2^32

fld fpc(0.000000000232830643653869628906250)

fild lenDest

fidiv lenSrc

fldz ; init

Main:

add edx,eax

jc @F

fadd REAL4 PTR [esi]

add esi,4

jmp Main

@@:

fadd REAL4 PTR [esi]

mov DWORD PTR [fTemp],edx

fild fTemp

fmul st(0),st(3)

fmul REAL4 PTR [esi]

fdiv st(0),st(2)

fxch st(1)

fsub st(0),st(1)

fmul st(0),st(2)

fstp REAL4 PTR [edi]

add esi,4

add edi,4

dec lenDest

jne Main

fstp st(0)

fstp st(0)

fstp st(0)

ret

Error:

jne @F

; this isn't exactly an error ;)

mov ecx,edx

rep movsd

or eax,-1

ret

@@: xor eax,eax ; error checking isn't a bad thing

ret

NonUniformAverages ENDP

invoke NonUniformAverages, addr outData, addr inData, LENGTHOF outData, LENGTHOF inData

Thanks a lot! :alright: It works like a charm now.

All output values are correct too.

Thomas

All output values are correct too.

Thomas

Your welcome. The FPU isn't something I'm very familar with.

Are you using this algo to generate inputs for a neural net from more dynamic data?

Are you using this algo to generate inputs for a neural net from more dynamic data?

Are you using this algo to generate inputs for a neural net from more dynamic data

.
Exactly, I now convert the angle to sine and cosine values. De black line is the sine value graph which I average to 12 values (the blended color) to get the inputs.

It works much better now, I was able to recognize eight different characters with a training set of each character drawn only once..

Thomas

Very impressive! I'm still working on the MMX versions. Initially, I'm just working to produce optimal output generators. Then I'll work on trying to train them. It's harder than I thought - trying to do plain MMX so everyone can use it.

I don't know which ranges the weights exactly are but because floating points are used they are very flexible (very small values like 1e-10 to very large values like 1e10). It's hard to do the same with integer values. You also need to calculate e^x on mmx regs... Seems very hard to me. Instruction sets like 3dnow are easier I think but you need to write them seperately for intel and amd :(. MMX is generally supported.

If you need help understanding the algorithm just ask, it took me quite some time to understand.

Thomas

If you need help understanding the algorithm just ask, it took me quite some time to understand.

Thomas

Actually, I'm trying to use another method - wouldn't even bother with e^X in SMD integer...scary! ;) That function was only chosen because of it's properties - if I could do the same algorithmically with another function it should still work, or so I hope? It's all in fun and learning.

You can choose any function to do that, even a hardlimiting treshold function that only outputs 0 or 1, but the formulas in the back-propagation algo are derrivated from that formula, so you will need to derrivate them yourself (yuck :tongue: ).

Thomas

Thomas

'Yuck' is a good word for it. I'm leaning towards a genetic approach combined with some back-propagation. When the algo is this fast learning can take 1000x longer. Okay, maybe it's more like a 1000000x times longer. ;) I really want some back-propagation as well.

Are you randomizing the inputs when you train your NN's?

Are you randomizing the inputs when you train your NN's?