I just want to make sure I understand this correctly.

This pertains to training a Feed-Forward Neural Network using back-propagation.

After I have calculated the output set of floats, I should then calculate a deltaError set of floats by taking the difference between it and a desired set of output floats.

Then I should work back through the neural network, and for each neuron, correction of weights follows this basic formula:

Neuron.CorrectedInputWeight=Neuron.InputWeight - (LearningCoefficient * (Neuron.InputWeight * deltaError) )

So if the LearningCoefficient were 0.50f (half), then:

New weight = old weight - (half the old weight's contribution to the error)

We simply work back through the layers of neurons one at a time, calculating their new input weights.

Once we have calculated each layer's new weights in this fashion, the deltaError values exposed to the previous layer (as we work backwards) should now be recalculated from the corrected Input Weights of the current layer, rather than the ones we calculated initially at the output layer.

When we reach (and have processed) the Input Layer of neurons, there are no more per-neuron Input Weights to correct, and we have completed the training cycle.

Is all of this correct?

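I think maybe you have the delta bit wrong. The delta for an inner layer is the product of the deltas of the previous layer (the one above it) with its own output weights: each of those deltas is multiplied by the weight connecting it to this neuron, and the results are summed. In effect the deltas travel back through the net in the same way as inputs travel forward through it, with the difference being that the weighted sum is then multiplied by the derivative of the neuron's output instead of being passed through the sigmoid.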

As the deltas pass back through layer after layer, these products quickly tend to 0 in large nets, because most of the numbers you deal with are less than 1.

Unless you're talking about some other sort of feedforward net that I haven't heard of? What I describe above is the classic backpropagation algorithm.
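A rough C++ sketch of why the deltas shrink, under deliberately simplified, hypothetical assumptions: a chain with one neuron per layer, the same weight w on every link, and every neuron sitting at the same output x. The function name is made up for illustration. Each layer multiplies the delta by w times x(1 - x), and x(1 - x) is never larger than 0.25 for a sigmoid output.

```cpp
#include <cassert>

// Illustration only: follow one delta back through `layers` hidden layers of a
// hypothetical chain net (one neuron per layer, every link weighted w, every
// neuron outputting x). Each step applies the hidden-delta rule with a single
// neuron above it: d = x(1 - x) * w * d.
double deltaAfterLayers(double outputDelta, double w, double x, int layers) {
    assert(layers >= 0);
    double d = outputDelta;
    for (int i = 0; i < layers; ++i) {
        d = x * (1.0 - x) * w * d;
    }
    return d;
}
```

With x = 0.5 (where x(1 - x) peaks at 0.25) and w = 0.5, each layer scales the delta by 0.125, so deltaAfterLayers(1.0, 0.5, 0.5, 3) returns 1/512, roughly 0.002.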

yes, classic ff with back prop ..

if I post my version of this stuff, wanna help get the trainer working? the code is quite small - written with atc oop :)

The delta for an inner layer is the product of the deltas of the previous layers together with its own output weights.

So I calculate a SINGLE "back-delta" float value for the ENTIRE output set of neurons by adding their deltas, then I apply this back-delta value as the delta for all neurons of the previous layer?

You say product - do you mean sum, or do you mean I should multiply? I want help, I'm REALLY close here :)

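Well, I've no problem helping, but I only have web access till tomorrow evening. Yes, by product I do mean multiply: each delta from the layer above is multiplied by the weight connecting it to the current neuron, and those weighted deltas are then summed. I learned the formulas from the site generation5.org (originally pointed to it by thomas). To summarize the delta bit: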

First you work out a separate delta for each output neuron in the manner you mention in your first post (strictly, for sigmoid outputs the classic algorithm also multiplies that difference by x(1 - x), where x is the neuron's output). This leaves each neuron with a particular delta representing how wrong it was, and these deltas have to be propagated backwards using this formula:

$$d = x(1 - x)\left(w_1 d_1 + w_2 d_2 + \dots + w_i d_i\right)$$

where x is the current neuron's output, the d's are the deltas of the layer above, and each w is the weight connecting that neuron to the current one.
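Putting that formula together with the weight-update rule, a minimal C++ sketch of one classic back-propagation step might look like this. It assumes things the thread does not specify: a single hidden layer, a logistic sigmoid on every neuron, no bias weights, and hypothetical names (Matrix, forwardLayer, trainStep). Every delta is computed from the original weights before any weight is touched, and each change delta*lCoeff*input is added, with the delta taken as desired minus actual output.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal net: one hidden layer, logistic sigmoid on every neuron,
// no bias weights (left out only to keep the sketch short).
// w[j][i] is the weight from neuron i of the layer below into neuron j.
using Matrix = std::vector<std::vector<double>>;

double sigmoid(double net) { return 1.0 / (1.0 + std::exp(-net)); }

// Forward pass through one layer: weighted sum of inputs, then the sigmoid.
std::vector<double> forwardLayer(const Matrix& w, const std::vector<double>& in) {
    std::vector<double> out(w.size());
    for (std::size_t j = 0; j < w.size(); ++j) {
        assert(w[j].size() == in.size());
        double net = 0.0;
        for (std::size_t i = 0; i < in.size(); ++i) net += w[j][i] * in[i];
        out[j] = sigmoid(net);
    }
    return out;
}

// One classic back-propagation step on a single training pair.
void trainStep(Matrix& wHidden, Matrix& wOut, const std::vector<double>& input,
               const std::vector<double>& target, double lCoeff) {
    const std::vector<double> hidden = forwardLayer(wHidden, input);
    const std::vector<double> output = forwardLayer(wOut, hidden);
    assert(output.size() == target.size());

    // 1. Output deltas: (desired - actual) times the sigmoid derivative x(1 - x).
    std::vector<double> dOut(output.size());
    for (std::size_t k = 0; k < output.size(); ++k)
        dOut[k] = (target[k] - output[k]) * output[k] * (1.0 - output[k]);

    // 2. Hidden deltas: d = x(1 - x)(w1*d1 + ... + wi*di), using the output
    //    weights as they were BEFORE any adjustment.
    std::vector<double> dHidden(hidden.size());
    for (std::size_t j = 0; j < hidden.size(); ++j) {
        double sum = 0.0;
        for (std::size_t k = 0; k < output.size(); ++k) sum += wOut[k][j] * dOut[k];
        dHidden[j] = hidden[j] * (1.0 - hidden[j]) * sum;
    }

    // 3. Only now change the weights, ADDING delta * lCoeff * input.
    for (std::size_t k = 0; k < wOut.size(); ++k)
        for (std::size_t j = 0; j < hidden.size(); ++j)
            wOut[k][j] += dOut[k] * lCoeff * hidden[j];
    for (std::size_t j = 0; j < wHidden.size(); ++j)
        for (std::size_t i = 0; i < input.size(); ++i)
            wHidden[j][i] += dHidden[j] * lCoeff * input[i];
}
```

In this sketch, training is just calling trainStep over the training pairs repeatedly; with a modest lCoeff, one step on a pair should move that pair's output towards its target.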

Only after propagating the deltas back do you go and adjust the weights of each neuron. You adjust each weight by calculating (delta*lCoeff*input) and adding that value to the weight. I should stress ADDING, because I notice you mention subtraction in your first post: with the delta taken as desired minus actual output, adding is the correct direction.

My code differs slightly from thomas' code.

It performs sigma filtering etc.

There may be a couple of files in here I am not using in this example, let me know if anything is missing.

Sorry EvilHomer, no offence to your code, which seems nice, but my head has been a bit fried of late and I'm not actually up to the challenge of reading and trying to figure out code. I'm happy to try to answer questions or maybe look at specific bits of code, but I'm not currently able to apply myself to understanding something as a whole. Sorry.

BTW, what do you mean by sigma filtering, or how do you implement it?

Sigma filtering is just a nice way of saying "S-shaped graph": basically it's a filter function we apply to smooth out harsh transitions from one side of zero to the other.

The sigmoid function I have looks like this:

return ( 1 / ( 1 + exp(-netinput / response)))
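Wrapped as a complete C++ function, that one-liner might look like the sketch below. The signature and the check on response are assumptions, since the full posted code isn't shown here.

```cpp
#include <cassert>
#include <cmath>

// The posted one-liner as a complete function. `response` divides the net input:
// values above 1 flatten the S-curve, values below 1 sharpen it.
// Assumption: response is strictly positive.
double sigmoid(double netinput, double response) {
    assert(response > 0.0);
    return 1.0 / (1.0 + std::exp(-netinput / response));
}
```

Whatever the response, sigmoid(0.0, response) is exactly 0.5; the output only approaches 0 and 1 as netinput heads towards minus or plus infinity.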

Here are some notes to explain sigma better:

http://ieee.uow.edu.au/~daniel/software/libneural/BPN_tutorial/BPN_English/BPN_English/node5.html

Do you perhaps mean sigmoid? What is the response variable in the above equation? Sorry for all the questions :)

I believe that refers to the ActivationResponse threshold per neuron.

The function takes the net Activation Input for a neuron and tempers it with the ActivationResponse value, while also squashing it smoothly into the range 0 to 1 (normalizes).

AFAIK "sigmoid" describes any mathematical function with a sigma-wave output. A sigma-wave approaches 0 at minus infinity and 1 at plus infinity (the function above gives exactly 0.5 at 0), and curves in an S shape in between, similar to the rising half of a sine wave.

Anyhow, I found an article by that (Mathew James?) chap, where he describes a per-layer method of back-propagation very similar to how I described it, and says that he prefers it. I think I'll give it a go. The main point to note is that my NN has N outputs, not just one, otherwise it's straightforward.

btw, this project is the debut for my CArrayManager class, which is a CVector<anything>-style class: a dynamic array manager for arrays with elements of arbitrary size :)

Yes, from the output layer working back, he calculates an error delta array for each layer, then (this confuses me) he adds and averages the error for each layer, and uses this common error metric to modify the input weights of ALL neurons on that layer, then moving back to the previous layer and so on.

I want to know why I would want to affect a neuron with the errors of its neighbours - this seems weird and wrong.

Should I not just propagate each output neuron's error back thru THAT neuron's input weights and so on? This would seem to be more logical to me, but then I'm probably very wrong here.

I agree with you: that doesn't make sense. It is important to note that if you modify the weights of one layer before working out the deltas of the previous layer, then the deltas you later calculate there will be wrong. Probably not totally wrong, but still...

The thing with these nets is that most learning occurs in the final layer; that's why many different techniques still seem to produce nets that can learn. But for highly non-linear associations between input and output, the extra layers are needed. As best I can tell, the initial random weights scramble the inputs, and the hope is that somewhere in the outputs of the hidden layers there will be a pattern which can be linked linearly to the desired output. Training then both makes this link and strengthens that fluke pattern.

The back-propagation algorithm is worked out mathematically, hence bits in the formulas like x(1 - x), which don't necessarily make intuitive sense until you start working out the derivative of the sigmoid function. When you start playing with the formulas from a programming point of view, you can find that everything still seems to work, but chances are you'll mess up the part of training that strengthens those non-linear associations; ultimately, I've found that such nets then tend to be very bad at fine-tuning their outputs to the desired ones.

Also, regarding that response variable: unless there is a very specific need for it, I'd drop it. netinput is a weighted sum, so dividing it by the response variable only throws another weighting into the mix. Unless the training equations take it into account, I could see such a variable slowing down learning. Even if they do take it into account, I can't think of a way it would be an improvement.

I would be very interested in reading this article you mention.

Thanks. I must say I disagree with bits of that, or rather I disagree with my understanding of it. He says that the formulas for the deltas are derived with respect to the squared error. Fair enough, I believe he's right there. But there is certainly no need to actually work out what that squared error is, because it's never used in the formulas; you might only use it to compare to a threshold value (as he does mention earlier).
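To illustrate that the squared error only matters as a stopping test, here is a small C++ sketch. The names and the threshold rule are hypothetical, and some texts use half of this sum, which only rescales the threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of squared errors over the output neurons. No delta formula needs this
// value; here it only serves as a convergence check.
double sumSquaredError(const std::vector<double>& output,
                       const std::vector<double>& target) {
    assert(output.size() == target.size());
    double sse = 0.0;
    for (std::size_t k = 0; k < output.size(); ++k) {
        double e = target[k] - output[k];
        sse += e * e;
    }
    return sse;
}

// Hypothetical stopping rule: stop training once the error is below a threshold.
bool converged(const std::vector<double>& output,
               const std::vector<double>& target, double threshold) {
    return sumSquaredError(output, target) < threshold;
}
```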

Insofar as I understand what's going on, I can say that the following quote is wrong:

"I prefer adjusting the weights one layer at a time. This method involves recomputing the network error before the next weight layer error terms are computed."

The errors of neurons in the hidden layer tell how much they contributed to the overall error. If you have since modified the weights, then those errors cannot be calculated correctly, and the deltas based on them will also be wrong.

I can't see why I can't do the following:

- calculate deltas for the current neuron layer based on the current layer outputs and the "deltas from the previous layer" (except for the output layer, where we use the difference between output and training set)

- alter the weights of the current neuron layer according to the input-weighted partial derivatives of the delta error per neuron (and NOT based on the weighted derivative of the network error squared sum)

- move back one layer and repeat

Seems to me that once we have our deltas per layer, we are ready to modify the weights for that layer, provided we hand the deltas we calculated to the next earlier layer in the next iteration.

There seems no reason to calculate deltas for all layers before modifying the weights.

When you calculate the deltas for one of the hidden layers, you need the deltas of the previous layer, but in the calculation those deltas are multiplied by the weights which connect those layers; if you had previously adjusted those weights, then the deltas will be different.

I can understand the logic the article seems to follow, that the adjusted weights are more correct and so any deltas calculated with them should also be more correct, but I don't think it works that way. To be honest, I don't know this for an absolute fact, but imagine a situation where a weight gets adjusted from a positive to a negative value: then the deltas of the previous layers will have the opposite sign to what they would have been.
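A tiny C++ sketch of that sign-flip scenario, with made-up numbers: one hidden neuron at output x = 0.5 feeds one output neuron through a weight of 0.1, and the output delta is -0.5. With lCoeff = 1.0 the update pushes that weight to -0.15, so a hidden delta computed after the update (the per-layer variant) has the opposite sign to the one the classic algorithm computes. The values and function names are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hidden-neuron delta: d = x(1 - x)(w1*d1 + ... + wi*di).
double hiddenDelta(double x, const std::vector<double>& w,
                   const std::vector<double>& dAbove) {
    assert(w.size() == dAbove.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < w.size(); ++k) sum += w[k] * dAbove[k];
    return x * (1.0 - x) * sum;
}

// Weights leaving a hidden neuron with output x, after "w += delta * lCoeff * input"
// (the input to those weights is x itself). Returns a modified copy.
std::vector<double> updatedWeights(std::vector<double> w,
                                   const std::vector<double>& dAbove,
                                   double x, double lCoeff) {
    assert(w.size() == dAbove.size());
    for (std::size_t k = 0; k < w.size(); ++k) w[k] += dAbove[k] * lCoeff * x;
    return w;
}
```

Here the classic hidden delta, hiddenDelta(0.5, {0.1}, {-0.5}), is about -0.0125, while the same calculation using the updated weight gives about +0.01875.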

Maybe this really would be more correct, but without seeing some research on the topic I would place my trust in the classic algorithm...
