Well, I have checked the past few posts on neural nets in here, and even the essay at generation5 and a few more sites, but I still don't seem to get it. It's probably the maths that makes it so tough for me.

Let's take the XOR example.
Why does it use 3 perceptrons?
If all it has to do is adjust the weights of the neurons until we get the least error, why do we need 3 of them? Why not only 1?

I get the delta calculation part but don't seem to get the weight adjustment part.
This is the bit of weight-adjustment code from the XOR example at generation5:



// Now, alter the weights accordingly.
float v1 = i1, v2 = i2;
for (int i = 0; i < 3; i++) {
    // Change the values for the output layer, if necessary.
    if (i == 2) {
        v1 = i3;
        v2 = i4;
    }

    m_fWeights[0][i] += BP_LEARNING * 1 * deltas[i];
    m_fWeights[1][i] += BP_LEARNING * v1 * deltas[i];
    m_fWeights[2][i] += BP_LEARNING * v2 * deltas[i];
}



What exactly is it doing? Why is it that at i == 2 the values are changed?
If someone wants to take a look at more code from the example, I can post it.

I am absolutely confused, as you can see. Can someone please explain a bit?
Posted on 2002-12-30 06:34:24 by clippy
First, here's the way I always looked at things: in a three-layer net there are really only two layers of neurons, the hidden and output layers. The input layer is simply an array of values which are fed into the hidden layer.

Now, you've gotten to the stage where the deltas are calculated and you want to modify the weights. What you need is a loop which goes through all the neurons, not even in any particular order.

For each neuron you'll want to loop through its weights, and you modify each weight in the same way: get the delta of the neuron it feeds into, multiply that by the learning coefficient, multiply that by the output of the neuron that feeds into the weight (i.e. the one in the previous layer), and add the result to the weight.
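In rough C-like code (not the generation5 code, just the rule above with made-up names like weight, delta, output and LEARNING_RATE), the update for a single weight would look something like this:

// Weight connecting neuron 'from' (previous layer) to neuron 'to' (next layer):
// delta[to]    = delta already calculated for the neuron this weight feeds into
// output[from] = output of the neuron that feeds into this weight
weight[from][to] += LEARNING_RATE * delta[to] * output[from];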

Now regarding the code, I think v1 & v2 represent the outputs of the previous layer. Initially they're set to i1, i2, probably the inputs to the net; for the output layer they're changed to i3 & i4, probably the outputs of the hidden layer. All in all it seems to be a very badly written bit of code. But I could be completely mistaken here.
Posted on 2002-12-30 07:48:42 by Eóin
Originally posted by gladiator
If all it has to do is adjust the weights of the neurons until we get the least error, why do we need 3 of them? Why not only 1?

Just try it :) !
A perceptron has two inputs, a bias input (always 1, IIRC), an output, and a weight for each of the 3 inputs. So the weighted sum will be:
sum = weight1 * input1 + weight2 * input2 + weight3 * 1
The sum then goes through a hard limiter (sum <= x --> output = 0, sum > x --> output = 1).
Now try to find weight1, weight2, weight3 and x so that the output is input1 XOR input2. You will find it's impossible.
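If you want to convince yourself, here's a throwaway brute-force sketch (my own code, not from generation5) that tries a grid of weights and thresholds and checks the four XOR cases:

#include <stdio.h>

// Brute-force search: try a coarse grid of weights and thresholds and see
// whether any single hard-limiter perceptron reproduces XOR on all 4 cases.
int main() {
    const int in1[4]  = {0, 0, 1, 1};
    const int in2[4]  = {0, 1, 0, 1};
    const int want[4] = {0, 1, 1, 0};            // XOR truth table

    for (float w1 = -2; w1 <= 2; w1 += 0.25f)
    for (float w2 = -2; w2 <= 2; w2 += 0.25f)
    for (float w3 = -2; w3 <= 2; w3 += 0.25f)    // bias weight
    for (float x  = -2; x  <= 2; x  += 0.25f) {  // threshold
        int ok = 1;
        for (int c = 0; c < 4; c++) {
            float sum = w1 * in1[c] + w2 * in2[c] + w3 * 1;
            int out = (sum > x) ? 1 : 0;         // hard limiter
            if (out != want[c]) { ok = 0; break; }
        }
        if (ok) { printf("found a solution!\n"); return 0; }
    }
    printf("no combination on this grid solves XOR\n");
    return 0;
}

It will print "no combination on this grid solves XOR", because no single line can separate (0,1) and (1,0) from (0,0) and (1,1).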

You've seen generation5's essays already, but this problem is explained clearly on this page: http://www.generation5.org/perceptron.shtml

Thomas
Posted on 2002-12-30 07:53:46 by Thomas
Thomas,
I get that the problem has to be linearly separable for the perceptrons to be able to solve it.
Now one may determine the number of perceptrons to use in a simple XOR program, but how do you do so in large problems like your digital brain example?

How did you know the number of perceptrons to use in your digital brain?
Posted on 2002-12-31 03:11:40 by clippy
Originally posted by gladiator
I get that the problem has to be linearly separable for the perceptrons to be able to solve it.

Yes, with one perceptron the problem has to be linearly separable, but with multiple perceptrons it doesn't have to be.
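For example, here is one well-known hand-picked solution (my own illustration, not from the essay): two hidden perceptrons compute OR and AND, and the output perceptron fires when OR is on but AND is off, which is exactly XOR:

#include <stdio.h>

// Hard limiter.
static int step(float sum) { return sum > 0 ? 1 : 0; }

int main() {
    for (int i1 = 0; i1 <= 1; i1++)
        for (int i2 = 0; i2 <= 1; i2++) {
            int h1  = step(1.0f * i1 + 1.0f * i2 - 0.5f);  // hidden 1: OR
            int h2  = step(1.0f * i1 + 1.0f * i2 - 1.5f);  // hidden 2: AND
            int out = step(1.0f * h1 - 1.0f * h2 - 0.5f);  // OR and not AND
            printf("%d XOR %d = %d\n", i1, i2, out);
        }
    return 0;
}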

Now one may determine the number of perceptrons to use in a simple XOR program, but how do you do so in large problems like your digital brain example?
How did you know the number of perceptrons to use in your digital brain?


Mostly trial and error :). With a very low number of neurons, recognition is awful; with too many, it's too hard to train. I just tried out some values and looked at what worked best.

Thomas
Posted on 2002-12-31 07:39:40 by Thomas
This is indeed "the problem" with Neural Nets : "there is no mathematical sure way to find the best layout for the neurons in order to solve a problem"

This fact together with "the XOR problem" have made researchers ignore NN for about 20years (not anymore).

However if you "think" with your intuition and follow some obviouse rules that emerge from experience (and trial and error of course) you will soon understand that:

-1) almost any layout will do... yes some will learn faster and some will learn slower but basically ANY decent layout will do! (not exagerated/denaturated)

2) If you want to resolve ONLY one specific problem then there are obvoiuse better layouts. For example image recognition uses more like a matrix with improved vertical and horizontal connections while speech recognition needs a more horizontal (aka time) oriented network.

The big big problem is how to learn the network not necesarly the layout, and you will loose a lot of time with learning it, also the data sets and algorithms used for learning need to be very well choosen. You must never forget to present denaturated absurd data to the network from time to time (after each lerning cycle) to check its reponse it is still decent (like: "unknown")

However nature did the same thing; tiral an error (i guess) and after many years we are left with a brain that is able to theoreticaly have more potential connections that Universe is able to have atoms...so... when researchers will achieve that ... we as a humman race are obsolete... so do not hurry :)
Posted on 2003-01-01 07:59:51 by BogdanOntanu
Sorry for my extremely late reply to this thread. I had left neural networks aside for a while.

Before I continue, thank you Eóin, Thomas and BogdanOntanu for helping me so far :)

Eóin,
Originally posted by Eóin
Now regarding the code, I think v1 & v2 represent the outputs of the previous layer. Initially they're set to i1, i2, probably the inputs to the net; for the output layer they're changed to i3 & i4, probably the outputs of the hidden layer. All in all it seems to be a very badly written bit of code. But I could be completely mistaken here.

You are pretty much right. Here is the full code for training the network.



float m_fWeights[3][3];  // Weights for the 3 neurons.
float Sigmoid(float);    // The sigmoid function.

READ THIS FIRST: the weights for each neuron run down a column (one weight per row), not across a row.
That is, each neuron's weights are arranged vertically, not horizontally.
The programmer at generation5 explains this by saying he is a visual programmer :rolleyes:

float Train(float i1, float i2, float d) {
    // These are all the main variables used in the
    // routine. Seems easier to group them all here.
    float net1, net2, i3, i4, out;

    // Calculate the net values for the hidden layer neurons.
    net1 = 1 * m_fWeights[0][0] + i1 * m_fWeights[1][0] +
           i2 * m_fWeights[2][0];
    net2 = 1 * m_fWeights[0][1] + i1 * m_fWeights[1][1] +
           i2 * m_fWeights[2][1];

    // Use the hard limiter function - the Sigmoid.
    i3 = Sigmoid(net1);
    i4 = Sigmoid(net2);

    // Now, calculate the net for the final output layer.
    net1 = 1 * m_fWeights[0][2] + i3 * m_fWeights[1][2] +
           i4 * m_fWeights[2][2];
    out = Sigmoid(net1);

    // We have to calculate the deltas for the two layers.
    // Remember, we have to calculate the errors backwards
    // from the output layer to the hidden layer (thus the
    // name 'BACK-propagation').
    float deltas[3];

    deltas[2] = out * (1 - out) * (d - out);
    deltas[1] = i4 * (1 - i4) * (m_fWeights[2][2]) * (deltas[2]);
    deltas[0] = i3 * (1 - i3) * (m_fWeights[1][2]) * (deltas[2]);

    // Now, alter the weights accordingly.
    float v1 = i1, v2 = i2;
    for (int i = 0; i < 3; i++) {
        // Change the values for the output layer, if necessary.
        if (i == 2) {
            v1 = i3;
            v2 = i4;
        }

        m_fWeights[0][i] += BP_LEARNING * 1 * deltas[i];
        m_fWeights[1][i] += BP_LEARNING * v1 * deltas[i];
        m_fWeights[2][i] += BP_LEARNING * v2 * deltas[i];
    }

    return out;
}



My problems -

This is the main one really: the back-propagation part. I know I said earlier that I could understand it, but now it seems I really can't. So you can see how confused I am -


// We have to calculate the deltas for the two layers.
// Remember, we have to calculate the errors backwards
// from the output layer to the hidden layer (thus the
// name 'BACK-propagation').
float deltas[3];

deltas[2] = out * (1 - out) * (d - out);
deltas[1] = i4 * (1 - i4) * (m_fWeights[2][2]) * (deltas[2]);
deltas[0] = i3 * (1 - i3) * (m_fWeights[1][2]) * (deltas[2]);


I can't make much sense of the above code. There is a formula for the first calculation (deltas[2]) on generation5, but I don't know where they get the rest from, and how does this back-propagation actually work? :confused:

Originally posted by Eóin
First, here's the way I always looked at things: in a three-layer net there are really only two layers of neurons, the hidden and output layers. The input layer is simply an array of values which are fed into the hidden layer.

Now again I am confused. Here there are actually three layers, and the inputs are in separate variables, so what is the third layer used for??? :confused:

Also, if the bias is always supposed to be 1, then why do we use it at all? And if we are actually supposed to change its value, then how do we know that it's not supposed to be 1? :confused:

P.S. - If you are wondering about the pretty syntax highlighting, I was feeling bored :grin:
Posted on 2003-01-25 05:22:03 by clippy

Originally posted by gladiator
My problems -

This is the main one really: the back-propagation part. I know I said earlier that I could understand it, but now it seems I really can't. So you can see how confused I am -


// We have to calculate the deltas for the two layers.
// Remember, we have to calculate the errors backwards
// from the output layer to the hidden layer (thus the
// name 'BACK-propagation').
float deltas[3];

deltas[2] = out * (1 - out) * (d - out);
deltas[1] = i4 * (1 - i4) * (m_fWeights[2][2]) * (deltas[2]);
deltas[0] = i3 * (1 - i3) * (m_fWeights[1][2]) * (deltas[2]);


I can't make much sense of the above code. There is a formula for the first calculation (deltas[2]) on generation5, but I don't know where they get the rest from, and how does this back-propagation actually work? :confused:

The first formula is d = yp * (1 - yp) * (dp - yp). The other two are the same formula applied to different neurons; it is shown on the same page as the first one, something like

d_p(q) = x_p(q) * (1 - x_p(q)) * SUM_i( w_(p+1)(q,i) * d_(p+1)(i) )

(it doesn't look very good as plain text :) ).
Basically it calculates each delta by using the deltas of the previous layer (when going backwards). In the example case this is only one value (deltas[2]), so the sum has only one term (m_fWeights * deltas[2]). The result is multiplied by x_p(q) * (1 - x_p(q)), which is i4 * (1 - i4) in this case (and similarly i3 * (1 - i3)).
Any network with more layers or more output nodes will have to sum everything, so you can't put it in one statement then (you'll have to use a loop to calculate the sum).
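As a sketch of what that loop might look like (my own illustration, with invented names like numOutputs and hiddenOut, not code from the example):

// One hidden neuron's delta when there are several output neurons.
// In the XOR code the sum has only the single term m_fWeights[..][2] * deltas[2].
float sum = 0.0f;
for (int o = 0; o < numOutputs; o++)
    sum += weightHiddenToOutput[h][o] * outputDelta[o];
hiddenDelta[h] = hiddenOut[h] * (1.0f - hiddenOut[h]) * sum;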

Also, if the bias is always supposed to be 1, then why do we use it at all? And if we are actually supposed to change its value, then how do we know that it's not supposed to be 1? :confused:

Because the bias is added to the weighted sum, you can 'shift' the output values to a higher or lower range, which might be necessary sometimes to get correct outputs.

I don't think its value (1) will ever change, because the weight can change and thus its weighted value can be anything you like.
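A tiny illustration of why that constant input is useful (just my own example, not from the code above):

// With inputs (0, 0) and no bias, the weighted sum is always 0, so the output
// is stuck at Sigmoid(0) = 0.5 no matter what the other weights are.
// With a bias input of 1, its weight (wBias here) can move the sum, and
// training adjusts that weight like any other:
float net = wBias * 1.0f + w1 * 0.0f + w2 * 0.0f;  // = wBias, which is trainable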
Posted on 2003-01-25 05:41:08 by Thomas
Hi Thomas,
Sorry again for the late reply.

My one question about the three layers is still left.

As Eóin said

in a three-layer net there are really only two layers of neurons, the hidden and output layers.

So why is this an array of three layers?
float m_fWeights[3][3];


What's the use of the third layer when the inputs are already in separate variables?
Posted on 2003-01-29 02:15:29 by clippy
Hi Gladiator, sorry if I confused you.

What I meant was that there is only a need for two layers of neurons. Only the hidden and output layers have a bias and weighted inputs for each neuron. The input layer simply represents a vector (basically an array) of inputs.

However, having an input layer of neurons as well, and physically inserting the desired input vector into the outputs of the input neurons, can simplify coding, especially if a very OOP approach has been taken.
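Just as a sketch of what I mean (invented names, not any particular library):

const int NUM_INPUT = 2;

// Treating the input layer as a plain array of values...
float inputs[NUM_INPUT];

// ...versus pretending the inputs are neurons too, which keeps the
// layer-to-layer code uniform in a very OOP design:
struct Neuron { float output; /* a real neuron would also carry weights, a bias and a delta */ };
Neuron inputLayer[NUM_INPUT];  // only 'output' is ever filled in for these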

I don't think that XOR net is the greatest example to work from; it's too specific to that problem (or rather, I should say, to that net layout). This may be more helpful.
Posted on 2003-01-29 07:26:45 by Eóin

Originally posted by gladiator
Hi Thomas,
Sorry again for the late reply.

My one question about the three layers is still left.

As Eóin said

in a three-layer net there are really only two layers of neurons, the hidden and output layers.

So why is this an array of three layers?
float m_fWeights[3][3];


What's the use of the third layer when the inputs are already in separate variables?


The m_fWeights[3][3] array contains two weights and one bias weight for each of the three neurons. That is (2+1)*3 = 3*3 weights. There is no topology information there at all.
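Laid out as a picture (this is just my reading of the Train() code posted above):

// m_fWeights[input][neuron] as the Train() code indexes it:
//
//                        column 0         column 1         column 2
//                        hidden neuron 1  hidden neuron 2  output neuron
//  row 0 (bias, * 1)        w                w                w
//  row 1 (* i1 / * i3)      w                w                w
//  row 2 (* i2 / * i4)      w                w                w
float m_fWeights[3][3];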
Posted on 2003-01-30 09:42:31 by gliptic
Thanks Eóin and gliptic :)

Eóin,
I am looking at the link you gave me.
I must say I can't make much out of it. :)

I will post specific questions when I find them.
Posted on 2003-02-04 01:21:19 by clippy
Here's how a training loop works. It's probably better if you simply try to program it yourself; this stuff is very hard to follow otherwise.

Calculating Deltas

You start off by calculating the deltas for the output layer.

Delta = Output * (1 - Output) * (Desired - Output)

After that you propagate those deltas backwards. This basically means you start with the first neuron in the second-last layer and perform a quick calculation to prepare for the delta value:

preDelta = Output * (1 - Output)

Then you need to cycle through the output weights of that neuron, and for each one multiply the weight by the delta of the neuron it points to, summing the results as you go. Finally multiply that sum by the preDelta value.

Once you have gone through all the weights, that result is the actual delta value. Do that for every neuron in the layer, then move down to the layer below it and repeat the process.
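As a rough self-contained sketch of that pass (my own invented names; a net with one hidden layer and an output layer):

const int NUM_HIDDEN = 4, NUM_OUTPUT = 2;
float hidOut[NUM_HIDDEN], outOut[NUM_OUTPUT], desired[NUM_OUTPUT];
float weightHidToOut[NUM_HIDDEN][NUM_OUTPUT];      // weight(lower, upper)
float hidDelta[NUM_HIDDEN], outDelta[NUM_OUTPUT];

void calcDeltas() {
    // Deltas for the output layer.
    for (int o = 0; o < NUM_OUTPUT; o++)
        outDelta[o] = outOut[o] * (1 - outOut[o]) * (desired[o] - outOut[o]);

    // Propagate backwards to the hidden layer.
    for (int h = 0; h < NUM_HIDDEN; h++) {
        float preDelta = hidOut[h] * (1 - hidOut[h]);
        float sum = 0;
        for (int o = 0; o < NUM_OUTPUT; o++)       // output weights of hidden neuron h
            sum += weightHidToOut[h][o] * outDelta[o];
        hidDelta[h] = preDelta * sum;
    }
}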

Weights and Bias.

Go to the first neuron of the output (last) layer and enter a loop which cycles through its input weights.

For each weight, take the neuron's delta multiplied by the learning coefficient, and multiply that by the output of the neuron in the lower layer which feeds into that weight. Add this value to the weight.

Repeat for all the weights that feed into the neuron. For the bias it's a similar process, but there is no output from a previous layer, so just take the neuron's delta multiplied by the learning coefficient and add that to the bias.

Then move down a layer and repeat the process. You don't do this for the input layer, as it doesn't have input weights or a bias. This is the reason why, technically, the input layer isn't a layer of neurons, though depending on your style it can make programming easier if you pretend it is.
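And the weight/bias pass, continuing the same sketch (again with invented names; LEARNING is the learning coefficient):

const int NUM_INPUTS = 2;
float input[NUM_INPUTS];
float weightInToHid[NUM_INPUTS][NUM_HIDDEN];
float hidBias[NUM_HIDDEN], outBias[NUM_OUTPUT];
const float LEARNING = 0.5f;

void adjustWeights() {
    // Output (last) layer: input weights, then the bias.
    for (int o = 0; o < NUM_OUTPUT; o++) {
        for (int h = 0; h < NUM_HIDDEN; h++)
            weightHidToOut[h][o] += LEARNING * outDelta[o] * hidOut[h];
        outBias[o] += LEARNING * outDelta[o];      // no previous-layer output for the bias
    }

    // Then the hidden layer, using the net's inputs as the "previous outputs".
    for (int h = 0; h < NUM_HIDDEN; h++) {
        for (int i = 0; i < NUM_INPUTS; i++)
            weightInToHid[i][h] += LEARNING * hidDelta[h] * input[i];
        hidBias[h] += LEARNING * hidDelta[h];
    }
}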

Note on Weights

For me the main confusion with these nets came from understanding the layout of the weights, so I'll try to explain it here.

Between two layers there are the weights which connect the layers. These are usually stored as a 2D array, in one of two possible layouts. In one layout, weight(1,1) is the weight which connects neuron 1 in the lower layer to neuron 1 in the upper layer, and weight(1,3) is the weight which connects neuron 1 in the lower layer to neuron 3 in the upper layer.

In this layout, the delta-calculating loop, which goes through the output weights of neuron 1 in the lower layer, accesses weight(1,j) throughout the loop, where j is the counter.

However, the weight-adjusting loop, which modifies the input weights of neuron 1 of the upper layer based on the output of neuron j in the lower layer, accesses weight(j,1).

Hope I made this as clear as I could; it's a tricky subject to get your head around. I've written a couple of backpropagation nets and every time I have to sit back and really think about the weights. But it slowly gets easier.
Posted on 2003-02-04 07:39:30 by Eóin
Hi Eóin,
Thanks a million for going through the pain of explaining all this to me.

I think I understand much better now. But as you said, it's going to become absolutely clear when I actually code something using it :)
Posted on 2003-02-08 04:16:20 by clippy
I found this site very useful for learning the basics:
http://www.cogs.susx.ac.uk/users/carlos/doc/FCS-ANN-tutorial.htm
Posted on 2003-02-09 22:59:09 by huh