The ability of a Perceptron to evaluate functions was called into question when Minsky and Papert proved that a simple function like XOR (the logical function exclusive or) cannot be correctly evaluated by a Perceptron. The XOR logical function f(A, B) is as follows:
A | B | f(A,B) = XOR(A,B)
---|---|---
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
To summarize the behavior of XOR: if both inputs are the same value, the output is 0; otherwise, the output is 1.
Minsky and Papert showed that it is impossible to come up with the proper set of weights for the neurons in the single layer of a simple Perceptron to evaluate the XOR function. The reason is that such a Perceptron, one with a single layer of neurons, requires the function being evaluated to be linearly separable by means of its function values. The concept of linear separability is explained next, but first let us show you why the simple Perceptron fails to compute this function.
Since the XOR function takes two arguments, there would be two neurons in the input layer, and since the function's value is a single number, there would be one output neuron. Therefore, you need two weights, w1 and w2, and a threshold value θ. Let us now look at the conditions that the w's and θ must satisfy so that the outputs corresponding to the given inputs match those of the XOR function.
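This single neuron can be written out directly. Here is a minimal sketch (ours, not a listing from the book); the weights shown compute logical OR, just to illustrate that some two-input functions pose no problem:

```cpp
#include <iostream>

// A minimal sketch of the neuron just described: inputs a and b, weights
// w1 and w2, and a threshold theta. The output is 1 exactly when the
// activation w1*a + w2*b exceeds theta, and 0 otherwise.
int perceptronOutput(int a, int b, double w1, double w2, double theta) {
    double activation = w1 * a + w2 * b;
    return (activation > theta) ? 1 : 0;
}

int main() {
    // For illustration, w1 = w2 = 1 and theta = 0.5 make the neuron
    // compute logical OR. As shown next, no choice of w1, w2, and theta
    // makes it compute XOR.
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            std::cout << a << " OR " << b << " = "
                      << perceptronOutput(a, b, 1.0, 1.0, 0.5) << "\n";
    return 0;
}
```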
First, the output should be 0 when the inputs are 0 and 0. The activation works out to w1·0 + w2·0 = 0, and to get an output of 0, you need 0 < θ. This is your first condition. Table 5.1 shows this and the three other conditions you need, and why.
Table 5.1 Conditions on the weights and threshold

Input | Activation | Output | Needed Condition
---|---|---|---
0, 0 | 0 | 0 | 0 < θ
1, 0 | w1 | 1 | w1 > θ
0, 1 | w2 | 1 | w2 > θ
1, 1 | w1 + w2 | 0 | w1 + w2 < θ
From the first three conditions, you can deduce that θ is positive and each weight exceeds θ, so the sum of the two weights has to be greater than θ (indeed, w1 + w2 > 2θ > θ). Line 4 is therefore inconsistent with lines 1, 2, and 3, since line 4 requires the sum of the two weights to be less than θ. This affirms the contention that it is not possible to compute the XOR function with a simple Perceptron.
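You can also confirm this empirically. The following sketch (our illustration; the search range and step size are arbitrary choices, since the algebra above already rules out every choice) scans a grid of candidate values for w1, w2, and θ and reports that none of them reproduces the XOR table:

```cpp
#include <iostream>

int main() {
    // Each row is (A, B, XOR(A, B)).
    const int xorTable[4][3] = { {0, 0, 0}, {0, 1, 1}, {1, 0, 1}, {1, 1, 0} };
    bool found = false;
    // Scan a coarse grid of weights and thresholds over [-3, 3].
    for (double w1 = -3.0; w1 <= 3.0; w1 += 0.1)
        for (double w2 = -3.0; w2 <= 3.0; w2 += 0.1)
            for (double theta = -3.0; theta <= 3.0; theta += 0.1) {
                bool ok = true;
                for (const auto& row : xorTable) {
                    int out = (w1 * row[0] + w2 * row[1] > theta) ? 1 : 0;
                    if (out != row[2]) { ok = false; break; }
                }
                if (ok) found = true;
            }
    std::cout << (found ? "Some weights reproduce XOR."
                        : "No weights in the grid reproduce XOR.") << "\n";
    return 0;
}
```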
Geometrically, the reason for this failure is that the inputs (0, 1) and (1, 0), for which you want an output of 1, are situated diagonally opposite each other when plotted as points in the plane, as shown below in a diagram of the outputs (1 = T, 0 = F):

```
F   T
T   F
```
You can't separate the T's and the F's with a straight line. In other words, you cannot draw a line in the plane in such a way that (0, 1) -> T and (1, 0) -> T lie on one side while (0, 0) -> F and (1, 1) -> F lie on the other.
What linearly separable means is that a linear barrier or separator (a line in the plane, a plane in three-dimensional space, or a hyperplane in higher dimensions) should exist, so that the set of inputs that give rise to one value for the function all lie on one side of this barrier, while on the other side lie the inputs that do not yield that value. A hyperplane is a surface in a higher dimension, but it is defined by a linear equation, much as a line in the plane and a plane in three-dimensional space are.
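For contrast, the logical AND function is linearly separable: the line x + y = 1.5, for example, puts (1, 1), the only input with output 1, on one side, and the other three inputs on the other. This short sketch (ours, for illustration) checks which side of that line each input falls on by the sign of x + y - 1.5:

```cpp
#include <iostream>

int main() {
    // The four inputs of a two-argument logical function.
    const int points[4][2] = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
    for (const auto& p : points) {
        // The sign of x + y - 1.5 tells which side of the line x + y = 1.5
        // the point lies on; only (1, 1) lands on the positive side.
        double side = p[0] + p[1] - 1.5;
        std::cout << "(" << p[0] << ", " << p[1] << ") is on the "
                  << (side > 0 ? "1" : "0") << " side\n";
    }
    return 0;
}
```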
To make the concept a little bit clearer, consider a problem that is similar but, let us emphasize, not the same as the XOR problem.
Imagine a cube with edges of unit length, lying in the positive octant of an xyz rectangular coordinate system with one corner at the origin. The other corners, or vertices, are at the points (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), and (1, 1, 1). Call the origin O, and call the seven points listed A, B, C, D, E, F, and G, respectively. Then any two opposite faces are linearly separable, because you can take as the separating plane the plane halfway between the two faces and parallel to them.
For example, consider the faces defined by the set of points O, A, B, and C and by the set of points D, E, F, and G. They are parallel and 1 unit apart, as you can see in Figure 5.1. The separating plane for these two faces can be any of many possible planes: any plane between them and parallel to them. One simple example is the plane that passes through the points (1/2, 0, 0), (1/2, 0, 1), (1/2, 1, 0), and (1/2, 1, 1). Of course, you need only specify three of those four points, because a plane is uniquely determined by three points that are not all on the same line. So if the first set of points corresponds to a value of, say, +1 for the function, and the second set to a value of -1, then a single-layer Perceptron can determine, through some training algorithm, the correct weights for the connections, even if you start with the weights all initially 0.
Figure 5.1 Separating plane.
Consider the set of points O, A, F, and G. This set cannot be linearly separated from the other vertices of the cube; indeed, restricted to the x and y coordinates, this labeling is exactly the XOR pattern. In this case, it would be impossible for the single-layer Perceptron to determine the proper weights to evaluate the type of function we have been discussing.
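The following sketch (ours, not the book's listing) shows what such a training algorithm can look like on the cube example, using the classic Perceptron learning rule with a constant fourth input standing in for the threshold. With the separable labeling O, A, B, C -> +1 and D, E, F, G -> -1, it converges from all-zero weights; relabeling O, A, F, G as +1 instead would make the loop run to its cap without converging:

```cpp
#include <array>
#include <iostream>
#include <vector>

// One labeled vertex of the cube: three coordinates plus a constant
// input of 1 whose weight plays the role of -theta.
struct Sample { std::array<double, 4> x; int target; };

int main() {
    std::vector<Sample> data = {
        {{0, 0, 0, 1}, +1}, {{0, 0, 1, 1}, +1},  // O, A
        {{0, 1, 0, 1}, +1}, {{0, 1, 1, 1}, +1},  // B, C
        {{1, 0, 0, 1}, -1}, {{1, 0, 1, 1}, -1},  // D, E
        {{1, 1, 0, 1}, -1}, {{1, 1, 1, 1}, -1}   // F, G
    };
    std::array<double, 4> w = {0, 0, 0, 0};      // start with all weights 0
    for (int epoch = 1; epoch <= 100; ++epoch) { // cap: non-separable data never converges
        int errors = 0;
        for (const auto& s : data) {
            double activation = 0;
            for (int i = 0; i < 4; ++i) activation += w[i] * s.x[i];
            int out = (activation > 0) ? +1 : -1;
            if (out != s.target) {               // misclassified: apply the rule
                for (int i = 0; i < 4; ++i) w[i] += s.target * s.x[i];
                ++errors;
            }
        }
        if (errors == 0) {
            std::cout << "Converged after " << epoch << " epochs.\n";
            return 0;
        }
    }
    std::cout << "Did not converge: the labeling is not linearly separable.\n";
    return 0;
}
```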