C++ Neural Networks and Fuzzy Logic
by Valluru B. Rao
M&T Books, IDG Books Worldwide, Inc.
ISBN: 1558515526 Pub Date: 06/01/95
Trying the Noise and Momentum Features
You can test the version 2 simulator, which you just compiled, with the example that you saw at the beginning of the chapter. You will find that there is a lot of trial and error in finding optimum values for alpha, the noise factor, and beta. This is also true for the middle layer size and the number of middle layers. For some problems, the addition of momentum makes convergence much faster. For other problems, you may not find any noticeable difference. An example run of the five-character recognition problem discussed at the beginning of this chapter gave the following results with beta = 0.1, tolerance = 0.001, alpha = 0.25, NF = 0.1, and the layer sizes kept at 35 5 3.
-
done: results in file output.dat
training: last vector only
not training: full cycle
weights saved in file weights.dat
>average error per cycle = 0.02993<-
>error last cycle = 0.00498<-
->error last cycle per pattern= 0.000996 <-
>total cycles = 242 <-
>total patterns = 1210 <-
-
The network was able to converge on a better solution (in terms of error measurement) in one-fourth the number of cycles. You can try varying alpha and NF to see the effect on overall simulation time. To compare runs fairly, you can start each one from the same initial weights by specifying a value of 1 for the starting weights question. For large values of alpha and beta, the network usually will not converge, and the weights will get unacceptably large (you will receive a message to that effect).
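The role of alpha and beta in a weight update can be sketched as follows. This is a minimal illustration and not the simulator's own source: the function name update_with_momentum and the parameter descent_term (the error-reducing term for each weight, typically the error times the output of the previous layer) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the book's simulator.
// For each weight:
//   delta  = beta * descent_term + alpha * previous_delta
//   weight = weight + delta
// beta is the learning parameter; alpha is the momentum parameter.
void update_with_momentum(std::vector<double>& weights,
                          std::vector<double>& previous_delta,
                          const std::vector<double>& descent_term,
                          double beta, double alpha)
{
    assert(weights.size() == previous_delta.size());
    assert(weights.size() == descent_term.size());

    for (std::size_t i = 0; i < weights.size(); ++i) {
        const double delta =
            beta * descent_term[i] + alpha * previous_delta[i];
        weights[i] += delta;
        previous_delta[i] = delta;  // remembered for the next cycle
    }
}
```

With alpha set to zero the update reduces to plain steepest descent. A larger alpha carries more of the previous change forward, which is why large values of alpha and beta together can make the weights grow without bound.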
Variations of the Backpropagation Algorithm
Backpropagation is a versatile neural network algorithm that very often leads to success. Its Achilles heel is the slowness with which it converges for certain problems. Many variations of the algorithm exist in the literature to try to improve convergence speed and robustness. Variations have been proposed in the following portions of the algorithm:
- Adaptive parameters. You can set rules that modify alpha, the momentum parameter, and beta, the learning parameter, as the simulation progresses. For example, you can reduce beta whenever a weight change does not reduce the error. You can consider undoing the particular weight change, setting alpha to zero and redoing the weight change with the new value of beta.
- Use other minimum search routines besides steepest descent. For example, you could use Newton's method for finding a minimum, although this would be a fairly slow process. Other examples include the use of conjugate gradient methods or Levenberg-Marquardt optimization, both of which would result in very rapid training.
- Use different cost functions. Instead of calculating the error (as expected - actual output), you could determine another cost function that you want to minimize.
- Modify the architecture. You could use partially connected layers instead of fully connected layers. Also, you can use a recurrent network, that is, one in which some outputs feed back as inputs.
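The adaptive-parameter rule in the first item of the list can be sketched as follows. This is only one possible reading of that rule, under stated assumptions: the function name adaptive_update, the error_of callback that evaluates the network error for a given set of weights, and the shrink factor of 0.5 are all hypothetical and do not come from the simulator.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical callback type: returns the network error for given weights.
using ErrorFn = std::function<double(const std::vector<double>&)>;

// Tries one update with momentum. If the error does not go down, the
// change is undone, beta is reduced, alpha is treated as zero, and the
// update is redone. Returns the value of beta that was finally used.
double adaptive_update(std::vector<double>& weights,
                       const std::vector<double>& descent_term,
                       const std::vector<double>& previous_delta,
                       double beta, double alpha,
                       const ErrorFn& error_of,
                       double shrink = 0.5)  // assumed reduction factor
{
    assert(weights.size() == descent_term.size());
    assert(weights.size() == previous_delta.size());

    const double old_error = error_of(weights);
    const std::vector<double> saved = weights;  // kept so we can undo

    // Trial update with the current beta and alpha.
    for (std::size_t i = 0; i < weights.size(); ++i)
        weights[i] += beta * descent_term[i] + alpha * previous_delta[i];
    if (error_of(weights) < old_error)
        return beta;  // the change helped, so keep it

    // Undo, reduce beta, drop the momentum term, and redo the change.
    weights = saved;
    beta *= shrink;
    for (std::size_t i = 0; i < weights.size(); ++i)
        weights[i] += beta * descent_term[i];
    return beta;
}
```

The sketch retries only once. A fuller scheme might keep shrinking beta until the error drops, or raise beta again after a run of successful steps.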
Applications
Backpropagation remains the king of neural network architectures because of its ease of use and wide applicability. A few of the notable applications in the literature will be cited as examples.
- NETtalk. In 1987, Sejnowski and Rosenberg developed a network connected to a speech synthesizer that was able to utter English words, having been trained to produce phonemes from English text. The architecture consisted of an input layer window of seven characters. The characters were part of English text that was scrolled by. The network was trained to pronounce the letter at the center of the window. The middle layer had 80 neurons, while the output layer consisted of 26 neurons. With 1024 training patterns and 10 cycles, the network started making intelligible speech, similar to the process of a child learning to talk. After 50 cycles, the network was about 95% accurate. Purposely damaging the network by removing neurons did not cause performance to drop off a cliff; instead, the performance degraded gracefully. The network also recovered rapidly when it was retrained with fewer neurons. This shows the fault tolerance of neural networks.
- Sonar target recognition. Neural nets using backpropagation have been used to identify different types of targets using the frequency signature (with a Fast Fourier transform) of the reflected signal.
- Car navigation. Pomerleau developed a neural network that is able to navigate a car based on images obtained from a camera mounted on the car's roof, and a range finder that coded distances in grayscale. The 30×32 pixel image and the 8×32 range finder image were fed into a hidden layer of size 29 feeding an output layer of 45 neurons. The output neurons were arranged in a straight line with each side representing a turn to a particular direction (right or left), while the center neurons represented "drive straight ahead." After the network was trained on 1200 road images, the neural network driver was able to negotiate a part of the Carnegie Mellon campus at a speed of about 3 miles per hour, limited only by the speed of the real-time calculations done on a trained network in the Sun-3 computer in the car.
- Image compression. G.W. Cottrell, P. Munro, and D. Zipser used backpropagation to compress images with the result of an 8:1 compression ratio. They used standard backpropagation with 64 input neurons (8×8 pixels), 16 hidden neurons, and 64 output neurons equal to the inputs. This is called self-supervised backpropagation and represents an autoassociative network. The compressed signal is taken from the hidden layer. The input-to-hidden layer forms the compressor, while the hidden-to-output layer forms the decompressor.
- Image recognition. Le Cun reported a backpropagation network with three hidden layers that could recognize handwritten postal zip codes. He used a 16×16 array of pixels to represent each handwritten digit and needed to encode 10 outputs, each of which represented a digit from 0 to 9. One interesting aspect of this work is that the hidden layers were not fully connected. The network was set up with blocks of neurons in the first two hidden layers acting as feature detectors for different parts of the previous layer. All the neurons in a block were set up to have the same weights as those from the previous layer. This is called weight sharing. Each block would sample a different part of the previous layer's image. The first hidden layer had 12 blocks of 8×8 neurons, whereas the second hidden layer had 12 blocks of 4×4 neurons. The third hidden layer was fully connected and consisted of 30 neurons. There were 1256 neurons in total. The network was trained on 7300 examples and tested on 2000 cases with error rates of 1% on the training set and 5% on the test set.
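The neuron total quoted for the zip code network can be checked with a little arithmetic. The sketch below assumes that the figure of 1256 counts the 16×16 input pixels and the 10 outputs along with the three hidden layers. The LayerSpec type and the function names are hypothetical, introduced only for this check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a layer as blocks of rows x cols neurons.
struct LayerSpec {
    std::size_t blocks;
    std::size_t rows;
    std::size_t cols;
};

std::size_t neuron_count(const LayerSpec& layer)
{
    return layer.blocks * layer.rows * layer.cols;
}

std::size_t total_neurons(const std::vector<LayerSpec>& layers)
{
    assert(!layers.empty());
    std::size_t total = 0;
    for (const LayerSpec& layer : layers)
        total += neuron_count(layer);
    return total;
}

// Layer sizes as given in the text for the zip code network.
std::vector<LayerSpec> zip_code_network()
{
    return {
        {1, 16, 16},  // input: 16x16 pixels            = 256
        {12, 8, 8},   // hidden 1: 12 blocks of 8x8     = 768
        {12, 4, 4},   // hidden 2: 12 blocks of 4x4     = 192
        {1, 30, 1},   // hidden 3: fully connected      = 30
        {1, 10, 1},   // output: one neuron per digit   = 10
    };
}
```

Calling total_neurons(zip_code_network()) gives 256 + 768 + 192 + 30 + 10 = 1256, which matches the total stated above under that assumption.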