Details of the Backpropagation Header File
At the top of the file there are two #define statements. The first sets the maximum number of layers that can be used (currently five); the second sets the maximum number of training or test vectors that can be read into the I/O buffer (currently 100). You can increase the buffer size for better speed, at the cost of increased memory usage.
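As a point of reference, the two definitions might look like the following; MAX_VECTORS is the name used later in layer.cpp, while MAX_LAYERS is assumed here for illustration:

// illustrative sketch of the two #define statements described above
// (MAX_VECTORS matches the name used in layer.cpp; MAX_LAYERS is assumed)
#define MAX_LAYERS    5     // maximum number of layers in a network
#define MAX_VECTORS   100   // maximum training/test vectors held in the I/O buffer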
The following are definitions in the layer base class. Note that the numbers of inputs and outputs are protected data members, which means that they can be accessed freely by descendants of the class.
int num_inputs;
int num_outputs;
float *outputs;   // pointer to array of outputs
float *inputs;    // pointer to array of inputs, which
                  // are outputs of some other layer
friend network;
There are also two pointers to arrays of floats in this class: one to the outputs of a given layer and one to its inputs. To get a better idea of what a layer encompasses, Figure 7.3 shows a small feedforward backpropagation network, with dotted lines marking the three layers of that network. A layer contains neurons and weights. The layer is responsible for calculating its output (calc_out()), stored in the float * outputs array, and the errors (calc_error()) for each of its neurons. The errors are stored in another array, float * output_errors, defined in the output_layer class. Note that the input_layer class does not have any weights associated with it and is therefore a special case; it does not need to provide any data members or member functions related to errors or backpropagation. The only purpose of the input layer is to store data to be forward propagated to the next layer.
Figure 7.3 Organization of layers for backpropagation program.
With the output layer, there are a few more arrays present. First, for storing backpropagated errors, there is an array called float * back_errors. There is a weights array called float * weights, and finally, for storing the expected values that initiate the error calculation process, there is an array called float * expected_values. Note that the middle layer needs almost all of these arrays and inherits them by being a derived class of the output_layer class.
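Putting these pieces together, the data members of the output_layer class can be sketched as follows; the member names come from the text above, but the exact declaration in layer.h (access specifiers, method list) may differ slightly:

// sketch of the output_layer data members named in the text
// (exact layout in layer.h may differ; methods omitted for brevity)
class output_layer : public layer
{
protected:
    float * weights;          // num_inputs * num_outputs weight array
    float * output_errors;    // errors at this layer's outputs
    float * back_errors;      // errors reflected back to this layer's inputs
    float * expected_values;  // target values that start the error calculation
    friend network;
    // ... constructors, calc_out(), calc_error(), weight I/O, etc.
};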
There is one other class besides the layer class and its descendants defined in this header file, and that is the network class, which is used to set up communication channels between layers and to feed and remove data from the network. The network class performs the interconnection of layers by setting the pointer of an input array of a given layer to the output array of a previous layer.
This is a fairly extensible scheme that can be used to create variations on the feedforward backpropagation network, such as networks with feedback connections.
Another connection that the network class is responsible for is setting the pointer of an output_errors array to the back_errors array of the next layer (remember, errors flow in reverse, and the back_errors array is the output error of the layer reflected at its inputs), as sketched below.
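For a given layer index i, the two pointer assignments amount to the following, using the member names from layer.h (the complete version appears in network::set_up_network() in Listing 7.2):

// forward path: layer i reads its inputs from layer i-1's outputs
layer_ptr[i]->inputs = layer_ptr[i-1]->outputs;

// backward path: layer i's output errors are layer i+1's back errors
((output_layer *)layer_ptr[i])->output_errors =
        ((output_layer *)layer_ptr[i+1])->back_errors;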
The network class stores an array of pointers to layers and an array of layer sizes for all the layers defined. These layer objects and arrays are dynamically allocated on the heap with the new and delete operators in C++. There is some minimal error checking for file I/O and memory allocation, which can be enhanced if desired.
As you can see, the feedforward backpropagation network can quickly become a memory and CPU hog with large networks and large training sets. The size and topology of the network, or architecture, largely dictate both of these characteristics.
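As a rough, illustrative estimate (assuming 4-byte floats), consider a 10-5-2 network: the weight arrays need 10*5 + 5*2 = 60 floats, the output, error, and expected-value arrays a few dozen more, while the I/O buffer needs (10 inputs + 2 outputs) * 100 vectors = 1,200 floats, or about 4.8 KB. The buffer, governed by the vector #define, tends to dominate for small networks, while the weight arrays dominate as the layer sizes grow.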
Details of the Backpropagation Implementation File
The implementation of the classes and methods is the next topic. Let's look at the layer.cpp file in Listing 7.2.
Listing 7.2 layer.cpp implementation file for the backpropagation simulator
// layer.cpp        V.Rao, H.Rao
// compile for floating point hardware if available
#include <stdio.h>
#include <iostream.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include "layer.h"

inline float squash(float input)
// squashing function
// use sigmoid -- can customize to something
// else if desired; can add a bias term too
//
{
if (input < -50)
    return 0.0;
else if (input > 50)
    return 1.0;
else return (float)(1/(1+exp(-(double)input)));
}

inline float randomweight(unsigned init)
{
int num;
// random number generator
// will return a floating point
// value between -1 and 1

if (init==1)    // seed the generator
    srand ((unsigned)time(NULL));

num=rand() % 100;

return 2*(float(num/100.00))-1;
}

// the next function is needed for Turbo C++
// and Borland C++ to link in the appropriate
// functions for fscanf floating point formats:
static void force_fpf()
{
float x, *y;
y=&x;
x=*y;
}

// --------------------
//  input layer
//---------------------
input_layer::input_layer(int i, int o)
{
num_inputs=i;
num_outputs=o;

outputs = new float[num_outputs];
if (outputs==0)
    {
    cout << "not enough memory\n";
    cout << "choose a smaller architecture\n";
    exit(1);
    }
}

input_layer::~input_layer()
{
delete [num_outputs] outputs;
}

void input_layer::calc_out()
{
//nothing to do, yet
}

// --------------------
//  output layer
//---------------------
output_layer::output_layer(int i, int o)
{
num_inputs =i;
num_outputs =o;

weights = new float[num_inputs*num_outputs];
output_errors = new float[num_outputs];
back_errors = new float[num_inputs];
outputs = new float[num_outputs];
expected_values = new float[num_outputs];

if ((weights==0)||(output_errors==0)||(back_errors==0)
    ||(outputs==0)||(expected_values==0))
    {
    cout << "not enough memory\n";
    cout << "choose a smaller architecture\n";
    exit(1);
    }
}

output_layer::~output_layer()
{
// some compilers may require the array
// size in the delete statement; those
// conforming to Ansi C++ will not
delete [num_outputs*num_inputs] weights;
delete [num_outputs] output_errors;
delete [num_inputs] back_errors;
delete [num_outputs] outputs;
}

void output_layer::calc_out()
{
int i,j,k;
float accumulator=0.0;

for (j=0; j<num_outputs; j++)
    {
    for (i=0; i<num_inputs; i++)
        {
        k=i*num_outputs;
        if (weights[k+j]*weights[k+j] > 1000000.0)
            {
            cout << "weights are blowing up\n";
            cout << "try a smaller learning constant\n";
            cout << "e.g. beta=0.02    aborting...\n";
            exit(1);
            }
        outputs[j]=weights[k+j]*(*(inputs+i));
        accumulator+=outputs[j];
        }
    // use the sigmoid squash function
    outputs[j]=squash(accumulator);
    accumulator=0;
    }
}

void output_layer::calc_error(float & error)
{
int i, j, k;
float accumulator=0;
float total_error=0;

for (j=0; j<num_outputs; j++)
    {
    output_errors[j] = expected_values[j]-outputs[j];
    total_error+=output_errors[j];
    }

error=total_error;

for (i=0; i<num_inputs; i++)
    {
    k=i*num_outputs;
    for (j=0; j<num_outputs; j++)
        {
        back_errors[i]=
            weights[k+j]*output_errors[j];
        accumulator+=back_errors[i];
        }
    back_errors[i]=accumulator;
    accumulator=0;
    // now multiply by derivative of
    // sigmoid squashing function, which is
    // just the input*(1-input)
    back_errors[i]*=(*(inputs+i))*(1-(*(inputs+i)));
    }
}

void output_layer::randomize_weights()
{
int i, j, k;
const unsigned first_time=1;
const unsigned not_first_time=0;
float discard;

discard=randomweight(first_time);

for (i=0; i< num_inputs; i++)
    {
    k=i*num_outputs;
    for (j=0; j< num_outputs; j++)
        weights[k+j]=randomweight(not_first_time);
    }
}

void output_layer::update_weights(const float beta)
{
int i, j, k;

// learning law: weight_change =
//      beta*output_error*input

for (i=0; i< num_inputs; i++)
    {
    k=i*num_outputs;
    for (j=0; j< num_outputs; j++)
        weights[k+j] +=
            beta*output_errors[j]*(*(inputs+i));
    }
}

void output_layer::list_weights()
{
int i, j, k;

for (i=0; i< num_inputs; i++)
    {
    k=i*num_outputs;
    for (j=0; j< num_outputs; j++)
        cout << "weight["<<i<<","<<
            j<<"] is: "<<weights[k+j];
    }
}

void output_layer::list_errors()
{
int i, j;

for (i=0; i< num_inputs; i++)
    cout << "backerror["<<i<<
        "] is : "<<back_errors[i]<<"\n";

for (j=0; j< num_outputs; j++)
    cout << "outputerrors["<<j<<
        "] is: "<<output_errors[j]<<"\n";
}

void output_layer::write_weights(int layer_no,
        FILE * weights_file_ptr)
{
int i, j, k;

// assume file is already open and ready for
// writing

// prepend the layer_no to all lines of data
// format:
//  layer_no    weight[0,0] weight[0,1] ...
//  layer_no    weight[1,0] weight[1,1] ...
//  ...

for (i=0; i< num_inputs; i++)
    {
    fprintf(weights_file_ptr,"%i ",layer_no);
    k=i*num_outputs;
    for (j=0; j< num_outputs; j++)
        {
        fprintf(weights_file_ptr,"%f ",
            weights[k+j]);
        }
    fprintf(weights_file_ptr,"\n");
    }
}

void output_layer::read_weights(int layer_no,
        FILE * weights_file_ptr)
{
int i, j, k;

// assume file is already open and ready for
// reading

// look for the prepended layer_no
// format:
//  layer_no    weight[0,0] weight[0,1] ...
//  layer_no    weight[1,0] weight[1,1] ...
//  ...

while (1)
    {
    fscanf(weights_file_ptr,"%i",&j);
    if ((j==layer_no)|| (feof(weights_file_ptr)))
        break;
    else
        {
        while (fgetc(weights_file_ptr) != '\n')
            {;}// get rest of line
        }
    }

if (!(feof(weights_file_ptr)))
    {
    // continue getting first line
    i=0;
    for (j=0; j< num_outputs; j++)
        {
        fscanf(weights_file_ptr,"%f",
            &weights[j]); // i*num_outputs = 0
        }
    fscanf(weights_file_ptr,"\n");

    // now get the other lines
    for (i=1; i< num_inputs; i++)
        {
        fscanf(weights_file_ptr,"%i",&layer_no);
        k=i*num_outputs;
        for (j=0; j< num_outputs; j++)
            {
            fscanf(weights_file_ptr,"%f",
                &weights[k+j]);
            }
        }
    fscanf(weights_file_ptr,"\n");
    }

else cout << "end of file reached\n";
}

void output_layer::list_outputs()
{
int j;

for (j=0; j< num_outputs; j++)
    {
    cout << "outputs["<<j
        <<"] is: "<<outputs[j]<<"\n";
    }
}

// ---------------------
//  middle layer
//----------------------
middle_layer::middle_layer(int i, int o):
    output_layer(i,o)
{
}

middle_layer::~middle_layer()
{
delete [num_outputs*num_inputs] weights;
delete [num_outputs] output_errors;
delete [num_inputs] back_errors;
delete [num_outputs] outputs;
}

void middle_layer::calc_error()
{
int i, j, k;
float accumulator=0;

for (i=0; i<num_inputs; i++)
    {
    k=i*num_outputs;
    for (j=0; j<num_outputs; j++)
        {
        back_errors[i]=
            weights[k+j]*(*(output_errors+j));
        accumulator+=back_errors[i];
        }
    back_errors[i]=accumulator;
    accumulator=0;
    // now multiply by derivative of
    // sigmoid squashing function, which is
    // just the input*(1-input)
    back_errors[i]*=(*(inputs+i))*(1-(*(inputs+i)));
    }
}

network::network()
{
position=0L;
}

network::~network()
{
int i,j,k;
i=layer_ptr[0]->num_outputs;// inputs
j=layer_ptr[number_of_layers-1]->num_outputs; //outputs
k=MAX_VECTORS;

delete [(i+j)*k]buffer;
}

void network::set_training(const unsigned & value)
{
training=value;
}

unsigned network::get_training_value()
{
return training;
}

void network::get_layer_info()
{
int i;

//---------------------
//
//  Get layer sizes for the network
//
// --------------------

cout << " Please enter in the number of layers for your network.\n";
cout << " You can have a minimum of 3 to a maximum of 5. \n";
cout << " 3 implies 1 hidden layer; 5 implies 3 hidden layers : \n\n";

cin >> number_of_layers;

cout << " Enter in the layer sizes separated by spaces.\n";
cout << " For a network with 3 neurons in the input layer,\n";
cout << " 2 neurons in a hidden layer, and 4 neurons in the\n";
cout << " output layer, you would enter: 3 2 4 .\n";
cout << " You can have up to 3 hidden layers,for five maximum entries :\n\n";

for (i=0; i<number_of_layers; i++)
    {
    cin >> layer_size[i];
    }

// --------------------------
// size of layers:
//  input_layer     layer_size[0]
//  output_layer    layer_size[number_of_layers-1]
//  middle_layers   layer_size[1]
//      optional: layer_size[number_of_layers-3]
//      optional: layer_size[number_of_layers-2]
//---------------------------
}

void network::set_up_network()
{
int i,j,k;

//---------------------------
// Construct the layers
//
//---------------------------

layer_ptr[0] = new input_layer(0,layer_size[0]);

for (i=0;i<(number_of_layers-1);i++)
    {
    layer_ptr[i+1] =
        new middle_layer(layer_size[i],layer_size[i+1]);
    }

layer_ptr[number_of_layers-1] = new
    output_layer(layer_size[number_of_layers-2],layer_size[number_of_layers-1]);

for (i=0;i<(number_of_layers-1);i++)
    {
    if (layer_ptr[i] == 0)
        {
        cout << "insufficient memory\n";
        cout << "use a smaller architecture\n";
        exit(1);
        }
    }

//--------------------------
// Connect the layers
//
//--------------------------
// set inputs to previous layer outputs for all layers,
//  except the input layer

for (i=1; i< number_of_layers; i++)
    layer_ptr[i]->inputs = layer_ptr[i-1]->outputs;

// for back_propagation, set output_errors to next layer
//  back_errors for all layers except the output
//  layer and input layer

for (i=1; i< number_of_layers -1; i++)
    ((output_layer *)layer_ptr[i])->output_errors =
        ((output_layer *)layer_ptr[i+1])->back_errors;

// define the IObuffer that caches data from
// the datafile
i=layer_ptr[0]->num_outputs;// inputs
j=layer_ptr[number_of_layers-1]->num_outputs; //outputs
k=MAX_VECTORS;

buffer=new
    float[(i+j)*k];
if (buffer==0)
    cout << "insufficient memory for buffer\n";
}

void network::randomize_weights()
{
int i;

for (i=1; i<number_of_layers; i++)
    ((output_layer *)layer_ptr[i])
        ->randomize_weights();
}

void network::update_weights(const float beta)
{
int i;

for (i=1; i<number_of_layers; i++)
    ((output_layer *)layer_ptr[i])
        ->update_weights(beta);
}

void network::write_weights(FILE * weights_file_ptr)
{
int i;

for (i=1; i<number_of_layers; i++)
    ((output_layer *)layer_ptr[i])
        ->write_weights(i,weights_file_ptr);
}

void network::read_weights(FILE * weights_file_ptr)
{
int i;

for (i=1; i<number_of_layers; i++)
    ((output_layer *)layer_ptr[i])
        ->read_weights(i,weights_file_ptr);
}

void network::list_weights()
{
int i;

for (i=1; i<number_of_layers; i++)
    {
    cout << "layer number : " <<i<< "\n";
    ((output_layer *)layer_ptr[i])
        ->list_weights();
    }
}

void network::list_outputs()
{
int i;

for (i=1; i<number_of_layers; i++)
    {
    cout << "layer number : " <<i<< "\n";
    ((output_layer *)layer_ptr[i])
        ->list_outputs();
    }
}

void network::write_outputs(FILE *outfile)
{
int i, ins, outs;
ins=layer_ptr[0]->num_outputs;
outs=layer_ptr[number_of_layers-1]->num_outputs;
float temp;

fprintf(outfile,"for input vector:\n");

for (i=0; i<ins; i++)
    {
    temp=layer_ptr[0]->outputs[i];
    fprintf(outfile,"%f ",temp);
    }

fprintf(outfile,"\noutput vector is:\n");

for (i=0; i<outs; i++)
    {
    temp=layer_ptr[number_of_layers-1]->
        outputs[i];
    fprintf(outfile,"%f ",temp);
    }

if (training==1)
    {
    fprintf(outfile,"\nexpected output vector is:\n");

    for (i=0; i<outs; i++)
        {
        temp=((output_layer *)(layer_ptr[number_of_layers-1]))->
            expected_values[i];
        fprintf(outfile,"%f ",temp);
        }
    }

fprintf(outfile,"\n----------\n");
}

void network::list_errors()
{
int i;

for (i=1; i<number_of_layers; i++)
    {
    cout << "layer number : " <<i<< "\n";
    ((output_layer *)layer_ptr[i])
        ->list_errors();
    }
}

int network::fill_IObuffer(FILE * inputfile)
{
// this routine fills memory with
// an array of input, output vectors
// up to a maximum capacity of
// MAX_VECTORS
// the return value is the number of read
// vectors

int i, k, count, veclength;
int ins, outs;

ins=layer_ptr[0]->num_outputs;
outs=layer_ptr[number_of_layers-1]->num_outputs;

if (training==1)
    veclength=ins+outs;
else
    veclength=ins;

count=0;
while ((count<MAX_VECTORS)&&
        (!feof(inputfile)))
    {
    k=count*(veclength);
    for (i=0; i<veclength; i++)
        {
        fscanf(inputfile,"%f",&buffer[k+i]);
        }
    fscanf(inputfile,"\n");
    count++;
    }

if (!(ferror(inputfile)))
    return count;
else return -1; // error condition
}

void network::set_up_pattern(int buffer_index)
{
// read one vector into the network
int i, k;
int ins, outs;

ins=layer_ptr[0]->num_outputs;
outs=layer_ptr[number_of_layers-1]->num_outputs;
if (training==1)
    k=buffer_index*(ins+outs);
else
    k=buffer_index*ins;

for (i=0; i<ins; i++)
    layer_ptr[0]->outputs[i]=buffer[k+i];

if (training==1)
    {
    for (i=0; i<outs; i++)
        ((output_layer *)layer_ptr[number_of_layers-1])->
            expected_values[i]=buffer[k+i+ins];
    }
}

void network::forward_prop()
{
int i;
for (i=0; i<number_of_layers; i++)
    {
    layer_ptr[i]->calc_out(); //polymorphic
                              // function
    }
}

void network::backward_prop(float & toterror)
{
int i;

// error for the output layer
((output_layer*)layer_ptr[number_of_layers-1])->
    calc_error(toterror);

// error for the middle layer(s)
for (i=number_of_layers-2; i>0; i--)
    {
    ((middle_layer*)layer_ptr[i])->
        calc_error();
    }
}
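To see how these pieces fit together, here is a minimal sketch of a single training pass driven from outside the classes. The actual driver lives in the simulator's main program, which is not shown here, so the variable names, file name, and control flow below are illustrative only; error checking, epoch control, and convergence testing are omitted.

// minimal sketch only -- not the simulator's actual main program;
// names such as net, total_error, beta, and training.dat are assumed
network net;
float total_error;
const float beta = 0.1;            // learning constant

net.set_training(1);               // training mode: expect target vectors
net.get_layer_info();              // ask the user for layer sizes
net.set_up_network();              // construct and connect the layers
net.randomize_weights();           // start from small random weights

FILE *datafile = fopen("training.dat", "r");   // assumed file name
int count = net.fill_IObuffer(datafile);       // cache vectors in memory

for (int n = 0; n < count; n++)
    {
    net.set_up_pattern(n);          // load one input/target pair
    net.forward_prop();             // calculate outputs layer by layer
    net.backward_prop(total_error); // propagate errors back through the layers
    net.update_weights(beta);       // apply the weight-change learning law
    }
fclose(datafile);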