Towards reducing the hardware complexity of feature detection-based models

Bassem Medawar and Andrew Noetzel¹
Polytechnic University
333 Jay St.
Brooklyn, NY 11201

____________________
1. This paper will be presented at the International Joint Conference on Neural Networks, Jan. 15-19, 1990, Washington, DC.

Abstract

A model for feature detection-based pattern recognition is presented. It attempts to improve on the hardware complexity of existing models. Traditionally, feature detection has been implemented by brute-force duplication of template-based feature detectors, offering little scalability. This model eliminates the need to duplicate complex feature detectors, using instead operators to transform patterns.

1. Introduction

It has been shown [1] that the brain uses feature detection in its visual pattern recognition task. Many researchers have attempted to capture the brain's pattern-recognizing capability in abstract models. In these attempts, some have tried to remain faithful to the biological principles underlying the functioning and organization of the brain [2]. Others borrowed the most important principles from the brain and tried to cast them in any model that could be demonstrated to work [3,4]. The model in this paper follows the latter approach.

The work done by Fukushima, representative of earlier models of this class, will first be examined. A new model, which attempts to overcome the limitations of those models, will then be described. Finally, the paper will conclude with a description of future improvements to the model.

2. Earlier work

Models which loosely follow the brain's architecture have done so based on the following elementary principles:

a) The neuron (as a threshold element) is the building block of these models.
b) The neuron's output can be viewed either as a boolean corresponding to on and off activation states, or as a (bounded) positive real number representing the activation rate of the neuron.
c) The pattern-recognizing network is layered.
d) Each layer contains feature-detecting cells whose conceptual complexity increases with the layer level.

Many models in the neural network literature apply these principles. The focus here is on Fukushima's model, because it is the most elaborate and has been shown to work with a (relatively) large retina of 128x128 pixels [5].

Fukushima's Neocognitron model [3,6] has layers with two levels each. The lower level is made up of groups of template matchers. The higher-level neurons take their inputs from groups at the lower level that recognize the same object at different positions. The net effect is feature detectors tolerant of displacement and slight distortion.

Fukushima's Neocognitron suffers from the following problems. First, the hardware is not amenable to large-scale implementations: in one case [6], a network implementation with a simple 19x19 retina and 4 layers required over 40 thousand cells, excluding non-responding ones. Second, each learned template is duplicated at many positions after being trained at only one position: this is problematic for hardware implementations, as well as being biologically implausible.

3. The model

This model [fig. 1] is based on the premise that instead of taking the feature detector to the pattern (multiplying the number of feature detectors), we bring the pattern to the feature detector (multiplying the pathways). As the example of the Neocognitron shows, duplicating complex feature detectors is costly in terms of the number of cells. What we hope to achieve is a reduction in the overall number of cells.

The retina is thus divided into several marginally overlapping receptive fields (RF's) of uniform size. Simple hard-wired operators provide a many-to-one mapping from the RF area to the feature detector. These operators are divided into classes. For instance, on the lowest layer, the classes are displacement, scaling, and rotation. On higher layers, the classes include positional and set operators.
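As an illustration, the hard-wired operator classes on the lowest layer can be sketched as fixed mappings applied to a small RF image before it reaches a single shared feature detector. The 4x4 RF size, the zero-fill displacement, the 90-degree rotation steps, and the dot-product detector response below are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Sketch of the lowest-layer operator classes: each operator maps a
# receptive-field (RF) image onto the input of one shared feature
# detector, so the detector itself need not be duplicated per position.

def displace(rf, dy, dx):
    """Displacement class: shift the RF image by (dy, dx), zero-filling."""
    h, w = rf.shape
    out = np.zeros_like(rf)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        rf[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def rotate(rf, quarter_turns):
    """Rotation class: rotate the RF image by multiples of 90 degrees."""
    return np.rot90(rf, quarter_turns)

def scale(rf):
    """Scaling class: 2x downsample, then expand back to the RF size."""
    return np.kron(rf[::2, ::2], np.ones((2, 2), dtype=rf.dtype))

def detector_response(mapped_rf, template):
    """Feature detector: operator output weighed by the optimal pattern."""
    return float((mapped_rf * template).sum())

# A vertical bar at column 3, and a detector whose optimal pattern is a
# vertical bar at column 1: the displacement variation (0, -2) maps one
# onto the other, so that variation scores highest.
rf = np.zeros((4, 4)); rf[:, 3] = 1.0
template = np.zeros((4, 4)); template[:, 1] = 1.0
best = max(((dy, dx) for dy in range(-2, 3) for dx in range(-2, 3)),
           key=lambda d: detector_response(displace(rf, *d), template))
# best == (0, -2): shifting the bar two columns left matches the template
```

The many variations of each class (here, the 25 displacement offsets) are cheap fixed wirings, while the template lives in only one place — the reduction in cells the model aims for.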
Each class has variations within each RF, depending on where the operator maps from and on the degree of the mapping. For example, the displacement class has variations which correspond to the direction and the amount of the displacement.

At the next level within a layer, feature detectors take their input from the output of the operators, weighed by the optimal feature pattern. The output of these feature detectors represents the degree of success with which a particular operator maps into the optimal feature. From this large pool of feature detector outputs within a receptive field, the best variation and degree for each class is selected. The optimum values from each RF are then combined across the layer to choose the best class of operators. This choice represents the consensus as to which class of operators best maps into some feature.

The consensus is then fed back to the lower levels, allowing each RF to reset its own image of the retina according to the new transformation. Notice that while the application of the operator class is enforced upon the layer, each RF implements the transformation in the way that generates the optimal mapping. After one class is selected within a layer, the class is inhibited, allowing another class to win in a second round. The process is then repeated until a threshold of desirable outcome is exceeded. The feature with the best degree of success can thus be said to have been recognized.

[Fig. 1 is inserted about here]

Having recognized a feature for each RF on the first layer, the output of the first layer is fed into the second layer, where a similar process of transformation and recognition is carried out. Finally, on the topmost layer, a feature detector which conceptually represents an object is selected and the whole process terminates.

4. The model's weaknesses

Simulating this model necessitated additional hardware that was not originally envisioned.
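The round-by-round selection described above — score every variation of every active class in each RF, combine the per-RF bests into a layer-wide consensus, then inhibit the winner before the next round unless the threshold is exceeded — can be sketched as follows. The dictionary representation of operator classes, the additive consensus, and the toy detector are illustrative assumptions, not the paper's implementation.

```python
# Sketch of the layer-wide selection loop: the data representation is an
# illustrative assumption; the control flow follows the text above.

def select_class(rf_images, op_classes, detect, threshold):
    """rf_images: list of RF images; op_classes: {name: [variation_fn, ...]};
    detect(img) -> feature-detector response. Returns the first consensus
    class whose combined response exceeds the threshold, or None."""
    inhibited = set()
    while len(inhibited) < len(op_classes):
        consensus = {
            name: sum(max(detect(v(rf)) for v in variations)  # best variation per RF
                      for rf in rf_images)                    # combined across the layer
            for name, variations in op_classes.items()
            if name not in inhibited
        }
        winner = max(consensus, key=consensus.get)
        if consensus[winner] >= threshold:
            return winner, consensus[winner]
        inhibited.add(winner)  # inhibit this class; another wins next round
    return None, 0.0

# Toy usage: two RFs, two operator classes, detector = sum of activations.
rfs = [[1, 2], [3, 4]]
classes = {
    "identity": [lambda rf: rf],
    "scaling":  [lambda rf: [2 * x for x in rf]],
}
result = select_class(rfs, classes, detect=sum, threshold=15)
# result == ("scaling", 20): the scaling class wins the layer consensus
```

Note that the consensus only fixes the *class*; in the model each RF still applies its own best variation of that class when it resets its image of the retina.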
While its design premise is simple, the number of cells required to implement the model is proportional to the number of RF's, the number of classes, the number of variations within a class, and the number of features within each layer.

The model does not lend itself to a nice and simple supporting mathematical formulation, and the mathematical properties of the model are not practical to implement. For example, it is mathematically sound to say that two features are different if one cannot be generated from the other by applying any sequence of operators in any order; taken to the extreme, however, this property is not practical to use as a means of incorporating self-organization into the model. Similarly, while the model can be augmented with learning rules that change the weights on the feature detectors, it fails to address how a whole new class of operators can be learned.

The model has been shown in practice to fail under certain circumstances: the wrong sequence of transformations is applied, leading to either faulty recognition or no recognition at all. This is a result of the lack of communication between neighboring RF's.

5. Conclusion and future work

A feature detection-based model was presented that was designed to address the limitations of previous models. The model was developed from an innovative idea: bringing the pattern to the feature detector rather than duplicating detectors. Although the model achieved its objective of lower overall hardware complexity, it has a few limitations of its own.

Currently, work is in progress on a new model. This model incorporates communication between neighboring RF's, coupled with hill-climbing techniques to pick the best transformation to apply within an RF image. In effect, this will result in a reduction in the number of pathways, as only a few transformations will be implemented at a time.

References

[1] Kuffler S. W., Nicholls J. G., and Martin A. R., From Neuron to Brain, 2nd Ed., Sinauer Associates Inc. Publishers, Sunderland, MA, 1984.
[2] Linsker R., Self-organization in a perceptual network.
IEEE Computer, March 1988, pp. 105-117.
[3] Fukushima K., and Miyake S., Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, Vol. 15, No. 6, pp. 455-469, 1982.
[4] Widrow B., Adaline and Madaline - 1963. IEEE 1st International Conference on Neural Networks, Vol. 1, pp. 143-157, 1987.
[5] Menon M. M., and Heinemann K. G., Classification of patterns using a self-organizing neural network. Neural Networks, Vol. 1, pp. 201-215, 1988.
[6] Fukushima K., A neural network for visual pattern recognition. IEEE Computer, March 1988, pp. 65-74.