NSF Report - Facial Expression Understanding
III-D. Special Hardware For Face Processing
Abstract. This section describes optical and electronic methods for performing image processing tasks such as face recognition, and the requirements that real-time operation imposes on the hardware. Real-time image processing is difficult because of the very high computational rate required to perform interesting calculations on large images. Parallel hardware is an obvious solution, particularly since image processing lends itself naturally to fine-grain parallelism, with every pixel processed in parallel at each stage. For certain regular and/or local operations, digital techniques such as cellular and systolic arrays can be applied effectively. Problems with power consumption, topology, and interconnections, however, can make analog implementations using VLSI or optical techniques advantageous. The primary focus here is on such analog hardware for early vision tasks.
Presenter: D. Psaltis
Face recognition algorithms can be classified into two broad categories: model-based algorithms and learning algorithms. Model-based algorithms are in general computationally intensive, requiring complex logic operations that typically call for the flexibility of a general-purpose digital computer. Algorithms based on learning approaches (neural networks, eigenfaces, statistical approaches, etc.), on the other hand, typically require relatively simple elementary operations and are conducive to massively parallel implementations. As a general rule, then, approaches such as neural networks have the advantage that special-purpose hardware can greatly increase the speed and efficiency of the implementation while reducing its cost. The hardware options for implementing a neural network face recognition system range from workstations, to general-purpose supercomputers, to custom digital and analog VLSI, and optics. Workstations and, in some cases, supercomputers are used for algorithm development. Special-purpose co-processors and digital signal processing cards can speed up execution enough to allow real-time operation for simple recognition tasks. An excellent example in this category is Burt's (1988b) work, in which a simple workstation with a special-purpose digital co-processor was able to track the face of a person staring forward anywhere in the field of view of the television camera. In the work on eigenfaces by Turk and Pentland (1991), the data reduction achieved by the algorithm makes it possible to use commercial hardware to obtain near-real-time performance. More complex face processing tasks, which must be insensitive to illumination direction, head position, orientation, and size while retaining good discrimination capability, will in most cases require special-purpose hardware for real-time, compact operation.
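The data reduction that makes the eigenface approach attractive can be illustrated with a minimal Python sketch of the recognition step only. It assumes the mean face and K orthonormal eigenfaces have already been computed offline (for example, by principal component analysis of a training set); the functions `project` and `recognize` and the toy 4-pixel data are hypothetical illustrations, not Turk and Pentland's code. Each N-pixel image is reduced to K coefficients, and matching becomes a nearest-neighbor search in that K-dimensional space, a workload made of simple multiply-accumulate operations.

```python
def project(face, mean_face, eigenfaces):
    """Reduce an N-pixel face to K coefficients by projecting the
    mean-subtracted image onto K precomputed eigenfaces (assumed orthonormal)."""
    centered = [p - m for p, m in zip(face, mean_face)]
    return [sum(c * e for c, e in zip(centered, ef)) for ef in eigenfaces]


def recognize(face, mean_face, eigenfaces, gallery):
    """Return the gallery label whose stored K-coefficient vector is closest
    (squared Euclidean distance) to the probe's coefficients."""
    probe = project(face, mean_face, eigenfaces)

    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(probe, gallery[label]))

    return min(gallery, key=dist2)


if __name__ == "__main__":
    # Toy 4-pixel "images" and a hypothetical 2-vector orthonormal basis.
    mean_face = [0.5, 0.5, 0.5, 0.5]
    eigenfaces = [[0.5, 0.5, -0.5, -0.5],
                  [0.5, -0.5, 0.5, -0.5]]
    faces = {"alice": [1.0, 1.0, 0.0, 0.0], "bob": [1.0, 0.0, 1.0, 0.0]}
    gallery = {name: project(img, mean_face, eigenfaces) for name, img in faces.items()}
    print(recognize([0.9, 1.0, 0.1, 0.0], mean_face, eigenfaces, gallery))
```

Because K is much smaller than N, each gallery comparison costs K rather than N operations; only the one-time projection of the probe touches all N pixels (N × K multiplications).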
The reason for switching to special-purpose hardware is not merely to implement more efficiently an algorithm that previously ran in software or on a slower machine. Rather, we need to select an implementation strategy that is inherently well suited to the task at hand and that dramatically increases the available processing power. Two major categories of special-purpose hardware appear best suited for real-time implementation of neural image processing algorithms: analog VLSI and optics. Both of these technologies are analog, and in switching from digital to analog we give up a lot: algorithmic flexibility and accuracy. With neural networks, however, adaptation helps address these problems. A neural network is programmed to perform specific tasks largely through the learning process rather than solely through circuit design, so our reduced control over the machine's initial functionality is compensated by the ability to train it. Adaptation (along with redundancy) also helps overcome the accuracy limitations, since errors in computation are detected and compensated for by the built-in adaptation mechanisms. The benefit of an analog implementation is networks that are extremely dense, power-efficient, and powerful: an analog multiplier can be constructed with 3 to 5 transistors, whereas a digital multiplier requires an entire chip. In what follows we briefly discuss each of the two hardware technologies, analog VLSI and optics.
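The claim that adaptation compensates for analog inaccuracy can be illustrated with a small simulation. This is a toy under stated assumptions, not a model of any particular chip: each stored weight is multiplied by a fixed, unknown gain (a hypothetical stand-in for device mismatch), and a single linear unit is trained with the delta (LMS) rule using the error measured at the output of the imperfect hardware. The names `hardware_output`, `train_in_the_loop`, and `rms_error`, and the ±30% gain spread, are illustrative choices. Copying ideal weights straight into the mismatched circuit leaves a residual error, while training in the loop drives the error toward zero because each stored weight is adjusted to absorb its own gain.

```python
import random


def hardware_output(weights, gains, x):
    """Simulated analog dot product: each stored weight is scaled by a fixed
    device gain that the learning rule cannot observe (hypothetical mismatch)."""
    return sum(g * w * xi for g, w, xi in zip(gains, weights, x))


def train_in_the_loop(gains, target, n_steps=2000, lr=0.1, seed=0):
    """Delta-rule (LMS) training with the error taken at the hardware output."""
    rng = random.Random(seed)
    weights = [0.0] * len(target)
    for _ in range(n_steps):
        x = [rng.uniform(-1.0, 1.0) for _ in target]
        desired = sum(t * xi for t, xi in zip(target, x))
        err = desired - hardware_output(weights, gains, x)
        # The update uses only err and x; the gains never appear explicitly.
        weights = [w + lr * err * xi for w, xi in zip(weights, x)]
    return weights


def rms_error(weights, gains, target, n_tests=500, seed=1):
    """Root-mean-square output error of the hardware on random test inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_tests):
        x = [rng.uniform(-1.0, 1.0) for _ in target]
        desired = sum(t * xi for t, xi in zip(target, x))
        total += (desired - hardware_output(weights, gains, x)) ** 2
    return (total / n_tests) ** 0.5


if __name__ == "__main__":
    rng = random.Random(42)
    target = [rng.uniform(-1.0, 1.0) for _ in range(4)]  # ideal weights
    gains = [rng.uniform(0.7, 1.3) for _ in range(4)]    # +/-30% mismatch
    naive = rms_error(target, gains, target)             # ideal weights copied in
    trained = rms_error(train_in_the_loop(gains, target), gains, target)
    print(f"copied weights: RMS error {naive:.3f}; trained: {trained:.1e}")
```

The small learning rate keeps every update stable for inputs in [-1, 1]; with much larger rates or gains the loop could diverge, which is one reason on-chip learning circuits bound their adaptation rates.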
One of the most dramatic demonstrations of the power of analog VLSI (Mead, 1989) has been in image processing. Most analog VLSI image processing chips consist of a 2D array of photosensors coupled to surrounding circuitry that performs a local computation on the image field. The first circuit of this type is the silicon retina of Mahowald and Mead (1991). In this circuit, a 2D hexagonal array of phototransistors senses the incident image, and each photodetector is coupled to its 6 neighbors through a resistive array. The circuit at each node computes a local edge enhancement and also emphasizes the time-varying portion of the signal by differentiating it in time. Several other circuits of this type have been constructed (Andreou et al., 1991; Harris et al., 1990; Tanner & Mead, 1984; Tolbruck, 1992), most significantly chips that perform motion processing. These analog VLSI chips perform very effectively those preprocessing tasks that can be implemented with local connectivity. For computations that require pixels in the retina to communicate with many of their neighbors, however, the achievable pixel density deteriorates roughly as the square of the number of connections per pixel: if the number of connections is doubled, the area devoted to them quadruples, because there are not only more connections but also longer ones. Recently, several researchers have demonstrated analog VLSI image processing chips that have an optical output for each image pixel (Drabik & Handschy, 1990; Cotter et al., 1990). This is accomplished by depositing liquid crystal light modulators on top of the silicon chip, which provides more extensive connectivity and the ability to cascade such chips in parallel using the optical techniques discussed below.
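The retina's local edge enhancement can be sketched in one dimension. This is a simplified model, not the actual Mahowald and Mead circuit (which is a 2D hexagonal array): each node of a resistive network is tied to its photoreceptor input through one conductance and to its neighbors through lateral conductances, so its steady-state voltage is a spatially smoothed copy of the image. Subtracting that smoothed "surround" from the input gives a center-surround output that is zero on uniform regions and bipolar at edges. The conductance ratio `lam`, the Jacobi solver, and the frame-difference stand-in for temporal differentiation are illustrative assumptions.

```python
def resistive_smooth(u, lam=1.0, n_iter=200):
    """Steady state of a 1D resistive network driven by inputs u.

    Each node i satisfies (v[i] - u[i]) + lam * sum_j (v[i] - v[j]) = 0 over
    its neighbors j, where lam is the lateral-to-vertical conductance ratio.
    Solved by Jacobi iteration, which converges because the system is
    strictly diagonally dominant."""
    n = len(u)
    v = list(u)
    for _ in range(n_iter):
        new_v = []
        for i in range(n):
            nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
            new_v.append((u[i] + lam * sum(v[j] for j in nbrs)) / (1 + lam * len(nbrs)))
        v = new_v
    return v


def edge_enhance(u, lam=1.0):
    """Center minus surround: input minus the network's smoothed voltage."""
    return [a - b for a, b in zip(u, resistive_smooth(u, lam))]


def temporal_derivative(prev_frame, frame, dt=1.0):
    """Finite-difference stand-in for the chip's temporal differentiation."""
    return [(b - a) / dt for a, b in zip(prev_frame, frame)]


if __name__ == "__main__":
    step = [0.0] * 4 + [1.0] * 4
    print([round(y, 3) for y in edge_enhance(step)])
    print(temporal_derivative([0.0, 1.0, 1.0], [1.0, 1.0, 0.0]))
```

Larger values of `lam` spread the smoothing over more pixels, widening the surround; in the chip the same trade-off is set by the ratio of the lateral and vertical resistances.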
The advantages of the optical implementation derive from direct optical access to the third dimension (Psaltis et al., 1990). This is particularly useful for image processing in general and face processing in particular, since it allows the image pixels to be arranged in 2D arrays that densely populate the plane, while the interconnections are made via the third dimension. Typically, the interconnections are specified by holograms that are dynamically adapted. This basic arrangement makes it possible to process images of roughly one million pixels within an active area of approximately 1 cm² with essentially any desired connectivity. The optical implementation becomes increasingly attractive as the number of connections per pixel increases. For applications such as motion detection and edge enhancement, which can be realized with very few local connections, a purely electronic approach may be sufficient. As the density and range of the connections increase, however, the connections start to dominate the chip area, and the pixel density that can be supported drops dramatically. The pixel density achievable with an optical implementation, by contrast, is relatively insensitive to the type of connectivity required. A large number of optical implementations have been described and experimentally demonstrated (Psaltis & Farhat, 1985; Abu-Mostafa & Psaltis, 1987; Owechko et al., 1987; Anderson, 1986; Farhat et al., 1985; Wagner & Psaltis, 1987; Psaltis et al., 1988; Yeh et al., 1988; Paek & Jung, 1991; Maniloff & Johnson, 1990). Recently, an optical experiment was carried out specifically for face recognition (Li et al., no date). This two-layer network was trained to recognize faces in real time (at 30 frames per second) under a broad range of viewing conditions while exhibiting excellent discrimination against unfamiliar faces. The network has up to 200 hidden units and more than 10 million adaptable weights.
This is probably the most ambitious face processing hardware demonstration to date. The next step is likely to be a marriage of silicon retinas that perform the early preprocessing with an optical system that performs the recognition task.
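The density argument made for analog VLSI retinas, and the contrast with optics, can be tabulated with a few lines of Python. The sketch assumes, as the text does, that electronic pixel density scales as 1/c² in the number of connections per pixel c, and idealizes the optical case as independent of c. Densities are relative to a 6-connection design like the hexagonal silicon retina; the function name and the choice of reference are illustrative, and no absolute figures are implied.

```python
def relative_electronic_density(c, c_ref=6):
    """Pixel density relative to a c_ref-connection design, assuming wiring
    area per pixel grows as c**2 (more wires, and longer ones)."""
    return (c_ref / c) ** 2


# Idealized optical case: interconnects use the third dimension, so the
# relative pixel density is taken to stay at 1.0 regardless of c.
RELATIVE_OPTICAL_DENSITY = 1.0

if __name__ == "__main__":
    for c in (6, 12, 24, 48):
        print(f"{c:3d} connections: electronic {relative_electronic_density(c):.4f}, "
              f"optical {RELATIVE_OPTICAL_DENSITY:.4f}")
```

Each doubling of c cuts the electronic density by a factor of four, so after three doublings it has fallen to 1/64 of the 6-connection design, while the idealized optical figure is unchanged.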