NSF Report - Facial Expression Understanding
Beatrice Golomb and Terrence J. Sejnowski
The potential benefits from efforts to understand the face using the technologies discussed in this report are varied and numerous. The preceding sections have separately enumerated many benefits. This section summarizes these benefits and indicates additional areas where benefits could accrue that were not emphasized previously.
Automated systems that process natural facial signals and/or generate synthetic outputs related to the face have important commercial potential, as indicated throughout this report. Some products would become parts of systems used by the mass consumer market. For example, machines that monitor the face could become part of low-bandwidth digital video conferencing and video telephone systems, tracking both the location and signals of the face. When combined with facial synthesis and with speech recognition and translation, these systems could become universal translators in which a foreign language and the corresponding lip movements and facial expressions are imposed on audio-video images of a speaker. These "talking heads" could significantly enhance international commercial and political communications. Another area of application is the "personal agent" of advanced personal computer operating systems, enabling users to create and communicate with a customized personality that could perform increasingly numerous and complex tasks on the computer and assess the reactions of the user. This agent would use speech recognition, possibly supplemented with lip reading, and nonverbal facial cues to interpret the user. The agent's face would be created synthetically and would respond with synthesized speech and artificial expressive movements based on the methods described by the Modeling and Database Workgroup (pages 48-50).
Other markets for this technology include specialized areas in industrial and professional sectors. For example, monitoring and interpreting facial signals are important to lawyers, the police, and intelligence or security agents, who are often interested in issues concerning deception or attitude. Machine monitoring could provide a valuable tool in these situations, where only informal interpretations are now used. A mechanism for assessing boredom or inattention could be of considerable value in workplace situations where attention to a crucial but perhaps tedious task is essential, such as air traffic control, space flight operations, or watching a display monitor for rare events. A special advantage of machine monitoring in these situations is that other people need not be watching, as in the Big Brother scenario; instead, "one's own machine" provides helpful prompts for better performance. Developments in commercial motion picture production have already taken advantage of digital image processing (morphing, etc.) and would be a likely beneficiary of improved synthesis of human qualities. Also in the entertainment area, the nascent virtual reality field might use similar resources.
Finally, the specialized market for scientific instrumentation should not be overlooked. Packaged tools that analyze or synthesize facial information could be a profitable enterprise for researchers in fields that are discussed below.
Work on the problem of understanding the face will enhance the computer sciences and technologies brought to bear on it. As the report from the Sensing and Processing Workgroup points out (page 43), the face has some ideal qualities for the development of computer vision systems. The neural networks and innovative approaches to digital image processing developed for this task are sure to be applied to other problems, such as the analysis of planetary photographs and particle decay images.
The Modeling and Database Workgroup emphasized the need for a database of the face. There are parallels between the need to collect and maintain image databases for the analysis of facial expressions and the need to develop a database of utterances for speech recognition. The TIMIT database, a joint effort of Texas Instruments and MIT, provided the speech community with a research standard for speech records and a means for objectively comparing speech recognition systems. The proposed database of facial expressions could provide similar benefits for the vision community. As the Database Workgroup indicates, the creation of such a large multimedia database would pose many challenges for investigators developing scientific databases.
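To make the analogy with TIMIT concrete, each entry of such a database might pair a raw recording with expert coding, much as TIMIT pairs utterances with transcriptions. The following minimal Python sketch is purely illustrative: the field names and the FACS-style action-unit notation are assumptions, not a proposed standard.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExpressionRecord:
    """One entry in a hypothetical facial-expression database,
    loosely modeled on how TIMIT pairs raw data with transcriptions."""
    subject_id: str        # anonymized subject identifier
    video_file: str        # path to the raw image sequence
    frame_rate: float      # frames per second of the recording
    elicitation: str       # e.g. "posed" or "spontaneous"
    action_units: List[str] = field(default_factory=list)  # expert-coded
                           # facial actions in a FACS-style notation

# Example entry: a posed smile coded as cheek raiser + lip corner puller
record = ExpressionRecord(
    subject_id="S001",
    video_file="s001_smile.seq",
    frame_rate=30.0,
    elicitation="posed",
    action_units=["AU6", "AU12"],
)
```

Storing the expert coding alongside the raw images is what would allow competing analysis systems to be compared objectively against a common ground truth, as TIMIT allowed for speech recognizers.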
Basic Science Research
Basic research that uses measures of the face and facial behavior would reap substantial benefits from inexpensive, reliable, and rapid facial measurement tools. In brief, such tools would revolutionize these fields by raising the quality of research in which reliability and precision are currently nagging problems, by shortening the time needed to conduct research that is now lengthy and laborious, and by enabling many more researchers, presently deterred by its expense and complexity, to use facial measurement. The promise of automating facial measurement is more studies of higher quality, at lower cost, in more scientific areas.
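As a simple illustration of what an automated facial measurement might look like, one could compute a brow-raise score from tracked landmark positions, normalizing by interocular distance so that the measure is insensitive to face size and camera distance. The sketch below is hypothetical: the landmark names and the particular normalization are illustrative choices, not a system described in this report.

```python
import math

def distance(p, q):
    """Euclidean distance between two (x, y) landmark points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def brow_raise_score(landmarks):
    """Mean vertical brow-to-eye distance, normalized by interocular
    distance so the score is invariant to face size and camera zoom."""
    inter_ocular = distance(landmarks["left_eye"], landmarks["right_eye"])
    raise_left = abs(landmarks["left_brow"][1] - landmarks["left_eye"][1])
    raise_right = abs(landmarks["right_brow"][1] - landmarks["right_eye"][1])
    return (raise_left + raise_right) / (2.0 * inter_ocular)

# Two frames of hypothetical landmark coordinates (y grows downward,
# so raised brows have smaller y values)
neutral = {"left_eye": (30, 50), "right_eye": (70, 50),
           "left_brow": (30, 40), "right_brow": (70, 40)}
raised  = {"left_eye": (30, 50), "right_eye": (70, 50),
           "left_brow": (30, 34), "right_brow": (70, 34)}
```

A measurement defined this way is reproducible and objective: two laboratories applying it to the same image sequence obtain identical numbers, which is precisely the reliability that manual coding struggles to achieve.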
One can imagine the impact of these innovations by considering the breadth of some of the current research that uses facial measurement. In behavioral science, facial expression is an important variable for a large number of studies on human interaction and communication (e.g., Ekman et al., 1972; Ekman & Oster, 1979); is a focus of research on emotion (e.g., Ekman, 1984), cognition (e.g., Zajonc, 1984), and the development of infants and young children (e.g., Camras, 1977); and has become a measure frequently used in psychophysiological studies (e.g., Davidson et al., 1990). In anthropology, the cross-cultural perception and production of facial expression is a topic of considerable interest (e.g., Ekman & Friesen, 1986). For political science and economics, measurement of facial expression is important in studies of negotiations and interpersonal influence (e.g., McHugo et al., 1985). In neurophysiology, correlating the viewing of faces, and in some cases facial expressions, with single-neuron activity helps map brain function, such as the cells that respond selectively to faces (some to identity, others to expression) in the superior temporal sulcus (Perrett et al., 1982, 1984; Rolls et al., 1987, 1989; Baylis et al., 1985), in parietal and frontal regions (Pigarev et al., 1979), and in the inferotemporal region (Gross et al., 1972). The amygdala, where neurons also respond to faces, appears to be concerned with emotional and social responses, and its ablation leads to inappropriate social responses to faces (Rolls, 1984). In linguistics, coarticulation strategies in lower lip protrusion movements (Perkell, 1986), relations between the facial nerve, facial muscles, and soft palate in speech (Van Gelder & Van Gelder, 1990), features of acquired dysarthria in childhood (Van Dongen et al., 1987), and lip reading as a supplement to auditory cues in speech perception have all been investigated.
The Basic Science Workgroup outlined many tools (pages 32 to 38) that could be applied to enhance these efforts.
This section examines more closely one area where the facial measurement tools described above could expand and enhance research and applications. Faces and facial expression have relevance to medicine, neurology, and psychiatry, and a system to automate coding of facial expression would advance research in diverse domains.
Many disorders in medicine, particularly neurology and psychiatry, involve aberrations in expression, perception, or interpretation of facial action. Coding of facial action is thus necessary to assess the effect of the primary disorder, to better understand the disorder, and to devise strategies to overcome the limitations imposed by the disorder. In addition, because different disorders produce different effects on expression, examination of facial action skills (production and reception) may assist diagnosis.
In the psychiatric domain, the ability to produce or interpret facial expression is selectively affected by certain brain lesions or psychopathology. Schizophrenia and "psychosomatic" illness lead to blunting of expression, both in patients and in control subjects talking to these patients, who do not consciously know they are talking to mentally ill people (Krause et al., 1989; Steimer-Krause et al., 1990). Facial expression of emotion distinguishes depressed patients on admission to the hospital and after successful therapy (Ellgring, 1989), and various studies have examined expression with major affective illness (Ekman & Fridlund, 1987) and unipolar depression (Jaeger et al., 1986), as well as other psychopathology (Mandal & Palchoudhury, 1986). Automated methods for assessing facial responses and delivering stimuli would improve the research on and delivery of clinical services.
In neurology, analysis of inappropriate facial expressions may provide evidence for the location and type of brain lesions. Brainstem damage may lead to emotional lability, as in pseudo-bulbar palsy. Changes of facial expression consistent with sadness, fear, surprise, etc. have also been described at the onset of seizures (Hurwitz et al., 1985), and outbursts of anger have been seen with brain lesions as well (Poeck, 1969; Reeves & Plum, 1969). Parkinson's disease, a disorder of the dopaminergic system, is associated with amimia, or reduction of spontaneous facial activity (Buck & Duffy, 1980), including a decreased eye blink rate; but subcortical lesions may also lead to increased facial activity, as in Meige's disease or Brueghel's syndrome, thought to result from a disorder of the basal ganglia.
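One such sign, a reduced eye-blink rate, lends itself naturally to automated quantification. The sketch below counts blinks in a per-frame eye-openness signal by detecting downward threshold crossings; the signal representation and the threshold value are hypothetical assumptions for illustration, not part of any system described in this report.

```python
def count_blinks(openness, threshold=0.5):
    """Count blinks in a per-frame eye-openness signal (1.0 = fully open,
    0.0 = fully closed) as downward crossings of the threshold."""
    blinks = 0
    below = False
    for value in openness:
        if value < threshold and not below:
            blinks += 1          # a new closure has begun
            below = True
        elif value >= threshold:
            below = False        # eye has reopened
    return blinks

def blink_rate_per_minute(openness, frame_rate):
    """Blinks per minute, given the recording's frame rate in frames/sec."""
    duration_minutes = len(openness) / frame_rate / 60.0
    return count_blinks(openness) / duration_minutes
```

Such a measure, tracked over repeated clinic visits, could give an objective longitudinal index where clinicians now rely on impressionistic judgments of "reduced spontaneous facial activity."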
Cortical lesions also influence expression. Lesions of the supplementary motor area (the medial part of the frontal lobe) lead to contralateral facial paresis, with spontaneous emotional expression more affected than voluntary movement; lesions of the motor cortex (also with contralateral facial hemiparesis) affect voluntary movements but leave spontaneous smiling intact (Monrad-Krohn, 1924). Frontal lobe lesions lead to fewer spontaneous expressions of brow raising, smiling, lip tightening, tongue protrusion, etc. during neuropsychological testing of brain-injured subjects than do parietal or temporal lesions (Kolb & Milner, 1981), though the relative roles of motor versus cognitive or emotional contributions have not been sorted out. The effects of biofeedback, used therapeutically for this condition (Godoy & Carrobles, 1986), could be tracked using facial expression analysis.
Automation of facial measurements could provide the increased reliability, sensitivity, and precision needed to exploit the relationship between facial signs and neurological damage and lead to new insights and diagnostic methods.
Alteration in facial expression may be seen with medical disorders not normally viewed as neurological or psychiatric, such as asthma (Marx et al., 1986), hypertension (Schachter, 1957), and conditions associated with pain (Vaughan & Lanzetta, 1980; Prkachin & Mercer, 1989). Evaluation of either the production of or the response to facial expression in all these conditions requires measurement of the facial expression produced or presented.
Very recently, anesthesiologists have suggested that it might be possible to detect consciousness during surgery from facial activity. Although the patient experiences no pain and gross motor activity is paralyzed, patients have reported full recall of what was said and other activity in the room during an operation. Not knowing that the patient was mentally alert has caused severe problems for both patients and medical teams, and this provides yet another potential medical application for on-line monitoring of facial activity.
Many congenital disorders and in utero exposures (to prescribed, accidental, and recreational chemicals, as well as infections such as cytomegalovirus) lead to subtle or profound neurological and developmental dysfunction. A comprehensive battery of developmental markers by which to assess and address children's deficits is crucial, and evaluations dealing with production and reception of facial expression are certain to figure in such testing.
Behavioral disorders often accompany (or may occur independently of) minimal or overt brain damage. In this light, the association between delinquency and inability to recognize facial affects (McCown et al., 1986, 1988) is noteworthy. Assessment of facial expression may serve not only as a marker for dysfunction, but may also aid in management of children who fail to elaborate or fail to heed normal social signals, as suggested by the reduction of disruptive mealtime behavior by facial screening in a mentally retarded girl (Horton, 1987).
Deaf users of sign language make prominent use of facial expression in communication, signaling both meaning and grammar. For instance, certain adverbials, or the presence of a relative clause, are indicated by distinctive facial movements (Bellugi, personal communication). There is a dissociation between linguistic and emotive facial expression in deaf signers with right versus left hemisphere lesions. Scientists investigating the use of facial expression in the language of deaf and normal speakers have expressed excitement about the possibility of an automated facial expression coding system.
These diverse examples illustrate the potential of a focused research program directed towards computer understanding of facial expression. However, the most important applications and benefits could well be the ones that we have not yet imagined. Important strides in measurement methods can give rise to far-reaching changes in substantive investigations, and we anticipate that, once this technology is available, many more creative applications will emerge that enhance our understanding of people through their expressions.