NSF Report - Facial Expression Understanding

IV. REPORTS FROM THE PLANNING BREAKOUT GROUPS

IV-A. Breakout Group on Basic Science

Participants: J. Allman, J. T. Cacioppo, R. J. Davidson, P. Ekman, W. V. Friesen, C. E. Izard, M. Phillips

Basic Science for Understanding Facial Expression

Richard Davidson, John Allman, John Cacioppo, Paul Ekman, Wallace Friesen, Joseph C. Hager, and Mike Phillips

This group was asked to specify basic research needs in the emerging area of facial expression understanding. This report first presents recommendations on the most critical basic research needs, the infrastructure that needs to be developed, and the databases that need to be developed. This is followed by a discussion of next steps that can be taken now to reduce the time spent in facial measurement before a fully automated system for measuring facial movement is available.

Recommendations on Basic Research Needs

Perception of facial expression

Research should determine:

  • whether high levels of agreement among observers can be obtained about which emotion is displayed and/or the intensity of the emotion which is displayed, without presenting the full facial configurations (prototypes) which have been studied to date;
  • how blends of emotion are manifest;
  • the variables that influence an observer's interpretation of a facial expression;
  • the mechanisms that underlie the perception of facial expressions of emotion;
  • the relationship between information from facial actions and other behaviors.

Darwin's (1872) pioneering studies began a century-long debate about whether observers can accurately judge the emotion shown in a facial expression. This issue is related to the question of whether specific expressions actually do correspond to particular emotions. Over the decades, clearer conceptualization of these two problems stripped away confounding variables such as characteristics of the elicitors, blending of two or more expressions, additions of irrelevant actions of the face or body, poor photographic and procedural techniques, and language problems to reveal agreement among observers about the emotional meanings of a small number of expressions postulated to be prototypical for each of a number of emotions (Ekman, 1972; Izard, 1971). Today, studies by several researchers (reviewed by Ekman, 1989) show that such prototypes, including happy, sad, fear, anger, surprise, and disgust, are accurately judged across many cultures.

Some research studies have examined variables that affect the interpretation of facial expression. One line of studies indicates that females tend to be more accurate in judging expressions than males, but the difference is quite small (Hall, 1978). Some work has examined the influence of the context, including the eliciting circumstance and previously seen expressions, on judgments but there is disagreement about this issue and how to properly study it (Russell, 1991a, 1991b, 1991c; Russell & Fehr, 1987; Ekman & O'Sullivan, 1988; Ekman et al., 1991a, 1991b).

Only a few provocative research findings have shed any light on the specific mechanisms for perceiving facial expression. One line of investigation has focussed on hemispheric differences in processing facial stimuli, with the argument centering on whether the right hemisphere is dominant for this task (Borod et al., 1990). Another approach is identifying specific brain centers for processing facial information, such as identity (Heywood & Cowey, 1992). Two excellent studies have recently appeared that have used positron-emission tomography (PET) to reveal the neural systems involved in discriminating the gender and identity of faces in the temporal cortex (Haxby et al., 1991; Sergent et al., 1992).

Thus far, no functional anatomical studies have been published that reveal the neural systems involved in the production or interpretation of facial expression. Such studies should be strongly encouraged. The recent development of functional MRI (magnetic resonance imaging) has enormous potential for these studies because functional MRI has higher spatial and temporal resolution than PET and because multiple studies can be conducted in the same individual since subjects are not exposed to ionizing radiation.

Twenty-five years of cross-cultural research have produced consistent evidence for the universal recognition of six emotions: anger, fear, disgust, sadness, happiness and surprise. Recent evidence indicates possible additions to this list, including contempt and shame. A key issue for research is to elaborate further the basic categories of emotion recognized from the same expressions across cultures and to study any differences for specific cultures. Another issue is what variations (e.g., changes in intensity or additions of facial actions) on the full face prototype expressions for an emotion are still perceived as belonging to the same basic emotion. A related question is how expressions that contain only portions of the full prototype are judged.

Another issue barely explored is how blends of different emotions in the same expression are perceived and judged. The effect of asymmetry of expressions on the perception of emotion expression needs further clarification (Hager & Ekman, 1985). How the perception of emotion expressions affects the perception of the expresser's other personal characteristics, such as personality, intelligence, and health, should be explored.

The temporal dynamics of expression should be examined to determine if they provide information independent of what is provided by the configurational aspects of an expression. Research should examine the relationship between the dynamics of the facial expression and the information the user is attempting to convey in spoken discourse.

Differences between voluntary and involuntary facial expressions of emotion

The facial behaviors that distinguish false from genuine expressions, and more broadly voluntarily produced from involuntary facial expressions, have been explored only for happiness (Ekman et al., 1988, 1990) and need to be examined for other emotions.

We need to know whether the markers identified in the extant research (e.g., asymmetry of facial movement) generalize to all expressive behavior.

The relationship between facial and other signs of emotion

Much more work needs to be done to determine how emotion signs interact with other facial signs, such as signs of age, sex, race, ethnicity, and linguistically-related signals. Another related area of research is the relative contribution of face compared to other signal systems, such as body movement, voice and language, and how these systems may be integrated by the expresser and decoded by the observer.

The physiological consequences of voluntary production of facial expressions

Work began in the last decade on the consequences of voluntary production of facial expressions on behavior and physiology. A growing body of evidence indicates that such voluntary production generates subjective changes in emotional feelings, shifts in autonomic nervous system activity and alterations in central nervous system patterning (Ekman et al., 1983; Ekman & Davidson, 1992; Levenson et al., 1990). We do not know which elements of particular expressions are necessary and/or sufficient for the production of these effects, nor do we know the neural circuits which subserve these effects. Studies that combine modern neuroimaging methods with behavioral procedures are needed to address this question.

Studies of spontaneous expressive behavior in response to standardized elicitors

We have few data on the temporal dynamics of spontaneous expressive displays. Such data will be critical for both machine understanding and machine production (animation) of facial expressions. We also know little about the range and variability of the configurations (specific muscle actions) that occur in spontaneous facial expressive behavior. How frequent are the facial prototypes of emotion in different real-world situations? It is likely that there is considerable variability in spontaneous facial behavior, across cultures and social settings, in addition to some uniformities. By studying large sample sizes under varying incentive conditions, we can begin to specify just what remains invariant within an emotion, or group of related emotions, e.g., what has been called an emotion family (Ekman, 1992a). At present, this proposal assumes an invariant core of facial actions that is preserved across different instantiations of an emotion, but we have very little data on the expressive signs of such invariance.

We also know little about the contributions of non-facial expressive behavior such as head and lip movements to the understanding of facial expressions. While emotional expressive behavior appears to be particularly dense during spoken conversation, the relation between emotional and nonemotional facial behavior has not been systematically studied.

Another issue of central importance is to determine what information is carried in physiological indices that is not available in expressive measures. If we are to use automated measures of expressive behavior to make inferences about emotional state, we must know the degree to which other nonverbal measures contribute unique variance unavailable in measures of expressive behavior. Studies that systematically examine the contributions of facial and physiological (both central and peripheral) measures to both self-report and other behavioral (or task-related) manifestations of emotion are needed.

Infrastructure Recommendations

Training

A new generation of investigators must be trained who are knowledgeable about neuroanatomy, the psychology of facial expression, computer science, and neural networks. This should be accomplished in a number of ways:

  • Post-doctoral research fellowships providing interdisciplinary training that uses the resources of multiple institutions.
  • Summer Institutes, bringing together faculty from diverse disciplines.
  • Centers of Excellence, which represent geographic centers where high concentrations of relevant investigators are present and that could be combined with post-doctoral training.
  • Special journal sections to bring information from different disciplines to the attention of other relevant disciplines (e.g., have computer vision and neural network experts write a series of papers for a psychology journal and vice versa). The purpose would be to catalyze a dialogue.

Instrumentation

The group emphasized the following instrumentation needs:

  • Appropriate video recording technology to make a psychologist's recordings usable to the computer vision community.
  • Standardization of inexpensive hardware platforms for digitally processing and analyzing facial images.
  • Mechanisms to facilitate the sharing of software across laboratories and the training of personnel in its use.

Database recommendations

The following are important to include in a database for the face:

  • Compilation of extant findings with more fine-grained description of facial behavior. Virtually none of the extant reports on facial expression and emotion have included the actual FACS codes that were used to identify particular expressions. The compilation of this information would provide some of the information that is currently lacking on the range and variability of spontaneous facial behavior. Very large data sets on adult facial behavior exist at the University of Washington (J. Gottman), University of California, San Francisco (P. Ekman), University of Saarlandes (R. Krause), Würzburg University (H. Ellgring), University of Zurich (E. Banninger-Huber), and the Ludwig-Boltzmann Institute of Humanethologie (K. Grammer).
  • Dynamic voluntary productions of facial actions, with descriptive tags, accompanied by data from other sensors (e.g., EMG, facial thermography, central nervous system measures).
  • Spontaneous behavior with tags, along with associated physiology. Expressive behavior should be obtained under several conditions, including interactions with both people and machines. The latter is required since it is not clear whether people will interact with even intelligent machines in the same way that they interact with humans. Ideally, these interactions would include both acoustic and visual data. To be useful for studying how people would provide visual cues when speaking to a machine, data should be collected with either a simulation of an intended spoken language system or with some initial version of such a system. Since we expect the benefits of visual cues to become more pronounced in noisy environments, at least some of these data should be collected with varying amounts of environmental noise. Not only will this alter the acoustic data, but it is likely that it will alter the behavior of the subject. Data collected in this way would be useful for answering some basic questions about how people will interact with intelligent machines in different situations, as well as provide a means of training and testing systems to make use of the visual cues that people provide.
  • Animation exemplars of various combinations of facial actions with variations in time course, for use in perception studies.

Next Steps towards Automating Facial Measurement

While all agreed that automating the entire process of facial coding would be enormously beneficial, we also recognized the likelihood that such a goal was in the relatively distant future. Accordingly, we discussed the advantages of various interim solutions that represent efforts to partially automate the system. Below we consider the relative advantages and disadvantages of different partial automated systems.

A fully automatic system may be too difficult to ever develop because of the many potential artifacts that could interfere with measurement. However, it is within the range of current technology to automate many of the tedious and time-consuming parts of FACS scoring, freeing valuable time of trained human observers to make the most difficult judgments. Since the discovery of new phenomena may not be automatable, it is essential for humans to remain "in the loop" in any case.

Detecting when a person is speaking

Facial behavior occurs most often during speech. Most of these facial actions are conversational signals (e.g., movements which punctuate speech; see Ekman, 1979, for a listing of various conversational signals), rather than signs of emotions. The emotion-relevant facial movements may also be most frequent when a person is speaking. A system that could identify when a person is speaking would flag locations during a conversation where there is a high likelihood of frequent facial behavior, which could then be scored by a human coder. Speaking could be detected from a voicing detector or by identifying lip movements associated with speech. It will be important to distinguish those lip movements required for speech articulation from additional movements of the lips which are signs of emotion. This distinction is relatively easy for a human observer to make. However, the lip movements required for speech do vary with particular languages. Every system described below should have this capacity, as it is likely that many investigators will want to examine the relationship between speech interaction patterns and facial behavior.
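
As a rough illustration of the flagging idea, the sketch below marks stretches of a recording in which a per-frame audio energy value stays above a threshold, and returns them as time spans for a human coder to review. It is only a minimal sketch: the function name, the fixed energy threshold, the frame length, and the minimum run length are all assumptions made for the example, and a practical voicing detector would need to be considerably more robust.

```python
from typing import List, Tuple


def speaking_segments(
    energies: List[float],
    threshold: float,
    frame_sec: float = 0.02,   # assumed frame length, hypothetical
    min_frames: int = 5,       # assumed minimum run length, hypothetical
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans where energy stays above threshold."""
    segments = []
    start = None  # index of the first frame of the current run, if any
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at frame i; keep it only if it is long enough.
            if i - start >= min_frames:
                segments.append((start * frame_sec, i * frame_sec))
            start = None
    # Close a run that continues to the end of the recording.
    if start is not None and len(energies) - start >= min_frames:
        segments.append((start * frame_sec, len(energies) * frame_sec))
    return segments
```

Short bursts below the minimum run length are dropped, on the assumption that they are more likely to be noise than speech.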

Detecting head and/or eye movement or position change

Data could be provided for studies that require knowing whether a person was facing and/or looking at a particular person, object or area in the visual space. Noise in such measurement would come from quick movements to and from the person's usual position, such as nodding 'yes' or shaking the head to indicate 'no'. Perhaps these could be identified separately by the fact that the direction of movement changes rapidly.
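
The suggestion that nods and shakes might be separated out by rapid changes in the direction of movement can be sketched as follows. The example assumes that some tracker already supplies a head angle per video frame (pitch for nods, yaw for shakes); the function names, the minimum step size, and the reversal count are hypothetical values chosen for illustration, not established criteria.

```python
from typing import List


def count_direction_reversals(angles: List[float], min_step: float = 1.0) -> int:
    """Count sign changes in frame-to-frame angle change.

    Steps smaller than min_step (in degrees, an assumed unit) are ignored
    so that tracker jitter is not counted as movement.
    """
    reversals = 0
    prev_sign = 0  # 0 means no significant movement seen yet
    for a, b in zip(angles, angles[1:]):
        step = b - a
        if abs(step) < min_step:
            continue
        sign = 1 if step > 0 else -1
        if prev_sign != 0 and sign != prev_sign:
            reversals += 1
        prev_sign = sign
    return reversals


def looks_like_nod_or_shake(
    angles: List[float], min_reversals: int = 2, min_step: float = 1.0
) -> bool:
    """Flag a short window as a probable nod or shake (hypothetical rule)."""
    return count_direction_reversals(angles, min_step) >= min_reversals
```

Windows flagged in this way could be excluded from estimates of where the person is facing, or passed to a human scorer.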

A system could be developed that detects when the head is turned so far away that facial muscle activity cannot be scored. Time would be saved by removing those periods from the corpus that the human must then score.

In addition to methodological benefits of this type of automated system, such a system could provide data relevant to attention, and perhaps relevant to satisfaction or interest.

Detecting some of the most frequent facial actions

The evidence to date suggests that when people are engaged in a conversation, the most frequently occurring movements are brow raise, brow lower, and some form of smiling.

Brow Movement Detector

The simplest system would detect just brow raising and lowering. Some investigators might find this detector useful as a means of identifying and setting aside what they do not want to have human scorers analyze. Others might find brow raising and lowering of substantive interest, as there is some evidence that changes in the frequency of these actions are relevant to involvement in the speech process, and may also provide information about other cognitive states (see Ekman, 1979).

Measurement of brow raising and lowering should be combined with the detection of speaking and face direction described above, as there is some evidence to suggest that the signal value of these brow actions varies with whether the person is speaking or not, and with whether they are facing the other interactant or not (Chesney et al., 1990).

A more elaborate system would identify the other five movements of the eyebrows that are possible. These occur at a much lower frequency, but three of the movements are relevant to identifying the occurrence of fear or sadness.

Smile Detector

There are numerous studies where the automatic detection of the frequency and duration of the contraction of the zygomatic major muscle would provide complete or at least sufficient data. Overall smiling rate could be used to indicate satisfaction with certain types of interaction or pleasure with the person in the interaction. It will be important to distinguish the zygomatic major smile from the risorius muscle smile, as the latter has been most often found as a sign of fear.

There are now more than a dozen studies (reviewed in Ekman, 1992b) which show that zygomatic major muscle smiling is not a sign of actual enjoyment unless part of the muscle which orbits the eye (orbicularis oculi, pars medialis) is also active. It will be much more difficult to detect the presence of this action in addition to zygomatic major. It may be sufficient to simply identify that zygomatic major smiling has occurred, and then have a human scorer make the decision about the presence of the orbicularis oculi muscle.

Detecting facial movement which is neither brow action nor smiling

If smiling and brow movement could be automatically detected, it would be very useful if a system could also detect any other facial movement apart from those actions, even if it could not discriminate among those movements. A human scorer would then inspect and score those movements. Although it may be obvious, such a system would enormously reduce the time now consumed in finding and scoring less frequent, but important, facial actions.

Detecting the onset, apex, and offset of any given facial movement

The most time-consuming aspect of current facial scoring is obtaining these time markers. This information is crucial for coordinating facial activity with simultaneous changes in physiology, voice, or speech. It is also thought likely that information about the time course of a facial action may have psychological meaning relevant to the intensity, genuineness, and other aspects of the expresser's state. Time course information is also necessary to provide a database for those wanting life-like facial animation.

The simplest system would take input from a human scorer about the particular action which had been identified, and then automatically identify the start, apex, and end points of that action. More sophisticated systems would measure different aspects of the timing of the onset to apex, and the offset.
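
The simplest system described above can be sketched in a few lines, assuming that a per-frame intensity trace for the action named by the human scorer is already available and that the window contains a single occurrence of that action. The baseline value and the definitions used here (onset as the first frame above baseline, apex as the first frame of maximum intensity, offset as the last frame above baseline) are assumptions made for illustration, not FACS scoring rules.

```python
from typing import List, Optional, Tuple


def onset_apex_offset(
    intensity: List[float], baseline: float = 0.1  # assumed baseline, hypothetical
) -> Optional[Tuple[int, int, int]]:
    """Return (onset, apex, offset) frame indices, or None if no action is found."""
    above = [i for i, value in enumerate(intensity) if value > baseline]
    if not above:
        return None
    onset = above[0]    # first frame above baseline
    offset = above[-1]  # last frame above baseline
    # First frame of maximum intensity between onset and offset.
    apex = max(range(onset, offset + 1), key=lambda i: intensity[i])
    return onset, apex, offset
```

The more sophisticated systems mentioned above could start from the same markers, for example by comparing the onset-to-apex duration (apex minus onset) with the apex-to-offset duration (offset minus apex).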

Detecting a limited number of specified facial actions

There are many studies in which the investigator can specify a priori the particular actions of interest. A system could be developed which was capable of detecting just instances of actions for which a number of exemplars were provided. For example, in studies of depression, it may be sufficient to detect any instance of sadness.

Automating Other Relevant Data Sources:

Automated gesture recognition

Although considerably less work has been performed on gesture than on facial expression, a system that automatically recognized gesture, or a partial system similar to those described for the face above, would help considerably in using both facial and gestural information together to make predictions about emotion and other behavior (see Rosenfeld, 1982, for a review on systems for measuring bodily movement and posture).

Physiological pattern recognition

The use of multiple peripheral and central psychophysiological measures necessitates ways of meaningfully integrating across many measures to describe coherent patterns. Most of the extant research on multiple physiological indicators of emotion has used relatively crude procedures for characterizing patterns of physiological response. We need better statistical tools for pattern description and tests to differentiate one pattern from another (e.g., how do we know when we have a different pattern versus a variant of the same pattern?).
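
One generic way to frame the question of whether two patterns differ is a permutation test on the distance between mean response profiles. The sketch below is offered only as an illustration of that general technique, not as a method proposed by the group. It assumes that each observation is a list of physiological measures already placed on a common scale; the function names, the Euclidean distance statistic, and the number of permutations are choices made for the example.

```python
import math
import random
from typing import List, Sequence


def mean_profile(rows: Sequence[Sequence[float]]) -> List[float]:
    """Average each measure (column) across observations (rows)."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def profile_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Euclidean distance between the mean profiles of two conditions."""
    return math.dist(mean_profile(a), mean_profile(b))


def permutation_p_value(
    a: Sequence[Sequence[float]],
    b: Sequence[Sequence[float]],
    n_perm: int = 2000,  # assumed number of permutations
    seed: int = 0,
) -> float:
    """Estimate how often shuffled condition labels give a distance this large."""
    rng = random.Random(seed)
    observed = profile_distance(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled = profile_distance(pooled[: len(a)], pooled[len(a):])
        # Small tolerance so rounding differences do not hide exact ties.
        if shuffled >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests that the two conditions produce different mean profiles; it does not by itself say whether the difference is a new pattern or a stronger version of the same one, which would require a statistic sensitive to profile shape rather than overall level.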

NOTE: Prepared by Richard J. Davidson with contributions from John Allman, John Cacioppo, Wallace Friesen, Joseph Hager and Mike Phillips. This was then substantially revised and edited by Paul Ekman.