III. TUTORIAL SUMMARIES

III-A. The Psychology and Neuroanatomy of Facial Expression

John Cacioppo, Joseph Hager, and Paul Ekman

Abstract. This session surveys the different sources of information in the face and the different types of information that can be derived. Neural efferent pathways include the brain areas transmitting to the facial nucleus, the facial nucleus to the facial nerve, and the facial nerve to the muscles. The relationships among electromyographic (EMG) measurement, muscle tonus measurement, and visibly observable facial activity, and the methods for obtaining each, are considered. Evidence on emotion signals includes universals, development, spontaneous versus deliberate actions, and masked emotions. The face also provides conversational signals and signs relevant to cognitive activity. The logic of comprehensively measuring facial movement is illustrated by how FACS scores facial behavior, the mechanics of facial movement, and the options for what to score (intensity, timing, symmetry). Relationships between facial behavior, voice, and physiological measures are discussed. A database of the face, and support for implementing this resource, are needed.

Presenters: J. T. Cacioppo, P. Ekman, W. V. Friesen, J. C. Hager, C. E. Izard

Facial Signal Systems

The face is the site for the major sensory inputs and the major communicative outputs. It is a multisignal, multimessage response system capable of tremendous flexibility and specificity (Ekman, 1979; Ekman & Friesen, 1975). This system conveys information via four general classes of signals or sign vehicles: (1) static facial signals represent relatively permanent features of the face, such as the bony structure and soft tissue masses, that contribute to an individual's appearance; (2) slow facial signals represent changes in the appearance of the face that occur gradually over time, such as the development of permanent wrinkles and changes in skin texture; (3) artificial signals represent exogenously determined features of the face, such as eyeglasses and cosmetics; and (4) rapid facial signals represent phasic changes in neuromuscular activity that may lead to visually detectable changes in facial appearance. (See Ekman, 1978, for a discussion of these four signal systems and eighteen different messages that can be derived from these signals.)

All four classes of signals contribute to facial recognition. We are concerned here, however, with rapid signals. These movements of the facial muscles pull the skin, temporarily distorting the shape of the eyes, brows, and lips and producing folds, furrows, and bulges in different patches of skin. These changes in facial muscular activity typically are brief, lasting a few seconds; rarely do they endure more than five seconds or less than 250 ms. The most useful terminology for describing or measuring facial actions refers to the production system -- the activity of specific muscles. These muscles may be designated by their Latin names or by a numeric system of Action Units (AUs), as used in Ekman and Friesen's Facial Action Coding System (FACS, described below). A coarser level of description involves terms such as smile, smirk, frown, and sneer, which are imprecise: they ignore differences among the variety of muscular actions to which they may refer, and they mix description with inferences about the meaning or message that may be conveyed.

Among the types of messages conveyed by rapid facial signals are: (1) emotions -- including happiness, sadness, anger, disgust, surprise, and fear; (2) emblems -- culture-specific symbolic communicators such as the wink; (3) manipulators -- self-manipulative movements such as lip biting; (4) illustrators -- actions accompanying and highlighting speech such as a raised brow; and (5) regulators -- nonverbal conversational mediators such as nods or smiles (Ekman & Friesen, 1969).

A further distinction can be drawn among rapid facial actions that reflect: (1) reflex actions under the control of afferent input; (2) rudimentary reflex-like or impulsive actions accompanying emotion and less differentiated information processing (e.g., the orienting or defense response) that appear to be controlled by innate motor programs; (3) adaptable, versatile, and more culturally variable spontaneous actions that appear to be mediated by learned motor programs; and (4) malleable voluntary actions. Thus, some classes of rapid facial actions are relatively undemanding of a person's limited information processing capacity, free of deliberate control for their evocation, and associated with (though not necessary for) rudimentary emotional and symbolic processing, whereas others are demanding of processing capacity, are under voluntary control, and are governed by complex and culturally specific prescriptions, or display rules (Ekman & Friesen, 1969), for facial communications. (The terms facial "actions," "movements," and "expressions" are used interchangeably throughout this report).

Techniques for Measuring the Rapid Facial Signals

Numerous methods exist for measuring facial movements resulting from the action of muscles (see a review of 14 such techniques in Ekman, 1982; also Hager, 1985, for a comparison of the two most commonly used, FACS and MAX). The Facial Action Coding System (FACS) (Ekman and Friesen, 1978) is the most comprehensive, widely used, and versatile system. Because it is being used by most of the Workshop participants who currently work with facial movement, and is referred to many times in the rest of this report, more detail is given here about its derivation and use than about other techniques. The section on the neuroanatomy of facial movement (below) discusses electromyography (EMG), which can measure activity that may not be visible and therefore is not a social signal.

The Facial Action Coding System (FACS)

FACS was developed by determining how the contraction of each facial muscle (singly and in combination with other muscles) changes the appearance of the face. Videotapes of more than 5000 different combinations of muscular actions were examined to determine the specific changes in appearance that occurred and how best to differentiate one from another. It was not possible to reliably distinguish which specific muscle had acted to produce the lowering of the eyebrow and the drawing of the eyebrows together; therefore, the three muscles involved in these changes in appearance were combined into a single Action Unit (AU). Likewise, the muscles involved in opening the lips have also been combined.

Measurement with FACS is done in terms of Action Units rather than muscular units for two reasons. First, for a few changes in appearance, more than one muscle has been combined into a single AU, as described above. Second, FACS separates into two AUs the activity of the frontalis muscle, because the inner and outer portion of this muscle can act independently, producing different changes in appearance. There are 46 AUs which account for changes in facial expression, and 12 AUs which more grossly describe changes in gaze direction and head orientation.
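
To make the numeric AU vocabulary concrete, the sketch below maps a small, illustrative subset of AU numbers to their standard FACS names; the dictionary structure and helper function are hypothetical conveniences for a coding tool, not part of FACS itself.

    # Illustrative subset of FACS Action Units (AU number -> descriptive name).
    # The names follow standard FACS usage; the dictionary itself is only a
    # sketch of how a coding tool might store the vocabulary.
    FACS_ACTION_UNITS = {
        1: "Inner brow raiser",
        2: "Outer brow raiser",
        4: "Brow lowerer",
        6: "Cheek raiser",
        9: "Nose wrinkler",
        12: "Lip corner puller",
        15: "Lip corner depressor",
        20: "Lip stretcher",
        26: "Jaw drop",
    }

    def describe(aus):
        """Turn a list of AU numbers into human-readable labels."""
        return [f"AU {n}: {FACS_ACTION_UNITS.get(n, 'unknown')}" for n in aus]

    print(describe([1, 4]))  # e.g., an upper-face configuration scored as AU 1 + AU 4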

Coders spend approximately 100 hours learning FACS. Self-instructional materials teach the anatomy of facial activity, i.e., how muscles singly and in combination change the appearance of the face. Prior to using FACS, all learners are required to score a videotaped test (provided by Ekman) to ensure they are measuring facial behavior in agreement with prior learners. To date, more than 300 people have achieved high inter-coder agreement on this test.
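
One simple way to express inter-coder agreement on an event scored in AUs is the ratio of twice the number of AUs both coders listed to the total number of AUs listed by the two coders. The function below is a minimal sketch of that index, assuming each coder's score is given as a list of AU numbers; it is not the certification test's actual scoring procedure.

    def agreement_ratio(coder_a, coder_b):
        """Agreement on one scored event: 2 x (AUs listed by both coders)
        divided by the total number of AUs listed by the two coders.
        Returns a value between 0 (no overlap) and 1 (identical scores)."""
        a, b = set(coder_a), set(coder_b)
        if not a and not b:
            return 1.0
        return 2 * len(a & b) / (len(a) + len(b))

    # Example: coder A scores AU 1+4+15, coder B scores AU 1+4.
    print(agreement_ratio([1, 4, 15], [1, 4]))  # 0.8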

A FACS coder "dissects" an observed expression, decomposing it into the specific AUs which produced the movement. The coder repeatedly views records of behavior in slowed and stopped motion to determine which AU or combination of AUs best accounts for the observed changes. The scores for a facial expression consist of the list of AUs which produced it. The precise duration of each action also is determined, and the intensity of each muscular action and any bilateral asymmetry are rated. In the most elaborate use of FACS, the coder determines the onset (first evidence) of each AU, when the action reaches an apex (asymptote), the end of the apex period when it begins to decline, and when it disappears from the face completely (offset). These time measurements are usually much more costly to obtain than the decision about which AU(s) produced the movement, and in most research only onset and offset have been measured.
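
The timing and intensity measurements just described map naturally onto one small record per scored action. The dataclass below is a hypothetical sketch of such a record; the field names and the intensity scale are assumptions for illustration, not a format prescribed by FACS.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class AUScore:
        """One scored Action Unit within an observed expression (hypothetical record)."""
        au: int                    # Action Unit number, e.g., 12
        intensity: str             # e.g., "low" / "medium" / "high" (illustrative scale)
        asymmetry: Optional[str]   # e.g., "left" or "right", or None if bilateral
        onset: float               # time (s) of the first evidence of the action
        apex_start: float          # time (s) when the action reaches its apex
        apex_end: float            # time (s) when the apex period ends and decline begins
        offset: float              # time (s) when the action disappears completely

    # An expression's score is simply the list of AUs that produced it,
    # here a brow configuration scored as AU 1 + AU 4:
    expression = [
        AUScore(au=1, intensity="medium", asymmetry=None,
                onset=0.3, apex_start=0.8, apex_end=1.6, offset=2.4),
        AUScore(au=4, intensity="low", asymmetry=None,
                onset=0.4, apex_start=0.9, apex_end=1.5, offset=2.2),
    ]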

The FACS scoring units are descriptive, involving no inferences about emotions. For example, the scores for an upper-face expression might be that the inner corners of the eyebrows are pulled up (AU 1) and together (AU 4), rather than that the eyebrows' position shows sadness. Data analyses can be done on these purely descriptive AU scores, or FACS scores can be converted by a computer, using a dictionary and rules, into emotion scores. Although this emotion interpretation dictionary was originally based on theory, there is now considerable empirical support for the facial action patterns listed in it:

  • FACS scores yield highly accurate pre- and postdictions of the emotions signaled to observers in more than fifteen cultures, Western and non-Western, literate and preliterate (Ekman, 1989);
  • specific AU scores show moderate to high correlations with subjective reports by the expresser about the quality and intensity of the felt emotion (e.g., Davidson et al., 1990);
  • experimental circumstances are associated with specific facial expressions (Ekman, 1984);
  • different and specific patterns of physiological activity co-occur with specific facial expressions (Davidson et al., 1990).

The emotion prediction dictionary provides scores on the frequency of the seven single emotions (anger, fear, disgust, sadness, happiness, contempt, and surprise), the co-occurrence of two or more of these emotions in blends, and a distinction between emotional and nonemotional smiling, which is based on whether or not the muscle that orbits the eye (AU 6) is present with the muscle that pulls the lip corners up obliquely (AU 12). Emotional smiles are presumed to be involuntary and to be associated with the subjective experience of happiness and associated physiological changes. Nonemotional smiles are presumed to be voluntary, and not to be associated with happy feelings nor with physiological changes unique to happiness. A number of lines of evidence -- from physiological correlates to subjective feelings -- now support this distinction between emotional and nonemotional smiles (reviewed in Ekman, 1992a).
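
The conversion of descriptive AU scores into emotion scores, including the emotional versus nonemotional smile distinction, can be illustrated with a toy rule-based lookup. The sketch below encodes only the two patterns mentioned in this section (the AU 1 + AU 4 brow configuration and the AU 6 + AU 12 smile rule); it is not the actual emotion prediction dictionary, which contains many more patterns and combination rules.

    def interpret(aus):
        """Toy rule-based interpretation of a set of AU numbers (illustrative only)."""
        aus = set(aus)
        labels = []
        if {1, 4} <= aus:
            labels.append("sadness-related brow action (AU 1 + AU 4)")
        if 12 in aus:
            # Emotional smiles include the muscle orbiting the eye (AU 6)
            # together with the lip corner puller (AU 12); nonemotional
            # smiles show AU 12 without AU 6.
            labels.append("emotional smile" if 6 in aus else "nonemotional smile")
        return labels or ["no rule matched"]

    print(interpret([6, 12]))  # ['emotional smile']
    print(interpret([12]))     # ['nonemotional smile']
    print(interpret([1, 4]))   # ['sadness-related brow action (AU 1 + AU 4)']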

The Maximally Discriminative Affect Coding System (MAX)

Izard's (1979) MAX also measures visible appearance changes in the face. MAX's units are formulated in terms of appearances that are relevant to eight specific emotions, rather than in terms of individual muscles. Unlike FACS, MAX does not exhaustively measure all facial actions, but scores only those facial movements Izard relates to one or more of the eight emotions. All of the facial actions which MAX specifies as relevant to particular emotions are also found in the FACS emotion dictionary, but that database contains inferences about many other facial actions not present in MAX that may signal emotion. There is some argument (Oster et al., in press) about whether the facial actions MAX specifies as relevant to emotion are valid for infants.

Evidence About Which Facial Actions Signal Which Emotions

The scientific study of which facial configurations are associated with each emotion has focused primarily on observers' interpretations of facial expressions (e.g., judgments of pictures of facial expressions). There has been far less research, although some, examining how facial expressions relate to other responses the person may emit (i.e., physiological activity, voice, and speech) and to the occasion on which the expression occurs. The principal findings are:

  (1) Across cultures there is highly significant agreement among observers in categorizing facial expressions of happiness, sadness, surprise, anger, disgust, and fear. (Note there have been some recent challenges to this work, but they are ideologically, not empirically, based and are proposed by those who claim emotions do not exist [Fridlund, 1991] or that emotions are socially constructed and have no biological basis [Russell, 1991a,b,c].)
  (2) Experimental inductions of what individuals report as positive and negative emotional states are associated with distinct facial actions, as are reports of specific positive and specific negative emotions.
  (3) Cultural influences can, but do not necessarily, alter these outcomes significantly.
  (4) These outcomes can be found in neonates and the blind as well as in sighted adults, although the evidence on the blind and neonates is more limited than that for sighted adults.
  (5) Emotion-specific activity in the autonomic nervous system appears to emerge when facial prototypes of emotion are produced on request, muscle by muscle.
  (6) Different patterns of regional brain activity coincide with different facial expressions.
  (7) The variability in emotional expressions observed across individuals and cultures is attributable to factors such as differences in which emotion, or sequence of emotions, was evoked and to cultural prescriptions regarding the display of emotions (e.g., Ekman, 1972, 1992b; Ekman & Friesen, 1978; Ekman et al., 1972, 1983; Izard, 1971, 1977).

Facial actions have also been linked to nonemotional information processing. For instance, in addition to nonverbal messages (e.g., illustrators or emblems; see Ekman & Friesen, 1969; Ekman, 1979), incipient perioral activity has been observed during silent language processing (Cacioppo & Petty, 1981; McGuigan, 1970); increased activity over the eyebrow region (corrugator supercilii) and decreased blinking have been associated with mental concentration or effort (e.g., Darwin, 1872; Cacioppo et al., 1985; Stern & Dunham, 1990); and gesture/speech mismatches have been observed during the simultaneous activation of incompatible beliefs (Goldin-Meadow et al., in press).

The Neuroanatomy of Facial Movement

Rapid facial signals, such as emotional expressions, are the result of movements of facial skin and connective tissue (i.e., fascia) caused by the contraction of one or more of the 44 bilaterally symmetrical facial muscles. These striated muscles fall into two groups: four of these muscles, innervated by the trigeminal (5th cranial) nerve, are attached to and move skeletal structures (e.g., the jaw) in mastication; and forty of these muscles, innervated by the facial (7th cranial) nerve, are attached to bone, facial skin, or fascia and do not operate directly by moving skeletal structures but rather arrange facial features in meaningful configurations (Rinn, 1984). Although muscle activation must occur if these facial configurations are to be achieved, it is possible for muscle activation to occur in the absence of any overt facial action if the activation is weak or transient or if the overt response is aborted.

Briefly, the neural activation of the striated muscles results in the release of acetylcholine at motor end plates, which in turn leads to muscle action potentials (MAPs) that are propagated bidirectionally across muscle fibers and activate the physicochemical mechanism responsible for muscle contraction. The activating neurotransmitter acetylcholine is quickly broken down by the enzyme acetylcholinesterase, so that continued efferent discharges are required for continued propagation of MAPs and fiber contraction. Moreover, low-amplitude neural volleys along motor nerves tend to activate small motoneurons, which innervate relatively few and small muscle fibers (a relationship called the size principle; Henneman, 1980). Thus, dynamic as well as configural information flows from the muscles underlying rapid facial signals (Cacioppo & Dorfman, 1987). Finally, fast or low-level changes in these efferent discharges can occur without leading to one-to-one feature distortions on the surface of the face. This is due to factors such as the organization of the facial muscles (e.g., agonist/antagonist, synergist, and how some muscles overlay other muscles) and the structure and elasticity of the facial skin, facial sheath, adipose tissue, and facial muscles. Not unlike a loose chain, the facial muscles can be pulled a small distance before exerting a significant force on the object to which they are anchored (Ekman, 1982; Tassinary et al., 1989). In addition, the elasticity of the facial sheath, facial skin, and adipose tissue acts like a low-pass mechanical filter. Therefore, electromyography (measuring electrical activity by attaching electrodes to the surface of the face) has served as a useful complement to overt facial action coding systems (see review by Cacioppo et al., 1990).
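
Because tissue elasticity low-pass filters the mechanical consequences of muscle activity, weak or transient activations may never surface as visible movement, which is why surface EMG is a useful complement. A common way to expose such activity is to rectify the EMG signal and smooth it into an amplitude envelope; the sketch below, using synthetic data, only illustrates that idea and does not represent any particular recording protocol or the measures used in the studies cited above.

    import numpy as np

    def emg_envelope(emg, window=50):
        """Full-wave rectify a raw EMG trace and smooth it with a moving
        average, yielding a rough amplitude envelope of muscle activation."""
        rectified = np.abs(emg)
        kernel = np.ones(window) / window
        return np.convolve(rectified, kernel, mode="same")

    # Synthetic example: a brief, low-amplitude burst of activity embedded in
    # noise -- the kind of weak, transient activation that might produce no
    # visible facial movement but still be detectable in the EMG envelope.
    rng = np.random.default_rng(0)
    t = np.arange(0.0, 2.0, 0.001)                    # 2 s sampled at 1 kHz
    noise = 0.02 * rng.standard_normal(t.size)
    burst = 0.1 * np.sin(2 * np.pi * 80 * t) * ((t > 0.8) & (t < 1.0))
    envelope = emg_envelope(noise + burst)
    print(envelope.max())  # the burst appears as a clear rise in the envelope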

The muscles of mimicry on each side of the face are innervated by a lower motor nerve emanating from a facial nerve nucleus located in the pons. The left and right facial nerve nuclei are independent, but the commands they carry to the lower face come from contralateral upper motoneuron tracts, whereas the commands they carry to the mid and upper face come from bilateral motoneuron tracts. The upper motoneuron tracts include corticobulbar and subcortical ("extrapyramidal") pathways. Lesions in the former are associated with hemiparalysis of voluntary movements, whereas lesions in the latter are more typically associated with attenuated spontaneous movements (Rinn, 1984). Despite these distinctions, the subcortical and cortical control mechanisms provide complementary influences: (1) the former is well suited for spontaneous, non-flexible behaviors that are directly and immediately in the service of basic drives, and (2) the latter provides adaptability by allowing learning and voluntary control to influence motor behavior.

Facial Data Base

A readily accessible, multimedia database shared by the diverse facial research community would be an important resource for the resolution and extension of issues concerning facial understanding. This database should contain images of faces (still and motion), vocalizations and speech, research findings, psychophysiological correlates of specific facial actions, and interpretations of facial scores in terms of emotional state, cognitive process, and other internal processes. The database should be supplemented with tools for working with faces, such as tools for translating one facial measurement system into another and for synthetically modifying the expression in an image. Such a database would facilitate and integrate the efforts of researchers, highlight contradictions and consistencies, and suggest fruitful avenues for new research. Only isolated pieces of such a database exist now, such as the FACS Dictionary (Friesen and Ekman, 1987), which needs updating.
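
As one hypothetical way to organize such a multimedia database, the sketch below defines a minimal record linking a still image or motion clip to its FACS scores, an emotion interpretation, and accompanying physiological measures. The field names and structure are assumptions made for illustration, not a proposed standard for this resource.

    from dataclasses import dataclass, field
    from typing import Dict, List, Optional

    @dataclass
    class FaceRecord:
        """Hypothetical entry in a shared facial-expression database."""
        media_path: str                        # still image or motion clip
        media_type: str                        # "still" or "motion"
        facs_scores: List[int]                 # AU numbers scored for the expression
        emotion_interpretation: Optional[str]  # e.g., output of an emotion dictionary
        physiology: Dict[str, float] = field(default_factory=dict)  # e.g., heart rate
        notes: str = ""                        # context, research findings, etc.

    # Example record (hypothetical path and values):
    record = FaceRecord(
        media_path="clips/subject_017_trial_03.mov",
        media_type="motion",
        facs_scores=[6, 12],
        emotion_interpretation="emotional smile",
        physiology={"heart_rate_bpm": 72.0},
    )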

Note: This report was originally prepared by John Cacioppo and Joseph Hager, and then revised and edited by Paul Ekman. It is based on presentations by each of these authors, C. E. Izard, and W. V. Friesen.