NSF Report - Facial Expression Understanding
IV-C. Breakout group on Modeling and Databases
Participants: F. Parke, D. Terzopoulos, T. Sejnowski, P. Stucki, L. Williams, D. Ballard, L. Sadler, J. Hager, D. Psaltis, and J. Zhang
Computer-Based Facial Expression Models and Image Databases
Fred Parke, Demetri Terzopoulos, Terrence Sejnowski, Peter Stucki, Lance Williams, Dana Ballard, Lewis Sadler, and Joseph Hager
The ability to recognize and generate animated facial images, together with speech input and output, holds enormous promise for many diverse application areas. Consider the following examples.
The use of model-based scene analysis and model-based facial image synthesis could yield very low bandwidth video conferencing (Aizawa et al., 1989; Choi et al., 1991).
Applications relevant to human/computer interface development include
Substantive research in the real-time animation of faces for telecommunication and for the synthesis of computer interface "agents" is being conducted at Apple Computer, Inc. (Advanced Technology Group, MS:76-4J, 20525 Mariani Ave., Cupertino, CA 95014), Hitachi, Ltd. (Hitachi Central Research Laboratory, 1-280 Higashi-Koigakubo, Kokubunji, Tokyo 185, Japan), NTT (Human and Multimedia Laboratory, 420 C, NTT Human Interface Laboratories, 1-2356 Take, Yokosuka-Shi, Kanagawa, 238-03 Japan), and Sony (Information Systems Research Center, Sony Corporation, Asahi-cho, Atsugi-shi 243, Japan).
A number of companies are in the business of vending computer systems and services for making facial image composites ("identikit" police identification tools, point-of-purchase video preview for cosmetic makeovers or cosmetic surgery, and one class of systems for estimating the aged appearances of missing children), 3D digitization of faces, 3D reconstructive surgery preview and manufacture of facial prosthetics, 3D digitization of teeth for the manufacture of dental appliances, and 2D and 3D facial animation.
Another important current interest is in the entertainment industry: the use of graphical face models in advertising, for movie special effects, etc.
These many diverse examples illustrate the potential of a focused research program directed towards computer understanding of facial expression. The purpose of this document is to identify the specific research questions that would form the basis of such a program. To do this, the document is organized into three subsequent sections: 1) 3D Modeling, 2) Facial Databases and Security, and 3) Research Directions.
State of the Art in 3D Modeling
Brief partial history
The first work in developing facial models was done in the early 70's by Parke at the University of Utah (Parke, 1972a, 1972b, 1974, 1975) and Gillenson at Ohio State (Gillenson, 1974). Parke developed the first interpolated and the first parametric three-dimensional face models, while Gillenson developed the first interactive two-dimensional face models. In 1971, Chernoff (1971, 1973) proposed the use of simple 2D computer-generated facial images to present n-dimensional data. In the early 80's, Platt and Badler at the University of Pennsylvania developed the first muscle-action-based facial models (Platt, 1980, 1985; Platt & Badler, 1981). These models were the first to make use of the Facial Action Coding System (FACS; Ekman & Friesen, 1978; Ekman & Oster, 1979) as the basis for facial expression control.
The last seven years have seen considerable activity in the development of facial models and related techniques. Waters and Terzopoulos developed a series of physically based pseudo-muscle-driven facial models (Waters, 1986, 1987, 1988; Waters & Terzopoulos, 1990, 1992; Terzopoulos & Waters, 1990b). Magnenat-Thalmann, Primeau, and Thalmann (1988) presented their work on Abstract Muscle Action models in the same year that Nahas, Huitric, and Saintourens (1988) developed a face model using B-spline surfaces rather than the more common polygonal surfaces. Waite (1989) and Patel and Willis (1991) have also reported recent facial modeling work. Techniques for modeling and rendering hair have been the focus of much recent work (Yamana & Suenaga, 1987; Watanabe & Suenaga, 1992). Also, surface texture mapping techniques to achieve more realistic images have been incorporated in facial models (Oka et al., 1987; Williams, 1990; Waters & Terzopoulos, 1991).
The ability to synchronize facial actions with speech was first demonstrated by Parke in 1974 (Parke, 1974, 1975). Several other researchers have reported work in speech animation (Pearce et al., 1986; Lewis & Parke, 1987; Hill et al., 1988; Wyvill, 1989). Pelachaud has reported on recent work incorporating co-articulation into facial animation (Pelachaud, 1991). Work modeling the physical properties of human skin has been reported by Komatsu (1988), Larrabee (1986), and Pieper (1989, 1991).
Essentially all of the current face models produce rendered images based on polygonal surfaces. Some of the models make use of surface texture mapping to increase realism. The facial surfaces are controlled and manipulated using one of three basic techniques: 3D surface interpolation, ad hoc surface shape parameterization, and physically based pseudo-muscle simulation.
By far the most common technique is to control facial expression using simple 3D shape interpolation. This is done by measuring (Cyberware Laboratory Inc., 1990; Vannier et al., 1991) the desired face in several different expressions and interpolating the surface vertex values to go from one expression to the next. One extension of this approach is to divide the face into regions and interpolate each region independently (Kleiser, 1989).
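A minimal sketch of this interpolation scheme, assuming the expressions have been digitized with matching vertex orderings; the region-weighting extension follows the idea of interpolating regions independently (function and region names here are illustrative, not from any published system):

```python
# Sketch of 3D shape interpolation between two scanned expressions.
# Vertices are (x, y, z) tuples in matching order across expressions.

def interpolate_expression(verts_a, verts_b, t):
    """Linearly blend two matched vertex lists at parameter t in [0, 1]."""
    return [
        tuple(a + t * (b - a) for a, b in zip(va, vb))
        for va, vb in zip(verts_a, verts_b)
    ]

def interpolate_by_region(verts_a, verts_b, regions, t_by_region):
    """Blend each facial region independently (cf. Kleiser, 1989).

    regions: dict mapping region name -> list of vertex indices.
    t_by_region: dict mapping region name -> blend weight in [0, 1].
    """
    out = list(verts_a)
    for region, indices in regions.items():
        t = t_by_region.get(region, 0.0)
        for i in indices:
            out[i] = tuple(a + t * (b - a)
                           for a, b in zip(verts_a[i], verts_b[i]))
    return out
```

Per-region weights let, say, the mouth reach a full smile while the brows stay neutral, which plain whole-face interpolation cannot express.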
Ad hoc parameterized facial models have been developed primarily by Parke (1982). These models present the user with a small set of control parameters that manipulate various aspects of facial expression and facial conformation. These parameters are only loosely physically based. These parametric models are the only ones to date that allow facial conformation control, i.e., changes from one individual face to another.
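The flavor of such a parameterization can be sketched as follows; the parameter names, vertex groups, and displacement rules here are invented for illustration and are not taken from Parke's actual model:

```python
# Illustrative sketch of ad hoc parametric expression control:
# a small set of named parameters drives fixed vertex displacements.

def apply_parameters(base_verts, brow_indices, jaw_indices, params):
    """Displace vertex groups according to a small parameter set.

    params: dict with keys like 'brow_raise' and 'jaw_open', each a
    value in [0, 1] scaling a hand-tuned displacement (the 0.3 and
    0.8 magnitudes below are arbitrary illustrative constants).
    """
    out = list(base_verts)
    for i in brow_indices:
        x, y, z = out[i]
        out[i] = (x, y + 0.3 * params.get("brow_raise", 0.0), z)
    for i in jaw_indices:
        x, y, z = out[i]
        out[i] = (x, y - 0.8 * params.get("jaw_open", 0.0), z)
    return out
```

The appeal of this approach is economy: a handful of loosely physical parameters stand in for thousands of vertex positions, and conformation parameters can be added alongside expression parameters in the same framework.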
Physically based models attempt to model the shape changes of the face by modeling the properties of facial tissue and muscle actions. Most of these models are based on spring meshes or spring lattices with muscle actions approximated by various force functions. These models often use subsets of the FACS system to specify the muscle actions.
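The two force terms such models combine can be sketched as below, assuming a Hooke spring between lattice nodes and an exponentially decaying pseudo-muscle pull; the constants and the falloff form are illustrative assumptions, not a published formulation:

```python
# Sketch of the force terms in a spring-lattice facial tissue model.
import math

def spring_force(p, q, rest_len, k):
    """Hooke spring force on node p from its lattice neighbor q."""
    d = [qi - pi for pi, qi in zip(p, q)]
    length = math.sqrt(sum(c * c for c in d))
    if length == 0.0:
        return (0.0, 0.0, 0.0)
    mag = k * (length - rest_len) / length  # stretch past rest pulls inward
    return tuple(mag * c for c in d)

def muscle_force(p, attachment, direction, strength, falloff):
    """Pseudo-muscle pull on node p along a fixed direction, with
    influence decaying exponentially with distance from the muscle's
    attachment point (one simple choice of force function)."""
    dist = math.sqrt(sum((pi - ai) ** 2 for pi, ai in zip(p, attachment)))
    w = strength * math.exp(-dist / falloff)
    return tuple(w * c for c in direction)
```

Summing the spring forces over each node's neighbors and the muscle forces over the active FACS action units, then integrating the node positions forward in time, yields the deforming facial surface.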
Even the best current physically based facial models use relatively crude approximations to the true anatomy of the face. The detailed structure of the face (Fried, 1976; Friedman, 1970), its anatomical components, their interaction, and actions are only approximated at fairly abstract levels.
No model to date includes the complete set of FACS muscle actions. The muscle action models do not yet capture true muscle behavior, but only approximate it with various fairly simple mathematical functions.
Facial Databases and Security
Throughout this report, repeated references to the need for a database of facial information have emphasized the glaring lack of such a resource. The preceding sections have indicated many types of information that should be part of such a database, including images of the face (both still and motion), FACS scores, interpretations of facial signs, physiological data, digitized speech and sounds, and synthetic images. Tools for manipulating images and associated data and for synthesizing facial images might usefully be linked to the database. Much of the progress in applying computers to understand the face depends upon access to a facial database, both in terms of sharing information and common sets of images, and in regard to increasing cooperation and collaboration among investigators.
The benefits of the image database include reduction of research costs and improved research efficiency. Costs can be reduced by avoiding redundant collection of facial expression exemplars by each investigator independently. The retrieval and exchange of images is also improved by a centralized repository. Research efficiency can increase by maintaining a reference set of images that can provide a basis for benchmarks for efforts in various areas. Also, a great deal of information will be collected about this reference set.
Because the image part of the database is clearly the first step in constructing a full, multimedia database of the face, it is discussed more fully in the following section on images and security.
Images in the database
Several technical considerations for database images need to be resolved, including criteria for image resolution, color, and sequence, as well as data formats, compression methods, and distribution mechanisms. These choices should be compatible with other multimedia scientific databases now being developed in biology and medicine.
Images for the database must meet the multiple needs of scientists working in this field, independent of the technical criteria for inclusion in the database. In other words, some images of importance may not meet current standards for resolution or color. These images probably include those from the archives of leading behavioral scientists. An important aspect of a database of images is the metadata associated with each image, such as the circumstances under which the image was obtained. Some metadata and other associated fields are discussed below.
For general relevance, the images should be scored in terms of Ekman and Friesen's FACS (1978) as a standard measure of activity in the face. Images should represent a number of demographic variables to provide a basis for generality of research findings. These variables include racial and ethnic background, gender, and age. For each category of images in the database, images of several individuals need to be included to avoid effects of the unique properties of particular people.
Another important variable, in terms of both signal value and neurophysiology, is the distinction between deliberate actions performed on request versus spontaneous actions not under volitional control. Many expressions are not easily classified into either of these categories. Examples of both categories (with provision for additional categories) must be included in the database in order to study the substantive question of the difference between these expressions. Examples of the prototype expressions for each of the seven or eight basic emotions should be included. In addition, expressions in which only part of the prototype is present, in one or more areas of the face, should be available. Examples of deliberate individual muscular actions, and of the important combinations of actions, performed by selected experts in facial movement are an important component of this database. Intensity of muscular action should be varied for many of the image categories above.
For spontaneous expressions, an important consideration is to identify within the database the precise nature of the conditions that elicited the expression. The eliciting circumstances, such as a conversation versus watching films alone, can produce different types of expressions.
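The metadata fields discussed above might be gathered into a record along these lines; the field names and value conventions are illustrative only, not a proposed standard:

```python
# Hypothetical metadata record for one image in the facial database.
from dataclasses import dataclass

@dataclass
class FaceImageRecord:
    image_id: str
    facs_score: list          # FACS action units, e.g. ["AU1", "AU2"]
    ethnicity: str            # demographic variables for generality
    gender: str
    age: int
    elicitation: str          # "deliberate", "spontaneous", or other
    eliciting_condition: str  # e.g. "conversation", "film viewing"
    intensity: str = "unspecified"
    is_motion: bool = False   # still image vs. motion record
```

Keeping elicitation and eliciting_condition as separate fields preserves the distinction drawn above between the volitional status of an action and the circumstances that produced it.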
Given the number of classifications and variables, the number of images to fill a full matrix of cross-tabulations would be quite large. The great number of expressions that might be included is one reason for including a module in the database that could artificially generate variations on expressions given certain image prototypes. Such images could be used as stimuli in judgment studies.
Both still images and motion images of the facial expressions are required in the database. Still images can convey much of the configurational information about facial expressions that is important for inferring meaning in terms of emotion, message, etc. Motion records contain information about the temporal dynamics of expression that might convey information about the volitional quality of the movement, deception in the expression, etc. Both still and motion records could provide the basis for computer-based measurement of the image. Motion records would be important for work on animation and modeling.
In addition to the reference images installed and managed by the database administrator, provision should be made for individual researchers to add their own images to the database. These additions must match the specified technical formats which should be made available to all researchers for their equipment purchase and record collection phases of research. The security status of such additions also needs to be determined.
In order to meet the high resolution technical requirements of the database, most images used as references in the database will need to be collected from scratch. A limited number of images are currently available that might be included in the database to start. An indication of where such archives might be is contained in the database recommendations of the Basic Science Workgroup on page 35. A survey should be made of these and other laboratories to determine what images exist that might be contributed to the database, what restrictions they have, and the terms under which they might be incorporated. Color moving images of spontaneous facial behaviors will probably be difficult to find on anything but consumer level video in these archives.
The database of images can be valuable to thousands of researchers if it is easy to access and use. Ease of access implies a relaxed level of security that allows most users quick access to materials they need and frees the administrator of time-consuming identity checks. In this case, the database could be accessed by nonscientists, such as journalists, advertisers, and ubiquitous hackers. However, some investigators may need to work with images that can be made available only to certain authorized users. Examples include images of psychiatric patients with emotional disorders or facial surgery patients who are unable or unwilling to give permission to use their likenesses outside a community of scientists. These images could easily, even unwittingly, be abused by users who are not familiar with informed consent procedures, granting agencies' regulations on uses of identifiable records, or permissions. Such examples indicate the need for a more comprehensive security strategy.
Issues related to security are very important in any database design. Security requirements, however, vary among business, engineering, and scientific databases. While these requirements are rather well understood for business and engineering databases, the problem of security has not yet been addressed in full detail for the design of scientific databases.
For the subject image database, there are many aspects that need to be investigated by an interdisciplinary team of database designers, computer vision experts and behavioral scientists. For example, in the area of access control, the following model-options are possible:
a) The Discretionary Access Control (DAC) model, where the owner of objects or object classes keeps all rights and can grant them to, or revoke them from, individual users or user groups of their choice. The grants can cover units like tuples or sub-units like attributes. For example, the database end-user community can be given the right to use a complex object like a human face but not the right to use details or underlying contraction/expansion models of the same face. The database manager has no direct influence on this security scheme.
b) The Mandatory, Multilevel Access Control (MAC) model, where users can be granted rights to use individual objects or object classes at various security levels (registered confidential, confidential, for internal use only, no security). The assignment of security access control is rule-based and can be implemented by the database manager.
Both access control models depend on proper identification and authentication of the database user or user group.
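The DAC scheme above can be sketched as an owner-held rights table; the class and method names are illustrative:

```python
# Sketch of Discretionary Access Control: the object's owner alone
# grants and revokes per-user rights.

class DacObject:
    def __init__(self, owner):
        self.owner = owner
        self.rights = {}   # user -> set of rights, e.g. {"read"}

    def grant(self, actor, user, right):
        if actor != self.owner:
            raise PermissionError("only the owner may grant rights")
        self.rights.setdefault(user, set()).add(right)

    def revoke(self, actor, user, right):
        if actor != self.owner:
            raise PermissionError("only the owner may revoke rights")
        self.rights.get(user, set()).discard(right)

    def allowed(self, user, right):
        return user == self.owner or right in self.rights.get(user, set())
```

Note that the check in grant and revoke is what keeps the database manager out of the loop: only the owner's decisions ever alter the rights table.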
A MAC model for secure use of databases might be implemented as a three tiered hierarchy. In the lowest security level, all images should be without restriction on use, i.e., "in the public domain." This criterion would exclude any images that have proprietary restrictions (e.g. use with royalty), have any restrictions on their use (e.g., consents or permissions), or are of a sensitive nature (e.g., patients or prisoners). This level of security allows anonymous or guest access to the database. The medium level of security includes images that can be used only for scientific purposes, without restrictions such as royalties. All images at this level should be accessible by anyone who is authorized at this level. The high level of security gives access to images that have special considerations requiring careful validation of users. This level of security might be used by police, security, and military agencies, by those who require some kind of payment for use of images, and by researchers who have sensitive images that require screening of users. The high level of security should be flexible enough for a number of different categories of users, each with their own databases. The medium and high levels of security imply a system of userids and passwords and an administrative staff to verify user identities and maintain the user database.
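The rule-based assignment in the three-tiered MAC scheme reduces to an ordered comparison of clearance levels, as in this sketch (the level names are illustrative labels for the low, medium, and high tiers described above):

```python
# Sketch of a three-tier MAC access check for the image database.
# Higher numbers denote stricter tiers.
LEVELS = {"public": 0, "scientific": 1, "restricted": 2}

def can_access(user_clearance, image_level):
    """A user may read any image at or below their clearance tier."""
    return LEVELS[user_clearance] >= LEVELS[image_level]
```

Guest users would carry "public" clearance and need no userid; "scientific" and "restricted" clearances imply the userid, password, and identity-verification machinery described above.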
Most currently available images would seem to fall into the medium level of security, but the status of existing images needs to be investigated. Procedures for determining the appropriate security level need to be developed and the role of the owner or contributor in this process specified. Incentives for obtaining images for lower levels of security from primary researchers would be helpful.
Another important issue is the problem of object distribution over networks. This requires appropriate network security measures, up to encrypted transmission procedures. Again, the assignment of distribution security can be done at the object or sub-object level.
Research in database design has gradually evolved from business databases to engineering databases and is currently starting to address issues concerned with scientific databases (Frenkel, 1994; Rhiner & Stucki, 1992). Similar security issues arise in the human genome project.
Research Directions and Goals
Anatomically correct models Models which faithfully reflect detailed facial anatomy, facial muscle actions, and facial tissue properties need to be developed. These models would be useful both for medical applications and for expression understanding research. The National Library of Medicine is creating a three-dimensional model of several human bodies from digitized slices. The face portion of this database could provide an anatomically accurate model of the facial musculature.
Expression control Models which include complete, accurate expression control capabilities are needed. Control could be based on the FACS system or other parameterizations as needed and developed. The ability to easily specify and control facial conformation (those facial aspects which make each face unique) is desirable. Orthogonality between expression control and conformation control is essential.
Universal models An ultimate goal is the development of models which quickly and accurately provide representations for any face with any expression. This goal should address the extent to which artificially generated images can be substituted for photographic images.
Accessible models These models need to be readily available (in the public domain?) and supported on common platforms - PCs and workstations.
Database organization A database of images needs answers to such questions as: How does one construct and administer large multimedia databases? What is the relationship between performance and database organization of the multimedia database? How do you distribute images and other fields efficiently and quickly via network or storage device? How do you search for key properties of a visual image?
Database security issues As discussed above, security of databases can potentially be a huge problem. Some method of managing personal information must be established that provides different levels of protection. The Human Subjects protocol used by universities could serve as a model.
NOTE: This report was assembled by D. Ballard from his own contribution and others from F. Parke, J. Hager, L. Sadler, P. Stucki, D. Terzopoulos, T. Sejnowski, and L. Williams, and was edited by T. Sejnowski.