A. Reliability Issues
The fundamental issue was whether independent persons would agree in their scoring of facial behavior. More specifically, we asked whether persons who learned FACS without instruction from the developers would agree, both among themselves and with the developers.
Scoring of facial behavior requires two different operations - description and location - and thus raises two different reliability issues. By description, we mean what happened: which Action Units are responsible for an observed change in facial behavior. By location, we mean when it happened: at precisely what moment in time whatever happened started and stopped. Suppose the brows have moved. To describe the movement we would ask which type of movement it was: did the brows rise, lower, or rise and draw together; did just the inner part rise, or the entire brow; etc.? To locate the movement we must determine at what video frame (1/60 second) the movement, whatever it is, started and at what video frame it ended. The two questions are independent to some extent. Reliability could be high on description but low on location, or vice versa.
Our emphasis has been upon description. We believed that if we could succeed in achieving a reliable descriptive system for distinguishing what happens, it would be likely that there would also be reliability in spotting when it happens. The Manual teaches description; there is no instruction on how to solve location problems, although this is considered in Chapter 11 of the Guide. Most of this chapter is about description reliability. Section M. reports preliminary results on location reliability.
For either description or location, reliability can be evaluated in two terms:
(1) agreement among independent persons;
(2) agreement between a learner and an expert.
We were interested not only in whether there was intercoder agreement, but also in whether those who learned FACS without instruction from us would score facial behavior the way we do. Data are reported for both types of agreement. The results were about the same.
Description of facial movement with FACS involves four operations and the reliability of each can be studied:
(1) Determining which AUs are responsible for the observed movement. The coder learns how to recognize the appearance changes due to each of 44 AUs, singly and in combination. The logic of the system is that any movement can be scored in terms of which AUs produced it. Theoretically, a single facial movement can be produced by as few as one AU or by as many as about 20 AUs acting in combination. (All 44 cannot combine, since some involve antagonistic actions, and the occurrence of some actions conceals the possible presence of others.) Most of this chapter is focused upon the reliability of this operation.
(2) Scoring the intensity of action for five of the 44 AUs. While intensity scoring could have been provided for each and every one of the 44 AUs, we have used intensity scoring only where we thought the magnitude of action could influence the recognition of a particular Action Unit or a related action. Intensity is scored in terms of three levels: low, medium and high. Reliability of intensity scoring is reported separately in section J.
(3) Determining whether any AU is shown on only one side of the face rather than bilaterally. Asymmetries, where an AU is bilateral but of different intensity on the two sides of the face, are not scored. While unilaterality was included in the reliability study, it was so rare that no evidence is provided about the reliability of this operation.
(4) Scoring the position of the head and the position of the eyes during a facial movement. This descriptive system is grosser than that provided for the AUs. Fourteen descriptors are provided, of which up to six can be scored for any event. Because head/eye scoring is a simpler system, agreement on it might have inflated agreement measures on the total scoring of a face. Results are therefore reported separately including and excluding head/eye position scores. In fact, it made little difference.
The last issue considered is whether agreement is substantially improved by having the independent coders arbitrate their disagreements. The agreement achieved by six independent coders (intercoder agreement and agreement with experts) is contrasted with agreement achieved by three pairs of arbitrated scores in section I. Arbitration improved agreement, but not by much.
B. The Behavior Sample
We selected behavior samples from 10 of the honest-deceptive interviews we have been studying for the past eight years (Ekman & Friesen, 1974; Ekman, Friesen & Scherer, 1976). We selected the first two actions shown by the subject while she conversed about her reactions to a film she was watching, and the first two actions shown while the interview continued after the film was over. In order to increase the variety of behavior which would be subject to scoring, if the first two actions repeated an AU or AU combination already selected more than once, the next non-redundant action was taken instead. By these means a total of 40 items was obtained. Six were dropped because the video picture was not acceptable, leaving 34 items.
Coders were given the videotape with the instruction to score whatever occurred within each of the 34 events. Note that by defining each event ahead of time, giving the coders the start and stop frame within which they should score, we eliminated decisions about location and studied just description reliability.
C. The Coders
Seven persons previously unfamiliar with FACS learned FACS as a group in January-February 1977. We had minimal contact with them during this period, so that their performance can be considered a fair test of whether FACS produces reliable scoring when learned without instruction from the developers. Working about half-time, they took five weeks to complete the FACS instructional procedure. The results reported are based on six persons, since one coder did not continue.
These six coders included five women and one man. Two were research assistants with bachelor's-level educations. Two were doctoral candidates, one in psychology and one in linguistics. Another was a post-doctoral fellow trained in developmental psychology. The last was a visiting associate professor of clinical psychology whose native language is German.
The six coders independently scored the 34 events without any communication among them. After their scoring was completed the six were grouped into three pairs and given their scores on any event where they disagreed. They were required to jointly arrive at an arbitrated final scoring.
We, Ekman and Friesen, jointly scored each of the 34 events. We then examined the scoring of the six learners, and considered whether we would want to change our scoring in light of their performance. We did so only a few times, and those decisions did not increase agreement between them and us.
E. Raw Data Matrix
Thirty-four events were scored by six independent persons, producing 6 x 34 = 204 sets of Action Unit scores. Additionally, there are the three arbitrated pair scores for each event.
Table 2-1: Example of Raw Scores on One Behavioral Event
Experts:            1+4+7
Blossom:            1+4+7
Kathy:              1+4+6
Charlotte:          4+5X+7+10X
Linda:              1+4+7
Sonia:              4+7
Rainer:             4+7
Arbitrated Bl-Ka:   1+4+7
Arbitrated Ch-Li:   1+4+7+10X
Combined So-Ra:     4+7
The first row is the scoring of Ekman and Friesen. The next six rows show the scoring of this event by each of the six persons. The next three rows show the arbitrated scoring of the three pairings. (Note that Sonia and Rainer agreed on this event so they did not arbitrate.) The entries are the numbers for the AUs, which is the system used to record scores. The experts scored three AUs - 1, 4 and 7 - which describe a raising of the inner corners of the brow (1), pulling the brows together (4) and tightening of the eyelids (7). There was agreement among all coders that AU 4 was present. Some did not score AU 1. One coder scored an outer eyelid action (6) rather than the inner eyelid action of 7. One coder also scored an upper eyelid raise (5X, the X meaning she scored it as low in intensity) and a low level upper lip raise (10X).
F. An Index of Agreement
It was not obvious what type of measure of agreement should be employed. Reliability measures often are applied to situations where scoring involves a binary decision (present or absent) or assignment into one of a series of mutually exclusive categories. In FACS, anywhere from 1 to about 26 scores (about 20 AUs and 6 head/eye descriptors) could be assigned to any one event. There are many more opportunities for disagreement than is usually the case in psychological measurement.
We could have assessed reliability for each AU separately, determining how many times the six persons agreed about its presence or absence over the 34 items. This method, often used in reliability studies, would give as much credit to agreement that an AU was absent from an event as to agreement that it was present. Such a method would have produced reliability scores much higher than the procedure we selected.
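For contrast, the following is a minimal sketch (in Python; the data structures and names are illustrative, not from this study) of such a per-AU presence/absence tally. Because most of the 44 AUs are absent from most events, shared absences dominate the count and push the figures upward.

```python
def per_au_agreement(coder1_events, coder2_events, all_aus=range(1, 45)):
    """coder*_events: one set of scored AU numbers per event.
    Returns, for each AU, the proportion of events on which the two coders
    agree about whether it was scored (counting shared absences as agreements)."""
    n_events = len(coder1_events)
    return {au: sum((au in e1) == (au in e2)
                    for e1, e2 in zip(coder1_events, coder2_events)) / n_events
            for au in all_aus}
```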
The index of agreement that we employed (Wexler, 1972) was a ratio calculated separately for each of the 34 events, for each pair of coders and for each coder compared to the expert scoring. The arbitrated scoring was also evaluated with the same index. The formula was:
    (Number of AUs on which Coder 1 and Coder 2 agreed) x 2
    --------------------------------------------------------
    Total number of AUs scored by the two coders
For example, if the scoring by one coder was 1+5+7+22 and the scoring by a second coder was 1+7+16, the ratio would be:
    2 (number of AUs agreed upon) x 2 = 4, divided by
    7 (total number of AUs scored by the two coders) = 4/7 = .57
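To make the arithmetic concrete, here is a minimal sketch of the ratio in Python. Representing a score as a set of AU numbers, and the function name itself, are illustrative conveniences, not part of FACS; only the computation follows the formula above.

```python
def agreement_ratio(coder1_aus, coder2_aus):
    """Agreement ratio (Wexler, 1972): twice the number of AUs scored by
    both coders, divided by the total number of AUs scored by the two."""
    agreed = len(set(coder1_aus) & set(coder2_aus))
    total = len(coder1_aus) + len(coder2_aus)
    # If neither coder scored anything, treat the event as perfect agreement.
    return (2 * agreed) / total if total else 1.0

# The worked example above: 1+5+7+22 versus 1+7+16.
print(round(agreement_ratio({1, 5, 7, 22}, {1, 7, 16}), 2))   # 0.57

# The event in Table 2-1: experts (1+4+7) versus Charlotte (4+5X+7+10X),
# ignoring intensity: 2 agreements, 7 AUs scored in all, ratio .571.
print(round(agreement_ratio({1, 4, 7}, {4, 5, 7, 10}), 3))    # 0.571
```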
Table 2-2 shows the matrix of ratios generated with this formula for the raw data shown in Table 2-1.
Table 2-2: Matrix of Agreement Ratios for the Scoring of One Behavioral Event
Single Person Scoring

             Experts   Blossom   Kathy    Charlot   Linda    Sonia
Blossom      1.000
Kathy        0.667     0.667
Charlot      0.571     0.571     0.286
Linda        1.000     1.000     0.667    0.571
Sonia        0.800     0.800     0.400    0.667     0.800
Rainer       0.800     0.800     0.400    0.667     0.800    1.000

Arbitrated Pairs Scoring

             Experts   Bl-Ka     Ch-Li
Bl-Ka        1.000
Ch-Li        0.857     0.857
So-Ra        0.800     0.800     0.667
The top part of Table 2-2 gives the ratios calculated for the scoring of each individual person. The bottom part of the table gives the ratios when the scoring reached through arbitration by a pair of persons was evaluated. We will use the top part of the table to illustrate how the ratio represents agreement. The first column shows the ratio when each coder's scoring was entered into the formula with the scoring of the experts. Perfect agreement (Blossom, Linda) generated 1.00 ratios. Disagreements generated lower ratios. The other columns in Table 2-2 show agreement between each pair of coders. One can see that Sonia and Rainer agreed exactly as did Linda and Blossom. The maximum disagreement was between Kathy and Charlotte.
The mathematics of this formula are such that if only one or two AUs are scored for an event, a disagreement will lower the ratio more than if six or seven are scored. If two coders disagreed about only one AU and agreed about one AU, they would earn a ratio of .50. If they disagreed about one AU and agreed about four AUs, the ratio would be .80. Even though the disagreements in both instances involve only one score, it seems reasonable that the formula rewards agreement on a high proportion of the actions which are present.
We checked on how many AUs were scored for each of the 34 events by the experts. The mode was three scores for an event, with about 1/3 of the 34 events having one or two scores, and 1/3 having four to seven scores. Thus, if the absolute number of scores distorted the ratio of agreement, the 34 events produced a balanced distribution in this regard.
Two matrices were generated. One matrix is composed of the ratios derived by comparing each person's scoring of each event with the experts' scoring, generating 204 data points (6 persons times 34 events). The second matrix disregarded the experts' scoring, and calculated the ratio by comparing each person's scoring with each other person's. With six persons, five such ratios were generated for each person (comparing that person with every other person) for each of the 34 events scored. The mean of those five ratios was taken as the measure of a particular person's average agreement with others for a particular event. This yielded a second matrix which again had 204 points, each point representing the mean ratio of agreement with the other persons for one event (34 events times 6 persons).
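A hedged sketch of how these two matrices could be assembled follows, restating the ratio from section F in compact form. The data layout (a dictionary mapping each coder's name to a list of 34 sets of AUs) is an assumption made for illustration, not the actual record format used in the study.

```python
def agreement_ratio(a, b):   # as in the sketch in section F
    return 2 * len(set(a) & set(b)) / (len(a) + len(b)) if (a or b) else 1.0

CODERS = ["Blossom", "Kathy", "Charlotte", "Linda", "Sonia", "Rainer"]

def expert_matrix(scores, expert_scores):
    """6 x 34 matrix: each coder's ratio of agreement with the experts on each event."""
    return {c: [agreement_ratio(scores[c][e], expert_scores[e])
                for e in range(len(expert_scores))]
            for c in CODERS}

def intercoder_matrix(scores, n_events=34):
    """6 x 34 matrix: for each coder and event, the mean of that coder's
    five ratios of agreement with the other coders."""
    matrix = {}
    for c in CODERS:
        others = [o for o in CODERS if o != c]
        matrix[c] = [sum(agreement_ratio(scores[c][e], scores[o][e])
                         for o in others) / len(others)
                     for e in range(n_events)]
    return matrix
```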
G. Overall Agreement
The mean ratio across all coders (six) and all events scored (34) was .822 when scoring was compared to experts, and .756 when intercoder agreement was evaluated. Figure 2-1 shows the distribution of the 204 ratios represented by each of these means; both distributions were skewed towards high agreement. For example, 141 out of 204 ratios of agreement with the experts were .80 or above, and only 28 out of the 204 ratios were below .60. The distribution of ratios representing intercoder agreement was similarly skewed towards agreement, with just as few low-value ratios but not as many ratios above .80 as when agreement with experts was calculated.
H. Did Scoring Head/Eye Position Inflate Reliability?
The answer is no. Recall that the measurement of head/eye position was a grosser descriptive scheme than that of the Action Units. Agreement on what might be an easier set of decisions might have inflated the agreement ratios, concealing disagreements about the scoring of AUs. When head/eye position scores were disregarded and the ratios recalculated, the mean ratio across all coders and all events was .816 (as compared to .822 including head/eye) when scoring was compared against experts, and .745 (as compared to .756 including head/eye) for intercoder agreement. The distributions were examined and they are not noticeably different from those shown in Figure 2-1. Results reported hereafter include the head/eye position scores.
I. Does Arbitrating Differences Enhance Agreement?
The answer is: slightly, and it depends upon how much the coders disagreed and how low their individual agreement was prior to arbitration. Presenting the coders with their disagreements and asking them to arbitrate their differences could have produced lower rather than higher agreement; each pair, after arbitrating, might diverge more from the other pairs or from the experts. Instead there was a slight increase in both agreement measures.
The mean ratio across all coders and all events went up from .822 to .863 in terms of agreement with experts, and from .756 to .809 in terms of intercoder agreement. Table 2-3 shows that the benefit was negligible for the pair who had high agreement individually (Charlotte and Linda), moderate for a pair somewhat lower individually (Blossom and Kathy), and considerable for the pair where one member (Rainer) had the lowest coefficient of agreement. His gain through arbitration, however, was at the cost of a loss for the person he arbitrated with (Sonia).
Table 2-3: Benefits of Having Coders Arbitrate Their Disagreements in Pairs
Mean Ratios of Agreement with Experts

              Individual Scoring    Arbitrated Pairs
Blossom            .782
Kathy              .827                   .869
Charlotte          .859
Linda              .858                   .886
Sonia              .973
Rainer             .732                   .833
When the same comparisons were made utilizing the measures of intercoder agreement (rather than agreement with experts as shown in Table 2-3), the values were two to three hundredths lower but the pattern was the same. For example, the mean ratio of intercoder agreement for Sonia and Rainer's arbitrated scoring was .802 as compared to .833 for agreement with experts.
Two other methods for reconciling disagreements were explored. In one, a simple flip of the coin was used to determine who was "correct" on each disagreement. Using the coin flip as the basis for saying what the final score should be for items where a pair disagreed yielded ratios of agreement with the experts that were just as high as arbitration for the coder pairs who had not disagreed much to begin with (Blossom and Kathy; Charlotte and Linda). For the pair which included the one coder who had shown the lowest agreement with the experts (Rainer) a coin flip did not yield as much increased agreement as did arbitration. Another method for resolving disagreements was to apply a set of logical rules to determine who was "correct" for any events where a pair disagreed. These rules benefitted the pair who most disagreed (Sonia and Rainer) as much as did arbitration.
J. Agreement About Intensity
The data analysis so far has ignored any disagreements about intensity. Such disagreement could have occurred on the scoring of only five of the 44 AUs, since FACS provided for intensity scoring on just those few AUs. There were 19 instances in which the experts had scored one of the intensity-AUs, providing 114 opportunities for agreement (6 coders times 19 instances).
Exact agreement about intensity was reached on 55% of these scores. Recall that intensity involved a three-point scale. There were no two-point disagreements; about half the disagreements were one-point disparities, and the other half occurred when one person entirely missed scoring an intensity-AU that had been scored by the experts at the low intensity level.
The scoring of the pairs of persons who had disagreed on intensity were subject to arbitration. Arbitration enhanced agreement with experts. Exact agreement about intensity rose to 74%.
Recall that the data reported in section G., H. and I. had disregarded disparities in intensity scores. The agreement ratios for each of the six coders compared to the experts' scoring were recalculated considering a disagreement about intensity as a total disagreement. The mean ratio across all six persons and all 34 events was .778 when a difference on intensity was considered an error, as compared to .822 when intensity disagreement was ignored. Of course the reason why the ratio of agreement did not decrease further was that there were not that many instances where intensity could be scored. In another behavioral sample, in which there was a preponderance of behavior involving AUs where intensity could be scored, the ratios of agreement might be lower.
K. Representativeness of the Behavior Sample
The scores were tabulated for each AU across all coders and all events to provide a picture of the extent to which the behavior sample offered opportunity for testing the reliability of all the AUs. For this tabulation we considered not only whether an AU was scored, but also whether an AU was considered, even if not scored, during the coders' step-by-step scoring procedure. (Such information is readily retrieved from the scoring sheets on which the coders recorded every AU considered.)
Twenty-five out of the 44 AUs were scored or considered many times; the other 19 AUs were each scored or considered fewer than ten times. These 19 AUs are probably rare occurrences in most conversations between adults; for example, sticking out the tongue, tightening the platysma muscle, sucking the lips in to cover the teeth, puffing out the cheeks, etc. While we cannot generalize from this study to the reliability which might be obtained if the behavior scored included such actions, there is no reason to suspect that reliability would be lower. Quite the contrary, the classification of many of these infrequent AUs probably involves an easier set of discriminations than is required for the AUs which were often considered and scored in this study.
L. Errors in Scoring Particular Action Units
The purpose of this examination was to determine (a) if there were more errors in the scoring of some AUs than others, (b) if frequent errors were primarily failing to score an AU versus substituting another AU for the correct one, and (c) if frequent errors were the product of only one or two persons or the entire group of coders. The number of times each AU was not scored by a coder but was scored by the experts was tallied, as was the number of times each AU was scored by a coder when it was not scored by the experts. Errors were fairly evenly distributed across the entire group of AUs, with no pattern as to the type of error or who made the mistakes, with the exception of five AUs. These five AUs (out of 44) accounted for a third of all errors.
Two of these high error AUs involve the muscles orbiting the eyes. In agreement with anatomists, FACS divides orbicularis oculi into two Action Units, one referring to the involvement of the pars palpebralis (AU 7) and the other to the pars orbitalis (AU 6) (the inner and outer portions, respectively). The errors involving these two AUs were for the most part substitutions of one for the other. We are doubtful that it will be possible to decrease errors on this discrimination, since the distinction is frequently a subtle one.
Two of the high error AUs involve muscles which reach down from the upper portion of the face to raise the upper lip. In agreement with anatomists, FACS distinguishes between Levator Labii Superioris Caput Infraorbitalis (AU 10) and Levator Labii Superioris Alaeque Nasi (AU 9). Here again, most of the errors involved substituting the scoring of one for the other. Since this also can be a subtle distinction, additional instruction or training would probably not decrease errors substantially.
The last high error AU involves a muscle or muscle group which stretches the lips horizontally. Anatomists disagree about whether this action (AU 20) is due to Risorius, Buccinator, or some strands of Platysma. Most of the errors involved a total failure to score the AU (when it was scored by experts) rather than a substitution. With this AU, further instruction or training might be beneficial. Those using FACS should be attentive to providing more practice on AU 20 and monitoring reliability on this action.
Errors were found to be distributed across the six persons rather than made disproportionately by any one person. Note also that even for the high error AUs reported above, in the majority of instances the learners scored these AUs correctly (in agreement with experts).
M. Reliability in Location of Facial Action
In section A. we distinguished between two aspects of measuring facial action - description (what happened) and location (when something happened). The Manual deals with description and provides no instruction about location. In section B. we explained that the learners in their practice and in this test of reliability were not required to locate actions but only to describe them. We, the investigators, located a series of events for them to score the AUs responsible for the event.
Let us now consider the issue of location, the reliability of determining when an action occurs. Preliminary information is available from a dissertation by Sonia Ancoli (1978), one of the six people who had learned FACS. In Ancoli's experiment, subjects sat alone in a room and watched two films. One film showed scenes which other subjects had rated as causing pleasant feelings. The other film had been rated as producing feelings of disgust and fear. The subjects were monitored on EEG, heart rate, EMG on skeletal muscles, and respiration. In addition, a videotape was made of their faces. Ancoli scored all of the facial behavior shown by 35 subjects, a total of three minutes during a pleasant film and two minutes during the unpleasant film for each subject.
Reliability was evaluated at two points in the study. After the first ten subjects had been scored by Ancoli, she randomly selected the facial behavior during one of the two films for each subject. This sample was then scored by a second person (Linda Camras, another one of the people who had recently learned FACS). Later, a second sample was drawn, selecting a 30-second period from the video records of each of the 25 remaining subjects. Again, Camras scored the randomly selected sample.
Location, unlike FACS description, can be regarded as a binary decision - something is happening or not at each frame in time. The decision should be easy with a large facial movement or when the face is completely still. It should be difficult when there is a very small movement. FACS provides a set of minimum requirements for the amount of change which must occur before a movement can be scored. The most difficult decision, and the main opportunity for disagreement, is when there is a small movement and the person must evaluate whether it is sufficient to meet FACS requirements for scoring. If it does not, the coder treats it as no movement.
When occur versus no-occur decisions are made point-by-point in time, a common way to assess reliability is to determine for each point in time whether two independent persons agree. Agreement is then represented as a percent of the total time considered. Each 1/10 of a second was so examined. In sample 1, the two coders agreed (as to whether or not something was occurring) 89% of the time. In sample 2, the two coders agreed 95% of the time. This calculation gave equal credit to agreement that nothing happened and to agreement that something happened. If the sample contained long periods of time in which the face was inactive, this measure of location agreement would be inflated. In sample 1, the face was totally inactive or not scorable (action present but not meeting the Minimum Requirements demanded by FACS) 69 percent of the time; in sample 2, the face was inactive or not scorable 66 percent of the time.
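As a sketch, the point-by-point calculation might look like the following, assuming each coder's record is a list of values, one per 1/10-second interval, marking whether a scorable action is occurring; the data shown are hypothetical, not from Ancoli's study.

```python
def location_agreement(coder1_frames, coder2_frames):
    """Percent of 1/10-second points on which the two coders agree that an
    action is, or is not, occurring."""
    matches = sum(a == b for a, b in zip(coder1_frames, coder2_frames))
    return 100.0 * matches / len(coder1_frames)

# Hypothetical one-second stretch: both coders locate an action in the
# middle, but disagree about one boundary point.
coder1 = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
coder2 = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
print(location_agreement(coder1, coder2))   # 90.0
```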
There is of course quite a difference between agreement that nothing has occurred, and agreement that something has occurred but it is unscorable (does not meet the Minimum Requirements specified by FACS). Agreement about an unscorable action should represent the most difficult location decision. Since Ancoli's study we have added a new Action Descriptor to FACS for unscorable actions. Had it been available in Ancoli's study, it would have been possible to calculate the percent of time two coders agreed: (a) that a scorable action occurred; (b) that an unscorable action occurred; and (c) that no action occurred. Now that unscorable actions are included in the scoring procedure, we recommend that location agreement be examined in this way in future studies using FACS. It will then be possible to isolate disagreements where one person said the action was scorable and another called it unscorable, and instances where one person said the action was unscorable and the other recorded no action. In either case, additional instruction can be given to increase location reliability if a consistent pattern is found, consistent for a particular coder or a particular AU.
Another way to examine agreement about location, which avoids the problem of inflating the estimate by agreements on the absence of action, is to examine the occurrence of complete disagreements. The worst error in location is when one person scores an event which the other failed to score (either because they missed the event entirely or judged it as not reaching the minimum requirements dictated by FACS). In sample 1, such complete disagreement occurred with 18.4% of the behavior scored; in sample 2, such complete disagreement occurred with 12.9% of the behavior scored.
Location reliability can be studied in more detail by examining exactly how closely coders designated when an event began and when it ended. Table 2-4 shows that information.
Table 2-4: Percent of Total Events Located

            Agree on Beginning        Agree on End              Agree on Both Beginning & End
            within      within        within      within        within        within
            1/10 sec.   1/2 sec.      1/10 sec.   1/2 sec.      1 sec.        2 sec.
Sample 1    25.0        59.1          13.6        38.6          47.7          68.2
Sample 2    64.5        74.2          61.3        67.7          74.2          74.2
The percentage of agreement used the total number of events located by either coder (including events seen by only one) as the denominator. Agreement was higher for judgments of when an action began than for judgments of when it ended. Agreement was higher in sample 2 than in sample 1, perhaps because of experience. The last two columns in Table 2-4 show the percent of events where both persons agreed within one second and within two seconds on both the start and stop of an action. A high percent agreement was found in sample 2.
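The tolerance comparisons in Table 2-4 could be computed along the following lines. The pairing of events across the two coders, and the representation of an event as (start, stop) times in seconds, are assumptions for illustration; events located by only one coder simply count against agreement through the denominator.

```python
def within(t1, t2, tolerance):
    return abs(t1 - t2) <= tolerance

def location_tolerance_summary(paired_events, n_total_events):
    """paired_events: list of ((start1, stop1), (start2, stop2)) for events
    located by both coders.  n_total_events: all events located by either coder.
    Returns the percent agreeing on the beginning within 1/10 sec, on the end
    within 1/10 sec, and on both beginning and end within 1 sec."""
    begin = sum(within(a[0], b[0], 0.1) for a, b in paired_events)
    end = sum(within(a[1], b[1], 0.1) for a, b in paired_events)
    both = sum(within(a[0], b[0], 1.0) and within(a[1], b[1], 1.0)
               for a, b in paired_events)
    as_pct = lambda n: 100.0 * n / n_total_events
    return as_pct(begin), as_pct(end), as_pct(both)
```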
N. Another Look at Description Reliability
Ancoli's dissertation allows another opportunity to study the reliability of the FACS description. Table 2-5 reports the ratios of agreement (calculated as explained in section F.) for the two behavior samples.
Table 2-5: Description Reliability Ratios of Agreement
               Including events      Including events      Only events scored by both,
               scored by only one    scored by both        excluding those agreed not scorable
Sample 1            .722                  .878                  .815
Sample 2            .791                  .909                  .824
The first column shows the mean ratio when the events scored by only one coder were included in calculating the mean across all events. For those events scored by only one coder, the ratio was zero. Thus, the first column allows disagreement about location to lower the measure of agreement on description. The second column included in the calculation of the agreement ratios only events scored by both persons. These ratios include agreements that in certain instances there was no scorable facial action. That is, of course, an important type of agreement, but it is not the same as agreement about how to describe what is present. In the third column the ratios were calculated excluding items in which both coders agreed that the event was a no-score or neutral action. The figures in the third column are directly comparable to the ratios reported earlier for intercoder agreement among six persons, since in that reliability test no neutral events were included in the behavior sample and the ratios were not deflated by disagreements about location. Agreement remained about as it was for the learners described in section H.
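The three columns of Table 2-5 amount to three inclusion rules, which the following sketch expresses, again restating the ratio from section F in compact form. The representation (each coder's record as a dictionary mapping an event identifier to the set of AUs scored, with an empty set meaning the coder located the event but judged it not scorable) is an illustrative assumption.

```python
def agreement_ratio(a, b):   # as in the sketch in section F
    return 2 * len(set(a) & set(b)) / (len(a) + len(b)) if (a or b) else 1.0

def table_2_5_columns(coder1, coder2):
    located_by_either = set(coder1) | set(coder2)
    located_by_both = set(coder1) & set(coder2)

    # Column 1: events located by only one coder count as zero agreement.
    col1 = [agreement_ratio(coder1[e], coder2[e]) if e in located_by_both else 0.0
            for e in located_by_either]
    # Column 2: only events located by both, including agreed "not scorable".
    col2 = [agreement_ratio(coder1[e], coder2[e]) for e in located_by_both]
    # Column 3: as column 2, but excluding events both agreed were not scorable.
    col3 = [agreement_ratio(coder1[e], coder2[e]) for e in located_by_both
            if coder1[e] or coder2[e]]

    mean = lambda xs: sum(xs) / len(xs) if xs else None
    return mean(col1), mean(col2), mean(col3)
```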
Now that FACS provides an unscorable action descriptor, it is possible to analyze description reliability with one further refinement not shown in Table 2-5. Agreement ratios would be calculated for all events considered scorable or unscorable by one or the other of the coders, excluding from the ratio only agreements that no action had occurred. These ratios would give credit for any agreements that unscorable activity occurred. Since that represents a difficult decision, it seems sensible to include such agreement in at least one of the measures of description reliability.
O. Summary and Discussion
The description of facial action in terms of the Action Units responsible for an observed behavior change appears to be reliable. Importantly, high reliability was found for persons who learned FACS using self-instructional materials without tutoring by Ekman or Friesen. Reliability will, of course, vary with the type of facial behavior which is measured. In both sampling situations - behavior during conversation and while silently viewing films - reliability of FACS description was high.
One area where description reliability needs improvement is in the scoring of the intensity level of an Action Unit. Such intensity differentiations are allowed on only five of the 44 AUs. The intensity differentiation is limited to three levels. While there was never an instance in which there was a two-level disagreement, there was disagreement on almost half of the intensity scoring. With arbitrated scoring, agreement reached 74%. Note also that when disagreements on intensity were regarded as a total disagreement, the measure of agreement still remained high.
The reliability of locating a facial movement, i.e., designating its onset and offset, is encouraging, since the problem was not explicitly addressed in the FACS Manual and since learners were not given practice on locating facial actions. The agreement reached on location would be satisfactory for many studies, e.g., using facial behavior as a criterion measure to differentiate responders versus nonresponders to a treatment, comparing changes in facial behavior with change in heart rate or some other measure which is scored in 1/2 second units. Agreement on location needs to be improved to study such issues as the internal organization of facial actions, or to differentiate micro- from macroexpressions. Suggestions about how to measure location are discussed in Chapter 11 of this Guide.
Ancoli, S. Psychophysiological response patterns to emotions. Doctoral dissertation, University of California, San Francisco, 1978.
Ekman, P. & Friesen, W.V. Detecting deception from the body or face. Journal of Personality and Social Psychology, 1974, 29(3), 288-298.
Ekman, P., Friesen, W.V. & Scherer, K. Body movement and voice pitch in deceptive interaction. Semiotica, 1976, 16 (1), 23-27.
Wexler, D. Method for unitizing protocols of descriptions of emotional states. JSAS Catalog of Selected Documents in Psychology, 1972, 2, 116. American Psychological Association.