Note: All the information presented below was published in Ferreira et al. (2012).
Curhan and Pentland (2007) present a set of variables, computed from the interlocutors' speech signals, that predict the outcomes of negotiations from their conversational dynamics. Motivated by the results presented by Kapoor and Picard (2005), Pentland proposes applying the same set of variables to motion and posture analysis (Pentland, 2006). We decided to compute two of these variables, Activity and Emphasis, from the first 60 seconds of video recordings of user interaction in order to predict the user's experienced task difficulty.
The main proposal is to extract features using simple and computationally inexpensive video processing techniques. As a first approach, we extracted each participant's motion by computing the difference between consecutive frames.
Video of the user's interaction and the frame difference processing result
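The frame difference step can be sketched as follows. This is only an illustrative sketch, not the processing pipeline used in the study: it assumes grayscale frames stored as nested lists of pixel intensities and summarizes each pair of consecutive frames by its mean absolute pixel difference. The function name `frame_difference_signal` and the toy frames are hypothetical.

```python
def frame_difference_signal(frames):
    """Return one motion value per pair of consecutive frames.

    Assumption: each frame is a list of rows of grayscale intensities,
    and motion is summarized as the mean absolute pixel difference.
    """
    signal = []
    for prev, curr in zip(frames, frames[1:]):
        total = 0
        count = 0
        for row_prev, row_curr in zip(prev, curr):
            for a, b in zip(row_prev, row_curr):
                total += abs(a - b)
                count += 1
        signal.append(total / count)
    return signal


# Toy 2x2 frames (made up for illustration): no change, then some change.
frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
    [[10, 0], [0, 30]],
]
motion = frame_difference_signal(frames)
```

With real video, the frames would come from a video decoding library, and the per-pixel differences would usually be computed with array operations rather than Python loops.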
The frame difference signal was used to compute the two variables introduced above:
Activity - measures the participant's activity level. This variable is the ratio of the number of motion frames to the total number of frames in the interaction time.
Emphasis - measures the variation of the motion's energy and frequency. This variable is the sum of the Fourier transform applied to the frame difference signal and the signal's standard deviation. For this initial approach, only the first 60 seconds of each video recording were used.
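A minimal sketch of how the two variables might be computed from a frame difference signal is given below. Several details are assumptions that the text above does not specify: a frame counts as a "motion frame" when its difference value exceeds a hypothetical `threshold`, the Fourier term is taken as the summed magnitude of the non-constant frequency components divided by the signal length, and the population standard deviation is used. The exact formulas in the published work may differ.

```python
import cmath
import statistics


def activity(signal, threshold):
    """Fraction of frames whose motion value exceeds `threshold`.

    Assumption: `threshold` is a hypothetical parameter defining a "motion frame".
    """
    motion_frames = sum(1 for value in signal if value > threshold)
    return motion_frames / len(signal)


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def emphasis(signal):
    """Spectral term plus standard deviation of the signal.

    Assumptions: the constant (k = 0) component is skipped and the spectral
    sum is normalized by the signal length; both choices are illustrative.
    """
    spectrum = dft_magnitudes(signal)
    spectral_term = sum(spectrum[1:]) / len(signal)
    return spectral_term + statistics.pstdev(signal)


# Toy signal (made up for illustration): alternating still and moving frames.
toy_signal = [0.0, 10.0, 0.0, 10.0]
toy_activity = activity(toy_signal, threshold=1.0)
toy_emphasis = emphasis(toy_signal)
```

A perfectly constant signal yields an Emphasis of approximately zero under these assumptions, while a signal that alternates between stillness and motion yields a larger value, which matches the intuition that Emphasis reflects irregularity.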
Then, we developed a classification model using the two computed variables: Activity and Emphasis.
Hypothesis 1: Activity is correlated with experienced difficulty.
Hypothesis 2: Emphasis is correlated with experienced difficulty.
These two variables, which measure different features of the signal, were found to be correlated with the experienced difficulty (Table 1). As such, both hypotheses were confirmed: Activity is negatively correlated, and Emphasis is positively correlated, with the experienced difficulty.
In summary, as task difficulty increases, motion tends to decrease (lower Activity) and become more irregular (higher Emphasis).
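The kind of correlation check behind the two hypotheses can be sketched with a Pearson coefficient. This is an assumption made for illustration: the text does not state which correlation measure was used. The helper `pearson` and the numbers below are made up and are not the study's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-participant values, for illustration only.
difficulty = [1, 2, 3, 4, 5]
activity_values = [0.9, 0.7, 0.6, 0.4, 0.2]
emphasis_values = [1.0, 1.5, 2.5, 3.0, 4.5]

r_activity = pearson(activity_values, difficulty)
r_emphasis = pearson(emphasis_values, difficulty)
```

In this toy data the sign of `r_activity` is negative and the sign of `r_emphasis` is positive, mirroring the direction of the reported relationships; the magnitudes carry no meaning.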
Curhan, J., and Pentland, A., 2007. "Thin slices of negotiation: predicting outcomes from conversational dynamics within the first five minutes". Journal of Applied Psychology 92, 3, 802-811.
Ferreira, J.P., Noronha e Sousa, M., Branco, N., Ferreira, M.J., Otero, N., Zagalo, N., and Branco, P., 2012. "Thin Slices of Interaction: Predicting Users' Task Difficulty within 60 sec.". Proc. AltCHI'12, ACM (in press).
Kapoor, A., and Picard, R. W., 2005. "Multimodal affect recognition in learning environments". Proc. 13th annual ACM international conference on Multimedia, ACM, 677-682.
Pentland, A., 2006. "A Computational Model of Social Signaling". Proc. ICPR'06, IEEE.