Technical Faculty

Multimodal Behavior Processing

Jun. Prof. Dr. Hanna Drimalla

The Virtual Interaction Task (VIT)

Drimalla, H., Scheffer, T., Landwehr, N., Baskow, I., Röpke, S., Behnia, B. and Dziobek, I. (submitted 2019) A Digital Tool to Identify Social Biomarkers (Validated in Individuals with Autism).

The Virtual Interaction Task (VIT) is a simulated social interaction designed as a “conversation” between the participant and a recording of an actress about food preferences and dinner preparation. The participant watches the video of the actress, who speaks about her experiences and poses simple questions to the participant. While the participant answers the questions, the actress smiles and nods as if she were listening to the answers. During the whole conversation, the participant’s non-verbal and verbal behaviour is video- and audio-recorded. The participant’s facial expressions, voice modulation and gaze behaviour are later analysed using computer-based technologies. Thus, the VIT allows one to objectively measure qualitative and quantitative differences in social behaviour with a high level of standardization in an interactive, naturalistic setting.

Theoretical Considerations

Recent research on social cognition has emphasised a strong need for more interactive social tasks, both in the laboratory and in clinical diagnosis. However, interactive paradigms are more costly, as they require a second participant, a confederate or a professional to interact with the participant. They are also very hard to standardize, as the participant’s counterpart must always behave and interact in the same way with everyone. The VIT balances naturalism and standardization by putting the participant in an interactive, naturalistic situation with a recorded counterpart.

Focusing on food preferences, the conversation in the VIT deals with a typical subject of small talk that involves an emotional component. Furthermore, the topic lends itself to three comparable parts of neutral, positive and negative valence. Finally, the additional assessment of the participant’s own food preferences via a post-questionnaire allows one to differentiate between their emotional empathic reaction to the actress and their own emotions regarding the topic.

New computer-based technologies are used to automatically and objectively analyse the participant’s facial expression, gaze behaviour and voice modulation during the conversation. This allows for a completely non-intrusive measurement of non-verbal behaviour without any additional equipment. Because a screen, a simple webcam and a microphone are all that is needed, the task can be conducted in the laboratory as well as at home. Clinical studies in particular may profit from not attaching equipment to the participant, as this reduces confounds in touch-averse patients. However, if additional physiological measures such as EMG, fMRI or EEG are of interest, the task can easily be adapted.

Stimuli and Design

The task is designed as a “dialogue” between the recording of a young woman and the participant. The woman speaks about food she likes and dislikes and how she sets a table for dinner. Following each section, she asks the participant about the same topic. After each question, the participant has about half a minute to answer while the actress nods and smiles towards them. The participants know that they are being recorded during the whole conversation and are asked to behave as they would in a real conversation. The conversation consists of seven parts, which are shown in table X in more detail. In the first part of the conversation, the actress explains the task, poses a sample question to the participant and appears to listen to their answer. This part of the conversation is not analysed. After the introduction, a neutral part follows in which the actress and the participant speak about how to set a table for dinner. A positive part about each person’s favourite food follows. In this section, the actress describes her favourite food (rucola pizza) and asks the participant for theirs. Last, the actress speaks about food she strongly dislikes (cold fish in aspic), and the participant elaborates on the food they dislike. After the conversation has ended, the participant uses a 5-point Likert scale to indicate how much they like the food mentioned by the actress.

Realization and Administration

The experimenter explained the task and left the room before the participant started the task on their own. The participant’s video was recorded automatically and timestamped so that it could later be aligned with the actress’s video (in the new version, the videos are already split into the different parts of the conversation and aligned automatically). The resulting data consist of a video for each emotional part (neutral, joy, disgust), a video of the participant listening to the actress and a video of the participant talking to the actress.
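
The timestamp-based alignment mentioned above can be illustrated with a minimal Python sketch. It is not the implementation used for the VIT; it only shows one plausible way to map timestamped participant frames onto the parts of the actress’s video. The names `Segment` and `label_frames`, the segment boundaries and the assumption that both timestamps come from the same clock are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Segment:
    """One part of the stimulus video (hypothetical structure)."""
    label: str      # e.g. "neutral", "joy", "disgust"
    start_s: float  # onset within the actress's video, in seconds
    end_s: float    # offset within the actress's video, in seconds


def label_frames(
    frame_timestamps: Sequence[float],
    stimulus_start_ts: float,
    segments: Sequence[Segment],
) -> List[Optional[str]]:
    """Assign each participant frame to the stimulus part it falls into.

    Assumes frame_timestamps and stimulus_start_ts share one clock
    (seconds), and that segments do not overlap. Frames outside every
    segment are labelled None.
    """
    ordered = sorted(segments, key=lambda s: s.start_s)
    starts = [s.start_s for s in ordered]
    labels: List[Optional[str]] = []
    for ts in frame_timestamps:
        t = ts - stimulus_start_ts        # time relative to stimulus onset
        i = bisect_right(starts, t) - 1   # last segment starting at or before t
        if i >= 0 and t < ordered[i].end_s:
            labels.append(ordered[i].label)
        else:
            labels.append(None)
    return labels
```

With per-frame labels of this kind, the participant’s recording could be cut into one clip per part before facial expression, gaze and voice are analysed.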
