Marc Fraile: Computer Vision and Explainability in Human-Human and Human-Robot Interaction
- Date: 22 November 2024, 9:15 a.m.
- Location: Polhemsalen, Ångströmlaboratoriet, Lägerhyddsvägen 1, Uppsala
- Type: Doctoral defence
- Respondent: Marc Fraile
- Opponent: Gualtiero Volpe
- Supervisors: Ginevra Castellano, Joakim Lindblad, Gustaf Gredebäck, Nataša Sladoje
- DiVA
Abstract
Guided play is a natural part of learning during childhood and has been widely employed to study aspects of social interaction in child development. Such analysis can aid the early detection of developmental issues in infants, while in later childhood similar techniques can enrich our knowledge of child-child interaction and potentially be applied to child-robot interaction. This doctoral thesis contributes to the growing field of automatic social signal analysis by exploring the application of modern end-to-end, Deep Learning-based Computer Vision approaches to the detection of engagement-related states. It further explores the use of explainable AI (XAI) as a knowledge distillation tool, targeting both experts (which XAI techniques best help researchers understand a model's decision-making process?) and novices (how can explanation techniques help disclose information to the end user?). In the four included papers, my co-authors and I contribute a new child-child interaction dataset controlled for the level of rapport; show that feature-based methods outperform end-to-end training for rapport detection in our dataset; and show that end-to-end training succeeds on a very small infant engagement dataset, even where feature extraction methods fail. We further show that explanation methods can enhance user trust in a socially assistive robot, and that judging the human-likeness of attention-mapping techniques provides a quantifiable comparison method that favours the same traits identified as desirable in the literature: distribution locality and class sensitivity.