Viktor Gunnarsson: Binaural Modeling for High-Fidelity Spatial Audio
- Date: 13 September 2024, 13:15
- Location: Polhemsalen, Ångströmlaboratoriet, Lägerhyddsvägen 1, Uppsala
- Type: Thesis defence
- Thesis author: Viktor Gunnarsson
- External reviewer: Jens Ahrens
- Supervisors: Mikael Sternad, Lars-Johan Brännmark, Anders Ahlén
- Research subject: Electrical Engineering with specialization in Signal Processing
- DiVA
Abstract
The enjoyment of reproduced sound and music is a prime pleasure for many, and the high-fidelity reproduction of binaural audio is integral to many applications in augmented and virtual reality. This thesis introduces a framework for binaural headphone auralization of sound systems, together with an in-depth analysis and proposed solutions to address sources of coloration within the signal chain.
The framework includes a novel method for binaural auralization of microphone array impulse responses. Employing a hybrid parametric approach, it uses causal multichannel Wiener filtering to synthesize the directional response of the ear, as described by head-related transfer functions (HRTFs), using the microphone array and a model of its acoustic properties. A time-domain polynomial matrix framework is employed for filter computations, and direct and reflected sound are treated separately. Results demonstrate a small perceptual difference from measured reference binaural room impulse responses.
Additionally, the thesis addresses the impact of binaural measurement uncertainty and proposes a new measurement technique for HRTFs and headphone transfer functions (HpTFs). The method is based on a cardioid microphone array for open ear canal measurements. Results indicate that the method significantly reduces measurement uncertainty compared to omnidirectional measurements in the ear canal.
Moreover, a phase pre-processing method for HRTFs is introduced that reduces spatial phase variability of the HRTF set at high frequencies while retaining correct interaural coherence for diffuse sound. It is demonstrated that the HRTF phase pre-processing greatly reduces spectral coloration in headphone simulation of amplitude panning on virtual speakers. The method also improves performance in binaural rendering of microphone array recordings.
Finally, the thesis presents a comprehensive model for addressing coloration at the ear-signal level inherent in amplitude panning on speaker arrays. The analysis focuses on pairwise panning on symmetrical speaker setups, and monaural correction filters are proposed that are robust to head movements around the sweet spot. The proposed filters are found to mitigate the phantom source elevation effect in stereophonic panning and enhance the perceived spectral similarity between discrete and panned sound sources, with effectiveness contingent on the speaker setup geometry.
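As a rough illustration of why pairwise amplitude panning can color the ear signals, the sketch below sums the contributions of two symmetric speakers at one ear under strongly simplified assumptions: a free-field, far-field model with no head shadowing, a constant-power pan law, and a fixed head radius. None of this is taken from the thesis; the function names, the pan law, and all parameter values are hypothetical choices made for the example.

```python
import numpy as np

def pan_gains(pan):
    """Constant-power gains for pan in [-1, 1] (-1 = left, +1 = right).

    Assumed pan law for illustration only.
    """
    angle = (pan + 1.0) * np.pi / 4.0
    return np.cos(angle), np.sin(angle)

def inter_speaker_delay(half_angle_deg=30.0, head_radius_m=0.0875, c=343.0):
    """Arrival-time difference of the two speakers at one ear (far field).

    The ear is offset by head_radius_m from the head centre along the
    interaural axis; head shadowing is ignored (assumption).
    """
    return 2.0 * head_radius_m * np.sin(np.radians(half_angle_deg)) / c

def ear_magnitude_db(freqs_hz, pan=0.0, **geometry):
    """Magnitude response at the left ear for a pairwise-panned source."""
    g_left, g_right = pan_gains(pan)
    delay = inter_speaker_delay(**geometry)
    # The left speaker arrives first; the right speaker is delayed.
    response = g_left + g_right * np.exp(-2j * np.pi * freqs_hz * delay)
    return 20.0 * np.log10(np.maximum(np.abs(response), 1e-12))

def first_notch_hz(**geometry):
    """Lowest frequency where equal-gain contributions cancel."""
    return 1.0 / (2.0 * inter_speaker_delay(**geometry))

if __name__ == "__main__":
    freqs = np.array([100.0, 1000.0, first_notch_hz(), 4000.0])
    print("centre-panned:", np.round(ear_magnitude_db(freqs, pan=0.0), 1))
    print("hard left:    ", np.round(ear_magnitude_db(freqs, pan=-1.0), 1))
```

In this toy model a hard-panned (discrete) source gives a flat response, while a centre-panned phantom source shows a comb-filter dip. That spectral difference between discrete and panned sources is the kind the correction filters are meant to reduce. A real head partly shadows the far speaker, so measured dips are shallower than this model predicts.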