Dialnet


Wavelet-based spatial audio framework

  • Author: Davide Scaini
  • Thesis supervisors: Ricardo Baeza Yates (supervisor), Daniel Arteaga Barriel (co-supervisor)
  • Defended: at the Universitat Pompeu Fabra (Spain) in 2019
  • Language: Spanish
  • Thesis committee: Lars Falck Villemoes (chair), Antonio Mateos Sole (secretary), Franz Potter (member)
  • Doctoral programme: Doctoral Programme in Information and Communication Technologies, Universitat Pompeu Fabra
  • Full text not available
  • Abstract
    • Ambisonics is a complete theory for spatial audio whose building blocks are the spherical harmonics.

      Some of the drawbacks of low-order Ambisonics, such as poor source directivity and a small sweet spot, are directly related to the properties of spherical harmonics.
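
      As a rough illustration of the directivity issue, the sketch below evaluates the axisymmetric panning function obtained by truncating the spherical harmonic expansion of a point source at a given order, which is proportional to the sum of (2n+1) P_n(cos θ). This assumes an idealized, perfectly regular layout with basic (mode-matching) decoding; the function names are hypothetical and not taken from the thesis.

```python
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) via the three-term recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def panning_gain(order, angle):
    """Normalized gain at `angle` (radians) away from the source direction."""
    x = math.cos(angle)
    total = sum((2 * n + 1) * legendre(n, x) for n in range(order + 1))
    return total / (order + 1) ** 2  # normalization makes the on-axis gain 1

# Compare how quickly the gain falls off for a low and a higher order.
for order in (1, 3):
    gains = [round(panning_gain(order, math.radians(a)), 3) for a in (0, 60, 120, 180)]
    print(order, gains)
```

      At first order the lobe is broad and the gain opposite the source is -0.5, that is, a strong out-of-phase contribution. Raising the order narrows the lobe, but only at the cost of more channels.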

      In this thesis we present a novel spatial audio framework, similar in spirit to Ambisonics, that replaces the spherical harmonics with an alternative set of functions with compact support: the spherical wavelets.

      We develop a complete audio chain from encoding to decoding, using discrete spherical wavelets built on a multiresolution mesh.

      We show how the wavelet family and the decoding matrices to loudspeakers can be generated via numerical optimization.

      In particular, we present a decoding algorithm optimizing acoustic and psychoacoustic parameters that can generate decoding matrices to irregular layouts for both Ambisonics and the new wavelet format.

      This audio workflow is directly compared with Ambisonics.

      After the introductory Chapter 1, the thesis est omnis divisa in partes tres: Part I, on Ambisonics; Part II, on wavelets and the wavelet spatial audio framework we designed; and Part III, which describes an incarnation of this wavelet framework as a spherical audio format and evaluates it against Ambisonics.

      Part I is composed of three chapters: the first summarizes the background information and the following two describe the original contributions. Chapter 2 briefly describes Ambisonics, giving the basis for encoding and decoding Higher Order Ambisonics. Chapter 3 is an original contribution to Ambisonics decoding for irregular speaker layouts. First, we describe the physical and psychoacoustic variables used to design the Ambisonics decoder. We then formulate the design as an optimization problem and define its cost function. Finally, we show the resulting performance of the decoder for a specific layout of speakers. In Chapter 4 we evaluate the performance of our decoder against some publicly available ones, both objectively and subjectively.

      Part II is composed of three chapters: the first summarizes the background information and the following two describe the original contributions. Chapter 5 is an introduction to wavelet theory, starting from a comparison with the Fourier transform and then fast-forwarding to multiresolution, the Lifting Scheme and a construction of spherical wavelets. Chapter 6 describes a method to generate spherical audio formats using wavelets built on a multiresolution mesh. Chapter 7 illustrates a numerical method to obtain spherical wavelet filters optimized for spatial audio purposes.
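
      To make the Lifting Scheme concrete, here is a minimal one-dimensional sketch (split, predict, update) using a linear interpolating predictor on a periodic signal of even length. It is a textbook-style example, not the spherical construction used in the thesis; the function names and the filter weights 1/2 and 1/4 are the usual choices for this simple case and are assumptions here.

```python
def forward_lift(signal):
    """One level of a linear-interpolating lifting transform (periodic, even length)."""
    even, odd = signal[0::2], signal[1::2]  # split
    n = len(even)
    # Predict: estimate each odd sample from its two even neighbours.
    detail = [odd[i] - 0.5 * (even[i] + even[(i + 1) % n]) for i in range(n)]
    # Update: correct the even samples so the coarse signal keeps the mean.
    coarse = [even[i] + 0.25 * (detail[i - 1] + detail[i]) for i in range(n)]
    return coarse, detail

def inverse_lift(coarse, detail):
    """Undo the lifting steps in reverse order, which gives perfect reconstruction."""
    n = len(coarse)
    even = [coarse[i] - 0.25 * (detail[i - 1] + detail[i]) for i in range(n)]
    odd = [detail[i] + 0.5 * (even[i] + even[(i + 1) % n]) for i in range(n)]
    out = [0.0] * (2 * n)
    out[0::2], out[1::2] = even, odd
    return out

signal = [1.0, 2.0, 4.0, 3.0, 0.0, -1.0, 2.0, 5.0]
coarse, detail = forward_lift(signal)
restored = inverse_lift(coarse, detail)
```

      Because each step is undone by simply changing its sign, invertibility holds for any predictor and update weights, which is what makes the scheme convenient for numerically optimized filters.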

      Part III, the last, is composed of three chapters. Chapter 8 evaluates different versions of spherical audio formats defined by different wavelet families. Chapter 9 compares the properties of this new format with those of Ambisonics for a reference layout of speakers. Chapter 10 draws the conclusions and sketches some directions for future work.

      In Appendix A we present a method to implement optimization problems in Python leveraging autodifferentiation. In Appendix B we describe the method used to reduce the dimensionality of the spherical wavelet optimization problem. Both Appendices are original contributions.
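
      The abstract does not say which automatic differentiation tool Appendix A relies on. Purely as an illustration of the idea, the sketch below implements forward-mode differentiation with dual numbers in plain Python and uses it for gradient descent on a toy cost. The class `Dual`, the functions `gradient` and `minimize`, and the cost itself are hypothetical and are not the thesis's implementation.

```python
class Dual:
    """Number val + der*eps with eps**2 == 0; `der` carries the derivative."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)
    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __sub__(self, other):
        o = self._lift(other)
        return Dual(self.val - o.val, self.der - o.der)
    def __rsub__(self, other):
        return self._lift(other) - self
    def __mul__(self, other):
        o = self._lift(other)
        # Product rule.
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def gradient(f, x):
    """Gradient of f at x, seeding one input direction at a time."""
    grads = []
    for i in range(len(x)):
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(x)]
        grads.append(f(args).der)
    return grads

def cost(g):
    # Toy cost (assumption): two gains should sum to 1 and carry equal energy.
    total = g[0] + g[1] - 1.0
    balance = g[0] * g[0] - g[1] * g[1]
    return total * total + balance * balance

def minimize(f, x, step=0.1, iters=500):
    """Plain gradient descent with a fixed step size."""
    for _ in range(iters):
        x = [v - step * d for v, d in zip(x, gradient(f, x))]
    return x

gains = minimize(cost, [0.9, 0.2])
```

      The point of the approach is that only the cost function has to be written by hand; the derivatives come for free, so changing the cost does not require re-deriving gradients.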

      This thesis arises from the realization that there is no absolute best spatial audio technique, and from the consequent question: "Is it possible to build a theory for spatial audio that is channel agnostic, homogeneous and coherent, but also has good localization with few channels, easily handles irregular layouts and holds well when moving out of the sweet spot? In other words, is it possible to build a theory that combines the best of channel-based and channel-agnostic worlds?" In this thesis we develop a new spatial audio codification, similar in spirit to Ambisonics but replacing the spherical harmonics (SH) with a different, more localized set of functions: the spherical wavelets.

      The goal is to get better localization and a larger sweet spot with few channels, in a format that can be easily decoded to irregular layouts.

      In this thesis we have defined a new generic framework for spatial audio encodings based on wavelet filters. We have described the complete audio workflow that makes use of this new tool. Then, we have particularized the framework to the spherical case for a specific mesh construction, resulting in a practical realization: the spherical wavelet format (SWF). Similarly to Ambisonics, this format is channel-agnostic. Unlike Ambisonics, in the case of SWF the signals that compose the format have a particular spatial localization. On the encoding side of the audio chain, we have devised a numerical method for wavelet optimization (with short filter length), enabling the creation of a possibly infinite set of core filters. On the decoding side of the audio chain, we have built and made publicly available a universal decoding method, based on the numerical optimization of some psychoacoustical observables.

      The objective of this thesis was to build a new channel-agnostic format that is homogeneous and coherent, but also has good localization with few channels, easily handles irregular layouts and holds well when moving out of the sweet spot. We have depicted the full audio chain: encoding to mesh and downsampling, transmission, upsampling and decoding to speakers. The new format is effectively channel agnostic: there is no reference to the destination layout in the definition of the format. The homogeneity and coherence characteristics need a more detailed discussion.

      When looking at the homogeneity of a particular format, we should distinguish between the homogeneity of the format itself and whether this homogeneity is retained during decoding. SWF is homogeneous in the specific implementation defined in the thesis, since the mesh over which the format is defined is in fact homogeneous. Nevertheless, it would be perfectly possible to define a wavelet format with a non-homogeneous mesh and obtain a non-homogeneous format. Ambisonics is homogeneous by construction, and the decoders for regular layouts are also (typically) homogeneous. However, when decoding to irregular layouts, neither Ambisonics nor SWF guarantees that the resulting decoding will be homogeneous. Actually, the best decoders for irregular layouts are not homogeneous, like the ones produced by IDHOA and presented in this thesis. In this case SWF and Ambisonics are no different. Coherence follows a similar train of thought: both SWF and Ambisonics can be coherent in special conditions. SWF does have good localization with few channels and, thanks to the extremely limited negative gains, holds well when moving out of the sweet spot, as an amplitude panner does. It has been demonstrated that SWF behaves well when decoding to layouts that are irregular (in the SWF sense), and with the help of IDHOA it is possible to generate meaningful decoders in a matter of minutes.

      We have explored three variations of a particular incarnation of this format. In all three cases the wavelets were defined over a spherical mesh, created from a primitive solid (an octahedron) using a Loop subdivision scheme. In the first variation, the wavelet family is implicitly defined by the VBAP panning rule. In the second variation, we use an off-the-shelf wavelet family, called interpolating wavelet. In the third variation, the wavelets were optimized numerically by a brute-force method. The three methods generate audio formats that have very similar characteristics in terms of energy and intensity reconstruction. The main differences lie in the shape of the panning functions and in the behaviour of the upsampling matrix, P. It is to be noted that the three examples explored do not necessarily represent the best possible realizations. One of the virtues of SWF is precisely the ability to adapt to many different situations. We believe that there is no such thing as the best possible SWF format; rather, the best choice depends on the particular context and goal. One of the drawbacks of SWF with respect to Ambisonics is that the acoustical and perceptual interpretation of the format in terms of pressure and velocity is lost in general (we still retain the notion of the global pressure by a careful wavelet design). In this context, it is key to have an acoustically and perceptually motivated decoder that can reinstate the missing physical and perceptual observables.
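
      The multiresolution mesh can be pictured with a short sketch that starts from an octahedron and splits every triangle into four. Note the simplification: only the connectivity step of Loop subdivision is reproduced here (each edge gets a new midpoint vertex), and new vertices are simply projected onto the unit sphere rather than placed with Loop's smoothing weights. The function names and the vertex ordering are assumptions for illustration.

```python
import math

def octahedron():
    """Six unit vertices on the axes and eight triangular faces."""
    verts = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    faces = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
             (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
    return [tuple(float(c) for c in v) for v in verts], faces

def subdivide(verts, faces):
    """Split each triangle into four, sharing midpoints between adjacent faces."""
    verts = list(verts)
    midpoint_index = {}

    def midpoint(a, b):
        key = (min(a, b), max(a, b))
        if key not in midpoint_index:
            m = [(p + q) / 2 for p, q in zip(verts[a], verts[b])]
            norm = math.sqrt(sum(c * c for c in m))
            verts.append(tuple(c / norm for c in m))  # project onto the sphere
            midpoint_index[key] = len(verts) - 1
        return midpoint_index[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return verts, new_faces

verts0, faces0 = octahedron()
verts1, faces1 = subdivide(verts0, faces0)
verts2, faces2 = subdivide(verts1, faces1)
print(len(verts0), len(verts1), len(verts2))
```

      Each level keeps all vertices of the previous one, so the coarse mesh is nested inside the fine mesh. In this sketch the vertex count grows from 6 to 18 to 66.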

      A three-way comparison of OPT-SWF (using an optimized wavelet), VBAP-SWF (trivial wavelet from VBAP) and Ambisonics has been carried out for two different speaker layouts. Observations of the reconstructed signals, energy and intensity indicate that SWF is a format that, depending on the decoding, can fit between an amplitude panner and Ambisonics. It has localization characteristics similar to (or in some cases better than) Ambisonics, with greater control over the negative gains. Informal listening tests confirm these characteristics.

      In our experience, the differences between the two variations of the wavelet format explored are relatively minor when evaluated in terms of the decoding results. We noticed that the final results depend only slightly on the wavelet family, as long as this family has been designed with reasonable characteristics. A possible explanation is that the IDHOA decoding minimizes any possible intrinsic differences between the different encodings. Also, notice that we have only explored meshes of relatively low order. It is possible that differences become more apparent when going to higher order meshes, since the filtering effects are cumulative. Additionally, besides the decoding characteristics, other filter design properties, e.g. encoding performance, can be considered when designing and evaluating the wavelet families. In any case, we expect that the different characteristics of the wavelet families will be more evident when using custom subdivision meshes that directly represent standard speaker positions. When building custom meshes, the requirement for a spherical format could be lifted, and we could define a format for meshes with a non-spherical topology (e.g. half dome).

      Overall, SWF’s encoding, transmission and decoding flexibility and rendering performance make it an interesting family of formats to explore in real-life conditions.

      A fundamental piece of the new format, and also of the comparison with the reference technology, Ambisonics, is the stage of decoding to a real-world layout. One of the main outputs of the work is the formulation and implementation of the IDHOA decoder. While initially oriented to solving the problem of decoding Ambisonics to irregular layouts (still relevant to date), we developed an algorithm that, leveraging psychoacoustic criteria, can generate a decoding matrix for any linear encoding format, as long as this encoding format allows encoding a point source to a set of directions on the sphere. We have described two applications of IDHOA, to Ambisonics and to SWF, but it could be applied to any other format, e.g. using a specific multichannel layout as an intermediate audio format and decoding it to any other layout. The main novelty is the separation of intensity vectors into radial and tangential components, making it possible to optimize the two components separately.
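
      The radial and tangential split can be sketched as follows, assuming the common Gerzon-style definitions: energy is the sum of squared speaker gains, and the intensity vector is the energy-weighted mean of the speaker directions. The exact observables and weights used by IDHOA are not given in this abstract, so the function names and the example layout are hypothetical.

```python
import math

def unit(az_deg, el_deg=0.0):
    """Unit vector for an azimuth and elevation given in degrees."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def energy_and_intensity(gains, speakers, source):
    """Return energy plus radial and tangential intensity relative to `source`."""
    energy = sum(g * g for g in gains)
    vec = [sum(g * g * s[k] for g, s in zip(gains, speakers)) / energy
           for k in range(3)]
    # Radial part: projection onto the intended source direction.
    radial = sum(v * u for v, u in zip(vec, source))
    # Tangential part: magnitude of what is left after removing the projection.
    tangential = math.sqrt(max(0.0, sum(v * v for v in vec) - radial * radial))
    return energy, radial, tangential

speakers = [unit(30), unit(-30), unit(110), unit(-110)]
source = unit(0)
energy, radial, tangential = energy_and_intensity([0.7, 0.7, 0.1, 0.1], speakers, source)
```

      A radial value close to 1 indicates a well-focused image, while a nonzero tangential value indicates that the perceived direction is pulled away from the intended one. Treating the two as separate terms lets a cost function penalize them with different weights.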


