Muze: 3D Mixer/Binaural Reverb Plugin (Beta)

Muze is multi-purpose VST/AU reverb plugin that combines a 3D audio mixer with a high quality binaural near & far field model. The mixer allows for tight control over the early spatial impression of your sound-sources while psychoacoustic models add essential directional cues to help localize the sound. Good externalization or depth is achieved with our reverb design that integrates with the binaural cues, giving appropriate contrast for the early spatial impression to sit in.

Muze supports up to 9 channels with interactive graphical controls for each sound-source in a single instance.  Direct sound-sources have separate processing paths that come together within the shared reverb component. Create wide impressions by routing a source signal onto multiple channels. Up-mix mono sources into stereo/binaural. Virtualize 5.1+ to headphones by placing virtual speakers on the sound-stage. Switch to the panning mode for speaker setups.

The plugin is in closed beta (Windows 7+ x64, VST2, Mac VST2/AU) and we are looking for several testers with keen ears to evaluate and give us feedback. Contact us if you’re interested along with something about yourself and we’ll supply you with further details.

Sound Samples:

  • Vocals by the talented Stephanie Kay (Stars Collide): Dry first, followed by alternating rotations along yaw and pitch axes for no-verb, no-verb near-field, small room, hall, diffuse hall, and cave
  • Jazz instrumental by Maurizio Pagnutti (All The Gin Is Gone’ & ‘Bess): Spinning around 9 instruments!
  • Loopy spatial effects!
  • Experiments! 

Geometric Audio 2: Gauss Circle Problem for Integer Sized Room Models

In the previous post on image-sources for room modeling, we made the observations that

  1. There exist a unique path from each image-source coordinate that can be back-traced to a receiver
  2. The distance and direction between image-source coordinate and listener are equal sum total of the back-traced ray lengths and the final leg of the back-traced path respectively.
  3. Image-source coordinates in 2D orthotopes has one-to-one mapping to integer lattice points
  4. The number of image-sources with respect to reflection order K and dimension D is O(K^D)

For large K or large D, the number of image-sources becomes too large to process individually for any practical real-time applications. Instead, we ought to take a density estimation approach by making the following query: How many image-sources lie between a hypersphere of radius k_1 and k_2 centered about a receiver L? i.e. what is the difference in “lattice volume” or number of lattice points  contained between two hyperspheres of different radii? If we can quickly solve for such queries, then it should be possible to design a multi-tap finite impulse response (FIR) where each sample is a weighted function of the differences in lattice volumes at successive radius k=t \frac{v}{Fs}  in meters (v is the velocity of sound, Fs is the sample-rate, and t is the integer sample index). To attack this problem, let us consider the classic Gauss Circle Problem posed simply as the determination of the number of integer lattice points within a circle of radius k (see animation below).

Gauss Circle.gif
2D Gauss circle problem

The exact solution is known and given by S(k) = 1+4\left \lfloor k \right \rfloor + 4\sum_{i=1}^{\left \lfloor k \right \rfloor} \left \lfloor \sqrt{k^2 - i^2} \right \rfloor, requiring an expensive summation over the positive integers less than k. i.e. we must numerically count despite the fact that the lattice volume approaches the area of a circle as k grows large if we want to model high-frequencies in our FIR.

Unfortunately, mapping our image-source density problem to the Gauss circle problem introduces many unsatisfactory constraints:

  1. Number of dimensions restricted to D = 2
  2. The room must be a unit square
  3. Emitter and receiver are coincident to the origin
  4. Each lattice point contributes only one unit to the summation (they are unweighted)
  5. Area is a quadratic function of k so later summations will be huge

Let us generalize the Gauss circle problem so that these constraints can be either removed or relaxed. For reference, Euclidean or L_2 distance in D dimensions is given by

  1. Recurrence relation for lattice volume S(k,D) in arbitrary dimensions:
    The base case for D=1 assumes that positive and negative lattice points coordinates are symmetric. For higher dimensional cases,  we do a form of recursive integration over the positive integers of each dimension (see animation below).

    high dim.gif
    Purple region represents lattice volume at \sqrt{k^2 - i^2} in D=1 as i increases in D=2.

    This proof follows from the observation that if ||\nu||_2 \in \mathcal{R}^D \leq k and \nu_D = i, then \sum_{d=1}^{D-1} \nu_d^2 \leq k^2 - i^2 .

  2. Memorization in quadratic space:
    Lower dimensional solutions of the original recursive formulation can be stored in memory by mapping integer space k to the quadratic space k^2. i.e. We compute and store \hat{S}(q, D) for 0 \leq q \leq K^2 where K is the max radius of interest (in practical terms, K is a meter distance quantity converted from a desired reverb time). The trade-off is that memory requirements undergo quadratic scaling with respect to max radius.
  3. Integer scaling of room boundaries and receiver offset:
    where integer scalar \ell_D determines the size of room along dimension D and integer scalar \Delta_D: 0 \leq \Delta_D \leq \left \lfloor \frac{\ell_D}{2} \right \rfloor is the emitter offset in dimension D from the origin. The proof follows from the constraint that q - (\Delta_D + \ell_D i)^2 \geq 0 implies \sqrt{q} \geq \Delta_D + \ell_D i and -\sqrt{q} \leq \Delta_D + \ell_D i.
  4. Exponential decaying lattice point contributions:
    where w_D: 0 \leq w_D \leq 1 is a real value representing in physical terms a conversion of dB loss to magnitude due to a reflection off a boundary. Proof of the case of l \leq h, D=1 follows the application of the geometric series.

With the generalization of the Gauss circle problem into a dynamic programming problem, we have expanded the parameter space to include arbitrary dimensional orthotopes of integer boundary sizes, integer receiver offsets, and real reflection gain/loss coefficients. Prefiguring these parameters beforehand and accounting for attenuation loss due to a generalization of the inverse square law of sound-fields into higher dimensions, an RT60 or FIR length and subsequent max meters K terms can be specified beforehand.

The cost of directly computing \hat{S}(q, D) at q is given by O(\sqrt{q}/\ell_D) flops. Summing over all 0 \leq q \leq K^2 gives a cost of O(D K^3) flops and is most expensive when the boundary size  \ell_D =1 is minimized. This is certainly a large improvement over directly processing O(K^D) individual image-sources where the asymptotic costs of the two methods match for D=3. However, we can make one last improvement by observing that the access patterns of q, i \in \hat{S}(q - (\Delta_D + \ell_D i)^2, D-1) resembles that of the convolution operation, allowing us to achieve even lower asymptotic cost of O(D K^2 \log K) via the fast Fourier transform. Implementation details will be covered in the next post!


Notes: Equations were lifted from a draft paper that I’m writing. Animations were done with GeoGebra.

Geometric Audio 1: Image-Source Room Models

A common audio technique for adding depth to a mix is to throw in echos or early reflections following the direct sound-source arrival to a listener. To model such reflections, many reverberation algorithms treat a sound-source and listener as a point emitter and receiver within an imaginary room or box. The reasoning follows that such a configuration is elegant from both theoretical and practical perspectives. In this series of posts, we will investigate why this is so followed by several new results that were recently derived and implemented in Riviera.

To start off, here’s an animation of a moving sound-source with respect to a listener in front of a wall. The dotted-orange line represents two rays that form the path that a sound-wave, originating from emitter S, would take if it underwent a specular reflection about plane n before reaching receiver L.

Specular reflection model for emitter S, image-source S' reflected across plane n,  and receiver L

The reflection point X is specific to the coordinates of S and L as the two incident angles of the two rays with respect to the plane must be identical. Determining X follows from applying some basic high-school geometry tools. If S' is the “image-source” or reflection of S across n, then the ray \overrightarrow{S'L} will intersect n at the desired point X; proof follows basic axioms of congruent angles of intersecting lines. More useful are the implications of such a construction. From the coordinates of S', observe that

  1. \overrightarrow{S'L} has length equal to the ray-traced path
  2. Last leg of the ray-traced path is coincident to \overrightarrow{S'L}

In other words, S' contains useful information for computing both distance and  direction of a first-order reflection (the two properties can later be used to update various DSP parameters such as time-delay, attenuation/gain coefficients, and head-related transfer functions). If the reflecting surface were to extend to infinity, then we need not even compute X given that the intersection will always fall upon the surface.

A third property of the image-source construction is its extension to higher-order image-sources that preserve properties 1 and 2. Supposing that we have two planes n_1, n_2 and S' is the first-order image-source of S reflected about n_1. Reflecting S' about  n_2 generates a second-order image-source S'' shown in the animation below.

Reflecting a first-order image-source S' about a valid plane produces a second-order image-source S''

Note some of the caveats as to which planes are valid. If emitter and receiver are located within a convex enclosure of planes whose normals point inward, then the image-source must lie within the positive side of candidate plane or else the resulting ray-traced path will be physically impossible (it will pass through planes). If the candidate plane were valid and S'' constructed, then the reflection path can be computed by iteratively back-tracing intersection points with lower-order image-sources; the total back-traced path can be shown to be equal to that of \overrightarrow{S''L} via symmetry arguments . In the example above, G' is the second-order intersection point between S'', L,  n_2, followed by the construction of ray \overrightarrow{S'G'} used to find the first-order intersection point G with respect to n_1. However, if any intersection point in the back-trace were to lie outside the reflecting plane, then the entire path would be invalid (akin to a reflection off of a non-surface). This is crucial as such a check would possibly invalidate a large majority of the high-order image-sources, resulting in wasted CPU cycles. Thus we now have some hints as to the complexity of the problem space.

  1. Supposing on average that each image-source has N valid planes to reflect from (e.g. regular polygonal enclosure), then the total number of image-sources is exponential  O(N^K)  with respect to image-source order K.
  2. Only a fraction of the exponentially large set of image-sources are valid.

It should be clear now that image-source methods are computationally expensive for even well-structured enclosures so we ought to turn our attention/look towards special cases where the problem space collapses.

First, consider the case of second-order image-sources computed amongst two orthogonal planes n_1 n_2 shown in the animation below.

Second-order image-source S'' coincident for both S'_1 and S'_2. Only one back-trace path is valid at a time.

The first-order image-sources S'_1, S'_2 are constructed by reflecting S off planes n_1, n_2 respectively. Their second-order image-sources are computed from reflecting off n_2, n_1 respectively and seem to possess coordinates coincident with respect to each (both are subsequently referred to as S''). This follows from the observation that the reflections w.r.t. orthogonal n_1 and n_2 commute. i.e. reflections between orthogonal planes can be performed in any order, only their multiplicity will matter.  Moreover, if we perform the back-trace from S'', we observe that there’s exactly one path that is ever valid with respect to a moving emitter or receiver. i.e. computing the coordinate of S'' is sufficient as it will always have a unique non-degenerate or non-coincident valid path. How can we apply this fact to the case of rectangular room enclosures?

Let us now define a rectangular room in terms of a pair of parallel planes n_{-1}, n_{+1} along dimension 1 orthogonal to a pair of parallel planes n_{-2}, n_{+2} along dimension 2 (see the animation below). For K_i reflections along the the i^{th} dimension, only the choice of the first reflection (either n_{-i} or n_{+i}) matters as all subsequent reflections must alternate; a one-to-one mapping exists between sequence and image-source coordinate. For reflections between orthogonal planes, the commuting property allows us to shuffle the ordering of reflections into sub-sequences restricted to those along dimension 1 followed by those in dimension 2. This allows us to map any unique image-source coordinate to and from the sequence of reflections given by \{ K_{\pm 1}, K_{\pm 2}\}. Moreover, each image-source will be restricted to its image-room computed in the same way by applying the same Matrix transform to its vertex points. The result is a lattice map of image-sources within image-rooms conveniently organized along two integer axes shown below.

Integer lattice map of image-source/rooms

It should be apparent that the number of valid image-sources is no longer exponential with respect to max order reflections K, but in the 2D case quadratic O(K^2) and in the general case, O(K^D) for D dimensions. The question now arises as to whether we can do better, especially for higher dimension (D \geq 3) where quantities grow more quickly and become non-trivial to compute.

Next posts: We determine the bounds for the number of image-sources within different radii defined under the L_1 taxi-cab distance and then the more practically useful L_2 Euclidean distance.

  1. The taxi-cab distance on the lattice map is equivalent to the max order K of image-sources. Counting the number of image-sources will bound the computations required to process individual image-sources within DSP pipelines (e.g. updating a tapped delay-line). See post. Related: linear algebra and dynamic programming
  2. The Euclidean distance on the lattice map gives the time-interval or sampling period of which image-sources appear. Counting the number of image-sources between two distances gives an energy density or amplitude profile of an impulse response allowing us to forgo processing individual image-sources. See part 1 and part 2. Related: Gauss Circle problem, dynamic programming, and Fourier analysis

Animations were generated using GeoGebra.