The next iteration of the Muze plugin is live! Screenshot and workflow video below:

New features / changes

Added new echo-blender reverb algorithm (second of nine) and selector for changing modes (see GUI + manual for details)

Sound-sources are now draggable in XY plane in sound-stage

Right-click on sound-source toggles mute button

Enable wrap-around mouse wheel scrolling on source and orientation knobs

Forum link added to top right

Lower left displays registered user name only

Added 10 new presets

Those who have already purchased can update their full version via their Fastspring link (if you were a beta-tester, contact us). The demo links have also been updated.

Release notes:
v. 1.1.1
-Fixed crash on certain Mac DAWs when switching between reverb modes
-Added version numbering to upper-right corner

Muze is a multi-purpose VST/AU reverb plugin that combines a 3D audio mixer with a high-quality binaural near- and far-field model. The mixer component enables tight control over the early spatial impression of the acoustic field, while psychoacoustic models add essential directional cues to help localize sound. Our reverb design integrates with the binaural cues, giving appropriate contrast for the early spatial impression to sit in.

Muze supports up to 8 input channels with an additional ninth summation channel that doubles as a delay unit. Channels are associated with point sound-sources in an interactive graphical display and have separate processing paths that come together in a shared reverb component. Create whisper effects, wide impressions, impossible spaces. Up-mix mono sources into stereo/binaural. Virtualize 5.1+ to headphones by placing virtual speakers on the sound-stage. Switch to the panning mode for speaker setups.

Features:

8 channel virtualization, each with automatable azimuth, elevation, and distance controls

Adjustable listener controls for scaling interaural time delay and aligning yaw/pitch orientations

Spin knob enables automatic head-rotation, unleashing new possibilities for different modulation effects

Unique spatial-reverb design integrates specular & diffuse sound reflections with the HRTF model, producing an immersive reverberant field

Reverb characteristics such as room size, sound depth, high frequency dampening/reverb times are automatable

Delay unit integrates with the reverb, creating DUB-type effects

Rendering modes togglable between binaural/panning processing

All sampling rates supported for HRTFs

Graphical display for visualizing sound-sources & listener position and orientations

Updates are free

50+ presets to get you started

Samples:

Vocals by the talented Stephanie Kay (Stars Collide): Dry first, followed by alternating rotations along yaw and pitch axes for no-verb, no-verb near-field, small room, hall, diffuse hall, and cave

Jazz instrumental by Maurizio Pagnutti (‘All The Gin Is Gone’ & ‘Bess’): Dancing around instruments!

Loopy spatial effects!

Workflow/Experiments:

Demo & Specifications:

Windows 7+: VST2 Win32 + x64, SSE2 enabled processor

Mac OSX 10.7+: VST2/AU, SSE2 enabled processor

Download Link (Demo restriction is brief silence every 30 sec)

Here’s a quick video about the controls in the Riviera plugin (turn on closed-captions for descriptions). The dry clip is a collection of transients typically used to stress-test reverb effects. Below is a cross-post of the readme contained within the plugin with some further elaborations.

Controls:
-Fine-grained knob adjustments are possible by using the mouse wheel while holding down either Shift or Ctrl on the keyboard.
-Double-clicking a parameter resets it to its default.

Voom (N-Orthotope) panel: Generalization of room into arbitrary dimensions.
Five faders, each with three knobs, determine the characteristics of each dimension of the voom.

Size: The length of the dimension in meters. The sound-source is effectively placed at the center of this dimension. Enlarging this tends to increase RT60 and sense of “spaciousness” due to greater separation of early reflections.

Depth: Where you (the listener) are positioned relative to the center of the room. 0 percent is coincident to the sound-source, so there’s maximal delay between the direct sound and early reflections. 50 percent is coincident to the “wall” or boundary, so the direct sound and early reflections are less distinguishable from reverb. Note that the IR is computed only for depths that map to whole (integer) meters, so for a 6 meter dimension there are only 4 positions the listener can be in (0, 1, 2, 3 meters from center).
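As a rough illustration of the whole-meter constraint, the sketch below lists the listener depths available for a given dimension size. The function name `listener_depths` is hypothetical, and the handling of odd sizes (rounding down) is an assumption; how the plugin maps the knob percentage onto these positions is not covered here.

```python
def listener_depths(size_m: int) -> list[int]:
    """Whole-meter listener offsets from the center of one voom dimension.

    Assumes the sound-source sits at the center and the boundary lies
    size_m / 2 meters away, so offsets run from 0 to floor(size_m / 2).
    """
    return list(range(size_m // 2 + 1))


print(listener_depths(6))  # [0, 1, 2, 3]
```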

Reflection: The dB loss incurred per reflection between sound-source and boundary. Setting this to low values (e.g. 0.1 dB loss) will greatly increase RT60, which is truncated to 5 seconds for performance reasons.

V1-V5 buttons: Enable/disable individual dimensions. Enabling any combination of N buttons generates an N-D room; disabling all dimensions bypasses the effect. For example, enabling (V1, V3, V5) generates a 3D room, as does (V2, V3, V4). Note that in higher dimensional vooms, the reverb build-up creates a swell if you aren’t coincident to the sound-source, so there’s hardly any distinction between direct, early, and late reflections. This motivates some time manipulation controls so that we may listen in these spaces.

Time panel: Manipulates geometry, distances between direct/early/late reflections, and more.

Stretch: A form of super-sampling of the underlying space, which has the effect of spacing all the reflections out. This is geometrically equivalent to scaling the voom and depth by a constant, which allows us to achieve long reverb tails.
1: no change
>1: Oversample geometry for longer IR
Note that freq. decay is applied after the fact so the reported RT60 will not scale proportionately.

Reverse: Mirrors the first % of the IR to create a pre-verb / pre-fading effect. 0% default gives no pre-verb where the first non-zero tap is the direct onset of the sound-source. 100% completely reverses the IR.

Linearity: Biases the sampling of the geometry towards either the earlier or the later reflections. In physical terms, this models a variable acceleration of the speed of sound without annoying Doppler effects. If late is oversampled (sound velocity accelerates over time), the result is a long IR with distinct (well-separated) early reflections (more like echoes). If early is oversampled (sound velocity decelerates over time), the result is a short IR with a very fast attack and a diminished reverb tail, as all the earlier reflections have been compressed towards the direct sound-source.

Attenuation: This modifies the generalization of the inverse square law in higher dimensions for sound-source energy loss. Low g causes less attenuation over distance (slow roll-off), which emphasizes the late tail / reverb over the direct+early reflections without changing the IR length. Large g causes more attenuation (fast roll-off), which emphasizes direct+early over reverb.

Delay: The direct sound-source normally has a non-zero time-of-arrival depending on the listener depth, but for practical usage (mixing), a separate control was created for delaying the entire IR. By default, the physical delay between source and listener is truncated to 0. Use this in conjunction with T0 (see below) and the mix knob to do pre-fading.

T0: Direct truncation of the early part of the IR. Use it to remove the direct sound-source onset, start the IR anywhere within the reverb tail, or decrease pre-fading time and even gate the reverb tail with the reverse knob.

FFT: Controls the internal max-power block-size parameter without affecting latency (default latency is twice the latency set within the DAW due to some DAWs using variable block-sizes). Decreasing this will lower peak real-time processing at the cost of increased average real-time CPU usage. Increasing this will raise peak real-time processing but decrease average real-time CPU usage.

Frequency panel: All mediums (air, water, drywall, glass) have frequency-dependent absorption characteristics that will color the IR over time (see spectrogram). Two knobs are provided that parametrically fit a smooth function between 0 and pi radians in the unnormalized frequency domain.

High/low decay or dampening: Increasing these will more quickly attenuate respective high and low frequencies from sampling_rate/2 Hz to 0 Hz over time; all freq. between have decay bounded between these two settings. Setting them equal to each other has the effect of applying frequency-independent dB loss (i.e. gain control).

Low cut Hz/Quality: Filter out low frequencies from 0 to f0 Hz with strength Q. Note that large Q will delay the signal a little (due to linear-phase) so watch the latency or use the T0 control to cut out the initial pre-ring.

Misc panel:
Your standard pan, stereo and mix (wet/dry) controls. These all have frame-buffer length latency and will not incur a recomputation of the IR.

Pan: Applies dB loss to either left or right channels.

Stereo: Applies low ms delay to either left or right channels.

Mix: Basic fader between original and processed signals.

Fast mode: Enable this so that non-voom parameter updates are faster at the expense of more memory usage.

IR normalization: If enabled, normalizes the impulse response when its sum of squares exceeds 1. Otherwise, beware of your speakers if you start adjusting attenuation and reflection settings too aggressively.
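A minimal sketch of that energy check, assuming the target of the normalization is unit energy (the readme only states the trigger condition, not the target). The function name `normalize_ir` is hypothetical.

```python
import math


def normalize_ir(ir: list[float]) -> list[float]:
    """Scale the IR to unit energy, but only if its energy exceeds 1."""
    energy = sum(x * x for x in ir)  # sum of squared taps
    if energy <= 1.0:
        return list(ir)  # quiet enough already, leave untouched
    scale = 1.0 / math.sqrt(energy)  # assumption: normalize to energy 1
    return [x * scale for x in ir]
```

An IR of `[3.0, 4.0]` has energy 25 and is scaled by 1/5, while `[0.5, 0.5]` has energy 0.5 and passes through unchanged.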

Recall from the introductory post that the taxi-cab ($\ell_1$) distance metric on the integer lattice map relates the max-order of image-sources to the lattice volume, i.e. the total number of image-sources contained within a “ball” of radius $r$. This quantity is useful for determining the total cost of processing individual image-sources in cases where $r$ can be known or estimated beforehand (e.g. selected according to an estimate of a room’s RT60 using Sabine’s Equation).

In $D$-dimensional space, the $\ell_1$ norm is given by $\|x\|_1 = \sum_{i=1}^{D} |x_i|$, where the lattice volume is defined by the number of integer lattice coordinates satisfying $\|x\|_1 \le r$. The visual equal-distance curve resembles a diamond in $D = 2$, and the growth rate of the lattice volume seems obvious if we introduce a lattice-shell term given by the number of integer coordinates that satisfy $\|x\|_1 = r$ (see the animation below). Such ease is not the case for dimensions $D > 2$.

We start with a counting argument that relates the lattice shell and lattice volumes across dimensions via a recurrence relation:

The lattice shell relation follows from observation that all lattice coordinates on the boundary can be formed by augmenting with and with . The lattice volume relation follows from the definition of a lattice shell which through substitution can be re-written as self-referential difference equation between successive dimensions. The counting solution is practical for small and but can be improved by the following ansatz.
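The counting argument can be sketched in code. The recurrence used below, V(D, r) = V(D, r-1) + V(D-1, r) + V(D-1, r-1), is one standard counting recurrence for the taxi-cab ball and is an assumption here; it may differ in form from the relation in the post. The names `l1_volume`, `l1_shell`, and `l1_volume_brute` are hypothetical, and the brute-force count is only there as a check.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def l1_volume(dim: int, r: int) -> int:
    """Number of integer points x in Z^dim with |x_1| + ... + |x_dim| <= r."""
    if dim == 0 or r == 0:
        return 1  # only the origin
    return l1_volume(dim, r - 1) + l1_volume(dim - 1, r) + l1_volume(dim - 1, r - 1)


def l1_shell(dim: int, r: int) -> int:
    """Number of integer points at taxi-cab distance exactly r."""
    inner = l1_volume(dim, r - 1) if r > 0 else 0
    return l1_volume(dim, r) - inner


def l1_volume_brute(dim: int, r: int) -> int:
    """Direct enumeration, practical only for small dim and r."""
    grid = product(range(-r, r + 1), repeat=dim)
    return sum(1 for x in grid if sum(map(abs, x)) <= r)


print(l1_volume(2, 3), l1_shell(2, 3))  # 25 12
```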

Ansatz: Suppose that the lattice volume can be expressed as a polynomial equation given by . Substitution into the difference equation gives

where are unknown polynomial coefficients of the powers of radius . Using the binomial theorem, the powers of rearranged as to form a generalized square matrix-system given by

where upper triangular matrices contain the binomial coefficients and is the vector of unknown polynomial coefficients to be solved for in terms of the polynomial coefficients in the preceding dimension. Carrying the recursion out from the base case , the higher-dimensional lattice volumes are given by

where by inspection, the highest order coefficients decrease towards $0$ as $D$ increases. This is expected as the majority of the volume moves towards the origin as $D$ grows. For some context, we can compare the lattice volumes bounded under larger p-norms such as $\ell_2$ or Euclidean (see post on Gauss circle problem, volume appx. hypersphere) and the $\ell_\infty$ norm given by $\|x\|_\infty = \max_i |x_i|$ (lattice volume trivially $(2r+1)^D$). By inspection, the lattice volume gap between $\ell_1$ and $\ell_\infty$ grows wider as the leading polynomial terms of the $\ell_1$ lattice volume decay to $0$ (see plot below).
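To make the comparison concrete, here is a small sketch that uses a standard closed form for the taxi-cab lattice volume (a sum over the number k of non-zero coordinates) rather than the matrix recursion from the post, so treat it as an independent cross-check and not as the author's method. All function names are hypothetical.

```python
from math import comb, factorial


def l1_volume_closed(dim: int, r: int) -> int:
    """Integer points with taxi-cab norm <= r: sum_k 2^k C(dim, k) C(r, k)."""
    return sum(2**k * comb(dim, k) * comb(r, k) for k in range(min(dim, r) + 1))


def linf_volume(dim: int, r: int) -> int:
    """Integer points with max-norm <= r: a cube with 2r + 1 points per side."""
    return (2 * r + 1) ** dim


def l1_leading_coefficient(dim: int) -> float:
    """Coefficient of r^dim in the taxi-cab volume polynomial."""
    return 2**dim / factorial(dim)


for dim in (2, 3, 5, 8):
    ratio = l1_volume_closed(dim, 20) / linf_volume(dim, 20)
    print(dim, l1_leading_coefficient(dim), ratio)
```

The leading coefficient 2^D / D! shrinks quickly with the dimension, while the max-norm volume keeps its leading coefficient of 2^D, which is why the printed ratio falls as the dimension grows.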

This concludes our four-part foray into image-source models within high-dimensional integer lattice maps. If any new or interesting results come up or any errors found, I will update accordingly.

Notes: Animations generated in GeoGebra, equations and figures were from my draft paper.

Recall in the previous post our derivation of the Gauss circle recurrence relation generalized to arbitrary dimensions, scaling, offset, and weights:

where the cost of computing for for max-distance requires flops. In practice, the summation should be transposed so that is incremented within the inner loop as to take advantage of vectorization/SIMD memory access patterns.

A second approach for those familiar with DSP utilizes the fact that the summation term in the recurrence relation passes as a form of sparse convolution due to the quadratic-striding memory access patterns of . To make the convolution operation explicit, we can vectorize into a sparse signal and re-express the summation as follows:

where is an indicator variable. Note that in consideration of building , a larger boundary scaling term increases the signal sparsity. i.e. from a computational perspective, there are regions in the parameter space of where either direct sparse-convolution or the direct evaluation of the recurrence relation will be faster than an implementation dependent Fast Fourier Transform (FFT) based convolution despite the latter’s lower asymptotic cost of flops.
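A hedged sketch of the sparse-convolution view, for the unscaled, unweighted case only. It assumes the signal is indexed by squared distance and that the indicator marks perfect squares; the names `square_indicator`, `sparse_convolve`, and `lattice_volumes` are hypothetical and the post's own formulation (with scaling, offsets, and weights) is more general.

```python
from itertools import accumulate
from math import isqrt


def square_indicator(max_sq: int) -> list[int]:
    """h[s] = number of integers i with i*i == s (sparse: mostly zeros)."""
    h = [0] * (max_sq + 1)
    h[0] = 1
    for i in range(1, isqrt(max_sq) + 1):
        h[i * i] = 2  # both +i and -i
    return h


def sparse_convolve(signal: list[int], kernel: list[int]) -> list[int]:
    """Truncated convolution that visits only the non-zero kernel taps."""
    out = [0] * len(signal)
    for k, w in enumerate(kernel):
        if w == 0:
            continue
        for s in range(k, len(signal)):  # contiguous inner loop over s
            out[s] += w * signal[s - k]
    return out


def lattice_volumes(dim: int, max_sq: int) -> list[int]:
    """volumes[s] = number of points in Z^dim with squared norm <= s."""
    h = square_indicator(max_sq)
    counts = [1] + [0] * max_sq  # dimension 0: only the origin
    for _ in range(dim):
        counts = sparse_convolve(counts, h)
    return list(accumulate(counts))


print(lattice_volumes(2, 100)[100])  # 317 points within radius 10
```

Only about sqrt(max_sq) taps of the kernel are non-zero, which is where the savings over a dense convolution come from.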

With the distance bound and Gauss circle problem out of the way, we will continue our investigation of lattice volume bounds in terms of the $\ell_1$ or taxi-cab distance in the next post.

In the previous post on image-sources for room modeling, we made the observations that

There exists a unique path from each image-source coordinate that can be back-traced to a receiver

The distance and direction between image-source coordinate and listener are equal to the sum total of the back-traced ray lengths and the final leg of the back-traced path respectively.

Image-source coordinates in 2D orthotopes have a one-to-one mapping to integer lattice points

The number of image-sources with respect to reflection order and dimension is

For a large reflection order or a large number of dimensions, the number of image-sources becomes too large to process individually for any practical real-time applications. Instead, we ought to take a density estimation approach by making the following query: How many image-sources lie between two hyperspheres of radii $r_1$ and $r_2$ centered about a receiver? i.e. what is the difference in “lattice volume” or number of lattice points contained between two hyperspheres of different radii? If we can quickly solve for such queries, then it should be possible to design a multi-tap finite impulse response (FIR) where each sample is a weighted function of the differences in lattice volumes at successive radii $r_n = c\,n / f_s$ in meters ($c$ is the velocity of sound, $f_s$ is the sample-rate, and $n$ is the integer sample index). To attack this problem, let us consider the classic Gauss Circle Problem posed simply as the determination of the number of integer lattice points within a circle of radius $r$ (see animation below).

The exact solution is known and given by $N(r) = 1 + 4\lfloor r \rfloor + 4\sum_{i=1}^{\lfloor r \rfloor} \lfloor \sqrt{r^2 - i^2} \rfloor$, requiring an expensive summation over the positive integers no greater than $r$. That is, if we want to model high frequencies in our FIR, we must count numerically even though the lattice volume approaches the area of a circle, $\pi r^2$, as $r$ grows large.
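For reference, the classic count can be sketched as follows using the well-known formula N = 1 + 4*floor(r) + 4*sum of floor(sqrt(r^2 - i^2)) over i = 1..floor(r). To stay in exact integer arithmetic, the sketch takes the squared radius as input, which is a choice made here and not something the post specifies. The names `gauss_circle` and `annulus_count` are hypothetical.

```python
from math import isqrt


def gauss_circle(r_sq: int) -> int:
    """Number of integer points (x, y) with x*x + y*y <= r_sq."""
    m = isqrt(r_sq)  # floor of the radius
    return 1 + 4 * m + 4 * sum(isqrt(r_sq - i * i) for i in range(1, m + 1))


def annulus_count(inner_sq: int, outer_sq: int) -> int:
    """Points with inner_sq < x*x + y*y <= outer_sq (difference of volumes)."""
    return gauss_circle(outer_sq) - gauss_circle(inner_sq)


print(gauss_circle(100))     # 317, versus a disc area of about 314.16
print(annulus_count(4, 9))   # 16 points between radius 2 and radius 3
```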

Unfortunately, mapping our image-source density problem to the Gauss circle problem introduces many unsatisfactory constraints:

Number of dimensions restricted to 2

The room must be a unit square

Emitter and receiver are coincident to the origin

Each lattice point contributes only one unit to the summation (they are unweighted)

Area is a quadratic function of the radius, so later summations will be huge

Let us generalize the Gauss circle problem so that these constraints can be either removed or relaxed. For reference, Euclidean or $\ell_2$ distance in $D$ dimensions is given by $\|x\|_2 = \sqrt{\sum_{i=1}^{D} x_i^2}$.

Recurrence relation for lattice volume in arbitrary dimensions:

The base case for $D = 1$ assumes that positive and negative lattice point coordinates are symmetric. For higher dimensional cases, we do a form of recursive integration over the positive integers of each dimension (see animation below).

This proof follows from the observation that if and , then .

Memoization in quadratic space:

Lower dimensional solutions of the original recursive formulation can be stored in memory by mapping integer space to the quadratic space . i.e. We compute and store for where is the max radius of interest (in practical terms, is a meter distance quantity converted from a desired reverb time). The trade-off is that memory requirements undergo quadratic scaling with respect to max radius.
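A sketch of that table-based scheme for the plain (unscaled, unweighted) case. It assumes a recurrence of the form N_D(s) = N_{D-1}(s) + 2 * sum over i >= 1 of N_{D-1}(s - i^2), indexed by the squared radius s; this is reconstructed from the description above and may not match the post's equation term for term. The name `lattice_volume_table` is hypothetical.

```python
from math import isqrt


def lattice_volume_table(dim: int, max_radius: int) -> list[int]:
    """table[s] = number of points in Z^dim with squared norm <= s.

    One table per dimension is kept, indexed by s = 0 .. max_radius**2,
    so memory grows quadratically with the max radius.
    """
    max_sq = max_radius * max_radius
    # Base case, dimension 1: the origin plus symmetric +/- i with i*i <= s.
    table = [1 + 2 * isqrt(s) for s in range(max_sq + 1)]
    for _ in range(dim - 1):
        prev = table
        table = [
            # Slice at coordinate 0, then slices at +/- i for each positive i.
            prev[s] + 2 * sum(prev[s - i * i] for i in range(1, isqrt(s) + 1))
            for s in range(max_sq + 1)
        ]
    return table


table = lattice_volume_table(2, 10)
print(table[100])  # 317, the Gauss circle count for radius 10
```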

Integer scaling of room boundaries and receiver offset:

where integer scalar determines the size of room along dimension and integer scalar is the emitter offset in dimension from the origin. The proof follows from the constraint that implies and .

Exponential decaying lattice point contributions:

where is a real value representing in physical terms a conversion of dB loss to magnitude due to a reflection off a boundary. Proof of the case of follows the application of the geometric series.
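The role of the geometric series can be illustrated in one dimension. The sketch assumes that the lattice point at integer index i has undergone |i| reflections and therefore contributes g^|i|, and that the dB loss is an amplitude quantity (20 log10); both are assumptions for illustration, as are the function names.

```python
def db_loss_to_gain(db_loss: float) -> float:
    """Convert a per-reflection dB loss to a linear magnitude factor."""
    return 10.0 ** (-db_loss / 20.0)


def weighted_volume_1d(m: int, g: float) -> float:
    """Sum of g**|i| for |i| <= m, in closed form via the geometric series."""
    if g == 1.0:
        return float(1 + 2 * m)  # unweighted case: plain point count
    return 1.0 + 2.0 * g * (1.0 - g**m) / (1.0 - g)


def weighted_volume_1d_direct(m: int, g: float) -> float:
    """The same sum evaluated term by term, as a check."""
    return sum(g ** abs(i) for i in range(-m, m + 1))


g = db_loss_to_gain(6.0)
print(weighted_volume_1d(5, g), weighted_volume_1d_direct(5, g))
```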

With the generalization of the Gauss circle problem into a dynamic programming problem, we have expanded the parameter space to include arbitrary dimensional orthotopes of integer boundary sizes, integer receiver offsets, and real reflection gain/loss coefficients. Prefiguring these parameters beforehand and accounting for attenuation loss due to a generalization of the inverse square law of sound-fields into higher dimensions, an RT60 or FIR length and subsequent max meters terms can be specified beforehand.

The cost of directly computing at is given by flops. Summing over all gives a cost of flops and is most expensive when the boundary size is minimized. This is certainly a large improvement over directly processing individual image-sources where the asymptotic costs of the two methods match for . However, we can make one last improvement by observing that the access patterns of resembles that of the convolution operation, allowing us to achieve even lower asymptotic cost of via the fast Fourier transform. Implementation details will be covered in the next post!

———-

Notes: Equations were lifted from a draft paper that I’m writing. Animations were done with GeoGebra.

A common audio technique for adding depth to a mix is to throw in echoes or early reflections following the direct sound-source arrival to a listener. To model such reflections, many reverberation algorithms treat a sound-source and listener as a point emitter and receiver within an imaginary room or box. The reasoning follows that such a configuration is elegant from both theoretical and practical perspectives. In this series of posts, we will investigate why this is so, followed by several new results that were recently derived and implemented in Riviera.

To start off, here’s an animation of a moving sound-source with respect to a listener in front of a wall. The dotted-orange line represents two rays that form the path that a sound-wave, originating from emitter , would take if it underwent a specular reflection about plane before reaching receiver .

The reflection point is specific to the coordinates of and as the two incident angles of the two rays with respect to the plane must be identical. Determining follows from applying some basic high-school geometry tools. If is the “image-source” or reflection of across , then the ray will intersect at the desired point ; proof follows basic axioms of congruent angles of intersecting lines. More useful are the implications of such a construction. From the coordinates of , observe that

has length equal to the ray-traced path

Last leg of the ray-traced path is coincident to

In other words, contains useful information for computing both distance and direction of a first-order reflection (the two properties can later be used to update various DSP parameters such as time-delay, attenuation/gain coefficients, and head-related transfer functions). If the reflecting surface were to extend to infinity, then we need not even compute given that the intersection will always fall upon the surface.
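A minimal 2D sketch of the first-order construction. The plane is represented by a point on it and a unit normal, and the emitter and receiver are assumed to lie on the same side of the plane and off the plane itself; the function and parameter names are hypothetical and not taken from the post.

```python
import math


def reflect(point, plane_point, normal):
    """Mirror a 2D point across the line through plane_point with unit normal."""
    d = (point[0] - plane_point[0]) * normal[0] + (point[1] - plane_point[1]) * normal[1]
    return (point[0] - 2.0 * d * normal[0], point[1] - 2.0 * d * normal[1])


def reflection_point(emitter, receiver, plane_point, normal):
    """Where the ray from the image-source to the receiver crosses the plane."""
    image = reflect(emitter, plane_point, normal)
    direction = (receiver[0] - image[0], receiver[1] - image[1])
    denom = direction[0] * normal[0] + direction[1] * normal[1]
    num = (plane_point[0] - image[0]) * normal[0] + (plane_point[1] - image[1]) * normal[1]
    t = num / denom  # fraction of the way from image-source to receiver
    return (image[0] + t * direction[0], image[1] + t * direction[1])


emitter, receiver = (0.0, 1.0), (2.0, 1.0)
wall_point, wall_normal = (0.0, 0.0), (0.0, 1.0)  # the wall is the x-axis
image = reflect(emitter, wall_point, wall_normal)
hit = reflection_point(emitter, receiver, wall_point, wall_normal)
path = math.dist(emitter, hit) + math.dist(hit, receiver)
print(hit, path, math.dist(image, receiver))
```

The last two printed values agree: the straight-line distance from the image-source to the receiver equals the length of the two-leg reflected path.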

A third property of the image-source construction is its extension to higher-order image-sources that preserve properties 1 and 2. Supposing that we have two planes and is the first-order image-source of reflected about . Reflecting about generates a second-order image-source shown in the animation below.

Note some of the caveats as to which planes are valid. If emitter and receiver are located within a convex enclosure of planes whose normals point inward, then the image-source must lie within the positive side of candidate plane or else the resulting ray-traced path will be physically impossible (it will pass through planes). If the candidate plane were valid and constructed, then the reflection path can be computed by iteratively back-tracing intersection points with lower-order image-sources; the total back-traced path can be shown to be equal to that of via symmetry arguments . In the example above, is the second-order intersection point between followed by the construction of ray used to find the first-order intersection point with respect to . However, if any intersection point in the back-trace were to lie outside the reflecting plane, then the entire path would be invalid (akin to a reflection off of a non-surface). This is crucial as such a check would possibly invalidate a large majority of the high-order image-sources, resulting in wasted CPU cycles. Thus we now have some hints as to the complexity of the problem space.

Supposing on average that each image-source has valid planes to reflect from (e.g. regular polygonal enclosure), then the total number of image-sources is exponential with respect to image-source order .

Only a fraction of the exponentially large set of image-sources are valid.

It should be clear now that image-source methods are computationally expensive even for well-structured enclosures, so we ought to turn our attention towards special cases where the problem space collapses.

First, consider the case of second-order image-sources computed amongst two orthogonal planes shown in the animation below.

The first-order image-sources are constructed by reflecting off planes respectively. Their second-order image-sources are computed from reflecting off respectively and seem to possess coordinates coincident with respect to each (both are subsequently referred to as ). This follows from the observation that the reflections w.r.t. orthogonal and commute. i.e. reflections between orthogonal planes can be performed in any order, only their multiplicity will matter. Moreover, if we perform the back-trace from , we observe that there’s exactly one path that is ever valid with respect to a moving emitter or receiver. i.e. computing the coordinate of is sufficient as it will always have a unique non-degenerate or non-coincident valid path. How can we apply this fact to the case of rectangular room enclosures?
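The commuting property is easy to check numerically. The sketch below assumes two axis-aligned orthogonal planes, x = a and y = b, which is a special case chosen for brevity; the function names are hypothetical.

```python
def reflect_x(point, a):
    """Mirror a 2D point across the vertical plane x = a."""
    return (2.0 * a - point[0], point[1])


def reflect_y(point, b):
    """Mirror a 2D point across the horizontal plane y = b."""
    return (point[0], 2.0 * b - point[1])


emitter = (1.0, 2.0)
via_x_then_y = reflect_y(reflect_x(emitter, 0.0), 0.0)
via_y_then_x = reflect_x(reflect_y(emitter, 0.0), 0.0)
print(via_x_then_y, via_y_then_x)  # both are (-1.0, -2.0)
```

Each reflection touches only its own coordinate, so the order in which the two are applied cannot matter.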

Let us now define a rectangular room in terms of a pair of parallel planes along dimension orthogonal to a pair of parallel planes along dimension (see the animation below). For reflections along a single dimension, only the choice of the first reflection (either or ) matters as all subsequent reflections must alternate; a one-to-one mapping exists between sequence and image-source coordinate. For reflections between orthogonal planes, the commuting property allows us to shuffle the ordering of reflections into sub-sequences restricted to those along dimension followed by those in dimension . This allows us to map any unique image-source coordinate to and from the sequence of reflections given by . Moreover, each image-source will be restricted to its image-room computed in the same way by applying the same matrix transform to its vertex points. The result is a lattice map of image-sources within image-rooms conveniently organized along two integer axes shown below.
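The lattice map can be sketched directly. This assumes the room spans [0, length] along each axis and that a lattice index n along an axis means |n| alternating reflections, starting with the far wall for positive n and the near wall for negative n. These conventions and all names are assumptions made for illustration.

```python
def image_coordinate(x: float, length: float, n: int) -> float:
    """Position along one axis of the image-source in image-room n."""
    # Even image-rooms are translated copies; odd ones are mirrored copies.
    return n * length + (x if n % 2 == 0 else length - x)


def image_source(source, room, index):
    """Image-source position for an integer lattice index, axis by axis."""
    return tuple(image_coordinate(x, l, n) for x, l, n in zip(source, room, index))


def reflection_order(index) -> int:
    """Total number of reflections: the taxi-cab norm of the lattice index."""
    return sum(abs(n) for n in index)


source, room = (1.0, 2.0), (4.0, 3.0)
print(image_source(source, room, (0, 0)))   # (1.0, 2.0), the source itself
print(image_source(source, room, (1, -1)))  # (7.0, -2.0)
print(reflection_order((1, -1)))            # 2
```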

It should be apparent that the number of valid image-sources is no longer exponential with respect to the max reflection order $r$, but quadratic in the 2D case and, in the general case, $O(r^D)$ for $D$ dimensions. The question now arises as to whether we can do better, especially for higher dimensions, where quantities grow more quickly and become non-trivial to compute.

Next posts: We determine the bounds for the number of image-sources within different radii defined under the taxi-cab distance and then the more practically useful Euclidean distance.

The taxi-cab distance on the lattice map is equivalent to the max order of image-sources. Counting the number of image-sources will bound the computations required to process individual image-sources within DSP pipelines (e.g. updating a tapped delay-line). See post. Related: linear algebra and dynamic programming

The Euclidean distance on the lattice map gives the time-interval or sampling period of which image-sources appear. Counting the number of image-sources between two distances gives an energy density or amplitude profile of an impulse response allowing us to forgo processing individual image-sources. See part 1 and part 2. Related: Gauss Circle problem, dynamic programming, and Fourier analysis