[REQUEST] add 4th order (or more) ambisonics #1031

Open
TheBestKmanEver opened this issue Aug 16, 2024 · 7 comments

Comments

@TheBestKmanEver

for both HRTF and speaker layout

@kcat
Owner

kcat commented Aug 17, 2024

Higher orders would be nice to support. For fourth order, the main issue is being able to take advantage of it: speaker layouts aren't typically dense enough to utilize higher orders. For instance, a 7.1 surround sound setup doesn't really benefit from even third order, since there aren't enough speakers to resolve the spatial detail without succumbing to the problems ambisonic panning attempts to address (sounds sticking to speakers, or having holes in between, as they pan around). HRTF may have more room to benefit with denser datasets, though less dense datasets will have the same problem if the virtual speakers don't line up well with the responses' locations. If there are ways to improve the output with higher orders despite having a low density of speakers, I'm not aware of them. The main use would be outputting it directly to some external processor that can make use of the spatial resolution, and/or saving it for future-proofing.

Orders above 4th may need more work to support, as there are places in the library that expect to use bits of an integer in relation to ambisonic channels. 32-bit integers allow up to 32 channels, which is fine for 4th order at 25 channels, while 5th order is 36, 6th order is 49, and 7th order is 64, which would need 64-bit integers. The panning coefficients for the higher order channels will also be needed, which I currently only have up to 4th order noted down. The NFC coefficients will also need expanding for higher orders, presuming the NFC filters will even still work at such higher orders.
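As a rough illustration of the arithmetic above: a full-sphere ambisonic mix of order N has (N + 1)^2 channels, so a one-bit-per-channel mask fits in 32 bits up to 4th order and needs 64 bits for 5th through 7th. The sketch below is only that arithmetic; the function names are hypothetical and nothing here reflects how OpenAL Soft actually stores its channel masks.

```python
def ambisonic_channels(order: int) -> int:
    """Full-sphere (3D) ambisonic channel count for an order: (order + 1)^2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def mask_bits_needed(order: int) -> int:
    """Smallest common integer width (in bits) with one bit per channel.

    Hypothetical helper for illustration only.
    """
    channels = ambisonic_channels(order)
    for width in (8, 16, 32, 64):
        if channels <= width:
            return width
    raise ValueError(f"{channels} channels do not fit in a 64-bit mask")


if __name__ == "__main__":
    for order in range(1, 8):
        print(order, ambisonic_channels(order), mask_bits_needed(order))
```

The same formula gives the 1296 channels mentioned later in the thread for 35th order, which is far beyond any single-integer mask.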

@ThreeDeeJay
Contributor

Doesn't hrtf-mode=full (default) already use all available HRTF positions independently from ambisonics order?
Or is this for an edge case that explicitly requires an external ambisonics spatializer instead of letting OpenAL Soft spatialize each source individually for better positional accuracy (as much as the HRTF density allows)?

Speaking of, how dense would the HRTF need to be to reproduce the equivalent of 35th Order Ambisonics? 🤔
IIRC that's 1296 channels, and SADIE II has ~3-9K positions, so I wonder how many would be enough since we need more (virtual) speakers than ambisonics channels and a high enough order for HRTF to match the real position:

3OA: [image]

35OA: [image]

Source: https://www.brucewiggins.co.uk/?p=1095

@kcat
Owner

kcat commented Sep 15, 2024

Doesn't hrtf-mode=full (default) already use all available HRTF positions independently from ambisonics order?

Yes, but full HRTF mode adds more CPU cost per source than using the ambi* modes, so people may want to use them. I also regularly think about changing the default to one of the ambi modes to reduce the CPU overhead when a number of sources are playing, and because effects process an ambisonic mix, which would also make the final mix more consistent.

Speaking of, how dense would the HRTF need to be to reproduce the equivalent of 35th Order Ambisonics? 🤔
IIRC that's 1296 channels, and SADIE II has ~3-9K positions, so I wonder how many would be enough since we need more (virtual) speakers than ambisonics channels and a high enough order for HRTF to match the real position:

Full 3D 35th order would need more than 1296 channels (it's not clear exactly how many, but more is better as long as it's not overboard). But just as importantly, they have to be placed in an appropriate layout, which would need a decent amount of precision. Concentric rings at fixed elevation and azimuth steps, like mhr files use, aren't great for that without the step size being very low (only a few degrees), otherwise multiple channels would end up at the same location.
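To make the layout point concrete, here is a minimal sketch under stated assumptions: the "appropriate layout" is approximated by a Fibonacci lattice of 1296 directions (my choice for illustration, not something the thread specifies), and the measured positions are modeled as a plain grid with the same step in elevation and azimuth (a simplification of how ring-based datasets are laid out). Each target direction is snapped to the nearest ring and the nearest azimuth step on it, and the code counts how many distinct grid positions remain.

```python
import math


def fibonacci_sphere(n):
    """n roughly evenly spread directions as (elevation, azimuth) in degrees."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        elev = math.degrees(math.asin(z))
        azim = math.degrees((i * golden_angle) % (2.0 * math.pi))
        directions.append((elev, azim))
    return directions


def snap_to_rings(elev, azim, step):
    """Snap a direction to the nearest ring, then the nearest azimuth step.

    Assumes an integer step (degrees) that divides 90. This is a simple
    per-axis rounding, not a true nearest-neighbor search on the sphere.
    """
    e = round(elev / step) * step
    if abs(e) >= 90:
        return (math.copysign(90, e), 0)  # every azimuth at a pole is one point
    a = (round(azim / step) * step) % 360
    return (e, a)


def distinct_after_snap(n, step):
    """How many different grid positions n target directions end up using."""
    return len({snap_to_rings(e, a, step) for e, a in fibonacci_sphere(n)})


if __name__ == "__main__":
    for step in (15, 5, 1):
        print(step, distinct_after_snap(1296, step))
```

With a 15 degree step the grid only has 266 distinct positions (11 rings of 24 plus the two poles), so most of the 1296 targets necessarily share a location; shrinking the step is what lets them separate.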

@junh1024

Even for ITU 5.1, 4th/5th order may be beneficial, since the angle between L and C is 30°, but the resolution of 3OA is only 45°; 5OA has a resolution of 30°.

@ThreeDeeJay
Contributor

ThreeDeeJay commented Sep 16, 2024

Yes, but full HRTF mode adds more CPU cost per source than using the ambi* modes

Are we talking like significant usage even on modern desktop CPUs or just low-end/mobile devices? 🤔

Full 3D 35th order would need more than 1296 channels (it's not clear exactly how many, but more is better as long as it's not overboard). But just as importantly, they have to be placed in an appropriate layout, which would need a decent amount of precision. Concentric rings at fixed elevation and azimuth steps, like mhr files use, aren't great for that without the step size being very low (only a few degrees), otherwise multiple channels would end up at the same location.

Dang, I just noticed that SADIE uses just ~1/3 IRs of the total ~3K, because of incompatible layout, I presume 😔
At least EAC only loses about a hundred

On a side note, could OpenAL Soft add Higher-Order Ambisonics decoding to apps like sView (limited to 1OA) or should every app add HOA decoding on its own like Virtual Home Theater which already supports up to 3OA?

@kcat
Owner

kcat commented Jan 28, 2025

Even for ITU 5.1, 4th/5th order may be beneficial, since the angle between L and C is 30°, but the resolution of 3OA is only 45°.

It's not really that simple. You get full-sphere resolution even with first order. As a sound moves around the listener, the volume levels of the X, Y, and Z channels change in relation to each other, and you can detect subtle movements even on quad output, since the mix to each of the four speakers changes to alter the focus direction (it doesn't take much change to be perceptible). You can definitely make out differences of less than 45 degrees in first-order ambisonics, let alone second- or third-order. The angle between a given channel's lobes may be around that, but it's the relative differences between all channels that define the direction of a sound, which in turn create relative differences in all output speakers to focus the sound in the intended direction. Dual-band/frequency-dependent processing can also increase the perceived sharpness by accounting for how the brain localizes low vs high frequencies.
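A small sketch of that point, assuming a horizontal-only first-order encode (W, X, Y) and a very simple cardioid-style projection decode to a square quad layout. The decoder weights and speaker angles here are assumptions chosen for illustration; this is not OpenAL Soft's decoder.

```python
import math

# Assumed square quad layout; azimuth is counter-clockwise from front.
SPEAKERS_DEG = {"FL": 45.0, "BL": 135.0, "BR": -135.0, "FR": -45.0}


def encode_first_order_horizontal(azimuth_deg):
    """Encode a unit-gain source on the horizontal plane to (W, X, Y)."""
    a = math.radians(azimuth_deg)
    return 1.0, math.cos(a), math.sin(a)


def decode_quad(w, x, y):
    """Project (W, X, Y) onto each speaker direction (illustrative weights)."""
    gains = {}
    for name, deg in SPEAKERS_DEG.items():
        s = math.radians(deg)
        gains[name] = 0.25 * (w + x * math.cos(s) + y * math.sin(s))
    return gains


if __name__ == "__main__":
    for azimuth in (0.0, 5.0, 10.0):
        gains = decode_quad(*encode_first_order_horizontal(azimuth))
        print(azimuth, {k: round(v, 4) for k, v in gains.items()})
```

Even a 5 or 10 degree move toward the left raises the front-left gain and lowers the front-right gain, although the speakers themselves are 90 degrees apart.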

Where higher orders help is directional clarity, especially when multiple sounds mix together. Lower orders have a more diffuse response: a sound has a wider cone of influence around the listener (a cone that is focused in the exact direction of the sound, and fades toward the edge of the cone). If you have a sound coming in from one direction, and the same sound coming in from the exact opposite direction, their cones overlap such that the sound seems to come from all directions equally, regardless of being left and right, or front and back, or top and bottom (the X, Y, and Z channels all cancel to 0 with a diametrically opposed pair, leaving W at full strength). Higher orders help resolve this kind of directional ambiguity, providing extra channels that won't cancel out so readily when sounds are mixed together (left and right mixing together can be distinguished from front and back mixing together, which can be distinguished from top and bottom, so you can discern the direction of any diametrically opposed pair). And with more speakers, those extra channels help focus the sounds in their separate directions.
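The cancellation described above can be checked numerically. The sketch below assumes ACN channel ordering with SN3D normalization (a common convention, chosen here for illustration) and encodes unit directions up to second order, then mixes diametrically opposed pairs.

```python
import math


def encode_second_order(x, y, z):
    """Encode a unit direction (x front, y left, z up), ACN order, SN3D.

    Returns 9 gains: W, then Y, Z, X (first order), then V, T, R, S, U.
    """
    s3 = math.sqrt(3.0)
    return [
        1.0,                        # W
        y, z, x,                    # first order: odd in the direction
        s3 * x * y,                 # V
        s3 * y * z,                 # T
        0.5 * (3.0 * z * z - 1.0),  # R
        s3 * x * z,                 # S
        0.5 * s3 * (x * x - y * y), # U
    ]


def mix(*directions):
    """Sum the encoded gains of several equal-level sources."""
    total = [0.0] * 9
    for d in directions:
        for i, gain in enumerate(encode_second_order(*d)):
            total[i] += gain
    return total


if __name__ == "__main__":
    pairs = {
        "left+right": mix((0, 1, 0), (0, -1, 0)),
        "front+back": mix((1, 0, 0), (-1, 0, 0)),
        "top+bottom": mix((0, 0, 1), (0, 0, -1)),
    }
    for name, channels in pairs.items():
        print(name, [round(c, 3) for c in channels])
```

For all three pairs the first-order part comes out as W = 2 with Y, Z, and X at zero, so first order alone cannot tell them apart. The second-order R and U channels are even functions of direction, so they add instead of cancelling, and they differ between the three pairs.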

ITU 5.1 in particular has some interesting challenges with ambisonics. There are three front channels within 60 degrees of each other, but only two side/surround channels to cover the 140 degrees in the back, and ambisonics really doesn't like this kind of asymmetry. Given a passive matrix decoder, higher-order channels can become a liability without enough speakers, since they create a push-and-pull effect on the speakers as a sound moves around, and they have a larger influence for relatively smaller movements. So higher orders could help focus sound using the three front speakers that are closer together, while being detrimental to the surround speakers' ability to focus the sound, as they're farther apart. There's been a lot of work to try to find good second-order decoder matrices for 5.1, and while there are usable options, the best results depend on modified layouts that break the ITU 5.1 recommendation (spreading the speakers out into a more regular distribution, not so clustered in front).

Are we talking like significant usage even on modern desktop CPUs or just low-end/mobile devices? 🤔

Performance can still be an issue on modern desktop CPUs, given enough simultaneous sources.

Dang, I just noticed that SADIE uses just ~1/3 IRs of the total ~3K, because of incompatible layout, I presume 😔
At least EAC only loses about a hundred

Yeah, and not only would it need more than 1000 HRIRs, but each source channel would need to be mixed to the 1296 ambisonic channels, and HRTF would need to be applied to each. Even for just 4th order, each source channel would need to mix to 25 channels, and HRTF would need to be applied to 25 channels (definitely more practical, but certainly not cheap).

On a side note, could OpenAL Soft add Higher-Order Ambisonics decoding to apps like sView (limited to 1OA) or should every app add HOA decoding on its own like Virtual Home Theater which already supports up to 3OA?

OpenAL Soft is designed for real-time 3D audio processing. With some of the available extensions, an app can use it to decode first-, second-, or (kind of) third-order ambisonics to a discrete channel layout, even if that's not its intended usage. (OpenAL Soft doesn't currently support a channel layout that would be third-order, except HRTF with appropriate config options; it would essentially just decode at the order the channel layout is for.) Whether that's advisable depends on the quality of the decoder they would otherwise implement themselves.

@alex-schroedsen

alex-schroedsen commented Feb 2, 2025

In the event that ambi4 and higher channel modes are implemented, I have modified and compiled the IEM suite of Ambisonic plugins to support the JACK audio backend on Windows: https://github.com/alex-schroedsen/iem-plugin-suite-jack-api-win/releases/tag/1.14.1-JACK-win. IEM supports up to 7th-order Ambisonics. I have also compiled OpenAL 1.24.2 with JACK as its only backend here: openal-soft-1.24.2-JACK-win-bin.tar.gz

Here is a screenshot of one of the compiled programs in action.

Image
