Guide: stereo to binaural conversion for headphone listening

Post by **satshanti** » Tue Jun 03, 2014 9:30 pm

Hello everyone!

Some folks at the MQN thread have asked me to share my quest into binaural headphone playback, so I decided to start this new thread. I’m afraid the first post will be somewhat long, but I think it’s going to be worth it. I published part of it before somewhere, but have added a lot of new bits. To make it all more bearable (and fun), I’ll use easily digestible episodes. :-)

Part 1: Introduction
First, an appetizer, download: Virtual Barbershop

If you know it already, great, if you don’t, listen to it on headphones straight up, and be amazed! Now THAT is what I would call a 3-axis 360 degree sound stage!

I’m going to share with you my quest for a similarly effortless, realistic, natural, life-like, transparent sound stage through headphones. I thought to myself: wouldn’t it be absolutely great, if I could listen to my favourite music with the same life-like presence as the virtual barbershop. That’s what started me on my quest and although in the end I have not managed to place my favourite musicians in a virtual space exactly as sharply defined as the barbershop, I have come very close!

Here are some pre-processed samples in 16/44 flac I've created by the process I'll explain here. They can be played directly through headphones and will give you a good idea of what to expect:

Binaural Samples

Part 2: All the World’s a Sound Stage?
I guess this chapter is not going to be news to most of you, but I’ll include it anyway to paint a complete picture and as preparation for what comes next. From the very beginning, music recording has been focused on reproduction through loudspeakers rather than headphones. That’s why almost all recordings to date are “stereophonic” recordings. In modern sound studios stereo mixes are created from multiple mono tracks, but in the old days a stereo recording was made by placing two microphones a certain distance apart, and realistic playback of a 3-dimensional “stereo-image” was possible through two speakers similarly placed a certain distance apart, a phenomenon all of us know very well. This type of recording was and still is meant to be heard through a pair of loudspeakers in order to unfold and re-create its inherent 3D-image or sound stage. If heard through headphones however, each channel that’s supposed to be heard by both the left and the right ear, is instead heard only by one ear, causing the stereo-image to collapse into a flat line between both ears.

This is the (not much of a) “sound stage” that we perceive through headphones while listening to stereophonic music recordings in our natural state of hearing. Note the two issues I have underlined, which I will get into separately.

Part 3: Binaural Minority Report
The Virtual Barbershop (VB) is NOT a stereophonic recording. It’s a binaural recording, tweaked digitally by way of a proprietary algorithm. The binaural recording technique is one specifically designed to be played back through headphones. Two microphones are placed in a dummy head, where our eardrums are located. If the dummy would be an exact plastered copy of our own head and ears, we would have no need of digitally enhancing the recording. Anything would then sound exactly as the VB. I’m sure you can understand why. In order for it to create its 3D realism to such an extent in no matter which pair of human ears, the digital algorithm that’s whispered into your ear at the end of the clip is used. It enhances the so-called head-related transfer functions (HRTF) of the recorded sounds. This is what creates the main difference between the perception of front and rear sounds. Of the very few binaural recordings that are made, only some will give you that exact front/rear positioning like the VB. I'll tell you why: some binaural recordings are recorded with a Jecklin Disc, or a dummy head without ears, so typically the perceived space is placed either 180 degrees behind you OR 180 degrees in front of you, rather than the full 360 like the VB. It’s our ears, and in this case I mean those funny pieces of meat sticking out of the sides of our heads, that allow us to discern between a sound coming from the front or the rear. They screen the sounds coming from the rear more than they do the sounds coming from the front. The way sound is altered because of our outer ear is determined by these HRTF. So the first clue I followed was the mysterious algorithm that was whispered in my left ear. But first, as I promised above, I‘d like to share my experience with natural hearing and the lack thereof!

Part 4: The Red or the Blue Pill
Our brain is an amazing thing capable of performing awe-inspiring feats. As we come into the world, our ears (that is our brain) don’t have as yet the capacity to locate sounds. We have to slowly start learning to interpret those slight phase-shifts in sound, those reflections and diffractions that are caused by the unique shape of our ears and our head (HRTF). We would lose that capability if we would suddenly lose our ears or be outfitted with differently shaped ears, at least at first, but as we would get used to those new ears, we would slowly gain that capability again. This shows that we are able to “re-program” our brain in order to preserve our capacity for 3-dimensional hearing. In case of the new set of ears we merely have to continue our inherent capability for “natural” hearing, based on those subtle HRTF cues, so although the transitional adaptation period might be slightly confusing and tiring for our brain, once re-programmed, we are again able to listen effortlessly to the sounds in the world around us. Now, what does this have to do with anything?

Let’s talk about headphone fatigue. This is the reason why I personally always preferred listening to speakers rather than headphones. While we listen to stereophonically recorded music through headphones, our brain is receiving auditory information that is in some way distorted and unnatural. Some aspects of it, like the frequency spectrum and the timing, are OK, but the directional cues are plainly NOT there in the way our brain is used to receiving them. So rather than give up and leave us with the narrow between-the-ears stereo-image we actually perceive in that natural state, our brain starts the process of re-programming itself in order to re-instate the illusion of natural positional hearing. This takes time and effort. It does cause fatigue, but after some time our wonderful brain IS actually able to have us believe that we are listening to a speaker-like sound stage. And the more we get used to it, the less fatiguing it gets and we are happy. It’s not exactly natural, and it still does take some small effort for the brain to maintain the illusion, but it kind of works, and at least there are absolutely no changes made to the frequency spectrum, the timing or resonant harmonics of the source.

For a long time this was the one and only choice available for headphone listening, but now I’m going to offer you a pill of a different colour. What if we could spare the brain the initial time and effort to re-program itself for headphone listening and the continuous effort it takes to uphold an illusion? I believe that fatigue is still occurring to almost everyone who's gotten used to headphone listening, because it just takes much more effort to translate those invalid auditory cues into a coherent sound stage, at least compared to the natural HRTF phase-based cues.

I’m not the first one to get the idea of some sort of pre-processing to make the sound more natural. Some headphone amp makers started experimenting with hardware-based cross-feed circuits, so that’s one of the things I started experimenting with.

Part 5: Cross-Feed, just a gimmick?
Cross-feed, as the name implies, feeds a little bit of the left channel into the right, and vice versa.
I found out that there were actually some Foobar plugins offering software cross-feed. There's the Bauer stereophonic-to-binaural DSP. The name was very promising and having played around with it and its settings, I liked it. It emulates various hardware based cross-feed circuits and makes subtle changes to the sound. It helps the brain a little bit more with deciphering spatial cues and building a small-scale sound stage, but still leaves something for the brain to do: expanding the soundstage outward; all in all, a good compromise, but not the end of the journey. At this point in time I found the virtual barbershop demo, clearly demonstrating that even more should be possible, so I started digging deeper.
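
For readers who want to see what cross-feed does to the samples, here is a minimal sketch in plain Python. It is not the Bauer plugin's actual algorithm: the function name and the level, cutoff and delay values are illustrative assumptions. The idea is only that each ear also receives a quieter, low-passed, slightly delayed copy of the opposite channel, roughly as it would from a loudspeaker on the other side.

```python
import math

def crossfeed(left, right, sample_rate=44100, level_db=-6.0,
              cutoff_hz=700.0, delay_us=300.0):
    """Mix an attenuated, low-passed, delayed copy of each channel
    into the opposite one. All default values are assumptions."""
    gain = 10.0 ** (level_db / 20.0)                  # dB -> linear factor
    delay = int(round(delay_us * 1e-6 * sample_rate))  # delay in samples
    # Coefficient of a simple one-pole low-pass filter.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)

    def bleed(channel):
        out, state = [], 0.0
        for sample in channel:
            state += alpha * (sample - state)   # low-pass
            out.append(gain * state)            # attenuate
        # Delay by prepending zeros, then trim to the original length.
        return ([0.0] * delay + out)[:len(channel)]

    from_right = bleed(right)
    from_left = bleed(left)
    norm = 1.0 + gain  # keeps the sum within the original peak range
    out_left = [(l + b) / norm for l, b in zip(left, from_right)]
    out_right = [(r + b) / norm for r, b in zip(right, from_left)]
    return out_left, out_right
```

Feeding it a signal that exists only in the left channel produces a quieter, later copy in the right channel, which is the "little bit of the left channel into the right" described above.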

Part 6: Positional Audio
I started searching for that mysterious Cetera algorithm responsible for the WOW-factor in the Virtual Barbershop. I found that the demo was created by a manufacturer of hearing aids called Starkey. The Cetera algorithm was the software part of a hearing aid developed in the late nineties, in cooperation with another company called QSound Labs. This company, then as well as now, specializes in a wide spectrum of 3D audio solutions. Their technology is implemented in various ways, software as well as hardware based. In fact, I discovered that since the nineties various companies had started research into 3D audio, both for studio purposes, like music recording and movie surround tracks, as well as positional audio for the PC (think first-person shooters). SRS Labs, for instance, has worked in the same field as QSound. Both these companies have developed software packages for the PC, able to process and enhance sound and music in a variety of ways, including headphone surround. I’ll not go into details here, as their products usually are shipped with certain hardware, like PC sound cards, or if sold separately, are only usable as part of the operating system, which means upsampling/downsampling, etc., so that doesn’t really serve the purpose of audiophile music listening.

There’s one company however that I haven't mentioned yet, Lake Technology. This Australian company developed digital audio algorithms for recording studios. One of their algorithms allowed movie studio technicians to use headphones to work with and monitor 5.1 movie tracks. After Dolby Laboratories licensed the technology and then even bought the whole company, it became known as Dolby Headphone. Being a company with a slightly different focus compared to the others mentioned before, Dolby Headphone was licensed to manufacturers of DVD-players and other home-theatre equipment, where its algorithms were hardwired into the signal path.

Investigating Dolby Headphone I stumbled upon a thread at the Hydrogenaudio forum and discovered that someone had had similar thoughts already, and that turned out to be a significant discovery.

Part 7: Dolby Headphone Wrapper
Someone had already developed a great piece of software, called the Dolby Headphone Wrapper (DHW). It’s now an official 3rd party Foobar plugin and using it correctly and in combination with certain other plugins improves regular cross-feed processing by several orders of magnitude.

The Dolby Headphone algorithm is not only built into stand-alone dvd-players, but is also part of a number of commercial software dvd-players for the pc. One little file in particular takes care of it: dolbyHph.dll and the Foobar plugin utilizes that file. It is possibly not 100% legal to distribute it, but there are trial versions of software dvd-players available for download that include that dll-file.

The wrapper converts a 5.1 channel input into a binaural 2-channel output for headphones. Dolby Headphone does work with a 2-channel input as well, but the result won't be as good.
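
Dolby Headphone itself is proprietary, so the following is not its algorithm. It is only a sketch of the generic technique behind turning several speaker channels into two headphone channels: every channel is convolved with a pair of impulse responses (one per ear) and the results are summed. The function names and the impulse responses in any call are placeholders I made up for illustration.

```python
def convolve(signal, kernel):
    """Plain direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def mix_into(bus, contribution):
    """Add 'contribution' to 'bus' in place, growing the bus if needed."""
    if len(bus) < len(contribution):
        bus.extend([0.0] * (len(contribution) - len(bus)))
    for i, value in enumerate(contribution):
        bus[i] += value

def binaural_downmix(channels, impulse_responses):
    """channels: dict of channel name -> samples.
    impulse_responses: dict of channel name -> (left_ear_ir, right_ear_ir).
    Both dicts are hypothetical inputs; real HRIR data would come
    from measurements."""
    left, right = [], []
    for name, signal in channels.items():
        ir_left, ir_right = impulse_responses[name]
        mix_into(left, convolve(signal, ir_left))
        mix_into(right, convolve(signal, ir_right))
    # Pad the shorter ear signal so both have equal length.
    n = max(len(left), len(right))
    left.extend([0.0] * (n - len(left)))
    right.extend([0.0] * (n - len(right)))
    return left, right
```

With six input channels instead of two, each virtual speaker gets its own pair of responses, which may be part of why a 5.1 input gives the processing more to work with than plain stereo.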

So what can we put before DHW to change a 2-channel stereophonic track into a 5.1 surround track? For almost all my music - be it rock, classic, pop or folk - I listen to through headphones, I use my customized DSP chain based on Dolby Headphone. With it I experience something better than a speaker-like soundstage. I feel as if I'm smack in the middle of a live soundstage. Words cannot begin to describe what these DSP’s do to any source of music, no matter how it’s recorded.

So what is the missing link?

Part 8: The Icing on the Cake
A guy named Steve Thomson created a free piece of software called V.I. Stereo to 5.1 Converter VST Plugin Suite (VI) that incorporates a number of algorithms (e.g. ambisonics) to place sounds into the proper place in the 3-dimensional sound stage. It creates a living, breathing atmosphere out of the slightest auditory cues available in the original signal. No matter how the recording is made, as long as it’s stereophonic and not already binaural, VI will create a 360-degree image that is absolutely believable. It is VI that is responsible for placing echoes, resonances and other subtle or not so subtle cues at the proper place in the virtual sound stage, without ever overdoing it in such a way that it’s perceived as unnatural. A singer for instance is typically placed front center, but the acoustic reverberation of the voice that is part of the original recording is placed all around the listener just as it should be if the singer were standing before you in a real room. And this applies to all instruments and sounds. The result is impressive. Of course some recordings work better than others with it, but all in all it’s pretty amazing how intelligently VI and Dolby Headphone work together to create such a realistic sound space. I have spent considerable time finding the setting for VI where the focus and front/rear division is optimal and most realistic. I suggest you start with this and only if you are the experimenting type, change the settings and see if your taste differs from mine.

Part 9: Putting it all together
So, what do you need?

Download the package I prepared on Dropbox. It contains Dolby Headphone Wrapper (foo_dsp_dolbyhp.dll), VI Suite (VI_Setup.zip), VST adapter plugin (foo_vst.dll), SoX resampler (foo_dsp_resampler.dll) and another unnamed but necessary file. Install VI Suite according to the instructions included with it. Place the three Foobar plugins in the components folder of your foobar2000 installation folder. Start Foobar. Open Preferences and go to Components>VST plug-ins. Click Add, navigate to the folder where VI Suite was installed and add VI.dll to the VST list. Click OK and restart Foobar. Then open Preferences again and go to Playback>DSP Manager. Move Dolby Headphone and VI from the right to the left pane to activate these DSPs. If you have a DAC that works at 24-bit/96 kHz, I suggest you add the Resampler (SoX) to the list as well. Make sure the DSPs are listed in the following order: VI, DH, SoX.

Configure Dolby Headphone (click on DSP in list and then click "Configure selected"). Point the wrapper to the dolbyHph.dll file you must have sourced somehow :-) and saved on your PC somewhere (I suggest the Foobar folder). There are 3 choices for your virtual room. I tend to use the DH2 live room, as this is a good compromise between directness and spaciousness. The DH1 reference room is smaller, so less reverb, and the DH3 movie theater is large, so will create a very spacious effect, really impressive and pleasant to just let wash over you, but muddles detail somewhat. It's up to personal preference, but the VI settings I use and my converted samples are all based on room 2. Set amplification at 100% and make sure to leave Dynamic Compression off.

Configure VI. A red settings screen pops up. There are 4 sliders and 3 buttons. The top on/off button should obviously be on. Leave or switch the other buttons off. The sliders are each divided into 100 units. I call the centre point 0, with the leftmost at -50 and the rightmost at 50. Set them as follows:

Width correction: -15
Front ambience: 0
Rear ambience: -10
Rear level: -45
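
In case the centre-zero numbering is confusing, here is the arithmetic as a tiny sketch. It assumes you would rather count notches from the leftmost position (0 to 100); the helper name is made up, and the plugin itself may not show any numbers at all.

```python
# Settings from the list above, expressed as offsets from the centre point.
VI_SETTINGS = {
    "Width correction": -15,
    "Front ambience": 0,
    "Rear ambience": -10,
    "Rear level": -45,
}

def to_slider_units(offset, units=100):
    """Convert a centre-zero offset (-50..50) into a position
    counted from the leftmost end of the slider (0..100)."""
    return offset + units // 2

positions = {name: to_slider_units(v) for name, v in VI_SETTINGS.items()}
```

So "Width correction: -15" sits 35 notches from the left end, and "Rear level: -45" sits only 5 notches from the left end.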

I have spent a lot of time figuring out these optimal settings. They are of course subject to personal preference, so feel free to experiment yourself. I'll warn you though: a few notches can cause the illusion to collapse.

If you decided to resample, configure SoX by setting target samplerate to 96000, quality to best, and leave the rest of the settings as they are.

I also use ReplayGain in the conversion, because especially with test tracks I don't like continuously having to fiddle with the volume knob. If you don't know what it is, or know already you don't want to use it, just skip this paragraph. I set +3dB for tracks without RG info and +6dB for the ones with RG info. For test tracks I choose track source mode and for listening to a whole album I obviously choose album source mode. This also offsets the loss of gain the DSP chain induces. I just apply gain, I don't select prevent clipping according to peak, as this is a useless feature that more often than not nullifies the whole purpose of using RG. If you decide to use RG as well, just add the Advanced Limiter DSP and place it last in the list. This doesn't need to be configured. It's just a simple filter that only touches samples that are actually clipping, and will only ever change anything in rare cases where average gain level is low and peaks are relatively high.
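
To make the gain arithmetic concrete, here is a rough sketch of what happens to the samples. The function names are mine, and the hard clip at the end is only a crude stand-in: foobar2000's Advanced Limiter is more refined than this, but the principle that only samples above full scale get touched is the same.

```python
def apply_gain(samples, track_gain_db=None,
               preamp_with_rg_db=6.0, preamp_without_rg_db=3.0):
    """Scale samples by the ReplayGain value plus the preamp described
    above (+6 dB with RG info, +3 dB without). Samples are floats where
    1.0 is full scale."""
    if track_gain_db is None:
        total_db = preamp_without_rg_db
    else:
        total_db = track_gain_db + preamp_with_rg_db
    factor = 10.0 ** (total_db / 20.0)   # dB -> linear factor
    return [s * factor for s in samples]

def hard_limit(samples, ceiling=1.0):
    """Crude stand-in for a limiter: leave in-range samples alone,
    clamp only those that would clip."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]
```

A track tagged with a ReplayGain value of -6 dB ends up at 0 dB overall with the +6 dB preamp, so its samples pass through unchanged.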

When you've got your DSP chain configured, type a name into the empty field under DSP chain presets and click Save. You can now load this DSP plugin chain with its configuration by simply selecting it in the preset list and clicking Load. The point however is not to use the chain in real time (although you could of course do that too, if you'd like), but only use Foobar to convert the original file into a binaural version. For ease of use you should now create a conversion preset. Right click on any file in your library and select Convert and then the ... (three dots at the bottom).

You are now in the Converter Setup window. There are 4 main parts, which you reach by clicking on the links. Start at the top with Output format. I suggest you use Wav for best quality, but a lossless format is okay too. If your DAC only supports 16/44 you should select 16-bit under Output bit depth and under Dither select always. If your DAC supports 24/96, select 24-bit and under Dither select never or lossy sources only if you sometimes convert MP3's or other lossy formats.
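
For those wondering what dither actually does when going down to 16-bit, here is a small sketch. It uses plain TPDF dither (two uniform random values added before rounding), which is my assumption for illustration; foobar2000's own dither may work differently.

```python
import random

def quantize_16bit(samples, dither=True, seed=None):
    """Convert float samples (1.0 = full scale) to 16-bit integers,
    optionally adding TPDF dither of +/- 1 LSB before rounding."""
    rng = random.Random(seed)
    out = []
    for s in samples:
        value = s * 32767.0
        if dither:
            # Sum of two uniform values gives a triangular distribution.
            value += (rng.random() - 0.5) + (rng.random() - 0.5)
        q = int(round(value))
        out.append(max(-32768, min(32767, q)))  # stay inside int16 range
    return out
```

Without dither, a signal a quarter of a step in size simply rounds to digital silence. With dither it survives as an average level buried in a little noise, which is the whole point when the output has only 16 bits.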

Go back and go to the Destination part. Just read about the various output destination options. That pretty much speaks for itself.

Leave the last link, Other, as it is ("When finished do nothing").

Now we come to the most important part: Processing. Open the submenu. If you want to use ReplayGain, set that up under the relevant header. Now you can simply select the DSP preset you created above and load it. The Active DSPs list should then be populated with your previously selected and configured plugins. Go back. Click the Save button. Select Create new preset. Give it a name and press Enter. You now have created a Binaural Conversion Preset. All you need to do for future use is right click on a track, go to Convert and then click on the preset name you've chosen before. Your file will be converted and saved in the location and under the name you've setup in the Destination part. You can now play this new file on your headphones, preferably with a good headphone amp or DAC with a headphone output, and using a top class music player like MQn.

Wow, that was a long story. Thanks for your patience if you managed to make it this far. :-)

I hope this will give you as much musical pleasure as it has given me already for a number of years.
Enjoy!

Post by **darkpink** » Tue Jun 03, 2014 10:27 pm

That was great! I remember I had a 3D audio experience in Orlando, Florida, in an amusement park. Kept wondering why it was never transferred to hi-fi, but here we are. Thanks for the little booklet you wrote, will dive into it tomorrow.

Post by **minionas** » Wed Jun 04, 2014 7:11 am

Wow satshanti! Huge thanks for such detailed and perfectly clear explanations of your experiences with headphones!
I was also starting to "dig" into this, but obviously I was just at the "infant stage", as I had only discovered Bauer's and the Dolby wrapper, and had only heard about the VST plugin suite :)
I'm sure your findings will guide me the right way to the end!
Thanks again for your efforts and your good will to share it!!!

Post by **Aleg** » Wed Jun 04, 2014 8:44 am

Hi Satshanti

This is indeed worthy of a big thank you, even though I still have to read it properly.

So THANK YOU :-)

Cheers

Aleg

Post by **Diapason** » Wed Jun 04, 2014 9:08 am

What an epic post! Thank you satshanti, that's a fantastic resource.

Post by **tony** » Wed Jun 04, 2014 9:41 am

Wow Man with few posts but it seems whatever is posted is incredibly worthwhile.
Have to put the time aside to try this stuff out. As the lads have said alright many thanks Satshanti.
Welcome to Tirnahifi I never liked the welcome threads on forums but realise now without one it is fairly hard to introduce oneself. I think your introduction is up there with Gordon.

Post by **minionas** » Thu Jun 05, 2014 7:02 am

Had a very brief listen yesterday to a few songs converted by satshanti's instructions and can definitely confirm that the scene was pulled out of my head and projected in front of me. You can really feel 3D locations. Need to listen more, and with more concentration, to draw conclusions about the realism of all this, but definitely a must to audition/try for any headphoner :)
Thanks again satshanti !!!

Post by **satshanti** » Fri Apr 22, 2016 1:58 pm

Hi everyone,

It's been almost two years now since my original post, and during that time I have made some discoveries that improve upon the original, so I finally found the time and the desire to update this thread.

First of all, with the advent of mobile computing and now virtual reality, the use of headphones for positional audio has arrived in the mainstream world. Where a few years ago it was a niche market, now it's spreading fast. You can easily find VST plugins like TB Isone or a virtual device solution like Out Of Your Head to experience a binaural or 3D externalized solution for listening to music, watching video or playing games. These options are still useless however for minimalist audiophile players like MQn, JPlay or WTFplay, because the heavy real-time CPU usage involved defeats the purpose of these players. My method makes it possible to use these great players through headphones with hardly any loss in detail, tonal accuracy and dynamics, but with a huge gain in comfort and sound stage. One has to get used to the expanded image, so quickly switching back and forth between the original and converted track is not recommended. Take some time, close your eyes and imagine the sounds originating in the space around you, rather than in between your ears. The brain needs some time to deprogram itself back from inherently fatiguing regular headphone listening to natural listening.

The original post extensively explains the background story and contains step-by-step instructions on how to set it all up. I've now made it even easier and changed some settings to improve the sound quality as well.

For a quick experience of the result download the now updated audio files (16/44 FLAC) in the following Dropbox folder:

Binaural Samples

These are already processed and can be listened to without setting anything up. They should work on any system, but of course only a high-end headphone rig will do them full justice. I personally use 24/88 wav versions to convert to, but I didn't want to make these files too large.

So here's the simplified method for creating binaural wav files yourself from any 44/16 audio file (preferably lossless, but even MP3's will work).

Download the new package I prepared from Dropbox. It contains VI Suite (VI_Setup.zip), Dolby Headphone Wrapper (foo_dsp_dolbyhp.dll), VST adapter (foo_vst.dll), SoX resampler (foo_dsp_resampler.dll), converter configuration file (foo_converter.dll.cfg) and another unnamed but necessary file, hereafter called file X. Install VI Suite according to the instructions included with it. Place the three Foobar plugins (dll-files) in the components folder of your foobar2000 installation folder (typically /Program Files (x86)/foobar2000/components). Place the configuration file in the configuration folder of your foobar2000 user data folder (typically /Users/your_user_name/AppData/Roaming/foobar2000/configuration), but make sure to make a backup of the existing file first. If you can't find the AppData folder, it means you'll have to switch on the display of hidden folders in Windows. Place file X in the main foobar2000 installation folder. Now start Foobar. Open Preferences and go to Components>VST plug-ins. Click Add, navigate to the folder where VI Suite was installed and add VI.dll to the VST list. Click OK and restart Foobar. Right-click on any file in your library, maybe a test file to test the process on, hover over Convert and click on the three dots (...). The converter setup window now opens. You should see a small list of Saved Presets on the left side. These are the ones I personally use and you will only need the ones that start with DH (Dolby Headphone). You could delete the rest (right-click on individual presets for further options). If you've got a DAC that supports hi-res, use DH24, otherwise DH16. The difference between the track and album versions is just in the naming and location of the converted files. You'll have to change that anyway. So for testing your single file, just click on the DH24 track preset and then on Load. The current settings on the right panel are now updated.
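
If you want to double-check that the files ended up in the right places before starting Foobar, something like the following sketch could help. The function name is made up, and the two folders you pass in are whatever applies on your system (the paths mentioned above are only the typical ones). File X is left out on purpose, since its name isn't given here.

```python
from pathlib import Path

# File names as listed in the package description above.
COMPONENTS = ["foo_dsp_dolbyhp.dll", "foo_vst.dll", "foo_dsp_resampler.dll"]
CONFIG_FILE = "foo_converter.dll.cfg"

def missing_files(install_dir, user_data_dir):
    """Return the expected files that are not where the instructions
    say they should be. An empty list means everything is in place."""
    install_dir = Path(install_dir)
    user_data_dir = Path(user_data_dir)
    expected = [install_dir / "components" / name for name in COMPONENTS]
    expected.append(user_data_dir / "configuration" / CONFIG_FILE)
    return [str(path) for path in expected if not path.is_file()]
```

Call it with your foobar2000 installation folder and your foobar2000 user data folder; anything it returns still needs to be copied.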

First Dolby Headphone needs to be configured for the current preset, so click on Processing, then on Dolby Headphone in the Active DSP's list and then click on Configure selected. Point the wrapper to file X. The rest of the settings should be left as is. Click on OK and then on Back.

Now click on Destination to change the location and possibly the naming convention of your destination file. Click Back. Click Save and either overwrite an existing preset or better yet, create a new one.

If you now click convert, your file will be converted to a 24/88 hi-res binaural file. If your DAC supports 24-bit, use this preset. You can also fiddle with the settings to create 24/96 or even 24/44 files. On my system 88 sounds best. If necessary, use the DH16 preset to output 16/44 files. From now on you can just right click on any track, album or folder in your Foobar library, hover over Convert and directly select the preset you've just updated or created.

That's it. It should all work as described. Let me know if it doesn't or if anything else is unclear.

Enjoy!

Post by **minionas** » Wed Apr 27, 2016 9:05 am

Thanks a lot satshanti for sharing your findings!!!

One quick question about resolution change. You said 24/88 is best on your system. Is it dac or conversion process result: original 16/44 file to sound better converted into binaural 24/88? What do you think?

Post by **satshanti** » Thu Apr 28, 2016 11:08 am

minionas wrote:One quick question about resolution change. You said 24/88 is best on your system. Is it dac or conversion process result: original 16/44 file to sound better converted into binaural 24/88? What do you think?

I'm not sure I understand your question. I'll explain a bit more and hope it will contain the answer you seek.

All my source files for binaural listening are 16/44. They have to be, because the Dolby headphone wrapper (or possibly DH itself, I don't remember) doesn't support anything higher. I do have some original hi-res files and I play those on my speakers, but most of my collection is 16/44 anyway. Foobar processes everything at 32-bit floating point, which means a much higher resolution than the source file. So after using VI and DH, it's just more accurate to then output at the highest bit-depth my DAC supports, which is 24-bit. It just clearly sounds better than output at 16-bit. If my DAC would support 32-bit, I would use that. I ALWAYS dither the output, whether it's 16-bit or 24-bit. It just sounds better.
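
To put a number on the difference in bit depth, here is the usual back-of-the-envelope calculation as a sketch. It only gives the theoretical ratio between full scale and one quantization step, not what any particular DAC achieves.

```python
import math

def dynamic_range_db(bits):
    """Theoretical ratio between full scale and one quantization
    step for a given integer bit depth, in dB (about 6.02 dB per bit)."""
    return 20.0 * math.log10(2 ** bits)

range_16 = dynamic_range_db(16)   # roughly 96.3 dB
range_24 = dynamic_range_db(24)   # roughly 144.5 dB
```

So the 24-bit output has room for about 48 dB more fine detail from the 32-bit float processing than a 16-bit output does.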

Now, what about the sample rate? The original is 44, and changing it wouldn't necessarily improve the sound. There are three options for me: output at 24/44, so leaving the sample rate as is, at 24/88, so doubling it, or at 24/96, upsampling it. I use the SoX plugin for upsampling, as I find this the best. I tested these three options with the result in order of quality: 88 > 44 > 96, but the differences are small, much smaller than the difference between 24-bit and 16-bit output. The reason 96 doesn't sound as good is that SoX has to calculate ALL new samples again, as 96 is not a multiple of 44.1, which involves some inherent inaccuracy. I am not quite sure why 88 does sound slightly better than 44. Theoretically, half of the 88.2K samples are exactly the same as the 44.1K samples, so the wave form in both cases should be quite similar. I think it must have something to do with how my DAC works. I just don't know. I only know 24/88 sounds a bit more solid and fleshed out than 24/44.
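
The "multiple of 44" point can be checked with a few lines of arithmetic. This sketch only looks at where the new samples fall in time; the function names are mine, and whether a given resampler keeps the original sample values at those positions depends on its filter.

```python
from fractions import Fraction

def resample_ratio(source_hz, target_hz):
    """Exact ratio between target and source sample rates."""
    return Fraction(target_hz, source_hz)

def coincide_every(source_hz, target_hz):
    """How often (in output samples) an output sample lands exactly
    on the time position of an original sample."""
    return resample_ratio(source_hz, target_hz).numerator

ratio_88 = resample_ratio(44100, 88200)   # Fraction(2, 1)
ratio_96 = resample_ratio(44100, 96000)   # Fraction(320, 147)
```

At 88.2 kHz every second output sample sits exactly on an original sample position. At 96 kHz that only happens once every 320 output samples, so practically every sample has to be interpolated.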

I hope that clarifies things somewhat.

Tír Na HiFi
