These AI Headphones Let Users Focus on One Voice With Just a Glance

Hear only who you want to, even in a busy place.

The best part is, you can test this out now.

Researchers from the University of Washington have created a system called “Target Speech Hearing” (TSH). This AI-powered creation allows you to zero in on a single speaker’s voice in a noisy environment – just by looking at them.

It’s not yet commercially available. But the great news is, the code for the proof-of-concept device is openly accessible for others to build upon and experiment with.

“We tend to think of AI now as web-based chatbots that answer questions. But in this project, we develop AI to modify the auditory perception of anyone wearing headphones, given their preferences,” said senior author Shyam Gollakota.

“With our devices, you can now hear a single speaker clearly even if you are in a noisy environment with lots of other people talking.”

How the AI-Powered Target Speech Hearing System Works

Demonstration of how the Target Speech Hearing system works.

The TSH system uses AI to separate a target speaker’s voice through a simple process.

You just need to look at the person for 3-5 seconds while tapping a button. Because you are facing the speaker, their voice reaches the binaural microphones on both sides of the headset at nearly the same time, within a 16-degree margin of error. The system uses this cue to “enroll” the speaker and remember the unique sound of their voice.

That captured audio then goes to an embedded computer, where machine learning software learns the speaker’s vocal patterns. This helps TSH filter out other voices and environmental noise, leaving you with a clear audio channel for the enrolled speaker – even as they move or turn away.
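As a rough illustration of the look-to-enroll idea, the sketch below estimates the direction a sound comes from using the delay between a left and a right microphone signal, then checks whether it falls within the 16-degree margin. This is not the researchers' method or code. The cross-correlation approach, the function names, the 0.18 m microphone spacing, and the 48 kHz sample rate are all assumptions chosen for illustration.

```python
import math
import random

SPEED_OF_SOUND_M_S = 343.0   # assumption: air at room temperature
MIC_SPACING_M = 0.18         # assumption: rough ear-to-ear distance
MAX_ANGLE_DEG = 16.0         # margin of error quoted in the article


def best_lag(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += sample * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle_deg(left, right, sample_rate_hz):
    """Estimate the horizontal arrival angle (0 = straight ahead).

    Positive angles mean the source is toward the left microphone.
    """
    max_lag = math.ceil(MIC_SPACING_M / SPEED_OF_SOUND_M_S * sample_rate_hz)
    delay_s = best_lag(left, right, max_lag) / sample_rate_hz
    # Far-field model (assumption): delay = spacing * sin(angle) / speed
    ratio = max(-1.0, min(1.0, delay_s * SPEED_OF_SOUND_M_S / MIC_SPACING_M))
    return math.degrees(math.asin(ratio))


def is_facing_speaker(left, right, sample_rate_hz):
    """True if the estimated angle is within the 16-degree margin."""
    return abs(arrival_angle_deg(left, right, sample_rate_hz)) <= MAX_ANGLE_DEG


# Synthetic example: noise that reaches both microphones at once,
# then the same noise arriving 20 samples late at the right microphone.
rng = random.Random(0)
burst = [rng.uniform(-1, 1) for _ in range(400)]
delayed = [0.0] * 20 + burst[:-20]

print(is_facing_speaker(burst, burst, 48_000))
print(is_facing_speaker(burst, delayed, 48_000))
```

In this toy setup, a sound that arrives at both microphones simultaneously counts as "straight ahead," while a noticeable delay at one ear pushes the estimated angle outside the margin. A real system would also have to cope with reverberation, head shadowing, and overlapping talkers, which this sketch ignores.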

There’s a caveat, though. For now, TSH can only enroll one speaker at a time. Its performance may also suffer if a louder voice comes from the same direction as the target.
The researchers used Sony WH-1000XM4 headphones during their tests.

What’s great is that this technology utilizes off-the-shelf headphones.

For example, the researchers tested this with modified Sony WH-1000XM4 headphones. They simply fitted them with binaural microphones (Sonic Presence SP15C) and an Orange Pi 5B embedded computer for processing.

This setup suggests that TSH could be integrated into a wide range of consumer audio devices.

Co-authors of the research include Bandhav Veluri, Malek Itani, Tuochao Chen, and Takuya Yoshioka. The research was funded by the Moore Inventor Fellow award, Thomas J. Cable Endowed Professorship, and UW CoMotion Innovation Gap Fund.

Testing and Results

The test results in noisy environments.

The Target Speech Hearing system was presented at the ACM CHI Conference on Human Factors in Computing Systems in Honolulu. There, the researchers shared the results of their tests with 21 subjects.

On average, the subjects rated the clarity of the enrolled speaker’s voice nearly twice as high as that of unfiltered audio.

Plus, TSH’s performance improved as the enrolled speaker kept talking, because the system gained more audio data to refine its model of the speaker’s voice.

The system should still be effective even if the target speaker isn’t directly in front of you.

The system also proved effective in noisy environments.

In fact, the numbers show that the TSH system achieved a signal quality improvement of 7.01 dB using less than 5 seconds of noisy enrollment audio. This is only a 0.4 dB drop compared to quieter environments.
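To put those decibel figures in perspective, the short sketch below converts them to linear power ratios. It assumes the reported "signal quality improvement" is a power-based measure such as a signal-to-noise improvement, which the article does not spell out. The 7.41 dB figure for quieter enrollment is inferred from the 0.4 dB drop rather than quoted directly.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a decibel value to a linear power ratio."""
    return 10 ** (db / 10)


NOISY_ENROLLMENT_GAIN_DB = 7.01  # figure quoted in the article
DROP_VS_QUIET_DB = 0.4           # figure quoted in the article

# Inferred from the two figures above, not quoted in the article.
quiet_enrollment_gain_db = NOISY_ENROLLMENT_GAIN_DB + DROP_VS_QUIET_DB

noisy_ratio = db_to_power_ratio(NOISY_ENROLLMENT_GAIN_DB)
quiet_ratio = db_to_power_ratio(quiet_enrollment_gain_db)

print(f"Noisy enrollment: {NOISY_ENROLLMENT_GAIN_DB:.2f} dB = {noisy_ratio:.2f}x")
print(f"Quiet enrollment: {quiet_enrollment_gain_db:.2f} dB = {quiet_ratio:.2f}x")
```

Under that assumption, a 7.01 dB gain means the target voice ends up roughly five times stronger relative to the interference than it was before processing, and the 0.4 dB gap between noisy and quiet enrollment is a comparatively small difference.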

Unfortunately, the system isn’t perfect against interference from another overpowering voice in the same direction.

But, in such cases, the user can simply re-enroll the speaker’s voice to isolate it better.

The Target Speech Hearing technology builds on the team’s prior “semantic hearing” research. That earlier system lets users select specific sounds, like birds or voices, while canceling out other environmental sounds.

Real-World Applications and Future Plans

The TSH system holds promise for various real-world applications.

From enabling clearer communications in crowded venues to improving hearing aids, this tech could redefine how we experience sound in noisy environments.

Excitingly, the researchers already plan to integrate TSH into earbuds and hearing aids. Their vision also covers the hardware side, with AI chips potentially costing under $10 per unit at scale.

“We are excited to see how this technology can be integrated into everyday devices to improve communication in noisy environments,” said Shyam Gollakota.

Plus, in true open-source spirit, the TSH system’s code, neural networks, and AI algorithms are openly available on GitHub. This means AI enthusiasts and developers can experiment and expand on this foundation, which the researchers hope will help them improve the tech further.
