This blind test proved that even the “best” music streaming service can flop in the right conditions.
Audiophiles love to argue about which streaming service sounds the best. But what happens when you take bias out of the equation?
Three engineers at ABYSS Headphones ran a blind test using a $30,000 headphone system. Spotify, Apple Music, Tidal, Qobuz, and Amazon Music all went head to head. The results turned expectations upside down and showed how personal the perception of sound quality really is.
The Blind Test and Its Results
To see how music streaming services really stack up, the engineers at ABYSS Headphones set up a blind listening session using their high-end headphone system.
They tested Spotify, Apple Music, Tidal, Qobuz, and Amazon Music. To keep things fair, they used an external DAC that displayed the resolution of each track in real time.
But while the gear stayed the same, the rest didn’t. Each engineer brought in their own set of songs, which meant they were all judging different music. That small detail had a big impact on the results.
And those results were all over the place:
- Joe ranked Tidal first, then Qobuz, Apple Music, Spotify, and Amazon Music last.
- Jason put Apple Music and Amazon Music in a tie for first, followed by Qobuz and Tidal, with Spotify at the bottom.
- And Eric had a completely different order: Tidal came first again, followed by Amazon, then Apple Music, Spotify, and, surprisingly, Qobuz in last place.
That last one caught everyone off guard. After all, Qobuz is known for high-resolution audio and is a favorite among audiophiles. But during the test, Eric didn’t think it sounded great and ranked it at the bottom.
Taken together, these individual rankings point to the following overall order:
- Tidal – Ranked 1st by both Joe and Eric, 4th by Jason
- Apple Music – Ranked 3rd by both Joe and Eric, but Jason’s tie for 1st brings it up
- Amazon Music – Wide variation, from Jason’s tie for 1st to Joe’s last place
- Qobuz – Ranked 2nd by Joe and 3rd by Jason, but Eric’s last place hurts it
- Spotify – Ranked 4th or 5th by all three engineers
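The article does not say how this overall order was derived. One simple method that reproduces it is to average each service's rank position across the three engineers. The sketch below assumes that method; the variable and function names are illustrative only, and Jason's tie is treated as two 1st places followed by a 3rd.

```python
from statistics import mean

# Rank positions as reported above (1 = best).
# Assumption: Jason's tie gives Apple Music and Amazon Music rank 1 each,
# and the next service on his list gets rank 3.
rankings = {
    "Joe":   {"Tidal": 1, "Qobuz": 2, "Apple Music": 3, "Spotify": 4, "Amazon Music": 5},
    "Jason": {"Apple Music": 1, "Amazon Music": 1, "Qobuz": 3, "Tidal": 4, "Spotify": 5},
    "Eric":  {"Tidal": 1, "Amazon Music": 2, "Apple Music": 3, "Spotify": 4, "Qobuz": 5},
}


def mean_ranks(rankings):
    """Average each service's rank position across all listeners."""
    services = next(iter(rankings.values())).keys()
    return {s: mean(r[s] for r in rankings.values()) for s in services}


# Sort services from lowest (best) to highest (worst) average rank.
overall = sorted(mean_ranks(rankings).items(), key=lambda kv: kv[1])

for service, avg in overall:
    print(f"{service}: {avg:.2f}")
```

With only three listeners, a single ranking can reorder the whole list, so the averaged order is best read as a rough summary.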
Even experienced audio engineers, it turns out, can’t agree on streaming quality. Although they used identical equipment, each engineer ranked the services differently.
Some even placed the same service at opposite ends of their lists: Amazon Music was tied for first on Jason’s list and last on Joe’s.
Why Qobuz Didn’t Perform as Expected

Out of all the results, Eric’s ranking of Qobuz was the most surprising. While Qobuz is known for offering high-resolution audio and is often favored by audiophiles, Eric placed it dead last.
How did that happen?
During the blind test, he didn’t know which service was playing at first. But when one track came on (“Angel” by Sarah McLachlan), he called it “highly compressed” and “shockingly bad.”
He even guessed, “This is probably Spotify,” based on how it sounded. But later, when another unfamiliar track played, he took another guess: “If I had to guess, this is Qobuz.”
Eric kept noticing things he didn’t expect. One track, he said, had a strange “strobing echo,” and he wondered out loud if it was Atmos or some other DSP effect.
When the final rankings were revealed and he saw Qobuz at the bottom of his list, he reacted not with shock but with curiosity.
That curiosity points to a fair question: was the problem the service, or the master it was streaming? Unfortunately, most streaming platforms do not provide detailed metadata about the exact mastering used for each track. This includes whether a track is from a loud CD master, a special audiophile remaster, or even a remastered vinyl transfer.
While some services like Qobuz may label certain tracks as “Hi-Res” or display the resolution (e.g., 24-bit/96 kHz), this doesn’t guarantee transparency about which master was used.
So there is no way to know for certain whether it was a mastering issue or a platform issue.
Why Engineers Couldn’t Agree on the Best Service

One big reason the results were all over the place? The engineers didn’t listen to the same music.
Each person picked their own set of tracks for the test. That made it less of a shared comparison and more like three different listening sessions. They didn’t plan it that way, but it had a real impact on how they judged the services.
Joe leaned on experience. He chose songs he has known for decades and used them as benchmarks. He said he could tell when a version didn’t match what he was used to hearing, especially after hearing those tracks on vinyl, CD, and high-res digital formats over the years.
Eric did the opposite. He avoided songs he already knew so he could focus only on what he was hearing in the moment.
That approach didn’t quite click with Joe, who compared it to a blind taste test without knowing what Coke is supposed to taste like.
Jason fell somewhere in the middle. He’s a regular Apple Music user and was more familiar with how it typically sounds. That comfort may have shaped some of his choices, though he later said he probably would’ve picked different songs if he could do it again.
None of this became clear until the rankings were revealed. That’s when they realized just how differently each person had approached the test.
“Everybody came in at a different angle to the same test. And you didn’t figure it out until after the test was over,” says Jason.
“That’s confusing when you think about it because it’s like this is why people can’t seem to agree online as to what their favorite is.”
Eric ran into this firsthand. On one track by Linkin Park, he actually preferred how it sounded on Spotify. It felt smoother and less harsh. That surprised him, especially since Spotify is usually seen as lower quality.
But it showed that what sounds “best” isn’t always about technical specs.
There were also differences in file resolution. They tried to max out the quality settings on every service, and the external DAC showed what kind of signal was actually coming through.
Sometimes it was high-res, other times just CD quality.
“So if you’re just kind of rocking through this stuff and you’re not paying attention to that, you don’t even know what you’re playing. There’s a lot of confusion in that.”
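To put the gap between “CD quality” and “high-res” in numbers: the raw PCM data rate is bit depth times sample rate times channel count. The sketch below assumes stereo, the standard CD format (16-bit/44.1 kHz), and the 24-bit/96 kHz figure cited earlier as an example of hi-res. The function name is illustrative, and these are uncompressed rates, not what any service actually streams after lossless compression.

```python
def pcm_bitrate_kbps(bit_depth, sample_rate_hz, channels=2):
    """Raw (uncompressed) PCM bit rate in kilobits per second."""
    return bit_depth * sample_rate_hz * channels / 1000


cd_quality = pcm_bitrate_kbps(16, 44_100)  # 16-bit / 44.1 kHz stereo
hi_res = pcm_bitrate_kbps(24, 96_000)      # 24-bit / 96 kHz stereo

print(f"CD quality: {cd_quality:.1f} kbps")
print(f"Hi-res:     {hi_res:.1f} kbps")
print(f"Ratio:      {hi_res / cd_quality:.2f}x")
```

A higher data rate only describes the container. As the Qobuz result suggests, it says nothing about which master was used to fill it.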
In my home, Banzai Republic, 0.32” reverberation.
Easy way to test headphones, speakers, amplifiers, etc.
So it won’t be an Abyss headphone, then!
This process would be similar to comparing speakers while choosing your own songs. Of course, some speakers might sound better with various types of music. Streaming apparently does, too. And this: “Joe leaned on experience. So, he chose songs he’s known for decades and used them as benchmarks.” Decades-old songs don’t have enough fidelity to tell you anything. I wonder what types of amplifiers were used. Many amps are designed to sound warm or to mask the distortion and the inherent flaws in class AB amplification and vinyl records, audiotapes, and other messy sources. So what is transmitted as a clean stream of bits may have been “messed with” quite a bit before it got to the listeners’ ears. So, as you say more than once, the test fails because the listeners are not comparing apples to apples. The test succeeds where a person ends up saying, “I like streamer X for the music I like to listen to.”