Modern DACs Are Failing the Same Null Test That Once Humiliated $6,000 Amps in 1985

The part of a DAC that gets all the marketing barely registers on testing.

The difference between two DAC chips inside the same device is vanishingly small, according to a null test that has been embarrassing the audio industry since 1985.

Craig Stark, a neuroscience professor at UC Irvine, ran the test on downloadable recordings from a DAC with selectable chip/filter modes. And for an industry that markets converter chips and sound modes as meaningful upgrades, his measurements leave less to sell.

This test is not new audio skepticism dressed up in software, though. In 1985, Bob Carver used the same basic method to make a roughly $400 amplifier electrically mimic $6,000 tube monoblocks closely enough that experienced listeners struggled to pick them apart.

Four decades later, Stark aimed the same logic at modern DAC marketing.

Where the Difference Should Have Been

The principle behind a null test is straightforward enough. Play two recordings simultaneously with one signal inverted, and if they match, the result is silence. Whatever survives the cancellation is the measurable difference between the two sources.

“This is a brutal test because it’s going to isolate any amount of difference that’s actually there in the signal,” Stark explained.
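The subtraction itself is simple enough to sketch in a few lines. The Python below is only an illustration of the principle, not DeltaWave's implementation: the 1 kHz test tone, the 0.1% gain error, and the function names are all assumptions made up for the example, and it skips the time and level alignment a real tool performs.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def null_depth_db(reference, comparison):
    """Level of the residual relative to the reference, in dB.

    Subtracting is the same as inverting one signal and summing.
    More negative means a deeper null (a smaller difference).
    """
    residual = [a - b for a, b in zip(reference, comparison)]
    residual_rms = rms(residual)
    if residual_rms == 0.0:
        return float("-inf")  # perfect cancellation: digital silence
    return 20.0 * math.log10(residual_rms / rms(reference))

# Hypothetical signals: one second of a 1 kHz tone at 48 kHz.
sample_rate = 48_000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(sample_rate)]
identical = list(tone)
slightly_louder = [s * 1.001 for s in tone]  # assumed 0.1% gain mismatch

print(null_depth_db(tone, identical))        # perfect null
print(null_depth_db(tone, slightly_louder))  # residual about 60 dB below the tone
```

An identical copy cancels completely, while even a 0.1% level mismatch leaves a residual roughly 60 dB down, which is why level matching matters before reading anything into a null.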

To validate his setup, he used DeltaWave, a null-testing application developed by Paul K. A USB capture compared against itself cancelled out, while USB compared against Bluetooth left visible artifacts.

And with the controls confirmed, he then used the same subtraction test on two kinds of DAC evidence: recordings from selectable modes inside one DAC, and recordings from two separate DACs.

Testing selectable modes inside one DAC

The first comparison started with files from Audio Masterclass, not with Stark recording the device himself.

Audio educator David Mellor of Audio Masterclass had recently reviewed the SMSL RAW-MDA 1 and done something most reviewers skip. Instead of relying only on listening impressions, he published downloadable .wav files recorded from the DAC’s selectable modes, giving Stark raw captures to test without YouTube compression.

Stark loaded two of those files into DeltaWave, aligned them, and subtracted one from the other. If the modes produced a large sonic difference, the subtraction should have left a large residual. It did not.
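Alignment matters because two captures rarely start on the same sample, and an offset of even a few samples would ruin the null. As a rough sketch of the align-then-subtract idea, the Python below searches for the time shift that maximizes the correlation between two recordings. This is a common textbook approach and an assumption on our part, not a description of how DeltaWave works internally; the function names and the 7-sample delay are hypothetical.

```python
import random

def best_lag(reference, comparison, max_lag):
    """Return the shift (in samples) at which `comparison` best matches `reference`."""
    def correlation(lag):
        # Sum of products over the region where the two signals overlap.
        total = 0.0
        for i in range(len(reference)):
            j = i + lag
            if 0 <= j < len(comparison):
                total += reference[i] * comparison[j]
        return total
    return max(range(-max_lag, max_lag + 1), key=correlation)

def align(comparison, lag, length):
    """Shift `comparison` by `lag` samples, zero-padding where it runs out."""
    return [
        comparison[i + lag] if 0 <= i + lag < len(comparison) else 0.0
        for i in range(length)
    ]

# Hypothetical captures: the second one starts 7 samples late.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(500)]
delayed = [0.0] * 7 + reference

lag = best_lag(reference, delayed, max_lag=20)
aligned = align(delayed, lag, len(reference))
residual = [a - b for a, b in zip(reference, aligned)]

print(lag)                             # the detected delay in samples
print(max(abs(r) for r in residual))   # largest leftover sample after alignment
```

Once the delay is removed, the two captures in this toy example cancel exactly. Real recordings also need sub-sample timing and level correction, which this sketch leaves out.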

“If you are saying to yourself that there is a huge difference between these chips and you can tell them the instant you walk into the room listening or something like this… the difference between these is vanishingly, vanishingly small,” he said.

A DeltaWave null test (From: Craig Stark)

What survived was mostly small transient material, which Stark tied to reconstruction-filter behavior rather than a broad tonal shift.

That matters because the device appears to use two instances of the same ES9039Q2M chip, so this was not a clean contest between rival DAC chip brands. The useful point is narrower and stronger: in this DAC, the selectable chip/filter paths produced differences too small to support the promise of dramatically different “sounds.”

Testing two separate DACs

If selectable paths inside the same DAC produce near-silence on a null test, the question becomes whether separate DACs from different manufacturers hold up any better.

Stark ran the same methodology on a JoliCali JM6 Pro and a Schiit Modi. These are separate devices with different converter chips, different analog output stages, and different price points.

And unlike the Audio Masterclass files, this comparison changed the whole DAC implementation: chip, analog output stage, clocking, power supply, filters, and board layout.

So if Stark’s first test challenged the marketing value of selectable chip/filter paths inside one DAC, this one challenged the broader idea that competent modern DACs should sound obviously different.

“Here is the big moment of truth. The incredible amount of difference that you can actually hear between these two. Ready? […] Yeah, I don’t know about you. I’m not hearing a darn thing,” he said.

DeltaWave spectrum analysis (From: Craig Stark)

After amplifying the separate-DAC null residual by 20 dB, subtle differences surfaced on the spectrum analyzer, including clock jitter, power-supply switching noise, and slight analog-circuit variations. But Stark still described those leftovers as below normal hearing thresholds.
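For a sense of scale, decibels of gain convert to an amplitude ratio with a standard formula, so a 20 dB boost makes the residual ten times larger in amplitude. The short Python sketch below shows the conversion. The -100 dBFS starting level is a made-up figure for illustration only, not a number from Stark's measurements.

```python
import math

def db_to_amplitude_ratio(db):
    """Convert a gain in dB to a linear amplitude (voltage) ratio."""
    return 10 ** (db / 20)

def amplitude_ratio_to_db(ratio):
    """Convert a linear amplitude ratio back to dB."""
    return 20 * math.log10(ratio)

boost_db = 20
print(db_to_amplitude_ratio(boost_db))  # 10.0, i.e. ten times the amplitude

# Hypothetical residual level, chosen only to illustrate the arithmetic.
residual_dbfs = -100
print(residual_dbfs + boost_db)         # -80: louder, but still far below full scale
```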

A Method With a 1985 Pedigree

Stark’s DAC results are harder to wave away because the method behind them has embarrassed high-end audio claims before.

In 1985, Stereophile Magazine challenged Bob Carver to prove that the difference between two amplifiers could be reduced to something measurable. The target was a pair of Conrad-Johnson Premier Five tube monoblocks worth more than $6,000. Carver’s amplifier cost roughly $400.

Working in Stereophile’s own offices in New Mexico, Carver used null difference testing to compare his amplifier against the expensive reference. He adjusted the cheaper amp until the difference signal between the two became electrically negligible, then asked the magazine’s editors to identify which amplifier was which.

“Stereophile employees failed to pass a single blind test with their own equipment in their own listening room,” the company later noted.

A vintage Carver amplifier tied to the famous 1985 blind testing challenge. (From: Stereophile)

The point was not that all amplifiers sound the same, though. It was that a claimed sonic difference could be forced into the open. And if the difference was real and large, it would survive subtraction. But if it disappeared into the null, the listening claim had to shrink with it.

Stark’s DAC tests belong to that same tradition. DeltaWave is software rather than a bench full of analog gear, but the logic is the same: align two signals, invert one, and listen to what remains. Four decades after the Carver Challenge made a $6,000 amplifier argument measurable, Stark used the same basic method on modern DAC recordings whose differences were marketed as obvious.

His connection to that older testing culture is personal as well as methodological. His father, Craig Stark Sr., was an editor at Stereo Review magazine, where listening claims were often checked against measurement. But the force of Stark’s DAC test does not come from family history. It comes from the null.

Why Engineers Aren’t Surprised

The engineering community’s reaction to Stark’s findings has been something closer to “obviously.” John Siau, VP of Engineering at Benchmark Media Systems, has been making this argument for years.

“Never assume that the sound of a converter is determined by the D/A chip!” Siau wrote. “Two converters with identical digital chip sets can have wildly different performance.”

The thing is, roughly 90% of a D/A converter’s components are analog. After the chip converts the signal, the surrounding analog and electrical implementation still determine how much noise, distortion, jitter, and filtering behavior reaches the listener.

That distinction explains both of Stark’s tests. The Audio Masterclass files compared two chip paths inside the same device, with the same analog stage and power supply. So the tiny residual Stark found, mostly transient artifacts he tied to different reconstruction filters, makes sense.

Meanwhile, the JoliCali JM6 Pro versus Schiit Modi comparison was tougher because the entire DAC changed. Yet even there, Stark heard no meaningful residual. Only after boosting the leftover signal by 20 dB did clock jitter, power supply noise, and small analog-circuit differences become visible.

So the engineering lesson is narrower than the marketing claim. Modern DACs can differ electrically, and bad implementation can still matter. But Stark’s results make chip identity look like a poor predictor of what listeners will actually hear.

The Spec Sheet Premium

The engineering consensus hasn’t reached the marketing department.

For instance, the SMSL RAW-MDA 1, a roughly $280 DAC that houses dual ES9039Q2M chips of the same model, markets itself on offering “1000 sounds” through various filter and color mode combinations. Those are the kinds of selectable paths captured in the Audio Masterclass files Stark tested, and the null result made the promise of dramatic sonic variety look much larger on the spec sheet than in the signal.

Listeners who actually tried to hear the difference were underwhelmed.

“I couldn’t hear any difference between any of your samples,” one commenter noted.

At $280, the dual-chip feature is at least inexpensive enough to be harmless.

But scale the same marketing logic up to flagship pricing and the math gets uncomfortable. The McIntosh MCD12000, which retails between $12,000 and $15,000, builds its identity around ES9038PRO converter chips that cost approximately $90 each at retail.

That does not mean the rest of the player is worthless, or that flagship products are only the sum of their chips. It means the opposite: the expensive, difficult part is the implementation, build quality, support, and brand positioning, not the converter-chip name printed in the brochure.

What Trained Listeners Actually Hear

The spec-sheet critique is strongest when the differences disappear under testing. But not every blind test says DAC differences are imaginary.

Blind listening test results showed many listeners struggled to hear meaningful differences between DAC setups. (From: Archimago)

In 2024, audio blogger Archimago ran a blind listening test with 105 audiophiles comparing an Apple USB-C dongle, a Linn Majik DS with Dynamik PSU, and a Linn Klimax DSM/2. Participants downloaded 24/96 FLAC files captured from those three DACs and ranked the samples by preference, so the test measured preference rankings, not whether listeners could name the chips or identify the products.

The overall results weren’t statistically significant, landing at a p-value of .175. Nearly half the participants, 43%, either heard no meaningful difference or concluded the gap wasn’t worth paying for.

Archimago also found a secondary wrinkle: listeners with systems under $2,000 ranked the samples closer to the devices’ objective performance and price hierarchy than listeners with more expensive rigs. That does not mean cheaper systems hear better, and it does not prove those listeners identified the DACs by name. It mainly weakens the easy assumption that spending more on playback gear automatically makes subjective DAC judgment more reliable.

However, headphone listeners told the stronger story. Among 42 participants using headphones, the p-value fell to .00084. With room acoustics stripped away, their rankings were much harder to explain as chance.

A p-value of .00084 means that if the headphone listeners had no real preference, rankings this consistent would turn up by chance less than 0.1% of the time. The conventional threshold for statistical significance is .05, or 5%.
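To make the idea of a p-value concrete, here is a minimal Python sketch under loudly stated assumptions. It does not reproduce Archimago's analysis or data. It imagines a simpler, hypothetical test in which each of 42 listeners picks one favorite out of three samples, and asks how often pure guessing would produce at least as many votes for a single sample as the made-up count below.

```python
import math

def binomial_tail(n, k, p):
    """Probability of k or more successes in n independent trials with chance p each."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical scenario, NOT Archimago's data or method:
# 42 listeners each pick a favorite among 3 samples.
# If they are guessing, any given sample has a 1/3 chance per listener.
listeners = 42
votes_for_one_sample = 25  # made-up count for illustration
chance_per_listener = 1 / 3

p_value = binomial_tail(listeners, votes_for_one_sample, chance_per_listener)
print(f"p = {p_value:.5f}")
print("significant at .05" if p_value < 0.05 else "not significant at .05")
```

The number printed is the probability of seeing a result at least that lopsided if everyone were guessing. It is not the probability that the listeners were guessing, which is a different and harder question.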

This result keeps the argument from becoming too simple. DACs can sound different under the right conditions. But the Apple dongle and Linn devices differed in their full DAC implementation, not just their chip identity. The evidence still points away from chip identity and toward the engineering around it.

💬 Conversation: 2 comments

Craig Stark

May 17, 2026 at 9:47 am

It sounds like someone should loan me a Linn DSM/2. I’ve got the Apple Dongle and the DeltaWave setup already… But seriously, none of this says that DACs can’t sound different. Heck, even within the same DAC, GoldenSound’s ability to reliably hear up to 20kHz showed he could pick out different reconstruction filters under the right circumstances. That was both an audible and a measurable difference. As you note, there’s a lot that surrounds that DAC chip and there’s certainly room for things to alter the sound.

If we turn to Archimago’s results, the best thing we, as a community, could do here is to simply run that same test again and see if we get the same results. Given my day-job as a scientist who does a ton of stats for a living on noisy human behavior (BTW, Archimago’s great on this!), replication is the key to the realm. If the effects were night and day, we’d expect that last graph to be a bit more consistent. Why does the Apple Dongle have the lowest “mid” ranking for example? That’s a bit odd… Toss in the fact that these weren’t corrected for multiple comparisons (run 100 tests, set a p-value at 0.05 and you’ll certainly find some that pass) and we’ll want a replication to be convinced for sure.

But, it was fun to revisit this kind of test and really impressive to see that there’s great software out there now to do this digitally. As a hobby, we’ll be better off the more we understand just what does, and what doesn’t, improve our sound.

Reticuli

May 19, 2026 at 12:12 am

So he demonstrated something we already knew from the engineering and measurements. And IEM > over ear > speaker for critical listening. What’s astounding is how a $20 IEM is now more revealing than a $2,000 over ear that is more revealing than a $20,000 speaker.