Neural Amp Modeler Architecture 2 (NAM A2) is out, but is it as good as everyone says? By the end of this article, you’ll understand all of the data and whether or not the hype is real.
Not only is NAM A2 more accurate than TONEX and Neural DSP, but even the Lite version meant for embedded systems beats them! A2 Full crushes them even further.
General Improvements
NAM A2 Full is better than A1 Standard in a number of ways:
- sounds more accurate
- measures better in the lab
- reduces aliasing-like artifacts
- virtually eliminates high frequency ringing in high-gain amps
- larger receptive field, a proxy for good playing feel
- reduces CPU usage by 30–40%
If that’s not enough reason to update, I don’t know what is. Oh yeah, it’s still free! A2 Lite is lighter still, and although it scores lower than Full, it still beats the competition listed in the official NAM A2 Complete Guide.
All is not cherries and whipped cream, however: compatibility only runs one way. A2 hosts can load A1 models, but A1 hosts cannot load A2 models. It appears you’ll need NeuralAmpModelerCore v0.5.2, plugin v0.7.14, and training code v0.13.0 or later for A2 support.
It’s worth mentioning that TONE3000, the home base for all your A2 needs, has already retrained the existing A1 library to A2, except malformed captures and zero-download models older than 60 days.
Objective Tests
While the previous section sounded like your typical press release, the NAM people are all about that data. Virtually every audio product is developed with objective measurements, even if not every company uses them in marketing. In my experience, the companies that lead with data usually have it on their side. The TONE3K article is chock-full of goodies, but you can’t digest what you can’t understand. Me to the rescue!
A2 was subjected to a battery of tests including:
- ESR (Error-to-Signal Ratio): This is pretty much a null test: the energy of the residual (original minus model) divided by the energy of the original. Lower is better.
- MAE (Mean Absolute Error): Pixel peeping for audio nerds. This compares waveforms sample by sample and averages the absolute difference.
- LOG_MEL: This splits the sound into bands spaced on the mel scale, which tracks pitch perception, and compares their levels on a log scale. It’s similar to Bark bands, which are tied to perceptual masking. Basically, it measures frequency content and how we perceive it.
- MRSTFT (Multi-Resolution Short-Time Fourier Transform): No, not Mrs. Tuft 🙂 This compares spectrograms computed with several FFT window sizes: long windows resolve tonal detail, while short windows catch transients. It strikes me as very similar to equalizers built on the same idea.
- Elo: Not a test itself, but a summary score, not unlike a chess rating. Since no one test gives the whole picture, they wrap all of them into this tortilla of a score so you can digest it more easily.
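To make the two time-domain metrics concrete, here’s a minimal Python sketch of ESR and MAE as they’re usually defined. It assumes NumPy and two equal-length, time-aligned mono signals. The function names are mine, and the exact implementation behind the NAM numbers may differ (for example, in how signals are filtered or aligned first).

```python
import numpy as np


def esr(target, prediction):
    """Error-to-Signal Ratio: residual energy over target energy (lower is better)."""
    residual = target - prediction
    return float(np.sum(residual**2) / np.sum(target**2))


def mae(target, prediction):
    """Mean Absolute Error: average sample-by-sample distance (lower is better)."""
    return float(np.mean(np.abs(target - prediction)))


# Toy example: a 1 kHz sine vs. a "model" that comes out 1% too quiet.
sr = 48_000
t = np.arange(sr) / sr
target = np.sin(2 * np.pi * 1_000 * t)
prediction = 0.99 * target

print(esr(target, prediction))  # about 0.0001, since the residual is 1% of the signal
print(mae(target, prediction))
```

Note how ESR squares the error and normalizes by the signal’s energy, so it reads the same at any playback level, while MAE stays in raw sample units.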
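The spectral metrics (LOG_MEL and MRSTFT) can be sketched with plain NumPy. This shows one common way to build a multi-resolution STFT distance and a log-mel distance. The FFT sizes, hop length, mel band count, and the 2595/700 mel formula are standard choices I picked for illustration, not the settings used in the NAM tests, and all function names are hypothetical.

```python
import numpy as np


def stft_mag(x, n_fft, hop):
    """Magnitude spectrogram (frames x frequency bins) using a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def mrstft_distance(target, prediction, fft_sizes=(256, 1024, 4096), eps=1e-7):
    """Average of spectral convergence + log-magnitude L1 over several FFT sizes.

    Short windows (256) resolve transients; long windows (4096) resolve tonal detail.
    """
    total = 0.0
    for n_fft in fft_sizes:
        s_t = stft_mag(target, n_fft, n_fft // 4)
        s_p = stft_mag(prediction, n_fft, n_fft // 4)
        spectral_convergence = np.linalg.norm(s_t - s_p) / np.linalg.norm(s_t)
        log_mag = np.mean(np.abs(np.log(s_t + eps) - np.log(s_p + eps)))
        total += spectral_convergence + log_mag
    return float(total / len(fft_sizes))


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale (n_mels x frequency bins)."""
    bin_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, len(bin_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_distance(target, prediction, sr=48_000, n_fft=2048, n_mels=64, eps=1e-7):
    """Mean absolute difference between log mel spectrograms."""
    fb = mel_filterbank(n_mels, n_fft, sr)
    mel_t = stft_mag(target, n_fft, n_fft // 4) @ fb.T
    mel_p = stft_mag(prediction, n_fft, n_fft // 4) @ fb.T
    return float(np.mean(np.abs(np.log(mel_t + eps) - np.log(mel_p + eps))))


# Toy usage: one second of noise as a stand-in for an amp capture,
# and a "model" output with a little extra noise on top.
sr = 48_000
rng = np.random.default_rng(0)
target = rng.standard_normal(sr)
prediction = target + 0.01 * rng.standard_normal(sr)
print(mrstft_distance(target, prediction))
print(log_mel_distance(target, prediction, sr))
```

Both functions return 0 for a perfect copy and grow as the spectra drift apart; the log keeps quiet bands from being drowned out by loud ones.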
Subjective Test
The final arbiter of quality is the ear. You need objective measurements because the ear is easily fooled, but at the end of the day, this model is for people to use. Nay, music people, an elite breed altogether. And 1,000 of us took up the challenge of a four-hour sonic examination known as MUSHRA.
MUSHRA stands for MUSHRoom Acid. One wishes! Actually, it means MUltiple Stimuli with Hidden Reference and Anchors. That means:
- Multiple Stimuli: Instead of an ABX test, where you decide whether an unknown X matches A or B, you rate seven or so files side by side on a 0 to 100 scale against a known reference.
- Hidden Reference: One of the files is an exact copy of the original reference, hidden among the choices.
- Anchors: A low-passed file meant to anchor low scores.
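Making an anchor is simple. Here’s a crude brick-wall low-pass in NumPy as a sketch. The 3.5 kHz cutoff is a common choice for MUSHRA low anchors, but the cutoff used in the NAM test isn’t given, so treat it as an assumption, and `lowpass_anchor` is a hypothetical helper name.

```python
import numpy as np


def lowpass_anchor(x, sr=48_000, cutoff_hz=3_500.0):
    """Crude brick-wall low-pass: zero every FFT bin above cutoff_hz.

    Good enough for a deliberately degraded test file; a gentler filter
    (for example, a Butterworth) would ring less around the cutoff.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


# A 1 kHz tone passes through; a 10 kHz tone is removed.
sr = 48_000
t = np.arange(sr) / sr
low_tone = np.sin(2 * np.pi * 1_000 * t)
high_tone = np.sin(2 * np.pi * 10_000 * t)
print(np.allclose(lowpass_anchor(low_tone, sr), low_tone))   # True
print(np.max(np.abs(lowpass_anchor(high_tone, sr))) < 1e-9)  # True
```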
In addition, a post-screening step boots any trolls, or people with their speakers off, who can’t reliably tell the anchor from the reference, leaving only clean results.
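Here’s a minimal sketch of that kind of screening. It assumes a listener fails an item when they rate the hidden reference below 90 or rate the anchor at or above the reference, and gets dropped when they fail more than 15% of items. These thresholds and the `keep_listener` name are illustrative; the exact rules used for the NAM test aren’t spelled out here.

```python
import numpy as np


def keep_listener(ref_scores, anchor_scores, ref_min=90.0, max_fail_rate=0.15):
    """Return True if a listener passes a simple MUSHRA-style post-screen.

    ref_scores / anchor_scores: one listener's 0-100 ratings of the hidden
    reference and the anchor, one entry per test item. An item counts as a
    failure if the hidden reference is rated below ref_min, or if the anchor
    is rated at or above the hidden reference. Thresholds are illustrative.
    """
    ref = np.asarray(ref_scores, dtype=float)
    anchor = np.asarray(anchor_scores, dtype=float)
    failed = (ref < ref_min) | (anchor >= ref)
    return bool(np.mean(failed) <= max_fail_rate)


careful_listener = keep_listener([100, 95, 98, 100], [20, 30, 15, 25])
speakers_off = keep_listener([50, 40, 60, 55], [55, 45, 50, 60])
print(careful_listener, speakers_off)  # True False
```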
Results
Here’s the moment you’ve been waiting for. What does it all mean, Hexspa? TELL US. Well, first, a quick stats brush-up:
- Median: Line up all the scores in order and pretend there are 100. The median is the middle score, roughly the 50th from the bottom.
- P10: This would be the score 10th from the bottom (10th percentile).
- Q1: Think of it as P25, the 25th score from the bottom (first quartile).
- Q3: Like Q1 but from the other end, P75 or third quartile.
- P90: The best gun in CS2, or the 90th percentile score.
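Here’s how those numbers fall out in practice, using NumPy on a hypothetical set of 100 scores (1 through 100). NumPy interpolates between neighbors by default, so the median of 100 scores is 50.5, halfway between the 50th and 51st.

```python
import numpy as np

# Hypothetical ratings: 100 listeners who scored 1, 2, ..., 100.
scores = np.arange(1, 101)

summary = {
    "P10": np.percentile(scores, 10),
    "Q1": np.percentile(scores, 25),
    "Median": np.median(scores),
    "Q3": np.percentile(scores, 75),
    "P90": np.percentile(scores, 90),
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
# P10: 10.90, Q1: 25.75, Median: 50.50, Q3: 75.25, P90: 90.10
```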
A2 Full Crushed All!
I’m gonna focus on A1 Standard vs. A2 Full. You can use this framework to figure out the rest for yourself.
Objective Results
- Median ESR: 0.0062 for A1, 0.0033 for A2 (roughly half the error, so nearly twice as good).
- Elo: 1,834 for A1, 2,039 for A2 (more Elo/higher cumulative objective goodness).
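To get a feel for what a 205-point Elo gap means, here’s the standard chess-style expected-score formula. This assumes the NAM Elo uses the usual 400-point logistic scale, which I haven’t confirmed. Under that assumption, A2 Full would be expected to win about three out of four head-to-head comparisons against A1 Standard.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B on the 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


a2_vs_a1 = elo_expected_score(2039, 1834)
print(f"{a2_vs_a1:.2f}")  # 0.76
```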
Subjective Results
- P10: 70.4 A1, 77.8 A2 (biggest jump).
- Q1: 87 A1, 90 A2 (small but significant).
- Mean (average) for Hidden Reference: 97.5 (i.e., the perfect copy didn’t score a perfect 100).
- Mean A1 to A2: 88.8 to 92.44 (3.64 points higher on the 0 to 100 MUSHRA scale, leaving A2 just 5.06 points below the hidden reference vs. 8.7 points for A1).
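If you want to check the math yourself, here’s the arithmetic on the quoted mean scores. It prints the point difference, the relative gain (about 4.1% over A1’s mean, which is my own calculation rather than a published figure), and each model’s gap to the hidden reference.

```python
# Mean MUSHRA scores quoted above (0-100 scale).
hidden_reference = 97.5
a1_mean = 88.8
a2_mean = 92.44

print(f"A2 - A1: {a2_mean - a1_mean:.2f} points")                             # 3.64
print(f"Relative gain: {100 * (a2_mean - a1_mean) / a1_mean:.1f}%")           # 4.1%
print(f"A1 gap to reference: {hidden_reference - a1_mean:.2f} points")        # 8.70
print(f"A2 gap to reference: {hidden_reference - a2_mean:.2f} points")        # 5.06
```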
Not Covered
Not every question has been answered, however. The two notable absentees are Kemper and Fractal, the pros’ choices for over a decade and a half. Honestly, I had never even heard of Line 6 Proxy, so I’m not shocked that A2 beat it.
Also, MUSHRA is simply a test for similarity, a copycat test, if you will. It does not measure preference, and accuracy does not always equal preference. If it did, none of you would even use NAM, because the whole point of guitar amps is to make your guitar sound different!
Of course, none of this covers ecosystem, user sentiment, or even playing feel, though the receptive field is supposed to act as a proxy.
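For the curious, the receptive field is how many past input samples can influence each output sample, which is why it gets used as a proxy for feel. For a WaveNet-style stack of dilated convolutions, the kind of architecture NAM’s standard models have been built on, it’s easy to compute. The kernel size and dilation pattern below are hypothetical, not the published A1 or A2 configurations.

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Number of input samples that can influence one output sample."""
    # Each dilated layer widens the window by (kernel_size - 1) * dilation samples.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def samples_to_ms(samples: int, sample_rate: int = 48_000) -> float:
    return 1000.0 * samples / sample_rate


# Hypothetical config: two stacks of dilations 1, 2, 4, ..., 512 with kernel size 3.
dilations = [2**i for i in range(10)] * 2
rf = receptive_field(3, dilations)
print(rf, f"{samples_to_ms(rf):.1f} ms")  # 4093 85.3 ms
```

Doubling the dilation at each layer is what lets a modest number of layers look back thousands of samples.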
Takeaway
The takeaway here is that A2 beats A1 on every published metric, both objective and subjective. That’s fine. I’m not focusing on comparisons with the third parties mentioned, because I’d rather let them speak for themselves. I don’t use IK Multimedia’s TONEX or Neural DSP’s products, so don’t call me butt hurt. I DON’T CARE, OK? Just let NAM be NAM, nom nom nom.
Yuuge props to NAM/TONE3000 for putting out the datas. Bigly respect. So many companies hide, misrepresent, downplay, or otherwise crap on good measurements, which makes this campaign refreshing. Heck, it even pulled me out of retirement!
So while none of this means that NAM is the best amp sim on the market, it definitely proves that it’s far from the worst. They’ve set a benchmark here that no one else has dared to compete with in a public way.
Congrats to the NAM/TONE3K team, cheers to the leet muso community, and join the Hexie Dose newsletter for more content like this, straight to your inbox.
-Michael
