How Headphones Sound (to me)

When it comes to reviewing headphones, there are quite a few different points of comparison I have to look at: build quality, aesthetics, sound, portability, functionality. The most important aspect, though (for me, at least) is sound. But, having no way to directly replicate the sound to my readers, I have to rely on words, and sometimes measurements.

The problem is that not every reviewer relies on the same vocabulary to describe the sound of headphones. While one reviewer might talk about “transient response” or “speed,” another might talk about “note weight” or “development.” While one reviewer calls a headphone “hard,” another might call it “shouty.” While one calls a headphone “fast,” another might call it “bright.”

For those new to the world of audiophilia, I’ve written this absurdly long tome of an article as a primer to the way we talk about sound. However, those who’ve been around the block may also want to check out the later parts of the guide to see what I personally mean when I use certain words.

Measurements

Already the novice audiophile runs into an issue when they’re expected to understand the numerous graphs, charts, and squiggly lines that are put forward by many headphone review websites. MajorHifi doesn’t publish its own measurements yet (we’re working on it!), but I’ll sometimes refer to measurements from other websites like InnerFidelity, DIY-Audio-Heaven, or crinacle’s In-Ear Fidelity. If you already know what all these measurements are, feel free to skip this section.

Frequency response graph of the HD800S, taken from DIY-Audio-Heaven

Frequency response

A frequency response graph, like the one pictured above (taken from the Sennheiser HD800S review on DIY-Audio-Heaven), is a chart where the horizontal axis represents pitch, in Hertz, and the vertical axis represents volume, in decibels. We look at these graphs because no headphone is truly flat when it comes to frequency response – each headphone plays some pitches louder than others. And a headphone’s sound depends quite a bit on what its frequency response graph looks like.
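To get a feel for the vertical axis, it can help to convert between decibels and plain amplitude ratios. Here is a minimal sketch, assuming the usual 20·log10 convention for sound pressure; the function names are made up for illustration.

```python
import math

def ratio_to_db(amplitude_ratio):
    """Convert an amplitude (sound pressure) ratio to decibels."""
    return 20.0 * math.log10(amplitude_ratio)

def db_to_ratio(db):
    """Convert a level difference in decibels back to an amplitude ratio."""
    return 10.0 ** (db / 20.0)

# A 6 dB peak on a graph means roughly double the sound pressure.
print(round(db_to_ratio(6.0), 2))   # 2.0
# Halving the pressure is a drop of about 6 dB.
print(round(ratio_to_db(0.5), 1))   # -6.0
```

So a bump that looks small on a graph can still be a fairly large change in what the driver is actually putting out.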

A good deal of this article will be talking about how boosts at certain frequencies and cuts at others affect sound – and all of this is visible on a frequency response graph.

Waterfall CSD of the HD800S

Other graphs

Besides the frequency response, we also look at things like impulse response, step response, and CSD plots. An impulse response displays what a headphone does in response to an extremely short click (an impulse), which contains all frequencies at once. A step response is the integral of the impulse response: it shows how the driver reacts when the signal jumps to a new level and stays there, which demonstrates additional properties of a driver.
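The relationship between the two graphs can be sketched in a few lines. With sampled data, "taking the integral" of an impulse response simply means keeping a running sum. The impulse response below is invented for illustration, not measured from any real headphone.

```python
def step_from_impulse(impulse_response):
    """Running sum of an impulse response: the discrete version of its integral."""
    total = 0.0
    step = []
    for sample in impulse_response:
        total += sample
        step.append(total)
    return step

# Made-up impulse response: a sharp spike followed by a little overshoot.
impulse = [1.0, -0.5, 0.25, -0.125, 0.0]
print(step_from_impulse(impulse))  # [1.0, 0.5, 0.75, 0.625, 0.625]
```

The wobble before the step settles at its final value is the same overshoot that shows up as the alternating samples in the impulse response.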

CSD stands for Cumulative Spectral Decay, and it is a three-dimensional plot. You can see one above, also taken from the DIY-Audio-Heaven HD800S review. The first two axes are essentially the same as in a frequency response graph; the third is time. When a headphone plays sound, its driver, being a physical object that carries momentum, can’t stop instantaneously, and in fact every driver stops at a different rate depending on pitch. A CSD displays how fast a headphone driver stops playing sound at each frequency.

When a headphone has a particularly long decay at a certain pitch, we call it “ringing” – and sometimes that ringing is audible, while sometimes it isn’t. But when I say a headphone has “ringing,” it means that a particular pitch overstays its welcome, muddying the sound.
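To make the idea of a CSD and of ringing a little more concrete, here is a rough sketch. It is not how DIY-Audio-Heaven or any measurement rig actually produces its plots (real CSDs use windowing and many more frequency points). The toy impulse response, the 6kHz resonance, the sample rate, and all the function names are invented for illustration.

```python
import cmath
import math

SAMPLE_RATE = 48000  # samples per second (assumed)

def toy_impulse_response(ring_hz=6000.0, decay_per_second=2000.0, length=480):
    """A click plus one decaying resonance: a crude stand-in for a driver."""
    response = []
    for n in range(length):
        t = n / SAMPLE_RATE
        ring = 0.5 * math.exp(-decay_per_second * t) * math.sin(2 * math.pi * ring_hz * t)
        click = 1.0 if n == 0 else 0.0
        response.append(click + ring)
    return response

def level_db(samples, freq_hz):
    """Level of one frequency in a block of samples, via a single-bin DFT."""
    acc = 0j
    for n, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE)
    return 20 * math.log10(abs(acc) + 1e-12)

def csd_slices(response, freqs_hz, start_offsets):
    """For each start offset, measure what is still sounding at each frequency."""
    return [[level_db(response[start:], f) for f in freqs_hz] for start in start_offsets]

freqs = [1000.0, 6000.0, 12000.0]
offsets = [0, 48, 96, 144]  # 0 ms, 1 ms, 2 ms, 3 ms at 48 kHz
for start, row in zip(offsets, csd_slices(toy_impulse_response(), freqs, offsets)):
    levels = ", ".join(f"{f:.0f} Hz: {db:6.1f} dB" for f, db in zip(freqs, row))
    print(f"{1000 * start / SAMPLE_RATE:.0f} ms -> {levels}")
```

Each printed row is one "slice" of the waterfall. In the later slices, the level at the ringing frequency stays well above the others, which is the ridge you would see sticking out toward the front of a CSD plot.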

Sine sweeps

Since I lack the tools to make my own frequency response graphs, I instead use a sine sweep. This is an easy way to test your own headphones at home for frequency response.

Most pitches that you hear played by instruments have overtones. They produce both a fundamental pitch (the note that you hear) and a number of pitches above that fundamental that influence the timbre of the sound. A sine wave, on the other hand, is just the fundamental pitch with no overtones, making it the best way to analyze the sound of a headphone.

Using a sine-tone generator, like this free one online, I move the slider up and down to check which frequencies are louder. This gives me a lot of information about how a headphone sounds, usually enough to review it. However, it’s not completely objective, as it depends on the structure of my ear.
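If you would rather have a sweep file than drag a slider, something like the following would do. This is a minimal sketch using only Python's standard library, not the tool I use for reviews; the function name, file name, and default values are assumptions. Keep the volume low when you play it back, since high-frequency sine tones can be unpleasant.

```python
import math
import struct
import wave

def write_sine_sweep(path, start_hz=20.0, end_hz=20000.0, seconds=20.0,
                     sample_rate=44100, amplitude=0.3):
    """Write a mono 16-bit WAV whose pitch glides from start_hz to end_hz."""
    total = int(seconds * sample_rate)
    ratio = end_hz / start_hz
    # For a logarithmic sweep, frequency is f(t) = start_hz * ratio**(t/seconds),
    # and the phase is the integral of 2*pi*f(t) over time.
    k = seconds / math.log(ratio)
    frames = bytearray()
    for n in range(total):
        t = n / sample_rate
        phase = 2 * math.pi * start_hz * k * (ratio ** (t / seconds) - 1.0)
        sample = amplitude * math.sin(phase)
        frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return total

if __name__ == "__main__":
    write_sine_sweep("sweep.wav")
```

A logarithmic sweep spends the same amount of time on each octave, which roughly matches how we hear pitch and how frequency response graphs are drawn.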

Compensations and targets

You might think that an ideal headphone would be one with a completely flat frequency response, and that anything else is unrealistic. Unfortunately, it’s not quite that simple. Speakers are quite easy to measure – just put a microphone in a room, and if a speaker is flat, it has an accurate tonal balance.

Headphones, on the other hand, are harder to measure, and those difficulties with measurement present headphone manufacturers with a challenge. In fact, it turns out that a headphone that sounds “flat” actually doesn’t measure flat. For one thing, a headphone needs a significant boost around 3kHz to truly sound flat, because the outer ear and ear canal naturally amplify that region and a headphone has to reproduce that effect.

But also, scientists have created a number of different targets, each with its own claim to neutrality. The diffuse-field curve, for example, is the frequency response measured by a dummy head placed in a “diffuse field” – a room where sound comes equally from all directions.

And I could keep going with more targets and more about equalization, but there are other places you can read about this. I may write my own article on the subject one of these days.

Distortion graph of the HD800S

Distortion

In a perfect recreation of an audio signal, every pitch will be recreated exactly as it is found in the original recording. But since drivers are imperfect objects that have weight and momentum, an exact rendering is unfortunately physically impossible. Distortion describes imperfections in the sound that exist beyond the frequency response.

I’ve already explained the concept of overtones, and distortion has a lot to do with that. Essentially, distortion is the addition of overtones beyond what’s in the original recording. Usually, it’s not severe enough to be heard as anything more than a sort of harshness or grating quality to the sound. Occasionally, however, distortion is plainly audible in its own right, especially in the bass frequencies (more on that later).

Pictured above is a distortion graph for the HD800S. The red line represents the level of the 2nd harmonic; the other colors represent the higher harmonics. As you can see, the distortion levels of the HD800S are very low, generally below 0.2% – hence the HD800S’s reputation as an outstandingly resolving headphone.
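For a sense of what a figure like 0.2% means, here is a small sketch of how total harmonic distortion is commonly computed from the levels of the individual harmonics, and how a percentage translates into decibels below the fundamental. The harmonic amplitudes are made up for illustration and are not read off the HD800S graph.

```python
import math

def thd_percent(fundamental, harmonics):
    """Total harmonic distortion: RMS sum of the harmonics relative to the fundamental."""
    harmonic_rms = math.sqrt(sum(h * h for h in harmonics))
    return 100.0 * harmonic_rms / fundamental

def percent_to_db(percent):
    """Express a distortion percentage as a level relative to the fundamental."""
    return 20.0 * math.log10(percent / 100.0)

# Made-up amplitudes: fundamental = 1.0, 2nd harmonic = 0.0015, 3rd = 0.001.
print(round(thd_percent(1.0, [0.0015, 0.001]), 2))  # 0.18
print(round(percent_to_db(0.2), 1))                 # -54.0
```

In other words, distortion at 0.2% sits about 54 dB below the note that caused it.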

The Listening Process

One of the greatest drawbacks of being a reviewer is the need to listen to the same tracks for every new headphone that you test. Every reviewer is familiar with the fatigue of listening to one of your favorite albums, over and over again, until it becomes almost unbearable. But that’s the thing: to know what a headphone sounds like, you need to listen to music that you know well.

Luckily for me, I have eclectic tastes, so my tracks cover a range of different sounds and sensibilities. There’s:

There are quite a few more, but those are some of the more specialized tracks on my playlist, and I’m always sure to hit those at some point during my review. Most headphones aren’t able to play all of these tracks equally well: one can tell quite a bit from which tracks a pair struggles with.

Bass

Subbass

The subbass comprises the very lowest of the low frequencies: generally everything from the bottom of human hearing (20Hz or below) up to about 60Hz. These are the “rumble” frequencies, the frequencies that create the gut-punch effect of a good kick drum, and the slam of dubstep basslines.

When talking about subbass, reviewers will often mention the “subbass extension” of a headphone. Essentially, many headphones (some types more than others) have what we call “roll-off,” which is a decrease in volume and clarity towards the bottom of the audio spectrum. A rolled-off headphone will sound less authoritative in the bass, and may render some bass frequencies inaudible, making it suboptimal for bass-heavy genres like EDM and pop.

In the subbass, I also frequently refer to “slam.” This is much less easily measurable than extension. To me, it refers to the sense of dynamics and speed of attack for sounds in the bass region. A headphone that really “slams” or “thumps” will provide an impression of moving a lot of air, very quickly, against my eardrum.

It’s important to note that some headphones with relatively quiet or even rolled-off bass can still slam, as long as they’re fast in the bass, and with low distortion. For example, the Focal Utopia is not the bassiest headphone, and has some roll-off – but when it comes to slam, it has it.

Midbass

From 60Hz to about 250Hz we have the “midbass,” the frequency range in which most acoustic or “real-world” bass frequencies are found. (Although not all – for reference, the lowest note that’s playable on a double bass is about 41 Hz.) This is a frequency range that’s often responsible for lending a headphone a sound of heft, weight, or body.
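The 41 Hz figure is easy to check for yourself. A minimal sketch, assuming standard equal temperament with A4 tuned to 440Hz and the usual MIDI note numbering (the function name is my own):

```python
def midi_to_hz(midi_note, a4_hz=440.0):
    """Equal-temperament frequency of a MIDI note number (A4 = note 69)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)

# E1, the open low string of a standard four-string double bass, is MIDI note 28.
print(round(midi_to_hz(28), 1))  # 41.2
```

The same function works for any note, so you can use it to see where the fundamentals of your favorite instruments fall in the ranges described here.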

Many consumer-oriented headphones, like those from Beats and Bose, have a pronounced midbass region, producing a weighty, “fun” sound. This can leave some audiophile headphones sounding thin or dry in comparison, even when they’re completely neutral. When the midbass is overly pronounced, we often call it a “midbass hump,” because it looks on a FR graph like a big hump centered somewhere in the midbass.

The downside to a pronounced midbass is that it can obscure clarity. In addition, many lower-end drivers feature significant distortion in the midbass, which can allow it to “bleed” into the midrange (remember, distortion adds overtones, which can extend upward into the midrange). In this case, we call it “midbass bleed.”
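The arithmetic behind midbass bleed is simple: harmonics are whole-number multiples of the fundamental. Here is a small sketch, using the rough 250Hz boundary between midbass and midrange from this article (the function name and the choice of five harmonics are arbitrary):

```python
def harmonics_above(fundamental_hz, threshold_hz, count=5):
    """List the first few harmonics of a tone that land above threshold_hz."""
    overtones = [fundamental_hz * k for k in range(2, count + 2)]
    return [f for f in overtones if f > threshold_hz]

# A 100 Hz midbass note: its 2nd harmonic (200 Hz) stays in the midbass,
# but the 3rd and up cross the 250 Hz line used in this article.
print(harmonics_above(100.0, 250.0))  # [300.0, 400.0, 500.0, 600.0]
```

If a driver adds those harmonics through distortion, they show up right where the fundamentals of most instruments live.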

Midrange

Lower midrange

For my purposes, I’ll call the region from about 250Hz to about 750 Hz the “lower midrange.” Never mind the fact that the upper end of this range isn’t actually that low – this is where most of the fundamental frequencies of instruments are going to be found. So this is the region that lends “body” to the sound.

Instruments with a pronounced lower midrange will sound warm, thick, or intimate. An overly large emphasis on the lower midrange can make a headphone sound cloying, lacking in clarity, or dull. And a headphone with a recessed lower midrange can sound thin or lifeless.

In general, it’s really important to get this frequency band right – any significant boost or cut in this area will be audible, since instruments actually play notes that fall within it.

Mid-to-upper midrange

The upper midrange technically runs from about 2kHz up to about 4kHz, or maybe slightly above. This might be the single most important area in determining the “timbre” of a headphone, so pay close attention.

The upper midrange, especially around 3-4kHz, is sometimes known as the “presence.” A boost or peak here will make instruments sound clearer, closer, and more, well, present. But headphone manufacturers walk a fine line here because this region can also sound very harsh and unpleasant when the levels get too high.

One phenomenon that we like to talk about in audio is “shout.” Shoutiness refers to a boost in the upper midrange that makes things sound unpleasant and…well, shouty, as if the headphones are yelling into your ear.

So you might think the safe thing to do in the upper midrange is to pull it back, and this is true, to an extent. Many consumer-tuned headphones do pull back the upper midrange – the Bowers & Wilkins PX5 that I recently reviewed is a good example. But this often results in a headphone that sounds boring, laid-back, and unengaging.

However, a cut at 3-4kHz combined with a boost at around 2-2.5kHz can create a very interesting effect. While the recession at 3-4kHz does take away presence and make instruments sound more distant, the boost slightly lower down can restore some of the clarity lost with the missing higher frequencies, allowing for a wider soundstage. The K701 from AKG is an example of a headphone that uses this trick to its advantage.

If there’s one thing to take away from this section, it’s this: with too much upper midrange, a headphone is shouty and unpleasant; with too little, it’s unclear and unengaging.

Treble

Most of the pitched content happens in the bass and the mids; once one gets to the treble region, starting around 4-5kHz and extending upward to the ceiling of human hearing (about 20kHz, slightly lower for most people), it becomes difficult, if not impossible, to pick out individual notes. Instead, the treble is reserved for unpitched sounds like cymbals, environmental noises, and the grain of the human voice, among others.
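As a quick reference, the frequency ranges used in this guide can be collected into a small lookup. This is only a sketch of my own rough boundaries, which other reviewers draw differently. The stretch from 750Hz to 2kHz doesn’t get its own name in this article, so the plain “midrange” label for it is an assumption, as is placing the start of the treble at exactly 4kHz.

```python
# Band edges in Hz, taken from the rough figures in this article.
# The 750 Hz - 2 kHz stretch is not given its own name in the text,
# so the "midrange" label for it is an assumption.
BANDS = [
    (60.0, "subbass"),
    (250.0, "midbass"),
    (750.0, "lower midrange"),
    (2000.0, "midrange"),
    (4000.0, "upper midrange"),
    (float("inf"), "treble"),
]

def band_name(freq_hz):
    """Return the name of the first band whose upper edge is above freq_hz."""
    for upper_edge, name in BANDS:
        if freq_hz < upper_edge:
            return name

print(band_name(41.2))    # subbass
print(band_name(3000.0))  # upper midrange
print(band_name(8000.0))  # treble
```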

But that’s not to say that treble isn’t important. Bad treble can truly ruin a headphone.

It’s almost impossible to create a headphone with no peaks or dips in the treble response – even headphones that sell for thousands of dollars still have an emphasis on some frequencies over others.

One of the terms we reviewers commonly use when describing treble response is “sibilance.” Sibilance refers to hissing consonants, like “s,” “z,” and “sh,” that are produced by forcing air through a narrow gap behind the teeth, and result in a higher-pitched burst of noise. In headphones with especially bright, jagged, or distorted treble response, these sounds can come across as piercing or fatiguing. So we call headphones that emphasize these sounds “sibilant.”

On the other hand, headphones with a subdued treble response, especially in the lower treble (say, around 6kHz), can sound flat, dull, or unrealistic. I will often refer to these headphones as “dark.” That’s not to say that a dark sound is always a bad thing – I consider the Audeze LCD-2 a “dark headphone,” and it’s unquestionably a great product. But I personally do prefer things on the brighter side.

Texture

Speed

You’ll probably see people referring to headphones as “fast” fairly frequently, and you might wonder what that means. While some reviewers (notably Tyll from InnerFidelity) use it differently, most people are referring to the ability to transition quickly and seamlessly between different sounds.

Headphones operate by using magnets to make their drivers move, producing sound. But these drivers have mass, and therefore momentum, so they won’t move perfectly in accordance with the incoming signal – they’ll lag behind, or they’ll bend and distort. “Speed” essentially refers to how far behind the signal the driver lags.

Of course, a slow headphone won’t actually sound slow, per se. Instead, textures get confused and blended together, especially in fast-paced, complex music. Ringing causes each sound to bleed into the next. So a slow headphone, like the old AKG K400 that I still use occasionally, will play slow music beautifully but won’t bring any clarity or separation to more complex, dynamic passages.

Detail

Speed has quite a bit to do with detail, but it doesn’t tell the whole story. Indeed, it’s possible for a headphone to be quite fast while still losing out on some detail. Detail also has quite a bit to do with the frequency response of a pair of headphones.

I still haven’t completely “cracked the code” on what aspects of the frequency response make a headphone “detailed-sounding,” but headphones with forward treble tend to emphasize detail like tape hiss, and a slight boost somewhere in the upper midrange (especially around 2kHz) tends to help out. It’s also possible for a headphone to be detailed in one region, but lacking in detail in another – again this has to do with both speed and frequency response. For example, I could say a headphone is “lacking in mid-treble detail” because of either a dip in frequency response, a ringing edge or grain at a certain frequency, or both.

Soundstage

The first headphone I ever fell in love with was the Sennheiser HD558. After a lifetime of listening to music on tiny, on-ear noise-cancelling headphones, the HD558 was a revelation – and a big part of that revelation was my experience of the 558’s wide, expansive soundstage (at least, in comparison with my previous headphones). It’s true that headphones will never sound like speakers; they’ll also never replicate how sounds happen “in real life.” But to me, they have a captivating way of presenting sounds – they have a way of creating a private little world.

Since soundstage is mostly a psychoacoustic phenomenon, I personally consider it faintly ridiculous when reviewers talk about how a certain thing is “4 inches tall” on one headphone, or “7 inches tall” on another. But it does seem possible, when listening to two headphones side-by-side, to determine things like imaging precision, and the depth, width, and height of the image.

No, you’ll never catch me using precise measurements to describe the dimensions of a soundstage. But I’ll still say things like “wider than it is high,” or “uncanny,” or “grandiose.” (And for your reference, the two kings of staging as I’ve heard it are the Sennheiser HD800 and the Final D8000 Pro.)

In Conclusion

If you’re just reading this article for fun, I hope you’ve enjoyed it. If you followed a link from a review, I hope this article has helped you understand all the crazy jargon that I use from time to time.

I hope to continue revising this article as I gain more and more experience with writing reviews. After all, the act of listening is a dynamic, ever-changing thing, and one never hears the same thing the exact same way twice.

If anything has been unclear, please leave a comment so I can update it – I would very much appreciate it! And thank you for reading MajorHifi.