Learning the language of sound

Are you trying to describe how your headphones sound?

Does your favorite reviewer tend to wax poetic and use flowery prose when they describe what they hear, and you wonder what on earth they are talking about?

It’s always tough to put words to what your senses perceive. Have you ever tried to explain a smell or a color? Sound is no different. We often use descriptors from our other senses in an attempt to convey meaning.

As a result, common terminology has evolved in the audio industry to describe the ‘flavor’ of sound.

Headphonesty is here to help with our very own list of Audio Descriptors. Wherever possible, we’ve linked to related and opposite terms to help with understanding.

A 🔗

Accuracy 🔗

How faithfully the music is reproduced by the playback equipment.

ADSR 🔗

An acronym for the four stages of a note's amplitude envelope:

  • Attack: The time it takes for a sound to increase to its maximum amplitude.
  • Decay: The time it takes for the sound to fall from its peak to the sustain level.
  • Sustain: The level at which the sound remains until the note is released.
  • Release: The time it takes for the sound to fall back to zero amplitude.
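
To make the four stages concrete, here is a minimal Python sketch of an ADSR envelope generator. It assumes straight-line segments, a peak level of 1.0, and times in seconds; many synthesizers use curved (exponential) segments instead, and the function names are our own, purely for illustration.

```python
def held_level(t, attack, decay, sustain_level):
    """Envelope level (0..1) at time t while the note is still held."""
    if t < 0:
        return 0.0
    if t < attack:
        return t / attack  # Attack: rise linearly from 0 to the peak (1.0)
    if t < attack + decay:
        # Decay: fall linearly from the peak to the sustain level
        return 1.0 - (1.0 - sustain_level) * (t - attack) / decay
    return sustain_level  # Sustain: hold until the note is released


def adsr(t, attack, decay, sustain_level, release, note_off):
    """Envelope level at time t for a note released at time note_off."""
    if t < note_off:
        return held_level(t, attack, decay, sustain_level)
    # Release: fall linearly from wherever the envelope was at note_off down to 0
    start = held_level(note_off, attack, decay, sustain_level)
    elapsed = t - note_off
    if elapsed >= release:
        return 0.0
    return start * (1.0 - elapsed / release)


if __name__ == "__main__":
    # 0.1 s attack, 0.2 s decay, sustain at 50%, 0.3 s release, key released at 1.0 s
    for t in (0.0, 0.05, 0.1, 0.2, 0.5, 1.0, 1.15, 1.3):
        print(f"{t:.2f} s -> {adsr(t, 0.1, 0.2, 0.5, 0.3, 1.0):.2f}")
```

Releasing the key during the Attack or Decay stage starts the Release from the level reached at that moment, which is why the release stage reads its starting point from `held_level`.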

Aggressive 🔗

Excessively in-your-face sounding. Brightness tending towards unpleasantness.

Airy 🔗

Giving the perception of music reproduction in an open and empty interior space. Typically indicates high-frequency reproduction above approximately 15kHz.

Analytical 🔗

Conveying complexity and detail.

Articulate 🔗

Able to discern individual voices and instruments in the music.

Attack 🔗

The first stage of ADSR; also, the ability to reproduce the initial part of a sound or note.

B 🔗

Balance 🔗

Tonal balance indicates the relative level of each part of the audible frequency spectrum. Channel balance is the relative level of the left and right channels in stereo reproduction.

Bass 🔗

The low range of audible frequencies (approximately 60 – 250Hz). Below this is Sub-Bass, which is felt more than heard. Bass is often described as imparting depth to the sound, with weight and impact. Bassy implies that the lowest frequencies are emphasized.

Bass-Head 🔗

An individual who prefers equipment and music that reproduces strong bass frequencies.

Bass-Bleed 🔗

An excessive or uncontrolled bass response that overwhelms or interferes with higher (midrange) frequencies.

Beat 🔗

The pulse of the music. Related to tempo, rhythm, and groove.

Bleed 🔗

Leakage of sound either from a device or between frequencies.

Bloated 🔗

Uncontrolled or excessive mid-bass (approximately 250Hz) frequency reproduction.

Blurred 🔗

Inability to quickly respond to rapidly changing dynamics, or inability to reproduce a sense of separation between the left and right channels of stereo audio.

Boost 🔗

To increase the level of all or part of the audible spectrum.
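
Boosts (and cuts) are usually given in decibels. The short Python sketch below converts a gain in dB into the factor by which the signal's amplitude is multiplied (10 to the power of dB/20) and applies it to a list of samples. The function names are illustrative, not taken from any particular audio library.

```python
import math


def db_to_amplitude(gain_db):
    """Amplitude (voltage or sound pressure) ratio for a gain in dB."""
    return 10 ** (gain_db / 20)


def amplitude_to_db(ratio):
    """Gain in dB for a given amplitude ratio."""
    return 20 * math.log10(ratio)


def boost(samples, gain_db):
    """Apply the same boost (positive dB) or cut (negative dB) to every sample."""
    factor = db_to_amplitude(gain_db)
    return [s * factor for s in samples]


if __name__ == "__main__":
    print(round(db_to_amplitude(6), 3))    # a +6dB boost roughly doubles amplitude
    print(round(amplitude_to_db(0.5), 2))  # halving amplitude is roughly a 6dB cut
    print(boost([0.1, -0.2, 0.05], 3))
```

This sketch boosts every frequency equally; an equalizer that boosts only part of the spectrum does the same arithmetic through filters aimed at a specific frequency band.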

Boomy 🔗

Excessive bass reproduction (around approximately 125Hz) that sounds like low-frequency resonance.

Boxy 🔗

Excessive lower midrange (approximately 250-500Hz) that creates an enclosed or cupped sound.

Breathy 🔗

Upper midrange and treble reproduction that makes breath sounds on wind instruments audible.

Bright 🔗

Strong upper midrange and treble reproduction.

Brilliance 🔗

The impression of clarity from strong treble (approximately 6 – 16kHz) reproduction.

C 🔗

Closed 🔗

Treble roll-off above 10kHz, creating a lack of detail, delicacy, and openness.

Coherent 🔗

Indicates proper timing and natural sounding reproduction with good imaging and definition.

Color 🔗

A change to the original sound, timbre, or frequency balance that may or may not be pleasant and desired, but regardless is not accurate.

Congested 🔗

Difficult to hear details due to distortion, noise, poor frequency reproduction, or timing.

Cool 🔗

Sound lacking lower bass frequencies under approximately 150Hz.

Crisp 🔗

Good high-frequency extension.

D 🔗

Dark 🔗

Stronger low frequencies and weaker higher frequencies, yielding a feeling of subdued details.

Decay 🔗

In ADSR, immediately following the Attack stage of a note, the amplitude decreases to the Sustain level.

Definition 🔗

Subtleties in the sound are revealed or can be discerned.

Delicate 🔗

Extended and smooth high frequencies above 15kHz.

Depth 🔗

A sense of front to back space discerned in the music.

Detailed 🔗

Strong midrange and treble reproduction, combined with fast transient response, that conveys the most subtle elements in the music.

Dry 🔗

A lack of discernible reverberations or harmonics causing a thin sound.

Dynamics 🔗

The variation in loudness between notes or sections within the music.
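
One simple way to put a number on dynamics is to compare the average (RMS) level of a loud passage with that of a quiet one. The Python sketch below does this for lists of samples scaled so that full scale is 1.0. The function names are our own, and RMS level is only a rough stand-in for perceived loudness.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to a full-scale amplitude of 1.0 (samples must not all be zero)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)


def level_difference_db(loud, quiet):
    """How many dB louder the first passage is than the second."""
    return rms_db(loud) - rms_db(quiet)


if __name__ == "__main__":
    loud_passage = [0.5, -0.5] * 100
    quiet_passage = [0.05, -0.05] * 100
    print(round(level_difference_db(loud_passage, quiet_passage), 1))
```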

E 🔗

Edgy 🔗

An excessive treble response, often due to unwanted harmonics.

Energy 🔗

A sense of power and dynamics in the music.

Euphonic 🔗

Often used to describe the appealing harmonics and sense of increased fidelity created by tube amplification.

F 🔗

Fast 🔗

Rapid transient response.

Fatigue 🔗

When prolonged listening causes discomfort to the listener, typically due to perceived low-level distortion, or strain caused by reflexively attempting to localize sounds in the music. The latter can be minimized with crossfeed.
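
Crossfeed works by mixing a small, low-pass-filtered and slightly delayed portion of each channel into the opposite ear, loosely imitating how each ear hears both speakers in a room. The Python sketch below shows the idea on lists of samples. The 700Hz cutoff, -6dB crossfeed level, and 0.3ms delay are illustrative assumptions rather than values from any particular product, and a real implementation would usually also adjust the direct signal to keep the overall tone and level balanced.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Simple first-order low-pass filter: passes bass, softens treble."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, out = 0.0, []
    for s in samples:
        y += a * (s - y)
        out.append(y)
    return out


def crossfeed(left, right, sample_rate, cutoff_hz=700.0, level_db=-6.0, delay_ms=0.3):
    """Return (left, right) with a filtered, delayed, quieter copy of each
    channel mixed into the other. Default parameters are assumptions."""
    gain = 10 ** (level_db / 20)
    delay = int(round(delay_ms * sample_rate / 1000))

    def bleed(source):
        filtered = one_pole_lowpass(source, cutoff_hz, sample_rate)
        # Shift later in time by `delay` samples, keeping the original length
        delayed = ([0.0] * delay + filtered)[:len(filtered)]
        return [gain * s for s in delayed]

    to_left, to_right = bleed(right), bleed(left)
    out_left = [direct + extra for direct, extra in zip(left, to_left)]
    out_right = [direct + extra for direct, extra in zip(right, to_right)]
    return out_left, out_right


if __name__ == "__main__":
    rate = 44100
    left = [1.0] * 1000   # a steady signal in the left channel only
    right = [0.0] * 1000
    new_left, new_right = crossfeed(left, right, rate)
    print(new_right[:20:4])  # right starts silent, then the filtered bleed builds up
```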

Fidelity 🔗

Accuracy of the reproduction compared to the original music.

Flat 🔗

A constant level across the entire frequency spectrum; on a Frequency Response graph, a straight horizontal line.

Focus 🔗

Conveying a strong sense of width or depth.

Forward 🔗

The music seems difficult to ignore and is presented in an ‘in-your-face’ manner.

Frequency Response 🔗

Measurement of amplitude (output in dB) vs frequency (in Hz). The basic graph used in many headphone measurements to judge the sound signature.
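
A frequency response graph is just a list of (frequency, level) points. The Python sketch below shows two common ways of reading one: re-expressing every level relative to a reference frequency (1kHz here, a common but not universal choice), and measuring the spread between the highest and lowest points, which would be 0dB for a perfectly Flat response. The function names are our own, and the example data is made up for illustration.

```python
def normalize(response, ref_hz=1000):
    """Express each level relative to the level at ref_hz.

    response: list of (frequency_hz, level_db) pairs that includes ref_hz.
    """
    ref_level = dict(response)[ref_hz]
    return [(freq, level - ref_level) for freq, level in response]


def peak_to_peak_db(response):
    """Difference between the loudest and quietest points, in dB."""
    levels = [level for _, level in response]
    return max(levels) - min(levels)


if __name__ == "__main__":
    # Hypothetical measurement: (frequency in Hz, output in dB SPL)
    measured = [(20, 95.0), (100, 98.0), (1000, 100.0), (3000, 108.0), (10000, 96.0)]
    for freq, rel in normalize(measured):
        print(f"{freq:>6} Hz  {rel:+.1f} dB")
    print("peak-to-peak:", peak_to_peak_db(measured), "dB")
```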

Frequency Spectrum 🔗

The human range of hearing is measured on a continuous spectrum of frequencies typically ranging from 20Hz to 20kHz.

  • Subsonic: < 20Hz
  • Sub-Bass: 20 – 60Hz
  • Bass: 60 – 150Hz
  • Upper Bass: 150 – 250Hz
  • Midrange: 250 – 3000Hz
  • Upper Midrange: 3000 – 5000Hz
  • Treble: 5000 – 20000Hz
  • Ultrasonic: > 20000Hz
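
The list above translates directly into a lookup. This Python sketch returns the band name for a frequency in Hz using the listed edges; the boundaries are approximate conventions that other sources draw differently. Where two bands share an edge, the frequency goes to the higher band, except 20kHz, which the list places in Treble. The names `BAND_EDGES`, `BAND_NAMES`, and `band_for` are our own.

```python
import bisect

# Lower edge (Hz) of each band after Subsonic, following the list above
BAND_EDGES = [20, 60, 150, 250, 3000, 5000]
BAND_NAMES = ["Subsonic", "Sub-Bass", "Bass", "Upper Bass",
              "Midrange", "Upper Midrange", "Treble"]


def band_for(freq_hz):
    """Name of the frequency band that freq_hz (in Hz) falls into."""
    if freq_hz > 20000:
        return "Ultrasonic"  # above the range of human hearing
    # bisect_right counts how many edges are <= freq_hz, which is the band index
    return BAND_NAMES[bisect.bisect_right(BAND_EDGES, freq_hz)]


if __name__ == "__main__":
    for f in (15, 40, 100, 440, 4000, 12000, 30000):
        print(f, "Hz:", band_for(f))
```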

Full 🔗

Music reproduction with a good balance of harmonics and fundamental response. Often indicative of strong bass response in the approximately 100 – 300Hz range.

G 🔗

Grain 🔗

Conveying a rough, gritty, or unrefined texture in the reproduction of midrange and treble. When it refers to higher treble reproduction, it may be called sparkle.

H 🔗

Harsh 🔗

Excessive upper midrange (approximately 3 – 6kHz) peaks causing the music to feel uncomfortable and abrasive.

Headstage 🔗

A headphone-specific term for the perceived space in which the instruments reside within the music.

Highs 🔗

The upper range of the audible frequency spectrum (6 – 20kHz).

Hiss 🔗

Audible unwanted noise caused by electrical fluctuations.

Hollow 🔗

A feeling of thinness, or lack of fullness in the music, caused by recessed midrange reproduction.

Honky 🔗

Analogous to speaking through a megaphone or cupped hands. Elevated frequency response between approximately 500 – 700Hz.

I 🔗

Imaging 🔗

The ability to create the perception of physically locating a specific instrument in a horizontal space created within the music. Differs from Soundstage, as imaging is a sense of progression between left and right channels, rather than depth.
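
In a mix, an instrument's left-to-right position is often set by playing it at different levels in the two channels (real recordings also rely on timing differences). A common way to do this is a constant-power pan law, sketched below in Python: the left and right gains follow a quarter cycle of cosine and sine, so a centered sound sits about 3dB down in each channel and its total power stays the same wherever it is placed. The function name and the -1 to +1 pan range are our own conventions.

```python
import math


def constant_power_pan(pan):
    """Return (left_gain, right_gain) for pan in [-1, 1].

    -1 is hard left, 0 is centre, +1 is hard right.
    """
    pan = max(-1.0, min(1.0, pan))   # clamp out-of-range values
    angle = (pan + 1) * math.pi / 4  # map -1..1 onto 0..pi/2
    return math.cos(angle), math.sin(angle)


if __name__ == "__main__":
    for pan in (-1, -0.5, 0, 0.5, 1):
        left, right = constant_power_pan(pan)
        # left**2 + right**2 is always 1, so total power is constant
        print(f"pan {pan:+.1f}: L={left:.3f} R={right:.3f}")
```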

J 🔗

K 🔗

L 🔗

Laid-Back 🔗

Somewhat soft and gentle music reproduction that has a sense of distance or depth from the performers. May be due to a recessed midrange or rolled-off treble.

Liquid 🔗

Smooth, integrated, and coherent music reproduction that is not overly technical or detailed.

Loudness 🔗

Subjective interpretation of the magnitude of sound.
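
Because loudness is perceptual, it has its own units. The phon ties loudness to the level in dB of an equally loud 1kHz tone, and the sone is a scale on which doubling the number means twice as loud; 1 sone is defined as 40 phon. Above about 40 phon, a common rule of thumb is that every 10 phon increase doubles perceived loudness. The Python sketch below applies that approximation; it is not accurate at very low levels, and the function names are illustrative.

```python
import math


def phon_to_sone(phon):
    """Approximate loudness in sones from loudness level in phons (valid above ~40 phon)."""
    return 2 ** ((phon - 40) / 10)


def sone_to_phon(sone):
    """Inverse of phon_to_sone, under the same approximation."""
    return 40 + 10 * math.log2(sone)


if __name__ == "__main__":
    for phon in (40, 50, 60, 70, 80):
        print(phon, "phon ->", phon_to_sone(phon), "sone")
```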

Low-Level Detail 🔗

The quietest sounds present in a recording.

Low-Range 🔗

The bottom end of the audible frequency spectrum typically referred to as Bass.

Lush 🔗

Music reproduction with a large number of even-order harmonics, which tend to be pleasing and warm sounding, often as a result of tube amplification. Often indicative of strong bass response in the approximately 100 – 300Hz range.

M 🔗

Midrange 🔗

The center portion of the audio spectrum (between approximately 250Hz and 3kHz), sometimes referred to as ‘Mids’. Upper Midrange refers to audio frequencies between approximately 3kHz and 5kHz. Human vocals and many instruments reside in the midrange frequencies.

Muddy 🔗

Poorly-defined, smeared or vague music reproduction, often due to slow transient response or decay, weak treble response, and/or excessive bass response.

Musical 🔗

Cohesive music reproduction that sounds natural, realistic, and ‘right’ to the listener.

N 🔗

Nasal 🔗

An unpleasant confined sound, or a ‘speaking-through-the-nose’ sense for vocals, typically due to a midrange peak at approximately 600Hz.

Natural 🔗

A sense of realism in sound reproduction that sounds ‘right’ to the listener.

O 🔗

Opaque 🔗

Vague or smeared music reproduction, lacking delicacy or definition.

Open 🔗

Smooth high-frequency reproduction that yields a sense of airiness or subtlety to the music.

P 🔗

Peaky 🔗

Sudden or sharp peaks and dips in frequency reproduction.

Piercing 🔗

Excessive treble response, typically due to peaks between approximately 3kHz and 10kHz.

Pitch 🔗

The fundamental frequency of a musical note.
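
Under twelve-tone equal temperament, each semitone step multiplies the fundamental frequency by the twelfth root of 2, so an octave doubles it. The Python sketch below converts MIDI note numbers to frequency, assuming the standard MIDI numbering (69 is A4) and A4 tuned to 440Hz; the tuning is an adjustable default.

```python
def midi_to_hz(note, a4_hz=440.0):
    """Fundamental frequency of a MIDI note under 12-tone equal temperament."""
    return a4_hz * 2 ** ((note - 69) / 12)


if __name__ == "__main__":
    for name, note in (("A3", 57), ("C4", 60), ("A4", 69), ("A5", 81)):
        print(f"{name}: {midi_to_hz(note):.2f} Hz")
```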

Pop 🔗

An audible, sharp burst of breath noise when air hits the microphone, typically occurring with ‘b’, ‘p’ and ‘t’.

PRaT 🔗

Subjective (and controversial) interpretation of three basic building blocks of music reproduction.

  • Pace (formerly Pitch referring to the speed stability of Linn turntables) – the speed that the music is being played.
  • Rhythm – the relationship of successive notes and the beat.
  • Timing – the accuracy of multiple frequency reproduction.

Presence 🔗

The clarity and definition of voices and instruments and the ability to uniquely perceive them, typically influenced by upper midrange and treble reproduction in the approximately 3 – 5kHz range.

Punch 🔗

Indicative of strong bass reproduction and dynamics, with fast attack and short decay, giving a sense of power but remaining coherent and controlled.

Q 🔗

R 🔗

Range 🔗

The difference between the highest and lowest frequencies being reproduced.
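
Because we hear pitch on a roughly logarithmic scale, a frequency range is often easier to reason about in octaves (each octave doubles the frequency) than as a difference in Hz. A quick Python sketch, with an illustrative function name:

```python
import math


def octaves(low_hz, high_hz):
    """Width of a frequency range in octaves."""
    return math.log2(high_hz / low_hz)


if __name__ == "__main__":
    # The nominal 20Hz - 20kHz range of human hearing spans just under 10 octaves
    print(round(octaves(20, 20000), 2))
    # Sub-bass (20 - 60Hz) is under 2 octaves wide, though only 40Hz in absolute terms
    print(round(octaves(20, 60), 2))
```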

Recessed 🔗

A decrease in a section of the audible frequency spectrum in relation to the others. ‘V-shaped’ refers to a recessed midrange.

Relaxed 🔗

Gentle or rolled-off treble reproduction compared to the midrange, which may result in a non-fatiguing but not overly detailed sound.

Resolving 🔗

The equipment’s ability to reproduce and separate the sounds between individual instruments. Detail is the interpretation of the reproduced resolution.

Rhythm 🔗

The controlled or ordered reproduction of the sounds and elements of the music in time.

Roll-Off 🔗

The gradual attenuation (decrease in level) of frequencies beyond a certain point, typically at the high and low extremes of the audible frequency spectrum.
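
Roll-off is usually described by a cutoff frequency (where the level is down 3dB) and a slope in dB per octave. The Python sketch below uses the magnitude response of an ideal Butterworth low-pass filter, a standard textbook model rather than a measurement of any real headphone, to show that each filter order adds roughly 6dB per octave of roll-off well above the cutoff. The function names and the 15kHz cutoff are hypothetical.

```python
import math


def butterworth_lowpass_db(freq_hz, cutoff_hz, order):
    """Level in dB of an ideal Butterworth low-pass: |H|^2 = 1 / (1 + (f/fc)^(2n))."""
    return -10 * math.log10(1 + (freq_hz / cutoff_hz) ** (2 * order))


def slope_db_per_octave(freq_hz, cutoff_hz, order):
    """Change in level from freq_hz to one octave higher."""
    return (butterworth_lowpass_db(2 * freq_hz, cutoff_hz, order)
            - butterworth_lowpass_db(freq_hz, cutoff_hz, order))


if __name__ == "__main__":
    cutoff = 15000  # hypothetical treble roll-off point
    for order in (1, 2, 4):
        at_cutoff = butterworth_lowpass_db(cutoff, cutoff, order)
        far_slope = slope_db_per_octave(8 * cutoff, cutoff, order)
        print(f"order {order}: {at_cutoff:.1f} dB at cutoff, "
              f"{far_slope:.1f} dB/octave well above it")
```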

S 🔗

Sibilant 🔗

Excessive, unnatural, and unwanted high-frequency emphasis of vocal ‘s’ and ‘sh’ sounds, which sounds strident and unpleasant. Can also be heard on cymbals, typically in the 4 – 9kHz range.

Smeared 🔗

Poorly resolved sound reproduction that lacks detail, typically due to poor transient response.

Smooth 🔗

Indicative of a non-fatiguing and even frequency response without peakiness, especially in the midrange.

Soundstage 🔗

The ability of the equipment to create a perception of space (width, height, and depth) in the music, within which the instruments and vocalists are located.

Sparkle 🔗

Strong high-frequency reproduction yielding a sense of energy to the sound.

Spatial Localization 🔗

Interpretation of distance and location based on auditory clues.

Speed 🔗

The rate of transient response, and how quickly and distinctly individual notes are reproduced.

Sub-Bass 🔗

The lowest end of the theoretical audible frequency spectrum between approximately 20 – 60Hz, where the sounds are more felt than heard. Sometimes referred to as seismic.

Sustain 🔗

The third part of ADSR: the level at which the sound remains after the initial Attack and Decay, until the note is released.

Sweet 🔗

Delicate reproduction of the highest audible frequencies (approximately 15 – 20kHz) without distortion or peaks.

T 🔗

Tempo 🔗

Italian for “time”, it is the speed or pace of the music measured in beats per minute (BPM).
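
Since BPM counts beats per minute, the length of one beat is simply 60 seconds divided by the tempo. The Python sketch below does that arithmetic and, assuming a bar of four beats (common, but not universal), the length of a bar. The function names are illustrative.

```python
def beat_seconds(bpm):
    """Duration of one beat in seconds at the given tempo."""
    return 60.0 / bpm


def bar_seconds(bpm, beats_per_bar=4):
    """Duration of one bar, assuming beats_per_bar beats per bar."""
    return beats_per_bar * beat_seconds(bpm)


if __name__ == "__main__":
    for bpm in (60, 90, 120, 174):
        print(f"{bpm} BPM: beat = {beat_seconds(bpm) * 1000:.0f} ms, "
              f"bar = {bar_seconds(bpm):.2f} s")
```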

Texture 🔗

Describes the quality of sound reproduction, and whether the number or structure of unique sound sources can be perceived in the music.

Thick 🔗

Slow transient response and lack of clarity in bass frequencies.

Thin 🔗

Reproduction lacking in low frequencies or weak fundamental harmonics.

Thump 🔗

Indicative of strong bass and sub-bass reproduction and dynamics, but may lose coherence if not well controlled.

Tight 🔗

Easily distinguished and strong bass frequency reproduction, notable for quick transient response and control.

Timbre 🔗

The characteristic tone of a voice or instrument, and the ability to reproduce that tone realistically.

Transient Response 🔗

The ability to respond to rapid or instantaneous changes in dynamics, voltage, or input.

Transparent 🔗

The ability to discern individual components and details in the music, typically related to a flat frequency response with little distortion.

Treble 🔗

The upper portion of the audio frequency spectrum, between about 5000 and 20000Hz.

It contains the highest frequencies of voices and of instruments such as cymbals. Accurate reproduction leads to a feeling of definition, imaging, and accuracy.

U 🔗

V 🔗

V-Shaped 🔗

Referring to recessed midrange reproduction in relation to bass and treble. Vocals appear to sit in the background, behind the strong bass and energetic treble.

Veiled 🔗

Loss of detail or transparency that sounds as though there is fabric in the way of the sound, often due to high-frequency roll-off or distortion.

W 🔗

Warm 🔗

Extended lower frequency reproduction and smooth sound signature, often attributed to strong even harmonics from tube amplification. Typically paired with rolled-off, or recessed higher frequencies.

Weight 🔗

Solid, deep, and substantial sense of bass response below approximately 50Hz.

Width 🔗

A sense of horizontal space (left to right) in stereo reproduction. Similar to Depth (which is space front to back).

X 🔗

Y 🔗

Z 🔗