Sound II Reading

Site: TRCOA Conservatory
Course: Recording Techniques I & II
Book: Sound II Reading
Printed by: Guest user
Date: Thursday, 16 May 2024, 8:55 AM

1. Perception of Sound

Our perception of sound is the result of the physical dimensions being transformed by the ear and interpreted by the mind. The perceived parameters of sound are our perception of the physical dimensions of sound.

This translation process is nonlinear and differs between individuals. The human ear is not equally sensitive in all frequency ranges, nor is it equally sensitive to sound at all amplitude levels. Complicating matters, no two people hear the characteristics of sound in precisely the same way. We need only notice the different ear shapes around us to recognize that no two people will pick up acoustic energy identically.


Physical → Perceived

  • Frequency → Pitch
  • Amplitude → Loudness
  • Time → Duration
  • Timbre (physical component) → Timbre (perceived overall quality)
  • Space (physical components) → Space (perceived characteristics)

Next we will look at each of the perceived qualities of sound.


1.1. Pitch

Pitch

As discussed last week, pitch is the perception of the frequency of the waveform. We assign values to pitches and organize these into tuning and harmonic systems, which give rise to melody and harmony.

The frequency range most widely accepted as the human hearing range is from 20 Hz to 20,000 Hz (20 kHz), though the actual limits vary considerably from person to person and narrow with age. But remember, most of you lost 20 kHz a long, long time ago.
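Because pitch tracks frequency logarithmically, the mapping between the two can be sketched in a few lines. The 440 Hz tuning reference and MIDI note numbering below are conventional assumptions for illustration, not something stated in this reading.

```python
import math

A4 = 440.0  # common tuning reference in Hz (an assumption, not from the text)

def pitch_to_frequency(note):
    """Equal-tempered frequency for a MIDI note number (A4 = note 69)."""
    return A4 * 2 ** ((note - 69) / 12)

def frequency_to_pitch(freq_hz):
    """Nearest equal-tempered MIDI note for a given frequency."""
    return round(69 + 12 * math.log2(freq_hz / A4))
```

Doubling a frequency raises the pitch by exactly one octave (12 MIDI notes), which is the logarithmic relationship the text describes.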


1.2. Duration

Duration

Humans perceive time as duration. We cannot accurately judge time increments without a reference time unit. Regular time reference units are found in musical contexts: the underlying metric pulse of a piece of music allows for accurate duration perception. In music, the listener remembers the relative duration values of successive sounds. These successive durations create musical rhythm.

The listener is only able to establish a metric pulse within certain limits. Humans can accurately perceive a metric pulse between 30 and 260 pulses per minute (or more commonly, bpm – beats per minute). Beyond these boundaries, we instead substitute a pulse of one-half or twice the value, or the listener may simply become confused and be unable to make sense of the rhythmic activity.
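The halving-or-doubling substitution described above can be sketched as a small helper. The 30–260 bpm limits come straight from the text; everything else here is illustrative.

```python
def beat_period_ms(bpm):
    """Duration of one beat in milliseconds at a given tempo."""
    return 60000.0 / bpm

def nearest_perceivable_pulse(bpm, low=30, high=260):
    """Halve or double an out-of-range pulse rate until it falls within
    the range where listeners can track a metric pulse (per the text)."""
    while bpm < low:
        bpm *= 2
    while bpm > high:
        bpm /= 2
    return bpm
```

For example, a notated pulse of 520 bpm would most likely be heard as 260 bpm, with each perceived beat spanning two notated ones.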


1.3. Equal Loudness

The ear is a nonlinear device, so it does not respond equally across its hearing range. This is what the Fletcher-Munson equal-loudness contours tell us. One of the main consequences is that when a signal is generated at a specific intensity (sound pressure level – SPL), the perceived volume depends on the frequency. Our ears are more sensitive to mid-band sounds than to low or high frequencies.
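This uneven frequency sensitivity is commonly approximated in measurement equipment by the standardized A-weighting curve (related to, though not identical with, the Fletcher-Munson contours). A sketch of that curve, following the IEC 61672 formula:

```python
import math

def a_weight_db(f):
    """A-weighting (IEC 61672) in dB relative to 1 kHz. Large negative
    values indicate frequencies where the ear is far less sensitive."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.00
```

Evaluating it shows the pattern the contours describe: roughly 0 dB at 1 kHz, a steep roll-off toward the bass, and a mild peak in the upper mids.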

Fletcher-Munson Equal Loudness Contours:

What is important to understand about this is the level you mix at. It can make the difference between a mix that is balanced correctly and a mix with an inaccurate frequency balance. The key is to mix at conversation level: if you have to shout over your mix, you are mixing too loud. Yes, at some point you will need to listen to your mix both louder and quieter, but conversation level is best for getting the correct balance. The book states 85 dB as the optimal listening level, but it really depends on the room size; in a small room, 85 dB may be too loud. Attempting to mix at a loud level is one of the fundamental mistakes people make when starting out.

The nonlinear frequency response and fatigue over time contribute to further inaccuracies in three ways:

  • With sounds of long duration and steady loudness level, loudness will be perceived as increasing with the progression of sound until approximately 0.2 seconds of duration. At that time, the gradual fatigue of the ear (and possible shift of attention by the listener) will cause perceived loudness to diminish.

  • As the loudness level of the sound is increased, the ear requires increasingly more time between soundings before it can accurately judge the loudness level of a succeeding sound. We are unable to judge the individual loudness levels of a sequence of high-intensity sounds as accurately as we can judge those of mid- to low-intensity sounds; the inner ear requires time to reestablish a state of normalcy, from which it can accurately track the next sound level.

  • As a sound of long duration is being sustained, its perceived loudness level will gradually diminish. This is especially true for sounds with high sound pressure levels. The ear gradually becomes desensitized to the loudness level. This fatigue results in the physical masking (covering) of softer sounds and an inability to accurately judge changes in loudness levels. When the listener is hearing under listening fatigue, slight changes of loudness may be judged as being large. Listening fatigue may also desensitize the ear's ability to detect new sounds at frequencies within the band where the high sound-pressure level was formerly present.


1.4. Timbre


A reminder of Timbre:

Humans recognize timbres as entities, as sonic objects having an overall quality. We recognize hundreds of human voices because we can remember their timbres, and we likewise remember the distinct timbres of many musical instruments. This global quality is what allows us to remember and recognize specific timbres as unique, identifiable objects.

Sufficient time is required for the mind to recognize and identify an object. For rather simple sounds, the time required for accurate perception is approximately 60 ms. As the complexity of the sound grows, the time needed to perceive it also increases. Sounds lasting less than 50 ms are perceived as noise-like, the only exception being when the listener is so well acquainted with the sound that its timbre can be recognized from this small fragment.


1.5. Space

The perception of space is the impression of the physical location of the sound source in an environment, together with the modifications the environment itself places on the sound source's timbre.

The perception of space in audio recording is not the same as the perception of space of an acoustic source in a physical environment. In an acoustic space, listeners perceive the location of the sound in relation to the three-dimensional space around them: distance, vertical plane, and horizontal plane. Sound is perceived at any possible angle from the listener, and sound is perceived at a distance from the listener.

In audio recording, illusions of space are created. Sound sources are given spatial characteristics through the recording process and/or through signal processing. The spatial information is intended to complement the timbre of the music or sound source. The spatial characteristics may simulate particular known physical environments, or be intended to provide spatial cues that have no relation to our reality.



2. Interaction

The perception of sound is always dependent upon the current state of the other parameters. Altering any of the perceived parameters of sound will cause a change in the perceived state of at least one other parameter.
Some of the interactions are:
  • Pitch Perception: Duration and Loudness
  • Loudness and Time perception
  • Loudness Perception altered by Duration and timbre
  • Pitch perception and Spectrum (Missing Fundamental)
  • Amplitude, Time, and Location (Haas effect)
  • Masking
  • Beating
  • 3rd Voice

2.1. Pitch

Duration

Short pulses of a particular frequency can also be perceived as having a different pitch compared to a longer pulse of the same frequency – generally, a short pulse with decaying amplitude will be perceived as having a slightly higher pitch than a longer pulse of the same frequency.

Loudness

We now know that frequency can affect the perceived loudness of a signal, but the reverse is also true: the actual signal level can affect the perceived pitch, and this manifests in different ways. A level of 60 dB SPL is considered the threshold where increases or decreases in loudness affect pitch perception. Above 60 dB SPL, for sounds below 2 kHz, a substantial increase in loudness will cause an apparent lowering of the pitch – the sound will go flat. Similarly, a substantial increase in loudness for sounds above 2 kHz will cause the sound to appear sharp. If we decrease the loudness, the opposite occurs: below 60 dB SPL, a decrease in loudness causes sounds below 2 kHz to be perceived as going sharp and sounds above 2 kHz as going flat.
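The symmetry of this flat/sharp rule is easy to lose track of, so here it is restated as a tiny lookup. This merely encodes the text's rule of thumb; it is not a psychoacoustic model.

```python
def apparent_pitch_shift(freq_hz, loudness_increasing):
    """Direction of apparent pitch drift for a substantial loudness
    change around the 60 dB SPL pivot, per the rule in the text:
    below ~2 kHz, louder reads flat; above ~2 kHz, louder reads sharp."""
    if freq_hz < 2000:
        return "flat" if loudness_increasing else "sharp"
    return "sharp" if loudness_increasing else "flat"
```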



2.2. Loudness/Time

Loudness/Time Perception

Loudness level can influence perceived time relationships. When two sounds begin simultaneously, they will appear to have staggered entrances if one is significantly louder than the other. The louder sound will be perceived as having started first.

Loudness/Duration

Duration can distort the perception of loudness. Humans tend to average loudness levels over a period of about 2/10 of a second. Sounds of shorter duration will appear to be louder than sounds of the same intensity with durations longer than 2/10 of a second.
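The 2/10-second averaging can be sketched as an RMS level taken over a 200 ms window. The sample rate and test signals here are arbitrary illustrations, not from the text.

```python
import math

SAMPLE_RATE = 1000                  # 1 kHz, chosen only to keep numbers simple
WINDOW = int(0.2 * SAMPLE_RATE)     # ~200 ms integration window

def windowed_rms(samples):
    """RMS level over the most recent ~200 ms of a signal, mimicking the
    ear's tendency to average loudness over about 0.2 seconds."""
    tail = samples[-WINDOW:]
    return math.sqrt(sum(s * s for s in tail) / len(tail))
```

A 100 ms burst followed by silence averages to a lower value over the window than a steady 200 ms tone of the same amplitude, which is the integration effect being described.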


Loudness/Timbre

Timbre can also influence loudness perception. Sounds with a complex spectrum will be perceived as louder than sounds containing fewer harmonics. Likewise, sounds whose spectra contain a strong presence of non-harmonic overtones will be perceived as louder than sounds containing mostly proportionally related harmonics. Following this principle, a change in timbre during the sustained portion of a sound will result in a perceived change of loudness.


2.3. Masking

Masking is one of the most important psychoacoustic principles to consider when mixing.

This effect of one particular sound effectively blanking out another sound can occur in two distinct ways: frequency and temporal.

Frequency masking

The phenomenon of louder signals blocking out quieter signals of similar frequencies. This is why it is vital to use equalization to create space within the mix: two instruments that sound beautiful in isolation may compete for the same frequency range within the ear, and so need to be equalized to work together.

Temporal masking

The effect whereby a loud sound (of any frequency) restricts how we hear a quieter sound played at a similar (but not simultaneous) time. Both pre- and post-masking can take place if the time difference between the signals is short enough, so a quiet sound can actually be masked by a louder sound that comes after it.


2.4. Phase

In the realms of audio recording and mixing, there are few words that appear more frequently than 'phase'. In short, the implications of phase impact almost every decision and action we take as sound engineers. Whether it is the placement of microphones, our perception of stereo, the application of EQ, or the sweep of a synthesizer's filter, all of these processes have some effect on phase.




One factor that can induce phase problems is the orientation of the microphones. A common example comes from mic'ing a snare with microphones placed on the top and bottom of the drum. Because the microphones approach the snare from opposite directions, the compressions and rarefactions picked up by each microphone will be correspondingly inverse: as the top mic moves positive, the bottom mic moves negative. When the two mics are combined, the resulting sound is weaker, as a number of harmonics in the snare are canceled out. Reversing the phase (inverting the polarity) of the bottom mic eliminates the problem.
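The top/bottom snare cancellation can be demonstrated numerically. The 200 Hz tone below stands in for a single snare partial and is purely illustrative.

```python
import math

SAMPLE_RATE = 48000
FREQ = 200.0   # a stand-in for one snare partial (illustrative)
N = 480        # 10 ms of audio

def mic_signal(polarity):
    """A sine wave as picked up by a mic; polarity -1 models the bottom
    mic seeing the inverse of what the top mic sees."""
    return [polarity * math.sin(2 * math.pi * FREQ * i / SAMPLE_RATE)
            for i in range(N)]

top = mic_signal(+1)
bottom = mic_signal(-1)

cancelled = [a + b for a, b in zip(top, bottom)]   # mics summed as-is: silence
corrected = [a - b for a, b in zip(top, bottom)]   # bottom polarity flipped: reinforcement
```

Summed as-is, the two signals cancel completely; with the bottom mic's polarity flipped, they reinforce to twice the amplitude.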


Seeing phase as purely problematic is seeing only half the picture, though; there are many situations in which manipulating or distorting phase can have a positive impact on the recording process. It is widely known, for example, that the process of equalization, as well as modifying the relative timbral balance of a signal, also creates differing phase shifts across the frequency spectrum. Although this phase distortion might seem problematic at first, it remains a defining quality of the sound we associate with specific EQs. Many of the vintage EQs that today's plug-ins emulate are highly sought after for their sound, and analog Moog and Oberheim filters are revered largely because of the phase distortions they impose on a filtered oscillator. Effects such as flangers and phasers also make deliberate use of phase manipulation to modify the timbre of a sound.


2.5. Haas

Based on the delay time between the source sound and its delayed image, we can fool the brain into creating images that aren't really present.
The example below first has the sound source coming from the left monitor. This sound is also routed out, via an aux channel, to the right monitor.
 
 

The next example delays the right side by 20 msec. Notice how the image shifts toward the left-hand side. The image also appears fuller and wider than the image produced without the delay. This fusion of a short delay with the original sound – for delays below roughly 30 msec – is known as the Haas effect.
 
 

Delays greater than 30 msec will start to produce discrete echoes. Delayed sounds will be covered in more detail in a later lesson; for now it is important to understand the concept of how the brain deals with delayed sounds.
 
 



2.6. Standing Waves

One of the most common problems in acoustics, one that particularly affects 'room-sized' rooms rather than concert halls, is standing waves.

A standing wave occurs when a wave reflects between two boundaries and travels back along the incident path, interfering with the original wave and causing a waveform that appears to be stationary. In the simplest case, when the distance between two parallel walls is exactly half the wavelength of a particular frequency, then a standing wave can build up. If the losses are minimal (due to a lack of absorption), the wave will potentially resonate and cause localized peaks in volume at the specific frequency – the result being that the sound that is heard within the room is no longer a good representation of the original sound.


In the example below, the blue wave is the original signal moving from left to right, the red wave is the return signal from a parallel wall, and the black wave is the result of the two waves combining. The resulting wave oscillates at the resonant frequency of the room.

[Figure: standing wave]

Smaller rooms sound worse because the frequencies at which standing waves are strong fall well within the sensitive range of our hearing. Most people will immediately recognize the common parallel surfaces within a room: using a typical home studio as an example, the four walls create two pairs of parallel surfaces. A pair people often overlook is the floor and ceiling.
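The frequencies each pair of parallel surfaces reinforces follow directly from the half-wavelength condition: f_n = n * c / (2 * L). The room dimensions below describe a hypothetical small home studio, not anything from the text.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def axial_modes(dimension_m, count=3):
    """First few axial standing-wave frequencies between one pair of
    parallel surfaces separated by dimension_m: f_n = n * c / (2 * L)."""
    return [n * SPEED_OF_SOUND / (2 * dimension_m) for n in range(1, count + 1)]

# a hypothetical home studio: 4 m long, 3 m wide, 2.5 m ceiling
for label, dim in (("length", 4.0), ("width", 3.0), ("height", 2.5)):
    print(label, [round(f, 1) for f in axial_modes(dim)])
```

All of these modes land between roughly 40 and 210 Hz, squarely in the sensitive low-mid range, which is why small rooms suffer so audibly.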



3. Pink and White Noise

Pink noise

Each octave carries an equal amount of noise power. For the human auditory system – which processes frequencies logarithmically – pink noise sounds evenly spread across all frequencies and best approximates the average spectral distribution of music.

White noise

Power distribution is flat across the entire frequency range; in other words, white noise contains all frequencies in equal proportion. For the human auditory system – which processes audio on a logarithmic frequency scale – white noise sounds much brighter than pink noise.


A simpler way to think about the difference: white noise has the same power at every frequency from 20 Hz to 20 kHz, but because each successive octave spans twice as many frequencies as the one below it, the upper octaves carry more total energy – which is why white noise sounds bright.

Pink noise rolls off as frequency rises (3 dB per octave), so every octave carries the same total energy. Since our hearing is organized logarithmically, in octaves, pink noise is the one we perceive as evenly balanced from low to high.
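The white-versus-pink distinction can be checked numerically by integrating each power spectral density over octave bands. The PSDs below are idealized sketches with arbitrary scaling.

```python
import math

def band_power(psd, f_low, f_high, steps=10000):
    """Numerically integrate a power spectral density (power per Hz)
    over a frequency band using the midpoint rule."""
    df = (f_high - f_low) / steps
    return sum(psd(f_low + (i + 0.5) * df) for i in range(steps)) * df

def white_psd(f):
    return 1.0          # flat: equal power per hertz

def pink_psd(f):
    return 1.0 / f      # falls 3 dB per octave: equal power per octave

# white noise: each octave holds twice the power of the one below it
white_low = band_power(white_psd, 100, 200)
white_high = band_power(white_psd, 200, 400)

# pink noise: every octave holds the same power (ln 2 with this scaling)
pink_low = band_power(pink_psd, 100, 200)
pink_high = band_power(pink_psd, 200, 400)
```

The white bands double from one octave to the next (hence the brightness), while the pink bands come out identical, matching the equal-power-per-octave definition above.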