Consistency in Sound Recording

A critical aspect of sound recording for motion pictures and video is the consistency of all of the recurring elements of the soundtrack from scene to scene. As a scene (or an event in time) progresses from beginning to end, audiences expect the sound to flow seamlessly and continuously, just as it would if they were somehow physically present to witness the event.

It does not matter to the audience that we have constructed this cinematic event from numerous camera angles and takes, shot over a wide expanse of actual time. On screen, it all becomes one continuous mise en scene. Actors walk and talk and progress from point A in their fictional time to point B, without delay or interruption. That is, until the scene changes. Not merely the angle, but the scene!

Imagine that you are in an apartment and eavesdropping on your roommates. You make a pretense of moving around in order to houseclean, but really you are just trying to be inconspicuous while you watch and listen to their conversation. As you periodically change your location in the room while in the act of tidying up, you are really just editing your viewing angle of the action. But even as you change your visual vantage point, or mentally focus in (zoom) on one roommate or the other, the sound remains pretty much the same.

The audio elements within the scene normally include dialogue between two or more characters, background noise, and spot sound effects. The audio levels between the actors will vary in relation to each other; different people do not speak at the same level nor with the same intensity. But real people will not change their individual speaking levels arbitrarily, suddenly shifting from whispers to shouts to normal to whispers to normal to shouts without extenuating dramatic rationale. The ring of a telephone or the slamming of a door may shatter the monotony, but the drone of the traffic outside the window remains fairly constant.

Achieving this realistic consistency of the audio is the combined goal of the entire sound team, including the Production Mixer, Sound Editor, and the Re-Recording Mixer.

This article will deal with consistency as it pertains to the role of the Production Sound Mixer.

There are three aspects of consistency that the Mixer must be attentive to: 1) Consistency within the shot; 2) Consistency between shots within the scene; and 3) Consistency between scenes.

Within the shot, levels should remain relatively constant, both for the actors and for the background ambiance. Actors are not expected to match each other in terms of recording level; variations are normal. But each actor's level should stay consistent with itself. As they banter, the actors' audio should appear somewhat constant. There should be no sudden changes in volume, except when justified by dramatic intent.

For instance, Actor A (Tough Guy) usually speaks loud and forcefully. Actor B (Mousy Nerd) is far more timid and soft-spoken.

If we are recording with a mixing panel, we try to keep normal conversation at around minus 8 or so on the meter (which is usually a peak reading meter). The area around zero is reserved for shouts and loud sound effects. Recall that when using a peak reading meter such as that found on many professional mixers, a level of minus 8 is the rough equivalent of zero on a VU meter.

Peak meters are calibrated to measure the loudest part of the signal that can be recorded without risking distortion. It is like reading a 100% white level. VU meters are set up in terms of average volume levels, and assume approximately ten dB difference between the average level and maximum. It is like reading a middle gray level. Zero VU (middle gray) is equivalent to minus eight or ten PEAK (white). Our industry, for the sake of convention, considers a pure tone (not really the same as voice, which fluctuates a lot) of minus eight dB PEAK to equal zero VU.
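The convention just described can be written as a fixed offset between the two kinds of meter. What follows is only a minimal sketch in Python, assuming the 8 dB figure quoted above and a steady tone; the function names are illustrative, and real speech will not track this neatly because it fluctuates.

```python
# Assumption taken from the text: a steady tone at -8 dB on a peak
# meter reads 0 on a VU meter. Some setups use 10 dB instead.
PEAK_TO_VU_OFFSET_DB = 8.0


def peak_to_vu(peak_db, offset_db=PEAK_TO_VU_OFFSET_DB):
    """Approximate VU reading for a steady tone at a given peak level."""
    return peak_db + offset_db


def vu_to_peak(vu_db, offset_db=PEAK_TO_VU_OFFSET_DB):
    """Approximate peak-meter reading for a steady tone at a given VU level."""
    return vu_db - offset_db


if __name__ == "__main__":
    print(peak_to_vu(-8.0))  # 0.0
    print(vu_to_peak(0.0))   # -8.0
```

Passing `offset_db=10.0` gives the other end of the "eight or ten" range mentioned above.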

When recording in digital, as opposed to analog, consider zero as an absolute zero. In other words, NEVER let your signal peak above the zero mark! Unlike analog audio recording, where distortion gradually begins at zero, the digital domain is totally unforgiving. Stay below zero, and it all sounds great. Exceed zero, and you have severe problems. For best results in digital, I recommend keeping normal dialogue down around minus 15 to minus 20. That allows sufficient headroom in case an actor hits you with a loud exclamation such as a shout. But on the mixing panel, we still set the actor so that his normal dialogue is below zero, say around minus eight. Zero on the mixer might correspond to minus fifteen or minus twenty on the digital recorder.
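The relationship between the mixer meter and the digital recorder can be thought of as a fixed alignment offset. The sketch below assumes that zero on the mixing panel is lined up with minus 20 on the recorder (one of the two figures mentioned above); the constant and function names are hypothetical and do not describe any particular recorder.

```python
# Assumption: zero on the mixing panel is lined up with -20 on the
# digital recorder. The text suggests -15 to -20; pick one and keep it.
MIXER_ZERO_ON_RECORDER_DB = -20.0


def recorder_level(mixer_db, alignment_db=MIXER_ZERO_ON_RECORDER_DB):
    """Level on the digital recorder for a given mixer meter reading."""
    return mixer_db + alignment_db


def headroom_db(mixer_db, alignment_db=MIXER_ZERO_ON_RECORDER_DB):
    """How far the recorded level sits below digital zero, in dB."""
    return 0.0 - recorder_level(mixer_db, alignment_db)


def would_clip(mixer_db, alignment_db=MIXER_ZERO_ON_RECORDER_DB):
    """True if the signal would go above digital zero."""
    return recorder_level(mixer_db, alignment_db) > 0.0


if __name__ == "__main__":
    # A shout that reaches zero on the mixer still has 20 dB to spare.
    print(recorder_level(0.0), headroom_db(0.0), would_clip(0.0))
```

The point of the calculation is simply that a shout which pins the mixer meter at zero remains well under the recorder's absolute limit.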

When recording these two actors, we find that Tough Guy usually moves the meter on our mixer to, say, around minus 6. The Nerd hits around minus 10, which is a bit lower in volume and natural. Again, we reserve levels above minus 6 or so for very loud sounds (which would translate into signals of zero to plus ten on a VU meter, but still under the absolute zero critical barrier of a digital recorder).

So as much as possible during this shot, we want to maintain this recording relationship of Tough Guy around minus 6 and Nerd around minus 10. This is especially important to do if we are opening and closing multiple mics.

In addition to the actors, we must also be attentive to background noise. If we are continually adjusting the levels of our mics on the set to balance the levels of our actors, then the side effect is for our background noise to go up and down like a roller coaster.

The way to avoid problems with the background noise is to take advantage of the acoustic properties of the mics we use in order to control the relative levels of the dialogue by means of microphone positioning (distance) and angle rather than by electronically adjusting the gain (volume) at the recorder or mixing panel.

Shotgun microphones are more sensitive in the front ("on axis") and less sensitive from the side ("off axis"). Therefore, in order to balance the levels between Tough Guy and Nerd, the boom operator should hold the mic closer overhead to the Nerd with the front of the mic aimed more towards the Nerd, and allow the Tough Guy to strike the mic from more of a side angle and from a little further away. The increased distance to the mic along with the reduced sensitivity of the off axis angle will effectively reduce the volume of the Tough Guy in relation to the Nerd without affecting the constant level of the background ambiance.
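A rough sense of how much positioning alone can do comes from the inverse-distance rule of thumb: in open space, level drops about 6 dB each time the distance to the mic doubles. The sketch below assumes that idealized free-field behavior (real sets have reflections, so treat the result as a ballpark) and takes the off-axis loss as a number you would read from the mic's published polar pattern. The distances and the 3 dB off-axis figure are hypothetical.

```python
import math


def distance_loss_db(ref_distance, distance):
    """Free-field level drop when moving from ref_distance to distance."""
    return 20.0 * math.log10(distance / ref_distance)


def positioning_reduction_db(near_distance, far_distance, off_axis_loss_db=0.0):
    """Total reduction of the farther, off-axis voice relative to the
    nearer, on-axis voice."""
    return distance_loss_db(near_distance, far_distance) + off_axis_loss_db


if __name__ == "__main__":
    # Hypothetical setup: mic 2 ft from the Nerd (on axis) and 3 ft from
    # Tough Guy, who is 3 dB down because he is off axis.
    reduction = positioning_reduction_db(2.0, 3.0, off_axis_loss_db=3.0)
    print(round(reduction, 1))  # about 6.5
```

Because nothing is changed at the mixer, the background picked up by the mic stays where it was; only the balance between the two voices shifts.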

As you can well imagine, the boom operator is a very important player. That is why boom operators need to be chosen carefully by the Mixer and cannot merely be selected from the pool of bystanders who aren't busy in the shot. This is also why it is very important for boom people to be provided with a good headphone feed of the program material.

When it is time to record another take of the same shot, once again it is critical that the Mixer pay attention to the relative levels of the characters and background. Footage from this take may be combined later on with past or future takes, so consistency of sound quality and levels is important.

When the camera changes its angle, the Mixer must be especially attentive that the levels of the new shot match and be intercuttable with the previous angles. Tough Guy should still be recorded around minus 6, where we established him before. Likewise, Nerd should remain around minus 10, where he was previously established. Remember, the audience should not be cognizant of an edit or camera angle change within the complete mise en scene; the action must appear to flow seamlessly from point A in time to point B in time.

Minor changes in angle do not motivate drastic changes in audio. Panning or cutting from one close-up to another of two people standing around talking does not constitute a significant perspective change. Levels and background are expected to remain constant.

One should be careful not to confuse perspective with volume.

In the medium long shot, Tough Guy speaks at minus 6 and Nerd at minus 10. The boom mic is maybe two and a half feet overhead due to the loose framing of the shot. When we push in to a single head close-up of Nerd, the boom mic is able to move in to a much closer position. It becomes relatively easy to record Nerd at minus 6 because the mic is so close. That would be an error!

In real life, when we talk with a person standing ten feet away from us, we tend to both see and hear more of the surroundings. But when we step in closer to the person, our mind tends to blank out some of the surroundings as we focus our eyes on the face in front of us. This is a perspective change. It is also a gradual and self-motivated change.

In cinema, changes in camera angle occur spontaneously and are motivated by the director/editor, not the viewer. The change may be a bit of a sensory shock.

Audiences tend to accept the visual change, since in real life our brain is constantly shifting focus and scope of what we see (a biological imitation of zooms and cuts, if you will). But it takes us longer to adapt to outwardly imposed changes on the audio, especially when it creates a discontinuity of levels (normal, loud, soft, loud, normal, soft, soft, loud, etc.) within the scene.

Getting back to our example scene above, when we move the mic closer to the Nerd for his close-up, the effect is to make his voice dominate over the surrounding background, which is in keeping with the natural change in perspective. But if we allow his voice level to rise above its established level range, then the audio becomes disjointed from the time line of the complete scene and will not smoothly intercut with the rest of the footage.

Therefore, when you move the microphone in for a close-up, re-adjust the volume so that the actor's voice level remains constant with the rest of the sequence. Characters' audio should be somewhat constant throughout the course of the scene, even as the shot changes from wide shots to mediums to singles to reverses to mediums, etc. If you were to close your eyes, the changes in audio from shot to shot should not be unnatural nor unexpected.
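The re-adjustment amounts to a simple subtraction: the trim is the established level minus what the new setup reads. Here is a minimal sketch using the example levels from this article; the function names and the 1 dB tolerance are assumptions for illustration, not an industry rule.

```python
def trim_db(established_db, measured_db):
    """Gain change that brings a new setup back to the established level."""
    return established_db - measured_db


def needs_trim(established_db, measured_db, tolerance_db=1.0):
    """True if the new reading is outside the tolerance (hypothetical 1 dB)."""
    return abs(trim_db(established_db, measured_db)) > tolerance_db


if __name__ == "__main__":
    # Nerd was established at -10; in the close-up he now reads -6.
    print(trim_db(-10.0, -6.0))     # -4.0
    print(needs_trim(-10.0, -6.0))  # True
```

A negative result means pulling the gain down. The closer mic still changes the blend of voice to background, which is the perspective change you want to keep.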

This is not to say that an actor's voice level should not diminish when he walks far away from the camera. Of course it should, as it would in real life. But a variation in camera angle (as opposed to a change in actor location within the set, visual or implied) does not warrant a major change in audio levels. However, a major change in camera LOCATION may justify a change in relative audio, particularly the background.

Of course there are always going to be some changes in audio levels. This is an art form, not a controlled manufacturing process. The nature of production is such that we can't always control things as much as we'd like to, such as mic placement and background ambiance. The idea, though, is to at least try to keep these level changes as minimal and inconspicuous as we can when we record them; and then to fix them completely during post-production.

Sound needs to be consistent within a shot and from shot to shot, since this footage may all be integrated during editing. It must also match up when scenes butt up against other scenes.

Throughout the duration of the production, try to establish and then maintain relative audio levels for all of your characters. Change perspective (the blend of background to dialogue) as necessary, but try to keep your characters as constant as possible.

Equalization is another important aspect to consider, as well as straight audio level. Avoid using any more equalization than is absolutely necessary on the set. Traditionally, a mixer will roll off the excess bass frequencies to reduce wind noise and rumble, especially out of doors. Some mixers like to boost the mid-range frequencies just a smidge, in order to emphasize speech over ambiance. High frequencies are usually left alone.

If you choose to employ some equalization on a shoot, make certain that you apply the setting consistently from the first day of production to the last. For instance, many mixers have a set degree of bass rolloff that they will use outdoors and a lesser amount of rolloff for interiors. That is okay since people do sound different outside than inside. But do not vary the intensity of rolloff from day to day based on the local wind conditions. Otherwise, what sounds good Monday and Tuesday may not intercut well with material recorded the week before, or the month after!

Resist the temptation to sweeten the mix on location by playing with all those colored dials. Once you record something with EQ, it cannot be undone later on. Record your tracks as plain as possible, and save the special effects and final tweaking for post, where they have the liberty of working with edited sequences and of repeating their attempts until it all sounds right.

The only time that a mixer is justified in employing extraordinary EQ to improve a shot (that is, anything over and above your "permanent" bass rolloff and possible midrange bump) is when the scene would otherwise absolutely have to be looped. In other words, you can play with the EQ only when you have absolutely nothing to lose and everything to gain. If in doubt, leave the EQ settings alone!

In conclusion, plan ahead!

See how your characters interplay, and then try to establish and maintain their relative audio levels and EQ regardless of close-up or wider shot. Changing perspective does not mean changing volume, only reducing background. Louder does not mean better.

Adjust your "permanent" EQ settings for interior or exterior, but do not mess around from shot to shot or from scene to scene.

Above all else, think like an editor. All this stuff has to intercut smoothly and seamlessly. From consistent work habits you will achieve consistent soundtracks.