Chapter 2: Elements of the Soundtrack

Much of what a Production Mixer does is based upon his or her assessment of what will be needed later on during post-production (editing & final mixdown). With that in mind, let’s begin with a brief overview of “post” and work our way back to the production side of things. Key elements include Narration, Music, Sound Effects, and Dialogue.

What types of sound make up a motion picture or video soundtrack?

Narration (N, NARR, VO)

Many films rely heavily on NARRATION to hold the visuals together or to provide explanation. All of us, I’m sure, are familiar with documentaries, travelogues, and educational films that employ Narration as the primary element of the soundtrack. Don’t forget, however, that many theatrical films also use Narration as a story device—sometimes in the role of an ‘anonymous’ storyteller, sometimes as the inner thoughts of a principal character.

Narration can be recorded in two different ways.

The first way, or style, is to have the narrator view the film and record live commentary while it is projected. The lines may be from a script or totally improvised, depending on the film in question. This style is referred to as “sync to picture”. As you may have guessed, it is quite common in travelogues!

The other approach, which is usually preferred, involves recording the narration “wild” from a script, instead of while watching the picture. The talent reads the lines from a prepared script, and each line is recorded as an isolated take. (Note: while some narrators may view the film to prepare for the recording session, the picture plays no role during the session itself.) An editor then cuts the desired lines into place opposite the appropriate footage.

This method gives the filmmaker maximum creative control over the relationship between picture and narration, and allows greater flexibility should editorial changes be desired later on. It also frees the narrator to concentrate on enunciation and delivery of the lines, rather than worrying about matching whatever is on screen at that moment.

Narration tracks can physically be recorded either in a professional recording studio (with full acoustic isolation from any outside noise) or as a “wild track” while on location. Which technique to use depends on how the narration is meant to intercut with the rest of the soundtrack.

If the narration is supposed to be authoritative and ‘anonymous’ (commonly nicknamed the “voice of God” approach), then isolated studio recording is called for. The voice track is recorded with full presence, completely free of any ambient background noise or room coloration (room echo or bounce).

On the other hand, if the narration is supposed to be diegetic, in other words a “continuation” of on-screen dialogue or on-screen explanation, then the narration is usually recorded as a “wild track” (camera is not shooting) at the same location. The sound quality of the wild lines should closely match that of the original on-screen portion of the dialogue. Perspective and presence should be similar. Background ambiance and room acoustics should also match. The goal is to convince the audience that the narration is an uninterrupted continuation of the talking head they saw at the beginning, even though the visuals have cut away to instructional inserts.

It is true, however, that the sound mixer will often be asked to record “voice of God” narration as well as “wild lines” while out on location, due to the limited availability of some actors (or the limitations of the budget). This then becomes more a matter of technique: “faking it” so that the track sounds as though it were recorded in an isolated studio.

Music (M, MX)

Even the earliest ‘silent’ films depended heavily on music to add emotion to moving images. The presence of a musical score tells the audience what feelings they are supposed to have: joy, sorrow, tension, exhilaration, impending fear, etc. In fact, many prerecorded musical scores in music libraries are titled and catalogued by their suggested emotional effect.

If this explanation of music’s role is new for you, then experiment a little. View a favorite film or two on DVD. Pick out a few major scenes, and try viewing them again with the sound off. Instead, play a few music albums in the background as you view the scenes. Notice how each different music selection seems to change the feeling of the scene!

As you can see, the presence of music always has some effect on what the audience will perceive about a scene. Depending on the musical selection, this effect may reinforce, contradict, or completely alter the original intent of the picture.

The dramatic source of music under a scene can be either “extraneous” or “practical”. Extraneous means that the score is simply there on the soundtrack because the filmmaker put it there to accompany the picture. The people in the movie theater hear it, but the characters in the film do not. Most music in soundtracks falls under this category, which is also referred to as “non-diegetic” sound.

In contrast to this, some music is initially explained or motivated by some source on screen, such as a radio playing, a nightclub band, or a character musician. In these instances, the music that the audience hears is also being heard by the characters on screen! This is also referred to as "diegetic" sound.

Sometimes, music can creatively overlap both of these categories, by starting off as extraneous and then being revealed as practical, or vice versa.

Music for a soundtrack can originate in one of two ways: canned or original score.

“Canned” music is music that comes from a prerecorded music library. For a fee, a producer can purchase the rights to use selections of existing music in his or her production. A large number of companies produce volumes of high-quality, general-purpose music tracks intended exclusively for this purpose. The music is composed and recorded to facilitate “modular” editing, so that it can be cut to fit a scene’s length or climax.

Producers can pay for the music on a “needle drop”, screen minute, or blanket basis. Needle drop refers to buying music on a per-selection, per-use basis. Blanket arrangements permit unlimited use of the entire library, either for an entire production or for an entire year. In determining their fees, music libraries will also want to know the intended purpose and scope of distribution of the film (theatrical, educational, home video, nationwide broadcast, industrial in-house, etc.).

Readers are warned, however, to exercise extreme caution in planning to use consumer music albums (pop, rock, soul, oldies, classical, etc.) as sources of music. Even in cases where the song itself is in the public domain, the particular arrangement and performance are usually protected under copyright law. If you feel it is absolutely imperative to use a “real” song instead of one from a music library, make certain to obtain permission, in writing and in advance, from the rights holders in question (typically the recording company for the recording itself and the music publisher for the composition)! Otherwise, you will discover just how ruthless, greedy, and unsympathetic lawyers and their clients can be.

For example, for decades a tune as innocuous as the Happy Birthday Song was claimed under copyright and diligently monitored. Its claimed rights holders collected oodles of cash from countless public appearances of the song on TV shows, films, restaurant chains, concert venues, and the list goes on, until a U.S. federal court ruled in 2015 that the copyright claim was invalid. Be careful what music appears in your production!

The other option is to have music originally composed and recorded for your project. This could involve a full-scale orchestra, or be as simple as a single musician overdubbing himself. The process begins with supplying the composer with a video copy of the footage, along with instructions from the director or editor.

In the course of composing the music, at some point the composer and editor will create what is known as a “click track”. This is a soundtrack that consists solely of clicks placed opposite the picture in order to convey cutting rhythm and climax. This click track serves to guide the composer and, later on, the musicians in keeping ‘beat’ with the film rather than a more arbitrary reference rhythm.
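To make the click-track idea concrete, here is a minimal Python sketch, assuming a steady tempo, 24 fps picture, and a first click on frame 0 (all assumptions; real click tracks often change tempo to land on a climax). Click spacing is commonly described in frames and eighths of a frame, so a second helper converts a click position into that form. Both function names are hypothetical.

```python
from fractions import Fraction


def click_positions(bpm: int, length_frames: int, fps: int = 24) -> list[Fraction]:
    """Exact frame position of every click for a constant tempo.

    Assumes the first click falls on frame 0 and the tempo never changes.
    Fractions keep positions exact, since most tempos do not divide
    evenly into whole frames.
    """
    frames_per_beat = Fraction(60 * fps, bpm)
    positions = []
    pos = Fraction(0)
    while pos < length_frames:
        positions.append(pos)
        pos += frames_per_beat
    return positions


def frames_and_eighths(pos: Fraction) -> tuple[int, int]:
    """Round a click position to the nearest eighth of a frame.

    Returns (whole frames, eighths); 14.4 frames becomes (14, 3).
    """
    total_eighths = round(pos * 8)
    return divmod(total_eighths, 8)


# 120 BPM at 24 fps gives one click every 12 frames.
print(click_positions(120, 48))
# 100 BPM at 24 fps gives 14.4 frames between clicks.
spacing = click_positions(100, 30)[1]
print(frames_and_eighths(spacing))
```

Exact fractions are used instead of floats so that rounding error does not accumulate over a long cue.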

After the music has been composed, the next step is obviously to record it. In the case of an orchestral score, musicians are assembled and arranged in a large recording studio, known as a “scoring stage”. There, they view the film on a large screen while hearing the click track in headphones. Led by the composer, the orchestra performs the selections. The music is recorded on multi-track for later mixdown.

When the score is composed and performed by a single musician, as is more often the case on low-budget productions, the individual composer may be responsible for producing the entire musical soundtrack. Employing a portable multi-track recording system in conjunction with video playback (or, more likely, a sophisticated software-based system such as Pro Tools), he or she will commonly perform and overdub with keyboards, synthesizers, electronic drums, and perhaps a few acoustic instruments.

As to which form of music is better, it all depends on the situation, budget, and talent pool available. A good canned library will sound better than the results obtained from most “aspiring” young composer/musicians and from many “hack” orchestral composers. On the other hand, there are many talented composers whose quality and brilliance far surpass the generic accompaniment of even the best music libraries. (Personally, on low budget shows, unless the individual is of known and proven aptitude—I would prefer to go with a canned selection of good quality rather than gamble for excellence and end up with trash.)

There is also a new form of music library that uses computer software to custom-create a music score from prerecorded songs broken into modules. After the film editor selects a genre and length (to within 1/10 of a second), the program searches its library for appropriate title selections. Music is created by assembling appropriate modules (openings, endings, middles, etc.) to achieve the correct length. Variations of the theme are created by assembling different modules in a different order. These computer-assisted music libraries are a fantastic compromise between custom music and traditional prerecorded offerings. (The author is referring to SmartSound.)
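The modular assembly idea can be sketched as a small search problem. The Python below is not SmartSound’s actual algorithm, which this chapter does not describe; it is a hypothetical illustration that works in tenths of a second, uses one opening and one ending module, and fills the middle with any combination of (repeatable) middle modules that makes the total hit the requested length exactly. The module names and durations are made up.

```python
def assemble(target_s, opening, ending, middles):
    """Pick modules whose total length equals target_s to the nearest 0.1 s.

    opening, ending: (name, seconds) tuples, each used once.
    middles: list of (name, seconds) tuples that may repeat.
    Returns a list of module names, or None if no combination fits.
    """

    def tenths(seconds):
        return round(seconds * 10)

    remaining = tenths(target_s) - tenths(opening[1]) - tenths(ending[1])
    if remaining < 0:
        return None

    # fill[t] holds one sequence of middle modules lasting exactly t tenths.
    fill = {0: []}
    for t in range(1, remaining + 1):
        for name, seconds in middles:
            d = tenths(seconds)
            if 0 < d <= t and (t - d) in fill:
                fill[t] = fill[t - d] + [name]
                break

    if remaining not in fill:
        return None
    return [opening[0]] + fill[remaining] + [ending[0]]


opening = ("open", 4.0)
ending = ("end", 6.0)
middles = [("A", 8.0), ("B", 5.5)]
print(assemble(23.5, opening, ending, middles))  # ['open', 'B', 'A', 'end']
```

Reordering the `middles` list makes the search settle on a different, equally long arrangement, which loosely mirrors how variations of a theme come from assembling modules in a different order.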

Sound Effects (FX, PFX, SFX)

The third of our soundtrack elements, in addition to narration and music, is the category of “Sound Effects”.

Sound Effects (commonly abbreviated as “FX”) refer to the sounds, other than dialogue, that objects or people make, along with those sounds that occur naturally in the background. These sounds need only be “natural” within the creative context of the movie and the filmmaker’s imagination; what they actually sound like in real life is often beside the point. Who really knows what a three-foot mosquito sounds like? All that matters is that the sound effect works within the creative framework of the movie!

Sound effects can refer to events happening on or off screen. Footsteps of an actor may be an on screen event if we see the actor. Footsteps of the killer, coming down the hallway, outside of the closed door are an off screen event if all the audience sees is a shot of the closed door (from inside of the heroine’s room). Similarly, background ambiance often refers to off screen activity that the audience may never see, such as a passing siren, birds & crickets, a thunderstorm, and so on.

Sound effects may be either frame-accurate or wild. If the effect depends on synchronizing exactly, frame for frame, with an on-screen event, it is known as a frame-accurate effect or, more commonly, a “hard” effect. Examples include matching the sound of a gunshot with the firing of a gun, matching up door slams, whip cracks, sword clashes, punches, silverware being put on a plate, and so on.

If the sound of the effect only needs to be placed in the vicinity of an on screen event, but specific frame-to-frame synchronization is not important, then it is referred to as a wild or “soft” effect. Examples include environmental backgrounds (birds & crickets, rain, wind, ocean surf, traffic), engine noise, cafeteria ambiance, crowd noises, applause, laughter, even music and narration.
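The difference between “hard” and “soft” placement can be put into numbers. The sketch below assumes 24 fps picture and 48 kHz audio (neither figure comes from this chapter), which gives 2,000 audio samples per frame. The tolerances are hypothetical, half a frame for a hard effect and one second either way for a soft one; in practice editors judge sync by eye and ear.

```python
SAMPLE_RATE = 48_000  # assumed audio sample rate, in Hz
FPS = 24              # assumed picture frame rate
SAMPLES_PER_FRAME = SAMPLE_RATE // FPS  # 2,000 samples per frame


def offset_frames(event_frame: int, onset_sample: int) -> float:
    """Signed offset between a picture event and an effect's onset.

    Positive values mean the sound lands late; negative means early.
    """
    return onset_sample / SAMPLES_PER_FRAME - event_frame


def in_sync(event_frame: int, onset_sample: int, hard: bool = True) -> bool:
    """Check placement against hypothetical hard/soft tolerances."""
    off = abs(offset_frames(event_frame, onset_sample))
    if hard:
        return off < 0.5      # hypothetical: within half a frame
    return off <= 1.0 * FPS   # hypothetical: within one second


# A gunshot on frame 100 whose sound starts one frame (2,000 samples) late:
print(in_sync(100, 202_000))              # False: one frame off breaks hard sync
print(in_sync(100, 202_000, hard=False))  # True: fine for a soft effect
```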

The sound effects themselves can originate from a number of different sources. Many effects are lifted from special sound effects libraries that operate similarly to music libraries. Editors can pay per effect, or arrange blanket usage agreements. Most sound editors and studios maintain and compile their own elaborate libraries of sound effects, built up over the years from all of the films they have worked on as well as by swapping with fellow editors. Unlike music, the original ownership of most sound effects is very difficult to identify, so, except in a few rare cases (such as recognizable synthesized effects), most editors consider mere access to an effect as an okay to use it. Legally speaking, that assumption is false. However, the practice remains rampant in Hollywood.

Library effects include both “hard” effects as well as “wild” or “soft” backgrounds.

Sound effects don’t always come from a library. Quite often, they are recorded right on the set during actual production. Effects may be recorded in “sync” with picture during a take. This might include footsteps, door slams, explosions, car crashes, virtually anything that takes place in front of the camera. These are known as "PFX", or production sound effects.

Sometimes, though, these sound effects coincide with live dialogue or other effects. In those instances, and when time permits, the location sound mixer will try to record the sound effect “clean” after the take has been shot. (Although it can be confusing, the term “wild” also applies to anything recorded on the set without the camera rolling in sync.) This newly recorded effect retains most, if not all, of the same ambiance and characteristics of the original take. It is also completely accurate in that the same props were utilized.

Imagine yourself as an editor trying to match the sound of an arthritic woman slamming the car door of a ’62 Thunderbird coupe... from an effects library. There might be a dozen or so car door slams, but probably none with the right speed, intensity, and delivery, not to mention car model. In some situations, exact matching of details may be critical, such as in a sales film or commercial, where it is illegal to substitute the sound of another car for the one being featured.

Sound effects can also be recorded after production, during editing. It is not uncommon for a sound editor to send someone out (hopefully, a bona fide soundperson) to record a list of needed effects. Freshly recorded sound effects are usually far superior to anything in a library. By knowing as much as possible about how the effect is to be used in a given scene, the soundperson can do a better job of recording it to match.

The soundperson should avoid the temptation to record more or fewer elements of the effect than the editor called for. For instance, if the editor needs the sound of a hammer striking a nail, don’t embellish the track with background construction noises and wild dialogue (“Hey, Ralph, hold this nail for me!”).

Some effects don’t readily lend themselves to live recording. Ever try to record the footsteps of a giant dinosaur? Editors and sound mixers will often conspire to create a sound effect that doesn’t exist in real life (or does exist but doesn’t lend itself to easy recording). Effects may be completely synthesized on electronic instruments, or may be based on taking real sounds and electronically modifying them. Most effects are composite effects, built up like a musical chord from a number of simpler sounds (any of which may also have been modified).

Finally, many sound effects are ‘dubbed’ in, by means of a process known as “Foley”. Briefly, the Foley process consists of recording the sounds of an artist while he mimics the actions of an actor on the screen. A short section of the film is projected over and over again for the Foley artist (also known as the “Foley walker”). The artist watches every movement of the actor very carefully, and mimics both the action and rhythm. The artist performs those same actions using a variety of props, and these actions are recorded in sync with the picture. For instance, the Foley walker may imitate the actor taking out a gun from a holster, or sitting down in a squeaky chair, or shuffling some papers in his hand.

In addition to mimicking simple actions, the Foley artist will also dub fight punches, hugs, kisses, swordplay, head scratching, and anything else that emits sound, no matter how subtle.

Then there are the footsteps, which are what Foley people are best known for. Nearly every actor walks. Sometimes we see his feet moving; other times we only sense the movement because the camera is in close. The Foley artist will recreate all of the footsteps of each actor, whether the steps are seen or only implied. To help make the Foley footsteps match the environment on screen, the Foley recording stage is equipped with a multitude of small troughs known as Foley pits. Each Foley pit is a small rectangular area filled or covered with a different texture, such as concrete, dirt, linoleum, carpet, hardwood flooring, marble, grass, brush & twigs, sand, cobblestone, steel plate, and so on. In addition, there is a small wading pool of water for creating aquatic sound effects. The Foley walker also has access to a wide array of footwear, ranging from men’s combat boots to women’s high heels (regardless of whether the Foley artist is male or female!), in order to accurately recreate all of the footsteps as well as mere body shuffles.

Some (low budget) Foley artists also employ what is known as “Fingertip Foley”. For example, walking your fingers in a large tray of uncooked rice, topped off with corn flakes, sounds exactly like a person walking through grass/dirt blanketed with dry leaves.

Dialogue (D, DX, ADR)

The fourth and final major element of the soundtrack is dialogue, or speech. Audiences want to hear what the actors are saying!

Dialogue in a film takes on, ultimately, one of two forms. Either the words are spoken by an actor on screen, with the lips visible to the audience; or, the words are spoken by an actor off screen, or by an actor on screen whose face is not visible. Dialogue from an actor whose face we see is termed “lip-synch”, because the words must match the movement of the lips. All other dialogue is considered “wild”, since it does not have to sync with any on screen source.

The recording of dialogue usually occurs on the set during filming, and this is referred to as “production dialogue”. Sometimes, while actors are on the set but without cameras rolling, the company will record additional lines of dialogue to be used later as “wild lines”. Examples of wild lines that would be recorded on the set for future use include other halves of phone conversations, shouts or greetings from afar, background ambiance, alternate dialogue (known as Protection Tracks, used to cover profanity in the event of a television broadcast), narration, or any dialogue that talent tends to stumble over (the editor can either meticulously replace the lip-synch a word at a time, or cut to a reverse angle that hides the actor’s lips and just lay in the lines).

Sometimes, for any of a multitude of reasons, production dialogue is unusable and must be replaced during post-production. Maybe a production mixer is either incompetent or suffers an equipment malfunction. Often, the problem is totally beyond the help of the mixer, such as a loud generator or continuous aircraft. Directors may shout screen directions and talk during dialogue. There are all sorts of reasons and excuses for having to replace dialogue on occasion, some of which we can control and some of which we can’t.

When a production track does need to be replaced, editors use a process known as “A.D.R.”, which is short for Automated (or Automatic) Dialogue Replacement.

In the old days, dialogue replacement was done by physically cutting out short sections of the original dialogue (consisting of one or two lines) along with the appropriate picture. These sections were formed into continuous loops. That’s why the process was called “looping”. A projection system would run a loop of picture along with the corresponding loop of original sound in sync with a loop of fresh stock threaded up in a recorder. The actor would watch the film clip, listen to his original track on headphones, and re-perform each line aloud.

When the process was complete for each loop of dialogue, the editor would painstakingly replace each section of picture along with the newly recorded sound.

More modern technology later simplified the process. In the A.D.R. process, the physical loops were done away with. Instead, the entire reel of picture and the entire reel of original sound were threaded up in sync. An entire reel of blank audio stock was set up on a recorder. A simple computer was programmed with the start and stop footage of each “loop” that needed to be recorded. All three machines rolled down, in sync, to the first “loop” and the process began. The actor watched the projected footage and listened to the cue track on headphones. A series of three audible beeps alerted talent as the system rolled forward towards the record start point. His take was recorded on the blank stock. At the completion of each take, the computer rewound all three machines back to the programmed start point and the process repeated itself. When the “loop” had been successfully recorded, the entire system moved ahead to the next programmed set of cues.
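The cue programming described above can be sketched as simple arithmetic on footage counts. This Python sketch assumes 35mm film (16 frames per foot) running at 24 fps, and assumes the three beeps are spaced one second apart so that a fourth, silent beat would fall exactly on the record start point; that spacing is a common convention but is not specified in this chapter. The `Cue` class and `cue_schedule` function are hypothetical.

```python
from dataclasses import dataclass

FRAMES_PER_FOOT = 16  # 35mm, 4-perf film
FPS = 24              # sound speed


@dataclass
class Cue:
    """One A.D.R. "loop": the start and stop footage of a line."""
    name: str
    start_ft: int
    stop_ft: int


def cue_schedule(cue: Cue, beeps: int = 3) -> dict:
    """Frame numbers for the warning beeps and the record in/out points.

    Assumes one beep per second, with the next (silent) beat landing
    on the record start point.
    """
    record_in = cue.start_ft * FRAMES_PER_FOOT
    record_out = cue.stop_ft * FRAMES_PER_FOOT
    if record_out <= record_in:
        raise ValueError(f"cue {cue.name}: stop must come after start")
    beep_frames = [record_in - FPS * n for n in range(beeps, 0, -1)]
    if beep_frames and beep_frames[0] < 0:
        raise ValueError(f"cue {cue.name}: not enough pre-roll for the beeps")
    return {"beeps": beep_frames, "record_in": record_in, "record_out": record_out}


# The controller would step through this list, repeating each cue until
# a take is approved, then rolling ahead to the next one.
cues = [Cue("1", 550, 560), Cue("2", 612, 618)]
for cue in cues:
    print(cue.name, cue_schedule(cue))
```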

After the A.D.R. recording process had been completed, life was considerably easier for the editor, since all three elements (picture, production sound, and A.D.R.) were already in sync with each other throughout the entire reel. To replace bad original sound, all the editor had to do was put the three elements in a gang synchronizer on his editing bench, roll down to the first cut point, and splice in the new track. 550’ on the picture reel and 550’ on the production sound reel would correspond to 550’ on the A.D.R. reel.
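Because all three reels share the same footage count, a reading on one reel maps directly onto the others. Converting that footage into frames and running time is simple arithmetic; this sketch assumes 35mm film at 16 frames per foot and 24 fps (16mm film would use 40 frames per foot instead). The function names are hypothetical.

```python
FRAMES_PER_FOOT = 16  # 35mm, 4-perf film (16mm would be 40)
FPS = 24              # sound speed


def footage_to_frames(feet: int, frames: int = 0) -> int:
    """Convert a synchronizer reading (feet + frames) into a frame count."""
    if not 0 <= frames < FRAMES_PER_FOOT:
        raise ValueError("frames must be less than one foot")
    return feet * FRAMES_PER_FOOT + frames


def frames_to_footage(count: int) -> tuple[int, int]:
    """Convert a frame count back into (feet, frames)."""
    return divmod(count, FRAMES_PER_FOOT)


def footage_to_seconds(feet: int, frames: int = 0) -> float:
    """Running time from the head of the reel, in seconds."""
    return footage_to_frames(feet, frames) / FPS


# 550 feet is the same instant on the picture, production, and A.D.R. reels.
print(footage_to_frames(550))   # 8800
print(footage_to_seconds(550))  # roughly 366.67 seconds, just over six minutes
```

At these rates 35mm film travels 90 feet per minute, so 90 feet of footage is exactly one minute of screen time.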

A.D.R. could also be recorded using a multi-track recorder with SMPTE timecode to sync with picture, or, most likely today, with a non-linear digital editing system that holds picture and audio on its hard drive.
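With SMPTE timecode, the footage counter gives way to an hours:minutes:seconds:frames address. Here is a minimal conversion sketch, assuming non-drop-frame timecode at a whole-number frame rate such as 24 fps; 29.97 fps drop-frame timecode skips certain frame numbers and would need an extra correction that this sketch omits. The function names and the example timecodes are hypothetical.

```python
def timecode_to_frames(tc: str, fps: int = 24) -> int:
    """Convert non-drop-frame "HH:MM:SS:FF" timecode into a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    if not (0 <= mm < 60 and 0 <= ss < 60 and 0 <= ff < fps):
        raise ValueError(f"invalid timecode: {tc}")
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def frames_to_timecode(count: int, fps: int = 24) -> str:
    """Convert a frame count back into non-drop-frame timecode."""
    total_seconds, ff = divmod(count, fps)
    total_minutes, ss = divmod(total_seconds, 60)
    hh, mm = divmod(total_minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


# Offset of a cue from the start of a reel (hypothetical timecode values).
reel_start = timecode_to_frames("01:00:00:00")
cue_in = timecode_to_frames("01:06:06:16")
print(frames_to_timecode(cue_in - reel_start))  # 00:06:06:16
```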

Computer editing has certainly made the physical recording/cutting process a whole lot easier and faster; but the basic principles remain the same.