Recently, a proposal was submitted to our faculty at California State University Northridge for one of our Senior Thesis Films: an original musical, to be shot "live". Even though we have discussed Sync Playback in previous articles here on our website, FilmTVsound.com, I felt it was important to revisit this topic and walk through the process of shooting to Sync Playback.
Filming a music video or musical sequence within a film entails having the cast perform in perfect lip synch to the music, seamlessly, throughout a large number of camera angle changes and other edits. The only practical way to achieve this is to splice all of the selected picture takes over one continuous piece of music (aka the edit master), so that the final product is built from a single rendition of the song, free of Frankenstein-ish glitches in pitch & tempo.
Attempting to re-construct the "song" by assembling multiple "live" takes is a daunting task, and it is seldom successful even with the best of singers & musicians. The only way to go "live" is to use only footage shot by multiple cameras during one single take. It is extremely difficult to combine audio from other takes.
Usually, a music video of a "live" concert consists of one performance covered by multiple cameras, which is then enhanced by additional coverage acquired by playing the audio of the "live" song back repeatedly while the performers lip synch to the original live track.
Traditional music videos are shot by playing back a song that was previously recorded in the studio (in order to achieve audio perfection). These can be shot either single camera or multi-camera. First, you must create an edit master, which consists of the original song combined with SMPTE timecode. The edit master is safely set aside for post production, after duplicates of the edit master (playback dupes) have been produced for use on the set.
During production, the camera is rolled first. Then the Director calls for "playback," and the pre-recorded playback dupe is played on the set via a loudspeaker system. Note that the song can be played in its entirety, or only a selected portion (along with several seconds of lead-in). Simultaneously, the pre-recorded timecode is displayed for the camera on a timecode slate or timecode display reader. After visually capturing a few frames of the playback timecode, the slate is removed and the performance is ready to be photographed on film or video.
Remember that you cannot JAM SYNC the playback timecode to the timecode slate the way we do for live dialogue, since the timecode is pre-recorded and dependent on the act of playback. The slate and the playback recorder/deck need to be physically connected by a cable or wireless transmitter/receiver.
The presence of the visual timecode allows the editors to match up each visual take with the corresponding section of the song (edit master). Note that although the playback timecode can sometimes be recorded on a camera audio track for reference, it is not advisable to jam sync the camera timecode to the playback timecode. The camera timecode allows post production to identify and conform all of the resulting camera footage; if that footage carried timecode that repeated itself (the playback timecode), then the computers would have no way of distinguishing the raw footage!
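For those who want to see how the editors line things up, here is a minimal sketch (in Python, with purely illustrative names and values) of how a timecode reading from the slate can be converted into a frame offset into the edit master.

```python
# Minimal sketch: convert an SMPTE timecode reading (as seen on the
# playback slate) into a frame count so a take can be lined up against
# the edit master. Function names and values here are illustrative.

FPS = 30  # nominal frame numbering base for 29.97 non-drop timecode

def timecode_to_frames(tc: str, fps: int = FPS) -> int:
    """'HH:MM:SS:FF' -> total frame count from 00:00:00:00."""
    hh, mm, ss, ff = (int(x) for x in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

# Timecode captured on the slate at the head of a take:
slate_tc = "01:02:10:15"

# Timecode at which the edit master (song) starts:
song_start_tc = "01:00:00:00"

# Offset of this take into the song, in frames:
offset = timecode_to_frames(slate_tc) - timecode_to_frames(song_start_tc)
print(f"Take begins {offset} frames into the edit master")
```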
"But we want to shoot 'live', like they did on HBO's TREME and Les Miserables!"
True, sometimes Hollywood does shoot a musical sequence live, without benefit of playback. But those instances are extremely rare.
First, let's take a quick look at TREME, the HBO dramatic series about musicians in New Orleans. In that series, all of the musical numbers were, in fact, shot live and recorded live. However, there are a couple of things to take into account. To begin with, all of the musical scenes were supposed to be live performances in the streets and clubs. That meant it was okay to be imperfect, and to suffer from poor acoustics & extraneous noise. This music was not intended to match the "studio album" of a record label.
The stars in the series sometimes were not actual musicians themselves, so real street musicians filled in for them just outside of the camera frame while the actors pantomimed to the music. Nor was there much in the way of choreography, lighting effects, set changes, or magical location changes. It was all kept simple, real-time, and street realistic.
Les Miserables was a hit Broadway show for many years. Its sets and choreography had a long evolution, so the production team had a lot to start from. All of the cast (except for you know who) were accomplished singers with stage experience.
It was, for the most part, shot "live" one song at a time. Those songs were covered by three to ten cameras per take. It was shot in large state-of-the-art sound stages, so that there would not be extraneous noise. Lighting and set changes (within the take) were controlled by computerized lighting boards and sophisticated rigs.
The live accompaniment was a piano. Full orchestration was recorded during post production to match the singing (key & tempo).
And they rehearsed, rehearsed, and rehearsed for around four months!
Keeping It All in Synch
Sync Playback requires precision lip synch. Just as in recording live dialogue, playback speed must be exact, with no wow & flutter, if the performance in the picture is to match up frame for frame with the soundtrack.
In today's digital climate, concern about wow & flutter is a thing of the past (unless you are playing back CDs on a boom box). Pretty much all of your digital files, played back on a digital recorder, will maintain perfect speed without mechanical drift. Yes, CDs may drift during playback -- ever watch a DJ do a scratch mix?
What you do need to be concerned with is matching the timecode frame rate with that of the camera (and edit system).
If you are shooting in standard (SD) video, or in high def (HD) video -- then it is fairly simple. Picture speed does not change between the set, editing, and presentation. Just make sure that you play back your audio/timecode files at the same rate at which they were created, which should match the rate of the intended picture.
For standard (SD) video, that means 29.97 frames per second (either non-drop or drop-frame). Make sure that you match non-drop audio to non-drop picture, or drop-frame audio with drop-frame picture settings. For example, if the video format is going to be 29.97 non-drop -- create your edit master (music) at 29.97 non-drop. Then create your matching playback dupes at 29.97 non-drop.
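If you want to see why drop-frame and non-drop must never be mixed, here is a small sketch of the standard 29.97 drop-frame numbering scheme (frame labels 00 and 01 are skipped at the start of every minute, except every tenth minute); the function names are just illustrative.

```python
# Sketch of the standard 29.97 drop-frame numbering: frame labels 00 and
# 01 are skipped at the start of every minute except every tenth minute,
# so the drop-frame label drifts away from a non-drop count of the very
# same frames. (Names here are illustrative.)

def frames_to_dropframe(frame_count: int) -> str:
    d = frame_count // 17982          # 17982 frames per 10 drop-frame minutes
    m = frame_count % 17982
    if m > 1:
        frame_count += 18 * d + 2 * ((m - 2) // 1798)  # 1798 frames per DF minute
    else:
        frame_count += 18 * d
    ff = frame_count % 30
    ss = (frame_count // 30) % 60
    mm = (frame_count // 1800) % 60
    hh = frame_count // 108000
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"   # ';' marks drop-frame

def frames_to_nondrop(frame_count: int) -> str:
    ff = frame_count % 30
    ss = (frame_count // 30) % 60
    mm = (frame_count // 1800) % 60
    hh = frame_count // 108000
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"

n = 30 * 60 * 10  # ten nominal minutes' worth of frames
print(frames_to_nondrop(n))    # 00:10:00:00
print(frames_to_dropframe(n))  # 00:10:00;18 -- the two labels no longer agree
```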
Bit depth is usually 16-bit at 48 kHz sampling, although some productions may opt for 24-bit at 48 kHz. Never use 16-bit/44.1 kHz, which is the consumer audio CD release format.
For HD video, picture would be at the 23.976 frame rate (there is no drop-frame or non-drop distinction at 23.976). So format your audio likewise, at 23.976 frames per second.
On the set, you would play back your audio at its true rate.
Where things become complicated is when the camera shoots on the set at one speed, but ends up in the edit bay at a slightly converted speed. For example, if you shot with a film camera running at 24 sprocketed frames per second -- that speed might be slowed down during the film-to-video telecine conversion to 23.976 video frames per second (HD) or even 29.97 non-drop video frames per second (SD). To get from a true 24 fps to 23.976 fps, the film footage is slowed down (pulled down) by one tenth of one percent (a factor of 1000/1001).
If the performers sang, danced, and played to music on the set (during playback) at real-time speed, then their movements would appear out of sync once the camera footage was slowed down yet synched up with that same, unaltered music (picture slowed from 24 to 23.976, but audio not changed). To compensate for this slowing down of what was shot on the set, it is necessary to play the music back one tenth of one percent faster on the set, because the camera is effectively running slightly faster on the set than it will be in post.
Speeding up the playback rate is known as pull-up. So what you get is: camera on the set at 24 fps, slowed down to 23.976 in post; audio on the set at 24 fps, slowed down to 23.976 in post. In terms of sampling rates, when we slow 48 kHz down by 0.1% we end up at 47.952 kHz. Or, if you start with 48.048 kHz on the set, you end up at exactly 48 kHz in post.
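To keep the numbers straight, here is the 1000/1001 arithmetic worked out as a quick sketch; nothing in it is specific to any particular recorder.

```python
# Worked arithmetic for the 0.1% (1000/1001) pull-down / pull-up family.
# These are just the standard ratios printed out.

PULL = 1000 / 1001        # the "0.1% slower" factor

print(24 * PULL)          # 23.976...  film 24 fps pulled down for HD video
print(30 * PULL)          # 29.97      the NTSC-derived video rate
print(48000 * PULL)       # 47952.05   48 kHz audio pulled down with the picture
print(48048 * PULL)       # 48000.0    record at 48.048 kHz on set, land at 48 kHz in post
```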
So what you want to do is the following: record your edit master at 23.976 fps @ 48 kHz for use in the finished product. But create your playback dupes at 47.952 kHz (faux stamped as 48 kHz), so that when those files are played back by your recorder at 48 kHz (the faux stamp fools the machine), you will have pulled up (sped up) the music by the proper percentage. Note that some recorders have menu settings that allow you to do this easily; otherwise you must create these pull-ups with Pro Tools back in the studio.
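As a rough illustration of the faux-stamp trick (your recorder's pull-up menu or Pro Tools would be the normal route), here is a sketch using the third-party Python packages soundfile and scipy; the file names are placeholders.

```python
# Rough sketch of the "faux stamp" pull-up described above, done in
# software: resample the 48 kHz edit-master audio by 1000/1001 (so it
# carries 0.1% fewer samples), then write the file with a header that
# still claims 48 kHz. Played back at 48 kHz, it runs 0.1% fast.
# Assumes the third-party 'soundfile' and 'scipy' packages; file names
# are placeholders, not a prescribed workflow.
import soundfile as sf
from scipy.signal import resample_poly

audio, rate = sf.read("edit_master_48k.wav")
assert rate == 48000

# Reduce the sample count by the 1000/1001 pull-down factor.
pulled = resample_poly(audio, up=1000, down=1001, axis=0)

# Stamp the result as 48 kHz anyway -- the "faux stamp".
sf.write("playback_dupe_pullup.wav", pulled, samplerate=48000)
```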
If you are planning to shoot at a true 24 fps either in film or electronic cinema, and NOT edit in 23.976 -- then you can just create your edit master and playback dupes at 24 fps and not concern yourself with speed conversions. But that is a rare situation!
In all instances, it is very important to confirm with the Post Production Supervisor about what frame rates, bit rates, and sampling rates to use for the edit master and subsequent playback dupes. And get the answers in writing, lest they be contested later.
Life gets real complicated when the Director wants something bizarre, such as filming the actors in slow motion while they sing in real time to the song! Leave those shoots for the experts! Some audio engineers earn a lucrative living doing nothing but sync playback shoots.
However, if there is no lip sync required for the performance, then there is nothing to worry about. Thematic shots of slow motion or random shots of whatever (that do not involve lip sync or performing to specific notes/beats) only require playing back any general rendition of the song to set the mood. Pull-ups or pull-downs won't matter, since those loose shots do not require precise matching.
Playback Techniques on the Set
For basic sync playback, you need a digital recorder or playback deck, a timecode slate, and a loudspeaker system. After the camera rolls, you begin the playback (from the song start or from a predetermined cue point) and allow the camera to see a few frames of the playback timecode on the slate. The slate must be hard wired to the playback unit, or connected via a wireless transmitter/receiver, so that the timecode from the playback track is what is being displayed.
Sometimes, it is necessary to record some live dialogue before, after, or even during the playback sequence. To do this, you will need two timecode slates as well as two recorders (or one recorder and one playback deck). Begin the roll as you normally would for live dialogue: roll camera, roll the audio recorder, verbally slate the scene/take... and then begin the playback. Make sure that the camera can see both sets of timecode numbers on the two slates (or one double-decker slate). One set of timecode numbers is for the live dialogue take; the other set represents the playback track. (If the playback track is to start and stop in the middle of the scene and precludes the use of head or tail slates, then try to record the playback timecode onto an audio track on the camera, or at least onto an ISO track on your audio recorder.)
Play some of the playback track through the loudspeakers to set the mood/tempo on the set. On cue, kill the loudspeakers so that the set is silent in order to record usable clean dialogue. However, the playback track continues to play in the headphones or earphones of the "band" in the background, or for the choreographer and dance captains who might be standing just out of frame, yet within the eye line of the background dancers. The dance captains will continue to step and move to the music, so that the background extras can follow the beat. Sometimes, it is possible to hide small, directional speakers downstage and set to very low volume.
You might consider having the dancers remove their shoes, or adhere soft padding to the front row of shoes if they might be visible. Of course, everyone in the scene who does not have key dialogue is "silent" and only pretending to talk, drink, etc.
Another technique for inaudible playback of the "beat" is to use what Hollywood calls a thumper. On cue, the main loudspeakers are killed, and the playback track is limited to a large sub-woofer. The sub-woofer puts out audio around 40 to 50 hertz, so low that you feel it more than you can hear it. Microphones on the set are set to roll off bass frequencies below 90Hz, so that they cannot "hear" the sub-woofer track.
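As a rough illustration of why the thumper stays out of the dialogue tracks, here is a small Python sketch (using numpy and scipy, with purely illustrative numbers) showing a 45 Hz thump being sharply attenuated by a 90 Hz high-pass of the kind dialed into the set microphones.

```python
# Illustration of the thumper principle: a ~45 Hz "thump" pushed through
# a 90 Hz low-frequency roll-off, similar in spirit to the bass cut on
# the set microphones, is sharply attenuated while the voice range passes.
# Assumes numpy and scipy; all numbers are illustrative.
import numpy as np
from scipy.signal import butter, sosfilt

sr = 48000
t = np.arange(sr) / sr                          # one second of audio
thump = np.sin(2 * np.pi * 45 * t)              # 45 Hz sub-woofer beat
dialogue = 0.2 * np.sin(2 * np.pi * 1000 * t)   # stand-in for voice energy
mic_input = thump + dialogue

# 4th-order high-pass at 90 Hz.
sos = butter(4, 90, btype="highpass", fs=sr, output="sos")
mic_output = sosfilt(sos, mic_input)

print("thump level in  :", np.std(thump))
print("signal level out:", np.std(mic_output))  # much closer to the dialogue alone
```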
In-the-ear wireless receivers are yet another way to achieve silent playback (or cueing). Actors can wear small belt pack receivers with discreet earpieces to receive transmitted playback tracks, script prompts, or stunt cues. If the wire from their earpiece cannot be readily hidden, there are wireless earpieces available.
Comtek induction earpieces are longtime favorites of Hollywood and considered standards of the industry. An induction earpiece utilizes a wire loop "necktie" that is worn under clothing and plugs into the headphone output of a belt pack receiver. A small, hearing-aid-sized, in-ear receiver picks up the magnetic field created by the loop antenna.
For large casts, or when talent is scantily clad, we can utilize a variation on the wire loop necktie. Instead of placing the magnetic field around the neck, we use a large audio amplifier (around 400 watts) and pump the audio into a large wire loop that we run around the circumference of the set. The wire connects to the plus and minus of the speaker terminals -- make sure that you have enough resistance so as not to short out the amplifier. Use either a lot of thin wire, or add some in-line resistors to satisfy "minimum load" requirements.
Anyone in the cast who stands within the wire loop will receive audio in their earpieces.
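As a back-of-envelope illustration of that minimum-load point, here is a small sketch; the numbers in it (a 4 ohm minimum load, 18 AWG copper wire, a 60 meter run) are assumptions for the example, not specs for any particular rig.

```python
# Back-of-envelope check of the "minimum load" point for the big wire
# loop. All numbers here are illustrative assumptions: a 400 W amplifier
# rated for a 4-ohm minimum load, and 18 AWG copper at roughly
# 0.021 ohms per meter.

min_load_ohms = 4.0       # assumed minimum load the amplifier will tolerate
ohms_per_meter = 0.021    # rough resistance of 18 AWG copper wire
loop_meters = 60          # wire run around the circumference of the set

loop_ohms = ohms_per_meter * loop_meters
series_resistor = max(0.0, min_load_ohms - loop_ohms)

print(f"loop resistance : {loop_ohms:.2f} ohms")
print(f"add in series   : {series_resistor:.2f} ohms to reach the minimum load")
```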
Shooting sync playback is a complex process and requires a good deal of planning. It is not overly complicated, but you have to do it right or else risk having your performance out of sync with your music! Make sure to pay attention to background performances as well. I have been on many music videos where the Director was only paying attention to the lead vocal, and did not pick up on the errors in the background.
It is always a good idea to have some sort of video playback available to check your takes, even if it is just a consumer camcorder held close to the film camera.