DirectX 9: DirectMusic Concepts

Overview
Scott Selfon

DirectMusic and DirectMusic Producer (the tool used to author music and soundscapes for DirectMusic) introduce some new concepts in audio production. Learn to appreciate what follows, and you will see that there is a new musical world waiting to be conquered. But first, a quick explanation of terms.

We use the term "audio producer" throughout this book. An audio producer is anyone who creates audio to be used as part of a DirectMusic project. This can be a sound designer, producer, recording engineer, composer; we don't care. If your focus is creating audio to be placed in a project, you are an audio producer. If your job is focused on integrating sounds into a project using code, building tools/extensions to DirectX Audio, or developing playback software, or you are a game programmer, then you should perk up when you see the term "programmer."
Interactivity
"Interactivity" is both the most overused and misused term in game audio. Often, when someone speaks of interactive audio, he is referring either to adaptive audio (audio that changes characteristics, such as pitch, tempo, instrumentation, etc., in reaction to changes in game conditions) or to the art and science of game audio in general. A better definition for interactive audio is any sound event resulting from action taken by the audience (or a player in the case of a game or a surfer in the case of a web page, etc.). When a game player presses the "fire" button, his weapon sounds off. The sound of the gun firing is interactive. If a player rings a doorbell in a game world, the ringing of the bell is also an interactive sound. If someone rolls the mouse over an object on a web site and triggers a sound, that sound is interactive. Basically, any sound event, whether a one-off, a musical event, or even an ambient pad change, is classified as interactive audio if it comes into play because of something the audience does (directly or indirectly).
Variability
An interesting side effect of recorded music production is the absence of variation in playback. Songs on a CD play back the same every single time. Music written to take advantage of DirectMusic's variability properties can be different every time it plays. Variation is particularly useful in producing music for games, since there is often a little bit of music that needs to stretch over many hours of gameplay. Chapter 21 discusses applications of variation in music production outside the realm of games.

The imperfection of the living, breathing musician creates the human element of live musical performances. Humans are incapable of reproducing a musical performance with 100 percent fidelity. Therefore, every time you hear a band play your favorite song, no matter how much they practice, it differs from the last time they played it live, however subtle the differences. Repetition is perceived as being something unnatural (not necessarily undesirable, but unnatural nonetheless) and is easily detectable by the human ear. Variability also plays a role in song form and improvisation. Again, when a song is committed to a recording, it has the same form and solos every time someone plays that recording. However, when performed by musicians at a live venue, they may choose to alter the form or improvise in a way that is different from that used in the recording.

DirectMusic allows a composer to inject variability into a prepared piece of digital audio, whether a violin concerto or an audio design modeled to mimic the sounds of a South American rain forest. Composers and sound designers can introduce variability on different levels, from the instrument level (altering characteristics such as velocity, timbre, pitch, etc.) to the song level (manipulating overall instrumentation choices, musical style choices, song form, etc.). Using DirectMusic's power of variability, composers can create stand-alone pieces of music that reinvent themselves every time the listener plays them, creating a very different listening experience when juxtaposed to a mixed/mastered version of the same music. Composers alter the replay value of their compositions as well by allowing their music to reinvent itself upon every listening session.

Avoidance of audio content repetition in games is often important. When asked about music for games, someone once said, "At no time in history have so few notes been heard so many times." Repetition is arguably the single biggest deterrent to the enjoyment of audio (both sound effects and music) in games. Unlike traditional linear media like film, there are typically no set times or durations for specific game events. A "scene" might take five minutes in one instance and hours in another. Furthermore, there is no guarantee that particular events will occur in a specific order, will not repeat, or will not be skipped entirely. Coupled with the modest storage space (also known as the "footprint") budgeted for audio on the media and in memory, this leaves the audio producer in a bit of a quandary. For the issue of underscore, a game title with hours of gameplay might only be budgeted for a few minutes' worth of linear music. The audio director must develop creative ways to keep this music fresh: alternative versions, version branching, and so on. Audio programmers can investigate and implement these solutions using DirectMusic.

As to events triggering ambience, dialog, and specific sound effects, these may repeat numerous times, adding the challenge of avoiding the kind of obvious repetition that can spoil a game's realism for the player. As we discuss in more detail, DirectMusic provides numerous methods for helping to avoid repetition. On the most basic level, audio programmers can specify variations in pitch and multiple versions of wave, note, and controller data. Even game content (specific scripted events in the game for instance) can specify orderings for playback (such as shuffling, no repeats, and so on) that DirectMusic tracks as the game progresses. Using advanced features, chord progressions can maintain numerous potential progression paths, allowing a limited amount of source material to remain fresh even longer.
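The "shuffling, no repeats" style of ordering mentioned above is easy to picture outside of DirectMusic. The following Python sketch is our own illustration of the idea, not DirectMusic's implementation; the class name VariationPicker and its method are hypothetical.

```python
import random


class VariationPicker:
    """Hypothetical helper: shuffled playback with no back-to-back repeats.

    Every variation plays once, in shuffled order, before any repeats.
    """

    def __init__(self, count, seed=None):
        if count < 1:
            raise ValueError("need at least one variation")
        self.count = count
        self.rng = random.Random(seed)
        self.queue = []
        self.last = None

    def next(self):
        if not self.queue:
            # Start a new cycle with a fresh shuffle.
            self.queue = list(range(self.count))
            self.rng.shuffle(self.queue)
            # We pop from the end, so the last element plays next. Swap it
            # away if it would repeat the variation that just played.
            if self.count > 1 and self.queue[-1] == self.last:
                self.queue[0], self.queue[-1] = self.queue[-1], self.queue[0]
        self.last = self.queue.pop()
        return self.last
```

With 20 gunshot variations, for instance, a picker like this would cycle through all 20 before reusing any of them, and would never play the same one twice in a row.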
Adaptability
Adaptive audio is audio that changes according to the state of its playback environment. In many ways, it is like interactive audio in that it responds to a particular event. The difference is that instead of responding to feedback from the listener/player, the audio changes according to changes occurring within the game or playback environment. Say for instance that a game's musical score shifts keys with the rising and setting of the sun within the game world. The player isn't causing the sun to set; the game is. Therefore, the score "adapts" to changes happening within the game's world. A famous example of adaptive audio in games occurs in Tetris. The music plays at a specified tempo, but that tempo will double if the player is in danger of losing the game.

Avoiding repetition is an excellent first step in a strong audio implementation for games, as well as reintroducing music listeners to some potentially intriguing performance characteristics lost when listening to linear music. Continuing to focus on audio for games for a moment, audio content triggered out of context to game events is in many ways less desirable than simple repetition. For instance, using data compression and/or high-capacity media, a composer might be able to create hours and hours of underscore, but if this linear music is played with no regard to the state of the game or the current events in the game's plot, it could be misleading or distracting to the user. Adaptive audio (audio that changes according to changes in certain conditions) provides an important first step for creating interactivity in underscore.

Do not confuse adaptive audio with variability — if, for instance, 20 variations of a gunshot are created and a single one is randomly chosen when the player fires a gun, that is variable but not necessarily adaptive. If the gunshot's reverberation characteristics modulate as the player navigates various and differing environments, then we have adaptive sound design. Likewise, a game's score could have infinite variability, never playing the same way twice, but it becomes adaptive when the sultry slow jazz tune fades in over the rock theme during the player's introduction to the game's film noir femme fatale antagonist. Now contrast variability and adaptation with interactivity; a character that stops talking when shot by the player is an example of interactive audio.

Adaptation does not always need to mean "just-in-time" audio generation. While an underscore that instantly and obviously responds to game events might be appropriate for a children's animated title, this kind of music is often inappropriate for a first-person shooter, for instance. In such a style of game, where the breakneck pace of the action constantly modulates the game state, the user can begin to notice the interactivity of the score. This is dangerous, as the subtlety of the score is lost on the player, potentially damaging the designer's carefully crafted game experience. In a game like Halo, you do not want the player performing music via an exploit in the interactive music engine. For this reason, musical changes are often most satisfying when "immediate" changes are reserved for important game events and more subtle interactivity is used for other events and situations.
Groove Levels
One of the more powerful ways that DirectMusic exposes adaptive audio is with the groove level. Groove level is a musical or emotional intensity setting that adjusts in real time according to rules you define as part of your playback environment. You can set up different moods within a DirectMusic piece and assign those moods to different groove levels. For instance, someone could produce a progressive trance piece with sparse musical activity on groove level 1 and increase the intensity of the music through various groove levels. This can be achieved by adding more notes and instruments, or by setting higher volumes. You can then assign the different intensity (groove) levels to trigger upon changes in the playback environment. Say you create a DirectMusic piece for stand-alone music listening; you could set a rule that switches groove levels according to the time of day or even the number of windows the listener has open on the desktop. The possibilities are truly endless.
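A rule that maps the playback environment to a groove level might look something like the following Python sketch. It is only an illustration: the function name, the inputs (enemies nearby, player health), the weights, and the 1 to 100 range used as defaults are all assumptions on our part, not part of any DirectMusic API.

```python
def groove_level(enemies_nearby, player_health, lo=1, hi=100):
    """Hypothetical rule: more enemies and lower health mean higher intensity.

    player_health is expected in the range 0.0 (dead) to 1.0 (full).
    """
    # Threat saturates at 10 enemies so the result stays bounded.
    threat = min(max(enemies_nearby, 0), 10) / 10
    danger = 1.0 - max(0.0, min(player_health, 1.0))
    # Assumed weighting: enemies matter a little more than health.
    intensity = 0.6 * threat + 0.4 * danger
    return lo + round(intensity * (hi - lo))
```

The game would evaluate a rule like this whenever its state changes and hand the result to the music engine, leaving the choice of what each level actually sounds like to the audio producer.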
Content-Driven Audio
Adaptive audio becomes really interesting when the audio gets behind the wheel and drives the playback environment to behave and/or change in a particular manner. This is commonly referred to as content-driven audio. DirectMusic's sequencer provides notifications that can be used to manipulate the playback environment, allowing for the interesting possibility of music driving gameplay rather than the opposite. For instance, a monster around the corner could wait to jump out at you until an appropriate moment in the score. Content-driven audio is a particularly unexplored area of DirectMusic, one that could be put to some interesting uses.
Playback Standardization
DirectMusic can play wave files. The great thing about wave files is that they can be CD quality, and (speaker quality notwithstanding) they sound the same when played back on any computer. The problem with wave files is that they are big when compared to MIDI files, and they often limit adaptability and variation. A partial solution is to create a custom sample bank that is distributed with the MIDI sequence data in your DirectMusic file. While in most cases you are still forced to use minimal file sizes (restricting quality), you don't have to worry about different listeners hearing your sequenced music through different synthesizers/samplers; you've given them the sample bank that you created as part of the DirectMusic file. Before this standardization took place, you could listen to a MIDI file of Beethoven's Fifth through your SoundBlaster, while we listened to it through the sample set on our Turtle Beach sound card. The result: We both would have heard very different renditions. This is an audio producer's nightmare, since you have no control over the instruments on which your sequence plays. Luckily, this is no longer a problem, thanks to DLS-2.

DLS-2 (the Downloadable Sounds format, level 2) is a sound format used for creating and storing samples, much like the popular Akai sample format. DirectMusic can use DLS-2 samples. The great thing about DLS-2 support in DirectMusic is that audio producers can create their own custom sample bank and include it as part of the DirectMusic file. This means that no matter who plays the DirectMusic file, they will hear the same sounds that everyone else does (as opposed to relying on their sound card or soft synth rendering the sequence data). DLS also specifies basic synthesis abilities: which waves make up an Instrument and how to modify the source wave data according to MIDI parameters like pitch bend, modulation, and so on. DirectMusic, or more specifically the Microsoft Software Synthesizer that DirectMusic uses, supports DLS-2, which adds numerous additional features, including six-stage envelopes for volume and pitch, two low-frequency oscillators, and MIDI-controllable filtering per voice. When using DirectMusic, these sample-based Instruments are stored in DLS Collections, which can be self-contained files (generally using the .dls file extension) or embedded directly within pieces of music, similar to traditional MOD files and the newer .rmid extension to standard MIDI files, where sampled Instruments can be embedded within the same file as the note information.
3D Spatialized Audio
Our world is one of three dimensions. Most audio that you've heard played over the radio or television exists in one or two dimensions. Even surround sound lacks a vertical component (sound coming from above or below). Audio engineers have developed a technique that synthesizes the way we hear sound in 3D space. DirectX Audio has this functionality. This means that a sound can be mixed in space not only to sound closer, farther away, or to the left or right but also from above, below, and even behind us, all using just two speakers! We do not go into detail here on how or why DX Audio is able to do this. Just know that you have the ability to mix sound in 3D using DirectX Audio.
DirectMusic Rules and Regulations
DirectMusic imposes several rules and limitations that you should keep in mind. We cover the more significant ones here and discuss more specific restrictions as they arise.

Each DirectMusic Performance Has a Single Tempo at Any One Time
A DirectMusic Performance is somewhat analogous to a conductor and an orchestra; the conductor is only able to conduct at a single speed at any time (unless he is conducting some avant-garde piece of contemporary music!), and the orchestra as a whole needs to understand where those beats are. All pieces of music playing on a Performance must play at the same tempo — and a tempo change imposed by one piece of music will affect all other playing pieces of music. This becomes most interesting when using DirectMusic for playing sound effects and ambient sound in addition to music, as these sounds care nothing for tempo. SFX and ambience may inadvertently cut off because of musical tempo changes. There are several options that you may consider:

Author all content at the same tempo. This of course defeats the ability to change speeds based on various events.

Use clock-time Segments for sound effects. Every Segment file can specify whether it is meant to use music-time or clock-time. A music-time Segment bases its timing information on the tempo of the music. Notes will play for a specific number of measures, beats, ticks, etc. If the tempo changes, the note will be played for less or more time, which is desirable for music but not so much for other sounds. A clock-time Segment, by contrast, only pays attention to the system clock and absolute time. Generally used for Segments that only contain non-note-based wave information, a clock-time Segment uses millisecond accuracy for playback. A wave told to play for 5.12 seconds will play for exactly that amount of time, regardless of tempo changes that occur while it is playing.

Use multiple DirectMusic Performances. You can always run more than one DirectMusic Performance simultaneously. The amount of processing power used (i.e., the CPU overhead) for a second DirectMusic Performance is small, though of course there are now additional assets to manage and track. Consider playing sound effects, ambience, and other audio cues that do not respond to tempo on one performance (with constant tempo), while music-oriented sounds will be played on another performance (with variable tempo).
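The difference between music-time and clock-time comes down to simple arithmetic. The Python sketch below is our own illustration rather than DirectMusic code, and the function names are hypothetical.

```python
def music_time_seconds(beats, tempo_bpm):
    """Length of a music-time event: it stretches or shrinks with tempo."""
    if tempo_bpm <= 0:
        raise ValueError("tempo must be positive")
    return beats * 60.0 / tempo_bpm


def clock_time_seconds(milliseconds):
    """Length of a clock-time event: tempo plays no part."""
    return milliseconds / 1000.0
```

A four-beat note lasts 2 seconds at 120 beats per minute and 4 seconds at 60, while a wave authored as 5,120 milliseconds of clock-time lasts 5.12 seconds at either tempo.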

Primary and Secondary Segments
The next restriction to keep in mind is the difference between primary and secondary Segments. A Segment can be either primary or secondary. Only one primary Segment may play at a time; starting a new primary Segment will implicitly stop and replace any previously playing primary Segment. Primary Segments typically dictate tempo, groove level, chord progression, and other big picture (aka "global") performance-level events. By contrast, many secondary Segments may be playing at the same time. Secondary Segments typically layer on top of the primary Segment, picking up and using the primary Segment's tempo and other settings. However, if a secondary Segment plays as a controlling secondary Segment, rather than layer on top of the primary Segment, it will actually replace corresponding Tracks from the primary Segment. Controlling secondary Segments are typically reserved for more advanced usage (for instance, changing the chord progression of the primary Segment, say, when an antagonist enters the same room as the hero in the game).

Primary Segments are generally the main background music or ambience. Secondary Segments are often "one-shot" sounds or "stingers" that play over the primary — perhaps sound effects or musical motifs. Secondary Segments follow the primary's chord progression, so musically motivated secondary Segments can play with appropriate transposition and voicing.

Crossfades
Programmers and audio producers should note that crossfades are not a built-in DirectMusic function for transitioning between Segments. For now, it is a fact, and so you are going to have to fudge them. It's not all bad, as you do have options. These options include AudioPath volume manipulation, MIDI volume controller use, or authoring volume crossfades directly into content. Remember that only one primary Segment can play at a time, which precludes the possibility of using primary Segments exclusively for crossfades. Unless multiple DirectMusic Performances are used, at least one of the Segments we are crossfading between will need to be a secondary Segment. For ease of use, we suggest that both be secondary Segments. The one-tempo limitation we discussed above makes crossfades between two pieces of music with different tempos difficult. One of the pieces will be played faster or slower (unless multiple DirectMusic Performances are used). Of course, this limitation really only applies for sequenced music; prerendered wave files using different tempos play fine.

Pause/Resume
"Pause" is another feature not implicitly built into DirectMusic. However, a programmer can track the time when a Segment starts, as well as its offset when he stops that Segment. Using this information (and keeping in mind that the Segment may have been looped), the programmer can typically restart a Segment from the same location to create "resume" behavior. However, this only works for Segments that use music time. Note that content that includes numerous tempo changes may be less accurate in restart position than content with a single tempo.

While this works for waves, MIDI and DLS limitations do not allow this for sequenced note data. The MIDI standard for note information consists of a "note on" and corresponding "note off" event. Starting playback between the two events will not trigger the note. Even if it did, remember that DLS Instruments cannot be started from an offset; they can only be played from the beginning. This won't really be an issue for short notes in a dense musical passage. However, if you have long sustained notes (for instance, a string pad), keep this in mind if you want to be able to pause/resume your music. In that particular case, you might want pause/resume functionality to restart the piece of music at a position before the pad began or even the beginning of the Segment.
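The bookkeeping behind a "resume" can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (the whole Segment loops from its beginning, and the tempo is constant); the function name and parameters are hypothetical and do not correspond to DirectMusic calls.

```python
def resume_offset(start_time, stop_time, segment_length, loops=True):
    """Return the offset, in the same units as the inputs, to restart from.

    Assumes the entire Segment loops from its start and a constant tempo.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    elapsed = stop_time - start_time
    if elapsed < 0:
        raise ValueError("stop_time precedes start_time")
    if loops:
        # Discard the completed passes and keep the position in this one.
        return elapsed % segment_length
    # A non-looping Segment cannot be further along than its own length.
    return min(elapsed, segment_length)
```

For the sustained-pad case described above, the programmer could additionally snap the returned offset back to the start of the pad, or to zero.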

Memory Footprint
Memory is typically the most precious resource for games, whether on the PC or a console. The amount of memory taken up by the resources of a program or a particular component of a program is called the footprint. Making use of streaming wave data can help keep the memory footprint small; only a small amount of any playing wave is actually in memory at a time. DirectMusic supports wave streaming within Wave Tracks, with content-specified read-ahead settings that determine how much of the wave is in memory. However, the DLS format does not support streaming, so any DLS Collection instruments play entirely from memory. There are several optimizations available that we will discuss in Chapter 2 when we cover Bands (a DirectMusic file that stores DLS Instruments), but keep in mind that DLS Instruments will occupy a footprint as long as they are downloaded.

Again, this isn't a big deal if you are creating music for stand-alone listening. You'll have the free range of the system memory to work with for samples (normally 128MB of memory — more memory than any pro hardware synthesizer on the 2003 market) on top of all the streaming that the audience's PC can handle. But in games, you are very limited in the amount of memory you can use. Just keep that in mind.

Note for programmers: Just using DirectMusic incurs a memory footprint, albeit a small one. The size does depend on which DirectMusic functions you choose to use in your application. For instance, if you use DirectX Audio Scripting with the Visual Basic Scripting Edition language (VBScript), that language requires several libraries of code to be loaded to the tune of almost 1MB of DLLs (dynamically linked libraries). Granted, other aspects of your program might depend on the same libraries, but to help keep the memory footprint more manageable, DirectMusic offers a tiny (~76KB) optimized DirectX Audio Scripting language (called audiovbscript) as an alternative to the fully featured VBScript.
DirectMusic and MIDI
We've already alluded several times to DirectMusic's support for the MIDI format. This can be a strength or a weakness; the interoperability with this standard sequencing format allows for easy authoring from more sequencing-focused tools in the composer's palette (like Sonar or Cubase SX). But using DirectMusic for sound effects can work against this support, since a sound effect is often meant to be played once or looped indefinitely. The concept of a finite "note on" and "duration" is somewhat foreign in this instance. As we've discussed already, the use of clock-time Segments can mitigate this somewhat for sound effects; they are given a fixed duration in milliseconds that is independent of tempo.

While DirectMusic supports MIDI, it has several significant aspects that allow it to overcome some of the basic limitations of MIDI devices. For instance, typical devices are limited to 16 MIDI channels. The Microsoft Software Synthesizer, the synthesis engine used by DirectMusic, allows you to reference up to 32 million discrete channels. Using DirectMusic Producer, you will be able to use 1000 of these performance channels, or pchannels, per Segment.

DirectMusic also adds some basic optimizations to MIDI to allow for easier manipulation of controller data. Traditionally, sweeping a controller from one value to another consisted of many rapid MIDI controller events. As a file format (and authoring convenience) optimization, DirectMusic allows the audio producer to instead use continuous controller curves. The start time, end time, initial value, end value, and curve type can all be specified. When the content plays back, the intermediate values are dynamically generated to remain compatible with traditional MIDI.
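To picture what "dynamically generated" intermediate values means, here is a small Python sketch that expands one curve into discrete controller events. It is not DirectMusic's algorithm; the function name, the step parameter, and the two shape names are assumptions made for the illustration.

```python
def expand_curve(start_tick, end_tick, start_value, end_value,
                 shape="linear", step=10):
    """Expand one curve into a list of (tick, value) controller events.

    `step` (ticks between generated events) and the shape names are
    assumptions for this sketch, not DirectMusic parameters.
    """
    if shape not in ("linear", "instant"):
        raise ValueError("unsupported shape: " + shape)
    if step < 1:
        raise ValueError("step must be at least 1")
    if shape == "instant" or end_tick <= start_tick:
        # Jump straight to the final value.
        return [(start_tick, end_value)]
    events = []
    tick = start_tick
    while tick < end_tick:
        t = (tick - start_tick) / (end_tick - start_tick)
        value = round(start_value + (end_value - start_value) * t)
        # Skip events that would only repeat the previous value.
        if not events or events[-1][1] != value:
            events.append((tick, value))
        tick += step
    # Make sure the curve lands on its end value.
    if events[-1][1] != end_value:
        events.append((end_tick, end_value))
    return events
```

The authored file stores only the five curve parameters; the long run of individual controller events exists only at playback time.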

A common question raised when a game is layering 20 pieces of music is whether the composer needs to worry about authoring each piece of audio onto its own set of pchannels. With traditional MIDI, you cannot simultaneously play two separate Instruments on the same channel at the same time — the patch change from one overrides the other. Similarly, if two Tracks on the same channel use the same MIDI controllers, they will override each other. Because DirectMusic allows the composer to play multiple pieces of content simultaneously, DirectMusic provides solutions to these basic MIDI limitations.

From the point of view of MIDI, DirectMusic AudioPaths effectively act as unique synthesizers with their own unique pchannels. An AudioPath is pretty much what it sounds like: a route that audio data follows through hardware or software before it is output. PC AudioPaths can include DirectX Media Objects (DMOs), which are software audio processing effects. AudioPaths also define the behavior of content being played onto them, such as whether they can be positioned in 3D space (kind of like surround sound but different; more on this later), whether they should be played as stereo or mono, and so on. As far as MIDI is concerned, each AudioPath gets its own unique set of "physical" pchannels (granted, this is all in software, so we're not talking about tangible channels). For instance, if two sequences are authored, both using pchannels 1 through 16, playing them on separate AudioPaths will keep the two from interacting. If one Segment on AudioPath A has a piano on pchannel 1 and another Segment on AudioPath B has a French horn on pchannel 1, they will both play happily. If we tried to play these two Segments onto a single AudioPath, one of the patch changes would win out, and one Segment would be playing the wrong instruments.
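One way to think about this is that the current instrument is looked up by the pair (AudioPath, pchannel) rather than by pchannel alone. The Python model below is purely conceptual; the PatchTable class and its methods are hypothetical and say nothing about how DirectMusic stores this internally.

```python
class PatchTable:
    """Hypothetical model: current instrument per (AudioPath, pchannel)."""

    def __init__(self):
        self.patches = {}

    def patch_change(self, audiopath, pchannel, instrument):
        # A later patch change on the same key replaces the earlier one.
        self.patches[(audiopath, pchannel)] = instrument

    def instrument(self, audiopath, pchannel):
        return self.patches.get((audiopath, pchannel))


table = PatchTable()
table.patch_change("A", 1, "piano")
table.patch_change("B", 1, "French horn")    # separate AudioPath: no conflict
table.patch_change("A", 1, "French horn")    # same AudioPath: piano is replaced
```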

That said, sometimes playing multiple pieces of content onto the same AudioPath is desirable. For instance, a Segment could be authored simply to alter the mix of music played by another Segment (using volume controller curves). Alternatively, in the example of 3D audio above, we probably would want to play everything that will emanate from this single position onto the same AudioPath.

Therefore, we have a solution for multiple simultaneous pieces of music. But what about MIDI controller curve interaction? Of course, if the pchannels are kept unique (either at authoring time or by playing the content onto separate AudioPaths), the MIDI controllers won't conflict. But what about the above example of a Segment where we just want to alter the mix? If the Segment it's altering has its own volume controller curves, the two will conflict, and we might get the volume jumping rapidly between the two. The classic solution is to use volume curves in one Segment and expression curves in the other Segment. This is a common approach in sequenced music, as both affect perceived volume and apply additively. This way the audio producer sets Volume (MIDI Continuous Controller #7) in the primary piece of music, and uses Expression (MIDI Continuous Controller #11) to reduce this volume per individual performance channel in the second, layered piece of music.
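The reason volume and expression can share a channel without fighting is that their attenuations add when expressed in decibels. The Python sketch below uses a commonly cited convention from MIDI recommended practice, 40 log10(V x E / 127^2); whether a particular synthesizer follows exactly this response is an assumption, and the function name is our own.

```python
import math


def combined_gain_db(volume_cc7, expression_cc11):
    """Gain in dB for a Volume (CC#7) and Expression (CC#11) pair, 0..127.

    Uses the assumed convention gain = 40 * log10(V * E / 127**2).
    """
    for value in (volume_cc7, expression_cc11):
        if not 0 <= value <= 127:
            raise ValueError("controller values must be in 0..127")
    if volume_cc7 == 0 or expression_cc11 == 0:
        return float("-inf")  # Either controller at zero silences the channel.
    return 40.0 * math.log10((volume_cc7 * expression_cc11) / (127.0 * 127.0))
```

With both controllers at 127 the gain is 0 dB. Lowering Expression in the layered Segment can only pull the level down from whatever Volume the primary piece has set; it never overwrites that setting.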

DirectMusic provides a more generalized solution for this problem — not only for volume but also for pitch bend, chorus send, reverb send, and numerous other MIDI continuous controllers. The audio producer specifies an additive curve ID for each MIDI continuous controller curve created. Curves that have the same ID override each other, as with traditional MIDI. Curves that have different IDs are applied additively (or more accurately for the case of volume, the process becomes subtractive).

Thursday, December 20, 2007
