DeRustle: Removing Lavalier Microphone Noise with Deep Learning

Because it attaches inconspicuously to clothing near a person’s mouth, the lavalier microphone (lav mic) provides multiple benefits when capturing dialogue.  For video applications, there is no visible microphone to distract the viewer, and the speaker can move freely and naturally since they aren’t holding one.  Lav mics also benefit audio quality: because they sit close to the mouth, they pick up less noise and reverberation from the recording environment.

Unfortunately, the freedom a lav mic gives the speaker can also be a detriment to the audio engineer: the mic can rub against clothing or bounce around, creating disturbances often described as rustle.  Here are some examples of lav-mic recordings where the person moved just a bit too much:

Rustle cannot be easily removed using the existing De-noise technology found in an audio repair program such as iZotope RX, because rustle changes over time in unpredictable ways that depend on how the person wearing the microphone moves.  The clothing material also affects the rustle’s sonic quality: if you have the choice, attaching the mic to natural fibers such as cotton or wool produces less intense rustling than synthetics or silk.  Attaching the lav mic with tape instead of a clip can also change the amount and character of the rustle.

Because of all these variations, rustle presents itself sonically in many different ways, from high-frequency “crackling” sounds to low-frequency “thuds” or bumps.  Additionally, rustle often overlaps with speech and is not well localized in time like a click or in frequency like electrical hum.  These difficulties made it nearly impossible to develop an effective deRustle algorithm using traditional signal processing approaches.  Fortunately, with recent breakthroughs in source separation and deep learning, removing lav rustle with minimal artifacts is now possible.

Audio Source Separation

Often referred to as “unmixing”, source separation algorithms attempt to recover the individual signals composing a mix, e.g., separating the vocals and acoustic guitar from your favorite folk track.  While source separation has applications ranging from neuroscience to chemical analysis, its most popular application is in audio, where it drew inspiration from the cocktail party effect in the human brain, which is what allows you to hear a single voice in a crowded room, or focus on a single instrument in an ensemble.

We can view removing lav mic rustle from dialogue recordings as a source separation problem with two sources: rustle and dialogue.  Audio source separation algorithms typically operate in the frequency domain, where we separate sources by assigning each frequency component to the source that generated it.   This process of assigning frequency components to sources is called spectral masking, and the mask for each separated source is a number between zero and one at each frequency.  When each frequency component can belong to only one source, we call this a binary mask since all masks contain only ones and zeros.  Alternatively, a ratio mask represents the percentage of each source in each time-frequency bin.   Ratio masks can give better results, but are more difficult to estimate.
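As a toy illustration of the difference, here is a small NumPy sketch that builds an “ideal” binary mask and ratio mask for a single frame, assuming we already know the isolated speech and rustle magnitude spectra (the values below are made up purely for illustration):

import numpy as np

# Magnitude spectra for one analysis frame; in practice these come from the
# STFT of isolated dialogue and rustle recordings (the values here are fake).
speech_mag = np.array([0.90, 0.10, 0.60, 0.05])
rustle_mag = np.array([0.10, 0.40, 0.10, 0.50])

# Ratio mask: the fraction of each frequency bin that belongs to speech
ratio_mask = speech_mag / (speech_mag + rustle_mag)

# Binary mask: each bin is assigned entirely to whichever source dominates
binary_mask = (speech_mag > rustle_mag).astype(float)

print(ratio_mask)   # values between zero and one
print(binary_mask)  # only ones and zeros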

For example, a ratio mask for a frame of speech in rustle noise will have values close to one near the fundamental frequency and its harmonics, but smaller values in low-frequencies not associated with harmonics and in high frequencies where rustle noise dominates.

mask_ex
The magnitude spectrum for a frame of noisy speech, and the associated ratio mask for separating the clean speech.  The mask is highest at frequencies where there are peaks in the magnitude spectrum, which correspond to vocal harmonics.

To recover the separated speech from the mask, we multiply the mask in each frame by the noisy magnitude spectrum, and then do an inverse Fourier transform to obtain the separated speech waveform.
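Here is a minimal sketch of that resynthesis step using SciPy’s STFT helpers; the mask is just a random placeholder, since estimating a useful mask is exactly the hard part discussed in the next section, and the real system’s analysis parameters may differ:

import numpy as np
from scipy.signal import stft, istft

fs = 16000
noisy = np.random.randn(fs)            # placeholder for one second of noisy dialogue

# Analysis: complex spectrogram (frequency bins x frames) of the noisy audio
f, t, noisy_stft = stft(noisy, fs=fs, nperseg=512)

# Placeholder ratio mask, one value in [0, 1] per time-frequency bin;
# in the real system this comes from the neural network described below.
mask = np.random.rand(*noisy_stft.shape)

# Apply the mask to the noisy spectrum (this keeps the noisy phase),
# then invert back to a time-domain waveform.
_, separated = istft(mask * noisy_stft, fs=fs, nperseg=512)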

Mask Estimation with Deep Learning

The real challenge in mask-based source separation is estimating the spectral mask. Because of the wide variety and unpredictable nature of lav mic rustle, we cannot use pre-defined rules (e.g., filter low frequencies) to estimate the spectral masks needed to separate rustle from dialogue.  Fortunately, recent breakthroughs in deep learning have led to great improvements in our ability to estimate spectral masks from noisy audio (e.g., this interesting article related to hearing aids).  In our case, we use deep learning to train a neural network that maps speech corrupted with rustle noise (input) to separated speech and rustle (output).

Since we are working with audio, we use recurrent neural networks, which are better suited to modeling sequences than feed-forward neural networks (the models typically used for processing images) because they store a hidden state between time steps that can remember previous inputs when making predictions.  We can think of our input sequence as a spectrogram, obtained by taking the Fourier transform of short, overlapping windows of audio, which we feed to our neural network one column at a time.  We learn to estimate a spectral mask for separating dialogue from lav mic rustle by starting with a spectrogram containing only clean speech.

training_speech2
Example spectrogram of clean speech used as network training target.

We can then mix in some isolated rustle noise to create a noisy spectrogram where the true separated sources are known.

training_mixture2
Noisy spectrogram used for network input when we add rustle to the clean speech example.

We then feed this noisy spectrogram to the neural network, which outputs a ratio mask.  By multiplying the ratio mask with the noisy input spectrogram, we get an estimate of our clean speech spectrogram.  We can then compare this estimated clean speech spectrogram with the original clean speech and obtain an error signal, which can be backpropagated through the neural network to update the weights.  We then repeat this process over and over again with different clean speech and isolated rustle spectrograms.  Once training is complete, we can feed a noisy spectrogram to our network and obtain clean speech.
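The post doesn’t spell out the exact architecture or training framework, but a minimal PyTorch-flavored sketch of one such training step might look like this (the layer sizes, magnitude-spectrogram inputs, random placeholder data, and MSE loss are all illustrative assumptions):

import torch
import torch.nn as nn

class MaskEstimator(nn.Module):
    # Toy recurrent mask estimator: noisy spectrogram frames in, ratio mask out
    def __init__(self, n_bins=257, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_bins, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_bins)

    def forward(self, noisy_mag):                  # (batch, frames, bins)
        h, _ = self.rnn(noisy_mag)
        return torch.sigmoid(self.out(h))          # mask values between 0 and 1

model = MaskEstimator()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One (synthetic) training step: mix clean speech with isolated rustle,
# predict a mask, and compare the masked mixture against the clean target.
clean = torch.rand(8, 100, 257)                    # placeholder clean speech magnitudes
rustle = torch.rand(8, 100, 257)                   # placeholder isolated rustle magnitudes
noisy = clean + rustle                             # noisy mixture fed to the network

mask = model(noisy)
estimate = mask * noisy                            # estimated clean speech spectrogram
loss = nn.functional.mse_loss(estimate, clean)     # error signal

optimizer.zero_grad()
loss.backward()                                    # backpropagate the error
optimizer.step()                                   # update the network weights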

Gathering Training Data

We ultimately want our trained network to generalize to any rustle-corrupted dialogue an audio engineer may capture when working with a lav mic.  To achieve this, we need to make sure our network sees as many different rustle/dialogue mixtures as possible.  Obtaining lots of clean speech is relatively easy; there are many datasets developed for speech recognition, in addition to audio recorded for podcasts, video tutorials, etc.  However, obtaining isolated rustle noise is much more difficult.  Engineers go to great lengths to minimize rustle, and recordings of rustle are typically heavily overlapped with speech.  As a proof of concept, we used recordings of clothing or card shuffling from sound effects libraries as a substitute for isolated rustle.

These gave us promising initial results for rustle removal, but only worked well for rustle where the mic rubbed heavily over clothing.  To build a general deRustle algorithm, we were going to have to record our own collection of isolated rustle.

We started by putting out a call to the post production industry to obtain as many rustle-corrupted dialogue samples as possible.  This gave us an idea of the different qualities of rustle we would need to emulate in our dataset.  Our sound design team then worked with different clothing materials, lav mounting techniques (taping and clipping), and motions ranging from regular speech gestures to jumping and stretching to collect our isolated rustle dataset.  Additionally, in machine learning any pattern in the data can potentially be picked up by the algorithm, so we also varied things like microphone type and recording environment to make sure our algorithm didn’t specialize to, for example, a specific microphone frequency response.  Here’s a greatest hits collection of some of the isolated rustle we used to train our algorithm:

Debugging the Data

One challenge with machine learning is that when things go wrong, it’s often not clear what the root cause of the problem is.  Your training algorithm can compile, converge, and appear to generalize well, but still behave strangely in the wild.  For example, our first attempt at training a deRustle algorithm always output clean speech with almost no energy above 10 kHz, even though there was speech energy at those frequencies.

missing_highs
Example spectrogram of clean speech output by an early version of our network.  Notice the removal of all the high frequency energy, highlighted in yellow.

It turned out that a large percentage of our clean speech was recorded with a microphone that attenuated high frequencies.  Here’s an example problematic clean speech spectrogram with almost no high-frequency energy:

TSP_spec
Training with clean speech that lacked high-frequency energy led to separated speech with missing high frequencies.

Since all of our rustle recordings had high-frequency energy, the algorithm learned to assign no high-frequency energy to speech.  Adding more high quality clean speech to our training set corrected this problem.

Before and After Examples

Once we got the problems with our data straightened out and trained the network for a couple of days on an NVIDIA K80 GPU, we were ready to try it out by removing rustle from some pretty messy real-world examples:

Before

After

Before

After

Conclusion

While lav mics are an extremely valuable tool, if they move a bit too much the rustle they produce can drive you crazy.  Fortunately, by leveraging advances in deep learning we were able to develop a tool that accurately removes this disturbance.  If you’re interested in trying the deRustle algorithm, give the RX 6 Advanced demo a try.

shared_ptr_nonnull and the Zen of reducing assumptions

(This article assumes some familiarity with shared_ptrs in C++.)

Imagine the following line of code and comment are in the private area of the definition of a C++ class Foo:

// The current Quaffle, always valid
shared_ptr<Quaffle> currentQuaffle;

Can you spot any dangerous thinking here? If not, that’s okay, but hopefully this article will change that.

Within the implementation of Foo, because currentQuaffle is assumed to be “always valid,” code dereferences it and uses the Quaffle it points to without checking for validity, i.e.:

currentQuaffle->DoTheThing();

rather than

if (currentQuaffle) {
    currentQuaffle->DoTheThing();
}

The bug this assumption introduced surfaced when an empty currentQuaffle crashed trying to DoTheThing(): a value had never been set after currentQuaffle was silently created by the shared_ptr default constructor.

It’s easy to imagine other ways a bug could be introduced here. Some other object might pass an empty shared_ptr into an instance of Foo without realizing it, maybe across many layers of the call stack. Or a future developer might call reset() on currentQuaffle inside Foo’s implementation without knowing it’s meant to always be valid. In all these cases, currentQuaffle ends up breaking an unenforced law that it should always be valid.

What’s the solution? Ideally, we could simplify the ownership of currentQuaffle so that Foo has a plain Quaffle rather than use a pointer at all. But if this isn’t feasible, we can still let the type of currentQuaffle encode the rule about validity rather than hoping that developers obey it. Enter shared_ptr_nonnull, a class invented to solve this problem. It’s just like a shared_ptr, except:

  • It lacks functions that would make it empty, i.e. a default constructor and reset() with no arguments, and
  • It fails an assert whenever it’s made empty, like when it’s constructed from an empty shared_ptr. (“Fails an assert” means it traps to the debugger in debug builds. This could arguably be an even stronger failure, but I’ll leave that topic alone for now.)

This class catches bugs at compile time, most often when something tries to default-construct a shared_ptr_nonnull member variable, and at runtime, when someone makes it empty in other ways.

In a nutshell, we were assuming currentQuaffle was always valid, and shared_ptr_nonnull gives us a way to make sure that’s true. I’ve seen this concept come up again and again in software development, across programming, debugging, testing, planning and more. Two of the most important questions you can ask yourself are “What am I assuming right now?” and “How can I make sure it’s true?”

The answer to “How can I make sure it’s true?” might be to write another unit test, to step through with the debugger in a slightly different context, to try a different manual test case, to do some user validation, or a whole host of other options. In this case, the solution was to write a new class. But it took a bug to make us write that class. Preferably, we would have seen that innocuous little phrase “always valid” as an alarm bell going off before we ran into a bug. It takes a particular kind of thinking to see past our own assumptions and we should push ourselves to think in that way as much as possible.

I used the term Zen in the title of this article not because I want you to write code in the lotus position but because Zen meditation focuses on a heightened awareness of things so innate you might never otherwise notice them, like your thoughts and your breathing. If we can train ourselves to be aware of our own innate assumptions, we can write more enlightened code. Thanks for reading, and good luck!

How Loud is Disaster Area?

Tedd Terry

Super scientific comparative analysis

Disaster Area, the “loudest band in the galaxy” from Douglas Adams’ The Restaurant at the End of the Universe, is remarkably loud.

So loud that their audience prefers to listen to them in a concrete bunker 37 miles away from the stage. So loud that Disaster Area play their instruments remotely from a spaceship that’s orbiting a different planet. So loud that they have trouble booking shows because their PA violates ordnance regulations.

If their audience prefers a reasonable OSHA-approved listening level of 83 dB SPL, and their listening bunker has one-foot thick concrete walls, how loud is it at ground zero of the Disaster Area stage if the concert took place on Earth?

We’ve done some napkin math to figure it out, so try it yourself and then compare numbers after the cut!

How do we measure how loud something is?

Almost everyone is familiar with the decibel as an indicator of volume: more is louder. A large enough number means it hurts, and listening to it long enough causes permanent damage that makes it hard to hear quiet things for a while (or forever!). However, a decibel is much more than just an indicator of loudness: it’s a measurement of the ratio between two values.

Decibels are in common use in many fields to express ratios that would otherwise be cumbersome to throw around. Decibels have a few properties that make them really useful to us:

  • They use a logarithmic scale, meaning that each doubling of amplitude corresponds to an increase of about 6 dB.
  • They represent a ratio, so they’re always specified in terms of some reference value.

Since decibels always express a level relative to some reference, there are a few different ways of measuring them for different applications.

For example, the amplitude of sound in the real world is measured in dB SPL (sound pressure level). The amplitude of audio data in a computer is measured in dBFS (full scale). In dB SPL, 0 dB means 0.00002 Pascals: a tiny amount of air pressure. In dBFS, 0 dB means the largest sample value that can possibly be expressed by the wordlength of your bit depth, or a sample value of 1 if floating point is used.
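Because both scales are just 20 times the base-10 logarithm of a ratio against their reference, the conversions fit in a few lines of Python (a quick sketch, not tied to any particular meter or tool):

import math

P_REF = 0.00002  # 20 micropascals, the 0 dB SPL reference pressure

def db_spl(pressure_pascals):
    # Sound pressure level relative to the threshold of hearing
    return 20 * math.log10(pressure_pascals / P_REF)

def dbfs(sample_value):
    # Level of a floating-point digital sample relative to full scale (1.0)
    return 20 * math.log10(abs(sample_value) / 1.0)

print(db_spl(0.00002))  #  0.0 dB SPL, the reference pressure itself
print(db_spl(1.0))      # ~94 dB SPL, a common calibrator level
print(dbfs(0.5))        # ~-6 dBFS, half of full scale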

For real world audio, SPL is measured with a sound pressure level meter (as one would expect), which is basically a calibrated microphone with a readout and parameters for how often a measurement is taken and whether or not some frequencies are filtered out before measuring. There are a lot of pretty good SPL meter apps available for phones, which makes it pretty easy to measure your subway ride and e-mail yourself a CSV so you can nerd out on the data.

dB SPL Reference

Because the decibel is all about the reference, we can build some meaningful perceptual references for dB SPL values. Here are some common world dB values (many of which are cribbed from this handy list):

  • 0 dB SPL: threshold of hearing, a mosquito 10 feet away, the eardrum moves less than 1/100 the length of an air molecule
  • 10 dB SPL: absolute silence, AT&T-Bell Laboratory “Quiet Room”
  • 40 dB SPL: whispered conversation
  • 60 dB SPL: normal conversation
  • 83 dB SPL: average loudness in a THX-certified cinema program; perception of frequencies is equalized around this loudness (this is where you perceive bass and treble about as well as you perceive mids)
  • 85 dB SPL: permanent hearing damage after eight hours of exposure
  • 110 dB SPL: average loudness at a typical rock concert or construction site
  • 120 dB SPL: threshold of pain, permanent hearing damage after less than a minute of exposure

This general range is why we say that human hearing has a dynamic range of 120 dB. We can go even farther and say that the effective dynamic range of our perception is even lower (probably about 60 dB) because we can’t hear stuff buried by everyday noise like HVACs or wind, and we don’t like listening to loud things for very long (or, at least, we probably shouldn’t). This is probably something to consider, given the recent trend toward emphasizing high bit depth audio delivery formats, where the dynamic range can be 144 dB or above. In our DAWs we have an insane amount of dynamic range available at the mix stage: about 196 dB!

The “threshold of pain” bit is somewhat subjective, but really this is around the ceiling of where it’s safe for humans to experience sound without protection. The point is, sound is pressure, and levels past this threshold make bad things happen to your hearing forever. Of course, that doesn’t stop rock and roll:

  • 130 dB SPL: Front row of an AC/DC concert
  • 150 dB SPL: The Who’s sound system in 1976: measured at 120 dB 50m away from the speaker, audible 100 miles away
  • 154 dB SPL: The loudest sound system on Earth (used to test if spaceflight vehicles can withstand the forces of takeoff)
  • 155 dB SPL: Loud enough to blur vision and cause difficulty breathing
  • 194 dB SPL: 1 ATM (14.7 PSI) of pressure, rupturing of eardrums and probably some other distressing events as well

194 dB is pretty much the ceiling for audio on Earth. Past this point, pressure essentially clips, which would probably sound awesome except you’d be dead.

Masking

While we’re talking relative loudness, it’s interesting to note that a sound needs to be at least roughly 6 dB louder than whatever it’s competing with to be perceived, and probably 12 dB louder to hear it well. If your subway ride has a noise floor of 80 dB SPL (it probably does), you’re rocking your jams between 86 and 92 dB SPL to hear them. This phenomenon is known as auditory masking, and a more sophisticated model is exploited by audio codecs like MP3 to discard sound that cannot be perceived.

90 dB SPL, by the way, is about where it becomes really important to pay attention to your sound level exposure: permanent hearing damage can occur after just two hours, and safe exposure time halves for every additional 3 dB.

OK so how loud is Disaster Area?

The cultured audience of Disaster Area fans sipping Pan Galactic Gargle Blasters inside their listening bunker is enjoying the noise at a refined, safe loudness of 83 dB SPL.

The bunker’s walls

The mass of the bunker’s concrete walls absorbs much of the energy present outside the structure. We can quantify this property as transmission loss, which varies by building material and density.

We take a typical density of concrete (2400 kg/m³), make it 1 foot thick (0.3 meters), and get a mass of 720 kg/m². Eyeball that value on this handy chart and we get a transmission loss of about 54 dB. If the level in the room is 83 dB SPL, we can add the transmission loss to get the level outside the Disaster Area bunker: 137 dB SPL.

Between the PA and the bunker

Sound pressure halves with every doubling of distance (more or less: there are some caveats here about point sources and line sources and positioning against reflective surfaces, but for our purposes we’ll consider the Disaster Area PA an unencumbered point source in space). A 2x change in sound pressure is about 6 dB, so we lose about 6 dB per doubling of distance.

There are 37 miles between the PA and the bunker, which is pretty close to the beautifully round value of 60,000 meters. Counting from about a meter in front of the PA, there are about 15 doublings between the stage and our Disaster Area concertgoers (log2 60000 ≈ 15.87).

15 doublings × 6 dB is 90 dB. Add that to the level just outside the bunker wall and we get the level at the stage: 227 dB SPL, louder than Earth’s atmosphere can support.

What happens above 194 dB SPL?

Is it meaningful to measure SPL above 194 dB? We’ve already seen that we can use decibels to compare the ranges of acoustic and digital systems. We can abuse decibels a little further to make observations about the power of natural disasters, explosions, and other phenomena we can observe with sound (but would like very much to be standing far away from). This gives us a context for comparing a Disaster Area concert on Earth to some other similarly catastrophic event, like the eruption of Krakatoa or the yield of Ivy Mike.

We’re not really dealing with sound at this point: sound is the pushing and pulling of pressure waves and our only reference for the kind of noise Disaster Area makes is the shockwaves resulting from heavy ordnance. We have to consider Disaster Area’s stage as the source of an explosion and estimate the shockwave force of the event.

This means that, above 194 dB SPL, instead of losing 6 dB per doubling of distance, we look at how explosions behave, where the shockwave loses around 18 dB per doubling of distance (at maximum — there’s some transfer here but we’re keeping it simple for napkin math).

So, considering that:

  • There are 10 doublings between bunker and stage until we hit around 194 dB (the “threshold of atmosphere”): 10 × 6 dB = 60 dB
  • There are 5 remaining doublings between that point and the stage. 5 × 18 dB = 90 dB
  • We don’t care about the fancy pants math or reality, really, because we’re talking about an imaginary space band that competes with rocket launches for loudness.

The total level difference between bunker and stage is 150 dB, which brings Disaster Area to a very pretty 287 dB SPL.
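Here’s the whole napkin calculation in one place, as a quick Python sketch using the same rounded numbers as above:

import math

bunker_level = 83.0   # dB SPL inside the bunker
wall_loss = 54.0      # transmission loss of one foot of concrete, read off the chart
outside_bunker = bunker_level + wall_loss      # 137 dB SPL just outside the wall

# Roughly 15 doublings of distance cover the ~60,000 m back to the stage
total_doublings = round(math.log2(60000))      # 15

# Walking toward the stage we gain 6 dB per doubling until we hit ~194 dB SPL,
# then switch to the "explosion" rate of 18 dB per doubling.
doublings_at_6db = 10                          # 137 + 10 * 6 is roughly 194 dB
doublings_at_18db = total_doublings - doublings_at_6db

stage_level = outside_bunker + doublings_at_6db * 6 + doublings_at_18db * 18
print(stage_level)                             # 287 dB SPL at the stage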

For comparison:

  • 194 dB SPL: 1 ATM (14.7 PSI) of pressure, rupturing of eardrums
  • 200 dB SPL: Instantaneous human death from pressure waves
  • 210 dB SPL: Explosion of 1 ton of TNT
  • 220 dB SPL: Saturn V rocket launch
  • 286 dB SPL: Eruption of Mount St. Helens, which knocked down trees for 16 miles around it and blew out windows ~200 miles away in the Seattle-Tacoma area
  • 287 dB SPL: Disaster Area concert (stage)
  • 310 dB SPL: Eruption of Krakatoa, which created an anti-node of pressure on the other side of the planet

That’s our best rough calculation. Did you get another number? Got an acoustic insight we’ve missed? Leave a comment and let us know!

True Peak Detection

Russell McClellan

In the last few years, a number of different countries have passed laws regulating the loudness of audio in television and other broadcast mediums. Surprisingly, loudness is a difficult concept to capture with a simple technical specification. Current regulations set limits for a number of different audio metrics, including overall loudness, maximum short-term loudness, and the true peak level of a signal.

What are true peaks?

To understand how true peaks differ from sample peaks, we have to go back to the basis of digital audio: the Sampling Theorem. This theorem states that for every sampled digital signal, there is exactly one band-limited analog signal that passes through each digital sample. Digital-to-analog converters try to approximate this correct analog waveform as closely as possible. For more details on this fascinating theorem, we recommend this video from xiph.org.

Some audio editors are able to display the digital samples and an approximation of the corresponding analog waveform. In iZotope RX, both of these signals appear when you zoom far enough in. The blue line represents the analog signal, while the white squares are the digital samples.

analog signal and digital samples

In RX, you can click and drag on an individual sample to change it and see how the analog signal reacts. For example, if you move a single sample very far, you can see that a large amount of ripple appears in the analog signal around that sample.

modified signal with ripple

It’s clear that the analog signal’s peak is quite a bit higher than the highest digital sample. The highest point the analog signal reaches is called the true peak while the highest digital sample is called the sample peak. Since a digital signal has to be converted to an analog signal to be heard, the true peak is a much more sensible metric for the peak level of a waveform.

It turns out that for real audio signals, quite often the true peak is significantly higher than the sample peak, so it’s important to measure carefully.

How are true peaks detected?

BS.1770, the international standards document used as the basis for regional loudness specifications, gives a suggested algorithm to detect the true peak level of a digital signal. This algorithm is a relatively simple one: first, upsample the signal to four times its original sampling rate, and then take the digital peak of the new, upsampled signal. We can perform this algorithm manually in RX: first, open the “Resample” module and select a sample rate four times the original rate, then open the waveform statistics window and check the sample peak level. Here’s what the test signal above looks like after it has been upsampled to four times its original rate:

upsampled signal

As you can see, after upsampling, the sample peak of the new signal is now very close to the true peak.

Of course, the RX waveform statistics window already provides the true peak level, so you don’t have to perform these steps by hand.
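If you’d rather script the measurement, here’s a rough sketch of the same recipe in Python, using SciPy’s polyphase resampler in place of the specification’s interpolation kernel (so its accuracy won’t exactly match RX or a strictly compliant meter):

import numpy as np
from scipy.signal import resample_poly

def sample_peak_db(samples):
    return 20 * np.log10(np.max(np.abs(samples)))

def true_peak_db(samples, oversample=4):
    # Upsample, then take the largest sample of the upsampled signal.
    # The accuracy depends entirely on the resampling filter; SciPy's default
    # polyphase filter stands in for the kernel given in BS.1770.
    upsampled = resample_poly(samples, oversample, 1)
    return sample_peak_db(upsampled)

# Quick smoke test on a plain sine wave with a known peak of 0.5 (-6.02 dB)
fs = 48000
t = np.arange(fs) / fs
test = 0.5 * np.sin(2 * np.pi * 997 * t)
print(sample_peak_db(test), true_peak_db(test))

For a plain sine like this, both numbers should land around -6 dB; the interesting cases are the test signals below, where the two measurements disagree.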

While this algorithm is quite good, there are two major ways that errors can occur. First, no upsampling algorithm can ever be perfect, so either overshoots or undershoots can occur during the upsampling process. This problem can be helped by using a high-quality upsampling algorithm. Second, the true peak may still be between samples even after the upsampling happens. This problem can be ameliorated by upsampling at a higher ratio.

How can we measure the quality of a true peak meter?

While most true peak meters follow the same basic algorithm as the one described in BS.1770, they can vary significantly in two dimensions: the quality of the upsampling algorithm, and also in the ratio of upsampling. BS.1770 includes a description of a simple upsampling algorithm, but many true peak meters actually perform more accurate upsampling than required by the specification. Also, many meters upsample by more than the required four times. This means that true peak meters can vary significantly in the accuracy of their output.

How can the quality of a meter be measured? One way is to create a synthetic signal that is difficult to meter accurately, but has a mathematically known true peak. This way, we can compare the meter’s reported true peak to the true peak we calculated ahead of time, and any difference can be attributed to meter error.

Testing meters with single impulses

One simple signal with a known true peak is a digital impulse, a signal with all samples at zero except for a single sample at a non-zero value. We can see the analog waveform this creates by looking at it in RX:

sinc function

It turns out that the analog waveform for a digital impulse is a well studied function called the sinc function and has a simple mathematical expression: for an impulse of height k, the waveform is k \frac{\sin(\pi x)}{\pi x}. Also, the true peak is the same as the sample peak, k. This isn’t an incredibly useful signal for testing true peak meters, since even a bad true peak meter that only looks at the sample peak without upsampling will get the correct answer.

However, knowing the mathematical expression for the analog signal allows us to shift it in time to create a more interesting signal. Consider the same function with a time offset of a fraction of a sample, say 0.375, i.e., f(x) = k \frac{\sin(\pi (x - 0.375))}{\pi (x - 0.375)}. This signal should still have a true peak of k, since time shifting an analog signal will not change its analog peak. However, the sample peak will be lower, since the true peak no longer sits exactly on a digital sample.

We can use Python, NumPy, and scikits.audiolab to create a wave file with this shifted sinc signal:

import numpy as np
from scikits.audiolab import Format, Sndfile

def save_file(arr, filename):
    format = Format('wav')
    f = Sndfile(filename, 'w', format, 1, 48000)
    f.write_frames(arr)
    f.close()

def shifted_sinc(x):
    # Sinc of height k, shifted by a fraction of a sample so that the
    # true peak falls between two digital samples.
    k = 0.5
    offset = 0.375
    return (k * np.sin(np.pi * (x - offset)) /
            (np.pi  * (x - offset)))

# One second of audio at 48 kHz, centered so the peak lands mid-file
length = 48000
out = shifted_sinc(np.arange(length, dtype='float') -
                   length / 2)
save_file(out, 'shifted_sinc.wav')

Then, we can open it in RX to see the digital samples and analog waveform:

shifted sinc function

As we can see, the analog waveform is the same, only shifted in time. However, now the sample peak is a few decibels lower than the true peak. We set k = 0.5, so the true peak is 0.5 or -6.02 dB. The sample peak is the sample immediately before the true peak, which using our formula above is 0.5 \frac{\sin(-0.375\pi)}{-0.375\pi} or around -8.13 dB, a difference of over two decibels from the true peak!
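Those two numbers are easy to double-check numerically by evaluating the shifted sinc expression at the sample just before the peak:

import numpy as np

k, offset = 0.5, 0.375
true_peak = k                                                    # peak of the shifted sinc
sample_before_peak = k * np.sin(np.pi * -offset) / (np.pi * -offset)

print(20 * np.log10(true_peak))           # -6.02 dB
print(20 * np.log10(sample_before_peak))  # about -8.13 dB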

Since we know the exact true peak level of this signal, we can use it as a test of a true peak meter. It’s fairly difficult to measure, because a sinc function contains information at all frequencies up to the Nyquist frequency, making it difficult to upsample accurately. Also, the peak is located at a fraction of \frac{3}{8}, so even perfect upsampling by four would not catch the true peak. You can download this shifted sinc test file here.

Testing Overshoot: Sine Sweeps

Another good test for true peak meters is a sampled sine sweep at a known amplitude. The true peak of this waveform will just be the amplitude of the sine sweep, but many meters will report a higher true peak because of errors in the upsampling algorithm. Like the sinc function, the sine sweep is difficult to upsample accurately because it has information at all frequencies. We can generate a sine sweep with the following NumPy code:

import numpy as np
from scikits.audiolab import Format, Sndfile

def save_file(arr, filename):
    format = Format('wav')
    f = Sndfile(filename, 'w', format, 1, 48000)
    f.write_frames(arr)
    f.close()

def sine_sweep(begin_freq, end_freq, length, fs, scale):
    # The instantaneous frequency at each sample, in cycles per sample
    freqs = np.linspace(begin_freq, end_freq, length)
    freqs /= fs

    # The phase increment at each sample, in radians per sample
    omegas = 2 * np.pi * freqs

    # The accumulated phase of the sweep at each sample
    phases = np.cumsum(omegas)

    # Create a fade in and out to avoid artifacts at
    # the beginning and the end.
    fade_length = length // 8
    fade_in = np.linspace(0, 1, fade_length)
    fade_out = np.linspace(1, 0, fade_length)
    fade = np.ones(length)
    fade[:fade_length] = fade_in
    fade[length - fade_length:] = fade_out
    return fade * scale * np.sin(phases)

sweep = sine_sweep(200, 23000, 48000, 48000, 0.5)
save_file(sweep, 'sine_sweep.wav')

You can download the sine sweep file here.

How good is the example algorithm specified by BS.1770?

Now that we have a few techniques for measuring the quality of true peak detection algorithms, let’s put these to work in evaluating the example algorithm provided by BS.1770.

The upsampling algorithm is a simple one, based on upsampling by four, interpolating with a specific kernel. For more background information on upsampling, please see this reference. The coefficients of the kernel are given in the BS.1770 specification, and the kernel looks like this:

BS.1770 filter

If we save this kernel as a wave file we can use RX’s Spectrum Analyzer to visualize the frequency response of this kernel:

BS.1770 filter spectrum

Here, the cutoff frequency is a quarter of the Nyquist frequency, or 6 kHz (the kernel is meant to run at four times the original sample rate, so viewed as an ordinary 48 kHz file its passband edge shows up at 6 kHz). The ideal filter would be perfectly flat below this frequency, and then drop immediately down to -\infty dB above it. Real world resampling filters have to make tradeoffs and cannot achieve this.

As we can see, there is a fairly significant amount of ripple in the passband (below roughly 5 kHz), which may indicate that the detector will overshoot at certain frequencies. Indeed, applying this detector to our sine sweep test signal, which has a true peak level of -6.0 dB, results in a measured value of -5.8 dB, an error of 0.2 dB.

Also, the kernel’s frequency response does not roll off very steeply at the cutoff frequency. This indicates that for signals with a lot of high-frequency content, such as our sinc test signal, the filter may significantly undershoot. Indeed, for our shifted sinc test file, which also has a true peak of -6.0 dB, the BS.1770 detector results in a measured value of -6.5 dB, an error of 0.5 dB. So, even compliant meters can have fairly significant errors in their true peak detection.

Extra credit: How high can true peaks get?

We’ve now seen several signals that have true peaks higher than their sample peaks, even by more than a decibel. Is there any limit to how much higher the true peak can be than the sample peak? This is an interesting question because if there were some limit, then we would have a worst-case bound on how much error any given true peak meter could have.

Unfortunately for meters, it turns out that there is actually no limit to the difference between sample peaks and true peaks.

Plan of the Proof

To show that true peaks can become arbitrarily high, we’ll explore a pathological waveform where we can make the true peak as high as we want, by adding more samples. This particular example was discovered by iZotope colleague Alex Lukin, and the rigorous proof that it had an unboundedly high true peak was found by Aaron Wishnick.

The pathological waveform we are interested in is a series of N alternations between -1 and 1, followed by silence. We’ll show that by adding more alternations, we can make the true peak as high as we want to.

We can start to get a feel for this waveform by manually dragging samples around in RX. Here’s what it looks like after three alternations of -1 and 1:

pathological signal after three alternations

As you can see, one true peak is already higher than the sample peak of 1, and it’s exactly halfway between samples. Using RX’s waveform stats window, we can see that after three alternations the true peak is +2.33 dB, while the sample peak is 1 or 0.0 dB. It turns out that by adding more alternations of -1 and 1, we can make the true peak even higher. Here’s what it looks like with ten alternations:

pathological signal after ten alternations

Using waveform stats, we see that the true peak is +2.58 dB, while the sample peak is the same at 0.0 dB.

In order to prove that we really can make the true peak as high as we want, we’ll have to dig into some of the math.

Detailed Proof

For convenience, let’s call the time of the last 1 sample time 0, so that the alternations of 1 and -1 extend back into negative time. Also for convenience, let’s assume our sampling rate is 1 (this will make the math a bit easier). Judging from RX, this will put the true peak at time 0.5, half way between the last 1 and the first 0.

So, we need to find an equation to tell us the value of the analog waveform at time 0.5. For this, we can use the Shannon interpolation formula:

f(t) = \sum\limits_{n = -2N}^0 x[n] \frac{\sin(\pi (t - n))}{\pi (t - n) }

Where x[n] is our sampled signal at time n, N is the number of alternations and f(t) is the analog waveform at time t. Since we are interested in t = 0.5, our equation becomes

f(0.5) = \sum\limits_{n = -2N}^0 x[n] \frac{\sin(\frac{\pi}{2} - \pi n)}{\pi (0.5 - n) }

We know from trigonometry that \sin(\frac{\pi}{2} - \pi n) is positive 1 if n is even, or negative 1 if n is odd. We can express this as (-1)^n. So our equation becomes

f(0.5) = \sum\limits_{n = -2N}^0 x[n] \frac{(-1)^n}{\pi (0.5 - n) }

Now, we plug in the fact that our signal x consists of N alternations between -1 and 1, ending at 1 at time 0. Note that x[n] = 1 when n is even, and x[n] = -1 when n is odd, since it alternates every sample. We can express this as x[n] = (-1)^n. So, our equation simplifies to

f(0.5) = \sum\limits_{n = -2N}^0 (-1)^n \frac{(-1)^n}{\pi (0.5 - n) }

Now, note the two (-1)^n terms cancel:

f(0.5) = \sum\limits_{n = -2N}^0 \frac{1}{\pi (0.5 - n) }

This is a formula for the analog level at time 0.5 that only depends on the number of alternations, N.

We can plot this series using Wolfram Alpha, and note that the sum diverges. We can also recognize this as a general harmonic series, which all diverge.
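We can also watch the divergence numerically by evaluating the partial sums for increasing N (these values come straight from the interpolation formula at t = 0.5, so they won’t exactly match RX’s meter, which uses its own finite-quality interpolation):

import numpy as np

def analog_level_at_half(num_alternations):
    # f(0.5) for a signal of N alternations of -1 and 1 ending at time 0
    n = np.arange(-2 * num_alternations, 1)        # n = -2N, ..., 0
    return np.sum(1.0 / (np.pi * (0.5 - n)))

for N in (3, 10, 100, 10000, 1000000):
    print(N, 20 * np.log10(analog_level_at_half(N)))   # keeps growing with N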

Knowing that the series diverges means that the more terms we add (that is, the more alternations of -1 and 1), the higher the true peak will be. There is no limit to how high we can make the true peak if we have enough alternations of -1 and 1. However, since the signal is either -1, 1, or 0 at all samples, the sample peak is always 1. So, knowing the sample peak doesn’t tell you much about the true peak, at least for these pathological signals.

Improving Your Team’s Git Workflow

Nick Donaldson

Version control is an indispensable tool in modern software development, and Git is one of the most popular and widely used version control systems available today. However, despite its pervasiveness, many development teams do not use Git to its full potential, often due to a lack of a well constructed and adhered-to Git workflow. This can lead to frustration with version control as well as process problems, which are easily solvable by adopting a different workflow strategy.

In this article, I’ll be discussing the differences between a few different high-level Git workflows as well as diving a bit deeper into a particular structured, opinionated Git branching workflow known as Gitflow, which has helped a number of teams at iZotope ensure high quality and build integrity while minimizing common pain points associated with using version control on a team.

Version Control Overview

Centralized

Historically, most version control systems were based on a centralized topology, in which one remote server acts as the “source of truth” for all consumers of a repository. Under this topology, new commits to the repository are immediately pushed to the server and other clients are forced to update their local working copy and resolve any conflicts before they can commit new changes. This is how Apache Subversion (commonly known as SVN) works, for example.

While it is possible to create branches in centralized version control systems, the relative difficulty of doing so tends to prevent teams from using branches frequently in their VCS workflow. Furthermore, the eventual merge back to the core branch (or trunk) can be a painful process in a centralized version control system, meaning that having more branches often leads to more time spent resolving VCS issues instead of developing software.

Distributed

Git, by contrast, is a distributed version control system, meaning that each client has a complete, isolated working copy of the repository, which may be modified in isolation from the remote server and synchronized later. In addition, with Git there can be multiple remote servers hosting the same repository or different versions of the same original repository, commonly known as “forks”. Synchronization between client copies of a Git repository, known as “clones”, and any of potentially several remote servers can be done at will, and per-branch.

Branching and merging are also both significantly easier using Git as compared to centralized version control systems. Clients can make branches and resolve merges locally without updating the remote server, which allows a lot of flexibility in terms of validating merge results, reorganizing/editing commits, or even completely restructuring branches before committing the end result to the remote server.

New Git repositories are created with a single master branch by default, the purpose of which can be interpreted in a number of ways. The most basic (and, as you will see, least useful) way to use the master branch is in a similar way to the trunk in SVN and other centralized version control systems, in which it represents the latest version of the source code, and is the branch to which all commits are made during development.

Git Workflows

Single-Branch Git Workflow

A very basic single-branch (master only) Git workflow typically works something like this:

  1. Clone the repository
    • git clone [my_repo_url]
  2. Change some code
  3. Add changes to the index
    • git add [changed_file]
  4. Make a commit
    • git commit -m "Changed some code"
  5. Push the change to the remote server
    • git push origin master
  6. If the push fails because the remote server is ahead of the local copy, pull first and fix any conflicts, then push again
    • git pull --rebase
    • Note: The use of --rebase is optional here, but it avoids polluting the repository with superfluous merge commits

This type of Git workflow is not much different from centralized version control; there is still only one primary “source of truth” (the master branch) where all code changes are made by everyone on the team. Despite this, some teams never move past this workflow, which can lead to a number of problems as teams scale up in size and move toward a regular release schedule. These problems may include the following:

Different development arcs are all mixed on the same branch

No matter how your team chooses to break up specific tasks, it’s probably the case that not everyone is working on the same thing at the same time. In a single-branch workflow, commits containing completely unrelated code changes all end up on master in an unknown state of completeness and stability.

Neither features nor bugfixes can be tested in isolation

If part of your team is working on one particular task, be it a feature or bugfix, it is very difficult for QA to validate new changes, since the state of master is constantly in flux and potentially consists of commits made as a part of multiple, unrelated tasks. At best, other commits do not affect the change being tested, but in many cases the change may not be verifiable at all due to instabilities introduced from other unrelated changes.

There are no implicit guarantees of branch stability

This one is extremely important: in a single-branch workflow, there are absolutely zero built-in guarantees that the HEAD of master represents a stable, working build. This can be solved by creating tags for commits representing good builds, but a better solution is to avoid the problem altogether by using a different Git workflow.

“Code Freeze” can be very painful

When it comes time to prepare a build for release, teams using a single-branch workflow have very little flexibility over what goes into the build – once changes are committed to master, reverting them if the relevant feature isn’t ready for release is difficult and costly to productivity. This means that your team has to be very careful about commits leading up to a release, which can be especially troublesome for teams trying to adhere to an agile release schedule.

Branching Workflows

So, if the single-branch workflow has so many issues, what is a viable alternative?

Branching!

Creating, updating, and merging branches in Git is comparatively easy — in fact, the relative simplicity of its branching model is one of the distinguishing features of Git as a version control system. I won’t go into detail on how branching works in Git, but there are plenty of resources out there if you are interested in learning more about the mechanics.

In a branch-based workflow, developers create “feature” branches from master or another stable branch to implement feature work, bug fixes, or other maintenance in parallel with other ongoing work. When the work is completed, it can be tested in isolation from other codebase changes by building the relevant branch. Once the branch’s build passes the necessary quality checks, which might include code review, unit tests, and manual or automated regression tests, the branch is then merged back into the stable branch.

Since branching is easy in Git, creating and destroying branches does not impose a significant overhead cost on productivity. In addition, making use of a branch-based workflow solves all of the aforementioned issues with single-branch workflows in an elegant way:

Branches allow features to be developed and tested in isolation

Because each feature or bugfix is implemented on a branch, the changes introduced in builds generated from feature branches are always directly related to the specific feature or bugfix. Therefore, there is high likelihood that any regressions or issues discovered during testing will have been introduced in the course of development of the particular feature or bugfix being tested. This can save a lot of time and effort for both QA engineers and developers in tracking down and fixing bugs.

Branches ensure build stability

In a branching workflow, one or more long-term branches (e.g. master and develop, see below) are designated as “good” branches, to which only working, fully tested code is merged. This means that at any point in time, your team can have confidence that the code from one of these branches will produce a stable build.

Branches make “Code Freeze” a breeze!

Preparing a release candidate and integrating stop-ship fixes is extremely easy in a branching workflow. Because there is always a stable branch which consists of up-to-date, fully tested features, all you need to do is create a new branch from the stable branch in order to institute a code freeze. Builds generated from the release candidate branch can then be verified for the release at hand without stopping development on features for future releases.

Gitflow

Gitflow is a specific branching strategy for development teams first proposed by Vincent Driessen in a 2010 blog post. It provides a common, structured template for teams to follow in order to implement a successful Git branching workflow. It’s both easy to learn and highly effective, and it has benefitted a number of teams here at iZotope.

Permanent Branches

In Gitflow, there are at least two branches that are permanent — that is, they are never deleted for the entire lifetime of the repository.

master

In Gitflow, the master branch represents code that is currently in production. That’s right — the master branch will remain empty until you have released the first version of your product to the public. The advantage of keeping only in-production code in the master branch is that it’s very easy to find the code that’s out in the wild in order to diagnose or address critical issues — just checkout master! Another benefit of ensuring that only in-production code is in master is that hotfixes – fixes for critical bugs in the wild – can very easily be made in isolation from ongoing development of the product.

develop

The develop branch contains the latest development code. In other words, it is the eventual destination of all features and bugfixes that will go into the next release. Since feature branches and bugfixes are independently developed, tested, and validated prior to merging them into develop, the HEAD of this branch should, under ideal circumstances, always produce a “good” build with the latest validated features.

Temporary Branches

There are also a variety of temporary branch types in Gitflow which are created as necessary and destroyed when no longer needed.

Feature branches

Feature branches are where development of new features is performed. If your team is using a methodology like Scrum, then each feature branch generally represents a single user story. Multiple developers may work on a single feature branch at once, in which case their collaborative Git workflow on the feature branch will generally resemble the familiar single-branch workflow, with both developers pushing commits to the same branch.

Feature branches should follow a naming convention where an optional user story identifier and short hyphen-delimited summary of the branch are preceded by feature/. For example, feature/JIRA-50-new-login-ui. It’s worth noting that this naming convention does not follow Vincent Driessen’s original proposal, but it serves a few useful purposes:

  • The prefix facilitates immediate identification of the purpose of the branch
  • The front slash is interpreted as a “group” by many git clients, and also makes branches listed using the CLI align well with one another visually:
    bugfix/JIRA-100-this-bug
    develop
    feature/JIRA-120-this-feature
    feature/JIRA-121-that-feature
    master
    
  • A common prefix makes it extremely easy to setup a matching pattern in CI systems to automatically build temporary branches when commits are pushed to them, e.g. (feature|bugfix|release)/*

Bugfix branches

As you may have guessed, bugfix branches are created for the purpose of fixing one or more related bugs. The use of bugfix branches should be restricted to bugfix work only, so as to avoid confusion. Although it’s possible for multiple bugfixes to be included on a single branch, it’s advantageous to keep these branches as small and concise as possible to minimize the chance of changes introducing unintended side effects.

In a similar naming pattern as feature branches, bugfix branches should begin with bugfix/ followed by an issue number (if there is one) and short summary. For example, bugfix/JIRA-123-fix-null-text.

Hotfix branches

If there is a critical issue in your product that’s affecting users in the wild, Gitflow makes it extremely easy to patch and release a new update. Remember how the HEAD of master is always the current version in production? Well, in order to patch an issue without disrupting ongoing development, all you need to do is create a “hotfix” branch off of master, fix the issue, test it, and merge back into master and develop. The patched build is created from master, and all future builds will also include the patch since it was merged back into develop as well.

No surprise here: hotfix branches should be named starting with hotfix/ and followed by an issue number and a short summary of the fix. For example, hotfix/JIRA-444-sync-crash.

Pull Requests

A pull request is an abstract concept that isn’t actually part of Git, but rather a process tool that can be used to validate code changes made on branches before merging them into a stable branch. The idea of a pull request became well known in the collective consciousness of software developers largely due to its implementation on GitHub.

The purpose of a pull request is to indicate to the rest of the development team (or to the maintainers of the codebase) that you have a branch with new changes that you’d like to have merged into a stable branch. This gives other developers an opportunity to review the code changes and gives QA and/or a continuous integration system an opportunity to validate the quality and integrity of the build with the new changes integrated.

In most Git frontends such as GitHub, Bitbucket, or Stash, a pull request is represented by a webpage with three main components:

  1. A timeline of events in the pull request, which shows a chronological history of code review comments and commits that have been added or deleted since the pull request was opened. Sometimes the timeline also has links to issue tickets or other pull requests referenced by the pull request or vice versa.
  2. An overall line-by-line diff of the code in the branch with the code in its merge destination. This is often supplemented by a commenting system which allows comments to be made on a file as a whole or on a particular line of code.
  3. An up-to-date list of commits that will be merged into the destination if the pull request is accepted, each with individual line-by-line diffs.

Here is an example of a merged pull request from a popular open source repository on GitHub.

In Gitflow, when developers are finished working on a temporary branch, a pull request should be created for the branch and reviewed/validated by the team before being merged into its destination (typically develop). This provides a standardized quality checkpoint for all code changes and helps ensure that permanent branches are always stable.

Sometimes a pull request will not be able to be merged due to merge conflicts. There are two primary ways to address this: merging from or rebasing onto the destination branch. Which you use is a matter of preference; merging can be easier but litters your Git history with bidirectional merge commits, while rebasing creates a cleaner git history but can be more tedious and requires force-pushing to the remote server once completed, since it involves rewriting commits. Force pushing can be scary and requires anyone else working on the same branch to carefully handle updating their local copy, so if you choose to rebase, communication is key!

Releasing

As mentioned a few times previously, Gitflow is highly effective when it comes to release management. Let’s walk through an example to see how it works.

Suppose the development team at the hip software company Deuterium, Inc. consists of two sub-teams, Team A and Team B, working on the company’s flagship product, Stratosphere. Each team could be made up of multiple people, or each team could just be one developer.

  • Version 1.0 of Stratosphere is already out in the hands of customers.
  • Team A is in charge of preparing version 1.1 for release.
  • Team B is in charge of working on a feature that isn’t finished yet, but is planned for version 1.2.

Here’s how Gitflow makes this type of scenario not only possible, but also easy to manage:

  1. Team A creates a new branch off of develop, which at this point includes all of the new features and bugfixes that will go into version 1.1.
    • git checkout -b release/v1.1
    • git push origin release/v1.1
  2. At this point, the code for release version 1.1 is effectively frozen.
  3. Team A encounters a stop-ship bug in the code for version 1.1, so they create a new bugfix branch from release/v1.1.
    • git checkout release/v1.1
    • git checkout -b bugfix/JIRA-150-stop-ship
    • git push origin bugfix/JIRA-150-stop-ship
  4. While Team A is working on the stop-ship bug, Team B manages to finish, test, review, and validate their new feature before version 1.1 is released yet. They merge their feature into develop, push the result, and delete the feature branch.
    • git checkout develop && git pull
    • git merge --no-ff feature/JIRA-149-awesome-thing
    • git push origin develop
    • git push origin :feature/JIRA-149-awesome-thing
    • Note: If using a Git frontend, merging the pull request from the web interface would accomplish all of the above automatically. Also, the --no-ff is to ensure that the merge generates a merge commit, serving as a record in the commit history that the branch was merged.
  5. develop now includes the new feature, but since release/v1.1 is isolated, the release build is not affected by this.
  6. Team A fixes the bug, so once they have validated and tested it, they merge the bugfix branch into the release branch and delete the bugfix branch from the server.
    • git checkout release/v1.1
    • git merge --no-ff bugfix/JIRA-150-stop-ship
    • git push origin release/v1.1
    • git push origin :bugfix/JIRA-150-stop-ship
    • Note: once again, merging a pull request in a Git front end would do all of the above.
  7. Team A validates the release build with the bugfix and is ready to ship. They merge the release branch into both master and develop, delete the release branch, and create a new tag for the release.
    • git checkout master && git pull
    • git merge --no-ff release/v1.1
    • git tag -a v1.1 -m "Version 1.1"
    • git push --tags origin master
    • git checkout develop && git pull
    • git merge --no-ff release/v1.1
    • git push origin develop
    • git push origin :release/v1.1
  8. At this point, master contains the in-production code for version 1.1, and develop now includes both the bugfix implemented by Team A and the new feature for version 1.2 implemented by Team B

This type of (dare I say) agile release management would be orders of magnitude more difficult if not using a branching version control workflow, and effectively demonstrates the power of Gitflow for streamlining release management.

Summary

Git is an extremely powerful tool which can serve to either complicate or streamline your development team’s version control and release management processes. By making good use of Git’s easy and functional branching capabilities in an enforced and structured workflow, many of the common pain points of building software (at least from a process standpoint) can be addressed in an effective and simple way.

Resources

Here are a few other resources on Gitflow to check out if you want to dive deeper. Happy branching!

Vincent Driessen’s Original Gitflow Article
Atlassian Git Workflows (with examples)