Audio timescale-pitch modification

Time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch. Pitch scaling or pitch shifting is the reverse: the process of changing the pitch without affecting the speed. There are also more advanced methods used to change speed, pitch, or both at once, as a function of time.

These processes are used, for instance, to match the pitches and tempos of two pre-recorded clips for mixing when the clips cannot be reperformed or resampled. (A drum track could be moderately resampled for tempo without adverse effects, but a pitched track could not.) They are also used to create effects such as increasing the range of an instrument (for example, pitch shifting a guitar down an octave).

Resampling

The simplest way to change the duration or pitch of a digital audio clip is to resample it. This is a mathematical operation that effectively rebuilds the original waveform from its samples and then samples the waveform again at a different rate. When the new samples are played at the original sampling frequency, the audio clip sounds faster or slower. Unfortunately, the frequencies in the clip are always scaled by the same factor as the speed. In other words, slowing down the recording lowers the pitch, speeding it up raises the pitch, and the two effects cannot be separated. This is analogous to speeding up or slowing down an analog recording, like a phonograph record or tape.
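
The coupling between speed and pitch can be seen in a minimal sketch. It assumes NumPy and uses linear interpolation as a crude stand-in for proper band-limited reconstruction; the function name and the 440 Hz test tone are illustrative choices, not part of any particular tool.

```python
import numpy as np

def resample_linear(signal, speed):
    """Resample so that playback at the original rate is `speed` times faster."""
    n_out = int(round(len(signal) / speed))
    # Positions in the original signal at which the new samples are read.
    positions = np.arange(n_out) * speed
    # Linear interpolation approximates "rebuilding" the waveform between samples.
    return np.interp(positions, np.arange(len(signal)), signal)

def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.argmax(spectrum) * sample_rate / len(signal)

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate     # one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)         # 440 Hz test tone
faster = resample_linear(tone, 2.0)          # half as many samples

# Played back at the same sample rate, the clip is half as long and an octave higher.
print(len(tone), len(faster))
print(dominant_frequency(tone, sample_rate), dominant_frequency(faster, sample_rate))
```

Doubling the speed halves the sample count and doubles the dominant frequency, which is the inseparable pair of effects described above.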

Phase vocoder

One way of stretching the length of a signal without affecting the pitch is to build a phase vocoder after Flanagan, Golden, and Portnoff.

The basic steps are: compute the time-frequency representation of the signal by taking the fast Fourier transform (FFT) of each windowed block of samples (in other words, perform the short-time Fourier transform, or STFT); process the amplitudes and phases of the frequency components; perform the inverse FFT on each block; and overlap-add the resulting waveform blocks (in other words, perform the inverse STFT).
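
These steps can be sketched in a few lines, assuming NumPy. This is a simple textbook-style phase vocoder, not the method of any specific product; the function name, the FFT size of 1024, and the hop of 256 samples are assumptions chosen for illustration. It has no transient handling, so it will show the smearing artifacts on percussive material.

```python
import numpy as np

def phase_vocoder_stretch(x, stretch, n_fft=1024, hop=256):
    """Return x lengthened by `stretch` (>1 is longer) at roughly the same pitch."""
    window = np.hanning(n_fft)
    # Analysis: FFT of each windowed block (the STFT).
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.array([np.fft.rfft(window * x[i * hop:i * hop + n_fft])
                       for i in range(n_frames)])
    # Phase advance per hop expected at the centre frequency of each bin.
    omega = 2 * np.pi * hop * np.arange(n_fft // 2 + 1) / n_fft
    # Fractional analysis-frame position read for each synthesis frame.
    steps = np.arange(0, n_frames - 1, 1.0 / stretch)
    phase = np.angle(frames[0])
    out = np.zeros(len(steps) * hop + n_fft)
    for k, pos in enumerate(steps):
        i = int(pos)
        frac = pos - i
        # Interpolate magnitudes between neighbouring analysis frames.
        mag = (1 - frac) * np.abs(frames[i]) + frac * np.abs(frames[i + 1])
        # Inverse FFT of the block, then overlap-add into the output.
        block = np.fft.irfft(mag * np.exp(1j * phase), n_fft)
        out[k * hop:k * hop + n_fft] += window * block
        # Measured deviation from the expected phase advance, wrapped to [-pi, pi].
        dphi = np.angle(frames[i + 1]) - np.angle(frames[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        # Accumulate phase so each bin keeps its own frequency in the output.
        phase = phase + omega + dphi
    # Compensate for the gain of the overlapping squared windows.
    return out / (np.sum(window ** 2) / hop)

sample_rate = 8000
t = np.arange(2 * sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
stretched = phase_vocoder_stretch(tone, 2.0)   # about twice as long, still near 440 Hz
```

The phase accumulation step is the part that distinguishes this from simply repeating blocks: without it, successive output blocks would not line up in phase and the overlap-add would cancel or beat.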

A good algorithm will give good results at compression/expansion ratios of up to about ±25%; beyond that, the pre-echo and other smearing artifacts of frequency domain interpolation on transient ("beat") waveforms, which are not localized at all in the frequency domain, begin to take a toll on perceived audio quality. A simple implementation of phase vocoder processing will create a "swooshy" or "underwater" sound in the reconstructed waveform, as the transient peaks in the waveform are dissolved.

This technique can also be used to perform pitch shifting, chorusing, timbre manipulation, harmonizing, and other unusual modifications, the parameters of which can be changed as functions of time.

Time domain

Rabiner and Schafer in 1978 put forth an alternate solution: work in the time domain, attempt to find the period of a given section of the fundamental wave with the autocorrelation function, and crossfade one period into another. This is called time domain harmonic scaling or the synchronized overlap-add method. It performs somewhat faster than the phase vocoder on slower machines but fails when the autocorrelation misestimates the period of a signal with complicated harmonics (such as orchestral pieces). Cool Edit Pro seems to solve this by looking for the period closest to a center period that the user specifies. The corresponding center frequency should be an integer multiple of the tempo and lie between 30 Hz and the lowest bass frequency. For a 120 bpm tune, use 48 Hz, because 48 Hz = 2,880 cycles/minute = 24 cycles/beat × 120 bpm.
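
The period search and the tempo arithmetic can be illustrated with a short sketch, assuming NumPy. The function names and the 30 Hz to 1000 Hz search range are hypothetical choices for this example; the sketch only estimates the period and does not perform the crossfade, and it is not a description of how Cool Edit Pro works internally.

```python
import numpy as np

def estimate_period(block, sample_rate, f_min=30.0, f_max=1000.0):
    """Estimate the period of `block`, in samples, from its autocorrelation peak."""
    block = block - np.mean(block)
    # Autocorrelation at non-negative lags only.
    acf = np.correlate(block, block, mode="full")[len(block) - 1:]
    # Restrict the search to lags that correspond to plausible pitches.
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    return lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))

def cycles_per_beat(center_hz, bpm):
    """Number of cycles of the center frequency that fit in one beat."""
    return center_hz * 60.0 / bpm

sample_rate = 8000
t = np.arange(2048) / sample_rate
# A 100 Hz tone with two overtones; its true period is 80 samples.
block = (np.sin(2 * np.pi * 100 * t)
         + 0.5 * np.sin(2 * np.pi * 200 * t)
         + 0.25 * np.sin(2 * np.pi * 300 * t))
period = estimate_period(block, sample_rate)

print(period)
print(cycles_per_beat(48.0, 120.0))   # 24.0, a whole number of cycles per beat
```

On a clean harmonic tone the autocorrelation peak falls at the true period. On dense material the largest peak can land on a multiple or fraction of it, which is the failure mode described above and the reason for constraining the search around a user-specified center.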

This approach is much more limited in scope than phase vocoder processing, but it can be made much less processor intensive, which suits real-time applications.

High-end commercial audio processing packages combine the two techniques, using wavelet techniques to separate the signal into sinusoid and transient waveforms, applying the phase vocoder to the sinusoids, and processing transients in the time domain, producing the highest quality time stretching.

Pitch scaling

These techniques can also be used to scale the pitch of an audio sample while holding speed or duration constant.

Note that the technique can be called pitch scaling or pitch shifting, depending on perspective. Under one definition of musical pitch, pitch and frequency are related logarithmically; as the musical pitch is shifted linearly (shifting every note up the scale by a perfect fifth, for instance), the frequencies of the signal are actually being scaled, because of the logarithmic relationship between the notes we hear and the actual frequencies of those notes. A frequency shift, which is performed by amplitude modulation (typically single-sideband), adds the same offset to every frequency; it does not preserve the ratios of the harmonic frequencies that determine the sound's timbre, and is not a "musical" transformation. Similarly, a literal pitch scaling, in which the musical pitch is scaled (a higher note would be shifted by a greater interval than a lower note), is highly unusual and not musical. However, "pitch" can also be used to refer to frequency, and the other two transformations are not commonly used, so either term usually refers to musical pitch shifting.
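
The difference between scaling and shifting frequencies can be checked numerically. The sketch below assumes NumPy and 12-tone equal temperament, in which a perfect fifth is seven semitones; the function name and the 110 Hz example note are illustrative.

```python
import numpy as np

def semitones_to_ratio(semitones):
    """Frequency ratio for a musical interval in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)

harmonics = np.array([110.0, 220.0, 330.0])   # a fundamental and two overtones
ratio = semitones_to_ratio(7)                 # perfect fifth, about 1.498

# Musical pitch shift: every frequency is multiplied by the same ratio.
scaled = harmonics * ratio
# Frequency shift: every frequency is moved by the same number of hertz.
shifted = harmonics + (scaled[0] - harmonics[0])

print(scaled / scaled[0])     # harmonic ratios 1 : 2 : 3 are preserved
print(shifted / shifted[0])   # ratios are no longer whole numbers
```

Both results move the fundamental to the same place, but only the scaled version keeps the overtones at integer multiples of it, which is why it still sounds like the same instrument playing a higher note.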

Time domain processing works much better here, as smearing is less noticeable, but scaling vocal samples distorts the formants into a sort of Alvin and the Chipmunks-like effect, which may be desirable or undesirable. To preserve the formants and character of a voice, you can analyze the signal with a channel vocoder or LPC vocoder plus any of several pitch detection algorithms and then resynthesize it at a different fundamental frequency.

External links

  • http://www.panix.com/~jens/pvoc-dolson.par - A good description of the phase vocoder
  • http://www.ee.columbia.edu/~dpwe/papers/LaroD99-pvoc.pdf - Advanced techniques using the phase vocoder method
  • http://www.ircam.fr/equipes/analyse-synthese/roebel/paper/dafx2003.pdf - Phase vocoder with transient processing
  • PSOLA Synthesis, SOLAFS Synthesis - Two specific methods of time domain TDHS or SOLA processing.
  • Audio Engineering Society
  • Original E2 article (http://everything2.com/index.pl?node_id=1074923)
  • http://www.dspdimension.com/html/timepitch.html
  • http://www.bdti.com/faq/dsp_faq.htm - comp.dsp FAQ

Last updated: 05-26-2005 08:26:59
The contents of this article are licensed from Wikipedia.org under the GNU Free Documentation License.