Publications of the Ptolemy Group

December 10, 1997

The opening chapter of this thesis provides a review of background material related to audio signal modeling as well as an overview of current trends. Basis expansions and their shortcomings are discussed; these shortcomings motivate the use of overcomplete expansions, which can achieve improved compaction. Methods based on overcompleteness, e.g., best bases, adaptive wavelet packets, oversampled filter banks, and generalized time-frequency decompositions, have been receiving increased attention in the literature.

The first signal representation discussed in detail in this thesis is the sinusoidal model, which has proven useful for speech coding and music analysis-synthesis. The model is developed as a parametric extension of the short-time Fourier transform (STFT); parametrization of the STFT in terms of sinusoidal partials leads to improved compaction for evolving signals and enables a wide range of meaningful modifications. Analysis methods for the sinusoidal model are explored, and time-domain and frequency-domain synthesis techniques are considered.
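To make "parametrization in terms of sinusoidal partials" concrete, the following is a minimal time-domain synthesis sketch, assuming per-frame amplitude and frequency tracks are already available (the function name `synthesize_partials` and its arguments are hypothetical). It linearly interpolates amplitude and frequency between frame centers and accumulates phase by summing instantaneous frequency. This is a simplification: McAulay-Quatieri-style synthesis instead uses cubic phase interpolation so that measured phases are also matched at frame boundaries.

```python
import numpy as np

def synthesize_partials(amps, freqs, hop, fs, phase0=None):
    """Sum of sinusoidal partials driven by frame-rate parameters.

    amps, freqs: arrays of shape (num_frames, num_partials); freqs in Hz.
    Frame i is assumed to be centered at sample i * hop.
    """
    amps = np.asarray(amps, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    num_frames, num_partials = amps.shape
    frame_pos = np.arange(num_frames) * hop           # frame-center sample indices
    t = np.arange((num_frames - 1) * hop + 1)         # output sample indices
    if phase0 is None:
        phase0 = np.zeros(num_partials)
    out = np.zeros(len(t))
    for k in range(num_partials):
        # Linear interpolation of amplitude and frequency between frame centers
        a = np.interp(t, frame_pos, amps[:, k])
        w = 2 * np.pi * np.interp(t, frame_pos, freqs[:, k]) / fs  # rad/sample
        # Phase is the running sum of instantaneous frequency, starting at phase0
        phase = phase0[k] + np.concatenate(([0.0], np.cumsum(w[:-1])))
        out += a * np.cos(phase)
    return out
```

For example, a single partial at a quarter of the sampling rate with unit amplitude produces the sequence cos(πn/2).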

In its standard form, the sinusoidal model has difficulty representing nonstationary signals; for instance, it introduces a pre-echo artifact in the reconstruction of signal onsets. Such difficulties can be overcome by applying the sinusoidal model in a multiresolution framework. Two multiresolution approaches, based respectively on filter banks and on adaptive time segmentation, are presented. A dynamic program for deriving pseudo-optimal signal-adaptive segmentations is discussed; it is shown to substantially mitigate pre-echo distortion.
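A dynamic program for signal-adaptive segmentation can be sketched generically as a shortest-path search over segment boundaries. The sketch below assumes that segment lengths are drawn from a fixed set and that a per-segment cost is supplied by the caller; the cost function (for example, the modeling error of the sinusoidal model on that segment) is a hypothetical placeholder, not the cost used in the thesis.

```python
def best_segmentation(n, lengths, cost):
    """Minimum-cost tiling of n samples by segments whose lengths are in `lengths`.

    cost(start, length) -> float is a caller-supplied (hypothetical) segment cost.
    Returns (total_cost, [(start, length), ...]).
    """
    INF = float("inf")
    best = [INF] * (n + 1)   # best[e]: minimum cost of segmenting samples [0, e)
    back = [None] * (n + 1)  # back[e]: length of the last segment ending at e
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in lengths:
            start = end - length
            if start >= 0 and best[start] < INF:
                c = best[start] + cost(start, length)
                if c < best[end]:
                    best[end] = c
                    back[end] = length
    if best[n] == INF:
        raise ValueError("n cannot be tiled by the allowed segment lengths")
    # Trace back from the end of the signal to recover the segments
    segments, end = [], n
    while end > 0:
        length = back[end]
        segments.append((end - length, length))
        end -= length
    return best[n], segments[::-1]
```

As a toy usage example, a cost that penalizes any segment straddling an onset at sample 8 leads the program to place a boundary exactly at the onset, which is the mechanism by which adaptive segmentation limits pre-echo.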

In parametric methods such as the sinusoidal model, perfect reconstruction is generally not achieved in the analysis-synthesis process; there is a nonzero difference between the original and the inexact reconstruction. For high-quality synthesis, it is important to model this residual and incorporate it in the signal reconstruction to account for salient features such as breath noise in a flute sound. A method for parameterizing the sinusoidal model residual based on a perceptually motivated filter bank is considered; analysis and synthesis techniques for this residual model are given.
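A band-energy residual model can be illustrated with a single-frame sketch, assuming the residual is characterized only by its energy in a set of frequency bands and resynthesized as random-phase noise with matching band energies. The band edges here are hypothetical stand-ins for a perceptually motivated (e.g., roughly Bark- or ERB-spaced) filter bank, and an FFT is used in place of an explicit filter bank; the thesis's actual filter-bank design is not reproduced.

```python
import numpy as np

def residual_band_energies(x, x_hat, band_edges_hz, fs):
    """Energy of the residual r = x - x_hat in each band [lo, hi) (one frame)."""
    r = np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)
    power = np.abs(np.fft.rfft(r)) ** 2
    freqs = np.fft.rfftfreq(len(r), d=1.0 / fs)
    return np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:])])

def synthesize_residual(energies, band_edges_hz, n, fs, rng):
    """Random-phase noise whose per-band spectral energy matches `energies`."""
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mag = np.zeros(len(freqs))
    for e, lo, hi in zip(energies, band_edges_hz[:-1], band_edges_hz[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        if mask.any():
            # Spread the band energy evenly over the bins in the band
            mag[mask] = np.sqrt(e / mask.sum())
    phases = rng.uniform(0.0, 2 * np.pi, len(freqs))
    return np.fft.irfft(mag * np.exp(1j * phases), n)
```

Only a handful of band energies per frame need to be stored, which is what makes this kind of residual parametrization compact; the fine structure of the residual is discarded and replaced by noise.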

For pseudo-periodic signals, compaction can be achieved by incorporating the pitch in the signal model. It is shown that both the sinusoidal model and the wavelet transform can be improved by pitch-synchronous operation when the original signal is pseudo-periodic. Furthermore, approaches for representing dynamic signals having both periodic and aperiodic regions are discussed.
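One simple way to see why incorporating the pitch aids compaction is to stack consecutive pitch periods as rows of a matrix: for a pseudo-periodic signal the rows change slowly, so period-to-period differences carry little energy. This sketch assumes a known, constant, integer period (the helper names are hypothetical); practical pitch-synchronous methods must also handle fractional and time-varying periods, which it ignores.

```python
import numpy as np

def pitch_synchronous_frames(x, period):
    """Stack consecutive pitch periods as rows; a trailing partial period is dropped."""
    x = np.asarray(x, dtype=float)
    n_periods = len(x) // period
    return x[: n_periods * period].reshape(n_periods, period)

def period_to_period_change_energy(frames):
    """Total energy of the differences between successive pitch periods."""
    return float(np.sum(np.diff(frames, axis=0) ** 2))
```

For an exactly periodic input the change energy is zero, while the signal energy is not; a transform applied across periods (rather than within them) can exploit this redundancy.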

Both the sinusoidal model and the various pitch-synchronous methods can be interpreted as signal-adaptive expansions whose components are time-frequency atoms constructed according to parameters extracted from the signal by an analysis process. An alternative approach to deriving a compact parametric atomic decomposition is to choose the atoms in a signal-adaptive fashion from an overcomplete dictionary of parametric time-frequency atoms. Such overcomplete expansions can be arrived at using the matching pursuit algorithm. Typically, the time-frequency dictionaries used in matching pursuit consist of Gabor atoms based on a symmetric prototype window. Such symmetric atoms, however, are not well-suited for representing transient behavior, so alternative dictionaries are considered, namely dictionaries of damped sinusoids as well as dictionaries of general asymmetric atoms constructed using underlying causal and anticausal damped sinusoids. It is shown that the matching pursuit computation for either type of atom can be carried out with low-cost recursive filter banks.
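The recursive-filter idea can be illustrated for causal damped sinusoids. For atoms g_n[m] = a^(m-n) cos(ω(m-n)), m ≥ n, the inner products with a real signal r at every start time n are the real parts of c[n] = r[n] + a e^(-jω) c[n+1], a single backward one-pole recursion, so each (a, ω) pair costs O(N) per pursuit iteration instead of O(N²) for direct correlation. The sketch below is a minimal greedy matching pursuit built on that recursion; the function names, the fixed atom phase of zero, and the truncation of atoms at the signal end are assumptions of this illustration, and asymmetric (causal plus anticausal) atoms are not covered.

```python
import numpy as np

def damped_atom(N, start, a, omega):
    """Unit-norm causal damped cosine a**k * cos(omega*k), k = m - start, truncated at N."""
    g = np.zeros(N)
    k = np.arange(N - start)
    g[start:] = a ** k * np.cos(omega * k)
    return g / np.linalg.norm(g)

def recursive_correlations(r, a, omega):
    """Unnormalized <r, a**k cos(omega*k)> for every start time n.

    Backward recursion c[n] = r[n] + a*exp(-j*omega)*c[n+1]; for real r the
    real part of c[n] is the inner product with the damped cosine starting at n.
    """
    z = a * np.exp(-1j * omega)
    c = np.zeros(len(r), dtype=complex)
    acc = 0j
    for n in range(len(r) - 1, -1, -1):
        acc = r[n] + z * acc
        c[n] = acc
    return c.real

def matching_pursuit_damped(x, params, n_iter):
    """Greedy matching pursuit over causal damped cosines at all start times.

    params: list of (a, omega) pairs. Returns ([(a, omega, start, coef), ...], residual).
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Norm of the truncated atom starting at n is sqrt(sum_{k < N-n} t[k]**2)
    norms = {}
    for a, omega in params:
        k = np.arange(N)
        energy = np.cumsum((a ** k * np.cos(omega * k)) ** 2)
        norms[(a, omega)] = np.sqrt(energy[::-1])
    r = x.copy()
    atoms = []
    for _ in range(n_iter):
        best = None
        for a, omega in params:
            ip = recursive_correlations(r, a, omega) / norms[(a, omega)]
            n = int(np.argmax(np.abs(ip)))
            if best is None or abs(ip[n]) > abs(best[3]):
                best = (a, omega, n, float(ip[n]))
        a, omega, n, coef = best
        # Remove the selected atom's projection from the residual
        r = r - coef * damped_atom(N, n, a, omega)
        atoms.append(best)
    return atoms, r
```

For instance, a signal equal to three times one dictionary atom is recovered exactly in a single iteration, leaving a residual that is zero up to rounding.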

In the closing chapter, the key points of the thesis are summarized. The conclusion also discusses extensions to audio coding and provides suggestions for further work related to overcomplete representations.

PhD thesis, University of California, Berkeley, 1997.

Available as UCB/ERL M97/91.

`http://ptolemy.eecs.berkeley.edu/papers/97/michaelgThesis/`

The dissertation (273 pages)

Send comments to Michael Goodwin at michaelg@eecs.berkeley.edu.