A short introduction to speech and music coding

Although the emergence of optical fibers has made bandwidth in wired communications inexpensive, there is a growing need for bandwidth conservation and enhanced privacy in wireless cellular and satellite communications. At the same time, there is a trend toward integrating voice-related applications on desktop and portable personal computers. Most of these applications require the speech signal to be in digital format so that it can be processed, stored, or transmitted under software control.

Speech coding, or speech compression, is the field concerned with representing speech efficiently for transmission or storage. The objective in speech coding is to represent speech with a minimum number of bits while maintaining its perceptual quality.

Generally, the term medium rate is used for coding in the range of 8-16 kb/s, low rate for systems working below 8 kb/s and down to 2.4 kb/s, and very low rate for coders operating below 2.4 kb/s.
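The rate categories above can be written down as a small lookup. The following Python sketch is only an illustration: the function name classify_rate is hypothetical, the handling of the boundary values (2.4 and 8 kb/s) follows one reading of the wording above, and the label for rates above 16 kb/s is an assumption, since the text does not name that range.

```python
def classify_rate(kbps):
    """Map a bit rate in kb/s to the rate category used in the text.

    Boundaries follow one reading of the text: "below 2.4" is very low,
    "below 8 and down to 2.4" is low, and 8-16 (inclusive) is medium.
    """
    if kbps <= 0:
        raise ValueError("bit rate must be positive")
    if kbps < 2.4:
        return "very low rate"
    if kbps < 8:
        return "low rate"
    if kbps <= 16:
        return "medium rate"
    # Assumption: the text gives no name for rates above 16 kb/s.
    return "above medium rate"


for rate in (1.2, 2.4, 4.8, 8, 16, 32):
    print(rate, "kb/s ->", classify_rate(rate))
```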

Since the 1980s, research in the compression domain has focused on coders with medium and very low rates. Such rates are achieved using an analysis-synthesis process. In the analysis stage, speech is represented by a compact set of parameters, which are encoded efficiently. In the synthesis stage, these parameters are decoded and used in conjunction with a reconstruction mechanism to form speech.
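To make the analysis-synthesis idea concrete, here is a minimal Python sketch that assumes linear prediction as the parametric model (the text above does not prescribe a particular model). The analysis stage estimates a few predictor coefficients for one frame with the autocorrelation method and the Levinson-Durbin recursion; the synthesis stage drives the corresponding all-pole filter with an excitation signal. The function names, the 160-sample frame, the order of 4, and the synthetic test signal are all illustrative assumptions.

```python
import math
import random


def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n - k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the predictor coefficients.

    Returns a list a where a[k] weights the sample at lag k + 1, i.e.
    x_hat[n] = a[0] * x[n-1] + a[1] * x[n-2] + ...
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)               # remaining prediction error energy
    return a[1:]


def analyze(frame, order):
    """Analysis stage: predictor coefficients plus prediction residual."""
    a = levinson_durbin(autocorrelation(frame, order), order)
    residual = []
    for n in range(len(frame)):
        pred = sum(a[k] * frame[n - 1 - k]
                   for k in range(order) if n - 1 - k >= 0)
        residual.append(frame[n] - pred)
    return a, residual


def synthesize(a, excitation):
    """Synthesis stage: all-pole filter driven by the excitation."""
    out = []
    for n in range(len(excitation)):
        pred = sum(a[k] * out[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        out.append(excitation[n] + pred)
    return out


# Synthetic stand-in for one speech frame (assumption: 160 samples).
rng = random.Random(0)
frame = [math.sin(2 * math.pi * 0.05 * n)
         + 0.5 * math.sin(2 * math.pi * 0.12 * n)
         + 0.05 * rng.uniform(-1, 1)
         for n in range(160)]

coeffs, residual = analyze(frame, order=4)
rebuilt = synthesize(coeffs, residual)

signal_energy = sum(v * v for v in frame)
residual_energy = sum(v * v for v in residual)
max_error = max(abs(u - v) for u, v in zip(frame, rebuilt))
print("coefficients:", coeffs)
print("residual/signal energy:", residual_energy / signal_energy)
print("max reconstruction error:", max_error)
```

In this sketch the unquantized residual is passed straight to the synthesis filter, so the frame is recovered up to rounding error. A practical low-rate coder would instead quantize the coefficients and replace the residual with a much more compact description of the excitation, which is where the bit savings come from.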

Non-speech-specific coders, or waveform coders, are concerned with faithful reconstruction of the time-domain waveform and generally operate at medium rates or above (e.g. the ADPCM coder works at 32 kb/s). Speech-specific coders, or voice coders (vocoders), rely on speech models and focus on producing perceptually intelligible speech without necessarily matching the waveform. Vocoders are capable of operating at very low rates but also tend to produce speech of synthetic quality. There are also coders that combine features from both categories: hybrid coders, for example, rely on analysis-by-synthesis linear prediction. They combine the coding efficiency of vocoders with the high-quality potential of waveform coders by modeling the spectral properties of speech and exploiting the perceptual properties of the ear, while at the same time providing for waveform matching.
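The difference in bit rate between the two families follows from simple arithmetic: a waveform coder spends bits on every sample, whereas a vocoder spends bits on one set of model parameters per frame. The short Python sketch below illustrates this. The function names are hypothetical, and the figures are assumptions not taken from the text above: 8 kHz sampling with 4 bits per sample for 32 kb/s ADPCM (as in ITU-T G.726 at that rate), 8 bits per sample for 64 kb/s PCM, and 54 bits per 22.5 ms frame for a 2.4 kb/s LPC-10 style vocoder.

```python
def waveform_rate_kbps(sample_rate_hz, bits_per_sample):
    """Bit rate of a coder that transmits a code word for every sample."""
    return sample_rate_hz * bits_per_sample / 1000


def parametric_rate_kbps(bits_per_frame, frame_ms):
    """Bit rate of a coder that transmits one parameter set per frame.

    Bits per millisecond equals kilobits per second.
    """
    return bits_per_frame / frame_ms


pcm = waveform_rate_kbps(8000, 8)        # assumed: 8 kHz, 8 bits per sample
adpcm = waveform_rate_kbps(8000, 4)      # assumed: 8 kHz, 4 bits per sample
vocoder = parametric_rate_kbps(54, 22.5) # assumed: 54 bits per 22.5 ms frame

print("PCM:    ", pcm, "kb/s")
print("ADPCM:  ", adpcm, "kb/s")
print("Vocoder:", vocoder, "kb/s")
```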

Last updated December 17, 1999. Send comments to dutoit@tcts.fpms.ac.be