# Rockbox mail archive

Subject: RE: Recording speeds...

# RE: Recording speeds...

From: Nielsen Linus (ext)
Date: Thu, 4 Jul 2002 14:03:13 +0200

> So a sampling rate of 16kHz means it does 16000 (or 16384?)
> samples per second?

Yes.

> And the bitrate is the number of bits each sample uses?

No. The number of bits is always 16 (I think you can have 8 too, but I'm not
sure).

Maybe I should explain some theory. Please accept that I'm not an expert on
this. The description below is probably full of errors, but the general idea
is pretty correct.

The sampling process produces a large number of 16-bit samples, 16000/sec in
your example. The sampling rate must be at least twice as high as the highest
frequency you want to record. This is called the Nyquist Theorem.
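To make the Nyquist limit concrete, here is a minimal Python sketch. The
function names are made up for illustration, and the folding formula is the
textbook model of aliasing for a pure tone, not anything specific to the
recorder hardware.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency that can be captured at this sampling rate."""
    return sample_rate_hz / 2.0


def aliased_frequency(tone_hz, sample_rate_hz):
    """Frequency a pure tone appears at after sampling.

    Tones above the Nyquist limit fold back into the range below it.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)
```

At 16kHz the limit is 8kHz: a 3kHz tone is recorded as 3kHz, while a 10kHz
tone would show up as a false 6kHz tone unless it is filtered out before
sampling.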

According to Fourier, you can generate any sound you want by adding sine
waves of different frequencies. This process can also be reversed, so you
can find out which frequencies a particular waveform is composed of. This is
called a Fourier transform.
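As a rough illustration of the reverse direction, the sketch below builds a
waveform from two sine waves and then recovers their frequencies with a naive
discrete Fourier transform. It is written for readability, not speed (a real
encoder would use a fast transform), and the helper names are hypothetical.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Amplitude of each frequency bin from 0 up to half the sample count."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n)
        )
        magnitudes.append(abs(acc) / n)
    return magnitudes


def strongest_bins(samples, count):
    """Indices of the `count` bins with the largest amplitude, ascending."""
    mags = dft_magnitudes(samples)
    ranked = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)
    return sorted(ranked[:count])


# Two sine waves: 5 and 12 cycles over a 64-sample window.
N = 64
signal = [
    math.sin(2 * math.pi * 5 * t / N) + 0.5 * math.sin(2 * math.pi * 12 * t / N)
    for t in range(N)
]
```

Calling `strongest_bins(signal, 2)` gives `[5, 12]`, the two frequencies the
waveform was built from.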

The MP3 encoding performs an analysis of the frequency components of the
sound to be compressed. As the sound contains different frequencies at
different parts of the waveform, it is split into small chunks of 20-30
milliseconds each. Each chunk holds a fixed number of samples, so the chunk
length depends on the sampling rate: higher sampling rates give shorter
chunks.
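The relation between sampling rate and chunk length can be sketched as
follows. The figure of 1152 samples per frame is the MPEG-1 Layer III value
and is an addition to the mail above; the lower sampling rates defined by
MPEG-2 use 576 samples per frame instead.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III (assumption, not from the mail)


def frame_duration_ms(sample_rate_hz, samples_per_frame=SAMPLES_PER_FRAME):
    """Playing time of one frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz
```

This gives 36ms at 32kHz, about 26ms at 44.1kHz and 24ms at 48kHz.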

Those chunks are then analyzed to find which frequencies they are built of.
Then comes the magic. We can take advantage of the fact that the ear and the
brain are not perfect. In human perception, a loud frequency tends to mask
quieter frequencies that are close to it, and some frequencies with a low
amplitude are disregarded altogether. We can simply remove those components
from our analysis result. This is called psychoacoustic compression.

The result of a Fourier transform is a table of amplitudes (and phase?) for
a given set of frequencies. If we can remove some of these components and
set them to zero, the table becomes easy to compress with a common data
compression algorithm. Further, we can quantize those components that a
human can still hear, but only faintly. That is, as an example, rounding all
odd numbers to the next even number. Then we will have very few odd numbers
in our table, which makes the data even easier to compress. This is very
similar to JPEG encoding.
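The effect of zeroing and quantizing on compressibility can be tried out with
a small sketch. Everything here is a stand-in: the table is random numbers
rather than real frequency data, the threshold and step are arbitrary, and
zlib plays the role of the "common data compression algorithm" (MP3 itself
does not use zlib).

```python
import random
import zlib


def quantize(coeffs, threshold, step):
    """Zero out small coefficients and snap the rest to multiples of `step`.

    Values are rounded towards zero so they stay within their original range.
    """
    result = []
    for c in coeffs:
        if abs(c) < threshold:
            result.append(0)
        else:
            result.append(int(c / step) * step)
    return result


def compressed_size(coeffs):
    """Bytes needed after packing each value into one byte and deflating."""
    raw = bytes(c & 0xFF for c in coeffs)
    return len(zlib.compress(raw, 9))


# Hypothetical coefficient table: 1024 values between -127 and 127.
rng = random.Random(0)
table = [rng.randint(-127, 127) for _ in range(1024)]
coarse = quantize(table, threshold=32, step=16)
```

Comparing `compressed_size(table)` with `compressed_size(coarse)` shows the
quantized table compressing to a noticeably smaller size, and raising
`threshold` or `step` shrinks it further at the cost of accuracy.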

Now, we can control how much data to remove by adjusting the quantization
factors, and by removing narrower or wider frequency ranges.

Ok, so now we have a waveform, split into chunks, each one Fourier
transformed, quantized and (Huffman?) compressed. Now we have an interesting
problem:

The MP3 data must be able to stream, for example as the sound track of an
MPEG movie. The data channel on which the sound is transferred has a limited
bandwidth, let's say 128kbit/s. We need to compress the waveform so that it
can be transferred at 128kbit/s and still play with a minimum latency and
very small buffers in the receiver.

So we divide the stream into frames that, in a perfect world, would hold
exactly one of the chunks described earlier. Each frame corresponds to the
chunk time (20-30ms); at 128kbit/s and a 44.1kHz sampling rate this is about
418 bytes. However, since each chunk may have different frequency contents,
some may be hard to compress and that may result in chunks that are bigger
than 418 bytes.

We can approach this problem in many ways. One is to make sure that no
chunk gets bigger than 418 bytes. This may lead to bad audio quality on some
frames, as we need to quantize the frequencies more heavily to reach a higher
compression ratio. We can even this effect out a little by taking advantage
of the wasted space in those frames that are smaller than 418 bytes. If a
few frames are smaller than 418 bytes, a succeeding frame could make use of
the space in the preceding frames to allow it to be larger than 418 bytes.
This is called the Bit Reservoir.
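The frame budget and the bit reservoir can be sketched as below. This is a
simplified model, not how an actual encoder is written: it ignores the frame
header and side information, assumes 1152 samples per frame, and caps the
reservoir at 511 bytes, which I believe matches the back-pointer in MPEG-1
Layer III but should be treated as an assumption. All names are hypothetical.

```python
def frame_bytes(bitrate_bps, sample_rate_hz, samples_per_frame=1152):
    """Bytes available per frame at a constant bitrate, rounded down."""
    return samples_per_frame * bitrate_bps // (8 * sample_rate_hz)


def allocate_with_reservoir(demands, budget, reservoir_cap=511):
    """Grant each frame what it needs, borrowing space earlier frames left.

    `demands` lists the bytes each chunk would like; the result lists the
    bytes each chunk actually gets.
    """
    reservoir = 0
    granted = []
    for need in demands:
        available = budget + reservoir
        used = min(need, available)
        granted.append(used)
        # Unused space carries over, up to the reservoir limit.
        reservoir = min(available - used, reservoir_cap)
    return granted


budget = frame_bytes(128000, 44100)
result = allocate_with_reservoir([300, 300, 600, 600], budget)
```

Here `budget` comes out as 417 (the exact value is 417.96, rounded down), and
`result` is `[300, 300, 600, 468]`: the third chunk gets its full 600 bytes
thanks to the two small frames before it, while the fourth finds the
reservoir nearly empty and has to be quantized harder.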

However, the bit reservoir can only allow for small variations in frame
size. A more flexible way of solving this is of course to allow frames of
different sizes. Enter Variable Bit Rate encoding. This encoding allows
each frame to be of a different size. Each size corresponds to a certain bit
rate. The bit reservoir is practically unused, since each frame can have its
own size.
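One way to picture this is a sketch that picks, for each frame, the lowest
bit rate whose frame size is big enough. The bit rate table is the standard
MPEG-1 Layer III set; the selection rule is a deliberate simplification (a
real encoder decides based on quality targets), padding is ignored, and the
function names are made up.

```python
# Bit rates allowed for MPEG-1 Layer III, in kbit/s.
BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]


def frame_bytes(bitrate_kbps, sample_rate_hz=44100):
    """Frame size in bytes for a given bit rate, without padding."""
    return 144 * bitrate_kbps * 1000 // sample_rate_hz


def pick_bitrate(needed_bytes, sample_rate_hz=44100):
    """Lowest bit rate whose frame can hold `needed_bytes`.

    Falls back to the highest bit rate if nothing is big enough.
    """
    for rate in BITRATES_KBPS:
        if frame_bytes(rate, sample_rate_hz) >= needed_bytes:
            return rate
    return BITRATES_KBPS[-1]
```

With these assumptions a chunk needing 300 bytes goes into a 96kbit/s frame
and one needing 450 bytes into a 160kbit/s frame, so easy passages cost less
space and hard passages get more.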

Oooh. This mail became longer than I thought. I hope you understood at least
the general concept.

What was your question again? :-)

Linus