
Rockbox mail archive
Subject: RE: Recording speeds...
From: Nielsen Linus (ext) <"Nielsen>
Date: Thu, 4 Jul 2002 14:03:13 +0200

> So a sampling rate of 16kHz means it does 16000 (or 16384?)
> samples per second?

Yes.

> And the bitrate is the number of bits each sample uses?

No. The number of bits is always 16 (I think you can have 8 too, but I'm not sure).

Maybe I should explain some theory. Please accept that I'm not an expert on this. The description below is probably full of errors, but the general idea is pretty correct.

The sampling process produces a large number of 16-bit samples, 16000/sec in your example. The sampling rate must be at least twice as high as the highest frequency you want to record. This is called the Nyquist Theorem.

According to Fourier, you can generate any sound you want by adding sine waves of different frequencies. This process can also be reversed, so you can find out which frequencies a particular waveform is composed of. This is called a Fourier transform.

The MP3 encoding performs an analysis of the frequency components of the sound to be compressed. As the sound contains different frequencies at different parts of the waveform, it is split into small chunks of 20-30 milliseconds each. The chunk length depends on the sampling rate; higher sampling rates need shorter frames. Those chunks are then analyzed to find which frequencies they are built of.

Then comes the magic. We can take advantage of the fact that the ear and the brain are not perfect. The human perception of sound tends to filter out frequencies that are close to each other, and to disregard some frequencies with a low amplitude. We can simply remove those components from our analysis result. This is called psychoacoustic compression.

The result of a Fourier transform is a table of amplitudes (and phase?) for a given set of frequencies. If we remove some of these components by setting them to zero, the table becomes easy to compress with a common data compression algorithm. Further, those components that a human can still hear, but only faintly, we can quantize.
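
As a rough illustration of the analysis step, here is a minimal Python sketch. It is not how a real MP3 encoder works (those use a filter bank, a modified transform and a proper psychoacoustic model). It just takes one chunk of samples, runs a plain discrete Fourier transform on it, and zeroes every component whose amplitude falls under a threshold. The names `dft` and `drop_faint`, the test tones and the threshold value are all made up for this example.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: one complex value per frequency bin."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def drop_faint(spectrum, threshold):
    """Set every component whose amplitude is below threshold to zero."""
    return [c if abs(c) >= threshold else 0 for c in spectrum]

# One 20 ms chunk at 16 kHz is 320 samples (illustrative numbers).
RATE = 16000
N = 320

# A loud 1000 Hz tone plus a very faint 3000 Hz tone.
chunk = [
    math.sin(2 * math.pi * 1000 * t / RATE)
    + 0.01 * math.sin(2 * math.pi * 3000 * t / RATE)
    for t in range(N)
]

spectrum = dft(chunk)
kept = drop_faint(spectrum, threshold=5.0)  # threshold is an arbitrary choice

# Bin k corresponds to k * RATE / N Hz, so 1000 Hz lands in bin 20.
# A real-valued input also produces a mirror image at bin N - 20.
nonzero_bins = [k for k, c in enumerate(kept) if c != 0]
print(nonzero_bins)  # [20, 300]
```

Only the loud tone survives. The faint 3000 Hz tone (bin 60) is gone, and the table is now almost entirely zeros, which is what makes it easy to compress.
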
Quantizing means, as an example, rounding all odd numbers to the next even number. Then we will have very few odd numbers in our table, which makes the data even easier to compress. This is very similar to JPEG encoding.

Now, we can control how much data to remove by adjusting the quantization factors, and by removing narrower or wider frequency ranges.

OK, so now we have a waveform, split into chunks, each one Fourier transformed, quantized and (Huffman?) compressed. Now we have an interesting problem: the MP3 data must be able to stream, for example as the sound track of an MPEG movie. The data channel on which the sound is transferred has a limited bandwidth, let's say 128 kbit/s. We need to compress the waveform so that it can be transferred at 128 kbit/s and still play with a minimum latency and very small buffers in the receiver.

So we divide the stream into frames that, in a perfect world, would hold exactly one of the chunks described earlier. Each frame corresponds to the chunk time (20-30 ms). At 128 kbit/s, which is 16000 bytes per second, a 26 ms frame is about 420 bytes, I think. However, since each chunk may have different frequency contents, some may be hard to compress, and that may result in chunks that are bigger than 420 bytes.

We can approach this problem in many ways. One way is to make sure that no chunk gets bigger than 420 bytes. This may lead to bad audio quality on some frames, as we need to quantize the frequencies more heavily to reach a higher compression ratio. We can even this effect out a little by taking advantage of the wasted space in those frames that are smaller than 420 bytes. If a few frames are smaller than 420 bytes, a succeeding frame could make use of the space in the preceding frames to allow it to be larger than 420 bytes. This is called the Bit Reservoir.

However, the bit reservoir can only allow for small variations in frame size. A more flexible way of solving this is of course to allow frames of different sizes. Enter Variable Bit Rate encoding.
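
To put rough numbers on the frame budget and the bit reservoir, here is a small Python sketch. It is a toy model under stated assumptions: the 26 ms frame time, the chunk sizes and the reservoir cap are made up for illustration, and the function names are hypothetical. A real encoder shrinks an oversized chunk by quantizing harder; this code only records how many bytes would not have fit.

```python
def frame_budget_bytes(bitrate_bps, frame_ms):
    """Bytes available for one frame at a constant bit rate."""
    # bits per second * milliseconds / 1000 gives bits, / 8 gives bytes.
    return bitrate_bps * frame_ms // 8000

def simulate_reservoir(chunk_sizes, budget, reservoir_cap):
    """Return, per chunk, how many bytes did not fit (0 means it fit).

    Bytes left unused by earlier frames are saved, up to reservoir_cap,
    and can be borrowed by a later chunk that is larger than the budget.
    """
    reservoir = 0
    shortfalls = []
    for size in chunk_sizes:
        available = budget + reservoir
        if size <= available:
            reservoir = min(available - size, reservoir_cap)
            shortfalls.append(0)
        else:
            # Too big even with the reservoir: a real encoder would
            # quantize harder here to shrink the chunk.
            shortfalls.append(size - available)
            reservoir = 0
    return shortfalls

budget = frame_budget_bytes(128_000, 26)  # assumed 26 ms frame
print(budget)  # 416

# Hypothetical compressed chunk sizes in bytes.
chunks = [300, 350, 500, 416, 700]
print(simulate_reservoir(chunks, budget, reservoir_cap=200))  # [0, 0, 0, 0, 186]
```

The 500-byte chunk fits only because the two small frames before it left space behind. The 700-byte chunk is too large even with the saved space, which is the limitation that motivates letting frames vary in size.
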
Variable Bit Rate encoding allows each frame to be of a different size. Each size corresponds to a certain bit rate. The bit reservoir is practically unused, since each frame can have its own size.

Oooh. This mail became longer than I thought. I hope you understood at least the general concept. What was your question again? :)

Linus

Received on 2002-07-04