February 2002
Bit Rates When Encoding MP3 Files
MP3 Bit Rates Explained for the Layman
by Darryl Sperber
Darryl wrote the following in response to a question on comp.os.os2.multimedia.
It's a great education in MP3 encoding and we think you'll find his article fascinating.
Background:
An MP3 file is a sound file (often taken from a WAV file or a track from an audio
CD) that is compressed to make it smaller.
Much of the music you download from Napster and similar sites is encoded into
MP3 files.
Welcome to the mystical topic of choosing a "bit rate" when creating an MP3 sound file!
When you make an MP3 file you have to make two choices - the type of bit rate to use
and the value of that bit rate (how many bits per second).
The bit rate used when encoding an MP3 file is one of two types, either fixed/constant
bit rate ("CBR") or variable bit rate ("VBR").
Fixed Bit Rate Encoding
In the case of a fixed bit rate, a person creates the MP3 file using an "encoder" program
where the desired constant bit rate is specified in advance.
It is usually selected as a command-line parameter value but can also be specified as
some GUI setting if that is the way the software was written.
Generally speaking, the higher the fixed MP3 bit rate selected at encoding time, the
closer the result will be to the original CD/WAV source, and the "better" the
resulting MP3 file will sound when played back (assuming you're using a high-quality
player program).
And naturally, the higher the fixed bit rate the larger the resulting output file.
With CBR, the same fixed/constant number of bits per second is written to the
encoded MP3 output no matter what the source file contains, so the output MP3 file size
can actually be predicted in advance from the playing time of the track being
encoded.
And the same number of bits per second is used regardless of the amount of
"information" in the track being encoded.
So a one minute WAV file which has a constant tone that is on for two seconds and
then off (and totally silent) for two seconds will generate an output file size which is a
constant, based approximately on the bit rate times the length of the track.
If the one minute file instead has that same constant tone for the entire one minute,
without the alternating silent periods, the output CBR MP3 file will again be exactly the
same size.
The size of these two CBR-encoded MP3 files will be identical.
And if instead you encoded a one minute musical WAV selection which had completely
normal musical variations during that one minute, it would again produce an output file
which is identical in size to the file produced as described above, from the
alternating/constant/silent tone one minute WAV file.
One minute of CBR MP3 is always the same size (depending on the bit rate selected),
regardless of what is in the input file.
In other words a "constant bit rate" encoding uses that many bits per second regardless
of what the input source is -- sound or silence, fixed or varying sound frequency,
whatever the volume and intensity.
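The arithmetic behind that predictability can be sketched in a few lines of Python. This is only a minimal illustration: the function name and the three input descriptions are hypothetical, and the result ignores the small amount of header and tag data a real file also carries, which is why the size is only approximate.

```python
def cbr_size_bytes(bit_rate_bps: int, duration_seconds: float) -> int:
    """Approximate size of the audio data in a CBR MP3 file, in bytes."""
    # bits per second * seconds = total bits; there are 8 bits in a byte.
    return round(bit_rate_bps * duration_seconds / 8)


# Three different one-minute inputs. For CBR only the duration matters,
# so the description of the content never enters the calculation.
one_minute_inputs = [
    "alternating tone and silence",
    "constant tone",
    "ordinary music",
]

for description in one_minute_inputs:
    size = cbr_size_bytes(128_000, 60)
    print(f"{description}: about {size:,} bytes")
```

Each of the three lines reports about 960,000 bytes, because the content of the input is not part of the formula at all.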
Variable Bit Rate Encoding
In contrast, "variable bit rate" (VBR) encoding uses a dynamic algorithm whose bit rate
at any point is based on "need" and varies as a function of the input sound source data
values.
With VBR encoding it takes very little to represent a section of a WAV sound that is
constant, because the sound doesn't change and the encoder has almost nothing new to
describe from one moment to the next.
All that's conceptually needed is to describe the particular sound or silence by its
frequency, volume, intensity, absence, etc., plus an indication of the time duration for
which that combination of values (or absence) is to be maintained, until the next change
in the sound.
Once the sound is "described", these encoded values will "apply" until a new value of
VBR data is encountered indicating that the sound state values should change.
So the more the input WAV source appears "quiet" or "constant" or "non-varying", the
greater is the "mathematical advantage" of the VBR encoding approach.
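During each of these periods of constant sound (or silence) very few bits are
required after the very first cycle.
(In an actual MP3 file the audio is stored as a series of short frames, and every frame
still carries a small minimum amount of data, so the cost never drops all the way to zero.)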
Only when the sound once again changes from its last state is new encoded data
required.
Clearly, then, VBR files are "unpredictable" in output size.
Their size is obviously a function of the input source WAV data, since all of the factors
(frequency, loudness, etc.) which make up that sound data will vary.
But VBR-encoded files are almost always smaller than CBR files encoded at the same
maximum bit rate, especially for slower, quieter, more non-varying input files.
There is some additional information required in the VBR output datastream (namely the
time duration for which each described set of sound values is to be maintained), but it
is only required at the beginning of that sound's time interval.
New information is only required when the sound state changes, with almost no additional
information required during the intervening constant state period.
When you do VBR encoding, you give the encoder program a "range" of bit rates (really
just a specification of the maximum bit rate you will allow it to go up to) which you will
permit it to vary across while performing the MP3 encoding.
The mathematical design of the process will determine how many bits per second are
required to adequately represent the sound being encoded at that moment, up to the
maximum you allow.
And of course the mathematical design of the VBR encoding algorithm has much to do
with the resulting audio quality when played back.
Intuitively, more elaborate and more sophisticated analysis algorithms will take
more CPU power and will be slower, while faster algorithms might make some
compromises in quality.
Depending on the source material, the encoded VBR result might be almost entirely at
the maximum bit rate allowed (if the "variability" of the source is extremely high) or
considerably less than that (if the source is much more "quiet" and "non-varying").
As expected, the resulting sound quality of the VBR output can be no better than what
the maximum bit rate permitted can deliver.
But if the input source data can be easily "represented" within that constraint you
probably won't notice the difference and the file size will be much smaller than its CBR
counterpart.
If the upper limit permitted in a VBR (or CBR, for that matter) encoding isn't high
enough, there's no question that playback sound quality will suffer.
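As a rough illustration of how a VBR file ends up smaller, here is a minimal Python sketch. It assumes details that are not part of the discussion above: that an MP3 stream is a series of short frames of 1,152 samples each (as in MPEG-1 Layer III), that each frame has its own bit rate, and that a frame occupies about 144 x bit rate / sample rate bytes. The per-frame bit rates are invented for illustration, not measured from a real encoder.

```python
from typing import List, Tuple

SAMPLE_RATE_HZ = 44_100      # CD-style sample rate (assumed)
SAMPLES_PER_FRAME = 1_152    # samples per MPEG-1 Layer III frame


def frame_bytes(bit_rate_bps: int) -> int:
    """Bytes in one frame at this bit rate (optional padding byte ignored)."""
    return 144 * bit_rate_bps // SAMPLE_RATE_HZ


def summarize(frame_bit_rates: List[int]) -> Tuple[int, float]:
    """Return (total bytes, average bit rate in bits per second)."""
    total_bytes = sum(frame_bytes(rate) for rate in frame_bit_rates)
    duration_seconds = len(frame_bit_rates) * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ
    return total_bytes, total_bytes * 8 / duration_seconds


# Hypothetical track: a quiet half encoded at 32,000 bps per frame and a
# busy half that needs the 192,000 bps maximum. These values are made up.
vbr_frames = [32_000] * 1_000 + [192_000] * 1_000
cbr_frames = [192_000] * 2_000   # CBR uses the same rate for every frame

vbr_bytes, vbr_average = summarize(vbr_frames)
cbr_bytes, cbr_average = summarize(cbr_frames)

print(f"VBR: {vbr_bytes:,} bytes, average {vbr_average:,.0f} bps")
print(f"CBR: {cbr_bytes:,} bytes, average {cbr_average:,.0f} bps")
```

With these made-up numbers the VBR total comes to 730,000 bytes against 1,252,000 bytes for CBR at the same maximum. The busy half costs exactly the same either way, so the whole saving comes from the quiet half.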
Bit Rate Displayed By MP3 Player
The bit rate that an MP3 player program displays when it reads CBR input files is
obviously the CBR value itself.
A rate of 128,000 bits per second is usually the minimum default value for "acceptable"
quality, although file size may also be on the mind of the person running the encoder
program, so the choice of this value is a tradeoff.
The VBR value of the encoded MP3 data clearly changes as each second goes by.
So what the MP3 player program displays is sort of up to the software author.
The player program could be designed to display a "VBR indicator" with an updated
value every second as the VBR bit rate changes.
It wouldn't make any CPU-cycle sense to try and "average" the variable bit rates
encountered during a time interval and then display that average, since you want to
listen to an MP3 file, not watch the VBR bit rate value change.
A more worthwhile use of CPU cycles in VBR decoder/players would be such things as
calculating an EQ (tone control) to improve sound quality still further or driving a
visualization plugin.
Also, more CPU cycles are spent in "overhead" for a VBR playback than a CBR
playback, because the player must read the bit rate of each upcoming section of MP3
input data before it can decode that section and convert it to analog sound output.
Clearly CBR playback provides many more extra CPU cycles for use by the EQ or
visualization plugin.
VBR thus needs a stronger machine to "sound good", because of the additional VBR
overhead.
Bit Rate versus Sound Quality
Note that there are definite sonic differences in an encoded MP3 sound file as the bit
rate upper limit is increased.
For example, 192,000 bits per second is the smallest bit rate at which my DOS Fraunhofer
encoder will produce "true stereo" L/R output.
Anything less than that provides some "compromise" in resulting stereo quality with
some "mixing" of L/R data.
Also, my Fraunhofer is a CBR encoder, not a VBR encoder.
So the 192,000 bit rate at which I do my MP3 encoding is definitely going to produce
larger MP3 files than if I used 128,000 or 160,000.
But my personal primary objective is "sonic quality", not smaller file sizes.
These MP3 files are for my own use and enjoyment, and 192,000 produces a noticeably
superior result when played back.
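To put rough numbers on that size tradeoff, the following Python sketch compares the three bit rates for a hypothetical four-minute track. The track length is invented, and the CD figure assumes standard CD audio of 44,100 samples per second, 16 bits per sample and two channels, which is not stated above. Treat it as a back-of-the-envelope estimate, not a measurement.

```python
# Uncompressed CD audio: sample rate * bits per sample * channels (assumed).
CD_BIT_RATE_BPS = 44_100 * 16 * 2


def size_megabytes(bit_rate_bps: int, duration_seconds: float) -> float:
    """Approximate file size in megabytes (1 MB = 1,000,000 bytes)."""
    return bit_rate_bps * duration_seconds / 8 / 1_000_000


def compression_ratio(mp3_bit_rate_bps: int) -> float:
    """How many times smaller the MP3 is than the uncompressed CD audio."""
    return CD_BIT_RATE_BPS / mp3_bit_rate_bps


TRACK_SECONDS = 4 * 60  # hypothetical four-minute track

for rate in (128_000, 160_000, 192_000):
    print(
        f"{rate:>7,} bps: {size_megabytes(rate, TRACK_SECONDS):.2f} MB, "
        f"about {compression_ratio(rate):.1f}:1 versus CD audio"
    )
```

Going from 128,000 to 192,000 makes the file half again as large, yet it is still several times smaller than the uncompressed original.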
I also run my Fraunhofer CBR encoder at "maximum quality", which makes the audio
analysis algorithms run at their slowest.
Thus my MP3 encoding time is longer than it might be but the audio quality result is
what I'm shooting for.
You only produce this encoded MP3 file once, but you listen to it forever.
So why not make it sound its best and not be concerned with a little extra time to do
that?
In general, the higher the CBR bit rate or VBR upper-limit bit rate, the better the
resulting audio quality.
This is common sense.
Happy encoding!
If you have comments or questions, you can reach the author by email.
The Southern California OS/2 User Group
P.O. Box 26904
Santa Ana, CA 92799-6904, USA
Copyright 2002 the Southern California OS/2 User Group. ALL RIGHTS
RESERVED.
SCOUG, Warp Expo West, and Warpfest are trademarks of the Southern California OS/2 User Group.
OS/2, Workplace Shell, and IBM are registered trademarks of International
Business Machines Corporation.
All other trademarks remain the property of their respective owners.