How AAC works
Flash Player 9 Update 3, code-named MovieStar, introduced quite a few interesting new features. One of them is support for HE-AAC v2, a state-of-the-art general-purpose audio coding standard. Let’s take a look at how it works and compare it to the king: the good old MP3.
The good old MP3
The MP3 audio encoding format has been at the cutting edge since its debut back in 1993. MP3 encodes audio signals using a perceptual model that mimics the human ear. We usually cannot hear every frequency in a sound, especially when other frequencies are much “louder”. By applying specific frequency-masking and time-masking rules, the encoder can discard information that a listener would not normally be able to hear.
The first and most important step is to transform the signal from the time domain to the frequency domain. MP3 uses a hybrid approach. First, the original signal is sent through a band-pass filter bank. Each filtered band is then transformed to the frequency domain with an MDCT (Modified Discrete Cosine Transform). Next, MP3’s perceptual model is applied to discard information that cannot be heard. Finally, redundancies in the bitstream are removed with entropy coding.
We know that at 128 kbit/s MP3 does a good job of reproducing a typical song. Below this bitrate, a progressive degradation in sound quality becomes easy to hear.
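To make the MDCT step more concrete, here is a minimal sketch in plain Python. It is not MP3's actual filter bank or window set: the sine window, the 2/N scaling and the helper names (`sine_window`, `mdct`, `imdct`, `analyze_synthesize`) are assumptions chosen for illustration. It only shows the key property of the transform: each block of 2N samples produces N coefficients, and overlapping the decoded blocks by half gives the original signal back.

```python
import math

def sine_window(two_n):
    # Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley condition).
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]

def mdct(block):
    # 2N time samples in, N frequency coefficients out.
    n_coef = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for n in range(2 * n_coef))
        for k in range(n_coef)
    ]

def imdct(coefs):
    # N coefficients in, 2N time-aliased samples out (scaled for a windowed MDCT).
    n_coef = len(coefs)
    return [
        2.0 / n_coef * sum(
            coefs[k] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for k in range(n_coef))
        for n in range(2 * n_coef)
    ]

def analyze_synthesize(signal, n_coef):
    # Run overlapping blocks through MDCT and IMDCT, then overlap-add the results.
    if len(signal) % n_coef != 0:
        raise ValueError("signal length must be a multiple of n_coef")
    window = sine_window(2 * n_coef)
    padded = [0.0] * n_coef + list(signal) + [0.0] * n_coef
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coef + 1, n_coef):
        block = [padded[start + i] * window[i] for i in range(2 * n_coef)]
        rebuilt = imdct(mdct(block))
        for i in range(2 * n_coef):
            out[start + i] += rebuilt[i] * window[i]
    return out[n_coef:-n_coef]

signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
rebuilt = analyze_synthesize(signal, 8)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

In a real encoder the coefficients would be quantized between `mdct` and `imdct`, guided by the perceptual model; that is where the data reduction (and the loss) happens.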
The AAC (Advanced Audio Coding)
AAC was first specified in MPEG-2 Part 7 in 1997 and updated in MPEG-4 Part 3 in 1999. AAC has a more straightforward design, dropping the hybrid approach in favor of a pure MDCT. To encode both stationary and transient signals well, the encoder can choose between one long block of 1024 samples for the MDCT (better frequency resolution for stationary signals) and a set of eight short blocks of 128 samples each (better time resolution for transient signals). This approach, together with other optimizations (better joint stereo, entropy coding, temporal noise shaping, etc.), gives AAC a higher perceptual quality than MP3 at the same bitrate, especially below 128 kbit/s.
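The choice between one long block and eight short ones can be sketched as follows. This is a toy detector, not what a real AAC encoder does (real encoders rely on their psychoacoustic model): the function names and the `attack_ratio` threshold are hypothetical. It simply looks for a sudden jump in energy between consecutive 128-sample slices of a 1024-sample frame.

```python
import math

LONG_BLOCK = 1024   # one long block per frame
SHORT_BLOCK = 128   # eight short blocks per frame

def sub_block_energies(frame):
    # Energy of each 128-sample slice of the frame.
    return [
        sum(s * s for s in frame[i:i + SHORT_BLOCK])
        for i in range(0, LONG_BLOCK, SHORT_BLOCK)
    ]

def choose_block_layout(frame, attack_ratio=10.0):
    # Hypothetical rule: a slice much louder than the previous one signals a transient.
    if len(frame) != LONG_BLOCK:
        raise ValueError("frame must contain exactly 1024 samples")
    energies = sub_block_energies(frame)
    floor = 1e-12  # keeps silent slices from triggering on rounding noise
    for previous, current in zip(energies, energies[1:]):
        if current > attack_ratio * (previous + floor):
            return [SHORT_BLOCK] * 8   # better time resolution
    return [LONG_BLOCK]                # better frequency resolution

# A steady 440 Hz tone sampled at 44.1 kHz versus silence followed by a sudden attack.
tone = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(LONG_BLOCK)]
attack = [0.0] * 512 + [1.0] * 512

tone_layout = choose_block_layout(tone)
attack_layout = choose_block_layout(attack)
```

With these inputs the steady tone keeps the single long block, while the frame with the attack is split into eight short ones.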
HE-AAC (High-Efficiency AAC), an enhanced version of AAC, introduces the SBR technique (Spectral Band Replication). With SBR, the frequency spectrum of the signal is split in half. The lower half (for a 44.1 kHz source, roughly the frequencies below 11 kHz) is encoded as usual, while the upper half (higher frequencies) is reconstructed by the decoder as a transformation of the low-frequency block, shaped to follow the spectral envelope of the original high band. A very small side bitstream gives the decoder the additional information it needs to reconstruct the final signal. With SBR, a signal at around 70-75 kbit/s is only slightly worse than a 128 kbit/s MP3.
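The idea behind SBR can be sketched on a list of spectral magnitudes. This is a deliberately simplified illustration, not the real algorithm (which works on filter-bank subbands over time and also transmits noise and tonality information). The function names and the number of envelope bands are assumptions made for the example.

```python
def sbr_encode(spectrum, n_env_bands=4):
    # Keep the low half as is; describe the high half only by a coarse envelope.
    half = len(spectrum) // 2
    if half % n_env_bands != 0:
        raise ValueError("half spectrum must divide evenly into envelope bands")
    low, high = spectrum[:half], spectrum[half:]
    width = half // n_env_bands
    envelope = [
        sum(abs(v) for v in high[i * width:(i + 1) * width]) / width
        for i in range(n_env_bands)
    ]
    return low, envelope

def sbr_decode(low, envelope):
    # Copy the low band upwards and rescale each band to match the envelope.
    width = len(low) // len(envelope)
    high = []
    for i, target in enumerate(envelope):
        patch = low[i * width:(i + 1) * width]
        current = sum(abs(v) for v in patch) / width
        gain = target / current if current > 0 else 0.0
        high.extend(v * gain for v in patch)
    return list(low) + high

spectrum = [8.0, 7.0, 6.0, 5.0, 4.0, 4.0, 3.0, 3.0,
            2.0, 2.0, 1.0, 1.0, 0.5, 0.5, 0.25, 0.25]
low, envelope = sbr_encode(spectrum)
rebuilt = sbr_decode(low, envelope)
```

Here the encoder sends 8 low-band values plus 4 envelope values instead of 16 values. The rebuilt high band is not identical to the original, but it has the same overall shape, which is what the ear mostly cares about at high frequencies.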
The latest update to the AAC standard, HE-AAC v2, introduces the Parametric Stereo technique (PS). With PS, only a mono signal is encoded and transmitted. Assuming a stereo input, the stereo image is analysed, and a set of parameters is extracted and transmitted to the decoder. These parameters describe mathematically the difference between the left and the right channel of the stereo sound. This technique allows a further level of compression. At 64 kbit/s, a sound encoded with HE-AAC and one encoded with HE-AAC v2 are almost identical, but at lower bitrates, such as 48 kbit/s, 32 kbit/s and 24 kbit/s, HE-AAC v2 produces a signal with a higher perceptual quality.
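A toy version of the Parametric Stereo idea looks like this. It is only a sketch under strong assumptions: it transmits a single pair of level parameters for the whole frame, while the real technique works per frequency band and also describes phase and correlation between the channels. The names `ps_encode` and `ps_decode` are hypothetical.

```python
import math

def ps_encode(left, right):
    # Downmix to mono and keep one level parameter per channel.
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    energy_mono = sum(m * m for m in mono)
    if energy_mono == 0:
        return mono, (0.0, 0.0)
    gain_left = math.sqrt(sum(l * l for l in left) / energy_mono)
    gain_right = math.sqrt(sum(r * r for r in right) / energy_mono)
    return mono, (gain_left, gain_right)

def ps_decode(mono, params):
    # Rebuild both channels by rescaling the mono signal.
    gain_left, gain_right = params
    left = [m * gain_left for m in mono]
    right = [m * gain_right for m in mono]
    return left, right

# A source panned towards the left: the right channel is a quieter copy.
left = [math.sin(0.2 * n) for n in range(64)]
right = [0.5 * sample for sample in left]

mono, params = ps_encode(left, right)
decoded_left, decoded_right = ps_decode(mono, params)
```

For this simple panned source the two numbers in `params` are enough to recover both channels. Real music is not that kind, which is why the actual parameter set is richer, but the bitrate saving comes from the same trick: one audio signal plus a small description of the stereo image.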
AAC support lets us encode our sounds at 64 kbit/s with the same quality as an MP3 encoded at 128 kbit/s. Furthermore, for uses that are more sensitive to bandwidth, such as Internet radio, HE-AAC v2 makes it possible to encode our sounds at 32 kbit/s or lower with surprisingly good results. In low-bitrate streaming scenarios this can make all the difference.
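To put those bitrates in perspective, here is a quick back-of-the-envelope calculation. The 4-minute song and the 10 Mbit/s uplink are example figures picked for illustration, and the helper names are hypothetical. Container and protocol overhead are ignored.

```python
def stream_megabytes(bitrate_kbps, seconds):
    # Payload size in megabytes (1 MB = 1,000,000 bytes) for a constant bitrate.
    bits = bitrate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000

def listeners_per_uplink(uplink_mbps, bitrate_kbps):
    # How many simultaneous streams fit in the uplink, ignoring overhead.
    return int(uplink_mbps * 1000 // bitrate_kbps)

song_seconds = 4 * 60
sizes = {kbps: stream_megabytes(kbps, song_seconds) for kbps in (128, 64, 32)}
listeners = {kbps: listeners_per_uplink(10, kbps) for kbps in (128, 64, 32)}
```

Halving the bitrate halves the size of the song and doubles the number of listeners the same connection can serve: 78 streams at 128 kbit/s become 156 at 64 kbit/s and 312 at 32 kbit/s.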