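Abstract: | Digital audio coding is now very popular and is widely used across many fields. Conventional digital audio decoder implementations fall broadly into three types: hardware, software, and hardware/software co-design. This dissertation proposes solutions for the key techniques of all three approaches, organized into four research topics. In the first topic, we propose a pure-hardware design for a low-power Advanced Audio Coding (AAC) decoder. Based on the characteristics of the decoding algorithm, the system is partitioned into four main computation modules; then, with low power and low complexity in mind, we propose optimizations at both the architectural and algorithmic levels for the individual modules and for the integrated system. To keep cost low, stereo audio is processed efficiently with a single set of hardware. In UMC 0.18 µm 1P6M technology, the AAC decoder needs only 3 MHz to meet the real-time playback requirement, and its power consumption is only 2.45 mW at a 44.1 kHz sampling frequency. The second topic addresses the design of a single decoder that supports multiple audio standards. The main motivation is that no unified audio standard can currently replace the others, so a single product must support multiple standards to meet consumer needs. We propose a Configurable Common Filterbank Processor (CCFP), a processing unit that can serve as an accelerator for a general-purpose processor to raise system performance. This design also uses the UMC 0.18 µm process, requires only 26.7 Kgates, and needs an operating frequency of only 1.3 MHz to 3.6 MHz; its power consumption for AC-3, MP3, and AAC is 0.9 mW, 3.2 mW, and 1 mW, respectively. The third topic covers hardware/software co-design. Because an SoC platform offers both the high flexibility of a general-purpose processor and the high performance and low power consumption of custom hardware, we propose a hardware/software co-design approach to digital audio decoding. First, we propose an analytical model whose results guide the most suitable partition of the system into software and hardware; the hardware and software can then be optimized individually. With hardware/software co-design and the hardware accelerator (CCFP) designed in the second topic, our system performs about 15 times better than software alone running on an ARM processor. The last topic is the implementation of digital audio decoding on a DSP. A DSP implementation offers higher flexibility and makes it easier to add new functions. However, current high-performance DSPs use a VLIW architecture, which can execute multiple operations in a single cycle, whereas conventional digital audio algorithms are still executed sequentially; to obtain high performance on a VLIW DSP, the algorithms must be rewritten in a form that can be processed in parallel. Using AAC as a case study, this dissertation explores various optimizations on a VLIW DSP. The resulting AAC decoder needs to run at only 15 MHz to meet the real-time playback requirement, and uses 27 Kbytes each of program memory and data memory. Digital audio coding is popular and has been applied in many areas. Conventional implementations of audio decoders fall into three categories: dedicated hardware, software on a general-purpose processor (GPP), and hardware/software co-design. This dissertation covers the key techniques of all three and makes four contributions. First, a low-power, pure-hardware AAC audio decoder system is presented. Based on the characteristics of each decoding block, the AAC system is partitioned into four separate modules. For low power and low complexity, architectural- and algorithmic-level approaches are adopted in both the individual modules and the whole system. 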
For stereo processing, a single set of hardware is shared by the channel pair to keep cost low. The hardware operations of each module are scheduled for high pipeline utilization, and parallel processing among blocks is added to increase efficiency. The pipeline and parallel techniques applied to the channel pair yield a 48% power saving. The proposed AAC decoder is realized in UMC 0.18 µm 1P6M technology and operates at only 3 MHz in the worst case. The power dissipation is only 2.45 mW at a sampling frequency of 44.1 kHz. Second, audio applications for mobile phones and portable devices are increasingly popular, and to attract consumer interest, current audio decoder development requires multi-standard support on a single device. We present a configurable common filterbank processor (CCFP) for AC-3, MP3, and AAC audio decoders. It is used as an accelerator for general-purpose processors to improve performance. All the filterbank transforms are derived as even- or odd-point IFFT flows. The architecture is fully pipelined and can be configured for different operation modes. The design is synthesized using the UMC 0.18 µm library and takes about 26.7K gates. It runs at a very low operating frequency in the range of 1.3 to 3.6 MHz. The power consumption is only 0.9 mW, 3.2 mW, and 1 mW for AC-3, MP3, and AAC, respectively. Third, an SoC integration platform provides the flexibility of general-purpose processors together with the high performance and low power consumption of custom hardware. We present a hardware/software co-design method for implementing an AAC audio decoder. This approach not only considers the characteristics of the algorithms but also provides numerical criteria for evaluating the various approaches. The overall system is first analyzed and profiled with the ARM profiler. 
The decoder system is then partitioned into software and hardware parts based on the results of this analysis. A multi-standard audio decoder based on the SoC hardware/software co-design approach is also presented. It supports popular audio formats including AC-3, MP3, and AAC. By using the accelerator and the processor with cache enabled, the overall system achieves more than a 15x speedup over the software-only ARM audio decoder. Finally, we propose VLIW-aware software optimization techniques for the AAC decoding blocks on the parallel architecture core DSP (PACDSP) processor. This approach provides the flexibility to add new extensions and addresses two important issues for DSPs in portable devices: low power consumption and limited resources. We convert the traditional sequential algorithms into parallel processes and minimize the memory use of each block. The realized decoder operates at only 15 MHz and needs only 27 Kbytes of program memory and 27 Kbytes of data memory. |
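As a rough sanity check on the clock figures quoted in the abstract, the sketch below computes how many clock cycles are available per audio frame under a real-time constraint. It assumes 1024 samples per channel per AAC frame (the standard long-window frame length) and, for the 15 MHz DSP figure, a 44.1 kHz sampling rate that the abstract does not state. The function and variable names are illustrative and do not come from the dissertation.

```python
def cycles_per_frame(clock_hz: float, sample_rate_hz: float, frame_len: int) -> float:
    """Clock cycles available to decode one frame while keeping up with playback."""
    frames_per_second = sample_rate_hz / frame_len
    return clock_hz / frames_per_second


# Assumption: 1024 samples per channel per AAC frame (long window).
AAC_FRAME_LEN = 1024

# 3 MHz hardware decoder at 44.1 kHz (both figures are from the abstract).
hw_budget = cycles_per_frame(3e6, 44_100, AAC_FRAME_LEN)

# 15 MHz DSP decoder; the 44.1 kHz rate here is an assumption.
dsp_budget = cycles_per_frame(15e6, 44_100, AAC_FRAME_LEN)

print(f"hardware decoder: about {hw_budget:.0f} cycles per frame")
print(f"DSP decoder:      about {dsp_budget:.0f} cycles per frame")
```

Under these assumptions the dedicated hardware must finish a frame, covering both stereo channels, in roughly 70 thousand cycles, while the DSP implementation has about five times that budget.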