LI Tao, CAO Hui, GUO Lele. Speech deep feature extraction method for deep neural network[J]. Technical Acoustics, 2018, 37(4): 367-371 |
Speech deep feature extraction method for deep neural network |
Received: 2017-08-04 Revised: 2017-10-18 |
DOI:10.16300/j.cnki.1000-3630.2018.04.013 |
Keywords: speech recognition; deep auto-encoder (DAE); Mel-frequency cepstral coefficient (MFCC) |
Funding: Supported by the National Natural Science Foundation of China (1202020368, 11074159, 11374199). |
|
Abstract: |
To improve the performance of a continuous speech recognition system, a deep auto-encoder neural network is applied to speech signal feature extraction. A deep auto-encoder (DAE) is built by stacking sparse auto-encoders, and the essential features of the speech signal are extracted in two steps: greedy layer-wise pre-training followed by fine-tuning. The recognition system uses a context-dependent triphone model, and the phoneme error rate is taken as the measure of system performance. Simulation results show that the deep features extracted by the DAE are superior to both traditional Mel-frequency cepstral coefficient (MFCC) features and optimized MFCC features. |
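The abstract uses phoneme error rate as its performance criterion but does not say how it is computed. A minimal sketch of one common definition follows, assuming the rate is the total Levenshtein (edit) distance between reference and recognized phoneme sequences divided by the total number of reference phonemes. The function names and the phoneme labels in the usage note are hypothetical, and the scoring tool actually used in the paper is not stated here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of labels)."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis phonemes.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference phoneme missing in hypothesis
                curr[j - 1] + 1,     # insertion: extra phoneme in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Assumed definition: summed edit distance / total reference phonemes."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference transcripts contain no phonemes")
    return total_errors / total_ref
```

As a usage example with made-up labels, scoring the reference `["sil", "k", "ae", "t", "sil"]` against the hypothesis `["sil", "k", "ah", "t"]` counts one substitution and one deletion over five reference phonemes. Under this definition a lower value indicates a better feature set, which is the sense in which the abstract compares DAE features with MFCC features.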