Article Abstract

Li Tao, Cao Hui, Guo Lele. Speech deep feature extraction method for deep neural network[J]. Technical Acoustics (声学技术), 2018, 37(4): 367-371

Speech deep feature extraction method for deep neural network

Submitted: 2017-08-04    Revised: 2017-10-18

DOI: 10.16300/j.cnki.1000-3630.2018.04.013

Keywords: speech recognition; deep auto-encoder (DAE); Mel-frequency cepstral coefficient (MFCC)

Funding: Supported by the National Natural Science Foundation of China (1202020368, 11074159, 11374199).

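Author	Affiliation	E-mail
Li Tao	School of Physics and Information Technology, Shaanxi Normal University, Xi'an, Shaanxi 710100
Cao Hui	School of Physics and Information Technology, Shaanxi Normal University, Xi'an, Shaanxi 710100	caohui@snnu.edu.cn
Guo Lele	School of Physics and Information Technology, Shaanxi Normal University, Xi'an, Shaanxi 710100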

Abstract:

To improve the performance of a continuous speech recognition system, a deep auto-encoder neural network is applied to speech signal feature extraction. The deep auto-encoder (DAE) is built by stacking sparse auto-encoders, and the essential features of the speech signal are extracted in two steps: greedy layer-wise pre-training followed by fine-tuning. The recognition system uses context-dependent triphone models, and the phoneme error rate is taken as the criterion of system performance. Simulation results show that the deep features extracted by the deep auto-encoder are superior to both traditional Mel-frequency cepstral coefficient (MFCC) features and optimized MFCC features.
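The abstract names the phoneme error rate as its evaluation criterion but does not say how it was scored. As a minimal sketch, assuming the common definition (substitutions + deletions + insertions, divided by the number of reference phonemes), the metric could be computed as below. The function names and the phoneme labels in the usage example are hypothetical and do not come from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of labels)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning ref[:i] with an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: a reference phoneme is missing
                curr[j - 1] + 1,     # insertion: an extra hypothesis phoneme
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Total edit errors over all utterances divided by total reference phonemes."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_phonemes = sum(len(ref) for ref in references)
    if total_phonemes == 0:
        raise ValueError("references contain no phonemes")
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_errors / total_phonemes


# Hypothetical labels, for illustration only: one deletion ("hh") and one
# substitution ("d" -> "t") against five reference phonemes.
reference = [["sh", "iy", "hh", "ae", "d"]]
hypothesis = [["sh", "iy", "ae", "t"]]
per = phoneme_error_rate(reference, hypothesis)
```

Pooling the errors over all utterances before dividing, rather than averaging per-utterance rates, keeps short utterances from dominating the score. A lower value indicates better recognition, so a feature set is judged better when it produces a lower rate on the same test data.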
