Article abstract
金浩, 朱文博, 段志奎, 陈建文, 李艾园. Attention mechanism based TDNN-LSTM model and its application[J]. 声学技术 (Technical Acoustics), 2021, 40(4): 508-514
Attention mechanism based TDNN-LSTM model and its application
Received: 2020-11-08  Revised: 2021-01-23
DOI:10.16300/j.cnki.1000-3630.2021.04.011
Keywords: small sample; attention mechanism; time delay neural network (TDNN); long short-term memory (LSTM) recurrent network
Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation, Guangdong-Foshan Joint Fund (2019A1515110273)
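Author  Affiliation  E-mail
金浩  Foshan University, Foshan, Guangdong 528000
朱文博  Foshan University, Foshan, Guangdong 528000  zhuwenbo@fosu.edu.cn
段志奎  Foshan University, Foshan, Guangdong 528000
陈建文  Foshan University, Foshan, Guangdong 528000
李艾园  Foshan University, Foshan, Guangdong 528000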
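Abstract (translated from the Chinese):
      With large-scale data, speech recognition based on deep learning is already quite mature. With small-sample resources, however, the correlation among feature information is limited, so the model cannot adequately capture contextual information and the recognition rate is low. To address this problem, a temporal prediction acoustic model named TLSTM-Attention is proposed. It combines a time delay neural network (TDNN) that has an embedded attention mechanism layer with a long short-term memory (LSTM) recurrent neural network, and it effectively fuses the coarse-grained and fine-grained features that carry important information in order to improve contextual modeling. The data are augmented by speed perturbation, speaker vocal tract information features are added, and training uses the lattice-free maximum mutual information criterion. Comparison experiments are run over different input features, model structures and numbers of nodes. The results show that, relative to the baseline model, the proposed model reduces the word error rate by 3.37 percentage points.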
Abstract (English):
      With large-scale data, speech recognition technology based on deep learning has become quite mature. Under small-sample resources, however, the correlation among feature information is limited, so the model's ability to capture contextual information is insufficient, which leads to a low recognition rate. To solve this problem, a temporal prediction acoustic model named TLSTM-Attention is proposed in this paper. It consists of a time delay neural network (TDNN) with an embedded attention mechanism layer and a long short-term memory (LSTM) recurrent neural network. The model effectively fuses the coarse-grained and fine-grained features that carry important information to improve the modeling of contextual information. The data are augmented by speed perturbation, speaker vocal tract information features are added, and the lattice-free maximum mutual information training criterion is applied. A series of comparative experiments is conducted with different input features, model structures and numbers of nodes. The experimental results show that, compared with the baseline model, the word error rate of the proposed model is reduced by 3.77 percentage points.
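The result above is stated as a reduction in word error rate (WER) measured in percentage points. As a minimal sketch of how that metric is conventionally computed (this is not the authors' scoring tool; the function names `word_error_rate` and `percentage_point_reduction` are hypothetical, and whitespace tokenization is an assumption), WER is the word-level edit distance between reference and hypothesis divided by the number of reference words:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are separated by whitespace. Scoring for Mandarin
    is often done per character instead, which this sketch does not handle.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def percentage_point_reduction(baseline_wer: float, model_wer: float) -> float:
    """Absolute WER difference in percentage points (not a relative change)."""
    return (baseline_wer - model_wer) * 100.0
```

A reduction in percentage points is an absolute difference between two WER values, so it should not be read as a relative improvement over the baseline.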