Article Abstract
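Sun Jie, Wang Hong, Wushour Silamu. Uyghur dialect recognition combining attention mechanism and causal convolution networks[J]. Technical Acoustics, 2020, 39(6): 697-703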
Uyghur Dialect Recognition Based on Attention Mechanism and Causal Convolution Networks
Received: 2020-04-16  Revised: 2020-05-29
DOI:10.16300/j.cnki.1000-3630.2020.06.008
Keywords: attention mechanism; causal convolution networks; dilated convolution; Uyghur dialect; recognition
Funding: National Natural Science Foundation of China (U1435215, U1603262, 61433012, 201704041014)
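Authors, affiliations and e-mail:
Sun Jie: School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046; Changji University, Changji, Xinjiang 831100
Wang Hong: Changji University, Changji, Xinjiang 831100
Wushour Silamu: School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046; Changji University, Changji, Xinjiang 831100; wushour@xju.edu.cn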
Chinese abstract (translated):
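      When the conventional x-vector model generates segment-level representations of dialect speech, it does not take into account that different frame-level features contribute unequally to dialect identification. To address this, and to account for the agglutinative nature of Uyghur, a Uyghur dialect recognition method combining an attention mechanism with causal convolution networks is proposed. First, a multi-layer causal convolution network models the dialect speech sequence; next, dilated convolution kernels enlarge the receptive field and extend the sampling range; finally, attention pooling produces segment-level features of the dialect speech. Uyghur dialect recognition experiments show that the proposed method improves recognition accuracy by 23.19 percentage points over the standard x-vector model.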
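The abstract describes stacking causal convolutions and using dilated kernels to widen the receptive field, but it does not give the layer count, kernel size or dilation rates. The following is therefore only a minimal single-channel sketch under assumed values (kernel size 3, dilations 1, 2, 4, all-ones weights). It illustrates the two properties the method relies on: the output at frame t depends only on frames up to t, and dilation widens the receptive field of a stack without adding weights. The function names and weights are hypothetical, not the authors' implementation.

```python
from typing import List, Sequence


def causal_dilated_conv1d(x: Sequence[float], w: Sequence[float], dilation: int = 1) -> List[float]:
    """Single-channel causal dilated 1-D convolution.

    y[t] = sum_i w[i] * x[t - i * dilation], treating x[t] as 0 for t < 0
    (implicit left zero-padding), so y[t] never looks at future frames.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w_i in enumerate(w):
            idx = t - i * dilation
            if idx >= 0:
                acc += w_i * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input frames visible to one output of a stack of causal layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_stack(x: Sequence[float], weights: Sequence[Sequence[float]],
                 dilations: Sequence[int]) -> List[float]:
    """Hypothetical multi-layer stack: causal dilated conv followed by ReLU per layer."""
    h = list(x)
    for w, d in zip(weights, dilations):
        h = [max(0.0, value) for value in causal_dilated_conv1d(h, w, d)]
    return h


# Assumed configuration: kernel size 3, dilation doubling per layer.
KERNEL = [1.0, 1.0, 1.0]  # illustrative weights, not learned values
DILATIONS = [1, 2, 4]
impulse = [1.0] + [0.0] * 19
response = causal_stack(impulse, [KERNEL] * len(DILATIONS), DILATIONS)
```

With kernel size 3, `receptive_field(3, [1, 2, 4])` gives 15 frames, against 7 for three undilated layers (`receptive_field(3, [1, 1, 1])`), for the same number of weights. The impulse `response` is nonzero exactly on frames 0 to 14, matching that figure.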
English abstract:
      Considering that different frame-level features contribute differently to dialect recognition when the traditional x-vector model generates segment-level representations of dialect speech, and that Uyghur is an agglutinative language, a Uyghur dialect recognition method based on an attention mechanism and a causal convolution network is proposed. First, a multi-layer causal convolution network is used to model the speech sequence; then dilated convolution kernels are used to enlarge the receptive field and expand the sampling range; finally, attention pooling is used to obtain segment-level speech features. Experimental results on Uyghur dialect recognition show that the accuracy of the proposed method is 23.19 percentage points higher than that of the standard x-vector model.
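The standard x-vector model uses statistics pooling, which summarizes frame-level features by their mean and standard deviation, giving every frame equal weight. Attention pooling instead learns a weight per frame. The abstract does not specify the attention formulation, so the sketch below assumes a common single-head additive form, score_t = v . tanh(W h_t + b), with softmax-normalized weights. W, b and v are hypothetical parameters that would be learned in practice, and the toy frames are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frames: Sequence[Sequence[float]],
    W: Sequence[Sequence[float]],
    b: Sequence[float],
    v: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical additive attention pooling (assumed form, not the paper's).

    For each frame-level vector h_t:
        u_t     = tanh(W h_t + b)
        score_t = v . u_t
    alpha = softmax(scores); the segment-level vector is sum_t alpha_t * h_t.
    Returns (segment_vector, alpha).
    """
    scores = []
    for h in frames:
        u = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        scores.append(sum(v_i * u_i for v_i, u_i in zip(v, u)))
    alpha = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(a * h[d] for a, h in zip(alpha, frames)) for d in range(dim)]
    return pooled, alpha


# Toy example: three 2-D frame-level features with hypothetical parameters.
frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
v = [1.0, 1.0]
segment, alpha = attention_pool(frames, W, b, v)
```

Here the third frame receives the largest weight, so `segment` is pulled toward it. If every score is equal (for example with v set to zeros), the weights become uniform and the result reduces to the plain mean, which corresponds to the equal-weight mean term of x-vector statistics pooling.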