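Sun Jie, Wang Hong, Wushour Silamu. Uyghur dialect recognition based on attention mechanism and causal convolution networks[J]. Technical Acoustics (声学技术), 2020, 39(6): 697-703 |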
Uyghur dialect recognition based on attention mechanism and causal convolution networks |
Received: 2020-04-16  Revised: 2020-05-29 |
DOI:10.16300/j.cnki.1000-3630.2020.06.008 |
Keywords: attention mechanism; causal convolution networks; dilated convolution; Uyghur dialect; recognition |
Funding: National Natural Science Foundation of China (U1435215, U1603262, 61433012, 201704041014) |
Abstract: |
When the conventional x-vector model generates segment-level representations of dialect speech, it does not account for the fact that different frame-level features contribute unequally to dialect discrimination. To address this problem, and taking into account the agglutinative nature of Uyghur, a Uyghur dialect recognition method that combines an attention mechanism with a causal convolution network is proposed. First, a multi-layer causal convolution network models the dialect speech sequence; then dilated convolution kernels enlarge the receptive field and widen the sampling range; finally, attention pooling produces the segment-level features of the dialect speech. Experiments on Uyghur dialect recognition show that the proposed method improves recognition accuracy by 23.19 percentage points over the standard x-vector model. |
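The three steps named in the abstract (causal convolution, dilation, attention pooling) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give layer counts, kernel sizes, or the form of the attention scorer, so the shared scalar taps, the dilation schedule, and the scoring vector `v` below are all assumptions chosen for readability.

```python
import math


def causal_dilated_conv(frames, taps, dilation):
    """out[t] = sum_k taps[k] * frames[t - k*dilation]; frames before t=0 count as zeros.

    Assumption: one scalar tap per kernel position, shared across feature dimensions.
    """
    dim = len(frames[0])
    out = []
    for t in range(len(frames)):
        acc = [0.0] * dim
        for k, w in enumerate(taps):
            src = t - k * dilation
            if src >= 0:  # causal: only the current and past frames are used
                for d in range(dim):
                    acc[d] += w * frames[src][d]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input frames that can influence one output frame of the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


def attention_pool(frames, v):
    """Weight each frame by softmax(v . tanh(frame)) and sum over time.

    Assumption: a simplified scorer with a single vector v and no projection matrix.
    """
    scores = [sum(vi * math.tanh(x) for vi, x in zip(v, f)) for f in frames]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


def segment_embedding(frames, taps, dilations, v):
    """Stack causal dilated layers, then pool the frames into one segment-level vector."""
    hidden = frames
    for dilation in dilations:
        hidden = causal_dilated_conv(hidden, taps, dilation)
    return attention_pool(hidden, v)
```

Calling `segment_embedding(frames, [0.5, 0.5], [1, 2, 4], v)` on a list of frame vectors returns one pooled vector and the per-frame attention weights. With a kernel size of 2, the three layers together see `receptive_field(2, [1, 2, 4])` input frames, which shows how dilation widens the context without adding kernel taps.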