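Article Abstract
LUO Chunmei, ZHANG Fenglei. Speaker recognition based on mean feature and improved deep neural network[J]. Technical Acoustics, 2021, 40(4): 503-507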
Speaker recognition based on mean feature and improved deep neural network
Received: 2020-09-28  Revised: 2020-12-13
DOI:10.16300/j.cnki.1000-3630.2021.04.010
Keywords: speaker recognition  Mel frequency cepstrum coefficient (MFCC)  deep convolutional neural network  Gaussian mean matrix
Funding: Scientific Research Project of the Education Department of Liaoning Province (LNSJYT201904).
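Authors and affiliations:
LUO Chunmei, School of Chemical Engineering and Machinery, Eastern Liaoning University, Dandong 118000, Liaoning, China. E-mail: luo_cm115@163.com
ZHANG Fenglei, School of Chemical Engineering and Machinery, Eastern Liaoning University, Dandong 118000, Liaoning, China.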
Abstract:
      To improve the performance of neural networks in speaker recognition, a speaker recognition algorithm based on Gaussian mean matrix features and an improved deep convolutional neural network is proposed. The algorithm first extracts an adaptive Gaussian mean matrix from Mel frequency cepstrum coefficient (MFCC) features by maximum a posteriori (MAP) estimation and applies noise-adaptive compensation to the features, which strengthens inter-frame correlation and speaker-specific information. An improved deep convolutional neural network then further aligns the inter-frame information, improving feature learning for speaker recognition and robustness to background noise. Experimental results show that, compared with recognition frameworks such as the Gaussian mixture model-universal background model (GMM-UBM) and with features such as conventional MFCC, the proposed algorithm achieves higher recognition accuracy and the lowest recognition mean square error.
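The abstract does not give the equations behind the Gaussian mean matrix, so the following is only a minimal sketch of one common way such a matrix is obtained: relevance-MAP adaptation of the means of a diagonal-covariance universal background model (UBM), as used in GMM-UBM systems. The function names, the `relevance` factor, and the assumption that MFCC frames and UBM parameters are already available are all illustrative choices, not details taken from the paper. The noise compensation step and the convolutional network are not covered.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Posterior probability of each UBM component given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Return the MAP-adapted mean matrix, one row per Gaussian component.

    `relevance` is an assumed hyperparameter: larger values keep the
    adapted means closer to the UBM means.
    """
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp                       # soft frame counts
    sums = [[0.0] * dim for _ in range(n_comp)]   # posterior-weighted sums
    for x in frames:
        for c, gamma in enumerate(posteriors(x, weights, means, variances)):
            counts[c] += gamma
            for d in range(dim):
                sums[c][d] += gamma * x[d]
    # Equivalent to alpha * E[x] + (1 - alpha) * ubm_mean with
    # alpha = count / (count + relevance), written to avoid dividing by zero.
    return [
        [
            (sums[c][d] + relevance * means[c][d]) / (counts[c] + relevance)
            for d in range(dim)
        ]
        for c in range(n_comp)
    ]
```

Under these assumptions, each utterance yields a matrix with one row per UBM component and one column per MFCC dimension, which could then be fed to a network as a fixed-size input regardless of utterance length.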