Welcome to the editorial office of Technical Acoustics (《声学技术》)!

Article abstract

LUO Chunmei, ZHANG Fenglei. Speaker recognition based on mean feature and improved deep neural network[J]. Technical Acoustics, 2021, 40(4): 503-507.

Speaker recognition based on mean feature and improved deep neural network

Received: 2020-09-28; Revised: 2020-12-13

DOI: 10.16300/j.cnki.1000-3630.2021.04.010

Keywords: speaker recognition; Mel frequency cepstrum coefficient (MFCC); deep convolutional neural network; Gaussian mean matrix

Funding: Scientific Research Project of the Education Department of Liaoning Province (LNSJYT201904).

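Author	Affiliation	E-mail
LUO Chunmei	School of Chemical Engineering and Machinery, Eastern Liaoning University, Dandong, Liaoning 118000	luo_cm115@163.com
ZHANG Fenglei	School of Chemical Engineering and Machinery, Eastern Liaoning University, Dandong, Liaoning 118000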


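Abstract (translated from the Chinese):

To improve the recognition performance of neural networks in speaker recognition, a speaker recognition algorithm based on Gaussian mean matrix features and an improved deep convolutional neural network is proposed. The algorithm first extracts a Gaussian mean matrix from Mel frequency cepstrum coefficient (MFCC) features by maximum a posteriori (MAP) estimation and applies noise-adaptive compensation to the features, enhancing inter-frame correlation and speaker-specific information. An improved deep convolutional neural network then further aligns the inter-frame information, making the speaker recognition features more robust to background noise. Experimental results show that, compared with recognition frameworks such as the Gaussian mixture model-universal background model (GMM-UBM) and conventional features such as MFCC, the algorithm achieves higher recognition accuracy and the lowest recognition mean square error.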
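The abstract does not give the adaptation formula. As a minimal sketch, assuming the "Gaussian mean matrix" is built the way GMM-UBM systems usually build a mean supervector, the Python below applies classic relevance-MAP adaptation to the means of a diagonal-covariance universal background model (UBM) and stacks the K adapted means into a K × D matrix (K components, D MFCC dimensions). Row k is (Σ_t γ_tk x_t + r μ_k) / (n_k + r), where γ_tk is the posterior of component k for frame x_t and n_k = Σ_t γ_tk. The function names, the relevance factor r = 16, and the toy data are illustrative assumptions, and the paper's noise-adaptive compensation step is not reproduced here.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x: Sequence[float], weights: Sequence[float],
               means: Matrix, variances: Matrix) -> List[float]:
    """Responsibilities gamma_k(x) of each UBM component, via log-sum-exp."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lp - top) for lp in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames: Sequence[Sequence[float]], weights: Sequence[float],
                    means: Matrix, variances: Matrix,
                    relevance: float = 16.0) -> Matrix:
    """Relevance-MAP adaptation of the UBM means to one utterance (hypothetical helper).

    Row k of the result is (sum_t gamma_tk * x_t + r * mu_k) / (n_k + r),
    so components that see little data stay close to the UBM mean.
    """
    num_comp, dim = len(means), len(means[0])
    n = [0.0] * num_comp                          # zeroth-order stats n_k
    f = [[0.0] * dim for _ in range(num_comp)]    # first-order stats sum_t gamma_tk * x_t
    for x in frames:
        gamma = posteriors(x, weights, means, variances)
        for k in range(num_comp):
            n[k] += gamma[k]
            for d in range(dim):
                f[k][d] += gamma[k] * x[d]
    return [[(f[k][d] + relevance * means[k][d]) / (n[k] + relevance)
             for d in range(dim)]
            for k in range(num_comp)]


# Toy example (invented values): a 2-component, 2-D "UBM" and three "MFCC" frames.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [5.0, 5.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]
frames = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]]
mean_matrix = map_adapt_means(frames, ubm_weights, ubm_means, ubm_vars)
# Row 0 moves from (0, 0) toward the frames; row 1 stays at about (5, 5).
```

With r = 16 and only three frames, row 0 moves only part of the way toward the frame average (n_0 is about 3, so the data receive a weight of about 3/19). A smaller `relevance` trusts the utterance more.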

English abstract:

To improve the recognition performance of neural networks in speaker recognition, a speaker recognition algorithm based on Gaussian mean matrix features and an improved deep convolutional neural network is proposed. The algorithm first extracts an adaptive Gaussian mean matrix from Mel frequency cepstrum coefficient (MFCC) features by maximum a posteriori (MAP) estimation and applies noise-adaptive compensation to the features to enhance inter-frame correlation and speaker information. An improved deep convolutional neural network then further aligns the inter-frame information, improving feature learning for speaker recognition and robustness to background noise. Experimental results show that, compared with the Gaussian mixture model-universal background model (GMM-UBM) framework and traditional MFCC features, the proposed algorithm achieves the highest recognition accuracy and the lowest recognition mean square error.
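For context on the comparison, the GMM-UBM baseline named in the abstract typically scores a test utterance by the average per-frame log-likelihood ratio between a MAP-adapted speaker model and the UBM. The sketch below is not the paper's evaluation code. It assumes diagonal covariances and speaker models that share the UBM weights and variances and differ only in their means, and it uses hypothetical function names (`gmm_loglik`, `llr_score`, `identify`) and invented toy means.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x: Sequence[float], weights: Sequence[float],
               means: Matrix, variances: Matrix) -> float:
    """log p(x) under a diagonal GMM, computed with log-sum-exp."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(lp - top) for lp in logs))


def llr_score(frames: Sequence[Sequence[float]], spk_means: Matrix,
              ubm_weights: Sequence[float], ubm_means: Matrix, ubm_vars: Matrix) -> float:
    """Average per-frame log p(x | speaker) - log p(x | UBM).

    Assumption: the speaker GMM reuses the UBM weights and variances and
    only replaces the means (mean-only MAP adaptation).
    """
    total = 0.0
    for x in frames:
        total += (gmm_loglik(x, ubm_weights, spk_means, ubm_vars)
                  - gmm_loglik(x, ubm_weights, ubm_means, ubm_vars))
    return total / len(frames)


def identify(frames: Sequence[Sequence[float]], speakers: Dict[str, Matrix],
             ubm_weights: Sequence[float], ubm_means: Matrix, ubm_vars: Matrix) -> str:
    """Closed-set identification: return the enrolled speaker with the highest score."""
    return max(speakers, key=lambda name: llr_score(
        frames, speakers[name], ubm_weights, ubm_means, ubm_vars))


# Toy data (invented): a 2-component UBM and two enrolled speakers.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [5.0, 5.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]
speakers = {
    "A": [[1.0, 1.0], [5.0, 5.0]],
    "B": [[-1.0, -1.0], [5.0, 5.0]],
}
test_frames = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]]
best = identify(test_frames, speakers, ubm_weights, ubm_means, ubm_vars)
```

For closed-set identification the UBM term is identical for every candidate, so it does not change which speaker wins. The ratio matters for verification, where it is compared against a decision threshold.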
