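Article abstract
Mao Haiquan, Feng Haihong, Hong Feng, Ma Haotian, Xu Chulin, Zheng Litong. A new framework for text-independent speaker verification based on Chinese short utterance[J]. Technical Acoustics (声学技术), 2024, 43(4): 503-510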
A new framework for text-independent speaker verification based on Chinese short utterance
Received: 2023-02-08  Revised: 2023-03-03
DOI:10.16300/j.cnki.1000-3630.2024.04.008
Keywords: speaker verification; short utterance; attention mechanism; verification word selection
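Funding: Shanghai Natural Science Foundation (22ZR1475700); "Frontier Exploration" self-deployed project of the Institute of Acoustics, Chinese Academy of Sciences (QYTS202114); Shanghai Talent Development Fund (2020011).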
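Authors, affiliations, and e-mail:
Mao Haiquan (毛海全): East China Sea Research Station, Institute of Acoustics, Chinese Academy of Sciences, Shanghai 201815; University of Chinese Academy of Sciences, Beijing 100049
Feng Haihong (冯海泓): East China Sea Research Station, Institute of Acoustics, Chinese Academy of Sciences, Shanghai 201815
Hong Feng (洪峰): East China Sea Research Station, Institute of Acoustics, Chinese Academy of Sciences, Shanghai 201815; e-mail: hongfeng@mail.ioa.ac.cn
Ma Haotian (马皓天): East China Sea Research Station, Institute of Acoustics, Chinese Academy of Sciences, Shanghai 201815; University of Chinese Academy of Sciences, Beijing 100049
Xu Chulin (徐楚林): East China Sea Research Station, Institute of Acoustics, Chinese Academy of Sciences, Shanghai 201815
Zheng Litong (郑立通): East China Sea Research Station, Institute of Acoustics, Chinese Academy of Sciences, Shanghai 201815; University of Chinese Academy of Sciences, Beijing 100049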
Abstract:
      In text-independent speaker verification, the content of the verification utterance is not constrained. Compared with text-dependent speaker verification, it can therefore be combined with speech recognition to resist common attacks such as recording fraud. However, text-independent speaker verification systems suffer severe performance degradation on short verification utterances. To address this, the paper first proposes an improved end-to-end model. Speaker classification losses on both long and short utterances strengthen the network's ability to classify speakers from speech segments of different durations. At the same time, in the embedding space, the similarity between short and long utterances of the same speaker is increased and the similarity between short utterances of different speakers is reduced, which strengthens the network's feature extraction for short utterances. In addition, an attention-based verification word selection method is proposed, in which Chinese words with high attention weights are selected as the verification prompt words of the system. Experimental results show that the improved end-to-end model combined with softmax pre-training achieves a 29% relative reduction in equal error rate on short test utterances, and that the attention-based selection method picks out verification words with better recognition results. Combining the two methods effectively improves the recognition performance of the speaker verification system on short Chinese utterances.
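The abstract describes the training objective only in words, so the following is a minimal sketch of one way such an objective could be assembled, not the authors' formulation. It assumes a softmax cross-entropy for the long- and short-utterance classification losses, cosine similarity as the embedding-space measure, and a hypothetical weight `alpha` between the two parts. All function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (log-sum-exp for stability)."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]


def similarity_term(long_emb, short_emb, labels):
    """Assumed embedding-space term.

    Pull: short and long embeddings of the same speaker should be similar.
    Push: short embeddings of different speakers should not be similar.
    """
    n = len(labels)
    pull = [1.0 - cosine(short_emb[i], long_emb[j])
            for i in range(n) for j in range(n) if labels[i] == labels[j]]
    push = [max(0.0, cosine(short_emb[i], short_emb[j]))
            for i in range(n) for j in range(i + 1, n) if labels[i] != labels[j]]
    pull_mean = sum(pull) / len(pull)  # never empty: each i pairs with itself
    push_mean = sum(push) / len(push) if push else 0.0
    return pull_mean + push_mean


def total_loss(long_logits, short_logits, long_emb, short_emb, labels, alpha=1.0):
    """Classification loss on long and short segments plus the similarity term.

    alpha is a hypothetical weighting factor, not a value from the paper.
    """
    n = len(labels)
    ce = sum(cross_entropy(long_logits[i], labels[i])
             + cross_entropy(short_logits[i], labels[i]) for i in range(n)) / n
    return ce + alpha * similarity_term(long_emb, short_emb, labels)


# Toy batch: two speakers, one long and one short segment each.
labels = [0, 1]
long_emb = [[1.0, 0.0], [0.0, 1.0]]
good_short = [[2.0, 0.0], [0.0, 3.0]]  # aligned with the matching long embedding
bad_short = [[0.0, 1.0], [1.0, 0.0]]   # aligned with the wrong speaker
good_logits = [[5.0, 0.0], [0.0, 5.0]]
bad_logits = [[0.0, 5.0], [5.0, 0.0]]

loss_good = total_loss(good_logits, good_logits, long_emb, good_short, labels)
loss_bad = total_loss(good_logits, bad_logits, long_emb, bad_short, labels)
```

In this toy batch, short embeddings that line up with their own speaker's long embedding give a lower loss than short embeddings that line up with the other speaker, which is the behavior the abstract attributes to the embedding-space constraint.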