Article abstract
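ZHOU Lin, ZHAO Yiliang, ZHU Hongyu, TANG Yibin. Robust speech recognition algorithm based on binaural speech separation and missing data technique[J]. Technical Acoustics, 2019, 38(5): 545-553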
Robust speech recognition algorithm based on binaural speech separation and missing data technique
Received: 2018-09-14  Revised: 2018-10-19
DOI:10.16300/j.cnki.1000-3630.2019.05.011
Keywords: spatial hearing  binaural speech separation  missing data technique  speech recognition  word error rate (WER)
Funding: National Natural Science Foundation of China (61571106, 61501169, 61201345); Fundamental Research Funds for the Central Universities (2242013K30010)
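Author, affiliation, e-mail
ZHOU Lin, Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, School of Information and Engineering, Southeast University, Nanjing 210096, Jiangsu, Linzhou@seu.edu.cn
ZHAO Yiliang, Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, School of Information and Engineering, Southeast University, Nanjing 210096, Jiangsu
ZHU Hongyu, Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, School of Information and Engineering, Southeast University, Nanjing 210096, Jiangsu
TANG Yibin, College of Internet of Things, Hohai University, Changzhou 213022, Jiangsu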
Abstract:
      Robust speech recognition has important applications in human-computer interaction, smart homes, speech translation systems and similar areas. To improve recognition performance in complex acoustic environments containing noise and interfering speech, this paper proposes a robust speech recognition algorithm based on binaural sound source separation and the missing data technique. The algorithm draws on the masking effect and the cocktail party effect of the human auditory system and exploits the different spatial positions of the sound sources. First, using the azimuth of the target speech, it separates the mixed speech within the equivalent rectangular bandwidth (ERB) sub-bands of the binaural signals to obtain the data stream of the target speech. Because the separated target speech then lacks spectral data in some ERB sub-bands, the missing data technique is used to modify the probability calculation of the hidden Markov model (HMM) before recognition is performed. Simulation results show that the proposed algorithm significantly improves speech recognition performance in complex acoustic environments, because binaural separation removes the influence of noise and interference from the target speech data.
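The abstract does not say exactly how the HMM probability calculation is modified. As a rough illustration only, the sketch below shows one common form of the missing data technique, marginalization: for a state modeled by a diagonal-covariance Gaussian mixture, sub-bands flagged as unreliable are left out of the likelihood. The function names, the binary reliability mask, and the diagonal-covariance assumption are all hypothetical choices made for this example and are not taken from the paper.

```python
import math


def log_gauss_diag_masked(x, mean, var, mask):
    """Log-likelihood of x under a diagonal Gaussian, reliable dims only.

    mask[i] is True when sub-band i is judged reliable (assumed to come
    from the separation stage). With a diagonal covariance, marginalizing
    an unreliable dimension is the same as skipping its term in the sum.
    """
    total = 0.0
    for xi, mi, vi, reliable in zip(x, mean, var, mask):
        if reliable:
            total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def log_gmm_masked(x, weights, means, variances, mask):
    """Log state-output probability of a GMM using only reliable dims."""
    terms = [
        math.log(w) + log_gauss_diag_masked(x, m, v, mask)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Illustrative feature vector whose third sub-band is dominated by interference.
x = [1.0, 2.0, 50.0]
mask = [True, True, False]
score = log_gmm_masked(x, [1.0], [[1.0, 2.0, 3.0]], [[1.0, 1.0, 1.0]], mask)
```

In this sketch the corrupted third sub-band no longer affects the score, so a decoder that uses `log_gmm_masked` in place of the usual output probability would rely only on the sub-bands retained by the separation stage.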