Article abstract
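YAO Kun, YANG Jibin, ZHANG Xiongwei, ZHENG Changyan, SUN Meng. Acoustic scene classification based on multi-resolution time-frequency feature fusion[J]. Technical Acoustics, 2020, 39(4): 494-500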
Acoustic scene classification based on multi-resolution time-frequency feature fusion
Received: 2019-03-03  Revised: 2019-04-30
DOI:10.16300/j.cnki.1000-3630.2020.04.019
Keywords: acoustic scene classification  multi-resolution convolutional neural network  time-frequency feature fusion  time-frequency structure  non-negative matrix factorization
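Funding: Supported by the National Natural Science Foundation of China (61471394) and the Jiangsu Province Excellent Young Scholars Fund (BK20180080).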
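Authors, affiliations and e-mail:
YAO Kun, Army Engineering University of PLA, Nanjing, Jiangsu 210007
YANG Jibin, Army Engineering University of PLA, Nanjing, Jiangsu 210007, E-mail: yjbice@sina.com
ZHANG Xiongwei, Army Engineering University of PLA, Nanjing, Jiangsu 210007
ZHENG Changyan, Army Engineering University of PLA, Nanjing, Jiangsu 210007
SUN Meng, Army Engineering University of PLA, Nanjing, Jiangsu 210007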
Abstract:
      Acoustic scene classification is one of the most difficult tasks in computer audition. With a single input feature, a basic convolutional neural network (CNN) already improves on traditional classification methods, but its accuracy remains unsatisfactory. To address this, the paper proposes an acoustic scene classification scheme based on time-frequency feature fusion within a CNN framework. For the model, a multi-resolution convolution and pooling scheme is used to build a multi-resolution CNN that better matches the time-frequency structure of the extracted features. For the features, low-level envelope features (log-Mel sub-band energies) are fused with high-level structural features (the coefficient matrix of a non-negative matrix factorization) by stacking the two two-dimensional features into one three-dimensional feature that is fed to the classification model. Training and testing were carried out on the development datasets of the 2017 and 2018 Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. The experimental results show that the proposed scheme improves classification accuracy over the baseline systems by 7.5% and 10.3%, respectively.
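The feature-fusion step described in the abstract, stacking log-Mel sub-band energies and a non-negative matrix factorization (NMF) coefficient matrix into one three-dimensional input, can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract gives no such detail: the function names `nmf_activations` and `fuse_features` are hypothetical, the NMF uses Lee-Seung multiplicative updates with a Euclidean cost, it is applied directly to the Mel energies, and the number of NMF components is set equal to the number of Mel bands so that the two matrices have the same shape. A random non-negative matrix stands in for a real Mel spectrogram.

```python
import numpy as np


def nmf_activations(V, n_components, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (F x T) as W @ H.

    Uses multiplicative updates for the Euclidean cost (an assumed choice;
    the abstract does not say which NMF variant the paper uses).
    Returns the basis W (F x K) and the coefficient matrix H (K x T).
    """
    rng = np.random.default_rng(seed)
    n_bands, n_frames = V.shape
    W = rng.random((n_bands, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def fuse_features(mel_energy):
    """Stack log-Mel energies and NMF coefficients into a (2, F, T) array.

    Assumption: K = F components, so H has the same shape as the log-Mel map.
    """
    n_bands, _ = mel_energy.shape
    log_mel = np.log(mel_energy + 1e-10)            # low-level envelope feature
    _, H = nmf_activations(mel_energy, n_bands)     # high-level structural feature
    return np.stack([log_mel, H], axis=0)           # channel-first 3-D feature


# Placeholder input: 40 Mel bands x 100 frames of random non-negative energies.
rng = np.random.default_rng(1)
mel_energy = rng.random((40, 100))
features = fuse_features(mel_energy)
print(features.shape)  # (2, 40, 100)
```

The resulting array has one channel per feature type, which is the layout a two-channel convolutional input layer would expect. In practice the two channels would likely need separate normalization, because log energies and NMF coefficients live on different scales.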