Article abstract
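WANG Jihao, ZHOU Xiaoyan, LI Dapeng, HAN Zhichao, WANG Lili. Bird sound recognition based on convolutional neural network and Transformer network[J]. Technical Acoustics, 2023, 42(5): 675-683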
Bird sound recognition based on convolutional neural network and Transformer network
Received: 2022-04-29  Revised: 2022-08-17
DOI:10.16300/j.cnki.1000-3630.2023.05.018
Keywords: bird sound recognition  feature extraction  convolutional neural network (CNN)  Transformer network
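Authors, affiliations, and e-mail:
WANG Jihao: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu
ZHOU Xiaoyan: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu; xiaoyan_zhou@nuist.edu.cn
LI Dapeng: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu
HAN Zhichao: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu
WANG Lili: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu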
Abstract:
      To address the limited variety of feature extraction methods and the low classification accuracy of traditional bird sound recognition algorithms, this paper proposes a bird sound recognition method that combines a convolutional neural network (CNN) with a Transformer network. The method accounts for both local feature learning and the modeling of global context dependencies. Short-time Fourier transform (STFT) spectrogram features are first extracted from the raw bird sound signal and fed into the CNN to extract local spectral features. In parallel, the log-Mel features of the signal and their first-order and second-order differences are extracted and combined into a mixed Mel frequency cepstrum coefficient (MFCC) feature vector, which is fed into the Transformer network to obtain global sequence features. Finally, the two sets of features are fused to obtain a richer bird sound representation, and a Softmax classifier produces the recognition result. In experiments on the Birdsdata and xeno-canto bird sound datasets, the method achieves average recognition accuracies of 97.81% and 89.47%, respectively, higher than those of other existing bird sound recognition models.
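As a rough illustration of the feature handling described in the abstract, the following is a minimal pure-Python sketch of three steps: stacking a per-frame feature matrix with its first-order and second-order differences, fusing a local and a global feature vector, and applying a Softmax classifier. It is not the authors' implementation. The abstract does not give the difference formula, the fusion rule, or the network architectures, so the central-difference formula, fusion by concatenation, and all function names here (`delta`, `stack_with_deltas`, `fuse`, `classify`) are assumptions made for illustration; the CNN and Transformer branches are not implemented and are represented by placeholder vectors.

```python
import math


def delta(frames):
    """First-order difference along time (assumed central difference).

    frames: list of per-frame feature lists, all of equal length.
    d[t] = (x[t+1] - x[t-1]) / 2, with the edge frames replicated.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def stack_with_deltas(frames):
    """Concatenate static, first-order, and second-order features per frame."""
    d1 = delta(frames)
    d2 = delta(d1)  # second-order difference = difference of the difference
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


def fuse(local_vec, global_vec):
    """Fuse two feature vectors by concatenation (an assumed fusion rule)."""
    return list(local_vec) + list(global_vec)


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, bias):
    """Linear layer followed by softmax; weights has one row per class."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


if __name__ == "__main__":
    # Toy input: 3 frames with 2 static coefficients each (hypothetical values).
    frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
    mixed = stack_with_deltas(frames)  # 3 frames x 6 coefficients
    print(len(mixed), len(mixed[0]))

    # Placeholder vectors standing in for the CNN and Transformer outputs.
    local_vec, global_vec = [1.0], [2.0]
    probs = classify(fuse(local_vec, global_vec),
                     weights=[[1.0, 0.0], [0.0, 1.0]],
                     bias=[0.0, 0.0])
    print(probs)
```

In a full system the placeholder vectors would be replaced by the outputs of the trained CNN and Transformer branches, and the classifier weights would be learned jointly with them.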