Time-frequency representation based feature extraction for audio scene classification
Received: 2016-11-04  Revised: 2017-03-15
Keywords: acoustic scene classification; constant-Q transform; histogram of oriented gradients; local binary pattern
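Gao Min, School of Electronics and Information, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, 253191300@mail.nwpu.edu.cn
Yin Xuefei, School of Electronics and Information, Northwestern Polytechnical University, Xi'an, Shaanxi 710129
Chen Kean, School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, Shaanxi 710072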
To recognize the acoustic scene of an audio stream recorded in a complex environment, a constant-Q transform is used to obtain the time-frequency representation (TFR) of the signal. Because no prior knowledge of the signal or the noise is available, mean filtering is applied to smooth the TFR image. Features based on the histogram of oriented gradients (HOG) of the TFR image are then extracted; they reflect the local direction of variation, in both time and frequency, of the signal power spectrum. In addition, the local binary pattern (LBP) feature is considered, which captures the texture information of the signal. A support vector machine with a linear kernel is used as the classifier. Classification experiments were carried out on data from different acoustic scenes. Compared with classical audio features such as MFCCs, the proposed features capture the discriminative characteristics of a given acoustic scene and perform well in classification, and the combined features achieve the best results. The approach is of value for feature extraction from acoustic signals.
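The feature-extraction steps described in the abstract (smoothing, gradient-orientation histogram, LBP histogram) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the abstract gives no filter size, bin count, or cell layout, so the 3x3 window, the 8 orientation bins, the single global histogram (rather than per-cell HOG blocks), and the function names `mean_filter`, `hog_histogram`, `lbp_histogram`, and `tfr_features` are all assumptions. The constant-Q transform and the linear SVM are assumed to be handled elsewhere; the input `tfr` is any 2-D array with frequency on axis 0 and time on axis 1.

```python
import numpy as np

def mean_filter(tfr, size=3):
    """Smooth a 2-D TFR image with a size x size moving average (edge-padded)."""
    pad = size // 2
    padded = np.pad(tfr, pad, mode="edge")
    out = np.zeros(tfr.shape, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + tfr.shape[0], dx:dx + tfr.shape[1]]
    return out / (size * size)

def hog_histogram(tfr, n_bins=8):
    """Global histogram of unsigned gradient orientations, weighted by magnitude."""
    # axis 0 is assumed to be frequency, axis 1 time
    g_freq, g_time = np.gradient(np.asarray(tfr, dtype=float))
    magnitude = np.hypot(g_freq, g_time)
    angle = np.mod(np.arctan2(g_freq, g_time), np.pi)  # unsigned orientation
    bins = np.minimum((angle / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

def lbp_histogram(tfr):
    """256-bin histogram of basic 8-neighbour LBP codes over interior pixels."""
    h, w = tfr.shape
    center = tfr[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = tfr[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # set the bit when the neighbour is at least as large as the centre
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def tfr_features(tfr):
    """Concatenate HOG-style and LBP histograms of the smoothed TFR image."""
    smoothed = mean_filter(tfr)
    return np.concatenate([hog_histogram(smoothed), lbp_histogram(smoothed)])
```

Under these assumptions each recording yields a 264-dimensional vector (8 orientation bins plus 256 LBP bins), which could then be passed to any linear-kernel SVM implementation for classification.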