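Article Abstract
HU Tingting, FENG Yaqin, SHEN Lingjie, WANG Wei. The salient feature selection by attention mechanism based LSTM in speech emotion recognition [J]. Technical Acoustics, 2019, 38(4): 414-421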
The salient feature selection by attention mechanism based LSTM in speech emotion recognition
Received: 2018-08-09  Revised: 2018-09-03
DOI:10.16300/j.cnki.1000-3630.2019.04.010
Keywords: feature selection; speech emotion recognition; deep learning; attention mechanism
Funding: National Social Science Fund of China (BCA150054)
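Author | Affiliation | E-mail
HU Tingting | Machine Learning and Cognition Laboratory, School of Education Science, Nanjing Normal University, Nanjing 210097, Jiangsu, China |
FENG Yaqin | Machine Learning and Cognition Laboratory, School of Education Science, Nanjing Normal University, Nanjing 210097, Jiangsu, China |
SHEN Lingjie | Machine Learning and Cognition Laboratory, School of Education Science, Nanjing Normal University, Nanjing 210097, Jiangsu, China |
WANG Wei | Machine Learning and Cognition Laboratory, School of Education Science, Nanjing Normal University, Nanjing 210097, Jiangsu, China | 769370106@qq.com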
Abstract:
      Conventional speech emotion recognition relies on acoustic feature sets that are large and contain many features irrelevant to emotion, so selecting the emotion-related features is important. This study combines an attention mechanism with a Long Short-Term Memory (LSTM) network, performs feature selection according to the attention weights, and evaluates the approach on two datasets. The results show that: (1) the attention-based LSTM improves the recognition rate by 5.4% over the LSTM alone, so the method effectively improves recognition performance; (2) the attention mechanism is an effective feature selection method: it selects a subset of acoustic features with practical physical meaning that, compared with the original common feature set, has fewer dimensions and yields higher recognition accuracy; (3) analysis of the selected features shows that voiced segment length, unvoiced segment length, Mel-frequency cepstral coefficients (MFCC), and fundamental frequency (F0) are strongly correlated with emotion recognition.
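The abstract does not give the details of the selection procedure, so the following is only a minimal sketch of the general idea of ranking features by attention weight. It assumes that the attention layer yields one raw score per feature dimension for each utterance, that the scores are normalized with a softmax, and that features are ranked by their mean weight across utterances. The feature names, the score values, and the function names `softmax` and `select_features` are hypothetical illustrations, not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_features(score_rows, names, k):
    """Average per-utterance attention weights and keep the k highest-weighted features.

    Assumption: each row of score_rows holds one raw attention score per feature.
    """
    weights = [softmax(row) for row in score_rows]
    n = len(weights)
    mean_w = [sum(w[j] for w in weights) / n for j in range(len(names))]
    ranked = sorted(range(len(names)), key=lambda j: mean_w[j], reverse=True)
    return [(names[j], mean_w[j]) for j in ranked[:k]]


# Hypothetical data: one row per utterance, one column per feature.
names = ["voiced_segment_length", "unvoiced_segment_length",
         "mfcc_1", "f0_mean", "jitter"]
scores = [
    [2.0, 1.5, 1.8, 1.7, 0.1],
    [2.2, 1.4, 1.6, 1.9, 0.0],
    [1.9, 1.6, 1.7, 1.8, 0.2],
]

selected = select_features(scores, names, k=4)
for name, weight in selected:
    print(f"{name}: {weight:.3f}")
```

In a sketch like this, the retained subset would then be used to retrain the classifier, which is how a reduced feature set could be compared against the full one; the threshold `k` is a free choice here.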