Article Abstract
SHEN Lingjie, WANG Wei. Fusion feature based automatic Chinese short tone classification[J]. Technical Acoustics, 2018, 37(2): 167-174
Fusion feature based automatic Chinese short tone classification
Submitted: 2017-05-06  Revised: 2017-08-12
DOI:10.16300/j.cnki.1000-3630.2018.02.013
Keywords: prosodic feature  cepstral feature  Mel-Frequency Cepstral Coefficient (MFCC)  short tone  tone classification  fusion  convolutional neural network
Funding: Supported by the National Social Science Fund of China, General Project in Educational Science (BCA150054)
Author        Affiliation                                                                              E-mail
SHEN Lingjie  School of Education Science, Nanjing Normal University, Nanjing 210097, Jiangsu, China
WANG Wei      School of Education Science, Nanjing Normal University, Nanjing 210097, Jiangsu, China   wangwei5@njnu.edu.cn
Chinese Abstract:
      A method for automatic recognition of Chinese tones in short utterances is proposed, based on the fusion of prosodic features (fundamental frequency and duration) and Mel-frequency cepstral coefficient (MFCC) features, aiming to exploit the advantages of both feature types to improve the recognition rate of Chinese tones in short utterances. The fused feature consists of 7 prosodic features and statistical parameters derived from different models, together with 4 log posterior probabilities computed from the MFCCs of each segment, where Gaussian mixture models are used to represent the distributions of the cepstral features of the four tones. The experiment has two steps. In the first step, the classifiers based on prosodic features and on cepstral features are combined at the decision stage for tone classification; each classifier is assigned a weight, and the weights of the cepstral and prosodic features in the tone classification task are computed. In the second step, the character-level (syllable-level) prosodic features and the frame-level cepstral features are combined into a fused-feature supervector, which is used for Chinese tone recognition; five classifiers (Gaussian mixture models with two configurations, a back-propagation neural network, a support vector machine and a convolutional neural network (CNN)) are compared and evaluated on an imbalanced data set according to three indicators: accuracy, unweighted average recall (UAR) and Cohen's Kappa coefficient. The results show that: (1) the cepstral feature method can improve the recognition rate of Chinese tones, with a weight of 0.11 in the overall classification task; (2) the deep learning (CNN) method based on the fused features achieves the highest tone recognition rate, 87.6%, an improvement of 5.87% over the GMM baseline system. The study demonstrates that the cepstral feature method provides information complementary to the prosodic feature method and thus improves the recognition rate of Chinese tones in short utterances; the method can also be applied to related research such as prosody detection and paralinguistic information detection.
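The abstract describes a feature-level fusion in which 7 syllable-level prosodic statistics are concatenated with 4 segment-level log posterior probabilities obtained from tone-specific GMMs over frame-level MFCCs, plus a weighted decision-level combination in the first step. The sketch below is a minimal, assumption-laden illustration of that pipeline in Python with scikit-learn; the helper names (train_tone_gmms, cepstral_log_posteriors, fusion_vector, weighted_decision_fusion), the GMM component count, the diagonal covariances and the uniform tone prior are illustrative choices, not the paper's exact configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

NUM_TONES = 4  # Mandarin tones 1-4

def train_tone_gmms(mfcc_by_tone, n_components=8):
    """Fit one GMM per tone on the pooled frame-level MFCC vectors of the
    training syllables carrying that tone.  The component count and the
    diagonal covariances are assumptions, not the paper's setup."""
    gmms = []
    for tone in range(NUM_TONES):
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        gmm.fit(mfcc_by_tone[tone])              # shape (n_frames, n_mfcc)
        gmms.append(gmm)
    return gmms

def cepstral_log_posteriors(mfcc_frames, gmms):
    """Map the frame-level MFCCs of one syllable to 4 segment-level log
    posterior probabilities, one per tone GMM (uniform tone prior assumed)."""
    # Average per-frame log-likelihood under each tone model.
    log_likes = np.array([gmm.score(mfcc_frames) for gmm in gmms])
    # Normalise so the four values behave as log posteriors.
    return log_likes - np.logaddexp.reduce(log_likes)   # shape (4,)

def fusion_vector(prosodic_feats, mfcc_frames, gmms):
    """Step 2: concatenate the 7 syllable-level prosodic features
    (F0/duration statistics) with the 4 cepstral log posteriors,
    giving the 11-dimensional fused supervector fed to the classifiers."""
    assert prosodic_feats.shape == (7,)
    return np.concatenate([prosodic_feats,
                           cepstral_log_posteriors(mfcc_frames, gmms)])

def weighted_decision_fusion(p_prosodic, p_cepstral, w_cepstral=0.11):
    """Step 1: decision-level combination of the two classifiers' tone
    posteriors; 0.11 is the cepstral weight reported in the abstract."""
    return (1.0 - w_cepstral) * p_prosodic + w_cepstral * p_cepstral
```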
English Abstract:
      This study proposes an approach to automatically recognizing short Chinese tones based on the fusion of prosodic and cepstral features, aiming to improve the recognition rate of Chinese tones. The fused features include seven prosodic features and their statistical parameters based on different models, as well as four MFCC log posterior probabilities calculated from four Gaussian mixture models (GMM). The experiments have two steps. First, the classifiers based on prosodic features and cepstral features are combined at the decision stage to classify tones, and both classifiers are given weights to examine the contributions of prosodic and cepstral features to tone classification. Second, the seven reduced prosodic features based on different models and the four log posterior probabilities obtained from frame-level MFCCs modeled by Gaussian mixture models are concatenated into a fused feature. Then, the tone classification performances of five classifiers, namely GMM with two configurations, back-propagation neural network (BPNN), support vector machine (SVM) and convolutional neural network (CNN), are compared and evaluated with three indicators: accuracy, unweighted average recall (UAR) and Cohen's Kappa coefficient. The results show that: (1) the cepstral feature method can improve the recognition rate of Chinese tone classification, and the weight of these features in the overall tone classification is 0.11; (2) the deep learning method (CNN) using the fused features outperforms the other classifiers with a recognition rate of 87.6%, which is 5.87% higher than that of the GMM baseline system. This study indicates that cepstral features provide complementary information for tone classification and hence improve the recognition rate. The new method could also be applied to other relevant research on prosody detection and paralinguistic information detection.
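The five classifiers are compared on an imbalanced data set using accuracy, UAR and Cohen's Kappa. As a hedged illustration of how these three indicators can be computed (UAR being the macro-averaged per-tone recall), the snippet below uses scikit-learn; the label arrays are dummy placeholders, not the paper's data.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, recall_score

def evaluate_tone_classifier(y_true, y_pred):
    """Compute the three indicators used to compare the five classifiers
    on the imbalanced tone set: accuracy, unweighted average recall
    (UAR = macro-averaged per-tone recall) and Cohen's Kappa."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "uar": recall_score(y_true, y_pred, average="macro"),
        "kappa": cohen_kappa_score(y_true, y_pred),
    }

# Dummy labels for tones 1-4, purely illustrative:
y_true = [1, 2, 3, 4, 1, 2, 3, 4, 1, 1]
y_pred = [1, 2, 3, 3, 1, 2, 4, 4, 1, 2]
print(evaluate_tone_classifier(y_true, y_pred))
```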