Enhanced VQ-Based Algorithms for Speaker Identification: Foreign Literature Translation (.doc)
Document information:
Title:
Enhanced VQ-based Algorithms for Speech Independent Speaker Identification
Authors:
Ningping Fan, Justinian Rosca
Source:
"Audio- and Video-based Biometric Person Authentication, International Conference, AVBPA, Guildford, UK, June", 2003, 2688:
470-477
Word count:
English: 1869 words, 9708 characters; Chinese: 3008 characters
Original text:
Enhanced VQ-based Algorithms for Speech Independent Speaker Identification
Abstract. Weighted distance measure and discriminative training are two different approaches to enhance VQ-based solutions for speaker identification. To account for varying importance of the LPC coefficients in SV, the so-called partition normalized distance measure successfully used normalized feature components. This paper introduces an alternative, called heuristic weighted distance, to lift up higher order MFCC feature vector components using a linear formula. Then it proposes two new algorithms combining the heuristic weighting and the partition normalized distance measure with group vector quantization discriminative training to take advantage of both approaches. Experiments using the TIMIT corpus suggest that the new combined approach is superior to current VQ-based solutions (50% error reduction). It also outperforms the Gaussian Mixture Model using the Wavelet features tested in a similar setting.
1. Introduction
Vector quantization (VQ) based classification algorithms play an important role in speech independent speaker identification (SI) systems. Although in its baseline form the VQ-based solution is less accurate than the Gaussian Mixture Model (GMM), it offers simplicity in computation. For a large database of hundreds or thousands of speakers, both accuracy and speed are important issues. Here we discuss VQ enhancements aimed at accuracy and fast computation.
1.1 VQ Based Speaker Identification System
Fig. 1 shows the VQ based speaker identification system. It contains an offline training sub-system to produce VQ codebooks and an online testing sub-system to generate the identification decision. Both sub-systems contain a preprocessing or feature extraction module to convert an audio utterance into a set of feature vectors. Features of interest in the recent literature include the Mel-frequency cepstral coefficients (MFCC), the line spectral pairs (LSP), the wavelet packet parameter (WPP), and PCA and ICA features. Although the WPP and ICA have been shown to offer advantages, we used MFCC in this paper to focus our attention on other modules of the system.
Fig. 1. A VQ-based speaker identification system features an online sub-system for identifying a testing audio utterance, and an offline training sub-system, which uses training audio utterances to generate a codebook for each speaker in the database.
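A VQ codebook normally consists of centroids of partitions over a speaker's feature vector space. The effects on SI of different partition clustering algorithms, such as the LBG and the RLS, have been studied. The average error or distortion of the feature vectors $x_1, \dots, x_T$ of an utterance of length T with the codebook of speaker k is given by
$$e_k = \frac{1}{T}\sum_{t=1}^{T}\min_{1 \le j \le S} d(x_t, c_{k,j}), \quad k = 1, \dots, L \qquad (1)$$
d(.,.) is a distance function between two vectors. $c_{k,j}$ is the j-th code of dimension D. S is the codebook size. L is the total number of speakers in the database. The baseline VQ algorithm of SI simply uses the LBG to generate codebooks and the square of the Euclidean distance as d(.,.).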
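The baseline scoring described by Equation (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and the toy codebooks are hypothetical, LBG training is not shown, and the decision rule (choose the speaker whose codebook gives the smallest average distortion) is the usual VQ convention, assumed here.

```python
def sq_euclidean(x, c):
    """Square of the Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def average_distortion(vectors, codebook, dist=sq_euclidean):
    """Mean, over the test vectors, of the distance to the nearest code."""
    return sum(min(dist(x, c) for c in codebook) for x in vectors) / len(vectors)


def identify(vectors, codebooks, dist=sq_euclidean):
    """Assumed decision rule: speaker with the smallest average distortion."""
    return min(codebooks, key=lambda k: average_distortion(vectors, codebooks[k], dist))


# Toy data (hypothetical): two speakers, codebook size S = 2, dimension D = 2.
codebooks = {
    "spk_a": [[0.0, 0.0], [1.0, 1.0]],
    "spk_b": [[5.0, 5.0], [6.0, 4.0]],
}
test_vectors = [[0.1, -0.1], [0.9, 1.2]]
winner = identify(test_vectors, codebooks)
```

Passing a different `dist` function is how a weighted distance could be substituted for the squared Euclidean distance without changing the scoring loop.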
Many improvements to the baseline VQ algorithm have been published. Among them, there are two independent approaches:
(1) choose a weighted distance function, such as the F-ratio and IHM weights, the Partition Normalized Distance Measure (PNDM), and the Bhattacharyya Distance;
(2) explore the discrimination power of inter-speaker characteristics using the entire set of speakers, such as the Group Vector Quantization (GVQ) discriminative training, and the Speaker Discriminative Weighting.
Experimentally we have found that PNDM and GVQ are two very effective methods in each of the groups respectively.
1.2 Review of Partition Normalized Distance Measure
The Partition Normalized Distance Measure is defined as the square of the weighted Euclidean distance.
$$d(x, c_{k,j}) = \sum_{i=1}^{D} w_{k,j,i}\,(x_i - c_{k,j,i})^2 \qquad (2)$$
The weighting coefficients $w_{k,j,i}$ are determined by minimizing the average error of the training utterances of all the speakers, subject to the constraint that the geometric mean of the weights for each partition is equal to 1.
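Let $x_{k,j}$ be a random training feature vector of speaker k, which is assigned to partition j via the minimization process in Equation (1). It has mean and variance vectors (the square is taken component-wise):
$$\mu_{k,j} = E[x_{k,j}], \qquad \sigma^2_{k,j} = E[(x_{k,j} - \mu_{k,j})^2] \qquad (3)$$
The constrained optimization criterion to be minimized in order to derive the weights is
$$J = \sum_{k=1}^{L}\sum_{j=1}^{S}\left[\sum_{i=1}^{D} w_{k,j,i}\,\sigma^2_{k,j,i} - \lambda_{k,j}\left(\prod_{i=1}^{D} w_{k,j,i} - 1\right)\right] \qquad (4)$$
where L is the number of speakers, and S is the codebook size. Letting
$$\frac{\partial J}{\partial w_{k,j,i}} = 0 \quad \text{and} \quad \frac{\partial J}{\partial \lambda_{k,j}} = 0 \qquad (5)$$
we have
$$\lambda_{k,j} = \left(\prod_{i=1}^{D} \sigma^2_{k,j,i}\right)^{1/D} \quad \text{and} \quad w_{k,j,i} = \frac{\lambda_{k,j}}{\sigma^2_{k,j,i}} \qquad (6)$$
where subscript i is the feature vector component index, and k and j are the speaker and partition indices respectively. Because k and j appear on both sides of the equations, the weights depend only on the data from one partition of one speaker.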
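A minimal Python sketch of how such weights could be computed for a single partition follows. It relies only on two properties stated in the text: the geometric mean of the weights within a partition is 1, and each weight is inversely proportional to the variance of its feature component. The function names and the toy partition are hypothetical, and every component is assumed to have strictly positive variance.

```python
import math


def pndm_weights(partition_vectors):
    """Weights for one partition of one speaker: proportional to 1 / variance,
    scaled so that their geometric mean equals 1.
    Assumes every component has strictly positive variance."""
    n = len(partition_vectors)
    dims = len(partition_vectors[0])
    means = [sum(v[i] for v in partition_vectors) / n for i in range(dims)]
    variances = [
        sum((v[i] - means[i]) ** 2 for v in partition_vectors) / n
        for i in range(dims)
    ]
    # Geometric mean of the variances, computed in the log domain.
    geo_mean = math.exp(sum(math.log(s) for s in variances) / dims)
    return [geo_mean / s for s in variances]


def weighted_sq_distance(x, code, weights):
    """Square of the weighted Euclidean distance."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, code))


# Toy partition (hypothetical): component 0 varies four times as much as component 1.
partition = [[0.0, 0.0], [2.0, 1.0], [4.0, 2.0]]
weights = pndm_weights(partition)
```

In this toy case the high-variance component is down-weighted and the low-variance component is up-weighted, while the product of the weights stays at 1.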
1.3 Review of Group Vector Quantization
Discriminative training uses the data of all the speakers to train the codebook, so that it can achieve more accurate identification results by exploring the inter-speaker differences. The GVQ training algorithm is described as follows.
Group Vector Quantization Algorithm:
(1) Randomly choose a speaker j.
(2) Select N vectors [...]
(3) Calculate the error for all the codebooks.
If the following conditions are satisfied, go to (4):
a) [...], but [...];
b) [...], where W is a window size;
Else go to (5).
(4) For each [...] where [...]
(5) For each [...], [...], where [...]
2. Enhancements
We propose the following steps to further enhance the VQ based solution:
(1) a Heuristic Weighted Distance (HWD),
(2) combination of HWD and GVQ, and
(3) combination of PNDM and GVQ.
2.1 Heuristic Weighted Distance
The PNDM weights are inversely proportional to the partition variances of the feature components, as shown in Equation (6). It has been shown that variances of cepstral [...]. Clearly [...], where i is the vector element index, which reflects the frequency band. The higher the index, the smaller the feature value and its variance.
We considered a Heuristic Weighted Distance as
(7)
The weights are calculated by
(8)
where c(S,D) is a function of both the codebook size S and the feature vector dimension D. For a given codebook, S and D are fixed, and thus c(S,D) is a constant. The value of c(S,D) is estimated experimentally by performing an exhaustive search to achieve the maximum identification rate on a given sample test dataset.
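The exhaustive search for c(S,D) can be illustrated with a short Python sketch. The exact weight formula is not recoverable from this text, so the linear form used below (weight 1 for the first component, growing by c per index) is purely an assumption chosen to match the description "a linear formula" that lifts higher order components. All function names, the candidate grid, and the toy data are hypothetical.

```python
def hwd_weights(c, dims):
    """Hypothetical linear weighting: 1 for component 0, growing by c per index."""
    return [1.0 + c * i for i in range(dims)]


def hwd(x, code, weights):
    """Weighted squared distance using the heuristic weights."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, code))


def identification_rate(c, utterances, codebooks):
    """Fraction of labelled utterances assigned to the correct speaker."""
    correct = 0
    for speaker, vectors in utterances:
        w = hwd_weights(c, len(vectors[0]))

        def distortion(k):
            return sum(min(hwd(x, code, w) for code in codebooks[k])
                       for x in vectors) / len(vectors)

        if min(codebooks, key=distortion) == speaker:
            correct += 1
    return correct / len(utterances)


def search_c(candidates, utterances, codebooks):
    """Exhaustive search: the candidate with the highest identification rate
    (the first such candidate wins ties)."""
    return max(candidates, key=lambda c: identification_rate(c, utterances, codebooks))


# Toy data (hypothetical): the second component separates the speakers better.
codebooks = {"A": [[0.0, 0.0]], "B": [[2.0, 1.0]]}
utterances = [("A", [[1.8, 0.0]]), ("B", [[2.0, 0.9]])]
best_c = search_c([0.0, 1.0, 3.0], utterances, codebooks)
```

Because S and D are fixed for a given codebook, the search only has to be run once per (S, D) pair on a held-out speaker set, after which the selected constant is reused at test time.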
2.2 Combination of HWD and GVQ
Combination of the HWD and the GVQ is achieved by simply replacing the original square of the Euclidean distance with the HWD of Equation (7), and adjusting the GVQ updating parameter α whenever needed.
2.3 Combination of PNDM and GVQ
Combining PNDM with the GVQ requires slightly more work, because the GVQ alters the partition and thus its component variance. We have used the following algorithm to overcome this problem.
Algorithm to Combine PNDM with the GVQ Discriminative Training:
(1) Use the LBG algorithm to generate initial LBG codebooks;
(2) Calculate PNDM weights using the LBG codebooks, and produce PNDM weighted LBG codebooks, which are LBG codebooks appended with the PNDM weights;
(3) Perform GVQ training with the PNDM distance function, and generate the initial PNDM+GVQ codebooks by replacing the LBG codes with the GVQ codes;
(4) Recalculate PNDM weights using the PNDM+GVQ codebooks, and produce the final PNDM+GVQ codebooks by replacing the old PNDM weights with the new ones.
3. Experimental Comparison of VQ-based Algorithms
3.1 Testing Data and Procedures
168 speakers in the TEST section of the TIMIT corpus are used for the SI experiment, and 190 speakers from DR1, DR2, DR3 of the TRAIN section are used for estimating the c(S,D) parameter. Each speaker has 10 good quality recordings at 16 kHz, 16 bits/sample, stored as WAVE files in NIST format. Two of them, SA1.WAV and SA2.WAV, are used for testing, and the rest for training codebooks. We did not perform silence removal on the WAVE files, so that others could reproduce the environment without the additional complication of VAD algorithms and their parameters.
An MFCC program converts all the WAVE files in a directory into one feature vector file, in which every feature vector is indexed with its speaker and recording. For each value of the feature vector dimension, D = 30, 40, 50, 60, 70, 80, 90, one training file and one testing file are created. They are used by all the algorithms to train codebooks of size S = 16, 32, 64, and to perform the identification test, respectively.
The MFCC feature vectors are calculated as follows:
1) divide the entire utterance into blocks of size 512 samples with 256 overlapping;
2) perform pre-emphasis filtering with coefficient 0.97;
3) multiply with a Hamming window, and perform a short-time FFT;
4) apply the standard mel-frequency triangular filter banks to the square of the magnitude of the FFT;
5) apply the logarithm to the sum of all the outputs of each individual filter;
6) apply the DCT on the entire set of data resulting from all filters;
7) drop the zeroth coefficient, to produce the cepstral coefficients;
8) after all the blocks have been processed, calculate the mean over the entire time duration and subtract it from the cepstral coefficients;
9) calculate the 1st order time derivatives of the cepstral coefficients, and concatenate them after the cepstral coefficients, to form a feature vector.
For example, a filter bank of size 16 will produce 30-dimensional feature vectors.
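The nine steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' MFCC program. Several details are assumptions that the text does not specify: the mel-scale formula 2595·log10(1 + f/700), the DCT-II form, a central-difference delta clamped at the utterance edges, per-block pre-emphasis that leaves the first sample unchanged, and a small floor added before the logarithm. A naive DFT stands in for the FFT only to keep the example free of dependencies; all function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / rate
             for i in range(n_filters + 2)]  # edge positions in fractional FFT bins
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(n_fft // 2 + 1):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame, twiddle):
    """Squared DFT magnitude (naive DFT; an FFT routine would normally be used)."""
    n = len(frame)
    return [abs(sum(x * twiddle[(k * t) % n] for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def mfcc_with_deltas(signal, rate=16000, n_filters=16, size=512, hop=256, pre=0.97):
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (size - 1)) for i in range(size)]
    twiddle = [cmath.exp(-2j * math.pi * i / size) for i in range(size)]
    bank = mel_filterbank(n_filters, size, rate)
    cepstra = []
    for start in range(0, len(signal) - size + 1, hop):        # step 1: blocks
        block = signal[start:start + size]
        block = [block[0]] + [block[i] - pre * block[i - 1]
                              for i in range(1, size)]           # step 2: pre-emphasis
        frame = [x * w for x, w in zip(block, window)]           # step 3: Hamming
        power = power_spectrum(frame, twiddle)                   # step 3: |DFT|^2
        log_e = [math.log(sum(h * p for h, p in zip(row, power)) + 1e-10)
                 for row in bank]                                # steps 4-5
        # Steps 6-7: DCT-II over the filter outputs, skipping coefficient 0.
        cepstra.append([sum(e * math.cos(math.pi * q * (m + 0.5) / n_filters)
                            for m, e in enumerate(log_e))
                        for q in range(1, n_filters)])
    n, dims = len(cepstra), n_filters - 1
    means = [sum(c[i] for c in cepstra) / n for i in range(dims)]
    cepstra = [[c[i] - means[i] for i in range(dims)] for c in cepstra]  # step 8
    features = []
    for t in range(n):                                           # step 9: deltas
        prev, nxt = cepstra[max(t - 1, 0)], cepstra[min(t + 1, n - 1)]
        features.append(cepstra[t] + [(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return features


# Usage: a short 440 Hz test tone sampled at 16 kHz (hypothetical input).
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1280)]
features = mfcc_with_deltas(tone)
```

With a filter bank of size 16, each vector holds 15 cepstral coefficients followed by 15 deltas, which matches the 30-dimensional example in the text.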
Due to project time constraints, the HWD parameter c(S,D) was estimated at S = 16, 32, 64 and D = 40, 80, so that it achieves the highest identification rate using the 190-speaker dataset of the TRAIN section. For other values of S and