Convolutional Neural Networks for Speech Recognition
Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu
IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 22, No. 10, October 2014

Organization of the paper
• Introduction
• Deep Neural Networks
• Convolutional Neural Networks
• CNN with limited weight sharing
• Experiments
• Conclusions

Deep Neural Networks
• Generally speaking, a deep neural network (DNN) refers to a feedforward neural network with more than one hidden layer. Each hidden layer has a number of units (or neurons), each of which takes all outputs of the lower layer as input, multiplies them by a weight vector, sums the result, and passes it through a non-linear activation function such as sigmoid or tanh.
• All neuron activations in each layer can be represented in the following matrix form: o^(l) = σ(W^(l) o^(l−1) + b^(l)), where σ(·) is applied element-wise.
• For a multi-class classification problem, the posterior probability of each class can be estimated using an output softmax layer: p(s|o) = exp(a_s) / Σ_{s'} exp(a_{s'}), where a_s is the activation of output unit s.
• In the hybrid DNN-HMM model, the DNN output layer computes the state posterior probabilities, which are divided by the states' priors to estimate the observation likelihoods.
• Because of the increased model complexity of DNNs, a pretraining algorithm is often needed, which initializes all weight matrices prior to back-propagation training.
• One popular method to pretrain DNNs uses the restricted Boltzmann machine (RBM).
• An RBM has a set of hidden units that are used to compute a better feature representation of the input data. After learning, all RBM weights can be used as a good initialization for one DNN layer.

Convolutional Neural Networks
• The convolutional neural network (CNN) can be regarded as a variant of the standard neural network. Instead of using fully connected hidden layers as described in the preceding section, the CNN introduces a special network structure consisting of convolution and pooling layers.
• This section covers:
  • Organization of the Input Data to the CNN
  • Convolution Ply
  • Pooling Ply
  • Learning Weights in the CNN
  • Treatment of Energy Features
  • The Overall CNN Architecture
  • Benefits of CNNs for ASR

Organization of the Input Data to the CNN
• In using the CNN for pattern recognition, the input data need to be organized as a number of feature maps to be fed into the CNN.
• We need to use inputs that preserve locality along both the frequency and time axes.
• MFSC features, along with their deltas and delta-deltas, will be used to represent each speech frame, in order to describe the acoustic energy distribution in each of several different frequency bands.
• As for time, a single window of input to the CNN will consist of a wide amount of context. As for frequency, the conventional use of MFCCs does present a major problem because the discrete cosine transform projects the spectral energies onto a new basis that may not maintain locality. In this paper, we use the log-energy computed directly from the mel-frequency spectral coefficients (i.e., with no DCT), which we denote as MFSC features.
• (Figure: the input features can be arranged either as a number of one-dimensional (1-D) feature maps or as three 2-D feature maps.)

Convolution Ply
• Every input feature map is connected to many feature maps in the convolution ply based on a number of local weight matrices. The mapping can be represented as the well-known convolution operation in signal processing.
• Each unit of one feature map in the convolution ply can be computed as q_{j,m} = σ( Σ_{i=1..I} Σ_{n=1..F} o_{i,n+m−1} w_{i,j,n} + w_{0,j} ), where I is the number of input feature maps and F is the filter size.
• Written in a more concise matrix form: q_j = σ( Σ_{i=1..I} o_i ∗ w_{i,j} ), where ∗ denotes the convolution operation.

Pooling Ply
• The pooling ply is also organized into feature maps, with the same number of feature maps as its convolution ply, but each map is smaller.
• This reduction is achieved by applying a pooling function to several units in a local region. It is usually a simple function such as maximization or averaging; for example, max pooling with pooling size G and shift s computes p_{j,m} = max_{n=1..G} q_{j,(m−1)s+n}.
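To make the convolution and pooling plies concrete, below is a minimal NumPy sketch of their forward computation along the frequency axis only. All shapes and parameter values (number of maps, filter size F, pooling size G, shift s) are illustrative assumptions, not settings from the paper.

```python
import numpy as np

def convolution_ply(O, W, b, sigma=np.tanh):
    """O: (I, B) -- I input feature maps over B frequency bands.
    W: (J, I, F) -- J output maps, each with an I x F local weight matrix.
    b: (J,) -- one bias per output map. Returns Q: (J, B - F + 1)."""
    I, B = O.shape
    J, _, F = W.shape
    Q = np.empty((J, B - F + 1))
    for j in range(J):
        for m in range(B - F + 1):
            # Local weighted sum over all input maps and F adjacent bands.
            Q[j, m] = np.sum(W[j] * O[:, m:m + F]) + b[j]
    return sigma(Q)

def max_pooling_ply(Q, G=2, s=2):
    """Max pooling with pooling size G and shift s along frequency."""
    J, M = Q.shape
    positions = (M - G) // s + 1
    return np.stack([Q[:, m * s:m * s + G].max(axis=1)
                     for m in range(positions)], axis=1)

# Illustrative usage: 3 input maps (static, delta, delta-delta) over 40
# mel-frequency bands, 8 convolution feature maps with filter size 5.
rng = np.random.default_rng(0)
O = rng.standard_normal((3, 40))
W = 0.1 * rng.standard_normal((8, 3, 5))
b = np.zeros(8)
P = max_pooling_ply(convolution_ply(O, W, b))
print(P.shape)  # (8, 18): 36 convolution positions pooled pairwise
```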
Learning Weights in the CNN
• All weights in the convolution ply can be learned using the same error back-propagation algorithm, but some special modifications are needed to take care of the sparse connections and weight sharing.
• To do so, let us first represent the convolution operation above in the same mathematical form as a fully connected ANN layer, so that the same learning algorithm can be similarly applied.
• Since the pooling ply has no weights, no learning is needed there. However, the error signals should be back-propagated to lower plies through the pooling function.
• That is, the error signal reaching the lower convolution ply is obtained by passing each pooling unit's error back through the pooling function; with max pooling, for example, the error is routed only to the unit that produced the maximum.

Treatment of Energy Features
• In ASR, log-energy is usually calculated per frame and appended to the other spectral features. In a CNN, it is not suitable to treat energy the same way, since it is the sum of the energy in all frequency bands and therefore does not depend on frequency. Instead, the log-energy features should be appended as extra inputs to all convolution units.

The Overall CNN Architecture
• In this paper, we follow the hybrid ANN-HMM framework, where we use a softmax output layer on top of the topmost layer of the CNN to compute the posterior probabilities for all HMM states. These posteriors are used to estimate the likelihood of all HMM states per frame by dividing by the states' prior probabilities. Finally, the likelihoods of all HMM states are sent to a Viterbi decoder to recognize the continuous stream of speech units.

Benefits of CNNs for ASR
• The CNN has three key properties: locality, weight sharing, and pooling.
• Locality in the units of the convolution ply allows more robustness against non-white noise where some bands are cleaner than others. This is because good features can be computed locally from the cleaner parts of the spectrum, and only a smaller number of features are affected by the noise.
• Weight sharing can also improve model robustness and reduce overfitting, as each weight is learned from multiple frequency bands in the input instead of just from one single location.
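As a closing illustration of the hybrid scoring step from "The Overall CNN Architecture" above, here is a small Python sketch converting softmax state posteriors into the scaled likelihoods a Viterbi decoder consumes. The function name, the uniform priors, and all shapes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def posteriors_to_scaled_likelihoods(posteriors, priors, floor=1e-8):
    """Turn per-frame state posteriors p(s|x_t) from the softmax layer into
    scaled observation likelihoods p(x_t|s) proportional to p(s|x_t) / p(s),
    computed in the log domain for numerical stability.

    posteriors: (T, S) array of softmax outputs, one row per frame
    priors:     (S,) state prior probabilities, typically estimated from
                state-frequency counts over the training alignments
    returns:    (T, S) scaled log-likelihoods for the decoder"""
    return np.log(np.maximum(posteriors, floor)) - np.log(np.maximum(priors, floor))

# Illustrative usage with random numbers standing in for real CNN outputs:
rng = np.random.default_rng(0)
T, S = 5, 10                                   # 5 frames, 10 HMM states
logits = rng.standard_normal((T, S))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
priors = np.full(S, 1.0 / S)                   # uniform stand-in priors
log_likes = posteriors_to_scaled_likelihoods(posteriors, priors)
print(log_likes.shape)                         # (5, 10), ready for Viterbi
```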