Speech Coding: Fundamentals and Applications

Mark Hasegawa-Johnson (1) and Abeer Alwan (2)
(1) University of Illinois at Urbana-Champaign
(2) University of California at Los Angeles

Handbook of Telecommunications
Copyright (c) 2003 by John Wiley and Sons, Inc. This material is used by permission of John Wiley and Sons, Inc.

Abstract

In this chapter, we present an overview of the most widely used algorithms, standards, and applications of wideband and narrowband speech coding. Algorithms for speech coding are classified into four broad headings: (1) Waveform coding techniques (including PCM, companded PCM, and DPCM), which are typically used for land-line telephony, internet telephony, and secure military communications. (2) Sub-band coding, including perceptually transparent multi-rate and embedded coding, which is mainly used for internet and digital audio applications. (3) Linear predictive analysis-by-synthesis coding (LPC-AS) algorithms, including multi-pulse LPC, CELP, SELP, VSELP, and low-delay CELP, which are typically used for digital cellular telephony. (4) LPC vocoders, including advanced vocoder algorithms (e.g., MELP, MBE, and PWI), which are used for applications such as secure telephony and satellite telephony. Applications in areas such as voice over IP (VoIP) and digital cellular are emerging and require a speech coder to gracefully adapt to rapidly changing channel conditions, a need which is met by embedded and multi-rate speech coders associated with joint source-channel coding algorithms. Measures of speech coder perceptual quality include subjective measures of intelligibility (DRT and DALT) and naturalness (MOS and DAM), as well as objective measures such as segmental SNR, Bark spectral distortion, PSQM, and PESQ. Speech coding standards are set by organizations including the ITU (for land-line telephony), MPEG (for multimedia applications), ETSI (for European digital cellular), TIA (for U.S. digital cellular), and DDVPC (for United States military applications).

Keywords: speech coding, PCM, sub-band coding, CELP, LPC, digital cellular, multimedia, voice over IP, mean opinion score (MOS).

1 Introduction

Speech coding is the process of obtaining a compact representation of voice signals for efficient transmission over band-limited wired and wireless channels and/or storage. Today, speech coders have become essential components in telecommunications and in the multimedia infrastructure. Commercial
systems which rely on efficient speech coding include cellular communication, voice over internet protocol (VoIP), videoconferencing, electronic toys, archiving, digital simultaneous voice and data (DSVD), as well as numerous PC-based games and multimedia applications.

Speech coding is the art of creating a minimally redundant representation of the speech signal which can be efficiently transmitted or stored in digital media, and decoding the signal with the best possible perceptual quality. Like any other continuous-time signal, speech may be represented digitally through the processes of sampling and quantization; speech is typically quantized using either 16-bit uniform or 8-bit companded quantization. Like many other signals, however, a sampled speech signal contains a great deal of information which is either redundant (non-zero mutual information between successive samples in the signal) or perceptually irrelevant (information which is not perceived by human listeners). Most telecommunications coders are lossy, meaning that the synthesized speech is perceptually similar to the original but may be physically dissimilar.

Speech Coder Class   Rates (kbps)   Complexity   Standardized Applications      Section
Waveform coders      16-64          Low          Land-line telephone            2
Subband coders       12-256         Medium       Teleconferencing, Audio        3
LPC-AS               4.8-16         High         Digital cellular               4
LPC vocoder          2.0-4.8        High         Satellite telephony, Military  5

Table 1: Characteristics of standardized narrowband speech coding algorithms in each of four broad categories.

A speech coder converts a digitized speech signal into a coded representation, which is usually transmitted in frames. A speech decoder receives coded frames and synthesizes reconstructed speech. Standards typically dictate the input-output relationships of both coder and decoder. The input-output relationship is specified using a reference implementation, but novel implementations are allowed, provided that input-output equivalence is maintained.

Speech coders differ primarily in bit rate (measured in bits/sample or bits/second), complexity (measured in operations/second), delay (measured in milliseconds between recording and playback), and perceptual quality of the synthesized speech. Narrowband (NB) coding refers to coding of speech signals whose bandwidth is less than 4 kHz (8 kHz sampling rate), while wideband (WB) coding refers to coding of 7 kHz-bandwidth signals (14-16 kHz sampling rate). NB coding is more common than WB coding mainly because of the narrowband nature of the wireline telephone channel (300-3600 Hz). Recently, however, there has been an increased effort in wideband speech coding because of several applications such as videoconferencing.

There are different types of speech coders. Table 1 summarizes the bit rates, algorithmic complexity, and standardized applications of the four general classes of coders described in this chapter; Table 2 at the end of the chapter lists a selection of specific speech coding standards. Waveform coders attempt to code the exact shape of the speech signal waveform, without considering the nature of human speech production and speech perception. These coders are high-bit-rate coders (typically above 16 kbps). Linear-Prediction Coders (LPC) assume that the speech signal is the output of a linear time invariant (LTI) model of speech production. The transfer function of that model is assumed to be all-pole (autoregressive model). The excitation function is a quasi-periodic signal constructed from discrete pulses (1-8 per pitch period), pseudo-random noise, or some combination of the two. If the excitation is only generated a
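
As an illustration of the all-pole model of speech production described in this paragraph, the following Python sketch passes a pulse-train excitation through an autoregressive filter. It is a minimal sketch and is not taken from the chapter: the function names, the choice of one pulse per pitch period, the single predictor coefficient 0.9, and the sign convention A(z) = 1 - sum_k a_k z^-k are all assumptions made for illustration.

```python
def pulse_train(n_samples, pitch_period):
    """Quasi-periodic excitation: one unit pulse per pitch period (assumed)."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]


def all_pole_synthesis(excitation, a):
    """Filter the excitation through the all-pole system 1/A(z).

    Assumed convention: A(z) = 1 - sum_{k=1..p} a[k-1] * z^-k, so that
    s[n] = e[n] + sum_{k=1..p} a[k-1] * s[n-k].
    """
    s = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:          # samples before n = 0 are taken as zero
                acc += a_k * s[n - k]
        s.append(acc)
    return s


# Hypothetical values, for illustration only.
a = [0.9]                           # one stable pole at z = 0.9
excitation = pulse_train(8, 4)      # pulses at n = 0 and n = 4
speech = all_pole_synthesis(excitation, a)
```

With a single pole, each pulse produces an exponentially decaying response, and the response to the second pulse adds to the tail of the first. A real coder would estimate a higher-order predictor from each speech frame, and could replace or mix the pulse train with pseudo-random noise.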