I.J. Mathematical Sciences and Computing, 2020, 6, 15-23
Published Online December 2020 in MECS
DOI: 10.5815/ijmsc.2020.06.03

Action Recognition Based on the Modified Two-stream CNN

Dan Zheng^a, Hang Li^a,*, and Shoulin Yin^a,*
^a Software College, Shenyang Normal University, Shenyang 110034, China
Corresponding authors: lihangsoft@163.com; yslinhit@163.com

Received: 20 October 2020; Accepted: 03 November 2020; Published: 08 December 2020

Abstract: Human action recognition is an important research direction in computer vision. Its main task is to simulate the human brain in analyzing and recognizing human action in video, which usually includes individual actions and interactions between people and the external environment. A space-time dual-channel (two-stream) neural network can represent video features from both the spatial and the temporal perspective, and compared with other neural network models it has clear advantages for human action recognition. In this paper, an action recognition method based on an improved space-time two-channel convolutional neural network is proposed. First, the video is divided into several equal-length, non-overlapping segments, and from each segment a frame image representing the static features of the video and a stacked optical flow image representing the motion features are sampled at random. These two kinds of images are then input into the spatial-domain and temporal-domain convolutional neural networks respectively for feature extraction, and the segment features of each video are fused within each channel to obtain the category prediction features of the spatial domain and the temporal domain. Finally, the video action recognition result is obtained by integrating the prediction features of the two channels. Through experiments, various data augmentation methods and transfer learning schemes are discussed to alleviate the over-fitting caused by insufficient training samples, and the effects of different segment numbers, pre-trained networks, segment feature fusion schemes and dual-channel integration strategies on recognition performance are analyzed. The experimental results show that the proposed model can better learn human action features in complex videos and recognize the actions more accurately.

Index Terms: Action recognition, dual-channel, convolutional neural network.
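As a concrete illustration, the segment-based two-stream pipeline summarized in the abstract can be sketched as follows. The backbone tiny_cnn, the segment count SEGMENTS, the flow-stack length FLOW_STACK and the fusion weights below are illustrative placeholders rather than the configuration used in this paper, which employs pre-trained deep CNNs and compares several segment fusion and channel integration schemes.

```python
# Minimal sketch of the segment-based two-stream pipeline (illustrative, not the authors' code).
import torch
import torch.nn as nn

SEGMENTS = 3          # number of equal-length, non-overlapping video segments
FLOW_STACK = 10       # optical-flow frames stacked per segment (x and y -> 2*FLOW_STACK channels)
NUM_CLASSES = 101     # e.g. an action database with 101 categories

def tiny_cnn(in_channels: int) -> nn.Module:
    """Stand-in backbone; the paper uses pre-trained deep CNNs instead."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, NUM_CLASSES),
    )

spatial_net = tiny_cnn(in_channels=3)                # one RGB frame per segment
temporal_net = tiny_cnn(in_channels=2 * FLOW_STACK)  # stacked optical flow per segment

def predict(rgb_frames, flow_stacks, spatial_weight=1.0, temporal_weight=1.5):
    """rgb_frames: (SEGMENTS, 3, H, W); flow_stacks: (SEGMENTS, 2*FLOW_STACK, H, W)."""
    # Per-segment class scores from each channel.
    spatial_scores = spatial_net(rgb_frames)      # (SEGMENTS, NUM_CLASSES)
    temporal_scores = temporal_net(flow_stacks)   # (SEGMENTS, NUM_CLASSES)
    # Fuse segment features within each channel (averaging is one simple choice).
    spatial_video = spatial_scores.mean(dim=0)
    temporal_video = temporal_scores.mean(dim=0)
    # Integrate the two channels by weighted averaging (weights are arbitrary placeholders).
    fused = spatial_weight * spatial_video + temporal_weight * temporal_video
    return fused.softmax(dim=-1)

scores = predict(torch.randn(SEGMENTS, 3, 224, 224),
                 torch.randn(SEGMENTS, 2 * FLOW_STACK, 224, 224))
print(scores.argmax().item())
```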
1. Introduction

When human beings acquire information from the outside world, visual information accounts for about 80% of the total information obtained through the sense organs, and it is of great significance for understanding the nature of things. With the rapid development of the mobile Internet and electronic technology, mobile phones and other video capture devices have become widely available, and Internet short-video applications have sprung up rapidly, greatly reducing the cost of shooting and sharing video. This has led to explosive growth of online video resources. These resources enrich people's lives, but because of their sheer volume and the variety of their content, how to intelligently analyze, understand and recognize these video data has become an urgent challenge [1-5].

Human action recognition is an important research direction in the field of computer vision. Its major objective is to simulate the human brain in analyzing and recognizing human action in videos, which usually includes individual human actions and interactions between humans and the outside environment. Among traditional action recognition methods based on hand-designed features, early features based on human body geometry or motion information are only suitable for recognizing simple body movements in simple scenes, whereas spatio-temporal interest point methods are more effective against relatively complex backgrounds. In these methods, interest points or dense sampling points in space-time are first obtained from the video, and local features are computed over the space-time blocks around these points. The feature vector describing the video action is eventually formed by using classic feature encoding methods such as Bag of Features (BoF), Vector of Locally Aggregated Descriptors (VLAD) or Fisher Vector [6-8].

Currently, among local-feature-based approaches, action recognition methods based on Dense Trajectories (DT) have obtained good results on many public real-scene action databases. They obtain dense trajectories by tracking densely sampled points in each frame of the video, and then compute trajectory features to describe the action in the video. For example, Cai [9] used the multi-view super vector (MVSV) as a global descriptor to encode Dense Trajectory features. Wang [10] encoded improved Dense Trajectory (IDT) features with Fisher Vector (FV) encoding. Peng [11] used Bag of Visual Words (BoVW) to encode space-time interest points or improved dense trajectory features. Based on dense trajectory features, Wang [12] proposed a multi-stage video representation model, MoFAP (Motion Features, Atoms, and Phrases), which represents visual information in a hierarchical manner. Dense trajectories can extract action features with wider coverage and finer granularity, but there is usually a large amount of trajectory redundancy, which limits the recognition effect. Along with the success of deep learning in fields such as speech and image recognition, and especially of the convolutional neural network (CNN), a variety of human action recognition methods
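Both the dense-trajectory methods above and the temporal channel of the two-stream network rely on dense optical flow between consecutive frames. A minimal sketch of building the stacked-flow input for one segment is shown below; OpenCV's Farneback flow and the helper name stacked_flow are assumptions for illustration, as the paper does not prescribe a particular flow algorithm.

```python
# Illustrative sketch: stack dense optical flow for one video segment (assumes a readable video file).
import cv2
import numpy as np

def stacked_flow(video_path: str, start: int, length: int = 10) -> np.ndarray:
    """Return a (2*length, H, W) array of x/y flow fields starting at frame `start`."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    channels = []
    for _ in range(length):
        ok, frame = cap.read()
        if not ok:
            break
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense optical flow between consecutive grayscale frames (Farneback parameters are defaults).
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        channels.extend([flow[..., 0], flow[..., 1]])  # horizontal and vertical components
        prev = curr
    cap.release()
    return np.stack(channels)
```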