ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, University of Toronto, kriz@cs.utoronto.ca
Ilya Sutskever, University of Toronto, ilya@cs.utoronto.ca
Geoffrey E. Hinton, University of Toronto, hinton@cs.utoronto.ca

Abstract

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

1 Introduction

Current approaches to object recognition make essential use of machine learning methods. To improve their performance, we can collect larger datasets, learn more powerful models, and use better techniques for preventing overfitting. Until recently, datasets of labeled images were relatively small, on the order of tens of thousands of images (e.g., NORB [16], Caltech-101/256 [8, 9], and CIFAR-10/100 [12]). Simple recognition tasks can be solved quite well with datasets of this size, especially if they are augmented with label-preserving transformations. For example, the current best error rate on the MNIST digit-recognition task (0.3%) approaches human performance [4]. But objects in realistic settings exhibit considerable variability, so to learn to recognize them it is necessary to use much larger training sets. And indeed, the shortcomings of small image datasets have been widely recognized (e.g., Pinto et al. [21]), but it has only recently become possible to collect labeled datasets with millions of images. The new larger datasets include
LabelMe [23], which consists of hundreds of thousands of fully-segmented images, and ImageNet [6], which consists of over 15 million labeled high-resolution images in over 22,000 categories.

To learn about thousands of objects from millions of images, we need a model with a large learning capacity. However, the immense complexity of the object recognition task means that this problem cannot be specified even by a dataset as large as ImageNet, so our model should also have lots of prior knowledge to compensate for all the data we don't have. Convolutional neural networks (CNNs) constitute one such class of models [16, 11, 13, 18, 15, 22, 26]. Their capacity can be controlled by varying their depth and breadth, and they also make strong and mostly correct assumptions about the nature of images (namely, stationarity of statistics and locality of pixel dependencies). Thus, compared to standard feedforward neural networks with similarly sized layers, CNNs have much fewer connections and parameters and so they are easier to train, while their theoretically best performance is likely to be only slightly worse.

Despite the attractive qualities of CNNs, and despite the relative efficiency of their local architecture, they have still been prohibitively expensive to apply in large scale to high-resolution images. Luckily, current GPUs, paired with a highly optimized implementation of 2D convolution, are powerful enough to facilitate the training of interestingly large CNNs, and recent datasets such as ImageNet contain enough labeled examples to train such models without severe overfitting.

The specific contributions of this paper are as follows: we trained one of the largest convolutional neural networks to date on the subsets of ImageNet used in the ILSVRC-2010 and ILSVRC-2012 competitions [2] and achieved by far the best results ever reported on these datasets. We wrote a highly optimized GPU implementation of 2D convolution and all the other operations inherent in training convolutional neural networks, which we make available publicly. Our network contains a number of new and unusual features which improve its performance and reduce its training time, which are detailed in Section 3. The size of our network made overfitting a significant problem, even with 1.2 million labeled training examples, so we used several effective techniques for preventing overfitting, which are described in Section 4. Our final network contains five convolutional and three fully-connected layers, and this depth seems to be important: we found that removing any convolutional layer (each of which contains no more than 1% of the model's parameters) resulted in inferior performance.

In the end, the network's size is limited mainly by the amount of memory available on current GPUs and by the amount of training time that we are willing to tolerate. Our network takes between five and six days to train on two GTX 580 3GB GPUs. All of our experiments suggest that our results can be improved simply by waiting for faster GPUs and bigger datasets to become available.

2 The Dataset

ImageNet is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. The images were collected from the web and labeled by human labelers using Amazon's Mechanical Turk crowd-sourcing tool. Starting in 2010, as part of the Pascal Visual Object Challenge, an annual competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has been held. ILSVRC uses a subset of ImageNet with roughly 1000 images in
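The text above states that the network has 60 million parameters spread over five convolutional and three fully-connected layers, and that each convolutional layer holds only a small share of them. As a rough sanity check of those figures, the sketch below tallies weights and biases per layer. The specific kernel counts and sizes are the commonly cited AlexNet configuration (given later in the full paper) and are reproduced here as assumptions, not derived from this excerpt.

```python
# Hypothetical per-layer parameter tally for the architecture described
# above; the layer shapes below are assumptions (the usual AlexNet
# configuration), not stated in this excerpt.

def conv_params(n_kernels, k, in_channels):
    """Weights (k x k x in_channels per kernel) plus one bias per kernel."""
    return n_kernels * (k * k * in_channels) + n_kernels

def fc_params(n_out, n_in):
    """Weights plus one bias per output unit of a fully-connected layer."""
    return n_out * n_in + n_out

conv = [
    conv_params(96, 11, 3),    # conv1: 96 kernels of 11x11x3
    conv_params(256, 5, 48),   # conv2: input split across the two GPUs
    conv_params(384, 3, 256),  # conv3
    conv_params(384, 3, 192),  # conv4
    conv_params(256, 3, 192),  # conv5
]
fc = [
    fc_params(4096, 6 * 6 * 256),  # fc6: first fully-connected layer
    fc_params(4096, 4096),         # fc7
    fc_params(1000, 4096),         # fc8: feeds the 1000-way softmax
]

total = sum(conv) + sum(fc)
print(total)             # 60965224, i.e. roughly the "60 million" in the text
print(sum(fc) / total)   # ~0.96: the fully-connected layers dominate
```

Under these assumed shapes, the five convolutional layers together account for under 4% of the weights, which is consistent with the paper's point that removing one costs little capacity on paper yet still hurt accuracy, suggesting depth itself matters.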