Hadoop & MongoDB

Hadoop and MongoDB
Introduction of OLTP, Hadoop and MongoDB
Presenter: Yubo

Agenda
• OLTP and Traditional RDBMS
• Big Data Challenges for RDBMS
  – Data storage, retrieval and processing challenge
  – Data analysis challenge
• MongoDB
• Hadoop
  – Hadoop Distributed File System
  – MapReduce

OLTP

What is OLTP
Online transaction processing, or OLTP, is a class of information systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. It involves gathering input information, processing the information and updating existing information to reflect the gathered and processed information.
OLTP systems are generally built on a traditional RDBMS and have been very successful over the past decade. So in many cases, when we say OLTP, we mean the traditional RDBMS and the applications built on it.

Use Cases:
• Online banking
• CRM system
• OA (office automation) system
• Salesforce

Advantages of RDBMS
• There are many mature RDBMS products, such as Oracle, SQL Server and MySQL.
• They have mature algorithms for storing and retrieving data efficiently at small and medium data volumes.
• They have built-in ACID properties to ensure the reliability and accuracy of business transactions.
• They offer flexible index mechanisms to speed up data retrieval.

Challenges for Traditional RDBMS

Data Volume Challenge
In recent years there has been an explosion of data. A variety of newer sources, including Global Positioning Systems (GPS), automated trackers and monitoring systems, are generating a lot of data. These data sets can grow to hundreds of terabytes, which an RDBMS struggles to store and process.

Semi-structured Challenge
In parallel with the fast data growth, data is also becoming increasingly semi-structured and sparse. This means that traditional data management techniques built around upfront schema definition and relational references are also being questioned.

The quest to solve these problems (storing, retrieving and processing large, semi-structured data efficiently) led to the emergence of a newer class of database products called NoSQL databases. MongoDB is one database of that kind.

Data Analysis/Computing Challenge
The exponential growth of data also presents a challenge for data analysis. Companies such as Google, Yahoo and Amazon need to go through terabytes and even petabytes of data to figure out which websites, products and campaigns were popular and what kinds of ads appealed to people.
Google was the first to publicize MapReduce, a system they had used to scale their data processing needs. Doug Cutting saw an opportunity and led the charge to develop an open source version of this MapReduce system called Hadoop. Soon after, Yahoo and others rallied around to support this effort. Today, Hadoop is a core part of the computing infrastructure for many web companies, such as Yahoo, Facebook, LinkedIn, and Twitter.

Now, let's give a simple introduction to MongoDB and Hadoop.

What is Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
The project includes these modules:
• Hadoop Common: The common utilities that support the other Hadoop modules.
• Hadoop Distributed File System: A distributed file system that provides high-throughput access to application data.
• Hadoop YARN: A framework for job scheduling and cluster resource management.
• Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

Requirement: Google wants to classify search keywords by frequency.

Result:
KeyWords      Count
Hadoop        4213423
C#            543345
T-SQL         64354
...           ...

Source: All the web log files

Challenges:
1. More than 100 billion log files, terabytes of data: how do we store the data that needs to be analyzed?
2. How do we analyze and compute over such a large amount of data?

Storage Solutions:
• Scale up? Use a supercomputer?
  – So expensive!
  – Not easy to scale any further
  – I/O is still a bottleneck for later processing

Storage Solutions:
• Let's scale out
  – The cluster may have thousands of commodity PCs
  – Split and distribute the data among the cluster
  – How do we handle a corrupted data node?
  – Split each file into blocks, so that distribution is based on file blocks instead of whole files
  – Replicate blocks automatically
  – Where is the metadata?

That's the Hadoop Distributed File System.

Advantages:
• File blocks are distributed among the cluster automatically
• Replication ensures that data is not lost if one data node goes down
• The user does not need to know anything about the distribution and replication
• Easy to scale out by adding more machines to the cluster

Disadvantages:
• Replication costs more storage
• Single point of failure limitation

Calculation Challenge:
• Move the data to a centralized location and then calculate?
  – I/O and the network will be the bottleneck
  – The client node that does the calculation will be the bottleneck

Calculation Solution:
• So let's distribute the calculation instead of the data
  – Every data node needs an agent to do the calculation locally
  – There should be a master node to schedule the distributed calculation and collect the intermediate results
  – It should be easy to scale out and easy to program

That is MapReduce!

What is MapReduce:
MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. It is inspired by the map and reduce primitives present in functional languages.
Map: apply some operation to every item in a list and output another list.
Reduce: perform some kind of aggregation on a list and output another list.

Input:
Key   Value
1     C#
2     SQL Server
3     Hadoop
4     SQL Server
5     Hadoop
6     Hadoop

After Map:
Key          Value
C#           1
SQL Server   1
Hadoop       1
SQL Server   1
Hadoop       1
Hadoop       1

After Reduce:
Key          Value
C#           1
SQL Server   2
Hadoop       3

Calculate the search keyword frequency using MapReduce

Key               Value
Log20120909.txt   File content

Key          Value
C#           4321
SQL Server   54523
H
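
The six-row keyword example above can be sketched in plain Python. This is only a single-process illustration of the map, group and reduce steps, not Hadoop's actual API. The function names map_phase, shuffle and reduce_phase are made up for this sketch, and the input is the sample table from the slide.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (keyword, 1) pair for every input record."""
    return [(keyword, 1) for _key, keyword in records]

def shuffle(pairs):
    """Group intermediate values by keyword.
    In a real cluster the framework does this between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the grouped values to get one count per keyword."""
    return {key: sum(values) for key, values in groups.items()}

# Sample input from the slide: (record key, search keyword)
records = [
    (1, "C#"), (2, "SQL Server"), (3, "Hadoop"),
    (4, "SQL Server"), (5, "Hadoop"), (6, "Hadoop"),
]

mapped = map_phase(records)
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'C#': 1, 'SQL Server': 2, 'Hadoop': 3}
```

On a cluster, each data node would run the map step on its local file blocks, and only the small (keyword, count) pairs would travel over the network to the reducers.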
