LETTER

Predicting Gene Regulatory Elements in Silico on a Genomic Scale

Alvis Brazma, Inge Jonassen, Jaak Vilo⁴ and Esko Ukkonen

Genome Res. 1998 8: 1202-1215
Access the most recent version at doi:10.1101/gr.8.11.1202
© 1998 by Cold Spring Harbor Laboratory Press. ISSN 1054-9803/98 $5.00

(EMBL) Outstation–Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; ²Department of Informatics, University of Bergen, Høyteknologisenteret, N5020 Bergen, Norway; ³Department of Computer Science, FIN-00014 University of Helsinki, Helsinki, Finland

⁴Corresponding author. E-MAIL vilo@cs.helsinki.fi; FAX 358970844441.

We performed a systematic analysis of gene upstream regions in the yeast genome for occurrences of regular expression-type patterns with the goal of identifying potential regulatory elements. To achieve this goal, we have developed a new sequence pattern discovery algorithm that searches exhaustively for a priori unknown regular expression-type patterns that are over-represented in a given set of sequences. We applied the algorithm in two cases: (1) discovery of patterns in the complete set of 6000 sequences taken upstream of the putative yeast genes and (2) discovery of patterns in the regions upstream of the genes with similar expression profiles. In the first case, we looked for patterns that occur more frequently in the gene upstream regions than in the genome overall. In the second case, we first clustered the upstream regions of all the genes by similarity of their expression profiles on the basis of publicly available gene expression data and then looked for sequence patterns that are over-represented in each cluster. In both cases we considered each pattern that occurred in at least some minimum number of sequences, and rated the patterns on the basis of their over-representation. Among the highest rating patterns, most have matches to substrings in known yeast transcription factor-binding sites. Moreover, several of them are known to be relevant to the expression of the genes from the respective clusters. Experiments on simulated data show that the majority of the discovered patterns are not expected to occur by chance.

Completely sequenced genomes, together with the emerging DNA microarray technologies enabling the measurement of the gene expression levels in cell cultures (Schena et al. 1995; for a survey, see Ramsay 1998), are opening new possibilities for studying gene regulation. The sequencing of the first eukaryotic genome (the yeast Saccharomyces cerevisiae) was completed in 1996 (Goffeau et al. 1996; Mewes et al. 1997). Data about the expression levels of almost all of the ~6000 yeast genes were obtained during 1997 (DeRisi et al. 1997; Velculescu et al. 1997; Wodicka et al. 1997). In particular, DeRisi et al. (1997) measured the relative expression levels of the yeast genes at seven consecutive time points (in 2-hr intervals) during a shift from anaerobic to aerobic metabolism (diauxic shift). They showed that some of the genes that are known to be involved in metabolic pathways related to the diauxic shift underwent a very significant change in their expression level during the shift. By treating the expression measurements as a time series, it is possible to cluster genes according to similarities in their expression profiles. It may be hypothesized that at least some of the genes in a cluster are regulated by similar mechanisms.

The transcription regulation mechanisms in eukaryotic genomes are not well understood. Evidently, however, an essential role is played by transcription factors, which can bind to particular DNA sequences, called transcription factor-binding sites, believed to be about 5–25 bp long. In yeast, these sites are usually within several hundred base pairs upstream of the respective ORFs (Mellor 1993). Regular expression-type patterns, as well as nucleotide distribution matrices, have both been used for describing transcription factor-binding sites (e.g., see Bucher 1990; Ghosh 1990; Chen et al. 1995; Wingender et al. 1996). Inference of such descriptions from the sequences that are assumed to contain a site for a particular transcription factor is a difficult problem, as the consensus of the different binding sites of the same transcription factor is often rather weak. Algorithms have been proposed for inferring such descriptions from sets of a relatively small number of sequences (about 20) in which all ... (e.g., see Stormo and Hartzell 1989; Wolfertstetter et al. 1996; van Helden et al. 1998). More recently, van Helden et al. (1998) and Yada et al. (1998) have proposed methods for the discovery of putative transcription factor-binding sites from larger data sets. Yada et al. (1998) applied their method to analyze about 400 human promoter sequences. Apparently,
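
The rating step described in the abstract (keep each pattern that occurs in at least some minimum number of sequences, then rate it by its over-representation) can be illustrated with a small sketch. This is not the authors' algorithm: it assumes plain fixed-length substrings instead of regular expression-type patterns, and it assumes a simple ratio of sequence fractions as the rating. The function names `sequences_containing` and `rate_patterns`, and the parameters `k`, `min_seqs` and `pseudocount`, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def sequences_containing(seqs, k):
    """For every substring of length k, count how many sequences contain it."""
    counts = Counter()
    for s in seqs:
        # A set ensures each sequence contributes at most once per substring.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def rate_patterns(target, background, k=4, min_seqs=2, pseudocount=1.0):
    """Rate substrings by over-representation in target versus background.

    Assumed rating (not taken from the paper): the fraction of target
    sequences containing the substring divided by the smoothed fraction of
    background sequences containing it.
    """
    in_target = sequences_containing(target, k)
    in_background = sequences_containing(background, k)
    rated = []
    for pattern, n in in_target.items():
        if n < min_seqs:
            continue  # pattern occurs in too few target sequences
        frac_target = n / len(target)
        frac_background = (in_background[pattern] + pseudocount) / (
            len(background) + pseudocount
        )
        rated.append((frac_target / frac_background, pattern, n))
    # Highest rating first; ties are broken alphabetically by pattern.
    rated.sort(key=lambda item: (-item[0], item[1]))
    return rated


# Toy data: "ACGT" is planted in every hypothetical upstream region.
upstream = ["AACGTTT", "GGACGTA", "TACGTCC"]
genome_sample = ["AAAAAAA", "GGGGGGG", "TTTTTTT", "CCCCCCC"]
top = rate_patterns(upstream, genome_sample)
```

In the toy data only `ACGT` occurs in at least two of the three upstream sequences, so it is the only pattern that survives the `min_seqs` filter. Extending the sketch to wildcards or character groups would require enumerating a much larger pattern space, which is the part an exhaustive search algorithm has to handle efficiently.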