BK21 BT · IT Integrationist Program. Omics data integration & mining. The Sixth Sino-Japan-Korea Bioinformatics Training Course, Shanghai, March 27-30, 2007. 2007. 3. 29, Sangsoo Kim & KOBIC Omics Team Slide 2
What is the goal of Biosciences? Ultimately, the complete understanding of life phenomena: Complex organization; Regulatory mechanism (homeostasis); Growth & development; Energy utilization; Response to environmental stimuli; Reproduction (DNA ensures accurate replication); Evolution (capacity of species to change over time) Slide 3
Spider Silk: Stronger than Steel. Life's diversity results from the variety of molecules in cells. A spider's web-building skill depends on its DNA molecules. DNA also determines the structure of silk proteins. These make a spiderweb strong and resilient. Slide 4
The capture strand contains a single coiled silk fiber coated with a sticky fluid. The coiled fiber unwinds to capture prey and then recoils rapidly. Figure labels: Coiled fiber of silk protein; Coating of capture strand Slide 5
Evidence from flagelliform silk cDNA for the structural basis of elasticity and modular nature of spider silks. J Mol Biol. 1998 Feb 6;275(5):773-84. They report the cloning of substantial cDNA for flagelliform gland silk protein, which forms the core fiber of the catching spiral. The dominant repeat of this protein is Gly-Pro-Gly-Gly-X, which can appear up to 63 times in tandem arrays. They propose that the spring-like helix is the basis for the elasticity of silk. Slide 6
Central dogma of molecular biology: DNA → RNA → protein Slide 7
Paradigm Shift in Biosciences. So far, researchers have focused on certain phenotypes and hunted the genes responsible, one at a time. The new trend is: Catalog all the parts (genes and proteins); Understand how each part works; Model & simulate the collective behavior of the parts. Genomics & Proteomics; Functional Genomics; Systems Biology Slide 8
Central dogma of bioinformatics and genomics: genome → transcriptome → proteome. Central dogma of molecular biology: DNA → RNA → protein Slide 9
[Figure: growth of sequence data, 1982-2002. Axes: base pairs of DNA (billions) and sequences (millions) versus year.] Slide 10
With $1,000 genome sequencing technologies arriving within 10 years, coupled with functional data, we need better IT solutions! Slide 11
Proliferation of Genomics. Explosion of data. Human genes: 25,000. Human genome: 3x10^9 bp. DNA-protein or protein-protein interactions could increase the data dramatically. Chimpanzee, mouse, rat, dog, cow, chicken, insects, worms, plants, fungi, algae, bacteria, archaea, viruses … Slide 12
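To give a rough sense of scale for the figures on this slide, the arithmetic can be sketched as below. The 2-bits-per-base encoding and the idea of counting every unordered gene pair as a candidate interaction are assumptions made for illustration; they are not stated in the slides.

```python
# Back-of-the-envelope sizes for the numbers quoted on the slide.
HUMAN_GENES = 25_000      # from the slide
GENOME_BP = 3 * 10**9     # from the slide


def unordered_pairs(n: int) -> int:
    """Number of distinct unordered pairs among n items: n(n-1)/2."""
    return n * (n - 1) // 2


def genome_megabytes(bp: int, bits_per_base: int = 2) -> float:
    """Raw storage in MB, assuming a fixed number of bits per base (2 here)."""
    return bp * bits_per_base / 8 / 1_000_000


candidate_pairs = unordered_pairs(HUMAN_GENES)
genome_mb = genome_megabytes(GENOME_BP)

print(candidate_pairs)  # 312487500 candidate gene-gene pairs
print(genome_mb)        # 750.0 MB for the raw sequence
```

Under these assumptions the raw sequence is modest in size, while the space of pairwise interactions is several orders of magnitude larger than the gene count itself.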
Genome Projects (385 completed) as of June 4, 2006. Ongoing projects: 608 eukaryotes, 989 prokaryotes Slide 13
Top ten challenges for bioinformatics: Precise models of where and when transcription will occur in a genome (initiation and termination); Precise, predictive models of alternative RNA splicing; Precise models of signal transduction pathways, with the ability to predict cellular responses to external stimuli; Determining protein:DNA, protein:RNA, protein:protein recognition codes; Accurate ab initio protein structure prediction Slide 14
Top ten challenges for bioinformatics (continued): Rational design of small molecule inhibitors of proteins; Mechanistic understanding of protein evolution; Mechanistic understanding of speciation; Development of effective gene ontologies: systematic ways to describe gene and protein function; Education: development of bioinformatics curricula. Source: Ewan Birney, Chris Burge, Jim Fickett Slide 15
Functional Genomics & Systems Biology. New data types: Sequences; Structures; High-throughput expression profiles in (10,000 x 100) matrix form; Interactions, Pathways, Networks. Mathematical modeling & simulation of biological processes. Algorithms. Graphical representation Slide 16
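As a minimal sketch of what an expression profile in matrix form looks like, the block below builds a genes-by-samples matrix and compares two gene profiles with a Pearson correlation. The values are random numbers generated for illustration, the matrix is scaled down to 1,000 x 100 to keep it quick, and the choice of Pearson correlation is an assumption rather than something the slides prescribe.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


random.seed(0)
N_GENES, N_SAMPLES = 1_000, 100  # scaled-down stand-in for 10,000 x 100

# Rows are genes, columns are samples (synthetic values, not real data).
matrix = [[random.gauss(0, 1) for _ in range(N_SAMPLES)] for _ in range(N_GENES)]

# Make gene 1 a noisy copy of gene 0 to mimic a co-expressed pair.
matrix[1] = [value + random.gauss(0, 0.1) for value in matrix[0]]

r_related = pearson(matrix[0], matrix[1])    # close to 1
r_unrelated = pearson(matrix[0], matrix[2])  # close to 0
```

Each row is one gene's profile across all samples, so any row-versus-row comparison, such as the correlation here, is a first step toward grouping genes by expression.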
K-JIST 18C 19C 20CSlide 17
Terminology: Genome, Transcriptome, Proteome, Metabolome; Genomics, Transcriptomics, Proteomics, Metabolomics; DNA, RNA, Protein, Metabolite. K-JIST. More than 50 -omes, including "Unknownome" Slide 19
Omics data. In the Omics era, we see a proliferation of genome-wide high-throughput data that are available in public archives: Comparative genome sequences; Sequence variation & phenotypes; Epigenetics & chromatin structure; Regulatory elements & gene expression; Protein expression, modification & localization; Protein domain, structure, interaction; Metabolic, signaling, regulatory pathways; Drug, toxicogenomics, toxicoproteomics Slide 20
Joyce et al. Nature Reviews Molecular Cell Biology 7, 198–210 (March 2006) | doi:10.1038/nrm1857 Slide 21
Joyce et al. Nature Reviews Molecular Cell Biology 7, 198–210 (March 2006) | doi:10.1038/nrm1857 Slide 22
Joyce et al. Nature Reviews Molecular Cell Biology 7, 198–210 (March 2006) | doi:10.1038/nrm1857 Slide 23
As an example, suppose you are interested in how well CDK2 transcriptional regulation is conserved. You may need: Orthologs in other model organisms; Genome alignments of promoter regions among phylogenetic relatives (among mammals or vertebrates; among yeast subspecies); A TRANSFAC-type TF binding database; ChIP-chip data for each organism; An orthology map of the TFs; and so on. You may add proteome and interactome data. Only part of these are available at NCBI. The rest are available in the public domain as supplementary materials or at the authors' web sites. Slide 26
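The kind of cross-species question raised on this slide can be sketched as a small join between an orthology map and ChIP-chip style binding records. Every identifier below (`hs_target`, `mm_tf1`, and so on) is a hypothetical placeholder, and the conservation score (the fraction of species in which a TF group binds the target's ortholog) is an assumption chosen for illustration, not a method taken from the slides.

```python
# All identifiers are hypothetical placeholders, not real database records.

# species -> ID of the target gene's ortholog in that species
target_orthologs = {"human": "hs_target", "mouse": "mm_target", "rat": "rn_target"}

# (species, TF ID) -> orthology group of that TF
tf_groups = {
    ("human", "hs_tf1"): "TF1",
    ("mouse", "mm_tf1"): "TF1",
    ("rat", "rn_tf1"): "TF1",
    ("human", "hs_tf2"): "TF2",
    ("mouse", "mm_tf2"): "TF2",
}

# ChIP-chip style records: (species, TF ID, bound gene ID)
binding = [
    ("human", "hs_tf1", "hs_target"),
    ("mouse", "mm_tf1", "mm_target"),
    ("rat", "rn_tf1", "rn_target"),
    ("human", "hs_tf2", "hs_target"),
    ("mouse", "mm_tf2", "mm_other"),  # binds a different gene, so not counted
]


def binding_conservation(target_orthologs, tf_groups, binding):
    """Fraction of species in which each TF group binds the target's ortholog."""
    species_seen = {}
    for species, tf, gene in binding:
        if target_orthologs.get(species) != gene:
            continue  # record is about some other gene
        group = tf_groups.get((species, tf))
        if group is not None:
            species_seen.setdefault(group, set()).add(species)
    total = len(target_orthologs)
    return {group: len(hits) / total for group, hits in species_seen.items()}


scores = binding_conservation(target_orthologs, tf_groups, binding)
print(scores)
```

In this toy input TF1 binds the target in all three species and TF2 in only one, which is the distinction a conservation analysis would try to draw once the real datasets are collected and mapped to common identifiers.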
Integration of Omics data. Systematic mining. Cross-knowledge-domain validation. Cross-species interpolation. Generation of hypotheses that can be tested. Biologically very interesting questions require cross-functional data. The best approach Slide 27
Organization of informationSlide 28
Where to look: Nature provides an omics section (www.nature.com/omics); Science; Cell; PLoS Biology; Genes & Development; Stem Cell; Relevant articles (PubMed, Google Scholar) Slide 31
ENCyclopedia Of DNA Elements (ENCODE), funded by NHGRI Slide 32
NHGRI Current Topics in Genome Analysis 2006Slide 33
NHGRI Current Topics in Genome Analysis 2006Slide 34
ENCODE: Genomes to sequence Slide 35
Phase 1 of ENCODE. NHGRI's ENCODE project generates such data at a pilot scale. The data are deposited and integrated into the UCSC Genome Browser. It offers data mining capability via the Table Browser. There are no "natural links" among the 3,000+ tables (Ensembl's BioMart is more "natural"). It is up to the users how to join the tables. It is limited to genomic coordinates and is not intended for proteome work. Slide 36
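Because the tables share only genomic coordinates, one way to join two of them is by interval overlap. The sketch below assumes each table has been exported as `(chrom, start, end, name)` rows with 0-based, half-open coordinates; the row names and positions are made up, and the nested-loop join is a simple illustration rather than how the Table Browser itself works.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def overlap_join(table_a, table_b):
    """Pair up rows of two coordinate tables that overlap on the same chromosome."""
    by_chrom = {}
    for chrom, start, end, name in table_b:
        by_chrom.setdefault(chrom, []).append((start, end, name))

    joined = []
    for chrom, start, end, name in table_a:
        for b_start, b_end, b_name in by_chrom.get(chrom, []):
            if overlaps(start, end, b_start, b_end):
                joined.append((name, b_name))
    return joined


# Made-up rows standing in for two exported tables.
conserved = [("chr7", 1000, 3000, "cons_1"), ("chr7", 9000, 9500, "cons_2")]
pol2_peaks = [
    ("chr7", 2500, 2800, "pol2_peak_1"),
    ("chr7", 3000, 3200, "pol2_peak_2"),  # starts where cons_1 ends: no overlap
    ("chr1", 1000, 3000, "pol2_peak_3"),  # same positions, different chromosome
]

pairs = overlap_join(conserved, pol2_peaks)
print(pairs)  # [('cons_1', 'pol2_peak_1')]
```

For tables with many rows, a sorted sweep or an interval tree would replace the inner loop, but the join condition stays the same.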
ENCODE Data Integrated in UCSC Genome BrowserSlide 37
A ~2 kb conserved, transcribed, Ac-histone, pol2-binding element in the 1st intron of ST7 Slide 38
Turned out to be a pseudogene! Slide 39
And also duplicated in other parts of the genome! Slide 41
Omics Dataset ExampleSlide 42
Application Examples. Joyce et al. Nature Reviews Molecular Cell Biology 7, 198–210 (March 2006) | doi:10.1038/nrm1857 Slide 43
Protein-DNA Interaction & Transcriptomics. Yeast rich-medium gene modules network. ChIP-chip location and expression data. 106 modules containing 655 genes regulated by 68 TFs Slide 44
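The idea of combining ChIP-chip location data with expression data to form gene modules can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the method behind the module counts on the slide: the p-value thresholds, the correlation cutoff, and all gene and TF names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def build_module(tf, binding_p, expression, strict=0.001, relaxed=0.01, min_r=0.8):
    """Genes bound by `tf` whose expression follows the strongly bound core."""
    # Step 1: core genes pass the strict binding threshold.
    core = [g for g, p in binding_p[tf].items() if p <= strict and g in expression]
    if not core:
        return []
    # Step 2: average the core profiles into a module center.
    n_samples = len(expression[core[0]])
    center = [sum(expression[g][i] for g in core) / len(core) for i in range(n_samples)]
    # Step 3: admit genes with weaker binding only if they co-express with the core.
    return sorted(
        g
        for g, p in binding_p[tf].items()
        if g in expression and p <= relaxed and pearson(expression[g], center) >= min_r
    )


# Hypothetical toy inputs: binding p-values per TF and expression profiles per gene.
binding_p = {"TF_A": {"g1": 0.0001, "g2": 0.0005, "g3": 0.005, "g4": 0.005, "g5": 0.5}}
expression = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],  # weakly bound and anti-correlated: excluded
    "g4": [1, 2, 3, 5],  # weakly bound but co-expressed: included
    "g5": [1, 2, 3, 4],  # co-expressed but not bound: excluded
}

module = build_module("TF_A", binding_p, expression)
print(module)  # ['g1', 'g2', 'g4']
```

The point of the sketch is that neither data type alone decides membership: g4 is rescued by expression despite weaker binding, while g5 is rejected despite a matching profile because there is no binding evidence.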
Protein-DNA Interaction & TranscriptomicsSlide 45
Predicting Protein-Protein Interaction by combining multiple datasets Slide 46
Predicting Protein-Protein Interaction by combining multiple datasets Slide 47
Predicting Protein-Protein Interaction by combining multiple datasets Slide 48
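One common way to combine several weak lines of evidence into a single interaction score is a naive Bayes style product of likelihood ratios. The slides do not say which method their figures used, so the block below is only a sketch of that general technique; the prior odds, the likelihood ratios, and the evidence source names are hypothetical values chosen for illustration.

```python
import math

# Hypothetical likelihood ratios: P(observation | interacting) / P(observation | not).
LIKELIHOOD_RATIOS = {
    "coexpression": {True: 8.0, False: 0.5},
    "shared_annotation": {True: 5.0, False: 0.7},
    "two_hybrid": {True: 20.0, False: 0.9},
}

PRIOR_ODDS = 1 / 600  # hypothetical prior odds that a random pair interacts


def combined_log_odds(evidence, likelihood_ratios=LIKELIHOOD_RATIOS, prior_odds=PRIOR_ODDS):
    """Add log likelihood ratios to the log prior odds (sources assumed independent)."""
    log_odds = math.log(prior_odds)
    for source, observed in evidence.items():
        log_odds += math.log(likelihood_ratios[source][observed])
    return log_odds


def posterior_probability(log_odds):
    """Convert log odds back to a probability."""
    odds = math.exp(log_odds)
    return odds / (1 + odds)


strong_pair = {"coexpression": True, "shared_annotation": True, "two_hybrid": True}
weak_pair = {"coexpression": False, "shared_annotation": False, "two_hybrid": False}

p_strong = posterior_probability(combined_log_odds(strong_pair))
p_weak = posterior_probability(combined_log_odds(weak_pair))
```

With these made-up numbers, no single source lifts a pair above even odds on its own, but agreement among all three does, which is the motivation for integrating multiple datasets. The independence assumption is a simplification and tends to overstate confidence when sources are correlated.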
How to participate. Domain knowledge group: Monitor papers and web sites for relevant data; Collect the omics data and convert them into common formats; Develop hypotheses & mining strategies. Data integration group: Develop the DB schema; Integration with bio-grid & bio-engine; Querying biological concepts; Graphic visualization Slide 50
Practice Session - Cytoscape Installation. One of the most widely used and broadly accessible software packages designed to facilitate omics data integration and analysis. Tutorials: Interaction network display; Expression
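For the practice session, a network can be prepared for import by writing it in Cytoscape's simple interaction format (SIF), where each line holds a source node, an interaction type, and a target node separated by tabs. The sketch below assumes a toy edge list with hypothetical node names; `pp` and `pd` are used here as conventional labels for protein-protein and protein-DNA interactions.

```python
def to_sif(edges):
    """Render (source, interaction, target) triples as tab-separated SIF text."""
    lines = [f"{source}\t{interaction}\t{target}" for source, interaction, target in edges]
    return "\n".join(lines) + "\n"


# Hypothetical toy network: one TF binding two genes whose products interact.
edges = [
    ("TF_A", "pd", "gene1"),
    ("TF_A", "pd", "gene2"),
    ("gene1", "pp", "gene2"),
]

sif_text = to_sif(edges)
print(sif_text)
```

Saving `sif_text` to a file with a `.sif` extension gives something that can be loaded through Cytoscape's network import; the exact menu path depends on the Cytoscape version installed.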