Omics data integration & mining.


Description
BK21 BT·IT Integrationist Program. Omics data integration & mining. The Sixth Sino-Japan-Korea Bioinformatics Training Course, Shanghai, March 27-30, 2007. 2007. 3. 29. Sangsoo Kim & KOBIC Omics Team. What is the goal of Biosciences? Ultimately, the complete understanding of life phenomena
Transcripts
Slide 1

BK21 BT·IT Integrationist Program
Omics data integration & mining
The Sixth Sino-Japan-Korea Bioinformatics Training Course, Shanghai, March 27-30, 2007
2007. 3. 29
Sangsoo Kim & KOBIC Omics Team

Slide 2

What is the goal of Biosciences?
Ultimately, the complete understanding of life phenomena:
Complex organization
Regulatory mechanisms (homeostasis)
Growth & development
Energy utilization
Response to environmental stimuli
Reproduction (DNA ensures accurate replication)
Evolution (capacity of species to change over time)

Slide 3

Spider Silk: Stronger than Steel
Life's diversity results from the variety of molecules in cells
A spider's web-building skill depends on its DNA molecules
DNA also determines the structure of silk proteins
These make a spiderweb strong and resilient

Slide 4

The capture strand contains a single coiled silk fiber coated with a sticky fluid
The coiled fiber unwinds to capture prey and then recoils rapidly
Figure labels: coiled fiber of silk protein; coating of capture strand

Slide 5

Evidence from flagelliform silk cDNA for the structural basis of elasticity and modular nature of spider silks. J Mol Biol. 1998 Feb 6;275(5):773-84
They report the cloning of substantial cDNA for flagelliform gland silk protein, which forms the core fiber of the catching spiral
The dominant repeat of this protein is Gly-Pro-Gly-Gly-X, which can appear up to 63 times in tandem arrays
They propose that the spring-like helix is the basis for the elasticity of silk

Slide 6

Central dogma of molecular biology: DNA → RNA → protein

Slide 7

Paradigm Shift in Biosciences
So far, researchers have focused on certain phenotypes and hunted for the responsible genes, one at a time
The new trend is:
Catalog all the parts: genes and proteins (Genomics & Proteomics)
Understand how each part works (Functional Genomics)
Model & simulate the collective behavior of the parts (Systems Biology)

Slide 8

Central dogma of molecular biology: DNA → RNA → protein
Central dogma of bioinformatics and genomics: genome → transcriptome → proteome

Slide 9

[Figure: base pairs of DNA (billions) and sequences (millions) by year, 1982-2002]

Slide 10

With $1,000 genome sequencing technologies expected within 10 years, coupled with functional data, we need better IT solutions!

Slide 11

Proliferation of Genomics
Explosion of data
Human genes: 25,000
Human genome: 3 x 10^9 bp
DNA-protein or protein-protein interactions could increase the amount of data significantly
Chimpanzee, mouse, rat, dog, cow, chicken, insects, worms, plants, fungi, algae, bacteria, archaea, viruses …

Slide 12

Genome Projects (385 completed) as of June 4, 2006
Ongoing projects: 608 eukaryotes, 989 prokaryotes

Slide 13

Top ten challenges for bioinformatics
[1] Precise models of where and when transcription will occur in a genome (initiation and termination)
[2] Precise, predictive models of alternative RNA splicing
[3] Precise models of signal transduction pathways; ability to predict cellular responses to external stimuli
[4] Determining protein:DNA, protein:RNA, protein:protein recognition codes
[5] Accurate ab initio protein structure prediction

Slide 14

Top ten challenges for bioinformatics
[6] Rational design of small molecule inhibitors of proteins
[7] Mechanistic understanding of protein evolution
[8] Mechanistic understanding of speciation
[9] Development of effective gene ontologies: systematic ways to describe gene and protein function
[10] Education: development of bioinformatics curricula
Source: Ewan Birney, Chris Burge, Jim Fickett

Slide 15

Functional Genomics & Systems Biology
New data types:
Sequences
Structures
High-throughput expression profiles in (10,000 x 100) matrix form
Interactions, pathways, networks
Mathematical modeling & simulation of biological processes
Algorithms
Graphical representation

Slide 16

K-JIST 18C 19C 20C

Slide 17

Terminology
DNA: Genome (Genomics)
RNA: Transcriptome (Transcriptomics)
Protein: Proteome (Proteomics)
Metabolite: Metabolome (Metabolomics)
More than 50 -omes, including “Unknownome”
K-JIST

Slide 19

Omics data
In the Omics era, we see a proliferation of genome-wide high-throughput data that are available in public archives:
Comparative genome sequences
Sequence variation & phenotypes
Epigenetics & chromatin structure
Regulatory elements & gene expression
Protein expression, modification & localization
Protein domain, structure, interaction
Metabolic, signaling, regulatory pathways
Drug, toxicogenomics, toxicoproteomics

Slide 20

Joyce et al. Nature Reviews Molecular Cell Biology 7, 198–210 (March 2006) | doi:10.1038/nrm1857

Slide 25

As an example
Suppose you are interested in how well the transcriptional control of CDK2 is conserved. You may need:
Orthologs in various model organisms
Genome alignments of promoter regions among phylogenetic cousins (among mammals or vertebrates; among yeast subspecies)
A Transfac-type TF binding database
ChIP-chip data for each organism
An orthology map of the TFs
and so on
You may add proteome and interactome data
Only part of these are available at NCBI
The rest are available in the public domain as supplementary materials or at the authors' websites
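The CDK2 scenario above amounts to joining an orthology map with per-species TF binding data. The following is a minimal sketch of that join, assuming both datasets have already been loaded into plain dictionaries. The function name `binding_conservation`, the factor name `TF_X` and all data values are hypothetical placeholders, not real results.

```python
def binding_conservation(gene, tf, orthologs, binding):
    """Return the species in which an ortholog of `gene` is bound by an ortholog of `tf`.

    orthologs: {reference_symbol: {species: symbol_in_that_species}}
    binding:   {species: {tf_symbol: set_of_bound_target_symbols}}
    """
    supported = []
    for species in sorted(binding):
        target = orthologs.get(gene, {}).get(species)
        factor = orthologs.get(tf, {}).get(species)
        # Skip species where either the gene or the TF has no known ortholog.
        if target is None or factor is None:
            continue
        if target in binding[species].get(factor, set()):
            supported.append(species)
    return supported


# Hypothetical inputs: "TF_X" is a made-up transcription factor.
orthologs = {
    "CDK2": {"human": "CDK2", "mouse": "Cdk2"},
    "TF_X": {"human": "TF_X", "mouse": "Tf_x"},
}
binding = {
    "human": {"TF_X": {"CDK2"}},
    "mouse": {"Tf_x": {"Cdk2"}},
    "rat": {"Tf_x": {"Cdk2"}},  # ignored: no rat entries in the orthology map
}

conserved_in = binding_conservation("CDK2", "TF_X", orthologs, binding)
print(conserved_in)
```

The sketch only reports where binding evidence exists in both tables. A missing species may mean missing data rather than loss of regulation, which is one reason integrating several sources matters.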

Slide 26

Integration of Omics data
Systematic mining
Cross-knowledge-domain validation
Cross-species interpolation
Generation of hypotheses that can be tested
Biologically very interesting questions require cross-functional data
The best approach

Slide 27

Organization of information

Slide 28

Where to look
Nature provides an omics section: www.nature.com/omics
Science
Cell
PLoS Biology
Genes & Development
Stem Cell
Relevant articles (PubMed, Google Scholar)

Slide 31

ENCyclopedia Of DNA Elements (ENCODE), funded by NHGRI

Slide 32

NHGRI Current Topics in Genome Analysis 2006

Slide 34

ENCODE: Genomes to sequence

Slide 35

Phase 1 of ENCODE
NHGRI's ENCODE project generates such data at a pilot scale
The data are deposited and integrated into the UCSC Genome Browser
It offers data mining capability via the Table Browser
There are no ‘natural links’ among the 3,000+ tables (Ensembl's BioMart is more ‘natural’)
It is up to the users how to join the tables
It is limited to genomic coordinates and not intended for proteome work
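Because the tables share only genomic coordinates, joining two of them usually means finding overlapping intervals. The following is a minimal sketch of such a join, assuming each table has been exported as rows of (chromosome, start, end, name) with 0-based, half-open coordinates as in BED files. The function name `overlap_join` and the example rows are hypothetical.

```python
from collections import defaultdict


def overlap_join(features_a, features_b):
    """Pair every record in features_a with every record in features_b
    that overlaps it on the same chromosome.

    Each record is (chrom, start, end, name); start is inclusive and
    end is exclusive.
    """
    # Index the second table by chromosome so only same-chromosome rows are compared.
    by_chrom = defaultdict(list)
    for rec in features_b:
        by_chrom[rec[0]].append(rec)

    pairs = []
    for chrom, start, end, name in features_a:
        for _, b_start, b_end, b_name in by_chrom[chrom]:
            # Two half-open intervals overlap when each starts before the other ends.
            if start < b_end and b_start < end:
                pairs.append((name, b_name))
    return pairs


# Hypothetical rows standing in for two exported tables.
conserved = [("chr7", 1000, 3000, "cons_1"), ("chr7", 9000, 9500, "cons_2")]
chip_hits = [("chr7", 2500, 2700, "pol2_peak"), ("chr1", 1000, 3000, "other_peak")]

joined = overlap_join(conserved, chip_hits)
print(joined)
```

This nested loop is quadratic per chromosome, which is acceptable for small exports. For genome-scale tables, a sorted sweep or an interval tree would be the usual replacement.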

Slide 36

ENCODE Data Integrated in UCSC Genome Browser

Slide 37

A ~2kb conserved, transcribable, Ac-histone, pol2-binding element in the 1st intron of ST7

Slide 38

Turned out to be a pseudogene!

Slide 39

And also duplicated in other parts of the genome!

Slide 41

Omics Dataset Example

Slide 42

Application Examples
Joyce et al. Nature Reviews Molecular Cell Biology 7, 198–210 (March 2006) | doi:10.1038/nrm1857

Slide 43

Protein-DNA Interaction & Transcriptomics
Yeast rich-medium gene modules network
ChIP-chip location and expression data
106 modules containing 655 genes regulated by 68 TFs
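The idea of combining location and expression data can be illustrated with a much-simplified sketch. It is not the published module-finding algorithm. Genes are first grouped by the exact set of TFs that bind them, and each group is then reduced to the genes whose expression profile correlates with the group average. The function names, the thresholds `min_corr` and `min_size`, and all data values are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def find_modules(binding, expression, min_corr=0.8, min_size=2):
    """Group genes bound by the same TF set, then keep the co-expressed ones.

    binding:    {gene: set of TFs bound at its promoter}
    expression: {gene: list of expression values across conditions}
    """
    groups = defaultdict(list)
    for gene, tfs in binding.items():
        if tfs and gene in expression:
            groups[frozenset(tfs)].append(gene)

    modules = {}
    for tfs, genes in groups.items():
        if len(genes) < min_size:
            continue
        n_conditions = len(expression[genes[0]])
        # Average profile of all genes bound by this TF set.
        centroid = [mean(expression[g][i] for g in genes) for i in range(n_conditions)]
        core = sorted(g for g in genes if pearson(expression[g], centroid) >= min_corr)
        if len(core) >= min_size:
            modules[tfs] = core
    return modules


# Hypothetical binding calls and expression profiles.
binding = {"geneA": {"TF1"}, "geneB": {"TF1"}, "geneC": {"TF1"}, "geneD": {"TF2"}}
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [0.5, 0.4, 0.6, 0.5],
}

modules = find_modules(binding, expression)
print(modules)
```

In this toy input, geneC is bound by TF1 but moves in the opposite direction, so it is left out of the module. That is the kind of case where binding data alone would mislead.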

Slide 44

Protein-DNA Interaction & Transcriptomics

Slide 45

Predicting Protein-Protein Interaction by combining multiple datasets

Slide 46

Predicting Protein-Protein Interaction by combining multiple datasets

Slide 47

Predicting Protein-Protein Interaction by combining multiple datasets
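The slides do not state which method is used to combine the datasets. One common approach, assumed here purely for illustration, is naive Bayes integration: each independent evidence source contributes a likelihood ratio, and the ratios are multiplied onto the prior odds of interaction. The function names and all numbers below are hypothetical.

```python
def combined_odds(prior_odds, likelihood_ratios):
    """Multiply prior odds by one likelihood ratio per evidence source.

    Assumes the evidence sources are conditionally independent,
    which is the naive Bayes simplification.
    """
    odds = prior_odds
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds


def odds_to_probability(odds):
    """Convert odds in favor of an interaction into a probability."""
    return odds / (1.0 + odds)


# Hypothetical values: prior odds of 1 in 600 that a random pair interacts,
# and made-up likelihood ratios for three kinds of evidence.
prior_odds = 1.0 / 600.0
evidence = {
    "co_expression": 5.0,
    "shared_function": 10.0,
    "two_hybrid": 20.0,
}

posterior_odds = combined_odds(prior_odds, evidence.values())
posterior_probability = odds_to_probability(posterior_odds)
print(round(posterior_probability, 3))
```

No single source here would push the pair past even odds, but together they do. When sources are strongly correlated, multiplying their ratios overstates the evidence, so the independence assumption should be checked.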

Slide 48

How to participate
Domain knowledge group:
Monitor papers and websites for relevant data
Collect the omics data and convert them into common formats
Develop hypotheses & mining strategies
Data integration group:
Develop DB schema
Integration with bio-grid & bio-engine
Querying biological concepts
Graphic visualization

Slide 50

Practice Session - Cytoscape
Installation
One of the most widely used and broadly accessible software packages designed to facilitate omics data integration and analysis
Tutorials
Interaction network display
Expression
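For the interaction network display, Cytoscape can import networks in its simple interaction format (SIF), where each line holds a source node, an interaction type and a target node. The following is a minimal sketch that builds such text from a list of edges. The helper name `to_sif` and the gene names are hypothetical, and `pp` and `pd` are used here as conventional labels for protein-protein and protein-DNA interactions.

```python
def to_sif(interactions):
    """Render (source, interaction_type, target) triples as SIF text.

    Fields are tab-separated, one interaction per line.
    """
    lines = [f"{source}\t{kind}\t{target}" for source, kind, target in interactions]
    return "\n".join(lines) + "\n"


# Hypothetical edges with placeholder gene names.
interactions = [
    ("geneA", "pp", "geneB"),  # protein-protein interaction
    ("geneB", "pd", "geneC"),  # protein-DNA interaction
]

sif_text = to_sif(interactions)
print(sif_text, end="")
```

Saving this text to a file with a `.sif` extension should make it loadable through Cytoscape's network import. Expression values would be loaded separately as node attributes.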