SNP Resources: Finding SNPs Discovery and Databases


Description
SNP Resources: Finding SNPs Discovery and Databases. Mark J. Rieder, PhD. SeattleSNPs Workshop, March 20-21, 2006. SNP Resources: SNP discovery and cataloging. SNP discovery/genotyping: genome-wide approaches. The current state of SNP resources. Comprehensive SNP discovery.
Transcripts
Slide 1

SNP Resources: Finding SNPs Discovery and Databases Mark J. Rieder, PhD SeattleSNPs Workshop March 20-21, 2006

Slide 2

SNP Resources: SNP discovery and cataloging
SNP discovery/genotyping: genome-wide approaches
  The current state of SNP resources
Comprehensive SNP discovery
  SeattleSNPs - Program for Genomic Applications
SNP Databases - "How to" manual for finding SNPs
  In class - Tutorial

Slide 3

Genetic Markers: Overview
1. RFLPs (SNPs circa 1980) - sequence variations that cause a change in a restriction site, detected by Southern blot analysis
2. Microsatellites (di-, tri-, tetranucleotide repeats)
  1/50,000 bp
  Linkage studies - 300-600 markers (~1 Mbp)
  Multi-allelic / high heterozygosity / informative
  Complex genotyping assays
3. Single Nucleotide Polymorphisms (SNPs)
  Most common genetic variation (base substitutions)
  1/1000 bp (comparing two randomly chosen chromosomes)
  Biallelic / less informative
  Simplified genotyping platforms (+/- calling)

Slide 4

Development of a genome-wide SNP map: how many SNPs?
Nickerson and Kruglyak, Nature Genetics, 2001
~10 million common SNPs (> 1-5% MAF) - 1/300 bp
How has SNP discovery progressed toward this goal?

Slide 5

Finding SNPs: Marker Discovery and Methods
SNP discovery has proceeded in two distinct phases:
1 - SNP Identification
  Define the alleles
  Map this to a unique place in the genome
2 - SNP Characterization
  Determination of the genotype in many individuals
  Population frequency of SNPs

Slide 6

Finding SNPs: Marker Discovery and Methods
SNP discovery has proceeded in two distinct phases:
1 - SNP Discovery**
  Human Genome Project - overlaps of BAC clones and ESTs
  The SNP Consortium - Reduced Representation Sequencing
  The HapMap
2 - SNP Discovery/Characterization**
  The HapMap

Slide 7

Finding SNPs: Sequence-based SNP Mining
How would you find SNPs if you don't have a reference sequence?
[Diagram: sources of overlapping sequence]
  Genomic: BAC library - BAC overlap
  Genomic: RRS library - shotgun overlap
  mRNA: cDNA library - EST overlap
Caveats: RT errors? DNA sequencing quality
Sequence overlap - SNP discovery:
  GTTACGCCAATACAG G ATCCAGGAGATTACC
  GTTACGCCAATACAG C ATCCAGGAGATTACC

Slide 8

Finding SNPs: Sequence-based SNP Mining
BAC = Bacterial Artificial Chromosome
Primary vector for DNA cloning in the HGP
  DNA from multiple individuals
  Clone large fragments into BACs (unknown sequence)
  Fragment DNA, sequence and reassemble (known sequence)
  Assembly with other overlapping BACs
  GTTACGCCAATACAG G ATCCAGGAGATTACC
  GTTACGCCAATACAG C ATCCAGGAGATTACC

Slide 9

Finding SNPs: Marker Discovery and Methods
$45 million - 2 years (1999, 2001 - 2003)
Goals:
  Identify 300,000 SNPs and map 150,000 (April 1999)
  Determine allele frequency of SNPs
If you don't have a reference genome, how would you find SNPs?

Slide 10

Finding SNPs: Sequence-based SNP Mining
RRS = Reduced Representation Sequencing
Altshuler, et al. Nature (2000)
  Genomic DNA (multiple individuals)
  RE (restriction enzyme) digest to create fragments
  Clone DNA fragments into plasmid vectors
  Sequence, align and cluster
  GTTACGCCAATACAG G ATCCAGGAGATTACC
  GTTACGCCAATACAG C ATCCAGGAGATTACC
  From the overlap, detect mismatches = SNPs
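The overlap step on this slide can be illustrated with a minimal sketch. It assumes the two reads are already aligned to the same length and it ignores base quality and indels, which real pipelines must handle; the function name `find_mismatches` is made up for this example.

```python
def find_mismatches(seq_a, seq_b):
    """Return (position, base_a, base_b) for every mismatch between two
    already-aligned, equal-length sequences (positions are 0-based)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# The two overlapping reads shown on the slide.
read_1 = "GTTACGCCAATACAG" + "G" + "ATCCAGGAGATTACC"
read_2 = "GTTACGCCAATACAG" + "C" + "ATCCAGGAGATTACC"

candidates = find_mismatches(read_1, read_2)
print(candidates)  # [(15, 'G', 'C')]
```

A single mismatch is only a candidate SNP: it could equally be a sequencing or cloning error, which is why the later slides distinguish discovered SNPs from validated ones.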

Slide 11

TSC and HGP: High Resolution SNP Map Feb. 2001 - Human Genome Project and TSC

Slide 12

Development of a genome-wide SNP map: how many SNPs?
Nickerson and Kruglyak, Nature Genetics, 2001
~10 million common SNPs (> 1-5% MAF) - 1/300 bp
Feb 2001 - 1.42 million (1/1900 bp)

Slide 13

SNP Discovery: dbSNP database dbSNP - NCBI SNP database

Slide 14

dbSNP processing of SNPs
SNP data submitted to dbSNP:
  SNPs submitted by the research community (submitted SNPs = ss#)
  Clustering
  Unique mapping to a genome location (reference SNP = rs#)
  Validation (by 2hit-2allele)
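The ss# to rs# relationship can be pictured as grouping submissions by the location they map to. The sketch below is only an illustration of that idea, not dbSNP's actual pipeline: the identifiers, the tuple layout, and the function `cluster_submissions` are all hypothetical.

```python
from collections import defaultdict


def cluster_submissions(submissions, first_rs_number=1):
    """Group submitted SNPs (ss#) that map to the same genome location.

    submissions: iterable of (ss_id, chromosome, position) tuples.
    Returns {rs_id: [ss_id, ...]}. The rs numbers are invented here and
    assigned in order of first appearance, purely for illustration.
    """
    by_location = defaultdict(list)
    for ss_id, chromosome, position in submissions:
        by_location[(chromosome, position)].append(ss_id)

    clusters = {}
    for offset, members in enumerate(by_location.values()):
        clusters[f"rs{first_rs_number + offset}"] = members
    return clusters


# Hypothetical submissions: two labs report the same chr1 position.
submissions = [
    ("ss101", "chr1", 1000),
    ("ss102", "chr2", 500),
    ("ss103", "chr1", 1000),
]
clusters = cluster_submissions(submissions)
print(clusters)  # {'rs1': ['ss101', 'ss103'], 'rs2': ['ss102']}
```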

Slide 15

Finding SNPs: Marker Discovery and Methods
SNP discovery has proceeded in two distinct phases:
1 - SNP Discovery**
  Human Genome Project - overlaps of BAC clones and ESTs
  The SNP Consortium - Reduced Representation Sequencing
  The HapMap
2 - SNP Discovery/Characterization**
  The HapMap

Slide 16

HapMap Project
Proposed: Map more SNPs and genotype
  Increase SNP density over the first 6 - 12 months
  Ultimately generate a fine-scale genetic map (HapMap) that would serve as a common resource for all biomedical researchers
  Genotype 600,000 SNPs genome-wide
  Four populations: CEPH (Europe), Yoruban (Africa), Japanese/Chinese (Asian)

Slide 17

HapMap SNP Discovery: Prior to Genotyping
Initiation of project planning (July 2001): 2.8 million SNPs (1.4 million validated) - 1/1900 bp
Generate more SNPs:
  Genomic DNA (multiple individuals)
  Random shotgun sequencing
  Sequence and align (reference sequence: draft human genome)
  Reference: GTTACGCCAATACAGGATCCAGGAGATTACC
  Aligned reads: TACGCC T ATA, TC A AGGAGAT
Other sources of SNPs:
  Perlegen (Affymetrix chips) SNP data (chr22)
  Sequence chromatograms from the Celera project
Nov 2003 - 5.7 million (2 million validated) - 1/1500 bp
Feb 2004 - 7.2 million (3.3 million validated) - 1/900 bp

Slide 18

HapMap SNP Discovery
HapMap discovery increased SNP density and validated SNPs
  10 million rs SNPs
  5 million validated rs SNPs

Slide 19

Development of a genome-wide SNP map: how many SNPs?
Nickerson and Kruglyak, Nature Genetics, 2001
~10 million common SNPs (> 1-5% MAF) - 1/300 bp
Feb 2001 - 1.42 million (1/1900 bp)
Nov 2003 - 2.0 million (1/1500 bp)
Feb 2004 - 3.3 million (1/900 bp)
Mar 2005 - 5.0 million (validated - 1/600 bp)
When will we have them all?
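The "1 per N bp" figures are average spacings, that is, genome length divided by SNP count. The sketch below assumes a round genome size of 3.0 billion bp, which the slides do not state; with that assumption it reproduces most of the quoted densities, while the Feb 2001 figure (1/1900 bp) evidently used a somewhat smaller genome length.

```python
GENOME_SIZE_BP = 3.0e9  # assumption: round figure, not given on the slide


def average_spacing_bp(snp_count, genome_size_bp=GENOME_SIZE_BP):
    """Average distance between SNPs if they were spread evenly."""
    return genome_size_bp / snp_count


# SNP counts taken from the slide.
milestones = {
    "Feb 2001": 1.42e6,
    "Nov 2003": 2.0e6,
    "Feb 2004": 3.3e6,
    "Mar 2005": 5.0e6,
    "Target (common SNPs)": 10.0e6,
}

for label, count in milestones.items():
    print(f"{label}: 1 SNP per ~{average_spacing_bp(count):.0f} bp")
```

The same division gives the fraction of the target reached: 5.0 million of an estimated 10 million common SNPs is half.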

Slide 20

Finding SNPs: Sequence-based SNP Mining
[Diagram: sources of overlapping sequence]
  Genomic: BAC library - BAC overlap
  Genomic: RRS library - shotgun overlap
  Genomic: random shotgun - align to reference
  mRNA: cDNA library - EST overlap
DNA sequencing (random)
Sequence overlap - SNP discovery:
  GTTACGCCAATACAG G ATCCAGGAGATTACC
  GTTACGCCAATACAG C ATCCAGGAGATTACC

Slide 21

SNP discovery is dependent on your sample population size
[Figure: fraction of SNPs discovered (0.0 to 1.0) versus minor allele frequency (MAF, 0.0 to 0.5), with curves for 2 and 8 chromosomes sampled]
Example with 2 chromosomes:
  GTTACGCCAATACAG G ATCCAGGAGATTACC
  GTTACGCCAATACAG C ATCCAGGAGATTACC

Slide 22

SNP Characterization/Genotyping
Nickerson and Kruglyak, Nature Genetics, 2001
~10 million common SNPs (> 1-5% MAF) - 1/300 bp
Mar 2005 - 5.0 million (validated/mapped - 1/600 bp)
5.0/10.0 = 50% of all common SNPs (validated)!

Slide 23

HapMap Project
Proposed: Map more SNPs and genotype 600,000 SNPs genome-wide
Four populations:
  CEPH (CEU) (Europe - n = 90, trios)
  Yoruban (YRI) (Africa - n = 90, trios)
  Japanese (JPT) (Asian - n = 45)
  Chinese (HCB) (Asian - n = 45)

Slide 24

Finding SNPs: Genotype Data Adds Value to SNPs
HapMap genotyping:
  Confirms SNP as "real" and "informative"
  Minor allele frequency (MAF) - common or rare
  MAF in different populations
  Detection of SNP x SNP correlations (linkage disequilibrium)
  Determine haplotypes
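Two of the quantities listed above can be computed directly from genotype data. The sketch below is a minimal illustration with made-up data: it assumes biallelic SNPs, genotypes written as two-letter strings, and, for the LD measure, haplotypes that are already phased. The squared correlation r^2 is one common LD statistic; the slide does not say which measure was used.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return (minor_allele, frequency) for one biallelic SNP.

    genotypes: list of two-character strings such as "AG".
    A monomorphic sample returns (None, 0.0).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) == 1:
        return None, 0.0
    if len(counts) != 2:
        raise ValueError("expected a biallelic SNP")
    minor, minor_count = min(counts.items(), key=lambda item: item[1])
    return minor, minor_count / sum(counts.values())


def r_squared(haplotypes):
    """LD between two SNPs from phased two-character haplotypes
    (first character = allele at SNP 1, second = allele at SNP 2)."""
    n = len(haplotypes)
    a, b = haplotypes[0][0], haplotypes[0][1]  # reference alleles, arbitrary
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h[0] == a and h[1] == b for h in haplotypes) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_ab - p_a * p_b
    return d * d / denominator


# Hypothetical data for illustration only.
print(minor_allele_frequency(["AA", "AG", "GG", "AA"]))  # ('G', 0.375)
print(r_squared(["AC", "AC", "GT", "GT"]))  # 1.0 (perfectly correlated)
print(r_squared(["AC", "AT", "GC", "GT"]))  # 0.0 (independent)
```

Running `minor_allele_frequency` separately on each population's genotypes gives the per-population MAF comparison the slide mentions.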

Slide 25

Few SNPs in dbSNP had genotype data

Slide 26

Perlegen Large-scale Genotyping Capacity
1.58 million SNPs genotyped
71 individuals from 3 American populations: European, African and Asian ancestry

Slide 27

HapMap Completion Nature - Oct 27 (2005) HapMap + Perlegen

Slide 29

dbSNP: Increasing numbers of SNPs now have genotype data
  HapMap Phase II
  Perlegen data

Slide 30

Current State of dbSNP
Many SNPs left to validate and characterize.

Slide 31

Increasing SNP Density: HapMap ENCODE Project
ENCODE = ENCyclopedia Of DNA Elements
Catalog all functional elements in 1% of the genome (30 Mb)
10 regions x 500 kb/region (pilot project)
David Altshuler (Broad), Richard Gibbs (Baylor)
16 CEU, 16 YRI, 8 HCB, 8 JPT
Comprehensive PCR-based resequencing across these regions
Results:
  15,367 SNPs already in dbSNP
  16,248 new SNPs
  50% of SNPs in dbSNP
  5 Mb / 31,500 SNPs = 1/160 bp
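The rounded figures on this slide follow from the two SNP counts and the 5 Mb of resequenced regions (10 regions x 500 kb). The short check below uses only numbers from the slide; the variable names are chosen for this example.

```python
in_dbsnp = 15_367      # SNPs found that were already in dbSNP
new_snps = 16_248      # SNPs found that were not yet in dbSNP
region_bp = 10 * 500_000  # 10 pilot regions x 500 kb = 5 Mb

total_snps = in_dbsnp + new_snps         # 31,615; the slide rounds to 31,500
fraction_known = in_dbsnp / total_snps   # just under one half
spacing_bp = region_bp / total_snps      # about 158 bp; the slide rounds to 160

print(f"total SNPs: {total_snps}")
print(f"fraction already in dbSNP: {fraction_known:.1%}")
print(f"average spacing: 1 SNP per ~{spacing_bp:.0f} bp")
```

Roughly half of the SNPs found by deep resequencing were absent from dbSNP, which is the slide's argument that the database was still far from complete.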

Slide 32

Development of a genome-wide SNP map: how many SNPs?
Nickerson and Kruglyak, Nature Genetics, 2001
~10 million common SNPs (> 1-5% MAF) - 1/300 bp
Mar 2005 - 5.0 million (validated - 1/600 bp)
~4.0 million validated SNPs with genotypes!
(HapMap confirmed, allele frequency/population, SNP x SNP correlations (LD), haplotypes)

Slide 33

SNP discovery is dependent on your sample population size
[Figure: fraction of SNPs discovered (0.0 to 1.0) versus minor allele frequency (MAF, 0.0 to 0.5), with curves for 2, 8, 16, 24, 48 and 96 chromosomes sampled]
Example with 2 chromosomes:
  GTTACGCCAATACAG G ATCCAGGAGATTACC
  GTTACGCCAATACAG C ATCCAGGAGATTACC
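The shape of such curves can be approximated with a simple sampling model. This is an assumption made for illustration, not a formula taken from the slide: if chromosomes are drawn independently, a SNP with minor allele frequency p is seen as polymorphic in n chromosomes when both alleles appear at least once, giving a probability of 1 - (1 - p)^n - p^n.

```python
def discovery_probability(maf, n_chromosomes):
    """Chance that both alleles of a SNP appear among n independently
    sampled chromosomes (simple binomial sampling model, an assumption)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must be between 0 and 0.5")
    if n_chromosomes < 1:
        raise ValueError("need at least one chromosome")
    return 1.0 - (1.0 - maf) ** n_chromosomes - maf ** n_chromosomes


# Sample sizes are the curve labels from the figure.
for n in (2, 8, 16, 24, 48, 96):
    row = ", ".join(
        f"MAF {maf:.2f}: {discovery_probability(maf, n):.2f}"
        for maf in (0.01, 0.05, 0.10, 0.50)
    )
    print(f"{n:>2} chromosomes -> {row}")
```

Under this model, comparing only 2 chromosomes finds at most half of even the most common SNPs, while 96 chromosomes capture nearly all SNPs with a MAF of 5% or more.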

Slide 34

Goal: Comprehensively identify all common sequence variation in candidate genes
Initial biological focus: Inflammation, Coagulation, Complement
Approach: Direct resequencing of genes
Sample:
  p1 = 24 African-Americans, 23 CEPH parents, 1 chimp
  p2 = 24 HapMap-YRI, 23 HapMap-CEU, 1 gorilla
Status: 271 genes (21 kb ave)

Slide 35

Targeted SNP Discovery
Directed analysis: cSNPs
  [Diagram: gene 5' to 3', PCR amplicons over exons only; cSNPs Val-Val, Arg-Cys]
Complete analysis: cSNP and haplotype structure analysis
  [Diagram: gene 5' to 3', PCR amplicons over the whole gene; cSNPs Arg-Cys, Val-Val]
Generate SNP data from complete genomic resequencing (i.e. 5' regulatory, exon, intron, 3' regulatory sequence)

Slide 36

Comprehensive SNP Discovery: Resequencing
Overlapping PCR amplicons across the entire gene
Make no assumptions about sequence function
Sequence diversity and genetic structure for each gene is different
Proper association studies must be designed in this context
Complete resequencing facilitates population genetics methods
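Covering a whole gene with overlapping amplicons is a tiling problem. The sketch below is a simplified illustration: the amplicon size and overlap are hypothetical parameters, and real primer design also has to account for repeats, GC content and primer placement, none of which is modeled here.

```python
def tile_amplicons(gene_length, amplicon_size, overlap):
    """Return 0-based, end-exclusive (start, end) intervals that cover
    [0, gene_length) with consecutive amplicons sharing `overlap` bases."""
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    if not 0 <= overlap < amplicon_size:
        raise ValueError("overlap must be smaller than amplicon_size")

    step = amplicon_size - overlap
    amplicons = []
    start = 0
    while True:
        end = min(start + amplicon_size, gene_length)
        amplicons.append((start, end))
        if end == gene_length:
            break
        start += step
    return amplicons


# Hypothetical numbers: a 1,000 bp target, 400 bp amplicons, 100 bp overlap.
print(tile_amplicons(1000, 400, 100))  # [(0, 400), (300, 700), (600, 1000)]
```

With the same function, a gene of the average size quoted for the project (21 kb) tiled by hypothetical 1,000 bp amplicons with 200 bp overlap would need `len(tile_amplicons(21_000, 1_000, 200))` amplicons.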

Slide 37

Sequence-based SNP Identification
Amplify DNA
Sequence each end (5', 3')
Phred - base-calling, quality determination
Phrap - contig assembly, final quality determination of the fragment
PolyPhred - polymorphism detection
Sequencing capacity: ABI 3730 - 96 capillary
  ~2000 individual chromatograms/day
  ~1,00,000 bp/day
  ~20 kbp sequence scanned in 48 individuals/wk
Consed Seque
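The quality values that Phred assigns to each base call follow the standard definition Q = -10 log10(P), where P is the estimated probability that the call is wrong. The helper names below are made up for this sketch, which only converts between the two scales.

```python
import math


def phred_to_error_probability(quality):
    """Estimated probability that a base call with this Phred score is wrong."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability):
    """Phred quality score for a given base-call error probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(probability)


for q in (10, 20, 30, 40):
    print(f"Q{q}: about 1 error in {round(1 / phred_to_error_probability(q))} calls")
```

Restricting polymorphism detection to high-quality bases is one way to keep base-calling errors from being reported as SNPs.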