Next Generation Sequencing Bioinformatics Support for GPCL BAC Analysis

This presentation describes how the GPCL Bioinformatics Analysis Core (BAC) supports the analysis of Next Generation Sequencing data. It covers the consultation process with the Principal Investigator (PI), including study design and the decision whether to use the BAC for data analysis, and then the analysis itself: assembly, annotation, quality metrics, and the research report and data that the PI receives at the end.

Presentation Transcript


1. Next-Gen Sequencing Bioinformatics Support, GPCL-BAC
Rick Jordan, Programmer/Analyst
J. Lyons-Weiler, Sci. Director
September 26, 2008

2. Process
  • GPCL-BAC Director & Analyst meet w/PI
  • Discuss Data Analysis Needs & Study Design
  • PI Decides on Use of BAC or Go It Alone
  • Go It Alone -> data (.sff files)
  • Use the BAC data analysis -> $ estimate (annotation, assembly, & analysis) + data
  • PI reviews Preliminary Research Report w/Analyst
  • After final analysis, PI receives Report & Data
  • Often the Analysis will be tailored to the application

3. de novo Analysis Flowchart
  • Instrument: 454 GS FLX (Data/Reads exported to data rig)
  • Stages: Image files, Image processing, Signal processing, Sequences, Sequence processing
  • Parameter files: dataRunParams.parse, analysisParams.parse
  • Output files: .sff files, 454RuntimeMetrics.csv, 454QualityFilterMetrics.csv, 454BaseCallerMetrics.csv
  • Downstream: Assembler, Analysis & Annotation (labels: GS FLX System, GS or Lasergene)

4. Image processing

5. Lasergene SeqBuilder: Reference sequence E. coli K-12

6. Signal Processing

7. de novo Genome Assembly
Two software packages currently used:
  • GS FLX Assembler (Newbler algorithm): can be used for all experiments
  • Lasergene (SeqMan Pro): single-end experiments only

8. GS de novo Assembler
Input: .sff files and per-base quality scores
Output: Consensus sequence, assembled de novo
Main processing steps:
  • Identify pairwise overlaps between reads
  • Construct multiple alignments of contigs
  • Generate consensus basecalls of contigs
  • Output contig consensus sequences and quality scores, along with ACE file of multiple alignments and assembly metrics files
From 454 Sequencing GS-FLX Data Analysis Software Manual, Dec 2007
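The first three processing steps (overlap detection, alignment, consensus calling) can be illustrated with a toy sketch. This is not the Newbler algorithm: it assumes error-free reads that are already in left-to-right order, and the function names `suffix_prefix_overlap`, `layout`, and `column_consensus` are hypothetical, chosen only for this illustration.

```python
from collections import Counter


def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def layout(reads: list[str], min_len: int = 3) -> list[str]:
    """Place consecutive reads at offsets implied by their overlaps, padding with '-'.

    Assumption for this sketch: reads are already ordered left to right.
    """
    offsets = [0]
    for prev, nxt in zip(reads, reads[1:]):
        k = suffix_prefix_overlap(prev, nxt, min_len)
        offsets.append(offsets[-1] + len(prev) - k)
    width = max(o + len(r) for o, r in zip(offsets, reads))
    return ["-" * o + r + "-" * (width - o - len(r)) for o, r in zip(offsets, reads)]


def column_consensus(aligned_reads: list[str]) -> str:
    """Majority-vote base call per alignment column; 'N' where no read covers the column."""
    consensus = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(consensus)


# Made-up reads, for illustration only.
reads = ["ACGTTGCA", "TTGCAGGT", "CAGGTCCA"]
aligned = layout(reads)
contig = column_consensus(aligned)
print("\n".join(aligned))
print(contig)
```

A production assembler must additionally handle sequencing errors, unordered reads, reverse complements, and per-base quality scores, none of which this sketch attempts.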

10. e.g. Graphic Figure of the Assembly (Lasergene 7.2)

11. GS Reference Mapper
  • Generates the consensus DNA sequence by mapping, or alignment, of the reads to a reference sequence
  • Provides a list of high-confidence mutations (individual bases or blocks of bases that differ between the consensus DNA sequence of the sample and the reference sequence)
From 454 Sequencing GS-FLX Data Analysis Software Manual, Dec 2007
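The idea of reporting "individual bases or blocks of bases that differ" can be sketched as follows. This is only an illustration under strong assumptions: the consensus and reference are already aligned to the same length with no insertions or deletions, and no confidence filtering is applied. The function name `list_differences` is hypothetical and is not part of the GS Reference Mapper.

```python
def list_differences(reference: str, consensus: str) -> list[tuple[int, str, str]]:
    """Return (1-based start, reference block, sample block) for each run of mismatches."""
    if len(reference) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    start = None  # index where the current run of mismatches began, if any
    for i, (ref_base, sample_base) in enumerate(zip(reference, consensus)):
        if ref_base != sample_base:
            if start is None:
                start = i
        elif start is not None:
            differences.append((start + 1, reference[start:i], consensus[start:i]))
            start = None
    if start is not None:  # a run that extends to the end of the sequence
        differences.append((start + 1, reference[start:], consensus[start:]))
    return differences


# Made-up sequences, for illustration only.
reference = "ACGTACGTACGT"
consensus = "ACGAACGTTTGT"
diffs = list_differences(reference, consensus)
for position, ref_block, sample_block in diffs:
    print(f"pos {position}: {ref_block} -> {sample_block}")
```

Adjacent mismatches are merged into one block, so a single substituted base and a multi-base block are reported in the same form.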

13. Genome Annotation (sequence functional classes) Zuber et al. (2007)

14. Gene annotation with SeqManPro Project

15. e.g. Diagrams Smith et al. (2007)

16. Impacted Pathways
Rank | Pathway Name | Impact Score | # Genes In Pathway | # Input Genes on Chip | # Pathway Genes on Chip | %Pathway Genes in Input | p-value | corrected p-value
1 | Phosphatidylinositol signaling system | 10.508 | 55 | 4 | 46 | 7.273 | 0.007995 | 0.007995
2 | ECM-receptor interaction | 6.746 | 62 | 4 | 57 | 6.452 | 0.016721 | 0.016721
3 | Wnt signaling pathway | 6.731 | 113 | 6 | 92 | 5.31 | 0.005133 | 0.005133
4 | B cell receptor signaling pathway | 6 | 59 | 4 | 47 | 6.78 | 0.008623 | 0.008623
5 | Melanogenesis | 5.765 | 86 | 4 | 63 | 4.651 | 0.023291 | 0.023291
6 | Gap junction | 5.422 | 76 | 4 | 63 | 5.263 | 0.023291 | 0.023291
7 | GnRH signaling pathway | 5.026 | 84 | 4 | 68 | 4.762 | 0.029813 | 0.029813
8 | Focal adhesion | 4.954 | 163 | 5 | 140 | 3.067 | 0.095078 | 0.095078
9 | Long-term potentiation | 4.868 | 62 | 3 | 49 | 4.839 | 0.052339 | 0.052339
10 | Olfactory transduction | 4.673 | 27 | 2 | 21 | 7.407 | 0.050233 | 0.050233
11 | Calcium signaling pathway | 4.644 | 164 | 5 | 131 | 3.049 | 0.076528 | 0.076528
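The percentage column in this table appears to be the number of input genes divided by the number of genes in the pathway, times 100. That reading is an inference from the numbers, not something the slide states. The short check below recomputes it for every row; the tuple layout and the function name `percent_pathway_genes_in_input` are assumptions made for this sketch.

```python
# (pathway name, genes in pathway, input genes, percentage reported on the slide)
rows = [
    ("Phosphatidylinositol signaling system", 55, 4, 7.273),
    ("ECM-receptor interaction", 62, 4, 6.452),
    ("Wnt signaling pathway", 113, 6, 5.31),
    ("B cell receptor signaling pathway", 59, 4, 6.78),
    ("Melanogenesis", 86, 4, 4.651),
    ("Gap junction", 76, 4, 5.263),
    ("GnRH signaling pathway", 84, 4, 4.762),
    ("Focal adhesion", 163, 5, 3.067),
    ("Long-term potentiation", 62, 3, 4.839),
    ("Olfactory transduction", 27, 2, 7.407),
    ("Calcium signaling pathway", 164, 5, 3.049),
]


def percent_pathway_genes_in_input(genes_in_pathway: int, input_genes: int) -> float:
    """Share of a pathway's genes that appear in the input list, as a percentage."""
    return 100.0 * input_genes / genes_in_pathway


# Compare the recomputed value with the reported one, allowing for 3-decimal rounding.
mismatches = [
    name
    for name, genes_in_pathway, input_genes, reported in rows
    if abs(percent_pathway_genes_in_input(genes_in_pathway, input_genes) - reported) > 0.0006
]
print("rows that disagree with the slide:", mismatches)
```

The impact scores and p-values are not recomputed here, because the slide does not say how they were derived.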

17. e.g. Pathway view

19. e.g. COGs table Smith et al. (2007)

20. e.g. Sequencing statistics table Marcy et al. (2007)

21. Base Caller Metrics

22. Quality Filter Metrics

23. Runtime Metrics

24. Quality measures by region TCA ATG

25. Read lengths by region TCA ATG

26. e.g. Blast results

27. e.g. Predicted nucleotide and protein alignment Raymond et al. (2007)

28. e.g. Predicted protein alignment Raymond et al. (2007)

29. Grant Text
Next Generation Sequence Bioinformatics Analysis. The Bioinformatics Analysis Core is sufficiently endowed with software and human resources to analyze data from resequencing and de novo sequencing studies. Software acquisitions include the default Genome Sequencer modules and the recently acquired specialized Lasergene 7.2 software by DNASTAR. One BAC staff member is dedicated to the analysis of long-read NextGen sequencing data and is responsible for generating research reports for each project.
Genome Sequencer FLX System Software. The FLX System Software includes modules for each stage in the analysis. All raw data are accessible, and the system also offers a variety of third-party software packages for niche applications.
Data QA/QC. The Core uses a variety of data quality control measures, including consensus accuracy and quality scores, both per base (Q20+) and per genome (%Bases Q20+; the proportion of an assembled genome with base call accuracy of at least 99%).
The Core has also acquired the licenses required to run the full suite of Lasergene applications, rounding out the Core's Genome Annotation capabilities. In addition to the sequence assembler and SNP discovery algorithms in SeqMan Pro and the visualization and sequence editing modules (SeqBuilder), the Lasergene suite adds the capacity for gene finding (GeneQuest) and protein structure analysis & prediction (Protean). Handling the variety of file types the Core is expected to receive is greatly aided by Lasergene's EditSeq and by the much-improved interoperability of SeqMan Pro (which can import .sff, .fna, .fas and .qual files).
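The Q20+ threshold in the Data QA/QC text follows from the standard Phred relationship Q = -10 log10(p), where p is the probability that a base call is wrong, so Q20 corresponds to 99% accuracy. A minimal sketch of both measures follows. The function names and the example quality values are hypothetical and do not come from a real run.

```python
def phred_to_accuracy(q: float) -> float:
    """Base call accuracy implied by a Phred quality score: 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def percent_bases_at_or_above(qualities: list[int], threshold: int = 20) -> float:
    """Percentage of bases whose quality score meets the threshold (e.g. %Bases Q20+)."""
    if not qualities:
        raise ValueError("no quality scores given")
    passing = sum(1 for q in qualities if q >= threshold)
    return 100.0 * passing / len(qualities)


# Hypothetical per-base consensus quality scores, for illustration only.
example_qualities = [40, 35, 18, 22, 20, 12, 30, 40]

print(f"Q20 accuracy: {phred_to_accuracy(20):.4f}")
print(f"Q40 accuracy: {phred_to_accuracy(40):.4f}")
print(f"%Bases Q20+: {percent_bases_at_or_above(example_qualities):.1f}")
```

Changing `threshold` to 40 gives the corresponding Q40+ measure, which implies 99.99% accuracy per base.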

30. Research Report Components
Tables
  • Base Call Metrics
  • Quality Filter Metrics
  • Run Time Metric Tables
  • Quality Score: per base (Q40+) and per genome (%Bases Q40+; the proportion of an assembled genome with base call accuracy of at least 99.99%)
  • Quality Measure Distributions (By region)
  • Read Length Measure Distributions
  • Overall Sequence Statistics Tables
  • Blast tables
  • COGs Table
Figures
  • Assembly Figures
  • Alignment Diagrams
  • Gene Functional Categories Diagrams
  • Genome View Diagrams
  • Nucleotide Alignment Diagrams
  • Predicted Protein Alignment Diagrams
  • Gene Ontology Functional Class Diagrams/Charts
  • Pathway Views
  • COGs Figures
Methods Text
  • Manuscripts
  • Proposals
Letter of Support

31. Application Areas
  • Ancient DNA
  • ChIP-seq/Methylation/Epigenetics
  • Eukaryotic Whole Genome Sequencing
  • Expression tags
  • Genetic variation detection
  • HIV sequencing
  • Metagenomics and Microbial Diversity
  • Mitochondria/viruses/plastids/plasmids
  • Prokaryotic Whole Genome Sequencing
  • Sequence Capture/Target Region Resequencing
  • Small RNAs
  • Somatic variation detection
  • Transcriptome Sequencing
Roche 454/GS-FLX Web Site

32. de novo Analysis Flowchart
  • Instrument: 454 GS FLX (Data/Reads exported to data rig)
  • Stages: Image files, Image processing, Signal processing, Sequences, Sequence processing
  • Parameter files: dataRunParams.parse, analysisParams.parse
  • Output files: .sff files, 454RuntimeMetrics.csv, 454QualityFilterMetrics.csv, 454BaseCallerMetrics.csv
  • Downstream: Assembler, Analysis & Annotation (labels: GS FLX System, GS or Lasergene)

33. Final Service Product
Pre-analysis output files
  • dataRunParams.parse
  • 454BaseCallerMetrics.csv
  • 454QualityFilterMetrics.csv
  • 454RuntimeMetricsAll.csv
Post-analysis output files
  • .sff files (for each region)
  • Research report (.ppt)
  • Additional text editing
