Applied genomics: development of bioinformatics pipelines for analyzing clinical pediatric genomic data

Date
2016
Publisher
University of Delaware
Abstract
The onset and prognosis of several human diseases, such as cancer, are characterized by specific genomic alterations. The sequencing and assembly of the human genome are enabling advances in personalized medicine, but associating genetic mutations with a specific human disease and treatment remains complex. Recent advances in DNA sequencing technology, known as next-generation sequencing (NGS), enable the detection of many genomic alterations at once. However, downstream bioinformatics analysis is a primary limiting factor for clinical applications of genomic NGS. A novel approach was created for analyzing whole exome sequencing (WES) datasets, sequenced on the Illumina platform, from clinical patients diagnosed with a rare Mendelian disease. One of the datasets used to help establish the methodologies was paired-end WES from six patients with a rare disorder characterized by facio-skeletal abnormalities, plus their family members. Robust bioinformatics pipelines were implemented for read trimming, genome alignment, single nucleotide polymorphism (SNP) detection and annotation, and copy number variation detection. Quality control metrics were analyzed at each step of the pipeline to ensure data integrity for clinical applications. The variants were annotated with three custom modules that enable flexible filtering based on criteria such as variant quality, inheritance pattern (e.g., dominant, recessive, X-linked), and minor allele frequency. Bioinformatics methodologies were also developed for analyzing NGS data generated from 19 pediatric acute myeloid leukemia (AML) patients. These pipelines focused on single nucleotide variant detection and annotation, combined with insertion/deletion (indel) detection and annotation.
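The variant filtering described for the WES dataset (by variant quality, minor allele frequency, and inheritance pattern) can be illustrated with a minimal sketch. It is not the dissertation's custom annotation modules: the `Variant` record, the function names, the genotype encoding (alternate-allele count per sample), and the thresholds `min_qual=30.0` and `max_maf=0.01` are all hypothetical, and only autosomal recessive and dominant models are shown (X-linked filtering is omitted).

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Variant:
    gene: str
    qual: float                # call quality, e.g. the VCF QUAL field
    maf: float                 # population minor allele frequency
    genotypes: Dict[str, int]  # sample name -> alternate-allele count (0, 1, 2)


def passes_quality(v: Variant, min_qual: float = 30.0, max_maf: float = 0.01) -> bool:
    # Hypothetical thresholds: keep confident calls that are rare in the population
    return v.qual >= min_qual and v.maf <= max_maf


def fits_recessive(v: Variant, proband: str, parents: Sequence[str]) -> bool:
    # Autosomal recessive: proband homozygous alternate, each parent a heterozygous carrier
    return v.genotypes[proband] == 2 and all(v.genotypes[p] == 1 for p in parents)


def fits_dominant(v: Variant, proband: str, unaffected: Sequence[str]) -> bool:
    # Dominant with unaffected relatives: proband carries the allele, relatives do not
    return v.genotypes[proband] >= 1 and all(v.genotypes[u] == 0 for u in unaffected)


def filter_variants(variants, mode, proband, relatives, **thresholds) -> List[Variant]:
    # Apply the quality/frequency filter, then the chosen inheritance model
    models = {"recessive": fits_recessive, "dominant": fits_dominant}
    if mode not in models:
        raise ValueError(f"unsupported inheritance mode: {mode}")
    fits = models[mode]
    return [v for v in variants
            if passes_quality(v, **thresholds) and fits(v, proband, relatives)]


# Toy trio with hypothetical gene names and values, not data from the study
variants = [
    Variant("GENE_A", 55.0, 0.001, {"child": 2, "mother": 1, "father": 1}),
    Variant("GENE_B", 80.0, 0.200, {"child": 2, "mother": 1, "father": 1}),   # too common
    Variant("GENE_C", 12.0, 0.0005, {"child": 1, "mother": 0, "father": 0}),  # low quality
    Variant("GENE_D", 60.0, 0.0, {"child": 1, "mother": 0, "father": 0}),
]
parents = ["mother", "father"]
print([v.gene for v in filter_variants(variants, "recessive", "child", parents)])  # ['GENE_A']
print([v.gene for v in filter_variants(variants, "dominant", "child", parents)])   # ['GENE_D']
```

Separating the quality/frequency check from the inheritance model keeps each criterion independently adjustable, which mirrors the flexible filtering the abstract describes.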
A list of verified single nucleotide variants was provided with the clinical NGS dataset, and the pipeline detected approximately 94% of these verified variants using a combination of Mutect and Shimmer. The pipeline also reported high-quality single nucleotide variants that had not previously been reported to the Children’s Oncology Group. Furthermore, an internal tandem duplication (ITD) in FLT3, a known clinically relevant mutation in pediatric AML associated with poor prognosis, was detected and analyzed using Pindel. Pindel detected the ITD in the NGS data of 5 of the 6 patients in whom it had previously been detected only by PCR and electrophoresis. Collectively, this dissertation project provides a unique method for prioritizing and visualizing genomic variants using functional annotations, including gene ontologies and pathway enrichment strategies.
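The detection rate against the verified variant list can be computed as a simple set comparison. This sketch assumes that "a combination" of callers means taking the union of their calls and that variants match on exact (chromosome, position, reference, alternate) keys; in practice, variant representations may need normalization before comparison. The function names and coordinates below are hypothetical toy values, not results from the study.

```python
from typing import Set, Tuple

# A call is keyed by (chromosome, position, reference allele, alternate allele)
Call = Tuple[str, int, str, str]


def sensitivity(verified: Set[Call], *call_sets: Set[Call]) -> float:
    """Fraction of verified variants found by at least one caller (union of call sets)."""
    if not verified:
        raise ValueError("verified set is empty")
    detected = set().union(*call_sets)
    return len(verified & detected) / len(verified)


def unverified_calls(verified: Set[Call], *call_sets: Set[Call]) -> Set[Call]:
    """Calls made by any caller that are absent from the verified list."""
    return set().union(*call_sets) - verified


# Hypothetical toy coordinates, not data from the study
verified = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"),
            ("chr3", 300, "G", "A"), ("chr4", 400, "T", "C")}
mutect_calls = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr9", 900, "A", "T")}
shimmer_calls = {("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")}

print(sensitivity(verified, mutect_calls))                 # 0.5
print(sensitivity(verified, shimmer_calls))                # 0.5
print(sensitivity(verified, mutect_calls, shimmer_calls))  # 0.75
print(unverified_calls(verified, mutect_calls, shimmer_calls))  # {('chr9', 900, 'A', 'T')}
```

The union raises sensitivity when the callers miss different variants; calls outside the verified list, such as the toy `chr9` entry, are candidates for the kind of previously unreported variants the abstract mentions, pending quality review.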
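Pathway or gene ontology enrichment is commonly scored with a one-sided hypergeometric test. The abstract does not specify which enrichment statistic was used, so the sketch below shows this common approach rather than the dissertation's own method; the function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background (e.g. all genes considered)
    K: background genes annotated to the pathway or GO term
    n: genes in the prioritized list
    k: prioritized genes annotated to the pathway or GO term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Sum the probabilities of observing k or more annotated genes in the list
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 8 of 40 prioritized genes fall in a 200-gene pathway,
# against a 20,000-gene background
p = enrichment_p_value(k=8, n=40, K=200, N=20000)
print(f"{p:.2e}")
```

When many pathways or GO terms are tested, the resulting p-values would normally be adjusted for multiple testing (for example with the Benjamini-Hochberg procedure) before ranking terms.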