Browsing by Author "Adetunji, Modupeore O."
Item
Expanding the frontiers of transcriptome sequencing data (RNA-seq): selection signatures in chickens
(University of Delaware, 2019) Adetunji, Modupeore O.
Transcriptome sequencing (RNA-seq) analysis is a highly exploited technique for defining transcript abundance and differential expression, but is underutilized for nucleotide variant detection. Given the ability of RNA-seq to reveal active regions of the genome, detection of RNA-seq SNPs can prove valuable in understanding the phenotypic diversity between populations. This dissertation showcases the applicability of RNA-seq data in currently unexplored but important areas of biological research, such as variant analysis and detection of selection signatures in commercial broilers. I have developed a novel computational workflow that takes advantage of multiple splice-aware RNA-seq aligners to call SNPs using RNA-seq data only. Our workflow achieved high precision and sensitivity; furthermore, we discovered SNPs resulting from post-translational events that would have been missed in WGS data. The results demonstrate that SNP identification from RNA-seq data is reliable and a potential resource for determining selection signatures from variants. The identification of regions that have undergone selection is important in understanding the variation patterns responsible for the underlying phenotypic changes between populations. Modern broilers are characterized by decades of extensive genetic selection for traits of economic importance. However, improvement in economic traits has also resulted in negative complications, such as skeletal abnormalities, inability to adapt to heat stress and susceptibility to diseases. These phenotypic changes imply strong positive selection for the causal loci or polymorphisms controlling these traits.
To offer insight into the variation patterns responsible for the underlying phenotypic changes, we investigated regions of selection in commercial broilers using the SNPs derived from our RNA-seq workflow.
Given the vast amounts of data generated by next-generation sequencing (NGS) in today’s -omics era, efficiently managing the massive throughput from NGS analysis becomes a major challenge, especially when dealing with data on a terabyte-to-petabyte scale. Thus, innovative storage solutions that address this computational bottleneck are paramount. To this end, we designed a hybrid (relational and NoSQL) database framework, called TransAtlasDB, that addresses the crucial need for a smart and innovative storage solution for management and retrieval of large-scale transcriptomics data output relevant to basic, medical and agricultural research.

Item
Improving eukaryotic genome assembly through application of single molecule real-time sequencing data genome: coffee leaf rust fungus, H. vastatrix
(University of Delaware, 2014) Adetunji, Modupeore O.
Coffee production is globally threatened by Coffee Leaf Rust disease. The fungal pathogen, Hemileia vastatrix, has been estimated to have the largest fungal genome known. In the absence of an available draft genome, genome sequencing and assembly is a fundamental step in understanding the infectious mechanism of the disease. Next-generation sequencing (NGS) technologies have been successfully applied to the whole-genome sequencing and assembly of many genomes. Second-generation sequencing technologies, such as Illumina, are known for their high throughput but are limited by short read lengths and systematic biases. The application of such technologies to large and more complex genomes results in numerous inaccuracies due to the inability to handle repeat regions and sequencing errors.
Longer sequence data produced by third-generation sequencing technologies, notably PacBio RS II (Pacific Biosciences Inc.), show promise for overcoming such issues, as demonstrated through accurate bacterial-scale genome assemblies and improvements to existing eukaryotic genomes by filling gaps and sequencing through repetitive sequence regions, but are limited by a high error rate and lower throughput. In this study, we developed a three-stage pipeline to assess the performance of various de novo assembly algorithms (SOAPdenovo2, CLC Genomics Workbench (CLC), and Velvet), error correction tools (LSC and PacBioToCA), and the whole-genome shotgun assembler Celera for the whole-genome assembly of large eukaryotic genomes, using synthetic PacBio RS II CLR (Continuous Long Reads) and Illumina paired-end reads created from the Arabidopsis thaliana genome as a proxy for H. vastatrix. At each stage, performance was assessed by reference genome mapping using BLASR and BWA-MEM, and was visualized using SeqMonk and CLC. The results showed the ability of the pipeline to produce long scaffolds with low nucleotide mapping error; the best overall performance was seen with the whole-genome shotgun assembly of SOAPdenovo2 scaffolds and PacBioToCA contigs, which produced long genome scaffolds (>1.8 Mb) with high N50 and no captured gaps, spanning 93% of the reference genome with 1% nucleotide mapping error. These findings demonstrate that creating long genomic scaffolds for complex eukaryotic genomes such as H. vastatrix by NGS can be achieved with implementation of appropriate de novo assembly algorithms.