Expanding the frontiers of transcriptome sequencing data (RNA-seq): selection signatures in chickens

Date
2019
Publisher
University of Delaware
Abstract
Transcriptome sequencing (RNA-seq) is widely used to quantify transcript abundance and to perform differential expression analysis, but it is underutilized for nucleotide variant detection. Because RNA-seq reveals the actively transcribed regions of the genome, SNPs detected from RNA-seq data can be valuable for understanding phenotypic diversity between populations. This dissertation demonstrates the applicability of RNA-seq data to important but largely unexplored areas of biological research, such as variant analysis and the detection of selection signatures in commercial broilers. I developed a novel computational workflow that leverages multiple splice-aware RNA-seq aligners to call SNPs from RNA-seq data alone. The workflow achieved high precision and sensitivity; furthermore, we discovered SNPs arising from post-transcriptional events that would have been missed in whole-genome sequencing (WGS) data. These results demonstrate that SNP identification from RNA-seq data is reliable and that RNA-seq variants are a potential resource for detecting selection signatures. Identifying regions that have undergone selection is important for understanding the variation patterns underlying phenotypic differences between populations. Modern broilers are the product of decades of extensive genetic selection for economically important traits. However, improvements in these traits have also brought negative consequences, such as skeletal abnormalities, poor adaptation to heat stress, and increased susceptibility to disease. These phenotypic changes imply strong positive selection on the causal loci or polymorphisms controlling these traits. To gain insight into the variation patterns behind these changes, we investigated regions under selection in commercial broilers using the SNPs derived from our RNA-seq workflow.
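The idea of combining several splice-aware aligners and scoring the result by precision and sensitivity can be illustrated with a small sketch. It assumes a simple "called by at least k aligners" consensus rule and a reference truth set (for example, WGS calls on the same samples); the aligner names, the `min_support` threshold, and the helper functions are hypothetical and are not the dissertation's actual workflow or filtering criteria, which this abstract does not describe.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# A variant is keyed by (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_snps(calls: Dict[str, Set[Variant]], min_support: int = 2) -> Set[Variant]:
    """Keep variants reported by at least `min_support` aligner-specific call sets.

    Hypothetical consensus rule for illustration only.
    """
    support = Counter(v for call_set in calls.values() for v in call_set)
    return {v for v, n in support.items() if n >= min_support}


def precision_sensitivity(predicted: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); sensitivity (recall) = TP / (TP + FN)."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    sensitivity = tp / len(truth) if truth else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    # Toy call sets from three hypothetical splice-aware aligners.
    calls = {
        "aligner_a": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "A")},
        "aligner_b": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")},
        "aligner_c": {("chr1", 100, "A", "G"), ("chr2", 90, "T", "C")},
    }
    # Toy "truth" set, e.g. variants called from WGS of the same samples.
    truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr3", 10, "A", "C")}

    snps = consensus_snps(calls, min_support=2)
    print(sorted(snps))  # the two variants supported by at least two aligners
    print(precision_sensitivity(snps, truth))  # (1.0, 0.6666666666666666)
```

Requiring agreement between aligners trades some sensitivity for precision, since aligner-specific artifacts rarely recur across tools. Note that when WGS calls serve as the truth set, genuine RNA-level variants absent from the genome (such as RNA-editing sites) would be counted as false positives under this comparison.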
Given the vast amounts of data generated by next-generation sequencing (NGS) in today's -omics era, efficiently managing the massive throughput of NGS analyses has become a major challenge, especially when data volumes range from terabytes to petabytes. Innovative storage solutions that address this computational bottleneck are therefore paramount. To this end, we designed a hybrid (relational and NoSQL) database framework, called TransAtlasDB, that addresses the crucial need for smart, innovative storage for managing and retrieving the large-scale transcriptomics data relevant to basic, medical, and agricultural research.
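The general idea of a hybrid relational/NoSQL design can be sketched as follows: small, structured metadata that is frequently filtered lives in relational tables, while bulky, schema-flexible results are stored as documents. This is an illustrative sketch only, assuming SQLite for the relational side and an in-memory dictionary of JSON documents as a stand-in for a NoSQL store; the `HybridStore` class, its table layout, and the toy expression values are hypothetical and are not TransAtlasDB's schema, backends, or API.

```python
import json
import sqlite3
from typing import Dict


class HybridStore:
    """Toy hybrid store: sample metadata in SQLite (relational), per-gene
    expression values as JSON documents keyed by sample ID (NoSQL stand-in).

    Hypothetical illustration; not TransAtlasDB's design.
    """

    def __init__(self) -> None:
        self.sql = sqlite3.connect(":memory:")
        self.sql.execute(
            "CREATE TABLE samples (sample_id TEXT PRIMARY KEY, tissue TEXT, population TEXT)"
        )
        self.docs: Dict[str, str] = {}  # sample_id -> serialized JSON document

    def add_sample(self, sample_id: str, tissue: str, population: str,
                   expression: Dict[str, float]) -> None:
        # Structured, frequently queried fields go to the relational side.
        self.sql.execute(
            "INSERT INTO samples VALUES (?, ?, ?)", (sample_id, tissue, population)
        )
        # Large, schema-flexible results go to the document side.
        self.docs[sample_id] = json.dumps({"sample_id": sample_id, "expression": expression})

    def expression_for(self, gene: str, tissue: str) -> Dict[str, float]:
        """Select samples with SQL, then read the gene's value from each document."""
        rows = self.sql.execute(
            "SELECT sample_id FROM samples WHERE tissue = ? ORDER BY sample_id", (tissue,)
        ).fetchall()
        result: Dict[str, float] = {}
        for (sample_id,) in rows:
            doc = json.loads(self.docs[sample_id])
            if gene in doc["expression"]:
                result[sample_id] = doc["expression"][gene]
        return result


if __name__ == "__main__":
    store = HybridStore()
    # Toy values for illustration only.
    store.add_sample("S1", "muscle", "broiler", {"MSTN": 12.5, "MYOD1": 40.1})
    store.add_sample("S2", "liver", "broiler", {"MSTN": 0.3})
    store.add_sample("S3", "muscle", "layer", {"MSTN": 8.0})
    print(store.expression_for("MSTN", "muscle"))  # {'S1': 12.5, 'S3': 8.0}
```

Splitting storage this way lets metadata queries use indexes and joins, while the per-sample result documents can grow or change shape without schema migrations.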