Advancing gene-centric approaches for microbial ecology

Date
2024
Publisher
University of Delaware
Abstract
Metagenomics is a powerful approach that has enhanced our understanding of microbial communities and the roles microbes play in various environments. A deep examination of single genes, particularly protein-coding genes, can add critical insight to metagenomic datasets by providing functional information and allowing for the prediction of observable traits and the formulation of "genome to phenome" hypotheses. However, gene-centric approaches to metagenomics face unique challenges, and the comparative lack of tools and approaches specifically designed to address these problems makes gene-centric analyses of microbial communities less accessible to many researchers. Though data quality issues arise at all stages of the sample-to-sequence-to-discovery pipeline, gene-centric studies are particularly sensitive to two of them: misannotation of the genes under study, which necessitates time-consuming manual curation, and the compositional nature of metagenomic data, which requires special statistical care. To address some of the barriers to effective gene-centric analysis in metagenomics, this dissertation introduces three tools, PASV, InteinFinder, and Iroki, as well as a novel framework for examining microbial community diversity. PASV (protein amino acid signature validator) automates the manual curation of homology search results to ensure accurate protein annotation. InteinFinder is a pipeline developed to automatically identify and remove inteins, the protein equivalent of introns, from protein sequences commonly used in gene-centric studies. Together, PASV and InteinFinder significantly reduce the amount of time and domain knowledge traditionally needed to manually curate single-gene datasets. Iroki is a user-friendly tool designed to automatically customize phylogenetic and other types of trees with user-supplied metadata, facilitating data interpretation. The introduced diversity framework provides a more comprehensive and scalable view of microbial community diversity than current approaches, particularly for large metagenomic datasets. Overall, these advancements simplify the gene-centric study of microbial communities and enhance the metagenomic analysis pipeline.
Keywords
Data science, Microbial ecology, Viral ecology, Metagenomics, Microbiome