Characterization of genomic diversity at a quantitative disease resistance locus in maize using improved bioinformatic tools for targeted resequencing

Date
2018
Publisher
University of Delaware
Abstract
Sequence variation is a fundamental component of biodiversity that underlies the response to selection and the improvement of crops via plant breeding. The objective of this dissertation was to use targeted resequencing to characterize genomic diversity at regions of the maize genome associated with quantitative disease resistance (QDR). However, attempts to enrich or amplify specific regions of the maize genome using current methods (including a method tested in this dissertation) were unsuccessful, and effective sequence capture in maize requires repeat-subtraction techniques. Both observations indicated that current methods for targeted resequencing could be improved, at least for genomes with high repeat content. Moreover, errors introduced by sequencing technology and by bioinformatic algorithms are important factors to consider when implementing resequencing studies.
To address these problems, new tools were developed for the production and analysis of multiplexed amplicon sequencing libraries: (i) ThermoAlign, a genome-aware primer design tool tailored for tiled amplicon resequencing; and (ii) C3S-LAA, a sequencing error correction and assembly pipeline for single molecule real-time (SMRT) sequence data from amplicon libraries. Given a reference genome sequence, ThermoAlign performs priming-relevant, genome-wide alignments under a thermodynamic model of DNA hybridization ("thermoalignments") to identify locally specific primer pairs. Neither the number of mismatches in an alignment nor subsequence specificity proved to be a good proxy for priming specificity, and laboratory validation experiments demonstrated that the thermoalignment approach generated specific primer pairs. Notably, ThermoAlign has broad applications: it can be used to design primer pairs for routine PCR assays and may also be extended to evaluate the specificity of DNA hybridization probes and CRISPR/Cas9 guide RNAs.
To minimize errors from sequence data processing, a clustering of circular consensus sequences (C3S) algorithm was developed and shown to eliminate the bioinformatic source of error encountered with the current standard long amplicon analysis (LAA) method for SMRT sequence data from amplicon libraries.
These developments enabled successful resequencing of 27 founder lines of a maize nested association mapping (NAM) population at a fine-mapped section of chromosome 1 (approximately 23 kb) associated with QDR to northern leaf blight (NLB). Although nestled within the highly repetitive and diverse genome of maize, this locus had relatively low repeat content and low sequence diversity. Single nucleotide polymorphisms were the dominant type of variant, and no major structural variants were observed. With the resequencing data as a benchmark, maize HapMap3, a community resource cataloging approximately 60 million variants across the genome, had a mean genotyping error rate of 12% per line and cataloged less than 10% of the variants present at the locus. This study demonstrated the importance and scope of long-read sequencing for accurate and exhaustive identification of genomic variants. Previous work suggested that variation in QDR is underlain by a susceptible allele of a remorin gene that, among the NAM founders, is unique to the inbred line Tx303. Resequencing, however, revealed that Tx303-specific variants were present only within another gene in the region (the T-complex protein 1 subunit gamma gene). This work contributes new tools for genome science and provides new insights into the potential causal variants at a locus associated with QDR to NLB in maize.
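The abstract does not say how the benchmark comparison against HapMap3 was scored, so the following is only a minimal sketch of one plausible way to compute a per-line genotyping error rate and the fraction of benchmark variants present in a catalog. The function names, the dictionary layout (site position mapped to called allele), and the toy data are all assumptions made for illustration; they are not taken from the dissertation.

```python
from statistics import mean


def genotyping_error_rate(benchmark, catalog):
    """Fraction of sites called in both sets where the calls disagree.

    Both arguments map a site position to a called allele.
    Returns None when the two sets share no sites.
    (Assumed definition; the dissertation's metric may differ.)
    """
    shared = [site for site in benchmark if site in catalog]
    if not shared:
        return None
    wrong = sum(1 for site in shared if benchmark[site] != catalog[site])
    return wrong / len(shared)


def catalog_completeness(benchmark_sites, catalog_sites):
    """Fraction of benchmark variant sites that also appear in the catalog."""
    if not benchmark_sites:
        return None
    return len(benchmark_sites & catalog_sites) / len(benchmark_sites)


# Hypothetical toy data: two lines, a handful of sites.
benchmark_calls = {
    "lineA": {101: "A", 250: "G", 377: "T", 512: "C"},
    "lineB": {101: "A", 250: "C", 377: "T", 512: "C"},
}
catalog_calls = {
    "lineA": {101: "A", 250: "G", 377: "C"},  # one disagreement at site 377
    "lineB": {101: "A", 250: "C"},            # no disagreements
}

per_line = {
    line: genotyping_error_rate(benchmark_calls[line], catalog_calls[line])
    for line in benchmark_calls
}
mean_error = mean(per_line.values())

# Hypothetical variant site sets for the completeness calculation.
benchmark_variant_sites = {250, 377, 512, 640}
catalog_variant_sites = {250, 900}
completeness = catalog_completeness(benchmark_variant_sites, catalog_variant_sites)

print(per_line, mean_error, completeness)
```

Averaging the per-line rates, rather than pooling all sites, matches the abstract's phrasing of a mean error rate "per line", but whether the original analysis restricted the comparison to shared sites is an assumption here.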
Keywords
Biological sciences, Applied sciences, Maize genome, Primer, Sequence variation, Thermoalignment