UniProt: the universal protein knowledgebase
Date
2016-11-28
Publisher
Oxford University Press on behalf of Nucleic Acids Research
Abstract
The UniProt knowledgebase is a large resource of
protein sequences and associated detailed annotation.
The database contains over 60 million sequences,
of which over half a million sequences
have been curated by experts who critically review
experimental and predicted data for each protein.
The remainder are automatically annotated based on
rule systems that rely on the expert curated knowledge.
Since our last update in 2014, we have more
than doubled the number of reference proteomes to
5631, giving a greater coverage of taxonomic diversity.
We implemented a pipeline to remove highly
similar proteomes that were causing excessive
redundancy in UniProt. The initial run of this
pipeline reduced the number of sequences in UniProt
by 47 million. For our users interested in the accessory
proteomes, we have made available sets of pan
proteome sequences that cover the sequence diversity
found across the strains and sub-strains of each
species. To help interpretation of genomic
variants, we provide tracks of detailed protein information
for the major genome browsers. We provide
a SPARQL endpoint that allows complex queries of
the more than 22 billion triples of data in UniProt
(http://sparql.uniprot.org/). UniProt resources can be
accessed via the website at http://www.uniprot.org/.
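As an illustration of the SPARQL access mentioned above, the following is a minimal sketch of how a query could be sent to the endpoint from Python using only the standard library. It is not taken from the paper. The `/sparql` path on the endpoint host, the `up:Protein` and `up:reviewed` terms, and the helper names `build_request_url` and `run_query` are assumptions made for this example, so check them against the current UniProt SPARQL documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the query service is served from the "/sparql" path of the
# host named in the abstract; the abstract gives only http://sparql.uniprot.org/.
ENDPOINT = "http://sparql.uniprot.org/sparql"

# Assumption: up:Protein and up:reviewed are terms of the UniProt core
# vocabulary; the query asks for a handful of expert-reviewed entries.
QUERY = """\
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein ;
           up:reviewed true .
}
LIMIT 5
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as a GET URL using the standard 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the parsed JSON results (needs network access)."""
    request = urllib.request.Request(
        build_request_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `run_query(QUERY)` should return a dictionary in the SPARQL JSON results format, in which each row appears under `results["results"]["bindings"]`. The `LIMIT` clause keeps the request small, which matters when querying a store of more than 22 billion triples.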
Description
Publisher's PDF
Citation
UniProt Consortium. "UniProt: the universal protein knowledgebase." Nucleic Acids Research 45.D1 (2017): D158-D169.