A probabilistic framework for protein multi-location prediction, and its applicability to multi-label classification

Date
2016
Publisher
University of Delaware
Abstract
Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. While it has been shown that proteins localize to multiple locations, most computational methods assign a single location per protein. A few recent systems assign multiple locations to proteins. However, they typically treat locations as independent of each other. We present a system for protein multi-location prediction that utilizes inter-dependencies among locations for predicting multiple locations of proteins. Results obtained by using this system show that incorporating such inter-dependencies in the location prediction process improves the classifier's prediction performance. In machine learning terms, assigning multiple locations to proteins is a special case of multi-label classification (MLC), where proteins can be viewed as instances and locations as instance labels. Improving on the initial system, we introduce an advanced approach for MLC based on a probabilistic generative model that explicitly captures dependencies between features and subsets of labels, in addition to representing inter-dependencies among labels as done by our earlier system. Experimental results demonstrate improved performance of our system for protein multi-location prediction as well as for the general problem of MLC. The most comprehensive current set of multi-localized proteins used to assess the performance of multi-location prediction systems contains proteins localizing to only two locations. We present a procedure to construct a more extensive collection of proteins that localize to multiple locations. This procedure comprises extracting reliable protein information from up-to-date online repositories and storing relevant attributes in a relational database.
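The multi-label framing described above (proteins as instances, locations as instance labels) and the notion of inter-dependency among locations can be illustrated with a minimal sketch. This is not the system presented in this work: the protein identifiers, the location sets, and the helper names `marginal` and `joint` are hypothetical and chosen only for illustration. The sketch compares how often two locations are observed together with how often they would co-occur if locations were independent of each other.

```python
from itertools import combinations

# Hypothetical toy data (not from the thesis): each protein is an instance,
# and its set of locations is its set of labels.
proteins = {
    "P1": {"nucleus", "cytoplasm"},
    "P2": {"nucleus", "cytoplasm"},
    "P3": {"mitochondrion"},
    "P4": {"cytoplasm"},
}

def marginal(label):
    """Empirical fraction of proteins annotated with `label`."""
    return sum(label in locs for locs in proteins.values()) / len(proteins)

def joint(a, b):
    """Empirical fraction of proteins annotated with both `a` and `b`."""
    return sum(a in locs and b in locs for locs in proteins.values()) / len(proteins)

labels = sorted(set().union(*proteins.values()))

# If locations were independent, joint(a, b) would be close to
# marginal(a) * marginal(b); a gap between the two suggests a dependency.
for a, b in combinations(labels, 2):
    expected = marginal(a) * marginal(b)
    observed = joint(a, b)
    print(f"{a} + {b}: observed={observed:.3f}, if independent={expected:.3f}")
```

In this toy data, "nucleus" and "cytoplasm" co-occur more often than independence would predict, which is the kind of signal a predictor that treats each location separately cannot exploit.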