A probabilistic framework for protein multi-location prediction, and its applicability to multi-label classification
Author(s) | Simha, Ramanuja | |
Date Accessioned | 2017-02-06T13:24:34Z | |
Date Available | 2017-02-06T13:24:34Z | |
Publication Date | 2016 | |
Abstract | Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. While it has been shown that proteins localize to multiple locations, most computational methods assign a single location per protein. A few recent systems assign multiple locations to proteins. However, they typically treat locations as independent of each other. We present a system for protein multi-location prediction that utilizes inter-dependencies among locations for predicting multiple locations of proteins. Results obtained by using this system show that incorporating such inter-dependencies in the location prediction process improves the classifier's prediction performance. In machine learning terms, assigning multiple locations to proteins is a special case of multi-label classification (MLC), where proteins can be viewed as instances and locations as instance labels. Improving on the initial system, we introduce an advanced approach for MLC based on a probabilistic generative model that explicitly captures dependencies between features and subsets of labels, in addition to representing inter-dependencies among labels as done by our earlier system. Experimental results demonstrate improved performance of our system for protein multi-location prediction as well as for the general problem of MLC. The most comprehensive current set of multi-localized proteins used to assess the performance of multi-location prediction systems contains proteins localizing to only two locations. We present a procedure to construct a more extensive collection of proteins that localize to multiple locations. This procedure comprises extracting reliable protein information from up-to-date online repositories and storing relevant attributes in a relational database. | en_US |
Advisor | Shatkay, Hagit | |
Degree | Ph.D. | |
Department | University of Delaware, Department of Computer and Information Sciences | |
Unique Identifier | 971492924 | |
URL | http://udspace.udel.edu/handle/19716/20447 | |
Publisher | University of Delaware | en_US |
URI | https://search.proquest.com/docview/1840890205?accountid=10457 | |
dc.subject.lcsh | Probabilities. | |
dc.subject.lcsh | Proteins. | |
Title | A probabilistic framework for protein multi-location prediction, and its applicability to multi-label classification | en_US |
Type | Thesis | en_US |