A probabilistic framework for protein multi-location prediction, and its applicability to multi-label classification

Author(s): Simha, Ramanuja
Date Accessioned: 2017-02-06T13:24:34Z
Date Available: 2017-02-06T13:24:34Z
Publication Date: 2016
Abstract: Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. While it has been shown that proteins localize to multiple locations, most computational methods assign a single location per protein. A few recent systems assign multiple locations to proteins. However, they typically treat locations as independent of each other. We present a system for protein multi-location prediction that utilizes inter-dependencies among locations for predicting multiple locations of proteins. Results obtained by using this system show that incorporating such inter-dependencies in the location prediction process improves the classifier's prediction performance. In machine learning terms, assigning multiple locations to proteins is a special case of multi-label classification (MLC), where proteins can be viewed as instances and locations as instance labels. Improving on the initial system, we introduce an advanced approach for MLC based on a probabilistic generative model that explicitly captures dependencies between features and subsets of labels, in addition to representing inter-dependencies among labels as done by our earlier system. Experimental results demonstrate improved performance of our system for protein multi-location prediction as well as for the general problem of MLC. The most comprehensive current set of multi-localized proteins used to assess the performance of multi-location prediction systems contains proteins localizing to only two locations. We present a procedure to construct a more extensive collection of proteins that localize to multiple locations. This procedure comprises extracting reliable protein information from up-to-date online repositories and storing relevant attributes in a relational database.
Advisor: Shatkay, Hagit
Degree: Ph.D.
Department: University of Delaware, Department of Computer and Information Sciences
Unique Identifier: 971492924
URL: http://udspace.udel.edu/handle/19716/20447
Publisher: University of Delaware
URI: https://search.proquest.com/docview/1840890205?accountid=10457
dc.subject.lcsh: Probabilities.
dc.subject.lcsh: Proteins.
Title: A probabilistic framework for protein multi-location prediction, and its applicability to multi-label classification
Type: Thesis
Files
Original bundle
Name: 2016_SimhaRamanuja_PhD.pdf
Size: 10.82 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.22 KB
Format: Item-specific license agreed upon to submission