Network-level study of protein-protein interactions: analysis and prediction

Date
2017
Publisher
University of Delaware
Abstract
With continuous efforts to identify protein-protein interactions (PPIs) through both high-throughput wet-lab experiments and computational methods, an increasing number of new PPIs have been discovered and validated, making it possible to assemble sizeable (even genome-wide) PPI networks. It has therefore become both feasible and imperative to study PPIs as a whole, at the network level: to gain knowledge about network topology and evolution, and to leverage that knowledge to advance the reconstruction of PPI networks, which remain quite sparse in most cases, by inferring de novo PPIs that are difficult to predict without a network context.
In this dissertation, we systematically studied PPI networks in terms of network evolution analysis and network completion through the prediction of de novo PPIs. We proposed and developed a suite of novel methods, ranging from selecting evolutionary models, to utilizing network evolution and topology, to leveraging multiple heterogeneous data sources for predicting PPIs.
PPI evolution analysis aims to identify the underlying evolution/growth mechanism of PPI networks, which plays a crucial role in understanding PPIs as a network system and in predicting new interactions. By exploring state-of-the-art PPI network evolution models, we developed a novel sampling method based on Approximate Bayesian Computation and a modified Differential Evolution algorithm to select the best-fitting evolution model for different PPI networks. Our analysis of human and yeast PPI networks shows that different PPI networks may follow different evolution/growth models: Duplication-Attachment is the predominant mechanism for human PPI networks, whereas Scale-Free is the predominant mechanism for yeast PPI networks.
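The idea behind ABC-based model selection can be illustrated with a small sketch. This is plain ABC rejection (simulate from each candidate model, keep the simulations whose summary statistics lie closest to the observed network, and read posterior model probabilities off the acceptance frequencies); it is not the Differential Evolution-based sampler described above. The two toy generators, their prior ranges, the summary statistics (mean degree and fraction of degree-1 nodes), and all function names are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def duplication_attachment(n, p, rng):
    """Toy duplication-attachment growth (hypothetical simplification).

    Each new node copies a uniformly chosen existing node and keeps each of
    that node's edges with probability p; if it keeps none, it attaches to
    the copied node so that no isolated nodes are created.
    """
    adj = {0: {1}, 1: {0}}  # seed network: a single edge
    for v in range(2, n):
        u = rng.randrange(v)
        kept = {w for w in adj[u] if rng.random() < p}
        if not kept:
            kept = {u}
        adj[v] = set(kept)
        for w in kept:
            adj[w].add(v)
    return adj


def preferential_attachment(n, m, rng):
    """Barabasi-Albert style scale-free growth with m edges per new node."""
    adj = {i: set() for i in range(m + 1)}
    for i in range(m + 1):  # seed network: a clique on m + 1 nodes
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears once per incident edge end, so uniform sampling from
    # this list is degree-proportional (preferential) sampling.
    ends = [x for x in adj for _ in adj[x]]
    for v in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        adj[v] = set()
        for w in targets:
            adj[v].add(w)
            adj[w].add(v)
            ends.extend([v, w])
    return adj


def summary(adj):
    """Hypothetical summary statistics: mean degree, fraction of leaves."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    n = len(degrees)
    return (sum(degrees) / n, sum(1 for d in degrees if d == 1) / n)


# Candidate models paired with (assumed) priors on their single parameter.
MODELS = {
    "duplication-attachment": (duplication_attachment, lambda rng: rng.uniform(0.1, 0.9)),
    "scale-free": (preferential_attachment, lambda rng: rng.randint(1, 4)),
}


def abc_model_choice(observed, n_sims=400, keep=40, seed=0):
    """ABC rejection for model choice, keeping the `keep` closest simulations."""
    rng = random.Random(seed)
    target = summary(observed)
    n = len(observed)
    draws = []
    for _ in range(n_sims):
        name = rng.choice(sorted(MODELS))  # uniform prior over models
        simulate, prior = MODELS[name]
        adj = simulate(n, prior(rng), rng)
        draws.append((math.dist(summary(adj), target), name))
    draws.sort()
    counts = Counter(name for _, name in draws[:keep])
    return {name: counts[name] / keep for name in MODELS}
```

For example, `abc_model_choice(duplication_attachment(60, 0.4, random.Random(42)))` returns an approximate posterior probability for each candidate model. Keeping a fixed number of nearest simulations (a quantile-based tolerance) guarantees that some draws are accepted; a real analysis would use richer statistics and a more efficient sampler.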
Equipped with the evolution models for different PPI networks, we designed a novel PPI prediction method that incorporates evolution information into the geometric embedding, which improves PPI prediction performance by about 15%.
Despite their rapid growth, PPI networks by and large remain incomplete, sparse, and disconnected for most organisms; network completion therefore poses a grand challenge in systems biology. Many traditional network-level PPI prediction methods use only the connectivity information of existing edges to predict PPIs. From a PPI prediction perspective, however, it is particularly useful to incorporate pairwise features for node pairs that are not currently linked by a direct edge but may become linked. In this dissertation, we developed novel PPI network inference methods that can utilize pairwise features for all node pairs, regardless of whether they are currently directly connected. In particular, our methods can integrate various heterogeneous feature kernels, e.g., a gene co-expression kernel and a protein sequence similarity kernel, to build the PPI inference matrix, whose elements are interpreted as the probability that the two corresponding proteins interact. Specifically, we adopt two strategies to optimize the weights of the feature kernels, which are combined into a kernel fusion and, eventually, the PPI inference matrix. Tested on yeast PPI data and compared with two control methods, our proposed methods show a significant improvement in performance as measured by receiver operating characteristic (ROC) analysis.
Another challenge of PPI prediction is how to train a prediction model over extremely sparse and disconnected PPI networks. Many existing network-level methods assume that the training network is connected. However, this assumption greatly limits their predictive power and range of application, because current gold-standard PPI networks are actually very sparse and disconnected.
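The kernel-fusion step mentioned above (weighting several pairwise feature kernels and combining them into one inference matrix) can be sketched as follows. The abstract does not spell out its two weight-optimization strategies, so this sketch swaps in a simple, hypothetical one: a grid search over a convex weight for two kernels that maximizes ROC AUC on known training interactions. Kernels are assumed to be square nested lists of pairwise similarities in [0, 1]; the function names are illustrative.

```python
from itertools import combinations


def fuse(kernels, weights):
    """Weighted sum sum_i w_i * K_i of same-sized square kernel matrices.

    With non-negative weights summing to 1 and kernel entries in [0, 1],
    the fused entries also lie in [0, 1] and can be read as interaction scores.
    """
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


def auc(scores_pos, scores_neg):
    """ROC AUC via the Mann-Whitney statistic (ties count as one half)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


def grid_search_weights(k1, k2, known_edges, steps=10):
    """Hypothetical strategy: choose w for w*k1 + (1 - w)*k2 maximizing AUC.

    Known interacting pairs are positives; all other pairs are negatives.
    """
    n = len(k1)
    pairs = list(combinations(range(n), 2))
    edges = {tuple(sorted(e)) for e in known_edges}
    best_w, best_auc = 0.0, -1.0
    for s in range(steps + 1):
        w = s / steps
        fused = fuse([k1, k2], [w, 1.0 - w])
        pos = [fused[i][j] for i, j in pairs if (i, j) in edges]
        neg = [fused[i][j] for i, j in pairs if (i, j) not in edges]
        score = auc(pos, neg)
        if score > best_auc:  # keep the first weight reaching the best AUC
            best_w, best_auc = w, score
    return best_w, best_auc
```

After `w, _ = grid_search_weights(k1, k2, training_edges)`, the inference matrix would be `fuse([k1, k2], [w, 1 - w])`. Training AUC is used here only because it is easy to compute; in practice the weights should be chosen on held-out pairs to avoid overfitting.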
We developed a novel PPI prediction method based on a deep neural network and the regularized Laplacian kernel. We use the neural network to implicitly simulate and guide the evolution of a PPI network, using rows of an ancient network as inputs and rows of the disconnected training network as labels. After training, the outputs of the neural network form the rows of an evolved PPI network. We then predict PPIs by applying the regularized Laplacian kernel to the transition matrix built from the evolved PPI network. Cross-validation experiments show that this approach further improves PPI prediction accuracy on both yeast and human data. Moreover, the transition matrix based on the evolved PPI network can also be used to leverage complementary information from the disconnected training network and multiple heterogeneous data sources.
In sum, the work in this dissertation contributes to the understanding of PPI networks, especially large and sparse ones, through novel methods for selecting network evolutionary models and for leveraging network topology and heterogeneous features to improve prediction performance. We believe the methods proposed in this dissertation will be useful tools to help researchers further analyze PPI data and predict PPIs.
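The prediction step of the deep-learning method, applying a regularized Laplacian kernel to a transition matrix, can be sketched as below. The sketch assumes the evolved network is given as a non-negative weight matrix W (for instance, the stacked neural-network outputs), forms the row-stochastic transition matrix T, takes L = I - T, and uses the common form K = (I + alpha * L)^-1. The exact Laplacian construction, the value of alpha, the symmetrization of K, and the function names are assumptions, not details stated in the abstract. It uses a small pure-Python Gauss-Jordan inverse so it runs without third-party packages.

```python
def invert(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def transition_matrix(W):
    """Row-normalize a non-negative weight matrix (all-zero rows stay zero)."""
    T = []
    for row in W:
        s = sum(row)
        T.append([x / s for x in row] if s > 0 else [0.0] * len(row))
    return T


def regularized_laplacian(W, alpha=0.5):
    """Score matrix K = (I + alpha * L)^-1 with L = I - T (assumed form).

    K is symmetrized as (K + K^T) / 2 because T is generally not symmetric.
    """
    n = len(W)
    T = transition_matrix(W)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [[eye[i][j] + alpha * (eye[i][j] - T[i][j]) for j in range(n)] for i in range(n)]
    K = invert(M)
    return [[(K[i][j] + K[j][i]) / 2 for j in range(n)] for i in range(n)]
```

Writing I + alpha * L = (1 + alpha)(I - beta * T) with beta = alpha / (1 + alpha) shows that K = (1 / (1 + alpha)) * sum_k beta^k T^k: the kernel accumulates random walks of every length, discounting longer ones, which is why it can score protein pairs that share no direct edge. Since the rows of T sum to at most 1, the matrix being inverted is non-singular for any alpha >= 0.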
Keywords
Applied sciences, Interaction prediction, Network evolution, Network inference, PPI network, Random walk, Weight optimization