Browsing by Author "Ren, Jia"
Item Improve large-scale automatic information extraction for biomedical knowledge discovery and curation (University of Delaware, 2020) Ren, Jia
Numerous efforts have been made to develop text-mining tools that extract information from biomedical text automatically. These tools have assisted in many biological tasks, such as database curation and hypothesis generation. However, major challenges remain in using text-mining tools for large-scale automatic information extraction for knowledge discovery and curation. First, text-mining tools usually differ from each other in programming language, system dependencies, and input/output format, so substantial engineering effort is required to use them in a single large-scale data processing framework and to consolidate their results. Second, text-mining results unavoidably contain errors, which hinder their use for knowledge discovery and fast curation. Last but not least, text-mining results are usually disseminated in a different venue than the one where the documents were originally published, e.g., European PMC, making it difficult for users to quickly obtain the information while reading the papers.
In this dissertation, we describe our efforts to address the three challenges. First, we develop the iTextMine system with an automated workflow to run multiple text-mining tools on large-scale text for knowledge extraction. We employ parallel processing for dockerized text-mining tools with a standardized JSON output format and implement a text alignment algorithm to resolve text discrepancies for result integration. Currently, iTextMine consists of four relation extraction tools and has processed all the Medline abstracts and PMC open access full-length articles.
To remove errors and improve result quality, we further develop several post-processing modules to filter, evaluate, and aggregate the extracted relations.
We integrate several tools to label negation, hedging, and citation in a sentence, and mark the relations affected by these phenomena. A confidence module with state-of-the-art deep learning methods is developed to assign confidence scores to relations extracted by rule-based text-mining tools. We compare the performance of several models for well-calibrated confidence scores. These add-on steps produce higher-quality annotations and allow them to be ranked by confidence to facilitate curation.
Last but not least, we explore a popular web annotation system to disseminate iTextMine results and broaden the curator community. We demonstrate how to submit the annotations to a publisher's website at the post-publication stage. Meanwhile, we showcase how our existing pipeline can be modified to annotate pre-publication biomedical text.
The iTextMine website and its data APIs are available at http://research.bioinformatics.udel.edu/itextmine

Item iPTMnet visualization with Cytoscape Web (University of Delaware, 2013) Ren, Jia
iPTMnet is a new bioinformatics resource for post-translational modifications (PTMs). It integrates text mining, data mining, databases, and ontologies into one convenient platform. These large collections of relations are important tools for studying specific cell behaviors. Data visualization is an effective way to make sense of thousands of rows of data, and a graphic presentation can highlight important substructures. We use Cytoscape Web, a web-based graphic presentation platform, to build a visualization tool for iPTMnet. Several web visualization tools have been built with Cytoscape Web to support projects that aim to show protein-protein interactions, predict related gene groups, and integrate protein-protein interactions, chromatin machinery, and human disease. Here we present a novel graphic presentation of iPTMnet.
The motivation of the thesis work is to assist researchers in quickly identifying PTMs for one or a group of Protein Ontology (PRO) entries. In this work, we mainly focus on PRO-curated PTM relations and visualize the PRO hierarchy structure, protein-protein interactions, phosphorylation, and acetylation. We tackled problems in storing and presenting the hierarchy structure, creating a user-friendly interface, and constructing view functions. The visualization tool is used to support a spindle checkpoint study and helps reveal evolutionary relationships, perform cross-species comparisons to predict phosphorylation sites, and construct PPI networks.

Item Protein Ontology (PRO): enhancing and scaling up the representation of protein entities (Oxford University Press, 2016-11-28) Natale, Darren A.; Arighi, Cecilia N.; Blake, Judith A.; Bona, Jonathan; Chen, Chuming; Chen, Sheng-Chih; Christie, Karen R.; Cowart, Julie; D’Eustachio, Peter; Diehl, Alexander D.; Drabkin, Harold J.; Duncan, William D.; Huang, Hongzhan; Ren, Jia; Ross, Karen; Ruttenberg, Alan; Shamovsky, Veronica; Smith, Barry; Wang, Qinghua; Zhang, Jian; El-Sayed, Abdelrahman; Wu, Cathy H.
The Protein Ontology (PRO; http://purl.obolibrary.org/obo/pr) formally defines and describes taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution; proteins produced from a given gene; and protein-containing complexes. PRO thus serves as a tool for referencing protein entities at any level of specificity.
To enhance this ability, and to facilitate the comparison of such entities described in different resources, we developed a standardized representation of proteoforms using UniProtKB as a sequence reference and PSI-MOD as a post-translational modification reference. We illustrate its use in facilitating an alignment between PRO and Reactome protein entities. We also address issues of scalability, describing our first steps into the use of text mining to identify protein-related entities, the large-scale import of proteoform information from expert-curated resources, and our ability to dynamically generate PRO terms. Web views for individual terms are now more informative about closely related terms, including, for example, an interactive multiple sequence alignment. Finally, we describe recent improvements in semantic utility, with PRO now represented in OWL and as a SPARQL endpoint. These developments will further support the anticipated growth of PRO and facilitate discoverability of and allow aggregation of data relating to protein entities.

Item WebGIVI: a web-based gene enrichment analysis and visualization tool (BioMed Central, 2017-05-04) Sun, Liang; Zhu, Yongnan; Mahmood, A. S. M. Ashique; Tudor, Catalina O.; Ren, Jia; Vijay-Shanker, K.; Chen, Jian; Schmidt, Carl J.
BACKGROUND: A major challenge of high-throughput transcriptome studies is presenting the data to researchers in an interpretable format. In many cases, the outputs of such studies are gene lists, which are then examined for enriched biological concepts. One approach to help the researcher interpret large gene datasets is to associate genes with informative terms (iTerms) that are obtained from the biomedical literature using the eGIFT text-mining system.
However, examining large lists of iTerm and gene pairs is a daunting task.
RESULTS: We have developed WebGIVI, an interactive web-based visualization tool (http://raven.anr.udel.edu/webgivi/) to explore gene:iTerm pairs. WebGIVI was built with the Cytoscape and Data-Driven Documents (D3) JavaScript libraries and can be used to relate genes to iTerms and then visualize gene and iTerm pairs. WebGIVI accepts a gene list, which is used to retrieve the gene symbols and the corresponding iTerm list. This list can be submitted to visualize the gene:iTerm pairs using two distinct methods: a Concept Map or a Cytoscape Network Map. WebGIVI also supports uploading and visualizing any two-column tab-separated data.
CONCLUSIONS: WebGIVI provides an interactive and integrated network graph of genes and iTerms that allows filtering, sorting, and grouping, which can aid biologists in developing hypotheses based on the input gene lists. In addition, WebGIVI can visualize hundreds of nodes and generate high-resolution images, which are important for most research publications. The source code can be freely downloaded at https://github.com/sunliang3361/WebGIVI. The WebGIVI tutorial is available at http://raven.anr.udel.edu/webgivi/tutorial.php.
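As an illustration of the two-column tab-separated input mentioned in the WebGIVI abstract, the following is a minimal Python sketch of how such a file could be read and grouped before visualization. It is not taken from the WebGIVI source code: the function names `read_pairs` and `group_by_term`, the column order (gene first, iTerm second), and the sample gene and term labels are all assumptions made for this example.

```python
import csv
import io
from collections import defaultdict


def read_pairs(handle):
    """Read two-column tab-separated rows into (left, right) tuples.

    Rows that do not have exactly two non-empty columns are skipped.
    The column order (gene, iTerm) is an assumption for this sketch.
    """
    pairs = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue
        left, right = row[0].strip(), row[1].strip()
        if left and right:
            pairs.append((left, right))
    return pairs


def group_by_term(pairs):
    """Map each term to the sorted list of genes paired with it."""
    grouped = defaultdict(set)
    for gene, term in pairs:
        grouped[term].add(gene)
    return {term: sorted(genes) for term, genes in grouped.items()}


# Hypothetical sample data; the labels are placeholders, not real results.
SAMPLE = "GENE_A\tterm one\nGENE_B\tterm one\nGENE_A\tterm two\n\nbadrow\n"
PAIRS = read_pairs(io.StringIO(SAMPLE))
GROUPED = group_by_term(PAIRS)
```

Grouping by term gives a structure in which each term node lists the genes that link to it, which is one plausible starting point for a network or concept-map style view.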