DiMeX: A Text Mining System for Mutation-Disease Association Extraction

Mahmood, A. S. M. Ashique; Wu, Tsung-Jung; Mazumder, Raja; Vijay-Shanker, K.

Author(s)	Mahmood, A. S. M. Ashique
Author(s)	Wu, Tsung-Jung
Author(s)	Mazumder, Raja
Author(s)	Vijay-Shanker, K.
Ordered Author	A. S. M. Ashique Mahmood, Tsung-Jung Wu, Raja Mazumder, K. Vijay-Shanker
UD Author	Mahmood, A. S. M. Ashique	en_US
UD Author	Vijay-Shanker, K.	en_US
Date Accessioned	2016-11-10T15:56:09Z
Date Available	2016-11-10T15:56:09Z
Copyright Date	Copyright © 2016 Mahmood et al.	en_US
Publication Date	2016-04-13
Description	Publisher's PDF	en_US
Abstract	The number of published articles describing associations between mutations and diseases is increasing rapidly. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation-disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall, with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has also been evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low-precision problems suffered by most approaches. DiMeX was applied to a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators in enriching mutation databases.	en_US
Department	University of Delaware. Department of Computer and Information Sciences.	en_US
Citation	Mahmood ASMA, Wu T-J, Mazumder R, Vijay-Shanker K (2016) DiMeX: A Text Mining System for Mutation-Disease Association Extraction. PLoS ONE 11(4): e0152725. doi:10.1371/journal.pone.0152725	en_US
DOI	doi:10.1371/journal.pone.0152725	en_US
ISSN	1932-6203	en_US
URL	http://udspace.udel.edu/handle/19716/19832
Language	en_US	en_US
Publisher	Public Library of Science	en_US
dc.rights	CC BY	en_US
dc.source	PLOS One	en_US
dc.source.uri	http://journals.plos.org/plosone/	en_US
Title	DiMeX: A Text Mining System for Mutation-Disease Association Extraction	en_US
Type	Article	en_US

Files

Original bundle

Name: DiMeX.pone.0152725_1461075042T1657.pdf
Size: 1.08 MB
Format: Adobe Portable Document Format

Collections

Open Access Publications