Applying machine learning methods to electronic health records: studies in risk-stratification of chronic kidney disease and hypertrophic cardiomyopathy

Date
2019
Publisher
University of Delaware
Abstract
Electronic Health Records (EHRs) provide valuable clinical information that can be used for disease prediction and patient risk-stratification. Applying machine learning methods to EHR data can enable identification of characteristic patterns in individuals and in whole populations. However, learning accurate models from EHRs poses a number of challenges, including class imbalance and missing observations. In this thesis, we develop machine learning approaches for risk-stratification that effectively address the challenges stemming from EHR analysis. We present our work in the context of chronic kidney disease (CKD) and hypertrophic cardiomyopathy (HCM), two common chronic conditions.

We propose two approaches for addressing class imbalance. The first is a sampling-based ensemble method that attains high performance when used to stratify CKD by severity level. The second combines under- and over-sampling with an ensemble classifier and effectively identifies HCM patients at risk for adverse outcomes. To identify groups of co-occurring medical conditions among CKD patients, we introduce a probabilistic framework that employs topic modeling in a non-traditional way. The resulting topics are clinically meaningful, tight, and distinct. Finally, we present a supervised learning framework that effectively stratifies CKD patients by hospitalization risk. The models proposed in this thesis have the potential to assist healthcare providers in making clinical decisions.