Department of Computer and Information Sciences
For more information, see the Department of Computer and Information Sciences website.
Browsing Department of Computer and Information Sciences by Title
Now showing 1 - 20 of 48
Item A comprehensive analysis of the integration of team research between sport psychology and management (Psychology of Sport and Exercise, 2020-06-13) Emich, Kyle J.; Norder, Kurt; Lu, Li; Sawhney, Aman
Both sports and organizations rely on teams. As such, the sport psychology and management literatures have contributed greatly to our understanding of team functioning. Despite this, previous reviews based on subsets of articles in these literatures indicate a lack of communication between them. In this article, we assess the state of integration between the entirety of the sport psychology and management literatures on teams by considering the full set of interconnected team articles in the SCOPUS database (6974 articles over 69 years). We use these data to conduct a combination of citation network analysis and content analysis via topic modeling to evaluate conceptual integration. The data show that interdisciplinary discussion between these two fields is lacking, particularly regarding the integration of sport psychology into management research. Whereas 7% of references to team articles in sport psychology come from management journals, only 0.6% of team references in management journals come from sport psychology. Despite this, longitudinal analysis indicates that the rate of integration between these fields has been increasing over the last 10 years. We identify specific topics that have accounted for this integration and suggest topics ripe for future integration.

Item A Game-Theoretic Approach to Energy-Efficient Elevator Scheduling in Smart Buildings (IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023-02-22) Maleki, Erfan Farhangi; Bhatta, Dixit; Mashayekhy, Lena
Buildings, which produce a larger carbon footprint than the transportation sector, account for a significant portion of the United States’ total energy consumption.
By employing modern automation techniques, smart buildings can significantly reduce energy consumption, protect the environment, and consequently improve quality of life. This article focuses on the automation of elevator scheduling, an NP-hard problem, to reduce energy usage in smart buildings and improve users’ quality of experience. We propose an optimal mathematical model for the elevator scheduling problem using integer programming. We then propose a novel game-theoretic approach that captures interactions within the elevator system to reduce energy consumption and enhance user experience. We propose a request coalition formation game, where nonoverlapping coalitions of user requests are served by elevators to minimize their movements and energy consumption while reducing service time and stops for users. We analyze the performance of our proposed approach using the optimal solution as a benchmark and the Nearest Car and Fixed Sectoring algorithms as rivals. The experiments show that our approach is significantly more efficient in terms of energy consumption and service time, making it suitable for smart buildings.

Item A scoping review of the use of lab streaming layer framework in virtual and augmented reality research (Virtual Reality, 2023-05-02) Wang, Qile; Zhang, Qinqi; Sun, Weitong; Boulay, Chadwick; Kim, Kangsoo; Barmaki, Roghayeh Leila
The use of multimodal data allows excellent opportunities for human–computer interaction research and novel techniques regarding virtual and augmented reality (VR/AR) experiences. Collecting, coordinating, and synchronizing a large amount of data from multiple VR/AR hardware while maintaining a high framerate can be a daunting task, despite the compelling nature of multimodal data. The Lab Streaming Layer (LSL) is an open-source framework that enables the synchronous collection of various types of multimodal data, unlike existing expensive alternatives.
However, despite its potential, this framework has not been fully adopted by the VR/AR research community. In this paper, we present a guideline for using the LSL framework in VR/AR research and report current trends by performing a comprehensive literature review on the subject. We extracted 549 publications using LSL from January 2015 to March 2022. We analyze the types of data, displays, and targeted application areas. We provide in-depth reviews of 38 selected papers and characterize the use of LSL in the VR/AR research community, highlighting benefits, challenges, and future opportunities.

Item An Efficient Approach to Predict Eye Diseases from Symptoms Using Machine Learning and Ranker-Based Feature Selection Methods (Bioengineering, 2022-12-24) Marouf, Ahmed Al; Mottalib, Md Mozaharul; Alhajj, Reda; Rokne, Jon; Jafarullah, Omar
The eye is generally considered the most important sensory organ of humans. Diseases and other degenerative conditions of the eye are therefore of great concern, as they affect the function of this vital organ. With proper early diagnosis by experts and optimal use of medicines and surgical techniques, these diseases or conditions can in many cases be either cured or greatly mitigated. Experts who perform the diagnosis are in high demand and their services are expensive; hence, identification of the cause of vision problems is often postponed or skipped altogether, so corrective measures are taken too late or not at all. An efficient model to predict eye diseases using machine learning (ML) and ranker-based feature selection (r-FS) methods is therefore proposed to aid in obtaining a correct diagnosis. The aim of this model is to automatically predict one or more of five common eye diseases, namely Cataracts (CT), Acute Angle-Closure Glaucoma (AACG), Primary Congenital Glaucoma (PCG), Exophthalmos or Bulging Eyes (BE) and Ocular Hypertension (OH).
We have used efficient data collection methods and data annotations by professional ophthalmologists, applied five different feature selection methods, used two types of data splitting techniques (train-test and stratified k-fold cross-validation), and applied nine ML methods for the overall prediction approach. For the ML methods, we have chosen suitable classic approaches, such as Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), AdaBoost (AB), Logistic Regression (LR), k-Nearest Neighbour (k-NN), Bagging (Bg), Boosting (BS) and Support Vector Machine (SVM). We have performed a symptomatic analysis of the prominent symptoms of each of the five eye diseases. The results of the analysis and the comparison between methods are shown separately. For the comparison, we have adopted traditional performance indices, such as accuracy, precision, sensitivity, F1-score, etc. Finally, SVM outperformed the other models, obtaining the highest accuracy of 99.11% under 10-fold cross-validation, while LR obtained 98.58% with an 80:20 train-test split.

Item Automated Identification of Uniqueness in JUnit Tests (ACM Transactions on Software Engineering and Methodology, 2022-05-24) Wu, Jianwei; Clause, James
In the context of testing, descriptive test names are desirable because they document the purpose of tests and facilitate comprehension tasks during maintenance. Unfortunately, prior work has shown that tests often do not have descriptive names. To address this limitation, techniques have been developed to automatically generate descriptive names. However, they often generate names that are invalid or do not meet with developer approval. To help address these limitations, we present a novel approach to extract the attributes of a given test that make it unique among its siblings. Because such attributes often serve as the basis for descriptive names, identifying them is an important first step towards improving test name generation approaches.
To evaluate the approach, we created a prototype implementation for JUnit tests and compared its output with human judgment. The results of the evaluation demonstrate that the attributes identified by the approach are consistent with human judgment and are likely to be useful for future name generation techniques.

Item A Bifactor Approximation Algorithm for Cloudlet Placement in Edge Computing (IEEE Transactions on Parallel and Distributed Systems, 2021-11-15) Bhatta, Dixit; Mashayekhy, Lena
Emerging applications with low-latency requirements, such as real-time analytics, immersive media applications, and intelligent virtual assistants, have rendered edge computing a critical computing infrastructure. Existing studies have explored the cloudlet placement problem in a homogeneous scenario with different goals such as latency minimization, load balancing, energy efficiency, and placement cost minimization. However, placing cloudlets in a highly heterogeneous deployment scenario, considering next-generation 5G networks and IoT applications, is still an open challenge. The novel requirements of these applications indicate that there is still a gap in ensuring low-latency service guarantees when deploying cloudlets. Furthermore, deploying cloudlets in a cost-effective manner and ensuring full coverage for all users in edge computing are other critical, conflicting issues. In this article, we address these issues by designing a bifactor approximation algorithm that solves the heterogeneous cloudlet placement problem with a bounded latency and placement cost, while fully mapping user applications to appropriate cloudlets. We first formulate the problem as a multi-objective integer programming model and show that it is computationally NP-hard. We then propose a bifactor approximation algorithm, ACP, to tackle its intractability.
We investigate the effectiveness of ACP through extensive theoretical analysis and experiments on multiple deployment scenarios based on New York City OpenData. We prove that ACP provides a (2,4)-approximation ratio for the latency and the placement cost. The experimental results show that ACP obtains near-optimal results in polynomial running time, making it suitable for both short-term and long-term cloudlet placement in heterogeneous deployment scenarios.

Item BioC-compatible full-text passage detection for protein-protein interactions using extended dependency graph (Oxford University Press, 4/12/16) Peng, Yifan; Arighi, Cecilia; Wu, Cathy H.; Vijay-Shanker, K.
There has been a large growth in the number of biomedical publications that report experimental results. Many of these results concern detection of protein-protein interactions (PPI). In BioCreative V, we participated in the BioC task and developed a PPI system to detect text passages with PPIs in full-text articles. By adopting the BioC format, the output of the system can be seamlessly added to the biocuration pipeline with little effort required for system integration. A distinctive feature of our PPI system is that it utilizes the extended dependency graph, an intermediate level of representation that attempts to abstract away syntactic variations in text. As a result, we are able to use only a limited set of rules to extract PPI pairs in sentences, plus additional rules to detect further passages for PPI pairs. For evaluation, we used the 95 articles that were provided for the BioC annotation task. We retrieved the unique PPIs from the BioGRID database for these articles and show that our system achieves a recall of 83.5%.
To evaluate the detection of passages with PPIs, we further annotated the Abstract and Results sections of 20 documents from the dataset and show that an F-value of 80.5% was obtained. To evaluate the generalizability of the system, we also conducted experiments on AIMed, a well-known PPI corpus. We achieved an F-value of 76.1% for sentence detection and an F-value of 64.7% for unique PPI detection.

Item BioCreative V BioC track overview: collaborative biocurator assistant task for BioGRID (Oxford University Press, 8/2/16) Kim, Sun; Dogan, Rezarta Islamaj; Chatr-Aryamontri, Andrew; Chang, Christie S.; Oughtred, Rose; Rust, Jennifer; Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sophia; Matos, Sergio; Santos, Andre; Campos, David; Oliveira, Jose Luis; Singh, Onkar; Jonnagaddala, Jitendra; Dai, Hong-Jie; Su, Emily Chia-Yu; Chang, Yung-Chun; Su, Yu-Chen; Chu, Chun-Han; Chen, Chien Chin; Hsu, Wen-Lian; Peng, Yifan; Arighi, Cecilia; Wu, Cathy H.; Vijay-Shanker, K.; Aydin, Ferhat; Husunbeyi, Zehra Melce; Ozgur, Arzucan; Shin, Soo-Yong; Kwon, Dongseop; Dolinski, Kara; Tyers, Mike; Wilbur, W. John; Comeau, Donald C.
BioC is a simple XML format for text, annotations and relations, developed to achieve interoperability for biomedical text processing.
Following the success of BioC in BioCreative IV, the BioCreative V BioC track addressed a collaborative task to build an assistant system for BioGRID curation. In this paper, we describe the framework of the collaborative BioC task and discuss our findings based on the user survey. This track consisted of eight subtasks, including gene/protein/organism named entity recognition, protein-protein/genetic interaction passage identification, and annotation visualization. Using BioC as their data-sharing and communication medium, nine teams worldwide participated and contributed either new methods or improvements of existing tools to address different subtasks of the BioC track. Results from different teams were shared in BioC and made available to other teams as they addressed different subtasks of the track. In the end, all submitted runs were merged using a machine learning classifier to produce an optimized output. The biocurator assistant system was evaluated by four BioGRID curators in terms of practical usability. The curators' feedback was overall positive and highlighted the user-friendly design and the convenient gene/protein curation tool based on text mining.

Item Communication-Constrained Routing and Traffic Control: A Framework for Infrastructure-Assisted Autonomous Vehicles (IEEE Transactions on Intelligent Transportation Systems, 2022-09-07) Liu, Guangyi; Salehi, Seyedmohammad; Bala, Erdem; Shen, Chien-Chung; Cimini, Leonard J.
With the increasing demand for advanced autonomous driving, the available communication resources may become constrained over different geographic areas. In addition, due to dynamic channel variations and imperfect cell deployments, guaranteeing the required communication resources for data-hungry and delay-sensitive applications in autonomous vehicles (AVs), along their entire trips, becomes challenging.
To address these issues, this paper investigates the feasibility of a hybrid system-optimum and user-equilibrium AV traffic framework subject to communication constraints, as well as its performance gain. Within such a framework, the paper introduces the problems of communication-constrained routing (CCR) and traffic control (CCTC) in the context of infrastructure-assisted autonomous driving and presents respective solutions. For CCR, an efficient two-layered routing scheme is proposed that provides optimal trip duration. Simulation results show that the routing scheme achieves a good balance between longer duration of communication coverage and acceptable source-to-destination travel time. For CCTC, it is shown that there exists an optimal AV speed on each road segment, as well as an optimal inter-AV distance and an optimal number of AVs in each cell, to maximize the road-network AV throughput within a single cell. Moreover, spectrum allocation is used to achieve Pareto-optimal road-network throughput across cells, and a new key performance index (KPI) is defined to evaluate the traffic control capability of cellular systems. Simulation results validate the improvement of AV throughput via the proposed CCTC solution.

Item COVID-19 Knowledge Graph from semantic integration of biomedical literature and databases (Bioinformatics, 2021-10-06) Chen, Chuming; Ross, Karen E.; Gavali, Sachin; Cowart, Julie E.; Wu, Cathy H.
The global response to the COVID-19 pandemic has led to a rapid increase of scientific literature on this deadly disease. Extracting knowledge from biomedical literature and integrating it with relevant information from curated biological databases is essential to gain insight into COVID-19 etiology, diagnosis and treatment.
We used the Semantic Web technology RDF to integrate COVID-19 knowledge mined from the literature by iTextMine, PubTator and SemRep with relevant biological databases, and formalized the knowledge in a standardized and computable COVID-19 Knowledge Graph (KG). We published the COVID-19 KG via a SPARQL endpoint to support federated queries on the Semantic Web and developed a knowledge portal with browsing and searching interfaces. We also developed a RESTful API to support programmatic access and provided RDF dumps for download.

Item A crowdsourcing open platform for literature curation in UniProt (PLOS Biology, 2021-12-06) Wang, Yuqi; Wang, Qinghua; Huang, Hongzhan; Huang, Wei; Chen, Yongxing; McGarvey, Peter B.; Wu, Cathy H.; Arighi, Cecilia N.
The UniProt knowledgebase is a public database for protein sequence and function, covering the tree of life and over 220 million protein entries. Now, the whole community can use a new crowdsourcing annotation system to help scale up UniProt curation and receive proper attribution for their biocuration work.

Item DiMeX: A Text Mining System for Mutation-Disease Association Extraction (Public Library of Science, 2016-04-13) Mahmood, A. S. M. Ashique; Wu, Tsung-Jung; Mazumder, Raja; Vijay-Shanker, K.
The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation-disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations.
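As a toy illustration of the kind of pattern-based mutation mention extraction described above (the regular expression and helper below are illustrative assumptions, not DiMeX's actual rules), a single pattern can already pick up substitutions written in the common "V600E" style:

```python
import re

# Toy pattern for protein-level substitutions written as "V600E":
# one reference amino-acid letter, a position, and one alternate letter.
# Real systems, DiMeX included, use far richer patterns and normalization.
MUTATION_RE = re.compile(r"\b([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])\b")

def extract_mutations(sentence):
    """Return (wild-type, position, mutant) triples mentioned in a sentence."""
    return [(m.group(1), int(m.group(2)), m.group(3))
            for m in MUTATION_RE.finditer(sentence)]

print(extract_mutations("The BRAF V600E mutation is associated with melanoma."))
# → [('V', 600, 'E')]
```

Associating such mentions with the co-mentioned gene and disease terms in the same sentence is where the syntactic and semantic patterns of a full system come in.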
DiMeX achieves high precision and recall, with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has also been evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms existing mutation-disease association tools, addressing the low-precision problems suffered by most approaches. DiMeX was applied to a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators to enrich mutation databases.

Item E3-UAV: An Edge-Based Energy-Efficient Object Detection System for Unmanned Aerial Vehicles (IEEE Internet of Things Journal, 2023-08-03) Suo, Jiashun; Zhang, Xingzhou; Shi, Weisong; Zhou, Wei
Motivated by the advances in deep learning techniques, the application of Unmanned Aerial Vehicle (UAV)-based object detection has proliferated across a range of fields, including vehicle counting, fire detection, and city monitoring. While most existing research addresses only a subset of the challenges inherent to UAV-based object detection, few studies balance the various aspects needed to design a practical system for reducing energy consumption. In response, we present E3-UAV, an edge-based energy-efficient object detection system for UAVs.
The system is designed to dynamically support various UAV devices, edge devices, and detection algorithms, with the aim of minimizing energy consumption by deciding the most energy-efficient flight parameters (including flight altitude, flight speed, detection algorithm, and sampling rate) required to fulfill the detection requirements of the task. We first present an effective evaluation metric for actual tasks and construct a transparent energy consumption model, based on hundreds of actual flight records, to formalize the relationship between energy consumption and flight parameters. We then present a lightweight energy-efficient priority decision algorithm, based on a large quantity of actual flight data, to assist the system in deciding flight parameters. Finally, we evaluate the performance of the system; our experimental results demonstrate that it can significantly decrease energy consumption in real-world scenarios. Additionally, we provide four insights that can assist researchers and engineers in their efforts to study UAV-based object detection further.

Item Effective biomedical document classification for identifying publications relevant to the mouse Gene Expression Database (GXD) (Oxford University Press, 2017-03-24) Jiang, Xiangying; Ringwald, Martin; Blake, Judith; Shatkay, Hagit
The Gene Expression Database (GXD) is a comprehensive online database within the Mouse Genome Informatics resource, aiming to provide available information about endogenous gene expression during mouse development. The information stems primarily from many thousands of biomedical publications that database curators must go through and read. Given the very large number of biomedical papers published each year, automatic document classification plays an important role in biomedical research.
Specifically, an effective and efficient document classifier is needed to support the GXD annotation workflow. We present here an effective yet relatively simple classification scheme, which uses readily available tools while employing feature selection, aiming to assist curators in identifying publications relevant to GXD. We examine the performance of our method over a large manually curated dataset consisting of more than 25,000 PubMed abstracts, of which about half are curated as relevant to GXD and the other half as irrelevant. In addition to text from title-and-abstract, we also consider image captions, an important information source that we integrate into our method. We apply a captions-based classifier to a subset of about 3,300 documents for which the full text of the curated articles is available. The results demonstrate that our proposed approach is robust and effectively addresses GXD document classification. Moreover, using information obtained from image captions clearly improves performance compared to title and abstract alone, affirming the utility of image captions as a substantial evidence source for automatically determining the relevance of biomedical publications to a specific subject area.

Item emiRIT: a text-mining-based resource for microRNA information (Database, 2021-05-28) Roychowdhury, Debarati; Gupta, Samir; Qin, Xihan; Arighi, Cecilia N.; Vijay-Shanker, K.
microRNAs (miRNAs) are essential gene regulators, and their dysregulation often leads to diseases. Easy access to miRNA information is crucial for interpreting generated experimental data, connecting facts across publications and developing new hypotheses built on previous knowledge. Here, we present extracting miRNA Information from Text (emiRIT), a text-mining-based resource, which presents miRNA information mined from the literature through a user-friendly interface. We collected 149,233 miRNA–PubMed ID pairs from Medline between January 1997 and May 2020.
emiRIT currently contains ‘miRNA–gene regulation’ (69,152 relations), ‘miRNA–disease (cancer)’ (12,300 relations), ‘miRNA–biological process and pathways’ (23,390 relations) and circulatory ‘miRNAs in extracellular locations’ (3,782 relations). Biological entities and their relations to miRNAs were extracted from Medline abstracts using publicly available and in-house developed text-mining tools, and the entities were normalized to facilitate querying and integration. We built a database and an interface to store and access the integrated data, respectively. We provide an up-to-date and user-friendly resource to facilitate access to comprehensive miRNA information from the literature on a large scale, enabling users to navigate through different roles of miRNA and examine them in a context specific to their information needs. To assess our resource’s information coverage, we conducted two case studies focusing on the target and differential expression information of miRNAs in the context of cancer, and a third case study to assess the usage of emiRIT in the curation of miRNA information.

Item Generalization of Runoff Risk Prediction at Field Scales to a Continental-Scale Region Using Cluster Analysis and Hybrid Modeling (Geophysical Research Letters, 2022-08-26) Ford, Chanse M.; Hu, Yao; Ghosh, Chirantan; Fry, Lauren M.; Malakpour-Estalaki, Siamak; Mason, Lacey; Fitzpatrick, Lindsay; Mazrooei, Amir; Goering, Dustin C.
As surface water resources in the U.S. continue to be pressured by excess nutrients carried by agricultural runoff, the need to assess runoff risk at the field scale continues to grow in importance. Most landscape hydrologic models developed at regional scales have limited applicability at finer spatial scales. Hybrid models can be used to address the scale mismatch between model simulation and applicability, but could be limited by their ability to generalize over a large domain with heterogeneous hydrologic characteristics.
To assist the generalization, we develop a regionalization approach based on principal component analysis and K-means clustering to identify clusters with similar runoff potential over the Great Lakes region. For each cluster, hybrid models are developed by combining the National Oceanic and Atmospheric Administration's National Water Model with a data-driven model, eXtreme Gradient Boosting, trained on field-scale measurements, enabling prediction of daily runoff risk level at the field scale over the entire region.
Key Points:
- Identify five clusters in the Great Lakes region with similar runoff potential
- Generalize hybrid models developed at field scales to a continental-scale region
- Predict daily runoff risk on a 1 km-by-1 km grid over the entire Great Lakes region
Plain Language Summary: Nutrient loading is an important factor determining water quality in the Great Lakes. Transport of nutrients to surface water is often correlated with runoff, causing detrimental effects to aquatic ecosystems, such as harmful algal blooms. Runoff risk forecasts constituting an early warning system can be used to improve the timing of nutrient application, leading to the dual benefits of reducing nutrient transport to surface water and leaving more nutrients in the field for crop growth. However, measurements of edge-of-field runoff are conducted at the field scale and are sparse over the Great Lakes region, posing a great challenge to developing such a warning system at the continental scale. To address the challenge, we developed a generalization approach that allows predictive models developed using runoff measurements at the field scale to be generalized to large regions with similar hydrogeologic characteristics.
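The regionalization step in this item (principal component analysis to reduce per-cell hydrologic attributes, then K-means to group cells into clusters of similar runoff potential) can be sketched as below. The synthetic cell attributes, attribute count, and cluster count are illustrative assumptions, not the paper's actual inputs:

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's-algorithm K-means; returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # distance from every point to every center, then nearest assignment
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Hypothetical grid cells described by a few hydrologic attributes
# (e.g., slope, soil permeability, land-cover fraction, ...).
cells = np.random.default_rng(42).normal(size=(200, 6))
labels = kmeans(pca(cells, n_components=2), k=5)
print(labels.shape)
```

In the paper's setting, a separate hybrid model would then be fit per cluster rather than one model for the whole heterogeneous domain.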
We can then predict the daily runoff risk level over the entire Great Lakes domain at 1 km-by-1 km resolution, which shows promise as the backbone of an early warning system forecasting daily runoff risk for the contiguous U.S.

Item A house divided: A multilevel bibliometric review of the job search literature 1973–2020 (Journal of Business Research, 2022-07-02) Norder, Kurt; Emich, Kyle; Kanar, Adam; Sawhney, Aman; Behrend, Tara S.
A growing body of research across multiple disciplines has aimed to better understand the phenomenon of job search. However, little empirical research has examined the combined content and structure of the job search literature to accumulate programmatic knowledge. Unfortunately, this has resulted in redundancies and isolated advances that harm our ability to make concrete practical recommendations to aid policy makers, organizations, and broader society. Using bibliometric analysis of 3,197 articles on job search, the present article identifies and describes 10 distinct communities of thought and assesses patterns of integration between these communities. Assessment of community relationships confirms disciplinary divides, but reveals insights into patterns of thought within disciplines, and structural and conceptual relationships between them. Based on these findings, we offer a multilevel conceptual framework to organize the job search literature and suggest possible ways to improve its integration to build a more programmatic understanding of the job search phenomenon.

Item A Hybrid Blockchain-Edge Architecture for Electronic Health Record Management with Attribute-based Cryptographic Mechanisms (IEEE Transactions on Network and Service Management, 2022-06-24) Guo, Hao; Li, Wanxin; Nejad, Mark; Shen, Chien-Chung
This paper presents a hybrid blockchain-edge architecture for managing Electronic Health Records (EHRs) with attribute-based cryptographic mechanisms.
The architecture introduces a novel attribute-based signature aggregation (ABSA) scheme and multi-authority attribute-based encryption (MA-ABE), integrated with Paillier homomorphic encryption (HE), to protect patients’ anonymity and safeguard their EHRs. All EHR activities and access control events are recorded permanently as blockchain transactions. We develop the ABSA module on the Hyperledger Ursa cryptography library, the MA-ABE module on the OpenABE toolset, and the blockchain network on Hyperledger Fabric. We measure the execution time of ABSA’s signing and verification functions and of MA-ABE under different access policies and homomorphic encryption schemes, and compare the results with other existing blockchain-based EHR systems. We validate the access activities and authentication events recorded in blockchain transactions and evaluate the transaction throughput and latency using Hyperledger Caliper. The results show that the performance meets the requirements of real-world scenarios while safeguarding EHRs, and that the system is robust against unauthorized retrievals.

Item Improvements in viral gene annotation using large language models and soft alignments (BMC Bioinformatics, 2024-04-25) Harrigan, William L.; Ferrell, Barbra D.; Wommack, K. Eric; Polson, Shawn W.; Schreiber, Zachary D.; Belcaid, Mahdi
Background: The annotation of protein sequences in public databases has long posed a challenge in molecular biology. This issue is particularly acute for viral proteins, which demonstrate limited homology to known proteins when using alignment, k-mer, or profile-based homology search approaches. A novel methodology employing Large Language Models (LLMs) addresses this methodological challenge by annotating protein sequences based on embeddings.
Results: Central to our contribution is the soft alignment algorithm, drawing from traditional protein alignment but leveraging embedding similarity at the amino acid level to bypass the need for conventional scoring matrices.
This method surpasses pooled embedding-based models not only in efficiency but also in interpretability, enabling users to easily trace homologous amino acids and delve deeper into the alignments. Far from being a black box, our approach provides transparent, BLAST-like alignment visualizations, combining traditional biological research with AI advancements to elevate protein annotation through embedding-based analysis while ensuring interpretability. Tests using the Virus Orthologous Groups and ViralZone protein databases indicated that the novel soft alignment approach recognized and annotated sequences that both blastp and pooling-based methods, which are commonly used for sequence annotation, failed to detect.
Conclusion: The embedding approach shows the great potential of LLMs for enhancing protein sequence annotation, especially in viral genomics. These findings present a promising avenue for more efficient and accurate protein function inference in molecular biology.

Item Improving Inter-Helix Contact Prediction With Local 2D Topological Information (IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2023-05-08) Li, Jiefu; Sawhney, Aman; Lee, Jung-Youn; Liao, Li
Inter-helix contact prediction is the task of identifying residue contacts across different helices in α-helical integral membrane proteins. Despite the progress made by various computational methods, contact prediction remains a challenging task, and to our knowledge no method directly taps into the contact map in an alignment-free manner. We build 2D contact models from an independent dataset to capture the topological patterns in the neighborhood of a residue pair, depending on whether it is a contact or not, and apply the models to the state-of-the-art method's predictions to extract features reflecting 2D inter-helix contact patterns. A secondary classifier is trained on such features.
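A rough sketch of how such local 2D neighborhood features might be read off a predicted contact-score map, together with a naive take on the partial score discretization this item mentions. Both functions are guesses at the general idea, not the authors' implementation; window size and threshold are arbitrary:

```python
import numpy as np

def neighborhood_patch(scores, i, j, k=2):
    """Flatten the (2k+1)x(2k+1) window of predicted contact scores
    around residue pair (i, j), zero-padding at the matrix edges."""
    n, m = scores.shape
    padded = np.zeros((n + 2 * k, m + 2 * k))
    padded[k:k + n, k:k + m] = scores
    return padded[i:i + 2 * k + 1, j:j + 2 * k + 1].ravel()

def partially_discretize(scores, top_fraction=0.1):
    """Snap the highest-scoring fraction of pairs to 1.0 and keep the
    rest as-is, so confident predictions dominate the features."""
    out = scores.copy()
    cutoff = np.quantile(scores, 1 - top_fraction)
    out[scores >= cutoff] = 1.0
    return out

scores = np.random.default_rng(1).uniform(size=(30, 30))
feat = neighborhood_patch(partially_discretize(scores), 10, 12, k=2)
print(feat.shape)  # 25 features for a 5x5 window
```

A secondary classifier, as the item describes, would then be trained on these per-pair feature vectors.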
Realizing that the achievable improvement intrinsically hinges on the quality of the original predictions, we devise a mechanism to deal with this issue by introducing 1) partial discretization of the original prediction scores, to more effectively leverage useful information, and 2) a fuzzy score that assesses the quality of the original prediction, to help select the residue pairs for which improvement is more achievable. The cross-validation results show that the prediction from our method outperforms other methods, including the state-of-the-art method (DeepHelicon), by a notable degree even without the refinement selection scheme. With the refinement selection scheme, our method significantly outperforms the state-of-the-art method on the selected sequences.
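The soft alignment idea from the "Improvements in viral gene annotation" item above (alignment scored by per-residue embedding similarity rather than a fixed substitution matrix) can be sketched as a small dynamic program. The random embeddings, dimensions, and gap penalty below are placeholder assumptions, not the paper's algorithm:

```python
import numpy as np

def soft_align(E1, E2, gap=-0.5):
    """Global, Needleman-Wunsch-style alignment of two residue-embedding
    matrices (length x dim). Cosine similarity between per-residue
    embeddings stands in for a scoring matrix; returns the final score."""
    A = E1 / np.linalg.norm(E1, axis=1, keepdims=True)
    B = E2 / np.linalg.norm(E2, axis=1, keepdims=True)
    sim = A @ B.T                                  # pairwise cosine similarities
    n, m = sim.shape
    S = np.zeros((n + 1, m + 1))
    S[1:, 0] = gap * np.arange(1, n + 1)           # leading gaps
    S[0, 1:] = gap * np.arange(1, m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i, j] = max(S[i - 1, j - 1] + sim[i - 1, j - 1],  # (mis)match
                          S[i - 1, j] + gap,                    # gap in seq 2
                          S[i, j - 1] + gap)                    # gap in seq 1
    return S[n, m]

rng = np.random.default_rng(0)
E1 = rng.normal(size=(8, 16))   # hypothetical per-residue embeddings
E2 = rng.normal(size=(8, 16))   # embeddings of an unrelated sequence
# self-alignment recovers the full diagonal (score = sequence length)
print(soft_align(E1, E1), soft_align(E1, E2))
```

Because each matched pair contributes its cosine similarity, tracing the chosen cells back through S yields the BLAST-like, residue-level correspondences the item highlights as the interpretability benefit.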