
Computational prediction and interpretation of cell-specific replication origin sites from multiple eukaryotes by exploiting stacking framework

DC Field | Value | Language
dc.contributor.author | Wei, L | -
dc.contributor.author | He, W | -
dc.contributor.author | Malik, A | -
dc.contributor.author | Su, R | -
dc.contributor.author | Cui, L | -
dc.contributor.author | Manavalan, B | -
dc.date.accessioned | 2023-01-05T03:03:12Z | -
dc.date.available | 2023-01-05T03:03:12Z | -
dc.date.issued | 2021 | -
dc.identifier.issn | 1467-5463 | -
dc.identifier.uri | http://repository.ajou.ac.kr/handle/201003/23635 | -
dc.description.abstract | Origins of replication sites (ORIs), which refer to the initiation sites of genomic DNA replication, play essential roles in the DNA replication process. Detecting the distribution of ORIs at the genome scale is one of the key steps toward an in-depth understanding of their regulatory mechanisms. In this study, we present a novel machine learning-based approach called Stack-ORI, encompassing 10 cell-specific prediction models for identifying ORIs from four different eukaryotic species (Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana). For each cell-specific model, we employed 12 feature encoding schemes that cover nucleic acid composition, position-specific and physicochemical property information. The optimal feature set was identified from each encoding individually, and the respective baseline models were developed using the eXtreme Gradient Boosting (XGBoost) classifier. Subsequently, the predicted scores of the 12 baseline models were integrated as a novel feature vector to train XGBoost and develop the final model. Extensive experimental results show that Stack-ORI achieves significantly better performance than its baseline models on both training and independent datasets. Interestingly, Stack-ORI consistently outperforms the existing predictor in all cell-specific models, not only on the training datasets but also on the independent tests. Moreover, our novel approach provides interpretations that help explain model success by leveraging the powerful SHapley Additive exPlanation algorithm, thus highlighting the feature encoding schemes most important for predicting cell-specific ORIs. | -
dc.language.iso | en | -
dc.subject.MESH | Animals | -
dc.subject.MESH | Databases, Nucleic Acid | -
dc.subject.MESH | Drosophila melanogaster | -
dc.subject.MESH | Humans | -
dc.subject.MESH | Mice | -
dc.subject.MESH | Models, Genetic | -
dc.subject.MESH | Replication Origin | -
dc.subject.MESH | Support Vector Machine | -
dc.subject.MESH | Transcription, Genetic | -
dc.title | Computational prediction and interpretation of cell-specific replication origin sites from multiple eukaryotes by exploiting stacking framework | -
dc.type | Article | -
dc.identifier.pmid | 33152766 | -
dc.subject.keyword | eXtreme Gradient Boosting | -
dc.subject.keyword | feature extraction | -
dc.subject.keyword | model interpretability | -
dc.subject.keyword | origin of replication site | -
dc.subject.keyword | stacking strategy | -
dc.contributor.affiliatedAuthor | Manavalan, B | -
dc.type.local | Journal Papers | -
dc.identifier.doi | 10.1093/bib/bbaa275 | -
dc.citation.title | Briefings in bioinformatics | -
dc.citation.volume | 22 | -
dc.citation.number | 4 | -
dc.citation.date | 2021 | -
dc.citation.startPage | bbaa275 | -
dc.citation.endPage | bbaa275 | -
dc.identifier.bibliographicCitation | Briefings in bioinformatics, 22(4): bbaa275-bbaa275, 2021 | -
dc.embargo.liftdate | 9999-12-31 | -
dc.embargo.terms | 9999-12-31 | -
dc.identifier.eissn | 1477-4054 | -
dc.relation.journalid | J014675463 | -
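
The stacking idea summarized in the abstract (one baseline model per feature encoding, whose predicted scores become the input features of a final model) can be illustrated with a minimal sketch. This is not the authors' implementation: to stay dependency-free it swaps XGBoost for a nearest-centroid scorer at the baseline level and a small logistic regression at the final level, uses only two composition encodings instead of twelve, and trains on synthetic sequences rather than real ORI data. The use of out-of-fold scores to train the final model is also an assumption; the abstract does not say how the scores were produced. All function names are hypothetical.

```python
import math
import random
from itertools import product


def kmer_freq(seq, k):
    """Nucleic acid composition encoding: normalized k-mer frequencies."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = max(len(seq) - k + 1, 1)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers with non-ACGT symbols
            counts[kmer] += 1
    return [c / total for c in counts.values()]


# Two illustrative encodings (the paper uses 12; these names are assumptions).
ENCODINGS = {"mono": lambda s: kmer_freq(s, 1), "di": lambda s: kmer_freq(s, 2)}


def fit_centroids(X, y):
    """Baseline learner (stand-in for XGBoost): one centroid per class."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean([x for x, t in zip(X, y) if t == 1])
    neg = mean([x for x, t in zip(X, y) if t == 0])
    return pos, neg


def centroid_score(model, x):
    """Score in [0, 1]; higher means closer to the positive centroid."""
    d_pos, d_neg = math.dist(x, model[0]), math.dist(x, model[1])
    return 0.5 if d_pos + d_neg == 0 else d_neg / (d_pos + d_neg)


def predict_logistic(w, z):
    s = sum(wj * v for wj, v in zip(w, list(z) + [1.0]))  # last weight is bias
    return 1.0 / (1.0 + math.exp(-s))


def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Final learner (stand-in for XGBoost): batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = predict_logistic(w, z) - t
            for j, v in enumerate(list(z) + [1.0]):
                grad[j] += err * v
        w = [wj - lr * g / len(Z) for wj, g in zip(w, grad)]
    return w


def fit_stack(seqs, y, n_folds=5):
    """Train one baseline per encoding, then a final model on their scores."""
    n = len(seqs)
    Z = [[0.0] * len(ENCODINGS) for _ in range(n)]
    base_models = []
    for j, encode in enumerate(ENCODINGS.values()):
        X = [encode(s) for s in seqs]
        for f in range(n_folds):  # out-of-fold scores avoid label leakage
            train = [i for i in range(n) if i % n_folds != f]
            model = fit_centroids([X[i] for i in train], [y[i] for i in train])
            for i in range(f, n, n_folds):
                Z[i][j] = centroid_score(model, X[i])
        base_models.append(fit_centroids(X, y))  # refit on all data
    return base_models, fit_logistic(Z, y)


def predict_stack(stack, seq):
    base_models, meta = stack
    z = [centroid_score(m, enc(seq))
         for m, enc in zip(base_models, ENCODINGS.values())]
    return predict_logistic(meta, z)


def toy_seq(rng, at_bias, length=60):
    """Synthetic sequence (NOT real ORI data) with a tunable A/T bias."""
    weights = [at_bias, 1 - at_bias, 1 - at_bias, at_bias]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


rng = random.Random(0)
seqs = [toy_seq(rng, 0.7) for _ in range(40)] + [toy_seq(rng, 0.3) for _ in range(40)]
labels = [1] * 40 + [0] * 40
stack = fit_stack(seqs, labels)
print(predict_stack(stack, "AT" * 30), predict_stack(stack, "GC" * 30))
```

In this toy setup the AT-rich query should receive the higher score, because the synthetic positives were generated with an A/T bias. The final model's weights give only a crude hint of which encoding contributes most; they are not a substitute for the SHapley Additive exPlanation analysis the abstract describes.
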
Appears in Collections:
Journal Papers > School of Medicine / Graduate School of Medicine > Physiology
Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
