Transformer-Based Gene Scoring Model for Extracting Representative Characteristic of Central Dogma Process to Prioritize Pathogenic Genes Applying Breast Cancer Multi-omics Data

DC Field | Value | Language
dc.contributor.author | Jhee, JH | -
dc.contributor.author | Song, MY | -
dc.contributor.author | Kim, BG | -
dc.contributor.author | Shin, H | -
dc.contributor.author | Lee, SY | -
dc.date.accessioned | 2023-06-14T02:52:26Z | -
dc.date.available | 2023-06-14T02:52:26Z | -
dc.date.issued | 2023 | -
dc.identifier.uri | http://repository.ajou.ac.kr/handle/201003/25926 | -
dc.description.abstract | Deep learning approaches that use large multi-omics datasets from cancer patients are increasingly being applied to identify biomarkers across diverse cancer types. Because multi-omics data are generally high-dimensional relative to the small number of patient samples, this imbalance is a recognized bottleneck to applying the integrated characteristics of multi-omics data in cancer research. Among dimensionality reduction techniques, deep learning-based approaches such as autoencoders are known to handle high-dimensional data with few samples well. However, their black-box nature makes it difficult to explain which genes are essential. In this study, we develop a transformer-based representative Central tendency Gene score considering Central Dogma process information (CGCD) model to predict optimized potential anti-breast cancer therapeutic target genes. The model is built on a unified representation of compressed features that a Transformer learns from the multi-omics data of 105 breast cancer patients in The Cancer Genome Atlas (TCGA). Unlike other autoencoder-based models, CGCD can derive gene scores from the self-attention mechanism of the transformer. Significant encoding genes were selected by computing a p-value for each gene from its scores across all patients. To verify the ability of the CGCD score to predict target genes, we estimated the hazard ratio and p-value of each gene through survival analysis with the Cox proportional hazards model, calculated the area under the curve (AUC) using the CGCD score and the per-patient p-value, and performed biological functional analyses including Gene Set Enrichment Analysis (GSEA). As the CGCD score increased, the retention rate of breast cancer marker genes and pathways showed a pronounced upward trend. These results suggest that the CGCD score, which reflects the harmony of multi-omics data within a gene, is a suitable criterion for predicting cancer diagnostic markers. | -
dc.language.iso | en | -
dc.title | Transformer-Based Gene Scoring Model for Extracting Representative Characteristic of Central Dogma Process to Prioritize Pathogenic Genes Applying Breast Cancer Multi-omics Data | -
dc.type | Article | -
dc.subject.keyword | breast cancer | -
dc.subject.keyword | data integration | -
dc.subject.keyword | deep learning | -
dc.subject.keyword | gene scoring | -
dc.subject.keyword | Multi-omics | -
dc.contributor.affiliatedAuthor | Kim, BG | -
dc.contributor.affiliatedAuthor | Lee, SY | -
dc.type.local | Journal Papers | -
dc.identifier.doi | 10.1109/BigComp57234.2023.00033 | -
dc.citation.title | 2023 IEEE International Conference on Big Data and Smart Computing (BigComp) | -
dc.citation.date | 2023 | -
dc.citation.startPage | 149 | -
dc.citation.endPage | 154 | -
dc.identifier.bibliographicCitation | 2023 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 149-154, 2023 | -
dc.embargo.liftdate | 9999-12-31 | -
dc.embargo.terms | 9999-12-31 | -
dc.relation.journalid | JJ00000008 | -
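
The abstract states that CGCD derives gene scores from the transformer's self-attention mechanism but does not give the scoring formula. The Python sketch below illustrates one way gene-level scores can be read off an attention matrix, under loudly stated assumptions: each gene is a token with a precomputed embedding, there is a single head with no learned query/key projections (Q = K = the embeddings), and a gene's score is the mean attention it receives from all genes, averaged over patients. The function names (`gene_attention_scores`, `cohort_gene_scores`) and the column-mean rule are hypothetical and are not the authors' method; the embeddings are random.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(tokens):
    """Single-head scaled dot-product attention weights, softmax(Q K^T / sqrt(d)).

    Simplifying assumption: Q = K = tokens (no learned projection matrices).
    `tokens` holds one embedding vector (length d) per gene.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = []
    for q in tokens:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in tokens]
        weights.append(softmax(logits))  # each row sums to 1
    return weights


def gene_attention_scores(tokens):
    """Hypothetical gene score: mean attention each gene receives from all genes.

    Because every row of the attention matrix sums to 1, the scores sum to 1.
    """
    w = self_attention_weights(tokens)
    n = len(w)
    return [sum(w[i][j] for i in range(n)) / n for j in range(n)]


def cohort_gene_scores(patients):
    """Average the per-patient gene scores over a cohort (hypothetical aggregation)."""
    per_patient = [gene_attention_scores(tokens) for tokens in patients]
    n_genes = len(per_patient[0])
    return [sum(p[j] for p in per_patient) / len(per_patient) for j in range(n_genes)]


if __name__ == "__main__":
    random.seed(0)
    genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # placeholder names
    # Random 8-dimensional embeddings for 3 synthetic patients (not real data).
    patients = [
        [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in genes]
        for _ in range(3)
    ]
    scores = cohort_gene_scores(patients)
    for name, s in sorted(zip(genes, scores), key=lambda p: p[1], reverse=True):
        print(f"{name}: {s:.3f}")
```

Because each attention row is a softmax, every patient's score vector sums to 1, which keeps scores comparable across patients before averaging. A real model would use learned projections and multiple heads or layers. The abstract computes a per-gene p-value from the scores across all patients but does not name the test, so that step is left out here.
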
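
The abstract also reports an AUC computed with the CGCD score but does not state what the positive class is. Assuming binary labels (hypothetically, 1 for a gene in a reference list of known breast cancer markers and 0 otherwise), the AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half; this is the Mann-Whitney formulation. The sketch computes it directly, and the scores and labels are illustrative only.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    labels: truthy for the positive class (e.g. a hypothetical marker-gene flag).
    Runs in O(n_pos * n_neg), which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Synthetic gene scores and hypothetical marker labels.
    gene_scores = [0.91, 0.40, 0.62, 0.18, 0.77, 0.05]
    is_marker = [1, 1, 0, 0, 1, 0]
    print(roc_auc(gene_scores, is_marker))
```

The pairwise loop is quadratic; for genome-scale gene lists a rank-based computation gives the same value in O(n log n).
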
Appears in Collections:
Journal Papers > School of Medicine / Graduate School of Medicine > Brain Science
Journal Papers > Research Organization > KIURI
Files in This Item:
There are no files associated with this item.