This project aims to improve the accuracy of cell clustering by integrating two omics modalities, gene expression and protein expression, in a single analytical framework. Building on sciPENN and scGNN, we adapted the scGNN model to accept two distinct inputs: one for gene expression and one for protein expression. This multi-omics approach draws on both data types to improve cell clustering.
Our method uses the graph neural network architecture of scGNN, originally designed for single-cell RNA-seq data, and extends it to incorporate protein expression data predicted by the sciPENN model. The two inputs are trained separately and merged at the graph embedding stage, so that the distinct biological signals in the gene and protein data are preserved before they are integrated. The goal is to overcome the limitations of single-modality analysis and to identify cell clusters more robustly and precisely in complex biological datasets.
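
The merge step can be illustrated with a minimal sketch. It is not the project's implementation: the text above does not say how the two embeddings are combined, so the per-dimension standardization and the concatenation used here are assumptions, and the names `standardize` and `fuse_embeddings` are hypothetical. Each embedding is a plain list of rows, one row per cell, standing in for the output of a separately trained gene or protein branch.

```python
from math import sqrt


def standardize(embedding):
    """Z-score every dimension of a cells x dims embedding (list of rows)."""
    n_cells = len(embedding)
    n_dims = len(embedding[0])
    columns = []
    for j in range(n_dims):
        col = [row[j] for row in embedding]
        mean = sum(col) / n_cells
        var = sum((v - mean) ** 2 for v in col) / n_cells
        std = sqrt(var) or 1.0  # constant dimension: avoid division by zero
        columns.append([(v - mean) / std for v in col])
    # Transpose back to one row per cell.
    return [[columns[j][i] for j in range(n_dims)] for i in range(n_cells)]


def fuse_embeddings(gene_emb, protein_emb):
    """Concatenate per-cell gene and protein embeddings (assumed merge rule).

    Both inputs must describe the same cells in the same order. Each
    modality is standardized first so that neither dominates the fused
    vector only because of its numeric scale.
    """
    if len(gene_emb) != len(protein_emb):
        raise ValueError("gene and protein embeddings must cover the same cells")
    gene_z = standardize(gene_emb)
    protein_z = standardize(protein_emb)
    return [g_row + p_row for g_row, p_row in zip(gene_z, protein_z)]


# Toy input: 2 cells, a 2-dimensional gene embedding and a
# 1-dimensional protein embedding (made-up numbers for illustration).
gene_emb = [[1.0, 2.0], [3.0, 2.0]]
protein_emb = [[10.0], [30.0]]
fused = fuse_embeddings(gene_emb, protein_emb)
print(len(fused), len(fused[0]))
```

The fused matrix has one row per cell and as many columns as the two embeddings combined, and it is what a downstream clustering step would consume. Other merge rules, such as a weighted sum or a learned fusion layer, would fit the same description.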