M.S. Scientific Computing (completed) · Ph.D. Computer Science (in progress) @ Georgia State University. I build machine learning systems for genomic surveillance, epidemiological modeling, and health informatics — where data meets real-world impact.
Graduate researcher and ML engineer passionate about building computational tools that improve human health outcomes.
I am a Ph.D. researcher in Computer Science at Georgia State University (GPA 3.67), working at the intersection of machine learning, bioinformatics, and public health. I hold an M.S. in Scientific Computing (GPA 3.61) and a B.S. in Statistics (First Class Honors, KNUST Ghana). My doctoral research focuses on unsupervised genomic representation learning, probabilistic community structure inference, and epidemiological excess burden modeling.
I bring a strong mathematical foundation to every project, from variational EM formulations for stochastic block models to masked reconstruction losses in VQ-VAEs trained across 4× GPU clusters.
Beyond research, I am passionate about broadening participation in computing, having taught and mentored students from middle school through university level across three countries.
Proposed MaskedVQ-Seq — a masked vector-quantized variational autoencoder that learns discrete codebook representations of genomic sequences fully unsupervised. Benchmarked against 7 methods including DNABERT-2 and Transformer VAEs on 1M SARS-CoV-2 wastewater reads. Ablation studies across 35 configurations (codebook size, k-mer size, masking probability, entropy bonus) reveal key design principles for genomic discrete representation learning.
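As a rough illustration of the three ingredients named above (k-mer tokens, random masking, and a discrete codebook lookup), here is a minimal pure-Python sketch. It is not the MaskedVQ-Seq implementation: the stride-1 overlapping k-mers, the `[MASK]` token, and the helper names `kmer_tokens`, `mask_tokens`, and `quantize` are all assumptions made for illustration.

```python
import random

def kmer_tokens(seq, k):
    """Split a sequence into overlapping k-mers (stride 1 is an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def mask_tokens(tokens, mask_prob, rng, mask_token="[MASK]"):
    """Mask each token independently with probability mask_prob.

    Returns the masked token list and the positions that were masked,
    which a reconstruction loss would then be evaluated on.
    """
    masked, positions = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            positions.append(i)
        else:
            masked.append(tok)
    return masked, positions

def quantize(vector, codebook):
    """Return the index of the nearest codebook vector (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda j: sqdist(vector, codebook[j]))

# Hypothetical toy inputs, not data from the project.
rng = random.Random(0)
tokens = kmer_tokens("ACGTAC", 3)
masked, positions = mask_tokens(tokens, 0.3, rng)
code = quantize([0.9, 0.1], [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

In a trained model the vector passed to `quantize` would come from an encoder, and the codebook size, k-mer size, and masking probability are exactly the knobs the ablation varies.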
Under Peer Review · Applied SBM and Spectral-GMM to NCBI wastewater metagenomic data. Built full pipeline: FASTA → Kraken2 → Bracken → CLR normalization → Spearman correlation graph → probabilistic community detection. Identified human gut, pathogen-enriched, and environmental taxa communities with biological validation.
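The CLR normalization and Spearman correlation steps of the pipeline above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the project's code: the pseudocount of 0.5 for zero counts and the function names are hypothetical, and the Kraken2/Bracken stages are assumed to have already produced per-sample taxon counts.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumption) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

Applying `clr` to each sample and then `spearman` to each pair of taxa across samples gives the edge weights of a correlation graph; how edges are thresholded before community detection is a separate design choice not shown here.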
Fitted 2-sub-epidemic logistic model to US congenital syphilis surveillance data (2010–2023, AICc = 138.50). Estimated 3,012 excess cases (~31% above pre-pandemic baseline) across 2020–2023. State-level OLS regression identified maternal syphilis rate as strongest predictor of excess burden (r = 0.425, p = 0.005).
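For readers unfamiliar with the model class, the sketch below shows one common way to write a two-sub-epidemic logistic curve and a small-sample corrected AIC. Both are assumptions for illustration: the curve is taken as the sum of two standard logistic terms, and the AICc uses the least-squares form n·ln(SSE/n) + 2k + 2k(k+1)/(n−k−1). The source does not state which exact parameterization or likelihood produced the reported value, and the function names are hypothetical.

```python
import math

def logistic(t, K, r, t0):
    """Logistic curve with final size K, growth rate r, and midpoint t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def two_subepidemic(t, params):
    """Sum of two logistic sub-epidemics; params = [(K1, r1, t1), (K2, r2, t2)]."""
    (K1, r1, t1), (K2, r2, t2) = params
    return logistic(t, K1, r1, t1) + logistic(t, K2, r2, t2)

def aicc_least_squares(sse, n, k):
    """AICc for a least-squares fit with n observations and k parameters."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return n * math.log(sse / n) + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
```

With 14 annual observations (2010–2023) and two three-parameter curves, n = 14 and k = 6 would be plausible inputs, though the actual parameter count used in the analysis is not given here.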
Applied Louvain method to functional brain connectivity data across temporal windows. Developed label standardization logic to consistently distinguish control-dominated vs. schizophrenia-dominated communities across varying numbers of detected communities.
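One simple way to make community labels comparable across runs with different numbers of detected communities is to name each community by its majority group. The sketch below assumes communities are sets of subjects and that ties are labeled "mixed"; both choices, and the function name `label_communities`, are illustrative assumptions rather than the project's actual logic.

```python
from collections import Counter, defaultdict

def label_communities(assignments, groups):
    """Label each community by the group that makes up most of its members.

    assignments: community id per subject (e.g. Louvain output)
    groups: "control" or "schizophrenia" per subject, in the same order
    """
    members = defaultdict(Counter)
    for community, group in zip(assignments, groups):
        members[community][group] += 1

    labels = {}
    for community, counts in members.items():
        control = counts.get("control", 0)
        patient = counts.get("schizophrenia", 0)
        if control > patient:
            labels[community] = "control-dominated"
        elif patient > control:
            labels[community] = "schizophrenia-dominated"
        else:
            labels[community] = "mixed"  # tie handling is an assumption
    return labels
```

Because the label depends only on membership, it stays stable when the raw community ids are permuted or when a window yields more or fewer communities.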
Implemented and compared Entropy Minimization, Pseudo-Labeling, and Virtual Adversarial Training on MNIST and the Two-Moon toy dataset. Proposed a novel noisy Pseudo-Label variant that improved classification on unlabeled data beyond baseline methods.
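Two of the building blocks named above can be shown in a few lines: the prediction entropy that Entropy Minimization penalizes, and confidence-thresholded pseudo-label selection. This is a generic sketch of the standard techniques, not the noisy variant described in the project; the 0.95 threshold and function names are assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def pseudo_labels(prob_rows, threshold=0.95):
    """Select unlabeled examples whose top class probability meets the threshold.

    Returns a list of (example_index, predicted_label) pairs.
    """
    selected = []
    for i, probs in enumerate(prob_rows):
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((i, label))
    return selected

# Hypothetical model outputs for three unlabeled examples.
predictions = [[0.97, 0.03], [0.60, 0.40], [0.02, 0.98]]
confident = pseudo_labels(predictions)
```

Here the second example is skipped because its top probability falls below the threshold; the selected pairs would be added to the labeled set for the next training round.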
Comprehensive review of 7 GAN variants (DCGAN, cGAN, WGAN, LSGAN, CycleGAN, CoGAN, InfoGAN) covering mathematical formulations, KL/JS divergence relationships, minimax optimization theory, and domain-specific applications including image synthesis, NLP, and 3D modeling.
Built dense layers, ReLU, and Softmax activations from scratch in NumPy. Explored forward pass mechanics, activation behavior, and multi-layer composition as mathematical foundations for deeper architecture study.
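A minimal sketch of what such a from-scratch forward pass can look like in NumPy is shown below. The class and function names, the 0.01 weight scale, and the layer sizes are illustrative assumptions, not the project's actual code.

```python
import numpy as np

class Dense:
    """Fully connected layer: output = inputs @ weights + biases."""

    def __init__(self, n_inputs, n_neurons, rng):
        # Small random weights and zero biases (initialization is an assumption).
        self.weights = 0.01 * rng.standard_normal((n_inputs, n_neurons))
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        return inputs @ self.weights + self.biases

def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0.0, x)

def softmax(x):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Two-layer composition on a hypothetical batch of two 2-feature samples.
rng = np.random.default_rng(0)
layer1 = Dense(2, 4, rng)
layer2 = Dense(4, 3, rng)
batch = np.array([[1.0, -2.0], [0.5, 0.3]])
probs = softmax(layer2.forward(relu(layer1.forward(batch))))
```

Each row of `probs` is a distribution over three classes, which makes the composition easy to sanity-check before adding a loss or backward pass.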
Large-scale dataset of COVID-19 related tweets spanning 2020–2023. Ongoing research into NLP-based public health surveillance — sentiment trends, misinformation spread patterns, and geographic health signal extraction from social media at scale.
Advanced Algorithms in Bioinformatics · Machine Learning · Applied Mathematics · Database Systems · Epidemiological Modeling
Kwame Nkrumah University of Science & Technology, Ghana · GPA: 3.32
I'm actively seeking research collaborations, GRA positions, and opportunities in ML for health informatics and bioinformatics.