I am a scientist and research software engineer with 7+ years of experience, passionate about developing high-performance tools and machine learning systems to accelerate research. My tools and algorithms have yielded new insights into human disease and enabled genome analyses that were previously impossible.
Areas of Expertise
- Languages & Frameworks. Python, C++, Rust, JavaScript, scikit-learn, TensorFlow, JAX, Pandas, NumPy
- Bioinformatics. Sequence Alignment, Variant Calling, Pangenomics, Genome Assembly, Phylogenetics
- Machine Learning. Deep Learning, Hidden Markov Models, Logistic Regression, Linear Models
- Software Engineering. Git, Docker, CI/CD, Unit Testing, Google Cloud Platform, PostgreSQL, Linux
Professional Experience
Pacific Biosciences, Remote
- Senior Software Engineer, Bioinformatics, Nov 2025 - Current
Broad Institute of MIT and Harvard, Cambridge, MA
- Computational Scientist II, Data Sciences Platform, Sep 2024 - Jul 2025
- Developed a Google Cloud-based data pre-processing pipeline, curating disease phenotypes from electronic health records (EHR) for 10,000 participants in the “All of Us” biobank and assessing the quality of associated terabyte-scale next-generation sequencing (NGS) data.
- Led the design of interpretable machine learning models integrating genomic (PacBio HiFi, Illumina), transcriptomic, and proteomic data to predict disease onset and identify novel biomarkers.
- Redesigned and optimized an implementation of a hidden Markov model for recombination-aware DNA sequence alignment using Cython and C++, resulting in 360x faster inference.
Tesserae2: Fast recombination-aware global and local alignment (GitHub: castcollab/tesserae2)
- Computational Associate II, Bacterial Genomics Lab, Nov 2017 - Aug 2024
- Designed and implemented a new optimal partial order alignment algorithm in Rust, which was, on average, 4.1x faster than similar tools, enabling DNA sequence-to-graph alignments previously impossible.
- Led the development of Python- and C++-based software to track low-abundance (>0.1%) bacterial strains in complex microbial communities (e.g., the human gut microbiome) using whole metagenome sequencing data.
- Developed a cloud-based pipeline using the Workflow Description Language (WDL) and Docker to enable fast and reproducible characterization of thousands of metagenomic samples.
- Obtained detailed insight into E. coli strain-level dynamics in a year-long longitudinal microbiome study of women with recurrent urinary tract infections (UTIs), revealing unexpected similarities with a healthy control group and showing that the UTI-causing strain is rarely cleared from the gut after antibiotics.
StrainGE: Strain Genome Explorer
StrainGE is a toolkit for tracking and characterizing low-abundance strains in complex microbial communities. It enables detailed insights into the bacterial strain-level diversity of whole metagenomic sequencing samples.

Gut microbiome dysbiosis linked with recurrent UTIs
More than half of the women in the US get a urinary tract infection (UTI) in their lifetime, which frequently becomes recurrent. In this paper, we investigated the role of the gut microbiome in facilitating recurrence.

DSM-firmenich, Delft, The Netherlands
- Intern, Jun 2016 - Oct 2016
- Reviewed literature and proposed a plan to analyze a large-scale protein production problem with Bacillus subtilis using genome-scale metabolic models.
Studio bereikbaar, Rotterdam, The Netherlands
- Software Engineer, May 2013 - Jun 2016
- Led the development of web-based geographic information systems (GIS) tools using Django and PostgreSQL, securing multi-million euro infrastructure contracts with the government through improved collaborative project planning and design.
- Enabled local analysis and modification of GIS data in the open-source desktop application QGIS by developing a custom Python-based plugin, communicating with a central server through a REST API.
Thales, Delft, The Netherlands
- Intern, Sep 2012 - Nov 2012
- Developed a GPU-accelerated tool to analyze the radar reflectivity of navy ships using NVIDIA's OptiX CUDA ray tracing library.
Extracurricular Projects
Python Software Foundation (VisPy Google Summer of Code 2015)
- Implemented a high-performance graph visualization system in Python and OpenGL, including several automatic graph layout algorithms.
- Contributed the open-source code upstream to VisPy, a high-performance scientific data visualization library.
Drawing arbitrary shapes with OpenGL points
Part of my Google Summer of Code project involves porting several arrow heads from Glumpy to Vispy. I also want to make a slight change to them: the arrow heads in Glumpy include an arrow body, I want to remove that to make sure you can put an arrow head

Delft University of Technology, "Helios 3D team"
- Built a large 3x3x1 meter 3D RGB LED cube display as part of a student team.
- Architected and implemented the C++ embedded software responsible for receiving the image to display over Wi-Fi and driving individual LEDs to the correct color.
Education
- PhD Bioinformatics. Delft University of Technology. 2025.
- Research conducted at the Broad Institute’s Bacterial Genomics Lab in collaboration with the Delft Bioinformatics Lab. Courses: Immunology, Bayesian Methods for Machine Learning, Deep Learning.
- MSc Computer Science. Delft University of Technology. 2017.
- BEng Electrical Engineering. The Hague University of Applied Sciences. 2013.