About

I am a scientist and research software engineer with 7+ years of experience, passionate about developing high-performance tools and machine learning systems to accelerate research. My tools and algorithms have yielded new insights into human disease and enabled genome analyses that were previously impossible.

Areas of Expertise

  • Languages & Frameworks. Python, C++, Rust, JavaScript, scikit-learn, TensorFlow, JAX, Pandas, NumPy
  • Bioinformatics. Sequence Alignment, Variant Calling, Pangenomics, Genome Assembly, Phylogenetics
  • Machine Learning. Deep Learning, Hidden Markov Models, Logistic Regression, Linear Models
  • Software Engineering. Git, Docker, CI/CD, Unit Testing, Google Cloud Platform, PostgreSQL, Linux

Professional Experience

Pacific Biosciences, Remote

  • Senior Software Engineer, Bioinformatics, Nov 2025 - Current

Broad Institute of MIT and Harvard, Cambridge, MA

  • Computational Scientist II, Data Sciences Platform, Sep 2024 - Jul 2025
    • Developed a Google Cloud-based data pre-processing pipeline, curating disease phenotypes from electronic health records (EHR) for 10,000 participants in the “All of Us” biobank and assessing the quality of associated terabyte-scale next-generation sequencing (NGS) data.
    • Led the design of interpretable machine learning models integrating genomic (PacBio HiFi, Illumina), transcriptomic, and proteomic data to predict disease onset and identify novel biomarkers.
    • Redesigned and optimized an implementation of a hidden Markov model for recombination-aware DNA sequence alignment using Cython and C++, resulting in 360x faster inference.
Tesserae2: Fast recombination-aware global and local alignment (GitHub: castcollab/tesserae2).
  • Computational Associate II, Bacterial Genomics Lab, Nov 2017 - Aug 2024
    • Designed and implemented a new optimal partial order alignment algorithm in Rust, which was, on average, 4.1x faster than similar tools, enabling DNA sequence-to-graph alignments previously impossible.
    • Led the development of Python and C++-based software specifically designed to track low-abundance (>0.1%) bacterial strains in complex microbial communities (e.g., the human gut microbiome) using whole metagenome sequencing data.
    • Developed a cloud-based pipeline using the Workflow Description Language (WDL) and Docker to enable fast and reproducible characterization of thousands of metagenomic samples.
    • Obtained detailed insight into E. coli strain-level dynamics in a year-long longitudinal microbiome study of women with recurrent urinary tract infections (UTIs), revealing unexpected similarities with a healthy control group and showing that the UTI-causing strain is rarely cleared from the gut after antibiotics.
StrainGE: Strain Genome Explorer
StrainGE is a toolkit for tracking and characterizing low-abundance strains in complex microbial communities. It enables detailed insights into the bacterial strain-level diversity of whole metagenomic sequencing samples.
Gut microbiome dysbiosis linked with recurrent UTIs
More than half of the women in the US get a urinary tract infection (UTI) in their lifetime, which frequently becomes recurrent. In this paper, we investigated the role of the gut microbiome in facilitating recurrence.

DSM-firmenich, Delft, The Netherlands

  • Intern, Jun 2016 - Oct 2016
    • Reviewed literature and proposed a plan to analyze a large-scale protein production problem with Bacillus subtilis using genome-scale metabolic models.

Studio bereikbaar, Rotterdam, The Netherlands

  • Software Engineer, May 2013 - Jun 2016
    • Led the development of web-based geographic information systems (GIS) tools using Django and PostgreSQL, securing multi-million euro infrastructure contracts with the government through improved collaborative project planning and design.
    • Enabled local analysis and modification of GIS data in the open-source desktop application QGIS by developing a custom Python-based plugin, communicating with a central server through a REST API.

Thales, Delft, The Netherlands

  • Intern, Sep 2012 - Nov 2012
    • Developed a GPU-accelerated tool to analyze the radar reflectivity of navy ships using NVIDIA's OptiX CUDA ray tracing library.

Extracurricular Projects

Python Software Foundation (VisPy Google Summer of Code 2015)

  • Implemented a high-performance graph visualization system in Python and OpenGL, including several automatic graph layout algorithms.
  • Contributed the open-source code upstream to VisPy, a high-performance scientific data visualization library.
Drawing arbitrary shapes with OpenGL points
Part of my Google Summer of Code project involves porting several arrow heads from Glumpy to Vispy. I also want to make a slight change to them: the arrow heads in Glumpy include an arrow body, I want to remove that to make sure you can put an arrow head
Home — VisPy

Delft University of Technology, "Helios 3D team"

  • Built a large 3x3x1 meter 3D RGB LED cube display as part of a student team.
  • Architected and implemented the C++ embedded software responsible for receiving the image to display over Wi-Fi and driving individual LEDs to the correct color.

Education

  • PhD Bioinformatics. Delft University of Technology. 2025.
    • Research conducted at Broad Institute’s Bacterial Genomics Lab in collaboration with the Delft Bioinformatics Lab. Courses: Immunology, Bayesian Methods for Machine Learning, Deep Learning
  • MSc Computer Science. Delft University of Technology. 2017.
  • BEng Electrical Engineering. The Hague University of Applied Sciences. 2013.