About

I am a computational scientist passionate about using machine learning and high-performance computing to answer biological questions at scale. I have a strong foundation in probabilistic modeling, algorithm design, and software engineering, and I am experienced in analyzing large-scale omics datasets. My current research focuses on designing interpretable, multi-modal machine learning algorithms that integrate genomic, transcriptomic, and proteomic blood plasma data to improve our understanding of disease and identify novel biomarkers.

Technical Skills

  • Programming languages: Python, C++, Rust, JavaScript, HTML, CSS, Workflow Description Language, Snakemake.
  • Data analysis and visualization: Pandas, Polars, scikit-learn, TensorFlow, Pyro, NetworkX, Matplotlib, d3.js.
  • Molecular data analysis: Illumina (genomic, metagenomic, transcriptomic), PacBio (genomic), Olink (proteomics).
  • Software: Linux, Git, Google Cloud Platform, PostgreSQL, Adobe Illustrator.

Professional Experience

Broad Institute of MIT and Harvard, Cambridge, MA

  • Computational Scientist II, Data Sciences Platform, Sep 2024 - Current
    • Enabled the characterization of structurally diverse genomic regions in Plasmodium falciparum by redesigning a hidden Markov model for recombination-aware sequence alignment, resulting in 360x faster inference.
    • Developed a data pre-processing and quality control pipeline, curating disease phenotypes from electronic health records for 10,000 participants in a national biobank and assessing the quality of associated genomic, transcriptomic, and proteomic samples.
    • Designing interpretable, multi-modal machine learning models that cluster biobank participants by integrating genomic, transcriptomic, and proteomic blood plasma data, to improve our understanding of disease and identify novel biomarkers.
GitHub - castcollab/tesserae2: Tesserae2: Fast recombination-aware global and local alignment.
  • Computational Associate II, Bacterial Genomics Lab, Nov 2017 - Aug 2024
    • Designed and implemented a new optimal partial order alignment algorithm in Rust that was, on average, 4.1x faster than similar tools, enabling DNA sequence-to-graph alignments that were previously impossible.
    • Led the development of Python and C++-based software specifically designed to track low-abundance (>0.1%) bacterial strains in complex microbial communities (e.g., the human gut microbiome) using whole metagenome sequencing data.
    • Developed a cloud-based pipeline using the Workflow Description Language (WDL) and Docker to enable fast and reproducible characterization of thousands of metagenomic samples.
    • Obtained detailed insight into E. coli strain-level dynamics in a year-long longitudinal microbiome study of women with recurrent urinary tract infections (UTIs), revealing unexpected similarities with a healthy control group and showing that the UTI-causing strain is rarely cleared from the gut after antibiotics.
StrainGE: Strain Genome Explorer
StrainGE is a toolkit for tracking and characterizing low-abundance strains in complex microbial communities. It enables detailed insights into the bacterial strain-level diversity of whole metagenomic sequencing samples.
Gut microbiome dysbiosis linked with recurrent UTIs
More than half of the women in the US get a urinary tract infection (UTI) in their lifetime, which frequently becomes recurrent. In this paper, we investigated the role of the gut microbiome in facilitating recurrence.

DSM-firmenich, Delft, The Netherlands

  • Intern, Jun 2016 - Oct 2016
    • Reviewed literature and proposed a plan to analyze a large-scale protein production problem with Bacillus subtilis using genome-scale metabolic models.

Studio bereikbaar, Rotterdam, The Netherlands

  • Software Engineer, May 2013 - Jun 2016
    • Led the development of web-based geographic information systems (GIS) tools using Django and PostgreSQL, facilitating collaborative project planning and design between engineering colleagues.
    • These tools helped win several multi-million euro infrastructure construction contracts with the government.

Thales, Delft, The Netherlands

  • Intern, Sep 2012 - Nov 2012
    • Developed a GPU-accelerated tool to analyze the radar reflectivity of navy ships using NVIDIA's OptiX CUDA ray tracing library.

Extracurricular Projects

Python Software Foundation (VisPy Google Summer of Code 2015)

  • Implemented a high-performance graph visualization system in Python and OpenGL, including several automatic graph layout algorithms.
  • Contributed the open-source code upstream to VisPy, a high-performance scientific data visualization library.
Drawing arbitrary shapes with OpenGL points
Part of my Google Summer of Code project involves porting several arrow heads from Glumpy to Vispy. I also want to make a slight change to them: the arrow heads in Glumpy include an arrow body, I want to remove that to make sure you can put an arrow head

Delft University of Technology, "Helios 3D team"

  • Built a large 3x3x1 meter 3D RGB LED cube display as part of a student team.
  • Architected and implemented the C++ embedded software responsible for receiving the image to display over Wi-Fi and driving individual LEDs to the correct color.

Education

  • PhD Bioinformatics. Delft University of Technology. 2025.
    • PhD in the bacterial genomics lab at the Broad Institute in collaboration with the bioinformatics lab at TU Delft.
  • MSc Computer Science. Delft University of Technology. 2017.
  • BEng Electrical Engineering. The Hague University of Applied Sciences. 2013.