
Mar 7, 2022

StrainGE: Strain Genome Explorer

StrainGE is a toolkit for tracking and characterizing low-abundance strains in complex microbial communities. It provides detailed insight into the strain-level bacterial diversity of whole-metagenome sequencing samples.

We humans each carry thousands of bacterial species. They inhabit our skin, nose, gut, and several other sites. Many species protect us from pathogens, help digest food, or train our immune system. Other species, however, can cause life-threatening infections.

Even within the same bacterial species, enormous differences in pathogenicity exist. For example, almost every human harbors the bacterium Escherichia coli in their gut microbiome without any issues. Some E. coli strains, however, can cause severe diarrhea or urinary tract infections.

Because of this phenotypic diversity, it is important to know which specific strains are present when analyzing microbial communities. Improved insights into the strain-level diversity of complex microbial communities will strengthen our understanding of their role in human health. For example, by comparing the genomes of strains, we could identify genetic factors that distinguish pathogenic from non-pathogenic strains.

This is where StrainGE comes in. StrainGE is specifically designed to identify and characterize low-abundance strains using metagenomic sequencing data from a community. Metagenomic sequencing data contains DNA fragments from all community members, so reads from a low-abundance strain are vastly outnumbered by reads from everything else, which makes such strains hard to characterize. For example, E. coli typically represents only 1% of a healthy human gut microbiome; thus, only a tiny fraction of sequenced reads will originate from E. coli strains.
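
To make the scale of the problem concrete, here is a small back-of-the-envelope sketch. The sequencing yield (3 Gbp) and the genome size (5 Mbp, a typical order of magnitude for E. coli) are illustrative assumptions, not numbers from the study, and the calculation treats relative abundance as the fraction of sequenced bases. Under those assumptions, a species at 1% abundance ends up with roughly 6x coverage in total, and that coverage is shared among all strains of that species.

```python
def expected_coverage(total_bases: float, relative_abundance: float,
                      genome_size: float) -> float:
    """Average depth of coverage expected for one organism in a metagenome.

    Assumes relative_abundance is the fraction of sequenced bases that
    originate from the organism (a simplification).
    """
    return total_bases * relative_abundance / genome_size


if __name__ == "__main__":
    # Hypothetical sample: 3 Gbp of reads, species at 1%, 5 Mbp genome.
    depth = expected_coverage(3e9, 0.01, 5e6)
    print(f"Expected coverage: {depth:.1f}x")
```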

StrainGE overcomes this challenge by comparing the read data to a database of reference genomes, such as genomes available in public databases. It first detects which reads in the sample likely originate from the species of interest and then compares those reads to the references in the database, reporting the references that look the most similar to the strain(s) in the sample. If multiple strains of the same species are present, it will report multiple reference genomes.
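
The idea of ranking references by similarity to the sample can be illustrated with a toy example. The sketch below scores each reference by the fraction of its k-mers that also occur in the reads. This is a generic k-mer containment score chosen for illustration, not a description of StrainGE's actual implementation; the function names, the tiny k, and the toy sequences are all hypothetical, and reverse complements and sequencing errors are ignored.

```python
from typing import Dict, Iterable, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_references(reads: Iterable[str], references: Dict[str, str],
                    k: int = 5) -> List[Tuple[str, float]]:
    """Rank references by the fraction of their k-mers seen in the reads."""
    sample_kmers: Set[str] = set()
    for read in reads:
        sample_kmers |= kmers(read, k)

    scores = []
    for name, genome in references.items():
        ref_kmers = kmers(genome, k)
        shared = len(ref_kmers & sample_kmers)
        fraction = shared / len(ref_kmers) if ref_kmers else 0.0
        scores.append((name, fraction))

    # Most similar reference first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy database and reads; the reads were drawn from "strainA".
    references = {
        "strainA": "ACGTACGGTCATGCA",
        "strainB": "TTGACCGATAGGCTA",
    }
    reads = ["ACGTACGG", "CGGTCATGCA"]
    for name, fraction in rank_references(reads, references):
        print(name, round(fraction, 2))
```

A real tool would use much longer k-mers, handle both DNA strands, and account for the fact that closely related references share most of their k-mers.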

StrainGE's reported references serve as a basis for further, more detailed characterization of the sample strains. Since the reported references are unlikely to be identical to the strains in the sample, StrainGE identifies strain-specific genetic variants by mapping sample reads to the references. It then analyzes the read alignment pileups for evidence of alleles that differ from the reference.
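
As a rough illustration of what "analyzing a pileup" means, the sketch below stacks aligned reads on top of a reference, counts the bases observed at each position, and reports positions where the dominant base differs from the reference. It is a deliberately simplified model and not StrainGE's variant caller: alignments are assumed to be ungapped (start position, read sequence) pairs, base and mapping qualities are ignored, and the `min_depth` and `min_fraction` thresholds are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Alignment = Tuple[int, str]  # (0-based start on the reference, read bases)


def pileup(ref_len: int, alignments: List[Alignment]) -> List[Counter]:
    """Count the bases observed at every reference position."""
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base] += 1
    return columns


def call_variants(reference: str, alignments: List[Alignment],
                  min_depth: int = 2, min_fraction: float = 0.8):
    """Report (pos, ref_base, alt_base, support, depth) for likely variants."""
    variants = []
    for pos, counts in enumerate(pileup(len(reference), alignments)):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little evidence at this position
        base, support = counts.most_common(1)[0]
        if base != reference[pos] and support / depth >= min_fraction:
            variants.append((pos, reference[pos], base, support, depth))
    return variants


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    # Three toy reads that all carry an A instead of T at position 3.
    alignments = [(0, "ACGAAC"), (2, "GAACGT"), (3, "AACGTAC")]
    print(call_variants(reference, alignments))
```

With low-abundance strains the depth at each position is small, which is why thresholds like these involve a trade-off between missing true variants and reporting sequencing errors.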

StrainGE was instrumental in characterizing the E. coli strain-level dynamics in the gut microbiomes of women with recurrent urinary tract infections (UTIs). We found that the UTI-causing strain was rarely cleared from the gut after antibiotic treatment, and we observed unexpected similarities with a healthy control group. More about this study can be found in the post "Gut microbiome dysbiosis linked with recurrent UTIs".

Lucas van Dijk

Updated on Apr 16, 2025

Publication Microbiome Software


Hi, I'm Lucas! I write about bioinformatics and high-performance software engineering. Expect deep dives into the technical aspects behind a paper or explainers in simple language. Topics I find interesting include sequence alignment algorithms, immunology, and GPU programming.

I am a senior software engineer on the Instrument Analysis team at PacBio, a company that builds long-read DNA sequencing machines. We develop high-performance software that transforms sensor data into raw base calls and consensus algorithms that deliver highly accurate DNA reads. Views on this website are my own.
