International CYAnobacteria & TOXin database

Background

Current data resources for cyanobacterial research include at least 19 databases covering strain collections, genomics, proteomics, transcriptomics, regulatory information, secondary metabolites, taxonomy and literature. However, these databases include only a few cyanobacterial species isolated from Canada and Quebec, as we found when searching NCBI (e.g., https://www.ncbi.nlm.nih.gov/genome/?term=cyanobacteria) and other microbial genome databases for the term “cyanobacteria”.

The poor representation of Canadian freshwater cyanobacterial isolates in these reference databases is a limitation for stakeholders, who need accurate identification of cyanobacterial species, especially during bloom events.

Indeed, of the 398,132 prokaryotic genomes in the NCBI genome database (as of July 26, 2023), only 3,552 (about 0.9%) are from cyanobacteria. Of these, only 23 are from Quebec (all from non-axenic cultures).

Consequently, for rapid, real-time, on-site analysis, there is a critical need for a representative database of cyanobacteria isolated from Quebec and elsewhere in Canada, especially from freshwater ecosystems.

The ICYATOX initiative

The ICYATOX database uses MySQL 5 and the web-based Zend Framework to describe cyanobacterial strains while linking genomic information to their phenotypic characterization and to environmental data from the source lake. ICYATOX will hold information such as the isolate ID, the researcher responsible for the isolation, the isolation date, the geographical origin of the sample and the environmental variables describing it, phenotypic data, DNA extraction details, sequencing information, and the genome assembly.
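To illustrate how isolate, lake and assembly records could be linked, here is a minimal sketch. It is not the ICYATOX schema: the table names, column names and the single demo row are hypothetical placeholders, and it uses Python's built-in sqlite3 module in place of MySQL so that it runs without a database server.

```python
import sqlite3

# Hypothetical, simplified schema: one lake has many isolates,
# and one isolate can have several genome assemblies.
SCHEMA = """
CREATE TABLE lake (
    lake_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE isolate (
    isolate_id      TEXT PRIMARY KEY,
    researcher      TEXT NOT NULL,
    isolation_date  TEXT NOT NULL,
    lake_id         INTEGER NOT NULL REFERENCES lake(lake_id),
    phenotype_notes TEXT
);
CREATE TABLE assembly (
    assembly_id INTEGER PRIMARY KEY,
    isolate_id  TEXT NOT NULL REFERENCES isolate(isolate_id),
    platform    TEXT NOT NULL,
    n_contigs   INTEGER
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one made-up example record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO lake VALUES (1, 'Example Lake', 46.0, -73.0)")
    conn.execute(
        "INSERT INTO isolate VALUES "
        "('ISO-0001', 'A. Researcher', '2023-07-26', 1, 'filamentous')"
    )
    conn.execute("INSERT INTO assembly VALUES (1, 'ISO-0001', 'MiSeq+Nanopore', 1)")
    conn.commit()
    return conn


def isolates_with_assemblies(conn: sqlite3.Connection) -> list:
    """Join each isolate to its source lake and its assembly records."""
    query = """
        SELECT i.isolate_id, l.name, a.platform
        FROM isolate AS i
        JOIN lake AS l ON l.lake_id = i.lake_id
        JOIN assembly AS a ON a.isolate_id = i.isolate_id
        ORDER BY i.isolate_id
    """
    return conn.execute(query).fetchall()
```

Calling isolates_with_assemblies(build_demo_db()) returns one row per assembly, each carrying the isolate ID, the lake name and the sequencing platform.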

Whole Genome Sequencing (WGS) will be performed on the Illumina MiSeq platform with 300 bp paired-end libraries at 30× coverage to provide high-quality nucleotide sequence data. Additional long-read sequencing data from Oxford Nanopore will be combined with the MiSeq data to generate complete genome assemblies for annotation, including complete circular chromosomes, plasmids and phage content.
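As a rough guide to what 30× coverage implies, sequencing depth can be estimated as (number of reads × read length) / genome size. The sketch below applies that standard estimate to 300 bp paired-end reads. The 5 Mb genome size in the usage note is an assumed round figure for illustration only, and the function name is hypothetical.

```python
import math


def read_pairs_needed(
    genome_size_bp: int,
    coverage: float = 30.0,
    read_length_bp: int = 300,
) -> int:
    """Estimate the number of read pairs needed for a target mean coverage.

    Uses coverage = total sequenced bases / genome size, where each
    paired-end fragment contributes two reads of read_length_bp bases.
    This ignores quality trimming, read overlap and uneven coverage.
    """
    if genome_size_bp <= 0 or read_length_bp <= 0 or coverage <= 0:
        raise ValueError("genome size, read length and coverage must be positive")
    bases_needed = coverage * genome_size_bp
    bases_per_pair = 2 * read_length_bp
    return math.ceil(bases_needed / bases_per_pair)
```

For an assumed 5 Mb genome, read_pairs_needed(5_000_000) gives 250,000 read pairs, since 30 × 5,000,000 bases divided by 600 bases per pair is 250,000. In practice more reads are needed, because non-axenic cultures also contribute reads from other organisms.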

WGS data will be used for phylogenetic analysis and to identify Single Nucleotide Variants (SNVs) in the core genome of strains belonging to the same species, allowing us to assess the within-species diversity of Quebec’s cyanobacterial strains. Tools that can be used for SNV detection and pangenome analysis include SaturnV (GitHub: ejfresch/saturnV; Freschi et al., 2019), PANSEQ (Laing et al., 2010) and HARVEST (Treangen et al., 2014).
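To make the idea of core-genome SNVs concrete, the sketch below scans an alignment of the same region from several strains and reports the columns where strains differ. It is a toy illustration only, not how SaturnV, PANSEQ or HARVEST work internally; the function name and the input format (strain name mapped to an aligned sequence) are assumptions.

```python
def find_snv_positions(alignment: dict[str, str]) -> list[tuple[int, dict[str, str]]]:
    """Return (0-based position, base per strain) for every variable column.

    Columns containing a gap or an ambiguous base in any strain are skipped,
    because such positions are not shared by all strains.
    """
    if not alignment:
        return []
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")

    snvs = []
    for pos in range(lengths.pop()):
        column = {strain: seq[pos].upper() for strain, seq in alignment.items()}
        if any(base not in "ACGT" for base in column.values()):
            continue  # gap or ambiguity code: not a core position
        if len(set(column.values())) > 1:
            snvs.append((pos, column))
    return snvs
```

For example, with the made-up sequences {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "AC-TAC"}, position 2 is skipped because of the gap in s3, and position 4 is reported as an SNV because s2 carries T where the other strains carry A.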

The pangenomes built from ICYATOX isolates will make it possible to identify the genes responsible for within-species variability (accessory genes) as well as those underpinning conserved traits (core genes) in Quebec strains of the main bloom-forming species in the province.
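The core/accessory distinction can be sketched with simple set operations, assuming each strain has already been reduced to a set of gene-family identifiers. The strict definition used here (core means present in every strain) is one common convention; pangenome tools often apply a softer threshold. The function name and the gene identifiers in the usage note are hypothetical.

```python
def partition_pangenome(gene_content: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Split a pangenome into core and accessory gene families.

    gene_content maps each strain name to the set of gene families it carries.
    Core genes are present in every strain; accessory genes are present in
    at least one strain but not in all of them.
    """
    if not gene_content:
        return set(), set()
    gene_sets = list(gene_content.values())
    pangenome = set().union(*gene_sets)      # every gene seen in any strain
    core = set.intersection(*gene_sets)      # genes shared by all strains
    accessory = pangenome - core
    return core, accessory
```

For instance, given strains carrying {"gene_a", "gene_b"}, {"gene_a", "gene_c"} and {"gene_a", "gene_b", "gene_d"}, the core set is {"gene_a"} and the accessory set is {"gene_b", "gene_c", "gene_d"}.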