For most eukaryotic non-model organisms, no single automated pipeline exists for genome assembly and gene annotation.

We aim to automate genome assembly and gene annotation.

Every day, new genomes from a wide variety of organisms are being sequenced. Biologists have long since entered the “Big Data” era, a term referring to the generation of massive amounts of raw sequencing data that are increasingly difficult to process and manage. Sequencing data volumes often exceed the available processing capacity, posing major informatics challenges.

Genomics data are only one part of a comprehensive approach that combines data from multiple biological levels to produce a holistic understanding of biological systems. This holistic approach is known as Systems Biology.

A highly simplified schematic of the central dogma of gene expression in relation to several “Omics” technologies, with data integration and systems biology analysis:

  • Metabolites: Metabolomics
  • Proteins: Proteomics
  • RNA: Transcriptomics
  • DNA: Genomics

For each level of biological information flow, and for each resulting type of “Omics” data, dozens of individual informatics tools already exist for a variety of data analysis tasks. However, novel platform technologies are needed to integrate these individual tools into intelligent analysis workflows. Such integrative pipeline technologies must have learning capabilities to minimize errors and optimize the outcome. This matters most at the foundation: errors in genome sequence alignment and assembly propagate to, and can drastically affect, all downstream analyses.