Reconstructing genomes of organisms from high-throughput sequencing experiments without a reference genome available (de novo assembly) is a challenging problem which has been approached in several ways in the past decade. Although numerous methods are available and many offer fair performance in reconstruction, there is a lack of generalized template libraries and interchangeable data structures/methods for serial, multithreaded and distributed processing. In this work we propose a novel set of cache oblivious generic data structures for serial, multithreaded and distributed processing of high-throughput sequencing data for the creation of de Bruijn or k-mer graphs towards their usage in de novo assembly and related HTS data analytics problems.
|Titolo:||High-performance data structures for de novo assembly of genomes: cache oblivious generic programming|
|Data di pubblicazione:||2016|
|Appare nelle tipologie:||4.1 Contributo in Atti di convegno|