Reconstructing the genomes of organisms from high-throughput sequencing (HTS) experiments without an available reference genome (de novo assembly) is a challenging problem that has been approached in several ways over the past decade. Although numerous methods are available and many offer fair reconstruction performance, generalized template libraries and interchangeable data structures and methods for serial, multithreaded, and distributed processing are lacking. In this work we propose a novel set of cache-oblivious generic data structures for serial, multithreaded, and distributed processing of HTS data, aimed at building de Bruijn or k-mer graphs for use in de novo assembly and related HTS data analytics problems.
Milicchio, F. (2016). High-performance data structures for de novo assembly of genomes: cache oblivious generic programming. In Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics [10.1145/2975167.2985691].
High-performance data structures for de novo assembly of genomes: cache oblivious generic programming
MILICCHIO, Franco
2016-01-01