Improving the contiguity and correctness of genome assembly via optical maps
Prof. Stefano Lonardi

De novo genome assembly is a challenging computational problem because eukaryotic genomes are highly repetitive and sequencing technologies are imperfect. Several assembly tools are currently available, each with its own strengths and weaknesses in handling the tradeoff between maximizing contiguity and minimizing assembly errors (e.g., mis-joins). To obtain the best possible assembly, it is common practice to generate multiple assemblies using several assemblers and/or parameter settings, and then try to identify the one of highest quality. Unfortunately, there is often no single assembly that both maximizes contiguity and minimizes assembly errors, so one has to trade one for the other. Assembly reconciliation has been proposed as a way to obtain a higher-quality consensus assembly by merging, or reconciling, all the available assemblies. While several reconciliation methods have been introduced in the literature, we will show that none of them consistently generates assemblies that are better than the assemblies provided as input. We will then propose a novel method that takes advantage of optical maps to carry out assembly reconciliation accurately. Experimental results demonstrate that our tool can double the contiguity (N50) of the input assemblies without introducing mis-joins or reducing genome completeness.
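
For readers unfamiliar with the contiguity metric mentioned above, N50 is commonly defined as the largest length L such that contigs of length at least L together cover at least half of the total assembly length. The following is a minimal sketch of that definition, not code from the tool presented in the talk; the function name `n50` and the contig lengths are hypothetical and chosen only for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length of the contig at which the running total,
    taken from the longest contig downward, first reaches at least
    half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")

    total = sum(contig_lengths)
    running = 0
    # Walk the contigs from longest to shortest.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point division.
        if 2 * running >= total:
            return length


# Hypothetical example: joining two contigs of length 4 into one of
# length 8 keeps the total length the same but raises the N50.
before = [4, 4, 2]   # hypothetical input assembly
after = [8, 2]       # hypothetical reconciled assembly
print(n50(before), n50(after))
```

Because N50 depends only on contig lengths, it says nothing about correctness: an incorrect join raises N50 just as a correct one does, which is why contiguity has to be reported alongside mis-join counts and genome completeness.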