08/18/2011

A “Meraculous” Algorithm for Whole-Genome Assemblies

Summary

DNA sequencing technologies now generate far more genomic data than they did just a few years ago. Most of these data, however, consist of short DNA fragments that must be assembled back into a whole genome before the biological function of the parent organism can be elucidated. Assembly is a computational challenge for the sequencing community, particularly when a dataset exceeds a hundred million fragments. Researchers at the DOE Joint Genome Institute (JGI) have now developed an efficient algorithm, Meraculous, to assemble short genomic fragments into whole-genome sequences. Using novel techniques in graph theory and memory-efficient hashing schemes, Meraculous can quickly and accurately assemble microbial genomes with a fraction of the computer memory required by more traditional methods. JGI staff tested the method on Pichia stipitis, a microbe that efficiently produces ethanol from the five-carbon sugar xylose, and found that they could quickly reconstruct 95% of the genome error free. Research at JGI continues to advance the algorithm, with applications to more complex plant genomes planned.
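To make the idea of graph-based assembly from hashed fragments concrete, here is a minimal toy sketch in Python. It is not the Meraculous implementation. It only illustrates the general approach of storing fixed-length subsequences (k-mers) of the reads in a hash table and walking from one k-mer to the next while the extension is unambiguous. The function names, the value of k, and the example sequence are all assumptions made for illustration, and the sketch ignores sequencing errors, reverse complements, and paired-end information.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads in a hash table."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend_right(seed, kmers):
    """Grow a contig from `seed` while exactly one following k-mer exists.

    `kmers` is a set of k-mers; the walk stops at a dead end, at a branch
    (more than one possible next base), or when it would revisit a k-mer.
    """
    contig = seed
    current = seed
    seen = {seed}
    while True:
        suffix = current[1:]
        candidates = [suffix + base for base in "ACGT" if suffix + base in kmers]
        if len(candidates) != 1 or candidates[0] in seen:
            break
        current = candidates[0]
        seen.add(current)
        contig += current[-1]
    return contig


# Hypothetical example: a short made-up sequence cut into overlapping 8-base reads.
genome = "ATGGCGTACGTTAGC"
reads = [genome[i:i + 8] for i in range(len(genome) - 7)]
kmers = set(count_kmers(reads, 5))
contig = extend_right("ATGGC", kmers)
```

In this toy case every k-mer has a single successor, so the walk recovers the full made-up sequence. Real data contain repeats and errors that create branches, which is where an assembler's handling of ambiguous extensions matters.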

References

Chapman, J. A., I. Ho, S. Sunkara, S. Luo, G. P. Schroth, and D. S. Rokhsar. 2011. “Meraculous: De Novo Genome Assembly with Short Paired-End Reads,” PLoS ONE 6(8), e23501. DOI:10.1371/journal.pone.0023501.