A New Approach Produces a 90-Fold Increase in Known Viral Taxa

Researchers leverage viruses identified from worldwide environmental samples to expand knowledge of viral taxa and their role in tree microbiomes.

The Science

Scientists believe the Earth may be home to more viruses than there are stars in the Milky Way. These viruses play an essential role in regulating microbiomes, the communities of organisms that share a given habitat. Viruses are also found throughout the human biome, even in healthy people. Scientists use two key methods to identify these viruses. Metagenomics (also called environmental genomics) is the study of genetic material recovered directly from environmental samples. Metatranscriptomics is the study of how microbes’ genes are expressed in a natural environment. However, scientists have full descriptions (called taxonomies) of only a tiny proportion of the world’s viruses. This limits their ability to study how viruses function in a microbiome.

The Impact

This research used a novel algorithm to compare and incorporate 715,672 metagenome viruses from environmental samples around the world. This expands the viral taxa available to researchers from about 8,000 to 723,672. The scientists then created a database of the tree of viral associations. Next, they used the database in combination with other data to examine samples from around the roots, the nearby soil, and the plants themselves for two Populus tree genotypes, eastern cottonwood and black cottonwood. They found that the virus communities differed significantly between the two cottonwood genotypes. They also found that the virus communities differed significantly among the soil, the roots, and the inside of the trees. This research will help scientists better understand the role that viruses play in microbial communities and their influence on plant health and growth.
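The headline figure follows directly from the two counts quoted above. As a quick check, the short Python sketch below redoes the arithmetic; the variable names are illustrative only, and the starting count is the rounded "about 8,000" from the text, so the resulting ratio is approximate.

```python
# Counts quoted in the text above; "about 8,000" is a rounded figure.
known_taxa_before = 8_000             # viral taxa previously available
metagenome_viruses_added = 715_672    # viruses incorporated from environmental samples

# Total taxa available after incorporating the metagenome viruses.
total_taxa = known_taxa_before + metagenome_viruses_added

# Ratio of the expanded pool to the original pool.
fold_increase = total_taxa / known_taxa_before

print(total_taxa)               # 723672
print(round(fold_increase, 1))  # 90.5
```

The ratio of roughly 90.5 is what the title rounds to a 90-fold increase.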


Viruses are a vastly understudied component of microbiomes. In this study, researchers from Oak Ridge National Laboratory (ORNL), the Massachusetts Institute of Technology, Harvard University, and the University of Tennessee developed a novel method to build a classification tree for viruses at an unprecedented scale. The method can be used with any taxonomy-based classification tool to better identify viruses and their impacts in the microbiome. The 715,672 metagenome viruses that the Joint Genome Institute (JGI), a Department of Energy (DOE) user facility, has identified potentially make up only a small fraction of the viruses that exist, but incorporating them increases the pool of viral taxa for classification by approximately 90-fold. Although the uniqueness and diversity of the JGI viruses make them more difficult to classify in samples with Kraken2, the new method is still 82 percent accurate in identifying the correct JGI viral sequence and more than 90 percent accurate in identifying a sequence as a JGI-identified virus. Using a parallel version of Kraken2 called ParaKraken, the researchers showed that it is possible to identify viral sequences in metagenomic samples from different Populus genotypes and plant compartments. Furthermore, viral taxa make up between 6 and 20 percent (mean 15 percent) of the sequence reads in metagenomic samples. The results provide a means to better understand the role that viruses may play in plant biology.
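For readers unfamiliar with k-mer-based classification, the general idea can be illustrated with a toy Python sketch. This is not Kraken2, ParaKraken, or the researchers' method: the reference sequences, the labels virus_A and virus_B, the reads, and the choice of k = 5 are all invented for illustration, and real tools use much longer k-mers, a taxonomic tree, and far more efficient data structures. The sketch only shows how a read can be assigned to the reference that shares the most short subsequences (k-mers) with it, and how a viral read fraction could then be computed.

```python
from collections import Counter

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(references, k):
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(label)
    return index

def classify(read, index, k):
    """Return the label sharing the most k-mers with the read.

    Returns None if no k-mer matches or if the top two labels tie.
    """
    votes = Counter()
    for km in kmers(read, k):
        for label in index.get(km, ()):
            votes[label] += 1
    top = votes.most_common(2)
    if not top:
        return None
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None
    return top[0][0]

# Hypothetical reference sequences and reads, invented for this example.
references = {
    "virus_A": "ATGCGTACGTTAGCCGATAG",
    "virus_B": "TTGACCGGTAACCGTTAGGA",
}
reads = ["GCGTACGTTAGC", "CCGGTAACCGTT", "AAAAAAAAAAAA"]

k = 5
index = build_index(references, k)
labels = [classify(read, index, k) for read in reads]

# Fraction of reads assigned to any reference (a stand-in for "viral reads").
viral_fraction = sum(label is not None for label in labels) / len(labels)

print(labels)
print(viral_fraction)
```

In this toy data the first read shares two of its k-mers with both references but is still assigned to the one with more matches, which hints at why highly diverse or closely related sequences are harder to classify correctly.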

Principal Investigator(s)

Daniel Jacobson
Oak Ridge National Laboratory


This research was supported by the DOE Office of Science, Biological and Environmental Research program’s Genomic Science Program as part of the Plant Microbe Interfaces Scientific Focus Area.


Garcia, B.J., et al., A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes. Computational and Structural Biotechnology Journal. 19, 5911-5919 (2021). [DOI: 10.1016/j.csbj.2021.10.029]