The human virome is the collection of all viruses that are found in or on humans, including both eukaryotic and prokaryotic viruses. Eukaryotic viruses clearly have important effects on human health, ranging from mild, self-limited acute or chronic infections to those with serious or fatal consequences. Prokaryotic viruses can also influence human health by affecting bacterial community structure and function. Therefore, definition of the virome is an important step toward understanding how microbes affect human health and disease. We review progress in virome analysis, which has been driven by advances in high-throughput, deep sequencing technology. Highlights from these studies include the association of viruses with clinical phenotypes and description of novel viruses that may be important pathogens. Together these studies indicate that analysis of the human virome is critical as we aim to understand how microbial communities influence human health and disease. Descriptions of the human virome will stimulate future work to understand how the virome affects long-term human health, immunity, and response to coinfections. Analysis of the virome ultimately may affect the treatment of patients with a variety of clinical syndromes.
The viral component of the human microbiome is referred to as the “human virome.” The human virome (also referred to as the “viral metagenome”) is the collection of all viruses that are found in or on humans, including viruses causing acute, persistent, or latent infection, and viruses integrated into the human genome, such as endogenous retroviruses. The human virome includes both eukaryotic and prokaryotic viruses (bacteriophages). Eukaryotic viruses clearly have important effects on human health. Viral infections of humans include acute, self-limited infections; fulminant, uncontrolled acute infections; and chronic infections that may be asymptomatic or associated with serious, even fatal diseases, such as acquired immunodeficiency syndrome.
Thus, we are poised to begin to understand the richness of the virome and the role viruses play within complex microbial communities (Fig 1).
Challenges for Virome Studies
The study of the virome is challenging for several reasons. First, viruses do not contain a conserved genomic region that can be used to identify the viruses in a microbial community, such as the 16S rRNA gene that is used to classify bacteria. Instead, the entire viral community must be sampled and viral genomic sequences compared with known viral reference sequences. The success of this process is currently limited by the fact that many viruses have not yet been characterized and are not included in viral reference databases.
Furthermore, viral sequences with poor homology to known viruses may be difficult to classify.
The second challenge in studying the virome is that viral genomic material can be a small proportion of the total nucleic acid in microbial communities because of the small genome sizes of most viruses and their low-level presence in some cases. This is particularly true for eukaryotic viruses producing persistent asymptomatic infection that may have as yet unappreciated effects on long-term human health.
Polymerase chain reaction and culture are tools that can be used to characterize the virome. However, the use of these approaches requires up-front decisions about which viruses to look for, thus providing an informative but more limited view of the scope of the virome. Viral nucleic acids can be enriched using hybridization techniques such as microarray or capture, and bound nucleic acids can subsequently be sequenced to provide additional information about the viral genomes. Some novel viruses can be detected by these methods if there is sufficient sequence homology to bind the viral probes.
High-throughput, deep sequencing technology is revolutionary, because it provides an unbiased approach that can detect even rare components of a microbial community.
Nucleotide sequencing delivers great power for detecting known and novel viruses in clinical samples. Less than 10 years ago, the ABI 3730 capillary sequencer (Applied Biosystems, Foster City, CA) was the state-of-the-art platform for high-throughput sequencing, simultaneously generating sequences from 96 clones on a single run. The lengths of sequences generated on this platform are typically 500 to 800 bases. This relatively long length can be advantageous for discovering novel microbes with remote homologies to reference sequences. However, ABI 3730 sequencing requires that the novel microbe be abundant in the original sample or cloned, because the cost per read limits the number of sequences that can be generated in an experiment. Sequences generated on the ABI 3730 were used for the initial sequence-based characterizations of nonviral microbial communities and for early studies in which novel viral pathogens were detected (discussed below).
In the decade since capillary sequencing was used for the Human Genome Project, technology has increased the yield of sequence that can be generated per day from a single instrument by >30,000-fold while reducing cost by approximately 7000-fold. With the advent of massively parallel sequencing platforms, such as the Roche 454 pyrosequencer (454 Life Sciences, Branford, CT), sequencing capacity grew to approximately 1 million sequences per run, each 250 to 500 bases in length, resulting in a total sequence throughput of up to 500 million bases per run. By introducing sequence “barcodes” during sample amplification, multiple samples can be pooled within a single run, allowing generation of tens to hundreds of thousands of sequences per sample. This massively parallel sequencing allows a more thorough assessment of microbial communities that includes the description of lower abundance microbes. Indeed, analysis of stool samples on the Roche 454 platform revealed a greater number of viruses compared with the ABI 3730.
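The barcoding step described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming that each read begins with a fixed-length barcode and that barcodes must match exactly; the function name `demultiplex`, the sample names, and the example sequences are hypothetical and are not taken from any of the studies reviewed here.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group pooled reads by the sample barcode at the start of each read.

    reads    -- iterable of DNA strings from one pooled sequencing run
    barcodes -- dict mapping barcode sequence -> sample name
    Assumption: all barcodes have the same length and must match exactly.
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes barcodes of equal length")
    k = lengths.pop()

    bins = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:k], "unassigned")
        # Trim the barcode from assigned reads; keep unassigned reads intact.
        bins[sample].append(read[k:] if sample != "unassigned" else read)
    return dict(bins)


# Hypothetical pooled run containing two barcoded samples.
reads = ["ACGTTTGACCA", "TGCAGGATCCA", "ACGTCCCGGGA", "NNNNAAAAAAA"]
barcodes = {"ACGT": "stool_A", "TGCA": "stool_B"}
bins = demultiplex(reads, barcodes)
```

In practice, demultiplexing software usually tolerates a small number of sequencing errors within the barcode; exact matching is used here only to keep the sketch short.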
Many novel viruses were discovered using the Roche platform (discussed below). The Illumina Genome Analyzer (Illumina Inc, San Diego, CA) generates up to 640 million sequences per run, and the Illumina HiSeq 2000 can generate up to 6 billion paired-end sequences per run. On each of these platforms, multiple pooled, barcoded samples can be included on each run. Illumina sequences are shorter than those generated by Roche 454 pyrosequencing: In early experiments, they were less than 50 bases in length but now are routinely 100 bases. Although the read length is short, sequences can be generated from both ends of a DNA fragment to yield “paired-end” reads, allowing 200 bases to be sequenced from the same DNA fragment. Illumina technology provides the sensitivity needed to detect rare virus sequences, with sensitivity comparable to that of quantitative reverse transcriptase polymerase chain reaction in some studies.
Assembly programs such as PRICE have been developed to extend a fragment of sequence from a novel organism iteratively using paired-end Illumina data (DeRisi, unpublished, available at: http://derisilab.ucsf.edu/software/price/index.html). Trends toward increasing numbers of sequences per run and decreased cost per base are likely to continue. New sequencing platforms, including the Illumina MiSeq and the Life Technologies (Grand Island, NY) Ion Torrent Personal Genome Machine Sequencer, are being developed to generate large amounts of sequence data with a rapid turnaround time.
Rapid, accurate analysis of sequence data is critical for research, with more stringent requirements anticipated as clinical applications for virome analysis are developed. Identification of viral sequences is generally achieved by comparison of microbial sequences with reference genomes. Use of programs such as BLAST and BLASTX is the traditional method for doing this; these programs work well for relatively small data sets generated by the ABI 3730 and Roche 454 pyrosequencer or for longer contiguous sequences assembled from shorter Illumina reads. However, analyzing millions to billions of Illumina Genome Analyzer sequences requires faster aligners. Many short-read sequence alignment tools are fast but have low tolerance for sequence mismatches; however, virus sequences may differ significantly from the reference genome sequences, so allowing mismatches in the alignments is critical.
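The effect of mismatch tolerance can be illustrated with a toy example. The sketch below is not how BLAST or any production short-read aligner works; it is a brute-force, ungapped scan, with hypothetical function names and made-up sequences, that shows only how a read from a divergent virus is missed when no mismatches are allowed and recovered when a few are.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read, reference, max_mismatches):
    """Return (start, mismatches) for the best ungapped placement of read
    on reference, or None if every placement exceeds max_mismatches.

    Brute-force scan for illustration; real aligners use indexes and
    also handle insertions and deletions.
    """
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        d = hamming(read, window)
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (start, d)
    return best


# Made-up reference and a read that differs from it at one position.
reference = "ATGGCGTACGTTAGCCTAGGATCC"
read = "TACGATAGCC"

strict = best_hit(read, reference, max_mismatches=0)    # exact matching only
tolerant = best_hit(read, reference, max_mismatches=2)  # allow divergence
```

With exact matching the read is left unplaced, whereas the tolerant search places it at the position of its one-mismatch counterpart in the reference.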
A thorough comparison of nucleotide alignment tools for short sequences has been published. CLC bio (www.clcbio.com) and Real Time Genomics (RTG) (www.realtimegenomics.com) software were chosen from the tools evaluated, and they were used extensively to carry out nucleotide alignments of the terabases of Illumina data generated in the Human Microbiome Project (HMP); MBLASTX from Multi Core Ware (www.multicorewareinc.com) and RTG mapx software were used for HMP translated sequence alignments (HMP Consortium, manuscript in revision, 2012). These programs provide 100- to 1000-fold increases in alignment speed over BLAST and BLASTX while maintaining similar sensitivities (MBLASTX, Mitreva et al, manuscript in revision, 2012) (RTG, Mitreva et al, manuscript in preparation, 2012). Although identification of virus sequences based on sequence homology to known viruses is straightforward in concept, one must be cautious in interpreting the data. Low-complexity sequence and sequences with homology between virus and host can cause false-positive viral identifications. Likewise, false-positive identifications can occur when a sequence does not have close homology to a sequence in the reference database; some general functions are conserved among eukaryotes, bacteria, and DNA viruses, which can result in a weak alignment of translated sequence. Further analysis of virome diversity and complexity can be achieved using software packages such as GAAS.
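The caution about low-complexity sequence can be made concrete with a crude filter. The sketch below flags reads by the Shannon entropy of their base composition; it is a simplified stand-in for dedicated low-complexity maskers, and the function names and the 1.0-bit threshold are illustrative assumptions rather than values used in the studies discussed.

```python
import math
from collections import Counter


def base_entropy(seq):
    """Shannon entropy (bits) of the base composition of seq.

    0.0 for a homopolymer; 2.0 when A, C, G and T are equally frequent.
    """
    counts = Counter(seq)
    n = len(seq)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def is_low_complexity(seq, threshold=1.0):
    """Flag reads whose base entropy falls below an assumed threshold."""
    return base_entropy(seq) < threshold


homopolymer = base_entropy("AAAAAAAAAA")
balanced = base_entropy("ACGTACGT")
flagged = is_low_complexity("AAAAAAAAAT")
```

Base composition alone does not detect short tandem repeats such as ACGTACGT, which is one reason purpose-built masking tools score short words rather than single bases.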
Viral DNA was isolated from surface seawater collected in La Jolla and San Diego, California, and approximately 1000 sequences were generated from each sample. Chao1 estimates and rank abundance curves predicted that hundreds to thousands of viral genotypes were present in the viral communities. Significant alignments were identified to all major families of dsDNA tailed phages. In addition, 65% of the sequences were unclassified, pointing to the existence of vast genomic diversity in the oceanic ecosystem, including many novel viruses. Angly et al expanded the virome analysis to 4 distinct oceanic regions (Sargasso Sea, Gulf of Mexico, seawater off the coast of British Columbia, and the Arctic ocean) and analyzed samples collected at different time points, locations, and depths. More than 1.7 million sequences were generated using the Roche 454 platform. These sequences were relatively short, with an average length of 102 bases. Oceanic environments contained distinct phage groups that reflected the composition of the bacterial community in that niche, as well as some phages that were common to all or some environments. The diversity and richness of phage populations were different in the 4 environments described. These data suggest that phage communities in different ecologic niches will differ with respect to the environment in which they are found, in part reflecting the resident bacterial population and its functions. This work also suggests that the study of the viral populations in a variety of human body habitats will reveal an unappreciated diversity of common and specialized viruses.
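For readers unfamiliar with the Chao1 richness estimates and rank abundance curves mentioned above, the following sketch shows how both are computed from per-genotype counts. It uses the bias-corrected form of Chao1, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of genotypes seen exactly once and exactly twice; whether the seawater study used this form or the classic one is not stated here, and the counts below are invented for illustration.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate.

    counts -- number of sequences observed for each genotype.
    Rare genotypes (singletons, doubletons) are used to estimate how many
    genotypes were present but never sampled.
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)  # singletons
    f2 = sum(1 for c in observed if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def rank_abundance(counts):
    """Return (rank, relative abundance) pairs, most abundant first."""
    total = sum(counts)
    ordered = sorted(counts, reverse=True)
    return [(rank, c / total) for rank, c in enumerate(ordered, start=1)]


# Invented counts for eight genotypes in one sample.
counts = [10, 5, 2, 2, 1, 1, 1, 1]
estimate = chao1(counts)        # 8 observed + 4*3 / (2*3) unseen
curve = rank_abundance(counts)
```

A sample dominated by singletons yields an estimate far above the observed count, which is the pattern that leads to predictions of hundreds to thousands of genotypes from only about 1000 sequences.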
Early sequence-based analyses of the virome in samples from humans focused on bacteriophage populations. Bacteriophages influence their host bacteria and contribute genes that affect the structure and functions of microbial communities.
Therefore, bacteriophages may be both important effectors and indicators of human health and disease. In the first characterization of a bacteriophage community in a human stool sample, shotgun sequencing of 532 cloned viral DNA fragments from the stool of a healthy adult revealed that the majority of phage sequences were novel.
The data suggested rich diversity of bacteriophage sequences, with approximately 2 to 5 times the number of bacteriophage genotypes as predicted bacterial genera in a stool community (∼1200–2000 genotypes predicted).
However, a larger group of adults and infants will need to be sampled and compared to validate conclusions about differences in bacteriophage diversity between infants and adults. In fact, more recent studies that include samples from more individuals and use deeper sequencing indicate that the richness of bacteriophage populations in stool communities varies greatly among adults. In one study, Reyes et al found ∼19 to 785 genotypes per sample from 16 individuals. Thus, although important changes in the virome may occur as the infant gut matures, it is likely that the changes are more complex than simply increased diversity.
The insights into the human virome (particularly the bacteriophage component) provided by the studies of Reyes et al and Minot et al were made possible in large part because of newer sequencing technologies, especially the Roche 454 pyrosequencing platform. Consistent with the earlier studies, most viral sequences obtained were novel. Both studies showed relative stability of the virome over time (days to years), although changes in diet that affected the bacterial communities also correlated with changes in the viral communities.
The depth of sequencing enabled the assembly of longer contiguous sequences that were used to identify remote homologies and open reading frames for functional analysis. Of importance, the studies by Reyes et al and Minot et al show that bacteriophages encode antibiotic resistance genes.
Also, like bacterial plasmids, bacteriophages serve as reservoirs for mobile genetic elements in bacteria. In turn, this suggests that bacteriophages may affect human health by contributing to or changing the metabolic capabilities of the resident bacterial community.
The perturbation of a microbial environment by a disease, such as cystic fibrosis (CF), can cause changes in the microbiome. Willner et al describe bacteriophage communities in healthy people that were unique to each individual and were thought to reflect a random, transient sampling of the external environment. However, bacteriophage communities from individuals with CF were similar to each other, presumably driven by effects of their airway pathology. The spouse of a CF patient and a control with asthma, neither of whom had CF, shared the distinct sets of viral taxa and predicted host range found in the individuals with CF. These data lead to 2 important inferences. The first is that environment can have a strong influence on an individual's microbiome, including the virome. In this study, the presence of shared organisms between spouses was striking, indicating a shared external environment. The microbial community was thought to be transient in the spouse without CF but more established in the patient with CF, in whom clearance of microbes is impaired. The second inference is that similar microbial communities may be established in response to distinct health conditions, such as CF and asthma, both of which may cause impaired clearance of microbes from the airways. Together, these data suggest that in addition to the components of the virome, the dynamics of the viral community may be important for distinguishing the effects of the virome in different microenvironments.
The studies discussed thus far evaluated DNA viruses, but many important RNA viruses that infect eukaryotic cells also are found in the gastrointestinal and respiratory tracts. Eukaryotic viruses are found less frequently than bacteriophages in many microbial communities, and indeed the stool and sputum samples evaluated contained only a few sequences with homology to eukaryotic DNA viruses.
It is likely that more eukaryotic viruses would be found by inclusion of RNA in the analysis (particularly in the respiratory tract). A study that evaluated RNA viruses in stool samples from 2 healthy individuals found a diverse array of viruses.
One analysis of environmental samples pooled from many individuals revealed a viral community that was dominated by bacteriophages, with a subset of eukaryotic viruses that were predominantly from plants. Seventeen known human viruses were detected. Strikingly, novel viruses belonging to 51 virus families were also detected. These data indicate that environmental samples that contain specimens from a large number of individuals can provide valuable information concerning viruses present in the population, including novel agents in addition to known human pathogens.
Virome and Disease: The Discovery of Novel Eukaryotic Viruses
Overall, eukaryotic viruses are minor components of a microbial community, although their effects are often readily observed. Titers of eukaryotic viruses are generally higher in samples from symptomatic versus asymptomatic individuals. Thus, some of the viral metagenomic studies of the human gastrointestinal tract evaluated stool from patients with diarrhea.
The samples evaluated (from 12 and 35 patients, respectively) contained a variety of DNA and RNA viruses, including human enteroviruses, adenoviruses, caliciviruses, and parvoviruses. The eukaryotic viral metagenomes were distinct in each subject. Viral sequences accounted for the majority of sequences that were present in some subjects. The use of the Roche 454 pyrosequencing platform, which generated more sequences per sample than the ABI 3730 platform, revealed a greater richness in the eukaryotic viral metagenome.
This indicates that depth of sampling is an important factor for comprehensive viral metagenomic analysis and for discovering novel eukaryotic viruses. In addition to the detection of known viruses, each of these studies identified novel viruses associated with diarrhea, including an astrovirus.
The identification of novel viruses is an exciting part of the characterization of the virome. Most of the viral sequences detected in deep sequencing experiments are uncharacterized (described above), indicating the presence of great viral diversity to be discovered. These undiscovered viruses may affect human health, either acutely or through chronic infection.
Indeed, many conditions, including fever, diarrhea, and respiratory illness, may be caused by unknown or undiagnosed pathogens that are suspected to be viral. In recent years, many novel eukaryotic viruses have been discovered or characterized using sequencing, including viruses in groups such as the arenaviruses.
Discoveries about the presence and dynamics of known viruses in the virome may also affect the way we view their impact on human health. For instance, viruses that integrate into the human genome have been associated with cancer (eg, human papillomavirus 16, Epstein–Barr virus, and the more recently discovered Merkel cell polyomavirus). As we characterize the human virome, distinguishing episomal from integrated viruses is an important goal that may relate to the understanding of disease. In addition, virome analysis may identify known viruses in unexpected tissues, which could suggest novel mechanisms of disease.
The Future: Clinical Applications of Human Virome Studies
The most immediate applications of virome studies relate to the discovery of new viral pathogens (see above) or viruses with previously unappreciated tropisms.
Ongoing viral metagenomic analyses will undoubtedly reveal the presence of additional novel viruses. Significant evidence must be accrued to relate novel viruses to disease phenotypes. As evidence associating novel viruses with disease phenotypes accumulates, these new viruses will be considered as potential causes for disease. For instance, since their discovery in 2005,
however, their roles as pathogens have not yet been formally established. Detailed studies will be required to establish causal relationships between viruses and disease.
An intriguing question is whether viral metagenomic analysis can be applied as a clinical diagnostic method. The concept is appealing because a sequencing-based approach could dramatically increase the range of viruses detected in clinical samples compared with existing diagnostic methods. In some recent studies, sequence-based analysis of viral communities has had sensitivity comparable to virus-specific polymerase chain reaction.
Important methodological questions that need to be addressed include which samples should be selected for analysis, what sample preparation method should be used, and which sequencing platform should be used. In addition, extensive work remains to be done by laboratorians and clinicians to understand the clinical significance of the data generated. Finally, significant practical barriers remain to be surmounted, including decreasing the time required for sample-to-result analysis and decreasing cost. Although further technologic progress in both sequencing and information processing will be required to meet these goals, the pace of recent advances suggests that this may occur in the relatively near future.
We envision that in some patients who are diagnostic mysteries, rapid, unbiased sequence analysis of the viral metagenome in several samples from the patient will be used to generate a list of medically relevant viruses and genes that are detected, which can be further evaluated and confirmed using virus-specific assays. The viral metagenomic data will then be considered along with clinical data to determine whether (a) the virus or viruses can have a causal relationship to the patient's illness or (b) genes encoded by the virus may affect a planned treatment (antibiotic or antiviral resistance). In the future, as we begin to understand how the virome affects long-term human health, immunity, and response to coinfections or treatments, analysis of the virome may become highly informative for patient management.