Translational research in infectious disease: current paradigms and challenges ahead

In recent years, the biomedical community has witnessed a rapid scientific and technologic evolution after the development and refinement of high-throughput methodologies. Concurrently and consequentially, the scientific perspective has changed from the reductionist approach of meticulously analyzing the fine details of a single component of biology to the “holistic” approach of broadmindedly examining the globally interacting elements of biological systems. The emergence of this new way of thinking has brought about a scientific revolution in which genomics, proteomics, metabolomics, and other “omics” have become the predominant tools by which large amounts of data are amassed, analyzed, and applied to complex questions of biology that were previously unsolvable. This enormous transformation of basic science research and the ensuing plethora of promising data, especially in the realm of human health and disease, have unfortunately not been followed by a parallel increase in the clinical application of this information. On the contrary, the number of new potential drugs in development has been decreasing steadily, suggesting the existence of roadblocks that prevent the translation of promising research into medically relevant therapeutic or diagnostic application. In this article, we will review, in a noninclusive fashion, several recent scientific advancements in the field of translational research, with a specific focus on how they relate to infectious disease. We will also present a current picture of the limitations and challenges that exist for translational research, as well as ways that have been proposed by the National Institutes of Health to improve the state of this field.


REVIEW ARTICLE
JUDITH M. FONTANA, ELIZABETH ALEXANDER, and MIRELLA SALVATORE
NEW YORK, NY
(Translational Research 2012;159:430-453)
Abbreviations: 2-DE = 2-dimensional electrophoresis; 2-D DIGE = 2-dimensional differential in-gel electrophoresis; CF = cystic fibrosis; CTSA = Clinical and Translational Science Awards program; EBV = Epstein-Barr virus; FDA = U.S. Food and Drug Administration; GWAS = genome-wide association studies; HCV = hepatitis C virus; HMP = Human Microbiome Project; HPLC = high-performance liquid chromatography; LC = liquid chromatography; LSB = Laboratory of Systems Biology; mAb = monoclonal antibody; MRM/SRM = multiple reaction monitoring/selective reaction monitoring; MS = mass spectrometry; MS/MS = tandem mass spectrometry; NCATS = National Center for Advancing Translational Sciences; NCRR = National Center for Research Resources; NIAID = National Institute of Allergy and Infectious Diseases; NIH = National Institutes of Health; NME = new molecular entity; NMR = nuclear magnetic resonance; PBMC = peripheral blood mononuclear cell; PCR = polymerase chain reaction; PRR = pathogen recognition receptor; QQQ = triple quadrupole mass spectrometry; SARS-CoV = coronavirus associated with severe acute respiratory syndrome; SNP = single nucleotide polymorphism; TB = tuberculosis; UTI = urinary tract infection; YFV = yellow fever virus
Translational research is defined precisely by the National Institutes of Health (NIH) as “the process of applying ideas, insights and discoveries generated through basic scientific inquiry to the treatment or prevention of human disease.” 1 As the field of translational research has become increasingly popular in recent years, it has undergone numerous iterations, such that the specific meaning of the term “translational research” has itself been redefined several times. 1,2
The recent evolution in next-generation sequencing techniques and the introduction of high-throughput methods have resulted in an explosive cascade of research applications, spanning from target identification to diagnostics and therapeutics. These technical advances have provided the impetus for some radical changes in the way research itself is conceived and performed. As a result, enhanced interactions and broader collaborations among researchers with different expertise will be required just to keep up with the rapidly changing state of science. For a multidisciplinary approach to be effective, better ways to collect and share data (eg, biorepositories) must be identified. In addition, a more rapid translation of information from basic science into useful clinical applications will require the removal of communication barriers and financial roadblocks that currently prevent basic science teams from working with each other and with clinical researchers. Finally, regulatory changes will be necessary to promote faster approval of new molecular entities (NMEs) resulting from such scientific collaborations.
A major turning point in the validation of the field of translational research came with the creation of the NIH's Clinical and Translational Science Awards program (CTSA) in 2006, which supports a national consortium of 60 medical research institutions in 30 states. 3 CTSA is the largest program of the National Center for Research Resources (NCRR), which was established initially in 1990 for the purpose of providing resources, tools, and networks to NIH researchers nationwide as well as support for other programs, such as biomedical technology and research infrastructure. 4,5 In a climate of rapid scientific development, these programs aim to provide infrastructure, resources, training, and opportunities for collaboration among scientists and clinicians from a wide variety of disciplines and across different institutions. The common goal of these programs is to facilitate the transition from basic science discoveries to clinical studies, community practice, and health care decision making. 4-7 Although the programs of the NCRR have promoted higher quality training, improved research management, and enhanced community participation in clinical research, many roadblocks remain for translational research. 6
In March 2011, Francis Collins, director of the NIH, announced the creation of a new National Center for Advancing Translational Sciences (NCATS), which will assume the CTSA and dissolve the NCRR by redistributing its other programs throughout other institutes and centers. The aim of this controversial new program will be to catalyze the progression from basic scientific discovery to clinically useful therapeutic or diagnostic tools by identifying bottlenecks in translational research and exploring innovative methods to approach and remove them. 8 Thus, the creation of the NCATS is an attempt to shift the current paradigm in which most basic science discoveries take more than 10 years before they are translated into a clinically relevant drug.
The long time periods and costs necessary for the development and approval of useful drugs have also affected pharmaceutical research and development negatively, particularly in the area of infectious disease, where drug companies are curtailing or ceasing antimicrobial development despite increasing antibiotic resistance and “bad bugs.” 9-11 Similarly, despite the Orphan Drug Act, which was passed in 1983 to stimulate the development of drugs for rare diseases, 12 the pipeline for new drugs targeting diseases not of interest to pharmaceutical companies is also dry. Ultimately, although the creation of programs like the NCATS is a step in the right direction, the changes proposed by this organization still leave several problems unresolved and generate a new series of concerns.
In this article, we will review, in a noninclusive fashion, several examples of basic science achievements that have been brought to the forefront of translational research. A complete review of the translational applications of all recent scientific discoveries transcends the purpose of this article, so we will limit our discussion to the field of infectious disease. Overall, we draw attention to the general challenges facing translational research and define some of the recently realized achievements and future prospects of this field.

THE “OMICS” REVOLUTION OF TRANSLATIONAL RESEARCH
The Human Genome Project was conceived initially in 1985 13 and is considered to be one of the most important steps in the start of the genomic era. By definition, genomics characterizes the hereditary information of an organism. Research in proteomics, which defines the structure and function of proteins encoded by the genome of an organism, is the next logical step after genomics for the study of biological systems, because proteins are the major catalysts of intracellular processes. 14 Metabolomics uses the systematic analysis of the complete set of chemical fingerprints left behind by certain cellular processes to determine key aspects of the function and regulation of those processes. 15 Together, genomics, proteomics, and metabolomics comprise the 3 major systems biology approaches at the forefront of translational research (Fig 1).
Genomics. Since it was first invented in the late 1970s, DNA sequencing has been one of the most influential innovations in biological research. 16-18 This rapidly evolving technique has provided a feasible approach to a multitude of problems concerning biology, including infectious disease. Polymerase chain reaction (PCR) is the most common method for amplifying nucleotide sequences based on defined primer sets that recognize specific regions of the DNA of a sample. Physicians routinely refer to data from PCR and sequencing methods as an integral part of the standard of care. For example, HIV medications are chosen after analyzing the virus genotype for the presence of mutations that would confer resistance to some of the antiretroviral drugs on the market, and viral loads (viral levels measured by PCR) are used to help predict the responses to therapy. 19 Similar approaches are also being used for other viral pathogens such as hepatitis B virus, 20-22 hepatitis C virus (HCV), 23-26 and influenza virus. 27-29
PCR and sequencing techniques also play a crucial role in pathogen discovery and outbreak investigation. In recent years, the importance of this field has been highlighted by the growing number of emerging pathogens and their rapid spread as a consequence of globally mobile populations, climatic changes, improved surveillance, and threats of bioterrorism. Both basic research and clinical microbiology laboratories use uniplex and multiplex PCR assays for the rapid identification and typing of many bacterial and viral pathogens. Conventional uniplex PCR methods use specifically targeted sets of primers (1 set per reaction) to amplify conserved genetic regions of a sample and identify close variants of known pathogens.
Multiplex PCR assays, which first emerged in the late 1990s, 30 are based on the same theory but can amplify multiple regions of DNA at once from a single sample, thus producing additional information that would otherwise require several times the amount of reagents to run using uniplex PCR. 31 Although multiplex PCR techniques were a significant improvement compared with conventional PCR, both approaches are still limited in their ability to detect new pathogens and to screen large numbers of sequences in an efficient manner.
The introduction of microarray technology in the middle to late 1990s allowed for the development of microarray-based pathogen-detection platforms and high-density, pan-microbial arrays, which truly made rapid screening for a large number of pathogens feasible. 32 In this application of microarray technology, sample material suspected to contain pathogen genomes is hybridized to a 2-dimensional (2-D) array (or gene chip) of hundreds or thousands of miniaturized spots containing nucleic acid probes specific for various pathogen sequences. Microarray-based pathogen-detection platforms, such as the GreeneChip and the Virochip, 33-36 use 70-mer oligonucleotide probes of conserved genetic regions within each taxonomic group of viruses. Because of their length, these probes are more tolerant of sequence mismatches and therefore allow for the detection of unknown microbial targets. 37 Furthermore, the probes are updated continuously when new viruses and sequences are added to their associated databases. To identify a pathogen from a clinical specimen, nucleic acids are amplified randomly and then hybridized to the chips. Given the large number of probes and possible hybridization successes, the results are analyzed with the help of computer programs that use algorithms and statistics to identify virus hybridization patterns. Although microarray-based pathogen-detection platforms have been instrumental in the detection of numerous pathogens, including novel ones like the coronavirus associated with severe acute respiratory syndrome (SARS-CoV), 34,38 they still rely on some prior knowledge of relevant genome sequences. 39-41
The introduction of high-throughput sequencing has revolutionized the ability to detect novel infectious agents whose genomic sequences are completely unknown or are present in extremely low numbers. The high-throughput sequencing revolution started in 1987 with the introduction of the first automatic sequencer. 42 The first of the next-generation sequencing technologies was developed only a few years later, 43 but it took nearly 15 years of technologic improvements before it became commercially available. 44,45 Currently, the most frequently used platforms of this type include the 454 large-scale parallel pyrosequencing system (454 Life Sciences/Roche, Branford, Conn) and the Solexa (Illumina, Inc, San Diego, Calif) sequencing-by-synthesis system, but this field is evolving rapidly with new platform generations regularly being developed. These techniques allow for the “deep” sequencing of clinical samples. The term “deep” refers to the depth of coverage or the average number of times that a single nucleotide is sequenced, and it allows for high levels of accuracy in sequence determination. 46-48 “Deep” sequencing methods can, therefore, achieve thousands to millions of simultaneous sequence reads per run, allowing for the precise determination of the entire genome of an organism. “Ultra-deep” sequencing, which is largely being enabled by platforms such as the 454 sequencing system, provides an even deeper sequence coverage and allows for several additional applications, including amplicon and transcript sequencing, which can detect extremely low-abundance genetic variations. 49 Because pyrosequencing techniques are based on the detection of light produced whenever a nucleotide is incorporated and do not rely on physical separation of DNA bases, these platforms can be run in parallel and miniaturized to any reaction volume that generates detectable levels of light, thus driving down the overall cost of sequencing. Nevertheless, the shorter reads necessitated by the deeper sequencing capacity of next-generation sequencing create additional challenges for sequence assembly and gene annotation. 49 Although the cost of sequencing is decreasing, the analysis of the accumulated data is costly, time consuming, and computationally challenging.
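As a rough illustration of the depth of coverage defined above, the average depth is commonly estimated as the number of reads multiplied by the read length and divided by the genome length. The sketch below assumes a uniform read length; the function name and the example numbers are hypothetical and were chosen only for illustration.

```python
def mean_depth_of_coverage(num_reads, read_length, genome_length):
    """Estimate the average number of times each nucleotide is sequenced.

    Assumes every read has the same length and maps to the genome.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


# Hypothetical run: 400,000 reads of 250 bases over a 10,000-base viral genome.
depth = mean_depth_of_coverage(400_000, 250, 10_000)
print(f"Mean depth of coverage: {depth:.0f}x")
```

In practice, coverage is uneven across a genome, so a per-position depth profile would be more informative than this single average.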
Data analysis is further complicated by the need for computer systems to discriminate between the nucleic acids of the host and those of the newly discovered infectious agent. Despite these challenges, several new viruses have been identified using “deep” sequencing systems. 50-53 Additionally, these systems have made subsequent steps in pathogen discovery, including sequencing the entire genome to facilitate the development of diagnostic techniques and the identification of potential therapeutic targets, more time and cost effective.
Fig 1. Genomics is the study of the complete set of hereditary genetic information in an organism. With the advent of microarray technology and next-generation sequencing, numerous applications have arisen from the field of genomics, including pathogen discovery, epidemiologic advances, and a variety of molecular techniques that allow for precise manipulation of microbial genomes. As DNA begets RNA and protein, so do the fields of transcriptomics and proteomics follow logically from genomics. Transcriptomics, which is a subset of the field of genomics and concerns the collection of messenger RNA transcripts expressed within an organism, has emerged as a result of more sophisticated techniques that allow for the highly sensitive determination of low-abundance mutations, transcripts, and SNPs. Proteomics is the next major “omics” field after genomics, and it focuses on the complement of proteins, their modifications, and their interactions within an organism. With the aid of analytical techniques such as 2-DE, MS/MS, and shotgun proteomics, the field of proteomics has found useful application for the identification of biomarkers and mapping of epitopes that may provide targets for antimicrobial drug development. The field of metabolomics is a natural offshoot of proteomics as it uses many of the same MS techniques. A major difference, however, is that whereas genomics and proteomics provide insight into the potential of cellular processes, metabolomics gives an instantaneous snapshot of what is actually happening in a cell. Each one of these “omics” fields generates massive amounts of rich, detailed data that surpass the capabilities of manual data analysis. This therefore necessitates the incorporation of bioinformatics for the development of computer algorithms that are used to analyze and model data. (Color version of figure is available online.)
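The computational discrimination between host and pathogen nucleic acids mentioned above can be sketched with a simple k-mer filter. This is a deliberately simplified stand-in for real host-subtraction pipelines, which typically rely on sequence alignment; the function names, the k-mer length, the threshold, and the toy sequences are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract_host_reads(reads, host_reference, k=8, max_host_fraction=0.5):
    """Keep reads whose k-mers are mostly absent from the host reference."""
    host_kmers = kmers(host_reference, k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:  # read shorter than k: cannot be classified
            continue
        host_fraction = len(read_kmers & host_kmers) / len(read_kmers)
        if host_fraction <= max_host_fraction:
            kept.append(read)
    return kept


# Toy data (hypothetical sequences, not real genomes).
host = "ACGTACGTTAGCCGATAGGCTTAACGGATCCA"
reads = [
    "TAGCCGATAGGCTTAA",    # copied from the host sequence, so it is discarded
    "GGGTTTCCCAAAGGGTTT",  # shares no 8-mers with the host, so it is kept
]
print(subtract_host_reads(reads, host))
```

The reads that survive this filter are the candidates for assembly and comparison against databases of known microbial sequences.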
Beyond pathogen discovery, sequencing-based methods have the ability to discriminate between closely related strains and can therefore be employed successfully to study viral variations and evolution. Next-generation sequencing techniques also provide an important tool that enables epidemiologists to follow person-to-person transmission more closely and eventually identify the source of an outbreak. This was the case in the recent cholera outbreak in Haiti, where the sequences of the Vibrio cholerae genome from the Haitian outbreak were compared with the sequences of previously isolated strains and found to be more closely related to strains isolated in Bangladesh in 2002 and 2008 than to the South American isolates. 54 Similarly, high-throughput sequencing techniques have allowed for the fast sequencing of the 2009 H1N1 ''swine'' flu isolates and for the determination of the exact origin of each of its gene segments. 55 New sequencing techniques, in combination with other advances in molecular medicine, have also improved the field of vaccine discovery drastically. In the past, vaccines were constituted by crude or partially purified, inactivated viral or bacterial preparations, and often carried the risk of serious allergic reactions or autoimmune diseases because of impurities in the formulation. 56 Furthermore, administration of the vaccine itself could sometimes cause disease, as illustrated by vaccine-induced paralytic polio. 57 Although some of these problems still exist with present-day vaccines, the field of genomics has paved the way for more powerful and focused techniques that allow for a better understanding of the nature of pathogen antigenicity and how our immune system can be tuned to respond optimally.
With the explosion of sequence data from the genomic era, the genomes of a large number of medically important microbes have been decoded and made readily available. As a result, techniques such as reverse vaccinology have emerged that allow for the design of more specifically targeted vaccines. 58 The power of reverse vaccinology lies in its ability to identify novel vaccine targets based on computer-facilitated predictions of the immunogenicity and prevalence of all antigens in a given pathogen without ever having to cultivate the specific microorganism. Once potentially suitable vaccine targets are identified, each gene from the infectious agent that encodes one of these targets is cloned into an expression vector, and the resulting proteins are expressed in vitro and used to immunize an animal model from which antisera will be derived. Each corresponding antiserum is then tested for its ability to neutralize or protect against the original infectious agent. Using this information, vaccines can be designed to elicit specific immune responses against the specified target and are combined with rationally designed adjuvants whose mechanisms of action could improve vaccine immunogenicity. One example of the successful use of this process is the recently developed vaccine for serogroup B Neisseria meningitidis. 59-61 An array of other potential vaccines, including those targeting group B Streptococcus and Streptococcus pneumoniae species, is currently under study. 62
One technique that is derived from the newly sequenced genomes of viral pathogens is that of reverse genetics, which has made it possible to generate viruses entirely by co-transfection of specifically engineered plasmid DNA in cell culture.
This technique has very important implications for vaccine production because it means that very specific changes can be reverse engineered into the sequences of the plasmid DNA used for co-transfection, allowing extremely fine control over the pathogenicity and immunogenicity of the resulting virus. 63 Reverse genetics has been used successfully in the development of promising candidates for vaccines targeting influenza viruses. Several groups have explored reverse genetics strategies to design live-attenuated H5N1 influenza virus vaccines, 64 and a recent phase I clinical trial has demonstrated the safety and immunogenicity of a reverse genetics-derived NS1-truncated variant of an H1N1 influenza virus in male volunteers. 65 Reverse genetics has also been used in pathogenesis studies. In this case, the insertion, mutagenesis, or deletion of specific sequences in the plasmids before transfection can clarify the effect of such modifications on viral infectivity and can help to elucidate the function of any modified protein in the context of the viral life cycle. 66 Finally, the ability to exert such exquisite control over a viral sequence would permit the identification of specific sequence alterations that promote viral attenuation, which is information that can be used for the design and generation of new vaccines. 67-71
Additional applications that have arisen from the genomic revolution include the use of genetically altered viral vectors as an alternative approach to vaccination 72,73 or as a conduit for delivering antibodies as passive immunotherapy 74,75 and the use of sequence data in the design of specific inhibitors and antivirals. These approaches are currently under study and are likely to come to the forefront in the near future.
The genomic revolution has significantly improved the way we detect, prevent, and treat infectious diseases. Future advancements in this field will continue to decrease the time necessary to identify newly emerging human pathogens and produce new vaccines for clinically relevant infections.
Proteomics. Proteomics is a predominantly mass spectrometry (MS)-based technique that deals with the large-scale evaluation of cellular function at the protein level. MS analysis is accomplished by ionizing a sample into its molecular components (eg, with an electron beam, laser, etc) and then passing them through an electromagnetic field that separates them by their ratio of molecular mass to charge (m/z). 76 Proteomic analysis (as opposed to other MS-based techniques) seeks specifically to elucidate the entire complement of proteins and protein modifications and/or interactions in a specimen at a given time. Because proteins are the major effectors of cellular pathways, proteomics provides a way to interpret information encoded within the genome of an organism. Since the term proteomics was first coined in 1995, 14 the field has expanded tremendously, largely because of the development of ionization and analytic methodologies, which have allowed for the more efficient separation and detection of almost all protein components within a sample of interest. 77 Unlike in genomics, no method or instrument currently exists that can identify and quantify the components of a complex protein sample in a single step. 77,78 Instead, a combination of steps with varied methodologies and instrumentation is used.
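To make the m/z ratio described above concrete, the following minimal sketch computes the expected m/z of a peptide ion carrying a given number of charges. It assumes positive-mode ionization in which each charge is contributed by one added proton, which is an assumption introduced here and not a detail from the text; the function name and the example mass are hypothetical.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons


def mass_to_charge(neutral_mass, charge):
    """Return m/z for a molecule that carries `charge` added protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# The same hypothetical 1000.0 Da peptide appears at different m/z values
# depending on how many charges it carries.
for z in (1, 2, 3):
    print(f"z = {z}: m/z = {mass_to_charge(1000.0, z):.4f}")
```

Because one molecule can produce several peaks at different charge states, identification software must consider the charge when converting an observed m/z back to a mass.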
Proteomic platforms are characterized by 2 key steps: (1) protein separation, mostly obtained through either electrophoretic or chromatographic methods, and (2) protein analysis using primarily MS-based techniques. Two-dimensional electrophoresis (2-DE), which was developed in the 1970s, is a method that allows all proteins contained within a complex biological mixture to be separated by their isoelectric points and molecular weights, and then visualized by staining. 79,80 Today, 2-DE-based proteomic approaches remain the gold standard for the separation of complex protein mixtures despite several limitations, including the detection of only relatively abundant proteins in a complex mix, poor separation of insoluble and/or hydrophobic proteins, protein comigration, and the labor-intensive nature of identifying multiple protein spots on the resulting gels. 79 To address the problem of comigration, a modification of 2-DE, known as 2-D differential in-gel electrophoresis (2-D DIGE), was developed. In this technique, multiple protein samples are labeled separately with fluorescent dyes, mixed together, and then resolved on a single gel, eliminating the need for cross-gel comparisons and increasing the overall statistical power by the inclusion of a greater number of samples. 77,81,82 Whereas the development of 2-D DIGE represented some improvement, the need for increased sensitivity of detection and more efficient separation of proteins within a complex mixture remained and was the driving force behind the development of “gel-free” technologies. 79
Gel-free proteomic approaches were made possible by coupling the physical separation capabilities of chromatographic techniques to the mass analysis capabilities of tandem mass spectrometry (MS/MS). 79,80 Chromatography is a technique that involves moving the components to be separated (mobile phase) across an immobile support (stationary phase) such that the migration rates of solutes (ie, proteins) vary depending on their chemical properties. 83 Liquid chromatography (LC), in which the mobile phase is liquid, is one of the most widely used protein separation techniques in proteomics. 83 With the use of smaller particles (less than 2 μm), monolithic columns (using porous channels rather than beads), and extremely high pressures (up to 19,000 p.s.i.), high-performance LC (HPLC) and ultrahigh-performance LC have improved the efficiency and resolution of protein separation drastically. 84 In HPLC, complex protein mixtures are digested by trypsin into their peptide components, which are then moved in a liquid mobile phase at high pressures across a stationary phase that allows for their separation based on polarity. Next, the sorted peptide components are analyzed by MS/MS. The first mass spectrometer is programmed to detect peptides over a range of predetermined masses to distinguish targets of interest from the vast array of proteins typically present in complex biological fluids. The resulting MS1 data are used to pinpoint ions of interest, which are then fragmented, measured, and recorded by the second mass spectrometer (MS2 data). Finally, computer algorithms are used to aid researchers in identifying individual proteins from combined MS1 and MS2 data. 77,79,81,85 As an example, the SEQUEST algorithm, developed in 1994, uses the fragmentation spectra of peptides to infer amino acid sequences and match them to translated genomic sequences. This approach has an identification rate of greater than 100 proteins per run. 86 The newest iteration of this software, MacroSEQUEST, features enhanced capabilities (eg, posttranslational modification scanning) and a dramatic improvement in processing speed compared with the original version. 87
The combination of HPLC and MS/MS, which is now known as “shotgun proteomics,” has proven to be a fast and highly sensitive approach, especially when scaled down to microflow or nanoflow rates. 88,89 As such, shotgun proteomics has not only allowed for the identification of thousands of individual proteins within a given sample with essentially no prior knowledge of the component peptides but also has greatly facilitated the analysis of entire proteomes. Although shotgun proteomics represents a significant technological advancement in the analysis of complex biological mixtures, the concentrations of individual proteins within such samples can be low enough to require levels of sensitivity beyond what even standard HPLC conditions can achieve. By decreasing flow rates and column diameters, this level of sensitivity can be obtained, but only by adapting the system to the special requirements of nanoliter flow rates (20-1000 nL/min). 88 Another shortcoming of shotgun proteomics is the inability of one-dimensional HPLC to provide sufficient separating power to manage the vast complexity of proteins inherent in most biological specimens. To address this issue, multidimensional separating approaches, such as multidimensional protein identification technology (MudPIT), which combines strong cation exchange chromatography with HPLC, were developed. 79,85 Even with such improvements, these advanced shotgun proteomics platforms can still identify only approximately 40% to 70% of the proteins predicted by a microbial genome (less for proteomes of higher complexity). Furthermore, the actual proteome size may be smaller or larger than predicted by the genome because of the absence of expression of some proteins and the alternative splicing and/or posttranslational modification of others. 77,81 Therefore, whereas shotgun proteomics does enable the identification and relative quantitation of a large number of proteins in a sample, it is not optimal for absolute protein quantitation.
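The tryptic digestion step that precedes HPLC separation in shotgun proteomics can be illustrated in silico. The sketch below applies the commonly used rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P); missed cleavages are ignored. The function name and the example sequence are hypothetical.

```python
def tryptic_digest(protein):
    """Split a protein sequence into peptides using a simple trypsin rule.

    Cleaves after K or R, except when the following residue is P.
    Missed cleavages and other enzyme-specific effects are not modeled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # trailing peptide with no terminal K/R
        peptides.append(protein[start:])
    return peptides


# Hypothetical sequence; the R followed by P is not cleaved.
print(tryptic_digest("MKWVTFISLLRPSAYSRGVFK"))
# ['MK', 'WVTFISLLRPSAYSR', 'GVFK']
```

Search algorithms of the kind described above compare the predicted masses and fragment patterns of such theoretical peptides with the measured spectra.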
In addition to identifying proteins within a complex biological mixture, quantitating these components can provide information regarding their dynamics within the context of a biological system. This added focus has led to the emergence of a derivative field known as quantitative proteomics. Protein quantitation can be relative or absolute. Relative protein abundance may be estimated by several methods, including measuring the total number of repeat observations of MS/MS spectra from all peptides in a protein (spectral counting), calculating the percentage of possible peptides per protein observed (peptide counting), or directly comparing the extracted ion chromatograms (XIC, aligned using both mass and elution time) of one sample with those of another sample. 90,91 These label-free techniques have been facilitated greatly by the development of new software programs, such as MZmine, MapQuant, and SuperHirn. 91-93 Alternatively, protein quantification may be accomplished by combining shotgun proteomic techniques with stable isotope labeling. 90,91 Using this approach, samples are labeled differentially with stable isotopes (heavy vs light), then pooled and analyzed together. Peptides common to both samples, but labeled with different isotopes, behave the same on analysis and are detected as peak pairs. By comparing the sizes of these peaks, the relative abundance of the peptides in each sample can be compared. 79 Methods for absolute quantitation of protein abundance are more desirable in many cases (eg, for determining copies per cell). One technology that addresses this need and is now commonplace in clinical settings is called multiple reaction monitoring, also known as selective reaction monitoring (MRM/SRM). This technique exploits the unique capabilities of triple quadrupole (QQQ) MS to analyze precursor and fragment ion pairs. QQQ MS uses 3 sets of 4 circular metal rods (quadrupoles) that run parallel to each other. 
The first quadrupole (Q1) acts specifically to select peptide ions with predefined m/z values. These peptides are then fragmented in the second quadrupole (Q2), which serves as a collision cell. The third quadrupole (Q3) can be set either to scan the entire m/z range to provide information about the structure of the original ion or to pass only fragments with specific m/z values that may be indicative of particular characteristics (eg, functional groups). MRM/SRM can provide targeted information (sensitive to the attomole level) about multiple peptides across a wide concentration scale within a single sample. 90,94,95 The sensitivity of the MRM/SRM technique can also be improved by enriching the target from a complex background using antibody-mediated approaches such as stable isotope standards and capture by antipeptide antibodies and immuno-matrix-assisted laser desorption/ionization. 90 As with relative protein quantitation, absolute protein quantitation possesses its own arsenal of computer programs designed to assist in the analysis of MRM/SRM data, including Skyline MRM/SRM builder, MRMpilot (Applied Biosystems), Peptide Optimizer and Dynamic MRM software (Agilent Technologies, Santa Clara, Calif), MaRiMba, MRMaid, and MRMer. 90 To date, these analytical platforms have given rise to 4 major proteomic applications: (1) global protein profiling (analysis of all proteins within a given biological unit or sample), (2) analysis of protein modifications (phosphorylation, ubiquitination, nitrosylation, etc), (3) analysis of protein-protein interactions, and (4) analysis of protein-gene interactions. 77,78 With regard to the study of infectious diseases, global protein profiling using shotgun proteomics has been used recently to identify and quantitate protein expression in the malaria parasite, Plasmodium falciparum, according to the stage of its life cycle and its host compartment. 
These studies resulted in more than 200 candidate proteins for possible stage-specific drug or vaccine targets. In addition, they generated information on a large set of ''orphan'' proteins (proteins that could be mapped to the genome but were not found in the set of previously known proteins), protein interactions, and protein expression at each stage of the parasite's life cycle. 96,97 An analysis of protein modifications has been used to identify phosphoprotein signaling pathways that are altered in Rift Valley fever virus-infected airway epithelial cells and to uncover potential targets of reactive nitrogen intermediates in Mycobacterium tuberculosis (M. tuberculosis), thus informing on potential targets of novel antiviral and antimycobacterial compounds. 98,99 Finally, analyses of protein-protein and protein-gene interactions have been used to study the relationship between Epstein-Barr virus (EBV) infection and nasopharyngeal carcinoma, as well as alterations in host cellular processes associated with HIV-1 viral replication. 100,101 Recent studies have aimed to establish the context in which these types of interactions occur by combining proteomic-based techniques with either cryo-electron tomography or ''membrane shaving'' with proteinase K digestion to examine specific cellular localization. 81 Such approaches have been used successfully to map the spatial proteome of Leptospira interrogans 102 and the membrane proteome of Staphylococcus aureus (S. aureus). 103 Studies such as these ultimately might lead to an improved understanding of microbial pathogenesis and provide important clues for the identification of novel biomarkers and antivirals.
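As an illustration of the relative quantitation strategies described above, the following minimal sketch computes label-free ratios from spectral counts and a heavy/light ratio from isotope-labeled peak pairs. It is not the algorithm of any of the software packages cited; the function names, the normalization by total spectra, the pseudocount, and the use of a median are all assumptions made for the example.

```python
from collections import Counter
from statistics import median


def spectral_counts(psms):
    """Count MS/MS spectra assigned to each protein.

    psms: iterable of (spectrum_id, protein_id) pairs (hypothetical input format).
    """
    return Counter(protein for _spectrum, protein in psms)


def spectral_count_ratios(counts_a, counts_b, pseudocount=0.5):
    """Relative abundance (sample A / sample B) from spectral counts.

    Counts are divided by the total spectra in each sample so that runs of
    different depth can be compared. The pseudocount is an assumption that
    avoids division by zero for proteins seen in only one sample.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    ratios = {}
    for protein in sorted(set(counts_a) | set(counts_b)):
        share_a = (counts_a.get(protein, 0) + pseudocount) / total_a
        share_b = (counts_b.get(protein, 0) + pseudocount) / total_b
        ratios[protein] = share_a / share_b
    return ratios


def heavy_to_light_ratio(peak_pairs):
    """Median heavy/light peak-area ratio over the peptides of one protein.

    peak_pairs: list of (light_area, heavy_area) tuples, one per peptide.
    """
    return median(heavy / light for light, heavy in peak_pairs)


# Toy data: protein P1 is observed more often in sample A than in sample B.
sample_a = [(1, "P1"), (2, "P1"), (3, "P2"), (4, "P1")]
sample_b = [(1, "P1"), (2, "P2"), (3, "P2"), (4, "P2")]
ratios = spectral_count_ratios(spectral_counts(sample_a), spectral_counts(sample_b))
```

In practice, spectral counts are usually also corrected for protein length and for peptides shared between proteins; those refinements are omitted here.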
Because proteins carry out many functional biological activities, the proteomic analysis of an organism provides a layer of detail beyond that of genomics. Although it is still an emerging technology, proteomics has already had a profound impact on clinical and biological research by informing on the pathogenesis of disease, facilitating biomarker discovery and aiding in the identification of potential drug targets. New developments in MS platforms, MS-based tissue imaging techniques, and combined proteomic/genomic approaches will likely extend the impact of proteomics to drug design and testing as well as pathogen identification and discovery, which will continue to advance our understanding of complex biological processes. 77,78 Metabolomics. The newest field of systems biology, metabolomics, has been defined as ''the comprehensive and simultaneous systematic determination of metabolite levels in whole organisms and their changes over time as a consequence of stimuli such as diet, lifestyle, environment, genetic effect and pharmaceutical intervention, both beneficial and adverse.'' 104 A more concise definition of metabolomics might be the global profiling of all small molecule metabolites contained within a sample of interest at a given time. Metabolites, which are the intermediates and products of all cellular processes, typically fall into one of 4 categories: amino acids, nucleotides, lipids, and sugars. Moreover, metabolites represent the level at which most pharmaceuticals exert their effects. Therefore, by studying the production and consumption of these chemical fingerprints, one can gain insight into the dynamic functions of any biological system. 104,105 The analytic techniques employed in metabolomic studies are largely similar to those used in proteomic studies and are based primarily on either nuclear magnetic resonance (NMR) spectroscopy or MS. 
The typical workflow involves mechanical extraction coupled to a rapid cooling step that halts metabolism in its native state, followed by detection (either via NMR or MS) and data analysis. MS-based metabolomics requires the additional separation of components by either gas chromatography or LC for enhanced molecular identification. For each molecule within a sample of interest, LC-MS-based metabolomics generates 3 pieces of data: (1) chromatographic retention time, which informs on the chemical structure; (2) an m/z ratio, which informs on the molecular mass; and (3) abundance. The retention time and m/z ratio may then be queried against a library of known standards to determine the identity of each metabolite. 106 An NMR-based metabolomic analysis, by contrast, provides a nondestructive way to view changes in metabolite levels either in vitro or in vivo more broadly. Moreover, it provides more detailed structural information than MS. However, because MS analysis provides better resolution and sensitivity, it remains the predominant methodology for analyzing metabolic data. All metabolomic analyses result in complex, multivariable data sets that require visualization software for spectral analysis as well as bioinformatics and statistical methodologies for interpretation. 104,107 Metabolomic analysis of data sets produced by NMR and MS yields 2 major applications: steady-state metabolite profiling and labeled flux analysis. Steady-state metabolite profiling may be global (profiling of all small molecule metabolites within a sample) or targeted (analysis of only prespecified metabolites of interest), and it has been used to discover new biomarkers, inform on dysregulated pathways associated with disease, and generate information about cellular functions. 
Labeled flux analysis, in which radiolabeled isotopes are used to analyze changes in the amount of a given metabolite that is used and produced in a particular metabolic pathway, is an approach that yields more detailed information about the dynamic nature of biological pathways and can be used to better identify potential therapeutic targets. 107 Recently, Henderson et al 108 used a steady-state LC-MS-based metabolomics approach to identify how bacterial strains associated with urinary tract infections (UTIs) differed from those that colonize the gastrointestinal tract. Their results indicated that urinary Escherichia coli (E. coli) strains preferentially express 2 small molecules, yersiniabactin and salmochelin, which are known to be important in iron scavenging and support bacterial survival and growth. These findings suggest that new antibiotics directed against yersiniabactin- or salmochelin-producing E. coli strains might be an improved and more targeted strategy to prevent recurrent UTIs. In another example, Munger et al 109 used changes in metabolic flux between uninfected and human cytomegalovirus-infected cells to identify pathways that were upregulated by viral infection, thus providing important information regarding potential targets for novel antivirals aimed at blocking viral replication.
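The library query used in LC-MS-based metabolite identification, in which a measured retention time and m/z value are compared with those of known standards, can be sketched as follows. This is only an illustrative example: the tolerance values, the dictionary layout, and the library entries (named standard_A to standard_C, with made-up numbers) are assumptions and do not come from any real spectral library.

```python
def match_metabolite(feature, library, ppm_tol=10.0, rt_tol=0.2):
    """Return names of library standards that match a measured LC-MS feature.

    feature: dict with 'mz' (mass-to-charge) and 'rt' (retention time, min).
    library: list of dicts with 'name', 'mz', and 'rt' for known standards.
    ppm_tol: allowed mass error in parts per million (assumed value).
    rt_tol:  allowed retention time difference in minutes (assumed value).
    Matches are returned in order of increasing mass error.
    """
    matches = []
    for standard in library:
        ppm_error = abs(feature["mz"] - standard["mz"]) / standard["mz"] * 1e6
        rt_error = abs(feature["rt"] - standard["rt"])
        if ppm_error <= ppm_tol and rt_error <= rt_tol:
            matches.append((ppm_error, standard["name"]))
    return [name for _error, name in sorted(matches)]


# Hypothetical library; the values are invented for illustration only.
library = [
    {"name": "standard_A", "mz": 180.0634, "rt": 5.0},
    {"name": "standard_B", "mz": 180.0634, "rt": 7.0},  # same mass, elutes later
    {"name": "standard_C", "mz": 181.0000, "rt": 5.0},  # different mass
]
hits = match_metabolite({"mz": 180.0640, "rt": 5.1}, library)
```

The example shows why both values are needed: standard_A and standard_B share a mass and can be distinguished only by retention time.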
Lipidomics and glycomics are two growing fields related to metabolomics that merit special mention. 110,111 Lipidomics is the large-scale study of the networks and pathways employed by all cellular lipids, and it employs largely the same tools as those used for metabolomics, but with specific modifications that accommodate the unique characteristics of lipids. 112 Lipids have key functions in signaling pathways, energy storage, and the structural integrity of cell membranes. They also play important roles in host-pathogen interactions and immunomodulation. 111,113 Glycomics is the comprehensive study of all sugars in an organism. 114,115 Because cellular sugars can be simple sugars, glycoproteins, or glycolipids, there is no common experimental approach for their analysis. Instead, glycomic studies use a combination of the techniques found in metabolomics, proteomics, and lipidomics. 110 Glycoconjugates participate in a variety of biological processes associated with cell adhesion and migration, bacterial and viral recognition, signaling pathways, and innate immunity. 115 Therefore, the study of changes in the lipid and sugar components of a biological system can provide powerful clues about disease-related mechanisms and novel therapeutic targets. Indeed, these approaches have shown particular promise in the study of infectious disease. Lipidomics has been used to uncover the mechanisms involved in enhancing the drug susceptibility of Candida albicans, 116 to gain insight regarding the intracellular processes and lipid composition of HIV, 117,118 CMV, 119 and HCV, 120,121 and to identify lipidomic modifications that occur in the host as a result of bacterial or viral infection. 
122 The utility of glycomics has also been demonstrated recently in the characterization of specific interactions between viral proteins and various forms of sialic acid, 123,124 the detailed analysis of how lipopolysaccharide from Coxiella burnetii may contribute to pathogenesis, 125 and the discovery that the glycan coats of HIV-1 virions match that of the T cell immunomodulatory microvesicles from which they derive, suggesting a mechanism by which the virus can evade the antiviral immune response. 126 The aforementioned examples demonstrate the potential of metabolomics as a tool for enhancing development of novel diagnostics and therapeutics.
Like proteomics, metabolomics has already been used as a vehicle for investigating disease pathogenesis, potential biomarkers, and therapeutic targets. Moving forward, improvements in automated metabolite identification will expand exponentially the potential uses of metabolomics-based approaches. Additionally, the integration of metabolomics data with other global profiling techniques, such as genomics and proteomics, will allow for a deeper understanding of biological pathways than possible to date, with unparalleled possibilities for clinical application.

THE ''OMICS'' OF INFECTIOUS DISEASE
With the advent of genomics, proteomics, and metabolomics came an explosion of other ''omics'' sciences. This convention arose, in part, because the suffixes ''ome'' and ''omics'' succinctly describe a holistic way of looking at relationships that exist within a relatively complex scientific domain; however, this trend also reflects the growing involvement of bioinformatics and the internet as a way to integrate complex biological data. 127 Concurrent to the significant advances in ''omics'' sciences that have allowed for a better understanding of the molecular and cellular processes that occur within humans, researchers also began to appreciate the number and complexity of organisms that resided on and in the human body. After the publication of the human genome sequence in 2001, 128,129 scientists began to call for a ''second human genome project'' that would catalog the complete inventory of microbial genomes at major sites of colonization on the human body. 130,131 Because microbiota are believed to play a fundamentally important role in human health and disease, 132 a microbial inventory in combination with a comprehensive analysis of host gene expression would provide insights into the mechanisms of this interaction. A landmark study that was published in 2005 by Eckburg et al 133 used the 16S ribosomal RNA sequencing technique to characterize the endogenous gastrointestinal microbial flora in 3 individuals. The authors concluded that the bacterial communities varied tremendously from one individual to the next. This and other studies that came after indicated that a larger scale investigation of the human microbial ecosystem was warranted. Thus, the NIH initiated a 5-year project called the Human Microbiome Project (HMP) in 2007. 
134 The HMP uses various ''omics'' approaches to characterize the complexity of microbial communities, determine the existence of a core microbiome at various colonization sites on and in the human body, and examine the relationship between changes in the microbiome and human health (Fig 2). 135,136 To date, the HMP has determined that substantial alterations in the human microbiome are important for a variety of disease states, including psoriasis, 137 sexually transmitted infection of the male urogenital tract, 138 Crohn disease, 139 gastroesophageal reflux disease, 140 and others. There is also a growing interest regarding the effects of the gut microbiome on the development of the immune system. A recent study showed that mice treated with antibiotics that depleted specific gut bacterial populations would have an impairment of virus-specific cell-mediated and humoral responses in the lung after infection with influenza virus. 141 These data will require confirmation and more in-depth studies that address mechanism; however, they raise a series of new questions about the role of the bacterial microbiota in human health. Not only has the HMP made major advances in our understanding of the human microbiome, but also it has contributed significantly to the growing field of metagenomics by developing broadly applicable techniques for analyzing massive amounts of sequence data. For example, Langmead et al 142 designed a cloud-computing software tool called Crossbow that can analyze large DNA sequence data sets more rapidly (38-fold coverage of the human genome in 3 h) to perform the alignment and detection of single-nucleotide polymorphisms (SNPs) without sacrificing accuracy. 
Another group developed a robust software program called QIIME (pronounced ''chime'' and stands for quantitative insights into microbial ecology) that can analyze pyrosequencing data from thousands of heterogeneous experimental data sets in order to acquire information rapidly about various microbial communities. 143 Although the HMP does fund at least 1 project aimed at characterizing the relationship between viruses and human illness, 144 the primary focus of this project is currently with bacterial populations. Therefore, although not as well-defined as the HMP, at least 2 other infectious disease-related ''omes'' have emerged in parallel: the human virome and the human mycobiome, which comprise the myriad of viruses and fungi, respectively, that inhabit the human biosphere and play roles in health and disease (Fig 2). In 2003, Anderson et al 36 proposed a system for the global screening of viruses in large human populations. Although this global screening is not limited to viruses that persist chronically in the human body as is the case for the commensal bacteria populations included in the HMP, such a system could lay the groundwork for the rapid detection of outbreaks and the systematic discovery of novel human viruses, as well as provide better clues about etiologies in human disease. For example, 1 study investigating the human virome uncovered differences in the metabolic states of DNA viromes in the respiratory tract of cystic fibrosis (CF) compared with non-CF patients. 145 Another group demonstrated that bacteriophage populations became more similar in the guts of individuals who were on a diet of similar fat content, thus raising the possibility that virus populations could be engineered to combat obesity. 146 In contrast, studies regarding the human mycobiome seem to still be in their infancy as the first study describing a baseline mycobiome in humans was published only recently. 
147 In this study, the authors used multitag pyrosequencing to characterize the fungi present in the oral cavity of healthy individuals and identified differences and similarities in white and Asian populations.
In light of major advancements in the genotypic characterization of bacterial, viral, and mycotic pathogens associated with humans, it is clear that an additional holistic approach will be required to understand the phenotype of this interaction more fully and how it may affect human health and disease. Thus, the emerging field of infectomics will aim to study the structural and functional genomics and proteomics of microbial infections more efficiently, accurately, and integratively (Fig 2). Whereas the microbiome, virome, and mycobiome will identify the comprehensive inventories of microbes that inhabit the human body, the infectome will examine the distinct signatures of each microbe as they vary depending on host or microbial gene expression, tissue type infected, and other dynamic characteristics of both host and pathogen. 148 By networking the infectome with the human diseasome (Fig 2), which links certain diseases together based on common gene expression profiles, 149 it will be possible to identify links between microbial infection and the etiology of human disease, which could inform the design of better vaccines and therapeutics. 150

Fig 2. The ''omics'' paradigm for host-microbe interactions. State-of-the-art ''omics'' technologies have paved the way for the comprehensive characterization of a variety of hosts and microbes. The microbiome, virome, and mycobiome are 3 microorganism-specific data sets that have arisen from the ''omics'' revolution. Although each subset comprises distinct information about its target (bacteria, viruses, fungi, etc), they do not exist in isolation. An increasing number of studies is focused on the intersection between the collective set of microbial ''omics'' and the profile of gene expression that is associated with human disease (diseasome). Thus, investigation into the complex dynamics that occur during infection (infectome) can provide significant insight regarding the etiology and progression of disease, as well as lead to the identification of targets at which to aim therapeutic intervention.

TAKING IT ALL TOGETHER: A LOOK AT SYSTEMS BIOLOGY
The need to manage the rapidly accumulating number of sequences and massive amounts of data from high-throughput platforms has required the development of more sophisticated computer programs to analyze and integrate the data. Thus, systems biology has developed into a new field that aims to understand the complexity of pathogen-host interactions by using computational integration of high-throughput experimental data and by modeling molecular networks via bioinformatics. Intrinsic to this approach is the idea that biological systems display ''emergent properties,'' which are complex patterns that arise from a multiplicity of relatively simple interactions. Therefore, a major objective of the systems biology approach with regard to infectious disease is to make predictions about the dynamic behavior of biochemical networks involved during infection with any pathogen. 151 Thus, whereas small-scale studies look only at one side of the interaction, systems biology gives a global perspective of the events.
Under the umbrella of systems biology, 4 main approaches are relevant to the study of infectious diseases: the systems biology of viral and bacterial pathogens, systems immunology, systems vaccinology, and high-throughput drug discovery. Common to each approach is the goal of integrating high-throughput multiomics data to construct predictive models of the networks and dynamic interactions between pathogens and their hosts to identify preventive and therapeutic targets for clinical development more quickly.
Systems biology of viral and bacterial pathogens. To recognize the importance of systems biology for the understanding, integration and practical application of the results of high-throughput screenings in the field of infectious disease and host-pathogen interaction, the National Institute of Allergy and Infectious Disease (NIAID) has created the Systems Biology for Infectious Diseases Research program. The research activities of this program were separated initially into 4 centers: The Tuberculosis (TB) Systems Biology Center, The Systems Virology Center, The Center for Systems Influenza, and The Center for Systems Biology for Enteropathogens. Each center will use both computational and experimental methodologies to analyze and model host-pathogen interactions comprehensively to determine the extent to which molecular and cellular networks are induced or altered during the course of infection. The overarching goal of this program is to generate and develop reagents, protocols, and statistical models that will be rapidly made available to the scientific community to promote swift advancements in scientific research and translation to clinical practice. 152 The purpose of the TB Systems Biology Center is to model the molecular pathways of M. tuberculosis under conditions relevant to TB pathogenesis using a combination of experimental and computational techniques. For example, chromatin immunoprecipitation followed by next-generation DNA sequencing analysis (ChIP-seq) is a powerful technique that can be used to map global DNA binding sites precisely for any protein of interest so that the regulation of gene expression within the context of pathogenesis can be more fully understood. 153 By combining this technique with a variety of ''omics'' data (transcriptomics, proteomics, glycomics, metabolomics, lipidomics, etc) during in vitro and in vivo growth of M. 
tuberculosis, the TB Systems Biology Center aims to measure the baseline metabolic and topologic state of the gene regulatory network of M. tuberculosis, delineate global alterations that occur in both the pathogen and host during infection of host macrophages, and integrate profiling data to develop predictive computational models of bacterial regulatory and metabolic networks. Using these models, predictions can be made regarding the state of these networks during infection with M. tuberculosis, leading to additional hypothesis-driven research. 151,154 The Systems Virology Center and the Center for Systems Influenza are similarly aimed at using systems biology to comprehensively analyze and model alterations that occur in molecular and cellular events during the course of virus infection and to provide deeper understanding of the processes that lead to the initiation and progression of infectious disease. The Systems Virology Center will focus on creating a unified view of the mechanisms of pathogenesis for highly pathogenic respiratory viruses including the 2009 pandemic H1N1 influenza virus, the H5N1 avian influenza virus, and SARS-CoV. 151,155 In contrast, the Center for Systems Influenza will focus more specifically on defining host innate immune responses to infection with influenza virus strains of varying pathogenicities, as well as characterizing regulatory networks associated with concurrent secondary infections like S. aureus. 151,156 To date, several specific achievements have come out of these centers. Using a technique called weighted gene correlation network analysis, in which relationships between genes that are coexpressed are weighted based on their strengths or capacities, 157 the transcriptional response of human bronchial epithelial cells infected with a highly pathogenic avian H5N1 influenza virus was profiled and compared with cells infected with a less virulent strain to identify potential novel mediators of pathogenesis. 
158 Another group performed a comprehensive proteomic analysis using LC-MS/MS to identify a large number of human host proteins associated with the polymerase complex of an H5N1 influenza virus. In addition to previously published interactions, this method identified novel interactions of the viral polymerase with host mitochondrial, apoptosis-inducing, innate antiviral, and RNA polymerase accessory proteins that could provide new insight into mechanisms by which the viral polymerase may contribute to host cell pathogenicity. 159 Additionally, this center has used next-generation sequencing and analysis of the host transcriptome in response to influenza virus and SARS-CoV infections to investigate beyond the proteome and characterize the roles of non-protein-coding RNAs in regulating the host innate immune response. 160 Finally, the Center for Systems Biology for Enteropathogens uses computational and experimental ''omics'' methodologies to analyze and model the interactions that occur between a host macrophage and the Salmonella enterica (S. enterica) and Yersinia pestis bacterial species to shed light on mechanisms behind the outcome of infection (either bacterial replication and host cell death or containment of the pathogen). 151,161 Researchers affiliated with this center have developed and used constraint-based reconstruction and analysis methods to simulate, analyze, and predict the way metabolic processes behave in various microorganisms. 162,163 In another study, novel virulence factors of S. enterica, including a new class of translocated effectors, were discovered and characterized through sample-matched multiomic measurements of various mutant strains. 164 Other computational approaches, such as support vector machine-based identification and evaluation of virulence effectors, have also been used to facilitate the identification of secreted bacterial effector proteins from genomics sequence information. 165
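To make the idea behind weighted gene correlation network analysis more concrete, the sketch below builds a weighted coexpression network in which each pair of genes is connected with a strength equal to the absolute Pearson correlation of their expression profiles raised to a soft-thresholding power. This is a simplified illustration and not the analysis performed in the cited study; the gene names, the expression values, and the choice of power (beta = 6) are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (a constant profile would give a
    zero variance and a division by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def weighted_adjacency(expression, beta=6):
    """Connection strength |cor(g, h)| ** beta for every pair of genes.

    expression: dict mapping gene name -> list of expression values measured
                across the same samples (hypothetical input format).
    beta:       soft-thresholding power; larger values suppress weak
                correlations more strongly (6 is an assumed choice here).
    """
    genes = list(expression)
    adjacency = {}
    for i, gene in enumerate(genes):
        for other in genes[i + 1:]:
            r = pearson(expression[gene], expression[other])
            adjacency[(gene, other)] = abs(r) ** beta
    return adjacency


# Toy profiles: g1 and g2 rise together, whereas g3 is only weakly related.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [1.0, -1.0, 1.0, -1.0],
}
network = weighted_adjacency(expression)
```

Because the weights are continuous rather than thresholded to 0 or 1, strongly coexpressed pairs such as g1 and g2 dominate the network while weak relationships are retained at low weight.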

SYSTEMS IMMUNOLOGY
To understand fully the mechanisms by which drugs and vaccines combat disease, one must also grasp the way in which the human immune system responds to its infectome. Unfortunately, as science analyzes the fine details of immunity with higher and higher resolution, there is actually a decreasing ability to predict how the system behaves as a whole in the context of infection. To that end, researchers have begun to take systems biology approaches to immunology. One example of an effort to meet the challenge of amassing high-quality quantitative data using the available technical advances that ''omics'' sciences have provided while organizing and analyzing that data in ways that are meaningful to the study of immunology is the development of the NIAID Program in Systems Immunology and Infectious Disease Modeling in 2006, which became the Laboratory of Systems Biology (LSB) in 2011. 166,167 Rather than being a collection of independent laboratories, the LSB is an integrated group of scientists and staff who play a major role in fostering the growth of systems biology efforts both within the United States and abroad. 167 At the forefront of this field is a rapidly increasing collection of systems biology techniques that have been applied to the unique set of problems encountered at the intersection between the human immune system and infectious disease. For example, computer simulations of the molecular dynamics of association, dissociation and posttranslational modifications of large peptide folding processes are used frequently in the field of systems immunology to characterize receptor-ligand interactions that occur between the host and pathogen. One study used this technique and the model system of EBV to demonstrate that some T cell receptor structures appear more frequently than others because of certain pattern recognition receptor (PRR)-like traits that occur within the T cell repertoire. 
168 Another group used constant pH molecular dynamics simulations to investigate the mechanism behind the ligand-binding activity of an innate immune system PRR that recognizes a pH-sensitive microbial protein to understand more completely how this type of interaction plays a role in immune surveillance and clearance of apoptotic cells. 169 In addition to simulations that model the dynamics of molecular-level interactions, several programs have emerged that combine the techniques of systems biology with information provided by data-driven prediction methods. Because of advances in the predictive capacity of systems biology research, the newest version of the computer simulation software, C-ImmSim, can model the immune response in silico more accurately. Recently, the creators of C-ImmSim demonstrated a variety of new applications now accessible by their improved program, including the modeling of primary and secondary responses to prime/boost immunization, the way in which immunodominant peptides lead to affinity maturation, and the relationship of diversity in MHC to ''time to AIDS'' in HIV-infected patients. 170 Another simulation framework that has made its way into the public domain is the Multiscale Systems Immunology platform, which aims to model the early immune response to vaccination and natural infection by incorporating realistic biophysics and intracellular dynamics. 171

SYSTEMS VACCINOLOGY
The growing field of systems vaccinology combines modern analytic tools of systems biology with data on human immunological response patterns to vaccines in order to uncover molecular signatures of vaccine efficacy and to guide the design and evaluation of new vaccines. Most recently, systems vaccinology has been used to study the immune responses to vaccines against yellow fever virus (YFV). 172,173 Although the YFV vaccine has been used in humans since 1937, little was known about how the vaccine protected from infection. With the advent of systems vaccinology, scientists examined comprehensively the initial molecular signatures in individuals vaccinated against YFV not only to elucidate the mechanisms behind the vaccine's efficacy but also to establish predictive correlates for the later development of protective responses. 172,173 Systems vaccinology was also used to establish the predictive correlates of a protective antibody response after administration of the trivalent inactivated influenza virus vaccine, 174 and similar approaches have been applied to the study of immune responses to Brucella melitensis and fungal infections. [175][176][177] Ideally, subsequent development of these models will define inherent and infection-mediated perturbations in the genetic regulatory networks of the host and will enable predictions about how the host will respond to vaccines. Based on these models, we can design vaccines that induce optimal immune responses without toxic effects, thus improving vaccine safety profiles. 75,178 Once the correlates of vaccine-induced protective immune responses against infection are well defined with the help of systems vaccinology, new methodologies must be identified that can apply this information to the design and development of new vaccines. 
The likelihood that commonalities exist among many different vaccines regarding the correlates of successful immune responses will drive faster and more accurate ways of screening vaccine candidates for their effectiveness. For example, Pulendran et al 179 predicted the development of a vaccine chip microarray, similar to the MammaPrint prognostic chip that was developed for breast cancer, which will be able to predict the immunogenicity of any vaccine. Although this specific technology may so far exist only in theory, several systems biology approaches are currently being used to aid antigen discovery and vaccine development. Phage display is one such technique that enables the screening and identification of unique molecules with highly specific affinities for targets of interest. 180 This approach works by introducing a foreign sequence at an appropriate site within a bacteriophage coat protein gene, which results in the display of the encoded protein on the phage surface. If random sequences are inserted, then it is possible to obtain a library containing a large number of distinct proteins, each displayed on a different phage. By immobilizing a DNA or protein target on a solid surface, libraries can be ''panned'' for phages that recognize the target. Phages that do not bind to the target are washed away, whereas specific phages can be eluted and used to produce more phage, thus enriching the population for relevant binding phages and enabling the isolation of specific proteins of interest for use in subsequent applications. 181 Phage display is a powerful technique because of its many applications for vaccine and drug development. One application of phage display technology is the identification of the epitope repertoire of antibodies in postvaccination human sera.
To this end, Khurana et al 182 used H5N1 influenza virus whole genome-fragment phage display libraries to evaluate the quality of antibody responses in humans after immunization with a novel virus-like particle vaccine. This approach also enabled the authors to determine the most appropriate dosage of the vaccine to induce the desired antibody repertoire and will help them to evaluate the usefulness of incorporating an adjuvant into the vaccine.
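The enrichment achieved by repeated rounds of panning can be illustrated with a minimal sketch. The Python function below is not taken from the cited studies; it assumes a deliberately simplified model in which each round retains target-binding phages with one fixed probability and nonbinding phages with a much lower one, and in which amplification preserves the resulting proportions. The function name and all parameter values are hypothetical.

```python
def binder_fraction_after_panning(initial_fraction, retain_binder,
                                  retain_nonbinder, rounds):
    """Return the fraction of target-binding phages after each panning round.

    Simplified, hypothetical model: every round keeps binders with
    probability `retain_binder` and nonbinders with probability
    `retain_nonbinder`; amplification is assumed not to change proportions.
    """
    fraction = initial_fraction
    history = [fraction]
    for _ in range(rounds):
        kept_binders = fraction * retain_binder
        kept_nonbinders = (1.0 - fraction) * retain_nonbinder
        fraction = kept_binders / (kept_binders + kept_nonbinders)
        history.append(fraction)
    return history


# Illustrative values only: 1 binder per million phages, 50% of binders and
# 0.1% of nonbinders survive the washing step in each round.
history = binder_fraction_after_panning(1e-6, 0.5, 0.001, rounds=3)
for round_number, fraction in enumerate(history):
    print(f"round {round_number}: binder fraction = {fraction:.6f}")
```

Under these assumed retention rates, each round multiplies the odds of drawing a binder by 500, which is why a handful of rounds can suffice to turn a rare clone into the dominant population.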

HIGH-THROUGHPUT DRUG DISCOVERY
In the war against infectious disease, the ability to develop nontoxic and fast-acting therapeutic and prophylactic drugs is just as important as being able to design safe and efficacious vaccines. Besides its usefulness in evaluating the efficacy of vaccine-induced immune responses, phage display can also be used to identify and synthesize monoclonal antibodies (mAbs) that can be used for therapeutic purposes in a variety of infections. An additional benefit of antibody identification through phage display is that because phage libraries can be generated from any set of sequences, it is possible to display human antibody fragments, thus avoiding the side effects inherent in humanized or chimeric antibodies. 183 This approach has produced several mAbs with potential therapeutic applications against agents of infectious disease, including influenza A virus, 184 Clostridium difficile, HIV, viral hepatitis, rabies, Pseudomonas aeruginosa, methicillin-resistant S. aureus, and Bacillus anthracis (B. anthracis). 185 Of these, the mAb targeting the protective antigen of B. anthracis, raxibacumab, is so far the only one that has met all criteria for approval by the U.S. Food and Drug Administration (FDA). 186 With the growing emergence of antibiotic-resistant bacterial strains and the ever-present public health threat of pandemic viral infections, there has been an increasing pressure on the scientific community to discover, test, and produce novel antimicrobial drugs. High-throughput methods for synthesis and screening an increased number of targets resulting from the ''omic'' revolution have significantly advanced the search for new therapeutics. Combinatorial chemistry and diversity-oriented synthesis are 2 of the most important new methodologies in drug discovery because they have drastically reduced the time and costs associated with producing effective, marketable, and competitive new drugs. 
The power of these techniques comes from the ability to synthesize extremely large libraries of high-quality compounds with complex molecular diversity that increases their likelihood of interacting with biological macromolecules in a selective manner with high affinity. 187 Phage display libraries have also been used widely in screening for novel drug candidates based on specific protein-protein interactions. Although they are frequently cheaper than most chemical synthesis methodologies, they still have several limitations regarding stop codon placement within displayed sequences, inefficient transformation methods, peptide length, and restriction to naturally occurring amino acids. Nevertheless, several proof-of-principle studies have been performed recently to overcome these limitations, including 1 study demonstrating that the incorporation of genetically encoded unusual amino acids might be feasible. 188 After the synthesis of candidate compounds, high-throughput screening is necessary to test the compounds in these large natural or synthetic libraries in in vitro or in silico assays and to identify those that are highly specific and biologically active against a defined target of interest. Although many different assays are used in high-throughput screening, the general principle is that the specific molecular target or pathway of interest is combined systematically with each possible drug compound and then checked in manual or automated platforms for a positive result, or ''hit.'' The resulting ''hits'' are then ranked according to how strongly they interact with the target, and additional confirmatory experiments are performed to characterize the interaction. 189 The incorporation of automation, computational methods, and nanotechnology has allowed for enormously increased efficiency in both the development of combinatorial libraries and the high-throughput screening of potentially useful drugs.
To illustrate the rapidly accelerating success of this method, a recent paper has reported achieving ultrahigh-throughput screening speeds of 100 million individual enzyme reactions in 10 h using only small reagent volumes; this represents a 1000-fold increase in speed and a 1 million-fold reduction in cost compared with standard high-throughput techniques. 190 A negative consequence of the massive amounts of data generated from high- and ultrahigh-throughput technologies is that bottlenecks will occur at the level of drug development and optimization. In the past decade, more than 3000 expressed proteins have been described as potential drug targets, yet only 20% of those have been exploited commercially. 191 Therefore, despite the optimism engendered by high-throughput screening and other advances in drug discovery and validation, numerous obstacles remain for the translation of this vast amount of data into clinical application.
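The scale of the reported speed-up is easier to appreciate when expressed as a rate. The following Python sketch uses only the figures quoted above (100 million reactions, 10 h, a 1000-fold increase in speed); the variable and function names are illustrative, and the implied figures for a standard platform are a back-calculation from those numbers rather than values reported in the cited study.

```python
REACTIONS = 100_000_000   # enzyme reactions screened (figure quoted in the text)
RUN_HOURS = 10            # duration of the ultrahigh-throughput run
SPEEDUP = 1000            # reported fold increase over standard screening


def reactions_per_second(total_reactions, hours):
    """Average screening rate in reactions per second."""
    return total_reactions / (hours * 3600)


ultra_rate = reactions_per_second(REACTIONS, RUN_HOURS)
# Back-calculated, not reported: the rate and run time a standard platform
# would need if it were exactly 1000-fold slower.
standard_rate = ultra_rate / SPEEDUP
standard_hours = REACTIONS / standard_rate / 3600

print(f"ultrahigh-throughput rate: {ultra_rate:.0f} reactions per second")
print(f"implied standard rate: {standard_rate:.2f} reactions per second")
print(f"implied standard run time: {standard_hours / 24:.0f} days")
```

On these numbers, a screen that takes 10 h on the ultrahigh-throughput platform would occupy a 1000-fold slower platform for 10 000 h, or more than a year of continuous operation.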

FROM THE SIDE OF THE HOST: GENETIC PREDISPOSITION TO INFECTIOUS DISEASE AND PERSONALIZED MEDICINE
The human genome project brought about a conceptual shift in the focus of medicine, from treatment to the prediction of disease and prevention of side effects. The resulting principle is that if the manifestation of disease is influenced by interactions between the host and the infectious agent, then characterizing the host genetic correlates to disease would help to predict individual susceptibility to specific infection-related pathologies. The recent technological advances in sequencing and computational biology have facilitated the identification of host factors that predispose or affect the response to disease. Therefore, therapies can now be tailored according to the genetic makeup of the host and the characteristics of the microbe responsible for disease. Francis Collins suggested that genetics could be integrated with medicine either by using classic or ''old'' genetics to develop genetic tests for monogenic diseases and prenatal diagnostics or by using ''new'' genetics to predict patient responses and implement personalized medicine. 192 The clinical application of personalized medicine principles may have seemed almost visionary a few years ago, but now this approach is used routinely in the treatment of some diseases. For example, genomic data are used to make accurate predictions about the susceptibility of a patient to disease or about how he/she will respond to treatment. Specifically, the application of personalized medicine is common in diseases such as cancer caused by some solid tumors or hematologic malignancies where patients are genetically tested to determine their likelihood of responding to therapy or having a serious adverse reaction to various drugs. 193,194 Personalized medicine has also been applied to the field of infectious disease to define host predisposition to some infections and to predict successful responses as well as adverse reactions to treatment. 
Evidence that certain infectious diseases have a genetic predisposition initially came from a study that showed an approximately 6-fold increase in the risk of mortality from infectious diseases in adopted children if the biological parents also died from an infection. 195 Other groups later confirmed these results, and since then, the susceptibility to several infectious diseases, including S. aureus and M. tuberculosis, has been linked to rare defects in genes encoding effectors of the immune response. 152,[196][197][198][199][200][201][202][203] The completion of the Human Genome Project and the map of human genetic variation 204 has promoted the evolution of high-density SNP arrays that analyze the association of certain polymorphisms with susceptibility to infectious disease, as well as linkage disequilibrium, the correlation between 2 SNPs. These increasingly common genome-wide association studies (GWAS) have identified a series of genes implicated in the predisposition to different bacterial and viral diseases. 205,206 Although GWAS was first applied to the fields of rheumatic and autoimmune diseases, recently this approach has identified novel genetic loci involved in susceptibility to infectious disease. In addition, GWAS is beginning to prove useful in identifying markers that indicate the quality of a patient's response to treatment. For example, a recent GWAS showed that an SNP located upstream of the IL28B gene, which codes for a type III interferon, is associated strongly with more than a 2-fold difference in response to anti-HCV drug treatment. 207 Other studies showed a relationship between IL28B gene polymorphisms and the spontaneous clearance of acute HCV. 208 Because HCV, which almost invariably leads to a chronic disease with liver failure in approximately 70% of individuals, is treated most effectively during the acute phase of the disease, this study was a significant step toward allowing physicians to predict spontaneous clearance accurately and to identify a subgroup of patients that will benefit from early treatment. 208
Microarray technology has paved the way for the development of the field of transcriptomics, which is the large-scale profiling of all RNA transcripts in a cell to characterize gene expression (Fig 1). Transcriptomics has been applied widely to the study of the host response and the identification of biomarkers after an infection or treatment. In this case, patterns of gene expression in human cells, most frequently peripheral blood mononuclear cells (PBMCs), are studied by hybridizing biological material to defined gene chips. [209][210][211] With regard to infectious disease, microarray technology has made a huge contribution to our understanding of the changes evident in the transcriptome of humans during infection. Lempicki et al 212 used microarray-derived gene expression patterns to demonstrate that the increased expression of interferon-stimulated genes in patient PBMCs could be one explanation for why anti-HCV therapy in patients coinfected with HCV and HIV has lower success rates. In another study, Jacobsen et al 213 used microarray technology to identify the biomarkers expressed during infection with M. tuberculosis that would allow for more accurate predictions regarding the outcome of infection. Thus, microarray technology has been broadly applicable in studies aimed at characterizing the infection profiles of a variety of microorganisms, including SARS-CoV, 214 dengue, 215 influenza, 216 S. aureus, 217 and S. enterica. 218 The advent of next-generation deep and ultra-deep sequencing techniques has also improved the ability of researchers to identify splice variants and low-abundance transcriptomic changes in response to infectious disease. 49 Data such as these can be used to inform improvements in the diagnosis and prognosis of disease, the design of antiviral and antibiotic therapies, and the construction of vaccines that can target susceptible populations more specifically.
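The linkage disequilibrium between 2 SNPs that is assessed alongside GWAS associations is commonly summarized by the squared correlation coefficient, r². The Python sketch below shows one standard way of computing it from phased haplotypes; the function name, the encoding of alleles as 0 and 1, and the toy haplotype lists are assumptions made for illustration and are not drawn from any of the cited studies.

```python
def ld_r_squared(haplotypes):
    """Squared correlation (r^2) between the alleles at 2 biallelic SNPs.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) pairs,
    with each allele coded 0 or 1 (a hypothetical encoding).
    """
    n = len(haplotypes)
    p_a = sum(first for first, _ in haplotypes) / n    # frequency of allele 1 at SNP 1
    p_b = sum(second for _, second in haplotypes) / n  # frequency of allele 1 at SNP 2
    p_ab = sum(1 for pair in haplotypes if pair == (1, 1)) / n
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denominator


# Toy data: alleles always inherited together versus fully independent alleles.
linked = [(1, 1)] * 5 + [(0, 0)] * 5
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r_squared(linked))
print(ld_r_squared(unlinked))
```

An r² close to 1 means that genotyping one SNP is nearly as informative as genotyping both, which is why an associated marker on an array may only tag, rather than be, the causal variant.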
The application of new genomic techniques in predicting the response of a patient to drug therapy based on his/her genetic profile (pharmacogenomics) has found practical application in identifying populations that are susceptible to short- and long-term toxicities of specific HIV antiretroviral drugs. One example is the reaction to abacavir, which is a nucleoside analog reverse transcriptase inhibitor. The major toxicity manifests as a hypersensitivity reaction that occurs in approximately 5% to 8% of recipients within 6 weeks of commencing therapy. 219 Mallal et al 220 showed that patients carrying the HLA-B*5701 allele are more likely to experience this hypersensitivity reaction to abacavir. This marker had a negative predictive value of 100%, indicating that genetic tests can provide unequivocal information that can help to predict and prevent otherwise unpredictable drug reactions. Another example of the use of pharmacogenomics to guide therapy is the antiretroviral drug maraviroc, which is a CCR5-specific HIV entry inhibitor. This drug is effective only in patients who are infected with strains of HIV that use the CCR5 coreceptor for entry, and not in patients who are infected with CXCR4-tropic viruses. Therefore, by screening for the specific tropism of the strain of HIV with which the patient is infected, clinicians can target drug therapy more specifically, for example, by using maraviroc only in patients who are infected with the susceptible strain of HIV. 221 These results underline the success of GWAS and personalized medicine in predicting which populations should avoid treatment with specific drugs because they are more prone to side effects.
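The meaning of a negative predictive value of 100% can be made concrete with a short calculation. The Python sketch below computes the positive and negative predictive values from the 4 cells of a screening table; the function name and the patient counts are hypothetical, chosen only to illustrate the arithmetic, and are not the data reported by Mallal et al.

```python
def predictive_values(true_pos, false_pos, true_neg, false_neg):
    """Return (PPV, NPV) for a screening test.

    PPV = fraction of test-positive patients who develop the reaction.
    NPV = fraction of test-negative patients who do not develop it.
    """
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("need at least one positive and one negative test result")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical cohort: 50 allele carriers, of whom 20 react, and
# 950 noncarriers, none of whom react.
ppv, npv = predictive_values(true_pos=20, false_pos=30, true_neg=950, false_neg=0)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

In this illustration the negative predictive value is 100% because no noncarrier reacts, even though fewer than half of the carriers do; a marker of this kind is therefore most useful for ruling out risk before therapy begins.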
Although the clinical application of ''omics'' and systems biology holds great potential for medical discovery, the introduction of high-throughput technologies has also introduced new questions and challenges regarding the transformation of the large volumes of data produced by these approaches into useful information for the development of diagnostic, prognostic, and therapeutic devices. The ''large n, small p'' paradigm, in which the number of study subjects (n) is higher than the variables (p) generated from the study itself, currently serves as a fundamental analytic principle for data modeling. With a shift in this ratio toward smaller n values compared with the large number of variables amassed through high-throughput data acquisition, it is becoming clear that this paradigm will need to be revised to preserve statistical accuracy. 222 To address this problem, some researchers have proposed changes in the mathematical principles used for data analysis, such as using continuous variables [222][223][224] and nonlinear modeling. 222,225,226 Some also recommend switching from deterministic to probability-based modeling strategies to account for the uncertainty that is typical of common clinical scenarios and to predict the relative probabilities of various outcomes given the same initial situation. 222,227 In addition to the necessary evolution of data modeling strategies, another critical factor in the successful translation of research data is the capability to access and analyze large sets of data that originated from different sources and contain diverse types of information. There is, therefore, a need to develop methods that enable the construction of hybrid data sets that contain both clinical and biological data, and are organized in ways that are accessible to all translational researchers and compatible among different systems. 
Translational bioinformatics is an emerging area that proposes a holistic approach to this issue by providing integrative methods for incorporating multiple data modalities and enabling the development of predictive models for therapeutic responses. 228 Although a detailed description of the methods used for data integration and modeling is beyond the scope of our article, other articles by Yan, 228 Kanehisa et al, 229 and Sarkar et al 230 offer a more complete review of the topic. These developments are still the object of research and far from completion, but as the data accumulate, only the maturation of new analytic strategies will enable their full diagnostic and clinical use.
Here, we mentioned some advances in translational research that are applicable to infectious disease studies. There have been several other major achievements in translational research with other diseases; some have been especially relevant in the fields of cancer and neuroscience, specifically with regard to epigenomics and stem cell research. Although a thorough discussion of these fields is beyond the scope of this article, their impact on translational research has also been profound.
Regardless of the field of study or research technique, however, the promise of translational research remains hindered by a few key factors, which we will now discuss.

BRIDGING THE GAPS: TURNING SCIENTIFIC PROMISE INTO CLINICAL PRACTICE
Whereas the aforementioned studies illustrate the incredible promise of new technologies in translational research, their full potential remains largely untapped, partly because of the high failure rates, long timelines, and costs of converting basic science research into clinically applicable methodologies. The recently proposed NCATS, a new NIH entity that would include the CTSA (and eliminate the NCRR), is an effort to circumvent these roadblocks. In the remainder of this review, we will discuss how time frames, costs, and poor success rates have persistently hindered the translation of basic research advances into clinically useful products, how the NCATS aims to address these problems, and the additional challenges facing the field of translational medicine. 8,11
Failure rates. Over the past decade, the number of NMEs approved by the FDA and comparable regulatory bodies around the world has decreased progressively. In the second half of the 1990s, for example, the FDA approved more than 30 NMEs per year. 231 By contrast, the number of NMEs approved since 2001 has averaged slightly less than 23 per year, with the exception of 36 approved in 2004, despite the increasing number of potential therapeutic compounds identified from high-throughput technologies. [231][232][233] In addition to the decline in approvals, the number of NME applications submitted to the FDA over the past decade has decreased, with 2010 producing one of the lowest numbers in more than 15 years. 233 Because of this decline in approvals and applications, as well as the increasing use of generic drugs, it is estimated that large cap pharmaceutical companies can replace, on average, only 26 cents of every dollar lost to patent expirations with new product revenues. 234 According to current estimates, only 8% of NMEs will make it from candidate selection (in the preclinical stage) to a successful launch. 232,235 This decline in drug development threatens to take not only an economic toll but also a medical one, as approximately 40% of the 2-year increase in life expectancy from 1986 to 2000 was attributable to the introduction of new drugs. 236 This lack of new drugs is especially worrisome in the field of infectious disease, where the emergence of multidrug-resistant pathogens has led to a situation in which formerly curable infections are now untreatable. Paradoxically, some pharmaceutical companies have indicated that they are curtailing anti-infective research. 10 A major reason for this is that anti-infective agents, which are needed only for a short time in smaller populations, make less economically attractive targets than drugs for chronic diseases that appear in the larger aging population (eg, antihypertensives and antidepressants).
At this time, approximately 85% of the failures in translating promising NMEs into clinical application occur during early clinical trials. According to a recent editorial, 237 many of these NMEs fail because of outdated clinical trial designs. Drug testing requires the study of large sample populations to show statistically significant benefits compared with existing pharmaceuticals. The introduction of specific markers for personalized medicine makes it even more difficult to recruit patients who qualify for these tests. Therefore, many clinical trials take years to complete their enrollment, or they are cancelled because of the inability to recruit a sufficient number of subjects. These failures therefore contribute to the total costs and failure rates of translational research. As a result, many NMEs are never developed, and companies are removing more products from their pipelines than ever before. 237 To circumvent this problem and increase the success rate of proposed NMEs, Francis Collins has proposed several priorities for the new NCATS. First, the NCATS will support studies that investigate broadly applicable, rather than disease-specific, target validation approaches (ie, targets that may be relevant to multiple diseases), as well as targets that are considered to be ''too risky'' for industry investment, such as those identified by GWAS. Second, the NCATS will encourage the use of ''omics'' and systems biology approaches to help design and validate new diagnostics and therapeutics for translation into clinical application. Third, the NCATS would also house the recently established NIH-FDA Regulatory Science Initiative, which was launched in 2010 with the goals of facilitating improvements in regulatory science and providing new insights to broadly benefit the field of translational research. 8,[238][239][240] Finally, together with the NIH-EPA-FDA Tox21 Consortium, the NCATS would support the use of new cell-based approaches as a means of pursuing preclinical toxicology studies. 241
Timelines. By current estimates, the average timeline for discovering and developing an NME is approximately 13.5 years. This estimate includes the regulatory review but not the time it takes to identify and validate the target pathway. 232 Time spent in clinical trials, unnecessary bureaucratic procedures associated with research and design management, and inefficient regulatory review have all been identified as contributors to this problem. 9,232,242,243 To minimize these problems, the proposed NCATS would seek to encourage the ''reengineering of the translational process, from initial target identification to first-in-human application of small molecules, biologics, diagnostics, and devices'' to streamline this process and attract pharmaceutical involvement. 8 Additionally, the center would aim to act as an honest broker, hastening Institutional Review Board (IRB) approval for multicenter trials, further decreasing the time that NMEs spend in clinical trials, and facilitating the rescue and repurposing of abandoned compounds. Last, the center would support collaborative efforts for the testing of new entities to act as a ''proof of concept'' in the hope that promising new drugs would be picked up by industry. 8,244
Costs. The cost of a translational endeavor is inextricably linked to both high failure rates and long timelines. Indeed, the average cost to bring an NME to market is now estimated to be $1.8 billion. This estimate seems to be the same regardless of whether the NME is produced by a big pharmaceutical company or a small biotechnology firm. 232 The NCATS aims to decrease these costs in a number of ways.
First, it would encourage the use of early human trials, also called ''phase zero'' clinical trials, in which ''microdoses'' of a new drug are administered to a small number of patients. 237 In these trials, the absorption, tissue distribution, and toxicity of the drug are assessed by highly sensitive methodologies, such as molecular imaging, metabolomics, or proteomics. 8 The feasibility of such endeavors has been demonstrated already by the Consortium for Metabonomic Toxicology at the Imperial College in London, which showed it is possible to design a prediction model for clinical toxicity using metabolomic data. 104 Another way in which the NCATS would decrease development costs is by encouraging innovative models, including human tissue biobanks, stem cell models, and tissue-engineered organoids. These models would potentially allow researchers to decrease or omit the use of animal models, which have been criticized as both costly and not accurately predictive of efficacy in humans. 8,237 Finally, the NCATS would encourage the development of technologies aimed at biomarker identification. These have been used already by researchers to design a new style of ''adaptive'' clinical trial, in which the study design changes as new data are collected, allowing patients who are more likely to respond to be assigned to treatment with a given compound. 237 Beyond encouraging the development of new technologies aimed at decreasing costs, the NCATS would promote cross-collaborations between academic institutions, biotechnology firms, philanthropic organizations, and pharmaceutical companies to split research and development costs.
Last, the proposed Cures Acceleration Network, which will be assessed for congressional funding in the next fiscal year, would provide the NIH with the ability to award up to $15 billion per year to academic and private consortia to direct the development of certain especially promising therapeutics. 8

IS THE NCATS THE FUTURE OF TRANSLATIONAL RESEARCH?
Although the need to encourage the development of new drugs and therapeutics is clear, some have expressed concerns that the proposed NCATS represents a change in priority for the NIH away from basic, discovery-based science and toward application-focused research, thus neglecting the fact that basic science lays the foundation for translational research. Indeed, this has been a criticism since the development of the CTSA. 245 Another criticism is that the support of GWAS and other high-throughput methodologies causes a dramatic shift from research that is hypothesis based to that which is data driven, placing an overemphasis on screening for drugs and biomarkers at the expense of discovery-based science. 246 For example, studies based on high-throughput technologies are typically not the result of experiments designed to test a hypothesis. Rather, they aim to identify systematically all the molecular effectors that are involved in a specific system in the most comprehensive and efficient way possible. To address this critique, some authors have added an extra step of complexity to their high-throughput methods by determining a mechanistic function for the molecules they have identified, switching in this way from ''fishing expedition'' to discovery-based research. In the end, despite assertions that the establishment of the NCATS will reinforce, rather than detract from, basic science, many investigators continue to have concerns, mostly related to money. In today's climate of funding cuts, some wonder: Will it be possible to receive funding for more basic (ie, less obviously applicable) research? Will the emphasis on collaboration and multidisciplinary approaches shift most of the funding toward multicenter groups of established scientists, making it difficult for new investigators to survive? Will the current emphasis on clinical applicability narrow the range of ideas and perspectives, so that most funding is allocated to well-established scientists pursuing traditional approaches?
Beyond these concerns regarding a change in the focus of the NIH, concerns regarding policy and infrastructure exist, such as whether funding NCATS programs will take money away from other programs that were previously housed within the NCRR. As the NCRR currently awards funds of approximately $1.25 billion annually, this is a substantial concern. 247 Although none of the NCRR programs are currently slated to be discontinued, the perceived lack of transparency surrounding the establishment of the NCATS, the dissolution of the NCRR, and where programs formerly housed within the NCRR will now be located has resulted in persistent scrutiny from both NIH stakeholders and Congress. [248][249][250]
Perhaps the greatest issue of debate is where to draw the line between public and private industry. Many criticize the proposed NCATS because they believe it is designed to help companies develop new drugs by decreasing their financial risks, and that it will do so by using taxpayers' money. They also argue that the NIH has experience only with basic science research and not with drug development. Indeed, some have argued that the decline in research and development productivity is not related to increasing costs, because public investment in the pharmaceutical industry has increased proportionally since 1970, nor is it the result of time or attrition rates, which have been steady since the 1970s. Instead, they argue that it may be caused by poor prioritization and mismanagement within the pharmaceutical industry. 9 Specifically, changes in management techniques in the pharmaceutical industry hierarchy have led to corporate policies that discourage innovation in favor of cost cutting and risk minimization. 9,246 This, together with a shift of control from scientists and researchers to marketers, a focus on short-term profits and fast sales growth, and the discontinuation of compounds for nontechnical reasons, has led to the decreased translation of scientific discoveries to clinical applications. 9 As such, some have argued that the development of the NCATS might make it even less likely that the pharmaceutical industry will try to solve these problems. Ultimately, although the NCATS might serve as a champion for key therapeutic compounds, the pharmaceutical industry itself will need to reform certain practices if the pace of drug development is to keep up with the pace of translational research.
Translational research has made incredible progress in recent years, both with the development of new techniques and with the expansion of more holistic ways of viewing the data that stem from those approaches. For the most part, the discovery of novel technologies, the development of new infrastructures, and the training of budding scientists have supported this evolution. The transition is not complete and roadblocks still exist on the path to scientific progress. It remains to be seen whether the newly proposed NCATS, which has raised as many objections as it has hopes, will be the answer to these problems. What is evident, however, is that translational research must be reprioritized and bolstered in a way that increases the speed at which promising drugs and technologies move through the pipeline into clinical application, while at the same time refraining from stifling the breadth of ideas and creativity of basic researchers, which is necessary for the progression of science itself.
We thank Ann Beeder, MD for useful discussion.