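Infologs are independently designed synthetic genes derived from one or a few parent genes, into which substitutions are systematically incorporated to maximize the information gained from each variant. Infologs are designed so that diversity is distributed as evenly as possible across the set (ideally a perfect diversity distribution), which maximizes search efficiency.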
Typical protein engineering methods rely on screening large numbers (10⁶–10¹² or more) of gene variants, using a surrogate high-throughput (HTP) screen to identify initial hits with improved activity. Unfortunately, you get what you screen for: the “hit” from the HTP screen often has very little real activity in a lower-throughput assay that better reflects the function for which the protein is being developed. By adapting standard algorithms for engineering complex systems to biological systems, the resulting Infolog process enables researchers to deconvolute how substitutions within a protein sequence modify its function. Combining these algorithms with an integrated query and ranking mechanism allows appropriate sequence substitutions to be identified.[1] The plural "Infologs" refers to the set of designed genes; the singular "Infolog" describes an individual variant.
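The article does not spell out how an Infolog set is laid out. As a minimal sketch, assuming that an even diversity distribution means every substitution appears in the same number of variants and that substitutions should co-vary as little as possible, a balanced binary design can be generated and checked as below. The function names and the random-assignment scheme are illustrative assumptions, not the patented method.

```python
import numpy as np


def design_balanced_variants(n_variants: int, n_substitutions: int, seed: int = 0) -> np.ndarray:
    """Return a 0/1 design matrix of shape (n_variants, n_substitutions).

    Entry [i, j] == 1 means (hypothetical) variant i carries substitution j.
    Every column has exactly n_variants // 2 ones, so each substitution is
    sampled equally often across the set. This is an assumed scheme only.
    """
    rng = np.random.default_rng(seed)
    design = np.zeros((n_variants, n_substitutions), dtype=int)
    n_carriers = n_variants // 2
    for j in range(n_substitutions):
        # Pick, without replacement, which variants carry substitution j.
        carriers = rng.choice(n_variants, size=n_carriers, replace=False)
        design[carriers, j] = 1
    return design


def max_pairwise_correlation(design: np.ndarray) -> float:
    """Largest absolute correlation between any two substitution columns.

    Lower values mean substitutions co-vary less, so their individual
    effects are easier to separate in a later sequence-function model.
    """
    corr = np.corrcoef(design, rowvar=False)
    np.fill_diagonal(corr, 0.0)  # ignore each column's correlation with itself
    return float(np.abs(corr).max())


if __name__ == "__main__":
    # 96 variants and 60 substitutions echo the numbers in the case study.
    d = design_balanced_variants(96, 60)
    print(d.sum(axis=0))  # each substitution is present in 48 variants
    print(max_pairwise_correlation(d))
```

Random assignment balances each substitution individually but does not guarantee low pairwise correlation, which is why the design is checked afterwards; a more careful design would also balance how often pairs of substitutions co-occur.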
Ancestry
Homology between protein or DNA sequences is defined in terms of shared ancestry. Two segments of DNA can share ancestry because of a speciation event (orthologs), a duplication event (paralogs), or horizontal gene transfer (xenologs).
Homologs are similar genes and/or proteins which are related by ancestry.
Orthologs are the 'same' gene, but from different organisms. Homologous sequences are orthologous if they were separated by a speciation event: when a species diverges into two separate species, the copies of a single gene in the two resulting species are said to be orthologous. Orthologs, or orthologous genes, are genes in different species that originated by vertical descent from a single gene of the last common ancestor. The term "ortholog" was coined in 1970 by Walter Fitch.[2]
Paralogs are related genes that arose when a single gene was duplicated, after which the two copies may evolve over time toward separate functions (or, according to a 2012 Science paper,[3] a promiscuous starting gene was duplicated and each copy evolved toward different functions). Paralogs often have the same or similar function, but sometimes do not: because one copy of the duplicated gene is freed from the original selective pressure, it can mutate and acquire new functions. Paralogs usually occur within the same species.
Xenologs are homologs resulting from horizontal gene transfer between two organisms. Xenologs can have different functions if the new environment is vastly different for the horizontally transferred gene; in general, though, xenologs have similar functions in both organisms.
Infologs are similar genes and/or proteins related by synthetic rather than natural ancestry: they are designed so that, as a set, they approach a perfect distribution of diversity.
Features
- Optimize directly for function in the final application
- Does not require high-throughput (HTP) screens
- Screen small numbers of variants (50-200) directly for the desired function
- Fewer false positives (variants identified by HTP screens that do not retain activity in the 'real' assay)
- Fewer lost hits: potential positives are less often missed because of screening error or poor correlation between the HTP screen and the 'real' assay
- No biodiversity collections required; everything is synthesized as needed
- Sequence-function relationships provide the basis for strong composition-of-matter patent claims.
Case study
Transforming Protein engineering with Infologs:
Using independently designed synthetic genes into which substitutions are systematically incorporated (Infologs) yields uniform sampling, systematic variance, and unrestricted, information-rich results. A wheat glutathione S-transferase (GST) able to detoxify a panel of common herbicides was designed using this patented bioengineering method. The relative functional contribution of 60 amino acid substitutions against 14 herbicides was quantified using only 96 Infologs, and GST activity was then dramatically improved with a small set of 16 second-generation Infologs. In addition, highly predictive GST sequence-function models against two commercially relevant herbicides were created, quantifying the relative functional contribution of the 60 amino acid substitutions in two dimensions.[4]
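The case study quantifies the relative functional contribution of each substitution from a small number of measured variants. One common way to do this, assumed here rather than taken from the source, is an additive linear model fitted by least squares, after which substitutions are ranked by their fitted weights. The sketch uses simulated data only, and the function names are hypothetical.

```python
import numpy as np


def fit_substitution_contributions(design: np.ndarray, activity: np.ndarray) -> tuple[float, np.ndarray]:
    """Fit activity ~ b0 + sum_j w_j * x_j by ordinary least squares.

    design:   0/1 matrix (variants x substitutions), 1 = substitution present
    activity: measured activity of each variant in the assay of interest
    Returns (b0, w), where w[j] estimates the contribution of substitution j.
    """
    n_variants = design.shape[0]
    # Prepend a column of ones so the model has an intercept (baseline activity).
    X = np.hstack([np.ones((n_variants, 1)), design])
    coef, *_ = np.linalg.lstsq(X, activity, rcond=None)
    return float(coef[0]), coef[1:]


def rank_substitutions(weights: np.ndarray, labels: list[str]) -> list[tuple[str, float]]:
    """Order substitutions from most beneficial (largest weight) to most harmful."""
    order = np.argsort(weights)[::-1]
    return [(labels[i], float(weights[i])) for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated, noise-free data: 96 variants x 60 substitutions (not real GST data).
    design = rng.integers(0, 2, size=(96, 60))
    true_w = rng.normal(size=60)
    activity = 5.0 + design @ true_w
    b0, w = fit_substitution_contributions(design, activity)
    labels = [f"sub{j}" for j in range(60)]  # hypothetical substitution names
    print(round(b0, 6), np.allclose(w, true_w))
    print(rank_substitutions(w, labels)[:3])
```

With 60 substitutions plus an intercept the model has 61 parameters, so 96 variants are enough to fit it; an additive model ignores interactions between substitutions, which would need more variants or additional terms to capture.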
Rational design of proteins
In rational protein design, the scientist uses detailed knowledge of the structure and function of the protein to make desired changes. This generally has the advantage of being technically easy and inexpensive, since site-directed mutagenesis techniques are well-developed. However, its major drawback is that detailed structural knowledge of a protein is often unavailable, and even when it is available, it can be extremely difficult to predict the effects of various mutations.
Computational protein design algorithms seek to identify novel amino acid sequences that are low in energy when folded to the pre-specified target structure. While the sequence-conformation space that needs to be searched is large, the most challenging requirement for computational protein design is a fast, yet accurate, energy function that can distinguish optimal sequences from similar suboptimal ones.
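To make the role of the energy function concrete, the toy sketch below scores fixed-length sequences with per-position and pairwise energies and finds the lowest-energy sequence by exhaustive search. The energy values, the four-letter alphabet, and the function names are invented for illustration; real design methods use physically or statistically derived energy functions and much smarter search, because with 20 amino acids the number of sequences grows as 20^n for n positions.

```python
import itertools

# Hypothetical per-position energies (lower is better) for a 3-residue toy.
SITE_ENERGY = [
    {"A": 0.0, "V": -0.5, "L": -1.0, "K": 0.5},
    {"A": -0.2, "V": 0.0, "L": 0.3, "K": -1.0},
    {"A": 0.0, "V": -1.0, "L": -0.8, "K": 0.2},
]
# Hypothetical pairwise terms for positions 0 and 2, assumed to touch in the fold.
PAIR_ENERGY = {(0, 2): {("L", "V"): 1.5, ("L", "L"): -0.5}}


def sequence_energy(seq, site_energy=SITE_ENERGY, pair_energy=PAIR_ENERGY):
    """Sum per-position terms plus pairwise terms for contacting positions."""
    total = sum(site_energy[i][aa] for i, aa in enumerate(seq))
    for (i, j), table in pair_energy.items():
        total += table.get((seq[i], seq[j]), 0.0)  # unlisted pairs contribute 0
    return total


def exhaustive_design(site_energy=SITE_ENERGY, pair_energy=PAIR_ENERGY):
    """Return the lowest-energy sequence and its energy by brute force."""
    alphabet = sorted(site_energy[0])  # sorted for a deterministic enumeration order
    candidates = itertools.product(alphabet, repeat=len(site_energy))
    best = min(candidates, key=lambda s: sequence_energy(s, site_energy, pair_energy))
    return "".join(best), sequence_energy(best, site_energy, pair_energy)


if __name__ == "__main__":
    # Choosing the best residue at each site independently gives "LKV",
    # but the L/V clash between positions 0 and 2 makes "LKL" lower in energy.
    print(exhaustive_design())
    print(sequence_energy("LKV"), sequence_energy("LKL"))
```

The example shows why the pairwise terms matter: a sequence that looks optimal site by site can be suboptimal once interactions are scored, and only the energy function can tell the two apart.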
References
- ^ This technology is covered by United States issued patent US 8,005,620
- ^ Fitch, Walter M. (1970). "Distinguishing Homologous from Analogous Proteins". Systematic Biology. 19 (2): 99–113. doi:10.2307/2412448. JSTOR 2412448. PMID 5449325.
- ^ Nasvall, J.; Sun, L.; Roth, J. R.; Andersson, D. I. (2012). "Real-Time Evolution of New Genes by Innovation, Amplification, and Divergence". Science. 338 (6105): 384–7. Bibcode:2012Sci...338..384N. doi:10.1126/science.1226521. PMC 4392837. PMID 23087246.
- ^ Enzyme Engineering Conference Presentation: "Using Infologs to Engineer Biological Systems"
Further reading
- GRC Biocatalysis, 2014: Systematic Exploration of Sequence Space for Protein Engineering Poster
- Chen, Fei; Gaucher, Eric A.; Leal, Nicole A.; Hutter, Daniel; Havemann, Stephanie A.; Govindarajan, Sridhar; Ortlund, Eric A.; Benner, Steven A. (2010). "Reconstructed evolutionary adaptive paths give polymerases accepting reversible terminators for sequencing and SNP detection". Proceedings of the National Academy of Sciences. 107 (5): 1948–53. Bibcode:2010PNAS..107.1948C. doi:10.1073/pnas.0908463107. PMC 2804741. PMID 20080675.
- Heinzelman, Pete; Snow, Christopher D.; Smith, Matthew A.; Yu, Xinlin; Kannan, Arvind; Boulware, Kevin; Villalobos, Alan; Govindarajan, Sridhar; et al. (2009). "SCHEMA Recombination of a Fungal Cellulase Uncovers a Single Mutation That Contributes Markedly to Stability". Journal of Biological Chemistry. 284 (39): 26229–33. doi:10.1074/jbc.C109.034058. PMC 2785310. PMID 19625252.
- Heinzelman, Pete; Snow, Christopher D.; Wu, Indira; Nguyen, Catherine; Villalobos, Alan; Govindarajan, Sridhar; Minshull, Jeremy; Arnold, Frances H. (2009). "A family of thermostable fungal cellulases created by structure-guided recombination" (PDF). Proceedings of the National Academy of Sciences. 106 (14): 5610–5. Bibcode:2009PNAS..106.5610H. doi:10.1073/pnas.0901417106. JSTOR 40454838. PMC 2667002. PMID 19307582.
- Ehren, J.; Govindarajan, S.; Moron, B.; Minshull, J.; Khosla, C. (2008). "Protein engineering of improved prolyl endopeptidases for celiac sprue therapy". Protein Engineering Design and Selection. 21 (12): 699–707. doi:10.1093/protein/gzn050. PMC 2583057. PMID 18836204.
- Liao, Jun; Warmuth, Manfred K; Govindarajan, Sridhar; Ness, Jon E; Wang, Rebecca P; Gustafsson, Claes; Minshull, Jeremy (2007). "Engineering proteinase K using machine learning and synthetic genes". BMC Biotechnology. 7: 16. doi:10.1186/1472-6750-7-16. PMC 1847811. PMID 17386103.
- Minshull, Jeremy; Ness, Jon E; Gustafsson, Claes; Govindarajan, Sridhar (2005). "Predicting enzyme function from protein sequence". Current Opinion in Chemical Biology. 9 (2): 202–9. doi:10.1016/j.cbpa.2005.02.003. PMID 15811806.
- Gustafsson, Claes; Govindarajan, Sridhar; Minshull, Jeremy (2003). "Putting engineering back into protein engineering: Bioinformatic approaches to catalyst design". Current Opinion in Biotechnology. 14 (4): 366–70. doi:10.1016/S0958-1669(03)00101-0. PMID 12943844.