
Using data science to enhance protein biogenesis for biotechnology and medicine

Prof. Justin Chartron, Department of Bioengineering, UCR.
ABSTRACT

Engineered proteins are at the heart of biotechnology and the biopharmaceutical industry. The research community now has several tools for developing novel protein structures and functions: computational methods can link fully designed structures back to sequence, and directed evolution can create enzymes that catalyze reactions not found in biology. However, these approaches rely on cells (or cell extracts) to produce the engineered proteins, and the principles that govern protein folding in the cell remain comparatively poorly understood. As a result, the range of molecules that can be made is restricted, product development is slow, and production costs remain high. In this talk, I will outline outstanding questions in cellular protein folding and introduce the data-driven approach my group uses to understand and control protein biogenesis. Our central hypothesis is that a cell's protein biogenesis machinery has evolved under the demands of the organism's own proteome; overexpressed, non-native proteins therefore present a challenge that the cell is not optimized to solve. Protein biogenesis pathways comprise dozens of molecular chaperones, protein-processing enzymes, targeting factors, and quality-control factors. Although we know the identities and broad roles of many of these factors, we do not understand how they coordinate as networks that act on whole proteomes. Here, we focus on a single industrially significant biogenesis pathway: protein secretion from fungi. We design -omics-level experiments to capture and quantitatively measure distinct events during protein secretion. We then use machine learning to link these observations to protein or mRNA sequence features and to the state of the host cell. To expand our dataset, we draw on the natural variation among evolutionarily divergent species. Altogether, we are building a dataset describing the early secretion of about 15,000 unique protein sequences.
These data are generating hypotheses about fundamental protein biogenesis pathways.

