Gene function prediction: insights into the correlation of protein and mRNA

How often do we try to compare transcript levels with protein expression levels, only to find that they do not agree? Can we really use transcriptome-level information to predict cellular phenotype, which is mostly governed by proteins? Recent technological advances in quantifying proteins (mass spectrometry-based proteomics) and transcripts (RNA-Seq) have helped us understand the discrepancy between mRNA and protein levels in complex biological systems.

The central dogma of biology tightly links DNA, RNA, and protein in biological samples. It is mechanistically well understood how genes are transcribed, how mRNA is processed and translated into amino acid chains at the ribosome, and how those chains subsequently fold into functional proteins. However, the relationship between the mRNA and protein abundances derived from a particular locus is not trivial (Li and Biggin, 2015).

Multiple biological processes beyond transcript concentration affect the expression level of a protein. These include the translation rate and its modulation, modulation of the protein’s half-life, protein synthesis delay, and protein delocalization. As a result, mRNA and protein profiles in complex biological samples can differ substantially. Zhang et al. (2014) presented the first global analysis of transcript-protein relationships in a large human colon and rectal cancer cohort (87 tumors, for which 3764 genes had both mRNA and protein measurements). They showed that although 89% of the genes had a positive mRNA-protein correlation, only 32% had a statistically significant one. The average Spearman’s correlation between mRNA and protein variation was 0.23 (Figure 1). Because the share of variation explained scales with the square of the correlation coefficient, this means that knowing mRNA abundance accounts for only roughly 5% (0.23² ≈ 0.05) of the rank variation in protein concentration. They also showed that genes encoding intermediary metabolism functions (e.g. arginine and proline metabolism) had high mRNA-protein correlations, whereas genes involved in regulation, chromatin organization, and transcriptional regulation had low or negative correlations.
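To make the per-gene correlation concrete, here is a minimal Python sketch of how a Spearman correlation between mRNA and protein measurements could be computed for each gene across samples and then averaged. The gene names and values are hypothetical toy data, not taken from Zhang et al. (2014), and the helper functions are written for illustration only; they are not the authors' pipeline. The last line squares a correlation of 0.23, the quantity usually read as the fraction of (rank) variation explained.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy data: gene -> (mRNA values, protein values) over 6 samples.
toy = {
    "GENE_A": ([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]),  # concordant
    "GENE_B": ([1, 2, 3, 4, 5, 6], [3, 1, 2, 6, 4, 5]),        # partly concordant
    "GENE_C": ([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]),        # anti-correlated
}

per_gene = {gene: spearman(m, p) for gene, (m, p) in toy.items()}
mean_rho = sum(per_gene.values()) / len(per_gene)
frac_positive = sum(r > 0 for r in per_gene.values()) / len(per_gene)

# Squared correlation for the mean value reported in the text.
explained = 0.23 ** 2

print(per_gene, mean_rho, frac_positive, explained)
```

In a real analysis the significance of each correlation would also need to be tested and corrected for multiple comparisons, which this sketch leaves out.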

Figure 1. The mean Spearman’s correlation between mRNA and protein variation is 0.23 (A). The poor concordance between protein and mRNA variation can be related to the biological function of the gene product (B) (figure modified from Zhang et al., 2014).

Several recent proteogenomics studies (Liu et al., 2016; Mertins et al., 2016; Ruggles et al., 2016; Zhang et al., 2014; Zhang et al., 2016) showed that the proteome links better to the cell phenotype than the transcriptome does. But can we improve biological function prediction if we only have RNA-Seq data for our samples? Wang et al. (2017) constructed gene co-expression networks from mRNA and protein profiling data sets for three cancer types (breast, colorectal, and ovarian) and systematically investigated the relative utility of mRNA and proteome data in predicting co-functionality. They concluded that mRNA co-expression was driven by both co-function and chromosomal co-localization of the genes, whereas protein co-expression was driven primarily by functional similarity between the co-expressed genes. Therefore, by taking the chromosomal location of genes into account when interpreting differential gene expression, we may be able to make RNA-Seq data more useful for the functional characterization of genomes.
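The idea of comparing what drives co-expression in the two data types can be sketched as follows. This is a toy illustration under stated assumptions, not the method of Wang et al. (2017): the genes, annotations, expression values, the use of Pearson correlation, and the 0.8 edge threshold are all hypothetical choices made for the example. It builds a co-expression network from each data type and asks what fraction of edges connect genes that share a function or a chromosome.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def coexpression_edges(profiles, threshold=0.8):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            edges.add((g1, g2))
    return edges

def fraction_sharing(edges, annotation):
    """Fraction of edges whose two genes carry the same annotation."""
    if not edges:
        return 0.0
    return sum(annotation[a] == annotation[b] for a, b in edges) / len(edges)

# Hypothetical annotations: G1/G2 share a function, G3/G4 share a chromosome.
function = {"G1": "metabolism", "G2": "metabolism",
            "G3": "chromatin", "G4": "signaling"}
chromosome = {"G1": "chr1", "G2": "chr2", "G3": "chr3", "G4": "chr3"}

# Hypothetical centred abundance values over 4 samples.
mrna = {
    "G1": [1.0, 1.0, -1.0, -1.0],
    "G2": [2.5, 1.5, -2.5, -1.5],   # tracks G1 (shared function)
    "G3": [1.0, -1.0, 1.0, -1.0],
    "G4": [1.5, -1.5, 0.5, -0.5],   # tracks G3 (shared chromosome)
}
protein = {
    "G1": [1.0, 1.0, -1.0, -1.0],
    "G2": [2.5, 1.5, -2.5, -1.5],   # still tracks G1
    "G3": [1.0, -1.0, 1.0, -1.0],
    "G4": [1.0, -1.0, -1.0, 1.0],   # no longer tracks G3
}

mrna_edges = coexpression_edges(mrna)
protein_edges = coexpression_edges(protein)

for name, edges in (("mRNA", mrna_edges), ("protein", protein_edges)):
    print(name, sorted(edges),
          "same function:", fraction_sharing(edges, function),
          "same chromosome:", fraction_sharing(edges, chromosome))
```

In this constructed example the mRNA network contains one co-function edge and one co-location edge, while the protein network keeps only the co-function edge, which mirrors the qualitative conclusion described above. Real data would call for many more genes and samples and a principled choice of threshold.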


Li, J.J., and Biggin, M.D. (2015). Gene expression. Statistics requantitates the central dogma. Science 347, 1066-1067.

Liu, Y., Beyer, A., and Aebersold, R. (2016). On the Dependency of Cellular Protein Levels on mRNA Abundance. Cell 165, 535-550.

Mertins, P., Mani, D.R., Ruggles, K.V., Gillette, M.A., Clauser, K.R., Wang, P., Wang, X., Qiao, J.W., Cao, S., Petralia, F., et al. (2016). Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 534, 55-62.

Ruggles, K.V., Tang, Z., Wang, X., Grover, H., Askenazi, M., Teubl, J., Cao, S., McLellan, M.D., Clauser, K.R., Tabb, D.L., et al. (2016). An Analysis of the Sensitivity of Proteogenomic Mapping of Somatic Mutations and Novel Splicing Events in Cancer. Mol Cell Proteomics 15, 1060-1071.

Wang, J., Ma, Z., Carr, S.A., Mertins, P., Zhang, H., Zhang, Z., Chan, D.W., Ellis, M.J., Townsend, R.R., Smith, R.D., et al. (2017). Proteome Profiling Outperforms Transcriptome Profiling for Coexpression Based Gene Function Prediction. Mol Cell Proteomics 16, 121-134.

Zhang, B., Wang, J., Wang, X., Zhu, J., Liu, Q., Shi, Z., Chambers, M.C., Zimmerman, L.J., Shaddox, K.F., Kim, S., et al. (2014). Proteogenomic characterization of human colon and rectal cancer. Nature 513, 382-387.

Zhang, H., Liu, T., Zhang, Z., Payne, S.H., Zhang, B., McDermott, J.E., Zhou, J.Y., Petyuk, V.A., Chen, L., Ray, D., et al. (2016). Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer. Cell 166, 755-765.

