Global coordination level

Short description: Computational approach for data analysis
Global coordination level (GCL) is a computational method that evaluates system-wide dependency in multivariate data by calculating the distance correlation between random subsets of the variables. Originally applied to gene expression data, GCL assesses the level of coordination between genes, which are fundamentally linked in performing tasks and biological functions. Unlike traditional methods that require precise knowledge of pairwise interactions between genes, GCL can evaluate coordination without such information. A GCL value of zero signifies independent gene expression, while values above zero indicate dependencies among genes, such as those arising from gene-to-gene regulatory interactions. For instance, when GCL is applied to known genetic pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, it yields significantly positive values, whereas random subsets of genes or mock pathways with similar gene expression levels show very low GCL values. GCL can also be useful in analyzing high-dimensional ecological and biochemical dynamics.

Introduction

Genes interact with each other in a complex structure known as the gene regulatory network, which plays a crucial role in implementing various biological functions and performing different tasks within cells. However, inferring the precise pairwise interactions of the gene regulatory network remains challenging due to the large number of functional genes and the inherent stochasticity of these systems.[1][2] Despite these challenges, certain features of the gene regulatory network can still be extracted without fully inferring all the interactions. For instance, the network connectivity, which refers to the density of actual gene-gene interactions compared to all possible interactions, may have important implications for general cellular processes.

Method description

The calculation of the GCL is based on multivariate dependencies among genes in a given cohort of cells. It involves repeatedly selecting random subsets of genes and calculating the distance correlation between them, using the distance correlation statistic of Székely and Rizzo.[3] Averaging over many such gene subsets yields a single numerical value, the global coordination level (GCL), which assesses the overall dependencies between the genes.
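
The procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation: it assumes that each repetition splits the genes into two random halves, and it uses a U-centered (bias-corrected) estimator of distance correlation, which may differ in detail from the statistic used in the cited work. The function names `bias_corrected_dcor` and `gcl`, and the parameters `n_splits` and `seed`, are hypothetical.

```python
import numpy as np


def _u_centered_distances(x):
    """U-centered Euclidean distance matrix between the rows (cells) of x."""
    n = x.shape[0]
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d = np.sqrt(np.maximum(d2, 0.0))
    np.fill_diagonal(d, 0.0)
    row = d.sum(axis=1, keepdims=True)
    col = d.sum(axis=0, keepdims=True)
    total = d.sum()
    u = d - row / (n - 2) - col / (n - 2) + total / ((n - 1) * (n - 2))
    np.fill_diagonal(u, 0.0)  # diagonal is defined as zero
    return u


def bias_corrected_dcor(x, y):
    """Bias-corrected distance correlation between two cell-by-gene blocks.

    Assumption: U-centering estimator; values can be slightly negative
    when the two blocks are independent.
    """
    n = x.shape[0]
    if n != y.shape[0] or n < 4:
        raise ValueError("x and y need the same number of rows (at least 4)")
    a = _u_centered_distances(x)
    b = _u_centered_distances(y)
    norm = n * (n - 3)
    cov_xy = np.sum(a * b) / norm
    var_x = np.sum(a * a) / norm
    var_y = np.sum(b * b) / norm
    denom = np.sqrt(var_x * var_y)
    return float(cov_xy / denom) if denom > 0 else 0.0


def gcl(expr, n_splits=50, seed=None):
    """Average distance correlation over random two-way splits of the genes.

    expr: array of shape (n_cells, n_genes).
    Assumption: each repetition splits the genes into two random halves.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes = expr.shape[1]
    if n_genes < 2:
        raise ValueError("at least two genes are required")
    rng = np.random.default_rng(seed)
    half = n_genes // 2
    values = np.empty(n_splits)
    for k in range(n_splits):
        perm = rng.permutation(n_genes)
        values[k] = bias_corrected_dcor(expr[:, perm[:half]], expr[:, perm[half:]])
    return float(values.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    independent = rng.normal(size=(100, 40))          # no shared structure
    latent = rng.normal(size=(100, 1))                # shared hidden factor
    coordinated = 2.0 * latent + rng.normal(size=(100, 40))
    print("independent genes:", gcl(independent, seed=1))
    print("coordinated genes:", gcl(coordinated, seed=1))
```

In this toy setting, the synthetic cohort whose genes share a latent factor should give a clearly higher value than the cohort of independent genes, for which the value should be close to zero.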

However, there are several important pre- and post-processing steps that need to be taken into account to ensure the accuracy and reliability of the GCL. Firstly, clustering methods should be applied to divide the analyzed cohort of cells into subsets, and the GCL should be calculated separately for each subset or the largest one to ensure homogeneity. Secondly, cells that deviate significantly from the rest of the cells (referred to as 'outliers') or cells that are too similar to each other (referred to as 'inliers') should be filtered out to avoid their undue influence on the GCL calculation.

Additionally, jackknife analysis, which involves systematically omitting subsets of cells from the analysis and recalculating the GCL, should be performed to test the stability of the results. These steps are necessary because the GCL, like other correlation measures, can be sensitive to unusual cells and heterogeneous cohorts, especially in the context of sparse, noisy, and outlier-prone scRNA-seq data.
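
The jackknife step can be illustrated with a short, generic Python sketch. It assumes a delete-a-block scheme in which cells are randomly partitioned into blocks and each block is omitted in turn; the source does not specify the block size or number, so `n_blocks` is a hypothetical parameter, as are the names `jackknife_estimates` and `statistic`. Any function that maps a cell-by-gene matrix to a number, such as a GCL implementation, can be passed as `statistic`; the demo below uses a simple stand-in.

```python
import numpy as np


def jackknife_estimates(expr, statistic, n_blocks=10, seed=None):
    """Recompute `statistic` with each block of cells left out in turn.

    expr: array of shape (n_cells, n_genes).
    statistic: callable taking a (cells, genes) array and returning a number.
    Returns an array with one estimate per omitted block.
    """
    expr = np.asarray(expr)
    n_cells = expr.shape[0]
    if not 2 <= n_blocks <= n_cells:
        raise ValueError("n_blocks must be between 2 and the number of cells")
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(n_cells), n_blocks)
    estimates = []
    for held_out in blocks:
        keep = np.setdiff1d(np.arange(n_cells), held_out)
        estimates.append(statistic(expr[keep]))
    return np.asarray(estimates)


if __name__ == "__main__":
    # Stand-in statistic (an assumption for the demo, not the GCL itself):
    # mean absolute off-diagonal Pearson correlation between genes.
    def mean_abs_corr(m):
        c = np.corrcoef(m, rowvar=False)
        off_diag = c[~np.eye(c.shape[0], dtype=bool)]
        return float(np.mean(np.abs(off_diag)))

    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 15))
    est = jackknife_estimates(data, mean_abs_corr, n_blocks=10, seed=1)
    print("mean:", est.mean(), "spread (std):", est.std(ddof=1))
```

A small spread among the leave-block-out estimates suggests that the result does not hinge on a few unusual cells, whereas a large spread suggests that the cohort should be inspected for outliers or heterogeneity.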

Applications

Aging: Stochastic aberration of transcriptional regulation is a dominant factor in the process of aging.[4] When GCL was assessed in multiple single-cell RNA-sequencing datasets, a decline of GCL with age was consistently observed across various organisms and cell types. Notably, significant decreases in GCL were found in mouse hematopoietic stem cells based on single-cell RNA-seq data, supporting the hypothesis of aging as dys-differentiation. This idea, originally posited by Richard Cutler in the 1970s, suggests that cells deviate from their proper state of differentiation as they age, as evidenced by the activation in aged tissues of genes that should normally be silent.[5]

Measuring biological variability: The GCL decreases in cohorts of cells with increased 'biological variability' only when that variability arises from gene interactions. The GCL can therefore be used to assess and compare the ratio between introduced biological and technical variability in cohorts with similar cell-to-cell variability.[6]

References

  1. Banf, Michael; Rhee, Seung Y (1 January 2017). "Computational inference of gene regulatory networks: Approaches, limitations and opportunities". Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms 1860 (1): 41–52. doi:10.1016/j.bbagrm.2016.09.003. PMID 27641093. 
  2. Chai, L. E.; Loh, S. K.; Low, S. T.; Mohamad, M. S.; Deris, S.; Zakaria, Z. (2014). "A review on the computational approaches for gene regulatory network construction". Computers in Biology and Medicine 48: 55–65. doi:10.1016/j.compbiomed.2014.02.011. PMID 24637147. https://pubmed.ncbi.nlm.nih.gov/24637147/. 
  3. Székely, Gábor J.; Rizzo, Maria L. (2013). "The distance correlation t-test of independence in high dimension". Journal of Multivariate Analysis 117: 193–213. doi:10.1016/j.jmva.2013.02.012. 
  4. Warren, L. A.; Rossi, D. J.; Schiebinger, G. R.; Weissman, I. L.; Kim, S. K.; Quake, S. R. (2007). "Transcriptional instability is not a universal attribute of aging". Aging Cell 6 (6): 775–782. doi:10.1111/j.1474-9726.2007.00337.x. PMID 17925006. 
  5. Ono, Tetsuya; Cutler, Richard G. (1978). "Age-dependent relaxation of gene repression: Increase of endogenous murine leukemia virus-related and globin-related RNA in brain and liver of mice". Proceedings of the National Academy of Sciences 75 (9): 4431–4435. doi:10.1073/pnas.75.9.4431. PMID 212751. Bibcode: 1978PNAS...75.4431O. 
  6. Vaknin, Dana; Amit, Guy; Bashan, Amir (2021). "A top-down measure of gene-to-gene coordination for analyzing cell-to-cell variability". Scientific Reports 11 (1): 11075. doi:10.1038/s41598-021-90353-w. PMID 34040065. Bibcode: 2021NatSR..1111075V. This article incorporates text from this source, which is available under the CC BY 4.0 license.