Biweight midcorrelation

In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. It is median-based, rather than mean-based, thus is less sensitive to outliers, and can be a robust alternative to other similarity metrics, such as Pearson correlation or mutual information.^[1]

Derivation

Here we find the biweight midcorrelation of two vectors $x$ and $y$ , with $i=1,2,\ldots ,m$ items, representing each item in the vector as $x_{1},x_{2},\ldots ,x_{m}$ and $y_{1},y_{2},\ldots ,y_{m}$ . First, we define $\operatorname {med} (x)$ as the median of a vector $x$ and $\operatorname {mad} (x)$ as the median absolute deviation (MAD), then define $u_{i}$ and $v_{i}$ as,

{\begin{aligned}u_{i}&={\frac {x_{i}-\operatorname {med} (x)}{9\operatorname {mad} (x)}},\\v_{i}&={\frac {y_{i}-\operatorname {med} (y)}{9\operatorname {mad} (y)}}.\end{aligned}}

Now we define the weights $w_{i}^{(x)}$ and $w_{i}^{(y)}$ as,

{\begin{aligned}w_{i}^{(x)}&=\left(1-u_{i}^{2}\right)^{2}I\left(1-|u_{i}|\right)\\w_{i}^{(y)}&=\left(1-v_{i}^{2}\right)^{2}I\left(1-|v_{i}|\right)\end{aligned}}

where $I$ is the identity function where,

I(x)={\begin{cases}1,&{\text{if }}x>0\\0,&{\text{otherwise}}\end{cases}}

Then we normalize so that the sum of the weights is 1:

{\begin{aligned}{\tilde {x}}_{i}&={\frac {\left(x_{i}-\operatorname {med} (x)\right)w_{i}^{(x)}}{\sqrt {\sum _{j=1}^{m}\left[(x_{j}-\operatorname {med} (x))w_{j}^{(x)}\right]^{2}}}}\\{\tilde {y}}_{i}&={\frac {\left(y_{i}-\operatorname {med} (y)\right)w_{i}^{(y)}}{\sqrt {\sum _{j=1}^{m}\left[(y_{j}-\operatorname {med} (y))w_{j}^{(y)}\right]^{2}}}}.\end{aligned}}

Finally, we define biweight midcorrelation as,

\mathrm {bicor} \left(x,y\right)=\sum _{i=1}^{m}{\tilde {x}}_{i}{\tilde {y}}_{i}

Applications

Biweight midcorrelation has been shown to be more robust in evaluating similarity in gene expression networks,^[2] and is often used for weighted correlation network analysis.

Implementations

Biweight midcorrelation has been implemented in the R statistical programming language as the function bicor as part of the WGCNA package^[3]

Also implemented in the Raku programming language as the function bi_cor_coef as part of the Statistics module.^[4]

References

^ Wilcox, Rand (January 12, 2012). Introduction to Robust Estimation and Hypothesis Testing (3rd ed.). Academic Press. p. 455. ISBN 978-0123869838.
^ Song, Lin (9 December 2012). "Comparison of co-expression measures: mutual information, correlation, and model based indices". BMC Bioinformatics. 13 (328): 328. doi:10.1186/1471-2105-13-328. PMC 3586947. PMID 23217028.
^ Langfelder, Peter. "WGCNA: Weighted Correlation Network Analysis (an R package)". CRAN. Retrieved 2018-04-06.
^ Khanal, Suman. "Statistics: Raku module for doing statistics". GitHub. Retrieved 2022-03-11.

[1] Wilcox, Rand (January 12, 2012). Introduction to Robust Estimation and Hypothesis Testing (3rd ed.). Academic Press. p. 455. ISBN 978-0123869838.

[2] Song, Lin (9 December 2012). "Comparison of co-expression measures: mutual information, correlation, and model based indices". BMC Bioinformatics. 13 (328): 328. doi:10.1186/1471-2105-13-328. PMC 3586947. PMID 23217028.

[3] Langfelder, Peter. "WGCNA: Weighted Correlation Network Analysis (an R package)". CRAN. Retrieved 2018-04-06.

[4] Khanal, Suman. "Statistics: Raku module for doing statistics". GitHub. Retrieved 2022-03-11.

[1]

[2]

[3]

[4]