DESeq2 is a software package for the statistical programming language R, used in bioinformatics and computational biology. It is primarily applied to count data from high-throughput RNA sequencing (RNA-seq) experiments in order to identify genes that are differentially expressed between experimental conditions. DESeq2 normalizes RNA-seq counts and tests them for differential expression with statistical models, making it a valuable tool for researchers studying gene expression patterns and regulation. It is distributed through the Bioconductor repository.

Original author(s): Michael Love, Constantin Ahlmann-Eltze, Kwame Forbes, Simon Anders, Wolfgang Huber
Initial release: 22 March 2013
Stable release: 1.40.2 (20 August 2023)
Repository: github.com/thelovelab/DESeq2
Operating system: Linux, macOS, Windows
Platform: R programming language
Type: Bioinformatics
License: GNU Lesser General Public License
Website: bioconductor.org/packages/release/bioc/html/DESeq2.html

DESeq2 was described in an article published in Genome Biology in 2014.[1] As of September 2023, that article had been cited over 30,000 times.[2]

Features

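One of the key steps in the analysis of RNA-seq data is normalization.[3] DESeq2 estimates a "size factor" for each sample, which accounts for differences in sequencing depth between samples.[1] By default, size factors are computed with the median-of-ratios method. Each sample's counts are compared gene by gene with a pseudo-reference sample, formed from the geometric mean of each gene's counts across all samples. The median of these ratios is then taken as that sample's size factor. Because the median is robust to a minority of strongly changing genes, this approach is less sensitive to differences in RNA composition than scaling by total read count. DESeq2 does not alter the raw counts; instead it includes the size factors in its statistical model, so that expression levels are compared on a common scale across samples. Separately, DESeq2 provides a variance-stabilizing transformation (VST), which yields transformed values whose variance is approximately constant across the range of expression levels.[4] These transformed values are intended for exploratory tasks such as clustering, principal component analysis and visualization. The differential expression test itself is performed on the raw counts.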
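The Python sketch below gives a rough illustration of the median-of-ratios idea by computing size factors for a small count table. It is a simplified re-implementation written for this explanation, not DESeq2's own code. The function names are hypothetical, and the input is assumed to be a list of per-gene rows of raw counts. Only genes with a nonzero count in every sample are used to build the pseudo-reference.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample with the median-of-ratios method.

    counts: list of rows, one per gene; every row holds that gene's raw
    read counts for all samples, in the same sample order.
    """
    n_samples = len(counts[0])
    # Only genes with a nonzero count in every sample have a finite
    # log geometric mean, so only they form the pseudo-reference.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log of each gene's geometric mean across samples (the pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of sample j's count to the pseudo-reference, gene by gene.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        # Exponentiating the median log ratio gives the size factor for sample j.
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's raw counts by its size factor (for display only)."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]
```

For example, `size_factors([[10, 20], [20, 40], [30, 60], [0, 5]])` returns approximately `[0.707, 1.414]`. The second sample has twice the depth of the first for every usable gene, and the gene with a zero count is ignored. Dividing the counts by these factors puts both samples on a common scale. In DESeq2's model, by contrast, the size factor multiplies the expected count rather than being divided out of the data.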

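DESeq2 models the read counts of each gene with a negative binomial distribution, which accounts for the over-dispersion commonly observed in RNA-seq data.[5] Over-dispersion means that the variance of counts across biological replicates exceeds what a Poisson distribution, whose variance equals its mean, can explain. In the negative binomial model used by DESeq2, the variance of a count with mean μ is μ + αμ², where α is a gene-specific dispersion parameter. DESeq2 estimates α for each gene and shrinks these estimates toward a trend fitted across all genes. This produces more reliable dispersion estimates, and therefore more reliable tests for differential expression, when only a few replicates are available.[1]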
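The mean-variance relationship above can be checked numerically. The Python sketch below writes out the negative binomial probability mass function in the mean/dispersion parameterization, a standard form with size parameter `r = 1/alpha`. It then sums the function over a range of counts to recover the mean and variance. The function names are hypothetical, and this is an illustration of the distribution rather than DESeq2's fitting code.

```python
import math


def nb_pmf(k, mu, alpha):
    """Negative binomial probability of observing count k.

    Parameterized by mean mu and dispersion alpha, so that
    Var = mu + alpha * mu**2 (alpha -> 0 approaches the Poisson).
    """
    r = 1.0 / alpha  # size parameter
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)


def moments(mu, alpha, k_max=2000):
    """Mean and variance obtained by summing the pmf over k = 0..k_max."""
    probs = [nb_pmf(k, mu, alpha) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var
```

With these definitions, `moments(10, 0.2)` gives a mean of about 10 and a variance of about 10 + 0.2 × 10² = 30. That is three times the variance of a Poisson distribution with the same mean.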

DESeq2 can also shrink log fold change estimates through its lfcShrink function, which supports several estimators. One of these is the "apeglm" method, an adaptive shrinkage estimator with a heavy-tailed prior that is implemented in the separate apeglm package.[6] Log fold change estimates for genes with low counts or high dispersion are noisy and can be large purely by chance. Shrinkage pulls such estimates toward zero while leaving well-supported estimates largely unchanged. This is especially valuable for experiments with few biological replicates, where noisy estimates are common. It also makes shrunken fold changes well suited to ranking genes by effect size and to visualization.
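apeglm itself fits a heavy-tailed prior and is too involved for a short example. The Python sketch below therefore shows a simpler technique: shrinkage with a zero-centred normal prior, the approach used in the original 2014 DESeq2 paper. It relies on the normal-normal approximation, in which the shrunken estimate equals the maximum-likelihood log fold change multiplied by prior_var / (prior_var + se²). The function name and the fixed prior standard deviation are assumptions made for illustration. DESeq2 estimates the prior width from the data and fits the penalized model directly.

```python
def shrink_lfc(lfc_mle, se, prior_sd):
    """Posterior mean of a log2 fold change under a N(0, prior_sd**2) prior.

    Treats the maximum-likelihood estimate lfc_mle as normally distributed
    with standard error se (normal-normal approximation, not apeglm).
    """
    prior_var = prior_sd ** 2
    # Weight near 1 for precise estimates, near 0 for very noisy ones.
    weight = prior_var / (prior_var + se ** 2)
    return weight * lfc_mle


# Two hypothetical genes with the same raw estimate but different precision.
for se in (0.2, 2.0):
    print(se, shrink_lfc(3.0, se, prior_sd=1.0))
```

With a prior standard deviation of 1, an estimate of 3.0 with standard error 0.2 is shrunk only slightly, to about 2.88. The same estimate with standard error 2.0, typical of a low-count gene, is shrunk to about 0.6.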

DESeq2 also allows users to include additional covariates in the model through a design formula.[1] This lets researchers account for known sources of variation other than the condition of interest, such as batch effects, which could otherwise confound the comparison. Including such covariates gives a more accurate estimate of the differential expression attributable to the condition of interest.
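In DESeq2, covariates are listed in the design formula, for example `~ batch + condition`, and R expands the formula into a design matrix. The Python sketch below imitates that expansion with treatment coding, where the alphabetically first level of each factor serves as the reference. It is a hypothetical helper that mirrors the default behaviour of R's `model.matrix` for simple factors; it is not part of DESeq2.

```python
def design_matrix(samples, factors):
    """Build a treatment-coded design matrix, one row per sample.

    samples: list of dicts mapping factor name -> level for that sample.
    factors: factor names in formula order, e.g. ["batch", "condition"].
    The alphabetically first level of each factor is the reference level
    and gets no column of its own.
    """
    columns = ["(Intercept)"]
    levels = {}
    for f in factors:
        levels[f] = sorted({s[f] for s in samples})
        # One indicator column per non-reference level, named factor+level.
        columns += [f + lvl for lvl in levels[f][1:]]
    rows = []
    for s in samples:
        row = [1]  # intercept
        for f in factors:
            row += [1 if s[f] == lvl else 0 for lvl in levels[f][1:]]
        rows.append(row)
    return columns, rows
```

Suppose four samples are split across batches A and B, with one control and one treated sample per batch. The resulting columns are `(Intercept)`, `batchB` and `conditiontreated`. The coefficient for `conditiontreated` then estimates the treatment effect after accounting for the batch difference.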

Use

DESeq2 is used from within R and is installed from the Bioconductor repository.[7] Its Bioconductor page provides documentation, including a tutorial-style vignette that walks through a typical analysis, making the package accessible to a wide range of researchers.

References

  1. Love, Michael I; Huber, Wolfgang; Anders, Simon (December 2014). "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2". Genome Biology. 15 (12): 550. doi:10.1186/s13059-014-0550-8. PMC 4302049. PMID 25516281.
  2. Love, M. I.; Huber, W.; Anders, S. (2014). "Citation Metrics". Genome Biology. 15 (12). University of Otago: 550. doi:10.1186/s13059-014-0550-8. PMC 4302049. PMID 25516281.
  3. Evans, Ciaran; Hardin, Johanna; Stoebel, Daniel M (28 September 2018). "Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions". Briefings in Bioinformatics. 19 (5): 776–792. doi:10.1093/bib/bbx008. PMC 6171491. PMID 28334202.
  4. "varianceStabilizingTransformation: Apply a variance stabilizing transformation (VST) to the..." rdrr.io. Archived from the original on 28 September 2023. Retrieved 28 September 2023.
  5. "Gene-level differential expression analysis". HBC Training. Github.io. 15 May 2020. Archived from the original on 28 September 2023. Retrieved 28 September 2023.
  6. Chipman, Hugh A.; Kolaczyk, Eric D.; McCulloch, Robert E. (December 1997). "Adaptive Bayesian Wavelet Shrinkage". Journal of the American Statistical Association. 92 (440): 1413. doi:10.2307/2965411. JSTOR 2965411.
  7. "DESeq2: An Overview of a Popular RNA-Seq Analysis Package". pluto.bio. 18 October 2021. Archived from the original on 27 September 2023. Retrieved 27 September 2023.