The class of a-b power interaction models, proposed by [1], provides a general framework for modeling sparse compositional data with pairwise feature interactions. This class includes many distributions as special cases and enables modeling of zero entries through power transformations, making it particularly suitable for modern high-throughput sequencing data with excess zeros, including single-cell RNA-Seq and microbial amplicon data. Here, we present an extension of this class of models that allows inclusion of covariate information, thus enabling accurate characterization of covariate dependencies in heterogeneous populations. Combining this model with a tailored differential abundance (DA) test leads to a novel DA testing scheme, cosmoDA, that can reduce the false positive detection rate caused by correlated features. cosmoDA uses penalized generalized score matching for parsimonious model fitting. We show on simulated benchmarks that cosmoDA can accurately estimate feature interactions in the presence of population heterogeneity and significantly reduces the false discovery rate when testing for differential abundance of correlated features. Using single-cell and amplicon data, we illustrate cosmoDA’s ability to estimate data-adaptive Box-Cox-type data transformations and assess the impact of zero replacement and power transformations on downstream differential abundance results. cosmoDA is available at https://github.com/bio-datascience/cosmoDA.
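To illustrate why power transformations can accommodate zero entries, here is a minimal sketch of a standard Box-Cox-type transform applied to a toy composition. The function name `power_transform`, the toy counts, and the exponent 0.5 are illustrative assumptions; this is not the cosmoDA API and not necessarily the exact parameterization used in the paper.

```python
import numpy as np

def power_transform(x, a):
    """Box-Cox-type transform (x**a - 1) / a, with log(x) as the a -> 0 limit.

    Hypothetical helper for illustration only; not part of cosmoDA.
    """
    x = np.asarray(x, dtype=float)
    if a == 0:
        # The log transform sends zero entries to -inf.
        with np.errstate(divide="ignore"):
            return np.log(x)
    # For a > 0, zero entries map to the finite value -1 / a.
    return (x ** a - 1.0) / a

# Toy count vector with one zero entry (made-up numbers).
counts = np.array([12.0, 0.0, 3.0, 5.0])
composition = counts / counts.sum()

log_values = power_transform(composition, a=0)
power_values = power_transform(composition, a=0.5)

print(log_values)
print(power_values)
```

With a positive exponent every entry stays finite, so no zero replacement is needed before transforming, whereas the log transform is undefined at zero.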
Recommended citation: Ostner, J., Li, H., and Müller, C. L. (2026). Score matching for differential abundance testing of compositional high-throughput sequencing data. Stat. Med. 45, e70534.
