Vectorized chi-squared test for 2x2 contingency tables

Performs a chi-squared test, with optional Yates' continuity correction, on each row of a two-column count matrix. For each row, a 2x2 contingency table is built from the row's two counts and the remaining counts in each column (the column sum minus the row's count), and the chi-squared statistic and p-value are computed from that table.

tgs_chi2(x, yates = TRUE)

Arguments

x: a numeric matrix, or a sparse matrix of class dgCMatrix, with exactly 2 columns of non-negative counts.
yates: logical; if TRUE (default), Yates' continuity correction is applied to the chi-squared statistic.

Value

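A numeric matrix with nrow(x) rows and 2 columns named "chi2" and "pval". Row names from x are preserved. When the denominator of the chi-squared formula is zero (in the standard 2x2 formula this happens when a marginal total of the table is zero, for example a row with zero counts in both columns), the statistic is set to 0 and the p-value to 1.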
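
The return shape and the zero-denominator convention can be illustrated with a small base-R sketch. The helper chi2_reference below is hypothetical and is not the package's implementation: it assumes a plain numeric matrix, uses the standard shortcut formula for 2x2 tables, and assumes the Yates correction is capped so that it never overshoots zero (as stats::chisq.test does). Whether tgs_chi2 agrees with it to the last digit is not stated in this page.

```r
# Hypothetical base-R reference (assumption: not the tgs_chi2 implementation).
chi2_reference <- function(x, yates = TRUE) {
  # Work in doubles so products of large counts cannot overflow integers
  a <- as.numeric(x[, 1])       # gene i, condition 1
  b <- as.numeric(x[, 2])       # gene i, condition 2
  c <- sum(a) - a               # other genes, condition 1
  d <- sum(b) - b               # other genes, condition 2
  n <- a + b + c + d

  # Denominator: product of the four marginal totals of the 2x2 table
  denom <- (a + b) * (c + d) * (a + c) * (b + d)

  diff <- abs(a * d - b * c)
  if (yates) diff <- pmax(0, diff - n / 2)  # capped continuity correction

  # Convention from the Value section: zero denominator gives chi2 = 0
  chi2 <- ifelse(denom > 0, n * diff^2 / denom, 0)
  pval <- pchisq(chi2, df = 1, lower.tail = FALSE)  # equals 1 when chi2 is 0

  out <- cbind(chi2 = chi2, pval = pval)
  rownames(out) <- rownames(x)
  out
}

# Toy counts (illustrative only); geneB has no counts in either condition
counts <- matrix(c(30, 10,
                    0,  0,
                   20, 40), ncol = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cond1", "cond2")))
res <- chi2_reference(counts)
res
```

In this sketch geneB has a row total of zero, so the denominator vanishes and the row falls back to a statistic of 0 and a p-value of 1 instead of NaN.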

Details

This function is useful for differential gene expression analysis, where each row represents a gene and the two columns hold UMI counts in two conditions. For each gene, the test asks whether the gene's share of the total counts differs significantly between the two conditions.

For each row i, the contingency table is as follows, where colSum1 and colSum2 are the sums of the first and second columns of x:

              Condition 1         Condition 2
Gene i        x[i,1]              x[i,2]
Other genes   colSum1 - x[i,1]    colSum2 - x[i,2]
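
As an illustration of the table above, the following base-R sketch builds the 2x2 table for a single row and passes it to stats::chisq.test. The toy matrix and variable names are made up for the example, and the assumption that tgs_chi2 returns values comparable to chisq.test for the same table is not confirmed by this page.

```r
# Toy counts; gene names and values are illustrative only
x <- matrix(c(30, 10,
              20, 40,
              50, 50), ncol = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB", "geneC"),
                            c("cond1", "cond2")))

i <- 1                      # row (gene) to test
col_sums <- colSums(x)      # colSum1 and colSum2 in the table above

# First row: counts of gene i; second row: counts of all other genes
tab <- rbind(gene  = x[i, ],
             other = col_sums - x[i, ])

# correct = TRUE applies Yates' continuity correction to the 2x2 table
test <- stats::chisq.test(tab, correct = TRUE)
test$statistic
test$p.value
```

Looping this over every row gives one test per gene, which is the per-row computation that the vectorized function is described as performing in a single call.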

Examples

# \donttest{
# Note: all the available CPU cores might be used

set.seed(42)
# Simulate UMI counts for 1000 genes in 2 conditions
mat <- matrix(rpois(2000, lambda = 100), ncol = 2)
rownames(mat) <- paste0("gene", 1:1000)
colnames(mat) <- c("condition1", "condition2")
result <- tgs_chi2(mat)
head(result)
#>              chi2      pval
#> gene1 0.170807851 0.6793948
#> gene2 0.464785853 0.4953958
#> gene3 0.006138945 0.9375485
#> gene4 0.338068873 0.5609460
#> gene5 0.000000000 1.0000000
#> gene6 0.000000000 1.0000000

# Without Yates' correction
result_no_yates <- tgs_chi2(mat, yates = FALSE)

# With sparse matrix
sparse_mat <- Matrix::Matrix(mat, sparse = TRUE)
result_sparse <- tgs_chi2(sparse_mat)
# }