The goal of `tgstat` is to provide fast and efficient implementations of certain R functions, such as `cor` and `dist`, along with specific statistical tools.

Various approaches are used to boost performance, including multi-processing and the use of optimized functions provided by the Basic Linear Algebra Subprograms (BLAS) library.

Install from CRAN:

`install.packages("tgstat")`

For the development version:

`remotes::install_github("tanaylab/tgstat")`

```
library(tgstat)
set.seed(seed = 60427)
rows <- 3000
cols <- 3000
vals <- sample(1:(rows * cols / 2), rows * cols, replace = TRUE)
m <- matrix(vals, nrow = rows, ncol = cols)
m_with_NAs <- m
m_with_NAs[sample(1:(rows * cols), rows * cols / 10)] <- NA
dim(m)
#> [1] 3000 3000
```

Pearson correlation without BLAS, no NAs:

```
options(tgs_use.blas = FALSE)
system.time(tgs_cor(m))
#> user system elapsed
#> 106.865 1.951 2.331
```

Same with BLAS:

```
# tgs_cor, with BLAS, no NAs, pearson
options(tgs_use.blas = TRUE)
system.time(tgs_cor(m))
#> user system elapsed
#> 4.228 0.324 0.809
```

Base R version:

```
system.time(cor(m))
#> user system elapsed
#> 21.780 0.078 21.857
```

Pearson correlation without BLAS, with NAs:

```
options(tgs_use.blas = FALSE)
system.time(tgs_cor(m_with_NAs, pairwise.complete.obs = TRUE))
#> user system elapsed
#> 158.846 2.687 3.164
```

Same with BLAS:

```
options(tgs_use.blas = TRUE)
system.time(tgs_cor(m_with_NAs, pairwise.complete.obs = TRUE))
#> user system elapsed
#> 11.286 1.173 0.803
```

Base R version:

```
system.time(cor(m_with_NAs, use = "pairwise.complete.obs"))
#> user system elapsed
#> 311.627 0.182 311.823
```

Distance without BLAS, no NAs:

```
options(tgs_use.blas = FALSE)
system.time(tgs_dist(m))
#> user system elapsed
#> 354.742 2.509 5.002
```

Same with BLAS:

```
options(tgs_use.blas = TRUE)
system.time(tgs_dist(m))
#> user system elapsed
#> 7.407 0.656 0.462
```

Base R version:

```
system.time(dist(m, method = "euclidean"))
#> user system elapsed
#> 164.197 0.077 164.280
```
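To put the timings above side by side, the elapsed times can be collected into a small table and turned into speedup ratios relative to base R. This is only a sketch: the numbers are copied from the outputs shown above, and the `timings` data frame and its column names are introduced here for illustration. Results on other machines will differ.

```r
# Elapsed times (seconds) copied from the system.time() outputs above.
timings <- data.frame(
  task           = c("cor, no NAs", "cor, with NAs", "dist, no NAs"),
  base_r         = c(21.857, 311.823, 164.280),
  tgstat_no_blas = c(2.331, 3.164, 5.002),
  tgstat_blas    = c(0.809, 0.803, 0.462)
)

# Speedup = base R elapsed time divided by tgstat elapsed time.
timings$speedup_no_blas <- round(timings$base_r / timings$tgstat_no_blas, 1)
timings$speedup_blas    <- round(timings$base_r / timings$tgstat_blas, 1)

print(timings)
```

Note that the `user` times of the `tgstat` runs are much larger than their `elapsed` times, which is what one would expect when the work is spread over several processes.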

Note regarding BLAS:

`tgstat` runs best when R is linked with an optimized BLAS implementation.

Many optimized BLAS implementations are available, both proprietary (e.g. Intel’s MKL, Apple’s vecLib) and open source (e.g. OpenBLAS, ATLAS). Unfortunately, R often uses the reference BLAS implementation by default, which is known to perform poorly.
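To see which BLAS library the current R session is linked against, base R can report the library paths. The sketch below uses `extSoftVersion()` and `La_library()` (available in R 3.4.0 and later). The final filename check is an assumption added here as a rough heuristic, not an authoritative test: a generic name such as `libblas.so` may still point to an optimized library.

```r
# Paths of the BLAS and LAPACK libraries used by this R session.
# The BLAS entry can be an empty string on platforms where R cannot determine it.
blas_path   <- extSoftVersion()[["BLAS"]]
lapack_path <- La_library()

cat("BLAS:  ", blas_path, "\n")
cat("LAPACK:", lapack_path, "\n")

# Assumption: guess from the path whether it names a known optimized library.
known_optimized <- c("openblas", "mkl", "atlas", "veclib", "accelerate")
looks_optimized <- grepl(paste(known_optimized, collapse = "|"),
                         blas_path, ignore.case = TRUE)
cat("Path names a known optimized BLAS:", looks_optimized, "\n")
```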

Having `tgstat` rely on the reference BLAS will result in very poor performance and is strongly discouraged. If your R installation uses an optimized BLAS, set `options(tgs_use.blas=TRUE)` to allow `tgstat` to make BLAS calls. Otherwise, set `options(tgs_use.blas=FALSE)` (default), which instructs `tgstat` to avoid BLAS and rely only on its own optimization methods. If in doubt, run one of `tgstat`'s CPU-intensive functions (e.g. `tgs_cor`) and compare its run time under both `options(tgs_use.blas=FALSE)` and `options(tgs_use.blas=TRUE)`.
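The comparison suggested above can be sketched as a small helper that times the same call under both settings and then restores the previous option value. `time_with_blas_option` is a hypothetical helper written for this example, not part of `tgstat`, and the 500 x 500 test matrix is an arbitrary choice; a larger matrix gives a clearer signal.

```r
# Hypothetical helper: time fun(...) with tgs_use.blas set to FALSE, then TRUE.
time_with_blas_option <- function(fun, ...) {
  old <- options(tgs_use.blas = FALSE)   # remember the previous setting
  on.exit(options(old), add = TRUE)      # restore it when the function exits
  t_no_blas <- system.time(fun(...))[["elapsed"]]
  options(tgs_use.blas = TRUE)
  t_blas <- system.time(fun(...))[["elapsed"]]
  c(no_blas = t_no_blas, blas = t_blas)
}

# Run the comparison only if tgstat is installed.
if (requireNamespace("tgstat", quietly = TRUE)) {
  set.seed(1)
  m_small <- matrix(rnorm(500 * 500), nrow = 500)
  print(time_with_blas_option(tgstat::tgs_cor, m_small))
}
```

If the `blas` time is not clearly lower than the `no_blas` time, R is probably linked with the reference BLAS, and leaving `tgs_use.blas` at `FALSE` is the safer choice.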

Exact instructions for linking R with an optimized BLAS library are system dependent and are beyond the scope of this document.