Old Linux Mmap

TanayLabUtilities.OldLinuxMmap Module

(Very partially) mitigate memory mapping issues in old Linux kernels.

Linux kernels prior to 5.7 serialize page faults from multiple threads of the same process. This means that if we memory map a file, and different threads access different pages, triggering page faults, then all the I/O related code becomes serial, which butchers performance.

A very partial and unsatisfactory workaround is to have the file in a RAM disk, and also pre-populate the page table when mapping the file. With the file in RAM there is no actual I/O, and serially pre-populating the page table up front is faster than taking the serialized page faults in the middle of the parallel code.
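As a rough illustration of what "pre-populating" means, the sketch below maps a file with the standard `Mmap` module and then reads one byte from every page in a serial loop, so the page faults happen before any parallel code runs. This is not necessarily how the module does it; touching pages by hand is a stand-in technique. The name `mmap_and_touch` and the constant `ASSUMED_PAGE_BYTES` are hypothetical, and the 4096-byte page size is an assumption.

```julia
using Mmap

# ASSUMPTION: 4096-byte pages (typical on x86-64 Linux); not queried from the OS.
const ASSUMED_PAGE_BYTES = 4096

# Hypothetical helper, not part of TanayLabUtilities: map `path` read-only as an
# array of `T` with the given dimensions, then serially touch every page.
function mmap_and_touch(path::AbstractString, ::Type{T}, dims::NTuple{N, Int}) where {T, N}
    array = open(path, "r") do io
        # The mapping stays valid after the file handle is closed.
        return Mmap.mmap(io, Array{T, N}, dims; grow = false)
    end

    # View the mapped memory as raw bytes and read one byte per page.
    bytes = reinterpret(UInt8, vec(array))
    checksum = 0x00
    for offset in 1:ASSUMED_PAGE_BYTES:length(bytes)
        checksum ⊻= bytes[offset]
    end

    # Returning the checksum keeps the reads from being optimized away.
    return array, checksum
end
```

After such a call, threads that later read the array should find the pages already present in the page table, as long as the data stays resident (which is the point of keeping the file in a RAM disk).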

Of course, this requires enough RAM so that having the files in a RAM disk (e.g. /dev/shm) is practical, and also requires manually copying the files from/to a real disk if they need to be persistent, so this workaround only works when the stars are in alignment.

However, if you are stuck with an ancient Linux kernel, this can dramatically increase performance of multi-threaded code that uses memory mapping (specifically, Metacells code). Given 5.7 was released in 2020 and will be end-of-life in 2029, and large institute IT departments sometimes use ancient kernel versions until the last possible moment, we should probably be able to delete this module in 2030. Sigh.

TanayLabUtilities.OldLinuxMmap.mmap_populate_if_old_linux_ramdisk Function
mmap_populate_if_old_linux_ramdisk(
    path::AbstractString,
    ::Type{ArrayT},
    size::Union{Integer, Tuple{<:Integer, <:Integer}},
    mode::AbstractString
)::ArrayT where {ArrayT <: Array}

Similar to mmap, but pre-populates the page table if running on Linux, the kernel is older than 5.7, and the file is in a RAM disk. The mode must be either r or r+. Mapping is always done with grow = false.
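The three conditions above could be checked along the following lines. This is only a sketch under stated assumptions, not the module's actual logic: the helper names `kernel_is_older_than` and `should_populate` are hypothetical, the kernel release is read from /proc/sys/kernel/osrelease, and "is in a RAM disk" is approximated by testing whether the path is under /dev/shm, which will miss other RAM-backed mounts.

```julia
# Hypothetical helper: compare the leading "major.minor" of a kernel release
# string such as "4.18.0-553.el8" against a threshold version.
function kernel_is_older_than(release::AbstractString, major::Integer, minor::Integer)
    parsed = match(r"^(\d+)\.(\d+)", release)
    parsed === nothing && return false  # Unparseable release: do not claim "old".
    found = (parse(Int, parsed.captures[1]), parse(Int, parsed.captures[2]))
    return found < (major, minor)  # Tuples compare lexicographically.
end

# Hypothetical helper: true only if all three conditions hold.
function should_populate(path::AbstractString)
    Sys.islinux() || return false
    release = strip(read("/proc/sys/kernel/osrelease", String))
    kernel_is_older_than(release, 5, 7) || return false
    # ASSUMPTION: treat only /dev/shm as "a RAM disk".
    return startswith(abspath(path), "/dev/shm/")
end
```

When any of the conditions fails, a function like the one documented here would presumably fall back to a plain memory mapping with no pre-population.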
