tgutils package

Submodules

tgutils.application module

Utilities for main functions.

class tgutils.application.FileLockLoggerAdapter(logger: logging.Logger, path: str)

Bases: logging.LoggerAdapter

A logger adapter that holds a file lock around each logged message.

If used consistently in multiple applications, this ensures that logging does not get garbled, even when running across multiple machines.

__init__(logger: logging.Logger, path: str) → None

Create a logger adapter that locks the specified directory path.

log(*args, **kwargs) → Any

Log a message while locking the directory.
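The locking idea can be sketched with the standard library alone. This is a minimal illustration, not the tgutils implementation: the class name LockingLoggerAdapter, the use of fcntl.flock on a lock file (rather than on a directory), and the ListHandler helper are all assumptions made for the example.

```python
import fcntl
import logging
import os
import tempfile


class LockingLoggerAdapter(logging.LoggerAdapter):
    """Hypothetical adapter: hold an exclusive lock while emitting each message."""

    def __init__(self, logger: logging.Logger, lock_path: str) -> None:
        super().__init__(logger, {})
        self.lock_path = lock_path

    def log(self, level, msg, *args, **kwargs):
        # Open (or create) the lock file, then block until we own the lock.
        fd = os.open(self.lock_path, os.O_CREAT | os.O_RDWR)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)
            super().log(level, msg, *args, **kwargs)
        finally:
            # Closing the descriptor also releases the lock.
            os.close(fd)


class ListHandler(logging.Handler):
    """Hypothetical handler that collects formatted messages in a list."""

    def __init__(self, records):
        super().__init__()
        self.records = records

    def emit(self, record):
        self.records.append(record.getMessage())


records = []
base = logging.getLogger('locking_demo')
base.setLevel(logging.INFO)
base.propagate = False
base.addHandler(ListHandler(records))

lock_path = os.path.join(tempfile.mkdtemp(), 'log.lock')
adapter = LockingLoggerAdapter(base, lock_path)
adapter.info('processed %s items', 3)
```

Every program that logs through such an adapter with the same lock path takes turns writing. Whether the lock is honored across machines depends on the shared file system.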

class tgutils.application.Loop(*, start: str, progress: str, completed: str, log_every: int = 1, log_with: Optional[int] = None, expected: Optional[int] = None)

Bases: object

Log progress for a (possibly parallel) loop.

__init__(*, start: str, progress: str, completed: str, log_every: int = 1, log_with: Optional[int] = None, expected: Optional[int] = None) → None

Initialize self. See help(type(self)) for accurate signature.

completed = None

The format of the completion messages.

done() → None

Indicate the loop has completed.

expected = None

The expected number of increments.

local_every = None

Granularity of parallel counting.

log_every = None

Emit a log message once every this many iterations (typically a power of 10).

log_with = None

The value in the log message is divided by this amount (typically a power of 1000).

progress = None

The format of the progress message.

shared_counter = None

The shared memory iteration counter.

start = None

The format of the start message.

step(fraction: Optional[float] = None) → None

Indicate a loop iteration.

Ideally, this is called at the end of the iteration to indicate the iteration has completed. If the loop code is complex (contains continue etc.), it can be placed at the start of the loop body instead.
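How log_every and log_with interact can be shown with a single-process sketch. The ProgressLoop class below is hypothetical: it leaves out the shared memory counter and the expected count, takes an emit callback in place of a logger, and assumes the message formats are expanded with str.format, none of which is confirmed for the real Loop class.

```python
from typing import Callable, List


class ProgressLoop:
    """Hypothetical single-process analogue of a progress-logging loop."""

    def __init__(self, *, start: str, progress: str, completed: str,
                 log_every: int = 1, log_with: int = 1,
                 emit: Callable[[str], None] = print) -> None:
        self.progress = progress
        self.completed = completed
        self.log_every = log_every
        self.log_with = log_with
        self.emit = emit
        self.count = 0
        self.emit(start)

    def step(self) -> None:
        # Count one iteration and report every `log_every` iterations.
        self.count += 1
        if self.count % self.log_every == 0:
            self.emit(self.progress.format(self.count // self.log_with))

    def done(self) -> None:
        self.emit(self.completed.format(self.count // self.log_with))


messages: List[str] = []
loop = ProgressLoop(start='Reading lines...',
                    progress='Read {} K lines',
                    completed='Read {} K lines in total',
                    log_every=2000, log_with=1000,
                    emit=messages.append)

for number in range(5000):
    loop.step()  # placed first, so the `continue` below cannot skip it
    if number % 2:
        continue
loop.done()
```

With log_every=2000 and log_with=1000, a progress message appears after 2000 and 4000 iterations, and the reported value is in thousands.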

tgutils.application.each_file_line(path: str, loop: Optional[tgutils.application.Loop] = None) → Iterator[Tuple[int, str]]

Loop on each file line.

tgutils.application.indexed_range(index: int, *, size: int, invocations: int = 0) → range

Return a range of indices for an indexed invocation.

Each invocation index will get its own range, where the range sizes will be the same (as much as possible) for each invocation.

If the number of invocations is zero, it is assumed to be the number of available parallel processes, that is, that there will be one invocation per parallel process (at most size).
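One way to produce such near-equal ranges is integer arithmetic on the slice boundaries. The helper name split_range and the exact rounding are assumptions; the real function may distribute the remainder differently.

```python
import os


def split_range(index: int, *, size: int, invocations: int = 0) -> range:
    """Hypothetical helper: the index-th of `invocations` near-equal slices of range(size)."""
    if invocations == 0:
        # Assumption: one invocation per available CPU, but never more than `size`.
        invocations = max(1, min(os.cpu_count() or 1, size))
    start = index * size // invocations
    stop = (index + 1) * size // invocations
    return range(start, stop)


parts = [split_range(index, size=10, invocations=3) for index in range(3)]
# The three slices are 0..2, 3..5 and 6..9: sizes differ by at most one,
# and together they cover range(10) exactly once.
```

Because each slice ends where the next one begins, no index is skipped or repeated, whatever the remainder of size divided by invocations.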

tgutils.application.lock_file(lock_path: str, lock_fd: int) → Iterator[None]

Perform some action while holding a file lock.

tgutils.application.main(parser: argparse.ArgumentParser, functions: Optional[List[str]] = None, *, adapter: Optional[Callable[argparse.Namespace, None]] = None) → None

A generic main function for configurable functions.

See dynamake.application.main().

tgutils.application.maximal_open_files() → None

Ensure we can use the maximal number of open files at the same time.
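On Unix-like systems, a common way to achieve this is to raise the soft limit on open files up to the hard limit. The sketch below uses the standard resource module; the helper name raise_open_files_limit is hypothetical, and it is an assumption that the real function works this way.

```python
import resource


def raise_open_files_limit() -> int:
    """Hypothetical helper: raise the soft open-files limit up to the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms refuse the request; keep the current limit then.
            pass
    # Report whatever soft limit is in effect now.
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


limit = raise_open_files_limit()
```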

tgutils.application.reset_application() → None

Reset the global state (for tests).

tgutils.application.tg_qsub_logger(logger: logging.Logger) → logging.Logger

Wrap a logger so that messages will not get interleaved with other program invocations and/or the messages from the tg_qsub script.

tgutils.application.tgutils_adapter(args: argparse.Namespace) → None

Perform last minute adaptation of the program following parsing the command line options.

tgutils.cache module

Simple caching of expensive values.

class tgutils.cache.Cache

Bases: typing.Generic

Cache of expensive values.

__init__() → None

Initialize self. See help(type(self)) for accurate signature.

lookup(key: Key, compute_value: Callable[Value]) → Value

Lookup a value by its key, computing it only if this is the first lookup.
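The lookup contract (compute on the first lookup, reuse afterwards) can be illustrated with a small dictionary-backed class. SimpleCache is a hypothetical stand-in that leaves out the global reset() support.

```python
from typing import Callable, Dict, Generic, List, TypeVar

Key = TypeVar('Key')
Value = TypeVar('Value')


class SimpleCache(Generic[Key, Value]):
    """Hypothetical cache: compute each value at most once per key."""

    def __init__(self) -> None:
        self._data: Dict[Key, Value] = {}

    def lookup(self, key: Key, compute_value: Callable[[], Value]) -> Value:
        # Only call compute_value if this key was never looked up before.
        if key not in self._data:
            self._data[key] = compute_value()
        return self._data[key]


calls: List[str] = []


def expensive() -> int:
    calls.append('computed')
    return 42


cache: SimpleCache[str, int] = SimpleCache()
first = cache.lookup('answer', expensive)
second = cache.lookup('answer', expensive)  # served from the cache
```

Passing a callable rather than a value is what lets the cache skip the expensive computation on later lookups.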

static reset() → None

Clear all the caches (for tests).

tgutils.load_yaml module

Load data from YAML files.

tgutils.load_yaml.load_dictionary(path: str, data: Any = None, *, allowed_keys: Optional[Dict[str, type]] = None, required_keys: Optional[Dict[str, type]] = None, key_type: type = <class 'str'>, value_type: Optional[type] = None) → Dict[Any, Any]

Load a dictionary with string keys from a YAML or JSON file.

Parameters:
  • path – The path of the YAML/JSON file.
  • data – Optional data loaded from the file. If this is None, the file is loaded instead.
  • allowed_keys – An optional dictionary of allowed keys, where the value is the expected type of the loaded value. If not None, other keys are rejected (unless listed in required_keys).
  • required_keys – An optional dictionary of required keys, where the value is the expected type of the loaded value. If not None, specified keys that are missing from the loaded data are an error.
  • key_type – The expected type of the keys, str by default.
  • value_type – An optional type. If not None
Returns:

The loaded dictionary.

Return type:

Dict[str, Any]
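The allowed_keys and required_keys rules can be sketched as a check on already-loaded data. The function check_dictionary is hypothetical, it uses JSON from the standard library instead of YAML, and raising RuntimeError for every violation is an assumption.

```python
import json
from typing import Any, Dict, Optional


def check_dictionary(path: str, data: Any, *,
                     allowed_keys: Optional[Dict[str, type]] = None,
                     required_keys: Optional[Dict[str, type]] = None) -> Dict[str, Any]:
    """Hypothetical validator mirroring the documented allowed/required key rules."""
    if not isinstance(data, dict):
        raise RuntimeError(f'{path}: expected a dictionary')
    required = required_keys or {}
    for key in required:
        if key not in data:
            raise RuntimeError(f'{path}: missing required key: {key}')
    for key, value in data.items():
        if key in required:
            expected = required[key]
        elif allowed_keys is None:
            continue  # no restriction on other keys
        elif key in allowed_keys:
            expected = allowed_keys[key]
        else:
            raise RuntimeError(f'{path}: unexpected key: {key}')
        if not isinstance(value, expected):
            raise RuntimeError(f'{path}: key {key} is not a {expected.__name__}')
    return data


good = check_dictionary('config.json',
                        json.loads('{"name": "run1", "threads": 4}'),
                        required_keys={'name': str},
                        allowed_keys={'threads': int})

try:
    check_dictionary('config.json',
                     json.loads('{"name": "run1", "colour": "red"}'),
                     required_keys={'name': str},
                     allowed_keys={'threads': int})
    rejected = False
except RuntimeError:
    rejected = True  # 'colour' is neither allowed nor required
```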

tgutils.load_yaml.verify_type(path: str, element_kind: str, element_identifier: str, value: Any, expected_type: Optional[type]) → None

Verify the type of an element loaded from a YAML/JSON file.

If the value has an unexpected type, raises a RuntimeError.

Parameters:
  • path – The path of the loaded YAML/JSON file.
  • element_kind – The kind of element this is (for the error message).
  • element_identifier – The identifier of the element (unique within its kind).
  • value – The loaded value of the element.
  • expected_type – The expected Python class the value should be an instance of.

tgutils.make module

Utilities for using DynaMake.

tgutils.make.parallel_jobs() → int

Return the number of jobs to use for a parallel sub-process in the current context (can be passed to --jobs).

This assumes all the actions of the innermost tg_require in the current context are executed, and tries to utilize all the available CPUs for them.

tgutils.make.reset_make() → None

Reset the persistent context (for tests).

tgutils.make.tg_require(*paths) → None

Require all the specified paths with a parallel context.

This sets up the invocation context(s) of all the actions needed to build these files, and any of their dependencies, such that parallel_size contains the number of paths and parallel_index contains the index of the specific path. If nested, the context of the innermost call is used.

The parallel_size and parallel_index context can then be embedded in the run_prefix of the actions, to be passed to qsubber which uses this information to optimize the assignment of CPUs to SunGrid jobs.

For example, suppose you wrote the following in DynaMake.yaml:

- when:
    is_parallel: True
    step: my_expensive_multi_processing_step
  then:
    run_prefix:
      'qsubber -v -I {parallel_index} -S {parallel_size} -j job-{action_id} -s 8- --'

Then qsubber will allocate at least 8 CPUs for each action invoked by my_expensive_multi_processing_step. If there are only a few such invocations (say, up to one per cluster server), it may assign more CPUs to each invocation (up to all the CPUs on each server). If there are many invocations, it will assign fewer, to ensure as many invocations as possible run in parallel.

This is due to the unfortunate fact that speedup gained by using more CPUs is not linear; that is, a 16-CPU action takes longer than half the time it takes using 8 CPUs. Therefore, if all we have is a 16-CPU machine, we are better off running two 8-CPU actions in parallel than one 16-CPU action followed by another.
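The arithmetic behind this claim can be checked under a simple model. The sketch below assumes Amdahl's law with 10% of each action being serial; both the model and the 10% figure are illustrative assumptions, not measurements.

```python
def run_time(cpus: int, serial_fraction: float = 0.1) -> float:
    """Amdahl's law: time relative to a single-CPU run taking 1.0."""
    return serial_fraction + (1.0 - serial_fraction) / cpus


# Two actions on a 16-CPU machine, scheduled in two different ways.
two_in_sequence = 2 * run_time(16)  # one 16-CPU action followed by another
two_in_parallel = run_time(8)       # two 8-CPU actions side by side

# Under this model the sequential schedule takes about 0.31 time units,
# while the parallel schedule takes about 0.21.
```

The gap grows with the serial fraction. With perfectly linear speedup (a serial fraction of 0), both schedules would take the same time.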

This is overly convoluted, sub-optimal, and very specific to the way we distribute actions on the SunGrid cluster in the Tanay Group lab. The cluster manager should arguably do much better without all these complications. However, all we have is qsub.

tgutils.numpy module

Numpy utilities.

Import this as np instead of importing the numpy module. It exports the same symbols, with the addition of strongly-typed phantom classes for tracking the exact dimensions and type of each variable using mypy. It also provides some additional utilities (I/O).
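The phantom class pattern can be illustrated without Numpy. In the sketch below the standard array.array stands in for numpy.ndarray so the example runs with no dependencies; the PhantomArray base and its expected_typecode attribute are hypothetical, and only the am/be naming follows the classes documented here.

```python
from array import array
from typing import Iterable, Type, TypeVar, cast

A = TypeVar('A', bound='PhantomArray')


class PhantomArray(array):
    """Hypothetical base: never instantiated, only used in annotations."""

    expected_typecode = ''

    @classmethod
    def am(cls: Type[A], data: array) -> A:
        # Declare: verify the data already matches, then only change its static type.
        if data.typecode != cls.expected_typecode:
            raise TypeError(f'expected typecode {cls.expected_typecode}, '
                            f'got {data.typecode}')
        return cast(A, data)

    @classmethod
    def be(cls: Type[A], data: Iterable) -> A:
        # Convert: build a new array with the expected element type.
        return cast(A, array(cls.expected_typecode, data))


class ArrayFloat64(PhantomArray):
    expected_typecode = 'd'


class ArrayInt64(PhantomArray):
    expected_typecode = 'q'


def mean(values: ArrayFloat64) -> float:
    return sum(values) / len(values)


values = ArrayFloat64.be([1, 2, 3])  # converted to 64-bit floats
average = mean(ArrayFloat64.am(values))
```

At run time the objects stay ordinary arrays. The phantom classes exist so that a checker such as mypy can reject passing an ArrayInt64 where an ArrayFloat64 is expected.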

tgutils.numpy.A = ~A

Type variable for arrays.

tgutils.numpy.ARRAY_OF_DTYPE = {'bool': <class 'tgutils.numpy.ArrayBool'>, 'float32': <class 'tgutils.numpy.ArrayFloat32'>, 'float64': <class 'tgutils.numpy.ArrayFloat64'>, 'int16': <class 'tgutils.numpy.ArrayInt16'>, 'int32': <class 'tgutils.numpy.ArrayInt32'>, 'int64': <class 'tgutils.numpy.ArrayInt64'>, 'int8': <class 'tgutils.numpy.ArrayInt8'>, 'str': <class 'tgutils.numpy.ArrayStr'>}

The phantom type for an array by its data type name.

class tgutils.numpy.ArrayBool

Bases: tgutils.numpy.BaseArray

An array of booleans.

dimensions = 1
dtype = 'bool'
class tgutils.numpy.ArrayFloat32

Bases: tgutils.numpy.BaseArray

An array of 32-bit floating point numbers.

dimensions = 1
dtype = 'float32'
class tgutils.numpy.ArrayFloat64

Bases: tgutils.numpy.BaseArray

An array of 64-bit floating point numbers.

dimensions = 1
dtype = 'float64'
class tgutils.numpy.ArrayInt16

Bases: tgutils.numpy.BaseArray

An array of 16-bit integers.

dimensions = 1
dtype = 'int16'
class tgutils.numpy.ArrayInt32

Bases: tgutils.numpy.BaseArray

An array of 32-bit integers.

dimensions = 1
dtype = 'int32'
class tgutils.numpy.ArrayInt64

Bases: tgutils.numpy.BaseArray

An array of 64-bit integers.

dimensions = 1
dtype = 'int64'
class tgutils.numpy.ArrayInt8

Bases: tgutils.numpy.BaseArray

An array of 8-bit integers.

dimensions = 1
dtype = 'int8'
class tgutils.numpy.ArrayStr

Bases: tgutils.numpy.BaseArray

An array of Unicode strings.

dimensions = 1
dtype = 'O'
class tgutils.numpy.BaseArray

Bases: numpy.ndarray

Base class for all Numpy array and matrix phantom types.

classmethod am(data: numpy.ndarray) → A

Declare an array as being of this type.

classmethod be(data: Collection) → A

Convert an array to this type.

dimensions = None

The expected dimensions of an array of the (derived) class.

classmethod empty(shape: Union[int, Tuple[int, ...]]) → A

Return an uninitialized array.

static exists(path: str) → bool

Whether there exists a disk file with the specified path to load an array from.

This checks for either a .txt or a .npy suffix to allow for loading either an array of strings or an array or matrix of numeric values.

classmethod filled(value: Any, shape: Union[int, Tuple[int, ...]]) → A

Return an array full of some value.

classmethod read(path: str, mmap_mode: Optional[str] = None) → A

Read a Numpy array of the concrete type from the disk.

If a disk file with a .txt suffix exists, this will read an array of strings. Otherwise, a file with a .npy suffix must exist, and this will memory map the array or matrix of values contained in it.

static read_array(path: str, mmap_mode: Optional[str] = None) → numpy.ndarray

Read a 1D array of any type from the disk.

static read_matrix(path: str, mmap_mode: Optional[str] = None) → numpy.ndarray

Read a 2D array of any type from the disk.

classmethod shared_memory_zeros(shape: Union[int, Tuple[int, ...]]) → A

Create a shared memory array, initialized to zeros.

classmethod write(data: numpy.ndarray, path: str) → None

Write a Numpy array of the concrete type to the disk.

If writing an array of strings, this will create a file with a .txt suffix containing one string value per line. Otherwise, the data may be an array or a matrix of numeric values, which will be written to a file in the .npy format, allowing for memory mapped access.

classmethod zeros(shape: Union[int, Tuple[int, ...]]) → A

Return an array full of zeros.

tgutils.numpy.MATRIX_OF_DTYPE = {'bool': <class 'tgutils.numpy.MatrixBool'>, 'float32': <class 'tgutils.numpy.MatrixFloat32'>, 'float64': <class 'tgutils.numpy.MatrixFloat64'>, 'int16': <class 'tgutils.numpy.MatrixInt16'>, 'int32': <class 'tgutils.numpy.MatrixInt32'>, 'int64': <class 'tgutils.numpy.MatrixInt64'>, 'int8': <class 'tgutils.numpy.MatrixInt8'>}

The phantom type for a matrix by its data type name.

class tgutils.numpy.MatrixBool

Bases: tgutils.numpy.BaseArray

A matrix of booleans.

dimensions = 2
dtype = 'bool'
class tgutils.numpy.MatrixFloat32

Bases: tgutils.numpy.BaseArray

A matrix of 32-bit floating point numbers.

dimensions = 2
dtype = 'float32'
class tgutils.numpy.MatrixFloat64

Bases: tgutils.numpy.BaseArray

A matrix of 64-bit floating point numbers.

dimensions = 2
dtype = 'float64'
class tgutils.numpy.MatrixInt16

Bases: tgutils.numpy.BaseArray

A matrix of 16-bit integers.

dimensions = 2
dtype = 'int16'
class tgutils.numpy.MatrixInt32

Bases: tgutils.numpy.BaseArray

A matrix of 32-bit integers.

dimensions = 2
dtype = 'int32'
class tgutils.numpy.MatrixInt64

Bases: tgutils.numpy.BaseArray

A matrix of 64-bit integers.

dimensions = 2
dtype = 'int64'
class tgutils.numpy.MatrixInt8

Bases: tgutils.numpy.BaseArray

A matrix of 8-bit integers.

dimensions = 2
dtype = 'int8'

tgutils.pandas module

Pandas utilities.

Import this as pd instead of directly importing the pandas module. It exports the same symbols, with the addition of strongly-typed phantom classes for tracking the exact dimensions and type of each variable using mypy. It also provides some additional utilities (I/O).

class tgutils.pandas.BaseFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: pandas.core.frame.DataFrame

Base class for all Pandas data frame phantom types.

classmethod am(data: pandas.core.frame.DataFrame) → F

Declare a data frame as being of this type.

classmethod be(data: Union[pandas.core.frame.DataFrame, numpy.ndarray, List[List[Any]]], index: Optional[Collection] = None, columns: Optional[Collection] = None) → F

Convert the data to this type.

dtype = None

The expected data type of a data frame of the (derived) class.

classmethod empty(*, index: Collection, columns: Collection) → F

Return an uninitialized frame.

classmethod filled(value: Any, *, index: Collection, columns: Collection) → F

Return a frame full of some value.

classmethod read(path: str, mmap_mode: Optional[str] = None) → F

Read a Pandas data frame of the concrete type from the disk.

If additional file(s) with an .index and/or .columns suffix exist, they are loaded into the index and/or column labels.

classmethod shared_memory_zeros(*, index: Collection, columns: Collection) → F

Create a shared memory frame, initialized to zeros.

classmethod write(frame: pandas.core.frame.DataFrame, path: str) → None

Write a Pandas data frame of the concrete type to a file.

If necessary, creates additional file(s) with an .index and/or .columns suffix to preserve the index and/or column labels.

classmethod zeros(*, index: Collection, columns: Collection) → F

Return a frame full of zeros.

class tgutils.pandas.BaseSeries(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

Bases: pandas.core.series.Series

Base class for all Pandas data series phantom types.

classmethod am(data: pandas.core.series.Series) → S

Declare a data series as being of this type.

classmethod be(data: Collection, index: Optional[Collection] = None) → S

Convert an array to this type.

classmethod empty(index: Collection) → S

Return an uninitialized series.

classmethod filled(value: Any, index: Collection) → S

Return a series full of some value.

classmethod read(path: str, mmap_mode: Optional[str] = None) → S

Read a Pandas data series of the concrete type from the disk.

If an additional file with an .index suffix exists, it is loaded into the index labels.

classmethod shared_memory_zeros(index: Collection) → S

Create a shared memory series, initialized to zeros.

classmethod write(series: pandas.core.series.Series, path: str) → None

Write a Pandas data series of the concrete type to a file.

If necessary, creates an additional file with an .index suffix to preserve the index labels.

classmethod zeros(index: Collection) → S

Return a series full of zeros.

tgutils.pandas.F = ~F

Type variable for data frames.

tgutils.pandas.FRAME_OF_DTYPE = {'bool': <class 'tgutils.pandas.FrameBool'>, 'float32': <class 'tgutils.pandas.FrameFloat32'>, 'float64': <class 'tgutils.pandas.FrameFloat64'>, 'int16': <class 'tgutils.pandas.FrameInt16'>, 'int32': <class 'tgutils.pandas.FrameInt32'>, 'int64': <class 'tgutils.pandas.FrameInt64'>, 'int8': <class 'tgutils.pandas.FrameInt8'>}

The phantom type for a data frame by its type name.

tgutils.pandas.Frame

alias of pandas.core.frame.DataFrame

class tgutils.pandas.FrameBool(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: tgutils.pandas.BaseFrame

A data frame of booleans.

dtype = 'bool'
class tgutils.pandas.FrameFloat32(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: tgutils.pandas.BaseFrame

A data frame of 32-bit floating-point numbers.

dtype = 'float32'
class tgutils.pandas.FrameFloat64(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: tgutils.pandas.BaseFrame

A data frame of 64-bit floating-point numbers.

dtype = 'float64'
class tgutils.pandas.FrameInt16(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: tgutils.pandas.BaseFrame

A data frame of 16-bit integers.

dtype = 'int16'
class tgutils.pandas.FrameInt32(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: tgutils.pandas.BaseFrame

A data frame of 32-bit integers.

dtype = 'int32'
class tgutils.pandas.FrameInt64(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: tgutils.pandas.BaseFrame

A data frame of 64-bit integers.

dtype = 'int64'
class tgutils.pandas.FrameInt8(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: tgutils.pandas.BaseFrame

A data frame of 8-bit integers.

dtype = 'int8'
tgutils.pandas.S = ~S

Type variable for data series.

tgutils.pandas.SERIES_OF_DTYPE = {'bool': <class 'tgutils.pandas.SeriesBool'>, 'float32': <class 'tgutils.pandas.SeriesFloat32'>, 'float64': <class 'tgutils.pandas.SeriesFloat64'>, 'int16': <class 'tgutils.pandas.SeriesInt16'>, 'int32': <class 'tgutils.pandas.SeriesInt32'>, 'int64': <class 'tgutils.pandas.SeriesInt64'>, 'int8': <class 'tgutils.pandas.SeriesInt8'>, 'str': <class 'tgutils.pandas.SeriesStr'>}

The phantom type for a data series by its type name.

class tgutils.pandas.SeriesBool(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

Bases: tgutils.pandas.BaseSeries

A data series of booleans.

dtype = 'bool'
class tgutils.pandas.SeriesFloat32(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

Bases: tgutils.pandas.BaseSeries

A data series of 32-bit floating-point numbers.

dtype = 'float32'
class tgutils.pandas.SeriesFloat64(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

Bases: tgutils.pandas.BaseSeries

A data series of 64-bit floating-point numbers.

dtype = 'float64'
class tgutils.pandas.SeriesInt16(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

Bases: tgutils.pandas.BaseSeries

A data series of 16-bit integers.

dtype = 'int16'
class tgutils.pandas.SeriesInt32(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

Bases: tgutils.pandas.BaseSeries

A data series of 32-bit integers.

dtype = 'int32'
class tgutils.pandas.SeriesInt64(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

Bases: tgutils.pandas.BaseSeries

A data series of 64-bit integers.

dtype = 'int64'
class tgutils.pandas.SeriesInt8(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

Bases: tgutils.pandas.BaseSeries

A data series of 8-bit integers.

dtype = 'int8'
class tgutils.pandas.SeriesStr(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

Bases: tgutils.pandas.BaseSeries

A data series of Unicode strings.

dtype = 'O'

tgutils.setup_mypy module

Import this module in the setup.py file to use the provided Numpy/Pandas mypy stubs.

TODO: This is a horrible hack.

tgutils.tests module

Common utilities for tests.

class tgutils.tests.TestWithFiles(methodName='runTest')

Bases: tgutils.tests.TestWithReset

expect_file(path: str, expected: str) → None
setUp() → None

Hook method for setting up the test fixture before exercising it.

tearDown() → None

Hook method for deconstructing the test fixture after testing it.

class tgutils.tests.TestWithReset(methodName='runTest')

Bases: unittest.case.TestCase

setUp() → None

Hook method for setting up the test fixture before exercising it.

tgutils.tests.undent(content: str) → str
tgutils.tests.write_file(path: str, content: str = '') → None

tgutils.tg_qsub module

Submit a job to qsub in the Tanay Group lab.

class tgutils.tg_qsub.Qsubber

Bases: object

Submit a job to qsub in the Tanay Group lab.

__init__() → None

Initialize self. See help(type(self)) for accurate signature.

run() → int

Run the submitted job using the command line options.

tgutils.tg_qsub.main() → None

Submit a job to qsub in the Tanay Group lab.

tgutils.version module

Version is generated by setup.py.

Module contents

Main TGUtils module.