tgutils package¶
Submodules¶
tgutils.application module¶
Utilities for main functions.
-
class
tgutils.application.
FileLockLoggerAdapter
(logger: logging.Logger, path: str)¶ Bases:
logging.LoggerAdapter
A logger adapter that performs a file lock around each logged message.
If used consistently in multiple applications, this ensures that logging does not get garbled, even when running across multiple machines.
-
__init__
(logger: logging.Logger, path: str) → None¶ Create a logger adapter that locks the specified directory path.
-
log
(*args, **kwargs) → Any¶ Log a message while locking the directory.
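As an illustration of the idea only (not the actual tgutils implementation), a lock-per-message adapter could be sketched as follows. The names `hold_lock` and `LockingAdapter` are hypothetical, and the use of `fcntl.flock` on a lock file is an assumption; whether advisory locks work across machines depends on the shared file system.

```python
import fcntl
import logging
import os
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def hold_lock(lock_path: str) -> Iterator[None]:
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    lock_fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o666)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX)  # Blocks until the lock is free.
        yield
    finally:
        fcntl.flock(lock_fd, fcntl.LOCK_UN)
        os.close(lock_fd)


class LockingAdapter(logging.LoggerAdapter):
    """Hypothetical adapter that serializes each logged message through a lock file."""

    def __init__(self, logger: logging.Logger, lock_path: str) -> None:
        super().__init__(logger, {})
        self.lock_path = lock_path

    def log(self, *args: Any, **kwargs: Any) -> Any:
        # info(), warning() etc. all funnel through log(), so one override is enough.
        with hold_lock(self.lock_path):
            return super().log(*args, **kwargs)
```

Every process that logs to the same destination would need to wrap its logger with the same lock path for the messages to stay intact.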
-
-
class
tgutils.application.
Loop
(*, start: str, progress: str, completed: str, log_every: int = 1, log_with: Optional[int] = None, expected: Optional[int] = None)¶ Bases:
object
Log progress for a (possibly parallel) loop.
-
__init__
(*, start: str, progress: str, completed: str, log_every: int = 1, log_with: Optional[int] = None, expected: Optional[int] = None) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
completed
= None¶ The format of the completion messages.
-
done
() → None¶ Indicate the loop has completed.
-
expected
= None¶ The expected number of increments.
-
local_every
= None¶ Granularity of parallel counting.
-
log_every
= None¶ Emit a log message once every this many iterations (typically a power of 10).
-
log_with
= None¶ The value in the log message is divided by this amount (typically a power of 1000).
-
progress
= None¶ The format of the progress message.
The shared memory iteration counter.
-
start
= None¶ The format of the start message.
-
step
(fraction: Optional[float] = None) → None¶ Indicate a loop iteration.
Ideally, this is called at the end of the iteration to indicate that the iteration has completed. If the loop code is complex (contains
continue
etc.) then the call is placed at the start of the loop body instead.
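The start/step/done protocol described above can be pictured with a single-process stand-in. This is only a sketch: `ProgressLoop` and its `emit` parameter are hypothetical, the `%d` message format is an assumption, and the shared-memory counting used for parallel loops is left out entirely.

```python
from typing import Callable, List


class ProgressLoop:
    """Hypothetical single-process stand-in for a progress-logging loop helper."""

    def __init__(self, *, start: str, progress: str, completed: str,
                 log_every: int = 1, emit: Callable[[str], None] = print) -> None:
        self.progress = progress
        self.completed = completed
        self.log_every = log_every
        self.emit = emit
        self.count = 0
        self.emit(start)

    def step(self) -> None:
        """Count one iteration, reporting once every log_every iterations."""
        self.count += 1
        if self.count % self.log_every == 0:
            self.emit(self.progress % self.count)

    def done(self) -> None:
        """Report the final count."""
        self.emit(self.completed % self.count)


messages: List[str] = []
loop = ProgressLoop(start="Reading...", progress="Read %d lines...",
                    completed="Read %d lines", log_every=2, emit=messages.append)
for _ in range(5):
    loop.step()  # Called at the end of each iteration.
loop.done()
```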
-
-
tgutils.application.
each_file_line
(path: str, loop: Optional[tgutils.application.Loop] = None) → Iterator[Tuple[int, str]]¶ Loop on each file line.
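The return type suggests a generator of (line number, line text) pairs. A minimal sketch of such a generator, assuming 1-based numbering and stripped newlines (neither is stated above) and using the hypothetical name `numbered_lines`:

```python
from typing import Iterator, Tuple


def numbered_lines(path: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) pairs for a text file.

    The 1-based numbering and the newline stripping are assumptions.
    """
    with open(path, "r") as file:
        for number, line in enumerate(file, start=1):
            yield number, line.rstrip("\n")
```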
-
tgutils.application.
indexed_range
(index: int, *, size: int, invocations: int = 0) → range¶ Return a range of indices for an indexed invocation.
Each invocation
index
will get its own range, where the range sizes will be the same (as much as possible) for each invocation. If the number of
invocations
is zero, it is assumed to be the number of available parallel processes, that is, that there will be one invocation per parallel process (at most size
).
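One way to obtain near-equal ranges is integer arithmetic on the range boundaries. The sketch below uses the hypothetical name `split_range`; the exact partitioning and the use of `os.cpu_count()` for the zero case are assumptions and may differ from what `indexed_range` actually does.

```python
import os


def split_range(index: int, *, size: int, invocations: int = 0) -> range:
    """Return the slice of range(size) owned by invocation number index."""
    if invocations == 0:
        # Assumption: one invocation per available CPU, but at most size.
        invocations = max(1, min(os.cpu_count() or 1, size))
    start = index * size // invocations
    stop = (index + 1) * size // invocations
    return range(start, stop)


# Ten items split across three invocations give sizes 3, 3 and 4.
parts = [split_range(index, size=10, invocations=3) for index in range(3)]
```

Computing both boundaries from the same formula guarantees that consecutive ranges are contiguous and that together they cover every index exactly once.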
-
tgutils.application.
lock_file
(lock_path: str, lock_fd: int) → Iterator[None]¶ Perform some action while holding a file lock.
-
tgutils.application.
main
(parser: argparse.ArgumentParser, functions: Optional[List[str]] = None, *, adapter: Optional[Callable[argparse.Namespace, None]] = None) → None¶ A generic
main
function for configurable functions. See
dynamake.application.main()
.
-
tgutils.application.
maximal_open_files
() → None¶ Ensure we can use the maximal number of open files at the same time.
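On Unix systems, this kind of adjustment is typically done by raising the soft limit on open files up to the hard limit with the standard `resource` module. The following is a sketch under that assumption; the name `raise_open_files_limit` and the returned value are hypothetical.

```python
import resource


def raise_open_files_limit() -> int:
    """Raise the soft open-files limit to the hard limit; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    except (ValueError, OSError):
        # Some platforms refuse the hard limit as a soft limit; keep the old value.
        return soft
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```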
-
tgutils.application.
reset_application
() → None¶ Reset the global state (for tests).
-
tgutils.application.
tg_qsub_logger
(logger: logging.Logger) → logging.Logger¶ Wrap a logger so that messages will not get interleaved with other program invocations and/or the messages from the
tg_qsub
script.
-
tgutils.application.
tgutils_adapter
(args: argparse.Namespace) → None¶ Perform last minute adaptation of the program following parsing the command line options.
tgutils.cache module¶
Simple caching of expensive values.
-
class
tgutils.cache.
Cache
¶ Bases:
typing.Generic
Cache of expensive values.
-
__init__
() → None¶ Initialize self. See help(type(self)) for accurate signature.
-
lookup
(key: Key, compute_value: Callable[Value]) → Value¶ Lookup a value by its key, computing it only if this is the first lookup.
-
static
reset
() → None¶ Clear all the caches (for tests).
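The compute-on-first-lookup behavior can be sketched with a dictionary. `SimpleCache` is a hypothetical stand-in, not the tgutils class, and it omits the global `reset` bookkeeping.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

Key = TypeVar("Key", bound=Hashable)
Value = TypeVar("Value")


class SimpleCache(Generic[Key, Value]):
    """Hypothetical cache that computes each value at most once per key."""

    def __init__(self) -> None:
        self._data: Dict[Key, Value] = {}

    def lookup(self, key: Key, compute_value: Callable[[], Value]) -> Value:
        """Return the cached value, calling compute_value only on the first lookup."""
        if key not in self._data:
            self._data[key] = compute_value()
        return self._data[key]


calls = []
cache: SimpleCache[str, int] = SimpleCache()
first = cache.lookup("answer", lambda: calls.append("computed") or 42)
second = cache.lookup("answer", lambda: calls.append("computed") or 42)
```

Passing a zero-argument callable rather than a value means the expensive computation is skipped entirely on a cache hit.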
-
tgutils.load_yaml module¶
Load data from YAML files.
-
tgutils.load_yaml.
load_dictionary
(path: str, data: Any = None, *, allowed_keys: Optional[Dict[str, type]] = None, required_keys: Optional[Dict[str, type]] = None, key_type: type = <class 'str'>, value_type: Optional[type] = None) → Dict[Any, Any]¶ Load a dictionary with string keys from a YAML or JSON file.
Parameters:
- path – The path of the YAML/JSON file.
- data – Optional data loaded from the file. If this is None, the file is loaded instead.
- allowed_keys – An optional dictionary of allowed keys, where the value is the expected type of the loaded value. If not None, other keys are rejected (unless listed in required_keys).
- required_keys – An optional dictionary of required keys, where the value is the expected type of the loaded value. If not None, specified keys that are missing from the loaded data are an error.
- key_type – The expected type of the keys, str by default.
- value_type – An optional type. If not None
Returns: The loaded dictionary.
Return type: Dict[str, Any]
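The allowed/required key rules above can be sketched on already-loaded data. The function name `check_dictionary`, the error messages, and the use of `RuntimeError` for missing or unexpected keys are assumptions for illustration; JSON is used so the sketch needs only the standard library.

```python
import json
from typing import Any, Dict, Optional


def check_dictionary(path: str, data: Dict[str, Any], *,
                     allowed_keys: Optional[Dict[str, type]] = None,
                     required_keys: Optional[Dict[str, type]] = None) -> Dict[str, Any]:
    """Validate loaded data against allowed and required keys (hypothetical helper)."""
    required = required_keys or {}
    known = {**(allowed_keys or {}), **required}
    for key in required:
        if key not in data:
            raise RuntimeError(f"{path}: missing required key: {key}")
    for key, value in data.items():
        # Keys are only restricted when allowed_keys was given.
        if allowed_keys is not None and key not in known:
            raise RuntimeError(f"{path}: unexpected key: {key}")
        expected = known.get(key)
        if expected is not None and not isinstance(value, expected):
            raise RuntimeError(f"{path}: key {key} is a {type(value).__name__}, "
                               f"expected a {expected.__name__}")
    return data


loaded = json.loads('{"name": "x", "size": 3}')
checked = check_dictionary("config.json", loaded,
                           allowed_keys={"size": int}, required_keys={"name": str})
```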
-
tgutils.load_yaml.
verify_type
(path: str, element_kind: str, element_identifier: str, value: Any, expected_type: Optional[type]) → None¶ Verify the type of an element loaded from a YAML/JSON file.
If the value has an unexpected type, raises a RuntimeError.
Parameters:
- path – The path of the loaded YAML/JSON file.
- element_kind – The kind of element this is (for the error message).
- element_identifier – The identifier of the element (unique within its kind).
- value – The loaded value of the element.
- expected_type – The expected Python class the value should be an instance of.
tgutils.make module¶
Utilities for using DynaMake.
-
tgutils.make.
parallel_jobs
() → int¶ Return the number of jobs to use for a parallel sub-process in the current context (can be passed to
--jobs
). This assumes all the actions of the innermost
tg_require
in the current context are executed, and tries to utilize all the available CPUs for them.
-
tgutils.make.
reset_make
() → None¶ Reset the persistent context (for tests).
-
tgutils.make.
tg_require
(*paths) → None¶ Require all the specified paths with a parallel context.
This sets up the invocation context(s) of all the actions needed to build these files, and any of their dependencies, such that parallel_size contains the number of paths and parallel_index contains the index of the specific path. If nested, the context of the innermost call is used.
The parallel_size and parallel_index context can then be embedded in the run_prefix of the actions, to be passed to qsubber, which uses this information to optimize the assignment of CPUs to SunGrid jobs.
For example, suppose you wrote the following in DynaMake.yaml:
    - when:
        is_parallel: True
        step: my_expensive_multi_processing_step
      then:
        run_prefix: 'qsubber -v -I {parallel_index} -S {parallel_size} -j job-{action_id} -s 8- --'
Then qsubber will allocate at least 8 CPUs for each action invoked by some_step. If there are only a few such invocations (say, up to one per cluster server), it may assign more CPUs to each invocation (up to all the CPUs on each server). If there are many invocations, it will assign fewer, to ensure as many invocations as possible run in parallel.
This is due to the unfortunate fact that the speedup gained by using more CPUs is not linear; that is, a 16-CPU action takes longer than half the time it takes using 8 CPUs. Therefore, if all we have is a 16-CPU machine, we are better off running two 8-CPU actions in parallel than one 16-CPU action followed by another.
This is overly convoluted, sub-optimal, and very specific to the way we distribute actions on the SunGrid cluster in the Tanay Group lab. The cluster manager should arguably do much better without all these complications. However, all we have is
qsub
.
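To make the non-linear speedup argument above concrete, the following sketch assumes an Amdahl's-law cost model in which 90% of each action is parallelizable. The model, the function name `run_time`, and all the numbers are illustrative assumptions, not measurements from the cluster.

```python
def run_time(cpus: int, *, serial_time: float = 100.0,
             parallel_fraction: float = 0.9) -> float:
    """Assumed run time of one action under Amdahl's law."""
    return serial_time * ((1.0 - parallel_fraction) + parallel_fraction / cpus)


# Two actions on a single 16-CPU machine:
two_in_parallel = run_time(8)         # Both run side by side on 8 CPUs each (about 21.25).
one_after_another = 2 * run_time(16)  # Each uses all 16 CPUs in turn (about 31.25).
```

Under this model a 16-CPU action takes about 15.6 time units, which is more than half of the 21.25 units it takes on 8 CPUs, so running the two actions side by side finishes sooner.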
tgutils.numpy module¶
Numpy utilities.
Import this as np instead of importing the numpy module. It exports the same symbols, with the addition of strongly-typed phantom classes for tracking the exact dimensions and type of each variable using mypy. It also provides some additional utilities (I/O).
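The phantom class idea can be illustrated without Numpy, using the standard library `array` type as a stand-in. In this sketch the class names, the `expected_typecode` attribute, and the run-time check inside `am` are all hypothetical; the point is only that the classes exist for the type checker, while the run-time objects remain plain arrays.

```python
from array import array
from typing import Iterable, Type, TypeVar, cast

A = TypeVar("A", bound="BaseTypedArray")


class BaseTypedArray(array):
    """Hypothetical phantom base class; no instance of it is ever created."""

    expected_typecode = ""

    @classmethod
    def am(cls: Type[A], data: array) -> A:
        """Declare existing data as being of this type (no copy is made)."""
        if data.typecode != cls.expected_typecode:
            raise TypeError(f"expected typecode {cls.expected_typecode!r}, "
                            f"got {data.typecode!r}")
        return cast(A, data)

    @classmethod
    def be(cls: Type[A], data: Iterable[float]) -> A:
        """Convert data to this type (a new array is created)."""
        return cast(A, array(cls.expected_typecode, data))


class ArrayOfInt(BaseTypedArray):
    expected_typecode = "i"


class ArrayOfDouble(BaseTypedArray):
    expected_typecode = "d"


raw = array("i", [1, 2, 3])
ints = ArrayOfInt.am(raw)           # Same object, now typed as ArrayOfInt for mypy.
doubles = ArrayOfDouble.be([1, 2])  # New array holding 1.0 and 2.0.
```

A function annotated to accept `ArrayOfDouble` would then be rejected by mypy when passed `ints`, even though both values are ordinary arrays at run time.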
-
tgutils.numpy.
A
= ~A¶ Type variable for arrays.
-
tgutils.numpy.
ARRAY_OF_DTYPE
= {'bool': <class 'tgutils.numpy.ArrayBool'>, 'float32': <class 'tgutils.numpy.ArrayFloat32'>, 'float64': <class 'tgutils.numpy.ArrayFloat64'>, 'int16': <class 'tgutils.numpy.ArrayInt16'>, 'int32': <class 'tgutils.numpy.ArrayInt32'>, 'int64': <class 'tgutils.numpy.ArrayInt64'>, 'int8': <class 'tgutils.numpy.ArrayInt8'>, 'str': <class 'tgutils.numpy.ArrayStr'>}¶ The phantom type for an array by its data type name.
-
class
tgutils.numpy.
ArrayBool
¶ Bases:
tgutils.numpy.BaseArray
An array of booleans.
-
dimensions
= 1¶
-
dtype
= 'bool'¶
-
-
class
tgutils.numpy.
ArrayFloat32
¶ Bases:
tgutils.numpy.BaseArray
An array of 32-bit floating point numbers.
-
dimensions
= 1¶
-
dtype
= 'float32'¶
-
-
class
tgutils.numpy.
ArrayFloat64
¶ Bases:
tgutils.numpy.BaseArray
An array of 64-bit floating point numbers.
-
dimensions
= 1¶
-
dtype
= 'float64'¶
-
-
class
tgutils.numpy.
ArrayInt16
¶ Bases:
tgutils.numpy.BaseArray
An array of 16-bit integers.
-
dimensions
= 1¶
-
dtype
= 'int16'¶
-
-
class
tgutils.numpy.
ArrayInt32
¶ Bases:
tgutils.numpy.BaseArray
An array of 32-bit integers.
-
dimensions
= 1¶
-
dtype
= 'int32'¶
-
-
class
tgutils.numpy.
ArrayInt64
¶ Bases:
tgutils.numpy.BaseArray
An array of 64-bit integers.
-
dimensions
= 1¶
-
dtype
= 'int64'¶
-
-
class
tgutils.numpy.
ArrayInt8
¶ Bases:
tgutils.numpy.BaseArray
An array of 8-bit integers.
-
dimensions
= 1¶
-
dtype
= 'int8'¶
-
-
class
tgutils.numpy.
ArrayStr
¶ Bases:
tgutils.numpy.BaseArray
An array of Unicode strings.
-
dimensions
= 1¶
-
dtype
= 'O'¶
-
-
class
tgutils.numpy.
BaseArray
¶ Bases:
numpy.ndarray
Base class for all Numpy array and matrix phantom types.
-
classmethod
am
(data: numpy.ndarray) → A¶ Declare an array as being of this type.
-
classmethod
be
(data: Collection) → A¶ Convert an array to this type.
-
dimensions
= None¶ The expected dimensions of an array of the (derived) class.
-
classmethod
empty
(shape: Union[int, Tuple[int, ...]]) → A¶ Return an uninitialized array.
-
static
exists
(path: str) → bool¶ Whether there exists a disk file with the specified path to load an array from.
This checks for either a
.txt
or a .npy
suffix to allow for loading either an array of strings or an array or matrix of numeric values.
-
classmethod
filled
(value: Any, shape: Union[int, Tuple[int, ...]]) → A¶ Return an array full of some value.
-
classmethod
read
(path: str, mmap_mode: Optional[str] = None) → A¶ Read a Numpy array of the concrete type from the disk.
If a disk file with a
.txt
suffix exists, this will read an array of strings. Otherwise, a file with a .npy
suffix must exist, and this will memory map the array or matrix of values contained in it.
-
static
read_array
(path: str, mmap_mode: Optional[str] = None) → numpy.ndarray¶ Read a 1D array of any type from the disk.
-
static
read_matrix
(path: str, mmap_mode: Optional[str] = None) → numpy.ndarray¶ Read a 2D array of any type from the disk.
Create a shared memory array, initialized to zeros.
-
classmethod
write
(data: numpy.ndarray, path: str) → None¶ Write a Numpy array of the concrete type to the disk.
If writing an array of strings, this will create a file with a
.txt
suffix containing one string value per line. Otherwise, the data may be an array or a matrix of numeric values, which will be written to a file with a .npy
format allowing for memory mapped access.
-
classmethod
zeros
(shape: Union[int, Tuple[int, ...]]) → A¶ Return an array full of zeros.
-
classmethod
-
tgutils.numpy.
MATRIX_OF_DTYPE
= {'bool': <class 'tgutils.numpy.MatrixBool'>, 'float32': <class 'tgutils.numpy.MatrixFloat32'>, 'float64': <class 'tgutils.numpy.MatrixFloat64'>, 'int16': <class 'tgutils.numpy.MatrixInt16'>, 'int32': <class 'tgutils.numpy.MatrixInt32'>, 'int64': <class 'tgutils.numpy.MatrixInt64'>, 'int8': <class 'tgutils.numpy.MatrixInt8'>}¶ The phantom type for a matrix by its data type name.
-
class
tgutils.numpy.
MatrixBool
¶ Bases:
tgutils.numpy.BaseArray
A matrix of booleans.
-
dimensions
= 2¶
-
dtype
= 'bool'¶
-
-
class
tgutils.numpy.
MatrixFloat32
¶ Bases:
tgutils.numpy.BaseArray
A matrix of 32-bit floating point numbers.
-
dimensions
= 2¶
-
dtype
= 'float32'¶
-
-
class
tgutils.numpy.
MatrixFloat64
¶ Bases:
tgutils.numpy.BaseArray
A matrix of 64-bit floating point numbers.
-
dimensions
= 2¶
-
dtype
= 'float64'¶
-
-
class
tgutils.numpy.
MatrixInt16
¶ Bases:
tgutils.numpy.BaseArray
A matrix of 16-bit integers.
-
dimensions
= 2¶
-
dtype
= 'int16'¶
-
-
class
tgutils.numpy.
MatrixInt32
¶ Bases:
tgutils.numpy.BaseArray
A matrix of 32-bit integers.
-
dimensions
= 2¶
-
dtype
= 'int32'¶
-
-
class
tgutils.numpy.
MatrixInt64
¶ Bases:
tgutils.numpy.BaseArray
A matrix of 64-bit integers.
-
dimensions
= 2¶
-
dtype
= 'int64'¶
-
-
class
tgutils.numpy.
MatrixInt8
¶ Bases:
tgutils.numpy.BaseArray
A matrix of 8-bit integers.
-
dimensions
= 2¶
-
dtype
= 'int8'¶
-
tgutils.pandas module¶
Pandas utilities.
Import this as pd instead of directly importing the pandas module. It exports the same symbols, with the addition of strongly-typed phantom classes for tracking the exact dimensions and type of each variable using mypy. It also provides some additional utilities (I/O).
-
class
tgutils.pandas.
BaseFrame
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
pandas.core.frame.DataFrame
Base class for all Pandas data frame phantom types.
-
classmethod
am
(data: pandas.core.frame.DataFrame) → F¶ Declare a data frame as being of this type.
-
classmethod
be
(data: Union[pandas.core.frame.DataFrame, numpy.ndarray, List[List[Any]]], index: Optional[Collection] = None, columns: Optional[Collection] = None) → F¶ Convert an array to this type.
-
dtype
= None¶ The expected data type of a data frame of the (derived) class.
-
classmethod
empty
(*, index: Collection, columns: Collection) → F¶ Return an uninitialized frame.
-
classmethod
filled
(value: Any, *, index: Collection, columns: Collection) → F¶ Return a frame full of some value.
-
classmethod
read
(path: str, mmap_mode: Optional[str] = None) → F¶ Read a Pandas data frame of the concrete type from the disk.
If additional file(s) with an
.index
and/or .columns
suffix exist, they are loaded into the index and/or column labels.
Create a shared memory frame, initialized to zeros.
-
classmethod
write
(frame: pandas.core.frame.DataFrame, path: str) → None¶ Write a Pandas data frame of the concrete type to a file.
If necessary, creates additional file(s) with an
.index
and/or .columns
suffix to preserve the index and/or column labels.
-
classmethod
zeros
(*, index: Collection, columns: Collection) → F¶ Return a frame full of zeros.
-
classmethod
-
class
tgutils.pandas.
BaseSeries
(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)¶ Bases:
pandas.core.series.Series
Base class for all Pandas data series phantom types.
-
classmethod
am
(data: pandas.core.series.Series) → S¶ Declare a data series as being of this type.
-
classmethod
be
(data: Collection, index: Optional[Collection] = None) → S¶ Convert an array to this type.
-
classmethod
empty
(index: Collection) → S¶ Return an uninitialized series.
-
classmethod
filled
(value: Any, index: Collection) → S¶ Return a series full of some value.
-
classmethod
read
(path: str, mmap_mode: Optional[str] = None) → S¶ Read a Pandas data series of the concrete type from the disk.
If an additional file with an
.index
suffix exists, it is loaded into the index labels.
Create a shared memory series, initialized to zeros.
-
classmethod
write
(series: pandas.core.series.Series, path: str) → None¶ Write a Pandas data series of the concrete type to a file.
If necessary, creates additional file with an
.index
suffix to preserve the index labels.
-
classmethod
zeros
(index: Collection) → S¶ Return a series full of zeros.
-
classmethod
-
tgutils.pandas.
F
= ~F¶ Type variable for data frames.
-
tgutils.pandas.
FRAME_OF_DTYPE
= {'bool': <class 'tgutils.pandas.FrameBool'>, 'float32': <class 'tgutils.pandas.FrameFloat32'>, 'float64': <class 'tgutils.pandas.FrameFloat64'>, 'int16': <class 'tgutils.pandas.FrameInt16'>, 'int32': <class 'tgutils.pandas.FrameInt32'>, 'int64': <class 'tgutils.pandas.FrameInt64'>, 'int8': <class 'tgutils.pandas.FrameInt8'>}¶ The phantom type for a data frame by its type name.
-
tgutils.pandas.
Frame
¶ alias of
pandas.core.frame.DataFrame
-
class
tgutils.pandas.
FrameBool
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
tgutils.pandas.BaseFrame
A data frame of booleans.
-
dtype
= 'bool'¶
-
-
class
tgutils.pandas.
FrameFloat32
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
tgutils.pandas.BaseFrame
A data frame of 32-bit floating-point numbers.
-
dtype
= 'float32'¶
-
-
class
tgutils.pandas.
FrameFloat64
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
tgutils.pandas.BaseFrame
A data frame of 64-bit floating-point numbers.
-
dtype
= 'float64'¶
-
-
class
tgutils.pandas.
FrameInt16
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
tgutils.pandas.BaseFrame
A data frame of 16-bit integers.
-
dtype
= 'int16'¶
-
-
class
tgutils.pandas.
FrameInt32
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
tgutils.pandas.BaseFrame
A data frame of 32-bit integers.
-
dtype
= 'int32'¶
-
-
class
tgutils.pandas.
FrameInt64
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
tgutils.pandas.BaseFrame
A data frame of 64-bit integers.
-
dtype
= 'int64'¶
-
-
class
tgutils.pandas.
FrameInt8
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
tgutils.pandas.BaseFrame
A data frame of 8-bit integers.
-
dtype
= 'int8'¶
-
-
tgutils.pandas.
S
= ~S¶ Type variable for data series.
-
tgutils.pandas.
SERIES_OF_DTYPE
= {'bool': <class 'tgutils.pandas.SeriesBool'>, 'float32': <class 'tgutils.pandas.SeriesFloat32'>, 'float64': <class 'tgutils.pandas.SeriesFloat64'>, 'int16': <class 'tgutils.pandas.SeriesInt16'>, 'int32': <class 'tgutils.pandas.SeriesInt32'>, 'int64': <class 'tgutils.pandas.SeriesInt64'>, 'int8': <class 'tgutils.pandas.SeriesInt8'>, 'str': <class 'tgutils.pandas.SeriesStr'>}¶ The phantom type for a data series by its type name.
-
class
tgutils.pandas.
SeriesBool
(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)¶ Bases:
tgutils.pandas.BaseSeries
A data series of booleans.
-
dtype
= 'bool'¶
-
-
class
tgutils.pandas.
SeriesFloat32
(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)¶ Bases:
tgutils.pandas.BaseSeries
A data series of 32-bit floating-point numbers.
-
dtype
= 'float32'¶
-
-
class
tgutils.pandas.
SeriesFloat64
(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)¶ Bases:
tgutils.pandas.BaseSeries
A data series of 64-bit floating-point numbers.
-
dtype
= 'float64'¶
-
-
class
tgutils.pandas.
SeriesInt16
(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)¶ Bases:
tgutils.pandas.BaseSeries
A data series of 16-bit integers.
-
dtype
= 'int16'¶
-
-
class
tgutils.pandas.
SeriesInt32
(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)¶ Bases:
tgutils.pandas.BaseSeries
A data series of 32-bit integers.
-
dtype
= 'int32'¶
-
-
class
tgutils.pandas.
SeriesInt64
(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)¶ Bases:
tgutils.pandas.BaseSeries
A data series of 64-bit integers.
-
dtype
= 'int64'¶
-
-
class
tgutils.pandas.
SeriesInt8
(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)¶ Bases:
tgutils.pandas.BaseSeries
A data series of 8-bit integers.
-
dtype
= 'int8'¶
-
-
class
tgutils.pandas.
SeriesStr
(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)¶ Bases:
tgutils.pandas.BaseSeries
A data series of Unicode strings.
-
dtype
= 'O'¶
-
tgutils.setup_mypy module¶
Import this module in the setup.py file to use the provided Numpy/Pandas mypy stubs.
TODO: This is a horrible hack.
tgutils.tests module¶
Common utilities for tests.
-
class
tgutils.tests.
TestWithFiles
(methodName='runTest')¶ Bases:
tgutils.tests.TestWithReset
-
expect_file
(path: str, expected: str) → None¶
-
setUp
() → None¶ Hook method for setting up the test fixture before exercising it.
-
tearDown
() → None¶ Hook method for deconstructing the test fixture after testing it.
-
-
class
tgutils.tests.
TestWithReset
(methodName='runTest')¶ Bases:
unittest.case.TestCase
-
setUp
() → None¶ Hook method for setting up the test fixture before exercising it.
-
-
tgutils.tests.
undent
(content: str) → str¶
-
tgutils.tests.
write_file
(path: str, content: str = '') → None¶
tgutils.tg_qsub module¶
Submit a job to qsub in the Tanay Group lab.
-
class
tgutils.tg_qsub.
Qsubber
¶ Bases:
object
Submit a job to qsub in the Tanay Group lab.
-
__init__
() → None¶ Initialize self. See help(type(self)) for accurate signature.
-
run
() → int¶ Run the submitted job using the command line options.
-
-
tgutils.tg_qsub.
main
() → None¶ Submit a job to qsub in the Tanay Group lab.
tgutils.version module¶
Version is generated by setup.py.
Module contents¶
Main TGUtils module.