metacell.storage package¶
Submodules¶
metacell.storage.genes module¶
Manage storage of per-gene data.
-
class
metacell.storage.genes.
Genes
(full: metacell.storage.genes.GenesSet, names: tgutils.numpy.ArrayStr)¶ Bases:
object
A collection of genes with some associated data.
This must be consistent between all profiles that are processed together.
-
__init__
(full: metacell.storage.genes.GenesSet, names: tgutils.numpy.ArrayStr) → None¶ Initialize the genes.
-
abstract
any_array
(name: str) → numpy.ndarray¶ Load a per-gene data array.
-
array
(cls: Type[A], name: str) → A¶ Load a per-gene data array.
-
abstract
available_data
() → List[str]¶ Return a list of the available data.
-
count
= None¶ Convenient access to the number of genes.
-
static
created
(path: str, organism: str) → None¶ Write the
genes.yaml
file after creating a genes directory. At minimum, the directory should contain the gene names file.
-
full
= None¶ The full gene set this is derived from.
-
abstract
has_array
(name: str) → bool¶ Whether there exists some per-gene data.
-
static
load
(path: str, organism: Optional[str] = None) → metacell.storage.genes.Genes¶ Load genes from a directory.
-
names
= None¶ The sorted upper-case names of genes in the set.
-
uuid
= None¶ The unique identifier of this list of gene names.
-
-
class
metacell.storage.genes.
GenesMetadata
(**kwargs)¶ Bases:
metacell.storage.metadata.YamlMetadata
Per-genes meta-data.
-
__init__
(**kwargs) → None¶ Create metadata for some genes.
-
genes_count
= None¶ The number of genes.
-
organism
= None¶ The organism these are genes of.
-
required_keys
= {'genes_count': <class 'int'>, 'organism': <class 'str'>, 'uuid': <class 'str'>}¶
-
-
class
metacell.storage.genes.
GenesSet
(metadata: metacell.storage.genes.GenesMetadata)¶ Bases:
metacell.storage.genes.Genes
A set of genes with some associated data.
This must be consistent between all batches that are processed together.
-
__init__
(metadata: metacell.storage.genes.GenesMetadata) → None¶ Open a genes set directory for access.
-
any_array
(name: str) → numpy.ndarray¶ Load a per-gene data array.
-
available_data
() → List[str]¶ Return a list of the available data.
-
data_path
(path: str) → str¶ Return the path of a data file in the genes directory.
-
has_array
(name: str) → bool¶ Whether there exists some per-gene data.
-
metadata
= None¶ The meta-data describing the genes.
-
-
class
metacell.storage.genes.
GenesSubset
(superset: metacell.storage.genes.Genes, included_indices: tgutils.numpy.ArrayInt32)¶ Bases:
metacell.storage.genes.Genes
A subset of some genes.
-
__init__
(superset: metacell.storage.genes.Genes, included_indices: tgutils.numpy.ArrayInt32) → None¶ Create a subset of some genes.
-
any_array
(name: str) → numpy.ndarray¶ Load a per-gene data array.
-
available_data
() → List[str]¶ Return a list of the available data.
-
has_array
(name: str) → bool¶ Whether there exists some per-gene data.
-
included_indices
= None¶ The indices of the superset genes which are included in the subset.
-
superset
= None¶ The genes this is a subset of.
-
metacell.storage.hca module¶
Handle HCA (h5) files.
metacell.storage.helpers module¶
Helper functions.
-
metacell.storage.helpers.
combine_uuids
(uuids: List[uuid.UUID]) → uuid.UUID¶ Use
md5sum
to combine multiple UUIDs into a single UUID.
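As a rough illustration of the idea, the sketch below feeds the raw bytes of each UUID into an MD5 digest and reinterprets the 16-byte result as a new UUID. The name combine_uuids_sketch and the choice to hash the binary (rather than textual) form of each UUID are assumptions made for this example, so the values it produces are not guaranteed to match those of the library.

```python
import hashlib
import uuid
from typing import List


def combine_uuids_sketch(uuids: List[uuid.UUID]) -> uuid.UUID:
    """Hypothetical stand-in for combine_uuids: MD5 over the UUID bytes."""
    digest = hashlib.md5()
    for item in uuids:
        # Assumption: the 16 raw bytes of each UUID are hashed, in order.
        digest.update(item.bytes)
    # An MD5 digest is exactly 16 bytes, the size of a UUID.
    return uuid.UUID(bytes=digest.digest())


first = uuid.UUID(int=1)
second = uuid.UUID(int=2)
combined = combine_uuids_sketch([first, second])
```

With this construction the result is deterministic and depends on the order of the inputs.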
-
metacell.storage.helpers.
file_uuid
(path: str) → uuid.UUID¶ Compute a checksum of a disk file.
-
metacell.storage.helpers.
sum_profiles_loop
(expected: int) → tgutils.application.Loop¶ Create a logged loop for summing profiles.
metacell.storage.imports module¶
Handle multi-lane sparse format data.
-
class
metacell.storage.imports.
CsvFormat
(path: str, data: Dict[str, Any])¶ Bases:
object
Describe the format of a CSV file.
-
__init__
(path: str, data: Dict[str, Any]) → None¶ Construct from an entry in a
format.yaml
file.
-
field_names
= None¶ When the file has no header line, assume this header line is used.
-
field_types
= None¶ For each recognized field of the file, its expected data type.
-
separator
= None¶ The separator between fields.
-
-
class
metacell.storage.imports.
DelaneData
¶ Bases:
object
Data needed to delane the UMIs matrix.
-
__init__
() → None¶ Initialize self. See help(type(self)) for accurate signature.
-
data_of_lanes
= None¶ The per-lane data.
-
data_of_profiles
= None¶ The per-profile data by the 0-based original index.
-
get_lane
(lane_name: str) → metacell.storage.imports.LaneData¶ Get the data for a specific lane.
-
-
class
metacell.storage.imports.
DirFormat
(path: str)¶ Bases:
object
Describe the format of all CSV files in a standard format directory.
-
__init__
(path: str) → None¶ Construct the format from a
format.yaml
file.
-
genes
= None¶ The format of the genes CSV file.
-
profiles
= None¶ The format of the profiles CSV file.
-
-
class
metacell.storage.imports.
LaneData
(name: str)¶ Bases:
object
Per-lane data needed to delane the UMIs matrix.
-
__init__
(name: str) → None¶ Initialize empty lane data.
-
add_entry
(gene_index: int, profile_index: int, umis_count: int) → None¶ Add a scanned UMIs entry to the lane.
-
entry_lines
= None¶ The lines of the entries of the lane.
-
last_profile_data_line_index
= None¶ The last profile data line index of the lane.
-
name
= None¶ The unique lane name.
-
profiles_count
= None¶ The number of profiles in the lane.
-
write_umis_matrix
(delaned_root: str, umis_header_line: str, genes_count: int) → None¶ Write the collected entries into the UMIs matrix file.
-
-
class
metacell.storage.imports.
ProfileData
(lane_data: metacell.storage.imports.LaneData)¶ Bases:
object
Per-profile data needed to delane the UMIs matrix.
-
__init__
(lane_data: metacell.storage.imports.LaneData) → None¶ Initialize scanned profile data.
-
index_in_lane
= None¶ The 1-based index of the profile in the lane.
-
lane_data
= None¶ The lane the profile belongs to.
-
metacell.storage.matrices module¶
Memory-mapped matrices for observations storage.
-
metacell.storage.matrices.
MAGIC_BYTES
= b'METACELL SPARSE MEMORY MAPPED MATRIX VERSION 0\x00\x00'¶ The magic string for a sparse memory mapped matrix.
-
metacell.storage.matrices.
MAGIC_SIZE
= 48¶ Size of the magic string identifying the file type.
-
class
metacell.storage.matrices.
MemoryMappedMatrix
(path: str)¶ Bases:
object
A read-only memory mapped file containing a series of integer measurements per profile.
This is optimized to allow for accessing the data of arbitrary profiles. It does not allow for efficient extraction of the data of arbitrary genes.
The file format is as follows:
Magic string: containing the metacell.storage.matrices.MAGIC_BYTES.
Profiles count: 4 bytes, little-endian.
Genes count: 4 bytes, little-endian.
Profiles data: See below.
Padding to 4-byte alignment using 0 bytes.
Index of offsets of profile data: See below.
The format of the index of the offsets is as follows:
For each profile, the 4-byte little-endian offset of the 1st byte of its data.
Finally, the offset of the 1st byte following the data of the last profile.
The format of the profile data is as follows:
A sequence of pairs of bytes, where the 1st byte is the unsigned delta to apply to the current gene index, and the 2nd byte is the unsigned value to add to the current gene of the profile.
This format is very compact when more than 1/256 of the genes have non-zero data, and when the measurement value is typically less than 256. When genes are sparser, we write additional byte pairs with a delta gene index of 255 and measurement of 0. When the measurement value is too high, we write additional byte pairs with a delta gene index of 0.
Decompression is reasonably fast as there are no branches involved. It is difficult to vectorize but the saving in I/O (especially when accessing the data over a network) “should” justify this compression. However it is trivial to parallelize decompression of multiple profile data.
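The byte-pair scheme described above can be sketched in pure Python as follows. This is an illustration only, not the library's implementation: the function names encode_profile and decode_profile are hypothetical, and the sketch assumes that the current gene index starts at 0 for every profile.

```python
from typing import List, Tuple


def encode_profile(entries: List[Tuple[int, int]]) -> bytes:
    """Encode sorted (gene_index, value) entries as (delta, value) byte pairs."""
    encoded = bytearray()
    current_gene = 0  # Assumption: the gene index starts at 0.
    for gene_index, value in entries:
        if value == 0:
            continue  # Zero observations contribute nothing.
        delta = gene_index - current_gene
        if delta < 0:
            raise ValueError('entries must be sorted by gene index')
        while delta > 255:
            # Gap too large for one byte: skip 255 genes, add nothing.
            encoded += bytes((255, 0))
            delta -= 255
        chunk = min(value, 255)
        encoded += bytes((delta, chunk))
        value -= chunk
        while value > 0:
            # Value too large for one byte: stay on the gene, add more.
            chunk = min(value, 255)
            encoded += bytes((0, chunk))
            value -= chunk
        current_gene = gene_index
    return bytes(encoded)


def decode_profile(encoded: bytes, genes_count: int) -> List[int]:
    """Expand the byte pairs into a dense per-gene list, without branches."""
    dense = [0] * genes_count
    gene = 0
    for position in range(0, len(encoded), 2):
        gene += encoded[position]
        dense[gene] += encoded[position + 1]
    return dense


packed = encode_profile([(3, 7), (600, 1000)])
dense = decode_profile(packed, 700)
```

Here the jump from gene 3 to gene 600 costs two filler pairs, and the value 1000 is split over four pairs, so the two entries take 14 bytes in total.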
The compression rate is pretty good (around 1/50 compared to dense encoding). The optimal number of bits for the genes and measurement deltas is actually 8 and 4, respectively. However, this complicates the code and only reduces the file sizes by around 20%. Using an optimized Huffman tree or gzip, it is possible to further compress the result to around half the current size, but this would come at the cost of much slower decompression.
The profile data offsets index appears at the end of the file, but its location is known to be file_size - (profiles_count + 1) * 4. The profiles count is known to be at the offset metacell.storage.matrices.MAGIC_SIZE.
All 4-byte values are stored on a 4-byte aligned offset, and are in little-endian order. This allows a memory-mapped implementation on most processors to decode the value using an efficient aligned pointer dereference.
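To make the overall layout concrete, the following sketch builds a tiny file image in memory and then locates the counts and the offsets index using only the rules stated above. The helper names build_image and read_layout are hypothetical. The sketch also assumes that the counts and offsets are unsigned, and that each offset is measured from the start of the file; neither detail is spelled out in the description.

```python
import struct
from typing import List, Tuple

MAGIC_BYTES = b'METACELL SPARSE MEMORY MAPPED MATRIX VERSION 0\x00\x00'
MAGIC_SIZE = 48


def build_image(genes_count: int, profiles: List[bytes]) -> bytes:
    """Assemble magic, counts, profile data, padding and offsets index."""
    image = bytearray(MAGIC_BYTES)
    image += struct.pack('<II', len(profiles), genes_count)
    offsets = []
    for profile_data in profiles:
        offsets.append(len(image))  # Assumption: absolute file offsets.
        image += profile_data
    offsets.append(len(image))  # First byte after the last profile.
    image += b'\x00' * (-len(image) % 4)  # Pad to 4-byte alignment.
    image += struct.pack('<%dI' % len(offsets), *offsets)
    return bytes(image)


def read_layout(image: bytes) -> Tuple[int, int, Tuple[int, ...]]:
    """Return (profiles_count, genes_count, offsets) of a file image."""
    if image[:MAGIC_SIZE] != MAGIC_BYTES:
        raise ValueError('not a sparse memory mapped matrix')
    profiles_count, genes_count = struct.unpack_from('<II', image, MAGIC_SIZE)
    index_start = len(image) - (profiles_count + 1) * 4
    offsets = struct.unpack_from(
        '<%dI' % (profiles_count + 1), image, index_start)
    return profiles_count, genes_count, offsets


image = build_image(10, [b'\x03\x07', b'', b'\x01\x02\x00\x05'])
profiles_count, genes_count, offsets = read_layout(image)
```

The data of profile i is then the slice image[offsets[i]:offsets[i + 1]], which is empty for a profile without any non-zero entries.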
-
__init__
(path: str) → None¶ Open a read-only memory-mapped matrix file.
-
add_profile_data
(profile_index: int, array: tgutils.numpy.ArrayFloat32) → None¶ Add the profile data into a numpy array.
-
close
() → None¶ Release all the resources held by the file.
-
entries_count
() → int¶ Return the total number of non-zero entries in the matrix.
-
genes_count
= None¶ The number of genes in the matrix.
-
mmap
= None¶ The mapped memory.
-
profiles_count
= None¶ The number of profiles in the matrix.
-
to_numpy_matrix
() → tgutils.numpy.MatrixFloat32¶ Return the full matrix as a dense numpy array, whose rows are genes, columns are profiles, and entries are 4-byte integers containing observations.
-
static
write
(path: str, genes_count: int, data: List[List[Tuple[int, int]]]) → None¶ Write a sparse memory-mapped matrix.
- Parameters
path – The path of the disk file to write.
genes_count – The number of genes (size of gene set).
data – A list of profile data, where each profile data is a list of tuples, where each tuple has a gene index and an observation. The list of tuples must be sorted by the gene index. Entries with zero observations are allowed, and have no effect on the output. Multiple tuples with the same gene index are allowed, and are summed into the output.
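The constraints on the data parameter can be illustrated with a small checker that validates one profile's list of tuples and computes the totals that would end up in the output. The function summarize_profile is a hypothetical helper written for this example and is not part of the package.

```python
from typing import Dict, List, Tuple


def summarize_profile(genes_count: int,
                      entries: List[Tuple[int, int]]) -> Dict[int, int]:
    """Check one profile's entries and return the summed non-zero values."""
    totals: Dict[int, int] = {}
    previous_gene = 0
    for gene_index, observation in entries:
        if not 0 <= gene_index < genes_count:
            raise ValueError('gene index out of range')
        if gene_index < previous_gene:
            raise ValueError('entries must be sorted by gene index')
        previous_gene = gene_index
        if observation != 0:
            # Repeated gene indices are summed; zeros are dropped.
            totals[gene_index] = totals.get(gene_index, 0) + observation
    return totals


# Two profiles over 6 genes; the second profile has no observations.
data = [[(0, 2), (3, 0), (5, 1), (5, 4)], []]
summaries = [summarize_profile(6, entries) for entries in data]
```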
metacell.storage.metadata module¶
Manage meta-data in YAML files.
-
class
metacell.storage.metadata.
YamlMetadata
(**kwargs)¶ Bases:
types.SimpleNamespace
Meta-data stored in a YAML file.
-
__init__
(**kwargs) → None¶ Create the metadata object.
-
as_dict
(**kwargs) → Dict[str, Any]¶ Return the metadata as a clean dictionary for dumping to YAML.
-
static
detect_cls
(cls: Type[YM], _dictionary: Dict[str, Any]) → Type[YM]¶ Decide on the concrete metadata class to load from the dictionary.
-
dump
(yaml_path: str, **kwargs) → None¶ Dump batch metadata into a file.
-
classmethod
load
(yaml_path: str) → YM¶ Load the batch meta-data from a file.
-
path
= None¶ The path of the directory containing the data.
-
removed_keys
= ['path', 'yaml_path']¶ The keys that should be removed from the YAML file.
-
required_keys
= {'uuid': <class 'str'>}¶ The keys that must appear in the YAML file.
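A mapping of this shape (key name to expected type) lends itself to a simple validation pass over the dictionary loaded from the YAML file. The sketch below shows one way such a check could look; check_required_keys is a hypothetical function, and the actual validation performed by the class may differ.

```python
from typing import Any, Dict

REQUIRED_KEYS = {'uuid': str}


def check_required_keys(data: Dict[str, Any],
                        required_keys: Dict[str, type]) -> None:
    """Raise if a required key is missing or has an unexpected type."""
    for key, expected_type in required_keys.items():
        if key not in data:
            raise KeyError('missing required key: %s' % key)
        if not isinstance(data[key], expected_type):
            raise TypeError('key %s should be of type %s'
                            % (key, expected_type.__name__))


check_required_keys({'uuid': 'some-identifier', 'extra': 1}, REQUIRED_KEYS)
```

Derived classes that extend the mapping (for example with genes_count as an int) would be checked in the same way.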
-
uuid
= None¶ The globally unique identifier of the data.
-
yaml_path
= None¶ The path of the YAML meta-data file.
-
metacell.storage.profiles module¶
Maintain per-gene-per-profile UMIs data on the disk.
-
class
metacell.storage.profiles.
CombinedMetadata
(yaml_path: str, data: Any)¶ Bases:
types.SimpleNamespace
Per-combined profiles container metadata.
-
__init__
(yaml_path: str, data: Any) → None¶ Load the meta-data from a YAML file.
-
as_dict
(**kwargs) → Dict[str, Any]¶ Return the metadata as a clean dictionary for dumping to YAML.
-
name
= None¶ The name of the combined container in the collection.
-
profiles_count
= None¶ The number of profiles in the combined container.
-
uuid
= None¶ The unique identifier of the combined container.
-
-
class
metacell.storage.profiles.
Profiles
(*, metadata: metacell.storage.profiles.ProfilesMetadata, genes: metacell.storage.genes.Genes, name: str = '')¶ Bases:
object
Common base class for all profile containers.
-
__init__
(*, metadata: metacell.storage.profiles.ProfilesMetadata, genes: metacell.storage.genes.Genes, name: str = '') → None¶ Initialize self. See help(type(self)) for accurate signature.
-
add_into_gp_array
(name: str, profile_index: int, profile_array: tgutils.numpy.ArrayFloat32) → tgutils.numpy.ArrayFloat32¶ Add the requested data of a profile into an array.
-
any_array
(kind: str, name: str) → numpy.ndarray¶ Return a per-gene or per-profile array of any type.
-
any_frame
(kind: str, name: str, *, profile_indices: Optional[Collection], gene_indices: Optional[Collection]) → pandas.core.frame.DataFrame¶ Return a per-gene/gene, per-gene/profile or per-profile/profile data of any type.
Note
The returned frame index for genes (for gg and gp) is the gene names, even though the gene_indices parameter uses the integer index. Both the profile_indices and the returned frame index for profiles (for gp and pp) use the integer profile indices.
-
any_matrix
(kind: str, name: str, profile_indices: Optional[Collection] = None, gene_indices: Optional[Collection] = None) → numpy.ndarray¶ Return a per-gene/gene, per-gene/profile or per-profile/profile data of any type.
-
any_series
(kind: str, name: str) → pandas.core.series.Series¶ Return a series of per-gene or per-profile data of any type.
-
array
(cls: Type[A], kind: str, name: str) → A¶ Return a per-gene or per-profile array of some type.
-
available_data
(kind: str) → List[str]¶ Return a list of the available data of the specified kind (‘g’, ‘p’, ‘gp’, ‘gg’ or ‘pp’).
-
data_path
(path: str) → str¶ Return the path of a data file in the container directory.
-
abstract
dir_paths
() → List[str]¶ Return the list of directories containing actual UMIs data for the container.
This is just a single path for a batch, and a list for a merged collection of batches. For a view it contains the view’s directory in addition to the base container’s.
-
frame
(cls: Type[F], kind: str, name: str, *, profile_indices: Optional[Collection] = None, gene_indices: Optional[Collection] = None) → F¶ Return a per-gene/gene, per-gene/profile or per-profile/profile data of some type.
Note
The returned frame index for genes (for gg and gp) is the gene names, even though the gene_indices parameter uses the integer index. Both the profile_indices and the returned frame index for profiles (for gp and pp) use the integer profile indices.
-
genes
= None¶ The genes set stored for each profile.
-
has_array
(kind: str, name: str) → bool¶ Test whether the container contains some per-gene or per-profile data file.
-
has_matrix
(kind: str, name: str) → bool¶ Test whether the container contains some per-gene/gene, per-gene/profile or per-profile/profile data file.
-
static
load
(path: str, genes: metacell.storage.genes.Genes, *, name: str = '', profiles_kind: Optional[str] = None) → metacell.storage.profiles.Profiles¶ Load a profiles container from disk directory.
-
matrix
(cls: Type[A], kind: str, name: str, *, profile_indices: Optional[Collection] = None, gene_indices: Optional[Collection] = None) → A¶ Return a per-gene/gene, per-gene/profile or per-profile/profile data of some type.
-
metadata
= None¶ The meta-data describing the container.
-
name
= None¶ The human friendly name of the container.
-
profile_indices
() → tgutils.numpy.ArrayInt32¶ Return an array of profile indices.
-
abstract
profile_name
(profile_index: int, *, base_index: int = 0) → str¶ Return a human-friendly profile name.
This uses the profile’s barcode (if a
p.barcode.txt
file exists), otherwise the numeric profile index. In both cases the name is prefixed with the specific batch name, if any.
-
profile_names
(*, base_index: int = 0) → tgutils.numpy.ArrayStr¶ Return an array of all the profile names.
-
profiles_count
= None¶ The number of profiles in the container (to be filled by the sub-class).
-
series
(cls: Type[S], kind: str, name: str) → S¶ Return a series of per-gene or per-profile data of some type.
-
total_umis
(kind: str, profile_indices: Optional[Collection] = None) → tgutils.numpy.ArrayFloat32¶ Return an array of the per-gene or per-profile total UMIs, for all or some of the profiles.
The first time this is called (only for all the profiles), the result is cached on disk.
-
write
(cls: Type[A], array: A, kind: str, name: str) → None¶ Write a data array or matrix into the container.
-
-
class
metacell.storage.profiles.
ProfilesBatch
(*, metadata: metacell.storage.profiles.ProfilesMetadata, genes: metacell.storage.genes.Genes, name: str = '')¶ Bases:
metacell.storage.profiles.Profiles
A batch of UMIs measurement profiles.
-
__init__
(*, metadata: metacell.storage.profiles.ProfilesMetadata, genes: metacell.storage.genes.Genes, name: str = '') → None¶ Open a batch of profiles for access.
-
static
created
(path: str, genes: metacell.storage.genes.Genes, profiles_kind: str) → None¶ Write the
profiles.yaml
file after creating a profiles batch in some directory. At minimum, the directory should include a raw UMIs matrix.
-
dir_paths
() → List[str]¶ Return the list of directories containing actual UMIs data for the container.
This is just a single path for a batch, and a list for a merged collection of batches. For a view it contains the view’s directory in addition to the base container’s.
-
metadata
= None¶ The meta-data describing the batch.
-
profile_name
(profile_index: int, *, base_index: int = 0) → str¶ Return a human-friendly profile name.
This uses the profile’s barcode (if a
p.barcode.txt
file exists), otherwise the numeric profile index. In both cases the name is prefixed with the specific batch name, if any.
-
-
class
metacell.storage.profiles.
ProfilesBatchMetadata
(**kwargs)¶ Bases:
metacell.storage.profiles.ProfilesMetadata
Per-batch meta-data.
-
container
¶ alias of
ProfilesBatch
-
-
class
metacell.storage.profiles.
ProfilesCollection
(metadata: metacell.storage.profiles.ProfilesMetadata, genes: metacell.storage.genes.Genes, name: str = '')¶ Bases:
metacell.storage.profiles.Profiles
A collection of named containers.
Each combined container is a sub-directory which is, itself, a profiles container. Per-profile and per-gene-per-profile data is collected from the leaf (batch or view) containers.
-
__init__
(metadata: metacell.storage.profiles.ProfilesMetadata, genes: metacell.storage.genes.Genes, name: str = '') → None¶ Initialize self. See help(type(self)) for accurate signature.
-
add_into_gp_array
(name: str, profile_index: int, profile_array: tgutils.numpy.ArrayFloat32) → tgutils.numpy.ArrayFloat32¶ Add the requested data of a profile into an array.
-
available_data
(kind: str) → List[str]¶ Return a list of the available data of the specified kind (‘g’, ‘p’, ‘gp’, ‘gg’ or ‘pp’).
-
container_index_of_profile
(profile_index: int) → int¶ Return the index of the container to which a specific profile belongs.
-
containers
= None¶ The combined (leaf) containers.
-
static
created
(path: str, combined_dirs: List[str]) → None¶ Write the
profiles.yaml
file after creating a profiles collection in some directory. The directory should include all the combined profile containers.
-
dir_paths
() → List[str]¶ Return the list of directories containing actual UMIs data for the container.
This is just a single path for a batch, and a list for a merged collection of batches. For a view it contains the view’s directory in addition to the base container’s.
-
first_profile_indices
= None¶ The first index of each container (for searching).
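Keeping the first profile index of each container makes it possible to find the container of a profile with a binary search. The sketch below illustrates this with the standard bisect module. It assumes that the first indices are stored in ascending order, starting at 0; the function name container_index_sketch is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def container_index_sketch(first_profile_indices: Sequence[int],
                           profile_index: int) -> int:
    """Return the position of the container holding profile_index."""
    # The container is the last one whose first index is <= profile_index.
    return bisect_right(first_profile_indices, profile_index) - 1


# Three containers holding 10, 15 and some further number of profiles.
first_indices = [0, 10, 25]
positions = [container_index_sketch(first_indices, index)
             for index in (0, 9, 10, 24, 25)]
```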
-
has_array
(kind: str, name: str) → bool¶ Test whether the container contains some per-gene or per-profile data file.
-
has_matrix
(kind: str, name: str) → bool¶ Test whether the container contains some per-gene/gene, per-gene/profile or per-profile/profile data file.
-
index_by_name
= None¶ The index of a combined container by its name.
-
metadata
= None¶ The meta-data describing the collection.
-
profile_name
(profile_index: int, *, base_index: int = 0) → str¶ Return a human-friendly profile name.
This uses the profile’s barcode (if a
p.barcode.txt
file exists), otherwise the numeric profile index. In both cases the name is prefixed with the specific batch name, if any.
-
-
class
metacell.storage.profiles.
ProfilesCollectionMetadata
(**kwargs)¶ Bases:
metacell.storage.profiles.ProfilesMetadata
Per-collection meta-data.
-
__init__
(**kwargs) → None¶ Create metadata for a collection of batches of profiles.
-
as_dict
(**kwargs) → Dict[str, Any]¶ Return the metadata as a clean dictionary.
-
combined_profiles
= None¶ The list of batches contained in the collection.
-
container
¶ alias of
ProfilesCollection
-
required_keys
= {'combined_profiles': <class 'list'>, 'genes_count': <class 'int'>, 'genes_uuid': <class 'str'>, 'organism': <class 'str'>, 'profiles_count': <class 'int'>, 'profiles_kind': <class 'str'>, 'uuid': <class 'str'>}¶
-
-
class
metacell.storage.profiles.
ProfilesGroups
(*, metadata: metacell.storage.profiles.ProfilesMetadata, genes: metacell.storage.genes.GenesSet, name: str = '')¶ Bases:
metacell.storage.profiles.ProfilesBatch
A batch of profiles where each one is the sum of a group of profiles from some other container.
-
__init__
(*, metadata: metacell.storage.profiles.ProfilesMetadata, genes: metacell.storage.genes.GenesSet, name: str = '') → None¶ Open a view of a profiles groups container for access.
In this case, the passed genes parameter is the full genes, which must match the full genes of the grouped container. The exact used genes are identical to the genes of the grouped container.
-
static
created
(path: str, grouped_profiles: metacell.storage.profiles.Profiles) → None¶ Write the
profiles.yaml
file after creating a profiles groups in some directory. The directory should contain the
p.grouped_count.npy
file with the size of each group.
-
grouped_profiles
= None¶ The open grouped container.
-
metadata
= None¶ The meta-data describing the groups.
-
-
class
metacell.storage.profiles.
ProfilesGroupsMetadata
(**kwargs)¶ Bases:
metacell.storage.profiles.ProfilesBatchMetadata
Per-groups meta-data.
-
container
¶ alias of
ProfilesGroups
-
required_keys
= {'genes_count': <class 'int'>, 'genes_uuid': <class 'str'>, 'grouped_profiles': <class 'str'>, 'grouped_uuid': <class 'str'>, 'organism': <class 'str'>, 'profiles_count': <class 'int'>, 'profiles_kind': <class 'str'>, 'uuid': <class 'str'>}¶
-
-
class
metacell.storage.profiles.
ProfilesMetadata
(**kwargs)¶ Bases:
metacell.storage.metadata.YamlMetadata
Common meta-data for any profiles container.
-
__init__
(**kwargs) → None¶ Create metadata for a batch of profiles.
-
container
= None¶ The (concrete) container of the (derived) class.
-
static
detect_cls
(_cls: Type[YM], dictionary: Dict[str, Any]) → Type[YM]¶ Decide on the concrete metadata class to load from the dictionary.
-
genes_count
= None¶ The number of genes.
-
genes_uuid
= None¶ The set of genes that were measured for each profile.
-
open
(genes: metacell.storage.genes.Genes, *, name: str = '') → metacell.storage.profiles.Profiles¶ Open the actual container for data access.
-
organism
= None¶ The organism the data is for (‘human’, ‘mouse’, etc.).
-
profiles_count
= None¶ The number of profiles this contains.
-
profiles_kind
= None¶ The kind of profiles this contains (‘cells’, ‘metacells’, etc.)
-
required_keys
= {'genes_count': <class 'int'>, 'genes_uuid': <class 'str'>, 'organism': <class 'str'>, 'profiles_count': <class 'int'>, 'profiles_kind': <class 'str'>, 'uuid': <class 'str'>}¶ The keys that must appear in the metadata YAML file.
-
-
class
metacell.storage.profiles.
ProfilesView
(*, metadata: metacell.storage.profiles.ProfilesMetadata, genes: metacell.storage.genes.GenesSet, name: str = '')¶ Bases:
metacell.storage.profiles.ProfilesBatch
A restricted view of some profiles container.
The view is specified using the two optional files, p.base_index.npy and genes.base_index.npy. If missing, then the view is assumed to use the full profiles and/or genes of the base container.
Having a view which uses the full base container data is useful when creating a new collection. Such a “just a link” view allows the collection to refer to an arbitrary existing container, without having to physically copy it into the collection directory. This is different from a symbolic link in that it has different modification times (for make-like tools), and allows creating additional computed data inside the view without modifying the original base container.
Note
Some data of the base container may not be meaningfully subset-able. For example, p.total_umis would be incorrect if the view uses a subset of the genes. By default, the code assumes all data is subset-able, as this allows for easy access to barcodes and other meta-data associated with each profile (or gene). To avoid subset-ing computed data, add the name of the data to one of metacell.storage.profiles.ProfilesView.GENES_DEPENDENT_DATA or metacell.storage.profiles.ProfilesView.PROFILES_DEPENDENT_DATA.
-
GENES_DEPENDENT_DATA
= {'g': {'mean_fraction'}, 'gg': {}, 'gp': {}, 'p': {'total_umis'}, 'pp': {'balanced_ranks', 'correlation', 'edge_weights', 'pruned_ranks'}}¶ For each kind (
gp, gg, pp, g, p), the names of data which depend on the set of genes.
-
PROFILES_DEPENDENT_DATA
= {'g': {'mean_fraction', 'total_umis'}, 'gg': {}, 'gp': {'fold_in_group'}, 'p': {}, 'pp': {'balanced_ranks', 'correlation', 'edge_weights', 'pruned_ranks'}}¶ For each kind (
gp, gg, pp, g, p), the names of data which depend on the set of profiles.
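One plausible reading of how these two tables are used is sketched below: data of the base container may be subset by the view unless its name is listed as depending on an axis that the view actually slices. The function can_subset_from_base is hypothetical, and the empty entries are written as empty sets here; the actual decision logic of the class is not shown in this documentation.

```python
GENES_DEPENDENT_DATA = {
    'g': {'mean_fraction'}, 'gg': set(), 'gp': set(), 'p': {'total_umis'},
    'pp': {'balanced_ranks', 'correlation', 'edge_weights', 'pruned_ranks'},
}
PROFILES_DEPENDENT_DATA = {
    'g': {'mean_fraction', 'total_umis'}, 'gg': set(),
    'gp': {'fold_in_group'}, 'p': set(),
    'pp': {'balanced_ranks', 'correlation', 'edge_weights', 'pruned_ranks'},
}


def can_subset_from_base(kind: str, name: str, *,
                         slicing_genes: bool, slicing_profiles: bool) -> bool:
    """Whether base data may be reused by just slicing it (assumed logic)."""
    if slicing_genes and name in GENES_DEPENDENT_DATA[kind]:
        return False
    if slicing_profiles and name in PROFILES_DEPENDENT_DATA[kind]:
        return False
    return True


# Per-profile totals become stale once the view drops some genes.
stale = not can_subset_from_base(
    'p', 'total_umis', slicing_genes=True, slicing_profiles=False)
```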
-
__init__
(*, metadata: metacell.storage.profiles.ProfilesMetadata, genes: metacell.storage.genes.GenesSet, name: str = '') → None¶ Open a view of a profiles container for access.
In this case, the passed genes parameter is the full genes, which must match the full genes of the base container. The exact used gene subset is computed based on the base container’s genes, and the content of the
genes.base_index.npy
file, if any.
-
add_into_gp_array
(name: str, profile_index: int, profile_array: tgutils.numpy.ArrayFloat32) → tgutils.numpy.ArrayFloat32¶ Add the requested data of a profile into an array.
-
available_data
(kind: str) → List[str]¶ Return a list of the available data of the specified kind (‘g’, ‘p’, ‘gp’, ‘gg’ or ‘pp’).
-
base_gene_index
(gene_index: int) → int¶ Return the index of the gene in the base profiles container.
-
base_gene_indices
= None¶ The indices of the included genes, or
None
if all genes are included.
-
base_profile_index
(profile_index: int) → int¶ Return the index of the profile in the base profiles container.
-
base_profile_indices
= None¶ The indices of the included profiles, or
None
if all profiles are included.
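The convention that None means "everything is included" can be illustrated by a small mapping helper, which converts an index in the view into the matching index in the base container. The function base_index_sketch is a hypothetical stand-in for base_profile_index and base_gene_index, and a plain list replaces the stored numpy array.

```python
from typing import Optional, Sequence


def base_index_sketch(index: int,
                      base_indices: Optional[Sequence[int]]) -> int:
    """Map a view index to the index used by the base container."""
    if base_indices is None:
        # Nothing is filtered, so view and base indices coincide.
        return index
    return int(base_indices[index])


# A view that keeps only the base profiles 4, 7 and 9.
kept_profiles = [4, 7, 9]
mapped = [base_index_sketch(index, kept_profiles) for index in range(3)]
```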
-
base_profiles
= None¶ The open base container.
-
static
created
(path: str, base_profiles: metacell.storage.profiles.Profiles) → None¶ Write the
profiles.yaml
file after creating a profiles view in some directory. The directory should contain the
base_index
files of the filtered profiles and/or genes.
-
dir_paths
() → List[str]¶ Return the list of directories containing actual UMIs data for the container.
This is just a single path for a batch, and a list for a merged collection of batches. For a view it contains the view’s directory in addition to the base container’s.
-
has_array
(kind: str, name: str) → bool¶ Test whether the container contains some per-gene or per-profile data file.
-
has_matrix
(kind: str, name: str) → bool¶ Test whether the container contains some per-gene/gene, per-gene/profile or per-profile/profile data file.
-
is_slicing
() → bool¶ Whether the view only includes some of the base container’s data.
-
is_slicing_genes
() → bool¶ Whether the view only includes some of the base container’s genes.
-
is_slicing_profiles
() → bool¶ Whether the view only includes some of the base container’s profiles.
-
metadata
= None¶ The meta-data describing the batch.
-
-
class
metacell.storage.profiles.
ProfilesViewMetadata
(**kwargs)¶ Bases:
metacell.storage.profiles.ProfilesBatchMetadata
Per-view meta-data.
-
container
¶ alias of
ProfilesView
-
required_keys
= {'base_profiles': <class 'str'>, 'base_uuid': <class 'str'>, 'genes_count': <class 'int'>, 'genes_uuid': <class 'str'>, 'organism': <class 'str'>, 'profiles_count': <class 'int'>, 'profiles_kind': <class 'str'>, 'uuid': <class 'str'>}¶
-
-
metacell.storage.profiles.
RawMatrix
= typing.Union[metacell.storage.matrices.MemoryMappedMatrix, numpy.ndarray]¶ A matrix of data per profile and gene.
-
metacell.storage.profiles.
grouped_kind
(groups_kind: str) → str¶ Return the profiles kind for the grouped profiles.
-
metacell.storage.profiles.
groups_kind
(grouped_kind: str) → str¶ Return the profiles kind for the profiles groups.
metacell.storage.views module¶
Create profile views.
Module contents¶
Manage disk storage of metacell data.