data ¶

Interface of DafReader and DafWriter. See the Julia documentation for details.

class dafpy.data. DafReader ( jl_obj ) [source] ¶

Read-only access to Daf data. See the Julia documentation for details.

property name : str ¶: Return the (hopefully unique) name of the Daf data set.

description ( * , cache : bool = False , deep : bool = False , tensors : bool = True ) → str [source] ¶: Return a (multi-line) description of the contents of Daf data. See the Julia documentation for details.

has_scalar ( name : str ) → bool [source] ¶: Check whether a scalar property with some name exists in the Daf data set. See the Julia documentation for details.

Get the value of a scalar property with some name in the Daf data set. See the Julia documentation for details.

Numeric scalars are always returned as int or float, regardless of the specific data type in which they are stored in the Daf data set (e.g., a UInt8 will be returned as an int instead of a np.uint8).
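As a rough illustration of the conversion described above (this is plain numpy, not dafpy code), a fixed-width numpy scalar can be turned into a built-in Python number with `.item()`. The values below are made up for the example.

```python
import numpy as np

# Made-up values standing in for scalars stored with specific widths.
stored_count = np.uint8(7)
stored_ratio = np.float32(0.5)

# .item() converts a numpy scalar to the matching built-in Python type.
count = stored_count.item()
ratio = stored_ratio.item()

print(type(count).__name__, type(ratio).__name__)
```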

scalars_set ( ) → AbstractSet [ str ] [source] ¶: The names of the scalar properties in the Daf data set. See the Julia documentation for details.

has_axis ( axis : str ) → bool [source] ¶: Check whether some axis exists in the Daf data set. See the Julia documentation for details.

axes_set ( ) → AbstractSet [ str ] [source] ¶: The set of names of the axes of the Daf data set. See the Julia documentation for details.

axis_length ( axis : str ) → int [source] ¶: The number of entries along the axis in the Daf data set. See the Julia documentation for details.

axis_np_vector ( axis : str ) → ndarray [source] ¶

A numpy vector of unique names of the entries of some axis of the Daf data set. See the Julia documentation for details.

This creates an in-memory copy of the data, which is cached for repeated calls.

axis_np_entries ( axis : str , indices : Sequence [ int ] | None = None , * , allow_empty : bool = False ) → ndarray [source] ¶

Return a numpy vector of the names of entries of the indices in the axis. See the Julia documentation for details.

The indices passed here are 0-based to fit the Python conventions. This means that if allow_empty, negative indices are converted to the empty string.

axis_dict ( axis : str ) → Mapping [ str , int ] [source] ¶: Return a dictionary converting axis entry names to their (0-based) integer index.

axis_np_indices ( axis : str , entries : Sequence [ str ] , * , allow_empty : bool = False ) → ndarray [source] ¶

Return a numpy vector of the indices of the entries in the axis. See the Julia documentation for details.

The indices returned here are 0-based to fit the Python conventions. This means that if allow_empty, the empty string is converted to the index -1.
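The 0-based conventions of axis_np_entries and axis_np_indices can be sketched in plain Python. This is only a model of the documented behavior, not the dafpy implementation: the helper names `names_of_indices` and `indices_of_names` are hypothetical, the axis entries are made up, and raising `ValueError` when allow_empty is false is an assumption.

```python
from typing import Dict, List, Sequence


def names_of_indices(
    entries: Sequence[str], indices: Sequence[int], allow_empty: bool = False
) -> List[str]:
    """Map 0-based indices to entry names; a negative index becomes "" if allowed."""
    names = []
    for index in indices:
        if index < 0:
            if not allow_empty:
                # Assumption: the sketch rejects negative indices unless allowed.
                raise ValueError(f"negative index: {index}")
            names.append("")
        else:
            names.append(entries[index])
    return names


def indices_of_names(
    index_of_entry: Dict[str, int], names: Sequence[str], allow_empty: bool = False
) -> List[int]:
    """Map entry names to 0-based indices; the empty string becomes -1 if allowed."""
    indices = []
    for name in names:
        if name == "":
            if not allow_empty:
                # Assumption: the sketch rejects empty names unless allowed.
                raise ValueError("empty entry name")
            indices.append(-1)
        else:
            indices.append(index_of_entry[name])
    return indices


# Made-up axis entries, and a name-to-index dictionary like the one axis_dict describes.
entries = ["cell_a", "cell_b", "cell_c"]
index_of_entry = {name: index for index, name in enumerate(entries)}

print(names_of_indices(entries, [2, -1, 0], allow_empty=True))
print(indices_of_names(index_of_entry, ["cell_b", ""], allow_empty=True))
```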

axis_pd_indices ( axis : str , entries : Sequence [ str ] , * , allow_empty : bool = False ) → Series [source] ¶: Return a pandas series of the indices of the entries in the axis. See the Julia documentation for details.

has_vector ( axis : str , name : str ) → bool [source] ¶: Check whether a vector property with some name exists for the axis in the Daf data set. See the Julia documentation for details.

vectors_set ( axis : str ) → AbstractSet [ str ] [source] ¶: The set of names of the vector properties for the axis in the Daf data set, not including the special name property. See the Julia documentation for details.

get_np_vector ( axis : str , name : str , * , default : None ) → ndarray | None [source] ¶

Get the vector property with some name for some axis in the Daf data set. See the Julia documentation for details.

This always returns a numpy vector (unless default is None and the vector does not exist). If the stored data is numeric and dense, this is a zero-copy view of the data stored in the Daf data set. Otherwise, a Python copy of the data as a dense numpy array is returned (and cached for repeated calls). Since Python has no concept of sparse vectors (because “reasons”), you can’t zero-copy view a sparse Daf vector using the Python API.

get_pd_vector ( axis : str , name : str , * , default : None ) → Series | None [source] ¶

Get the vector property with some name for some axis in the Daf data set. See the Julia documentation for details.

This is a wrapper around get_np_vector which returns a pandas series using the entry names of the axis as the index.

has_matrix ( rows_axis : str , columns_axis : str , name : str , * , relayout : bool = True ) → bool [source] ¶: Check whether a matrix property with some name exists for the rows_axis and the columns_axis in the Daf data set. See the Julia documentation for details.

matrices_set ( rows_axis : str , columns_axis : str , * , relayout : bool = True ) → AbstractSet [ str ] [source] ¶: The names of the matrix properties for the rows_axis and columns_axis in the Daf data set. See the Julia documentation for details.

get_np_matrix ( rows_axis : str , columns_axis : str , name : str , * , default : None , relayout : bool = True ) → ndarray | csc_matrix | None [source] ¶

Get the column-major matrix property with some name for some rows_axis and columns_axis in the Daf data set. See the Julia documentation for details.

This always returns a column-major numpy matrix or a scipy sparse csc_matrix (unless default is None and the matrix does not exist). If the stored data is numeric and dense, this is a zero-copy view of the data stored in the Daf data set.

Note that by default numpy matrices are in row-major (C) layout and not in column-major (Fortran) layout. To get a row-major matrix, simply flip the order of the axes, and call transpose on the result (which is an efficient zero-copy operation). This will also (zero-copy) convert the csc_matrix into a csr_matrix.

Also note that although we call this get_np_matrix, the result is not the deprecated np.matrix (which is to be avoided at all costs).
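The layout remark above can be checked with plain numpy, without dafpy. In this sketch the array is made up and stands in for what a lookup with the axes in flipped order (say, hypothetical "gene" then "cell" axes) might return. Transposing it gives the row-major matrix with cells as rows, sharing the same memory.

```python
import numpy as np

# Made-up data: a column-major (Fortran layout) matrix with 2 rows and 3 columns.
genes_by_cells = np.asfortranarray(np.arange(6, dtype=np.float32).reshape(2, 3))

# Transposing flips the axes without copying; the result has 3 rows and
# 2 columns and is in row-major (C) layout.
cells_by_genes = genes_by_cells.T

print(genes_by_cells.flags["F_CONTIGUOUS"], cells_by_genes.flags["C_CONTIGUOUS"])
print(np.shares_memory(genes_by_cells, cells_by_genes))
```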

get_pd_matrix ( rows_axis : str , columns_axis : str , name : str , * , default : None , relayout : bool = True ) → DataFrame | None [source] ¶

Get the column-major matrix property with some name for some rows_axis and columns_axis in the Daf data set. See the Julia documentation for details.

This is a wrapper around get_np_matrix which returns a pandas data frame using the entry names of the axes as the indices.

Note that since pandas data frames can’t contain a sparse matrix, the data will always be in a dense numpy matrix, so take care not to invoke this for a too-large sparse data matrix.

This is not to be confused with get_frame which returns a “real” pandas data frame, with arbitrary (query) columns, possibly using a different data type for each.

empty_cache ( * , clear : Literal [ 'MappedData' ] | Literal [ 'MemoryData' ] | Literal [ 'QueryData' ] | None = None , keep : Literal [ 'MappedData' ] | Literal [ 'MemoryData' ] | Literal [ 'QueryData' ] | None = None ) → None [source] ¶: Clear some cached data. By default, completely empties the caches. See the Julia documentation for details.

has_query ( query : str | Axis | Lookup | Names | QuerySequence ) → bool [source] ¶: Return whether the query can be applied to the Daf data. See the Julia documentation for details.

get_np_query ( query : None = None , * , cache : bool = True ) → PendingNumpyQuery

Apply the full query to the Daf data set and return the result. See the Julia documentation for details.

If the result isn’t a scalar, and isn’t an array of names, then we return a numpy array or a scipy csc_matrix.

If the query is not specified, this is intended to be used as query | daf.get_np_query(). This is useful when constructing the query in parts (e.g. Axis("cell") | Lookup("metacell") | daf.get_np_query()).
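One way such a `query | daf.get_np_query()` form can work in Python is through the reflected-or operator, `__ror__`. The sketch below uses toy classes (`ToyQuery`, `ToyPending`, `toy_get_query`), all hypothetical and unrelated to the real dafpy classes. It only shows the operator mechanics and makes no claim about how dafpy implements its pending queries.

```python
from typing import Callable, Optional, Union


class ToyQuery:
    """Hypothetical stand-in for a query fragment; this is not a dafpy class."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __or__(self, other: object) -> "ToyQuery":
        # Two fragments combine into a longer query. For anything else,
        # returning NotImplemented lets Python try the right operand's __ror__.
        if isinstance(other, ToyQuery):
            return ToyQuery(f"{self.text} {other.text}")
        return NotImplemented


class ToyPending:
    """Hypothetical stand-in for the pending object returned when no query is given."""

    def __init__(self, apply: Callable[[ToyQuery], str]) -> None:
        self.apply = apply

    def __ror__(self, query: ToyQuery) -> str:
        # Invoked for `query | pending`: apply the completed query.
        return self.apply(query)


def toy_get_query(query: Optional[ToyQuery] = None) -> Union[str, ToyPending]:
    """Apply the query now, or return a pending object to be used after `|`."""

    def apply(full_query: ToyQuery) -> str:
        return f"result of: {full_query.text}"

    if query is None:
        return ToyPending(apply)
    return apply(query)


result = ToyQuery("axis cell") | ToyQuery("lookup metacell") | toy_get_query()
print(result)
```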

get_pd_query ( query : None = None , * , cache : bool = True ) → PendingPandasQuery

Similar to get_np_query, but return a pandas series or data frame for vector and matrix data.

Note that since pandas data frames can’t contain a sparse matrix, the data will always be in a dense numpy matrix, so take care not to invoke this for a too-large sparse data matrix.

If the query is not specified, this is intended to be used as query | daf.get_pd_query(). This is useful when constructing the query in parts (e.g. Axis("cell") | Lookup("metacell") | daf.get_pd_query()).

Return a DataFrame containing multiple vectors of the same axis. See the Julia documentation for details.

Note this is different from get_pd_matrix which returns some 2D data as a pandas data frame. Here, each column can be the result of an arbitrary query and may have a different data type.

The order of the columns matters. Luckily, the default dictionary type is ordered in modern Python, so if you write columns = {"color": ": type => color", "age": ": batch => age"} you can trust that the color column will be first and the age column will be second.
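The ordering guarantee mentioned above is a property of the built-in dict (insertion order is preserved since Python 3.7), so it can be checked without dafpy. The dictionary below is the one from the text.

```python
# Iterating over a dict yields its keys in insertion order.
columns = {"color": ": type => color", "age": ": batch => age"}

column_names = list(columns)
print(column_names)
```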

read_only ( * , name : str | None = None ) → DafReadOnly [source] ¶: Wrap the Daf data set with a DafReadOnlyWrapper to protect it against accidental modification. See the Julia documentation for details.

class dafpy.data. DafReadOnly ( jl_obj ) [source] ¶

A read-only DafReader, which doesn’t allow any modification of the data. See the Julia documentation for details.

read_only ( * , name : str | None = None ) → DafReadOnly [source] ¶: Wrap the Daf data set with a DafReadOnlyWrapper to protect it against accidental modification. See the Julia documentation for details.

class dafpy.data. DafWriter ( jl_obj ) [source] ¶

Read-write access to Daf data. See the Julia documentation for details.

Set the value of a scalar property with some name in a Daf data set. See the Julia documentation for details.

Returns self for chaining.

You can force the data type numeric scalars are stored in by using the appropriate numpy type (e.g., a np.uint8 will be stored as a UInt8).

delete_scalar ( name : str , * , must_exist : bool = True ) → Self [source] ¶

Delete a scalar property with some name from the Daf data set. See the Julia documentation for details.

Returns self for chaining.

add_axis ( axis : str , entries : Sequence [ str ] | ndarray , * , overwrite : bool = False ) → Self [source] ¶

Add a new axis to the Daf data set. See the Julia documentation for details.

Returns self for chaining.

delete_axis ( axis : str , * , must_exist : bool = True ) → Self [source] ¶

Delete an axis from the Daf data set. See the Julia documentation for details.

Returns self for chaining.

Set a vector property with some name for some axis in the Daf data set. See the Julia documentation for details.

If the provided value is numeric and dense, this passes a zero-copy view of the data to the Daf data set. Otherwise, a Python copy of the data is made (as a dense numpy array), and passed to Daf.

As a convenience, you can pass a 1xN or Nx1 matrix here and it will be mercifully interpreted as a vector. This allows creating sparse vectors in Daf by passing a 1xN slice of a sparse (column-major) Python matrix.

Returns self for chaining.

empty_dense_vector ( axis : str , name : str , eltype : Type , * , overwrite : bool = False ) → Iterator [ ndarray ] [source] ¶

Create an empty dense vector property with some name for some axis in the Daf data set, and pass it to the block to be filled. See the Julia documentation for details.

Note this is a Python contextmanager, that is, it is meant to be used with the with statement: with dset.empty_dense_vector(...) as empty_vector: ....
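For readers unfamiliar with the pattern, here is a minimal sketch of a fill-in-place context manager written with the standard `contextlib`. The function `toy_empty_vector` and the dict-based `store` are hypothetical stand-ins, not dafpy code. In particular, storing the buffer only when the block finishes without an exception is the behavior of this toy and is not documented above for dafpy.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


@contextmanager
def toy_empty_vector(
    store: Dict[str, List[float]], name: str, length: int
) -> Iterator[List[float]]:
    """Hypothetical stand-in: yield a zero-filled buffer, then store it."""
    buffer = [0.0] * length
    yield buffer
    # Reached only if the with block finished without raising.
    store[name] = buffer


store: Dict[str, List[float]] = {}

with toy_empty_vector(store, "age", 3) as empty_vector:
    # The block fills the buffer in place.
    for index in range(3):
        empty_vector[index] = 10.0 * index

print(store)
```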

empty_sparse_vector ( axis : str , name : str , eltype : Type , nnz : int , indtype : Type , * , overwrite : bool = False ) → Iterator [ Tuple [ ndarray , ndarray ] ] [source] ¶

Create an empty sparse vector property with some name for some axis in the Daf data set, pass its parts ( nzind and nzval) to the block to be filled. See the Julia documentation for details.

Note this is a Python contextmanager, that is, it is meant to be used with the with statement: with dset.empty_sparse_vector(...) as (empty_nzind, empty_nzval): .... The arrays are to be filled with Julia’s SparseVector data, that is, empty_nzind needs to be filled with 1-based indices (as opposed to the 0-based indices typically used by scipy.sparse). Due to this difference in the indexing, we can’t zero-copy share sparse data between Python and Julia. Sigh.
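To make the 1-based requirement concrete, the following plain-Python sketch computes the two parts for a small made-up dense vector. The helper name `julia_sparse_parts` is hypothetical; in real use the values would be written into the arrays provided by the with block rather than into new lists.

```python
from typing import List, Sequence, Tuple


def julia_sparse_parts(dense: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Return (nzind, nzval) for the non-zero entries, with 1-based nzind."""
    nzind = []
    nzval = []
    for position, value in enumerate(dense):  # position is 0-based
        if value != 0:
            nzind.append(position + 1)  # shift to Julia's 1-based indexing
            nzval.append(value)
    return nzind, nzval


nzind, nzval = julia_sparse_parts([0.0, 2.5, 0.0, 0.0, 4.0])
print(nzind, nzval)
```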

delete_vector ( axis : str , name : str , * , must_exist : bool = True ) → Self [source] ¶

Delete a vector property with some name for some axis from the Daf data set. See the Julia documentation for details.

Returns self for chaining.

set_matrix ( rows_axis : str , columns_axis : str , name : str , value : ndarray | csc_matrix , * , overwrite : bool = False , relayout : bool = True ) → Self [source] ¶

Set the matrix property with some name for some rows_axis and columns_axis in the Daf data set. See the Julia documentation for details.

Since Daf is implemented in Julia, this should be a column-major matrix, so if you have a standard numpy or scipy row-major matrix, flip the order of the axes and pass the transpose (which is an efficient zero-copy operation).

Returns self for chaining.

empty_dense_matrix ( rows_axis : str , columns_axis : str , name : str , eltype : Type , * , overwrite : bool = False ) → Iterator [ ndarray ] [source] ¶

Create an empty (column-major) dense matrix property with some name for some rows_axis and columns_axis in the Daf data set, and pass it to the block to be filled. See the Julia documentation for details.

Note this is a Python contextmanager, that is, it is meant to be used with the with statement: with dset.empty_dense_matrix(...) as empty_matrix: ....

empty_sparse_matrix ( rows_axis : str , columns_axis : str , name : str , eltype : Type , nnz : int , indtype : Type , * , overwrite : bool = False ) → Iterator [ Tuple [ ndarray , ndarray , ndarray ] ] [source] ¶

Create an empty (column-major) sparse matrix property with some name for some rows_axis and columns_axis in the Daf data set, and pass its parts (colptr, rowval and nzval) to the block to be filled. See the Julia documentation for details.

Note this is a Python contextmanager, that is, it is meant to be used with the with statement: with dset.empty_sparse_matrix(...) as (empty_colptr, empty_rowval, empty_nzval): .... The arrays are to be filled with Julia’s SparseMatrixCSC data, that is, empty_colptr and empty_rowval need to be filled with 1-based indices (as opposed to the 0-based indices used by scipy.sparse.cs[cr]_matrix). Due to this difference in the indexing, we can’t zero-copy share sparse data between Python and Julia. Sigh.
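The index shift for a compressed-column matrix can be sketched in plain Python. The lists below describe a made-up 3x2 matrix in the 0-based layout that scipy's csc_matrix uses for its indptr and indices attributes. The helper `to_julia_csc` is hypothetical and only adds one to every index.

```python
from typing import List, Sequence, Tuple


def to_julia_csc(
    indptr: Sequence[int], indices: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Shift 0-based CSC column pointers and row indices to 1-based ones."""
    colptr = [pointer + 1 for pointer in indptr]
    rowval = [row + 1 for row in indices]
    return colptr, rowval


# Made-up 3x2 matrix [[1, 0], [0, 2], [3, 0]] in 0-based CSC form:
# column 0 holds rows 0 and 2, column 1 holds row 1.
indptr = [0, 2, 3]
indices = [0, 2, 1]
nzval = [1.0, 3.0, 2.0]

colptr, rowval = to_julia_csc(indptr, indices)
print(colptr, rowval, nzval)
```

The non-zero values themselves are unchanged by the shift; only the two index arrays differ between the conventions.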

relayout_matrix ( rows_axis : str , columns_axis : str , name : str , * , overwrite : bool = False ) → Self [source] ¶

Given that a matrix property with some name exists (in column-major layout) in the Daf data set for the rows_axis and the columns_axis, relayout it and store the row-major result as well (that is, with flipped axes). See the Julia documentation for details.

Returns self for chaining.

delete_matrix ( rows_axis : str , columns_axis : str , name : str , * , must_exist : bool = True ) → Self [source] ¶

Delete a matrix property with some name for some rows_axis and columns_axis from the Daf data set. See the Julia documentation for details.

Returns self for chaining.

dafpy.data. CacheGroup ¶

Types of cached data inside Daf. See the Julia documentation for details.

alias of Union[Literal['MappedData'], Literal['MemoryData'], Literal['QueryData']]