symforce.caspar.memory.accessors module
- class Accessor(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: object
Parent class for all memory accessors.
- Parameters:
- Return type:
- class ReadSequential(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _Sequential, _ReadAccessor
- Parameters:
- class WriteSequential(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _Sequential, _WriteAccessor
- Parameters:
- class ReadStrided(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _Strided, _ReadAccessor
- Parameters:
- class ReadStridedWithDefault(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _Strided, _ReadAccessor
- Parameters:
- class WriteStrided(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _Strided, _WriteAccessor
- Parameters:
- class AddSequential(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _Sequential, _AddAccessor
Accessor for sequentially adding to the output.
Each thread reads, increments and writes to the element at its global thread index.
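The read, increment, write pattern described above can be sketched in plain Python. This is only an emulation of the semantics, not symforce code: the function name `add_sequential` is hypothetical, and each loop iteration stands in for one GPU thread.

```python
def add_sequential(out, values):
    """Emulate one launch: thread i performs out[i] += values[i]."""
    result = list(out)
    for thread_idx in range(len(values)):  # one iteration per emulated thread
        current = result[thread_idx]       # read
        current += values[thread_idx]      # increment
        result[thread_idx] = current       # write
    return result


print(add_sequential([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [11.0, 22.0, 33.0]
```

Because every thread touches only the element at its own index, no two threads can collide and no atomics are needed.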
- Parameters:
- class ReadIndexed(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _Indexed, _ReadAccessor
- Parameters:
- class WriteIndexed(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _Indexed, _WriteAccessor
- Parameters:
- class AddIndexed(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _Indexed, _AddAccessor
Accessor for adding to indexed elements.
Each thread reads, increments and writes to the index specified by the input array. This accessor does not use atomic operations, so the indices have to be unique.
Not optimized for shared memory access or coalescing.
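To see why the indices have to be unique when no atomics are used, the sketch below emulates a worst-case interleaving in plain Python: every thread reads before any thread writes. The function name `add_indexed_non_atomic` is hypothetical and the scheduling is an assumption made for illustration; real GPU scheduling is not deterministic.

```python
def add_indexed_non_atomic(out, indices, values):
    """Emulate out[indices[i]] += values[i] without atomic operations."""
    # Phase 1: every emulated thread reads its target element.
    read_values = [out[k] for k in indices]
    # Phase 2: every emulated thread writes back its own incremented copy.
    result = list(out)
    for thread_idx, k in enumerate(indices):
        result[k] = read_values[thread_idx] + values[thread_idx]
    return result


# Unique indices: every contribution survives.
print(add_indexed_non_atomic([0.0, 0.0, 0.0], [2, 0, 1], [1.0, 2.0, 3.0]))  # [2.0, 3.0, 1.0]
# Repeated index: only the last write survives, so 1.0 and 2.0 are lost.
print(add_indexed_non_atomic([0.0, 0.0, 0.0], [1, 1, 1], [1.0, 2.0, 3.0]))  # [0.0, 3.0, 0.0]
```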
- Parameters:
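- class ReadShared(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _UsingIndexData, _UsingSharedMem, _ReadAccessor
Accessor for shared memory read access.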
You need to generate the shared indices using the lib.shared_indices function. All reads within a block are sorted (for better coalescence), read once and distributed within the block. There is a small overhead compared to Indexed access.
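The sort, read once, distribute behaviour can be emulated in plain Python as below. This is a sketch under stated assumptions: `read_shared` is a hypothetical name, a dictionary stands in for the block's shared memory, the block size is shrunk to 4 for readability, and the index preparation done by lib.shared_indices is not reproduced.

```python
def read_shared(data, indices, block_size=4):
    """Emulate block-wise deduplicated reads; returns (values, global_read_count)."""
    out = [None] * len(indices)
    global_reads = 0
    for start in range(0, len(indices), block_size):
        block = indices[start:start + block_size]
        # Sort the distinct indices requested within this block.
        unique_sorted = sorted(set(block))
        # Fetch each distinct element from "global memory" exactly once.
        shared = {k: data[k] for k in unique_sorted}
        global_reads += len(unique_sorted)
        # Distribute the cached value to every thread that requested it.
        for lane, k in enumerate(block):
            out[start + lane] = shared[k]
    return out, global_reads


data = [10, 11, 12, 13, 14]
values, reads = read_shared(data, [3, 0, 3, 3, 1, 1, 4, 1])
print(values)  # [13, 10, 13, 13, 11, 11, 14, 11]
print(reads)   # 4 global reads instead of 8
```

The saving grows with the number of threads in a block that request the same element; when all indices in a block are distinct, only the sorting overhead remains.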
- Parameters:
- class ReadUnique(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _UsingSharedMem, _ReadAccessor
Accessor for shared memory read access. You need to generate the shared indices using the lib.shared_indices function. All reads within a block are sorted (for better coalescing), read once and distributed within the block. There is a small overhead compared to Indexed access.
- Parameters:
Bases: _UsingIndexData, _UsingSharedMem, _AddAccessor
Accessor for shared sum memory write access.
Each thread adds to the value at a given index. You need to generate the shared indices from the indices using the lib.shared_indices function. All writes within a block are sorted (for better coalescing), written once and distributed within the block.
Equivalent to:
    for i, k in enumerate(indices):
        out[k] += values[i]
- Parameters:
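The block-level combining described for this accessor can be sketched in plain Python. This is an emulation under assumptions, not the library's implementation: `add_shared` is a hypothetical name, a dictionary stands in for shared memory, and the block size is shrunk to 4 so the example stays readable.

```python
def add_shared(out, indices, values, block_size=4):
    """Emulate out[indices[i]] += values[i] with per-block pre-summation."""
    result = list(out)
    for start in range(0, len(indices), block_size):
        block_indices = indices[start:start + block_size]
        block_values = values[start:start + block_size]
        # Combine all contributions to the same index inside the block.
        partial = {}
        for k, v in zip(block_indices, block_values):
            partial[k] = partial.get(k, 0.0) + v
        # Write each distinct index once per block, in sorted order.
        for k in sorted(partial):
            result[k] += partial[k]
    return result


indices = [2, 0, 2, 2, 1, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(add_shared([0.0, 0.0, 0.0], indices, values))  # [9.0, 11.0, 16.0]
```

The result matches the simple loop given as the equivalent form; only the number of writes to the output differs.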
- class WriteBlockSum(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _UsingSharedMem, _WriteAccessor
Accessor for summation over the block.
Each block writes to one element.
To do a full reduction, the user needs to calculate the final sum from the n // 1024 elements.
This class does not use atomic add when writing to the output. You need to generate the shared indices using the lib.shared_indices function.
Equivalent to:
    for i, k in enumerate(indices):
        out[k // 1024] += values[i]
- Parameters:
- KERNEL_SIG_TEMPLATE: ClassVar[dict[str, str]] = {'{name}': '{storage_t}* const', '{name}_num_alloc': 'unsigned int'}
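The two-stage reduction implied by the WriteBlockSum description (one partial sum per block, then a final sum computed by the user) can be sketched in plain Python. The function name `write_block_sum` is hypothetical, the thread index stands in for `k`, and rounding the number of blocks up to cover a partially filled last block is an assumption of this sketch.

```python
def write_block_sum(values, block_size=1024):
    """Emulate one partial sum per block of `block_size` threads."""
    # Assumption: round up so a partially filled last block gets its own slot.
    num_blocks = (len(values) + block_size - 1) // block_size
    out = [0.0] * num_blocks
    for thread_idx, v in enumerate(values):
        out[thread_idx // block_size] += v
    return out


values = [1.0] * 2500
partial = write_block_sum(values)
print(partial)       # [1024.0, 1024.0, 452.0]
print(sum(partial))  # 2500.0, the final reduction left to the user
```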
- class AddBlockSum(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _UsingSharedMem, _AddAccessor
Accessor for summation over the block.
Each block adds to one element.
To do a full reduction, the user needs to calculate the final sum from the n // 1024 elements.
This class does not use atomic add when writing to the output. You need to generate the shared indices using the lib.shared_indices function.
Equivalent to:
    for i, k in enumerate(indices):
        out[k // 1024] += values[i]
- Parameters:
- KERNEL_SIG_TEMPLATE: ClassVar[dict[str, str]] = {'{name}': '{storage_t}*', '{name}_num_alloc': 'unsigned int'}
- class AddSum(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _UsingSharedMem, _AddAccessor
Accessor for adding global sum.
- Parameters:
- EXTRA_DATA = -992
- class WriteSum(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: AddSum
Accessor for writing global sum.
- Parameters:
- class ReadPair(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _Pairwise, _ReadAccessor
Accessor to read the element corresponding to the current thread and the next thread.
- Parameters:
- KERNEL_SIG_TEMPLATE: ClassVar[dict[str, str]] = {'{name}': '{storage_t}* const', '{name}_num_alloc': 'unsigned int'}
- class ReadPairStridedWithDefault(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _Pairwise, _ReadAccessor
Accessor to read the element corresponding to the current thread and the next thread.
- Parameters:
- KERNEL_SIG_TEMPLATE: ClassVar[dict[str, str]] = {'{name}': '{storage_t}* const', '{name}_num_alloc': 'unsigned int', '{name}_offset': 'int', '{name}_stride': 'int'}
- PY_SIG_TEMPLATE: ClassVar[dict[str, str]] = {'{name}': 'pybind11::object', '{name}_offset': 'int', '{name}_stride': 'int'}
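The signature templates map placeholder argument names to placeholder C types. How symforce expands them is not shown on this page; the sketch below only illustrates one plausible reading using Python's built-in `str.format`. The helper `render_signature` and the values `"pose"` and `"float"` are hypothetical.

```python
KERNEL_SIG_TEMPLATE = {
    "{name}": "{storage_t}* const",
    "{name}_num_alloc": "unsigned int",
    "{name}_offset": "int",
    "{name}_stride": "int",
}


def render_signature(template, name, storage_t):
    """Fill the {name} and {storage_t} placeholders and join as 'type arg' pairs."""
    return ", ".join(
        f"{ctype.format(storage_t=storage_t)} {arg.format(name=name)}"
        for arg, ctype in template.items()
    )


print(render_signature(KERNEL_SIG_TEMPLATE, "pose", "float"))
# float* const pose, unsigned int pose_num_alloc, int pose_offset, int pose_stride
```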
- class AddPair(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _Pairwise, _AddAccessor
Accessor for adding the pair to the element corresponding to the current thread and the next thread.
- Parameters:
- KERNEL_SIG_TEMPLATE: ClassVar[dict[str, str]] = {'{name}': '{storage_t}* const', '{name}_num_alloc': 'unsigned int'}
- PY_ARGS_TEMPLATE: ClassVar[list[str]] = ['As{storage_t_capitalized}Ptr({name})', 'GetNumCols({name})']
- EXTRA_DATA = 1
- class WritePair(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: _Pairwise, _ReadAccessor
Accessor for writing the pair to the element corresponding to the current thread and the next thread. The second element is added to the first element of the next thread.
- Parameters:
- KERNEL_SIG_TEMPLATE: ClassVar[dict[str, str]] = {'{name}': '{storage_t}* const', '{name}_num_alloc': 'unsigned int'}
- PY_ARGS_TEMPLATE: ClassVar[list[str]] = ['As{storage_t_capitalized}Ptr({name})', 'GetNumCols({name})']
- EXTRA_DATA = 1
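- class TunableShared(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: ReadShared, _TunableAccessor
Used in factors to define a tunable parameter that is shared between factors.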
- class Tunable(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: TunableShared
- class TunableUnique(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: ReadUnique, _TunableAccessor
Used in factors to define a tunable parameter that is shared between all factors.
- class TunablePair(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: ReadPair, _TunableAccessor
Used in factors to define a tunable pair. That is, factor[0] depends on arg[0] and arg[1], factor[1] depends on arg[1] and arg[2], etc.
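The pairwise dependency pattern can be sketched in plain Python. This only illustrates which arguments each factor sees; `pairwise_factors` and the difference function are hypothetical and not part of symforce.

```python
def pairwise_factors(args, factor):
    """Evaluate factor[i] on (args[i], args[i + 1]) for every consecutive pair."""
    return [factor(args[i], args[i + 1]) for i in range(len(args) - 1)]


# Four arguments give three factors, each spanning two neighbours.
print(pairwise_factors([1.0, 4.0, 9.0, 16.0], lambda a, b: b - a))  # [3.0, 5.0, 7.0]
```

Note that n factors need n + 1 arguments, since the last factor also reads the element after its own.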
- class ConstantSequential(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: ReadSequential, _ConstAccessor
Used in factors to define constants that are unique to each factor.
- class Constant(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: ConstantSequential
Bases: ReadShared, _ConstAccessor
Used in factors to define constants that are shared between factors.
- class ConstantIndexed(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: ReadIndexed, _ConstAccessor
Used in factors to define constants that are indexed.
- class ConstantUnique(name, storage, *, dtype, kernel_dtype, reuse_indices_from=None, offset=None, stride=None, default=None, after=None, block_size=1024)
Bases: ReadUnique, _ConstAccessor
Used in factors to define a constant that is shared between all factors.