kd.data.InMemoryPipeline

class kauldron.data.InMemoryPipeline(*, _fake_refs: type[_FakeRefsUnset] | dict[str, _FakeRootCfg] = <class 'kauldron.utils.config_util._FakeRefsUnset'>, batch_size: int | None = None, seed: int | collections.abc.Sequence[int] | numpy.ndarray | jaxtyping.UInt32[Array, '2'] | jaxtyping.UInt32[ndarray, '2'] | jax.Array | None = _FakeRootCfg('cfg.seed'), loader: Callable[[], _ArrayTree], shuffle: bool = False, num_epochs: Optional[int] = None, drop_remainder: bool = True)

Bases: kauldron.data.pipelines.Pipeline

Pipeline for datasets that fit in memory.

loader

Callable that returns all examples as a single Tree[Array['num_examples ...']] of np.array.

Type:

collections.abc.Callable[[], Any]

shuffle

Whether to shuffle the dataset.

Type:

bool

num_epochs

Number of epochs (None for infinite iteration).

Type:

int | None

drop_remainder

Whether to drop the remainder (drop_remainder=False is currently not supported).

Type:

bool
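
How shuffle, num_epochs and drop_remainder interact can be illustrated with a small standalone sketch. It is not Kauldron's implementation: the helper name `iter_index_batches` and its arguments are hypothetical, and it uses only the Python standard library. It assumes that each epoch visits every example at most once and that the trailing partial batch is discarded, which matches drop_remainder=True.

```python
from __future__ import annotations

import itertools
import random
from collections.abc import Iterator


def iter_index_batches(
    num_examples: int,
    batch_size: int,
    *,
    shuffle: bool = False,
    num_epochs: int | None = None,
    seed: int = 0,
) -> Iterator[list[int]]:
    """Yields one list of example indices per batch (hypothetical helper)."""
    if not 0 < batch_size <= num_examples:
        raise ValueError("batch_size must be in [1, num_examples]")
    rng = random.Random(seed)
    # None means iterate forever; otherwise stop after num_epochs passes.
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    # Only full batches are produced; the remainder is dropped.
    batches_per_epoch = num_examples // batch_size
    for _ in epochs:
        indices = list(range(num_examples))
        if shuffle:
            rng.shuffle(indices)  # New order each epoch, reproducible via seed.
        for b in range(batches_per_epoch):
            yield indices[b * batch_size:(b + 1) * batch_size]


batches = list(iter_index_batches(10, 4, num_epochs=2))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [0, 1, 2, 3], [4, 5, 6, 7]]
```

With 10 examples and a batch size of 4, each epoch yields two batches and indices 8 and 9 are never emitted when shuffle is off. With num_epochs=None the generator never terminates, so a consumer would bound it itself, for example with `itertools.islice`.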

loader: Callable[[], _ArrayTree]
shuffle: bool = False
num_epochs: int | None = None
drop_remainder: bool = True
__iter__() → collections.abc.Iterator[Any]

Iterator.

property examples: Any

Cached in-memory data.

property num_examples: int
property sampler: kauldron.data.in_memory.BatchedIndexSampler
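
To make the roles of examples and num_examples more concrete, the following standalone sketch mimics them with plain Python lists instead of arrays. The class `TinyInMemoryData` and its method `take` are hypothetical and not part of Kauldron. The sketch only assumes what is stated above: the loader returns every example in one tree whose leaves share a leading num_examples dimension, and the loaded data is cached.

```python
from __future__ import annotations

import functools
from typing import Any, Callable


class TinyInMemoryData:
    """Hypothetical stand-in showing cached examples and index-based batching."""

    def __init__(self, loader: Callable[[], dict[str, list[Any]]]):
        self._loader = loader

    @functools.cached_property
    def examples(self) -> dict[str, list[Any]]:
        # The loader runs once; later accesses reuse the cached tree.
        return self._loader()

    @property
    def num_examples(self) -> int:
        # All leaves must agree on the leading (num_examples) dimension.
        sizes = {len(leaf) for leaf in self.examples.values()}
        if len(sizes) != 1:
            raise ValueError(f"Inconsistent leading dimensions: {sorted(sizes)}")
        return sizes.pop()

    def take(self, indices: list[int]) -> dict[str, list[Any]]:
        # Gather the same indices from every leaf to form one batch.
        return {k: [leaf[i] for i in indices] for k, leaf in self.examples.items()}


calls = []


def loader() -> dict[str, list[Any]]:
    calls.append(1)  # Track how often the loader is invoked.
    return {"image": [[0.0], [0.1], [0.2], [0.3]], "label": [0, 1, 0, 1]}


data = TinyInMemoryData(loader)
print(data.num_examples)  # 4
print(data.take([2, 0]))  # {'image': [[0.2], [0.0]], 'label': [0, 0]}
print(len(calls))         # 1
```

A list of indices, such as one produced by a batched index sampler, selects the same rows from every leaf of the tree, so the batch keeps the structure of the loaded data.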