kd.data.InMemoryPipeline

class kauldron.data.InMemoryPipeline(*, _fake_refs: type[_FakeRefsUnset] | dict[str, _FakeRootCfg] = <class 'kauldron.utils.config_util._FakeRefsUnset'>, batch_size: int | None = None, seed: int | collections.abc.Sequence[int] | numpy.ndarray | jaxtyping.UInt32[Array, '2'] | jaxtyping.UInt32[ndarray, '2'] | jax.Array | None = _FakeRootCfg('cfg.seed'), loader: Callable[[], _ArrayTree], shuffle: bool = False, num_epochs: Optional[int] = None, drop_remainder: bool = True)

Bases: kauldron.data.pipelines.Pipeline

Pipeline for datasets that fit entirely in memory.

loader

Callable that returns all examples as a single Tree[Array['num_examples ...']] of np.array.

Type: collections.abc.Callable[[], Any]
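As an illustration of the return value described above, a loader could look like the following sketch. The function name `load_toy_dataset` and the keys `image` and `label` are hypothetical, chosen only for this example; the one property taken from the description is that every leaf array shares the same leading `num_examples` dimension.

```python
import numpy as np


def load_toy_dataset() -> dict[str, np.ndarray]:
  """Hypothetical loader: returns every example in one tree of arrays."""
  num_examples = 10
  rng = np.random.default_rng(0)
  return {
      # Each leaf has shape (num_examples, ...).
      'image': rng.random((num_examples, 8, 8, 3), dtype=np.float32),
      'label': rng.integers(0, 5, size=(num_examples,)),
  }


examples = load_toy_dataset()
# All leaves must agree on the leading (example) dimension.
leading_dims = {name: array.shape[0] for name, array in examples.items()}
```

A callable of this kind takes no arguments, so it could presumably be passed as the `loader` argument directly.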

shuffle

Whether to shuffle the dataset

Type: bool

num_epochs

Number of epochs (None for infinite iteration)

Type: int | None

drop_remainder

Whether to drop the remainder (drop_remainder=False is currently not supported)

Type: bool
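To make the interaction of shuffle, num_epochs and drop_remainder concrete, here is a minimal NumPy sketch of the batching semantics. It is not Kauldron's implementation: the helper `iter_batches` is hypothetical, it handles a flat dict rather than an arbitrary tree, and reshuffling once per epoch is an assumption.

```python
import numpy as np


def iter_batches(examples, batch_size, *, shuffle=False, num_epochs=None, seed=0):
  """Yields dict batches from a dict of arrays; the remainder is dropped."""
  num_examples = len(next(iter(examples.values())))
  if batch_size > num_examples:
    raise ValueError('batch_size is larger than the number of examples.')
  # Mirrors drop_remainder=True: the last partial batch is never emitted.
  num_batches = num_examples // batch_size
  rng = np.random.default_rng(seed)
  epoch = 0
  # num_epochs=None means the loop never terminates on its own.
  while num_epochs is None or epoch < num_epochs:
    ids = rng.permutation(num_examples) if shuffle else np.arange(num_examples)
    for i in range(num_batches):
      batch_ids = ids[i * batch_size:(i + 1) * batch_size]
      yield {name: array[batch_ids] for name, array in examples.items()}
    epoch += 1


data = {'x': np.arange(10), 'y': np.arange(10) * 2}
batches = list(iter_batches(data, batch_size=4, num_epochs=2))
```

With 10 examples and a batch size of 4, each epoch produces two full batches and the last two examples are dropped, so two epochs give four batches in total.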

loader: Callable[[], _ArrayTree]

shuffle: bool = False

num_epochs: int | None = None

drop_remainder: bool = True

__iter__() → collections.abc.Iterator[Any]: Iterator.

property examples: Any: Cached in-memory data.

property num_examples: int

property sampler: kauldron.data.in_memory.BatchedIndexSampler