kd.evals.RunStrategy

class kauldron.evals.RunStrategy

Bases: object

Base class for info on how to run the evaluation.

RunStrategy subclasses are divided into:

- Strategies which run along train (in the same XM job):
  - EveryNSteps: Run evaluation every X steps.
  - Once: Run a single evaluation after X steps.
- Strategies which run in a separate XManager job:
  - StandaloneEveryCheckpoint: Run evaluation every time a new checkpoint is found. Note that if eval is too slow, intermediate checkpoints will be skipped.
  - StandaloneLastCheckpoint: Only run evaluation once, after train has completed.

Evaluators run in a standalone job can be grouped together through the job_group='group_name' attribute. This saves resources by sharing the same job across multiple evaluators.

Example:

shared_run = kd.evals.StandaloneEveryCheckpoint(
    job_group='separate',
    # Standalone evaluators support all `kxm.Job` parameters.
    platform='a100=1',
)

cfg.evals = {
    'eval0': kd.evals.Evaluator(run=kd.evals.EveryNSteps(100)),
    'eval1': kd.evals.Evaluator(run=shared_run),
    'eval2': kd.evals.Evaluator(run=shared_run),
}

The XManager experiment will contain 2 tasks: train (running eval0) and separate (running eval1 and eval2).
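
The task layout described above can be illustrated with a small standalone sketch. It does not import Kauldron: `FakeStrategy` and `group_into_tasks` are hypothetical names, and the rule that in-train strategies land in a task called `train` is inferred from the example above, not taken from the launcher's source.

```python
from __future__ import annotations

import collections
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class FakeStrategy:
    """Hypothetical stand-in for a run strategy (not the Kauldron class)."""

    # None means "run along train"; a string names the standalone job group.
    job_group: Optional[str] = None


def group_into_tasks(evals: dict[str, FakeStrategy]) -> dict[str, list[str]]:
    """Maps each task name to the evaluator names it would run."""
    tasks = collections.defaultdict(list)
    for name, strategy in evals.items():
        # Assumption: evaluators without a job group share the train task.
        task = 'train' if strategy.job_group is None else strategy.job_group
        tasks[task].append(name)
    return dict(tasks)


shared_run = FakeStrategy(job_group='separate')
evals = {
    'eval0': FakeStrategy(),
    'eval1': shared_run,
    'eval2': shared_run,
}
tasks = group_into_tasks(evals)
print(tasks)  # {'train': ['eval0'], 'separate': ['eval1', 'eval2']}
```

Because eval1 and eval2 carry the same job_group value, they end up in one task, which is the resource saving the grouping is meant to provide.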

These objects are never resolved by the Kauldron Trainer config (to avoid an XManager dependency).

Currently, the Kauldron XM launcher hardcodes assumptions about the classes here. Contact us if you need more flexibility or custom run behavior.

should_eval_in_train(step: int) → bool: Whether the evaluator should be run for the given train step.
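
As a rough illustration of what this method could look like in subclasses, here is a minimal sketch. It assumes a plain modulo rule for periodic evaluation, an equality check for a single evaluation, and that standalone strategies never evaluate inside the train loop. All `Sketch*` classes are hypothetical and are not the Kauldron implementations, which may handle step 0 or the final step differently.

```python
import dataclasses


class SketchRunStrategy:
    """Hypothetical base class mirroring the documented method signature."""

    def should_eval_in_train(self, step: int) -> bool:
        raise NotImplementedError


@dataclasses.dataclass(frozen=True)
class SketchEveryNSteps(SketchRunStrategy):
    n: int

    def should_eval_in_train(self, step: int) -> bool:
        # Assumption: evaluate whenever the step is a multiple of n.
        return step % self.n == 0


@dataclasses.dataclass(frozen=True)
class SketchOnce(SketchRunStrategy):
    step: int

    def should_eval_in_train(self, step: int) -> bool:
        # Assumption: evaluate only at the configured step.
        return step == self.step


@dataclasses.dataclass(frozen=True)
class SketchStandalone(SketchRunStrategy):
    job_group: str

    def should_eval_in_train(self, step: int) -> bool:
        # Assumption: standalone strategies run in their own job,
        # so they never trigger from the train loop.
        return False


every = SketchEveryNSteps(100)
eval_steps = [s for s in range(0, 301, 50) if every.should_eval_in_train(s)]
print(eval_steps)  # [0, 100, 200, 300]
```

A train loop would call `should_eval_in_train(step)` once per step for each evaluator and run only those that return True.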