Checkpoint#

Restoring a Checkpoint#

To load weights from a particular checkpoint, use the checkpointer attribute of the trainer:

init_state = trainer.init_state()
state = trainer.checkpointer.restore(init_state, step=-1)

To restore the trainer from a previous experiment, resolve the config saved in its workdir directory:

config_path = pathlib.Path(workdir) / 'config.json'
trainer = kd.konfig.resolve(json.loads(config_path.read_text()))
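
The snippet above assumes that config.json exists in the workdir. As a minimal sketch, the file can be checked and parsed with the standard library before it is handed to kd.konfig.resolve. The helper name read_config_dict and the placeholder config contents are hypothetical, used only for illustration, and are not part of Kauldron:

```python
import json
import pathlib
import tempfile


def read_config_dict(workdir):
    """Returns the parsed config.json of `workdir` (hypothetical helper)."""
    config_path = pathlib.Path(workdir) / 'config.json'
    if not config_path.is_file():
        # Fail early with a clear message instead of a bare read error.
        raise FileNotFoundError(f'No config.json found in {workdir!r}')
    return json.loads(config_path.read_text())


# Demo on a temporary directory with a placeholder config (not a real
# Kauldron config).
with tempfile.TemporaryDirectory() as tmp:
    (pathlib.Path(tmp) / 'config.json').write_text(json.dumps({'seed': 0}))
    config_dict = read_config_dict(tmp)
```

The resulting dict is what the snippet above passes to kd.konfig.resolve.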

Partial loading#

To load weights from another checkpoint (e.g. to restore a pretrained encoder), use the init_transform argument of kd.train.Trainer:

cfg.init_transform = kd.ckpts.PartialKauldronLoader(
    workdir='/path/to/old/workdir/',
    new_to_old={  # Mapping params
        # The new_to_old dict determines which weights are loaded from the
        # target checkpoint, and can also be used to rename subtrees when
        # loading params from a different pretrained model.
        # '<new_path>':            '<source_path>'
        'params.decoder.layers_0': 'params.encoder',
    },
)

trainer = konfig.resolve(cfg)

# When initializing the weights, the `init_transform` is applied
init_state = trainer.init_state()

# `init_state.params['decoder']['layers_0']` now contains the previous encoder
# weights

See kd.ckpts.PartialLoader for details.
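
To make the new_to_old semantics concrete, here is a minimal pure-Python sketch of the remapping on nested dicts. It is not Kauldron's implementation: the helpers get_path, set_path and apply_new_to_old are hypothetical, and the states are toy dicts rather than real train states.

```python
import copy
from typing import Any


def get_path(tree: dict[str, Any], path: str) -> Any:
    """Follows a dot-separated path such as 'params.encoder' into nested dicts."""
    node = tree
    for key in path.split('.'):
        node = node[key]
    return node


def set_path(tree: dict[str, Any], path: str, value: Any) -> None:
    """Writes `value` at a dot-separated path, creating intermediate dicts."""
    *parents, leaf = path.split('.')
    node = tree
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def apply_new_to_old(new_state, old_state, new_to_old):
    """Overwrites each mapped subtree of `new_state` with a copy from `old_state`."""
    for new_path, old_path in new_to_old.items():
        set_path(new_state, new_path, copy.deepcopy(get_path(old_state, old_path)))
    return new_state


# Toy stand-ins for the restored (old) and freshly initialized (new) states.
old_state = {'params': {'encoder': {'w': [1.0, 2.0]}}}
new_state = {
    'params': {
        'decoder': {
            'layers_0': {'w': [0.0, 0.0]},
            'layers_1': {'w': [0.0]},
        },
    },
}

apply_new_to_old(new_state, old_state, {'params.decoder.layers_0': 'params.encoder'})
```

In this sketch only the subtree named in the mapping is overwritten; layers_1 keeps its initial values.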

Relaunching an experiment#

To relaunch an experiment, you can:

Continue training in a new, separate workdir: relaunch a new job with init_transform to initialize the new model to the previous state:

cfg.init_transform = kd.ckpts.PartialKauldronLoader(
    workdir='/path/to/old/workdir/',
    new_to_old={  # Mapping params
        'step': 'step',
        'params': 'params',
        'collections': 'collections',
        'opt_state': 'opt_state',
    },
)
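
The mapping above is an identity mapping over the top-level state fields, so it can also be built programmatically. The sketch below uses a hypothetical helper, identity_mapping. The weights-only variant assumes that fields left out of new_to_old keep their freshly initialized values, as the comment in the partial-loading example suggests; this is worth verifying against kd.ckpts before relying on it.

```python
# Top-level fields of the train state, as listed in the mapping above.
STATE_FIELDS = ('step', 'params', 'collections', 'opt_state')


def identity_mapping(fields=STATE_FIELDS):
    """Maps each selected state field to the same field in the old checkpoint."""
    return {field: field for field in fields}


# Full restore: same dict as the hand-written `new_to_old` above.
full_restore = identity_mapping()

# Assumed variant: reload weights only, leaving step and optimizer state
# at their initial values.
weights_only = identity_mapping(('params', 'collections'))
```

Either dict can then be passed as the new_to_old argument.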
