Checkpoint
Restoring a Checkpoint
To load weights from a particular checkpoint, use the checkpointer attribute of the trainer:
init_state = trainer.init_state()
state = trainer.checkpointer.restore(init_state, step=-1)
To restore the trainer from a previous experiment:
From a workdir directory:
config_path = pathlib.Path(workdir) / 'config.json'
trainer = kd.konfig.resolve(json.loads(config_path.read_text()))
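If the workdir may be missing its config file, it can help to fail early with a clear message before resolving anything. The following is a minimal sketch using only the standard library; `load_config_dict` is a hypothetical helper introduced here for illustration and is not part of Kauldron.

```python
import json
import pathlib


def load_config_dict(workdir):
    """Read `<workdir>/config.json` and return the parsed JSON object.

    Hypothetical helper for illustration (not part of Kauldron).
    """
    config_path = pathlib.Path(workdir) / 'config.json'
    if not config_path.is_file():
        # Fail early with an explicit message instead of a bare read error.
        raise FileNotFoundError(f'No config.json found in {str(workdir)!r}')
    return json.loads(config_path.read_text())
```

The returned dict would then be passed to kd.konfig.resolve exactly as in the snippet above, e.g. `trainer = kd.konfig.resolve(load_config_dict(workdir))`.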
Partial loading
To load weights from another checkpoint (e.g. to restore a pretrained encoder),
use the init_transform argument of kd.train.Trainer:
cfg.init_transform = kd.ckpts.PartialKauldronLoader(
    workdir='/path/to/old/workdir/',
    new_to_old={  # Mapping params
        # The new_to_old dict determines which weights are loaded from the
        # target checkpoint, and can also be used to rename subtrees when
        # loading params from a different pretrained model.
        # '<new_path>': '<source_path>'
        'params.decoder.layers_0': 'params.encoder',
    },
)
trainer = konfig.resolve(cfg)
# When initializing the weights, the `init_transform` is applied
init_state = trainer.init_state()
# `init_state.params['decoder']['layers_0']` now contains the previous encoder
# weights
See kd.ckpts.PartialKauldronLoader for details.
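To make the effect of the new_to_old mapping concrete, here is a pure-Python sketch of the remapping idea on plain nested dicts. It is not Kauldron's implementation: `get_path`, `set_path`, and `apply_new_to_old` are hypothetical helpers, and treating each dotted path component as a dict key is an assumption made for illustration.

```python
import copy


def get_path(tree, path):
    """Return the subtree of a nested dict at a dotted path like 'params.encoder'."""
    for key in path.split('.'):
        tree = tree[key]
    return tree


def set_path(tree, path, value):
    """Set the subtree at a dotted path, creating intermediate dicts as needed."""
    *parents, last = path.split('.')
    for key in parents:
        tree = tree.setdefault(key, {})
    tree[last] = value


def apply_new_to_old(new_state, old_state, new_to_old):
    """Return a copy of `new_state` with the mapped subtrees taken from `old_state`."""
    result = copy.deepcopy(new_state)
    for new_path, old_path in new_to_old.items():
        set_path(result, new_path, copy.deepcopy(get_path(old_state, old_path)))
    return result


# Toy states standing in for the old (pretrained) and new (freshly initialized) models.
old_state = {'params': {'encoder': {'kernel': [1.0, 2.0]}}}
new_state = {
    'params': {
        'decoder': {
            'layers_0': {'kernel': [0.0, 0.0]},
            'layers_1': {'kernel': [0.0]},
        },
    },
}
restored = apply_new_to_old(
    new_state, old_state, {'params.decoder.layers_0': 'params.encoder'}
)
```

In this sketch only the subtrees named in the mapping are overwritten; everything else in the new state (here `layers_1`) keeps its freshly initialized values.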
Relaunching an experiment
To relaunch an experiment, you can:
Continue training in a new, separate workdir: relaunch a new job with
init_transform to initialize your new model to the previous state:
cfg.init_transform = kd.ckpts.PartialKauldronLoader(
    workdir='/path/to/old/workdir/',
    new_to_old={  # Mapping params
        'step': 'step',
        'params': 'params',
        'collections': 'collections',
        'opt_state': 'opt_state',
    },
)