Konfig principles#
Config can be a hot topic. Why create a new config system when so many already exist?
TLDR: The goal of konfig is to remove all indirection between your code and config. The user just writes regular Python code (following some design principles explained in this doc), and all their code is automatically configurable without any changes (no dependency injection or obscure magic).
Why? (Design goals)#
Motivating example#
Let’s take a simple example: configuring an optimizer.
optimizer = optax.adam(learning_rate=1e-5)
Often in config systems, to make the above configurable, users will duplicate the params to configure in some config class, like:
@dataclasses.dataclass
class AdamOptimizerConfig:
    learning_rate: float = 1e-5

    def make(self):
        return optax.adam(learning_rate=self.learning_rate)
This config object is then propagated down the program tree until the place it is actually used.
We think this indirection is unnecessary boilerplate. Very soon,
users will want to switch adam for a new optimizer (e.g. adafactor), or
customize adam's b2 argument. Each time, users will have to update the config
and duplicate additional parameters. And with rising complexity it becomes
increasingly unclear from reading the config which optimizer will actually get
created. Config classes end up with many unused parameters which were only
added once for a single experiment.
There must be a better way!
With konfig, there is no need to create any custom config object. The config is
written as pure Python calls. Need to customize the b2 params? Just write it
in your config.
Rather than:
cfg = AdamOptimizerConfig(learning_rate=1e-5, b2=0.99)
Simply write:
cfg = optax.adam(learning_rate=1e-5, b2=0.99)
Need to switch adam for another optimizer? Just write an arbitrarily complex one in your config:
cfg = optax.chain(
    optax.scale_by_adadelta(),
    optax.add_decayed_weights(weight_decay=0.0),
    optax.scale_by_learning_rate(learning_rate=0.003),
)
All python classes / functions / constants are supported without any changes to
your codebase! (optax doesn’t know anything about konfig)
How does it work?#
Conceptually, the idea is very simple. A config is just a nested tree of
dicts. There’s a one-to-one mapping between a nested call of Python
functions and its corresponding dict representation.
For example, the last example can be represented as:
cfg = {
    '__qualname__': 'optax:chain',
    0: {'__qualname__': 'optax:scale_by_adadelta'},
    1: {'__qualname__': 'optax:add_decayed_weights', 'weight_decay': 0.0},
    2: {'__qualname__': 'optax:scale_by_learning_rate', 'learning_rate': 0.003},
}
Resolving the config dict back to the original Python object is trivial: import
each __qualname__ symbol and call it with the args from the
dict. This is done by konfig.resolve(cfg).
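The resolution step described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not konfig's actual implementation: the `resolve` function below is hypothetical, assumes the `'module:name'` format shown in the dict above, and uses standard-library symbols so that it runs without optax.

```python
import fractions
import importlib


def resolve(cfg):
    """Recursively turns a `{'__qualname__': 'module:name', ...}` dict into an object."""
    if not isinstance(cfg, dict) or '__qualname__' not in cfg:
        return cfg  # Plain value (int, str, ...): returned unchanged.
    module_name, _, attr_path = cfg['__qualname__'].partition(':')
    obj = importlib.import_module(module_name)
    for part in attr_path.split('.'):
        obj = getattr(obj, part)
    # Integer keys are positional arguments, string keys are keyword arguments.
    items = {k: resolve(v) for k, v in cfg.items() if k != '__qualname__'}
    args = [items[k] for k in sorted(k for k in items if isinstance(k, int))]
    kwargs = {k: v for k, v in items.items() if isinstance(k, str)}
    return obj(*args, **kwargs)


cfg = {
    '__qualname__': 'fractions:Fraction',
    0: 3,
    1: {'__qualname__': 'builtins:int', 0: '4'},
}
value = resolve(cfg)
expected = fractions.Fraction(3, int('4'))  # The equivalent direct Python call.
```

Here `value` and `expected` compare equal, which mirrors the one-to-one mapping between the nested call and its dict form.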
The direct Python call and konfig.resolve(cfg) are equivalent and return the exact same output.
When building the config, the only magic required is to capture the module (with
konfig.imports()), so that calling a function creates a dict-like object
rather than actually executing the function.
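To make the capture step concrete, here is a minimal sketch of a stand-in module whose functions record their call instead of executing it. `FakeModule` is a hypothetical helper written for this illustration; it is not how konfig.imports() is implemented, and the real optax is never imported.

```python
class FakeModule:
    """Stand-in for a module: attribute access returns a call recorder."""

    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        qualname = f'{self._name}:{attr}'

        def record_call(*args, **kwargs):
            # Positional args are stored under integer keys, kwargs under their name.
            return {'__qualname__': qualname, **dict(enumerate(args)), **kwargs}

        return record_call


optax = FakeModule('optax')  # Hypothetical stand-in, not the real library.
cfg = optax.chain(
    optax.scale_by_adadelta(),
    optax.add_decayed_weights(weight_decay=0.0),
)
```

The resulting `cfg` is a nested dict with the same shape as the representation shown earlier, built without running any optimizer code.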
| Without config | With config |
|---|---|
| Output is a real Python object | Output is a dict-like object |
| Python code should never be mutated | Config is mutable |
| N/A | Config is serializable |
Note: Because konfig objects are simply dict-like objects, you cannot have control flow among them (i.e. your config can define f(g()), but NOT f() + g(), as it’s like executing {'__qualname__': 'f'} + {'__qualname__': 'g'}).
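The properties listed above (mutable, serializable, no arithmetic between configs) can be checked on a plain dict. This sketch uses a bare `dict` as a stand-in for konfig's dict-like object, which is an assumption made for illustration only.

```python
import json

cfg = {'__qualname__': 'optax:adam', 'learning_rate': 1e-5}

# Mutable: fields can be overwritten before the config is resolved.
cfg['learning_rate'] = 3e-4
cfg['b2'] = 0.99

# Serializable: the tree only contains plain values, so it round-trips via JSON.
restored = json.loads(json.dumps(cfg))

# No arithmetic between configs: `f() + g()` would amount to adding two dicts.
try:
    {'__qualname__': 'f'} + {'__qualname__': 'g'}
    add_failed = False
except TypeError:
    add_failed = True
```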
Benefits#
Removing the indirection between code and config makes the config code look like regular Python code. This comes with additional advantages, like:
Full IDE support out of the box, including:
Auto-complete
Hover to show function/class docstrings
Ctrl+click on a symbol to jump directly to the symbol definition and usage (in the above case, you directly jump to the adam implementation without any indirection)
Type checking in your config files
Code is copy-pastable outside of config, which is very helpful for quick inspection and debugging. For example, just copy-paste the dataset config definition in Colab to create the matching Python object and inspect it.
How ? (Best-practices)#
Rule 0: Keep config / Python code separated#
Because konfig and standard Python code look very similar, it’s very
important to keep a clear boundary between the two.
Execution#
Before konfig.resolve: all the objects inside the cfg tree are only ConfigDict-like objects. The library will raise an error if a Python object is added inside the config.
After konfig.resolve: all the objects are only real Python objects. There are no more ConfigDict objects.
Folder structure#
All config code should go inside some configs/ folder.
All the actual implementation (models, dataset implementation, additional logic,…) should go outside this configs/ folder.
Colab#
On Colab, it’s very easy to mix regular imports with konfig.imports(), which
is a direct violation of Rule 0. To avoid this issue:
TLDR: Never use konfig.imports() on Colab.
On Colab: Instead, you can locally mock the modules with konfig.mock_modules().
import optax  # NO `konfig.imports()`

# Imports are mocked only locally
with konfig.mock_modules():
    cfg = optax.adam(learning_rate=1e-5)  # This is a ConfigDict

optimizer = optax.adam(learning_rate=1e-5)  # This is the real Python object
Outside Colab: Use konfig.imports() for all imports in your config file. This ensures clear boundaries between configurable modules and the config implementation.
with konfig.imports():
    import optax

cfg = optax.adam(learning_rate=1e-5)
Naming#
When manipulating the unresolved config in your code (before konfig.resolve),
make it clear in the name that this is a config-like object (cfg, config,
optimizer_config,…):
# Before: object is named `cfg`
optimizer = konfig.resolve(cfg)
# After: object is named `optimizer`
Rule 1: Keep complexity in Python#
TLDR: Config files should be kept minimal (ideally a single small file of less than ~300 lines)!
As your code supports more complex use cases, it can be tempting to add more and more options in the config. This is a bad idea! The config files will start to grow uncontrollably.
Instead, move the complexity to Python code, by wrapping the code in higher level abstractions.
Note that this design rule is independent of using konfig or not. As your program complexity grows, it’s your responsibility to wrap complexity in higher level abstractions. Remember that your Python code can directly be used without konfig. The config file is just the reflection of the abstractions from your codebase.
Example 1#
Let’s take our original example:
cfg = optax.adam(learning_rate=1e-5)
As your use-cases get more and more complex, you start using more and more complex optimizers.
cfg = optax.chain(
    optax.scale_by_adadelta(),
    optax.add_decayed_weights(weight_decay=0.0),
    optax.scale_by_learning_rate(learning_rate=0.003),
)
Over time, the optimizer definition in your config starts to become too big. At
this point, it is tempting to add a def _make_optimizer() function in your
config file, to simplify creating the optimizer. Resist this urge! Factoring
out the complexity in a higher level wrapper is good, but this abstraction
should live on the Python side, not in the config.
The config can then simply call your higher level abstraction:
cfg = my_project.my_complex_optimizer(learning_rate=0.003)
Example 2#
One common pattern where this applies is in the data pipeline. As more complex transforms get added, the data pipeline starts to grow:
cfg = kd.data.tf.Tfds(
    name="ai2dcaption",
    split="train",
    shuffle=True,
    transforms=[
        # Low-level transforms
        gm.data.Tokenize(key="prompt", tokenizer=tokenizer, add_bos=True),
        gm.data.Tokenize(key="response", tokenizer=tokenizer, add_eos=True),
        gm.data.AddNextTokenPredictionFields(...),
        kd.data.Elements(keep=["input", "target", "loss_mask"]),
        gm.data.Pad(
            key=["input", "target", "loss_mask"],
            max_length=max_length,
            truncate=True,
        ),
        kd.data.Rearrange(key=["target", "loss_mask"], pattern="... -> ... 1"),
    ],
)
At this point, it’s a good idea to create some higher level transform in your Python code, which can be reused across configs.
cfg = kd.data.tf.Tfds(
    name="ai2dcaption",
    split="train",
    shuffle=True,
    transforms=[
        # High-level transform
        gm.data.Seq2SeqTask(in_prompt="prompt", in_response="response", ...),
    ],
)
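On the Python side, such a high-level transform can simply compose the low-level ones. The sketch below uses made-up transforms (`Rename`, `Truncate`, `PrepareInput`) that are unrelated to the real kd.data / gm.data classes; it only illustrates the pattern of moving the composition out of the config.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Rename:
    """Low-level transform: renames one key of the example dict."""
    src: str
    dst: str

    def __call__(self, example):
        out = dict(example)
        out[self.dst] = out.pop(self.src)
        return out


@dataclasses.dataclass(frozen=True)
class Truncate:
    """Low-level transform: truncates a sequence field to `max_length`."""
    key: str
    max_length: int

    def __call__(self, example):
        return {**example, self.key: example[self.key][: self.max_length]}


@dataclasses.dataclass(frozen=True)
class PrepareInput:
    """High-level transform: the only name the config needs to mention."""
    in_prompt: str
    max_length: int

    def __call__(self, example):
        steps = (
            Rename(src=self.in_prompt, dst='input'),
            Truncate(key='input', max_length=self.max_length),
        )
        for step in steps:
            example = step(example)
        return example


transform = PrepareInput(in_prompt='prompt', max_length=3)
example = transform({'prompt': [1, 2, 3, 4, 5]})
```

The config then lists a single `PrepareInput(...)` entry, and the same class can be reused across configs or called directly outside of any config.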
Can I split my config into multiple files ?#
Before doing this, remember rule 1. Why do you need to split the config? If it’s because the config file has grown too much, it likely indicates that some of the config logic should be moved outside of the config, directly into Python.
However, there’s still one legitimate use case for splitting the config file: when you have a base config which defines some common fields, and the sub-configs only customize one or two additional fields.
Even then, this can be a slippery slope. Do not start to chain multiple configs. For config files, forking is better than inheritance. When following the above principles, configs are small and self-contained.
Rule 2: Keep Python code simple#
Create a top-level abstraction for your program#
Your Python code can be used standalone, without konfig. The best way to do
this in a modular/reusable way is to define some top-level @dataclass which
exposes all the sub-parts the user might want to customize.
For example, Kauldron experiments are a kd.train.Trainer class which can be
used as standalone, like:
trainer = kd.train.Trainer(
    train_ds=ds,
    model=model,
    optimizer=optimizer,
    ...,
)
trainer.train()
Similarly, the XManager launcher is a kxm.Experiment dataclass which
can be called directly:
xp = kxm.Experiment(
    name='My experiment',
    jobs={
        'train': kxm.Job(
            target='//path/to/my:target',
            platform='df=2x2',
        ),
    },
)
xp.launch()
Those top-level abstractions are natural to wrap inside konfig, which allows JSON serialization and CLI overrides. In the previous example, wrapping the experiment inside a config would automatically allow the user to overwrite every field:
--xp.name='My new experiment'
--xp.jobs.train.target='//path/to/my:new_target'
--xp.jobs.train.platform='df=4x4'
…
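Because the config is a nested tree, a dotted flag maps directly onto a path in that tree. The following is a rough sketch of how such an override could be applied to a plain nested dict; `apply_override` and `parse_value` are hypothetical helpers, and konfig's actual flag parsing may differ.

```python
import ast


def parse_value(raw):
    """Parses a Python literal if possible, else keeps the raw string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_override(cfg, flag):
    """Applies one `--path.to.field=value` override to a nested dict, in place."""
    path, _, raw_value = flag.lstrip('-').partition('=')
    *parents, last = path.split('.')
    node = cfg
    for key in parents:
        node = node[key]  # Walk down the tree, one path component at a time.
    node[last] = parse_value(raw_value)


cfg = {
    'xp': {
        'name': 'My experiment',
        'jobs': {'train': {'target': '//path/to/my:target', 'platform': 'df=2x2'}},
    }
}
for flag in ["--xp.name=My new experiment", "--xp.jobs.train.platform='df=4x4'"]:
    apply_override(cfg, flag)
```

Both the quoted and unquoted forms are accepted here, since a shell usually strips the quotes before the program sees the flag.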
Python code should be immutable#
In Python, mutations make it hard to track and understand side effects.
Config code can (and is meant to) be mutated (in the config files, on Colab, through CLI,…).
Python code should be immutable. For safety, konfig will normalize list -> tuple, dict -> immutabledict. If you have dataclasses, it’s best to define them as frozen=True, as is the case with the above kd.train.Trainer, kxm.Experiment,…
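The normalization described above can be approximated with the standard library. In this sketch, `freeze` is a hypothetical helper and `types.MappingProxyType` is used as a stand-in for immutabledict; the `Trainer` class is a toy example unrelated to kd.train.Trainer.

```python
import dataclasses
import types


def freeze(value):
    """Recursively converts list -> tuple and dict -> read-only mapping."""
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, dict):
        # `MappingProxyType` is a stdlib stand-in for `immutabledict` here.
        return types.MappingProxyType({k: freeze(v) for k, v in value.items()})
    return value


@dataclasses.dataclass(frozen=True)
class Trainer:  # Toy example, not the Kauldron class.
    num_steps: int
    layers: tuple


trainer = Trainer(num_steps=10, layers=freeze([64, [32, 16]]))

# Assigning to a field of a frozen dataclass raises instead of mutating.
try:
    trainer.num_steps = 20
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False
```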
Do not create Config classes#
The goal of konfig is to remove the indirection with your code. So do not add
the indirection back with config classes. Just directly call your nn.Module,
optax.GradientTransformation, tfds.builder,… in your config.