Clipppy Users’ Guide

Clipppy’s features fall into two broad categories: a configuration system and runtime utilities. Whereas the latter are aimed mainly at probabilistic programming (PP) and variational inference (VI), the former is meant to be as general as possible in its core while having specific “shortcuts” useful in PP and VI.

YAML configuration system

Clipppy’s configuration semantics were derived from its venerable predecessor pyrofit, which used YAML files with a more or less predefined structure to define VI models (and guides, etc.). In contrast, Clipppy’s YAML “format” is general and can in principle represent the initialisation of any Python objects (and even a bit beyond). The backbone of YAML parsing in Clipppy is ruamel.yaml, itself derived from PyYAML, and the very interested reader is referred to their documentation and to the official YAML specification. The less patient among you might be interested in any of a number of YAML tutorials on the web, while if you just want to start off with Clipppy, simply read on.

The main entry point functions for YAML parsing in Clipppy are clipppy.loads (general purpose) and clipppy.load_config (for VI). The examples in this guide assume the YAML is loaded via the former, which makes no prior assumptions on the overall structure of the document (while the latter by default interprets it as a clipppy.Clipppy object) and works with plain strings (whereas load_config requires a path, pathname, or text stream as input).

Basic YAML

YAML is[1] a markup language that extends JSON. Hence, any valid JSON, like {"answer": 42, "foo": [3.14, "euler's number", {"question": null, "whatever": {"maybe": true, "but actually": false}}]} is valid YAML. This basic variant (which is almost directly usable as a Python literal) allows the description of arbitrary primitive objects: numbers, strings, arrays (Python lists), and dictionaries. YAML also allows a modified syntax, where one is permitted to

  • ditch the quotation marks since everything that doesn’t look like a number (and isn’t a literal like true, false, null) is interpreted as a string;

  • ditch the brackets as long as one uses indentation: everything more indented with respect to the parent by the same amount is on the same nesting level;

  • use a bullet-point-like style for lists (using -) instead of square brackets.

Thus, in YAML the above example may be rewritten as

answer: 42
foo:
  - 3.14
  - euler's number
  - question: null
    whatever: {maybe: true, but actually: false}

which is, arguably, way more pleasant to look at.[2] Note that one can still use (even partially well-formatted) JSON for any node.
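To see concretely what the example describes, here is a minimal sketch that parses the JSON form with Python's standard json module. The json module is used only because it needs no third-party packages; it is not how Clipppy loads files. A YAML loader is expected to produce the same structure from the indented form.

```python
import json

# The JSON form of the example from the text.
doc = """
{"answer": 42,
 "foo": [3.14, "euler's number",
         {"question": null,
          "whatever": {"maybe": true, "but actually": false}}]}
"""

parsed = json.loads(doc)

# Primitive nodes map onto plain Python objects:
# mappings -> dict, arrays -> list, null -> None, true/false -> bool.
print(type(parsed).__name__)
print(parsed["foo"][1])
print(parsed["foo"][2]["question"])
```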

Note

YAML also permits using lists (will be converted to tuples) as keys in a dictionary:

[a string, 26]: value

However, the usual hashability rules of Python apply, so dictionaries are not allowed inside keys. Also, since non-string keys are not used in function definitions, this feature is discouraged in Clipppy, and no guarantees are made that it will indeed be allowed forever.
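The hashability rule can be checked in plain Python, without any YAML involved. The sketch below only mimics what the parsed result would look like (a tuple as the key); the variable names are illustrative.

```python
# A YAML list used as a key would arrive in Python as a tuple, which is hashable.
config = {("a string", 26): "value"}
print(config[("a string", 26)])

# A dictionary nested inside the key makes the whole tuple unhashable.
unhashable_error = None
try:
    bad = {("a string", {"nested": 1}): "value"}
except TypeError as err:
    unhashable_error = err
    print("rejected:", err)
```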

Footnotes

[1] According to its name, though, it ain’t.

[2] Fret not! It will soon become much messier!

Advanced YAML[3]

On top of syntactic sugar, YAML comes with some useful additional features. One of them is the ability to name and subsequently reference nodes. The syntax is inspired by C’s pointers:

a: &name {key1: value1, key2: value2}
b: *name

Using &name defines the “variable”, and *name “dereferences” it[4]. The pointer language is accurate here since in the parsed object, the two nodes will be converted to references to the same object, so parsed['a'] is parsed['b'] will evaluate to True in Python. Since this is a standard feature of ruamel.yaml, Clipppy’s machinery is bypassed when dereferencing, which might be surprising to someone who uses YAML references as a way to avoid duplicating code and doesn’t really mean to have the same object.
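The practical consequence of sharing one object is easiest to see in plain Python. This sketch builds by hand the structure that the two-line YAML above would be expected to parse to; the use of copy.deepcopy at the end is just one possible way to get an independent object and is not something Clipppy does for you.

```python
import copy

# What the parsed document looks like: both keys refer to the same dict.
shared = {"key1": "value1", "key2": "value2"}
parsed = {"a": shared, "b": shared}

same = parsed["a"] is parsed["b"]

# Mutating through one key is visible through the other.
parsed["a"]["key1"] = "changed"
seen_through_alias = parsed["b"]["key1"]

# An explicit copy breaks the link if separate objects are wanted.
independent = copy.deepcopy(parsed["a"])
independent["key2"] = "only here"
```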

Footnotes

[3] Well, intermediate, really.

[4] The correct terms in YAML-speak are “anchor” and “alias”.

Tags

Secondly, YAML introduces the idea of tags (and Clipppy takes it maybe a bit too close to the heart). They are identified by a prefixed !, as in !tag, and, as in any markup language, define the type of the node/element. They are metadata and denote a step of postprocessing the primitive values found in the node contents.

Note

The specification goes on and on about tags and prefixes and namespaces and URIs and local and global and what-not tags… Truth is, every node must have a tag (be it an empty one), but end users usually don’t bother with them since most tags are assigned automatically by the parser based on whether a node is a number, a string, a literal, an array, or a dictionary. Clipppy adds specific tags for specific object types and uses other tags to enable more complex behaviour presented below. Tags are also assigned (experimentally) based on type annotations as described in The Power of Type Hints.

!py

Used as !py:NAME, this tag allows access to arbitrary objects. The name can be any qualified name (non strict sense), i.e. any importable module, or nothing, followed by a sequence of attribute accesses. Maybe some examples will clarify this:

  • a name in the local scope: print,

  • a_local_var.s_attribute.and_more,

  • a series of modules and a name: astropy.units.astrophys.attoparsec,

  • properties of classes, etc.: package.module.OuterClass.InnerClass.method.__name__.

More precisely, the directive first tries to evaluate the name in ClipppyYAML.constructor.scope (see Scopes). If a NameError or AttributeError occurs, it tries to import part of the name as a module and evaluate the rest in its scope. It does that at every possible splitting location (a dot), starting from the right, i.e. prefers long imports to long attribute lookups. For example, for astropy.units.astrophys.attoparsec, assuming astropy.units has not been imported, it will try to import astropy.units.astrophys.attoparsec first; this will fail, so it will try astropy.units.astrophys, which will have been imported as usual; finally !py will look up the name attoparsec in the imported module and thus succeed. If nothing works, a NameError/AttributeError/ModuleNotFoundError is raised as appropriate.
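The lookup order just described can be sketched with the standard library alone. This is an illustration of the described strategy, not Clipppy's actual implementation: the function name resolve is hypothetical, the scope is a plain dict with a fallback to builtins, and standard-library modules stand in for astropy.

```python
import builtins
import importlib


def resolve(name, scope=None):
    """Resolve a dotted name, preferring long imports to long attribute lookups."""
    scope = {} if scope is None else scope
    parts = name.split(".")

    # Step 1: try the name in the given scope (falling back to builtins).
    try:
        head = parts[0]
        if head in scope:
            obj = scope[head]
        elif hasattr(builtins, head):
            obj = getattr(builtins, head)
        else:
            raise NameError(f"name {head!r} is not defined")
        for attr in parts[1:]:
            obj = getattr(obj, attr)
        return obj
    except (NameError, AttributeError) as err:
        last_error = err

    # Step 2: split at every dot, starting from the right, importing the left
    # part as a module and looking up the remainder as attributes.
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
            for attr in parts[i:]:
                obj = getattr(obj, attr)
            return obj
        except (ModuleNotFoundError, AttributeError) as err:
            last_error = err

    # Nothing worked: re-raise the most recent failure.
    raise last_error
```

For instance, resolve("os.path.join") first fails to import os.path.join as a module, then imports os.path and finds join in it, while resolve("print") is answered directly from the scope step.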

Once the name is resolved, there are two options. If the node is empty (that is, if there is no value following the NAME), the value of the node is set to the resolved Python object. For example,

key: !py:print

will parse as {'key': <built-in function print>}. Beware of this style since in YAML a tag must be followed by whitespace or end-of-line/transmission, so things like {key: !py:print} are not valid (just needs a space, though).

If the node does have a value, the object corresponding to the tag will be called with the node contents as arguments, and the node’s value will be set to the returned object. Thus,

key: !py:print Making progress!

will actually print Making progress! while parsing and return {'key': None} (since print returns None). If the node is a scalar, it will be passed as a single argument; if it is a sequence, it will be expanded as func(*args), and if it is a mapping, as func(**kwargs), so you can do some wacky things like

!py:str.join [" ", [Hooray, for, Python!]]

which is the same as str.join(' ', ['Hooray', 'for', 'Python!']), or outright lose it:

!py:sorted
    <: [[[a, 42], [c, 26], [b, 13]]]
    key: !py:operator.itemgetter [0]
    <<: {reverse: True}

(equivalent to the Python

sorted(*[[['a', 42], ['c', 26], ['b', 13]]],
       key=operator.itemgetter(0), **{'reverse': True})

that results in reverse sorting the list by the first entry in each element: [['c', 26], ['b', 13], ['a', 42]]). You can see that the syntax resembles real Python code as closely as possible, with the exception of parameter expansions being effected by < and << instead of, respectively, * and ** (because * is reserved for aliases in YAML). The tricks that !py hides up its sleeve are described in full detail in From Node to Signature.
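The calling convention (scalar as a single argument, sequence as positional arguments, mapping as keyword arguments) can be sketched in a few lines of Python. The helper call_with_node is hypothetical and works on already-parsed Python values rather than YAML nodes; the last statement simply runs the Python equivalent of the sorted example.

```python
import operator


def call_with_node(func, node):
    """Call func with parsed node contents, following the rule in the text."""
    if isinstance(node, dict):
        return func(**node)      # mapping  -> func(**kwargs)
    if isinstance(node, (list, tuple)):
        return func(*node)       # sequence -> func(*args)
    return func(node)            # scalar   -> func(value)


joined = call_with_node(str.join, [" ", ["Hooray", "for", "Python!"]])

# The Python equivalent of the !py:sorted example.
ordered = sorted(*[[["a", 42], ["c", 26], ["b", 13]]],
                 key=operator.itemgetter(0), **{"reverse": True})
```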

!import

A directive that realises Python imports. This tag expects an array node and returns None as the node’s value. Each element node should be a simple string as you would write in Python, and all import styles are supported. The general syntax, therefore, is

whatever name: !import
    - import torch
    - import numpy as np
    - from matplotlib import pyplot as plt

Loading this config will result in {'whatever name': None}, but as a side effect the respective modules / names will be imported by the standard Python machinery and will be available to subsequent !py and !eval directives for name lookup, as well as in sys.modules. This directive is primarily useful for as-style imports, abridging qualified names to just the proper __name__ or for making names available in !eval. Other cases are covered by the name resolution semantics of !py.
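A rough Python picture of this behaviour: each string is executed as an import statement against a scope dictionary, and the directive itself yields None. The helper run_imports is a hypothetical sketch, not ScopeMixin.import_; it uses standard-library modules so that it runs anywhere, and the ast check is an extra safeguard added for the example.

```python
import ast
import sys


def run_imports(statements, scope):
    """Execute import statements so that the imported names land in scope."""
    for statement in statements:
        tree = ast.parse(statement)
        # Refuse anything that is not a plain import statement.
        if not all(isinstance(node, (ast.Import, ast.ImportFrom)) for node in tree.body):
            raise ValueError(f"not an import statement: {statement!r}")
        exec(compile(tree, "<import>", "exec"), scope)
    return None  # the node's value, as described in the text


scope = {}
result = run_imports(
    ["import json", "import collections as col", "from os import path as p"],
    scope,
)
print(sorted(name for name in scope if not name.startswith("__")))
```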

Note

Additional formats are supported for backwards compatibility. These will not be documented in order to encourage the more sensible standard syntax but can be deduced by perusing the source code of ScopeMixin.import_. I’ll give away just that things like !import numpy as np work as well.

Note

Some modules are always available without an explicit import in the YAML file. These include the majority of the Clipppy API, torch, numpy (also as np), and operator (as op).

!eval

Evaluate the node contents as a Python expression. Basically, this is God mode, although you’re still limited to a single expression (not even a statement) since the contents are simply passed on to the built-in eval function. But a Python God is supposed to be able to do anything in a single expression[citation needed].

Warning

It is currently not possible to define a lambda expression that makes use of global variables inside an !eval, like !eval "lambda: torch.ones(...)". The reason is that the scope is kept in a ChainMap, which is inadmissible as eval's globals parameter. Instead, it goes in the locals, but those are not remembered by a lambda.
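The limitation is a property of the built-in eval and can be reproduced without Clipppy. In the sketch below, math stands in for torch. The default-argument trick at the end is one common Python workaround, offered here as an assumption; the guide itself does not prescribe it.

```python
from collections import ChainMap
import math

scope = ChainMap({"math": math})

# A ChainMap is rejected as the globals argument of eval ...
globals_error = None
try:
    eval("math.pi", scope)
except TypeError as err:
    globals_error = err

# ... but is accepted as locals, where ordinary expressions work fine.
direct = eval("math.sqrt(16)", {}, scope)

# A lambda created inside eval looks up free names in globals only,
# so the name that lived in locals is gone when the lambda is called.
lambda_error = None
broken = eval("lambda: math.sqrt(16)", {}, scope)
try:
    broken()
except NameError as err:
    lambda_error = err

# Possible workaround: bind the name at definition time via a default argument.
working = eval("lambda math=math: math.sqrt(16)", {}, scope)
```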

!txt

Load a text file with numerical data. This is a thin wrapper around numpy.loadtxt and as such expects the contents of the node to be valid arguments for it: see From Node to Signature. The particular most frequently used signatures are

!txt data.txt

and

!txt {/: data.csv, delimiter: ","}

The quotation marks are necessary here because a comma is a special character in YAML/JSON.

!npy

Load a .npy file. A thin wrapper around numpy.load:

- !npy data.npy
- !npy {/: data.npy, allow_pickle: false}

!npz

Load a .npz archive. This again wraps numpy.load, but has an optional second argument key that specifies a particular file from the archive to be returned (see numpy.savez). Thus,

!npz [data.npz, somekey]  # or {fname: data.npz, key: somekey}

is the same as numpy.load('data.npz')['somekey']. Otherwise, the opened NpzFile will be returned as is:

!npz data.npz  # same as np.load('data.npz')

Additional (keyword only!) arguments will be passed on to numpy.load:

!npz {/: data.npz, key: somekey, allow_pickle: false}

!pt

Load a PyTorch archive through torch.load. Has the same semantics as !npz:

- !pt data.pt             # torch.load('data.pt')
- !pt [data.pt, somekey]  # torch.load('data.pt')['somekey']
- !pt                     # torch.load('data.pt', map_location='cuda', **kwargs)['somekey']
    fname: data.pt
    key: somekey  # optional
    map_location: cuda
    # any other keyword arguments will go into kwargs

Note that torch.save can store any Python object, so it is not guaranteed that indexing torch.load('data.pt')['somekey'] is sensible.

!trace

Extract values from a saved pyro.poutine.Trace. Assuming a trace containing sites a and b was saved to trace.pt via torch.save(trace, 'trace.pt'), one can retrieve the values either one at a time:

!trace [trace.pt, a]

(equivalent to torch.load('trace.pt').nodes['a']) or at multiple sites:

!trace [trace.pt, [a, b]]  # -> {'a': ..., 'b': ...}

which will return a dictionary of values as above. Additional keyword arguments are also accepted and passed on to torch.load as in !pt. Note, though, that for !trace the second (key) argument is required.

!tensor

Explicitly construct a torch.Tensor via the torch.tensor function. The simplest use case is to convert a list of numbers[5] to a tensor:

!tensor [[1, 2, 3, 4, 5]]

Notice the double brackets: this is necessary because the node contents first have to be translated to a tuple of arguments, the first of which happens to be an array. Additional (keyword! as per the signature of torch.tensor) arguments for the dtype, device, gradient and pinnedness of the tensor are accepted, and furthermore, the data argument can be an arbitrary construction:

!tensor
    /: !npz [data.npz, somekey]
    dtype: !py:torch.get_default_dtype []  # see footnote 6
    device: cuda

The above example loads a NumPy array, converts it to the default float type, and puts it on the GPU.

Note

The usual caveats of torch.tensor apply. In particular, a copy is always made, even if the data is a Tensor with the requested properties. Furthermore, if an explicit device argument is not given, any non-Tensor data will be (copied and) placed on the default PyTorch device, whereas a Tensor will be (copied and) kept on the same device. Use, therefore, !tensor:default to ensure that the result is placed on the default device.

[5] Arguably, it’s simpler to convert a single number to a tensor: !tensor 42. This also works but is slightly frowned upon (it is the same as !tensor [42]).

[6] If you’re confused about the brackets here, remember that torch.get_default_dtype is a function and needs to be called with no arguments.

!tensor:DTYPE

In order to simplify the above code, Clipppy supports a namespace/prefixed version as a succinct way of specifying the desired Tensor.dtype. This is equivalent to

!tensor
    ...
    dtype: !py:torch.DTYPE

Acceptable versions, therefore, are !tensor:int, !tensor:float, !tensor:double, !tensor:bool, among others, and the special value, !tensor:default, which stands for the current default dtype and device obtained as above.

![operator]

Operators can be accessed with the syntax !+… A simple example is !* [6, 9] (evaluates to 42). These dispatch to functions in operator[7]. Here is a full list:

!+ (add), !- (sub), !* (mul), !/ (truediv), !@ (matmul),

!== (eq), !ne (ne), !lt (lt), !le (le), !gt (gt), !ge (ge).

The angle brackets (<>) are interpreted in YAML tags, so they, sadly, cannot be used to represent operators. Additionally, the following are defined for getting properties:

![] (getitem), !. (getattr).

And finally, we define !: (slice) because we can. Then, one can do:

!+ [![] [!py:np.arange [10], !: [2, 9, 3]], 0.1]

as a slightly longer and deranged version of np.arange(10)[2:9:3] + 0.1 if such a thing floats one’s boat.
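As a rough picture of the dispatch, the tag-to-function table might look like the dictionary below. The name TAG_TO_FUNC is hypothetical, mapping "." to the built-in getattr is an assumption based on the name given in the list, and a plain list replaces np.arange so the sketch needs only the standard library.

```python
import operator

# Hypothetical lookup table mirroring the list of operator tags in the text.
TAG_TO_FUNC = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.truediv, "@": operator.matmul,
    "==": operator.eq, "ne": operator.ne, "lt": operator.lt,
    "le": operator.le, "gt": operator.gt, "ge": operator.ge,
    "[]": operator.getitem, ".": getattr, ":": slice,
}

# ![] [<sequence>, !: [2, 9, 3]]  ->  sequence[2:9:3]
picked = TAG_TO_FUNC["[]"](list(range(10)), TAG_TO_FUNC[":"](2, 9, 3))

# With a plain list the addition has to be applied element by element;
# a NumPy array would broadcast it instead.
shifted = [TAG_TO_FUNC["+"](x, 0.1) for x in picked]
```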

[7] The same can, of course, be accessed via !py:operator.add…, the shorter !py:op.add… and even directly as !py:add… (i.e. a from operator import * is performed in the global name resolution scope).

!Stochastic:NAME

A shortcut for

!py:clipppy.stochastic.stochastic.Stochastic
    ...
    name: NAME

therefore, see the documentation of Stochastic and StochasticSpecs. NAME (and the colon :) can be omitted and will default to None. Since Stochastic takes at least two arguments, the first one being an object to “wrap” and the second a dictionary of parameter “specifications”, the usual YAML pattern is

!Stochastic:NAME
    - !py:MyDeterministicCallable
        ...  # constructor arguments
    - param_1: ...  # Sampler, etc. or distribution or constant
      param_2: ...
      ...

Note

Built into stochastic are two features that make describing stochastic wrappers in YAML (and not only) easier. Firstly, if any of the specs.values() is an instance of AbstractSampler (this includes instances of Sampler and company), its name is set to the name of the parameter it is attached to (via AbstractSampler.set_name). Secondly, if it is a Distribution, a Sampler is automatically created from it. This allows for the rather concise

!Stochastic [..., {param: !py:Normal [0., 1.], ...}]

for example, assuming Normal has been imported already.

!Param
!Sampler
!InfiniteSampler
!SemiInfiniteSampler

Shortcuts for Param, Sampler, Sampler(d=InfiniteUniform()), Sampler(d=SemiInfiniteUniform()).

Todo

list all classes from sampler.

As a final shortcut, Clipppy’s YAML processor is set up so that by default the top-level node is auto-interpreted as a Clipppy object, i.e. it is assigned a tag !py:Clipppy. If this is not desired, use the interpret_as_Clipppy parameter to loads/load_config and ClipppyYAML.load or explicitly tag the whole document however you like.

Scopes

The directives !py and !eval advertise giving you access to arbitrary Python (objects) from inside the YAML configuration and therefore need to resolve variable names. The scope in which this is done is kept in ClipppyYAML.constructor.scope[8]. By default Clipppy makes every[citation needed] effort to simulate the scoping “rule” of eval/exec, i.e. to “execute” the YAML in the local scope from which loads/clipppy.load_config or ClipppyYAML.load is called:

>>> a = 'spam, baked beans, and spam'
>>> clipppy.loads('!py:str.replace [!py:a , baked beans, spam]')
'spam, spam, and spam'

(Note the space here, since a) we don’t want to call a, and b) a space is required after every tag in YAML.)

To achieve this, every invocation of ClipppyYAML.load by default collects the locals, globals, and builtins from the appropriate frame and saves them to ClipppyYAML.constructor.scope. The scope may then be updated by !import directives, and these updates will leak to the caller. This is probably best illustrated with an explicitly given scope:

>>> scope = {}
>>> clipppy.loads('!import numpy as np', scope=scope)      # None
>>> scope
{'np': <module 'numpy' from '.../numpy/__init__.py'>}
>>> clipppy.loads('!import jax.numpy as np', scope=scope)  # None
>>> scope
{'np': <module 'jax.numpy' from '.../jax/numpy/__init__.py'>}
>>> 'jax.numpy' in sys.modules
True

but the same thing happens when using the default “current” scope:

>>> clipppy.loads('!import torch')  # uses the current scope
>>> torch
<module 'torch' from ...>

On top of that scope, ClipppyYAML installs a custom ClipppyYAML.constructor.builtins that consists of the usual __builtins__ and the global scope of clipppy.yaml. The latter is kept for compatibility and as an easy way to get numpy, torch, and the majority of the clipppy API registered, even though the “full” API is then explicitly registered in this ClipppyYAML.constructor.builtins scope.
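The idea of collecting the caller's locals, globals, and builtins into one layered scope can be sketched with inspect and collections.ChainMap. This is only an illustration of the mechanism described above; caller_scope and demo are hypothetical names, and Clipppy's own frame handling may differ.

```python
import builtins
import inspect
from collections import ChainMap


def caller_scope():
    """Layer the caller's locals over its globals over the builtins."""
    frame = inspect.currentframe().f_back
    return ChainMap(frame.f_locals, frame.f_globals, vars(builtins))


def demo():
    a = "spam, baked beans, and spam"
    scope = caller_scope()  # sees the local variable a
    # The ChainMap goes into eval's locals, as a ChainMap cannot be its globals.
    return eval("str.replace(a, 'baked beans', 'spam')", {}, scope)


result = demo()
print(result)
```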

Note

If invoked from within YAML, e.g. via !py:locals [] or !py:globals [], the built-in locals and globals functions return the respective scopes for some function inside clipppy.yaml instead of something more meaningful[9]. The way to get at the “correct” scope, which !import imports in, is via eval-uating locals/globals as a Python call: !eval locals(), which will return ClipppyYAML.constructor.scope as, currently, a ChainMap. Remember, though, that !py operations essentially transpire in this scope anyway.

Footnotes

[8] This attribute is unconditionally overwritten on each load, so setting it directly will not have an effect on YAML loading. What it is set to, though, is controlled by the scope function parameter, which is your chance of controlling the YAML “globals” scope; especially, if you want to “hide” the caller scope from the YAML for some reason (speed?), pass an empty dictionary.

[9] This might point you to why loading YAML is considered “unsafe” and why ruamel.yaml operates in a “safe” mode, turning which off is the first order of business for ClipppyYAML.

From Node to Signature

Magic Keys

There are only three “magic keys”. Since YAML does not allow mixing sequence and mapping nodes, while in Python this is common practice, and also to cover the case of positional-only parameters, Clipppy needs a positional argument indicator key. Furthermore, since it is common to want to expand some generated parameter or maybe use the same object as a monolithic sequence in one place and as individual items in another[11], Clipppy defines positional and keyword expansion “operators” corresponding to the Python parameter expansion syntax */**.

/

Use the value as a positional argument. Can be used at any point (even after keywords, contrary to the Python grammar).

<

Expand the value into positional arguments. A simple use case would be some xy coordinates as an \(N \times 2\) array that need to be expanded into two arrays of length \(N\):

- &pts [[0, 0], [1, 1], [26, 42]]
...
- !py:matplotlib.pyplot.plot
    <: !py:np.transpose [*pts]

which corresponds to the very similar Python code

plt.plot(*np.transpose(pts))

Note

If you try this example with ruamel.yaml<=0.17.4 (or maybe even higher), this will (may) not work! The reason is that there is no (not-too-hacky) way to force depth-first construction if using an optimised C-based loader/parser/constructor, and the current implementation returns an empty list as the value of the referenced node when the !py:np.transpose-tagged node requires it. To solve this, tag the whole document with !py:list for example, which will transfer control to ClipppyYAML from the beginning (and make the document a one-element sequence as per the requirement of list… See, I told you: hacky!).

This highlights a fundamental design choice of Clipppy: in order to provide sensible insight using type hints, construction has to be depth first and recursive (hence, Python’s stack depth limitation applies to Clipppy YAML files). In contrast, simple collection assembly can live with breadth-first construction and a subsequent population using further placeholders, etc.

Deprecated since version 0: Initially, the key for positional expansion was __args, but this should not be used anymore.

<<

Expand the value into keyword arguments. This “merge type” is actually present in the officially recommended YAML type system[10]. Clipppy needs to merge eagerly, though, in order to be able to tag the nodes, so this key is handled specially. Otherwise, it does what you would expect: merges the named mapping into its parent, overwriting any already present keys. In this regard

!py:func {<<: *map1, <<: *map2, ...}

behaves more like

func(**{**map1, **map2, ...})

than

func(**map1, **map2, ...)

which would throw an exception for repeated keys. The same overwriting rule applies to keys not from expanded mappings.
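The difference between the two Python spellings is easy to verify. In this small sketch func, map1, and map2 are placeholder names chosen for the example.

```python
def func(**kwargs):
    return kwargs


map1 = {"a": 1, "b": 2}
map2 = {"b": 20, "c": 30}

# Merging first: later mappings overwrite earlier keys, as the << key does.
merged = func(**{**map1, **map2})

# Expanding both directly: the repeated key "b" is an error.
duplicate_error = None
try:
    func(**map1, **map2)
except TypeError as err:
    duplicate_error = err
```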

Magic keys can be freely mixed and matched, used multiple times, etc. The order of evaluation of the nodes/parameters follows strictly the definition order in the YAML, just as it follows the definition order in a Python call (important for side effects and defining anchors). Here’s an example:

!py:f
<: [!eval 22/7]
euler: 2.72
<: [0, 1, 1, 2, 3, 5, 8]
euler: 2.71828
<<: {euler: !py:math.exp [1],
     pi: !py:math.pi , phi: !py:mpmath.phi }  # spaces!

may be used with a signature like

def f(not_pi, *fibonacci, euler, **exact): ...

and will result in a locals

{'not_pi': 3.142857142857143,
 'euler': 2.718281828459045,
 'fibonacci': (0, 1, 1, 2, 3, 5, 8),
 'exact': {'pi': 3.141592653589793,
           'phi': <Golden ratio phi: 1.61803~>}}
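To make the bookkeeping explicit, here is a minimal sketch of how an ordered list of key/value pairs could be folded into positional and keyword arguments. The helper build_call is hypothetical and is not Clipppy's implementation; pairs are given as a list because a Python dict cannot hold repeated keys, and the mpmath entry from the example is left out to keep the sketch standard-library only.

```python
import math


def build_call(pairs):
    """Fold (key, value) pairs into (args, kwargs) in definition order."""
    args, kwargs = [], {}
    for key, value in pairs:
        if key == "/":       # single positional argument
            args.append(value)
        elif key == "<":     # positional expansion, like *value
            args.extend(value)
        elif key == "<<":    # keyword expansion with overwriting, like a dict merge
            kwargs.update(value)
        else:                # ordinary keyword; later occurrences overwrite
            kwargs[key] = value
    return args, kwargs


def f(not_pi, *fibonacci, euler, **exact):
    return {"not_pi": not_pi, "fibonacci": fibonacci,
            "euler": euler, "exact": exact}


pairs = [
    ("<", [22 / 7]),
    ("euler", 2.72),
    ("<", [0, 1, 1, 2, 3, 5, 8]),
    ("euler", 2.71828),
    ("<<", {"euler": math.exp(1), "pi": math.pi}),
]
args, kwargs = build_call(pairs)
result = f(*args, **kwargs)
```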

Footnotes

[10] But… it isn’t really a type, is it? It’s a procedural directive, mandating the merge of some mappings, which is an operation!

[11] Or simply to forget the name of some commonly-used first parameter, like in the example with sorted above. In that case, you’ll need to wrap it in [], of course.

The Power of Type Hints

Type hints in Python are the best![12] They are completely ignored at runtime, so they don’t limit you in any way, but are still tremendously helpful in static analysis and allow IDEs to spot errors in your code before you run it. They help clarify the meaning of parameters and properties and contribute to automatic documentation generation. Even though the language ignores type hints, they are not completely “lost” as are the types of compiled languages: “annotations” can be freely examined by the program using the builtin typing and inspect modules. Basically, they are free information that the software designer gives to the program without any obligation. As such, type hints are often the basis of “smart” functionality, such as in the dataclasses module. And in Clipppy, which tries to be smart and save you some typing in YAML if you have gone through the trouble of writing properly annotated Python code.

Clipppy needs to invoke Python functions with arguments coming from YAML in order to construct complex data structures beyond simple containers (sequences and mappings). Sometimes the inputs are themselves complex structures, and so the YAML parser needs to be informed further of the way to form them from simpler data, and so on. However, the original function knows what data to expect, and the constructors of complex structures know what primitives they need, or at least the programmer who wrote them does. Thus, if they provided this information as type hints, Clipppy can try to automatically determine the processing needed in the middle between primitives and the final call signature.

Take the following typical Clipppy configuration as example:

guide:
    - cls: MultivariateNormalSamplingGroup
      name: main
      match: main/.*
    - cls: DiagonalNormalSamplingGroup
      name: others

To an outside observer this is just a one-key mapping, and the one value is a list of two further mappings with some strings. No tags or further information provided. However, as we said, Clipppy can automatically assume that this whole YAML represents a Clipppy object, and so automatically tag it[13] with !py:Clipppy. The node, thus, represents a call to the constructor of Clipppy with an argument guide, so Clipppy inspects it for further information. In an ideal world, such as the one we live in, the guide parameter would be annotated with Guide so that the parser can tag it with !py:clipppy.guide.guide.Guide (it’s a mouthful, but that’s qualified names for you; also, that’s why we want automation, right?). Next, the constructor for Guide reads

def __init__(self, *specs: GroupSpec, model=None, name=''): ...

so the parser expands the sequence node into this signature and realises that both elements should be instances of GroupSpec, whose constructor might be (it was, it’s not anymore)

def __init__(
    self,
    cls: Type[SamplingGroup] = DeltaSamplingGroup,
    match: Union[str, re.Pattern] = _allmatch,
    exclude: Union[str, re.Pattern] = _nomatch,
    name='', *args, **kwargs): ...

Here, even though name is not annotated, Clipppy will consider the type of the default value in line with most type checkers. However, a str is not particularly interesting since scalar nodes are by default strings. The match is a Union for convenience and is explicitly converted to a re.Pattern in the body of the function. Sadly, Clipppy cannot handle Unions yet, so it leaves the match node alone[14]. Finally, for the cls parameter, meant to indicate the subtype of SamplingGroup to use, Clipppy assumes that the node is a name of a class / Python object to pass. The node is then tagged with !py:VALUE, where VALUE is the original content[15]. Clipppy does that for all Type or typing.Callable|collections.abc.Callable annotations, so if you want to pass something other than a name, you should put an explicit annotation.

Depending on ClipppyConstructor.strict_node_type, which is True by default, Clipppy enforces the types of nodes versus what it expects from an annotation: that callable / string parameters are represented as scalar nodes and that builtin sequences / mappings are, respectively, sequences / mappings.
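The kind of introspection involved can be sketched with the standard typing and inspect modules. The functions inferred_types and expects_object_name are hypothetical helpers written for this illustration, and the two toy classes only borrow names from the text; none of this is Clipppy's actual code.

```python
import collections.abc
import inspect
import typing


def inferred_types(func):
    """Map parameter names to their annotation or, failing that, the type of the default."""
    hints = typing.get_type_hints(func)
    result = {}
    for name, param in inspect.signature(func).parameters.items():
        if name in hints:
            result[name] = hints[name]
        elif param.default is not inspect.Parameter.empty and param.default is not None:
            result[name] = type(param.default)
    return result


def expects_object_name(annotation):
    """True for Type[...] and Callable[...] annotations, whose values are named in YAML."""
    return typing.get_origin(annotation) in (type, collections.abc.Callable)


class SamplingGroup:  # toy stand-in
    pass


class GroupSpec:  # toy stand-in with a simplified signature
    def __init__(self, cls: typing.Type[SamplingGroup] = SamplingGroup, name=""):
        self.cls, self.name = cls, name


types = inferred_types(GroupSpec.__init__)
print(types)
```

Here name is picked up as str from its default, while cls is recognised as a parameter whose YAML value should be read as the name of a Python object.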

Finally, the original YAML is perceived as

!py:Clipppy
guide: !py:clipppy.guide.guide.Guide
    - !py:clipppy.guide.sampling_group.SamplingGroup
        cls: !py:MultivariateNormalSamplingGroup
        name: main
        match: main/.*
    - !py:clipppy.guide.sampling_group.SamplingGroup
        cls: !py:DiagonalNormalSamplingGroup
        name: others

Footnotes

[12] But they’re soon getting worse (PEP 563)… :/

[13] This only applies to loading with interpret_as_Clipppy, as discussed above. Note that Clipppy will never interfere with your code if you’re explicit and do put tags in, unless they are the standard ones <tag:yaml.org,2002:str>, <...:seq>, <...:map>, which are actually auto-assigned based on the node type.

[14] Even if the annotation were a plain re.Pattern, it wouldn’t work directly. Clipppy may be smart, but how is it to know that the constructor raises a TypeError: cannot create 're.Pattern' instances when called directly, or that its signature checks out as (), i.e. nothing?! Maybe the developer knows that, though, and also that Patterns are constructed via re.compile. They can then help Clipppy by registering a type-to-tag mapping in ClipppyConstructor.type_to_tag as

ClipppyConstructor.type_to_tag[re.Pattern] = '!py:re.compile'

to replace the default cls -> '!py:{cls.__module__}.{cls.__qualname__}'. Then a function like f(a: re.Pattern) can be safely “called” as !py:f [(meta-)*regex golf] and will be passed re.compile('(meta-)*regex golf').
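The registry idea can be mimicked with an ordinary dictionary. In this sketch the module-level type_to_tag and the helper tag_for are stand-ins for illustration only (the real attribute lives on ClipppyConstructor); the point is the fallback to the default qualified-name tag and the fact that re.Pattern cannot be instantiated directly.

```python
import re

# Stand-in for the registry: types whose tag differs from the default.
type_to_tag = {re.Pattern: "!py:re.compile"}


def tag_for(cls):
    """Return the registered tag for cls, or the default qualified-name tag."""
    default = f"!py:{cls.__module__}.{cls.__qualname__}"
    return type_to_tag.get(cls, default)


# Calling the type directly fails, which is why a custom tag is needed.
direct_error = None
try:
    re.Pattern()
except TypeError as err:
    direct_error = err

# What a function annotated with re.Pattern would receive instead.
pattern = re.compile("(meta-)*regex golf")
```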

[15] For now no checks for inheritance / signature constraints or types of container elements are performed by Clipppy, so this has to be handled in user code.

Templating

Clipppy includes rudimentary templating functionality built on top of string.Template. Placeholders are valid Python identifiers introduced by a $[16] and optionally delimited by braces[17]: $_var123, ${var}. Replacement strings are given as keywords to ClipppyYAML.load, load_config, and loads, e.g.

>>> loads('[$var, ${var}, $var_123]', var='rep', var_123='rep_123')
['rep', 'rep', 'rep_123']

Template substitution is activated anytime “excess” keyword arguments are given, or when the force_templating argument to ClipppyYAML.load / load_config / loads is True (it is by default).
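The basic placeholder syntax is that of the standard string.Template, so it can be tried out directly. Using safe_substitute (which leaves unknown placeholders untouched) is an assumption made for this sketch; the text only states that Clipppy builds on string.Template.

```python
from string import Template

# Both placeholder spellings refer to the same name.
filled = Template("[$var, ${var}, $var_123]").safe_substitute(
    var="rep", var_123="rep_123"
)

# Names without a replacement are left as they are; non-strings are formatted with str.
partial = Template("$known and $unknown").safe_substitute(known=7)

# A doubled dollar sign produces a literal one.
literal = Template("cost: $$5").safe_substitute()

print(filled)
print(partial)
print(literal)
```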

More usefully, templates can have defaults, specified as:

${var = default text }  →  default text

In this case the {} are mandatory, and surrounding whitespace is stripped. The default can be enclosed in parentheses (necessary when it contains a closing brace or to preserve surrounding whitespace):

${var = (␣{text}␣␣) }  →  ␣{text}␣␣

Inside the default text a backslash and a closing parenthesis are escaped:

${var = f(x\)}     →  f(x)
${var = f(x)}      →  f(x)  # OK because doesn't start with "("
${var = (f(x\))}   →  f(x)  # here, though, it's necessary
${var = \\text\\}  →  \text\

Defaults apply only to specific instances of the template, i.e. they are not associated with the name of the placeholder. Thus, one can have different defaults in different places:

>>> loads('${var=a} $var ${var=b}')
'a $var b'
>>> loads('${var=a} $var ${var=b}', var=7)
'7 7 7'

Notice how in the first case the middle instance, which has no default, is left alone, and that one can pass non-string values (they are formatted with str).
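The default-value rules can be reproduced with a small regex-based substitution. The sketch below uses the formal pattern quoted in the footnotes of this section; the function substitute, the use of re.IGNORECASE (mirroring string.Template), and the unescaping step are assumptions made for illustration, and the $$ escape is deliberately not handled.

```python
import re

# The placeholder pattern with optional defaults, as quoted in the footnotes.
PATTERN = re.compile(
    r"\$(?P<brace>{)?(?P<named>[_a-z][_a-z0-9]*)"
    r"(?:|\s*=\s*(?P<paren>\()?(?P<default>(?:[^\\]|\\(?:\\|\)))*?)(?(paren)\)|)\s*)"
    r"(?(brace)}|)",
    re.IGNORECASE,
)


def substitute(text, **values):
    """Replace placeholders by given values, else by their per-instance default."""
    def replace(match):
        name, default = match.group("named"), match.group("default")
        if name in values:
            return str(values[name])
        if default is not None:
            # Undo the escaping of backslashes and closing parentheses.
            return re.sub(r"\\([\\)])", r"\1", default)
        return match.group(0)  # no value and no default: leave untouched

    return PATTERN.sub(replace, text)


print(substitute("${var=a} $var ${var=b}"))
print(substitute("${var=a} $var ${var=b}", var=7))
```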

Using templates to give the values of YAML “variables” allows spelling the default out only once:

defs:
    - &a ${a=26}
    - &b ${b=42}
# later, use the variables:
a nice number: *a
the answer: *b

Footnotes

[16] A literal $ has to be doubled, i.e. $$var → $var, when template substitution is on.

[17] The formal pattern is \$(?P<brace>{)?[_a-z][_a-z0-9]*(?(brace)}|) or, allowing for defaults:

\$(?P<brace>{)?(?P<named>[_a-z][_a-z0-9]*)(?:|\s*=\s*(?P<paren>\()?(?P<default>(?:[^\\]|\\(?:\\|\)))*?)(?(paren)\)|)\s*)(?(brace)}|)