Clipppy Users’ Guide
Clipppy’s features fall into two broad categories: a configuration system and runtime utilities. Whereas the latter are aimed mainly at probabilistic programming (PP) and variational inference (VI), the former is meant to be as general as possible in its core while having specific “shortcuts” useful in PP and VI.
YAML configuration system
Clipppy’s configuration semantics were derived from its venerable predecessor pyrofit, which used YAML files with a more or less predefined structure to define VI models (and guides, etc.). In contrast, Clipppy’s YAML “format” is general and can in principle represent the initialisation of any Python object (and even a bit beyond). The backbone of YAML parsing in Clipppy is ruamel.yaml, itself derived from PyYAML, and the very interested reader is referred to their documentation and to the official YAML specification. The less patient among you might be interested in any of a number of YAML tutorials on the web, while if you just want to start off with Clipppy, simply read on.
The main entry point functions for YAML parsing in Clipppy are clipppy.loads (general purpose) and clipppy.load_config (for VI). The examples in this guide assume the YAML is loaded via the former, which makes no prior assumptions on the overall structure of the document (while the latter by default interprets it as a clipppy.Clipppy object) and works with plain strings (whereas load_config requires a path, pathname, or text stream as input).
Basic YAML
YAML is a markup language that extends JSON. Hence, any valid JSON, like {"answer": 42, "foo": [3.14, "euler's number", {"question": null, "whatever": {"maybe": true, "but actually": false}}]} is valid YAML. This basic variant (which is almost directly usable as a Python literal) allows the description of arbitrary primitive objects: numbers, strings, arrays (Python lists), and dictionaries. YAML also allows a modified syntax, where one is permitted to
- ditch the quotation marks, since everything that doesn’t look like a number (and isn’t a literal like true, false, null) is interpreted as a string;
- ditch the brackets as long as one uses indentation: everything more indented with respect to the parent by the same amount is on the same nesting level;
- use a bullet-point-like style for lists (using -) instead of square brackets.
Thus, in YAML the above example may be rewritten as
answer: 42
foo:
  - 3.14
  - euler's number
  - question: null
    whatever: {maybe: true, but actually: false}
which is, arguably, way more pleasant to look at. Note that one can still use (even partially well-formatted) JSON for any node.
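Because the JSON spelling is valid YAML, Python's standard json module is enough to show the structure that both spellings describe. This is only an illustration of the resulting Python object; it does not go through Clipppy or ruamel.yaml.

```python
import json

# The JSON spelling of the example; being valid JSON, it is also valid YAML.
document = """
{"answer": 42,
 "foo": [3.14, "euler's number",
         {"question": null,
          "whatever": {"maybe": true, "but actually": false}}]}
"""

parsed = json.loads(document)

# null/true/false arrive as None/True/False, arrays as lists,
# and mappings as dictionaries.
print(parsed["answer"])
print(parsed["foo"][2])
```

The third element of foo is the mapping that the YAML version spells with the indented question/whatever pair.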
Note
YAML also permits using lists (will be converted to tuples) as keys in a dictionary:
[a string, 26]: value
However, the usual hashability rules of Python apply, so dictionaries are not allowed inside keys. Also, since non-string keys are not used in function definitions, this feature is discouraged in Clipppy, and no guarantees are made that it will indeed be allowed forever.
Advanced YAML
On top of syntactic sugar, YAML comes with some useful additional features. One of them is the ability to name and subsequently reference nodes. The syntax is inspired by C’s pointers:
a: &name {key1: value1, key2: value2}
b: *name
Using &name defines the “variable”, and *name “dereferences” it. The pointer language is accurate here since in the parsed object, the two nodes will be converted to references to the same object, so parsed['a'] is parsed['b'] will evaluate to True in Python. Since this is a standard feature of ruamel.yaml, Clipppy’s machinery is bypassed when dereferencing, which might be surprising to someone who uses YAML references as a way to avoid duplicating code and doesn’t really mean to have the same object.
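The consequence of sharing one object can be sketched in plain Python. The dictionary below is a hand-built stand-in for what the parser is described to return for the anchor/alias example; no YAML loader is involved.

```python
import copy

# Stand-in for the parsed result of
#   a: &name {key1: value1, key2: value2}
#   b: *name
shared = {"key1": "value1", "key2": "value2"}
parsed = {"a": shared, "b": shared}

# Both keys refer to the very same object ...
same_object = parsed["a"] is parsed["b"]

# ... so a change made through one key is visible through the other.
parsed["a"]["key1"] = "changed"
seen_through_b = parsed["b"]["key1"]

# If independent objects are wanted, copy explicitly after loading.
independent = copy.deepcopy(parsed["a"])
independent["key2"] = "only here"
```

If the alias was meant purely as a way to avoid repeating text, an explicit copy in user code (as in the last lines) is one way to get separate objects.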
Scopes
The directives !py and !eval advertise giving you access to arbitrary Python (objects) from inside the YAML configuration and therefore need to resolve variable names. The scope in which this is done is kept in ClipppyYAML.constructor.scope[8]. By default Clipppy makes every[citation needed] effort to simulate the scoping “rule” of eval/exec, i.e. to “execute” the YAML in the local scope from which loads/clipppy.load_config or ClipppyYAML.load is called:
>>> a = 'spam, baked beans, and spam'
>>> clipppy.loads('!py:str.replace [!py:a , baked beans, spam]')
'spam, spam, and spam'
(Note the space after !py:a. It is there because (a) we don’t want to call a, and (b) a space is required after every tag in YAML.)
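How a loader could resolve names in the caller's scope can be sketched with the standard library alone. The helpers caller_scope and resolve are hypothetical names invented for this illustration, and the sketch is not Clipppy's actual implementation; it only mimics the lookup order locals, globals, builtins.

```python
import builtins
import sys
from collections import ChainMap


def caller_scope(depth=1):
    """Snapshot the locals, globals and builtins of a calling frame (hypothetical helper)."""
    frame = sys._getframe(depth)
    return ChainMap(dict(frame.f_locals), frame.f_globals, vars(builtins))


def resolve(dotted_name, scope):
    """Look up e.g. 'str.replace': the first name in the scope, the rest via getattr."""
    head, *attributes = dotted_name.split(".")
    obj = scope[head]
    for attribute in attributes:
        obj = getattr(obj, attribute)
    return obj


def demo():
    a = "spam, baked beans, and spam"
    scope = caller_scope()  # depth=1: the frame of demo itself
    # Mirrors `!py:str.replace [!py:a , baked beans, spam]`
    function = resolve("str.replace", scope)
    return function(resolve("a", scope), "baked beans", "spam")


print(demo())
```

Here a is found among the locals of demo, while str falls through to the builtins, which is the behaviour the example above relies on.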
To achieve this, every invocation of ClipppyYAML.load by default collects the locals, globals, and builtins from the appropriate frame and saves them to ClipppyYAML.constructor.scope. The scope may then be updated by !import directives, and these updates will leak to the caller. This is probably best illustrated with an explicitly given scope:
>>> scope = {}
>>> clipppy.loads('!import numpy as np', scope=scope) # None
>>> scope
{'np': <module 'numpy' from '.../numpy/__init__.py'>}
>>> clipppy.loads('!import jax.numpy as np', scope=scope) # None
>>> scope
{'np': <module 'jax.numpy' from '.../jax/numpy/__init__.py'>}
>>> 'jax.numpy' in sys.modules
True
but the same thing happens when using the default “current” scope:
>>> clipppy.loads('!import torch') # uses the current scope
>>> torch
<module 'torch' from ...>
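The effect of !import on an explicit scope can be imitated with importlib. The function import_into is a hypothetical helper written for this sketch (it is not part of Clipppy), and standard-library modules stand in for numpy and jax.numpy.

```python
import importlib
import sys


def import_into(scope, module_name, alias=None):
    """Mimic `import module_name [as alias]`, binding the result in `scope`."""
    module = importlib.import_module(module_name)
    if alias is None:
        # A plain `import a.b` binds the top-level package `a`.
        top_level = module_name.partition(".")[0]
        scope[top_level] = sys.modules[top_level]
    else:
        scope[alias] = module


scope = {}
import_into(scope, "math", alias="m")
first_binding = scope["m"]

# Importing under the same alias rebinds the name, just as in Python.
import_into(scope, "cmath", alias="m")
second_binding = scope["m"]

# Without an alias, the top-level package name is bound.
import_into(scope, "os.path")
```

As in the session above, the dictionary passed in is modified in place, so the caller sees every binding made during loading.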
On top of that scope, ClipppyYAML installs a custom ClipppyYAML.constructor.builtins that consists of the usual __builtins__ and the global scope of clipppy.yaml. The latter is kept for compatibility and as an easy way to get numpy, torch, and the majority of the clipppy API registered, even though the “full” API is then explicitly registered in this ClipppyYAML.constructor.builtins scope.
Note
If invoked from within YAML, e.g. via !py:locals [] or !py:globals [], the built-in locals and globals functions return the respective scopes for some function inside clipppy.yaml instead of something more meaningful[9]. The way to get at the “correct” scope, which !import imports in, is via eval-uating locals/globals as a Python call: !eval locals(), which will return ClipppyYAML.constructor.scope as, currently, a ChainMap. Remember, though, that !py operations essentially transpire in this scope anyway.
Footnotes
[8] This attribute is unconditionally overwritten on each load, so setting it directly will not have an effect on YAML loading. What it is set to, though, is controlled by the scope function parameter, which is your chance of controlling the YAML “globals” scope; in particular, if you want to “hide” the caller scope from the YAML for some reason (speed?), pass an empty dictionary.
[9] This might point you to why loading YAML is considered “unsafe” and why ruamel.yaml operates in a “safe” mode, turning which off is the first order of business for ClipppyYAML.
From Node to Signature
Magic Keys
There are only three “magic keys”. Since YAML does not allow mixing sequence and mapping nodes, while in Python this is common practice, and also to cover the case of positional-only parameters, Clipppy needs a positional argument indicator key. Furthermore, since it is common to want to expand some generated parameter or maybe use the same object as a monolithic sequence in one place and as individual items in another, Clipppy defines positional and keyword expansion “operators” corresponding to the Python parameter expansion syntax */**.
/
Use the value as a positional argument. Can be used at any point (even after keywords, contrary to the Python grammar).
<
Expand the value into positional arguments. A simple use case would be some xy coordinates as an \(N \times 2\) array that need to be expanded into two arrays of length \(N\):
- &pts [[0, 0], [1, 1], [26, 42]]
...
- !py:matplotlib.pyplot.plot
  <: !py:np.transpose [*pts]
which corresponds to the very similar Python code
plt.plot(*np.transpose(pts))
Note
If you try this example with
ruamel.yaml<=0.17.4(or maybe even higher), this will (may) not work! The reason is that there is no (not-too-hacky) way to force depth-first construction if using an optimised C-based loader/parser/constructor, and the current implementation returns an empty list as the value of the referenced node when the!py:np.transpose-tagged node requires it. To solve this, tag the whole document with!py:listfor example, which will transfer control toClipppyYAMLfrom the beginning (and make the document a one-element sequence as per the requirement oflist… See, I told you: hacky!).This highlights a fundamental design choice of Clipppy: in order to provide sensible insight using type hints, construction has to be depth first and recursive (hence, Python’s stack depth limitation applies to Clipppy YAML files). In contrast, simple collection assembly can live with breadth-first construction and a subsequent population using further placeholders, etc.
Deprecated since version 0: Initially, the key for positional expansion was __args, but this should not be used anymore.
<<
Expand the value into keyword arguments. This “merge type” is actually present in the officially recommended YAML type system. Clipppy needs to merge eagerly, though, in order to be able to tag the nodes, so this key is handled specially. Otherwise, it does what you would expect: merges the named mapping into its parent, overwriting any already present keys. In this regard
!py:func {<<: *map1, <<: *map2, ...}
behaves more like
func(**{**map1, **map2, ...})
than
func(**map1, **map2, ...)
which would throw an exception for repeated keys. The same overwriting rule applies to keys not from expanded mappings.
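The difference between the two Python spellings is easy to check directly. The function func and the two mappings below are made up for the demonstration.

```python
def func(**kwargs):
    """Return the keyword arguments it was called with."""
    return kwargs


map1 = {"x": 1, "y": 2}
map2 = {"y": 20, "z": 30}

# Merging first lets later mappings overwrite earlier keys,
# which is the behaviour described for the << key.
merged = func(**{**map1, **map2})

# Expanding both mappings in the call itself fails on the repeated key.
duplicate_error = None
try:
    func(**map1, **map2)
except TypeError as error:
    duplicate_error = error

print(merged)
print(duplicate_error)
```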
Magic keys can be freely mixed and matched, used multiple times, etc. The order of evaluation of the nodes/parameters follows strictly the definition order in the YAML, just as it follows the definition order in a Python call (important for side effects and defining anchors). Here’s an example:
!py:f
<: [!eval 22/7]
euler: 2.72
<: [0, 1, 1, 2, 3, 5, 8]
euler: 2.71828
<<: {euler: !py:math.exp [1],
     pi: !py:math.pi , phi: !py:mpmath.phi }  # spaces!
may be used with a signature like
def f(not_pi, *fibonacci, euler, **exact): ...
and will result in the following locals:
{'not_pi': 3.142857142857143,
'euler': 2.718281828459045,
'fibonacci': (0, 1, 1, 2, 3, 5, 8),
'exact': {'pi': 3.141592653589793,
'phi': <Golden ratio phi: 1.61803~>}}
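The rules for the three magic keys can be condensed into a few lines of Python. The function build_call is a hypothetical illustration of the described semantics, not Clipppy's code: it receives the mapping entries in document order as (key, value) pairs, since a plain dict could not hold the repeated keys. The golden ratio is computed with math to keep the sketch free of third-party packages (the original uses mpmath.phi).

```python
import math


def build_call(items):
    """Turn (key, value) pairs, in document order, into (args, kwargs)."""
    args, kwargs = [], {}
    for key, value in items:
        if key == "/":        # single positional argument
            args.append(value)
        elif key == "<":      # expand into positional arguments
            args.extend(value)
        elif key == "<<":     # merge into keywords, overwriting
            kwargs.update(value)
        else:                 # ordinary keyword, later ones overwrite
            kwargs[key] = value
    return args, kwargs


def f(not_pi, *fibonacci, euler, **exact):
    return locals()


items = [
    ("<", [22 / 7]),
    ("euler", 2.72),
    ("<", [0, 1, 1, 2, 3, 5, 8]),
    ("euler", 2.71828),
    ("<<", {"euler": math.exp(1), "pi": math.pi,
            "phi": (1 + math.sqrt(5)) / 2}),
]

args, kwargs = build_call(items)
result = f(*args, **kwargs)
print(result)
```

The first positional value lands in not_pi, the remaining ones in fibonacci, and the last euler wins because the merge overwrites the two earlier keyword values.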
The Power of Type Hints
Type hints in Python are the best![12] They are completely ignored at runtime, so they don’t limit you in any way, but are still tremendously helpful in static analysis and allow IDEs to spot errors in your code before you run it. They help clarify the meaning of parameters and properties and contribute to automatic documentation generation. Even though the language ignores type hints, they are not completely “lost” as are the types of compiled languages: “annotations” can be freely examined by the program using the standard typing and inspect modules. Basically, they are free information that the software designer gives to the program without any obligation. As such, type hints are often the basis of “smart” functionality, such as in the dataclasses module, and in Clipppy, which tries to be smart and save you some typing in YAML if you have gone to the trouble of writing properly annotated Python code.
Clipppy needs to invoke Python functions with arguments coming from YAML in order to construct complex data structures beyond simple containers (sequences and mappings). Sometimes the inputs are themselves complex structures, and so the YAML parser needs to be informed further of the way to form them from simpler data, and so on. However, the original function knows what data to expect, and the constructors of complex structures know what primitives they need, or at least the programmer who wrote them does. Thus, if they provided this information as type hints, Clipppy can try to automatically determine the processing needed in the middle between primitives and the final call signature.
Take the following typical Clipppy configuration as an example:
guide:
  - cls: MultivariateNormalSamplingGroup
    name: main
    match: main/.*
  - cls: DiagonalNormalSamplingGroup
    name: others
To an outside observer this is just a one-key mapping, and the one value is a list of two further mappings with some strings. No tags or further information are provided. However, as we said, Clipppy can automatically assume that this whole YAML represents a Clipppy object, and so automatically tag it[13] with !py:Clipppy. The node, thus, represents a call to the constructor of Clipppy with an argument guide, so Clipppy inspects it for further information. In an ideal world, such as the one we live in, the guide parameter would be annotated with Guide so that the parser can tag it with !py:clipppy.guide.guide.Guide (it’s a mouthful, but that’s qualified names for you; also, that’s why we want automation, right?). Next, the constructor for Guide reads
def __init__(self, *specs: GroupSpec, model=None, name=''): ...
so the parser expands the sequence node into this signature and realises that both elements should be instances of GroupSpec, whose constructor might be (it was, it’s not anymore)
def __init__(
        self,
        cls: Type[SamplingGroup] = DeltaSamplingGroup,
        match: Union[str, re.Pattern] = _allmatch,
        exclude: Union[str, re.Pattern] = _nomatch,
        name='', *args, **kwargs): ...
Here, even though name is not annotated, Clipppy will consider the type of the default value, in line with most type checkers. However, a str is not particularly interesting since scalar nodes are by default strings. The match is a Union for convenience and is explicitly converted to a re.Pattern in the body of the function. Sadly, Clipppy cannot handle Unions yet, so it leaves the match node alone[14]. Finally, for the cls parameter, meant to indicate the subtype of SamplingGroup to use, Clipppy assumes that the node is a name of a class / Python object to pass. The node is then tagged with !py:VALUE, where VALUE is the original content[15]. Clipppy does that for all Type or typing.Callable|collections.abc.Callable annotations, so if you want to pass something other than a name, you should put an explicit annotation.
Depending on ClipppyConstructor.strict_node_type, which is True by default, Clipppy enforces the types of nodes versus what it expects from an annotation: that callable / string parameters are represented as scalar nodes and that builtin sequences / mappings are, respectively, sequences / mappings.
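The kind of inspection described in this section can be approximated with inspect and typing. The sketch below is not Clipppy's implementation: the function plan, the stub classes, and the string defaults for match and exclude (standing in for _allmatch and _nomatch) are assumptions made for illustration. It only classifies each parameter the way the text describes.

```python
import collections.abc
import inspect
import re
from typing import Type, Union, get_origin, get_type_hints


class SamplingGroup: ...
class DeltaSamplingGroup(SamplingGroup): ...


class GroupSpec:
    def __init__(self,
                 cls: Type[SamplingGroup] = DeltaSamplingGroup,
                 match: Union[str, re.Pattern] = ".*",
                 exclude: Union[str, re.Pattern] = "",
                 name="", *args, **kwargs): ...


def plan(func):
    """Classify each named parameter of func by its hint (or default)."""
    hints = get_type_hints(func)
    result = {}
    for name, parameter in inspect.signature(func).parameters.items():
        if name == "self" or parameter.kind in (
                parameter.VAR_POSITIONAL, parameter.VAR_KEYWORD):
            continue
        hint = hints.get(name)
        if hint is None and parameter.default is not parameter.empty:
            hint = type(parameter.default)  # fall back to the default's type
        origin = get_origin(hint)
        if origin is Union:
            result[name] = "left alone"      # Unions are not handled
        elif origin in (type, collections.abc.Callable):
            result[name] = "Python name"     # scalar names an object
        else:
            result[name] = getattr(hint, "__name__", "unknown")
    return result


print(plan(GroupSpec.__init__))
```

For the GroupSpec stub this reports cls as a Python name, match and exclude as left alone, and name as a str inferred from its default.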
Finally, the original YAML is perceived as
!py:Clipppy
guide: !py:clipppy.guide.guide.Guide
  - !py:clipppy.guide.sampling_group.SamplingGroup
    cls: !py:MultivariateNormalSamplingGroup
    name: main
    match: main/.*
  - !py:clipppy.guide.sampling_group.SamplingGroup
    cls: !py:DiagonalNormalSamplingGroup
    name: others
Footnotes
[12] But they’re soon getting worse (PEP 563)… :/
[13] This only applies to loading with interpret_as_Clipppy, as discussed above. Note that Clipppy will never interfere with your code if you’re explicit and do put tags in, unless they are the standard ones <tag:yaml.org,2002:str>, <...:seq>, <...:map>, which are actually auto-assigned based on the node type.
[14] Even if the annotation were a plain re.Pattern, it wouldn’t work directly. Clipppy may be smart, but how is it to know that the constructor raises a TypeError: cannot create 're.Pattern' instances when called directly, or that its signature checks out as (), i.e. nothing?! Maybe the developer knows that, though, and also that Patterns are constructed via re.compile. They can then help Clipppy by registering a type-to-tag mapping in ClipppyConstructor.type_to_tag as
ClipppyConstructor.type_to_tag[re.Pattern] = '!py:re.compile'
to replace the default cls -> '!py:{cls.__module__}.{cls.__qualname__}'. Then a function like f(a: re.Pattern) can be safely “called” as !py:f [(meta-)*regex golf] and will be passed re.compile('(meta-)*regex golf').
[15] For now no checks for inheritance / signature constraints or types of container elements are performed by Clipppy, so this has to be handled in user code.
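The type-to-tag lookup mentioned in footnote 14 can be pictured as a dictionary with a formatted fallback. In the sketch below, the module-level dictionary type_to_tag and the function tag_for are hypothetical stand-ins for ClipppyConstructor.type_to_tag and whatever Clipppy does internally; only the default format string and the re.Pattern registration are taken from the footnote.

```python
import collections
import re

# Hypothetical stand-in for ClipppyConstructor.type_to_tag.
type_to_tag = {re.Pattern: "!py:re.compile"}


def tag_for(cls):
    """Return the registered tag, or the default module-qualified one."""
    return type_to_tag.get(cls, f"!py:{cls.__module__}.{cls.__qualname__}")


# Calling the type directly does not work, hence the registration.
direct_call_error = None
try:
    re.Pattern()
except TypeError as error:
    direct_call_error = error

print(tag_for(re.Pattern))
print(tag_for(collections.OrderedDict))
print(direct_call_error)
```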
Templating
Clipppy includes rudimentary templating functionality built on top of string.Template. Placeholders are valid Python identifiers introduced by a $[16] and optionally delimited by braces[17]: $_var123, ${var}. Replacement strings are given as keywords to ClipppyYAML.load, load_config, and loads, e.g.
>>> loads('[$var, ${var}, $var_123]', var='rep', var_123='rep_123')
['rep', 'rep', 'rep_123']
Template substitution is activated anytime “excess” keyword arguments are given, or when the force_templating argument to ClipppyYAML.load / load_config / loads is True (it is by default).
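Since the templating is built on string.Template, the basic behaviour can be tried with the standard class alone. Which of substitute or safe_substitute Clipppy calls internally is not stated here; safe_substitute is used in this sketch as an assumption, because it leaves unknown placeholders untouched.

```python
from string import Template

# Plain and brace-delimited placeholders are filled from keywords.
text = Template("[$var, ${var}, $var_123]")
filled = text.safe_substitute(var="rep", var_123="rep_123")

# Placeholders without a replacement are left as they are.
partial = Template("$known and $unknown").safe_substitute(known=1)

# A doubled dollar sign yields a literal one and is not a placeholder.
escaped = Template("$$var").safe_substitute(var="x")

print(filled)
print(partial)
print(escaped)
```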
More usefully, templates can have defaults, specified as:
${var = default text } → default text
In this case the {} are mandatory, and surrounding whitespace is stripped. The default can be enclosed in parentheses (necessary when it contains a closing brace or to preserve surrounding whitespace):
${var = (␣{text}␣␣) } → ␣{text}␣␣
Inside the default text a backslash and a closing parenthesis are escaped:
${var = f(x\)} → f(x)
${var = f(x)} → f(x) # OK because doesn't start with "("
${var = (f(x\))} → f(x) # here, though, it's necessary
${var = \\text\\} → \text\
Defaults apply only to specific instances of the template, i.e. they are not associated with the name of the placeholder. Thus, one can have different defaults in different places:
>>> loads('${var=a} $var ${var=b}')
'a $var b'
>>> loads('${var=a} $var ${var=b}', var=7)
'7 7 7'
Notice how in the first case the middle instance, which has no default, is left alone, and that one can pass non-string values (they are formatted with str).
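The default-handling rules can be reproduced with the pattern quoted in footnote 17. The function substitute below is a hypothetical re-implementation for illustration only: it applies that pattern case-insensitively (an assumption borrowed from string.Template), does not treat a doubled $ specially, and is not Clipppy's actual code.

```python
import re

# The "with defaults" pattern from footnote 17, split over three lines.
PLACEHOLDER = re.compile(
    r"\$(?P<brace>{)?(?P<named>[_a-z][_a-z0-9]*)"
    r"(?:|\s*=\s*(?P<paren>\()?(?P<default>(?:[^\\]|\\(?:\\|\)))*?)(?(paren)\)|)\s*)"
    r"(?(brace)}|)",
    re.IGNORECASE,
)


def substitute(text, **values):
    """Fill placeholders from values, else from their own default, else keep them."""
    def replace(match):
        name, default = match["named"], match["default"]
        if name in values:
            return str(values[name])      # non-strings are formatted with str
        if default is not None:
            # Undo the escaping of backslash and closing parenthesis.
            return re.sub(r"\\([\\)])", r"\1", default)
        return match[0]                   # no value, no default: leave alone
    return PLACEHOLDER.sub(replace, text)


print(substitute("${var=a} $var ${var=b}"))
print(substitute("${var=a} $var ${var=b}", var=7))
print(substitute(r"${var = (f(x\))}"))
```

Because the default is captured per match, each occurrence of a placeholder carries its own fallback, which is why the first call yields different text in the first and last positions.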
Using templates to give the values of YAML “variables” allows spelling the default out only once:
defs:
- &a ${a=26}
- &b ${b=42}
# later, use the variables:
a nice number: *a
the answer: *b
Footnotes
[16] A literal $ has to be doubled, i.e. $$var → $var, when template substitution is on.
[17] The formal pattern is
\$(?P<brace>{)?[_a-z][_a-z0-9]*(?(brace)}|)
or, allowing for defaults:
\$(?P<brace>{)?(?P<named>[_a-z][_a-z0-9]*)(?:|\s*=\s*(?P<paren>\()?(?P<default>(?:[^\\]|\\(?:\\|\)))*?)(?(paren)\)|)\s*)(?(brace)}|)