Trace Translators

While generative functions define probability distributions on traces, Trace Translators convert from one space of traces to another space of traces. Trace translators are building blocks of inference programs that utilize multiple model representations, like Involutive MCMC.

Trace translators are significantly more general than Bijectors. Trace translators can (i) convert between spaces of traces that include mixed numeric discrete random choices, as well as stochastic control flow, and (ii) convert between spaces for which there is no one-to-one correspondence (e.g. between models of different dimensionality, or between discrete and continuous models). Bijectors are limited to deterministic transformations between real-valued vectors of constant dimension.

Deterministic Trace Translators

Inference programs manipulate traces, but they also keep track of probabilities and probability densities associated with these traces. Suppose we have two generative functions p1 and p2. Given a trace t2 of p2 we can easily compute the probability (or probability density) that the trace would have been generated by p2 using get_score(t2). But suppose we want to construct the trace of p2 first sampling a trace t1 of p1 and then applying a deterministic transformation to that trace to obtain t2. How can we compute the probability that a trace t2 would have been produced by this process? This probability is needed if, for example, p2 defines a probabilistic model and want to use p1 as a proposal distribution within importance sampling. If we produce t2 via an arbitrary deterministic transformation of the random choices in t1, then computing the necessary probability is difficult.

If we restrict ourselves to deterministic transformations that are bijections (one-to-one correspondences) from the set of traces of p1 to the set of traces of p2, then the problem is much simplified. If the transformation is a bijection this means that (i) each trace of p1 gets mapped to a different trace of p2, and (ii) for every trace of p2 there is some trace of p1 that maps to it. Bijective transformations between traces are useful components of inference programs because the probability that a given trace t2 of p2 would have been produced by first sampling from p1 and then applying the transform can be computed simply as the probability that p1 would produce the (unique) trace t1 that gets mapped to the given trace by the transform. Conceptually, bijective trace transforms preserve probability. When trace transforms operate on traces with continuous random choices, computing probability densities of the transformed traces requires computing a Jacobian associated with the continuous part of the transformation.

Gen provides a DSL for expressing bijections between spaces of traces, called the Trace Transform DSL. We introduce this DSL via an example. Below are two generative functions. The first samples polar coordinates and the second uses cartesian coordinates.

@gen function p1()
    r ~ inv_gamma(1, 1)
    theta ~ uniform(-pi/2, pi/2)
end

@gen function p2()
    x ~ normal(0, 1)
    y ~ normal(0, 1)
end

Defining a trace transform with the Trace Transform DSL

The following trace transform DSL program defines a transformation (called f) that transforms traces of p1 into traces of p2:

@transform f (t1) to (t2) begin
    r = @read(t1[:r], :continuous)
    theta = @read(t1[:theta], :continuous)
    @write(t2[:x], r * cos(theta), :continuous)
    @write(t2[:y], r * sin(theta), :continuous)
end

This transform reads values of random choices in the input trace (t1) at specific addresses (indicated by the syntax t1[addr]) using @read and writes values to the output trace (t2) using @write. Each read and write operation is labeled with whether the random choice is discrete or continuous. The section Trace Transform DSL defines the DSL in more detail.

It is usually a good idea to write the inverse of the bijection. The inverse can provide a dynamic check that the transform truly is a bijection. The inverse of the above transformation is:

@transform finv (t2) to (t1) begin
    x = @read(t2[:x], :continuous)
    y = @read(t2[:y], :continuous)
    r = sqrt(x^2 + y^2)
    @write(t1[:r], sqrt(x^2 + y^2), :continuous)
    @write(t1[:theta], atan(y, x), :continuous)
end

We can inform Gen that two transforms are inverses of one another using pair_bijections!:

pair_bijections!(f, finv)

Wrapping a trace transform in a trace translator

Note that the transform DSL code does not specify what the two generative functions are, or what the arguments to these generative functions are. This information will be required for computing probabilities and probability densities of traces. We provide this information by constructing a Trace Translator that wraps the transform along with this transformation:

translator = DeterministicTraceTranslator(p2, (), choicemap(), f)

We then can then apply the translator to a trace of p1 using function call syntax. The translator returns a trace of p2 and a log-weight that we can use to compute the probability (density) of the resulting trace:

t2, log_weight = translator(t1)

Specifically, the log probability (density) that the trace t2 was produced by first sampling t1 = simulate(p1, ()) and then applying the trace translator, is:

get_score(t1) + log_weight

Let's unpack the previous few code blocks in more detail. First, note that we did not pass in the source generative function (p1) or its arguments (()) when we constructed the trace translator. This is because this information will be attached to the input trace t1 itself. We did need to pass in the target generative function (p2) and its arguments (()) however, because this information is not included in t1.

In this case, because continuous random choices are involved, the probabilities are probability densities, and the trace translator used automatic differentiation of the code in the transform f to compute a change-of-variables Jacobian that is necessary to compute the correct probability density of the new trace t2.

Observations

Typically, there are observations associated with one or both of the generative functions involved, and we have values for these in a choice map, so we do not want the trace translator to be responsible for populating the values of these observed random choices. For example, suppose we want to condition p2 on an observed random choice z:

@gen function p2()
    x ~ normal(0, 1)
    y ~ normal(0, 1)
    z ~ normal(x + y, 0.1)
end
observations = choicemap()
observations[:z] = 2.3

When p2 has observations, these can be passed in as an additional argument to the trace translator constructor:

translator = DeterministicTraceTranslator(p2, (), observations, f)

Discrete random choices and stochastic control flow

Trace transforms and trace translators interoperate seamlessly with discrete random choices and stochastic control flow. Both the input and the output traces can contain a mix of discrete and continuous choices, and arbitrary stochastic control flow. Consider the following simple example, where we use two different discrete representations to represent probability distributions the integers 0-7:

@gen function p1()
    bit1 ~ bernoulli(0.5)
    bit2 ~ bernoulli(0.5)
    bit3 ~ bernoulli(0.5)
end

@gen function p2()
    n ~ categorical([0.1, 0.1, 0.1, 0.2, 0.2, 0.15, 0.15])
end

We define the forward and inverse transforms:

@transform f (t1) to (t2) begin
    n = (
        @read(t1[:bit1], :discrete) * 1 +
        @read(t1[:bit2], :discrete) * 2 +
        @read(t1[:bit3], :discrete) * 4)
    @write(t2[:n], n, :discrete)
end

@transform finv (t2) to (t1) begin
    bits = digits(@read(t2[:n], :discrete), base=2)
    @write(t1[:bit1], bits[1], :discrete)
    @write(t1[:bit2], bits[2], :discrete)
    @write(t1[:bit3], bits[3], :discrete)
end

Here is an example that includes discrete random choices, stochastic control flow, and continuous random choices.

@gen function p1()
    if ({:branch} ~ bernoulli(0.5))
        x ~ normal(0, 1)
    else
        other ~ categorical([0.3, 0.7])
    end
end

@gen function p2()
    k ~ uniform_discrete(1, 4)
    if k <= 2
        y ~ gamma(1, 1)
    end
end

Note that transformations between spaces of traces need not be intuitive (although they probably should)! Try to convince yourself that the functions below are indeed a pair of bijections between the traces of these two generative functions.

@transform f (t1) to (t2) begin
    if @read(t1[:branch], :discrete)
        x = @read(t1[:x], :continuous)
        if x > 0
            @write(t2[:k], 2, :discrete)
        else
            @write(t2[:k], 1, :discrete)
        end
        @write(t2[:y], abs(x), :continuous)
    else
        other = @read(t1[:other], :discrete)
        @write(t2[:k], (other == 1) ? 3 : 4, :discrete)
    end
end

@transform finv (t2) to (t1) begin
    k = @read(t2[:k], :discrete)
    if k <= 2
        y = @read(t2[:y], :continuous)
        @write(t2[:x], (k == 1) ? -y : y, :continuous)
    else
        @write(t1[:other], (k == 3) ? 1 : 2, :discrete)
    end
end

General Trace Translators

Note that for two arbitrary generative functions p1 and p2 there may not exist any one-to-one correspondence between their spaces of traces. For example, consider a generative function p1 that samples points within the unit square $[0, 1]^2$

@gen function p1()
    x ~ uniform(0, 1)
    y ~ uniform(0, 1)
end

and another generative function p2 that samples one of 100 possible discrete values, each value representing one cell of the unit square:

@gen function p2()
    i ~ uniform_discrete(1, 10) # interval [(i-1)/10, i/10]
    j ~ uniform_discrete(1, 10) # interval [(j-1)/10, j/10]
end

There is no one-to-one correspondence between the spaces of traces of these two generative functions: The first is an uncountably infinite set, and the other is a finite set with 100 elements in it.

However, there is an intuitive notion of correspondence that we would like to be able to encode. Each discrete cell $(i, j)$ corresponds to a subset of the unit square $[(i - 1)/10, i/10] \times [(j-1)/10, j/10]$.

We can express this correspondence (and any correspondence between two arbitrary generative functions) by introducing two auxiliary generative functions q1 and q2. The first function q1 will take a trace of p1 as input, and the second function q2 will take a trace of p2 as input. Then, instead of a transfomation between traces of p1 and traces of p2 our trace transform will transform between (i) the space of pairs of traces of p1 and q1 and (ii) the space of pairs of traces of p2 and q2. We construct q1 and q2 so that the two spaces have the same size, and a one-to-one correspondence is possible.

For our example above, we construct q2 to sample the coordinate ($[0, 0.1]^2$) relative to the cell. We construct q1 to be empty–there is already a mapping from each trace of p1 to each trace of p2 that simply identifies what cell $(i, j)$ a given point in $[0, 1]^2$ is in, so no extra random choices are needed.

@gen function q1(p1_trace)
end

@gen function q2(p2_trace)
    dx ~ uniform(0.0, 0.1)
    dy ~ uniform(0.0, 0.1)
end

Trace transforms between pairs of traces

To handle general trace translators that require auxiliary probability distributions, the trace trace DSL supports defining transformations between pairs of traces. For example, the following defines a trace transform that maps from pairs of traces of p1 and q1 to pairs of traces of p2 and q2:

@transform f (p1_trace, q1_trace) to (p2_trace, q2_trace) begin
    x = @read(p1_trace[:x], :continuous)
    y = @read(p1_trace[:y], :continuous)
    i = ceil(x * 10)
    j = ceil(y * 10)
    @write(p2_trace[:i], i, :discrete)
    @write(p2_trace[:j], j, :discrete)
    @write(q2_trace[:dx], x - (i-1)/10, :continuous)
    @write(q2_trace[:dy], y - (j-1)/10, :continuous)
end

and the inverse transform:

@transform f_inv (p2_trace, q2_trace) to (p1_trace, q1_trace) begin
    i = @read(p2_trace[:i], :discrete)
    j = @read(p2_trace[:j], :discrete)
    dx = @read(q2_trace[:dx], :continuous)
    dy = @read(q2_trace[:dy], :continuous)
    x = (i-1)/10 + dx
    y = (j-1)/10 + dy
    @write(p1_trace[:x], x, :continuous)
    @write(p1_trace[:y], y, :continuous)
end

which we associate as inverses:

pair_bijections!(f, f_inv)

Constructing a general trace translator

We now wrap the transform above into a general trace translator, by providing the three probabilistic programs p2, q1, q2 that it uses (a reference to p1 will included in the input trace), and the arguments to these functions.

translator = GeneralTraceTranslator(
    p_new=p2,
    p_new_args=(),
    new_observations=choicemap(),
    q_forward=q1,
    q_forward_args=(),
    q_backward=q2,
    q_backward_args=(),
    f=f)

Then, we can apply the trace translator to a trace (t1) of p1 and get a trace (t2) of p2 and a log-weight:

t2, log_weight = translator(t1)

Symmetric Trace Translators

When the previous and new generative functions (e.g. p1 and p2 in the previous example) are the same, and their arguments are the same, and q_forward and q_backward (and their arguments) are also identical, we call this the trace translator a Symmetric Trace Translator. Symmetric trace translators are important because they form the basis of Involutive MCMC. Instead of translating a trace of one generative function to the trace of another generative function, they translate a trace of a generative function to another trace of the same generative function.

Symmetric trace translators have the interesting property that the function f is an involution, or a function that is its own inverse. To indicate that a trace transform is an involution, use is_involution!.

Because symmetric trace translators translate within the same generative function, their implementation uses update to incrementally modify the trace from the previous to the new trace. This has two benefits when the previous and new traces have random choices that aren't modified between them: (i) the incremental modification may be more efficient than writing the new trace entirely from scratch, and (ii) the transform DSL program does not need to specify a value for addresses whose value is not changed from the previous trace.

Simple Extending Trace Translators

Simple extending trace translators extend an existing trace with new random choices sampled from a proposal distribution, as well as any new observations. The arguments of the trace may also be updated. This type of trace translation is the basic operation used in Particle Filtering. For example, we might have a model that sequentially samples new latent variables (:z, t) and observations (:x, t) for each timestep t:

@gen function model(T::Int)
    for t in 1:T
        z = {(:z, t)} ~ normal(0, 1)
        x = {(:x, t)} ~ normal(z, 1)
    end
end

Each time we observe a new (:x ,t), we might want to propose (:z, t) so that it is close in value:

@gen function proposal(trace::Trace, x::Real)
    t = get_args(trace)[1] + 1
    {(:z, t)} ~ normal(x, 1)
end

Suppose we initially generated a trace up to timestep t=1, e.g. by calling t1 = simulate(model, (1,)). Now we observe (:x, 2) to be 5.0. By constructing a simple extending trace translator, we can simultaneously update the trace t1 with new arguments, introduce the new observation at (:x, 2), and propose a likely value for (:z, 2):

translator = SimpleExtendingTraceTranslator(
    p_new_args=(2,), p_argdiffs=(UnknownChange(),),
    new_observations=choicemap((:x, 2) => 5.0),
    q_forward=proposal, q_forward_args=(5.0,))
t2, log_weight = translator(t1)

Similar functionality can be achieved through a combination of propose on the proposal and update on the original trace, but using a trace translator provides a nice layer of abstraction.

Trace Transform DSL

The Trace Transform DSL is a differentiable programming language for writing deterministic transformations of traces. Programs written in this DSL are called transforms. Transforms read the value of random choices from input trace(s) and write values of random choices to output trace(s). These programs are not typically executed directly by users, but are instead wrapped into one of the several forms of trace translators listed above (GeneralTraceTranslator, and SymmetricTraceTranslator).

A transform is identified with the @transform macro, and uses one of the following four syntactic forms for the signature (the name of the transform, and the names of the input and output traces are all user-defined varibles; the only keywords are @transform, to, begin, and end):

A transform from one trace to another, without extra parameters

@transform f t1 to t2 begin
    ...
end

A transform from one trace to another, with extra parameters

@transform f(x, y, ..) t1 to t2 begin
    ...
end

A transform from pairs of traces to pairs of traces, without extra parameters

@transform f (t1, t2) to (t3, t4) begin
    ...
end

A transform from one trace to another, with extra parameters

@transform f(x, y, ..) (t1, t2) to (t3, t4) begin
    ...
end

The extra parameters are optional, and can be used to pass arguments to a transform function that is invoked, from another transform function, using the @tcall macro. For example:

@transform g(x) t1 to t2 begin
    ...
end
@transform f t1 to t2 begin
    x = ..
    @tcall(g(x))
end

The callee transform function (g above) reads and writes to the same input and output traces as the caller transform function (f above). Note that the input and output traces can be assigned different names in the caller and the callee.

The body of a transform reads the values of random choices at addresses in the input trace(s), performs computation using regular Julia code (provided this code can be differentiated with ForwardDiff.jl) and writes valeus of random choices at addresses in the output trace(s). In the body @read expressions read a value from a particular address of an input trace:

val = @read(<source>, <type-label>)

where <source> is an expression of the form <trace>[<addr>] where <trace> must be the name of an input trace in the transform's signature. The <type-label> is either :continuous or :discrete, and indicates whether the random choice is discrete or continuous (in measure-theoretic terms, whether it uses the counting measure, or a Lebesgue-measure a Euclidean space of some dimension). Similarly, @write expressions write a value to a particular address in an output trace:

@write(<destination>, <value>, <type-label>)

Sometimes trace transforms need to directly copy the value from one address in an input trace to one address in an output trace. In these cases, it is recommended to use the explicit @copy expression:

@copy(<source>, <destination>)

where <source> and <destination> are of the form <trace>[<addr>] as above. Note you can also copy collections of multiple random choices under an address namespace in an input trace to an address namespace in an output trace. For example,

@copy(trace1[:foo], trace2[:bar])

will copy every random choice in trace1 with an address of the form :foo => <rest> to trace2 at address :bar => <rest>.

It is also possible to read the return value from an input trace using the following syntax, but this value must be discrete (in the local neighborhood of traces, the return value must be constant as a function of all continuous random choices in input traces):

val = @read(<trace>[], :discrete)

This feature is useful when the generative function precomputes a quantity as part of its return value, and we would like to reuse this value, instead of having to recompute it as part of the transform. The `discrete' requirement is needed because the transform DSL does not currently backpropagate through the return value (this feature could be added in the future).

Tips for defining valid transforms:

If you find yourself copying the same continuous source address to multiple locations, it probably means your transform is not valid (the Jacobian matrix will have rows that are identical, and so the Jacobian determinant will be zero).
You can gain some confidence that your transform is valid by enabling dynamic checks (check=true) in the trace translator that uses it.