Optimizing Trainable Parameters

Trainable parameters of generative functions are initialized differently depending on the type of generative function. Trainable parameters of the built-in modeling language are initialized with init_param!.
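As a minimal sketch of initialization in the built-in modeling language, the following declares two trainable parameters with `@param` and gives them starting values with init_param!. The function body and the starting values are illustrative assumptions, not taken from this page.

```julia
using Gen

# Hypothetical generative function with trainable parameters :a and :b.
@gen function foo(x::Float64)
    @param a::Float64
    @param b::Float64
    # y is normally distributed around the line a * x + b.
    return @trace(normal(a * x + b, 1.0), :y)
end

# Give each parameter an initial value before the function is executed.
init_param!(foo, :a, 0.0)
init_param!(foo, :b, 1.0)
```

Each call names the generative function that owns the parameter, the parameter's symbol, and its initial value.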

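Gradient-based optimization of the trainable parameters of generative functions interleaves two steps: accumulating gradients into the gradient accumulators of the trainable parameters, and applying a parameter update, described below, which consumes those accumulated gradients.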

Parameter update

A parameter update reads from the gradient accumulators for certain trainable parameters, updates the values of those parameters, and resets the gradient accumulators to zero. A parameter update is constructed by combining an update configuration with the set of trainable parameters to which the update should be applied:

Gen.ParamUpdate (Type)
update = ParamUpdate(conf, param_lists...)

Return an update configured by conf that applies to the set of parameters defined by param_lists.

Each element of param_lists is a pair of a generative function and a vector of its parameter references.

Example. To construct an update that applies a gradient descent update to the parameters :a and :b of generative function foo and the parameter :theta of generative function bar:

update = ParamUpdate(GradientDescent(0.001, 100), foo => [:a, :b], bar => [:theta])

Syntactic sugar for the constructor form above.

update = ParamUpdate(conf, gen_fn::GenerativeFunction)

Return an update configured by conf that applies to all trainable parameters owned by the given generative function.

Note that trainable parameters not owned by the given generative function will not be updated, even if they are used during execution of the function.

Example. If generative function foo has parameters :a and :b, to construct an update that applies a gradient descent update to the parameters :a and :b:

update = ParamUpdate(GradientDescent(0.001, 100), foo)

The set of possible update configurations is described in Update configurations. An update is applied with:

Gen.apply! (Function)
apply!(update::ParamUpdate)

Perform one step of the update.

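The interleaving of gradient accumulation and parameter updates can be sketched as a small training loop. The model, the data, the step size, and the iteration count below are all illustrative assumptions. The sketch uses `generate`, `choicemap`, and `accumulate_param_gradients!` from Gen to produce traces constrained to observed data and to add the gradient of each trace's log probability to the accumulators.

```julia
using Gen

# Hypothetical model: y is normally distributed around a * x + b.
@gen function foo(x::Float64)
    @param a::Float64
    @param b::Float64
    @trace(normal(a * x + b, 1.0), :y)
    return nothing
end

init_param!(foo, :a, 0.0)
init_param!(foo, :b, 0.0)

# Illustrative data lying on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = 2.0 .* xs .+ 1.0

# Update that applies to all trainable parameters owned by foo.
update = ParamUpdate(FixedStepGradientDescent(0.01), foo)

for iter in 1:200
    # Step 1: accumulate gradients from one trace per data point.
    for (x, y) in zip(xs, ys)
        trace, _ = generate(foo, (x,), choicemap((:y, y)))
        accumulate_param_gradients!(trace)
    end
    # Step 2: update the parameter values and reset the accumulators.
    apply!(update)
end
```

Because apply! resets the accumulators to zero, each iteration's update uses only the gradients accumulated since the previous call.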

Update configurations

Gen has built-in support for the following types of update configurations.

conf = FixedStepGradientDescent(step_size)

Configuration for stochastic gradient descent update with fixed step size.

conf = GradientDescent(step_size_init, step_size_beta)

Configuration for stochastic gradient descent update with step size given by (t::Int) -> step_size_init * (step_size_beta + 1) / (step_size_beta + t) where t is the iteration number.

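To see how the GradientDescent step size decays, the schedule above can be evaluated directly. This is a plain-Julia sketch of the formula only, independent of Gen. The helper name `step_size` is hypothetical, and starting the iteration count at t = 1 is an assumption.

```julia
# Step size schedule from the GradientDescent description:
# step_size_init * (step_size_beta + 1) / (step_size_beta + t)
step_size(t::Int, step_size_init, step_size_beta) =
    step_size_init * (step_size_beta + 1) / (step_size_beta + t)

# Evaluate the schedule for the configuration GradientDescent(0.001, 100).
for t in (1, 10, 100, 1000)
    println("t = ", t, "  step size = ", step_size(t, 0.001, 100))
end
```

At t = 1 the factor (step_size_beta + 1) / (step_size_beta + t) equals one, so the step size is step_size_init. Larger values of step_size_beta make the decay slower.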
Gen.ADAM (Type)
conf = ADAM(learning_rate, beta1, beta2, epsilon)

Configuration for ADAM update.

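A minimal sketch of one ADAM update for a single parameter follows. The generative function, the observed value, and the hyperparameter values are illustrative assumptions. The values 0.9, 0.999, and 1e-8 are commonly used ADAM settings and are not defaults stated on this page.

```julia
using Gen

# Hypothetical generative function with one trainable parameter.
@gen function bar(x::Float64)
    @param theta::Float64
    @trace(normal(theta * x, 1.0), :y)
    return nothing
end

init_param!(bar, :theta, 0.0)

# Arguments are learning_rate, beta1, beta2, epsilon (assumed values).
update = ParamUpdate(ADAM(0.001, 0.9, 0.999, 1e-8), bar)

# Accumulate the gradient from one trace constrained to y = 2.0,
# then perform one step of the update.
trace, _ = generate(bar, (1.0,), choicemap((:y, 2.0)))
accumulate_param_gradients!(trace)
apply!(update)
```

The update object is constructed and applied exactly as for the gradient descent configurations. Only the configuration passed to ParamUpdate changes.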

For adding new types of update configurations, see Optimizing Trainable Parameters (Internal).