Closed
Description
Have been discussing with @pkofod how to design optimisers that can be used across Flux, Optim.jl and perhaps others. It seems the basic outline of a design in FluxML/Flux.jl#637 is something that Optim can work with. We're currently looking at splitting this into:
state = init(rule, x)
dx', state' = apply(rule, x, dx, state)
x' = update(x, dx')
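To make the split concrete, here is a minimal sketch of how the three functions might fit together. Only `init`, `apply` and `update` come from the outline above; the `Descent` and `Momentum` rule types, their field names and the `train_step` helper are assumptions made up for illustration, not an agreed API.

```julia
# Hypothetical rule types, for illustration only.
struct Descent
    eta::Float64   # step size
end

struct Momentum
    eta::Float64   # step size
    rho::Float64   # momentum coefficient
end

# init: build whatever state the rule needs for the parameter x.
init(::Descent, x) = nothing      # a stateless rule
init(::Momentum, x) = zero(x)     # velocity starts at zero

# apply: turn the raw gradient dx into a step and return (step, new state).
apply(r::Descent, x, dx, state) = (r.eta .* dx, state)

function apply(r::Momentum, x, dx, velocity)
    velocity = r.rho .* velocity .+ r.eta .* dx
    return velocity, velocity
end

# update: the current default, which subtracts the step.
update(x, dx) = x .- dx

# One step on f(x) = sum(x .^ 2) / 2, whose gradient is x itself.
function train_step(rule, x, state)
    dx = x
    dx2, state2 = apply(rule, x, dx, state)
    return update(x, dx2), state2
end

x = [1.0, -2.0]
rule = Momentum(0.1, 0.9)
state = init(rule, x)
x, state = train_step(rule, x, state)
```

Because the state is threaded through explicitly, nothing here mutates `x` or the rule, which is what would let the same rule be driven from either a Flux training loop or an Optim-style solver.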
Some design goals from my side:
- It should be easy to e.g. specify that structs are optimised by optimising each field.
- It should be easy to specify how custom structs like `Colors` are updated (e.g. clamp the values).
- `apply` should support `state=nothing` optimisers in a generic way.
- We also need an in-place `update!`, but at this level we don't need to do any in-place/out-of-place detection.
- Rules should be composable (e.g. weight decay and ADAM).
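A rough sketch of how some of these goals could be met through dispatch. Everything except the names `init`, `apply` and `update` is an assumption for illustration: `Affine` and `Gray` are made-up structs (`Gray` stands in for a `Colors` type), the gradient of a struct is assumed to arrive as a `NamedTuple` with matching field names, and `Composed`, `WeightDecay` and `Descent` are hypothetical rules.

```julia
# Leaf cases of update: subtract the step.
update(x::Number, dx::Number) = x - dx
update(x::AbstractArray, dx::AbstractArray) = x .- dx

# Generic fallback: update a struct by updating each field and rebuilding it.
# Assumes T has a positional constructor and dx has the same field names.
function update(x::T, dx) where {T}
    vals = (update(getfield(x, f), getfield(dx, f)) for f in fieldnames(T))
    return T(vals...)
end

struct Affine            # hypothetical model struct
    W::Matrix{Float64}
    b::Vector{Float64}
end

struct Gray              # hypothetical stand-in for a colour type
    val::Float64
end

# Custom behaviour for one type: keep the value inside [0, 1].
update(c::Gray, dc::Number) = Gray(clamp(c.val - dc, 0.0, 1.0))

# Two stateless rules (state is `nothing`) and a rule that chains others.
struct Descent; eta::Float64; end
struct WeightDecay; lambda::Float64; end
struct Composed{T<:Tuple}
    rules::T
end

init(::Descent, x) = nothing
init(::WeightDecay, x) = nothing
init(c::Composed, x) = map(r -> init(r, x), c.rules)

apply(r::Descent, x, dx, ::Nothing) = (r.eta .* dx, nothing)
apply(r::WeightDecay, x, dx, ::Nothing) = (dx .+ r.lambda .* x, nothing)

# Feed the output of each rule into the next, collecting the new states.
function apply(c::Composed, x, dx, states)
    new_states = Any[]
    for (rule, st) in zip(c.rules, states)
        dx, st = apply(rule, x, dx, st)
        push!(new_states, st)
    end
    return dx, Tuple(new_states)
end

# Usage
m = update(Affine([1.0 2.0; 3.0 4.0], [1.0, 1.0]),
           (W = fill(0.5, 2, 2), b = [1.0, 0.0]))
g = update(Gray(0.9), -0.5)

rule = Composed((WeightDecay(0.1), Descent(0.5)))
w = [1.0, 2.0]
st = init(rule, w)
dw, st = apply(rule, w, [0.0, 0.0], st)
w = update(w, dw)
```

In this sketch the composed rule keeps one state slot per inner rule, so stateless and stateful rules can be mixed without special cases.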
The current default for `update(x, dx)` is to calculate `x .- dx`; this is convenient for ML but could be changed if it's inconvenient for other things (we'll just do the negation as part of the rule).
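As a sketch of what this default and its in-place counterpart might look like, together with one way a rule could carry the sign itself: `Negate` and `Descent` below are hypothetical names used only for illustration, and `update!` is shown for arrays only.

```julia
# Out-of-place default: subtract the step.
update(x, dx) = x .- dx

# In-place variant for mutable arrays; returns x for convenience.
function update!(x::AbstractArray, dx)
    x .-= dx
    return x
end

# A hypothetical stateless rule.
struct Descent
    eta::Float64
end
init(::Descent, x) = nothing
apply(r::Descent, x, dx, state) = (r.eta .* dx, state)

# A hypothetical wrapper that flips the sign of another rule's output,
# so the sign convention lives in the rule rather than in update.
struct Negate{R}
    rule::R
end
init(n::Negate, x) = init(n.rule, x)
function apply(n::Negate, x, dx, state)
    dx2, state2 = apply(n.rule, x, dx, state)
    return -dx2, state2
end

# Usage: with Negate, the subtracting update moves along +dx.
x = [1.0, 2.0]
rule = Negate(Descent(0.1))
s, _ = apply(rule, x, [1.0, 1.0], init(rule, x))
x_new = update(x, s)

y = [1.0, 2.0]
update!(y, [0.5, 0.5])
```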