Closed
Description
The LAMB optimizer declares that its `exclude_from_weight_decay` argument should take a comma-separated string of regex patterns. However, the code expects a list of regex patterns, so when given a string it iterates over each character and treats it as its own pattern. Since a single-character pattern matches almost any parameter name, nearly every call to `_do_use_weight_decay()` returns False.
I tried passing in a list to work around this bug, but that raises a typeguard error because the argument is annotated as a string. So there is no easy workaround in the meantime. A similar bug exists for `exclude_from_layer_adaptation`.
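The behaviour described above can be reproduced without TensorFlow. The following is a minimal sketch: `do_use_weight_decay` is a hypothetical stand-in written for illustration, assuming the optimizer loops over the argument and calls `re.search` on each element, as the description implies. It is not the library's code.

```python
import re


def do_use_weight_decay(param_name, exclude_from_weight_decay):
    """Hypothetical stand-in: return False if any pattern matches the name."""
    if exclude_from_weight_decay:
        # Iterating a str yields single characters, not whole patterns.
        for pattern in exclude_from_weight_decay:
            if re.search(pattern, param_name) is not None:
                return False
    return True


patterns_as_string = "bias,layer_norm"       # what the signature asks for
patterns_as_list = ["bias", "layer_norm"]    # what the loop actually needs

# With a string, the one-character pattern "s" matches "dense/kernel",
# so the kernel is wrongly excluded from weight decay.
string_result = do_use_weight_decay("dense/kernel", patterns_as_string)

# With a list, neither full pattern matches, so weight decay is applied.
list_result = do_use_weight_decay("dense/kernel", patterns_as_list)

print(string_result, list_result)
```

With the string argument, any parameter name that shares even one character with the pattern string is excluded, which is why almost nothing receives weight decay.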
Two proposed fixes, and I'm happy to contribute either:
- Change the declared datatype from `Optional[str]` to `List[str]`. This would be preferred, and would match the style of other implementations in the TensorFlow repo. See here for a list of examples. My PR is open at Fix LAMB optimizer regex parsing #1532.
- Add `.split(',')` to `exclude_from_weight_decay` and `exclude_from_layer_adaptation` in the constructor.
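To make the two options concrete, here is a rough sketch of a normalization step a constructor could apply. The helper name `normalize_patterns` is hypothetical and is not taken from the PR or from the library. It accepts either form only for the sake of illustration; the preferred fix above would simply annotate the argument as a list.

```python
from typing import List, Optional, Union


def normalize_patterns(
    patterns: Union[str, List[str], None]
) -> Optional[List[str]]:
    """Hypothetical helper: always hand the optimizer a list of patterns."""
    if patterns is None:
        return None
    if isinstance(patterns, str):
        # Second option: split the comma-separated string into patterns,
        # dropping surrounding whitespace and empty entries.
        return [p.strip() for p in patterns.split(",") if p.strip()]
    # First option: the caller already passed a list of patterns.
    return list(patterns)


print(normalize_patterns("bias, layer_norm"))
print(normalize_patterns(["bias", "layer_norm"]))
```

Either way, the loop inside the optimizer then sees whole regex patterns rather than single characters.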