In TransformerModel's __init__ method we have
    self.decoder = nn.Linear(ninp, ntoken)
and then
    self.init_weights()
where
    def init_weights(self):
        initrange = 0.1
        nn.init.uniform_(self.encoder.weight, -initrange, initrange)
        nn.init.zeros_(self.decoder)
        nn.init.uniform_(self.decoder.weight, -initrange, initrange)
The nn.init.zeros_(self.decoder) line raises the error:
    AttributeError: 'Linear' object has no attribute 'zero_'
As a workaround I commented out the nn.init.zeros_(self.decoder) line, but I don't know how much skipping the zero initialization affects the model's performance.
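
For reference, here is a minimal sketch of what I assume the line was meant to do. nn.init.zeros_ operates in place on a tensor, so passing the nn.Linear module itself fails; my guess (an assumption, not confirmed by the example's authors) is that the intended target was the decoder's bias. The sizes ninp and ntoken below are hypothetical values chosen only to make the snippet run on its own.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
ninp, ntoken = 8, 20
initrange = 0.1

decoder = nn.Linear(ninp, ntoken)

# nn.init.zeros_ needs a tensor (it zeroes it in place), not a module.
# Assumption: the bias is what the original line intended to zero.
nn.init.zeros_(decoder.bias)
nn.init.uniform_(decoder.weight, -initrange, initrange)

print(decoder.bias.abs().max().item())
print(decoder.weight.abs().max().item() <= initrange)
```

If the line is simply commented out, the bias keeps nn.Linear's default initialization, which draws small values uniformly from a range that depends on the input size, so the model should still train, just from a slightly different starting point.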