https://github.com/pytorch/tutorials/blob/master/beginner_source/transformer_tutorial.py#L89
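Since you are using `CrossEntropyLoss`, which already applies `log_softmax` internally, the explicit `log_softmax` on the line linked above is redundant. To fix this, either remove the `log_softmax` and keep `CrossEntropyLoss`, or keep the `log_softmax` and switch to `NLLLoss`.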
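
Here is a minimal sketch of the two options, in case it helps. The tensor shapes, the seed, and the variable names are made up for illustration and are not taken from the tutorial:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical shapes: 4 predictions over a 10-token vocabulary.
logits = torch.randn(4, 10)            # raw, unnormalized model output
targets = torch.randint(0, 10, (4,))   # ground-truth class indices

# Option 1: return raw logits from the model and keep CrossEntropyLoss.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: keep log_softmax in the model and switch to NLLLoss.
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=-1), targets)

# Compare the two loss values.
print(loss_ce.item(), loss_nll.item())
```

Both options should give the same loss value, so which one to pick mostly depends on whether you want the model itself to output log-probabilities.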