check:
“Teacher forcing” means feeding the real target outputs as each next input during training, instead of the decoder’s own guess. Training converges faster this way, but when the trained network is exploited (run on its own predictions, without the targets), it may exhibit instability.
http://minds.jacobs-university.de/sites/default/files/uploads/papers/ESNTutorialRev.pdf
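
To make the difference concrete, here is a minimal sketch in plain Python. It is not taken from the tutorial linked above; `decode` and `imperfect_step` are hypothetical names, and the "decoder" is a toy function that is slightly wrong on purpose. The only point is to show which value is fed back as the next input in each mode.

```python
def decode(step, first_input, targets, teacher_forcing):
    """Run a one-step decoder len(targets) times and collect its predictions.

    step: hypothetical one-step decoder mapping an input to a prediction.
    teacher_forcing: if True, the real target becomes the next input;
                     if False, the decoder's own prediction is fed back.
    """
    predictions = []
    current = first_input
    for target in targets:
        prediction = step(current)
        predictions.append(prediction)
        # The single line that distinguishes the two modes:
        current = target if teacher_forcing else prediction
    return predictions


def imperfect_step(x):
    # Toy stand-in for a trained decoder: the ideal mapping would be
    # x -> x + 1, but this one overshoots by 0.5 (an assumption for the demo).
    return x + 1.5


targets = [1.0, 2.0, 3.0, 4.0]
forced = decode(imperfect_step, 0.0, targets, teacher_forcing=True)
free = decode(imperfect_step, 0.0, targets, teacher_forcing=False)

print(forced)  # [1.5, 2.5, 3.5, 4.5]
print(free)    # [1.5, 3.0, 4.5, 6.0]
```

With teacher forcing, every step starts from the correct input, so the error stays at 0.5. In free-running mode the same small error is fed back and grows at each step, which is one simple way the instability mentioned above can show up.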