In machine translation, the encoder-decoder architecture is common. The encoder reads a sequence of words and represents it with a high-dimensional real-valued vector. This vector, often called the context vector, is given to the decoder, which then generates another sequence of words in the target…