Encoder-decoder

Appears in 1 paper

The Transformer's two-part structure for seq2seq tasks (e.g., translation).

As used in Paper 08: Attention Is All You Need

The encoder processes the full source sequence with unmasked self-attention, so every position can attend to every other position. The decoder generates the target sequence autoregressively, one token at a time. Each decoder layer uses masked self-attention, where a position attends only to itself and earlier target positions, followed by cross-attention over the encoder's outputs.
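
The three attention patterns can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it assumes a single head and omits the learned projections, positional encodings, residual connections, layer normalization, and feed-forward sublayers. The names attention, src, tgt, and memory are hypothetical, chosen only for this illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    """Scaled dot-product attention. Returns (output, weights).

    mask is a boolean array; False entries are blocked.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores)
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model, src_len, tgt_len = 8, 5, 3          # toy sizes (assumed)
src = rng.normal(size=(src_len, d_model))    # stand-in for source embeddings
tgt = rng.normal(size=(tgt_len, d_model))    # stand-in for target embeddings

# Encoder: unmasked self-attention, every source position sees all others.
memory, enc_w = attention(src, src, src)

# Decoder, part 1: masked self-attention. The lower-triangular mask lets
# position i attend only to positions 0..i of the target.
causal = np.tril(np.ones((tgt_len, tgt_len), dtype=bool))
dec_self, dec_w = attention(tgt, tgt, tgt, mask=causal)

# Decoder, part 2: cross-attention. Queries come from the decoder,
# keys and values come from the encoder output.
dec_out, cross_w = attention(dec_self, memory, memory)

print(enc_w.shape, dec_w.shape, cross_w.shape)
```

The weight matrices make the difference visible: enc_w is dense, dec_w is zero above the diagonal, and cross_w has one row per target position and one column per source position.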