Multihead Attention
The diagram below shows cross-attention. Self-attention is the special case in which the source text and the target text are the same.
Computation flows from top (inputs) to bottom (output). Move the mouse over any operation to compute its result.

Diagram legend: input value; learnable weight/parameter; sum all inputs; multiply all inputs.
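
For readers without the interactive diagram, the same top-to-bottom flow can be sketched in code. This is a minimal NumPy sketch that assumes the standard scaled dot-product formulation of multihead attention; the function name `multihead_attention`, the weight names `Wq`, `Wk`, `Wv`, `Wo`, and the shapes are illustrative assumptions, not taken from the diagram. Queries are projected from the target text, and keys and values from the source text. Passing the same array as both `target` and `source` gives self-attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(target, source, Wq, Wk, Wv, Wo, num_heads):
    """Cross-attention sketch (assumed shapes).

    target: (T, d_model) input values for the target text
    source: (S, d_model) input values for the source text
    Wq, Wk, Wv, Wo: (d_model, d_model) learnable weights
    """
    T, d_model = target.shape
    S = source.shape[0]
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    # Learnable projections: queries from the target, keys/values from the source.
    Q = target @ Wq
    K = source @ Wk
    V = source @ Wv

    def split_heads(x, n):
        # (n, d_model) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    Qh, Kh, Vh = split_heads(Q, T), split_heads(K, S), split_heads(V, S)

    # Scaled dot-product scores per head: (num_heads, T, S).
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values per head: (num_heads, T, d_head).
    heads = weights @ Vh

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(T, d_model)
    return concat @ Wo, weights

# Usage with random (untrained) weights, purely for illustration.
rng = np.random.default_rng(0)
d_model, num_heads = 8, 2
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
target = rng.normal(size=(3, d_model))  # 3 target tokens
source = rng.normal(size=(5, d_model))  # 5 source tokens

out, attn = multihead_attention(target, source, Wq, Wk, Wv, Wo, num_heads)
self_out, _ = multihead_attention(source, source, Wq, Wk, Wv, Wo, num_heads)
```

Here `out` has one row per target token, and `attn` holds one attention matrix per head, with each row summing to 1 over the source tokens.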


