The way to mask attention matrices in Flax · google flax · Discussion #2915

It seems that the Flax implementation masks attention matrices along both the query and key axes at once, and that masked positions are set to jnp.finfo(dtype).min rather than -jnp.inf. flax/flax/line...
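To make the two observations concrete, here is a minimal sketch in plain JAX. It is not the Flax source: the names `query_valid`, `key_valid`, and `masked_softmax` are hypothetical and exist only for illustration. It builds a mask that is true only where both the query and the key are real tokens, then compares filling masked logits with jnp.finfo(dtype).min against filling them with -jnp.inf.

```python
import jax
import jax.numpy as jnp

dtype = jnp.float32

# Hypothetical padding flags: True marks a real token, False marks padding.
query_valid = jnp.array([True, False])
key_valid = jnp.array([True, True, False])

# A (query, key) pair is kept only if BOTH sides are valid,
# so a padded query yields a fully masked row.
mask = query_valid[:, None] & key_valid[None, :]  # shape (2, 3)

# Dummy attention logits; all zeros keeps the arithmetic easy to follow.
logits = jnp.zeros((2, 3), dtype=dtype)


def masked_softmax(logits, mask, fill_value):
    """Softmax over the key axis after overwriting masked logits (illustrative helper)."""
    masked_logits = jnp.where(mask, logits, fill_value)
    return jax.nn.softmax(masked_logits, axis=-1)


# Option 1: most negative finite value of the dtype.
weights_finfo = masked_softmax(logits, mask, jnp.finfo(dtype).min)

# Option 2: negative infinity.
weights_inf = masked_softmax(logits, mask, -jnp.inf)

print(weights_finfo)
print(weights_inf)
```

In this sketch, both fill values give the same weights for the partially masked row. They differ on the fully masked row: with the finite fill value the row stays finite (it degenerates to a uniform distribution), whereas with -jnp.inf the softmax evaluates exp(-inf - (-inf)) and the row becomes NaN. That is one practical difference between the two choices; whether it is the reason behind the Flax design is not something this sketch establishes.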