Masking layer vs attention_mask parameter in MultiHeadAttention

I use the MultiHeadAttention layer in my Transformer model (the model is very similar to named entity recognition models). Because my samples come in different lengths, I use padding together with the attention_mask parameter of MultiHeadAttention.
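
For reference, a minimal sketch of the kind of setup described above, assuming padded integer token ids where id 0 is the padding value (the vocabulary size, sequence length, and head dimensions here are made up for illustration):

```python
import tensorflow as tf

# Two padded sequences of token ids; 0 is assumed to be the padding id.
token_ids = tf.constant([[5, 3, 9, 0, 0, 0, 0, 0],
                         [7, 2, 4, 6, 1, 0, 0, 0]])           # (batch, seq_len)

# Boolean padding mask: True for real tokens, False for padding.
padding_mask = tf.cast(token_ids != 0, tf.bool)               # (batch, seq_len)

# MultiHeadAttention expects attention_mask of shape (batch, query_len, key_len),
# so broadcast the 1-D padding mask into a 2-D mask per example.
attention_mask = padding_mask[:, tf.newaxis, :] & padding_mask[:, :, tf.newaxis]

# Hypothetical embedding and self-attention layers just to show the call.
embeddings = tf.keras.layers.Embedding(input_dim=100, output_dim=16)(token_ids)
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)
out = mha(query=embeddings, value=embeddings, attention_mask=attention_mask)
```

Here the mask is built explicitly and passed to attention_mask, rather than relying on a Masking layer to propagate an implicit mask through the model.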