Hi, I am a little confused about the cyclic shift. Can you help me understand it? · Issue #52 · microsoft/Swin-Transformer

Can you explain how the cyclic shift changes the feature map, and which token positions are masked when the attention is calculated? The figure in your paper is too abstract for me.
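A minimal pure-Python sketch of the roll-then-mask idea the question is about may make the figure more concrete. It is an illustration under stated assumptions, not the repository's code: the 8×8 map, window size 4, shift 2, the helper names, and the mask value of -100.0 are all hypothetical choices. The map is rolled up and left by the shift (analogous to `torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))`). Each token in the shifted map is then labeled by which of 3×3 regions it falls in. Inside each window, any pair of tokens with different labels gets a large negative bias, so after softmax those tokens effectively do not attend to each other.

```python
# Hypothetical sizes for illustration: 8x8 feature map, 4x4 windows, shift of 2.
H = W = 8
WINDOW = 4
SHIFT = WINDOW // 2


def cyclic_shift(grid, shift):
    """Roll a 2-D list up and left by `shift`, wrapping around the edges.

    out[i][j] = grid[(i + shift) % h][(j + shift) % w], which matches
    torch.roll with negative shifts on the height and width dims.
    """
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)] for i in range(h)]


def region_labels(h, w, window, shift):
    """Label each position of the *shifted* map with one of 9 region ids.

    Along each axis the shifted map splits into three bands:
    [0, n - window), [n - window, n - shift), [n - shift, n).
    Tokens in the last band(s) wrapped around from the opposite edge.
    """
    def band(x, n):
        if x < n - window:
            return 0
        if x < n - shift:
            return 1
        return 2

    return [[3 * band(i, h) + band(j, w) for j in range(w)] for i in range(h)]


def window_partition(grid, window):
    """Split a 2-D list into non-overlapping windows, each flattened row-major."""
    h, w = len(grid), len(grid[0])
    windows = []
    for wi in range(0, h, window):
        for wj in range(0, w, window):
            windows.append([grid[wi + a][wj + b] for a in range(window) for b in range(window)])
    return windows


def attention_masks(h, w, window, shift, neg=-100.0):
    """One (window*window) x (window*window) additive mask per window.

    Entry [p][q] is 0.0 if tokens p and q come from the same region,
    otherwise `neg`, which is added to the attention logits before softmax.
    """
    labels = region_labels(h, w, window, shift)
    return [[[0.0 if a == b else neg for b in win] for a in win]
            for win in window_partition(labels, window)]


# Tag every token with its original (row, col) so the shift is visible.
grid = [[(r, c) for c in range(W)] for r in range(H)]
shifted = cyclic_shift(grid, SHIFT)
print(shifted[0][0], shifted[7][7])  # (2, 2) (1, 1)

# Region ids of the shifted map: rows/cols 6-7 hold tokens that wrapped around.
for row in region_labels(H, W, WINDOW, SHIFT):
    print(row)

# Number of masked (non-zero) entries per window.
masks = attention_masks(H, W, WINDOW, SHIFT)
print([sum(v != 0.0 for row in m for v in row) for m in masks])
```

With these sizes, the top-left window contains a single region and needs no masking. The top-right and bottom-left windows each mix two regions, and the bottom-right window mixes four. In those three windows, only tokens that were actually adjacent in the original (unshifted) map are allowed to attend to each other.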