Swin transformer handle window size smaller than input size #6222


Conversation

YosuaMichael
Contributor

This PR introduces logic in the swin transformer to shrink the window_size when it is bigger than the input size.
This behaviour follows the original implementation: https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py#L192-L195
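The clamping behaviour in the linked implementation is roughly the following (an illustrative sketch with hypothetical names, not the exact torchvision or Microsoft code): along each spatial dimension, if the input is not larger than the window, the window shrinks to the input and shifting is disabled.

```python
from typing import List, Tuple

def clamp_window(
    input_resolution: List[int],
    window_size: List[int],
    shift_size: List[int],
) -> Tuple[List[int], List[int]]:
    """Shrink window_size to input_resolution where needed and
    disable the shift along any clamped dimension."""
    window_size = list(window_size)
    shift_size = list(shift_size)
    for i in range(len(input_resolution)):
        if input_resolution[i] <= window_size[i]:
            window_size[i] = input_resolution[i]
            shift_size[i] = 0
    return window_size, shift_size

# A 6x6 feature map with a shifted 7x7 window falls back to a
# non-shifted 6x6 window:
print(clamp_window([6, 6], [7, 7], [3, 3]))  # ([6, 6], [0, 0])
```

With a sufficiently large input (e.g. the usual 56x56 first-stage feature map), the window and shift are left untouched.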

We introduced this change in a commit in #6088. However, since we wanted to cherry-pick that PR, and the main goal of #6088 was to adapt the components so they could be reused in the 3d swin transformer, we decided to revert this change to lower the risk during the release.

However, since this change can be useful, especially if we want to adapt the components to swin transformer v2 (as commented by @xiaohu2015 in #6088 (comment)), we reintroduce it in this PR.

We have tested this change by running the validation script:

python -u ~/script/run_with_submitit.py \
    --timeout 3000 --nodes 1 --ngpus 4 --batch-size=1 \
    --partition train --model <model_variant> \
    --data-path="/datasets01_ontap/imagenet_full_size/061417" \
    --weights="<weight_variant>_Weights.IMAGENET1K_V1" \
    --test-only

where model_variant can be swin_t, swin_s, or swin_b, and weight_variant can be Swin_T, Swin_S, or Swin_B.
Here are the validation results:

  • swin_t, Swin_T : Acc@1 81.472 Acc@5 95.780
  • swin_s, Swin_S: Acc@1 83.186 Acc@5 96.366
  • swin_b, Swin_B: Acc@1 83.584 Acc@5 96.636

Contributor

@datumbox datumbox left a comment


Thanks @YosuaMichael. I left a couple of comments, let me know what you think.

Edit: also the failing tests seem related.

@YosuaMichael YosuaMichael marked this pull request as draft July 3, 2022 14:32
@YosuaMichael
Contributor Author

Investigating the failing tests: the changes affect the result when the input size is 112 or smaller.
When I tried different resize_size and crop_size settings on ImageNet, I got the following results:

resize_size=256, crop_size=112
- [Original] Acc@1 66.318 Acc@5 85.934
- [This PR] Acc@1 64.236 Acc@5 84.412

resize_size=128, crop_size=112
- [Original] Acc@1 69.248 Acc@5 88.824
- [This PR] Acc@1 67.060 Acc@5 87.218

As we can see, this PR reduces the accuracy significantly.

Looking into the reason: it is because relative_position_index depends on the window_size. I think this is why the original implementation performs the window_size update in the constructor, at the cost of requiring input_resolution in the constructor as well (which means the model can only accept a fixed input size).
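The coupling can be seen from the size of the learned relative position bias table: for a Wh x Ww window it has (2*Wh - 1) * (2*Ww - 1) entries, and relative_position_index indexes into it. A small sketch (illustrative, not the torchvision code) shows why a window clamped at run time no longer matches pretrained weights:

```python
def bias_table_len(window_size):
    """Number of learned relative position bias entries for a
    (Wh, Ww) attention window: one per possible relative offset."""
    wh, ww = window_size
    return (2 * wh - 1) * (2 * ww - 1)

print(bias_table_len((7, 7)))  # 169: table size for the pretrained 7x7 window
print(bias_table_len((6, 6)))  # 121: table size after clamping to a 6x6 window
```

Since the pretrained table has 169 entries but a clamped 6x6 window needs indices for a 121-entry table, the bias lookup trained at window size 7 no longer lines up, which is consistent with the accuracy drop above.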

Given this, I think the change needs more discussion, and if we really want to implement it like the original implementation, it would be better to have a stronger reason, such as implementing Swin Transformer V2. Therefore, I will close this PR for now.

@xiaohu2015
Contributor

xiaohu2015 commented Jul 9, 2022

@YosuaMichael I think we can make a modification to the model:
window_size could be either List[int] or List[List[int]]. With List[int], the window size of all blocks is the same (e.g. window size 7 when the input size is 224). With List[List[int]], a different window size can be chosen for each block ([[7, 7], [7, 7], [7, 7], [6, 6]] for input size 192, [[12, 12], [12, 12], [12, 12], [6, 6]] for 384). I mean that the user should make a choice about what they want. The original code requires input_resolution; here we would require window_size. I think the two approaches can have the same effect.

cc @datumbox
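The suggested API could normalize both forms to a per-block list up front (a hypothetical helper sketching the idea, not an existing torchvision function):

```python
from typing import List, Union

def normalize_window_size(
    window_size: Union[List[int], List[List[int]]],
    num_blocks: int,
) -> List[List[int]]:
    """Expand window_size to one [Wh, Ww] entry per block."""
    if window_size and isinstance(window_size[0], int):
        # List[int]: the same window for every block, e.g. [7, 7]
        return [list(window_size) for _ in range(num_blocks)]
    # List[List[int]]: an explicit window per block,
    # e.g. [[7, 7], [7, 7], [7, 7], [6, 6]] for input size 192
    assert len(window_size) == num_blocks
    return [list(ws) for ws in window_size]

print(normalize_window_size([7, 7], 4))
# [[7, 7], [7, 7], [7, 7], [7, 7]]
```

This keeps the constructor free of input_resolution: the user picks windows that fit their intended input size, and the blocks never resize windows behind the user's back.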

@xiaohu2015 xiaohu2015 mentioned this pull request Jul 9, 2022
@YosuaMichael
Contributor Author

Hi @xiaohu2015, sorry for the late reply! I am trying to understand your suggestion of specifying window_size per layer. In this case, is it correct to say that the user needs to retrain the model in order to change the window_size?
For instance, if we train our model on input size 224, and a user then wants an input of size 384, they would need to change window_size, so the pretrained weights could not be loaded and we would need separate weights for input size 384?

@xiaohu2015
Contributor

@YosuaMichael yes.
