Add PvT-v2 Model #26812
Conversation
FYI @rafaelpadilla
@FoamoftheSea - awesome work! Let us know when you're ready for review. For the code quality checks, running `make fixup` should resolve most of them.
@amyeroberts - Thanks! Let me try that and do one final sweep over things, then I will get back to you shortly for review :)
@amyeroberts I believe this is ready for review now. I made changes so all the quality checks pass, and I also added integration with AutoBackbone so that PVTv2 can be used with Deformable DETR and other models that use AutoBackbone.
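For illustration, a hedged sketch of what the AutoBackbone integration enables; the checkpoint id is an assumption based on the OpenGVLab collection linked later in this thread, not something confirmed by this PR:

```python
import torch
from transformers import AutoBackbone

# Checkpoint id is assumed, not confirmed by this PR.
backbone = AutoBackbone.from_pretrained("OpenGVLab/pvt_v2_b0")
pixel_values = torch.randn(1, 3, 224, 224)
features = backbone(pixel_values).feature_maps  # one feature map per configured out_index
```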
Really nice PR - thanks for adding!
Just a few small comments - the main one is that we don't need the image processor. Once resolved we should be good to merge!
# Conflicts:
#	README_hd.md
#	docs/source/en/tasks/image_classification.md
@amyeroberts I merged main today since the main branch passed CI tests, so I'm not sure why these checks are failing... the logs are not clear.
The failing torch test is unrelated; however, the repo consistency check failure is related to bad `# Copied from` comments.
Ah, good catch, thanks! I will try to run `make fixup` on that and see if I can get to the bottom of it.
Looks like CIs are green ✔️ I had to remove all of the `# Copied from` comments.
The authors have reached out to me via email to let me know they've copied the pretrained weights over to the OpenGVLab HuggingFace account here: https://huggingface.co/collections/OpenGVLab/pvt-65db4ca6c3e37ebc67cd8e01, so I went ahead and switched all of the hub references in the code to point to these locations, and we should be ready to merge!
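As a hedged sanity check of the relocated weights, something like the following should work once merged; the exact checkpoint id within the collection, and whether it ships a preprocessor config, are assumptions:

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Checkpoint id assumed from the collection linked above.
processor = AutoImageProcessor.from_pretrained("OpenGVLab/pvt_v2_b0")
model = AutoModelForImageClassification.from_pretrained("OpenGVLab/pvt_v2_b0")
```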
Thanks for all the work on this model!
Just a small comment about the config and a nit
```python
    expected_slice_logits = torch.tensor([-0.1769, -0.1747, -0.0143])
elif pvt_v2_size == "b5":
    expected_slice_logits = torch.tensor([-0.2943, -0.1008, 0.6812])
else:
```
We can make the check optional - so people can disable it if they're converting a custom checkpoint.
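A hedged sketch of what such an optional check could look like in the conversion script; the function and flag names are illustrative, and only the b5 slice is quoted in the diff above:

```python
import torch

# Only the b5 slice appears in the quoted diff; other sizes would be filled in similarly.
EXPECTED_SLICES = {"b5": torch.tensor([-0.2943, -0.1008, 0.6812])}


def maybe_verify_logits(logits: torch.Tensor, pvt_v2_size: str, verify_logits: bool = True) -> None:
    if not verify_logits:
        return  # allow converting custom checkpoints with no known reference
    expected = EXPECTED_SLICES.get(pvt_v2_size)
    if expected is None:
        raise ValueError(f"No expected logits recorded for size {pvt_v2_size!r}")
    assert torch.allclose(logits[0, :3], expected, atol=1e-4), "logits mismatch"
```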
```python
if kwargs.get("_out_indices", None) is not None:
    out_indices = kwargs["_out_indices"]
    out_features = None
```
What is this for? It shouldn't be necessary
So the explanation for this is that if both `out_indices` and `out_features` are `None`, the call to `get_aligned_output_features_output_indices` on line 165 will end up overriding whatever is in the pretrained config with a single value representing the last stage, leading to loading issues for pretrained models that expect these values to be consistent.
However, this solution is not optimal because it will always favor the config over the user input, and we would prefer the user to have the option to override the config. So I've come up with a better solution, which prioritizes these values as: 1) the user setting, if not None; 2) the config setting, if available; 3) a default of None (sketched below).
I will have a commit soon to fix this.
Note: I originally followed what I saw in the ResNet config, so this might still be a problem for loading models with ResNet backbones that had specific out_indices configured; we may want to test for that.
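A minimal sketch of that priority order, pulled out as a hypothetical helper (`_resolve_backbone_outputs` is an illustrative name, not code from this PR):

```python
from transformers.utils.backbone_utils import get_aligned_output_features_output_indices


def _resolve_backbone_outputs(config, out_features, out_indices):
    # 1) user setting if not None, 2) config setting if available, 3) None
    out_features = out_features if out_features is not None else getattr(config, "out_features", None)
    out_indices = out_indices if out_indices is not None else getattr(config, "out_indices", None)
    return get_aligned_output_features_output_indices(
        out_features=out_features,
        out_indices=out_indices,
        stage_names=config.stage_names,
    )
```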
I don't think this is in the ResNetConfig?
It should be tested, but I believe it is covered in `BackboneTesterMixin`.
Apologies, what I was referring to was line 119 in models.resnet.configuration_resnet.py, which calls `get_aligned_output_features_output_indices` and possibly will also overwrite the settings in the pretrained config, which is the issue I was working around here.
I studied the ResNet config as an example to work from because it is a model which uses the `BackboneConfigMixin`.
I've created a more elegant solution in 7dd8aad. Let me know what you think!
Just to make sure we're on the same page - what do you mean by "overwrite the settings in the pretrained config"?
When loading a pretrained config using `PvtV2Config.from_pretrained()`, the call to `super().__init__()` correctly finds and loads `self._out_indices` and `self._out_features` from the JSON, and these are then accessible using the `self.out_indices` and `self.out_features` properties from the `BackboneConfigMixin`. However, the subsequent call to the `get_aligned_output_features_output_indices` function, if passed `None` for both `out_features` and `out_indices` (which are the default values in the `__init__` function and therefore the values when using `from_pretrained`), will actually revert these back to referencing only the last index/feature layer. Thus, the PvtV2Config, and presumably the ResNet config, would effectively ignore the `_out_indices` and `_out_features` fields from the pretrained JSON config.
The code you originally highlighted was a hack to make sure that, if `self._out_indices` was already set, one of these variables would not be `None` when it reached the function call, but that was not the right solution. I believe the correct solution is to prioritize the values being fed to the function call as I laid out above.
I assumed that calling this line in the ResNetConfig `__init__` function was essential for some reason, which is why I copied it over, but then it caused problems. I think it's intended to allow for flexibility in loading different architectures, but honestly I'm really not sure, though I would hesitate to remove it.
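To make the failure mode concrete, the helper's both-`None` fallback can be checked directly; the import path is assumed from current transformers, and the exact return format may vary by version:

```python
from transformers.utils.backbone_utils import get_aligned_output_features_output_indices

stage_names = ["stage1", "stage2", "stage3", "stage4"]
out_features, out_indices = get_aligned_output_features_output_indices(
    out_features=None, out_indices=None, stage_names=stage_names
)
# With both arguments None, only the last stage is kept,
# e.g. ['stage4'] and [3] here.
print(out_features, out_indices)
```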
Actually, on second thought, I can see that the call to the function is necessary for establishing a default behavior when the options are not passed, and also for aligning any customization the user is trying to accomplish. The issue was that it did not account for previously loaded values from the JSON and was causing them to be overridden.
> The issue was that it did not account for previously loaded values from the JSON and was causing them to be overridden.
Sorry, I'm still slightly confused about the behaviour being described here. Is this the preprocessor_config.json you're talking about?
In this case, it should be possible to load previous values from the config, i.e.:

```python
config = PvtV2Config(out_indices=(2, 3))
config.save_pretrained("test_config")
config = PvtV2Config.from_pretrained("test_config")
assert config.out_indices == [2, 3]
```
Yes, this is the behavior in question.
I've just run a test and the ResNet config works fine. It appears that there is a different saving behavior: ResNetConfig saves these fields into the JSON as "out_indices" and "out_features", whereas PvtV2Config saves them with the underscore prefix, as "_out_indices" and "_out_features", and this is where the divergent behavior stems from.
When I get time later this evening, I can dig into why `save_pretrained` behaves differently on these fields in ResNetConfig vs PvtV2Config, because that is where the fix should probably be.
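One quick way to see the divergence described above is to compare what each config's `to_dict()` emits; the key names are taken from the comment, and ResNetConfig is assumed to be the correctly behaving one:

```python
from transformers import ResNetConfig

keys = ResNetConfig(out_indices=(2, 3)).to_dict().keys()
# ResNetConfig serializes the public names, so this should print: True False
print("out_indices" in keys, "_out_indices" in keys)
```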
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Just a final comment on the config
```python
self._out_features, self._out_indices = get_aligned_output_features_output_indices(
    out_features=out_features if out_features is not None else getattr(self, "out_features", None),
    out_indices=out_indices if out_indices is not None else getattr(self, "out_indices", None),
    stage_names=self.stage_names,
)
```
I don't see why we need this here - neither `out_features` nor `out_indices` should be already set when initializing the config.
Suggested change:

```python
self._out_features, self._out_indices = get_aligned_output_features_output_indices(
    out_features=out_features,
    out_indices=out_indices,
    stage_names=self.stage_names,
)
```
Ahhh, ok I see what happened. This was a tricky one. The order of the base classes matters here: we had `class ResNetConfig(BackboneConfigMixin, PretrainedConfig)` but `class PvtV2Config(PretrainedConfig, BackboneConfigMixin)`, and thus the latter was not getting the proper `to_dict` method override, and therefore was not saving these fields correctly to JSON, which led to the weirdness during the `__init__` call.
I have fixed this the correct way now in 0e7a054.
Thank you for your scrutiny here; I hadn't realized these values were not being saved properly. For a bit of backstory, I built PvtV2 and was using it before I incorporated the Backbone mixins, so the code you highlighted was a hack that was part of the original work and prevented me from seeing this issue.
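A toy illustration of the MRO point above (hypothetical classes, not the actual transformers code): when two base classes both define `to_dict`, the leftmost one wins.

```python
class Base:
    def to_dict(self):
        return dict(self.__dict__)  # leaves the private "_out_indices" key as-is


class Mixin:
    def to_dict(self):
        d = dict(self.__dict__)
        # serialize private attrs under their public names
        d["out_indices"] = d.pop("_out_indices", None)
        return d


class Good(Mixin, Base):  # Mixin.to_dict wins, like ResNetConfig
    pass


class Bad(Base, Mixin):  # Base.to_dict wins, like the buggy PvtV2Config
    pass


c = Good()
c._out_indices = [2, 3]
print(c.to_dict())  # {'out_indices': [2, 3]}
b = Bad()
b._out_indices = [2, 3]
print(b.to_dict())  # {'_out_indices': [2, 3]}
```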
Thanks for digging into this. Adding this to my to-do list to make sure it's easier to implement / catch.
Thanks for all the work adding this model - looks great! 💪
Hi @FoamoftheSea Thank you for adding this model. Our nightly CI shows 2 failing tests. Could you take a look please? Thank you in advance.
Description
@amyeroberts
Resources
Model paper
Open Source Implementations
Checks