Add PvT-v2 Model #26812
Conversation
FYI @rafaelpadilla
@FoamoftheSea - awesome work! Let us know when you're ready for review. For the code quality checks, running `make fixup` should resolve most of them.
@amyeroberts - Thanks! Let me try that and do one final sweep over things, then I will get back to you shortly for review :)
@amyeroberts I believe this is ready for review now. I made changes so all the quality checks pass, and I also added integration with AutoBackbone so that PVTv2 can be used with Deformable DETR and other models that use AutoBackbone.
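For illustration, a hedged sketch of what the AutoBackbone integration enables; the checkpoint id is an assumption based on the OpenGVLab collection linked later in this thread, not something confirmed by this PR:

```python
import torch
from transformers import AutoBackbone

# Checkpoint id is assumed, not confirmed by this PR.
backbone = AutoBackbone.from_pretrained("OpenGVLab/pvt_v2_b0")
pixel_values = torch.randn(1, 3, 224, 224)
features = backbone(pixel_values).feature_maps  # one feature map per configured out_index
```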
Really nice PR - thanks for adding!
Just a few small comments - the main one is that we don't need the image processor. Once resolved we should be good to merge!
# Conflicts:
#	README_hd.md
#	docs/source/en/tasks/image_classification.md
@amyeroberts I merged main today since the main branch passed CI tests, so I'm not sure why these checks are failing... the logs are not clear.
The failing torch test is unrelated; however, the repo consistency check failure is related to bad `# Copied from` comments.
Ah, good catch, thanks! I will try to run `make fixup` on that and see if I can get to the bottom of it.
Looks like CIs are green ✔️ I had to remove all of the `# Copied from` comments.
The authors have reached out to me via email to let me know they've copied the pretrained weights over to the OpenGVLab HuggingFace account here: https://huggingface.co/collections/OpenGVLab/pvt-65db4ca6c3e37ebc67cd8e01, so I went ahead and switched all of the hub references in the code to point to these locations, and we should be ready to merge!
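As a hedged sanity check of the relocated weights, something like the following should work once merged; the exact checkpoint id within the collection, and whether it ships a preprocessor config, are assumptions:

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Checkpoint id assumed from the collection linked above.
processor = AutoImageProcessor.from_pretrained("OpenGVLab/pvt_v2_b0")
model = AutoModelForImageClassification.from_pretrained("OpenGVLab/pvt_v2_b0")
```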
Thanks for all the work on this model!
Just a small comment about the config and a nit
```python
    expected_slice_logits = torch.tensor([-0.1769, -0.1747, -0.0143])
elif pvt_v2_size == "b5":
    expected_slice_logits = torch.tensor([-0.2943, -0.1008, 0.6812])
else:
```
We can make the check optional - so people can disable it if they're converting a custom checkpoint.
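A hedged sketch of what such an optional check could look like in the conversion script; the function and flag names are illustrative, and only the b5 slice is quoted in the diff above:

```python
import torch

# Only the b5 slice appears in the quoted diff; other sizes would be filled in similarly.
EXPECTED_SLICES = {"b5": torch.tensor([-0.2943, -0.1008, 0.6812])}


def maybe_verify_logits(logits: torch.Tensor, pvt_v2_size: str, verify_logits: bool = True) -> None:
    if not verify_logits:
        return  # allow converting custom checkpoints with no known reference
    expected = EXPECTED_SLICES.get(pvt_v2_size)
    if expected is None:
        raise ValueError(f"No expected logits recorded for size {pvt_v2_size!r}")
    assert torch.allclose(logits[0, :3], expected, atol=1e-4), "logits mismatch"
```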
```python
if kwargs.get("_out_indices", None) is not None:
    out_indices = kwargs["_out_indices"]
    out_features = None
```
What is this for? It shouldn't be necessary
So the explanation for this is that if both `out_indices` and `out_features` are `None`, the call to `get_aligned_output_features_output_indices` on line 165 will end up overriding whatever is in the pretrained config with a single value representing the last stage, leading to loading issues for pretrained models that expect these values to be consistent.
However, this solution is not optimal because it will always favor the config over the user input, and we would prefer the user to have the option to override the config. So I've come up with a better solution, which prioritizes these values as: 1) the user setting, if not None; 2) the config setting, if available; 3) a default of None (sketched below).
I will have a commit soon to fix this.
Note: I originally followed what I saw in the ResNet config, so this might still be a problem for loading models with ResNet backbones that had specific out_indices configured; we may want to test for that.
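A minimal sketch of that priority order, pulled out as a hypothetical helper (`_resolve_backbone_outputs` is an illustrative name, not code from this PR):

```python
from transformers.utils.backbone_utils import get_aligned_output_features_output_indices


def _resolve_backbone_outputs(config, out_features, out_indices):
    # 1) user setting if not None, 2) config setting if available, 3) None
    out_features = out_features if out_features is not None else getattr(config, "out_features", None)
    out_indices = out_indices if out_indices is not None else getattr(config, "out_indices", None)
    return get_aligned_output_features_output_indices(
        out_features=out_features,
        out_indices=out_indices,
        stage_names=config.stage_names,
    )
```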
I don't think this is in the ResNetConfig?
It should be tested, but I believe it is covered in `BackboneTesterMixin`.
Apologies, what I was referring to was line 119 in models.resnet.configuration_resnet.py, which calls `get_aligned_output_features_output_indices` and possibly will also overwrite the settings in the pretrained config, which is the issue I was working around here.
I studied the ResNet config as an example to work from because it is a model which uses the `BackboneConfigMixin`.
I've created a more elegant solution in 7dd8aad. Let me know what you think!
Just to make sure we're on the same page - what do you mean by "overwrite the settings in the pretrained config"?
When loading a pretrained config using `PvtV2Config.from_pretrained()`, the call to `super().__init__()` correctly finds and loads `self._out_indices` and `self._out_features` from the JSON, and these are then accessible using the `self.out_indices` and `self.out_features` properties from the `BackboneConfigMixin`. However, the subsequent call to the `get_aligned_output_features_output_indices` function, if passed `None` for both `out_features` and `out_indices` (which are the default values in the `__init__` function and therefore the values when using `from_pretrained`), will actually revert these back to referencing only the last index/feature layer. Thus, the PvtV2Config, and presumably the ResNet config, would effectively ignore the `_out_indices` and `_out_features` fields from the pretrained JSON config.
The code you originally highlighted was a hack to make sure that, if `self._out_indices` was already set, one of these variables would not be `None` when it reached the function call, but that was not the right solution. I believe the correct solution is to prioritize the values being fed to the function call as I laid out above.
I assumed that calling this line in the ResNetConfig `__init__` function was essential for some reason, which is why I copied it over, but then it caused problems. I think it's intended to allow for flexibility in loading different architectures, but honestly I'm really not sure, though I would hesitate to remove it.
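To make the failure mode concrete, the helper's both-`None` fallback can be checked directly; the import path is assumed from current transformers, and the exact return format may vary by version:

```python
from transformers.utils.backbone_utils import get_aligned_output_features_output_indices

stage_names = ["stage1", "stage2", "stage3", "stage4"]
out_features, out_indices = get_aligned_output_features_output_indices(
    out_features=None, out_indices=None, stage_names=stage_names
)
# With both arguments None, only the last stage is kept,
# e.g. ['stage4'] and [3] here.
print(out_features, out_indices)
```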
Actually, on second thought, I can see that the call to the function is necessary for establishing a default behavior when the options are not passed, and also for aligning any customization the user is trying to accomplish. The issue was that it did not account for previously loaded values from the JSON and was causing them to be overridden.
> The issue was that it did not account for previously loaded values from the JSON and was causing them to be overridden.
Sorry, I'm still slightly confused about the behaviour being described here. Is this the preprocessor_config.json you're talking about?
In this case, it should be possible to load previous values from the config, i.e.:

```python
config = PvtV2Config(out_indices=(2, 3))
config.save_pretrained("test_config")
config = PvtV2Config.from_pretrained("test_config")
assert config.out_indices == [2, 3]
```
Yes, this is the behavior in question.
I've just run a test and the ResNet config works fine. It appears that there is a different saving behavior: ResNetConfig saves these fields into the JSON as "out_indices" and "out_features", whereas PvtV2Config saves them with the underscore prefix, as "_out_indices" and "_out_features", and this is where the divergent behavior stems from.
When I get time later this evening, I can dig into why `save_pretrained` behaves differently on these fields in ResNetConfig vs PvtV2Config, because that is where the fix should probably be.
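One quick way to see the divergence described above is to compare what each config's `to_dict()` emits; the key names are taken from the comment, and ResNetConfig is assumed to be the correctly behaving one:

```python
from transformers import ResNetConfig

keys = ResNetConfig(out_indices=(2, 3)).to_dict().keys()
# ResNetConfig serializes the public names, so this should print: True False
print("out_indices" in keys, "_out_indices" in keys)
```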
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Just a final comment on the config
```python
self._out_features, self._out_indices = get_aligned_output_features_output_indices(
    out_features=out_features if out_features is not None else getattr(self, "out_features", None),
    out_indices=out_indices if out_indices is not None else getattr(self, "out_indices", None),
    stage_names=self.stage_names,
)
```
I don't see why we need this here - neither `out_features` nor `out_indices` should be already set when initializing the config.
Suggested change:

```python
self._out_features, self._out_indices = get_aligned_output_features_output_indices(
    out_features=out_features,
    out_indices=out_indices,
    stage_names=self.stage_names,
)
```
Ahhh, ok I see what happened. This was a tricky one. The order of the base classes matters here: we had `class ResNetConfig(BackboneConfigMixin, PretrainedConfig)` but `class PvtV2Config(PretrainedConfig, BackboneConfigMixin)`, and thus the latter was not getting the proper `to_dict` method override, and therefore was not saving these fields correctly to JSON, which led to the weirdness during the `__init__` call.
I have fixed this the correct way now in 0e7a054.
Thank you for your scrutiny here; I hadn't realized these values were not being saved properly. For a bit of backstory, I built PvtV2 and was using it before I incorporated the Backbone mixins, so the code you highlighted was a hack that was part of the original work and prevented me from seeing this issue.
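A toy illustration of the MRO point above (hypothetical classes, not the actual transformers code): when two base classes both define `to_dict`, the leftmost one wins.

```python
class Base:
    def to_dict(self):
        return dict(self.__dict__)  # leaves the private "_out_indices" key as-is


class Mixin:
    def to_dict(self):
        d = dict(self.__dict__)
        # serialize private attrs under their public names
        d["out_indices"] = d.pop("_out_indices", None)
        return d


class Good(Mixin, Base):  # Mixin.to_dict wins, like ResNetConfig
    pass


class Bad(Base, Mixin):  # Base.to_dict wins, like the buggy PvtV2Config
    pass


c = Good()
c._out_indices = [2, 3]
print(c.to_dict())  # {'out_indices': [2, 3]}
b = Bad()
b._out_indices = [2, 3]
print(b.to_dict())  # {'_out_indices': [2, 3]}
```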
Thanks for digging into this. Adding this to my to-do list to make sure it's easier to implement / catch.
Thanks for all the work adding this model - looks great! 💪
Hi @FoamoftheSea Thank you for adding this model. Our nightly CI shows 2 failing tests. Could you take a look please? Thank you in advance.
Description
@amyeroberts
Resources
Model paper
Open Source Implementations
Checks