FCOS empty box images #5266
Comments
@barschiiii I also found this issue; we plan to open another PR to handle it.
@xiaohu2015 that's great to hear, I hope I will be able to try it soon :)
@barschiiii
@xiaohu2015 thanks, technically it works!
Thanks for providing this info, it helps with debugging. Could you try @xiaohu2015's patch without them and let us know if it's fixed? Thanks!
@datumbox this is after applying the patch. Do you mean running it without AMP?
Yes, exactly. Using AMP+Adam without gradient clipping can cause instabilities. Running without them will tell us whether the NaNs are caused by a division by zero or by some other instability.
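A minimal plain-Python sketch of the division-by-zero case, assuming the summed loss is averaged over the number of foreground (positive) locations, which is zero for an image without boxes. `normalize_loss` is a hypothetical helper, not torchvision code: the guarded branch clamps the denominator to at least 1 (a common safeguard in detection losses), while the unguarded branch mimics what floating-point tensors return for x/0.

```python
import math


def normalize_loss(total_loss: float, num_foreground: int, guard: bool = True) -> float:
    """Average a summed loss over foreground locations (hypothetical helper).

    With guard=False, an image with no boxes (num_foreground == 0) gives
    0/0 -> NaN (or +/-inf for a non-zero sum), as IEEE-754 tensors would.
    """
    if guard:
        # Clamp the denominator so images without boxes do not blow up.
        return total_loss / max(1, num_foreground)
    if num_foreground == 0:
        # Plain Python raises ZeroDivisionError; emulate tensor semantics instead.
        return math.nan if total_loss == 0 else math.copysign(math.inf, total_loss)
    return total_loss / num_foreground


print(normalize_loss(0.0, 0))               # 0.0
print(normalize_loss(0.0, 0, guard=False))  # nan
```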
I am actually running it with gradient clipping, but I will try without AMP and also with SGD.
SGD with AMP and Adam without AMP both seem to run fine. Adam with AMP and gradient clipping runs into instability (NaN) issues, also with different learning rates.
It might be related to the default initialization of the anchor boxes:
From my understanding, this should be lower if it is meant to match the strides of, e.g., a ResNet-50.
@barschiiii FCOS actually has no anchors, but we borrow the anchor generator from RetinaNet. Here, each anchor corresponds to a cell (grid location) in the feature map, so the anchor size equals the stride of each level of the multi-level feature map. If you want to modify the label assignment, I think you can adjust the
but I think in most cases you don't need to do that.
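To make the "anchor = grid cell" idea concrete, here is a plain-Python sketch for a single feature-map level (not torchvision's AnchorGenerator). Each location gets one square box whose side equals the level's stride, centered at (col * stride, row * stride); that centering convention is an assumption of this sketch, and `grid_cell_anchors` is a hypothetical name.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def grid_cell_anchors(feat_h: int, feat_w: int, stride: int) -> List[Box]:
    """Return one square box of side `stride` per feature-map location."""
    half = stride / 2
    anchors: List[Box] = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Map the feature-map location back to image coordinates.
            cx, cy = col * stride, row * stride
            anchors.append((cx - half, cy - half, cx + half, cy + half))
    return anchors


# A 2x3 feature map at stride 8 yields six 8x8 boxes.
boxes = grid_cell_anchors(2, 3, 8)
print(boxes[0])  # (-4.0, -4.0, 4.0, 4.0)
```

Since the side is the stride itself, the boxes change only when the backbone's strides do.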
But then shouldn't the anchor size be adjusted manually to match the multi-level feature map strides? In the detectron2 implementation, I can see that the strides are computed for each backbone that is passed in.
I might have been wrong; the stability issues still happen. I will explore further, but if anyone has an idea where they could come from, please let me know.
Yes, the anchor size shouldn't be adjusted manually. As you can see, we can get the strides of the backbone's multi-level feature maps from the method
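As a rough sketch of how strides can be read off a backbone instead of being set by hand: divide the (padded) input size by each level's feature-map size. To our understanding, torchvision's anchor generator derives its strides from tensor shapes in a similar way at run time, but `infer_strides` and the P3-P7 shapes below are hypothetical examples.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (height, width)


def infer_strides(image_hw: Size, feature_hws: List[Size]) -> List[Size]:
    """Derive each level's (stride_h, stride_w) from tensor shapes alone."""
    img_h, img_w = image_hw
    return [(img_h // fh, img_w // fw) for fh, fw in feature_hws]


# Hypothetical P3-P7 feature-map sizes for a 512x640 input.
levels = [(64, 80), (32, 40), (16, 20), (8, 10), (4, 5)]
strides = infer_strides((512, 640), levels)
# One square size per level, equal to that level's stride.
anchor_sizes = tuple((s_h,) for s_h, _ in strides)
print(anchor_sizes)  # ((8,), (16,), (32,), (64,), (128,))
```

Note that when the input sides are not divisible by the coarsest stride, integer division over the rounded feature-map sizes can give values that differ from the nominal strides (e.g. 800 // 13 = 61 rather than 64).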
Have you also tested your datasets with detectron2? Stability issues often happen in detection models; maybe you should adjust the training hyperparameters.
After trying many different settings, it seems that the first forward pass in the classification head produces NaNs and causes my instability issues. I could not resolve it yet, even when forcing an fp32 forward pass for this part.
Can you confirm you still face the problem on the latest main branch?
I'm not sure I understand why we use anchors if FCOS doesn't need them. Can you explain a bit more?
I am playing around with the new FCOS models (thanks for that) and am running into issues when providing images without box annotations. Training on such images is a common use case in object detection, and it works for the other detection models in torchvision.
A simple example to replicate:
An indexing error happens in `FCOSHead` when running `compute_loss` in this part:
A workaround seems to be necessary when the targets are empty. I'm happy for any guidance; maybe a different approach is needed for me to train on empty images.
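To illustrate the failure mode in plain Python (not the torchvision code or its fix), assume the matching step gives each location the index of its assigned ground-truth box, or a negative value for background. A gather that clamps negative indices to 0 before indexing works whenever the image has at least one box, but raises an IndexError when the label list is empty; one possible guard is to emit all-background targets for empty images. The function names and the background value -1 are hypothetical.

```python
from typing import List, Sequence

BACKGROUND = -1  # hypothetical "no object" label used only in this sketch


def gather_naive(labels: Sequence[int], matched_idxs: Sequence[int]) -> List[int]:
    # Clamp-then-index gather (negatives would be reset to background later);
    # raises IndexError on an image with no GT boxes.
    return [labels[max(i, 0)] for i in matched_idxs]


def gather_class_targets(labels: Sequence[int], matched_idxs: Sequence[int]) -> List[int]:
    """Per-location class targets; negative matched indices mean background."""
    if len(labels) == 0:
        # Empty image: there is nothing to index, so every location is background.
        return [BACKGROUND] * len(matched_idxs)
    return [labels[i] if i >= 0 else BACKGROUND for i in matched_idxs]


print(gather_class_targets([3, 7], [1, -1, 0]))  # [7, -1, 3]
print(gather_class_targets([], [-1, -1]))        # [-1, -1]
```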
@jdsgomes @xiaohu2015 @zhiqwang
Versions
Torchvision @ master