
FCOS empty box images #5266


Open
barschiiii opened this issue Jan 24, 2022 · 18 comments

Comments

@barschiiii

barschiiii commented Jan 24, 2022

I am playing around with the new FCOS models (thanks for those) and am running into issues when providing images without box annotations. This is a common use case in object detection, and it works for the other detector models in torchvision.

A simple example to replicate:

import torch
from torchvision.models.detection import fcos_resnet50_fpn

model = fcos_resnet50_fpn(pretrained=True)
model(torch.zeros((1, 3, 512, 512)), targets=[{"boxes": torch.empty(0, 4), "labels": torch.empty(0, 1).to(torch.int64)}])

An indexing error happens in FCOSHead when running compute_loss in this part:

all_gt_classes_targets = []
all_gt_boxes_targets = []
for targets_per_image, matched_idxs_per_image in zip(targets, matched_idxs):
    gt_classes_targets = targets_per_image["labels"][matched_idxs_per_image.clip(min=0)]
    gt_classes_targets[matched_idxs_per_image < 0] = -1  # background
    gt_boxes_targets = targets_per_image["boxes"][matched_idxs_per_image.clip(min=0)]
    all_gt_classes_targets.append(gt_classes_targets)
    all_gt_boxes_targets.append(gt_boxes_targets)

A workaround seems to be necessary when targets are empty. Happy for any guidance; maybe there is also a different way I should be training on images without boxes.
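One temporary workaround is to simply not feed empty-target images to the model at all (which of course gives up those negatives). A rough sketch, with a hypothetical SkipEmptyTargets wrapper around an arbitrary detection dataset:

import torch
from torch.utils.data import Dataset

class SkipEmptyTargets(Dataset):
    # Wraps a detection dataset and resamples a random index whenever a
    # sample carries no box annotations.
    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image, target = self.dataset[idx]
        while target["boxes"].numel() == 0:
            idx = torch.randint(len(self.dataset), (1,)).item()
            image, target = self.dataset[idx]
        return image, target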

@jdsgomes @xiaohu2015 @zhiqwang

Versions

Torchvision @ master

@xiaohu2015
Contributor

@barschiiii I also found this issue; we plan to add another PR to handle it.

@barschiiii
Author

@xiaohu2015 that's great to hear, hope I will be able to try it soon :)

@xiaohu2015
Contributor

xiaohu2015 commented Jan 24, 2022

@barschiiii
Hi, we fixed this bug in #5267.

@barschiiii
Author

@xiaohu2015 thanks - technically it works!
However, I am running into NaN loss after a short period of time. Not sure whether this is related; I will try to explore.
Using AMP mixed precision and Adam.

@datumbox
Contributor

Using AMP mixed precision and Adam.

Thanks for providing this info, it helps with debugging. Could you try @xiaohu2015's patch without them and let us know if it's fixed? Thanks!

@barschiiii
Author

@datumbox this is after applying the patch - do you mean running it without AMP?

@datumbox
Contributor

Yes, exactly. Using AMP + Adam without gradient clipping can cause instabilities. Running without them will tell us whether the NaNs are caused by division by zero or by some other instability.
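For reference, a minimal sketch of AMP + Adam with gradient clipping (assuming model, optimizer, and data_loader are defined elsewhere); the gradients have to be unscaled before clipping for the threshold to be meaningful:

import torch

scaler = torch.cuda.amp.GradScaler()

for images, targets in data_loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss_dict = model(images, targets)
        loss = sum(loss_dict.values())
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)  # unscale first so the clip threshold applies to the true gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    scaler.step(optimizer)
    scaler.update()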

@barschiiii
Author

barschiiii commented Jan 24, 2022

I am actually running it with gradient clipping, but will try without AMP and also with SGD.

@barschiiii
Author

SGD with AMP and Adam without AMP both seem to run fine. Adam with AMP and gradient clipping runs into instability (NaN) issues, even with different learning rates.

@barschiiii
Author

It might be related to the default initialization of anchor boxes:

if anchor_generator is None:
    anchor_sizes = ((8,), (16,), (32,), (64,), (128,))  # equal to strides of multi-level feature map
    aspect_ratios = ((1.0,),) * len(anchor_sizes)  # set only one anchor
    anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)

From my understanding, these sizes should be lower if they are supposed to match the strides of, e.g., a ResNet-50 backbone.
And indeed, if I lower them, I avoid the NaN losses.
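For reference, this is roughly how the defaults can be overridden (a sketch; the sizes below are illustrative only, and anchor_generator is assumed to be forwarded through fcos_resnet50_fpn's kwargs to the FCOS constructor):

from torchvision.models.detection import fcos_resnet50_fpn
from torchvision.models.detection.anchor_utils import AnchorGenerator

anchor_sizes = ((4,), (8,), (16,), (32,), (64,))   # illustrative values, one size per FPN level
aspect_ratios = ((1.0,),) * len(anchor_sizes)      # FCOS expects a single anchor per location
anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)

model = fcos_resnet50_fpn(pretrained=False, num_classes=91, anchor_generator=anchor_generator)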

@xiaohu2015
Contributor

xiaohu2015 commented Jan 25, 2022

@barschiiii There are actually no anchors in FCOS, but we borrow the anchor machinery from RetinaNet. Here, an anchor corresponds to a cell (grid point) of the feature map, so the anchor size equals the stride of the corresponding level of the multi-level feature map. If you want to modify the label assignment, I think you can adjust the center_sampling_radius, or the lower and upper bounds:

            # each anchor is only responsible for certain scale range.
            lower_bound = anchor_sizes * 4
            lower_bound[: num_anchors_per_level[0]] = 0
            upper_bound = anchor_sizes * 8
            upper_bound[-num_anchors_per_level[-1] :] = float("inf")

But I think, in most cases, you don't need to do that.
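In case you do want to try it, a sketch of adjusting the sampling radius (assuming center_sampling_radius is forwarded through the builder's kwargs to the FCOS constructor; the default is 1.5, and a larger radius marks more feature-map locations as positive for each ground-truth box):

from torchvision.models.detection import fcos_resnet50_fpn

# Hypothetical value; tune for your dataset rather than copying it.
model = fcos_resnet50_fpn(pretrained=False, num_classes=91, center_sampling_radius=2.0)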

@barschiiii
Author

But shouldn't the anchor sizes then be adjusted manually to match the multi-level feature map strides? I can see in the detectron2 implementation that they calculate the strides each time for whatever backbone is passed.
Here the anchor sizes are hardcoded, and I am wondering how these hard-coded values were decided; the defaults seem not right to me.

@barschiiii
Author

I might have been wrong; the stability issues still happen. I will explore further, but if anyone has an idea where they could come from, please let me know.

@xiaohu2015
Contributor

But shouldn't the anchor sizes then be adjusted manually to match the multi-level feature map strides? I can see in the detectron2 implementation that they calculate the strides each time for whatever backbone is passed.
Here the anchor sizes are hardcoded, and I am wondering how these hard-coded values were decided; the defaults seem not right to me.

Yes, ideally the anchor sizes shouldn't have to be adjusted manually. As you say, in detectron2 the strides of the multi-level feature maps can be obtained from the backbone's output_shape method, but torchvision does not implement that interface, so the default anchor sizes are hardcoded.
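If you want to avoid hardcoding, one workaround is to measure the strides yourself. A rough sketch, not an official torchvision API (note that resnet_fpn_backbone's default returned layers differ from the ones the FCOS builder uses, so this only illustrates the idea):

import torch
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.models.detection.anchor_utils import AnchorGenerator

# Infer per-level strides by running a dummy image through an FPN backbone and
# comparing the input size with each feature map size, similar in spirit to
# detectron2's output_shape().
backbone = resnet_fpn_backbone("resnet50", pretrained=False)
image_size = 512
features = backbone(torch.zeros(1, 3, image_size, image_size))
strides = [image_size // f.shape[-1] for f in features.values()]

anchor_sizes = tuple((s,) for s in strides)
aspect_ratios = ((1.0,),) * len(anchor_sizes)
anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)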

@xiaohu2015
Contributor

xiaohu2015 commented Jan 25, 2022

I might have been wrong; the stability issues still happen. I will explore further, but if anyone has an idea where they could come from, please let me know.

Did you also test your datasets with detectron2? Stability issues can often happen in detection models; maybe you should adjust the training hyperparameters.

@barschiiii
Author

After trying a lot of different settings, it seems the first forward pass in the classification head is producing NaNs and causing my instability issues. I could not resolve it so far, even when forcing an fp32 forward pass for that part.
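For anyone trying to reproduce, a forward-hook check like the following can help localize where non-finite values first appear (a sketch; model is assumed to be the FCOS model being trained):

import torch

# Print every module whose output contains non-finite values; the earliest
# name in the printout is where the NaNs/Infs first show up.
def make_nan_hook(name):
    def hook(module, inputs, output):
        if isinstance(output, dict):
            outs = list(output.values())
        elif isinstance(output, (list, tuple)):
            outs = list(output)
        else:
            outs = [output]
        for t in outs:
            if torch.is_tensor(t) and not torch.isfinite(t).all():
                print(f"non-finite output in {name}")
    return hook

handles = [m.register_forward_hook(make_nan_hook(n)) for n, m in model.named_modules()]
# ... run a single training step here, then inspect the printed module names ...
for h in handles:
    h.remove()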

@datumbox
Contributor

Can you confirm you still face the problem on the latest main branch?

@Isalia20

@barschiiii There are actually no anchors in FCOS, but we borrow the anchor machinery from RetinaNet. Here, an anchor corresponds to a cell (grid point) of the feature map, so the anchor size equals the stride of the corresponding level of the multi-level feature map. If you want to modify the label assignment, I think you can adjust the center_sampling_radius, or the lower and upper bounds:

            # each anchor is only responsible for certain scale range.
            lower_bound = anchor_sizes * 4
            lower_bound[: num_anchors_per_level[0]] = 0
            upper_bound = anchor_sizes * 8
            upper_bound[-num_anchors_per_level[-1] :] = float("inf")

But I think, in most cases, you don't need to do that.

I'm not sure I understand why we use anchors if FCOS doesn't need them. Can you explain a bit more?
