Unet decoder upsampling #187
Hello, in this UNet implementation, the upsampling operator of the decoder is a nearest-neighbor interpolation using the PyTorch functional API, see: https://github.com/qubvel/segmentation_models.pytorch/blob/a3cc9ac7b3430967414506f37c5f043e1ba3397f/segmentation_models_pytorch/unet/decoder.py#L36

By default, the center block is disabled, see:

Regarding the downsampling operators, you cannot tell how they are used by looking at the

Finally, in this library,

EDIT: The "implicit" definition of a |
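For intuition, the nearest-neighbor mode used by the decoder (`F.interpolate(x, scale_factor=2, mode="nearest")`) simply repeats each value along both spatial axes. Here is a pure-Python sketch of that behavior; the helper name is hypothetical and not part of the library:

```python
def nearest_upsample_2d(grid, scale=2):
    """Repeat each element `scale` times along both axes,
    mimicking nearest-neighbor upsampling of a 2D feature map."""
    out = []
    for row in grid:
        up_row = [v for v in row for _ in range(scale)]
        out.extend([up_row[:] for _ in range(scale)])
    return out

# A 2x2 map becomes 4x4, each value duplicated in a 2x2 block:
# [[1, 2],      [[1, 1, 2, 2],
#  [3, 4]]  ->   [1, 1, 2, 2],
#                [3, 3, 4, 4],
#                [3, 3, 4, 4]]
print(nearest_upsample_2d([[1, 2], [3, 4]]))
```

The blocky output is why nearest-neighbor upsampling alone produces staircase artifacts; in the U-Net decoder, the following convolutions smooth those out.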
Hi Howard, for semantic segmentation, should the centre block be activated? I have not activated it and my results seem to be very good (dice loss of 0.20 and IoU score of 0.767). With regards to:

How do we modify this spatial resolution? And finally, when using the segmentation head, my segmentation head is using an identity upsample rather than nn.UpsamplingBilinear2d. How do I set upsampling > 1? Is it even necessary? |
@howard-mahe Sorry to hijack the thread, I had a question regarding the downsampling part of the encoder. I'm trying to teach my network to recognize small features that quickly disappear after 2 or 3 downsamplings. In order to improve my IoU, I was wondering if it would make sense to increase the feature channels of the first layers of the encoder. Do you have any literature to recommend on this topic? |
@DamienLopez1 I am not sure I got your question regarding the modification of the spatial resolution. Actually, all the encoders are already defined so that each

Regarding the upsampling, what do you mean by "identity" upsampling? Do you mean nearest-neighbor upsampling? If you do not upsample the coarse features, you won't be able to fuse features from different stages.

Hi @JulienMaille. Because of the pooling operators (or convolutions with stride=2), some content might indeed disappear from the feature maps after a few stages. Deep backbones are designed to capture both local and global information, hence the theoretical receptive field of a resnet is generally above 200x200. If you are interested in small features, I imagine you do not want to increase the receptive field of your network. So first, do not use pooling operators, and second, do not make the network too deep. In the literature, Wu et al. have applied such principles to VGG to detect manipulations in images: Wu, Y., AbdAlmageed, W., & Natarajan, P. (2019). ManTra-Net: Manipulation tracing network for detection and localization of image forgeries with anomalous features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 9543-9552). (link). Hence the authors maintained the receptive field of their "fully convolutional VGG" at 23x23. |
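The receptive-field arithmetic behind this advice follows the standard recurrence: each layer grows the receptive field by (kernel − 1) times the product of the strides so far. A small sketch (the helper is hypothetical, not code from the library):

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, applied in order.
    Returns the theoretical receptive field of one output unit."""
    rf = 1      # receptive field so far
    jump = 1    # distance, in input pixels, between adjacent units at this depth
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Eleven stride-1 3x3 convs keep the receptive field small: 1 + 11*2 = 23.
print(receptive_field([(3, 1)] * 11))  # → 23
# The same depth with stride-2 layers makes it explode instead.
print(receptive_field([(3, 2)] * 11))  # → 4095
```

This illustrates the point above: a shallow, pooling-free stack of 3x3 convolutions can hold the receptive field at something like 23x23, while strided layers blow it up geometrically.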
Thanks Howard, that helps a lot. Regarding the identity upsampling, this is my model's segmentation head:

The upsampling is set to identity. Why would this be? |
In

```python
upsampling = nn.UpsamplingBilinear2d(scale_factor=upsampling) if upsampling > 1 else nn.Identity()
```

The segmentation head of |
Thanks a lot, Howard. I get it now! |
@JulienMaille By pooling operators I mean any operator with a stride not equal to 1. There are five of them in any resnet, in order to reduce the spatial resolution by a factor of 1/32 at the end of the encoder. For your very specific use case, I imagine you probably don't need to reduce the spatial resolution at all (just like in ManTra-Net) in order to maintain a small receptive field. Here follows, for instance, the implementation of the VGG in ManTra-Net:

```python
from torchvision.models.vgg import cfgs, make_layers

# Remove pooling operators ('M' entries) from cfgs
cfgs = {name: list(filter(lambda a: a != 'M', cfg)) for name, cfg in cfgs.items()}

vgg11 = make_layers(cfgs['A'])
vgg13 = make_layers(cfgs['B'])
vgg16 = make_layers(cfgs['D'])
vgg19 = make_layers(cfgs['E'])
```

If you have a lot of training data (1M+), you will be able to train such a network from scratch. If you don't, well... you will have a hard time, because pre-trained deep neural nets are designed with pooling operators which aim at aggregating information over a large receptive field. You cannot use those pre-trained models if you remove the pooling ops, except for the very first stage of pre-trained models, which works at full resolution. |
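To see why dropping the 'M' entries preserves resolution, one can trace the spatial size through a VGG cfg list: the padded 3x3 stride-1 convs keep the size, while each 'M' max-pool halves it. A small sketch under that assumption (the cfg literal is copied from torchvision's VGG11 cfg 'A' so the snippet stays self-contained):

```python
# torchvision's VGG11 cfg: ints are 3x3 stride-1 convs, 'M' is a 2x2 max-pool.
VGG11_CFG = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']

def out_spatial(size, cfg):
    """Spatial size after a VGG cfg: padded convs keep it, 'M' pools halve it."""
    for entry in cfg:
        if entry == 'M':
            size //= 2
    return size

print(out_spatial(224, VGG11_CFG))                           # → 7   (1/32 resolution)
print(out_spatial(224, [c for c in VGG11_CFG if c != 'M']))  # → 224 (full resolution)
```

With the five pools in place, a 224x224 input ends at 7x7; with them filtered out, as in the ManTra-Net-style snippet above, the feature maps stay at full resolution throughout.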
Maybe this should be another issue, but could I maybe get an explanation as to why Dice loss works as a loss function for Unet, rather than the softmax with BCE loss as stated in the Unet paper? I am doing a background vs. single-class segmentation and am using a sigmoid activation. I am thinking that because it's single-class with sigmoid, this is why Dice loss is permissible? Am I on the right track? |
You are right, this should be another issue. Well, I am not an expert on binary losses. But the point is that

Regarding softmax vs. sigmoid: for binary segmentation tasks, you are better off with the sigmoid, predicting one heat map, while the softmax must be used for multi-class segmentation tasks. However, the softmax can be used for binary segmentation tasks when the number of classes K is set to 2, but this is a redundant formulation compared to the sigmoid. |
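For reference, the soft Dice loss used for binary segmentation is 1 − 2|A∩B| / (|A| + |B|), computed on the sigmoid probabilities rather than hard masks. A minimal sketch on flattened per-pixel values (helper name hypothetical; the smoothing term guards against empty masks):

```python
def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss for binary segmentation.
    pred: per-pixel probabilities (after sigmoid); target: {0, 1} labels."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + smooth) / (total + smooth)

print(dice_loss([1.0, 0.0, 1.0], [1, 0, 1]))  # → 0.0  (perfect overlap)
print(dice_loss([0.0, 1.0, 0.0], [1, 0, 1]))  # → 0.75 (no overlap)
```

Because it is an overlap ratio, the sigmoid/single-heat-map setup fits it naturally, and it is less sensitive to the foreground/background imbalance than a plain per-pixel BCE.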
Thanks a lot, Howard, that clears things up for me. |
Hi @howard-mahe , have you ever tried to use bilinear interpolation (or trilinear interpolation for 3D input)? In my experiments, bilinear interpolation always performs better than nearest neighbors, and yields smoother masks. (My model is not U-net based, so not totally sure whether this conclusion generalizes to U-net.)
|
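The smoothness difference is easy to see in 1D: nearest-neighbor repeats values into a staircase, while linear interpolation inserts intermediate ones. A toy factor-2 sketch (a simple insert-midpoints variant; torch's exact output lengths and align_corners sampling grid differ):

```python
def nearest_2x(xs):
    """Factor-2 nearest-neighbor upsampling: repeat each value."""
    return [v for v in xs for _ in range(2)]

def linear_2x(xs):
    """Insert the midpoint between each pair of neighbors -- a simple
    stand-in for linear interpolation, not torch's exact grid."""
    out = []
    for a, b in zip(xs, xs[1:]):
        out += [a, (a + b) / 2]
    out.append(xs[-1])
    return out

print(nearest_2x([0, 4, 8]))  # → [0, 0, 4, 4, 8, 8]  (staircase)
print(linear_2x([0, 4, 8]))   # → [0, 2.0, 4, 6.0, 8] (smooth ramp)
```

The same effect in 2D (bilinear) is why interpolated masks come out with smoother boundaries than nearest-neighbor ones.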
Hi,
I am using a Unet model with the encoder set to 'resnet34' and the pretrained weights are imagenet.
When I look at the model, I do not see where the upsampling is occurring. The convolutions on the encoder side are occurring (although the downsampling seemingly occurs after the intended layer, e.g. downsampling from layer 1 to layer 2 only occurs after layer 2); however, I do not see where the upsampling takes place on the decoder side.
There is also the case where I do not see the centre block convolutions occurring.
Could someone please explain where the upsampling occurs?
My model for reference:
resnet34 Unet model.txt