tensor size errors for bce_with_logits loss / net which outputs 1d tensor #373


Closed
tdhock opened this issue Apr 2, 2025 · 6 comments

@tdhock
Contributor

tdhock commented Apr 2, 2025

Hi! @sebffischer

I am trying to implement a neural network with a custom loss function for binary classification.
It works just like the standard torch::nn_bce_with_logits_loss -- https://torch.mlverse.org/docs/reference/nn_bce_with_logits_loss -- which is called as criterion(output, target), where both output and target are 1d tensors.
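For illustration (my addition, not part of the original report), a minimal sketch of that call signature using plain torch tensors:

library(torch)
# bce_with_logits expects raw logits and float targets of the same shape
criterion <- nn_bce_with_logits_loss()
output <- torch_randn(4)               # 1d tensor of logits, shape [4]
target <- torch_tensor(c(0, 1, 1, 0))  # 1d float targets in {0, 1}
criterion(output, target)              # returns a scalar loss tensor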
Here is an MRE.

task_sonar <- mlr3::tsk("sonar")
# cross-entropy loss: trains without error
mlr3torch::LearnerTorchMLP$new(
  task_type="classif",
  loss=torch::nn_cross_entropy_loss
)$configure(
  epochs=2,
  batch_size=5
)$train(task_sonar)
# bce_with_logits loss: raises a tensor size error
mlr3torch::LearnerTorchMLP$new(
  task_type="classif",
  loss=torch::nn_bce_with_logits_loss
)$configure(
  epochs=2,
  batch_size=5
)$train(task_sonar)

I tried the code above, and I observed a tensor size error:

> task_sonar <- mlr3::tsk("sonar")
> mlr3torch::LearnerTorchMLP$new(
+ task_type="classif",
+ loss=torch::nn_cross_entropy_loss
+ )$configure(
+ epochs=2,
+ batch_size=5
+ )$train(task_sonar)
> mlr3torch::LearnerTorchMLP$new(
+ task_type="classif",
+ loss=torch::nn_bce_with_logits_loss
+ )$configure(
+ epochs=2,
+ batch_size=5
+ )$train(task_sonar)
Erreur dans (function (self, target, weight, pos_weight, reduction)  : 
  The size of tensor a (5) must match the size of tensor b (2) at non-singleton dimension 1
Exception raised from infer_size_impl at ../aten/src/ATen/ExpandUtils.cpp:31 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xb0 (0x79d326994120 in /home/local/USHERBROOKE/hoct2726/lib/R/library/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x79d326937a5a in /home/local/USHERBROOKE/hoct2726/lib/R/library/torch/lib/libc10.so)
frame #2: at::infer_size_dimvector(c10::ArrayRef<long>, c10::ArrayRef<long>) + 0x3d4 (0x79d311ec39a4 in /home/local/USHERBROOKE/hoct2726/lib/R/library/torch/lib/libtorch_cpu.so)
frame #3: at::TensorIteratorBase::compute_shape(at::TensorIteratorConfig const&) + 0xc0 (0x79d311f7eab0 in /home/local/USHERBROOKE/ho

In the output above we see that loss=torch::nn_cross_entropy_loss works without error, but loss=torch::nn_bce_with_logits_loss gives a tensor size error.

I guess this is because the MLP learner always outputs a matrix with 2 columns? (Even for the binary case, where it could output 1 column or just a 1d tensor instead.)

Is it possible in mlr3torch to have a neural network that outputs a 1d tensor instead?
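A quick way to check this guess would be to call the trained network directly; a hypothetical sketch (my addition), assuming the trained learner exposes the module as $model$network and that it accepts a plain feature tensor:

learner <- mlr3torch::LearnerTorchMLP$new(
  task_type="classif",
  loss=torch::nn_cross_entropy_loss
)$configure(
  epochs=1,
  batch_size=5
)$train(task_sonar)
net <- learner$model$network
# sonar has 60 numeric features; feed a fake batch of 5 rows
net(torch::torch_randn(5, 60))$shape  # if the guess is right: 5 2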

I tried using the graph learner code below, with an nn_linear layer that has out_features=1.

# pipeline with out_features=1 and cross-entropy loss
ce_po_list <- list(
  mlr3pipelines::po(
    "select",
    selector = mlr3pipelines::selector_type(c("numeric", "integer"))),
  mlr3torch::PipeOpTorchIngressNumeric$new(),
  mlr3pipelines::po(
    "nn_linear",
    out_features=1),
  mlr3pipelines::po(
    "torch_loss",
    torch::nn_cross_entropy_loss),
  mlr3pipelines::po(
    "torch_optimizer",
    mlr3torch::t_opt("sgd", lr=0.1)),
  mlr3pipelines::po(
    "torch_model_classif",
    batch_size = 5,
    epochs = 2)
)
ce_graph_obj <- Reduce(mlr3pipelines::concat_graphs, ce_po_list)
ce_graph_learner <- mlr3::as_learner(ce_graph_obj)
ce_graph_learner$train(task_sonar)

# same pipeline, but with bce_with_logits loss
bce_po_list <- list(
  mlr3pipelines::po(
    "select",
    selector = mlr3pipelines::selector_type(c("numeric", "integer"))),
  mlr3torch::PipeOpTorchIngressNumeric$new(),
  mlr3pipelines::po(
    "nn_linear",
    out_features=1),
  mlr3pipelines::po(
    "torch_loss",
    torch::nn_bce_with_logits_loss),
  mlr3pipelines::po(
    "torch_optimizer",
    mlr3torch::t_opt("sgd", lr=0.1)),
  mlr3pipelines::po(
    "torch_model_classif",
    batch_size = 5,
    epochs = 2)
)
bce_graph_obj <- Reduce(mlr3pipelines::concat_graphs, bce_po_list)
bce_graph_learner <- mlr3::as_learner(bce_graph_obj)
bce_graph_learner$train(task_sonar)

I observe two different errors for the two different loss functions:

> ce_graph_learner$train(task_sonar)
Erreur dans (function (self, target, weight, reduction, ignore_index, label_smoothing)  : 
  Target 2 is out of bounds.
Exception raised from nll_loss_out_frame at ../aten/src/ATen/native/LossNLL.cpp:251 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xb0 (0x79d326994120 in /home/local/USHERBROOKE/hoct2726/lib/R/library/torch/lib/libc10.so)
frame #1: <unknown function> + 0x118ac3a (0x79d31178ac3a in /home/local/USHERBROOKE/hoct2726/lib/R/library/torch/lib/libtorch_cpu.so)
frame #2: at::native::structured_nll_loss_forward_out_cpu::impl(at::Tensor const&, at::Tensor const&, at::OptionalTensorRef, long, long, at::Tensor const&, at::Tensor const&) + 0x779 (0x79d3124c92a9 in /home/local/USHERBROOKE/hoct2726/lib/R/library/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x2eaf9d2 (0x79d3134af9d2 in /home/local/USHERBROOKE/hoct2726/lib/R/library/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x2eafb45 (0x79d3134afb45 in /home/local/USHERBROOKE
> bce_graph_learner$train(task_sonar)
Erreur dans (function (self, target, weight, pos_weight, reduction)  : 
  output with shape [5] doesn't match the broadcast shape [5, 5]
Exception raised from mark_resize_outputs at ../aten/src/ATen/TensorIterator.cpp:1207 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xb0 (0x79d326994120 in /home/local/USHERBROOKE/hoct2726/lib/R/library/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x79d326937a5a in /home/local/USHERBROOKE/hoct2726/lib/R/library/torch/lib/libc10.so)
frame #2: at::TensorIteratorBase::mark_resize_outputs(at::TensorIteratorConfig const&) + 0x21d (0x79d311f7fd0d in /home/local/USHERBROOKE/hoct2726/lib/R/library/torch/lib/libtorch_cpu.so)
frame #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x78 (0x79d311f7fda8 in /home/local/USHERBROOKE/hoct2726/lib/R/library/t

So I guess this means that it is not currently supported?

As a positive control, I tried changing to out_features=2 in the code above, and in that case I observe the same result as with the MLP learner (cross-entropy loss works, error for bce_with_logits_loss).
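For clarity, the only change for this positive control is the out_features argument of the nn_linear PipeOp in the lists above:

mlr3pipelines::po(
  "nn_linear",
  out_features=2)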

@sebffischer
Member

sebffischer commented Apr 2, 2025

Thanks for this question!

There are in principle two solutions:

  1. Create a custom torch::nn_module that operates on the 2D tensors: the loss function basically just reshapes the response prediction to the correct format and casts the target from int to the required float.
  2. Integrate the reshaping into the graph. You can currently use po("nn_reshape") or nn("reshape"), which is equivalent to po("nn_reshape", id = "reshape"), for this. Note that we are also currently adding a PipeOp that allows integrating custom R functions into the module, so you could soon add something like nn("fn", fn = function(x) x$reshape(-1)) into the graph, which should be convenient (there is an open PR for this).
    Unfortunately, this does not currently quite work, because mlr3torch loads the targets as torch_Ints and currently has no easy way to modify this (it would be possible via a callback, but it should be simpler). So maybe this requires some more user control.
library(mlr3torch)
#> Loading required package: mlr3
#> Loading required package: mlr3pipelines
#> Loading required package: torch

# currently requires adding class "nn_loss" because as_torch_loss only has a method for this class

nn_bce_loss2 = nn_module(c("nn_bce_with_logits_loss2", "nn_loss"),
  initialize = function(weight = NULL, reduction = "mean", pos_weight = NULL) {
    self$loss = nn_bce_with_logits_loss(weight, reduction, pos_weight)
  },
  forward = function(input, target) {
    # EDIT: added the -1 after comment by tdhock
    self$loss(input$reshape(-1), target$to(dtype = torch_float()) - 1)
  }
)

loss = nn_bce_loss2()

loss(torch_randn(10, 1), torch_randint(0, 1, 10))
#> torch_tensor
#> 0.500878
#> [ CPUFloatType{} ]

task = tsk("sonar")

graph = po("torch_ingress_num") %>>%
  nn("linear", out_features = 1) %>>%
  po("torch_loss", loss = nn_bce_loss2) %>>%
  po("torch_optimizer") %>>%
  po("torch_model_classif",
    epochs = 1, batch_size = 32
  )

glrn = as_learner(graph)

glrn$train(task)

# this does not work, because the targets are still loaded as ints but bce_loss expects floats

graph2 = po("torch_ingress_num") %>>%
  nn("linear", out_features = 1) %>>%
  nn("flatten", start_dim = 1) %>>%
  po("torch_loss", loss = nn_bce_loss) %>>%
  po("torch_optimizer") %>>%
  po("torch_model_classif",
    epochs = 1, batch_size = 32
  )


glrn2 = as_learner(graph2)

glrn2$train(task)
#> A lot of torch error backtrace
#> This happened PipeOp torch_model_classif's $train()

Created on 2025-04-02 with reprex v2.1.1

@tdhock
Contributor Author

tdhock commented Apr 2, 2025

I confirm this works for me, thanks for the advice!
I did not know that it was possible to output a vector (rather than a matrix) for binary problems, but the reshape inside forward makes sense. Not sure where this should be documented?
Yes, it would be nice to have the target converted to float once at the beginning (instead of doing $to inside forward, which I guess is slower).

@tdhock tdhock closed this as completed Apr 2, 2025
@sebffischer sebffischer reopened this Apr 3, 2025
@sebffischer
Member

Keeping this open as this is at least a TODO for documentation

@tdhock
Contributor Author

tdhock commented Apr 3, 2025

By the way, for others who are trying to do this: the code above runs without error, but does not learn, because bce_with_logits_loss needs target values in {0, 1}, while R factor labels are encoded as 1 and 2.
For learning you have to subtract 1:

nn_bce_loss3 = nn_module(c("nn_bce_with_logits_loss3", "nn_loss"),
  initialize = function(weight = NULL, reduction = "mean", pos_weight = NULL) {
    self$loss = nn_bce_with_logits_loss(weight, reduction, pos_weight)
  },
  forward = function(input, target) {
    self$loss(input$reshape(-1), target$to(dtype = torch_float())-1)
  }
)
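For completeness, a sketch (my addition, analogous to the earlier reprex) dropping nn_bce_loss3 into the same graph:

graph3 = po("torch_ingress_num") %>>%
  nn("linear", out_features = 1) %>>%
  po("torch_loss", loss = nn_bce_loss3) %>>%
  po("torch_optimizer") %>>%
  po("torch_model_classif",
    epochs = 1, batch_size = 32
  )
glrn3 = as_learner(graph3)
glrn3$train(tsk("sonar"))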

@sebffischer
Member

Thanks! I updated the code snippet

@sebffischer
Member

This is now solved, as torch classification learners are now expected to output 1 column for binary classification problems.
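Under that change, the original MRE from the top of this issue should run as-is; a sketch (untested here, assuming the fix behaves as described) reusing the issue's own code:

mlr3torch::LearnerTorchMLP$new(
  task_type="classif",
  loss=torch::nn_bce_with_logits_loss
)$configure(
  epochs=2,
  batch_size=5
)$train(mlr3::tsk("sonar"))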
