Fix/nn transformer block #367

Draft · wants to merge 43 commits into base: main

Changes from all commits · 43 commits
3fca75c
feat(layer): add tokenizers for categoricals and numerics
sebffischer Feb 6, 2025
9ff800a
Update news
sebffischer Feb 6, 2025
64f1b09
both versions of NEWS
cxzhang4 Feb 13, 2025
4bc5446
init
cxzhang4 Feb 14, 2025
8580b29
Merge branch 'main' into fix/nn_transformer_block
cxzhang4 Feb 14, 2025
b7d1f6b
both news
cxzhang4 Feb 14, 2025
91b9792
Merge branch 'feat/reglu-geglu' into fix/nn_transformer_block
cxzhang4 Feb 14, 2025
4ed0fce
conda env for torch 0.13, some comments about logic of old implementa…
cxzhang4 Mar 13, 2025
0ac124f
comments/notes, old implementation
cxzhang4 Mar 14, 2025
1b2b594
comments and such, assertion on new code looks ok
cxzhang4 Mar 14, 2025
b82beeb
copying old style/logic back in
cxzhang4 Mar 14, 2025
8aeac29
comments, move kv_compression to the layer module and out of the bloc…
cxzhang4 Mar 14, 2025
8f71b22
tests pass
cxzhang4 Mar 15, 2025
5851e09
this version passes tests I think
cxzhang4 Mar 15, 2025
d4160ee
factored out head
cxzhang4 Mar 15, 2025
9419328
idk
cxzhang4 Mar 15, 2025
b0bde4e
skeleton code for a mlr3 task
cxzhang4 Mar 15, 2025
6f84354
sketches of graph
cxzhang4 Mar 17, 2025
98ab02a
Copilot PipeOps
cxzhang4 Mar 17, 2025
4ca8640
cls token should be ok
cxzhang4 Mar 17, 2025
7bfd4a1
re-implement feedback from old PR
cxzhang4 Mar 17, 2025
89116e4
added a comment reminder for old feedback
cxzhang4 Mar 17, 2025
d287d45
modified shapes_out for the PipeOps
cxzhang4 Mar 18, 2025
4af8610
graph looks better
cxzhang4 Mar 18, 2025
c341781
idrk
cxzhang4 Mar 18, 2025
f4480a1
CoPilot reglu geglu
cxzhang4 Mar 18, 2025
7335e33
added back first_layer flag
cxzhang4 Mar 20, 2025
91f6049
graph still buggy
cxzhang4 Mar 21, 2025
80179ef
Merge branch 'main' into fix/nn_transformer_block
cxzhang4 Mar 21, 2025
6b90805
idk
cxzhang4 Mar 21, 2025
b1aaca7
merge main
cxzhang4 Apr 10, 2025
fe5eeaa
prototytping
cxzhang4 Apr 10, 2025
579c86f
think the tests for old and refactored look ok
cxzhang4 Apr 10, 2025
93a269c
added query_idx param
cxzhang4 Apr 10, 2025
8d56f83
removed browser statements
cxzhang4 Apr 10, 2025
ac672cf
graph trains
cxzhang4 Apr 10, 2025
5c180ca
need to keep debugging shapes
cxzhang4 Apr 10, 2025
88c5967
need to continue debugging
cxzhang4 Apr 10, 2025
ae6ee0e
more attic stuff (to be deleted), don't drop dimension
cxzhang4 Apr 11, 2025
86c8b90
add test file
cxzhang4 Apr 11, 2025
0e359e2
rename
cxzhang4 Apr 13, 2025
a22403b
more name
cxzhang4 Apr 13, 2025
04e9c0b
use new names in attic code
cxzhang4 Apr 13, 2025
2 changes: 2 additions & 0 deletions DESCRIPTION
@@ -113,6 +113,7 @@ Collate:
'PipeOpTorchAvgPool.R'
'PipeOpTorchBatchNorm.R'
'PipeOpTorchBlock.R'
'PipeOpTorchCLS.R'
'PipeOpTorchCallbacks.R'
'PipeOpTorchConv.R'
'PipeOpTorchConvTranspose.R'
@@ -131,6 +132,7 @@ Collate:
'PipeOpTorchReshape.R'
'PipeOpTorchSoftmax.R'
'PipeOpTorchTokenizer.R'
'PipeOpTorchTransformerLayer.R'
'Select.R'
'TaskClassif_cifar.R'
'TaskClassif_lazy_iris.R'
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -106,6 +106,7 @@ export(PipeOpTorchFlatten)
export(PipeOpTorchFn)
export(PipeOpTorchGELU)
export(PipeOpTorchGLU)
export(PipeOpTorchGeGLU)
export(PipeOpTorchHardShrink)
export(PipeOpTorchHardSigmoid)
export(PipeOpTorchHardTanh)
@@ -133,6 +134,7 @@ export(PipeOpTorchModelRegr)
export(PipeOpTorchOptimizer)
export(PipeOpTorchPReLU)
export(PipeOpTorchRReLU)
export(PipeOpTorchReGLU)
export(PipeOpTorchReLU)
export(PipeOpTorchReLU6)
export(PipeOpTorchReshape)
10 changes: 10 additions & 0 deletions NEWS.md
@@ -56,6 +56,16 @@
* The `dataset` of a learner must no longer return the tensors on the specified `device`,
which allows for parallel dataloading on GPUs.
* `PipeOpBlock` should no longer create ID clashes with other PipeOps in the graph (#260).
Also, the improvement is calculated as the difference between the current and the best score,
not the current and the previous score.
* feat: Added multimodal melanoma and cifar{10, 100} example tasks.
* feat: Added a callback to iteratively unfreeze parameters for finetuning.
* fix: torch learners can now be used with `AutoTuner`.
* feat: Added different learning rate schedulers as callbacks.
* feat: `PipeOpBlock` should no longer create ID clashes with other PipeOps in the graph (#260)
* fix: `device` is no longer part of the `dataset` which allows for parallel dataloading
on GPUs.
* feat: Add tokenizers for numeric and categorical features.

# mlr3torch 0.1.2

118 changes: 118 additions & 0 deletions R/PipeOpTorchActivation.R
@@ -798,3 +798,121 @@ PipeOpTorchGLU = R6Class("PipeOpTorchGLU",
)

register_po("nn_glu", PipeOpTorchGLU)

# ReGLU: split the input in half along the last dimension and gate one half
# with the ReLU of the other (a GLU variant).
reglu = function(x) {
  assert_true(tail(x$shape, 1) %% 2 == 0)
  chunked = x$chunk(2, dim = -1)
  a = chunked[[1]]
  b = chunked[[2]]
  return(a * nnf_relu(b))
}

# GeGLU: like ReGLU, but gate with GELU instead of ReLU.
geglu = function(x) {
  assert_true(tail(x$shape, 1) %% 2 == 0)
  chunked = x$chunk(2, dim = -1)
  a = chunked[[1]]
  b = chunked[[2]]
  return(a * nnf_gelu(b))
}

nn_reglu = nn_module(
"nn_reglu",
forward = function(input) {
return(reglu(input))
}
)

nn_geglu = nn_module(
"nn_geglu",
forward = function(input) {
return(geglu(input))
}
)

#' @title ReGLU Activation Function
#'
#' @description
#' Rectified Gated Linear Unit (ReGLU) activation function. Splits the input in half
#' along the last dimension and multiplies one half by the ReLU of the other,
#' so the output's last dimension is half that of the input.
#' @section Parameters:
#' No parameters.
#' @templateVar id nn_reglu
#' @template pipeop_torch_channels_default
#' @template pipeop_torch
#' @template pipeop_torch_example
#'
#' @export
PipeOpTorchReGLU = R6Class("PipeOpTorchReGLU",
inherit = PipeOpTorch,
public = list(
#' @description Creates a new instance of this [R6][R6::R6Class] class.
#' @template params_pipelines
initialize = function(id = "nn_reglu", param_vals = list()) {
param_set = ps()
super$initialize(
id = id,
param_set = param_set,
param_vals = param_vals,
module_generator = nn_reglu,
tags = "activation"
)
}
),
private = list(
.shapes_out = function(shapes_in, param_vals, task) {
shape = shapes_in[[1L]]
d_new = tail(shape, 1) / 2
if (test_integerish(d_new)) {
shape[length(shape)] = d_new
list(shape)
} else {
stopf("Last dimension of input tensor must be divisible by 2.")
}
}
)
)

register_po("nn_reglu", PipeOpTorchReGLU)

#' @title GeGLU Activation Function
#'
#' @description
#' GELU-based Gated Linear Unit (GeGLU) activation function. Like ReGLU, but gates
#' with GELU instead of ReLU; the output's last dimension is half that of the input.
#' @section Parameters:
#' No parameters.
#' @templateVar id nn_geglu
#' @template pipeop_torch_channels_default
#' @template pipeop_torch
#' @template pipeop_torch_example
#'
#' @export
PipeOpTorchGeGLU = R6Class("PipeOpTorchGeGLU",
inherit = PipeOpTorch,
public = list(
#' @description Creates a new instance of this [R6][R6::R6Class] class.
#' @template params_pipelines
initialize = function(id = "nn_geglu", param_vals = list()) {
param_set = ps()
super$initialize(
id = id,
param_set = param_set,
param_vals = param_vals,
module_generator = nn_geglu,
tags = "activation"
)
}
),
private = list(
.shapes_out = function(shapes_in, param_vals, task) {
shape = shapes_in[[1L]]
d_new = tail(shape, 1) / 2
if (test_integerish(d_new)) {
shape[length(shape)] = d_new
list(shape)
} else {
stopf("Last dimension of input tensor must be divisible by 2.")
}
}
)
)

register_po("nn_geglu", PipeOpTorchGeGLU)
71 changes: 71 additions & 0 deletions R/PipeOpTorchCLS.R
@@ -0,0 +1,71 @@
#' @title PipeOpTorchCLS
#' @description PipeOp that appends a learnable [CLS] token to the input.
#' The token is concatenated at the end of the token dimension (dim 2 of a
#' `(batch, n_tokens, d_token)` tensor), so the number of tokens grows by one.
PipeOpTorchCLS = R6::R6Class("PipeOpTorchCLS",
inherit = PipeOpTorch,
public = list(
    #' @description Create a new instance of this [R6][R6::R6Class] class.
    #' @param id (`character(1)`)\cr
    #'   Identifier of the resulting object.
    #' @param param_vals (named `list()`)\cr
    #'   Parameter values to be set after construction.
initialize = function(id = "cls", param_vals = list()) {
param_set = ps(
d_token = p_uty(custom_check = function(input) {
check_integerish(input, lower = 1L, any.missing = FALSE, len = 1)
}),
initialization = p_fct(levels = c("uniform", "normal"))
)

super$initialize(
id = id,
module_generator = nn_cls_token,
param_vals = param_vals,
param_set = param_set
)
}
),
private = list(
.shapes_out = function(shapes_in, param_vals, task) {
      # TODO: assert on the number of dimensions? This is fine for tabular
      # (batch, n_tokens, d_token) input, but would not generalize to e.g. NLP inputs.
      shapes_out = shapes_in$input
      shapes_out[2] = shapes_out[2] + 1
return(list(shapes_out))
}
)
)
register_po("cls", PipeOpTorchCLS)

initialize_token_ = function(x, d, initialization = "") {
  assert_choice(initialization, c("uniform", "normal"))
  # both schemes are scaled by 1 / sqrt(d)
  d_sqrt_inv = 1 / sqrt(d)
  if (initialization == "uniform") {
    return(nn_init_uniform_(x, a = -d_sqrt_inv, b = d_sqrt_inv))
  } else {
    return(nn_init_normal_(x, std = d_sqrt_inv))
  }
}
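
# Worked example (sketch, not part of the diff): for d = 16, d_sqrt_inv = 0.25,
# so "uniform" draws from U(-0.25, 0.25) and "normal" from N(0, sd = 0.25).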

nn_cls_token = nn_module(
"nn_cls_token",
initialize = function(d_token, initialization) {
self$d_token = d_token
self$weight = nn_parameter(torch_empty(d_token))
self$initialization = initialization
self$reset_parameters()
},
reset_parameters = function() {
initialize_token_(self$weight, d = self$d_token, self$initialization)
},
  expand = function(...) {
    leading_dimensions = list(...)
    if (length(leading_dimensions) == 0) {
      return(self$weight)
    }
    # view the token as (1, ..., 1, d_token), then expand it to the requested
    # leading dimensions
    new_dims = rep(1, length(leading_dimensions) - 1)
    return(self$weight$view(c(new_dims, -1))$expand(c(leading_dimensions, -1)))
  },
  forward = function(input) {
    # append one copy of the token per batch element along the token dimension (dim 2)
    return(torch_cat(list(input, self$expand(input$shape[1], 1)), dim = 2))
  }
)
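
A minimal usage sketch (not part of the diff; assumes the torch package and the module above are loaded), showing the token dimension growing by one:

tok = nn_cls_token(d_token = 4, initialization = "uniform")
x = torch_randn(32, 10, 4)  # (batch, n_tokens, d_token)
out = tok(x)
out$shape                   # 32 11 4: the [CLS] token is appended as the 11th token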