# <div class="alert alert-warning">
#
# **Note:** Currently, _FinetuningScheduler_ only supports the following ``StrategyType``s:
+ #
# - ``DP``
# - ``DDP``
# - ``DDP_SPAWN``
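#
# Selecting one of the supported strategies is simply a matter of passing the corresponding ``strategy`` flag when
# constructing the ``Trainer``. As a minimal sketch (not part of this notebook's training code; the strategy string and
# device count below are illustrative only):
#
# ```python
# import pytorch_lightning as pl
#
# # any of the supported strategies listed above could be substituted here
# trainer = pl.Trainer(strategy="ddp", gpus=2)
# ```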
@@ -600,6 +601,8 @@ def configure_callbacks(self):
enable_progress_bar = False
# %%
+
+
def train() -> None:
    trainer = pl.Trainer(
        enable_progress_bar=enable_progress_bar,
@@ -622,13 +625,15 @@ def train() -> None:
# ### Running the Baseline and Implicit Finetuning Scenarios
#
# Let's now compare our ``nofts_baseline`` and ``fts_implicit`` scenarios with the ``fts_explicit`` one we just ran.
+ #
# We'll need to update our callbacks list, using the core PL ``EarlyStopping`` and ``ModelCheckpoint`` callbacks for the
# ``nofts_baseline`` (which operate identically to their FTS analogs apart from the recursive training support).
# For both core PyTorch Lightning and user-registered callbacks, we can define our callbacks using a dictionary as we do
# with the LightningCLI. This allows us to avoid managing imports and support more complex configuration separated from
# code.
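#
# As a rough sketch of what such a dictionary-driven callback configuration might look like (the ``class_path`` targets
# are real core callbacks, but the ``init_args`` values and the small helper below are illustrative rather than taken
# from this notebook):
#
# ```python
# from importlib import import_module
#
# # example configuration for the ``nofts_baseline`` scenario; values are placeholders
# nofts_callback_cfgs = [
#     {"class_path": "pytorch_lightning.callbacks.EarlyStopping",
#      "init_args": {"monitor": "val_loss", "patience": 2}},
#     {"class_path": "pytorch_lightning.callbacks.ModelCheckpoint",
#      "init_args": {"monitor": "val_loss", "save_top_k": 1}},
# ]
#
#
# def instantiate_callbacks(cfgs):
#     """Build callback instances from LightningCLI-style ``class_path``/``init_args`` dicts."""
#     callbacks = []
#     for cfg in cfgs:
#         module_name, _, class_name = cfg["class_path"].rpartition(".")
#         cls = getattr(import_module(module_name), class_name)
#         callbacks.append(cls(**cfg.get("init_args", {})))
#     return callbacks
#
#
# nofts_callbacks = instantiate_callbacks(nofts_callback_cfgs)
# ```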
- # We'll be using identical callback configurations to the ``fts_explicit`` scenario. Keeping ``max_depth`` for the
- # implicit schedule will limit finetuning to just the last 4 parameters of the model, which is only a small fraction
+ #
+ # Note that we'll be using identical callback configurations to the ``fts_explicit`` scenario. Keeping ``max_depth`` for
+ # the implicit schedule will limit finetuning to just the last 4 parameters of the model, which is only a small fraction
# of the parameters you'd want to tune for maximum performance. Since the implicit schedule is quite computationally
# intensive and most useful for exploring model behavior, leaving ``max_depth`` at 1 allows us to demo implicit mode
# behavior while keeping the computational cost and runtime of this notebook reasonable. To review how a full implicit
@@ -686,8 +691,8 @@ def train() -> None:
# which uses DDP_SPAWN and 1 GPU.
#
# ``FinetuningScheduler`` expands the space of possible finetuning schedules and the composition of more sophisticated schedules can
- # yield mariginal finetuning performance gains. That stated, it should be emphasized the primary utility of ``FinetuningScheduler`` is to grant
- # greater finetuning flexibility for model exploration in research. For example, glancing at DeBERTav3 's implicit training
+ # yield marginal finetuning performance gains. That stated, it should be emphasized that the primary utility of ``FinetuningScheduler`` is to grant
+ # greater finetuning flexibility for model exploration in research. For example, glancing at DeBERTa-v3's implicit training
# run, a critical tuning transition point is immediately apparent:
#
# [](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/#scalars&_smoothingWeight=0&runSelectionState=eyJmdHNfZXhwbGljaXQiOmZhbHNlLCJub2Z0c19iYXNlbGluZSI6ZmFsc2UsImZ0c19pbXBsaWNpdCI6dHJ1ZX0%3D)