Finetuning scheduler #115

Merged: 59 commits into Lightning-AI:main on May 12, 2022

Conversation

speediedan (Contributor) commented Dec 1, 2021

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
    Discussed as the appropriate implementation of #10197 and I have updated the FinetuningScheduler PR accordingly.
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

This PR adds a notebook-based tutorial introducing the FinetuningScheduler callback and demonstrating the use of FinetuningScheduler to finetune a small foundational model. As discussed with PL team lead(s), rather than adding the FinetuningScheduler callback directly to PL core, the preferred pattern moving forward will be to register callbacks via a forthcoming API. Until the new API is available, I've included a mock registry for the FinetuningScheduler callback and noted my callback fork of PL as a requirement below to facilitate evaluation of the new tutorial.

I'm scaling back my original FinetuningScheduler PR to include only a couple minor changes that enable user-registered modules like the finetuning_scheduler to function.
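
For context, the core usage pattern the tutorial builds toward looks roughly like the sketch below, based on the snippet reviewed later in this thread. The import path, schedule filename, and Trainer arguments are assumptions on my part (at the time of this PR the callbacks were provided via a fork/mock registry rather than a published package):

```python
import pytorch_lightning as pl

# Import path is an assumption; in this PR the callbacks come from a fork / mock registry.
from finetuning_scheduler import FinetuningScheduler, FTSEarlyStopping, FTSCheckpoint

callbacks = [
    # Thaw parameter groups phase-by-phase according to a user-defined schedule.
    FinetuningScheduler(ft_schedule="ft_schedule.yaml", max_depth=2),  # hypothetical schedule file
    # FTS-aware early-stopping / checkpointing variants used alongside the scheduler.
    FTSEarlyStopping(monitor="val_loss", min_delta=0.001, patience=2),
    FTSCheckpoint(monitor="val_loss", save_top_k=5),
]

trainer = pl.Trainer(max_epochs=10, callbacks=callbacks)
# trainer.fit(model, datamodule=dm)  # model and datamodule are defined in the tutorial
```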

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Yes :)
Make sure you had fun coding 🙃

# final phase of the schedule has its stopping criteria met. See
# the [early stopping documentation](https://pytorch-lightning.readthedocs.io/en/latest/extensions/generated/pytorch_lightning.callbacks.EarlyStopping.html) for more details on that callback's configuration.
#
# <img src="fts_explicit_loss_anim.gif" width="376px" height="272px">
Member:

Let's use Markdown formatting.

speediedan (author):

Done! The picture is a bit bigger than the size I was overriding it to with the HTML, but I think it still looks fine.

Member:

Well, the advantage of MD formatting is that for publication we inline it in the notebooks, so they become standalone with full illustrations...

speediedan (author):

True, that's definitely nice. Would you like me to switch the remaining two illustrations I have to pure MD as well? They look a bit less appealing without the HTML massaging, unfortunately. I noticed the UvA-DL tutorials had some HTML img tags, I'm assuming for the same reason, but if you think it's a worthwhile trade-off I can switch them to pure MD.

# %% [markdown]
# <div class="alert alert-warning">
#
# **Note:** Currently, _FinetuningScheduler_ only supports the following _TrainingTypePlugins_:
Contributor:

Note: this will be renamed to Strategy in v1.7.

speediedan (author):

Good point, I've updated the reference here and will double-check whether there are other references internal to the finetuning_scheduler module that need the same update.
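
For readers following along, the rename roughly corresponds to how the Trainer is configured; below is a minimal sketch, with the caveat that the exact Lightning versions in which each argument is available is an assumption on my part:

```python
from pytorch_lightning import Trainer

# Pre-rename era: distributed behavior was configured via TrainingTypePlugins
# (e.g. passing a DDPPlugin instance to `plugins=`) -- shown here only as a comment.

# Newer API: the same concept is exposed as a Strategy via the `strategy` argument.
trainer = Trainer(accelerator="gpu", devices=2, strategy="ddp")
```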


# %%
# a couple helper functions to prepare code to work with the forthcoming hub and user module registry
MOCK_HUB_REGISTRY = _Registry()
Contributor:

Let's drop all references to the hub, this isn't public knowledge yet :)

speediedan (author):

What hub? 😉 Renamed to a generic mock registry.
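
For illustration, the generic mock registry referenced here could be as small as the dict-backed sketch below; the class and method names are my own and may differ from what the notebook actually uses:

```python
from typing import Type


class _Registry(dict):
    """Minimal dict-backed registry for user-provided callback classes (illustrative only)."""

    def register_classes(self, *classes: Type) -> None:
        for cls in classes:
            self[cls.__name__] = cls


MOCK_REGISTRY = _Registry()
# e.g. MOCK_REGISTRY.register_classes(FinetuningScheduler, FTSEarlyStopping, FTSCheckpoint)
# callback_cls = MOCK_REGISTRY["FinetuningScheduler"]
```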

DEFAULT_TASK = "rte"

# narrow our logging to adapt it for a notebook environment
for l_key in logging.Logger.manager.loggerDict.keys():
Contributor:

This might require extra explanation. IMO, I would remove it; it makes the explanation more complex.

speediedan (author):

Good point, code clarity >> shorter logs 😄 I've removed the log massaging.
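
For readers curious about what was removed, the log narrowing amounted to something like the sketch below (the specific logger names and level are my assumptions, not the exact removed code):

```python
import logging

# Quiet noisy third-party loggers so notebook output stays readable.
for l_key in list(logging.Logger.manager.loggerDict.keys()):
    if any(name in l_key for name in ("transformers", "datasets")):
        logging.getLogger(l_key).setLevel(logging.WARNING)
```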

callbacks=callbacks,
logger=logger,
)
trainer.fit(model, datamodule=dm)
Contributor:

Did you do a full training? We could upload the weights and curves to s3. It would be neat to see naive fine-tuning vs the tailored one in terms of performance on SuperGLUE.
Right now I've learned how to use the scheduler, but I don't yet see its benefits.

speediedan (author):

Yes, it is a full training, though only of a small model (albert-base), and I'm definitely happy to upload the produced checkpoint.

I've got a full comparison of "nofts_baseline" vs "implicit_mode" vs "explicit_mode" finetuning scenarios (w/ identical parameters other than thaw schedule) available on tensorboard.dev and documented in RST in the current finetuning_scheduler_module documentation. I'm planning to include the comparison of scenarios in whatever form the new user-registered module documentation takes.
[screenshots: tensorboard comparison of the three finetuning scenarios]

I included a "nofts_baseline" vs "implicit_mode" vs "explicit_mode" comparison in an earlier version of this notebook but felt like the additional training session executions consumed too much compute and that it might be clearer just to demonstrate the primary usage pattern of FinetuningScheduler (an explicit user-defined schedule).

Maybe linking in the example notebook to the latest tensorboard.dev comparison of the scenarios would be useful? I can go back to including training all three scenarios in the notebook if you prefer but it's a lot IMHO.

As far as uploading the weights for experimentation, can I just drop them on Google Drive? I see the awesome UvA tutorials use GitHub for saved_models, but I think there are some limitations to posting the weights there that make Google Drive preferable.

Thanks for being generous with your time and all the great feedback!

Contributor:

Hey @speediedan,

Thanks for the hard work. This is starting to look really great!

I would prefer to actually include the 3 runs with their logs and checkpoints. This brings much more value to your presentation, as readers would be able to relate to the actual experiments.

So, let's upload the weights and logs. You can use Google Drive or we can upload them to our s3 bucket. It is up to you :)

Furthermore, it would be great to see inference using trainer.predict on some examples with all 3 models, especially if the last one produces slightly better predictions!

Getting close to the finish line :)
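
For reference, the kind of spot-check inference suggested here might look roughly like the sketch below; the checkpoint paths are placeholders, `trainer`, `model`, and `dm` come from the notebook, and the availability of a predict dataloader on the tutorial's datamodule is an assumption:

```python
# Compare predictions from the three trained checkpoints (paths are placeholders).
scenario_ckpts = {
    "nofts_baseline": "nofts_baseline.ckpt",
    "fts_implicit": "fts_implicit.ckpt",
    "fts_explicit": "fts_explicit.ckpt",
}
for scenario, ckpt in scenario_ckpts.items():
    preds = trainer.predict(model, datamodule=dm, ckpt_path=ckpt)
    print(scenario, preds[0])
```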

speediedan (author):

In addition to the recommended changes, I've made a number of expository enhancements and improvements to reproducibility (those went into the full DDP Lightning CLI version, to keep the tutorial itself as simple as possible). For example, you can see salient library versions logged in the tensorboard summary hparams:
[screenshot: tensorboard hparams summary showing library versions]

Updates in summary:

  1. Added nofts_baseline and fts_implicit mode scenarios with full tensorboard experiment summaries, log files and associated checkpoints. I think it makes sense to move the checkpoints to the PL s3 bucket, since I believe the bandwidth quotas on Google Drive will be lower than those of your s3 bucket (not that we're in danger of hitting them, but just in case).
  2. Clarified the value conferred by using FinetuningScheduler and demonstrated some of its model-exploration utility.
  3. Switched to using the recently released DeBERTaV3 model from Microsoft (using the smaller base version to allow running on modest GPUs).

Though performance on RTE with the fts_explicit scenario is on the order of a few % better than the naive nofts_baseline, and a few examples can be found that the explicit model classifies correctly while the nofts_baseline model doesn't, I don't think showing those few classification examples would be especially illuminating. While you can squeeze out a few additional percentage points of performance with FinetuningScheduler in many contexts, I would prefer to keep the focus on the model research/exploration benefits if that's okay with you. Hope you like the improvements! There will likely be some additional changes, but I know we're getting there. Thanks again for all your work and leadership!

Contributor:

Hey @speediedan, Yes, that's entirely fine and I think it is better to keep your vision in this notebook.

@speediedan speediedan force-pushed the finetuning_scheduler branch 2 times, most recently from 80c1c55 to dc2d25f Compare December 2, 2021 23:12
labels = batch["labels"]
return {"loss": val_loss, "preds": preds, "labels": labels}

def validation_epoch_end(self, outputs):
Contributor:

I would actually move all logging directly into validation_step, so you don't even need to return anything. With a large dataset, accumulating these outputs could cause an OOM.
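
The pattern being suggested is roughly the one sketched below; the class name, output unpacking, and metric attribute are assumptions based on a typical GLUE fine-tuning module rather than the exact code in this notebook:

```python
import pytorch_lightning as pl
import torch


class GLUETransformer(pl.LightningModule):  # class name assumed
    ...

    def validation_step(self, batch, batch_idx):
        outputs = self(**batch)
        val_loss, logits = outputs[:2]
        preds = torch.argmax(logits, dim=1)
        # Log directly here instead of returning outputs for validation_epoch_end;
        # with a large validation set, accumulating per-step outputs can cause OOM.
        self.log("val_loss", val_loss, prog_bar=True)
        self.val_acc.update(preds, batch["labels"])  # assumed torchmetrics metric
```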

callbacks = [
FinetuningScheduler(ft_schedule=ft_schedule_name, max_depth=2), # type: ignore # noqa
FTSEarlyStopping(monitor="val_loss", min_delta=0.001, patience=2), # type: ignore # noqa
FTSCheckpoint(monitor="val_loss", save_top_k=5), # type: ignore # noqa
Contributor:

Mind adding a comment above to explain this?
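
Something along these lines presumably addresses the request, i.e. the snippet above preceded by a short explanatory comment (the comment wording is mine):

```python
# FinetuningScheduler thaws parameter groups phase-by-phase according to the provided
# schedule; the FTS-aware EarlyStopping and Checkpoint variants are used so that
# early-stopping and checkpointing decisions stay coordinated with the current phase.
callbacks = [
    FinetuningScheduler(ft_schedule=ft_schedule_name, max_depth=2),
    FTSEarlyStopping(monitor="val_loss", min_delta=0.001, patience=2),
    FTSCheckpoint(monitor="val_loss", save_top_k=5),
]
```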

@speediedan speediedan force-pushed the finetuning_scheduler branch from 24ff01c to f5a59a5 Compare December 8, 2021 21:22
@Borda Borda added the Example Example / Demo / Tutorial label Dec 13, 2021
@tchaton (Contributor) left a comment:

Great progress.

print(f"Imported and registered the following callbacks: {registered_list}")


def instantiate_registered_class(init: Dict[str, Any], args: Optional[Union[Any, Tuple[Any, ...]]] = None) -> Any:
Contributor:

Just a personal feeling, but I think instantiate_registered_class makes the tutorial more opaque to readers.

speediedan (author):

Great suggestion. I think leaving this in the LightningCLI example in the module remains worthwhile but it was just adding an unnecessary layer of abstraction for this example. Thanks for all your valuable insight/feedback!
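
For anyone curious what such a helper typically looks like, it is usually a thin wrapper around dynamic import; a sketch below, under the assumption that `init` is a LightningCLI-style dict with `class_path` and `init_args` keys:

```python
import importlib
from typing import Any, Dict, Optional, Tuple, Union


def instantiate_registered_class(init: Dict[str, Any], args: Optional[Union[Any, Tuple[Any, ...]]] = None) -> Any:
    """Dynamically import and instantiate a class from a LightningCLI-style config dict (illustrative sketch)."""
    class_module, class_name = init["class_path"].rsplit(".", 1)
    kwargs = init.get("init_args", {})
    cls = getattr(importlib.import_module(class_module), class_name)
    if args is None:
        return cls(**kwargs)
    args = args if isinstance(args, tuple) else (args,)
    return cls(*args, **kwargs)
```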

@speediedan speediedan force-pushed the finetuning_scheduler branch from 8c01fd2 to 1d6ef3f Compare April 19, 2022 21:23
@Borda Borda force-pushed the main branch 2 times, most recently from b6526bc to db5de3a Compare April 22, 2022 02:48
@Borda (Member) commented Apr 22, 2022

@rohitgr7 could you please check/review this tutorial? 🐰

@Borda Borda requested review from krshrimali and rohitgr7 and removed request for aribornstein April 22, 2022 05:14
@rohitgr7 rohitgr7 assigned rohitgr7 and unassigned tchaton Apr 22, 2022
@rohitgr7 (Contributor) left a comment:

Looks great so far.
The only major point I need to clarify is how the callback is related to the accelerator, since there are some print statements regarding that.

@rohitgr7 (Contributor) left a comment:

awesome!

@Borda Borda enabled auto-merge (squash) May 12, 2022 11:31
@Borda Borda disabled auto-merge May 12, 2022 11:32
@Borda Borda merged commit c478057 into Lightning-AI:main May 12, 2022
@speediedan speediedan deleted the finetuning_scheduler branch June 10, 2022 20:02
Labels: Example / Demo / Tutorial
4 participants