Finetuning scheduler #115

Merged: 59 commits into Lightning-AI:main on May 12, 2022

Conversation

speediedan (Contributor) commented Dec 1, 2021

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
    Discussed as the appropriate implementation of #10197 and I have updated the FinetuningScheduler PR accordingly.
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

This PR adds a notebook-based tutorial introducing the FinetuningScheduler callback and demonstrating the use of FinetuningScheduler to finetune a small foundational model. As discussed with PL team lead(s), rather than adding the FinetuningScheduler callback directly to PL core, the preferred pattern moving forward will be to register callbacks via a forthcoming API. Until the new API is available, I've included a mock registry for the FinetuningScheduler callback and noted my callback fork of PL as a requirement below to facilitate evaluation of the new tutorial.

I'm scaling back my original FinetuningScheduler PR to include only a couple minor changes that enable user-registered modules like the finetuning_scheduler to function.
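
For context, the core usage pattern the tutorial builds toward looks roughly like the sketch below, based on the snippet reviewed later in this thread. The import path, schedule filename, and Trainer arguments are assumptions on my part (at the time of this PR the callbacks were provided via a fork/mock registry rather than a published package):

```python
import pytorch_lightning as pl

# Import path is an assumption; in this PR the callbacks come from a fork / mock registry.
from finetuning_scheduler import FinetuningScheduler, FTSEarlyStopping, FTSCheckpoint

callbacks = [
    # Thaw parameter groups phase-by-phase according to a user-defined schedule.
    FinetuningScheduler(ft_schedule="ft_schedule.yaml", max_depth=2),  # hypothetical schedule file
    # FTS-aware early-stopping / checkpointing variants used alongside the scheduler.
    FTSEarlyStopping(monitor="val_loss", min_delta=0.001, patience=2),
    FTSCheckpoint(monitor="val_loss", save_top_k=5),
]

trainer = pl.Trainer(max_epochs=10, callbacks=callbacks)
# trainer.fit(model, datamodule=dm)  # model and datamodule are defined in the tutorial
```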

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Yes :)
Make sure you had fun coding 🙃

# final phase of the schedule has its stopping criteria met. See
# the [early stopping documentation](https://pytorch-lightning.readthedocs.io/en/latest/extensions/generated/pytorch_lightning.callbacks.EarlyStopping.html) for more details on that callback's configuration.
#
# <img src="fts_explicit_loss_anim.gif" width="376px" height="272px">
Member:

Let's use Markdown formatting.

speediedan (author):

Done! The picture is a bit bigger than the size I was overriding it to with the HTML, but I think it still looks fine.

Member:

Well, the advantage of MD formatting is that for publication we inline it in the notebooks, so they become standalone with full illustrations...

speediedan (author):

True, that's definitely nice. Would you like me to switch the remaining two illustrations I have to pure MD as well? They look a bit less appealing without the HTML massaging, unfortunately. I noticed the UvA-DL tutorials had some HTML img tags, I'm assuming for the same reason, but if you think it's a worthwhile trade-off I can switch them to pure MD.

# %% [markdown]
# <div class="alert alert-warning">
#
# **Note:** Currently, _FinetuningScheduler_ only supports the following _TrainingTypePlugins_:
Contributor:

Note: this will be renamed to Strategy in v1.7.

speediedan (author):

Good point, I've updated the reference here and will double-check whether there are other references internal to the finetuning_scheduler module that need the same update.
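
For readers following along, the rename roughly corresponds to how the Trainer is configured; below is a minimal sketch, with the caveat that the exact Lightning versions in which each argument is available is an assumption on my part:

```python
from pytorch_lightning import Trainer

# Pre-rename era: distributed behavior was configured via TrainingTypePlugins
# (e.g. passing a DDPPlugin instance to `plugins=`) -- shown here only as a comment.

# Newer API: the same concept is exposed as a Strategy via the `strategy` argument.
trainer = Trainer(accelerator="gpu", devices=2, strategy="ddp")
```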


# %%
# a couple helper functions to prepare code to work with the forthcoming hub and user module registry
MOCK_HUB_REGISTRY = _Registry()
Contributor:

Let's drop all references to the hub, this isn't public knowledge yet :)

speediedan (author):

What hub? 😉 Renamed to a generic mock registry.
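
For illustration, the generic mock registry referenced here could be as small as the dict-backed sketch below; the class and method names are my own and may differ from what the notebook actually uses:

```python
from typing import Type


class _Registry(dict):
    """Minimal dict-backed registry for user-provided callback classes (illustrative only)."""

    def register_classes(self, *classes: Type) -> None:
        for cls in classes:
            self[cls.__name__] = cls


MOCK_REGISTRY = _Registry()
# e.g. MOCK_REGISTRY.register_classes(FinetuningScheduler, FTSEarlyStopping, FTSCheckpoint)
# callback_cls = MOCK_REGISTRY["FinetuningScheduler"]
```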

DEFAULT_TASK = "rte"

# narrow our logging to adapt it for a notebook environment
for l_key in logging.Logger.manager.loggerDict.keys():
Contributor:

This might require extra explanation. IMO, I would remove it; it makes the explanation more complex.

speediedan (author):

Good point, code clarity >> shorter logs 😄 I've removed the log massaging.
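
For readers curious about what was removed, the log narrowing amounted to something like the sketch below (the specific logger names and level are my assumptions, not the exact removed code):

```python
import logging

# Quiet noisy third-party loggers so notebook output stays readable.
for l_key in list(logging.Logger.manager.loggerDict.keys()):
    if any(name in l_key for name in ("transformers", "datasets")):
        logging.getLogger(l_key).setLevel(logging.WARNING)
```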

callbacks=callbacks,
logger=logger,
)
trainer.fit(model, datamodule=dm)
Contributor:

Did you do a full training? We could upload the weights and curves to s3. It would be neat to see naive fine-tuning vs the tailored one in terms of performance on SuperGLUE.
Right now I've learned how to use the scheduler, but I don't yet see its benefits.

speediedan (author):

Yes, it is a full training, though only of a small model (albert-base), and I'm definitely happy to upload the produced checkpoint.

I've got a full comparison of "nofts_baseline" vs "implicit_mode" vs "explicit_mode" finetuning scenarios (w/ identical parameters other than thaw schedule) available on tensorboard.dev and documented in RST in the current finetuning_scheduler_module documentation. I'm planning to include the comparison of scenarios in whatever form the new user-registered module documentation takes.
[screenshots: tensorboard comparison of the three finetuning scenarios]

I included a "nofts_baseline" vs "implicit_mode" vs "explicit_mode" comparison in an earlier version of this notebook but felt like the additional training session executions consumed too much compute and that it might be clearer just to demonstrate the primary usage pattern of FinetuningScheduler (an explicit user-defined schedule).

Maybe linking in the example notebook to the latest tensorboard.dev comparison of the scenarios would be useful? I can go back to including training all three scenarios in the notebook if you prefer but it's a lot IMHO.

As far as uploading the weights for experimentation, can I just drop them on Google Drive? I see the awesome UvA tutorials use GitHub for saved_models, but I think there are some limitations to posting the weights there that make Google Drive preferable.

Thanks for being generous with your time and all the great feedback!

Contributor:

Hey @speediedan,

Thanks for the hard work. This is starting to look really great!

I would prefer to actually include the 3 runs with their logs and checkpoints. This brings much more value to your presentation, as readers would be able to relate to the actual experiments.

So, let's upload the weights and logs. You can use Google Drive or we can upload them to our s3 bucket. It is up to you :)

Furthermore, it would be great to see inference using trainer.predict on some examples with all 3 models, especially if the last one produces slightly better predictions!

Getting close to the finish line :)
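
For reference, the kind of spot-check inference suggested here might look roughly like the sketch below; the checkpoint paths are placeholders, `trainer`, `model`, and `dm` come from the notebook, and the availability of a predict dataloader on the tutorial's datamodule is an assumption:

```python
# Compare predictions from the three trained checkpoints (paths are placeholders).
scenario_ckpts = {
    "nofts_baseline": "nofts_baseline.ckpt",
    "fts_implicit": "fts_implicit.ckpt",
    "fts_explicit": "fts_explicit.ckpt",
}
for scenario, ckpt in scenario_ckpts.items():
    preds = trainer.predict(model, datamodule=dm, ckpt_path=ckpt)
    print(scenario, preds[0])
```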

speediedan (author):

In addition to the recommended changes, I've made a number of expository enhancements and improvements to reproducibility (those went into the full DDP Lightning CLI version, to keep the tutorial itself as simple as possible). For example, you can see salient library versions logged in the tensorboard summary hparams:
[screenshot: tensorboard hparams summary showing library versions]

Updates in summary:

  1. Added nofts_baseline and fts_implicit mode scenarios with full tensorboard experiment summaries, log files and associated checkpoints. I think it makes sense to move the checkpoints to the PL s3 bucket, since I believe the bandwidth quotas on Google Drive will be lower than those of your s3 bucket (not that we're in danger of hitting them, but just in case).
  2. Clarified the value conferred by using FinetuningScheduler and demonstrated some of its model-exploration utility.
  3. Switched to using the recently released DeBERTaV3 model from Microsoft (using the smaller base version to allow running on modest GPUs).

Though performance on RTE with the fts_explicit scenario is on the order of a few % better than the naive nofts_baseline, and a few examples can be found that the explicit model classifies correctly while the nofts_baseline model doesn't, I don't think showing those few classification examples would be especially illuminating. While you can squeeze out a few additional percentage points of performance with FinetuningScheduler in many contexts, I would prefer to keep the focus on the model research/exploration benefits if that's okay with you. Hope you like the improvements! There will likely be some additional changes, but I know we're getting there. Thanks again for all your work and leadership!

Contributor:

Hey @speediedan, Yes, that's entirely fine and I think it is better to keep your vision in this notebook.

@speediedan speediedan force-pushed the finetuning_scheduler branch 2 times, most recently from 80c1c55 to dc2d25f Compare December 2, 2021 23:12
labels = batch["labels"]
return {"loss": val_loss, "preds": preds, "labels": labels}

def validation_epoch_end(self, outputs):
Contributor:

I would actually move all logging directly into validation_step, so you don't even need to return anything. With a large dataset, accumulating these outputs could cause an OOM.
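
The pattern being suggested is roughly the one sketched below; the class name, output unpacking, and metric attribute are assumptions based on a typical GLUE fine-tuning module rather than the exact code in this notebook:

```python
import pytorch_lightning as pl
import torch


class GLUETransformer(pl.LightningModule):  # class name assumed
    ...

    def validation_step(self, batch, batch_idx):
        outputs = self(**batch)
        val_loss, logits = outputs[:2]
        preds = torch.argmax(logits, dim=1)
        # Log directly here instead of returning outputs for validation_epoch_end;
        # with a large validation set, accumulating per-step outputs can cause OOM.
        self.log("val_loss", val_loss, prog_bar=True)
        self.val_acc.update(preds, batch["labels"])  # assumed torchmetrics metric
```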

callbacks = [
FinetuningScheduler(ft_schedule=ft_schedule_name, max_depth=2), # type: ignore # noqa
FTSEarlyStopping(monitor="val_loss", min_delta=0.001, patience=2), # type: ignore # noqa
FTSCheckpoint(monitor="val_loss", save_top_k=5), # type: ignore # noqa
Contributor:

Mind adding a comment above to explain this?
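
Something along these lines presumably addresses the request, i.e. the snippet above preceded by a short explanatory comment (the comment wording is mine):

```python
# FinetuningScheduler thaws parameter groups phase-by-phase according to the provided
# schedule; the FTS-aware EarlyStopping and Checkpoint variants are used so that
# early-stopping and checkpointing decisions stay coordinated with the current phase.
callbacks = [
    FinetuningScheduler(ft_schedule=ft_schedule_name, max_depth=2),
    FTSEarlyStopping(monitor="val_loss", min_delta=0.001, patience=2),
    FTSCheckpoint(monitor="val_loss", save_top_k=5),
]
```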

@speediedan speediedan force-pushed the finetuning_scheduler branch from 24ff01c to f5a59a5 Compare December 8, 2021 21:22
@Borda Borda added the Example Example / Demo / Tutorial label Dec 13, 2021
@tchaton (Contributor) left a comment:

Great progress.

print(f"Imported and registered the following callbacks: {registered_list}")


def instantiate_registered_class(init: Dict[str, Any], args: Optional[Union[Any, Tuple[Any, ...]]] = None) -> Any:
Contributor:

Just a personal feeling, but I think instantiate_registered_class makes the tutorial more opaque to readers.

speediedan (author):

Great suggestion. I think leaving this in the LightningCLI example in the module remains worthwhile but it was just adding an unnecessary layer of abstraction for this example. Thanks for all your valuable insight/feedback!
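
For anyone curious what such a helper typically looks like, it is usually a thin wrapper around dynamic import; a sketch below, under the assumption that `init` is a LightningCLI-style dict with `class_path` and `init_args` keys:

```python
import importlib
from typing import Any, Dict, Optional, Tuple, Union


def instantiate_registered_class(init: Dict[str, Any], args: Optional[Union[Any, Tuple[Any, ...]]] = None) -> Any:
    """Dynamically import and instantiate a class from a LightningCLI-style config dict (illustrative sketch)."""
    class_module, class_name = init["class_path"].rsplit(".", 1)
    kwargs = init.get("init_args", {})
    cls = getattr(importlib.import_module(class_module), class_name)
    if args is None:
        return cls(**kwargs)
    args = args if isinstance(args, tuple) else (args,)
    return cls(*args, **kwargs)
```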

@speediedan speediedan force-pushed the finetuning_scheduler branch from 8c01fd2 to 1d6ef3f Compare April 19, 2022 21:23
@Borda Borda force-pushed the main branch 2 times, most recently from b6526bc to db5de3a Compare April 22, 2022 02:48
@Borda (Member) commented Apr 22, 2022

@rohitgr7 could you please check/review this tutorial? 🐰

@Borda Borda requested review from krshrimali and rohitgr7 and removed request for aribornstein April 22, 2022 05:14
@rohitgr7 rohitgr7 assigned rohitgr7 and unassigned tchaton Apr 22, 2022
@rohitgr7 (Contributor) left a comment:

Looks great so far.
The only major point I need to clarify is how the callback is related to the accelerator, since there are some print statements regarding that.

@rohitgr7 (Contributor) left a comment:

awesome!

@Borda Borda enabled auto-merge (squash) May 12, 2022 11:31
@Borda Borda disabled auto-merge May 12, 2022 11:32
@Borda Borda merged commit c478057 into Lightning-AI:main May 12, 2022
@speediedan speediedan deleted the finetuning_scheduler branch June 10, 2022 20:02
Labels: Example / Demo / Tutorial
4 participants