FAQ
Jeff Handley edited this page Mar 7, 2025 · 5 revisions
Models should be retrained based on newly downloaded data when any of these conditions are true:
- Existing labels have been deleted, renamed, or otherwise modified such that prediction labels are stale
- New labels have been created and applied to issues/pulls, and those labels should begin being predicted
- The repository has gained a high volume of issues/pulls compared to when it was trained, and prediction accuracy is low
- The predicted labels are not meeting expectations for any other reason
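The conditions above amount to a simple boolean check. As a minimal illustrative sketch (all names here are hypothetical and do not correspond to anything in the labeler's actual code), the decision could be expressed as:

```python
from dataclasses import dataclass

# Hypothetical signals a team might track about their repository's model.
@dataclass
class RepoSignals:
    labels_modified: bool      # labels deleted/renamed/modified since training
    new_labels_applied: bool   # new labels in use that should be predicted
    high_volume_growth: bool   # far more issues/pulls than at training time
    accuracy_low: bool         # prediction accuracy is noticeably degraded
    other_concerns: bool       # predictions not meeting expectations otherwise

def should_retrain(s: RepoSignals) -> bool:
    """True when any of the FAQ's retraining conditions holds."""
    return (
        s.labels_modified
        or s.new_labels_applied
        or (s.high_volume_growth and s.accuracy_low)
        or s.other_concerns
    )
```

Note that high issue/pull volume alone does not trigger retraining in this sketch; it only matters in combination with low accuracy, matching the condition above.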
If a model is not retrained under these circumstances:
- A label that has been deleted or renamed may still be predicted, which recreates the label automatically
- New labels will not be predicted
- Prediction accuracy degrades over time
High-volume repositories with stable labels can go years without needing retraining. Because retraining is straightforward and self-service, however, teams can retrain their models at whatever cadence they find valuable. The results of testing predictions will inform whether a newly trained model should be promoted into use.
Teams may be tempted to use a cron schedule to automate retraining on a regular basis, but this must not be done. Training must remain a human-triggered event, with review of the test data before promotion into use.