Jeff Handley edited this page Mar 7, 2025 · 5 revisions

Frequently Asked Questions

When should models be retrained?

Models should be retrained based on newly downloaded data when any of these conditions are true:

  1. Existing labels have been deleted, renamed, or otherwise modified such that prediction labels are stale
  2. New labels have been created and applied to issues/pulls, and it's desired for those labels to begin getting predicted
  3. The repository has gained a high volume of issues/pulls since the model was trained, and prediction accuracy is low
  4. The predicted labels are not meeting expectations for any other reason

If a model is not retrained under these circumstances:

  1. A label that has been deleted or renamed can still be predicted, which results in the label being recreated automatically
  2. New labels will not be predicted
  3. Prediction accuracy degrades over time
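
The first two conditions can be checked mechanically by comparing the labels a model was trained on with the labels that currently exist in the repository. The following is a minimal sketch, not part of the labeler itself: the function names and the example label names are hypothetical, and it assumes both label lists have already been obtained by some other means.

```python
def label_drift(trained_labels, current_labels):
    """Compare the labels a model was trained on with the repository's current labels.

    Both arguments are plain iterables of label names (hypothetical inputs;
    how they are collected is outside the scope of this sketch).
    """
    trained = set(trained_labels)
    current = set(current_labels)
    return {
        # Deleted or renamed since training; the model could still predict these.
        "stale": sorted(trained - current),
        # Created since training; the model cannot predict these yet.
        "unseen": sorted(current - trained),
    }


def should_consider_retraining(trained_labels, current_labels):
    """Return True when either kind of label drift is present."""
    drift = label_drift(trained_labels, current_labels)
    return bool(drift["stale"] or drift["unseen"])


# Example with made-up label names.
drift = label_drift(
    trained_labels=["area-Docs", "area-Infra", "area-Old"],
    current_labels=["area-Docs", "area-Infra", "area-New"],
)
```

A check like this only flags label drift. It says nothing about prediction accuracy, so a human still needs to review test results before deciding whether to retrain and promote a model.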

High-volume repositories with stable labels can go years without needing to be retrained. Because retraining is straightforward and self-service, however, teams are free to retrain their models at whatever cadence they find valuable. The results of testing predictions will inform whether a newly trained model should be promoted into use.

Retraining invocation must never be automated

Teams may be tempted to use a cron schedule to automate retraining on a regular basis, but this must not be done. Training must remain a human-triggered event, with review of the test data before a model is promoted into use.
