Jeff Handley edited this page May 1, 2025 · 5 revisions

Should labels be reviewed and refined before onboarding?

If feasible, yes, but it's not essential. Because the labeler can easily be retrained at any time, it's suggested to onboard first and retrain later, when time permits the effort of cleaning up existing labels. Training the models typically takes less than 2 hours, even for very large repositories.

When should models be retrained?

Models should be retrained based on newly downloaded data when any of these conditions are true:

  1. Existing labels have been deleted, renamed, or otherwise modified such that prediction labels are stale
  2. New labels have been created and applied to issues/pulls, and it's desired for those labels to begin getting predicted
  3. The repository has gained a high volume of issues/pulls compared to when it was trained, and prediction accuracy is low
  4. The predicted labels are not meeting expectations for any other reason

If a model is not retrained under these circumstances:

  1. A label that has been deleted or renamed can be predicted again, which results in recreating the label automatically
  2. New labels will not be predicted
  3. Prediction accuracy degrades over time

High-volume repositories with stable labels can go years without needing retraining. Because retraining is straightforward and self-service, though, teams are empowered to retrain their models at whatever cadence they find valuable. The results of testing predictions will inform whether a newly trained model should be promoted into use.

Retraining invocation must never be automated

Teams may be tempted to use a cron schedule to automate retraining on a regular basis, but this must not be done. Training must remain a human-triggered event, with review of the test data before promotion into use.

Why do references to the Issue Labeler's actions use full length SHAs?

When onboarding, the workflows that invoke the Issue Labeler reference GitHub Actions in the dotnet/issue-labeler repository using the full-length commit SHA for the associated release version.

  • Reusable workflows can be referenced using either tags or full-length commit SHAs
  • GitHub's Security hardening for GitHub Actions documentation recommends pinning to the commit SHA as the most secure approach, and we adhere to that guidance
  • The short SHA is not supported by GitHub in this context; the full-length SHA must be used
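As a sketch, a caller workflow pinned this way might look like the following. The workflow path, SHA, and version tag are placeholders for illustration, not a real release of dotnet/issue-labeler:

```yaml
jobs:
  predict:
    # Pin to the full 40-character commit SHA rather than a tag.
    # The trailing comment records which release the SHA corresponds to,
    # since the SHA alone is not human-readable.
    uses: dotnet/issue-labeler/.github/workflows/predict.yml@0123456789abcdef0123456789abcdef01234567 # v1.0.0
```

Pinning to a SHA guarantees the exact, reviewed code runs even if the tag is later moved or deleted; the tradeoff is that updates must be taken deliberately by editing the SHA.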

Will the labeler create new labels?

No, it will not. The labeler will only predict labels that exist in the trained model, and the recommended workflow templates do not provide permissions for the actions to create new labels. If a predicted label has been deleted, the prediction job will fail.

Does the labeler apply the untriaged label?

No, it does not. The legacy issue labeler had the optional configuration of applying the untriaged label to new issues, but very few repositories opted into that behavior. The recommended approach for automatically applying untriaged is to create a dotnet-policy-service configuration similar to the one at dotnet-api-docs/.github/policies/untriaged-label.yml. Alternatively, a GitHub workflow can be authored to achieve the same functionality.
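As a sketch of the workflow alternative, the following applies a label to newly opened issues using the gh CLI. The label name, trigger, and workflow name are assumptions to adapt to your repository:

```yaml
# Hypothetical workflow: apply the untriaged label to every new issue.
name: Apply untriaged label
on:
  issues:
    types: [opened]
permissions:
  issues: write
jobs:
  label:
    runs-on: ubuntu-latest
    steps:
      # gh reads GH_TOKEN for authentication; the built-in GITHUB_TOKEN
      # suffices because this job has issues: write permission.
      - run: gh issue edit "$NUMBER" --repo "$REPO" --add-label untriaged
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          NUMBER: ${{ github.event.issue.number }}
          REPO: ${{ github.repository }}
```

Note that, unlike the labeler itself, this workflow requires the untriaged label to already exist in the repository.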