Support Tiktoken Gpt-4.1 Model #7453
Pull Request Overview
This PR adds support for the new "gpt-4.1" model in the Tiktoken tokenizer, addressing issue #7450.
- Added test cases for "gpt-4.1" and "gpt-4.1-mini" in the test suite.
- Updated model prefix mappings in the tokenizer to support both dashed and non-dashed "gpt-4.1" formats.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
File | Description |
---|---|
test/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs | Added inline test data for "gpt-4.1" variants to validate encoding support. |
src/Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs | Inserted new mappings for "gpt-4.1-" and "gpt-4.1" models, mapping them to ModelEncoding.O200kBase. |
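
As a rough illustration of the mapping change described above, the lookup can be pictured as an exact-name table plus a prefix table. This is a minimal Python sketch, not the C# code in `TiktokenTokenizer.cs`: the names `EXACT_MODELS`, `PREFIX_MODELS`, and `encoding_for_model` are hypothetical, the exact-then-prefix order is an assumption, and only the `gpt-4.1` entries come from this PR.

```python
from enum import Enum


class ModelEncoding(Enum):
    # Only the members this sketch needs; the real enum has more.
    CL100K_BASE = "cl100k_base"
    O200K_BASE = "o200k_base"


# Exact model names (hypothetical table for illustration).
EXACT_MODELS = {
    "gpt-4.1": ModelEncoding.O200K_BASE,   # non-dashed form added by this PR
    "gpt-4o": ModelEncoding.O200K_BASE,
    "gpt-4": ModelEncoding.CL100K_BASE,
}

# Prefixes that cover suffixed names such as "gpt-4.1-mini".
PREFIX_MODELS = [
    ("gpt-4.1-", ModelEncoding.O200K_BASE),  # dashed form added by this PR
    ("gpt-4o-", ModelEncoding.O200K_BASE),
    ("gpt-4-", ModelEncoding.CL100K_BASE),
]


def encoding_for_model(model_name: str) -> ModelEncoding:
    """Resolve a model name to an encoding: exact match first, then prefix."""
    if model_name in EXACT_MODELS:
        return EXACT_MODELS[model_name]
    for prefix, encoding in PREFIX_MODELS:
        if model_name.startswith(prefix):
            return encoding
    raise ValueError(f"No encoding known for model {model_name!r}")


if __name__ == "__main__":
    for name in ("gpt-4.1", "gpt-4.1-mini"):
        print(name, "->", encoding_for_model(name).value)
```

In this sketch, `"gpt-4.1-mini"` does not start with `"gpt-4-"`, so without a dedicated `"gpt-4.1-"` entry it would fall through to the error branch. That is why both the dashed and non-dashed forms need their own mappings.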
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

Coverage Diff | main | #7453 | +/- |
---|---|---|---|
Coverage | 69.00% | 68.99% | -0.02% |
Files | 1483 | 1482 | -1 |
Lines | 274563 | 273879 | -684 |
Branches | 28395 | 28254 | -141 |
Hits | 189455 | 188955 | -500 |
Misses | 77672 | 77537 | -135 |
Partials | 7436 | 7387 | -49 |
Flags with carried forward coverage won't be shown.
Fixes #7450