[llm] Add a tokenizer python script #1611

larryliu0820 · 2024-01-17T00:19:03Z

Summary: Add a tokenizer Python script that adds some post-processing to the vanilla sentencepiece tokenizer model. This comes in handy when we want to consume the model in C++.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot · 2024-01-17T00:19:06Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/1611

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 403fb31 with merge base 05d169b:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / linux (buck2) / linux-job (gh)
exir/tests/test_memory_format_ops_pass.py::TestMemoryFormatOpsPass::test_op_to_copy_replacement
pull / unittest / macos (buck2) / macos-job (gh)
exir/tests/test_memory_format_ops_pass.py::TestMemoryFormatOpsPass::test_op_to_copy_replacement

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2024-01-17T00:24:02Z

@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-01-18T21:51:44Z

@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary: Add a tokenizer Python script that adds some post-processing to the vanilla `sentencepiece` tokenizer model. This comes in handy when we want to consume the model in C++. Differential Revision: D52821402 Pulled By: larryliu0820

facebook-github-bot · 2024-01-18T23:47:38Z

This pull request was exported from Phabricator. Differential Revision: D52821402

Summary: Add a tokenizer Python script that adds some post-processing to the vanilla `sentencepiece` tokenizer model. This comes in handy when we want to consume the model in C++. Pull Request resolved: pytorch#1611 Differential Revision: D52821402 Pulled By: larryliu0820 fbshipit-source-id: a6f988ed893cdbfc1faf5b651f21e26eb81ed9e2

Summary: Add a tokenizer Python script that adds some post-processing to the vanilla `sentencepiece` tokenizer model. This comes in handy when we want to consume the model in C++. Pull Request resolved: pytorch#1611 Differential Revision: D52821402 Pulled By: larryliu0820 fbshipit-source-id: 6f56292acd552bd7ceaa89444490ccfda48f3a07

Summary: Add a tokenizer Python script that adds some post-processing to the vanilla `sentencepiece` tokenizer model. This comes in handy when we want to consume the model in C++. Pull Request resolved: pytorch#1611 Differential Revision: D52821402 Pulled By: larryliu0820 fbshipit-source-id: a9b10b37a3157f00983c7ce0f0badeefbee1aa4a

facebook-github-bot · 2024-01-19T08:08:37Z

@larryliu0820 merged this pull request in 78ccd2e.

facebook-github-bot added the CLA Signed label Jan 17, 2024

larryliu0820 force-pushed the tokenizer branch from 0a4a6e1 to 00587b5 January 18, 2024 21:51

Add a tokenizer python script (#1611)

403fb31

Summary: Add a tokenizer Python script that adds some post-processing to the vanilla `sentencepiece` tokenizer model. This comes in handy when we want to consume the model in C++. Differential Revision: D52821402 Pulled By: larryliu0820

facebook-github-bot force-pushed the tokenizer branch from 00587b5 to 403fb31 January 18, 2024 23:47

facebook-github-bot added the fb-exported label Jan 18, 2024

facebook-github-bot closed this in 78ccd2e Jan 19, 2024

facebook-github-bot added the Merged label Jan 19, 2024
