Add ICU wordbreak dictionary (Thai) #877

Closed
wannaphong opened this issue Dec 5, 2023 · 1 comment · Fixed by #879
Labels
corpus corpus/dataset-related issues
Milestone

Comments

@wannaphong
Member

Since ICU is included in almost all web browsers, I think we should add the ICU dictionary to PyThaiNLP, so that we use the same dictionary and can deploy to any system that pythainlp/nlpo3 doesn't support.

Dictionary: https://raw.githubusercontent.com/unicode-org/icu/main/icu4c/source/data/brkitr/dictionaries/thaidict.txt
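
For anyone who wants to try the word list before it lands in the corpus, here is a minimal sketch of loading a file in this style into a Python set. It assumes one entry per line, `#` comments, and an optional leading BOM; the tab handling is a defensive guess, not something confirmed for `thaidict.txt`. The function name `parse_icu_dictionary` and the sample words are hypothetical and not part of PyThaiNLP or ICU.

```python
from typing import FrozenSet, Iterable


def parse_icu_dictionary(lines: Iterable[str]) -> FrozenSet[str]:
    """Collect unique words from an ICU-style dictionary text file.

    Assumed format (not verified against every ICU dictionary):
    one entry per line, '#' starts a comment, blank lines are ignored.
    """
    words = set()
    for raw in lines:
        # Drop a UTF-8 byte order mark if the first line carries one.
        line = raw.lstrip("\ufeff")
        # Remove comments (whole-line or trailing) and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Defensive assumption: if a tab-separated value follows the word,
        # keep only the word itself.
        words.add(line.split("\t", 1)[0].strip())
    return frozenset(words)


# Small in-memory sample standing in for the downloaded file.
sample = [
    "\ufeff# Copyright header (comment)\n",
    "\n",
    "กรุงเทพ\n",
    "ภาษา\n",
    "ภาษา\n",
    "ไทย  # trailing comment\n",
]
words = parse_icu_dictionary(sample)
```

With a local copy of the file, the same function can be called as `parse_icu_dictionary(open(path, encoding="utf-8"))`, where `path` is wherever the file was saved.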

@bact bact added the corpus corpus/dataset-related issues label Dec 5, 2023
@bact bact added this to the Future milestone Dec 5, 2023
@pavaris-pm
Contributor

@wannaphong I've already added the ICU Thai dictionary to the corpus. You can see and review it in PR #879 krub.

@bact bact changed the title from "Add ICU dictionary (Thai)" to "Add ICU wordbreak dictionary (Thai)" Dec 5, 2023
@bact bact closed this as completed in #879 Dec 6, 2023
@github-project-automation github-project-automation bot moved this to To do in PyThaiNLP Aug 29, 2024
3 participants