
Add PhayaThaiBERT model into PyThaiNLP [WIP] #868


Closed
4 of 5 tasks
pavaris-pm opened this issue Nov 26, 2023 · 4 comments · Fixed by #873
Labels
enhancement enhance functionalities
Milestone

Comments

@pavaris-pm
Contributor

pavaris-pm commented Nov 26, 2023

The newly released paper "PhayaThaiBERT: Enhancing a Pretrained Thai Language Model with Unassimilated Loanwords" reports impressive results, handling foreign words better than earlier Thai encoder-based models.

I think it would be great to add it to PyThaiNLP's supported downstream tasks (e.g., token classification) to strengthen the library. What do you think? If we all agree on this, I can help integrate it as a new engine as soon as possible.

New features

Here are the tasks that, after reading the paper, I found could be integrated into PyThaiNLP. The list below shows the current progress and the contributors working on each model (a ✅ check mark means it has already been added to the source code; I will open a complete PR once all of them are done krub):

  • Part-of-speech tagging on the Blackboard corpus by @MpolaarbearM
  • Named-entity recognition on the Thainer-v2 corpus by @pavaris-pm
  • Tokenization by @pavaris-pm
  • Data augmentation (text) by @pavaris-pm
  • Word correction (currently under research and development)

etc. (I will keep adding to the list based on what I find during experiments.)

If you are interested, feel free to leave a comment below if you want to develop a model for any task you care about krub. After that, you can make a PR to the same branch as PR #873.

@wannaphong
Member

I think it is very good to see new fine-tuned models that use the new state-of-the-art Thai encoder model. You can train the new models; I don't have free time for training models. 😢 (research)

@pavaris-pm
Contributor Author

pavaris-pm commented Nov 26, 2023

I think it is very good to see new fine-tuned models that use the new state-of-the-art Thai encoder model. You can train the new models; I don't have free time for training models. 😢 (research)

That's worth a try. I will help work on this, and then you can review my PR once it is completed 👍🏻

@bact
Member

bact commented Nov 30, 2023

See #871 by @MpolaarbearM

@bact added the enhancement label (enhance functionalities) Nov 30, 2023
@bact added this to the Future milestone Nov 30, 2023
@pavaris-pm
Contributor Author

See #871 by @MpolaarbearM

@bact Thanks for the reminder krub. I have already read the paper itself. PhayaThaiBERT seems to bring a lot of new things to the game with its vocabulary expansion. Also, big thanks to @MpolaarbearM for training the POS tagging model krub. I'll take a deeper look and see whether there is any other PhayaThaiBERT functionality that we can add to this library.

@pavaris-pm changed the title from "Add PhayaThaiBERT model into PyThaiNLP" to "Add PhayaThaiBERT model into PyThaiNLP [WIP]" Dec 4, 2023
@bact closed this as completed in #873 Dec 11, 2023
@github-project-automation bot moved this to In progress in PyThaiNLP Aug 29, 2024
Projects
Status: In progress

3 participants