Ideas for pythainlp.lm function #1048


Open
wannaphong opened this issue Dec 27, 2024 · 6 comments
Labels
enhancement enhance functionalities

Comments

@wannaphong
Member

wannaphong commented Dec 27, 2024

I think the pythainlp.lm module should collect functions for preprocessing and post-processing Thai text for LLMs, and include a small language model that can run on home users' computers for simple NLP jobs.

Preprocessing

Post-processing
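As a sketch of what one such post-processing helper might look like (the name and scope are hypothetical, not an actual PyThaiNLP API): LLMs sometimes emit zero-width characters inside Thai text, and collapsing stray whitespace is a common cleanup step.

```python
import re

# Hypothetical post-processing helper: clean up Thai text emitted by an LLM.
# Zero-width characters (ZWSP, ZWNJ, ZWJ, BOM) sometimes leak into generated
# Thai text; runs of whitespace are collapsed to a single space.
_ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\ufeff]")
_MULTI_SPACE = re.compile(r"\s+")

def clean_llm_output(text: str) -> str:
    text = _ZERO_WIDTH.sub("", text)
    return _MULTI_SPACE.sub(" ", text).strip()
```

For example, `clean_llm_output("สวัสดี\u200bครับ   โลก")` removes the zero-width space and collapses the run of spaces.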

@wannaphong wannaphong moved this to In progress in PyThaiNLP Dec 27, 2024
@bact
Member

bact commented Dec 27, 2024

If we're going to have a small language model as well, should we call the module just "lm"?
Just to make it more generic.

@wannaphong
Member Author

If we're going to have a small language model as well, should we call the module just "lm"? Just to make it more generic.

Agree 👍

@wannaphong wannaphong changed the title Ideas for pythainlp.llm function Ideas for pythainlp.lm function Dec 28, 2024
@bact bact added the enhancement enhance functionalities label Dec 30, 2024
@matichon-vultureprime

How about leveraging NVIDIA-Curator to do pre-processing and post-processing?

We already have some examples from the NVIDIA team:

@wannaphong
Member Author

Add pythainlp.lm.calculate_ngram_counts #1054
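A minimal sketch of what an n-gram counting helper along these lines could look like (the actual signature in #1054 may differ; this assumes pre-tokenized input):

```python
from collections import Counter

def calculate_ngram_counts(tokens, n=2):
    """Count n-grams in a pre-tokenized sequence.

    Sketch only, not the #1054 API: slides a window of size `n` over
    `tokens` and tallies each n-gram as a tuple.
    """
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
```

For example, `calculate_ngram_counts(["แมว", "กิน", "ปลา", "แมว", "กิน"], n=2)` counts the bigram `("แมว", "กิน")` twice.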

@bact
Member

bact commented Jan 5, 2025

For the "small language model", what about having that model as a core (or cores) for most of the basic tasks in PyThaiNLP that don't require a larger model? That way we would have fewer dependencies as well.

Related to

@wannaphong
Member Author

For the "small language model", what about having that model as a core (or cores) for most of the basic tasks in PyThaiNLP that don't require a larger model? That way we would have fewer dependencies as well.

Related to

* [Porting model to ONNX model #639](https://github.com/PyThaiNLP/pythainlp/issues/639)

* [Porting Thai2fit from fastai v1 to fastai v2 #716](https://github.com/PyThaiNLP/pythainlp/issues/716)

* [Remove all python-crfsuite models from PyThaiNLP #655](https://github.com/PyThaiNLP/pythainlp/issues/655)

* [Consider reduce dependencies #935](https://github.com/PyThaiNLP/pythainlp/issues/935)

Just llama-cpp-python or an ONNX model. I think that would be OK.

Projects
Status: In progress
Development

No branches or pull requests

3 participants