Ideas for pythainlp.lm function #1048
Comments
If we're going to have a small language model as well, should we call the module just "lm"?
Agree 👍
How about leveraging NVIDIA-Curator to do pre-processing and post-processing? We already have some examples from the NVIDIA team:
Add pythainlp.lm.calculate_ngram_counts #1054
For the "small language model", what about having that model as a core/cores for most of the basic tasks that don't required larger model in PyThaiNLP? So we will have less dependencies as well. Related to |
Just
I think the pythainlp.lm module should collect functions for preprocessing and post-processing Thai text for LLMs, and it should include a small language model that can run on a home user's computer for simple NLP jobs. A rough sketch of both proposed functions is shown after the list below.

Preprocessing

pythainlp.lm.calculate_ngram_counts: calculates the counts of n-grams in a list of words for the specified n-gram range. (Add pythainlp.lm.calculate_ngram_counts #1054)
Post-processing

pythainlp.lm.remove_repeated_ngrams: removes repeated n-grams (to fix repetitive LM output). (Add pythainlp.llm #1043)