"nercut" result not printed in a specific sentence #666

Closed
kmining opened this issue May 15, 2022 · 5 comments · Fixed by #668 or #671
Labels
bug bugs in the library

Comments

@kmining

kmining commented May 15, 2022

Description

I tried to tokenize the texts ทุ๊กกโคนน, อือหือ, and อย่าลืมอัพการ์ดนะจ๊ะ using the nercut tokenizer.

Expected results

A tokenized result should be printed.

Current results

No result is printed.

Steps to reproduce

import pythainlp
from pythainlp import word_tokenize
word_tokenize("ทุ๊กกโคนน", engine="nercut")
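A bare expression like the one above only shows its value in a notebook or REPL. The sketch below is one way to run all three inputs from the description as a plain script and print each result with `repr`, so that a `None` or an empty list is visible rather than silently missing. The helper names `run_cases` and `reproduce_with_pythainlp` are hypothetical and not part of PyThaiNLP; only `word_tokenize(text, engine="nercut")` comes from the report.

```python
from typing import Callable, Dict, List, Optional

# The three inputs named in the description above.
TEXTS = ["ทุ๊กกโคนน", "อือหือ", "อย่าลืมอัพการ์ดนะจ๊ะ"]


def run_cases(
    tokenize: Callable[[str], Optional[List[str]]],
    texts: List[str],
) -> Dict[str, Optional[List[str]]]:
    """Hypothetical helper: call the tokenizer on each text and
    return a mapping of input text -> raw tokenizer result."""
    return {text: tokenize(text) for text in texts}


def reproduce_with_pythainlp() -> None:
    """Run the reported inputs through the nercut engine and print the results.

    Requires pythainlp to be installed; the import is kept local so the
    helper above can be used without it.
    """
    from pythainlp import word_tokenize

    results = run_cases(lambda t: word_tokenize(t, engine="nercut"), TEXTS)
    for text, tokens in results.items():
        # repr() makes None or [] show up explicitly in the output.
        print(repr(text), "->", repr(tokens))
```

Calling `reproduce_with_pythainlp()` in an environment with the affected version should make the missing output easy to spot for each input.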

Your environment

  • PyThaiNLP version: 3.0.5
  • Python version: 3.7.3
  • Operating system and version (distro, 32/64-bit): 64-bit
  • More info (Docker, VM, etc.):
@github-actions

Hello @kmining, thank you for your interest in our work!

If this is a bug report, please provide screenshots and minimal code to reproduce your issue; otherwise we cannot help you.

wannaphong added a commit that referenced this issue May 16, 2022
Fixed missing any rule
@wannaphong wannaphong mentioned this issue May 16, 2022
2 tasks
@wannaphong
Member

@kmining Thank you for reporting. I have fixed this issue and will release PyThaiNLP 3.0.6 with the fix.

@wannaphong wannaphong added the bug bugs in the library label May 16, 2022
wannaphong added a commit that referenced this issue May 16, 2022
wannaphong added a commit that referenced this issue May 16, 2022
wannaphong added a commit that referenced this issue May 16, 2022
`PyThaiNLP v3.0.7` is a bug fix release of `PyThaiNLP 3.0.5`.

**Bug Fixed**
- Fixed nercut bug. #666 Thank you @kmining for your bug report.

You can install by `pip install pythainlp` or upgrade by `pip install -U pythainlp`.

Documentation: https://pythainlp.github.io/docs/3.0/index.html

Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See [PyThaiNLP 3.0 change log#545](#545)

## Contributors

<a href="https://github.com/PyThaiNLP/pythainlp/graphs/contributors">
  <img src="https://contributors-img.firebaseapp.com/image?repo=PyThaiNLP/pythainlp" />
</a>

Thanks to all the [contributors](https://github.com/PyThaiNLP/pythainlp/graphs/contributors). (Image made with [contributors-img](https://contributors-img.firebaseapp.com))
@kmining
Author

kmining commented May 16, 2022

@wannaphong, thanks for the quick response.
The missing-result error has been resolved, but the extracted tokens now appear to be duplicated.
Version 3.0.5 did not produce duplicate tokens.

import pythainlp
from pythainlp import word_tokenize
word_tokenize("ทันแน่ๆ", engine="nercut")
word_tokenize("%1ครั้ง", engine="nercut")
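Duplicated tokens of this kind can be caught mechanically. The sketch below assumes that concatenating the returned tokens should reproduce the input text exactly, which is an assumption about the tokenizer's contract rather than something stated in this thread. The function `check_tokens` is hypothetical and not part of PyThaiNLP.

```python
from typing import List, Optional


def check_tokens(text: str, tokens: Optional[List[str]]) -> List[str]:
    """Hypothetical check: return a list of problems found in a tokenizer
    result. An empty list means the result passed every check."""
    if tokens is None:
        return ["tokenizer returned None"]

    problems = []
    if text and not tokens:
        problems.append("no tokens for non-empty input")

    # Assumption: joining the tokens should give back the original text.
    # Duplicated tokens make the joined string longer than the input.
    joined = "".join(tokens)
    if joined != text:
        problems.append(f"tokens join to {joined!r}, expected {text!r}")
    return problems
```

For example, `check_tokens(text, word_tokenize(text, engine="nercut"))` could be run over both inputs above; a non-empty return value would flag the duplication without needing to compare screenshots.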


@wannaphong wannaphong reopened this May 16, 2022
wannaphong added a commit that referenced this issue May 16, 2022
@wannaphong
Member

Thank you for the report. I am rewriting the code; once it is done, I will release PyThaiNLP v3.0.8 to fix this issue.

@wannaphong wannaphong linked a pull request May 16, 2022 that will close this issue
wannaphong added a commit that referenced this issue May 16, 2022
@wannaphong
Member

@kmining It's done. Thank you for reporting.

2 participants