Add display cell tokenizer #1058

Merged: 3 commits, Jan 13, 2025

wannaphong (Member) commented Jan 8, 2025

Fixes #663

Add a new function `display_cell_tokenize` to split Thai text into display cells without splitting tone marks.

* **New Functionality**
  - Add `display_cell_tokenize` function in `pythainlp/tokenize/core.py` to handle the splitting of Thai text into display cells.
  - Ensure the function does not split tone marks.
* **Initialization**
  - Update `pythainlp/tokenize/__init__.py` to include the new `display_cell_tokenize` function in the `__all__` list.
* **Testing**
  - Add tests for the `display_cell_tokenize` function in `tests/core/test_tokenize.py`.
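
To illustrate what "splitting into display cells without splitting tone marks" means, here is a minimal sketch. It is not the code merged in this PR: `split_display_cells_sketch` is a hypothetical helper, and it assumes that any character in Unicode category `Mn` (nonspacing mark, which includes the Thai above/below vowels and tone marks) belongs to the same cell as the base character before it. The library's `display_cell_tokenize` may handle some characters differently.

```python
import unicodedata
from typing import List


def split_display_cells_sketch(text: str) -> List[str]:
    """Hypothetical helper: keep each base character together with its marks."""
    cells: List[str] = []
    for char in text:
        # Assumption: a nonspacing mark ("Mn") is drawn in the same cell
        # as the preceding base character, so it is never split off.
        if cells and unicodedata.category(char) == "Mn":
            cells[-1] += char
        else:
            # Any other character (or a mark with no base before it)
            # starts a new cell.
            cells.append(char)
    return cells


print(split_display_cells_sketch("สวัสดี"))  # ['ส', 'วั', 'ส', 'ดี']
print(split_display_cells_sketch("ที่"))      # ['ที่']
```

In the second call the vowel and the tone mark both stay attached to the consonant, which is the property the PR description asks for.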

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/PyThaiNLP/pythainlp/issues/663?shareId=XXXX-XXXX-XXXX-XXXX).

pep8speaks commented Jan 8, 2025

Hello @wannaphong! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2025-01-08 13:42:07 UTC

sonarqubecloud bot commented Jan 8, 2025

coveralls commented Jan 8, 2025

Coverage Status

coverage: 52.898% (+0.1%) from 52.753%
when pulling 9c86f85 on wannaphong/add-display-cell-tokenizer
into 7332984 on dev.

bact added the enhancement (enhance functionalities) label on Jan 10, 2025
wannaphong merged commit ef0e01d into dev on Jan 13, 2025
24 of 25 checks passed
wannaphong deleted the wannaphong/add-display-cell-tokenizer branch on February 10, 2025 04:30
Labels: enhancement (enhance functionalities)
Projects: Status: Done
Development: Successfully merging this pull request may close these issues: Thai character splitter to display cell
4 participants