Releases: PyThaiNLP/pythainlp
PyThaiNLP v2.3.0-beta1
PyThaiNLP v2.3.0-beta1 is the first beta release of PyThaiNLP 2.3.
Documentation: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 2.3 change log #445
Deprecation and other API changes
- NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, select it with the version argument of the ThaiNameTagger class:
pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')
(Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
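A minimal sketch of pinning the older model, assuming pythainlp 2.3 is installed and the ThaiNER 1.4 model files can be fetched. The version argument comes from the signature above; the helper name load_legacy_tagger is hypothetical and only used for illustration.

```python
def load_legacy_tagger():
    """Return a named-entity tagger pinned to the ThaiNER 1.4 model.

    The import sits inside the function so that defining this helper
    does not itself require pythainlp or its model files.
    """
    from pythainlp.tag.named_entity import ThaiNameTagger

    # Pass the version explicitly to get ThaiNER 1.4 instead of the newer 1.5.
    return ThaiNameTagger(version="1.4")
```

Calling get_ner(text) on the returned tagger should then tag text with the older model; check the linked docs for the exact method signature.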
Tokenizer
- #484 Add: model option for attacut.tokenize()
- #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
- #503 Add: NERCut tokenization engine
Corpus
- License change:
- All corpora, datasets, and documentation created by PyThaiNLP project are now released under Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
- All language models created by PyThaiNLP project are released under Creative Commons Attribution 4.0 International Public License (CC-by).
- #449 Fix: remove instances with [ or ] from etcc.txt
- #467 Add: corpus.common.provinces() can now return romanized names
- #476 Add: thai_family_names() to get a set of Thai family names
- #487 Fix: thailand_provinces_th.csv not found issue
- #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix
POS Tagger
- #464 Add: LST20 language model for part-of-speech tagging
- #468 Add: port PerceptronTagger from NLTK. POS tagging no longer depends on NLTK.
- #478 Update: ORCHID POS tags documentation
Named Entity Tagging
Transliterate
Text Summarize
- #523 Add: mT5 text summarization in pythainlp.summarize
Chunk parser
- #524 Add: pythainlp.tag.chunk
Util
- #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
- #483 Add: remove() method to remove a word from a trie -- thanks @korakot
- #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
- #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
- #513 Add: thai_keyboard_dist() to calculate Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
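To illustrate what removing a word from a trie (#483) involves, here is a small sketch. It is a stand-in written for this note, not PyThaiNLP's own Trie class; the names TrieSketch and _Node are hypothetical. The point it shows is that removal must unmark the word and then prune only the nodes that no other word still needs.

```python
class _Node:
    """One trie node: child nodes keyed by character, plus an end-of-word flag."""

    __slots__ = ("children", "end")

    def __init__(self):
        self.children = {}
        self.end = False


class TrieSketch:
    def __init__(self, words=()):
        self.root = _Node()
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, _Node())
        node.end = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.end

    def remove(self, word):
        """Remove word; return True if it was present, False otherwise."""
        path = []  # (parent, character) pairs along the word
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                return False
            path.append((node, ch))
            node = child
        if not node.end:
            return False
        node.end = False
        # Prune from the leaf upwards, stopping at the first node that is
        # still the end of another word or still has other children.
        for parent, ch in reversed(path):
            child = parent.children[ch]
            if child.end or child.children:
                break
            del parent.children[ch]
        return True
```

Removing "การ" from a trie holding "กา", "การ" and "กาแฟ" leaves the other two words intact, because pruning stops at the node that ends "กา".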
Links
- Website: https://pythainlp.github.io
- Docs: https://pythainlp.github.io/dev-docs/
- GitHub: https://github.com/PyThaiNLP/pythainlp
- Issues: https://github.com/PyThaiNLP/pythainlp/issues
Thanks to all the contributors.
We build Thai NLP.
PyThaiNLP v2.3.0-dev0
PyThaiNLP v2.3.0-dev0 is the first development release of PyThaiNLP 2.3 (for development only).
Documentation: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 2.3 change log #445
Deprecation and other API changes
- NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, select it with the version argument of the ThaiNameTagger class:
pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')
(Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
Tokenizer
- #484 Add: model option for attacut.tokenize()
- #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
- #503 Add: NERCut tokenization engine
Corpus
- License change:
- All corpora, datasets, and documentation created by PyThaiNLP project are now released under Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
- All language models created by PyThaiNLP project are released under Creative Commons Attribution 4.0 International Public License (CC-by).
- #449 Fix: remove instances with [ or ] from etcc.txt
- #467 Add: corpus.common.provinces() can now return romanized names
- #476 Add: thai_family_names() to get a set of Thai family names
- #487 Fix: thailand_provinces_th.csv not found issue
- #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix
POS Tagger
- #464 Add: LST20 language model for part-of-speech tagging
- #468 Add: port PerceptronTagger from NLTK. POS tagging no longer depends on NLTK.
- #478 Update: ORCHID POS tags documentation
Named Entity Tagging
Transliterate
Text Summarize
- #523 Add: mT5 text summarization in pythainlp.summarize
Chunk parser
- #524 Add: pythainlp.tag.chunk
Util
- #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
- #483 Add: remove() method to remove a word from a trie -- thanks @korakot
- #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
- #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
- #513 Add: thai_keyboard_dist() to calculate Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
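The idea behind a keyboard distance such as the one in #513 can be sketched as follows. This is an illustration only, not the library's implementation: the ROWS table is an assumed approximation of the unshifted Thai Kedmanee layout, the shift layer and the physical stagger between rows are ignored, and the function name keyboard_distance is hypothetical.

```python
import math

# Assumed unshifted key rows, left to right, top to bottom.
ROWS = [
    "ๅ/-ภถุึคตจขช",
    "ๆไำพะัีรนยบลฃ",
    "ฟหกดเ้่าสวง",
    "ผปแอิืทมใฝ",
]

# Map each character to its (row, column) grid position.
POSITION = {
    ch: (r, c)
    for r, row in enumerate(ROWS)
    for c, ch in enumerate(row)
}


def keyboard_distance(a, b):
    """Euclidean distance between the grid positions of two characters."""
    (r1, c1), (r2, c2) = POSITION[a], POSITION[b]
    return math.hypot(r1 - r2, c1 - c2)
```

Under this grid, horizontally or vertically adjacent keys are at distance 1 and diagonal neighbours at the square root of 2. Characters missing from the assumed table raise KeyError.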
PyThaiNLP 2.2.6
PyThaiNLP 2.2.6 Released!
This release is a bug fix release.
- Update pythainlp.tag docs #492
- thai_strftime: Normalize output for unsupported directive #490
- Port pickle to JSON and add LST20 POS tag model to pythainlp.corpus #488
Thanks to the following contributors to 2.2.6: @c4n
Thanks to other contributors listed here: https://github.com/PyThaiNLP/pythainlp/blob/dev/CONTRIBUTING.md
You can install or upgrade using pip install -U pythainlp
- GitHub Releases: https://github.com/PyThaiNLP/pythainlp/releases/tag/v2.2.6
- Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
- Tutorials: https://thainlp.org/pythainlp/tutorials/
- GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
PyThaiNLP 2.2.5
PyThaiNLP 2.2.5 Released!
This release is a bug fix release.
- Fix: file not found error in pythainlp.corpus #486
https://github.com/PyThaiNLP/pythainlp/releases/tag/v2.2.5
You can install or upgrade using pip install -U pythainlp
Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
Tutorials: https://thainlp.org/pythainlp/tutorials/
GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
PyThaiNLP 2.2.4
- #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
PyThaiNLP 2.2.3
This release is a bug fix release.
- Fix: CRFCut dropped the last segment when it was not predicted as end-of-sentence #459
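The kind of bug fixed in #459 can be sketched with a generic label-based splitter. This is an assumption-laden illustration, not CRFCut's code: the function name group_sentences and the "E" (end of sentence) and "I" (inside) labels are chosen here for the example. The point is the final flush, which keeps trailing tokens even when no end-of-sentence label was predicted for them.

```python
def group_sentences(tokens, labels):
    """Join tokens into sentences, closing a sentence at each "E" label."""
    sentences = []
    current = []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label == "E":
            sentences.append("".join(current))
            current = []
    # Flush whatever is left, so the last segment is not silently dropped
    # when the model never marked it as an end of sentence.
    if current:
        sentences.append("".join(current))
    return sentences
```

Without the final if block, the tokens after the last "E" would be lost.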
Installation
- You can install or upgrade using pip install -U pythainlp
More information
- Change log: #330
- Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
- Tutorials: https://thainlp.org/pythainlp/tutorials/
- GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
PyThaiNLP 2.2.2
This release is a bug fix release.
- Remove entries with [ or ] from etcc.txt #449
- Update license information:
- All corpora, datasets, and documentation created by PyThaiNLP project are now released under Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
- All language models created by PyThaiNLP project are released under Creative Commons Attribution 4.0 International Public License (CC-by).
- For more information about corpora and models created by PyThaiNLP project, see PyThaiNLP Corpus.
- For other corpora and models that may be included with the PyThaiNLP distribution, please consult the Corpus License.
Installation
- You can install or upgrade using pip install -U pythainlp
More information
- Change log: #330
- Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
- Tutorials: https://thainlp.org/pythainlp/tutorials/
- GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
PyThaiNLP 2.2.1
This release is a bug fix release.
Installation
- You can install or upgrade using pip install -U pythainlp
More information
- Change log: #330
- Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
- Tutorials: https://thainlp.org/pythainlp/tutorials/
- GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
PyThaiNLP 2.2.0
English
Hello, world. Today we're happy to announce the availability of PyThaiNLP 2.2. It has been four years since PyThaiNLP's first release. Thank you very much for supporting PyThaiNLP.
Summary – Release Highlights
New Features
Tokenizer
- Fix longest engine, last character is now consumed
- Add CRFCut sentence segmentation
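For readers unfamiliar with longest-matching tokenization, the sketch below shows the general technique and why the loop has to run until the final character is consumed. It is a simplified stand-in, not the code of PyThaiNLP's longest engine; the function name longest_match and the tiny word set are hypothetical.

```python
def longest_match(text, words):
    """Greedy dictionary tokenizer: at each position take the longest known word."""
    max_len = max(map(len, words), default=1)
    tokens = []
    i = 0
    while i < len(text):  # continues until every character, including the last, is used
        # Try the longest candidate first; a single character always succeeds.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in words or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens
```

A quick sanity check for any tokenizer of this kind is that joining the tokens gives back the input text unchanged.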
Transliteration
- Add Thai Grapheme-to-Phoneme (Thai G2P) deep learning sequence-to-sequence model
Normalization
- Add more normalize functions, like remove zero-width characters, remove duplicate spaces, etc.
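As a rough picture of the two clean-ups named above, the following sketch strips zero-width characters and collapses repeated spaces with regular expressions. It is not the library's normalize code; the function name tidy and the exact set of zero-width code points are assumptions made for this example.

```python
import re

# Zero-width space, non-joiner, joiner, and byte-order mark (assumed set).
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")
MULTI_SPACE = re.compile(" {2,}")


def tidy(text):
    """Remove zero-width characters, then collapse runs of spaces to one."""
    # Zero-width characters go first, so spaces they separated can merge.
    text = ZERO_WIDTH.sub("", text)
    return MULTI_SPACE.sub(" ", text)
```

The order matters: removing the invisible characters first lets spaces on either side of them be collapsed in the second step.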
Utilities
- Add thaiword_to_date() and thaiword_to_time()
- Fix countthai() to handle a case where the text has only numbers and symbols
Command line
- Update command and sub-command syntax - see command line docs
Others
- Code improvement: Move non-init code out of __init__.py files, etc.
- Remove dependency: Unigram POS tagger no longer needs the NLTK module
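A unigram tagger is simple enough to run on the standard library alone, which is roughly why such a dependency can be dropped. The sketch below shows the general technique (each word gets its most frequent tag from training data); it is not PyThaiNLP's implementation, and the names train_unigram and tag_words are hypothetical.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_words(words, model, default=None):
    """Tag each word by lookup; unknown words get the default tag."""
    return [(word, model.get(word, default)) for word in words]
```

Once trained, the model is a plain dictionary, so it can be stored as JSON and loaded without any third-party library.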
Installation
You can install or upgrade using pip install -U pythainlp
Change log: #330
Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
Tutorials: https://thainlp.org/pythainlp/tutorials/
GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
Thai (translated)
Hello, world. Today, 24 June 2020 (B.E. 2563), we released PyThaiNLP 2.2. PyThaiNLP is now four years old. Thank you for using PyThaiNLP :)
Summary – Highlights
New Features
Tokenizer
- Fix the longest word tokenizer
- Add the CRFCut sentence segmenter
Transliteration
- Add Thai-to-IPA transliteration with Thai Grapheme-to-Phoneme (Thai G2P)
Normalization
- Extend the normalize functions, e.g. removing duplicate spaces
Utilities
- Add thaiword_to_date() and thaiword_to_time()
- Improve countthai()
Command line
- Update command and sub-command syntax - see command line docs
Others
- Code improvement: move code out of __init__.py files, etc.
- Reduce external library requirements: the Unigram POS tagger can run without NLTK
Installation
You can install or upgrade using pip install -U pythainlp
Change log: #330
Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
Tutorials: https://thainlp.org/pythainlp/tutorials/
GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
PyThaiNLP 2.2.0-beta1
This is the first beta version of PyThaiNLP 2.2.
Installation
pip install --pre pythainlp
PyThaiNLP 2.2 change log #330
Documentation: https://www.thainlp.org/pythainlp/docs/dev/
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
We build Thai NLP.
PyThaiNLP Team