Utility functions: rearrange package locations + add thai_strftime() date and time formatter #160

Merged (16 commits) on Nov 21, 2018
Changes from 15 commits
4 changes: 2 additions & 2 deletions README.md
@@ -56,7 +56,7 @@ where ```extras``` can be
- ```deepcut``` (to support deepcut machine-learnt tokenizer)
- ```icu``` (for ICU support in transliteration and tokenization)
- ```ipa``` (for International Phonetic Alphabet support in transliteration)
- ```ml``` (to support ULMFiT models, like one for sentiment analyser)
- ```ml``` (to support ULMFiT models)
- ```ner``` (for named-entity recognizer)
- ```thai2rom``` (for machine-learnt romanization)
- ```thai2vec``` (for Thai word vector)
@@ -141,7 +141,7 @@ $ pip install pythainlp[extra1,extra2,...]
- ```deepcut``` (for the deepcut word tokenizer)
- ```icu``` (for transliteration to phonetic script and word tokenization with ICU)
- ```ipa``` (for transliteration to the International Phonetic Alphabet (IPA))
- ```ml``` (for ULMFiT model support, used in functions such as sentiment analysis)
- ```ml``` (for ULMFiT model support)
- ```ner``` (for named-entity tagging)
- ```thai2rom``` (for transliteration to Latin script)
- ```thai2vec``` (for word vectors)
11 changes: 0 additions & 11 deletions docs/api/change.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/api/collation.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/api/date.rst

This file was deleted.

11 changes: 0 additions & 11 deletions docs/api/ner.rst

This file was deleted.

12 changes: 0 additions & 12 deletions docs/api/number.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/api/sentiment.rst

This file was deleted.

3 changes: 3 additions & 0 deletions docs/api/soundex.rst
@@ -4,6 +4,9 @@ pythainlp.soundex
====================================
The :mod:`pythainlp.soundex` module provides Soundex algorithms for Thai, which encode words by how they sound.

Modules
-------

.. autofunction:: lk82
.. autofunction:: udom83
.. autofunction:: metasound
3 changes: 3 additions & 0 deletions docs/api/spell.rst
@@ -4,4 +4,7 @@ pythainlp.spell
=====================================
The :mod:`pythainlp.spell` module finds the correctly spelled word closest to the given text.

Modules
-------

.. autofunction:: spell
5 changes: 5 additions & 0 deletions docs/api/tag.rst
@@ -4,4 +4,9 @@ pythainlp.tag
=====================================
The :mod:`pythainlp.tag` module contains functions for tagging text, such as part-of-speech tagging and named-entity recognition.

Modules
-------

.. autofunction:: pos_tag
.. autoclass:: ThaiNameTagger
   :members: get_ner
3 changes: 3 additions & 0 deletions docs/api/tokenize.rst
@@ -5,6 +5,9 @@ pythainlp.tokenize
=====================================
The :mod:`pythainlp.tokenize` module contains functions for splitting Thai text into units such as words and subwords.

Modules
-------

.. autofunction:: word_tokenize
.. autofunction:: dict_word_tokenize
.. autofunction:: subword_tokenize
8 changes: 6 additions & 2 deletions docs/api/romanization.rst → docs/api/transliterate.rst
@@ -4,7 +4,11 @@ pythainlp.transliterate
====================================
The :mod:`pythainlp.transliterate` module romanizes Thai text (writes it in the Latin alphabet) and transliterates it into phonetic notation such as the IPA.

Modules
-------

.. autofunction:: romanize
.. autofunction:: transliterate
.. currentmodule:: pythainlp.transliterate.thai2rom
.. autoclass:: thai2rom
.. currentmodule:: pythainlp.transliterate.ThaiTransliterator
.. autoclass:: ThaiTransliterator
   :members: romanize
30 changes: 30 additions & 0 deletions docs/api/util.rst
@@ -0,0 +1,30 @@
.. currentmodule:: pythainlp.util

pythainlp.util
=====================================
The :mod:`pythainlp.util` module contains utility functions, such as text conversion and formatting.

Modules
-------

.. autofunction:: arabic_digit_to_thai_digit
.. autofunction:: bahttext
.. autofunction:: collate
.. autofunction:: deletetone
.. autofunction:: digit_to_text
.. autofunction:: eng_to_thai
.. autofunction:: find_keyword
.. autofunction:: is_thai
.. autofunction:: is_thaichar
.. autofunction:: is_thaiword
.. autofunction:: normalize
.. autofunction:: now_reign_year
.. autofunction:: num_to_thaiword
.. autofunction:: rank
.. autofunction:: reign_year_to_ad
.. autofunction:: text_to_arabic_digit
.. autofunction:: text_to_thai_digit
.. autofunction:: thai_strftime
.. autofunction:: thai_to_eng
.. autofunction:: thai_digit_to_arabic_digit
.. autofunction:: thaiword_to_num
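Several of the functions listed above convert between Arabic digits (0-9) and Thai digits (๐-๙). Here is a minimal sketch of the idea, which is not PyThaiNLP's implementation. It builds a character translation table and assumes only that the Thai digits occupy the contiguous code points U+0E50 to U+0E59. The helper names `to_thai_digits` and `to_arabic_digits` are hypothetical.

```python
# Thai digits ๐-๙ sit at U+0E50..U+0E59, in the same order as 0-9.
ARABIC_DIGITS = "0123456789"
THAI_DIGITS = "".join(chr(0x0E50 + i) for i in range(10))

_TO_THAI = str.maketrans(ARABIC_DIGITS, THAI_DIGITS)
_TO_ARABIC = str.maketrans(THAI_DIGITS, ARABIC_DIGITS)


def to_thai_digits(text):
    """Replace every Arabic digit with its Thai counterpart; keep the rest."""
    return text.translate(_TO_THAI)


def to_arabic_digits(text):
    """Replace every Thai digit with its Arabic counterpart; keep the rest."""
    return text.translate(_TO_ARABIC)


print(to_thai_digits("พ.ศ. 2561"))  # พ.ศ. ๒๕๖๑
```

Non-digit characters pass through unchanged, so the helpers work on mixed Thai and Latin text.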
41 changes: 13 additions & 28 deletions docs/pythainlp-dev-thai.md
@@ -215,13 +215,6 @@ summarize(text="อาหาร หมายถึง ของแข็งห

Create word vectors

```python
from pythainlp.word_vector import thai2vec
```

Currently only thai2vec is supported (https://github.com/cstorm125/thai2vec)

Developed by Charin Polpanumas

#### thai2vec

@@ -247,6 +240,8 @@ from pythainlp.word_vector import thai2vec
What it does: finds words that are used more often than a configurable minimum count, after removing stopwords.

```python
from pythainlp.util import find_keyword

find_keyword(word_list, lentext=3)
```
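The description of `find_keyword` (count the words, drop stopwords, keep those above a minimum) can be sketched with `collections.Counter`. This is a hypothetical stand-in and not the library's code. The name `find_keyword_sketch`, its `stopwords` parameter, and the inclusive `>=` threshold are all assumptions.

```python
from collections import Counter


def find_keyword_sketch(word_list, lentext=3, stopwords=frozenset()):
    """Hypothetical stand-in for pythainlp.util.find_keyword.

    Counts each word that is not a stopword and keeps only those whose
    count reaches the threshold `lentext` (inclusive, by assumption).
    """
    counts = Counter(w for w in word_list if w not in stopwords)
    return {word: n for word, n in counts.items() if n >= lentext}


words = ["แมว", "กิน", "ปลา", "แมว", "กิน", "แมว", "และ"]
print(find_keyword_sketch(words, lentext=2, stopwords={"และ"}))
# {'แมว': 3, 'กิน': 2}
```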

@@ -347,22 +342,26 @@ thaiword_to_num(["หกหมื่น", "หกพัน", "หกร้อย
Sort Thai data in a list

```python
from pythainlp.collation import collate
from pythainlp.util import collate
print(collate(["ไก่", "ไข่", "กา", "ฮา"])) # ['กา', 'ไก่', 'ไข่', 'ฮา']
```

Takes a list and returns a list.

### date

#### now
#### thai_strftime

Get the current time in Thai
Format a date and time as Thai text, with the year in the Buddhist Era

```python
from pythainlp.date import now
import datetime
from pythainlp.util import thai_strftime

now() # '30 พฤษภาคม 2560 18:45:24'
fmt = "%Aที่ %-d %B พ.ศ. %Y เวลา %H:%Mน. (%a %d-%b-%y)"
date = datetime.datetime(1976, 10, 6, 1, 40)
print(thai_strftime(date, fmt))
# วันพุธที่ 6 ตุลาคม พ.ศ. 2519 เวลา 01:40น. (พ 06-ต.ค.-19)
```
### rank

@@ -371,7 +370,7 @@ now() # '30 พฤษภาคม 2560 18:45:24'
Find the most frequently used words

```python
from pythainlp.rank import rank
from pythainlp.util import rank

rank(word_list)  # word_list: a list of words, e.g. from word_tokenize()
```
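`rank` is described as finding the most frequently used words. The sketch below is minimal and assumes the result behaves like a `collections.Counter`, which this page does not confirm. With that assumption, the top words come straight from `most_common()`. The name `rank_sketch` is hypothetical.

```python
from collections import Counter


def rank_sketch(words):
    """Hypothetical stand-in for pythainlp.util.rank: word -> frequency."""
    return Counter(words)


counts = rank_sketch(["แมว", "กิน", "ปลา", "แมว"])
print(counts.most_common(1))  # [('แมว', 2)]
```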
@@ -408,20 +407,6 @@ print(udom83("รถ")) # ร800000
print(metasound("รัก")) # 'ร100'
```

### sentiment

Thai sentiment analysis, using data from [https://github.com/PyThaiNLP/lexicon-thai/tree/master/ข้อความ/](https://github.com/PyThaiNLP/lexicon-thai/tree/master/ข้อความ/)

```python
from pythainlp.sentiment import sentiment

sentiment(str)
```

Takes a str

Returns a str whose value is either "pos" or "neg"

### Util

#### normalize
@@ -559,7 +544,7 @@ for province in provinces():
Tags the names of provinces in Thailand

```python
from pythainlp.ner.locations import tag_provinces
from pythainlp.tag.locations import tag_provinces

tag_provinces(text_list)
```
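One way to picture province tagging is as a dictionary lookup over a list of tokens. The sketch below assumes the output is a list of `(token, tag)` pairs, where province names get `B-LOCATION` and all other tokens get `O`. The tag names, the two-entry province set, and `tag_provinces_sketch` are illustrative assumptions and do not come from the library's data.

```python
# Illustrative subset only; a real list would cover every province.
PROVINCES = {"กรุงเทพมหานคร", "เชียงใหม่"}


def tag_provinces_sketch(tokens, provinces=PROVINCES):
    """Pair each token with B-LOCATION if it is a known province, else O."""
    return [(tok, "B-LOCATION" if tok in provinces else "O") for tok in tokens]


print(tag_provinces_sketch(["ฉัน", "ไป", "เชียงใหม่"]))
```

The sketch matches whole tokens only, so the input has to be tokenized first and multi-token place names would be missed.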
2 changes: 1 addition & 1 deletion examples/collation.py → examples/collate.py
@@ -1,5 +1,5 @@
# -*- coding: utf-8 -*-

from pythainlp.collation import collate
from pythainlp.util import collate

print(collate(["ไก่", "ไข่", "ก", "ฮา"])) # ['ก', 'ไก่', 'ไข่', 'ฮา']
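Thai dictionary order does not follow code-point order. A leading vowel such as ไ is written before its consonant but sorted by that consonant, so in the example above ไก่ sorts after ก and before ไข่, whereas plain `sorted()` puts ฮา first. The sort key below is a minimal sketch that assumes two simple rules: swap a leading vowel with the consonant after it, and use tone marks only to break ties. This is not necessarily how `pythainlp.util.collate` works. The names `thai_sort_key` and `collate_sketch` are hypothetical.

```python
import re

# Leading vowels เ แ โ ใ ไ (U+0E40..U+0E44) are written before the consonant
# they follow in speech, but dictionary order sorts by that consonant.
_LEAD_VOWEL_CONSONANT = re.compile("([\u0e40-\u0e44])([\u0e01-\u0e2e])")
# Tone marks and similar signs (U+0E47..U+0E4C) only break ties here.
_TONE_MARKS = re.compile("[\u0e47-\u0e4c]")


def thai_sort_key(word):
    """Hypothetical sort key: (base letters with vowel swapped, tone marks)."""
    base = _TONE_MARKS.sub("", word)
    base = _LEAD_VOWEL_CONSONANT.sub(r"\2\1", base)
    return (base, "".join(_TONE_MARKS.findall(word)))


def collate_sketch(words):
    """Return a new list sorted in (approximate) Thai dictionary order."""
    return sorted(words, key=thai_sort_key)


words = ["ไก่", "ไข่", "ก", "ฮา"]
print(sorted(words))          # code-point order: ['ก', 'ฮา', 'ไก่', 'ไข่']
print(collate_sketch(words))  # ['ก', 'ไก่', 'ไข่', 'ฮา']
```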
10 changes: 10 additions & 0 deletions examples/date.py
@@ -0,0 +1,10 @@
# -*- coding: utf-8 -*-

import datetime
from pythainlp.util import thai_strftime

fmt = "%Aที่ %-d %B พ.ศ. %Y เวลา %H:%Mน. (%a %d-%b-%y)"
date = datetime.datetime(1976, 10, 6, 1, 40)

# วันพุธที่ 6 ตุลาคม พ.ศ. 2519 เวลา 01:40น. (พ 06-ต.ค.-19)
print(thai_strftime(date, fmt))
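Judging from the example above, `thai_strftime` differs from `datetime.strftime` in three ways:

- `%A`, `%a`, `%B` and `%b` produce Thai weekday and month names.
- `%Y` and `%y` produce Buddhist Era years (Common Era + 543).
- `%-d` produces a day with no leading zero.

The sketch below is a simplified, hypothetical re-implementation built on those assumptions. The name `thai_strftime_sketch` and the abbreviation tables are illustrative and are not copied from PyThaiNLP. Every other directive is passed to `datetime.strftime`. The sketch handles the `%-` flag itself, because Python's own `strftime` supports that flag only on some platforms.

```python
import datetime

# Thai names, indexed by datetime.weekday() (Monday = 0) and by month - 1.
THAI_WEEKDAYS = ["วันจันทร์", "วันอังคาร", "วันพุธ", "วันพฤหัสบดี",
                 "วันศุกร์", "วันเสาร์", "วันอาทิตย์"]
THAI_WEEKDAYS_ABBR = ["จ", "อ", "พ", "พฤ", "ศ", "ส", "อา"]
THAI_MONTHS = ["มกราคม", "กุมภาพันธ์", "มีนาคม", "เมษายน", "พฤษภาคม",
               "มิถุนายน", "กรกฎาคม", "สิงหาคม", "กันยายน", "ตุลาคม",
               "พฤศจิกายน", "ธันวาคม"]
THAI_MONTHS_ABBR = ["ม.ค.", "ก.พ.", "มี.ค.", "เม.ย.", "พ.ค.", "มิ.ย.",
                    "ก.ค.", "ส.ค.", "ก.ย.", "ต.ค.", "พ.ย.", "ธ.ค."]
BE_OFFSET = 543  # Buddhist Era year = Common Era year + 543


def thai_strftime_sketch(dt, fmt):
    """Simplified, hypothetical stand-in for pythainlp.util.thai_strftime."""
    be_year = dt.year + BE_OFFSET
    thai = {
        "A": THAI_WEEKDAYS[dt.weekday()],
        "a": THAI_WEEKDAYS_ABBR[dt.weekday()],
        "B": THAI_MONTHS[dt.month - 1],
        "b": THAI_MONTHS_ABBR[dt.month - 1],
        "Y": str(be_year),
        "y": "%02d" % (be_year % 100),
    }
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] != "%" or i + 1 == len(fmt):
            out.append(fmt[i])  # literal character (Thai text included)
            i += 1
            continue
        no_pad = fmt[i + 1] == "-" and i + 2 < len(fmt)  # e.g. "%-d"
        code = fmt[i + 2] if no_pad else fmt[i + 1]
        # Thai-specific directives first; everything else goes to strftime.
        text = thai[code] if code in thai else dt.strftime("%" + code)
        if no_pad:
            text = text.lstrip("0") or "0"
        out.append(text)
        i += 3 if no_pad else 2
    return "".join(out)


fmt = "%Aที่ %-d %B พ.ศ. %Y เวลา %H:%Mน. (%a %d-%b-%y)"
print(thai_strftime_sketch(datetime.datetime(1976, 10, 6, 1, 40), fmt))
# วันพุธที่ 6 ตุลาคม พ.ศ. 2519 เวลา 01:40น. (พ 06-ต.ค.-19)
```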
5 changes: 2 additions & 3 deletions pythainlp/__init__.py
@@ -24,10 +24,9 @@
thai_characters = "".join([thai_letters, thai_punctuations, thai_digits, thai_symbols])


from pythainlp.collation import collate
from pythainlp.date import now
from pythainlp.transliterate import romanize, transliterate
from pythainlp.soundex import soundex
from pythainlp.spell import spell
from pythainlp.tag import pos_tag
from pythainlp.tokenize import sent_tokenize, tcc, word_tokenize
from pythainlp.transliterate import romanize, transliterate
from pythainlp.util import collate, thai_strftime
8 changes: 0 additions & 8 deletions pythainlp/collation/__init__.py

This file was deleted.

60 changes: 0 additions & 60 deletions pythainlp/date/__init__.py

This file was deleted.

16 changes: 0 additions & 16 deletions pythainlp/keywords/__init__.py

This file was deleted.

25 changes: 0 additions & 25 deletions pythainlp/number/__init__.py

This file was deleted.
