Commit 6d25baf

Merge pull request #18 from wannaphongcom/develop
PyThaiNLP 1.2 release
2 parents 0eaee2c + 5a9ed1f commit 6d25baf

36 files changed: 495 additions and 121 deletions

.travis.yml

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 
 language: python
 python:
+- "2.7"
 - "3.4"
 - "3.5"
 - "3.6"

README.md

Lines changed: 17 additions & 8 deletions
@@ -11,22 +11,27 @@ Thai NLP in python package.
 
 Thai Natural language processing in Python language.
 
-Supports Python 3.4 +
+Supports Python 2.7 and Python 3.4 +
 
-- Document : [https://pythonhosted.org/pythainlp/](https://pythonhosted.org/pythainlp/)
+- Document : [https://sites.google.com/view/pythainlp/home](https://sites.google.com/view/pythainlp/home)
 - GitHub Home : [https://github.com/wannaphongcom/pythainlp](https://github.com/wannaphongcom/pythainlp)
 
 ### Project status
 
 Developing
 
 ### Version
-1.1
+1.2
+
+### New !
+- add Thai Sentiment (Python 3.4 + only)
+- Supports Python 2.7
 
 ### Capabilities
 - Thai Segment
 - Thai to Latin
 - Thai Postaggers
+- Thai Sentiment
 - Read a number to text in Thai language
 - Sort the words of a sentence
 - Fix the printer forgot to change the language
@@ -35,7 +40,7 @@ Developing
 
 # Install
 
-Supports Python 3.4 +
+Supports Python 2.7 and Python 3.4 +
 
 Stable version
 
@@ -89,17 +94,21 @@ Thai NLP in python package.
 
 Natural language processing หรือ การประมวลภาษาธรรมชาติ โมดูล PyThaiNLP เป็นโมดูลที่ถูกพัฒนาขึ้นเพื่องานวิจัยและพัฒนาการประมวลภาษาธรรมชาติภาษาไทยในภาษา Python
 
-รองรับ Python 3.4 ขึ้นไป
+รองรับ Python 2.7 และ Python 3.4 ขึ้นไป
 
-- เอกสารการใช้งาน : [https://pythonhosted.org/pythainlp/](https://pythonhosted.org/pythainlp/)
+- เอกสารการใช้งาน : [https://sites.google.com/view/pythainlp/home](https://sites.google.com/view/pythainlp/home)
 - หน้าหลัก GitHub : [https://github.com/wannaphongcom/pythainlp](https://github.com/wannaphongcom/pythainlp)
 
 ### สถานะโครงการ
 
 กำลังพัฒนา
 
 ### Version
-1.1
+1.2
+
+### มีอะไรใหม่
+- เพิ่มการรองรับ Sentiment ภาษาไทย (Python 3.4 ขึ้นไป)
+- รองรับ Python 2.7
 
 ### ความสามารถ
 - ตัดคำภาษาไทย
@@ -113,7 +122,7 @@ Natural language processing หรือ การประมวลภาษา
 
 # ติดตั้ง
 
-รองรับ Python 3.4 ขึ้นไป
+รองรับ Python 2.7 และ Python 3.4 ขึ้นไป
 
 รุ่นเสถียร
 

README.rst

Lines changed: 3 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -1,37 +1,27 @@
11
PyThaiNLP
22
=========
33

4-
[|PyPI Downloads|][|pypi|](https://pypi.python.org/pypi/pythainlp)
5-
|Build Status|
6-
74
Thai NLP in python package.
85

9-
- Homepage : https://pythonhosted.org/pythainlp/
6+
- Homepage : https://sites.google.com/view/pythainlp/home
107
- GitHub : https://github.com/wannaphongcom/pythainlp
118

129

1310
Version
1411
~~~~~~~
1512

16-
1.1
13+
1.2
1714

1815

1916
Install
2017
=======
2118

22-
Python 3.4 + only
23-
2419
.. code:: sh
2520
26-
$ pip3 install pythainlp
21+
$ pip install pythainlp
2722
2823
License
2924
~~~~~~~
3025

3126
Apache Software License 2.0
3227

33-
34-
.. |PyPI Downloads| image:: https://img.shields.io/pypi/dm/pythainlp.png
35-
.. |pypi| image:: https://img.shields.io/pypi/v/pythainlp.svg
36-
.. |Build Status| image:: https://travis-ci.org/wannaphongcom/pythainlp.svg?branch=develop
37-
:target: https://travis-ci.org/wannaphongcom/pythainlp

docs/license.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # License
 
 ```
-Copyright 2016, PyThaiNLP Project
+Copyright 2017, PyThaiNLP Project
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.

pythainlp/Text.py

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# -*- coding: utf-8 -*-
+from __future__ import absolute_import,unicode_literals
+from pythainlp.tokenize import *
+import nltk
+def Text(str1):
+    if type(str1) != 'list':
+        str1=word_tokenize(str(str1))
+    return nltk.Text(str1)
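
A note on the added `Text` helper: `type(str1) != 'list'` compares a type object with the string `'list'`, so the condition appears to be true for every input, and a list that is already tokenized would still be passed through `str()` and tokenized again. Below is a minimal sketch of the check the code seems to intend, using `isinstance`. `ensure_tokens` and `whitespace_tokenizer` are hypothetical names for illustration and are not part of PyThaiNLP; the whitespace tokenizer only stands in for `word_tokenize`.

```python
def ensure_tokens(value, tokenizer):
    """Return `value` unchanged if it is already a list, otherwise tokenize it."""
    if isinstance(value, list):
        return value
    return tokenizer(value)


def whitespace_tokenizer(text):
    # Stand-in for pythainlp.tokenize.word_tokenize (assumption for this sketch).
    return text.split()


# A type object never equals a string, so this comparison cannot detect lists.
always_true = type([]) != 'list'

tokens_from_list = ensure_tokens(["a", "b"], whitespace_tokenizer)
tokens_from_text = ensure_tokens("a b", whitespace_tokenizer)
print(always_true, tokens_from_list, tokens_from_text)
```

With a check like this, callers could pass either raw text or a pre-tokenized list without the list being stringified first.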

pythainlp/__init__.py

Lines changed: 7 additions & 6 deletions
@@ -1,8 +1,9 @@
 # -*- coding: utf-8 -*-
-from __future__ import absolute_import,unicode_literals
-__author__ = 'Wannaphong Phatthiyaphaibun'
-__email__ = '[email protected]'
-__version__ = '1.1'
+from __future__ import absolute_import
+import six
+if six.PY3:
+    from pythainlp.sentiment import *
+    from pythainlp.spell import *
 from pythainlp.romanization import *
 from pythainlp.segment import * # เตรียมลบออก 1
 from pythainlp.tokenize import * # แทนที่ 1
@@ -13,5 +14,5 @@
 from pythainlp.postaggers import * # เตรียมลบออก 2
 from pythainlp.tag import * # แทนที่ 2
 from pythainlp.collation import *
-from pythainlp.spell import *
-from pythainlp.test import *
+from pythainlp.test import *
+from pythainlp.Text import *
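
The `if six.PY3:` guard above is how this commit keeps the Python 3 only sentiment feature out of the Python 2.7 import path. As a rough illustration of the same gating with only the standard library, the sketch below checks `sys.version_info` directly. `optional_features` is a hypothetical helper, and the feature names are just labels borrowed from the module names in this diff.

```python
import sys

# Roughly what six.PY3 expresses: are we running on Python 3?
PY3 = sys.version_info[0] >= 3


def optional_features():
    """List feature labels, adding the Python 3 only one when available."""
    features = ["romanization", "tokenize", "tag", "collation"]
    if PY3:
        features.append("sentiment")
    return features


print(optional_features())
```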

pythainlp/change/__init__.py

Lines changed: 10 additions & 8 deletions
@@ -1,9 +1,10 @@
 # -*- coding: utf-8 -*-
-from __future__ import absolute_import
-dictdata={'Z':'(','z':'ผ','X':')','x':'ป','C':'ฉ','c':'แ','V':'ฮ','v':'อ','B':'ฺ','b':'ิ','N':'์','n':'ื','M':'?','m':'ท','<':'ฒ',',':'ม','>':'ฬ','.':'ใ','?':'ฦ','/':'ฝ',
-'A':'ฤ','a':'ฟ','S':'ฆ','s':'ห','D':'ฏ','d':'ก','F':'โ','f':'ด','G':'ฌ','g':'เ','H':'็','h':'้','J':'๋','j':'j','K':'ษ','k':'า','L':'ศ','l':'ส',':':'ซ','"':'.',"'":"ง",':':'ซ',';':'ว',
-'Q':'๐','q':'ๆ','W':'"','w':'ไ','E':'ฎ','e':'ำ','R':'ฑ','r':'พ','T':'ธ','t':'ะ','Y':'ํ','y':'ั','U':'๊','u':'ี','I':'ณ','i':'ร','O':'ฯ','o':'น','P':'ญ','p':'ย','{':'ฐ','[':'บ','}':',',']':'ล','|':'ฅ',']':'ฃ',
-'~':'%','`':'_','@':'๑','2':'/','#':'๒','3':'-','$':'๓','4':'ภ','%':'๔','5':'ถ','^':'ู','6':'ุ','&':'฿','7':'ึ','*':'๕','8':'ค','(':'๖','9':'ต',')':'๗','0':'จ','_':'๘','-':'ข','+':'๙','=':'ช'}
+from __future__ import absolute_import,unicode_literals
+import six
+dictdata={u'Z':u'(',u'z':u'ผ',u'X':u')',u'x':u'ป',u'C':u'ฉ',u'c':u'แ',u'V':u'ฮ',u'v':u'อ',u'B':u'ฺ',u'b':u'ิ',u'N':u'์',u'n':u'ื',u'M':u'?',u'm':u'ท',u'<':u'ฒ',u',u':u'ม',u'>':u'ฬ',u'.':u'ใ',u'?':u'ฦ',u'/':u'ฝ',
+'A':u'ฤ',u'a':u'ฟ',u'S':u'ฆ',u's':u'ห',u'D':u'ฏ',u'd':u'ก',u'F':u'โ',u'f':u'ด',u'G':u'ฌ',u'g':u'เ',u'H':u'็',u'h':u'้',u'J':u'๋',u'j':u'j',u'K':u'ษ',u'k':u'า',u'L':u'ศ',u'l':u'ส',u':u':u'ซ',u'"':u'.',"'":"ง",u':u':u'ซ',u';':u'ว',
+'Q':u'๐',u'q':u'ๆ',u'W':u'"',u'w':u'ไ',u'E':u'ฎ',u'e':u'ำ',u'R':u'ฑ',u'r':u'พ',u'T':u'ธ',u't':u'ะ',u'Y':u'ํ',u'y':u'ั',u'U':u'๊',u'u':u'ี',u'I':u'ณ',u'i':u'ร',u'O':u'ฯ',u'o':u'น',u'P':u'ญ',u'p':u'ย',u'{':u'ฐ',u'[':u'บ',u'}':u',u',u']':u'ล',u'|':u'ฅ',u']':u'ฃ',
+'~':u'%',u'`':u'_',u'@':u'๑',u'2':u'/',u'#':u'๒',u'3':u'-',u'$':u'๓',u'4':u'ภ',u'%':u'๔',u'5':u'ถ',u'^':u'ู',u'6':u'ุ',u'&':u'฿',u'7':u'ึ',u'*':u'๕',u'8':u'ค',u'(':u'๖',u'9':u'ต',u')':u'๗',u'0':u'จ',u'_':u'๘',u'-':u'ข',u'+':u'๙',u'=':u'ช'}
 # แก้ไขพิมพ์ภาษาไทยผิดภาษา
 def texttothai(data):
     """เป็นคำสั่งแก้ไขข้อความที่พิมพ์ผิดภาษา ต้องการภาษาไทย แต่พิมพ์เป็นภาษาอังกฤษ
@@ -16,14 +17,15 @@ def texttothai(data):
         except:
             a = a
         data2+=a
+    del data
     return data2
 # แก้ไขพิมพ์ภาษาอังกฤษผิดภาษา
 def texttoeng(data):
     """เป็นคำสั่งแก้ไขข้อความที่พิมพ์ผิดภาษา ต้องการภาษาอังกฤษ แต่พิมพ์เป็นภาษาไทย
     รับค่าเป็น ''str'' คืนค่าเป็น ''str''"""
     data = list(data)
     data2 = ""
-    dictdataeng= {v: k for k, v in iteritems(dictdata)}
+    dictdataeng= {v: k for k, v in six.iteritems(dictdata)}
     for a in data:
         try:
             a = dictdataeng[a]
@@ -37,5 +39,5 @@ def texttoeng(data):
 a=texttothai(a)
 b="นามรสนอำันี"
 b=texttoeng(b)
-print(a)
-print(b)
+six.print_(a)
+six.print_(b)
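
One detail in the added `dictdata` lines deserves a second look. The `u` prefixes look as if they were inserted by a textual search-and-replace, and where the original key or value was a comma or colon the result reads `u',u'` or `u':u'`. Python parses those as the two-character strings `,u` and `:u`, so the entries for `,` and `:` (and the value for `}`) may no longer be what the old table contained. The sketch below shows the effect on a few entries copied from the table, together with a character-remapping helper and its inverse. `remap` is a hypothetical name; the small `table` is a hand-picked subset used only for illustration.

```python
# -*- coding: utf-8 -*-

# Literal copied from the added lines, e.g. the key in  u',u':u'ม'
suspect_key = u',u'

# A few unambiguous entries taken from dictdata (keyboard key -> Thai character).
table = {u'l': u'ส', u';': u'ว', u'y': u'ั', u'f': u'ด', u'u': u'ี'}


def remap(text, mapping):
    """Replace every character found in `mapping`; leave all others unchanged."""
    return u''.join(mapping.get(ch, ch) for ch in text)


# Inverse table, built the same way texttoeng builds dictdataeng.
inverse = {v: k for k, v in table.items()}

thai = remap(u'l;ylfu', table)
print(len(suspect_key))        # length of the suspicious key literal
print(thai)
print(remap(thai, inverse))
```

Using `mapping.get(ch, ch)` gives the same fall-through behaviour as the `try`/`except` in `texttothai` without catching unrelated exceptions.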

pythainlp/chunk/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 # -*- coding: utf-8 -*-
-from __future__ import absolute_import
+from __future__ import absolute_import,unicode_literals
 # TODO

pythainlp/collation/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # -*- coding: utf-8 -*-
-from __future__ import absolute_import
+from __future__ import absolute_import,unicode_literals
 import icu
 collator1 = icu.Collator.createInstance(icu.Locale('th_TH'))
 # เรียงลำดับข้อมูล list ภาษาไทย

pythainlp/corpus/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
-from __future__ import absolute_import
-__all__ = ["thaipos", "thaiword","alphabet","tone","country","wordnet"]
+from __future__ import absolute_import,unicode_literals
+#__all__ = ["thaipos", "thaiword","alphabet","tone","country","wordnet"]
 from .thaipos import get_data
 from .thaiword import get_data
 from .alphabet import get_data

pythainlp/corpus/alphabet.py

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,7 @@
-from __future__ import absolute_import,unicode_literals
+from __future__ import absolute_import,unicode_literals
+import six
 def get_data():
     """เป็นคำสั่งสำหรับดึงตัวอักษร ก - ฮ ในภาษาไทย
     คืนค่า list
     """
-    return ["ก","ข","ฃ","ค","ฅ","ฆ","ง","จ","ฉ","ช","ซ","ฌ","ญ","ฎ","ฏ","ฐ","ฑ","ฒ","ณ","ด","ต","ถ","ท","ธ","น","บ","ป","ผ","ฝ","พ","ฟ","ภ","ม","ย","ร","ล","ว","ศ","ษ","ส","ห","ฬ","อ","ฮ"]
+    return [u"ก",u"ข",u"ฃ",u"ค",u"ฅ",u"ฆ",u"ง",u"จ",u"ฉ",u"ช",u"ซ",u"ฌ",u"ญ",u"ฎ",u"ฏ",u"ฐ",u"ฑ",u"ฒ",u"ณ",u"ด",u"ต",u"ถ",u"ท",u"ธ",u"น",u"บ",u"ป",u"ผ",u"ฝ",u"พ",u"ฟ",u"ภ",u"ม",u"ย",u"ร",u"ล",u"ว",u"ศ",u"ษ",u"ส",u"ห",u"ฬ",u"อ",u"ฮ"]

pythainlp/corpus/country.py

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

pythainlp/corpus/thaipos.py

Lines changed: 3 additions & 2 deletions
@@ -1,12 +1,13 @@
 # -*- coding: utf-8 -*-
-from __future__ import absolute_import
+from __future__ import absolute_import,unicode_literals
 from builtins import open
 import pythainlp
 import os
 import json
+import codecs
 templates_dir = os.path.join(os.path.dirname(pythainlp.__file__), 'corpus')
 template_file = os.path.join(templates_dir, 'thaipos.json')
 def get_data():
-    with open(template_file) as f:
+    with codecs.open(template_file,encoding='utf8') as f:
         model = json.load(f)
     return model
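
The switch from a bare `open` to `codecs.open(..., encoding='utf8')` makes the corpus file decode the same way regardless of the platform's default encoding. A comparable sketch using `io.open`, which also accepts an `encoding` argument on Python 2.7 and 3.x, is shown below. `load_json_utf8` is a hypothetical helper, and the sample entry and temporary file are made up for the demonstration; they are not the real `thaipos.json`.

```python
# -*- coding: utf-8 -*-
import io
import json
import os
import tempfile


def load_json_utf8(path):
    """Read a UTF-8 encoded JSON file and return the decoded Python object."""
    with io.open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Write a throwaway file with one made-up entry, then read it back.
sample = {u"คำ": u"tag"}
path = os.path.join(tempfile.mkdtemp(), "sample.json")
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(json.dumps(sample, ensure_ascii=False))

loaded = load_json_utf8(path)
print(loaded == sample)
```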

pythainlp/corpus/thaiword.py

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
-# -*- coding: utf-8 -*-
-from __future__ import absolute_import
+# -*- coding: utf-8 -*-
+from __future__ import absolute_import,unicode_literals
 import os
 import codecs
 import pythainlp

pythainlp/corpus/tone.py

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
-from __future__ import absolute_import,unicode_literals
+from __future__ import absolute_import,unicode_literals
 def get_data():
     """เป็นคำสั่งสำหรับตัววรรณยุกต์ในภาษาไทย
     คืนค่า list
     """
-    return ['่','้','๊','๋']
+    return [u'่',u'้',u'๊',u'๋']

pythainlp/corpus/wordnet.py

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
-# WordNet ภาษาไทย
-from __future__ import print_function
+# WordNet ภาษาไทย
+from __future__ import unicode_literals,print_function,absolute_import
 import sqlite3
 import pythainlp
 import os
Binary file not shown.

pythainlp/date/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # -*- coding: utf-8 -*-
-from __future__ import absolute_import
+from __future__ import absolute_import,unicode_literals
 import icu
 import datetime
 # TODO

pythainlp/number/__init__.py

Lines changed: 12 additions & 12 deletions
@@ -3,17 +3,17 @@
 from __future__ import absolute_import,division,print_function,unicode_literals
 from builtins import dict
 from builtins import int
-import math
-p = [['ภาษาไทย', 'ตัวเลข','เลขไทย'],
-     ['หนึ่ง', '1', '๑'],
-     ['สอง', '2', '๒'],
-     ['สาม', '3', '๓'],
-     ['สี่', '4', '๔'],
-     ['ห้า', '5', '๕'],
-     ['หก', '6', '๖'],
-     ['หก', '7', '๗'],
-     ['แปด', '8', '๘'],
-     ['เก้า', '9', '๙']]
+import math,six
+p = [[u'ภาษาไทย', u'ตัวเลข',u'เลขไทย'],
+     [u'หนึ่ง', u'1', u'๑'],
+     [u'สอง', u'2', u'๒'],
+     [u'สาม', u'3', u'๓'],
+     [u'สี่', u'4', u'๔'],
+     [u'ห้า', u'5', u'๕'],
+     [u'หก', u'6', u'๖'],
+     [u'หก', u'7', u'๗'],
+     [u'แปด', u'8', u'๘'],
+     [u'เก้า', u'9', u'๙']]
 thaitonum = dict((x[2], x[1]) for x in p[1:])
 p1 = dict((x[0], x[1]) for x in p[1:])
 d1 = 0
@@ -96,7 +96,7 @@ def ReadNumber(number):
             ret += "เอ็ด"
         else:
             ret += number_call[d]
-        if(d):
+        if d:
             ret += position_call[pos]
         else:
             ret += ""
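
The table `p` drives two lookups: `thaitonum` maps Thai digits to Arabic digits, and `p1` maps Thai number words to Arabic digits. The row for 7 reuses the word `หก` (six), while the Thai word for seven is `เจ็ด`, so this appears to be a typo that the commit carries over unchanged. Because dictionary keys must be unique, the duplicate would leave `p1` with eight entries and make `หก` resolve to `7`. The sketch below rebuilds both dictionaries from the rows shown in the diff and adds a small digit converter. `thai_digits_to_arabic` is a hypothetical helper and not a function from the module.

```python
# -*- coding: utf-8 -*-

# Rows copied from the added table: [Thai word, Arabic digit, Thai digit].
p = [[u'ภาษาไทย', u'ตัวเลข', u'เลขไทย'],
     [u'หนึ่ง', u'1', u'๑'],
     [u'สอง', u'2', u'๒'],
     [u'สาม', u'3', u'๓'],
     [u'สี่', u'4', u'๔'],
     [u'ห้า', u'5', u'๕'],
     [u'หก', u'6', u'๖'],
     [u'หก', u'7', u'๗'],
     [u'แปด', u'8', u'๘'],
     [u'เก้า', u'9', u'๙']]

# Same constructions as in the module.
thaitonum = dict((x[2], x[1]) for x in p[1:])
p1 = dict((x[0], x[1]) for x in p[1:])


def thai_digits_to_arabic(text):
    """Replace Thai digits with Arabic digits; other characters pass through."""
    return u''.join(thaitonum.get(ch, ch) for ch in text)


print(thai_digits_to_arabic(u'๑๒๓'))
print(len(thaitonum), len(p1))  # the word table has fewer entries than the digit table
```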

pythainlp/postaggers/__init__.py

Lines changed: 22 additions & 3 deletions
@@ -1,4 +1,23 @@
 # -*- coding: utf-8 -*-
-from __future__ import absolute_import
-__all__ = ["text"]
-from .text import tag
+from __future__ import absolute_import,division,print_function,unicode_literals
+from pythainlp.segment import segment
+import pythainlp
+import codecs
+import os
+import json
+import six
+import nltk.tag, nltk.data
+templates_dir = os.path.join(os.path.dirname(pythainlp.__file__), 'corpus')
+template_file = os.path.join(templates_dir, 'thaipos.json')
+#default_tagger = nltk.data.load(nltk.tag._POS_TAGGER)
+def data():
+    with codecs.open(template_file,'r',encoding='utf-8-sig') as handle:
+        model = json.load(handle)
+    return model
+data1 =data()
+#Postaggers ภาษาไทย
+def tag(text):
+    """รับค่าเป็นข้อความ ''str'' คืนค่าเป็น ''list'' เช่น [('ข้อความ', 'ชนิดคำ')]"""
+    text= segment(text)
+    tagger = nltk.tag.UnigramTagger(model=data1)# backoff=default_tagger)
+    return tagger.tag(text)
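
The tagger above loads a word-to-tag dictionary with the `utf-8-sig` codec, which drops a leading byte order mark if the JSON file has one, and then hands that dictionary to NLTK's `UnigramTagger`. The sketch below illustrates the two ideas without NLTK: it writes a small file that starts with a BOM, reads it back with `utf-8-sig`, and tags tokens by plain dictionary lookup. The lookup is a simplified stand-in for the unigram tagger and is not the NLTK implementation; `load_model`, `lookup_tag`, and the one-entry model are made up for illustration.

```python
# -*- coding: utf-8 -*-
import codecs
import io
import json
import os
import tempfile


def load_model(path):
    """Load a JSON word->tag mapping, tolerating a UTF-8 byte order mark."""
    with io.open(path, encoding="utf-8-sig") as handle:
        return json.load(handle)


def lookup_tag(tokens, model):
    """Pair each token with its tag from `model`, or None when it is unknown."""
    return [(token, model.get(token)) for token in tokens]


# Build a throwaway model file that begins with a BOM.
path = os.path.join(tempfile.mkdtemp(), "model.json")
with io.open(path, "wb") as handle:
    handle.write(codecs.BOM_UTF8 + json.dumps({u"แมว": u"N"}).encode("utf-8"))

model = load_model(path)
print(lookup_tag([u"แมว", u"zz"], model))
```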

pythainlp/postaggers/text.py

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
 import codecs
 import os
 import json
+import six
 import nltk.tag, nltk.data
 templates_dir = os.path.join(os.path.dirname(pythainlp.__file__), 'corpus')
 template_file = os.path.join(templates_dir, 'thaipos.json')

pythainlp/romanization/__init__.py

Lines changed: 7 additions & 5 deletions
@@ -1,7 +1,9 @@
 # -*- coding: utf-8 -*-
 from __future__ import absolute_import,unicode_literals
-__all__ = ['pyicu']
-try:
-    from .pyicu import romanization
-except:
-    print("error")
+import icu
+import six
+# ถอดเสียงภาษาไทยเป็น Latin
+def romanization(data):
+    """เป็นคำสั่ง ถอดเสียงภาษาไทยเป็น Latin รับค่า ''str'' ข้อความ คืนค่าเป็น ''str'' ข้อความ Latin"""
+    thai2latin = icu.Transliterator.createInstance('Thai-Latin')
+    return thai2latin.transliterate(data)

pythainlp/romanization/pyicu.py

Lines changed: 3 additions & 1 deletion
@@ -1,7 +1,9 @@
 # -*- coding: utf-8 -*-
+from __future__ import absolute_import,unicode_literals
 import icu
+import six
 # ถอดเสียงภาษาไทยเป็น Latin
 def romanization(data):
     """เป็นคำสั่ง ถอดเสียงภาษาไทยเป็น Latin รับค่า ''str'' ข้อความ คืนค่าเป็น ''str'' ข้อความ Latin"""
     thai2latin = icu.Transliterator.createInstance('Thai-Latin')
-    return thai2latin.transliterate(data)
+    return thai2latin.transliterate(six.u(data))
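
The last change wraps the input in `six.u()` before transliteration. The six documentation describes `u()` as a substitute for unicode string literals and notes that on Python 2 it is only safe with ASCII data, so passing runtime Thai text through it may be fragile there. One alternative, sketched below under that assumption, is to decode byte strings explicitly and pass text through untouched. `to_text` is a hypothetical helper and is not part of six or PyThaiNLP.

```python
# -*- coding: utf-8 -*-


def to_text(value, encoding="utf-8"):
    """Return `value` as text: decode byte strings, pass text through as is."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


encoded = u"ไทย".encode("utf-8")
print(to_text(encoded) == u"ไทย")
print(to_text(u"ไทย") == u"ไทย")
```

On Python 2 the name `bytes` is an alias of `str`, so the same check covers byte strings on both major versions.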
