Commit 1c07e37

Merge pull request #850 from Saharshjain78/dev
Improving the documentation by fixing typos and adding necessary details, explanations of the code, and missing details about models and examples.
2 parents 52524c4 + dd8cc72 commit 1c07e37

19 files changed, +1138 -174 lines changed

docs/api/augment.rst

+58-14
Original file line numberDiff line numberDiff line change
@@ -1,25 +1,69 @@
11
.. currentmodule:: pythainlp.augment
22

3-
pythainlp.augment
4-
=================
3+
pythainlp.augment Module
4+
========================
55

6-
The :class:`textaugment` is Thai text augment. This function for text augment task.
6+
Introduction
7+
------------
78

8-
Modules
9-
-------
9+
The `pythainlp.augment` module provides tools for text augmentation in Thai. Text augmentation generates alternative versions of an original text to enrich and diversify textual data, which can improve the quality and variety of Thai data for NLP tasks.
10+
11+
TextAugment Class
12+
-----------------
13+
14+
The central component of the `pythainlp.augment` module is the `TextAugment` class. This class provides various text augmentation techniques and functions to enhance the diversity of your text data. It offers the following methods:
15+
16+
.. autoclass:: pythainlp.augment.TextAugment
17+
:members:
18+
19+
WordNetAug Class
20+
----------------
21+
22+
The `WordNetAug` class performs text augmentation using WordNet, a lexical database that groups words into sets of synonyms. It augments Thai text by substituting words with their WordNet synonyms. The following methods are available within this class:
23+
24+
.. autoclass:: pythainlp.augment.WordNetAug
25+
:members:
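
As a rough illustration of synonym-based augmentation in general, the sketch below builds sentence variants from a toy synonym table. It is not the `WordNetAug` implementation: the function name `augment_by_synonyms`, the `max_sentences` parameter, and the English toy data are all assumptions made for this example.

```python
from itertools import product
from typing import Dict, List


def augment_by_synonyms(
    tokens: List[str],
    synonyms: Dict[str, List[str]],
    max_sentences: int = 6,
) -> List[List[str]]:
    """Return up to max_sentences token lists with synonyms substituted.

    Hypothetical helper for illustration only, not part of PyThaiNLP.
    """
    # Each position may keep its token or take one of its synonyms.
    candidates = [[tok] + synonyms.get(tok, []) for tok in tokens]
    results: List[List[str]] = []
    for combo in product(*candidates):
        if list(combo) != tokens:  # skip the unchanged original sentence
            results.append(list(combo))
        if len(results) >= max_sentences:
            break
    return results


# Toy synonym table standing in for a real lexical database.
TOY_SYNONYMS = {"fast": ["quick", "rapid"], "car": ["automobile"]}
variants = augment_by_synonyms(["the", "fast", "car"], TOY_SYNONYMS)
for variant in variants:
    print(" ".join(variant))
```

Capping the number of variants keeps the output manageable, since the number of combinations grows multiplicatively with the number of tokens that have synonyms.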
26+
27+
Word2VecAug, Thai2fitAug, LTW2VAug Classes
28+
------------------------------------------
29+
30+
The `pythainlp.augment.word2vec` package contains multiple classes for text augmentation using Word2Vec models. These classes include `Word2VecAug`, `Thai2fitAug`, and `LTW2VAug`. Each of these classes allows you to use Word2Vec embeddings to generate text variations. Explore the methods provided by these classes to understand their capabilities.
1031

11-
.. autoclass:: WordNetAug
12-
:members:
13-
.. autofunction:: postype2wordnet
1432
.. autoclass:: pythainlp.augment.word2vec.Word2VecAug
15-
:members:
33+
:members:
34+
1635
.. autoclass:: pythainlp.augment.word2vec.Thai2fitAug
17-
:members:
36+
:members:
37+
1838
.. autoclass:: pythainlp.augment.word2vec.LTW2VAug
19-
:members:
39+
:members:
40+
41+
FastTextAug and Thai2transformersAug Classes
42+
--------------------------------------------
43+
44+
The `pythainlp.augment.lm` package offers classes for text augmentation using language models. These classes include `FastTextAug` and `Thai2transformersAug`. These classes allow you to use language model-based techniques to diversify text data. Explore their methods to understand their capabilities.
45+
2046
.. autoclass:: pythainlp.augment.lm.FastTextAug
21-
:members:
47+
:members:
48+
2249
.. autoclass:: pythainlp.augment.lm.Thai2transformersAug
23-
:members:
50+
:members:
51+
52+
BPEmbAug Class
53+
--------------
54+
55+
The `pythainlp.augment.word2vec.bpemb_wv` package contains the `BPEmbAug` class, which is designed for text augmentation using subword embeddings. This class is particularly useful when working with subword representations for Thai text augmentation.
56+
2457
.. autoclass:: pythainlp.augment.word2vec.bpemb_wv.BPEmbAug
25-
:members:
58+
:members:
59+
60+
Additional Functions
61+
--------------------
62+
63+
To further enhance your text augmentation tasks, the `pythainlp.augment` module offers the following functions:
64+
65+
- `postype2wordnet`: This function maps part-of-speech tags to WordNet-compatible POS tags, facilitating the integration of WordNet augmentation with Thai text.
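
To make the idea of a POS-tag mapping concrete, here is a minimal sketch that converts coarse tags to WordNet's single-letter POS codes (`n`, `v`, `a`, `r`). The tag names, the table `COARSE_TO_WORDNET`, and the helper `to_wordnet_pos` are hypothetical; the actual tag set and behavior of `postype2wordnet` may differ.

```python
from typing import Optional

# Hypothetical mapping from coarse POS tags to WordNet POS codes.
# WordNet uses "n", "v", "a", and "r" for noun, verb, adjective, and adverb.
COARSE_TO_WORDNET = {
    "NOUN": "n",
    "VERB": "v",
    "ADJ": "a",
    "ADV": "r",
}


def to_wordnet_pos(tag: str) -> Optional[str]:
    """Return the WordNet POS code for a coarse tag, or None if unmapped."""
    return COARSE_TO_WORDNET.get(tag.upper())


print(to_wordnet_pos("NOUN"))
print(to_wordnet_pos("PUNCT"))
```

Returning `None` for unmapped tags lets a caller fall back to a synonym lookup without a POS filter.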
66+
67+
These functions and classes provide diverse techniques for text augmentation in the Thai language, making this module a valuable asset for NLP researchers, developers, and practitioners.
68+
69+
For detailed usage examples and guidelines, please refer to the official PyThaiNLP documentation. The `pythainlp.augment` module opens up new possibilities for enriching and diversifying Thai text data, leading to improved NLP models and applications.

docs/api/benchmarks.rst

+28-8
@@ -2,23 +2,43 @@
22

33
pythainlp.benchmarks
44
====================================
5-
The :class:`pythainlp.benchmarks` contains utility functions for benchmarking
6-
tasked related to Thai NLP. At the moment, we have only for word tokenization.
7-
Other tasks will be added soon.
85

9-
Modules
10-
-------
6+
Introduction
7+
------------
8+
9+
The `pythainlp.benchmarks` module is a collection of utility functions designed for benchmarking tasks related to Thai Natural Language Processing (NLP). Currently, the module includes tools for word tokenization benchmarking. Please note that additional benchmarking tasks will be incorporated in the future.
1110

1211
Tokenization
13-
*********
12+
------------
13+
14+
Word tokenization is a fundamental task in NLP, and it plays a crucial role in various applications, such as text analysis and language processing. The `pythainlp.benchmarks` module offers a set of functions to assist in the benchmarking and evaluation of word tokenization methods.
15+
16+
Quality Evaluation
17+
^^^^^^^^^^^^^^^^^^
18+
19+
The quality of word tokenization can significantly impact the accuracy of downstream NLP tasks. To assess the quality of word tokenization, the module provides a qualitative evaluation using various metrics and techniques.
1420

15-
Quality
16-
^^^^
1721
.. figure:: ../images/evaluation.png
1822
:scale: 50 %
1923

2024
Qualitative evaluation of word tokenization.
2125

26+
Functions
27+
---------
28+
2229
.. autofunction:: pythainlp.benchmarks.word_tokenization.compute_stats
30+
31+
This function is used to compute various statistics and metrics related to word tokenization. It allows you to assess the performance of different tokenization methods.
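
One common way to score a tokenizer is to compare the token boundaries it predicts against a reference segmentation. The sketch below computes boundary-level precision, recall, and F1 under that assumption. It is an illustration only and not necessarily what `compute_stats` calculates; the helper names `boundaries` and `boundary_scores` are made up for this example.

```python
from typing import Dict, List, Set


def boundaries(tokens: List[str]) -> Set[int]:
    """Return the character offsets at which each token ends."""
    ends: Set[int] = set()
    position = 0
    for token in tokens:
        position += len(token)
        ends.add(position)
    return ends


def boundary_scores(reference: List[str], prediction: List[str]) -> Dict[str, float]:
    """Boundary-level precision, recall, and F1 for two segmentations of one text."""
    ref, pred = boundaries(reference), boundaries(prediction)
    true_positives = len(ref & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# The prediction merges the last two reference tokens into one.
scores = boundary_scores(["ฉัน", "กิน", "ข้าว"], ["ฉัน", "กินข้าว"])
print(scores)
```

Because both segmentations cover the same text, the final boundary always matches, so this simple variant slightly favors every tokenizer equally.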
32+
2333
.. autofunction:: pythainlp.benchmarks.word_tokenization.benchmark
34+
35+
The `benchmark` function facilitates the benchmarking of word tokenization methods. It provides an organized framework for evaluating and comparing the effectiveness of different tokenization tools.
36+
2437
.. autofunction:: pythainlp.benchmarks.word_tokenization.preprocessing
38+
39+
Preprocessing is a crucial step in NLP tasks. The `preprocessing` function assists in preparing text data for tokenization, which is essential for accurate and consistent benchmarking.
40+
41+
Usage
42+
-----
43+
44+
To make use of these benchmarking functions, you can follow the provided examples and guidelines in the official PyThaiNLP documentation. These tools are invaluable for researchers, developers, and anyone interested in improving and evaluating Thai word tokenization methods.

docs/api/coref.rst

+31-3
@@ -2,9 +2,37 @@
22

33
pythainlp.coref
44
===============
5-
The :class:`pythainlp.coref` is Coreference Resolution for Thai.
5+
Introduction
6+
------------
7+
8+
The `pythainlp.coref` module is dedicated to Coreference Resolution for the Thai language. Coreference resolution is a crucial task in natural language processing (NLP) that deals with identifying and linking expressions (such as pronouns) in a text to the entities or concepts they refer to. This module provides tools to tackle coreference resolution challenges in the context of the Thai language.
69

7-
Modules
8-
-------
10+
Coreference Resolution Function
11+
-------------------------------
12+
13+
The primary component of the `pythainlp.coref` module is the `coreference_resolution` function. This function is designed to analyze text and identify instances of coreference, helping NLP systems understand when different expressions in the text refer to the same entity. Here's how you can use it:
14+
15+
The :mod:`pythainlp.coref` module provides coreference resolution for Thai.
916

1017
.. autofunction:: coreference_resolution
18+
19+
Usage
20+
-----
21+
22+
To use the `coreference_resolution` function effectively, follow these steps:
23+
24+
1. Import the `coreference_resolution` function from the `pythainlp.coref` module.
25+
26+
2. Pass the Thai text you want to analyze for coreferences as input to the function.
27+
28+
3. The function will process the text and return information about coreference relationships within the text.
29+
30+
Example:
31+
32+
```python
33+
from pythainlp.coref import coreference_resolution
34+
35+
text = "นาย A มาจาก กรุงเทพ และเขา มีความรักต่อ บางกิจ ของเขา"
36+
coreferences = coreference_resolution(text)
37+
38+
print(coreferences)
