Improving the documentation by fixing the typos, adding necessary details and explanations of the code, and adding the missing details about the model and example. #850

Merged
merged 23 commits into from
Oct 19, 2023
72 changes: 58 additions & 14 deletions docs/api/augment.rst
@@ -1,25 +1,69 @@
.. currentmodule:: pythainlp.augment

pythainlp.augment
=================
pythainlp.augment Module
========================

The :class:`textaugment` class provides Thai text augmentation for text augmentation tasks.
Introduction
------------

Modules
-------
The `pythainlp.augment` module provides tools for text augmentation in the Thai language. Text augmentation enriches and diversifies textual data by generating alternative versions of the original text, which can improve the quality and variety of Thai language data for NLP tasks.

TextAugment Class
-----------------

The central component of the `pythainlp.augment` module is the `TextAugment` class. This class provides various text augmentation techniques and functions to enhance the diversity of your text data. It offers the following methods:

.. autoclass:: pythainlp.augment.TextAugment
   :members:

WordNetAug Class
----------------

The `WordNetAug` class performs text augmentation by replacing words with synonyms looked up in WordNet, a lexical database that was first built for English and has since been extended with Thai entries. This gives a dictionary-based approach to text diversification. The following methods are available within this class:

.. autoclass:: pythainlp.augment.WordNetAug
   :members:
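
As a rough illustration of dictionary-based augmentation, the sketch below swaps tokens for synonyms and enumerates the resulting sentence variants. It does not call PyThaiNLP: the `TOY_SYNONYMS` table, the `synonym_augment` helper, and its `max_sent` parameter are hypothetical names introduced here, and a real run would look synonyms up in WordNet instead of a hand-written dictionary.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical toy thesaurus; a real run would query WordNet instead.
TOY_SYNONYMS: Dict[str, List[str]] = {
    "โรงเรียน": ["ร.ร.", "สถานศึกษา"],
    "ชอบ": ["รัก"],
}


def synonym_augment(
    tokens: List[str],
    synonyms: Dict[str, List[str]],
    max_sent: int = 6,
) -> List[Tuple[str, ...]]:
    """Return up to max_sent variants of tokens with synonyms swapped in."""
    # Each position keeps its original token plus any listed synonyms.
    choices = [[tok] + synonyms.get(tok, []) for tok in tokens]
    variants: List[Tuple[str, ...]] = []
    for combo in product(*choices):
        if list(combo) == tokens:
            continue  # skip the unchanged sentence
        variants.append(combo)
        if len(variants) >= max_sent:
            break
    return variants


variants = synonym_augment(["เรา", "ชอบ", "ไป", "โรงเรียน"], TOY_SYNONYMS)
for variant in variants:
    print(" ".join(variant))
```

The input is assumed to be tokenized already; with real Thai text, a word tokenizer would run first.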

Word2VecAug, Thai2fitAug, LTW2VAug Classes
------------------------------------------

The `pythainlp.augment.word2vec` package contains multiple classes for text augmentation using Word2Vec models. These classes include `Word2VecAug`, `Thai2fitAug`, and `LTW2VAug`. Each of these classes allows you to use Word2Vec embeddings to generate text variations. Explore the methods provided by these classes to understand their capabilities.

.. autoclass:: WordNetAug
:members:
.. autofunction:: postype2wordnet
.. autoclass:: pythainlp.augment.word2vec.Word2VecAug
:members:
:members:

.. autoclass:: pythainlp.augment.word2vec.Thai2fitAug
:members:
:members:

.. autoclass:: pythainlp.augment.word2vec.LTW2VAug
:members:
:members:
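
The idea behind embedding-based augmentation can be sketched without any trained model: replace a token with its nearest neighbour in vector space, measured by cosine similarity. The two-dimensional `TOY_VECTORS` table and the helpers `most_similar` and `embedding_augment` below are assumptions made for illustration; they are not part of PyThaiNLP, and real Word2Vec embeddings have hundreds of dimensions.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 2-d vectors for illustration only.
TOY_VECTORS: Dict[str, Sequence[float]] = {
    "แมว": (0.9, 0.1),
    "สุนัข": (0.8, 0.2),
    "รถ": (0.1, 0.9),
    "รถยนต์": (0.15, 0.85),
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def most_similar(
    word: str, vectors: Dict[str, Sequence[float]], topn: int = 1
) -> List[str]:
    """Return the topn closest vocabulary words, or [] for unknown words."""
    if word not in vectors:
        return []
    scored = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:topn]]


def embedding_augment(
    tokens: List[str], vectors: Dict[str, Sequence[float]]
) -> List[List[str]]:
    """Create one variant per replaceable token, changing one token at a time."""
    variants = []
    for i, tok in enumerate(tokens):
        for neighbour in most_similar(tok, vectors):
            variants.append(tokens[:i] + [neighbour] + tokens[i + 1:])
    return variants


augmented = embedding_augment(["แมว", "นั่ง", "บน", "รถ"], TOY_VECTORS)
print(augmented)
```

Tokens missing from the vocabulary are left untouched, which is one simple way to handle out-of-vocabulary words.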

FastTextAug and Thai2transformersAug Classes
--------------------------------------------

The `pythainlp.augment.lm` package offers classes for text augmentation using language models. These classes include `FastTextAug` and `Thai2transformersAug`. These classes allow you to use language model-based techniques to diversify text data. Explore their methods to understand their capabilities.

.. autoclass:: pythainlp.augment.lm.FastTextAug
:members:
:members:

.. autoclass:: pythainlp.augment.lm.Thai2transformersAug
:members:
:members:

BPEmbAug Class
--------------

The `pythainlp.augment.word2vec.bpemb_wv` package contains the `BPEmbAug` class, which is designed for text augmentation using subword embeddings. This class is particularly useful when working with subword representations for Thai text augmentation.

.. autoclass:: pythainlp.augment.word2vec.bpemb_wv.BPEmbAug
:members:
:members:

Additional Functions
--------------------

To further enhance your text augmentation tasks, the `pythainlp.augment` module offers the following functions:

- `postype2wordnet`: This function maps part-of-speech tags to WordNet-compatible POS tags, facilitating the integration of WordNet augmentation with Thai text.
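
A tag mapping of this kind can be pictured as a plain lookup table. The sketch below is only an illustration: `TOY_ORCHID_TO_WORDNET` covers a small, hand-picked subset of ORCHID-style tags, and `toy_postype2wordnet` is a hypothetical stand-in, not the library function or its actual table.

```python
from typing import Dict

# Illustrative subset only: ORCHID-style tags mapped to WordNet POS letters
# ("n" noun, "v" verb, "a" adjective, "r" adverb). Not the library's table.
TOY_ORCHID_TO_WORDNET: Dict[str, str] = {
    "NCMN": "n",  # common noun
    "NPRP": "n",  # proper noun
    "VACT": "v",  # active verb
    "VSTA": "v",  # stative verb
    "VATT": "a",  # attributive verb, used like an adjective
    "ADVN": "r",  # adverb
}


def toy_postype2wordnet(pos: str) -> str:
    """Return a WordNet POS letter, or "" when the tag has no mapping."""
    return TOY_ORCHID_TO_WORDNET.get(pos, "")


tagged = [("แมว", "NCMN"), ("กิน", "VACT"), ("และ", "JCRG")]
for word, tag in tagged:
    print(word, tag, repr(toy_postype2wordnet(tag)))
```

Returning an empty string for unmapped tags (such as the conjunction tag above) lets a caller skip words that have no WordNet part of speech.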

These functions and classes provide diverse techniques for text augmentation in the Thai language, making this module a valuable asset for NLP researchers, developers, and practitioners.

For detailed usage examples and guidelines, please refer to the official PyThaiNLP documentation. The `pythainlp.augment` module opens up new possibilities for enriching and diversifying Thai text data, leading to improved NLP models and applications.
36 changes: 28 additions & 8 deletions docs/api/benchmarks.rst
@@ -2,23 +2,43 @@

pythainlp.benchmarks
====================================
The :class:`pythainlp.benchmarks` contains utility functions for benchmarking
tasks related to Thai NLP. At the moment, only word tokenization is covered.
Other tasks will be added soon.

Modules
-------
Introduction
------------

The `pythainlp.benchmarks` module is a collection of utility functions designed for benchmarking tasks related to Thai Natural Language Processing (NLP). Currently, the module includes tools for word tokenization benchmarking. Please note that additional benchmarking tasks will be incorporated in the future.

Tokenization
*********
------------

Word tokenization is a fundamental task in NLP, and it plays a crucial role in various applications, such as text analysis and language processing. The `pythainlp.benchmarks` module offers a set of functions to assist in the benchmarking and evaluation of word tokenization methods.

Quality Evaluation
^^^^^^^^^^^^^^^^^^

The quality of word tokenization can significantly impact the accuracy of downstream NLP tasks. To assess the quality of word tokenization, the module provides a qualitative evaluation using various metrics and techniques.

Quality
^^^^
.. figure:: ../images/evaluation.png
   :scale: 50 %

   Qualitative evaluation of word tokenization.

Functions
---------

.. autofunction:: pythainlp.benchmarks.word_tokenization.compute_stats

This function is used to compute various statistics and metrics related to word tokenization. It allows you to assess the performance of different tokenization methods.
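
One common way to score a tokenizer is to compare the positions of its word boundaries with those of a reference segmentation. The sketch below shows that idea in plain Python, assuming both segmentations are given as strings with `|` between tokens. The helpers `boundaries` and `boundary_scores` are hypothetical and do not reproduce the metrics or the return format of `compute_stats`.

```python
from typing import Set, Tuple


def boundaries(segmented: str, sep: str = "|") -> Set[int]:
    """Character offsets at which a token ends in a sep-delimited string."""
    ends: Set[int] = set()
    offset = 0
    for token in segmented.split(sep):
        if not token:
            continue  # ignore empty pieces from doubled separators
        offset += len(token)
        ends.add(offset)
    return ends


def boundary_scores(reference: str, candidate: str) -> Tuple[float, float, float]:
    """Precision, recall and F1 of candidate boundaries against the reference."""
    ref, cand = boundaries(reference), boundaries(candidate)
    true_positive = len(ref & cand)
    precision = true_positive / len(cand) if cand else 0.0
    recall = true_positive / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


reference = "ฉัน|กิน|ข้าว"   # gold segmentation
candidate = "ฉัน|กินข้าว"    # tokenizer output that misses one boundary
precision, recall, f1 = boundary_scores(reference, candidate)
print(precision, recall, f1)
```

Here the candidate proposes no wrong boundaries but misses one, so precision stays at 1.0 while recall drops. Note that the final boundary always matches when both strings contain the same characters, which slightly inflates the scores on short samples.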

.. autofunction:: pythainlp.benchmarks.word_tokenization.benchmark

The `benchmark` function facilitates the benchmarking of word tokenization methods. It provides an organized framework for evaluating and comparing the effectiveness of different tokenization tools.

.. autofunction:: pythainlp.benchmarks.word_tokenization.preprocessing

Preprocessing is a crucial step in NLP tasks. The `preprocessing` function assists in preparing text data for tokenization, which is essential for accurate and consistent benchmarking.

Usage
-----

To make use of these benchmarking functions, you can follow the provided examples and guidelines in the official PyThaiNLP documentation. These tools are invaluable for researchers, developers, and anyone interested in improving and evaluating Thai word tokenization methods.
34 changes: 31 additions & 3 deletions docs/api/coref.rst
@@ -2,9 +2,37 @@

pythainlp.coref
===============
The :class:`pythainlp.coref` is Coreference Resolution for Thai.
Introduction
------------

The `pythainlp.coref` module is dedicated to Coreference Resolution for the Thai language. Coreference resolution is a crucial task in natural language processing (NLP) that deals with identifying and linking expressions (such as pronouns) in a text to the entities or concepts they refer to. This module provides tools to tackle coreference resolution challenges in the context of the Thai language.

Modules
-------
Coreference Resolution Function
-------------------------------

The primary component of the `pythainlp.coref` module is the `coreference_resolution` function. This function is designed to analyze text and identify instances of coreference, helping NLP systems understand when different expressions in the text refer to the same entity. Here's how you can use it:

The :class:`pythainlp.coref` is Coreference Resolution for Thai.

.. autofunction:: coreference_resolution

Usage
-----

To use the `coreference_resolution` function effectively, follow these steps:

1. Import the `coreference_resolution` function from the `pythainlp.coref` module.

2. Pass the Thai text you want to analyze for coreferences as input to the function.

3. The function will process the text and return information about coreference relationships within the text.

Example:

```python
from pythainlp.coref import coreference_resolution

# Thai input text in which a pronoun refers back to an earlier entity
text = "นาย A มาจาก กรุงเทพ และเขา มีความรักต่อ บางกิจ ของเขา"
coreferences = coreference_resolution(text)

print(coreferences)
```