Commit 373991c

PYTHON-PACKAGE-006 Updated readme file and added copy_library_analyzers function
Signed-off-by: David de Hilster <[email protected]>
1 parent d2c2184 commit 373991c

2 files changed, +144 -15 lines changed

NLPPlus/__init__.py

+25 -4
@@ -10,7 +10,7 @@

 import json
 import logging
-from shutil import copytree
+from shutil import copytree, rmtree
 from tempfile import TemporaryDirectory
 from os import PathLike, getcwd
 from pathlib import Path
@@ -135,7 +135,23 @@ def input_text(self, analyzer_name: str, file_name: str) -> str:
     def set_analyzers_folder(self, analyzer_name: str):
         """Set analyzers directory path."""
         self.analyzer_path = analyzer_name
-
+
+    def copy_library_analyzers(self, to_dir: str, overwrite: bool=True):
+        """Copy the library files to a directory."""
+        copy_it = True
+
+        if os.path.exists(to_dir):
+            if overwrite:
+                rmtree(to_dir)
+            else:
+                copy_it = False
+
+        if copy_it:
+            copytree(
+                Path(__file__).parent / "analyzers", Path(to_dir)
+            )
+        self.analyzer_path = str(to_dir)
+

 engine = Engine()

@@ -156,14 +172,19 @@ def set_working_folder(working_folder: Optional[str] = None, initialize: bool =
     engine = Engine(Path(working_folder), initialize=initialize)


+def copy_library_analyzers(analyzer_folder_path: str, overwrite=True):
+    """Copy the library analyzers to the given folder."""
+    engine.copy_library_analyzers(analyzer_folder_path, overwrite)
+
+
 def set_analyzers_folder(analyzer_folder_path: str):
     """Run the analyzer named on the input string."""
     engine.set_analyzers_folder(analyzer_folder_path)


-def analyze(str: str, parser: str = "parse-en-us"):
+def analyze(text: str, parser: str = "parse-en-us"):
     """Run the analyzer named on the input string."""
-    return engine.analyze(str, parser).output_text
+    return engine.analyze(text, parser).output_text


 def input_text(analyzer_name: str, file_name: str):
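
The overwrite handling added to `Engine.copy_library_analyzers` in this file can be sketched as a standalone function. This is a minimal sketch under assumptions, not the package's code: the name `copy_analyzers` and the `src_dir` parameter are hypothetical (the method in the diff always copies the package's own `analyzers` folder), and it uses `Path.exists()` rather than `os.path.exists`, since the imports visible in this diff do not show an `import os`.

```python
from pathlib import Path
from shutil import copytree, rmtree


def copy_analyzers(src_dir: str, to_dir: str, overwrite: bool = True) -> bool:
    """Copy src_dir to to_dir; return True if a copy was made.

    Hypothetical stand-in for the method in the diff: src_dir takes the
    place of the package's bundled "analyzers" folder.
    """
    target = Path(to_dir)
    if target.exists():
        if not overwrite:
            # Keep the caller's existing (possibly edited) analyzers.
            return False
        # copytree refuses to copy into an existing directory by default,
        # so the old copy has to be removed first.
        rmtree(target)
    copytree(Path(src_dir), target)
    return True
```

With `overwrite=False` an existing target folder is left untouched, which is the safer choice once the copied analyzers have been edited. The default in the diff is `overwrite=True`, which deletes the target folder before copying.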

README.md

+119 -11
@@ -1,10 +1,33 @@
 # NLPPlus

-NLPPlus is the first 100% customizable NLP package for Python. NLPPlus
-uses the [open-source NLP Engine](https://github.com/VisualText/nlp-engine).
-Unlike other NLP packages which are black boxes, NLPPlus analyzers are
-100% NLP++ code that can be modified. NLPPlus comes with five starter
-analyzers: telephone numbers, links, emails, and a full English parser.
+## <span style='color:red'>READ FIRST</span>
+
+Current NLP Python packages are intended to be plug-and-play
+systems that perform natural language tasks without modification. The
+problem is that when these systems fail in critical situations,
+coders have no real way to fix them, and the systems are quickly
+abandoned.
+
+The root cause is that almost all of these packages rely on statistical
+methods such as machine learning or neural networks or, in the simpler
+cases, on Regex. Statistical systems cannot logically be corrected, and
+Regex is extremely limited, unreadable, and hard to maintain or extend.
+Plus, these systems offer little if any means to modify them, even though
+every NLP task is slightly different in important ways.
+
+The NLPPlus Python Package is different from all other NLP Python packages.
+All its analyzers are 100% human-readable and modifiable code that allows
+any non-NLP coder to become a computational linguist using the NLP++ VSCode
+Language Extension, appropriately called "VisualText". The VisualText
+extension allows for the visualization of any NLP process. Coders can "see"
+the syntactic parse tree at each step of the process, see rule matches
+directly in the text, and print out the knowledge base at any point in the
+process. Plus, dictionaries and knowledge bases are human-readable, unlike
+JSON files or databases.
+
+NLPPlus comes with five starter analyzers: telephone numbers, links, emails,
+addresses, and a full English parser. And because NLP++ is a glass box, all
+analyzers can easily be modified by any coder.

 If, for example, the telephone number analyzer is not working properly for your
 application, you can use the [NLP++ VSCode extension](http://vscode.visualtect.org)
@@ -23,7 +46,7 @@ around the world are starting to use NLP++ to write human digital readers for

 * Python 3.10 or newer

-## Installation
+## <span style='color:orange'>Installation</span>

 ### Future Installation (waiting for approval)

@@ -63,9 +86,6 @@ shown in the filename, for instance, for Python 3.10 on Windows you
 will see a file with a name like
 `nlpplus-0.1.dev1+g55d691d-cp310-cp310-win_amd64.whl` - the `cp310`
 means Python 3.10. For Python 3.12 it would be `cp312`, and so forth.
-You can install this file with `pip`:
-
-    pip install nlpplus-0.1.2-cp310-cp310-win_amd64.whl

 For specific instructions on setting up Python on your platform please
 consult the Python documentation.
@@ -74,7 +94,49 @@ If your platform is not supported you can also compile it from source,
 which will require a working C++ compiler. See the platform specific
 instructions below for the requirements to build.

-## Using the Library
+## <span style='color:green'>Why Use NLP++?</span>
+
+There are many reasons to consider using NLP++. Whether the goal
+is to write Regex-like rule patterns, to modify 100% of the NLP
+code, or to visualize the NLP analyzer in an intuitive way, NLP++
+should be in every coder's and programmer's
+toolkit.
+
+To put it simply, NLP++ turns any coder or programmer into an NLP
+engineer.
+
+### 1000 Times Better than Regex
+
+For matching patterns in text, NLP++ is a Regex killer. Its rule
+matching system is human-readable and works by calling rules in
+sequence, which makes creating and debugging rule-based patterns
+a breeze.
+
+### 100% Modifiable
+
+The main reason to use NLP++
+is to engineer an NLP system for a specific task. Almost all extraction
+or understanding tasks in NLP require specific processing that is never
+included in "generic" systems. NLP++ allows for the creation or
+modification of any NLP++ system.
+
+It must be emphasized that what separates NLPPlus from all the other
+NLP packages in Python is the fact that all parsers are 100% modifiable
+using the VSCode NLP++ Language Extension. Other NLP packages use regex
+patterns, which are impossible to modify, or trained machine learning
+or neural network systems, which cannot be fixed when they fail.
+
+### VisualText Editor
+
+Writing an NLP system from scratch is often thought to be only for those
+in computational linguistics. But VisualText, NLP++, and the conceptual
+Grammar change all that.
+
+Taking full advantage of the familiar VSCode environment, the NLP++
+language extension makes NLP a visual and logical process that
+is easy to understand.
+
+## <span style='color:yellow;'>Using the NLPPlus Python Package</span>

 Very basic usage, which runs the default parser for US English and
 returns parsing results as XML:
@@ -99,7 +161,53 @@ or JSON output from them:
     parsed_address = results.output["email_address"][0]
     parse_tree = results.final_tree

-## NLP++ Development
+### NLPPlus Engine Functions
+
+#### set_analyzers_folder(analyzer_folder_path: str)
+This sets the folder where your analyzers are located.
+
+#### analyze(text: str, parser: str = "parse-en-us"): str
+This calls one of the analyzers in the analyzer folder on the text.
+If the analyzer folder was not set, it will use the library analyzers
+that come with NLPPlus. It is recommended that you use the function
+copy_library_analyzers to copy the analyzers to avoid having them
+overwritten when a new version of NLPPlus is installed.
+
+The analyze function returns a results object that makes the analyzer
+output files easily accessible to Python (see results below).
+
+#### copy_library_analyzers(analyzer_folder_path: str, overwrite: bool=True)
+This function copies the NLPPlus library analyzers into a folder
+of your choice, where newer versions of the NLPPlus package cannot
+overwrite them, so that you can edit and modify the analyzers to
+your liking. If the folder already exists, it is deleted and copied
+again unless overwrite is False. Call set_analyzers_folder if you
+want to run your versions of these analyzers through NLPPlus.
+
+#### input_text(analyzer_name: str, file_name: str)
+When developing or editing NLP++ analyzers and calling them from
+Python, it is convenient to test your Python code on text you
+have used to develop your analyzer. This function retrieves the
+text from a file in the analyzer's input directory for easy
+access while developing your Python code in conjunction with
+an NLP++ analyzer.
+
+### NLPPlus Engine Results
+
+#### output
+This returns a JSON object based on the parsed output.json file
+produced by the analyzer. The analyzer has to purposely construct
+the output.json file for this to work.
+
+#### output.json
+The output file produced by the analyzer, returned as a string, not
+a JSON object. This file must be explicitly produced by the analyzer.
+
+#### final.tree
+All analyzers output a final tree of the text that is being processed.
+This file is in the NLP++ tree format.
+
+## <span style='color:orange'>NLP++ Development</span>

 By default the `NLPPlus` module will create a temporary working
 directory with the default parser and the small set of analyzers
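
The functions documented in the README hunk above suggest a copy-then-analyze workflow. The sketch below illustrates that call order under stated assumptions: the helper `analyze_with_local_copy` and its `nlp` parameter are hypothetical, and the module is passed in as an argument so the example relies only on the three module-level signatures visible in this commit (`copy_library_analyzers`, `set_analyzers_folder`, `analyze`).

```python
from pathlib import Path
from typing import Any


def analyze_with_local_copy(nlp: Any, analyzers_dir: str, text: str,
                            parser: str = "parse-en-us") -> str:
    """Analyze text using a private, editable copy of the library analyzers.

    `nlp` is assumed to expose the module-level functions shown in the diff.
    """
    if not Path(analyzers_dir).exists():
        # First run: copy the bundled analyzers out of the package.
        nlp.copy_library_analyzers(analyzers_dir, overwrite=False)
    # Point the engine at the private copy on every run.
    nlp.set_analyzers_folder(analyzers_dir)
    return nlp.analyze(text, parser)
```

With the package installed, this would presumably be called as `analyze_with_local_copy(NLPPlus, "my_analyzers", "some text")` after `import NLPPlus`; the folder name is only an example. Checking for the folder before copying keeps local edits from being deleted by the default `overwrite=True`.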
