# NLPPlus

- NLPPlus is the first 100% customizable NLP package for Python. NLPPlus
- uses the [open-source NLP Engine](https://github.com/VisualText/nlp-engine).
- Unlike other NLP packages which are black boxes, NLPPlus analyzers are
- 100% NLP++ code that can be modified. NLPPlus comes with five starter
- analyzers: telephone numbers, links, emails, and a full English parser.
+ ## <span style='color:red'>READ FIRST</span>
+
+ Current NLP Python packages are intended to be plug-and-play systems
+ that perform natural language tasks without modification. The problem
+ is that when these systems fail in critical situations, coders are left
+ with no real way to fix them, and the systems are quickly abandoned.
+
+ The underlying cause is that almost all of these packages rely on
+ statistical methods such as machine learning or neural networks or, in
+ the simpler cases, on regex. Statistical systems cannot be corrected
+ through logic, and regex is extremely limited, hard to read, and all but
+ impossible to maintain or extend. In addition, these systems offer little
+ if any means of modification, even though every NLP task differs in
+ important ways.
+
+ The NLPPlus Python package is different from all other NLP Python packages.
+ All of its analyzers are 100% human-readable, modifiable code, which allows
+ any coder without an NLP background to become a computational linguist
+ using the NLP++ VSCode language extension, appropriately called
+ "VisualText". The VisualText extension allows any NLP process to be
+ visualized. Coders can see the syntactic parse tree at each step of the
+ process, see rule matches directly in the text, and print out the
+ knowledge base at any point in the process. In addition, dictionaries and
+ knowledge bases are human readable, unlike JSON files or databases.
+
+ NLPPlus comes with five starter analyzers: telephone numbers, links, emails,
+ addresses, and a full English parser. And because NLP++ is a glass box, all
+ analyzers can easily be modified by any coder.

If, for example, the telephone number analyzer is not working properly for your
application, you can use the [NLP++ VSCode extension](http://vscode.visualtect.org)
@@ -23,7 +46,7 @@ around the world are starting to use NLP++ to write human digital readers for

* Python 3.10 or newer

- ## Installation
+ ## <span style='color:orange'>Installation</span>

### Future Installation (waiting for approval)

@@ -63,9 +86,6 @@ shown in the filename, for instance, for Python 3.10 on Windows you
will see a file with a name like
`nlpplus-0.1.dev1+g55d691d-cp310-cp310-win_amd64.whl` - the `cp310`
means Python 3.10. For Python 3.12 it would be `cp312`, and so forth.
- You can install this file with `pip`:
-
-     pip install nlpplus-0.1.2-cp310-cp310-win_amd64.whl

For specific instructions on setting up Python on your platform please
consult the Python documentation.
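
To check which wheel matches the interpreter you are running, the `cp310`-style tag described above can be derived from `sys.version_info`. This is a minimal sketch using only the standard library. The helper names `cpython_tag` and `wheel_python_tag` are illustrative and not part of NLPPlus, and the filename parsing assumes the simple wheel name form shown above, with no optional build tag.

```python
import sys


def cpython_tag(major: int, minor: int) -> str:
    """Build the CPython wheel tag, e.g. (3, 10) -> 'cp310'."""
    return f"cp{major}{minor}"


def wheel_python_tag(filename: str) -> str:
    """Extract the Python tag from a wheel filename.

    Assumes the form name-version-pythontag-abitag-platform.whl
    (no optional build tag), as in the example filename above.
    """
    stem = filename.removesuffix(".whl")
    # Split from the right: the last three fields are python tag, ABI tag, platform.
    _, python_tag, _abi_tag, _platform = stem.rsplit("-", 3)
    return python_tag


wheel = "nlpplus-0.1.dev1+g55d691d-cp310-cp310-win_amd64.whl"
current = cpython_tag(sys.version_info.major, sys.version_info.minor)
print(f"wheel targets {wheel_python_tag(wheel)}, this interpreter is {current}")
```

If the two tags differ, that wheel was built for a different Python version than the one currently running.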
@@ -74,7 +94,49 @@ If your platform is not supported you can also compile it from source,
which will require a working C++ compiler. See the platform specific
instructions below for the requirements to build.

- ## Using the Library
+ ## <span style='color:green'>Why Use NLP++?</span>
+
+ There are many reasons to consider using NLP++. Whether you want to
+ write regex-like rule patterns, modify 100% of the NLP code, or
+ visualize the NLP analyzer in an intuitive way, NLP++ belongs in
+ every programmer's toolkit.
+
+ Put simply, NLP++ turns any programmer into an NLP engineer.
+
+ ### 1000 Times Better than Regex
+
+ For matching patterns in text, NLP++ is a regex killer. The rule
+ matching system in NLP++ is human readable and works by calling
+ rules in sequence, which makes creating and debugging rule-based
+ patterns a breeze. Along with
+
+ ### 100% Modifiable
+
+ The main reason to use NLP++ is to engineer an NLP system for a
+ specific task. Almost all extraction or understanding tasks in NLP
+ require specific processing that is never included in "generic"
+ systems. NLP++ allows any NLP++ system to be created or modified.
+
+ It must be emphasized that what separates NLPPlus from all other
+ NLP packages in Python is the fact that all parsers are 100% modifiable
+ using the VSCode NLP++ language extension. Other NLP packages use regex
+ patterns, which are impossible to modify, or trained machine learning
+ or neural network systems, which cannot be fixed when
+
+ ### VisualText Editor
+
+ Writing an NLP system from scratch is often thought to be only for those
+ in computational linguistics. But VisualText, NLP++, and the conceptual
+ Grammar change all that.
+
+ Taking full advantage of the familiar VSCode environment, the NLP++
+ language extension makes NLP a visual and logical process that
+ is easy to understand.
+
+ ## <span style='color:yellow;'>Using the NLPPlus Python Package</span>

Very basic usage, which runs the default parser for US English and
returns parsing results as XML:
@@ -99,7 +161,53 @@ or JSON output from them:
    parsed_address = results.output["email_address"][0]
    parse_tree = results.final_tree

- ## NLP++ Development
+ ### NLPPlus Engine Functions
+
+ #### set_analyzer_folder(analyzer_folder_path: str)
+ Sets the folder where your analyzers are located.
+
+ #### analyze(text: str, parser: str = "parse-en-us"): str
+ Calls one of the analyzers in the analyzer folder on the text.
+ If the analyzer folder has not been set, it uses the library analyzers
+ that come with NLPPlus. It is recommended that you use the function
+ copy_library_analyzers to copy the analyzers so that they are not
+ overwritten when a new version of NLPPlus is installed.
+
+ The analyze function returns a results object that makes the analyzer
+ output files easily accessible to Python (see the results section below).
+
+ #### copy_library_analyzers(self, to_dir: str, overwrite: bool=True)
+ Copies the NLPPlus library analyzers into a safe folder where they
+ cannot be overwritten by newer versions of the NLPPlus package. This
+ allows coders to edit and modify the analyzers to their liking.
+ Remember to call set_analyzer_folder if you want to run your versions
+ of these library analyzers through the NLPPlus package.
+
+ #### input_text(analyzer_name: str, file_name: str)
+ When developing or editing NLP++ analyzers and calling them from
+ Python, it is convenient to test your Python code on the text you
+ used to develop your analyzer. This function retrieves the text from
+ a file in the analyzer's input directory for easy access while
+ developing your Python code in conjunction with an NLP++ analyzer.
+
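
The recommended order of calls described above (copy the library analyzers, point the engine at the copy, then analyze) can be sketched as a small helper. This is only an illustration under stated assumptions: how the engine object is obtained from the package is not shown in this section, so the helper takes it as a parameter, and `analyze_with_editable_copy` and `StubEngine` are hypothetical names used here so the sketch runs without NLPPlus installed.

```python
from pathlib import Path


def analyze_with_editable_copy(engine, text: str, to_dir: str,
                               parser: str = "parse-en-us"):
    """Copy the library analyzers once, point the engine at the copy, then analyze.

    `engine` is assumed to expose copy_library_analyzers, set_analyzer_folder,
    and analyze with the signatures listed above.
    """
    target = Path(to_dir)
    # Copy only when the folder does not exist yet, so edits already made
    # to the copied analyzers are left alone.
    if not target.exists():
        engine.copy_library_analyzers(str(target), overwrite=False)
    engine.set_analyzer_folder(str(target))
    return engine.analyze(text, parser)


class StubEngine:
    """Hypothetical stand-in that only records calls; not the real engine."""

    def __init__(self):
        self.calls = []

    def copy_library_analyzers(self, to_dir, overwrite=True):
        self.calls.append(("copy", to_dir, overwrite))

    def set_analyzer_folder(self, analyzer_folder_path):
        self.calls.append(("set_folder", analyzer_folder_path))

    def analyze(self, text, parser="parse-en-us"):
        self.calls.append(("analyze", text, parser))
        return {"parser": parser}


stub = StubEngine()
analyze_with_editable_copy(stub, "Some text to parse.", "my_analyzers")
print([name for name, *_ in stub.calls])
```

With the real package, the stub would be replaced by whatever object exposes these functions.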
+ ### NLPPlus Engine Results
+
+ #### output
+ Returns a JSON object parsed from the output.json file produced by
+ the analyzer. The analyzer has to purposely construct the output.json
+ file for this to work.
+
+ #### output.json
+ The content of the output file produced by the analyzer, as a string
+ rather than a JSON object. This file must be explicitly produced by
+ the analyzer.
+
+ #### final.tree
+ All analyzers output a final tree of the text being processed.
+ This file is in the NLP++ tree format.
+
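
The difference between the parsed `output` and the raw `output.json` string can be illustrated with the standard `json` module alone. This is a minimal sketch that does not call NLPPlus. The sample string is made up, and its shape (a key holding a list of matches) is an assumption based on the `results.output["email_address"][0]` usage shown earlier.

```python
import json

# Made-up stand-in for the text an analyzer might write to output.json.
raw_output_json = '{"email_address": ["someone@example.com"]}'

# The raw file content is a plain string, so it cannot be indexed by key.
print(type(raw_output_json).__name__)  # str

# Parsing it yields a dictionary that can be indexed by key.
parsed_output = json.loads(raw_output_json)
first_address = parsed_output["email_address"][0]
print(first_address)  # someone@example.com
```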
+ ## <span style='color:orange'>NLP++ Development</span>

By default the `NLPPlus` module will create a temporary working
directory with the default parser and the small set of analyzers