WIP: Adding SRL #215
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
Version 3.0.73 | ||
Moved to the super-project and changed the versioning to the super-project versioning | ||
|
||
Version 5.1.12 | ||
Added Windows support (including access to non-Gurobi solver) | ||
|
||
Version 5.1.4 | ||
Switched entirely to illinois-sl for structured prediction (removed JLIS traces) | ||
Using the latest AnnotatorService from illinois-core-utilities for both Curator & pipeline annotation | ||
Major cleaning up | ||
|
||
Version 5.1 | ||
Added JUnit tests | ||
Removed unnecessary dependencies | ||
Switched to illinois-nlp-pipeline-0.1.2 | ||
Minor fixes | ||
|
||
Version 5.0 | ||
Standalone SRL using illinois-nlp-pipeline | ||
|
||
Version 4.1.1 | ||
Switched to edison-0.7.1 and LBJava-1.0 | ||
Added dependency to illinois-common-resources | ||
|
||
Version 4.1 | ||
Various bugfixes | ||
|
||
Version 4.0.2 | ||
Updated inference dependency to latest version and modified inference | ||
code accordingly. | ||
|
||
Version 4.0.1 | ||
Removed duplicate code from JLIS-core and moved to IllinoisSL. Minor edits. | ||
|
||
Version 4.0 | ||
A complete rewrite of the SRL. Includes predicate and sense detectors, | ||
new constraints and a memory footprint of only 3GB. | ||
|
||
Version 3.0.3 | ||
Minor bugfixes. Uses edison v0.2.9 | ||
|
||
Version 3.0.2 | ||
Added an option to trim leading prepositions from arguments. | ||
|
||
Revamped the training mechanism to train using LBJ's BatchTrainer in | ||
the code. This allows manual lexicon handling, which reduces the | ||
memory requirements by nearly 40 percent. | ||
|
||
Version 3.0.1 | ||
Minor bugfix | ||
|
||
Version 3.0 | ||
A complete Java-based re-implementation of the Illinois SRL from | ||
Punyakanok 2008. This version uses LBJ to train classifiers and | ||
for performing inference with a home-brewed beam search. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# illinois-srl: Semantic Role Labeler | ||
|
||
### Running | ||
You can use the **illinois-srl** system in either *interactive* or *annotator* mode. | ||
#### Interactive mode | ||
In *interactive mode* the user can input a single piece of text and get back the output of both | ||
the **Nom**inal and **Verb**al SRL systems in plain text. | ||
|
||
To run the system in *interactive mode* see the class `edu.illinois.cs.cogcomp.srl.SemanticRoleLabeler` | ||
or simply execute the `run-interactive` script: | ||
|
||
For Linux: | ||
``` | ||
scripts/run-interactive.sh | ||
``` | ||
|
||
For Windows: | ||
``` | ||
cd scripts | ||
run-interactive-win.bat | ||
``` | ||
|
||
#### As an `Annotator` component | ||
**illinois-srl** can also be used programmatically through the `SemanticRoleLabeler` class which implements CogComp's | ||
[Annotator interface](http://cogcomp.cs.illinois.edu/software/doc/illinois-core-utilities/apidocs/edu/illinois/cs/cogcomp/core/datastructures/textannotation/Annotator.html). | ||
|
||
The main method is `getView(TextAnnotation)` inside `SemanticRoleLabeler`. This will add a new | ||
[`PredicateArgumentView`](http://cogcomp.cs.illinois.edu/software/doc/illinois-core-utilities/apidocs/edu/illinois/cs/cogcomp/core/datastructures/textannotation/PredicateArgumentView.html) | ||
for either **Nom**inal or **Verb**al SRL. | ||
|
||
### Training | ||
To train the SRL system you will require access to the [Propbank](https://verbs.colorado.edu/~mpalmer/projects/ace.html) | ||
or [Nombank](http://nlp.cs.nyu.edu/meyers/NomBank.html) corpora. You need to set pointers to these in the | ||
`config/srl-config.properties` file. | ||
(To train the system with a non-Prop/Nombank corpus, you need to extend | ||
[`AbstractSRLAnnotationReader`](http://cogcomp.cs.illinois.edu/software/doc/illinois-core-utilities/apidocs/edu/illinois/cs/cogcomp/nlp/corpusreaders/AbstractSRLAnnotationReader.html)) | ||
|
||
To perform the whole training/testing suite, run the `Main` class with parameters `<config-file> expt Verb|Nom true`. | ||
This will: | ||
|
||
1. Read and cache the datasets (train/test) | ||
2. Annotate each `TextAnnotation` with the required views | ||
(here you can set the `UseCurator` flag to false to use CogComp's standalone NLP pipeline) | ||
3. Pre-extract and cache the features for the classifiers | ||
4. Train the classifiers | ||
5. Evaluate on the (cached) test corpus | ||
|
||
**IMPORTANT** After training, make sure you comment out the pre-trained SRL model dependencies inside | ||
`pom.xml` (lines 27-38). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Available learning models: {L2LossSSVM, StructuredPerceptron} | ||
LEARNING_MODEL = L2LossSSVM | ||
|
||
# Available solver types: {DCDSolver, ParallelDCDSolver, DEMIParallelDCDSolver} | ||
L2_LOSS_SSVM_SOLVER_TYPE = ParallelDCDSolver | ||
|
||
NUMBER_OF_THREADS = 8 | ||
|
||
# Regularization parameter | ||
C_FOR_STRUCTURE = 1.0 | ||
|
||
# Mini-batch for 'warm' start | ||
TRAINMINI = true | ||
TRAINMINI_SIZE = 10000 | ||
|
||
# Suppress optimality check | ||
CHECK_INFERENCE_OPT = false | ||
|
||
# Number of training rounds | ||
MAX_NUM_ITER = 100 |
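A minimal sketch of how the learner settings above could be read and type-checked with the JDK's `java.util.Properties`. The class `LearnerSettingsSketch` and its field names are hypothetical and only for illustration; this is not how illinois-sl itself loads its configuration. Only the keys and values are taken from the file in this diff.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not part of illinois-srl or illinois-sl.
class LearnerSettingsSketch {
    final String learningModel;
    final String solverType;
    final int numThreads;
    final double cForStructure;
    final int maxNumIter;

    private LearnerSettingsSketch(Properties p) {
        learningModel = required(p, "LEARNING_MODEL");
        solverType = required(p, "L2_LOSS_SSVM_SOLVER_TYPE");
        numThreads = Integer.parseInt(required(p, "NUMBER_OF_THREADS"));
        cForStructure = Double.parseDouble(required(p, "C_FOR_STRUCTURE"));
        maxNumIter = Integer.parseInt(required(p, "MAX_NUM_ITER"));
    }

    // Fail fast on a missing key instead of silently inventing a default.
    private static String required(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing key: " + key);
        }
        return value.trim();
    }

    // Parses the text of a .properties file; '#' lines are treated as comments.
    static LearnerSettingsSketch parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new LearnerSettingsSketch(p);
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "# Available learning models: {L2LossSSVM, StructuredPerceptron}",
                "LEARNING_MODEL = L2LossSSVM",
                "L2_LOSS_SSVM_SOLVER_TYPE = ParallelDCDSolver",
                "NUMBER_OF_THREADS = 8",
                "C_FOR_STRUCTURE = 1.0",
                "MAX_NUM_ITER = 100");
        LearnerSettingsSketch s = parse(text);
        System.out.println(s.learningModel + " / " + s.solverType
                + ", threads=" + s.numThreads + ", C=" + s.cForStructure
                + ", rounds=" + s.maxNumIter);
    }
}
```

Parsing the numeric values eagerly means a malformed value such as `NUMBER_OF_THREADS = eight` fails at load time rather than partway through training.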
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
## Flags for whether to use different annotators | ||
usePos = true | ||
useLemma = true | ||
useShallowParse = true | ||
useNerConll = true | ||
useNerOntonotes = false | ||
useStanfordParse = true | ||
useStanfordDep = true | ||
useSrlVerb = false | ||
useSrlNom = false | ||
|
||
## Flags for the Stanford parser (for pre-processing) | ||
# Max time per sentence (in milliseconds) | ||
stanfordMaxTimePerSentence = 1000 | ||
|
||
# Max sentence length (will throw exception if larger) | ||
stanfordParseMaxSentenceLength = 80 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
## Illinois SRL Configuration## | ||
|
||
# Whether to use the Illinois Curator to get the required annotations for training/testing | ||
# If set to false, Illinois NLP pipeline will be used | ||
UseCurator = false | ||
|
||
# The configuration of the Illinois NLP pipeline | ||
PipelineConfig = config/pipeline.properties | ||
|
||
# The parser used to extract constituents and syntactic features | ||
# Options are: Charniak, Berkeley, Stanford | ||
# NB: Only Stanford can be used in standalone mode. | ||
DefaultParser = Stanford | ||
|
||
# The configuration for the Structured learner | ||
LearnerConfig = config/learner.properties | ||
|
||
# Num of threads for feat. ext. | ||
NumFeatExtThreads = 10 | ||
|
||
# The ILP solver to use for the joint inference | ||
# Options are: Gurobi, OJAlgo | ||
ILPSolver = OJAlgo | ||
|
||
# The TextAnnotation caching mechanism to use | ||
# Options are: MapDB, H2 | ||
DatasetCache = MapDB | ||
|
||
### Training corpora directories ### | ||
# This is the directory of the merged (mrg) WSJ files | ||
PennTreebankHome = /shared/corpora/corporaWeb/treebanks/eng/pennTreebank/treebank-3/parsed/mrg/wsj/ | ||
PropbankHome = /shared/corpora/corporaWeb/treebanks/eng/propbank_1/data | ||
NombankHome = /shared/corpora/corporaWeb/treebanks/eng/nombank/ | ||
|
||
# The directory of the sentence and pre-extracted features database (~5G of space required) | ||
# Not used during test/working with pre-trained models | ||
CacheDirectory = cache | ||
|
||
ModelsDirectory = models | ||
|
||
# Directory to output gold and predicted files for manual comparison | ||
# Comment out for no output | ||
OutputDirectory = srl-out |
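The option lists in the comments above can be turned into an early sanity check. The sketch below is an assumption-laden illustration, not the project's own loader: `SrlConfigCheckSketch` and its methods are hypothetical, and it reads "standalone mode" as `UseCurator = false`, which is how the README in this PR describes it. The allowed values are copied from the comments in this file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Properties;

// Hypothetical validator for illustration; not part of illinois-srl.
class SrlConfigCheckSketch {

    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Returns the trimmed value of key, or throws if it is missing or not allowed.
    static String oneOf(Properties p, String key, String... allowed) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing key: " + key);
        }
        value = value.trim();
        if (!Arrays.asList(allowed).contains(value)) {
            throw new IllegalArgumentException(
                    key + " must be one of " + Arrays.toString(allowed) + ", got " + value);
        }
        return value;
    }

    static void validate(Properties p) {
        boolean useCurator = Boolean.parseBoolean(oneOf(p, "UseCurator", "true", "false"));
        String parser = oneOf(p, "DefaultParser", "Charniak", "Berkeley", "Stanford");
        oneOf(p, "ILPSolver", "Gurobi", "OJAlgo");
        oneOf(p, "DatasetCache", "MapDB", "H2");
        // Assumption: "standalone mode" means the Curator is not used.
        if (!useCurator && !parser.equals("Stanford")) {
            throw new IllegalArgumentException(
                    "Only the Stanford parser can be used when UseCurator = false");
        }
    }

    public static void main(String[] args) {
        Properties p = load(String.join("\n",
                "UseCurator = false",
                "DefaultParser = Stanford",
                "ILPSolver = OJAlgo",
                "DatasetCache = MapDB"));
        validate(p);
        System.out.println("configuration looks consistent");
    }
}
```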
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,133 @@ | ||
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> | ||
|
||
<parent> | ||
<artifactId>illinois-cogcomp-nlp</artifactId> | ||
<groupId>edu.illinois.cs.cogcomp</groupId> | ||
<version>3.0.77</version> | ||
</parent> | ||
|
||
<modelVersion>4.0.0</modelVersion> | ||
<artifactId>illinois-srl</artifactId> | ||
<packaging>jar</packaging> | ||
<url>http://cogcomp.cs.illinois.edu</url> | ||
|
||
<properties> | ||
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> | ||
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding> | ||
<cogcomp-nlp-pipeline-version>0.1.24</cogcomp-nlp-pipeline-version> | ||
</properties> | ||
|
||
<dependencies> | ||
<!-- Include the pre-trained SRL models for running SemanticRoleLabeler --> | ||
<!-- Notice that the models need to match up to the minor version number --> | ||
<dependency> | ||
<groupId>edu.illinois.cs.cogcomp</groupId> | ||
<artifactId>illinois-srl-models</artifactId> | ||
<classifier>verb-stanford</classifier> | ||
<version>5.1</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>edu.illinois.cs.cogcomp</groupId> | ||
<artifactId>illinois-srl-models</artifactId> | ||
<classifier>nom-stanford</classifier> | ||
<version>5.1</version> | ||
</dependency> | ||
|
||
<!--The Illinois pipeline can be used instead --> | ||
<dependency> | ||
<groupId>edu.illinois.cs.cogcomp</groupId> | ||
<artifactId>illinois-nlp-pipeline</artifactId> | ||
<version>${cogcomp-nlp-pipeline-version}</version> | ||
<exclusions> | ||
<exclusion> | ||
<artifactId>illinois-srl</artifactId> | ||
<groupId>edu.illinois.cs.cogcomp</groupId> | ||
</exclusion> | ||
</exclusions> | ||
</dependency> | ||
|
||
<!-- The following 3 projects are now developed under illinois-cogcomp-nlp --> | ||
<dependency> | ||
<groupId>edu.illinois.cs.cogcomp</groupId> | ||
<artifactId>illinois-core-utilities</artifactId> | ||
<version>3.0.77</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>edu.illinois.cs.cogcomp</groupId> | ||
<artifactId>illinois-curator</artifactId> | ||
<version>3.0.77</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>edu.illinois.cs.cogcomp</groupId> | ||
<artifactId>illinois-edison</artifactId> | ||
<version>3.0.77</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>com.gurobi</groupId> | ||
<artifactId>gurobi</artifactId> | ||
<version>6.5</version> | ||
<optional>true</optional> | ||
</dependency> | ||
<dependency> | ||
<groupId>edu.illinois.cs.cogcomp</groupId> | ||
<artifactId>illinois-common-resources</artifactId> | ||
<classifier>illinoisSRL</classifier> | ||
Note the use of the classifier in case this needs to be addressed (see #43)
huh? I am not clean on the
I was using the |
||
<version>1.5</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>edu.illinois.cs.cogcomp</groupId> | ||
<artifactId>illinois-common-resources</artifactId> | ||
<version>1.5</version> | ||
Why do we need both the general |
||
</dependency> | ||
<dependency> | ||
<groupId>edu.illinois.cs.cogcomp</groupId> | ||
<artifactId>illinois-common-resources</artifactId> | ||
<classifier>ner</classifier> | ||
<version>1.5</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>edu.illinois.cs.cogcomp</groupId> | ||
<artifactId>illinois-sl-core</artifactId> | ||
<version>1.0.3</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>commons-lang</groupId> | ||
<artifactId>commons-lang</artifactId> | ||
<version>2.6</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>com.h2database</groupId> | ||
<artifactId>h2</artifactId> | ||
<version>1.4.190</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>edu.illinois.cs.cogcomp</groupId> | ||
<artifactId>illinois-inference</artifactId> | ||
<version>0.6.0</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.tartarus</groupId> | ||
<artifactId>snowball</artifactId> | ||
<version>1.0</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>junit</groupId> | ||
<artifactId>junit</artifactId> | ||
<version>4.12</version> | ||
<scope>test</scope> | ||
</dependency> | ||
</dependencies> | ||
|
||
<reporting> | ||
<excludeDefaults>true</excludeDefaults> | ||
<plugins> | ||
<plugin> | ||
<groupId>org.apache.maven.plugins</groupId> | ||
<artifactId>maven-javadoc-plugin</artifactId> | ||
<version>2.10.3</version> | ||
</plugin> | ||
</plugins> | ||
</reporting> | ||
|
||
</project> |
Make sure to change the models version (after retraining)
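The `pom.xml` comment says the models "need to match up to the minor version number". One way to make that rule checkable is sketched below. `ModelVersionCheckSketch` is a hypothetical helper, and comparing only the `major.minor` prefix is my reading of that comment, not something the PR implements. The version strings in `main` are the ones that appear in this diff.

```java
// Hypothetical helper for illustration; not part of illinois-srl.
class ModelVersionCheckSketch {

    // Returns the "major.minor" prefix of a dotted version, e.g. "5.1.12" -> "5.1".
    static String majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected at least major.minor: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    // True when code and models agree on major and minor version.
    static boolean modelsMatch(String codeVersion, String modelsVersion) {
        return majorMinor(codeVersion).equals(majorMinor(modelsVersion));
    }

    public static void main(String[] args) {
        // Code version 5.1.12 from the changelog against the 5.1 models.
        System.out.println(modelsMatch("5.1.12", "5.1"));  // prints "true"
        // Super-project version 3.0.77 against the same 5.1 models.
        System.out.println(modelsMatch("3.0.77", "5.1"));  // prints "false"
    }
}
```

Under this reading, the move to super-project versioning (3.0.x) no longer lines up with the 5.1 models, which is consistent with the reviewer's reminder to update the models version after retraining.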