Skip to content

GazetteerViewGenerator loads no gazetteers; deprecate in favor of SimpleGazetteerAnnotator? #146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mssammon opened this issue Aug 10, 2016 · 6 comments
Assignees

Comments

@mssammon
Copy link
Contributor

GazetteerViewGenerator and it is trying to load gazetteer files from resources/gazetteers/gazetteers, but this doesn't exist. Instead, I think clients should use SimpleGazetteerAnnotator, which loads resources from illinois-common-resources (note that these are organized differently from the way expected by GazetteerViewGenerator).

@cogcomp-dev cogcomp-dev added this to the CCG Borg Bonanza milestone Sep 12, 2016
@shyamupa
Copy link
Member

shyamupa commented Oct 2, 2016

Hey @mssammon,
Is the resource organization the only issue? I see that the SimpleAnnotator takes a resource manager, so path etc. can be easily dictated. As far as I understand the only change in functionality between the two is that the old view generator uses some filters (POS, NP) that I did not find in the new SimpleAnnotator. That might be a problem, depending on what application uses this class.

@mssammon
Copy link
Contributor Author

mssammon commented Oct 2, 2016

@shyamupa good to know, thanks. I think that ideally the option to use filters should be implemented in the new annotator -- I'm presuming this is non-trivial additional work but doable. Sound OK?

@cowchipkid
Copy link
Contributor

The SimpleGazetteerAnnotator is highly optimized. It can do phrases of any length without impacting performance. The GazViewGen is based on the older methodology which is less efficient, but requires less memory. It is a simple hash set of all the phrased from the gaz, but for multi-token hits it is not very efficient. There are lots of cool features in GazViewGen. I suggest, if they are useful features, we should move functionality over to the SimpleGazetteerAnnotator, and deprecate the older GazViewGenerator.

@mssammon
Copy link
Contributor Author

@shyamupa please track any usages of the old gazetteer view generator within illinois-cogcomp-nlp. (I assume none, but just in case...). What is the cost of migrating the GazViewGen features to the SimpleGazetteerGenerator?

@shyamupa
Copy link
Member

@mssammon So the class is not used outside of KnownFexes,CreateTestFeaturesResource and TestWordFeatureFactory. We can migrate to the new class with almost no effect outside. I found that the POSFilter etc. filters that I was suspecting to cause problems are not used directly for any application inside the package. I think we can go ahead and deprecate the old class.

@shyamupa
Copy link
Member

Should be taken care of by #244. Will close this issue once merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants