enhance pos tagging with transformers function #866

Closed
pavaris-pm opened this issue Nov 14, 2023 · 1 comment
Labels
refactoring a technical improvement which does not add any new features or change existing features.
Milestone

Comments

@pavaris-pm
Contributor

In PR #857, pos_tag_transformers was added, which consists of 3 models. However, to call an engine, its full name must be specified, and the output is still not in the same format as the other taggers. For example:

pos_tag_transformers(words="แมวทำอะไรตอนห้าโมงเช้า", engine="bert-base-th-cased-blackboard")
# outputs
# [{'entity_group': 'NN', 'score': 0.910759, 'word': 'แมวมา', 'start': 0, 'end': 5},
#  {'entity_group': 'VV', 'score': 0.9462489, 'word': '##ทำ', 'start': 5, 'end': 7},
#  {'entity_group': 'NN', 'score': 0.8325567, 'word': '##อะไรตอนห้าโมงเช้า', 'start': 7, 'end': 24}]

The full model name is very hard for a normal user to remember (at least for me, remembering "bert-base-th-cased-blackboard" is impossible), and it may make the internal code messier if more transformers models trained on new corpora are added: we will end up with a lot of if-else conditions just to select a model.
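To illustrate the if-else concern, here is a minimal sketch of a lookup table that maps a short engine name and a corpus name to a full model name. The names `_MODEL_REGISTRY` and `resolve_model_name`, and the short keys "bert" and "blackboard", are hypothetical and chosen for illustration only; the only model name taken from this issue is "bert-base-th-cased-blackboard". This is not necessarily how PyThaiNLP implements it.

```python
# Hypothetical registry: (engine, corpus) -> full model name.
# The key names are assumptions; only the model name below appears in this issue.
_MODEL_REGISTRY = {
    ("bert", "blackboard"): "bert-base-th-cased-blackboard",
}


def resolve_model_name(engine: str, corpus: str) -> str:
    """Return the full model name for a short engine/corpus pair."""
    try:
        return _MODEL_REGISTRY[(engine, corpus)]
    except KeyError:
        # List the supported pairs so the user does not have to guess.
        supported = ", ".join(f"{e}/{c}" for e, c in sorted(_MODEL_REGISTRY))
        raise ValueError(
            f"unsupported engine/corpus pair {engine!r}/{corpus!r}; "
            f"supported: {supported}"
        ) from None


print(resolve_model_name("bert", "blackboard"))
# prints: bert-base-th-cased-blackboard
```

With this shape, adding a model trained on a new corpus is one new dictionary entry rather than another branch.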

Accordingly, in PR #865 I've cleaned up the code so that a user can call a model with the parameters engine and corpus, the same as in the existing functions pos_tag and pos_tag_sents, and I've also fixed the output format. This makes the model much easier to call without remembering its entire name, and gives users a better experience. What do you think? @wannaphong
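As a rough sketch of the output-format point, the dictionaries shown in the example above could be reduced to the (word, tag) pairs that pos_tag returns. The helper name `to_word_tag_pairs` is hypothetical, and the assumption that a leading "##" is a subword continuation marker to be stripped is mine; PR #865 may do this differently.

```python
from typing import Dict, List, Tuple


def to_word_tag_pairs(entities: List[Dict]) -> List[Tuple[str, str]]:
    """Convert token-classification dicts into (word, tag) tuples."""
    pairs = []
    for ent in entities:
        word = ent["word"]
        # Assumption: a leading "##" marks a subword continuation; drop it.
        if word.startswith("##"):
            word = word[2:]
        pairs.append((word, ent["entity_group"]))
    return pairs


# Sample input copied from the output shown in this issue.
raw = [
    {"entity_group": "NN", "score": 0.910759, "word": "แมวมา", "start": 0, "end": 5},
    {"entity_group": "VV", "score": 0.9462489, "word": "##ทำ", "start": 5, "end": 7},
    {"entity_group": "NN", "score": 0.8325567, "word": "##อะไรตอนห้าโมงเช้า", "start": 7, "end": 24},
]
print(to_word_tag_pairs(raw))
# prints: [('แมวมา', 'NN'), ('ทำ', 'VV'), ('อะไรตอนห้าโมงเช้า', 'NN')]
```

The score and character offsets are dropped here only to match the tuple format of the other taggers.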

@bact added the refactoring label on Nov 14, 2023
@bact modified the milestones: Future, 5.0 on Nov 14, 2023
@pavaris-pm
Contributor Author

Closing this issue since I have already fixed it in PR #865 krub 👍🏻

Projects
Status: Done