-
Notifications
You must be signed in to change notification settings - Fork 3k
Consistency related changes in form recognizer #11467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -180,7 +180,7 @@ def __repr__(self): | |||
)[:1024] | |||
|
|||
|
|||
class USReceipt(object): # pylint: disable=too-many-instance-attributes | |||
class USReceipt(RecognizedForm): # pylint: disable=too-many-instance-attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, instead of inheritng from RecognizedForm
we can pass in a recognizedForm as a variable and un-flatten this.
self.fields = kwargs.get("fields", None)
self.page_range = kwargs.get("page_range", None)
self.pages = kwargs.get("pages", None)
self.form_type = kwargs.get("form_type", None)
will be replaced by
self.recognized_form = kwargs.get("recognized_form ", None) # type: ~models.RecognizedForm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johanste thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do remember @kristapratico had thoughts about making people access fields, pages etc through recognized_form, I can't fully remember the details though.
I think I'm slightly more in favor of adding recognized_form myself, since fields, form_type etc aren't commonly called on a USReceipt, but my feelings aren't particularly strong.
(editing in because I previously forgot to mention this): .net currently has RecognizedForm as a property on USReceipt
@@ -120,7 +120,7 @@ def __new__(cls, x, y): | |||
return super(Point, cls).__new__(cls, x, y) | |||
|
|||
|
|||
class PageRange(namedtuple("PageRange", "first_page last_page")): | |||
class FormPageRange(namedtuple("FormPageRange", "first_page last_page")): | |||
"""The 1-based page range of the document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very tiny nit (please feel free to ignore): I think it would be better to have the comment as "The 1-based page range of the form" since we have form in the name and form is more specifically what we're returning the page range of.
This change would also include later parts of this docstring, i.e. changing
:ivar int first_page: The first page number of the document
to
:ivar int first_page: The first page number of the form
@@ -180,7 +180,7 @@ def __repr__(self): | |||
)[:1024] | |||
|
|||
|
|||
class USReceipt(object): # pylint: disable=too-many-instance-attributes | |||
class USReceipt(RecognizedForm): # pylint: disable=too-many-instance-attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of a question: If we inherit from another exposed class, do we include the properties we inherited from that other class in our docstring? I did this when I was working on key vault, but just want to check for sure what we should be doing in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we must be doing it imo - since they are "valid" properties. As a customer, i would not want to go check the docsstring for a series of models
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, that makes a lot of sense. thanks!
@@ -24,7 +24,7 @@ | |||
CustomFormModelInfo, | |||
AccountProperties, | |||
Point, | |||
PageRange, | |||
FormPageRange, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the changelog
self.receipt_locale = kwargs.get("receipt_locale", "en-US") | ||
|
||
def __repr__(self): | ||
return "RecognizedForm(form_type={}, fields={}, page_range={}, pages={}, " \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be RecognizedReceipt
in repr
- `PageRange` is renamed to `FormPageRange` | ||
- `FormField` does not have a page_number. | ||
- `begin_recognize_receipts` APIs now return `RecognizedReceipt` instead of `USReceipt` | ||
- `USReceiptType` is renamed to `ReceiptType` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should include that we added RecognizedReceipt
class, that it inherits from RecognziedForm
and USReceipt
inherits from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RecognizedReceipt is more of a "middle man" and is mainly for documenting purposes - it should not ( and does not) change the behaviour of UsReceipt - so i dont think it should be in changelog
self.page_range = kwargs.get("page_range", None) | ||
self.pages = kwargs.get("pages", None) | ||
self.form_type = kwargs.get("form_type", None) | ||
self.receipt_locale = kwargs.get("receipt_locale", "en-US") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also delete receipt_type and receipt_locale from the repr below (this would also require the USReceipt repr test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope - USReceipt inherits from RecognizedReceipt - so it will still stay logically in the repr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, my mind totally blanked there. thanks!
@@ -10,13 +10,16 @@ | |||
- Removed `get_form_training_client` from `FormRecognizerClient` | |||
- Added `get_form_recognizer_client` to `FormTrainingClient` | |||
- A `HttpResponseError` is now raised if a model with `status=="invalid"` is returned from the `begin_train_model()` or `train_model()` methods | |||
- `PageRange` is renamed to `FormPageRange` | |||
- `FormField` does not have a page_number. | |||
- `begin_recognize_receipts` APIs now return `RecognizedReceipt` instead of `USReceipt` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you also need to update the readme here: https://github.com/Azure/azure-sdk-for-python/blob/42684b2051480f546a4fd563411e9785a7d075f3/sdk/formrecognizer/azure-ai-formrecognizer/README.md#formrecognizerclient to say that it returns list of RecognizedReceipt
instead of USReceipt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch - thanks
@@ -29,7 +29,7 @@ def prepare_us_receipt(response): | |||
for page in document_result: | |||
if page.fields is None: | |||
receipt = USReceipt( | |||
page_range=PageRange(first_page=page.page_range[0], last_page=page.page_range[1]), | |||
page_range=FormPageRange(first_page=page.page_range[0], last_page=page.page_range[1]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question I have about this change: shouldn't we also update prepare_us_receipt to return a list of RecognizedReceipt instead of USReceipt (line 31)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats a good question - but at this point since we support only us receipt - we will return the USReceipt - the RecognizedReceipt is for documentation purposes so that we keep it more open for future - it's something we will be able to change internally later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be too much for the future, but should we add the ability in RecognizedReciept to map to which of its children it should be?
i.e., adding a dict of receipt-locale to child type {"en-us": USReceipt}, so we could pass in this into to RecognizedReceipt and get out a USReceipt in this case.
@kristapratico what are your thoughts on this, I wasn't in the phone call so not sure what you guys decided for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think that makes sense for the future. v2 will GA with only US sales receipts so we'll add that implementation detail in v2.1.
@@ -97,8 +97,8 @@ def form_page(form_table, form_line): | |||
|
|||
@pytest.fixture | |||
def us_receipt_type(): | |||
model = _models.USReceiptType(type="Itemized", confidence=1.0) | |||
model_repr = "USReceiptType(type=Itemized, confidence=1.0)" | |||
model = _models.ReceiptType(type="Itemized", confidence=1.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test that tests the repr of RecognizedReceipt
?
self.receipt_locale = kwargs.get("receipt_locale", "en-US") | ||
|
||
def __repr__(self): | ||
return "RecognizedForm(form_type={}, fields={}, page_range={}, pages={}, " \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should say 'RecognizedReceipt' instead of 'RecognizedForm'
Fixes #11130