Skip to content

Consistency related changes in form recognizer #11467

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 18, 2020

Conversation

rakshith91
Copy link
Contributor

@rakshith91 rakshith91 commented May 15, 2020

  • Rename PageRange to FormPageRange
  • Remove page_number from FormField
  • Inherit USReceipt from Recognized form.

Fixes #11130

@@ -180,7 +180,7 @@ def __repr__(self):
)[:1024]


class USReceipt(object): # pylint: disable=too-many-instance-attributes
class USReceipt(RecognizedForm): # pylint: disable=too-many-instance-attributes
Copy link
Contributor Author

@rakshith91 rakshith91 May 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, instead of inheritng from RecognizedForm we can pass in a recognizedForm as a variable and un-flatten this.

self.fields = kwargs.get("fields", None)	
self.page_range = kwargs.get("page_range", None)	
self.pages = kwargs.get("pages", None)	
self.form_type = kwargs.get("form_type", None)

will be replaced by

self.recognized_form = kwargs.get("recognized_form ", None) # type: ~models.RecognizedForm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johanste thoughts?

Copy link
Contributor

@iscai-msft iscai-msft May 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do remember @kristapratico had thoughts about making people access fields, pages etc through recognized_form, I can't fully remember the details though.

I think I'm slightly more in favor of adding recognized_form myself, since fields, form_type etc aren't commonly called on a USReceipt, but my feelings aren't particularly strong.

(editing in because I previously forgot to mention this): .net currently has RecognizedForm as a property on USReceipt

@rakshith91 rakshith91 marked this pull request as ready for review May 15, 2020 19:49
@rakshith91 rakshith91 requested a review from johanste May 15, 2020 19:52
@@ -120,7 +120,7 @@ def __new__(cls, x, y):
return super(Point, cls).__new__(cls, x, y)


class PageRange(namedtuple("PageRange", "first_page last_page")):
class FormPageRange(namedtuple("FormPageRange", "first_page last_page")):
"""The 1-based page range of the document.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very tiny nit (please feel free to ignore): I think it would be better to have the comment as "The 1-based page range of the form" since we have form in the name and form is more specifically what we're returning the page range of.

This change would also include later parts of this docstring, i.e. changing
:ivar int first_page: The first page number of the document
to
:ivar int first_page: The first page number of the form

@@ -180,7 +180,7 @@ def __repr__(self):
)[:1024]


class USReceipt(object): # pylint: disable=too-many-instance-attributes
class USReceipt(RecognizedForm): # pylint: disable=too-many-instance-attributes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a question: If we inherit from another exposed class, do we include the properties we inherited from that other class in our docstring? I did this when I was working on key vault, but just want to check for sure what we should be doing in this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we must be doing it imo - since they are "valid" properties. As a customer, i would not want to go check the docsstring for a series of models

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, that makes a lot of sense. thanks!

@@ -24,7 +24,7 @@
CustomFormModelInfo,
AccountProperties,
Point,
PageRange,
FormPageRange,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the changelog

self.receipt_locale = kwargs.get("receipt_locale", "en-US")

def __repr__(self):
return "RecognizedForm(form_type={}, fields={}, page_range={}, pages={}, " \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be RecognizedReceipt in repr

- `PageRange` is renamed to `FormPageRange`
- `FormField` does not have a page_number.
- `begin_recognize_receipts` APIs now return `RecognizedReceipt` instead of `USReceipt`
- `USReceiptType` is renamed to `ReceiptType`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should include that we added RecognizedReceipt class, that it inherits from RecognziedForm and USReceipt inherits from it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RecognizedReceipt is more of a "middle man" and is mainly for documenting purposes - it should not ( and does not) change the behaviour of UsReceipt - so i dont think it should be in changelog

self.page_range = kwargs.get("page_range", None)
self.pages = kwargs.get("pages", None)
self.form_type = kwargs.get("form_type", None)
self.receipt_locale = kwargs.get("receipt_locale", "en-US")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also delete receipt_type and receipt_locale from the repr below (this would also require the USReceipt repr test)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope - USReceipt inherits from RecognizedReceipt - so it will still stay logically in the repr

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, my mind totally blanked there. thanks!

@@ -10,13 +10,16 @@
- Removed `get_form_training_client` from `FormRecognizerClient`
- Added `get_form_recognizer_client` to `FormTrainingClient`
- A `HttpResponseError` is now raised if a model with `status=="invalid"` is returned from the `begin_train_model()` or `train_model()` methods
- `PageRange` is renamed to `FormPageRange`
- `FormField` does not have a page_number.
- `begin_recognize_receipts` APIs now return `RecognizedReceipt` instead of `USReceipt`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch - thanks

@@ -29,7 +29,7 @@ def prepare_us_receipt(response):
for page in document_result:
if page.fields is None:
receipt = USReceipt(
page_range=PageRange(first_page=page.page_range[0], last_page=page.page_range[1]),
page_range=FormPageRange(first_page=page.page_range[0], last_page=page.page_range[1]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question I have about this change: shouldn't we also update prepare_us_receipt to return a list of RecognizedReceipt instead of USReceipt (line 31)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats a good question - but at this point since we support only us receipt - we will return the USReceipt - the RecognizedReceipt is for documentation purposes so that we keep it more open for future - it's something we will be able to change internally later

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be too much for the future, but should we add the ability in RecognizedReciept to map to which of its children it should be?
i.e., adding a dict of receipt-locale to child type {"en-us": USReceipt}, so we could pass in this into to RecognizedReceipt and get out a USReceipt in this case.

@kristapratico what are your thoughts on this, I wasn't in the phone call so not sure what you guys decided for now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think that makes sense for the future. v2 will GA with only US sales receipts so we'll add that implementation detail in v2.1.

@@ -97,8 +97,8 @@ def form_page(form_table, form_line):

@pytest.fixture
def us_receipt_type():
model = _models.USReceiptType(type="Itemized", confidence=1.0)
model_repr = "USReceiptType(type=Itemized, confidence=1.0)"
model = _models.ReceiptType(type="Itemized", confidence=1.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that tests the repr of RecognizedReceipt?

self.receipt_locale = kwargs.get("receipt_locale", "en-US")

def __repr__(self):
return "RecognizedForm(form_type={}, fields={}, page_range={}, pages={}, " \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should say 'RecognizedReceipt' instead of 'RecognizedForm'

@iscai-msft iscai-msft self-requested a review May 18, 2020 20:53
@rakshith91 rakshith91 merged commit 83915fa into Azure:master May 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[form recognizer] Make API consistent across languages (FormRecognizerClient)
3 participants