Commit 83915fa

Consistency related changes in form recognizer (#11467)
* page range + form field page number
* us receipt
* RecognizedReceipt
* Update sdk/formrecognizer/azure-ai-formrecognizer/CHANGELOG.md
* async
* comments
1 parent 88fae6e commit 83915fa

13 files changed: +88 -76 lines changed

sdk/formrecognizer/azure-ai-formrecognizer/CHANGELOG.md

Lines changed: 4 additions & 1 deletion
@@ -10,13 +10,16 @@
 - Removed `get_form_training_client` from `FormRecognizerClient`
 - Added `get_form_recognizer_client` to `FormTrainingClient`
 - A `HttpResponseError` is now raised if a model with `status=="invalid"` is returned from the `begin_train_model()` or `train_model()` methods
+- `PageRange` is renamed to `FormPageRange`
+- `FormField` does not have a page_number.
+- `begin_recognize_receipts` APIs now return `RecognizedReceipt` instead of `USReceipt`
+- `USReceiptType` is renamed to `ReceiptType`

 **New features**

 - Authentication using `azure-identity` credentials now supported
   - see the [Azure Identity documentation](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/identity/azure-identity/README.md) for more information

-
 ## 1.0.0b2 (2020-05-06)

 **Fixes and improvements**
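
The two pure renames listed in this changelog hunk can be captured in a small lookup table when updating calling code. This is only a sketch: `RENAMED_TYPES` and `migrate_type_name` are hypothetical names made up for illustration and are not part of `azure-ai-formrecognizer`.

```python
# Hypothetical helper (not part of the SDK): maps the type names renamed in
# this commit to their replacements.
RENAMED_TYPES = {
    "PageRange": "FormPageRange",
    "USReceiptType": "ReceiptType",
}


def migrate_type_name(name):
    """Return the new name for a renamed type, or the name unchanged."""
    return RENAMED_TYPES.get(name, name)


if __name__ == "__main__":
    for old in ("PageRange", "USReceiptType", "FormField"):
        print(old, "->", migrate_type_name(old))
```

Names that were not renamed, such as `FormField`, pass through unchanged. The removal of `FormField.page_number` and the new receipt return type are behavioral changes and cannot be handled by a rename table.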

sdk/formrecognizer/azure-ai-formrecognizer/README.md

Lines changed: 1 addition & 1 deletion
@@ -123,7 +123,7 @@ See the full details regarding [authentication][cognitive_authentication] of cog

 - Recognizing form fields and content using custom models trained to recognize your custom forms. These values are returned in a collection of `RecognizedForm` objects.
 - Recognizing form content, including tables, lines and words, without the need to train a model. Form content is returned in a collection of `FormPage` objects.
-- Recognizing common fields from US receipts, using a pre-trained receipt model on the Form Recognizer service. These fields and meta-data are returned in a collection of `USReceipt` objects.
+- Recognizing common fields from US receipts, using a pre-trained receipt model on the Form Recognizer service. These fields and meta-data are returned in a collection of `RecognizedReceipt` objects.

 ### FormTrainingClient
 `FormTrainingClient` provides operations for:

sdk/formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/__init__.py

Lines changed: 4 additions & 4 deletions
@@ -15,7 +15,7 @@
     CustomFormModelStatus,
     FormContentType,
     USReceipt,
-    USReceiptType,
+    ReceiptType,
     USReceiptItem,
     FormTable,
     FormTableCell,
@@ -24,7 +24,7 @@
     CustomFormModelInfo,
     AccountProperties,
     Point,
-    PageRange,
+    FormPageRange,
     RecognizedForm,
     FormField,
     FieldText,
@@ -46,7 +46,7 @@
     'FormContentType',
     'FormContent',
     'USReceipt',
-    'USReceiptType',
+    'ReceiptType',
     'USReceiptItem',
     'FormTable',
     'FormTableCell',
@@ -55,7 +55,7 @@
     'CustomFormModelInfo',
     'AccountProperties',
     'Point',
-    'PageRange',
+    'FormPageRange',
     'RecognizedForm',
     'FormField',
     'FieldText',

sdk/formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/_form_recognizer_client.py

Lines changed: 4 additions & 4 deletions
@@ -94,8 +94,8 @@ def begin_recognize_receipts(self, stream, **kwargs):
         :keyword int polling_interval: Waiting time between two polls for LRO operations
             if no Retry-After header is present. Defaults to 5 seconds.
         :return: An instance of an LROPoller. Call `result()` on the poller
-            object to return a list[:class:`~azure.ai.formrecognizer.USReceipt`].
-        :rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.USReceipt]]
+            object to return a list[:class:`~azure.ai.formrecognizer.RecognizedReceipt`].
+        :rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.RecognizedReceipt]]
         :raises ~azure.core.exceptions.HttpResponseError:

         .. admonition:: Example:
@@ -142,8 +142,8 @@ def begin_recognize_receipts_from_url(self, url, **kwargs):
         :keyword int polling_interval: Waiting time between two polls for LRO operations
             if no Retry-After header is present. Defaults to 5 seconds.
         :return: An instance of an LROPoller. Call `result()` on the poller
-            object to return a list[:class:`~azure.ai.formrecognizer.USReceipt`].
-        :rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.USReceipt]]
+            object to return a list[:class:`~azure.ai.formrecognizer.RecognizedReceipt`].
+        :rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.RecognizedReceipt]]
         :raises ~azure.core.exceptions.HttpResponseError:

         .. admonition:: Example:

sdk/formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/_models.py

Lines changed: 44 additions & 26 deletions
@@ -120,17 +120,17 @@ def __new__(cls, x, y):
         return super(Point, cls).__new__(cls, x, y)


-class PageRange(namedtuple("PageRange", "first_page last_page")):
-    """The 1-based page range of the document.
+class FormPageRange(namedtuple("FormPageRange", "first_page last_page")):
+    """The 1-based page range of the form.

-    :ivar int first_page: The first page number of the document.
-    :ivar int last_page: The last page number of the document.
+    :ivar int first_page: The first page number of the form.
+    :ivar int last_page: The last page number of the form.
     """

     __slots__ = ()

     def __new__(cls, first_page, last_page):
-        return super(PageRange, cls).__new__(cls, first_page, last_page)
+        return super(FormPageRange, cls).__new__(cls, first_page, last_page)


 class FormContent(object):
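
The renamed `FormPageRange` keeps the same namedtuple-subclass pattern as `PageRange`. The standalone sketch below reproduces the class from this hunk outside the SDK to show how such a value behaves; the usage lines are illustrative and not taken from the commit.

```python
from collections import namedtuple


class FormPageRange(namedtuple("FormPageRange", "first_page last_page")):
    """The 1-based page range of the form (copied from the hunk above)."""

    # An empty __slots__ keeps instances as lightweight as a plain tuple:
    # no per-instance __dict__ is created.
    __slots__ = ()

    def __new__(cls, first_page, last_page):
        return super(FormPageRange, cls).__new__(cls, first_page, last_page)


if __name__ == "__main__":
    page_range = FormPageRange(first_page=1, last_page=3)
    # Fields are available by name and by position.
    print(page_range.first_page, page_range.last_page)
    first, last = page_range
    print(first, last)
```

Because the class is still a tuple, code that unpacked or indexed a `PageRange` should keep working after the rename; only the imported name changes.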
@@ -162,7 +162,7 @@ class RecognizedForm(object):
         this is the training-time label of the field. For models trained
         without labels, a unique name is generated for each field.
     :vartype fields: dict[str, ~azure.ai.formrecognizer.FormField]
-    :ivar ~azure.ai.formrecognizer.PageRange page_range:
+    :ivar ~azure.ai.formrecognizer.FormPageRange page_range:
         The first and last page of the input form.
     :ivar list[~azure.ai.formrecognizer.FormPage] pages:
         A list of pages recognized from the input document. Contains lines,
@@ -179,8 +179,39 @@ def __repr__(self):
             self.form_type, repr(self.fields), repr(self.page_range), repr(self.pages)
         )[:1024]

+class RecognizedReceipt(RecognizedForm):
+    """Represents a receipt that has been recognized by a trained model.

-class USReceipt(object): # pylint: disable=too-many-instance-attributes
+    :ivar str form_type:
+        The type of form the model identified the submitted form to be.
+    :ivar fields:
+        A dictionary of the fields found on the form. The fields dictionary
+        keys are the `name` of the field. For models trained with labels,
+        this is the training-time label of the field. For models trained
+        without labels, a unique name is generated for each field.
+    :vartype fields: dict[str, ~azure.ai.formrecognizer.FormField]
+    :ivar ~azure.ai.formrecognizer.FormPageRange page_range:
+        The first and last page of the input form.
+    :ivar list[~azure.ai.formrecognizer.FormPage] pages:
+        A list of pages recognized from the input document. Contains lines,
+        words, tables and page metadata.
+    :ivar ~azure.ai.formrecognizer.ReceiptType receipt_type:
+        The reciept type and confidence.
+    :ivar str receipt_locale: Defaults to "en-US".
+    """
+    def __init__(self, **kwargs):
+        super(RecognizedReceipt, self).__init__(**kwargs)
+        self.receipt_type = kwargs.get("receipt_type", None)
+        self.receipt_locale = kwargs.get("receipt_locale", "en-US")
+
+    def __repr__(self):
+        return "RecognizedReceipt(form_type={}, fields={}, page_range={}, pages={}, " \
+            "receipt_type={}, receipt_locale={})".format(
+                self.form_type, repr(self.fields), repr(self.page_range), repr(self.pages),
+                repr(self.receipt_type), self.receipt_locale
+            )[:1024]
+
+class USReceipt(RecognizedReceipt): # pylint: disable=too-many-instance-attributes
     """Extracted fields found on the US sales receipt. Provides
     attributes for accessing common fields present in US sales receipts.

@@ -190,8 +221,6 @@ class USReceipt(object): # pylint: disable=too-many-instance-attributes
         The name of the merchant.
     :ivar ~azure.ai.formrecognizer.FormField merchant_phone_number:
         The phone number associated with the merchant.
-    :ivar ~azure.ai.formrecognizer.USReceiptType receipt_type:
-        The reciept type and confidence.
     :ivar list[~azure.ai.formrecognizer.USReceiptItem] receipt_items:
         The purchased items found on the receipt.
     :ivar ~azure.ai.formrecognizer.FormField subtotal:
@@ -209,33 +238,27 @@ class USReceipt(object): # pylint: disable=too-many-instance-attributes
     :ivar fields:
         A dictionary of the fields found on the receipt.
     :vartype fields: dict[str, ~azure.ai.formrecognizer.FormField]
-    :ivar ~azure.ai.formrecognizer.PageRange page_range:
+    :ivar ~azure.ai.formrecognizer.FormPageRange page_range:
         The first and last page of the input receipt.
     :ivar list[~azure.ai.formrecognizer.FormPage] pages:
         Contains page metadata such as page width, length, text angle, unit.
         If `include_text_content=True` is passed, contains a list
         of extracted text lines for each page in the input document.
     :ivar str form_type: The type of form.
-    :ivar str receipt_locale: Defaults to "en-US".
     """

     def __init__(self, **kwargs):
+        super(USReceipt, self).__init__(**kwargs)
         self.merchant_address = kwargs.get("merchant_address", None)
         self.merchant_name = kwargs.get("merchant_name", None)
         self.merchant_phone_number = kwargs.get("merchant_phone_number", None)
-        self.receipt_type = kwargs.get("receipt_type", None)
         self.receipt_items = kwargs.get("receipt_items", None)
         self.subtotal = kwargs.get("subtotal", None)
         self.tax = kwargs.get("tax", None)
         self.tip = kwargs.get("tip", None)
         self.total = kwargs.get("total", None)
         self.transaction_date = kwargs.get("transaction_date", None)
         self.transaction_time = kwargs.get("transaction_time", None)
-        self.fields = kwargs.get("fields", None)
-        self.page_range = kwargs.get("page_range", None)
-        self.pages = kwargs.get("pages", None)
-        self.form_type = kwargs.get("form_type", None)
-        self.receipt_locale = kwargs.get("receipt_locale", "en-US")

     def __repr__(self):
         return "USReceipt(merchant_address={}, merchant_name={}, merchant_phone_number={}, " \
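
The hunks above move the shared attributes out of `USReceipt` and into a `RecognizedForm` -> `RecognizedReceipt` -> `USReceipt` chain, with each `__init__` forwarding `**kwargs` to its parent. The trimmed-down sketch below shows how that chain populates one object. The body of `RecognizedForm.__init__` does not appear in this diff, so it is an assumption inferred from the four attributes removed from `USReceipt`, and only one receipt-specific field (`total`) is kept for brevity.

```python
class RecognizedForm(object):
    def __init__(self, **kwargs):
        # Assumed body: the diff does not show this method, but these are the
        # attributes that USReceipt stopped setting itself.
        self.fields = kwargs.get("fields", None)
        self.page_range = kwargs.get("page_range", None)
        self.pages = kwargs.get("pages", None)
        self.form_type = kwargs.get("form_type", None)


class RecognizedReceipt(RecognizedForm):
    def __init__(self, **kwargs):
        super(RecognizedReceipt, self).__init__(**kwargs)
        self.receipt_type = kwargs.get("receipt_type", None)
        self.receipt_locale = kwargs.get("receipt_locale", "en-US")


class USReceipt(RecognizedReceipt):
    def __init__(self, **kwargs):
        super(USReceipt, self).__init__(**kwargs)
        # Only one of the receipt-specific fields is reproduced here.
        self.total = kwargs.get("total", None)


if __name__ == "__main__":
    receipt = USReceipt(form_type="example-receipt", total=12.5)
    print(isinstance(receipt, RecognizedReceipt), isinstance(receipt, RecognizedForm))
    print(receipt.form_type, receipt.receipt_locale, receipt.total)
```

Since every level reads from the same `kwargs`, a `USReceipt` built by the response handler still satisfies the `RecognizedReceipt` return type now documented on the client methods.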
@@ -264,8 +287,6 @@ class FormField(object):
         :class:`~azure.ai.formrecognizer.FormField`, or list[:class:`~azure.ai.formrecognizer.FormField`]
     :ivar float confidence:
         Measures the degree of certainty of the recognition result. Value is between [0.0, 1.0].
-    :ivar int page_number:
-        The 1-based number of the page in which this content is present.
     """

     def __init__(self, **kwargs):
@@ -274,7 +295,6 @@ def __init__(self, **kwargs):
         self.name = kwargs.get("name", None)
         self.value = kwargs.get("value", None)
         self.confidence = kwargs.get("confidence", None)
-        self.page_number = kwargs.get("page_number", None)

     @classmethod
     def _from_generated(cls, field, value, read_result):
@@ -284,7 +304,6 @@ def _from_generated(cls, field, value, read_result):
             value=get_field_value(field, value, read_result),
             name=field,
             confidence=adjust_confidence(value.confidence) if value else None,
-            page_number=value.page if value else None,
         )


@@ -296,12 +315,11 @@ def _from_generated_unlabeled(cls, field, idx, page, read_result):
             value=field.value.text,
             name="field-" + str(idx),
             confidence=adjust_confidence(field.confidence),
-            page_number=page,
         )

     def __repr__(self):
-        return "FormField(label_data={}, value_data={}, name={}, value={}, confidence={}, page_number={})".format(
-            repr(self.label_data), repr(self.value_data), self.name, repr(self.value), self.confidence, self.page_number
+        return "FormField(label_data={}, value_data={}, name={}, value={}, confidence={})".format(
+            repr(self.label_data), repr(self.value_data), self.name, repr(self.value), self.confidence
         )[:1024]

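
With `page_number` removed from `FormField`, callers that still need a page for a field have to read it from elsewhere. One possible approach, sketched below with stand-in classes, is to look it up on the field's `value_data`. That `value_data` exposes a `page_number` attribute is an assumption (the diff does not show `FieldText`), and `field_page_number` is a hypothetical helper, not an SDK function.

```python
class FieldText(object):
    """Stand-in for the SDK type; its page_number attribute is assumed."""

    def __init__(self, text, page_number):
        self.text = text
        self.page_number = page_number


class FormField(object):
    """Stand-in mirroring the post-commit FormField: no page_number."""

    def __init__(self, **kwargs):
        self.label_data = kwargs.get("label_data", None)
        self.value_data = kwargs.get("value_data", None)
        self.name = kwargs.get("name", None)
        self.value = kwargs.get("value", None)
        self.confidence = kwargs.get("confidence", None)


def field_page_number(field):
    """Hypothetical helper: page of the field's value text, or None."""
    value_data = getattr(field, "value_data", None)
    return getattr(value_data, "page_number", None)


if __name__ == "__main__":
    field = FormField(name="Total", value=12.5,
                      value_data=FieldText("12.50", page_number=2))
    print(field_page_number(field))
    print(field_page_number(FormField(name="Empty")))
```

The `getattr` defaults keep the helper safe for fields whose `value_data` is `None`.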

@@ -495,7 +513,7 @@ def __repr__(self):
         )[:1024]


-class USReceiptType(object):
+class ReceiptType(object):
     """The type of the analyzed US receipt and the confidence
     value of that type.

@@ -516,7 +534,7 @@ def _from_generated(cls, item):
             confidence=adjust_confidence(item.confidence)) if item else None

     def __repr__(self):
-        return "USReceiptType(type={}, confidence={})".format(self.type, self.confidence)[:1024]
+        return "ReceiptType(type={}, confidence={})".format(self.type, self.confidence)[:1024]


 class USReceiptItem(object):

sdk/formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/_response_handlers.py

Lines changed: 7 additions & 7 deletions
@@ -8,14 +8,14 @@

 from ._models import (
     USReceipt,
-    USReceiptType,
+    ReceiptType,
     FormField,
     USReceiptItem,
     FormPage,
     FormLine,
     FormTable,
     FormTableCell,
-    PageRange,
+    FormPageRange,
     RecognizedForm
 )

@@ -29,7 +29,7 @@ def prepare_us_receipt(response):
     for page in document_result:
         if page.fields is None:
             receipt = USReceipt(
-                page_range=PageRange(first_page=page.page_range[0], last_page=page.page_range[1]),
+                page_range=FormPageRange(first_page=page.page_range[0], last_page=page.page_range[1]),
                 pages=form_page[page.page_range[0]-1:page.page_range[1]],
                 form_type=page.doc_type,
             )
@@ -47,7 +47,7 @@ def prepare_us_receipt(response):
                     page.fields.get("MerchantPhoneNumber"),
                     read_result,
                 ),
-                receipt_type=USReceiptType._from_generated(page.fields.get("ReceiptType")),
+                receipt_type=ReceiptType._from_generated(page.fields.get("ReceiptType")),
                 receipt_items=USReceiptItem._from_generated(
                     page.fields.get("Items"), read_result
                 ),
@@ -65,7 +65,7 @@ def prepare_us_receipt(response):
                 transaction_time=FormField._from_generated(
                     "TransactionTime", page.fields.get("TransactionTime"), read_result
                 ),
-                page_range=PageRange(
+                page_range=FormPageRange(
                     first_page=page.page_range[0], last_page=page.page_range[1]
                 ),
                 pages=form_page[page.page_range[0]-1:page.page_range[1]],
@@ -132,7 +132,7 @@ def prepare_unlabeled_result(response):
         if unlabeled_fields:
             unlabeled_fields = {field.name: field for field in unlabeled_fields}
         form = RecognizedForm(
-            page_range=PageRange(
+            page_range=FormPageRange(
                 first_page=page.page,
                 last_page=page.page
             ),
@@ -152,7 +152,7 @@ def prepare_labeled_result(response, model_id):
     result = []
     for doc in response.analyze_result.document_results:
         form = RecognizedForm(
-            page_range=FormPageRange(
+            page_range=FormPageRange(
                 first_page=doc.page_range[0],
                 last_page=doc.page_range[1]
             ),
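
The receipt handler in this file selects pages with `form_page[page.page_range[0]-1:page.page_range[1]]`, which converts a 1-based, inclusive page range into a Python slice. The sketch below isolates that arithmetic; `pages_for_range` is a hypothetical name used only for illustration.

```python
def pages_for_range(pages, first_page, last_page):
    """Return the pages covered by a 1-based, inclusive page range."""
    # Subtract 1 from the start to get a 0-based index. The end needs no
    # adjustment because a slice excludes its stop index.
    return pages[first_page - 1:last_page]


if __name__ == "__main__":
    all_pages = ["page 1", "page 2", "page 3", "page 4"]
    print(pages_for_range(all_pages, 2, 3))
```

A single-page range such as `(1, 1)` yields a one-element list, which matches the unlabeled handler setting `first_page` and `last_page` to the same page.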

sdk/formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/aio/_form_recognizer_client_async.py

Lines changed: 7 additions & 7 deletions
@@ -29,7 +29,7 @@
 from azure.core.credentials import AzureKeyCredential
 from azure.core.credentials_async import AsyncTokenCredential
 from .._models import (
-    USReceipt,
+    RecognizedReceipt,
     FormPage,
     RecognizedForm
 )
@@ -91,7 +91,7 @@ async def recognize_receipts(
             self,
             stream: Union[bytes, IO[bytes]],
             **kwargs: Any
-    ) -> List["USReceipt"]:
+    ) -> List["RecognizedReceipt"]:
         """Extract field text and semantic values from a given US sales receipt.
         The input document must be of one of the supported content types - 'application/pdf',
         'image/jpeg', 'image/png' or 'image/tiff'.
@@ -106,8 +106,8 @@ async def recognize_receipts(
             see :class:`~azure.ai.formrecognizer.FormContentType`.
         :keyword int polling_interval: Waiting time between two polls for LRO operations
             if no Retry-After header is present. Defaults to 5 seconds.
-        :return: A list of USReceipt.
-        :rtype: list[~azure.ai.formrecognizer.USReceipt]
+        :return: A list of RecognizedReceipt.
+        :rtype: list[~azure.ai.formrecognizer.RecognizedReceipt]
         :raises ~azure.core.exceptions.HttpResponseError:

         .. admonition:: Example:
@@ -145,7 +145,7 @@ async def recognize_receipts_from_url(
             self,
             url: str,
             **kwargs: Any
-    ) -> List["USReceipt"]:
+    ) -> List["RecognizedReceipt"]:
         """Extract field text and semantic values from a given US sales receipt.
         The input document must be the location (Url) of the receipt to be analyzed.

@@ -156,8 +156,8 @@ async def recognize_receipts_from_url(
             Whether or not to include text elements such as lines and words in addition to form fields.
         :keyword int polling_interval: Waiting time between two polls for LRO operations
             if no Retry-After header is present. Defaults to 5 seconds.
-        :return: A list of USReceipt.
-        :rtype: list[~azure.ai.formrecognizer.USReceipt]
+        :return: A list of RecognizedReceipt.
+        :rtype: list[~azure.ai.formrecognizer.RecognizedReceipt]
         :raises ~azure.core.exceptions.HttpResponseError:

         .. admonition:: Example:
