
[formrecognizer] add strongly-typed receipt wrapper sample #12128

Merged: 5 commits merged on Jun 23, 2020
@@ -87,6 +87,9 @@ def begin_recognize_receipts(self, receipt, **kwargs):
The input document must be of one of the supported content types - 'application/pdf',
'image/jpeg', 'image/png' or 'image/tiff'.

See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields

:param receipt: JPEG, PNG, PDF and TIFF type file stream or bytes.
Currently only supports US sales receipts.
:type receipt: bytes or IO[bytes]
@@ -141,6 +144,9 @@ def begin_recognize_receipts_from_url(self, receipt_url, **kwargs):
"""Extract field text and semantic values from a given US sales receipt.
The input document must be the location (URL) of the receipt to be analyzed.

See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields

:param str receipt_url: The URL of the receipt to analyze. The input must be a valid, encoded URL
of one of the supported formats: JPEG, PNG, PDF and TIFF. Currently only supports
US sales receipts.
@@ -95,6 +95,9 @@ async def begin_recognize_receipts(
The input document must be of one of the supported content types - 'application/pdf',
'image/jpeg', 'image/png' or 'image/tiff'.

See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields

:param receipt: JPEG, PNG, PDF and TIFF type file stream or bytes.
Currently only supports US sales receipts.
:type receipt: bytes or IO[bytes]
@@ -155,6 +158,9 @@ async def begin_recognize_receipts_from_url(
"""Extract field text and semantic values from a given US sales receipt.
The input document must be the location (URL) of the receipt to be analyzed.

See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields

:param str receipt_url: The URL of the receipt to analyze. The input must be a valid, encoded URL
of one of the supported formats: JPEG, PNG, PDF and TIFF. Currently only supports
US sales receipts.
5 changes: 4 additions & 1 deletion sdk/formrecognizer/azure-ai-formrecognizer/samples/README.md
@@ -60,6 +60,7 @@ what you can do with the Azure Form Recognizer client library.

|**Advanced Sample File Name**|**Description**|
|----------------|-------------|
|[sample_strongly_typing_recognized_form.py][sample_strongly_typing_recognized_form] and [sample_strongly_typing_recognized_form_async.py][sample_strongly_typing_recognized_form_async]|Use the fields in your recognized forms to create an object with strongly-typed fields|
|[sample_get_bounding_boxes.py][sample_get_bounding_boxes] and [sample_get_bounding_boxes_async.py][sample_get_bounding_boxes_async]|Get info to visualize the outlines of form content and fields, which can be used for manual validation|
|[sample_differentiate_output_models_trained_with_and_without_labels.py][sample_differentiate_output_models_trained_with_and_without_labels] and [sample_differentiate_output_models_trained_with_and_without_labels_async.py][sample_differentiate_output_models_trained_with_and_without_labels_async]|See the differences in output when using a custom model trained with labeled data and one trained with unlabeled data|

@@ -94,4 +95,6 @@ what you can do with the Azure Form Recognizer client library.
[sample_train_model_without_labels]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_train_model_without_labels.py
[sample_train_model_without_labels_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_train_model_without_labels_async.py
[sample_copy_model]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_copy_model.py
[sample_copy_model_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_copy_model_async.py
[sample_strongly_typing_recognized_form]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_strongly_typing_recognized_form.py
[sample_strongly_typing_recognized_form_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_strongly_typing_recognized_form_async.py
@@ -10,7 +10,12 @@
FILE: sample_recognize_receipts_async.py

DESCRIPTION:
This sample demonstrates how to recognize and extract common fields from US receipts,
using a pre-trained receipt model. For a suggested approach to extracting information
from receipts, see sample_strongly_typing_recognized_form_async.py.

See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields

USAGE:
python sample_recognize_receipts_async.py
@@ -10,7 +10,12 @@
FILE: sample_recognize_receipts_from_url_async.py

DESCRIPTION:
This sample demonstrates how to recognize and extract common fields from a US receipt at a URL,
using a pre-trained receipt model. For a suggested approach to extracting information
from receipts, see sample_strongly_typing_recognized_form_async.py.

See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields

USAGE:
python sample_recognize_receipts_from_url_async.py
@@ -0,0 +1,117 @@
# coding: utf-8

# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------

"""
FILE: sample_strongly_typing_recognized_form_async.py

DESCRIPTION:
This sample demonstrates how to use the fields in your recognized forms to create an object with
strongly-typed fields. The pre-trained receipt model is used to illustrate this sample, but
a similar approach can be used for any custom form as long as you update the fields' names
and types accordingly.

See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields

USAGE:
python sample_strongly_typing_recognized_form_async.py

Set the environment variables with your own values before running the sample:
1) AZURE_FORM_RECOGNIZER_ENDPOINT - the endpoint to your Cognitive Services resource.
2) AZURE_FORM_RECOGNIZER_KEY - your Form Recognizer API key.
"""

import os
import asyncio
from azure.ai.formrecognizer import FormField


class Receipt(object):
    """Creates a strongly-typed Receipt object from the fields returned in a RecognizedForm.
    If a specific field is not found on the receipt, its attribute is an empty FormField
    whose value and confidence are None.

    See fields found on a receipt here:
    https://aka.ms/azsdk/python/formrecognizer/receiptfields
    """

    def __init__(self, form):
        self.receipt_type = form.fields.get("ReceiptType", FormField())
        self.merchant_name = form.fields.get("MerchantName", FormField())
        self.merchant_address = form.fields.get("MerchantAddress", FormField())
        self.merchant_phone_number = form.fields.get("MerchantPhoneNumber", FormField())
        self.receipt_items = self.convert_to_receipt_item(form.fields.get("Items", FormField()))
        self.subtotal = form.fields.get("Subtotal", FormField())
        self.tax = form.fields.get("Tax", FormField())
        self.tip = form.fields.get("Tip", FormField())
        self.total = form.fields.get("Total", FormField())
        self.transaction_date = form.fields.get("TransactionDate", FormField())
        self.transaction_time = form.fields.get("TransactionTime", FormField())

    def convert_to_receipt_item(self, items):
        """Converts Items in a receipt to a list of strongly-typed ReceiptItem
        """
        # A missing "Items" field arrives as an empty FormField whose value is None,
        # so check the value (not just the field) before iterating over it.
        if items is None or items.value is None:
            return []
        return [ReceiptItem(item) for item in items.value]


class ReceiptItem(object):
    """Creates a strongly-typed ReceiptItem for every receipt item found in a RecognizedForm
    """

    def __init__(self, item):
        self.name = item.value.get("Name", FormField())
        self.quantity = item.value.get("Quantity", FormField())
        self.price = item.value.get("Price", FormField())
        self.total_price = item.value.get("TotalPrice", FormField())


class StronglyTypedRecognizedFormSampleAsync(object):

    async def strongly_typed_receipt_async(self):
        path_to_sample_forms = os.path.abspath(os.path.join(os.path.abspath(__file__), "..", "..", "./sample_forms/receipt/contoso-allinone.jpg"))

        from azure.core.credentials import AzureKeyCredential
        from azure.ai.formrecognizer.aio import FormRecognizerClient

        endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
        key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

        async with FormRecognizerClient(
            endpoint=endpoint, credential=AzureKeyCredential(key)
        ) as form_recognizer_client:

            with open(path_to_sample_forms, "rb") as f:
                poller = await form_recognizer_client.begin_recognize_receipts(receipt=f)
            receipts = await poller.result()

            for recognized_receipt in receipts:
                # Wrap each RecognizedForm so its fields can be accessed as attributes
                receipt = Receipt(recognized_receipt)
                print("Receipt Type: {} has confidence: {}".format(receipt.receipt_type.value, receipt.receipt_type.confidence))
                print("Merchant Name: {} has confidence: {}".format(receipt.merchant_name.value, receipt.merchant_name.confidence))
                print("Transaction Date: {} has confidence: {}".format(receipt.transaction_date.value, receipt.transaction_date.confidence))
                print("Receipt items:")
                for item in receipt.receipt_items:
                    print("...Item Name: {} has confidence: {}".format(item.name.value, item.name.confidence))
                    print("...Item Quantity: {} has confidence: {}".format(item.quantity.value, item.quantity.confidence))
                    print("...Individual Item Price: {} has confidence: {}".format(item.price.value, item.price.confidence))
                    print("...Total Item Price: {} has confidence: {}".format(item.total_price.value, item.total_price.confidence))
                print("Subtotal: {} has confidence: {}".format(receipt.subtotal.value, receipt.subtotal.confidence))
                print("Tax: {} has confidence: {}".format(receipt.tax.value, receipt.tax.confidence))
                print("Tip: {} has confidence: {}".format(receipt.tip.value, receipt.tip.confidence))
                print("Total: {} has confidence: {}".format(receipt.total.value, receipt.total.confidence))


async def main():
    sample = StronglyTypedRecognizedFormSampleAsync()
    await sample.strongly_typed_receipt_async()


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
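The Receipt and ReceiptItem classes above follow one pattern: look up each expected field name, fall back to an empty field when it is missing, and convert the Items list entry by entry. The sketch below shows that pattern in isolation so it can run without the SDK or a Form Recognizer resource. It assumes a hypothetical `StubField` dataclass standing in for `azure.ai.formrecognizer.FormField` (only `value` and `confidence` are modeled), and the `ReceiptSketch`/`ReceiptItemSketch` names and sample data are illustrative only, not part of the library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StubField:
    """Hypothetical stand-in for FormField: only .value and .confidence are modeled."""
    value: Any = None
    confidence: Optional[float] = None


@dataclass
class ReceiptItemSketch:
    name: StubField
    quantity: StubField
    total_price: StubField

    @classmethod
    def from_field(cls, item: StubField) -> "ReceiptItemSketch":
        # Each entry in Items carries a dict of sub-fields keyed by name.
        sub_fields: Dict[str, StubField] = item.value or {}
        return cls(
            name=sub_fields.get("Name", StubField()),
            quantity=sub_fields.get("Quantity", StubField()),
            total_price=sub_fields.get("TotalPrice", StubField()),
        )


@dataclass
class ReceiptSketch:
    merchant_name: StubField
    total: StubField
    items: List[ReceiptItemSketch] = field(default_factory=list)

    @classmethod
    def from_fields(cls, fields: Dict[str, StubField]) -> "ReceiptSketch":
        # A missing "Items" field falls back to an empty StubField whose value is None,
        # so iterate over `value or []` rather than `value` directly.
        items_field = fields.get("Items", StubField())
        items = [ReceiptItemSketch.from_field(i) for i in (items_field.value or [])]
        return cls(
            merchant_name=fields.get("MerchantName", StubField()),
            total=fields.get("Total", StubField()),
            items=items,
        )


if __name__ == "__main__":
    recognized = {
        "MerchantName": StubField("Example Store", 0.98),
        "Items": StubField([StubField({"Name": StubField("Widget", 0.91)})]),
    }
    receipt = ReceiptSketch.from_fields(recognized)
    print(receipt.merchant_name.value, receipt.total.value, len(receipt.items))  # Example Store None 1
```

Using dataclasses here is a design choice for brevity; the sample's plain classes work the same way and stay compatible with older Python versions.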
@@ -10,7 +10,12 @@
FILE: sample_recognize_receipts.py

DESCRIPTION:
This sample demonstrates how to recognize and extract common fields from US receipts,
using a pre-trained receipt model. For a suggested approach to extracting information
from receipts, see sample_strongly_typing_recognized_form.py.

See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields

USAGE:
python sample_recognize_receipts.py
@@ -10,7 +10,12 @@
FILE: sample_recognize_receipts_from_url.py

DESCRIPTION:
This sample demonstrates how to recognize and extract common fields from a US receipt at a URL,
using a pre-trained receipt model. For a suggested approach to extracting information
from receipts, see sample_strongly_typing_recognized_form.py.

See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields

USAGE:
python sample_recognize_receipts_from_url.py
@@ -0,0 +1,110 @@
# coding: utf-8

# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------

"""
FILE: sample_strongly_typing_recognized_form.py

DESCRIPTION:
This sample demonstrates how to use the fields in your recognized forms to create an object with
strongly-typed fields. The pre-trained receipt model is used to illustrate this sample, but
a similar approach can be used for any custom form as long as you update the fields' names
and types accordingly.

See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields

USAGE:
python sample_strongly_typing_recognized_form.py

Set the environment variables with your own values before running the sample:
1) AZURE_FORM_RECOGNIZER_ENDPOINT - the endpoint to your Cognitive Services resource.
2) AZURE_FORM_RECOGNIZER_KEY - your Form Recognizer API key.
"""

import os
from azure.ai.formrecognizer import FormField


class Receipt(object):
    """Creates a strongly-typed Receipt object from the fields returned in a RecognizedForm.
    If a specific field is not found on the receipt, its attribute is an empty FormField
    whose value and confidence are None.

    See fields found on a receipt here:
    https://aka.ms/azsdk/python/formrecognizer/receiptfields
    """

    def __init__(self, form):
        self.receipt_type = form.fields.get("ReceiptType", FormField())
        self.merchant_name = form.fields.get("MerchantName", FormField())
        self.merchant_address = form.fields.get("MerchantAddress", FormField())
        self.merchant_phone_number = form.fields.get("MerchantPhoneNumber", FormField())
        self.receipt_items = self.convert_to_receipt_item(form.fields.get("Items", FormField()))
        self.subtotal = form.fields.get("Subtotal", FormField())
        self.tax = form.fields.get("Tax", FormField())
        self.tip = form.fields.get("Tip", FormField())
        self.total = form.fields.get("Total", FormField())
        self.transaction_date = form.fields.get("TransactionDate", FormField())
        self.transaction_time = form.fields.get("TransactionTime", FormField())

    def convert_to_receipt_item(self, items):
        """Converts Items in a receipt to a list of strongly-typed ReceiptItem
        """
        # A missing "Items" field arrives as an empty FormField whose value is None,
        # so check the value (not just the field) before iterating over it.
        if items is None or items.value is None:
            return []
        return [ReceiptItem(item) for item in items.value]


class ReceiptItem(object):
    """Creates a strongly-typed ReceiptItem for every receipt item found in a RecognizedForm
    """

    def __init__(self, item):
        self.name = item.value.get("Name", FormField())
        self.quantity = item.value.get("Quantity", FormField())
        self.price = item.value.get("Price", FormField())
        self.total_price = item.value.get("TotalPrice", FormField())


class StronglyTypedRecognizedFormSample(object):

    def strongly_typed_receipt(self):
        path_to_sample_forms = os.path.abspath(os.path.join(os.path.abspath(__file__), "..", "./sample_forms/receipt/contoso-allinone.jpg"))

        from azure.core.credentials import AzureKeyCredential
        from azure.ai.formrecognizer import FormRecognizerClient

        endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
        key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

        form_recognizer_client = FormRecognizerClient(
            endpoint=endpoint, credential=AzureKeyCredential(key)
        )
        with open(path_to_sample_forms, "rb") as f:
            poller = form_recognizer_client.begin_recognize_receipts(receipt=f)
        receipts = poller.result()

        for recognized_receipt in receipts:
            # Wrap each RecognizedForm so its fields can be accessed as attributes
            receipt = Receipt(recognized_receipt)
            print("Receipt Type: {} has confidence: {}".format(receipt.receipt_type.value, receipt.receipt_type.confidence))
            print("Merchant Name: {} has confidence: {}".format(receipt.merchant_name.value, receipt.merchant_name.confidence))
            print("Transaction Date: {} has confidence: {}".format(receipt.transaction_date.value, receipt.transaction_date.confidence))
            print("Receipt items:")
            for item in receipt.receipt_items:
                print("...Item Name: {} has confidence: {}".format(item.name.value, item.name.confidence))
                print("...Item Quantity: {} has confidence: {}".format(item.quantity.value, item.quantity.confidence))
                print("...Individual Item Price: {} has confidence: {}".format(item.price.value, item.price.confidence))
                print("...Total Item Price: {} has confidence: {}".format(item.total_price.value, item.total_price.confidence))
            print("Subtotal: {} has confidence: {}".format(receipt.subtotal.value, receipt.subtotal.confidence))
            print("Tax: {} has confidence: {}".format(receipt.tax.value, receipt.tax.confidence))
            print("Tip: {} has confidence: {}".format(receipt.tip.value, receipt.tip.confidence))
            print("Total: {} has confidence: {}".format(receipt.total.value, receipt.total.confidence))


if __name__ == '__main__':
    sample = StronglyTypedRecognizedFormSample()
    sample.strongly_typed_receipt()
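The DESCRIPTION above notes that the same approach works for a custom form once the field names and types are updated. One way to avoid hand-writing a wrapper class per form is to drive it from a mapping of field name to attribute name and expected Python type. This is a minimal sketch of that alternative, not the sample's own method: `FIELD_SPEC`, its field names, and `build_typed_form` are hypothetical, and it assumes each recognized field exposes `.value` and `.confidence` (as `FormField` does) with the value already parsed into a Python `str`, `date` or `float`. `types.SimpleNamespace` objects stand in for recognized fields so the sketch runs offline.

```python
from datetime import date
from types import SimpleNamespace
from typing import Any, Dict, Mapping, Tuple, Type

# Hypothetical spec for a custom form: recognized field name -> (attribute name, expected type).
FIELD_SPEC: Dict[str, Tuple[str, Type[Any]]] = {
    "CustomerName": ("customer_name", str),
    "InvoiceDate": ("invoice_date", date),
    "Total": ("total", float),
}


def build_typed_form(fields: Mapping[str, Any], spec: Mapping[str, Tuple[str, Type[Any]]]) -> SimpleNamespace:
    """Copy each field listed in `spec` onto a new object as `<attr>` and `<attr>_confidence`.

    Missing fields, and values that are not instances of the expected type, are
    stored as None. Note that isinstance(3, float) is False, so an integer total
    would be rejected by this spec.
    """
    result = SimpleNamespace()
    for field_name, (attr_name, expected_type) in spec.items():
        recognized = fields.get(field_name)
        value = getattr(recognized, "value", None)
        confidence = getattr(recognized, "confidence", None)
        if not isinstance(value, expected_type):
            # Missing field or unexpected type: drop both value and confidence.
            value, confidence = None, None
        setattr(result, attr_name, value)
        setattr(result, attr_name + "_confidence", confidence)
    return result


if __name__ == "__main__":
    recognized = {
        "CustomerName": SimpleNamespace(value="Example Corp", confidence=0.95),
        "Total": SimpleNamespace(value="not a number", confidence=0.4),
    }
    form = build_typed_form(recognized, FIELD_SPEC)
    print(form.customer_name, form.invoice_date, form.total)  # Example Corp None None
```

The trade-off versus the explicit Receipt class is that attribute names are no longer visible to IDEs or type checkers; the explicit class is easier to read, while the spec-driven version scales better across many custom models.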
@@ -110,3 +110,8 @@ def test_sample_train_model_with_labels(self, resource_group, location, form_rec
    def test_sample_train_model_without_labels(self, resource_group, location, form_recognizer_account, form_recognizer_account_key):
        os.environ['CONTAINER_SAS_URL'] = self.get_settings_value("FORM_RECOGNIZER_STORAGE_CONTAINER_SAS_URL")
        _test_file('sample_train_model_without_labels.py', form_recognizer_account, form_recognizer_account_key)

    @pytest.mark.live_test_only
    @GlobalFormRecognizerAccountPreparer()
    def test_sample_strongly_typing_recognized_form(self, resource_group, location, form_recognizer_account, form_recognizer_account_key):
        _test_file('sample_strongly_typing_recognized_form.py', form_recognizer_account, form_recognizer_account_key)
@@ -108,3 +108,7 @@ def test_sample_train_model_without_labels_async(self, resource_group, location,
        os.environ['CONTAINER_SAS_URL'] = self.get_settings_value("FORM_RECOGNIZER_STORAGE_CONTAINER_SAS_URL")
        _test_file('sample_train_model_without_labels_async.py', form_recognizer_account, form_recognizer_account_key)

    @pytest.mark.live_test_only
    @GlobalFormRecognizerAccountPreparer()
    def test_sample_strongly_typing_recognized_form_async(self, resource_group, location, form_recognizer_account, form_recognizer_account_key):
        _test_file('sample_strongly_typing_recognized_form_async.py', form_recognizer_account, form_recognizer_account_key)