Skip to content

Commit 39532dd

Browse files
authored
[SchemaRegistry] update typehints/bug bash fixes (Azure#24112)
* update typehints * update typehints + version * update typevar name * make group_name optional in constructor * bug bash readme updates * samples docstring/readme/MessageContent docstring * update changelog * libbas comments
1 parent 5757aea commit 39532dd

21 files changed

+618
-128
lines changed

sdk/schemaregistry/azure-schemaregistry-avroencoder/CHANGELOG.md

Lines changed: 24 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,38 @@
11
# Release History
22

3-
## 1.0.0b4 (Unreleased)
3+
## 1.0.0 (Unreleased)
4+
5+
**Note:** This is the first stable release of our efforts to create a user-friendly and Pythonic client library for Azure Schema Registry.
46

57
### Features Added
68

9+
- `AvroEncoder` sync and async classes provide the functionality to encode and decode content which follows a schema with the RecordSchema format, as defined by the Apache Avro specification. The Apache Avro library is used as the implementation for encoding and decoding.
10+
The encoder will automatically register and retrieve schemas from Azure Schema Registry Service. It provides the following methods:
11+
- constructor: If `auto_register=True` keyword is passed in, will automatically register schemas passed in to the `encode` method. Otherwise, and by default, will require pre-registering of schemas passed to `encode`. Takes a `group_name` argument that is optional when decoding, but required for encoding.
12+
- `encode`: Encodes dict content into bytes according to the given schema and registers schema if needed. Returns either a dict of encoded content and corresponding content type or a `MessageType` subtype object, depending on arguments provided.
13+
- `decode`: Decodes bytes content into dict content by automatically retrieving schema from the service.
14+
- `MessageContent` TypedDict has been introduced with the following required keys:
15+
- `content`: The bytes content.
16+
- `content_type`: The string content type, which holds the schema ID and the record format indicator.
17+
- `MessageType` has been introduced with the following methods:
18+
- `from_message_content`: Class method that creates an object with given bytes content and string content type.
19+
- `__message_content__`: Returns a `MessageContent` object with content and content type values set to their respective properties on the object.
20+
- Schemas and Schema IDs are cached locally, so that multiple calls with the same schema/schema ID will not trigger multiple service calls.
21+
- The number of hits, misses, and total entries for the schema/schema ID caches will be logged at an info level when a new entry is added.
22+
- `InvalidContentError` has been introduced for errors related to invalid content and content types, where `__cause__` will contain the underlying exception raised by the Avro library.
23+
- `InvalidSchemaError` has been introduced for errors related to invalid schemas, where `__cause__` will contain the underlying exception raised by the Apache Avro library.
24+
- The `encode` and `decode` methods on `AvroEncoder` support the following message models:
25+
- `azure.eventhub.EventData` in `azure-eventhub>=5.9.0`
26+
727
### Breaking Changes
828

929
### Bugs Fixed
1030

1131
### Other Changes
1232

33+
- This package is meant to replace the azure-schemaregistry-avroserializer package, which will no longer be supported.
34+
- `group_name` is now an optional parameter in the sync and async `AvroEncoder` constructors.
35+
1336
## 1.0.0b3 (2022-04-05)
1437

1538
### Breaking Changes

sdk/schemaregistry/azure-schemaregistry-avroencoder/README.md

Lines changed: 31 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -14,10 +14,10 @@ _Azure SDK Python packages support for Python 2.7 has ended 01 January 2022. For
1414

1515
### Install the package
1616

17-
Install the Azure Schema Registry Avro Encoder client library and Azure Identity client library for Python with [pip][pip]:
17+
Install the Azure Schema Registry Avro Encoder client library for Python with [pip][pip]:
1818

1919
```Bash
20-
pip install azure-schemaregistry-avroencoder azure-identity
20+
pip install azure-schemaregistry-avroencoder
2121
```
2222

2323
### Prerequisites:
@@ -47,14 +47,15 @@ pip install aiohttp
4747
**Create AvroEncoder using the azure-schemaregistry library:**
4848

4949
```python
50+
import os
5051
from azure.schemaregistry import SchemaRegistryClient
5152
from azure.schemaregistry.encoder.avroencoder import AvroEncoder
5253
from azure.identity import DefaultAzureCredential
5354

5455
credential = DefaultAzureCredential()
5556
# Namespace should be similar to: '<your-eventhub-namespace>.servicebus.windows.net'
56-
fully_qualified_namespace = '<< FULLY QUALIFIED NAMESPACE OF THE SCHEMA REGISTRY >>'
57-
group_name = '<< GROUP NAME OF THE SCHEMA >>'
57+
fully_qualified_namespace = os.environ['SCHEMAREGISTRY_FULLY_QUALIFIED_NAMESPACE']
58+
group_name = os.environ['SCHEMAREGISTRY_GROUP']
5859
schema_registry_client = SchemaRegistryClient(fully_qualified_namespace, credential)
5960
encoder = AvroEncoder(client=schema_registry_client, group_name=group_name)
6061
```
@@ -70,11 +71,11 @@ content type with schema ID. Uses [SchemaRegistryClient][schemaregistry_client]
7071

7172
Support has been added to certain Azure Messaging SDK model classes for interoperability with the `AvroEncoder`. These models are subtypes of the `MessageType` protocol defined under the `azure.schemaregistry.encoder.avroencoder` namespace. Currently, the supported model classes are:
7273

73-
- `azure.eventhub.EventData` for `azure-eventhub==5.9.0b3`
74+
- `azure.eventhub.EventData` for `azure-eventhub>=5.9.0`
7475

7576
### Message format
7677

77-
If a message type that follows the MessageType protocol is provided to the encoder, it will encode the corresponding content and content type properties as follows:
78+
If a message type that follows the MessageType protocol is provided to the encoder for encoding, it will set the corresponding content and content type properties, where:
7879

7980
- `content`: Avro payload (in general, format-specific payload)
8081
- Avro Binary Encoding
@@ -86,6 +87,10 @@ If a message type that follows the MessageType protocol is provided to the encod
8687
- `avro/binary` is the format indicator
8788
- `<schema ID>` is the hexadecimal representation of GUID, same format and byte order as the string from the Schema Registry service.
8889

90+
If `EventData` is passed in as the message type, the following properties will be set on the `EventData` object:
91+
- The `body` property will be set to the content value.
92+
- The `content_type` property will be set to the content type value.
93+
8994
If message type is not provided, and by default, the encoder will create the following dict:
9095
`{"content": <Avro encoded payload>, "content_type": 'avro/binary+<schema ID>' }`
9196

@@ -100,8 +105,8 @@ The following sections provide several code snippets covering some of the most c
100105

101106
### Encoding
102107

103-
Use `AvroEncoder.encode` method to encode dict content with the given Avro schema.
104-
The method will use a schema previously registered to the Schema Registry service and keep the schema cached for future encoding usage. It is also possible to avoid pre-registering the schema to the service and automatically register with the `encode` method by instantiating the `AvroEncoder` with the keyword argument `auto_register=True`.
108+
Use the `AvroEncoder.encode` method to encode content with the given Avro schema.
109+
The method will use a schema previously registered to the Schema Registry service and keep the schema cached for future encoding usage. In order to avoid pre-registering the schema to the service and automatically register it with the `encode` method, the keyword argument `auto_register=True` should be passed to the `AvroEncoder` constructor.
105110

106111
```python
107112
import os
@@ -112,7 +117,7 @@ from azure.eventhub import EventData
112117

113118
token_credential = DefaultAzureCredential()
114119
fully_qualified_namespace = os.environ['SCHEMAREGISTRY_FULLY_QUALIFIED_NAMESPACE']
115-
group_name = "<your-group-name>"
120+
group_name = os.environ['SCHEMAREGISTRY_GROUP']
116121
name = "example.avro.User"
117122
format = "Avro"
118123

@@ -128,7 +133,7 @@ definition = """
128133
}"""
129134

130135
schema_registry_client = SchemaRegistryClient(fully_qualified_namespace, token_credential)
131-
schema_register_client.register(group_name, name, definition, format)
136+
schema_registry_client.register_schema(group_name, name, definition, format)
132137
encoder = AvroEncoder(client=schema_registry_client, group_name=group_name)
133138

134139
with encoder:
@@ -143,7 +148,7 @@ with encoder:
143148

144149
### Decoding
145150

146-
Use `AvroEncoder.decode` method to decode the bytes value into dict content by either:
151+
Use the `AvroEncoder.decode` method to decode the Avro-encoded content by either:
147152
- Passing in a message object that is a subtype of the MessageType protocol.
148153
- Passing in a dict with keys `content`(type bytes) and `content_type` (type string).
149154
The method automatically retrieves the schema from the Schema Registry Service and keeps the schema cached for future decoding usage.
@@ -159,10 +164,12 @@ fully_qualified_namespace = os.environ['SCHEMAREGISTRY_FULLY_QUALIFIED_NAMESPACE
159164
group_name = "<your-group-name>"
160165

161166
schema_registry_client = SchemaRegistryClient(fully_qualified_namespace, token_credential)
162-
encoder = AvroEncoder(client=schema_registry_client, group_name=group_name)
167+
encoder = AvroEncoder(client=schema_registry_client)
163168

164169
with encoder:
165170
# event_data is an EventData object with Avro encoded body
171+
dict_content = {"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
172+
event_data = encoder.encode(dict_content, schema=definition, message_type=EventData)
166173
decoded_content = encoder.decode(event_data)
167174

168175
# OR
@@ -175,7 +182,7 @@ with encoder:
175182

176183
### Event Hubs Sending Integration
177184

178-
Integration with [Event Hubs][eventhubs_repo] to send encoded Avro dict content as the body of EventData.
185+
Integration with [Event Hubs][eventhubs_repo] to send an `EventData` object with `body` set to Avro-encoded content and corresponding `content_type`.
179186

180187
```python
181188
import os
@@ -186,7 +193,7 @@ from azure.identity import DefaultAzureCredential
186193

187194
token_credential = DefaultAzureCredential()
188195
fully_qualified_namespace = os.environ['SCHEMAREGISTRY_FULLY_QUALIFIED_NAMESPACE']
189-
group_name = "<your-group-name>"
196+
group_name = os.environ['SCHEMAREGISTRY_GROUP']
190197
eventhub_connection_str = os.environ['EVENT_HUB_CONN_STR']
191198
eventhub_name = os.environ['EVENT_HUB_NAME']
192199

@@ -219,7 +226,7 @@ with eventhub_producer, avro_encoder:
219226

220227
### Event Hubs Receiving Integration
221228

222-
Integration with [Event Hubs][eventhubs_repo] to receive `EventData` and decoded raw bytes into Avro dict content.
229+
Integration with [Event Hubs][eventhubs_repo] to receive an `EventData` object and decode the the Avro-encoded `body` value.
223230

224231
```python
225232
import os
@@ -230,7 +237,7 @@ from azure.identity import DefaultAzureCredential
230237

231238
token_credential = DefaultAzureCredential()
232239
fully_qualified_namespace = os.environ['SCHEMAREGISTRY_FULLY_QUALIFIED_NAMESPACE']
233-
group_name = "<your-group-name>"
240+
group_name = os.environ['SCHEMAREGISTRY_GROUP']
234241
eventhub_connection_str = os.environ['EVENT_HUB_CONN_STR']
235242
eventhub_name = os.environ['EVENT_HUB_NAME']
236243

@@ -254,7 +261,7 @@ with eventhub_consumer, avro_encoder:
254261

255262
### General
256263

257-
Azure Schema Registry Avro Encoder raises exceptions defined in [Azure Core][azure_core].
264+
Azure Schema Registry Avro Encoder raises exceptions defined in [Azure Core][azure_core] if errors are encountered when communicating with the Schema Registry service. Errors related to invalid content/content types and invalid schemas will be raised as `azure.schemaregistry.encoder.avroencoder.InvalidContentError` and `azure.schemaregistry.encoder.avroencoder.InvalidSchemaError`, respectively, where `__cause__` will contain the underlying exception raised by the Apache Avro library.
258265

259266
### Logging
260267
This library uses the standard
@@ -266,6 +273,7 @@ Detailed DEBUG level logging, including request/response bodies and unredacted
266273
headers, can be enabled on a client with the `logging_enable` argument:
267274
```python
268275
import sys
276+
import os
269277
import logging
270278
from azure.schemaregistry import SchemaRegistryClient
271279
from azure.schemaregistry.encoder.avroencoder import AvroEncoder
@@ -279,23 +287,25 @@ logger.setLevel(logging.DEBUG)
279287
handler = logging.StreamHandler(stream=sys.stdout)
280288
logger.addHandler(handler)
281289

290+
fully_qualified_namespace = os.environ['SCHEMAREGISTRY_FULLY_QUALIFIED_NAMESPACE']
291+
group_name = os.environ['SCHEMAREGISTRY_GROUP']
282292
credential = DefaultAzureCredential()
283-
schema_registry_client = SchemaRegistryClient("<your-fully_qualified_namespace>", credential, logging_enable=True)
293+
schema_registry_client = SchemaRegistryClient(fully_qualified_namespace, credential, logging_enable=True)
284294
# This client will log detailed information about its HTTP sessions, at DEBUG level
285-
encoder = AvroEncoder(client=schema_registry_client, group_name="<your-group-name>")
295+
encoder = AvroEncoder(client=schema_registry_client, group_name=group_name)
286296
```
287297

288298
Similarly, `logging_enable` can enable detailed logging for a single operation,
289299
even when it isn't enabled for the client:
290300
```py
291-
encoder.encode(dict_content, schema=schema_definition, logging_enable=True)
301+
encoder.encode(dict_content, schema=definition, logging_enable=True)
292302
```
293303

294304
## Next steps
295305

296306
### More sample code
297307

298-
Please find further examples in the [samples][sr_avro_samples] directory demonstrating common Azure Schema Registry Avro Encoder scenarios.
308+
Further examples demonstrating common Azure Schema Registry Avro Encoder scenarios are in the [samples][sr_avro_samples] directory.
299309

300310
## Contributing
301311

sdk/schemaregistry/azure-schemaregistry-avroencoder/azure/schemaregistry/encoder/avroencoder/_message_protocol.py

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -25,8 +25,8 @@ def from_message_content(
2525
cls, content: bytes, content_type: str, **kwargs: Any
2626
) -> "MessageType":
2727
"""
28-
Creates an object that is a subtype of MessageType given content type and
29-
a content value to be set as body.
28+
Creates an object that is a subtype of MessageType, given content type and
29+
a content value to be set on the object.
3030
3131
:param bytes content: The content value to be set as the body of the message.
3232
:param str content_type: The content type to be set on the message.
@@ -35,4 +35,10 @@ def from_message_content(
3535
...
3636

3737
def __message_content__(self) -> MessageContent:
38+
"""
39+
A MessageContent object, with `content` and `content_type` set to
40+
the values of their respective properties on the MessageType object.
41+
42+
:rtype: ~azure.schemaregistry.encoder.avroencoder.MessageContent
43+
"""
3844
...

sdk/schemaregistry/azure-schemaregistry-avroencoder/azure/schemaregistry/encoder/avroencoder/_schema_registry_avro_encoder.py

Lines changed: 17 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -33,41 +33,39 @@
3333
Optional,
3434
Type,
3535
overload,
36-
TypeVar,
3736
Union,
3837
)
3938
from ._utils import ( # pylint: disable=import-error
4039
validate_schema,
4140
create_message_content,
4241
validate_message,
4342
decode_content,
43+
MessageType
4444
)
4545

4646
from ._apache_avro_encoder import ( # pylint: disable=import-error
4747
ApacheAvroObjectEncoder as AvroObjectEncoder,
4848
)
4949
from ._message_protocol import ( # pylint: disable=import-error
5050
MessageContent,
51-
MessageType,
5251
)
5352

5453
if TYPE_CHECKING:
5554
from azure.schemaregistry import SchemaRegistryClient
5655

5756
_LOGGER = logging.getLogger(__name__)
5857

59-
MessageTypeT = TypeVar("MessageTypeT", bound=MessageType)
60-
6158

6259
class AvroEncoder(object):
6360
"""
6461
AvroEncoder provides the ability to encode and decode content according
6562
to the given avro schema. It would automatically register, get and cache the schema.
6663
67-
:keyword client: Required. The schema registry client
68-
which is used to register schema and retrieve schema from the service.
64+
:keyword client: Required. The schema registry client which is used to register schema
65+
and retrieve schema from the service.
6966
:paramtype client: ~azure.schemaregistry.SchemaRegistryClient
70-
:keyword str group_name: Required. Schema group under which schema should be registered.
67+
:keyword Optional[str] group_name: Required for encoding. Not used for decoding.
68+
Schema group under which schema should be registered.
7169
:keyword bool auto_register: When true, register new schemas passed to encode.
7270
Otherwise, and by default, encode will fail if the schema has not been pre-registered in the registry.
7371
@@ -76,13 +74,13 @@ class AvroEncoder(object):
7674
def __init__(self, **kwargs):
7775
# type: (Any) -> None
7876
try:
79-
self._schema_group = kwargs.pop("group_name")
8077
self._schema_registry_client = kwargs.pop(
8178
"client"
8279
) # type: "SchemaRegistryClient"
8380
except KeyError as exc:
84-
raise TypeError("'{}' is a required keyword.".format(exc.args[0]))
81+
raise TypeError(f"'{exc.args[0]}' is a required keyword.")
8582
self._avro_encoder = AvroObjectEncoder(codec=kwargs.get("codec"))
83+
self._schema_group = kwargs.pop("group_name", None)
8684
self._auto_register = kwargs.get("auto_register", False)
8785
self._auto_register_schema_func = (
8886
self._schema_registry_client.register_schema
@@ -146,10 +144,10 @@ def encode(
146144
content: Mapping[str, Any],
147145
*,
148146
schema: str,
149-
message_type: Type[MessageTypeT],
147+
message_type: Type[MessageType],
150148
request_options: Optional[Dict[str, Any]] = None,
151149
**kwargs: Any,
152-
) -> MessageTypeT:
150+
) -> MessageType:
153151
...
154152

155153
@overload
@@ -166,13 +164,13 @@ def encode(
166164

167165
def encode(
168166
self,
169-
content,
167+
content: Mapping[str, Any],
170168
*,
171-
schema,
172-
message_type=None,
173-
request_options=None,
174-
**kwargs,
175-
):
169+
schema: str,
170+
message_type: Optional[Type[MessageType]] = None,
171+
request_options: Optional[Dict[str, Any]] = None,
172+
**kwargs: Any,
173+
) -> Union[MessageType, MessageContent]:
176174
"""
177175
Encode content with the given schema. Create content type value, which consists of the Avro Mime Type string
178176
and the schema ID corresponding to given schema. If provided with a MessageType subtype, encoded content
@@ -202,6 +200,8 @@ def encode(
202200
"""
203201

204202
raw_input_schema = schema
203+
if not self._schema_group:
204+
raise TypeError("'group_name' in constructor cannot be None, if encoding.")
205205
schema_fullname = validate_schema(self._avro_encoder, raw_input_schema)
206206

207207
cache_misses = (

sdk/schemaregistry/azure-schemaregistry-avroencoder/azure/schemaregistry/encoder/avroencoder/_utils.py

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
# --------------------------------------------------------------------------------------------
55

66
from io import BytesIO
7-
from typing import TYPE_CHECKING, Any, Dict, Mapping, Optional, Type, Union, cast
7+
from typing import TYPE_CHECKING, Any, Dict, Mapping, Optional, Type, Union, cast, TypeVar
88
from avro.errors import SchemaResolutionException # type: ignore
99

1010
from ._exceptions import ( # pylint: disable=import-error
@@ -13,7 +13,7 @@
1313
)
1414
from ._message_protocol import ( # pylint: disable=import-error
1515
MessageContent,
16-
MessageType,
16+
MessageType as MessageTypeProtocol,
1717
)
1818
from ._constants import ( # pylint: disable=import-error
1919
AVRO_MIME_TYPE,
@@ -24,6 +24,8 @@
2424
ApacheAvroObjectEncoder as AvroObjectEncoder,
2525
)
2626

27+
MessageType = TypeVar("MessageType", bound=MessageTypeProtocol)
28+
2729

2830
def validate_schema(avro_encoder: "AvroObjectEncoder", raw_input_schema: str):
2931
try:
@@ -61,7 +63,7 @@ def create_message_content(
6163

6264
if message_type:
6365
try:
64-
return message_type.from_message_content(payload, content_type, **kwargs)
66+
return cast(MessageType, message_type.from_message_content(payload, content_type, **kwargs))
6567
except AttributeError as exc:
6668
raise TypeError(
6769
f"""Cannot set content and content type on model object. The content model

sdk/schemaregistry/azure-schemaregistry-avroencoder/azure/schemaregistry/encoder/avroencoder/_version.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,4 +24,4 @@
2424
#
2525
# --------------------------------------------------------------------------
2626

27-
VERSION = "1.0.0b4"
27+
VERSION = "1.0.0"

0 commit comments

Comments
 (0)