-
Notifications
You must be signed in to change notification settings - Fork 3k
Search advanced batching #12622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Search advanced batching #12622
Conversation
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run python - search - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run python - search |
/azp run python - search - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run python - search |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run python - search - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
@@ -47,11 +51,12 @@ def add_upload_actions(self, *documents): | |||
:param documents: Documents to upload to an Azure search index. May be | |||
a single list of documents, or documents as individual parameters. | |||
:type documents: dict or list[dict] | |||
:rtype: List[IndexAction] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no description as to what the method returns (other than the type). Does it return the added actions or the total set of actions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns newly added actions. I will update the docstring
|
||
:rtype: List[IndexAction] | ||
""" | ||
return list(self._succeeded_actions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Properties generally return the same instance every time. Here we are creating a new list every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my intension. I don't want succeeded_actions to be modified outside of this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I would make it into a function rather than a property. Making it a property implies that modifications to the returned list are persisted...
index_documents = IndexBatch(actions=batch.actions) | ||
return self._index_documents_actions(actions=batch.actions, **kwargs) | ||
|
||
@distributed_trace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally don't provide spans for internal functions. In this case, it also leads to a nested span even in the non-split case.
return self._index_documents_actions(actions=batch.actions, **kwargs) | ||
|
||
@distributed_trace | ||
def _index_documents_actions(self, actions, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any custom handling of timeouts. When we have multiple calls involved, we should try to avoid multiplying any timeout values passed in....
error_map=error_map, | ||
**kwargs | ||
) | ||
if batch_response_second_half: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is this False
y?
@@ -525,11 +532,41 @@ def index_documents(self, batch, **kwargs): | |||
:type batch: IndexDocumentsBatch | |||
:rtype: List[IndexingResult] | |||
""" | |||
index_documents = IndexBatch(actions=batch.actions) | |||
return self._index_documents_actions(actions=batch.actions, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider documenting what kind of exceptions this method can throw - which leads to the question "should the RequestEntityTooLarge" exception be public or not. And if not, should the type be introduced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. I added this into APIView.
# type: (*str, IndexAction, dict) -> None | ||
pass | ||
|
||
class SearchIndexDocumentBatchingClient(HeadersMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johanste Any concerns about the name?
No description provided.