Skip to content

Search advanced batching #12622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Aug 7, 2020
Merged

Search advanced batching #12622

merged 37 commits into from
Aug 7, 2020

Conversation

xiangyan99
Copy link
Member

No description provided.

@ghost ghost added the Search label Jul 20, 2020
@xiangyan99 xiangyan99 marked this pull request as ready for review August 6, 2020 16:05
@xiangyan99 xiangyan99 requested a review from rakshith91 as a code owner August 6, 2020 16:05
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@xiangyan99
Copy link
Member Author

/azp run python - search - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

rakshith91
rakshith91 previously approved these changes Aug 7, 2020
@xiangyan99
Copy link
Member Author

/azp run python - search

@xiangyan99
Copy link
Member Author

/azp run python - search - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@xiangyan99 xiangyan99 requested a review from rakshith91 August 7, 2020 22:11
@xiangyan99
Copy link
Member Author

/azp run python - search

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@xiangyan99
Copy link
Member Author

/azp run python - search - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@xiangyan99 xiangyan99 merged commit 919ce75 into master Aug 7, 2020
@xiangyan99 xiangyan99 deleted the search_advanced_batching branch August 7, 2020 22:55
@@ -47,11 +51,12 @@ def add_upload_actions(self, *documents):
:param documents: Documents to upload to an Azure search index. May be
a single list of documents, or documents as individual parameters.
:type documents: dict or list[dict]
:rtype: List[IndexAction]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no description as to what the method returns (other than the type). Does it return the added actions or the total set of actions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns newly added actions. I will update the docstring


:rtype: List[IndexAction]
"""
return list(self._succeeded_actions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Properties generally return the same instance every time. Here we are creating a new list every time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my intension. I don't want succeeded_actions to be modified outside of this class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I would make it into a function rather than a property. Making it a property implies that modifications to the returned list are persisted...

index_documents = IndexBatch(actions=batch.actions)
return self._index_documents_actions(actions=batch.actions, **kwargs)

@distributed_trace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally don't provide spans for internal functions. In this case, it also leads to a nested span even in the non-split case.

return self._index_documents_actions(actions=batch.actions, **kwargs)

@distributed_trace
def _index_documents_actions(self, actions, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any custom handling of timeouts. When we have multiple calls involved, we should try to avoid multiplying any timeout values passed in....

error_map=error_map,
**kwargs
)
if batch_response_second_half:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is this Falsey?

@@ -525,11 +532,41 @@ def index_documents(self, batch, **kwargs):
:type batch: IndexDocumentsBatch
:rtype: List[IndexingResult]
"""
index_documents = IndexBatch(actions=batch.actions)
return self._index_documents_actions(actions=batch.actions, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider documenting what kind of exceptions this method can throw - which leads to the question "should the RequestEntityTooLarge" exception be public or not. And if not, should the type be introduced?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. I added this into APIView.

# type: (*str, IndexAction, dict) -> None
pass

class SearchIndexDocumentBatchingClient(HeadersMixin):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johanste Any concerns about the name?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants