It's easy to fetch all search results accidentally with real_search. #929

Open · wants to merge 3 commits into base: develop
13 changes: 13 additions & 0 deletions vumi/persist/model.py
@@ -474,6 +474,19 @@ def real_search(cls, manager, query, rows=None, start=None):
         """
         Performs a real riak search, does no inspection on the given query.

+        :param Manager manager:
+            The model manager to use for the query.
+        :param str query:
+            The query to perform.
+        :param int rows:
+            The maximum number of search results to return. The default
+            of ``None`` indicates a backend specific number of rows
+            (usually 1000).
+        :param int start:
+            Number of the first search result to return. The default of
+            ``None`` is equivalent to ``0`` and starts with the first
+            search result.
+
         :returns: list of keys.
         """
         return manager.real_search(cls, query, rows=rows, start=start)
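
With this change real_search no longer iterates over every page when start is None, so a caller that wants all matching keys has to page explicitly. A minimal caller-side sketch of that pattern, assuming simple_model is a model proxy bound to the Twisted manager (as in the updated test below); fetch_all_keys is a hypothetical helper, not part of this PR:

from twisted.internet.defer import inlineCallbacks, returnValue

@inlineCallbacks
def fetch_all_keys(simple_model, query, page_size=1000):
    # Hypothetical helper: page through the results explicitly, since
    # real_search now returns at most `rows` keys per call, starting at
    # `start`.
    keys = []
    while True:
        page = yield simple_model.real_search(
            query, rows=page_size, start=len(keys))
        keys.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            break
    returnValue(keys)
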
10 changes: 2 additions & 8 deletions vumi/persist/riak_manager.py
@@ -293,16 +293,10 @@ def _search_iteration(self, bucket, query, rows, start):

     def real_search(self, modelcls, query, rows=None, start=None):
         rows = 1000 if rows is None else rows
+        start = 0 if start is None else start

Member:

I'd prefer to throw an exception if we get start=None so we can find places that are using this incorrectly and fix them instead of having them silently return fewer results than they did before.

Contributor Author:

I think raising an exception for the default value is a bit crazy? Everywhere I can find that uses real_search currently passes in a start value and expects it to be honoured.

Member:

If this were a new method, I'd completely agree. What we're doing here is changing the behaviour of the default case, and throwing an error for the now-unsupported behaviour (in any places it's used that we haven't found) is better than silently returning fewer results than we had before. We'd have to document it clearly, of course -- probably in the exception.

(If start were before rows in the parameter list, I'd suggest removing the default value and making it mandatory.)

Contributor Author:

I don't think these other places exist? And if there are a couple of places we can fix them when we find them?

The situation in which the current implementation returns more results is a bit of a corner case anyway, since it requires there to be more results than rows (so more than 1000 by default), but few enough to return in a reasonable time frame.

The cost of raising the exception is having yet another ugly function in the code base that we have to clean up later.

Member:

> I don't think these other places exist? And if there are a couple of places we can fix them when we find them?

How will we find them if we don't raise an exception?

> The cost of raising the exception is having yet another ugly function in the code base that we have to clean up later.

The cost of not raising the exception is potentially introducing subtle data corruption bugs into existing code. Having been on the receiving end of that kind of change more than once, I'll take the ugly function every time.

         bucket_name = self.bucket_name(modelcls)
         bucket = self.client.bucket(bucket_name)
-        if start is not None:
-            return self._search_iteration(bucket, query, rows, start)
-        keys = []
-        new_keys = self._search_iteration(bucket, query, rows, 0)
-        while new_keys:
-            keys.extend(new_keys)
-            new_keys = self._search_iteration(bucket, query, rows, len(keys))
-        return keys
+        return self._search_iteration(bucket, query, rows, start)

     def riak_enable_search(self, modelcls):
         bucket_name = self.bucket_name(modelcls)
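
For reference, a sketch of the alternative raised in the review thread above (hypothetical, not part of this PR): raise an exception instead of silently defaulting start to 0, so any remaining caller that relied on the old fetch-everything behaviour fails loudly rather than quietly getting fewer results. The exception type and message here are illustrative:

    def real_search(self, modelcls, query, rows=None, start=None):
        # Hypothetical variant sketched from the review discussion above.
        if start is None:
            raise ValueError(
                "real_search() no longer fetches all results when start is"
                " None; pass an explicit start and page through the results.")
        rows = 1000 if rows is None else rows
        bucket_name = self.bucket_name(modelcls)
        bucket = self.client.bucket(bucket_name)
        return self._search_iteration(bucket, query, rows, start)
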
13 changes: 12 additions & 1 deletion vumi/persist/tests/test_model.py
@@ -429,7 +429,18 @@ def test_big_real_search(self):
         yield simple_model("yy000001", a=98, b=u'def').save()
         yield simple_model("yy000002", a=98, b=u'ghi').save()

-        search = lambda q: simple_model.real_search(q, rows=11)
+        @inlineCallbacks
+        def search(q):
+            results = []
+            while True:
+                new_results = yield simple_model.real_search(
+                    q, rows=11, start=len(results))
+                self.assertTrue(len(new_results) <= 11)
+                results.extend(new_results)
+                if len(new_results) < 11:
+                    break
+            returnValue(results)

         yield self.assert_search_results(keys, search, 'a:99')

     @Manager.calls_manager
13 changes: 2 additions & 11 deletions vumi/persist/txriak_manager.py
@@ -324,21 +324,12 @@ def _search_iteration(self, bucket, query, rows, start):
         d.addCallback(lambda r: [doc["id"] for doc in r["docs"]])
         return d

-    @inlineCallbacks
     def real_search(self, modelcls, query, rows=None, start=None):
         rows = 1000 if rows is None else rows
+        start = 0 if start is None else start
         bucket_name = self.bucket_name(modelcls)
         bucket = self.client.bucket(bucket_name)
-        if start is not None:
-            keys = yield self._search_iteration(bucket, query, rows, start)
-            returnValue(keys)
-        keys = []
-        new_keys = yield self._search_iteration(bucket, query, rows, 0)
-        while new_keys:
-            keys.extend(new_keys)
-            new_keys = yield self._search_iteration(
-                bucket, query, rows, len(keys))
-        returnValue(keys)
+        return self._search_iteration(bucket, query, rows, start)

     def riak_enable_search(self, modelcls):
         bucket_name = self.bucket_name(modelcls)
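
A note on the txriak change above: once the iteration loop is gone, @inlineCallbacks is no longer needed because _search_iteration already returns a Deferred that fires with the list of keys, so real_search can hand that Deferred straight back. A small usage sketch, assuming simple_model is a model proxy bound to a TxRiakManager (the helper name and query are illustrative):

from twisted.internet.defer import inlineCallbacks

@inlineCallbacks
def print_first_page(simple_model):
    # real_search returns the Deferred from _search_iteration directly;
    # yielding it gives one page of matching keys (at most `rows` of them).
    keys = yield simple_model.real_search('a:99', rows=11, start=0)
    print("got %d keys" % (len(keys),))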