
test_reshape only tests zero-dim arrays #307


Closed
ev-br opened this issue Nov 16, 2024 · 4 comments

@ev-br
Member

ev-br commented Nov 16, 2024

Applying

diff --git a/array_api_tests/test_manipulation_functions.py b/array_api_tests/test_manipulation_functions.py
index b8a919c..4091d62 100644
--- a/array_api_tests/test_manipulation_functions.py
+++ b/array_api_tests/test_manipulation_functions.py
@@ -360,13 +360,13 @@ def reshape_shapes(draw, shape):
 
 
 @pytest.mark.unvectorized
-@pytest.mark.skip("flaky")  # TODO: fix!
 @given(
     x=hh.arrays(dtype=hh.all_dtypes, shape=hh.shapes(max_side=MAX_SIDE)),
     data=st.data(),
 )
 def test_reshape(x, data):
     shape = data.draw(reshape_shapes(x.shape))
+    assume(shape != ())
 
     out = xp.reshape(x, shape)

produces

    @pytest.mark.unvectorized
>   @given(
        x=hh.arrays(dtype=hh.all_dtypes, shape=hh.shapes(max_side=MAX_SIDE)),
        data=st.data(),
    )
E   hypothesis.errors.FailedHealthCheck: It looks like your strategy is filtering out a lot of data. Health check found 50 filtered examples but only 0 good ones. This will make your tests much slower, and also will probably distort the data generation quite a lot. You should adapt your strategy to filter less. This can also be caused by a low max_leaves parameter in recursive() calls
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.filter_too_much to the suppress_health_check settings for this test.

array_api_tests/test_manipulation_functions.py:363: FailedHealthCheck
----------------------------------------------------------------------------------------------- Hypothesis ------------------------------------------------------------------------------------------------
You can add @seed(155687042254395097095124787400607549680) to this test or run pytest with --hypothesis-seed=155687042254395097095124787400607549680 to reproduce this failure.
@asmeurer
Copy link
Member

assume(shape != ()) makes hypothesis reject every example where shape == (). In other words, it skips the test whenever the input array is 0-D. We don't want to do that, since 0-D inputs are valid for reshape.

The first question I would try to answer for this particular issue is why the test was marked flaky and skipped in the first place. If you just remove the skip decorator, how does the test fail? You might have to run it many times, or run it with many examples, using something like --max-examples=1000 (the default is 100).
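For reference, the example count can also be raised for a single test with hypothesis's own settings decorator, independently of this suite's --max-examples flag. A minimal sketch using only standard hypothesis APIs (the test itself is a placeholder):

from hypothesis import given, settings
from hypothesis.strategies import integers

# Run this one test with 1000 examples instead of the default 100,
# which helps surface rare, flaky failures locally.
@settings(max_examples=1000)
@given(integers())
def test_placeholder(n):
    assert isinstance(n, int)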

@asmeurer
Member

More generally speaking, something to be aware of with hypothesis is that it has two different ways of limiting the scope of input examples to a test: filtering and changing generation. Filtering means you explicitly tell hypothesis to reject whichever cases you don't want, using either filter or assume. This means hypothesis will generate the example but then not actually use it.

The alternative is to change the strategy so that it can only generate the examples you want to see. The difference here is that you never actually filter out examples you don't want. This approach is generally better because hypothesis never wastes time generating examples that will be thrown away. And if hypothesis generates too many examples that get filtered out, it will issue a health check failure like the one you saw.

To take an example, let's say you wanted to generate only even integers. There are two ways you might do this. The first is to take integers() and add a filter (or assume) to reject examples where n % 2 == 0 isn't true:

from hypothesis import assume, given
from hypothesis.strategies import integers

@given(integers().filter(lambda n: n % 2 == 0))
def test_even(n):
    assert n % 2 == 0

This works, but the problem here is that integers() is just going to generate arbitrary integers. It doesn't know that half the integers it generates are going to be rejected (hypothesis does have some internal heuristics to try to guess at what you are filtering, but in general it can only be so smart).

A better approach would be to only generate even integers by taking the input and multiplying it by 2. This will still generate all even integers, but it will never generate an odd integer:

@given(integers().map(lambda n: n * 2))
def test_even(n):
    assert n % 2 == 0

The downside, though, is that sometimes it is very complicated to generate only the types of examples you want. So we do use filtering (typically via assume in the test suite). But you should really only use assume for uncommon cases, so that there isn't too much filtering.
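For instance, here is a minimal illustration (not from the test suite) of assume rejecting a single rare value, which is harmless because almost every generated example survives:

from hypothesis import assume, given
from hypothesis.strategies import integers

@given(integers())
def test_abs_is_positive(n):
    assume(n != 0)  # rejects exactly one value out of the whole range
    assert abs(n) > 0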

To take this test itself, test_reshape, as a more concrete example: there are two inputs, the array and the shape. Now reshape is only defined when the target shape agrees with the array's shape, i.e., when they have the same total size. You could enforce this by generating an array and a shape tuple independently, then rejecting them unless they agree:

from math import prod

@given(arrays(), shapes())  # schematic; shapes() generates tuples of positive integers
def test_reshape(a, shape):
    assume(prod(a.shape) == prod(shape))

However, this will reject almost all generated examples! The test only runs when the array shape and the generated shape happen to have the same product. Instead, what is done is that there is a special reshape_shapes strategy that generates only shapes that can possibly make sense given an array:

@st.composite
def reshape_shapes(draw, shape):
    size = 1 if len(shape) == 0 else math.prod(shape)
    rshape = draw(st.lists(st.integers(0)).filter(lambda s: math.prod(s) == size))
    assume(all(side <= MAX_SIDE for side in rshape))
    if len(rshape) != 0 and size > 0 and draw(st.booleans()):
        index = draw(st.integers(0, len(rshape) - 1))
        rshape[index] = -1
    return tuple(rshape)

This works by using data.draw, so the test can first generate the array and then pass its shape to the reshape_shapes strategy, which therefore "knows" what shape to base its result on. It also uses @composite. Another way this sort of thing can be done is with shared, which lets two strategies draw from the same underlying value (you can find examples of that approach in other places in the tests).
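As a rough sketch of that shared pattern (illustrative only; it reuses the suite's hh.arrays, hh.shapes, and reshape_shapes helpers, and test_reshape does not actually do this): both arguments flatmap off the same shared base shape, so the array and the target shape are guaranteed to agree.

import hypothesis.strategies as st
from hypothesis import given

# Every draw with the same key yields the same base shape within one example.
base_shape = st.shared(hh.shapes(max_side=MAX_SIDE), key="reshape-base")

@given(
    x=base_shape.flatmap(lambda s: hh.arrays(dtype=hh.all_dtypes, shape=s)),
    shape=base_shape.flatmap(reshape_shapes),
)
def test_reshape_shared(x, shape):
    out = xp.reshape(x, shape)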

(And as an aside, this strategy is itself using filter to generate a list of integers with a given product, contra the advice I am giving here. That might even be the reason this test is flaky! Or, if it turns out that it actually works, it could be due to some magic inside hypothesis. At any rate, it should be possible to replace that with something more direct, like taking the product and splitting it into some number of factors, e.g., by shuffling its prime factors around. That might not be a straightforward thing to do, and you might want input from the hypothesis devs here.)
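For what it's worth, a rough sketch of that factor-splitting idea might look like this (illustrative only, not the test suite's code; size == 0 and the -1 placeholder would need separate handling):

from hypothesis import strategies as st

def prime_factors(n):
    # Trial division; fine for the small sizes generated in the test suite.
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

@st.composite
def shapes_with_size(draw, size, max_dims=4):
    # Distribute the prime factors of `size` (assumed >= 1) over a drawn
    # number of axes, so the product is `size` by construction -- no filtering.
    ndim = draw(st.integers(1, max_dims))
    sides = [1] * ndim
    for f in prime_factors(size):
        axis = draw(st.integers(0, ndim - 1))
        sides[axis] *= f
    return tuple(sides)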

@ev-br
Member Author

ev-br commented Nov 20, 2024

assume(shape != ()) makes hypothesis reject every example where shape == (). In other words, it skips the test whenever the input array is 0-D. We don't want to do that, since 0-D inputs are valid for reshape.

Absolutely, yes. My confusion, I guess, is that this (wrong) assumption makes the test consistently generate zero examples. So all the examples hypothesis otherwise generates (IOW, generates on main) are 0-D?

@asmeurer
Member

It's looking like test_reshape is going to be much harder to fix. I'm going to give it a try at #319
