
[Service Bus & Event Hubs] Improve livetest test stability #21789


Closed
yunhaoling opened this issue Nov 16, 2021 · 12 comments · Fixed by #31671
Labels: Client (This issue points to a problem in the data-plane of the library.), Event Hubs, Messaging (Messaging crew), MQ (This issue is part of a "milestone of quality" initiative.), Service Bus

Comments

@yunhaoling
Contributor

  • Find out whether any tests are repeatedly failing in our live test runs (see the sketch after this list).
  • Determine whether each failure is an issue in the SDK or a problem with the VM agents.
  • If it is not an SDK issue, improve the test.
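A minimal sketch of one way to spot repeat offenders, assuming the nightly pytest logs have been downloaded into a local nightly_logs/ directory (the directory name and the "FAILED ..." line format are assumptions based on the output quoted later in this thread):

    import re
    from collections import Counter
    from pathlib import Path

    # Count how often each test id appears on a "FAILED ..." line across the logs.
    failure_counts = Counter()
    for log_file in Path("nightly_logs").glob("*.txt"):
        for line in log_file.read_text().splitlines():
            match = re.match(r"FAILED\s+(\S+)", line)
            if match:
                failure_counts[match.group(1)] += 1

    # Tests that failed in more than one run are the ones worth investigating.
    for test_id, count in failure_counts.most_common():
        if count > 1:
            print(f"{test_id}: failed {count} times")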
@yunhaoling added the Service Bus, Event Hubs, Client, MQ, and Messaging labels Nov 16, 2021
@swathipil
Member

Check DevOps and see whether any of this can be combined with stress test onboarding.

@swathipil
Member

swathipil commented Jan 12, 2022

@yunhaoling added this to the [2022] February milestone Jan 19, 2022
@swathipil
Member

swathipil commented Jan 26, 2022

Update:

  • I have been checking the nightly test runs, and there are no consistently flaky tests other than the one above. A few others have only failed once from what I've seen:

EH:
tests/livetest/synctests/test_consumer_client.py::test_receive_batch_no_max_wait_time

SB:
FAILED tests/test_subscriptions.py::ServiceBusSubscriptionTests::test_subscription_by_servicebus_client_receive_batch_with_deadletter: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=1326700&view=logs&j=fa284728-64c9-5de0-4442-32743d14c85e&t=b93d123d-eb8b-5272-3a5e-e2a5347ca4ca&l=2272
FAILED tests/async_tests/test_queues_async.py::ServiceBusQueueAsyncTests::test_async_queue_by_servicebus_client_browse_messages_with_receiver
FAILED tests\test_queues.py::ServiceBusQueueTests::test_queue_by_queue_client_conn_str_receive_handler_with_autolockrenew - https://dev.azure.com/azure-sdk/internal/_build/results?buildId=1326700&view=logs&j=217ad987-cc73-5dcd-abbd-eb23972de726&t=f10366f4-42a0-5bf3-0a2a-6be221a21989&l=2126
FAILED tests/test_queues.py::ServiceBusQueueTests::test_queue_send_dict_messages_scheduled
FAILED tests/test_queues.py::ServiceBusQueueTests::test_queue_operation_negative

@swathipil
Member

check 12/28:
FAILED tests/async_tests/test_queues_async.py::ServiceBusQueueAsyncTests::test_async_queue_by_queue_client_conn_str_receive_handler_peeklock

12/31:
FAILED tests/test_sessions.py::ServiceBusSessionTests::test_session_by_servicebus_client_session_pool
FAILED tests/test_sessions.py::ServiceBusSessionTests::test_session_by_servicebus_client_session_pool
FAILED tests/test_sessions.py::ServiceBusSessionTests::test_session_connection_failure_is_idempotent
FAILED tests/async_tests/test_sessions_async.py::ServiceBusAsyncSessionTests::test_async_session_connection_failure_is_idempotent
FAILED tests/async_tests/test_queues_async.py::ServiceBusQueueAsyncTests::test_async_queue_receiver_respects_max_wait_time_overrides
FAILED tests/async_tests/test_sessions_async.py::ServiceBusAsyncSessionTests::test_async_session_by_servicebus_client_session_pool
FAILED tests/async_tests/test_sessions_async.py::ServiceBusAsyncSessionTests::test_async_session_by_session_client_conn_str_receive_handler_with_no_session
FAILED tests/async_tests/test_sessions_async.py::ServiceBusAsyncSessionTests::test_async_session_cancel_scheduled_messages
FAILED tests/async_tests/test_sessions_async.py::ServiceBusAsyncSessionTests::test_async_session_cancel_scheduled_messages
FAILED tests/test_sessions.py::ServiceBusSessionTests::test_session_by_servicebus_client_session_pool
FAILED tests/async_tests/test_queues_async.py::ServiceBusQueueAsyncTests::test_async_queue_receiver_respects_max_wait_time_overrides
FAILED tests/async_tests/test_sessions_async.py::ServiceBusAsyncSessionTests::test_async_session_by_servicebus_client_session_pool
FAILED tests/async_tests/test_sessions_async.py::ServiceBusAsyncSessionTests::test_async_session_by_session_client_conn_str_receive_handler_with_no_session
FAILED tests/async_tests/test_subscriptions_async.py::ServiceBusSubscriptionAsyncTests::test_topic_by_servicebus_client_receive_batch_with_deadletter
FAILED tests/test_queues.py::ServiceBusQueueTests::test_queue_receive_keep_conn_alive
FAILED tests/test_sessions.py::ServiceBusSessionTests::test_session_by_servicebus_client_session_pool
FAILED tests/test_sessions.py::ServiceBusSessionTests::test_session_connection_failure_is_idempotent
FAILED tests/async_tests/test_queues_async.py::ServiceBusQueueAsyncTests::test_queue_receive_keep_conn_alive_async
FAILED tests/mgmt_tests/test_mgmt_namespaces.py::ServiceBusManagementClientNamespaceTests::test_mgmt_namespace_get_properties

FIX: async/sync test_session_by_servicebus_client_session_pool: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=1336668&view=logs&j=a5c73adf-21c4-51b0-3477-575974909b75&t=920caeb1-8790-523a-08a8-f5f4e4c5560b&l=1787

@swathipil
Member

swathipil commented Feb 10, 2022

@swathipil
Member

swathipil commented Feb 18, 2022

The error in test_session_by_servicebus_client_session_pool with Cannot open log for source 'Microsoft.ServiceBus' seems to be an error with the service. A similar issue was filed here: Azure/azure-sdk-for-net#27067

For now, ignoring this error in the test.

EDIT: It looks like this error is happening in every test that calls .get_queue_receiver(...session_id=NEXT_AVAILABLE_SESSION), and it seems to appear after that call results in an OperationTimeoutError. For example:

with sb_client.get_queue_receiver(servicebus_queue.name,
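As a rough sketch of how a test can tolerate that for now (sb_client and servicebus_queue are the live-test fixtures from the snippet above; the max_wait_time value is illustrative):

    from azure.servicebus import NEXT_AVAILABLE_SESSION
    from azure.servicebus.exceptions import OperationTimeoutError

    try:
        with sb_client.get_queue_receiver(servicebus_queue.name,
                                          session_id=NEXT_AVAILABLE_SESSION,
                                          max_wait_time=5) as receiver:
            for message in receiver:
                receiver.complete_message(message)
    except OperationTimeoutError:
        # No session became available within max_wait_time; for now the test
        # ignores this instead of failing the run.
        pass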

@swathipil
Member

Nightly runs have been green for a week, so closing.


@swathipil reopened this Mar 7, 2022
@yunhaoling modified the milestones: [2022] March, [2022] April Mar 15, 2022
@yunhaoling
Contributor Author

Success criteria: a 90% pass rate over the past two weeks.
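As a rough worked example of that criterion (the run counts below are illustrative, not actual pipeline data):

    # Hypothetical outcomes for 14 nightly runs over two weeks (True = run passed).
    nightly_results = [True] * 13 + [False]

    pass_rate = sum(nightly_results) / len(nightly_results)
    print(f"pass rate: {pass_rate:.0%}")  # ~93%, so the 90% bar is met

    assert pass_rate >= 0.90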

@yunhaoling modified the milestones: [2022] April, [2022] May Mar 15, 2022
@lmazuel modified the milestones: [2022] May, [2022] June May 16, 2022
@lmazuel modified the milestones: 2022-06, 2022-11 Sep 6, 2022
@swathipil
Member

swathipil commented Oct 14, 2022

most flaky EH tests:

  • sync and async test_buffered_producer.py:
    • test_basic_send_single_events_round_robin[--False]
    • test_long_wait_small_buffer:
        with producer:
            for i in range(100):
                producer.send_event(EventData("test"))
    
        time.sleep(60)
    
        assert not on_error.err
        assert sum([len(sent_events[key]) for key in sent_events]) == 100
>       assert sum([len(received_events[key]) for key in received_events]) == 100
E       assert 50 == 100
E         +50
E         -100
  • test_basic_send_batch_events_round_robin[--False]
            if not flush_after_sending and not close_after_sending:
                # ensure it's buffered sending
                for pid in partitions:
                    assert len(sent_events[pid]) < each_partition_cnt
                assert sum([len(sent_events[pid]) for pid in partitions]) < total_events_cnt
                # give some time for producer to complete sending and consumer to complete receiving
            else:
                if flush_after_sending:
                    producer.flush()
                if close_after_sending:
                    producer.close()
                # ensure all events are sent
                assert sum([len(sent_events[pid]) for pid in partitions]) == total_events_cnt
    
            time.sleep(10)
>           assert len(sent_events) == len(received_events) == partitions_cnt
E           assert 2 == 1
E             +2
E             -1
  • sync and async test_send.py:
    • test_send_with_partition_key[***]:
source = "amqps://{}/{}/ConsumerGroups/{}/Partitions/{}".format(
                        live_eventhub['hostname'],
                        live_eventhub['event_hub'],
                        live_eventhub['consumer_group'],
                        index)
                    partition = uamqp.ReceiveClient(source, auth=sas_auth, debug=***, timeout=0, prefetch=500)
                    reconnect_receivers.append(partition)
                    retry_total += 1
            if retry_total == 3:
                raise OperationTimeoutError(f"Exhausted retries for receiving from {live_eventhub['hostname']}.")
    
        for r in reconnect_receivers:
            r.close()
    
>       assert single_cnt == 60
E       assert 30 == 60
E         +30
E         -60

In general, only about half of the expected events are received when a ConsumerClient is used to receive, rather than the raw receivers in the conftest fixture.
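For context, a rough sketch of the ConsumerClient receive pattern involved (the connection string, event hub name, 30-second window, and expected count of 100 are placeholders, not values from the failing tests):

    import threading
    import time

    from azure.eventhub import EventHubConsumerClient

    received = []

    def on_event(partition_context, event):
        # Collect every event so the count can be compared afterwards.
        received.append(event)

    client = EventHubConsumerClient.from_connection_string(
        "<EVENT_HUB_CONN_STR>",            # placeholder
        consumer_group="$Default",
        eventhub_name="<EVENT_HUB_NAME>",  # placeholder
    )

    # receive() blocks, so run it on a worker thread and stop after a fixed window.
    worker = threading.Thread(
        target=client.receive,
        kwargs={"on_event": on_event, "starting_position": "-1"},
    )
    worker.start()
    time.sleep(30)
    client.close()
    worker.join()

    # The flaky pattern above: sometimes only about half of the sent events end up here.
    assert len(received) == 100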

@kristapratico removed this from the 2022-11 milestone Aug 17, 2023
@github-actions (bot) locked and limited conversation to collaborators Jan 2, 2024