Test services resilience #1039
Labels
a:infra+ops
maintenance of infrastructure or operations (discussed in retro)
a:services-library
issues on packages/service-libs
t:enhancement
Improvement or request on an existing feature
t:maintenance
Some planned maintenance work
Found some situations in which a service suddenly crashes and the swarm constantly restarts it, so that it enters an endless stop-restart loop.
This happens either because the code is broken (normally caught by pylint, but not always) or because of a faulty design.
The fact is: all services should guarantee a certain level of resilience.
Examples
- storage: GET /locations/?user_id ...
  gets an error in db (connection drops) and TODO
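
One way a service could survive the dropped db connection in the example above is to retry the query a bounded number of times and then answer with an error response instead of crashing. The following is only a minimal sketch in plain Python, not the actual storage code: `DBConnectionError`, `retry_db_call`, `get_locations` and `query_db` are hypothetical names introduced for illustration, and the retry count and delays are arbitrary.

```python
import time


class DBConnectionError(Exception):
    """Hypothetical stand-in for a driver's 'connection dropped' error."""


def retry_db_call(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func(), retrying on DBConnectionError with exponential backoff.

    Re-raises the last error once all attempts are exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except DBConnectionError:
            if attempt == attempts:
                raise
            # Wait 0.1s, 0.2s, 0.4s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


def get_locations(user_id, query_db):
    """Hypothetical handler: map a persistent db failure to a 503 response."""
    try:
        rows = retry_db_call(lambda: query_db(user_id))
    except DBConnectionError:
        # The service stays up and reports the problem to the caller.
        return 503, {"error": "database unavailable"}
    return 200, {"data": rows}
```

With this shape, a short connection drop is absorbed by the retries, and a longer outage yields an error response while the process keeps running, so the swarm has no reason to restart it.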
Related to PBI