
Memory usage increases with disk latency – potential backpressure improvement? #259


Closed
apuig opened this issue Apr 4, 2025 · 1 comment
apuig commented Apr 4, 2025

Hi there!

First of all, kudos on the great work with this library. It's really clever how you've architected message durability and decoupled it from the upload mechanism. Very nice.

While experimenting with the library (I'm relatively new to Kotlin), I noticed the use of trySend on a Channel(UNLIMITED), which caught my attention. It made me wonder whether high disk latency could lead to unintended memory growth. We're evaluating this library for server environments, where we can't necessarily expect the same I/O performance as on Android devices, especially in virtualized environments or with network-attached storage. Disk latency in these setups can be quite variable and could become a bottleneck.
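
To illustrate the concern in isolation (this is just kotlinx.coroutines channel behavior, not the library's code): trySend on an UNLIMITED channel always succeeds without suspending, so when the consumer is slow the in-memory buffer grows without bound.

import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

fun main() = runBlocking {
    val channel = Channel<Int>(Channel.UNLIMITED)

    // Slow consumer: one element every 100 ms, standing in for a slow disk write.
    val writer = launch(Dispatchers.IO) {
        for (msg in channel) {
            delay(100)
        }
    }

    // Fast producer: trySend never suspends or fails on an UNLIMITED channel,
    // so all 10_000 elements end up buffered in memory almost immediately.
    repeat(10_000) { i ->
        check(channel.trySend(i).isSuccess)
    }

    channel.close()
    writer.join()
}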

To explore this, I prepared a test scenario where:

  • Events are written at a sustained rate.
  • Initially, the HTTP endpoint is down.
  • After a short period, the endpoint begins to accept requests again.

Here’s a link to the test setup:
https://github.com/apuig/segmentio-analytics-kotlin

Is your feature request related to a problem? Please describe.
Under the scenario above, I've observed that memory usage increases significantly when disk I/O is slow, leading to unbounded growth as events accumulate.

Describe the solution you'd like
Introduce a mechanism or configuration option to limit memory usage when backpressure builds due to slow disk writes.

Describe alternatives you've considered
I attempted to use a bounded dispatcher by providing a custom CoroutineConfiguration with a limited FileIODispatcher, but the memory growth persisted. I suspect there are other channels involved.
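
For reference, the bounded dispatcher was along these lines. limitedParallelism is the standard kotlinx.coroutines way to bound a dispatcher; how the custom CoroutineConfiguration gets wired into Analytics depends on the SDK version, so I've left that part out.

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.ExperimentalCoroutinesApi

// Bounding parallelism limits how many disk writes run concurrently, but it does
// not cap the number of events queued up waiting to be written, which is
// presumably why the memory growth persisted in my tests.
@OptIn(ExperimentalCoroutinesApi::class)
val boundedFileIODispatcher = Dispatchers.IO.limitedParallelism(2)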

Thanks for taking the time to consider this!

@wenxi-zeng (Contributor) commented

Hi @apuig, glad you like the new architecture! Yes, your concern is valid: slow disk writes can cause the queue to build up, growing memory usage. That's why we enforce a size limit on the event payload, to minimize the latency our SDK could introduce. However, environmental disruptions can still cause such latency. To mitigate this, you could:

  • Write your own flush policy that backs off when flushing is excessive or unnecessary. This would reduce the memory impact of disk reads. See the flush policy example here; there's also a rough back-off sketch after the plugin example below.
  • Write a before plugin that drops events when disk latency is detected to be high. This would keep events from being added to the queue, reducing the impact on both memory and disk writes. It'd be something like this:
import com.segment.analytics.kotlin.core.Analytics
import com.segment.analytics.kotlin.core.BaseEvent
import com.segment.analytics.kotlin.core.platform.Plugin

class SomePlugin : Plugin {
    override val type = Plugin.Type.Before

    override lateinit var analytics: Analytics

    override fun execute(event: BaseEvent): BaseEvent? {
        // isUnderPressure() is a placeholder for whatever disk/memory check fits your environment
        if (isUnderPressure()) {
            // return null to drop the event
            return null
        }
        return event
    }

    // placeholder: e.g. compare recent write latencies or free heap against a threshold
    private fun isUnderPressure(): Boolean = false
}
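
For the first suggestion, a rough back-off sketch could look like the following. It assumes the FlushPolicy interface from the core package with shouldFlush()/updateState()/reset() (the remaining members have default implementations); the threshold and interval values are placeholders to tune for your environment.

import com.segment.analytics.kotlin.core.BaseEvent
import com.segment.analytics.kotlin.core.platform.policies.FlushPolicy

class BackoffFlushPolicy(private val flushAt: Int = 100) : FlushPolicy {
    private var count = 0
    private var lastFlush = 0L
    private val minIntervalMs = 5_000L // minimum gap between flushes; placeholder value

    // Flush only when enough events have accumulated AND we haven't flushed too recently.
    override fun shouldFlush(): Boolean =
        count >= flushAt && System.currentTimeMillis() - lastFlush >= minIntervalMs

    override fun updateState(event: BaseEvent) {
        count++
    }

    // Reset the counter and record the flush time whenever a flush happens.
    override fun reset() {
        count = 0
        lastFlush = System.currentTimeMillis()
    }
}

Depending on the SDK version, you'd register it through the flushPolicies list in the Configuration.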

Be aware that this is a trade-off between data integrity and memory optimization: it's impossible to use fewer resources in an extreme environment without losing some of the data. In general, though, this SDK handles intensive dataflow very well (see benchmarks here). It's great to see the use of a custom storage to simulate the scenario. Very smart!
