
Memory usage increases with disk latency – potential backpressure improvement? #259


Closed
apuig opened this issue Apr 4, 2025 · 1 comment
apuig commented Apr 4, 2025

Hi there!

First of all, kudos on the great work with this library. It's really clever how you've architected message durability and decoupled it from the upload mechanism. Very nice.

While experimenting with the library (I'm relatively new to Kotlin), I noticed the use of trySend on a Channel(UNLIMITED), which caught my attention. It made me wonder whether high disk latency could lead to unintended memory growth. We're evaluating this library for server environments, where we can't necessarily expect the same I/O performance as on Android devices, especially in virtualized environments or with network-attached storage. Disk latency in these setups can be quite variable and could become a bottleneck.
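
To illustrate the concern in isolation (this is just kotlinx.coroutines channel behavior, not the library's code): trySend on an UNLIMITED channel always succeeds without suspending, so when the consumer is slow the in-memory buffer grows without bound.

import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

fun main() = runBlocking {
    val channel = Channel<Int>(Channel.UNLIMITED)

    // Slow consumer: one element every 100 ms, standing in for a slow disk write.
    val writer = launch(Dispatchers.IO) {
        for (msg in channel) {
            delay(100)
        }
    }

    // Fast producer: trySend never suspends or fails on an UNLIMITED channel,
    // so all 10_000 elements end up buffered in memory almost immediately.
    repeat(10_000) { i ->
        check(channel.trySend(i).isSuccess)
    }

    channel.close()
    writer.join()
}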

To explore this, I prepared a test scenario where:

  • Events are written at a sustained rate.
  • Initially, the HTTP endpoint is down.
  • After a short period, the endpoint begins to accept requests again.

Here’s a link to the test setup:
https://github.com/apuig/segmentio-analytics-kotlin

Is your feature request related to a problem? Please describe.
Under the scenario above, I've observed that memory usage increases significantly when disk I/O is slow, leading to unbounded growth as events accumulate.

Describe the solution you'd like
Introduce a mechanism or configuration option to limit memory usage when backpressure builds due to slow disk writes.

Describe alternatives you've considered
I attempted to use a bounded dispatcher by providing a custom CoroutineConfiguration with a limited FileIODispatcher, but the memory growth persisted. I suspect there are other channels involved.
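
For reference, the bounded dispatcher was along these lines. limitedParallelism is the standard kotlinx.coroutines way to bound a dispatcher; how the custom CoroutineConfiguration gets wired into Analytics depends on the SDK version, so I've left that part out.

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.ExperimentalCoroutinesApi

// Bounding parallelism limits how many disk writes run concurrently, but it does
// not cap the number of events queued up waiting to be written, which is
// presumably why the memory growth persisted in my tests.
@OptIn(ExperimentalCoroutinesApi::class)
val boundedFileIODispatcher = Dispatchers.IO.limitedParallelism(2)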

Thanks for taking the time to consider this!

@wenxi-zeng (Contributor) commented

Hi @apuig, glad you like the new architecture! Yes, your concern is valid: slow disk writes can cause the queue to build up, growing memory usage. That's why we enforce a size limit on the event payload, to minimize the latency our SDK could introduce. However, environmental disruptions can still cause such latency. To mitigate this, you could:

  • Write your own flush policy that backs off when flushing is excessive or unnecessary. This would reduce the memory impact of disk reads. See the flush policy example here; there's also a rough back-off sketch after the plugin example below.
  • Write a before plugin that drops events when disk latency is detected to be high. This would keep events from being added to the queue, reducing the impact on both memory and disk writes. It'd be something like this:
import com.segment.analytics.kotlin.core.Analytics
import com.segment.analytics.kotlin.core.BaseEvent
import com.segment.analytics.kotlin.core.platform.Plugin

class SomePlugin : Plugin {
    override val type = Plugin.Type.Before

    override lateinit var analytics: Analytics

    override fun execute(event: BaseEvent): BaseEvent? {
        // isUnderPressure() is a placeholder for whatever disk/memory check fits your environment
        if (isUnderPressure()) {
            // return null to drop the event
            return null
        }
        return event
    }

    // placeholder: e.g. compare recent write latencies or free heap against a threshold
    private fun isUnderPressure(): Boolean = false
}
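
For the first suggestion, a rough back-off sketch could look like the following. It assumes the FlushPolicy interface from the core package with shouldFlush()/updateState()/reset() (the remaining members have default implementations); the threshold and interval values are placeholders to tune for your environment.

import com.segment.analytics.kotlin.core.BaseEvent
import com.segment.analytics.kotlin.core.platform.policies.FlushPolicy

class BackoffFlushPolicy(private val flushAt: Int = 100) : FlushPolicy {
    private var count = 0
    private var lastFlush = 0L
    private val minIntervalMs = 5_000L // minimum gap between flushes; placeholder value

    // Flush only when enough events have accumulated AND we haven't flushed too recently.
    override fun shouldFlush(): Boolean =
        count >= flushAt && System.currentTimeMillis() - lastFlush >= minIntervalMs

    override fun updateState(event: BaseEvent) {
        count++
    }

    // Reset the counter and record the flush time whenever a flush happens.
    override fun reset() {
        count = 0
        lastFlush = System.currentTimeMillis()
    }
}

Depending on the SDK version, you'd register it through the flushPolicies list in the Configuration.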

Be aware that this is a trade-off between data integrity and memory optimization: it's impossible to use fewer resources in an extreme environment without losing some of the data. In general, though, this SDK handles intensive dataflow very well (see benchmarks here). It's great to see the use of a custom storage to simulate the scenario. Very smart!
