Optimize translog writes by moving the older file deletes in trimUnreferencedReaders call outside the write lock. #55530

itiyamas · 2020-04-21T14:05:32Z

A trimUnreferencedReaders call closes and removes old reader files, then updates the checkpoint of the current writer to change the minimum translog generation. Because the reader references change, we make this call under the writeLock to maintain consistency. However, the actual file deletes and file channel closes need not happen under the write lock once the references are switched and the checkpoint is updated. recoverFromFiles can take care of any leftover files by deleting the older translog files. Also, the sync can be done once for a batch of files instead of multiple times. This will reduce the total amount of time the writeLock is held.
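To make the proposal concrete, here is a minimal sketch of the locking pattern, not the actual Elasticsearch Translog code. The class `TranslogTrimSketch`, its fields, and the `syncCheckpoint` placeholder are hypothetical names introduced for illustration only. Under the write lock, it only switches references and records one checkpoint sync for the whole batch; the file deletes happen after the lock is released.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the proposed pattern; this is NOT the Elasticsearch Translog class.
final class TranslogTrimSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Reader files ordered by generation, oldest first (assumed one file per generation).
    private final Deque<Path> readers = new ArrayDeque<>();
    private long minGeneration;
    private int checkpointSyncs; // counts how often the placeholder sync ran

    TranslogTrimSketch(long minGeneration, List<Path> readerFiles) {
        this.minGeneration = minGeneration;
        this.readers.addAll(readerFiles);
    }

    long minGeneration() {
        return minGeneration;
    }

    int checkpointSyncs() {
        return checkpointSyncs;
    }

    // Placeholder for writing and fsyncing the checkpoint; a real implementation does I/O here.
    private void syncCheckpoint() {
        checkpointSyncs++;
    }

    // Drops readers whose generation is below minReferencedGen and returns how many were removed.
    int trimUnreferencedReaders(long minReferencedGen) throws IOException {
        List<Path> toDelete = new ArrayList<>();
        lock.writeLock().lock();
        try {
            // Only switch references while holding the lock.
            while (minGeneration < minReferencedGen && !readers.isEmpty()) {
                toDelete.add(readers.pollFirst());
                minGeneration++;
            }
            if (!toDelete.isEmpty()) {
                syncCheckpoint(); // one sync for the whole batch
            }
        } finally {
            lock.writeLock().unlock();
        }
        // The slow file deletes run without the write lock.
        for (Path file : toDelete) {
            Files.deleteIfExists(file);
        }
        return toDelete.size();
    }
}
```

If the process stops between releasing the lock and finishing the deletes, some old files stay on disk, which is why the proposal relies on recovery to remove them.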
I ran an experiment on ES version 7.4 on a single-node i3.8xl. The nyc_taxis rally benchmark was used with a refresh interval of 30 seconds and 50 bulk clients.

Here are the results:
The total writeLock time dropped from roughly 800-1000 ms to roughly 20 ms per minute. This is for an i3.8xl instance, which has a fast disk. I suppose the improvement would be larger on slower disks.

The overall improvement in indexing throughput is close to 1%, just from moving the delete and channel close calls outside of the lock. The sync was also done just once instead of multiple times, since we are within the writeLock anyway.

Let me know if this makes sense.

itiyamas · 2020-04-21T14:24:48Z

This is the change that I tested. I have not made changes to the recovery part, though: https://gist.github.com/itiyamas/0568a4af2b97ac5870cc5f9c3dd7679a

dnhatn · 2020-04-21T16:50:44Z

Hi @itiyamas,

Thank you for your interest in Elasticsearch.

We discussed this idea before and decided not to pursue it because the cleanup logic will be more complicated and less safe.

Relates to #46203

elasticmachine · 2020-04-21T16:52:24Z

Pinging @elastic/es-distributed (:Distributed/Engine)

itiyamas · 2020-04-22T13:08:07Z

Sorry for not going through #46203 earlier, but this proposal is a bit different from that one.

Here, the sync is not moved outside the write lock; it is done just once within the writeLock. The sync is followed by file deletes outside the write lock. Executing the sync outside the writeLock is less safe and more complicated, as you need to handle a lot of edge cases, but here you just delete a contiguous range of translog files from minGen, which is not complicated in my opinion.
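As a rough illustration of why the cleanup can stay simple, the sketch below removes every generation file below the minimum generation recorded in the checkpoint. It is one possible reading of the idea, not the code from the gist or from Elasticsearch. The class `StaleTranslogCleanupSketch` and the method `deleteBelow` are hypothetical, and the `translog-<generation>.tlog` / `translog-<generation>.ckp` file naming is an assumption made for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical recovery-time cleanup; the file naming scheme is an assumption.
final class StaleTranslogCleanupSketch {
    private static final Pattern GEN_FILE = Pattern.compile("translog-(\\d{1,18})\\.(tlog|ckp)");

    // Deletes generation files whose generation is below minGen; returns the number deleted.
    static int deleteBelow(Path translogDir, long minGen) throws IOException {
        List<Path> stale = new ArrayList<>();
        // Collect first so the directory is not modified while it is being listed.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(translogDir, "translog-*")) {
            for (Path file : stream) {
                Matcher m = GEN_FILE.matcher(file.getFileName().toString());
                if (m.matches() && Long.parseLong(m.group(1)) < minGen) {
                    stale.add(file);
                }
            }
        }
        int deleted = 0;
        for (Path file : stale) {
            if (Files.deleteIfExists(file)) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

Because the checkpoint already records the new minimum generation before any file is deleted, a pass like this is idempotent: running it again after a crash finds nothing below minGen and does nothing.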

dnhatn added the :Distributed Indexing/Engine label Apr 21, 2020

dnhatn closed this as completed Apr 21, 2020
