
GC frequency option #1298


Closed · bostrt wants to merge 2 commits

Conversation


bostrt commented Jan 6, 2017

This pull request adds a new directive for configuring how often garbage collections run. Running less/more often can drastically improve performance based on request load.
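
As a minimal usage sketch of the proposed directive (the directive name, SecCollectionGCFrequency, appears later in this thread; the argument semantics shown here are an assumption, not something confirmed by the patch itself):

```apache
# Hypothetical configuration sketch -- assumes the directive takes a
# request count N and runs garbage collection roughly once per N
# requests instead of the current hard-coded 1-in-100 behaviour.
SecCollectionGCFrequency 1000
```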

@marcstern

Any hint about the impact of SecCollectionGCFrequency?
How does it differ from running an external process (sdbm-util)? Advantages/disadvantages?


bostrt commented Feb 8, 2017

@marcstern sorry for the delayed response. Running GC less frequently significantly improves performance by reducing the number of times ModSecurity has to scan the entire database file to perform GC.

SecCollectionGCFrequency was added to avoid the need for running an external process, but note that the main reason for adding it was to prevent high-frequency scans of the database.

@marcstern

Do I understand correctly: if I decrease the frequency, performance improves but the database size and fragmentation increase, leading to more access problems with the keys ("Failed deleting collection", "Permission denied on collection")?


bostrt commented Feb 11, 2017

@marcstern Correct, just like in other garbage collection implementations, decreasing the frequency of collections will cause the database size to increase. As for fragmentation, I think that strictly depends on the collection timeout configurations.

I know I keep saying this in different ways, but I think it is the most important fact to keep in mind for this patch: ModSecurity's garbage collection must scan the entire database (both expired and live elements). If I have a good understanding of how users/applications interact with my website and how much expired vs. live data a persistent collection accumulates over time, then I should be able to tweak the garbage collection frequency to match; especially since each garbage collection scan is expensive.

I think the access problems need to be resolved independently of this performance issue. I imagine some of the other recent pull requests will help with these: #1224, #1274


bostrt commented Feb 11, 2017

@marcstern

> then I should be able to tweak the garbage collection frequency to match; especially since each garbage collection scan is expensive.

Extending on this statement...

Assume I have a high traffic server receiving 100 requests per second. Also, assume I have a collection timeout of 60 seconds.

ModSecurity currently performs a garbage collection scan with a probability of 1 in 100 on each request. In the scenario above, that works out to roughly one scan every second, and 59 out of every 60 of those scans would be useless since no collections time out until the 60-second mark.
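
To make that arithmetic concrete, here is a hedged sketch of how the scenario above might be tuned (again assuming the directive's argument means "run GC once per N requests"):

```apache
# 100 requests/second * 60-second collection timeout = 6000 requests
# per expiry window. Under the assumed 1-per-N-requests semantics,
# this runs roughly one GC scan per timeout window instead of one
# per second.
SecCollectionGCFrequency 6000
```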


bostrt commented Feb 15, 2017

@marcstern I think any concerns about "Failed deleting collection" with regard to this pull request can be dismissed based on #576 (comment)

@zimmerle

The GC frequency option seems cool. It has already been merged on top of v2/exp/collection_garbage_freq. It won't be merged to mainline yet; since it is a significant feature, it may be on hold for testing until 2.10.

Thanks @bostrt :)

zimmerle closed this May 21, 2017