-
Notifications
You must be signed in to change notification settings - Fork 1.6k
IP persistence storage seems to not clean up #574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Original reporter: wneessen |
wneessen: (This is actually a clone of MODSEC-325) Hi Breno, I am sorry that I have to re-open this old ticket (or actually clone). After reading your response I didn't further investigate this issue and simply disabled mod_security on the specific host, which resolved our problems so far. Here is a short timeline of what happend and what I was able to capture: Re-Enabled mod_security with the same ruleset on Tuesday. The rules worked as expected. [05/Sep/2013:21:47:09 +0000] [xxxxxxx.cleverbridge.com/sid#80171c530][rid#802dbe0a0][/XXXXXXXXX/xxxxxxx-xxxxxxx-xxxxxxx-xxxxxxx/xxxxxxx.pdf][1] collections_remove_stale: Failed deleting collection (name "ip", key "80.XXX.XXX.XXX_6e1c38356769095ad76267f9edb43ddae2af1491"): Internal error The /var/tmp/ip.pag file grew to a size of 7GiB: % ls -l /var/tmp/ip.pag :-/ % du -A /var/tmp/ip.pag (FreeBSD userland "du" - values in kbytes) % gdu -b /var/tmp/ip.pag (GNU "du" on FreeBSD) The SecCollectionTimeout was set to "600". Though this is a download server, it is not hit that hard, that it should have a collection of 7 GiB of IP-Useragent combinations given Thanks for you help. |
Hello, We had the same unfortunate experience. The solution was to erase ip.pag et let modsec create a new one. In apache error.log file, we can see lots of:
I added in modsecurity.conf the config:
but we still have the same errors. Server: Debian Wheezy |
Is there any way we can help debug that problem? The only info I found about it: "Looks like this error message is from a apr function." from Breno (seems to be the one in charge to find the bug) (http://marc.info/?l=mod-security-users&m=136683562925114&w=2) |
The keys found in the error messages are NOT in the ip.pag file.
It seems the problem doesn't come from modsec not able to delete keys. After a few checks, we can see that the number of keys in ip.pag file goes up and down. From apache2/persist_dbm.c (line 218)
And in apr-util-1.5.3, in dbm/sdbm/sdbm.c (line 230),
I'd say that the status code received in modsec comes from the call to getPage that is different from APR_SUCCESS as the key doesn't exist in the db. We changed our modesc local exceptions to not log anymore (that represents a lot of requests) and set the SecCollectionTimeout directive to 600 (it wasn't set at all before but the manual says the default value is 3600) and since, the ip.pag file stays at a good size (110Mo for now) Maybe you should just add more granularities to your tests and errors/warnings messages. |
Hi @quenenni, thanks. Are you interested in provide your solution in the format of a pull request? |
We have made an utility that may be useful for those who are using persistent collections in sdbm files. The utility allows the users to manipulate the sdbm files used by ModSecurity, including shrinking it, by remove the expired keys or malformed content. The utility is available here: @quenenni, can you check if you still have malformed content by using this utility? |
Sorry for the late answer. I did some more testing by playing with the "SecCollectionTimeout" directive and by reducing the logging ('nolog' for lots of rules I did). I can already say: I'm going to keep that config a few days more and then I'll test by removing the "SecCollectionTimeout" directive to see if by default, it's indeed considered as 3600. @zimmerle I'll gladly test you utilty, but, can you explain how to use it. I couldn't figure it out myself. @zimmerle usually, I like to offer solutions, but in this case, I prefer not as I don't develop in c and I'm not sure I'll be able to do something good. Sorry. |
First off, thanks to all that contributed to this thread, it helped put me on the right track. Just thought I would add a note here, considering I have had the exact same issue as quenenni. Server load would spike from 2 to 20 in a seemingly random manner, and each time it appeared to be an httpd process that was causing the load spike. The disk IO was also excruciatingly slow throughout the day. After a lot of debugging, we narrowed down the issue to mod_security (oddly enough, and largely still unexplained, the same mod_security rules are deployed on dozens of servers but only a few were exhibiting issues). When we disabled the rules, the load issues stopped. When we reactivated, the load problem returned. Running an lsof -p on the apache process eventually led me to our now-infamous tmp/ip.pag storage file. That file alone had reached 16GB (on a partition that was otherwise limited to 8GB, so not sure I understand how that could even work). Following quenenni's lead I emptied the file and the performance improvement was instant. The load dropped entirely. I added the following parameter to my config: And restarted Apache. So far after 30 minutes or so the ip.pag file has grown to 117MB, but performance is still optimal. So my question is: does setting the SecCollectionTimeout properly limit the file size in the long run? FYI I am running on CloudLinux 6 with Apache 2.4 |
Hi @Xavior82, That is a difficult question to answer because it depends on the load of your server. Usually reduce this time out is enough to have collections with "feasible" size. You can list the content of the "ip.pag" or even perform a out-of-band "shrink" using modsec-sdbm-util. |
Is the memcache-based replacement ready yet, as the current implementation seems broken ? |
Hi @tomchiverton, there is this branch: It is not ready for production yet, there was some issues reported, and so far no time to fix it yet. Did you tried this `memcache_collections' branch already? |
No, I was just wondering how well it had been tested, since the current storage system is still causing issues at even our moderate scale (~10req/sec), and from the looks of this bug has been for others for a year. |
This is still being worked on? |
Is this specific to Apache? Does anybody have an idea about the cause fo this bug? Is it so complex to fix? If it's only a warning that we can ignore, shouldn't we disable it (or log it only once per day/week/month with the number of failures)? |
Hi @djprmf, The back-end used to save the collections has some limitations that is boynd the way that ModSecurity uses it. As an alternative to this gdbm back-end we implemented the memchace test, which did not became so popular. Others alternatives were also implemented, as the example of the modsec-sdbm-uitl: https://github.com/SpiderLabs/modsec-sdbm-util This utility can be used to list and cleanup/shrink the sdbm file out-of-band. To circumvent this particular "issue" you may want to reduce the collection timeout: SecCollectionTimeout - https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual#seccollectiontimeout And/or, generally, use the collection with more parsimony. Depending on the amount of usage of the persistent collections functionality, even if there was a cleanup, the amount of I/O in your server will be very high due the way it was implemented. In ModSecurity version 3 (under development) we tackle the persistent collection in other approach: we let the user select the proper back-end regarding the infrastructure that he/she wants to support. That allow us to support a larger number of users, from a simple setup (with less request/second, lets say...) to a high volume ISP (with very high number of requests). I am tagging this issue as to be fixed as part of the release of ModSecurity version 3. @marcstern The problem does not seems to be a implementation problem, rather, the back-end selection. Yes we can re-implement it without using apr or using other back-end such as the memcache implementation. That was the case, but the current target is the ModSecurity version 3 where this feature is actually under development. |
Sure, but why does this bug occur? Is it a bug in sdbm or the way we use it? |
Hi @marcstern, Do you mind to get the text in your comment well formatted? it is important to keep the readability of those issues, it is served as knowledge base to many users. I think github allows you to edit your comment. While editing the response it will be good if you can describe the circumstances where you are facing this issue, we may able to help by tuning your configuration a little bit. |
I have it on a dozen of machines (the others we suppressed the message). |
Just ran into this issue in a test machine. Even after running it with -k, the ip.pag file was still very large:
|
This ticket is 2013, so far! My God, when will that be solved? |
Please see #576 (comment) Looks to be same issue and same solution. |
This is fixed by #1274. |
MODSEC-426: It seem that mod_security is not cleaning up the persistence storage for the IP-storage I am using.
Here is a simple test ruleset I am using (nothing else):
SecRule IP:dos_block "@eq 1"
"phase:2,block,msg:'IP address blocked due to high connection rate.',
id:2003,status:403,severity:'CRITICAL',tag:'DoS'"
SecRule SCRIPT_FILENAME "@rx /www/content/prod/dl/dl.php"
"phase:5,chain,t:none,nolog,pass,id:2001,severity:'INFO',tag:'DoS',setvar:IP.dos_counter=+1,expirevar:IP.dos_counter=60"
SecRule IP:dos_counter "@gt 100"
"t:none,setvar:IP.dos_block,setvar:!IP.dos_counter,expirevar:IP.dos_block=60"
The ruleset is working fine. But what I am noticing is, that the /var/tmp/ip.pag file is constantly growing. It is currently at >1GiB
size. When I do a "strings" on that file, I see entries like this:
__expire_KEY
1344192686
794.224.68.113_64d9d88c927c58359a3649f30a00e95070bbb8c0
TIMEOUT
3600
__key
794.224.68.113_64d9d88c927c58359a3649f30a00e95070bbb8c0
__name
CREATE_TIME
1344189037
UPDATE_COUNTER
dos_counter
__expire_dos_counter
1344189097
LAST_UPDATE_TIME
1344189086
94.224.68.113_64d9d88c927c58359a3649f30a00e95070bbb8c0
__expire_KEY
1343169406
868.167.225.250_77150c41806fda817314ba9c40e040c598830d5d
TIMEOUT
3600
__key
868.167.225.250_77150c41806fda817314ba9c40e040c598830d5d
__name
CREATE_TIME
1343165769
UPDATE_COUNTER
dos_counter
__expire_dos_counter
1343165829
LAST_UPDATE_TIME
1343165806
68.167.225.250_77150c41806fda817314ba9c40e040c598830dIR
__expire_KEY
1344005533
786.132.126.93_aa739e3aaaa1fbfc8667bb26120e9930df881a82
TIMEOUT
3600
__key
786.132.126.93_aa739e3aaaa1fbfc8667bb26120e9930df881a82
__name
[...]
Now when I convert the UNIX time strings to UTC, it shows me dates, which are already
expired since weeks. I would asume, that if an entry is expired, that the garbage collection
of mod_sec would remove the entry from the ip.pag file, to keep it small. Or am I wrong?
Any assistance would be highly appreciated.
The text was updated successfully, but these errors were encountered: