-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
bpo-44186: prevent TimedRotatingFileHandler overwriting log files #26553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
TimedRotatingFileHandler deletes any file with the destination name when it rotates, this change adds a number to the end of the filename to prevent conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, but it would need tests and documentation updates before it could be merged.
Lib/logging/handlers.py
Outdated
@@ -255,6 +255,7 @@ def __init__(self, filename, when='h', interval=1, backupCount=0, | |||
raise ValueError("Invalid rollover interval specified: %s" % self.when) | |||
|
|||
self.extMatch = re.compile(self.extMatch, re.ASCII) | |||
self.file_number = re.compile(r'(\()([0-9])(\))$') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to limit to just one digit? Also, why do we need capturing groups for the parentheses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to limit to just one digit?
no, this was an oversight on my part, thanks.
Also, why do we need capturing groups for the parentheses?
Me from 10 days ago had a good reason I think :D. I think my thought process was having a group like
group = ['(', '12', ')']
would be easier to parse the number out as you can get the number from group[1] but the same effect could be had from match[1:-1]
if the parenthesis weren't in a separate group. Funnily enough this situation would never arise thanks to the first issue you pointed out.
I'll look at this again when I get back from work to see If I've missed anything.
name = re.sub(self.file_number, f'({int(match.groups()[1]) + 1})', dfn) | ||
dfn = self.rotation_filename(name) | ||
else: | ||
dfn = self.rotation_filename(self.baseFilename + '.' + time.strftime(self.suffix, timeTuple) + ' (2)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the + ' (2)'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was attempting to mimic the default behaviour I saw from Windows 10, in Windows if you create a new document with a duplicate name it will create the documents like this:
New Document
New Document (2)
New Document (3)
On my Mac it omits the parenthesis and creating new documents yields a naming scheme like
New Document
New Document 2
New Document 3
I personally preferred the bracketed naming scheme but that is really just my personal preference. I don't know of anywhere else this pops up in the standard library but if there's a convention that I'm not following I'd be happy to rework that.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
of course, I just wanted to see if this would be helpful before leaping in with the test/ docs changes. |
While I was writing the tests for this PR I think I uncovered a separate bug. I'm not sure when this would ever come up in real world code as in all of the logging docs and examples you only instantiate one logger and get it in different locations, but if you have multiple import datetime
import logging.handlers
import logging
import time
fn = 'test_file'
now = datetime.datetime.now()
handlers = [
logging.handlers.TimedRotatingFileHandler(
fn, 'midnight', atTime=now, encoding='utf-8', backupCount=1),
logging.handlers.TimedRotatingFileHandler(
fn, 'midnight', atTime=now, encoding='utf-8', backupCount=1),
logging.handlers.TimedRotatingFileHandler(
fn, 'midnight', atTime=now, encoding='utf-8', backupCount=1),
]
r = logging.makeLogRecord({'msg': 'test msg'})
handlers[0].emit(r) The above program yields this error.
I'll open a new bpo with this issue if it's unexpected behaviour. Regarding this PR I changed the existing test to simply touch a file and then check if the |
This PR is stale because it has been open for 30 days with no activity. |
That's not a bug - you're not supposed to do this. See https://docs.python.org/3/howto/logging-cookbook.html#patterns-to-avoid I've not had a chance to look at this for a while, sorry. |
@@ -382,7 +382,8 @@ timed intervals. | |||
The system will save old log files by appending extensions to the filename. | |||
The extensions are date-and-time based, using the strftime format | |||
``%Y-%m-%d_%H-%M-%S`` or a leading portion thereof, depending on the | |||
rollover interval. | |||
rollover interval. In the case that an existing file has the same name, | |||
a numbered suffix will be added to the new file to prevent overwriting logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking at other issues recently - bpo-43344, bpo-44753 - so this solution might not work in those scenarios. Specifcally, in scenarios where the user wants to keep the suffix as ".log" or similar. After changes to address those issues, rollover logic expects rolled-over files to start with the base filename (minus extension) + ".", and contain self.extMatch
. I've also added comments on the issue itself.
@@ -401,8 +402,12 @@ def doRollover(self): | |||
timeTuple = time.localtime(t + addend) | |||
dfn = self.rotation_filename(self.baseFilename + "." + | |||
time.strftime(self.suffix, timeTuple)) | |||
if os.path.exists(dfn): | |||
os.remove(dfn) | |||
while os.path.exists(dfn): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be more future-proof if it were structured as:
if os.path.exists(dfn):
self.backup_existing(dfn)
and the new backup_existing
method would have the behaviour similar to this docstring:
def backup_existing(self, dfn):
"""
Back up an existing file with the same name `dfn` as a destination file for rollover.
This method must find a unique filename in the same directory, related to `dfn`,
and rename the existing file to that name.
This implementation appends ' (N)' to `dfn`, where N is an integer going up from 1
until an integer value N is found such that no existing file with that ' (N)' suffix is found.
"""
Having a separate method allows someone with a custom scheme for naming logfiles (using the namer
attribute) to adapt their backup to that scheme by overriding the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being slow, I just wanted to clear make sure I have the correct understanding. I agree that the having a separate backupExisting
method is a good idea (probably backupExisting
instead of backup_existing
for consistency with the rest of the module?)
I wasn't 100% sure on the behaviour of the method though, from my understanding on the rollover, the new file should be created with a suffix, from the docstring above, it seems like you want to rename the existing file to have a suffix?
as an example, I have the following log files
test.log
========
[INFO]: foo
test.log.2021-08-02
========
[INFO]: bar
for all the examples below, I'll denote that structure as
test.log # foo
test.log.2021-08-02 # bar
I was imagining the behaviour below:
test.log # baz
test.log.2021-08-02 # bar
test.log.2021-08-02 (1) # foo
This would mean that for n files, the latest file would always have the highest suffix. The current log would obviously have no suffix, the oldest rollover log would only have the date, and the following files would have (1), ... (N) in the order that they were generated.
From reading your comment, implemeting that method would yield the behaviour below:
test.log # baz
test.log.2021-08-02 # foo
test.log.2021-08-02 (1) # bar
where the most recent file would rename the already existing file in the directory and "bump" it up a number.
Personally I prefer the first method as implementing the method this way could cause some ambiguity i.e. in the case of a directory with suffixes (1)
, (2)
, (3)
, what happens to the newest file? The newest file gets the suffix (1)
, and then does the already existing (1)
get bumped to (4)
, or does it get bumped to (2)
, and then (2)
-> (3)
and so on.
I'm not sure how this different logic affects the other issues you linked, I had a look at them but I need to do some testing to see the results myself as I couldn't 100% follow just the PR and bpo. I'll try and implement the logic and update this PR tonight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't rush to update the PR, because maybe we should think about things some more. I'm not able to say much at the minute as I'm pressed for time, but I was thinking along the lines of: a target file that exists when rolling over should be an unusual case, not a common case (This issue hasn't come up in the many years since TimedRotatingFileHandler
was added, IIRC). So, my proposed approach treats the existing file as just a backup (so stuff doesn't get lost), but once it's renamed, it's cleared the way for the expected dfn
to be used, and the "usual" scheme applies from then on for log files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(probably backupExisting instead of backup_existing for consistency with the rest of the module?
Yes, good point. The original logging module was written back before PEP8, and kept the old-style naming for backward compatibility. So backupExisting
is right.
Thank you for your contribution @Harry-Lees, but this issue was fixed in other way in #116191. |
When TimedRotatingFileHandler rotates, it deletes any files with the target name, this is especially noticeable if the rollover time is midnight as it will potentially overwrite an entire days logs.
https://bugs.python.org/issue44186