Skip to content

bpo-44186: prevent TimedRotatingFileHandler overwriting log files #26553

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

Harry-Lees
Copy link
Contributor

@Harry-Lees Harry-Lees commented Jun 5, 2021

When TimedRotatingFileHandler rotates, it deletes any files with the target name, this is especially noticeable if the rollover time is midnight as it will potentially overwrite an entire days logs.

https://bugs.python.org/issue44186

TimedRotatingFileHandler deletes any file with the destination name when it rotates, this change adds a number to the end of the filename to prevent conflict.
Copy link
Member

@vsajip vsajip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, but it would need tests and documentation updates before it could be merged.

@@ -255,6 +255,7 @@ def __init__(self, filename, when='h', interval=1, backupCount=0,
raise ValueError("Invalid rollover interval specified: %s" % self.when)

self.extMatch = re.compile(self.extMatch, re.ASCII)
self.file_number = re.compile(r'(\()([0-9])(\))$')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to limit to just one digit? Also, why do we need capturing groups for the parentheses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to limit to just one digit?

no, this was an oversight on my part, thanks.

Also, why do we need capturing groups for the parentheses?

Me from 10 days ago had a good reason I think :D. I think my thought process was having a group like
group = ['(', '12', ')'] would be easier to parse the number out as you can get the number from group[1] but the same effect could be had from match[1:-1] if the parenthesis weren't in a separate group. Funnily enough this situation would never arise thanks to the first issue you pointed out.

I'll look at this again when I get back from work to see If I've missed anything.

name = re.sub(self.file_number, f'({int(match.groups()[1]) + 1})', dfn)
dfn = self.rotation_filename(name)
else:
dfn = self.rotation_filename(self.baseFilename + '.' + time.strftime(self.suffix, timeTuple) + ' (2)')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the + ' (2)' ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was attempting to mimic the default behaviour I saw from Windows 10, in Windows if you create a new document with a duplicate name it will create the documents like this:

New Document
New Document (2)
New Document (3)

On my Mac it omits the parenthesis and creating new documents yields a naming scheme like

New Document
New Document 2
New Document 3

I personally preferred the bracketed naming scheme but that is really just my personal preference. I don't know of anywhere else this pops up in the standard library but if there's a convention that I'm not following I'd be happy to rework that.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@Harry-Lees
Copy link
Contributor Author

Thanks for the PR, but it would need tests and documentation updates before it could be merged.

of course, I just wanted to see if this would be helpful before leaping in with the test/ docs changes.

@Harry-Lees
Copy link
Contributor Author

While I was writing the tests for this PR I think I uncovered a separate bug. I'm not sure when this would ever come up in real world code as in all of the logging docs and examples you only instantiate one logger and get it in different locations, but if you have multiple TimedRotatingFileHandlers with the same filename, Windows locks the file and doesn't allow emitting messages from either of the file handlers. I haven't done extensive testing on this at all, this is just a small example (I ran the same code on this branch, Python3.9, and Python3.10 so I'm sure it wasn't something I broke).

import datetime
import logging.handlers
import logging
import time

fn = 'test_file'
now = datetime.datetime.now()
handlers = [
    logging.handlers.TimedRotatingFileHandler(
        fn, 'midnight', atTime=now, encoding='utf-8', backupCount=1),
    logging.handlers.TimedRotatingFileHandler(
        fn, 'midnight', atTime=now, encoding='utf-8', backupCount=1),
    logging.handlers.TimedRotatingFileHandler(
        fn, 'midnight', atTime=now, encoding='utf-8', backupCount=1),
]

r = logging.makeLogRecord({'msg': 'test msg'})

handlers[0].emit(r)

The above program yields this error.

--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Harry\AppData\Local\Programs\Python\Python310\lib\logging\handlers.py", line 74, in emit
    self.doRollover()
  File "C:\Users\Harry\AppData\Local\Programs\Python\Python310\lib\logging\handlers.py", line 403, in doRollover
    self.rotate(self.baseFilename, dfn)
  File "C:\Users\Harry\AppData\Local\Programs\Python\Python310\lib\logging\handlers.py", line 115, in rotate
    os.rename(source, dest)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\Harry\\AppData\\Local\\Programs\\Python\\Python39\\test_file' -> 'C:\\Users\\Harry\\AppData\\Local\\Programs\\Python\\Python39\\test_file.2021-06-20'
Call stack:
  File "<string>", line 1, in <module>
  File "C:\Users\Harry\AppData\Local\Programs\Python\Python310\lib\idlelib\run.py", line 155, in main
    ret = method(*args, **kwargs)
  File "C:\Users\Harry\AppData\Local\Programs\Python\Python310\lib\idlelib\run.py", line 555, in runcode
    exec(code, self.locals)
  File "C:\Users\Harry\AppData\Local\Programs\Python\Python39\test.py", line 20, in <module>
    handlers[0].emit(r)
Message: 'test msg'
Arguments: ()

I'll open a new bpo with this issue if it's unexpected behaviour.

Regarding this PR I changed the existing test to simply touch a file and then check if the TimedRotatingFileHandler recognises the existing file which was the 2nd best option I could think of.

@Harry-Lees Harry-Lees requested a review from vsajip June 21, 2021 22:42
@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jul 30, 2021
@vsajip
Copy link
Member

vsajip commented Aug 2, 2021

but if you have multiple TimedRotatingFileHandlers with the same filename, Windows locks the file and doesn't allow emitting messages from either of the file handlers

That's not a bug - you're not supposed to do this. See https://docs.python.org/3/howto/logging-cookbook.html#patterns-to-avoid

I've not had a chance to look at this for a while, sorry.

@vsajip vsajip removed the stale Stale PR or inactive for long period of time. label Aug 2, 2021
@@ -382,7 +382,8 @@ timed intervals.
The system will save old log files by appending extensions to the filename.
The extensions are date-and-time based, using the strftime format
``%Y-%m-%d_%H-%M-%S`` or a leading portion thereof, depending on the
rollover interval.
rollover interval. In the case that an existing file has the same name,
a numbered suffix will be added to the new file to prevent overwriting logs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at other issues recently - bpo-43344, bpo-44753 - so this solution might not work in those scenarios. Specifcally, in scenarios where the user wants to keep the suffix as ".log" or similar. After changes to address those issues, rollover logic expects rolled-over files to start with the base filename (minus extension) + ".", and contain self.extMatch. I've also added comments on the issue itself.

@@ -401,8 +402,12 @@ def doRollover(self):
timeTuple = time.localtime(t + addend)
dfn = self.rotation_filename(self.baseFilename + "." +
time.strftime(self.suffix, timeTuple))
if os.path.exists(dfn):
os.remove(dfn)
while os.path.exists(dfn):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be more future-proof if it were structured as:

if os.path.exists(dfn):
    self.backup_existing(dfn)

and the new backup_existing method would have the behaviour similar to this docstring:

def backup_existing(self, dfn):
    """
    Back up an existing file with the same name `dfn` as a destination file for rollover.

    This method must find a unique filename in the same directory, related to `dfn`,
    and rename the existing file to that name.

    This implementation appends ' (N)' to `dfn`, where N is an integer going up from 1
    until an integer value N is found such that no existing file with that ' (N)' suffix is found.
    """

Having a separate method allows someone with a custom scheme for naming logfiles (using the namer attribute) to adapt their backup to that scheme by overriding the method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being slow, I just wanted to clear make sure I have the correct understanding. I agree that the having a separate backupExisting method is a good idea (probably backupExisting instead of backup_existing for consistency with the rest of the module?)

I wasn't 100% sure on the behaviour of the method though, from my understanding on the rollover, the new file should be created with a suffix, from the docstring above, it seems like you want to rename the existing file to have a suffix?

as an example, I have the following log files

test.log
========
[INFO]: foo

test.log.2021-08-02
========
[INFO]: bar

for all the examples below, I'll denote that structure as

test.log # foo
test.log.2021-08-02 # bar

I was imagining the behaviour below:

test.log # baz
test.log.2021-08-02 # bar
test.log.2021-08-02 (1) # foo

This would mean that for n files, the latest file would always have the highest suffix. The current log would obviously have no suffix, the oldest rollover log would only have the date, and the following files would have (1), ... (N) in the order that they were generated.

From reading your comment, implemeting that method would yield the behaviour below:

test.log # baz
test.log.2021-08-02 # foo
test.log.2021-08-02 (1) # bar

where the most recent file would rename the already existing file in the directory and "bump" it up a number.

Personally I prefer the first method as implementing the method this way could cause some ambiguity i.e. in the case of a directory with suffixes (1), (2), (3), what happens to the newest file? The newest file gets the suffix (1), and then does the already existing (1) get bumped to (4), or does it get bumped to (2), and then (2) -> (3) and so on.

I'm not sure how this different logic affects the other issues you linked, I had a look at them but I need to do some testing to see the results myself as I couldn't 100% follow just the PR and bpo. I'll try and implement the logic and update this PR tonight.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't rush to update the PR, because maybe we should think about things some more. I'm not able to say much at the minute as I'm pressed for time, but I was thinking along the lines of: a target file that exists when rolling over should be an unusual case, not a common case (This issue hasn't come up in the many years since TimedRotatingFileHandler was added, IIRC). So, my proposed approach treats the existing file as just a backup (so stuff doesn't get lost), but once it's renamed, it's cleared the way for the expected dfn to be used, and the "usual" scheme applies from then on for log files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(probably backupExisting instead of backup_existing for consistency with the rest of the module?

Yes, good point. The original logging module was written back before PEP8, and kept the old-style naming for backward compatibility. So backupExisting is right.

@serhiy-storchaka
Copy link
Member

Thank you for your contribution @Harry-Lees, but this issue was fixed in other way in #116191.

@Harry-Lees Harry-Lees deleted the bpo-44186 branch March 2, 2024 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants