-
Notifications
You must be signed in to change notification settings - Fork 182
Check if the cached temp conf file still exists before using it. #38
Conversation
On some systems older unused temp files get cleaned up regularly. So, we cannot rely on the availability of the temp_files in long running processes.
Codecov Report
@@ Coverage Diff @@
## master #38 +/- ##
=======================================
Coverage 93.36% 93.36%
=======================================
Files 11 11
Lines 829 829
=======================================
Hits 774 774
Misses 55 55
Continue to review full report at Codecov.
|
@@ -49,7 +49,7 @@ def _create_temp_file_with_content(content): | |||
# Because we may change context several times, try to remember files we | |||
# created and reuse them at a small memory cost. | |||
content_key = str(content) | |||
if content_key in _temp_files: | |||
if content_key in _temp_files and os.path.isfile(_temp_files[content_key]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is still a small window for a race here:
- after the check of is_file, the file may be deleted (before line 53)
- the file may be deleted after the function returns, but before the file is consumed by the caller.
There should be a more generic solution to this, or the tracking of the file should be delegated to the caller. Possible solutions include:
- 'Semantically locking' the file so that deletion will not remove the file: not ideal as the caller should clean up
- Modifying the system so that temporary files are not cleared: to be done by the caller.
Hence I think this responsibility of ensuring that the file exists should be delegated to the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's absolutely correct @arunmk. But, what you suggest is a way bigger change. The temp file names are being passed all over and beyond the scope of this module (kube_config), so the changes have to go all the way up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the situation is such that once you get into a bad condition, you can never get out without a restart of the python process. But, with this change, most such cases should get handled and even the one that you pointed can only cause a one-time exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use the $TMPDIR, I don't see a way out that's better than the solution that you have here @amitsrivastava . Could you also add an API documentation change that says something to the effect of: 'The file that the API emits may not exist due to cleanup policy of $TMPDIR. In that case, the pattern is to retry the API until the file exists." That will clarify the usage of the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Temporary files can be safe deleted only if these files are not accessed by processes for a long time. For example tmpwatch
(popular on Red Hat) works in this way. Because of this I suggest updating mtime to prevent deleting files. It also removes race condition.
if content_key in _temp_files:
try:
os.utime(_temp_files[content_key], None)
return _temp_files[content_key] # file exists and mtime has been updated
except OSError as err:
if err.errno == errno.ENOENT:
# ups, file disappeared, has to be recreated
pass
else:
raise
How does it look ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amitsrivastava @tomplus it looks like the /run directory is better for such purposes. From http://www.h-online.com/open/news/item/Linux-distributions-to-include-run-directory-1219006.html:
"
On the Fedora project's developer list, systemd developer Lennart Poettering has announced the introduction of a /run directory in the root directory and provided detailed background explanations. Similar to the existing /var/run/ directory, the new directory is designed to allow applications to store the data they require in order to operate. This includes process IDs, socket information, lock files and other data which is required at run-time but can't be stored in /tmp/ because programs such as tmpwatch could potentially delete it from there.
"
So if feasible, the /var/run or the /run directory may be a better location to create these temporary files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is keeping the temp file open a bad idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Any update on this issue? Looks like it is not merged. How to address this issue ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am facing the same issue as well, what would be the best way to work around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I choose store temp in another directory.
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
Hey @amitsrivastava, any update on this? I can get someone to review it afterwards. Since this PR has been open for a while, if it is no longer valid please close it. |
@scottilee @amitsrivastava I can take this forward and create a patch if you are fine by it |
any updates on this? |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
/remove-lifecycle rotten |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
/remove-lifecycle rotten |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
I am seeing several PRs are linked here as related and some of them as merged. Is this issue resolved ? If so can you please share in which version is this fixed ? Thank you |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-contributor-experience at kubernetes/community. |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
On some systems older unused temp files get cleaned up regularly. So,
we cannot rely on the availability of the temp_files in long running
processes.