This repository was archived by the owner on Mar 13, 2022. It is now read-only.

Check if the cached temp conf file still exists before using it. #38

Closed

Conversation

@amitsrivastava

On some systems, older unused temp files get cleaned up regularly, so we
cannot rely on the availability of the temp files in long-running
processes.

@codecov-io

Codecov Report

Merging #38 into master will not change coverage.
The diff coverage is 100%.


@@           Coverage Diff           @@
##           master      #38   +/-   ##
=======================================
  Coverage   93.36%   93.36%           
=======================================
  Files          11       11           
  Lines         829      829           
=======================================
  Hits          774      774           
  Misses         55       55
Impacted Files           Coverage   Δ
config/kube_config.py    91.86%     <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9213876...3d91dcd.

@@ -49,7 +49,7 @@ def _create_temp_file_with_content(content):
     # Because we may change context several times, try to remember files we
     # created and reuse them at a small memory cost.
     content_key = str(content)
-    if content_key in _temp_files:
+    if content_key in _temp_files and os.path.isfile(_temp_files[content_key]):
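
For context, a minimal sketch of the patched helper with the existence check in place (the mkstemp/atexit surroundings here are assumptions based on kube_config.py, not part of this diff):

    import atexit
    import os
    import tempfile

    _temp_files = {}

    def _cleanup_temp_files():
        for temp_file in _temp_files.values():
            try:
                os.remove(temp_file)
            except OSError:
                pass

    def _create_temp_file_with_content(content):
        if len(_temp_files) == 0:
            atexit.register(_cleanup_temp_files)
        # Because we may change context several times, try to remember files we
        # created and reuse them at a small memory cost.
        content_key = str(content)
        # The patched check: reuse the cached path only if it still exists.
        if content_key in _temp_files and os.path.isfile(_temp_files[content_key]):
            return _temp_files[content_key]
        fd, name = tempfile.mkstemp()
        os.close(fd)
        with open(name, 'wb') as f:
            f.write(content.encode() if isinstance(content, str) else content)
        _temp_files[content_key] = name
        return name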

@arunmk commented Nov 4, 2017


There is still a small window for a race here:

  1. after the isfile check, the file may be deleted (before line 53)
  2. the file may be deleted after the function returns, but before it is consumed by the caller.

There should be a more generic solution to this, or the tracking of the file should be delegated to the caller. Possible solutions include:

  1. 'semantically locking' the file so that cleanup will not remove it: not ideal, as the caller should clean up
  2. modifying the system so that temporary files are not cleared: to be done by the caller.

Hence I think the responsibility of ensuring that the file exists should be delegated to the caller.
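
To make the window concrete, a sketch of where the deletion can still slip in (the get_cached helper is hypothetical, standing in for the cached lookup in kube_config):

    import os

    _temp_files = {}  # content_key -> path, as in kube_config

    def get_cached(content_key):
        if content_key in _temp_files and os.path.isfile(_temp_files[content_key]):
            # (1) a cleaner such as tmpwatch may delete the file right here,
            #     after the isfile check but before the return
            return _temp_files[content_key]
        return None

    # (2) even after a path is returned, the file may be deleted before
    #     the caller actually opens it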

@amitsrivastava (Author)

That's absolutely correct @arunmk. But what you suggest is a much bigger change. The temp file names are passed all over, beyond the scope of this module (kube_config), so the changes would have to go all the way up the call chain.

@amitsrivastava (Author)

Currently, the situation is such that once you get into this bad state, you can never get out without restarting the Python process. With this change, most such cases are handled, and even the one you pointed out can only cause a one-time exception.


If we use $TMPDIR, I don't see a way out that's better than the solution you have here, @amitsrivastava. Could you also add an API documentation change that says something to the effect of: 'The file that the API emits may not exist due to the cleanup policy of $TMPDIR. In that case, the pattern is to retry the API until the file exists.' That will clarify the usage of the API.
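
A minimal sketch of that caller-side retry pattern (the wrapper name, retry count, and delay are illustrative assumptions, not a proposed API):

    import os
    import time

    def config_file_with_retry(content, attempts=3, delay=0.1):
        # _create_temp_file_with_content is the kube_config helper patched above
        for _ in range(attempts):
            path = _create_temp_file_with_content(content)
            if os.path.isfile(path):
                return path
            time.sleep(delay)  # file vanished between creation and use; retry
        raise RuntimeError("temp conf file keeps disappearing; check $TMPDIR cleanup")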

(Member)

Temporary files can be safely deleted only if they have not been accessed by any process for a long time; for example, tmpwatch (popular on Red Hat) works this way. Because of this, I suggest updating the mtime to prevent the files from being deleted. It also removes the race condition.

    # assumes `import errno` and `import os` at module level
    if content_key in _temp_files:
        try:
            os.utime(_temp_files[content_key], None)
            return _temp_files[content_key]  # file exists and mtime has been updated
        except OSError as err:
            if err.errno == errno.ENOENT:
                # oops, the file disappeared and has to be recreated
                pass
            else:
                raise

How does it look?


@amitsrivastava @tomplus it looks like the /run directory is better for such purposes. From http://www.h-online.com/open/news/item/Linux-distributions-to-include-run-directory-1219006.html:

"On the Fedora project's developer list, systemd developer Lennart Poettering has announced the introduction of a /run directory in the root directory and provided detailed background explanations. Similar to the existing /var/run/ directory, the new directory is designed to allow applications to store the data they require in order to operate. This includes process IDs, socket information, lock files and other data which is required at run-time but can't be stored in /tmp/ because programs such as tmpwatch could potentially delete it from there."

So, if feasible, the /var/run or /run directory may be a better location for these temporary files.
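
A sketch of what that could look like (using XDG_RUNTIME_DIR, which typically points at /run/user/<uid> on systemd systems; the fallback behaviour is an assumption, not library code):

    import os
    import tempfile

    # Prefer a runtime dir that tmpwatch-style cleaners leave alone;
    # tempfile falls back to $TMPDIR when dir is None.
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR")
    fd, name = tempfile.mkstemp(dir=runtime_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(b"kubeconfig content placeholder")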

(Contributor)

Is keeping the temp file open a bad idea?


Hi, any update on this issue? It looks like it was not merged. How should we address this issue?


I am facing the same issue as well; what would be the best way to work around it?

(Contributor)

I chose to store the temp files in another directory.

@fejta-bot

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/check-cla

@k8s-ci-robot added the cncf-cla: yes label Apr 20, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 20, 2019
@tbarrella

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jul 26, 2019
@scottilee

Hey @amitsrivastava, any update on this? I can get someone to review it afterwards.

Since this PR has been open for a while, if it is no longer valid please close it.

@arunmk commented Sep 27, 2019

@scottilee @amitsrivastava I can take this forward and create a patch if you are fine with it.

@pigletfly

Any updates on this?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jan 22, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Feb 21, 2020
@pigletfly

/remove-lifecycle stale

@pigletfly

/remove-lifecycle rotten

@k8s-ci-robot removed the lifecycle/rotten label Feb 21, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label May 21, 2020
@arunmk commented May 21, 2020

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label May 21, 2020
@arunmk commented May 21, 2020

/remove-lifecycle rotten

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Oct 8, 2020
@arunmk commented Oct 8, 2020

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Oct 8, 2020
@ceroloy commented Oct 9, 2020

I see several PRs linked here as related, and some of them are marked as merged. Is this issue resolved? If so, can you please share which version fixes it?

Thank you

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 11, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Aug 10, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels: cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.), lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.)