
Report leaking environment variables in tests #5872


Merged: 11 commits merged on Sep 24, 2021


awaelchli
Contributor

@awaelchli awaelchli commented Feb 8, 2021

What does this PR do?

Lightning currently "pollutes" the environment with variables. To some degree this is necessary, for example for multiprocessing, but unfortunately the variables are not cleaned up properly, and it is not always obvious where and what to clean up.

We currently have this fixture which will restore the environment after each test:

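import os

import pytest


@pytest.fixture(scope="function", autouse=True)
def restore_env_variables():
    # Snapshot the environment before the test runs
    env_backup = os.environ.copy()
    yield
    # Drop everything the test set or changed, then restore the snapshot
    os.environ.clear()
    os.environ.update(env_backup)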

This PR turns this fixture into an assertion: each test must leave the environment unchanged.
For now, the fixture serves to identify which tests are leaking. Once all the issues are fixed, we can enable it for every test.
To solve this, we need proper teardown logic for the Trainer, Accelerator, Plugins, Logger, etc.
Fixes #5757

This PR reports any new leaking environment variables.
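The reporting logic can be sketched as a context manager that snapshots `os.environ`, runs the test body, collects the variables that were added, restores the original environment and only then fails. This is a minimal sketch, not the fixture that was merged: it assumes that only *newly added* variables count as leaks (modified or deleted ones are restored but not reported), and the name `report_env_leaks` is hypothetical. The message mirrors the format of the errors reported later in this thread.

```python
import os
from contextlib import contextmanager


@contextmanager
def report_env_leaks():
    """Hypothetical helper: fail if the wrapped code adds environment variables.

    Assumption: only variables that did not exist before count as leaks;
    changed or removed variables are restored but not reported.
    """
    env_backup = os.environ.copy()
    try:
        yield
    finally:
        # Iterate os.environ in insertion order so the report lists
        # variables in the order they were set.
        leaked = [key for key in os.environ if key not in env_backup]
        # Always restore first, so a failing test cannot affect the next one.
        os.environ.clear()
        os.environ.update(env_backup)
    # Only reached if the body did not raise; report any leaks.
    assert not leaked, f"test is leaking environment variable(s): {', '.join(leaked)}"
```

In a `conftest.py`, the autouse fixture would then simply wrap its `yield` in `with report_env_leaks():`. Restoring before asserting keeps one leaking test from cascading into failures in every test that runs after it.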

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Check that target branch and milestone match!

Did you have fun?

Make sure you had fun coding 🙃

@awaelchli awaelchli added the ci Continuous Integration label Feb 8, 2021
@codecov

codecov bot commented Feb 8, 2021

Codecov Report

Merging #5872 (9f00db2) into master (8dcba38) will increase coverage by 0%.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #5872   +/-   ##
======================================
  Coverage      93%     93%           
======================================
  Files         179     179           
  Lines       15325   15329    +4     
======================================
+ Hits        14218   14222    +4     
  Misses       1107    1107           

@awaelchli awaelchli added this to the 1.2 milestone Feb 8, 2021
@awaelchli awaelchli changed the title fixture for clearing PL environment variables in tests fixture for restoring the environment variables after each test Feb 11, 2021
Base automatically changed from release/1.2-dev to master February 11, 2021 14:32
@awaelchli awaelchli closed this Feb 12, 2021
@awaelchli awaelchli reopened this Feb 12, 2021
@Borda Borda modified the milestones: 1.2, 1.2.x Feb 18, 2021
@awaelchli awaelchli force-pushed the ci/clear_environment branch from d01d690 to 8a3052a Compare February 21, 2021 21:52
@awaelchli awaelchli changed the title fixture for restoring the environment variables after each test fix leaking environment variables in tests Feb 21, 2021
@Borda Borda modified the milestones: 1.2.x, 1.3 Apr 18, 2021
@edenlightning edenlightning removed this from the v1.3 milestone May 4, 2021
@Borda
Member

Borda commented May 11, 2021

@awaelchli how is it going here, still WIP or ready to go? 🐰

@stale

stale bot commented May 30, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.

@stale stale bot added the won't fix This will not be worked on label May 30, 2021
@carmocca carmocca added this to the v1.4 milestone May 31, 2021
@stale stale bot removed the won't fix This will not be worked on label May 31, 2021
@edenlightning edenlightning removed this from the v1.4 milestone Jul 6, 2021
@Borda
Member

Borda commented Sep 23, 2021

It seems this is quite an old PR, several hundred commits behind master. Consider finishing it or closing it, as the conflicts will most likely make the PR challenging to finish... 🐰

@awaelchli awaelchli closed this Sep 23, 2021
@carmocca
Contributor

Why couldn't we finish this? Is it just a matter of addressing each leaking test? Or is there a change in Lightning required?

@awaelchli awaelchli restored the ci/clear_environment branch September 23, 2021 14:32
@awaelchli awaelchli deleted the ci/clear_environment branch September 23, 2021 14:33
@awaelchli awaelchli restored the ci/clear_environment branch September 23, 2021 14:35
@awaelchli
Contributor Author

awaelchli commented Sep 23, 2021

@carmocca Yes, it would require cleaning up certain environment variables that get set. I'm not sure whether that is worth doing in Lightning. Right now, many tests would error with this fixture, for example because variables like PL_GLOBAL_SEED, MASTER_ADDR, etc. survive from one test to the next.
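For a single test, one way to keep such variables from surviving, without changes in Lightning, is to run the code that sets them inside `unittest.mock.patch.dict(os.environ)`. That standard-library context manager copies the mapping on entry and restores it on exit, which also removes keys added inside the block. A minimal sketch, assuming the code under test writes `MASTER_ADDR`/`MASTER_PORT` directly to `os.environ`; `fake_launcher_setup` is a hypothetical stand-in for Lightning's own setup code, not a real Lightning function.

```python
import os
from unittest import mock


def fake_launcher_setup():
    """Hypothetical stand-in for code that exports distributed settings."""
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    return os.environ["MASTER_ADDR"], os.environ["MASTER_PORT"]


def run_without_leaking():
    # patch.dict snapshots os.environ on entry and restores it on exit,
    # so variables set inside the block do not survive into the next test.
    with mock.patch.dict(os.environ):
        return fake_launcher_setup()
```

The trade-off is that every leaking test has to opt in, whereas proper teardown in the Trainer, plugins, etc. would fix the leak at its source.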

However, master currently has a fixture that resets the os.environ dict to the state it was in before each test.

Examples:

ERROR tests/trainer/test_data_loading.py::test_replace_distributed_sampler[1] - AssertionError: test is leaking environment variable(s): MASTER_ADDR, MASTER_PORT, NODE_RANK, LOCAL_RANK, WORLD_SIZE
ERROR tests/trainer/test_data_loading.py::test_replace_distributed_sampler[2] - AssertionError: test is leaking environment variable(s): MASTER_ADDR, MASTER_PORT, NODE_RANK, LOCAL_RANK
ERROR tests/trainer/test_data_loading.py::test_replace_distributed_sampler[3] - AssertionError: test is leaking environment variable(s): MASTER_ADDR, MASTER_PORT, NODE_RANK, LOCAL_RANK
ERROR tests/trainer/test_dataloaders.py::test_auto_add_worker_init_fn - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS
ERROR tests/trainer/test_dataloaders.py::test_auto_add_worker_init_fn_distributed - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS, CUDA_DEVICE_ORDER, MASTER_PORT
ERROR tests/trainer/test_dataloaders.py::test_dataloader_distributed_sampler - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS, CUDA_DEVICE_ORDER, MASTER_PORT
ERROR tests/trainer/test_dataloaders.py::test_dataloader_distributed_sampler_already_attached - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS, CUDA_DEVICE_ORDER, MASTER_PORT
ERROR tests/trainer/test_dataloaders.py::test_batch_size_smaller_than_num_gpus - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER, MASTER_PORT
ERROR tests/trainer/test_trainer.py::test_loading_meta_tags - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS
ERROR tests/trainer/test_trainer.py::test_loading_yaml - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS
ERROR tests/trainer/test_trainer.py::test_gradient_clipping_by_norm[32] - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS, CUDA_DEVICE_ORDER
ERROR tests/trainer/test_trainer.py::test_gradient_clipping_by_norm[16] - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS, CUDA_DEVICE_ORDER
ERROR tests/trainer/test_trainer.py::test_gradient_clipping_by_value[32] - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS, CUDA_DEVICE_ORDER
ERROR tests/trainer/test_trainer.py::test_gradient_clipping_by_value[16] - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS, CUDA_DEVICE_ORDER
ERROR tests/trainer/test_trainer.py::test_trainer_predict_ddp_cpu - AssertionError: test is leaking environment variable(s): MASTER_PORT
ERROR tests/trainer/test_trainer.py::test_predict_return_predictions_cpu[32-None] - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS
ERROR tests/trainer/test_trainer.py::test_predict_return_predictions_cpu[32-False] - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS
ERROR tests/trainer/test_trainer.py::test_predict_return_predictions_cpu[32-True] - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS
ERROR tests/trainer/test_trainer.py::test_predict_return_predictions_cpu[64-None] - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS
ERROR tests/trainer/test_trainer.py::test_predict_return_predictions_cpu[64-False] - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS
ERROR tests/trainer/test_trainer.py::test_predict_return_predictions_cpu[64-True] - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS
ERROR tests/trainer/test_trainer.py::test_setup_hook_move_to_device_correctly - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER
ERROR tests/trainer/test_trainer.py::test_model_in_correct_mode_during_stages[ddp_cpu-2] - AssertionError: test is leaking environment variable(s): MASTER_PORT
ERROR tests/trainer/test_trainer.py::test_fit_test_synchronization - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS, MASTER_PORT
ERROR tests/trainer/test_trainer.py::test_multiple_trainer_constant_memory_allocated - AssertionError: test is leaking environment variable(s): MASTER_ADDR, MASTER_PORT, NODE_RANK, LOCAL_RANK, CUDA_DEVICE_ORDER
ERROR tests/trainer/test_trainer.py::test_error_handling_all_stages[ddp_cpu-2] - AssertionError: test is leaking environment variable(s): MASTER_PORT
ERROR tests/trainer/logging_/test_distributed_logging.py::test_all_rank_logging_ddp_cpu - AssertionError: test is leaking environment variable(s): MASTER_PORT
ERROR tests/trainer/logging_/test_distributed_logging.py::test_all_rank_logging_ddp_spawn - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER, MASTER_PORT
ERROR tests/trainer/logging_/test_logger_connector.py::test_epoch_results_cache_dp - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER
ERROR tests/trainer/logging_/test_train_loop_logging.py::test_logging_sync_dist_true[1] - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER
ERROR tests/trainer/logging_/test_train_loop_logging.py::test_logging_sync_dist_true[2] - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER, MASTER_PORT
ERROR tests/trainer/logging_/test_train_loop_logging.py::test_metric_are_properly_reduced - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER
ERROR tests/trainer/logging_/test_train_loop_logging.py::test_log_gpu_memory_without_logging_on_step[all] - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER
ERROR tests/trainer/logging_/test_train_loop_logging.py::test_log_gpu_memory_without_logging_on_step[min_max] - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER
ERROR tests/trainer/logging_/test_train_loop_logging.py::test_move_metrics_to_cpu - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER
ERROR tests/trainer/optimization/test_manual_optimization.py::test_multiple_optimizers_manual_no_return[kwargs1] - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER
ERROR tests/trainer/optimization/test_manual_optimization.py::test_multiple_optimizers_manual_native_amp - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER
ERROR tests/trainer/optimization/test_manual_optimization.py::test_manual_optimization_and_return_tensor - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER, MASTER_PORT
ERROR tests/trainer/optimization/test_manual_optimization.py::test_manual_optimization_and_accumulated_gradient - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS, CUDA_DEVICE_ORDER
ERROR tests/trainer/optimization/test_manual_optimization.py::test_multiple_optimizers_step - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER
ERROR tests/trainer/optimization/test_manual_optimization.py::test_step_with_optimizer_closure_with_different_frequencies_ddp_spawn - AssertionError: test is leaking environment variable(s): PL_GLOBAL_SEED, PL_SEED_WORKERS, CUDA_DEVICE_ORDER, MASTER_PORT
ERROR tests/trainer/optimization/test_manual_optimization.py::test_multiple_optimizers_logging[16] - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER
ERROR tests/trainer/optimization/test_manual_optimization.py::test_multiple_optimizers_logging[32] - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER
ERROR tests/trainer/properties/test_get_model.py::test_get_model_ddp_cpu - AssertionError: test is leaking environment variable(s): MASTER_PORT
ERROR tests/trainer/properties/test_get_model.py::test_get_model_gpu - AssertionError: test is leaking environment variable(s): CUDA_DEVICE_ORDER

@carmocca
Contributor

We could whitelist these for now:

CUDA_DEVICE_ORDER
LOCAL_RANK
MASTER_ADDR
MASTER_PORT
NODE_RANK
PL_GLOBAL_SEED
PL_SEED_WORKERS
WORLD_SIZE
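Applying such a whitelist amounts to filtering the newly added variables against the tolerated names, so that only leaks outside the list are reported. A minimal sketch, assuming the environment snapshots before and after the test are available as mappings; `KNOWN_LEAKERS` and `unexpected_leaks` are hypothetical names.

```python
# Hypothetical whitelist, taken from the list above.
KNOWN_LEAKERS = frozenset({
    "CUDA_DEVICE_ORDER",
    "LOCAL_RANK",
    "MASTER_ADDR",
    "MASTER_PORT",
    "NODE_RANK",
    "PL_GLOBAL_SEED",
    "PL_SEED_WORKERS",
    "WORLD_SIZE",
})


def unexpected_leaks(before, after, allowlist=KNOWN_LEAKERS):
    """Return names present in `after` but not in `before`, minus the allowlist.

    The result keeps the iteration order of `after`, so reports list the
    variables in the order they were set.
    """
    return [key for key in after if key not in before and key not in allowlist]
```

Keeping the list explicit makes it easy to shrink later: each time a leak is fixed in Lightning, its name can be removed from the list and the check becomes stricter.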

@carmocca carmocca reopened this Sep 23, 2021
@carmocca carmocca changed the title fix leaking environment variables in tests Report leaking environment variables in tests Sep 23, 2021
@carmocca carmocca added this to the v1.5 milestone Sep 23, 2021
@carmocca carmocca marked this pull request as ready for review September 23, 2021 16:00
Co-authored-by: Adrian Wälchli
@mergify mergify bot added the ready PRs ready to be merged label Sep 24, 2021
Contributor

@tchaton tchaton left a comment


LGTM !

@awaelchli awaelchli merged commit 714331b into master Sep 24, 2021
@awaelchli awaelchli deleted the ci/clear_environment branch September 24, 2021 10:39
Labels
ci (Continuous Integration), ready (PRs ready to be merged)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

strange behavior with tests in PL: tests influence each other
6 participants