Honestly, I'm not sure if this is an issue with this library or with PyTorch Lightning itself.
When I run my neural network training code with --gpus=4 and --accelerator="ddp", 4 runs are created in the Neptune run list, but only the first one has any metrics logged in it.
The output I get is:
/home/ssm-user/train_script/venv/lib/python3.6/site-packages/pytorch_lightning/metrics/__init__.py:44: LightningDeprecationWarning: `pytorch_lightning.metrics.*` module has been renamed to `torchmetrics.*` and split off to its own package (https://github.com/PyTorchLightning/metrics) since v1.3 and will be removed inv1.5
"`pytorch_lightning.metrics.*` module has been renamed to `torchmetrics.*` and split off to its own package"
Global seed set to 101
https://app.neptune.ai/reuven/my-project-name/e/LOG-229
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
Global seed set to 101
https://app.neptune.ai/reuven/my-project-name/e/LOG-230
Global seed set to 101
initializing ddp: GLOBAL_RANK: 1, MEMBER: 2/4
Global seed set to 101
https://app.neptune.ai/reuven/my-project-name/e/LOG-231
Global seed set to 101
initializing ddp: GLOBAL_RANK: 2, MEMBER: 3/4
Global seed set to 101
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/4
Global seed set to 101
https://app.neptune.ai/reuven/my-project-name/e/LOG-232
Global seed set to 101
initializing ddp: GLOBAL_RANK: 3, MEMBER: 4/4
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
Thank you for reporting. First, I would really appreciate any information about your environment, especially the neptune-client, neptune-pytorch-lightning, and pytorch-lightning package versions. The issue may well persist regardless of version, but knowing the versions speeds up reproduction.
We do have an environment variable, NEPTUNE_CUSTOM_RUN_ID, which might be helpful in most parallel/distributed setups. Since you observed that all metrics are uploaded to only one run, giving every process the same custom run ID should merge them into a single run. More info can be found here: https://docs.neptune.ai/how-to-guides/neptune-api/pipelines .
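As a rough illustration of the suggestion above, here is a minimal sketch of setting NEPTUNE_CUSTOM_RUN_ID once at the top of the training script, before any logger or trainer is created. The helper name `ensure_custom_run_id` is hypothetical, and the sketch assumes the DDP worker processes inherit the environment of the process that launched them. If they do not in your setup, exporting the variable in the shell before starting the script would be the safer route.

```python
import os
import uuid


def ensure_custom_run_id(env=os.environ):
    """Set NEPTUNE_CUSTOM_RUN_ID if it is missing and return its value.

    Hypothetical helper: an ID that is already present (for example one
    inherited from the launching process) is kept, so every process that
    shares the environment ends up with the same value.
    """
    # setdefault only writes the new ID when the key is not already set.
    return env.setdefault("NEPTUNE_CUSTOM_RUN_ID", uuid.uuid4().hex)


# Call this before constructing the Neptune logger or the Trainer.
run_id = ensure_custom_run_id()
```

Because the value is only generated when the variable is missing, calling the helper again in a worker that already inherited the variable leaves the ID unchanged.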