Bug description
When I load my checkpoint, the log says `LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]`.
But when I check my Slurm jobstats, I see:
GPU utilization per node
stellar-m01g3 (GPU 0): 0% <--- GPU was not used
GPU memory usage per node - maximum used/total
stellar-m01g3 (GPU 0): 12.2GB/40.0GB (30.5%)
I even made sure to explicitly set `accelerator: gpu` in the `trainer` section of the YAML file.
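For reference, a minimal sketch of the kind of `trainer` section described above (only the `accelerator` setting reflects what is stated in this report; the other keys are illustrative assumptions):

```yaml
# Hypothetical LightningCLI-style config sketch, not the exact config from this report.
trainer:
  accelerator: gpu   # setting described above
  devices: 1         # assumption: single GPU, matching the jobstats output
```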
What version are you seeing the problem on?
v2.5
How to reproduce the bug
Error messages and logs
Environment
Current environment
#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):
More info
No response