WandbLogger doesn't format config correctly #17558
Comments
Thanks for pointing out #14988. That is very related (my other issue #17559 is a duplicate), but I think this issue is still slightly different. In particular, this issue is about what happens at the point where the config is handed off to wandb. This is my understanding:
@albertfgu #17574 should fix this. Could you check?
And yes, we forward the config straight through to wandb.init.
@albertfgu We merged the fix #17574 and so there shouldn't be an inconsistency anymore. Could you check that it is working as expected?
How can I install the latest commits? I followed the instructions at https://lightning.ai/docs/pytorch/stable/starter/installation.html and ran the command given there. It prints a lot of messages about the package rename.
It looks like the entire namespace got moved from pytorch_lightning to lightning.pytorch.

I switched to the new lightning.pytorch imports.
As you can see in the linked PR #17574, the flattening got completely removed from WandbLogger. If you install lightning from master, you will see that in the run, the parameters are nested properly.
So I misspoke in the previous post.

It is no longer flattening in either scenario: instead of logging flattened keys, the nested structure is passed through as-is.

However, this issue was originally about case (2), which does not have to do with Lightning's flattening at all. Thus PR #17574 cannot have addressed it. Just to recap, this issue is about the fact that WandbLogger(config=config) does not produce the same nicely formatted config as wandb.init(config=config).

Thus the issue seems to be that Lightning is doing something to the config dictionary before it gets passed to wandb.
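For context, the "flattening" discussed in this thread means collapsing a nested dictionary into a single level of delimiter-joined keys before handing it to the logger backend. Below is a minimal sketch of that idea; it is illustrative only, not Lightning's actual implementation, and the helper name flatten_dict and the '/' delimiter are assumptions.

```python
from typing import Any, Dict


def flatten_dict(params: Dict[str, Any], delimiter: str = "/", parent_key: str = "") -> Dict[str, Any]:
    """Illustrative helper: collapse a nested dict into one level of delimiter-joined keys."""
    flat: Dict[str, Any] = {}
    for key, value in params.items():
        new_key = f"{parent_key}{delimiter}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten_dict(value, delimiter, new_key))
        else:
            flat[new_key] = value
    return flat


# {'test_key': {'key2': 'test_value'}} -> {'test_key/key2': 'test_value'}
print(flatten_dict({"test_key": {"key2": "test_value"}}))
```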
Here is a simple example that demonstrates how Lightning passes the objects unmodified to wandb:

import argparse

import torch
from torch.utils.data import DataLoader, Dataset

from lightning.pytorch import LightningModule, Trainer
from lightning.pytorch.loggers import WandbLogger


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self, **kwargs):
        super().__init__()
        self.save_hyperparameters()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)

    def train_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=2)


def run():
    # Example of a nested configuration
    config = {
        "int": 1,
        "dict": {"a": 1, "b": 2},
        "dict2": {"a": {"a": 1, "b": 2}, "b": 3},
        "list": [1, 2, 3],
        "namespace": argparse.Namespace(a=1, b=2),
    }
    model = BoringModel(**config)
    trainer = Trainer(
        logger=WandbLogger(project="test-config-logging"),
        max_steps=1,
    )
    trainer.fit(model)


if __name__ == "__main__":
    run()

Of course, anything that WandB does NOT support displaying in a special way will just be sanitized and shown as a string. This is not decided by Lightning.

If I pass the above config directly into wandb like so:

config = {
    "int": 1,
    "dict": {"a": 1, "b": 2},
    "dict2": {"a": {"a": 1, "b": 2}, "b": 3},
    "list": [1, 2, 3],
    "namespace": argparse.Namespace(a=1, b=2),
}

# Now, let's log the same thing with wandb directly and compare that they are the same
import wandb

wandb.init(project="test-config-logging", name="raw", config=config)

I get the exact same as shown in the screenshot above.
Thanks for showing the snippet and screenshot. Let me dig in some more to see if there is something strange going on with my setup.
Okay, I finally tracked down the issue. It turned out I was passing in not a raw Python dictionary for the config but an omegaconf DictConfig object (https://omegaconf.readthedocs.io/en/2.1_branch/index.html). This is the dictionary object used by Hydra (https://hydra.cc/), but it seems that WandB doesn't like it when you pass in this fancy dictionary object instead of a basic Python dictionary. The solution is to convert the DictConfig to a plain Python dict before passing it in.

@awaelchli Thanks for addressing this and making the PR, and sorry for assuming the issue was with Lightning when it turned out to be an unfortunate interaction between multiple libraries. I think it is perhaps possible for the libraries to help alleviate these sorts of issues, for example by converting Mapping-like objects to plain dicts before forwarding them. In fact, reading through the Lightning code, it seems like similar conversions are already done for some types. But perhaps it's just the responsibility of the user to make sure all the libraries are interacting properly.
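For anyone hitting the same thing, here is a minimal sketch of the workaround described above, assuming the config is a Hydra/OmegaConf object; the created cfg and the project name are just placeholders.

```python
from omegaconf import OmegaConf
from lightning.pytorch.loggers import WandbLogger

# Stand-in for a Hydra-produced DictConfig.
cfg = OmegaConf.create({"model": {"pool": [4, 2]}, "optimizer": {"lr": 0.1}})

# OmegaConf.to_container turns the DictConfig into plain Python dicts/lists,
# so wandb receives basic types it knows how to nest and display.
plain_config = OmegaConf.to_container(cfg, resolve=True)

logger = WandbLogger(project="test-config-logging", config=plain_config)
```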
@albertfgu If we updated the types from Dict to Mapping (and converted mapping objects to dicts), would that resolve your remaining issue?
I have fixed my own issue after identifying the problem, but I do think that what you described would be more robust to potential related issues. As is, even the Hydra + Lightning combination is probably fairly common and users would all run into this non-obvious issue.
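A rough sketch of what that Mapping-based handling could look like; this is illustrative only, and to_plain_dict is a hypothetical helper, not Lightning's actual API.

```python
from collections.abc import Mapping
from typing import Any


def to_plain_dict(value: Any) -> Any:
    """Recursively convert Mapping-like objects (e.g. omegaconf.DictConfig)
    and sequences into plain Python containers before logging."""
    if isinstance(value, Mapping):
        return {str(k): to_plain_dict(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain_dict(v) for v in value]
    return value


# A logger accepting Mapping instead of Dict could normalize the config up front:
# params = to_plain_dict(params)
```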
Agree! I have been banging my head over this for a while and just found this issue. Thank you for this discussion! @albertfgu could you please elaborate on how you fixed this issue eventually? Thanks!
I also had this issue after I upgraded wandb & pl. I think the same code with wandb 0.16.6 + pl 2.0.1 + hydra-core 1.3.2 is fine. I run it with code like this:

class ModelWrapper(LightningModule):
    def __init__(self, cfg, eval=False):
        super().__init__()
        # other code here.
        # self.save_hyperparameters()  # previously
        self.save_hyperparameters(cfg)

I think it's mainly a Lightning issue; it looks like it started after they upgraded something? Related to my issue: #20311
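If the cfg here is a Hydra/omegaconf DictConfig, the same conversion as in the workaround above can be applied before saving it. A small sketch, illustrative only; the import paths assume the unified lightning package.

```python
from lightning.pytorch import LightningModule
from omegaconf import DictConfig, OmegaConf


class ModelWrapper(LightningModule):
    def __init__(self, cfg):
        super().__init__()
        # Convert a Hydra/omegaconf DictConfig into a plain dict before saving it,
        # so that downstream loggers only ever see basic Python types.
        if isinstance(cfg, DictConfig):
            cfg = OmegaConf.to_container(cfg, resolve=True)
        self.save_hyperparameters(cfg)
```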
Bug description
Summary:

WandbLogger(config=config) does not provide the same behavior as wandb.init(config=config) in recent versions of pytorch-lightning.

Explanation:

Passing config into wandb.init is supposed to create a nicely formatted config in WandB.

Example:

wandb.init(config={'key1': {'key2': 'value'}})

The Run Overview in wandb looks like this:

[screenshot: config shown in the Run Overview]

A more complicated example of a nested config:

[screenshot: nested config in the Run Overview]

In this WandB run, the keys are logged and searchable as e.g. model.pool.0 (with corresponding value 4).

However, this is what it looks like when you run pytorch_lightning.loggers.WandbLogger(config={'test_key': {'key2': 'test_value'}}) (which is supposed to pass the config entry straight through to wandb.init):

[screenshot: config logged via WandbLogger]

Note that the keys are no longer nested and there's only one level of hierarchy where the values are massive dictionaries. Instead of the WandB config having a key of test_key.key2 with a value of test_value, there is only a key of test_key with a value of {'key2': 'test_value'}.

What version are you seeing the problem on?
v2_0
Note: I have used older versions of pytorch-lightning that do not have this issue. I'm not sure if it is a regression and have not had time to bisect.
How to reproduce the bug
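No reproduction script was attached here; below is a minimal sketch based on the discussion above. The project and run names are placeholders, and note that, per the thread, the mismatch was ultimately observed when the config was an omegaconf DictConfig rather than a plain dict.

```python
import wandb
from omegaconf import OmegaConf
from pytorch_lightning.loggers import WandbLogger

# Stand-in for a Hydra-produced DictConfig.
cfg = OmegaConf.create({"test_key": {"key2": "test_value"}})

# Reference behavior: wandb.init with a plain dict shows nested keys like test_key.key2.
wandb.init(project="test-config-logging", name="direct", config={"test_key": {"key2": "test_value"}})
wandb.finish()

# Reported behavior: forwarding the DictConfig through WandbLogger produced a single
# top-level key whose value is the whole inner mapping.
logger = WandbLogger(project="test-config-logging", name="via-logger", config=cfg)
logger.experiment  # triggers wandb.init with the forwarded kwargs
```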
Error messages and logs
Environment
❯ python collect_env_details.py
Current environment
- GPU:
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- available: True
- version: 12.1
- lightning-utilities: 0.8.0
- pytorch-fast-transformers: 0.4.0
- pytorch-lightning: 2.0.1.post0
- pytorch-quantization: 2.1.2
- torch: 2.0.0a0+1767026
- torch-tensorrt: 1.4.0.dev0
- torchaudio: 2.0.1+3b40834
- torchmetrics: 0.11.4
- torchtext: 0.13.0a0+fae8e8c
- torchvision: 0.15.0a0
- absl-py: 1.4.0
- accelerate: 0.18.0
- aiohttp: 3.8.4
- aiosignal: 1.3.1
- alembic: 1.10.3
- antlr4-python3-runtime: 4.9.3
- apex: 0.1
- appdirs: 1.4.4
- argcomplete: 3.0.5
- argon2-cffi: 21.3.0
- argon2-cffi-bindings: 21.2.0
- asttokens: 2.2.1
- astunparse: 1.6.3
- async-timeout: 4.0.2
- attrs: 22.2.0
- audioread: 3.0.0
- autopage: 0.5.1
- backcall: 0.2.0
- beautifulsoup4: 4.11.2
- bleach: 6.0.0
- blessed: 1.20.0
- blis: 0.7.9
- boto: 2.49.0
- cachetools: 5.3.0
- catalogue: 2.0.8
- cauchy-mult: 0.1
- certifi: 2022.12.7
- cffi: 1.15.1
- charset-normalizer: 3.1.0
- click: 8.1.3
- cliff: 4.2.0
- cloudpickle: 2.2.1
- cmaes: 0.9.1
- cmake: 3.24.1.1
- cmd2: 2.4.3
- colorlog: 6.7.0
- comm: 0.1.2
- confection: 0.0.4
- contourpy: 1.0.7
- crcmod: 1.7
- cryptography: 40.0.2
- cubinlinker: 0.2.2+2.g4de3e99
- cuda-python: 12.1.0rc5+1.gc7fd38c.dirty
- cudf: 23.2.0
- cugraph: 23.2.0
- cugraph-dgl: 23.2.0
- cugraph-service-client: 23.2.0
- cugraph-service-server: 23.2.0
- cuml: 23.2.0
- cupy-cuda12x: 12.0.0b3
- cycler: 0.11.0
- cymem: 2.0.7
- cython: 0.29.33
- dask: 2023.1.1
- dask-cuda: 23.2.0
- dask-cudf: 23.2.0
- datasets: 2.11.0
- debugpy: 1.6.6
- decorator: 5.1.1
- deepspeed: 0.9.1
- defusedxml: 0.7.1
- dill: 0.3.6
- distributed: 2023.1.1
- docker-pycreds: 0.4.0
- docutils: 0.19
- dropout-layer-norm: 0.1
- eeghdf: 0.2.4
- einops: 0.6.1
- en-core-web-sm: 3.5.0
- exceptiongroup: 1.1.1
- execnet: 1.9.0
- executing: 1.2.0
- expecttest: 0.1.3
- fasteners: 0.18
- fastjsonschema: 2.16.3
- fastrlock: 0.8.1
- fftconv: 0.1
- filelock: 3.10.0
- flash-attn: 1.0.3.post0
- fonttools: 4.38.0
- frozenlist: 1.3.3
- fsspec: 2023.1.0
- ft-attention: 0.1
- fused-dense-lib: 0.0.0
- future: 0.18.3
- fvcore: 0.1.5.post20221221
- gast: 0.4.0
- gcs-oauth2-boto-plugin: 3.0
- gdown: 4.7.1
- gitdb: 4.0.10
- gitpython: 3.1.31
- google-apitools: 0.5.32
- google-auth: 2.16.2
- google-auth-oauthlib: 0.4.6
- google-reauth: 0.1.1
- gpustat: 1.1
- graphsurgeon: 0.4.6
- greenlet: 2.0.2
- grpcio: 1.51.3
- gsutil: 5.23
- h5py: 3.8.0
- heapdict: 1.0.1
- hjson: 3.1.0
- httplib2: 0.20.4
- huggingface-hub: 0.13.4
- hydra-colorlog: 1.2.0
- hydra-core: 1.3.2
- hydra-optuna-sweeper: 1.2.0
- hypothesis: 5.35.1
- idna: 3.4
- importlib-metadata: 6.0.0
- importlib-resources: 5.12.0
- iniconfig: 2.0.0
- intel-openmp: 2021.4.0
- iopath: 0.1.10
- ipdb: 0.13.13
- ipykernel: 6.21.3
- ipython: 8.11.0
- ipython-genutils: 0.2.0
- ipywidgets: 8.0.6
- jaraco.classes: 3.2.3
- jedi: 0.18.2
- jeepney: 0.8.0
- jinja2: 3.1.2
- joblib: 1.2.0
- json5: 0.9.11
- jsonschema: 4.17.3
- jupyter: 1.0.0
- jupyter-client: 8.0.3
- jupyter-console: 6.6.3
- jupyter-core: 5.2.0
- jupyter-tensorboard: 0.2.0
- jupyterlab: 2.3.2
- jupyterlab-pygments: 0.2.2
- jupyterlab-server: 1.2.0
- jupyterlab-widgets: 3.0.7
- jupytext: 1.14.5
- keopscore: 2.1.2
- keyring: 23.13.1
- kiwisolver: 1.4.4
- langcodes: 3.3.0
- librosa: 0.9.2
- lightning-utilities: 0.8.0
- lit: 15.0.7
- llvmlite: 0.39.1
- locket: 1.0.0
- mako: 1.2.4
- markdown: 3.4.1
- markdown-it-py: 2.2.0
- markupsafe: 2.1.2
- matplotlib: 3.7.0
- matplotlib-inline: 0.1.6
- mdit-py-plugins: 0.3.5
- mdurl: 0.1.2
- mistune: 2.0.5
- mkl: 2021.1.1
- mkl-devel: 2021.1.1
- mkl-include: 2021.1.1
- mlperf-logging: 2.1.0
- mock: 5.0.1
- monotonic: 1.6
- more-itertools: 9.1.0
- mpmath: 1.3.0
- msgpack: 1.0.4
- multidict: 6.0.4
- multiprocess: 0.70.14
- munch: 2.5.0
- murmurhash: 1.0.9
- nbclient: 0.7.2
- nbconvert: 7.2.10
- nbformat: 5.7.3
- nest-asyncio: 1.5.6
- networkx: 2.6.3
- ninja: 1.11.1
- notebook: 6.4.10
- numba: 0.56.4+1.g9a03de713
- numpy: 1.22.2
- nvidia-dali-cuda110: 1.23.0
- nvidia-ml-py: 11.525.112
- nvidia-pyindex: 1.0.9
- nvitop: 1.1.2
- nvtx: 0.2.5
- oauth2client: 4.1.3
- oauthlib: 3.2.2
- omegaconf: 2.3.0
- onnx: 1.13.0
- opencv: 4.6.0
- opt-einsum: 3.3.0
- optuna: 2.10.1
- packaging: 23.0
- pandas: 1.5.2
- pandocfilters: 1.5.0
- parso: 0.8.3
- partd: 1.3.0
- pathtools: 0.1.2
- pathy: 0.10.1
- pbr: 5.11.1
- pexpect: 4.8.0
- pickleshare: 0.7.5
- pillow: 9.2.0
- pip: 21.2.4
- pkginfo: 1.9.6
- pkgutil-resolve-name: 1.3.10
- platformdirs: 3.1.1
- pluggy: 1.0.0
- ply: 3.11
- polygraphy: 0.44.2
- pooch: 1.7.0
- portalocker: 2.7.0
- preshed: 3.0.8
- prettytable: 3.6.0
- prometheus-client: 0.16.0
- prompt-toolkit: 3.0.38
- protobuf: 3.20.3
- psutil: 5.9.4
- ptxcompiler: 0.7.0+27.gbcb4096
- ptyprocess: 0.7.0
- pure-eval: 0.2.2
- py-cpuinfo: 9.0.0
- pyarrow: 10.0.1.dev0+ga6eabc2b.d20230220
- pyasn1: 0.4.8
- pyasn1-modules: 0.2.8
- pybind11: 2.10.3
- pycocotools: 2.0+nv0.7.1
- pycparser: 2.21
- pydantic: 1.10.6
- pygments: 2.14.0
- pylibcugraph: 23.2.0
- pylibcugraphops: 23.2.0
- pylibraft: 23.2.0
- pynvml: 11.5.0
- pyopenssl: 23.1.1
- pyparsing: 3.0.9
- pyperclip: 1.8.2
- pyrootutils: 1.0.4
- pyrsistent: 0.19.3
- pysocks: 1.7.1
- pytest: 7.2.2
- pytest-rerunfailures: 11.1.2
- pytest-shard: 0.1.2
- pytest-xdist: 3.2.1
- python-dateutil: 2.8.2
- python-dotenv: 1.0.0
- python-hostlist: 1.23.0
- pytorch-fast-transformers: 0.4.0
- pytorch-lightning: 2.0.1.post0
- pytorch-quantization: 2.1.2
- pytz: 2022.7.1
- pyu2f: 0.1.5
- pyyaml: 6.0
- pyzmq: 25.0.1
- qtconsole: 5.4.2
- qtpy: 2.3.1
- raft-dask: 23.2.0
- readme-renderer: 37.3
- regex: 2022.10.31
- requests: 2.28.2
- requests-oauthlib: 1.3.1
- requests-toolbelt: 0.10.1
- resampy: 0.4.2
- responses: 0.18.0
- retry-decorator: 1.1.1
- rfc3986: 2.0.0
- rich: 13.3.4
- rmm: 23.2.0
- rotary-emb: 0.1
- rsa: 4.7.2
- scikit-learn: 1.2.0
- scipy: 1.6.3
- seaborn: 0.12.2
- secretstorage: 3.3.3
- send2trash: 1.8.0
- sentencepiece: 0.1.98
- sentry-sdk: 1.20.0
- setproctitle: 1.3.2
- setuptools: 65.5.1
- six: 1.16.0
- smart-open: 6.3.0
- smmap: 5.0.0
- sortedcontainers: 2.4.0
- soundfile: 0.12.1
- soupsieve: 2.4
- spacy: 3.5.1
- spacy-legacy: 3.0.12
- spacy-loggers: 1.0.4
- sphinx-glpi-theme: 0.3
- sqlalchemy: 2.0.10
- srsly: 2.4.6
- stack-data: 0.6.2
- stevedore: 5.0.0
- strings-udf: 23.2.0
- structured-kernels: 0.1.0
- sympy: 1.11.1
- tabulate: 0.9.0
- tbb: 2021.8.0
- tblib: 1.7.0
- tensorboard: 2.9.0
- tensorboard-data-server: 0.6.1
- tensorboard-plugin-wit: 1.8.1
- tensorrt: 8.5.3.1
- termcolor: 2.2.0
- terminado: 0.17.1
- thinc: 8.1.9
- threadpoolctl: 3.1.0
- thriftpy2: 0.4.16
- timm: 0.6.13
- tinycss2: 1.2.1
- tokenizers: 0.13.3
- toml: 0.10.2
- tomli: 2.0.1
- toolz: 0.12.0
- torch: 2.0.0a0+1767026
- torch-tensorrt: 1.4.0.dev0
- torchaudio: 2.0.1+3b40834
- torchmetrics: 0.11.4
- torchtext: 0.13.0a0+fae8e8c
- torchvision: 0.15.0a0
- tornado: 6.2
- tqdm: 4.65.0
- traitlets: 5.9.0
- transformer-engine: 0.6.0
- transformers: 4.28.1
- treelite: 3.1.0
- treelite-runtime: 3.1.0
- triton: 2.0.0.dev20221202
- twine: 4.0.2
- typer: 0.7.0
- types-dataclasses: 0.6.6
- typing-extensions: 4.5.0
- ucx-py: 0.30.0
- uff: 0.6.9
- urllib3: 1.26.14
- wandb: 0.15.1
- wasabi: 1.1.1
- wcwidth: 0.2.6
- webencodings: 0.5.1
- werkzeug: 2.2.3
- wheel: 0.38.4
- widgetsnbextension: 4.0.7
- xdoctest: 1.0.2
- xentropy-cuda-lib: 0.1
- xxhash: 3.2.0
- yacs: 0.1.8
- yarl: 1.9.1
- zict: 2.2.0
- zipp: 3.14.0
- zstandard: 0.21.0
- OS: Linux
- architecture:
- 64bit
- ELF
- processor: x86_64
- python: 3.8.10
- version: #1 SMP Debian 4.19.269-1 (2022-12-20)
More info
No response
cc @awaelchli @morganmcg1 @borisdayma @scottire @parambharat