WandbLogger doesn't format config correctly #17558

Open
albertfgu opened this issue May 4, 2023 · 18 comments
Labels
bug Something isn't working help wanted Open to be worked on logger: wandb Weights & Biases ver: 2.0.x

@albertfgu

albertfgu commented May 4, 2023

Bug description

Summary: WandbLogger(config=config) does not provide the same behavior as wandb.init(config=config) in recent versions of pytorch-lightning

Explanation:

Passing config into wandb.init is supposed to create a nicely formatted config in WandB
Example:
wandb.init(config={'key1': {'key2': 'value'}})
The Run Overview in wandb looks like this:
[screenshot: Run Overview showing the nested config]

A more complicated example of a nested config:
[screenshot: Run Overview for a more complicated nested config]
In this WandB run, the keys are logged and searchable as e.g. model.pool.0 (with corresponding value 4)
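
For reference, here is a minimal sketch of that kind of call (the project name and the exact leaf values are just illustrative):

import wandb

# Illustrative nested config: in the run Overview the leaf keys show up
# dotted out (e.g. model.pool.0 with value 4) and are searchable that way.
run = wandb.init(
    project="demo-nested-config",
    config={"model": {"pool": [4, 4], "d_model": 256}},
)
run.finish()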

However, this is what it looks like when you run
pytorch_lightning.loggers.WandbLogger(config={'test_key': {'key2': 'test_value'}}) (which is supposed to pass the config entry straight through to wandb.init)
[screenshot: Run Overview from WandbLogger with the un-nested config]

Note that the keys are no longer nested and there's only one level of hierarchy where the values are massive dictionaries. Instead of the WandB config having a key of test_key.key2 with value of test_value, there is only a key of test_key with a value of {'key2': 'test_value'}.

What version are you seeing the problem on?

v2_0

Note: I have used older versions of pytorch-lightning that do not have this issue. I'm not sure if it is a regression and have not had time to bisect.

How to reproduce the bug

See above
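
A minimal side-by-side sketch of what I am comparing (the project name is arbitrary and this assumes a normal wandb login):

import wandb
from pytorch_lightning.loggers import WandbLogger

config = {"test_key": {"key2": "test_value"}}

# Plain wandb: the run Overview shows a nested, searchable key test_key.key2 = test_value
wandb.init(project="wandblogger-config-repro", name="raw-wandb", config=config)
wandb.finish()

# Via Lightning's logger (which should forward config straight through to wandb.init):
# here the Overview instead shows a single key test_key whose value is the whole dict
logger = WandbLogger(project="wandblogger-config-repro", name="via-wandblogger", config=config)
_ = logger.experiment  # accessing .experiment triggers the underlying wandb.init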

Error messages and logs

# Error messages and logs here please

Environment

❯ python collect_env_details.py

Current environment
  • CUDA:
    - GPU:
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - NVIDIA A100-SXM4-40GB
    - available: True
    - version: 12.1
  • Lightning:
    - lightning-utilities: 0.8.0
    - pytorch-fast-transformers: 0.4.0
    - pytorch-lightning: 2.0.1.post0
    - pytorch-quantization: 2.1.2
    - torch: 2.0.0a0+1767026
    - torch-tensorrt: 1.4.0.dev0
    - torchaudio: 2.0.1+3b40834
    - torchmetrics: 0.11.4
    - torchtext: 0.13.0a0+fae8e8c
    - torchvision: 0.15.0a0
  • Packages:
    - absl-py: 1.4.0
    - accelerate: 0.18.0
    - aiohttp: 3.8.4
    - aiosignal: 1.3.1
    - alembic: 1.10.3
    - antlr4-python3-runtime: 4.9.3
    - apex: 0.1
    - appdirs: 1.4.4
    - argcomplete: 3.0.5
    - argon2-cffi: 21.3.0
    - argon2-cffi-bindings: 21.2.0
    - asttokens: 2.2.1
    - astunparse: 1.6.3
    - async-timeout: 4.0.2
    - attrs: 22.2.0
    - audioread: 3.0.0
    - autopage: 0.5.1
    - backcall: 0.2.0
    - beautifulsoup4: 4.11.2
    - bleach: 6.0.0
    - blessed: 1.20.0
    - blis: 0.7.9
    - boto: 2.49.0
    - cachetools: 5.3.0
    - catalogue: 2.0.8
    - cauchy-mult: 0.1
    - certifi: 2022.12.7
    - cffi: 1.15.1
    - charset-normalizer: 3.1.0
    - click: 8.1.3
    - cliff: 4.2.0
    - cloudpickle: 2.2.1
    - cmaes: 0.9.1
    - cmake: 3.24.1.1
    - cmd2: 2.4.3
    - colorlog: 6.7.0
    - comm: 0.1.2
    - confection: 0.0.4
    - contourpy: 1.0.7
    - crcmod: 1.7
    - cryptography: 40.0.2
    - cubinlinker: 0.2.2+2.g4de3e99
    - cuda-python: 12.1.0rc5+1.gc7fd38c.dirty
    - cudf: 23.2.0
    - cugraph: 23.2.0
    - cugraph-dgl: 23.2.0
    - cugraph-service-client: 23.2.0
    - cugraph-service-server: 23.2.0
    - cuml: 23.2.0
    - cupy-cuda12x: 12.0.0b3
    - cycler: 0.11.0
    - cymem: 2.0.7
    - cython: 0.29.33
    - dask: 2023.1.1
    - dask-cuda: 23.2.0
    - dask-cudf: 23.2.0
    - datasets: 2.11.0
    - debugpy: 1.6.6
    - decorator: 5.1.1
    - deepspeed: 0.9.1
    - defusedxml: 0.7.1
    - dill: 0.3.6
    - distributed: 2023.1.1
    - docker-pycreds: 0.4.0
    - docutils: 0.19
    - dropout-layer-norm: 0.1
    - eeghdf: 0.2.4
    - einops: 0.6.1
    - en-core-web-sm: 3.5.0
    - exceptiongroup: 1.1.1
    - execnet: 1.9.0
    - executing: 1.2.0
    - expecttest: 0.1.3
    - fasteners: 0.18
    - fastjsonschema: 2.16.3
    - fastrlock: 0.8.1
    - fftconv: 0.1
    - filelock: 3.10.0
    - flash-attn: 1.0.3.post0
    - fonttools: 4.38.0
    - frozenlist: 1.3.3
    - fsspec: 2023.1.0
    - ft-attention: 0.1
    - fused-dense-lib: 0.0.0
    - future: 0.18.3
    - fvcore: 0.1.5.post20221221
    - gast: 0.4.0
    - gcs-oauth2-boto-plugin: 3.0
    - gdown: 4.7.1
    - gitdb: 4.0.10
    - gitpython: 3.1.31
    - google-apitools: 0.5.32
    - google-auth: 2.16.2
    - google-auth-oauthlib: 0.4.6
    - google-reauth: 0.1.1
    - gpustat: 1.1
    - graphsurgeon: 0.4.6
    - greenlet: 2.0.2
    - grpcio: 1.51.3
    - gsutil: 5.23
    - h5py: 3.8.0
    - heapdict: 1.0.1
    - hjson: 3.1.0
    - httplib2: 0.20.4
    - huggingface-hub: 0.13.4
    - hydra-colorlog: 1.2.0
    - hydra-core: 1.3.2
    - hydra-optuna-sweeper: 1.2.0
    - hypothesis: 5.35.1
    - idna: 3.4
    - importlib-metadata: 6.0.0
    - importlib-resources: 5.12.0
    - iniconfig: 2.0.0
    - intel-openmp: 2021.4.0
    - iopath: 0.1.10
    - ipdb: 0.13.13
    - ipykernel: 6.21.3
    - ipython: 8.11.0
    - ipython-genutils: 0.2.0
    - ipywidgets: 8.0.6
    - jaraco.classes: 3.2.3
    - jedi: 0.18.2
    - jeepney: 0.8.0
    - jinja2: 3.1.2
    - joblib: 1.2.0
    - json5: 0.9.11
    - jsonschema: 4.17.3
    - jupyter: 1.0.0
    - jupyter-client: 8.0.3
    - jupyter-console: 6.6.3
    - jupyter-core: 5.2.0
    - jupyter-tensorboard: 0.2.0
    - jupyterlab: 2.3.2
    - jupyterlab-pygments: 0.2.2
    - jupyterlab-server: 1.2.0
    - jupyterlab-widgets: 3.0.7
    - jupytext: 1.14.5
    - keopscore: 2.1.2
    - keyring: 23.13.1
    - kiwisolver: 1.4.4
    - langcodes: 3.3.0
    - librosa: 0.9.2
    - lightning-utilities: 0.8.0
    - lit: 15.0.7
    - llvmlite: 0.39.1
    - locket: 1.0.0
    - mako: 1.2.4
    - markdown: 3.4.1
    - markdown-it-py: 2.2.0
    - markupsafe: 2.1.2
    - matplotlib: 3.7.0
    - matplotlib-inline: 0.1.6
    - mdit-py-plugins: 0.3.5
    - mdurl: 0.1.2
    - mistune: 2.0.5
    - mkl: 2021.1.1
    - mkl-devel: 2021.1.1
    - mkl-include: 2021.1.1
    - mlperf-logging: 2.1.0
    - mock: 5.0.1
    - monotonic: 1.6
    - more-itertools: 9.1.0
    - mpmath: 1.3.0
    - msgpack: 1.0.4
    - multidict: 6.0.4
    - multiprocess: 0.70.14
    - munch: 2.5.0
    - murmurhash: 1.0.9
    - nbclient: 0.7.2
    - nbconvert: 7.2.10
    - nbformat: 5.7.3
    - nest-asyncio: 1.5.6
    - networkx: 2.6.3
    - ninja: 1.11.1
    - notebook: 6.4.10
    - numba: 0.56.4+1.g9a03de713
    - numpy: 1.22.2
    - nvidia-dali-cuda110: 1.23.0
    - nvidia-ml-py: 11.525.112
    - nvidia-pyindex: 1.0.9
    - nvitop: 1.1.2
    - nvtx: 0.2.5
    - oauth2client: 4.1.3
    - oauthlib: 3.2.2
    - omegaconf: 2.3.0
    - onnx: 1.13.0
    - opencv: 4.6.0
    - opt-einsum: 3.3.0
    - optuna: 2.10.1
    - packaging: 23.0
    - pandas: 1.5.2
    - pandocfilters: 1.5.0
    - parso: 0.8.3
    - partd: 1.3.0
    - pathtools: 0.1.2
    - pathy: 0.10.1
    - pbr: 5.11.1
    - pexpect: 4.8.0
    - pickleshare: 0.7.5
    - pillow: 9.2.0
    - pip: 21.2.4
    - pkginfo: 1.9.6
    - pkgutil-resolve-name: 1.3.10
    - platformdirs: 3.1.1
    - pluggy: 1.0.0
    - ply: 3.11
    - polygraphy: 0.44.2
    - pooch: 1.7.0
    - portalocker: 2.7.0
    - preshed: 3.0.8
    - prettytable: 3.6.0
    - prometheus-client: 0.16.0
    - prompt-toolkit: 3.0.38
    - protobuf: 3.20.3
    - psutil: 5.9.4
    - ptxcompiler: 0.7.0+27.gbcb4096
    - ptyprocess: 0.7.0
    - pure-eval: 0.2.2
    - py-cpuinfo: 9.0.0
    - pyarrow: 10.0.1.dev0+ga6eabc2b.d20230220
    - pyasn1: 0.4.8
    - pyasn1-modules: 0.2.8
    - pybind11: 2.10.3
    - pycocotools: 2.0+nv0.7.1
    - pycparser: 2.21
    - pydantic: 1.10.6
    - pygments: 2.14.0
    - pylibcugraph: 23.2.0
    - pylibcugraphops: 23.2.0
    - pylibraft: 23.2.0
    - pynvml: 11.5.0
    - pyopenssl: 23.1.1
    - pyparsing: 3.0.9
    - pyperclip: 1.8.2
    - pyrootutils: 1.0.4
    - pyrsistent: 0.19.3
    - pysocks: 1.7.1
    - pytest: 7.2.2
    - pytest-rerunfailures: 11.1.2
    - pytest-shard: 0.1.2
    - pytest-xdist: 3.2.1
    - python-dateutil: 2.8.2
    - python-dotenv: 1.0.0
    - python-hostlist: 1.23.0
    - pytorch-fast-transformers: 0.4.0
    - pytorch-lightning: 2.0.1.post0
    - pytorch-quantization: 2.1.2
    - pytz: 2022.7.1
    - pyu2f: 0.1.5
    - pyyaml: 6.0
    - pyzmq: 25.0.1
    - qtconsole: 5.4.2
    - qtpy: 2.3.1
    - raft-dask: 23.2.0
    - readme-renderer: 37.3
    - regex: 2022.10.31
    - requests: 2.28.2
    - requests-oauthlib: 1.3.1
    - requests-toolbelt: 0.10.1
    - resampy: 0.4.2
    - responses: 0.18.0
    - retry-decorator: 1.1.1
    - rfc3986: 2.0.0
    - rich: 13.3.4
    - rmm: 23.2.0
    - rotary-emb: 0.1
    - rsa: 4.7.2
    - scikit-learn: 1.2.0
    - scipy: 1.6.3
    - seaborn: 0.12.2
    - secretstorage: 3.3.3
    - send2trash: 1.8.0
    - sentencepiece: 0.1.98
    - sentry-sdk: 1.20.0
    - setproctitle: 1.3.2
    - setuptools: 65.5.1
    - six: 1.16.0
    - smart-open: 6.3.0
    - smmap: 5.0.0
    - sortedcontainers: 2.4.0
    - soundfile: 0.12.1
    - soupsieve: 2.4
    - spacy: 3.5.1
    - spacy-legacy: 3.0.12
    - spacy-loggers: 1.0.4
    - sphinx-glpi-theme: 0.3
    - sqlalchemy: 2.0.10
    - srsly: 2.4.6
    - stack-data: 0.6.2
    - stevedore: 5.0.0
    - strings-udf: 23.2.0
    - structured-kernels: 0.1.0
    - sympy: 1.11.1
    - tabulate: 0.9.0
    - tbb: 2021.8.0
    - tblib: 1.7.0
    - tensorboard: 2.9.0
    - tensorboard-data-server: 0.6.1
    - tensorboard-plugin-wit: 1.8.1
    - tensorrt: 8.5.3.1
    - termcolor: 2.2.0
    - terminado: 0.17.1
    - thinc: 8.1.9
    - threadpoolctl: 3.1.0
    - thriftpy2: 0.4.16
    - timm: 0.6.13
    - tinycss2: 1.2.1
    - tokenizers: 0.13.3
    - toml: 0.10.2
    - tomli: 2.0.1
    - toolz: 0.12.0
    - torch: 2.0.0a0+1767026
    - torch-tensorrt: 1.4.0.dev0
    - torchaudio: 2.0.1+3b40834
    - torchmetrics: 0.11.4
    - torchtext: 0.13.0a0+fae8e8c
    - torchvision: 0.15.0a0
    - tornado: 6.2
    - tqdm: 4.65.0
    - traitlets: 5.9.0
    - transformer-engine: 0.6.0
    - transformers: 4.28.1
    - treelite: 3.1.0
    - treelite-runtime: 3.1.0
    - triton: 2.0.0.dev20221202
    - twine: 4.0.2
    - typer: 0.7.0
    - types-dataclasses: 0.6.6
    - typing-extensions: 4.5.0
    - ucx-py: 0.30.0
    - uff: 0.6.9
    - urllib3: 1.26.14
    - wandb: 0.15.1
    - wasabi: 1.1.1
    - wcwidth: 0.2.6
    - webencodings: 0.5.1
    - werkzeug: 2.2.3
    - wheel: 0.38.4
    - widgetsnbextension: 4.0.7
    - xdoctest: 1.0.2
    - xentropy-cuda-lib: 0.1
    - xxhash: 3.2.0
    - yacs: 0.1.8
    - yarl: 1.9.1
    - zict: 2.2.0
    - zipp: 3.14.0
    - zstandard: 0.21.0
  • System:
    - OS: Linux
    - architecture:
    - 64bit
    - ELF
    - processor: x86_64
    - python: 3.8.10
    - version: #1 SMP Debian 4.19.269-1 (2022-12-20)

More info

No response

cc @awaelchli @morganmcg1 @borisdayma @scottire @parambharat

@albertfgu albertfgu added bug Something isn't working needs triage Waiting to be triaged by maintainers labels May 4, 2023
@awaelchli awaelchli added the duplicate This issue or pull request already exists label May 4, 2023
@awaelchli awaelchli reopened this May 4, 2023
@awaelchli awaelchli added logger: wandb Weights & Biases and removed needs triage Waiting to be triaged by maintainers duplicate This issue or pull request already exists labels May 4, 2023
@albertfgu
Author

Thanks for pointing out #14988. That is very related (my other issue #17559 is a duplicate), but I think this issue is still slightly different. In particular, this issue is about what happens at wandb.init() time, before LightningModule.save_hyperparameters().

This is my understanding:

  • Calling WandbLogger(...) instantiates the WandB run (via wandb.init()) immediately
  • Calling WandbLogger(config=config) absorbs config into **kwargs that get passed through to wandb.init(config=config). This is according to the documentation: https://lightning.ai/docs/pytorch/latest/extensions/generated/lightning.pytorch.loggers.WandbLogger.html. I also briefly looked at the source code and this description seems correct.
  • Therefore WandbLogger(config=config) and wandb.init(config=config) should display the same behavior in terms of the logged config (and this is independent of whatever LightningModule.save_hyperparameters() does)
  • However, as my screenshots show, they are doing quite different things

@awaelchli
Contributor

@albertfgu #17574 should fix this. Could you check?

@awaelchli
Contributor

And yes, we forward WandbLogger(config=config) unmodified to wandb.init(config=config) so I think there is nothing to fix for that part.

@awaelchli
Contributor

@albertfgu We merged the fix #17574 and so there shouldn't be an inconsistency anymore. Could you check that it is working as expected?

@albertfgu
Author

How can I install the latest commits? I followed the instructions at https://lightning.ai/docs/pytorch/stable/starter/installation.html and ran

pip install https://github.com/Lightning-AI/lightning/archive/refs/heads/master.zip -U

It prints a lot of messages about lightning==2.1.0.dev0, but pip list shows pytorch-lightning 2.0.2, which seems like the wrong version. This version still has the issue.

@albertfgu
Author

It looks like the entire namespace got moved from pytorch_lightning to lightning.pytorch -- is that the issue I'm having? Where is this documented?
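
Concretely, I mean the difference between imports along these lines (just to illustrate the two namespaces):

# old standalone package
from pytorch_lightning.loggers import WandbLogger

# unified package installed from master
from lightning.pytorch.loggers import WandbLogger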

@albertfgu
Author

I switched to lightning.pytorch using lightning==2.1.0.dev0 based on the instructions for installing the nightlies, and the problem is still there. Actually it seems to have gotten worse as even the normal workflow with LightningModule.save_hyperparameters() is showing flattened dictionaries.

@awaelchli
Contributor

awaelchli commented May 6, 2023

As you can see in the linked PR #17574, the flattening got completely removed from WandbLogger:
https://github.com/Lightning-AI/lightning/blob/a36af3f9f855a4735fdc97c5e08bebb8a7467bf3/src/lightning/pytorch/loggers/wandb.py#L422-L425

If you install lightning from master, you will see that in the run, the parameters are nested properly.

@albertfgu
Author

albertfgu commented May 7, 2023

Yes, I looked at the PR and saw the changes. However, it is definitely not working for me. Has anyone else reported that it works?


Just to recap, here is what I previously saw:

  1. With WandbLogger(...) and LightningModule.save_hyperparameters(config), the keys were saved in a custom flattened form key1/key2=value
  2. With WandbLogger(..., config=config) and LightningModule.save_hyperparameters(config, logger=False), the keys were saved with only one level, key1={key2: value}: [screenshot]

Now, I installed lightning from master as I wrote above. I am sure that I am now on master because:

  • I previously did not have lightning installed at all, and pip now shows I'm on lightning==2.1.0.dev0
  • I did a find and replace in my whole repository to change pytorch_lightning, which I was previously using, to lightning.pytorch so it is using the newly installed library
  • The observed behavior has changed for me

This is the current behavior I see:

  1. With WandbLogger(...) and LightningModule.save_hyperparameters(config), now all keys show the bad behavior of no nesting. Note that this is different from, and worse than, the behavior before I switched to the lightning master branch.

[screenshot: all config keys shown without nesting]

  2. With WandbLogger(..., config=config) and LightningModule.save_hyperparameters(config, logger=False), the behavior is nearly identical to before (the giant dicts for values)

@albertfgu
Author

albertfgu commented May 7, 2023

So I misspoke in the previous post:

Actually it seems to have gotten worse as even the normal workflow with LightningModule.save_hyperparameters() is showing flattened dictionaries.

It is no longer flattening in either scenario: instead of logging {'a/b': c}, it now preserves the original config {'a': {'b': c}}. This addresses the series of issues #17559, #14988, etc., which govern the behavior of case (1) above.


However, this issue was originally about case (2), which has nothing to do with Lightning's flattening at all. Thus PR #17574 cannot have addressed it. Just to recap again, this issue is about the following:

  • When WandB is called with wandb.init(config=config) with config={'key1': {'key2': 'value'}}, it displays a nicely formatted nested dictionary

[screenshot: nicely formatted nested dictionary]

  • When Lightning's WandbLogger is called with WandbLogger(config=config), or is called with WandbLogger() and logged with LightningModule.save_hyperparameters(), the config is displayed in the ugly way.

Thus the issue seems to be that Lightning is doing something to the config dictionary before it gets passed to wandb.init() that causes wandb to not display it properly. This change also seems to have happened during Lightning 1.9.x -> 2.0, because this behavior was not there before.

@albertfgu
Author

I am not sure if this is related, but I have not been able to figure out why the logged config is nested under these higher-level metadata keys:
[screenshot: config nested under higher-level metadata keys]

Here the entire config is nested under _content instead of being top-level.
[screenshot: config nested under _content]

This is what happens when either WandbLogger(config=config) is called or LightningModule.save_hyperparameters(config) is used, with config = {'callbacks': {...}}. I am not sure why the config is being nested under the _content key or where the other metadata keys are coming from, but perhaps this is related to the issue?

@awaelchli
Contributor

Here is a simple example that demonstrates how Lightning passes the objects unmodified to wandb:

import argparse

import torch
from torch.utils.data import DataLoader, Dataset

from lightning.pytorch import LightningModule, Trainer
from lightning.pytorch.loggers import WandbLogger


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self, **kwargs):
        super().__init__()
        self.save_hyperparameters()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)

    def train_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=2)


def run():
    # Example of a nested configuration
    config = {
        "int": 1,
        "dict": {"a": 1, "b": 2},
        "dict2": {"a": {"a": 1, "b": 2}, "b": 3},
        "list": [1, 2, 3],
        "namespace": argparse.Namespace(a=1, b=2),
    }

    model = BoringModel(**config)
    trainer = Trainer(
        logger=WandbLogger(project="test-config-logging"),
        max_steps=1,
    )
    trainer.fit(model)


if __name__ == "__main__":
    run()

[screenshot: resulting config in the wandb run]

Of course, anything that WandB does NOT support displaying in a special way will just be sanitized and shown as a string. This is not decided by Lightning.

If I pass the above config directly into wandb like so:

    config = {
        "int": 1,
        "dict": {"a": 1, "b": 2},
        "dict2": {"a": {"a": 1, "b": 2}, "b": 3},
        "list": [1, 2, 3],
        "namespace": argparse.Namespace(a=1, b=2),
    }

    # Now, let's log the same thing with wandb directly and compare that they are the same
    import wandb
    wandb.init(project="test-config-logging", name="raw", config=config)

I get exactly the same result as shown in the screenshot above.
If something is still not working, please paste a short code example to demonstrate (like I did here).

@albertfgu
Author

Thanks for showing the snippet and screenshot. Let me dig in some more to see if there is something strange going on with my setup.

@albertfgu
Author

Okay, I finally tracked down the issue. It turned out I was not passing a raw Python dictionary for the config but an omegaconf DictConfig object (https://omegaconf.readthedocs.io/en/2.1_branch/index.html). This is the dictionary type used by Hydra (https://hydra.cc/), and it seems that WandB doesn't handle this fancier mapping object the same way it handles a basic Python dictionary.

The solution is to convert the config to a Python dict before passing it into WandbLogger(config=config) or before calling LightningModule.save_hyperparameters(config). In the case of Hydra, one should call OmegaConf.to_container(config).
Hopefully this issue helps other people who run into problems because I think Lightning + Hydra + WandB is a fairly common ML stack these days.
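
For example, something along these lines (a sketch of the workaround; the project name is arbitrary, and resolve=True is optional depending on whether you want interpolations resolved first):

from omegaconf import OmegaConf
from lightning.pytorch.loggers import WandbLogger

# stand-in for the DictConfig that Hydra would normally hand you
cfg = OmegaConf.create({"model": {"d_model": 256}, "trainer": {"max_epochs": 10}})

plain_cfg = OmegaConf.to_container(cfg, resolve=True)  # plain nested Python dict

logger = WandbLogger(project="my-project", config=plain_cfg)
# or, inside the LightningModule:
# self.save_hyperparameters(plain_cfg)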

@awaelchli Thanks for addressing this and making the PR, and sorry for assuming the issue was with Lightning when it turned out to be an unfortunate interaction between multiple libraries.

I think it is perhaps possible for the libraries to help alleviate these sorts of issues; for example, LightningModule.save_hyperparameters(config) could convert the config from any Mapping type to a raw dictionary before passing it to the logger.

In fact, reading through the Lightning code, it seems like log_hyperparams()
https://github.com/Lightning-AI/lightning/blob/bd05aa96eddbfcb6f010228ec91ce09f1db4fd29/src/lightning/pytorch/loggers/wandb.py#L419
assumes the input is of type Dict, but I think this issue occurred because I was passing in a Mapping and Python doesn't actually enforce the type annotation. Perhaps it makes sense to handle the case where params is a Mapping (like the log_metrics() method right below it) and recursively convert it to a Dict in the _convert_params() function? After all, the docstring of _convert_params() says "Ensure parameters are a dict or convert to dict if necessary."
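
Something like this is what I have in mind (a hypothetical helper for illustration, not the actual Lightning implementation):

from collections.abc import Mapping

def _to_plain_dict(params):
    # Recursively convert any Mapping (e.g. an omegaconf DictConfig) into a plain dict
    if isinstance(params, Mapping):
        return {str(k): _to_plain_dict(v) for k, v in params.items()}
    # Also recurse into lists/tuples so nested mappings inside them get converted
    if isinstance(params, (list, tuple)):
        return [_to_plain_dict(v) for v in params]
    return params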

But perhaps it's just the responsibility of the user to make sure all the libraries are interacting properly.

@awaelchli
Contributor

@albertfgu If we updated the types from Dict to Mapping (and converted mapping objects to dicts), would that resolve your remaining issue?

@albertfgu
Author

I have fixed my own issue after identifying the problem, but I do think that what you described would be more robust to potential related issues. As it stands, the Hydra + Lightning combination is fairly common, and users would all run into this non-obvious issue.

@awaelchli awaelchli added the help wanted Open to be worked on label Jul 26, 2023
@justachetan

Agree! I have been banging my head over this for a while and just found this issue. Thank you for this discussion! @albertfgu could you please elaborate on how you fixed this issue eventually? Thanks!

@Kin-Zhang

Kin-Zhang commented Apr 30, 2025

I also had this issue after I upgraded wandb & pl.

I think the same code with wandb 0.16.6 + pl 2.0.1 + hydra-core 1.3.2 is fine.

With wandb 0.19.10 + pl 2.4.0 + hydra-core 1.3.2 I have to define it like this; save_hyperparameters needs cfg passed in explicitly.

class ModelWrapper(LightningModule):
    def __init__(self, cfg, eval=False):
        super().__init__()
        # other code here
        # self.save_hyperparameters()  # previously
        self.save_hyperparameters(cfg)

I think it's mainly a Lightning issue, which looks like it appeared after they upgraded. Related to my issue: #20311.
