Import error when trying to run TPU accelerator on CPU #12044


Closed
awaelchli opened this issue Feb 22, 2022 · 3 comments · Fixed by #12030


awaelchli commented Feb 22, 2022

🐛 Bug

To Reproduce

import os

import torch
from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningModule, Trainer


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("valid_loss", loss)

    def test_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("test_loss", loss)

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


def run():
    train_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    val_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    test_data = DataLoader(RandomDataset(32, 64), batch_size=2)

    model = BoringModel()
    trainer = Trainer(
        default_root_dir=os.getcwd(),
        limit_train_batches=1,
        limit_val_batches=1,
        limit_test_batches=1,
        num_sanity_val_steps=0,
        max_epochs=1,
        enable_model_summary=False,
        accelerator="tpu",
        devices=1,
    )
    trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
    trainer.test(model, dataloaders=test_data)


if __name__ == "__main__":
    run()

Produces

Traceback (most recent call last):
  File "/Users/adrian/repositories/pytorch-lightning/pl_examples/bug_report/bug_report_model.py", line 68, in <module>
    run()
  File "/Users/adrian/repositories/pytorch-lightning/pl_examples/bug_report/bug_report_model.py", line 52, in run
    trainer = Trainer(
  File "/Users/adrian/repositories/pytorch-lightning/pytorch_lightning/utilities/argparse.py", line 336, in insert_env_defaults
    return fn(self, **kwargs)
  File "/Users/adrian/repositories/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 477, in __init__
    self._accelerator_connector = AcceleratorConnector(
  File "/Users/adrian/repositories/pytorch-lightning/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 194, in __init__
    self._strategy_flag = self._choose_strategy()
  File "/Users/adrian/repositories/pytorch-lightning/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 531, in _choose_strategy
    return SingleTPUStrategy(device=self._parallel_devices[0])  # type: ignore
  File "/Users/adrian/repositories/pytorch-lightning/pytorch_lightning/strategies/single_tpu.py", line 44, in __init__
    device=xm.xla_device(device),
NameError: name 'xm' is not defined
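
(For context, this NameError is characteristic of a guarded import. A minimal sketch of the pattern, assuming single_tpu.py guards the torch_xla import behind the _TPU_AVAILABLE flag like other modules do; the exact module-level code may differ:)

from pytorch_lightning.utilities import _TPU_AVAILABLE

# If torch_xla is not installed, the guarded import below never runs, so the name `xm`
# is never bound and the later call `xm.xla_device(device)` fails with the NameError above.
if _TPU_AVAILABLE:
    import torch_xla.core.xla_model as xm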

Expected behavior

Should error with


File "/Users/adrian/repositories/pytorch-lightning/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 218, in select_accelerator_type
    raise MisconfigurationException(f"You passed `accelerator='tpu'`, but {msg}.")
pytorch_lightning.utilities.exceptions.MisconfigurationException: You passed `accelerator='tpu'`, but TPUs are not available.

as it does on 1.5.10.
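
For illustration, a minimal sketch of the kind of pre-check that would restore this behavior; the helper name `_tpu_available` and the importlib-based check are assumptions for the sketch, not the actual accelerator-connector code:

import importlib.util

from pytorch_lightning.utilities.exceptions import MisconfigurationException


def _tpu_available() -> bool:
    # Hypothetical helper: treat TPUs as available only if torch_xla can be imported.
    return importlib.util.find_spec("torch_xla") is not None


def _choose_single_tpu_strategy(parallel_devices):
    # Fail fast with a clear message instead of reaching SingleTPUStrategy, where the
    # unguarded use of `xm` raises NameError on machines without torch_xla.
    if not _tpu_available():
        raise MisconfigurationException("You passed `accelerator='tpu'`, but TPUs are not available.")
    from pytorch_lightning.strategies.single_tpu import SingleTPUStrategy

    return SingleTPUStrategy(device=parallel_devices[0])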

Environment

Latest master.
Ask if you need more.

cc @kaushikb11 @rohitgr7

awaelchli added the bug label Feb 22, 2022
awaelchli (author) commented:

Likely a result of #11448 cc @four4fish

awaelchli added the accelerator: tpu label Feb 22, 2022
awaelchli added this to the 1.6 milestone Feb 22, 2022
kaushikb11 self-assigned this Feb 22, 2022
kaushikb11 commented:

It will be fixed by the updates in #12030, where I have moved the parsing logic to the accelerators.
But I will do a quick PR with a fix for it.

four4fish commented Feb 22, 2022

Yeah, this is the follow-up item "Enable accelerator.is_available() check" in #11449.
The proper way is to call accelerator.is_available() in _init_accelerator(). I left it as a follow-up because it causes a lot of GPU test failures: in the previous accelerator_connector logic, the device availability check did not apply to GPU. Now calling self.accelerator.is_available() applies the device check to GPU as well, so a lot of tests need mocks added. To avoid a massive test change in one PR, I left this as a follow-up.
Is this urgent? Would you prefer a quick fix, or waiting for "Enable accelerator.is_available() check" to fix this?
@awaelchli @kaushikb11
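
(For reference, a rough sketch of the check being described, assuming the accelerator classes expose the is_available() method mentioned above; the helper below is illustrative, not the actual connector code:)

from pytorch_lightning.utilities.exceptions import MisconfigurationException


def _check_accelerator_available(accelerator_cls) -> None:
    # Run the availability check once the accelerator class has been resolved, before
    # any strategy or device is constructed, so every backend (TPU, GPU, ...) fails
    # with the same MisconfigurationException on machines that lack it.
    if not accelerator_cls.is_available():
        raise MisconfigurationException(f"{accelerator_cls.__name__} is not available on this system.")

Applying this uniformly is also why many GPU tests would need added mocks: the same check then runs for the GPU accelerator on CPU-only CI machines.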
