Import error when trying to run TPU accelerator on CPU #12044


Closed
awaelchli opened this issue Feb 22, 2022 · 3 comments · Fixed by #12030


awaelchli commented Feb 22, 2022

🐛 Bug

To Reproduce

import os

import torch
from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningModule, Trainer


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("valid_loss", loss)

    def test_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("test_loss", loss)

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


def run():
    train_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    val_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    test_data = DataLoader(RandomDataset(32, 64), batch_size=2)

    model = BoringModel()
    trainer = Trainer(
        default_root_dir=os.getcwd(),
        limit_train_batches=1,
        limit_val_batches=1,
        limit_test_batches=1,
        num_sanity_val_steps=0,
        max_epochs=1,
        enable_model_summary=False,
        accelerator="tpu",
        devices=1,
    )
    trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
    trainer.test(model, dataloaders=test_data)


if __name__ == "__main__":
    run()

Produces

Traceback (most recent call last):
  File "/Users/adrian/repositories/pytorch-lightning/pl_examples/bug_report/bug_report_model.py", line 68, in <module>
    run()
  File "/Users/adrian/repositories/pytorch-lightning/pl_examples/bug_report/bug_report_model.py", line 52, in run
    trainer = Trainer(
  File "/Users/adrian/repositories/pytorch-lightning/pytorch_lightning/utilities/argparse.py", line 336, in insert_env_defaults
    return fn(self, **kwargs)
  File "/Users/adrian/repositories/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 477, in __init__
    self._accelerator_connector = AcceleratorConnector(
  File "/Users/adrian/repositories/pytorch-lightning/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 194, in __init__
    self._strategy_flag = self._choose_strategy()
  File "/Users/adrian/repositories/pytorch-lightning/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 531, in _choose_strategy
    return SingleTPUStrategy(device=self._parallel_devices[0])  # type: ignore
  File "/Users/adrian/repositories/pytorch-lightning/pytorch_lightning/strategies/single_tpu.py", line 44, in __init__
    device=xm.xla_device(device),
NameError: name 'xm' is not defined
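
(For context, this NameError is characteristic of a guarded import. A minimal sketch of the pattern, assuming single_tpu.py guards the torch_xla import behind the _TPU_AVAILABLE flag like other modules do; the exact module-level code may differ:)

from pytorch_lightning.utilities import _TPU_AVAILABLE

# If torch_xla is not installed, the guarded import below never runs, so the name `xm`
# is never bound and the later call `xm.xla_device(device)` fails with the NameError above.
if _TPU_AVAILABLE:
    import torch_xla.core.xla_model as xm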

Expected behavior

Should error with


File "/Users/adrian/repositories/pytorch-lightning/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 218, in select_accelerator_type
    raise MisconfigurationException(f"You passed `accelerator='tpu'`, but {msg}.")
pytorch_lightning.utilities.exceptions.MisconfigurationException: You passed `accelerator='tpu'`, but TPUs are not available.

as it does on 1.5.10.
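
For illustration, a minimal sketch of the kind of pre-check that would restore this behavior; the helper name `_tpu_available` and the importlib-based check are assumptions for the sketch, not the actual accelerator-connector code:

import importlib.util

from pytorch_lightning.utilities.exceptions import MisconfigurationException


def _tpu_available() -> bool:
    # Hypothetical helper: treat TPUs as available only if torch_xla can be imported.
    return importlib.util.find_spec("torch_xla") is not None


def _choose_single_tpu_strategy(parallel_devices):
    # Fail fast with a clear message instead of reaching SingleTPUStrategy, where the
    # unguarded use of `xm` raises NameError on machines without torch_xla.
    if not _tpu_available():
        raise MisconfigurationException("You passed `accelerator='tpu'`, but TPUs are not available.")
    from pytorch_lightning.strategies.single_tpu import SingleTPUStrategy

    return SingleTPUStrategy(device=parallel_devices[0])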

Environment

Latest master.
Ask if you need more.

cc @kaushikb11 @rohitgr7

awaelchli added the bug label Feb 22, 2022
awaelchli (author) commented:

Likely a result of #11448 cc @four4fish

awaelchli added the accelerator: tpu label Feb 22, 2022
awaelchli added this to the 1.6 milestone Feb 22, 2022
kaushikb11 self-assigned this Feb 22, 2022
kaushikb11 commented:

It will be fixed by the updates in #12030, where I have moved the parsing logic to the accelerators.
But I will do a quick PR with a fix for it.

four4fish commented Feb 22, 2022

Yeah, this is the follow-up item "Enable accelerator.is_available() check" in #11449.
The proper way is to call accelerator.is_available() in _init_accelerator(). I left it as a follow-up because it causes a lot of GPU test failures: in the previous accelerator_connector logic, the device availability check did not apply to GPU. Now calling self.accelerator.is_available() applies the device check to GPU as well, so a lot of tests need mocks added. To avoid a massive test change in one PR, I left this as a follow-up.
Is this urgent? Would you prefer a quick fix, or waiting for "Enable accelerator.is_available() check" to fix this?
@awaelchli @kaushikb11
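
(For reference, a rough sketch of the check being described, assuming the accelerator classes expose the is_available() method mentioned above; the helper below is illustrative, not the actual connector code:)

from pytorch_lightning.utilities.exceptions import MisconfigurationException


def _check_accelerator_available(accelerator_cls) -> None:
    # Run the availability check once the accelerator class has been resolved, before
    # any strategy or device is constructed, so every backend (TPU, GPU, ...) fails
    # with the same MisconfigurationException on machines that lack it.
    if not accelerator_cls.is_available():
        raise MisconfigurationException(f"{accelerator_cls.__name__} is not available on this system.")

Applying this uniformly is also why many GPU tests would need added mocks: the same check then runs for the GPU accelerator on CPU-only CI machines.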
