CPU System Metrics collection #11253


Closed

EricWiener opened this issue Dec 24, 2021 · 9 comments · Fixed by #11795
Labels: accelerator: cpu · callback: device stats · feature · help wanted
Milestone: 1.7

Comments

EricWiener (Contributor) commented Dec 24, 2021

🚀 Feature

Provide CPU profiling similar to the GPU and XLA profiling provided by DeviceStatsMonitor. It would be nice if you could specify which device to profile with DeviceStatsMonitor instead of the profiling defaulting to whatever accelerator you are using.

Motivation

I am running out of CPU memory and need to figure out where this is occurring. It would be nice if I could easily monitor CPU stats (memory usage, percent utilization, etc.).

Pitch

Modify DeviceStatsMonitor to take a device argument that lets you specify which device to profile. You could then pass multiple DeviceStatsMonitor callbacks to Trainer. The CPU monitor could use psutil to track common memory attributes, for example as sketched below.
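
As a rough illustration of what such a monitor could report (a sketch only; the get_cpu_stats helper and the metric names are invented here, not a settled API), psutil already exposes the relevant readings:

import psutil


def get_cpu_stats() -> dict:
    # Hypothetical helper: metric names and selection are illustrative only.
    memory_info = psutil.Process().memory_info()
    return {
        "cpu_percent": psutil.cpu_percent(),  # system-wide CPU utilization (%)
        "memory_percent": psutil.virtual_memory().percent,  # system-wide memory usage (%)
        "process_rss_bytes": memory_info.rss,  # resident set size of this process
        "process_vms_bytes": memory_info.vms,  # virtual memory size of this process
    }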

Alternatives

N/A

Additional context

Also discussed here: #9032 (comment)

cc @Borda @kaushikb11 @awaelchli @justusschock @akihironitta @rohitgr7

EricWiener added the feature label Dec 24, 2021
ananthsub added this to the 1.6 milestone Dec 24, 2021
ananthsub (Contributor) commented Dec 24, 2021

@EricWiener would you prefer to see CPU stats automatically, even when training on GPU? And to confirm, do you want CPU stats tracked at the same cadence/hooks as GPU stats?

I retitled the issue to be about system metrics collection to avoid confusion with profiling/the Lightning profilers.

ananthsub changed the title from CPU Profiling to CPU System Metrics Dec 24, 2021
ananthsub changed the title from CPU System Metrics to CPU System Metrics collection Dec 24, 2021
EricWiener (Contributor, Author) commented Dec 24, 2021

@EricWiener would you prefer to see CPU stats automatically, even when training on GPU?

Automatic stats would be very nice, but it seems a little strange to require callbacks for GPU/XLA stats while tracking CPU stats automatically. Also, if CPU stats were tracked automatically, it would be nice if iteration speed were tracked as well.

And to confirm, do you want CPU stats tracked at the same cadence/hooks as GPU stats?

Ideally one could specify the frequency (maybe by passing a list of the hooks at which they want stats logged; a hypothetical sketch follows). For debugging memory usage it would be nice for stats to be logged at every possible hook. However, for most cases, every n steps/epochs would likely suffice.
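
For illustration only, a hypothetical interface for choosing the hooks (the hooks argument is invented here, not an existing DeviceStatsMonitor parameter):

from typing import Sequence

from pytorch_lightning.callbacks import Callback


class DeviceStatsMonitor(Callback):
    # Hypothetical sketch: the `hooks` argument is invented, not an existing parameter.
    def __init__(self, hooks: Sequence[str] = ("on_train_batch_end",)) -> None:
        self._hooks = set(hooks)

    def on_train_batch_start(self, trainer, pl_module, *args, **kwargs) -> None:
        if "on_train_batch_start" in self._hooks:
            ...  # collect and log device stats here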

ananthsub (Contributor) commented Dec 25, 2021

I should've clarified: by automatic I mean when using the device stats monitor callback, not enabled all the time.

One idea @daniellepintz and I discussed earlier was to do something like this:

from typing import Any, Dict, Optional, Union

import psutil
import torch

from pytorch_lightning.accelerators import Accelerator
from pytorch_lightning.utilities.exceptions import MisconfigurationException


class CPUAccelerator(Accelerator):

    _process: Optional[psutil.Process]  # requires psutil; check availability before use

    def setup_environment(self, root_device: torch.device) -> None:
        """
        Raises:
            MisconfigurationException:
                If the selected device is not CPU.
        """
        if "cpu" not in str(root_device):
            raise MisconfigurationException(f"Device should be CPU, got {root_device} instead.")
        self._process = psutil.Process()

    def teardown(self) -> None:
        self._process = None

    def get_device_stats(self, device: Union[str, torch.device]) -> Dict[str, Any]:
        """Return CPU metrics for the current process."""
        if not self._process:
            return {}
        return get_cpu_process_metrics(self._process)


def get_cpu_process_metrics(process: Optional[psutil.Process] = None) -> Dict[str, float]:
    # Default to the current process if none was given.
    process = process or psutil.Process()
    memory_info = process.memory_info()
    cpu_times = process.cpu_times()
    metrics: Dict[str, float] = {}
    metrics["cpu_rss_memory_bytes"] = memory_info.rss
    metrics["cpu_time_user"] = cpu_times.user
    metrics["cpu_time_system"] = cpu_times.system
    return metrics

get_cpu_process_metrics could also be called from the GPU and TPU accelerators as part of their get_device_stats implementations, for example as sketched below. Anytime someone attaches a device stats monitor callback, it would then report both the CPU stats and the stats of the specific device in use.
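
A minimal sketch of what that reuse could look like on the GPU side (assuming the get_cpu_process_metrics helper above; torch.cuda.memory_stats stands in for whatever GPU stats the accelerator already collects):

from typing import Any, Dict, Union

import torch

from pytorch_lightning.accelerators import Accelerator


class GPUAccelerator(Accelerator):
    def get_device_stats(self, device: Union[str, torch.device]) -> Dict[str, Any]:
        # Stand-in for the GPU stats the accelerator already collects.
        stats: Dict[str, Any] = dict(torch.cuda.memory_stats(device))
        # Fold in host-side process metrics so CPU usage shows up next to the GPU stats.
        stats.update(get_cpu_process_metrics())  # default -> current process
        return stats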

@four4fish @awaelchli this would motivate adding teardown to the accelerator interface. As a rule of thumb, anytime we offer a setup interface, we should also provide a teardown, since they come in pairs.

EricWiener (Contributor, Author) commented

Having CPU stats whenever the device stats monitor is used would be great.

It would also be ideal if the user could specify additional CPU metrics they want (like swap memory percent), for example along the lines sketched below.
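
For instance (illustrative only; extra_cpu_metrics is an invented argument, not an existing API), extra psutil readings could be passed as named callables:

import psutil

# Invented name, shown only to sketch the shape of the idea.
extra_cpu_metrics = {
    "swap_memory_percent": lambda: psutil.swap_memory().percent,
    "num_open_files": lambda: len(psutil.Process().open_files()),
}

# A CPU stats collector could evaluate these alongside its defaults:
stats = {name: fn() for name, fn in extra_cpu_metrics.items()}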

tchaton (Contributor) commented Jan 4, 2022

Hey @ananthsub,

Users might want to track both their CPU and accelerator device (GPU, TPU, ...) usage at the same time, which is a very common use case (done automatically by Wandb, for example).

However, the current design relies on a single accelerator being instantiated.

Do you have any idea how to resolve this?

IMO, as a user, I would prefer an interface like this:

from typing import Any, Optional

import pytorch_lightning as pl
from pytorch_lightning import Trainer
from pytorch_lightning.accelerators import CPUAccelerator
from pytorch_lightning.callbacks import Callback
from pytorch_lightning.utilities.exceptions import MisconfigurationException
from pytorch_lightning.utilities.types import STEP_OUTPUT

# Opt-in CPU stats when the selected accelerator isn't "cpu".
Trainer(accelerator="gpu", devices=2, callbacks=DeviceStatsMonitor(cpu_stats=True))


class DeviceStatsMonitor(Callback):
    ...

    def on_train_batch_end(
        self,
        trainer: "pl.Trainer",
        pl_module: "pl.LightningModule",
        outputs: STEP_OUTPUT,
        batch: Any,
        batch_idx: int,
        unused: Optional[int] = 0,
    ) -> None:
        if not trainer.logger:
            raise MisconfigurationException("Cannot use `DeviceStatsMonitor` callback with `Trainer(logger=False)`.")

        if not trainer.logger_connector.should_update_logs:
            return

        device_stats = trainer.accelerator.get_device_stats(pl_module.device)

        # Compare the device *type*; `pl_module.device` is a torch.device, not a string.
        if pl_module.device.type != "cpu" and self.cpu_stats:
            # Sketched as a class-level call; a real implementation would need a
            # CPU stats helper that doesn't require a CPUAccelerator instance.
            device_stats.update(CPUAccelerator.get_device_stats())

        ...
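
One detail worth noting with this merge: if a CPU metric and a device metric ever share a key name, device_stats.update(...) silently overwrites one of them. A simple guard, not part of the proposal above, would be to namespace the CPU keys before merging:

cpu_stats = CPUAccelerator.get_device_stats()
device_stats.update({f"cpu/{name}": value for name, value in cpu_stats.items()})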

ananthsub (Contributor) commented

@tchaton that looks reasonable to me!

tchaton (Contributor) commented Jan 5, 2022

@carmocca @awaelchli Any thoughts on this?

@EricWiener Would you have some interest in contributing this feature?

awaelchli (Contributor) commented

Yes, this looks good. If the stats collection happens on the accelerator, this is the only way I currently see. Lightning always has exactly one accelerator, but uses both the CPU and the extra device together. This topic will come up again in the future.

EricWiener (Contributor, Author) commented

@carmocca @awaelchli Any thoughts on this?

@EricWiener Would you have some interest in contributing this feature?

So sorry, I just saw this when I searched for the issue again. I'll try to work on this this week or next.

@carmocca carmocca modified the milestones: 1.6, future Feb 1, 2022
@carmocca carmocca added the help wanted Open to be worked on label Feb 1, 2022
@carmocca carmocca modified the milestones: future, 1.7 Mar 1, 2022