Fix memory leaks #2526


Merged
merged 1 commit into from
Dec 18, 2023

Conversation

gcuendet
Contributor

Description

Compiling a TorchScript module with Torch-TensorRT exhibits large memory leaks. A simple way to reproduce the leaks is provided in this script:

import numpy as np
import torch_tensorrt as trt
import torch
import torchvision
import psutil
import gc


if __name__ == "__main__":
    network = torchvision.models.mobilenet_v2(pretrained=True)
    network.eval().cuda()
    torch_s = torch.jit.script(network)

    compile_settings = {
        "inputs": [
            trt.Input([1, 3, 224, 224])
        ],
        "enabled_precisions": {torch.float32},
    }
    output_path = "/tmp/trt.ts"

    for _ in range(3):
        print(f"Used Virtual Memory: {psutil.virtual_memory().used / (1024*1024)}")
        trt_ts_module = trt.compile(torch_s, **compile_settings)
        torch.jit.save(trt_ts_module, output_path)

        del trt_ts_module
        gc.collect()

Running this script prints the used memory for each loop iteration, which increases steadily by 35–45 MB per iteration, even though all objects created in the loop are deleted, so it should stay flat.

Running the small reproduction script under Valgrind, using the following command:

valgrind --leak-check=full python leak_mem.py

reports the following sizeable (~42 MB) possible losses:

==1163== 42,470,332 bytes in 3 blocks are possibly lost in loss record 70,886 of 70,887
==1163==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1163==    by 0x2EAF9ADE: ??? (in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8.5.3)
==1163==    by 0x2F00444B: ??? (in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8.5.3)
==1163==    by 0x12D2ED492: torch_tensorrt::core::conversion::ConversionCtx::SerializeEngine[abi:cxx11]() (in /usr/local/lib/python3.8/dist-packages/torch_tensorrt/lib/libtorchtrt.so)
==1163==    by 0x12D241E30: torch_tensorrt::core::conversion::ConvertBlockToEngine[abi:cxx11](torch::jit::Block const*, torch_tensorrt::core::conversion::ConversionInfo, std::map<torch::jit::Value*, c10::IValue, std::less<torch::jit::Value*>, std::allocator<std::pair<torch::jit::Value* const, c10::IValue> > >&) (in /usr/local/lib/python3.8/dist-packages/torch_tensorrt/lib/libtorchtrt.so)
==1163==    by 0x12D1F75A2: torch_tensorrt::core::CompileGraph(torch::jit::Module const&, torch_tensorrt::core::CompileSpec) (in /usr/local/lib/python3.8/dist-packages/torch_tensorrt/lib/libtorchtrt.so)
==1163==    by 0x12D05AC0B: torch_tensorrt::pyapi::CompileGraph(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&) (torch_tensorrt_py.cpp:155)
==1163==    by 0x12D08806E: void pybind11::cpp_function::initialize<torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), torch::jit::Module, torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&, pybind11::name, pybind11::scope, pybind11::sibling, char [128]>(torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), torch::jit::Module (*)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [128])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) (cast.h:1439)
==1163==    by 0x12D08AFA8: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (pybind11.h:929)
==1163==    by 0x5F5B38: PyCFunction_Call (in /usr/bin/python3.8)
==1163==    by 0x5F6705: _PyObject_MakeTpCall (in /usr/bin/python3.8)
==1163==    by 0x571142: _PyEval_EvalFrameDefault (in /usr/bin/python3.8)
==1163==
==1163== 42,470,332 bytes in 3 blocks are possibly lost in loss record 70,887 of 70,887
==1163==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1163==    by 0x2EAF9ADE: ??? (in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8.5.3)
==1163==    by 0x2F7B03DF: ??? (in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8.5.3)
==1163==    by 0x12D2139BC: std::_Function_handler<void (std::vector<c10::IValue, std::allocator<c10::IValue> >&), torch::jit::Function* torch::class_<torch_tensorrt::core::runtime::TRTEngine>::defineMethod<torch_tensorrt::core::runtime::(anonymous namespace)::{lambda(c10::intrusive_ptr<torch_tensorrt::core::runtime::TRTEngine, c10::detail::intrusive_target_default_null_type<torch_tensorrt::core::runtime::TRTEngine> > const&)#1}>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, torch_tensorrt::core::runtime::(anonymous namespace)::{lambda(c10::intrusive_ptr<torch_tensorrt::core::runtime::TRTEngine, c10::detail::intrusive_target_default_null_type<torch_tensorrt::core::runtime::TRTEngine> > const&)#1}, std::allocator<char>, std::initializer_list<torch::arg>)::{lambda(std::vector<c10::IValue, std::allocator<c10::IValue> >&)#1}>::_M_invoke(std::_Any_data const&, std::vector<c10::IValue, std::allocator<c10::IValue> >&) (in /usr/local/lib/python3.8/dist-packages/torch_tensorrt/lib/libtorchtrt.so)
==1163==    by 0xC64E5EF6: torch::jit::Pickler::pushIValueImpl(c10::IValue const&) (in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
==1163==    by 0xC64E641A: torch::jit::Pickler::pushIValue(c10::IValue const&) (in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
==1163==    by 0xC64E612E: torch::jit::Pickler::pushIValueImpl(c10::IValue const&) (in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
==1163==    by 0xC64E65B2: torch::jit::Pickler::pushIValue(c10::IValue const&) (in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
==1163==    by 0xC694D3BD: torch::jit::ScriptModuleSerializer::writeArchive(c10::IValue const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) (in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
==1163==    by 0xC695043A: torch::jit::ScriptModuleSerializer::serialize(torch::jit::Module const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, bool, bool) (in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
==1163==    by 0xC695154C: torch::jit::ExportModule(torch::jit::Module const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, bool, bool, bool) (in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
==1163==    by 0xC13976B4: void pybind11::cpp_function::initialize<torch::jit::initJitScriptBindings(_object*)::{lambda(torch::jit::Module&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)#25}, void, torch::jit::Module&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg_v>(torch::jit::initJitScriptBindings(_object*)::{lambda(torch::jit::Module&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)#25}&&, void (*)(torch::jit::Module&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) (in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
==1163==

This points to two places,

  1. a function called torch_tensorrt::core::conversion::ConversionCtx::SerializeEngine (defined in core/conversion/conversionctx/ConversionCtx.cpp)
  2. a class called torch::class_<torch_tensorrt::core::runtime::TRTEngine> (defined in core/runtime/register_jit_hooks.cpp)

In these two places, pointers to TensorRT objects are obtained:

  1. An IHostMemory* raw pointer is returned by nvinfer1::IBuilder::buildSerializedNetwork(...)
  2. An IHostMemory* raw pointer is returned by nvinfer1::ICudaEngine::serialize()

This PR adds the missing wrapping of these raw pointers in smart pointers, so that the destructors of the underlying TensorRT objects are called and the associated host memory is properly released.

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

Add missing wrapping of raw pointers in smart pointers, so that the
destructors of the underlying TensorRT objects are called properly

Signed-off-by: Gabriel Cuendet <[email protected]>
@github-actions github-actions bot added component: conversion Issues re: Conversion stage component: core Issues re: The core compiler component: runtime labels Dec 11, 2023
@github-actions github-actions bot requested a review from narendasan December 11, 2023 14:06
@gcuendet
Contributor Author

Looking quickly at the rest of the code, the TRTEngine::get_engine_layer_info function implemented in `core/core/TRTEngine.cpp` might be another place where such wrapping of a TensorRT raw pointer in a smart pointer is missing:

std::string TRTEngine::get_engine_layer_info() {
  auto inspector = cuda_engine->createEngineInspector(); // <-- The object pointed to by inspector never gets deleted
  return inspector->getEngineInformation(nvinfer1::LayerInformationFormat::kJSON);
}

@narendasan narendasan requested a review from gs-olive December 12, 2023 18:07
Collaborator

@narendasan narendasan left a comment


LGTM, thanks for the patch. Just want @gs-olive to take a look as well.

@gs-olive
Collaborator

This looks good! I do still see the memory footprint increasing from run to run, but the "definitely lost" total in the Valgrind leak summary decreases substantially, which is great to see. Note that I am using a debug build, which could affect the memory metrics here.

@narendasan narendasan merged commit e0b3fe1 into pytorch:main Dec 18, 2023