
Automatic Fallback #406


Merged · 56 commits · Apr 30, 2021
Commits (56; changes shown from 47)
848335e
implemented segmentedBlock to construct subgraphs
bowang007 Feb 26, 2021
123f026
stitched the TensorRT engine to Torch nodes
bowang007 Mar 2, 2021
bbd3835
implemented fallback and run successfully
bowang007 Mar 5, 2021
1ca13d8
clean messy code
bowang007 Mar 5, 2021
0d28164
resolved dependency problems in edge cases
bowang007 Mar 8, 2021
55e0510
refactored the new graph output registration
bowang007 Mar 9, 2021
f4c29b4
feat: added user level API for fallback
bowang007 Mar 10, 2021
6d3064a
feat: allow users to set fallback block size and ops
bowang007 Mar 10, 2021
100b090
feat: support Python APIs for Automatic Fallback
bowang007 Mar 11, 2021
8b7919f
fix: register the torch_fallback attribute in Python API
bowang007 Mar 11, 2021
46950bb
fix: support shape inference for add_, support non-tensor arguments f…
bowang007 Mar 17, 2021
c0ea3a9
chore: merge master branch into fallback development branch
bowang007 Mar 19, 2021
d90a300
chore: added some comments and reformat the code
bowang007 Mar 19, 2021
da09e4b
chore: support passing BoolType/ListType arguments for segments
bowang007 Mar 23, 2021
6147d4f
chore: Support more types conversion for minigraph inputs
bowang007 Mar 23, 2021
54e407e
feat: support Int/Bool and other constants' inputs/outputs for Tensor…
bowang007 Mar 25, 2021
4e32eff
feat: insert nodes by dependencies for nonTensor inputs/outputs
bowang007 Mar 30, 2021
ec2bbf2
feat: support prim::Param for fallback inputs
inocsin Mar 30, 2021
77b4dc7
Merge pull request #414 from inocsin/bowa_fallback
bowang007 Mar 30, 2021
459a9b9
chore: clean messy code
bowang007 Mar 30, 2021
cfc68ce
Merge branch 'bowa_fallback' of https://github.com/NVIDIA/TRTorch int…
bowang007 Mar 30, 2021
965a67a
chore: reorganize code structure initially
bowang007 Mar 30, 2021
3cebe97
feat: support prim::Param for input type after refactor
inocsin Mar 31, 2021
664ccbd
Merge pull request #415 from inocsin/bowa_fallback
bowang007 Mar 31, 2021
c8656ce
chore: merge from master
bowang007 Mar 31, 2021
1e68899
Merge branch 'master' of https://github.com/NVIDIA/TRTorch into bowa_…
bowang007 Apr 2, 2021
0a0e922
chore: fix format bugs
bowang007 Apr 2, 2021
6e96289
chore: refactor cloneNode function
bowang007 Apr 6, 2021
fb1a299
refactor(//core/partitioning): Reorganizing partitioning deps
narendasan Apr 1, 2021
b3589c5
feat(//core/partitioing): Adding ostream for Partition Info
narendasan Apr 1, 2021
24c3a22
refactor: Apply linting
narendasan Apr 1, 2021
ee536b6
feat(//core/partitioning): Add an ostream implementation for
narendasan Apr 1, 2021
57002ab
refactor: Apply pylinting
narendasan Apr 2, 2021
1447bd5
refactor(//core/partitioning): refactor segmentedblock
narendasan Apr 7, 2021
569d011
Merge pull request #418 from NVIDIA/narens/fallback
bowang007 Apr 7, 2021
6d826d3
test: add tests for graph segmentation and shape analysis in partitio…
bowang007 Apr 7, 2021
2840281
chore: change test code according to new APIs
bowang007 Apr 7, 2021
3d39d7c
test: add tests for TRT conversion, graph stitch, results comparison …
bowang007 Apr 8, 2021
116b001
chore: apply linting
bowang007 Apr 8, 2021
3a72dc3
test: remove the jit file dependency from tests
bowang007 Apr 13, 2021
824b555
Merge branch 'master' of https://github.com/NVIDIA/TRTorch into bowa_…
bowang007 Apr 13, 2021
d73dc42
tests: update the dependent models for fallback graph conversion, sti…
bowang007 Apr 13, 2021
d4b7ad0
refactor: refactor SegmentedBlock inshape to ir::InputRange
bowang007 Apr 14, 2021
437670e
chore: apply linting
bowang007 Apr 14, 2021
f722035
tests: use IRParser in test_tensorrt_conversion and test_stitched_graph
bowang007 Apr 16, 2021
c1934c1
chore: improve some minor code problems
bowang007 Apr 16, 2021
de3ba23
merge: resolve the confilct in AddEngineToGraph argument
bowang007 Apr 23, 2021
c67d8f6
feat: support the case when the injected node is not supported in dep…
bowang007 Apr 24, 2021
20543c6
chore: optimize minor code problems according to PR
bowang007 Apr 26, 2021
58cb53e
chore: apply linting
bowang007 Apr 26, 2021
e491bb5
fix: fix typo bug
bowang007 Apr 26, 2021
4a318a2
chore: apply linting
bowang007 Apr 27, 2021
80b1038
fix: erase the repetitive nodes in dependency analysis
bowang007 Apr 27, 2021
5110480
chore: refactor code structures according to PR
bowang007 Apr 29, 2021
dde0216
Merge branch 'master' into bowa_fallback
narendasan Apr 30, 2021
ff89059
fix(//tests/core/partitioning): Fixing some issues with the partition
narendasan Apr 30, 2021
1 change: 1 addition & 0 deletions core/BUILD
@@ -26,6 +26,7 @@ cc_library(
"//core/conversion",
"//core/runtime",
"//core/lowering",
"//core/partitioning",
"//core/util/logging",
"@tensorrt//:nvinfer"
] + select({
142 changes: 118 additions & 24 deletions core/compiler.cpp
@@ -21,37 +21,24 @@

#include "core/conversion/conversion.h"
#include "core/lowering/lowering.h"
#include "core/partitioning/partitioning.h"
#include "core/runtime/runtime.h"

namespace trtorch {
namespace core {

c10::FunctionSchema GenerateGraphSchema(
torch::jit::script::Module mod,
std::string method_name,
std::shared_ptr<torch::jit::Graph>& g) {
std::vector<c10::Argument> args;
for (auto in : g->inputs()) {
args.push_back(c10::Argument(in->debugName(), in->type()));
}

std::vector<c10::Argument> returns;
for (auto out : g->outputs()) {
returns.push_back(c10::Argument(out->debugName(), out->type()));
}

return c10::FunctionSchema(method_name, method_name, args, returns);
}

void AddEngineToGraph(
torch::jit::script::Module mod,
std::shared_ptr<torch::jit::Graph>& g,
const std::string& serialized_engine) {
auto engine_ptr = c10::make_intrusive<runtime::TRTEngine>(mod._ivalue()->name(), serialized_engine);
const std::string& serialized_engine,
int engine_id = 0) {
Review comment (Collaborator): We should think about methods for more descriptive names in the future.
auto engine_ptr =
c10::make_intrusive<runtime::TRTEngine>(mod._ivalue()->name() + std::to_string(engine_id), serialized_engine);
Review comment (Collaborator): Does engine_id just need to be unique, or do we use the ids elsewhere? If they just need to be unique, we should use the pointer trick to get something that is likely to be unique, so we don't really need to worry about conflicts.
// Get required metadata about the engine out
auto num_io = engine_ptr->num_io;
auto name = engine_ptr->name;

//..
Review comment (Collaborator): Can be removed.

// Add the engine as an attribute of the module, this will let the engine be
// serialized and deserialized
mod.register_attribute(
@@ -108,17 +95,19 @@ void AddEngineToGraph(
g->block()->appendNode(unpack_node);

// If there are multiple output tensors from TensorRT we wrap them in a tuple
// to return
if (unpack_node->outputs().size() > 1) {
// to return, convert to tuple only when we only have 1 segmented graph
if (!engine_id && unpack_node->outputs().size() > 1) {
Review comment (Collaborator): Does the case where we have multiple TRT engines never have engine_id 0? We should note this if so.

Reply (Collaborator, Author): Yes. I refactored this function by adding a default argument indicating whether we have fallback or not.
// Creates prim::TupleConstruct(<output tensors>) using outputs of the
// unpack node
auto return_tuple_node = g->createTuple(unpack_node->outputs());
g->block()->appendNode(return_tuple_node);
// Set the output as the produced tuple
g->registerOutput(return_tuple_node->outputs()[0]);
} else {
// Set the output as the sole output tensor
g->registerOutput(unpack_node->outputs()[0]);
// if fallback is enabled, multiple outputs will be registered
for (size_t i = 0; i < unpack_node->outputs().size(); ++i) {
g->registerOutput(unpack_node->outputs()[i]);
}
}

LOG_DEBUG(*g << "(AddEngineToGraph)\n");
@@ -142,6 +131,7 @@ std::string ConvertGraphToTRTEngine(const torch::jit::script::Module& mod, std::

auto convert_cfg = std::move(cfg.convert_info);
auto g = graph_and_parameters.first;

auto params = graph_and_parameters.second;
auto named_params = conversion::get_named_params(g->inputs(), params);

@@ -151,7 +141,111 @@
return std::move(engine);
}

void AddSegmentedBlockToGraph(
std::shared_ptr<torch::jit::Graph>& g,
partitioning::SegmentedBlock& seg,
std::unordered_map<torch::jit::Value*, torch::jit::Value*>& old_to_new_g) {
// old_to_new_g contains: original global graph value => new global graph value,
// mini_to_new_g: mini graph value -> new graph value
std::unordered_map<torch::jit::Value*, torch::jit::Value*> mini_to_new_g;
size_t input_idx = 0;
if (seg.target() == partitioning::SegmentedBlock::kTensorRT && g->inputs().size() > 0) {
if (g->inputs()[0]->type()->str().find("__torch__") == std::string::npos) {
auto self = g->insertInput(0, "self_1");
self->setType(seg.inputs()[0]->type());
}
mini_to_new_g[seg.inputs()[input_idx++]] = g->inputs()[0];
}

for (auto& raw_input : seg.raw_inputs()) {
if (old_to_new_g.count(raw_input)) {
mini_to_new_g[seg.inputs()[input_idx++]] = old_to_new_g[raw_input];
}
}

for (const auto n : seg.nodes()) {
util::cloneNode(n, g, mini_to_new_g);
}

// original graph value => new global graph value
for (size_t i = 0; i < seg.raw_outputs().size(); ++i) {
old_to_new_g[seg.raw_outputs()[i]] = mini_to_new_g[seg.outputs()[i]];
}

return;
}

torch::jit::script::Module CompileGraphWithFallback(const torch::jit::script::Module& mod, CompileSpec cfg) {
// TODO: Should be doing a functional transform but need PR #31978
// [jit] More robust mangling
// torch::jit::script::Module new_mod = mod.clone();
torch::jit::script::Module new_mod(mod._ivalue()->name() + "_trt");
std::vector<std::shared_ptr<torch::jit::Graph>> graphs;
for (const torch::jit::script::Method& method : mod.get_methods()) {
// Don't convert hidden methods
if (method.name().rfind("_", 0)) {
auto new_g = std::make_shared<torch::jit::Graph>();
auto graph_and_parameters = lowering::Lower(mod, method.name());

auto g = graph_and_parameters.first;
auto params = graph_and_parameters.second;
auto named_params = conversion::get_named_params(g->inputs(), params);
auto convert_cfg = std::move(cfg.convert_info);
LOG_INFO(*g << "(LoweringGraph)\n");

// segment the graph and convert segmented TensorRT block
auto segmented_blocks = partitioning::Partition(g, convert_cfg.input_ranges, cfg.partition_info);
if (segmented_blocks.size() == 1 && segmented_blocks[0].target() == partitioning::SegmentedBlock::kTorch) {
return mod;
}

int trt_engine_id = 1;
std::unordered_map<torch::jit::Value*, torch::jit::Value*> old_to_new_g;
// add global graph's input to old_to_new_g mapping
for (auto input : g->inputs()) {
util::getOrAddInputForValue(input, new_g, old_to_new_g);
}
for (auto& seg_block : segmented_blocks) {
LOG_INFO(*g << "(MiniGraphInSegmentedBlock)\n");
if (seg_block.target() == partitioning::SegmentedBlock::kTensorRT) {
std::vector<ir::InputRange> input_ranges;
Review comment (Collaborator): Does dynamic shape work?

Reply (Collaborator, Author): I don't think so. Currently we haven't considered the case when we have dynamic shapes in shape analysis.

Review comment (Collaborator): That's probably higher priority than loops then, I think, since we have unrolling that can be enabled. Also I think it's pretty achievable in the time we have.
for (auto& shape : seg_block.in_shape()) {
input_ranges.push_back(ir::InputRange(shape));
}
// update the input ranges for each segments
convert_cfg.input_ranges = input_ranges;
auto engine = conversion::ConvertBlockToEngine(seg_block.block(), convert_cfg, named_params);
auto temp_g = std::make_shared<torch::jit::Graph>();
AddEngineToGraph(new_mod, temp_g, engine, trt_engine_id++);

seg_block.update_graph(temp_g);
AddSegmentedBlockToGraph(new_g, seg_block, old_to_new_g);
} else {
AddSegmentedBlockToGraph(new_g, seg_block, old_to_new_g);
}
}

for (auto& output : g->outputs()) {
new_g->registerOutput(old_to_new_g[output]);
}

LOG_INFO(*new_g << "(FallbackGraph)\n");

auto new_method = new_mod._ivalue()->compilation_unit()->create_function(method.name(), new_g);
auto schema = util::GenerateGraphSchema(new_method->name(), new_g);
new_mod.type()->addMethod(new_method);
new_method->setSchema(schema);
}
}

return new_mod;
}

torch::jit::script::Module CompileGraph(const torch::jit::script::Module& mod, CompileSpec cfg) {
// TODO: not sure how to deal with duplicated code here, so just cut out a branch temporally
if (cfg.partition_info.enabled) {
return CompileGraphWithFallback(mod, cfg);
}
// TODO: Should be doing a functional transform but need PR #31978
// [jit] More robust mangling
// torch::jit::script::Module new_mod = mod.clone();
Expand All @@ -164,7 +258,7 @@ torch::jit::script::Module CompileGraph(const torch::jit::script::Module& mod, C
auto new_g = std::make_shared<torch::jit::Graph>();
AddEngineToGraph(new_mod, new_g, engine);
auto new_method = new_mod._ivalue()->compilation_unit()->create_function(method.name(), new_g);
auto schema = GenerateGraphSchema(new_mod, new_method->name(), new_g);
auto schema = util::GenerateGraphSchema(new_method->name(), new_g);
new_mod.type()->addMethod(new_method);
new_method->setSchema(schema);
}
5 changes: 4 additions & 1 deletion core/compiler.h
@@ -3,14 +3,17 @@
#include <cuda_runtime.h>
#include <vector>
#include "core/conversion/conversion.h"
#include "core/ir/ir.h"
#include "core/partitioning/partitioning.h"
#include "torch/csrc/jit/api/module.h"

namespace trtorch {
namespace core {

struct CompileSpec {
CompileSpec(std::vector<conversion::InputRange> input_ranges) : convert_info(std::move(input_ranges)) {}
CompileSpec(std::vector<ir::InputRange> input_ranges) : convert_info(std::move(input_ranges)) {}
conversion::ConversionInfo convert_info;
partitioning::PartitionInfo partition_info;
};

bool CheckMethodOperatorSupport(const torch::jit::script::Module& mod, std::string method_name);
3 changes: 2 additions & 1 deletion core/conversion/BUILD
@@ -23,7 +23,8 @@ cc_library(
"//core/conversion/conversionctx",
"//core/conversion/converters",
"//core/conversion/evaluators",
"//core/util:prelude"
"//core/util:prelude",
"//core/ir",
] + select({
":use_pre_cxx11_abi": ["@libtorch_pre_cxx11_abi//:libtorch"],
"//conditions:default": ["@libtorch//:libtorch"],
49 changes: 0 additions & 49 deletions core/conversion/InterfaceTypes.cpp
@@ -23,55 +23,6 @@ GraphParams get_named_params(c10::ArrayRef<torch::jit::Value*> inputs, std::vect
return std::move(named_params);
}

InputRange::InputRange(std::vector<int64_t> d) {
if (d.size() > 5) {
LOG_WARNING("Verify that this dim size is accepted");
}

opt = util::toDims(d);
min = util::toDims(d);
max = util::toDims(d);
input_shape = util::toDims(d);
input_is_dynamic = false;
}

InputRange::InputRange(std::vector<int64_t> min_shape, std::vector<int64_t> opt_shape, std::vector<int64_t> max_shape) {
if (min_shape.size() > 5 || opt_shape.size() > 5 || max_shape.size() > 5) {
LOG_WARNING("Verify that this dim size is accepted");
}

std::set<size_t> sizes;
sizes.insert(min_shape.size());
sizes.insert(opt_shape.size());
sizes.insert(max_shape.size());

if (sizes.size() != 1) {
LOG_ERROR(
"Expected all input sizes have the same dimensions, but found dimensions: min("
<< min_shape.size() << "), opt(" << opt_shape.size() << "), max(" << max_shape.size() << ")");
}

min = util::toDims(min_shape);
opt = util::toDims(opt_shape);
max = util::toDims(max_shape);

std::vector<int64_t> dyn_shape;
for (size_t i = 0; i < opt_shape.size(); i++) {
std::set<uint64_t> dim;
dim.insert(min_shape[i]);
dim.insert(opt_shape[i]);
dim.insert(max_shape[i]);
if (dim.size() != 1) {
dyn_shape.push_back(-1);
input_is_dynamic = true;
} else {
dyn_shape.push_back(opt_shape[i]);
}
}

input_shape = util::toDims(dyn_shape);
}

} // namespace conversion
} // namespace core
} // namespace trtorch
5 changes: 4 additions & 1 deletion core/conversion/conversion.cpp
@@ -118,7 +118,10 @@ void AddLayer(ConversionCtx* ctx, const torch::jit::Node* n) {
<< "please report this error to https://www.github.com/NVIDIA/TRTorch/issues");
}

void AddInputs(ConversionCtx* ctx, at::ArrayRef<const torch::jit::Value*> inputs, std::vector<InputRange>& input_dims) {
void AddInputs(
ConversionCtx* ctx,
at::ArrayRef<const torch::jit::Value*> inputs,
std::vector<ir::InputRange>& input_dims) {
std::vector<const torch::jit::Value*> input_tensors;
for (auto in : inputs) {
// Disregarding inputs that are not tensors
16 changes: 3 additions & 13 deletions core/conversion/conversion.h
@@ -4,27 +4,17 @@

#include "NvInfer.h"
#include "core/conversion/conversionctx/ConversionCtx.h"
#include "core/ir/ir.h"
#include "torch/csrc/jit/ir/ir.h"

namespace trtorch {
namespace core {
namespace conversion {

struct InputRange {
nvinfer1::Dims min;
nvinfer1::Dims max;
nvinfer1::Dims opt;
nvinfer1::Dims input_shape;
bool input_is_dynamic = false;
// Should we restrict to unsigned?
InputRange(std::vector<int64_t> d);
InputRange(std::vector<int64_t> min_shape, std::vector<int64_t> opt_shape, std::vector<int64_t> max_shape);
};

struct ConversionInfo {
std::vector<InputRange> input_ranges;
std::vector<ir::InputRange> input_ranges;
BuilderSettings engine_settings;
ConversionInfo(std::vector<InputRange> input_ranges)
ConversionInfo(std::vector<ir::InputRange> input_ranges)
: input_ranges(std::move(input_ranges)), engine_settings(BuilderSettings()) {}
};

2 changes: 1 addition & 1 deletion core/conversion/evaluators/aten.cpp
@@ -450,7 +450,7 @@ auto aten_registrations TRTORCH_UNUSED =
if (args.at(n->input(0)).IValue()->isInt()) {
auto a = args.at(n->input(0)).unwrapToInt();
auto b = args.at(n->input(1)).unwrapToInt();
return std::floor(a / b);
return static_cast<int>(std::floor(a / b));
} else if (args.at(n->input(0)).IValue()->isDouble()) {
auto a = args.at(n->input(0)).unwrapToDouble();
auto b = args.at(n->input(1)).unwrapToDouble();
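The one-line change above casts the result of std::floor back to an integer on the evaluator's int branch. One thing worth noting: with integer operands, C++'s `/` truncates toward zero before std::floor ever runs, which differs from floor division for negative operands. A hedged sketch (`floordiv` is an illustrative helper for comparison, not the evaluator's code):

```cpp
#include <cmath>
#include <cstdint>

// Floor division done in floating point, which rounds toward negative
// infinity for mixed-sign operands (Python-style floor semantics).
int64_t floordiv(int64_t a, int64_t b) {
  return static_cast<int64_t>(std::floor(static_cast<double>(a) / b));
}
```

For non-negative operands the two agree, e.g. floordiv(7, 2) and 7 / 2 both give 3; they diverge only when the signs differ (floordiv(-7, 2) is -4, while -7 / 2 truncates to -3).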
35 changes: 35 additions & 0 deletions core/ir/BUILD
@@ -0,0 +1,35 @@
package(default_visibility = ["//visibility:public"])

config_setting(
name = "use_pre_cxx11_abi",
values = {
"define": "abi=pre_cxx11_abi",
}
)

cc_library(
name = "ir",
hdrs = [
"ir.h"
],
srcs = [
"InputRange.cpp",
],
deps = [
"@tensorrt//:nvinfer",
"//core/util:prelude",
] + select({
":use_pre_cxx11_abi": ["@libtorch_pre_cxx11_abi//:libtorch"],
"//conditions:default": ["@libtorch//:libtorch"],
}),
)

load("@rules_pkg//:pkg.bzl", "pkg_tar")

pkg_tar(
name = "include",
package_dir = "core/ir/",
srcs = [
"ir.h",
],
)