feat: support int64 <=> int32 auto conversion #1407

Merged: 5 commits, Nov 14, 2022
1 change: 1 addition & 0 deletions core/partitioning/partitioning.cpp
@@ -453,6 +453,7 @@ void partition(PartitioningCtx* ctx, ExampleIValues& example_tensor_map) {
registerSegmentsOutputs(ctx, block);

// run shape analysis on each segmented block
LOG_DEBUG("Running shape analysis for segmented graphs");
runShapeAnalysis(ctx, block, example_tensor_map);
}
}
67 changes: 64 additions & 3 deletions core/partitioning/shape_analysis.cpp
@@ -1,3 +1,4 @@
#include <queue>
#include "ATen/ATen.h"
#include "torch/csrc/jit/api/module.h"
#include "torch/csrc/jit/passes/constant_pooling.h"
@@ -57,6 +58,26 @@ std::unordered_map<const torch::jit::Value*, torch::jit::IValue> generateRandomI
return ivalue_map;
}

torch::jit::Node* getUpstreamCastNode(torch::jit::Value* val) {
std::queue<torch::jit::Value*> q;
q.push(val);
std::unordered_set<torch::jit::Node*> visited;
while (!q.empty()) {
auto cur_val = q.front();
q.pop();
auto node = cur_val->node();
if (node->kind().toQualString() == std::string("aten::to")) {
Collaborator @gs-olive commented:

May need an additional check to ensure that the aten::to schema is valid for dtype insertion, as some of these schemas do not take an integer dtype at all, for example:

  • aten::to(Tensor(a) self, bool non_blocking=False, bool copy=False) -> Tensor(b|a)
  • aten::to(Tensor(a) self, Device device, ScalarType dtype, bool non_blocking=False, bool copy=False, MemoryFormat? memory_format=None) -> Tensor(a)
  • aten::to(Tensor(a) self, Tensor other, bool non_blocking=False, bool copy=False, MemoryFormat? memory_format=None) -> Tensor(a)

A check could be something like an additional && with

(node->inputs()[1]->node()->output()->type()->kind() == torch::jit::TypeKind::IntType) ||
(node->inputs()[2]->node()->output()->type()->kind() == torch::jit::TypeKind::IntType)

Collaborator (author) @bowang007 commented:

Hey @gs-olive, do you have a reproducer for this?
What I'm not sure about: when we pass an int32 value to getUpstreamCastNode(), will the first cast node found be the one that casts this value to int64? If so, we don't need this check.
In other words, is it possible that the first cast node involving the passed value casts some other value? And if the first cast node is not the one casting to int64, will the second cast node be the one we want?

Collaborator @gs-olive commented on Nov 8, 2022:

Hi @bowang007 - as an update: while this no longer throws an error on my end, I think we do need this check, but it should perhaps be more stringent - something like:

    if ((node->kind().toQualString() == std::string("aten::to")) &&
        ((node->inputs()[1]->node()->output()->type()->kind() == torch::jit::TypeKind::IntType) ||
        (node->inputs()[2]->node()->output()->type()->kind() == torch::jit::TypeKind::IntType))) {

This is because, when the aten::to matches the second overload in my comment above, inserting a constant like 3 will cause the model to fail: that schema expects a ScalarType, not an int. I don't have a specific model that reproduces an error, and I don't think I encountered one while testing; I just think it is generally safer to be strict about which kind of upstream cast node gets repurposed to recast to Int32. Specifically, if we are unsure whether a node's schema is valid for repurposing, we should take the safer option and manually insert an Int32 cast node, as you do in createCastNode.

Collaborator @gs-olive commented on Nov 8, 2022:

@bowang007 Please let me know what you think about the comment in the thread above:
#1407 (comment)

Collaborator (author) @bowang007 commented:

@gs-olive I got your point now, let me update this part.

return node;
}
if (node->kind() != torch::jit::prim::Constant && !visited.count(node)) {
visited.insert(node);
for (auto input : node->inputs()) {
q.push(input);
}
}
}
return nullptr; // fix: the function is non-void, so the no-cast-found path must return a value
}

void getSegmentsOutputByRunning(
SegmentedBlock& seg_block,
std::unordered_map<const torch::jit::Value*, torch::jit::IValue>& ivalues_maps,
@@ -142,16 +163,56 @@ void getSegmentsOutputByRunning(
ivalues_maps[output] = jit_results[idx++];
}

// auto int64 <=> int32 conversion
if (seg_block.target() == SegmentedBlock::kTorch) {
// First, check whether there is an Int64 input
for (size_t i = 0; i < seg_block.inputs().size(); ++i) {
if (ivalues_maps[seg_block.raw_inputs()[i]].isTensor()) {
auto cur_ivalue = ivalues_maps[seg_block.raw_inputs()[i]];
at::ScalarType t = cur_ivalue.toTensor().scalar_type();
if (t == at::kLong) {
// we add a cast operation to cast the type to Int64
auto inner_g = seg_block.g();
torch::jit::Node* cast_node = getUpstreamCastNode(seg_block.raw_inputs()[i]);
std::unordered_map<torch::jit::Value*, torch::jit::Value*> value_map;
value_map.insert({cast_node->inputs()[0], seg_block.inputs()[i]});
auto env = [&](torch::jit::Value* v) { return util::getOrAddInputForValue(v, inner_g, value_map); };
auto new_cast_node = inner_g->prependNode(inner_g->createClone(cast_node, env));
seg_block.inputs()[i]->replaceAllUsesAfterNodeWith(new_cast_node, new_cast_node->outputs()[0]);
}
}
}
// TODO: This part might be necessary for some models; currently checking to verify
Collaborator commented:

@bowang007 should this be uncommented?

Collaborator (author) @bowang007 commented:

Yes - optimized and refactored; this part is now included.

// for (size_t i = 0; i < seg_block.outputs().size(); ++i) {
// if (ivalues_maps[seg_block.raw_outputs()[i]].isTensor()) {
// auto cur_ivalue = ivalues_maps[seg_block.raw_outputs()[i]];
// at::ScalarType t = cur_ivalue.toTensor().scalar_type();
// if (t == at::kLong) {
// auto inner_g = seg_block.g();
// torch::jit::Node* cast_node = getUpstreamCastNode(seg_block.raw_outputs()[i]);
// std::unordered_map<torch::jit::Value*, torch::jit::Value*> value_map;
// value_map.insert({cast_node->inputs()[0], seg_block.outputs()[i]});
// auto const_val = inner_g->insertConstant(3);
// value_map.insert({cast_node->inputs()[1], const_val});
// auto env = [&](torch::jit::Value* v) { return util::getOrAddInputForValue(v, inner_g, value_map); };
// auto new_cast_node = inner_g->appendNode(inner_g->createClone(cast_node, env));
//
// }
// }
// }
}

// set input shape for each segmented block so we will use it in the conversion process
std::vector<ir::Input> input_shapes;
std::vector<at::ScalarType> input_types;
- for (auto& i : seg_block.raw_inputs()) {
-   if (ivalues_maps[i].isTensor()) {
+ for (size_t i = 0; i < seg_block.inputs().size(); ++i) {
+   if (ivalues_maps[seg_block.raw_inputs()[i]].isTensor()) {
// set the input_shape and data_type
// we can use a temp value here instead of replacing the values in ivalues_map since we only use ivalues_map for
// shape inference
- auto cur_ivalue = ivalues_maps[i];
+ auto cur_ivalue = ivalues_maps[seg_block.raw_inputs()[i]];
at::ScalarType t = cur_ivalue.toTensor().scalar_type();

if (!partitioning_info.truncate_long_and_double && (t == at::kLong || t == at::kDouble)) {
TORCHTRT_THROW_ERROR(
"Unable to process subgraph input type of at::kLong/at::kDouble, try to compile model with truncate_long_and_double enabled");