
fix bugs in embedding converter #403


Merged — 1 commit merged into pytorch:master on Mar 22, 2021

Conversation

@ruoqianguo (Contributor):

Signed-off-by: Ruoqian Guo [email protected]

Description

In my practice, the indicesTensor in aten::embedding is sometimes an IValue, so I replaced ITensor() with ITensorOrFreeze().
According to the documentation, after calling setType the type of a tensor is unchanged unless the tensor is a network input tensor or is marked as an output tensor or shape output tensor. So I replaced the setType call with addIdentity, to avoid changing the input data type while still ensuring indicesTensor is cast to INT32.

In addition, I used the linters to ensure that my code matches the style guidelines. The linters changed the entire file (core/conversion/converters/impl/select.cpp), but I only modified the aten::embedding part of select.cpp.
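The idea of the fix — cast the indices to INT32 *inside* the network (identity layer) and gather on axis 0, instead of retyping the network input itself — can be sketched outside TensorRT. This is a minimal numpy analogue with hypothetical variable names, not the TRTorch code:

```python
import numpy as np

# Analogue of the converter's logic: the embedding lookup is a gather on
# axis 0, and the indices are cast to int32 inside the "network" rather
# than retyping the network input.
weight = np.arange(24, dtype=np.float32).reshape(6, 4)       # embedding table
indices_in = np.array([0.0, 2.0, 5.0], dtype=np.float32)     # input stays float

# analogue of addIdentity + setOutputType(0, kINT32)
indices_i32 = indices_in.astype(np.int32)

# analogue of IGatherLayer on axis 0
out = np.take(weight, indices_i32, axis=0)
print(out.shape)  # (3, 4)
```

Note that `indices_in` keeps its float dtype throughout; only the internal copy is integer, which mirrors why the network's declared input type no longer changes.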

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes

Comment on lines 177 to 184 (core/conversion/converters/impl/select.cpp):

 {"aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)",
  [](ConversionCtx* ctx, const torch::jit::Node* n, args& args) -> bool {
    auto embeddingTensor = args[0].ITensorOrFreeze(ctx);
-   auto indicesTensor = args[1].ITensor();
+   auto indicesTensor = args[1].ITensorOrFreeze(ctx);
    // Set datatype for indices tensor to INT32
-   indicesTensor->setType(nvinfer1::DataType::kINT32);
+   auto identity = ctx->net->addIdentity(*indicesTensor);
+   identity->setOutputType(0, nvinfer1::DataType::kINT32);
+   indicesTensor = identity->getOutput(0);

    // IGatherLayer takes in input tensor, the indices, and the axis of input tensor to take indices from
    auto gather_layer = ctx->net->addGather(*embeddingTensor, *indicesTensor, 0);
@ruoqianguo (Contributor, Author):
I only modified this part in select.cpp.

@@ -106,7 +106,7 @@ TEST(Converters, ATenEmbeddingConvertsCorrectly) {
auto jit_results = trtorch::tests::util::RunGraph(g, params, {jit_in});

// Run TensorRT
- auto options_trt = torch::TensorOptions().device(torch::kCUDA, 0).dtype(torch::kI32);
+ auto options_trt = torch::TensorOptions().device(torch::kCUDA, 0).dtype(torch::kFloat);
Collaborator:
Do the indices only come in float? Can they come in other types? We should maybe duplicate the test instead of modifying it.

@ruoqianguo (Contributor, Author) — Mar 18, 2021:
When using the setType function in the embedding converter, the input data type is changed from kFloat to kInt32. Since I have replaced setType with addIdentity, the input data type no longer changes. If I still use kInt32 in the test, it fails with an error like: "Expected inputs[pyt_idx].dtype() == expected_type to be true but got false. Expected input tensors to have type Float, found type int".
In my opinion, a single operation in one layer of the network should not directly change the input data type of the entire network. Also, issue #388 may be resolvable by setting all input data types to kFloat.

Collaborator:

Yeah, this is a good catch. I was just wondering about the test and why it wasn't duplicated. I think it's not a massive deal; other than that this patch lgtm.

@narendasan (Collaborator):

When I run the linter on my machine on your branch it switches everything back. How are you running the linter? do you perhaps have a global clang-format or something?

@ruoqianguo (Contributor, Author):

> When I run the linter on my machine on your branch it switches everything back. How are you running the linter? do you perhaps have a global clang-format or something?

Previously I used clang-format 10.0; now I have changed it to clang-format 9.0.1-12, and everything in select.cpp is back to the original formatting.

Besides, after I run the command "bazel run //tools/linter:cpplint -- //..." with clang-format 9.0.1-12, some other files are changed as well:
[screenshot: list of modified files]
I am curious why the other files are also modified. Is it because of my clang-format version?

@narendasan (Collaborator):

Hmm, that is odd. I did not realize that different clang-format versions would give different lintings. I will look into how the canonical linter (the one that runs on GitHub for PRs) is set up and make it standard.

@narendasan (Collaborator) left a review comment:

lgtm

@narendasan merged commit 9439059 into pytorch:master on Mar 22, 2021