TensorFlow.loadLibrary not working on Windows #404

Open
lucaro opened this issue Jan 6, 2022 · 17 comments

Comments

@lucaro

lucaro commented Jan 6, 2022

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 x86_64): Windows 10
  • TensorFlow installed from (source or binary): 2.4.1
  • TensorFlow version (use command below):
  • Java version (i.e., the output of java -version): 11
  • Java command line flags (e.g., GC parameters):
  • Python version (if transferring a model trained in Python):
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:

Describe the current behavior

I'm trying to use models which contain ops from tensorflow-text. To get them to work on Linux, I took the .so files from the pip wheel and loaded them using TensorFlow.loadLibrary(). When I run the same code on Windows, using the binaries extracted in the same way from the Windows pip wheel (which are, for some reason, also .so files rather than DLLs), the libraries fail to load. The example code below uses universal-sentence-encoder-multilingual-large and works as expected when run from within the WSL2 Ubuntu 20.04 environment on my Windows 10 machine. Is there something else that needs to be done to get the libraries to load on Windows?
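For context, a pip wheel is a plain zip archive, so pulling the op libraries out can be done with java.util.zip as well. The sketch below is purely illustrative (WheelExtract is a made-up helper; the entry layout inside the tensorflow-text wheel is an assumption, so adjust the filter as needed):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Illustrative sketch: extract every .so entry from a pip wheel (a wheel is
// just a zip archive). Names and paths are assumptions, not part of any
// TensorFlow API.
public class WheelExtract {

    static int extractSharedLibs(Path wheel, Path outDir) throws IOException {
        int count = 0;
        Files.createDirectories(outDir);
        try (ZipFile zip = new ZipFile(wheel.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().endsWith(".so")) {
                    // flatten the archive layout: keep only the file name
                    Path target = outDir.resolve(Paths.get(entry.getName()).getFileName().toString());
                    try (InputStream in = zip.getInputStream(entry)) {
                        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        int n = extractSharedLibs(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("extracted " + n + " libraries");
    }
}
```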

Describe the expected behavior

Code to reproduce the issue

import java.io.File;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.tensorflow.SavedModelBundle;
import org.tensorflow.Tensor;
import org.tensorflow.TensorFlow;
import org.tensorflow.ndarray.NdArrays;
import org.tensorflow.ndarray.buffer.DataBuffers;
import org.tensorflow.ndarray.buffer.FloatDataBuffer;
import org.tensorflow.proto.framework.OpList;
import org.tensorflow.types.TFloat32;
import org.tensorflow.types.TString;

public class Playground {

    public static void main(String[] args) {

        System.out.println(TensorFlow.version());

        File tensorflowTextBase = new File("../resources/tf_text/2.4.1/linux");

        File[] libraries = tensorflowTextBase.listFiles(f -> f.getName().endsWith(".so"));

        System.out.println("found " + libraries.length + " libraries");

        for (File lib : libraries) {
            try {
                OpList list = TensorFlow.loadLibrary(lib.getAbsolutePath());
                System.out.println("loaded " + list.getOpCount() + " ops from " + lib.getName());
            } catch (UnsatisfiedLinkError e) {
                System.err.println("could not load " + lib.getName());
            }
        }


        String text = "this is a test text";

        File file = new File("../resources/universal-sentence-encoder-multilingual-large_3/");
        System.out.println(file.getAbsolutePath());

        SavedModelBundle textEmbedding = SavedModelBundle.load(file.getAbsolutePath());

        try (TString textTensor = TString.tensorOf(NdArrays.vectorOfObjects(text))) {

            HashMap<String, Tensor> inputMap = new HashMap<>();
            inputMap.put("inputs", textTensor);

            Map<String, Tensor> resultMap = textEmbedding.call(inputMap);

            TFloat32 embedding = (TFloat32) resultMap.get("outputs");

            float[] embeddingArray = new float[512];
            FloatDataBuffer floatBuffer = DataBuffers.of(embeddingArray);
            embedding.read(floatBuffer);

            System.out.println(Arrays.toString(embeddingArray));

        }
    }
}

Other info / logs

Output from WSL2 Ubuntu 20.04

2.4.1
found 12 libraries
loaded 1 ops from _constrained_sequence_op.so
loaded 1 ops from _mst_ops.so
loaded 4 ops from _normalize_ops.so
loaded 1 ops from _regex_split_ops.so
loaded 7 ops from _sentencepiece_tokenizer.so
loaded 1 ops from _sentence_breaking_ops.so
loaded 1 ops from _split_merge_from_logits_tokenizer.so
loaded 1 ops from _split_merge_tokenizer.so
loaded 1 ops from _state_based_sentence_breaker_op.so
loaded 1 ops from _unicode_script_tokenizer.so
loaded 1 ops from _whitespace_tokenizer.so
loaded 1 ops from _wordpiece_tokenizer.so
/mnt/c/Users/Lucaro/Workspace/cineast/cineast-runtime/../resources/universal-sentence-encoder-multilingual-large_3
2022-01-06 13:12:35.253304: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:32] Reading SavedModel from: /mnt/c/Users/Lucaro/Workspace/cineast/cineast-runtime/../resources/universal-sentence-encoder-multilingual-large_3
2022-01-06 13:12:35.326120: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:55] Reading meta graph with tags { serve }
2022-01-06 13:12:35.326257: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:93] Reading SavedModel debug info (if present) from: /mnt/c/Users/Lucaro/Workspace/cineast/cineast-runtime/../resources/universal-sentence-encoder-multilingual-large_3
2022-01-06 13:12:35.326472: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-06 13:12:35.536663: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2022-01-06 13:12:35.555468: I external/org_tensorflow/tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3501000000 Hz
2022-01-06 13:12:37.667468: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: /mnt/c/Users/Lucaro/Workspace/cineast/cineast-runtime/../resources/universal-sentence-encoder-multilingual-large_3
2022-01-06 13:12:38.426469: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 3173164 microseconds.
[-0.025447764, 0.05077521, -0.033019744, 0.078675404, 0.01637241, 0.06915054, -0.05788038, -0.023094412, -0.02089644, 0.02315649, 0.009380558, -0.021741828, -0.008661511, -0.0459719, 0.0042683827, -0.0513097, -0.01917885, -0.02274911
...

Output from Windows

2.4.1
found 12 libraries
could not load _constrained_sequence_op.so
could not load _mst_ops.so
could not load _normalize_ops.so
could not load _regex_split_ops.so
could not load _sentencepiece_tokenizer.so
could not load _sentence_breaking_ops.so
could not load _split_merge_from_logits_tokenizer.so
could not load _split_merge_tokenizer.so
could not load _state_based_sentence_breaker_op.so
could not load _unicode_script_tokenizer.so
could not load _whitespace_tokenizer.so
could not load _wordpiece_tokenizer.so
C:\Users\Lucaro\Workspace\cineast\resources\universal-sentence-encoder-multilingual-large_3
2022-01-06 13:30:06.993614: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:32] Reading SavedModel from: C:\Users\Lucaro\Workspace\cineast\resources\universal-sentence-encoder-multilingual-large_3
2022-01-06 13:30:07.024864: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:55] Reading meta graph with tags { serve }
2022-01-06 13:30:07.024895: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:93] Reading SavedModel debug info (if present) from: C:\Users\Lucaro\Workspace\cineast\resources\universal-sentence-encoder-multilingual-large_3
2022-01-06 13:30:07.025022: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-06 13:30:07.174499: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2022-01-06 13:30:09.339210: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: C:\Users\Lucaro\Workspace\cineast\resources\universal-sentence-encoder-multilingual-large_3
2022-01-06 13:30:10.152906: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 3159272 microseconds.
Exception in thread "main" org.tensorflow.exceptions.TensorFlowException: Op type not registered 'SentencepieceOp' in binary running on LUCARO-DESKTOP-. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
	at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:101)
	at org.tensorflow.SavedModelBundle.load(SavedModelBundle.java:418)
	at org.tensorflow.SavedModelBundle.access$000(SavedModelBundle.java:59)
	at org.tensorflow.SavedModelBundle$Loader.load(SavedModelBundle.java:68)
	at org.tensorflow.SavedModelBundle.load(SavedModelBundle.java:242)
	at org.vitrivr.cineast.standalone.Playground.main(Playground.java:45)
@karllessard
Collaborator

It is indeed strange that the Windows binaries are .so files rather than DLLs... but my first guess is that the load fails on Windows, while it works on Linux, because TensorFlow is built monolithically on Windows, as you can see here

The result is that on Windows there is a single TensorFlow binary (tensorflow_cc.dll), while on other platforms there are two: libtensorflow_framework.so and libtensorflow_cc.so.

I know that on Linux and macOS the tensorflow-text binaries link against libtensorflow_framework.so. So my guess is that the loader tries to do the same thing on Windows but fails to find that library. You can probably check by dumping the list of library dependencies of the Windows tensorflow-text binary.

If that's the case, we need to understand how Python manages to load it, or maybe try building the Windows TF binaries as non-monolithic as well (though this setting comes from TensorFlow itself, not TensorFlow-Java, and I bet there is a good reason for it).

@lucaro
Author

lucaro commented Jan 6, 2022

Thanks for the quick reply. If I understand the TensorFlow source correctly, the loadLibrary call is just a wrapper around the operating system call that loads a shared library, for both the Python and the Java version. Is the monolithic build on Windows only done for TensorFlow Java, or also for Python? It looks the same to me, but I have very little experience with Bazel, so I'm not sure I'm reading the build files correctly. When I manually load the .so files on Windows in Python, the call succeeds, at least if all the versions match up; if there's a version mismatch, Python at least throws a cryptic message about a missing entry point. So if the libs are built the same way for both Python and Java, there should be a way to make this work, no? Since the load call fails silently on the Java side, I also don't know whether there's some other mismatch.
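Incidentally, one way to get more than a silent failure is to go through the JDK's own loader, which uses the same dlopen/LoadLibrary mechanism and surfaces the OS error message. A minimal diagnostic sketch (LoadProbe is a made-up helper, not part of the TF API):

```java
// LoadProbe is a hypothetical diagnostic helper, not part of the TensorFlow
// API. System.load goes through the same dlopen/LoadLibrary mechanism, so a
// failing load surfaces the OS-level error message here.
public class LoadProbe {

    // Returns "ok" on success, otherwise the loader's full error message.
    static String tryLoad(String absolutePath) {
        try {
            System.load(absolutePath); // requires an absolute path
            return "ok";
        } catch (UnsatisfiedLinkError e) {
            // On Windows, a missing transitive dependency (such as
            // _pywrap_tensorflow_internal.pyd) typically shows up here.
            return "load failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        for (String path : args) {
            System.out.println(path + ": " + tryLoad(path));
        }
    }
}
```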

@karllessard
Collaborator

Unfortunately, I don't have a Windows setup available. It would be very helpful if you could dump the list of dependencies of one of the dll/so files and share it with us; here are some suggestions on how to do that: https://stackoverflow.com/questions/7378959/how-to-check-for-dll-dependency

@lucaro
Author

lucaro commented Jan 7, 2022

Thanks, that was actually very insightful. I ran dumpbin on several of the .so files; the output was always the same:

Microsoft (R) COFF/PE Dumper Version 14.26.28806.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file _mst_ops.so

  Image has the following dependencies:

    _pywrap_tensorflow_internal.pyd
    MSVCP140.dll
    KERNEL32.dll
    VCRUNTIME140.dll
    VCRUNTIME140_1.dll
    api-ms-win-crt-runtime-l1-1-0.dll
    api-ms-win-crt-math-l1-1-0.dll
    api-ms-win-crt-heap-l1-1-0.dll

  Summary

        1000 .data
        1000 .pdata
        5000 .rdata
        1000 .reloc
        B000 .text

All of the DLLs depend on _pywrap_tensorflow_internal.pyd, which would normally be loaded by the Python environment. This also explains this somewhat related comment on Stack Overflow, which I had previously dismissed as not related enough. All the other listed dependencies also appear in the dependency list of jnitensorflow.dll, so they are probably fine. Since it wants a _pywrap_tensorflow_internal.pyd, I tried to give it one, without success. First I tried an empty DLL, renamed accordingly, but that had no effect; neither did copying over and renaming jnitensorflow.dll. When I tried the actual _pywrap_tensorflow_internal.pyd from the pip wheel (which is about 748MB in size, so not an ideal solution for a dependency that isn't actually needed), it would try to load something and then fail in a different way:

2022-01-07 08:30:51.597624: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-01-07 08:30:51.597643: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Exception in thread "main" org.tensorflow.exceptions.TensorFlowException: Cannot parse OpList protocol buffer
	at org.tensorflow.TensorFlow.libraryOpList(TensorFlow.java:102)
	at org.tensorflow.TensorFlow.loadLibrary(TensorFlow.java:76)
	...
Caused by: com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.
	at com.google.protobuf.InvalidProtocolBufferException.truncatedMessage(InvalidProtocolBufferException.java:84)
	at com.google.protobuf.CodedInputStream$UnsafeDirectNioDecoder.readRawByte(CodedInputStream.java:1950)
	at com.google.protobuf.CodedInputStream$UnsafeDirectNioDecoder.readRawVarint64SlowPath(CodedInputStream.java:1852)
	at com.google.protobuf.CodedInputStream$UnsafeDirectNioDecoder.readRawVarint32(CodedInputStream.java:1747)
	at com.google.protobuf.CodedInputStream$UnsafeDirectNioDecoder.readTag(CodedInputStream.java:1338)
	at org.tensorflow.proto.framework.OpList.<init>(OpList.java:52)
	at org.tensorflow.proto.framework.OpList.<init>(OpList.java:13)
	at org.tensorflow.proto.framework.OpList$1.parsePartialFrom(OpList.java:754)
	at org.tensorflow.proto.framework.OpList$1.parsePartialFrom(OpList.java:748)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:134)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:149)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
	at org.tensorflow.proto.framework.OpList.parseFrom(OpList.java:206)
	at org.tensorflow.TensorFlow.libraryOpList(TensorFlow.java:100)
	... 4 more

If I put a try{ ... }catch(Throwable t){} around the load instruction, just to prevent the whole thing from crashing, I end up with the same Exception in thread "main" org.tensorflow.exceptions.TensorFlowException: Op type not registered 'SentencepieceOp' in binary again.

So the problem isn't in the java binding for TensorFlow but rather in the way TensorFlow-Text is compiled for Python on Windows. Interestingly enough, the .so files for Linux don't have any Python-specific dependencies:

 >ldd _mst_ops.so
        linux-vdso.so.1 (0x00007ffff952c000)
        libtensorflow_framework.so.2 => not found
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc101630000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc101610000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc1015ed000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc1013f0000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc1012a1000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fc101a4e000)

Any ideas on how to proceed?

@saudet
Contributor

saudet commented Jan 7, 2022

@lucaro It is possible to link the binaries distributed for Python with the JNI wrappers, see issue #226 (comment). On Windows, since I don't think we can hack this with symbolic links, we probably need to build from source after patching the presets here:
https://github.com/tensorflow/java/blob/master/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/internal/c_api/presets/tensorflow.java#L68

@lucaro
Author

lucaro commented Jan 7, 2022

I'm not sure I understand what you mean. When I try to load _pywrap_tensorflow_internal.pyd with TensorFlow.loadLibrary() I get the following:

2022-01-07 10:51:46.740762: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-01-07 10:51:46.740789: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Exception in thread "main" org.tensorflow.exceptions.TensorFlowException: Cannot parse OpList protocol buffer
	at org.tensorflow.TensorFlow.libraryOpList(TensorFlow.java:102)
	at org.tensorflow.TensorFlow.loadLibrary(TensorFlow.java:76)
	at org.vitrivr.cineast.standalone.Playground.main(Playground.java:70)
Caused by: com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type.
	at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:111)
	at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:557)
	at com.google.protobuf.GeneratedMessageV3.parseUnknownField(GeneratedMessageV3.java:320)
	at org.tensorflow.proto.framework.OpList.<init>(OpList.java:67)
	at org.tensorflow.proto.framework.OpList.<init>(OpList.java:13)
	at org.tensorflow.proto.framework.OpList$1.parsePartialFrom(OpList.java:754)
	at org.tensorflow.proto.framework.OpList$1.parsePartialFrom(OpList.java:748)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:134)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:149)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
	at org.tensorflow.proto.framework.OpList.parseFrom(OpList.java:206)
	at org.tensorflow.TensorFlow.libraryOpList(TensorFlow.java:100)
	... 2 more

It certainly does something, as indicated by the warning about missing CUDA, which does not appear otherwise, but it crashes afterward. Does this go in the direction of what you had in mind?

@saudet
Contributor

saudet commented Jan 7, 2022

It doesn't work because that way it tries to load both tensorflow_cc.dll and _pywrap_tensorflow_internal.pyd. We can't do that. We can only load a single version of TF Core in the same process.

@lucaro
Author

lucaro commented Jan 7, 2022

Is there a way I can prevent tensorflow_cc.dll from being loaded?

@saudet
Contributor

saudet commented Jan 7, 2022

Yes, like I'm saying for the third time now, by linking with _pywrap_tensorflow_internal.pyd. How many more times do you need to be told? :)

@lucaro
Author

lucaro commented Jan 7, 2022

I'm sorry for missing the point here. This would require changing the line you indicated and then rebuilding TensorFlow Java, right? That's not something I could do quickly, since I don't have the build environment set up, having so far only used the Maven artifacts you so graciously provide. I was hoping there was a way to make such modifications for testing purposes without having to rebuild such complex dependencies. Any guidance on how to check this most efficiently would be greatly appreciated.

@karllessard
Collaborator

Thanks for the dependency dump, @lucaro. Yes, it looks like _pywrap_tensorflow_internal.pyd is playing its dark magic on Windows.

Like @saudet suggested (supposedly many times ;) ), we can try building artifacts that link directly against this binary instead of the TensorFlow build we distribute. If that works, we could distribute the result as a new platform for TensorFlow Java (e.g. windows-py) to let people use it, at the cost of a (useless) additional dependency on CPython.
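If such a platform ever existed, consumers would presumably select it like any other native classifier. This POM fragment is purely hypothetical; the classifier name and version are made up for illustration and match no published artifact:

```xml
<!-- Hypothetical: no such artifact exists today; classifier and version are illustrative only -->
<dependency>
  <groupId>org.tensorflow</groupId>
  <artifactId>tensorflow-core-api</artifactId>
  <version>0.4.0</version>
  <classifier>windows-x86_64-py</classifier>
</dependency>
```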

I understand you are not set up to build it, so we can probably create an additional setup in our CI build to test it out; but if you are eager to try it now, this page has more guidance on how to prepare your environment: https://github.com/tensorflow/java/blob/master/CONTRIBUTING.md

Please let me know if you plan to try it, thanks!

@lucaro
Author

lucaro commented Jan 7, 2022

Hey @karllessard, thanks a lot for this clarification. Having TensorFlow-Text working on Linux is sufficient for my purposes for the time being, and I don't currently have the capacity to get into the whole build process on Windows. I might tinker with it if I find some time, but I don't know when that will be. I do think @saudet's idea of linking against the Python binary is great, though, and if you ever add such a build to your CI, I'd be happy to help with beta testing if that would be useful.

@saudet
Contributor

saudet commented Jan 8, 2022

@lucaro Yes, it requires rebuilding from source, but since this repository uses GitHub Actions to perform the builds, anyone can very easily create a fork to perform their own custom builds.

@karllessard One thing we could do is rename tensorflow_cc.dll to _pywrap_tensorflow_internal.pyd and link the JNI wrappers against that. This would let users link with the Python binaries very easily, similar to how we can do it for the JavaCPP Presets for PyTorch and the "official LibTorch distribution", or the binaries that come with the Python modules for that matter, since PyTorch doesn't play games like that with DLL names:
https://github.com/bytedeco/javacpp-presets/tree/master/pytorch#documentation

@Craigacp
Collaborator

Craigacp commented Jan 8, 2022

Given that we don't expose the same set of symbols as the pywrap file does, won't renaming it cause more trouble in the long run when people wonder why there are two completely different files with the same name?

@saudet
Contributor

saudet commented Jan 8, 2022

Well, if you want to try to convince the TF Core maintainers that they should give better, more consistent names to their libraries, be my guest :) Like I said, PyTorch does this right; there's no reason TF couldn't either.

@karllessard
Collaborator

At this point, I'm fine with @saudet's suggestion of simply renaming the DLL to match Python's. It's hacky for sure, but it's better than not supporting op libraries like tensorflow-text on Windows.

I do like the idea, though, of allowing users to link directly against the binaries distributed in the Python wheels. I wonder if we can give the user that option, using, say, an environment variable (or maybe it already works by setting java.library.path)? But again, that would require the names of our native artifacts to match Python's.

@saudet
Contributor

saudet commented Jan 10, 2022

When the names of the libraries match, it works with the binaries from the wheels, yes, see issue #226 (comment).
