It also means our exception handling is poor as exceptions escaping the context are quietly swallowed. #221
Just so we don't forget it: I am being bitten hard by this now, because this type of error makes the code hang and no exception is printed. If I run it in "embedded mode", the exception below is not visible, and the Python+JVM process hangs forever.

I was not sure whether the fix belongs here or in cljbridge. behrica/clojurebridge#2 has some code snippets to reproduce it.

@cnuernber Each time my code hangs, I wonder whether there was a Python exception that I just don't see.

@behrica can you provide a little more context?

It happens with any exception on the Python side when using embedded mode and loading a script. crash.clj:
(ns train
  (:require [libpython-clj2.python :refer [py.- py.] :as py]
            [libpython-clj2.python.ffi :as ffi]))
;; [libpython-clj2.require :refer [require-python]]
;; [libpython-clj2.python.gc]
(println "-------- manual-gil: " ffi/manual-gil)
(def locked (ffi/lock-gil))
(println :gil-locked)
(println :before-crash)
(py/import-module "not existing") ;; any error on the Python side, e.g. importing a non-existent module
(println "crash finished")
Loading the file in embedded mode, it "hangs" forever without printing any error.

This appears to me to be an issue with clojurebridge, not libpython-clj. Specifically, javabridge's static call pathway must be swallowing exceptions, or perhaps its make_method pathway.

It does not happen with normal Clojure/JVM exceptions; that's why I am wondering.

Hmm. In that case, something needs to check the Python exception state and rethrow.

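Python's C API reports failures through an error indicator rather than by unwinding into the caller, so a bridge has to test that indicator after each call and turn it into a JVM exception; otherwise the error is never seen. A minimal Java sketch of that check-and-rethrow step, assuming a hypothetical FakePythonRuntime class in place of the real FFI (in CPython the indicator is queried with PyErr_Occurred). The checkErrorThrow name only mirrors the role of libpython-clj's check-error-throw and is not its implementation:

```java
import java.util.Optional;

// Hypothetical stand-in for a native runtime that, like CPython, signals
// failure by setting an error indicator instead of throwing.
class FakePythonRuntime {
    private String pendingError; // null means "no error set"

    // Simulates a native call: returns -1 and records an error on failure.
    int runSimpleString(String code) {
        if (code.contains("!!")) {
            pendingError = "SyntaxError: invalid syntax";
            return -1;
        }
        return 0;
    }

    // Reads and clears the error indicator.
    Optional<String> fetchError() {
        Optional<String> err = Optional.ofNullable(pendingError);
        pendingError = null;
        return err;
    }
}

class ErrorCheck {
    // Converts a pending native error into a JVM exception so it cannot be ignored.
    static void checkErrorThrow(FakePythonRuntime rt) {
        Optional<String> err = rt.fetchError();
        if (err.isPresent()) {
            throw new RuntimeException(err.get());
        }
    }

    // Every bridged call is immediately followed by an error check.
    static int run(FakePythonRuntime rt, String code) {
        int result = rt.runSimpleString(code);
        checkErrorThrow(rt);
        return result;
    }
}
```

The important property is that the check runs unconditionally after the call, so a failure surfaces on the calling thread instead of lingering in the indicator.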
Using
So this code:

called like this:
export _JAVA_OPTIONS="-Dlibpython_clj.manual_gil=true"; python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash2.clj",mvn_local_repo="/home/carsten/.m2/repository")'
gives

and called like this:
export _JAVA_OPTIONS="-Dlibpython_clj.manual_gil=false"; python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash2.clj",mvn_local_repo="/home/carsten/.m2/repository")'
gives a Java exception (which I could probably catch):
We can probably agree that this is the expected behavior: a JVM exception, rather than a JVM crash or a hanging JVM.

And we get the correct behaviour in the REPL, in non-embedded mode.

We also get the correct behaviour in embedded mode from the REPL. It is "load_clojure_file" that causes the issue:
This

It is documented in the chapter "Loading and running a Clojure file in embedded mode".

The new entry point is here: I think I do the "same" as in the other entry point:

To make it more confusing, there is another pathway I made as well, which loads a fixed "user.clj" file. This one does print the exception.
(I propose to remove this later.) So let's ignore it for the moment; I created behrica/clojurebridge#5.

Let's concentrate on the "hanging" one, which is the worst from a user-experience point of view. crash.clj:
(ns train
  (:require [libpython-clj2.python :refer [py.- py.] :as py]
            [libpython-clj2.python.ffi :as ffi]))
;; [libpython-clj2.require :refer [require-python]]
;; [libpython-clj2.python.gc]
(println "-------- manual-gil: " ffi/manual-gil)
(def locked (ffi/lock-gil))
(println :gil-locked)
(println :before-crash)
(py/import-module "not existing") ;; any error on the Python side, e.g. importing a non-existent module
(println "crash finished")
Run as:
export _JAVA_OPTIONS="-Dlibpython_clj.manual_gil=false"; python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash.clj",mvn_local_repo="/home/carsten/.m2/repository")'
it results in the expected behaviour:
Exception in thread "Thread-0" Syntax error macroexpanding at (/home/carsten/Dropbox/sources/libpython-clj/crash.clj:5:1).
at clojure.lang.Compiler.load(Compiler.java:7665)
at clojure.lang.Compiler.loadFile(Compiler.java:7591)
at clojure.lang.RT$3.invoke(RT.java:327)
at clojure.lang.Var.invoke(Var.java:384)
Caused by: java.lang.Exception: ModuleNotFoundError: No module named 'not existing'

(The same hanging happens without calling lock-gil, and independently of the manual_gil setting.) So we get an even simpler hanging scenario:

You might ask why I play with manual_gil and GIL locking. It is by doing so that I now run into the issue we are discussing: if I get a Python exception in my code with manual GIL handling, it hangs. As a workaround, I now run without manual GIL (to check that things work on a small amount of data) and then switch to manual GIL for training on the full data, which is annoying. Especially because of the instability of

Even simpler code to reproduce it. Any exception thrown by Python results in "hanging":
(ns train
  (:require [libpython-clj2.python :refer [py.- py.] :as py]
            [libpython-clj2.python.ffi :as ffi]))
(def gil (ffi/lock-gil))
(py/run-simple-string "invalid syntax !!!")
(ffi/unlock-gil gil)
called like this:
export _JAVA_OPTIONS="-Dlibpython_clj.manual_gil=true"
python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash.clj",mvn_local_repo="/home/carsten/.m2/repository")'

Well, I stand corrected :-). That definitely appears to be a bug in libpython-clj. Perhaps the unlock-gil code needs check-error-throw. Hmm, that needs to happen after nearly every Python call; it was handled by with-gil, and it should still be happening.

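A with-gil style wrapper covers both concerns at once: take the lock, run the call, check the error state, and release the lock in a finally clause so an exception cannot leave it held. A Java sketch under stated assumptions: a ReentrantLock stands in for the GIL, a static string stands in for Python's error indicator, and WithGilSketch is a hypothetical name, not libpython-clj's with-gil:

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical with-gil style wrapper; names are illustrative only.
class WithGilSketch {
    static final ReentrantLock GIL = new ReentrantLock(); // stand-in for the GIL
    static String pendingError; // stand-in for the error indicator; only touched while GIL is held

    static <T> T withGil(Supplier<T> pythonCall) {
        GIL.lock();
        try {
            T result = pythonCall.get();
            // Check (and clear) the error state right after the call.
            String err = pendingError;
            pendingError = null;
            if (err != null) {
                throw new RuntimeException(err);
            }
            return result;
        } finally {
            GIL.unlock(); // released whether the call succeeded or threw
        }
    }
}
```

Keeping the error check inside the locked region matters: the indicator belongs to the thread holding the lock, so it is read before anyone else can run.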
I found a way to at least print the error, but it still hangs:
diff --git a/src/libpython_clj2/python/ffi.clj b/src/libpython_clj2/python/ffi.clj
index 773d1f9..ec46e55 100644
--- a/src/libpython_clj2/python/ffi.clj
+++ b/src/libpython_clj2/python/ffi.clj
@@ -706,7 +706,8 @@ Each call must be matched with PyGILState_Release"}
(defn check-error-throw
[]
(when-let [error-str (check-error-str)]
- (throw (Exception. ^String error-str))))
+ (do (println :error error-str)
+ (throw (Exception. ^String error-str)))))
It is also solved by unlocking the GIL in a finally block:
(try
  (py/run-simple-string "1/0")
  (finally
    (ffi/unlock-gil gil)))
I had already noticed before that in embedded mode, when loading a clj file,
So the "hanging" we see may be the following. The exception:

So maybe it is simpler to find and fix it in the following case, where the VM does not terminate:
(ns train
  (:require [libpython-clj2.python :refer [py.- py.] :as py]
            [libpython-clj2.python.ffi :as ffi]))
(def gil (ffi/lock-gil))
(py/run-simple-string "print('ok')")

I saw that in the hanging state some thread seems to wait forever:
Name: tech.resource.gc ref thread
State: TIMED_WAITING on java.lang.ref.ReferenceQueue$Lock@28f1bb74
Total blocked: 1 Total waited: 87,568
Stack trace:
java.base@11.0.16.1/java.lang.Object.wait(Native Method)
java.base@11.0.16.1/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
tech.v3.resource.gc$watch_reference_queue$fn__10795.invoke(gc.clj:27)
tech.v3.resource.gc$watch_reference_queue.invokeStatic(gc.clj:23)
tech.v3.resource.gc$watch_reference_queue.invoke(gc.clj:21)
tech.v3.resource.gc$start_reference_thread$fn__10807.invoke(gc.clj:47)
app//clojure.lang.AFn.run(AFn.java:22)
java.base@11.0.16.1/java.lang.Thread.run(Thread.java:829)
Not sure if this is relevant.

The Java API has a try-with-resources pattern:
(with-open [glock (java_api/GILLocker)]
  ...)
That will automatically do the try/finally. Anything AutoCloseable will be closed by a with-open form.

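with-open is Clojure's counterpart to Java's try-with-resources: close() runs on every exit from the block, including exceptional ones. A sketch of that shape, assuming a hypothetical GilLockerSketch class (not the real java_api GILLocker) that locks a ReentrantLock standing in for the GIL on construction and unlocks it in close():

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical AutoCloseable locker; not libpython-clj's java_api GILLocker.
class GilLockerSketch implements AutoCloseable {
    // Stand-in for the GIL, shared by all lockers.
    static final ReentrantLock GIL = new ReentrantLock();

    GilLockerSketch() {
        GIL.lock(); // acquire when the resource is opened
    }

    @Override
    public void close() {
        GIL.unlock(); // release on every exit from the try block
    }
}

class TryWithResourcesDemo {
    // Runs a failing "Python call" while holding the lock.
    static String runFailingCall() {
        try (GilLockerSketch locker = new GilLockerSketch()) {
            throw new IllegalStateException("SyntaxError: invalid syntax");
        } catch (IllegalStateException e) {
            // By the time this handler runs, close() has already released the lock.
            return e.getMessage();
        }
    }
}
```

Acquiring in the constructor and releasing in close() ties the lock's lifetime to the block's scope, which is what makes a forgotten unlock impossible.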
There is a new API in the latest release (2.020), with-manual-gil, which will automatically lock and unlock the GIL:
user> ffi/manual-gil
true
user> (py/with-manual-gil
        (py/run-simple-string "!! syntax error"))
Execution error at libpython-clj2.python.ffi/check-error-throw (ffi.clj:708).
File "<string>", line 1
!! syntax error
^
SyntaxError: invalid syntax
user>

Outputs:

Originally posted by @cnuernber in #219 (comment)