Skip to content

It also means our exception handling is poor as exceptions escaping the context are quietly swallowed. #221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
behrica opened this issue Sep 27, 2022 · 30 comments

Comments

@behrica
Copy link
Contributor

behrica commented Sep 27, 2022

    It also means our exception handling is poor as exceptions escaping the context are quietly swallowed.

Originally posted by @cnuernber in #219 (comment)

@behrica
Copy link
Contributor Author

behrica commented Sep 27, 2022

Just to not forget it.

I am bitten hardly by this now, as this type of error make the code hang, and no exception is printed.
It can be easily reproduced y running in embed a script which tried to load a "non existing python module".

If I run it in "embedded mode", the below exception is not visible, and the python+JVM hangs for ever.
I do see it, when running the same code in "non-embedded".

{:clojure.main/message
 "Syntax error compiling at (src/crash.clj:12:1).\nModuleNotFoundError: No module named 'not existing'\n\n",
 :clojure.main/triage
 {:clojure.error/phase :compile-syntax-check,
  :clojure.error/line 12,
  :clojure.error/column 1,
  :clojure.error/source "crash.clj",
  :clojure.error/path "src/crash.clj",
  :clojure.error/class java.lang.Exception,
  :clojure.error/cause
  "ModuleNotFoundError: No module named 'not existing'\n"},
 :clojure.main/trace
 {:via
  [{:type clojure.lang.Compiler$CompilerException,
    :message
    "Syntax error compiling at (/home/carsten/Dropbox/sources/predict_disaster_tweet/src/crash.clj:12:1).",
    :data
    {:clojure.error/phase :compile-syntax-check,
     :clojure.error/line 12,
     :clojure.error/column 1,
     :clojure.error/source
     "/home/carsten/Dropbox/sources/predict_disaster_tweet/src/crash.clj"},
    :at [clojure.lang.Compiler load "Compiler.java" 7652]}
   {:type java.lang.Exception,
    :message "ModuleNotFoundError: No module named 'not existing'\n",
    :at
    [libpython_clj2.python.ffi$check_error_throw
     invokeStatic
     "ffi.clj"
     708]}],
  :trace
  [[libpython_clj2.python.ffi$check_error_throw
    invokeStatic
    "ffi.clj"
    708]
   [libpython_clj2.python.ffi$check_error_throw invoke "ffi.clj" 706]
   [libpython_clj2.python$import_module invokeStatic "python.clj" 165]
   [libpython_clj2.python$import_module invoke "python.clj" 158]
   [train$eval14176 invokeStatic "crash.clj" 12]
   [train$eval14176 invoke "crash.clj" 12]
   [clojure.lang.Compiler eval "Compiler.java" 7181]
   [clojure.lang.Compiler load "Compiler.java" 7640]
   [clojure.lang.Compiler loadFile "Compiler.java" 7578]
   [clojure.main$load_script invokeStatic "main.clj" 475]
   [clojure.main$script_opt invokeStatic "main.clj" 535]
   [clojure.main$script_opt invoke "main.clj" 530]
   [clojure.main$main invokeStatic "main.clj" 664]
   [clojure.main$main doInvoke "main.clj" 616]
   [clojure.lang.RestFn applyTo "RestFn.java" 137]
   [clojure.lang.Var applyTo "Var.java" 705]
   [clojure.main main "main.java" 40]],
  :cause "ModuleNotFoundError: No module named 'not existing'\n",
  :phase :compile-syntax-check}}

@behrica
Copy link
Contributor Author

behrica commented Sep 27, 2022

I was not sure, if the fix would be here or in cljbridge,
so I opened this before:

behrica/clojurebridge#2

@behrica
Copy link
Contributor Author

behrica commented Sep 27, 2022

behrica/clojurebridge#2 has some code snippets to reproduce it

@behrica
Copy link
Contributor Author

behrica commented Sep 29, 2022

@cnuernber
Could you have a look at this ?
I am not able to dig myself, too much native and macro magic for me.

Each time my code hangs, I am always wondering if there was a python exception, which I just don't see.

@jjtolton
Copy link
Contributor

@behrica can you provide a little more context? "ModuleNotFoundError: No module named 'not existing' is a Python error, but 'not existing' is a very unusual module name. I feel like I would need more information to pinpoint why the error is not printing to the console, because I don't think I'd be able to reproduce this.

@behrica
Copy link
Contributor Author

behrica commented Sep 29, 2022

It happnes with "any exception" on python side, if using embedded mode and loading a script:

crash.clj;

(ns train
  (:require [libpython-clj2.python :refer [py.- py.] :as py]
            [libpython-clj2.python.ffi :as ffi]))
            ;; [libpython-clj2.require :refer [require-python]]
            ;; [libpython-clj2.python.gc]
(println "-------- manual-gil: " ffi/manual-gil)

(def locked (ffi/lock-gil))
(println :gil-locked)

(println :before-crash)
(py/import-module "not existing")  ;; whatever error on python side, like loading an non existing library
(println "crash fininished")

Load the file in embedded:

python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash.clj",mvn_local_repo="/home/carsten/.m2/repository")'

@behrica
Copy link
Contributor Author

behrica commented Sep 29, 2022

It "hangs" forever without printing any error

@cnuernber
Copy link
Collaborator

cnuernber commented Sep 29, 2022

This appears to me to be an issue with clojurebridge, not libpython-clj. Specifically javabridge's static call pathway must swallow exceptions.

Or I guess their make_method pathway.

@behrica
Copy link
Contributor Author

behrica commented Sep 29, 2022

It does not happen with normal Clojure / JVM exceptions, that why I am wondering.

@cnuernber
Copy link
Collaborator

Hmm. In that case something needs to check the python exception state and rethrow.

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

Using py/with-gil behaves "better" (and different depending on "manual gil or not")....

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

So this code:

(ns train
  (:require [libpython-clj2.python :refer [py.- py.] :as py]))


(py/with-gil
  (py/import-module "not existing"))

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

called like this:

export _JAVA_OPTIONS="-Dlibpython_clj.manual_gil=true";python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash2.clj",mvn_local_repo="/home/carsten/.m2/repository")'

gives

ARNING: Please consider reporting this to the maintainers of tech.v3.datatype.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f92b2d87d7a, pid=38488, tid=38488
#
# JRE version: OpenJDK Runtime Environment (11.0.16.1+1) (build 11.0.16.1+1)
# Java VM: OpenJDK 64-Bit Server VM (11.0.16.1+1, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [libpython3.10.so.1.0+0x187d7a]  PyImport_Import+0x3a
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /home/carsten/Dropbox/sources/libpython-clj/core.38488)
#
# An error report file with more information is saved as:
# /home/carsten/Dropbox/sources/libpython-clj/hs_err_pid38488.log

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

and called like this

export _JAVA_OPTIONS="-Dlibpython_clj.manual_gil=false";python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash2.clj",mvn_local_repo="/home/carsten/.m2/repository")'

gives an Java exception (which I could probably catch):

ARNING: All illegal access operations will be denied in a future release
Exception in thread "Thread-0" Syntax error macroexpanding at (/home/carsten/Dropbox/sources/libpython-clj/crash2.clj:5:1).
	at clojure.lang.Compiler.load(Compiler.java:7665)
	at clojure.lang.Compiler.loadFile(Compiler.java:7591)
	at clojure.lang.RT$3.invoke(RT.java:327)
	at clojure.lang.Var.invoke(Var.java:384)
Caused by: java.lang.Exception: ModuleNotFoundError: No module named 'not existing'

	at libpython_clj2.python.ffi$check_error_throw.invokeStatic(ffi.clj:708)
	at libpython_clj2.python.ffi$check_error_throw.invoke(ffi.clj:706)
	at libpython_clj2.python$import_module.invokeStatic(python.clj:165)
	at libpython_clj2.python$import_module.invoke(python.clj:158)
	at train$eval15837.invokeStatic(crash2.clj:6)
	at train$eval15837.invoke(crash2.clj:5)
	at clojure.lang.Compiler.eval(Compiler.java:7194)
	at clojure.lang.Compiler.load(Compiler.java:7653)
	... 3 more
Sep 30, 2022 10:30:39 AM clojure.tools.logging$eval3239$fn__3242 invoke
INFO: Reference thread starting
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/carsten/.local/lib/python3.10/site-packages/clojurebridge/cljbridge.py", line 187, in load_clojure_file
    resolve_call_fn("clojure.core/load-file",
  File "/home/carsten/.local/lib/python3.10/site-packages/clojurebridge/cljbridge.py", line 73, in resolve_call_fn
    return resolve_fn(namespaced_fn_name)(*args)
  File "/home/carsten/.local/lib/python3.10/site-packages/clojurebridge/cljbridge.py", line 50, in __call__
    return invoker(*args)
  File "/home/carsten/.local/lib/python3.10/site-packages/javabridge/jutil.py", line 961, in method
    result = call(self.o, name, sig, *args)
  File "/home/carsten/.local/lib/python3.10/site-packages/javabridge/jutil.py", line 892, in call
    result = fn(*nice_args)
  File "/home/carsten/.local/lib/python3.10/site-packages/javabridge/jutil.py", line 859, in fn
    raise JavaException(x)
javabridge.jutil.JavaException: Syntax error macroexpanding at (/home/carsten/Dropbox/sources/libpython-clj/crash2.clj:5:1).

We can probably agree, that this is the expected behavior.

A JVM exception and neither "jvm crash" nor "jvm hanging".

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

And we have the correct behaviour in the REPL, in "Non-embedded"

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

We have as well correct behaviour "in embedded mode from the REPL".

The "load_clojure_file" makes the issue:

python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash2.clj",mvn_local_repo="/home/carsten/.m2/repositor

This load_clojure_file pathway was added by me to clojurebridge, and it was therefore never tested by you, I suppose.

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

It is documented here:
https://clj-python.github.io/libpython-clj/embedded.html

chapter: Loading and running a Clojure file in embedded mode

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

To make it more confusing, there is an other pathway I made as well, which loads a fixed "user.clj" file.

This prints as well teh exception.....

python -c 'from clojurebridge import cljbridge;cljbridge.init_jvm(load_user_clj=True,mvn_local_repo="/home/carsten/.m2/repository")'

(I propose to later remove this load_user_clj pathway, as it is superseded by the load_clojure_file)

So lets ignore this for the moment,

I created: behrica/clojurebridge#5

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

Let's concentrated on the "hanging" one, which is worst from a user experience point of view:

crash.clj

(ns train
  (:require [libpython-clj2.python :refer [py.- py.] :as py]
            [libpython-clj2.python.ffi :as ffi]))
            ;; [libpython-clj2.require :refer [require-python]]
            ;; [libpython-clj2.python.gc]
(println "-------- manual-gil: " ffi/manual-gil)

(def locked (ffi/lock-gil))
(println :gil-locked)

(println :before-crash)
(py/import-module "not existing")  ;; whatever error on python side, like loading an non existing library
(println "crash finished")

run as

export _JAVA_OPTIONS="-Dlibpython_clj.manual_gil=false";python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash.clj",mvn_local_repo="/home/carsten/.m2/repository")'

results in
hanging for ever

expected behaviour
A nice JVM exception:

Exception in thread "Thread-0" Syntax error macroexpanding at (/home/carsten/Dropbox/sources/libpython-clj/crash.clj:5:1).
	at clojure.lang.Compiler.load(Compiler.java:7665)
	at clojure.lang.Compiler.loadFile(Compiler.java:7591)
	at clojure.lang.RT$3.invoke(RT.java:327)
	at clojure.lang.Var.invoke(Var.java:384)
Caused by: java.lang.Exception: ModuleNotFoundError: No module named 'not existing'

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

(same hanning happens without calling "lock/gil")

and independent of the "manual_gil") setting, so we get an even easier hanging szenario:

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

You might ask, why do I play with "manul_gil" and GIL locking.
Because this makes python library simpletransformers work more reliable (proven by repeated executions and counting training sucecsses)

But by doing so, I experience now the issue we discuss. If I get an python exception in my code with "manuel GIL handling", it hangs.

As a "workaround" I now run it without manual GIL (to see if working on little data) and then switch to "with manual GIL" for train on full data, which is annoying to do.

Especially because the instability of simpletransformers might result in "hanging" as well (or JVM crash)

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

Even simpler code t reproduce:

Any exception thronw by pythonresult in "hanging"

(ns train
  (:require [libpython-clj2.python :refer [py.- py.] :as py]
            [libpython-clj2.python.ffi :as ffi]))


(def gil  (ffi/lock-gil))
(py/run-simple-string "invalid syntax !!!")
(ffi/unlock-gil gil)

clled like this:

export _JAVA_OPTIONS="-Dlibpython_clj.manual_gil=true"
python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash.clj",mvn_local_repo="/home/carsten/.m2/repository")'

@cnuernber
Copy link
Collaborator

cnuernber commented Sep 30, 2022

Well, I stand corrected :-). That definitely appears to be a bug in libpython-clj.

Perhaps the unlock-gil code needs check-error-throw. Hmm - that needs to happen nearly after every python call - it was handled by with-gil and it should still be happening.

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

I found a way to at least print the error, but it still hang

diff --git a/src/libpython_clj2/python/ffi.clj b/src/libpython_clj2/python/ffi.clj
index 773d1f9..ec46e55 100644
--- a/src/libpython_clj2/python/ffi.clj
+++ b/src/libpython_clj2/python/ffi.clj
@@ -706,7 +706,8 @@ Each call must be matched with PyGILState_Release"}
 (defn check-error-throw
   []
   (when-let [error-str (check-error-str)]
-    (throw (Exception. ^String error-str))))
+    (do (println :error error-str)
+        (throw (Exception. ^String error-str)))))
 

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

It gets as well solved by unlocking the gil in a finally

(try
  (py/run-simple-string "1/0")
  (finally
    (ffi/unlock-gil gil)))

I noticed already before, that in embedded mode and loading a clj file,
the JVM does not terminate without the unlock

So the "hanging" we see is maybe the following:

The exception:

  1. is thrown
  2. not printed for whatever reason
  3. gil/unlock is not called
  4. code finishes
    5, JVM hangs

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

so maybe it is simpler to find out and fix that in the following the VM does not terminate:

(ns train
  (:require [libpython-clj2.python :refer [py.- py.] :as py]
            [libpython-clj2.python.ffi :as ffi]))


(def gil  (ffi/lock-gil))

(py/run-simple-string "print('ok')")

@behrica
Copy link
Contributor Author

behrica commented Sep 30, 2022

I saw that in the hanging state some thread seems to wait forever:

Name: tech.resource.gc ref thread
State: TIMED_WAITING on java.lang.ref.ReferenceQueue$Lock@28f1bb74
Total blocked: 1  Total waited: 87,568

Stack trace: 
java.base@11.0.16.1/java.lang.Object.wait(Native Method)
java.base@11.0.16.1/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
tech.v3.resource.gc$watch_reference_queue$fn__10795.invoke(gc.clj:27)
tech.v3.resource.gc$watch_reference_queue.invokeStatic(gc.clj:23)
tech.v3.resource.gc$watch_reference_queue.invoke(gc.clj:21)
tech.v3.resource.gc$start_reference_thread$fn__10807.invoke(gc.clj:47)
app//clojure.lang.AFn.run(AFn.java:22)
java.base@11.0.16.1/java.lang.Thread.run(Thread.java:829)

not sure, if relevant

@cnuernber
Copy link
Collaborator

cnuernber commented Sep 30, 2022

The java api has a try-with-resources pattern -

(with-open [glock (java_api/GILLocker)]
  ...)

That will automatically do the try-finally. Anything auto-closeable will be closed in a with-open macro.

@cnuernber
Copy link
Collaborator

There is a new API in latest (2.020) - with-manual-gil that will automatically lock/unlock the gil.

user> ffi/manual-gil
true
user> (py/with-manual-gil
        (py/run-simple-string "!! syntax error"))
Execution error at libpython-clj2.python.ffi/check-error-throw (ffi.clj:708).
  File "<string>", line 1
    !! syntax error
    ^
SyntaxError: invalid syntax
user> 

Outputs:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants