OpenBLAS has issues solving triangular matrices on win32 #1270


Closed
ghost opened this issue Aug 7, 2017 · 74 comments

Comments

@ghost

ghost commented Aug 7, 2017

Using the 32-bit OpenBLAS causes timeouts while running the scipy tests:

@ghost
Author

ghost commented Aug 7, 2017

@martin-frbg
Collaborator

Do I read that correctly: you are using a binary distribution of 0.2.19 from some third-party site (which could be one of the sourceforge mirror sites, but then again could be something else)? Could you retest with the current "develop" snapshot, which has a bunch of fixes vs. 0.2.19 (or at least build 0.2.19 yourself on your platform to see if it passes the built-in tests)? In any case I guess a minimal example run on a well-defined system would be preferable to a big unedited dump from an appveyor vm.

@ghost
Author

ghost commented Aug 7, 2017

That was built 12 days ago: scipy/scipy#7616 (comment)

@ghost
Author

ghost commented Aug 7, 2017

In any case I guess a minimal example run on a well-defined system would be preferable to a big unedited dump from an appveyor vm.

Is there any way to turn on debug output?

@ghost
Author

ghost commented Aug 7, 2017

Excuse me; the URL was apparently not correct. Retrying with 0.2.20 now.

@martin-frbg
Collaborator

Not sure if 0.2.20 will build in your environment (depending on how and where you build; a cross-compile on linux like the binaries provided on sourceforge will probably go okay. The issue is a botched commit that breaks things on non-glibc systems; waiting for xianyi to release 0.2.21), which is why I suggested taking a snapshot of "develop" (easily done by clicking on the green "Clone or download" button on the "Code" tab here). A normal build from source should run a few tests at the end automatically (though as a Linux guy I am not completely sure it does this on Windows as well).

@martin-frbg
Collaborator

Thanks for the link to the scipy issue. From reading that, it seems your tests were (at least at some point in the history of that PR) already done with 0.2.20 (?)
Do you still believe the problematic behaviour is with potrf specifically ?

@ghost
Author

ghost commented Aug 7, 2017

Thanks for the link to the scipy issue. From reading that, it seems your tests were (at least at some point in the history of that PR) already done with 0.2.20 (?)

That's what I thought, but HEAD had 0.2.19. I updated to 0.2.20 just now.

Do you still believe the problematic behaviour is with potrf specifically ?

That is my suspicion; @carlkl used a completely different build method and still had issues with OpenBLAS on win32 but not win_amd64. Of course it would be helpful to get more debug output in order to understand what is happening.
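
Since the suspicion centers on potrf (the Cholesky factorization), one way to isolate it from the scipy test machinery is to compare the OpenBLAS result against a tiny pure-Python reference of what dpotrf computes. This is only an illustrative sketch; the 2x2 matrix below is an arbitrary SPD example, not data from the failing test:

```python
import math

def cholesky_lower(a):
    """Reference unblocked Cholesky: returns L with A = L * L^T.

    `a` is a symmetric positive definite matrix given as a list of rows.
    This mirrors what LAPACK's dpotrf computes with uplo='L'.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the squares of the entries to its left.
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite at column %d" % j)
        L[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

if __name__ == "__main__":
    # A = [[4,2],[2,3]] factors as L = [[2,0],[1,sqrt(2)]].
    A = [[4.0, 2.0], [2.0, 3.0]]
    L = cholesky_lower(A)
    print(L)
```

A failing OpenBLAS dpotrf on the same input should disagree with this reference (or return a nonzero info), which would pin the blame on the routine rather than on scipy.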

@ghost
Author

ghost commented Aug 7, 2017

That is my suspicion; @carlkl used a completely different build method

Specifically he used cross compiling with gcc rather than building with MSVC.

@martin-frbg
Collaborator

Cross compiling with gcc will currently get you the hand-optimized assembly functions (which use inline assembly and AT&T notation that MSVC does not support), building with MSVC will use generic C implementations instead (which are possibly still a bit faster than the netlib reference implementation).
Latest commits to "develop" allow compiling with clang, which supports the optimized assembly (but will need a windows flang compiler for completeness).

@brada4
Contributor

brada4 commented Aug 7, 2017

Specifically what lapack is imported by arpack?

@ghost
Author

ghost commented Aug 7, 2017

Specifically what lapack is imported by arpack?

Lapack and BLAS both come from OpenBLAS.

@ghost
Author

ghost commented Aug 7, 2017

https://ci.appveyor.com/project/scipy/scipy/build/1.0.44/job/wix716oq76ft44r4

Looks like we still have issues:

  File "C:\Python36\lib\site-packages\scipy\optimize\tests\test_trustregion.py", line 76, in test_solver_concordance
    options={'return_all': True})
  File "C:\Python36\lib\site-packages\scipy\optimize\_minimize.py", line 486, in minimize
    callback=callback, **options)
  File "C:\Python36\lib\site-packages\scipy\optimize\_trustregion_exact.py", line 43, in _minimize_trustregion_exact
    **trust_region_options)
  File "C:\Python36\lib\site-packages\scipy\optimize\_trustregion.py", line 184, in _minimize_trust_region
    p, hits_boundary = m.solve(trust_radius)
  File "C:\Python36\lib\site-packages\scipy\optimize\_trustregion_exact.py", line 418, in solve
    delta, v = singular_leading_submatrix(H, U, info)
  File "C:\Python36\lib\site-packages\scipy\optimize\_trustregion_exact.py", line 185, in singular_leading_submatrix
    return delta, v
+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
[gw3] node down: Not properly terminated
fReplacing crashed slave gw3
...................F....F..........x................................................

It's not too difficult to reproduce: just download and install 32-bit Python, install the wheel from appveyor, and run the test. I can also instrument this if you want, just tell me what tooling I need to use.

@ghost
Author

ghost commented Aug 7, 2017

(but will need a windows flang compiler for completeness)

Yeah, I actually opened an issue on flang for that; if anyone has time, I would be greatly appreciative.

@brada4
Contributor

brada4 commented Aug 7, 2017

Sure, and your binary numpy too, ten times.
Some backtrace would be nice....

@ghost
Author

ghost commented Aug 7, 2017

Some backtrace would be nice....
I can also instrument this if you want, just tell me what tooling I need to use.

Sorry but I am not so familiar with debugging OpenBLAS. I am happy to provide a backtrace but I need assistance here.

@ghost
Author

ghost commented Aug 7, 2017

The problem is that I don't know how to debug an MSVC lib that has called into a gcc lib.

@ghost
Author

ghost commented Aug 7, 2017

@matthew-brett Perhaps you would be better at obtaining the backtrace?

@matthew-brett
Contributor

The OpenBLAS build is with a slightly aged Mingw variant over at https://anaconda.org/carlkl/mingwpy. See: https://mingwpy.github.io. The build runs on Windows (not a cross-compile).

I've just set off a build from OpenBLAS master, in case that's useful:

@matthew-brett
Contributor

@xoviat - sadly I'm not experienced with this either.

@brada4
Contributor

brada4 commented Aug 7, 2017

Excuse me? You pull binary numpy linked against unknown BLAS and say that specifically OpenBLAS is at fault. Without backtrace that sadly goes nowhere.

@ghost
Author

ghost commented Aug 7, 2017

Numpy is not being called here. We are calling OpenBLAS directly.

@ghost
Author

ghost commented Aug 7, 2017

Numpy is only used for numpy.distutils.

@ghost
Author

ghost commented Aug 7, 2017

Without backtrace that sadly goes nowhere.

I agree that this is impossible to solve without a backtrace. How can I obtain one?

@brada4
Contributor

brada4 commented Aug 7, 2017

Usually you get a debugger next to the compiler.

@ghost
Author

ghost commented Aug 7, 2017

My guess is that this is going to be undebuggable:

MSVC EXE (python) ==> MSVC DLL (*.pyd) ==> gcc DLL (OpenBLAS)

@matthew-brett
Contributor

Just to check we're on the same page:

  • We're reporting some crashes, and some numerical test failures with the Scipy tests when linking Scipy Fortran code against OpenBLAS.
  • OpenBLAS is being built with a reproducible build on Appveyor using (a version of) gcc mingw-w64, linking against MSVCRT - links to the build above;
  • Numpy is being built with another reproducible build on Appveyor, using the OpenBLAS from the step above, using MSVC as a C compiler;
  • Scipy is being built with a reproducible build on Appveyor, using the OpenBLAS above, using MSVC as a C compiler, and mingw-w64 gfortran as a Fortran compiler.

Using this same rig, all running on 64-bit, we get no test failures or crashes.

@brada4
Contributor

brada4 commented Aug 8, 2017

Can you break into the frozen process with a debugger? Is it reproducible?

@martin-frbg
Collaborator

I think what would be ideal is a fortran test case linked against OpenBLAS with no numpy/scipy involved at all. This should also eliminate any risk of a conflict between mingw and msvc runtime libraries.
(Or I guess just the input to potrf would suffice for building such a test case, but wading through a big appveyor log and scipy issue thread is a bit strenuous if done after regular work hours.) Also, do you have any indication what OpenBLAS thinks it is running on (CPU type) in the failing appveyor tests? (Maybe it's in there somewhere but I missed it.) 32bit windows probably sees less testing nowadays than the other platforms. Any chance to run the test on some local win32 system rather than an appveyor virtual-whatever?
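
Short of a Fortran test case, dpotrf could also be exercised straight from the OpenBLAS DLL via ctypes, with no numpy/scipy in the chain at all. This is a sketch under assumptions: the DLL name/path is a placeholder (set `OPENBLAS_DLL` to the DLL shipped in the wheel), and the hidden Fortran string-length argument is omitted, which is usually harmless for single-character arguments:

```python
import ctypes
import os

def to_column_major(a):
    """Flatten a list-of-rows square matrix into column-major (Fortran) order."""
    n = len(a)
    return [a[i][j] for j in range(n) for i in range(n)]

def dpotrf(dll_path, a, uplo=b"L"):
    """Call LAPACK dpotrf from an OpenBLAS DLL; returns (factor, info).

    The factor comes back column-major, overwritten in place as LAPACK does.
    """
    n = len(a)
    blas = ctypes.CDLL(dll_path)
    data = (ctypes.c_double * (n * n))(*to_column_major(a))
    info = ctypes.c_int(0)
    # Fortran symbol: dpotrf_(uplo, n, a, lda, info), all passed by reference.
    # Note: the trailing hidden character-length argument is left out here.
    blas.dpotrf_(ctypes.c_char_p(uplo),
                 ctypes.byref(ctypes.c_int(n)),
                 data,
                 ctypes.byref(ctypes.c_int(n)),
                 ctypes.byref(info))
    return list(data), info.value

if __name__ == "__main__":
    # Placeholder path -- point it at the DLL under test.
    path = os.environ.get("OPENBLAS_DLL", "libopenblas.dll")
    if os.path.exists(path):
        factor, info = dpotrf(path, [[4.0, 2.0], [2.0, 3.0]])
        print("info =", info, "factor =", factor)
```

If this minimal call already misbehaves on win32 (wrong factor or nonzero info), the mingw/msvc runtime-mixing theory can be ruled out.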

@brada4
Contributor

brada4 commented Aug 10, 2017

If we are betting on buffer sizing, that sounds like a feasible theory.

@martin-frbg
Collaborator

Thinking of #1141 here ? This definitely needs looking into.

@carlkl

carlkl commented Aug 10, 2017

It seems that newer mingw-w64 versions can build Lapack without this error. This could be either a gcc bug or, more likely, a bug in mingw-w64 itself. Compiling OpenBLAS with a recent version of mingw-w64 and testing for this bug seems to be a reasonable workaround right now.

@brada4
Contributor

brada4 commented Aug 11, 2017

@martin-frbg it needs a backtrace and/or the core type; I have been trying to induce the same testcase problem all day.

@carlkl

carlkl commented Aug 11, 2017

For comparison I compiled openblas_v0.2.20_1e9247c (develop) with mingw-build gcc-7.1 for 32bit. The dpotrf testcase #1270 (comment) now runs without error. And all 32bit scipy wheels compiled with mingwpy also run without errors or failures. (These builds are different from those created by @xoviat and mentioned in the comment above.)

It seems that mingw-w64 toolchains based on gcc-5.3 and gcc-4.9 have problems with Lapack.

@brada4
Contributor

brada4 commented Aug 11, 2017

It should be all fortran77 from netlib.
Can you build OpenBLAS without lapack and then link the netlib product with -lopenblas?

@carlkl

carlkl commented Aug 11, 2017

The mingw-w64 based builds mentioned in scipy/scipy#7616 (comment) do not exhibit this erroneous behaviour. Is there a need to experiment with netlib Lapack as well? I may find some time this weekend to do some debugging of the faulty build; let's see.

@brada4
Contributor

brada4 commented Aug 11, 2017

We have almost no Windows developers. The question is whether to blame the original build scripts, the small adjustments made for building with OpenBLAS, or the windows compilers you use.

@carlkl

carlkl commented Aug 13, 2017

OpenBLAS uses interface/lapack/potrf.c (out-of-source) instead of the original lapack's dpotrf.f. Using dpotrf.f instead, I have no problems regardless of which version of mingw-w64 I'm using.

@martin-frbg
Collaborator

Guess we could add something like

#if defined(OS_WINDOWS) && defined(__MINGW32__) && (__GNUC__ < 7)
/* nothing: fall back to the reference dpotrf.f */
#else
/* code overriding *potrf.f */
#endif

in interface/lapack/potrf.c to work around the apparent miscompilation on affected platforms?

@brada4
Contributor

brada4 commented Aug 15, 2017

Actually, all the logs come from -march=native builds. Probably omitting all the aggressive -O-series flags would let it complete without a single problem.

@martin-frbg
Collaborator

I take it you are suggesting that compiling interface/lapack/potrf.c with -O0 instead of the default -O2 would solve the problem with older mingw versions? Not sure how easy it would be to cater for that in the Makefiles (and not sure if -march=native is actually what they are using or intending to use; if my understanding is correct they are preparing a distribution package of some sort).

@brada4
Contributor

brada4 commented Aug 15, 2017

-march=pentium4 -msse2 -mfpmath=sse ...
It is miscompilation by the C compiler, not something to work around. Since build logs/options are not around for each and every 'build' and 'gcc version', I would take the gcc7 claim with a big grain of salt.

@martin-frbg
Collaborator

Once again I am not sure I can follow your cryptic messages. My suggestion was to omit compilation of the problematic potrf interface if there is reason to believe the compiler is not up to that task.

@brada4
Contributor

brada4 commented Aug 15, 2017

It compiles just fine with the gcc 4.6 included with windows rtools (the compilers for R packages).
I am quite certain aggressive/unpopular compiler options lead to miscompilation in this case.

@carlkl

carlkl commented Aug 15, 2017

I'm not going to create a PR to OpenBLAS with a patch to something without further diagnosis and debugging. However, using mingw-w64 with gcc-7.1 seems to be a workaround for this issue right now.

@brada4
Contributor

brada4 commented Aug 16, 2017

You don't even have the exact compiler options for each failed case.
Your PR would be wrong, as old gcc can compile OpenBLAS without any problems. Please find the problematic compiler option and report a compiler bug first.

@martin-frbg
Collaborator

@brada4 alienating developers of other projects helps no one. Even if your gcc 4.6 from rtools happens to be identical to the mingw-w64 build of the same version number, this does nothing to refute the claim that at least the 4.9 and 5.3 versions miscompile potrf.c. If we really want to get to the bottom of this, I guess we would need the intermediate assembler (gcc -S) output for this file from both "good" and "bad" compilers, and someone who is capable of spotting the relevant differences.
For now, adding a note to the wiki (and perhaps ensuring that any future pre-compiled binaries for win32 made available on the project page at sourceforge are built with a recent gcc cross-compiler, just in case the problem carries over to cross-compilations) may be sufficient?
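
The gcc -S comparison could be semi-automated; most of the diff noise comes from labels, comments, and assembler directives, which can be stripped before diffing. A sketch, assuming two cross-compilers are on PATH (the compiler names and flags in the example are placeholders, not the actual build options):

```python
import difflib
import subprocess

def strip_asm_noise(asm_text):
    """Keep only instruction lines: drop blanks, directives, comments, labels."""
    kept = []
    for line in asm_text.splitlines():
        s = line.strip()
        if not s or s.startswith(".") or s.startswith("#") or s.endswith(":"):
            continue
        kept.append(s)
    return kept

def compare_compilers(good_cc, bad_cc, source, flags):
    """Run both compilers with -S and return a unified diff of the cleaned asm."""
    outs = []
    for cc in (good_cc, bad_cc):
        p = subprocess.run([cc, "-S", "-o", "-", *flags, source],
                           capture_output=True, text=True, check=True)
        outs.append(strip_asm_noise(p.stdout))
    return list(difflib.unified_diff(outs[0], outs[1], lineterm=""))

# Example invocation (compiler names, source path, and flags are placeholders):
# diff = compare_compilers("i686-w64-mingw32-gcc-7",
#                          "i686-w64-mingw32-gcc-4.9",
#                          "interface/lapack/potrf.c",
#                          ["-O2"])
# print("\n".join(diff))
```

This won't spot the miscompilation by itself, but it narrows the manual comparison down to instruction sequences that actually differ.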

@brada4
Contributor

brada4 commented Aug 16, 2017

I don't have a compiler on windows. I can try to cross-compile with old and new cross-compilers and run the test case on 64bit windows...
My speculation is based on the assumption that typical distribution compiler option sets (-O2 -march=early-i686 -msome-memory-protector) are a million times better tested in GCC than -march=better-cpu.

@martin-frbg
Collaborator

Please note that this issue appears to be specifically about mingw-w64 on Windows 32bit then.

@brada4
Contributor

brada4 commented Aug 16, 2017

Yes, yes, a 32bit dll+exe running on 64bit windows.
It is very important to have it running at top quality since all the alternatives are fading away.

@carlkl

carlkl commented Aug 16, 2017

I'm willing to dig into this. However, this takes some time, as there are some topics with higher priority for me right now.

@brada4
Contributor

brada4 commented Aug 16, 2017

You are not alone. As a minimum, find a linux distribution with a good and recent gcc that produces a production-quality 32bit DLL.

@martin-frbg
Collaborator

Have there been any new insights since then? I tried to track any seemingly related numpy/scipy issues, but as far as I could tell the discussions there seem to have moved on to inadvertent use of the 80bit extended-precision Intel FPU mode causing problems in cdflib. So should we add a warning in the wiki to use at least version 7.1 of the mingw gcc for 32bit builds on Windows, or are there any indications that the problem may be bigger and in OpenBLAS itself?

@ghost
Author

ghost commented Oct 8, 2017

It turns out that we were compiling with a non-standard toolchain called mingwpy. We decided to drop that toolchain in favor of vanilla mingw. That seemed to resolve the problems.

This issue was closed.

5 participants