Use of long double in OpenBLAS #1711

brianborchers · 2018-08-04T11:47:11Z

Pull Request #1709 surprised me by showing that long double variables are used in some of the OpenBLAS code.

Although gcc on x86_64 uses 80 bit Intel extended precision for these, the performance is slow in comparison to AVX2 or AVX512 SIMD. Other x86_64 compilers might just use IEEE double precision for long double. On many other architectures there's no hardware support for an extended precision long double (and the compiler often turns long double into double precision.) It's also possible that some compiler might implement a quadruple precision long double in software, but that would be far slower than the x86_64 80 bit extended precision.

Thus there's no real way to count on this helping with precision, and furthermore, if you do get some kind of extended precision using long double it might have poor performance. A final concern is that users of BLAS/LAPACK routines don't expect extended precision and might be surprised by results which are too accurate.

Which of the BLAS/LAPACK functions in OpenBLAS make use of long double? How is it handled by the different architectures and compilers?

martin-frbg · 2018-08-04T12:20:51Z

From a quick grep, there appears to be an x86-specific reimplementation of sqrt() in common_x86.h that uses long double, and rotg.c/zrotg.c that were highlighted in #1709 (where the use of "long double" is confined to x86/x86_64 platforms). Possibly the motivation for using long doubles there may have been the extended data range rather than the extended precision, but as I wrote in the PR this code is unchanged from K.Goto's work.
(There are provisions for compiling OpenBLAS with EXPRECISION=1, which apparently turns on support for 128bit "long doubles", but this option appears to be off by default and again dates back to GotoBLAS2.)

carlkl · 2018-08-05T07:23:10Z

long double == double for the Microsoft MSVC compilers for x86/x86_64 platforms.
long double = extended precision for the GCC compilers for x86/x86_64 platforms (mingw-w64, cygwin) unless -mlong-double-64 is given.

Is there really a need to use extended precision in rotg.c/zrotg.c?

martin-frbg · 2018-08-05T08:51:15Z

Apparently Goto saw a need, though he may have been working around a compiler bug for all we know. The long double appears to be used only to calculate a scale factor from the sum of the absolute values of two doubles, but the code actually contains #ifdef'd sections for x86 (with long double) and "everything else".

brianborchers · 2018-08-05T13:27:54Z

I'd look at how the reference BLAS handles this- I don't believe that extended precision is required at all.

…

On Sun, Aug 5, 2018 at 2:51 AM, Martin Kroeker ***@***.***> wrote: Apparently Goto saw a need, though he may have been working around a compiler bug for all we know. The long double appears to be used only to calculate a scale factor from the sum of the absolute values of two doubles, but the code actually contains #ifdef'd sections for x86 (with long double) and "everything else". — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#1711 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AS1gkkt_Xy-Yus4pJpcYULTmMTbArFQ-ks5uNrIIgaJpZM4Vu9zt> .

-- Brian Borchers [email protected] Department of Mathematics http://www.nmt.edu/~borchers/ New Mexico Tech Phone: (575) 322-2592 Socorro, NM 87801 FAX: (575) 835-5366

carlkl · 2018-08-05T19:42:17Z

I don't understand the (long)((double) usage in blas_set_parameter (driver/others/parameter.c) either:

  factor=openblas_block_factor();
  if (factor>0) {
    if (factor <  10) factor =  10;
    if (factor > 200) factor = 200;

    sgemm_p = ((long)((double)sgemm_p * (double)factor * 1.e-2)) & ~7L;
    dgemm_p = ((long)((double)dgemm_p * (double)factor * 1.e-2)) & ~7L;
    cgemm_p = ((long)((double)cgemm_p * (double)factor * 1.e-2)) & ~7L;
    zgemm_p = ((long)((double)zgemm_p * (double)factor * 1.e-2)) & ~7L;
#ifdef EXPRECISION
    qgemm_p = ((long)((double)qgemm_p * (double)factor * 1.e-2)) & ~7L;
    xgemm_p = ((long)((double)xgemm_p * (double)factor * 1.e-2)) & ~7L;
#endif

I guess all OpenBLAS should get rid of long double usage.

martin-frbg · 2018-08-05T21:02:41Z

@carlkl the usage in blas_set_parameter is something completely different I think - it does not involve "long double" at all, but scaling of an integer value by real numbers, with the result cast to long again.
(And I suspect neither the referece BLAS nor a blanket statement is going to be particularly helpful in
trying to interpret legacy code that does not appear to be demonstrably wrong

carlkl · 2019-02-07T17:07:44Z

@martin-frbg , another issue that could be closed now?

martin-frbg · 2019-02-07T19:04:36Z

I had been meaning to revisit this - the use of "long double" does appear to be confined to rot/rotg but it is still not clear to me if it was used to fix an actual problem on any (pre-Haswell) x86 hardware or just experimentally as a (then) "new toy" (x87 extended precision). I am still inclined to treat this as a harmless oddity inherited from the GotoBLAS2 codebase

brianborchers · 2019-02-07T20:43:19Z

My only concern about it is that it will behave differently on different architectures and different compilers. In many cases you'll simply get IEEE double precision. On x86_64 with gcc (but not with Microsoft's compilers) you'll get the 80 bit extended precision. This could lead to different answers on different architecture/compiler combinations.

…

On Thu, Feb 7, 2019 at 12:04 PM Martin Kroeker ***@***.***> wrote: I had been meaning to revisit this - the use of "long double" does appear to be confined to rot/rotg but it is still not clear to me if it was used to fix an actual problem on any (pre-Haswell) x86 hardware or just experimentally as a (then) "new toy". I am still inclined to treat this as a harmless oddity inherited from the GotoBLAS2 codebase — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#1711 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AS1gki1qElZSdPZKuZOsnxAOrVHny6XEks5vLHjJgaJpZM4Vu9zt> .

-- Brian Borchers [email protected] Department of Mathematics http://www.nmt.edu/~borchers/ New Mexico Tech Phone: (575) 322-2592 Socorro, NM 87801 FAX: (575) 835-5366

carlkl · 2019-02-07T21:22:13Z

ARM doesn't use 80 bit precision on long double either. I guess (but I cannot prove it) that 80 bit precision is meaningless for rot/rotg.

carlkl · 2019-02-07T21:33:20Z

there is i.e. the following code snippet in openblas_config_template.h:

#ifdef OPENBLAS_QUAD_PRECISION
typedef struct {
  unsigned long x[2];
}  xdouble;
#elif defined OPENBLAS_EXPRECISION
#define xdouble long double
#else
#define xdouble double
#endif

This place is the only occurence of these macros, so xdouble is always defined as double. My impression is, that OPENBLAS_EXPRECISION and OPENBLAS_QUAD_PRECISION is an artefact of some unfinished work from GOTOBLAS.

martin-frbg · 2019-02-07T21:54:10Z

@carlkl yes, I think we already arrived at that conclusion in #1769. But neither this snippet nor any ARM features appear to have much bearing on the specific few lines of x86-only code used in rot(g)

carlkl · 2019-02-19T14:20:53Z

@martin-frbg, there is another place with use of extended precision I don't fully understand:

common_x86.h:

static __inline long double sqrt_long(long double val) {
#if defined(_MSC_VER) && !defined(__clang__)
  return sqrt(val); // not sure if this will use fsqrt
#else
  long double result;

  __asm__ __volatile__ ("fldt %1\n"
		    "fsqrt\n"
		    "fstpt %0\n" : "=m" (result) : "m"(val));
  return result;
#endif
}
#define SQRT(a)  sqrt_long(a)

x86 architecture (32 bit): for Cygwin and for mingw-w64 SQRT is defined as sqrt_long with extended precision. Following LAPACK files: potf2_L.c, potf2_U.c, zpotf2_L.c, zpotf2_U.c, make use of SQRT.

This code mentioned here could be a potential explanation for the issue: #1270 :
The OpenBLAS used for scipy test build at this time was compiled with a mingwpy version, that itself was build by me with: long double = double defined (using -mlong-double-64). Using assembler code for extended loads or stores is obviously not compatible with such a configuration.

Maybe all this long double code in OpenBLAS should stay as it is right now. However potential implications should be kept in mind.

So what about simply close this issue?

martin-frbg · 2019-02-19T15:05:23Z

I had already mentioned that x86 sqrt in #1711 (comment) above - this is another GotoBLAS2 odditiy without documented explanation. (The comment about something being due to a gcc bug a few lines down refers to the cpuid prototype - it is already in GotoBLAS-1.00 while the sqrt_long only appeared in GotoBLAS2 as far as I can tell from my incomplete collection of old GotoBLAS versions).
It is entirely possible that this was just an idle experiment by K.Goto exploring x87 extended precision - the ifdef MSVC shortcut was added only a few years ago by hpanderson.

carlkl mentioned this issue May 28, 2022

Build openblas with ucrt mingw MacPython/openblas-libs#76

Closed

martin-frbg closed this as completed Nov 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use of long double in OpenBLAS #1711

Use of long double in OpenBLAS #1711

brianborchers commented Aug 4, 2018

martin-frbg commented Aug 4, 2018

carlkl commented Aug 5, 2018

martin-frbg commented Aug 5, 2018

brianborchers commented Aug 5, 2018 via email

carlkl commented Aug 5, 2018

martin-frbg commented Aug 5, 2018

carlkl commented Feb 7, 2019

martin-frbg commented Feb 7, 2019 •

edited

Loading

brianborchers commented Feb 7, 2019 via email

carlkl commented Feb 7, 2019

carlkl commented Feb 7, 2019

martin-frbg commented Feb 7, 2019

carlkl commented Feb 19, 2019

martin-frbg commented Feb 19, 2019

Use of long double in OpenBLAS #1711

Use of long double in OpenBLAS #1711

Comments

brianborchers commented Aug 4, 2018

martin-frbg commented Aug 4, 2018

carlkl commented Aug 5, 2018

martin-frbg commented Aug 5, 2018

brianborchers commented Aug 5, 2018 via email

carlkl commented Aug 5, 2018

martin-frbg commented Aug 5, 2018

carlkl commented Feb 7, 2019

martin-frbg commented Feb 7, 2019 • edited Loading

brianborchers commented Feb 7, 2019 via email

carlkl commented Feb 7, 2019

carlkl commented Feb 7, 2019

martin-frbg commented Feb 7, 2019

carlkl commented Feb 19, 2019

martin-frbg commented Feb 19, 2019

martin-frbg commented Feb 7, 2019 •

edited

Loading