-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Use of long double in OpenBLAS #1711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
From a quick grep, there appears to be an x86-specific reimplementation of sqrt() in common_x86.h that uses long double, and rotg.c/zrotg.c that were highlighted in #1709 (where the use of "long double" is confined to x86/x86_64 platforms). Possibly the motivation for using long doubles there may have been the extended data range rather than the extended precision, but as I wrote in the PR this code is unchanged from K.Goto's work. |
long double == double for the Microsoft MSVC compilers for x86/x86_64 platforms. Is there really a need to use extended precision in rotg.c/zrotg.c? |
Apparently Goto saw a need, though he may have been working around a compiler bug for all we know. The long double appears to be used only to calculate a scale factor from the sum of the absolute values of two doubles, but the code actually contains |
I'd look at how the reference BLAS handles this- I don't believe that
extended precision is required at all.
…On Sun, Aug 5, 2018 at 2:51 AM, Martin Kroeker ***@***.***> wrote:
Apparently Goto saw a need, though he may have been working around a
compiler bug for all we know. The long double appears to be used only to
calculate a scale factor from the sum of the absolute values of two
doubles, but the code actually contains #ifdef'd sections for x86 (with
long double) and "everything else".
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1711 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AS1gkkt_Xy-Yus4pJpcYULTmMTbArFQ-ks5uNrIIgaJpZM4Vu9zt>
.
--
Brian Borchers [email protected]
Department of Mathematics http://www.nmt.edu/~borchers/
New Mexico Tech Phone: (575) 322-2592
Socorro, NM 87801 FAX: (575) 835-5366
|
I don't understand the factor=openblas_block_factor();
if (factor>0) {
if (factor < 10) factor = 10;
if (factor > 200) factor = 200;
sgemm_p = ((long)((double)sgemm_p * (double)factor * 1.e-2)) & ~7L;
dgemm_p = ((long)((double)dgemm_p * (double)factor * 1.e-2)) & ~7L;
cgemm_p = ((long)((double)cgemm_p * (double)factor * 1.e-2)) & ~7L;
zgemm_p = ((long)((double)zgemm_p * (double)factor * 1.e-2)) & ~7L;
#ifdef EXPRECISION
qgemm_p = ((long)((double)qgemm_p * (double)factor * 1.e-2)) & ~7L;
xgemm_p = ((long)((double)xgemm_p * (double)factor * 1.e-2)) & ~7L;
#endif I guess all OpenBLAS should get rid of long double usage. |
@carlkl the usage in blas_set_parameter is something completely different I think - it does not involve "long double" at all, but scaling of an integer value by real numbers, with the result cast to long again. |
@martin-frbg , another issue that could be closed now? |
I had been meaning to revisit this - the use of "long double" does appear to be confined to rot/rotg but it is still not clear to me if it was used to fix an actual problem on any (pre-Haswell) x86 hardware or just experimentally as a (then) "new toy" (x87 extended precision). I am still inclined to treat this as a harmless oddity inherited from the GotoBLAS2 codebase |
My only concern about it is that it will behave differently on different
architectures and different compilers. In many cases you'll simply get
IEEE double precision. On x86_64 with gcc (but not with Microsoft's
compilers) you'll get the 80 bit extended precision. This could lead to
different answers on different architecture/compiler combinations.
…On Thu, Feb 7, 2019 at 12:04 PM Martin Kroeker ***@***.***> wrote:
I had been meaning to revisit this - the use of "long double" does appear
to be confined to rot/rotg but it is still not clear to me if it was used
to fix an actual problem on any (pre-Haswell) x86 hardware or just
experimentally as a (then) "new toy". I am still inclined to treat this as
a harmless oddity inherited from the GotoBLAS2 codebase
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1711 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AS1gki1qElZSdPZKuZOsnxAOrVHny6XEks5vLHjJgaJpZM4Vu9zt>
.
--
Brian Borchers [email protected]
Department of Mathematics http://www.nmt.edu/~borchers/
New Mexico Tech Phone: (575) 322-2592
Socorro, NM 87801 FAX: (575) 835-5366
|
ARM doesn't use 80 bit precision on long double either. I guess (but I cannot prove it) that 80 bit precision is meaningless for rot/rotg. |
there is i.e. the following code snippet in #ifdef OPENBLAS_QUAD_PRECISION
typedef struct {
unsigned long x[2];
} xdouble;
#elif defined OPENBLAS_EXPRECISION
#define xdouble long double
#else
#define xdouble double
#endif This place is the only occurence of these macros, so xdouble is always defined as double. My impression is, that |
@martin-frbg, there is another place with use of extended precision I don't fully understand: common_x86.h: static __inline long double sqrt_long(long double val) {
#if defined(_MSC_VER) && !defined(__clang__)
return sqrt(val); // not sure if this will use fsqrt
#else
long double result;
__asm__ __volatile__ ("fldt %1\n"
"fsqrt\n"
"fstpt %0\n" : "=m" (result) : "m"(val));
return result;
#endif
}
#define SQRT(a) sqrt_long(a) x86 architecture (32 bit): for Cygwin and for mingw-w64 This code mentioned here could be a potential explanation for the issue: #1270 : Maybe all this long double code in OpenBLAS should stay as it is right now. However potential implications should be kept in mind. So what about simply close this issue? |
I had already mentioned that x86 sqrt in #1711 (comment) above - this is another GotoBLAS2 odditiy without documented explanation. (The comment about something being due to a gcc bug a few lines down refers to the cpuid prototype - it is already in GotoBLAS-1.00 while the sqrt_long only appeared in GotoBLAS2 as far as I can tell from my incomplete collection of old GotoBLAS versions). |
Pull Request #1709 surprised me by showing that long double variables are used in some of the OpenBLAS code.
Although gcc on x86_64 uses 80 bit Intel extended precision for these, the performance is slow in comparison to AVX2 or AVX512 SIMD. Other x86_64 compilers might just use IEEE double precision for long double. On many other architectures there's no hardware support for an extended precision long double (and the compiler often turns long double into double precision.) It's also possible that some compiler might implement a quadruple precision long double in software, but that would be far slower than the x86_64 80 bit extended precision.
Thus there's no real way to count on this helping with precision, and furthermore, if you do get some kind of extended precision using long double it might have poor performance. A final concern is that users of BLAS/LAPACK routines don't expect extended precision and might be surprised by results which are too accurate.
Which of the BLAS/LAPACK functions in OpenBLAS make use of long double? How is it handled by the different architectures and compilers?
The text was updated successfully, but these errors were encountered: