-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Fixing up the Double/Single parsing code to be correct #20707
Conversation
src/System.Private.CoreLib/shared/System/Number.NumberToFloatingPointBitsRoslyn.cs
Outdated
Show resolved
Hide resolved
Marked as [WIP] until I can get Roslyn attributed correctly and until I can get the "fast-path" up for both double and single (so we don't regress perf for the common case). For the "fast-path" we can use algorithm similar to the previous |
@@ -198,6 +406,217 @@ public static int Compare(ref BigInteger lhs, ref BigInteger rhs) | |||
return 0; | |||
} | |||
|
|||
public static uint CountSignificantBits(uint value) | |||
{ | |||
return (value != 0) ? (1 + LogBase2(value)) : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few places, like this, where HWIntrinsics could probably be used in a later PR.
|
||
public static void DivRem(ref BigInteger lhs, ref BigInteger rhs, out BigInteger quo, out BigInteger rem) | ||
{ | ||
// This is modified from the CoreFX BigIntegerCalculator.DivRem.cs implementation: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably some room for improvements here. We only care about the lower 64-bits of quo
, for example; and rem
only matters if it is zero or non-zero.
@@ -410,12 +844,12 @@ public static void Multiply(ref BigInteger lhs, ref BigInteger rhs, ref BigInteg | |||
} | |||
} | |||
|
|||
public static void Pow10(uint exponent, ref BigInteger result) | |||
public static void Pow10(uint exponent, out BigInteger result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably some room for improvement here as well, calculating some of the larger powers of 10 can be "expensive".
src/System.Private.CoreLib/shared/System/Number.NumberToFloatingPointBitsRoslyn.cs
Outdated
Show resolved
Hide resolved
CoreFX test failures are expected as
|
A simple fast-path is up, which is marginally faster than the previous implementation for inputs of |
src/System.Private.CoreLib/shared/System.Private.CoreLib.Shared.projitems
Outdated
Show resolved
Hide resolved
ulong bits = NumberToFloatingPointBitsRoslyn(ref number, in FloatingPointInfo.Double); | ||
double result = BitConverter.Int64BitsToDouble((long)(bits)); | ||
|
||
if (!double.IsFinite(result)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our current behavior is to return false
and 0
for overflow. However, the correct behavior is to return Infinity
with the appropriate sign.
Fixing this would fall into the realm of breaking changes and we would need to determine whether this counts as a parsing failure (which would return false
from TryParse
and would throw for Parse
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under the default IEEE rules, this would signal the overflow exception
(if enabled); However, we explicitly disable and do not support IEEE exception handling; which makes the current behavior a bit of an oddity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC. @jkotas, do you have an opinion here on the correct behavior?
I believe the options are:
- Keep the current behavior and return false, setting
result
to 0 - Return false, but set the result to
Infinity
with the appropriate sign- This would throw for
Parse
, but forTryParse
would allow interested users to differentiate between "Format" and "Overflow"; it should not cause any difference in behavior for existing code paths
- This would throw for
- Return true and set the result to
Infinity
with the appropriate sign- Both
Parse
andTryParse
would work and this is arguably the most correct. However, some code may not know to handle the overflow case correctly
- Both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've selected to return true
and set the result to Infinity
with the appropriate sign, for now. It is the most correct behavior and is what other languages/frameworks do when IEEE exception handling is disabled.
uint fractionalLastIndex = totalDigits; | ||
uint fractionalDigitsPresent = fractionalLastIndex - fractionalFirstIndex; | ||
|
||
// When the number of significant digits is less than or equal to 15 and the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For System.Single
, this covers a very good range of possible inputs. You only need at most 9 digits to guarantee round-tripping and the worst case is 1.4 * 10^-45
. This means that, for an expected range of inputs, we will likely always be faster than previous, and there will be some very small/large numbers that are slower.
For System.Double
, this likewise covers a good range of possible inputs. However, you need at most 17-digits to guarantee round-tripping so there are likely some round-trippable inputs that will always be slower (but that will be correct now). There will also be a much broader range of values that will be slower, since the worst case is 4.9 * 10^-324
, but these fall outside the range of what I would expect to be "normal" inputs (IIRC, approx half of all unique values fall between -1.0 and 1.0
, due to the representation, and for those, I think we can assume the magnitude of most values will be larger than 1e-22
and less than 1e22
).
- There may be some other tricks we can do to speed up this more for things like the 17 digit case. For example: the "golden standard" by David M. Gay, uses a software implementation of an "extended-precision" float with 96-bits to cover some of these inputs (we would need to do all the appropriate license checks before using it, however, other languages like Java use a variation of it so it probably won't be too much of a concern).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I did my math right, we can have a couple other fast paths to cover the following additional ranges. Doing so would provide code similar to the previous implementation, which was providing a software implementation for 80-bit extended-precision multiply; however, we would also need to provide a software-based division implementation, since negative exponents are not exactly representable.
- System.Single (8-bit exponent, 24-bit mantissa)
- 7 Digits, "fast exponent" up to 10
- System.Double (11-bit exponent, 53-bit mantissa)
- 15 Digits, "fast exponent" up to 22
- Extended-Precision Double (15-bit exponent, 64-bit mantissa)
- 19 Digits, "fast exponent" up to 27
- Quad-Precision (15-bit exponent, 113-bit mantissa)
- 34 Digits, "fast exponent" up to 38
Supporting a basic software implementation for "extended-precision double" is more trivial since it basically just relies on things we have existing hardware support for (such as 64*64=128
). This would also cover most doubles that are of "round-trippable" length (17 digits).
Supporting a basic software implementation for "quad-precision" is more complex, but may still be worthwhile, since it would effectively cover the entire range of System.Single
"normal" inputs (for System.Single
, only subnormal inputs have an exponent that is greater than 38).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It now appropriately uses System.Single
or System.Double
arithmetic, where applicable.
It also uses For the division case (which is arguably more common), the implementation isn't as trivial. Berkeley has an existing software-implementation (available under the BSD 3 Clause) for most of the floating-point functions (for binary16, binary32, binary64, binary80, and binary128), which would be nice if we could use here (rather than needing to port our own): https://github.com/ucb-bar/berkeley-softfloat-3/blob/master/COPYING.txtExtended-Precision
arithmetic for the multiplication case.
{ | ||
0, // 10^8 | ||
2, // 10^16 | ||
5, // 10^32 | ||
10, // 10^64 | ||
18, // 10^128 | ||
33, // 10^256 | ||
61, // 10^512 | ||
116, // 10^1024 | ||
}; | ||
|
||
private static readonly uint[] s_Pow10BigNumTable = new uint[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it feasible/worthwhile to have debug-only code that validates the entries in this table don't contain a mistake? I didn't see any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we have that elsewhere so it would make sense to have it here as well.
src/System.Private.CoreLib/shared/System/Number.NumberBuffer.cs
Outdated
Show resolved
Hide resolved
Missing test? |
No, this value came out from a test, just pointed it out as a quick example. |
The CoreFX failures are here (linked since they should now be disabled): https://ci.dot.net/job/dotnet_coreclr/job/master/job/x64_checked_windows_nt_corefx_innerloop_prtest/3912/#showFailuresLink There were 159 in total and were caused by:
|
A number of the CoreFX failures are unrelated and are showing up on other PRs. |
src/System.Private.CoreLib/shared/System/Number.NumberToFloatingPointBits.cs
Show resolved
Hide resolved
I believe the only remaining items are:
|
I believe this is ready for final review. After some discussion with @danmosemsft, it sounds like we want to log issues for some of the remaining optional work and look into them at some point in the future (such as if we get feedback about other inputs, outside the currently expected range). |
CC. @gafter since you logged https://github.com/dotnet/coreclr/issues/1316. I ran the entire Roslyn "RealParser" test suite (modified to call |
Fixed a couple bugs caught by the test base provided by @cyberphone: https://github.com/dotnet/coreclr/issues/17467#issuecomment-384932600
Now, the entire Roslyn RealParser suite and the entire ES6 test suite (covering 100M inputs) passes validation. |
Shoukd we pick up some of the Roslyn tests as well? |
I think we need/want to add these to CoreFX. |
src/System.Private.CoreLib/shared/System/Number.NumberToFloatingPointBits.cs
Outdated
Show resolved
Hide resolved
… and extended-precision multiplication where possible
…PointBits, due to a bug
… code has been modified from the original source.
…ision fast-path in NumberToFloatingPointBits
…er the MIT license.
src/System.Private.CoreLib/shared/System/Number.NumberToFloatingPointBits.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas, latest changes have remove the licensing "glue" and re-released the code under MIT. |
Most of the jobs stopped due to disconnection. |
The CI system seems to have serious problems at the moment 😕 |
Yay @tannergooding ! |
@mmitche I have seen this too -- occasionally, just ocmmenting seems to kick off CI again. Do you know why? |
It can happen on old PRs if Jenkins loses track of the current state of what it has tested. |
* Don't normalize -0.0 to 0.0 when parsing * Updating NumberBuffer to validate the constructor inputs * Updating NumberToDouble to just get the precision * Don't check for non-significant zero's in NumberToDouble * Updating Number.BigInteger to carry additional space for the worst-case scenario * Removing some dead code from double.TryParse * Updating NumberToDouble to use the RealParser implementation from Roslyn * Fixing TryNumberToDouble and TryNumberToSingle to apply the appropriate sign. * Adding a fast path for double/single parsing when we have <= 15 digits and an absolute scale <= 22 * Update NumberBuffer to also track whether any input digits past maxDigCount were non-zero * Renaming NumberToFloatingPointBitsRoslyn to NumberToFloatingPointBits * Updating TryNumberToDouble and TryNumberToSingle to support Overflow to Infinity * Fixing a Debug.Assert in TryParseNumber * Fixing `DecimalNumberBufferLength` to 30 * Renaming NumberToFloatingPointBitsRoslyn to NumberToFloatingPointBits * Clarifying the NumberBufferLength comments * Fixing TryNumberToDecimal to check the last digit in the buffer, if it exists * Disable some CoreFX tests due to the single/double/decimal parsing fixes * Fix TryNumberToDecimal to not modify the value of p in the assert. Co-Authored-By: tannergooding <[email protected]> * Updating NumberToFloatingPointBits to use single-precision arithmetic and extended-precision multiplication where possible * Splitting the NumberToFloatingPointBits code into a fast and slow-path method * Ensure Roslyn is properly attributed. * Removing the 80-bit extended precision fast path for NumberToFloatingPointBits, due to a bug * Fixing the double and single parser to ignore case for Infinity and NaN * Add a clarifying comment to Number.NumberToFloatingPointBits that the code has been modified from the original source. * Removing the remaining code that was used by the 80-bit extended precision fast-path in NumberToFloatingPointBits * Adding a missing comma to the CoreFX.issues.json * Remove licensing "glue" and re-release the Roslyn RealParser code under the MIT license. * Some minor cleanup to the NumberToFloatingPointBits code.
…r#20707) * Don't normalize -0.0 to 0.0 when parsing * Updating NumberBuffer to validate the constructor inputs * Updating NumberToDouble to just get the precision * Don't check for non-significant zero's in NumberToDouble * Updating Number.BigInteger to carry additional space for the worst-case scenario * Removing some dead code from double.TryParse * Updating NumberToDouble to use the RealParser implementation from Roslyn * Fixing TryNumberToDouble and TryNumberToSingle to apply the appropriate sign. * Adding a fast path for double/single parsing when we have <= 15 digits and an absolute scale <= 22 * Update NumberBuffer to also track whether any input digits past maxDigCount were non-zero * Renaming NumberToFloatingPointBitsRoslyn to NumberToFloatingPointBits * Updating TryNumberToDouble and TryNumberToSingle to support Overflow to Infinity * Fixing a Debug.Assert in TryParseNumber * Fixing `DecimalNumberBufferLength` to 30 * Renaming NumberToFloatingPointBitsRoslyn to NumberToFloatingPointBits * Clarifying the NumberBufferLength comments * Fixing TryNumberToDecimal to check the last digit in the buffer, if it exists * Disable some CoreFX tests due to the single/double/decimal parsing fixes * Fix TryNumberToDecimal to not modify the value of p in the assert. Co-Authored-By: tannergooding <[email protected]> * Updating NumberToFloatingPointBits to use single-precision arithmetic and extended-precision multiplication where possible * Splitting the NumberToFloatingPointBits code into a fast and slow-path method * Ensure Roslyn is properly attributed. * Removing the 80-bit extended precision fast path for NumberToFloatingPointBits, due to a bug * Fixing the double and single parser to ignore case for Infinity and NaN * Add a clarifying comment to Number.NumberToFloatingPointBits that the code has been modified from the original source. * Removing the remaining code that was used by the 80-bit extended precision fast-path in NumberToFloatingPointBits * Adding a missing comma to the CoreFX.issues.json * Remove licensing "glue" and re-release the Roslyn RealParser code under the MIT license. * Some minor cleanup to the NumberToFloatingPointBits code. Commit migrated from dotnet/coreclr@b30280d
This cleans up the
System.Double
andSystem.Single
parsing code to be correct. It is a port of the Roslyn "RealParser" code with a few modifications:Number.BigInteger
structure, rather thanSystem.Numerics.BigInteger
For numbers that hit the fast-path, we are no slower than the current implementation and up to 30% faster. For numbers that have to fallback to the slow-path, the worst case I saw was 500% slower (for
double.Epsilon
).slow-path
, as you must consider and construct aBigInteger
containing up to 768 digits (but no more digits than the length of the input string). You must then do various arithmetic operations (including division) to compute the correct and precisely rounded result.This should resolve: