Submitted by Alexander Monakov on Jan. 7, 2020, 1:06 p.m.

Message ID | 20200107130605.7618-1-amonakov@ispras.ru |
---|---|

State | New |

Headers | show |

diff --git a/src/math/i386/sqrt.c b/src/math/i386/sqrt.c new file mode 100644 index 00000000..619df056 --- /dev/null +++ b/src/math/i386/sqrt.c @@ -0,0 +1,15 @@ +#include "libm.h" + +double sqrt(double x) +{ + union ldshape ux; + unsigned fpsr; + __asm__ ("fsqrt; fnstsw %%ax": "=t"(ux.f), "=a"(fpsr) : "0"(x)); + if ((ux.i.m & 0x7ff) != 0x400) + return (double)ux.f; + /* Rounding to double would have encountered an exact halfway case. + Adjust mantissa downwards if fsqrt rounded up, else upwards. + (result of fsqrt could not have been exact) */ + ux.i.m ^= (fpsr & 0x200) + 0x200; + return (double)ux.f; +} diff --git a/src/math/i386/sqrt.s b/src/math/i386/sqrt.s deleted file mode 100644 index 57837e25..00000000 --- a/src/math/i386/sqrt.s +++ /dev/null @@ -1,21 +0,0 @@ -.global sqrt -.type sqrt,@function -sqrt: fldl 4(%esp) - fsqrt - fnstsw %ax - sub $12,%esp - fld %st(0) - fstpt (%esp) - mov (%esp),%ecx - and $0x7ff,%ecx - cmp $0x400,%ecx - jnz 1f - and $0x200,%eax - sub $0x100,%eax - sub %eax,(%esp) - fstp %st(0) - fldt (%esp) -1: add $12,%esp - fstpl 4(%esp) - fldl 4(%esp) - ret

On Tue, Jan 07, 2020 at 04:06:05PM +0300, Alexander Monakov wrote: > --- > Since union ldshape does not have a dedicated field for 32 least significant > bits of the x87 long double mantissa, keeping the original approach with > > ux.i.m -= (fpsr & 0x200) - 0x100; > > would lead to a 64-bit subtraction that is not trivial for the compiler to > optimize to 32-bit subtraction as done in the original assembly. Therefore > I have elected to change the approach and use > > ux.i.m ^= (fpsr & 0x200) + 0x200; > > which is easier to optimize to a 32-bit rather than 64-bit xor. > > Thoughts? I'm getting test failures with sqrt and this seems to be the culprit -- I don't think it's equivalent. The original version could offset the value by +0x100 or -0x100 before rounding, and offsets in the opposite direction of the rounding that already occurred. Your version can only offset it by +0x200 or -0x400. The (well, one) particular failing case is: src/math/ucb/sqrt.h:49: RU sqrt(0x1.fffffffffffffp+1023) want 0x1p+512 got 0x1.fffffffffffffp+511 ulperr -0.250 = -0x1p-1 + 0x1p-2 Here the mantissa is fffffffffffffc00 and offset by -0x400 yields: fffffffffffff800 which has exactly 53 bits and therefore does not round up like it should. I still like your approach better if there's a way to salvage it. Do you see one? Rich

On Sat, Mar 21, 2020 at 01:53:51PM -0400, Rich Felker wrote: > On Tue, Jan 07, 2020 at 04:06:05PM +0300, Alexander Monakov wrote: > > --- > > Since union ldshape does not have a dedicated field for 32 least significant > > bits of the x87 long double mantissa, keeping the original approach with > > > > ux.i.m -= (fpsr & 0x200) - 0x100; > > > > would lead to a 64-bit subtraction that is not trivial for the compiler to > > optimize to 32-bit subtraction as done in the original assembly. Therefore > > I have elected to change the approach and use > > > > ux.i.m ^= (fpsr & 0x200) + 0x200; > > > > which is easier to optimize to a 32-bit rather than 64-bit xor. > > > > Thoughts? > > I'm getting test failures with sqrt and this seems to be the culprit > -- I don't think it's equivalent. The original version could offset > the value by +0x100 or -0x100 before rounding, and offsets in the > opposite direction of the rounding that already occurred. Your version > can only offset it by +0x200 or -0x400. > > The (well, one) particular failing case is: > > src/math/ucb/sqrt.h:49: RU sqrt(0x1.fffffffffffffp+1023) want 0x1p+512 > got 0x1.fffffffffffffp+511 ulperr -0.250 = -0x1p-1 + 0x1p-2 > > Here the mantissa is > > fffffffffffffc00 > > and offset by -0x400 yields: > > fffffffffffff800 > > which has exactly 53 bits and therefore does not round up like it > should. > > I still like your approach better if there's a way to salvage it. Do > you see one? And, I think I do. Changing it to: ux.i.m ^= (fpsr & 0x200) + 0x300; yields an offset of +0x300 (^0x300) or -0x300 (^0x500). This looks like it should work theoretically, and indeed it passes libc-test. Rich

On Sat, 21 Mar 2020, Rich Felker wrote: > On Sat, Mar 21, 2020 at 01:53:51PM -0400, Rich Felker wrote: > > On Tue, Jan 07, 2020 at 04:06:05PM +0300, Alexander Monakov wrote: > > > --- > > > Since union ldshape does not have a dedicated field for 32 least significant > > > bits of the x87 long double mantissa, keeping the original approach with > > > > > > ux.i.m -= (fpsr & 0x200) - 0x100; > > > > > > would lead to a 64-bit subtraction that is not trivial for the compiler to > > > optimize to 32-bit subtraction as done in the original assembly. Therefore > > > I have elected to change the approach and use > > > > > > ux.i.m ^= (fpsr & 0x200) + 0x200; > > > > > > which is easier to optimize to a 32-bit rather than 64-bit xor. > > > > > > Thoughts? > > > > I'm getting test failures with sqrt and this seems to be the culprit > > -- I don't think it's equivalent. The original version could offset > > the value by +0x100 or -0x100 before rounding, and offsets in the > > opposite direction of the rounding that already occurred. Your version > > can only offset it by +0x200 or -0x400. > > > > The (well, one) particular failing case is: > > > > src/math/ucb/sqrt.h:49: RU sqrt(0x1.fffffffffffffp+1023) want 0x1p+512 > > got 0x1.fffffffffffffp+511 ulperr -0.250 = -0x1p-1 + 0x1p-2 > > > > Here the mantissa is > > > > fffffffffffffc00 > > > > and offset by -0x400 yields: > > > > fffffffffffff800 > > > > which has exactly 53 bits and therefore does not round up like it > > should. > > > > I still like your approach better if there's a way to salvage it. Do > > you see one? > > And, I think I do. Changing it to: > > ux.i.m ^= (fpsr & 0x200) + 0x300; > > yields an offset of +0x300 (^0x300) or -0x300 (^0x500). This looks > like it should work theoretically, and indeed it passes libc-test. Indeed, I was considering only the default (to-nearest) rounding mode and did not notice the problem for upwards rounding mode. I think your change solves this nicely. Thanks Alexander