math: move i386 sqrt to C

Submitted by Alexander Monakov on Jan. 7, 2020, 1:06 p.m.

Details

Message ID 20200107130605.7618-1-amonakov@ispras.ru
State New
Headers show

Patch hide | download patch | download mbox

diff --git a/src/math/i386/sqrt.c b/src/math/i386/sqrt.c
new file mode 100644
index 00000000..619df056
--- /dev/null
+++ b/src/math/i386/sqrt.c
@@ -0,0 +1,15 @@ 
+#include "libm.h"
+
+double sqrt(double x)
+{
+	union ldshape ux;
+	unsigned fpsr;
+	__asm__ ("fsqrt; fnstsw %%ax": "=t"(ux.f), "=a"(fpsr) : "0"(x));
+	if ((ux.i.m & 0x7ff) != 0x400)
+		return (double)ux.f;
+	/* Rounding to double would have encountered an exact halfway case.
+	   Adjust mantissa downwards if fsqrt rounded up, else upwards.
+	   (result of fsqrt could not have been exact) */
+	ux.i.m ^= (fpsr & 0x200) + 0x200;
+	return (double)ux.f;
+}
diff --git a/src/math/i386/sqrt.s b/src/math/i386/sqrt.s
deleted file mode 100644
index 57837e25..00000000
--- a/src/math/i386/sqrt.s
+++ /dev/null
@@ -1,21 +0,0 @@ 
-.global sqrt
-.type sqrt,@function
-sqrt:	fldl 4(%esp)
-	fsqrt
-	fnstsw %ax
-	sub $12,%esp
-	fld %st(0)
-	fstpt (%esp)
-	mov (%esp),%ecx
-	and $0x7ff,%ecx
-	cmp $0x400,%ecx
-	jnz 1f
-	and $0x200,%eax
-	sub $0x100,%eax
-	sub %eax,(%esp)
-	fstp %st(0)
-	fldt (%esp)
-1:	add $12,%esp
-	fstpl 4(%esp)
-	fldl 4(%esp)
-	ret

Comments

Rich Felker Jan. 8, 2020, 7:26 a.m.
On Tue, Jan 07, 2020 at 04:06:05PM +0300, Alexander Monakov wrote:
> ---
> Since union ldshape does not have a dedicated field for 32 least significant
> bits of the x87 long double mantissa, keeping the original approach with
> 
>     ux.i.m -= (fpsr & 0x200) - 0x100;
> 
> would lead to a 64-bit subtraction that is not trivial for the compiler to
> optimize to 32-bit subtraction as done in the original assembly. Therefore
> I have elected to change the approach and use
> 
>     ux.i.m ^= (fpsr & 0x200) + 0x200;
> 
> which is easier to optimize to a 32-bit rather than 64-bit xor.
> 
> Thoughts?

I think it looks like a good change. Ideally the compiler would see,
since the branch was taken where the low bits are 0x400, that
subtracting a value less than 0x400 can't borrow from the high 32
bits. But I don't think current compilers do such range/value
analysis, and it doesn't seem like there are anny downsides to your
approach anyway.

Rich