math: move i386 sqrtf to C

Submitted by Alexander Monakov on Jan. 6, 2020, 5:43 p.m.

Details

Message ID 20200106174346.6489-1-amonakov@ispras.ru
State New
Headers show

Patch hide | download patch | download mbox

diff --git a/src/math/i386/sqrtf.c b/src/math/i386/sqrtf.c
new file mode 100644
index 00000000..2bcc7754
--- /dev/null
+++ b/src/math/i386/sqrtf.c
@@ -0,0 +1,8 @@ 
+#include <math.h>
+
+float sqrtf(float x)
+{
+	long double t;
+	__asm__ ("fsqrt" : "=t"(t) : "0"(x));
+	return (float)t;
+}
diff --git a/src/math/i386/sqrtf.s b/src/math/i386/sqrtf.s
deleted file mode 100644
index 9e944f45..00000000
--- a/src/math/i386/sqrtf.s
+++ /dev/null
@@ -1,7 +0,0 @@ 
-.global sqrtf
-.type sqrtf,@function
-sqrtf:	flds 4(%esp)
-	fsqrt
-	fstps 4(%esp)
-	flds 4(%esp)
-	ret

Comments

Pascal Cuoq Jan. 6, 2020, 6:32 p.m.
> On 6 Jan 2020, at 12:44, Alexander Monakov <amonakov@ispras.ru> wrote:

> 

> Apparently that double rounding yields the correct result here was

> suggested by Stephen Canon in https://stackoverflow.com/a/9689746/4755075

> Should his contribution be mentioned somehow in the source code?


You should rather credit Figueroa, for instance in the article “when is double rounding innocuous?”. This would be more of a primary source, and Figueroa wrote demonstrations.

Posting from phone, I apologize if I got something wrong. I'm pretty confident Stephen was implicitly citing Figueroa's results though.
Alexander Monakov Jan. 9, 2020, 3:55 p.m.
I should have asked earlier, but - is everyone happy with this style of
expressing removal of excess precision, or should I use eval_as_float?

+{
+       long double t;
[...]
+       return (float)t;
+}

Alexander
Rich Felker Jan. 9, 2020, 5 p.m.
On Thu, Jan 09, 2020 at 06:55:11PM +0300, Alexander Monakov wrote:
> I should have asked earlier, but - is everyone happy with this style of
> expressing removal of excess precision, or should I use eval_as_float?
> 
> +{
> +       long double t;
> [...]
> +       return (float)t;
> +}

musl requires C99+ with conforming floating point behavior so I think
this is okay, and I'd prefer to avoid having dependency on libm.h
machinery that's not strictly needed in src/math/$(ARCH) since it
increases the cost of changing that machinery (requires looking at
arch-specific files). My recollection was that eval_as_float is mostly
an artifact of nsz's math functions supporting and being compatible
with somewhat non-conforming compilers.

If we want to ensure correct rounding (important for sqrt[f]) even on
broken compilers (some ppl use gcc 3.x, and pcc may be broken too?)
perhaps we should just do the store from asm too?

Note that eval_as_float only helps if -ffloat-store is used, which is
a nasty hack and also nonconforming, arguably worse than the behavior
without it, so we should probably drop use of that as a fallback, and
use fp_barrier[f] instead if needed.

Rich
Szabolcs Nagy Jan. 9, 2020, 9 p.m.
* Rich Felker <dalias@libc.org> [2020-01-09 12:00:02 -0500]:
> On Thu, Jan 09, 2020 at 06:55:11PM +0300, Alexander Monakov wrote:
> > I should have asked earlier, but - is everyone happy with this style of
> > expressing removal of excess precision, or should I use eval_as_float?
> > 
> > +{
> > +       long double t;
> > [...]
> > +       return (float)t;
> > +}
> 
> musl requires C99+ with conforming floating point behavior so I think
> this is okay, and I'd prefer to avoid having dependency on libm.h
> machinery that's not strictly needed in src/math/$(ARCH) since it
> increases the cost of changing that machinery (requires looking at
> arch-specific files). My recollection was that eval_as_float is mostly
> an artifact of nsz's math functions supporting and being compatible
> with somewhat non-conforming compilers.

it's an artifact of glibc using -fexcess-precision=fast,
so the compiler can do clever transformations (without
relying on float_t and double_t), and then they annotate
where the excess precision really has to be dropped
(using inline asm).

i wanted to upstream that code into glibc as well as other
math libraries and c99 standard excess precision handling
is not the default in most environments (e.g. gcc default
-std=gnu* or iso c++) and actually the annotation is rarely
needed in math functions: only exact arithmetic tricks
need it or at return from public api functions (like in
this case) to force a final rounding for side-effects
or for sane call abi.

so for me it made sense to add explicit annotations which
document where this issue is really relevant and allows
using the code in more environments by just redefining
the eval_as_float etc calls. (on systems where this is
relevant results can be different compared to other targets
anyway, so allowing more variance depending on different
excess-precision settings is not that big deal.)

about dropping excess precision at return:

c99 required cast at return, but that was considered to be
a bug so in c11 it's no longer required: return now must
round. of course in non-standard mode return continues to
keep excess precision, which seems to require to always
annotate returns, but in most math functions all relevant
fenv side-effects of the final rounding would need special
handling anyway to set errno (assuming you still care about
errno: glibc does so my code also cares about it), so
overflow etc can't happen in the normal return paths, all
such cases are directed to special code that needs special
annotations anyway, so return annotation rarely comes up
(other than with correctly rounded functions like sqrt).

> 
> If we want to ensure correct rounding (important for sqrt[f]) even on
> broken compilers (some ppl use gcc 3.x, and pcc may be broken too?)
> perhaps we should just do the store from asm too?

that would be a bit safer, but then correct compiler would
store twice i think. it's hard to get excited about this
issue: it only matters on m68k and i386 which are not the
main targets for new float code (and old code had to deal
with this and bigger brokenness already).

> 
> Note that eval_as_float only helps if -ffloat-store is used, which is
> a nasty hack and also nonconforming, arguably worse than the behavior
> without it, so we should probably drop use of that as a fallback, and
> use fp_barrier[f] instead if needed.

i think -ffloat-store almost always drops excess precision
including returns and assignments, so with that no
annotation is needed. but yes the way the annotation is
defined now is not useful against broken compilers or
non-standard excess precision setting, in glibc the
annotation is defined differently (with inline asm).

btw, i'm not against the cast, but in my optimized-routines
code i will keep using my annotations.
Rich Felker Jan. 9, 2020, 10 p.m.
On Thu, Jan 09, 2020 at 10:00:06PM +0100, Szabolcs Nagy wrote:
> > If we want to ensure correct rounding (important for sqrt[f]) even on
> > broken compilers (some ppl use gcc 3.x, and pcc may be broken too?)
> > perhaps we should just do the store from asm too?
> 
> that would be a bit safer, but then correct compiler would
> store twice i think.

If you did something like:

	float y = expr_with_excess_precision;
	__asm__( "" : "+m"(y));
	return y;

then I think you'd get just one store and one load, as intended. It
seems to work as intended here. Oddly though my local gcc (7.3) is
gratuitously pushing/popping a single gpr to align stack to 8 (but not
16) despite being a leaf function. No idea why.

> it's hard to get excited about this
> issue: it only matters on m68k and i386 which are not the
> main targets for new float code (and old code had to deal
> with this and bigger brokenness already).

Indeed, but context of present thread is getting rid of the i386 asm
files so it's relevant here.

> > Note that eval_as_float only helps if -ffloat-store is used, which is
> > a nasty hack and also nonconforming, arguably worse than the behavior
> > without it, so we should probably drop use of that as a fallback, and
> > use fp_barrier[f] instead if needed.
> 
> i think -ffloat-store almost always drops excess precision
> including returns and assignments, so with that no
> annotation is needed. but yes the way the annotation is
> defined now is not useful against broken compilers or
> non-standard excess precision setting, in glibc the
> annotation is defined differently (with inline asm).

I was thinking in the context of wanting to remove from configure the:

|| { test "$ARCH" = i386 && tryflag CFLAGS_C99FSE -ffloat-store ; }

which is probably doing more harm than good. Do you know if there are
things that'd break if we did that? I think eval_as_float should
probably be defined as fp_barrierf to make it safe in your code,
conditional on FLT_EVAL_METHOD>0 (and likewise >1 for eval_as_double).

Rich
Szabolcs Nagy Jan. 9, 2020, 11:18 p.m.
* Rich Felker <dalias@libc.org> [2020-01-09 17:00:14 -0500]:
> On Thu, Jan 09, 2020 at 10:00:06PM +0100, Szabolcs Nagy wrote:
> > > Note that eval_as_float only helps if -ffloat-store is used, which is
> > > a nasty hack and also nonconforming, arguably worse than the behavior
> > > without it, so we should probably drop use of that as a fallback, and
> > > use fp_barrier[f] instead if needed.
> > 
> > i think -ffloat-store almost always drops excess precision
> > including returns and assignments, so with that no
> > annotation is needed. but yes the way the annotation is
> > defined now is not useful against broken compilers or
> > non-standard excess precision setting, in glibc the
> > annotation is defined differently (with inline asm).
> 
> I was thinking in the context of wanting to remove from configure the:
> 
> || { test "$ARCH" = i386 && tryflag CFLAGS_C99FSE -ffloat-store ; }
> 
> which is probably doing more harm than good. Do you know if there are
> things that'd break if we did that? I think eval_as_float should
> probably be defined as fp_barrierf to make it safe in your code,
> conditional on FLT_EVAL_METHOD>0 (and likewise >1 for eval_as_double).

i think -fexcess-precision=standard was introduced in
gcc 4.5 and to get reliable behaviour before that we
needed -ffloat-store. the freebsd math code had
volatile hacks to avoid -ffloat-store but those were
incomplete (there were only a few bugs e.g. see commit
c4359e01303da2755fe7e8033826b132eb3659b1 freebsd sets
x87 precision to double so for them some of these were
not issues in practice).

since we had -ffloat-store i turned off the volatile
hacks in commit 6d3f1a39c14b12026df84f386875b094e3652990
and later completely removed the annotations in commit
9b0fcb441a44456c7b071c7cdaf90403f81ec05a

on new compilers -fexcess-precision=standard is used,
but that turned out to do too many stores on the fdlibm
code (which is why glibc kept using =fast), so in commit
e216951f509b71da193da2fc63e25b998740d58b i started using
float_t and double_t to get fast code in standard mode.
(of course this made things worse for -ffloat-store).

i think we would need to add back the old annotations
to make old compilers safe without -ffloat-store.
(fdlibm often raises fenv exceptions via a final rounding
before return, those could be often handled more cleanly
by __math_oflow etc helpers, but since it was not designed
for inline errno handling some normal return paths can
raise fp exceptions too and thus need eval_as_* annotation).
Rich Felker Jan. 10, 2020, 2:07 a.m.
On Fri, Jan 10, 2020 at 12:18:58AM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2020-01-09 17:00:14 -0500]:
> > On Thu, Jan 09, 2020 at 10:00:06PM +0100, Szabolcs Nagy wrote:
> > > > Note that eval_as_float only helps if -ffloat-store is used, which is
> > > > a nasty hack and also nonconforming, arguably worse than the behavior
> > > > without it, so we should probably drop use of that as a fallback, and
> > > > use fp_barrier[f] instead if needed.
> > > 
> > > i think -ffloat-store almost always drops excess precision
> > > including returns and assignments, so with that no
> > > annotation is needed. but yes the way the annotation is
> > > defined now is not useful against broken compilers or
> > > non-standard excess precision setting, in glibc the
> > > annotation is defined differently (with inline asm).
> > 
> > I was thinking in the context of wanting to remove from configure the:
> > 
> > || { test "$ARCH" = i386 && tryflag CFLAGS_C99FSE -ffloat-store ; }
> > 
> > which is probably doing more harm than good. Do you know if there are
> > things that'd break if we did that? I think eval_as_float should
> > probably be defined as fp_barrierf to make it safe in your code,
> > conditional on FLT_EVAL_METHOD>0 (and likewise >1 for eval_as_double).
> 
> i think -fexcess-precision=standard was introduced in
> gcc 4.5 and to get reliable behaviour before that we
> needed -ffloat-store.

I don't think the behavior was "reliable" with -ffloat-store; it's
wrong with respect to the defined meaning of FLT_EVAL_METHOD.

> since we had -ffloat-store i turned off the volatile
> hacks in commit 6d3f1a39c14b12026df84f386875b094e3652990
> and later completely removed the annotations in commit
> 9b0fcb441a44456c7b071c7cdaf90403f81ec05a

Thanks for these references.

> on new compilers -fexcess-precision=standard is used,
> but that turned out to do too many stores on the fdlibm
> code (which is why glibc kept using =fast), so in commit
> e216951f509b71da193da2fc63e25b998740d58b i started using
> float_t and double_t to get fast code in standard mode.
> (of course this made things worse for -ffloat-store).

This was the right thing to do, and I think it largely but not
entirely eliminates the need for caring about how the compiler handles
this, except in a few cases. It could probably be eliminated in more.
For example the argument reduction code cited above could use the
right constants for double_t rather than double to avoid the need to
store/load to drop excess precision.

> i think we would need to add back the old annotations
> to make old compilers safe without -ffloat-store.
> (fdlibm often raises fenv exceptions via a final rounding
> before return, those could be often handled more cleanly
> by __math_oflow etc helpers, but since it was not designed
> for inline errno handling some normal return paths can
> raise fp exceptions too and thus need eval_as_* annotation).

I think I asked you before, but from a standpoint of fenv stuff, I'm
confused why the eval_as_* things are useful at all; it looks like you
would need fp_barrier* to ensure they're actually evaluated (e.g. in
the presence of LTO with a compiler that doesn't honor fenv right).

But I think it's also useful to distinguish between possibility of
wrong exceptions being raised, which is a rather minor issue since
some widely-used compilers don't even support fenv reasonably at at
all, and the possibility of wrong values being returned for functions
where the result is required to be correctly rounded. I would deem it
a serious problem for sqrt[f] or fma[f] to return the wrong value when
compiled with gcc3 or pcc. I don't think I would particularly care if
exceptions failed to be raised properly when compiled with gcc3 or
pcc, though. So I probably would like to ensure that, whatever code we
end up with in i386 sqrt[f].c, it it ends up working even if the
compiler does not handle excess precision correctly.

Rich
Szabolcs Nagy Jan. 10, 2020, 9:17 a.m.
* Rich Felker <dalias@libc.org> [2020-01-09 21:07:47 -0500]:
> On Fri, Jan 10, 2020 at 12:18:58AM +0100, Szabolcs Nagy wrote:
> > i think -fexcess-precision=standard was introduced in
> > gcc 4.5 and to get reliable behaviour before that we
> > needed -ffloat-store.
> 
> I don't think the behavior was "reliable" with -ffloat-store; it's
> wrong with respect to the defined meaning of FLT_EVAL_METHOD.

well reliable in the sense that the results are more
consistent with other targets that have FLT_EVAL_METHOD==0.

> > i think we would need to add back the old annotations
> > to make old compilers safe without -ffloat-store.
> > (fdlibm often raises fenv exceptions via a final rounding
> > before return, those could be often handled more cleanly
> > by __math_oflow etc helpers, but since it was not designed
> > for inline errno handling some normal return paths can
> > raise fp exceptions too and thus need eval_as_* annotation).
> 
> I think I asked you before, but from a standpoint of fenv stuff, I'm
> confused why the eval_as_* things are useful at all; it looks like you
> would need fp_barrier* to ensure they're actually evaluated (e.g. in
> the presence of LTO with a compiler that doesn't honor fenv right).

if you mean the current definition of eval_as_* in musl,
then those are not useful (except in the middle of some
arithmetic expression or assignment to double_t etc
where c99 normally would not drop excess precision)

their point is to annotate where we need to drop excess
precision so we can define it appropriately for specific
targets (and allow all other places to use excess prec).

e.g. in case of 'return 0x1p999*0x1p999' to raise overflow
and return inf, we need fp barrier to avoid const folds and
eval_as_double to drop excess precision so it becomes:

  return eval_as_double(fp_barrier(0x1p999) * 0x1p999);

in principle fp_barrier need not drop excess precision so

  return fp_barrier(fp_barrier(0x1p999) * 0x1p999);

would not be enough, but i ended up using float and double
in fp_barrier instead of float_t and double_t so now it
drops excess precision too (may be a mistake, but we almost
always want to drop excess prec when forcing an evaluation).
separate eval_as_double is still useful since on most targets
it is a nop while fp_barrier is not a nop. (but e.g. i386
could define eval_as_double as fp_barrier)

the bigger problem with fp_barrier is that it just hides
a value (behind volatile or asm volatile) but it should be
forcing the mul operation, e.g. currently

  if (cond)
    return eval_as_double(fp_barrier(0x1p999) * 0x1p999);

may be transformed to

  if (cond)
    x = fp_barrier(0x1p999);
  y = x * 0x1p999;
  if (cond)
    return eval_as_double(y);

by a compiler that assumes mul has no side-effects and
then the mul is evaluated unconditionally (but such
transformation is unlikely so in practice the barrier
works).

i don't think there is a trick that allows us to force
the evaluation of an individual fp op, so for now i'm
happy with fp_barrier and eval_as_*.

> 
> But I think it's also useful to distinguish between possibility of
> wrong exceptions being raised, which is a rather minor issue since
> some widely-used compilers don't even support fenv reasonably at at
> all, and the possibility of wrong values being returned for functions
> where the result is required to be correctly rounded. I would deem it
> a serious problem for sqrt[f] or fma[f] to return the wrong value when
> compiled with gcc3 or pcc. I don't think I would particularly care if
> exceptions failed to be raised properly when compiled with gcc3 or
> pcc, though. So I probably would like to ensure that, whatever code we
> end up with in i386 sqrt[f].c, it it ends up working even if the
> compiler does not handle excess precision correctly.

note that rounding down to double to force an overflow
usually also forces an infinity and without the force
the result would be >DBL_MAX finite which then can cause
problems, just like sqrt with excess prec.

so most eval_as_double has the same importance: if any
of them is missed the result may be wrong.

i have fp_force_eval which has no return value and is
only used to force side-effects, if you don't care about
fenv then that may be defined as nop.
Alexander Monakov Jan. 14, 2020, 5:59 p.m.
On Thu, 9 Jan 2020, Szabolcs Nagy wrote:

> c99 required cast at return, but that was considered to be
> a bug so in c11 it's no longer required: return now must
> round

This is the interpretation that GCC uses (return statements
imply removal of excess precision), but I wonder where the
standard actually says that: afaics N1570 still has the same
footnote in 6.8.6.4 that ends with "A cast may be used to
remove this extra range and precision."

Alexander
Szabolcs Nagy Jan. 14, 2020, 6:47 p.m.
* Alexander Monakov <amonakov@ispras.ru> [2020-01-14 20:59:21 +0300]:
> On Thu, 9 Jan 2020, Szabolcs Nagy wrote:
> 
> > c99 required cast at return, but that was considered to be
> > a bug so in c11 it's no longer required: return now must
> > round
> 
> This is the interpretation that GCC uses (return statements
> imply removal of excess precision), but I wonder where the
> standard actually says that: afaics N1570 still has the same
> footnote in 6.8.6.4 that ends with "A cast may be used to
> remove this extra range and precision."

it's in annex f
http://port70.net/~nsz/c/c11/n1570.html#F.6