x86: optimize fp_arch.h

Submitted by Szabolcs Nagy on April 24, 2019, 11:51 p.m.


Message ID 20190424235106.GH26605@port70.net
State New
Series "x86: optimize fp_arch.h"

Commit Message

Szabolcs Nagy April 24, 2019, 11:51 p.m.
tested on x86_64 and i386


From 5f97370ff3e94bea812ec123a31d7482965a3b1b Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Wed, 24 Apr 2019 23:29:05 +0000
Subject: [PATCH] x86: optimize fp_arch.h

Use fp register constraint instead of volatile store when sse2 math is
available, and use memory constraint when only x87 fpu is available.
---
 arch/i386/fp_arch.h   | 31 +++++++++++++++++++++++++++++++
 arch/x32/fp_arch.h    | 25 +++++++++++++++++++++++++
 arch/x86_64/fp_arch.h | 25 +++++++++++++++++++++++++
 3 files changed, 81 insertions(+)
 create mode 100644 arch/i386/fp_arch.h
 create mode 100644 arch/x32/fp_arch.h
 create mode 100644 arch/x86_64/fp_arch.h

diff --git a/arch/i386/fp_arch.h b/arch/i386/fp_arch.h
new file mode 100644
index 00000000..b4019de2
--- /dev/null
+++ b/arch/i386/fp_arch.h
@@ -0,0 +1,31 @@ 
+#ifdef __SSE2_MATH__
+#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+x"(x))
+#else
+#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+m"(x))
+#endif
+
+#define fp_barrierf fp_barrierf
+static inline float fp_barrierf(float x)
+{
+	FP_BARRIER(x);
+	return x;
+}
+
+#define fp_barrier fp_barrier
+static inline double fp_barrier(double x)
+{
+	FP_BARRIER(x);
+	return x;
+}
+
+#define fp_force_evalf fp_force_evalf
+static inline void fp_force_evalf(float x)
+{
+	FP_BARRIER(x);
+}
+
+#define fp_force_eval fp_force_eval
+static inline void fp_force_eval(double x)
+{
+	FP_BARRIER(x);
+}
diff --git a/arch/x32/fp_arch.h b/arch/x32/fp_arch.h
new file mode 100644
index 00000000..ff9b8311
--- /dev/null
+++ b/arch/x32/fp_arch.h
@@ -0,0 +1,25 @@ 
+#define fp_barrierf fp_barrierf
+static inline float fp_barrierf(float x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+	return x;
+}
+
+#define fp_barrier fp_barrier
+static inline double fp_barrier(double x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+	return x;
+}
+
+#define fp_force_evalf fp_force_evalf
+static inline void fp_force_evalf(float x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+}
+
+#define fp_force_eval fp_force_eval
+static inline void fp_force_eval(double x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+}
diff --git a/arch/x86_64/fp_arch.h b/arch/x86_64/fp_arch.h
new file mode 100644
index 00000000..ff9b8311
--- /dev/null
+++ b/arch/x86_64/fp_arch.h
@@ -0,0 +1,25 @@ 
+#define fp_barrierf fp_barrierf
+static inline float fp_barrierf(float x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+	return x;
+}
+
+#define fp_barrier fp_barrier
+static inline double fp_barrier(double x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+	return x;
+}
+
+#define fp_force_evalf fp_force_evalf
+static inline void fp_force_evalf(float x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+}
+
+#define fp_force_eval fp_force_eval
+static inline void fp_force_eval(double x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+}
-- 
2.21.0


Comments

Rich Felker April 25, 2019, 2:01 a.m.
On Thu, Apr 25, 2019 at 01:51:06AM +0200, Szabolcs Nagy wrote:
> [...]
> @@ -0,0 +1,31 @@
> +#ifdef __SSE2_MATH__
> +#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+x"(x))
> +#else
> +#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+m"(x))
> +#endif

I guess for float and double you need the "m" constraint to ensure
that a broken compiler doesn't skip dropping of precision (although I
still wish we didn't bother with complexity to support that, and just
relied on cast working correctly), but at least for long double
couldn't we use an x87 register constraint to avoid the spill to
memory?

Rich
Szabolcs Nagy April 25, 2019, 8:53 a.m.
* Rich Felker <dalias@libc.org> [2019-04-24 22:01:08 -0400]:
> On Thu, Apr 25, 2019 at 01:51:06AM +0200, Szabolcs Nagy wrote:
> > [...]
> > +#ifdef __SSE2_MATH__
> > +#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+x"(x))
> > +#else
> > +#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+m"(x))
> > +#endif
> 
> I guess for float and double you need the "m" constraint to ensure
> that a broken compiler doesn't skip dropping of precision (although I
> still wish we didn't bother with complexity to support that, and just
> relied on cast working correctly), but at least for long double
> couldn't we use an x87 register constraint to avoid the spill to
> memory?

i think fp_barrier does not have to drop excess precision:
it is supposed to be an identity op that is hidden from
the compiler e.g. to prevent const folding or hoisting,
but fp_force_eval is used to force side-effects that may only
happen if the excess precision is dropped.
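the distinction can be sketched as follows (helper names follow the patch; raise_inexact_example is a hypothetical use site, not code from the patch):

```c
/* fp_barrierf: an identity op the compiler cannot see through, used
 * to block constant folding or hoisting; it need not drop excess
 * precision. */
static inline float fp_barrierf(float x)
{
	__asm__ __volatile__ ("" : "+m"(x));
	return x;
}

/* fp_force_evalf: forces the argument to actually be evaluated for
 * its side effects (e.g. raising the inexact flag). The "+m"
 * constraint round-trips the value through memory, so any x87
 * excess precision is dropped. */
static inline void fp_force_evalf(float x)
{
	__asm__ __volatile__ ("" : "+m"(x));
}

/* Hypothetical use site: the addition exists only to raise
 * FE_INEXACT; the barrier on tiny prevents compile-time folding,
 * and fp_force_evalf keeps the otherwise-dead add from being
 * deleted. */
static float raise_inexact_example(void)
{
	float tiny = 0x1p-120f;
	fp_force_evalf(1.0f + fp_barrierf(tiny));
	return tiny;
}
```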

i think modern gcc drops excess precision at arg passing
in standard mode, so "+m" is not needed, but makes the code
behave the same in non-standard mode too.

and yes the long double version could use "+t", maybe i should
add that (the patch saves about 400 bytes of .text by avoiding
the volatile load/store overhead).
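a "+t" long double barrier might look like this (a sketch of the suggestion above, not part of the posted patch; "t" is gcc's constraint for the top of the x87 register stack):

```c
/* Hypothetical long double barrier using the x87 top-of-stack
 * register constraint "+t", keeping the value in an x87 register
 * instead of spilling it to memory as "+m" would. x86-only. */
static inline long double fp_barrierl(long double x)
{
	__asm__ __volatile__ ("" : "+t"(x));
	return x;
}
```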