An addition to the math subtree

Submitted by Stefan Kanthak on Dec. 10, 2019, 4:58 p.m.

Details

Message ID A82F83FD339842C19EA53090EDAB3E11@H270
State New
Series "An addition to the math subtree"

Commit Message

Stefan Kanthak Dec. 10, 2019, 4:58 p.m.
Optimised implementations of copysign() for i386

JFTR: I'm NOT subscribed to your mailing list, so CC: me in replies!

Patch

--- -/dev/null
+++ +/src/math/i386/copysign.S
@@ -0,0 +1,26 @@ 
+.global copysignf
+.type copysignf,@function
+copysignf:
+        shlb $1,4+3(%esp)
+        shlb $1,8+3(%esp)
+        rcrb $1,4+3(%esp)
+        flds 4(%esp)
+        ret
+
+.global copysignl
+.type copysignl,@function
+copysignl:
+        shlb $1,4+9(%esp)
+        shlb $1,16+9(%esp)
+        rcrb $1,4+9(%esp)
+        fldt 4(%esp)
+        ret
+
+.global copysign
+.type copysign,@function
+copysign:
+        shlb $1,4+7(%esp)
+        shlb $1,12+7(%esp)
+        rcrb $1,4+7(%esp)
+        fldl 4(%esp)
+        ret
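
For readers less familiar with the carry-flag idiom used in all three routines, the following is a rough C emulation of the shlb/shlb/rcrb sequence for the float case. It is only a sketch: the names transplant_sign_byte and copysignf_sketch are made up for illustration and are not part of the patch, and the code assumes a little-endian target with 32-bit IEEE floats, so that byte 3 of the value carries the sign bit (the byte addressed as 4+3(%esp) in the assembly).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: emulates shlb/shlb/rcrb on the top bytes of x and y. */
static uint8_t transplant_sign_byte(uint8_t x_top, uint8_t y_top)
{
    unsigned carry;

    /* shlb $1 on x: the sign bit of x is shifted out, bit 0 becomes 0. */
    x_top = (uint8_t)(x_top << 1);

    /* shlb $1 on y: the carry flag now holds the sign bit of y. */
    carry = (y_top >> 7) & 1u;

    /* rcrb $1 on x: rotate right through carry, so carry enters bit 7
     * and the remaining seven bits return to their original positions. */
    return (uint8_t)((x_top >> 1) | (carry << 7));
}

/* Hypothetical name; mirrors what the copysignf routine computes. */
float copysignf_sketch(float x, float y)
{
    uint8_t xb[sizeof x], yb[sizeof y];

    assert(sizeof x == 4); /* sketch assumes a 32-bit float */

    memcpy(xb, &x, sizeof x);
    memcpy(yb, &y, sizeof y);

    /* Little-endian assumption: byte 3 is the most significant byte. */
    xb[3] = transplant_sign_byte(xb[3], yb[3]);

    memcpy(&x, xb, sizeof x);
    return x;
}
```

The double and long double routines follow the same pattern, with the sign held in byte 7 and byte 9 of the argument respectively.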

Comments

Szabolcs Nagy Dec. 11, 2019, 9:48 a.m.
* Stefan Kanthak <stefan.kanthak@nexgo.de> [2019-12-10 17:58:40 +0100]:
> Optimised implementations of copysign() for i386

note that in most user code gcc would inline
copysign calls instead of calling into libc.

when it is a call into libc then it won't be
fast no matter what because of the call overhead
(spilling registers).

so libc copysign is not performance critical
(except currently musl code itself calls it
but that can be fixed)
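
As an illustration of the kind of code that needs no call at all, here is a minimal C sketch of a mask-based sign transfer for double. It is not musl's actual code and not necessarily what gcc emits; the name copysign_inline is hypothetical, and the sketch assumes a 64-bit IEEE double whose sign is the top bit of its 64-bit representation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical inline helper: magnitude of x combined with the sign of y. */
static inline double copysign_inline(double x, double y)
{
    const uint64_t sign = UINT64_C(1) << 63; /* assumed sign-bit position */
    uint64_t xb, yb;

    assert(sizeof x == sizeof xb); /* sketch assumes a 64-bit double */

    /* memcpy is the portable way to reinterpret the bits. */
    memcpy(&xb, &x, sizeof xb);
    memcpy(&yb, &y, sizeof yb);

    /* Clear the sign of x, then copy in the sign of y. */
    xb = (xb & ~sign) | (yb & sign);

    memcpy(&x, &xb, sizeof x);
    return x;
}
```

A helper of this shape can be expanded at the call site, which avoids the call overhead mentioned above.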

i386 asm is also not very interesting (it introduces
maintenance burden and not many users care about it)