[PATCHv8,02/34] lib/vdso: make do_hres and do_coarse as __always_inline

Submitted by Jann Horn via Containers on Nov. 12, 2019, 1:26 a.m.

Details

Message ID 20191112012724.250792-3-dima@arista.com
State New
Series "kernel: Introduce Time Namespace"

Commit Message

Jann Horn via Containers Nov. 12, 2019, 1:26 a.m.
From: Andrei Vagin <avagin@gmail.com>

Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
(number of clock_gettime() iterations completed; higher is better):

clock            | before     | after      | diff
----------------------------------------------------------
monotonic        |  153222105 |  166775025 | 8.8%
monotonic-coarse |  671557054 |  691513017 | 3.0%
monotonic-raw    |  147116067 |  161057395 | 9.5%
boottime         |  153446224 |  166962668 | 8.8%

The improvement for arm64 for monotonic and boottime is around 3.5%.

clock            | before     | after      | diff
----------------------------------------------------------
monotonic        |   17326692 |   17951770 | 3.6%
monotonic-coarse |   43624027 |   44215292 | 1.3%
monotonic-raw    |   17541809 |   17554932 | 0.1%
boottime         |   17334982 |   17954361 | 3.5%
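The commit message does not show how these counts were collected. As a minimal sketch, assuming the metric is the number of clock_gettime() calls that complete within a fixed time window, a loop like the one below yields numbers of this kind. `count_gettime_calls()`, `pct_diff()`, the batch size of 1000 and the window length are hypothetical choices, not the authors' harness, and absolute counts will vary with CPU, kernel and whether the call is served by the vDSO.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc: exposes CLOCK_MONOTONIC_RAW, CLOCK_BOOTTIME, ... */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: count clock_gettime(clk) calls that complete within
 * roughly `seconds` of CLOCK_MONOTONIC time. Higher counts mean cheaper calls.
 */
static uint64_t count_gettime_calls(clockid_t clk, double seconds)
{
	struct timespec start, now, ts;
	uint64_t calls = 0;
	double elapsed;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		/* Call in batches so the deadline check stays cheap. */
		for (int i = 0; i < 1000; i++) {
			clock_gettime(clk, &ts);
			calls++;
		}
		clock_gettime(CLOCK_MONOTONIC, &now);
		elapsed = (double)(now.tv_sec - start.tv_sec) +
			  (double)(now.tv_nsec - start.tv_nsec) / 1e9;
	} while (elapsed < seconds);

	return calls;
}

/* Relative change in percent, like the "diff" column in the tables. */
static double pct_diff(uint64_t before, uint64_t after)
{
	return 100.0 * ((double)after - (double)before) / (double)before;
}
```

For example, `pct_diff(153222105, 166775025)` evaluates to about 8.85, which matches the 8.8% reported for monotonic on x86.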

Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 lib/vdso/gettimeofday.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 45f57fd2db64..9923e1eab9db 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -38,7 +38,7 @@  u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
 }
 #endif
 
-static int do_hres(const struct vdso_data *vd, clockid_t clk,
+static __always_inline int do_hres(const struct vdso_data *vd, clockid_t clk,
 		   struct __kernel_timespec *ts)
 {
 	const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
@@ -68,7 +68,7 @@  static int do_hres(const struct vdso_data *vd, clockid_t clk,
 	return 0;
 }
 
-static void do_coarse(const struct vdso_data *vd, clockid_t clk,
+static __always_inline void do_coarse(const struct vdso_data *vd, clockid_t clk,
 		      struct __kernel_timespec *ts)
 {
 	const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
@@ -97,12 +97,16 @@  __cvdso_clock_gettime_common(clockid_t clock, struct __kernel_timespec *ts)
 	 */
 	msk = 1U << clock;
 	if (likely(msk & VDSO_HRES)) {
-		return do_hres(&vd[CS_HRES_COARSE], clock, ts);
+		vd = &vd[CS_HRES_COARSE];
+out_hres:
+		return do_hres(vd, clock, ts);
 	} else if (msk & VDSO_COARSE) {
 		do_coarse(&vd[CS_HRES_COARSE], clock, ts);
 		return 0;
 	} else if (msk & VDSO_RAW) {
-		return do_hres(&vd[CS_RAW], clock, ts);
+		vd = &vd[CS_RAW];
+		/* goto allows to avoid extra inlining of do_hres. */
+		goto out_hres;
 	}
 	return -1;
 }
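To illustrate the control flow the patch introduces, here is a user-space toy model. It is a sketch under stated assumptions: the types, the `toy_` names and `TOY_ALWAYS_INLINE` are hypothetical, the COARSE branch is omitted, and `toy_do_hres()` only copies two fields instead of reading a clocksource. It shows why the goto matters once `do_hres()` is `__always_inline`: two direct calls would force two expanded copies, while here both branches share one call site.

```c
#include <assert.h>
#include <stdint.h>

/* Same expansion as the kernel's __always_inline on GCC/Clang. */
#define TOY_ALWAYS_INLINE inline __attribute__((__always_inline__))

/* Hypothetical, heavily simplified stand-ins for the vDSO types. */
enum { CS_HRES_COARSE, CS_RAW, CS_BASES };

struct toy_vdso_data {
	int64_t sec;
	long nsec;
};

struct toy_ts {
	int64_t tv_sec;
	long tv_nsec;
};

/* Linux clock IDs: CLOCK_MONOTONIC == 1, CLOCK_MONOTONIC_RAW == 4. */
#define TOY_HRES (1U << 1)
#define TOY_RAW  (1U << 4)

/* Forced inline: expanded at every call site, so each call adds a copy. */
static TOY_ALWAYS_INLINE int toy_do_hres(const struct toy_vdso_data *vd,
					 struct toy_ts *ts)
{
	ts->tv_sec = vd->sec;
	ts->tv_nsec = vd->nsec;
	return 0;
}

/*
 * Same shape as the patched __cvdso_clock_gettime_common(): the RAW branch
 * only selects its base and jumps to the single call site at out_hres,
 * so toy_do_hres() is expanded once rather than twice.
 */
static int toy_clock_gettime(const struct toy_vdso_data *vd,
			     unsigned int clock, struct toy_ts *ts)
{
	uint32_t msk = 1U << clock;

	if (msk & TOY_HRES) {
		vd = &vd[CS_HRES_COARSE];
out_hres:
		return toy_do_hres(vd, ts);
	} else if (msk & TOY_RAW) {
		vd = &vd[CS_RAW];
		goto out_hres;
	}
	return -1;
}
```

Calling `toy_clock_gettime(vd, 4, &ts)` takes the RAW branch and returns the values stored in `vd[CS_RAW]`; an unsupported clock ID returns -1.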

Comments

Vincenzo Frascino Jan. 10, 2020, 9:45 a.m.
On 11/12/19 1:26 AM, Dmitry Safonov wrote:
> +		vd = &vd[CS_HRES_COARSE];
> +out_hres:
> +		return do_hres(vd, clock, ts);
>  	} else if (msk & VDSO_COARSE) {
>  		do_coarse(&vd[CS_HRES_COARSE], clock, ts);
>  		return 0;
>  	} else if (msk & VDSO_RAW) {
> -		return do_hres(&vd[CS_RAW], clock, ts);
> +		vd = &vd[CS_RAW];
> +		/* goto allows to avoid extra inlining of do_hres. */
> +		goto out_hres;

What is the performance impact of "goto out_hres"?
Thomas Gleixner Jan. 10, 2020, 11:42 a.m.
Vincenzo Frascino <vincenzo.frascino@arm.com> writes:
> On 11/12/19 1:26 AM, Dmitry Safonov wrote:
>> +		vd = &vd[CS_HRES_COARSE];
>> +out_hres:
>> +		return do_hres(vd, clock, ts);
>>  	} else if (msk & VDSO_COARSE) {
>>  		do_coarse(&vd[CS_HRES_COARSE], clock, ts);
>>  		return 0;
>>  	} else if (msk & VDSO_RAW) {
>> -		return do_hres(&vd[CS_RAW], clock, ts);
>> +		vd = &vd[CS_RAW];
>> +		/* goto allows to avoid extra inlining of do_hres. */
>> +		goto out_hres;
>
> What is the performance impact of "goto out_hres"?

On x86 it's invisible at least in my limited testing.

Thanks,

        tglx
Vincenzo Frascino Jan. 10, 2020, 11:47 a.m.
On 1/10/20 11:42 AM, Thomas Gleixner wrote:
> Vincenzo Frascino <vincenzo.frascino@arm.com> writes:
>> On 11/12/19 1:26 AM, Dmitry Safonov wrote:
>>> +		vd = &vd[CS_HRES_COARSE];
>>> +out_hres:
>>> +		return do_hres(vd, clock, ts);
>>>  	} else if (msk & VDSO_COARSE) {
>>>  		do_coarse(&vd[CS_HRES_COARSE], clock, ts);
>>>  		return 0;
>>>  	} else if (msk & VDSO_RAW) {
>>> -		return do_hres(&vd[CS_RAW], clock, ts);
>>> +		vd = &vd[CS_RAW];
>>> +		/* goto allows to avoid extra inlining of do_hres. */
>>> +		goto out_hres;
>>
>> What is the performance impact of "goto out_hres"?
> 
> On x86 it's invisible at least in my limited testing.

On arm64 as well, based on my testing. Shall we keep the code more readable
here (without the goto)?

> 
> Thanks,
> 
>         tglx
>
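The goto-free alternative raised here can be sketched in the same kind of hypothetical toy model (the `toy_` names, `TOY_ALWAYS_INLINE` and the simplified types are assumptions, and the COARSE branch is omitted): select the base first, then make a single call, so the forced-inline helper still has exactly one call site. Whether a given compiler emits identical code for this shape and the goto shape is not guaranteed; that is what the measurements in this thread check.

```c
#include <assert.h>
#include <stdint.h>

/* Same expansion as the kernel's __always_inline on GCC/Clang. */
#define TOY_ALWAYS_INLINE inline __attribute__((__always_inline__))

/* Hypothetical, heavily simplified stand-ins for the vDSO types. */
enum { CS_HRES_COARSE, CS_RAW, CS_BASES };

struct toy_vdso_data {
	int64_t sec;
	long nsec;
};

struct toy_ts {
	int64_t tv_sec;
	long tv_nsec;
};

#define TOY_HRES (1U << 1) /* CLOCK_MONOTONIC */
#define TOY_RAW  (1U << 4) /* CLOCK_MONOTONIC_RAW */

static TOY_ALWAYS_INLINE int toy_do_hres(const struct toy_vdso_data *vd,
					 struct toy_ts *ts)
{
	ts->tv_sec = vd->sec;
	ts->tv_nsec = vd->nsec;
	return 0;
}

/* goto-free variant: pick the base, then fall through to the one call site. */
static int toy_clock_gettime_nogoto(const struct toy_vdso_data *vd,
				    unsigned int clock, struct toy_ts *ts)
{
	uint32_t msk = 1U << clock;

	if (msk & TOY_HRES)
		vd = &vd[CS_HRES_COARSE];
	else if (msk & TOY_RAW)
		vd = &vd[CS_RAW];
	else
		return -1;

	return toy_do_hres(vd, ts);
}
```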