zdtm/static/fd failure on aarch64

Submitted by Dmitry Safonov on June 11, 2018, 3:41 p.m.

Details

Message ID CAJwJo6YLMu_f9NRMFEoiBzrW6QtZrdNd+9Qs4iJZ1TVFD5LJGA@mail.gmail.com
State New
Series "zdtm/static/fd failure on aarch64"
Headers show

Commit Message

Dmitry Safonov June 11, 2018, 3:41 p.m.
2018-06-11 16:34 GMT+01:00 Adrian Reber <adrian@lisas.de>:
[...]
> No, still endless loop:
>
> 11:32:31.668:  4109: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable))
>
> [pid  4109] futex(0xffff87000000, FUTEX_WAIT, 2264924161, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> [pid  4109] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0
> [pid  4109] write(2, "11:32:32.516:  4109: ERR: ../lib"..., 99) = 99

Ugh, I adore this piece of code.
The third time, lucky time? %)

Thanks,
             Dmitry

Patch hide | download patch | download mbox

From df5674071f7afcb0d16d2a50a96b25ba85e759f3 Mon Sep 17 00:00:00 2001
From: Dmitry Safonov <dima@arista.com>
Date: Mon, 11 Jun 2018 15:44:07 +0100
Subject: [PATCH] zdtm/lib: Check EWOULDBLOCK in errno instead of -EWOULDBLOCK

Syscalls do return negative value in case of an error.
But errno contains the error code itself.

Also check that result of atomic_inc() is not 1, which
would mean that we've just acquired the mutex.

And FUTEX_WAIT should be called against `c', not `c + 1',
because it was already incremented by atomic_inc().

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 test/zdtm/lib/lock.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/test/zdtm/lib/lock.h b/test/zdtm/lib/lock.h
index 972ed5769a9f..a00fab2b408f 100644
--- a/test/zdtm/lib/lock.h
+++ b/test/zdtm/lib/lock.h
@@ -142,11 +142,11 @@  static void inline mutex_lock(mutex_t *m)
 	uint32_t c;
 	int ret;
 
-	while ((c = atomic_inc(&m->raw))) {
-		ret = sys_futex(&m->raw, FUTEX_WAIT, c + 1, NULL, NULL, 0);
+	while ((c = atomic_inc(&m->raw)) != 1) {
+		ret = sys_futex(&m->raw, FUTEX_WAIT, c, NULL, NULL, 0);
 		if (ret < 0)
 			pr_perror("futex");
-		BUG_ON(ret < 0 && errno != -EWOULDBLOCK);
+		BUG_ON(ret < 0 && errno != EWOULDBLOCK);
 	}
 }
 
-- 
2.13.6


Comments

Adrian Reber June 11, 2018, 3:55 p.m.
On Mon, Jun 11, 2018 at 04:41:17PM +0100, Dmitry Safonov wrote:
> 2018-06-11 16:34 GMT+01:00 Adrian Reber <adrian@lisas.de>:
> [...]
> > No, still endless loop:
> >
> > 11:32:31.668:  4109: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable))
> >
> > [pid  4109] futex(0xffff87000000, FUTEX_WAIT, 2264924161, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > [pid  4109] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0
> > [pid  4109] write(2, "11:32:32.516:  4109: ERR: ../lib"..., 99) = 99
> 
> Ugh, I adore this piece of code.
> The third time, lucky time? %)

:( Sorry. Still:

11:54:16.950:  4210: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable))

[pid  4210] futex(0xffff9f650000, FUTEX_WAIT, 2674196480, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid  4210] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0
[pid  4210] write(2, "11:54:16.950:  4210: ERR: ../lib"..., 99) = 99

		Adrian
Dmitry Safonov June 11, 2018, 4:59 p.m.
2018-06-11 16:55 GMT+01:00 Adrian Reber <adrian@lisas.de>:
> On Mon, Jun 11, 2018 at 04:41:17PM +0100, Dmitry Safonov wrote:
>> 2018-06-11 16:34 GMT+01:00 Adrian Reber <adrian@lisas.de>:
>> [...]
>> > No, still endless loop:
>> >
>> > 11:32:31.668:  4109: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable))
>> >
>> > [pid  4109] futex(0xffff87000000, FUTEX_WAIT, 2264924161, NULL) = -1 EAGAIN (Resource temporarily unavailable)
>> > [pid  4109] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0
>> > [pid  4109] write(2, "11:32:32.516:  4109: ERR: ../lib"..., 99) = 99
>>
>> Ugh, I adore this piece of code.
>> The third time, lucky time? %)
>
> :( Sorry. Still:
>
> 11:54:16.950:  4210: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable))
>
> [pid  4210] futex(0xffff9f650000, FUTEX_WAIT, 2674196480, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> [pid  4210] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0
> [pid  4210] write(2, "11:54:16.950:  4210: ERR: ../lib"..., 99) = 99

Heh, could you attach 20-30 lines after the first
futex(..., FUTEX_WAIT, 2,...) call in the endless loop?

I do see that there is a possible race like this:

futex val           CPU0/task0                      CPU1/task1
                        ---------------                      ---------------
1                      atomic_inc()=2                  atomic_inc()=3
3                      futex(FUTEX_WAIT, 2)
3                      EWOULDBLOCK
3                      atomic_inc()=4
4
futex(FUTEX_WAIT, 3)
4                                                               EWOULDBLOCK
...

But it looks quite unlikely to reproduce each cycle and I'm not sure
it's the problem you're observing.

Thanks,
             Dmitry
Adrian Reber June 11, 2018, 6:42 p.m.
On Mon, Jun 11, 2018 at 05:59:51PM +0100, Dmitry Safonov wrote:
> 2018-06-11 16:55 GMT+01:00 Adrian Reber <adrian@lisas.de>:
> > On Mon, Jun 11, 2018 at 04:41:17PM +0100, Dmitry Safonov wrote:
> >> 2018-06-11 16:34 GMT+01:00 Adrian Reber <adrian@lisas.de>:
> >> [...]
> >> > No, still endless loop:
> >> >
> >> > 11:32:31.668:  4109: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable))
> >> >
> >> > [pid  4109] futex(0xffff87000000, FUTEX_WAIT, 2264924161, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> >> > [pid  4109] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0
> >> > [pid  4109] write(2, "11:32:32.516:  4109: ERR: ../lib"..., 99) = 99
> >>
> >> Ugh, I adore this piece of code.
> >> The third time, lucky time? %)
> >
> > :( Sorry. Still:
> >
> > 11:54:16.950:  4210: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable))
> >
> > [pid  4210] futex(0xffff9f650000, FUTEX_WAIT, 2674196480, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > [pid  4210] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0
> > [pid  4210] write(2, "11:54:16.950:  4210: ERR: ../lib"..., 99) = 99
> 
> Heh, could you attach 20-30 lines after the first
> futex(..., FUTEX_WAIT, 2,...) call in the endless loop?

4508  dup3(4, 0, 0)                     = 0
4508  close(4)                          = 0
4508  mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, 0, 0) = 0xffffa14a0000
4508  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xffffa1502dd0) = 4509
4508  futex(0xffffa14a0000, FUTEX_WAIT, 0, NULL <unfinished ...>
4509  setsid()                          = 4509
4509  rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTART}, NULL, 8) = 0
4509  mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0xffffa1300000
4509  futex(0xffffa1300000, FUTEX_WAIT, 2704277504, NULL) = -1 EAGAIN (Resource temporarily unavailable)
4509  brk(NULL)                         = 0x1f8d0000
4509  brk(0x1f900000)                   = 0x1f900000
4509  brk(NULL)                         = 0x1f900000
4509  openat(AT_FDCWD, "/etc/localtime", O_RDONLY|O_CLOEXEC) = 4
4509  fstat(4, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
4509  fstat(4, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
4509  mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffffa12f0000
4509  read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"..., 8192) = 3519
4509  lseek(4, -2252, SEEK_CUR)         = 1267
4509  read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\5\0\0\0\0"..., 8192) = 2252
4509  close(4)                          = 0
4509  munmap(0xffffa12f0000, 65536)     = 0
4509  write(2, "14:41:27.635:  4509: ERR: ../lib"..., 99) = 99
4509  futex(0xffffa1300000, FUTEX_WAIT, 2704277504, NULL) = -1 EAGAIN (Resource temporarily unavailable)
4509  newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0
4509  write(2, "14:41:27.636:  4509: ERR: ../lib"..., 99) = 99
4509  futex(0xffffa1300000, FUTEX_WAIT, 2704277504, NULL) = -1 EAGAIN (Resource temporarily unavailable)
4509  newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0
4509  write(2, "14:41:27.636:  4509: ERR: ../lib"..., 99) = 99

> I do see that there is a possible race like this:
> 
> futex val           CPU0/task0                      CPU1/task1
>                         ---------------                      ---------------
> 1                      atomic_inc()=2                  atomic_inc()=3
> 3                      futex(FUTEX_WAIT, 2)
> 3                      EWOULDBLOCK
> 3                      atomic_inc()=4
> 4
> futex(FUTEX_WAIT, 3)
> 4                                                               EWOULDBLOCK
> ...
> 
> But it looks quite unlikely to reproduce each cycle and I'm not sure
> it's the problem you're observing.
> 
> Thanks,
>              Dmitry

		Adrian