Message ID | CAJwJo6YLMu_f9NRMFEoiBzrW6QtZrdNd+9Qs4iJZ1TVFD5LJGA@mail.gmail.com |
---|---|
State | New |
Series | "zdtm/static/fd failure on aarch64" |
Headers | show |
From df5674071f7afcb0d16d2a50a96b25ba85e759f3 Mon Sep 17 00:00:00 2001 From: Dmitry Safonov <dima@arista.com> Date: Mon, 11 Jun 2018 15:44:07 +0100 Subject: [PATCH] zdtm/lib: Check EWOULDBLOCK in errno instead of -EWOULDBLOCK Syscalls do return negative value in case of an error. But errno contains the error code itself. Also check that result of atomic_inc() is not 1, which would mean that we've just acquired the mutex. And FUTEX_WAIT should be called against `c', not `c + 1', because it was already incremented by atomic_inc(). Signed-off-by: Dmitry Safonov <dima@arista.com> --- test/zdtm/lib/lock.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/test/zdtm/lib/lock.h b/test/zdtm/lib/lock.h index 972ed5769a9f..a00fab2b408f 100644 --- a/test/zdtm/lib/lock.h +++ b/test/zdtm/lib/lock.h @@ -142,11 +142,11 @@ static void inline mutex_lock(mutex_t *m) uint32_t c; int ret; - while ((c = atomic_inc(&m->raw))) { - ret = sys_futex(&m->raw, FUTEX_WAIT, c + 1, NULL, NULL, 0); + while ((c = atomic_inc(&m->raw)) != 1) { + ret = sys_futex(&m->raw, FUTEX_WAIT, c, NULL, NULL, 0); if (ret < 0) pr_perror("futex"); - BUG_ON(ret < 0 && errno != -EWOULDBLOCK); + BUG_ON(ret < 0 && errno != EWOULDBLOCK); } } -- 2.13.6
On Mon, Jun 11, 2018 at 04:41:17PM +0100, Dmitry Safonov wrote: > 2018-06-11 16:34 GMT+01:00 Adrian Reber <adrian@lisas.de>: > [...] > > No, still endless loop: > > > > 11:32:31.668: 4109: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable)) > > > > [pid 4109] futex(0xffff87000000, FUTEX_WAIT, 2264924161, NULL) = -1 EAGAIN (Resource temporarily unavailable) > > [pid 4109] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0 > > [pid 4109] write(2, "11:32:32.516: 4109: ERR: ../lib"..., 99) = 99 > > Ugh, I adore this piece of code. > The third time, lucky time? %) :( Sorry. Still: 11:54:16.950: 4210: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable)) [pid 4210] futex(0xffff9f650000, FUTEX_WAIT, 2674196480, NULL) = -1 EAGAIN (Resource temporarily unavailable) [pid 4210] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0 [pid 4210] write(2, "11:54:16.950: 4210: ERR: ../lib"..., 99) = 99 Adrian
2018-06-11 16:55 GMT+01:00 Adrian Reber <adrian@lisas.de>: > On Mon, Jun 11, 2018 at 04:41:17PM +0100, Dmitry Safonov wrote: >> 2018-06-11 16:34 GMT+01:00 Adrian Reber <adrian@lisas.de>: >> [...] >> > No, still endless loop: >> > >> > 11:32:31.668: 4109: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable)) >> > >> > [pid 4109] futex(0xffff87000000, FUTEX_WAIT, 2264924161, NULL) = -1 EAGAIN (Resource temporarily unavailable) >> > [pid 4109] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0 >> > [pid 4109] write(2, "11:32:32.516: 4109: ERR: ../lib"..., 99) = 99 >> >> Ugh, I adore this piece of code. >> The third time, lucky time? %) > > :( Sorry. Still: > > 11:54:16.950: 4210: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable)) > > [pid 4210] futex(0xffff9f650000, FUTEX_WAIT, 2674196480, NULL) = -1 EAGAIN (Resource temporarily unavailable) > [pid 4210] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0 > [pid 4210] write(2, "11:54:16.950: 4210: ERR: ../lib"..., 99) = 99 Heh, could you attach 20-30 lines after the first futex(..., FUTEX_WAIT, 2,...) call in the endless loop? I do see that there is a possible race like this: futex val CPU0/task0 CPU1/task1 --------------- --------------- 1 atomic_inc()=2 atomic_inc()=3 3 futex(FUTEX_WAIT, 2) 3 EWOULDBLOCK 3 atomic_inc()=4 4 futex(FUTEX_WAIT, 3) 4 EWOULDBLOCK ... But it looks quite unlikely to reproduce each cycle and I'm not sure it's the problem you're observing. Thanks, Dmitry
On Mon, Jun 11, 2018 at 05:59:51PM +0100, Dmitry Safonov wrote: > 2018-06-11 16:55 GMT+01:00 Adrian Reber <adrian@lisas.de>: > > On Mon, Jun 11, 2018 at 04:41:17PM +0100, Dmitry Safonov wrote: > >> 2018-06-11 16:34 GMT+01:00 Adrian Reber <adrian@lisas.de>: > >> [...] > >> > No, still endless loop: > >> > > >> > 11:32:31.668: 4109: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable)) > >> > > >> > [pid 4109] futex(0xffff87000000, FUTEX_WAIT, 2264924161, NULL) = -1 EAGAIN (Resource temporarily unavailable) > >> > [pid 4109] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0 > >> > [pid 4109] write(2, "11:32:32.516: 4109: ERR: ../lib"..., 99) = 99 > >> > >> Ugh, I adore this piece of code. > >> The third time, lucky time? %) > > > > :( Sorry. Still: > > > > 11:54:16.950: 4210: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable)) > > > > [pid 4210] futex(0xffff9f650000, FUTEX_WAIT, 2674196480, NULL) = -1 EAGAIN (Resource temporarily unavailable) > > [pid 4210] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0 > > [pid 4210] write(2, "11:54:16.950: 4210: ERR: ../lib"..., 99) = 99 > > Heh, could you attach 20-30 lines after the first > futex(..., FUTEX_WAIT, 2,...) call in the endless loop? 4508 dup3(4, 0, 0) = 0 4508 close(4) = 0 4508 mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, 0, 0) = 0xffffa14a0000 4508 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xffffa1502dd0) = 4509 4508 futex(0xffffa14a0000, FUTEX_WAIT, 0, NULL <unfinished ...> 4509 setsid() = 4509 4509 rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTART}, NULL, 8) = 0 4509 mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0xffffa1300000 4509 futex(0xffffa1300000, FUTEX_WAIT, 2704277504, NULL) = -1 EAGAIN (Resource temporarily unavailable) 4509 brk(NULL) = 0x1f8d0000 4509 brk(0x1f900000) = 0x1f900000 4509 brk(NULL) = 0x1f900000 4509 openat(AT_FDCWD, "/etc/localtime", O_RDONLY|O_CLOEXEC) = 4 4509 fstat(4, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0 4509 fstat(4, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0 4509 mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffffa12f0000 4509 read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"..., 8192) = 3519 4509 lseek(4, -2252, SEEK_CUR) = 1267 4509 read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\5\0\0\0\0"..., 8192) = 2252 4509 close(4) = 0 4509 munmap(0xffffa12f0000, 65536) = 0 4509 write(2, "14:41:27.635: 4509: ERR: ../lib"..., 99) = 99 4509 futex(0xffffa1300000, FUTEX_WAIT, 2704277504, NULL) = -1 EAGAIN (Resource temporarily unavailable) 4509 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0 4509 write(2, "14:41:27.636: 4509: ERR: ../lib"..., 99) = 99 4509 futex(0xffffa1300000, FUTEX_WAIT, 2704277504, NULL) = -1 EAGAIN (Resource temporarily unavailable) 4509 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0 4509 write(2, "14:41:27.636: 4509: ERR: ../lib"..., 99) = 99 > I do see that there is a possible race like this: > > futex val CPU0/task0 CPU1/task1 > --------------- --------------- > 1 atomic_inc()=2 atomic_inc()=3 > 3 futex(FUTEX_WAIT, 2) > 3 EWOULDBLOCK > 3 atomic_inc()=4 > 4 > futex(FUTEX_WAIT, 3) > 4 EWOULDBLOCK > ... > > But it looks quite unlikely to reproduce each cycle and I'm not sure > it's the problem you're observing. > > Thanks, > Dmitry Adrian