Message ID | 154028886677.8068.6960631900459085731.stgit@localhost.localdomain |
---|---|
State | New |
Series | "fuse kio: Add comment about why we need to wait pending read requests" |
Headers | show |
diff --git a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c index 3940d6c255ba..9b45fcbf8941 100644 --- a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c +++ b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c @@ -1081,7 +1081,12 @@ static void pcs_kio_setattr_handle(struct fuse_inode *fi, struct fuse_req *req) r->end = req->end; if (di->size.op == PCS_SIZE_SHRINK) { BUG_ON(!mutex_is_locked(&req->io_inode->i_mutex)); - /* wait for aio reads in flight */ + /* + * Wait for already submitted aio reads. Further reads + * (including already queued to bg_queue) will be stopped + * by wait_shrink(), and they will be processed from + * _pcs_shrink_end(). + */ inode_dio_wait(req->io_inode); /* * Writebackcache was flushed already so it is safe to
On 23.10.2018 13:01, Kirill Tkhai wrote: > Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> > --- > fs/fuse/kio/pcs/pcs_fuse_kdirect.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c > index 3940d6c255ba..9b45fcbf8941 100644 > --- a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c > +++ b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c > @@ -1081,7 +1081,12 @@ static void pcs_kio_setattr_handle(struct fuse_inode *fi, struct fuse_req *req) > r->end = req->end; > if (di->size.op == PCS_SIZE_SHRINK) { > BUG_ON(!mutex_is_locked(&req->io_inode->i_mutex)); > - /* wait for aio reads in flight */ > + /* > + * Wait for already submitted aio reads. Further reads > + * (including already queued to bg_queue) will be stopped > + * by wait_shrink(), and they will be processed from > + * _pcs_shrink_end(). > + */ Why do we have to wait already submitted aio reads ? > inode_dio_wait(req->io_inode); > /* > * Writebackcache was flushed already so it is safe to >
On 23.10.2018 13:15, Pavel Butsykin wrote: > > > On 23.10.2018 13:01, Kirill Tkhai wrote: >> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> >> --- >> fs/fuse/kio/pcs/pcs_fuse_kdirect.c | 7 ++++++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >> index 3940d6c255ba..9b45fcbf8941 100644 >> --- a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >> +++ b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >> @@ -1081,7 +1081,12 @@ static void pcs_kio_setattr_handle(struct fuse_inode *fi, struct fuse_req *req) >> r->end = req->end; >> if (di->size.op == PCS_SIZE_SHRINK) { >> BUG_ON(!mutex_is_locked(&req->io_inode->i_mutex)); >> - /* wait for aio reads in flight */ >> + /* >> + * Wait for already submitted aio reads. Further reads >> + * (including already queued to bg_queue) will be stopped >> + * by wait_shrink(), and they will be processed from >> + * _pcs_shrink_end(). >> + */ > > Why do we have to wait already submitted aio reads ? Because there could be grow requests. This place is broken, I'm fixing that at the moment. One-dimensional size.op is not enough to drive race between grow and shrink, we need to convert PCS_SIZE_* into bit fields like it used to be before. >> inode_dio_wait(req->io_inode); >> /* >> * Writebackcache was flushed already so it is safe to >>
On 23.10.2018 13:19, Kirill Tkhai wrote: > On 23.10.2018 13:15, Pavel Butsykin wrote: >> >> >> On 23.10.2018 13:01, Kirill Tkhai wrote: >>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> >>> --- >>> fs/fuse/kio/pcs/pcs_fuse_kdirect.c | 7 ++++++- >>> 1 file changed, 6 insertions(+), 1 deletion(-) >>> >>> diff --git a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >>> index 3940d6c255ba..9b45fcbf8941 100644 >>> --- a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >>> +++ b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >>> @@ -1081,7 +1081,12 @@ static void pcs_kio_setattr_handle(struct fuse_inode *fi, struct fuse_req *req) >>> r->end = req->end; >>> if (di->size.op == PCS_SIZE_SHRINK) { >>> BUG_ON(!mutex_is_locked(&req->io_inode->i_mutex)); >>> - /* wait for aio reads in flight */ >>> + /* >>> + * Wait for already submitted aio reads. Further reads >>> + * (including already queued to bg_queue) will be stopped >>> + * by wait_shrink(), and they will be processed from >>> + * _pcs_shrink_end(). >>> + */ >> >> Why do we have to wait already submitted aio reads ? > > Because there could be grow requests. This place is broken, I'm fixing that at the moment. > One-dimensional size.op is not enough to drive race between grow and shrink, we need to > convert PCS_SIZE_* into bit fields like it used to be before. > I don't understand. There are no chances to races between grow and shrink, because the shrink is protected from simultaneous execution with write requests at the fuse level. No write requests - no grows. >>> inode_dio_wait(req->io_inode); >>> /* >>> * Writebackcache was flushed already so it is safe to >>>
On 23.10.2018 16:08, Pavel Butsykin wrote: > > > On 23.10.2018 13:19, Kirill Tkhai wrote: >> On 23.10.2018 13:15, Pavel Butsykin wrote: >>> >>> >>> On 23.10.2018 13:01, Kirill Tkhai wrote: >>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> >>>> --- >>>> fs/fuse/kio/pcs/pcs_fuse_kdirect.c | 7 ++++++- >>>> 1 file changed, 6 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >>>> index 3940d6c255ba..9b45fcbf8941 100644 >>>> --- a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >>>> +++ b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >>>> @@ -1081,7 +1081,12 @@ static void pcs_kio_setattr_handle(struct fuse_inode *fi, struct fuse_req *req) >>>> r->end = req->end; >>>> if (di->size.op == PCS_SIZE_SHRINK) { >>>> BUG_ON(!mutex_is_locked(&req->io_inode->i_mutex)); >>>> - /* wait for aio reads in flight */ >>>> + /* >>>> + * Wait for already submitted aio reads. Further reads >>>> + * (including already queued to bg_queue) will be stopped >>>> + * by wait_shrink(), and they will be processed from >>>> + * _pcs_shrink_end(). >>>> + */ >>> >>> Why do we have to wait already submitted aio reads ? >> >> Because there could be grow requests. This place is broken, I'm fixing that at the moment. >> One-dimensional size.op is not enough to drive race between grow and shrink, we need to >> convert PCS_SIZE_* into bit fields like it used to be before. >> > > I don't understand. There are no chances to races between grow and > shrink, because the shrink is protected from simultaneous execution with > write requests at the fuse level. No write requests - no grows. Not exactly. Currently we have fallocate requests, which may spawn a grow operation. Even FALLOC_FL_KEEP_SIZE may do that -- then inode mutex is not held. So, the question is not so easy, what is correct, and which is place should be fixed. >>>> inode_dio_wait(req->io_inode); >>>> /* >>>> * Writebackcache was flushed already so it is safe to >>>>
On 23.10.2018 13:15, Pavel Butsykin wrote: > > > On 23.10.2018 13:01, Kirill Tkhai wrote: >> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> >> --- >> fs/fuse/kio/pcs/pcs_fuse_kdirect.c | 7 ++++++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >> index 3940d6c255ba..9b45fcbf8941 100644 >> --- a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >> +++ b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >> @@ -1081,7 +1081,12 @@ static void pcs_kio_setattr_handle(struct fuse_inode *fi, struct fuse_req *req) >> r->end = req->end; >> if (di->size.op == PCS_SIZE_SHRINK) { >> BUG_ON(!mutex_is_locked(&req->io_inode->i_mutex)); >> - /* wait for aio reads in flight */ >> + /* >> + * Wait for already submitted aio reads. Further reads >> + * (including already queued to bg_queue) will be stopped >> + * by wait_shrink(), and they will be processed from >> + * _pcs_shrink_end(). >> + */ > > Why do we have to wait already submitted aio reads ? Also, see here: [cpu0] [cpu1] fuse_size_grow_work() ... spin_lock(&di->lock) di->size.op = PCS_SIZE_INACTION; spin_unlock(&di->lock) ... pcs_kio_setattr_handle() inode_dio_wait() started | | pcs_cc_requeue(di->cluster, &pending_reqs) | ... | request_end() v inode_dio_wait() finished proceed FATTR_SIZE shrink
On 23.10.2018 16:38, Kirill Tkhai wrote: > On 23.10.2018 16:08, Pavel Butsykin wrote: >> >> >> On 23.10.2018 13:19, Kirill Tkhai wrote: >>> On 23.10.2018 13:15, Pavel Butsykin wrote: >>>> >>>> >>>> On 23.10.2018 13:01, Kirill Tkhai wrote: >>>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> >>>>> --- >>>>> fs/fuse/kio/pcs/pcs_fuse_kdirect.c | 7 ++++++- >>>>> 1 file changed, 6 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >>>>> index 3940d6c255ba..9b45fcbf8941 100644 >>>>> --- a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >>>>> +++ b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c >>>>> @@ -1081,7 +1081,12 @@ static void pcs_kio_setattr_handle(struct fuse_inode *fi, struct fuse_req *req) >>>>> r->end = req->end; >>>>> if (di->size.op == PCS_SIZE_SHRINK) { >>>>> BUG_ON(!mutex_is_locked(&req->io_inode->i_mutex)); >>>>> - /* wait for aio reads in flight */ >>>>> + /* >>>>> + * Wait for already submitted aio reads. Further reads >>>>> + * (including already queued to bg_queue) will be stopped >>>>> + * by wait_shrink(), and they will be processed from >>>>> + * _pcs_shrink_end(). >>>>> + */ >>>> >>>> Why do we have to wait already submitted aio reads ? >>> >>> Because there could be grow requests. This place is broken, I'm fixing that at the moment. >>> One-dimensional size.op is not enough to drive race between grow and shrink, we need to >>> convert PCS_SIZE_* into bit fields like it used to be before. >>> >> >> I don't understand. There are no chances to races between grow and >> shrink, because the shrink is protected from simultaneous execution with >> write requests at the fuse level. No write requests - no grows. > > Not exactly. Currently we have fallocate requests, which may spawn a grow operation. > Even FALLOC_FL_KEEP_SIZE may do that -- then inode mutex is not held. So, the question > is not so easy, what is correct, and which is place should be fixed. > So, we have two places where possible fallocate without mutex: 1. static size_t fuse_send_unmap(struct fuse_req *req, struct fuse_io_priv *io, loff_t pos, size_t count, fl_owner_t owner) { .. inarg->mode = FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE; req->in.h.opcode = FUSE_FALLOCATE; .. if (io->async) return fuse_async_req_send(fc, req, count, io); fuse_request_send(fc, req); return count; } It's definitely a mistake, we should protect this fallocate from simultaneous execution with shrink. 2. static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, loff_t length) { ... bool lock_inode = !(mode & FALLOC_FL_KEEP_SIZE) || (mode & (FALLOC_FL_PUNCH_HOLE|FALLOC_FL_ZERO_RANGE)); /* Return error if mode is not supported */ if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE)) return -EOPNOTSUPP; ... So, fallocate without mutex only possible if (mode & FALLOC_FL_KEEP_SIZE) == FALLOC_FL_KEEP_SIZE. Hmm, what does it mean ? Looks like it will be the same as FALLOC_FL_ZERO_RANGE|FALLOC_FL_KEEP_SIZE. I wonder, Do we need to handle such fallocate for vstorage at all? It seems that NO, it's just PCS_IREQ_NOOP. Alexey? >>>>> inode_dio_wait(req->io_inode); >>>>> /* >>>>> * Writebackcache was flushed already so it is safe to >>>>>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> --- fs/fuse/kio/pcs/pcs_fuse_kdirect.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)