Skip writing first iovec if it's empty

Submitted by Jouni Roivas on Oct. 12, 2020, 12:59 p.m.

Details

Message ID 20201012125901.2629590-1-jouni.roivas@tuxera.com
State New
Series "Skip writing first iovec if it's empty"
Headers show

Commit Message

Jouni Roivas Oct. 12, 2020, 12:59 p.m.
In case the first iovec is empty, skip writing it. Usually writing
zero length iovec is no-op, but in case of certain special cases this
causes the write to fail.

This affects at least cgroups under sysfs, since it doesn't properly
support writev with multiple iovec. For those kernel tends to handle
them as simple as possible, passing each iovec separately. In case
of zero length write into cgroups file causes kernel to return error.
Thus if writing the first iovec fails for being zero length, it
causes the whole write to fail even if writing the second iovec would
succeed. This happens for example when doing unbuffered write with
musl to a file under cgroups. Fix the issue here, since if kernel
gets fixed for this specific case, it still doesn't get fixed for
older kernels, nor any other possible similar case.
---
 src/stdio/__stdio_write.c | 4 ++++
 1 file changed, 4 insertions(+)

Patch hide | download patch | download mbox

diff --git a/src/stdio/__stdio_write.c b/src/stdio/__stdio_write.c
index d2d89475..eedce03a 100644
--- a/src/stdio/__stdio_write.c
+++ b/src/stdio/__stdio_write.c
@@ -11,6 +11,10 @@  size_t __stdio_write(FILE *f, const unsigned char *buf, size_t len)
 	size_t rem = iov[0].iov_len + iov[1].iov_len;
 	int iovcnt = 2;
 	ssize_t cnt;
+	if (iov[0].iov_len == 0) {
+		iov++;
+		iovcnt--;
+	}
 	for (;;) {
 		cnt = syscall(SYS_writev, f->fd, iov, iovcnt);
 		if (cnt == rem) {

Comments

Rich Felker Oct. 12, 2020, 3:02 p.m.
On Mon, Oct 12, 2020 at 03:59:01PM +0300, Jouni Roivas wrote:
> In case the first iovec is empty, skip writing it. Usually writing
> zero length iovec is no-op, but in case of certain special cases this
> causes the write to fail.
> 
> This affects at least cgroups under sysfs, since it doesn't properly
> support writev with multiple iovec. For those kernel tends to handle
> them as simple as possible, passing each iovec separately. In case
> of zero length write into cgroups file causes kernel to return error.
> Thus if writing the first iovec fails for being zero length, it
> causes the whole write to fail even if writing the second iovec would
> succeed. This happens for example when doing unbuffered write with
> musl to a file under cgroups. Fix the issue here, since if kernel
> gets fixed for this specific case, it still doesn't get fixed for
> older kernels, nor any other possible similar case.
> ---
>  src/stdio/__stdio_write.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/src/stdio/__stdio_write.c b/src/stdio/__stdio_write.c
> index d2d89475..eedce03a 100644
> --- a/src/stdio/__stdio_write.c
> +++ b/src/stdio/__stdio_write.c
> @@ -11,6 +11,10 @@ size_t __stdio_write(FILE *f, const unsigned char *buf, size_t len)
>  	size_t rem = iov[0].iov_len + iov[1].iov_len;
>  	int iovcnt = 2;
>  	ssize_t cnt;
> +	if (iov[0].iov_len == 0) {
> +		iov++;
> +		iovcnt--;
> +	}
>  	for (;;) {
>  		cnt = syscall(SYS_writev, f->fd, iov, iovcnt);
>  		if (cnt == rem) {
> -- 
> 2.25.1

I don't think this is sufficient to make writing procfs/sysfs files
via stdio work reliably; their interface is fundamentally incompatible
with flexibility of implementation in buffering. But it might be
preferable to do this still for other reasons. If we go forward with
this I think the commit message needs major rework so as not to be
implying that it makes the procfs/sysfs thing into a supported usage
case.

Note that commit e7eeeb9f2a4a358fb0bbed81e145ef5538ff60f0 did the
analog for __stdio_read in a way that's probably slightly better, but
that would need adaptation to work for the write case.

Rich
Jouni Roivas Oct. 12, 2020, 4:30 p.m.
The 10/12/2020 11:02, Rich Felker wrote:
> Note that commit e7eeeb9f2a4a358fb0bbed81e145ef5538ff60f0 did the
> analog for __stdio_read in a way that's probably slightly better, but
> that would need adaptation to work for the write case.

Thanks for pointing out that commit. For read case that looks indeed 
better, but for the write case I'll keep my implementation for the
simplicity. However I think I take and adapt the commit message from
the read case.

--
Jouni