fdstore: Print detailed error when queue is exhausted

Submitted by Cyrill Gorcunov on June 22, 2018, 11:48 a.m.

Details

Message ID 20180622114826.12768-1-gorcunov@gmail.com
State Rejected
Series "fdstore: Print detailed error when queue is exhausted"

Commit Message

Cyrill Gorcunov June 22, 2018, 11:48 a.m.
We use fdstore intensively, for example when handling
bind-mounted sockets and ghost dgram sockets. The system
limit for the per-socket queue may not be enough if someone
generates lots of ghost sockets (the limit has been hit with
150 or more on a default Fedora 27).

We can't just increase the system limits since they are
global, so instead let's print an error with the list
of parameters to adjust. This gives the node admin a
way to let the restore proceed.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 criu/fdstore.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)


diff --git a/criu/fdstore.c b/criu/fdstore.c
index 8d3a6c89b1dc..2748c6743fa7 100644
--- a/criu/fdstore.c
+++ b/criu/fdstore.c
@@ -79,11 +79,25 @@ int fdstore_init(void)
 int fdstore_add(int fd)
 {
 	int sk = get_service_fd(FDSTORE_SK_OFF);
-	int id;
+	int id, ret, i;
 
 	mutex_lock(&desc->lock);
 
-	if (send_fd(sk, NULL, 0, fd)) {
+	ret = send_fd(sk, NULL, 0, fd);
+	if (ret) {
+		int err_cpy = errno;
+		pr_perror("Can't send fd %d into store", fd);
+		if (err_cpy == EAGAIN) {
+			static const char * const sysctl_params[] = {
+				"net.core.rmem_default",
+				"net.core.rmem_max",
+				"net.core.wmem_default",
+				"net.core.wmem_max",
+			};
+			pr_err("Too many fdstore entries are used. Increase sysctl:\n");
+			for (i = 0; i < ARRAY_SIZE(sysctl_params); i++)
+				pr_err("  %s\n", sysctl_params[i]);
+		}
 		mutex_unlock(&desc->lock);
 		return -1;
 	}
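The commit message attributes the failure to the per-socket queue limit. Below is a minimal sketch of that behavior, assuming a Linux host: a datagram socketpair stands in for the fdstore service socket (an assumption about its setup), and one descriptor is queued over and over as SCM_RIGHTS data with nobody reading the other end. The helpers `send_one_fd()` and `fill_fd_queue()` are hypothetical and are not CRIU's `send_fd()`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Send one descriptor over a unix socket as SCM_RIGHTS ancillary data,
 * without blocking. Returns 0 on success or -errno on failure.
 * Hypothetical helper; not CRIU's send_fd().
 */
static int send_one_fd(int sk, int fd)
{
	char byte = 0; /* 1-byte payload so the datagram carries data */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union { /* aligned buffer for one SCM_RIGHTS control message */
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctl.buf,
		.msg_controllen = sizeof(ctl.buf),
	};
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	/* MSG_DONTWAIT: fail with EAGAIN instead of sleeping when full */
	if (sendmsg(sk, &msg, MSG_DONTWAIT) < 0)
		return -errno;
	return 0;
}

/*
 * Keep queueing copies of one descriptor on an AF_UNIX datagram
 * socketpair until a send fails or `cap` messages are queued.
 * Returns the number of messages queued (-1 on setup failure) and
 * stores the errno of the failing send in *err (0 if none failed).
 */
static int fill_fd_queue(int cap, int *err)
{
	int sv[2], fd, n = 0, ret = 0;

	*err = 0;
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0) {
		*err = errno;
		return -1;
	}
	fd = open("/dev/null", O_RDONLY);
	if (fd < 0) {
		*err = errno;
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	/* Nothing reads sv[1], so every message stays queued. */
	while (n < cap && (ret = send_one_fd(sv[0], fd)) == 0)
		n++;
	if (ret < 0)
		*err = -ret;

	/* Closing both ends drops the queue and the in-flight copies. */
	close(fd);
	close(sv[0]);
	close(sv[1]);
	return n;
}
```

On a stock Linux system the loop is expected to stop with `EAGAIN` once the queued datagrams use up the sender's buffer, whose default size comes from `net.core.wmem_default`. The exact count depends on the kernel and on those sysctls, so no particular number is claimed here. `ETOOMANYREFS` is also possible when an unprivileged user has more descriptors in flight than `RLIMIT_NOFILE` allows.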

Comments

Andrey Vagin June 25, 2018, 5:20 p.m.
On Fri, Jun 22, 2018 at 02:48:26PM +0300, Cyrill Gorcunov wrote:
> We use fdstore intensively, for example when handling
> bind-mounted sockets and ghost dgram sockets. The system
> limit for the per-socket queue may not be enough if someone
> generates lots of ghost sockets (the limit has been hit with
> 150 or more on a default Fedora 27).
> 
> We can't just increase the system limits since they are
> global, so instead let's print an error with the list
> of parameters to adjust. This gives the node admin a
> way to let the restore proceed.

Do you understand that, in this case, we can't use fdstore to restore
unix sockets?

Cyrill Gorcunov June 25, 2018, 5:40 p.m.
On Mon, Jun 25, 2018 at 10:20:18AM -0700, Andrey Vagin wrote:
> Do you understand that, in this case, we can't use fdstore to restore
> unix sockets?

An admin may increase the system limits, and that's all.
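If an admin does follow that route, the error printed by the patch only names the parameters. Below is a minimal sketch, assuming Linux's `/proc/sys` interface, of reading their current values so they can be checked before anything is raised. `sysctl_path()`, `read_sysctl_long()`, and `print_fdstore_sysctls()` are hypothetical helpers, not part of CRIU.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the /proc/sys path for a dotted sysctl name, e.g.
 * "net.core.wmem_max" -> "/proc/sys/net/core/wmem_max".
 * Naive mapping: names whose components contain dots (such as
 * VLAN interface names) are not handled. Returns 0, or -1 if the
 * buffer is too small.
 */
static int sysctl_path(const char *name, char *buf, size_t len)
{
	static const char prefix[] = "/proc/sys/";
	size_t i;
	int n = snprintf(buf, len, "%s%s", prefix, name);

	if (n < 0 || (size_t)n >= len)
		return -1;
	/* Turn the dots after the prefix into directory separators. */
	for (i = sizeof(prefix) - 1; buf[i]; i++)
		if (buf[i] == '.')
			buf[i] = '/';
	return 0;
}

/* Read an integer sysctl value; returns 0 on success, -1 otherwise. */
static int read_sysctl_long(const char *name, long *val)
{
	char path[256];
	FILE *f;
	int ret;

	if (sysctl_path(name, path, sizeof(path)))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	ret = fscanf(f, "%ld", val) == 1 ? 0 : -1;
	fclose(f);
	return ret;
}

/* Print the parameters listed in the patch with their current values. */
static void print_fdstore_sysctls(FILE *out)
{
	static const char * const params[] = {
		"net.core.rmem_default",
		"net.core.rmem_max",
		"net.core.wmem_default",
		"net.core.wmem_max",
	};
	size_t i;
	long v;

	for (i = 0; i < sizeof(params) / sizeof(params[0]); i++) {
		if (read_sysctl_long(params[i], &v) == 0)
			fprintf(out, "  %s = %ld\n", params[i], v);
		else
			fprintf(out, "  %s = (unreadable)\n", params[i]);
	}
}
```

These entries may be missing in some environments, for example inside certain containers, which is why the sketch prints "(unreadable)" instead of failing. Raising the values would be done with sysctl(8) or by writing the same files as root; because the settings are global, that is the side effect the commit message wants to avoid doing from CRIU itself.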
Andrey Vagin June 25, 2018, 6:29 p.m.
On Mon, Jun 25, 2018 at 08:40:36PM +0300, Cyrill Gorcunov wrote:
> An admin may increase the system limits, and that's all.

CRIU should work w/o increasing system limits...
Cyrill Gorcunov June 25, 2018, 6:31 p.m.
On Mon, Jun 25, 2018 at 11:29:05AM -0700, Andrey Vagin wrote:
> > 
> > An admin may increase the system limits, and that's all.
> 
> CRIU should work w/o increasing system limits...

As discussed face to face, I will improve it.