[3/5] userns: Don't read extents twice in m_start

Submitted by Eric W. Biederman on Oct. 31, 2017, 11:48 p.m.

Details

Message ID 87k1zaswu6.fsf_-_@xmission.com
State New
Series "Series without cover letter"
Headers show

Commit Message

Eric W. Biederman Oct. 31, 2017, 11:48 p.m.
This is important so reading /proc/<pid>/{uid_map,gid_map,projid_map} while
the map is being written does not do strange things.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Patch hide | download patch | download mbox

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 563a2981d7c7..4f7e357ac1e2 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -683,11 +683,13 @@  static void *m_start(struct seq_file *seq, loff_t *ppos,
 		     struct uid_gid_map *map)
 {
 	loff_t pos = *ppos;
+	unsigned extents = map->nr_extents;
+	smp_rmb();
 
-	if (pos >= map->nr_extents)
+	if (pos >= extents)
 		return NULL;
 
-	if (map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
+	if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
 		return &map->extent[pos];
 
 	return &map->forward[pos];

Comments

Nikolay Borisov Nov. 1, 2017, 8:31 a.m.
On  1.11.2017 01:48, Eric W. Biederman wrote:
> 
> This is important so reading /proc/<pid>/{uid_map,gid_map,projid_map} while
> the map is being written does not do strange things.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/user_namespace.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 563a2981d7c7..4f7e357ac1e2 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -683,11 +683,13 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
>  		     struct uid_gid_map *map)
>  {
>  	loff_t pos = *ppos;
> +	unsigned extents = map->nr_extents;
> +	smp_rmb();

Barriers need to be paired to work correctly as well as have explicit
comments describing the pairing as per kernel coding style. Checkpatch
will actually produce warning for that particular memory barrier.

>  
> -	if (pos >= map->nr_extents)
> +	if (pos >= extents)
>  		return NULL;
>  
> -	if (map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
> +	if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>  		return &map->extent[pos];
>  
>  	return &map->forward[pos];
>
Eric W. Biederman Nov. 1, 2017, 11:08 a.m.
Nikolay Borisov <nborisov@suse.com> writes:

> On  1.11.2017 01:48, Eric W. Biederman wrote:
>> 
>> This is important so reading /proc/<pid>/{uid_map,gid_map,projid_map} while
>> the map is being written does not do strange things.
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  kernel/user_namespace.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>> 
>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index 563a2981d7c7..4f7e357ac1e2 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -683,11 +683,13 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
>>  		     struct uid_gid_map *map)
>>  {
>>  	loff_t pos = *ppos;
>> +	unsigned extents = map->nr_extents;
>> +	smp_rmb();
>
> Barriers need to be paired to work correctly as well as have explicit
> comments describing the pairing as per kernel coding style. Checkpatch
> will actually produce warning for that particular memory barrier.

So please look at the code and read the comment.  The fact the barrier
was not in m_start earlier is strictly speaking a bug.

In practice except for a very narrow window when this data is changing
the one time it can, this code does not matter at all.

As for checkpatch I have sympathy for it, checkpatch has a hard job,
but I won't listen to checkpatch when it is wrong.

If you have additional cleanups you would like to make in this area
please send patches.

Eric

>>  
>> -	if (pos >= map->nr_extents)
>> +	if (pos >= extents)
>>  		return NULL;
>>  
>> -	if (map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>> +	if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>>  		return &map->extent[pos];
>>  
>>  	return &map->forward[pos];
>>
Nikolay Borisov Nov. 1, 2017, 1:05 p.m.
[CCing some memory-barriers people + checkpatch maintainers]

On  1.11.2017 13:08, Eric W. Biederman wrote:
> Nikolay Borisov <nborisov@suse.com> writes:
> 
>> On  1.11.2017 01:48, Eric W. Biederman wrote:
>>>
>>> This is important so reading /proc/<pid>/{uid_map,gid_map,projid_map} while
>>> the map is being written does not do strange things.
>>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>>  kernel/user_namespace.c | 6 ++++--
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>>> index 563a2981d7c7..4f7e357ac1e2 100644
>>> --- a/kernel/user_namespace.c
>>> +++ b/kernel/user_namespace.c
>>> @@ -683,11 +683,13 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
>>>  		     struct uid_gid_map *map)
>>>  {
>>>  	loff_t pos = *ppos;
>>> +	unsigned extents = map->nr_extents;
>>> +	smp_rmb();
>>
>> Barriers need to be paired to work correctly as well as have explicit
>> comments describing the pairing as per kernel coding style. Checkpatch
>> will actually produce warning for that particular memory barrier.
> 
> So please look at the code and read the comment.  The fact the barrier
> was not in m_start earlier is strictly speaking a bug.

The comment you are referring to (I guess) is the one in map_write,
right? However it states nothing about which barrier pairs with which
one so it's not possible to reason with 100% certainty about the
intentions of the code. I think the fact there was a missing barrier
here just makes this point stronger.

Additionally, when someone reading the code sees a rmb being added
without a pairing wmb it's only natural to think that something is amiss
in this case.

> 
> In practice except for a very narrow window when this data is changing
> the one time it can, this code does not matter at all.
> 
> As for checkpatch I have sympathy for it, checkpatch has a hard job,
> but I won't listen to checkpatch when it is wrong.

Well if you deem the checkpatch then this rule is better removed from
checkpatch. However, considering how hard it is to debug and reason
about correctness of barriers without explicit comments I'd say you are
in the wrong on this one.

> 
> If you have additional cleanups you would like to make in this area
> please send patches.
> 
> Eric
> 
>>>  
>>> -	if (pos >= map->nr_extents)
>>> +	if (pos >= extents)
>>>  		return NULL;
>>>  
>>> -	if (map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>>> +	if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>>>  		return &map->extent[pos];
>>>  
>>>  	return &map->forward[pos];
>>>
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
>
Joe Perches Nov. 1, 2017, 5 p.m.
On Wed, 2017-11-01 at 06:08 -0500, Eric W. Biederman wrote:
> I won't listen to checkpatch when it is wrong.

Always a good idea.

btw: what is checkpatch wrong about this time?
Eric W. Biederman Nov. 1, 2017, 5:20 p.m.
Joe Perches <joe@perches.com> writes:

> On Wed, 2017-11-01 at 06:08 -0500, Eric W. Biederman wrote:
>> I won't listen to checkpatch when it is wrong.
>
> Always a good idea.
>
> btw: what is checkpatch wrong about this time?

Well the way I was hearing the conversation was that there was a patch
that fixed a real bug, but it was wrong because checkpatch complained
about it.

So I don't even know if the warning is a problem.  But blocking bug
fixes because there is a warning certainly is.

If someone wants to change coding style in practice so that every
smp_rmb and every smp_wmb has detailed comments that everyone must
include they need to follow the usual rule and update the entire kernel
when making an interface change.  As that did not happen I don't see
any problems with incremental updates in the style the code is already
in.

Not that I will mind a patch that updates the code, but I am not going
to hold up a perfectly good bug fix waiting for one either.

Eric
Peter Zijlstra Nov. 1, 2017, 6:15 p.m.
On Wed, Nov 01, 2017 at 12:20:52PM -0500, Eric W. Biederman wrote:

> Well the way I was hearing the conversation was that there was a patch
> that fixed a real bug, but it was wrong because checkpatch complained
> about it.
> 
> So I don't even know if the warning is a problem.  But blocking bug
> fixes because there is a warning certainly is.
> 
> If someone wants to change coding style in practice so that every
> smp_rmb and every smp_wmb has detailed comments that everyone must
> include they need to follow the usual rule and update the entire kernel
> when making an interface change.  As that did not happen I don't see
> any problems with incremental updates in the style the code is already
> in.

Reverse engineering all the memory barriers in the code is a massive
undertaking. We are looking at it, but its slow progress. Mostly an
update when touched approach.

Although some searching for dodgy patterns; see for example commit:

  a7af0af32171 ("blk-mq: attempt to fix atomic flag memory ordering")

which was the result of people looking at creative
smp_mb__{before,after}_atomic() usage -- in specific use of those
barriers not immediately adjacent to the respective atomic operation.

But to use the lack of completion of that work to not write comments on
code you understand is complete bollocks though.

> Not that I will mind a patch that updates the code, but I am not going
> to hold up a perfectly good bug fix waiting for one either.

If both you and the submitter actually understand the ordering, asking
them to write the comment isn't onerous and should not hold up anything
much at all.

In fact, if you cannot write that comment you cannot claim it to be a
good fix. And I'm saying the lack of the comment means its not a
perfectly good fix at all, since a perfect fix would indeed explain
memory ordering.