Minor style patch to exit.c

Submitted by Markus Wichmann on Jan. 19, 2020, 11:07 a.m.

Details

Message ID 20200119110743.GD2020@voyager
State New
Series "Minor style patch to exit.c"
Headers show

Commit Message

Markus Wichmann Jan. 19, 2020, 11:07 a.m.
Hi all,

it has often been said that if a LISP programmer were tasked with
catching an elephant, they would build a maze of parentheses in which
the elephant gets lost. If so, then libc_exit_fini() provides a nice
starting point for such a maze, even if it was started in C.

In all seriousness, the logic of the function is to iterate over an
array of function pointers, so why not write down that that's what it
does instead of iterating over some numbers that happen to be
convertible to such? Or is there something I have missed?

Ciao,
Markus

Patch hide | download patch | download mbox

From 44e09392c72cbd72c1e5b4ec3dea71f9b38e13c8 Mon Sep 17 00:00:00 2001
From: Markus Wichmann <nullplan@gmx.net>
Date: Sun, 19 Jan 2020 11:58:51 +0100
Subject: [PATCH] Make libc_exit_fini() easier to read.

The previous version did have a maze of parentheses here. But the actual
logic of the function is not to iterate over some numbers that happen to
be convertible to pointers, but rather to iterate over an array of
function pointers. So let us use pointer arithmetic correctly.
---
 src/exit/exit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/exit/exit.c b/src/exit/exit.c
index a6869b37..e02ba2be 100644
--- a/src/exit/exit.c
+++ b/src/exit/exit.c
@@ -16,9 +16,9 @@  extern weak hidden void (*const __fini_array_start)(void), (*const __fini_array_

 static void libc_exit_fini(void)
 {
-	uintptr_t a = (uintptr_t)&__fini_array_end;
-	for (; a>(uintptr_t)&__fini_array_start; a-=sizeof(void(*)()))
-		(*(void (**)())(a-sizeof(void(*)())))();
+	void (*const *a)() = &__fini_array_end;
+	while (a > &__fini_array_start)
+		(*--a)();
 	_fini();
 }

--
2.17.1


Comments

Markus Wichmann Jan. 19, 2020, 11:12 a.m.
Hi again,

I forgot to add: I did not test the change, but I did test compile it
and compare the generated assembly before and after, and that was the
same. Therefore, I don't think anything else can change at run time,
either, right?

Ciao,
Markus
Szabolcs Nagy Jan. 19, 2020, 11:31 a.m.
* Markus Wichmann <nullplan@gmx.net> [2020-01-19 12:07:43 +0100]:
> The previous version did have a maze of parentheses here. But the actual
> logic of the function is not to iterate over some numbers that happen to
> be convertible to pointers, but rather to iterate over an array of
> function pointers. So let us use pointer arithmetic correctly.
> ---
>  src/exit/exit.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/src/exit/exit.c b/src/exit/exit.c
> index a6869b37..e02ba2be 100644
> --- a/src/exit/exit.c
> +++ b/src/exit/exit.c
> @@ -16,9 +16,9 @@ extern weak hidden void (*const __fini_array_start)(void), (*const __fini_array_
> 
>  static void libc_exit_fini(void)
>  {
> -	uintptr_t a = (uintptr_t)&__fini_array_end;
> -	for (; a>(uintptr_t)&__fini_array_start; a-=sizeof(void(*)()))
> -		(*(void (**)())(a-sizeof(void(*)())))();
> +	void (*const *a)() = &__fini_array_end;
> +	while (a > &__fini_array_start)
> +		(*--a)();

this has undefined behaviour.

the original code was carefully written to avoid that.
you need to keep the uintptr_t cast since -- and > are
undefined for pointers that go out of bound or don't
point to the same object (and the _start, _end symbols
don't represent the same c language object, they are
independent).

>  	_fini();
>  }
> 
> --
> 2.17.1
>
Markus Wichmann Jan. 19, 2020, 12:17 p.m.
On Sun, Jan 19, 2020 at 12:31:34PM +0100, Szabolcs Nagy wrote:
> this has undefined behaviour.
>
> the original code was carefully written to avoid that.
> you need to keep the uintptr_t cast since -- and > are
> undefined for pointers that go out of bound or don't
> point to the same object (and the _start, _end symbols
> don't represent the same c language object, they are
> independent).
>

So there was something I missed. Those linker symbols sure cause
trouble.

FTR: The problem is that we are taking the addresses of ostensibly
independent objects and comparing them. The only way to make the change
I'd like to make would be to get the linker to generate an actual
pointer object with a value, instead of generating some external
variable whose address is meaningful. And even if that were possible (I
think you can get the linker to write something into a section), that is
not part of the ABI, or of the normal linker script. So then you'd
always require a custom linker script. All for the convenience of not
having to do this conversion.

Oh well, at least I learned something.

Ciao,
Markus
Alexander Cherepanov Jan. 19, 2020, 1:33 p.m.
On 19/01/2020 14.31, Szabolcs Nagy wrote:
> * Markus Wichmann <nullplan@gmx.net> [2020-01-19 12:07:43 +0100]:
>> The previous version did have a maze of parentheses here. But the actual
>> logic of the function is not to iterate over some numbers that happen to
>> be convertible to pointers, but rather to iterate over an array of
>> function pointers. So let us use pointer arithmetic correctly.
>> ---
>>   src/exit/exit.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/exit/exit.c b/src/exit/exit.c
>> index a6869b37..e02ba2be 100644
>> --- a/src/exit/exit.c
>> +++ b/src/exit/exit.c
>> @@ -16,9 +16,9 @@ extern weak hidden void (*const __fini_array_start)(void), (*const __fini_array_
>>
>>   static void libc_exit_fini(void)
>>   {
>> -	uintptr_t a = (uintptr_t)&__fini_array_end;
>> -	for (; a>(uintptr_t)&__fini_array_start; a-=sizeof(void(*)()))
>> -		(*(void (**)())(a-sizeof(void(*)())))();
>> +	void (*const *a)() = &__fini_array_end;
>> +	while (a > &__fini_array_start)
>> +		(*--a)();

[This is part of my reply was sent to Markus off-list by mistake, sorry 
for the dup.]

Iteration over pointers like this is wrong C (even if it's somewhat 
disputed) and was miscompiled by gcc at some point -- for example you 
can follow links from 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502#c19 (see also the 
next comment there for a reply from a gcc developer).

> this has undefined behaviour.
> 
> the original code was carefully written to avoid that.
> you need to keep the uintptr_t cast since -- and > are
> undefined for pointers that go out of bound or don't
> point to the same object (and the _start, _end symbols
> don't represent the same c language object, they are
> independent).

I think this only solves a part of the problem.

1. Casts from integers to pointers are implementation-defined and only 
guaranteed to give sane results when those integers are in turn casted 
from pointers. Arithmetic on integers between casts is outside any 
guarantees. It would be funny if not for the next issue.

2. As is defined, not only _start and _end symbols represent different 
objects, locations between them represent different objects too. 
Starting with a pointer to one object and iterating over integers until 
you get to another object doesn't give you a valid pointer to the second 
object, at least in gcc -- see 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 .

Couldn't _start defined as an array? Then separate values could be 
accessed simply as elements of this array. And casts to integers could 
be limited to calculating the number of elements, the terminating value 
or something.

Beware though, even if you get it perfect from the C POV gcc has some 
bugs in this area, e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330 .
Markus Wichmann Jan. 19, 2020, 2:24 p.m.
On Sun, Jan 19, 2020 at 04:33:47PM +0300, Alexander Cherepanov wrote:
> Couldn't _start defined as an array? Then separate values could be accessed
> simply as elements of this array. And casts to integers could be limited to
> calculating the number of elements, the terminating value or something.
>

That reminds me of something I read in the C standard: Two pointers must
compare equal if, among other possibilities, one is a pointer to
one-past its underlying array, and the other is a pointer to the start
of its array, and the arrays happen to lie behind one another in address
space.

Therefore, if _start and _end were arrays, even the GCC devs must agree
that there might be an integer i such that _start + i == _end. For the C
language, _start and _end would be arrays that happen to lie adjacent in
address space.

And if we have guarantees from the outside attesting to that, then
_end - _start is no longer an undefined expression, right?

Ciao,
Markus
Alexander Monakov Jan. 19, 2020, 2:46 p.m.
On Sun, 19 Jan 2020, Alexander Cherepanov wrote:

> Couldn't _start defined as an array? Then separate values could be accessed
> simply as elements of this array. And casts to integers could be limited to
> calculating the number of elements, the terminating value or something.

Yeah, I think usually such linker-provided symbols are declared as
extern arrays. I'm surprised that isn't the case in musl.  I don't think
declaring them as arrays helps with making casts pedantically suitable for
calculating number of elements though - as you said, any bijection between
intptr_t and pointers would be a valid implementation of a cast, you're not
guaranteed that (intptr_t)&a[i] == (intptr_t)a + i * sizeof *a.

Alexander
Pascal Cuoq Jan. 19, 2020, 2:49 p.m.
> On 19 Jan 2020, at 15:24, Markus Wichmann <nullplan@gmx.net> wrote:

> 

> That reminds me of something I read in the C standard: Two pointers must

> compare equal if, among other possibilities, one is a pointer to

> one-past its underlying array, and the other is a pointer to the start

> of its array, and the arrays happen to lie behind one another in address

> space.


The clause is 6.5.9:6 in C11:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

With a footnote:

Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. […]



The way GCC developers have decided to interpret this clause is that if the two arrays had to be adjacent in memory because they were part of the same aggregate, the pointers will reliably be equal iff one is one-past-the-end of the first one and the other is at the beginning of the second one. Otherwise, compilers will sometimes optimize “&a + 1 == b” to 0, that is, they implement a rule that could be paraphrased as “two pointers *may* compare equal if one is to the end of an array object and the other is to the beginning of another array objects that happens to lie immediately after the first one”.

https://gcc.godbolt.org/z/NfmTV9

> Therefore, if _start and _end were arrays, even the GCC devs must agree

> that there might be an integer i such that _start + i == _end.


If they did, they would not have written an optimization that transforms “a + 1 == b” into 0 regardless of the actual addresses of a and b.

(Every object is an array for the purpose of this discussion, that's 6.5.9:7, but it doesn't help.)

Pascal
Alexander Cherepanov Jan. 19, 2020, 3:53 p.m.
On 19/01/2020 17.24, Markus Wichmann wrote:
> On Sun, Jan 19, 2020 at 04:33:47PM +0300, Alexander Cherepanov wrote:
>> Couldn't _start defined as an array? Then separate values could be accessed
>> simply as elements of this array. And casts to integers could be limited to
>> calculating the number of elements, the terminating value or something.
> 
> That reminds me of something I read in the C standard: Two pointers must
> compare equal if, among other possibilities, one is a pointer to
> one-past its underlying array, and the other is a pointer to the start
> of its array, and the arrays happen to lie behind one another in address
> space.

One[1] of the gcc bug reports I mentioned is exactly about this issue. 
DR 260[2] allows to take the provenance of the pointers into account 
when comparing them and gcc really does this.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502
[2] http://open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm

As a side note, I thinks this is the wildest gcc bug report, it contains 
really mind-blowing comments (like comment 3). I don't mean it in a bad 
way at all and if you want to turn your understanding of C language 
inside-out you can try to read it. OTOH I think it's all wrong after all 
and I have some hope for it to be settled after my recent comments 
there. But I don't hold my breath.

> Therefore, if _start and _end were arrays, even the GCC devs must agree
> that there might be an integer i such that _start + i == _end. For the C
> language, _start and _end would be arrays that happen to lie adjacent in
> address space.
> 
> And if we have guarantees from the outside attesting to that, then
> _end - _start is no longer an undefined expression, right?

Even if we know that _start + k == _end it doesn't mean that we allowed 
to subtract them.
Rich Felker Jan. 19, 2020, 4:18 p.m.
On Sun, Jan 19, 2020 at 05:46:15PM +0300, Alexander Monakov wrote:
> On Sun, 19 Jan 2020, Alexander Cherepanov wrote:
> 
> > Couldn't _start defined as an array? Then separate values could be accessed
> > simply as elements of this array. And casts to integers could be limited to
> > calculating the number of elements, the terminating value or something.
> 
> Yeah, I think usually such linker-provided symbols are declared as
> extern arrays. I'm surprised that isn't the case in musl.  I don't think
> declaring them as arrays helps with making casts pedantically suitable for
> calculating number of elements though - as you said, any bijection between
> intptr_t and pointers would be a valid implementation of a cast, you're not
> guaranteed that (intptr_t)&a[i] == (intptr_t)a + i * sizeof *a.

Conceptually the _start is an array; that means it's fine to iterate
over its elements, and we could even do so with the "correct" type.
The problem is that _end is a different symbol and thereby inherently
a "different object", and comparing against it with < is not valid;
the compiler can legitimately optimize that out. I think with != would
be valid, but I'm not sure we can trust compilers to honor any
consistency for such "one past the end" comparisons. Casting to
(uintptr_t) before doing the != comparison would absolutely be safe in
the abstract machine; whether compilers honor this is unclear (because
of the "provenance" stuff, which could break even the current code, so
arguably we should have some "provenance barrier" here).

Of course exit runs the array in reverse, which makes it even more of
a mess. _end[-1] is clearly not valid when _end is an array object,
and the compiler is free to break that.

Rich
Rich Felker Jan. 19, 2020, 4:22 p.m.
On Sun, Jan 19, 2020 at 06:53:49PM +0300, Alexander Cherepanov wrote:
> On 19/01/2020 17.24, Markus Wichmann wrote:
> >On Sun, Jan 19, 2020 at 04:33:47PM +0300, Alexander Cherepanov wrote:
> >>Couldn't _start defined as an array? Then separate values could be accessed
> >>simply as elements of this array. And casts to integers could be limited to
> >>calculating the number of elements, the terminating value or something.
> >
> >That reminds me of something I read in the C standard: Two pointers must
> >compare equal if, among other possibilities, one is a pointer to
> >one-past its underlying array, and the other is a pointer to the start
> >of its array, and the arrays happen to lie behind one another in address
> >space.
> 
> One[1] of the gcc bug reports I mentioned is exactly about this
> issue. DR 260[2] allows to take the provenance of the pointers into
> account when comparing them and gcc really does this.
> 
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502
> [2] http://open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm
> 
> As a side note, I thinks this is the wildest gcc bug report, it
> contains really mind-blowing comments (like comment 3). I don't mean
> it in a bad way at all and if you want to turn your understanding of
> C language inside-out you can try to read it. OTOH I think it's all
> wrong after all and I have some hope for it to be settled after my
> recent comments there. But I don't hold my breath.
> 
> >Therefore, if _start and _end were arrays, even the GCC devs must agree
> >that there might be an integer i such that _start + i == _end. For the C
> >language, _start and _end would be arrays that happen to lie adjacent in
> >address space.
> >
> >And if we have guarantees from the outside attesting to that, then
> >_end - _start is no longer an undefined expression, right?
> 
> Even if we know that _start + k == _end it doesn't mean that we
> allowed to subtract them.

Consider a function that takes a pointer p, an array a, and a length
l, and does:

	for (i=0; i<l; i++) if (a+i == p) return p-a;

Can f(_end,_start,k) and f(_start+k,_start,k) legitimately differ,
despite _end==_start+k? I think the answer is no, in the existing C
language, in that the result of an expression is a pure function of
the *values* put into it. But compiler folks do not want to interpret
it this way and are pushing through hidden "provenance" state, so...

Rich
Alexander Cherepanov Jan. 19, 2020, 4:33 p.m.
On 19/01/2020 17.46, Alexander Monakov wrote:
> On Sun, 19 Jan 2020, Alexander Cherepanov wrote:
> 
>> Couldn't _start defined as an array? Then separate values could be accessed
>> simply as elements of this array. And casts to integers could be limited to
>> calculating the number of elements, the terminating value or something.
> 
> Yeah, I think usually such linker-provided symbols are declared as
> extern arrays. I'm surprised that isn't the case in musl.  I don't think
> declaring them as arrays helps with making casts pedantically suitable for
> calculating number of elements though - as you said, any bijection between
> intptr_t and pointers would be a valid implementation of a cast, you're not

Well, we want use from C some outside info, there could be no pedantic 
way to do this. Let's see, we know that the _end array follows the 
_start array in memory. This means that &_start[i] == &_end[0] for some 
i. But different provenance of the pointers means that we cannot do it 
just like that. Adding a cast should fix this. Summarizing, it should 
look like this:

for (size_t i = 0; (uintptr_t)&_start[i] != (uintptr_t)&_end[0]; i++)

or

for (type *p = _start; (uintptr_t)p != (uintptr_t)_end; p++)

> guaranteed that (intptr_t)&a[i] == (intptr_t)a + i * sizeof *a.

While you are inside one object, I think this should be safe in 
practice. For gcc, this is more or less guaranteed by [3]. BTW there is 
an explicit restriction there:

"When casting from pointer to integer and back again, the resulting 
pointer must reference the same object as the original pointer, 
otherwise the behavior is undefined. That is, one may not use integer 
arithmetic to avoid the undefined behavior of pointer arithmetic as 
proscribed in C99 and C11 6.5.6/8."

[3] 
https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html
Rich Felker Jan. 19, 2020, 4:39 p.m.
On Sun, Jan 19, 2020 at 07:33:08PM +0300, Alexander Cherepanov wrote:
> On 19/01/2020 17.46, Alexander Monakov wrote:
> >On Sun, 19 Jan 2020, Alexander Cherepanov wrote:
> >
> >>Couldn't _start defined as an array? Then separate values could be accessed
> >>simply as elements of this array. And casts to integers could be limited to
> >>calculating the number of elements, the terminating value or something.
> >
> >Yeah, I think usually such linker-provided symbols are declared as
> >extern arrays. I'm surprised that isn't the case in musl.  I don't think
> >declaring them as arrays helps with making casts pedantically suitable for
> >calculating number of elements though - as you said, any bijection between
> >intptr_t and pointers would be a valid implementation of a cast, you're not
> 
> Well, we want use from C some outside info, there could be no
> pedantic way to do this. Let's see, we know that the _end array
> follows the _start array in memory. This means that &_start[i] ==
> &_end[0] for some i. But different provenance of the pointers means
> that we cannot do it just like that. Adding a cast should fix this.
> Summarizing, it should look like this:
> 
> for (size_t i = 0; (uintptr_t)&_start[i] != (uintptr_t)&_end[0]; i++)
> 
> or
> 
> for (type *p = _start; (uintptr_t)p != (uintptr_t)_end; p++)

This works for forward walk, not backwards walk.

> >guaranteed that (intptr_t)&a[i] == (intptr_t)a + i * sizeof *a.
> 
> While you are inside one object, I think this should be safe in
> practice. For gcc, this is more or less guaranteed by [3]. BTW there
> is an explicit restriction there:
> 
> "When casting from pointer to integer and back again, the resulting
> pointer must reference the same object as the original pointer,
> otherwise the behavior is undefined. That is, one may not use
> integer arithmetic to avoid the undefined behavior of pointer
> arithmetic as proscribed in C99 and C11 6.5.6/8."
> 
> [3] https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html

GCC is badly wrong here, and it breaks XOR linked lists and other
things. It's also worded imprecisely. What does it mean if arithmetic
is performed on the value between the cast and cast back. What if two
pointers go into the arithmetic, but complex mathematical relations
result in one of the original values coming out, and the compiler can
only "see" the other pointer going in? Will it then wrongly assume
that the result points to the same object as the pointer it "saw" go
in?

This whole provenance thing is a trashfire.

Rich
Alexander Monakov Jan. 19, 2020, 5:11 p.m.
On Sun, 19 Jan 2020, Rich Felker wrote:

> Conceptually the _start is an array; that means it's fine to iterate
> over its elements, and we could even do so with the "correct" type.
> The problem is that _end is a different symbol and thereby inherently
> a "different object", and comparing against it with < is not valid;
> the compiler can legitimately optimize that out. I think with != would
> be valid, but I'm not sure we can trust compilers to honor any
> consistency for such "one past the end" comparisons. Casting to
> (uintptr_t) before doing the != comparison would absolutely be safe in
> the abstract machine; whether compilers honor this is unclear (because
> of the "provenance" stuff, which could break even the current code, so
> arguably we should have some "provenance barrier" here).
> 
> Of course exit runs the array in reverse, which makes it even more of
> a mess. _end[-1] is clearly not valid when _end is an array object,
> and the compiler is free to break that.

I would suggest

  void (**ptr)(void);

  __asm__ ("" : "=g"(ptr) : "0"(..._end), "X"(..._start));

  while (ptr != _start) (*--ptr)();

Alexander
Alexander Monakov Jan. 19, 2020, 5:17 p.m.
On Sun, 19 Jan 2020, Alexander Monakov wrote:

> I would suggest
> 
>   void (**ptr)(void);
> 
>   __asm__ ("" : "=g"(ptr) : "0"(..._end), "X"(..._start));

Or even less magic:

  void (**ptr)(void) = ..._end;

  __asm__ ("" : "+g"(ptr) : "X"(..._start));

Alexander
Rich Felker Jan. 19, 2020, 5:19 p.m.
On Sun, Jan 19, 2020 at 08:11:37PM +0300, Alexander Monakov wrote:
> 
> 
> On Sun, 19 Jan 2020, Rich Felker wrote:
> 
> > Conceptually the _start is an array; that means it's fine to iterate
> > over its elements, and we could even do so with the "correct" type.
> > The problem is that _end is a different symbol and thereby inherently
> > a "different object", and comparing against it with < is not valid;
> > the compiler can legitimately optimize that out. I think with != would
> > be valid, but I'm not sure we can trust compilers to honor any
> > consistency for such "one past the end" comparisons. Casting to
> > (uintptr_t) before doing the != comparison would absolutely be safe in
> > the abstract machine; whether compilers honor this is unclear (because
> > of the "provenance" stuff, which could break even the current code, so
> > arguably we should have some "provenance barrier" here).
> > 
> > Of course exit runs the array in reverse, which makes it even more of
> > a mess. _end[-1] is clearly not valid when _end is an array object,
> > and the compiler is free to break that.
> 
> I would suggest
> 
>   void (**ptr)(void);
> 
>   __asm__ ("" : "=g"(ptr) : "0"(..._end), "X"(..._start));
> 
>   while (ptr != _start) (*--ptr)();

I think we could just put the assignment outside the asm and use "+g".
Not clear why _start is needed as an input operand. It's external so
asm must be assumed to be able to see it.

BTW does the "X" constraint work all the way back to ancient gcc (and
then presumably with pcc, clang)? The first time I ever saw it was in
one of your other patches.

Rich
Alexander Monakov Jan. 19, 2020, 5:32 p.m.
On Sun, 19 Jan 2020, Rich Felker wrote:

> > I would suggest
> > 
> >   void (**ptr)(void);
> > 
> >   __asm__ ("" : "=g"(ptr) : "0"(..._end), "X"(..._start));
> > 
> >   while (ptr != _start) (*--ptr)();
> 
> I think we could just put the assignment outside the asm and use "+g".
> Not clear why _start is needed as an input operand. It's external so
> asm must be assumed to be able to see it.

In the context of LTO asms really need to mention referenced symbols in
constraints. Plus, it's good to have "the pointer is related to _start"
spelled out in the code.

> BTW does the "X" constraint work all the way back to ancient gcc (and
> then presumably with pcc, clang)? The first time I ever saw it was in
> one of your other patches.

gcc documentation says 2.95 already had it.

Alexander
Rich Felker Jan. 19, 2020, 5:38 p.m.
On Sun, Jan 19, 2020 at 08:32:57PM +0300, Alexander Monakov wrote:
> 
> 
> On Sun, 19 Jan 2020, Rich Felker wrote:
> 
> > > I would suggest
> > > 
> > >   void (**ptr)(void);
> > > 
> > >   __asm__ ("" : "=g"(ptr) : "0"(..._end), "X"(..._start));
> > > 
> > >   while (ptr != _start) (*--ptr)();
> > 
> > I think we could just put the assignment outside the asm and use "+g".
> > Not clear why _start is needed as an input operand. It's external so
> > asm must be assumed to be able to see it.
> 
> In the context of LTO asms really need to mention referenced symbols in
> constraints. Plus, it's good to have "the pointer is related to _start"
> spelled out in the code.

I think this is true for objects defined in C code, but not for ones
defined in external asm or by the linker, which are not subject to any
analysis by LTO. Both the contents of inline asm statements and the
contents of external asm are black boxes, no? I think if they're not
treated as such lots of reasonable things would break.

> > BTW does the "X" constraint work all the way back to ancient gcc (and
> > then presumably with pcc, clang)? The first time I ever saw it was in
> > one of your other patches.
> 
> gcc documentation says 2.95 already had it.

Thanks! I tried looking at 3.x docs but forgot how to construct the
URLs.

Rich
Alexander Monakov Jan. 19, 2020, 7:13 p.m.
On Sun, 19 Jan 2020, Rich Felker wrote:

> > In the context of LTO asms really need to mention referenced symbols in
> > constraints. Plus, it's good to have "the pointer is related to _start"
> > spelled out in the code.
> 
> I think this is true for objects defined in C code, but not for ones
> defined in external asm or by the linker, which are not subject to any
> analysis by LTO. Both the contents of inline asm statements and the
> contents of external asm are black boxes, no? I think if they're not
> treated as such lots of reasonable things would break.

Yes, I was speaking in a more general sense. I agree here there would be
no problem technically; I'd still insist on mentioning _start in constraints
as a matter of clarity though.

Alexander
Alexander Cherepanov Jan. 19, 2020, 9:02 p.m.
On 19/01/2020 19.22, Rich Felker wrote:
>> Even if we know that _start + k == _end it doesn't mean that we
>> allowed to subtract them.
> 
> Consider a function that takes a pointer p, an array a, and a length
> l, and does:
> 
> 	for (i=0; i<l; i++) if (a+i == p) return p-a;
> 
> Can f(_end,_start,k) and f(_start+k,_start,k) legitimately differ,
> despite _end==_start+k?

I guess it depends on what you mean by "legitimately" and "differ". 
Given that _start and _end are different arrays the first variant is 
undefined.

Counter-intuitive behavior of equal pointers could be demonstrated much 
easier. Suppose x and y are two objects of the same type and &x + 1 == 
&y. Is it valid to evaluate the following expressions: *(&x + 1), &x + 
2, (&y)[-1]?

> I think the answer is no, in the existing C
> language, in that the result of an expression is a pure function of
> the *values* put into it. 

The fact that two values are equal doesn't mean that they are the same 
value.

Take floating-point zeroes for example. They are equal but have 
different provenances: one came from the right, another one -- from the 
left:-)

> But compiler folks do not want to interpret
> it this way and are pushing through hidden "provenance" state, so...

IIUC they are not happy about it too but the alternatives are not that 
great.
Alexander Cherepanov Jan. 19, 2020, 9:34 p.m.
On 19/01/2020 19.39, Rich Felker wrote:
> On Sun, Jan 19, 2020 at 07:33:08PM +0300, Alexander Cherepanov wrote:
>> On 19/01/2020 17.46, Alexander Monakov wrote:
>>> On Sun, 19 Jan 2020, Alexander Cherepanov wrote:
>>>
>>>> Couldn't _start defined as an array? Then separate values could be accessed
>>>> simply as elements of this array. And casts to integers could be limited to
>>>> calculating the number of elements, the terminating value or something.
>>>
>>> Yeah, I think usually such linker-provided symbols are declared as
>>> extern arrays. I'm surprised that isn't the case in musl.  I don't think
>>> declaring them as arrays helps with making casts pedantically suitable for
>>> calculating number of elements though - as you said, any bijection between
>>> intptr_t and pointers would be a valid implementation of a cast, you're not
>>
>> Well, we want use from C some outside info, there could be no
>> pedantic way to do this. Let's see, we know that the _end array
>> follows the _start array in memory. This means that &_start[i] ==
>> &_end[0] for some i. But different provenance of the pointers means
>> that we cannot do it just like that. Adding a cast should fix this.
>> Summarizing, it should look like this:
>>
>> for (size_t i = 0; (uintptr_t)&_start[i] != (uintptr_t)&_end[0]; i++)
>>
>> or
>>
>> for (type *p = _start; (uintptr_t)p != (uintptr_t)_end; p++)
> 
> This works for forward walk, not backwards walk.

Oops, then asm barriers look more attractive.

>>> guaranteed that (intptr_t)&a[i] == (intptr_t)a + i * sizeof *a.
>>
>> While you are inside one object, I think this should be safe in
>> practice. For gcc, this is more or less guaranteed by [3]. BTW there
>> is an explicit restriction there:
>>
>> "When casting from pointer to integer and back again, the resulting
>> pointer must reference the same object as the original pointer,
>> otherwise the behavior is undefined. That is, one may not use
>> integer arithmetic to avoid the undefined behavior of pointer
>> arithmetic as proscribed in C99 and C11 6.5.6/8."
>>
>> [3] https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html
> 
> GCC is badly wrong here, and it breaks XOR linked lists and other
> things. 

Why is that? Integers (and pointers as it turned out) could have several 
provenances as tracked by gcc, so XOR linked lists should be fine.

> It's also worded imprecisely.

Sure.

> What does it mean if arithmetic
> is performed on the value between the cast and cast back. What if two
> pointers go into the arithmetic, but complex mathematical relations
> result in one of the original values coming out, and the compiler can
> only "see" the other pointer going in? Will it then wrongly assume
> that the result points to the same object as the pointer it "saw" go
> in?

I looked into exactly this about 3 week ago:-) Rediscovered an old gcc 
bug and found that the problem happens even without any casts -- see 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330#c28 .

> This whole provenance thing is a trashfire.

Pluses of the provenance thing: `a[i] = 1;` could be moved over `b[j] = 
2;` when `a` and `b` are different array while `i` and `j` are unknown.
Minuses of the provenance thing: slight inconvenience in cases like with 
_start & _end. The pluses seem to outweigh the minuses. Did I miss 
something important?

What I recently found definitely wrong is instability of equality `&x + 
1 == &y`. This leads to outright nonsense.