[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v6 1/4] xen: introduce SYMBOL



On Thu, 17 Jan 2019, Jan Beulich wrote:
> >>> On 17.01.19 at 01:37, <sstabellini@xxxxxxxxxx> wrote:
> > On Wed, 16 Jan 2019, Jan Beulich wrote:
> >> In any event - since intermediate variables merely hide the
> >> casting from the compiler, but they don't remove the casts, the
> >> solution involving casts is better imo, for incurring less overhead.
> > 
> > This is where I completely disagree. The intermediate variables are not
> > hiding casts from the compiler. There were never any pointers in this
> > case.  The linker creates "symbols", not pointers, completely invisible
> > from C land. Assembly uses these symbols to initialize variables. We
> > expose these assembly variables as integer to C lands. LD scripts and
> > assembly have their own terminology and rules: neither "_start" nor
> > "start" are pointers at any point in time. The operations done in var.S
> > is not a cast. The C spec is happy, the compiler is happy, MISRA-C is
> > happy. And we get to avoid the ugly SYMBOL macro that Linux uses. It is
> > really a win-win.
> 
> Well, that's a position one can take. But we have to settle on another
> aspect then first: Does what is not done in C underly C's rules? I
> thought you were of the opinion that what comes from linker scripts
> does. In which case what comes from assembly files ought to, too.
> (FAOD my implication is: If the answer is yes, both approaches
> violate C's rules. If the answer is no, no change is needed at all.)

Great question, that is the core of the issue. Also, let me premise that
I agree on the comments you made on the patches (I dislike "start_"
too), and I can address them if we agree to continue down this path.

But no, I do not think that what is done outside of C-land should follow
C rules. But I do not agree with your conclusion that in that case there
is no difference between the approaches. Let's get more into the
details.


1) SYMBOL_HIDE returning pointer type

Let's take _start and _end as an example. _start is born as a linker
symbol, and it becomes a C pointer when we do:

  extern char _start[], _end[]

Now it is a pointer (actually I should say an array, but let's pretend
they are the same thing for this discussion).

When we do:

  SYMBOL_HIDE(_end) - SYMBOL_HIDE(_start)

We are still subtracting pointers: the pointers returned by SYMBOL_HIDE.
We cannot prove that they are pointers to the same object or subsequence
objects in memory, so it is undefined behavior, which is not allowed.
This solution allows us to highlight the problematic call sites, it
helps with compiler issues, but I am not convinced it helps with C
compliance. Better than nothing, but the worst of the lot.


2) SYMBOL_HIDE returning unsigned long

Similarly to the previous case, _start and _end are born as linker
symbols and become pointers when we do:

  extern char _start[], _end[]

Then the operation:

  SYMBOL_HIDE(_end) - SYMBOL_HIDE(_start)

SYMBOL_HIDE returns unsigned long, so the pointers disappear. We are
comparing unsigned long, which should solve the C compliance issue.
Because the pointers to unsigned long conversion is done in assembly, C
compliance does not have a say on the nature of the unsigned long
returned by SYMBOL_HIDE.

However, given that _start and _end are still defined as pointers in
C-land and given that SYMBOL_HIDE, although assembly, is basically a
fancy cast, I concede that the solution is not ideal. I still think is
acceptable, but inferior to the next solution.


3) var.S + start_ as unsigned long

With this approach, _start is born as a linker symbol. It is never
exported to C, so from C point of view, it doesn't exist. There is
another variable named "start_" defined in assembly and initialized to
_start. Now we go into C land with:

  extern uintptr_t start_, end_

start_ and end_ are uintptr_t from the beginning from C point of view.
They have never been pointers or in any way connected to _start. They
are "clean".

When we do:

  _end - _start

it is a subtraction between uintptr_t, which is allowed. When we do:

    for ( call = (const initcall_t *)initcall_start_;
          (uintptr_t)call < presmp_initcall_end_;

The comparison is still between uintptr_t types, and the value of "call"
still comes from an unsigned long initially. There is never a comparison
between dubious pointers. (Interger to pointer conversions and pointer
to integer conversions are allowed by MISRA with some limitations, but I
am double-checking.) Even:

   (uintptr_t)random_pointer < presmp_initcall_end_

would be acceptable because presmp_initcall_end_ is an integer and has
always been an integer from C point of view.


However, there are still a couple of issued not correctly solved by v8
of the series. For starters: 

        apply_alternatives((struct alt_instr *)alt_instructions_,
                           (struct alt_instr *)alt_instructions_end_);

I can see how the pointers comparisons in apply_alternatives could be
considered wrong given the way the pointers are initialized:

    for ( a = base = start; a < end; a++ )
    {

start and end come from alt_instructions_ and alt_instructions_end_. It
doesn't matter that alt_instructions_ and alt_instructions_end_ are
"special", they could be perfectly normal integers and we would still
have the same problem: we cannot prove that "start" and "end" point to
the same object or subsequent objects in memory.

The way to fix it is by changing the parameters of apply_alternatives to
interger types, making comparison between integers, and only using
pointers to access the data.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.