[RFC] ANY linker script syntax for non-contiguous regions

public inbox for binutils@sourceware.org

* [RFC] ANY linker script syntax for non-contiguous regions
@ 2024-02-06 23:58 Daniel Thornburgh
  2024-02-07  0:33 ` Roland McGrath
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Thornburgh @ 2024-02-06 23:58 UTC (permalink / raw)
  To: binutils

I've been working to add a feature for section packing akin to LD's
`--enable-non-contiguous-regions` to LLD. Discussing this[1] revealed a
desire to have an explicit syntax in the linker script, rather than having
a flag globally modify the behavior of wildcard matching and address
layout. I wanted to pop my head in here before implementing anything too
extensively with the hope of arriving at something that could work across
linkers.

I'm currently thinking of something like the existing `SORT_BY_xxx`
modifiers: `ANY(<tag> <wildcard_patterns>)`. This would match input
sections identically as normal, but it would record the set of matched
sections as associated with the given tag. Then, any further instances of
`ANY(<tag>)` without any wildcard patterns would provide locations that the
tagged sections could spill to, as if by a subsequent
`--enable-non-contiguous-regions` match. `ANY` refers to the possibility that
any of the tagged sections that fit may appear at that location. (The name
feels weak to me; it comes from a vague reference to an armlink feature
with wildly different semantics. Open for suggestions.)

Using tags avoids introducing a new semantics for wildcard matching; a
section would still always match exactly one wildcard. Accordingly, it
would preserve the ability to use broad wildcards to refer to anything not
yet matched by earlier more specific ones. This in turn should make porting
easier, since linker scripts may be written assuming this. By contrast,
`--enable-non-contiguous-regions` makes broad wildcards potential spill
locations for earlier matches, which may not have been intended.
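
The difference between the two approaches can be sketched in Python. This is a simplified model, not linker code: the `(output, pattern, tag)` tuples, both function names, and the use of `None` as the pattern for a tag-only `ANY(<tag>)` reference are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

def candidate_locations(section, clauses):
    """Output sections `section` may land in under the tagged proposal.

    `clauses` is a list of (output_name, pattern, tag) tuples in script
    order. A pattern of None models a hypothetical tag-only ANY(<tag>).
    """
    # A section still binds to exactly one wildcard: the first that matches.
    for index, (output, pattern, tag) in enumerate(clauses):
        if pattern is not None and fnmatchcase(section, pattern):
            break
    else:
        return []  # nothing matched; orphan handling is out of scope here
    locations = [output]
    # Only later tag-only references to the same tag are spill candidates.
    if tag is not None:
        locations += [out for out, pat, t in clauses[index + 1:]
                      if pat is None and t == tag]
    return locations

def candidate_locations_global_flag(section, clauses):
    """Simplified model of the global flag: every matching wildcard counts."""
    return [out for out, pat, _ in clauses
            if pat is not None and fnmatchcase(section, pat)]

clauses = [
    (".fast", ".text.hot.*", "hot"),  # defining use, tagged "hot"
    (".slow", None, "hot"),           # explicit spill location
    (".rest", "*", None),             # broad catch-all wildcard
]
print(candidate_locations(".text.hot.f", clauses))              # ['.fast', '.slow']
print(candidate_locations_global_flag(".text.hot.f", clauses))  # ['.fast', '.rest']
```

In the tagged model the catch-all `.rest` never becomes a spill target, whereas in the global-flag model it does.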

[1] https://discourse.llvm.org/t/rfc-lld-enable-non-contiguous-regions/76513
-- 

Daniel Thornburgh | dthorn@google.com

* Re: [RFC] ANY linker script syntax for non-contiguous regions
  2024-02-06 23:58 [RFC] ANY linker script syntax for non-contiguous regions Daniel Thornburgh
@ 2024-02-07  0:33 ` Roland McGrath
  2024-02-07 22:36   ` Daniel Thornburgh
  0 siblings, 1 reply; 7+ messages in thread
From: Roland McGrath @ 2024-02-07  0:33 UTC (permalink / raw)
  To: Daniel Thornburgh; +Cc: binutils

This sounds like it essentially creates a two-phase system for assigning
input sections to output sections.  As such, IMHO it would be cleaner not
to combine the two phases in a single syntax.  That is, your proposal seems
to have two uses of the new syntax, both inside an output section clause:
the "defining" use matches a subset of sections being selected for the
output section whose clause it's embedded in; the "referring" uses then
refer to the unassigned remainder of that subset in a later output section
clause.  What seems clearer to me is to have a different kind of clause
that's not an output section but only defines a new subset of input
sections for later use.  Perhaps:

```
SECTIONS {
  /* Traditional output section clause: */
  .output1 { *(.input1) }

  /* New syntax that is not an output section clause: */
  SECTION_CLASS(class1) { *(.input2) }

  /* Output section clause referring to SECTION_CLASS: */
  .output2 {
    *(.input3) /* normal section wildcard */

    SECTION_CLASS(class1) /* reference to previously defined class */

    *(.input4) /* normal section wildcard */
  }

  .output3 {
    SECTION_CLASS(class1) /* reference to remainder of class not in .output2 */
  }

  .output4 {
    /* reference to remainder not in .output[23], sorting applied to them */
    SORT_BY_NAME(SECTION_CLASS(class1))
  }

  /* This cannot match anything that went into a SECTION_CLASS and orphans
     placement does not apply to them so it's an error if any
     SECTION_CLASS-matched input section has not been assigned to a
     previous output section. */
  .output5 { *(*) }
}
```

The idea is that `SECTION_CLASS(<class name>)` works like an output section
clause in that position as far as its input section wildcards matching and
"consuming" input sections that are unassigned at that point in the script.
(Nothing else but input section wildcards would be allowed inside the
braces of a `SECTION_CLASS` clause.)  The sections it has matched are
"consumed" such that they cannot be matched by any normal input section
wildcard later in the script.  They are also excluded from orphans logic.

The only way these input sections can now be placed is by having
`SECTION_CLASS(<class name>)` appear in one or more output section clauses.
That causes as many of those input sections to be placed in that output
section as can fit.  Any that don't fit remain in that SECTION_CLASS's list
of unassigned sections to be drawn from by another `SECTION_CLASS` input
clause.
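
A minimal Python sketch of this draw-down behaviour, under assumptions the description above does not settle: sections are tried in class order, a section that does not fit is skipped rather than ending the scan, and alignment padding is ignored. The name `place_class` and the `(name, size)` representation are hypothetical.

```python
def place_class(pool, capacity):
    """Move as many sections from `pool` as fit into `capacity` bytes.

    `pool` is a list of (name, size) pairs in class order. Placed sections
    are removed from it, so a later reference only sees the remainder.
    """
    placed, remaining, used = [], [], 0
    for name, size in pool:
        if used + size <= capacity:
            placed.append(name)
            used += size
        else:
            remaining.append((name, size))
    pool[:] = remaining  # mutate in place: the class keeps what was not placed
    return placed

class1 = [("a", 6), ("b", 8), ("c", 2)]
print(place_class(class1, 10))  # ['a', 'c']  first reference, 10 bytes free
print(class1)                   # [('b', 8)]  left for a later reference
print(place_class(class1, 16))  # ['b']       second reference
```

Once the pool is empty, any further reference to the class places nothing.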

Where they do appear, they can appear inside the SORT_* to have that
sorting applied to the subset.  I'm not sure it's actually meaningful to
select a subset and then sort within that subset, so perhaps it would be
better to only support the SORT_* modifiers in the usual way on the input
section wildcards in the defining `SECTION_CLASS` clause, and then every
reference has to always just draw in order from that sorted list.

Note that this implicitly provides the option to insert:
```
  SECTION_CLASS(orphans) { *(*) }
```
at the end of the whole SECTIONS clause.  This means that all unassigned
sections go into the `orphans` class.  Since no output section clause
consumes from `SECTION_CLASS(orphans)`, then it's an error if the class is
nonempty.  This provides a way for a linker script to rule out any unwanted
orphans placement, which is challenging today.  (It's probably possible
with SHF_ALLOC flags matching and ASSERT, but quite picayune.)
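
As a rough illustration of that check, the Python sketch below treats a trailing catch-all class as the set of sections no earlier pattern matched and fails if that set is non-empty. The function names and the flat list of patterns are assumptions; a real linker script has far more structure.

```python
from fnmatch import fnmatchcase

def collect_orphans(input_sections, earlier_patterns):
    """Sections that a trailing catch-all class would capture."""
    return [s for s in input_sections
            if not any(fnmatchcase(s, p) for p in earlier_patterns)]

def check_no_orphans(input_sections, earlier_patterns):
    """Raise if the never-referenced 'orphans' class would be non-empty."""
    orphans = collect_orphans(input_sections, earlier_patterns)
    if orphans:
        raise ValueError(f"unplaced sections in class 'orphans': {orphans}")

patterns = [".text*", ".data*"]
check_no_orphans([".text.main", ".data.x"], patterns)  # passes silently
try:
    check_no_orphans([".text.main", ".weird"], patterns)
except ValueError as err:
    print(err)
```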

* Re: [RFC] ANY linker script syntax for non-contiguous regions
  2024-02-07  0:33 ` Roland McGrath
@ 2024-02-07 22:36   ` Daniel Thornburgh
  2024-02-07 23:23     ` Daniel Thornburgh
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Thornburgh @ 2024-02-07 22:36 UTC (permalink / raw)
  To: Roland McGrath; +Cc: binutils

This is interesting; I hadn't thought of breaking out tagging into its own
construct. I do like the name SECTION_CLASS; it makes it clear that
elements from that class could be placed there, and the only plausible
meaning for multiple references to the same class would be to allow them to
spill.

This also slightly increases the power of regular matching, since you could
catch a group of sections earlier in a linker script and place them later,
perhaps to avoid a broad wildcard needed for output sections that should
appear earlier in a memory region.

E.g.
```
SECTIONS {
   SECTION_CLASS(specific) { *(.specific) }
   .first_output { *(*) } >m1
   .second_output { SECTION_CLASS(specific) } >m1
}
```

On the down side, it would make it harder to port existing linker scripts.
To spill a wildcard match inside an output section with strong ordering
requirements, you would need to define classes for anything that precedes
it.

Without spilling:
```
SECTIONS {
  .foo {
    *(.foo.first.*)
    . = . < 0xc000 ? 0xc000 : .;  /* or something similarly horrible */
    *(.foo.second.*)

    /* Excludes .foo.first.* and .foo.second.* by virtue of ordering */
    *(.foo.*)
  }>m1
}
```

With spilling:
```
SECTIONS {
  /* A whole collection now needs to be broken out of the output section to
     preserve its semantics */
  SECTION_CLASS(foo_first) { *(.foo.first.*) }
  SECTION_CLASS(foo_second) { *(.foo.second.*) }
  /* If only this section class were defined, then it would unintentionally
     capture .foo.first and .foo.second */
  SECTION_CLASS(foo_rest) { *(.foo.*) }

  .foo {
    SECTION_CLASS(foo_first)
    . = . < 0xc000 ? 0xc000 : .;
    SECTION_CLASS(foo_second)
    SECTION_CLASS(foo_rest)
  }>m1
  .foo_alt {
    /* foo_rest is the only desirable spill, but it forces sweeping
       changes to the linker script */
    SECTION_CLASS(foo_rest)
  }>m2
}
```
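
Why all three classes have to be broken out follows from first-match binding, which can be modelled in a few lines of Python. This is a hedged sketch: `assign` and the section names are hypothetical, and `fnmatchcase` only approximates linker wildcard matching.

```python
from fnmatch import fnmatchcase

def assign(sections, patterns):
    """Bind each section to the first (class_name, pattern) that matches it."""
    result = {name: [] for name, _ in patterns}
    for sec in sections:
        for name, pat in patterns:
            if fnmatchcase(sec, pat):
                result[name].append(sec)
                break  # first match wins; later patterns never see it
    return result

sections = [".foo.first.a", ".foo.second.b", ".foo.misc"]

# Hoisting only the broad class ahead of .foo captures everything.
only_rest = assign(sections, [("foo_rest", ".foo.*")])

# Hoisting all three, in the original order, preserves the split.
all_three = assign(sections, [
    ("foo_first", ".foo.first.*"),
    ("foo_second", ".foo.second.*"),
    ("foo_rest", ".foo.*"),
])
print(only_rest)
print(all_three)
```

With only `foo_rest` defined, all three sections land in it; with all three classes defined, `foo_rest` receives just `.foo.misc`.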

Progressive enhancement of existing scripts is essential for this to be
useful in practice. As far as I can see, since linker script semantics are
so imperative, that requires section classes to be nameable in the same
place as existing wildcards. That being said, as mentioned earlier, naming
outside output sections also adds power, so it might be useful too, but it
seems like it would need to be judged on merit of the extra power added.

On Tue, Feb 6, 2024 at 4:33 PM Roland McGrath <mcgrathr@google.com> wrote:

> ```
> SECTIONS {
>   /* New syntax that is not an output section clause: */
>   SECTION_CLASS(class1) { *(.input2) }
>
>   /* Output section clause referring to SECTION_CLASS: */
>   .output2 {
>     *(.input3) /* normal section wildcard */
>
>     SECTION_CLASS(class1) /* reference to previously defined class */
>
>     *(.input4) /* normal section wildcard */
>   }
>   .output4 {
>     /* reference to remainder not in .output[23], sorting applied to them */
>     SORT_BY_NAME(SECTION_CLASS(class1))
>   }
> }
> ```
>
> Where they do appear, they can appear inside the SORT_* to have that
> sorting applied to the subset.  I'm not sure it's actually meaningful to
> select a subset and then sort within that subset, so perhaps it would be
> better to only support the SORT_* modifiers in the usual way on the input
> section wildcards in the defining `SECTION_CLASS` clause, and then every
> reference has to always just draw in order from that sorted list.

I would expect SECTION_CLASS uses in output sections to be referentially
transparent with the exception of the packing behavior and the ordering of
section "consumption". They should broadly behave as if the named wildcards
had actually appeared in order in the corresponding location; that should
allow providing a set of allowed and disallowed SORT_x family
specifications to match the current behavior. That would support using
SECTION_CLASS to break out an existing wildcard to run it earlier, since it
would provide some assurance that this wouldn't change the behavior beyond
that which was intended.

-- 

Daniel Thornburgh | dthorn@google.com

* Re: [RFC] ANY linker script syntax for non-contiguous regions
  2024-02-07 22:36   ` Daniel Thornburgh
@ 2024-02-07 23:23     ` Daniel Thornburgh
  2024-02-10  6:33       ` Fangrui Song
                         ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Daniel Thornburgh @ 2024-02-07 23:23 UTC (permalink / raw)
  To: Roland McGrath; +Cc: binutils

One more tack on: maybe just CLASS instead of SECTION_CLASS for the name...
it's only legal within SECTIONS {}, and none of the other things within
contain the word SECTION.

-- 

Daniel Thornburgh | dthorn@google.com

* Re: [RFC] ANY linker script syntax for non-contiguous regions
  2024-02-07 23:23     ` Daniel Thornburgh
@ 2024-02-10  6:33       ` Fangrui Song
       [not found]       ` <DS7PR12MB5765A6AA68F691E3A9641AABCB4A2@DS7PR12MB5765.namprd12.prod.outlook.com>
  2024-05-01  5:07       ` Fangrui Song
  2 siblings, 0 replies; 7+ messages in thread
From: Fangrui Song @ 2024-02-10  6:33 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Daniel Thornburgh, Roland McGrath, binutils

+Christophe

* Re: [RFC] ANY linker script syntax for non-contiguous regions
       [not found]       ` <DS7PR12MB5765A6AA68F691E3A9641AABCB4A2@DS7PR12MB5765.namprd12.prod.outlook.com>
@ 2024-02-12 10:04         ` Christophe Lyon
  0 siblings, 0 replies; 7+ messages in thread
From: Christophe Lyon @ 2024-02-12 10:04 UTC (permalink / raw)
  To: Fangrui Song; +Cc: Daniel Thornburgh, Roland McGrath, binutils

Hi!

Maybe I can provide some background about the discussions that took
place on this list when I implemented --enable-non-contiguous-regions.

My first implementation proposal was
https://sourceware.org/legacy-ml/binutils/2019-11/msg00402.html, which
led to some discussion (sorry, the list archives are not easy to
browse across months; the discussion continued until March 2020, when I
finally committed the patch).

This was a result of my follow-up in June 2019
(https://sourceware.org/legacy-ml/binutils/2019-06/msg00254.html) to a
discussion originally started in Feb 2017:
https://sourceware.org/legacy-ml/binutils/2017-02/msg00250.html

Hopefully these discussions can give you more background.

IMO, a big advantage of the global flag is that it makes the
implementation not too intrusive, and it's easier to understand for
end-users. In my experience, many users have trouble understanding how
to write a linker script, so having multiple keywords with "subtle"
differences or changes in behaviour depending on the context is
confusing.

Christophe



* Re: [RFC] ANY linker script syntax for non-contiguous regions
  2024-02-07 23:23     ` Daniel Thornburgh
  2024-02-10  6:33       ` Fangrui Song
       [not found]       ` <DS7PR12MB5765A6AA68F691E3A9641AABCB4A2@DS7PR12MB5765.namprd12.prod.outlook.com>
@ 2024-05-01  5:07       ` Fangrui Song
  2 siblings, 0 replies; 7+ messages in thread
From: Fangrui Song @ 2024-05-01  5:07 UTC (permalink / raw)
  To: Daniel Thornburgh; +Cc: Roland McGrath, binutils, Christophe Lyon, Nick Clifton

The "CLASS" syntax is useful. I've filed a feature request:
https://sourceware.org/bugzilla/show_bug.cgi?id=31688 ("ld: Add CLASS
to allow separate section matching and referring")
