Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?

public inbox for binutils@sourceware.org

* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
@ 2017-03-09 12:06 Erik Christiansen
  2017-06-09 12:21 ` Tejas Belagod
  0 siblings, 1 reply; 17+ messages in thread
From: Erik Christiansen @ 2017-03-09 12:06 UTC
  To: Tejas Belagod; +Cc: Thomas Preudhomme, binutils

As two prior posts of this content have failed to appear on-list,
apparently due to an auto-spamblock, we'll try again after invoking an
unblock, and waiting a bit.

On 07.03.17 11:06, Tejas Belagod wrote:
> On 03/03/17 10:27, Erik Christiansen wrote:
> > There would undoubtedly be some real effort involved in tweaking ld to
> > rebase input pattern "first match" on output section overflow - where a
> > subsequent match is available. Whether that would best be done as a
> > "second match" search when needed, or replacing "first match" with a
> > list of matches at the outset, remains to be seen. The difference
> > between theory and practice always looks smaller from this side.
> > 
> 
> I like the approach you've proposed. I admit it is more practical than
> extending the syntax for more regions. But, I see 2 disadvantages that are
> more cosmetic than anything else:
> 
> 1. Is the duplication of patterns over multiple output section regions as
> expressive of the intent as using '> REGION1, REGION2,..., REGIONX'? Though
> you could argue that if the subsequent-match flowing feature is controlled
> by a command-line switch, the user knows what they're doing and the
> intention would be implicit.

Both flowing notations make the intent explicit in the linker script, I
think, but when adding inter-region flow, it is worthwhile considering
more than one use case. Requests for flowing around holes have appeared
on this list before, and flowing input sections according to a single
set of ordering patterns is the simplest case, without much flexibility.

Let us consider an embedded application where flowing of the bulk of
input sections is required, but some of them must remain together for
addressing reasons (e.g. pointer-relative reach, or page-zero addressing).
That requires differing input section sorting between output sections,
but is not possible when a single set of patterns is forced on all the
flow regions.

For such cases, and more complex ones where we might require another
group of input sections to be herded into e.g. the last region (it might
be slower bulk memory, best for low-use data, perhaps), it is essential
that the linker script be able to specify such sorting, optimally by the
existing scripting method of utilising an output section to contain the
sorting pattern subset for the associated memory region.

For the simple repetitive case, copy/paste is the slightest of burdens
which disappears instantly the moment the product requirements evolve
due to the hardware boffins or customer requirements necessitating
selective flowing. That is not the time to discover that we have
designed all flexibility out of the flowing implementation, I submit.
(Most especially when the flexibility comes at no cost beyond the pattern
rebasing for basic flowing.)

Yes, a command line switch for enabling flowing might be a useful
safeguard, as we have discussed.

> 2. If we have complex patterns matching input sections/filenames,
> duplicating it over multiple output sections statements might be prone to
> copy-paste errors. Keeping them consistent after changes means diligently
> replicating them everywhere - adds to maintenance overhead.

Hmmm ... the initial replication is mere copy/paste, but I take your
point about subsequent maintenance of the simple case. However, being
able to edit code/makefiles/scripts is a basic programmer prerequisite.
Muck up a makefile, and we discover the need for accuracy too.

Currently, we similarly flow by manually tweaking input section patterns
in output sections, so creating and keeping an eye on the patterns is
nothing new, really. 

Importantly, if any input sections need to be herded about, the
mechanism for it has not been designed out.

> I agree that replacing the first-match rule with a subsequent match rule
> controlled by a command-line switch is much much lower implementation cost.
> It will be interesting to hear views of a maintainer about the preferred
> approach.

I'm also intrigued whether you'd in the end be satisfied with a partial
flowing solution, covering only one use case, when less effort covers
a very broad diversity, using existing notation and usage. ;-)

Erik

* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2017-03-09 12:06 [RFC] Allow linker scripts to specify multiple output regions for an output section? Erik Christiansen
@ 2017-06-09 12:21 ` Tejas Belagod
  2017-06-09 13:35   ` Erik Christiansen
  0 siblings, 1 reply; 17+ messages in thread
From: Tejas Belagod @ 2017-06-09 12:21 UTC
  To: dvalin; +Cc: Thomas Preudhomme, binutils

On 09/03/17 12:06, Erik Christiansen wrote:
> As two prior posts of this content have failed to appear on-list,
> apparently due to an auto-spamblock, we'll try again after invoking an
> unblock, and waiting a bit.
>
> On 07.03.17 11:06, Tejas Belagod wrote:
>> On 03/03/17 10:27, Erik Christiansen wrote:
>>> There would undoubtedly be some real effort involved in tweaking ld to
>>> rebase input pattern "first match" on output section overflow - where a
>>> subsequent match is available. Whether that would best be done as a
>>> "second match" search when needed, or replacing "first match" with a
>>> list of matches at the outset, remains to be seen. The difference
>>> between theory and practice always looks smaller from this side.
>>>
>>
>> I like the approach you've proposed. I admit it is more practical than
>> extending the syntax for more regions. But, I see 2 disadvantages that are
>> more cosmetic than anything else:
>>
>> 1. Is the duplication of patterns over multiple output section regions as
>> expressive of the intent as using '> REGION1, REGION2,..., REGIONX'? Though
>> you could argue that if the subsequent-match flowing feature is controlled
>> by a command-line switch, the user knows what they're doing and the
>> intention would be implicit.
>
> Both flowing notations make the intent explicit in the linker script, I
> think, but when adding inter-region flow, it is worthwhile considering
> more than one use case. Requests for flowing around holes have appeared
> on this list before, and flowing input sections according to a single
> set of ordering patterns is the simplest case, without much flexibility.
>
> Let us consider an embedded application where flowing of the bulk of
> input sections is required, but some of them must remain together for
> addressing reasons (e.g. pointer-relative reach, or page-zero addressing).
> That requires differing input section sorting between output sections,
> but is not possible when a single set of patterns is forced on all the
> flow regions.
>
> For such cases, and more complex ones where we might require another
> group of input sections to be herded into e.g. the last region (it might
> be slower bulk memory, best for low-use data, perhaps), it is essential
> that the linker script be able to specify such sorting, optimally by the
> existing scripting method of utilising an output section to contain the
> sorting pattern subset for the associated memory region.
>
> For the simple repetitive case, copy/paste is the slightest of burdens
> which disappears instantly the moment the product requirements evolve
> due to the hardware boffins or customer requirements necessitating
> selective flowing. That is not the time to discover that we have
> designed all flexibility out of the flowing implementation, I submit.
> (Most especially when the flexibility comes at no cost beyond the pattern
> rebasing for basic flowing.)
>
> Yes, a command line switch for enabling flowing might be a useful
> safeguard, as we have discussed.
>
>> 2. If we have complex patterns matching input sections/filenames,
>> duplicating it over multiple output sections statements might be prone to
>> copy-paste errors. Keeping them consistent after changes means diligently
>> replicating them everywhere - adds to maintenance overhead.
>
> Hmmm ... the initial replication is mere copy/paste, but I take your
> point about subsequent maintenance of the simple case. However, being
> able to edit code/makefiles/scripts is a basic programmer prerequisite.
> Muck up a makefile, and we discover the need for accuracy too.
>
> Currently, we similarly flow by manually tweaking input section patterns
> in output sections, so creating and keeping an eye on the patterns is
> nothing new, really.
>
> Importantly, if any input sections need to be herded about, the
> mechanism for it has not been designed out.
>
>> I agree that replacing the first-match rule with a subsequent match rule
>> controlled by a command-line switch is much much lower implementation cost.
>> It will be interesting to hear views of a maintainer about the preferred
>> approach.
>
> I'm also intrigued whether you'd in the end be satisfied with a partial
> flowing solution, covering only one use case, when less effort covers
> a very broad diversity, using existing notation and usage. ;-)
>
> Erik
>

Hi,

Just wanted to give you the latest on this. Unfortunately, I have suspended 
working on this feature for the foreseeable future owing to a shift in 
priorities. Will update if there is a change in circumstances.

Thanks,
Tejas.


* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2017-06-09 12:21 ` Tejas Belagod
@ 2017-06-09 13:35   ` Erik Christiansen
  2019-06-27 12:58     ` Christophe Lyon
  0 siblings, 1 reply; 17+ messages in thread
From: Erik Christiansen @ 2017-06-09 13:35 UTC
  To: binutils

On 09.06.17 13:20, Tejas Belagod wrote:
> > > I agree that replacing the first-match rule with a subsequent match rule
> > > controlled by a command-line switch is much much lower implementation cost.
> > > It will be interesting to hear views of a maintainer about the preferred
> > > approach.
...

> Hi,
> 
> Just wanted to give you the latest on this. Unfortunately, I have suspended
> working on this feature for the foreseeable future owing to a shift in
> priorities. Will update if there is a change in circumstances.

Ah, pity, but life is the art of the possible, not the ideal.
Will look forward to hearing of any resumption, should it occur.

Erik

* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2017-06-09 13:35   ` Erik Christiansen
@ 2019-06-27 12:58     ` Christophe Lyon
  2019-07-02  6:49       ` Erik Christiansen
  0 siblings, 1 reply; 17+ messages in thread
From: Christophe Lyon @ 2019-06-27 12:58 UTC
  To: dvalin, Tejas Belagod; +Cc: binutils, Maxim Kuvyrkov, Peter Smith

Hi!

On Fri, 9 Jun 2017 at 15:35, Erik Christiansen <dvalin@internode.on.net> wrote:
>
> On 09.06.17 13:20, Tejas Belagod wrote:
> > > > I agree that replacing the first-match rule with a subsequent match rule
> > > > controlled by a command-line switch is much much lower implementation cost.
> > > > It will be interesting to hear views of a maintainer about the preferred
> > > > approach.
> ...
>
> > Hi,
> >
> > Just wanted to give you the latest on this. Unfortunately, I have suspended
> > working on this feature for the foreseeable future owing to a shift in
> > priorities. Will update if there is a change in circumstances.
>
> Ah, pity, but life is the art of the possible, not the ideal.
> Will look forward to hearing of any resumption, should it occur.
>


We have received requests to support non-contiguous memory regions in
the BFD linker, so it seems it is time to resurrect this thread :-)

IIUC, Erik made a proposal that seems simpler to implement than the
initial one in
https://sourceware.org/ml/binutils/2017-03/msg00020.html

Tejas, on your side, do you have any news about this project? (ideally
in terms of implementation, but updated specs would be good too)

Thanks,

Christophe


> Erik

* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2019-06-27 12:58     ` Christophe Lyon
@ 2019-07-02  6:49       ` Erik Christiansen
  2019-07-11  8:42         ` Christophe Lyon
  0 siblings, 1 reply; 17+ messages in thread
From: Erik Christiansen @ 2019-07-02  6:49 UTC
  To: Christophe Lyon; +Cc: Tejas Belagod, binutils, Maxim Kuvyrkov, Peter Smith

On 27.06.19 14:58, Christophe Lyon wrote:
> On Fri, 9 Jun 2017 at 15:35, Erik Christiansen <dvalin@internode.on.net> wrote:
> > On 09.06.17 13:20, Tejas Belagod wrote:
> > > > > I agree that replacing the first-match rule with a subsequent match rule
> > > > > controlled by a command-line switch is much much lower implementation cost.
> > > > > It will be interesting to hear views of a maintainer about the preferred
> > > > > approach.
> > ...

> We have received requests to support non-contiguous memory regions in
> the BFD linker, so it seems it is time to resurrect this thread :-)
> 
> IIUC, Erik made a proposal that seems simpler to implement than the
> initial one in
> https://sourceware.org/ml/binutils/2017-03/msg00020.html
> 
> Tejas, on your side, do you have any news about this project? (ideally
> in terms of implementation, but updated specs would be good too)

Having not heard from Tejas or Thomas since the on-list posts in 2017,
the closest I can offer to draft specs is a summary of my recollection
of the consensus reached:

A memory region with a hole is identical to two non-contiguous memory
regions, if flowing of input sections preempts the linker's existing
overflow detection. Existing linker script syntax can specify the
regions, but the flowing is currently lacking. Flowing should occur when
the remaining space in a region cannot accommodate the next full input
section about to be allocated, and the current input section matches a
pattern in a subsequent output section, directed to another memory
region. (The comparison is (LENGTH - SIZEOF) vs input section size, as we
have a 1:1 correspondence between output section and memory region for
flow steering, anyway.)

Flowing granularity is full input sections. That avoids needing to
rework relocations within an input section, due to flowing. Care may
need to be taken to still invoke existing linker code to provide
(default or explicit) FILL to the memory remnant. (I.e. slip it in
before resuming allocation in the next region.)

A command line switch would usefully protect against any backward linker
script incompatibility.

A new refinement?: The required search for a second pattern match might
be more simply implemented by remembering the output section in which
the first match was found, and when needing a second match, resuming the
search from the following output section. When available subsequent
matches are exhausted, the existing overflow detection is finally
reached, giving consistent behaviour.
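
As a rough illustration only, the rule summarised above might be modelled
like this. It is a Python sketch, not ld code: OutputSection and place are
names invented for the sketch, "used" stands in for SIZEOF, FILL of the
remnant is not modelled, and the search for every input section starts at
its first match, which is one possible reading of the summary rather than
settled behaviour.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class OutputSection:
    name: str
    patterns: list          # input-section globs, e.g. [".data", ".data.*"]
    length: int             # LENGTH of the memory region it is directed to
    used: int = 0           # stands in for SIZEOF (output section)
    placed: list = field(default_factory=list)


def place(input_sections, output_sections):
    """Place (name, size) input sections, flowing to the next match on overflow."""
    for name, size in input_sections:
        for out in output_sections:
            if not any(fnmatchcase(name, p) for p in out.patterns):
                continue                      # no pattern match here
            if out.length - out.used >= size:
                out.placed.append(name)       # whole input section fits
                out.used += size
                break
            # Does not fit: resume the search from the following output section.
        else:
            # Matches exhausted: fall through to ordinary overflow detection.
            raise OverflowError(f"no matching output section has room for {name}")
    return output_sections


# Hypothetical sizes; region lengths follow the MEMORY example in this thread.
outputs = [
    OutputSection(".raml", [".boot", ".data", ".data.*"], 0x10000),
    OutputSection(".ramu", [".data", ".data.*"], 0x40000),
    OutputSection(".ramz", [".data", ".data.*"], 0x40000),
]
place([(".data", 0xFFF0), (".data.a", 0xF0), (".data.b", 0x3000)], outputs)
for out in outputs:
    print(out.name, out.placed, hex(out.length - out.used), "free")
```

Only whole input sections move, so nothing inside a section needs
re-relocating because of the flow.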

The simplified implementation does not require modification of linker
script syntax. It also allows explicit placement of chosen input
sections in a preferred memory section. In addition to simple flowing of
*(.data) *(.data.*):

MEMORY
{
  RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
  RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
  RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000
}

SECTIONS
{
   .raml : AT ( ADDR (.text) + SIZEOF (.text) )
   {  _raml_start = . ;
      *(.boot) ;
      *(.data) *(.data.*) ;
      _raml_end = . ;
   } > RAML

   .ramu : AT ( ADDR (.raml) + SIZEOF (.raml) )
   {  _ramu_start = . ;
      *(.data) *(.data.*) ;
      _ramu_end = . ;
   } > RAMU

   .ramz : AT ( ADDR (.ramu) + SIZEOF (.ramu) )
   {  _ramz_start = . ;
      *(.data) *(.data.*) ;
      version.data
      _ramz_end = . ;
   } > RAMZ
}

additional patterns can be specified to allocate key input sections in a
specific memory region. Such control would not be achievable with a new
syntax ">RAML,RAMU,RAMZ" implementation.

It's not a very detailed "spec", but it's the strategy to date. (Barring
anything that I've forgotten.)

It would be interesting to hear how well that matches your use case, as
this is the time to wrangle any wrinkles, while maintaining existing
user flexibility.

Erik

* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2019-07-02  6:49       ` Erik Christiansen
@ 2019-07-11  8:42         ` Christophe Lyon
  2019-07-24  7:28           ` Nick Clifton
  0 siblings, 1 reply; 17+ messages in thread
From: Christophe Lyon @ 2019-07-11  8:42 UTC
  To: Christophe Lyon, Tejas Belagod, binutils, Maxim Kuvyrkov,
	Peter Smith, Alan Modra, nick clifton

On Tue, 2 Jul 2019 at 08:49, Erik Christiansen <dvalin@internode.on.net> wrote:
>
> On 27.06.19 14:58, Christophe Lyon wrote:
> > On Fri, 9 Jun 2017 at 15:35, Erik Christiansen <dvalin@internode.on.net> wrote:
> > > On 09.06.17 13:20, Tejas Belagod wrote:
> > > > > > I agree that replacing the first-match rule with a subsequent match rule
> > > > > > controlled by a command-line switch is much much lower implementation cost.
> > > > > > It will be interesting to hear views of a maintainer about the preferred
> > > > > > approach.
> > > ...
>
> > We have received requests to support non-contiguous memory regions in
> > the BFD linker, so it seems it is time to resurrect this thread :-)
> >
> > IIUC, Erik made a proposal that seems simpler to implement than the
> > initial one in
> > https://sourceware.org/ml/binutils/2017-03/msg00020.html
> >
> > Tejas, on your side, do you have any news about this project? (ideally
> > in terms of implementation, but updated specs would be good too)
>
> Having not heard from Tejas or Thomas since the on-list posts in 2017,
> the closest I can offer to draft specs is a summary of my recollection
> of the consensus reached:
>
> A memory region with a hole is identical to two non-contiguous memory
> regions, if flowing of input sections preempts the linker's existing
> overflow detection. Existing linker script syntax can specify the
> regions, but the flowing is currently lacking. Flowing should occur when
> the remaining space in a region cannot accommodate the next full input
> section about to be allocated, and the current input section matches a
> pattern in a subsequent output section, directed to another memory
> region. (The comparison is (LENGTH - SIZEOF) vs input section size, as we
> have a 1:1 correspondence between output section and memory region for
> flow steering, anyway.)
>
> Flowing granularity is full input sections. That avoids needing to
> rework relocations within an input section, due to flowing. Care may
> need to be taken to still invoke existing linker code to provide
> (default or explicit) FILL to the memory remnant. (I.e. slip it in
> before resuming allocation in the next region.)
>
> A command line switch would usefully protect against any backward linker
> script incompatibility.
>
> A new refinement?: The required search for a second pattern match might
> be more simply implemented by remembering the output section in which
> the first match was found, and when needing a second match, resuming the
> search from the following output section. When available subsequent
> matches are exhausted, the existing overflow detection is finally
> reached, giving consistent behaviour.
>
> The simplified implementation does not require modification of linker
> script syntax. It also allows explicit placement of chosen input
> sections in a preferred memory section. In addition to simple flowing of
> *(.data) *(.data.*):
>
> MEMORY
> {
>   RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
>   RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
>   RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000
> }
>
> SECTIONS
> {
>    .raml : AT ( ADDR (.text) + SIZEOF (.text) )
>    {  _raml_start = . ;
>       *(.boot) ;
>       *(.data) *(.data.*) ;
>       _raml_end = . ;
>    } > RAML
>
>    .ramu : AT ( ADDR (.raml) + SIZEOF (.raml) )
>    {  _ramu_start = . ;
>       *(.data) *(.data.*) ;
>       _ramu_end = . ;
>    } > RAMU
>
>    .ramz : AT ( ADDR (.ramu) + SIZEOF (.ramu) )
>    {  _ramz_start = . ;
>       *(.data) *(.data.*) ;
>       version.data
>       _ramz_end = . ;
>    } > RAMZ
> }
>
> additional patterns can be specified to allocate key input sections in a
> specific memory region. Such control would not be achievable with a new
> syntax ">RAML,RAMU,RAMZ" implementation.
>
> It's not a very detailed "spec", but it's the strategy to date. (Barring
> anything that I've forgotten.)
>
> It would be interesting to hear how well that matches your use case, as
> this is the time to wrangle any wrinkles, while maintaining existing
> user flexibility.
>

Hi,

Sorry for the delay.

Thanks very much for this updated summary.

I checked with other internal users, and it seems it would be OK for
our use case.

I think I will have a look at implementing it, but not immediately
because of holidays.
I hope it's not too difficult :-)

What do maintainers think about this? Would it be acceptable?

Thanks,

Christophe


> Erik

* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2019-07-11  8:42         ` Christophe Lyon
@ 2019-07-24  7:28           ` Nick Clifton
  2019-07-24  9:18             ` Simon Richter
  0 siblings, 1 reply; 17+ messages in thread
From: Nick Clifton @ 2019-07-24  7:28 UTC
  To: Christophe Lyon, Tejas Belagod, binutils, Maxim Kuvyrkov,
	Peter Smith, Alan Modra

Hi Christophe,

>> The simplified implementation does not require modification of linker
>> script syntax. It also allows explicit placement of chosen input
>> sections in a preferred memory section. In addition to simple flowing of
>> *(.data) *(.data.*):

>> SECTIONS
>> {
>>    .raml : AT ( ADDR (.text) + SIZEOF (.text) )
>>    {  _raml_start = . ;
>>       *(.boot) ;
>>       *(.data) *(.data.*) ;
>>       _raml_end = . ;
>>    } > RAML
>>
>>    .ramu : AT ( ADDR (.raml) + SIZEOF (.raml) )
>>    {  _ramu_start = . ;
>>       *(.data) *(.data.*) ;
>>       _ramu_end = . ;
>>    } > RAMU

> What do maintainers think about this? Would it be acceptable?

Yes, but you need to be very careful about what happens when switching
from one output section to another.  Can the linker backtrack to an 
earlier output section if it subsequently finds an input section which
will fit in the remaining space ?

If you do allow backtracking then the ordering of sections can change
from the current linker's specified behaviour (sections are linked in
input order unless a SORT keyword is used).  And so users will complain.
If you don't allow backtracking then there could be gaps in memory
regions which could have been used, and users will complain... :-)
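
To make the trade-off concrete, here is a toy model. It is Python, not ld
code; the function allocate, the region capacities and the section sizes
are all made up for illustration, and alignment, FILL and pattern matching
are ignored.

```python
def allocate(sections, capacities, backtrack):
    """Place (name, size) sections into regions with the given capacities.

    backtrack=False: once allocation has moved past a region, it is not revisited.
    backtrack=True:  every section is tried against all regions from the first.
    Returns one list of section names per region, in placement order.
    """
    free = list(capacities)
    placed = [[] for _ in capacities]
    current = 0                               # first region still open
    for name, size in sections:
        start = 0 if backtrack else current
        for i in range(start, len(free)):
            if size <= free[i]:
                placed[i].append(name)
                free[i] -= size
                if not backtrack:
                    current = i               # earlier regions are now closed
                break
        else:
            raise OverflowError(f"{name} does not fit in any region")
    return placed


sections = [("a", 0xC0), ("b", 0x80), ("c", 0x40)]
print(allocate(sections, [0x100, 0x400], backtrack=False))
print(allocate(sections, [0x100, 0x400], backtrack=True))
```

Without backtracking, "a" lands in the first region and "b" and "c" in the
second, leaving 0x40 bytes of the first region unused. With backtracking,
"c" goes back into that space, so it ends up at a lower address than "b"
and the input order is no longer preserved.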

Cheers
  Nick

* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2019-07-24  7:28           ` Nick Clifton
@ 2019-07-24  9:18             ` Simon Richter
  2019-07-24 12:48               ` Erik Christiansen
  0 siblings, 1 reply; 17+ messages in thread
From: Simon Richter @ 2019-07-24  9:18 UTC
  To: binutils

Hi,

On Wed, Jul 24, 2019 at 08:28:05AM +0100, Nick Clifton wrote:

> If you do allow backtracking then the ordering of sections can change
> from the current linker's specified behaviour (sections are linked in
> input order unless a SORT keyword is used).  And so users will complain.
> If you don't allow backtracking then there could be gaps in memory
> regions which could have been used, and users will complain... :-)

And if a section would fit after linker relaxation, but the new layout
would make the relaxations invalid...

   Simon

* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2019-07-24  9:18             ` Simon Richter
@ 2019-07-24 12:48               ` Erik Christiansen
  0 siblings, 0 replies; 17+ messages in thread
From: Erik Christiansen @ 2019-07-24 12:48 UTC
  To: binutils

On 24.07.19 11:18, Simon Richter wrote:
> Hi,
> 
> On Wed, Jul 24, 2019 at 08:28:05AM +0100, Nick Clifton wrote:
> 
> > If you do allow backtracking then the ordering of sections can change
> > from the current linker's specified behaviour (sections are linked in
> > input order unless a SORT keyword is used).  And so users will complain.
> > If you don't allow backtracking then there could be gaps in memory
> > regions which could have been used, and users will complain... :-)

In the years we have waited for a sufficient need to warrant
implementation, potential users have been rarer than hen's teeth¹. My
instinct is that anyone needing flowing will be grateful to have it, and
will accept both the proffered granularity, and freedom from the curse of
backtracking. The prior consensus has up to now been to respect FILL for
the trailing segment remnant, and that looks fine to me. (It is a darn
sight easier to implement too, I figure.)

That a given hole potentially grows by almost the granularity of the nearest
input section seems hardly troubling - and if it is, then it is up to the
user to reduce the size of the nearest input section, I submit.

But I figure that the casting vote probably goes to the implementer, who
has the imperative of a current and actual use case. :-)

> And if a section would fit after linker relaxation, but the new layout
> would make the relaxations invalid...

The previous run-up at this problem specified the flowing granularity to
be an input section. Flowing an input section across a hole has been
avoided, specifically to stay out of the clutches of such relocation
issues. (There's enough work there already, I think. :-)

Erik

¹ All right, there was one, admittedly, but it didn't lead to
  implementation.

* [RFC] Allow linker scripts to specify multiple output regions for an output section?
@ 2017-02-22 15:28 Thomas Preudhomme
  2017-02-27 10:27 ` Tejas Belagod
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Thomas Preudhomme @ 2017-02-22 15:28 UTC
  To: binutils, tejas.belagod

[Sending on behalf of Tejas Belagod, please reply to both him (in Cc) and me]

Hi,

There has been some interest in the past in having syntactic support for 
specifying mapping of an output section to multiple memory regions in the GNU LD 
scripting language (eg. https://sourceware.org/bugzilla/show_bug.cgi?id=14299). 
I would like to propose a scheme here and welcome any feedback.

The section command in the LD Script language is structured thus:

section [address] [(type)] :
	[AT(lma)]
	[ALIGN(section_align)]
	[SUBALIGN(subsection_align)]
	[constraint]
	{
	  output-section-command
	  output-section-command
	  ...
	} [>region] [AT>lma_region] [:phdr :phdr ...] [=fillexp]

As I understand, it simply means - place the output section at ‘address’ with
attributes specified above (type, alignment etc). If LMA is specified, the
image (startup code etc.) most likely handles the copying from load address to
output section VMA. Multiple segment spec means the output section can be part
of more than one segment and ‘fillexp’ simply fills the output section loaded
with the fill value.

Now, this does not have a method to specify output section spanning multiple 
memory regions. For example, if there are 2 RAM regions RAML and RAMU and the 
user wants an output section to first fill RAML and then when RAML is full, i.e. 
when the remaining space in RAML cannot accommodate a full input section, start 
filling RAMU, the user has to split the sections into multiple output sections. 
If we extend this syntax to specify multiple output regions, we can make the
linker map the output section to multiple regions by filling the output region
with input sections in the order specified in the ‘output-section-command’ and,
when it's full (meaning when the remaining gap in a region cannot accommodate
one full input section), starting from the next output region. Eg.

MEMORY

{
   RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
   RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
   RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000	
}

SECTIONS
{
   .text 0x1000 : { *(.text) _etext = . ; }
   .mdata  :
   AT ( ADDR (.text) + SIZEOF (.text) )
   { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ
}

The statement:

   .mdata :
    AT ( ADDR (.text) + SIZEOF (.text) )
    { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ

Will have roughly the following meaning:

  for_each_output_section {
   curr_mem_region = get_next_mem_region ();
   location_counter = get_vma_mem_region (curr_mem_region);

   while (fill) {
     current_input_section = get_next_input_section ();

     if (location_counter > end_vma_of_mem_region_in_list)
       break;

     mem_avail_in_curr_region = get_vma_mem_region (curr_mem_region)
                                + sizeof (curr_mem_region) - location_counter;

     if (sizeof (current_input_section) > mem_avail_in_curr_region)
      {
       curr_mem_region = get_next_mem_region ();
       location_counter = get_vma_mem_region (curr_mem_region);
      }

     process_section (current_input_section, location_counter);
     location_counter += sizeof (current_input_section);
   }

  }

Illustration:

Consider an example where we have the following input .data sections:

.data: size 0x0000FFF0
.data.a : size 0x000000F0
.data.b : size 0x00003000
.data.c : size 0x00000200

With the above scheme, this will be mapped in the following way to RAML,RAMU and 
RAMZ:

RAML : (0x1FFF0000 - 0x1FFFFFF0): .data
        (0x1FFFFFF0 - 0x1FFFFFFF): *** GAP ***

RAMU : (0x20000000 - 0x200000F0): .data.a
        (0x200000F0 - 0x200030F0): .data.b
        (0x200030F0 - 0x200032F0): .data.c
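
The mapping above can be reproduced with a small model of the pseudocode.
This is a Python sketch, not ld code: fill_regions and its tuple layout are
invented here, end addresses are exclusive (so the gap ends at 0x20000000
rather than 0x1FFFFFFF), and it loops where the pseudocode has a single
"if", on the assumption that a section too large for the next region should
move on again.

```python
def fill_regions(regions, input_sections):
    """regions: list of (name, origin, length); input_sections: list of (name, size).

    Returns (region, start, end, what) tuples, with a "*** GAP ***" entry
    for any space left unused at the end of a region.
    """
    layout = []
    region_iter = iter(regions)
    region, origin, length = next(region_iter)
    counter = origin                                  # location counter
    for sec_name, size in input_sections:
        # Move on while the remaining space cannot hold the whole input section.
        while size > origin + length - counter:
            if counter < origin + length:
                layout.append((region, counter, origin + length, "*** GAP ***"))
            try:
                region, origin, length = next(region_iter)
            except StopIteration:
                raise OverflowError(f"{sec_name} does not fit in the listed regions") from None
            counter = origin
        layout.append((region, counter, counter + size, sec_name))
        counter += size
    return layout


regions = [("RAML", 0x1FFF0000, 0x00010000),
           ("RAMU", 0x20000000, 0x00040000),
           ("RAMZ", 0x20040000, 0x00040000)]
inputs = [(".data", 0xFFF0), (".data.a", 0xF0), (".data.b", 0x3000), (".data.c", 0x200)]
for region, start, end, what in fill_regions(regions, inputs):
    print(f"{region} : (0x{start:08X} - 0x{end:08X}): {what}")
```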

It will not affect the specification in terms of the other attributes, but one 
(LMA):

* Output section VMA: No change - this just specifies where the output section 
will start.

* type: No change - this is for the output section as a whole - output memory 
regions will not change it.

* LMA: The output section can still be loaded from one LMA and mapped to output 
VMA - the only change here is that the loader will need to map the output 
sections to VMA with the same pattern as the multiple output region matching 
code above. Can a loader do that? Can ad-hoc loaders do this? Or do all loaders 
assume that regions are contiguous when output section is mapped to VMAs?

* phdr: No change - Multiple values can still be specified here. One can have an 
output section map to multiple segments irrespective of their output memory 
region mapping.

* Fillexp: No change. We might possibly want to introduce a fillexp for the gaps 
left behind when filling multiple output memory regions.

Caveats:

A comma-separated list of regions will not guarantee contiguous placement of 
input sections, the only way to get a contiguous placement of input sections 
will be to assign the output section to one monolithic memory region.

For orthogonality and consistency, we would want to apply the multiple region 
feature to overlays too. The semantics will not be different from the algorithm 
mentioned above. The only caveat is that the overlay manager/loader will need to 
handle the swapping in and out of sections that run from the VMA consistently 
with the mapping algo described above. Do we want this for overlays too?


* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2017-02-22 15:28 Thomas Preudhomme
@ 2017-02-27 10:27 ` Tejas Belagod
  2017-02-28  5:52 ` Erik Christiansen
  2017-03-02  4:32 ` Erik Christiansen
  2 siblings, 0 replies; 17+ messages in thread
From: Tejas Belagod @ 2017-02-27 10:27 UTC (permalink / raw)
  To: amodra; +Cc: Thomas Preud'homme, binutils

Hi Alan,

Do you have any comments on this?

Thanks,
Tejas.

On 22/02/17 15:28, Thomas Preudhomme wrote:
> [Sending on behalf of Tejas Belagod, please reply to both him (in Cc) and me]
>
> Hi,
>
> There has been some interest in the past in having syntactic support for
> specifying mapping of an output section to multiple memory regions in the GNU LD
> scripting language (eg. https://sourceware.org/bugzilla/show_bug.cgi?id=14299).
> I would like to propose a scheme here and welcome any feedback.
>
> The section command in the LD Script language is structured thus:
>
> section [address] [(type)] :
> 	[AT(lma)]
> 	[ALIGN(section_align)]
> 	[SUBALIGN(subsection_align)]
> 	[constraint]
> 	{
> 	  output-section-command
> 	  output-section-command
> 	  ...
> 	} [>region] [AT>lma_region] [:phdr :phdr ...] [=fillexp]
>
> As I understand, it simply means - place the output section at ‘address’ with
> attributes specified above (type, alignment etc). If LMA is specified, the
> image(startup code etc.) most likely handles the copying from load address to
> output section VMA. Multiple segment spec means the output section can be part
> of more than one segment and ‘fillexp’ simply fills the output section loaded
> with the fill value.
>
> Now, this does not have a method to specify output section spanning multiple
> memory regions. For example, if there are 2 RAM regions RAML and RAMU and the
> user wants an output section to first fill RAML and then when RAML is full, i.e.
> when the remaining space in RAML cannot accommodate a full input section, start
> filling RAMU, the user has to split the sections into multiple output sections.
> If we extend this syntax to specify multiple output regions, we can make the
> linker map the output section to multiple regions by filling the output region
> with input sections in the order specified in the ‘output-section-command’ and
> when it's full (meaning when the remaining gap in a region cannot accommodate one
> full input section), it starts from the next output region. Eg.
>
> MEMORY
>
> {
>     RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
>     RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
>     RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000	
> }
>
> SECTIONS
> {
>     .text 0x1000 : { *(.text) _etext = . ; }
>     .mdata  :
>     AT ( ADDR (.text) + SIZEOF (.text) )
>     { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ
> }
>
> The statement:
>
>     .mdata :
>      AT ( ADDR (.text) + SIZEOF (.text) )
>      { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ
>
>
> Will have roughly the following meaning:
>
>    For_each_output_section {
>     curr_mem_region = get_next_mem_region ();
>     location_counter = get_vma_mem_region (curr_mem_region);
>
>     While (fill) {
>       current_input_section = get_next_input_section ();
>
>       If (location_counter > end_vma_of_mem_region_in_list)
>         Break;
>
>       mem_avail_in_curr_region = get_vma_mem_region (curr_mem_region)
>         + sizeof (curr_mem_region) - location_counter;
>
>       If (sizeof (current_input_section) > mem_avail_in_curr_region)
>        {
>         curr_mem_region = get_next_mem_region ();
>         location_counter = get_vma_mem_region (curr_mem_region);
>        }
>
>       process_section (current_input_section, location_counter);
>       location_counter += sizeof (current_input_section);
>     }
>
>    }
>
>
> Illustration:
>
> Consider an example where we have the following input .data sections:
>
> .data: size 0x0000FFF0
> .data.a : size 0x000000F0
> .data.b : size 0x00003000
> .data.c : size 0x00000200
>
> With the above scheme, this will be mapped in the following way to RAML,RAMU and
> RAMZ:
>
> RAML : (0x1FFF0000 - 0x1FFFFFF0): .data
>          (0x1FFFFFF0 - 0x1FFFFFFF): *** GAP ***
>
> RAMU : (0x20000000 - 0x200000F0): .data.a
>          (0x200000F0 - 0x200030F0): .data.b
>          (0x200030F0 - 0x200032F0): .data.c
>
>
> It will not affect the specification in terms of the other attributes, but one
> (LMA):
>
> * Output section VMA: No change - this just specifies where the output section
> will start.
>
> * type: No change - this is for the output section as a whole - output memory
> regions will not change it.
>
> * LMA: The output section can still be loaded from one LMA and mapped to output
> VMA - the only change here is that the loader will need to map the output
> sections to VMA with the same pattern as the multiple output region matching
> code above. Can a loader do that? Can ad-hoc loaders do this? Or do all loaders
> assume that regions are contiguous when output section is mapped to VMAs?
>
> * phdr: No change - Multiple values can still be specified here. One can have an
> output section map to multiple segments irrespective of their output memory
> region mapping.
>
> * Fillexp: No change. We might possibly want to introduce a fillexp for the gaps
> left behind when filling multiple output memory regions.
>
> Caveats:
>
> A comma-separated list of regions will not guarantee contiguous placement of
> input sections, the only way to get a contiguous placement of input sections
> will be to assign the output section to one monolithic memory region.
>
> For orthogonality and consistency, we would want to apply the multiple region
> feature to overlays too. The semantics will not be different from the algorithm
> mentioned above. The only caveat is that the overlay manager/loader will need to
> handle the swapping in and out of sections that run from the VMA consistently
> with the mapping algo described above. Do we want this for overlays too?
>


* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2017-02-22 15:28 Thomas Preudhomme
  2017-02-27 10:27 ` Tejas Belagod
@ 2017-02-28  5:52 ` Erik Christiansen
  2017-02-28 12:11   ` Tejas Belagod
  2017-03-02  4:32 ` Erik Christiansen
  2 siblings, 1 reply; 17+ messages in thread
From: Erik Christiansen @ 2017-02-28  5:52 UTC (permalink / raw)
  To: Thomas Preudhomme; +Cc: binutils, tejas.belagod

On 22.02.17 15:28, Thomas Preudhomme wrote:
> There has been some interest in the past in having syntactic support for
> specifying mapping of an output section to multiple memory regions in the
> GNU LD scripting language (eg.
> https://sourceware.org/bugzilla/show_bug.cgi?id=14299). I would like to
> propose a scheme here and welcome any feedback.

TL;DR: Detailed response begins after 6 paragraphs.

OK, in the absence of prior discussion, I'll just think aloud as I
correlate the proposal with my experience in three decades developing
embedded systems. Unfortunately, the one time an MMU was involved, that
was done by the time I became involved, but memory holes are all black.

The closest scenario I recall is where there were disparate physical
memories, both on and off chip; I simply added a MEMORY region for each
such block, e.g. Flash, 16bit SRAM, 8bit SRAM, a couple of small ones
for specific memory mapped system chips with bunches of config
registers, and maybe an FPGA in the mix. Add comments for device names
and the waitstate generator values, and the script serves as central
documentation too.

With that one-to-one region mapping, there was never any conflict over
where stuff should be located, and none were interchangeable. It is as
described by "some on-chip memory and some off-chip memory, but at
non-contiguous addresses" in the above link. And where we had both 8 and
16 bit SRAMS, it was most definitely consistent with "a region of
on-chip SRAM which performs better for code, and the remainder performs
better for data", except that using the wrong one was fatal rather than
merely inferior.

One issue I've encountered is detecting region overflow when multiple
output sections contribute to its content, but existing syntax supports
that, e.g.:

MEMORY
{
  flash   (rx)  : ORIGIN = 0, LENGTH = 32K
  ram    (rw!x) : ORIGIN = 0x800060, LENGTH = 2K
  eeprom (rw!x) : ORIGIN = 0x810000, LENGTH = 1K
}

. = ASSERT (_etext + SIZEOF (.data) <= LENGTH(flash),
            "Error: .text + .data collectively overflow the flash memory.") ;

But the need to flow across memory holes never eventuated in practice,
as a modest chunk of on-chip RAM could always be used for e.g. sdata,
leaving no need for flowing. All other regions were always incompatible,
making flowing impossible.

...
> If LMA is specified, the image(startup code etc.) most likely handles
> the copying from load address to output section VMA.

Yes, it does. And in the generic init code I've encountered, it has just
been a single copy loop for e.g. .data, performing a contiguous block copy.
(And when I've written it, that was true too.)

> Multiple segment spec means the output section can be part of more
> than one segment and ‘fillexp’ simply fills the output section loaded
> with the fill value.

Trans-hole flowing would also require a runtime copy loop for each
non-contiguous block, or a table-driven multi-block copier, with the
run-time table somehow initialised from the linker script. (I can
imagine using variables defined in the linker script, and the .RPT
assembler directive - maybe.)
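
To make the table-driven idea concrete, here is a rough Python model of
such a copier. It is a sketch only: real init code would be C or
assembler, and the table layout (lma, vma, size), the function name
copy_blocks() and all the addresses are assumptions made up for
illustration, with bytearrays standing in for flash and RAM.

```python
# Model of a table-driven startup copier (illustration only).
# Each table row is (lma, vma, size); all values are hypothetical.
def copy_blocks(table, flash, ram):
    """Copy each block from its load address in flash to its run address in ram."""
    for lma, vma, size in table:
        ram[vma:vma + size] = flash[lma:lma + size]


flash = bytearray(range(64))        # stand-in for the contiguous load image
ram = bytearray(64)                 # stand-in for RAM with a hole at 16..31
table = [(0, 0, 16), (16, 32, 8)]   # second block lands beyond the hole
copy_blocks(table, flash, ram)
```

The load image stays contiguous in flash; only the destination side
skips the hole, which is the simpler of the two gapped cases.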

> Now, this does not have a method to specify output section spanning multiple
> memory regions. For example, if there are 2 RAM regions RAML and RAMU and
> the user wants an output section to first fill RAML and then when RAML is
> full, i.e. when the remaining space in RAML cannot accommodate a full input
> section, start filling RAMU, the user has to split the sections into
> multiple output sections. If we extend this syntax to specify multiple
> output regions, we can make the linker map the output section to multiple
> regions by filling the output region with input sections in the order
> specified in the ‘output-section-command’ and when it's full (meaning when
> the remaining gap in a region cannot accommodate one full input section), it
> starts from the next output region.

This seems to be the alternate view of the problem of asking ld to flow
code around holes in a region, something it still can't do, IIRC. I
state it that way, because two non-contiguous memory regions over which
code (or data) may be interchangeably flowed, are identical to a single
region with a hole.

The proposal does seem to be a way to think about addressing that issue:

> Eg.
> 
> MEMORY
> 
> {
>   RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
>   RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
>   RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000	
> }
> 
> SECTIONS
> {
>   .text 0x1000 : { *(.text) _etext = . ; }
>   .mdata  :
>   AT ( ADDR (.text) + SIZEOF (.text) )
>   { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ
> }

Without the need for new syntax or complex init code generators,
having gcc flow code across up to 5 pages of flash plus .lowtext and a
floating .hightext was compatible with the linker script and tests shown
here:

http://lists.nongnu.org/archive/html/avr-gcc-list/2012-12/msg00044.html

While details have faded from wet RAM, ISTR that holes were
manufacturable by not populating any of the 5 pages, which gcc sees as
named spaces. The gcc stuff was done in the AVR back end, IIRC, while an
implementation in ld would be generic.

> Illustration:
> 
> Consider an example where we have the following input .data sections:
> 
> .data: size 0x0000FFF0
> .data.a : size 0x000000F0
> .data.b : size 0x00003000
> .data.c : size 0x00000200
> 
> With the above scheme, this will be mapped in the following way to RAML,RAMU
> and RAMZ:
> 
> RAML : (0x1FFF0000 - 0x1FFFFFF0): .data
>        (0x1FFFFFF0 - 0x1FFFFFFF): *** GAP ***

Would GAP use ALIGNMENT, or introduce a new parameter?

How would the target-specific relocations required to break code across
the hole be handled by ld? E.g. break a small AVR code loop (with 6-bit
relative addressing range) and you'll need a LJMP to bridge the hole,
and another with reversed loop conditionality to close the loop.
Multiply that task by all the possible relocs, and again by all the
possible CPU targets, and it's never-ending work for a software team for
life.

It seems more 

> RAMU : (0x20000000 - 0x200000F0): .data.a
>        (0x200000F0 - 0x200030F0): .data.b
>        (0x200030F0 - 0x200032F0): .data.c
> 
> 
> It will not affect the specification in terms of the other attributes, but
> one (LMA):
> 
> * Output section VMA: No change - this just specifies where the output
> section will start.
> 
> * type: No change - this is for the output section as a whole - output
> memory regions will not change it.
> 
> * LMA: The output section can still be loaded from one LMA and mapped to
> output VMA - the only change here is that the loader will need to map the
> output sections to VMA with the same pattern as the multiple output region
> matching code above. Can a loader do that? Can ad-hoc loaders do this? Or do
> all loaders assume that regions are contiguous when output section is
> mapped to VMAs?

Contiguous. Hole-flowing is what you're proposing to implement, both the
linker internal component (target-specific relocations), and the generic
(e.g. table driven) multi-block copy loop synthesiser for custom init
code generation. How that would integrate with existing init code in
various implementations, I have no idea.

If LMA can also be flowed around a hole, then runtime init code must be
able to handle not only non-contiguous delivery, but gapped pick-up. Has
the complexity of simultaneously handling different gaps in both been
considered?

...
> For orthogonality and consistency, we would want to apply the multiple
> region feature to overlays too. The semantics will not be different from the
> algorithm mentioned above. The only caveat is that the overlay
> manager/loader will need to handle the swapping in and out of sections that
> run from the VMA consistently with the mapping algo described above. Do we
> want this for overlays too?

Expanding the complexity of a single-problem solution to cover other
situations seems courageous, unless it naturally falls out of the
narrower solution. As overlays are used e.g. when RAM size or CPU
instruction addressing range is constrained, but there's ample flash,
then the likelihood of holes in either is limited, I suspect.

Specifying discrete output sections with VMAs placed around the physical
holes is another way to dodge them. They can all be allocated to a
global encompassing memory region. Flowing is performed manually by
assigning suitable code chunks to preferred input sections. Automating
that, as intimated above, is non-trivial.

Caveat: Above thoughts have flowed without aid of caffeine, and are
        recollections from old battles.

Erik


* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2017-02-28  5:52 ` Erik Christiansen
@ 2017-02-28 12:11   ` Tejas Belagod
  2017-03-01  7:12     ` Erik Christiansen
  0 siblings, 1 reply; 17+ messages in thread
From: Tejas Belagod @ 2017-02-28 12:11 UTC (permalink / raw)
  To: dvalin; +Cc: Thomas Preudhomme, binutils

Hi Erik,

Thanks for your comments. My comments inline below.

On 28/02/17 05:51, Erik Christiansen wrote:
> On 22.02.17 15:28, Thomas Preudhomme wrote:
>> There has been some interest in the past in having syntactic support for
>> specifying mapping of an output section to multiple memory regions in the
>> GNU LD scripting language (eg.
>> https://sourceware.org/bugzilla/show_bug.cgi?id=14299). I would like to
>> propose a scheme here and welcome any feedback.
>
> TL;DR: Detailed response begins after 6 paragraphs.
>
> OK, in the absence of prior discussion, I'll just think aloud as I
> correlate the proposal with my experience in three decades developing
> embedded systems. Unfortunately, the one time an MMU was involved, that
> was done by the time I became involved, but memory holes are all black.
>
> The closest scenario I recall is where there were disparate physical
> memories, both on and off chip, I simply added a MEMORY region for each
> such block, e.g. Flash, 16bit SRAM, 8bit SRAM, a couple of small ones
> for specific memory mapped system chips with bunches of config
> registers, and maybe an FPGA in the mix. Add comments for device names
> and the waitstate generator values, and the script serves as central
> documentation too.
>
> With that one-to-one region mapping, there was never any conflict over
> where stuff should be located, and none were interchangeable. It is as
> described by "some on-chip memory and some off-chip memory, but at
> non-contiguous addresses" in the above link. And where we had both 8 and
> 16 bit SRAMS, it was most definitely consistent with "a region of
> on-chip SRAM which performs better for code, and the remainder performs
> better for data", except that using the wrong one was fatal rather than
> merely inferior.
>
> One issue I've encountered is detecting region overflow when multiple
> output sections contribute to its content, but existing syntax supports
> that, e.g.:
>
> MEMORY
> {
>    flash   (rx)  : ORIGIN = 0, LENGTH = 32K
>    ram    (rw!x) : ORIGIN = 0x800060, LENGTH = 2K
>    eeprom (rw!x) : ORIGIN = 0x810000, LENGTH = 1K
> }
>
> . = ASSERT (_etext + SIZEOF (.data) <= LENGTH(flash) , "Error: .text + .data
> collectively overflow the flash memory." ) ;
>
> But the need to flow across memory holes never eventuated in practice,
> as a modest chunk of on-chip RAM could always be used for e.g. sdata,
> leaving no need for flowing. All other regions were always incompatible,
> making flowing impossible.
>
> ...
>> If LMA is specified, the image(startup code etc.) most likely handles
>> the copying from load address to output section VMA.
>
> Yes, it does. And in the generic init code I've encountered, it has just
> been a single copy loop for e.g. bss, performing a contiguous block copy.
> (And when I've written it, that was true too.)
>
>> Multiple segment spec means the output section can be part of more
>> than one segment and ‘fillexp’ simply fills the output section loaded
>> with the fill value.
>
> Trans-hole flowing would also require a runtime copy loop for each
> non-contiguous block, or a table-driven multi-block copier, with the
> run-time table somehow initialised from the linker script. (I can
> imagine using variables defined in the linker script, and the .RPT
> assembler directive - maybe.)
>
>> Now, this does not have a method to specify output section spanning multiple
>> memory regions. For example, if there are 2 RAM regions RAML and RAMU and
>> the user wants an output section to first fill RAML and then when RAML is
>> full, i.e. when the remaining space in RAML cannot accommodate a full input
>> section, start filling RAMU, the user has to split the sections into
>> multiple output sections. If we extend this syntax to specify multiple
>> output regions, we can make the linker map the output section to multiple
>> regions by filling the output region with input sections in the order
>> specified in the ‘output-section-command’ and when it's full (meaning when
>> the remaining gap in a region cannot accommodate one full input section), it
>> starts from the next output region.
>
> This seems to be the alternate view of the problem of asking ld to flow
> code around holes in a region, something it still can't do, IIRC. I
> state it that way, because two non-contiguous memory regions over which
> code (or data) may be interchangeably flowed, are identical to a single
> region with a hole.
>
> The proposal does seem to be a way to think about addressing that issue:
>
>> Eg.
>>
>> MEMORY
>>
>> {
>>    RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
>>    RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
>>    RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000	
>> }
>>
>> SECTIONS
>> {
>>    .text 0x1000 : { *(.text) _etext = . ; }
>>    .mdata  :
>>    AT ( ADDR (.text) + SIZEOF (.text) )
>>    { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ
>> }
>
> Without the need for new syntax or complex init code generators,
> having gcc flow code across up to 5 pages of flash plus .lowtext and a
> floating .hightext was compatible with the linker script and tests shown
> here:
>
> http://lists.nongnu.org/archive/html/avr-gcc-list/2012-12/msg00044.html
>
> While details have faded from wet RAM, ISTR that holes were
> manufacturable by not populating any of the 5 pages, which gcc sees as
> named spaces. The gcc stuff was done in the AVR back end, IIRC, while an
> implementation in ld would be generic.
>
>> Illustration:
>>
>> Consider an example where we have the following input .data sections:
>>
>> .data: size 0x0000FFF0
>> .data.a : size 0x000000F0
>> .data.b : size 0x00003000
>> .data.c : size 0x00000200
>>
>> With the above scheme, this will be mapped in the following way to RAML,RAMU
>> and RAMZ:
>>
>> RAML : (0x1FFF0000 - 0x1FFFFFF0): .data
>>         (0x1FFFFFF0 - 0x1FFFFFFF): *** GAP ***
>
> Would GAP use ALIGNMENT, or introduce a new parameter?
>

I wouldn't want to overload ALIGNMENT here - what if it's needed simultaneously 
with ALIGNMENT? Can we not leave this space unassigned? More often than not if 
one's filling a memory region automatically, would they really care what goes 
into the gaps (if security is not a concern)? OTOH, if security is a concern, we 
can explore introducing a new syntax with a default behavior of zero-filling the 
gaps.

> How would the target-specific relocations required to break code across
> the hole be handled by ld? E.g. break a small AVR code loop (with 6-bit
> relative addressing range) and you'll need a LJMP to bridge the hole,
> and another with reversed loop conditionality to close the loop.
> Multiply that task by all the possible relocs, and again by all the
> possible CPU targets, and it's never-ending work for a software team for
> life.
>

As I understand it, compilers generate references to objects within a section as:

  .<input_section_name> + offset_within_section

Now when a section that spans 2 or more regions inserts holes/padding to prevent 
an object from straddling 2 regions, the offsets within the section to other 
objects will change. This means all the compiler-generated "section + offset" of 
all objects that come after the padding will need to be fixed up. It's really 
difficult to know which ones to fix up - the relocations are only on the section 
label, not the object in the section. So, what I'm proposing here will not split 
the input sections - input sections will move as a block.

> It seems more
>
>> RAMU : (0x20000000 - 0x200000F0): .data.a
>>         (0x200000F0 - 0x200030F0): .data.b
>>         (0x200030F0 - 0x200032F0): .data.c
>>
>>
>> It will not affect the specification in terms of the other attributes, but
>> one (LMA):
>>
>> * Output section VMA: No change - this just specifies where the output
>> section will start.
>>
>> * type: No change - this is for the output section as a whole - output
>> memory regions will not change it.
>>
>> * LMA: The output section can still be loaded from one LMA and mapped to
>> output VMA - the only change here is that the loader will need to map the
>> output sections to VMA with the same pattern as the multiple output region
>> matching code above. Can a loader do that? Can ad-hoc loaders do this? Or do
>> all loaders assume that regions are contiguous when output section is
>> mapped to VMAs?
>
> Contiguous. Hole-flowing is what you're proposing to implement, both the
> linker internal component (target-specific relocations), and the generic
> (e.g. table driven) multi-block copy loop synthesiser for custom init
> code generation. How that would integrate with existing init code in
> various implementations, I have no idea.
>
> If LMA can also be flowed around a hole, then runtime init code must be
> able to handle not only non-contiguous delivery, but gapped pick-up. Has
> the complexity of simultaneously handling different gaps in both been
> considered?
>

I haven't thought about that. Can it be worked on the principle that when one 
specifies an LMA and there is user-written init code to copy blocks, the init 
code programmer knows the LMA gap layout and can handle the gaps accordingly? It 
could be the case currently where code from different non-contiguous ROMs is 
copied into a RAM during startup. This, IMHO, is always specific to the 
particular embedded system being deployed.

> ...
>> For orthogonality and consistency, we would want to apply the multiple
>> region feature to overlays too. The semantics will not be different from the
>> algorithm mentioned above. The only caveat is that the overlay
>> manager/loader will need to handle the swapping in and out of sections that
>> run from the VMA consistently with the mapping algo described above. Do we
>> want this for overlays too?
>
> Expanding the complexity of a single-problem solution to cover other
> situations seems courageous, unless it naturally falls out of the
> narrower solution. As overlays are used e.g. when RAM size or CPU
> instruction addressing range is constrained, but there's ample flash,
> then the likelihood of holes in either is limited, I suspect.
>

Makes sense.

Thanks,
Tejas.

> Specifying discrete output sections with VMAs placed around the physical
> holes is another way to dodge them. They can all be allocated to a
> global encompassing memory region. Flowing is performed manually by
> assigning suitable code chunks to preferred input sections. Automating
> that, as intimated above, is non-trivial.
>
> Caveat: Above thoughts have flowed without aid of caffeine, and are
>          recollections from old battles.
>
> Erik
>


* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2017-02-28 12:11   ` Tejas Belagod
@ 2017-03-01  7:12     ` Erik Christiansen
  0 siblings, 0 replies; 17+ messages in thread
From: Erik Christiansen @ 2017-03-01  7:12 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: Thomas Preudhomme, binutils

On 28.02.17 12:11, Tejas Belagod wrote:
> On 28/02/17 05:51, Erik Christiansen wrote:
> > 
> > Would GAP use ALIGNMENT, or introduce a new parameter?
> 
> I wouldn't want to overload ALIGNMENT here - what if it's needed
> simultaneously with ALIGNMENT? Can we not leave this space unassigned? More
> often than not if one's filling a memory region automatically, would they
> really care what goes into the gaps (if security is not a concern)?

My reason for raising the issue of ALIGNMENT was concern about splitting
instructions at the edge of a hole during flowing. I see below that the
proposed method avoids that problem entirely.

> OTOH, if security is a concern, we can explore introducing a new
> syntax with a default behavior of zero-filling the gaps.

We already have FILL to cover that, so I wouldn't worry.

> > How would the target-specific relocations required to break code across
> > the hole be handled by ld? E.g. break a small AVR code loop (with 6-bit
> > relative addressing range) and you'll need a LJMP to bridge the hole,
> > and another with reversed loop conditionality to close the loop.
> > Multiply that task by all the possible relocs, and again by all the
> > possible CPU targets, and it's never-ending work for a software team for
> > life.
> > 
> 
> As I understand, compilers generate references to objects within a section with a
> 
>  .<input_section_name> + offset_within_section
> 
> Now when a section that spans 2 or more regions inserts holes/padding to
> prevent an object from straddling 2 regions, the offsets within the section
> to other objects will change. This means all the compiler-generated "section
> + offset" of all objects that come after the padding will need to be fixed
> up. It's really difficult to know which ones to fix up - the relocations are
> only on the section label, not the object in the section. So, what I'm
> proposing here will not split the input sections - input sections will move
> as a block.

Aha! That is a commendably inexpensive way to avoid a great deal of pain.
A little bit of SIZEOF and LENGTH arithmetic in ld easily predicts
whether the current input section will fit in the current region, and
the start address of the next region becomes the new base for offsets,
without the need for additional arithmetic. Very neat. (So long as we
size our input sections modestly.)
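
As a worked instance of that arithmetic, using the RAML/RAMU numbers
from the proposal: the few lines of Python below are only a sketch, the
helper name fits() is invented here, and the 0x24 offset is a
hypothetical object position chosen purely for illustration.

```python
def fits(cursor, size, origin, length):
    """True if an input section of 'size' bytes placed at 'cursor' stays in the region."""
    return cursor + size <= origin + length


# .data.a (0xF0 bytes) once .data has filled RAML up to 0x1FFFFFF0:
in_raml = fits(0x1FFFFFF0, 0xF0, 0x1FFF0000, 0x00010000)

# If it does not fit, ORIGIN(RAMU) becomes the base of the section.
base = 0x1FFFFFF0 if in_raml else 0x20000000

# A "section + offset" reference then only needs the final section base.
addr = base + 0x24   # hypothetical object at offset 0x24 within .data.a
```

Only 0x10 bytes remain in RAML, so the section moves whole and every
offset inside it is unchanged.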

...
> > If LMA can also be flowed around a hole, then runtime init code must be
> > able to handle not only non-contiguous delivery, but gapped pick-up. Has
> > the complexity of simultaneously handling different gaps in both been
> > considered?
> > 
> 
> I haven't thought about that. Can it be worked on the principle that when
> one specifies an LMA and there is user-written init code to copy blocks, the
> init code programmer knows the LMA gap layout and can handle the gaps
> accordingly?

I was playing devil's advocate there - the likelihood of gapped LMA
seems low in practice, as flash would mostly be larger than fast RAM.
It's just the worst case.

On many projects we used either a commercial or FOSS RTOS, and in each
case the init code was auto-generated. (Really nothing more than picking
up start/end addresses for read/write from the linker script, to use in
a single provided copy loop.) I have written my own less than half the
time - there may be embedded developers out there who have never done a
"bare metal" development. For them, once start and end labels, including
gap edges, are provided in the linker script, a small example in the ld
info would be the minimum needed.

> It could be the case currently where code from different
> non-contiguous ROMs is copied into a RAM during startup. This, IMHO,
> is always specific to the particular embedded system being deployed.

OK, I'd thought that rare these days, as ROMs are so much bigger than
in my youth, but you did mention the case of overlays. It is easy to
imagine a separate ROM for one or several RAM-sized overlays. Then
overlay handling is as easy as manually handling gapped LMA, just done
in an overlay handler, rather than init.

With granularity equal to input sections, the proposal seems eminently
feasible, and an interesting project. I don't know what relocs might
ensue from bumping the ld location counter to the other side of a hole,
as when two input sections from one compile unit are separated to straddle
it, or whether ld would handle that without intervention. I'd be more
confident where the input sections are from separate compile units, and
connected only by globals.

I hope it goes well!

Erik


* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2017-02-22 15:28 Thomas Preudhomme
  2017-02-27 10:27 ` Tejas Belagod
  2017-02-28  5:52 ` Erik Christiansen
@ 2017-03-02  4:32 ` Erik Christiansen
       [not found]   ` <58B83CDA.5050000@foss.arm.com>
  2 siblings, 1 reply; 17+ messages in thread
From: Erik Christiansen @ 2017-03-02  4:32 UTC (permalink / raw)
  To: Thomas Preudhomme, Tejas Belagod; +Cc: binutils

Given that the unit of flow around holes is to be the input section,
there may be a simpler equivalent to the proposed new syntax:

On 22.02.17 15:28, Thomas Preudhomme wrote:
> MEMORY
> 
> {
>   RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
>   RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
>   RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000
> }
> 
> SECTIONS
> {
>   .text 0x1000 : { *(.text) _etext = . ; }
>   .mdata  :
>   AT ( ADDR (.text) + SIZEOF (.text) )
>   { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ
> }

AIUI, that syntax proposal is motivated by the effect of this ld info
documentation "If a file name matches more than one wildcard pattern, or
if a file name appears explicitly and is also matched by a wildcard
pattern, the linker will use the first match in the linker script." I.e.
instead of seeking subsequent matching wildcard patterns when needed, ld
generates an overflow error on .raml, given this hole dodger, using
existing syntax:

   .raml : AT ( ADDR (.text) + SIZEOF (.text) )
   {  _raml_start = . ;
      *(.data) *(.data.*) ;
      _raml_end = . ;
   } > RAML

   .ramu : AT ( LOADADDR (.raml) + SIZEOF (.raml) )
   {  _ramu_start = . ;
      *(.data) *(.data.*) ;
      _ramu_end = . ;
   } > RAMU

   .ramz : AT ( LOADADDR (.ramu) + SIZEOF (.ramu) )
   {  _ramz_start = . ;
      *(.data) *(.data.*) ;
      _ramz_end = . ;
   } > RAMZ

I am led to wonder if it might not be less work to merely tweak ld to
look for subsequent matching wildcard patterns in following output
sections before issuing a region overflow error. I.e. ld merely
redefines "first match" if a subsequent one is available when needed.
That seems less intervention than adding new syntax to the script
interpreter, and then grafting on the new capability.

The overflowing input section needs to remain in the input queue during
the output section bump, to complete its "go-around" on failed landing
approach.

One significant advantage of this approach is that part of the
established practice, i.e. constraining certain input sections to low,
middle, or high RAM regions, remains both straightforward and explicit.
If multiple output sections are directed to a region, even finer
constraint is possible _simultaneous_ with inter-region flowing on
overflow. On the other hand, what would happen if multiple "> RAML,
RAMU, RAMZ" were aimed at these regions in an attempt to enforce a
paging or proximity constraint while flowing?

Utilising the existing syntax, which we've used for many years with
explicit input section patterns, empowered by a small ld intelligence
increment, would seem to manage the task with less effort and more
control. How does that fare as a modest variation on skinning the cat?

Erik

[parent not found: <58B83CDA.5050000@foss.arm.com>]

* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
       [not found]   ` <58B83CDA.5050000@foss.arm.com>
@ 2017-03-03 10:27     ` Erik Christiansen
  2017-03-07 11:06       ` Tejas Belagod
  0 siblings, 1 reply; 17+ messages in thread
From: Erik Christiansen @ 2017-03-03 10:27 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: Thomas Preudhomme, binutils

On 02.03.17 15:40, Tejas Belagod wrote:
> On 02/03/17 04:32, Erik Christiansen wrote:
> > I am led to wonder if it might not be less work to merely tweak ld to
> > look for subsequent matching wildcard patterns in following output
> > sections before issuing a region overflow error. I.e. ld merely
> > redefines "first match" if a subsequent one is available when needed.
> > That seems less intervention than adding new syntax to the script
> > interpreter, and then grafting on the new capability.
> > 
> > The overflowing input section needs to remain in the input queue during
> > the output section bump, to complete its "go-around" on failed landing
> > approach.
> > 
> 
> It does seem like an interesting idea. Two things immediately spring to mind.
> 
> 1. Will it break existing code?

That's perhaps the most important question. At present any input section
pattern repetitions in the linker script would only be nonfunctional
baggage. They would only occur as harmless errors, disregarded by ld,
through its "first match" policy. Adding a command-line option to enable
flowing would however be a useful safeguard.

> 2. How do we honor any ordering specified? For eg. If the above spec means
> that raml will have .data first and .data.* later .ramu is expected to start
> with .data sections, will this break the assumption if a .data.* jumps into
> .ramu and starts the region with it?

Re-using the existing code, an input section would not just fall over
the edge to "start the region". Whether an input section is read from
ld's input, or redirected from an overflowing output section makes no
difference while the input section remains in the input queue,
unallocated. On failing to land in the full output section, it needs to
be redirected to a "second match" in a subsequent output section if
provided, else the pending (existing code) overflow error comes to
fruition. The existing allocation code (being unmodified) then continues
to distribute the input section according to existing pattern matching
behaviour, but using the "second match".

The ordering of input sections into output sections is set out in ld
info. The difference between "*(.text .rdata)" and "*(.text) *(.rdata)"
is described in "3.6.4.1 Input Section Basics".
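
To illustrate that difference with a sketch (the output section name
.out is arbitrary, each line would sit inside SECTIONS, and the two
forms are alternatives, not meant to appear together in one script):

   /* Alternative 1: .text and .rdata input sections are intermingled,
      in the order in which they are found in the linker input. */
   .out : { *(.text .rdata) }

   /* Alternative 2: all .text input sections come first,
      followed by all .rdata input sections. */
   .out : { *(.text) *(.rdata) }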

Thus, if the user wants .ramu and .raml to have identical .data vs
.data.* order, then it'll be copy/paste. But if a difference is desired,
then copy/edit/paste is equally available. It was when one output
section had to "> RAML, RAMU, RAMZ", that region-specific control over
ordering was lost.

It is not suggested to change any code other than to interpose rebasing
of pattern allocation before erroring on output section overflow. If at
that point, we look for "second match" wildcard patterns in subsequent
output sections, then as each input section is read from ld's input, it
will be allocated to the next output section with matching patterns -
using the existing allocation code, influenced only to the extent of
replacing the "first match" patterns from the full output section with
subsequent substitutes.

> > One significant advantage of this approach is that part of the
> > established practice, i.e. constraining certain input sections to low,
> > middle, or high RAM regions, remains both straightforward and explicit.
> > If multiple output sections are directed to a region, even finer
> > constraint is possible _simultaneous_ with inter-region flowing on
> > overflow. On the other hand, what would happen if multiple "> RAML,
> > RAMU, RAMZ" were aimed at these regions in an attempt to enforce a
> > paging or proximity constraint while flowing?
> > 
> 
> I'm not sure I understand this question.

My word picture was a bit fuzzy, I must admit. The minimalist tweak
without syntax extension is capable of constraining some input sections
at the same time as flowing others. Input sections which need to be in
low memory are made to match a wildcard pattern (or explicit file list)
which is placed only in the first output section. Only input sections
which match patterns in subsequent output sections can flow. The
mechanism thus sorts sheep from goats, while flowing. That is very
useful, and should be present in any implementation of flowing, I think.
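
As a sketch of that sorting, using only existing syntax but assuming the
proposed (not yet implemented) flowing behaviour: the input section name
.lowdata is hypothetical, standing for whatever must stay in low memory,
and the fragment is assumed to sit inside SECTIONS, after the .text
output section and with the RAML and RAMU regions from Thomas's script:

   .raml : AT ( ADDR (.text) + SIZEOF (.text) )
   {  _raml_start = . ;
      *(.lowdata)             /* matched only here: cannot flow */
      *(.data) *(.data.*)     /* repeated below: may flow on overflow */
      _raml_end = . ;
   } > RAML

   /* Load image follows .raml's load image */
   .ramu : AT ( LOADADDR (.raml) + SIZEOF (.raml) )
   {  _ramu_start = . ;
      *(.data) *(.data.*)     /* "second match" for overflow from .raml */
      _ramu_end = . ;
   } > RAMU

With today's ld, "first match" applies, so the patterns in .ramu would
pick up nothing, and an oversized .raml would still end in the region
overflow error.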

There would undoubtedly be some real effort involved in tweaking ld to
rebase input pattern "first match" on output section overflow - where a
subsequent match is available. Whether that would best be done as a
"second match" search when needed, or replacing "first match" with a
list of matches at the outset, remains to be seen. The difference
between theory and practice always looks smaller from this side.

Erik

* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2017-03-03 10:27     ` Erik Christiansen
@ 2017-03-07 11:06       ` Tejas Belagod
  0 siblings, 0 replies; 17+ messages in thread
From: Tejas Belagod @ 2017-03-07 11:06 UTC (permalink / raw)
  To: dvalin; +Cc: Thomas Preudhomme, binutils

On 03/03/17 10:27, Erik Christiansen wrote:
> On 02.03.17 15:40, Tejas Belagod wrote:
>> On 02/03/17 04:32, Erik Christiansen wrote:
>>> I am led to wonder if it might not be less work to merely tweak ld to
>>> look for subsequent matching wildcard patterns in following output
>>> sections before issuing a region overflow error. I.e. ld merely
>>> redefines "first match" if a subsequent one is available when needed.
>>> That seems less intervention than adding new syntax to the script
>>> interpreter, and then grafting on the new capability.
>>>
>>> The overflowing input section needs to remain in the input queue during
>>> the output section bump, to complete its "go-around" on failed landing
>>> approach.
>>>
>>
>> It does seem like an interesting idea. Two things immediately spring to mind.
>>
>> 1. Will it break existing code?
>
> That's perhaps the most important question. At present any input section
> pattern repetitions in the linker script would only be nonfunctional
> baggage. They would only occur as harmless errors, disregarded by ld,
> through its "first match" policy. Adding a command-line option to enable
> flowing would however be a useful safeguard.
>
>> 2. How do we honor any ordering specified? For eg. If the above spec means
>> that raml will have .data first and .data.* later .ramu is expected to start
>> with .data sections, will this break the assumption if a .data.* jumps into
>> .ramu and starts the region with it?
>
> Re-using the existing code, an input section would not just fall over
> the edge to "start the region". Whether an input section is read from
> ld's input, or redirected from an overflowing output section makes no
> difference while the input section remains in the input queue,
> unallocated. On failing to land in the full output section, it needs to
> be redirected to a "second match" in a subsequent output section if
> provided, else the pending (existing code) overflow error comes to
> fruition. The existing allocation code (being unmodified) then continues
> to distribute the input section according to existing pattern matching
> behaviour, but using the "second match".
>
> The ordering of input sections into output sections is set out in ld
> info. The difference between "*(.text .rdata)" and "*(.text) *(.rdata)"
> is described in "3.6.4.1 Input Section Basics".
>
> Thus, if the user wants .ramu and .raml to have identical .data vs
> .data.* order, then it'll be copy/paste. But if a difference is desired,
> then copy/edit/paste is equally available. It was when one output
> section had to "> RAML,RAMU, RAMZ", that region-specific control over
> ordering was lost.
>
> It is not suggested to change any code other than to interpose rebasing
> of pattern allocation before erroring on output section overflow. If at
> that point, we look for "second match" wildcard patterns in subsequent
> output sections, then as each input section is read from ld's input, it
> will be allocated to the next output section with matching patterns -
> using the existing allocation code, influenced only to the extent of
> replacing the "first match" patterns from the full output section with
> subsequent substitutes.
>

Ah, yes! That makes a lot of sense. Thanks for clearing that up.

>>> One significant advantage of this approach is that part of the
>>> established practice, i.e. constraining certain input sections to low,
>>> middle, or high RAM regions, remains both straightforward and explicit.
>>> If multiple output sections are directed to a region, even finer
>>> constraint is possible _simultaneous_ with inter-region flowing on
>>> overflow. On the other hand, what would happen if multiple "> RAML,
>>> RAMU, RAMZ" were aimed at these regions in an attempt to enforce a
>>> paging or proximity constraint while flowing?
>>>
>>
>> I'm not sure I understand this question.
>
> My word picture was a bit fuzzy, I must admit. The minimalist tweak
> without syntax extension is capable of constraining some input sections
> at the same time as flowing others. Input sections which need to be in
> low memory are made to match a wildcard pattern (or explicit file list)
> which is placed only in the first output section. Only input sections
> which match patterns in subsequent output section can flow. The
> mechanism thus sorts sheep from goats, while flowing. That is very
> useful, and should be present in any implementation of flowing, I think.
>
> There would undoubtedly be some real effort involved in tweaking ld to
> rebase input pattern "first match" on output section overflow - where a
> subsequent match is available. Whether that would best be done as a
> "second match" search when needed, or replacing "first match" with a
> list of matches at the outset, remains to be seen. The difference
> between theory and practice always looks smaller from this side.
>

I like the approach you've proposed. I admit it is more practical than extending 
the syntax for more regions. But, I see 2 disadvantages that are more cosmetic 
than anything else:

1. Is the duplication of patterns over multiple output section regions as
expressive of the intent as using '> REGION1, REGION2,..., REGIONX'? Though you
could argue that if the subsequent-match flowing feature is controlled by a
command-line switch, the user knows what they're doing and the intention would
be implicit.

2. If we have complex patterns matching input sections/filenames, duplicating
them over multiple output section statements might be prone to copy-paste
errors. Keeping them consistent after changes means diligently replicating them
everywhere, which adds to maintenance overhead.

I agree that replacing the first-match rule with a subsequent-match rule
controlled by a command-line switch has a much lower implementation cost. It
will be interesting to hear the views of a maintainer about the preferred
approach.

Thanks,
Tejas.


> Erik
>

Thread overview: 17+ messages
2017-03-09 12:06 [RFC] Allow linker scripts to specify multiple output regions for an output section? Erik Christiansen
2017-06-09 12:21 ` Tejas Belagod
2017-06-09 13:35   ` Erik Christiansen
2019-06-27 12:58     ` Christophe Lyon
2019-07-02  6:49       ` Erik Christiansen
2019-07-11  8:42         ` Christophe Lyon
2019-07-24  7:28           ` Nick Clifton
2019-07-24  9:18             ` Simon Richter
2019-07-24 12:48               ` Erik Christiansen
  -- strict thread matches above, loose matches on Subject: below --
2017-02-22 15:28 Thomas Preudhomme
2017-02-27 10:27 ` Tejas Belagod
2017-02-28  5:52 ` Erik Christiansen
2017-02-28 12:11   ` Tejas Belagod
2017-03-01  7:12     ` Erik Christiansen
2017-03-02  4:32 ` Erik Christiansen
     [not found]   ` <58B83CDA.5050000@foss.arm.com>
2017-03-03 10:27     ` Erik Christiansen
2017-03-07 11:06       ` Tejas Belagod
