Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?

public inbox for binutils@sourceware.org

From: Tejas Belagod <tejas.belagod@foss.arm.com>
To: dvalin@internode.on.net
Cc: Thomas Preudhomme <thomas.preudhomme@foss.arm.com>,
	binutils@sourceware.org
Subject: Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
Date: Tue, 28 Feb 2017 12:11:00 -0000
Message-ID: <58B568E0.9010906@foss.arm.com>
In-Reply-To: <20170228055141.GA4400@ratatosk>

Hi Erik,

Thanks for your comments. My replies are inline below.

On 28/02/17 05:51, Erik Christiansen wrote:
> On 22.02.17 15:28, Thomas Preudhomme wrote:
>> There has been some interest in the past in having syntactic support for
>> specifying mapping of an output section to multiple memory regions in the
>> GNU LD scripting language (eg.
>> https://sourceware.org/bugzilla/show_bug.cgi?id=14299). I would like to
>> propose a scheme here and welcome any feedback.
>
> TL;DR: Detailed response begins after 6 paragraphs.
>
> OK, in the absence of prior discussion, I'll just think aloud as I
> correlate the proposal with my experience in three decades developing
> embedded systems. Unfortunately, the one time an MMU was involved, that
> was done by the time I became involved, but memory holes are all black.
>
> The closest scenario I recall is where there were disparate physical
> memories, both on and off chip, I simply added a MEMORY region for each
> such block, e.g. Flash, 16bit SRAM, 8bit SRAM, a couple of small ones
> for specific memory mapped system chips with bunches of config
> registers, and maybe an FPGA in the mix. Add comments for device names
> and the waitstate generator values, and the script serves as central
> documentation too.
>
> With that one-to-one region mapping, there was never any conflict over
> where stuff should be located, and none were interchangeable. It is as
> described by "some on-chip memory and some off-chip memory, but at
> non-contiguous addresses" in the above link. And where we had both 8 and
> 16 bit SRAMS, it was most definitely consistent with "a region of
> on-chip SRAM which performs better for code, and the remainder performs
> better for data", except that using the wrong one was fatal rather than
> merely inferior.
>
> One issue I've encountered is detecting region overflow when multiple
> output sections contribute to its content, but existing syntax supports
> that, e.g.:
>
> MEMORY
> {
>    flash   (rx)  : ORIGIN = 0, LENGTH = 32K
>    ram    (rw!x) : ORIGIN = 0x800060, LENGTH = 2K
>    eeprom (rw!x) : ORIGIN = 0x810000, LENGTH = 1K
> }
>
> . = ASSERT (_etext + SIZEOF (.data) <= LENGTH(flash),
>     "Error: .text + .data collectively overflow the flash memory.") ;
>
> But the need to flow across memory holes never eventuated in practice,
> as a modest chunk of on-chip RAM could always be used for e.g. sdata,
> leaving no need for flowing. All other regions were always incompatible,
> making flowing impossible.
>
> ...
>> If LMA is specified, the image(startup code etc.) most likely handles
>> the copying from load address to output section VMA.
>
> Yes, it does. And in the generic init code I've encountered, it has just
> been a single copy loop for e.g. bss, performing a contiguous block copy.
> (And when I've written it, that was true too.)
>
>> Multiple segment spec means the output section can be part of more
>> than one segment and ‘fillexp’ simply fills the output section loaded
>> with the fill value.
>
> Trans-hole flowing would also require a runtime copy loop for each
> non-contiguous block, or a table-driven multi-block copier, with the
> run-time table somehow initialised from the linker script. (I can
> imagine using variables defined in the linker script, and the .RPT
> assembler directive - maybe.)
>
>> Now, this does not have a method to specify output section spanning multiple
>> memory regions. For example, if there are 2 RAM regions RAML and RAMU and
>> the user wants an output section to first fill RAML and then when RAML is
>> full, i.e. when the remaining space in RAML cannot accommodate a full input
>> section, start filling RAMU, the user has to split the sections into
>> multiple output sections. If we extend this syntax to specify multiple
>> output regions, we can make the linker map the output section to multiple
>> regions by filling the output region with input sections in the order
>> specified in the ‘output-section-command’ and when it's full (meaning when
>> the remaining gap in a region cannot accommodate one full input section), it
>> starts from the next output region.
>
> This seems to be the alternate view of the problem of asking ld to flow
> code around holes in a region, something it still can't do, IIRC. I
> state it that way, because two non-contiguous memory regions over which
> code (or data) may be interchangeably flowed, are identical to a single
> region with a hole.
>
> The proposal does seem to be a way to think about addressing that issue:
>
>> Eg.
>>
>> MEMORY
>>
>> {
>>    RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
>>    RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
>>    RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000	
>> }
>>
>> SECTIONS
>> {
>>    .text 0x1000 : { *(.text) _etext = . ; }
>>    .mdata  :
>>    AT ( ADDR (.text) + SIZEOF (.text) )
>>    { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ
>> }
>
> Without the need for new syntax or complex init code generators,
> having gcc flow code across up to 5 pages of flash plus .lowtext and a
> floating .hightext was compatible with the linker script and tests shown
> here:
>
> http://lists.nongnu.org/archive/html/avr-gcc-list/2012-12/msg00044.html
>
> While details have faded from wet RAM, ISTR that holes were
> manufacturable by not populating any of the 5 pages, which gcc sees as
> named spaces. The gcc stuff was done in the AVR back end, IIRC, while an
> implementation in ld would be generic.
>
>> Illustration:
>>
>> Consider an example where we have the following input .data sections:
>>
>> .data: size 0x0000FFF0
>> .data.a : size 0x000000F0
>> .data.b : size 0x00003000
>> .data.c : size 0x00000200
>>
>> With the above scheme, this will be mapped in the following way to RAML,RAMU
>> and RAMZ:
>>
>> RAML : (0x1FFF0000 - 0x1FFFFFF0): .data
>>         (0x1FFFFFF0 - 0x1FFFFFFF): *** GAP ***
>
> Would GAP use ALIGNMENT, or introduce a new parameter?
>

I wouldn't want to overload ALIGNMENT here - what if it's needed simultaneously
with ALIGNMENT? Can we not leave this space unassigned? More often than not, if
one is filling a memory region automatically, would one really care what goes
into the gaps (if security is not a concern)? OTOH, if security is a concern, we
can explore introducing new syntax with a default behavior of zero-filling the
gaps.

> How would the target-specific relocations required to break code across
> the hole be handled by ld? E.g. break a small AVR code loop (with 6-bit
> relative addressing range) and you'll need a LJMP to bridge the hole,
> and another with reversed loop conditionality to close the loop.
> Multiply that task by all the possible relocs, and again by all the
> possible CPU targets, and it's never-ending work for a software team for
> life.
>

As I understand it, compilers generate references to objects within a section as

  .<input_section_name> + offset_within_section

Now, when a section that spans 2 or more regions inserts holes/padding to prevent
an object from straddling 2 regions, the offsets within the section to other
objects will change. This means the compiler-generated "section + offset"
references to all objects that come after the padding would need to be fixed up.
It's really difficult to know which ones to fix up - the relocations are only
against the section label, not the object in the section. So, what I'm proposing
here will not split the input sections - input sections will move as a block.
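
To make the "move as a block" behaviour concrete, here is a minimal sketch in C
of the placement rule from the original proposal. It is not ld code: the struct
and function names are hypothetical, alignment is ignored, and it assumes that
once a region is skipped the linker never goes back to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a MEMORY region; not an ld data structure. */
struct region {
    const char *name;
    uint32_t origin;
    uint32_t length;
    uint32_t used;      /* bytes already assigned in this region */
};

/* Hypothetical model of an input section that is never split. */
struct input_section {
    const char *name;
    uint32_t size;
    uint32_t vma;       /* assigned start address (output) */
    int region;         /* index of the region it landed in, or -1 */
};

/*
 * Place input sections in order. When the space left in the current
 * region cannot hold the whole section, move on to the next region.
 * Returns 0 on success, -1 if the regions are exhausted.
 */
int place_sections(struct region *regions, size_t nregions,
                   struct input_section *secs, size_t nsecs)
{
    size_t r = 0;

    for (size_t i = 0; i < nsecs; i++) {
        while (r < nregions &&
               regions[r].length - regions[r].used < secs[i].size)
            r++;                        /* leave a gap, try next region */
        if (r == nregions)
            return -1;                  /* nothing left to flow into */

        secs[i].vma = regions[r].origin + regions[r].used;
        secs[i].region = (int)r;
        regions[r].used += secs[i].size;
        assert(regions[r].used <= regions[r].length);
    }
    return 0;
}
```

Fed the RAML/RAMU/RAMZ regions and the four .data* sizes from the quoted
illustration, this should give the same start addresses as the proposal, with
the last 0x10 bytes of RAML left as the gap. Because each section keeps its
internal layout, no "section + offset" reference needs fixing up.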

> It seems more
>
>> RAMU : (0x20000000 - 0x200000F0): .data.a
>>         (0x200000F0 - 0x200030F0): .data.b
>>         (0x200030F0 - 0x200032F0): .data.c
>>
>>
>> It will not affect the specification in terms of the other attributes, but
>> one (LMA):
>>
>> * Output section VMA: No change - this just specifies where the output
>> section will start.
>>
>> * type: No change - this is for the output section as a whole - output
>> memory regions will not change it.
>>
>> * LMA: The output section can still be loaded from one LMA and mapped to
>> output VMA - the only change here is that the loader will need to map the
>> output sections to VMA with the same pattern as the multiple output region
>> matching code above. Can a loader do that? Can ad-hoc loaders do this? Or do
>> all loaders assume that regions are contiguous when output section is
>> mapped to VMAs?
>
> Contiguous. Hole-flowing is what you're proposing to implement, both the
> linker internal component (target-specific relocations), and the generic
> (e.g. table driven) multi-block copy loop synthesiser for custom init
> code generation. How that would integrate with existing init code in
> various implementations, I have no idea.
>
> If LMA can also be flowed around a hole, then runtime init code must be
> able to handle not only non-contiguous delivery, but gapped pick-up. Has
> the complexity of simultaneously handling different gaps in both been
> considered?
>

I haven't thought about that. Can it work on the principle that when one
specifies an LMA and there is user-written init code to copy blocks, the init
code programmer knows the LMA gap layout and can handle the gaps accordingly? It
could already be the case today that code from different non-contiguous ROMs is
copied into a RAM during startup. This, IMHO, is always specific to the
particular embedded system being deployed.
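
As a rough illustration of what such user-written init code could look like,
here is a sketch in C of the table-driven multi-block copier Erik mentions. The
struct and function names are made up for this example, and how the table would
actually be populated (e.g. from symbols defined in the linker script) is left
open; nothing here is an existing ld or libc startup interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one contiguous block to copy at startup. */
struct copy_block {
    const void *lma;    /* where the block is stored (load address) */
    void *vma;          /* where the block must live at run time */
    size_t size;        /* number of bytes in this block */
};

/*
 * Copy every block from its LMA to its VMA. Gaps on either side are
 * handled simply by not being described in the table.
 */
void copy_blocks(const struct copy_block *table, size_t count)
{
    assert(table != NULL || count == 0);

    for (size_t i = 0; i < count; i++)
        memcpy(table[i].vma, table[i].lma, table[i].size);
}
```

With one entry per contiguous run, the same loop covers both a gapped LMA and a
gapped VMA; the system-specific knowledge lives entirely in the table.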

> ...
>> For orthogonality and consistency, we would want to apply the multiple
>> region feature to overlays too. The semantics will not be different from the
>> algorithm mentioned above. The only caveat is that the overlay
>> manager/loader will need to handle the swapping in and out of sections that
>> run from the VMA consistently with the mapping algo described above. Do we
>> want this for overlays too?
>
> Expanding the complexity of a single-problem solution to cover other
> situations seems courageous, unless it naturally falls out of the
> narrower solution. As overlays are used e.g. when RAM size or CPU
> instruction addressing range is constrained, but there's ample flash,
> then the likelihood of holes in either is limited, I suspect.
>

Makes sense.

Thanks,
Tejas.

> Specifying discrete output sections with VMAs placed around the physical
> holes is another way to dodge them. They can all be allocated to a
> global encompassing memory region. Flowing is performed manually by
> assigning suitable code chunks to preferred input sections. Automating
> that, as intimated above, is non-trivial.
>
> Caveat: Above thoughts have flowed without aid of caffeine, and are
>          recollections from old battles.
>
> Erik
>

