* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
@ 2017-03-09 12:06 Erik Christiansen
  2017-06-09 12:21 ` Tejas Belagod
  0 siblings, 1 reply; 17+ messages in thread
From: Erik Christiansen @ 2017-03-09 12:06 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: Thomas Preudhomme, binutils

As two prior posts of this content have failed to appear on-list,
apparently due to an auto-spamblock, we'll try again after invoking an
unblock, and waiting a bit.

On 07.03.17 11:06, Tejas Belagod wrote:
> On 03/03/17 10:27, Erik Christiansen wrote:
> > There would undoubtedly be some real effort involved in tweaking ld to
> > rebase input pattern "first match" on output section overflow - where a
> > subsequent match is available. Whether that would best be done as a
> > "second match" search when needed, or replacing "first match" with a
> > list of matches at the outset, remains to be seen. The difference
> > between theory and practice always looks smaller from this side.
> >
>
> I like the approach you've proposed. I admit it is more practical than
> extending the syntax for more regions. But, I see 2 disadvantages that are
> more cosmetic than anything else:
>
> 1. Is the duplicity of patterns over multiple output section regions as
> expressive of the intent as using '> REGION1, REGION2,..., REGIONX'? Though
> you could argue that if the subsequent-match flowing feature is controlled
> by a command-line switch, the user knows what they're doing and the
> intention would be implicit.

Both flowing notations make the intent explicit in the linker script, I
think, but when adding inter-region flow, it is worthwhile considering
more than one use case. Requests for flowing around holes have appeared
on this list before, and flowing input sections according to a single
set of ordering patterns is the simplest case, without much flexibility.

Let us consider an embedded application where flowing of the bulk of
input sections is required, but some of them must remain together for
addressing reasons (e.g. pointer-relative reach, or page-zero addressing).
That requires differing input section sorting between output sections,
but is not possible when a single set of patterns is forced on all the
flow regions.

For such cases, and more complex ones where we might require another
group of input sections to be herded into e.g. the last region (it might
be slower bulk memory, best for low-use data, perhaps), it is essential
that the linker script be able to specify such sorting, optimally by the
existing scripting method of utilising an output section to contain the
sorting pattern subset for the associated memory region.

For the simple repetitive case, copy/paste is the slightest of burdens
which disappears instantly the moment the product requirements evolve
due to the hardware boffins or customer requirements necessitating
selective flowing. That is not the time to discover that we have
designed all flexibility out of the flowing implementation, I submit.
(Most especially when the flexibility comes at no cost beyond the pattern
rebasing for basic flowing.)

Yes, a command line switch for enabling flowing might be a useful
safeguard, as we have discussed.

> 2. If we have complex patterns matching input sections/filenames,
> duplicating it over multiple output sections statements might be prone to
> copy-paste errors. Keeping them consistent after changes means diligently
> replicating them everywhere - adds to maintenance overhead.

Hmmm ... the initial replication is mere copy/paste, but I take your
point about subsequent maintenance of the simple case. However, being
able to edit code/makefiles/scripts is a basic programmer prerequisite.
Muck up a makefile, and we discover the need for accuracy too.

Currently, we similarly flow by manually tweaking input section patterns
in output sections, so creating and keeping an eye on the patterns is
nothing new, really.

Importantly, if any input sections need to be herded about, the
mechanism for it has not been designed out.

> I agree that replacing the first-match rule with a subsequent match rule
> controlled by a command-line switch is much much lower implementation cost.
> It will be interesting to hear views of a maintainer about the preferred
> approach.

I'm also intrigued whether you'd in the end be satisfied with a partial
flowing solution, covering only one use case, when less effort covers
a very broad diversity, using existing notation and usage. ;-)

Erik

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2017-03-09 12:06 [RFC] Allow linker scripts to specify multiple output regions for an output section? Erik Christiansen
@ 2017-06-09 12:21 ` Tejas Belagod
  2017-06-09 13:35   ` Erik Christiansen
  0 siblings, 1 reply; 17+ messages in thread
From: Tejas Belagod @ 2017-06-09 12:21 UTC (permalink / raw)
  To: dvalin; +Cc: Thomas Preudhomme, binutils

On 09/03/17 12:06, Erik Christiansen wrote:
> As two prior posts of this content have failed to appear on-list,
> apparently due to an auto-spamblock, we'll try again after invoking an
> unblock, and waiting a bit.
>
> On 07.03.17 11:06, Tejas Belagod wrote:
>> On 03/03/17 10:27, Erik Christiansen wrote:
>>> There would undoubtedly be some real effort involved in tweaking ld to
>>> rebase input pattern "first match" on output section overflow - where a
>>> subsequent match is available. Whether that would best be done as a
>>> "second match" search when needed, or replacing "first match" with a
>>> list of matches at the outset, remains to be seen. The difference
>>> between theory and practice always looks smaller from this side.
>>>
>>
>> I like the approach you've proposed. I admit it is more practical than
>> extending the syntax for more regions. But, I see 2 disadvantages that are
>> more cosmetic than anything else:
>>
>> 1. Is the duplicity of patterns over multiple output section regions as
>> expressive of the intent as using '> REGION1, REGION2,..., REGIONX'? Though
>> you could argue that if the subsequent-match flowing feature is controlled
>> by a command-line switch, the user knows what they're doing and the
>> intention would be implicit.
>
> Both flowing notations make the intent explicit in the linker script, I
> think, but when adding inter-region flow, it is worthwhile considering
> more than one use case. Requests for flowing around holes have appeared
> on this list before, and flowing input sections according to a single
> set of ordering patterns is the simplest case, without much flexibility.
>
> Let us consider an embedded application where flowing of the bulk of
> input sections is required, but some of them must remain together for
> addressing reasons (e.g. pointer-relative reach, or page-zero addressing).
> That requires differing input section sorting between output sections,
> but is not possible when a single set of patterns is forced on all the
> flow regions.
>
> For such cases, and more complex ones where we might require another
> group of input sections to be herded into e.g. the last region (it might
> be slower bulk memory, best for low-use data, perhaps), it is essential
> that the linker script be able to specify such sorting, optimally by the
> existing scripting method of utilising an output section to contain the
> sorting pattern subset for the associated memory region.
>
> For the simple repetitive case, copy/paste is the slightest of burdens
> which disappears instantly the moment the product requirements evolve
> due to the hardware boffins or customer requirements necessitating
> selective flowing. That is not the time to discover that we have
> designed all flexibility out of the flowing implementation, I submit.
> (Most especially when the flexibility comes at no cost beyond the pattern
> rebasing for basic flowing.)
>
> Yes, a command line switch for enabling flowing might be a useful
> safeguard, as we have discussed.
>
>> 2. If we have complex patterns matching input sections/filenames,
>> duplicating it over multiple output sections statements might be prone to
>> copy-paste errors. Keeping them consistent after changes means diligently
>> replicating them everywhere - adds to maintenance overhead.
>
> Hmmm ... the initial replication is mere copy/paste, but I take your
> point about subsequent maintenance of the simple case. However, being
> able to edit code/makefiles/scripts is a basic programmer prerequisite.
> Muck up a makefile, and we discover the need for accuracy too.
>
> Currently, we similarly flow by manually tweaking input section patterns
> in output sections, so creating and keeping an eye on the patterns is
> nothing new, really.
>
> Importantly, if any input sections need to be herded about, the
> mechanism for it has not been designed out.
>
>> I agree that replacing the first-match rule with a subsequent match rule
>> controlled by a command-line switch is much much lower implementation cost.
>> It will be interesting to hear views of a maintainer about the preferred
>> approach.
>
> I'm also intrigued whether you'd in the end be satisfied with a partial
> flowing solution, covering only one use case, when less effort covers
> a very broad diversity, using existing notation and usage. ;-)
>
> Erik
>

Hi,

Just wanted to give you the latest on this. Unfortunately, I have suspended
working on this feature for the foreseeable future owing to a shift in
priorities. Will update if there is a change in circumstances.

Thanks,
Tejas.

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2017-06-09 12:21 ` Tejas Belagod
@ 2017-06-09 13:35   ` Erik Christiansen
  2019-06-27 12:58     ` Christophe Lyon
  0 siblings, 1 reply; 17+ messages in thread
From: Erik Christiansen @ 2017-06-09 13:35 UTC (permalink / raw)
  To: binutils

On 09.06.17 13:20, Tejas Belagod wrote:
> > > I agree that replacing the first-match rule with a subsequent match rule
> > > controlled by a command-line switch is much much lower implementation cost.
> > > It will be interesting to hear views of a maintainer about the preferred
> > > approach.
...
> Hi,
>
> Just wanted to give you the latest on this. Unfortunately, I have suspended
> working on this feature for the foreseeable future owing to a shift in
> priorities. Will update if there is a change in circumstances.

Ah, pity, but life is the art of the possible, not the ideal.
Will look forward to hearing of any resumption, should it occur.

Erik

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2017-06-09 13:35   ` Erik Christiansen
@ 2019-06-27 12:58     ` Christophe Lyon
  2019-07-02  6:49       ` Erik Christiansen
  0 siblings, 1 reply; 17+ messages in thread
From: Christophe Lyon @ 2019-06-27 12:58 UTC (permalink / raw)
  To: dvalin, Tejas Belagod; +Cc: binutils, Maxim Kuvyrkov, Peter Smith

Hi!

On Fri, 9 Jun 2017 at 15:35, Erik Christiansen <dvalin@internode.on.net> wrote:
>
> On 09.06.17 13:20, Tejas Belagod wrote:
> > > > I agree that replacing the first-match rule with a subsequent match rule
> > > > controlled by a command-line switch is much much lower implementation cost.
> > > > It will be interesting to hear views of a maintainer about the preferred
> > > > approach.
> ...
>
> > Hi,
> >
> > Just wanted to give you the latest on this. Unfortunately, I have suspended
> > working on this feature for the foreseeable future owing to a shift in
> > priorities. Will update if there is a change in circumstances.
>
> Ah, pity, but life is the art of the possible, not the ideal.
> Will look forward to hearing of any resumption, should it occur.
>

We have received requests to support non-contiguous memory regions in
the BFD linker, so it seems it is time to resurrect this thread :-)

IIUC, Erik made a proposal that seems simpler to implement than the
initial one in
https://sourceware.org/ml/binutils/2017-03/msg00020.html

Tejas, on your side, do you have any news about this project? (ideally
in terms of implementation, but updated specs would be good too)

Thanks,

Christophe

> Erik

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2019-06-27 12:58     ` Christophe Lyon
@ 2019-07-02  6:49       ` Erik Christiansen
  2019-07-11  8:42         ` Christophe Lyon
  0 siblings, 1 reply; 17+ messages in thread
From: Erik Christiansen @ 2019-07-02  6:49 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Tejas Belagod, binutils, Maxim Kuvyrkov, Peter Smith

On 27.06.19 14:58, Christophe Lyon wrote:
> On Fri, 9 Jun 2017 at 15:35, Erik Christiansen <dvalin@internode.on.net> wrote:
> > On 09.06.17 13:20, Tejas Belagod wrote:
> > > > > I agree that replacing the first-match rule with a subsequent match rule
> > > > > controlled by a command-line switch is much much lower implementation cost.
> > > > > It will be interesting to hear views of a maintainer about the preferred
> > > > > approach.
> > ...

> We have received requests to support non-contiguous memory regions in
> the BFD linker, so it seems it is time to resurrect this thread :-)
>
> IIUC, Erik made a proposal that seems simpler to implement than the
> initial one in
> https://sourceware.org/ml/binutils/2017-03/msg00020.html
>
> Tejas, on your side, do you have any news about this project? (ideally
> in terms of implementation, but updated specs would be good too)

Having not heard from Tejas or Thomas since the on-list posts in 2017,
the closest I can offer to draft specs is a summary of my recollection
of the consensus reached:

A memory region with a hole is identical to two non-contiguous memory
regions, if flowing of input sections preempts the linker's existing
overflow detection. Existing linker script syntax can specify the
regions, but the flowing is currently lacking. Flowing should occur when
the remaining space in a region cannot accommodate the next full input
section about to be allocated, and the current input section matches a
pattern in a subsequent output section, directed to another memory
region. (The comparison is (LENGTH - SIZEOF) vs input section size, as we
have a 1:1 correspondence between output section and memory region for
flow steering, anyway.)

Flowing granularity is full input sections. That avoids needing to
rework relocations within an input section, due to flowing. Care may
need to be taken to still invoke existing linker code to provide
(default or explicit) FILL to the memory remnant. (I.e. slip it in
before resuming allocation in the next region.)

A command line switch would usefully protect against any backward linker
script incompatibility.

A new refinement?: The required search for a second pattern match might
be more simply implemented by remembering the output section in which
the first match was found, and when needing a second match, resuming the
search from the following output section. When available subsequent
matches are exhausted, the existing overflow detection is finally
reached, giving consistent behaviour.

The simplified implementation does not require modification of linker
script syntax. It also allows explicit placement of chosen input
sections in a preferred memory section. In addition to simple flowing of
*(.data) *(.data.*):

MEMORY
{
  RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
  RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
  RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000
}

SECTIONS
{
  .raml : AT ( ADDR (.text) + SIZEOF (.text) )
  { _raml_start = . ;
    *(.boot) ;
    *(.data) *(.data.*) ;
    _raml_end = . ;
  } > RAML

  .ramu : AT ( ADDR (.raml) + SIZEOF (.raml) )
  { _ramu_start = . ;
    *(.data) *(.data.*) ;
    _ramu_end = . ;
  } > RAMU

  .ramz : AT ( ADDR (.ramu) + SIZEOF (.ramu) )
  { _ramz_start = . ;
    *(.data) *(.data.*) ;
    version.data
    _ramz_end = . ;
  } > RAMZ
}

additional patterns can be specified to allocate key input sections in a
specific memory region. Such control would not be achievable with a new
syntax ">RAML,RAMU,RAMZ" implementation.

It's not a very detailed "spec", but it's the strategy to date. (Barring
anything that I've forgotten.)

It would be interesting to hear how well that matches your use case, as
this is the time to wrangle any wrinkles, while maintaining existing
user flexibility.

Erik

^ permalink raw reply [flat|nested] 17+ messages in thread
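The second-match strategy summarised in the message above can be modelled roughly as follows. This is a sketch only, not ld code: the function `flow`, the tuple layout, and the sections `.data.big` and `.data.small` are invented for illustration. It assumes one memory region per output section, ignores alignment, and reads "no backtracking" as closing an output section at its first overflow. The file-name pattern `version.data` is left out because the model matches section names only.

```python
from fnmatch import fnmatchcase

# (output section, LENGTH of its memory region, patterns in script order)
OUTPUT_SECTIONS = [
    (".raml", 0x00010000, [".boot", ".data", ".data.*"]),
    (".ramu", 0x00040000, [".data", ".data.*"]),
    (".ramz", 0x00040000, [".data", ".data.*"]),
]


def flow(output_sections, input_sections):
    """Assign (name, size) input sections to output sections.

    Returns (placement, remnants): placement maps each input section
    name to an output section name, remnants maps each output section
    name to the unused bytes that would be left for FILL.
    """
    placement = {}
    remnants = {}
    for out_name, length, patterns in output_sections:
        used = 0        # plays the role of SIZEOF (out_name)
        closed = False  # assumption: set at first overflow, no backtracking
        for pattern in patterns:
            for name, size in input_sections:
                if name in placement or not fnmatchcase(name, pattern):
                    continue
                if closed or size > length - used:
                    # Leave the section unclaimed so that a following
                    # output section can supply the "second match".
                    closed = True
                    continue
                placement[name] = out_name
                used += size
        remnants[out_name] = length - used
    unplaced = [name for name, _ in input_sections if name not in placement]
    if unplaced:
        # Subsequent matches exhausted: ordinary overflow detection applies.
        raise ValueError(f"sections do not fit: {unplaced}")
    return placement, remnants


inputs = [
    (".boot", 0x1000),
    (".data", 0xE000),
    (".data.big", 0x2000),    # hypothetical: larger than what is left of RAML
    (".data.small", 0x0800),  # hypothetical: would fit the RAML remnant
]
placement, remnants = flow(OUTPUT_SECTIONS, inputs)
for name, out_name in placement.items():
    print(f"{name:12s} -> {out_name}")
```

In this model `.boot` can only ever land in `.raml`, because no later output section lists a pattern for it, which is the kind of explicit placement the per-region pattern lists are meant to keep available.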
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2019-07-02  6:49       ` Erik Christiansen
@ 2019-07-11  8:42         ` Christophe Lyon
  2019-07-24  7:28           ` Nick Clifton
  0 siblings, 1 reply; 17+ messages in thread
From: Christophe Lyon @ 2019-07-11  8:42 UTC (permalink / raw)
  To: Christophe Lyon, Tejas Belagod, binutils, Maxim Kuvyrkov,
	Peter Smith, Alan Modra, nick clifton

On Tue, 2 Jul 2019 at 08:49, Erik Christiansen <dvalin@internode.on.net> wrote:
>
> On 27.06.19 14:58, Christophe Lyon wrote:
> > On Fri, 9 Jun 2017 at 15:35, Erik Christiansen <dvalin@internode.on.net> wrote:
> > > On 09.06.17 13:20, Tejas Belagod wrote:
> > > > > > I agree that replacing the first-match rule with a subsequent match rule
> > > > > > controlled by a command-line switch is much much lower implementation cost.
> > > > > > It will be interesting to hear views of a maintainer about the preferred
> > > > > > approach.
> > > ...
>
> > We have received requests to support non-contiguous memory regions in
> > the BFD linker, so it seems it is time to resurrect this thread :-)
> >
> > IIUC, Erik made a proposal that seems simpler to implement than the
> > initial one in
> > https://sourceware.org/ml/binutils/2017-03/msg00020.html
> >
> > Tejas, on your side, do you have any news about this project? (ideally
> > in terms of implementation, but updated specs would be good too)
>
> Having not heard from Tejas or Thomas since the on-list posts in 2017,
> the closest I can offer to draft specs is a summary of my recollection
> of the consensus reached:
>
> A memory region with a hole is identical to two non-contiguous memory
> regions, if flowing of input sections preempts the linker's existing
> overflow detection. Existing linker script syntax can specify the
> regions, but the flowing is currently lacking. Flowing should occur when
> the remaining space in a region cannot accommodate the next full input
> section about to be allocated, and the current input section matches a
> pattern in a subsequent output section, directed to another memory
> region. (The comparison is (LENGTH - SIZEOF) vs input section size, as we
> have a 1:1 correspondence between output section and memory region for
> flow steering, anyway.)
>
> Flowing granularity is full input sections. That avoids needing to
> rework relocations within an input section, due to flowing. Care may
> need to be taken to still invoke existing linker code to provide
> (default or explicit) FILL to the memory remnant. (I.e. slip it in
> before resuming allocation in the next region.)
>
> A command line switch would usefully protect against any backward linker
> script incompatibility.
>
> A new refinement?: The required search for a second pattern match might
> be more simply implemented by remembering the output section in which
> the first match was found, and when needing a second match, resuming the
> search from the following output section. When available subsequent
> matches are exhausted, the existing overflow detection is finally
> reached, giving consistent behaviour.
>
> The simplified implementation does not require modification of linker
> script syntax. It also allows explicit placement of chosen input
> sections in a preferred memory section. In addition to simple flowing of
> *(.data) *(.data.*):
>
> MEMORY
> {
>   RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
>   RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
>   RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000
> }
>
> SECTIONS
> {
>   .raml : AT ( ADDR (.text) + SIZEOF (.text) )
>   { _raml_start = . ;
>     *(.boot) ;
>     *(.data) *(.data.*) ;
>     _raml_end = . ;
>   } > RAML
>
>   .ramu : AT ( ADDR (.raml) + SIZEOF (.raml) )
>   { _ramu_start = . ;
>     *(.data) *(.data.*) ;
>     _ramu_end = . ;
>   } > RAMU
>
>   .ramz : AT ( ADDR (.ramu) + SIZEOF (.ramu) )
>   { _ramz_start = . ;
>     *(.data) *(.data.*) ;
>     version.data
>     _ramz_end = . ;
>   } > RAMZ
> }
>
> additional patterns can be specified to allocate key input sections in a
> specific memory region. Such control would not be achievable with a new
> syntax ">RAML,RAMU,RAMZ" implementation.
>
> It's not a very detailed "spec", but it's the strategy to date. (Barring
> anything that I've forgotten.)
>
> It would be interesting to hear how well that matches your use case, as
> this is the time to wrangle any wrinkles, while maintaining existing
> user flexibility.
>

Hi,

Sorry for the delay.

Thanks very much for this updated summary. I checked with other
internal users, and it seems it would be OK for our use case.

I think I will have a look at implementing it, but not immediately
because of holidays. I hope it's not too difficult :-)

What do maintainers think about this? Would it be acceptable?

Thanks,

Christophe

> Erik

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2019-07-11  8:42         ` Christophe Lyon
@ 2019-07-24  7:28           ` Nick Clifton
  2019-07-24  9:18             ` Simon Richter
  0 siblings, 1 reply; 17+ messages in thread
From: Nick Clifton @ 2019-07-24  7:28 UTC (permalink / raw)
  To: Christophe Lyon, Tejas Belagod, binutils, Maxim Kuvyrkov,
	Peter Smith, Alan Modra

Hi Christophe,

>> The simplified implementation does not require modification of linker
>> script syntax. It also allows explicit placement of chosen input
>> sections in a preferred memory section. In addition to simple flowing of
>> *(.data) *(.data.*):

>> SECTIONS
>> {
>>   .raml : AT ( ADDR (.text) + SIZEOF (.text) )
>>   { _raml_start = . ;
>>     *(.boot) ;
>>     *(.data) *(.data.*) ;
>>     _raml_end = . ;
>>   } > RAML
>>
>>   .ramu : AT ( ADDR (.raml) + SIZEOF (.raml) )
>>   { _ramu_start = . ;
>>     *(.data) *(.data.*) ;
>>     _ramu_end = . ;
>>   } > RAMU

> What do maintainers think about this? Would it be acceptable?

Yes, but you need to be very careful about what happens when switching
from one output section to another. Can the linker backtrack to an
earlier output section if it subsequently finds an input section which
will fit in the remaining space?

If you do allow backtracking then the ordering of sections can change
from the current linker's specified behaviour (sections are linked in
input order unless a SORT keyword is used). And so users will complain.
If you don't allow backtracking then there could be gaps in memory
regions which could have been used, and users will complain... :-)

Cheers
  Nick

^ permalink raw reply [flat|nested] 17+ messages in thread
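The trade-off described in the message above can be seen with a toy allocator. This is an illustrative sketch under simplifying assumptions: `allocate` and its parameters are invented names, regions are reduced to a byte capacity, and sections to a size. It does not reflect how ld is implemented.

```python
def allocate(capacities, sizes, backtrack):
    """Pick a region index for each section size, taken in input order.

    With backtrack=False a region is abandoned for good once a section
    fails to fit in it. With backtrack=True every region is retried for
    every section (plain first fit).

    Returns (chosen, free): the region index per section and the bytes
    left unused in each region.
    """
    free = list(capacities)
    first_open = 0
    chosen = []
    for size in sizes:
        start = 0 if backtrack else first_open
        for idx in range(start, len(free)):
            if size <= free[idx]:
                free[idx] -= size
                chosen.append(idx)
                if not backtrack:
                    first_open = idx
                break
        else:
            raise ValueError(f"section of size {size} does not fit")
    return chosen, free


capacities = [100, 100]
sizes = [60, 50, 30]

no_bt = allocate(capacities, sizes, backtrack=False)
with_bt = allocate(capacities, sizes, backtrack=True)
print("no backtracking:", no_bt)
print("backtracking:   ", with_bt)
```

Without backtracking the third section follows the second into region 1 and 40 bytes of region 0 stay unused. With backtracking the third section fills part of that gap, but it now sits at a lower address than the second section, so memory order no longer follows input order.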
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2019-07-24  7:28           ` Nick Clifton
@ 2019-07-24  9:18             ` Simon Richter
  2019-07-24 12:48               ` Erik Christiansen
  0 siblings, 1 reply; 17+ messages in thread
From: Simon Richter @ 2019-07-24  9:18 UTC (permalink / raw)
  To: binutils

Hi,

On Wed, Jul 24, 2019 at 08:28:05AM +0100, Nick Clifton wrote:

> If you do allow backtracking then the ordering of sections can change
> from the current linker's specified behaviour (sections are linked in
> input order unless a SORT keyword is used). And so users will complain.
> If you don't allow backtracking then there could be gaps in memory
> regions which could have been used, and users will complain... :-)

And if a section would fit after linker relaxation, but the new layout
would make the relaxations invalid...

   Simon

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2019-07-24  9:18             ` Simon Richter
@ 2019-07-24 12:48               ` Erik Christiansen
  0 siblings, 0 replies; 17+ messages in thread
From: Erik Christiansen @ 2019-07-24 12:48 UTC (permalink / raw)
  To: binutils

On 24.07.19 11:18, Simon Richter wrote:
> Hi,
>
> On Wed, Jul 24, 2019 at 08:28:05AM +0100, Nick Clifton wrote:
>
> > If you do allow backtracking then the ordering of sections can change
> > from the current linker's specified behaviour (sections are linked in
> > input order unless a SORT keyword is used). And so users will complain.
> > If you don't allow backtracking then there could be gaps in memory
> > regions which could have been used, and users will complain... :-)

In the years we have waited for a sufficient need to warrant
implementation, potential users have been rarer than hen's teeth¹.
My instinct is that anyone needing flowing will be grateful to have it,
and will accept both the proffered granularity, and freedom from the
curse of backtracking.

The prior consensus has up to now been to respect FILL for the trailing
segment remnant, and that looks fine to me. (It is a darn sight easier
to implement too, I figure.) That a given hole potentially grows by
almost the granularity of the nearest input section seems hardly
troubling - and if it is, then it is up to the user to reduce the size
of the nearest input section, I submit.

But I figure that the casting vote probably goes to the implementer, who
has the imperative of a current and actual use case. :-)

> And if a section would fit after linker relaxation, but the new layout
> would make the relaxations invalid...

The previous run-up at this problem specified the flowing granularity to
be an input section. Flowing an input section across a hole has been
avoided, specifically to stay out of the clutches of such relocation
issues. (There's enough work there already, I think. :-)

Erik

¹ All right, there was one, admittedly, but it didn't lead to
  implementation.

^ permalink raw reply [flat|nested] 17+ messages in thread
* [RFC] Allow linker scripts to specify multiple output regions for an output section?
@ 2017-02-22 15:28 Thomas Preudhomme
  2017-02-27 10:27 ` Tejas Belagod
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Thomas Preudhomme @ 2017-02-22 15:28 UTC (permalink / raw)
  To: binutils, tejas.belagod

[Sending on behalf of Tejas Belagod, please reply to both him (in Cc) and me]

Hi,

There has been some interest in the past in having syntactic support for
specifying mapping of an output section to multiple memory regions in the GNU LD
scripting language (eg. https://sourceware.org/bugzilla/show_bug.cgi?id=14299).
I would like to propose a scheme here and welcome any feedback.

The section command in the LD Script language is structured thus:

section [address] [(type)] :
    [AT(lma)]
    [ALIGN(section_align)]
    [SUBALIGN(subsection_align)]
    [constraint]
    {
      output-section-command
      output-section-command
      ...
    } [>region] [AT>lma_region] [:phdr :phdr ...] [=fillexp]

As I understand, it simply means - place the output section at "address" with
attributes specified above (type, alignment etc). If LMA is specified, the
image (startup code etc.) most likely handles the copying from load address to
output section VMA. Multiple segment spec means the output section can be part
of more than one segment and "fillexp" simply fills the output section loaded
with the fill value.

Now, this does not have a method to specify output section spanning multiple
memory regions. For example, if there are 2 RAM regions RAML and RAMU and the
user wants an output section to first fill RAML and then when RAML is full, i.e.
when the remaining space in RAML cannot accommodate a full input section, start
filling RAMU, the user has to split the sections into multiple output sections.
If we extend this syntax to specify multiple output regions, we can make the
linker map the output section to multiple regions by filling the output region
with input sections in the order specified in the "output-section-command" and
when it's full (meaning when the remaining gap in a region cannot accommodate one
full input section), it starts from the next output region. Eg.

MEMORY
{
  RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
  RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
  RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000
}

SECTIONS
{
  .text 0x1000 : { *(.text) _etext = . ; }
  .mdata :
    AT ( ADDR (.text) + SIZEOF (.text) )
    { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ
}

The statement:

  .mdata :
    AT ( ADDR (.text) + SIZEOF (.text) )
    { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ

Will have roughly the following meaning:

For_each_output_section {
  curr_mem_region = get_next_mem_region ();
  location_counter = get_vma_mem_region (curr_mem_region);

  While (fill) {
    current_input_section = get_next_input_section ();

    If (location_counter > end_vma_of_mem_region_in_list)
      Break;

    mem_avail_in_curr_region = get_vma_mem_region (curr_mem_region)
                               + sizeof (curr_mem_region) - location_counter;

    If (sizeof (current_input_section) > mem_avail_in_curr_region)
      {
        curr_mem_region = get_next_mem_region ();
        location_counter = get_vma_mem_region (curr_mem_region);
      }

    process_section (current_input_section, location_counter);
    location_counter += sizeof (current_input_section);
  }
}

Illustration:

Consider an example where we have the following input .data sections:

.data   : size 0x0000FFF0
.data.a : size 0x000000F0
.data.b : size 0x00003000
.data.c : size 0x00000200

With the above scheme, this will be mapped in the following way to RAML, RAMU
and RAMZ:

RAML : (0x1FFF0000 - 0x1FFFFFF0): .data
       (0x1FFFFFF0 - 0x1FFFFFFF): *** GAP ***

RAMU : (0x20000000 - 0x200000F0): .data.a
       (0x200000F0 - 0x200030F0): .data.b
       (0x200030F0 - 0x200032F0): .data.c

It will not affect the specification in terms of the other attributes, but one
(LMA):

* Output section VMA: No change - this just specifies where the output section
will start.

* type: No change - this is for the output section as a whole - output memory
regions will not change it.

* LMA: The output section can still be loaded from one LMA and mapped to output
VMA - the only change here is that the loader will need to map the output
sections to VMA with the same pattern as the multiple output region matching
code above. Can a loader do that? Can ad-hoc loaders do this? Or do all loaders
assume that regions are contiguous when output section is mapped to VMAs?

* phdr: No change - Multiple values can still be specified here. One can have an
output section map to multiple segments irrespective of their output memory
region mapping.

* Fillexp: No change. We might possibly want to introduce a fillexp for the gaps
left behind when filling multiple output memory regions.

Caveats:

A comma-separated list of regions will not guarantee contiguous placement of
input sections; the only way to get a contiguous placement of input sections
will be to assign the output section to one monolithic memory region.

For orthogonality and consistency, we would want to apply the multiple region
feature to overlays too. The semantics will not be different from the algorithm
mentioned above. The only caveat is that the overlay manager/loader will need to
handle the swapping in and out of sections that run from the VMA consistently
with the mapping algo described above. Do we want this for overlays too?

^ permalink raw reply [flat|nested] 17+ messages in thread
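The fill loop of the RFC above can be modelled in a few lines of Python to check the illustration. This is a sketch of the proposed behaviour only, not ld code: `Region` and `place_sections` are invented names, and alignment, LMA handling and fill are ignored. Unlike the pseudocode, which moves on by one region at most, this version keeps skipping regions until the section fits.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    origin: int
    length: int

    @property
    def end(self) -> int:
        # First address past the region.
        return self.origin + self.length


def place_sections(regions, sections):
    """Place (name, size) input sections in order across the regions.

    Returns (region_name, start, end, section_name) tuples with an
    exclusive end address. A section that does not fit in the space
    left in the current region moves the location counter to the next
    region; earlier regions are never revisited.
    """
    placements = []
    idx = 0
    location = regions[0].origin
    for name, size in sections:
        while size > regions[idx].end - location:
            idx += 1
            if idx == len(regions):
                raise ValueError(f"{name} does not fit in any listed region")
            location = regions[idx].origin
        placements.append((regions[idx].name, location, location + size, name))
        location += size
    return placements


regions = [
    Region("RAML", 0x1FFF0000, 0x00010000),
    Region("RAMU", 0x20000000, 0x00040000),
    Region("RAMZ", 0x20040000, 0x00040000),
]
sections = [
    (".data", 0x0000FFF0),
    (".data.a", 0x000000F0),
    (".data.b", 0x00003000),
    (".data.c", 0x00000200),
]

layout = place_sections(regions, sections)
for region, start, end, name in layout:
    print(f"{region}: 0x{start:08X} - 0x{end:08X}: {name}")
```

With the sizes from the illustration, `.data` leaves only 0x10 bytes in RAML, so `.data.a` and everything after it land in RAMU, and RAMZ stays empty.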
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section? 2017-02-22 15:28 Thomas Preudhomme @ 2017-02-27 10:27 ` Tejas Belagod 2017-02-28 5:52 ` Erik Christiansen 2017-03-02 4:32 ` Erik Christiansen 2 siblings, 0 replies; 17+ messages in thread From: Tejas Belagod @ 2017-02-27 10:27 UTC (permalink / raw) To: amodra; +Cc: Thomas Preud'homme, binutils Hi Alan, Do you have any comments on this? Thanks, Tejas. On 22/02/17 15:28, Thomas Preudhomme wrote: > [Sending on behalf of Tejas Belagod, please reply to both him (in Cc) and me] > > Hi, > > There has been some interest in the past in having syntactic support for > specifying mapping of an output section to multiple memory regions in the GNU LD > scripting language (eg. https://sourceware.org/bugzilla/show_bug.cgi?id=14299). > I would like to propose a scheme here and welcome any feedback. > > The section command in the LD Script language is structured thus: > > section [address] [(type)] : > [AT(lma)] > [ALIGN(section_align)] > [SUBALIGN(subsection_align)] > [constraint] > { > output-section-command > output-section-command > ... > } [>region] [AT>lma_region] [:phdr :phdr ...] [=fillexp] > > As I understand, it simply means - place the output section at âaddressâ with > attributes specified above (type, alignment etc). If LMA is specified, the > image(startup code etc.) most likely handles the copying from load address to > output section VMA. Multiple segment spec means the output section can be part > of more than one segment and âfillexpâ simply fills the output section loaded > with the fill value. > > Now, this does not have a method to specify output section spanning multiple > memory regions. For example, if there are 2 RAM regions RAML and RAMU and the > user wants an output section to first fill RAML and then when RAML is full, i.e. 
> when the remaining space in RAML cannot accommodate a full input section, start > filling RAMU, the user has to split the sections into multiple output sections. > If we extend this syntax to specify multiple output regions, we can make the > linker map the output section to multiple regions by filling the output region > with input sections in the order specified in the 'output-section-command' and > when it's full (meaning when the remaining gap in a region cannot accommodate one > full input section), it starts from the next output region. Eg. > > MEMORY > > { > RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000 > RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000 > RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000 > } > > SECTIONS > { > .text 0x1000 : { *(.text) _etext = . ; } > .mdata : > AT ( ADDR (.text) + SIZEOF (.text) ) > { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ > } > > The statement: > > .mdata : > AT ( ADDR (.text) + SIZEOF (.text) ) > { _data = . ; *(.data) *(.data.*); _edata = . 
; } > RAML, RAMU, RAMZ > > > Will have roughly the following meaning: > > For_each_output_section { > curr_mem_region = get_next_mem_region (); > location_counter = get_vma_mem_region (curr_mem_region); > > While (fill) { > current_input_section = get_next_input_section (); > > If (location_counter > end_vma_of_mem_region_in_list) > Break; > > mem_avail_in_curr_region = get_vma_mem_region (curr_mem_region) + sizeof > (curr_mem_region) - location_counter; > > If ( sizeof (current_input_section) > mem_avail_in_curr_region)) > { > curr_mem_region = get_next_mem_region (); > location_counter = get_vma_mem_region (curr_mem_region); > } > > process_section (current_input_section, location_counter); > location_counter += sizeof (current_input_section); > } > > } > > > Illustration: > > Consider an example where we have the following input .data sections: > > .data: size 0x0000FFF0 > .data.a : size 0x000000F0 > .data.b : size 0x00003000 > .data.c : size 0x00000200 > > With the above scheme, this will be mapped in the following way to RAML,RAMU and > RAMZ: > > RAML : (0x1FFF0000 - 0x1FFFFFF0): .data > (0x1FFFFFF0 - 0x1FFFFFFF): *** GAP *** > > RAMU : (0x20000000 - 0x200000F0): .data.a > (0x200000F0 - 0x200030F0): .data.b > (0x200030F0 - 0x200032F0): .data.c > > > It will not affect the specification in terms of the other attributes, but one > (LMA): > > * Output section VMA: No change - this just specifies where the output section > will start. > > * type: No change - this is for the output section as a whole - output memory > regions will not change it. > > * LMA: The output section can still be loaded from one LMA and mapped to output > VMA - the only change here is that the loader will need to map the output > sections to VMA with the same pattern as the multiple output region matching > code above. Can a loader do that? Can ad-hoc loaders do this? Or do all loaders > assume that regions are continguous when output section is mapped to VMAs? 
> > * phdr: No change - Multiple values can still be specified here. One can have an > output section map to multiple segments irrespective of their output memory > region mapping. > > * Fillexp: No change. We might possibly want to introduce a fillexp for the gaps > left behind when filling multiple output memory regions. > > Caveats: > > A comma-separated list of regions will not guarantee contiguous placement of > input sections, the only way to get a contiguous placement of input sections > will be to assign the output section to one monolithic memory region. > > For orthogonality and consistency, we would want to apply the multiple region > feature to overlays too. The semantics will not be different from the algorithm > mentioned above. The only caveat is that the overlay manager/loader will need to > handle the swapping in and out of sections that run from the VMA consistently > with the mapping algo described above. Do we want this for overlays too? > ^ permalink raw reply [flat|nested] 17+ messages in thread
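As a rough illustration of the fill rule quoted above, here is a small Python simulation. It is only a sketch of the proposal as described in this thread, not of anything ld actually does; the names Region and place are invented for the example. Input sections are placed whole, in order, and the location counter jumps to the origin of the next listed region when the current section does not fit in what remains. The data is taken from the illustration in the quoted message.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One MEMORY region (hypothetical helper type for this sketch)."""
    name: str
    origin: int
    length: int

    @property
    def end(self):
        return self.origin + self.length


def place(sections, regions):
    """Place (name, size) input sections whole and in order.

    When a section does not fit in the space left in the current region,
    move to the origin of the next region in the list.
    """
    placements = []
    remaining = iter(regions)
    region = next(remaining)
    cursor = region.origin
    for name, size in sections:
        while size > region.end - cursor:
            region = next(remaining, None)
            if region is None:
                raise MemoryError(f"{name} does not fit in any remaining region")
            cursor = region.origin
        placements.append((name, region.name, cursor, cursor + size))
        cursor += size
    return placements


# Regions and section sizes from the illustration in the quoted proposal.
regions = [
    Region("RAML", 0x1FFF0000, 0x00010000),
    Region("RAMU", 0x20000000, 0x00040000),
    Region("RAMZ", 0x20040000, 0x00040000),
]
sections = [(".data", 0xFFF0), (".data.a", 0xF0), (".data.b", 0x3000), (".data.c", 0x200)]

result = place(sections, regions)
for name, region_name, start, end in result:
    print(f"{region_name}: 0x{start:08X} - 0x{end:08X}: {name}")
```

The sketch never goes back to an earlier region, so the 0x10 bytes left at the top of RAML stay unused, which is the GAP shown in the illustration.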
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section? 2017-02-22 15:28 Thomas Preudhomme 2017-02-27 10:27 ` Tejas Belagod @ 2017-02-28 5:52 ` Erik Christiansen 2017-02-28 12:11 ` Tejas Belagod 2017-03-02 4:32 ` Erik Christiansen 2 siblings, 1 reply; 17+ messages in thread From: Erik Christiansen @ 2017-02-28 5:52 UTC (permalink / raw) To: Thomas Preudhomme; +Cc: binutils, tejas.belagod On 22.02.17 15:28, Thomas Preudhomme wrote: > There has been some interest in the past in having syntactic support for > specifying mapping of an output section to multiple memory regions in the > GNU LD scripting language (eg. > https://sourceware.org/bugzilla/show_bug.cgi?id=14299). I would like to > propose a scheme here and welcome any feedback. TL;DR: Detailed response begins after 6 paragraphs. OK, in the absence of prior discussion, I'll just think aloud as I correlate the proposal with my experience in three decades developing embedded systems. Unfortunately, the one time an MMU was involved, that was done by the time I became involved, but memory holes are all black. The closest scenario I recall is where there were disparate physical memories, both on and off chip; I simply added a MEMORY region for each such block, e.g. Flash, 16bit SRAM, 8bit SRAM, a couple of small ones for specific memory mapped system chips with bunches of config registers, and maybe an FPGA in the mix. Add comments for device names and the waitstate generator values, and the script serves as central documentation too. With that one-to-one region mapping, there was never any conflict over where stuff should be located, and none were interchangeable. It is as described by "some on-chip memory and some off-chip memory, but at non-contiguous addresses" in the above link. 
And where we had both 8 and 16 bit SRAMS, it was most definitely consistent with "a region of on-chip SRAM which performs better for code, and the remainder performs better for data", except that using the wrong one was fatal rather than merely inferior. One issue I've encountered is detecting region overflow when multiple output sections contribute to its content, but existing syntax supports that, e.g.: MEMORY { flash (rx) : ORIGIN = 0, LENGTH = 32K ram (rw!x) : ORIGIN = 0x800060, LENGTH = 2K eeprom (rw!x) : ORIGIN = 0x810000, LENGTH = 1K } . = ASSERT (_etext + SIZEOF (.data) <= LENGTH(flash) , "Error: .text + .data collectively overflow the flash memory." ) ; But the need to flow across memory holes never eventuated in practice, as a modest chunk of on-chip RAM could always be used for e.g. sdata, leaving no need for flowing. All other regions were always incompatible, making flowing impossible. ... > If LMA is specified, the image(startup code etc.) most likely handles > the copying from load address to output section VMA. Yes, it does. And in the generic init code I've encountered, it has just been a single copy loop for e.g. bss, performing a contiguous block copy. (And when I've written it, that was true too.) > Multiple segment spec means the output section can be part of more > than one segment and 'fillexp' simply fills the output section loaded > with the fill value. Trans-hole flowing would also require a runtime copy loop for each non-contiguous block, or a table-driven multi-block copier, with the run-time table somehow initialised from the linker script. (I can imagine using variables defined in the linker script, and the .RPT assembler directive - maybe.) > Now, this does not have a method to specify output section spanning multiple > memory regions. For example, if there are 2 RAM regions RAML and RAMU and > the user wants an output section to first fill RAML and then when RAML is > full, i.e. 
when the remaining space in RAML cannot accommodate a full input > section, start filling RAMU, the user has to split the sections into > multiple output sections. If we extend this syntax to specify multiple > output regions, we can make the linker map the output section to multiple > regions by filling the output region with input sections in the order > specified in the 'output-section-command' and when it's full (meaning when > the remaining gap in a region cannot accommodate one full input section), it > starts from the next output region. This seems to be the alternate view of the problem of asking ld to flow code around holes in a region, something it still can't do, IIRC. I state it that way, because two non-contiguous memory regions over which code (or data) may be interchangeably flowed, are identical to a single region with a hole. The proposal does seem to be a way to think about addressing that issue: > Eg. > > MEMORY > > { > RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000 > RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000 > RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000 > } > > SECTIONS > { > .text 0x1000 : { *(.text) _etext = . ; } > .mdata : > AT ( ADDR (.text) + SIZEOF (.text) ) > { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ > } Without the need for new syntax or complex init code generators, having gcc flow code across up to 5 pages of flash plus .lowtext and a floating .hightext was compatible with the linker script and tests shown here: http://lists.nongnu.org/archive/html/avr-gcc-list/2012-12/msg00044.html While details have faded from wet RAM, ISTR that holes were manufacturable by not populating any of the 5 pages, which gcc sees as named spaces. The gcc stuff was done in the AVR back end, IIRC, while an implementation in ld would be generic. 
> Illustration: > > Consider an example where we have the following input .data sections: > > .data: size 0x0000FFF0 > .data.a : size 0x000000F0 > .data.b : size 0x00003000 > .data.c : size 0x00000200 > > With the above scheme, this will be mapped in the following way to RAML,RAMU > and RAMZ: > > RAML : (0x1FFF0000 - 0x1FFFFFF0): .data > (0x1FFFFFF0 - 0x1FFFFFFF): *** GAP *** Would GAP use ALIGNMENT, or introduce a new parameter? How would the target-specific relocations required to break code across the hole be handled by ld? E.g. break a small AVR code loop (with 6-bit relative addressing range) and you'll need a LJMP to bridge the hole, and another with reversed loop conditionality to close the loop. Multiply that task by all the possible relocs, and again by all the possible CPU targets, and it's never-ending work for a software team for life. It seems more > RAMU : (0x20000000 - 0x200000F0): .data.a > (0x200000F0 - 0x200030F0): .data.b > (0x200030F0 - 0x200032F0): .data.c > > > It will not affect the specification in terms of the other attributes, but > one (LMA): > > * Output section VMA: No change - this just specifies where the output > section will start. > > * type: No change - this is for the output section as a whole - output > memory regions will not change it. > > * LMA: The output section can still be loaded from one LMA and mapped to > output VMA - the only change here is that the loader will need to map the > output sections to VMA with the same pattern as the multiple output region > matching code above. Can a loader do that? Can ad-hoc loaders do this? Or do > all loaders assume that regions are contiguous when output section is > mapped to VMAs? Contiguous. Hole-flowing is what you're proposing to implement, both the linker internal component (target-specific relocations), and the generic (e.g. table driven) multi-block copy loop synthesiser for custom init code generation. 
How that would integrate with existing init code in various implementations, I have no idea. If LMA can also be flowed around a hole, then runtime init code must be able to handle not only non-contiguous delivery, but gapped pick-up. Has the complexity of simultaneously handling different gaps in both been considered? ... > For orthogonality and consistency, we would want to apply the multiple > region feature to overlays too. The semantics will not be different from the > algorithm mentioned above. The only caveat is that the overlay > manager/loader will need to handle the swapping in and out of sections that > run from the VMA consistently with the mapping algo described above. Do we > want this for overlays too? Expanding the complexity of a single-problem solution to cover other situations seems courageous, unless it naturally falls out of the narrower solution. As overlays are used e.g. when RAM size or CPU instruction addressing range is constrained, but there's ample flash, then the likelihood of holes in either is limited, I suspect. Specifying discrete output sections with VMAs placed around the physical holes is another way to dodge them. They can all be allocated to a global encompassing memory region. Flowing is performed manually by assigning suitable code chunks to preferred input sections. Automating that, as intimated above, is non-trivial. Caveat: Above thoughts have flowed without aid of caffeine, and are recollections from old battles. Erik ^ permalink raw reply [flat|nested] 17+ messages in thread
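To make the "table-driven multi-block copier" mentioned above a little more concrete, here is a toy Python model. It is a sketch under stated assumptions only: real init code would be C or assembler, and the table would have to be built from symbols exported by the linker script. The tuple layout (offset in the load image, destination VMA, size) and the name copy_blocks are made up for this example.

```python
def copy_blocks(load_image, table, ram):
    """Copy a contiguous load image out to non-contiguous run-time blocks.

    load_image: bytes stored contiguously at the LMA.
    table: list of (lma_offset, vma, size) entries (hypothetical format).
    ram: dict mapping address -> byte value, standing in for target memory.
    """
    for lma_offset, vma, size in table:
        chunk = load_image[lma_offset:lma_offset + size]
        for i, byte in enumerate(chunk):
            ram[vma + i] = byte


# Ten bytes stored back to back in flash, delivered to two separate RAM blocks.
load_image = b"AAAABBBBBB"
table = [
    (0, 0x1FFF0000, 4),  # first block lands in the lower region
    (4, 0x20000000, 6),  # the rest lands on the far side of the hole
]
ram = {}
copy_blocks(load_image, table, ram)
```

A single contiguous copy loop is the special case of a one-entry table. Gapped pick-up on the LMA side would only need the offsets in the table to stop being back to back.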
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section? 2017-02-28 5:52 ` Erik Christiansen @ 2017-02-28 12:11 ` Tejas Belagod 2017-03-01 7:12 ` Erik Christiansen 0 siblings, 1 reply; 17+ messages in thread From: Tejas Belagod @ 2017-02-28 12:11 UTC (permalink / raw) To: dvalin; +Cc: Thomas Preudhomme, binutils Hi Erik, Thanks for your comments. My comments inline below. On 28/02/17 05:51, Erik Christiansen wrote: > On 22.02.17 15:28, Thomas Preudhomme wrote: >> There has been some interest in the past in having syntactic support for >> specifying mapping of an output section to multiple memory regions in the >> GNU LD scripting language (eg. >> https://sourceware.org/bugzilla/show_bug.cgi?id=14299). I would like to >> propose a scheme here and welcome any feedback. > > TL;DR: Detailed response begins after 6 paragraphs. > > OK, in the absence of prior discussion, I'll just think aloud as I > correlate the proposal with my experience in three decades developing > embedded systems. Unfortunately, the one time an MMU was involved, that > was done by the time I became involved, but memory holes are all black. > > The closest scenario I recall is where there were disparate physical > memories, both on and off chip, I simply added a MEMORY region for each > such block, e.g. Flash, 16bit SRAM, 8bit SRAM, a couple of small ones > for specific memory mapped system chips with bunches of config > registers, and maybe an FPGA in the mix. Add comments for device names > and the waitstate generator values, and the script serves as central > documentation too. > > With that one-to-one region mapping, there was never any conflict over > where stuff should be located, and non were interchangeable. It is as > described by "some on-chip memory and some off-chip memory, but at > non-contiguous addresses" in the above link. 
And where we had both 8 and > 16 bit SRAMS, it was most definitely consistent with "a region of > on-chip SRAM which performs better for code, and the remainder performs > better for data", except that using the wrong one was fatal rather than > merely inferior. > > One issue I've encountered is detecting region overflow when multiple > output sections contribute to its content, but existing syntax supports > that, e.g.: > > MEMORY > { > flash (rx) : ORIGIN = 0, LENGTH = 32K > ram (rw!x) : ORIGIN = 0x800060, LENGTH = 2K > eeprom (rw!x) : ORIGIN = 0x810000, LENGTH = 1K > } > > . = ASSERT (_etext + SIZEOF (.data) <= LENGTH(flash) , "Error: .text + .data > collectively overflow the flash memory." ) ; > > But the need to flow across memory holes never eventuated in practice, > as a modest chunk of on-chip RAM could always be used for e.g. sdata, > leaving no need for flowing. All other regions were always incompatible, > making flowing impossible. > > ... >> If LMA is specified, the image(startup code etc.) most likely handles >> the copying from load address to output section VMA. > > Yes, it does. And in the generic init code I've encountered, it has just > been a single copy loop for e.g. bss, performing a contiguous block copy. > (And when I've written it, that was true too.) > >> Multiple segment spec means the output section can be part of more >> than one segment and 'fillexp' simply fills the output section loaded >> with the fill value. > > Trans-hole flowing would also require a runtime copy loop for each > non-contiguous block, or a table-driven multi-block copier, with the > run-time table somehow initialised from the linker script. (I can > imagine using variables defined in the linker script, and the .RPT > assembler directive - maybe.) > >> Now, this does not have a method to specify output section spanning multiple >> memory regions. 
For example, if there are 2 RAM regions RAML and RAMU and >> the user wants an output section to first fill RAML and then when RAML is >> full, i.e. when the remaining space in RAML cannot accommodate a full input >> section, start filling RAMU, the user has to split the sections into >> multiple output sections. If we extend this syntax to specify multiple >> output regions, we can make the linker map the output section to multiple >> regions by filling the output region with input sections in the order >> specified in the 'output-section-command' and when it's full (meaning when >> the remaining gap in a region cannot accommodate one full input section), it >> starts from the next output region. > > This seems to be the alternate view of the problem of asking ld to flow > code around holes in a region, something it still can't do, IIRC. I > state it that way, because two non-contiguous memory regions over which > code (or data) may be interchangeably flowed, are identical to a single > region with a hole. > > The proposal does seem to be a way to think about addressing that issue: > >> Eg. >> >> MEMORY >> >> { >> RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000 >> RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000 >> RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000 >> } >> >> SECTIONS >> { >> .text 0x1000 : { *(.text) _etext = . ; } >> .mdata : >> AT ( ADDR (.text) + SIZEOF (.text) ) >> { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ >> } > > Without the need for new syntax or complex init code generators, > having gcc flow code across up to 5 pages of flash plus .lowtext and a > floating .hightext was compatible with the linker script and tests shown > here: > > http://lists.nongnu.org/archive/html/avr-gcc-list/2012-12/msg00044.html > > While details have faded from wet RAM, ISTR that holes were > manufacturable by not populating any of the 5 pages, which gcc sees as > named spaces. 
The gcc stuff was done in the AVR back end, IIRC, while an > implementation in ld would be generic. > >> Illustration: >> >> Consider an example where we have the following input .data sections: >> >> .data: size 0x0000FFF0 >> .data.a : size 0x000000F0 >> .data.b : size 0x00003000 >> .data.c : size 0x00000200 >> >> With the above scheme, this will be mapped in the following way to RAML,RAMU >> and RAMZ: >> >> RAML : (0x1FFF0000 - 0x1FFFFFF0): .data >> (0x1FFFFFF0 - 0x1FFFFFFF): *** GAP *** > > Would GAP use ALIGNMENT, or introduce a new parameter? > I wouldn't want to overload ALIGNMENT here - what if its needed simultaneously with ALIGNMENT. Can we not leave this space unassigned? More often than not if one's filling a memory region automatically, would they really care what goes into the gaps (if security is not a concern)? OTOH, if security is a concern, we can explore introducing a new syntax with a default behavior of zero-filling the gaps. > How would the target-specific relocations required to break code across > the hole be handled by ld? E.g. break a small AVR code loop (with 6-bit > relative addressing range) and you'll need a LJMP to bridge the hole, > and another with reversed loop conditionality to close the loop. > Multiply that task by all the possible relocs, and again by all the > possible CPU targets, and it's never-ending work for a software team for > life. > As I understand, compilers generate references to objects within a section with a .<input_section_name> + offset_within_section Now when a section that spans 2 or more regions inserts holes/padding to prevent an object from straddling 2 regions, the offsets within the section to other objects will change. This means all the compiler-generated "section + offset" of all objects that come after the padding will need to be fixed up. Its really difficult to know which ones to fix up - the relocations are only on the section label, not the object in the section. 
So, what I'm proposing here will not split the input sections - input sections will move as a block. > It seems more > >> RAMU : (0x20000000 - 0x200000F0): .data.a >> (0x200000F0 - 0x200030F0): .data.b >> (0x200030F0 - 0x200032F0): .data.c >> >> >> It will not affect the specification in terms of the other attributes, but >> one (LMA): >> >> * Output section VMA: No change - this just specifies where the output >> section will start. >> >> * type: No change - this is for the output section as a whole - output >> memory regions will not change it. >> >> * LMA: The output section can still be loaded from one LMA and mapped to >> output VMA - the only change here is that the loader will need to map the >> output sections to VMA with the same pattern as the multiple output region >> matching code above. Can a loader do that? Can ad-hoc loaders do this? Or do >> all loaders assume that regions are continguous when output section is >> mapped to VMAs? > > Contiguous. Hole-flowing is what you're proposing to implement, both the > linker internal component (target-specific reloctions), and the generic > (e.g. table driven) multi-block copy loop synthesiser for custom init > code generation. How that would integrate with existing init code in > various implementations, I have no idea. > > If LMA can also be flowed around a hole, then runtime init code must be > able to handle not only non-contiguous delivery, but gapped pick-up. Has > the complexity of simultaneously handling different gaps in both been > considered? > I haven't thought about that. Can it be worked on the principle that when one specifies an LMA and there is user-written init code to copy blocks, the init code programmer knows the LMA gap layout and can handle the gaps accordingly? It could be the case currently where code from different non-contiguous ROMs are copied into a RAM during startup. This IMHO, is always specific to the particular embedded system being deployed. > ... 
>> For orthogonality and consistency, we would want to apply the multiple >> region feature to overlays too. The semantics will not be different from the >> algorithm mentioned above. The only caveat is that the overlay >> manager/loader will need to handle the swapping in and out of sections that >> run from the VMA consistently with the mapping algo described above. Do we >> want this for overlays too? > > Expanding the complexity of a single-problem solution to cover other > situations seems courageous, unless it naturally falls out of the > narrower solution. As overlays are used e.g. when RAM size or CPU > instruction addressing range is constrained, but there's ample flash, > then the likelihood of holes in either is limited, I suspect. > Makes sense. Thanks, Tejas. > Specifying discrete output sections with VMAs placed around the physical > holes is another way to dodge them. They can all be allocated to a > global encompassing memory region. Flowing is performed manually by > assigning suitable code chunks to preferred input sections. Automating > that, as intimated above, is non-trivial. > > Caveat: Above thoughts have flowed without aid of caffeine, and are > recollections from old battles. > > Erik > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section? 2017-02-28 12:11 ` Tejas Belagod @ 2017-03-01 7:12 ` Erik Christiansen 0 siblings, 0 replies; 17+ messages in thread From: Erik Christiansen @ 2017-03-01 7:12 UTC (permalink / raw) To: Tejas Belagod; +Cc: Thomas Preudhomme, binutils On 28.02.17 12:11, Tejas Belagod wrote: > On 28/02/17 05:51, Erik Christiansen wrote: > > > > Would GAP use ALIGNMENT, or introduce a new parameter? > > I wouldn't want to overload ALIGNMENT here - what if its needed > simultaneously with ALIGNMENT. Can we not leave this space unassigned? More > often than not if one's filling a memory region automatically, would they > really care what goes into the gaps (if security is not a concern)? My reason for raising the issue of ALIGNMENT was concern about splitting instructions at the edge of a hole during flowing. I see below that the proposed method avoids that problem entirely. > OTOH, if security is a concern, we can explore introducing a new > syntax with a default behavior of zero-filling the gaps. We already have FILL to cover that, so I wouldn't worry. > > How would the target-specific relocations required to break code across > > the hole be handled by ld? E.g. break a small AVR code loop (with 6-bit > > relative addressing range) and you'll need a LJMP to bridge the hole, > > and another with reversed loop conditionality to close the loop. > > Multiply that task by all the possible relocs, and again by all the > > possible CPU targets, and it's never-ending work for a software team for > > life. > > > > As I understand, compilers generate references to objects within a section with a > > .<input_section_name> + offset_within_section > > Now when a section that spans 2 or more regions inserts holes/padding to > prevent an object from straddling 2 regions, the offsets within the section > to other objects will change. 
This means all the compiler-generated "section > + offset" of all objects that come after the padding will need to be fixed > up. Its really difficult to know which ones to fix up - the relocations are > only on the section label, not the object in the section. So, what I'm > proposing here will not split the input sections - input sections will move > as a block. Aha! That is a commendably inexpensive way to avoid a great deal of pain. A little bit of SIZEOF and LENGTH arithmetic in ld easily predicts whether the current input section will fit in the current region, and the start address of the next region becomes the new base for offsets, without the need for additional arithmetic. Very neat. (So long as we size our input sections modestly.) ... > > If LMA can also be flowed around a hole, then runtime init code must be > > able to handle not only non-contiguous delivery, but gapped pick-up. Has > > the complexity of simultaneously handling different gaps in both been > > considered? > > > > I haven't thought about that. Can it be worked on the principle that when > one specifies an LMA and there is user-written init code to copy blocks, the > init code programmer knows the LMA gap layout and can handle the gaps > accordingly? I was playing devil's advocate there - the likelihood of gapped LMA seems low in practice, as flash would mostly be larger than fast RAM. It's just the worst case. On many projects we used either a commercial or FOSS RTOS, and in each case the init code was auto-generated. (Really nothing more than picking up start/end addresses for read/write from the linker script, to use in a single provided copy loop.) I have written my own less than half the time - there may be embedded developers out there who have never done a "bare metal" development. For them, once start and end labels, including gap edges, are provided in the linker script, a small example in the ld info would be the minimum needed. 
> It could be the case currently where code from different > non-contiguous ROMs are copied into a RAM during startup. This IMHO, > is always specific to the particular embedded system being deployed. OK, I'd thought that rare these days, as ROMs are so much bigger than in my youth, but you did mention the case of overlays. It is easy to imagine a separate ROM for one or several RAM-sized overlays. Then overlay handling is as easy as manually handling gapped LMA, just done in an overlay handler, rather than init. With granularity equal to input sections, the proposal seems eminently feasible, and an interesting project. I don't know what relocs might ensue from bumping the ld location counter to the other side of a hole, as when two input sections from one compile unit are separated to straddle it, or whether ld would handle that without intervention. I'd be more confident where the input sections are from separate compile units, and connected only by globals. I hope it goes well! Erik ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section? 2017-02-22 15:28 Thomas Preudhomme 2017-02-27 10:27 ` Tejas Belagod 2017-02-28 5:52 ` Erik Christiansen @ 2017-03-02 4:32 ` Erik Christiansen [not found] ` <58B83CDA.5050000@foss.arm.com> 2 siblings, 1 reply; 17+ messages in thread From: Erik Christiansen @ 2017-03-02 4:32 UTC (permalink / raw) To: Thomas Preudhomme, Tejas Belagod; +Cc: binutils Given that atomicity of flow around holes is to be input sections, there may be a simpler equivalent to the proposed new syntax: On 22.02.17 15:28, Thomas Preudhomme wrote: > MEMORY > > { > RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000 > RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000 > RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000 > } > > SECTIONS > { > .text 0x1000 : { *(.text) _etext = . ; } > .mdata : > AT ( ADDR (.text) + SIZEOF (.text) ) > { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ > } AIUI, that syntax proposal is motivated by the effect of this ld info documentation "If a file name matches more than one wildcard pattern, or if a file name appears explicitly and is also matched by a wildcard pattern, the linker will use the first match in the linker script." I.e. instead of seeking subsequent matching wildcard patterns when needed, ld generates an overflow error on .raml, given this hole dodger, using existing syntax: .raml : AT ( ADDR (.text) + SIZEOF (.text) ) { _rmal_start = . ; *(.data) *(.data.*) ; _raml_end = . ; } > RAML .ramu : AT ( ADDR (.raml) + SIZEOF (.raml) ) { _rmau_start = . ; *(.data) *(.data.*) ; _ramu_end = . ; } > RAMU .ramz : AT ( ADDR (.ramu) + SIZEOF (.ramu) ) { _rmaz_start = . ; *(.data) *(.data.*) ; _ramz_end = . ; } > RAMZ I am led to wonder if it might not be less work to merely tweak ld to look for subsequent matching wildcard patterns in following output sections before issuing a region overflow error. I.e. 
ld merely redefines "first match" if a subsequent one is available when needed. That seems less intervention than adding new syntax to the script interpreter, and then grafting on the new capability. The overflowing input section needs to remain in the input queue during the output section bump, to complete its "go-around" on failed landing approach. One significant advantage of this approach is that part of the established practice, i.e. constraining certain input sections to low, middle, or high RAM regions, remains both straightforward and explicit. If multiple output sections are directed to a region, even finer constraint is possible _simultaneous_ with inter-region flowing on overflow. On the other hand, what would happen if multiple "> RAML, RAMU, RAMZ" were aimed at these regions in an attempt to enforce a paging or proximity constraint while flowing? Utilising the existing syntax, which we've used for many years with explicit input section patterns, empowered by a small ld intelligence increment, would seem to manage the task with less effort and more control. How does that fare as a modest variation on skinning the cat? Erik ^ permalink raw reply [flat|nested] 17+ messages in thread
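The "second match" idea above can be modelled in a few lines of Python. This is a toy reading of the suggestion, not a description of ld internals: the function name allocate, the (name, patterns, capacity) tuples, and the choice to stop using an output section once something has overflowed from it are all assumptions made for the sketch. It also walks input sections in the order given, whereas ld orders them by the patterns inside each output section statement.

```python
from fnmatch import fnmatchcase


def allocate(input_sections, output_sections):
    """Assign each (name, size) input section to the first matching output
    section that still has room, falling through to the next match on overflow.

    output_sections: list of (name, patterns, capacity) tuples, in script order.
    """
    used = {name: 0 for name, _, _ in output_sections}
    layout = {name: [] for name, _, _ in output_sections}
    closed = set()  # output sections that have already overflowed once
    for sec_name, size in input_sections:
        for out_name, patterns, capacity in output_sections:
            if out_name in closed:
                continue
            if not any(fnmatchcase(sec_name, p) for p in patterns):
                continue
            if used[out_name] + size <= capacity:
                layout[out_name].append(sec_name)
                used[out_name] += size
                break
            closed.add(out_name)  # overflow: rebase to the next match
        else:
            raise MemoryError(f"region overflow: no room left for {sec_name}")
    return layout


patterns = [".data", ".data.*"]
output_sections = [
    (".raml", patterns, 0x00010000),
    (".ramu", patterns, 0x00040000),
    (".ramz", patterns, 0x00040000),
]
inputs = [(".data", 0xFFF0), (".data.a", 0xF0), (".data.b", 0x3000), (".data.c", 0x200)]

layout = allocate(inputs, output_sections)
```

Because each output section carries its own pattern list, giving .ramz a different list is all it takes to herd a particular group of input sections into the last region, which is the flexibility argued for above.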
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  [not found] ` <58B83CDA.5050000@foss.arm.com>
@ 2017-03-03 10:27 ` Erik Christiansen
  2017-03-07 11:06 ` Tejas Belagod
  0 siblings, 1 reply; 17+ messages in thread
From: Erik Christiansen @ 2017-03-03 10:27 UTC
To: Tejas Belagod; +Cc: Thomas Preudhomme, binutils

On 02.03.17 15:40, Tejas Belagod wrote:
> On 02/03/17 04:32, Erik Christiansen wrote:
> > I am led to wonder if it might not be less work to merely tweak ld to
> > look for subsequent matching wildcard patterns in following output
> > sections before issuing a region overflow error. I.e. ld merely
> > redefines "first match" if a subsequent one is available when needed.
> > That seems less intervention than adding new syntax to the script
> > interpreter, and then grafting on the new capability.
> >
> > The overflowing input section needs to remain in the input queue during
> > the output section bump, to complete its "go-around" on failed landing
> > approach.
> >
>
> It does seem like an interesting idea. Two things immediately spring to mind.
>
> 1. Will it break existing code?

That's perhaps the most important question. At present any input section
pattern repetitions in the linker script would only be nonfunctional
baggage. They would only occur as harmless errors, disregarded by ld,
through its "first match" policy. Adding a command-line option to enable
flowing would however be a useful safeguard.

> 2. How do we honor any ordering specified? For eg. If the above spec means
> that raml will have .data first and .data.* later .ramu is expected to start
> with .data sections, will this break the assumption if a .data.* jumps into
> .ramu and starts the region with it?

Re-using the existing code, an input section would not just fall over
the edge to "start the region". Whether an input section is read from
ld's input, or redirected from an overflowing output section makes no
difference while the input section remains in the input queue,
unallocated. On failing to land in the full output section, it needs to
be redirected to a "second match" in a subsequent output section if
provided, else the pending (existing code) overflow error comes to
fruition. The existing allocation code (being unmodified) then continues
to distribute the input section according to existing pattern matching
behaviour, but using the "second match".

The ordering of input sections into output sections is set out in ld
info. The difference between "*(.text .rdata)" and "*(.text) *(.rdata)"
is described in "3.6.4.1 Input Section Basics".

Thus, if the user wants .ramu and .raml to have identical .data vs
.data.* order, then it'll be copy/paste. But if a difference is desired,
then copy/edit/paste is equally available. It was when one output
section had to "> RAML, RAMU, RAMZ", that region-specific control over
ordering was lost.

It is not suggested to change any code other than to interpose rebasing
of pattern allocation before erroring on output section overflow. If at
that point, we look for "second match" wildcard patterns in subsequent
output sections, then as each input section is read from ld's input, it
will be allocated to the next output section with matching patterns -
using the existing allocation code, influenced only to the extent of
replacing the "first match" patterns from the full output section with
subsequent substitutes.

> > One significant advantage of this approach is that part of the
> > established practice, i.e. constraining certain input sections to low,
> > middle, or high RAM regions, remains both straightforward and explicit.
> > If multiple output sections are directed to a region, even finer
> > constraint is possible _simultaneous_ with inter-region flowing on
> > overflow. On the other hand, what would happen if multiple "> RAML,
> > RAMU, RAMZ" were aimed at these regions in an attempt to enforce a
> > paging or proximity constraint while flowing?
> >
>
> I'm not sure I understand this question.

My word picture was a bit fuzzy, I must admit. The minimalist tweak
without syntax extension is capable of constraining some input sections
at the same time as flowing others. Input sections which need to be in
low memory are made to match a wildcard pattern (or explicit file list)
which is placed only in the first output section. Only input sections
which match patterns in subsequent output sections can flow. The
mechanism thus sorts sheep from goats, while flowing. That is very
useful, and should be present in any implementation of flowing, I think.

There would undoubtedly be some real effort involved in tweaking ld to
rebase input pattern "first match" on output section overflow - where a
subsequent match is available. Whether that would best be done as a
"second match" search when needed, or replacing "first match" with a
list of matches at the outset, remains to be seen. The difference
between theory and practice always looks smaller from this side.

Erik
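The ordering distinction cited from "Input Section Basics" can be made concrete with a small sketch. It reuses two of the region names from the thread; which ordering goes in which region is an arbitrary choice for illustration, not anything proposed above. As the ld manual describes it, separate patterns place every section matching the first pattern ahead of those matching the second, while one combined pattern intermingles them in the order they are found in the linker input. Note that under the first-match policy the .ramu patterns here would receive nothing; the per-region difference only matters once flowing exists.

```ld
MEMORY
{
  RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
  RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
}

SECTIONS
{
  /* Two separate patterns: every .data input section comes first,
     followed by every .data.* input section. */
  .raml : { *(.data) *(.data.*) } > RAML

  /* One combined pattern: .data and .data.* input sections are
     intermingled, in the order the linker finds them in its input. */
  .ramu : { *(.data .data.*) } > RAMU
}
```

Keeping one output section per region is what leaves room for this kind of copy/edit/paste difference between regions.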
* Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
  2017-03-03 10:27 ` Erik Christiansen
@ 2017-03-07 11:06 ` Tejas Belagod
  0 siblings, 0 replies; 17+ messages in thread
From: Tejas Belagod @ 2017-03-07 11:06 UTC
To: dvalin; +Cc: Thomas Preudhomme, binutils

On 03/03/17 10:27, Erik Christiansen wrote:
> On 02.03.17 15:40, Tejas Belagod wrote:
>> On 02/03/17 04:32, Erik Christiansen wrote:
>>> I am led to wonder if it might not be less work to merely tweak ld to
>>> look for subsequent matching wildcard patterns in following output
>>> sections before issuing a region overflow error. I.e. ld merely
>>> redefines "first match" if a subsequent one is available when needed.
>>> That seems less intervention than adding new syntax to the script
>>> interpreter, and then grafting on the new capability.
>>>
>>> The overflowing input section needs to remain in the input queue during
>>> the output section bump, to complete its "go-around" on failed landing
>>> approach.
>>>
>>
>> It does seem like an interesting idea. Two things immediately spring to mind.
>>
>> 1. Will it break existing code?
>
> That's perhaps the most important question. At present any input section
> pattern repetitions in the linker script would only be nonfunctional
> baggage. They would only occur as harmless errors, disregarded by ld,
> through its "first match" policy. Adding a command-line option to enable
> flowing would however be a useful safeguard.
>
>> 2. How do we honor any ordering specified? For eg. If the above spec means
>> that raml will have .data first and .data.* later .ramu is expected to start
>> with .data sections, will this break the assumption if a .data.* jumps into
>> .ramu and starts the region with it?
>
> Re-using the existing code, an input section would not just fall over
> the edge to "start the region". Whether an input section is read from
> ld's input, or redirected from an overflowing output section makes no
> difference while the input section remains in the input queue,
> unallocated. On failing to land in the full output section, it needs to
> be redirected to a "second match" in a subsequent output section if
> provided, else the pending (existing code) overflow error comes to
> fruition. The existing allocation code (being unmodified) then continues
> to distribute the input section according to existing pattern matching
> behaviour, but using the "second match".
>
> The ordering of input sections into output sections is set out in ld
> info. The difference between "*(.text .rdata)" and "*(.text) *(.rdata)"
> is described in "3.6.4.1 Input Section Basics".
>
> Thus, if the user wants .ramu and .raml to have identical .data vs
> .data.* order, then it'll be copy/paste. But if a difference is desired,
> then copy/edit/paste is equally available. It was when one output
> section had to "> RAML, RAMU, RAMZ", that region-specific control over
> ordering was lost.
>
> It is not suggested to change any code other than to interpose rebasing
> of pattern allocation before erroring on output section overflow. If at
> that point, we look for "second match" wildcard patterns in subsequent
> output sections, then as each input section is read from ld's input, it
> will be allocated to the next output section with matching patterns -
> using the existing allocation code, influenced only to the extent of
> replacing the "first match" patterns from the full output section with
> subsequent substitutes.
>

Ah, yes! That makes a lot of sense. Thanks for clearing that up.

>>> One significant advantage of this approach is that part of the
>>> established practice, i.e. constraining certain input sections to low,
>>> middle, or high RAM regions, remains both straightforward and explicit.
>>> If multiple output sections are directed to a region, even finer
>>> constraint is possible _simultaneous_ with inter-region flowing on
>>> overflow. On the other hand, what would happen if multiple "> RAML,
>>> RAMU, RAMZ" were aimed at these regions in an attempt to enforce a
>>> paging or proximity constraint while flowing?
>>>
>>
>> I'm not sure I understand this question.
>
> My word picture was a bit fuzzy, I must admit. The minimalist tweak
> without syntax extension is capable of constraining some input sections
> at the same time as flowing others. Input sections which need to be in
> low memory are made to match a wildcard pattern (or explicit file list)
> which is placed only in the first output section. Only input sections
> which match patterns in subsequent output sections can flow. The
> mechanism thus sorts sheep from goats, while flowing. That is very
> useful, and should be present in any implementation of flowing, I think.
>
> There would undoubtedly be some real effort involved in tweaking ld to
> rebase input pattern "first match" on output section overflow - where a
> subsequent match is available. Whether that would best be done as a
> "second match" search when needed, or replacing "first match" with a
> list of matches at the outset, remains to be seen. The difference
> between theory and practice always looks smaller from this side.
>

I like the approach you've proposed. I admit it is more practical than
extending the syntax for more regions. But I see two disadvantages that
are more cosmetic than anything else:

1. Is the duplication of patterns over multiple output section regions
as expressive of the intent as using '> REGION1, REGION2,..., REGIONX'?
Though you could argue that if the subsequent-match flowing feature is
controlled by a command-line switch, the user knows what they're doing
and the intention would be implicit.

2. If we have complex patterns matching input sections/filenames,
duplicating them across multiple output section statements might be
prone to copy-paste errors. Keeping them consistent after changes means
diligently replicating them everywhere, which adds to maintenance
overhead.

I agree that replacing the first-match rule with a subsequent-match rule
controlled by a command-line switch has a much lower implementation
cost. It will be interesting to hear a maintainer's views on the
preferred approach.

Thanks,
Tejas.

> Erik
end of thread, other threads: [~2019-07-24 12:48 UTC]

Thread overview: 17+ messages
2017-03-09 12:06 [RFC] Allow linker scripts to specify multiple output regions for an output section? Erik Christiansen
2017-06-09 12:21 ` Tejas Belagod
2017-06-09 13:35 ` Erik Christiansen
2019-06-27 12:58 ` Christophe Lyon
2019-07-02 6:49 ` Erik Christiansen
2019-07-11 8:42 ` Christophe Lyon
2019-07-24 7:28 ` Nick Clifton
2019-07-24 9:18 ` Simon Richter
2019-07-24 12:48 ` Erik Christiansen
-- strict thread matches above, loose matches on Subject: below --
2017-02-22 15:28 Thomas Preudhomme
2017-02-27 10:27 ` Tejas Belagod
2017-02-28 5:52 ` Erik Christiansen
2017-02-28 12:11 ` Tejas Belagod
2017-03-01 7:12 ` Erik Christiansen
2017-03-02 4:32 ` Erik Christiansen
[not found] ` <58B83CDA.5050000@foss.arm.com>
2017-03-03 10:27 ` Erik Christiansen
2017-03-07 11:06 ` Tejas Belagod