* [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space @ 2017-07-27 12:29 Georg-Johann Lay 2017-07-27 12:34 ` Richard Biener ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Georg-Johann Lay @ 2017-07-27 12:29 UTC (permalink / raw) To: gcc-patches; +Cc: Denis Chertykov For some targets, the best place to put read-only lookup tables as generated by -ftree-switch-conversion is not the generic address space but some target specific address space. This is the case for AVR, where for most devices .rodata must be located in RAM. Part #1 adds a new, optional target hook that queries the back-end about the desired address space. There is currently only one user: tree-switch-conversion.c::build_one_array() which re-builds value_type and array_type if the address space returned by the backend is not the generic one. Part #2 is the AVR part that implements the new hook and adds some sugar around it. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space 2017-07-27 12:29 [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space Georg-Johann Lay @ 2017-07-27 12:34 ` Richard Biener 2017-07-27 13:32 ` Georg-Johann Lay 2017-07-27 12:41 ` [patch 1/2] " Georg-Johann Lay 2017-07-27 12:50 ` [patch 2/2,avr] " Georg-Johann Lay 2 siblings, 1 reply; 13+ messages in thread From: Richard Biener @ 2017-07-27 12:34 UTC (permalink / raw) To: Georg-Johann Lay; +Cc: gcc-patches, Denis Chertykov On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote: > For some targets, the best place to put read-only lookup tables as > generated by -ftree-switch-conversion is not the generic address space > but some target specific address space. > > This is the case for AVR, where for most devices .rodata must be > located in RAM. > > Part #1 adds a new, optional target hook that queries the back-end > about the desired address space. There is currently only one user: > tree-switch-conversion.c::build_one_array() which re-builds value_type > and array_type if the address space returned by the backend is not > the generic one. > > Part #2 is the AVR part that implements the new hook and adds some > sugar around it. Given that switch-conversion just creates a constant initializer doesn't AVR benefit from handling those uniformly (at RTL expansion?). Not sure but I think it goes through the regular constant pool handling. Richard. > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space 2017-07-27 12:34 ` Richard Biener @ 2017-07-27 13:32 ` Georg-Johann Lay 2017-07-28 7:34 ` Richard Biener 0 siblings, 1 reply; 13+ messages in thread From: Georg-Johann Lay @ 2017-07-27 13:32 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, Denis Chertykov On 27.07.2017 14:34, Richard Biener wrote: > On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >> For some targets, the best place to put read-only lookup tables as >> generated by -ftree-switch-conversion is not the generic address space >> but some target specific address space. >> >> This is the case for AVR, where for most devices .rodata must be >> located in RAM. >> >> Part #1 adds a new, optional target hook that queries the back-end >> about the desired address space. There is currently only one user: >> tree-switch-conversion.c::build_one_array() which re-builds value_type >> and array_type if the address space returned by the backend is not >> the generic one. >> >> Part #2 is the AVR part that implements the new hook and adds some >> sugar around it. > > Given that switch-conversion just creates a constant initializer doesn't AVR > benefit from handling those uniformly (at RTL expansion?). Not sure but > I think it goes through the regular constant pool handling. > > Richard. avr doesn't use constant pools because they are not profitable for several reasons. Moreover, it's not sufficient to just patch the pool's section, you'd also have to patch *all* accesses to also use the exact same address space so that they use the correct instruction (LPM instead of LD). Loop optimization, for example, may move addr-space pointers out of loops and actually does this for some CSWTCH tables. I didn't look into pool handling, but don't expect it allows to consistently patch all accesses in the aftermath. *If* that's possible, then it should also be possible to patch vtables and all of their accesses, aka. https://gcc.gnu.org/PR43745 e.g. in a target-specific tree pass? With vtables it's basically the same problem, you'd have to patch *all* accesses to match the vtable's address-space. c++ need not to expose address-spaces to the application, handling address-spaces internally would be sufficient. Johann ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space 2017-07-27 13:32 ` Georg-Johann Lay @ 2017-07-28 7:34 ` Richard Biener 2017-07-28 10:18 ` Georg-Johann Lay 2017-08-16 14:29 ` Georg-Johann Lay 0 siblings, 2 replies; 13+ messages in thread From: Richard Biener @ 2017-07-28 7:34 UTC (permalink / raw) To: Georg-Johann Lay; +Cc: gcc-patches, Denis Chertykov On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote: > On 27.07.2017 14:34, Richard Biener wrote: >> >> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >>> >>> For some targets, the best place to put read-only lookup tables as >>> generated by -ftree-switch-conversion is not the generic address space >>> but some target specific address space. >>> >>> This is the case for AVR, where for most devices .rodata must be >>> located in RAM. >>> >>> Part #1 adds a new, optional target hook that queries the back-end >>> about the desired address space. There is currently only one user: >>> tree-switch-conversion.c::build_one_array() which re-builds value_type >>> and array_type if the address space returned by the backend is not >>> the generic one. >>> >>> Part #2 is the AVR part that implements the new hook and adds some >>> sugar around it. >> >> >> Given that switch-conversion just creates a constant initializer doesn't >> AVR >> benefit from handling those uniformly (at RTL expansion?). Not sure but >> I think it goes through the regular constant pool handling. >> >> Richard. > > > avr doesn't use constant pools because they are not profitable for > several reasons. > > Moreover, it's not sufficient to just patch the pool's section, you'd > also have to patch *all* accesses to also use the exact same address > space so that they use the correct instruction (LPM instead of LD). Sure. So then not handle it in constant pool handling but in expanding the initializers of global vars. I think the entry for this is varasm.c:assemble_variable - that sets DECL_RTL for the variable. > Loop optimization, for example, may move addr-space pointers out of > loops and actually does this for some CSWTCH tables. I didn't look > into pool handling, but don't expect it allows to consistently > patch all accesses in the aftermath. > > *If* that's possible, then it should also be possible to patch vtables > and all of their accesses, aka. > > https://gcc.gnu.org/PR43745 > > e.g. in a target-specific tree pass? Ah, ok. I see what you mean. Once the address of a global var is taken the pointers refering to it have to use the proper address-space. So yeah, it looks like this would need to be done in a (target specific) pass, probably an IPA pass in case the variables are used from multiple functions. > With vtables it's basically the same problem, you'd have to patch *all* > accesses to match the vtable's address-space. c++ need not to expose > address-spaces to the application, handling address-spaces internally > would be sufficient. The IPA pass should be able to handle this transparently (vtables are just global decls with initializers). Of course the question is whether the linker can handle relocations between address-spaces for example? As of the patch I think it is too specific (switch-conversion only) for my taste. What you can do I think is in assemble_variable handle the case of ! TREE_PUBLIC && ! TREE_ADDRESSABLE -- non-exported, non-address taken variables can be put into a different address-space transparently by adjusting their type to reflect the changed address-space. This should catch the switch-conversion case that didn't run into the loop case. It won't fix vtables I think because they are always exported unless you use LTO, they also end up address-taken I think. For vtables (a bigger chunk than switch-conversion decls) adjusting things in the C++ FE is probably a good idea if it helps AVR. Richard. > Johann ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space 2017-07-28 7:34 ` Richard Biener @ 2017-07-28 10:18 ` Georg-Johann Lay 2017-07-28 11:15 ` Richard Biener 2017-08-16 14:29 ` Georg-Johann Lay 1 sibling, 1 reply; 13+ messages in thread From: Georg-Johann Lay @ 2017-07-28 10:18 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, Denis Chertykov Richard Biener schrieb: > On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >> On 27.07.2017 14:34, Richard Biener wrote: >>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >>>> For some targets, the best place to put read-only lookup tables as >>>> generated by -ftree-switch-conversion is not the generic address space >>>> but some target specific address space. >>>> >>>> This is the case for AVR, where for most devices .rodata must be >>>> located in RAM. >>>> >>>> Part #1 adds a new, optional target hook that queries the back-end >>>> about the desired address space. There is currently only one user: >>>> tree-switch-conversion.c::build_one_array() which re-builds value_type >>>> and array_type if the address space returned by the backend is not >>>> the generic one. >>>> >>>> Part #2 is the AVR part that implements the new hook and adds some >>>> sugar around it. >>> >>> Given that switch-conversion just creates a constant initializer doesn't >>> AVR >>> benefit from handling those uniformly (at RTL expansion?). Not sure but >>> I think it goes through the regular constant pool handling. >>> >>> Richard. >> >> avr doesn't use constant pools because they are not profitable for >> several reasons. >> >> Moreover, it's not sufficient to just patch the pool's section, you'd >> also have to patch *all* accesses to also use the exact same address >> space so that they use the correct instruction (LPM instead of LD). > > Sure. So then not handle it in constant pool handling but in expanding > the initializers of global vars. I think the entry for this is > varasm.c:assemble_variable - that sets DECL_RTL for the variable. Expand and RTL is too late. The CSWTCH tables are created by some tree pass, and it also generated accesses which use the address-space of the element types. If any tree variable or copy thereof uses generic space, then you will get wrong code. In the case of CSWTCH, you must get the AS into value_type as noted by Steven https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49857#c3 >> Loop optimization, for example, may move addr-space pointers out of >> loops and actually does this for some CSWTCH tables. I didn't look >> into pool handling, but don't expect it allows to consistently >> patch all accesses in the aftermath. >> >> *If* that's possible, then it should also be possible to patch vtables >> and all of their accesses, aka. >> >> https://gcc.gnu.org/PR43745 >> >> e.g. in a target-specific tree pass? > > Ah, ok. I see what you mean. Once the address of a global var is > taken the pointers refering to it have to use the proper address-space. Yes, and all copies of these pointers etc. generated so far. That's the point of address spaces. The binary representation of a pointer may be exact the same for different ASes, but still accesses have to be handled differently: Different legitimate addresses, different instructions to access through these pointers, etc. Just converting a pointer to a different AS will give you wrong code. > So yeah, it looks like this would need to be done in a (target specific) > pass, probably an IPA pass in case the variables are used from multiple > functions. It would at least be an ABI change. If different modules are accessing the vtable, then they must use the same AS. Having a copy of the vtable in each module is pointless: If these modules don't agree about the AS and host vtables in different ASes, we lost from the optimization point: one module would have it in flash (consuming flash) and a different module would have it in RAM (consumes RAM and flash to initialize that RAM at startup). So this would not use 1x flash as intended, and not 1x flash + 1x RAM like in the non optimized version, but 2x flash + 1x RAM. Some comdat won't help because that gives wrong code. >> With vtables it's basically the same problem, you'd have to patch *all* >> accesses to match the vtable's address-space. c++ need not to expose >> address-spaces to the application, handling address-spaces internally >> would be sufficient. > > The IPA pass should be able to handle this transparently (vtables are just > global decls with initializers). Global is bad. If the address of the object escapes, that's be wrong code because location + all accesses must agree about the AS. > Of course the question is whether the linker can handle relocations > between address-spaces for example? What's the linker to do with it? Maybe there's confusion about what I mean with "address space". I mean the named address spaces as of http://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html according to ISO/IEC DTR 18037. It's not in the C-standard and only supported by gcc as a GNU extension. Suppose the following GNU-C extern const char c; char readc (const char *pc) { return c + *pc; } The generic AS allows direct addressing and addressing via X=r26 for example: readc: ; (set (reg:HI 26) ; (reg:HI 24)) movw r26,r24 ; 18 *movhi/1 ; (set (reg:QI 25) ; (mem:QI (reg:HI 26) [S1 A8])) ld r25,X ; 8 movqi_insn/4 ; (set (reg:QI 24) ; (mem:QI (symbol_ref:HI ("c")) [S1 A8])) lds r24,c ; 9 movqi_insn/4 ; (set (reg:QI 24) ; (plus:QI (reg:QI 24) ; (reg:QI 25))) add r24,r25 ; 15 addqi3/1 ret This locates c in RAM and *pc must also be located in RAM which is a waste. Now use AS __flash, i.e. non-volatile memory: extern const __flash char c; char readc (const __flash char *pc) { return c + *pc; } readc: ; (set (reg:HI 30) ; (reg:HI 24)) movw r30,r24 ; 20 *movhi/1 ; (set (reg:QI 25) ; (mem:QI (reg:HI 30) [S1 A8 AS1])) lpm r25,Z ; 7 movqi_insn/4 ; (set (reg:HI 30) ; (symbol_ref:HI ("c"))) ldi r30,lo8(c) ; 17 *movhi/5 ldi r31,hi8(c) ; (set (reg:QI 24) ; (mem:QI (reg:HI 30) [S1 A8 AS1])) lpm r24,Z ; 8 movqi_insn/4 ; (set (reg:QI 24) ; (plus:QI (reg:QI 24) ; (reg:QI 25))) add r24,r25 ; 14 addqi3/1 ret And access to __flash must use instruction LPM *,Z. Direct addressing or using X or Y register are not allowed. Using Z+const addressing is not allowed. Using LD or LDS on the same binary address will lead to wrong code, for example will read from RAM address 0x42 instead of from flash address 0x42. How could the linker fix an address to generic AS to one that goes to non-generic AS? The linker doesn't even have that information. > As of the patch I think it is too specific (switch-conversion only) > for my taste. It's actually not restricted to switch-conversion. It can be used for any object provided 1) The object is in static storage and located in that AS. 2) All accesses to that object use that AS. 3) The object is read-only, i.e. not modified after load-time. > What you can do I think is in assemble_variable handle the case of > ! TREE_PUBLIC && ! TREE_ADDRESSABLE -- non-exported, non-address > taken variables can be put into a different address-space transparently > by adjusting their type to reflect the changed address-space. /* Assemble everything that is needed for a variable or function declaration. Not used for automatic variables, and not used for function definitions. Should not be called for variables of incomplete structure type. TOP_LEVEL is nonzero if this variable has file scope. AT_END is nonzero if this is the special handling, at end of compilation, to define things that have had only tentative definitions. DONT_OUTPUT_DATA if nonzero means don't actually output the initial value (that will be done by the caller). */ void assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED, int at_end ATTRIBUTE_UNUSED, int dont_output_data) { So I can patch TYPE_ADDR_SPACE of /decl/ and this will by magic diffuse to all places that access it? Wow! > This should catch the switch-conversion case that didn't run into the > loop case. It won't fix vtables I think because they are always exported > unless you use LTO, they also end up address-taken I think. > > For vtables (a bigger chunk than switch-conversion decls) adjusting > things in the C++ FE is probably a good idea if it helps AVR. > > Richard. It don't think the C++ FE has an understanding of ASes at all. So even if it doesn't ICE or error, it's likely non-generic ASes are not preserved / respected during FE transformations. For now I'd say that when you use C+ +/ vtables on such a µC one must live with the consequences. Johann ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space 2017-07-28 10:18 ` Georg-Johann Lay @ 2017-07-28 11:15 ` Richard Biener 2017-07-28 18:16 ` Georg-Johann Lay 0 siblings, 1 reply; 13+ messages in thread From: Richard Biener @ 2017-07-28 11:15 UTC (permalink / raw) To: Georg-Johann Lay; +Cc: gcc-patches, Denis Chertykov On Fri, Jul 28, 2017 at 12:18 PM, Georg-Johann Lay <avr@gjlay.de> wrote: > Richard Biener schrieb: > >> On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >>> >>> On 27.07.2017 14:34, Richard Biener wrote: >>>> >>>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >>>>> >>>>> For some targets, the best place to put read-only lookup tables as >>>>> generated by -ftree-switch-conversion is not the generic address space >>>>> but some target specific address space. >>>>> >>>>> This is the case for AVR, where for most devices .rodata must be >>>>> located in RAM. >>>>> >>>>> Part #1 adds a new, optional target hook that queries the back-end >>>>> about the desired address space. There is currently only one user: >>>>> tree-switch-conversion.c::build_one_array() which re-builds value_type >>>>> and array_type if the address space returned by the backend is not >>>>> the generic one. >>>>> >>>>> Part #2 is the AVR part that implements the new hook and adds some >>>>> sugar around it. >>>> >>>> >>>> Given that switch-conversion just creates a constant initializer doesn't >>>> AVR >>>> benefit from handling those uniformly (at RTL expansion?). Not sure but >>>> I think it goes through the regular constant pool handling. >>>> >>>> Richard. >>> >>> >>> avr doesn't use constant pools because they are not profitable for >>> several reasons. >>> >>> Moreover, it's not sufficient to just patch the pool's section, you'd >>> also have to patch *all* accesses to also use the exact same address >>> space so that they use the correct instruction (LPM instead of LD). >> >> >> Sure. So then not handle it in constant pool handling but in expanding >> the initializers of global vars. I think the entry for this is >> varasm.c:assemble_variable - that sets DECL_RTL for the variable. > > > Expand and RTL is too late. The CSWTCH tables are created by some tree > pass, and it also generated accesses which use the address-space of the > element types. If any tree variable or copy thereof uses generic space, > then you will get wrong code. In the case of CSWTCH, you must get the > AS into value_type as noted by Steven > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49857#c3 > >>> Loop optimization, for example, may move addr-space pointers out of >>> loops and actually does this for some CSWTCH tables. I didn't look >>> into pool handling, but don't expect it allows to consistently >>> patch all accesses in the aftermath. >>> >>> *If* that's possible, then it should also be possible to patch vtables >>> and all of their accesses, aka. >>> >>> https://gcc.gnu.org/PR43745 >>> >>> e.g. in a target-specific tree pass? >> >> >> Ah, ok. I see what you mean. Once the address of a global var is >> taken the pointers refering to it have to use the proper address-space. > > > Yes, and all copies of these pointers etc. generated so far. That's the > point of address spaces. The binary representation of a pointer may > be exact the same for different ASes, but still accesses have to be > handled differently: Different legitimate addresses, different > instructions to access through these pointers, etc. Just converting > a pointer to a different AS will give you wrong code. > >> So yeah, it looks like this would need to be done in a (target specific) >> pass, probably an IPA pass in case the variables are used from multiple >> functions. > > > It would at least be an ABI change. If different modules are accessing > the vtable, then they must use the same AS. Having a copy of the vtable > in each module is pointless: If these modules don't agree about the AS > and host vtables in different ASes, we lost from the optimization point: > one module would have it in flash (consuming flash) and a different > module would have it in RAM (consumes RAM and flash to initialize that > RAM at startup). So this would not use 1x flash as intended, and not > 1x flash + 1x RAM like in the non optimized version, but 2x flash + > 1x RAM. Some comdat won't help because that gives wrong code. > >>> With vtables it's basically the same problem, you'd have to patch *all* >>> accesses to match the vtable's address-space. c++ need not to expose >>> address-spaces to the application, handling address-spaces internally >>> would be sufficient. >> >> >> The IPA pass should be able to handle this transparently (vtables are just >> global decls with initializers). > > > Global is bad. If the address of the object escapes, that's be wrong > code because location + all accesses must agree about the AS. > >> Of course the question is whether the linker can handle relocations >> between address-spaces for example? > > > What's the linker to do with it? Maybe there's confusion about what > I mean with "address space". I mean the named address spaces as of > > http://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html > > according to ISO/IEC DTR 18037. It's not in the C-standard and only > supported by gcc as a GNU extension. Suppose the following GNU-C > > extern const char c; > > char readc (const char *pc) > { > return c + *pc; > } > > The generic AS allows direct addressing and addressing via X=r26 for > example: > > readc: > ; (set (reg:HI 26) > ; (reg:HI 24)) > movw r26,r24 ; 18 *movhi/1 > ; (set (reg:QI 25) > ; (mem:QI (reg:HI 26) [S1 A8])) > ld r25,X ; 8 movqi_insn/4 > ; (set (reg:QI 24) > ; (mem:QI (symbol_ref:HI ("c")) [S1 A8])) > lds r24,c ; 9 movqi_insn/4 > ; (set (reg:QI 24) > ; (plus:QI (reg:QI 24) > ; (reg:QI 25))) > add r24,r25 ; 15 addqi3/1 > ret > > This locates c in RAM and *pc must also be located in RAM which > is a waste. Now use AS __flash, i.e. non-volatile memory: > > extern const __flash char c; > > char readc (const __flash char *pc) > { > return c + *pc; > } > > readc: > ; (set (reg:HI 30) > ; (reg:HI 24)) > movw r30,r24 ; 20 *movhi/1 > ; (set (reg:QI 25) > ; (mem:QI (reg:HI 30) [S1 A8 AS1])) > lpm r25,Z ; 7 movqi_insn/4 > ; (set (reg:HI 30) > ; (symbol_ref:HI ("c"))) > ldi r30,lo8(c) ; 17 *movhi/5 > ldi r31,hi8(c) > ; (set (reg:QI 24) > ; (mem:QI (reg:HI 30) [S1 A8 AS1])) > lpm r24,Z ; 8 movqi_insn/4 > ; (set (reg:QI 24) > ; (plus:QI (reg:QI 24) > ; (reg:QI 25))) > add r24,r25 ; 14 addqi3/1 > ret > > And access to __flash must use instruction LPM *,Z. > Direct addressing or using X or Y register are not allowed. > Using Z+const addressing is not allowed. Using LD or LDS on > the same binary address will lead to wrong code, for example will > read from RAM address 0x42 instead of from flash address 0x42. > > How could the linker fix an address to generic AS to one that > goes to non-generic AS? The linker doesn't even have that > information. > > >> As of the patch I think it is too specific (switch-conversion only) >> for my taste. > > > It's actually not restricted to switch-conversion. It can be used for > any object provided > > 1) The object is in static storage and located in that AS. > > 2) All accesses to that object use that AS. > > 3) The object is read-only, i.e. not modified after load-time. > >> What you can do I think is in assemble_variable handle the case of >> ! TREE_PUBLIC && ! TREE_ADDRESSABLE -- non-exported, non-address >> taken variables can be put into a different address-space transparently >> by adjusting their type to reflect the changed address-space. > > > /* Assemble everything that is needed for a variable or function > declaration. > Not used for automatic variables, and not used for function definitions. > Should not be called for variables of incomplete structure type. > > TOP_LEVEL is nonzero if this variable has file scope. > AT_END is nonzero if this is the special handling, at end of compilation, > to define things that have had only tentative definitions. > DONT_OUTPUT_DATA if nonzero means don't actually output the > initial value (that will be done by the caller). */ > > void > assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED, > int at_end ATTRIBUTE_UNUSED, int dont_output_data) > { > > So I can patch TYPE_ADDR_SPACE of /decl/ and this will by magic > diffuse to all places that access it? Wow! For the ! TREE_ADDRESSABLE case yes, I think so (of course you need to double-check...). ! TREE_PUBLIC is because otherwise you're not seeing all accesses and thus TREE_ADDRESSABLE cannot be trusted. TREE_STATIC should be always set in this function. >> This should catch the switch-conversion case that didn't run into the >> loop case. It won't fix vtables I think because they are always exported >> unless you use LTO, they also end up address-taken I think. >> >> For vtables (a bigger chunk than switch-conversion decls) adjusting >> things in the C++ FE is probably a good idea if it helps AVR. >> >> Richard. > > > It don't think the C++ FE has an understanding of ASes at all. So even > if it doesn't ICE or error, it's likely non-generic ASes are not > preserved / respected during FE transformations. For now I'd say > that when you use C+ +/ vtables on such a µC one must live with the > consequences. Ah - I totally forgot C++ doesn't handle address-spaces... Richard. > Johann ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space 2017-07-28 11:15 ` Richard Biener @ 2017-07-28 18:16 ` Georg-Johann Lay 0 siblings, 0 replies; 13+ messages in thread From: Georg-Johann Lay @ 2017-07-28 18:16 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, Denis Chertykov Richard Biener schrieb: > On Fri, Jul 28, 2017 at 12:18 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >> Richard Biener schrieb: >> >>> On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >>>> On 27.07.2017 14:34, Richard Biener wrote: >>>>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >>>>>> For some targets, the best place to put read-only lookup tables as >>>>>> generated by -ftree-switch-conversion is not the generic address space >>>>>> but some target specific address space. >>>>>> >>>>>> This is the case for AVR, where for most devices .rodata must be >>>>>> located in RAM. >>>>>> >>>>>> Part #1 adds a new, optional target hook that queries the back-end >>>>>> about the desired address space. There is currently only one user: >>>>>> tree-switch-conversion.c::build_one_array() which re-builds value_type >>>>>> and array_type if the address space returned by the backend is not >>>>>> the generic one. >>>>>> >>>>>> Part #2 is the AVR part that implements the new hook and adds some >>>>>> sugar around it. >>>>> >>>>> Given that switch-conversion just creates a constant initializer doesn't >>>>> AVR >>>>> benefit from handling those uniformly (at RTL expansion?). Not sure but >>>>> I think it goes through the regular constant pool handling. >>>>> >>>>> Richard. >>>> >>>> avr doesn't use constant pools because they are not profitable for >>>> several reasons. >>>> >>>> Moreover, it's not sufficient to just patch the pool's section, you'd >>>> also have to patch *all* accesses to also use the exact same address >>>> space so that they use the correct instruction (LPM instead of LD). >>> >>> Sure. So then not handle it in constant pool handling but in expanding >>> the initializers of global vars. I think the entry for this is >>> varasm.c:assemble_variable - that sets DECL_RTL for the variable. >> >> Expand and RTL is too late. The CSWTCH tables are created by some tree >> pass, and it also generated accesses which use the address-space of the >> element types. If any tree variable or copy thereof uses generic space, >> then you will get wrong code. In the case of CSWTCH, you must get the >> AS into value_type as noted by Steven >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49857#c3 >> >>>> Loop optimization, for example, may move addr-space pointers out of >>>> loops and actually does this for some CSWTCH tables. I didn't look >>>> into pool handling, but don't expect it allows to consistently >>>> patch all accesses in the aftermath. >>>> >>>> *If* that's possible, then it should also be possible to patch vtables >>>> and all of their accesses, aka. >>>> >>>> https://gcc.gnu.org/PR43745 >>>> >>>> e.g. in a target-specific tree pass? >>> >>> Ah, ok. I see what you mean. Once the address of a global var is >>> taken the pointers refering to it have to use the proper address-space. >> >> Yes, and all copies of these pointers etc. generated so far. That's the >> point of address spaces. The binary representation of a pointer may >> be exact the same for different ASes, but still accesses have to be >> handled differently: Different legitimate addresses, different >> instructions to access through these pointers, etc. Just converting >> a pointer to a different AS will give you wrong code. >> >>> So yeah, it looks like this would need to be done in a (target specific) >>> pass, probably an IPA pass in case the variables are used from multiple >>> functions. >> >> It would at least be an ABI change. If different modules are accessing >> the vtable, then they must use the same AS. Having a copy of the vtable >> in each module is pointless: If these modules don't agree about the AS >> and host vtables in different ASes, we lost from the optimization point: >> one module would have it in flash (consuming flash) and a different >> module would have it in RAM (consumes RAM and flash to initialize that >> RAM at startup). So this would not use 1x flash as intended, and not >> 1x flash + 1x RAM like in the non optimized version, but 2x flash + >> 1x RAM. Some comdat won't help because that gives wrong code. >> >>>> With vtables it's basically the same problem, you'd have to patch *all* >>>> accesses to match the vtable's address-space. c++ need not to expose >>>> address-spaces to the application, handling address-spaces internally >>>> would be sufficient. >>> >>> The IPA pass should be able to handle this transparently (vtables are just >>> global decls with initializers). >> >> Global is bad. If the address of the object escapes, that's be wrong >> code because location + all accesses must agree about the AS. >> >>> Of course the question is whether the linker can handle relocations >>> between address-spaces for example? >> >> What's the linker to do with it? Maybe there's confusion about what >> I mean with "address space". I mean the named address spaces as of >> >> http://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html >> >> according to ISO/IEC DTR 18037. It's not in the C-standard and only >> supported by gcc as a GNU extension. Suppose the following GNU-C >> >> extern const char c; >> >> char readc (const char *pc) >> { >> return c + *pc; >> } >> >> The generic AS allows direct addressing and addressing via X=r26 for >> example: >> >> readc: >> ; (set (reg:HI 26) >> ; (reg:HI 24)) >> movw r26,r24 ; 18 *movhi/1 >> ; (set (reg:QI 25) >> ; (mem:QI (reg:HI 26) [S1 A8])) >> ld r25,X ; 8 movqi_insn/4 >> ; (set (reg:QI 24) >> ; (mem:QI (symbol_ref:HI ("c")) [S1 A8])) >> lds r24,c ; 9 movqi_insn/4 >> ; (set (reg:QI 24) >> ; (plus:QI (reg:QI 24) >> ; (reg:QI 25))) >> add r24,r25 ; 15 addqi3/1 >> ret >> >> This locates c in RAM and *pc must also be located in RAM which >> is a waste. Now use AS __flash, i.e. non-volatile memory: >> >> extern const __flash char c; >> >> char readc (const __flash char *pc) >> { >> return c + *pc; >> } >> >> readc: >> ; (set (reg:HI 30) >> ; (reg:HI 24)) >> movw r30,r24 ; 20 *movhi/1 >> ; (set (reg:QI 25) >> ; (mem:QI (reg:HI 30) [S1 A8 AS1])) >> lpm r25,Z ; 7 movqi_insn/4 >> ; (set (reg:HI 30) >> ; (symbol_ref:HI ("c"))) >> ldi r30,lo8(c) ; 17 *movhi/5 >> ldi r31,hi8(c) >> ; (set (reg:QI 24) >> ; (mem:QI (reg:HI 30) [S1 A8 AS1])) >> lpm r24,Z ; 8 movqi_insn/4 >> ; (set (reg:QI 24) >> ; (plus:QI (reg:QI 24) >> ; (reg:QI 25))) >> add r24,r25 ; 14 addqi3/1 >> ret >> >> And access to __flash must use instruction LPM *,Z. >> Direct addressing or using X or Y register are not allowed. >> Using Z+const addressing is not allowed. Using LD or LDS on >> the same binary address will lead to wrong code, for example will >> read from RAM address 0x42 instead of from flash address 0x42. >> >> How could the linker fix an address to generic AS to one that >> goes to non-generic AS? The linker doesn't even have that >> information. >> >> >>> As of the patch I think it is too specific (switch-conversion only) >>> for my taste. >> >> It's actually not restricted to switch-conversion. It can be used for >> any object provided >> >> 1) The object is in static storage and located in that AS. >> >> 2) All accesses to that object use that AS. >> >> 3) The object is read-only, i.e. not modified after load-time. >> >>> What you can do I think is in assemble_variable handle the case of >>> ! TREE_PUBLIC && ! TREE_ADDRESSABLE -- non-exported, non-address >>> taken variables can be put into a different address-space transparently >>> by adjusting their type to reflect the changed address-space. >> >> /* Assemble everything that is needed for a variable or function >> declaration. >> Not used for automatic variables, and not used for function definitions. >> Should not be called for variables of incomplete structure type. >> >> TOP_LEVEL is nonzero if this variable has file scope. >> AT_END is nonzero if this is the special handling, at end of compilation, >> to define things that have had only tentative definitions. >> DONT_OUTPUT_DATA if nonzero means don't actually output the >> initial value (that will be done by the caller). */ >> >> void >> assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED, >> int at_end ATTRIBUTE_UNUSED, int dont_output_data) >> { >> >> So I can patch TYPE_ADDR_SPACE of /decl/ and this will by magic >> diffuse to all places that access it? Wow! > > For the ! TREE_ADDRESSABLE case yes, I think so (of course you > need to double-check...). ! TREE_PUBLIC is because otherwise > you're not seeing all accesses and thus TREE_ADDRESSABLE cannot > be trusted. TREE_STATIC should be always set in this function. I don't think using this is safe because you don't have control over all accesses. For example, stuff might be accessed by means of inline asm. This is not the case for CSWTCH because it is not exposed to the user's code and user cannot refer to the symbol. So it must also be DECL_ARTIFICIAL. If the user generated an object and has a symbol, she may expect that the object is in the address space she assigned (and use asm code as appropriate for that AS). Using a different AS than specified by the user may be considered a bug. If a user wants stuff to reside in flash she can do so by adjusting her code and definitions, but this is not the case for CSWTCH. The only thing she can do is -fno-tree-switch-conversion or -fno-jump-tables to avoid waste of RAM. And when assemble_variable is called from the C++ FE then changing AS might nuke the compiler, so you'd have to approve ugly testing for language FE... >>> This should catch the switch-conversion case that didn't run into the >>> loop case. It won't fix vtables I think because they are always exported >>> unless you use LTO, they also end up address-taken I think. >>> >>> For vtables (a bigger chunk than switch-conversion decls) adjusting >>> things in the C++ FE is probably a good idea if it helps AVR. >>> >>> Richard. >> >> It don't think the C++ FE has an understanding of ASes at all. So even >> if it doesn't ICE or error, it's likely non-generic ASes are not >> preserved / respected during FE transformations. For now I'd say >> that when you use C+ +/ vtables on such a µC one must live with the >> consequences. > > Ah - I totally forgot C++ doesn't handle address-spaces... And never will. Johann ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space 2017-07-28 7:34 ` Richard Biener 2017-07-28 10:18 ` Georg-Johann Lay @ 2017-08-16 14:29 ` Georg-Johann Lay 2017-08-18 10:35 ` Richard Biener 1 sibling, 1 reply; 13+ messages in thread From: Georg-Johann Lay @ 2017-08-16 14:29 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, Denis Chertykov On 28.07.2017 09:34, Richard Biener wrote: > On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >> On 27.07.2017 14:34, Richard Biener wrote: >>> >>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >>>> >>>> For some targets, the best place to put read-only lookup tables as >>>> generated by -ftree-switch-conversion is not the generic address space >>>> but some target specific address space. >>>> >>>> This is the case for AVR, where for most devices .rodata must be >>>> located in RAM. >>>> >>>> Part #1 adds a new, optional target hook that queries the back-end >>>> about the desired address space. There is currently only one user: >>>> tree-switch-conversion.c::build_one_array() which re-builds value_type >>>> and array_type if the address space returned by the backend is not >>>> the generic one. >>>> >>>> Part #2 is the AVR part that implements the new hook and adds some >>>> sugar around it. >>> >>> >>> Given that switch-conversion just creates a constant initializer doesn't >>> AVR >>> benefit from handling those uniformly (at RTL expansion?). Not sure but >>> I think it goes through the regular constant pool handling. >>> >>> Richard. >> >> >> avr doesn't use constant pools because they are not profitable for >> several reasons. >> >> Moreover, it's not sufficient to just patch the pool's section, you'd >> also have to patch *all* accesses to also use the exact same address >> space so that they use the correct instruction (LPM instead of LD). > > Sure. So then not handle it in constant pool handling but in expanding > the initializers of global vars. I think the entry for this is > varasm.c:assemble_variable - that sets DECL_RTL for the variable. That cannot work, and here is why; just for completeness: cgraphunit.c::symbol_table::compile() runs ... expand_all_functions (); output_variables (); // runs assemble_variable ... So any patching of DECL_RTL will result in wrong code: The address space of the decl won't match the address space used by the code accessing decl. Johann ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space 2017-08-16 14:29 ` Georg-Johann Lay @ 2017-08-18 10:35 ` Richard Biener 2017-08-18 14:54 ` Georg-Johann Lay 0 siblings, 1 reply; 13+ messages in thread From: Richard Biener @ 2017-08-18 10:35 UTC (permalink / raw) To: Georg-Johann Lay; +Cc: gcc-patches, Denis Chertykov On Wed, Aug 16, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote: > On 28.07.2017 09:34, Richard Biener wrote: >> >> On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >>> >>> On 27.07.2017 14:34, Richard Biener wrote: >>>> >>>> >>>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >>>>> >>>>> >>>>> For some targets, the best place to put read-only lookup tables as >>>>> generated by -ftree-switch-conversion is not the generic address space >>>>> but some target specific address space. >>>>> >>>>> This is the case for AVR, where for most devices .rodata must be >>>>> located in RAM. >>>>> >>>>> Part #1 adds a new, optional target hook that queries the back-end >>>>> about the desired address space. There is currently only one user: >>>>> tree-switch-conversion.c::build_one_array() which re-builds value_type >>>>> and array_type if the address space returned by the backend is not >>>>> the generic one. >>>>> >>>>> Part #2 is the AVR part that implements the new hook and adds some >>>>> sugar around it. >>>> >>>> >>>> >>>> Given that switch-conversion just creates a constant initializer doesn't >>>> AVR >>>> benefit from handling those uniformly (at RTL expansion?). Not sure but >>>> I think it goes through the regular constant pool handling. >>>> >>>> Richard. >>> >>> >>> >>> avr doesn't use constant pools because they are not profitable for >>> several reasons. >>> >>> Moreover, it's not sufficient to just patch the pool's section, you'd >>> also have to patch *all* accesses to also use the exact same address >>> space so that they use the correct instruction (LPM instead of LD). >> >> >> Sure. So then not handle it in constant pool handling but in expanding >> the initializers of global vars. I think the entry for this is >> varasm.c:assemble_variable - that sets DECL_RTL for the variable. > > > That cannot work, and here is why; just for completeness: > > cgraphunit.c::symbol_table::compile() > > runs > > ... > expand_all_functions (); > output_variables (); // runs assemble_variable > ... > > So any patching of DECL_RTL will result in wrong code: The address > space of the decl won't match the address space used by the code > accessing decl. Ok, so it's more tricky to hack it that way (you'd need to catch the time the decl gets its DECL_RTL set, not sure where that happens for globals). That leaves doing a more broad transform. One convenient place to hook in would be the ipa_single_use pass which computes the varpool_node.used_by_single_function flag. All such variables would be suitable for promotion (with some additional constraints I suppose). You'd add a transform phase to the pass that rewrites the decls and performs GIMPLE IL adjustments (I think you need to patch memory references to use the address-space). Of course there would need to be a target hook determining if/what section/address-space a variable should be promoted to which somehow would allow switch-conversion to use that as well. Ho humm. That said, do you think the switch-conversion created decl is the only one that benefits from compiler-directed promotion to different address-space? Richard. > Johann ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space 2017-08-18 10:35 ` Richard Biener @ 2017-08-18 14:54 ` Georg-Johann Lay 0 siblings, 0 replies; 13+ messages in thread From: Georg-Johann Lay @ 2017-08-18 14:54 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, Denis Chertykov On 18.08.2017 12:01, Richard Biener wrote: > On Wed, Aug 16, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >> On 28.07.2017 09:34, Richard Biener wrote: >>> On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >>>> On 27.07.2017 14:34, Richard Biener wrote: >>>>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote: >>>>>> >>>>>> For some targets, the best place to put read-only lookup tables as >>>>>> generated by -ftree-switch-conversion is not the generic address space >>>>>> but some target specific address space. >>>>>> >>>>>> This is the case for AVR, where for most devices .rodata must be >>>>>> located in RAM. >>>>>> >>>>>> Part #1 adds a new, optional target hook that queries the back-end >>>>>> about the desired address space. There is currently only one user: >>>>>> tree-switch-conversion.c::build_one_array() which re-builds value_type >>>>>> and array_type if the address space returned by the backend is not >>>>>> the generic one. >>>>>> >>>>>> Part #2 is the AVR part that implements the new hook and adds some >>>>>> sugar around it. >>>>> >>>>> Given that switch-conversion just creates a constant initializer doesn't >>>>> AVR >>>>> benefit from handling those uniformly (at RTL expansion?). Not sure but >>>>> I think it goes through the regular constant pool handling. >>>>> >>>>> Richard. >>>> >>>> avr doesn't use constant pools because they are not profitable for >>>> several reasons. >>>> >>>> Moreover, it's not sufficient to just patch the pool's section, you'd >>>> also have to patch *all* accesses to also use the exact same address >>>> space so that they use the correct instruction (LPM instead of LD). >>> >>> Sure. So then not handle it in constant pool handling but in expanding >>> the initializers of global vars. I think the entry for this is >>> varasm.c:assemble_variable - that sets DECL_RTL for the variable. >> >> That cannot work, and here is why; just for completeness: >> >> cgraphunit.c::symbol_table::compile() >> >> runs >> ... >> expand_all_functions (); >> output_variables (); // runs assemble_variable >> ... >> >> So any patching of DECL_RTL will result in wrong code: The address >> space of the decl won't match the address space used by the code >> accessing decl. > > Ok, so it's more tricky to hack it that way (you'd need to catch the > time the decl gets its DECL_RTL set, not sure where that happens > for globals). Too late, IMO. Problem is that any reference must also have the same AS. Hence if somewhere, before patching decl_rtl, some code uses TREE_TYPE on respective decl, that type will miss the AS, same for all variables / pointers created with that type. Been there, seen it... > That leaves doing a more broad transform. One convenient place > to hook in would be the ipa_single_use pass which computes > the varpool_node.used_by_single_function flag. All such variables > would be suitable for promotion (with some additional constraints > I suppose). You'd add a transform phase to the pass that rewrites > the decls and performs GIMPLE IL adjustments (I think you need > to patch memory references to use the address-space). Rewriting DECLs is not enough. Only the place that cooks up the decl (implicitly) knows whether it's appropriate to use different AS and whether is has control over diffusion into TREE_TYPEs. And as we have strong filter (see below) which includes DECL_ARTIFICIAL, almost nothing will remain to be transformed anyway... > Of course there would need to be a target hook determining > if/what section/address-space a variable should be promoted to > which somehow would allow switch-conversion to use that > as well. Ho humm. > > That said, do you think the switch-conversion created decl is > the only one that benefits from compiler-directed promotion > to different address-space? Yes. Only compiler-generated lookup-tables may be transformed, and the only such tables I know of are CSWTCH and vtables. The conditions so far are: * TREE_STATIC * !TREE_PUBLIC * TREE_READONLY (at least for the case this PR is after) * DECL_ARTIFICIAL (or otherwise exclude inline asm) * decl must not be cooked up by non-C FE (i.e. vtables are out). * Not weak, comdat or linkonce (again, vtables are out). * Not for debug purpose (like what dwarf2asm does). * No section attribute (actually also CSWTCH could be mentioned in linker script, but we can argue that artificials are managed by the compiler + user has option to control CSWTCH). What remains is CSWTCH. Johann ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 1/2] PR49847: Add hook to place read-only lookup-tables in named address-space 2017-07-27 12:29 [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space Georg-Johann Lay 2017-07-27 12:34 ` Richard Biener @ 2017-07-27 12:41 ` Georg-Johann Lay 2017-07-27 12:50 ` [patch 2/2,avr] " Georg-Johann Lay 2 siblings, 0 replies; 13+ messages in thread From: Georg-Johann Lay @ 2017-07-27 12:41 UTC (permalink / raw) To: gcc-patches; +Cc: Denis Chertykov [-- Attachment #1: Type: text/plain, Size: 1831 bytes --] On 27.07.2017 14:29, Georg-Johann Lay wrote: > For some targets, the best place to put read-only lookup tables as > generated by -ftree-switch-conversion is not the generic address space > but some target specific address space. > > This is the case for AVR, where for most devices .rodata must be > located in RAM. > > Part #1 adds a new, optional target hook that queries the back-end > about the desired address space. There is currently only one user: > tree-switch-conversion.c::build_one_array() which re-builds value_type > and array_type if the address space returned by the backend is not > the generic one. > > Part #2 is the AVR part that implements the new hook and adds some > sugar around it. This is the gcc part which adds the new hook TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA resp. targetm.addr_space.for_artificial_rodata(). The default implementation returns ADDR_SPACE_GENERIC which represents a no-op. Only if !ADDR_SPACE_GENERIC_P, the array and value type are re-built. The accesses must be in such a way that any access to the newly created array matches the address space. This is the reason for why the back-end cannot do it on its own: Just putting the stuff in a specific section does *not* do the trick. Bootstrapped ok on x86_64. Johann PR 49857 * doc/tm.texi.in (Named Address Spaces) [TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA]: Add hook anchor. * doc/tm.texi: Regenerate. * target.def (addr_space) [for_artificial_rodata]: New optional hook. * targhooks.c (default_addr_space_for_artificial_rodata): New function. * targhooks.h (default_addr_space_for_artificial_rodata): New proto. * tree-switch-conversion.c (target.h): Include it. (build_one_array): Set address space of value_type according to targetm.addr_space.for_artificial_rodata() and rebuild array_type if needed. [-- Attachment #2: pr49857-v3-gcc.diff --] [-- Type: text/x-patch, Size: 5170 bytes --] Index: target.def =================================================================== --- target.def (revision 250302) +++ target.def (working copy) @@ -3285,6 +3285,25 @@ The default implementation does nothing. void, (addr_space_t as, location_t loc), default_addr_space_diagnose_usage) +/* Function to get the address space of some compiler-generated + read-only data. Used for optimization purposes only. */ +DEFHOOK +(for_artificial_rodata, + "Define this hook to return an address space to be used for @var{type},\n\ +usually an artificial lookup-table that would reside in @code{.rodata}.\n\ +It is always safe not to implement this hook or to return\n\ +@code{ADDR_SPACE_GENERIC}.\n\ +\n\ +The hook can be used to put compiler-generated, artificial data in\n\ +static stzorage into a specific address space when this is better suited\n\ +than the generic address space.\n\ +The compiler will also generate all accesses to the respective data\n\ +so that all associated accesses will also use the desired address space.\n\ +An example for such data are the @code{CSWTCH} lookup tables as generated\n\ +by @option{-ftree-switch-conversion}.", + addr_space_t, (tree type), + default_addr_space_for_artificial_rodata) + HOOK_VECTOR_END (addr_space) #undef HOOK_PREFIX Index: targhooks.c =================================================================== --- targhooks.c (revision 250302) +++ targhooks.c (working copy) @@ -1397,6 +1397,14 @@ default_addr_space_convert (rtx op ATTRI gcc_unreachable (); } +/* The default hook for TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA. */ + +addr_space_t +default_addr_space_for_artificial_rodata (tree) +{ + return ADDR_SPACE_GENERIC; +} + bool default_hard_regno_scratch_ok (unsigned int regno ATTRIBUTE_UNUSED) { Index: targhooks.h =================================================================== --- targhooks.h (revision 250302) +++ targhooks.h (working copy) @@ -184,6 +184,7 @@ extern bool default_addr_space_zero_addr extern int default_addr_space_debug (addr_space_t); extern void default_addr_space_diagnose_usage (addr_space_t, location_t); extern rtx default_addr_space_convert (rtx, tree, tree); +extern addr_space_t default_addr_space_for_artificial_rodata (tree); extern unsigned int default_case_values_threshold (void); extern bool default_have_conditional_execution (void); Index: tree-switch-conversion.c =================================================================== --- tree-switch-conversion.c (revision 250302) +++ tree-switch-conversion.c (working copy) @@ -46,6 +46,7 @@ Software Foundation, 51 Franklin Street, #include "gimplify-me.h" #include "tree-cfg.h" #include "cfgloop.h" +#include "target.h" /* ??? For lang_hooks.types.type_for_mode, but is there a word_mode type in the GIMPLE type system that is language-independent? */ @@ -1136,6 +1137,16 @@ build_one_array (gswitch *swtch, int num default_type = TREE_TYPE (info->default_values[num]); value_type = array_value_type (swtch, default_type, num, info); array_type = build_array_type (value_type, arr_index_type); + // Run the following hook on the complete array so the back-end + // can inspect details of it. + addr_space_t as = targetm.addr_space.for_artificial_rodata (array_type); + if (!ADDR_SPACE_GENERIC_P (as)) + { + int quals = (TYPE_QUALS_NO_ADDR_SPACE (value_type) + | ENCODE_QUAL_ADDR_SPACE (as)); + value_type = build_qualified_type (value_type, quals); + array_type = build_array_type (value_type, arr_index_type); + } if (default_type != value_type) { unsigned int i; Index: doc/tm.texi =================================================================== --- doc/tm.texi (revision 250302) +++ doc/tm.texi (working copy) @@ -10595,6 +10595,21 @@ the address space as registered with @co The default implementation does nothing. @end deftypefn +@deftypefn {Target Hook} addr_space_t TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA (tree @var{type}) +Define this hook to return an address space to be used for @var{type}, +usually an artificial lookup-table that would reside in @code{.rodata}. +It is always safe not to implement this hook or to return +@code{ADDR_SPACE_GENERIC}. + +The hook can be used to put compiler-generated, artificial data in +static stzorage into a specific address space when this is better suited +than the generic address space. +The compiler will also generate all accesses to the respective data +so that all associated accesses will also use the desired address space. +An example for such data are the @code{CSWTCH} lookup tables as generated +by @option{-ftree-switch-conversion}. +@end deftypefn + @node Misc @section Miscellaneous Parameters @cindex parameters, miscellaneous Index: doc/tm.texi.in =================================================================== --- doc/tm.texi.in (revision 250302) +++ doc/tm.texi.in (working copy) @@ -7529,6 +7529,8 @@ c_register_addr_space ("__ea", ADDR_SPAC @hook TARGET_ADDR_SPACE_DIAGNOSE_USAGE +@hook TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA + @node Misc @section Miscellaneous Parameters @cindex parameters, miscellaneous ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 2/2,avr] PR49847: Add hook to place read-only lookup-tables in named address-space 2017-07-27 12:29 [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space Georg-Johann Lay 2017-07-27 12:34 ` Richard Biener 2017-07-27 12:41 ` [patch 1/2] " Georg-Johann Lay @ 2017-07-27 12:50 ` Georg-Johann Lay 2017-07-27 15:48 ` Denis Chertykov 2 siblings, 1 reply; 13+ messages in thread From: Georg-Johann Lay @ 2017-07-27 12:50 UTC (permalink / raw) To: gcc-patches; +Cc: Denis Chertykov [-- Attachment #1: Type: text/plain, Size: 2452 bytes --] On 27.07.2017 14:29, Georg-Johann Lay wrote: > For some targets, the best place to put read-only lookup tables as > generated by -ftree-switch-conversion is not the generic address space > but some target specific address space. > > This is the case for AVR, where for most devices .rodata must be > located in RAM. > > Part #1 adds a new, optional target hook that queries the back-end > about the desired address space. There is currently only one user: > tree-switch-conversion.c::build_one_array() which re-builds value_type > and array_type if the address space returned by the backend is not > the generic one. > > Part #2 is the AVR part that implements the new hook and adds some > sugar around it. This is the AVR part. It implements the new hook which returns a convenient flash address space for devices where .rodata is located in RAM: The 16-bit __flash for devices with <= 64 KiB flash and 24-bit __memx for > 64 KiB flash. It adds a new option -madd-space-for-lookup= which allows to pick a specific address space. Some new insns and combine-split suport best code generation by the knowledge that the 24-bit addresses will never point to RAM so that the expensive decision-at-runtime whether LD or LPM has to be used can be avoided. Passed without new regressions on atmega128. Ok for trunk provided the gcc part 1/2 is approved? Johann Implement TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA. PR target/49857 * config/avr/avr-opts.h: New file. * config/avr/avr.opt: Include it. (-maddr-space-for-lookup=): New option and... (avr_opt_addr_space_for_lookup): ...associated Var. (avr_aspace_for_lookup): New option enums used by above. * config/avr/avr-protos.h (avr_out_load_flashx): New proto. * config/avr/avr.c (avr_out_load_flashx): New function. * avr_adjust_insn_length [ADJUST_LEN_LOAD_FLASHX]: Handle it. * avr_rtx_costs_1 [ZERO_EXTEND, SIGN_EXTEND]: Handle shift-and-extend-by-1 of HI -> PSI. [ASHIFT,PSImode]: Describe cost of extend-and-shift-by-1. (TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA): Define to... (avr_addr_space_for_artificial_rodata): ...this new static function. * config/avr/avr.md (unspec): Add UNSPEC_LOAD_FLASHX. (adjust_len): Add load_flashx. (*ashiftpsi.1_sign_extend.hi, *ashiftpsi.1_zero_extend.hi) (*extendpsi.ashift.1.uqi, *load<mode>-flashx): New insns. (*split_xload<mode>-cswtch): New insn-and-split. * doc/invoke.texi (AVR Options) <-maddr-space-for-lookup=>: Document. [-- Attachment #2: pr49857-v3-avr.diff --] [-- Type: text/x-patch, Size: 17253 bytes --] Index: config/avr/avr-opts.h =================================================================== --- config/avr/avr-opts.h (nonexistent) +++ config/avr/avr-opts.h (working copy) @@ -0,0 +1,40 @@ +/* Definitions for option handling for AVR. + Copyright (C) 2017 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. */ + +#ifndef AVR_OPTS_H +#define AVR_OPTS_H + +enum avr_opt_addr_space + { + AVR_OPT_ADDR_SPACE_flash, + AVR_OPT_ADDR_SPACE_flash1, + AVR_OPT_ADDR_SPACE_flash2, + AVR_OPT_ADDR_SPACE_flash3, + AVR_OPT_ADDR_SPACE_flash4, + AVR_OPT_ADDR_SPACE_flash5, + AVR_OPT_ADDR_SPACE_memx, + AVR_OPT_ADDR_SPACE_generic + }; + +#endif /* AVR_OPTS_H */ Index: config/avr/avr-protos.h =================================================================== --- config/avr/avr-protos.h (revision 250302) +++ config/avr/avr-protos.h (working copy) @@ -94,6 +94,7 @@ extern const char* avr_out_plus (rtx, rt extern const char* avr_out_round (rtx_insn *, rtx*, int* =NULL); extern const char* avr_out_addto_sp (rtx*, int*); extern const char* avr_out_xload (rtx_insn *, rtx*, int*); +extern const char* avr_out_load_flashx (rtx_insn*, rtx*, int*); extern const char* avr_out_movmem (rtx_insn *, rtx*, int*); extern const char* avr_out_insert_bits (rtx*, int*); extern bool avr_popcount_each_byte (rtx, int, int); Index: config/avr/avr.c =================================================================== --- config/avr/avr.c (revision 250302) +++ config/avr/avr.c (working copy) @@ -3914,6 +3914,31 @@ avr_out_xload (rtx_insn *insn ATTRIBUTE_ } +/* Load register XOP[0] with flash content from PSI address XOP[1]. + We have ELPMX, MOVW, load up to 4 bytes and may clobber Z. */ + +const char* +avr_out_load_flashx (rtx_insn*, rtx *xop, int *plen) +{ + unsigned n_bytes = GET_MODE_SIZE (GET_MODE (xop[0])); + gcc_assert (n_bytes <= 4); + + avr_asm_len ("out __RAMPZ__,%C1", xop, plen, -1); + avr_asm_len ("movw ZL,%1", xop, plen, 1); + + avr_asm_len ("elpm %A0,Z+", xop, plen, 1); + if (n_bytes >= 2) avr_asm_len ("elpm %B0,Z+", xop, plen, 1); + if (n_bytes >= 3) avr_asm_len ("elpm %C0,Z+", xop, plen, 1); + if (n_bytes >= 4) avr_asm_len ("elpm %D0,Z+", xop, plen, 1); + + // EBI devices: reset RAMPZ to 0 so that we don't read garbage from RAM. + if (AVR_HAVE_RAMPD) + avr_asm_len ("out __RAMPZ__,__zero_reg__", xop, plen, 1); + + return ""; +} + + const char* output_movqi (rtx_insn *insn, rtx operands[], int *plen) { @@ -9438,6 +9463,7 @@ avr_adjust_insn_length (rtx_insn *insn, case ADJUST_LEN_MOV32: output_movsisf (insn, op, &len); break; case ADJUST_LEN_MOVMEM: avr_out_movmem (insn, op, &len); break; case ADJUST_LEN_XLOAD: avr_out_xload (insn, op, &len); break; + case ADJUST_LEN_LOAD_FLASHX: avr_out_load_flashx (insn, op, &len); break; case ADJUST_LEN_SEXT: avr_out_sign_extend (insn, op, &len); break; case ADJUST_LEN_SFRACT: avr_out_fract (insn, op, true, &len); break; @@ -10837,6 +10863,12 @@ avr_rtx_costs_1 (rtx x, machine_mode mod return true; case ZERO_EXTEND: + if (PSImode == mode + && ASHIFT == GET_CODE (XEXP (x, 0))) + { + *total = COSTS_N_INSNS (GET_MODE_SIZE (mode)); + return true; + } *total = COSTS_N_INSNS (GET_MODE_SIZE (mode) - GET_MODE_SIZE (GET_MODE (XEXP (x, 0)))); *total += avr_operand_rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)), @@ -10844,6 +10876,12 @@ avr_rtx_costs_1 (rtx x, machine_mode mod return true; case SIGN_EXTEND: + if (PSImode == mode + && ASHIFT == GET_CODE (XEXP (x, 0))) + { + *total = COSTS_N_INSNS (GET_MODE_SIZE (mode)); + return true; + } *total = COSTS_N_INSNS (GET_MODE_SIZE (mode) + 2 - GET_MODE_SIZE (GET_MODE (XEXP (x, 0)))); *total += avr_operand_rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)), @@ -11227,6 +11265,18 @@ avr_rtx_costs_1 (rtx x, machine_mode mod break; case PSImode: + if (SIGN_EXTEND == GET_CODE (XEXP (x, 0)) + && const1_rtx == XEXP (x, 1)) + { + *total = COSTS_N_INSNS (3); + return true; + } + if (ZERO_EXTEND == GET_CODE (XEXP (x, 0)) + && const1_rtx == XEXP (x, 1)) + { + *total = COSTS_N_INSNS (4); + return true; + } if (!CONST_INT_P (XEXP (x, 1))) { *total = COSTS_N_INSNS (!speed ? 6 : 73); @@ -13341,6 +13391,66 @@ avr_emit3_fix_outputs (rtx (*gen)(rtx,rt } +/* Implement `TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA'. */ + +static addr_space_t +avr_addr_space_for_artificial_rodata (tree node) +{ + if (TREE_CODE (node) == ARRAY_TYPE + && INTEGRAL_TYPE_P (TREE_TYPE (node)) + && int_cst_value (TYPE_SIZE_UNIT (TREE_TYPE (node))) > 4) + { + // Due to reduced addressing mode and DImode being lowered to QI, + // avoid a total code bloat by leaving DImode in .rodata. + return (addr_space_t) ADDR_SPACE_GENERIC; + } + + // Map option to address space proper... due to cumbersome include + // order we cannot use address spaces as option enum *sigh*. + + int as = ADDR_SPACE_GENERIC; + switch (avr_opt_addr_space_for_lookup) + { + case AVR_OPT_ADDR_SPACE_memx: as = ADDR_SPACE_MEMX; break; + case AVR_OPT_ADDR_SPACE_flash: as = ADDR_SPACE_FLASH; break; + case AVR_OPT_ADDR_SPACE_flash1: as = ADDR_SPACE_FLASH1; break; + case AVR_OPT_ADDR_SPACE_flash2: as = ADDR_SPACE_FLASH2; break; + case AVR_OPT_ADDR_SPACE_flash3: as = ADDR_SPACE_FLASH3; break; + case AVR_OPT_ADDR_SPACE_flash4: as = ADDR_SPACE_FLASH4; break; + case AVR_OPT_ADDR_SPACE_flash5: as = ADDR_SPACE_FLASH5; break; + default: + as = ADDR_SPACE_GENERIC; + break; + } + + if (as == ADDR_SPACE_GENERIC +#if defined HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH + // If the device has a linear address space and configure found out + // that .rodata resides in flash, use .rodata without extra penalty. + || avr_arch->flash_pm_offset != 0 +#endif + // Reduced Tiny has no address spaces, .rodata is just fine. + || AVR_TINY) + { + return (addr_space_t) ADDR_SPACE_GENERIC; + } + + if (avr_n_flash > 1 + && (AVR_HAVE_ELPM || AVR_HAVE_ELPMX)) + { + if (as == ADDR_SPACE_MEMX + || (as == ADDR_SPACE_FLASH1 && avr_n_flash > 1) + || (as == ADDR_SPACE_FLASH2 && avr_n_flash > 2) + || (as == ADDR_SPACE_FLASH3 && avr_n_flash > 3) + || (as == ADDR_SPACE_FLASH4 && avr_n_flash > 4) + || (as == ADDR_SPACE_FLASH5 && avr_n_flash > 5)) + return (addr_space_t) as; + } + + return (addr_space_t) ADDR_SPACE_FLASH; +} + + /* Worker function for movmemhi expander. XOP[0] Destination as MEM:BLK XOP[1] Source " " @@ -14762,6 +14872,10 @@ avr_fold_builtin (tree fndecl, int n_arg #undef TARGET_ADDR_SPACE_DIAGNOSE_USAGE #define TARGET_ADDR_SPACE_DIAGNOSE_USAGE avr_addr_space_diagnose_usage +#undef TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA +#define TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA \ + avr_addr_space_for_artificial_rodata + #undef TARGET_MODE_DEPENDENT_ADDRESS_P #define TARGET_MODE_DEPENDENT_ADDRESS_P avr_mode_dependent_address_p Index: config/avr/avr.md =================================================================== --- config/avr/avr.md (revision 250302) +++ config/avr/avr.md (working copy) @@ -78,6 +78,7 @@ (define_c_enum "unspec" UNSPEC_COPYSIGN UNSPEC_IDENTITY UNSPEC_INSERT_BITS + UNSPEC_LOAD_FLASHX UNSPEC_ROUND ]) @@ -158,7 +159,7 @@ (define_attr "adjust_len" tsthi, tstpsi, tstsi, compare, compare64, call, mov8, mov16, mov24, mov32, reload_in16, reload_in24, reload_in32, ufract, sfract, round, - xload, movmem, + xload, load_flashx, movmem, ashlqi, ashrqi, lshrqi, ashlhi, ashrhi, lshrhi, ashlsi, ashrsi, lshrsi, @@ -1405,6 +1406,99 @@ (define_insn "*addsi3_zero_extend.hi" [(set_attr "length" "4") (set_attr "cc" "set_n")]) +(define_insn "*ashiftpsi.1_sign_extend.hi" + [(set (match_operand:PSI 0 "register_operand" "=r") + (ashift:PSI (sign_extend:PSI (match_operand:HI 1 "register_operand" "0")) + (const_int 1)))] + "" + "lsl %A0\;rol %B0\;sbc %C0,%C0" + [(set_attr "length" "3") + (set_attr "cc" "clobber")]) + +(define_insn "*ashiftpsi.1_zero_extend.hi" + [(set (match_operand:PSI 0 "register_operand" "=r") + (ashift:PSI (zero_extend:PSI (match_operand:HI 1 "register_operand" "0")) + (const_int 1)))] + "" + "lsl %A0\;rol %B0\;clr %C0\;rol %C0" + [(set_attr "length" "4") + (set_attr "cc" "clobber")]) + +(define_split ; non-matching + [(set (match_operand:PSI 0 "register_operand") + (plus:PSI (sign_extend:PSI (ashift:HI (match_operand:HI 1 "register_operand") + (const_int 1))) + (match_operand:PSI 2 "symbol_ref_operand")))] + "!reload_completed" + [;; "*ashiftpsi.1_sign_extend.hi" + (set (match_dup 0) + (ashift:PSI (sign_extend:PSI (match_dup 1)) + (const_int 1))) + ;; "addpsi3" + (set (match_dup 0) + (plus:PSI (match_dup 0) + (match_dup 2)))] + { + if (strncmp (XSTR (operands[2], 0), "CSWTCH.", strlen ("CSWTCH."))) + FAIL; + // Ok, this is from tree-switch-conversion. If, in the original + // pattern, bit 15 or 14 was set, then the code cooked up by + // tree-swich-conversion won't work anyways and we can just as well + // swap the shift with the extension. + }) + + +(define_insn "*extendpsi.ashift.1.uqi" + [(set (match_operand:PSI 0 "register_operand" "=r") + (any_extend:PSI (ashift:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "0")) + (const_int 1))))] + "" + "lsl %A0\;clr %B0\;rol %B0\;clr %C0" + [(set_attr "length" "4") + (set_attr "cc" "clobber")]) + + +;; We abuse the addition to get grep for a split that transforms an +;; expensive memx library call to an inline sequence of ELPM Z+. +;; This is legitimate for lookups created by tree-switch-conversion. +(define_insn_and_split "*split_xload<mode>-cswtch" + [(set (match_operand:MOVMODE 0 "register_operand" "=r") + (mem:MOVMODE (plus:PSI (match_operand:PSI 1 "register_operand" "r") + (match_operand:PSI 2 "symbol_ref_operand" "i")))) + (clobber (reg:MOVMODE 22)) + (clobber (reg:QI 21)) + (clobber (reg:HI REG_Z))] + "!reload_completed + && AVR_HAVE_ELPMX + && SYMBOL_REF_P (operands[2]) + && 0 == strncmp (XSTR (operands[2], 0), \"CSWTCH.\", strlen (\"CSWTCH.\"))" + { gcc_unreachable(); } + "&& 1" + [;; "*load<mode>-flashx" + (parallel [(set (match_dup 0) + (unspec:MOVMODE [(match_dup 3) + ] UNSPEC_LOAD_FLASHX)) + (clobber (reg:HI REG_Z))])] + { + rtx set = single_set (curr_insn); + rtx mem = SET_SRC (set); + // "addpsi3" + operands[3] = force_reg (PSImode, XEXP (mem, 0)); + }) + + +(define_insn "*load<mode>-flashx" + [(set (match_operand:MOVMODE 0 "register_operand" "=r") + (unspec:MOVMODE [(match_operand:PSI 1 "register_operand" "r") + ] UNSPEC_LOAD_FLASHX)) + (clobber (reg:HI REG_Z))] + "AVR_HAVE_ELPMX" + { + return avr_out_load_flashx (insn, operands, NULL); + } + [(set_attr "adjust_len" "load_flashx")]) + + (define_insn "addpsi3" [(set (match_operand:PSI 0 "register_operand" "=??r,d ,d,r") (plus:PSI (match_operand:PSI 1 "register_operand" "%0,0 ,0,0") Index: config/avr/avr.opt =================================================================== --- config/avr/avr.opt (revision 250302) +++ config/avr/avr.opt (working copy) @@ -18,6 +18,9 @@ ; along with GCC; see the file COPYING3. If not see ; <http://www.gnu.org/licenses/>. +HeaderInclude +config/avr/avr-opts.h + mcall-prologues Target Report Mask(CALL_PROLOGUES) Use subroutines for function prologues and epilogues. @@ -82,6 +85,38 @@ mpmem-wrap-around Target Report Make the linker relaxation machine assume that a program counter wrap-around occurs. +maddr-space-for-lookup= +Target RejectNegative Joined Var(avr_opt_addr_space_for_lookup) Enum(avr_aspace_for_lookup) Init(AVR_OPT_ADDR_SPACE_memx) +Select the address space that might be used for accessing artificial lookup tables. + +Enum +Name(avr_aspace_for_lookup) Type(enum avr_opt_addr_space) +Known address space choices (for use with the -maddr-space-for-lookup= option): + +EnumValue +Enum(avr_aspace_for_lookup) String(flash) Value(AVR_OPT_ADDR_SPACE_flash) + +EnumValue +Enum(avr_aspace_for_lookup) String(flash1) Value(AVR_OPT_ADDR_SPACE_flash1) + +EnumValue +Enum(avr_aspace_for_lookup) String(flash2) Value(AVR_OPT_ADDR_SPACE_flash2) + +EnumValue +Enum(avr_aspace_for_lookup) String(flash3) Value(AVR_OPT_ADDR_SPACE_flash3) + +EnumValue +Enum(avr_aspace_for_lookup) String(flash4) Value(AVR_OPT_ADDR_SPACE_flash4) + +EnumValue +Enum(avr_aspace_for_lookup) String(flash5) Value(AVR_OPT_ADDR_SPACE_flash5) + +EnumValue +Enum(avr_aspace_for_lookup) String(memx) Value(AVR_OPT_ADDR_SPACE_memx) + +EnumValue +Enum(avr_aspace_for_lookup) String(generic) Value(AVR_OPT_ADDR_SPACE_generic) + maccumulate-args Target Report Mask(ACCUMULATE_OUTGOING_ARGS) Accumulate outgoing function arguments and acquire/release the needed stack space for outgoing function arguments in function prologue/epilogue. Without this option, outgoing arguments are pushed before calling a function and popped afterwards. This option can lead to reduced code size for functions that call many functions that get their arguments on the stack like, for example printf. Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 250302) +++ doc/invoke.texi (working copy) @@ -662,7 +662,7 @@ -imacros @var{file} -imultilib @var{dir @gccoptlist{-mmcu=@var{mcu} -mabsdata -maccumulate-args @gol -mbranch-cost=@var{cost} @gol -mcall-prologues -mgas-isr-prologues -mint8 @gol --mn_flash=@var{size} -mno-interrupts @gol +-maddr-space-for-lookup=@var{as} -mn_flash=@var{size} -mno-interrupts @gol -mrelax -mrmw -mstrict-X -mtiny-stack -mfract-convert-truncate @gol -mshort-calls -nodevicelib @gol -Waddr-space-convert -Wmisspelled-isr} @@ -15977,6 +15977,40 @@ This option can lead to reduced code siz several calls to functions that get their arguments on the stack like calls to printf-like functions. +@item -maddr-space-for-lookup=@var{addr-space} +@opindex maddr-space-for-lookup +When a simple initialization in a switch statement is converted to an +initialization from a scalar array, the compiler puts the lookup array +into @code{.rodata}, which is the generic address space. +This address space is always used for devices where flash memory is seen +in the the RAM address range like for @code{avrtiny} and @code{avrxmega3}. + +On devices where @code{.rodata} resides in RAM, this option allows to chose +a different @ref{AVR Named Address Spaces, address space} @var{addr-space} +which may be one of: +@table @code +@item memx +The default. +On devices with up to 64@tie{}KiB of program memory, use the 16-bit address +space @code{__flash}. On devices with more than 64@tie{}KiB of program +memory, use a 24-bit flash address space. +@item flash +@itemx flash1 +@itemx flash2 +@itemx flash3 +@itemx flash4 +@itemx flash5 +Use the 16-bit address space @code{__flash} @dots{} @code{__flash5}, +respectively. If the respective address space is not supported for the device +because it is beyond its flash size, @code{__flash} is used as a fallback. +@item generic +Use the generic address space, i.e. @code{.rodata}. +@end table +Other noticeable options in this context are: +@option{-f[no-]tree-switch-conversion}, @option{-f[no-]jump-tables} +and @option{--param case-values-threshold=@var{value}}, +@pxref{Optimize Options}. + @item -mbranch-cost=@var{cost} @opindex mbranch-cost Set the branch costs for conditional branch instructions to ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 2/2,avr] PR49847: Add hook to place read-only lookup-tables in named address-space 2017-07-27 12:50 ` [patch 2/2,avr] " Georg-Johann Lay @ 2017-07-27 15:48 ` Denis Chertykov 0 siblings, 0 replies; 13+ messages in thread From: Denis Chertykov @ 2017-07-27 15:48 UTC (permalink / raw) To: Georg-Johann Lay; +Cc: gcc-patches 2017-07-27 16:50 GMT+04:00 Georg-Johann Lay <avr@gjlay.de>: > On 27.07.2017 14:29, Georg-Johann Lay wrote: >> >> For some targets, the best place to put read-only lookup tables as >> generated by -ftree-switch-conversion is not the generic address space >> but some target specific address space. >> >> This is the case for AVR, where for most devices .rodata must be >> located in RAM. >> >> Part #1 adds a new, optional target hook that queries the back-end >> about the desired address space. There is currently only one user: >> tree-switch-conversion.c::build_one_array() which re-builds value_type >> and array_type if the address space returned by the backend is not >> the generic one. >> >> Part #2 is the AVR part that implements the new hook and adds some >> sugar around it. > > > This is the AVR part. > > It implements the new hook which returns a convenient flash address > space for devices where .rodata is located in RAM: The 16-bit __flash > for devices with <= 64 KiB flash and 24-bit __memx for > 64 KiB flash. > > It adds a new option -madd-space-for-lookup= which allows to pick a > specific address space. > > Some new insns and combine-split suport best code generation by the > knowledge that the 24-bit addresses will never point to RAM so that > the expensive decision-at-runtime whether LD or LPM has to be used > can be avoided. > > Passed without new regressions on atmega128. > > Ok for trunk provided the gcc part 1/2 is approved? It's cool. Thank you. Please apply. > > Johann > > > Implement TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA. > > PR target/49857 > * config/avr/avr-opts.h: New file. > * config/avr/avr.opt: Include it. > (-maddr-space-for-lookup=): New option and... > (avr_opt_addr_space_for_lookup): ...associated Var. > (avr_aspace_for_lookup): New option enums used by above. > * config/avr/avr-protos.h (avr_out_load_flashx): New proto. > * config/avr/avr.c (avr_out_load_flashx): New function. > * avr_adjust_insn_length [ADJUST_LEN_LOAD_FLASHX]: Handle it. > * avr_rtx_costs_1 [ZERO_EXTEND, SIGN_EXTEND]: Handle > shift-and-extend-by-1 of HI -> PSI. > [ASHIFT,PSImode]: Describe cost of extend-and-shift-by-1. > (TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA): Define to... > (avr_addr_space_for_artificial_rodata): ...this new static function. > * config/avr/avr.md (unspec): Add UNSPEC_LOAD_FLASHX. > (adjust_len): Add load_flashx. > (*ashiftpsi.1_sign_extend.hi, *ashiftpsi.1_zero_extend.hi) > (*extendpsi.ashift.1.uqi, *load<mode>-flashx): New insns. > (*split_xload<mode>-cswtch): New insn-and-split. > * doc/invoke.texi (AVR Options) <-maddr-space-for-lookup=>: > Document. > ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2017-08-18 14:36 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-07-27 12:29 [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space Georg-Johann Lay 2017-07-27 12:34 ` Richard Biener 2017-07-27 13:32 ` Georg-Johann Lay 2017-07-28 7:34 ` Richard Biener 2017-07-28 10:18 ` Georg-Johann Lay 2017-07-28 11:15 ` Richard Biener 2017-07-28 18:16 ` Georg-Johann Lay 2017-08-16 14:29 ` Georg-Johann Lay 2017-08-18 10:35 ` Richard Biener 2017-08-18 14:54 ` Georg-Johann Lay 2017-07-27 12:41 ` [patch 1/2] " Georg-Johann Lay 2017-07-27 12:50 ` [patch 2/2,avr] " Georg-Johann Lay 2017-07-27 15:48 ` Denis Chertykov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).