public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space
@ 2017-07-27 12:29 Georg-Johann Lay
  2017-07-27 12:34 ` Richard Biener
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Georg-Johann Lay @ 2017-07-27 12:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: Denis Chertykov

For some targets, the best place to put read-only lookup tables as
generated by -ftree-switch-conversion is not the generic address space
but some target specific address space.

This is the case for AVR, where for most devices .rodata must be
located in RAM.

Part #1 adds a new, optional target hook that queries the back-end
about the desired address space.  There is currently only one user:
tree-switch-conversion.c::build_one_array() which re-builds value_type
and array_type if the address space returned by the backend is not
the generic one.

Part #2 is the AVR part that implements the new hook and adds some
sugar around it.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space
  2017-07-27 12:29 [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space Georg-Johann Lay
@ 2017-07-27 12:34 ` Richard Biener
  2017-07-27 13:32   ` Georg-Johann Lay
  2017-07-27 12:41 ` [patch 1/2] " Georg-Johann Lay
  2017-07-27 12:50 ` [patch 2/2,avr] " Georg-Johann Lay
  2 siblings, 1 reply; 13+ messages in thread
From: Richard Biener @ 2017-07-27 12:34 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: gcc-patches, Denis Chertykov

On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
> For some targets, the best place to put read-only lookup tables as
> generated by -ftree-switch-conversion is not the generic address space
> but some target specific address space.
>
> This is the case for AVR, where for most devices .rodata must be
> located in RAM.
>
> Part #1 adds a new, optional target hook that queries the back-end
> about the desired address space.  There is currently only one user:
> tree-switch-conversion.c::build_one_array() which re-builds value_type
> and array_type if the address space returned by the backend is not
> the generic one.
>
> Part #2 is the AVR part that implements the new hook and adds some
> sugar around it.

Given that switch-conversion just creates a constant initializer doesn't AVR
benefit from handling those uniformly (at RTL expansion?).  Not sure but
I think it goes through the regular constant pool handling.

Richard.

>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 1/2] PR49847: Add hook to place read-only lookup-tables in named address-space
  2017-07-27 12:29 [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space Georg-Johann Lay
  2017-07-27 12:34 ` Richard Biener
@ 2017-07-27 12:41 ` Georg-Johann Lay
  2017-07-27 12:50 ` [patch 2/2,avr] " Georg-Johann Lay
  2 siblings, 0 replies; 13+ messages in thread
From: Georg-Johann Lay @ 2017-07-27 12:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: Denis Chertykov

[-- Attachment #1: Type: text/plain, Size: 1831 bytes --]

On 27.07.2017 14:29, Georg-Johann Lay wrote:
> For some targets, the best place to put read-only lookup tables as
> generated by -ftree-switch-conversion is not the generic address space
> but some target specific address space.
> 
> This is the case for AVR, where for most devices .rodata must be
> located in RAM.
> 
> Part #1 adds a new, optional target hook that queries the back-end
> about the desired address space.  There is currently only one user:
> tree-switch-conversion.c::build_one_array() which re-builds value_type
> and array_type if the address space returned by the backend is not
> the generic one.
> 
> Part #2 is the AVR part that implements the new hook and adds some
> sugar around it.

This is the gcc part which adds the new hook 
TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA resp. 
targetm.addr_space.for_artificial_rodata().

The default implementation returns ADDR_SPACE_GENERIC which represents
a no-op.  Only if !ADDR_SPACE_GENERIC_P, the array and value type are
re-built.

The accesses must be in such a way that any access to the newly created
array matches the address space.  This is the reason for why the
back-end cannot do it on its own:  Just putting the stuff in a specific
section does *not* do the trick.

Bootstrapped ok on x86_64.

Johann

	PR 49857
	* doc/tm.texi.in (Named Address Spaces)
	[TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA]: Add hook anchor.
	* doc/tm.texi: Regenerate.
	* target.def (addr_space) [for_artificial_rodata]: New optional hook.
	* targhooks.c (default_addr_space_for_artificial_rodata): New function.
	* targhooks.h (default_addr_space_for_artificial_rodata): New proto.
	* tree-switch-conversion.c (target.h): Include it.
	(build_one_array): Set address space of value_type according to
	targetm.addr_space.for_artificial_rodata() and rebuild array_type
	if needed.

[-- Attachment #2: pr49857-v3-gcc.diff --]
[-- Type: text/x-patch, Size: 5170 bytes --]

Index: target.def
===================================================================
--- target.def	(revision 250302)
+++ target.def	(working copy)
@@ -3285,6 +3285,25 @@ The default implementation does nothing.
  void, (addr_space_t as, location_t loc),
  default_addr_space_diagnose_usage)
 
+/* Function to get the address space of some compiler-generated
+   read-only data.  Used for optimization purposes only.  */
+DEFHOOK
+(for_artificial_rodata,
+ "Define this hook to return an address space to be used for @var{type},\n\
+usually an artificial lookup-table that would reside in @code{.rodata}.\n\
+It is always safe not to implement this hook or to return\n\
+@code{ADDR_SPACE_GENERIC}.\n\
+\n\
+The hook can be used to put compiler-generated, artificial data in\n\
+static stzorage into a specific address space when this is better suited\n\
+than the generic address space.\n\
+The compiler will also generate all accesses to the respective data\n\
+so that all associated accesses will also use the desired address space.\n\
+An example for such data are the @code{CSWTCH} lookup tables as generated\n\
+by @option{-ftree-switch-conversion}.",
+ addr_space_t, (tree type),
+ default_addr_space_for_artificial_rodata)
+
 HOOK_VECTOR_END (addr_space)
 
 #undef HOOK_PREFIX
Index: targhooks.c
===================================================================
--- targhooks.c	(revision 250302)
+++ targhooks.c	(working copy)
@@ -1397,6 +1397,14 @@ default_addr_space_convert (rtx op ATTRI
   gcc_unreachable ();
 }
 
+/* The default hook for TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA.  */
+
+addr_space_t
+default_addr_space_for_artificial_rodata (tree)
+{
+  return ADDR_SPACE_GENERIC;
+}
+
 bool
 default_hard_regno_scratch_ok (unsigned int regno ATTRIBUTE_UNUSED)
 {
Index: targhooks.h
===================================================================
--- targhooks.h	(revision 250302)
+++ targhooks.h	(working copy)
@@ -184,6 +184,7 @@ extern bool default_addr_space_zero_addr
 extern int default_addr_space_debug (addr_space_t);
 extern void default_addr_space_diagnose_usage (addr_space_t, location_t);
 extern rtx default_addr_space_convert (rtx, tree, tree);
+extern addr_space_t default_addr_space_for_artificial_rodata (tree);
 extern unsigned int default_case_values_threshold (void);
 extern bool default_have_conditional_execution (void);
 
Index: tree-switch-conversion.c
===================================================================
--- tree-switch-conversion.c	(revision 250302)
+++ tree-switch-conversion.c	(working copy)
@@ -46,6 +46,7 @@ Software Foundation, 51 Franklin Street,
 #include "gimplify-me.h"
 #include "tree-cfg.h"
 #include "cfgloop.h"
+#include "target.h"
 
 /* ??? For lang_hooks.types.type_for_mode, but is there a word_mode
    type in the GIMPLE type system that is language-independent?  */
@@ -1136,6 +1137,16 @@ build_one_array (gswitch *swtch, int num
       default_type = TREE_TYPE (info->default_values[num]);
       value_type = array_value_type (swtch, default_type, num, info);
       array_type = build_array_type (value_type, arr_index_type);
+      // Run the following hook on the complete array so the back-end
+      // can inspect details of it.
+      addr_space_t as = targetm.addr_space.for_artificial_rodata (array_type);
+      if (!ADDR_SPACE_GENERIC_P (as))
+	{
+	  int quals = (TYPE_QUALS_NO_ADDR_SPACE (value_type)
+		       | ENCODE_QUAL_ADDR_SPACE (as));
+	  value_type = build_qualified_type (value_type, quals);
+	  array_type = build_array_type (value_type, arr_index_type);
+	}
       if (default_type != value_type)
 	{
 	  unsigned int i;
Index: doc/tm.texi
===================================================================
--- doc/tm.texi	(revision 250302)
+++ doc/tm.texi	(working copy)
@@ -10595,6 +10595,21 @@ the address space as registered with @co
 The default implementation does nothing.
 @end deftypefn
 
+@deftypefn {Target Hook} addr_space_t TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA (tree @var{type})
+Define this hook to return an address space to be used for @var{type},
+usually an artificial lookup-table that would reside in @code{.rodata}.
+It is always safe not to implement this hook or to return
+@code{ADDR_SPACE_GENERIC}.
+
+The hook can be used to put compiler-generated, artificial data in
+static stzorage into a specific address space when this is better suited
+than the generic address space.
+The compiler will also generate all accesses to the respective data
+so that all associated accesses will also use the desired address space.
+An example for such data are the @code{CSWTCH} lookup tables as generated
+by @option{-ftree-switch-conversion}.
+@end deftypefn
+
 @node Misc
 @section Miscellaneous Parameters
 @cindex parameters, miscellaneous
Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in	(revision 250302)
+++ doc/tm.texi.in	(working copy)
@@ -7529,6 +7529,8 @@ c_register_addr_space ("__ea", ADDR_SPAC
 
 @hook TARGET_ADDR_SPACE_DIAGNOSE_USAGE
 
+@hook TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA
+
 @node Misc
 @section Miscellaneous Parameters
 @cindex parameters, miscellaneous

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 2/2,avr] PR49847: Add hook to place read-only lookup-tables in named address-space
  2017-07-27 12:29 [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space Georg-Johann Lay
  2017-07-27 12:34 ` Richard Biener
  2017-07-27 12:41 ` [patch 1/2] " Georg-Johann Lay
@ 2017-07-27 12:50 ` Georg-Johann Lay
  2017-07-27 15:48   ` Denis Chertykov
  2 siblings, 1 reply; 13+ messages in thread
From: Georg-Johann Lay @ 2017-07-27 12:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: Denis Chertykov

[-- Attachment #1: Type: text/plain, Size: 2452 bytes --]

On 27.07.2017 14:29, Georg-Johann Lay wrote:
> For some targets, the best place to put read-only lookup tables as
> generated by -ftree-switch-conversion is not the generic address space
> but some target specific address space.
> 
> This is the case for AVR, where for most devices .rodata must be
> located in RAM.
> 
> Part #1 adds a new, optional target hook that queries the back-end
> about the desired address space.  There is currently only one user:
> tree-switch-conversion.c::build_one_array() which re-builds value_type
> and array_type if the address space returned by the backend is not
> the generic one.
> 
> Part #2 is the AVR part that implements the new hook and adds some
> sugar around it.

This is the AVR part.

It implements the new hook which returns a convenient flash address
space for devices where .rodata is located in RAM:  The 16-bit __flash
for devices with <= 64 KiB flash and 24-bit __memx for > 64 KiB flash.

It adds a new option -madd-space-for-lookup= which allows to pick a
specific address space.

Some new insns and combine-split suport best code generation by the
knowledge that the 24-bit addresses will never point to RAM so that
the expensive decision-at-runtime whether LD or LPM has to be used
can be avoided.

Passed without new regressions on atmega128.

Ok for trunk provided the gcc part 1/2 is approved?

Johann


	Implement TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA.

	PR target/49857
	* config/avr/avr-opts.h: New file.
	* config/avr/avr.opt: Include it.
	(-maddr-space-for-lookup=): New option and...
	(avr_opt_addr_space_for_lookup): ...associated Var.
	(avr_aspace_for_lookup): New option enums used by above.
	* config/avr/avr-protos.h (avr_out_load_flashx): New proto.
	* config/avr/avr.c (avr_out_load_flashx): New function.
	* avr_adjust_insn_length [ADJUST_LEN_LOAD_FLASHX]: Handle it.
	* avr_rtx_costs_1 [ZERO_EXTEND, SIGN_EXTEND]: Handle
	shift-and-extend-by-1 of HI -> PSI.
	[ASHIFT,PSImode]: Describe cost of extend-and-shift-by-1.
	(TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA): Define to...
	(avr_addr_space_for_artificial_rodata): ...this new static function.
	* config/avr/avr.md (unspec): Add UNSPEC_LOAD_FLASHX.
	(adjust_len): Add load_flashx.
	(*ashiftpsi.1_sign_extend.hi, *ashiftpsi.1_zero_extend.hi)
	(*extendpsi.ashift.1.uqi, *load<mode>-flashx): New insns.
	(*split_xload<mode>-cswtch): New insn-and-split.
	* doc/invoke.texi (AVR Options) <-maddr-space-for-lookup=>: Document.


[-- Attachment #2: pr49857-v3-avr.diff --]
[-- Type: text/x-patch, Size: 17253 bytes --]

Index: config/avr/avr-opts.h
===================================================================
--- config/avr/avr-opts.h	(nonexistent)
+++ config/avr/avr-opts.h	(working copy)
@@ -0,0 +1,40 @@
+/* Definitions for option handling for AVR.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef AVR_OPTS_H
+#define AVR_OPTS_H
+
+enum avr_opt_addr_space
+  {
+    AVR_OPT_ADDR_SPACE_flash,
+    AVR_OPT_ADDR_SPACE_flash1,
+    AVR_OPT_ADDR_SPACE_flash2,
+    AVR_OPT_ADDR_SPACE_flash3,
+    AVR_OPT_ADDR_SPACE_flash4,
+    AVR_OPT_ADDR_SPACE_flash5,
+    AVR_OPT_ADDR_SPACE_memx,
+    AVR_OPT_ADDR_SPACE_generic
+  };
+
+#endif /* AVR_OPTS_H */
Index: config/avr/avr-protos.h
===================================================================
--- config/avr/avr-protos.h	(revision 250302)
+++ config/avr/avr-protos.h	(working copy)
@@ -94,6 +94,7 @@ extern const char* avr_out_plus (rtx, rt
 extern const char* avr_out_round (rtx_insn *, rtx*, int* =NULL);
 extern const char* avr_out_addto_sp (rtx*, int*);
 extern const char* avr_out_xload (rtx_insn *, rtx*, int*);
+extern const char* avr_out_load_flashx (rtx_insn*, rtx*, int*);
 extern const char* avr_out_movmem (rtx_insn *, rtx*, int*);
 extern const char* avr_out_insert_bits (rtx*, int*);
 extern bool avr_popcount_each_byte (rtx, int, int);
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(revision 250302)
+++ config/avr/avr.c	(working copy)
@@ -3914,6 +3914,31 @@ avr_out_xload (rtx_insn *insn ATTRIBUTE_
 }
 
 
+/* Load register XOP[0] with flash content from PSI address XOP[1].
+   We have ELPMX, MOVW, load up to 4 bytes and may clobber Z.  */
+
+const char*
+avr_out_load_flashx (rtx_insn*, rtx *xop, int *plen)
+{
+  unsigned n_bytes = GET_MODE_SIZE (GET_MODE (xop[0]));
+  gcc_assert (n_bytes <= 4);
+
+  avr_asm_len ("out __RAMPZ__,%C1", xop, plen, -1);
+  avr_asm_len ("movw ZL,%1", xop, plen, 1);
+
+  avr_asm_len ("elpm %A0,Z+", xop, plen, 1);
+  if (n_bytes >= 2)  avr_asm_len ("elpm %B0,Z+", xop, plen, 1);
+  if (n_bytes >= 3)  avr_asm_len ("elpm %C0,Z+", xop, plen, 1);
+  if (n_bytes >= 4)  avr_asm_len ("elpm %D0,Z+", xop, plen, 1);
+
+  // EBI devices: reset RAMPZ to 0 so that we don't read garbage from RAM.
+  if (AVR_HAVE_RAMPD)
+    avr_asm_len ("out __RAMPZ__,__zero_reg__", xop, plen, 1);
+  
+  return "";
+}
+
+
 const char*
 output_movqi (rtx_insn *insn, rtx operands[], int *plen)
 {
@@ -9438,6 +9463,7 @@ avr_adjust_insn_length (rtx_insn *insn,
     case ADJUST_LEN_MOV32: output_movsisf (insn, op, &len); break;
     case ADJUST_LEN_MOVMEM: avr_out_movmem (insn, op, &len); break;
     case ADJUST_LEN_XLOAD: avr_out_xload (insn, op, &len); break;
+    case ADJUST_LEN_LOAD_FLASHX: avr_out_load_flashx (insn, op, &len); break;
     case ADJUST_LEN_SEXT: avr_out_sign_extend (insn, op, &len); break;
 
     case ADJUST_LEN_SFRACT: avr_out_fract (insn, op, true, &len); break;
@@ -10837,6 +10863,12 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
       return true;
 
     case ZERO_EXTEND:
+      if (PSImode == mode
+          && ASHIFT == GET_CODE (XEXP (x, 0)))
+        {
+          *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
+          return true;
+        }
       *total = COSTS_N_INSNS (GET_MODE_SIZE (mode)
 			      - GET_MODE_SIZE (GET_MODE (XEXP (x, 0))));
       *total += avr_operand_rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)),
@@ -10844,6 +10876,12 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
       return true;
 
     case SIGN_EXTEND:
+      if (PSImode == mode
+          && ASHIFT == GET_CODE (XEXP (x, 0)))
+        {
+          *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
+          return true;
+        }
       *total = COSTS_N_INSNS (GET_MODE_SIZE (mode) + 2
 			      - GET_MODE_SIZE (GET_MODE (XEXP (x, 0))));
       *total += avr_operand_rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)),
@@ -11227,6 +11265,18 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
 	  break;
 
         case PSImode:
+          if (SIGN_EXTEND == GET_CODE (XEXP (x, 0))
+              && const1_rtx == XEXP (x, 1))
+            {
+              *total = COSTS_N_INSNS (3);
+              return true;
+            }
+          if (ZERO_EXTEND == GET_CODE (XEXP (x, 0))
+              && const1_rtx == XEXP (x, 1))
+            {
+              *total = COSTS_N_INSNS (4);
+              return true;
+            }
           if (!CONST_INT_P (XEXP (x, 1)))
             {
               *total = COSTS_N_INSNS (!speed ? 6 : 73);
@@ -13341,6 +13391,66 @@ avr_emit3_fix_outputs (rtx (*gen)(rtx,rt
 }
 
 
+/* Implement `TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA'.  */
+
+static addr_space_t
+avr_addr_space_for_artificial_rodata (tree node)
+{
+  if (TREE_CODE (node) == ARRAY_TYPE
+      && INTEGRAL_TYPE_P (TREE_TYPE (node))
+      && int_cst_value (TYPE_SIZE_UNIT (TREE_TYPE (node))) > 4)
+    {
+      // Due to reduced addressing mode and DImode being lowered to QI,
+      // avoid a total code bloat by leaving DImode in .rodata.
+      return (addr_space_t) ADDR_SPACE_GENERIC;
+    }
+
+  // Map option to address space proper... due to cumbersome include
+  // order we cannot use address spaces as option enum *sigh*.
+
+  int as = ADDR_SPACE_GENERIC;
+  switch (avr_opt_addr_space_for_lookup)
+    {
+    case AVR_OPT_ADDR_SPACE_memx:   as = ADDR_SPACE_MEMX; break;
+    case AVR_OPT_ADDR_SPACE_flash:  as = ADDR_SPACE_FLASH; break;
+    case AVR_OPT_ADDR_SPACE_flash1: as = ADDR_SPACE_FLASH1; break;
+    case AVR_OPT_ADDR_SPACE_flash2: as = ADDR_SPACE_FLASH2; break;
+    case AVR_OPT_ADDR_SPACE_flash3: as = ADDR_SPACE_FLASH3; break;
+    case AVR_OPT_ADDR_SPACE_flash4: as = ADDR_SPACE_FLASH4; break;
+    case AVR_OPT_ADDR_SPACE_flash5: as = ADDR_SPACE_FLASH5; break;
+    default:
+      as = ADDR_SPACE_GENERIC;
+      break;
+    }
+
+  if (as == ADDR_SPACE_GENERIC
+#if defined HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH
+      // If the device has a linear address space and configure found out
+      // that .rodata resides in flash, use .rodata without extra penalty.
+      || avr_arch->flash_pm_offset != 0
+#endif
+      // Reduced Tiny has no address spaces, .rodata is just fine.
+      || AVR_TINY)
+    {
+      return (addr_space_t) ADDR_SPACE_GENERIC;
+    }
+
+  if (avr_n_flash > 1
+      && (AVR_HAVE_ELPM || AVR_HAVE_ELPMX))
+    {
+      if (as == ADDR_SPACE_MEMX
+          || (as == ADDR_SPACE_FLASH1 && avr_n_flash > 1)
+          || (as == ADDR_SPACE_FLASH2 && avr_n_flash > 2)
+          || (as == ADDR_SPACE_FLASH3 && avr_n_flash > 3)
+          || (as == ADDR_SPACE_FLASH4 && avr_n_flash > 4)
+          || (as == ADDR_SPACE_FLASH5 && avr_n_flash > 5))
+        return (addr_space_t) as;
+    }
+
+  return (addr_space_t) ADDR_SPACE_FLASH;
+}
+
+
 /* Worker function for movmemhi expander.
    XOP[0]  Destination as MEM:BLK
    XOP[1]  Source      "     "
@@ -14762,6 +14872,10 @@ avr_fold_builtin (tree fndecl, int n_arg
 #undef  TARGET_ADDR_SPACE_DIAGNOSE_USAGE
 #define TARGET_ADDR_SPACE_DIAGNOSE_USAGE avr_addr_space_diagnose_usage
 
+#undef  TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA
+#define TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA  \
+  avr_addr_space_for_artificial_rodata
+
 #undef  TARGET_MODE_DEPENDENT_ADDRESS_P
 #define TARGET_MODE_DEPENDENT_ADDRESS_P avr_mode_dependent_address_p
 
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(revision 250302)
+++ config/avr/avr.md	(working copy)
@@ -78,6 +78,7 @@ (define_c_enum "unspec"
    UNSPEC_COPYSIGN
    UNSPEC_IDENTITY
    UNSPEC_INSERT_BITS
+   UNSPEC_LOAD_FLASHX
    UNSPEC_ROUND
    ])
 
@@ -158,7 +159,7 @@ (define_attr "adjust_len"
    tsthi, tstpsi, tstsi, compare, compare64, call,
    mov8, mov16, mov24, mov32, reload_in16, reload_in24, reload_in32,
    ufract, sfract, round,
-   xload, movmem,
+   xload, load_flashx, movmem,
    ashlqi, ashrqi, lshrqi,
    ashlhi, ashrhi, lshrhi,
    ashlsi, ashrsi, lshrsi,
@@ -1405,6 +1406,99 @@ (define_insn "*addsi3_zero_extend.hi"
   [(set_attr "length" "4")
    (set_attr "cc" "set_n")])
 
+(define_insn "*ashiftpsi.1_sign_extend.hi"
+  [(set (match_operand:PSI 0 "register_operand"                            "=r")
+        (ashift:PSI (sign_extend:PSI (match_operand:HI 1 "register_operand" "0"))
+                    (const_int 1)))]
+  ""
+  "lsl %A0\;rol %B0\;sbc %C0,%C0"
+  [(set_attr "length" "3")
+   (set_attr "cc" "clobber")])
+
+(define_insn "*ashiftpsi.1_zero_extend.hi"
+  [(set (match_operand:PSI 0 "register_operand"                            "=r")
+        (ashift:PSI (zero_extend:PSI (match_operand:HI 1 "register_operand" "0"))
+                    (const_int 1)))]
+  ""
+  "lsl %A0\;rol %B0\;clr %C0\;rol %C0"
+  [(set_attr "length" "4")
+   (set_attr "cc" "clobber")])
+
+(define_split ; non-matching
+  [(set (match_operand:PSI 0 "register_operand")
+        (plus:PSI (sign_extend:PSI (ashift:HI (match_operand:HI 1 "register_operand")
+                                              (const_int 1)))
+                  (match_operand:PSI 2 "symbol_ref_operand")))]
+  "!reload_completed"
+  [;; "*ashiftpsi.1_sign_extend.hi"
+   (set (match_dup 0)
+        (ashift:PSI (sign_extend:PSI (match_dup 1))
+                    (const_int 1)))
+   ;; "addpsi3"
+   (set (match_dup 0)
+        (plus:PSI (match_dup 0)
+                  (match_dup 2)))]
+  {
+     if (strncmp (XSTR (operands[2], 0), "CSWTCH.", strlen ("CSWTCH.")))
+       FAIL;
+     // Ok, this is from tree-switch-conversion.  If, in the original
+     // pattern, bit 15 or 14 was set, then the code cooked up by
+     // tree-swich-conversion won't work anyways and we can just as well
+     // swap the shift with the extension.
+  })
+        
+
+(define_insn "*extendpsi.ashift.1.uqi"
+  [(set (match_operand:PSI 0 "register_operand"                                          "=r")
+        (any_extend:PSI (ashift:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "0"))
+                                   (const_int 1))))]
+  ""
+  "lsl %A0\;clr %B0\;rol %B0\;clr %C0"
+  [(set_attr "length" "4")
+   (set_attr "cc" "clobber")])
+
+
+;; We abuse the addition to get grep for a split that transforms an
+;; expensive memx library call to an inline sequence of ELPM Z+.
+;; This is legitimate for lookups created by tree-switch-conversion.
+(define_insn_and_split "*split_xload<mode>-cswtch"
+  [(set (match_operand:MOVMODE 0 "register_operand"                      "=r")
+        (mem:MOVMODE (plus:PSI (match_operand:PSI 1 "register_operand"    "r")
+                               (match_operand:PSI 2 "symbol_ref_operand"  "i"))))
+   (clobber (reg:MOVMODE 22))
+   (clobber (reg:QI 21))
+   (clobber (reg:HI REG_Z))]
+  "!reload_completed
+   && AVR_HAVE_ELPMX
+   && SYMBOL_REF_P (operands[2])
+   && 0 == strncmp (XSTR (operands[2], 0), \"CSWTCH.\", strlen (\"CSWTCH.\"))"
+  { gcc_unreachable(); }
+  "&& 1"
+  [;; "*load<mode>-flashx"
+   (parallel [(set (match_dup 0)
+                   (unspec:MOVMODE [(match_dup 3)
+                                    ] UNSPEC_LOAD_FLASHX))
+              (clobber (reg:HI REG_Z))])]
+  {
+    rtx set = single_set (curr_insn);  
+    rtx mem = SET_SRC (set);
+    // "addpsi3"
+    operands[3] = force_reg (PSImode, XEXP (mem, 0));
+  })
+
+
+(define_insn "*load<mode>-flashx"
+  [(set (match_operand:MOVMODE 0 "register_operand"             "=r")
+        (unspec:MOVMODE [(match_operand:PSI 1 "register_operand" "r")
+                         ] UNSPEC_LOAD_FLASHX))
+   (clobber (reg:HI REG_Z))]
+  "AVR_HAVE_ELPMX"
+  {
+    return avr_out_load_flashx (insn, operands, NULL);
+  }
+  [(set_attr "adjust_len" "load_flashx")])
+
+
 (define_insn "addpsi3"
   [(set (match_operand:PSI 0 "register_operand"         "=??r,d ,d,r")
         (plus:PSI (match_operand:PSI 1 "register_operand" "%0,0 ,0,0")
Index: config/avr/avr.opt
===================================================================
--- config/avr/avr.opt	(revision 250302)
+++ config/avr/avr.opt	(working copy)
@@ -18,6 +18,9 @@
 ; along with GCC; see the file COPYING3.  If not see
 ; <http://www.gnu.org/licenses/>.
 
+HeaderInclude
+config/avr/avr-opts.h
+
 mcall-prologues
 Target Report Mask(CALL_PROLOGUES)
 Use subroutines for function prologues and epilogues.
@@ -82,6 +85,38 @@ mpmem-wrap-around
 Target Report
 Make the linker relaxation machine assume that a program counter wrap-around occurs.
 
+maddr-space-for-lookup=
+Target RejectNegative Joined Var(avr_opt_addr_space_for_lookup) Enum(avr_aspace_for_lookup) Init(AVR_OPT_ADDR_SPACE_memx)
+Select the address space that might be used for accessing artificial lookup tables.
+
+Enum
+Name(avr_aspace_for_lookup) Type(enum avr_opt_addr_space)
+Known address space choices (for use with the -maddr-space-for-lookup= option):
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash) Value(AVR_OPT_ADDR_SPACE_flash)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash1) Value(AVR_OPT_ADDR_SPACE_flash1)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash2) Value(AVR_OPT_ADDR_SPACE_flash2)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash3) Value(AVR_OPT_ADDR_SPACE_flash3)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash4) Value(AVR_OPT_ADDR_SPACE_flash4)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash5) Value(AVR_OPT_ADDR_SPACE_flash5)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(memx) Value(AVR_OPT_ADDR_SPACE_memx)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(generic) Value(AVR_OPT_ADDR_SPACE_generic)
+
 maccumulate-args
 Target Report Mask(ACCUMULATE_OUTGOING_ARGS)
 Accumulate outgoing function arguments and acquire/release the needed stack space for outgoing function arguments in function prologue/epilogue.  Without this option, outgoing arguments are pushed before calling a function and popped afterwards.  This option can lead to reduced code size for functions that call many functions that get their arguments on the stack like, for example printf.
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 250302)
+++ doc/invoke.texi	(working copy)
@@ -662,7 +662,7 @@ -imacros @var{file}  -imultilib @var{dir
 @gccoptlist{-mmcu=@var{mcu}  -mabsdata  -maccumulate-args @gol
 -mbranch-cost=@var{cost} @gol
 -mcall-prologues  -mgas-isr-prologues  -mint8 @gol
--mn_flash=@var{size}  -mno-interrupts @gol
+-maddr-space-for-lookup=@var{as} -mn_flash=@var{size}  -mno-interrupts @gol
 -mrelax  -mrmw  -mstrict-X  -mtiny-stack  -mfract-convert-truncate @gol
 -mshort-calls  -nodevicelib @gol
 -Waddr-space-convert  -Wmisspelled-isr}
@@ -15977,6 +15977,40 @@ This option can lead to reduced code siz
 several calls to functions that get their arguments on the stack like
 calls to printf-like functions.
 
+@item -maddr-space-for-lookup=@var{addr-space}
+@opindex maddr-space-for-lookup
+When a simple initialization in a switch statement is converted to an
+initialization from a scalar array, the compiler puts the lookup array
+into @code{.rodata}, which is the generic address space.
+This address space is always used for devices where flash memory is seen
+in the the RAM address range like for @code{avrtiny} and @code{avrxmega3}.
+
+On devices where @code{.rodata} resides in RAM, this option allows to chose
+a different @ref{AVR Named Address Spaces, address space} @var{addr-space}
+which may be one of:
+@table @code
+@item memx
+The default.
+On devices with up to 64@tie{}KiB of program memory, use the 16-bit address
+space @code{__flash}.   On devices with more than 64@tie{}KiB of program
+memory, use a 24-bit flash address space.
+@item flash
+@itemx flash1
+@itemx flash2
+@itemx flash3
+@itemx flash4
+@itemx flash5
+Use the 16-bit address space @code{__flash} @dots{} @code{__flash5},
+respectively. If the respective address space is not supported for the device
+because it is beyond its flash size, @code{__flash} is used as a fallback.
+@item generic
+Use the generic address space, i.e. @code{.rodata}.
+@end table
+Other noticeable options in this context are:
+@option{-f[no-]tree-switch-conversion}, @option{-f[no-]jump-tables}
+and @option{--param case-values-threshold=@var{value}},
+@pxref{Optimize Options}. 
+
 @item -mbranch-cost=@var{cost}
 @opindex mbranch-cost
 Set the branch costs for conditional branch instructions to

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space
  2017-07-27 12:34 ` Richard Biener
@ 2017-07-27 13:32   ` Georg-Johann Lay
  2017-07-28  7:34     ` Richard Biener
  0 siblings, 1 reply; 13+ messages in thread
From: Georg-Johann Lay @ 2017-07-27 13:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Denis Chertykov

On 27.07.2017 14:34, Richard Biener wrote:
> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>> For some targets, the best place to put read-only lookup tables as
>> generated by -ftree-switch-conversion is not the generic address space
>> but some target specific address space.
>>
>> This is the case for AVR, where for most devices .rodata must be
>> located in RAM.
>>
>> Part #1 adds a new, optional target hook that queries the back-end
>> about the desired address space.  There is currently only one user:
>> tree-switch-conversion.c::build_one_array() which re-builds value_type
>> and array_type if the address space returned by the backend is not
>> the generic one.
>>
>> Part #2 is the AVR part that implements the new hook and adds some
>> sugar around it.
> 
> Given that switch-conversion just creates a constant initializer doesn't AVR
> benefit from handling those uniformly (at RTL expansion?).  Not sure but
> I think it goes through the regular constant pool handling.
> 
> Richard.

avr doesn't use constant pools because they are not profitable for
several reasons.

Moreover, it's not sufficient to just patch the pool's section, you'd
also have to patch *all* accesses to also use the exact same address
space so that they use the correct instruction (LPM instead of LD).

Loop optimization, for example, may move addr-space pointers out of
loops and actually does this for some CSWTCH tables.  I didn't look
into pool handling, but don't expect it allows to consistently
patch all accesses in the aftermath.

*If* that's possible, then it should also be possible to patch vtables
and all of their accesses, aka.

https://gcc.gnu.org/PR43745

e.g. in a target-specific tree pass?

With vtables it's basically the same problem, you'd have to patch *all*
accesses to match the vtable's address-space.  c++ need not to expose
address-spaces to the application, handling address-spaces internally
would be sufficient.

Johann

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 2/2,avr] PR49847: Add hook to place read-only lookup-tables in named address-space
  2017-07-27 12:50 ` [patch 2/2,avr] " Georg-Johann Lay
@ 2017-07-27 15:48   ` Denis Chertykov
  0 siblings, 0 replies; 13+ messages in thread
From: Denis Chertykov @ 2017-07-27 15:48 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: gcc-patches

2017-07-27 16:50 GMT+04:00 Georg-Johann Lay <avr@gjlay.de>:
> On 27.07.2017 14:29, Georg-Johann Lay wrote:
>>
>> For some targets, the best place to put read-only lookup tables as
>> generated by -ftree-switch-conversion is not the generic address space
>> but some target specific address space.
>>
>> This is the case for AVR, where for most devices .rodata must be
>> located in RAM.
>>
>> Part #1 adds a new, optional target hook that queries the back-end
>> about the desired address space.  There is currently only one user:
>> tree-switch-conversion.c::build_one_array() which re-builds value_type
>> and array_type if the address space returned by the backend is not
>> the generic one.
>>
>> Part #2 is the AVR part that implements the new hook and adds some
>> sugar around it.
>
>
> This is the AVR part.
>
> It implements the new hook which returns a convenient flash address
> space for devices where .rodata is located in RAM:  The 16-bit __flash
> for devices with <= 64 KiB flash and 24-bit __memx for > 64 KiB flash.
>
> It adds a new option -madd-space-for-lookup= which allows to pick a
> specific address space.
>
> Some new insns and combine-split suport best code generation by the
> knowledge that the 24-bit addresses will never point to RAM so that
> the expensive decision-at-runtime whether LD or LPM has to be used
> can be avoided.
>
> Passed without new regressions on atmega128.
>
> Ok for trunk provided the gcc part 1/2 is approved?

It's cool. Thank you.
Please apply.


>
> Johann
>
>
>         Implement TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA.
>
>         PR target/49857
>         * config/avr/avr-opts.h: New file.
>         * config/avr/avr.opt: Include it.
>         (-maddr-space-for-lookup=): New option and...
>         (avr_opt_addr_space_for_lookup): ...associated Var.
>         (avr_aspace_for_lookup): New option enums used by above.
>         * config/avr/avr-protos.h (avr_out_load_flashx): New proto.
>         * config/avr/avr.c (avr_out_load_flashx): New function.
>         * avr_adjust_insn_length [ADJUST_LEN_LOAD_FLASHX]: Handle it.
>         * avr_rtx_costs_1 [ZERO_EXTEND, SIGN_EXTEND]: Handle
>         shift-and-extend-by-1 of HI -> PSI.
>         [ASHIFT,PSImode]: Describe cost of extend-and-shift-by-1.
>         (TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA): Define to...
>         (avr_addr_space_for_artificial_rodata): ...this new static function.
>         * config/avr/avr.md (unspec): Add UNSPEC_LOAD_FLASHX.
>         (adjust_len): Add load_flashx.
>         (*ashiftpsi.1_sign_extend.hi, *ashiftpsi.1_zero_extend.hi)
>         (*extendpsi.ashift.1.uqi, *load<mode>-flashx): New insns.
>         (*split_xload<mode>-cswtch): New insn-and-split.
>         * doc/invoke.texi (AVR Options) <-maddr-space-for-lookup=>:
> Document.
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space
  2017-07-27 13:32   ` Georg-Johann Lay
@ 2017-07-28  7:34     ` Richard Biener
  2017-07-28 10:18       ` Georg-Johann Lay
  2017-08-16 14:29       ` Georg-Johann Lay
  0 siblings, 2 replies; 13+ messages in thread
From: Richard Biener @ 2017-07-28  7:34 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: gcc-patches, Denis Chertykov

On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
> On 27.07.2017 14:34, Richard Biener wrote:
>>
>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>>>
>>> For some targets, the best place to put read-only lookup tables as
>>> generated by -ftree-switch-conversion is not the generic address space
>>> but some target specific address space.
>>>
>>> This is the case for AVR, where for most devices .rodata must be
>>> located in RAM.
>>>
>>> Part #1 adds a new, optional target hook that queries the back-end
>>> about the desired address space.  There is currently only one user:
>>> tree-switch-conversion.c::build_one_array() which re-builds value_type
>>> and array_type if the address space returned by the backend is not
>>> the generic one.
>>>
>>> Part #2 is the AVR part that implements the new hook and adds some
>>> sugar around it.
>>
>>
>> Given that switch-conversion just creates a constant initializer doesn't
>> AVR
>> benefit from handling those uniformly (at RTL expansion?).  Not sure but
>> I think it goes through the regular constant pool handling.
>>
>> Richard.
>
>
> avr doesn't use constant pools because they are not profitable for
> several reasons.
>
> Moreover, it's not sufficient to just patch the pool's section, you'd
> also have to patch *all* accesses to also use the exact same address
> space so that they use the correct instruction (LPM instead of LD).

Sure.  So then not handle it in constant pool handling but in expanding
the initializers of global vars.  I think the entry for this is
varasm.c:assemble_variable - that sets DECL_RTL for the variable.

> Loop optimization, for example, may move addr-space pointers out of
> loops and actually does this for some CSWTCH tables.  I didn't look
> into pool handling, but don't expect it allows to consistently
> patch all accesses in the aftermath.
>
> *If* that's possible, then it should also be possible to patch vtables
> and all of their accesses, aka.
>
> https://gcc.gnu.org/PR43745
>
> e.g. in a target-specific tree pass?

Ah, ok.  I see what you mean.  Once the address of a global var is
taken the pointers refering to it have to use the proper address-space.

So yeah, it looks like this would need to be done in a (target specific)
pass, probably an IPA pass in case the variables are used from multiple
functions.

> With vtables it's basically the same problem, you'd have to patch *all*
> accesses to match the vtable's address-space.  c++ need not to expose
> address-spaces to the application, handling address-spaces internally
> would be sufficient.

The IPA pass should be able to handle this transparently (vtables are just
global decls with initializers).

Of course the question is whether the linker can handle relocations
between address-spaces for example?

As of the patch I think it is too specific (switch-conversion only)
for my taste.

What you can do I think is in assemble_variable handle the case of
! TREE_PUBLIC && ! TREE_ADDRESSABLE -- non-exported, non-address
taken variables can be put into a different address-space transparently
by adjusting their type to reflect the changed address-space.

This should catch the switch-conversion case that didn't run into the
loop case.  It won't fix vtables I think because they are always exported
unless you use LTO, they also end up address-taken I think.

For vtables (a bigger chunk than switch-conversion decls) adjusting
things in the C++ FE is probably a good idea if it helps AVR.

Richard.

> Johann

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space
  2017-07-28  7:34     ` Richard Biener
@ 2017-07-28 10:18       ` Georg-Johann Lay
  2017-07-28 11:15         ` Richard Biener
  2017-08-16 14:29       ` Georg-Johann Lay
  1 sibling, 1 reply; 13+ messages in thread
From: Georg-Johann Lay @ 2017-07-28 10:18 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Denis Chertykov

Richard Biener schrieb:
> On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>> On 27.07.2017 14:34, Richard Biener wrote:
>>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>>>> For some targets, the best place to put read-only lookup tables as
>>>> generated by -ftree-switch-conversion is not the generic address space
>>>> but some target specific address space.
>>>>
>>>> This is the case for AVR, where for most devices .rodata must be
>>>> located in RAM.
>>>>
>>>> Part #1 adds a new, optional target hook that queries the back-end
>>>> about the desired address space.  There is currently only one user:
>>>> tree-switch-conversion.c::build_one_array() which re-builds value_type
>>>> and array_type if the address space returned by the backend is not
>>>> the generic one.
>>>>
>>>> Part #2 is the AVR part that implements the new hook and adds some
>>>> sugar around it.
>>>
>>> Given that switch-conversion just creates a constant initializer doesn't
>>> AVR
>>> benefit from handling those uniformly (at RTL expansion?).  Not sure but
>>> I think it goes through the regular constant pool handling.
>>>
>>> Richard.
>>
>> avr doesn't use constant pools because they are not profitable for
>> several reasons.
>>
>> Moreover, it's not sufficient to just patch the pool's section, you'd
>> also have to patch *all* accesses to also use the exact same address
>> space so that they use the correct instruction (LPM instead of LD).
> 
> Sure.  So then not handle it in constant pool handling but in expanding
> the initializers of global vars.  I think the entry for this is
> varasm.c:assemble_variable - that sets DECL_RTL for the variable.

Expand and RTL is too late.  The CSWTCH tables are created by some tree
pass, and it also generated accesses which use the address-space of the
element types.  If any tree variable or copy thereof uses generic space,
then you will get wrong code.  In the case of CSWTCH, you must get the
AS into value_type as noted by Steven

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49857#c3

>> Loop optimization, for example, may move addr-space pointers out of
>> loops and actually does this for some CSWTCH tables.  I didn't look
>> into pool handling, but don't expect it allows to consistently
>> patch all accesses in the aftermath.
>>
>> *If* that's possible, then it should also be possible to patch vtables
>> and all of their accesses, aka.
>>
>> https://gcc.gnu.org/PR43745
>>
>> e.g. in a target-specific tree pass?
> 
> Ah, ok.  I see what you mean.  Once the address of a global var is
> taken the pointers refering to it have to use the proper address-space.

Yes, and all copies of these pointers etc. generated so far.  That's the
point of address spaces.  The binary representation of a pointer may
be exact the same for different ASes, but still accesses have to be
handled differently:  Different legitimate addresses, different
instructions to access through these pointers, etc.  Just converting
a pointer to a different AS will give you wrong code.

> So yeah, it looks like this would need to be done in a (target specific)
> pass, probably an IPA pass in case the variables are used from multiple
> functions.

It would at least be an ABI change.  If different modules are accessing
the vtable, then they must use the same AS.  Having a copy of the vtable
in each module is pointless:  If these modules don't agree about the AS
and host vtables in different ASes, we lost from the optimization point:
one module would have it in flash (consuming flash) and a different
module would have it in RAM (consumes RAM and flash to initialize that
RAM at startup).  So this would not use 1x flash as intended, and not
1x flash + 1x RAM like in the non optimized version, but 2x flash +
1x RAM.  Some comdat won't help because that gives wrong code.

>> With vtables it's basically the same problem, you'd have to patch *all*
>> accesses to match the vtable's address-space.  c++ need not to expose
>> address-spaces to the application, handling address-spaces internally
>> would be sufficient.
> 
> The IPA pass should be able to handle this transparently (vtables are just
> global decls with initializers).

Global is bad.  If the address of the object escapes, that's be wrong
code because location + all accesses must agree about the AS.

> Of course the question is whether the linker can handle relocations
> between address-spaces for example?

What's the linker to do with it?  Maybe there's confusion about what
I mean with "address space".  I mean the named address spaces as of

http://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html

according to ISO/IEC DTR 18037.  It's not in the C-standard and only
supported by gcc as a GNU extension.  Suppose the following GNU-C

extern const char c;

char readc (const char *pc)
{
     return c + *pc;
}

The generic AS allows direct addressing and addressing via X=r26 for
example:

readc:
  ; (set (reg:HI 26)
  ;      (reg:HI 24))
	movw r26,r24	 ;  18	*movhi/1
  ; (set (reg:QI 25)
  ;      (mem:QI (reg:HI 26) [S1 A8]))
	ld r25,X	 ;  8	movqi_insn/4
  ; (set (reg:QI 24)
  ;      (mem:QI (symbol_ref:HI ("c")) [S1 A8]))
	lds r24,c	 ;  9	movqi_insn/4
  ; (set (reg:QI 24)
  ;      (plus:QI (reg:QI 24)
  ;               (reg:QI 25)))
	add r24,r25	 ;  15	addqi3/1
	ret

This locates c in RAM and *pc must also be located in RAM which
is a waste. Now use AS __flash, i.e. non-volatile memory:

extern const __flash char c;

char readc (const __flash char *pc)
{
     return c + *pc;
}

readc:
  ; (set (reg:HI 30)
  ;      (reg:HI 24))
	movw r30,r24	 ;  20	*movhi/1
  ; (set (reg:QI 25)
  ;      (mem:QI (reg:HI 30) [S1 A8 AS1]))
	lpm r25,Z	 ;  7	movqi_insn/4
  ; (set (reg:HI 30)
  ;      (symbol_ref:HI ("c")))
	ldi r30,lo8(c)	 ;  17	*movhi/5
	ldi r31,hi8(c)
  ; (set (reg:QI 24)
  ;      (mem:QI (reg:HI 30) [S1 A8 AS1]))
	lpm r24,Z	 ;  8	movqi_insn/4
  ; (set (reg:QI 24)
  ;      (plus:QI (reg:QI 24)
  ;               (reg:QI 25)))
	add r24,r25	 ;  14	addqi3/1
	ret

And access to __flash must use instruction LPM *,Z.
Direct addressing or using X or Y register are not allowed.
Using Z+const addressing is not allowed.  Using LD or LDS on
the same binary address will lead to wrong code, for example will
read from RAM address 0x42 instead of from flash address 0x42.

How could the linker fix an address to generic AS to one that
goes to non-generic AS?  The linker doesn't even have that
information.


> As of the patch I think it is too specific (switch-conversion only)
> for my taste.

It's actually not restricted to switch-conversion. It can be used for
any object provided

1) The object is in static storage and located in that AS.

2) All accesses to that object use that AS.

3) The object is read-only, i.e. not modified after load-time.

> What you can do I think is in assemble_variable handle the case of
> ! TREE_PUBLIC && ! TREE_ADDRESSABLE -- non-exported, non-address
> taken variables can be put into a different address-space transparently
> by adjusting their type to reflect the changed address-space.

/* Assemble everything that is needed for a variable or function 
declaration.
    Not used for automatic variables, and not used for function definitions.
    Should not be called for variables of incomplete structure type.

    TOP_LEVEL is nonzero if this variable has file scope.
    AT_END is nonzero if this is the special handling, at end of 
compilation,
    to define things that have had only tentative definitions.
    DONT_OUTPUT_DATA if nonzero means don't actually output the
    initial value (that will be done by the caller).  */

void
assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED,
		   int at_end ATTRIBUTE_UNUSED, int dont_output_data)
{

So I can patch TYPE_ADDR_SPACE of /decl/ and this will by magic
diffuse to all places that access it? Wow!

> This should catch the switch-conversion case that didn't run into the
> loop case.  It won't fix vtables I think because they are always exported
> unless you use LTO, they also end up address-taken I think.
> 
> For vtables (a bigger chunk than switch-conversion decls) adjusting
> things in the C++ FE is probably a good idea if it helps AVR.
> 
> Richard.

It don't think the C++ FE has an understanding of ASes at all. So even
if it doesn't ICE or error, it's likely non-generic ASes are not
preserved / respected during FE transformations.  For now I'd say
that when you use C+ +/ vtables on such a µC one must live with the
consequences.

Johann

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space
  2017-07-28 10:18       ` Georg-Johann Lay
@ 2017-07-28 11:15         ` Richard Biener
  2017-07-28 18:16           ` Georg-Johann Lay
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Biener @ 2017-07-28 11:15 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: gcc-patches, Denis Chertykov

On Fri, Jul 28, 2017 at 12:18 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
> Richard Biener schrieb:
>
>> On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>>>
>>> On 27.07.2017 14:34, Richard Biener wrote:
>>>>
>>>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>>>>>
>>>>> For some targets, the best place to put read-only lookup tables as
>>>>> generated by -ftree-switch-conversion is not the generic address space
>>>>> but some target specific address space.
>>>>>
>>>>> This is the case for AVR, where for most devices .rodata must be
>>>>> located in RAM.
>>>>>
>>>>> Part #1 adds a new, optional target hook that queries the back-end
>>>>> about the desired address space.  There is currently only one user:
>>>>> tree-switch-conversion.c::build_one_array() which re-builds value_type
>>>>> and array_type if the address space returned by the backend is not
>>>>> the generic one.
>>>>>
>>>>> Part #2 is the AVR part that implements the new hook and adds some
>>>>> sugar around it.
>>>>
>>>>
>>>> Given that switch-conversion just creates a constant initializer doesn't
>>>> AVR
>>>> benefit from handling those uniformly (at RTL expansion?).  Not sure but
>>>> I think it goes through the regular constant pool handling.
>>>>
>>>> Richard.
>>>
>>>
>>> avr doesn't use constant pools because they are not profitable for
>>> several reasons.
>>>
>>> Moreover, it's not sufficient to just patch the pool's section, you'd
>>> also have to patch *all* accesses to also use the exact same address
>>> space so that they use the correct instruction (LPM instead of LD).
>>
>>
>> Sure.  So then not handle it in constant pool handling but in expanding
>> the initializers of global vars.  I think the entry for this is
>> varasm.c:assemble_variable - that sets DECL_RTL for the variable.
>
>
> Expand and RTL is too late.  The CSWTCH tables are created by some tree
> pass, and it also generated accesses which use the address-space of the
> element types.  If any tree variable or copy thereof uses generic space,
> then you will get wrong code.  In the case of CSWTCH, you must get the
> AS into value_type as noted by Steven
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49857#c3
>
>>> Loop optimization, for example, may move addr-space pointers out of
>>> loops and actually does this for some CSWTCH tables.  I didn't look
>>> into pool handling, but don't expect it allows to consistently
>>> patch all accesses in the aftermath.
>>>
>>> *If* that's possible, then it should also be possible to patch vtables
>>> and all of their accesses, aka.
>>>
>>> https://gcc.gnu.org/PR43745
>>>
>>> e.g. in a target-specific tree pass?
>>
>>
>> Ah, ok.  I see what you mean.  Once the address of a global var is
>> taken the pointers refering to it have to use the proper address-space.
>
>
> Yes, and all copies of these pointers etc. generated so far.  That's the
> point of address spaces.  The binary representation of a pointer may
> be exact the same for different ASes, but still accesses have to be
> handled differently:  Different legitimate addresses, different
> instructions to access through these pointers, etc.  Just converting
> a pointer to a different AS will give you wrong code.
>
>> So yeah, it looks like this would need to be done in a (target specific)
>> pass, probably an IPA pass in case the variables are used from multiple
>> functions.
>
>
> It would at least be an ABI change.  If different modules are accessing
> the vtable, then they must use the same AS.  Having a copy of the vtable
> in each module is pointless:  If these modules don't agree about the AS
> and host vtables in different ASes, we lost from the optimization point:
> one module would have it in flash (consuming flash) and a different
> module would have it in RAM (consumes RAM and flash to initialize that
> RAM at startup).  So this would not use 1x flash as intended, and not
> 1x flash + 1x RAM like in the non optimized version, but 2x flash +
> 1x RAM.  Some comdat won't help because that gives wrong code.
>
>>> With vtables it's basically the same problem, you'd have to patch *all*
>>> accesses to match the vtable's address-space.  c++ need not to expose
>>> address-spaces to the application, handling address-spaces internally
>>> would be sufficient.
>>
>>
>> The IPA pass should be able to handle this transparently (vtables are just
>> global decls with initializers).
>
>
> Global is bad.  If the address of the object escapes, that's be wrong
> code because location + all accesses must agree about the AS.
>
>> Of course the question is whether the linker can handle relocations
>> between address-spaces for example?
>
>
> What's the linker to do with it?  Maybe there's confusion about what
> I mean with "address space".  I mean the named address spaces as of
>
> http://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
>
> according to ISO/IEC DTR 18037.  It's not in the C-standard and only
> supported by gcc as a GNU extension.  Suppose the following GNU-C
>
> extern const char c;
>
> char readc (const char *pc)
> {
>     return c + *pc;
> }
>
> The generic AS allows direct addressing and addressing via X=r26 for
> example:
>
> readc:
>  ; (set (reg:HI 26)
>  ;      (reg:HI 24))
>         movw r26,r24     ;  18  *movhi/1
>  ; (set (reg:QI 25)
>  ;      (mem:QI (reg:HI 26) [S1 A8]))
>         ld r25,X         ;  8   movqi_insn/4
>  ; (set (reg:QI 24)
>  ;      (mem:QI (symbol_ref:HI ("c")) [S1 A8]))
>         lds r24,c        ;  9   movqi_insn/4
>  ; (set (reg:QI 24)
>  ;      (plus:QI (reg:QI 24)
>  ;               (reg:QI 25)))
>         add r24,r25      ;  15  addqi3/1
>         ret
>
> This locates c in RAM and *pc must also be located in RAM which
> is a waste. Now use AS __flash, i.e. non-volatile memory:
>
> extern const __flash char c;
>
> char readc (const __flash char *pc)
> {
>     return c + *pc;
> }
>
> readc:
>  ; (set (reg:HI 30)
>  ;      (reg:HI 24))
>         movw r30,r24     ;  20  *movhi/1
>  ; (set (reg:QI 25)
>  ;      (mem:QI (reg:HI 30) [S1 A8 AS1]))
>         lpm r25,Z        ;  7   movqi_insn/4
>  ; (set (reg:HI 30)
>  ;      (symbol_ref:HI ("c")))
>         ldi r30,lo8(c)   ;  17  *movhi/5
>         ldi r31,hi8(c)
>  ; (set (reg:QI 24)
>  ;      (mem:QI (reg:HI 30) [S1 A8 AS1]))
>         lpm r24,Z        ;  8   movqi_insn/4
>  ; (set (reg:QI 24)
>  ;      (plus:QI (reg:QI 24)
>  ;               (reg:QI 25)))
>         add r24,r25      ;  14  addqi3/1
>         ret
>
> And access to __flash must use instruction LPM *,Z.
> Direct addressing or using X or Y register are not allowed.
> Using Z+const addressing is not allowed.  Using LD or LDS on
> the same binary address will lead to wrong code, for example will
> read from RAM address 0x42 instead of from flash address 0x42.
>
> How could the linker fix an address to generic AS to one that
> goes to non-generic AS?  The linker doesn't even have that
> information.
>
>
>> As of the patch I think it is too specific (switch-conversion only)
>> for my taste.
>
>
> It's actually not restricted to switch-conversion. It can be used for
> any object provided
>
> 1) The object is in static storage and located in that AS.
>
> 2) All accesses to that object use that AS.
>
> 3) The object is read-only, i.e. not modified after load-time.
>
>> What you can do I think is in assemble_variable handle the case of
>> ! TREE_PUBLIC && ! TREE_ADDRESSABLE -- non-exported, non-address
>> taken variables can be put into a different address-space transparently
>> by adjusting their type to reflect the changed address-space.
>
>
> /* Assemble everything that is needed for a variable or function
> declaration.
>    Not used for automatic variables, and not used for function definitions.
>    Should not be called for variables of incomplete structure type.
>
>    TOP_LEVEL is nonzero if this variable has file scope.
>    AT_END is nonzero if this is the special handling, at end of compilation,
>    to define things that have had only tentative definitions.
>    DONT_OUTPUT_DATA if nonzero means don't actually output the
>    initial value (that will be done by the caller).  */
>
> void
> assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED,
>                    int at_end ATTRIBUTE_UNUSED, int dont_output_data)
> {
>
> So I can patch TYPE_ADDR_SPACE of /decl/ and this will by magic
> diffuse to all places that access it? Wow!

For the ! TREE_ADDRESSABLE case yes, I think so (of course you
need to double-check...).  ! TREE_PUBLIC is because otherwise
you're not seeing all accesses and thus TREE_ADDRESSABLE cannot
be trusted.  TREE_STATIC should be always set in this function.

>> This should catch the switch-conversion case that didn't run into the
>> loop case.  It won't fix vtables I think because they are always exported
>> unless you use LTO, they also end up address-taken I think.
>>
>> For vtables (a bigger chunk than switch-conversion decls) adjusting
>> things in the C++ FE is probably a good idea if it helps AVR.
>>
>> Richard.
>
>
> It don't think the C++ FE has an understanding of ASes at all. So even
> if it doesn't ICE or error, it's likely non-generic ASes are not
> preserved / respected during FE transformations.  For now I'd say
> that when you use C+ +/ vtables on such a µC one must live with the
> consequences.

Ah - I totally forgot C++ doesn't handle address-spaces...

Richard.

> Johann

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space
  2017-07-28 11:15         ` Richard Biener
@ 2017-07-28 18:16           ` Georg-Johann Lay
  0 siblings, 0 replies; 13+ messages in thread
From: Georg-Johann Lay @ 2017-07-28 18:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Denis Chertykov

Richard Biener schrieb:
> On Fri, Jul 28, 2017 at 12:18 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>> Richard Biener schrieb:
>>
>>> On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>>>> On 27.07.2017 14:34, Richard Biener wrote:
>>>>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>>>>>> For some targets, the best place to put read-only lookup tables as
>>>>>> generated by -ftree-switch-conversion is not the generic address space
>>>>>> but some target specific address space.
>>>>>>
>>>>>> This is the case for AVR, where for most devices .rodata must be
>>>>>> located in RAM.
>>>>>>
>>>>>> Part #1 adds a new, optional target hook that queries the back-end
>>>>>> about the desired address space.  There is currently only one user:
>>>>>> tree-switch-conversion.c::build_one_array() which re-builds value_type
>>>>>> and array_type if the address space returned by the backend is not
>>>>>> the generic one.
>>>>>>
>>>>>> Part #2 is the AVR part that implements the new hook and adds some
>>>>>> sugar around it.
>>>>>
>>>>> Given that switch-conversion just creates a constant initializer doesn't
>>>>> AVR
>>>>> benefit from handling those uniformly (at RTL expansion?).  Not sure but
>>>>> I think it goes through the regular constant pool handling.
>>>>>
>>>>> Richard.
>>>>
>>>> avr doesn't use constant pools because they are not profitable for
>>>> several reasons.
>>>>
>>>> Moreover, it's not sufficient to just patch the pool's section, you'd
>>>> also have to patch *all* accesses to also use the exact same address
>>>> space so that they use the correct instruction (LPM instead of LD).
>>>
>>> Sure.  So then not handle it in constant pool handling but in expanding
>>> the initializers of global vars.  I think the entry for this is
>>> varasm.c:assemble_variable - that sets DECL_RTL for the variable.
>>
>> Expand and RTL is too late.  The CSWTCH tables are created by some tree
>> pass, and it also generated accesses which use the address-space of the
>> element types.  If any tree variable or copy thereof uses generic space,
>> then you will get wrong code.  In the case of CSWTCH, you must get the
>> AS into value_type as noted by Steven
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49857#c3
>>
>>>> Loop optimization, for example, may move addr-space pointers out of
>>>> loops and actually does this for some CSWTCH tables.  I didn't look
>>>> into pool handling, but don't expect it allows to consistently
>>>> patch all accesses in the aftermath.
>>>>
>>>> *If* that's possible, then it should also be possible to patch vtables
>>>> and all of their accesses, aka.
>>>>
>>>> https://gcc.gnu.org/PR43745
>>>>
>>>> e.g. in a target-specific tree pass?
>>>
>>> Ah, ok.  I see what you mean.  Once the address of a global var is
>>> taken the pointers refering to it have to use the proper address-space.
>>
>> Yes, and all copies of these pointers etc. generated so far.  That's the
>> point of address spaces.  The binary representation of a pointer may
>> be exact the same for different ASes, but still accesses have to be
>> handled differently:  Different legitimate addresses, different
>> instructions to access through these pointers, etc.  Just converting
>> a pointer to a different AS will give you wrong code.
>>
>>> So yeah, it looks like this would need to be done in a (target specific)
>>> pass, probably an IPA pass in case the variables are used from multiple
>>> functions.
>>
>> It would at least be an ABI change.  If different modules are accessing
>> the vtable, then they must use the same AS.  Having a copy of the vtable
>> in each module is pointless:  If these modules don't agree about the AS
>> and host vtables in different ASes, we lost from the optimization point:
>> one module would have it in flash (consuming flash) and a different
>> module would have it in RAM (consumes RAM and flash to initialize that
>> RAM at startup).  So this would not use 1x flash as intended, and not
>> 1x flash + 1x RAM like in the non optimized version, but 2x flash +
>> 1x RAM.  Some comdat won't help because that gives wrong code.
>>
>>>> With vtables it's basically the same problem, you'd have to patch *all*
>>>> accesses to match the vtable's address-space.  c++ need not to expose
>>>> address-spaces to the application, handling address-spaces internally
>>>> would be sufficient.
>>>
>>> The IPA pass should be able to handle this transparently (vtables are just
>>> global decls with initializers).
>>
>> Global is bad.  If the address of the object escapes, that's be wrong
>> code because location + all accesses must agree about the AS.
>>
>>> Of course the question is whether the linker can handle relocations
>>> between address-spaces for example?
>>
>> What's the linker to do with it?  Maybe there's confusion about what
>> I mean with "address space".  I mean the named address spaces as of
>>
>> http://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
>>
>> according to ISO/IEC DTR 18037.  It's not in the C-standard and only
>> supported by gcc as a GNU extension.  Suppose the following GNU-C
>>
>> extern const char c;
>>
>> char readc (const char *pc)
>> {
>>     return c + *pc;
>> }
>>
>> The generic AS allows direct addressing and addressing via X=r26 for
>> example:
>>
>> readc:
>>  ; (set (reg:HI 26)
>>  ;      (reg:HI 24))
>>         movw r26,r24     ;  18  *movhi/1
>>  ; (set (reg:QI 25)
>>  ;      (mem:QI (reg:HI 26) [S1 A8]))
>>         ld r25,X         ;  8   movqi_insn/4
>>  ; (set (reg:QI 24)
>>  ;      (mem:QI (symbol_ref:HI ("c")) [S1 A8]))
>>         lds r24,c        ;  9   movqi_insn/4
>>  ; (set (reg:QI 24)
>>  ;      (plus:QI (reg:QI 24)
>>  ;               (reg:QI 25)))
>>         add r24,r25      ;  15  addqi3/1
>>         ret
>>
>> This locates c in RAM and *pc must also be located in RAM which
>> is a waste. Now use AS __flash, i.e. non-volatile memory:
>>
>> extern const __flash char c;
>>
>> char readc (const __flash char *pc)
>> {
>>     return c + *pc;
>> }
>>
>> readc:
>>  ; (set (reg:HI 30)
>>  ;      (reg:HI 24))
>>         movw r30,r24     ;  20  *movhi/1
>>  ; (set (reg:QI 25)
>>  ;      (mem:QI (reg:HI 30) [S1 A8 AS1]))
>>         lpm r25,Z        ;  7   movqi_insn/4
>>  ; (set (reg:HI 30)
>>  ;      (symbol_ref:HI ("c")))
>>         ldi r30,lo8(c)   ;  17  *movhi/5
>>         ldi r31,hi8(c)
>>  ; (set (reg:QI 24)
>>  ;      (mem:QI (reg:HI 30) [S1 A8 AS1]))
>>         lpm r24,Z        ;  8   movqi_insn/4
>>  ; (set (reg:QI 24)
>>  ;      (plus:QI (reg:QI 24)
>>  ;               (reg:QI 25)))
>>         add r24,r25      ;  14  addqi3/1
>>         ret
>>
>> And access to __flash must use instruction LPM *,Z.
>> Direct addressing or using X or Y register are not allowed.
>> Using Z+const addressing is not allowed.  Using LD or LDS on
>> the same binary address will lead to wrong code, for example will
>> read from RAM address 0x42 instead of from flash address 0x42.
>>
>> How could the linker fix an address to generic AS to one that
>> goes to non-generic AS?  The linker doesn't even have that
>> information.
>>
>>
>>> As of the patch I think it is too specific (switch-conversion only)
>>> for my taste.
>>
>> It's actually not restricted to switch-conversion. It can be used for
>> any object provided
>>
>> 1) The object is in static storage and located in that AS.
>>
>> 2) All accesses to that object use that AS.
>>
>> 3) The object is read-only, i.e. not modified after load-time.
>>
>>> What you can do I think is in assemble_variable handle the case of
>>> ! TREE_PUBLIC && ! TREE_ADDRESSABLE -- non-exported, non-address
>>> taken variables can be put into a different address-space transparently
>>> by adjusting their type to reflect the changed address-space.
>>
>> /* Assemble everything that is needed for a variable or function
>> declaration.
>>    Not used for automatic variables, and not used for function definitions.
>>    Should not be called for variables of incomplete structure type.
>>
>>    TOP_LEVEL is nonzero if this variable has file scope.
>>    AT_END is nonzero if this is the special handling, at end of compilation,
>>    to define things that have had only tentative definitions.
>>    DONT_OUTPUT_DATA if nonzero means don't actually output the
>>    initial value (that will be done by the caller).  */
>>
>> void
>> assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED,
>>                    int at_end ATTRIBUTE_UNUSED, int dont_output_data)
>> {
>>
>> So I can patch TYPE_ADDR_SPACE of /decl/ and this will by magic
>> diffuse to all places that access it? Wow!
> 
> For the ! TREE_ADDRESSABLE case yes, I think so (of course you
> need to double-check...).  ! TREE_PUBLIC is because otherwise
> you're not seeing all accesses and thus TREE_ADDRESSABLE cannot
> be trusted.  TREE_STATIC should be always set in this function.

I don't think using this is safe because you don't have control
over all accesses.  For example, stuff might be accessed by means
of inline asm.  This is not the case for CSWTCH because it is not
exposed to the user's code and user cannot refer to the symbol.

So it must also be DECL_ARTIFICIAL.

If the user generated an object and has a symbol, she may expect
that the object is in the address space she assigned (and use asm
code as appropriate for that AS).  Using a different AS than
specified by the user may be considered a bug.

If a user wants stuff to reside in flash she can do so by adjusting
her code and definitions, but this is not the case for CSWTCH.  The
only thing she can do is -fno-tree-switch-conversion or -fno-jump-tables
to avoid waste of RAM.

And when assemble_variable is called from the C++ FE then changing AS
might nuke the compiler, so you'd have to approve ugly testing for
language FE...

>>> This should catch the switch-conversion case that didn't run into the
>>> loop case.  It won't fix vtables I think because they are always exported
>>> unless you use LTO, they also end up address-taken I think.
>>>
>>> For vtables (a bigger chunk than switch-conversion decls) adjusting
>>> things in the C++ FE is probably a good idea if it helps AVR.
>>>
>>> Richard.
>>
>> It don't think the C++ FE has an understanding of ASes at all. So even
>> if it doesn't ICE or error, it's likely non-generic ASes are not
>> preserved / respected during FE transformations.  For now I'd say
>> that when you use C+ +/ vtables on such a µC one must live with the
>> consequences.
> 
> Ah - I totally forgot C++ doesn't handle address-spaces...

And never will.

Johann

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space
  2017-07-28  7:34     ` Richard Biener
  2017-07-28 10:18       ` Georg-Johann Lay
@ 2017-08-16 14:29       ` Georg-Johann Lay
  2017-08-18 10:35         ` Richard Biener
  1 sibling, 1 reply; 13+ messages in thread
From: Georg-Johann Lay @ 2017-08-16 14:29 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Denis Chertykov

On 28.07.2017 09:34, Richard Biener wrote:
> On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>> On 27.07.2017 14:34, Richard Biener wrote:
>>>
>>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>>>>
>>>> For some targets, the best place to put read-only lookup tables as
>>>> generated by -ftree-switch-conversion is not the generic address space
>>>> but some target specific address space.
>>>>
>>>> This is the case for AVR, where for most devices .rodata must be
>>>> located in RAM.
>>>>
>>>> Part #1 adds a new, optional target hook that queries the back-end
>>>> about the desired address space.  There is currently only one user:
>>>> tree-switch-conversion.c::build_one_array() which re-builds value_type
>>>> and array_type if the address space returned by the backend is not
>>>> the generic one.
>>>>
>>>> Part #2 is the AVR part that implements the new hook and adds some
>>>> sugar around it.
>>>
>>>
>>> Given that switch-conversion just creates a constant initializer doesn't
>>> AVR
>>> benefit from handling those uniformly (at RTL expansion?).  Not sure but
>>> I think it goes through the regular constant pool handling.
>>>
>>> Richard.
>>
>>
>> avr doesn't use constant pools because they are not profitable for
>> several reasons.
>>
>> Moreover, it's not sufficient to just patch the pool's section, you'd
>> also have to patch *all* accesses to also use the exact same address
>> space so that they use the correct instruction (LPM instead of LD).
> 
> Sure.  So then not handle it in constant pool handling but in expanding
> the initializers of global vars.  I think the entry for this is
> varasm.c:assemble_variable - that sets DECL_RTL for the variable.

That cannot work, and here is why; just for completeness:

cgraphunit.c::symbol_table::compile()

runs

   ...
   expand_all_functions ();
   output_variables (); // runs assemble_variable
   ...

So any patching of DECL_RTL will result in wrong code: The address
space of the decl won't match the address space used by the code
accessing decl.

Johann

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space
  2017-08-16 14:29       ` Georg-Johann Lay
@ 2017-08-18 10:35         ` Richard Biener
  2017-08-18 14:54           ` Georg-Johann Lay
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Biener @ 2017-08-18 10:35 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: gcc-patches, Denis Chertykov

On Wed, Aug 16, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
> On 28.07.2017 09:34, Richard Biener wrote:
>>
>> On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>>>
>>> On 27.07.2017 14:34, Richard Biener wrote:
>>>>
>>>>
>>>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>>>>>
>>>>>
>>>>> For some targets, the best place to put read-only lookup tables as
>>>>> generated by -ftree-switch-conversion is not the generic address space
>>>>> but some target specific address space.
>>>>>
>>>>> This is the case for AVR, where for most devices .rodata must be
>>>>> located in RAM.
>>>>>
>>>>> Part #1 adds a new, optional target hook that queries the back-end
>>>>> about the desired address space.  There is currently only one user:
>>>>> tree-switch-conversion.c::build_one_array() which re-builds value_type
>>>>> and array_type if the address space returned by the backend is not
>>>>> the generic one.
>>>>>
>>>>> Part #2 is the AVR part that implements the new hook and adds some
>>>>> sugar around it.
>>>>
>>>>
>>>>
>>>> Given that switch-conversion just creates a constant initializer doesn't
>>>> AVR
>>>> benefit from handling those uniformly (at RTL expansion?).  Not sure but
>>>> I think it goes through the regular constant pool handling.
>>>>
>>>> Richard.
>>>
>>>
>>>
>>> avr doesn't use constant pools because they are not profitable for
>>> several reasons.
>>>
>>> Moreover, it's not sufficient to just patch the pool's section, you'd
>>> also have to patch *all* accesses to also use the exact same address
>>> space so that they use the correct instruction (LPM instead of LD).
>>
>>
>> Sure.  So then not handle it in constant pool handling but in expanding
>> the initializers of global vars.  I think the entry for this is
>> varasm.c:assemble_variable - that sets DECL_RTL for the variable.
>
>
> That cannot work, and here is why; just for completeness:
>
> cgraphunit.c::symbol_table::compile()
>
> runs
>
>   ...
>   expand_all_functions ();
>   output_variables (); // runs assemble_variable
>   ...
>
> So any patching of DECL_RTL will result in wrong code: The address
> space of the decl won't match the address space used by the code
> accessing decl.

Ok, so it's more tricky to hack it that way (you'd need to catch the
time the decl gets its DECL_RTL set, not sure where that happens
for globals).

That leaves doing a more broad transform.  One convenient place
to hook in would be the ipa_single_use pass which computes
the varpool_node.used_by_single_function flag.  All such variables
would be suitable for promotion (with some additional constraints
I suppose).  You'd add a transform phase to the pass that rewrites
the decls and performs GIMPLE IL adjustments (I think you need
to patch memory references to use the address-space).

Of course there would need to be a target hook determining
if/what section/address-space a variable should be promoted to
which somehow would allow switch-conversion to use that
as well.  Ho humm.

That said, do you think the switch-conversion created decl is
the only one that benefits from compiler-directed promotion
to different address-space?

Richard.

> Johann

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space
  2017-08-18 10:35         ` Richard Biener
@ 2017-08-18 14:54           ` Georg-Johann Lay
  0 siblings, 0 replies; 13+ messages in thread
From: Georg-Johann Lay @ 2017-08-18 14:54 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Denis Chertykov

On 18.08.2017 12:01, Richard Biener wrote:
> On Wed, Aug 16, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>> On 28.07.2017 09:34, Richard Biener wrote:
>>> On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>>>> On 27.07.2017 14:34, Richard Biener wrote:
>>>>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>>>>>>
>>>>>> For some targets, the best place to put read-only lookup tables as
>>>>>> generated by -ftree-switch-conversion is not the generic address space
>>>>>> but some target specific address space.
>>>>>>
>>>>>> This is the case for AVR, where for most devices .rodata must be
>>>>>> located in RAM.
>>>>>>
>>>>>> Part #1 adds a new, optional target hook that queries the back-end
>>>>>> about the desired address space.  There is currently only one user:
>>>>>> tree-switch-conversion.c::build_one_array() which re-builds value_type
>>>>>> and array_type if the address space returned by the backend is not
>>>>>> the generic one.
>>>>>>
>>>>>> Part #2 is the AVR part that implements the new hook and adds some
>>>>>> sugar around it.
>>>>>
>>>>> Given that switch-conversion just creates a constant initializer doesn't
>>>>> AVR
>>>>> benefit from handling those uniformly (at RTL expansion?).  Not sure but
>>>>> I think it goes through the regular constant pool handling.
>>>>>
>>>>> Richard.
>>>>
>>>> avr doesn't use constant pools because they are not profitable for
>>>> several reasons.
>>>>
>>>> Moreover, it's not sufficient to just patch the pool's section, you'd
>>>> also have to patch *all* accesses to also use the exact same address
>>>> space so that they use the correct instruction (LPM instead of LD).
>>>
>>> Sure.  So then not handle it in constant pool handling but in expanding
>>> the initializers of global vars.  I think the entry for this is
>>> varasm.c:assemble_variable - that sets DECL_RTL for the variable.
>>
>> That cannot work, and here is why; just for completeness:
>>
>> cgraphunit.c::symbol_table::compile()
>>
>> runs
>>    ...
>>    expand_all_functions ();
>>    output_variables (); // runs assemble_variable
>>    ...
>>
>> So any patching of DECL_RTL will result in wrong code: The address
>> space of the decl won't match the address space used by the code
>> accessing decl.
> 
> Ok, so it's more tricky to hack it that way (you'd need to catch the
> time the decl gets its DECL_RTL set, not sure where that happens
> for globals).

Too late, IMO.  Problem is that any reference must also have the
same AS.  Hence if somewhere, before patching decl_rtl, some code
uses TREE_TYPE on respective decl, that type will miss the AS,
same for all variables / pointers created with that type.

Been there, seen it...

> That leaves doing a more broad transform.  One convenient place
> to hook in would be the ipa_single_use pass which computes
> the varpool_node.used_by_single_function flag.  All such variables
> would be suitable for promotion (with some additional constraints
> I suppose).  You'd add a transform phase to the pass that rewrites
> the decls and performs GIMPLE IL adjustments (I think you need
> to patch memory references to use the address-space).

Rewriting DECLs is not enough.  Only the place that cooks up
the decl (implicitly) knows whether it's appropriate to use
different AS and whether is has control over diffusion into
TREE_TYPEs. And as we have strong filter (see below) which
includes DECL_ARTIFICIAL, almost nothing will remain to be
transformed anyway...

> Of course there would need to be a target hook determining
> if/what section/address-space a variable should be promoted to
> which somehow would allow switch-conversion to use that
> as well.  Ho humm.
> 
> That said, do you think the switch-conversion created decl is
> the only one that benefits from compiler-directed promotion
> to different address-space?

Yes. Only compiler-generated lookup-tables may be transformed,
and the only such tables I know of are CSWTCH and vtables.

The conditions so far are:

* TREE_STATIC
* !TREE_PUBLIC
* TREE_READONLY (at least for the case this PR is after)
* DECL_ARTIFICIAL (or otherwise exclude inline asm)
* decl must not be cooked up by non-C FE (i.e. vtables are out).
* Not weak, comdat or linkonce (again, vtables are out).
* Not for debug purpose (like what dwarf2asm does).
* No section attribute (actually also CSWTCH could be mentioned
   in linker script, but we can argue that artificials are
   managed by the compiler + user has option to control CSWTCH).

What remains is CSWTCH.

Johann

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-08-18 14:36 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-27 12:29 [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space Georg-Johann Lay
2017-07-27 12:34 ` Richard Biener
2017-07-27 13:32   ` Georg-Johann Lay
2017-07-28  7:34     ` Richard Biener
2017-07-28 10:18       ` Georg-Johann Lay
2017-07-28 11:15         ` Richard Biener
2017-07-28 18:16           ` Georg-Johann Lay
2017-08-16 14:29       ` Georg-Johann Lay
2017-08-18 10:35         ` Richard Biener
2017-08-18 14:54           ` Georg-Johann Lay
2017-07-27 12:41 ` [patch 1/2] " Georg-Johann Lay
2017-07-27 12:50 ` [patch 2/2,avr] " Georg-Johann Lay
2017-07-27 15:48   ` Denis Chertykov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).