public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/3] Add zero cycle move support
@ 2021-11-19 14:49 Michael Meissner
  2021-11-19 14:53 ` [PATCH 1/3] Add power10 zero cycle moves for switches & indirect jumps Michael Meissner
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Michael Meissner @ 2021-11-19 14:49 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

The next set of 3 patches adds zero cycle move support for Power10.  A zero
cycle move is when a move to the LR/CTR/TAR register is adjacent to the jump
through that register, so that the two instructions can be fused together.

At the moment, this set of three patches adds support for zero cycle moves for
indirect jumps and switch tables using the CTR register.  Potential zero cycle
moves for doing returns are not currently handled.

In looking at the code, I discovered that zero cycle moves alone are not as
helpful unless we can also eliminate the add instruction before the jump.  I
also noticed that the various power10 fusion options are only enabled with
-mcpu=power10.  I added a patch to enable the fusion for -mtune=power10 as well.

I have done bootstraps and make check with these patches installed on both
little endian power9 and little endian power10 systems.  Can I install these
patches?

The following patches will be posted:

1) Patch to add zero cycle move for indirect jumps and switches.

2) Patch to enable p10 fusion for -mtune=power10 in addition to -mcpu=power10.

3) Patch to use absolute addresses for switch tables instead of relative
   addresses if zero cycle fusion is enabled.
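
As an illustration of the source constructs these patches target, here is a
minimal sketch in C.  The function names are hypothetical and not part of the
patches.  A dense switch is a typical candidate for a jump table (the tablejump
patterns), and a GNU C computed goto goes through the indirect_jump pattern.
Whether the compiler actually emits a jump table depends on the optimization
level and the case density, so treat this as an assumption rather than a
guarantee.

```c
/* Dense switch: a typical candidate for a jump table (tablejump).
   The compiler may still choose compare-and-branch sequences instead.  */
static int
dispatch_switch (int op, int x)
{
  switch (op)
    {
    case 0: return x + 1;
    case 1: return x - 1;
    case 2: return x * 2;
    case 3: return x / 2;
    case 4: return -x;
    case 5: return x * x;
    default: return 0;
    }
}

/* Computed goto (GNU C "labels as values" extension): the jump through
   a pointer is an indirect jump.  */
static int
dispatch_goto (int op, int x)
{
  static void *const targets[] = { &&inc, &&dec, &&dbl };

  if (op < 0 || op > 2)
    return 0;
  goto *targets[op];

 inc: return x + 1;
 dec: return x - 1;
 dbl: return x * 2;
}
```

Compiling such code with -mcpu=power10 and inspecting the assembly for an
adjacent mtctr/bctr pair is one way to check whether the fusion opportunity is
being exploited.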

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com


* [PATCH 1/3] Add power10 zero cycle moves for switches & indirect jumps
  2021-11-19 14:49 [PATCH 0/3] Add zero cycle move support Michael Meissner
@ 2021-11-19 14:53 ` Michael Meissner
  2021-11-22 16:36   ` Bill Schmidt
  2021-12-13 17:10   ` Ping: " Michael Meissner
  2021-11-19 14:55 ` [PATCH 2/3] Set power10 fusion if -mtune=power10 Michael Meissner
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 15+ messages in thread
From: Michael Meissner @ 2021-11-19 14:53 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Add power10 zero cycle moves for switches.

Power10 will fuse adjacent 'mtctr' and 'bctr' instructions to form zero
cycle moves.  This code exploits this fusion opportunity.

I have built bootstrapped compilers with this patch on little endian power9 and
power10 systems with no regressions.  Can I install this into the master
branch?

2021-11-19  Michael Meissner  <meissner@the-meissners.org>

	* config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add
	support for -mpower10-fusion-zero-cycle.
	(POWERPC_MASKS): Likewise.
	* config/rs6000/rs6000.c (rs6000_option_override_internal):
	Likewise.
	* config/rs6000/rs6000.md (indirect_jump): Support zero cycle
	moves.
	(indirect_jump<mode>_zero_cycle): New insn.
	(tablejump<mode>_normal): Support zero cycle moves.
	(tablejump<mode>_absolute): Likewise.
	(tablejump<mode>_insn_zero_cycle): New insn.
	* config/rs6000/rs6000.opt (-mpower10-fusion-zero-cycle): New
	debug switch.
---
 gcc/config/rs6000/rs6000-cpus.def |  4 ++-
 gcc/config/rs6000/rs6000.c        |  4 +++
 gcc/config/rs6000/rs6000.md       | 52 ++++++++++++++++++++++++++++---
 gcc/config/rs6000/rs6000.opt      |  4 +++
 4 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index f5812da0184..cc072ee94ea 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -91,7 +91,8 @@
 				 | OPTION_MASK_P10_FUSION_LOGADD 	\
 				 | OPTION_MASK_P10_FUSION_ADDLOG	\
 				 | OPTION_MASK_P10_FUSION_2ADD		\
-				 | OPTION_MASK_P10_FUSION_2STORE)
+				 | OPTION_MASK_P10_FUSION_2STORE	\
+				 | OPTION_MASK_P10_FUSION_ZERO_CYCLE)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS	(OPTION_MASK_FLOAT128_HW		\
@@ -145,6 +146,7 @@
 				 | OPTION_MASK_P10_FUSION_ADDLOG	\
 				 | OPTION_MASK_P10_FUSION_2ADD    	\
 				 | OPTION_MASK_P10_FUSION_2STORE	\
+				 | OPTION_MASK_P10_FUSION_ZERO_CYCLE	\
 				 | OPTION_MASK_HTM			\
 				 | OPTION_MASK_ISEL			\
 				 | OPTION_MASK_MFCRF			\
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e4843eb0f1c..6780304a5eb 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4497,6 +4497,10 @@ rs6000_option_override_internal (bool global_init_p)
       && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0)
     rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE;
 
+  if (TARGET_POWER10
+      && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_ZERO_CYCLE) == 0)
+    rs6000_isa_flags |= OPTION_MASK_P10_FUSION_ZERO_CYCLE;
+
   /* Turn off vector pair/mma options on non-power10 systems.  */
   else if (!TARGET_POWER10 && TARGET_MMA)
     {
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..ea41eb4ada3 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -12988,15 +12988,34 @@ (define_expand "indirect_jump"
     emit_jump_insn (gen_indirect_jump_nospec (Pmode, operands[0], ccreg));
     DONE;
   }
+  if (TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE)
+    {
+      emit_jump_insn (gen_indirect_jump_zero_cycle (Pmode, operands[0]));
+      DONE;
+    }
 })
 
 (define_insn "*indirect_jump<mode>"
   [(set (pc)
 	(match_operand:P 0 "register_operand" "c,*l"))]
-  "rs6000_speculate_indirect_jumps"
+  "rs6000_speculate_indirect_jumps
+   && !(TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE)"
   "b%T0"
   [(set_attr "type" "jmpreg")])
 
+(define_insn "@indirect_jump<mode>_zero_cycle"
+  [(set (pc)
+	(match_operand:P 0 "register_operand" "r,r,!cl"))
+   (clobber (match_scratch:P 1 "=c,*l,X"))]
+  "rs6000_speculate_indirect_jumps && TARGET_P10_FUSION
+   && TARGET_P10_FUSION_ZERO_CYCLE"
+  "@
+   mt%T1 %0\;b%T1
+   mt%T1 %0\;b%T1
+   b%T0"
+  [(set_attr "type" "jmpreg")
+   (set_attr "length" "8,8,4")])
+
 (define_insn "@indirect_jump<mode>_nospec"
   [(set (pc) (match_operand:P 0 "register_operand" "c,*l"))
    (clobber (match_operand:CC 1 "cc_reg_operand" "=y,y"))]
@@ -13050,7 +13069,11 @@ (define_expand "@tablejump<mode>_normal"
   rtx addr = gen_reg_rtx (Pmode);
 
   emit_insn (gen_add<mode>3 (addr, off, lab));
-  emit_jump_insn (gen_tablejump_insn_normal (Pmode, addr, operands[1]));
+  rtx insn = (TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE
+	      ? gen_tablejump_insn_zero_cycle (Pmode, addr, operands[1])
+	      : gen_tablejump_insn_normal (Pmode, addr, operands[1]));
+
+  emit_jump_insn (insn);
   DONE;
 })
 
@@ -13062,7 +13085,11 @@ (define_expand "@tablejump<mode>_absolute"
   rtx addr = gen_reg_rtx (Pmode);
   emit_move_insn (addr, operands[0]);
 
-  emit_jump_insn (gen_tablejump_insn_normal (Pmode, addr, operands[1]));
+  rtx insn = (TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE
+	      ? gen_tablejump_insn_zero_cycle (Pmode, addr, operands[1])
+	      : gen_tablejump_insn_normal (Pmode, addr, operands[1]));
+
+  emit_jump_insn (insn);
   DONE;
 })
 
@@ -13107,10 +13134,27 @@ (define_insn "@tablejump<mode>_insn_normal"
   [(set (pc)
 	(match_operand:P 0 "register_operand" "c,*l"))
    (use (label_ref (match_operand 1)))]
-  "rs6000_speculate_indirect_jumps"
+  "rs6000_speculate_indirect_jumps
+   && !(TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE)"
   "b%T0"
   [(set_attr "type" "jmpreg")])
 
+;; Version of indirect jump that fuses the mtctr to bctr to achieve 0 cycle
+;; moves on Power10.
+(define_insn "@tablejump<mode>_insn_zero_cycle"
+  [(set (pc)
+	(match_operand:P 0 "register_operand" "r,r,!cl"))
+   (use (label_ref (match_operand 1)))
+   (clobber (match_scratch:P 2 "=c,*l,X"))]
+  "rs6000_speculate_indirect_jumps && TARGET_P10_FUSION
+   && TARGET_P10_FUSION_ZERO_CYCLE"
+  "@
+   mt%T2 %0\;b%T2
+   mt%T2 %0\;b%T2
+   b%T0"
+  [(set_attr "type" "jmpreg")
+   (set_attr "length" "8,8,4")])
+
 (define_insn "@tablejump<mode>_insn_nospec"
   [(set (pc)
 	(match_operand:P 0 "register_operand" "c,*l"))
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..ba674947557 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -518,6 +518,10 @@ mpower10-fusion-2store
 Target Undocumented Mask(P10_FUSION_2STORE) Var(rs6000_isa_flags)
 Fuse certain store operations together for better performance on power10.
 
+mpower10-fusion-zero-cycle
+Target Undocumented Mask(P10_FUSION_ZERO_CYCLE) Var(rs6000_isa_flags)
+Fuse move to special register and jump for better performance on power10.
+
 mcrypto
 Target Mask(CRYPTO) Var(rs6000_isa_flags)
 Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
-- 
2.31.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com


* [PATCH 2/3] Set power10 fusion if -mtune=power10.
  2021-11-19 14:49 [PATCH 0/3] Add zero cycle move support Michael Meissner
  2021-11-19 14:53 ` [PATCH 1/3] Add power10 zero cycle moves for switches & indirect jumps Michael Meissner
@ 2021-11-19 14:55 ` Michael Meissner
  2021-11-22 16:06   ` Bill Schmidt
  2021-12-13 17:12   ` Ping: " Michael Meissner
  2021-11-19 14:57 ` [PATCH 3/3] Use absolute switch table addresses for zero cycle moves Michael Meissner
  2021-11-22 15:57 ` [PATCH 0/3] Add zero cycle move support Bill Schmidt
  3 siblings, 2 replies; 15+ messages in thread
From: Michael Meissner @ 2021-11-19 14:55 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Set power10 fusion if -mtune=power10.

While doing the patch for zero cycle moves for switch statements and indirect
jumps, I noticed that fusion support is only enabled with -mcpu=power10.  This
patch also enables power10 fusion when -mtune=power10 is used.
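
The defaulting rule used in rs6000_option_override_internal can be sketched as
a small stand-alone model in C.  The enum values and function names below are
hypothetical stand-ins for the OPTION_MASK_P10_FUSION* bits and are not the
real GCC definitions.  The sketch only shows the idea: the umbrella fusion flag
is turned on for either -mcpu=power10 or -mtune=power10, the sub-options follow
the umbrella flag, and a flag the user set explicitly is never overridden.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the OPTION_MASK_P10_FUSION* bits.  */
enum
{
  FUSION            = 1u << 0,
  FUSION_2STORE     = 1u << 1,
  FUSION_ZERO_CYCLE = 1u << 2
};

/* Turn MASK on in FLAGS if COND holds, unless the user set that option
   explicitly (in either direction) on the command line.  */
static uint32_t
default_on (uint32_t flags, uint32_t explicit_flags, uint32_t mask, bool cond)
{
  if (cond && (explicit_flags & mask) == 0)
    flags |= mask;
  return flags;
}

/* Model of the defaulting order: umbrella flag first, then sub-options.  */
static uint32_t
apply_fusion_defaults (uint32_t flags, uint32_t explicit_flags,
                       bool cpu_is_power10, bool tune_is_power10)
{
  flags = default_on (flags, explicit_flags, FUSION,
                      cpu_is_power10 || tune_is_power10);

  bool fusion = (flags & FUSION) != 0;
  flags = default_on (flags, explicit_flags, FUSION_2STORE, fusion);
  flags = default_on (flags, explicit_flags, FUSION_ZERO_CYCLE, fusion);
  return flags;
}
```

In this model, tuning for power10 alone is enough to enable every sub-option,
while explicitly disabling the umbrella flag keeps all of them off.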

I have built and run the testsuites on little endian power9 and power10 systems
with no regressions.  Can I install this patch?

2021-11-19  Michael Meissner  <meissner@the-meissners.org>

	* config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
	power10 fusion if -mtune=power10.
	(rs6000_opt_masks): Add power10 fusion options.
---
 gcc/config/rs6000/rs6000.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6780304a5eb..8531cef0337 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4469,35 +4469,36 @@ rs6000_option_override_internal (bool global_init_p)
   if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_MMA) == 0)
     rs6000_isa_flags |= OPTION_MASK_MMA;
 
-  if (TARGET_POWER10
+  /* Enable power10 tuning if either -mcpu=power10 or -mtune=power10.  */
+  if ((TARGET_POWER10 || rs6000_tune == PROCESSOR_POWER10)
       && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
     rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
 
-  if (TARGET_POWER10 &&
+  if (TARGET_P10_FUSION &&
       (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_LD_CMPI) == 0)
     rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LD_CMPI;
 
-  if (TARGET_POWER10
+  if (TARGET_P10_FUSION
       && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2LOGICAL) == 0)
     rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2LOGICAL;
 
-  if (TARGET_POWER10
+  if (TARGET_P10_FUSION
       && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_LOGADD) == 0)
     rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LOGADD;
 
-  if (TARGET_POWER10
+  if (TARGET_P10_FUSION
       && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_ADDLOG) == 0)
     rs6000_isa_flags |= OPTION_MASK_P10_FUSION_ADDLOG;
 
-  if (TARGET_POWER10
+  if (TARGET_P10_FUSION
       && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0)
     rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD;
 
-  if (TARGET_POWER10
+  if (TARGET_P10_FUSION
       && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0)
     rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE;
 
-  if (TARGET_POWER10
+  if (TARGET_P10_FUSION
       && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_ZERO_CYCLE) == 0)
     rs6000_isa_flags |= OPTION_MASK_P10_FUSION_ZERO_CYCLE;
 
@@ -24292,6 +24293,14 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "power9-misc",		OPTION_MASK_P9_MISC,		false, true  },
   { "power9-vector",		OPTION_MASK_P9_VECTOR,		false, true  },
   { "power10-fusion",		OPTION_MASK_P10_FUSION,		false, true  },
+  { "power10-fusion-ld-cmpi",	OPTION_MASK_P10_FUSION_LD_CMPI,	false, true  },
+  { "power10-fusion-2logical",	OPTION_MASK_P10_FUSION_2LOGICAL,false, true  },
+  { "power10-fusion-logical-add", OPTION_MASK_P10_FUSION_LOGADD,false, true  },
+  { "power10-fusion-add-logical", OPTION_MASK_P10_FUSION_ADDLOG,false, true  },
+  { "power10-fusion-2add",	OPTION_MASK_P10_FUSION_2ADD,	false, true  },
+  { "power10-fusion-2store",	OPTION_MASK_P10_FUSION_2STORE,	false, true  },
+  { "power10-fusion-zero-cycle", OPTION_MASK_P10_FUSION_ZERO_CYCLE,
+								false, true  },
   { "powerpc-gfxopt",		OPTION_MASK_PPC_GFXOPT,		false, true  },
   { "powerpc-gpopt",		OPTION_MASK_PPC_GPOPT,		false, true  },
   { "prefixed",			OPTION_MASK_PREFIXED,		false, true  },
-- 
2.31.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com


* [PATCH 3/3] Use absolute switch table addresses for zero cycle moves.
  2021-11-19 14:49 [PATCH 0/3] Add zero cycle move support Michael Meissner
  2021-11-19 14:53 ` [PATCH 1/3] Add power10 zero cycle moves for switches & indirect jumps Michael Meissner
  2021-11-19 14:55 ` [PATCH 2/3] Set power10 fusion if -mtune=power10 Michael Meissner
@ 2021-11-19 14:57 ` Michael Meissner
  2021-12-13 17:13   ` Ping: " Michael Meissner
  2021-11-22 15:57 ` [PATCH 0/3] Add zero cycle move support Bill Schmidt
  3 siblings, 1 reply; 15+ messages in thread
From: Michael Meissner @ 2021-11-19 14:57 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Use absolute switch table addresses for zero cycle moves.

This patch enables using absolute addresses in switch tables if the
power10 zero cycle move tuning is turned on.  The combination of using
absolute addresses in switch tables along with zero cycle moves seems to
give the best performance.
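
The difference between the two table layouts can be modeled in a few lines of
C.  This is only a sketch: the base address, the table contents, and the
function names are made up for illustration.  With a relative table, the loaded
entry is an offset that must be added to a base address before the jump, which
matches the add emitted in the tablejump<mode>_normal expander.  With an
absolute table, the loaded entry is already the target, so nothing sits between
the load and the mtctr/bctr pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical base address of the jump table, for illustration only.  */
#define TABLE_BASE 0x1000u

/* Relative table: each entry is an offset from TABLE_BASE.  */
static const int32_t relative_table[] = { 0x10, 0x40, 0x80 };

/* Absolute table: each entry is already the final target address.  */
static const uintptr_t absolute_table[] = { 0x1010u, 0x1040u, 0x1080u };

/* Load the offset, then add the base.  The add is the extra instruction
   that has to execute before the move to CTR.  */
static uintptr_t
target_relative (size_t i)
{
  return (uintptr_t) TABLE_BASE + (uintptr_t) relative_table[i];
}

/* Load only.  The loaded value can be moved to CTR directly.  */
static uintptr_t
target_absolute (size_t i)
{
  return absolute_table[i];
}
```

Both functions resolve to the same target for a given index; the absolute form
just trades a larger, position-dependent table for one less instruction on the
dispatch path.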

I have built and run bootstrapped compilers on little endian power9 and power10
systems.  There were no regressions.  Can I install this patch?

2021-11-19  Michael Meissner  <meissner@the-meissners.org>

	* config/rs6000/rs6000.c (rs6000_option_override_internal): Use
	absolute addresses in switch tables if power10 zero cycle move
	fusion is enabled.
---
 gcc/config/rs6000/rs6000.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8531cef0337..dc942765828 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4502,6 +4502,12 @@ rs6000_option_override_internal (bool global_init_p)
       && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_ZERO_CYCLE) == 0)
     rs6000_isa_flags |= OPTION_MASK_P10_FUSION_ZERO_CYCLE;
 
+  /* If we enable zero cycle move fusion, also switch to absolute addresses in
+     switch tables.  */
+  if (TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE
+      && !global_options_set.x_rs6000_relative_jumptables)
+    rs6000_relative_jumptables = 0;
+
   /* Turn off vector pair/mma options on non-power10 systems.  */
   else if (!TARGET_POWER10 && TARGET_MMA)
     {
-- 
2.31.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com


* Re: [PATCH 0/3] Add zero cycle move support
  2021-11-19 14:49 [PATCH 0/3] Add zero cycle move support Michael Meissner
                   ` (2 preceding siblings ...)
  2021-11-19 14:57 ` [PATCH 3/3] Use absolute switch table addresses for zero cycle moves Michael Meissner
@ 2021-11-22 15:57 ` Bill Schmidt
  2021-11-22 16:09   ` David Edelsohn
  3 siblings, 1 reply; 15+ messages in thread
From: Bill Schmidt @ 2021-11-22 15:57 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Peter Bergner, Will Schmidt, HAO CHEN GUI

Hi!

On 11/19/21 8:49 AM, Michael Meissner wrote:
> The next set of 3 patches adds zero cycle move support for Power10.  A zero
> cycle move is when a move to the LR/CTR/TAR register is adjacent to the jump
> through that register, so that the two instructions can be fused together.
>
> At the moment, this set of three patches adds support for zero cycle moves for
> indirect jumps and switch tables using the CTR register.  Potential zero cycle
> moves for doing returns are not currently handled.
>
> In looking at the code, I discovered that zero cycle moves alone are not as
> helpful unless we can also eliminate the add instruction before the jump.  I
> also noticed that the various power10 fusion options are only enabled with
> -mcpu=power10.  I added a patch to enable the fusion for -mtune=power10 as well.
>
> I have done bootstraps and make check with these patches installed on both
> little endian power9 and little endian power10 systems.  Can I install these
> patches?
>
> The following patches will be posted:
>
> 1) Patch to add zero cycle move for indirect jumps and switches.
>
> 2) Patch to enable p10 fusion for -mtune=power10 in addition to -mcpu=power10.
>
> 3) Patch to use absolute addresses for switch tables instead of relative
>    addresses if zero cycle fusion is enabled.
>
For this last point, I had thought that the plan was to always switch over to
absolute addresses for switch tables, following the work that Hao Chen did in
this area.  Am I misremembering?  Hao Chen, can you please remind me where we
ended up here?

Thanks!
Bill



* Re: [PATCH 2/3] Set power10 fusion if -mtune=power10.
  2021-11-19 14:55 ` [PATCH 2/3] Set power10 fusion if -mtune=power10 Michael Meissner
@ 2021-11-22 16:06   ` Bill Schmidt
  2021-11-22 21:13     ` Michael Meissner
  2021-12-13 17:12   ` Ping: " Michael Meissner
  1 sibling, 1 reply; 15+ messages in thread
From: Bill Schmidt @ 2021-11-22 16:06 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Peter Bergner, Will Schmidt

Hi Mike,

On 11/19/21 8:55 AM, Michael Meissner wrote:
> Set power10 fusion if -mtune=power10.
>
> While doing the patch for zero cycle moves for switch statements and indirect
> jumps, I noticed that fusion support is only enabled with -mcpu=power10.  This
> patch also enables power10 fusion when -mtune=power10 is used.
>
> I have built and run the testsuites on little endian power9 and power10 systems
> with no regressions.  Can I install this patch?

This all seems fine, but since we're planning on collapsing all those flags
anyway, maybe it would be better if we did that first.  This seems like work
that will mostly be removed soon.  But no concerns from me otherwise.

Thanks!
Bill

>
> 2021-11-19  Michael Meissner  <meissner@the-meissners.org>
>
> 	* config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
> 	power10 fusion if -mtune=power10.
> 	(rs6000_opt_masks): Add power10 fusion options.
> ---
>  gcc/config/rs6000/rs6000.c | 25 +++++++++++++++++--------
>  1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 6780304a5eb..8531cef0337 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4469,35 +4469,36 @@ rs6000_option_override_internal (bool global_init_p)
>    if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_MMA) == 0)
>      rs6000_isa_flags |= OPTION_MASK_MMA;
>  
> -  if (TARGET_POWER10
> +  /* Enable power10 tuning if either -mcpu=power10 or -mtune=power10.  */
> +  if ((TARGET_POWER10 || rs6000_tune == PROCESSOR_POWER10)
>        && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
>      rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
>  
> -  if (TARGET_POWER10 &&
> +  if (TARGET_P10_FUSION &&
>        (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_LD_CMPI) == 0)
>      rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LD_CMPI;
>  
> -  if (TARGET_POWER10
> +  if (TARGET_P10_FUSION
>        && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2LOGICAL) == 0)
>      rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2LOGICAL;
>  
> -  if (TARGET_POWER10
> +  if (TARGET_P10_FUSION
>        && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_LOGADD) == 0)
>      rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LOGADD;
>  
> -  if (TARGET_POWER10
> +  if (TARGET_P10_FUSION
>        && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_ADDLOG) == 0)
>      rs6000_isa_flags |= OPTION_MASK_P10_FUSION_ADDLOG;
>  
> -  if (TARGET_POWER10
> +  if (TARGET_P10_FUSION
>        && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0)
>      rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD;
>  
> -  if (TARGET_POWER10
> +  if (TARGET_P10_FUSION
>        && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0)
>      rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE;
>  
> -  if (TARGET_POWER10
> +  if (TARGET_P10_FUSION
>        && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_ZERO_CYCLE) == 0)
>      rs6000_isa_flags |= OPTION_MASK_P10_FUSION_ZERO_CYCLE;
>  
> @@ -24292,6 +24293,14 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
>    { "power9-misc",		OPTION_MASK_P9_MISC,		false, true  },
>    { "power9-vector",		OPTION_MASK_P9_VECTOR,		false, true  },
>    { "power10-fusion",		OPTION_MASK_P10_FUSION,		false, true  },
> +  { "power10-fusion-ld-cmpi",	OPTION_MASK_P10_FUSION_LD_CMPI,	false, true  },
> +  { "power10-fusion-2logical",	OPTION_MASK_P10_FUSION_2LOGICAL,false, true  },
> +  { "power10-fusion-logical-add", OPTION_MASK_P10_FUSION_LOGADD,false, true  },
> +  { "power10-fusion-add-logical", OPTION_MASK_P10_FUSION_ADDLOG,false, true  },
> +  { "power10-fusion-2add",	OPTION_MASK_P10_FUSION_2ADD,	false, true  },
> +  { "power10-fusion-2store",	OPTION_MASK_P10_FUSION_2STORE,	false, true  },
> +  { "power10-fusion-zero-cycle", OPTION_MASK_P10_FUSION_ZERO_CYCLE,
> +								false, true  },
>    { "powerpc-gfxopt",		OPTION_MASK_PPC_GFXOPT,		false, true  },
>    { "powerpc-gpopt",		OPTION_MASK_PPC_GPOPT,		false, true  },
>    { "prefixed",			OPTION_MASK_PREFIXED,		false, true  },


* Re: [PATCH 0/3] Add zero cycle move support
  2021-11-22 15:57 ` [PATCH 0/3] Add zero cycle move support Bill Schmidt
@ 2021-11-22 16:09   ` David Edelsohn
  2021-11-22 21:17     ` Michael Meissner
  2021-11-23  3:41     ` HAO CHEN GUI
  0 siblings, 2 replies; 15+ messages in thread
From: David Edelsohn @ 2021-11-22 16:09 UTC (permalink / raw)
  To: Bill Schmidt, Michael Meissner, HAO CHEN GUI
  Cc: GCC Patches, Segher Boessenkool, Peter Bergner, Will Schmidt

On Mon, Nov 22, 2021 at 10:58 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
>
> Hi!
>
> On 11/19/21 8:49 AM, Michael Meissner wrote:
> > The next set of 3 patches adds zero cycle move support for Power10.  A zero
> > cycle move is when a move to the LR/CTR/TAR register is adjacent to the jump
> > through that register, so that the two instructions can be fused together.
> >
> > At the moment, this set of three patches adds support for zero cycle moves for
> > indirect jumps and switch tables using the CTR register.  Potential zero cycle
> > moves for doing returns are not currently handled.
> >
> > In looking at the code, I discovered that zero cycle moves alone are not as
> > helpful unless we can also eliminate the add instruction before the jump.  I
> > also noticed that the various power10 fusion options are only enabled with
> > -mcpu=power10.  I added a patch to enable the fusion for -mtune=power10 as well.
> >
> > I have done bootstraps and make check with these patches installed on both
> > little endian power9 and little endian power10 systems.  Can I install these
> > patches?
> >
> > The following patches will be posted:
> >
> > 1) Patch to add zero cycle move for indirect jumps and switches.
> >
> > 2) Patch to enable p10 fusion for -mtune=power10 in addition to -mcpu=power10.
> >
> > 3) Patch to use absolute addresses for switch tables instead of relative
> >    addresses if zero cycle fusion is enabled.
> >
> For this last point, I had thought that the plan was to always switch over to
> absolute addresses for switch tables, following the work that Hao Chen did in
> this area.  Am I misremembering?  Hao Chen, can you please remind me where we
> ended up here?

And do the changes for absolute addressing in switch tables work on AIX?
I thought that Hao Chen had only done the work for PPC64 Linux ELF
syntax, with promises of future changes to accommodate AIX as well.

Thanks, David


* Re: [PATCH 1/3] Add power10 zero cycle moves for switches & indirect jumps
  2021-11-19 14:53 ` [PATCH 1/3] Add power10 zero cycle moves for switches & indirect jumps Michael Meissner
@ 2021-11-22 16:36   ` Bill Schmidt
  2021-11-22 21:12     ` Michael Meissner
  2021-12-13 17:10   ` Ping: " Michael Meissner
  1 sibling, 1 reply; 15+ messages in thread
From: Bill Schmidt @ 2021-11-22 16:36 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Peter Bergner, Will Schmidt

Hi Mike,

Thanks for this patch!

On 11/19/21 8:53 AM, Michael Meissner wrote:
> Add power10 zero cycle moves for switches.
>
> Power10 will fuse adjacent 'mtctr' and 'bctr' instructions to form zero
> cycle moves.  This code exploits this fusion opportunity.
>
> I have built bootstrapped compilers with this patch on little endian power9 and
> power10 systems with no regressions.  Can I install this into the master
> branch?
>
> 2021-11-19  Michael Meissner  <meissner@the-meissners.org>
>
> 	* config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add
> 	support for -mpower10-fusion-zero-cycle.
> 	(POWERPC_MASKS): Likewise.
> 	* config/rs6000/rs6000.c (rs6000_option_override_internal):
> 	Likewise.
> 	* config/rs6000/rs6000.md (indirect_jump): Support zero cycle
> 	moves.
> 	(indirect_jump<mode>_zero_cycle): New insn.
> 	(tablejump<mode>_normal): Support zero cycle moves.
> 	(tablejump<mode>_absolute): Likewise.
> 	(tablejump<mode>_insn_zero_cycle): New insn.
> 	* config/rs6000/rs6000.opt (-mpower10-fusion-zero-cycle): New
> 	debug switch.
> ---
>  gcc/config/rs6000/rs6000-cpus.def |  4 ++-
>  gcc/config/rs6000/rs6000.c        |  4 +++
>  gcc/config/rs6000/rs6000.md       | 52 ++++++++++++++++++++++++++++---
>  gcc/config/rs6000/rs6000.opt      |  4 +++
>  4 files changed, 59 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
> index f5812da0184..cc072ee94ea 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -91,7 +91,8 @@
>  				 | OPTION_MASK_P10_FUSION_LOGADD 	\
>  				 | OPTION_MASK_P10_FUSION_ADDLOG	\
>  				 | OPTION_MASK_P10_FUSION_2ADD		\
> -				 | OPTION_MASK_P10_FUSION_2STORE)
> +				 | OPTION_MASK_P10_FUSION_2STORE	\
> +				 | OPTION_MASK_P10_FUSION_ZERO_CYCLE)

I guess it's fine to introduce one more for now, but ultimately we want
all these to get collapsed down to one.  No worries from me.

>  
>  /* Flags that need to be turned off if -mno-power9-vector.  */
>  #define OTHER_P9_VECTOR_MASKS	(OPTION_MASK_FLOAT128_HW		\
> @@ -145,6 +146,7 @@
>  				 | OPTION_MASK_P10_FUSION_ADDLOG	\
>  				 | OPTION_MASK_P10_FUSION_2ADD    	\
>  				 | OPTION_MASK_P10_FUSION_2STORE	\
> +				 | OPTION_MASK_P10_FUSION_ZERO_CYCLE	\
>  				 | OPTION_MASK_HTM			\
>  				 | OPTION_MASK_ISEL			\
>  				 | OPTION_MASK_MFCRF			\
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index e4843eb0f1c..6780304a5eb 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4497,6 +4497,10 @@ rs6000_option_override_internal (bool global_init_p)
>        && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0)
>      rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE;
>  
> +  if (TARGET_POWER10
> +      && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_ZERO_CYCLE) == 0)
> +    rs6000_isa_flags |= OPTION_MASK_P10_FUSION_ZERO_CYCLE;
> +
>    /* Turn off vector pair/mma options on non-power10 systems.  */
>    else if (!TARGET_POWER10 && TARGET_MMA)
>      {
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6bec2bddbde..ea41eb4ada3 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -12988,15 +12988,34 @@ (define_expand "indirect_jump"
>      emit_jump_insn (gen_indirect_jump_nospec (Pmode, operands[0], ccreg));
>      DONE;
>    }
> +  if (TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE)
> +    {
> +      emit_jump_insn (gen_indirect_jump_zero_cycle (Pmode, operands[0]));
> +      DONE;
> +    }
>  })
>  
>  (define_insn "*indirect_jump<mode>"
>    [(set (pc)
>  	(match_operand:P 0 "register_operand" "c,*l"))]
> -  "rs6000_speculate_indirect_jumps"
> +  "rs6000_speculate_indirect_jumps
> +   && !(TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE)"
>    "b%T0"
>    [(set_attr "type" "jmpreg")])
>  
> +(define_insn "@indirect_jump<mode>_zero_cycle"

I don't know why this is an "@" pattern, but honestly I don't
know why @indirect_jump<mode>_nospec is an "@" pattern either.
The documentation for such things is hard for me to understand,
so I'm probably just missing something obvious, but I don't
immediately see why we would need the @ here.

> +  [(set (pc)
> +	(match_operand:P 0 "register_operand" "r,r,!cl"))
> +   (clobber (match_scratch:P 1 "=c,*l,X"))]

Do we need the *l and X alternatives if we're only doing this for
mtctr/bctr?

> +  "rs6000_speculate_indirect_jumps && TARGET_P10_FUSION
> +   && TARGET_P10_FUSION_ZERO_CYCLE"
> +  "@
> +   mt%T1 %0\;b%T1
> +   mt%T1 %0\;b%T1
> +   b%T0"
> +  [(set_attr "type" "jmpreg")
> +   (set_attr "length" "8,8,4")])
> +
>  (define_insn "@indirect_jump<mode>_nospec"
>    [(set (pc) (match_operand:P 0 "register_operand" "c,*l"))
>     (clobber (match_operand:CC 1 "cc_reg_operand" "=y,y"))]
> @@ -13050,7 +13069,11 @@ (define_expand "@tablejump<mode>_normal"
>    rtx addr = gen_reg_rtx (Pmode);
>  
>    emit_insn (gen_add<mode>3 (addr, off, lab));
> -  emit_jump_insn (gen_tablejump_insn_normal (Pmode, addr, operands[1]));
> +  rtx insn = (TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE
> +	      ? gen_tablejump_insn_zero_cycle (Pmode, addr, operands[1])
> +	      : gen_tablejump_insn_normal (Pmode, addr, operands[1]));
> +
> +  emit_jump_insn (insn);
>    DONE;
>  })
>  
> @@ -13062,7 +13085,11 @@ (define_expand "@tablejump<mode>_absolute"
>    rtx addr = gen_reg_rtx (Pmode);
>    emit_move_insn (addr, operands[0]);
>  
> -  emit_jump_insn (gen_tablejump_insn_normal (Pmode, addr, operands[1]));
> +  rtx insn = (TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE
> +	      ? gen_tablejump_insn_zero_cycle (Pmode, addr, operands[1])
> +	      : gen_tablejump_insn_normal (Pmode, addr, operands[1]));
> +
> +  emit_jump_insn (insn);
>    DONE;
>  })
>  
> @@ -13107,10 +13134,27 @@ (define_insn "@tablejump<mode>_insn_normal"
>    [(set (pc)
>  	(match_operand:P 0 "register_operand" "c,*l"))
>     (use (label_ref (match_operand 1)))]
> -  "rs6000_speculate_indirect_jumps"
> +  "rs6000_speculate_indirect_jumps
> +   && !(TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE)"
>    "b%T0"
>    [(set_attr "type" "jmpreg")])
>  
> +;; Version of indirect jump that fuses the mtctr to bctr to achieve 0 cycle
> +;; moves on Power10.
> +(define_insn "@tablejump<mode>_insn_zero_cycle"

Same question about @.

> +  [(set (pc)
> +	(match_operand:P 0 "register_operand" "r,r,!cl"))
> +   (use (label_ref (match_operand 1)))
> +   (clobber (match_scratch:P 2 "=c,*l,X"))]

Same question about 2nd and 3rd alternatives.

Otherwise LGTM... over to the maintainers. :)

Thanks!
Bill

> +  "rs6000_speculate_indirect_jumps && TARGET_P10_FUSION
> +   && TARGET_P10_FUSION_ZERO_CYCLE"
> +  "@
> +   mt%T2 %0\;b%T2
> +   mt%T2 %0\;b%T2
> +   b%T0"
> +  [(set_attr "type" "jmpreg")
> +   (set_attr "length" "8,8,4")])
> +
>  (define_insn "@tablejump<mode>_insn_nospec"
>    [(set (pc)
>  	(match_operand:P 0 "register_operand" "c,*l"))
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 9d7878f144a..ba674947557 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -518,6 +518,10 @@ mpower10-fusion-2store
>  Target Undocumented Mask(P10_FUSION_2STORE) Var(rs6000_isa_flags)
>  Fuse certain store operations together for better performance on power10.
>  
> +mpower10-fusion-zero-cycle
> +Target Undocumented Mask(P10_FUSION_ZERO_CYCLE) Var(rs6000_isa_flags)
> +Fuse move to special register and jump for better performance on power10.
> +
>  mcrypto
>  Target Mask(CRYPTO) Var(rs6000_isa_flags)
>  Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] Add power10 zero cycle moves for switches & indirect jumps
  2021-11-22 16:36   ` Bill Schmidt
@ 2021-11-22 21:12     ` Michael Meissner
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Meissner @ 2021-11-22 21:12 UTC (permalink / raw)
  To: Bill Schmidt
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Peter Bergner, Will Schmidt

On Mon, Nov 22, 2021 at 10:36:13AM -0600, Bill Schmidt wrote:
> Hi Mike,
> 
> Thanks for this patch!
> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -12988,15 +12988,34 @@ (define_expand "indirect_jump"
> >      emit_jump_insn (gen_indirect_jump_nospec (Pmode, operands[0], ccreg));
> >      DONE;
> >    }
> > +  if (TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE)
> > +    {
> > +      emit_jump_insn (gen_indirect_jump_zero_cycle (Pmode, operands[0]));
> > +      DONE;
> > +    }
> >  })
> >  
> >  (define_insn "*indirect_jump<mode>"
> >    [(set (pc)
> >  	(match_operand:P 0 "register_operand" "c,*l"))]
> > -  "rs6000_speculate_indirect_jumps"
> > +  "rs6000_speculate_indirect_jumps
> > +   && !(TARGET_P10_FUSION && TARGET_P10_FUSION_ZERO_CYCLE)"
> >    "b%T0"
> >    [(set_attr "type" "jmpreg")])
> >  
> > +(define_insn "@indirect_jump<mode>_zero_cycle"
> 
> I don't know why this is an "@" pattern, but honestly I don't
> know why @indirect_jump<mode>_nospec is an "@" pattern either.
> The documentation for such things is hard for me to understand,
> so I'm probably just missing something obvious, but I don't
> immediately see why we would need the @ here.

I didn't know about it either.  Basically the next insn used it:

(define_insn "@indirect_jump<mode>_nospec"
  [(set (pc) (match_operand:P 0 "register_operand" "c,*l"))
   (clobber (match_operand:CC 1 "cc_reg_operand" "=y,y"))]
  "!rs6000_speculate_indirect_jumps"
  "crset %E1\;beq%T0- %1\;b $"
  [(set_attr "type" "jmpreg")
   (set_attr "length" "12")])

This creates a function:

	gen_indirect_jump_nospec (machine_mode arg0, rtx x0, rtx x1)

where the mode of the P iterator is passed as argument.  I.e. you can do:

	rtx foo = gen_indirect_jump_nospec (Pmode, op0, op1);

instead of:

	rtx foo;
	if (Pmode == SImode)
	  foo = gen_indirect_jumpsi_nospec (op0, op1);
	else if (Pmode == DImode)
	  foo = gen_indirect_jumpdi_nospec (op0, op1);
	else
	  gcc_unreachable ();
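
To make the dispatch concrete outside of GCC, here is a minimal stand-alone
sketch of the same idea.  Everything prefixed with toy_/TOY_ is made up for
illustration; it is not the code that the "@" machinery actually generates,
just the shape of it (one generator per mode, plus one entry point that takes
the mode as an argument):

```c
#include <assert.h>

/* Toy stand-ins for two machine modes; not GCC's real machine_mode.  */
enum toy_mode { TOY_SImode, TOY_DImode };

/* Toy stand-ins for the insn each per-mode generator would build.  */
enum toy_insn { TOY_INDIRECT_JUMPSI_NOSPEC, TOY_INDIRECT_JUMPDI_NOSPEC };

/* Without "@": one generator per mode of the iterator.  */
static enum toy_insn
toy_gen_indirect_jumpsi_nospec (void)
{
  return TOY_INDIRECT_JUMPSI_NOSPEC;
}

static enum toy_insn
toy_gen_indirect_jumpdi_nospec (void)
{
  return TOY_INDIRECT_JUMPDI_NOSPEC;
}

/* With "@": a single entry point that takes the mode and dispatches,
   so callers do not have to write the if/else chain themselves.  */
static enum toy_insn
toy_gen_indirect_jump_nospec (enum toy_mode mode)
{
  switch (mode)
    {
    case TOY_SImode:
      return toy_gen_indirect_jumpsi_nospec ();
    case TOY_DImode:
      return toy_gen_indirect_jumpdi_nospec ();
    }

  /* Stand-in for gcc_unreachable ().  */
  assert (!"unsupported mode");
  return TOY_INDIRECT_JUMPSI_NOSPEC;
}
```

A caller then writes toy_gen_indirect_jump_nospec (TOY_DImode) and gets the
DImode variant without testing the mode itself.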

> > +  [(set (pc)
> > +	(match_operand:P 0 "register_operand" "r,r,!cl"))
> > +   (clobber (match_scratch:P 1 "=c,*l,X"))]
> 
> Do we need the *l and X alternatives if we're only doing this for
> mtctr/bctr?

Probably not, but I recall that back before the current allocator, it would
cause crashes if we didn't have LR.  I could certainly eliminate the *l
alternative.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] Set power10 fusion if -mtune=power10.
  2021-11-22 16:06   ` Bill Schmidt
@ 2021-11-22 21:13     ` Michael Meissner
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Meissner @ 2021-11-22 21:13 UTC (permalink / raw)
  To: Bill Schmidt
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Peter Bergner, Will Schmidt

On Mon, Nov 22, 2021 at 10:06:17AM -0600, Bill Schmidt wrote:
> Hi Mike,
> 
> On 11/19/21 8:55 AM, Michael Meissner wrote:
> > Set power10 fusion if -mtune=power10.
> >
> > In doing the patch for zero cycle moves for switch statements and indirect
> > jumps, I noticed the fusion support is only done if -mcpu=power10.  This option
> > enables power10 fusion if we use -mtune=power10.
> >
> > I have built and run the testsuites on little endian power9 and power10 systems
> > with no regressions.  Can I install this patch?
> 
> This all seems fine, but since we're planning on collapsing all those flags
> anyway, maybe it would be better if we did that first.  This seems like work
> that will mostly be removed soon.  But no concerns from me otherwise.
> 
> Thanks!
> Bill

It is still useful early on to do builds with/without to see what the bubbles
are.  But yeah, we could eliminate it.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Add zero cycle move support
  2021-11-22 16:09   ` David Edelsohn
@ 2021-11-22 21:17     ` Michael Meissner
  2021-11-23  3:41     ` HAO CHEN GUI
  1 sibling, 0 replies; 15+ messages in thread
From: Michael Meissner @ 2021-11-22 21:17 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Bill Schmidt, Michael Meissner, HAO CHEN GUI, GCC Patches,
	Segher Boessenkool, Peter Bergner, Will Schmidt

On Mon, Nov 22, 2021 at 11:09:22AM -0500, David Edelsohn wrote:
> On Mon, Nov 22, 2021 at 10:58 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
> And do the absolute addressing for switch tables changes work on AIX?
> I thought that Hao Chen only had done the work for PPC64 Linux ELF
> syntax with promises of future changes to accommodate AIX as well.

In theory it should work on AIX, since the assembler has to support syntax to
load the contents of a 64-bit address in memory.

In the past, when I measured this (probably in the power8 days), the issue was
that having 64-bit loads for the switch tables, instead of 32-bit loads and an
add instruction, occasionally meant a slowdown for 1-2 benchmarks that were
extremely sensitive to cache sizes.
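
As a rough illustration of the trade-off (a sketch only, with made-up toy_
names, and assuming 32-bit relative entries versus 64-bit absolute entries as
described above), the two table layouts differ like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Relative table: each entry is a 32-bit offset from a base address,
   so the dispatch needs a 32-bit load followed by an add.  */
static uint64_t
toy_target_relative (uint64_t base, const int32_t *table, size_t idx)
{
  assert (table != NULL);
  /* Sign-extend the offset, then add it to the base (wraps modulo 2^64).  */
  return base + (uint64_t) (int64_t) table[idx];
}

/* Absolute table: each entry is the full 64-bit address, so the
   dispatch is a single 64-bit load and no add.  */
static uint64_t
toy_target_absolute (const uint64_t *table, size_t idx)
{
  assert (table != NULL);
  return table[idx];
}

/* Data footprint of each layout for a switch with ncases entries.  */
static size_t
toy_relative_table_bytes (size_t ncases)
{
  return ncases * sizeof (int32_t);
}

static size_t
toy_absolute_table_bytes (size_t ncases)
{
  return ncases * sizeof (uint64_t);
}
```

The absolute form drops the add before the jump, but the table itself is twice
as large, which is where the cache sensitivity comes from.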

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] Add zero cycle move support
  2021-11-22 16:09   ` David Edelsohn
  2021-11-22 21:17     ` Michael Meissner
@ 2021-11-23  3:41     ` HAO CHEN GUI
  1 sibling, 0 replies; 15+ messages in thread
From: HAO CHEN GUI @ 2021-11-23  3:41 UTC (permalink / raw)
  To: David Edelsohn, Bill Schmidt, Michael Meissner
  Cc: GCC Patches, Segher Boessenkool, Peter Bergner, Will Schmidt

Bill and David,

    Currently, the absolute jump table is not enabled by default. It can be enabled with the undocumented option "-mno-relative-jumptables", provided the target supports named sections (have_named_sections). We plan to enable the feature by default in GCC 12, and there is a ticket for it.  The latest status is that I am waiting for comments on my patch (https://github.ibm.com/wschmidt/power-gcc/issues/998#issuecomment-34643825). Thanks.

On 23/11/2021 12:09 AM, David Edelsohn wrote:
> On Mon, Nov 22, 2021 at 10:58 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
>> Hi!
>>
>> On 11/19/21 8:49 AM, Michael Meissner wrote:
>>> The next set of 3 patches add zero cycle move support to the Power10.  Zero
>>> cycle moves are where the move to LR/CTR/TAR register that is adjacent to the
>>> jump to LR/CTR/TAR register can be fused together.
>>>
>>> At the moment, these set of three patches add support for zero cycle moves for
>>> indirect jumps and switch tables using the CTR register.  Potential zero cycle
>>> moves for doing returns are not currently handled.
>>>
>>> In looking at the code, I discovered that just using zero cycle moves isn't as
>>> helpful unless we can eliminate the add instruction before doing the jump.  I
>>> also noticed that the various power10 fusion options are only done if
>>> -mcpu=power10.  I added a patch to do the fusion for -mtune=power10 as well.
>>>
>>> I have done bootstraps and make check with these patches installed on both
>>> little endian power9 and little endian power10 systems.  Can I install these
>>> patches?
>>>
>>> The following patches will be posted:
>>>
>>> 1) Patch to add zero cycle move for indirect jumps and switches.
>>>
>>> 2) Patch to enable p10 fusion for -mtune=power10 in addition to -mcpu=power10.
>>>
>>> 3) Patch to use absolute addresses for switch tables instead of relative
>>>    addresses if zero cycle fusion is enabled.
>>>
>> For this last point, I had thought that the plan was to always switch over to
>> absolute addresses for switch tables, following the work that Hao Chen did in
>> this area.  Am I misremembering?  Hao Chen, can you please remind me where we
>> ended up here?
> And do the absolute addressing for switch tables changes work on AIX?
> I thought that Hao Chen only had done the work for PPC64 Linux ELF
> syntax with promises of future changes to accommodate AIX as well.
>
> Thanks, David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Ping: [PATCH 1/3] Add power10 zero cycle moves for switches & indirect jumps
  2021-11-19 14:53 ` [PATCH 1/3] Add power10 zero cycle moves for switches & indirect jumps Michael Meissner
  2021-11-22 16:36   ` Bill Schmidt
@ 2021-12-13 17:10   ` Michael Meissner
  1 sibling, 0 replies; 15+ messages in thread
From: Michael Meissner @ 2021-12-13 17:10 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch.

| Date: Fri, 19 Nov 2021 09:53:14 -0500
| From: Michael Meissner <meissner@linux.ibm.com>
| Subject: [PATCH 1/3] Add power10 zero cycle moves for switches & indirect jumps
| Message-ID: <YZe6WugqvxPKNQJj@toto.the-meissners.org>

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585005.html

Note, I will be on-line through December 20th.  I will be off-line from
December 21st through January 1st.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Ping: [PATCH 2/3] Set power10 fusion if -mtune=power10.
  2021-11-19 14:55 ` [PATCH 2/3] Set power10 fusion if -mtune=power10 Michael Meissner
  2021-11-22 16:06   ` Bill Schmidt
@ 2021-12-13 17:12   ` Michael Meissner
  1 sibling, 0 replies; 15+ messages in thread
From: Michael Meissner @ 2021-12-13 17:12 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch.

| Date: Fri, 19 Nov 2021 09:55:50 -0500
| From: Michael Meissner <meissner@linux.ibm.com>
| Subject: [PATCH 2/3] Set power10 fusion if -mtune=power10.
| Message-ID: <YZe69s2sbkTzNxL5@toto.the-meissners.org>

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585006.html

Note, I will be on-line through December 20th.  I will be off-line December
21st through January 1st.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Ping: [PATCH 3/3] Use absolute switch table addresses for zero cycle moves.
  2021-11-19 14:57 ` [PATCH 3/3] Use absolute switch table addresses for zero cycle moves Michael Meissner
@ 2021-12-13 17:13   ` Michael Meissner
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Meissner @ 2021-12-13 17:13 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch.

| Date: Fri, 19 Nov 2021 09:57:49 -0500
| From: Michael Meissner <meissner@linux.ibm.com>
| Subject: [PATCH 3/3] Use absolute switch table addresses for zero cycle moves.
| Message-ID: <YZe7be/SUobW7Qek@toto.the-meissners.org>

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585007.html

Note, I will be on-line through December 20th.  I will be off-line December
21st through January 1st.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-12-13 17:13 UTC | newest]

Thread overview: 15+ messages
2021-11-19 14:49 [PATCH 0/3] Add zero cycle move support Michael Meissner
2021-11-19 14:53 ` [PATCH 1/3] Add power10 zero cycle moves for switches & indirect jumps Michael Meissner
2021-11-22 16:36   ` Bill Schmidt
2021-11-22 21:12     ` Michael Meissner
2021-12-13 17:10   ` Ping: " Michael Meissner
2021-11-19 14:55 ` [PATCH 2/3] Set power10 fusion if -mtune=power10 Michael Meissner
2021-11-22 16:06   ` Bill Schmidt
2021-11-22 21:13     ` Michael Meissner
2021-12-13 17:12   ` Ping: " Michael Meissner
2021-11-19 14:57 ` [PATCH 3/3] Use absolute switch table addresses for zero cycle moves Michael Meissner
2021-12-13 17:13   ` Ping: " Michael Meissner
2021-11-22 15:57 ` [PATCH 0/3] Add zero cycle move support Bill Schmidt
2021-11-22 16:09   ` David Edelsohn
2021-11-22 21:17     ` Michael Meissner
2021-11-23  3:41     ` HAO CHEN GUI
