public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RS6000] rtx_costs
@ 2020-09-15  1:19 Alan Modra
  2020-09-15  1:19 ` [RS6000] Count rldimi constant insns Alan Modra
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Alan Modra @ 2020-09-15  1:19 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Alan Modra

This patch series fixes a number of issues in rs6000_rtx_costs, the
aim being to provide costing somewhat closer to reality.  Probably the
most important patch of the series is patch 4, which just adds a
comment.  Without the analysis that went into that comment, I found
myself making what seemed to be good changes but which introduced
regressions.

So far these changes have not introduced any testsuite regressions
on --with-cpu=power8 and --with-cpu=power9 all lang bootstraps on
powerpc64le-linux.  Pat spec tested on power9 against a baseline
master from a few months ago, seeing a few small improvements and no
degradations above the noise.

Some notes:

Examination of varasm.o shows quite a number of cases where
if-conversion succeeds due to different seq_cost.  One example:

extern int foo ();
int
default_assemble_integer (unsigned size)
{
  extern unsigned long rs6000_isa_flags;

  if (size > (!((rs6000_isa_flags & (1UL << 35)) != 0) ? 4 : 8))
    return 0;
  return foo ();
}

This rather horrible code turns the rs6000_isa_flags value into either
4 or 8:
	rldicr 9,9,28,0
	srdi 9,9,28
	addic 9,9,-1
	subfe 9,9,9
	rldicr 9,9,0,61
	addi 9,9,8
Better would be
	rldicl 9,9,29,63
	sldi 9,9,2
	addi 9,9,4

There is also a "rlwinm ra,rb,3,0,26" instead of "rldicr ra,rb,3,60",
and "li r31,0x4000; rotldi r31,r31,17" vs.
"lis r31,0x8000; clrldi r31,r31,32".
Neither of these is a real change.  I saw one occurrence of a 5 insn
sequence being replaced with a load from memory in
default_function_rodata_section, for ".rodata", and others elsewhere.

Sometimes correct insn cost leads to unexpected results.  For
example:

extern unsigned bar (void);
unsigned
f1 (unsigned a)
{
  if ((a & 0x01000200) == 0x01000200)
    return bar ();
  return 0;
}

emits for a & 0x01000200
 (set (reg) (and (reg) (const_int 0x01000200)))
at expand time (two rlwinm insns) rather than the older
 (set (reg) (const_int 0x01000200))
 (set (reg) (and (reg) (reg)))
which is three insns.  However, since 0x01000200 is needed later the
older code after optimisation is smaller.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RS6000] Count rldimi constant insns
  2020-09-15  1:19 [RS6000] rtx_costs Alan Modra
@ 2020-09-15  1:19 ` Alan Modra
  2020-09-15 22:29   ` Segher Boessenkool
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs for PLUS/MINUS constant Alan Modra
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Alan Modra @ 2020-09-15  1:19 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Alan Modra

rldimi is generated by rs6000_emit_set_long_const when the high and
low 32 bits of a 64-bit constant are equal.

	* config/rs6000/rs6000.c (num_insns_constant_gpr): Count rldimi
	constants correctly.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 20a4ba382bc..6fc86f8185a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5731,7 +5731,7 @@ direct_return (void)
 
 /* Helper for num_insns_constant.  Calculate number of instructions to
    load VALUE to a single gpr using combinations of addi, addis, ori,
-   oris and sldi instructions.  */
+   oris, sldi and rldimi instructions.  */
 
 static int
 num_insns_constant_gpr (HOST_WIDE_INT value)
@@ -5759,7 +5759,7 @@ num_insns_constant_gpr (HOST_WIDE_INT value)
 
       high >>= 1;
 
-      if (low == 0)
+      if (low == 0 || low == high)
 	return num_insns_constant_gpr (high) + 1;
       else if (high == 0)
 	return num_insns_constant_gpr (low) + 1;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RS6000] rs6000_rtx_costs for PLUS/MINUS constant
  2020-09-15  1:19 [RS6000] rtx_costs Alan Modra
  2020-09-15  1:19 ` [RS6000] Count rldimi constant insns Alan Modra
@ 2020-09-15  1:19 ` Alan Modra
  2020-09-15 22:31   ` Segher Boessenkool
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs for AND Alan Modra
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Alan Modra @ 2020-09-15  1:19 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Alan Modra

These functions do behave a little differently for SImode, so the
mode should be passed.

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Pass mode to
	reg_or_add_cint_operand and reg_or_sub_cint_operand.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6fc86f8185a..32044d33977 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21189,9 +21189,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	  return true;
 	}
       else if ((outer_code == PLUS
-		&& reg_or_add_cint_operand (x, VOIDmode))
+		&& reg_or_add_cint_operand (x, mode))
 	       || (outer_code == MINUS
-		   && reg_or_sub_cint_operand (x, VOIDmode))
+		   && reg_or_sub_cint_operand (x, mode))
 	       || ((outer_code == SET
 		    || outer_code == IOR
 		    || outer_code == XOR)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RS6000] rs6000_rtx_costs for AND
  2020-09-15  1:19 [RS6000] rtx_costs Alan Modra
  2020-09-15  1:19 ` [RS6000] Count rldimi constant insns Alan Modra
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs for PLUS/MINUS constant Alan Modra
@ 2020-09-15  1:19 ` Alan Modra
  2020-09-15 18:15   ` will schmidt
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs comment Alan Modra
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Alan Modra @ 2020-09-15  1:19 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Alan Modra

The existing "case AND" in this function is not sufficient for
optabs.c:avoid_expensive_constant usage, where the AND is passed in
outer_code.

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Move costing for
	AND to CONST_INT case.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 32044d33977..523d029800a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21150,16 +21150,13 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	    || outer_code == MINUS)
 	   && (satisfies_constraint_I (x)
 	       || satisfies_constraint_L (x)))
-	  || (outer_code == AND
-	      && (satisfies_constraint_K (x)
-		  || (mode == SImode
-		      ? satisfies_constraint_L (x)
-		      : satisfies_constraint_J (x))))
-	  || ((outer_code == IOR || outer_code == XOR)
+	  || ((outer_code == AND || outer_code == IOR || outer_code == XOR)
 	      && (satisfies_constraint_K (x)
 		  || (mode == SImode
 		      ? satisfies_constraint_L (x)
 		      : satisfies_constraint_J (x))))
+	  || (outer_code == AND
+	      && rs6000_is_valid_and_mask (x, mode))
 	  || outer_code == ASHIFT
 	  || outer_code == ASHIFTRT
 	  || outer_code == LSHIFTRT
@@ -21196,7 +21193,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		    || outer_code == IOR
 		    || outer_code == XOR)
 		   && (INTVAL (x)
-		       & ~ (unsigned HOST_WIDE_INT) 0xffffffff) == 0))
+		       & ~ (unsigned HOST_WIDE_INT) 0xffffffff) == 0)
+	       || (outer_code == AND
+		   && rs6000_is_valid_2insn_and (x, mode)))
 	{
 	  *total = COSTS_N_INSNS (1);
 	  return true;
@@ -21334,26 +21333,6 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	      *total += COSTS_N_INSNS (1);
 	      return true;
 	    }
-
-	  /* rotate-and-mask (no rotate), andi., andis.: 1 insn.  */
-	  HOST_WIDE_INT val = INTVAL (XEXP (x, 1));
-	  if (rs6000_is_valid_and_mask (XEXP (x, 1), mode)
-	      || (val & 0xffff) == val
-	      || (val & 0xffff0000) == val
-	      || ((val & 0xffff) == 0 && mode == SImode))
-	    {
-	      *total = rtx_cost (left, mode, AND, 0, speed);
-	      *total += COSTS_N_INSNS (1);
-	      return true;
-	    }
-
-	  /* 2 insns.  */
-	  if (rs6000_is_valid_2insn_and (XEXP (x, 1), mode))
-	    {
-	      *total = rtx_cost (left, mode, AND, 0, speed);
-	      *total += COSTS_N_INSNS (2);
-	      return true;
-	    }
 	}
 
       *total = COSTS_N_INSNS (1);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RS6000] rs6000_rtx_costs comment
  2020-09-15  1:19 [RS6000] rtx_costs Alan Modra
                   ` (2 preceding siblings ...)
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs for AND Alan Modra
@ 2020-09-15  1:19 ` Alan Modra
  2020-09-16 23:21   ` Segher Boessenkool
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs multi-insn constants Alan Modra
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Alan Modra @ 2020-09-15  1:19 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Alan Modra

Prior patches in this series were small bug fixes.  This lays out the
ground rules for following patches.

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Expand comment.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 523d029800a..5b3c0ee0e8c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21133,7 +21133,45 @@ rs6000_cannot_copy_insn_p (rtx_insn *insn)
 
 /* Compute a (partial) cost for rtx X.  Return true if the complete
    cost has been computed, and false if subexpressions should be
-   scanned.  In either case, *TOTAL contains the cost result.  */
+   scanned.  In either case, *TOTAL contains the cost result.
+
+   1) Calls from places like optabs.c:avoid_expensive_constant will
+   come here with OUTER_CODE set to an operation such as AND with X
+   being a CONST_INT or other CONSTANT_P type.  This will be compared
+   against set_src_cost, where we'll come here with OUTER_CODE as SET
+   and X the same constant.
+
+   2) Calls from places like combine:distribute_and_simplify_rtx are
+   asking whether a possibly quite complex SET_SRC can be implemented
+   more cheaply than some other logically equivalent SET_SRC.
+
+   3) Calls from places like default_noce_conversion_profitable_p will
+   come here via seq_cost and pass the pattern of a SET insn in X.
+   Presuming the insn is valid and set_dest a reg, rs6000_rtx_costs
+   will next see the SET_SRC.  The overall cost should be comparable
+   to rs6000_insn_cost since the code is comparing one insn sequence
+   (some of which may be costed by insn_cost) against another insn
+   sequence.
+
+   4) Calls from places like cprop.c:try_replace_reg will come here
+   with OUTER_CODE as INSN, and X either a valid pattern of a SET or
+   one where some registers have been replaced with constants.  The
+   replacements may make the SET invalid, for example if
+     (set (reg1) (and (reg2) (const_int 0xfff)))
+   replaces reg2 as
+     (set (reg1) (and (symbol_ref) (const_int 0xfff)))
+   then the replacement can't be implemented in one instruction and
+   really the cost should be higher by one instruction.  However,
+   the cost for invalid insns doesn't matter much except that a
+   higher cost may lead to their rejection earlier.
+
+   5) fwprop.c:should_replace_address puts yet another wrinkle on this
+   function, where we prefer an address calculation that is more
+   complex yet has the same address_cost.  In this case "more
+   complex" is determined by having a higher set_src_cost.  So for
+   example, if we want a plain (reg) address to be replaced with
+   (plus (reg) (const)) when possible then PLUS needs to cost more
+   than zero here.  */
 
 static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RS6000] rs6000_rtx_costs multi-insn constants
  2020-09-15  1:19 [RS6000] rtx_costs Alan Modra
                   ` (3 preceding siblings ...)
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs comment Alan Modra
@ 2020-09-15  1:19 ` Alan Modra
  2020-09-16 23:28   ` Segher Boessenkool
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs cost IOR Alan Modra
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Alan Modra @ 2020-09-15  1:19 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Alan Modra

This small patch to rs6000_rtx_const considerably improves code
generated for large constants in 64-bit code, teaching gcc that it is
better to load a constant from memory than to generate a sequence of
up to five dependent instructions.  Note that the rs6000 backend does
generate large constants as loads from memory at expand time but
optimisation passes replace them with SETs of the value due to not
having correct costs.

	PR 94393
	* config/rs6000/rs6000.c (rs6000_rtx_costs): Cost multi-insn
	constants.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5b3c0ee0e8c..8c300b82b11 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21181,7 +21181,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
   switch (code)
     {
-      /* On the RS/6000, if it is valid in the insn, it is free.  */
+      /* (reg) is costed at zero by rtlanal.c:rtx_cost.  That sets a
+	 baseline for rtx costs:  If a constant is valid in an insn,
+	 it is free.  */
     case CONST_INT:
       if (((outer_code == SET
 	    || outer_code == PLUS
@@ -21242,6 +21244,17 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
     case CONST_DOUBLE:
     case CONST_WIDE_INT:
+      /* Subtract one insn here for consistency with the above code
+	 that returns one less than the actual number of insns for
+	 SETs.  Don't subtract one for other than SETs, because other
+	 operations will require the constant to be loaded to a
+	 register before performing the operation.  All special cases
+	 for codes other than SET must be handled before we get
+	 here.  */
+      *total = COSTS_N_INSNS (num_insns_constant (x, mode)
+			      - (outer_code == SET ? 1 : 0));
+      return true;
+
     case CONST:
     case HIGH:
     case SYMBOL_REF:

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RS6000] rs6000_rtx_costs cost IOR
  2020-09-15  1:19 [RS6000] rtx_costs Alan Modra
                   ` (4 preceding siblings ...)
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs multi-insn constants Alan Modra
@ 2020-09-15  1:19 ` Alan Modra
  2020-09-17  0:02   ` Segher Boessenkool
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs reduce cost for SETs Alan Modra
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Alan Modra @ 2020-09-15  1:19 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Alan Modra

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8c300b82b11..fb5fe7969a3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21177,6 +21177,7 @@ static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		  int opno ATTRIBUTE_UNUSED, int *total, bool speed)
 {
+  rtx left, right;
   int code = GET_CODE (x);
 
   switch (code)
@@ -21367,16 +21368,17 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case AND:
-      if (CONST_INT_P (XEXP (x, 1)))
+      right = XEXP (x, 1);
+      if (CONST_INT_P (right))
 	{
-	  rtx left = XEXP (x, 0);
+	  left = XEXP (x, 0);
 	  rtx_code left_code = GET_CODE (left);
 
 	  /* rotate-and-mask: 1 insn.  */
 	  if ((left_code == ROTATE
 	       || left_code == ASHIFT
 	       || left_code == LSHIFTRT)
-	      && rs6000_is_valid_shift_mask (XEXP (x, 1), left, mode))
+	      && rs6000_is_valid_shift_mask (right, left, mode))
 	    {
 	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
 	      if (!CONST_INT_P (XEXP (left, 1)))
@@ -21390,9 +21392,106 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case IOR:
-      /* FIXME */
       *total = COSTS_N_INSNS (1);
-      return true;
+      left = XEXP (x, 0);
+      if (GET_CODE (left) == AND
+	  && CONST_INT_P (XEXP (left, 1)))
+	{
+	  right = XEXP (x, 1);
+	  if (GET_CODE (right) == AND
+	      && CONST_INT_P (XEXP (right, 1))
+	      && UINTVAL (XEXP (left, 1)) + UINTVAL (XEXP (right, 1)) + 1 == 0)
+	    {
+	      rtx leftop = XEXP (left, 0);
+	      rtx rightop = XEXP (right, 0);
+
+	      // rotlsi3_insert_5
+	      if (REG_P (leftop)
+		  && REG_P (rightop)
+		  && mode == SImode
+		  && UINTVAL (XEXP (left, 1)) != 0
+		  && UINTVAL (XEXP (right, 1)) != 0
+		  && rs6000_is_valid_mask (XEXP (left, 1), NULL, NULL, mode))
+		return true;
+	      // rotldi3_insert_6
+	      if (REG_P (leftop)
+		  && REG_P (rightop)
+		  && mode == DImode
+		  && exact_log2 (-UINTVAL (XEXP (left, 1))) > 0)
+		return true;
+	      // rotldi3_insert_7
+	      if (REG_P (leftop)
+		  && REG_P (rightop)
+		  && mode == DImode
+		  && exact_log2 (-UINTVAL (XEXP (right, 1))) > 0)
+		return true;
+
+	      rtx mask = 0;
+	      rtx shift = leftop;
+	      rtx_code shift_code = GET_CODE (shift);
+	      // rotl<mode>3_insert
+	      if (shift_code == ROTATE
+		  || shift_code == ASHIFT
+		  || shift_code == LSHIFTRT)
+		mask = right;
+	      else
+		{
+		  shift = rightop;
+		  shift_code = GET_CODE (shift);
+		  // rotl<mode>3_insert_2
+		  if (shift_code == ROTATE
+		      || shift_code == ASHIFT
+		      || shift_code == LSHIFTRT)
+		    mask = left;
+		}
+	      if (mask
+		  && CONST_INT_P (XEXP (shift, 1))
+		  && rs6000_is_valid_insert_mask (XEXP (mask, 1), shift, mode))
+		{
+		  /* Test both regs even though the one in the mask is
+		     constrained to be equal to the output.  Increasing
+		     cost may well result in rejecting an invalid insn
+		     earlier.  */
+		  rtx reg_op = XEXP (shift, 0);
+		  if (!REG_P (reg_op))
+		    *total += rtx_cost (reg_op, mode, shift_code, 0, speed);
+		  reg_op = XEXP (mask, 0);
+		  if (!REG_P (reg_op))
+		    *total += rtx_cost (reg_op, mode, AND, 0, speed);
+		  return true;
+		}
+	    }
+	  // rotl<mode>3_insert_3
+	  if (GET_CODE (right) == ASHIFT
+	      && CONST_INT_P (XEXP (right, 1))
+	      && (INTVAL (XEXP (right, 1))
+		  == exact_log2 (UINTVAL (XEXP (left, 1)) + 1)))
+	    {
+	      rtx reg_op = XEXP (left, 0);
+	      if (!REG_P (reg_op))
+		*total += rtx_cost (reg_op, mode, AND, 0, speed);
+	      reg_op = XEXP (right, 0);
+	      if (!REG_P (reg_op))
+		*total += rtx_cost (reg_op, mode, ASHIFT, 0, speed);
+	      return true;
+	    }
+	  // rotl<mode>3_insert_4
+	  if (GET_CODE (right) == LSHIFTRT
+	      && CONST_INT_P (XEXP (right, 1))
+	      && mode == SImode
+	      && (INTVAL (XEXP (right, 1))
+		  + exact_log2 (-UINTVAL (XEXP (left, 1)))) == 32)
+	    {
+	      rtx reg_op = XEXP (left, 0);
+	      if (!REG_P (reg_op))
+		*total += rtx_cost (reg_op, mode, AND, 0, speed);
+	      reg_op = XEXP (right, 0);
+	      if (!REG_P (reg_op))
+		*total += rtx_cost (reg_op, mode, LSHIFTRT, 0, speed);
+	      return true;
+	    }
+	}
+      return false;
 
     case CLZ:
     case XOR:

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RS6000] rs6000_rtx_costs reduce cost for SETs
  2020-09-15  1:19 [RS6000] rtx_costs Alan Modra
                   ` (5 preceding siblings ...)
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs cost IOR Alan Modra
@ 2020-09-15  1:19 ` Alan Modra
  2020-09-17 17:51   ` Segher Boessenkool
  2020-09-15  1:19 ` [RS6000] rotate and mask constants Alan Modra
  2020-09-15 18:15 ` [RS6000] rtx_costs will schmidt
  8 siblings, 1 reply; 26+ messages in thread
From: Alan Modra @ 2020-09-15  1:19 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Alan Modra

Also use rs6000_cost only for speed.

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost for SETs
	when insn operation cost handled on recursive call.  Only use
	rs6000_cost for speed.  Tidy break/return.  Tidy AND costing.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index fb5fe7969a3..86c90c4d756 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21277,15 +21277,19 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
     case PLUS:
     case MINUS:
-      if (FLOAT_MODE_P (mode))
+      if (speed && FLOAT_MODE_P (mode))
 	*total = rs6000_cost->fp;
       else
 	*total = COSTS_N_INSNS (1);
       return false;
 
     case MULT:
-      if (CONST_INT_P (XEXP (x, 1))
-	  && satisfies_constraint_I (XEXP (x, 1)))
+      if (!speed)
+	/* A little more than one insn so that nothing is tempted to
+	   turn a shift left into a multiply.  */
+	*total = COSTS_N_INSNS (1) + 1;
+      else if (CONST_INT_P (XEXP (x, 1))
+	       && satisfies_constraint_I (XEXP (x, 1)))
 	{
 	  if (INTVAL (XEXP (x, 1)) >= -256
 	      && INTVAL (XEXP (x, 1)) <= 255)
@@ -21304,18 +21308,22 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case FMA:
-      if (mode == SFmode)
+      if (!speed)
+	*total = COSTS_N_INSNS (1) + 1;
+      else if (mode == SFmode)
 	*total = rs6000_cost->fp;
       else
 	*total = rs6000_cost->dmul;
-      break;
+      return false;
 
     case DIV:
     case MOD:
       if (FLOAT_MODE_P (mode))
 	{
-	  *total = mode == DFmode ? rs6000_cost->ddiv
-				  : rs6000_cost->sdiv;
+	  if (!speed)
+	    *total = COSTS_N_INSNS (1) + 2;
+	  else
+	    *total = mode == DFmode ? rs6000_cost->ddiv : rs6000_cost->sdiv;
 	  return false;
 	}
       /* FALLTHRU */
@@ -21334,7 +21342,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	}
       else
 	{
-	  if (GET_MODE (XEXP (x, 1)) == DImode)
+	  if (!speed)
+	    *total = COSTS_N_INSNS (1) + 2;
+	  else if (GET_MODE (XEXP (x, 1)) == DImode)
 	    *total = rs6000_cost->divdi;
 	  else
 	    *total = rs6000_cost->divsi;
@@ -21368,6 +21378,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case AND:
+      *total = COSTS_N_INSNS (1);
       right = XEXP (x, 1);
       if (CONST_INT_P (right))
 	{
@@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	       || left_code == LSHIFTRT)
 	      && rs6000_is_valid_shift_mask (right, left, mode))
 	    {
-	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
-	      if (!CONST_INT_P (XEXP (left, 1)))
-		*total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
-	      *total += COSTS_N_INSNS (1);
+	      rtx reg_op = XEXP (left, 0);
+	      if (!REG_P (reg_op))
+		*total += rtx_cost (reg_op, mode, left_code, 0, speed);
+	      reg_op = XEXP (left, 1);
+	      if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
+		*total += rtx_cost (reg_op, mode, left_code, 1, speed);
 	      return true;
 	    }
 	}
-
-      *total = COSTS_N_INSNS (1);
       return false;
 
     case IOR:
@@ -21519,7 +21530,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       if (outer_code == TRUNCATE
 	  && GET_CODE (XEXP (x, 0)) == MULT)
 	{
-	  if (mode == DImode)
+	  if (!speed)
+	    *total = COSTS_N_INSNS (1) + 1;
+	  else if (mode == DImode)
 	    *total = rs6000_cost->muldi;
 	  else
 	    *total = rs6000_cost->mulsi;
@@ -21554,11 +21567,16 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
     case FIX:
     case UNSIGNED_FIX:
     case FLOAT_TRUNCATE:
-      *total = rs6000_cost->fp;
+      if (!speed)
+	*total = COSTS_N_INSNS (1);
+      else
+	*total = rs6000_cost->fp;
       return false;
 
     case FLOAT_EXTEND:
-      if (mode == DFmode)
+      if (!speed)
+	*total = COSTS_N_INSNS (1);
+      else if (mode == DFmode)
 	*total = rs6000_cost->sfdf_convert;
       else
 	*total = rs6000_cost->fp;
@@ -21576,7 +21594,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	  *total = rs6000_cost->fp;
 	  return false;
 	}
-      break;
+      return false;
 
     case NE:
     case EQ:
@@ -21614,13 +21632,32 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	  *total = 0;
 	  return true;
 	}
-      break;
+      return false;
+
+    case SET:
+      /* The default cost of a SET is the number of general purpose
+	 regs being set multiplied by COSTS_N_INSNS (1).  That only
+	 works where the incremental cost of the operation and
+	 operands is zero, when the operation performed can be done in
+	 one instruction.  For other cases where we add COSTS_N_INSNS
+	 for some operation (see point 5 above), COSTS_N_INSNS (1)
+	 should be subtracted from the total cost.  */
+      {
+	rtx_code src_code = GET_CODE (SET_SRC (x));
+	if (src_code == CONST_INT
+	    || src_code == CONST_DOUBLE
+	    || src_code == CONST_WIDE_INT)
+	  return false;
+	int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
+			+ rtx_cost (SET_DEST (x), mode, SET, 0, speed));
+	if (set_cost >= COSTS_N_INSNS (1))
+	  *total += set_cost - COSTS_N_INSNS (1);
+	return true;
+      }
 
     default:
-      break;
+      return false;
     }
-
-  return false;
 }
 
 /* Debug form of r6000_rtx_costs that is selected if -mdebug=cost.  */

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RS6000] rotate and mask constants
  2020-09-15  1:19 [RS6000] rtx_costs Alan Modra
                   ` (6 preceding siblings ...)
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs reduce cost for SETs Alan Modra
@ 2020-09-15  1:19 ` Alan Modra
  2020-09-15  7:16   ` Alan Modra
  2020-09-15 18:15 ` [RS6000] rtx_costs will schmidt
  8 siblings, 1 reply; 26+ messages in thread
From: Alan Modra @ 2020-09-15  1:19 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Alan Modra

Implement more two insn constants.

	PR 94393
	* config/rs6000/rs6000.c (rotate_and_mask_constant): New function.
	(num_insns_constant_multi, rs6000_emit_set_long_const): Use it here.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 86c90c4d756..1848cb57ef8 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1112,6 +1112,8 @@ static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_builtin_vectorized_libmass (combined_fn, tree, tree);
 static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
+static bool rotate_and_mask_constant (unsigned HOST_WIDE_INT, HOST_WIDE_INT *,
+				      int *, unsigned HOST_WIDE_INT *);
 static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool);
 static bool rs6000_debug_rtx_costs (rtx, machine_mode, int, int, int *, bool);
 static int rs6000_debug_address_cost (rtx, machine_mode, addr_space_t,
@@ -5789,7 +5791,8 @@ num_insns_constant_multi (HOST_WIDE_INT value, machine_mode mode)
 	  /* We won't get more than 2 from num_insns_constant_gpr
 	     except when TARGET_POWERPC64 and mode is DImode or
 	     wider, so the register mode must be DImode.  */
-	  && rs6000_is_valid_and_mask (GEN_INT (low), DImode))
+	  && (rs6000_is_valid_and_mask (GEN_INT (low), DImode)
+	      || rotate_and_mask_constant (low, NULL, NULL, NULL)))
 	insns = 2;
       total += insns;
       /* If BITS_PER_WORD is the number of bits in HOST_WIDE_INT, doing
@@ -9420,6 +9423,82 @@ rs6000_emit_set_const (rtx dest, rtx source)
   return true;
 }
 
+/* Detect cases where a constant can be formed by li; rldicl, li; rldicr,
+   or lis; rldicl.  */
+
+static bool
+rotate_and_mask_constant (unsigned HOST_WIDE_INT c,
+			  HOST_WIDE_INT *val, int *shift,
+			  unsigned HOST_WIDE_INT *mask)
+{
+  /* We know C can't be formed by lis,addi so that puts constraints
+     on the max leading zeros.  lead_zeros won't be larger than
+     HOST_BITS_PER_WIDE_INT - 31.  */
+  int lead_zeros = wi::clz (c);
+  int non_zero = HOST_BITS_PER_WIDE_INT - lead_zeros;
+  /* 00...01xxxxxxxxxxxxxx0..00 (up to 14 x's, any number of leading
+     and trailing 0's) can be implemented as a li, rldicl.  */
+  if ((c & ~(HOST_WIDE_INT_UC (0x7fff) << (non_zero - 15))) == 0)
+    {
+      /* eg. c = 1100 0000 0000 ... 0000
+	 -> val = 0x3000, shift = 49, mask = -1ull.  */
+      if (val)
+	{
+	  *val = c >> (non_zero - 15);
+	  *shift = non_zero - 15;
+	  *mask = HOST_WIDE_INT_M1U;
+	}
+      return true;
+    }
+  /* 00...01xxxxxxxxxxxxxxx1..11 (up to 15 x's, any number of leading
+     0's and trailing 1's) can be implemented as a li, rldicl.  */
+  if ((c | (HOST_WIDE_INT_M1U << (non_zero - 16))) == HOST_WIDE_INT_M1U)
+    {
+      /* eg. c = 0000 1011 1111 1111 ... 1111
+	 -> val = sext(0xbfff), shift = 44, mask = 0x0fffffffffffffff.  */
+      if (val)
+	{
+	  *val = (((c >> (non_zero - 16)) & 0xffff) ^ 0x8000) - 0x8000;
+	  *shift = non_zero - 16;
+	  *mask = HOST_WIDE_INT_M1U >> lead_zeros;
+	}
+      return true;
+    }
+  /* 00...01xxxxxxxxxxxxxxx00..01..11 (up to 15 x's followed by 16 0's,
+     any number of leading 0's and trailing 1's) can be implemented as
+     lis, rldicl.  */
+  if (non_zero >= 32
+      && (c & ((HOST_WIDE_INT_1U << (non_zero - 16))
+	       - (HOST_WIDE_INT_1U << (non_zero - 32)))) == 0
+      && (c | (HOST_WIDE_INT_M1U << (non_zero - 32))) == HOST_WIDE_INT_M1U)
+    {
+      if (val)
+	{
+	  *val = (((c >> (non_zero - 32)) & 0xffffffff)
+		  ^ 0x80000000) - 0x80000000;
+	  *shift = non_zero - 32;
+	  *mask = HOST_WIDE_INT_M1U >> lead_zeros;
+	}
+      return true;
+    }
+  /* 11..1xxxxxxxxxxxxxxx0..0 (up to 15 x's, any number of leading 1's
+     and trailing 0's) can be implemented as a li, rldicr.  */
+  int trail_zeros = wi::ctz (c);
+  if (trail_zeros >= 48
+      || ((c | ((HOST_WIDE_INT_1U << (trail_zeros + 15)) - 1))
+	  == HOST_WIDE_INT_M1U))
+    {
+      if (val)
+	{
+	  *val = (((c >> trail_zeros) & 0xffff) ^ 0x8000) - 0x8000;
+	  *shift = trail_zeros;
+	  *mask = HOST_WIDE_INT_M1U << trail_zeros;
+	}
+      return true;
+    }
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
    Output insns to set DEST equal to the constant C as a series of
    lis, ori and shl instructions.  */
@@ -9429,6 +9508,9 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   rtx temp;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
+  HOST_WIDE_INT val = c;
+  int shift;
+  unsigned HOST_WIDE_INT mask;
 
   ud1 = c & 0xffff;
   c = c >> 16;
@@ -9454,6 +9536,15 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 			gen_rtx_IOR (DImode, copy_rtx (temp),
 				     GEN_INT (ud1)));
     }
+  else if (rotate_and_mask_constant (val, &val, &shift, &mask))
+    {
+      temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+      emit_move_insn (temp, GEN_INT (val));
+      rtx x = gen_rtx_ROTATE (DImode, copy_rtx (temp), GEN_INT (shift));
+      if (mask != HOST_WIDE_INT_M1U)
+	x = gen_rtx_AND (DImode, x, GEN_INT (mask));
+      emit_move_insn (dest, x);
+    }
   else if (ud3 == 0 && ud4 == 0)
     {
       temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rotate and mask constants
  2020-09-15  1:19 ` [RS6000] rotate and mask constants Alan Modra
@ 2020-09-15  7:16   ` Alan Modra
  2020-09-21 15:56     ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Modra @ 2020-09-15  7:16 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Peter Bergner

On Tue, Sep 15, 2020 at 10:49:46AM +0930, Alan Modra wrote:
> Implement more two insn constants.

And tests.  rot_cst1 checks the values generated, rot_cst2 checks
instruction count.

	* gcc.target/powerpc/rot_cst.h,
	* gcc.target/powerpc/rot_cst1.c,
	* gcc.target/powerpc/rot_cst2.c: New tests.

diff --git a/gcc/testsuite/gcc.target/powerpc/rot_cst.h b/gcc/testsuite/gcc.target/powerpc/rot_cst.h
new file mode 100644
index 00000000000..0d100d61233
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/rot_cst.h
@@ -0,0 +1,269 @@
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c1 (void)
+{
+  return 0xc000000000000000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c2 (void)
+{
+  return 0xc00000000000000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c3 (void)
+{
+  return 0xc0000000000000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c4 (void)
+{
+  return 0xc000000000000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c5 (void)
+{
+  return 0xc00000000000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c6 (void)
+{
+  return 0xc0000000000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c7 (void)
+{
+  return 0xc000000000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c8 (void)
+{
+  return 0xc00000000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c9 (void)
+{
+  return 0xc0000000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c10 (void)
+{
+  return 0xc000000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c11 (void)
+{
+  return 0xc00000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c12 (void)
+{
+  return 0xc0000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c13 (void)
+{
+  return 0xc000ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c14 (void)
+{
+  return 0xc00ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c15 (void)
+{
+  return 0xc0ULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+c16 (void)
+{
+  return 0xcULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b1 (void)
+{
+  return 0xbfffffffffffffffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b2 (void)
+{
+  return 0xbffffffffffffffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b3 (void)
+{
+  return 0xbfffffffffffffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b4 (void)
+{
+  return 0xbffffffffffffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b5 (void)
+{
+  return 0xbfffffffffffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b6 (void)
+{
+  return 0xbffffffffffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b7 (void)
+{
+  return 0xbfffffffffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b8 (void)
+{
+  return 0xbffffffffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b9 (void)
+{
+  return 0xbfffffffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b10 (void)
+{
+  return 0xbffffffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b11 (void)
+{
+  return 0xbfffffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b12 (void)
+{
+  return 0xbffffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b13 (void)
+{
+  return 0xbfffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b14 (void)
+{
+  return 0xbffULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b15 (void)
+{
+  return 0xbfULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+b16 (void)
+{
+  return 0xbULL;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r1 (void)
+{
+  return -0x124ULL << 48;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r2 (void)
+{
+  return -0x124ULL << 44;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r3 (void)
+{
+  return -0x124ULL << 40;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r4 (void)
+{
+  return -0x124ULL << 32;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r5 (void)
+{
+  return -0x124ULL << 28;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r6 (void)
+{
+  return -0x124ULL << 24;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r7 (void)
+{
+  return -0x124ULL << 20;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r8 (void)
+{
+  return -0x124ULL << 16;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r9 (void)
+{
+  return -0x124ULL << 12;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r10 (void)
+{
+  return -0x124ULL << 8;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r11 (void)
+{
+  return -0x124ULL << 4;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+r12 (void)
+{
+  return -0x124ULL << 0;
+}
+
+unsigned long long __attribute__ ((__noinline__, __noclone__))
+lis1 (void)
+{
+  return 0x89ab0000ffffULL;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/rot_cst1.c b/gcc/testsuite/gcc.target/powerpc/rot_cst1.c
new file mode 100644
index 00000000000..30219dd1438
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/rot_cst1.c
@@ -0,0 +1,68 @@
+/* { dg-do run { target lp64 } } */
+/* { dg-options "-O2" } */
+
+#include "rot_cst.h"
+
+struct fun {
+  unsigned long long (*f) (void);
+  unsigned long long val;
+};
+
+volatile struct fun values[] = {
+  { c1,  0xc000000000000000ULL },
+  { c2,  0xc00000000000000ULL },
+  { c3,  0xc0000000000000ULL },
+  { c4,  0xc000000000000ULL },
+  { c5,  0xc00000000000ULL },
+  { c6,  0xc0000000000ULL },
+  { c7,  0xc000000000ULL },
+  { c8,  0xc00000000ULL },
+  { c9,  0xc0000000ULL },
+  { c10, 0xc000000ULL },
+  { c11, 0xc00000ULL },
+  { c12, 0xc0000ULL },
+  { c13, 0xc000ULL },
+  { c14, 0xc00ULL },
+  { c15, 0xc0ULL },
+  { c16, 0xcULL },
+  { b1,  0xbfffffffffffffffULL },
+  { b2,  0xbffffffffffffffULL },
+  { b3,  0xbfffffffffffffULL },
+  { b4,  0xbffffffffffffULL },
+  { b5,  0xbfffffffffffULL },
+  { b6,  0xbffffffffffULL },
+  { b7,  0xbfffffffffULL },
+  { b8,  0xbffffffffULL },
+  { b9,  0xbfffffffULL },
+  { b10, 0xbffffffULL },
+  { b11, 0xbfffffULL },
+  { b12, 0xbffffULL },
+  { b13, 0xbfffULL },
+  { b14, 0xbffULL },
+  { b15, 0xbfULL },
+  { b16, 0xbULL },
+  { r1,  -0x124ULL << 48 },
+  { r2,  -0x124ULL << 44 },
+  { r3,  -0x124ULL << 40 },
+  { r4,  -0x124ULL << 32 },
+  { r5,  -0x124ULL << 28 },
+  { r6,  -0x124ULL << 24 },
+  { r7,  -0x124ULL << 20 },
+  { r8,  -0x124ULL << 16 },
+  { r9,  -0x124ULL << 12 },
+  { r10, -0x124ULL << 8 },
+  { r11, -0x124ULL << 4 },
+  { r12, -0x124ULL << 0 },
+  { lis1, 0x89ab0000ffffULL }
+};
+
+int
+main (void)
+{
+  volatile struct fun *t;
+
+  for (t = values; t < values + sizeof (values) / sizeof (values[0]); ++t)
+    if (t->f () != t->val)
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/rot_cst2.c b/gcc/testsuite/gcc.target/powerpc/rot_cst2.c
new file mode 100644
index 00000000000..d5d45ead043
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/rot_cst2.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2" } */
+
+#include "rot_cst.h"
+
+/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 122 } } */

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rtx_costs
  2020-09-15  1:19 [RS6000] rtx_costs Alan Modra
                   ` (7 preceding siblings ...)
  2020-09-15  1:19 ` [RS6000] rotate and mask constants Alan Modra
@ 2020-09-15 18:15 ` will schmidt
  8 siblings, 0 replies; 26+ messages in thread
From: will schmidt @ 2020-09-15 18:15 UTC (permalink / raw)
  To: Alan Modra, Segher Boessenkool; +Cc: gcc-patches

On Tue, 2020-09-15 at 10:49 +0930, Alan Modra via Gcc-patches wrote:
> This patch series fixes a number of issues in rs6000_rtx_costs, the
> aim being to provide costing somewhat closer to reality.  Probably
> the
> most important patch of the series is patch 4, which just adds a
> comment.  Without the analysis that went into that comment, I found
> myself making what seemed to be good changes but which introduced
> regressions.
> 
> So far these changes have not introduced any testsuite regressions
> on --with-cpu=power8 and --with-cpu=power9 all lang bootstraps on
> powerpc64le-linux.  Pat spec tested on power9 against a baseline
> master from a few months ago, seeing a few small improvements and no
> degradations above the noise.

I've read through all the patches in this series, (including the tests
that were sent a bit later).  Your use of comments does a good job
helping describe whats going on.

One comment/question/point of clarity for the AND patch that I'll send
separately.  

That said, the series lgtm.  :-) 

thanks, 
-Will


> 
> Some notes:
> 
> Examination of varasm.o shows quite a number of cases where
> if-conversion succeeds due to different seq_cost.  One example:
> 
> extern int foo ();
> int
> default_assemble_integer (unsigned size)
> {
>   extern unsigned long rs6000_isa_flags;
> 
>   if (size > (!((rs6000_isa_flags & (1UL << 35)) != 0) ? 4 : 8))
>     return 0;
>   return foo ();
> }
> 
> This rather horrible code turns the rs6000_isa_flags value into
> either
> 4 or 8:
> 	rldicr 9,9,28,0
> 	srdi 9,9,28
> 	addic 9,9,-1
> 	subfe 9,9,9
> 	rldicr 9,9,0,61
> 	addi 9,9,8
> Better would be
> 	rldicl 9,9,29,63
> 	sldi 9,9,2
> 	addi 9,9,4
> 
> There is also a "rlwinm ra,rb,3,0,26" instead of "rldicr ra,rb,3,60",
> and "li r31,0x4000; rotldi r31,r31,17" vs.
> "lis r31,0x8000; clrldi r31,r31,32".
> Neither of these is a real change.  I saw one occurrence of a 5 insn
> sequence being replaced with a load from memory in
> default_function_rodata_section, for ".rodata", and others elsewhere.
> 
> Sometimes correct insn cost leads to unexpected results.  For
> example:
> 
> extern unsigned bar (void);
> unsigned
> f1 (unsigned a)
> {
>   if ((a & 0x01000200) == 0x01000200)
>     return bar ();
>   return 0;
> }
> 
> emits for a & 0x01000200
>  (set (reg) (and (reg) (const_int 0x01000200)))
> at expand time (two rlwinm insns) rather than the older
>  (set (reg) (const_int 0x01000200))
>  (set (reg) (and (reg) (reg)))
> which is three insns.  However, since 0x01000200 is needed later the
> older code after optimisation is smaller.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rs6000_rtx_costs for AND
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs for AND Alan Modra
@ 2020-09-15 18:15   ` will schmidt
  2020-09-16  7:24     ` Alan Modra
  0 siblings, 1 reply; 26+ messages in thread
From: will schmidt @ 2020-09-15 18:15 UTC (permalink / raw)
  To: Alan Modra, Segher Boessenkool; +Cc: gcc-patches

On Tue, 2020-09-15 at 10:49 +0930, Alan Modra via Gcc-patches wrote:
> The existing "case AND" in this function is not sufficient for
> optabs.c:avoid_expensive_constant usage, where the AND is passed in
> outer_code.
> 
> 	* config/rs6000/rs6000.c (rs6000_rtx_costs): Move costing for
> 	AND to CONST_INT case.
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 32044d33977..523d029800a 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -21150,16 +21150,13 @@ rs6000_rtx_costs (rtx x, machine_mode mode,
> int outer_code,
>  	    || outer_code == MINUS)
>  	   && (satisfies_constraint_I (x)
>  	       || satisfies_constraint_L (x)))
> -	  || (outer_code == AND
> -	      && (satisfies_constraint_K (x)
> -		  || (mode == SImode
> -		      ? satisfies_constraint_L (x)
> -		      : satisfies_constraint_J (x))))
> -	  || ((outer_code == IOR || outer_code == XOR)
> +	  || ((outer_code == AND || outer_code == IOR || outer_code ==
> XOR)
>  	      && (satisfies_constraint_K (x)
>  		  || (mode == SImode
>  		      ? satisfies_constraint_L (x)
>  		      : satisfies_constraint_J (x))))
> +	  || (outer_code == AND
> +	      && rs6000_is_valid_and_mask (x, mode))
>  	  || outer_code == ASHIFT
>  	  || outer_code == ASHIFTRT
>  	  || outer_code == LSHIFTRT
> @@ -21196,7 +21193,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode,
> int outer_code,
>  		    || outer_code == IOR
>  		    || outer_code == XOR)
>  		   && (INTVAL (x)
> -		       & ~ (unsigned HOST_WIDE_INT) 0xffffffff) == 0))
> +		       & ~ (unsigned HOST_WIDE_INT) 0xffffffff) == 0)
> +	       || (outer_code == AND
> +		   && rs6000_is_valid_2insn_and (x, mode)))
>  	{
>  	  *total = COSTS_N_INSNS (1);
>  	  return true;
> @@ -21334,26 +21333,6 @@ rs6000_rtx_costs (rtx x, machine_mode mode,
> int outer_code,
>  	      *total += COSTS_N_INSNS (1);
>  	      return true;
>  	    }
> -
> -	  /* rotate-and-mask (no rotate), andi., andis.: 1 insn.  */
> -	  HOST_WIDE_INT val = INTVAL (XEXP (x, 1));
> -	  if (rs6000_is_valid_and_mask (XEXP (x, 1), mode)
> -	      || (val & 0xffff) == val
> -	      || (val & 0xffff0000) == val
> -	      || ((val & 0xffff) == 0 && mode == SImode))
> -	    {
> -	      *total = rtx_cost (left, mode, AND, 0, speed);
> -	      *total += COSTS_N_INSNS (1);
> -	      return true;
> -	    }
> -
> -	  /* 2 insns.  */
> -	  if (rs6000_is_valid_2insn_and (XEXP (x, 1), mode))
> -	    {
> -	      *total = rtx_cost (left, mode, AND, 0, speed);
> -	      *total += COSTS_N_INSNS (2);
> -	      return true;
> -	    }
>  	}

It's not exactly 1x1..   I tentatively conclude that the /* rotate-and-
mask */  lump of code here does go dead with the "case AND" changes
above.

thanks
-Will

> 
>        *total = COSTS_N_INSNS (1);


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] Count rldimi constant insns
  2020-09-15  1:19 ` [RS6000] Count rldimi constant insns Alan Modra
@ 2020-09-15 22:29   ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2020-09-15 22:29 UTC (permalink / raw)
  To: Alan Modra; +Cc: gcc-patches

On Tue, Sep 15, 2020 at 10:49:39AM +0930, Alan Modra wrote:
> rldimi is generated by rs6000_emit_set_long_const when the high and
> low 32 bits of a 64-bit constant are equal.
> 
> 	* config/rs6000/rs6000.c (num_insns_constant_gpr): Count rldimi
> 	constants correctly.

Wow, did I miss that?  Whoops.  (That was PR93012, 72b2f3317b44.)

Okay for trunk.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rs6000_rtx_costs for PLUS/MINUS constant
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs for PLUS/MINUS constant Alan Modra
@ 2020-09-15 22:31   ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2020-09-15 22:31 UTC (permalink / raw)
  To: Alan Modra; +Cc: gcc-patches

On Tue, Sep 15, 2020 at 10:49:40AM +0930, Alan Modra wrote:
> These functions do behave a little differently for SImode, so the
> mode should be passed.
> 
> 	* config/rs6000/rs6000.c (rs6000_rtx_costs): Pass mode to
> 	reg_or_add_cint_operand and reg_or_sub_cint_operand.

Okay for trunk.  Thanks!

(Btw, please use [patch 2/6] etc. markers?  It helps refer to them :-) )


Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rs6000_rtx_costs for AND
  2020-09-15 18:15   ` will schmidt
@ 2020-09-16  7:24     ` Alan Modra
  0 siblings, 0 replies; 26+ messages in thread
From: Alan Modra @ 2020-09-16  7:24 UTC (permalink / raw)
  To: will schmidt; +Cc: Segher Boessenkool, gcc-patches

On Tue, Sep 15, 2020 at 01:15:34PM -0500, will schmidt wrote:
> On Tue, 2020-09-15 at 10:49 +0930, Alan Modra via Gcc-patches wrote:
> > The existing "case AND" in this function is not sufficient for
> > optabs.c:avoid_expensive_constant usage, where the AND is passed in
> > outer_code.
> > 
> > 	* config/rs6000/rs6000.c (rs6000_rtx_costs): Move costing for
> > 	AND to CONST_INT case.
[snip]

> It's not exactly 1x1..   I tentatively conclude that the /* rotate-and-
> mask */  lump of code here does go dead with the "case AND" changes
> above.

It's not so much dead as duplicate.  When rtx_costs returns false, as
it will now on sub-expressions matching the removed code, it will be
called recursively on the sub-expressions with outer_code set to the
operation, AND in this case.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rs6000_rtx_costs comment
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs comment Alan Modra
@ 2020-09-16 23:21   ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2020-09-16 23:21 UTC (permalink / raw)
  To: Alan Modra; +Cc: gcc-patches

Hi!

This took a while to digest, sorry.

On Tue, Sep 15, 2020 at 10:49:42AM +0930, Alan Modra wrote:
> +   1) Calls from places like optabs.c:avoid_expensive_constant will
> +   come here with OUTER_CODE set to an operation such as AND with X
> +   being a CONST_INT or other CONSTANT_P type.  This will be compared
> +   against set_src_cost, where we'll come here with OUTER_CODE as SET
> +   and X the same constant.

This (and similar) reasons are why I still haven't made set_src_cost
based on insn_cost -- it is in some places compared to some rtx_cost.

> +   2) Calls from places like combine:distribute_and_simplify_rtx are
> +   asking whether a possibly quite complex SET_SRC can be implemented
> +   more cheaply than some other logically equivalent SET_SRC.

It is comparing the set_src_cost of two equivalent formulations, yeah.
This is one place where set_src_cost can be pretty easily replaced by
insn_cost (combine uses that in most other places, already, and that
was a quite useful change).

> +   3) Calls from places like default_noce_conversion_profitable_p will
> +   come here via seq_cost and pass the pattern of a SET insn in X.

The pattern of the single SET in any instruction that is single_set,
yeah.

> +   Presuming the insn is valid and set_dest a reg, rs6000_rtx_costs
> +   will next see the SET_SRC.  The overall cost should be comparable
> +   to rs6000_insn_cost since the code is comparing one insn sequence
> +   (some of which may be costed by insn_cost) against another insn
> +   sequence.

Yes.  And our rtx_cost misses incredibly many cases, but most common
things are handled okay.

> +   4) Calls from places like cprop.c:try_replace_reg will come here
> +   with OUTER_CODE as INSN, and X either a valid pattern of a SET or
> +   one where some registers have been replaced with constants.  The
> +   replacements may make the SET invalid, for example if
> +     (set (reg1) (and (reg2) (const_int 0xfff)))
> +   replaces reg2 as
> +     (set (reg1) (and (symbol_ref) (const_int 0xfff)))
> +   then the replacement can't be implemented in one instruction and
> +   really the cost should be higher by one instruction.  However,
> +   the cost for invalid insns doesn't matter much except that a
> +   higher cost may lead to their rejection earlier.

Yup.  This uses set_rtx_cost, which also ideally will use insn_cost one
day.

> +   5) fwprop.c:should_replace_address puts yet another wrinkle on this
> +   function, where we prefer an address calculation that is more
> +   complex yet has the same address_cost.  In this case "more
> +   complex" is determined by having a higher set_src_cost.  So for
> +   example, if we want a plain (reg) address to be replaced with
> +   (plus (reg) (const)) when possible then PLUS needs to cost more
> +   than zero here.  */

Maybe it helps if you more prominenty mention set_rtx_cost and
set_src_cost?  Either way, okay for trunk.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rs6000_rtx_costs multi-insn constants
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs multi-insn constants Alan Modra
@ 2020-09-16 23:28   ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2020-09-16 23:28 UTC (permalink / raw)
  To: Alan Modra; +Cc: gcc-patches

On Tue, Sep 15, 2020 at 10:49:43AM +0930, Alan Modra wrote:
> This small patch to rs6000_rtx_const considerably improves code

(_costs)

> generated for large constants in 64-bit code, teaching gcc that it is
> better to load a constant from memory than to generate a sequence of
> up to five dependent instructions.  Note that the rs6000 backend does
> generate large constants as loads from memory at expand time but
> optimisation passes replace them with SETs of the value due to not
> having correct costs.
> 
> 	PR 94393
> 	* config/rs6000/rs6000.c (rs6000_rtx_costs): Cost multi-insn
> 	constants.

Okay for trunk.  Note that some p10 insns take a floating point
immediate, but those need to be handled specially anyway.

Thanks!


Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rs6000_rtx_costs cost IOR
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs cost IOR Alan Modra
@ 2020-09-17  0:02   ` Segher Boessenkool
  2020-09-17  3:42     ` Alan Modra
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2020-09-17  0:02 UTC (permalink / raw)
  To: Alan Modra; +Cc: gcc-patches

Hi!

On Tue, Sep 15, 2020 at 10:49:44AM +0930, Alan Modra wrote:
> 	* config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR.

>      case IOR:
> -      /* FIXME */
>        *total = COSTS_N_INSNS (1);
> -      return true;

Hey this was okay for over five years :-)

> +      left = XEXP (x, 0);
> +      if (GET_CODE (left) == AND
> +	  && CONST_INT_P (XEXP (left, 1)))

Add a comment that this is the integer insert insns?

> +	      // rotlsi3_insert_5

But use /* comments */.

> +		  /* Test both regs even though the one in the mask is
> +		     constrained to be equal to the output.  Increasing
> +		     cost may well result in rejecting an invalid insn
> +		     earlier.  */

Is that ever actually useful?

So this new block is pretty huge.  Can it easily be factored to a
separate function?  Just the insert insns part, not all IOR.

Okay for trunk with the comments changed to the correct syntax, and
factoring masked insert out to a separate function pre-approved if you
want to do that.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rs6000_rtx_costs cost IOR
  2020-09-17  0:02   ` Segher Boessenkool
@ 2020-09-17  3:42     ` Alan Modra
  2020-09-21 15:49       ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Modra @ 2020-09-17  3:42 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On Wed, Sep 16, 2020 at 07:02:06PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 15, 2020 at 10:49:44AM +0930, Alan Modra wrote:
> > 	* config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR.
> 
> >      case IOR:
> > -      /* FIXME */
> >        *total = COSTS_N_INSNS (1);
> > -      return true;
> 
> Hey this was okay for over five years :-)
> 
> > +      left = XEXP (x, 0);
> > +      if (GET_CODE (left) == AND
> > +	  && CONST_INT_P (XEXP (left, 1)))
> 
> Add a comment that this is the integer insert insns?
> 
> > +	      // rotlsi3_insert_5
> 
> But use /* comments */.
> 
> > +		  /* Test both regs even though the one in the mask is
> > +		     constrained to be equal to the output.  Increasing
> > +		     cost may well result in rejecting an invalid insn
> > +		     earlier.  */
> 
> Is that ever actually useful?

Possibly not in this particular case, but I did see cases where
invalid insns were rejected early by costing non-reg sub-expressions.

Beside that, the default position on rtx_costs paths that return true
should be to cost any sub-expressions unless you know for sure they
are zero cost.  And yes, we fail to do that for some cases,
eg. mul_highpart.

> So this new block is pretty huge.  Can it easily be factored to a
> separate function?  Just the insert insns part, not all IOR.

Done in my local tree.

> Okay for trunk with the comments changed to the correct syntax, and
> factoring masked insert out to a separate function pre-approved if you
> want to do that.  Thanks!

I'll hold off committing until the whole rtx_costs patch series is
reviewed (not counting the rotate_and_mask_constant patch).

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rs6000_rtx_costs reduce cost for SETs
  2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs reduce cost for SETs Alan Modra
@ 2020-09-17 17:51   ` Segher Boessenkool
  2020-09-18  3:38     ` Alan Modra
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2020-09-17 17:51 UTC (permalink / raw)
  To: Alan Modra; +Cc: gcc-patches

Hi!

On Tue, Sep 15, 2020 at 10:49:45AM +0930, Alan Modra wrote:
> Also use rs6000_cost only for speed.

More directly: use something completely different for !speed, namely,
code size.

> -      if (CONST_INT_P (XEXP (x, 1))
> -	  && satisfies_constraint_I (XEXP (x, 1)))
> +      if (!speed)
> +	/* A little more than one insn so that nothing is tempted to
> +	   turn a shift left into a multiply.  */
> +	*total = COSTS_N_INSNS (1) + 1;

Please don't.  We have a lot of code elsewhere to handle this directly,
already.  Also, this is just wrong for size.  Five shifts is *not*
better than four muls.  If that is the only way to get good results,
than unfortunately we probably have to; but do not do this without any
proof.

>      case FMA:
> -      if (mode == SFmode)
> +      if (!speed)
> +	*total = COSTS_N_INSNS (1) + 1;

Not here, either.

>      case DIV:
>      case MOD:
>        if (FLOAT_MODE_P (mode))
>  	{
> -	  *total = mode == DFmode ? rs6000_cost->ddiv
> -				  : rs6000_cost->sdiv;
> +	  if (!speed)
> +	    *total = COSTS_N_INSNS (1) + 2;

And why + 2 even?

> -	  if (GET_MODE (XEXP (x, 1)) == DImode)
> +	  if (!speed)
> +	    *total = COSTS_N_INSNS (1) + 2;
> +	  else if (GET_MODE (XEXP (x, 1)) == DImode)
>  	    *total = rs6000_cost->divdi;
>  	  else
>  	    *total = rs6000_cost->divsi;

(more)

> @@ -21368,6 +21378,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
>        return false;
>  
>      case AND:
> +      *total = COSTS_N_INSNS (1);
>        right = XEXP (x, 1);
>        if (CONST_INT_P (right))
>  	{
> @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
>  	       || left_code == LSHIFTRT)
>  	      && rs6000_is_valid_shift_mask (right, left, mode))
>  	    {
> -	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> -	      if (!CONST_INT_P (XEXP (left, 1)))
> -		*total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
> -	      *total += COSTS_N_INSNS (1);
> +	      rtx reg_op = XEXP (left, 0);
> +	      if (!REG_P (reg_op))
> +		*total += rtx_cost (reg_op, mode, left_code, 0, speed);
> +	      reg_op = XEXP (left, 1);
> +	      if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
> +		*total += rtx_cost (reg_op, mode, left_code, 1, speed);
>  	      return true;
>  	    }
>  	}
> -
> -      *total = COSTS_N_INSNS (1);
>        return false;

This doesn't improve anything?  It just makes it different from all
surrounding code?

> @@ -21519,7 +21530,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
>        if (outer_code == TRUNCATE
>  	  && GET_CODE (XEXP (x, 0)) == MULT)
>  	{
> -	  if (mode == DImode)
> +	  if (!speed)
> +	    *total = COSTS_N_INSNS (1) + 1;

(more)

> +    case SET:
> +      /* The default cost of a SET is the number of general purpose
> +	 regs being set multiplied by COSTS_N_INSNS (1).  That only
> +	 works where the incremental cost of the operation and
> +	 operands is zero, when the operation performed can be done in
> +	 one instruction.  For other cases where we add COSTS_N_INSNS
> +	 for some operation (see point 5 above), COSTS_N_INSNS (1)
> +	 should be subtracted from the total cost.  */

What does "incremental cost" mean there?  If what increases?

> +      {
> +	rtx_code src_code = GET_CODE (SET_SRC (x));
> +	if (src_code == CONST_INT
> +	    || src_code == CONST_DOUBLE
> +	    || src_code == CONST_WIDE_INT)
> +	  return false;
> +	int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> +			+ rtx_cost (SET_DEST (x), mode, SET, 0, speed));

This should use set_src_cost, if anything.  But that will recurse then,
currently?  Ugh.

Using rtx_cost for SET_DEST is problematic, too.

What (if anything!) calls this for a SET?  Oh, set_rtx_cost still does
that, hrm.

rtx_cost costs RTL *expressions*.  Not instructions.  Except where it is
(ab)used for that, sigh.

Many targets have something for it already, but all quite different from
this.

> +	if (set_cost >= COSTS_N_INSNS (1))
> +	  *total += set_cost - COSTS_N_INSNS (1);

I don't understand this part at all, for example.  Why not just
  *total += set_cost - COSTS_N_INSNS (1);
?  If set_cost is lower than one insn's cost, don't we have a problem
already?


Generic things.  Please split this patch up when sending it again, it
does too many different things, and many of those are not obvious.

All such changes that aren't completely obvious (like the previous ones
were) should have some measurement.  We are in stage1, and we will
notice (non-trivial) degradations, but if we can expect degradations
(like for this patch), it needs benchmarking.

Since you add !speed all over the place, maybe we should just have a
separate function that does !speed?  It looks like quite a few things
will simplify.


Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rs6000_rtx_costs reduce cost for SETs
  2020-09-17 17:51   ` Segher Boessenkool
@ 2020-09-18  3:38     ` Alan Modra
  2020-09-18 18:13       ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Modra @ 2020-09-18  3:38 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On Thu, Sep 17, 2020 at 12:51:25PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 15, 2020 at 10:49:45AM +0930, Alan Modra wrote:
> > Also use rs6000_cost only for speed.
> 
> More directly: use something completely different for !speed, namely,
> code size.

Yes, that might be better.

> > -      if (CONST_INT_P (XEXP (x, 1))
> > -	  && satisfies_constraint_I (XEXP (x, 1)))
> > +      if (!speed)
> > +	/* A little more than one insn so that nothing is tempted to
> > +	   turn a shift left into a multiply.  */
> > +	*total = COSTS_N_INSNS (1) + 1;
> 
> Please don't.  We have a lot of code elsewhere to handle this directly,
> already.  Also, this is just wrong for size.  Five shifts is *not*
> better than four muls.  If that is the only way to get good results,
> than unfortunately we probably have to; but do not do this without any
> proof.

Huh.  If a cost of 5 is "just wrong for size" then you prefer a cost
of 12 for example (power9 mulsi or muldi rs6000_cost)?  Noticing that
result for !speed rs6000_rtx_costs is the entire basis for the !speed
changes.  I don't have any proof that this is correct.

> >      case FMA:
> > -      if (mode == SFmode)
> > +      if (!speed)
> > +	*total = COSTS_N_INSNS (1) + 1;
> 
> Not here, either.
> 
> >      case DIV:
> >      case MOD:
> >        if (FLOAT_MODE_P (mode))
> >  	{
> > -	  *total = mode == DFmode ? rs6000_cost->ddiv
> > -				  : rs6000_cost->sdiv;
> > +	  if (!speed)
> > +	    *total = COSTS_N_INSNS (1) + 2;
> 
> And why + 2 even?
> 
> > -	  if (GET_MODE (XEXP (x, 1)) == DImode)
> > +	  if (!speed)
> > +	    *total = COSTS_N_INSNS (1) + 2;
> > +	  else if (GET_MODE (XEXP (x, 1)) == DImode)
> >  	    *total = rs6000_cost->divdi;
> >  	  else
> >  	    *total = rs6000_cost->divsi;
> 
> (more)

OK, I can remove all the !speed changes.  To be honest, I didn't look
anywhere near as much at code size changes as I worried about
performance.  And about not regressing any fiddly testcase we have.

> > @@ -21368,6 +21378,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
> >        return false;
> >  
> >      case AND:
> > +      *total = COSTS_N_INSNS (1);
> >        right = XEXP (x, 1);
> >        if (CONST_INT_P (right))
> >  	{
> > @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
> >  	       || left_code == LSHIFTRT)
> >  	      && rs6000_is_valid_shift_mask (right, left, mode))
> >  	    {
> > -	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> > -	      if (!CONST_INT_P (XEXP (left, 1)))
> > -		*total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
> > -	      *total += COSTS_N_INSNS (1);
> > +	      rtx reg_op = XEXP (left, 0);
> > +	      if (!REG_P (reg_op))
> > +		*total += rtx_cost (reg_op, mode, left_code, 0, speed);
> > +	      reg_op = XEXP (left, 1);
> > +	      if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
> > +		*total += rtx_cost (reg_op, mode, left_code, 1, speed);
> >  	      return true;
> >  	    }
> >  	}
> > -
> > -      *total = COSTS_N_INSNS (1);
> >        return false;
> 
> This doesn't improve anything?  It just makes it different from all
> surrounding code?

So it moves the common COSTS_N_INSNS (1) count and doesn't recurse for
regs, like it doesn't for const_int.

> > @@ -21519,7 +21530,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
> >        if (outer_code == TRUNCATE
> >  	  && GET_CODE (XEXP (x, 0)) == MULT)
> >  	{
> > -	  if (mode == DImode)
> > +	  if (!speed)
> > +	    *total = COSTS_N_INSNS (1) + 1;
> 
> (more)
> 
> > +    case SET:
> > +      /* The default cost of a SET is the number of general purpose
> > +	 regs being set multiplied by COSTS_N_INSNS (1).  That only
> > +	 works where the incremental cost of the operation and
> > +	 operands is zero, when the operation performed can be done in
> > +	 one instruction.  For other cases where we add COSTS_N_INSNS
> > +	 for some operation (see point 5 above), COSTS_N_INSNS (1)
> > +	 should be subtracted from the total cost.  */
> 
> What does "incremental cost" mean there?  If what increases?
> 
> > +      {
> > +	rtx_code src_code = GET_CODE (SET_SRC (x));
> > +	if (src_code == CONST_INT
> > +	    || src_code == CONST_DOUBLE
> > +	    || src_code == CONST_WIDE_INT)
> > +	  return false;
> > +	int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> > +			+ rtx_cost (SET_DEST (x), mode, SET, 0, speed));
> 
> This should use set_src_cost, if anything.  But that will recurse then,
> currently?  Ugh.
> 
> Using rtx_cost for SET_DEST is problematic, too.
> 
> What (if anything!) calls this for a SET?  Oh, set_rtx_cost still does
> that, hrm.
> 
> rtx_cost costs RTL *expressions*.  Not instructions.  Except where it is
> (ab)used for that, sigh.
> 
> Many targets have something for it already, but all quite different from
> this.

Right, you are starting to understand just how difficult it is to do
anything at all to rs6000_rtx_costs.

> > +	if (set_cost >= COSTS_N_INSNS (1))
> > +	  *total += set_cost - COSTS_N_INSNS (1);
> 
> I don't understand this part at all, for example.  Why not just
>   *total += set_cost - COSTS_N_INSNS (1);
> ?  If set_cost is lower than one insn's cost, don't we have a problem
> already?

The set_cost I calculate here from src and dest can easily be zero.
(set (reg) (reg)) and (set (reg) (const_int 0)) for example have a
dest cost of zero and a src cost of zero.  That can't change without
breaking places where rtx_costs is called to compare pieces of RTL.
Here though we happen to be looking at a SET, so have an entire
instruction.  The value returned should be comparable to our
instruction costs.  That's tricky to do, and this change is just a
hack.  Without the hack I saw some testcases regress.

I don't like this hack any more than you do reviewing it!

> 
> Generic things.  Please split this patch up when sending it again, it
> does too many different things, and many of those are not obvious.
> 
> All such changes that aren't completely obvious (like the previous ones
> were) should have some measurement.  We are in stage1, and we will
> notice (non-trivial) degradations, but if we can expect degradations
> (like for this patch), it needs benchmarking.

Pat did benchmark these changes..  I was somewhat surprised to see
a small improvement in spec results.

> Since you add !speed all over the place, maybe we should just have a
> separate function that does !speed?  It looks like quite a few things
> will simplify.

Revised patch as follows.

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost for SETs
	when insn operation cost handled on recursive call.  Tidy
	break/return.  Tidy AND costing.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6af8a9a31cb..26c2f443502 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21397,7 +21397,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	*total = rs6000_cost->fp;
       else
 	*total = rs6000_cost->dmul;
-      break;
+      return false;
 
     case DIV:
     case MOD:
@@ -21457,6 +21457,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case AND:
+      *total = COSTS_N_INSNS (1);
       right = XEXP (x, 1);
       if (CONST_INT_P (right))
 	{
@@ -21469,15 +21470,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	       || left_code == LSHIFTRT)
 	      && rs6000_is_valid_shift_mask (right, left, mode))
 	    {
-	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
-	      if (!CONST_INT_P (XEXP (left, 1)))
-		*total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
-	      *total += COSTS_N_INSNS (1);
+	      rtx reg_op = XEXP (left, 0);
+	      if (!REG_P (reg_op))
+		*total += rtx_cost (reg_op, mode, left_code, 0, speed);
+	      reg_op = XEXP (left, 1);
+	      if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
+		*total += rtx_cost (reg_op, mode, left_code, 1, speed);
 	      return true;
 	    }
 	}
-
-      *total = COSTS_N_INSNS (1);
       return false;
 
     case IOR:
@@ -21575,7 +21576,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	  *total = rs6000_cost->fp;
 	  return false;
 	}
-      break;
+      return false;
 
     case NE:
     case EQ:
@@ -21613,13 +21614,40 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	  *total = 0;
 	  return true;
 	}
-      break;
+      return false;
+
+    case SET:
+      /* The default cost of a SET is the number of general purpose
+	 regs being set multiplied by COSTS_N_INSNS (1).  Here we
+	 happen to be looking at a SET, so have an instruction rather
+	 than just a piece of RTL and want to return a cost comparable
+	 to rs6000 instruction costing.  That's a little complicated
+	 because in some cases the cost of SET operands is non-zero,
+	 see point 5 above and cost of PLUS for example, and in
+	 others it is zero, for example for (set (reg) (reg)).
+	 But (set (reg) (reg)) actually costs the same as 
+	 (set (reg) (plus (reg) (reg))).  Hack around this by
+	 subtracting COSTS_N_INSNS (1) from the operand cost in cases
+	 were we add COSTS_N_INSNS (1) for some operation.  Don't do
+	 so for constants that might cost more than zero because they
+	 don't fit in one instruction.  FIXME: rtx_costs should not be
+	 looking at entire instructions.  */
+      {
+	rtx_code src_code = GET_CODE (SET_SRC (x));
+	if (src_code == CONST_INT
+	    || src_code == CONST_DOUBLE
+	    || src_code == CONST_WIDE_INT)
+	  return false;
+	int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
+			+ rtx_cost (SET_DEST (x), mode, SET, 0, speed));
+	if (set_cost >= COSTS_N_INSNS (1))
+	  *total += set_cost - COSTS_N_INSNS (1);
+	return true;
+      }
 
     default:
-      break;
+      return false;
     }
-
-  return false;
 }
 
 /* Debug form of r6000_rtx_costs that is selected if -mdebug=cost.  */


-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rs6000_rtx_costs reduce cost for SETs
  2020-09-18  3:38     ` Alan Modra
@ 2020-09-18 18:13       ` Segher Boessenkool
  2020-09-21  7:07         ` Alan Modra
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2020-09-18 18:13 UTC (permalink / raw)
  To: Alan Modra; +Cc: gcc-patches

On Fri, Sep 18, 2020 at 01:08:42PM +0930, Alan Modra wrote:
> On Thu, Sep 17, 2020 at 12:51:25PM -0500, Segher Boessenkool wrote:
> > > -      if (CONST_INT_P (XEXP (x, 1))
> > > -	  && satisfies_constraint_I (XEXP (x, 1)))
> > > +      if (!speed)
> > > +	/* A little more than one insn so that nothing is tempted to
> > > +	   turn a shift left into a multiply.  */
> > > +	*total = COSTS_N_INSNS (1) + 1;
> > 
> > Please don't.  We have a lot of code elsewhere to handle this directly,
> > already.  Also, this is just wrong for size.  Five shifts is *not*
> > better than four muls.  If that is the only way to get good results,
> > than unfortunately we probably have to; but do not do this without any
> > proof.
> 
> Huh.  If a cost of 5 is "just wrong for size" then you prefer a cost
> of 12 for example (power9 mulsi or muldi rs6000_cost)?  Noticing that
> result for !speed rs6000_rtx_costs is the entire basis for the !speed
> changes.  I don't have any proof that this is correct.

No, just 4, like all other insns -- it is the size of the insn, after
all!  Not accidentally using the speed cost is a fine change of course,
but the + 1 isn't based on anything afaics, so it can only hurt.

> > > -	  if (GET_MODE (XEXP (x, 1)) == DImode)
> > > +	  if (!speed)
> > > +	    *total = COSTS_N_INSNS (1) + 2;
> > > +	  else if (GET_MODE (XEXP (x, 1)) == DImode)
> > >  	    *total = rs6000_cost->divdi;
> > >  	  else
> > >  	    *total = rs6000_cost->divsi;
> > 
> > (more)
> 
> OK, I can remove all the !speed changes.

No, those are quite okay (as a separate patch though).  But you
shouldn't add random numbers to it, one insn is just one insn.  The cost
function for -O2 is just code size, nothing more, nothing less.

> > >      case AND:
> > > +      *total = COSTS_N_INSNS (1);
> > >        right = XEXP (x, 1);
> > >        if (CONST_INT_P (right))
> > >  	{
> > > @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
> > >  	       || left_code == LSHIFTRT)
> > >  	      && rs6000_is_valid_shift_mask (right, left, mode))
> > >  	    {
> > > -	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> > > -	      if (!CONST_INT_P (XEXP (left, 1)))
> > > -		*total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
> > > -	      *total += COSTS_N_INSNS (1);
> > > +	      rtx reg_op = XEXP (left, 0);
> > > +	      if (!REG_P (reg_op))
> > > +		*total += rtx_cost (reg_op, mode, left_code, 0, speed);
> > > +	      reg_op = XEXP (left, 1);
> > > +	      if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
> > > +		*total += rtx_cost (reg_op, mode, left_code, 1, speed);
> > >  	      return true;
> > >  	    }
> > >  	}
> > > -
> > > -      *total = COSTS_N_INSNS (1);
> > >        return false;
> > 
> > This doesn't improve anything?  It just makes it different from all
> > surrounding code?
> 
> So it moves the common COSTS_N_INSNS (1) count and doesn't recurse for
> regs, like it doesn't for const_int.

I meant that *total set only.  It is fine to have one extra source line.

Does not recursing for regs help anything?  Yes, for CONST_INT that is
questionable as well, but adding more stuff like it does not help
(without actual justification).

This routine is not hot at all.  Maybe decades ago it was, and back then
the thought was that micro-optimisations like this were useful (and
maybe they were, _back then_ :-) )

> > > +    case SET:
> > > +      /* The default cost of a SET is the number of general purpose
> > > +	 regs being set multiplied by COSTS_N_INSNS (1).  That only
> > > +	 works where the incremental cost of the operation and
> > > +	 operands is zero, when the operation performed can be done in
> > > +	 one instruction.  For other cases where we add COSTS_N_INSNS
> > > +	 for some operation (see point 5 above), COSTS_N_INSNS (1)
> > > +	 should be subtracted from the total cost.  */
> > 
> > What does "incremental cost" mean there?  If what increases?

Can you improve that comment please?

> > > +      {
> > > +	rtx_code src_code = GET_CODE (SET_SRC (x));
> > > +	if (src_code == CONST_INT
> > > +	    || src_code == CONST_DOUBLE
> > > +	    || src_code == CONST_WIDE_INT)
> > > +	  return false;
> > > +	int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> > > +			+ rtx_cost (SET_DEST (x), mode, SET, 0, speed));
> > 
> > This should use set_src_cost, if anything.  But that will recurse then,
> > currently?  Ugh.
> > 
> > Using rtx_cost for SET_DEST is problematic, too.
> > 
> > What (if anything!) calls this for a SET?  Oh, set_rtx_cost still does
> > that, hrm.
> > 
> > rtx_cost costs RTL *expressions*.  Not instructions.  Except where it is
> > (ab)used for that, sigh.
> > 
> > Many targets have something for it already, but all quite different from
> > this.
> 
> Right, you are starting to understand just how difficult it is to do
> anything at all to rs6000_rtx_costs.

"Starting to understand", lol :-)  I created insn_cost back in 2017, for
this and may related problems: trying to derive the cost of an RTL
expression does not make sense at all in almost all places, what we want
to know is if the machine code is faster (or smaller for -Os).  That was
after many years of fighting this (for rs6000 and other targets; hardly
any of this is specific to our port).

Eventually we should be able to delete this completely.  That will be a
good day.

Currently mostly set_rtx_cost and set_src_cost are in the way of getting
there, and mostly because many ways those are used simply make no sense
at all.

Other than those, there is just a loooong tail of random stuff.


Improving this is fine of course: it will be quite a long time before it
isn't useful anymore, and longer until it isn't used anymore.


> > > +	if (set_cost >= COSTS_N_INSNS (1))
> > > +	  *total += set_cost - COSTS_N_INSNS (1);
> > 
> > I don't understand this part at all, for example.  Why not just
> >   *total += set_cost - COSTS_N_INSNS (1);
> > ?  If set_cost is lower than one insn's cost, don't we have a problem
> > already?
> 
> The set_cost I calculate here from src and dest can easily be zero.
> (set (reg) (reg)) and (set (reg) (const_int 0)) for example have a
> dest cost of zero and a src cost of zero.  That can't change without
> breaking places where rtx_costs is called to compare pieces of RTL.
> Here though we happen to be looking at a SET, so have an entire
> instruction.  The value returned should be comparable to our
> instruction costs.  That's tricky to do, and this change is just a
> hack.  Without the hack I saw some testcases regress.
> 
> I don't like this hack any more than you do reviewing it!

So maybe handle all SETs with a zero set_cost before the general case?
Isn't that what we used to do, and what the general code does?

> > Generic things.  Please split this patch up when sending it again, it
> > does too many different things, and many of those are not obvious.
> > 
> > All such changes that aren't completely obvious (like the previous ones
> > were) should have some measurement.  We are in stage1, and we will
> > notice (non-trivial) degradations, but if we can expect degradations
> > (like for this patch), it needs benchmarking.
> 
> Pat did benchmark these changes..  I was somewhat surprised to see
> a small improvement in spec results.

Thanks (to both of you).  Interesting!  Which of these unrelated changes
does this come from?


Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rs6000_rtx_costs reduce cost for SETs
  2020-09-18 18:13       ` Segher Boessenkool
@ 2020-09-21  7:07         ` Alan Modra
  0 siblings, 0 replies; 26+ messages in thread
From: Alan Modra @ 2020-09-21  7:07 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On Fri, Sep 18, 2020 at 01:13:18PM -0500, Segher Boessenkool wrote:
> Thanks (to both of you).  Interesting!  Which of these unrelated changes
> does this come from?

Most of the changes I saw in code generation (not in spec, I didn't
look there, but in gcc) came down to this change to the cost for SETs,
and "rs6000_rtx_costs multi-insn constants".  I expect they were the
changes that made most difference to spec results, with this patch
likely resulting in more if-conversion.

So here is the patch again, this time without any distracting other
changes.  With a further revised comment.

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost of SET
	operands.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8969baa4dcf..2d770afd8fe 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21599,6 +21599,35 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	}
       break;
 
+    case SET:
+      /* On entry the value in *TOTAL is the number of general purpose
+	 regs being set, multiplied by COSTS_N_INSNS (1).  Handle
+	 costing of set operands specially since in most cases we have
+	 an instruction rather than just a piece of RTL and should
+	 return a cost comparable to insn_cost.  That's a little
+	 complicated because in some cases the cost of SET operands is
+	 non-zero, see point 5 above and cost of PLUS for example, and
+	 in others it is zero, for example for (set (reg) (reg)).
+	 But (set (reg) (reg)) has the same insn_cost as
+	 (set (reg) (plus (reg) (reg))).  Hack around this by
+	 subtracting COSTS_N_INSNS (1) from the operand cost in cases
+	 were we add at least COSTS_N_INSNS (1) for some operation.
+	 However, don't do so for constants.  Constants might cost
+	 more than zero when they require more than one instruction,
+	 and we do want the cost of extra instructions.  */
+      {
+	rtx_code src_code = GET_CODE (SET_SRC (x));
+	if (src_code == CONST_INT
+	    || src_code == CONST_DOUBLE
+	    || src_code == CONST_WIDE_INT)
+	  return false;
+	int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
+			+ rtx_cost (SET_DEST (x), mode, SET, 0, speed));
+	if (set_cost >= COSTS_N_INSNS (1))
+	  *total += set_cost - COSTS_N_INSNS (1);
+	return true;
+      }
+
     default:
       break;
     }

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rs6000_rtx_costs cost IOR
  2020-09-17  3:42     ` Alan Modra
@ 2020-09-21 15:49       ` Segher Boessenkool
  2020-09-21 23:54         ` Alan Modra
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2020-09-21 15:49 UTC (permalink / raw)
  To: Alan Modra; +Cc: gcc-patches

Hi!

On Thu, Sep 17, 2020 at 01:12:19PM +0930, Alan Modra wrote:
> On Wed, Sep 16, 2020 at 07:02:06PM -0500, Segher Boessenkool wrote:
> > > +		  /* Test both regs even though the one in the mask is
> > > +		     constrained to be equal to the output.  Increasing
> > > +		     cost may well result in rejecting an invalid insn
> > > +		     earlier.  */
> > 
> > Is that ever actually useful?
> 
> Possibly not in this particular case, but I did see cases where
> invalid insns were rejected early by costing non-reg sub-expressions.

But does that ever change generated code?

This makes the compiler a lot harder to read and understand.  To the
point that such micro-optimisations makes worthwhile optimisations hard
or impossible to do.


Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rotate and mask constants
  2020-09-15  7:16   ` Alan Modra
@ 2020-09-21 15:56     ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2020-09-21 15:56 UTC (permalink / raw)
  To: Alan Modra; +Cc: gcc-patches, Peter Bergner

On Tue, Sep 15, 2020 at 04:46:08PM +0930, Alan Modra wrote:
> On Tue, Sep 15, 2020 at 10:49:46AM +0930, Alan Modra wrote:
> > Implement more two insn constants.
> 
> And tests.  rot_cst1 checks the values generated, rot_cst2 checks
> instruction count.
> 
> 	* gcc.target/powerpc/rot_cst.h,
> 	* gcc.target/powerpc/rot_cst1.c,
> 	* gcc.target/powerpc/rot_cst2.c: New tests.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/rot_cst1.c
> @@ -0,0 +1,68 @@
> +/* { dg-do run { target lp64 } } */

This doesn't need lp64.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/rot_cst2.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2" } */
> +
> +#include "rot_cst.h"
> +
> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 122 } } */

Please write (in comments) how much of each insn are expected, and
possibly for what function?  Also, bonus points if you make this work
for 32 bit as well (it is almost required even).


Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RS6000] rs6000_rtx_costs cost IOR
  2020-09-21 15:49       ` Segher Boessenkool
@ 2020-09-21 23:54         ` Alan Modra
  0 siblings, 0 replies; 26+ messages in thread
From: Alan Modra @ 2020-09-21 23:54 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On Mon, Sep 21, 2020 at 10:49:17AM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Sep 17, 2020 at 01:12:19PM +0930, Alan Modra wrote:
> > On Wed, Sep 16, 2020 at 07:02:06PM -0500, Segher Boessenkool wrote:
> > > > +		  /* Test both regs even though the one in the mask is
> > > > +		     constrained to be equal to the output.  Increasing
> > > > +		     cost may well result in rejecting an invalid insn
> > > > +		     earlier.  */
> > > 
> > > Is that ever actually useful?
> > 
> > Possibly not in this particular case, but I did see cases where
> > invalid insns were rejected early by costing non-reg sub-expressions.
> 
> But does that ever change generated code?
> 
> This makes the compiler a lot harder to read and understand.  To the
> point that such micro-optimisations makes worthwhile optimisations hard
> or impossible to do.

Fair enough, here's a revised patch.

	* config/rs6000/rs6000.c (rotate_insert_cost): New function.
	(rs6000_rtx_costs): Cost IOR.  Tidy break/return.  Tidy AND.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5025e3c30c0..78c33cc8cba 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21118,6 +21118,91 @@ rs6000_cannot_copy_insn_p (rtx_insn *insn)
 	 && get_attr_cannot_copy (insn);
 }
 
+/* Handle rtx_costs for scalar integer rotate and insert insns.  */
+
+static bool
+rotate_insert_cost (rtx left, rtx right, machine_mode mode, bool speed,
+		    int *total)
+{
+  if (GET_CODE (right) == AND
+      && CONST_INT_P (XEXP (right, 1))
+      && UINTVAL (XEXP (left, 1)) + UINTVAL (XEXP (right, 1)) + 1 == 0)
+    {
+      rtx leftop = XEXP (left, 0);
+      rtx rightop = XEXP (right, 0);
+
+      /* rotlsi3_insert_5.  */
+      if (REG_P (leftop)
+	  && REG_P (rightop)
+	  && mode == SImode
+	  && UINTVAL (XEXP (left, 1)) != 0
+	  && UINTVAL (XEXP (right, 1)) != 0
+	  && rs6000_is_valid_mask (XEXP (left, 1), NULL, NULL, mode))
+	return true;
+      /* rotldi3_insert_6.  */
+      if (REG_P (leftop)
+	  && REG_P (rightop)
+	  && mode == DImode
+	  && exact_log2 (-UINTVAL (XEXP (left, 1))) > 0)
+	return true;
+      /* rotldi3_insert_7.  */
+      if (REG_P (leftop)
+	  && REG_P (rightop)
+	  && mode == DImode
+	  && exact_log2 (-UINTVAL (XEXP (right, 1))) > 0)
+	return true;
+
+      rtx mask = 0;
+      rtx shift = leftop;
+      rtx_code shift_code = GET_CODE (shift);
+      /* rotl<mode>3_insert.  */
+      if (shift_code == ROTATE
+	  || shift_code == ASHIFT
+	  || shift_code == LSHIFTRT)
+	mask = right;
+      else
+	{
+	  shift = rightop;
+	  shift_code = GET_CODE (shift);
+	  /* rotl<mode>3_insert_2.  */
+	  if (shift_code == ROTATE
+	      || shift_code == ASHIFT
+	      || shift_code == LSHIFTRT)
+	    mask = left;
+	}
+      if (mask
+	  && CONST_INT_P (XEXP (shift, 1))
+	  && rs6000_is_valid_insert_mask (XEXP (mask, 1), shift, mode))
+	{
+	  *total += rtx_cost (XEXP (shift, 0), mode, shift_code, 0, speed);
+	  *total += rtx_cost (XEXP (mask, 0), mode, AND, 0, speed);
+	  return true;
+	}
+    }
+  /* rotl<mode>3_insert_3.  */
+  if (GET_CODE (right) == ASHIFT
+      && CONST_INT_P (XEXP (right, 1))
+      && (INTVAL (XEXP (right, 1))
+	  == exact_log2 (UINTVAL (XEXP (left, 1)) + 1)))
+    {
+      *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed);
+      *total += rtx_cost (XEXP (right, 0), mode, ASHIFT, 0, speed);
+      return true;
+    }
+  /* rotl<mode>3_insert_4.  */
+  if (GET_CODE (right) == LSHIFTRT
+      && CONST_INT_P (XEXP (right, 1))
+      && mode == SImode
+      && (INTVAL (XEXP (right, 1))
+	  + exact_log2 (-UINTVAL (XEXP (left, 1)))) == 32)
+    {
+      *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed);
+      *total += rtx_cost (XEXP (right, 0), mode, LSHIFTRT, 0, speed);
+      return true;
+    }
+  return false;
+}
+
 /* Compute a (partial) cost for rtx X.  Return true if the complete
    cost has been computed, and false if subexpressions should be
    scanned.  In either case, *TOTAL contains the cost result.
@@ -21165,6 +21250,7 @@ static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		  int opno ATTRIBUTE_UNUSED, int *total, bool speed)
 {
+  rtx left, right;
   int code = GET_CODE (x);
 
   switch (code)
@@ -21295,7 +21381,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	*total = rs6000_cost->fp;
       else
 	*total = rs6000_cost->dmul;
-      break;
+      return false;
 
     case DIV:
     case MOD:
@@ -21355,32 +21441,37 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case AND:
-      if (CONST_INT_P (XEXP (x, 1)))
+      *total = COSTS_N_INSNS (1);
+      right = XEXP (x, 1);
+      if (CONST_INT_P (right))
 	{
-	  rtx left = XEXP (x, 0);
+	  left = XEXP (x, 0);
 	  rtx_code left_code = GET_CODE (left);
 
 	  /* rotate-and-mask: 1 insn.  */
 	  if ((left_code == ROTATE
 	       || left_code == ASHIFT
 	       || left_code == LSHIFTRT)
-	      && rs6000_is_valid_shift_mask (XEXP (x, 1), left, mode))
+	      && rs6000_is_valid_shift_mask (right, left, mode))
 	    {
-	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
-	      if (!CONST_INT_P (XEXP (left, 1)))
-		*total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
-	      *total += COSTS_N_INSNS (1);
+	      *total += rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
+	      *total += rtx_cost (XEXP (left, 1), mode, left_code, 1, speed);
 	      return true;
 	    }
 	}
-
-      *total = COSTS_N_INSNS (1);
       return false;
 
     case IOR:
-      /* FIXME */
       *total = COSTS_N_INSNS (1);
-      return true;
+      left = XEXP (x, 0);
+      if (GET_CODE (left) == AND
+	  && CONST_INT_P (XEXP (left, 1)))
+	{
+	  right = XEXP (x, 1);
+	  if (rotate_insert_cost (left, right, mode, speed, total))
+	    return true;
+	}
+      return false;
 
     case CLZ:
     case XOR:
@@ -21465,7 +21556,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	  *total = rs6000_cost->fp;
 	  return false;
 	}
-      break;
+      return false;
 
     case NE:
     case EQ:
@@ -21503,13 +21594,11 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	  *total = 0;
 	  return true;
 	}
-      break;
+      return false;
 
     default:
-      break;
+      return false;
     }
-
-  return false;
 }
 
 /* Debug form of r6000_rtx_costs that is selected if -mdebug=cost.  */

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-09-21 23:54 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-15  1:19 [RS6000] rtx_costs Alan Modra
2020-09-15  1:19 ` [RS6000] Count rldimi constant insns Alan Modra
2020-09-15 22:29   ` Segher Boessenkool
2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs for PLUS/MINUS constant Alan Modra
2020-09-15 22:31   ` Segher Boessenkool
2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs for AND Alan Modra
2020-09-15 18:15   ` will schmidt
2020-09-16  7:24     ` Alan Modra
2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs comment Alan Modra
2020-09-16 23:21   ` Segher Boessenkool
2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs multi-insn constants Alan Modra
2020-09-16 23:28   ` Segher Boessenkool
2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs cost IOR Alan Modra
2020-09-17  0:02   ` Segher Boessenkool
2020-09-17  3:42     ` Alan Modra
2020-09-21 15:49       ` Segher Boessenkool
2020-09-21 23:54         ` Alan Modra
2020-09-15  1:19 ` [RS6000] rs6000_rtx_costs reduce cost for SETs Alan Modra
2020-09-17 17:51   ` Segher Boessenkool
2020-09-18  3:38     ` Alan Modra
2020-09-18 18:13       ` Segher Boessenkool
2020-09-21  7:07         ` Alan Modra
2020-09-15  1:19 ` [RS6000] rotate and mask constants Alan Modra
2020-09-15  7:16   ` Alan Modra
2020-09-21 15:56     ` Segher Boessenkool
2020-09-15 18:15 ` [RS6000] rtx_costs will schmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).