[PATCH][combine][RFC] Don't transform sign and zero extends inside mults

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
@ 2015-11-02 14:15 Kyrill Tkachov
  2015-11-02 22:31 ` Jeff Law
  2015-11-04 23:50 ` Segher Boessenkool
  0 siblings, 2 replies; 17+ messages in thread
From: Kyrill Tkachov @ 2015-11-02 14:15 UTC (permalink / raw)
  To: gcc Patches; +Cc: Segher Boessenkool

[-- Attachment #1: Type: text/plain, Size: 2592 bytes --]

Hi all,

This patch attempts to restrict combine from transforming ZERO_EXTEND and SIGN_EXTEND operations into and-bitmask
and weird SUBREG expressions when they appear inside MULT expressions. This is because a MULT rtx containing these
extend operations is usually a well understood widening multiply operation.
However, if the costs for simple zero or sign-extend moves happen to line up in a particular way,
expand_compound_operation will end up mangling a perfectly innocent extend+mult+add rtx like:
  (set (reg/f:DI 393)
     (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 425 [ selected ]))
             (sign_extend:DI (reg:SI 606)))
          (reg:DI 600)))

into:
  (set (reg/f:DI 393)
     (plus:DI (mult:DI (and:DI (subreg:DI (reg:SI 606) 0)
                 (const_int 202 [0xca]))
             (sign_extend:DI (reg/v:SI 425 [ selected ])))
          (reg:DI 600)))

because it decides that the and-subreg form is cheaper than a sign-extend, even though
the resultant expression can't hope to match a widening multiply-add pattern anymore.
The and-subreg form may well be cheaper as the SET_SRC of a simple set, but is unlikely
to be meaningful when inside a MULT expression.
AFAIK the accepted way of representing widening multiplies is with MULT expressions containing
zero/sign-extends, so rather than adding duplicate patterns in the backend to represent the different
ways an extend operation can be expressed by combine, I think we should stop combine from mangling
the representation where possible.

This patch fixes that by stopping the transformation on the extend operands of a MULT if the target
has a mode-appropriate widening multiply optab in the two places where I saw this happen.

With this patch on aarch64 I saw increased generation of smaddl and umaddl instructions where
before the combination of smull and add instructions were rejected because the extend operands
were being transformed into these weird equivalent expressions.

Overall, I see 24% more [su]maddl instructions being generated for SPEC2006 and with no performance regressions.

The testcase in the patch is the most minimal one I could get that demonstrates the issue I'm trying to solve.

Does this approach look ok?

Thanks,
Kyrill

2015-11-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * combine.c (subst): Restrict transformation when handling extend
     ops inside a MULT.
     (force_to_mode, binop case): Likewise.

2015-11-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/umaddl_combine_1.c: New test.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: combine-mul-extend.patch --]
[-- Type: text/x-patch; name=combine-mul-extend.patch, Size: 4284 bytes --]

commit 2005243d896cbeb3d22421a63f080699ab357610
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Fri Oct 30 13:56:10 2015 +0000

    [combine] Don't transform sign and zero extends inside mults

diff --git a/gcc/combine.c b/gcc/combine.c
index fa1a93f..be0f5ae 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5369,6 +5369,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 		  n_occurrences++;
 		}
 	      else
+		{
 		/* If we are in a SET_DEST, suppress most cases unless we
 		   have gone inside a MEM, in which case we want to
 		   simplify the address.  We assume here that things that
@@ -5376,15 +5377,33 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 		   parts in the first expression.  This is true for SUBREG,
 		   STRICT_LOW_PART, and ZERO_EXTRACT, which are the only
 		   things aside from REG and MEM that should appear in a
-		   SET_DEST.  */
-		new_rtx = subst (XEXP (x, i), from, to,
+		   SET_DEST.  Also, restrict transformations on SIGN_EXTENDs
+		   and ZERO_EXTENDs if they appear inside multiplications if
+		   the target supports widening multiplies.  This is to avoid
+		   mangling such expressions beyond recognition.  */
+		  bool restrict_extend_p = false;
+		  rtx_code inner_code = GET_CODE (XEXP (x, i));
+
+		  if (code == MULT
+		      && (inner_code == SIGN_EXTEND
+			  || inner_code == ZERO_EXTEND)
+		      && widening_optab_handler (inner_code == ZERO_EXTEND
+						    ? umul_widen_optab
+						    : smul_widen_optab,
+						  GET_MODE (x),
+						  GET_MODE (XEXP (XEXP (x, i), 0)))
+			 != CODE_FOR_nothing)
+		    restrict_extend_p = true;
+
+		  new_rtx = subst (XEXP (x, i), from, to,
 			     (((in_dest
 				&& (code == SUBREG || code == STRICT_LOW_PART
 				    || code == ZERO_EXTRACT))
 			       || code == SET)
-			      && i == 0),
+			      && i == 0) || restrict_extend_p,
 				 code == IF_THEN_ELSE && i == 0,
 				 unique_copy);
+		}
 
 	      /* If we found that we will have to reject this combination,
 		 indicate that by returning the CLOBBER ourselves, rather than
@@ -8509,8 +8528,30 @@ force_to_mode (rtx x, machine_mode mode, unsigned HOST_WIDE_INT mask,
       /* For most binary operations, just propagate into the operation and
 	 change the mode if we have an operation of that mode.  */
 
-      op0 = force_to_mode (XEXP (x, 0), mode, mask, next_select);
-      op1 = force_to_mode (XEXP (x, 1), mode, mask, next_select);
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+
+      /* If we have a widening multiply avoid messing with the
+	 ZERO/SIGN_EXTEND operands so that we can still match the
+	 appropriate smul/umul patterns.  */
+      if (GET_CODE (x) == MULT
+	  && GET_CODE (op0) == GET_CODE (op1)
+	  && (GET_CODE (op0) == SIGN_EXTEND
+	      || GET_CODE (op0) == ZERO_EXTEND)
+	  && GET_MODE (op0) == GET_MODE (op1)
+	  && widening_optab_handler (GET_CODE (op0) == ZERO_EXTEND
+				      ? umul_widen_optab
+				      : smul_widen_optab,
+				     GET_MODE (x),
+				     GET_MODE (XEXP (op0, 0)))
+	       != CODE_FOR_nothing)
+	;
+      else
+	{
+	  op0 = force_to_mode (op0, mode, mask, next_select);
+	  op1 = force_to_mode (op1, mode, mask, next_select);
+	}
+
 
       /* If we ended up truncating both operands, truncate the result of the
 	 operation instead.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/umaddl_combine_1.c b/gcc/testsuite/gcc.target/aarch64/umaddl_combine_1.c
new file mode 100644
index 0000000..703c94d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/umaddl_combine_1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a53" } */
+
+enum reg_class
+{
+  NO_REGS,
+  AD_REGS,
+  ALL_REGS, LIM_REG_CLASSES
+};
+
+extern enum reg_class
+  reg_class_subclasses[((int) LIM_REG_CLASSES)][((int) LIM_REG_CLASSES)];
+
+void
+init_reg_sets_1 ()
+{
+  unsigned int i, j;
+  {
+    for (j = i + 1; j < ((int) LIM_REG_CLASSES); j++)
+      {
+	enum reg_class *p;
+	p = &reg_class_subclasses[j][0];
+	while (*p != LIM_REG_CLASSES)
+	  p++;
+      }
+  }
+}
+
+/* { dg-final { scan-assembler-not "umull\tx\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-02 14:15 [PATCH][combine][RFC] Don't transform sign and zero extends inside mults Kyrill Tkachov
@ 2015-11-02 22:31 ` Jeff Law
  2015-11-04 11:37   ` Kyrill Tkachov
  2015-11-04 23:50 ` Segher Boessenkool
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Law @ 2015-11-02 22:31 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc Patches; +Cc: Segher Boessenkool

On 11/02/2015 07:15 AM, Kyrill Tkachov wrote:
> Hi all,
>
> This patch attempts to restrict combine from transforming ZERO_EXTEND
> and SIGN_EXTEND operations into and-bitmask
> and weird SUBREG expressions when they appear inside MULT expressions.
> This is because a MULT rtx containing these
> extend operations is usually a well understood widening multiply operation.
> However, if the costs for simple zero or sign-extend moves happen to
> line up in a particular way,
> expand_compound_operation will end up mangling a perfectly innocent
> extend+mult+add rtx like:
>   (set (reg/f:DI 393)
>      (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 425 [ selected ]))
>              (sign_extend:DI (reg:SI 606)))
>           (reg:DI 600)))
>
> into:
>   (set (reg/f:DI 393)
>      (plus:DI (mult:DI (and:DI (subreg:DI (reg:SI 606) 0)
>                  (const_int 202 [0xca]))
>              (sign_extend:DI (reg/v:SI 425 [ selected ])))
>           (reg:DI 600)))
Going to leave the review side of this for Segher.

If you decide to go forward, there's a section in md.texi WRT 
canonicalization of these RTL codes that probably would need updating. 
Just search for "canonicalization" section and read down a ways.


Jeff


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-02 22:31 ` Jeff Law
@ 2015-11-04 11:37   ` Kyrill Tkachov
  2015-11-04 13:33     ` Kyrill Tkachov
  0 siblings, 1 reply; 17+ messages in thread
From: Kyrill Tkachov @ 2015-11-04 11:37 UTC (permalink / raw)
  To: Jeff Law, gcc Patches; +Cc: Segher Boessenkool

Ji Jrgg,

On 02/11/15 22:31, Jeff Law wrote:
> On 11/02/2015 07:15 AM, Kyrill Tkachov wrote:
>> Hi all,
>>
>> This patch attempts to restrict combine from transforming ZERO_EXTEND
>> and SIGN_EXTEND operations into and-bitmask
>> and weird SUBREG expressions when they appear inside MULT expressions.
>> This is because a MULT rtx containing these
>> extend operations is usually a well understood widening multiply operation.
>> However, if the costs for simple zero or sign-extend moves happen to
>> line up in a particular way,
>> expand_compound_operation will end up mangling a perfectly innocent
>> extend+mult+add rtx like:
>>   (set (reg/f:DI 393)
>>      (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 425 [ selected ]))
>>              (sign_extend:DI (reg:SI 606)))
>>           (reg:DI 600)))
>>
>> into:
>>   (set (reg/f:DI 393)
>>      (plus:DI (mult:DI (and:DI (subreg:DI (reg:SI 606) 0)
>>                  (const_int 202 [0xca]))
>>              (sign_extend:DI (reg/v:SI 425 [ selected ])))
>>           (reg:DI 600)))
> Going to leave the review side of this for Segher.
>
> If you decide to go forward, there's a section in md.texi WRT canonicalization of these RTL codes that probably would need updating. Just search for "canonicalization" section and read down a ways.
>

You mean document a canonical form for these operations as (mult (extend op1) (extend op2)) ?

>
> Jeff
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-04 11:37   ` Kyrill Tkachov
@ 2015-11-04 13:33     ` Kyrill Tkachov
  0 siblings, 0 replies; 17+ messages in thread
From: Kyrill Tkachov @ 2015-11-04 13:33 UTC (permalink / raw)
  To: Jeff Law, gcc Patches; +Cc: Segher Boessenkool


On 04/11/15 11:37, Kyrill Tkachov wrote:
>
> On 02/11/15 22:31, Jeff Law wrote:
>> On 11/02/2015 07:15 AM, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> This patch attempts to restrict combine from transforming ZERO_EXTEND
>>> and SIGN_EXTEND operations into and-bitmask
>>> and weird SUBREG expressions when they appear inside MULT expressions.
>>> This is because a MULT rtx containing these
>>> extend operations is usually a well understood widening multiply operation.
>>> However, if the costs for simple zero or sign-extend moves happen to
>>> line up in a particular way,
>>> expand_compound_operation will end up mangling a perfectly innocent
>>> extend+mult+add rtx like:
>>>   (set (reg/f:DI 393)
>>>      (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 425 [ selected ]))
>>>              (sign_extend:DI (reg:SI 606)))
>>>           (reg:DI 600)))
>>>
>>> into:
>>>   (set (reg/f:DI 393)
>>>      (plus:DI (mult:DI (and:DI (subreg:DI (reg:SI 606) 0)
>>>                  (const_int 202 [0xca]))
>>>              (sign_extend:DI (reg/v:SI 425 [ selected ])))
>>>           (reg:DI 600)))
>> Going to leave the review side of this for Segher.
>>
>> If you decide to go forward, there's a section in md.texi WRT canonicalization of these RTL codes that probably would need updating. Just search for "canonicalization" section and read down a ways.
>>
>
> You mean document a canonical form for these operations as (mult (extend op1) (extend op2)) ?
>

Looking at the issue more deeply I think the concrete problem is the code in expand_compound_operation:
<code>
   /* If we reach here, we want to return a pair of shifts.  The inner
      shift is a left shift of BITSIZE - POS - LEN bits.  The outer
      shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
      logical depending on the value of UNSIGNEDP.

      If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
      converted into an AND of a shift.

      We must check for the case where the left shift would have a negative
      count.  This can happen in a case like (x >> 31) & 255 on machines
      that can't shift by a constant.  On those machines, we would first
      combine the shift with the AND to produce a variable-position
      extraction.  Then the constant of 31 would be substituted in
      to produce such a position.  */

   modewidth = GET_MODE_PRECISION (GET_MODE (x));
   if (modewidth >= pos + len)
     {
       machine_mode mode = GET_MODE (x);
       tem = gen_lowpart (mode, XEXP (x, 0));
       if (!tem || GET_CODE (tem) == CLOBBER)
     return x;
       tem = simplify_shift_const (NULL_RTX, ASHIFT, mode,
                   tem, modewidth - pos - len);
       tem = simplify_shift_const (NULL_RTX, unsignedp ? LSHIFTRT : ASHIFTRT,
                   mode, tem, modewidth - len);
     }
   else if (unsignedp && len < HOST_BITS_PER_WIDE_INT)
     tem = simplify_and_const_int (NULL_RTX, GET_MODE (x),
                   simplify_shift_const (NULL_RTX, LSHIFTRT,
                             GET_MODE (x),
                             XEXP (x, 0), pos),
                   ((unsigned HOST_WIDE_INT) 1 << len) - 1);
   else
     /* Any other cases we can't handle.  */
     return x;
</code>

this code ends up unconditionally transforming (zero_extend:DI (reg:SI 125)) into:
(and:DI (subreg:DI (reg:SI 125) 0)
     (const_int 1 [0x1]))

which then gets substituted as an operand of the mult and nothing matches after that.
archaeology shows this code has been there since at least 1992. I guess my question is
why do we do this unconditionally? Should we be checking whether the zero_extend form
is cheaper than whatever simplify_shift_const returns?
I don't think what expand_compound_operatio is trying to do here is canonicalisation...

Thanks,
Kyrill


>>
>> Jeff
>>
>>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-02 14:15 [PATCH][combine][RFC] Don't transform sign and zero extends inside mults Kyrill Tkachov
  2015-11-02 22:31 ` Jeff Law
@ 2015-11-04 23:50 ` Segher Boessenkool
  2015-11-05 12:01   ` Kyrill Tkachov
  1 sibling, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2015-11-04 23:50 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: gcc Patches

Hi Kyrill,

On Mon, Nov 02, 2015 at 02:15:27PM +0000, Kyrill Tkachov wrote:
> This patch attempts to restrict combine from transforming ZERO_EXTEND and 
> SIGN_EXTEND operations into and-bitmask
> and weird SUBREG expressions when they appear inside MULT expressions. This 
> is because a MULT rtx containing these
> extend operations is usually a well understood widening multiply operation.

Right.  I have often wanted for combine to try plain substitution as well
as substitution and simplification: simplification (which uses nonzero_bits
etc.) often makes a mess of even moderately complex instructions.

But since we do not have that yet, and your 24% number is nicely impressive,
let's try to do no simplification for widening mults, as in your patch.

> With this patch on aarch64 I saw increased generation of smaddl and umaddl 
> instructions where
> before the combination of smull and add instructions were rejected because 
> the extend operands
> were being transformed into these weird equivalent expressions.
> 
> Overall, I see 24% more [su]maddl instructions being generated for SPEC2006 
> and with no performance regressions.
> 
> The testcase in the patch is the most minimal one I could get that 
> demonstrates the issue I'm trying to solve.
> 
> Does this approach look ok?

In broad lines it does.  Could you try this patch instead?  Not tested
etc. (other than building an aarch64 cross and your test case); I'll do
that tomorrow if you like the result.


Segher


diff --git a/gcc/combine.c b/gcc/combine.c
index c3db2e0..3bf7ffb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5284,6 +5284,15 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 	      || GET_CODE (SET_DEST (x)) == PC))
 	fmt = "ie";
 
+      /* Substituting into the operands of a widening MULT is not likely
+	 to create RTL matching a machine insn.  */
+      if (code == MULT
+	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
+	return x;
+
       /* Get the mode of operand 0 in case X is now a SIGN_EXTEND of a
 	 constant.  */
       if (fmt[0] == 'e')
@@ -8455,6 +8464,15 @@ force_to_mode (rtx x, machine_mode mode, unsigned HOST_WIDE_INT mask,
       /* ... fall through ...  */
 
     case MULT:
+      /* Substituting into the operands of a widening MULT is not likely to
+	 create RTL matching a machine insn.  */
+      if (code == MULT
+	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
+	return gen_lowpart_or_truncate (mode, x);
+
       /* For PLUS, MINUS and MULT, we need any bits less significant than the
 	 most significant bit in MASK since carries from those bits will
 	 affect the bits we are interested in.  */
-- 
1.9.3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-04 23:50 ` Segher Boessenkool
@ 2015-11-05 12:01   ` Kyrill Tkachov
  2015-11-06  0:56     ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Kyrill Tkachov @ 2015-11-05 12:01 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc Patches

Hi Segher,

On 04/11/15 23:50, Segher Boessenkool wrote:
> Hi Kyrill,
>
> On Mon, Nov 02, 2015 at 02:15:27PM +0000, Kyrill Tkachov wrote:
>> This patch attempts to restrict combine from transforming ZERO_EXTEND and
>> SIGN_EXTEND operations into and-bitmask
>> and weird SUBREG expressions when they appear inside MULT expressions. This
>> is because a MULT rtx containing these
>> extend operations is usually a well understood widening multiply operation.
> Right.  I have often wanted for combine to try plain substitution as well
> as substitution and simplification: simplification (which uses nonzero_bits
> etc.) often makes a mess of even moderately complex instructions.
>
> But since we do not have that yet, and your 24% number is nicely impressive,
> let's try to do no simplification for widening mults, as in your patch.
>
>> With this patch on aarch64 I saw increased generation of smaddl and umaddl
>> instructions where
>> before the combination of smull and add instructions were rejected because
>> the extend operands
>> were being transformed into these weird equivalent expressions.
>>
>> Overall, I see 24% more [su]maddl instructions being generated for SPEC2006
>> and with no performance regressions.
>>
>> The testcase in the patch is the most minimal one I could get that
>> demonstrates the issue I'm trying to solve.
>>
>> Does this approach look ok?
> In broad lines it does.  Could you try this patch instead?  Not tested
> etc. (other than building an aarch64 cross and your test case); I'll do
> that tomorrow if you like the result.

Thanks, that looks less intrusive. I did try it out on arm and aarch64.
It does work on the aarch64 testcase. However, there's also a correctness
regression, I'll try to explain inline....

>
>
> Segher
>
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index c3db2e0..3bf7ffb 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -5284,6 +5284,15 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
>   	      || GET_CODE (SET_DEST (x)) == PC))
>   	fmt = "ie";
>   
> +      /* Substituting into the operands of a widening MULT is not likely
> +	 to create RTL matching a machine insn.  */
> +      if (code == MULT
> +	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
> +	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
> +	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
> +	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
> +	return x;

I think we should also add:
       && REG_P (XEXP (XEXP (x, 0), 0))
       && REG_P (XEXP (XEXP (x, 1), 0))

to the condition. Otherwise I've seen regressions in the arm testsuite, the
gcc.target/arm/smlatb-1.s test in particular that tries to match the pattern
(define_insn "*maddhisi4tb"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
     (plus:SI (mult:SI (ashiftrt:SI
                (match_operand:SI 1 "s_register_operand" "r")
                (const_int 16))
               (sign_extend:SI
                (match_operand:HI 2 "s_register_operand" "r")))
          (match_operand:SI 3 "s_register_operand" "r")))]


There we have a sign_extend of a shift that we want to convert to the form
that the pattern expects. So adding the checks for REG_P fixes that for me.

For the correctness issue I saw on aarch64 the shortest case I could reduce is:
short int a[16], b[16];
void
f5 (void)
{
   a[0] = b[0] / 14;
}

We synthesise the division and lose one of the intermediate immediates.
The good codegen is:
f5:
     adrp    x1, b
     mov    w0, 9363            // <--- This gets lost!
     movk    w0, 0x9249, lsl 16 // <--- This gets lost!
     adrp    x2, a
     ldrsh    w1, [x1, #:lo12:b]
     smull    x0, w1, w0
     lsr    x0, x0, 32
     add    w0, w1, w0
     asr    w0, w0, 3
     sub    w0, w0, w1, asr 31
     strh    w0, [x2, #:lo12:a]
     ret

The bad one with this patch is:
     adrp    x1, b
     adrp    x2, a
     ldrsh    w1, [x1, #:lo12:b]
     smull    x0, w1, w0
     lsr    x0, x0, 32
     add    w0, w1, w0
     asr    w0, w0, 3
     sub    w0, w0, w1, asr 31
     strh    w0, [x2, #:lo12:a]
     ret


The problematic hunk there is the subst hunk.
In the expression 'x':
(mult:DI (sign_extend:DI (reg:SI 80 [ b ]))
     (sign_extend:DI (reg:SI 82)))

it tries to substitute 'from': (reg:SI 82)
with 'to': (const_int -1840700269 [0xffffffff92492493])

since we now return just 'x' combine thinks that the substitution succeeded
and eliminates the immediate move.
Is there a way that subst can signal some kind of "failed to substitute" result?
If not, I got it to work by using that check to set the in_dest variable to the
subsequent recursive call to subst, in a similar way to my original patch, but I was
hoping to avoid overloading the meaning of in_dest.

Thanks,
Kyrill

> +
>         /* Get the mode of operand 0 in case X is now a SIGN_EXTEND of a
>   	 constant.  */
>         if (fmt[0] == 'e')
> @@ -8455,6 +8464,15 @@ force_to_mode (rtx x, machine_mode mode, unsigned HOST_WIDE_INT mask,
>         /* ... fall through ...  */
>   
>       case MULT:
> +      /* Substituting into the operands of a widening MULT is not likely to
> +	 create RTL matching a machine insn.  */
> +      if (code == MULT
> +	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
> +	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
> +	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
> +	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
> +	return gen_lowpart_or_truncate (mode, x);
> +
>         /* For PLUS, MINUS and MULT, we need any bits less significant than the
>   	 most significant bit in MASK since carries from those bits will
>   	 affect the bits we are interested in.  */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-05 12:01   ` Kyrill Tkachov
@ 2015-11-06  0:56     ` Segher Boessenkool
  2015-11-06 14:19       ` Kyrill Tkachov
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2015-11-06  0:56 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: gcc Patches

On Thu, Nov 05, 2015 at 12:01:26PM +0000, Kyrill Tkachov wrote:
> Thanks, that looks less intrusive. I did try it out on arm and aarch64.
> It does work on the aarch64 testcase. However, there's also a correctness
> regression, I'll try to explain inline....

> >diff --git a/gcc/combine.c b/gcc/combine.c
> >index c3db2e0..3bf7ffb 100644
> >--- a/gcc/combine.c
> >+++ b/gcc/combine.c
> >@@ -5284,6 +5284,15 @@ subst (rtx x, rtx from, rtx to, int in_dest, int 
> >in_cond, int unique_copy)
> >  	      || GET_CODE (SET_DEST (x)) == PC))
> >  	fmt = "ie";
> >  
> >+      /* Substituting into the operands of a widening MULT is not likely
> >+	 to create RTL matching a machine insn.  */
> >+      if (code == MULT
> >+	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
> >+	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
> >+	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
> >+	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
> >+	return x;
> 
> I think we should also add:
>       && REG_P (XEXP (XEXP (x, 0), 0))
>       && REG_P (XEXP (XEXP (x, 1), 0))
> 
> to the condition. Otherwise I've seen regressions in the arm testsuite, the
> gcc.target/arm/smlatb-1.s test in particular that tries to match the pattern
> (define_insn "*maddhisi4tb"
>   [(set (match_operand:SI 0 "s_register_operand" "=r")
>     (plus:SI (mult:SI (ashiftrt:SI
>                (match_operand:SI 1 "s_register_operand" "r")
>                (const_int 16))
>               (sign_extend:SI
>                (match_operand:HI 2 "s_register_operand" "r")))
>          (match_operand:SI 3 "s_register_operand" "r")))]
> 
> 
> There we have a sign_extend of a shift that we want to convert to the form
> that the pattern expects. So adding the checks for REG_P fixes that for me.

I'll have a look at this; I thought it should be handled with the new patch
(attached), but maybe not.

> For the correctness issue I saw on aarch64 the shortest case I could reduce 
> is:
> short int a[16], b[16];
> void
> f5 (void)
> {
>   a[0] = b[0] / 14;
> }

(Without -mcpu=cortex-a53, or you just get a divide insn).

> Is there a way that subst can signal some kind of "failed to substitute" 
> result?

Yep, see new patch.  The "from == to" condition is for when subst is called
just to simplify some code (normally with pc_rtx, pc_rtx).

> If not, I got it to work by using that check to set the in_dest variable to 
> the
> subsequent recursive call to subst, in a similar way to my original patch, 
> but I was
> hoping to avoid overloading the meaning of in_dest.

Yes me too :-)


Segher


diff --git a/gcc/combine.c b/gcc/combine.c
index c3db2e0..c806db9 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5284,6 +5284,20 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 	      || GET_CODE (SET_DEST (x)) == PC))
 	fmt = "ie";
 
+      /* Substituting into the operands of a widening MULT is not likely
+	 to create RTL matching a machine insn.  */
+      if (code == MULT
+	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
+	{
+	  if (from == to)
+	    return x;
+	  else
+	    return gen_rtx_CLOBBER (GET_MODE (x), const0_rtx);
+	}
+
       /* Get the mode of operand 0 in case X is now a SIGN_EXTEND of a
 	 constant.  */
       if (fmt[0] == 'e')
@@ -8455,6 +8469,15 @@ force_to_mode (rtx x, machine_mode mode, unsigned HOST_WIDE_INT mask,
       /* ... fall through ...  */
 
     case MULT:
+      /* Substituting into the operands of a widening MULT is not likely to
+	 create RTL matching a machine insn.  */
+      if (code == MULT
+	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
+	return gen_lowpart_or_truncate (mode, x);
+
       /* For PLUS, MINUS and MULT, we need any bits less significant than the
 	 most significant bit in MASK since carries from those bits will
 	 affect the bits we are interested in.  */
-- 
1.9.3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-06  0:56     ` Segher Boessenkool
@ 2015-11-06 14:19       ` Kyrill Tkachov
  2015-11-06 21:14         ` Jeff Law
  0 siblings, 1 reply; 17+ messages in thread
From: Kyrill Tkachov @ 2015-11-06 14:19 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc Patches

[-- Attachment #1: Type: text/plain, Size: 5340 bytes --]


On 06/11/15 00:56, Segher Boessenkool wrote:
> On Thu, Nov 05, 2015 at 12:01:26PM +0000, Kyrill Tkachov wrote:
>> Thanks, that looks less intrusive. I did try it out on arm and aarch64.
>> It does work on the aarch64 testcase. However, there's also a correctness
>> regression, I'll try to explain inline....
>>> diff --git a/gcc/combine.c b/gcc/combine.c
>>> index c3db2e0..3bf7ffb 100644
>>> --- a/gcc/combine.c
>>> +++ b/gcc/combine.c
>>> @@ -5284,6 +5284,15 @@ subst (rtx x, rtx from, rtx to, int in_dest, int
>>> in_cond, int unique_copy)
>>>   	      || GET_CODE (SET_DEST (x)) == PC))
>>>   	fmt = "ie";
>>>   
>>> +      /* Substituting into the operands of a widening MULT is not likely
>>> +	 to create RTL matching a machine insn.  */
>>> +      if (code == MULT
>>> +	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
>>> +	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
>>> +	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
>>> +	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
>>> +	return x;
>> I think we should also add:
>>        && REG_P (XEXP (XEXP (x, 0), 0))
>>        && REG_P (XEXP (XEXP (x, 1), 0))
>>
>> to the condition. Otherwise I've seen regressions in the arm testsuite, the
>> gcc.target/arm/smlatb-1.s test in particular that tries to match the pattern
>> (define_insn "*maddhisi4tb"
>>    [(set (match_operand:SI 0 "s_register_operand" "=r")
>>      (plus:SI (mult:SI (ashiftrt:SI
>>                 (match_operand:SI 1 "s_register_operand" "r")
>>                 (const_int 16))
>>                (sign_extend:SI
>>                 (match_operand:HI 2 "s_register_operand" "r")))
>>           (match_operand:SI 3 "s_register_operand" "r")))]
>>
>>
>> There we have a sign_extend of a shift that we want to convert to the form
>> that the pattern expects. So adding the checks for REG_P fixes that for me.
> I'll have a look at this; I thought it should be handled with the new patch
> (attached), but maybe not.
>
>> For the correctness issue I saw on aarch64 the shortest case I could reduce
>> is:
>> short int a[16], b[16];
>> void
>> f5 (void)
>> {
>>    a[0] = b[0] / 14;
>> }
> (Without -mcpu=cortex-a53, or you just get a divide insn).
>
>> Is there a way that subst can signal some kind of "failed to substitute"
>> result?
> Yep, see new patch.  The "from == to" condition is for when subst is called
> just to simplify some code (normally with pc_rtx, pc_rtx).

Indeed, this looks better but it still needs the REG_P checks for the inner
operands of the extends to not screw up the arm case.
Here's my proposed extension. I've also modified the testcase slightly to not use
an unitialised variable. It still demonstrates the issue we're trying to solve.

Bootstrapped and tested on arm and aarch64.
I'll let you put it through it's paces on your setup :)

P.S. Do we want to restrict this to targets that have a widening mul optab like I did
in the original patch?

Thanks,
Kyrill

2015-11-06  Segher Boessenkool  <segher@kernel.crashing.org>
             Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * combine.c (subst): Don't substitute or simplify when
     handling register-wise widening multiply.
     (force_to_mode): Likewise.

2015-11-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/umaddl_combine_1.c: New test.

>> If not, I got it to work by using that check to set the in_dest variable to
>> the
>> subsequent recursive call to subst, in a similar way to my original patch,
>> but I was
>> hoping to avoid overloading the meaning of in_dest.
> Yes me too :-)
>
>
> Segher
>
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index c3db2e0..c806db9 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -5284,6 +5284,20 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
>   	      || GET_CODE (SET_DEST (x)) == PC))
>   	fmt = "ie";
>   
> +      /* Substituting into the operands of a widening MULT is not likely
> +	 to create RTL matching a machine insn.  */
> +      if (code == MULT
> +	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
> +	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
> +	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
> +	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
> +	{
> +	  if (from == to)
> +	    return x;
> +	  else
> +	    return gen_rtx_CLOBBER (GET_MODE (x), const0_rtx);
> +	}
> +
>         /* Get the mode of operand 0 in case X is now a SIGN_EXTEND of a
>   	 constant.  */
>         if (fmt[0] == 'e')
> @@ -8455,6 +8469,15 @@ force_to_mode (rtx x, machine_mode mode, unsigned HOST_WIDE_INT mask,
>         /* ... fall through ...  */
>   
>       case MULT:
> +      /* Substituting into the operands of a widening MULT is not likely to
> +	 create RTL matching a machine insn.  */
> +      if (code == MULT
> +	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
> +	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
> +	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
> +	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
> +	return gen_lowpart_or_truncate (mode, x);
> +
>         /* For PLUS, MINUS and MULT, we need any bits less significant than the
>   	 most significant bit in MASK since carries from those bits will
>   	 affect the bits we are interested in.  */


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: combine-mul-extend.patch --]
[-- Type: text/x-patch; name=combine-mul-extend.patch, Size: 2823 bytes --]

commit 24d4a7f9f32e9290e95f940c1ac57fc9be26687c
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Fri Oct 30 13:56:10 2015 +0000

    [combine] Don't transform sign and zero extends inside mults

diff --git a/gcc/combine.c b/gcc/combine.c
index fa1a93f..26b14bf 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5286,6 +5286,22 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 	      || GET_CODE (SET_DEST (x)) == PC))
 	fmt = "ie";
 
+      /* Substituting into the operands of a widening MULT is not likely
+	 to create RTL matching a machine insn.  */
+      if (code == MULT
+	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND)
+	  && REG_P (XEXP (XEXP (x, 0), 0))
+	  && REG_P (XEXP (XEXP (x, 1), 0)))
+	{
+	  if (from == to)
+	    return x;
+	  else
+	    return gen_rtx_CLOBBER (GET_MODE (x), const0_rtx);
+	}
+
       /* Get the mode of operand 0 in case X is now a SIGN_EXTEND of a
 	 constant.  */
       if (fmt[0] == 'e')
@@ -8447,6 +8463,17 @@ force_to_mode (rtx x, machine_mode mode, unsigned HOST_WIDE_INT mask,
       /* ... fall through ...  */
 
     case MULT:
+      /* Substituting into the operands of a widening MULT is not likely to
+	 create RTL matching a machine insn.  */
+      if (code == MULT
+	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND)
+	  && REG_P (XEXP (XEXP (x, 0), 0))
+	  && REG_P (XEXP (XEXP (x, 1), 0)))
+	return gen_lowpart_or_truncate (mode, x);
+
       /* For PLUS, MINUS and MULT, we need any bits less significant than the
 	 most significant bit in MASK since carries from those bits will
 	 affect the bits we are interested in.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/umaddl_combine_1.c b/gcc/testsuite/gcc.target/aarch64/umaddl_combine_1.c
new file mode 100644
index 0000000..430277f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/umaddl_combine_1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a53" } */
+
+enum reg_class
+{
+  NO_REGS,
+  AD_REGS,
+  ALL_REGS, LIM_REG_CLASSES
+};
+
+extern enum reg_class
+  reg_class_subclasses[((int) LIM_REG_CLASSES)][((int) LIM_REG_CLASSES)];
+
+void
+init_reg_sets_1 (unsigned int i)
+{
+  unsigned int j;
+  {
+    for (j = i + 1; j < ((int) LIM_REG_CLASSES); j++)
+      {
+	enum reg_class *p;
+	p = &reg_class_subclasses[j][0];
+	while (*p != LIM_REG_CLASSES)
+	  p++;
+      }
+  }
+}
+
+/* { dg-final { scan-assembler-not "umull\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-06 14:19       ` Kyrill Tkachov
@ 2015-11-06 21:14         ` Jeff Law
  2015-11-06 22:00           ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Law @ 2015-11-06 21:14 UTC (permalink / raw)
  To: Kyrill Tkachov, Segher Boessenkool; +Cc: gcc Patches

On 11/06/2015 07:19 AM, Kyrill Tkachov wrote:
>>> I think we should also add:
>>>        && REG_P (XEXP (XEXP (x, 0), 0))
>>>        && REG_P (XEXP (XEXP (x, 1), 0))
I tend to agree.

>> Yep, see new patch.  The "from == to" condition is for when subst is
>> called
>> just to simplify some code (normally with pc_rtx, pc_rtx).
>
> Indeed, this looks better but it still needs the REG_P checks for the inner
> operands of the extends to not screw up the arm case.
> Here's my proposed extension. I've also modified the testcase slightly
> to not use
> an unitialised variable. It still demonstrates the issue we're trying to
> solve.
>
> Bootstrapped and tested on arm and aarch64.
> I'll let you put it through it's paces on your setup :)
>
> P.S. Do we want to restrict this to targets that have a widening mul
> optab like I did in the original patch?
I don't think it's necessary or desirable.  Consider the case where the 
target has a limited widening multiply insn -- we may not want to expose 
it to the RTL expanders due to some limitation, but we still want the 
combiner to be able to create it under the right circumstances.

>
> Thanks,
> Kyrill
>
> 2015-11-06  Segher Boessenkool  <segher@kernel.crashing.org>
>              Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>      * combine.c (subst): Don't substitute or simplify when
>      handling register-wise widening multiply.
>      (force_to_mode): Likewise.
>
> 2015-11-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>      * gcc.target/aarch64/umaddl_combine_1.c: New test.
I'll let Segher give the final yes/no on this, but it generally looks 
good to me.

jeff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-06 21:14         ` Jeff Law
@ 2015-11-06 22:00           ` Segher Boessenkool
  2015-11-08 20:58             ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2015-11-06 22:00 UTC (permalink / raw)
  To: Jeff Law; +Cc: Kyrill Tkachov, gcc Patches

[ reordered a bit ]

On Fri, Nov 06, 2015 at 02:14:12PM -0700, Jeff Law wrote:
> On 11/06/2015 07:19 AM, Kyrill Tkachov wrote:
> >>>I think we should also add:
> >>>       && REG_P (XEXP (XEXP (x, 0), 0))
> >>>       && REG_P (XEXP (XEXP (x, 1), 0))
> I tend to agree.

> >Indeed, this looks better but it still needs the REG_P checks for the inner
> >operands of the extends to not screw up the arm case.

> >P.S. Do we want to restrict this to targets that have a widening mul
> >optab like I did in the original patch?
> I don't think it's necessary or desirable.

With the REG_P checks added, now simplification is only stopped for
widening muls of registers, so all is fine -- any such construct _is_
a widening multiplication!

This patch stops combine from generating widening muls of anything else
but registers (immediates, memory, ...).  This probably is a reasonable
tradeoff for all targets, even those (if any) that have such insns.

> >I'll let you put it through it's paces on your setup :)

> I'll let Segher give the final yes/no on this, but it generally looks 
> good to me.

It looks okay to me too.  Testing now, combine patches have the tendency
to do unforeseen things on other targets ;-)


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-06 22:00           ` Segher Boessenkool
@ 2015-11-08 20:58             ` Segher Boessenkool
  2015-11-09  7:52               ` Uros Bizjak
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2015-11-08 20:58 UTC (permalink / raw)
  To: Jeff Law; +Cc: Kyrill Tkachov, gcc Patches, ubizjak

On Fri, Nov 06, 2015 at 04:00:08PM -0600, Segher Boessenkool wrote:
> This patch stops combine from generating widening muls of anything else
> but registers (immediates, memory, ...).  This probably is a reasonable
> tradeoff for all targets, even those (if any) that have such insns.
> 
> > >I'll let you put it through it's paces on your setup :)
> 
> > I'll let Segher give the final yes/no on this, but it generally looks 
> > good to me.
> 
> It looks okay to me too.  Testing now, combine patches have the tendency
> to do unforeseen things on other targets ;-)

Testing shows it makes a difference only very rarely.  For many targets
it makes no difference, for a few it is a small win.  For 32-bit x86 it
creates slightly bigger code.

I think it looks good, but let's wait to hear Uros' opinion.


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-08 20:58             ` Segher Boessenkool
@ 2015-11-09  7:52               ` Uros Bizjak
  2015-11-09  9:51                 ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Uros Bizjak @ 2015-11-09  7:52 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Jeff Law, Kyrill Tkachov, gcc Patches

On Sun, Nov 8, 2015 at 9:58 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Fri, Nov 06, 2015 at 04:00:08PM -0600, Segher Boessenkool wrote:
>> This patch stops combine from generating widening muls of anything else
>> but registers (immediates, memory, ...).  This probably is a reasonable
>> tradeoff for all targets, even those (if any) that have such insns.
>>
>> > >I'll let you put it through it's paces on your setup :)
>>
>> > I'll let Segher give the final yes/no on this, but it generally looks
>> > good to me.
>>
>> It looks okay to me too.  Testing now, combine patches have the tendency
>> to do unforeseen things on other targets ;-)
>
> Testing shows it makes a difference only very rarely.  For many targets
> it makes no difference, for a few it is a small win.  For 32-bit x86 it
> creates slightly bigger code.
>
> I think it looks good, but let's wait to hear Uros' opinion.

From the original patch submission, it looks that this patch would
also benefit x86_32.

Regarding the above code size increase -  do you perhaps have a
testcase, to see what causes the difference? It isn't necessary due to
the patch, but perhaps some loads are moved to the insn and aren't
CSE'd anymore.

Uros.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-09  7:52               ` Uros Bizjak
@ 2015-11-09  9:51                 ` Segher Boessenkool
  2015-11-10 19:53                   ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2015-11-09  9:51 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jeff Law, Kyrill Tkachov, gcc Patches

On Mon, Nov 09, 2015 at 08:52:13AM +0100, Uros Bizjak wrote:
> On Sun, Nov 8, 2015 at 9:58 PM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Fri, Nov 06, 2015 at 04:00:08PM -0600, Segher Boessenkool wrote:
> >> This patch stops combine from generating widening muls of anything else
> >> but registers (immediates, memory, ...).  This probably is a reasonable
> >> tradeoff for all targets, even those (if any) that have such insns.
> >>
> >> > >I'll let you put it through it's paces on your setup :)
> >>
> >> > I'll let Segher give the final yes/no on this, but it generally looks
> >> > good to me.
> >>
> >> It looks okay to me too.  Testing now, combine patches have the tendency
> >> to do unforeseen things on other targets ;-)
> >
> > Testing shows it makes a difference only very rarely.  For many targets
> > it makes no difference, for a few it is a small win.  For 32-bit x86 it
> > creates slightly bigger code.
> >
> > I think it looks good, but let's wait to hear Uros' opinion.
> 
> >From the original patch submission, it looks that this patch would
> also benefit x86_32.

Yes, that is what I thought too.

> Regarding the above code size increase -  do you perhaps have a
> testcase, to see what causes the difference?

I could extract some.  It happens quite rarely on usual code.

> It isn't necessary due to
> the patch, but perhaps some loads are moved to the insn and aren't
> CSE'd anymore.


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-09  9:51                 ` Segher Boessenkool
@ 2015-11-10 19:53                   ` Segher Boessenkool
  2015-11-13 10:10                     ` Kyrill Tkachov
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2015-11-10 19:53 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jeff Law, Kyrill Tkachov, gcc Patches

On Mon, Nov 09, 2015 at 03:51:32AM -0600, Segher Boessenkool wrote:
> > >From the original patch submission, it looks that this patch would
> > also benefit x86_32.
> 
> Yes, that is what I thought too.
> 
> > Regarding the above code size increase -  do you perhaps have a
> > testcase, to see what causes the difference?
> 
> I could extract some.  It happens quite rarely on usual code.
> 
> > It isn't necessary due to
> > the patch, but perhaps some loads are moved to the insn and aren't
> > CSE'd anymore.

I don't have a small testcase yet.

What causes the degradation is that sometimes we end up with imul reg,reg
instead of imul mem,reg.  In the normal case we already have imul mem,reg
after expand, so the patch doesn't change anything in the normal case.
Even if expand didn't do it fwprop would I think.

It also isn't LRA that is doing it, the MEMs in case are not on stack.
Maybe as you say some CSE pass.

For x86_64, which has many more registers than i386, often a peephole
fires that turns a  mov reg,reg ; imul mem,reg  into an  mov mem,reg ;
imul reg,reg  which makes the generated machines code identical with
or without the patch (tested on a Linux build, 12MB text).

The i386 size regression is 0.01% btw (comparable to the gains for
other targets).

Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-10 19:53                   ` Segher Boessenkool
@ 2015-11-13 10:10                     ` Kyrill Tkachov
  2015-11-13 10:17                       ` Uros Bizjak
  0 siblings, 1 reply; 17+ messages in thread
From: Kyrill Tkachov @ 2015-11-13 10:10 UTC (permalink / raw)
  To: Segher Boessenkool, Uros Bizjak; +Cc: Jeff Law, gcc Patches

Hi all,

On 10/11/15 19:53, Segher Boessenkool wrote:
> On Mon, Nov 09, 2015 at 03:51:32AM -0600, Segher Boessenkool wrote:
>>> >From the original patch submission, it looks that this patch would
>>> also benefit x86_32.
>> Yes, that is what I thought too.
>>
>>> Regarding the above code size increase -  do you perhaps have a
>>> testcase, to see what causes the difference?
>> I could extract some.  It happens quite rarely on usual code.
>>
>>> It isn't necessary due to
>>> the patch, but perhaps some loads are moved to the insn and aren't
>>> CSE'd anymore.
> I don't have a small testcase yet.
>
> What causes the degradation is that sometimes we end up with imul reg,reg
> instead of imul mem,reg.  In the normal case we already have imul mem,reg
> after expand, so the patch doesn't change anything in the normal case.
> Even if expand didn't do it fwprop would I think.
>
> It also isn't LRA that is doing it, the MEMs in case are not on stack.
> Maybe as you say some CSE pass.
>
> For x86_64, which has many more registers than i386, often a peephole
> fires that turns a  mov reg,reg ; imul mem,reg  into an  mov mem,reg ;
> imul reg,reg  which makes the generated machines code identical with
> or without the patch (tested on a Linux build, 12MB text).
>
> The i386 size regression is 0.01% btw (comparable to the gains for
> other targets).

So how shall we proceed? Should we try to come up with
a testcase? I wonder what are the RTL patterns that combine
now tries on i386 that don't match...

Kyrill

>
>
> Segher
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-13 10:10                     ` Kyrill Tkachov
@ 2015-11-13 10:17                       ` Uros Bizjak
  2015-11-13 15:01                         ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Uros Bizjak @ 2015-11-13 10:17 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Segher Boessenkool, Jeff Law, gcc Patches

On Fri, Nov 13, 2015 at 11:10 AM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> Hi all,
>
> On 10/11/15 19:53, Segher Boessenkool wrote:
>>
>> On Mon, Nov 09, 2015 at 03:51:32AM -0600, Segher Boessenkool wrote:
>>>>
>>>> >From the original patch submission, it looks that this patch would
>>>> also benefit x86_32.
>>>
>>> Yes, that is what I thought too.
>>>
>>>> Regarding the above code size increase -  do you perhaps have a
>>>> testcase, to see what causes the difference?
>>>
>>> I could extract some.  It happens quite rarely on usual code.
>>>
>>>> It isn't necessary due to
>>>> the patch, but perhaps some loads are moved to the insn and aren't
>>>> CSE'd anymore.
>>
>> I don't have a small testcase yet.
>>
>> What causes the degradation is that sometimes we end up with imul reg,reg
>> instead of imul mem,reg.  In the normal case we already have imul mem,reg
>> after expand, so the patch doesn't change anything in the normal case.
>> Even if expand didn't do it fwprop would I think.
>>
>> It also isn't LRA that is doing it, the MEMs in case are not on stack.
>> Maybe as you say some CSE pass.
>>
>> For x86_64, which has many more registers than i386, often a peephole
>> fires that turns a  mov reg,reg ; imul mem,reg  into an  mov mem,reg ;
>> imul reg,reg  which makes the generated machines code identical with
>> or without the patch (tested on a Linux build, 12MB text).
>>
>> The i386 size regression is 0.01% btw (comparable to the gains for
>> other targets).
>
>
> So how shall we proceed? Should we try to come up with
> a testcase? I wonder what are the RTL patterns that combine
> now tries on i386 that don't match...

IMO, this is such a small code-size regression, that it should not
block the patch submission. It would be nice to know, what causes the
increase (in case, this is some systematic oversight), but we can live
with it.

Uros.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
  2015-11-13 10:17                       ` Uros Bizjak
@ 2015-11-13 15:01                         ` Segher Boessenkool
  0 siblings, 0 replies; 17+ messages in thread
From: Segher Boessenkool @ 2015-11-13 15:01 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Kyrill Tkachov, Jeff Law, gcc Patches

On Fri, Nov 13, 2015 at 11:17:38AM +0100, Uros Bizjak wrote:
> IMO, this is such a small code-size regression, that it should not
> block the patch submission.

In that case: Kyrill, the patch is okay for trunk.  Thanks!

> It would be nice to know, what causes the
> increase (in case, this is some systematic oversight), but we can live
> with it.

After the patch it will no longer combine an imul reg,reg (+ mov) into an
imul mem,reg.  _Most_ cases that end up as mem,reg are already expanded
as such, but not all.  It's hard to make a smallish testcase.


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-11-13 15:01 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-02 14:15 [PATCH][combine][RFC] Don't transform sign and zero extends inside mults Kyrill Tkachov
2015-11-02 22:31 ` Jeff Law
2015-11-04 11:37   ` Kyrill Tkachov
2015-11-04 13:33     ` Kyrill Tkachov
2015-11-04 23:50 ` Segher Boessenkool
2015-11-05 12:01   ` Kyrill Tkachov
2015-11-06  0:56     ` Segher Boessenkool
2015-11-06 14:19       ` Kyrill Tkachov
2015-11-06 21:14         ` Jeff Law
2015-11-06 22:00           ` Segher Boessenkool
2015-11-08 20:58             ` Segher Boessenkool
2015-11-09  7:52               ` Uros Bizjak
2015-11-09  9:51                 ` Segher Boessenkool
2015-11-10 19:53                   ` Segher Boessenkool
2015-11-13 10:10                     ` Kyrill Tkachov
2015-11-13 10:17                       ` Uros Bizjak
2015-11-13 15:01                         ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).