public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
@ 2012-09-25  8:04 Uros Bizjak
  2012-09-26 18:17 ` Richard Sandiford
  0 siblings, 1 reply; 23+ messages in thread
From: Uros Bizjak @ 2012-09-25  8:04 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2532 bytes --]

Hello!

Attached patch fixes the combine deficiency, where combine is not able
to recognize combination of:

Trying 6 -> 8:
Failed to match this instruction:
(set (reg:QI 66)
    (mem/j:QI (plus:SI (subreg:SI (plus:DI (reg/v:DI 62 [ position ])
                    (const_int 1 [0x1])) 0)
            (symbol_ref:SI ("array") [flags 0x40]  <var_decl
0x7ffff19ad260 array>)) [0 array S1 A8]))

This should be a valid address, but the RTX is not accepted in
ix86_decompose_address since
we have two displacements here. Combine should simplify this RTX to:

(set (reg:QI 68)
    (mem/j:QI (plus:SI (subreg:SI (reg/v:DI 62 [ position ]) 0)
            (const:SI (plus:SI (symbol_ref:SI ("array") [flags 0x40]
<var_decl 0x7f8d1bc41390 array>)
                    (const_int 1 [0x1])))) [0 array S1 A8]))


as is the case with -m32 (but rejected for 32bit targets in
ix86_address_subreg_operand for unrelated reason).

The solution is, to transform operands of PLUS and MINUS RTX in the
form of (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C), as is
done for standalone subreg operation in 32bit case.  This
transformation generates canonical (plus:M (plus:M (subreg:N (A 0) C1)
C2)) RTXes that can be recognized and further optimized by
simplify_plus_minus.

The effect of the patch on following testcase (-O2 -mx32 -maddres-mode=short):

--cut here--
extern char array[40];

char foo (long long position)
{
  return array[position + 1];
}
--cut here--

 foo:
-       addq    $1, %rdi
-       movzbl  array(%edi), %eax
+       movzbl  array+1(%edi), %eax
        ret

The patch is effective also for -fpic:

 foo:
        movl    array@GOTPCREL(%rip), %eax
-       addq    $1, %rdi
-       movzbl  (%eax,%edi), %eax
+       movzbl  1(%eax,%edi), %eax
        ret

Generated code for vanilla 32bit and 64bit targets is unchanged.

2012-09-25  Uros Bizjak  <ubizjak@gmail.com>

	PR rtl-optimization/54457
	* combine.c (combine_simplify_rtx) <RTX_COMM_ARITH, RTX_BIN_ARIT>:
	For PLUS and MINUS, convert plus_minus_operand_p operands in
	the form of (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C).
	* simplify-rtx.c (plus_minus_operand_p): Make global.
	* rtl.h (plus_minus_operand_p): Declare.

testsuite/ChangeLog:

2012-09-25  Uros Bizjak  <ubizjak@gmail.com>

	PR rtl-optimization/54457
	* gcc.target/i386/pr54457.c: New test.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32}, also on x32 target by H.J [1].

OK for mainline?

[1] http://gcc.gnu.org/ml/gcc-testresults/2012-09/msg02448.html

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 2590 bytes --]

Index: combine.c
===================================================================
--- combine.c	(revision 191681)
+++ combine.c	(working copy)
@@ -5314,6 +5314,26 @@ combine_simplify_rtx (rtx x, enum machine_mode op0
       break;
     case RTX_COMM_ARITH:
     case RTX_BIN_ARITH:
+      if ((code == PLUS || code == MINUS)
+	  && GET_MODE_CLASS (mode) == MODE_INT
+	  && HWI_COMPUTABLE_MODE_P (mode))
+	{
+	  /* Adjust operands in the form of (subreg:M (op:N A C) 0)
+	     to (op:M (subreg:N (A 0)) C).  This transformation 
+	     generates canonical (plus:M (plus:M (subreg:N (A 0) C1) C2))
+	     RTXes that can be recognized and further optimized by
+	     simplify_plus_minus.  */
+	  if (GET_CODE (XEXP (x, 0)) == SUBREG
+	      && plus_minus_operand_p (SUBREG_REG (XEXP (x, 0))))
+	    SUBST (XEXP (x, 0),
+		   force_to_mode (XEXP (x, 0), mode,
+				  ~(unsigned HOST_WIDE_INT) 0, 0));
+	  if (GET_CODE (XEXP (x, 1)) == SUBREG
+	      && plus_minus_operand_p (SUBREG_REG (XEXP (x, 1))))
+	    SUBST (XEXP (x, 1),
+		   force_to_mode (XEXP (x, 1), mode,
+				  ~(unsigned HOST_WIDE_INT) 0, 0));
+	}
       temp = simplify_binary_operation (code, mode, XEXP (x, 0), XEXP (x, 1));
       break;
     case RTX_BITFIELD_OPS:
Index: rtl.h
===================================================================
--- rtl.h	(revision 191681)
+++ rtl.h	(working copy)
@@ -1895,6 +1895,7 @@ extern bool val_signbit_known_set_p (enum machine_
 				     unsigned HOST_WIDE_INT);
 extern bool val_signbit_known_clear_p (enum machine_mode,
 				       unsigned HOST_WIDE_INT);
+extern bool plus_minus_operand_p (const_rtx);
 
 /* In reginfo.c  */
 extern enum machine_mode choose_hard_reg_mode (unsigned int, unsigned int,
Index: simplify-rtx.c
===================================================================
--- simplify-rtx.c	(revision 191681)
+++ simplify-rtx.c	(working copy)
@@ -48,7 +48,6 @@ along with GCC; see the file COPYING3.  If not see
  ((((HOST_WIDE_INT) low) < 0) ? ((HOST_WIDE_INT) -1) : ((HOST_WIDE_INT) 0))
 
 static rtx neg_const_int (enum machine_mode, const_rtx);
-static bool plus_minus_operand_p (const_rtx);
 static bool simplify_plus_minus_op_data_cmp (rtx, rtx);
 static rtx simplify_plus_minus (enum rtx_code, enum machine_mode, rtx, rtx);
 static rtx simplify_immed_subreg (enum machine_mode, rtx, enum machine_mode,
@@ -4186,7 +4185,7 @@ simplify_plus_minus (enum rtx_code code, enum mach
 }
 
 /* Check whether an operand is suitable for calling simplify_plus_minus.  */
-static bool
+bool
 plus_minus_operand_p (const_rtx x)
 {
   return GET_CODE (x) == PLUS

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-09-25  8:04 [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Uros Bizjak
@ 2012-09-26 18:17 ` Richard Sandiford
  2012-09-26 21:38   ` Eric Botcazou
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Sandiford @ 2012-09-26 18:17 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

Uros Bizjak <ubizjak@gmail.com> writes:
> The solution is, to transform operands of PLUS and MINUS RTX in the
> form of (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C), as is
> done for standalone subreg operation in 32bit case.

I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is
a good transformation, but why do we need to handle as special
the case where the subreg is itself the operand of a plus or minus?
I think it should happen regardless of where the subreg occurs.

On the face of it, this feels like something that could legimately
go in simplify_subreg.

Richard

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-09-26 18:17 ` Richard Sandiford
@ 2012-09-26 21:38   ` Eric Botcazou
  2012-09-27 14:25     ` Uros Bizjak
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Botcazou @ 2012-09-26 21:38 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Uros Bizjak

> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is
> a good transformation, but why do we need to handle as special
> the case where the subreg is itself the operand of a plus or minus?
> I think it should happen regardless of where the subreg occurs.

Don't we need to restrict this to the low part though?

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-09-26 21:38   ` Eric Botcazou
@ 2012-09-27 14:25     ` Uros Bizjak
  2012-09-27 16:10       ` Richard Sandiford
  0 siblings, 1 reply; 23+ messages in thread
From: Uros Bizjak @ 2012-09-27 14:25 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1573 bytes --]

On Wed, Sep 26, 2012 at 11:22 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is
>> a good transformation, but why do we need to handle as special
>> the case where the subreg is itself the operand of a plus or minus?
>> I think it should happen regardless of where the subreg occurs.
>
> Don't we need to restrict this to the low part though?

I have tried this approach with attached patch.  Unfortunately,
although it survived bootstrap without libjava on x86_64, it failed
building libjava with:

/home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1299:0:
error: insn does not satisfy its constraints:
   }
 ^
(insn 237 398 399 7 (set (reg:SI 1 dx [125])
        (plus:SI (subreg:SI (mult:DI (reg:DI 1 dx [orig:72 D.78627 ] [72])
                    (const_int 2 [0x2])) 0)
            (reg:SI 5 di)))
/home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1271
240 {*leasi}
     (expr_list:REG_DEAD (reg:DI 5 di)
        (nil)))

Original RTX was (subreg:SI (plus:DI (mult:DI (...) reg:DI))), which
is valid RTX pattern for lea insn, the above is not.

Due to these problems, I think the safer approach is to limit the
transformation to (plus:SI (subreg:SI (plus:DI (...) 0)) RTXes, as was
the case with original patch. This approach would fix a specific
problem where simplify_plus_minus is not able to simplify the combined
RTX at combine time. Please note, that combined RTXes are always
checked for correctness at combine pass.

Uros.

[-- Attachment #2: s.diff.txt --]
[-- Type: text/plain, Size: 1199 bytes --]

Index: simplify-rtx.c
===================================================================
--- simplify-rtx.c	(revision 191796)
+++ simplify-rtx.c	(working copy)
@@ -5689,6 +5688,21 @@ simplify_subreg (enum machine_mode outermode, rtx
 	return CONST0_RTX (outermode);
     }
 
+  /* Simplify (subreg:SI (plus:DI ((x:DI) (y:DI)), 0)
+     to (plus:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where
+     the outer subreg is effectively a truncation to the original mode.  */
+  if ((GET_CODE (op) == PLUS
+       || GET_CODE (op) == MINUS)
+      && SCALAR_INT_MODE_P (outermode)
+      && SCALAR_INT_MODE_P (innermode)
+      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
+      && subreg_lsb_1 (outermode, innermode, byte) == 0)
+    return simplify_gen_binary (GET_CODE (op), outermode,
+				simplify_gen_subreg (outermode, XEXP (op, 0),
+						     innermode, 0),
+				simplify_gen_subreg (outermode, XEXP (op, 1),
+						     innermode, 0));
+  
   /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into
      to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and
      the outer subreg is effectively a truncation to the original mode.  */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-09-27 14:25     ` Uros Bizjak
@ 2012-09-27 16:10       ` Richard Sandiford
  2012-09-27 18:20         ` [PATCH v2, " Uros Bizjak
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Sandiford @ 2012-09-27 16:10 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Eric Botcazou, gcc-patches

Uros Bizjak <ubizjak@gmail.com> writes:
> On Wed, Sep 26, 2012 at 11:22 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is
>>> a good transformation, but why do we need to handle as special
>>> the case where the subreg is itself the operand of a plus or minus?
>>> I think it should happen regardless of where the subreg occurs.
>>
>> Don't we need to restrict this to the low part though?
>
> I have tried this approach with attached patch.  Unfortunately,
> although it survived bootstrap without libjava on x86_64, it failed
> building libjava with:
>
> /home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1299:0:
> error: insn does not satisfy its constraints:
>    }
>  ^
> (insn 237 398 399 7 (set (reg:SI 1 dx [125])
>         (plus:SI (subreg:SI (mult:DI (reg:DI 1 dx [orig:72 D.78627 ] [72])
>                     (const_int 2 [0x2])) 0)
>             (reg:SI 5 di)))
> /home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1271
> 240 {*leasi}
>      (expr_list:REG_DEAD (reg:DI 5 di)
>         (nil)))
>
> Original RTX was (subreg:SI (plus:DI (mult:DI (...) reg:DI))), which
> is valid RTX pattern for lea insn, the above is not.
>
> Due to these problems, I think the safer approach is to limit the
> transformation to (plus:SI (subreg:SI (plus:DI (...) 0)) RTXes, as was
> the case with original patch. This approach would fix a specific
> problem where simplify_plus_minus is not able to simplify the combined
> RTX at combine time. Please note, that combined RTXes are always
> checked for correctness at combine pass.

I think instead the (subreg (plus ...)) handling should be applied
to (subreg (mult ...)) too.  IMO the correct form of the above address
ought to be:

    (set (reg:SI 1 dx [125])
         (plus:SI (mult:SI (reg:SI 1 dx [orig:72 D.78627 ] [72])
                           (const_int 2 [0x2]))
                  (reg:SI 5 di))

Richard

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-09-27 16:10       ` Richard Sandiford
@ 2012-09-27 18:20         ` Uros Bizjak
  2012-09-27 18:35           ` Paul_Koning
  2012-09-27 20:33           ` [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Jakub Jelinek
  0 siblings, 2 replies; 23+ messages in thread
From: Uros Bizjak @ 2012-09-27 18:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Eric Botcazou

[-- Attachment #1: Type: text/plain, Size: 2736 bytes --]

On Thu, Sep 27, 2012 at 4:25 PM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:

>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is
>>>> a good transformation, but why do we need to handle as special
>>>> the case where the subreg is itself the operand of a plus or minus?
>>>> I think it should happen regardless of where the subreg occurs.
>>>
>>> Don't we need to restrict this to the low part though?
>>
>> I have tried this approach with attached patch.  Unfortunately,
>> although it survived bootstrap without libjava on x86_64, it failed
>> building libjava with:
>>
>> /home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1299:0:
>> error: insn does not satisfy its constraints:
>>    }
>>  ^
>> (insn 237 398 399 7 (set (reg:SI 1 dx [125])
>>         (plus:SI (subreg:SI (mult:DI (reg:DI 1 dx [orig:72 D.78627 ] [72])
>>                     (const_int 2 [0x2])) 0)
>>             (reg:SI 5 di)))
>> /home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1271
>> 240 {*leasi}
>>      (expr_list:REG_DEAD (reg:DI 5 di)
>>         (nil)))
>>
>> Original RTX was (subreg:SI (plus:DI (mult:DI (...) reg:DI))), which
>> is valid RTX pattern for lea insn, the above is not.
>>
>> Due to these problems, I think the safer approach is to limit the
>> transformation to (plus:SI (subreg:SI (plus:DI (...) 0)) RTXes, as was
>> the case with original patch. This approach would fix a specific
>> problem where simplify_plus_minus is not able to simplify the combined
>> RTX at combine time. Please note, that combined RTXes are always
>> checked for correctness at combine pass.
>
> I think instead the (subreg (plus ...)) handling should be applied
> to (subreg (mult ...)) too.  IMO the correct form of the above address
> ought to be:
>
>     (set (reg:SI 1 dx [125])
>          (plus:SI (mult:SI (reg:SI 1 dx [orig:72 D.78627 ] [72])
>                            (const_int 2 [0x2]))
>                   (reg:SI 5 di))

Great, this works as expected!

After some off-line discussion with Richard, attached is v2 of the patch.

2012-09-27  Uros Bizjak  <ubizjak@gmail.com>

        PR rtl-optimization/54457
        * simplify-rtx.c (simplify_subreg):
	Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
     	to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).

testsuite/ChangeLog:

2012-09-27  Uros Bizjak  <ubizjak@gmail.com>

        PR rtl-optimization/54457
        * gcc.target/i386/pr54457.c: New test.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

BTW: I propose that we start with limited selection of opcodes, so x32
autotester will pick and test the patch with SImode addresses.

OK for mainline?

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 1917 bytes --]

Index: simplify-rtx.c
===================================================================
--- simplify-rtx.c	(revision 191808)
+++ simplify-rtx.c	(working copy)
@@ -5689,6 +5689,28 @@ simplify_subreg (enum machine_mode outermode, rtx
 	return CONST0_RTX (outermode);
     }
 
+  /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
+     to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where
+     the outer subreg is effectively a truncation to the original mode.  */
+  if ((GET_CODE (op) == PLUS
+       || GET_CODE (op) == MINUS
+       || GET_CODE (op) == MULT)
+      && SCALAR_INT_MODE_P (outermode)
+      && SCALAR_INT_MODE_P (innermode)
+      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
+      && byte == subreg_lowpart_offset (outermode, innermode))
+    {
+      rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0),
+                                     innermode, byte);
+      if (op0)
+        {
+          rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1),
+                                         innermode, byte);
+          if (op1)
+            return simplify_gen_binary (GET_CODE (op), outermode, op0, op1);
+        }
+    }
+
   /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into
      to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and
      the outer subreg is effectively a truncation to the original mode.  */
Index: testsuite/gcc.target/i386/pr54457.c
===================================================================
--- testsuite/gcc.target/i386/pr54457.c	(revision 0)
+++ testsuite/gcc.target/i386/pr54457.c	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { ! { ia32 } } } } */
+/* { dg-options "-O2 -mx32 -maddress-mode=short" } */
+
+extern char array[40];
+
+char foo (long long position)
+{
+  return array[position + 1];
+}
+
+/* { dg-final { scan-assembler-not "add\[lq\]?\[^\n\]*1" } } */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-09-27 18:20         ` [PATCH v2, " Uros Bizjak
@ 2012-09-27 18:35           ` Paul_Koning
  2012-09-27 19:21             ` Uros Bizjak
  2012-09-27 20:33           ` [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Jakub Jelinek
  1 sibling, 1 reply; 23+ messages in thread
From: Paul_Koning @ 2012-09-27 18:35 UTC (permalink / raw)
  To: ubizjak; +Cc: gcc-patches, rdsandiford, ebotcazou


On Sep 27, 2012, at 2:04 PM, Uros Bizjak wrote:

> 
> 
> 
>>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is
>>>>> a good transformation, but why do we need to handle as special
>>>>> the case where the subreg is itself the operand of a plus or minus?
>>>>> I think it should happen regardless of where the subreg occurs.
>>>> 
>>>> Don't we need to restrict this to the low part though?
>>> 
>>> ...
> 
> After some off-line discussion with Richard, attached is v2 of the patch.
> 
> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
> 
>        PR rtl-optimization/54457
>        * simplify-rtx.c (simplify_subreg):
> 	Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>     	to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).
> ...

Is it just specific to DI -> SI, or is it for any large mode -> smaller mode, like SI -> HI?

	paul


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-09-27 18:35           ` Paul_Koning
@ 2012-09-27 19:21             ` Uros Bizjak
  2012-10-02  2:13               ` Andrew Pinski
  0 siblings, 1 reply; 23+ messages in thread
From: Uros Bizjak @ 2012-09-27 19:21 UTC (permalink / raw)
  To: Paul_Koning; +Cc: gcc-patches, rdsandiford, ebotcazou

On Thu, Sep 27, 2012 at 8:08 PM,  <Paul_Koning@dell.com> wrote:

>>>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is
>>>>>> a good transformation, but why do we need to handle as special
>>>>>> the case where the subreg is itself the operand of a plus or minus?
>>>>>> I think it should happen regardless of where the subreg occurs.
>>>>>
>>>>> Don't we need to restrict this to the low part though?
>>>>
>>>> ...
>>
>> After some off-line discussion with Richard, attached is v2 of the patch.
>>
>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>>
>>        PR rtl-optimization/54457
>>        * simplify-rtx.c (simplify_subreg):
>>       Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>>       to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).
>> ...
>
> Is it just specific to DI -> SI, or is it for any large mode -> smaller mode, like SI -> HI?

Oh, I just copied v1 ChangeLog. The patch converts all modes where
size of mode M < size of mode N. Updated ChangeLog reads:

2012-09-27  Uros Bizjak  <ubizjak@gmail.com>

        PR rtl-optimization/54457
        * simplify-rtx.c (simplify_subreg):
	Simplify (subreg:M (op:N ((x:N) (y:N)), 0)
     	to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where
	the outer subreg is effectively a truncation to the original mode M.

testsuite/ChangeLog:

2012-09-27  Uros Bizjak  <ubizjak@gmail.com>

        PR rtl-optimization/54457
        * gcc.target/i386/pr54457.c: New test.

Uros.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-09-27 18:20         ` [PATCH v2, " Uros Bizjak
  2012-09-27 18:35           ` Paul_Koning
@ 2012-09-27 20:33           ` Jakub Jelinek
  2012-09-27 22:37             ` Richard Sandiford
  2012-09-28 15:39             ` Uros Bizjak
  1 sibling, 2 replies; 23+ messages in thread
From: Jakub Jelinek @ 2012-09-27 20:33 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Sandiford, Eric Botcazou

On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote:
> After some off-line discussion with Richard, attached is v2 of the patch.
> 
> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
> 
>         PR rtl-optimization/54457
>         * simplify-rtx.c (simplify_subreg):
> 	Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>      	to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).

Is that a good idea even for WORD_REGISTER_OPERATIONS targets?

> --- simplify-rtx.c	(revision 191808)
> +++ simplify-rtx.c	(working copy)
> @@ -5689,6 +5689,28 @@ simplify_subreg (enum machine_mode outermode, rtx
>  	return CONST0_RTX (outermode);
>      }
>  
> +  /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
> +     to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where
> +     the outer subreg is effectively a truncation to the original mode.  */
> +  if ((GET_CODE (op) == PLUS
> +       || GET_CODE (op) == MINUS
> +       || GET_CODE (op) == MULT)
> +      && SCALAR_INT_MODE_P (outermode)
> +      && SCALAR_INT_MODE_P (innermode)
> +      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
> +      && byte == subreg_lowpart_offset (outermode, innermode))
> +    {
> +      rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0),
> +                                     innermode, byte);
> +      if (op0)
> +        {
> +          rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1),
> +                                         innermode, byte);
> +          if (op1)
> +            return simplify_gen_binary (GET_CODE (op), outermode, op0, op1);
> +        }
> +    }
> +
>    /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into
>       to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and
>       the outer subreg is effectively a truncation to the original mode.  */

	Jakub

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-09-27 20:33           ` [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Jakub Jelinek
@ 2012-09-27 22:37             ` Richard Sandiford
  2012-09-28 15:39             ` Uros Bizjak
  1 sibling, 0 replies; 23+ messages in thread
From: Richard Sandiford @ 2012-09-27 22:37 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, gcc-patches, Eric Botcazou

Jakub Jelinek <jakub@redhat.com> writes:
> On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote:
>> After some off-line discussion with Richard, attached is v2 of the patch.
>> 
>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>> 
>>         PR rtl-optimization/54457
>>         * simplify-rtx.c (simplify_subreg):
>> 	Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>>      	to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).
>
> Is that a good idea even for WORD_REGISTER_OPERATIONS targets?

I think so.  Admittedly it means I'm going to have to change the
mips.md BADDU patterns from:

(define_insn "*baddu_si_el"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (zero_extend:SI
	 (subreg:QI
	  (plus:SI (match_operand:SI 1 "register_operand" "d")
		   (match_operand:SI 2 "register_operand" "d")) 0)))]
  "ISA_HAS_BADDU && !BYTES_BIG_ENDIAN"
  "baddu\\t%0,%1,%2"
  [(set_attr "alu_type" "add")])

to:

(define_insn "*baddu_si_el"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (zero_extend:SI
	 (plus:QI (match_operand:QI 1 "register_operand" "d")
		  (match_operand:QI 2 "register_operand" "d"))))]
  "ISA_HAS_BADDU"
  "baddu\\t%0,%1,%2"
  [(set_attr "alu_type" "add")])

But really, that's a better representation even on MIPS,
not least because it gets rid of the ugly endianness condition.

There will probably be fallout on other targets too.
But the upside is that we get rid of more subregs from .mds.
I think a few of the i386.md define_peephole2s could go too.
E.g. the second two of:

(define_peephole2
  [(set (match_operand:DI 0 "register_operand")
  	(zero_extend:DI
	  (plus:SI (match_operand:SI 1 "register_operand")
		   (match_operand:SI 2 "nonmemory_operand"))))]
  "TARGET_64BIT && !TARGET_OPT_AGU
   && REGNO (operands[0]) == REGNO (operands[1])
   && peep2_regno_dead_p (0, FLAGS_REG)"
  [(parallel [(set (match_dup 0)
		   (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))
	      (clobber (reg:CC FLAGS_REG))])])

(define_peephole2
  [(set (match_operand:DI 0 "register_operand")
  	(zero_extend:DI
	  (plus:SI (match_operand:SI 1 "nonmemory_operand")
		   (match_operand:SI 2 "register_operand"))))]
  "TARGET_64BIT && !TARGET_OPT_AGU
   && REGNO (operands[0]) == REGNO (operands[2])
   && peep2_regno_dead_p (0, FLAGS_REG)"
  [(parallel [(set (match_dup 0)
		   (zero_extend:DI (plus:SI (match_dup 2) (match_dup 1))))
	      (clobber (reg:CC FLAGS_REG))])])

(define_peephole2
  [(set (match_operand:DI 0 "register_operand")
  	(zero_extend:DI
	  (subreg:SI (plus:DI (match_dup 0)
			      (match_operand:DI 1 "nonmemory_operand")) 0)))]
  "TARGET_64BIT && !TARGET_OPT_AGU
   && peep2_regno_dead_p (0, FLAGS_REG)"
  [(parallel [(set (match_dup 0)
		   (zero_extend:DI (plus:SI (match_dup 2) (match_dup 1))))
	      (clobber (reg:CC FLAGS_REG))])]
{
  operands[1] = gen_lowpart (SImode, operands[1]);
  operands[2] = gen_lowpart (SImode, operands[0]);
})

(define_peephole2
  [(set (match_operand:DI 0 "register_operand")
  	(zero_extend:DI
	  (subreg:SI (plus:DI (match_operand:DI 1 "nonmemory_operand")
		     	      (match_dup 0)) 0)))]
  "TARGET_64BIT && !TARGET_OPT_AGU
   && peep2_regno_dead_p (0, FLAGS_REG)"
  [(parallel [(set (match_dup 0)
		   (zero_extend:DI (plus:SI (match_dup 2) (match_dup 1))))
	      (clobber (reg:CC FLAGS_REG))])]
{
  operands[1] = gen_lowpart (SImode, operands[1]);
  operands[2] = gen_lowpart (SImode, operands[0]);
})

where we should now always generate the first two forms.

There's no semantic difference between the two rtxes, and I think
it would be confusing to have different canonical forms on different
targets.  If the caller really wants a word-mode operation on
WORD_REGISTER_OPERATIONS targets, then I think it's asking for
the wrong thing by taking this subreg.

Richard

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-09-27 20:33           ` [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Jakub Jelinek
  2012-09-27 22:37             ` Richard Sandiford
@ 2012-09-28 15:39             ` Uros Bizjak
  2012-09-30 11:40               ` Richard Sandiford
  1 sibling, 1 reply; 23+ messages in thread
From: Uros Bizjak @ 2012-09-28 15:39 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Richard Sandiford, Eric Botcazou

On Thu, Sep 27, 2012 at 8:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote:
>> After some off-line discussion with Richard, attached is v2 of the patch.
>>
>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>>
>>         PR rtl-optimization/54457
>>         * simplify-rtx.c (simplify_subreg):
>>       Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>>       to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).
>
> Is that a good idea even for WORD_REGISTER_OPERATIONS targets?

I have bootstrapped and regtested [1] the patch on
alphaev68-pc-linux-gnu, a WORD_REGISTER_OPERATIONS target, and there
were no additional failures.

[1] http://gcc.gnu.org/ml/gcc-testresults/2012-09/msg02828.html

Uros.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-09-28 15:39             ` Uros Bizjak
@ 2012-09-30 11:40               ` Richard Sandiford
  2012-10-03 11:07                 ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Sandiford @ 2012-09-30 11:40 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jakub Jelinek, gcc-patches, Eric Botcazou

Uros Bizjak <ubizjak@gmail.com> writes:
> On Thu, Sep 27, 2012 at 8:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote:
>>> After some off-line discussion with Richard, attached is v2 of the patch.
>>>
>>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>>>
>>>         PR rtl-optimization/54457
>>>         * simplify-rtx.c (simplify_subreg):
>>>       Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>>>       to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).
>>
>> Is that a good idea even for WORD_REGISTER_OPERATIONS targets?
>
> I have bootstrapped and regtested [1] the patch on
> alphaev68-pc-linux-gnu, a WORD_REGISTER_OPERATIONS target, and there
> were no additional failures.

Thanks.  Given Jakub's question/concern, I'd really prefer a third
opinion rather than approving it myself, but... OK if no-one objects
within 24hrs.

Richard

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-09-27 19:21             ` Uros Bizjak
@ 2012-10-02  2:13               ` Andrew Pinski
  2012-10-02 19:32                 ` Richard Sandiford
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Pinski @ 2012-10-02  2:13 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Paul_Koning, gcc-patches, rdsandiford, ebotcazou

On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Thu, Sep 27, 2012 at 8:08 PM,  <Paul_Koning@dell.com> wrote:
>
>>>>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is
>>>>>>> a good transformation, but why do we need to handle as special
>>>>>>> the case where the subreg is itself the operand of a plus or minus?
>>>>>>> I think it should happen regardless of where the subreg occurs.
>>>>>>
>>>>>> Don't we need to restrict this to the low part though?
>>>>>
>>>>> ...
>>>
>>> After some off-line discussion with Richard, attached is v2 of the patch.
>>>
>>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>>>
>>>        PR rtl-optimization/54457
>>>        * simplify-rtx.c (simplify_subreg):
>>>       Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>>>       to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).
>>> ...
>>
>> Is it just specific to DI -> SI, or is it for any large mode -> smaller mode, like SI -> HI?
>
> Oh, I just copied v1 ChangeLog. The patch converts all modes where
> size of mode M < size of mode N. Updated ChangeLog reads:
>
> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>
>         PR rtl-optimization/54457
>         * simplify-rtx.c (simplify_subreg):
>         Simplify (subreg:M (op:N ((x:N) (y:N)), 0)
>         to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where
>         the outer subreg is effectively a truncation to the original mode M.


When I was doing something similar on our internal toolchain at
Cavium.  I found doing this caused a regression on MIPS64 n32 in
gcc.c-torture/execute/20040709-1.c Where:


(insn 15 14 16 2 (set (reg/v:DI 200 [ y ])
        (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit}
     (expr_list:REG_DEAD (reg:DI 2 $2)
        (nil)))

(insn 16 15 17 2 (set (reg:DI 210)
        (zero_extract:DI (reg/v:DI 200 [ y ])
            (const_int 29 [0x1d])
            (const_int 0 [0]))) t.c:16 249 {extzvdi}
     (expr_list:REG_DEAD (reg/v:DI 200 [ y ])
        (nil)))

(insn 17 16 23 2 (set (reg:SI 211)
        (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2}
     (expr_list:REG_DEAD (reg:DI 210)
        (nil)))

Gets converted to:
(insn 23 17 26 2 (set (reg/i:SI 2 $2)
        (and:SI (reg:SI 2 $2 [+4 ])
            (const_int 536870911 [0x1fffffff]))) t.c:18 156 {*andsi3}
     (nil))

Which is considered an ext instruction

And with the Octeon simulator which causes undefined arguments to
32bit word operations to come out as 0xDEADBEEF which showed the
regression.  I fixed it by changing it to produce TRUNCATE instead of
the subreg.

I did the simplification on ior/and rather than plus/minus/mult so the
issue is only when expanding to this to and/ior.

Thanks,
Andrew Pinski



>
> testsuite/ChangeLog:
>
> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>
>         PR rtl-optimization/54457
>         * gcc.target/i386/pr54457.c: New test.
>
> Uros.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-10-02  2:13               ` Andrew Pinski
@ 2012-10-02 19:32                 ` Richard Sandiford
  2012-10-06 10:22                   ` RFA: Simplifying truncation and integer lowpart subregs Richard Sandiford
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Sandiford @ 2012-10-02 19:32 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Uros Bizjak, Paul_Koning, gcc-patches, ebotcazou

Andrew Pinski <andrew.pinski@caviumnetworks.com> writes:
> On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>>
>>         PR rtl-optimization/54457
>>         * simplify-rtx.c (simplify_subreg):
>>         Simplify (subreg:M (op:N ((x:N) (y:N)), 0)
>>         to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where
>>         the outer subreg is effectively a truncation to the original mode M.
>
>
> When I was doing something similar on our internal toolchain at
> Cavium.  I found doing this caused a regression on MIPS64 n32 in
> gcc.c-torture/execute/20040709-1.c Where:
>
>
> (insn 15 14 16 2 (set (reg/v:DI 200 [ y ])
>         (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit}
>      (expr_list:REG_DEAD (reg:DI 2 $2)
>         (nil)))
>
> (insn 16 15 17 2 (set (reg:DI 210)
>         (zero_extract:DI (reg/v:DI 200 [ y ])
>             (const_int 29 [0x1d])
>             (const_int 0 [0]))) t.c:16 249 {extzvdi}
>      (expr_list:REG_DEAD (reg/v:DI 200 [ y ])
>         (nil)))
>
> (insn 17 16 23 2 (set (reg:SI 211)
>         (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2}
>      (expr_list:REG_DEAD (reg:DI 210)
>         (nil)))
>
> Gets converted to:
> (insn 23 17 26 2 (set (reg/i:SI 2 $2)
>         (and:SI (reg:SI 2 $2 [+4 ])
>             (const_int 536870911 [0x1fffffff]))) t.c:18 156 {*andsi3}
>      (nil))
>
> Which is considered an ext instruction
>
> And with the Octeon simulator which causes undefined arguments to
> 32bit word operations to come out as 0xDEADBEEF which showed the
> regression.  I fixed it by changing it to produce TRUNCATE instead of
> the subreg.
>
> I did the simplification on ior/and rather than plus/minus/mult so the
> issue is only when expanding to this to and/ior.

Hmm, hadn't thought of that.  I think some of the existing subreg
optimisations suffer the same problem.  I.e. we can't assume that
subreg truncations of nested operands are OK just because the outer
subreg is OK.

I've got a patch I'm testing.

BTW, I haven't forgotten about your other ext patch.  Was hoping
to see whether we could finally take the opportunity to parameterise
the ext* patterns by mode, but got distracted with other patches.
Maybe I'll just have to admit I won't get time to try it for 4.8...

Richard

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant
  2012-09-30 11:40               ` Richard Sandiford
@ 2012-10-03 11:07                 ` Paolo Bonzini
  0 siblings, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2012-10-03 11:07 UTC (permalink / raw)
  To: Uros Bizjak, Jakub Jelinek, gcc-patches, Eric Botcazou, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 1226 bytes --]

Il 30/09/2012 11:02, Richard Sandiford ha scritto:
> Uros Bizjak <ubizjak@gmail.com> writes:
>> On Thu, Sep 27, 2012 at 8:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote:
>>>> After some off-line discussion with Richard, attached is v2 of the patch.
>>>>
>>>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>>>>
>>>>         PR rtl-optimization/54457
>>>>         * simplify-rtx.c (simplify_subreg):
>>>>       Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>>>>       to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).
>>>
>>> Is that a good idea even for WORD_REGISTER_OPERATIONS targets?
>>
>> I have bootstrapped and regtested [1] the patch on
>> alphaev68-pc-linux-gnu, a WORD_REGISTER_OPERATIONS target, and there
>> were no additional failures.
> 
> Thanks.  Given Jakub's question/concern, I'd really prefer a third
> opinion rather than approving it myself, but... OK if no-one objects
> within 24hrs.

I used to have a patch doing roughly the same thing (for more operations
but not MULT).  I never submitted because I didn't have the time to
audit all targets after changing the canonicalization.

Perhaps you can take the best of both worlds.

Paolo

[-- Attachment #2: pr39726.patch --]
[-- Type: text/x-patch, Size: 5830 bytes --]

change canonicalization

Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(branch pr39726)
+++ gcc/Makefile.in	(working copy)
@@ -2844,7 +2844,7 @@ jump.o : jump.c $(CONFIG_H) $(SYSTEM_H) 
 simplify-rtx.o : simplify-rtx.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(RTL_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) $(REAL_H) insn-config.h \
    $(RECOG_H) $(EXPR_H) $(TOPLEV_H) output.h $(FUNCTION_H) $(GGC_H) $(TM_P_H) \
-   $(TREE_H) $(TARGET_H)
+   $(TREE_H) $(TARGET_H) $(OPTABS_H)
 cgraph.o : cgraph.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
    langhooks.h $(TOPLEV_H) $(FLAGS_H) $(GGC_H) $(TARGET_H) $(CGRAPH_H) \
    gt-cgraph.h output.h intl.h $(BASIC_BLOCK_H) debug.h $(HASHTAB_H) \
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	(branch pr39726)
+++ gcc/simplify-rtx.c	(working copy)
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  
 #include "output.h"
 #include "ggc.h"
 #include "target.h"
+#include "optabs.h"
 
 /* Simplification and canonicalization of RTL.  */
 
@@ -5191,6 +5192,8 @@ rtx
 simplify_subreg (enum machine_mode outermode, rtx op,
 		 enum machine_mode innermode, unsigned int byte)
 {
+  int is_lowpart;
+
   /* Little bit of sanity checking.  */
   gcc_assert (innermode != VOIDmode);
   gcc_assert (outermode != VOIDmode);
@@ -5300,11 +5303,13 @@ simplify_subreg (enum machine_mode outer
       return NULL_RTX;
     }
 
+  is_lowpart = subreg_lowpart_offset (outermode, innermode) == byte;
+
   /* Merge implicit and explicit truncations.  */
 
   if (GET_CODE (op) == TRUNCATE
       && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode)
-      && subreg_lowpart_offset (outermode, innermode) == byte)
+      && is_lowpart)
     return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0),
 			       GET_MODE (XEXP (op, 0)));
 
@@ -5343,7 +5348,7 @@ simplify_subreg (enum machine_mode outer
 	     The information is used only by alias analysis that can not
 	     grog partial register anyway.  */
 
-	  if (subreg_lowpart_offset (outermode, innermode) == byte)
+	  if (is_lowpart)
 	    ORIGINAL_REGNO (x) = ORIGINAL_REGNO (op);
 	  return x;
 	}
@@ -5393,6 +5398,51 @@ simplify_subreg (enum machine_mode outer
       return NULL_RTX;
     }
 
+  /* Try to move a subreg inside an arithmetic operation.  */
+  if (is_lowpart && ARITHMETIC_P (op)
+      && GET_MODE_CLASS (outermode) == MODE_INT
+      && GET_MODE_CLASS (innermode) == MODE_INT)
+    {
+      enum insn_code ic;
+      enum machine_mode cnt_mode;
+      switch (GET_CODE (op))
+	{
+	case ABS:
+	case NEG:
+	  return simplify_gen_unary (GET_CODE (op), outermode,
+				     rtl_hooks.gen_lowpart_no_emit
+				      (outermode, XEXP (op, 0)),
+				     outermode);
+
+	case ASHIFT:
+	  /* i386 always uses QImode for the shift count.  Get the
+	     appropriate mode from the optab.  */
+	  ic = ashl_optab->handlers[outermode].insn_code;
+	  if (ic != CODE_FOR_nothing)
+	    cnt_mode = insn_data[ic].operand[2].mode;
+	  else
+	    cnt_mode = outermode;
+	  return simplify_gen_binary (GET_CODE (op), outermode,
+				      rtl_hooks.gen_lowpart_no_emit
+				       (outermode, XEXP (op, 0)),
+				      rtl_hooks.gen_lowpart_no_emit
+				       (cnt_mode, XEXP (op, 1)));
+
+	case PLUS:
+	case MINUS:
+	case AND:
+	case IOR:
+	case XOR:
+	  return simplify_gen_binary (GET_CODE (op), outermode,
+				      rtl_hooks.gen_lowpart_no_emit
+				       (outermode, XEXP (op, 0)),
+				      rtl_hooks.gen_lowpart_no_emit
+				       (outermode, XEXP (op, 1)));
+	default:
+	  break;
+	}
+    }
+
   /* Optimize SUBREG truncations of zero and sign extended values.  */
   if ((GET_CODE (op) == ZERO_EXTEND
        || GET_CODE (op) == SIGN_EXTEND)
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(branch pr39726)
+++ gcc/combine.c	(working copy)
@@ -11155,47 +11155,6 @@ simplify_comparison (enum rtx_code code,
 	      continue;
 	    }
 
-	  /* If this is (and:M1 (subreg:M2 X 0) (const_int C1)) where C1
-	     fits in both M1 and M2 and the SUBREG is either paradoxical
-	     or represents the low part, permute the SUBREG and the AND
-	     and try again.  */
-	  if (GET_CODE (XEXP (op0, 0)) == SUBREG)
-	    {
-	      unsigned HOST_WIDE_INT c1;
-	      tmode = GET_MODE (SUBREG_REG (XEXP (op0, 0)));
-	      /* Require an integral mode, to avoid creating something like
-		 (AND:SF ...).  */
-	      if (SCALAR_INT_MODE_P (tmode)
-		  /* It is unsafe to commute the AND into the SUBREG if the
-		     SUBREG is paradoxical and WORD_REGISTER_OPERATIONS is
-		     not defined.  As originally written the upper bits
-		     have a defined value due to the AND operation.
-		     However, if we commute the AND inside the SUBREG then
-		     they no longer have defined values and the meaning of
-		     the code has been changed.  */
-		  && (0
-#ifdef WORD_REGISTER_OPERATIONS
-		      || (mode_width > GET_MODE_BITSIZE (tmode)
-			  && mode_width <= BITS_PER_WORD)
-#endif
-		      || (mode_width <= GET_MODE_BITSIZE (tmode)
-			  && subreg_lowpart_p (XEXP (op0, 0))))
-		  && CONST_INT_P (XEXP (op0, 1))
-		  && mode_width <= HOST_BITS_PER_WIDE_INT
-		  && GET_MODE_BITSIZE (tmode) <= HOST_BITS_PER_WIDE_INT
-		  && ((c1 = INTVAL (XEXP (op0, 1))) & ~mask) == 0
-		  && (c1 & ~GET_MODE_MASK (tmode)) == 0
-		  && c1 != mask
-		  && c1 != GET_MODE_MASK (tmode))
-		{
-		  op0 = simplify_gen_binary (AND, tmode,
-					     SUBREG_REG (XEXP (op0, 0)),
-					     gen_int_mode (c1, tmode));
-		  op0 = gen_lowpart (mode, op0);
-		  continue;
-		}
-	    }
-
 	  /* Convert (ne (and (not X) 1) 0) to (eq (and X 1) 0).  */
 	  if (const_op == 0 && equality_comparison_p
 	      && XEXP (op0, 1) == const1_rtx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RFA: Simplifying truncation and integer lowpart subregs
  2012-10-02 19:32                 ` Richard Sandiford
@ 2012-10-06 10:22                   ` Richard Sandiford
  2012-10-06 11:13                     ` Eric Botcazou
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Sandiford @ 2012-10-06 10:22 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: Uros Bizjak, Paul_Koning, gcc-patches, ebotcazou, kkojima,
	aoliva, dje.gcc, uweigand, walt

[cc:ing sh, spu and tilegx maintainers]

Richard Sandiford <rdsandiford@googlemail.com> writes:
> Andrew Pinski <andrew.pinski@caviumnetworks.com> writes:
>> On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>>>
>>>         PR rtl-optimization/54457
>>>         * simplify-rtx.c (simplify_subreg):
>>>         Simplify (subreg:M (op:N ((x:N) (y:N)), 0)
>>>         to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where
>>>         the outer subreg is effectively a truncation to the original mode M.
>>
>>
>> When I was doing something similar on our internal toolchain at
>> Cavium.  I found doing this caused a regression on MIPS64 n32 in
>> gcc.c-torture/execute/20040709-1.c Where:
>>
>>
>> (insn 15 14 16 2 (set (reg/v:DI 200 [ y ])
>>         (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit}
>>      (expr_list:REG_DEAD (reg:DI 2 $2)
>>         (nil)))
>>
>> (insn 16 15 17 2 (set (reg:DI 210)
>>         (zero_extract:DI (reg/v:DI 200 [ y ])
>>             (const_int 29 [0x1d])
>>             (const_int 0 [0]))) t.c:16 249 {extzvdi}
>>      (expr_list:REG_DEAD (reg/v:DI 200 [ y ])
>>         (nil)))
>>
>> (insn 17 16 23 2 (set (reg:SI 211)
>>         (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2}
>>      (expr_list:REG_DEAD (reg:DI 210)
>>         (nil)))
>>
>> Gets converted to:
>> (insn 23 17 26 2 (set (reg/i:SI 2 $2)
>>         (and:SI (reg:SI 2 $2 [+4 ])
>>             (const_int 536870911 [0x1fffffff]))) t.c:18 156 {*andsi3}
>>      (nil))
>>
>> Which is considered an ext instruction
>>
>> And with the Octeon simulator which causes undefined arguments to
>> 32bit word operations to come out as 0xDEADBEEF which showed the
>> regression.  I fixed it by changing it to produce TRUNCATE instead of
>> the subreg.
>>
>> I did the simplification on ior/and rather than plus/minus/mult so the
>> issue is only when expanding to this to and/ior.
>
> Hmm, hadn't thought of that.  I think some of the existing subreg
> optimisations suffer the same problem.  I.e. we can't assume that
> subreg truncations of nested operands are OK just because the outer
> subreg is OK.
>
> I've got a patch I'm testing.

The idea is to split most of the lowpart subreg handling out of
simplify_subreg and apply it to TRUNCATE too.  There are three reasons:

- I wanted to make the !TRULY_NOOP_TRUNCATION truncation simplifications
  as similar to subreg truncation simplifications as possible.

- Some of the current lowpart subreg simplifications are also correct
  for vector truncations.

- Ideally, using simplify_gen_unary (TRUNCATE, ...) instead of
  simplify_gen_subreg shouldn't penalise TRULY_NOOP_TRUNCATION targets.
  There is already code to use gen_lowpart_no_emit for truncations that
  reduce to subregs, but as things stand, gen_lowpart_no_emit only
  passes objects like SUBREG, REG, MEM, etc., to simplify_gen_subreg;
  others go through gen_lowpart_SUBREG and get no recursive simplification.

We inherited this code from a 1996 patch (r13058):

      if ((TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op))
	   ? (num_sign_bit_copies (op, GET_MODE (op))
	      > (unsigned int) (GET_MODE_PRECISION (GET_MODE (op))
				- GET_MODE_PRECISION (mode)))
        ...
 	return rtl_hooks.gen_lowpart_no_emit (mode, op);

I don't see any reason for the sign-bit check.  If truncations are noops,
we should be able to use a subreg regardless.

Other than removing that check, the patch just moves simplifications around.
I've not tried to match new patterns.

The other !TRULY_NOOP_TRUNCATION targets are sh64, spu and tilegx.
I don't think sh64 has any patterns that would be adversely affected,
although the patch ought to make these patterns redundant:

(define_insn_and_split "*logical_sidisi3"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
	(truncate:SI (sign_extend:DI
			(match_operator:SI 3 "logical_operator"
			  [(match_operand:SI 1 "arith_reg_operand" "%r,r")
			   (match_operand:SI 2 "logical_operand" "r,I10")]))))]
  "TARGET_SHMEDIA"
  "#"
  "&& 1"
  [(set (match_dup 0) (match_dup 3))])

(define_insn_and_split "*logical_sidi3_2"
  [(set (match_operand:DI 0 "arith_reg_dest" "=r,r")
	(sign_extend:DI (truncate:SI (sign_extend:DI
			(match_operator:SI 3 "logical_operator"
			  [(match_operand:SI 1 "arith_reg_operand" "%r,r")
			   (match_operand:SI 2 "logical_operand" "r,I10")])))))]
  "TARGET_SHMEDIA"
  "#"
  "&& 1"
  [(set (match_dup 0) (sign_extend:DI (match_dup 3)))])

combine should now simplify the first to the normal SI logical op
and the second to *logical_sidisi3.  I don't think any spu or tilegx
patterns are affected either way.

Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf.  Also tested
by making sure that there were no code differences for a set of gcc .ii
files on gcc20 (-O2 -march=native).  OK to install?

Richard


gcc/
	* machmode.h (GET_MODE_UNIT_PRECISION): New macro.
	* simplify-rtx.c (simplify_truncation): New function.
	(simplify_unary_operation_1): Use it.  Remove sign bit test
	for !TRULY_NOOP_TRUNCATION_MODES_P.
	(simplify_subreg): Use simplify_int_lowpart for TRUNCATE.
	* config/mips/mips.c (mips_truncated_op_cost): New function.
	(mips_rtx_costs): Adjust test for BADDU.
	* config/mips/mips.md (*baddu_di<mode>): Push truncates to operands.

Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2012-06-23 08:30:36.000000000 +0100
+++ gcc/machmode.h	2012-10-06 10:03:47.146873855 +0100
@@ -217,6 +217,11 @@ #define GET_MODE_UNIT_SIZE(MODE)		\
 #define GET_MODE_UNIT_BITSIZE(MODE) \
   ((unsigned short) (GET_MODE_UNIT_SIZE (MODE) * BITS_PER_UNIT))
 
+#define GET_MODE_UNIT_PRECISION(MODE)		\
+  (GET_MODE_INNER (MODE) == VOIDmode		\
+   ? GET_MODE_PRECISION (MODE)			\
+   : GET_MODE_PRECISION (GET_MODE_INNER (MODE)))
+
 /* Get the number of units in the object.  */
 
 extern const unsigned char mode_nunits[NUM_MACHINE_MODES];
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2012-10-02 20:34:15.969129966 +0100
+++ gcc/simplify-rtx.c	2012-10-06 10:08:08.349859303 +0100
@@ -564,6 +564,167 @@ simplify_replace_rtx (rtx x, const_rtx o
   return simplify_replace_fn_rtx (x, old_rtx, 0, new_rtx);
 }
 \f
+/* Try to simplify a MODE truncation of OP, which has OP_MODE.  */
+
+static rtx
+simplify_truncation (enum machine_mode mode, rtx op,
+		     enum machine_mode op_mode)
+{
+  unsigned int precision = GET_MODE_UNIT_PRECISION (mode);
+  unsigned int op_precision = GET_MODE_UNIT_PRECISION (op_mode);
+  gcc_assert (precision <= op_precision);
+
+  /* Optimize truncations of zero and sign extended values.  */
+  if (GET_CODE (op) == ZERO_EXTEND
+      || GET_CODE (op) == SIGN_EXTEND)
+    {
+      /* There are three possibilities.  If MODE is the same as the
+	 origmode, we can omit both the extension and the subreg.
+	 If MODE is not larger than the origmode, we can apply the
+	 truncation without the extension.  Finally, if the outermode
+	 is larger than the origmode, we can just extend to the appropriate
+	 mode.  */
+      enum machine_mode origmode = GET_MODE (XEXP (op, 0));
+      if (mode == origmode)
+	return XEXP (op, 0);
+      else if (precision <= GET_MODE_UNIT_PRECISION (origmode))
+	return simplify_gen_unary (TRUNCATE, mode,
+				   XEXP (op, 0), origmode);
+      else
+	return simplify_gen_unary (GET_CODE (op), mode,
+				   XEXP (op, 0), origmode);
+    }
+
+  /* Simplify (truncate:SI (op:DI (x:DI) (y:DI)))
+     to (op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI))).  */
+  if (GET_CODE (op) == PLUS
+      || GET_CODE (op) == MINUS
+      || GET_CODE (op) == MULT)
+    {
+      rtx op0 = simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0), op_mode);
+      if (op0)
+	{
+	  rtx op1 = simplify_gen_unary (TRUNCATE, mode, XEXP (op, 1), op_mode);
+	  if (op1)
+	    return simplify_gen_binary (GET_CODE (op), mode, op0, op1);
+	}
+    }
+
+  /* Simplify (truncate:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C)) into
+     to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and
+     the outer subreg is effectively a truncation to the original mode.  */
+  if ((GET_CODE (op) == LSHIFTRT
+       || GET_CODE (op) == ASHIFTRT)
+      /* Ensure that OP_MODE is at least twice as wide as MODE
+	 to avoid the possibility that an outer LSHIFTRT shifts by more
+	 than the sign extension's sign_bit_copies and introduces zeros
+	 into the high bits of the result.  */
+      && 2 * precision <= op_precision
+      && CONST_INT_P (XEXP (op, 1))
+      && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
+      && INTVAL (XEXP (op, 1)) < precision)
+    return simplify_gen_binary (ASHIFTRT, mode,
+				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
+
+  /* Likewise (truncate:QI (lshiftrt:SI (zero_extend:SI (x:QI)) C)) into
+     to (lshiftrt:QI (x:QI) C), where C is a suitable small constant and
+     the outer subreg is effectively a truncation to the original mode.  */
+  if ((GET_CODE (op) == LSHIFTRT
+       || GET_CODE (op) == ASHIFTRT)
+      && CONST_INT_P (XEXP (op, 1))
+      && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
+      && INTVAL (XEXP (op, 1)) < precision)
+    return simplify_gen_binary (LSHIFTRT, mode,
+				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
+
+  /* Likewise (truncate:QI (ashift:SI (zero_extend:SI (x:QI)) C)) into
+     to (ashift:QI (x:QI) C), where C is a suitable small constant and
+     the outer subreg is effectively a truncation to the original mode.  */
+  if (GET_CODE (op) == ASHIFT
+      && CONST_INT_P (XEXP (op, 1))
+      && (GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
+	  || GET_CODE (XEXP (op, 0)) == SIGN_EXTEND)
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
+      && INTVAL (XEXP (op, 1)) < precision)
+    return simplify_gen_binary (ASHIFT, mode,
+				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
+
+  /* Recognize a word extraction from a multi-word subreg.  */
+  if ((GET_CODE (op) == LSHIFTRT
+       || GET_CODE (op) == ASHIFTRT)
+      && SCALAR_INT_MODE_P (mode)
+      && SCALAR_INT_MODE_P (op_mode)
+      && precision >= BITS_PER_WORD
+      && 2 * precision <= op_precision
+      && CONST_INT_P (XEXP (op, 1))
+      && (INTVAL (XEXP (op, 1)) & (precision - 1)) == 0
+      && INTVAL (XEXP (op, 1)) >= 0
+      && INTVAL (XEXP (op, 1)) < op_precision)
+    {
+      int byte = subreg_lowpart_offset (mode, op_mode);
+      int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
+      return simplify_gen_subreg (mode, XEXP (op, 0), op_mode,
+				  (WORDS_BIG_ENDIAN
+				   ? byte - shifted_bytes
+				   : byte + shifted_bytes));
+    }
+
+  /* If we have a TRUNCATE of a right shift of MEM, make a new MEM
+     and try replacing the TRUNCATE and shift with it.  Don't do this
+     if the MEM has a mode-dependent address.  */
+  if ((GET_CODE (op) == LSHIFTRT
+       || GET_CODE (op) == ASHIFTRT)
+      && SCALAR_INT_MODE_P (op_mode)
+      && MEM_P (XEXP (op, 0))
+      && CONST_INT_P (XEXP (op, 1))
+      && (INTVAL (XEXP (op, 1)) % GET_MODE_BITSIZE (mode)) == 0
+      && INTVAL (XEXP (op, 1)) > 0
+      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (op_mode)
+      && ! mode_dependent_address_p (XEXP (XEXP (op, 0), 0),
+				     MEM_ADDR_SPACE (XEXP (op, 0)))
+      && ! MEM_VOLATILE_P (XEXP (op, 0))
+      && (GET_MODE_SIZE (mode) >= UNITS_PER_WORD
+	  || WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN))
+    {
+      int byte = subreg_lowpart_offset (mode, op_mode);
+      int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
+      return adjust_address_nv (XEXP (op, 0), mode,
+				(WORDS_BIG_ENDIAN
+				 ? byte - shifted_bytes
+				 : byte + shifted_bytes));
+    }
+
+  /* (truncate:SI (OP:DI ({sign,zero}_extend:DI foo:SI))) is
+     (OP:SI foo:SI) if OP is NEG or ABS.  */
+  if ((GET_CODE (op) == ABS
+       || GET_CODE (op) == NEG)
+      && (GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
+	  || GET_CODE (XEXP (op, 0)) == ZERO_EXTEND)
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode)
+    return simplify_gen_unary (GET_CODE (op), mode,
+			       XEXP (XEXP (op, 0), 0), mode);
+
+  /* (truncate:A (subreg:B (truncate:C X) 0)) is
+     (truncate:A X).  */
+  if (GET_CODE (op) == SUBREG
+      && SCALAR_INT_MODE_P (mode)
+      && SCALAR_INT_MODE_P (op_mode)
+      && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op)))
+      && GET_CODE (SUBREG_REG (op)) == TRUNCATE
+      && subreg_lowpart_p (op))
+    return simplify_gen_unary (TRUNCATE, mode, XEXP (SUBREG_REG (op), 0),
+			       GET_MODE (XEXP (SUBREG_REG (op), 0)));
+
+  /* (truncate:A (truncate:B X)) is (truncate:A X).  */
+  if (GET_CODE (op) == TRUNCATE)
+    return simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0),
+			       GET_MODE (XEXP (op, 0)));
+
+  return NULL_RTX;
+}
+\f
 /* Try to simplify a unary operation CODE whose output mode is to be
    MODE with input operand OP whose mode was originally OP_MODE.
    Return zero if no simplification can be made.  */
@@ -689,12 +850,6 @@ simplify_unary_operation_1 (enum rtx_cod
 	    op_mode = mode;
 	  in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode);
 
-	  if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT)
-	    {
-	      rtx tem = in2;
-	      in2 = in1; in1 = tem;
-	    }
-
 	  return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR,
 				 mode, in1, in2);
 	}
@@ -821,44 +976,24 @@ simplify_unary_operation_1 (enum rtx_cod
       if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
         break;
 
-      /* (truncate:SI ({sign,zero}_extend:DI foo:SI)) == foo:SI.  */
-      if ((GET_CODE (op) == SIGN_EXTEND
-	   || GET_CODE (op) == ZERO_EXTEND)
-	  && GET_MODE (XEXP (op, 0)) == mode)
-	return XEXP (op, 0);
-
-      /* (truncate:SI (OP:DI ({sign,zero}_extend:DI foo:SI))) is
-	 (OP:SI foo:SI) if OP is NEG or ABS.  */
-      if ((GET_CODE (op) == ABS
-	   || GET_CODE (op) == NEG)
-	  && (GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
-	      || GET_CODE (XEXP (op, 0)) == ZERO_EXTEND)
-	  && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode)
-	return simplify_gen_unary (GET_CODE (op), mode,
-				   XEXP (XEXP (op, 0), 0), mode);
+      /* Don't optimize (lshiftrt (mult ...)) as it would interfere
+	 with the umulXi3_highpart patterns.  */
+      if (GET_CODE (op) == LSHIFTRT
+	  && GET_CODE (XEXP (op, 0)) == MULT)
+	break;
 
-      /* (truncate:A (subreg:B (truncate:C X) 0)) is
-	 (truncate:A X).  */
-      if (GET_CODE (op) == SUBREG
-	  && GET_CODE (SUBREG_REG (op)) == TRUNCATE
-	  && subreg_lowpart_p (op))
-	return simplify_gen_unary (TRUNCATE, mode, XEXP (SUBREG_REG (op), 0),
-				   GET_MODE (XEXP (SUBREG_REG (op), 0)));
+      if (GET_MODE (op) != VOIDmode)
+	{
+	  temp = simplify_truncation (mode, op, GET_MODE (op));
+	  if (temp)
+	    return temp;
+	}
 
       /* If we know that the value is already truncated, we can
-         replace the TRUNCATE with a SUBREG.  Note that this is also
-         valid if TRULY_NOOP_TRUNCATION is false for the corresponding
-         modes we just have to apply a different definition for
-         truncation.  But don't do this for an (LSHIFTRT (MULT ...))
-         since this will cause problems with the umulXi3_highpart
-         patterns.  */
-      if ((TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op))
-	   ? (num_sign_bit_copies (op, GET_MODE (op))
-	      > (unsigned int) (GET_MODE_PRECISION (GET_MODE (op))
-				- GET_MODE_PRECISION (mode)))
-	   : truncated_to_mode (mode, op))
-	  && ! (GET_CODE (op) == LSHIFTRT
-		&& GET_CODE (XEXP (op, 0)) == MULT))
+	 replace the TRUNCATE with a SUBREG.  */
+      if (GET_MODE_NUNITS (mode) == 1
+	  && (TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op))
+	      || truncated_to_mode (mode, op)))
 	return rtl_hooks.gen_lowpart_no_emit (mode, op);
 
       /* A truncate of a comparison can be replaced with a subreg if
@@ -5595,14 +5730,6 @@ simplify_subreg (enum machine_mode outer
       return NULL_RTX;
     }
 
-  /* Merge implicit and explicit truncations.  */
-
-  if (GET_CODE (op) == TRUNCATE
-      && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode)
-      && subreg_lowpart_offset (outermode, innermode) == byte)
-    return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0),
-			       GET_MODE (XEXP (op, 0)));
-
   /* SUBREG of a hard register => just change the register number
      and/or mode.  If the hard register is not valid in that mode,
      suppress this simplification.  If the hard register is the stack,
@@ -5688,160 +5815,23 @@ simplify_subreg (enum machine_mode outer
       return NULL_RTX;
     }
 
-  /* Optimize SUBREG truncations of zero and sign extended values.  */
-  if ((GET_CODE (op) == ZERO_EXTEND
-       || GET_CODE (op) == SIGN_EXTEND)
-      && SCALAR_INT_MODE_P (innermode)
-      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode))
+  /* A SUBREG resulting from a zero extension may fold to zero if
+     it extracts higher bits that the ZERO_EXTEND's source bits.  */
+  if (GET_CODE (op) == ZERO_EXTEND)
     {
       unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte);
-
-      /* If we're requesting the lowpart of a zero or sign extension,
-	 there are three possibilities.  If the outermode is the same
-	 as the origmode, we can omit both the extension and the subreg.
-	 If the outermode is not larger than the origmode, we can apply
-	 the truncation without the extension.  Finally, if the outermode
-	 is larger than the origmode, but both are integer modes, we
-	 can just extend to the appropriate mode.  */
-      if (bitpos == 0)
-	{
-	  enum machine_mode origmode = GET_MODE (XEXP (op, 0));
-	  if (outermode == origmode)
-	    return XEXP (op, 0);
-	  if (GET_MODE_PRECISION (outermode) <= GET_MODE_PRECISION (origmode))
-	    return simplify_gen_subreg (outermode, XEXP (op, 0), origmode,
-					subreg_lowpart_offset (outermode,
-							       origmode));
-	  if (SCALAR_INT_MODE_P (outermode))
-	    return simplify_gen_unary (GET_CODE (op), outermode,
-				       XEXP (op, 0), origmode);
-	}
-
-      /* A SUBREG resulting from a zero extension may fold to zero if
-	 it extracts higher bits that the ZERO_EXTEND's source bits.  */
-      if (GET_CODE (op) == ZERO_EXTEND
-	  && bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0))))
+      if (bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0))))
 	return CONST0_RTX (outermode);
     }
 
-  /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
-     to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where
-     the outer subreg is effectively a truncation to the original mode.  */
-  if ((GET_CODE (op) == PLUS
-       || GET_CODE (op) == MINUS
-       || GET_CODE (op) == MULT)
-      && SCALAR_INT_MODE_P (outermode)
+  if (SCALAR_INT_MODE_P (outermode)
       && SCALAR_INT_MODE_P (innermode)
       && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
       && byte == subreg_lowpart_offset (outermode, innermode))
     {
-      rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0),
-                                     innermode, byte);
-      if (op0)
-        {
-          rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1),
-                                         innermode, byte);
-          if (op1)
-            return simplify_gen_binary (GET_CODE (op), outermode, op0, op1);
-        }
-    }
-
-  /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into
-     to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and
-     the outer subreg is effectively a truncation to the original mode.  */
-  if ((GET_CODE (op) == LSHIFTRT
-       || GET_CODE (op) == ASHIFTRT)
-      && SCALAR_INT_MODE_P (outermode)
-      && SCALAR_INT_MODE_P (innermode)
-      /* Ensure that OUTERMODE is at least twice as wide as the INNERMODE
-	 to avoid the possibility that an outer LSHIFTRT shifts by more
-	 than the sign extension's sign_bit_copies and introduces zeros
-	 into the high bits of the result.  */
-      && (2 * GET_MODE_PRECISION (outermode)) <= GET_MODE_PRECISION (innermode)
-      && CONST_INT_P (XEXP (op, 1))
-      && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
-      && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode
-      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode)
-      && subreg_lsb_1 (outermode, innermode, byte) == 0)
-    return simplify_gen_binary (ASHIFTRT, outermode,
-				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
-
-  /* Likewise (subreg:QI (lshiftrt:SI (zero_extend:SI (x:QI)) C), 0) into
-     to (lshiftrt:QI (x:QI) C), where C is a suitable small constant and
-     the outer subreg is effectively a truncation to the original mode.  */
-  if ((GET_CODE (op) == LSHIFTRT
-       || GET_CODE (op) == ASHIFTRT)
-      && SCALAR_INT_MODE_P (outermode)
-      && SCALAR_INT_MODE_P (innermode)
-      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
-      && CONST_INT_P (XEXP (op, 1))
-      && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
-      && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode
-      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode)
-      && subreg_lsb_1 (outermode, innermode, byte) == 0)
-    return simplify_gen_binary (LSHIFTRT, outermode,
-				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
-
-  /* Likewise (subreg:QI (ashift:SI (zero_extend:SI (x:QI)) C), 0) into
-     to (ashift:QI (x:QI) C), where C is a suitable small constant and
-     the outer subreg is effectively a truncation to the original mode.  */
-  if (GET_CODE (op) == ASHIFT
-      && SCALAR_INT_MODE_P (outermode)
-      && SCALAR_INT_MODE_P (innermode)
-      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
-      && CONST_INT_P (XEXP (op, 1))
-      && (GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
-	  || GET_CODE (XEXP (op, 0)) == SIGN_EXTEND)
-      && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode
-      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode)
-      && subreg_lsb_1 (outermode, innermode, byte) == 0)
-    return simplify_gen_binary (ASHIFT, outermode,
-				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
-
-  /* Recognize a word extraction from a multi-word subreg.  */
-  if ((GET_CODE (op) == LSHIFTRT
-       || GET_CODE (op) == ASHIFTRT)
-      && SCALAR_INT_MODE_P (innermode)
-      && GET_MODE_PRECISION (outermode) >= BITS_PER_WORD
-      && GET_MODE_PRECISION (innermode) >= (2 * GET_MODE_PRECISION (outermode))
-      && CONST_INT_P (XEXP (op, 1))
-      && (INTVAL (XEXP (op, 1)) & (GET_MODE_PRECISION (outermode) - 1)) == 0
-      && INTVAL (XEXP (op, 1)) >= 0
-      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (innermode)
-      && byte == subreg_lowpart_offset (outermode, innermode))
-    {
-      int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
-      return simplify_gen_subreg (outermode, XEXP (op, 0), innermode,
-				  (WORDS_BIG_ENDIAN
-				   ? byte - shifted_bytes
-				   : byte + shifted_bytes));
-    }
-
-  /* If we have a lowpart SUBREG of a right shift of MEM, make a new MEM
-     and try replacing the SUBREG and shift with it.  Don't do this if
-     the MEM has a mode-dependent address or if we would be widening it.  */
-
-  if ((GET_CODE (op) == LSHIFTRT
-       || GET_CODE (op) == ASHIFTRT)
-      && SCALAR_INT_MODE_P (innermode)
-      && MEM_P (XEXP (op, 0))
-      && CONST_INT_P (XEXP (op, 1))
-      && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (GET_MODE (op))
-      && (INTVAL (XEXP (op, 1)) % GET_MODE_BITSIZE (outermode)) == 0
-      && INTVAL (XEXP (op, 1)) > 0
-      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode)
-      && ! mode_dependent_address_p (XEXP (XEXP (op, 0), 0),
-				     MEM_ADDR_SPACE (XEXP (op, 0)))
-      && ! MEM_VOLATILE_P (XEXP (op, 0))
-      && byte == subreg_lowpart_offset (outermode, innermode)
-      && (GET_MODE_SIZE (outermode) >= UNITS_PER_WORD
-	  || WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN))
-    {
-      int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
-      return adjust_address_nv (XEXP (op, 0), outermode,
-				(WORDS_BIG_ENDIAN
-				 ? byte - shifted_bytes
-				 : byte + shifted_bytes));
+      rtx tem = simplify_truncation (outermode, op, innermode);
+      if (tem)
+	return tem;
     }
 
   return NULL_RTX;
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2012-10-02 21:02:21.000000000 +0100
+++ gcc/config/mips/mips.c	2012-10-06 10:06:42.617864078 +0100
@@ -3527,6 +3527,17 @@ mips_set_reg_reg_cost (enum machine_mode
     }
 }
 
+/* Return the cost of an operand X that can be trucated for free.
+   SPEED says whether we're optimizing for size or speed.  */
+
+static int
+mips_truncated_op_cost (rtx x, bool speed)
+{
+  if (GET_CODE (x) == TRUNCATE)
+    x = XEXP (x, 0);
+  return set_src_cost (x, speed);
+}
+
 /* Implement TARGET_RTX_COSTS.  */
 
 static bool
@@ -3907,12 +3918,13 @@ mips_rtx_costs (rtx x, int code, int out
     case ZERO_EXTEND:
       if (outer_code == SET
 	  && ISA_HAS_BADDU
-	  && (GET_CODE (XEXP (x, 0)) == TRUNCATE
-	      || GET_CODE (XEXP (x, 0)) == SUBREG)
 	  && GET_MODE (XEXP (x, 0)) == QImode
-	  && GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS)
+	  && GET_CODE (XEXP (x, 0)) == PLUS)
 	{
-	  *total = set_src_cost (XEXP (XEXP (x, 0), 0), speed);
+	  rtx plus = XEXP (x, 0);
+	  *total = (COSTS_N_INSNS (1)
+		    + mips_truncated_op_cost (XEXP (plus, 0), speed)
+		    + mips_truncated_op_cost (XEXP (plus, 1), speed));
 	  return true;
 	}
       *total = mips_zero_extend_cost (mode, XEXP (x, 0));
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	2012-10-02 20:37:34.000000000 +0100
+++ gcc/config/mips/mips.md	2012-10-06 10:03:47.156873851 +0100
@@ -1305,9 +1305,8 @@ (define_insn "*baddu_si"
 (define_insn "*baddu_di<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=d")
         (zero_extend:GPR
-	 (truncate:QI
-	  (plus:DI (match_operand:DI 1 "register_operand" "d")
-		   (match_operand:DI 2 "register_operand" "d")))))]
+	 (plus:QI (truncate:QI (match_operand:DI 1 "register_operand" "d"))
+		  (truncate:QI (match_operand:DI 2 "register_operand" "d")))))]
   "ISA_HAS_BADDU && TARGET_64BIT"
   "baddu\\t%0,%1,%2"
   [(set_attr "alu_type" "add")])

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RFA: Simplifying truncation and integer lowpart subregs
  2012-10-06 10:22                   ` RFA: Simplifying truncation and integer lowpart subregs Richard Sandiford
@ 2012-10-06 11:13                     ` Eric Botcazou
  2012-10-06 12:39                       ` Richard Sandiford
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Botcazou @ 2012-10-06 11:13 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, Andrew Pinski, Uros Bizjak, Paul_Koning, kkojima,
	aoliva, dje.gcc, uweigand, walt

> Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf.  Also tested
> by making sure that there were no code differences for a set of gcc .ii
> files on gcc20 (-O2 -march=native).  OK to install?

Are you sure that generating TRUNCATEs out of nowhere in simplify_subreg is 
always correct?

> gcc/
> 	* machmode.h (GET_MODE_UNIT_PRECISION): New macro.
> 	* simplify-rtx.c (simplify_truncation): New function.

You should say where it comes from.

> 	(simplify_unary_operation_1): Use it.  Remove sign bit test
> 	for !TRULY_NOOP_TRUNCATION_MODES_P.

	(simplify_unary_operation_1) <TRUNCATE>: ...

> 	(simplify_subreg): Use simplify_int_lowpart for TRUNCATE.

This function doesn't exist.  And this is misleading, it's not just for 
TRUNCATE, it's for a truncation to the lowpart.

>  /* Try to simplify a unary operation CODE whose output mode is to be
>     MODE with input operand OP whose mode was originally OP_MODE.
>     Return zero if no simplification can be made.  */
> @@ -689,12 +850,6 @@ simplify_unary_operation_1 (enum rtx_cod
>  	    op_mode = mode;
>  	  in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode);
> 
> -	  if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT)
> -	    {
> -	      rtx tem = in2;
> -	      in2 = in1; in1 = tem;
> -	    }
> -
>  	  return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR,
>  				 mode, in1, in2);
>  	}

Why is this hunk here?

> @@ -5595,14 +5730,6 @@ simplify_subreg (enum machine_mode outer
>        return NULL_RTX;
>      }
> 
> -  /* Merge implicit and explicit truncations.  */
> -
> -  if (GET_CODE (op) == TRUNCATE
> -      && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode)
> -      && subreg_lowpart_offset (outermode, innermode) == byte)
> -    return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0),
> -			       GET_MODE (XEXP (op, 0)));
> -
>    /* SUBREG of a hard register => just change the register number
>       and/or mode.  If the hard register is not valid in that mode,
>       suppress this simplification.  If the hard register is the stack,

Likewise.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RFA: Simplifying truncation and integer lowpart subregs
  2012-10-06 11:13                     ` Eric Botcazou
@ 2012-10-06 12:39                       ` Richard Sandiford
  2012-10-06 13:05                         ` Eric Botcazou
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Sandiford @ 2012-10-06 12:39 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: gcc-patches, Andrew Pinski, Uros Bizjak, Paul_Koning, kkojima,
	aoliva, dje.gcc, uweigand, walt

Thanks for the review.

Eric Botcazou <ebotcazou@adacore.com> writes:
>> Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf.  Also tested
>> by making sure that there were no code differences for a set of gcc .ii
>> files on gcc20 (-O2 -march=native).  OK to install?
>
> Are you sure that generating TRUNCATEs out of nowhere in simplify_subreg is 
> always correct?

We only use TRUNCATEs when truncating the operands of an existing operation.
We don't convert the SUBREG itself to a TRUNCATE.  I don't think it matters
whether the narrowing of the operation was brought about by a TRUNCATE
or by a SUBREG.

I think modelling it as a TRUNCATE operation is correct for
!TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out).
And we shouldn't generate an actual TRUNCATE rtx for
TRULY_NOOP_TRUNCATION (the thing about making
simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg
for those targets).  I suppose:

      /* We can't handle truncation to a partial integer mode here
         because we don't know the real bitsize of the partial
         integer mode.  */
      if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
        break;

might be a problem though; we should still allow a subreg to be
generated.  Is that what you were thinking of, or something else?

I'm certainly open to other ideas though.  Or do you think that I was
wrong about doing this narrowing in simplify_subreg to begin with?

>> gcc/
>> 	* machmode.h (GET_MODE_UNIT_PRECISION): New macro.
>> 	* simplify-rtx.c (simplify_truncation): New function.
>
> You should say where it comes from.

OK.

>> 	(simplify_unary_operation_1): Use it.  Remove sign bit test
>> 	for !TRULY_NOOP_TRUNCATION_MODES_P.
>
> 	(simplify_unary_operation_1) <TRUNCATE>: ...
>
>> 	(simplify_subreg): Use simplify_int_lowpart for TRUNCATE.
>
> This function doesn't exist.  And this is misleading, it's not just for 
> TRUNCATE, it's for a truncation to the lowpart.

Curses, forgot to update that part.  Thanks.

>>  /* Try to simplify a unary operation CODE whose output mode is to be
>>     MODE with input operand OP whose mode was originally OP_MODE.
>>     Return zero if no simplification can be made.  */
>> @@ -689,12 +850,6 @@ simplify_unary_operation_1 (enum rtx_cod
>>  	    op_mode = mode;
>>  	  in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode);
>> 
>> -	  if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT)
>> -	    {
>> -	      rtx tem = in2;
>> -	      in2 = in1; in1 = tem;
>> -	    }
>> -
>>  	  return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR,
>>  				 mode, in1, in2);
>>  	}
>
> Why is this hunk here?

Sorry, I've no idea :-(

>> @@ -5595,14 +5730,6 @@ simplify_subreg (enum machine_mode outer
>>        return NULL_RTX;
>>      }
>> 
>> -  /* Merge implicit and explicit truncations.  */
>> -
>> -  if (GET_CODE (op) == TRUNCATE
>> -      && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode)
>> -      && subreg_lowpart_offset (outermode, innermode) == byte)
>> -    return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0),
>> -			       GET_MODE (XEXP (op, 0)));
>> -
>>    /* SUBREG of a hard register => just change the register number
>>       and/or mode.  If the hard register is not valid in that mode,
>>       suppress this simplification.  If the hard register is the stack,
>
> Likewise.

This moved to simplify_truncation:

  /* (truncate:A (truncate:B X)) is (truncate:A X).  */
  if (GET_CODE (op) == TRUNCATE)
    return simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0),
			       GET_MODE (XEXP (op, 0)));

I haven't included an updated patch because of your general concern above.

Richard

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RFA: Simplifying truncation and integer lowpart subregs
  2012-10-06 12:39                       ` Richard Sandiford
@ 2012-10-06 13:05                         ` Eric Botcazou
  2012-10-07  8:56                           ` Richard Sandiford
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Botcazou @ 2012-10-06 13:05 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, Andrew Pinski, Uros Bizjak, Paul_Koning, kkojima,
	aoliva, dje.gcc, uweigand, walt

> I think modelling it as a TRUNCATE operation is correct for
> !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out).
> And we shouldn't generate an actual TRUNCATE rtx for
> TRULY_NOOP_TRUNCATION (the thing about making
> simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg
> for those targets).  I suppose:
> 
>       /* We can't handle truncation to a partial integer mode here
>          because we don't know the real bitsize of the partial
>          integer mode.  */
>       if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
>         break;
> 
> might be a problem though; we should still allow a subreg to be
> generated.  Is that what you were thinking of, or something else?

I was thinking of the !TRULY_NOOP_TRUNCATION case, where the two operations 
aren't equivalent.  Generating TRUNCATE in simplify_subreg seems suspicious to 
me in this case but, if not doing it is the source of the bug, I guess I need 
to do some homework on this TRULY_NOOP_TRUNCATION stuff. :-)

Maybe add a blurb to the head comment of simplify_truncation, explaining that
it is valid to call the function both for TRUNCATEs and truncations to the 
lowpart, and why it is correct to generate new TRUNCATEs in the latter case.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RFA: Simplifying truncation and integer lowpart subregs
  2012-10-06 13:05                         ` Eric Botcazou
@ 2012-10-07  8:56                           ` Richard Sandiford
  2012-10-07 12:36                             ` Eric Botcazou
  2012-11-28  2:27                             ` Ramana Radhakrishnan
  0 siblings, 2 replies; 23+ messages in thread
From: Richard Sandiford @ 2012-10-07  8:56 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: gcc-patches, Andrew Pinski, Uros Bizjak, Paul_Koning, kkojima,
	aoliva, dje.gcc, uweigand, walt

Eric Botcazou <ebotcazou@adacore.com> writes:
>> I think modelling it as a TRUNCATE operation is correct for
>> !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out).
>> And we shouldn't generate an actual TRUNCATE rtx for
>> TRULY_NOOP_TRUNCATION (the thing about making
>> simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg
>> for those targets).  I suppose:
>> 
>>       /* We can't handle truncation to a partial integer mode here
>>          because we don't know the real bitsize of the partial
>>          integer mode.  */
>>       if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
>>         break;
>> 
>> might be a problem though; we should still allow a subreg to be
>> generated.  Is that what you were thinking of, or something else?
>
> I was thinking of the !TRULY_NOOP_TRUNCATION case, where the two operations 
> aren't equivalent.  Generating TRUNCATE in simplify_subreg seems suspicious to 
> me in this case but, if not doing it is the source of the bug, I guess I need 
> to do some homework on this TRULY_NOOP_TRUNCATION stuff. :-)
>
> Maybe add a blurb to the head comment of simplify_truncation, explaining that
> it is valid to call the function both for TRUNCATEs and truncations to the 
> lowpart, and why it is correct to generate new TRUNCATEs in the latter case.

Yeah, in hindsight, the patch was definitely lacking commentary.
How about the patch below?  It also fixes the partial int case
and gets rid of the errant NOT hunk.  Tested in the same way as before.

Richard


gcc/
	* machmode.h (GET_MODE_UNIT_PRECISION): New macro.
	* simplify-rtx.c (simplify_truncation): New function,
	extracted from simplify_subreg and (in small part) from
	simplify_unary_operation_1.
	(simplify_unary_operation_1) <TRUNCATE>: Use it.  Remove sign bit
	test for !TRULY_NOOP_TRUNCATION_MODES_P.
	(simplify_subreg): Use simplify_truncate for lowpart subregs
	where both the inner and outer modes are scalar integers.
	* config/mips/mips.c (mips_truncated_op_cost): New function.
	(mips_rtx_costs): Adjust test for BADDU.
	* config/mips/mips.md (*baddu_di<mode>): Push truncates to operands.

Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2012-10-06 11:22:55.888609330 +0100
+++ gcc/machmode.h	2012-10-06 11:24:42.459603393 +0100
@@ -217,6 +217,11 @@ #define GET_MODE_UNIT_SIZE(MODE)		\
 #define GET_MODE_UNIT_BITSIZE(MODE) \
   ((unsigned short) (GET_MODE_UNIT_SIZE (MODE) * BITS_PER_UNIT))
 
+#define GET_MODE_UNIT_PRECISION(MODE)		\
+  (GET_MODE_INNER (MODE) == VOIDmode		\
+   ? GET_MODE_PRECISION (MODE)			\
+   : GET_MODE_PRECISION (GET_MODE_INNER (MODE)))
+
 /* Get the number of units in the object.  */
 
 extern const unsigned char mode_nunits[NUM_MACHINE_MODES];
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2012-10-06 11:23:57.005605924 +0100
+++ gcc/simplify-rtx.c	2012-10-06 17:14:40.018658588 +0100
@@ -564,6 +564,212 @@ simplify_replace_rtx (rtx x, const_rtx o
   return simplify_replace_fn_rtx (x, old_rtx, 0, new_rtx);
 }
 \f
+/* Try to simplify a MODE truncation of OP, which has OP_MODE.
+   Only handle cases where the truncated value is inherently an rvalue.
+
+   RTL provides two ways of truncating a value:
+
+   1. a lowpart subreg.  This form is only a truncation when both
+      the outer and inner modes (here MODE and OP_MODE respectively)
+      are scalar integers, and only then when the subreg is used as
+      an rvalue.
+
+      It is only valid to form such truncating subregs if the
+      truncation requires no action by the target.  The onus for
+      proving this is on the creator of the subreg -- e.g. the
+      caller to simplify_subreg or simplify_gen_subreg -- and typically
+      involves either TRULY_NOOP_TRUNCATION_MODES_P or truncated_to_mode.
+
+   2. a TRUNCATE.  This form handles both scalar and compound integers.
+
+   The first form is preferred where valid.  However, the TRUNCATE
+   handling in simplify_unary_operation turns the second form into the
+   first form when TRULY_NOOP_TRUNCATION_MODES_P or truncated_to_mode allow,
+   so it is generally safe to form rvalue truncations using:
+
+      simplify_gen_unary (TRUNCATE, ...)
+
+   and leave simplify_unary_operation to work out which representation
+   should be used.
+
+   Because of the proof requirements on (1), simplify_truncation must
+   also use simplify_gen_unary (TRUNCATE, ...) to truncate parts of OP,
+   regardless of whether the outer truncation came from a SUBREG or a
+   TRUNCATE.  For example, if the caller has proven that an SImode
+   truncation of:
+
+      (and:DI X Y)
+
+   is a no-op and can be represented as a subreg, it does not follow
+   that SImode truncations of X and Y are also no-ops.  On a target
+   like 64-bit MIPS that requires SImode values to be stored in
+   sign-extended form, an SImode truncation of:
+
+      (and:DI (reg:DI X) (const_int 63))
+
+   is trivially a no-op because only the lower 6 bits can be set.
+   However, X is still an arbitrary 64-bit number and so we cannot
+   assume that truncating it too is a no-op.  */
+
+static rtx
+simplify_truncation (enum machine_mode mode, rtx op,
+		     enum machine_mode op_mode)
+{
+  unsigned int precision = GET_MODE_UNIT_PRECISION (mode);
+  unsigned int op_precision = GET_MODE_UNIT_PRECISION (op_mode);
+  gcc_assert (precision <= op_precision);
+
+  /* Optimize truncations of zero and sign extended values.  */
+  if (GET_CODE (op) == ZERO_EXTEND
+      || GET_CODE (op) == SIGN_EXTEND)
+    {
+      /* There are three possibilities.  If MODE is the same as the
+	 origmode, we can omit both the extension and the subreg.
+	 If MODE is not larger than the origmode, we can apply the
+	 truncation without the extension.  Finally, if the outermode
+	 is larger than the origmode, we can just extend to the appropriate
+	 mode.  */
+      enum machine_mode origmode = GET_MODE (XEXP (op, 0));
+      if (mode == origmode)
+	return XEXP (op, 0);
+      else if (precision <= GET_MODE_UNIT_PRECISION (origmode))
+	return simplify_gen_unary (TRUNCATE, mode,
+				   XEXP (op, 0), origmode);
+      else
+	return simplify_gen_unary (GET_CODE (op), mode,
+				   XEXP (op, 0), origmode);
+    }
+
+  /* Simplify (truncate:SI (op:DI (x:DI) (y:DI)))
+     to (op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI))).  */
+  if (GET_CODE (op) == PLUS
+      || GET_CODE (op) == MINUS
+      || GET_CODE (op) == MULT)
+    {
+      rtx op0 = simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0), op_mode);
+      if (op0)
+	{
+	  rtx op1 = simplify_gen_unary (TRUNCATE, mode, XEXP (op, 1), op_mode);
+	  if (op1)
+	    return simplify_gen_binary (GET_CODE (op), mode, op0, op1);
+	}
+    }
+
+  /* Simplify (truncate:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C)) into
+     to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and
+     the outer subreg is effectively a truncation to the original mode.  */
+  if ((GET_CODE (op) == LSHIFTRT
+       || GET_CODE (op) == ASHIFTRT)
+      /* Ensure that OP_MODE is at least twice as wide as MODE
+	 to avoid the possibility that an outer LSHIFTRT shifts by more
+	 than the sign extension's sign_bit_copies and introduces zeros
+	 into the high bits of the result.  */
+      && 2 * precision <= op_precision
+      && CONST_INT_P (XEXP (op, 1))
+      && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
+      && INTVAL (XEXP (op, 1)) < precision)
+    return simplify_gen_binary (ASHIFTRT, mode,
+				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
+
+  /* Likewise (truncate:QI (lshiftrt:SI (zero_extend:SI (x:QI)) C)) into
+     to (lshiftrt:QI (x:QI) C), where C is a suitable small constant and
+     the outer subreg is effectively a truncation to the original mode.  */
+  if ((GET_CODE (op) == LSHIFTRT
+       || GET_CODE (op) == ASHIFTRT)
+      && CONST_INT_P (XEXP (op, 1))
+      && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
+      && INTVAL (XEXP (op, 1)) < precision)
+    return simplify_gen_binary (LSHIFTRT, mode,
+				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
+
+  /* Likewise (truncate:QI (ashift:SI (zero_extend:SI (x:QI)) C)) into
+     to (ashift:QI (x:QI) C), where C is a suitable small constant and
+     the outer subreg is effectively a truncation to the original mode.  */
+  if (GET_CODE (op) == ASHIFT
+      && CONST_INT_P (XEXP (op, 1))
+      && (GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
+	  || GET_CODE (XEXP (op, 0)) == SIGN_EXTEND)
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
+      && INTVAL (XEXP (op, 1)) < precision)
+    return simplify_gen_binary (ASHIFT, mode,
+				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
+
+  /* Recognize a word extraction from a multi-word subreg.  */
+  if ((GET_CODE (op) == LSHIFTRT
+       || GET_CODE (op) == ASHIFTRT)
+      && SCALAR_INT_MODE_P (mode)
+      && SCALAR_INT_MODE_P (op_mode)
+      && precision >= BITS_PER_WORD
+      && 2 * precision <= op_precision
+      && CONST_INT_P (XEXP (op, 1))
+      && (INTVAL (XEXP (op, 1)) & (precision - 1)) == 0
+      && INTVAL (XEXP (op, 1)) >= 0
+      && INTVAL (XEXP (op, 1)) < op_precision)
+    {
+      int byte = subreg_lowpart_offset (mode, op_mode);
+      int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
+      return simplify_gen_subreg (mode, XEXP (op, 0), op_mode,
+				  (WORDS_BIG_ENDIAN
+				   ? byte - shifted_bytes
+				   : byte + shifted_bytes));
+    }
+
+  /* If we have a TRUNCATE of a right shift of MEM, make a new MEM
+     and try replacing the TRUNCATE and shift with it.  Don't do this
+     if the MEM has a mode-dependent address.  */
+  if ((GET_CODE (op) == LSHIFTRT
+       || GET_CODE (op) == ASHIFTRT)
+      && SCALAR_INT_MODE_P (op_mode)
+      && MEM_P (XEXP (op, 0))
+      && CONST_INT_P (XEXP (op, 1))
+      && (INTVAL (XEXP (op, 1)) % GET_MODE_BITSIZE (mode)) == 0
+      && INTVAL (XEXP (op, 1)) > 0
+      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (op_mode)
+      && ! mode_dependent_address_p (XEXP (XEXP (op, 0), 0),
+				     MEM_ADDR_SPACE (XEXP (op, 0)))
+      && ! MEM_VOLATILE_P (XEXP (op, 0))
+      && (GET_MODE_SIZE (mode) >= UNITS_PER_WORD
+	  || WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN))
+    {
+      int byte = subreg_lowpart_offset (mode, op_mode);
+      int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
+      return adjust_address_nv (XEXP (op, 0), mode,
+				(WORDS_BIG_ENDIAN
+				 ? byte - shifted_bytes
+				 : byte + shifted_bytes));
+    }
+
+  /* (truncate:SI (OP:DI ({sign,zero}_extend:DI foo:SI))) is
+     (OP:SI foo:SI) if OP is NEG or ABS.  */
+  if ((GET_CODE (op) == ABS
+       || GET_CODE (op) == NEG)
+      && (GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
+	  || GET_CODE (XEXP (op, 0)) == ZERO_EXTEND)
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode)
+    return simplify_gen_unary (GET_CODE (op), mode,
+			       XEXP (XEXP (op, 0), 0), mode);
+
+  /* (truncate:A (subreg:B (truncate:C X) 0)) is
+     (truncate:A X).  */
+  if (GET_CODE (op) == SUBREG
+      && SCALAR_INT_MODE_P (mode)
+      && SCALAR_INT_MODE_P (op_mode)
+      && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op)))
+      && GET_CODE (SUBREG_REG (op)) == TRUNCATE
+      && subreg_lowpart_p (op))
+    return simplify_gen_unary (TRUNCATE, mode, XEXP (SUBREG_REG (op), 0),
+			       GET_MODE (XEXP (SUBREG_REG (op), 0)));
+
+  /* (truncate:A (truncate:B X)) is (truncate:A X).  */
+  if (GET_CODE (op) == TRUNCATE)
+    return simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0),
+			       GET_MODE (XEXP (op, 0)));
+
+  return NULL_RTX;
+}
+\f
 /* Try to simplify a unary operation CODE whose output mode is to be
    MODE with input operand OP whose mode was originally OP_MODE.
    Return zero if no simplification can be made.  */
@@ -815,50 +1021,34 @@ simplify_unary_operation_1 (enum rtx_cod
       break;
 
     case TRUNCATE:
-      /* We can't handle truncation to a partial integer mode here
-         because we don't know the real bitsize of the partial
-         integer mode.  */
-      if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
-        break;
-
-      /* (truncate:SI ({sign,zero}_extend:DI foo:SI)) == foo:SI.  */
-      if ((GET_CODE (op) == SIGN_EXTEND
-	   || GET_CODE (op) == ZERO_EXTEND)
-	  && GET_MODE (XEXP (op, 0)) == mode)
-	return XEXP (op, 0);
+      /* Don't optimize (lshiftrt (mult ...)) as it would interfere
+	 with the umulXi3_highpart patterns.  */
+      if (GET_CODE (op) == LSHIFTRT
+	  && GET_CODE (XEXP (op, 0)) == MULT)
+	break;
 
-      /* (truncate:SI (OP:DI ({sign,zero}_extend:DI foo:SI))) is
-	 (OP:SI foo:SI) if OP is NEG or ABS.  */
-      if ((GET_CODE (op) == ABS
-	   || GET_CODE (op) == NEG)
-	  && (GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
-	      || GET_CODE (XEXP (op, 0)) == ZERO_EXTEND)
-	  && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode)
-	return simplify_gen_unary (GET_CODE (op), mode,
-				   XEXP (XEXP (op, 0), 0), mode);
+      if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
+	{
+	  if (TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op)))
+	    return rtl_hooks.gen_lowpart_no_emit (mode, op);
+	  /* We can't handle truncation to a partial integer mode here
+	     because we don't know the real bitsize of the partial
+	     integer mode.  */
+	  break;
+	}
 
-      /* (truncate:A (subreg:B (truncate:C X) 0)) is
-	 (truncate:A X).  */
-      if (GET_CODE (op) == SUBREG
-	  && GET_CODE (SUBREG_REG (op)) == TRUNCATE
-	  && subreg_lowpart_p (op))
-	return simplify_gen_unary (TRUNCATE, mode, XEXP (SUBREG_REG (op), 0),
-				   GET_MODE (XEXP (SUBREG_REG (op), 0)));
+      if (GET_MODE (op) != VOIDmode)
+	{
+	  temp = simplify_truncation (mode, op, GET_MODE (op));
+	  if (temp)
+	    return temp;
+	}
 
       /* If we know that the value is already truncated, we can
-         replace the TRUNCATE with a SUBREG.  Note that this is also
-         valid if TRULY_NOOP_TRUNCATION is false for the corresponding
-         modes we just have to apply a different definition for
-         truncation.  But don't do this for an (LSHIFTRT (MULT ...))
-         since this will cause problems with the umulXi3_highpart
-         patterns.  */
-      if ((TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op))
-	   ? (num_sign_bit_copies (op, GET_MODE (op))
-	      > (unsigned int) (GET_MODE_PRECISION (GET_MODE (op))
-				- GET_MODE_PRECISION (mode)))
-	   : truncated_to_mode (mode, op))
-	  && ! (GET_CODE (op) == LSHIFTRT
-		&& GET_CODE (XEXP (op, 0)) == MULT))
+	 replace the TRUNCATE with a SUBREG.  */
+      if (GET_MODE_NUNITS (mode) == 1
+	  && (TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op))
+	      || truncated_to_mode (mode, op)))
 	return rtl_hooks.gen_lowpart_no_emit (mode, op);
 
       /* A truncate of a comparison can be replaced with a subreg if
@@ -5596,14 +5786,6 @@ simplify_subreg (enum machine_mode outer
       return NULL_RTX;
     }
 
-  /* Merge implicit and explicit truncations.  */
-
-  if (GET_CODE (op) == TRUNCATE
-      && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode)
-      && subreg_lowpart_offset (outermode, innermode) == byte)
-    return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0),
-			       GET_MODE (XEXP (op, 0)));
-
   /* SUBREG of a hard register => just change the register number
      and/or mode.  If the hard register is not valid in that mode,
      suppress this simplification.  If the hard register is the stack,
@@ -5689,160 +5871,23 @@ simplify_subreg (enum machine_mode outer
       return NULL_RTX;
     }
 
-  /* Optimize SUBREG truncations of zero and sign extended values.  */
-  if ((GET_CODE (op) == ZERO_EXTEND
-       || GET_CODE (op) == SIGN_EXTEND)
-      && SCALAR_INT_MODE_P (innermode)
-      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode))
+  /* A SUBREG resulting from a zero extension may fold to zero if
+     it extracts higher bits that the ZERO_EXTEND's source bits.  */
+  if (GET_CODE (op) == ZERO_EXTEND)
     {
       unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte);
-
-      /* If we're requesting the lowpart of a zero or sign extension,
-	 there are three possibilities.  If the outermode is the same
-	 as the origmode, we can omit both the extension and the subreg.
-	 If the outermode is not larger than the origmode, we can apply
-	 the truncation without the extension.  Finally, if the outermode
-	 is larger than the origmode, but both are integer modes, we
-	 can just extend to the appropriate mode.  */
-      if (bitpos == 0)
-	{
-	  enum machine_mode origmode = GET_MODE (XEXP (op, 0));
-	  if (outermode == origmode)
-	    return XEXP (op, 0);
-	  if (GET_MODE_PRECISION (outermode) <= GET_MODE_PRECISION (origmode))
-	    return simplify_gen_subreg (outermode, XEXP (op, 0), origmode,
-					subreg_lowpart_offset (outermode,
-							       origmode));
-	  if (SCALAR_INT_MODE_P (outermode))
-	    return simplify_gen_unary (GET_CODE (op), outermode,
-				       XEXP (op, 0), origmode);
-	}
-
-      /* A SUBREG resulting from a zero extension may fold to zero if
-	 it extracts higher bits that the ZERO_EXTEND's source bits.  */
-      if (GET_CODE (op) == ZERO_EXTEND
-	  && bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0))))
+      if (bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0))))
 	return CONST0_RTX (outermode);
     }
 
-  /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
-     to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where
-     the outer subreg is effectively a truncation to the original mode.  */
-  if ((GET_CODE (op) == PLUS
-       || GET_CODE (op) == MINUS
-       || GET_CODE (op) == MULT)
-      && SCALAR_INT_MODE_P (outermode)
-      && SCALAR_INT_MODE_P (innermode)
-      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
-      && byte == subreg_lowpart_offset (outermode, innermode))
-    {
-      rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0),
-                                     innermode, byte);
-      if (op0)
-        {
-          rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1),
-                                         innermode, byte);
-          if (op1)
-            return simplify_gen_binary (GET_CODE (op), outermode, op0, op1);
-        }
-    }
-
-  /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into
-     to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and
-     the outer subreg is effectively a truncation to the original mode.  */
-  if ((GET_CODE (op) == LSHIFTRT
-       || GET_CODE (op) == ASHIFTRT)
-      && SCALAR_INT_MODE_P (outermode)
-      && SCALAR_INT_MODE_P (innermode)
-      /* Ensure that OUTERMODE is at least twice as wide as the INNERMODE
-	 to avoid the possibility that an outer LSHIFTRT shifts by more
-	 than the sign extension's sign_bit_copies and introduces zeros
-	 into the high bits of the result.  */
-      && (2 * GET_MODE_PRECISION (outermode)) <= GET_MODE_PRECISION (innermode)
-      && CONST_INT_P (XEXP (op, 1))
-      && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
-      && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode
-      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode)
-      && subreg_lsb_1 (outermode, innermode, byte) == 0)
-    return simplify_gen_binary (ASHIFTRT, outermode,
-				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
-
-  /* Likewise (subreg:QI (lshiftrt:SI (zero_extend:SI (x:QI)) C), 0) into
-     to (lshiftrt:QI (x:QI) C), where C is a suitable small constant and
-     the outer subreg is effectively a truncation to the original mode.  */
-  if ((GET_CODE (op) == LSHIFTRT
-       || GET_CODE (op) == ASHIFTRT)
-      && SCALAR_INT_MODE_P (outermode)
-      && SCALAR_INT_MODE_P (innermode)
-      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
-      && CONST_INT_P (XEXP (op, 1))
-      && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
-      && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode
-      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode)
-      && subreg_lsb_1 (outermode, innermode, byte) == 0)
-    return simplify_gen_binary (LSHIFTRT, outermode,
-				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
-
-  /* Likewise (subreg:QI (ashift:SI (zero_extend:SI (x:QI)) C), 0) into
-     to (ashift:QI (x:QI) C), where C is a suitable small constant and
-     the outer subreg is effectively a truncation to the original mode.  */
-  if (GET_CODE (op) == ASHIFT
-      && SCALAR_INT_MODE_P (outermode)
+  if (SCALAR_INT_MODE_P (outermode)
       && SCALAR_INT_MODE_P (innermode)
       && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
-      && CONST_INT_P (XEXP (op, 1))
-      && (GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
-	  || GET_CODE (XEXP (op, 0)) == SIGN_EXTEND)
-      && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode
-      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode)
-      && subreg_lsb_1 (outermode, innermode, byte) == 0)
-    return simplify_gen_binary (ASHIFT, outermode,
-				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
-
-  /* Recognize a word extraction from a multi-word subreg.  */
-  if ((GET_CODE (op) == LSHIFTRT
-       || GET_CODE (op) == ASHIFTRT)
-      && SCALAR_INT_MODE_P (innermode)
-      && GET_MODE_PRECISION (outermode) >= BITS_PER_WORD
-      && GET_MODE_PRECISION (innermode) >= (2 * GET_MODE_PRECISION (outermode))
-      && CONST_INT_P (XEXP (op, 1))
-      && (INTVAL (XEXP (op, 1)) & (GET_MODE_PRECISION (outermode) - 1)) == 0
-      && INTVAL (XEXP (op, 1)) >= 0
-      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (innermode)
       && byte == subreg_lowpart_offset (outermode, innermode))
     {
-      int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
-      return simplify_gen_subreg (outermode, XEXP (op, 0), innermode,
-				  (WORDS_BIG_ENDIAN
-				   ? byte - shifted_bytes
-				   : byte + shifted_bytes));
-    }
-
-  /* If we have a lowpart SUBREG of a right shift of MEM, make a new MEM
-     and try replacing the SUBREG and shift with it.  Don't do this if
-     the MEM has a mode-dependent address or if we would be widening it.  */
-
-  if ((GET_CODE (op) == LSHIFTRT
-       || GET_CODE (op) == ASHIFTRT)
-      && SCALAR_INT_MODE_P (innermode)
-      && MEM_P (XEXP (op, 0))
-      && CONST_INT_P (XEXP (op, 1))
-      && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (GET_MODE (op))
-      && (INTVAL (XEXP (op, 1)) % GET_MODE_BITSIZE (outermode)) == 0
-      && INTVAL (XEXP (op, 1)) > 0
-      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode)
-      && ! mode_dependent_address_p (XEXP (XEXP (op, 0), 0),
-				     MEM_ADDR_SPACE (XEXP (op, 0)))
-      && ! MEM_VOLATILE_P (XEXP (op, 0))
-      && byte == subreg_lowpart_offset (outermode, innermode)
-      && (GET_MODE_SIZE (outermode) >= UNITS_PER_WORD
-	  || WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN))
-    {
-      int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
-      return adjust_address_nv (XEXP (op, 0), outermode,
-				(WORDS_BIG_ENDIAN
-				 ? byte - shifted_bytes
-				 : byte + shifted_bytes));
+      rtx tem = simplify_truncation (outermode, op, innermode);
+      if (tem)
+	return tem;
     }
 
   return NULL_RTX;
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2012-10-06 11:22:55.888609330 +0100
+++ gcc/config/mips/mips.c	2012-10-06 16:32:25.240799783 +0100
@@ -3527,6 +3527,17 @@ mips_set_reg_reg_cost (enum machine_mode
     }
 }
 
+/* Return the cost of an operand X that can be trucated for free.
+   SPEED says whether we're optimizing for size or speed.  */
+
+static int
+mips_truncated_op_cost (rtx x, bool speed)
+{
+  if (GET_CODE (x) == TRUNCATE)
+    x = XEXP (x, 0);
+  return set_src_cost (x, speed);
+}
+
 /* Implement TARGET_RTX_COSTS.  */
 
 static bool
@@ -3907,12 +3918,13 @@ mips_rtx_costs (rtx x, int code, int out
     case ZERO_EXTEND:
       if (outer_code == SET
 	  && ISA_HAS_BADDU
-	  && (GET_CODE (XEXP (x, 0)) == TRUNCATE
-	      || GET_CODE (XEXP (x, 0)) == SUBREG)
 	  && GET_MODE (XEXP (x, 0)) == QImode
-	  && GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS)
+	  && GET_CODE (XEXP (x, 0)) == PLUS)
 	{
-	  *total = set_src_cost (XEXP (XEXP (x, 0), 0), speed);
+	  rtx plus = XEXP (x, 0);
+	  *total = (COSTS_N_INSNS (1)
+		    + mips_truncated_op_cost (XEXP (plus, 0), speed)
+		    + mips_truncated_op_cost (XEXP (plus, 1), speed));
 	  return true;
 	}
       *total = mips_zero_extend_cost (mode, XEXP (x, 0));
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	2012-10-06 11:22:55.888609330 +0100
+++ gcc/config/mips/mips.md	2012-10-06 16:32:21.168800012 +0100
@@ -1305,9 +1305,8 @@ (define_insn "*baddu_si"
 (define_insn "*baddu_di<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=d")
         (zero_extend:GPR
-	 (truncate:QI
-	  (plus:DI (match_operand:DI 1 "register_operand" "d")
-		   (match_operand:DI 2 "register_operand" "d")))))]
+	 (plus:QI (truncate:QI (match_operand:DI 1 "register_operand" "d"))
+		  (truncate:QI (match_operand:DI 2 "register_operand" "d")))))]
   "ISA_HAS_BADDU && TARGET_64BIT"
   "baddu\\t%0,%1,%2"
   [(set_attr "alu_type" "add")])

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RFA: Simplifying truncation and integer lowpart subregs
  2012-10-07  8:56                           ` Richard Sandiford
@ 2012-10-07 12:36                             ` Eric Botcazou
  2012-11-28  2:27                             ` Ramana Radhakrishnan
  1 sibling, 0 replies; 23+ messages in thread
From: Eric Botcazou @ 2012-10-07 12:36 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, Andrew Pinski, Uros Bizjak, Paul_Koning, kkojima,
	aoliva, dje.gcc, uweigand, walt

> Yeah, in hindsight, the patch was definitely lacking commentary.
> How about the patch below?  It also fixes the partial int case
> and gets rid of the errant NOT hunk.  Tested in the same way as before.
> 
> Richard
> 
> 
> gcc/
> 	* machmode.h (GET_MODE_UNIT_PRECISION): New macro.
> 	* simplify-rtx.c (simplify_truncation): New function,
> 	extracted from simplify_subreg and (in small part) from
> 	simplify_unary_operation_1.
> 	(simplify_unary_operation_1) <TRUNCATE>: Use it.  Remove sign bit
> 	test for !TRULY_NOOP_TRUNCATION_MODES_P.
> 	(simplify_subreg): Use simplify_truncate for lowpart subregs
> 	where both the inner and outer modes are scalar integers.

This looks good to me, thanks.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RFA: Simplifying truncation and integer lowpart subregs
  2012-10-07  8:56                           ` Richard Sandiford
  2012-10-07 12:36                             ` Eric Botcazou
@ 2012-11-28  2:27                             ` Ramana Radhakrishnan
  2012-11-28 21:45                               ` Richard Sandiford
  1 sibling, 1 reply; 23+ messages in thread
From: Ramana Radhakrishnan @ 2012-11-28  2:27 UTC (permalink / raw)
  To: Eric Botcazou, gcc-patches, Andrew Pinski, Uros Bizjak,
	Paul_Koning, kkojima, aoliva, dje.gcc, uweigand, walt,
	rdsandiford

On Sun, Oct 7, 2012 at 8:56 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Eric Botcazou <ebotcazou@adacore.com> writes:
>>> I think modelling it as a TRUNCATE operation is correct for
>>> !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out).
>>> And we shouldn't generate an actual TRUNCATE rtx for
>>> TRULY_NOOP_TRUNCATION (the thing about making
>>> simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg
>>> for those targets).  I suppose:
>>>
>>>       /* We can't handle truncation to a partial integer mode here
>>>          because we don't know the real bitsize of the partial
>>>          integer mode.  */
>>>       if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
>>>         break;
>>>
>>> might be a problem though; we should still allow a subreg to be
>>> generated.  Is that what you were thinking of, or something else?
>>
>> I was thinking of the !TRULY_NOOP_TRUNCATION case, where the two operations
>> aren't equivalent.  Generating TRUNCATE in simplify_subreg seems suspicious to
>> me in this case but, if not doing it is the source of the bug, I guess I need
>> to do some homework on this TRULY_NOOP_TRUNCATION stuff. :-)
>>
>> Maybe add a blurb to the head comment of simplify_truncation, explaining that
>> it is valid to call the function both for TRUNCATEs and truncations to the
>> lowpart, and why it is correct to generate new TRUNCATEs in the latter case.
>
> Yeah, in hindsight, the patch was definitely lacking commentary.
> How about the patch below?  It also fixes the partial int case
> and gets rid of the errant NOT hunk.  Tested in the same way as before.
>
> Richard
>
>
> gcc/
>         * machmode.h (GET_MODE_UNIT_PRECISION): New macro.
>         * simplify-rtx.c (simplify_truncation): New function,
>         extracted from simplify_subreg and (in small part) from
>         simplify_unary_operation_1.
>         (simplify_unary_operation_1) <TRUNCATE>: Use it.  Remove sign bit
>         test for !TRULY_NOOP_TRUNCATION_MODES_P.
>         (simplify_subreg): Use simplify_truncate for lowpart subregs
>         where both the inner and outer modes are scalar integers.
>         * config/mips/mips.c (mips_truncated_op_cost): New function.
>         (mips_rtx_costs): Adjust test for BADDU.
>         * config/mips/mips.md (*baddu_di<mode>): Push truncates to operands.

This triggers PR55052 on ARM.I've attached the .i file and the dumps
to the bug report.


Thanks,
Ramana

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RFA: Simplifying truncation and integer lowpart subregs
  2012-11-28  2:27                             ` Ramana Radhakrishnan
@ 2012-11-28 21:45                               ` Richard Sandiford
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Sandiford @ 2012-11-28 21:45 UTC (permalink / raw)
  To: ramrad01
  Cc: Eric Botcazou, gcc-patches, Andrew Pinski, Uros Bizjak,
	Paul_Koning, kkojima, aoliva, dje.gcc, uweigand, walt

Ramana Radhakrishnan <ramana.gcc@googlemail.com> writes:
> On Sun, Oct 7, 2012 at 8:56 AM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> Eric Botcazou <ebotcazou@adacore.com> writes:
>>>> I think modelling it as a TRUNCATE operation is correct for
>>>> !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out).
>>>> And we shouldn't generate an actual TRUNCATE rtx for
>>>> TRULY_NOOP_TRUNCATION (the thing about making
>>>> simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg
>>>> for those targets).  I suppose:
>>>>
>>>>       /* We can't handle truncation to a partial integer mode here
>>>>          because we don't know the real bitsize of the partial
>>>>          integer mode.  */
>>>>       if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
>>>>         break;
>>>>
>>>> might be a problem though; we should still allow a subreg to be
>>>> generated.  Is that what you were thinking of, or something else?
>>>
>>> I was thinking of the !TRULY_NOOP_TRUNCATION case, where the two operations
>>> aren't equivalent.  Generating TRUNCATE in simplify_subreg seems
>>> suspicious to
>>> me in this case but, if not doing it is the source of the bug, I guess I need
>>> to do some homework on this TRULY_NOOP_TRUNCATION stuff. :-)
>>>
>>> Maybe add a blurb to the head comment of simplify_truncation, explaining that
>>> it is valid to call the function both for TRUNCATEs and truncations to the
>>> lowpart, and why it is correct to generate new TRUNCATEs in the latter case.
>>
>> Yeah, in hindsight, the patch was definitely lacking commentary.
>> How about the patch below?  It also fixes the partial int case
>> and gets rid of the errant NOT hunk.  Tested in the same way as before.
>>
>> Richard
>>
>>
>> gcc/
>>         * machmode.h (GET_MODE_UNIT_PRECISION): New macro.
>>         * simplify-rtx.c (simplify_truncation): New function,
>>         extracted from simplify_subreg and (in small part) from
>>         simplify_unary_operation_1.
>>         (simplify_unary_operation_1) <TRUNCATE>: Use it.  Remove sign bit
>>         test for !TRULY_NOOP_TRUNCATION_MODES_P.
>>         (simplify_subreg): Use simplify_truncate for lowpart subregs
>>         where both the inner and outer modes are scalar integers.
>>         * config/mips/mips.c (mips_truncated_op_cost): New function.
>>         (mips_rtx_costs): Adjust test for BADDU.
>>         * config/mips/mips.md (*baddu_di<mode>): Push truncates to operands.
>
> This triggers PR55052 on ARM.I've attached the .i file and the dumps
> to the bug report.

Thanks.  I'd managed to drop a SCALAR_INT_MODE_P check when splitting
the ZERO_EXTEND handling into two.

This patch reinstates the check.  Tested on x86_64-linux-gnu and applied
as obvious.

Richard


gcc/
	PR rtl-optimization/55052
	* simplify-rtx.c (simplify_subreg): Restore SCALAR_INT_MODE_P check.

Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2012-11-27 18:52:29.000000000 +0000
+++ gcc/simplify-rtx.c	2012-11-28 19:54:30.500525576 +0000
@@ -5875,7 +5875,7 @@ simplify_subreg (enum machine_mode outer
 
   /* A SUBREG resulting from a zero extension may fold to zero if
      it extracts higher bits that the ZERO_EXTEND's source bits.  */
-  if (GET_CODE (op) == ZERO_EXTEND)
+  if (GET_CODE (op) == ZERO_EXTEND && SCALAR_INT_MODE_P (innermode))
     {
       unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte);
       if (bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0)))

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2012-11-28 21:45 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-25  8:04 [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Uros Bizjak
2012-09-26 18:17 ` Richard Sandiford
2012-09-26 21:38   ` Eric Botcazou
2012-09-27 14:25     ` Uros Bizjak
2012-09-27 16:10       ` Richard Sandiford
2012-09-27 18:20         ` [PATCH v2, " Uros Bizjak
2012-09-27 18:35           ` Paul_Koning
2012-09-27 19:21             ` Uros Bizjak
2012-10-02  2:13               ` Andrew Pinski
2012-10-02 19:32                 ` Richard Sandiford
2012-10-06 10:22                   ` RFA: Simplifying truncation and integer lowpart subregs Richard Sandiford
2012-10-06 11:13                     ` Eric Botcazou
2012-10-06 12:39                       ` Richard Sandiford
2012-10-06 13:05                         ` Eric Botcazou
2012-10-07  8:56                           ` Richard Sandiford
2012-10-07 12:36                             ` Eric Botcazou
2012-11-28  2:27                             ` Ramana Radhakrishnan
2012-11-28 21:45                               ` Richard Sandiford
2012-09-27 20:33           ` [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Jakub Jelinek
2012-09-27 22:37             ` Richard Sandiford
2012-09-28 15:39             ` Uros Bizjak
2012-09-30 11:40               ` Richard Sandiford
2012-10-03 11:07                 ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).