RFC: allowing fwprop to propagate subregs

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* RFC: allowing fwprop to propagate subregs
@ 2011-09-14 15:40 Richard Sandiford
  2011-09-14 15:45 ` H.J. Lu
  2012-01-11 16:55 ` Ulrich Weigand
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Sandiford @ 2011-09-14 15:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: bonzini

At the moment, fwprop will propagate constants and registers
even if no further rtl simplifications are possible:

  if (REG_P (new_rtx) || CONSTANT_P (new_rtx))
    flags |= PR_CAN_APPEAR;

What do you think about extending this to subregs?  The reason for
asking is that on NEON, vector loads like vld4 are represented as a load
of a single monolithic register followed by subreg extractions of each
vector:

  (set (reg:OI FULL) (...))
  (set (reg:V2SI V0) (subreg:V2SI (reg:OI FULL) 0))
  (set (reg:V2SI V1) (subreg:V2SI (reg:OI FULL) 16))
  (set (reg:V2SI V2) (subreg:V2SI (reg:OI FULL) 32))
  (set (reg:V2SI V3) (subreg:V2SI (reg:OI FULL) 48))

Nothing ever propagates these subregs, so the separate moves
survive until IRA.  This has three problems:

  - We generally want the registers allocated to V0...V3 to be the same
    as FULL, so that the four subreg moves become nops.  And this often
    happens in simple examples.  But if register pressure is relatively
    high, these moves can sometimes cause IRA to spill in cases where
    it doesn't if the subregs are used instead of each Vi.

  - Perhaps related, register pressure becomes harder to estimate.

  - These moves can interfere with pre-reload scheduling.

In combination with the MODES_TIEABLE_P patch that I posted here:

    http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00626.html

this patch significantly improves the code generated for several libav
loops.  Unfortunately, I don't have a setup that can do meaningful
x86_64 performance measurements, but a diff of the before and after
output for libav showed many cases where the patch removed moves.

What do you think?  Alternatives include propagating in lower-subreg,
or maybe only in the second fwprop pass.

Richard

gcc/
	* fwprop.c (propagate_rtx): Also set PR_CAN_APPEAR for subregs.

Index: gcc/fwprop.c
===================================================================
--- gcc/fwprop.c	2011-08-26 09:58:28.829540497 +0100
+++ gcc/fwprop.c	2011-08-26 10:14:03.767707504 +0100
@@ -664,7 +664,7 @@ propagate_rtx (rtx x, enum machine_mode 
     return NULL_RTX;

   flags = 0;
-  if (REG_P (new_rtx) || CONSTANT_P (new_rtx))
+  if (REG_P (new_rtx) || CONSTANT_P (new_rtx) || GET_CODE (new_rtx) == SUBREG)
     flags |= PR_CAN_APPEAR;
   if (!for_each_rtx (&new_rtx, varying_mem_p, NULL))
     flags |= PR_HANDLE_MEM;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RFC: allowing fwprop to propagate subregs
  2011-09-14 15:40 RFC: allowing fwprop to propagate subregs Richard Sandiford
@ 2011-09-14 15:45 ` H.J. Lu
  2011-09-14 16:09   ` Richard Sandiford
  2012-01-11 16:55 ` Ulrich Weigand
  1 sibling, 1 reply; 14+ messages in thread
From: H.J. Lu @ 2011-09-14 15:45 UTC (permalink / raw)
  To: gcc-patches, bonzini, rdsandiford

On Wed, Sep 14, 2011 at 8:24 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> At the moment, fwprop will propagate constants and registers
> even if no further rtl simplifications are possible:
>
>  if (REG_P (new_rtx) || CONSTANT_P (new_rtx))
>    flags |= PR_CAN_APPEAR;
>
> What do you think about extending this to subregs?  The reason for
> asking is that on NEON, vector loads like vld4 are represented as a load
> of a single monolithic register followed by subreg extractions of each
> vector:
>
>  (set (reg:OI FULL) (...))
>  (set (reg:V2SI V0) (subreg:V2SI (reg:OI FULL) 0))
>  (set (reg:V2SI V1) (subreg:V2SI (reg:OI FULL) 16))
>  (set (reg:V2SI V2) (subreg:V2SI (reg:OI FULL) 32))
>  (set (reg:V2SI V3) (subreg:V2SI (reg:OI FULL) 48))
>
> Nothing ever propagates these subregs, so the separate moves
> survive until IRA.  This has three problems:
>
>  - We generally want the registers allocated to V0...V3 to be the same
>    as FULL, so that the four subreg moves become nops.  And this often
>    happens in simple examples.  But if register pressure is relatively
>    high, these moves can sometimes cause IRA to spill in cases where
>    it doesn't if the subregs are used instead of each Vi.
>
>  - Perhaps related, register pressure becomes harder to estimate.
>
>  - These moves can interfere with pre-reload scheduling.
>
> In combination with the MODES_TIEABLE_P patch that I posted here:
>
>    http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00626.html
>
> this patch significantly improves the code generated for several libav
> loops.  Unfortunately, I don't have a setup that can do meaningful
> x86_64 performance measurements, but a diff of the before and after
> output for libav showed many cases where the patch removed moves.
>
> What do you think?  Alternatives include propagating in lower-subreg,
> or maybe only in the second fwprop pass.
>
> Richard
>
>
> gcc/
>        * fwprop.c (propagate_rtx): Also set PR_CAN_APPEAR for subregs.
>
> Index: gcc/fwprop.c
> ===================================================================
> --- gcc/fwprop.c        2011-08-26 09:58:28.829540497 +0100
> +++ gcc/fwprop.c        2011-08-26 10:14:03.767707504 +0100
> @@ -664,7 +664,7 @@ propagate_rtx (rtx x, enum machine_mode
>     return NULL_RTX;
>
>   flags = 0;
> -  if (REG_P (new_rtx) || CONSTANT_P (new_rtx))
> +  if (REG_P (new_rtx) || CONSTANT_P (new_rtx) || GET_CODE (new_rtx) == SUBREG)
>     flags |= PR_CAN_APPEAR;
>   if (!for_each_rtx (&new_rtx, varying_mem_p, NULL))
>     flags |= PR_HANDLE_MEM;
>

A SUBREG may not be REG nor CONSTANT. Don't you need
to check REG_P/CONSTANT_P on SUBREG?


-- 
H.J.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RFC: allowing fwprop to propagate subregs
  2011-09-14 15:45 ` H.J. Lu
@ 2011-09-14 16:09   ` Richard Sandiford
  2011-09-14 18:28     ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Sandiford @ 2011-09-14 16:09 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, bonzini

"H.J. Lu" <hjl.tools@gmail.com> writes:
> On Wed, Sep 14, 2011 at 8:24 AM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> At the moment, fwprop will propagate constants and registers
>> even if no further rtl simplifications are possible:
>>
>>  if (REG_P (new_rtx) || CONSTANT_P (new_rtx))
>>    flags |= PR_CAN_APPEAR;
>>
>> What do you think about extending this to subregs?  The reason for
>> asking is that on NEON, vector loads like vld4 are represented as a load
>> of a single monolithic register followed by subreg extractions of each
>> vector:
>>
>>  (set (reg:OI FULL) (...))
>>  (set (reg:V2SI V0) (subreg:V2SI (reg:OI FULL) 0))
>>  (set (reg:V2SI V1) (subreg:V2SI (reg:OI FULL) 16))
>>  (set (reg:V2SI V2) (subreg:V2SI (reg:OI FULL) 32))
>>  (set (reg:V2SI V3) (subreg:V2SI (reg:OI FULL) 48))
>>
>> Nothing ever propagates these subregs, so the separate moves
>> survive until IRA.  This has three problems:
>>
>>  - We generally want the registers allocated to V0...V3 to be the same
>>    as FULL, so that the four subreg moves become nops.  And this often
>>    happens in simple examples.  But if register pressure is relatively
>>    high, these moves can sometimes cause IRA to spill in cases where
>>    it doesn't if the subregs are used instead of each Vi.
>>
>>  - Perhaps related, register pressure becomes harder to estimate.
>>
>>  - These moves can interfere with pre-reload scheduling.
>>
>> In combination with the MODES_TIEABLE_P patch that I posted here:
>>
>>    http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00626.html
>>
>> this patch significantly improves the code generated for several libav
>> loops.  Unfortunately, I don't have a setup that can do meaningful
>> x86_64 performance measurements, but a diff of the before and after
>> output for libav showed many cases where the patch removed moves.
>>
>> What do you think?  Alternatives include propagating in lower-subreg,
>> or maybe only in the second fwprop pass.
>>
>> Richard
>>
>>
>> gcc/
>>        * fwprop.c (propagate_rtx): Also set PR_CAN_APPEAR for subregs.
>>
>> Index: gcc/fwprop.c
>> ===================================================================
>> --- gcc/fwprop.c        2011-08-26 09:58:28.829540497 +0100
>> +++ gcc/fwprop.c        2011-08-26 10:14:03.767707504 +0100
>> @@ -664,7 +664,7 @@ propagate_rtx (rtx x, enum machine_mode
>>     return NULL_RTX;
>>
>>   flags = 0;
>> -  if (REG_P (new_rtx) || CONSTANT_P (new_rtx))
>> +  if (REG_P (new_rtx) || CONSTANT_P (new_rtx) || GET_CODE (new_rtx) == SUBREG)
>>     flags |= PR_CAN_APPEAR;
>>   if (!for_each_rtx (&new_rtx, varying_mem_p, NULL))
>>     flags |= PR_HANDLE_MEM;
>>
>
> A SUBREG may not be REG nor CONSTANT. Don't you need
> to check REG_P/CONSTANT_P on SUBREG?

Yeah, good point.  There should be a "&& REG_P (SUBREG_REG (new_rtx))"
in there.  Probably also worth checking for non-paradoxical subregs.

Richard

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RFC: allowing fwprop to propagate subregs
  2011-09-14 16:09   ` Richard Sandiford
@ 2011-09-14 18:28     ` Paolo Bonzini
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2011-09-14 18:28 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches, richard.sandiford

On 09/14/2011 05:44 PM, Richard Sandiford wrote:
>> >  A SUBREG may not be REG nor CONSTANT. Don't you need
>> >  to check REG_P/CONSTANT_P on SUBREG?
>
> Yeah, good point.  There should be a "&&  REG_P (SUBREG_REG (new_rtx))"
> in there.  Probably also worth checking for non-paradoxical subregs.

I was going to suggest the former.

Paradoxical subregs are already handled elsewhere in fwprop, but it also 
shouldn't hurt to include them.

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RFC: allowing fwprop to propagate subregs
  2011-09-14 15:40 RFC: allowing fwprop to propagate subregs Richard Sandiford
  2011-09-14 15:45 ` H.J. Lu
@ 2012-01-11 16:55 ` Ulrich Weigand
  2012-01-11 19:12   ` Paolo Bonzini
  2012-01-12  9:57   ` Richard Sandiford
  1 sibling, 2 replies; 14+ messages in thread
From: Ulrich Weigand @ 2012-01-11 16:55 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, bonzini

Richard Sandiford wrote:

> At the moment, fwprop will propagate constants and registers
> even if no further rtl simplifications are possible:
> 
>   if (REG_P (new_rtx) || CONSTANT_P (new_rtx))
>     flags |= PR_CAN_APPEAR;
> 
> What do you think about extending this to subregs?

I've been testing your patch (in its latest version including
the SUBREG test pointed out by H.J.), and ran into issues where
this additional propagation suppresses other optimizations.

In particular, I'm seeing this testsuite regression on x86_64:
FAIL: gcc.target/i386/pr30315.c scan-assembler-times cmp 4

The problem is that the combined "compute result and set
condition code" insn patterns sometimes no longer match.

The source code in question looks like:

int c;

unsigned char
decccc (unsigned char a, unsigned char b)
{
  unsigned char difference = a - b;
  if (difference > a)
    c--;
  return difference;
}


Before your patch, we generate insns like:

(insn 3 4 5 2 (set (reg/v:QI 63 [ a ])
        (subreg:QI (reg:SI 64 [ a ]) 0)) pr30315.i:5 66 {*movqi_internal}
     (expr_list:REG_DEAD (reg:SI 64 [ a ])
        (nil)))

(insn 5 3 6 2 (set (reg/v:QI 65 [ b ])
        (subreg:QI (reg:SI 66 [ b ]) 0)) pr30315.i:5 66 {*movqi_internal}
     (expr_list:REG_DEAD (reg:SI 66 [ b ])
        (nil)))

(insn 9 6 10 2 (parallel [
            (set (reg/v:QI 59 [ difference ])
                (minus:QI (reg/v:QI 63 [ a ])
                    (reg/v:QI 65 [ b ])))
            (clobber (reg:CC 17 flags))
        ]) pr30315.i:6 287 {*subqi_1}
     (expr_list:REG_DEAD (reg/v:QI 65 [ b ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

(insn 10 9 11 2 (set (reg:CC 17 flags)
        (compare:CC (reg/v:QI 63 [ a ])
            (reg/v:QI 59 [ difference ]))) pr30315.i:7 4 {*cmpqi_1}
     (expr_list:REG_DEAD (reg/v:QI 63 [ a ])
        (nil)))

which get combined into:

(insn 4 2 3 2 (set (reg:SI 66 [ b ])
        (reg:SI 4 si [ b ])) pr30315.i:5 64 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 4 si [ b ])
        (nil)))

(insn 3 4 5 2 (set (reg/v:QI 63 [ a ])
        (subreg:QI (reg:SI 64 [ a ]) 0)) pr30315.i:5 66 {*movqi_internal}
     (expr_list:REG_DEAD (reg:SI 64 [ a ])
        (nil)))

(insn 10 9 11 2 (parallel [
            (set (reg:CCC 17 flags)
                (compare:CCC (minus:QI (reg/v:QI 63 [ a ])
                        (subreg:QI (reg:SI 66 [ b ]) 0))
                    (reg/v:QI 63 [ a ])))
            (set (reg/v:QI 59 [ difference ])
                (minus:QI (reg/v:QI 63 [ a ])
                    (subreg:QI (reg:SI 66 [ b ]) 0)))
        ]) pr30315.i:7 322 {*subqi3_cc_overflow}
     (expr_list:REG_DEAD (reg:SI 66 [ b ])
        (expr_list:REG_DEAD (reg/v:QI 63 [ a ])
            (nil))))


Now, with your patch we have instead before combine:

(insn 9 6 10 2 (parallel [
            (set (reg/v:QI 59 [ difference ])
                (minus:QI (subreg:QI (reg:SI 64 [ a ]) 0)
                    (subreg:QI (reg:SI 66 [ b ]) 0)))
            (clobber (reg:CC 17 flags))
        ]) pr30315.i:6 287 {*subqi_1}
     (expr_list:REG_DEAD (reg:SI 66 [ b ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

(insn 10 9 11 2 (set (reg:CC 17 flags)
        (compare:CC (subreg:QI (reg:SI 64 [ a ]) 0)
            (reg/v:QI 59 [ difference ]))) pr30315.i:7 4 {*cmpqi_1}
     (expr_list:REG_DEAD (reg:SI 64 [ a ])
        (nil)))

which combine is unfortunately unable to process:

Trying 9 -> 10:
Failed to match this instruction:
(parallel [
        (set (reg:CC 17 flags)
            (compare:CC (subreg:QI (minus:SI (reg:SI 64 [ a ])
                        (reg:SI 66 [ b ])) 0)
                (subreg:QI (reg:SI 64 [ a ]) 0)))
        (set (reg/v:QI 59 [ difference ])
            (minus:QI (subreg:QI (reg:SI 64 [ a ]) 0)
                (subreg:QI (reg:SI 66 [ b ]) 0)))
    ])


The problem appears to be that an RTX expression like

    (minus:QI (subreg:QI (reg:SI 64 [ a ]) 0)
              (subreg:QI (reg:SI 66 [ b ]) 0))

seems to be considered non-canonical by combine, and is instead
transformed into

    (subreg:QI (minus:SI (reg:SI 64 [ a ])
                         (reg:SI 66 [ b ])) 0)

This happens via combine.c:apply_distributive_law.


On the one hand, this seems odd, as SUBREGs with a non-trivial
expression inside will usually not match any insn pattern.  On
the other hand, I guess this might still be useful behaviour
for combine on platforms that support only word arithmetic,
when the SUBREG might be considered a "split" point.  (Of course,
this doesn't work for the compute-result-and-cc type patterns.)


In either case, it is always odd to have RTX in the insn stream
that isn't "stable" under common simplication ...   Do you have
any suggestions on how to fix this?  If we add the fwprop patch,
should we then disable apply_distributive_law for SUBREGs?


Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RFC: allowing fwprop to propagate subregs
  2012-01-11 16:55 ` Ulrich Weigand
@ 2012-01-11 19:12   ` Paolo Bonzini
  2012-01-12  9:57   ` Richard Sandiford
  1 sibling, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2012-01-11 19:12 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 628 bytes --]

On 01/11/2012 05:55 PM, Ulrich Weigand wrote:
> In either case, it is always odd to have RTX in the insn stream
> that isn't "stable" under common simplication ...   Do you have
> any suggestions on how to fix this?  If we add the fwprop patch,
> should we then disable apply_distributive_law for SUBREGs?

This strange canonicalization is also the cause of PR39726, where I also 
wanted to move subregs inside arithmetic operations.  See the attached 
patch (which is not enough to fix the PR; this is just the combine part).

Unfortunately, doing this also caused x86_64 regressions, I think in 
zero_extract patterns.

Paolo

[-- Attachment #2: subreg-canon.patch --]
[-- Type: text/x-patch, Size: 5830 bytes --]

change canonicalization

Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(branch pr39726)
+++ gcc/Makefile.in	(working copy)
@@ -2844,7 +2844,7 @@ jump.o : jump.c $(CONFIG_H) $(SYSTEM_H) 
 simplify-rtx.o : simplify-rtx.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(RTL_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) $(REAL_H) insn-config.h \
    $(RECOG_H) $(EXPR_H) $(TOPLEV_H) output.h $(FUNCTION_H) $(GGC_H) $(TM_P_H) \
-   $(TREE_H) $(TARGET_H)
+   $(TREE_H) $(TARGET_H) $(OPTABS_H)
 cgraph.o : cgraph.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
    langhooks.h $(TOPLEV_H) $(FLAGS_H) $(GGC_H) $(TARGET_H) $(CGRAPH_H) \
    gt-cgraph.h output.h intl.h $(BASIC_BLOCK_H) debug.h $(HASHTAB_H) \
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	(branch pr39726)
+++ gcc/simplify-rtx.c	(working copy)
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  
 #include "output.h"
 #include "ggc.h"
 #include "target.h"
+#include "optabs.h"
 
 /* Simplification and canonicalization of RTL.  */
 
@@ -5191,6 +5192,8 @@ rtx
 simplify_subreg (enum machine_mode outermode, rtx op,
 		 enum machine_mode innermode, unsigned int byte)
 {
+  int is_lowpart;
+
   /* Little bit of sanity checking.  */
   gcc_assert (innermode != VOIDmode);
   gcc_assert (outermode != VOIDmode);
@@ -5300,11 +5303,13 @@ simplify_subreg (enum machine_mode outer
       return NULL_RTX;
     }
 
+  is_lowpart = subreg_lowpart_offset (outermode, innermode) == byte;
+
   /* Merge implicit and explicit truncations.  */
 
   if (GET_CODE (op) == TRUNCATE
       && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode)
-      && subreg_lowpart_offset (outermode, innermode) == byte)
+      && is_lowpart)
     return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0),
 			       GET_MODE (XEXP (op, 0)));
 
@@ -5343,7 +5348,7 @@ simplify_subreg (enum machine_mode outer
 	     The information is used only by alias analysis that can not
 	     grog partial register anyway.  */
 
-	  if (subreg_lowpart_offset (outermode, innermode) == byte)
+	  if (is_lowpart)
 	    ORIGINAL_REGNO (x) = ORIGINAL_REGNO (op);
 	  return x;
 	}
@@ -5393,6 +5398,51 @@ simplify_subreg (enum machine_mode outer
       return NULL_RTX;
     }
 
+  /* Try to move a subreg inside an arithmetic operation.  */
+  if (is_lowpart && ARITHMETIC_P (op)
+      && GET_MODE_CLASS (outermode) == MODE_INT
+      && GET_MODE_CLASS (innermode) == MODE_INT)
+    {
+      enum insn_code ic;
+      enum machine_mode cnt_mode;
+      switch (GET_CODE (op))
+	{
+	case ABS:
+	case NEG:
+	  return simplify_gen_unary (GET_CODE (op), outermode,
+				     rtl_hooks.gen_lowpart_no_emit
+				      (outermode, XEXP (op, 0)),
+				     outermode);
+
+	case ASHIFT:
+	  /* i386 always uses QImode for the shift count.  Get the
+	     appropriate mode from the optab.  */
+	  ic = ashl_optab->handlers[outermode].insn_code;
+	  if (ic != CODE_FOR_nothing)
+	    cnt_mode = insn_data[ic].operand[2].mode;
+	  else
+	    cnt_mode = outermode;
+	  return simplify_gen_binary (GET_CODE (op), outermode,
+				      rtl_hooks.gen_lowpart_no_emit
+				       (outermode, XEXP (op, 0)),
+				      rtl_hooks.gen_lowpart_no_emit
+				       (cnt_mode, XEXP (op, 1)));
+
+	case PLUS:
+	case MINUS:
+	case AND:
+	case IOR:
+	case XOR:
+	  return simplify_gen_binary (GET_CODE (op), outermode,
+				      rtl_hooks.gen_lowpart_no_emit
+				       (outermode, XEXP (op, 0)),
+				      rtl_hooks.gen_lowpart_no_emit
+				       (outermode, XEXP (op, 1)));
+	default:
+	  break;
+	}
+    }
+
   /* Optimize SUBREG truncations of zero and sign extended values.  */
   if ((GET_CODE (op) == ZERO_EXTEND
        || GET_CODE (op) == SIGN_EXTEND)
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(branch pr39726)
+++ gcc/combine.c	(working copy)
@@ -11155,47 +11155,6 @@ simplify_comparison (enum rtx_code code,
 	      continue;
 	    }
 
-	  /* If this is (and:M1 (subreg:M2 X 0) (const_int C1)) where C1
-	     fits in both M1 and M2 and the SUBREG is either paradoxical
-	     or represents the low part, permute the SUBREG and the AND
-	     and try again.  */
-	  if (GET_CODE (XEXP (op0, 0)) == SUBREG)
-	    {
-	      unsigned HOST_WIDE_INT c1;
-	      tmode = GET_MODE (SUBREG_REG (XEXP (op0, 0)));
-	      /* Require an integral mode, to avoid creating something like
-		 (AND:SF ...).  */
-	      if (SCALAR_INT_MODE_P (tmode)
-		  /* It is unsafe to commute the AND into the SUBREG if the
-		     SUBREG is paradoxical and WORD_REGISTER_OPERATIONS is
-		     not defined.  As originally written the upper bits
-		     have a defined value due to the AND operation.
-		     However, if we commute the AND inside the SUBREG then
-		     they no longer have defined values and the meaning of
-		     the code has been changed.  */
-		  && (0
-#ifdef WORD_REGISTER_OPERATIONS
-		      || (mode_width > GET_MODE_BITSIZE (tmode)
-			  && mode_width <= BITS_PER_WORD)
-#endif
-		      || (mode_width <= GET_MODE_BITSIZE (tmode)
-			  && subreg_lowpart_p (XEXP (op0, 0))))
-		  && CONST_INT_P (XEXP (op0, 1))
-		  && mode_width <= HOST_BITS_PER_WIDE_INT
-		  && GET_MODE_BITSIZE (tmode) <= HOST_BITS_PER_WIDE_INT
-		  && ((c1 = INTVAL (XEXP (op0, 1))) & ~mask) == 0
-		  && (c1 & ~GET_MODE_MASK (tmode)) == 0
-		  && c1 != mask
-		  && c1 != GET_MODE_MASK (tmode))
-		{
-		  op0 = simplify_gen_binary (AND, tmode,
-					     SUBREG_REG (XEXP (op0, 0)),
-					     gen_int_mode (c1, tmode));
-		  op0 = gen_lowpart (mode, op0);
-		  continue;
-		}
-	    }
-
 	  /* Convert (ne (and (not X) 1) 0) to (eq (and X 1) 0).  */
 	  if (const_op == 0 && equality_comparison_p
 	      && XEXP (op0, 1) == const1_rtx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RFC: allowing fwprop to propagate subregs
  2012-01-11 16:55 ` Ulrich Weigand
  2012-01-11 19:12   ` Paolo Bonzini
@ 2012-01-12  9:57   ` Richard Sandiford
  2012-01-12 11:56     ` Richard Kenner
  1 sibling, 1 reply; 14+ messages in thread
From: Richard Sandiford @ 2012-01-12  9:57 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc-patches, bonzini, kenner

[Richard, combine question at the bottom for you.  I've quoted Ulrich's
 whole message in order to provide a bit of context.]

"Ulrich Weigand" <uweigand@de.ibm.com> writes:
> Richard Sandiford wrote:
>> At the moment, fwprop will propagate constants and registers
>> even if no further rtl simplifications are possible:
>> 
>>   if (REG_P (new_rtx) || CONSTANT_P (new_rtx))
>>     flags |= PR_CAN_APPEAR;
>> 
>> What do you think about extending this to subregs?
>
> I've been testing your patch (in its latest version including
> the SUBREG test pointed out by H.J.), and ran into issues where
> this additional propagation suppresses other optimizations.
>
> In particular, I'm seeing this testsuite regression on x86_64:
> FAIL: gcc.target/i386/pr30315.c scan-assembler-times cmp 4
>
> The problem is that the combined "compute result and set
> condition code" insn patterns sometimes no longer match.
>
> The source code in question looks like:
>
> int c;
>
> unsigned char
> decccc (unsigned char a, unsigned char b)
> {
>   unsigned char difference = a - b;
>   if (difference > a)
>     c--;
>   return difference;
> }
>
>
> Before your patch, we generate insns like:
>
> (insn 3 4 5 2 (set (reg/v:QI 63 [ a ])
>         (subreg:QI (reg:SI 64 [ a ]) 0)) pr30315.i:5 66 {*movqi_internal}
>      (expr_list:REG_DEAD (reg:SI 64 [ a ])
>         (nil)))
>
> (insn 5 3 6 2 (set (reg/v:QI 65 [ b ])
>         (subreg:QI (reg:SI 66 [ b ]) 0)) pr30315.i:5 66 {*movqi_internal}
>      (expr_list:REG_DEAD (reg:SI 66 [ b ])
>         (nil)))
>
> (insn 9 6 10 2 (parallel [
>             (set (reg/v:QI 59 [ difference ])
>                 (minus:QI (reg/v:QI 63 [ a ])
>                     (reg/v:QI 65 [ b ])))
>             (clobber (reg:CC 17 flags))
>         ]) pr30315.i:6 287 {*subqi_1}
>      (expr_list:REG_DEAD (reg/v:QI 65 [ b ])
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
>
> (insn 10 9 11 2 (set (reg:CC 17 flags)
>         (compare:CC (reg/v:QI 63 [ a ])
>             (reg/v:QI 59 [ difference ]))) pr30315.i:7 4 {*cmpqi_1}
>      (expr_list:REG_DEAD (reg/v:QI 63 [ a ])
>         (nil)))
>
> which get combined into:
>
> (insn 4 2 3 2 (set (reg:SI 66 [ b ])
>         (reg:SI 4 si [ b ])) pr30315.i:5 64 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 4 si [ b ])
>         (nil)))
>
> (insn 3 4 5 2 (set (reg/v:QI 63 [ a ])
>         (subreg:QI (reg:SI 64 [ a ]) 0)) pr30315.i:5 66 {*movqi_internal}
>      (expr_list:REG_DEAD (reg:SI 64 [ a ])
>         (nil)))
>
> (insn 10 9 11 2 (parallel [
>             (set (reg:CCC 17 flags)
>                 (compare:CCC (minus:QI (reg/v:QI 63 [ a ])
>                         (subreg:QI (reg:SI 66 [ b ]) 0))
>                     (reg/v:QI 63 [ a ])))
>             (set (reg/v:QI 59 [ difference ])
>                 (minus:QI (reg/v:QI 63 [ a ])
>                     (subreg:QI (reg:SI 66 [ b ]) 0)))
>         ]) pr30315.i:7 322 {*subqi3_cc_overflow}
>      (expr_list:REG_DEAD (reg:SI 66 [ b ])
>         (expr_list:REG_DEAD (reg/v:QI 63 [ a ])
>             (nil))))
>
>
> Now, with your patch we have instead before combine:
>
> (insn 9 6 10 2 (parallel [
>             (set (reg/v:QI 59 [ difference ])
>                 (minus:QI (subreg:QI (reg:SI 64 [ a ]) 0)
>                     (subreg:QI (reg:SI 66 [ b ]) 0)))
>             (clobber (reg:CC 17 flags))
>         ]) pr30315.i:6 287 {*subqi_1}
>      (expr_list:REG_DEAD (reg:SI 66 [ b ])
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
>
> (insn 10 9 11 2 (set (reg:CC 17 flags)
>         (compare:CC (subreg:QI (reg:SI 64 [ a ]) 0)
>             (reg/v:QI 59 [ difference ]))) pr30315.i:7 4 {*cmpqi_1}
>      (expr_list:REG_DEAD (reg:SI 64 [ a ])
>         (nil)))
>
> which combine is unfortunately unable to process:
>
> Trying 9 -> 10:
> Failed to match this instruction:
> (parallel [
>         (set (reg:CC 17 flags)
>             (compare:CC (subreg:QI (minus:SI (reg:SI 64 [ a ])
>                         (reg:SI 66 [ b ])) 0)
>                 (subreg:QI (reg:SI 64 [ a ]) 0)))
>         (set (reg/v:QI 59 [ difference ])
>             (minus:QI (subreg:QI (reg:SI 64 [ a ]) 0)
>                 (subreg:QI (reg:SI 66 [ b ]) 0)))
>     ])
>
>
> The problem appears to be that an RTX expression like
>
>     (minus:QI (subreg:QI (reg:SI 64 [ a ]) 0)
>               (subreg:QI (reg:SI 66 [ b ]) 0))
>
> seems to be considered non-canonical by combine, and is instead
> transformed into
>
>     (subreg:QI (minus:SI (reg:SI 64 [ a ])
>                          (reg:SI 66 [ b ])) 0)
>
> This happens via combine.c:apply_distributive_law.
>
>
> On the one hand, this seems odd, as SUBREGs with a non-trivial
> expression inside will usually not match any insn pattern.  On
> the other hand, I guess this might still be useful behaviour
> for combine on platforms that support only word arithmetic,
> when the SUBREG might be considered a "split" point.  (Of course,
> this doesn't work for the compute-result-and-cc type patterns.)
>
>
> In either case, it is always odd to have RTX in the insn stream
> that isn't "stable" under common simplication ...

Well, I suppose I see "common simplification" as what simplify-rtx.c does.
The problem comes at times like this when combine has its own ideas.

It looks like Paolo was thinking of making simplify-rtx.c go the other way
(which sounds like a good idea on the face of it).

> Do you have any suggestions on how to fix this?  If we add the fwprop
> patch, should we then disable apply_distributive_law for SUBREGs?

No immediate suggestions, sorry.  It looks like the combine case was
added by this pre-egcs patch:

Wed Mar 18 05:54:25 1998  Richard Kenner  <kenner@vlsi1.ultra.nyu.edu>

	* combine.c (gen_binary): Don't make AND that does nothing.
	(simplify_comparison, case AND): Commute AND and SUBREG.

so maybe Richard remembers (or has a record of) what this was designed
to handle?

Richard

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RFC: allowing fwprop to propagate subregs
  2012-01-12  9:57   ` Richard Sandiford
@ 2012-01-12 11:56     ` Richard Kenner
  2012-01-16 14:21       ` Ulrich Weigand
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Kenner @ 2012-01-12 11:56 UTC (permalink / raw)
  To: richard.sandiford; +Cc: bonzini, gcc-patches, uweigand

> [Richard, combine question at the bottom for you.  I've quoted Ulrich's
>  whole message in order to provide a bit of context.]

I don't remember ALL of this, but can perhaps make a few hints.

> > The problem appears to be that an RTX expression like
> >
> >     (minus:QI (subreg:QI (reg:SI 64 [ a ]) 0)
> >               (subreg:QI (reg:SI 66 [ b ]) 0))
> >
> > seems to be considered non-canonical by combine, and is instead
> > transformed into
> >
> >     (subreg:QI (minus:SI (reg:SI 64 [ a ])
> >                          (reg:SI 66 [ b ])) 0)
> > On the one hand, this seems odd, as SUBREGs with a non-trivial
> > expression inside will usually not match any insn pattern.  On
> > the other hand, I guess this might still be useful behaviour
> > for combine on platforms that support only word arithmetic,
> > when the SUBREG might be considered a "split" point. 

No, I think the idea was that the outer SUBREG would be moved to the LHS
of the assignment to make a "paradoxical SUBREG".  Except, as you point
out, there doesn't seem to be an advantage of doing this for arithmetic
unless you only have it in SImode (which it could, of course, check).

> (Of course, this doesn't work for the compute-result-and-cc type patterns.)

Right.

> No immediate suggestions, sorry.  It looks like the combine case was
> added by this pre-egcs patch:
> 
> Wed Mar 18 05:54:25 1998  Richard Kenner  <kenner@vlsi1.ultra.nyu.edu>
> 
> 	* combine.c (gen_binary): Don't make AND that does nothing.
> 	(simplify_comparison, case AND): Commute AND and SUBREG.
> 
> so maybe Richard remembers (or has a record of) what this was designed
> to handle?

I suspect the two halves related to each other: we commute AND and SUBREG
in the hope that this will allow us to decide that the AND does nothing
and isn't needed.  But I don't see how this is related: this is the AND
case in a comparison and we have a MINUS.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RFC: allowing fwprop to propagate subregs
  2012-01-12 11:56     ` Richard Kenner
@ 2012-01-16 14:21       ` Ulrich Weigand
  2012-01-16 14:32         ` Richard Kenner
  0 siblings, 1 reply; 14+ messages in thread
From: Ulrich Weigand @ 2012-01-16 14:21 UTC (permalink / raw)
  To: Richard Kenner; +Cc: richard.sandiford, bonzini, gcc-patches

Richard Kenner wrote:

> > > The problem appears to be that an RTX expression like
> > >
> > >     (minus:QI (subreg:QI (reg:SI 64 [ a ]) 0)
> > >               (subreg:QI (reg:SI 66 [ b ]) 0))
> > >
> > > seems to be considered non-canonical by combine, and is instead
> > > transformed into
> > >
> > >     (subreg:QI (minus:SI (reg:SI 64 [ a ])
> > >                          (reg:SI 66 [ b ])) 0)
> > > On the one hand, this seems odd, as SUBREGs with a non-trivial
> > > expression inside will usually not match any insn pattern.  On
> > > the other hand, I guess this might still be useful behaviour
> > > for combine on platforms that support only word arithmetic,
> > > when the SUBREG might be considered a "split" point. 
> 
> No, I think the idea was that the outer SUBREG would be moved to the LHS
> of the assignment to make a "paradoxical SUBREG".  Except, as you point
> out, there doesn't seem to be an advantage of doing this for arithmetic
> unless you only have it in SImode (which it could, of course, check).

Right, I was mistaken about the "split" point case; this is done only
for SUBREG (MEM).  But you're correct that there is a special optimization
in simplify_set that will move the outer SUBREG to the LHS:

  /* If we have (set x (subreg:m1 (op:m2 ...) 0)) with OP being some operation,
     and X being a REG or (subreg (reg)), we may be able to convert this to
     (set (subreg:m2 x) (op)).

However, it would appear that this particular location is in fact the only
place where (subreg (op ...)) is handled; I don't see any other place where
this type of pattern would provide any benefit.

Maybe the best solution would be to remove the SUBREG case from the generic
apply_distributive_law subroutine, and instead add a special check for the
distributed subreg case right at the above place in simplify_set; i.e. to
perform the inverse distribution only if it is already guaranteed that we
will also be able to move the subreg to the LHS ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RFC: allowing fwprop to propagate subregs
  2012-01-16 14:21       ` Ulrich Weigand
@ 2012-01-16 14:32         ` Richard Kenner
  2012-01-17 19:25           ` Ulrich Weigand
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Kenner @ 2012-01-16 14:32 UTC (permalink / raw)
  To: uweigand; +Cc: bonzini, gcc-patches, richard.sandiford

> Maybe the best solution would be to remove the SUBREG case from the generic
> apply_distributive_law subroutine, and instead add a special check for the
> distributed subreg case right at the above place in simplify_set; i.e. to
> perform the inverse distribution only if it is already guaranteed that we
> will also be able to move the subreg to the LHS ...

That could indeed work.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RFC: allowing fwprop to propagate subregs
  2012-01-16 14:32         ` Richard Kenner
@ 2012-01-17 19:25           ` Ulrich Weigand
  2012-01-17 19:56             ` Richard Kenner
  0 siblings, 1 reply; 14+ messages in thread
From: Ulrich Weigand @ 2012-01-17 19:25 UTC (permalink / raw)
  To: Richard Kenner; +Cc: bonzini, gcc-patches, richard.sandiford

Richard Kenner wrote:
> > Maybe the best solution would be to remove the SUBREG case from the generic
> > apply_distributive_law subroutine, and instead add a special check for the
> > distributed subreg case right at the above place in simplify_set; i.e. to
> > perform the inverse distribution only if it is already guaranteed that we
> > will also be able to move the subreg to the LHS ...
> 
> That could indeed work.

I tried to implement that suggestion, but interestingly enough I cannot
really test it since I was unable to find any single case where that
SUBREG case in apply_distributive_law actually causes any difference
whatsoever in generated code.

As test case I used the whole of libstdc++.so on the following set of
platforms:
  - i686-pc-linux
  - s390x-ibm-linux
  - powerpc-ibm-linux
  - arm-linux-gnueabi
and built the compiler and libstdc++.so for each of:
  - current mainline
  - current mainline plus the first patch below
  - current mainline plus both patches below

All three resulting object files were identical for every platform.

Do you have any further suggestion of how to find a testcase (some
particular source code and/or architecture)?

Given the current set of results, since I do not have any way to verify
whether my simplify_set changes would actually trigger correctly, I'd
rather propose to just remove the SUBREG case in apply_distributive_law
(i.e. only apply the first patch below).

Thoughts?

Thanks,
Ulrich

Patch A: Remove SUBREG case in apply_distributive_law

Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 183240)
+++ gcc/combine.c	(working copy)
@@ -9238,37 +9269,6 @@
       /* This is also a multiply, so it distributes over everything.  */
       break;
 
-    case SUBREG:
-      /* Non-paradoxical SUBREGs distributes over all operations,
-	 provided the inner modes and byte offsets are the same, this
-	 is an extraction of a low-order part, we don't convert an fp
-	 operation to int or vice versa, this is not a vector mode,
-	 and we would not be converting a single-word operation into a
-	 multi-word operation.  The latter test is not required, but
-	 it prevents generating unneeded multi-word operations.  Some
-	 of the previous tests are redundant given the latter test,
-	 but are retained because they are required for correctness.
-
-	 We produce the result slightly differently in this case.  */
-
-      if (GET_MODE (SUBREG_REG (lhs)) != GET_MODE (SUBREG_REG (rhs))
-	  || SUBREG_BYTE (lhs) != SUBREG_BYTE (rhs)
-	  || ! subreg_lowpart_p (lhs)
-	  || (GET_MODE_CLASS (GET_MODE (lhs))
-	      != GET_MODE_CLASS (GET_MODE (SUBREG_REG (lhs))))
-	  || paradoxical_subreg_p (lhs)
-	  || VECTOR_MODE_P (GET_MODE (lhs))
-	  || GET_MODE_SIZE (GET_MODE (SUBREG_REG (lhs))) > UNITS_PER_WORD
-	  /* Result might need to be truncated.  Don't change mode if
-	     explicit truncation is needed.  */
-	  || !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (x),
-					     GET_MODE (SUBREG_REG (lhs))))
-	return x;
-
-      tem = simplify_gen_binary (code, GET_MODE (SUBREG_REG (lhs)),
-				 SUBREG_REG (lhs), SUBREG_REG (rhs));
-      return gen_lowpart (GET_MODE (x), tem);
-
     default:
       return x;
     }


Patch B: Re-implement SUBREG case specifically in simplify_set


Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 183240)
+++ gcc/combine.c	(working copy)
@@ -6299,6 +6299,7 @@
   rtx dest = SET_DEST (x);
   enum machine_mode mode
     = GET_MODE (src) != VOIDmode ? GET_MODE (src) : GET_MODE (dest);
+  rtx src_subreg;
   rtx other_insn;
   rtx *cc_use;
 
@@ -6496,6 +6497,10 @@
      and X being a REG or (subreg (reg)), we may be able to convert this to
      (set (subreg:m2 x) (op)).
 
+     Similarly, if we have (set x (op:m1 (subreg:m2 ...) (subreg:m2 ...))),
+     we may be able to first distribute the subreg over op, and then apply
+     the above transformation.
+
      We can always do this if M1 is narrower than M2 because that means that
      we only care about the low bits of the result.
 
@@ -6504,30 +6509,56 @@
      be undefined.  On machine where it is defined, this transformation is safe
      as long as M1 and M2 have the same number of words.  */
 
+  src_subreg = NULL_RTX;
   if (GET_CODE (src) == SUBREG && subreg_lowpart_p (src)
-      && !OBJECT_P (SUBREG_REG (src))
+      && !OBJECT_P (SUBREG_REG (src)))
+    src_subreg = SUBREG_REG (src);
+  else if (GET_CODE (src) == IOR || GET_CODE (src) == XOR
+	   || GET_CODE (src) == AND
+	   || GET_CODE (src) == PLUS || GET_CODE (src) == MINUS)
+    {
+      rtx lhs = XEXP (x, 0);
+      rtx rhs = XEXP (x, 1);
+
+      /* We can distribute non-paradoxical lowpart SUBREGs if the
+	 inner modes agree.  */
+      if (GET_CODE (lhs) == SUBREG && GET_CODE (rhs) == SUBREG
+	  && GET_MODE (SUBREG_REG (lhs)) == GET_MODE (SUBREG_REG (rhs))
+	  && subreg_lowpart_p (lhs) && !paradoxical_subreg_p (lhs)
+	  && subreg_lowpart_p (rhs) && !paradoxical_subreg_p (rhs)
+	  /* This is safe in general only for integral modes.  */
+	  && INTEGRAL_MODE_P (GET_MODE (lhs))
+	  && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (lhs)))
+	  /* Result might need to be truncated.  Don't change mode if
+	     explicit truncation is needed.  */
+	  && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (src),
+					    GET_MODE (SUBREG_REG (lhs))))
+	src_subreg = simplify_gen_binary (GET_CODE (src),
+					  GET_MODE (SUBREG_REG (lhs)),
+					  SUBREG_REG (lhs), SUBREG_REG (rhs));
+    }
+
+  if (src_subreg
       && (((GET_MODE_SIZE (GET_MODE (src)) + (UNITS_PER_WORD - 1))
 	   / UNITS_PER_WORD)
-	  == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))
-	       + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD))
+	  == ((GET_MODE_SIZE (GET_MODE (src_subreg)))
+	       + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
 #ifndef WORD_REGISTER_OPERATIONS
       && (GET_MODE_SIZE (GET_MODE (src))
-	< GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))))
+	  < GET_MODE_SIZE (GET_MODE (src_subreg)))
 #endif
 #ifdef CANNOT_CHANGE_MODE_CLASS
       && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER
 	    && REG_CANNOT_CHANGE_MODE_P (REGNO (dest),
-					 GET_MODE (SUBREG_REG (src)),
+					 GET_MODE (src_subreg),
 					 GET_MODE (src)))
 #endif
       && (REG_P (dest)
 	  || (GET_CODE (dest) == SUBREG
 	      && REG_P (SUBREG_REG (dest)))))
     {
-      SUBST (SET_DEST (x),
-	     gen_lowpart (GET_MODE (SUBREG_REG (src)),
-				      dest));
-      SUBST (SET_SRC (x), SUBREG_REG (src));
+      SUBST (SET_DEST (x), gen_lowpart (GET_MODE (src_subreg), dest));
+      SUBST (SET_SRC (x), src_subreg);
 
       src = SET_SRC (x), dest = SET_DEST (x);
     }


-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


Richard Kenner wrote:
> > Maybe the best solution would be to remove the SUBREG case from the generic
> > apply_distributive_law subroutine, and instead add a special check for the
> > distributed subreg case right at the above place in simplify_set; i.e. to
> > perform the inverse distribution only if it is already guaranteed that we
> > will also be able to move the subreg to the LHS ...
> 
> That could indeed work.

I tried to implement that suggestion, but interestingly enough I cannot
really test it since I was unable to find any single case where that
SUBREG case in apply_distributive_law actually causes any difference
whatsoever in generated code.

As test case I used the whole of libstdc++.so on the following set of
platforms:
  - i686-pc-linux
  - s390x-ibm-linux
  - powerpc-ibm-linux
  - arm-linux-gnueabi
and built the compiler and libstdc++.so for each of:
  - current mainline
  - current mainline plus the first patch below
  - current mainline plus both patches below

All three resulting object files were identical for every platform.

Do you have any further suggestion of how to find a testcase (some
particular source code and/or architecture)?

Given the current set of results, since I do not have any way to verify
whether my simplify_set changes would actually trigger correctly, I'd
rather propose to just remove the SUBREG case in apply_distributive_law
(i.e. only apply the first patch below).

Thoughts?

Thanks,
Ulrich

Patch A: Remove SUBREG case in apply_distributive_law

Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 183240)
+++ gcc/combine.c	(working copy)
@@ -9238,37 +9269,6 @@
       /* This is also a multiply, so it distributes over everything.  */
       break;
 
-    case SUBREG:
-      /* Non-paradoxical SUBREGs distributes over all operations,
-	 provided the inner modes and byte offsets are the same, this
-	 is an extraction of a low-order part, we don't convert an fp
-	 operation to int or vice versa, this is not a vector mode,
-	 and we would not be converting a single-word operation into a
-	 multi-word operation.  The latter test is not required, but
-	 it prevents generating unneeded multi-word operations.  Some
-	 of the previous tests are redundant given the latter test,
-	 but are retained because they are required for correctness.
-
-	 We produce the result slightly differently in this case.  */
-
-      if (GET_MODE (SUBREG_REG (lhs)) != GET_MODE (SUBREG_REG (rhs))
-	  || SUBREG_BYTE (lhs) != SUBREG_BYTE (rhs)
-	  || ! subreg_lowpart_p (lhs)
-	  || (GET_MODE_CLASS (GET_MODE (lhs))
-	      != GET_MODE_CLASS (GET_MODE (SUBREG_REG (lhs))))
-	  || paradoxical_subreg_p (lhs)
-	  || VECTOR_MODE_P (GET_MODE (lhs))
-	  || GET_MODE_SIZE (GET_MODE (SUBREG_REG (lhs))) > UNITS_PER_WORD
-	  /* Result might need to be truncated.  Don't change mode if
-	     explicit truncation is needed.  */
-	  || !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (x),
-					     GET_MODE (SUBREG_REG (lhs))))
-	return x;
-
-      tem = simplify_gen_binary (code, GET_MODE (SUBREG_REG (lhs)),
-				 SUBREG_REG (lhs), SUBREG_REG (rhs));
-      return gen_lowpart (GET_MODE (x), tem);
-
     default:
       return x;
     }


Patch B: Re-implement SUBREG case specifically in simplify_set


Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 183240)
+++ gcc/combine.c	(working copy)
@@ -6299,6 +6299,7 @@
   rtx dest = SET_DEST (x);
   enum machine_mode mode
     = GET_MODE (src) != VOIDmode ? GET_MODE (src) : GET_MODE (dest);
+  rtx src_subreg;
   rtx other_insn;
   rtx *cc_use;
 
@@ -6496,6 +6497,10 @@
      and X being a REG or (subreg (reg)), we may be able to convert this to
      (set (subreg:m2 x) (op)).
 
+     Similarly, if we have (set x (op:m1 (subreg:m2 ...) (subreg:m2 ...))),
+     we may be able to first distribute the subreg over op, and then apply
+     the above transformation.
+
      We can always do this if M1 is narrower than M2 because that means that
      we only care about the low bits of the result.
 
@@ -6504,30 +6509,56 @@
      be undefined.  On machine where it is defined, this transformation is safe
      as long as M1 and M2 have the same number of words.  */
 
+  src_subreg = NULL_RTX;
   if (GET_CODE (src) == SUBREG && subreg_lowpart_p (src)
-      && !OBJECT_P (SUBREG_REG (src))
+      && !OBJECT_P (SUBREG_REG (src)))
+    src_subreg = SUBREG_REG (src);
+  else if (GET_CODE (src) == IOR || GET_CODE (src) == XOR
+	   || GET_CODE (src) == AND
+	   || GET_CODE (src) == PLUS || GET_CODE (src) == MINUS)
+    {
+      rtx lhs = XEXP (x, 0);
+      rtx rhs = XEXP (x, 1);
+
+      /* We can distribute non-paradoxical lowpart SUBREGs if the
+	 inner modes agree.  */
+      if (GET_CODE (lhs) == SUBREG && GET_CODE (rhs) == SUBREG
+	  && GET_MODE (SUBREG_REG (lhs)) == GET_MODE (SUBREG_REG (rhs))
+	  && subreg_lowpart_p (lhs) && !paradoxical_subreg_p (lhs)
+	  && subreg_lowpart_p (rhs) && !paradoxical_subreg_p (rhs)
+	  /* This is safe in general only for integral modes.  */
+	  && INTEGRAL_MODE_P (GET_MODE (lhs))
+	  && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (lhs)))
+	  /* Result might need to be truncated.  Don't change mode if
+	     explicit truncation is needed.  */
+	  && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (src),
+					    GET_MODE (SUBREG_REG (lhs))))
+	src_subreg = simplify_gen_binary (GET_CODE (src),
+					  GET_MODE (SUBREG_REG (lhs)),
+					  SUBREG_REG (lhs), SUBREG_REG (rhs));
+    }
+
+  if (src_subreg
       && (((GET_MODE_SIZE (GET_MODE (src)) + (UNITS_PER_WORD - 1))
 	   / UNITS_PER_WORD)
-	  == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))
-	       + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD))
+	  == ((GET_MODE_SIZE (GET_MODE (src_subreg)))
+	       + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
 #ifndef WORD_REGISTER_OPERATIONS
       && (GET_MODE_SIZE (GET_MODE (src))
-	< GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))))
+	  < GET_MODE_SIZE (GET_MODE (src_subreg)))
 #endif
 #ifdef CANNOT_CHANGE_MODE_CLASS
       && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER
 	    && REG_CANNOT_CHANGE_MODE_P (REGNO (dest),
-					 GET_MODE (SUBREG_REG (src)),
+					 GET_MODE (src_subreg),
 					 GET_MODE (src)))
 #endif
       && (REG_P (dest)
 	  || (GET_CODE (dest) == SUBREG
 	      && REG_P (SUBREG_REG (dest)))))
     {
-      SUBST (SET_DEST (x),
-	     gen_lowpart (GET_MODE (SUBREG_REG (src)),
-				      dest));
-      SUBST (SET_SRC (x), SUBREG_REG (src));
+      SUBST (SET_DEST (x), gen_lowpart (GET_MODE (src_subreg), dest));
+      SUBST (SET_SRC (x), src_subreg);
 
       src = SET_SRC (x), dest = SET_DEST (x);
     }


-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


Richard Kenner wrote:
> > Maybe the best solution would be to remove the SUBREG case from the generic
> > apply_distributive_law subroutine, and instead add a special check for the
> > distributed subreg case right at the above place in simplify_set; i.e. to
> > perform the inverse distribution only if it is already guaranteed that we
> > will also be able to move the subreg to the LHS ...
> 
> That could indeed work.

I tried to implement that suggestion, but interestingly enough I cannot
really test it since I was unable to find any single case where that
SUBREG case in apply_distributive_law actually causes any difference
whatsoever in generated code.

As test case I used the whole of libstdc++.so on the following set of
platforms:
  - i686-pc-linux
  - s390x-ibm-linux
  - powerpc-ibm-linux
  - arm-linux-gnueabi
and built the compiler and libstdc++.so for each of:
  - current mainline
  - current mainline plus the first patch below
  - current mainline plus both patches below

All three resulting object files were identical for every platform.

Do you have any further suggestion of how to find a testcase (some
particular source code and/or architecture)?

Given the current set of results, since I do not have any way to verify
whether my simplify_set changes would actually trigger correctly, I'd
rather propose to just remove the SUBREG case in apply_distributive_law
(i.e. only apply the first patch below).

Thoughts?

Thanks,
Ulrich

Patch A: Remove SUBREG case in apply_distributive_law

Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 183240)
+++ gcc/combine.c	(working copy)
@@ -9238,37 +9269,6 @@
       /* This is also a multiply, so it distributes over everything.  */
       break;
 
-    case SUBREG:
-      /* Non-paradoxical SUBREGs distributes over all operations,
-	 provided the inner modes and byte offsets are the same, this
-	 is an extraction of a low-order part, we don't convert an fp
-	 operation to int or vice versa, this is not a vector mode,
-	 and we would not be converting a single-word operation into a
-	 multi-word operation.  The latter test is not required, but
-	 it prevents generating unneeded multi-word operations.  Some
-	 of the previous tests are redundant given the latter test,
-	 but are retained because they are required for correctness.
-
-	 We produce the result slightly differently in this case.  */
-
-      if (GET_MODE (SUBREG_REG (lhs)) != GET_MODE (SUBREG_REG (rhs))
-	  || SUBREG_BYTE (lhs) != SUBREG_BYTE (rhs)
-	  || ! subreg_lowpart_p (lhs)
-	  || (GET_MODE_CLASS (GET_MODE (lhs))
-	      != GET_MODE_CLASS (GET_MODE (SUBREG_REG (lhs))))
-	  || paradoxical_subreg_p (lhs)
-	  || VECTOR_MODE_P (GET_MODE (lhs))
-	  || GET_MODE_SIZE (GET_MODE (SUBREG_REG (lhs))) > UNITS_PER_WORD
-	  /* Result might need to be truncated.  Don't change mode if
-	     explicit truncation is needed.  */
-	  || !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (x),
-					     GET_MODE (SUBREG_REG (lhs))))
-	return x;
-
-      tem = simplify_gen_binary (code, GET_MODE (SUBREG_REG (lhs)),
-				 SUBREG_REG (lhs), SUBREG_REG (rhs));
-      return gen_lowpart (GET_MODE (x), tem);
-
     default:
       return x;
     }


Patch B: Re-implement SUBREG case specifically in simplify_set


Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 183240)
+++ gcc/combine.c	(working copy)
@@ -6299,6 +6299,7 @@
   rtx dest = SET_DEST (x);
   enum machine_mode mode
     = GET_MODE (src) != VOIDmode ? GET_MODE (src) : GET_MODE (dest);
+  rtx src_subreg;
   rtx other_insn;
   rtx *cc_use;
 
@@ -6496,6 +6497,10 @@
      and X being a REG or (subreg (reg)), we may be able to convert this to
      (set (subreg:m2 x) (op)).
 
+     Similarly, if we have (set x (op:m1 (subreg:m2 ...) (subreg:m2 ...))),
+     we may be able to first distribute the subreg over op, and then apply
+     the above transformation.
+
      We can always do this if M1 is narrower than M2 because that means that
      we only care about the low bits of the result.
 
@@ -6504,30 +6509,56 @@
      be undefined.  On machine where it is defined, this transformation is safe
      as long as M1 and M2 have the same number of words.  */
 
+  src_subreg = NULL_RTX;
   if (GET_CODE (src) == SUBREG && subreg_lowpart_p (src)
-      && !OBJECT_P (SUBREG_REG (src))
+      && !OBJECT_P (SUBREG_REG (src)))
+    src_subreg = SUBREG_REG (src);
+  else if (GET_CODE (src) == IOR || GET_CODE (src) == XOR
+	   || GET_CODE (src) == AND
+	   || GET_CODE (src) == PLUS || GET_CODE (src) == MINUS)
+    {
+      rtx lhs = XEXP (x, 0);
+      rtx rhs = XEXP (x, 1);
+
+      /* We can distribute non-paradoxical lowpart SUBREGs if the
+	 inner modes agree.  */
+      if (GET_CODE (lhs) == SUBREG && GET_CODE (rhs) == SUBREG
+	  && GET_MODE (SUBREG_REG (lhs)) == GET_MODE (SUBREG_REG (rhs))
+	  && subreg_lowpart_p (lhs) && !paradoxical_subreg_p (lhs)
+	  && subreg_lowpart_p (rhs) && !paradoxical_subreg_p (rhs)
+	  /* This is safe in general only for integral modes.  */
+	  && INTEGRAL_MODE_P (GET_MODE (lhs))
+	  && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (lhs)))
+	  /* Result might need to be truncated.  Don't change mode if
+	     explicit truncation is needed.  */
+	  && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (src),
+					    GET_MODE (SUBREG_REG (lhs))))
+	src_subreg = simplify_gen_binary (GET_CODE (src),
+					  GET_MODE (SUBREG_REG (lhs)),
+					  SUBREG_REG (lhs), SUBREG_REG (rhs));
+    }
+
+  if (src_subreg
       && (((GET_MODE_SIZE (GET_MODE (src)) + (UNITS_PER_WORD - 1))
 	   / UNITS_PER_WORD)
-	  == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))
-	       + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD))
+	  == ((GET_MODE_SIZE (GET_MODE (src_subreg)))
+	       + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
 #ifndef WORD_REGISTER_OPERATIONS
       && (GET_MODE_SIZE (GET_MODE (src))
-	< GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))))
+	  < GET_MODE_SIZE (GET_MODE (src_subreg)))
 #endif
 #ifdef CANNOT_CHANGE_MODE_CLASS
       && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER
 	    && REG_CANNOT_CHANGE_MODE_P (REGNO (dest),
-					 GET_MODE (SUBREG_REG (src)),
+					 GET_MODE (src_subreg),
 					 GET_MODE (src)))
 #endif
       && (REG_P (dest)
 	  || (GET_CODE (dest) == SUBREG
 	      && REG_P (SUBREG_REG (dest)))))
     {
-      SUBST (SET_DEST (x),
-	     gen_lowpart (GET_MODE (SUBREG_REG (src)),
-				      dest));
-      SUBST (SET_SRC (x), SUBREG_REG (src));
+      SUBST (SET_DEST (x), gen_lowpart (GET_MODE (src_subreg), dest));
+      SUBST (SET_SRC (x), src_subreg);
 
       src = SET_SRC (x), dest = SET_DEST (x);
     }


-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


Richard Kenner wrote:
> > Maybe the best solution would be to remove the SUBREG case from the generic
> > apply_distributive_law subroutine, and instead add a special check for the
> > distributed subreg case right at the above place in simplify_set; i.e. to
> > perform the inverse distribution only if it is already guaranteed that we
> > will also be able to move the subreg to the LHS ...
> 
> That could indeed work.

I tried to implement that suggestion, but interestingly enough I cannot
really test it since I was unable to find any single case where that
SUBREG case in apply_distributive_law actually causes any difference
whatsoever in generated code.

As test case I used the whole of libstdc++.so on the following set of
platforms:
  - i686-pc-linux
  - s390x-ibm-linux
  - powerpc-ibm-linux
  - arm-linux-gnueabi
and built the compiler and libstdc++.so for each of:
  - current mainline
  - current mainline plus the first patch below
  - current mainline plus both patches below

All three resulting object files were identical for every platform.

Do you have any further suggestion of how to find a testcase (some
particular source code and/or architecture)?

Given the current set of results, since I do not have any way to verify
whether my simplify_set changes would actually trigger correctly, I'd
rather propose to just remove the SUBREG case in apply_distributive_law
(i.e. only apply the first patch below).

Thoughts?

Thanks,
Ulrich

Patch A: Remove SUBREG case in apply_distributive_law

Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 183240)
+++ gcc/combine.c	(working copy)
@@ -9238,37 +9269,6 @@
       /* This is also a multiply, so it distributes over everything.  */
       break;
 
-    case SUBREG:
-      /* Non-paradoxical SUBREGs distributes over all operations,
-	 provided the inner modes and byte offsets are the same, this
-	 is an extraction of a low-order part, we don't convert an fp
-	 operation to int or vice versa, this is not a vector mode,
-	 and we would not be converting a single-word operation into a
-	 multi-word operation.  The latter test is not required, but
-	 it prevents generating unneeded multi-word operations.  Some
-	 of the previous tests are redundant given the latter test,
-	 but are retained because they are required for correctness.
-
-	 We produce the result slightly differently in this case.  */
-
-      if (GET_MODE (SUBREG_REG (lhs)) != GET_MODE (SUBREG_REG (rhs))
-	  || SUBREG_BYTE (lhs) != SUBREG_BYTE (rhs)
-	  || ! subreg_lowpart_p (lhs)
-	  || (GET_MODE_CLASS (GET_MODE (lhs))
-	      != GET_MODE_CLASS (GET_MODE (SUBREG_REG (lhs))))
-	  || paradoxical_subreg_p (lhs)
-	  || VECTOR_MODE_P (GET_MODE (lhs))
-	  || GET_MODE_SIZE (GET_MODE (SUBREG_REG (lhs))) > UNITS_PER_WORD
-	  /* Result might need to be truncated.  Don't change mode if
-	     explicit truncation is needed.  */
-	  || !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (x),
-					     GET_MODE (SUBREG_REG (lhs))))
-	return x;
-
-      tem = simplify_gen_binary (code, GET_MODE (SUBREG_REG (lhs)),
-				 SUBREG_REG (lhs), SUBREG_REG (rhs));
-      return gen_lowpart (GET_MODE (x), tem);
-
     default:
       return x;
     }


Patch B: Re-implement SUBREG case specifically in simplify_set


Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 183240)
+++ gcc/combine.c	(working copy)
@@ -6299,6 +6299,7 @@
   rtx dest = SET_DEST (x);
   enum machine_mode mode
     = GET_MODE (src) != VOIDmode ? GET_MODE (src) : GET_MODE (dest);
+  rtx src_subreg;
   rtx other_insn;
   rtx *cc_use;
 
@@ -6496,6 +6497,10 @@
      and X being a REG or (subreg (reg)), we may be able to convert this to
      (set (subreg:m2 x) (op)).
 
+     Similarly, if we have (set x (op:m1 (subreg:m2 ...) (subreg:m2 ...))),
+     we may be able to first distribute the subreg over op, and then apply
+     the above transformation.
+
      We can always do this if M1 is narrower than M2 because that means that
      we only care about the low bits of the result.
 
@@ -6504,30 +6509,56 @@
      be undefined.  On machine where it is defined, this transformation is safe
      as long as M1 and M2 have the same number of words.  */
 
+  src_subreg = NULL_RTX;
   if (GET_CODE (src) == SUBREG && subreg_lowpart_p (src)
-      && !OBJECT_P (SUBREG_REG (src))
+      && !OBJECT_P (SUBREG_REG (src)))
+    src_subreg = SUBREG_REG (src);
+  else if (GET_CODE (src) == IOR || GET_CODE (src) == XOR
+	   || GET_CODE (src) == AND
+	   || GET_CODE (src) == PLUS || GET_CODE (src) == MINUS)
+    {
+      rtx lhs = XEXP (x, 0);
+      rtx rhs = XEXP (x, 1);
+
+      /* We can distribute non-paradoxical lowpart SUBREGs if the
+	 inner modes agree.  */
+      if (GET_CODE (lhs) == SUBREG && GET_CODE (rhs) == SUBREG
+	  && GET_MODE (SUBREG_REG (lhs)) == GET_MODE (SUBREG_REG (rhs))
+	  && subreg_lowpart_p (lhs) && !paradoxical_subreg_p (lhs)
+	  && subreg_lowpart_p (rhs) && !paradoxical_subreg_p (rhs)
+	  /* This is safe in general only for integral modes.  */
+	  && INTEGRAL_MODE_P (GET_MODE (lhs))
+	  && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (lhs)))
+	  /* Result might need to be truncated.  Don't change mode if
+	     explicit truncation is needed.  */
+	  && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (src),
+					    GET_MODE (SUBREG_REG (lhs))))
+	src_subreg = simplify_gen_binary (GET_CODE (src),
+					  GET_MODE (SUBREG_REG (lhs)),
+					  SUBREG_REG (lhs), SUBREG_REG (rhs));
+    }
+
+  if (src_subreg
       && (((GET_MODE_SIZE (GET_MODE (src)) + (UNITS_PER_WORD - 1))
 	   / UNITS_PER_WORD)
-	  == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))
-	       + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD))
+	  == ((GET_MODE_SIZE (GET_MODE (src_subreg)))
+	       + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
 #ifndef WORD_REGISTER_OPERATIONS
       && (GET_MODE_SIZE (GET_MODE (src))
-	< GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))))
+	  < GET_MODE_SIZE (GET_MODE (src_subreg)))
 #endif
 #ifdef CANNOT_CHANGE_MODE_CLASS
       && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER
 	    && REG_CANNOT_CHANGE_MODE_P (REGNO (dest),
-					 GET_MODE (SUBREG_REG (src)),
+					 GET_MODE (src_subreg),
 					 GET_MODE (src)))
 #endif
       && (REG_P (dest)
 	  || (GET_CODE (dest) == SUBREG
 	      && REG_P (SUBREG_REG (dest)))))
     {
-      SUBST (SET_DEST (x),
-	     gen_lowpart (GET_MODE (SUBREG_REG (src)),
-				      dest));
-      SUBST (SET_SRC (x), SUBREG_REG (src));
+      SUBST (SET_DEST (x), gen_lowpart (GET_MODE (src_subreg), dest));
+      SUBST (SET_SRC (x), src_subreg);
 
       src = SET_SRC (x), dest = SET_DEST (x);
     }


-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RFC: allowing fwprop to propagate subregs
  2012-01-17 19:25           ` Ulrich Weigand
@ 2012-01-17 19:56             ` Richard Kenner
  2012-03-07 17:40               ` [PATCH] Do not handle SUBREG in apply_distributive_law (Re: RFC: allowing fwprop to propagate subregs) Ulrich Weigand
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Kenner @ 2012-01-17 19:56 UTC (permalink / raw)
  To: uweigand; +Cc: bonzini, gcc-patches, richard.sandiford

> I tried to implement that suggestion, but interestingly enough I cannot
> really test it since I was unable to find any single case where that
> SUBREG case in apply_distributive_law actually causes any difference
> whatsoever in generated code.
> 
> Do you have any further suggestion of how to find a testcase (some
> particular source code and/or architecture)?

No.  It may well be irrelevant now and was only relevant in cases where
we just had SImode and not the narrower modes on some of the older
architectures such as a29k.  It may also be that the optimizations that
this caught are now done at tree level, though I think the former is
the more likely.

> Given the current set of results, since I do not have any way to verify
> whether my simplify_set changes would actually trigger correctly, I'd
> rather propose to just remove the SUBREG case in apply_distributive_law
> (i.e. only apply the first patch below).
> 
> Thoughts?

I think that's reasonable.  But I'd replace it with a comment saying
what used to be there and why it was removed.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] Do not handle SUBREG in apply_distributive_law (Re: RFC: allowing fwprop to propagate subregs)
  2012-01-17 19:56             ` Richard Kenner
@ 2012-03-07 17:40               ` Ulrich Weigand
  2012-03-12 10:23                 ` Richard Guenther
  0 siblings, 1 reply; 14+ messages in thread
From: Ulrich Weigand @ 2012-03-07 17:40 UTC (permalink / raw)
  To: Richard Kenner; +Cc: bonzini, gcc-patches, richard.sandiford

Richard Kenner wrote:

> > Given the current set of results, since I do not have any way to verify
> > whether my simplify_set changes would actually trigger correctly, I'd
> > rather propose to just remove the SUBREG case in apply_distributive_law
> > (i.e. only apply the first patch below).
> > 
> > Thoughts?
> 
> I think that's reasonable.  But I'd replace it with a comment saying
> what used to be there and why it was removed.

Now that we're back in stage 1, I'd like to try and move forward with
this again.  Here's a patch that implements your suggestion.

Tested on arm-linux-gnueabi, i386-linux-gnu and s390-linux-gnu.

OK for mainline?

Bye,
Ulrich

2012-03-07  Ulrich Weigand  <ulrich.weigand@linaro.org>

	gcc/
	* combine.c (apply_distributive_law): Do not distribute SUBREG.

=== modified file 'gcc/combine.c'
--- gcc/combine.c	2012-02-07 15:48:52 +0000
+++ gcc/combine.c	2012-02-22 11:57:19 +0000
@@ -9286,36 +9286,22 @@
       /* This is also a multiply, so it distributes over everything.  */
       break;
 
-    case SUBREG:
-      /* Non-paradoxical SUBREGs distributes over all operations,
-	 provided the inner modes and byte offsets are the same, this
-	 is an extraction of a low-order part, we don't convert an fp
-	 operation to int or vice versa, this is not a vector mode,
-	 and we would not be converting a single-word operation into a
-	 multi-word operation.  The latter test is not required, but
-	 it prevents generating unneeded multi-word operations.  Some
-	 of the previous tests are redundant given the latter test,
-	 but are retained because they are required for correctness.
-
-	 We produce the result slightly differently in this case.  */
-
-      if (GET_MODE (SUBREG_REG (lhs)) != GET_MODE (SUBREG_REG (rhs))
-	  || SUBREG_BYTE (lhs) != SUBREG_BYTE (rhs)
-	  || ! subreg_lowpart_p (lhs)
-	  || (GET_MODE_CLASS (GET_MODE (lhs))
-	      != GET_MODE_CLASS (GET_MODE (SUBREG_REG (lhs))))
-	  || paradoxical_subreg_p (lhs)
-	  || VECTOR_MODE_P (GET_MODE (lhs))
-	  || GET_MODE_SIZE (GET_MODE (SUBREG_REG (lhs))) > UNITS_PER_WORD
-	  /* Result might need to be truncated.  Don't change mode if
-	     explicit truncation is needed.  */
-	  || !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (x),
-					     GET_MODE (SUBREG_REG (lhs))))
-	return x;
-
-      tem = simplify_gen_binary (code, GET_MODE (SUBREG_REG (lhs)),
-				 SUBREG_REG (lhs), SUBREG_REG (rhs));
-      return gen_lowpart (GET_MODE (x), tem);
+    /* This used to handle SUBREG, but this turned out to be counter-
+       productive, since (subreg (op ...)) usually is not handled by
+       insn patterns, and this "optimization" therefore transformed
+       recognizable patterns into unrecognizable ones.  Therefore the
+       SUBREG case was removed from here.
+
+       It is possible that distributing SUBREG over arithmetic operations
+       leads to an intermediate result than can then be optimized further,
+       e.g. by moving the outer SUBREG to the other side of a SET as done
+       in simplify_set.  This seems to have been the original intent of
+       handling SUBREGs here.
+
+       However, with current GCC this does not appear to actually happen,
+       at least on major platforms.  If some case is found where removing
+       the SUBREG case here prevents follow-on optimizations, distributing
+       SUBREGs ought to be re-added at that place, e.g. in simplify_set.  */
 
     default:
       return x;



-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Do not handle SUBREG in apply_distributive_law (Re: RFC: allowing fwprop to propagate subregs)
  2012-03-07 17:40               ` [PATCH] Do not handle SUBREG in apply_distributive_law (Re: RFC: allowing fwprop to propagate subregs) Ulrich Weigand
@ 2012-03-12 10:23                 ` Richard Guenther
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Guenther @ 2012-03-12 10:23 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Richard Kenner, bonzini, gcc-patches, richard.sandiford

On Wed, Mar 7, 2012 at 6:40 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Kenner wrote:
>
>> > Given the current set of results, since I do not have any way to verify
>> > whether my simplify_set changes would actually trigger correctly, I'd
>> > rather propose to just remove the SUBREG case in apply_distributive_law
>> > (i.e. only apply the first patch below).
>> >
>> > Thoughts?
>>
>> I think that's reasonable.  But I'd replace it with a comment saying
>> what used to be there and why it was removed.
>
> Now that we're back in stage 1, I'd like to try and move forward with
> this again.  Here's a patch that implements your suggestion.
>
> Tested on arm-linux-gnueabi, i386-linux-gnu and s390-linux-gnu.
>
> OK for mainline?

Ok.

Thanks,
Richard.

> Bye,
> Ulrich
>
> 2012-03-07  Ulrich Weigand  <ulrich.weigand@linaro.org>
>
>        gcc/
>        * combine.c (apply_distributive_law): Do not distribute SUBREG.
>
> === modified file 'gcc/combine.c'
> --- gcc/combine.c       2012-02-07 15:48:52 +0000
> +++ gcc/combine.c       2012-02-22 11:57:19 +0000
> @@ -9286,36 +9286,22 @@
>       /* This is also a multiply, so it distributes over everything.  */
>       break;
>
> -    case SUBREG:
> -      /* Non-paradoxical SUBREGs distributes over all operations,
> -        provided the inner modes and byte offsets are the same, this
> -        is an extraction of a low-order part, we don't convert an fp
> -        operation to int or vice versa, this is not a vector mode,
> -        and we would not be converting a single-word operation into a
> -        multi-word operation.  The latter test is not required, but
> -        it prevents generating unneeded multi-word operations.  Some
> -        of the previous tests are redundant given the latter test,
> -        but are retained because they are required for correctness.
> -
> -        We produce the result slightly differently in this case.  */
> -
> -      if (GET_MODE (SUBREG_REG (lhs)) != GET_MODE (SUBREG_REG (rhs))
> -         || SUBREG_BYTE (lhs) != SUBREG_BYTE (rhs)
> -         || ! subreg_lowpart_p (lhs)
> -         || (GET_MODE_CLASS (GET_MODE (lhs))
> -             != GET_MODE_CLASS (GET_MODE (SUBREG_REG (lhs))))
> -         || paradoxical_subreg_p (lhs)
> -         || VECTOR_MODE_P (GET_MODE (lhs))
> -         || GET_MODE_SIZE (GET_MODE (SUBREG_REG (lhs))) > UNITS_PER_WORD
> -         /* Result might need to be truncated.  Don't change mode if
> -            explicit truncation is needed.  */
> -         || !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (x),
> -                                            GET_MODE (SUBREG_REG (lhs))))
> -       return x;
> -
> -      tem = simplify_gen_binary (code, GET_MODE (SUBREG_REG (lhs)),
> -                                SUBREG_REG (lhs), SUBREG_REG (rhs));
> -      return gen_lowpart (GET_MODE (x), tem);
> +    /* This used to handle SUBREG, but this turned out to be counter-
> +       productive, since (subreg (op ...)) usually is not handled by
> +       insn patterns, and this "optimization" therefore transformed
> +       recognizable patterns into unrecognizable ones.  Therefore the
> +       SUBREG case was removed from here.
> +
> +       It is possible that distributing SUBREG over arithmetic operations
> +       leads to an intermediate result than can then be optimized further,
> +       e.g. by moving the outer SUBREG to the other side of a SET as done
> +       in simplify_set.  This seems to have been the original intent of
> +       handling SUBREGs here.
> +
> +       However, with current GCC this does not appear to actually happen,
> +       at least on major platforms.  If some case is found where removing
> +       the SUBREG case here prevents follow-on optimizations, distributing
> +       SUBREGs ought to be re-added at that place, e.g. in simplify_set.  */
>
>     default:
>       return x;
>
>
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-03-12 10:23 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-14 15:40 RFC: allowing fwprop to propagate subregs Richard Sandiford
2011-09-14 15:45 ` H.J. Lu
2011-09-14 16:09   ` Richard Sandiford
2011-09-14 18:28     ` Paolo Bonzini
2012-01-11 16:55 ` Ulrich Weigand
2012-01-11 19:12   ` Paolo Bonzini
2012-01-12  9:57   ` Richard Sandiford
2012-01-12 11:56     ` Richard Kenner
2012-01-16 14:21       ` Ulrich Weigand
2012-01-16 14:32         ` Richard Kenner
2012-01-17 19:25           ` Ulrich Weigand
2012-01-17 19:56             ` Richard Kenner
2012-03-07 17:40               ` [PATCH] Do not handle SUBREG in apply_distributive_law (Re: RFC: allowing fwprop to propagate subregs) Ulrich Weigand
2012-03-12 10:23                 ` Richard Guenther

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).