public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Fix PR77881: combine improvement
@ 2016-10-20 14:20 Michael Matz
  2016-10-20 19:08 ` Jeff Law
  2016-11-12 11:48 ` Segher Boessenkool
  0 siblings, 2 replies; 8+ messages in thread
From: Michael Matz @ 2016-10-20 14:20 UTC (permalink / raw)
  To: gcc-patches

Hello,

like analyzed in the PR, combine is able to remove outer subregs that 
don't do anything interesting in the context they are used 
(simplify_comparison).  But that currently happens outside of the loop 
that retries simplifications if changes occurred.

When we do that inside the loop as well we get secondary simplifications 
that currently only happen when calling the simplifiers multiple time, 
like when we start from three rather than from two instructions.  So 
right now we're in the curious position that more complicated code is 
optimized better than simpler code and the patch fixes this.

(FWIW: this replicates parts of rather than moves the responsible code, 
because between the loop and the original place of simplification other 
things happen that might itself generate subregs).

Regstrapping on x86-64, all languages in process.  Okay if that passes?


Ciao,
Michael.
	PR missed-optimization/77881
	* combine.c (simplify_comparison): Remove useless subregs
	also inside the loop, not just after it.

testsuite/
	* gcc.target/i386/pr77881.c: New test.

diff --git a/gcc/combine.c b/gcc/combine.c
index 2727683..58351ff 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11925,6 +11925,28 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 	  if (subreg_lowpart_p (op0)
 	      && GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width)
 	    ;
+ 	  else if (subreg_lowpart_p (op0)
+ 		   && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
+ 		   && GET_MODE_CLASS (GET_MODE (SUBREG_REG (op0))) == MODE_INT
+ 		   && (code == NE || code == EQ)
+ 		   && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0)))
+ 		       <= HOST_BITS_PER_WIDE_INT)
+ 		   && !paradoxical_subreg_p (op0)
+ 		   && (nonzero_bits (SUBREG_REG (op0),
+ 				     GET_MODE (SUBREG_REG (op0)))
+ 		       & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+ 	    {
+	      /* Remove outer subregs that don't do anything.  */
+ 	      tem = gen_lowpart (GET_MODE (SUBREG_REG (op0)), op1);
+ 
+ 	      if ((nonzero_bits (tem, GET_MODE (SUBREG_REG (op0)))
+ 		   & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+ 		{
+ 		  op0 = SUBREG_REG (op0), op1 = tem;
+ 		  continue;
+ 		}
+ 	      break;
+ 	    }
 	  else
 	    break;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr77881.c b/gcc/testsuite/gcc.target/i386/pr77881.c
new file mode 100644
index 0000000..80d143f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr77881.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
+/* { dg-options "-O2" } */
+extern void baz(void);
+int
+foo (long long int a, long long int a2, int b)
+{
+    if (a < 0 || b)
+          baz ();
+}
+/* { dg-final { scan-assembler "js\[ \t\]\.L" } } */
+/* { dg-final { scan-assembler "jne\[ \t\]\.L" } } */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix PR77881: combine improvement
  2016-10-20 14:20 Fix PR77881: combine improvement Michael Matz
@ 2016-10-20 19:08 ` Jeff Law
  2016-11-12 11:48 ` Segher Boessenkool
  1 sibling, 0 replies; 8+ messages in thread
From: Jeff Law @ 2016-10-20 19:08 UTC (permalink / raw)
  To: Michael Matz, gcc-patches

On 10/20/2016 08:20 AM, Michael Matz wrote:
> Hello,
>
> like analyzed in the PR, combine is able to remove outer subregs that
> don't do anything interesting in the context they are used
> (simplify_comparison).  But that currently happens outside of the loop
> that retries simplifications if changes occurred.
>
> When we do that inside the loop as well we get secondary simplifications
> that currently only happen when calling the simplifiers multiple time,
> like when we start from three rather than from two instructions.  So
> right now we're in the curious position that more complicated code is
> optimized better than simpler code and the patch fixes this.
>
> (FWIW: this replicates parts of rather than moves the responsible code,
> because between the loop and the original place of simplification other
> things happen that might itself generate subregs).
>
> Regstrapping on x86-64, all languages in process.  Okay if that passes?
>
>
> Ciao,
> Michael.
> 	PR missed-optimization/77881
> 	* combine.c (simplify_comparison): Remove useless subregs
> 	also inside the loop, not just after it.
>
> testsuite/
> 	* gcc.target/i386/pr77881.c: New test.
LGTM.  I was a bit curious why you duplicated rather than factoring, but 
it looks like you simplified the copy a bit by not handling the 
paradoxical subreg case.


There's an outside chance this might help a couple BZs that I've poked 
at in the past.  I'll make a point to test them again this release cycle 
to see if your change improves them (I don't have the #s handy, I think 
they're P4/P5 regressions).

jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix PR77881: combine improvement
  2016-10-20 14:20 Fix PR77881: combine improvement Michael Matz
  2016-10-20 19:08 ` Jeff Law
@ 2016-11-12 11:48 ` Segher Boessenkool
  2016-11-14  4:56   ` Michael Matz
  1 sibling, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2016-11-12 11:48 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

Hi Michael,

On Thu, Oct 20, 2016 at 04:20:09PM +0200, Michael Matz wrote:
> 	PR missed-optimization/77881
> 	* combine.c (simplify_comparison): Remove useless subregs
> 	also inside the loop, not just after it.
> 
> testsuite/
> 	* gcc.target/i386/pr77881.c: New test.

This isn't checked in yet, do you still want it?

Thanks,


Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix PR77881: combine improvement
  2016-11-12 11:48 ` Segher Boessenkool
@ 2016-11-14  4:56   ` Michael Matz
  2016-11-15  4:10     ` Segher Boessenkool
  2016-11-16 15:05     ` Andreas Schwab
  0 siblings, 2 replies; 8+ messages in thread
From: Michael Matz @ 2016-11-14  4:56 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

Hi,

On Sat, 12 Nov 2016, Segher Boessenkool wrote:

> Hi Michael,
> 
> On Thu, Oct 20, 2016 at 04:20:09PM +0200, Michael Matz wrote:
> > 	PR missed-optimization/77881
> > 	* combine.c (simplify_comparison): Remove useless subregs
> > 	also inside the loop, not just after it.
> > 
> > testsuite/
> > 	* gcc.target/i386/pr77881.c: New test.
> 
> This isn't checked in yet, do you still want it?

Gnah, fell through the cracks.  I had to fix something else in combine to 
make it not regress in the testsuite.  The problem is that removing the 
subregs enables further simplifications which in turn might not be 
expected down-stream.  The particular problem was that originally the loop 
was left with an (subreg:QI (and:SI (lrshift X 24) 255) 0), whose inner op 
in turn was recognized by make_compound_operation and transformed into an 
extract.

With the patch we leave the loop now with essentially
  (subreg:QI (lrshift X 24) 0)
which of course is just the same masking and hence extract, but 
make_compound_operation didn't know.  With the amended patch it does.  I 
didn't come around making the AND and SUBREG handling a bit more common 
(which I initially wanted to do before posting), so for now I'm handling 
only the specific case I hit.

With this patch there are now no regressions on x86-64-linux (bootstrapped 
with all languages+ada).  Okay for trunk?


Ciao,
Michael.
	PR missed-optimization/77881
	* combine.c (simplify_comparison): Remove useless subregs
	also inside the loop, not just after it.
	(make_compound_operation): Recognize some subregs as being
	masking as well.

testsuite/
	* gcc.target/i386/pr77881.c: New test.

diff --git a/gcc/combine.c b/gcc/combine.c
index 6ffa387..0210685 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -8102,6 +8102,18 @@ make_compound_operation (rtx x, enum rtx_code in_code)
 	rtx inner = SUBREG_REG (x), simplified;
 	enum rtx_code subreg_code = in_code;
 
+	/* If the SUBREG is masking of a logical right shift,
+	   make an extraction.  */
+	if (GET_CODE (inner) == LSHIFTRT
+	    && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner))
+	    && subreg_lowpart_p (x))
+	  {
+	    new_rtx = make_compound_operation (XEXP (inner, 0), next_code);
+	    new_rtx = make_extraction (mode, new_rtx, 0, XEXP (inner, 1),
+				       mode_width, 1, 0, in_code == COMPARE);
+	    break;
+	  }
+
 	/* If in_code is COMPARE, it isn't always safe to pass it through
 	   to the recursive make_compound_operation call.  */
 	if (subreg_code == COMPARE
@@ -11994,6 +12006,29 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 	  if (subreg_lowpart_p (op0)
 	      && GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width)
 	    ;
+ 	  else if (subreg_lowpart_p (op0)
+ 		   && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
+ 		   && GET_MODE_CLASS (GET_MODE (SUBREG_REG (op0))) == MODE_INT
+ 		   && (code == NE || code == EQ)
+ 		   && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0)))
+ 		       <= HOST_BITS_PER_WIDE_INT)
+ 		   && !paradoxical_subreg_p (op0)
+ 		   && (nonzero_bits (SUBREG_REG (op0),
+ 				     GET_MODE (SUBREG_REG (op0)))
+ 		       & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+ 	    {
+	      /* Remove outer subregs that don't do anything.  */
+ 	      tem = gen_lowpart (GET_MODE (SUBREG_REG (op0)), op1);
+ 
+ 	      if ((nonzero_bits (tem, GET_MODE (SUBREG_REG (op0)))
+ 		   & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+ 		{
+ 		  op0 = SUBREG_REG (op0);
+		  op1 = tem;
+ 		  continue;
+ 		}
+ 	      break;
+ 	    }
 	  else
 	    break;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr77881.c b/gcc/testsuite/gcc.target/i386/pr77881.c
new file mode 100644
index 0000000..80d143f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr77881.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
+/* { dg-options "-O2" } */
+extern void baz(void);
+int
+foo (long long int a, long long int a2, int b)
+{
+    if (a < 0 || b)
+          baz ();
+}
+/* { dg-final { scan-assembler "js\[ \t\]\.L" } } */
+/* { dg-final { scan-assembler "jne\[ \t\]\.L" } } */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix PR77881: combine improvement
  2016-11-14  4:56   ` Michael Matz
@ 2016-11-15  4:10     ` Segher Boessenkool
  2016-11-16 15:05     ` Andreas Schwab
  1 sibling, 0 replies; 8+ messages in thread
From: Segher Boessenkool @ 2016-11-15  4:10 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

On Mon, Nov 14, 2016 at 05:56:49AM +0100, Michael Matz wrote:
> With this patch there are now no regressions on x86-64-linux (bootstrapped 
> with all languages+ada).  Okay for trunk?

I build cross-compilers for this for a whole bunch of archs, and built
Linux with that, to see what effect this has.  This is the code size
generated before and after the patch; "0" means something failed to
build (either the compiler, for the mips targets and tilegx after the
patch, or Linux):

                before    after
       alpha   5410232   5410264
         arc   3624274   3624338
         arm         0         0
       arm64   9086689   9082593
    blackfin   1963170   1963226
         c6x   2086879   2086911
        cris   2186162   2186130
         frv   3623264   3623264
       h8300   1052810   1052850
        i386   9723021   9721407
        ia64  15243432  15244136
        m32r   3415580   3415580
        m68k   3221030   3221070
  microblaze         0         0
        mips         0         0
      mips64         0         0
     mn10300   2349253   2349237
       nios2   3172110   3172182
      parisc   8241147   8241147
    parisc64   7197909   7196853
     powerpc   8396871   8396863
   powerpc64  14908442  14907866
        s390  12579952  12579568
          sh   2819700   2819716
     shnommu   1360512   1360512
       sparc   3734865   3734881
     sparc64   5932081   5932249
      tilegx  10839527         0
     tilepro  10092546  10092610
      x86_64  10349451  10349038
      xtensa   1766572   1766572

So the patch helps nicely on many targets.  I looked into the regressions;
they all seem to be just unlucky, noise, or bad rtx_cost.  The tilegx build
fail is a target bug building _negvsi2.o -- the backend accepts shifts by
70 or 87 bits, but the assembler doesn't ;-)

> @@ -11994,6 +12006,29 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
>  	  if (subreg_lowpart_p (op0)
>  	      && GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width)
>  	    ;
> + 	  else if (subreg_lowpart_p (op0)

Many of these lines start with a space before the tab, please fix.
Okay for trunk with that fixed.  Thank you!


Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix PR77881: combine improvement
  2016-11-14  4:56   ` Michael Matz
  2016-11-15  4:10     ` Segher Boessenkool
@ 2016-11-16 15:05     ` Andreas Schwab
  2016-11-18 15:43       ` Bin.Cheng
  1 sibling, 1 reply; 8+ messages in thread
From: Andreas Schwab @ 2016-11-16 15:05 UTC (permalink / raw)
  To: Michael Matz; +Cc: Segher Boessenkool, gcc-patches

On Nov 14 2016, Michael Matz <matz@suse.de> wrote:

> 	PR missed-optimization/77881
> 	* combine.c (simplify_comparison): Remove useless subregs
> 	also inside the loop, not just after it.
> 	(make_compound_operation): Recognize some subregs as being
> 	masking as well.

This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix PR77881: combine improvement
  2016-11-16 15:05     ` Andreas Schwab
@ 2016-11-18 15:43       ` Bin.Cheng
  2016-11-18 15:51         ` Michael Matz
  0 siblings, 1 reply; 8+ messages in thread
From: Bin.Cheng @ 2016-11-18 15:43 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Michael Matz, Segher Boessenkool, gcc-patches List

On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwab <schwab@suse.de> wrote:
> On Nov 14 2016, Michael Matz <matz@suse.de> wrote:
>
>>       PR missed-optimization/77881
>>       * combine.c (simplify_comparison): Remove useless subregs
>>       also inside the loop, not just after it.
>>       (make_compound_operation): Recognize some subregs as being
>>       masking as well.
>
> This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64.
Hi,
I can confirm that, also new PR opened for tracking.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422

Thanks,
bin
>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, schwab@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix PR77881: combine improvement
  2016-11-18 15:43       ` Bin.Cheng
@ 2016-11-18 15:51         ` Michael Matz
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Matz @ 2016-11-18 15:51 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Andreas Schwab, Segher Boessenkool, gcc-patches List

Hi,

On Fri, 18 Nov 2016, Bin.Cheng wrote:

> On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwab <schwab@suse.de> wrote:
> > On Nov 14 2016, Michael Matz <matz@suse.de> wrote:
> >
> >>       PR missed-optimization/77881
> >>       * combine.c (simplify_comparison): Remove useless subregs
> >>       also inside the loop, not just after it.
> >>       (make_compound_operation): Recognize some subregs as being
> >>       masking as well.
> >
> > This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64.
> Hi,
> I can confirm that, also new PR opened for tracking.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422

See PR78390 for a patch (comment #8) fixing the aarch64 problem.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-11-18 15:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-20 14:20 Fix PR77881: combine improvement Michael Matz
2016-10-20 19:08 ` Jeff Law
2016-11-12 11:48 ` Segher Boessenkool
2016-11-14  4:56   ` Michael Matz
2016-11-15  4:10     ` Segher Boessenkool
2016-11-16 15:05     ` Andreas Schwab
2016-11-18 15:43       ` Bin.Cheng
2016-11-18 15:51         ` Michael Matz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).