* Fix PR77881: combine improvement
@ 2016-10-20 14:20 Michael Matz
2016-10-20 19:08 ` Jeff Law
2016-11-12 11:48 ` Segher Boessenkool
0 siblings, 2 replies; 8+ messages in thread
From: Michael Matz @ 2016-10-20 14:20 UTC (permalink / raw)
To: gcc-patches
Hello,
like analyzed in the PR, combine is able to remove outer subregs that
don't do anything interesting in the context they are used
(simplify_comparison). But that currently happens outside of the loop
that retries simplifications if changes occurred.
When we do that inside the loop as well we get secondary simplifications
that currently only happen when calling the simplifiers multiple time,
like when we start from three rather than from two instructions. So
right now we're in the curious position that more complicated code is
optimized better than simpler code and the patch fixes this.
(FWIW: this replicates parts of rather than moves the responsible code,
because between the loop and the original place of simplification other
things happen that might itself generate subregs).
Regstrapping on x86-64, all languages in process. Okay if that passes?
Ciao,
Michael.
PR missed-optimization/77881
* combine.c (simplify_comparison): Remove useless subregs
also inside the loop, not just after it.
testsuite/
* gcc.target/i386/pr77881.c: New test.
diff --git a/gcc/combine.c b/gcc/combine.c
index 2727683..58351ff 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11925,6 +11925,28 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
if (subreg_lowpart_p (op0)
&& GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width)
;
+ else if (subreg_lowpart_p (op0)
+ && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
+ && GET_MODE_CLASS (GET_MODE (SUBREG_REG (op0))) == MODE_INT
+ && (code == NE || code == EQ)
+ && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0)))
+ <= HOST_BITS_PER_WIDE_INT)
+ && !paradoxical_subreg_p (op0)
+ && (nonzero_bits (SUBREG_REG (op0),
+ GET_MODE (SUBREG_REG (op0)))
+ & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+ {
+ /* Remove outer subregs that don't do anything. */
+ tem = gen_lowpart (GET_MODE (SUBREG_REG (op0)), op1);
+
+ if ((nonzero_bits (tem, GET_MODE (SUBREG_REG (op0)))
+ & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+ {
+ op0 = SUBREG_REG (op0), op1 = tem;
+ continue;
+ }
+ break;
+ }
else
break;
diff --git a/gcc/testsuite/gcc.target/i386/pr77881.c b/gcc/testsuite/gcc.target/i386/pr77881.c
new file mode 100644
index 0000000..80d143f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr77881.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
+/* { dg-options "-O2" } */
+extern void baz(void);
+int
+foo (long long int a, long long int a2, int b)
+{
+ if (a < 0 || b)
+ baz ();
+}
+/* { dg-final { scan-assembler "js\[ \t\]\.L" } } */
+/* { dg-final { scan-assembler "jne\[ \t\]\.L" } } */
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Fix PR77881: combine improvement
2016-10-20 14:20 Fix PR77881: combine improvement Michael Matz
@ 2016-10-20 19:08 ` Jeff Law
2016-11-12 11:48 ` Segher Boessenkool
1 sibling, 0 replies; 8+ messages in thread
From: Jeff Law @ 2016-10-20 19:08 UTC (permalink / raw)
To: Michael Matz, gcc-patches
On 10/20/2016 08:20 AM, Michael Matz wrote:
> Hello,
>
> like analyzed in the PR, combine is able to remove outer subregs that
> don't do anything interesting in the context they are used
> (simplify_comparison). But that currently happens outside of the loop
> that retries simplifications if changes occurred.
>
> When we do that inside the loop as well we get secondary simplifications
> that currently only happen when calling the simplifiers multiple time,
> like when we start from three rather than from two instructions. So
> right now we're in the curious position that more complicated code is
> optimized better than simpler code and the patch fixes this.
>
> (FWIW: this replicates parts of rather than moves the responsible code,
> because between the loop and the original place of simplification other
> things happen that might itself generate subregs).
>
> Regstrapping on x86-64, all languages in process. Okay if that passes?
>
>
> Ciao,
> Michael.
> PR missed-optimization/77881
> * combine.c (simplify_comparison): Remove useless subregs
> also inside the loop, not just after it.
>
> testsuite/
> * gcc.target/i386/pr77881.c: New test.
LGTM. I was a bit curious why you duplicated rather than factoring, but
it looks like you simplified the copy a bit by not handling the
paradoxical subreg case.
There's an outside chance this might help a couple BZs that I've poked
at in the past. I'll make a point to test them again this release cycle
to see if your change improves them (I don't have the #s handy, I think
they're P4/P5 regressions).
jeff
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Fix PR77881: combine improvement
2016-10-20 14:20 Fix PR77881: combine improvement Michael Matz
2016-10-20 19:08 ` Jeff Law
@ 2016-11-12 11:48 ` Segher Boessenkool
2016-11-14 4:56 ` Michael Matz
1 sibling, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2016-11-12 11:48 UTC (permalink / raw)
To: Michael Matz; +Cc: gcc-patches
Hi Michael,
On Thu, Oct 20, 2016 at 04:20:09PM +0200, Michael Matz wrote:
> PR missed-optimization/77881
> * combine.c (simplify_comparison): Remove useless subregs
> also inside the loop, not just after it.
>
> testsuite/
> * gcc.target/i386/pr77881.c: New test.
This isn't checked in yet, do you still want it?
Thanks,
Segher
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Fix PR77881: combine improvement
2016-11-12 11:48 ` Segher Boessenkool
@ 2016-11-14 4:56 ` Michael Matz
2016-11-15 4:10 ` Segher Boessenkool
2016-11-16 15:05 ` Andreas Schwab
0 siblings, 2 replies; 8+ messages in thread
From: Michael Matz @ 2016-11-14 4:56 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: gcc-patches
Hi,
On Sat, 12 Nov 2016, Segher Boessenkool wrote:
> Hi Michael,
>
> On Thu, Oct 20, 2016 at 04:20:09PM +0200, Michael Matz wrote:
> > PR missed-optimization/77881
> > * combine.c (simplify_comparison): Remove useless subregs
> > also inside the loop, not just after it.
> >
> > testsuite/
> > * gcc.target/i386/pr77881.c: New test.
>
> This isn't checked in yet, do you still want it?
Gnah, fell through the cracks. I had to fix something else in combine to
make it not regress in the testsuite. The problem is that removing the
subregs enables further simplifications which in turn might not be
expected down-stream. The particular problem was that originally the loop
was left with an (subreg:QI (and:SI (lrshift X 24) 255) 0), whose inner op
in turn was recognized by make_compound_operation and transformed into an
extract.
With the patch we leave the loop now with essentially
(subreg:QI (lrshift X 24) 0)
which of course is just the same masking and hence extract, but
make_compound_operation didn't know. With the amended patch it does. I
didn't come around making the AND and SUBREG handling a bit more common
(which I initially wanted to do before posting), so for now I'm handling
only the specific case I hit.
With this patch there are now no regressions on x86-64-linux (bootstrapped
with all languages+ada). Okay for trunk?
Ciao,
Michael.
PR missed-optimization/77881
* combine.c (simplify_comparison): Remove useless subregs
also inside the loop, not just after it.
(make_compound_operation): Recognize some subregs as being
masking as well.
testsuite/
* gcc.target/i386/pr77881.c: New test.
diff --git a/gcc/combine.c b/gcc/combine.c
index 6ffa387..0210685 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -8102,6 +8102,18 @@ make_compound_operation (rtx x, enum rtx_code in_code)
rtx inner = SUBREG_REG (x), simplified;
enum rtx_code subreg_code = in_code;
+ /* If the SUBREG is masking of a logical right shift,
+ make an extraction. */
+ if (GET_CODE (inner) == LSHIFTRT
+ && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner))
+ && subreg_lowpart_p (x))
+ {
+ new_rtx = make_compound_operation (XEXP (inner, 0), next_code);
+ new_rtx = make_extraction (mode, new_rtx, 0, XEXP (inner, 1),
+ mode_width, 1, 0, in_code == COMPARE);
+ break;
+ }
+
/* If in_code is COMPARE, it isn't always safe to pass it through
to the recursive make_compound_operation call. */
if (subreg_code == COMPARE
@@ -11994,6 +12006,29 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
if (subreg_lowpart_p (op0)
&& GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width)
;
+ else if (subreg_lowpart_p (op0)
+ && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
+ && GET_MODE_CLASS (GET_MODE (SUBREG_REG (op0))) == MODE_INT
+ && (code == NE || code == EQ)
+ && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0)))
+ <= HOST_BITS_PER_WIDE_INT)
+ && !paradoxical_subreg_p (op0)
+ && (nonzero_bits (SUBREG_REG (op0),
+ GET_MODE (SUBREG_REG (op0)))
+ & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+ {
+ /* Remove outer subregs that don't do anything. */
+ tem = gen_lowpart (GET_MODE (SUBREG_REG (op0)), op1);
+
+ if ((nonzero_bits (tem, GET_MODE (SUBREG_REG (op0)))
+ & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+ {
+ op0 = SUBREG_REG (op0);
+ op1 = tem;
+ continue;
+ }
+ break;
+ }
else
break;
diff --git a/gcc/testsuite/gcc.target/i386/pr77881.c b/gcc/testsuite/gcc.target/i386/pr77881.c
new file mode 100644
index 0000000..80d143f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr77881.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
+/* { dg-options "-O2" } */
+extern void baz(void);
+int
+foo (long long int a, long long int a2, int b)
+{
+ if (a < 0 || b)
+ baz ();
+}
+/* { dg-final { scan-assembler "js\[ \t\]\.L" } } */
+/* { dg-final { scan-assembler "jne\[ \t\]\.L" } } */
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Fix PR77881: combine improvement
2016-11-14 4:56 ` Michael Matz
@ 2016-11-15 4:10 ` Segher Boessenkool
2016-11-16 15:05 ` Andreas Schwab
1 sibling, 0 replies; 8+ messages in thread
From: Segher Boessenkool @ 2016-11-15 4:10 UTC (permalink / raw)
To: Michael Matz; +Cc: gcc-patches
On Mon, Nov 14, 2016 at 05:56:49AM +0100, Michael Matz wrote:
> With this patch there are now no regressions on x86-64-linux (bootstrapped
> with all languages+ada). Okay for trunk?
I build cross-compilers for this for a whole bunch of archs, and built
Linux with that, to see what effect this has. This is the code size
generated before and after the patch; "0" means something failed to
build (either the compiler, for the mips targets and tilegx after the
patch, or Linux):
before after
alpha 5410232 5410264
arc 3624274 3624338
arm 0 0
arm64 9086689 9082593
blackfin 1963170 1963226
c6x 2086879 2086911
cris 2186162 2186130
frv 3623264 3623264
h8300 1052810 1052850
i386 9723021 9721407
ia64 15243432 15244136
m32r 3415580 3415580
m68k 3221030 3221070
microblaze 0 0
mips 0 0
mips64 0 0
mn10300 2349253 2349237
nios2 3172110 3172182
parisc 8241147 8241147
parisc64 7197909 7196853
powerpc 8396871 8396863
powerpc64 14908442 14907866
s390 12579952 12579568
sh 2819700 2819716
shnommu 1360512 1360512
sparc 3734865 3734881
sparc64 5932081 5932249
tilegx 10839527 0
tilepro 10092546 10092610
x86_64 10349451 10349038
xtensa 1766572 1766572
So the patch helps nicely on many targets. I looked into the regressions;
they all seem to be just unlucky, noise, or bad rtx_cost. The tilegx build
fail is a target bug building _negvsi2.o -- the backend accepts shifts by
70 or 87 bits, but the assembler doesn't ;-)
> @@ -11994,6 +12006,29 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
> if (subreg_lowpart_p (op0)
> && GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width)
> ;
> + else if (subreg_lowpart_p (op0)
Many of these lines start with a space before the tab, please fix.
Okay for trunk with that fixed. Thank you!
Segher
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Fix PR77881: combine improvement
2016-11-14 4:56 ` Michael Matz
2016-11-15 4:10 ` Segher Boessenkool
@ 2016-11-16 15:05 ` Andreas Schwab
2016-11-18 15:43 ` Bin.Cheng
1 sibling, 1 reply; 8+ messages in thread
From: Andreas Schwab @ 2016-11-16 15:05 UTC (permalink / raw)
To: Michael Matz; +Cc: Segher Boessenkool, gcc-patches
On Nov 14 2016, Michael Matz <matz@suse.de> wrote:
> PR missed-optimization/77881
> * combine.c (simplify_comparison): Remove useless subregs
> also inside the loop, not just after it.
> (make_compound_operation): Recognize some subregs as being
> masking as well.
This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64.
Andreas.
--
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Fix PR77881: combine improvement
2016-11-16 15:05 ` Andreas Schwab
@ 2016-11-18 15:43 ` Bin.Cheng
2016-11-18 15:51 ` Michael Matz
0 siblings, 1 reply; 8+ messages in thread
From: Bin.Cheng @ 2016-11-18 15:43 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Michael Matz, Segher Boessenkool, gcc-patches List
On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwab <schwab@suse.de> wrote:
> On Nov 14 2016, Michael Matz <matz@suse.de> wrote:
>
>> PR missed-optimization/77881
>> * combine.c (simplify_comparison): Remove useless subregs
>> also inside the loop, not just after it.
>> (make_compound_operation): Recognize some subregs as being
>> masking as well.
>
> This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64.
Hi,
I can confirm that, also new PR opened for tracking.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422
Thanks,
bin
>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, schwab@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Fix PR77881: combine improvement
2016-11-18 15:43 ` Bin.Cheng
@ 2016-11-18 15:51 ` Michael Matz
0 siblings, 0 replies; 8+ messages in thread
From: Michael Matz @ 2016-11-18 15:51 UTC (permalink / raw)
To: Bin.Cheng; +Cc: Andreas Schwab, Segher Boessenkool, gcc-patches List
Hi,
On Fri, 18 Nov 2016, Bin.Cheng wrote:
> On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwab <schwab@suse.de> wrote:
> > On Nov 14 2016, Michael Matz <matz@suse.de> wrote:
> >
> >> PR missed-optimization/77881
> >> * combine.c (simplify_comparison): Remove useless subregs
> >> also inside the loop, not just after it.
> >> (make_compound_operation): Recognize some subregs as being
> >> masking as well.
> >
> > This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64.
> Hi,
> I can confirm that, also new PR opened for tracking.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422
See PR78390 for a patch (comment #8) fixing the aarch64 problem.
Ciao,
Michael.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-11-18 15:51 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-20 14:20 Fix PR77881: combine improvement Michael Matz
2016-10-20 19:08 ` Jeff Law
2016-11-12 11:48 ` Segher Boessenkool
2016-11-14 4:56 ` Michael Matz
2016-11-15 4:10 ` Segher Boessenkool
2016-11-16 15:05 ` Andreas Schwab
2016-11-18 15:43 ` Bin.Cheng
2016-11-18 15:51 ` Michael Matz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).