* Fix PR77881: combine improvement
From: Michael Matz @ 2016-10-20 14:20 UTC
To: gcc-patches

Hello,

As analyzed in the PR, combine is able to remove outer subregs that don't
do anything interesting in the context in which they are used
(simplify_comparison). But that currently happens outside of the loop
that retries simplifications when changes occur.

When we do it inside the loop as well, we get secondary simplifications
that currently only happen when the simplifiers are called multiple
times, e.g. when we start from three rather than from two instructions.
So right now we're in the curious position that more complicated code is
optimized better than simpler code, and the patch fixes this.

(FWIW: this replicates part of the responsible code rather than moving
it, because between the loop and the original place of simplification
other things happen that might themselves generate subregs.)

Regstrapping on x86-64 (all languages) in progress. Okay if that passes?

Ciao,
Michael.

	PR missed-optimization/77881
	* combine.c (simplify_comparison): Remove useless subregs
	also inside the loop, not just after it.

testsuite/
	* gcc.target/i386/pr77881.c: New test.
diff --git a/gcc/combine.c b/gcc/combine.c
index 2727683..58351ff 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11925,6 +11925,28 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
           if (subreg_lowpart_p (op0)
               && GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width)
             ;
+          else if (subreg_lowpart_p (op0)
+                   && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
+                   && GET_MODE_CLASS (GET_MODE (SUBREG_REG (op0))) == MODE_INT
+                   && (code == NE || code == EQ)
+                   && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0)))
+                       <= HOST_BITS_PER_WIDE_INT)
+                   && !paradoxical_subreg_p (op0)
+                   && (nonzero_bits (SUBREG_REG (op0),
+                                     GET_MODE (SUBREG_REG (op0)))
+                       & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+            {
+              /* Remove outer subregs that don't do anything.  */
+              tem = gen_lowpart (GET_MODE (SUBREG_REG (op0)), op1);
+
+              if ((nonzero_bits (tem, GET_MODE (SUBREG_REG (op0)))
+                   & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+                {
+                  op0 = SUBREG_REG (op0), op1 = tem;
+                  continue;
+                }
+              break;
+            }
           else
             break;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr77881.c b/gcc/testsuite/gcc.target/i386/pr77881.c
new file mode 100644
index 0000000..80d143f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr77881.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
+/* { dg-options "-O2" } */
+extern void baz(void);
+int
+foo (long long int a, long long int a2, int b)
+{
+  if (a < 0 || b)
+    baz ();
+}
+/* { dg-final { scan-assembler "js\[ \t\]\.L" } } */
+/* { dg-final { scan-assembler "jne\[ \t\]\.L" } } */
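The condition added to simplify_comparison can be illustrated outside of RTL. The sketch below is a plain C model, not GCC code: a 32-bit unsigned value stands in for SUBREG_REG (op0), its low 8 bits stand in for a QImode lowpart subreg, and all helper names (fits_in_narrow, eq_lowpart, and so on) are hypothetical. It shows why the subreg can be dropped for EQ/NE once no bit outside the narrow mode can be set in either operand, and why ordered comparisons are excluded: a signed "< 0" on the lowpart tests a different sign bit than on the inner register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for GET_MODE_MASK (QImode).  */
#define NARROW_MASK UINT32_C (0xff)

/* Models the patch's (nonzero_bits (...) & ~GET_MODE_MASK (...)) == 0:
   no bit outside the narrow mode may be set.  */
static inline bool
fits_in_narrow (uint32_t nonzero_bits)
{
  return (nonzero_bits & ~NARROW_MASK) == 0;
}

/* EQ through the lowpart subreg: only the low 8 bits take part.  */
static inline bool
eq_lowpart (uint32_t x, uint32_t c)
{
  return (x & NARROW_MASK) == (c & NARROW_MASK);
}

/* EQ on the inner register, i.e. after the subreg has been removed.  */
static inline bool
eq_inner (uint32_t x, uint32_t c)
{
  return x == c;
}

/* A signed "< 0" through the 8-bit lowpart looks at bit 7 ...  */
static inline bool
lt0_lowpart (uint32_t x)
{
  return (x & UINT32_C (0x80)) != 0;
}

/* ... while on the 32-bit inner register it looks at bit 31.  */
static inline bool
lt0_inner (uint32_t x)
{
  return (x & UINT32_C (0x80000000)) != 0;
}

/* Exhaustively confirm, for all operands that fit in 8 bits, that EQ
   gives the same answer with and without the subreg.  */
static inline void
check_eq_over_narrow_operands (void)
{
  for (uint32_t x = 0; x <= NARROW_MASK; x++)
    for (uint32_t c = 0; c <= NARROW_MASK; c++)
      {
        assert (fits_in_narrow (x) && fits_in_narrow (c));
        assert (eq_lowpart (x, c) == eq_inner (x, c));
      }
}
```

In the real code nonzero_bits is a conservative superset of the bits that may be set, so the check holds for any run-time value of the register; without it, an inner value such as 0x100 compares equal to 0 through the subreg but not after removing it.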
* Re: Fix PR77881: combine improvement
From: Jeff Law @ 2016-10-20 19:08 UTC
To: Michael Matz, gcc-patches

On 10/20/2016 08:20 AM, Michael Matz wrote:
> Hello,
>
> like analyzed in the PR, combine is able to remove outer subregs that
> don't do anything interesting in the context they are used
> (simplify_comparison). But that currently happens outside of the loop
> that retries simplifications if changes occurred.
>
> When we do that inside the loop as well we get secondary simplifications
> that currently only happen when calling the simplifiers multiple time,
> like when we start from three rather than from two instructions. So
> right now we're in the curious position that more complicated code is
> optimized better than simpler code and the patch fixes this.
>
> (FWIW: this replicates parts of rather than moves the responsible code,
> because between the loop and the original place of simplification other
> things happen that might itself generate subregs).
>
> Regstrapping on x86-64, all languages in process. Okay if that passes?
>
>
> Ciao,
> Michael.
>	PR missed-optimization/77881
>	* combine.c (simplify_comparison): Remove useless subregs
>	also inside the loop, not just after it.
>
> testsuite/
>	* gcc.target/i386/pr77881.c: New test.

LGTM.

I was a bit curious why you duplicated rather than factored, but it
looks like you simplified the copy a bit by not handling the paradoxical
subreg case.

There's an outside chance this might help a couple of BZs that I've
poked at in the past. I'll make a point of testing them again this
release cycle to see if your change improves them (I don't have the
numbers handy; I think they're P4/P5 regressions).

jeff
* Re: Fix PR77881: combine improvement
From: Segher Boessenkool @ 2016-11-12 11:48 UTC
To: Michael Matz; Cc: gcc-patches

Hi Michael,

On Thu, Oct 20, 2016 at 04:20:09PM +0200, Michael Matz wrote:
>	PR missed-optimization/77881
>	* combine.c (simplify_comparison): Remove useless subregs
>	also inside the loop, not just after it.
>
> testsuite/
>	* gcc.target/i386/pr77881.c: New test.

This isn't checked in yet; do you still want it?

Thanks,

Segher
* Re: Fix PR77881: combine improvement
From: Michael Matz @ 2016-11-14 4:56 UTC
To: Segher Boessenkool; Cc: gcc-patches

Hi,

On Sat, 12 Nov 2016, Segher Boessenkool wrote:

> Hi Michael,
>
> On Thu, Oct 20, 2016 at 04:20:09PM +0200, Michael Matz wrote:
> >	PR missed-optimization/77881
> >	* combine.c (simplify_comparison): Remove useless subregs
> >	also inside the loop, not just after it.
> >
> > testsuite/
> >	* gcc.target/i386/pr77881.c: New test.
>
> This isn't checked in yet, do you still want it?

Gnah, this fell through the cracks. I had to fix something else in
combine to keep it from regressing in the testsuite. The problem is that
removing the subregs enables further simplifications, which in turn
might not be expected downstream.

The particular problem was that originally the loop was left with a
(subreg:QI (and:SI (lshiftrt X 24) 255) 0), whose inner operation was in
turn recognized by make_compound_operation and transformed into an
extract. With the patch we now leave the loop with essentially
(subreg:QI (lshiftrt X 24) 0), which is of course the same masking and
hence the same extract, but make_compound_operation didn't know that.
With the amended patch it does.

I didn't get around to unifying the AND and SUBREG handling a bit more
(which I initially wanted to do before posting), so for now I'm handling
only the specific case I hit.

With this patch there are now no regressions on x86-64-linux
(bootstrapped with all languages+ada). Okay for trunk?

Ciao,
Michael.

	PR missed-optimization/77881
	* combine.c (simplify_comparison): Remove useless subregs
	also inside the loop, not just after it.
	(make_compound_operation): Recognize some subregs as being
	masking as well.

testsuite/
	* gcc.target/i386/pr77881.c: New test.
diff --git a/gcc/combine.c b/gcc/combine.c
index 6ffa387..0210685 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -8102,6 +8102,18 @@ make_compound_operation (rtx x, enum rtx_code in_code)
         rtx inner = SUBREG_REG (x), simplified;
         enum rtx_code subreg_code = in_code;
 
+        /* If the SUBREG is masking of a logical right shift,
+           make an extraction.  */
+        if (GET_CODE (inner) == LSHIFTRT
+            && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner))
+            && subreg_lowpart_p (x))
+          {
+            new_rtx = make_compound_operation (XEXP (inner, 0), next_code);
+            new_rtx = make_extraction (mode, new_rtx, 0, XEXP (inner, 1),
+                                       mode_width, 1, 0, in_code == COMPARE);
+            break;
+          }
+
         /* If in_code is COMPARE, it isn't always safe to pass it through
            to the recursive make_compound_operation call.  */
         if (subreg_code == COMPARE
@@ -11994,6 +12006,29 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
           if (subreg_lowpart_p (op0)
               && GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width)
             ;
+          else if (subreg_lowpart_p (op0)
+                   && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
+                   && GET_MODE_CLASS (GET_MODE (SUBREG_REG (op0))) == MODE_INT
+                   && (code == NE || code == EQ)
+                   && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0)))
+                       <= HOST_BITS_PER_WIDE_INT)
+                   && !paradoxical_subreg_p (op0)
+                   && (nonzero_bits (SUBREG_REG (op0),
+                                     GET_MODE (SUBREG_REG (op0)))
+                       & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+            {
+              /* Remove outer subregs that don't do anything.  */
+              tem = gen_lowpart (GET_MODE (SUBREG_REG (op0)), op1);
+
+              if ((nonzero_bits (tem, GET_MODE (SUBREG_REG (op0)))
+                   & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+                {
+                  op0 = SUBREG_REG (op0);
+                  op1 = tem;
+                  continue;
+                }
+              break;
+            }
           else
             break;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr77881.c b/gcc/testsuite/gcc.target/i386/pr77881.c
new file mode 100644
index 0000000..80d143f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr77881.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
+/* { dg-options "-O2" } */
+extern void baz(void);
+int
+foo (long long int a, long long int a2, int b)
+{
+  if (a < 0 || b)
+    baz ();
+}
+/* { dg-final { scan-assembler "js\[ \t\]\.L" } } */
+/* { dg-final { scan-assembler "jne\[ \t\]\.L" } } */
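The explanation in the message above rests on one identity: the low byte of (lshiftrt X 24) and the low byte of (and (lshiftrt X 24) 255) are both the 8-bit field of X starting at bit 24, i.e. the extract that make_compound_operation builds. A small C model of that identity follows. The names zero_extract32, subreg_of_and and lowpart8_of_lshiftrt are hypothetical; 32-bit unsigned arithmetic stands in for SImode, a cast to uint8_t stands in for the QImode lowpart subreg, and bits are assumed to be numbered from the least significant end (BITS_BIG_ENDIAN false).

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned bit-field extract: LEN bits of X starting at bit POS,
   roughly the meaning of (zero_extract X LEN POS) in this model.  */
static inline uint32_t
zero_extract32 (uint32_t x, unsigned len, unsigned pos)
{
  assert (len >= 1 && len < 32 && pos < 32);
  return (x >> pos) & ((UINT32_C (1) << len) - 1);
}

/* (subreg:QI (and:SI (lshiftrt:SI X 24) 255) 0): the form the loop
   used to be left with before the patch.  */
static inline uint8_t
subreg_of_and (uint32_t x)
{
  return (uint8_t) ((x >> 24) & 255);
}

/* (subreg:QI (lshiftrt:SI X POS) 0): the form left after the patch,
   with the shift count as a parameter.  */
static inline uint8_t
lowpart8_of_lshiftrt (uint32_t x, unsigned pos)
{
  assert (pos < 32);
  return (uint8_t) (x >> pos);
}
```

Because a logical right shift fills with zeros, taking the low 8 bits of X >> POS already yields the 8-bit field at POS (for positions near the top, the upper bits of the field are simply zero). That is the case the new make_compound_operation hunk hands to make_extraction, with the shift count as the position and mode_width as the length.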
* Re: Fix PR77881: combine improvement
From: Segher Boessenkool @ 2016-11-15 4:10 UTC
To: Michael Matz; Cc: gcc-patches

On Mon, Nov 14, 2016 at 05:56:49AM +0100, Michael Matz wrote:
> With this patch there are now no regressions on x86-64-linux (bootstrapped
> with all languages+ada).  Okay for trunk?

I built cross-compilers with this patch for a whole bunch of archs, and
built Linux with them, to see what effect it has. This is the code size
generated before and after the patch; "0" means something failed to
build (either the compiler, for the mips targets and for tilegx after
the patch, or Linux):

              before     after
alpha        5410232   5410264
arc          3624274   3624338
arm                0         0
arm64        9086689   9082593
blackfin     1963170   1963226
c6x          2086879   2086911
cris         2186162   2186130
frv          3623264   3623264
h8300        1052810   1052850
i386         9723021   9721407
ia64        15243432  15244136
m32r         3415580   3415580
m68k         3221030   3221070
microblaze         0         0
mips               0         0
mips64             0         0
mn10300      2349253   2349237
nios2        3172110   3172182
parisc       8241147   8241147
parisc64     7197909   7196853
powerpc      8396871   8396863
powerpc64   14908442  14907866
s390        12579952  12579568
sh           2819700   2819716
shnommu      1360512   1360512
sparc        3734865   3734881
sparc64      5932081   5932249
tilegx      10839527         0
tilepro     10092546  10092610
x86_64      10349451  10349038
xtensa       1766572   1766572

So the patch helps nicely on many targets. I looked into the
regressions; they all seem to be just bad luck, noise, or bad rtx_cost.

The tilegx build failure is a target bug when building _negvsi2.o: the
backend accepts shifts by 70 or 87 bits, but the assembler doesn't ;-)

> @@ -11994,6 +12006,29 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
>            if (subreg_lowpart_p (op0)
>                && GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width)
>              ;
> +          else if (subreg_lowpart_p (op0)

Many of these lines start with a space before the tab; please fix.
Okay for trunk with that fixed.

Thank you!


Segher
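To read the code-size table in the message above at a glance, the per-target changes can be recomputed mechanically. This is a throwaway C sketch, not part of the patch or of any GCC tooling: the struct and function names are made up, the numbers are copied from the table, and rows where either build failed (reported as 0) are skipped rather than counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the table: Linux code size in bytes before and after the
   patch; 0 means the compiler or kernel build failed.  */
struct size_row
{
  const char *target;
  long before;
  long after;
};

static const struct size_row rows[] = {
  { "alpha", 5410232, 5410264 },      { "arc", 3624274, 3624338 },
  { "arm", 0, 0 },                    { "arm64", 9086689, 9082593 },
  { "blackfin", 1963170, 1963226 },   { "c6x", 2086879, 2086911 },
  { "cris", 2186162, 2186130 },       { "frv", 3623264, 3623264 },
  { "h8300", 1052810, 1052850 },      { "i386", 9723021, 9721407 },
  { "ia64", 15243432, 15244136 },     { "m32r", 3415580, 3415580 },
  { "m68k", 3221030, 3221070 },       { "microblaze", 0, 0 },
  { "mips", 0, 0 },                   { "mips64", 0, 0 },
  { "mn10300", 2349253, 2349237 },    { "nios2", 3172110, 3172182 },
  { "parisc", 8241147, 8241147 },     { "parisc64", 7197909, 7196853 },
  { "powerpc", 8396871, 8396863 },    { "powerpc64", 14908442, 14907866 },
  { "s390", 12579952, 12579568 },     { "sh", 2819700, 2819716 },
  { "shnommu", 1360512, 1360512 },    { "sparc", 3734865, 3734881 },
  { "sparc64", 5932081, 5932249 },    { "tilegx", 10839527, 0 },
  { "tilepro", 10092546, 10092610 },  { "x86_64", 10349451, 10349038 },
  { "xtensa", 1766572, 1766572 },
};

#define N_ROWS (sizeof rows / sizeof rows[0])

/* Store AFTER - BEFORE in *DELTA; return false if either build failed.  */
static inline bool
size_delta (const struct size_row *r, long *delta)
{
  assert (r != NULL && delta != NULL);
  if (r->before == 0 || r->after == 0)
    return false;
  *delta = r->after - r->before;
  return true;
}

/* Count targets that shrank, grew or stayed the same, and sum the
   deltas, ignoring rows with a failed build.  */
static inline void
summarize (int *shrank, int *grew, int *same, long *net)
{
  *shrank = *grew = *same = 0;
  *net = 0;
  for (size_t i = 0; i < N_ROWS; i++)
    {
      long d;
      if (!size_delta (&rows[i], &d))
        continue;
      *net += d;
      if (d < 0)
        ++*shrank;
      else if (d > 0)
        ++*grew;
      else
        ++*same;
    }
}

/* Print one line per target with its size change in bytes.  */
static inline void
print_deltas (void)
{
  for (size_t i = 0; i < N_ROWS; i++)
    {
      long d;
      if (size_delta (&rows[i], &d))
        printf ("%-10s %+6ld\n", rows[i].target, d);
      else
        printf ("%-10s (build failed)\n", rows[i].target);
    }
}
```

With these numbers, summarize reports 9 targets that got smaller, 12 that got larger and 5 that are unchanged among the 26 targets that built both times, for a net change of -6891 bytes; arm64 alone accounts for -4096 of that.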
* Re: Fix PR77881: combine improvement
From: Andreas Schwab @ 2016-11-16 15:05 UTC
To: Michael Matz; Cc: Segher Boessenkool, gcc-patches

On Nov 14 2016, Michael Matz <matz@suse.de> wrote:

>	PR missed-optimization/77881
>	* combine.c (simplify_comparison): Remove useless subregs
>	also inside the loop, not just after it.
>	(make_compound_operation): Recognize some subregs as being
>	masking as well.

This breaks the gcc.c-torture/execute/cbrt.c execution test on aarch64.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
* Re: Fix PR77881: combine improvement
From: Bin.Cheng @ 2016-11-18 15:43 UTC
To: Andreas Schwab; Cc: Michael Matz, Segher Boessenkool, gcc-patches List

On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwab <schwab@suse.de> wrote:
> On Nov 14 2016, Michael Matz <matz@suse.de> wrote:
>
>>	PR missed-optimization/77881
>>	* combine.c (simplify_comparison): Remove useless subregs
>>	also inside the loop, not just after it.
>>	(make_compound_operation): Recognize some subregs as being
>>	masking as well.
>
> This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64.

Hi,
I can confirm that; I have also opened a new PR to track it:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422

Thanks,
bin
* Re: Fix PR77881: combine improvement
From: Michael Matz @ 2016-11-18 15:51 UTC
To: Bin.Cheng; Cc: Andreas Schwab, Segher Boessenkool, gcc-patches List

Hi,

On Fri, 18 Nov 2016, Bin.Cheng wrote:

> On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwab <schwab@suse.de> wrote:
> > On Nov 14 2016, Michael Matz <matz@suse.de> wrote:
> >
> >>	PR missed-optimization/77881
> >>	* combine.c (simplify_comparison): Remove useless subregs
> >>	also inside the loop, not just after it.
> >>	(make_compound_operation): Recognize some subregs as being
> >>	masking as well.
> >
> > This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64.
> Hi,
> I can confirm that, also new PR opened for tracking.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422

See PR78390 for a patch (comment #8) fixing the aarch64 problem.

Ciao,
Michael.