* [Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero
[not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
@ 2020-03-13 3:09 ` felix.yang at huawei dot com
2020-03-13 5:00 ` pinskia at gcc dot gnu.org
` (11 subsequent siblings)
12 siblings, 0 replies; 13+ messages in thread
From: felix.yang at huawei dot com @ 2020-03-13 3:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026
--- Comment #2 from Fei Yang <felix.yang at huawei dot com> ---
The test case is reduced from a SPEC CPU 2017 benchmark.
int FastBoard::count_pliberties(const int i) {
    return count_neighbours(EMPTY, i);
}
// count neighbours of color c at vertex v
int FastBoard::count_neighbours(const int c, const int v) {
    assert(c == WHITE || c == BLACK || c == EMPTY);
    return (m_neighbours[v] >> (NBR_SHIFT * c)) & 7;
}
bool FastBoard::self_atari(int color, int vertex) {
    assert(get_square(vertex) == FastBoard::EMPTY);
    // 1) count new liberties, if we add 2 or more we're safe
    if (count_pliberties(vertex) >= 2) {
        return false;
    }
    ......
* [Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero
[not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
2020-03-13 3:09 ` [Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero felix.yang at huawei dot com
@ 2020-03-13 5:00 ` pinskia at gcc dot gnu.org
2020-03-16 3:34 ` felix.yang at huawei dot com
` (10 subsequent siblings)
12 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2020-03-13 5:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think part of this optimization should be done at the tree level.
* [Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero
[not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
2020-03-13 3:09 ` [Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero felix.yang at huawei dot com
2020-03-13 5:00 ` pinskia at gcc dot gnu.org
@ 2020-03-16 3:34 ` felix.yang at huawei dot com
2020-03-20 14:23 ` wdijkstr at arm dot com
` (9 subsequent siblings)
12 siblings, 0 replies; 13+ messages in thread
From: felix.yang at huawei dot com @ 2020-03-16 3:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026
--- Comment #4 from Fei Yang <felix.yang at huawei dot com> ---
(In reply to Fei Yang from comment #0)
> Created attachment 47966 [details]
> proposed patch to fix this issue
>
> Simple test case:
> int
> foo (int c, int d)
> {
> int a = (c >> d) & 7;
>
> if (a >= 2) {
> return 1;
> }
>
> return 0;
> }
>
> Compile option: gcc -S -O2 test.c
>
>
> On aarch64, GCC trunk emits 4 instructions:
> asr w0, w0, 8
> tst w0, 6
> cset w0, ne
> ret
>
> which can be further simplified into:
> tst x0, 1536
> cset w0, ne
> ret
>
> We see the same issue on other targets such as i386 and x86-64.
>
> Attached please find proposed patch for this issue.
The previously posted test case is not correct.
Test case should be:
int fifth (int c)
{
  int a = (c >> 8) & 7;
  if (a >= 2) {
    return 1;
  } else {
    return 0;
  }
}
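To see why the shift is redundant in the corrected test case, here is a
minimal sketch. It is not part of the attached patch; fifth_masked and
check_fifth are names made up for illustration. Since a is in [0, 7],
a >= 2 holds exactly when (a & 6) != 0, and 6 << 8 is 1536. The check
sticks to non-negative inputs so it does not rely on how >> treats
negative values.

```c
#include <assert.h>

/* Test case from this comment: shift, mask, then range test. */
static int fifth(int c)
{
    int a = (c >> 8) & 7;
    return a >= 2;
}

/* Hand-written shift-free form (illustrative, not compiler output).
   a >= 2 with a in [0, 7] means bit 1 or bit 2 of a is set; those are
   bits 9 and 10 of c, and 6 << 8 == 1536. */
static int fifth_masked(int c)
{
    return (c & 1536) != 0;
}

/* Hypothetical helper: compare both forms for every value that can
   reach bits 8..10, plus one extra bit above them. */
static void check_fifth(void)
{
    for (int c = 0; c < (1 << 12); c++)
        assert(fifth(c) == fifth_masked(c));
}
```

Calling check_fifth() from a small driver should complete without an
assertion failure if the two forms agree.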
* [Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero
[not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
` (2 preceding siblings ...)
2020-03-16 3:34 ` felix.yang at huawei dot com
@ 2020-03-20 14:23 ` wdijkstr at arm dot com
2021-07-25 1:14 ` [Bug tree-optimization/94026] " pinskia at gcc dot gnu.org
` (8 subsequent siblings)
12 siblings, 0 replies; 13+ messages in thread
From: wdijkstr at arm dot com @ 2020-03-20 14:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026
Wilco <wdijkstr at arm dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |wdijkstr at arm dot com
--- Comment #5 from Wilco <wdijkstr at arm dot com> ---
(In reply to Fei Yang from comment #4)
> (In reply to Fei Yang from comment #0)
> > Created attachment 47966 [details]
> > proposed patch to fix this issue
> >
> > Simple test case:
> > int
> > foo (int c, int d)
> > {
> > int a = (c >> d) & 7;
> >
> > if (a >= 2) {
> > return 1;
> > }
> >
> > return 0;
> > }
> >
> > Compile option: gcc -S -O2 test.c
> >
> >
> > On aarch64, GCC trunk emits 4 instructions:
> > asr w0, w0, 8
> > tst w0, 6
> > cset w0, ne
> > ret
> >
> > which can be further simplified into:
> > tst x0, 1536
> > cset w0, ne
> > ret
> >
> > We see the same issue on other targets such as i386 and x86-64.
> >
> > Attached please find proposed patch for this issue.
>
> The previously posted test case is not correct.
> Test case should be:
> int fifth (int c)
> {
> int a = (c >> 8) & 7;
>
> if (a >= 2) {
> return 1;
> } else {
> return 0;
> }
> }
Simpler cases are:
int f1(int x) { return ((x >> 8) & 6) != 0; }
int f2(int x) { return ((x << 2) & 24) != 0; }
int f3(unsigned x) { return ((x << 2) & 15) != 0; }
int f4(unsigned x) { return ((x >> 2) & 14) != 0; }
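For reference, a sketch of what the shift-free forms of these four
functions could look like. The *_masked functions and check_simpler are
hypothetical names, and the masks are worked out by hand rather than
taken from compiler output: a right shift by N moves the mask left by N,
and a left shift by N moves it right by N, dropping mask bits that the
shift always fills with zeros.

```c
#include <assert.h>

static int f1(int x) { return ((x >> 8) & 6) != 0; }
static int f2(int x) { return ((x << 2) & 24) != 0; }
static int f3(unsigned x) { return ((x << 2) & 15) != 0; }
static int f4(unsigned x) { return ((x >> 2) & 14) != 0; }

/* Hand-derived equivalents (illustrative only). */
static int f1_masked(int x) { return (x & 1536) != 0; }     /* 6 << 8 */
static int f2_masked(int x) { return (x & 6) != 0; }        /* 24 >> 2 */
static int f3_masked(unsigned x) { return (x & 3) != 0; }   /* 15 >> 2; bits 0..1 of x << 2 are zero */
static int f4_masked(unsigned x) { return (x & 56) != 0; }  /* 14 << 2 */

/* Hypothetical helper: compare over small non-negative inputs, which
   avoids signed overflow in x << 2 and implementation-defined >>. */
static void check_simpler(void)
{
    for (int x = 0; x < (1 << 12); x++) {
        assert(f1(x) == f1_masked(x));
        assert(f2(x) == f2_masked(x));
        assert(f3((unsigned)x) == f3_masked((unsigned)x));
        assert(f4((unsigned)x) == f4_masked((unsigned)x));
    }
}
```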
* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
[not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
` (3 preceding siblings ...)
2020-03-20 14:23 ` wdijkstr at arm dot com
@ 2021-07-25 1:14 ` pinskia at gcc dot gnu.org
2022-06-24 15:48 ` segher at gcc dot gnu.org
` (7 subsequent siblings)
12 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-07-25 1:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|rtl-optimization |tree-optimization
Ever confirmed|0 |1
Severity|normal |enhancement
Status|UNCONFIRMED |NEW
See Also| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21137
Last reconfirmed| |2021-07-25
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed. This is related to PR 21137, but the PR 21137 transformation is
currently done only in fold and has not been moved to match.pd.
* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
[not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
` (4 preceding siblings ...)
2021-07-25 1:14 ` [Bug tree-optimization/94026] " pinskia at gcc dot gnu.org
@ 2022-06-24 15:48 ` segher at gcc dot gnu.org
2022-06-24 16:30 ` law at gcc dot gnu.org
` (6 subsequent siblings)
12 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2022-06-24 15:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026
--- Comment #7 from Segher Boessenkool <segher at gcc dot gnu.org> ---
For Power, both the original testcase and the one in comment 5 generate perfect
code, for all -mcpu= I tested. Should this be a target bug?
* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
[not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
` (5 preceding siblings ...)
2022-06-24 15:48 ` segher at gcc dot gnu.org
@ 2022-06-24 16:30 ` law at gcc dot gnu.org
2022-06-24 17:30 ` segher at gcc dot gnu.org
` (5 subsequent siblings)
12 siblings, 0 replies; 13+ messages in thread
From: law at gcc dot gnu.org @ 2022-06-24 16:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026
--- Comment #8 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I don't think so -- the goal here is to optimize this in gimple so that all
targets benefit rather than every target having to customize a solution for
this idiom.
If Roger's patch is sound you might even be able to simplify the ppc backend
ever so slightly.
* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
[not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
` (6 preceding siblings ...)
2022-06-24 16:30 ` law at gcc dot gnu.org
@ 2022-06-24 17:30 ` segher at gcc dot gnu.org
2022-06-24 18:07 ` segher at gcc dot gnu.org
` (4 subsequent siblings)
12 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2022-06-24 17:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026
--- Comment #9 from Segher Boessenkool <segher at gcc dot gnu.org> ---
This is all handled in combine, nothing is specific to rs6000 (only the
description of all of our insns is, of course, but there is really no way
around that, nor should there be :-) )
Why does combine not optimise this for Arm? Of course it would be good if
this would be optimised early as well, but that does not mean we should not
try to optimise it late as well!
* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
[not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
` (7 preceding siblings ...)
2022-06-24 17:30 ` segher at gcc dot gnu.org
@ 2022-06-24 18:07 ` segher at gcc dot gnu.org
2022-06-24 20:06 ` segher at gcc dot gnu.org
` (3 subsequent siblings)
12 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2022-06-24 18:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026
--- Comment #10 from Segher Boessenkool <segher at gcc dot gnu.org> ---
So on Arm we get
Trying 6 -> 8:
6: r119:SI=r123:SI>>0x8
REG_DEAD r123:SI
8: {cc:CC_NZ=cmp(r119:SI&0x6,0);clobber scratch;}
REG_DEAD r119:SI
Failed to match this instruction:
(parallel [
(set (reg:CC_NZ 100 cc)
(compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 123)
(const_int 8 [0x8]))
(const_int 6 [0x6]))
(const_int 0 [0])))
(clobber (scratch:SI))
])
Failed to match this instruction:
(set (reg:CC_NZ 100 cc)
(compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 123)
(const_int 8 [0x8]))
(const_int 6 [0x6]))
(const_int 0 [0])))
instead of something like
(set (reg:CC_NZ 100 cc)
(compare:CC_NZ (and:SI (reg:SI 123)
(const_int 1536))
(const_int 0)))
which is correct for every CC mode even, not just NZ?
* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
[not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
` (8 preceding siblings ...)
2022-06-24 18:07 ` segher at gcc dot gnu.org
@ 2022-06-24 20:06 ` segher at gcc dot gnu.org
2022-06-27 6:45 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
12 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2022-06-24 20:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026
--- Comment #11 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Wrt rs6000: we have shift+mask+compare in just one insn (it is basic powerpc),
and our
(define_insn "*and<mode>3_imm_dot_shifted"
pattern outputs this as just an "andi." insn when it can. But indeed the shift
wasn't optimised away for us either.
* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
[not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
` (9 preceding siblings ...)
2022-06-24 20:06 ` segher at gcc dot gnu.org
@ 2022-06-27 6:45 ` cvs-commit at gcc dot gnu.org
2022-07-03 20:55 ` roger at nextmovesoftware dot com
2022-08-09 17:57 ` cvs-commit at gcc dot gnu.org
12 siblings, 0 replies; 13+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-06-27 6:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026
--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:f3f73e86ec8613f176db3e52bbfbfbb9636cb714
commit r13-1281-gf3f73e86ec8613f176db3e52bbfbfbb9636cb714
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Mon Jun 27 07:44:49 2022 +0100
[PATCH] PR tree-optimization/94026: Simplify (X>>8)&6 != 0 as X&1536 != 0.
This patch implements the missed optimization described in PR 94026,
where the shift can be eliminated from the sequence of a shift,
followed by a bit-wise AND followed by an equality/inequality test.
Specifically, it rewrites ((X << C1) & C2) cmp C3 as (X & (C2 >> C1)) cmp (C3 >> C1)
and likewise ((X >> C1) & C2) cmp C3 as (X & (C2 << C1)) cmp (C3 << C1),
where cmp is == or !=, and C1, C2 and C3 are integer constants.
The example in the subject line is taken from the hot function
self_atari from the Go program Leela (in SPEC CPU 2017).
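The two rewrites can be checked by brute force on a narrow type. The
sketch below is an illustration only, not code from the patch: it models
8-bit unsigned values, and the guard conditions (C2 and C3 must not
overlap the bits the shift fills with zeros) are an assumption about
when the rewrite is safe, not a copy of the match.pd conditions. The
helpers ctz8, clz8 and check_rewrites are hypothetical names. Only ==
is tested, since != is its negation.

```c
#include <assert.h>

/* Trailing zero bits of an 8-bit value; returns 8 for zero. */
static int ctz8(int v)
{
    int n = 0;
    while (n < 8 && !((v >> n) & 1))
        n++;
    return n;
}

/* Leading zero bits of an 8-bit value; returns 8 for zero. */
static int clz8(int v)
{
    int n = 0;
    while (n < 8 && !((v << n) & 0x80))
        n++;
    return n;
}

/* Check both rewrites for every 8-bit X and every guarded C1, C2, C3. */
static void check_rewrites(void)
{
    for (int c1 = 0; c1 < 8; c1++)
        for (int c2 = 0; c2 < 256; c2++)
            for (int c3 = 0; c3 < 256; c3++) {
                /* Assumed guards: constants avoid the zero-filled bits. */
                int left_ok = ctz8(c2) >= c1 && ctz8(c3) >= c1;
                int right_ok = clz8(c2) >= c1 && clz8(c3) >= c1;
                if (!left_ok && !right_ok)
                    continue;
                for (int x = 0; x < 256; x++) {
                    if (left_ok) {
                        int shl = (x << c1) & 0xFF;  /* 8-bit left shift */
                        assert(((shl & c2) == c3)
                               == ((x & (c2 >> c1)) == (c3 >> c1)));
                    }
                    if (right_ok) {
                        int shr = x >> c1;           /* logical right shift */
                        assert(((shr & c2) == c3)
                               == ((x & (c2 << c1)) == (c3 << c1)));
                    }
                }
            }
}
```

With C1 = 8, C2 = 6 and C3 = 0 on a 32-bit type, the right-shift rule
gives the subject-line example: the mask becomes 6 << 8 = 1536.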
2022-06-27 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR tree-optimization/94026
* match.pd (((X << C1) & C2) eq/ne C3): New simplification.
(((X >> C1) & C2) eq/ne C3): Likewise.
gcc/testsuite/ChangeLog
PR tree-optimization/94026
* gcc.dg/pr94026.c: New test case.
* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
[not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
` (10 preceding siblings ...)
2022-06-27 6:45 ` cvs-commit at gcc dot gnu.org
@ 2022-07-03 20:55 ` roger at nextmovesoftware dot com
2022-08-09 17:57 ` cvs-commit at gcc dot gnu.org
12 siblings, 0 replies; 13+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-07-03 20:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |roger at nextmovesoftware dot com
Target Milestone|--- |13.0
Status|NEW |RESOLVED
Resolution|--- |FIXED
--- Comment #13 from Roger Sayle <roger at nextmovesoftware dot com> ---
This should now be fixed on mainline. For the corrected code in comment #4, GCC
now generates (on arm):
tst w0, 1536
cset w0, ne
ret
as suggested by Fei in the original PR.
* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
[not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
` (11 preceding siblings ...)
2022-07-03 20:55 ` roger at nextmovesoftware dot com
@ 2022-08-09 17:57 ` cvs-commit at gcc dot gnu.org
12 siblings, 0 replies; 13+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-08-09 17:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026
--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:6fc14f1963dfefead588a4cd8902d641ed69255c
commit r13-2005-g6fc14f1963dfefead588a4cd8902d641ed69255c
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Tue Aug 9 18:54:43 2022 +0100
middle-end: Optimize ((X >> C1) & C2) != C3 for more cases.
Following my middle-end patch for PR tree-optimization/94026, I'd promised
Jeff Law that I'd clean up the dead-code in fold-const.cc now that these
optimizations are handled in match.pd. Alas, I discovered things aren't
quite that simple, as the transformations I'd added avoided cases where
C2 overlapped with the new bits introduced by the shift, but the original
code handled any value of C2 provided that it had a single bit set (under
the condition that C3 was always zero).
This patch upgrades the transformations supported by match.pd to cover
any values of C2 and C3, provided that C1 is a valid bit shift constant,
for all three shift types (logical right, arithmetic right and left).
This then makes the code in fold-const.cc fully redundant, and adds
support for some new (corner) cases not previously handled. If the
constant C1 is valid for the type's precision, the shift is now always
eliminated (with C2 and C3 possibly updated to test the sign bit).
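A few hand-constructed corner cases may help show what "overlapping"
constants look like for each shift type. These are illustrative examples
written for this note, not taken from the new test case, and the
*_folded forms are reasoned out by hand rather than copied from compiler
output. The arithmetic case assumes that >> on a negative int32_t sign
extends, which is how GCC defines it.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift: bits 4 and 5 of (x >> 28) are copies of the
   sign bit, so the test reduces to a sign-bit test. */
static int sar_test(int32_t x) { return ((x >> 28) & 0x30) != 0; }
static int sar_test_folded(int32_t x) { return x < 0; }

/* Logical right shift: bits 4 and 5 of (x >> 28) are always zero. */
static int shr_test(uint32_t x) { return ((x >> 28) & 0x30) != 0; }
static int shr_test_folded(uint32_t x) { (void)x; return 0; }

/* Left shift: bits 0..3 of (x << 4) are always zero, and bit 4 is
   bit 0 of x. */
static int shl_test(uint32_t x) { return ((x << 4) & 0x1F) != 0; }
static int shl_test_folded(uint32_t x) { return (x & 1u) != 0; }

/* Hypothetical helper: compare each pair on boundary values. */
static void check_corner_cases(void)
{
    static const int32_t s[] = { INT32_MIN, -2, -1, 0, 1, INT32_MAX };
    static const uint32_t u[] = { 0u, 1u, 2u, 0x0FFFFFFFu, 0x10000000u,
                                  0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu };

    for (unsigned i = 0; i < sizeof s / sizeof s[0]; i++)
        assert(sar_test(s[i]) == sar_test_folded(s[i]));

    for (unsigned i = 0; i < sizeof u / sizeof u[0]; i++) {
        assert(shr_test(u[i]) == shr_test_folded(u[i]));
        assert(shl_test(u[i]) == shl_test_folded(u[i]));
    }
}
```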
Interestingly, the fold-const.cc code that I'm now deleting was originally
added by me back in 2006 to resolve PR middle-end/21137. I've confirmed
that those testcase(s) remain resolved with this patch (and I'll close
21137 in Bugzilla). This patch also implements most (but not all) of the
examples mentioned in PR tree-optimization/98954, for which I have some
follow-up patches.
2022-08-09 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR middle-end/21137
PR tree-optimization/98954
* fold-const.cc (fold_binary_loc): Remove optimizations to
optimize ((X >> C1) & C2) ==/!= 0.
* match.pd (cmp (bit_and (lshift @0 @1) @2) @3): Remove wi::ctz
check, and handle all values of INTEGER_CSTs @2 and @3.
(cmp (bit_and (rshift @0 @1) @2) @3): Likewise, remove wi::clz
checks, and handle all values of INTEGER_CSTs @2 and @3.
gcc/testsuite/ChangeLog
PR middle-end/21137
PR tree-optimization/98954
* gcc.dg/fold-eqandshift-4.c: New test case.