public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero
       [not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
@ 2020-03-13  3:09 ` felix.yang at huawei dot com
  2020-03-13  5:00 ` pinskia at gcc dot gnu.org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: felix.yang at huawei dot com @ 2020-03-13  3:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #2 from Fei Yang <felix.yang at huawei dot com> ---
The test case is reduced from a SPEC CPU 2017 benchmark.

int FastBoard::count_pliberties(const int i) {
    return count_neighbours(EMPTY, i);
}

// count neighbours of color c at vertex v
int FastBoard::count_neighbours(const int c, const int v) {
    assert(c == WHITE || c == BLACK || c == EMPTY);
    return (m_neighbours[v] >> (NBR_SHIFT * c)) & 7;
}

bool FastBoard::self_atari(int color, int vertex) {
    assert(get_square(vertex) == FastBoard::EMPTY);

    // 1) count new liberties, if we add 2 or more we're safe
    if (count_pliberties(vertex) >= 2) {
        return false;
    }

    ......


* [Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero
       [not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
  2020-03-13  3:09 ` [Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero felix.yang at huawei dot com
@ 2020-03-13  5:00 ` pinskia at gcc dot gnu.org
  2020-03-16  3:34 ` felix.yang at huawei dot com
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2020-03-13  5:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think part of this optimization should be done on the tree level.


* [Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero
       [not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
  2020-03-13  3:09 ` [Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero felix.yang at huawei dot com
  2020-03-13  5:00 ` pinskia at gcc dot gnu.org
@ 2020-03-16  3:34 ` felix.yang at huawei dot com
  2020-03-20 14:23 ` wdijkstr at arm dot com
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: felix.yang at huawei dot com @ 2020-03-16  3:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #4 from Fei Yang <felix.yang at huawei dot com> ---
(In reply to Fei Yang from comment #0)
> Created attachment 47966 [details]
> proposed patch to fix this issue
> 
> Simple test case:
> int
> foo (int c, int d)
> {
>   int a = (c >> d) & 7;
> 
>   if (a >= 2) {
>     return 1;
>   }
> 
>   return 0;
> }
> 
> Compile option: gcc -S -O2 test.c
> 
> 
> On aarch64, GCC trunk emits 4 instructions:
>         asr     w0, w0, 8
>         tst     w0, 6
>         cset    w0, ne
>         ret
> 
> which can be further simplified into:
>         tst     x0, 1536
>         cset    w0, ne
>         ret
> 
> We see the same issue on other targets such as i386 and x86-64.
> 
> Attached please find proposed patch for this issue.

The previously posted test case is not correct.
Test case should be:
int fifth (int c)
{
    int a = (c >> 8) & 7;

    if (a >= 2) {
        return 1;
    } else {
        return 0;
    }
}
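
A minimal sketch of why the corrected test case reduces to a single mask
test. It assumes a 32-bit int; the names fifth_masked and
count_fifth_mismatches are hypothetical and not part of the PR. For a 3-bit
value a, a >= 2 holds exactly when bit 1 or bit 2 is set, that is
(a & 6) != 0, and moving that mask left by 8 gives 6 << 8 == 1536.

```c
#include <limits.h>

/* The corrected test case from this comment. */
int fifth(int c)
{
    int a = (c >> 8) & 7;

    if (a >= 2) {
        return 1;
    } else {
        return 0;
    }
}

/* Hypothetical shift-free form: bits 1 and 2 of a are bits 9 and 10
   of c, so the mask 6 becomes 6 << 8 == 1536. */
int fifth_masked(int c)
{
    return (c & 1536) != 0;
}

/* Compare both forms over a spread of int values and return the
   number of disagreements. */
int count_fifth_mismatches(void)
{
    int bad = 0;
    for (long long v = INT_MIN; v <= INT_MAX; v += 251)
        bad += fifth((int)v) != fifth_masked((int)v);
    return bad;
}
```

Calling count_fifth_mismatches() should return 0 if the two forms agree on
every sampled input.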


* [Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero
       [not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2020-03-16  3:34 ` felix.yang at huawei dot com
@ 2020-03-20 14:23 ` wdijkstr at arm dot com
  2021-07-25  1:14 ` [Bug tree-optimization/94026] " pinskia at gcc dot gnu.org
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: wdijkstr at arm dot com @ 2020-03-20 14:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

Wilco <wdijkstr at arm dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wdijkstr at arm dot com

--- Comment #5 from Wilco <wdijkstr at arm dot com> ---
(In reply to Fei Yang from comment #4)
> The previously posted test case is not correct.
> Test case should be:
> int fifth (int c)
> {
>     int a = (c >> 8) & 7;
> 
>     if (a >= 2) {
>         return 1;
>     } else {
>         return 0;
>     }
> }

Simpler cases are:

int f1(int x) { return ((x >> 8) & 6) != 0; }
int f2(int x) { return ((x << 2) & 24) != 0; }
int f3(unsigned x) { return ((x << 2) & 15) != 0; }
int f4(unsigned x) { return ((x >> 2) & 14) != 0; }
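
For reference, a sketch of the shift-free forms these four cases could
reduce to. The g1..g4 names and the checking helper are hypothetical, and
the masks are worked out by hand rather than taken from compiler output.
f2 is written with an unsigned shift here so that the check itself does not
rely on left-shifting negative values.

```c
#include <limits.h>

/* The cases from this comment (f2 shifts in unsigned arithmetic). */
int f1(int x) { return ((x >> 8) & 6) != 0; }
int f2(int x) { return (((unsigned)x << 2) & 24) != 0; }
int f3(unsigned x) { return ((x << 2) & 15) != 0; }
int f4(unsigned x) { return ((x >> 2) & 14) != 0; }

/* Hypothetical shift-free equivalents. */
int g1(int x) { return (x & 0x600) != 0; }   /* 6 << 8 */
int g2(int x) { return (x & 6) != 0; }       /* 24 >> 2 */
int g3(unsigned x) { return (x & 3) != 0; }  /* 15 >> 2; low 2 bits of x << 2 are 0 */
int g4(unsigned x) { return (x & 56) != 0; } /* 14 << 2 */

/* Compare each pair over a spread of inputs and return the number
   of disagreements. */
int count_simple_mismatches(void)
{
    int bad = 0;
    for (long long v = INT_MIN; v <= INT_MAX; v += 251) {
        int x = (int)v;
        unsigned u = (unsigned)x;
        bad += f1(x) != g1(x);
        bad += f2(x) != g2(x);
        bad += f3(u) != g3(u);
        bad += f4(u) != g4(u);
    }
    return bad;
}
```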


* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
       [not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2020-03-20 14:23 ` wdijkstr at arm dot com
@ 2021-07-25  1:14 ` pinskia at gcc dot gnu.org
  2022-06-24 15:48 ` segher at gcc dot gnu.org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-07-25  1:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |tree-optimization
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=21137
   Last reconfirmed|                            |2021-07-25

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.  This is related to PR 21137.  However, the optimization for
PR 21137 is currently done only in fold; it has not been moved to match.pd.


* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
       [not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2021-07-25  1:14 ` [Bug tree-optimization/94026] " pinskia at gcc dot gnu.org
@ 2022-06-24 15:48 ` segher at gcc dot gnu.org
  2022-06-24 16:30 ` law at gcc dot gnu.org
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2022-06-24 15:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #7 from Segher Boessenkool <segher at gcc dot gnu.org> ---
For Power, both the original testcase and the one in comment 5 generate perfect
code, for all -mcpu= I tested.  Should this be a target bug?


* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
       [not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2022-06-24 15:48 ` segher at gcc dot gnu.org
@ 2022-06-24 16:30 ` law at gcc dot gnu.org
  2022-06-24 17:30 ` segher at gcc dot gnu.org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: law at gcc dot gnu.org @ 2022-06-24 16:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #8 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I don't think so -- the goal here is to optimize this in gimple so that all
targets benefit rather than every target having to customize a solution for
this idiom.

If Roger's patch is sound you might even be able to simplify the ppc backend
ever so slightly.


* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
       [not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2022-06-24 16:30 ` law at gcc dot gnu.org
@ 2022-06-24 17:30 ` segher at gcc dot gnu.org
  2022-06-24 18:07 ` segher at gcc dot gnu.org
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2022-06-24 17:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #9 from Segher Boessenkool <segher at gcc dot gnu.org> ---
This is all handled in combine, nothing is specific to rs6000 (only the
description of all of our insns is, of course, but there is really no way
around that, nor should there be :-) )

Why does combine not optimise this for Arm?  Of course it would be good if
this would be optimised early as well, but that does not mean we should not
try to optimise it late as well!


* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
       [not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2022-06-24 17:30 ` segher at gcc dot gnu.org
@ 2022-06-24 18:07 ` segher at gcc dot gnu.org
  2022-06-24 20:06 ` segher at gcc dot gnu.org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2022-06-24 18:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #10 from Segher Boessenkool <segher at gcc dot gnu.org> ---
So on Arm we get

Trying 6 -> 8:
    6: r119:SI=r123:SI>>0x8
      REG_DEAD r123:SI
    8: {cc:CC_NZ=cmp(r119:SI&0x6,0);clobber scratch;}
      REG_DEAD r119:SI
Failed to match this instruction:
(parallel [
        (set (reg:CC_NZ 100 cc)
            (compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 123)
                        (const_int 8 [0x8]))
                    (const_int 6 [0x6]))
                (const_int 0 [0])))
        (clobber (scratch:SI))
    ])
Failed to match this instruction:
(set (reg:CC_NZ 100 cc)
    (compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 123)
                (const_int 8 [0x8]))
            (const_int 6 [0x6]))
        (const_int 0 [0])))

instead of something like

(set (reg:CC_NZ 100 cc)
     (compare:CC_NZ (and:SI (reg:SI 123)
                            (const_int 1536))
                    (const_int 0)))

which is correct for every CC mode even, not just NZ?


* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
       [not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2022-06-24 18:07 ` segher at gcc dot gnu.org
@ 2022-06-24 20:06 ` segher at gcc dot gnu.org
  2022-06-27  6:45 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2022-06-24 20:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #11 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Wrt rs6000: we have shift+mask+compare in just one insn (it is basic powerpc),
and our
  (define_insn "*and<mode>3_imm_dot_shifted"
pattern outputs this as just an "andi." insn when it can.  But indeed the shift
wasn't optimised away for us either.


* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
       [not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2022-06-24 20:06 ` segher at gcc dot gnu.org
@ 2022-06-27  6:45 ` cvs-commit at gcc dot gnu.org
  2022-07-03 20:55 ` roger at nextmovesoftware dot com
  2022-08-09 17:57 ` cvs-commit at gcc dot gnu.org
  12 siblings, 0 replies; 13+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-06-27  6:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:f3f73e86ec8613f176db3e52bbfbfbb9636cb714

commit r13-1281-gf3f73e86ec8613f176db3e52bbfbfbb9636cb714
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Mon Jun 27 07:44:49 2022 +0100

    [PATCH] PR tree-optimization/94026: Simplify (X>>8)&6 != 0 as X&1536 != 0.

    This patch implements the missed optimization described in PR 94026,
    where the shift can be eliminated from the sequence of a shift,
    followed by a bit-wise AND, followed by an equality/inequality test.
    Specifically, it transforms ((X << C1) & C2) cmp C3 into
    (X & (C2 >> C1)) cmp (C3 >> C1), and likewise ((X >> C1) & C2) cmp C3
    into (X & (C2 << C1)) cmp (C3 << C1), where cmp is == or !=, and C1, C2
    and C3 are integer constants.
    The example in the subject line is taken from the hot function
    self_atari from the Go program Leela (in SPEC CPU 2017).

    2022-06-27  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            PR tree-optimization/94026
            * match.pd (((X << C1) & C2) eq/ne C3): New simplification.
            (((X >> C1) & C2) eq/ne C3): Likewise.

    gcc/testsuite/ChangeLog
            PR tree-optimization/94026
            * gcc.dg/pr94026.c: New test case.
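
A small sketch of the right-shift rule from the commit message, restricted
to unsigned 32-bit values and to == for brevity. The function names and the
precondition in rewrite_applies are assumptions made for illustration: the
check is one conservative way of saying that C2 and C3 must survive the
left shift without losing bits, and it is not a copy of the condition used
in match.pd.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if shifting c left by c1 loses no set bits. */
static bool shl_is_lossless(uint32_t c, unsigned c1)
{
    return c1 < 32 && ((c << c1) >> c1) == c;
}

/* Assumed precondition for the rewrite below. */
bool rewrite_applies(unsigned c1, uint32_t c2, uint32_t c3)
{
    return shl_is_lossless(c2, c1) && shl_is_lossless(c3, c1);
}

/* Original form: ((x >> c1) & c2) == c3, with a logical shift. */
bool shifted_eq(uint32_t x, unsigned c1, uint32_t c2, uint32_t c3)
{
    return ((x >> c1) & c2) == c3;
}

/* Rewritten form: (x & (c2 << c1)) == (c3 << c1). */
bool masked_eq(uint32_t x, unsigned c1, uint32_t c2, uint32_t c3)
{
    return (x & (c2 << c1)) == (c3 << c1);
}

/* Brute-force comparison over small constants; returns the number of
   disagreements found where the precondition holds. */
int count_rule_mismatches(void)
{
    int bad = 0;
    for (unsigned c1 = 0; c1 < 32; c1++)
        for (uint32_t c2 = 0; c2 < 64; c2++)
            for (uint32_t c3 = 0; c3 < 64; c3++) {
                if (!rewrite_applies(c1, c2, c3))
                    continue;
                for (uint32_t i = 0; i < 256; i++) {
                    uint32_t lo = (1u << c1) - 1;   /* bits below the shift */
                    uint32_t x0 = i << c1;          /* low bits clear */
                    uint32_t x1 = (i << c1) | lo;   /* low bits set */
                    bad += shifted_eq(x0, c1, c2, c3) != masked_eq(x0, c1, c2, c3);
                    bad += shifted_eq(x1, c1, c2, c3) != masked_eq(x1, c1, c2, c3);
                }
            }
    return bad;
}
```

With c1 = 8, c2 = 6 and c3 = 0 this is the (X>>8)&6 versus X&1536 example
from the subject line, expressed with == rather than !=.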


* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
       [not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2022-06-27  6:45 ` cvs-commit at gcc dot gnu.org
@ 2022-07-03 20:55 ` roger at nextmovesoftware dot com
  2022-08-09 17:57 ` cvs-commit at gcc dot gnu.org
  12 siblings, 0 replies; 13+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-07-03 20:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com
   Target Milestone|---                         |13.0
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from Roger Sayle <roger at nextmovesoftware dot com> ---
This should now be fixed on mainline.  For the corrected code in comment #4,
GCC now generates (on aarch64):
        tst     w0, 1536
        cset    w0, ne
        ret
as suggested by Fei in the original PR.


* [Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
       [not found] <bug-94026-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2022-07-03 20:55 ` roger at nextmovesoftware dot com
@ 2022-08-09 17:57 ` cvs-commit at gcc dot gnu.org
  12 siblings, 0 replies; 13+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-08-09 17:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:6fc14f1963dfefead588a4cd8902d641ed69255c

commit r13-2005-g6fc14f1963dfefead588a4cd8902d641ed69255c
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Tue Aug 9 18:54:43 2022 +0100

    middle-end: Optimize ((X >> C1) & C2) != C3 for more cases.

    Following my middle-end patch for PR tree-optimization/94026, I'd promised
    Jeff Law that I'd clean up the dead-code in fold-const.cc now that these
    optimizations are handled in match.pd.  Alas, I discovered things aren't
    quite that simple, as the transformations I'd added avoided cases where
    C2 overlapped with the new bits introduced by the shift, but the original
    code handled any value of C2 provided that it had a single-bit set (under
    the condition that C3 was always zero).

    This patch upgrades the transformations supported by match.pd to cover
    any values of C2 and C3, provided that C1 is a valid bit shift constant,
    for all three shift types (logical right, arithmetic right and left).
    This then makes the code in fold-const.cc fully redundant, and adds
    support for some new (corner) cases not previously handled.  If the
    constant C1 is valid for the type's precision, the shift is now always
    eliminated (with C2 and C3 possibly updated to test the sign bit).

    Interestingly, the fold-const.cc code that I'm now deleting was originally
    added by me back in 2006 to resolve PR middle-end/21137.  I've confirmed
    that those testcase(s) remain resolved with this patch (and I'll close
    21137 in Bugzilla).  This patch also implements most (but not all) of the
    examples mentioned in PR tree-optimization/98954, for which I have some
    follow-up patches.

    2022-08-09  Roger Sayle  <roger@nextmovesoftware.com>
                Richard Biener  <rguenther@suse.de>

    gcc/ChangeLog
            PR middle-end/21137
            PR tree-optimization/98954
            * fold-const.cc (fold_binary_loc): Remove optimizations to
            optimize ((X >> C1) & C2) ==/!= 0.
            * match.pd (cmp (bit_and (lshift @0 @1) @2) @3): Remove wi::ctz
            check, and handle all values of INTEGER_CSTs @2 and @3.
            (cmp (bit_and (rshift @0 @1) @2) @3): Likewise, remove wi::clz
            checks, and handle all values of INTEGER_CSTs @2 and @3.

    gcc/testsuite/ChangeLog
            PR middle-end/21137
            PR tree-optimization/98954
            * gcc.dg/fold-eqandshift-4.c: New test case.
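
One corner case of the kind the message alludes to, as a sketch only: when
the mask overlaps the bits introduced by an arithmetic right shift, the
test can collapse into a test of the sign bit. The function names are
hypothetical, the example assumes GCC's documented behaviour that >> on a
negative signed value sign-extends, and whether GCC emits exactly this form
is not stated in the message.

```c
#include <stdint.h>

/* An arithmetic right shift by 28 leaves a value in -8..7, so bit 7 of
   the result is a copy of the sign bit of x. */
int high_bit_after_shift(int32_t x)
{
    return ((x >> 28) & 0x80) != 0;
}

/* Hypothetical shift-free form: test the sign bit directly. */
int sign_bit_set(int32_t x)
{
    return ((uint32_t)x & 0x80000000u) != 0;
}

/* Compare both forms over a spread of inputs and return the number
   of disagreements. */
int count_sign_mismatches(void)
{
    int bad = 0;
    for (int64_t v = INT32_MIN; v <= INT32_MAX; v += 65521)
        bad += high_bit_after_shift((int32_t)v) != sign_bit_set((int32_t)v);
    return bad;
}
```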

