public inbox for gcc-bugs@sourceware.org
* [Bug target/108840] New: Aarch64 doesn't optimize away shift counter masking
@ 2023-02-17 17:59 jakub at gcc dot gnu.org
  2023-02-17 18:08 ` [Bug target/108840] " pinskia at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-02-17 17:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840

            Bug ID: 108840
           Summary: Aarch64 doesn't optimize away shift counter masking
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

As mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612214.html
aarch64 doesn't optimize away the AND instructions that mask the shift count
when more than one shift uses the same count.  Consider, at
-O2 -fno-tree-vectorize:
int
foo (int x, int y)
{
  return x << (y & 31);
}

void
bar (int x[3], int y)
{
  x[0] <<= (y & 31);
  x[1] <<= (y & 31);
  x[2] <<= (y & 31);
}

void
baz (int x[3], int y)
{
  y &= 31;
  x[0] <<= y;
  x[1] <<= y;
  x[2] <<= y;
}

void corge (int, int, int);

void
qux (int x, int y, int z, int n)
{
  n &= 31;
  corge (x << n, y << n, z >> n);
}

foo is optimized correctly: combine matches the shift together with the
masking.  In the remaining cases the desirable combination is rejected because
of costs.  A shift with embedded masking of the count should have the same
rtx_cost as a plain shift, since under the hood it is just the shift
instruction itself.
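
For readers unfamiliar with the target, here is a minimal sketch (not part of
the original report) of why the mask is redundant.  It assumes the documented
behaviour of the AArch64 register-controlled shift, which takes the count
modulo the register width; the names lsl_w, masked_shift and
check_mask_is_redundant are hypothetical helpers made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 32-bit variable left shift on AArch64 (LSL Wd, Wn, Wm):
   only the count modulo 32 is used by the instruction.  */
static uint32_t
lsl_w (uint32_t value, uint32_t count)
{
  return value << (count % 32);
}

/* What the source code asks for: an explicit mask, then the shift.  */
static uint32_t
masked_shift (uint32_t value, uint32_t count)
{
  uint32_t masked = count & 31;	/* the AND we would like to see removed */
  return lsl_w (value, masked);
}

/* Check that masking first never changes the result for counts 0..255.  */
static void
check_mask_is_redundant (uint32_t value)
{
  for (uint32_t count = 0; count < 256; count++)
    assert (masked_shift (value, count) == lsl_w (value, count));
}
```

Calling check_mask_is_redundant on any value passes in this model, which is
the property the combined shift-with-mask pattern relies on.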


* [Bug target/108840] Aarch64 doesn't optimize away shift counter masking
  2023-02-17 17:59 [Bug target/108840] New: Aarch64 doesn't optimize away shift counter masking jakub at gcc dot gnu.org
@ 2023-02-17 18:08 ` pinskia at gcc dot gnu.org
  2023-02-21 12:24 ` ktkachov at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-02-17 18:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-02-17
           See Also|https://gcc.gnu.org/bugzill |
                   |a/show_bug.cgi?id=91202     |
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed:

Trying 8 -> 10:
    8: r93:SI=r108:SI&0x1f
      REG_DEAD r108:SI
   10: r101:SI=r102:SI<<r93:SI#0
      REG_DEAD r102:SI
Failed to match this instruction:
(parallel [
        (set (reg:SI 101)
            (ashift:SI (reg:SI 102 [ *x_9(D) ])
                (subreg:QI (and:SI (reg:SI 108)
                        (const_int 31 [0x1f])) 0)))
        (set (reg:SI 93 [ _2 ])
            (and:SI (reg:SI 108)
                (const_int 31 [0x1f])))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:SI 101)
            (ashift:SI (reg:SI 102 [ *x_9(D) ])
                (subreg:QI (and:SI (reg:SI 108)
                        (const_int 31 [0x1f])) 0)))
        (set (reg:SI 93 [ _2 ])
            (and:SI (reg:SI 108)
                (const_int 31 [0x1f])))
    ])
Successfully matched this instruction:
(set (reg:SI 93 [ _2 ])
    (and:SI (reg:SI 108)
        (const_int 31 [0x1f])))
Successfully matched this instruction:
(set (reg:SI 101)
    (ashift:SI (reg:SI 102 [ *x_9(D) ])
        (subreg:QI (and:SI (reg:SI 108)
                (const_int 31 [0x1f])) 0)))
rejecting combination of insns 8 and 10
original costs 4 + 4 = 8
replacement costs 4 + 8 = 12

The replacement cost should still be 8.
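
The arithmetic in the dump can be sketched as follows.  This is a simplified
stand-in, not combine's actual code: combination_accepted and
check_costs_from_dump are hypothetical names, and the rule "accept only if the
replacement is not more expensive" is an assumption that matches the rejection
shown above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: accept a combination only if the replacement insns
   cost no more than the original insns.  */
static bool
combination_accepted (int old_cost, int new_cost)
{
  return new_cost <= old_cost;
}

static void
check_costs_from_dump (void)
{
  /* Costs printed in the dump: the AND stays at 4, the shift is costed 8.  */
  assert (!combination_accepted (4 + 4, 4 + 8));
  /* If the shift with the embedded mask were costed 4, like a plain shift,
     the totals would tie at 8 and the combination would go through.  */
  assert (combination_accepted (4 + 4, 4 + 4));
}
```

Once every shift has absorbed the mask, the separate AND has no remaining
users and can be deleted.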


* [Bug target/108840] Aarch64 doesn't optimize away shift counter masking
  2023-02-17 17:59 [Bug target/108840] New: Aarch64 doesn't optimize away shift counter masking jakub at gcc dot gnu.org
  2023-02-17 18:08 ` [Bug target/108840] " pinskia at gcc dot gnu.org
@ 2023-02-21 12:24 ` ktkachov at gcc dot gnu.org
  2023-02-24 15:37 ` ktkachov at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2023-02-21 12:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |ktkachov at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #2 from ktkachov at gcc dot gnu.org ---
I have a patch to simplify and fix the aarch64 rtx costs for this case. I'll
target GCC 14 with it, as it's not a regression.


* [Bug target/108840] Aarch64 doesn't optimize away shift counter masking
  2023-02-17 17:59 [Bug target/108840] New: Aarch64 doesn't optimize away shift counter masking jakub at gcc dot gnu.org
  2023-02-17 18:08 ` [Bug target/108840] " pinskia at gcc dot gnu.org
  2023-02-21 12:24 ` ktkachov at gcc dot gnu.org
@ 2023-02-24 15:37 ` ktkachov at gcc dot gnu.org
  2023-04-19  8:35 ` cvs-commit at gcc dot gnu.org
  2023-04-19  8:37 ` ktkachov at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2023-02-24 15:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Created attachment 54531
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54531&action=edit
Candidate patch

Candidate patch attached.


* [Bug target/108840] Aarch64 doesn't optimize away shift counter masking
  2023-02-17 17:59 [Bug target/108840] New: Aarch64 doesn't optimize away shift counter masking jakub at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-02-24 15:37 ` ktkachov at gcc dot gnu.org
@ 2023-04-19  8:35 ` cvs-commit at gcc dot gnu.org
  2023-04-19  8:37 ` ktkachov at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-04-19  8:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kyrylo Tkachov <ktkachov@gcc.gnu.org>:

https://gcc.gnu.org/g:136330bf637b50a4f10ace017a4316541386b9c0

commit r14-62-g136330bf637b50a4f10ace017a4316541386b9c0
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Apr 19 09:34:40 2023 +0100

    aarch64: PR target/108840 Simplify register shift RTX costs and eliminate shift amount masking

    In this PR we fail to eliminate explicit &31 operations for variable
    shifts such as in:
    void
    bar (int x[3], int y)
    {
      x[0] <<= (y & 31);
      x[1] <<= (y & 31);
      x[2] <<= (y & 31);
    }

    This is rejected by RTX costs that end up giving too high a cost for:
    (set (reg:SI 96)
        (ashift:SI (reg:SI 98)
            (subreg:QI (and:SI (reg:SI 99)
                    (const_int 31 [0x1f])) 0)))

    There is code to handle the AND-31 case in rtx costs, but it gets confused
    by the subreg.
    It's easy enough to fix by looking inside the subreg when costing the
    expression.
    While doing that I noticed that the ASHIFT case and the other shift-like
    cases are almost identical and we should just merge them.  This code will
    only be used for valid insns anyway, so the code after this patch should
    do the Right Thing (TM) for all such shift cases.

    With this patch there are no more "and wn, wn, 31" instructions left in
    the testcase.

    Bootstrapped and tested on aarch64-none-linux-gnu.

            PR target/108840

    gcc/ChangeLog:

            * config/aarch64/aarch64.cc (aarch64_rtx_costs): Merge ASHIFT and
            ROTATE, ROTATERT, LSHIFTRT, ASHIFTRT cases.  Handle subregs in op1.

    gcc/testsuite/ChangeLog:

            * gcc.target/aarch64/pr108840.c: New test.
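
The idea of looking inside the subreg when costing can be illustrated with a
toy model.  This is only a sketch under stated assumptions: struct expr,
toy_cost and the other names below are hypothetical and are not GCC's rtx API
or the code in aarch64_rtx_costs, the per-instruction cost of 4 is taken from
the combine dump earlier in the thread, and only the 32-bit mask of 31 is
handled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression nodes; a stand-in for RTL, not GCC's data structures.  */
enum code { REG, CONST_INT, AND, SUBREG, ASHIFT };

struct expr
{
  enum code code;
  long value;			/* used by CONST_INT only */
  const struct expr *op0;	/* first operand, if any */
  const struct expr *op1;	/* second operand, if any */
};

#define INSN_COST 4		/* assumed cost of one simple instruction */

/* Return the operand wrapped by a SUBREG, or X itself otherwise.  */
static const struct expr *
strip_subreg (const struct expr *x)
{
  return x->code == SUBREG ? x->op0 : x;
}

/* Return nonzero if X has the form (and reg (const_int MASK)).  */
static int
is_and_with_mask (const struct expr *x, long mask)
{
  return x->code == AND && x->op0->code == REG
	 && x->op1->code == CONST_INT && x->op1->value == mask;
}

static int
toy_cost (const struct expr *x, int look_through_subreg)
{
  switch (x->code)
    {
    case REG:
    case CONST_INT:
      return 0;
    case SUBREG:
      return toy_cost (x->op0, look_through_subreg);
    case AND:
      return INSN_COST + toy_cost (x->op0, look_through_subreg)
	     + toy_cost (x->op1, look_through_subreg);
    case ASHIFT:
      {
	const struct expr *count = x->op1;
	if (look_through_subreg)
	  count = strip_subreg (count);
	/* A count of the form (and reg 31) is free: the shift masks it.  */
	if (is_and_with_mask (count, 31))
	  return INSN_COST + toy_cost (x->op0, look_through_subreg);
	return INSN_COST + toy_cost (x->op0, look_through_subreg)
	       + toy_cost (x->op1, look_through_subreg);
      }
    }
  return 0;
}

/* Build (ashift reg (subreg (and reg 31))) and return its toy cost.  */
static int
cost_of_masked_shift (int look_through_subreg)
{
  const struct expr value = { REG, 0, NULL, NULL };
  const struct expr count = { REG, 0, NULL, NULL };
  const struct expr mask = { CONST_INT, 31, NULL, NULL };
  const struct expr and_expr = { AND, 0, &count, &mask };
  const struct expr subreg = { SUBREG, 0, &and_expr, NULL };
  const struct expr shift = { ASHIFT, 0, &value, &subreg };
  return toy_cost (&shift, look_through_subreg);
}
```

In this model cost_of_masked_shift (0) charges for the AND hidden behind the
subreg, while cost_of_masked_shift (1) costs the expression as a single shift.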


* [Bug target/108840] Aarch64 doesn't optimize away shift counter masking
  2023-02-17 17:59 [Bug target/108840] New: Aarch64 doesn't optimize away shift counter masking jakub at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-04-19  8:35 ` cvs-commit at gcc dot gnu.org
@ 2023-04-19  8:37 ` ktkachov at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2023-04-19  8:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
   Target Milestone|---                         |14.0

--- Comment #5 from ktkachov at gcc dot gnu.org ---
Fixed for GCC 14.
