public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/101200] New: Unneeded AND after shift
@ 2021-06-24 20:19 steinar+gcc at gunderson dot no
  2021-06-24 20:37 ` [Bug rtl-optimization/101200] " pinskia at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: steinar+gcc at gunderson dot no @ 2021-06-24 20:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200

            Bug ID: 101200
           Summary: Unneeded AND after shift
           Product: gcc
           Version: 11.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: steinar+gcc at gunderson dot no
  Target Milestone: ---

The code after reduction is:

struct {
  int b[6];
} c;
unsigned char d;
void e() {
  unsigned char a = d >> 4, f = d & 15;
  c.b[a] = c.b[f];
}

with g++-11 -O2, this produces

        movzbl  d(%rip), %eax
        movq    %rax, %rdx
        shrq    $4, %rax
        andl    $15, %edx
        andl    $15, %eax
        movl    c(,%rdx,4), %edx
        movl    %edx, c(,%rax,4)
        ret

The second AND with 15 is unneeded and should have been optimized away by VRP
as I understand it. I can't reproduce it with ARM, though, so maybe there's
something x86-specific?

Compiler is

  gcc version 11.1.0 (Debian 11.1.0-3) 

The same code is generated back to at least 4.9. Also present in

  gcc version 12.0.0 20210527 (experimental) [master revision
262e75d22c3:7bb6b9b2f47:9d3a953ec4d2695e9a6bfa5f22655e2aea47a973] (Debian
20210527-1)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug rtl-optimization/101200] Unneeded AND after shift
  2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
@ 2021-06-24 20:37 ` pinskia at gcc dot gnu.org
  2021-06-24 20:45 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-24 20:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |rtl-optimization
           Keywords|                            |missed-optimization
           Severity|normal                      |enhancement

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This does not make sense:
Trying 12 -> 14:
   12: {r93:QI=r82:QI 0>>0x4;clobber flags:CC;}
      REG_DEAD r82:QI
      REG_UNUSED flags:CC
   14: {r95:DI=r93:QI#0&0xf;clobber flags:CC;}
      REG_UNUSED flags:CC
      REG_DEAD r93:QI
Failed to match this instruction:
(parallel [
        (set (reg:DI 95 [ a ])
            (zero_extract:DI (subreg:HI (reg:QI 82 [ d.0_1 ]) 0)
                (const_int 4 [0x4])
                (const_int 4 [0x4])))
        (clobber (reg:CC 17 flags))
    ])

How can you have a subreg of HI and then get a zero_extract of DI?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug rtl-optimization/101200] Unneeded AND after shift
  2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
  2021-06-24 20:37 ` [Bug rtl-optimization/101200] " pinskia at gcc dot gnu.org
@ 2021-06-24 20:45 ` pinskia at gcc dot gnu.org
  2021-06-24 20:50 ` pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-24 20:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note the tree level looks decent:
  a_6 = d.0_1 >> 4;
  f_7 = d.0_1 & 15;
  _2 = (int) f_7;
  _3 = (int) a_6;
  _4 = c.b[_2];
  c.b[_3] = _4;

Which gets expanded (for c.b[_3] and dependents) into:
(insn 5 4 0 (set (reg:QI 82 [ d.0_1 ])
        (mem/c:QI (symbol_ref:DI ("d") [flags 0x2]  <var_decl 0x7f83ad5babd0
d>) [0 d+0 S1 A8])) "t6.c":6:17 -1
     (nil))

.....
(insn 12 11 13 (parallel [
            (set (reg:QI 93 [ a ])
                (lshiftrt:QI (reg:QI 82 [ d.0_1 ])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) "t6.c":6:17 -1
     (nil))

(insn 13 12 14 (set (reg:SI 94)
        (zero_extend:SI (reg:QI 93 [ a ]))) "t6.c":7:6 -1
     (nil))

(insn 14 13 15 (set (reg:DI 95)
        (sign_extend:DI (reg:SI 94))) "t6.c":7:10 -1
     (nil))

(insn 15 14 0 (set (mem:SI (plus:DI (mult:DI (reg:DI 95)
                    (const_int 4 [0x4]))
                (reg/f:DI 92)) [1 c.b[_3]+0 S4 A32])
        (reg:SI 85 [ _4 ])) "t6.c":7:10 -1
     (nil))

Part of the problem is tracking of QI vs DI modes.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug rtl-optimization/101200] Unneeded AND after shift
  2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
  2021-06-24 20:37 ` [Bug rtl-optimization/101200] " pinskia at gcc dot gnu.org
  2021-06-24 20:45 ` pinskia at gcc dot gnu.org
@ 2021-06-24 20:50 ` pinskia at gcc dot gnu.org
  2021-06-25  9:34 ` [Bug target/101200] " jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-24 20:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-06-24
             Target|                            |x86_64-linux-gnu

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For aarch64 we get:
        adrp    x1, .LANCHOR0
        add     x0, x1, :lo12:.LANCHOR0
        add     x0, x0, 8
        ldrb    w1, [x1, #:lo12:.LANCHOR0]
        and     x2, x1, 15
        ubfx    x1, x1, 4, 4
        ldr     w2, [x0, x2, lsl 2]
        str     w2, [x0, x1, lsl 2]
        ret

Note the shift and and is combined into one instruction (ubfx) but really only
a shift instruction is needed.
Here we have:
Trying 21 -> 22:
   21: r112:SI=r92:SI 0>>0x4
      REG_DEAD r92:SI
   22: r113:DI=sign_extend(r112:SI)
      REG_DEAD r112:SI
Successfully matched this instruction:
(set (reg:DI 113)
    (zero_extract:DI (subreg:DI (reg:SI 92 [ d.0_1 ]) 0)
        (const_int 4 [0x4])
        (const_int 4 [0x4])))

The multiple modes issue is part of the problem.  If I was redesigning the
backends, I would only allow DI mode (and SI mode for i386) and always have the
zero extends on loads.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/101200] Unneeded AND after shift
  2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
                   ` (2 preceding siblings ...)
  2021-06-24 20:50 ` pinskia at gcc dot gnu.org
@ 2021-06-25  9:34 ` jakub at gcc dot gnu.org
  2021-06-25  9:38 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-06-25  9:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Doing everything in DImode or SImode has some advantages, but lots of
disadvantages too.
x86 can do 8/16/32/64bit arithmetics just fine (although 16bit can cause some
performance problems).
The second & 15 from the source is optimized away during GIMPLE operations.
But later on reappears from just a zero extension from 8 bits to 64 bits, with
a few bits known to be zero (so not & 255 but just & 15).

I think the right pass to optimize this back is REE.

If I manually undo the optimization that combine did and replace
(insn 14 13 15 2 (parallel [
            (set (reg:DI 95 [ a ])
                (and:DI (subreg:DI (reg:QI 93 [ a ]) 0)
                    (const_int 15 [0xf])))
            (clobber (reg:CC 17 flags))
        ]) "pr101200.c":9:10 494 {*anddi_1}
with
(insn 14 13 15 2 (set (reg:DI 95 [ a ])
        (zero_extend:DI (reg:QI 93 [ a ]))) "pr101200.c":9:10 137
{zero_extendqidi2}
then REE tries something but gives up anyway:
Trying to eliminate extension:
(insn 14 13 15 2 (set (reg:DI 0 ax [orig:95 a ] [95])
        (zero_extend:DI (reg:QI 0 ax [orig:93 a ] [93]))) "pr101200.c":9:10 137
{zero_extendqidi2}
     (nil))
Tentatively merged extension with definition :
(insn 12 10 13 2 (parallel [
            (set (reg:DI 0 ax)
                (zero_extend:DI (lshiftrt:QI (reg:QI 0 ax [orig:82 d.0_1 ]
[82])
                        (const_int 4 [0x4]))))
            (clobber (reg:CC 17 flags))
        ]) "pr101200.c":8:17 -1
     (nil))
Merge cancelled, non-mergeable definitions:
(insn 12 10 13 2 (parallel [
            (set (reg:QI 0 ax [orig:93 a ] [93])
                (lshiftrt:QI (reg:QI 0 ax [orig:82 d.0_1 ] [82])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) "pr101200.c":8:17 717 {*lshrqi3_1}
     (nil))
Elimination opportunities = 1 realized = 0
So, to handle this, REE would need to figure out that some ANDs following
logical right shifts can be treated as zero extensions too and also be taught
to handle lshiftrts, replace the QImode MEM read with zero extending one,
widening the right shift from QImode to DImode too (or ideally to SImode given
the behavior of x86) and optimizing away the zero extension (emitted as AND
15).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/101200] Unneeded AND after shift
  2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
                   ` (3 preceding siblings ...)
  2021-06-25  9:34 ` [Bug target/101200] " jakub at gcc dot gnu.org
@ 2021-06-25  9:38 ` jakub at gcc dot gnu.org
  2021-06-25 10:38 ` steinar+gcc at gunderson dot no
  2021-06-29 17:40 ` law at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-06-25  9:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
BTW, I certainly can't reproduce what #c0 shows, while I see movzbl in there
because QImode loads are done that way in most tunings,
I certainly see
        shrb    $4, %al
rather than 64-bit right shift.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/101200] Unneeded AND after shift
  2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
                   ` (4 preceding siblings ...)
  2021-06-25  9:38 ` jakub at gcc dot gnu.org
@ 2021-06-25 10:38 ` steinar+gcc at gunderson dot no
  2021-06-29 17:40 ` law at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: steinar+gcc at gunderson dot no @ 2021-06-25 10:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200

--- Comment #6 from Steinar H. Gunderson <steinar+gcc at gunderson dot no> ---
You're right, I don't know why the shrq happened. When I run now, I get shrb.
Doesn't matter for the bug, though.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/101200] Unneeded AND after shift
  2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
                   ` (5 preceding siblings ...)
  2021-06-25 10:38 ` steinar+gcc at gunderson dot no
@ 2021-06-29 17:40 ` law at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: law at gcc dot gnu.org @ 2021-06-29 17:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200

--- Comment #7 from Jeffrey A. Law <law at gcc dot gnu.org> ---
FWIW, it might be worth considering using a mode iterator for the shift count
to allow multiple modes.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-06-29 17:40 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
2021-06-24 20:37 ` [Bug rtl-optimization/101200] " pinskia at gcc dot gnu.org
2021-06-24 20:45 ` pinskia at gcc dot gnu.org
2021-06-24 20:50 ` pinskia at gcc dot gnu.org
2021-06-25  9:34 ` [Bug target/101200] " jakub at gcc dot gnu.org
2021-06-25  9:38 ` jakub at gcc dot gnu.org
2021-06-25 10:38 ` steinar+gcc at gunderson dot no
2021-06-29 17:40 ` law at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).