public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/101200] New: Unneeded AND after shift
@ 2021-06-24 20:19 steinar+gcc at gunderson dot no
2021-06-24 20:37 ` [Bug rtl-optimization/101200] " pinskia at gcc dot gnu.org
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: steinar+gcc at gunderson dot no @ 2021-06-24 20:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200
Bug ID: 101200
Summary: Unneeded AND after shift
Product: gcc
Version: 11.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: steinar+gcc at gunderson dot no
Target Milestone: ---
The code after reduction is:
struct {
int b[6];
} c;
unsigned char d;
void e() {
unsigned char a = d >> 4, f = d & 15;
c.b[a] = c.b[f];
}
with g++-11 -O2, this produces
movzbl d(%rip), %eax
movq %rax, %rdx
shrq $4, %rax
andl $15, %edx
andl $15, %eax
movl c(,%rdx,4), %edx
movl %edx, c(,%rax,4)
ret
The second AND with 15 is unneeded and should have been optimized away by VRP
as I understand it. I can't reproduce it with ARM, though, so maybe there's
something x86-specific?
Compiler is
gcc version 11.1.0 (Debian 11.1.0-3)
The same code is generated back to at least 4.9. Also present in
gcc version 12.0.0 20210527 (experimental) [master revision
262e75d22c3:7bb6b9b2f47:9d3a953ec4d2695e9a6bfa5f22655e2aea47a973] (Debian
20210527-1)
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug rtl-optimization/101200] Unneeded AND after shift
2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
@ 2021-06-24 20:37 ` pinskia at gcc dot gnu.org
2021-06-24 20:45 ` pinskia at gcc dot gnu.org
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-24 20:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|tree-optimization |rtl-optimization
Keywords| |missed-optimization
Severity|normal |enhancement
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This does not make sense:
Trying 12 -> 14:
12: {r93:QI=r82:QI 0>>0x4;clobber flags:CC;}
REG_DEAD r82:QI
REG_UNUSED flags:CC
14: {r95:DI=r93:QI#0&0xf;clobber flags:CC;}
REG_UNUSED flags:CC
REG_DEAD r93:QI
Failed to match this instruction:
(parallel [
(set (reg:DI 95 [ a ])
(zero_extract:DI (subreg:HI (reg:QI 82 [ d.0_1 ]) 0)
(const_int 4 [0x4])
(const_int 4 [0x4])))
(clobber (reg:CC 17 flags))
])
How can you have a subreg of HI and then get a zero_extract of DI?
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug rtl-optimization/101200] Unneeded AND after shift
2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
2021-06-24 20:37 ` [Bug rtl-optimization/101200] " pinskia at gcc dot gnu.org
@ 2021-06-24 20:45 ` pinskia at gcc dot gnu.org
2021-06-24 20:50 ` pinskia at gcc dot gnu.org
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-24 20:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note the tree level looks decent:
a_6 = d.0_1 >> 4;
f_7 = d.0_1 & 15;
_2 = (int) f_7;
_3 = (int) a_6;
_4 = c.b[_2];
c.b[_3] = _4;
Which gets expanded (for c.b[_3] and dependents) into:
(insn 5 4 0 (set (reg:QI 82 [ d.0_1 ])
(mem/c:QI (symbol_ref:DI ("d") [flags 0x2] <var_decl 0x7f83ad5babd0
d>) [0 d+0 S1 A8])) "t6.c":6:17 -1
(nil))
.....
(insn 12 11 13 (parallel [
(set (reg:QI 93 [ a ])
(lshiftrt:QI (reg:QI 82 [ d.0_1 ])
(const_int 4 [0x4])))
(clobber (reg:CC 17 flags))
]) "t6.c":6:17 -1
(nil))
(insn 13 12 14 (set (reg:SI 94)
(zero_extend:SI (reg:QI 93 [ a ]))) "t6.c":7:6 -1
(nil))
(insn 14 13 15 (set (reg:DI 95)
(sign_extend:DI (reg:SI 94))) "t6.c":7:10 -1
(nil))
(insn 15 14 0 (set (mem:SI (plus:DI (mult:DI (reg:DI 95)
(const_int 4 [0x4]))
(reg/f:DI 92)) [1 c.b[_3]+0 S4 A32])
(reg:SI 85 [ _4 ])) "t6.c":7:10 -1
(nil))
Part of the problem is tracking of QI vs DI modes.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug rtl-optimization/101200] Unneeded AND after shift
2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
2021-06-24 20:37 ` [Bug rtl-optimization/101200] " pinskia at gcc dot gnu.org
2021-06-24 20:45 ` pinskia at gcc dot gnu.org
@ 2021-06-24 20:50 ` pinskia at gcc dot gnu.org
2021-06-25 9:34 ` [Bug target/101200] " jakub at gcc dot gnu.org
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-24 20:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Last reconfirmed| |2021-06-24
Target| |x86_64-linux-gnu
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For aarch64 we get:
adrp x1, .LANCHOR0
add x0, x1, :lo12:.LANCHOR0
add x0, x0, 8
ldrb w1, [x1, #:lo12:.LANCHOR0]
and x2, x1, 15
ubfx x1, x1, 4, 4
ldr w2, [x0, x2, lsl 2]
str w2, [x0, x1, lsl 2]
ret
Note the shift and and is combined into one instruction (ubfx) but really only
a shift instruction is needed.
Here we have:
Trying 21 -> 22:
21: r112:SI=r92:SI 0>>0x4
REG_DEAD r92:SI
22: r113:DI=sign_extend(r112:SI)
REG_DEAD r112:SI
Successfully matched this instruction:
(set (reg:DI 113)
(zero_extract:DI (subreg:DI (reg:SI 92 [ d.0_1 ]) 0)
(const_int 4 [0x4])
(const_int 4 [0x4])))
The multiple modes issue is part of the problem. If I was redesigning the
backends, I would only allow DI mode (and SI mode for i386) and always have the
zero extends on loads.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/101200] Unneeded AND after shift
2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
` (2 preceding siblings ...)
2021-06-24 20:50 ` pinskia at gcc dot gnu.org
@ 2021-06-25 9:34 ` jakub at gcc dot gnu.org
2021-06-25 9:38 ` jakub at gcc dot gnu.org
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-06-25 9:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org,
| |law at gcc dot gnu.org
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Doing everything in DImode or SImode has some advantages, but lots of
disadvantages too.
x86 can do 8/16/32/64bit arithmetics just fine (although 16bit can cause some
performance problems).
The second & 15 from the source is optimized away during GIMPLE operations.
But later on reappears from just a zero extension from 8 bits to 64 bits, with
a few bits known to be zero (so not & 255 but just & 15).
I think the right pass to optimize this back is REE.
If I manually undo the optimization that combine did and replace
(insn 14 13 15 2 (parallel [
(set (reg:DI 95 [ a ])
(and:DI (subreg:DI (reg:QI 93 [ a ]) 0)
(const_int 15 [0xf])))
(clobber (reg:CC 17 flags))
]) "pr101200.c":9:10 494 {*anddi_1}
with
(insn 14 13 15 2 (set (reg:DI 95 [ a ])
(zero_extend:DI (reg:QI 93 [ a ]))) "pr101200.c":9:10 137
{zero_extendqidi2}
then REE tries something but gives up anyway:
Trying to eliminate extension:
(insn 14 13 15 2 (set (reg:DI 0 ax [orig:95 a ] [95])
(zero_extend:DI (reg:QI 0 ax [orig:93 a ] [93]))) "pr101200.c":9:10 137
{zero_extendqidi2}
(nil))
Tentatively merged extension with definition :
(insn 12 10 13 2 (parallel [
(set (reg:DI 0 ax)
(zero_extend:DI (lshiftrt:QI (reg:QI 0 ax [orig:82 d.0_1 ]
[82])
(const_int 4 [0x4]))))
(clobber (reg:CC 17 flags))
]) "pr101200.c":8:17 -1
(nil))
Merge cancelled, non-mergeable definitions:
(insn 12 10 13 2 (parallel [
(set (reg:QI 0 ax [orig:93 a ] [93])
(lshiftrt:QI (reg:QI 0 ax [orig:82 d.0_1 ] [82])
(const_int 4 [0x4])))
(clobber (reg:CC 17 flags))
]) "pr101200.c":8:17 717 {*lshrqi3_1}
(nil))
Elimination opportunities = 1 realized = 0
So, to handle this, REE would need to figure out that some ANDs following
logical right shifts can be treated as zero extensions too and also be taught
to handle lshiftrts, replace the QImode MEM read with zero extending one,
widening the right shift from QImode to DImode too (or ideally to SImode given
the behavior of x86) and optimizing away the zero extension (emitted as AND
15).
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/101200] Unneeded AND after shift
2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
` (3 preceding siblings ...)
2021-06-25 9:34 ` [Bug target/101200] " jakub at gcc dot gnu.org
@ 2021-06-25 9:38 ` jakub at gcc dot gnu.org
2021-06-25 10:38 ` steinar+gcc at gunderson dot no
2021-06-29 17:40 ` law at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-06-25 9:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
BTW, I certainly can't reproduce what #c0 shows, while I see movzbl in there
because QImode loads are done that way in most tunings,
I certainly see
shrb $4, %al
rather than 64-bit right shift.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/101200] Unneeded AND after shift
2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
` (4 preceding siblings ...)
2021-06-25 9:38 ` jakub at gcc dot gnu.org
@ 2021-06-25 10:38 ` steinar+gcc at gunderson dot no
2021-06-29 17:40 ` law at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: steinar+gcc at gunderson dot no @ 2021-06-25 10:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200
--- Comment #6 from Steinar H. Gunderson <steinar+gcc at gunderson dot no> ---
You're right, I don't know why the shrq happened. When I run now, I get shrb.
Doesn't matter for the bug, though.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/101200] Unneeded AND after shift
2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
` (5 preceding siblings ...)
2021-06-25 10:38 ` steinar+gcc at gunderson dot no
@ 2021-06-29 17:40 ` law at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: law at gcc dot gnu.org @ 2021-06-29 17:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200
--- Comment #7 from Jeffrey A. Law <law at gcc dot gnu.org> ---
FWIW, it might be worth considering using a mode iterator for the shift count
to allow multiple modes.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-06-29 17:40 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-24 20:19 [Bug tree-optimization/101200] New: Unneeded AND after shift steinar+gcc at gunderson dot no
2021-06-24 20:37 ` [Bug rtl-optimization/101200] " pinskia at gcc dot gnu.org
2021-06-24 20:45 ` pinskia at gcc dot gnu.org
2021-06-24 20:50 ` pinskia at gcc dot gnu.org
2021-06-25 9:34 ` [Bug target/101200] " jakub at gcc dot gnu.org
2021-06-25 9:38 ` jakub at gcc dot gnu.org
2021-06-25 10:38 ` steinar+gcc at gunderson dot no
2021-06-29 17:40 ` law at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).