public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/105778] New: Rotate by register --- unnecessary AND instruction
@ 2022-05-30 21:21 zero at smallinteger dot com
2022-05-30 22:42 ` [Bug target/105778] Shift " jakub at gcc dot gnu.org
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: zero at smallinteger dot com @ 2022-05-30 21:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778
Bug ID: 105778
Summary: Rotate by register --- unnecessary AND instruction
Product: gcc
Version: 12.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: zero at smallinteger dot com
Target Milestone: ---
Created attachment 53053
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53053&action=edit
Sample code
With -O2, some x86 shift-by-register instructions are preceded by an
unnecessary AND instruction. The AND instruction is unnecessary because the
shift-by-register instructions already mask the register containing the
variable shift.
In the sample code, the #if 0 branch produces the code
mov rax, rdi
mov ecx, edi
shr rax, cl
ret
but the #if 1 branch produces the code
mov rcx, rdi
mov rax, rdi
and ecx, 63
shr rax, cl
ret
even though the code has the same behavior. Note that the and ecx, 63 is
unnecessary here because shr rax, cl will already operate on the bottom 6 bits
of ecx anyway, as per the Intel manual.
As notated in the code's comments, some explicit masks other than 0x3f may
produce even more inefficient code, e.g.:
movabs rcx, 35184372088831
mov rax, rdi
and rcx, rdi
shr rax, cl
ret
while some other masks like 0xff and 0xffff eliminate the explicit and
altogether.
Found with gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0. Verified with godbolt
for all gcc versions from 9.4.0 through trunk.
For the sake of completeness, I could not get clang to reproduce this problem.
The latest classic ICC compiler available in Godbolt (2021.5.0) can emit code
with MOVABS as above. However, the newer ICX Intel compiler behaves like clang
(this seems reasonable).
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105778] Shift by register --- unnecessary AND instruction
2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
@ 2022-05-30 22:42 ` jakub at gcc dot gnu.org
2022-05-31 13:19 ` jakub at gcc dot gnu.org
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-30 22:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think:
--- gcc/config/i386/i386.md.jj 2022-05-30 14:07:11.988199636 +0200
+++ gcc/config/i386/i386.md 2022-05-31 00:39:08.031757037 +0200
@@ -12708,19 +12708,21 @@ (define_expand "<insn><mode>3"
""
"ix86_expand_binary_operator (<CODE>, <MODE>mode, operands); DONE;")
+(define_mode_iterator SWI48A [SI (DI "TARGET_64BIT")])
+
;; Avoid useless masking of count operand.
-(define_insn_and_split "*<insn><mode>3_mask"
+(define_insn_and_split "*<insn><SWI48:mode>3_mask_<SWI48A:mode>"
[(set (match_operand:SWI48 0 "nonimmediate_operand")
(any_shiftrt:SWI48
(match_operand:SWI48 1 "nonimmediate_operand")
(subreg:QI
- (and:SI
- (match_operand:SI 2 "register_operand" "c,r")
- (match_operand:SI 3 "const_int_operand")) 0)))
+ (and:SWI48A
+ (match_operand:SWI48A 2 "register_operand" "c,r")
+ (match_operand:SWI48A 3 "const_int_operand")) 0)))
(clobber (reg:CC FLAGS_REG))]
- "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
- && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
- == GET_MODE_BITSIZE (<MODE>mode)-1
+ "ix86_binary_operator_ok (<CODE>, <SWI48:MODE>mode, operands)
+ && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<SWI48:MODE>mode)-1))
+ == GET_MODE_BITSIZE (<SWI48:MODE>mode)-1
&& ix86_pre_reload_split ()"
"#"
"&& 1"
@@ -12754,16 +12756,16 @@ (define_insn_and_split "*<insn><mode>3_m
""
[(set_attr "isa" "*,bmi2")])
-(define_insn_and_split "*<insn><dwi>3_doubleword_mask"
- [(set (match_operand:<DWI> 0 "register_operand")
- (any_shiftrt:<DWI>
- (match_operand:<DWI> 1 "register_operand")
+(define_insn_and_split "*<insn><DWIH:dwi>3_doubleword_mask_<SWI48:mode>"
+ [(set (match_operand:<DWIH:DWI> 0 "register_operand")
+ (any_shiftrt:<DWIH:DWI>
+ (match_operand:<DWIH:DWI> 1 "register_operand")
(subreg:QI
- (and:SI
- (match_operand:SI 2 "register_operand" "c")
- (match_operand:SI 3 "const_int_operand")) 0)))
+ (and:SWI48
+ (match_operand:SWI48 2 "register_operand" "c")
+ (match_operand:SWI48 3 "const_int_operand")) 0)))
(clobber (reg:CC FLAGS_REG))]
- "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
+ "(INTVAL (operands[3]) & (<DWIH:MODE_SIZE> * BITS_PER_UNIT)) == 0
&& ix86_pre_reload_split ()"
"#"
"&& 1"
@@ -12772,7 +12774,8 @@ (define_insn_and_split "*<insn><dwi>3_do
(ior:DWIH (lshiftrt:DWIH (match_dup 4)
(and:QI (match_dup 2) (match_dup 8)))
(subreg:DWIH
- (ashift:<DWI> (zero_extend:<DWI> (match_dup 7))
+ (ashift:<DWIH:DWI>
+ (zero_extend:<DWIH:DWI> (match_dup 7))
(minus:QI (match_dup 9)
(and:QI (match_dup 2) (match_dup 8)))) 0)))
(clobber (reg:CC FLAGS_REG))])
@@ -12781,13 +12784,14 @@ (define_insn_and_split "*<insn><dwi>3_do
(any_shiftrt:DWIH (match_dup 7) (match_dup 2)))
(clobber (reg:CC FLAGS_REG))])]
{
- split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
+ split_double_mode (<DWIH:DWI>mode, &operands[0], 2, &operands[4],
+ &operands[6]);
- operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
- operands[9] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT);
+ operands[8] = GEN_INT (<DWIH:MODE_SIZE> * BITS_PER_UNIT - 1);
+ operands[9] = GEN_INT (<DWIH:MODE_SIZE> * BITS_PER_UNIT);
- if ((INTVAL (operands[3]) & ((<MODE_SIZE> * BITS_PER_UNIT) - 1))
- != ((<MODE_SIZE> * BITS_PER_UNIT) - 1))
+ if ((INTVAL (operands[3]) & ((<DWIH:MODE_SIZE> * BITS_PER_UNIT) - 1))
+ != ((<DWIH:MODE_SIZE> * BITS_PER_UNIT) - 1))
{
rtx tem = gen_reg_rtx (SImode);
emit_insn (gen_andsi3 (tem, operands[2], operands[3]));
could fix this. Wonder if it couldn't be written without the extra iterator
though...
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105778] Shift by register --- unnecessary AND instruction
2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
2022-05-30 22:42 ` [Bug target/105778] Shift " jakub at gcc dot gnu.org
@ 2022-05-31 13:19 ` jakub at gcc dot gnu.org
2022-05-31 13:44 ` ubizjak at gmail dot com
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-31 13:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Status|UNCONFIRMED |ASSIGNED
Last reconfirmed| |2022-05-31
Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 53058
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53058&action=edit
gcc13-pr105778.patch
Full untested patch.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105778] Shift by register --- unnecessary AND instruction
2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
2022-05-30 22:42 ` [Bug target/105778] Shift " jakub at gcc dot gnu.org
2022-05-31 13:19 ` jakub at gcc dot gnu.org
@ 2022-05-31 13:44 ` ubizjak at gmail dot com
2022-05-31 13:50 ` jakub at gcc dot gnu.org
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2022-05-31 13:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778
--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
Comment on attachment 53058
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53058
gcc13-pr105778.patch
>+ operands[2] = gen_lowpart (QImode, operands[2]);
We have learned that pre-reload splits need to move operand to a register
before calling gen_lowpart.
>+ emit_insn (gen_ashl<dwi>3_doubleword (operands[0], operands[1],
>+ operands[2]));
>+ DONE;
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105778] Shift by register --- unnecessary AND instruction
2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
` (2 preceding siblings ...)
2022-05-31 13:44 ` ubizjak at gmail dot com
@ 2022-05-31 13:50 ` jakub at gcc dot gnu.org
2022-05-31 14:41 ` jakub at gcc dot gnu.org
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-31 13:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
It is the same thing done a few lines later in the preexisting code too.
Shall I all of those change to gen_lowpart (QImode, force_reg (GET_MODE
(operands[2]), operands[2])) then?
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105778] Shift by register --- unnecessary AND instruction
2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
` (3 preceding siblings ...)
2022-05-31 13:50 ` jakub at gcc dot gnu.org
@ 2022-05-31 14:41 ` jakub at gcc dot gnu.org
2022-05-31 18:53 ` ubizjak at gmail dot com
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-31 14:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #53058|0 |1
is obsolete| |
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 53059
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53059&action=edit
gcc13-pr105778.patch
Adjusted patch.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105778] Shift by register --- unnecessary AND instruction
2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
` (4 preceding siblings ...)
2022-05-31 14:41 ` jakub at gcc dot gnu.org
@ 2022-05-31 18:53 ` ubizjak at gmail dot com
2022-06-02 8:41 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2022-05-31 18:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778
--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #4)
> It is the same thing done a few lines later in the preexisting code too.
> Shall I all of those change to gen_lowpart (QImode, force_reg (GET_MODE
> (operands[2]), operands[2])) then?
Yes, please. A "register_operand" predicate accepts SUBREGs of some strange
mode (usually FP mode) registers, and gen_lowpart chokes on these.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105778] Shift by register --- unnecessary AND instruction
2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
` (5 preceding siblings ...)
2022-05-31 18:53 ` ubizjak at gmail dot com
@ 2022-06-02 8:41 ` cvs-commit at gcc dot gnu.org
2022-06-02 8:44 ` jakub at gcc dot gnu.org
2022-10-24 17:44 ` pinskia at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-06-02 8:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778
--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:dcfdd2851b297e0005a8490b7f867ca45d1ad340
commit r13-927-gdcfdd2851b297e0005a8490b7f867ca45d1ad340
Author: Jakub Jelinek <jakub@redhat.com>
Date: Thu Jun 2 10:40:12 2022 +0200
i386: Optimize away shift count masking of shifts/rotates some more
[PR105778]
As the following testcase shows, our x86 backend support for optimizing
out useless masking of shift/rotate counts when using instructions
that naturally modulo the count themselves is insufficient.
The *_mask define_insn_and_split patterns use
(subreg:QI (and:SI (match_operand:SI) (match_operand "const_int_operand")))
for the masking, but that can catch only the case where the masking
is done in SImode, so typically in SImode in the source.
We then have another set of patterns, *_mask_1, which use
(and:QI (match_operand:QI) (match_operand "const_int_operand"))
If the masking is done in DImode or in theory in HImode, we don't match
it.
The following patch does 4 different things to improve this:
1) drops the mode from AND and MATCH_OPERAND inside of the subreg:QI
and replaces that by checking that the register shift count has
SWI48 mode - I think doing it this way is cheaper than adding
another mode iterator to patterns which use already another mode
iterator and sometimes a code iterator as well
2) the doubleword shift patterns were only handling the case where
the shift count is masked with a constant that has the most significant
bit clear, i.e. where we know the shift count is less than half the
number of bits in double-word. If the mask is equal to half the
number of bits in double-word minus 1, the masking was optimized
away, otherwise the AND was kept.
But if the most significant bit isn't clear, e use a word-sized shift
and SHRD instruction, where the former does the modulo and the latter
modulo with 64 / 32 depending on what mode the CPU is in (so 64 for
128-bit doubleword and 32 or 64-bit doubleword). So we can also
optimize away the masking when the mask has all the relevant bits set,
masking with the most significant bit will remain for the cmove
test.
3) as requested, this patch adds a bunch of force_reg calls before
gen_lowpart
4) 1-3 above unfortunately regressed
+FAIL: gcc.target/i386/bt-mask-2.c scan-assembler-not and[lq][ \\t]
+FAIL: gcc.target/i386/pr57819.c scan-assembler-not and[lq][ \\t]
where we during combine match the new pattern we didn't match
before and in the end don't match the pattern we were testing for.
These 2 tests are fixed by the *jcc_bt<mode>_mask_1 pattern
addition and small tweak to target rtx_costs, because even with
the pattern around we'd refuse to match it because it appeared to
have higher instruction cost
2022-06-02 Jakub Jelinek <jakub@redhat.com>
PR target/105778
* config/i386/i386.md (*ashl<dwi>3_doubleword_mask): Remove :SI
from AND and its operands and just verify operands[2] has HImode,
SImode or for TARGET_64BIT DImode. Allow operands[3] to be a mask
with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
just throw away the masking. Use force_reg before calling
gen_lowpart.
(*ashl<dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
just throw away the masking.
(*ashl<mode>3_doubleword): Rename to ...
(ashl<mode>3_doubleword): ... this.
(*ashl<mode>3_mask): Remove :SI from AND and its operands and just
verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
Use force_reg before calling gen_lowpart.
(*<insn><mode>3_mask): Likewise.
(*<insn><dwi>3_doubleword_mask): Likewise. Allow operands[3] to be
a mask with all low 6 (64-bit) or 5 (32-bit) bits set and in that
case just throw away the masking. Use force_reg before calling
gen_lowpart.
(*<insn><dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
just
throw away the masking.
(*<insn><mode>3_doubleword): Rename to ...
(<insn><mode>3_doubleword): ... this.
(*<insn><mode>3_mask): Remove :SI from AND and its operands and
just
verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
Use force_reg before calling gen_lowpart.
(splitter after it): Remove :SI from AND and its operands and just
verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
(*<btsc><mode>_mask, *<btsc><mode>_mask): Remove :SI from AND and
its
operands and just verify operands[1] has HImode, SImode or for
TARGET_64BIT DImode. Use force_reg before calling gen_lowpart.
(*jcc_bt<mode>_mask_1): New define_insn_and_split pattern.
* config/i386/i386.cc (ix86_rtx_costs): For ZERO_EXTRACT with
ZERO_EXTEND QI->SI in last operand ignore the cost of the
ZERO_EXTEND.
* gcc.target/i386/pr105778.c: New test.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105778] Shift by register --- unnecessary AND instruction
2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
` (6 preceding siblings ...)
2022-06-02 8:41 ` cvs-commit at gcc dot gnu.org
@ 2022-06-02 8:44 ` jakub at gcc dot gnu.org
2022-10-24 17:44 ` pinskia at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-06-02 8:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for GCC 13.1 and later.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105778] Shift by register --- unnecessary AND instruction
2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
` (7 preceding siblings ...)
2022-06-02 8:44 ` jakub at gcc dot gnu.org
@ 2022-10-24 17:44 ` pinskia at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-10-24 17:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |13.0
Severity|normal |enhancement
Keywords| |missed-optimization
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-10-24 17:44 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
2022-05-30 22:42 ` [Bug target/105778] Shift " jakub at gcc dot gnu.org
2022-05-31 13:19 ` jakub at gcc dot gnu.org
2022-05-31 13:44 ` ubizjak at gmail dot com
2022-05-31 13:50 ` jakub at gcc dot gnu.org
2022-05-31 14:41 ` jakub at gcc dot gnu.org
2022-05-31 18:53 ` ubizjak at gmail dot com
2022-06-02 8:41 ` cvs-commit at gcc dot gnu.org
2022-06-02 8:44 ` jakub at gcc dot gnu.org
2022-10-24 17:44 ` pinskia at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).