[Bug target/105778] New: Rotate by register --- unnecessary AND instruction

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/105778] New: Rotate by register --- unnecessary AND instruction
@ 2022-05-30 21:21 zero at smallinteger dot com
  2022-05-30 22:42 ` [Bug target/105778] Shift " jakub at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: zero at smallinteger dot com @ 2022-05-30 21:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778

            Bug ID: 105778
           Summary: Rotate by register --- unnecessary AND instruction
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zero at smallinteger dot com
  Target Milestone: ---

Created attachment 53053
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53053&action=edit
Sample code

With -O2, some x86 shift-by-register instructions are preceded by an
unnecessary AND instruction.  The AND instruction is unnecessary because the
shift-by-register instructions already mask the register containing the
variable shift.

In the sample code, the #if 0 branch produces the code

        mov     rax, rdi
        mov     ecx, edi
        shr     rax, cl
        ret

but the #if 1 branch produces the code

        mov     rcx, rdi
        mov     rax, rdi
        and     ecx, 63
        shr     rax, cl
        ret

even though the code has the same behavior.  Note that the and ecx, 63 is
unnecessary here because shr rax, cl will already operate on the bottom 6 bits
of ecx anyway, as per the Intel manual.

As notated in the code's comments, some explicit masks other than 0x3f may
produce even more inefficient code, e.g.:

        movabs  rcx, 35184372088831
        mov     rax, rdi
        and     rcx, rdi
        shr     rax, cl
        ret

while some other masks like 0xff and 0xffff eliminate the explicit and
altogether.

Found with gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0.  Verified with godbolt
for all gcc versions from 9.4.0 through trunk.

For the sake of completeness, I could not get clang to reproduce this problem. 
The latest classic ICC compiler available in Godbolt (2021.5.0) can emit code
with MOVABS as above.  However, the newer ICX Intel compiler behaves like clang
(this seems reasonable).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105778] Shift by register --- unnecessary AND instruction
  2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
@ 2022-05-30 22:42 ` jakub at gcc dot gnu.org
  2022-05-31 13:19 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-30 22:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think:
--- gcc/config/i386/i386.md.jj  2022-05-30 14:07:11.988199636 +0200
+++ gcc/config/i386/i386.md     2022-05-31 00:39:08.031757037 +0200
@@ -12708,19 +12708,21 @@ (define_expand "<insn><mode>3"
   ""
   "ix86_expand_binary_operator (<CODE>, <MODE>mode, operands); DONE;")

+(define_mode_iterator SWI48A [SI (DI "TARGET_64BIT")])
+
 ;; Avoid useless masking of count operand.
-(define_insn_and_split "*<insn><mode>3_mask"
+(define_insn_and_split "*<insn><SWI48:mode>3_mask_<SWI48A:mode>"
   [(set (match_operand:SWI48 0 "nonimmediate_operand")
        (any_shiftrt:SWI48
          (match_operand:SWI48 1 "nonimmediate_operand")
          (subreg:QI
-           (and:SI
-             (match_operand:SI 2 "register_operand" "c,r")
-             (match_operand:SI 3 "const_int_operand")) 0)))
+           (and:SWI48A
+             (match_operand:SWI48A 2 "register_operand" "c,r")
+             (match_operand:SWI48A 3 "const_int_operand")) 0)))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
-   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
-      == GET_MODE_BITSIZE (<MODE>mode)-1
+  "ix86_binary_operator_ok (<CODE>, <SWI48:MODE>mode, operands)
+   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<SWI48:MODE>mode)-1))
+      == GET_MODE_BITSIZE (<SWI48:MODE>mode)-1
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -12754,16 +12756,16 @@ (define_insn_and_split "*<insn><mode>3_m
   ""
   [(set_attr "isa" "*,bmi2")])

-(define_insn_and_split "*<insn><dwi>3_doubleword_mask"
-  [(set (match_operand:<DWI> 0 "register_operand")
-       (any_shiftrt:<DWI>
-         (match_operand:<DWI> 1 "register_operand")
+(define_insn_and_split "*<insn><DWIH:dwi>3_doubleword_mask_<SWI48:mode>"
+  [(set (match_operand:<DWIH:DWI> 0 "register_operand")
+       (any_shiftrt:<DWIH:DWI>
+         (match_operand:<DWIH:DWI> 1 "register_operand")
          (subreg:QI
-           (and:SI
-             (match_operand:SI 2 "register_operand" "c")
-             (match_operand:SI 3 "const_int_operand")) 0)))
+           (and:SWI48
+             (match_operand:SWI48 2 "register_operand" "c")
+             (match_operand:SWI48 3 "const_int_operand")) 0)))
    (clobber (reg:CC FLAGS_REG))]
-  "(INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT)) == 0
+  "(INTVAL (operands[3]) & (<DWIH:MODE_SIZE> * BITS_PER_UNIT)) == 0
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -12772,7 +12774,8 @@ (define_insn_and_split "*<insn><dwi>3_do
           (ior:DWIH (lshiftrt:DWIH (match_dup 4)
                       (and:QI (match_dup 2) (match_dup 8)))
                     (subreg:DWIH
-                      (ashift:<DWI> (zero_extend:<DWI> (match_dup 7))
+                      (ashift:<DWIH:DWI>
+                        (zero_extend:<DWIH:DWI> (match_dup 7))
                         (minus:QI (match_dup 9)
                                   (and:QI (match_dup 2) (match_dup 8)))) 0)))
       (clobber (reg:CC FLAGS_REG))])
@@ -12781,13 +12784,14 @@ (define_insn_and_split "*<insn><dwi>3_do
           (any_shiftrt:DWIH (match_dup 7) (match_dup 2)))
       (clobber (reg:CC FLAGS_REG))])]
 {
-  split_double_mode (<DWI>mode, &operands[0], 2, &operands[4], &operands[6]);
+  split_double_mode (<DWIH:DWI>mode, &operands[0], 2, &operands[4],
+                    &operands[6]);

-  operands[8] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - 1);
-  operands[9] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT);
+  operands[8] = GEN_INT (<DWIH:MODE_SIZE> * BITS_PER_UNIT - 1);
+  operands[9] = GEN_INT (<DWIH:MODE_SIZE> * BITS_PER_UNIT);

-  if ((INTVAL (operands[3]) & ((<MODE_SIZE> * BITS_PER_UNIT) - 1))
-      != ((<MODE_SIZE> * BITS_PER_UNIT) - 1))
+  if ((INTVAL (operands[3]) & ((<DWIH:MODE_SIZE> * BITS_PER_UNIT) - 1))
+      != ((<DWIH:MODE_SIZE> * BITS_PER_UNIT) - 1))
     {
       rtx tem = gen_reg_rtx (SImode);
       emit_insn (gen_andsi3 (tem, operands[2], operands[3]));
could fix this.  Wonder if it couldn't be written without the extra iterator
though...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105778] Shift by register --- unnecessary AND instruction
  2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
  2022-05-30 22:42 ` [Bug target/105778] Shift " jakub at gcc dot gnu.org
@ 2022-05-31 13:19 ` jakub at gcc dot gnu.org
  2022-05-31 13:44 ` ubizjak at gmail dot com
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-31 13:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2022-05-31
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 53058
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53058&action=edit
gcc13-pr105778.patch

Full untested patch.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105778] Shift by register --- unnecessary AND instruction
  2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
  2022-05-30 22:42 ` [Bug target/105778] Shift " jakub at gcc dot gnu.org
  2022-05-31 13:19 ` jakub at gcc dot gnu.org
@ 2022-05-31 13:44 ` ubizjak at gmail dot com
  2022-05-31 13:50 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2022-05-31 13:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
Comment on attachment 53058
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53058
gcc13-pr105778.patch

>+      operands[2] = gen_lowpart (QImode, operands[2]);

We have learned that pre-reload splits need to move operand to a register
before calling gen_lowpart.

>+      emit_insn (gen_ashl<dwi>3_doubleword (operands[0], operands[1],
>+					    operands[2]));
>+      DONE;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105778] Shift by register --- unnecessary AND instruction
  2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
                   ` (2 preceding siblings ...)
  2022-05-31 13:44 ` ubizjak at gmail dot com
@ 2022-05-31 13:50 ` jakub at gcc dot gnu.org
  2022-05-31 14:41 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-31 13:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
It is the same thing done a few lines later in the preexisting code too.
Shall I all of those change to gen_lowpart (QImode, force_reg (GET_MODE
(operands[2]), operands[2])) then?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105778] Shift by register --- unnecessary AND instruction
  2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
                   ` (3 preceding siblings ...)
  2022-05-31 13:50 ` jakub at gcc dot gnu.org
@ 2022-05-31 14:41 ` jakub at gcc dot gnu.org
  2022-05-31 18:53 ` ubizjak at gmail dot com
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-31 14:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #53058|0                           |1
        is obsolete|                            |

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 53059
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53059&action=edit
gcc13-pr105778.patch

Adjusted patch.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105778] Shift by register --- unnecessary AND instruction
  2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
                   ` (4 preceding siblings ...)
  2022-05-31 14:41 ` jakub at gcc dot gnu.org
@ 2022-05-31 18:53 ` ubizjak at gmail dot com
  2022-06-02  8:41 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2022-05-31 18:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #4)
> It is the same thing done a few lines later in the preexisting code too.
> Shall I all of those change to gen_lowpart (QImode, force_reg (GET_MODE
> (operands[2]), operands[2])) then?

Yes, please. A "register_operand" predicate accepts SUBREGs of some strange
mode (usually FP mode) registers, and gen_lowpart chokes on these.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105778] Shift by register --- unnecessary AND instruction
  2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
                   ` (5 preceding siblings ...)
  2022-05-31 18:53 ` ubizjak at gmail dot com
@ 2022-06-02  8:41 ` cvs-commit at gcc dot gnu.org
  2022-06-02  8:44 ` jakub at gcc dot gnu.org
  2022-10-24 17:44 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-06-02  8:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:dcfdd2851b297e0005a8490b7f867ca45d1ad340

commit r13-927-gdcfdd2851b297e0005a8490b7f867ca45d1ad340
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Jun 2 10:40:12 2022 +0200

    i386: Optimize away shift count masking of shifts/rotates some more
[PR105778]

    As the following testcase shows, our x86 backend support for optimizing
    out useless masking of shift/rotate counts when using instructions
    that naturally modulo the count themselves is insufficient.
    The *_mask define_insn_and_split patterns use
    (subreg:QI (and:SI (match_operand:SI) (match_operand "const_int_operand")))
    for the masking, but that can catch only the case where the masking
    is done in SImode, so typically in SImode in the source.
    We then have another set of patterns, *_mask_1, which use
    (and:QI (match_operand:QI) (match_operand "const_int_operand"))
    If the masking is done in DImode or in theory in HImode, we don't match
    it.
    The following patch does 4 different things to improve this:
    1) drops the mode from AND and MATCH_OPERAND inside of the subreg:QI
       and replaces that by checking that the register shift count has
       SWI48 mode - I think doing it this way is cheaper than adding
       another mode iterator to patterns which use already another mode
       iterator and sometimes a code iterator as well
    2) the doubleword shift patterns were only handling the case where
       the shift count is masked with a constant that has the most significant
       bit clear, i.e. where we know the shift count is less than half the
       number of bits in double-word.  If the mask is equal to half the
       number of bits in double-word minus 1, the masking was optimized
       away, otherwise the AND was kept.
       But if the most significant bit isn't clear, e use a word-sized shift
       and SHRD instruction, where the former does the modulo and the latter
       modulo with 64 / 32 depending on what mode the CPU is in (so 64 for
       128-bit doubleword and 32 or 64-bit doubleword).  So we can also
       optimize away the masking when the mask has all the relevant bits set,
       masking with the most significant bit will remain for the cmove
       test.
    3) as requested, this patch adds a bunch of force_reg calls before
       gen_lowpart
    4) 1-3 above unfortunately regressed
       +FAIL: gcc.target/i386/bt-mask-2.c scan-assembler-not and[lq][ \\t]
       +FAIL: gcc.target/i386/pr57819.c scan-assembler-not and[lq][ \\t]
       where we during combine match the new pattern we didn't match
       before and in the end don't match the pattern we were testing for.
       These 2 tests are fixed by the *jcc_bt<mode>_mask_1 pattern
       addition and small tweak to target rtx_costs, because even with
       the pattern around we'd refuse to match it because it appeared to
       have higher instruction cost

    2022-06-02  Jakub Jelinek  <jakub@redhat.com>

            PR target/105778
            * config/i386/i386.md (*ashl<dwi>3_doubleword_mask): Remove :SI
            from AND and its operands and just verify operands[2] has HImode,
            SImode or for TARGET_64BIT DImode.  Allow operands[3] to be a mask
            with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
            just throw away the masking.  Use force_reg before calling
            gen_lowpart.
            (*ashl<dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
            with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
            just throw away the masking.
            (*ashl<mode>3_doubleword): Rename to ...
            (ashl<mode>3_doubleword): ... this.
            (*ashl<mode>3_mask): Remove :SI from AND and its operands and just
            verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
            Use force_reg before calling gen_lowpart.
            (*<insn><mode>3_mask): Likewise.
            (*<insn><dwi>3_doubleword_mask): Likewise.  Allow operands[3] to be
            a mask with all low 6 (64-bit) or 5 (32-bit) bits set and in that
            case just throw away the masking.  Use force_reg before calling
            gen_lowpart.
            (*<insn><dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
            with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
just
            throw away the masking.
            (*<insn><mode>3_doubleword): Rename to ...
            (<insn><mode>3_doubleword): ... this.
            (*<insn><mode>3_mask): Remove :SI from AND and its operands and
just
            verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
            Use force_reg before calling gen_lowpart.
            (splitter after it): Remove :SI from AND and its operands and just
            verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
            (*<btsc><mode>_mask, *<btsc><mode>_mask): Remove :SI from AND and
its
            operands and just verify operands[1] has HImode, SImode or for
            TARGET_64BIT DImode.  Use force_reg before calling gen_lowpart.
            (*jcc_bt<mode>_mask_1): New define_insn_and_split pattern.
            * config/i386/i386.cc (ix86_rtx_costs): For ZERO_EXTRACT with
            ZERO_EXTEND QI->SI in last operand ignore the cost of the
ZERO_EXTEND.

            * gcc.target/i386/pr105778.c: New test.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105778] Shift by register --- unnecessary AND instruction
  2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
                   ` (6 preceding siblings ...)
  2022-06-02  8:41 ` cvs-commit at gcc dot gnu.org
@ 2022-06-02  8:44 ` jakub at gcc dot gnu.org
  2022-10-24 17:44 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-06-02  8:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for GCC 13.1 and later.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105778] Shift by register --- unnecessary AND instruction
  2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
                   ` (7 preceding siblings ...)
  2022-06-02  8:44 ` jakub at gcc dot gnu.org
@ 2022-10-24 17:44 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-10-24 17:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.0
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-10-24 17:44 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-30 21:21 [Bug target/105778] New: Rotate by register --- unnecessary AND instruction zero at smallinteger dot com
2022-05-30 22:42 ` [Bug target/105778] Shift " jakub at gcc dot gnu.org
2022-05-31 13:19 ` jakub at gcc dot gnu.org
2022-05-31 13:44 ` ubizjak at gmail dot com
2022-05-31 13:50 ` jakub at gcc dot gnu.org
2022-05-31 14:41 ` jakub at gcc dot gnu.org
2022-05-31 18:53 ` ubizjak at gmail dot com
2022-06-02  8:41 ` cvs-commit at gcc dot gnu.org
2022-06-02  8:44 ` jakub at gcc dot gnu.org
2022-10-24 17:44 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).