From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/105778] Shift by register --- unnecessary AND instruction
Date: Thu, 02 Jun 2022 08:41:58 +0000	[thread overview]
Message-ID: <bug-105778-4-EJVVCY0vYc@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-105778-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105778

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:dcfdd2851b297e0005a8490b7f867ca45d1ad340

commit r13-927-gdcfdd2851b297e0005a8490b7f867ca45d1ad340
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Jun 2 10:40:12 2022 +0200

    i386: Optimize away shift count masking of shifts/rotates some more [PR105778]

    As the following testcase shows, our x86 backend's support for optimizing
    out useless masking of shift/rotate counts when using instructions
    that naturally apply the modulo to the count themselves is insufficient.
    The *_mask define_insn_and_split patterns use
    (subreg:QI (and:SI (match_operand:SI) (match_operand "const_int_operand")))
    for the masking, but that only catches the case where the masking
    is done in SImode, i.e. typically when the count is an int in the source.
    We then have another set of patterns, *_mask_1, which use
    (and:QI (match_operand:QI) (match_operand "const_int_operand")).
    If the masking is done in DImode (or, in theory, HImode), we don't
    match it.
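
    For illustration, here is a minimal editor's sketch (an assumed example,
    not the actual gcc.target/i386/pr105778.c testcase): the shift count has
    DImode, so the AND below is exactly the kind of masking the patterns
    failed to match, even though the hardware shift already reduces the
    count mod 64.

        /* With -O2 on x86-64 the "& 63" should not survive as a separate
           AND instruction: SHL only looks at the low 6 bits of the count
           in %cl anyway.  */
        unsigned long long
        shl (unsigned long long x, unsigned long long n)
        {
          return x << (n & 63);
        }
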
    The following patch does 4 different things to improve this:
    1) drops the mode from AND and MATCH_OPERAND inside of the subreg:QI
       and replaces that by checking that the register shift count has
       SWI48 mode - I think doing it this way is cheaper than adding
       another mode iterator to patterns which already use another mode
       iterator and sometimes a code iterator as well
    2) the doubleword shift patterns were only handling the case where
       the shift count is masked with a constant that has the most significant
       bit clear, i.e. where we know the shift count is less than half the
       number of bits in the double-word.  If the mask is equal to half the
       number of bits in the double-word minus 1, the masking was optimized
       away, otherwise the AND was kept.
       But if the most significant bit isn't clear, we use a word-sized shift
       and an SHRD instruction, where the former does the modulo itself and
       the latter a modulo with 64 / 32 depending on what mode the CPU is in
       (so 64 for a 128-bit doubleword and 32 for a 64-bit doubleword).  So
       we can also optimize away the masking when the mask has all the
       relevant bits set; the masking with the most significant bit will
       remain for the cmove test (see the sketch after this list)
    3) as requested, this patch adds a bunch of force_reg calls before
       gen_lowpart
    4) 1-3 above unfortunately regressed
       +FAIL: gcc.target/i386/bt-mask-2.c scan-assembler-not and[lq][ \\t]
       +FAIL: gcc.target/i386/pr57819.c scan-assembler-not and[lq][ \\t]
       because during combine we now match the new pattern we didn't match
       before and in the end don't match the pattern those tests were looking
       for.  These two tests are fixed by the addition of the
       *jcc_bt<mode>_mask_1 pattern and a small tweak to the target's
       rtx_costs, because even with the pattern around we'd refuse to match
       it, as it appeared to have a higher instruction cost
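
    As referenced in item 2 above, here is an editor's sketch of the
    doubleword case (again an assumed example, not the committed testcase):
    the mask 127 has the most significant bit of the 128-bit shift count
    set, so the AND used to be kept; after the patch the masking can be
    dropped from the shifts themselves and only bit 6, which selects the
    halves, is kept for the cmove.

        /* TImode shift on x86-64: the word-sized SHL and the SHRD both
           reduce their count mod 64, so of the "& 127" only bit 6
           (which half ends up where) actually matters.  */
        unsigned __int128
        shl128 (unsigned __int128 x, unsigned int n)
        {
          return x << (n & 127);
        }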

    2022-06-02  Jakub Jelinek  <jakub@redhat.com>

            PR target/105778
            * config/i386/i386.md (*ashl<dwi>3_doubleword_mask): Remove :SI
            from AND and its operands and just verify operands[2] has HImode,
            SImode or for TARGET_64BIT DImode.  Allow operands[3] to be a mask
            with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
            just throw away the masking.  Use force_reg before calling
            gen_lowpart.
            (*ashl<dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
            with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
            just throw away the masking.
            (*ashl<mode>3_doubleword): Rename to ...
            (ashl<mode>3_doubleword): ... this.
            (*ashl<mode>3_mask): Remove :SI from AND and its operands and just
            verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
            Use force_reg before calling gen_lowpart.
            (*<insn><mode>3_mask): Likewise.
            (*<insn><dwi>3_doubleword_mask): Likewise.  Allow operands[3] to be
            a mask with all low 6 (64-bit) or 5 (32-bit) bits set and in that
            case just throw away the masking.  Use force_reg before calling
            gen_lowpart.
            (*<insn><dwi>3_doubleword_mask_1): Allow operands[3] to be a mask
            with all low 6 (64-bit) or 5 (32-bit) bits set and in that case
            just throw away the masking.
            (*<insn><mode>3_doubleword): Rename to ...
            (<insn><mode>3_doubleword): ... this.
            (*<insn><mode>3_mask): Remove :SI from AND and its operands and
            just verify operands[2] has HImode, SImode or for TARGET_64BIT
            DImode.  Use force_reg before calling gen_lowpart.
            (splitter after it): Remove :SI from AND and its operands and just
            verify operands[2] has HImode, SImode or for TARGET_64BIT DImode.
            (*<btsc><mode>_mask, *btr<mode>_mask): Remove :SI from AND and
            its operands and just verify operands[1] has HImode, SImode or
            for TARGET_64BIT DImode.  Use force_reg before calling
            gen_lowpart.
            (*jcc_bt<mode>_mask_1): New define_insn_and_split pattern.
            * config/i386/i386.cc (ix86_rtx_costs): For ZERO_EXTRACT with
            ZERO_EXTEND QI->SI in last operand ignore the cost of the
            ZERO_EXTEND.

            * gcc.target/i386/pr105778.c: New test.


Thread overview: 10+ messages
2022-05-30 21:21 [Bug target/105778] New: Rotate " zero at smallinteger dot com
2022-05-30 22:42 ` [Bug target/105778] Shift " jakub at gcc dot gnu.org
2022-05-31 13:19 ` jakub at gcc dot gnu.org
2022-05-31 13:44 ` ubizjak at gmail dot com
2022-05-31 13:50 ` jakub at gcc dot gnu.org
2022-05-31 14:41 ` jakub at gcc dot gnu.org
2022-05-31 18:53 ` ubizjak at gmail dot com
2022-06-02  8:41 ` cvs-commit at gcc dot gnu.org [this message]
2022-06-02  8:44 ` jakub at gcc dot gnu.org
2022-10-24 17:44 ` pinskia at gcc dot gnu.org
