public inbox for gcc-bugs@sourceware.org
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/102986] [12 Regression] ICE: in expand_shift_1, at expmed.c:2671 with a negative shift of a vector
Date: Tue, 02 Nov 2021 08:27:14 +0000	[thread overview]
Message-ID: <bug-102986-4-iMsruPfOL3@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-102986-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102986

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:1188cf5fb7d9c3f0753cdb11d961fe90113991e8

commit r12-4838-g1188cf5fb7d9c3f0753cdb11d961fe90113991e8
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Tue Nov 2 08:23:04 2021 +0000

    x86_64: Expand ashrv1ti (and PR target/102986)

    This patch was originally intended to implement 128-bit arithmetic right
    shifts by constants of vector registers (V1TImode), but while working on
    it I discovered the (my) recent ICE-on-valid regression now known as
    PR target/102986.

    As diagnosed by Jakub, expanders for shifts are not allowed to fail, and
    so any backend that provides a shift optab needs to handle variable amount
    shifts as well as constant shifts [even though the middle-end knows how
    to synthesize these for vector modes].  This constraint could be relaxed
    in the middle-end, but it makes sense to fix this in my erroneous code.
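
    For context, a shift of a V1TImode value by a non-constant amount is
    perfectly valid input, so the expander has to handle it rather than FAIL.
    A sketch of the kind of code that reaches the expander with a variable
    amount (not the actual PR 102986 reproducer) is:

    typedef unsigned __int128 uv1ti __attribute__ ((vector_size (16)));

    uv1ti shift_var (uv1ti x, int n)
    {
      return x >> n;    /* non-constant amount: the expander may not fail */
    }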

    The solution is to change the constraints on the recently added V1TImode
    shift expanders from SImode const_int_operand to QImode general_operand,
    matching the TImode expanders' constraints, and then simply check for
    !CONST_INT_P at the start of the ix86_expand_v1ti_* functions, converting
    the operands from V1TImode to TImode, performing the TImode operation
    and converting the result back to V1TImode.
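
    As a rough user-level analogue of this strategy (a sketch assuming GCC's
    vector extensions, not the backend code itself, with made-up helper
    names), the non-constant case amounts to moving the single 128-bit lane
    into a TImode scalar, shifting there, and moving it back:

    #include <string.h>

    typedef unsigned __int128 u128;
    typedef u128 uv1ti __attribute__ ((vector_size (16)));

    /* Analogue of ix86_expand_v1ti_to_ti: done here via memcpy, whereas
       the backend transfers between registers (see below).  */
    static inline u128 v1ti_to_ti (uv1ti x)
    {
      u128 r;
      memcpy (&r, &x, sizeof r);
      return r;
    }

    /* Analogue of ix86_expand_ti_to_v1ti.  */
    static inline uv1ti ti_to_v1ti (u128 x)
    {
      uv1ti r;
      memcpy (&r, &x, sizeof r);
      return r;
    }

    /* Logical right shift by a non-constant amount, lowered via TImode.  */
    uv1ti v1ti_lshr_var (uv1ti x, unsigned int amount)
    {
      return ti_to_v1ti (v1ti_to_ti (x) >> amount);
    }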

    One nice benefit of this strategy is that it allows us to implement
    Uros' recent suggestion that we should be converting between these modes
    more efficiently, avoiding the use of memory and using the same idiom
    as LLVM, or using pextrq/pinsrq where available.  The new helper functions
    ix86_expand_v1ti_to_ti and ix86_expand_ti_to_v1ti are sufficient to take
    care of this.  Interestingly, partial support for this is already present,
    but x86_64's generic tuning prefers memory transfers to avoid penalizing
    microarchitectures with significant interunit delays.  With these changes
    we now generate both pextrq and pinsrq for -mtune=native.
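
    For illustration, the register-to-register idiom can be written with
    SSE4.1 intrinsics roughly as follows (a hedged sketch to be compiled
    with -msse4.1; the function names are invented for this example and are
    not GCC's internal helpers):

    #include <immintrin.h>

    typedef unsigned __int128 u128;

    /* xmm -> 128-bit scalar: movq for the low half, pextrq for the high.  */
    static inline u128 xmm_to_u128 (__m128i x)
    {
      unsigned long long lo = (unsigned long long) _mm_cvtsi128_si64 (x);
      unsigned long long hi = (unsigned long long) _mm_extract_epi64 (x, 1);
      return ((u128) hi << 64) | lo;
    }

    /* 128-bit scalar -> xmm: movq for the low half, pinsrq for the high.  */
    static inline __m128i u128_to_xmm (u128 x)
    {
      __m128i lo = _mm_cvtsi64_si128 ((long long) x);
      return _mm_insert_epi64 (lo, (long long) (x >> 64), 1);
    }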

    The main body of the patch is to implement arithmetic right shift in
    addition to the logical right shift and left shift implemented in the
    previous patch.  This expander provides no fewer than 13 different code
    sequences, special-casing the different constant shifts, including
    variants taking advantage of TARGET_AVX2 and TARGET_SSE4_1.  The code
    is structured with the faster/shorter sequences at the start, and
    the generic implementations at the end.

    For the record, the implementations are:

    ashr_127:       // Shift by 127, 2 operations, 10 bytes
            pshufd  $255, %xmm0, %xmm0
            psrad   $31, %xmm0
            ret

    ashr_64:        // Shift by 64, 3 operations, 14 bytes
            pshufd  $255, %xmm0, %xmm1
            psrad   $31, %xmm1
            punpckhqdq      %xmm1, %xmm0
            ret

    ashr_96:        // Shift by 96, 3 operations, 18 bytes
            movdqa  %xmm0, %xmm1
            psrad   $31, %xmm1
            punpckhqdq      %xmm1, %xmm0
            pshufd  $253, %xmm0, %xmm0
            ret

    ashr_8:         // Shift by 8/16/24/32 on AVX2, 3 operations, 16 bytes
            vpsrad  $8, %xmm0, %xmm1
            vpsrldq $1, %xmm0, %xmm0
            vpblendd        $7, %xmm0, %xmm1, %xmm0
            ret

    ashr_8:         // Shift by 8/16/24/32 on SSE4.1, 3 operations, 24 bytes
            movdqa  %xmm0, %xmm1
            psrldq  $1, %xmm0
            psrad   $8, %xmm1
            pblendw $63, %xmm0, %xmm1
            movdqa  %xmm1, %xmm0
            ret

    ashr_97:        // Shifts by 97..126, 4 operations, 23 bytes
            movdqa  %xmm0, %xmm1
            psrad   $31, %xmm0
            psrad   $1, %xmm1
            punpckhqdq      %xmm0, %xmm1
            pshufd  $253, %xmm1, %xmm0
            ret

    ashr_48:        // Shifts by 48/80 on SSE4.1, 4 operations, 25 bytes
            movdqa  %xmm0, %xmm1
            pshufd  $255, %xmm0, %xmm0
            psrldq  $6, %xmm1
            psrad   $31, %xmm0
            pblendw $31, %xmm1, %xmm0
            ret

    ashr_8:         // Shifts by multiples of 8, 5 operations, 28 bytes
            movdqa  %xmm0, %xmm1
            pshufd  $255, %xmm0, %xmm0
            psrad   $31, %xmm0
            psrldq  $1, %xmm1
            pslldq  $15, %xmm0
            por     %xmm1, %xmm0
            ret

    ashr_1:         // Shifts by 1..31 on AVX2, 6 operations, 30 bytes
            vpsrldq $8, %xmm0, %xmm2
            vpsrad  $1, %xmm0, %xmm1
            vpsllq  $63, %xmm2, %xmm2
            vpsrlq  $1, %xmm0, %xmm0
            vpor    %xmm2, %xmm0, %xmm0
            vpblendd        $7, %xmm0, %xmm1, %xmm0
            ret

    ashr_1:         // Shifts by 1..15 on SSE4.1, 6 operations, 42 bytes
            movdqa  %xmm0, %xmm2
            movdqa  %xmm0, %xmm1
            psrldq  $8, %xmm2
            psrlq   $1, %xmm0
            psllq   $63, %xmm2
            psrad   $1, %xmm1
            por     %xmm2, %xmm0
            pblendw $63, %xmm0, %xmm1
            movdqa  %xmm1, %xmm0
            ret

    ashr_1:         // Shift by 1, 8 operations, 46 bytes
            movdqa  %xmm0, %xmm1
            movdqa  %xmm0, %xmm2
            psrldq  $8, %xmm2
            psrlq   $63, %xmm1
            psllq   $63, %xmm2
            psrlq   $1, %xmm0
            pshufd  $191, %xmm1, %xmm1
            por     %xmm2, %xmm0
            psllq   $31, %xmm1
            por     %xmm1, %xmm0
            ret

    ashr_65:        // Shifts by 65..95, 8 operations, 42 bytes
            pshufd  $255, %xmm0, %xmm1
            psrldq  $8, %xmm0
            psrad   $31, %xmm1
            psrlq   $1, %xmm0
            movdqa  %xmm1, %xmm2
            psllq   $63, %xmm1
            pslldq  $8, %xmm2
            por     %xmm2, %xmm1
            por     %xmm1, %xmm0
            ret

    ashr_2:         // Shifts from 2..63, 9 operations, 47 bytes
            pshufd  $255, %xmm0, %xmm1
            movdqa  %xmm0, %xmm2
            psrad   $31, %xmm1
            psrldq  $8, %xmm2
            psllq   $62, %xmm2
            psrlq   $2, %xmm0
            pslldq  $8, %xmm1
            por     %xmm2, %xmm0
            psllq   $62, %xmm1
            por     %xmm1, %xmm0
            ret

    To test these changes there are several new test cases.  sse2-v1ti-shift-2.c
    is a compile test designed to spot/catch PR target/102986 [for all shifts
    and rotates by variable amounts], and sse2-v1ti-shift-3.c is an execution
    test to confirm that shifts/rotates by variable amounts produce the same
    results for TImode and V1TImode.  sse2-v1ti-ashiftrt-1.c is a (similar)
    execution test to confirm arithmetic right shifts by different constants
    produce identical results between TImode and V1TImode.
    sse2-v1ti-ashiftrt-[23].c are duplicates of this file as compilation tests
    specifying -mavx2 and -msse4.1 respectively to trigger all the paths
    through the new expander.
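
    As a rough illustration of what such an equivalence test checks (a
    sketch in the spirit of those files, not the actual testsuite sources),
    the idea is simply to compare the V1TImode result against the TImode
    result over a range of shift amounts:

    #include <stdlib.h>
    #include <string.h>

    typedef __int128 ti;
    typedef ti v1ti __attribute__ ((vector_size (16)));

    static ti v1ti_get (v1ti x) { ti r; memcpy (&r, &x, sizeof r); return r; }
    static v1ti v1ti_set (ti x) { v1ti r; memcpy (&r, &x, sizeof r); return r; }

    /* The arithmetic right shift of the one-element vector must match the
       arithmetic right shift of the equivalent __int128 scalar.  */
    static void check (ti val, int amount)
    {
      v1ti v = v1ti_set (val);
      if (v1ti_get (v >> amount) != (val >> amount))
        abort ();
    }

    int main (void)
    {
      ti val = ((ti) 0x0123456789abcdefULL << 64) | 0xfedcba9876543210ULL;
      for (int i = 0; i < 128; i++)
        check (val, i);
      for (int i = 0; i < 128; i++)
        check (-val, i);
      return 0;
    }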

    2021-11-02  Roger Sayle  <roger@nextmovesoftware.com>
                Jakub Jelinek  <jakub@redhat.com>

    gcc/ChangeLog
            PR target/102986
            * config/i386/i386-expand.c (ix86_expand_v1ti_to_ti,
            ix86_expand_ti_to_v1ti): New helper functions.
            (ix86_expand_v1ti_shift): Check if the amount operand is an
            integer constant, and expand as a TImode shift if it isn't.
            (ix86_expand_v1ti_rotate): Check if the amount operand is an
            integer constant, and expand as a TImode rotate if it isn't.
            (ix86_expand_v1ti_ashiftrt): New function to expand arithmetic
            right shifts of V1TImode quantities.
            * config/i386/i386-protos.h (ix86_expand_v1ti_ashiftrt): Prototype.
            * config/i386/sse.md (ashlv1ti3, lshrv1ti3): Change constraints
            to QImode general_operand, and let the helper functions lower
            shifts by non-constant operands as TImode shifts.  Make
            conditional on TARGET_64BIT.
            (ashrv1ti3): New expander calling ix86_expand_v1ti_ashiftrt.
            (rotlv1ti3, rotrv1ti3): Change shift operand to QImode.
            Make conditional on TARGET_64BIT.

    gcc/testsuite/ChangeLog
            PR target/102986
            * gcc.target/i386/sse2-v1ti-ashiftrt-1.c: New test case.
            * gcc.target/i386/sse2-v1ti-ashiftrt-2.c: New test case.
            * gcc.target/i386/sse2-v1ti-ashiftrt-3.c: New test case.
            * gcc.target/i386/sse2-v1ti-shift-2.c: New test case.
            * gcc.target/i386/sse2-v1ti-shift-3.c: New test case.


Thread overview: 9+ messages
2021-10-28 12:32 [Bug rtl-optimization/102986] New: " zsojka at seznam dot cz
2021-10-28 18:23 ` [Bug rtl-optimization/102986] " jakub at gcc dot gnu.org
2021-10-29 10:02 ` jakub at gcc dot gnu.org
2021-10-29 10:05 ` jakub at gcc dot gnu.org
2021-10-29 11:02 ` [Bug target/102986] " roger at nextmovesoftware dot com
2021-10-30 10:21 ` roger at nextmovesoftware dot com
2021-11-02  8:27 ` cvs-commit at gcc dot gnu.org [this message]
2021-11-03 13:53 ` roger at nextmovesoftware dot com
2022-03-23  9:30 ` cvs-commit at gcc dot gnu.org
