Re: [PATCH] Fix MMX/SSE/AVX* shifts by non-immediate scalar (PR target/80286)

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Uros Bizjak <ubizjak@gmail.com>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Kirill Yukhin <kirill.yukhin@gmail.com>,
		"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] Fix MMX/SSE/AVX* shifts by non-immediate scalar (PR target/80286)
Date: Tue, 04 Apr 2017 06:40:00 -0000	[thread overview]
Message-ID: <CAFULd4bHKjsF-i_uhDXb0AaCw2qtV3JzQSf5wabadnX-FTft-Q@mail.gmail.com> (raw)
In-Reply-To: <20170403203437.GF17461@tucnak>

On Mon, Apr 3, 2017 at 10:34 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> This patch deals just with correctness of vector shifts by scalar
> non-immediate.  The manuals say the shift count is bits [0:63] of
> the corresponding source operand (XMM reg or memory in some cases),
> and if the count is bigger than number of bits - 1 in the vector element,
> it is treated as number of bits shift count.
> We are modelling it as SImode shift count though, the upper 32 bits
> may be random in some cases which causes wrong-code.
> Fixed by using DImode that matches what the insns do.

IIRC, SImode was choosen to simplify GPR->XMM register moves on 32bit
target. It does look this was wrong choice from the correctness point.

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Any thoughts on what to do to generate reasonable code when the shift count
> comes from memory (e.g. as int variable) or is in the low bits of some XMM
> regioster?

The problem with int variable from memory is, that shifts access full
128bits for their count operand, so this is effectively a no-go. If
there is a 128bit count value in memory, we can maybe define shift
pattern with:

(subreg:DI (match_operand:V2DI 2 "general_operand" "xmN,vmN"))

?

> First of all, perhaps we could have some combiner (or peephole) pattern that would
> transform sign-extend from e.g. SI to DI on the shift count into zero-extend
> if there are no other uses of the extension result - if the shift count is
> negative in SImode (or even QImode), then it is already large number and the
> upper 32 bits or more don't really change anything on that.

We can introduce shift patterns with embedded extensions, and split
them to zext + shift. These new patterns can be easily macroized with
any_extend code iterator and SWI124 mode iterator, so we avoid pattern
explosion.

> Then perhaps we could emit pmovzxdq for SSE4.1+ instead of going through
> GPRs and back, or for SSE2 pxor on a scratch reg and punpck* to get it zero
> extended.  Not sure if we want to add =v / vm alternative to
> zero_extendsidi2*, it already has some x but with ?s that prevent the RA
> from using it.  So thoughts on that?

The ? is there to discourage RA from allocating xmm reg (all these
alternatives have * on xmm reg), in effect instructing RA to prefer
GPRs. If the value is already in xmm reg, then I expect ? alternative
will be used. So, yes, v/v alternative as you proposed would be a good
addition to zero_extendsidi alternatives. Please note though that
pmovzxdq operates on a vector value, so memory operands should be
avoided.

>
> 2017-04-03  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/80286
>         * config/i386/i386.c (ix86_expand_args_builtin): If op has scalar
>         int mode, convert_modes it to mode as unsigned, otherwise use
>         lowpart_subreg to mode rather than SImode.
>         * config/i386/sse.md (<mask_codefor>ashr<mode>3<mask_name>,
>         ashr<mode>3, ashr<mode>3<mask_name>, <shift_insn><mode>3<mask_name>):
>         Use DImode instead of SImode for the shift count operand.
>         * config/i386/mmx.md (mmx_ashr<mode>3, mmx_<shift_insn><mode>3):
>         Likewise.
> testsuite/
>         * gcc.target/i386/avx-pr80286.c: New test.
>         * gcc.dg/pr80286.c: New test.

OK for trunk and backports.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2017-04-03 10:40:22.000000000 +0200
> +++ gcc/config/i386/i386.c      2017-04-03 18:31:39.482367634 +0200
> @@ -35582,10 +35582,17 @@ ix86_expand_args_builtin (const struct b
>         {
>           /* SIMD shift insns take either an 8-bit immediate or
>              register as count.  But builtin functions take int as
> -            count.  If count doesn't match, we put it in register.  */
> +            count.  If count doesn't match, we put it in register.
> +            The instructions are using 64-bit count, if op is just
> +            32-bit, zero-extend it, as negative shift counts
> +            are undefined behavior and zero-extension is more
> +            efficient.  */
>           if (!match)
>             {
> -             op = lowpart_subreg (SImode, op, GET_MODE (op));
> +             if (SCALAR_INT_MODE_P (GET_MODE (op)))
> +               op = convert_modes (mode, GET_MODE (op), op, 1);
> +             else
> +               op = lowpart_subreg (mode, op, GET_MODE (op));
>               if (!insn_p->operand[i + 1].predicate (op, mode))
>                 op = copy_to_reg (op);
>             }
> --- gcc/config/i386/sse.md.jj   2017-04-03 13:43:50.179572564 +0200
> +++ gcc/config/i386/sse.md      2017-04-03 18:01:19.713852914 +0200
> @@ -10620,7 +10620,7 @@ (define_insn "<mask_codefor>ashr<mode>3<
>    [(set (match_operand:VI24_AVX512BW_1 0 "register_operand" "=v,v")
>         (ashiftrt:VI24_AVX512BW_1
>           (match_operand:VI24_AVX512BW_1 1 "nonimmediate_operand" "v,vm")
> -         (match_operand:SI 2 "nonmemory_operand" "v,N")))]
> +         (match_operand:DI 2 "nonmemory_operand" "v,N")))]
>    "TARGET_AVX512VL"
>    "vpsra<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
>    [(set_attr "type" "sseishft")
> @@ -10634,7 +10634,7 @@ (define_insn "ashr<mode>3"
>    [(set (match_operand:VI24_AVX2 0 "register_operand" "=x,x")
>         (ashiftrt:VI24_AVX2
>           (match_operand:VI24_AVX2 1 "register_operand" "0,x")
> -         (match_operand:SI 2 "nonmemory_operand" "xN,xN")))]
> +         (match_operand:DI 2 "nonmemory_operand" "xN,xN")))]
>    "TARGET_SSE2"
>    "@
>     psra<ssemodesuffix>\t{%2, %0|%0, %2}
> @@ -10667,7 +10667,7 @@ (define_insn "ashr<mode>3<mask_name>"
>    [(set (match_operand:VI248_AVX512BW_AVX512VL 0 "register_operand" "=v,v")
>         (ashiftrt:VI248_AVX512BW_AVX512VL
>           (match_operand:VI248_AVX512BW_AVX512VL 1 "nonimmediate_operand" "v,vm")
> -         (match_operand:SI 2 "nonmemory_operand" "v,N")))]
> +         (match_operand:DI 2 "nonmemory_operand" "v,N")))]
>    "TARGET_AVX512F"
>    "vpsra<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
>    [(set_attr "type" "sseishft")
> @@ -10681,7 +10681,7 @@ (define_insn "<shift_insn><mode>3<mask_n
>    [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand" "=x,v")
>         (any_lshift:VI2_AVX2_AVX512BW
>           (match_operand:VI2_AVX2_AVX512BW 1 "register_operand" "0,v")
> -         (match_operand:SI 2 "nonmemory_operand" "xN,vN")))]
> +         (match_operand:DI 2 "nonmemory_operand" "xN,vN")))]
>    "TARGET_SSE2 && <mask_mode512bit_condition> && <mask_avx512bw_condition>"
>    "@
>     p<vshift><ssemodesuffix>\t{%2, %0|%0, %2}
> @@ -10700,7 +10700,7 @@ (define_insn "<shift_insn><mode>3<mask_n
>    [(set (match_operand:VI48_AVX2 0 "register_operand" "=x,x,v")
>         (any_lshift:VI48_AVX2
>           (match_operand:VI48_AVX2 1 "register_operand" "0,x,v")
> -         (match_operand:SI 2 "nonmemory_operand" "xN,xN,vN")))]
> +         (match_operand:DI 2 "nonmemory_operand" "xN,xN,vN")))]
>    "TARGET_SSE2 && <mask_mode512bit_condition>"
>    "@
>     p<vshift><ssemodesuffix>\t{%2, %0|%0, %2}
> @@ -10720,7 +10720,7 @@ (define_insn "<shift_insn><mode>3<mask_n
>    [(set (match_operand:VI48_512 0 "register_operand" "=v,v")
>         (any_lshift:VI48_512
>           (match_operand:VI48_512 1 "nonimmediate_operand" "v,m")
> -         (match_operand:SI 2 "nonmemory_operand" "vN,N")))]
> +         (match_operand:DI 2 "nonmemory_operand" "vN,N")))]
>    "TARGET_AVX512F && <mask_mode512bit_condition>"
>    "vp<vshift><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
>    [(set_attr "isa" "avx512f")
> --- gcc/config/i386/mmx.md.jj   2017-04-03 13:43:50.119573339 +0200
> +++ gcc/config/i386/mmx.md      2017-04-03 18:01:19.708852979 +0200
> @@ -930,7 +930,7 @@ (define_insn "mmx_ashr<mode>3"
>    [(set (match_operand:MMXMODE24 0 "register_operand" "=y")
>          (ashiftrt:MMXMODE24
>           (match_operand:MMXMODE24 1 "register_operand" "0")
> -         (match_operand:SI 2 "nonmemory_operand" "yN")))]
> +         (match_operand:DI 2 "nonmemory_operand" "yN")))]
>    "TARGET_MMX"
>    "psra<mmxvecsize>\t{%2, %0|%0, %2}"
>    [(set_attr "type" "mmxshft")
> @@ -944,7 +944,7 @@ (define_insn "mmx_<shift_insn><mode>3"
>    [(set (match_operand:MMXMODE248 0 "register_operand" "=y")
>          (any_lshift:MMXMODE248
>           (match_operand:MMXMODE248 1 "register_operand" "0")
> -         (match_operand:SI 2 "nonmemory_operand" "yN")))]
> +         (match_operand:DI 2 "nonmemory_operand" "yN")))]
>    "TARGET_MMX"
>    "p<vshift><mmxvecsize>\t{%2, %0|%0, %2}"
>    [(set_attr "type" "mmxshft")
> --- gcc/testsuite/gcc.target/i386/avx-pr80286.c.jj      2017-04-03 18:44:07.552698281 +0200
> +++ gcc/testsuite/gcc.target/i386/avx-pr80286.c 2017-04-03 18:43:51.000000000 +0200
> @@ -0,0 +1,26 @@
> +/* PR target/80286 */
> +/* { dg-do run { target avx } } */
> +/* { dg-options "-O2 -mavx" } */
> +
> +#include "avx-check.h"
> +#include <immintrin.h>
> +
> +__m256i m;
> +
> +__attribute__((noinline, noclone)) __m128i
> +foo (__m128i x)
> +{
> +  int s = _mm_cvtsi128_si32 (_mm256_castsi256_si128 (m));
> +  return _mm_srli_epi16 (x, s);
> +}
> +
> +static void
> +avx_test (void)
> +{
> +  __m128i a = (__m128i) (__v8hi) { 1 << 7, 2 << 8, 3 << 9, 4 << 10, 5 << 11, 6 << 12, 7 << 13, 8 << 12 };
> +  m = (__m256i) (__v8si) { 7, 8, 9, 10, 11, 12, 13, 14 };
> +  __m128i c = foo (a);
> +  __m128i b = (__m128i) (__v8hi) { 1, 2 << 1, 3 << 2, 4 << 3, 5 << 4, 6 << 5, 7 << 6, 8 << 5 };
> +  if (__builtin_memcmp (&c, &b, sizeof (__m128i)))
> +    __builtin_abort ();
> +}
> --- gcc/testsuite/gcc.dg/pr80286.c.jj   2017-04-03 18:45:27.574663948 +0200
> +++ gcc/testsuite/gcc.dg/pr80286.c      2017-04-03 18:45:18.386782707 +0200
> @@ -0,0 +1,23 @@
> +/* PR target/80286 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -Wno-psabi" } */
> +
> +typedef int V __attribute__((vector_size (4 * sizeof (int))));
> +
> +__attribute__((noinline, noclone)) V
> +foo (V x, V y)
> +{
> +  return x << y[0];
> +}
> +
> +int
> +main ()
> +{
> +  V x = { 1, 2, 3, 4 };
> +  V y = { 5, 6, 7, 8 };
> +  V z = foo (x, y);
> +  V e = { 1 << 5, 2 << 5, 3 << 5, 4 << 5 };
> +  if (__builtin_memcmp (&z, &e, sizeof (V)))
> +    __builtin_abort ();
> +  return 0;
> +}
>
>         Jakub

next prev parent reply	other threads:[~2017-04-04  6:40 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-03 20:34 Jakub Jelinek
2017-04-04  6:40 ` Uros Bizjak [this message]
2017-04-04 12:01   ` Jakub Jelinek
2017-04-04 12:33     ` Uros Bizjak
2017-04-04 15:09       ` Jakub Jelinek
2017-04-06  7:34         ` Uros Bizjak
2017-04-06  8:40           ` Jakub Jelinek
2017-04-06  8:47             ` Uros Bizjak
2017-04-06  9:56               ` Jakub Jelinek
2017-04-06  8:48             ` Jakub Jelinek
2017-04-06  8:40           ` Uros Bizjak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFULd4bHKjsF-i_uhDXb0AaCw2qtV3JzQSf5wabadnX-FTft-Q@mail.gmail.com \
    --to=ubizjak@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=kirill.yukhin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).