* [PATCH 1/2, x86] Add palignr support for AVX2.
@ 2014-04-28 16:51 Evgeny Stupachenko
  2014-04-28 17:29 ` H.J. Lu
  2014-04-28 18:15 ` Richard Henderson
  0 siblings, 2 replies; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-04-28 16:51 UTC (permalink / raw)
  To: GCC Patches, Richard Biener, Uros Bizjak

Hi,

The patch enables the use of "palignr with perm" instead of "2 pshufb, or
and perm" on AVX2 for some cases.

Bootstrapped and passes make check on x86.

Is it ok?

2014-04-28  Evgeny Stupachenko  <evstupac@gmail.com>

        * config/i386/i386.c (expand_vec_perm_1): Try AVX2 vpshufb.
        * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
        PALIGNR instruction.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 88142a8..ae80477 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -42807,6 +42807,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
   return true;
 }

+static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
    in a single instruction.  */

@@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_pshufb (d))
     return true;

+  /* Try the AVX2 vpshufb.  */
+  if (expand_vec_perm_vpshufb2_vpermq (d))
+    return true;
+
   /* Try the AVX512F vpermi2 instructions.  */
   rtx vec[64];
   enum machine_mode mode = d->vmode;
@@ -43004,7 +43010,7 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d)
 }

 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
-   the permutation using the SSSE3 palignr instruction.  This succeeds
+   the permutation using the SSSE3/AVX2 palignr instruction.  This succeeds
    when all of the elements in PERM fit within one vector and we merely
    need to shift them down so that a single vector permutation has a
    chance to succeed.  */
@@ -43015,14 +43021,20 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
   unsigned i, nelt = d->nelt;
   unsigned min, max;
   bool in_order, ok;
-  rtx shift, target;
+  rtx shift, shift1, target, tmp;
   struct expand_vec_perm_d dcopy;

-  /* Even with AVX, palignr only operates on 128-bit vectors.  */
-  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+  /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions:
+     PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.
+     PALIGNR of 2 128-bit registers takes only 1 instruction.  */
+  if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 &&
+      GET_MODE_SIZE (d->vmode) != 32))
+    return false;
+  /* Only AVX2 or higher supports PALIGNR on 256-bit registers.  */
+  if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32))
     return false;

-  min = nelt, max = 0;
+  min = 2 * nelt, max = 0;
   for (i = 0; i < nelt; ++i)
     {
       unsigned e = d->perm[i];
@@ -43041,9 +43053,34 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)

   dcopy = *d;
   shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
-  target = gen_reg_rtx (TImode);
-  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
-                                 gen_lowpart (TImode, d->op0), shift));
+  shift1 = GEN_INT ((min - nelt / 2) *
+          GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+
+  if (GET_MODE_SIZE (d->vmode) != 32)
+    {
+      target = gen_reg_rtx (TImode);
+      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
+                                     gen_lowpart (TImode, d->op0), shift));
+    }
+  else
+    {
+      target = gen_reg_rtx (V2TImode);
+      tmp = gen_reg_rtx (V4DImode);
+      emit_insn (gen_avx2_permv2ti (tmp,
+                                   gen_lowpart (V4DImode, d->op0),
+                                   gen_lowpart (V4DImode, d->op1),
+                                   GEN_INT (33)));
+      if (min < nelt / 2)
+        emit_insn (gen_avx2_palignrv2ti (target,
+                                        gen_lowpart (V2TImode, tmp),
+                                        gen_lowpart (V2TImode, d->op0),
+                                        shift));
+      else
+       emit_insn (gen_avx2_palignrv2ti (target,
+                                        gen_lowpart (V2TImode, d->op1),
+                                        gen_lowpart (V2TImode, tmp),
+                                        shift1));
+    }

   dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
   dcopy.one_operand_p = true;


Evgeny


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-04-28 16:51 [PATCH 1/2, x86] Add palignr support for AVX2 Evgeny Stupachenko
@ 2014-04-28 17:29 ` H.J. Lu
  2014-04-28 20:43   ` Evgeny Stupachenko
  2014-04-28 18:15 ` Richard Henderson
  1 sibling, 1 reply; 40+ messages in thread
From: H.J. Lu @ 2014-04-28 17:29 UTC (permalink / raw)
  To: Evgeny Stupachenko; +Cc: GCC Patches, Richard Biener, Uros Bizjak

On Mon, Apr 28, 2014 at 9:48 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Hi,
>
> The patch enables the use of "palignr with perm" instead of "2 pshufb, or
> and perm" on AVX2 for some cases.
>
> Bootstrapped and passes make check on x86.
>
> Is it ok?
>
> 2014-04-28  Evgeny Stupachenko  <evstupac@gmail.com>
>
>         * config/i386/i386.c (expand_vec_perm_1): Try AVX2 vpshufb.
>         * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
>         PALIGNR instruction.

Can you add testcases to verify that AVX2 vpshufb and palignr are
properly generated?

> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 88142a8..ae80477 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -42807,6 +42807,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
>    return true;
>  }
>
> +static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);
> +
>  /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
>     in a single instruction.  */
>
> @@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
>    if (expand_vec_perm_pshufb (d))
>      return true;
>
> +  /* Try the AVX2 vpshufb.  */
> +  if (expand_vec_perm_vpshufb2_vpermq (d))
> +    return true;
> +
>    /* Try the AVX512F vpermi2 instructions.  */
>    rtx vec[64];
>    enum machine_mode mode = d->vmode;
> @@ -43004,7 +43010,7 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d)
>  }
>
>  /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
> -   the permutation using the SSSE3 palignr instruction.  This succeeds
> +   the permutation using the SSSE3/AVX2 palignr instruction.  This succeeds
>     when all of the elements in PERM fit within one vector and we merely
>     need to shift them down so that a single vector permutation has a
>     chance to succeed.  */
> @@ -43015,14 +43021,20 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>    unsigned i, nelt = d->nelt;
>    unsigned min, max;
>    bool in_order, ok;
> -  rtx shift, target;
> +  rtx shift, shift1, target, tmp;
>    struct expand_vec_perm_d dcopy;
>
> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
> +  /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions:
> +     PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.
> +     PALIGNR of 2 128-bit registers takes only 1 instruction.  */
> +  if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 &&

                       ^^^ This should be only on the next line.

> +      GET_MODE_SIZE (d->vmode) != 32))
> +    return false;
> +  /* Only AVX2 or higher supports PALIGNR on 256-bit registers.  */
> +  if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32))
                                         ^ No need for '(' here.

Or you can use

if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
     && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32))

>      return false;
>
> -  min = nelt, max = 0;
> +  min = 2 * nelt, max = 0;
^^^^^^^^^^^^^^^^^^^^^^^^^^  Will this change 128-bit vector code?

>    for (i = 0; i < nelt; ++i)
>      {
>        unsigned e = d->perm[i];
> @@ -43041,9 +43053,34 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>
>    dcopy = *d;
>    shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
> -  target = gen_reg_rtx (TImode);
> -  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
> -                                 gen_lowpart (TImode, d->op0), shift));
> +  shift1 = GEN_INT ((min - nelt / 2) *
> +          GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
> +
> +  if (GET_MODE_SIZE (d->vmode) != 32)
> +    {
> +      target = gen_reg_rtx (TImode);
> +      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
> +                                     gen_lowpart (TImode, d->op0), shift));
> +    }
> +  else
> +    {
> +      target = gen_reg_rtx (V2TImode);
> +      tmp = gen_reg_rtx (V4DImode);
> +      emit_insn (gen_avx2_permv2ti (tmp,
> +                                   gen_lowpart (V4DImode, d->op0),
> +                                   gen_lowpart (V4DImode, d->op1),
> +                                   GEN_INT (33)));
> +      if (min < nelt / 2)
> +        emit_insn (gen_avx2_palignrv2ti (target,
> +                                        gen_lowpart (V2TImode, tmp),
> +                                        gen_lowpart (V2TImode, d->op0),
> +                                        shift));
> +      else
> +       emit_insn (gen_avx2_palignrv2ti (target,
> +                                        gen_lowpart (V2TImode, d->op1),
> +                                        gen_lowpart (V2TImode, tmp),
> +                                        shift1));
> +    }
>
>    dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
>    dcopy.one_operand_p = true;
>
>
> Evgeny



-- 
H.J.


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-04-28 16:51 [PATCH 1/2, x86] Add palignr support for AVX2 Evgeny Stupachenko
  2014-04-28 17:29 ` H.J. Lu
@ 2014-04-28 18:15 ` Richard Henderson
  2014-04-28 20:57   ` Evgeny Stupachenko
  1 sibling, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2014-04-28 18:15 UTC (permalink / raw)
  To: Evgeny Stupachenko, GCC Patches, Richard Biener, Uros Bizjak

On 04/28/2014 09:48 AM, Evgeny Stupachenko wrote:
> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
> +  /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions:
> +     PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.
> +     PALIGNR of 2 128-bit registers takes only 1 instruction.  */
> +  if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 &&
> +      GET_MODE_SIZE (d->vmode) != 32))
> +    return false;
> +  /* Only AVX2 or higher supports PALIGNR on 256-bit registers.  */
> +  if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32))
>      return false;

This is confusingly written.

How about

  if (GET_MODE_SIZE (d->vmode) == 16)
    {
      if (!TARGET_SSSE3)
        return false;
    }
  else if (GET_MODE_SIZE (d->vmode) == 32)
    {
      if (!TARGET_AVX2)
        return false;
    }
  else
    return false;

With the comments added into the right places.


r~


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-04-28 17:29 ` H.J. Lu
@ 2014-04-28 20:43   ` Evgeny Stupachenko
  0 siblings, 0 replies; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-04-28 20:43 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches, Richard Biener, Uros Bizjak

On Mon, Apr 28, 2014 at 9:08 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Apr 28, 2014 at 9:48 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> Hi,
>>
>> The patch enables the use of "palignr with perm" instead of "2 pshufb, or
>> and perm" on AVX2 for some cases.
>>
>> Bootstrapped and passes make check on x86.
>>
>> Is it ok?
>>
>> 2014-04-28  Evgeny Stupachenko  <evstupac@gmail.com>
>>
>>         * config/i386/i386.c (expand_vec_perm_1): Try AVX2 vpshufb.
>>         * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
>>         PALIGNR instruction.
>
> Can you add testcases to verify that AVX2 vpshufb and palignr are
> properly generated?

One of the next patches will have a test case.

>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 88142a8..ae80477 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -42807,6 +42807,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
>>    return true;
>>  }
>>
>> +static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);
>> +
>>  /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
>>     in a single instruction.  */
>>
>> @@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
>>    if (expand_vec_perm_pshufb (d))
>>      return true;
>>
>> +  /* Try the AVX2 vpshufb.  */
>> +  if (expand_vec_perm_vpshufb2_vpermq (d))
>> +    return true;
>> +
>>    /* Try the AVX512F vpermi2 instructions.  */
>>    rtx vec[64];
>>    enum machine_mode mode = d->vmode;
>> @@ -43004,7 +43010,7 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d)
>>  }
>>
>>  /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
>> -   the permutation using the SSSE3 palignr instruction.  This succeeds
>> +   the permutation using the SSSE3/AVX2 palignr instruction.  This succeeds
>>     when all of the elements in PERM fit within one vector and we merely
>>     need to shift them down so that a single vector permutation has a
>>     chance to succeed.  */
>> @@ -43015,14 +43021,20 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>>    unsigned i, nelt = d->nelt;
>>    unsigned min, max;
>>    bool in_order, ok;
>> -  rtx shift, target;
>> +  rtx shift, shift1, target, tmp;
>>    struct expand_vec_perm_d dcopy;
>>
>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>> +  /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions:
>> +     PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.
>> +     PALIGNR of 2 128-bit registers takes only 1 instruction.  */
>> +  if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 &&
>
>                        ^^^ This should be only on the next line.
>
>> +      GET_MODE_SIZE (d->vmode) != 32))
>> +    return false;
>> +  /* Only AVX2 or higher supports PALIGNR on 256-bit registers.  */
>> +  if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32))
>                                          ^ No need for '(' here.
>
> Or you can use
>
> if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>      && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32))
>
>>      return false;
>>
>> -  min = nelt, max = 0;
>> +  min = 2 * nelt, max = 0;
> ^^^^^^^^^^^^^^^^^^^^^^^^^^  Will this change 128-bit vector code?

No. The only case where all elements of the permutation constant are
equal to or greater than "nelt" is canonicalized to the one-operand case
with all elements less than "nelt". So the change actually affects
nothing, but it is still more accurate.
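
A hypothetical illustration of that canonicalization (example mine, not
from the patch):

typedef int v4si __attribute__ ((vector_size (16)));

v4si
pick_second (v4si a, v4si b)
{
  /* Every index is >= nelt (4), so only elements of B are read.  GCC
     canonicalizes this to the one-operand permutation { 1, 0, 3, 2 }
     on B before expand_vec_perm_palignr runs, so "min" can never
     actually reach 2 * nelt there.  */
  return __builtin_shuffle (a, b, (v4si) { 5, 4, 7, 6 });
}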

>
>>    for (i = 0; i < nelt; ++i)
>>      {
>>        unsigned e = d->perm[i];
>> @@ -43041,9 +43053,34 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>>
>>    dcopy = *d;
>>    shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
>> -  target = gen_reg_rtx (TImode);
>> -  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
>> -                                 gen_lowpart (TImode, d->op0), shift));
>> +  shift1 = GEN_INT ((min - nelt / 2) *
>> +          GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
>> +
>> +  if (GET_MODE_SIZE (d->vmode) != 32)
>> +    {
>> +      target = gen_reg_rtx (TImode);
>> +      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
>> +                                     gen_lowpart (TImode, d->op0), shift));
>> +    }
>> +  else
>> +    {
>> +      target = gen_reg_rtx (V2TImode);
>> +      tmp = gen_reg_rtx (V4DImode);
>> +      emit_insn (gen_avx2_permv2ti (tmp,
>> +                                   gen_lowpart (V4DImode, d->op0),
>> +                                   gen_lowpart (V4DImode, d->op1),
>> +                                   GEN_INT (33)));
>> +      if (min < nelt / 2)
>> +        emit_insn (gen_avx2_palignrv2ti (target,
>> +                                        gen_lowpart (V2TImode, tmp),
>> +                                        gen_lowpart (V2TImode, d->op0),
>> +                                        shift));
>> +      else
>> +       emit_insn (gen_avx2_palignrv2ti (target,
>> +                                        gen_lowpart (V2TImode, d->op1),
>> +                                        gen_lowpart (V2TImode, tmp),
>> +                                        shift1));
>> +    }
>>
>>    dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
>>    dcopy.one_operand_p = true;
>>
>>
>> Evgeny
>
>
>
> --
> H.J.


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-04-28 18:15 ` Richard Henderson
@ 2014-04-28 20:57   ` Evgeny Stupachenko
  2014-04-28 21:08     ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-04-28 20:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Richard Biener, Uros Bizjak

Agree on checks:

  /* PALIGNR of 2 128-bit registers takes only 1 instruction.
     Requires SSSE3.  */
  if (GET_MODE_SIZE (d->vmode) == 16)
    {
      if (!TARGET_SSSE3)
        return false;
    }
  /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions:
     PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.  */
  else if (GET_MODE_SIZE (d->vmode) == 32)
    {
      if (!TARGET_AVX2)
        return false;
    }
  else
    return false;


On Mon, Apr 28, 2014 at 9:32 PM, Richard Henderson <rth@redhat.com> wrote:
> On 04/28/2014 09:48 AM, Evgeny Stupachenko wrote:
>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>> +  /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions:
>> +     PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.
>> +     PALIGNR of 2 128-bit registers takes only 1 instruction.  */
>> +  if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 &&
>> +      GET_MODE_SIZE (d->vmode) != 32))
>> +    return false;
>> +  /* Only AVX2 or higher supports PALIGNR on 256-bit registers.  */
>> +  if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32))
>>      return false;
>
> This is confusingly written.
>
> How about
>
>   if (GET_MODE_SIZE (d->vmode) == 16)
>     {
>       if (!TARGET_SSSE3)
>         return false;
>     }
>   else if (GET_MODE_SIZE (d->vmode) == 32)
>     {
>       if (!TARGET_AVX2)
>         return false;
>     }
>   else
>     return false;
>
> With the comments added into the right places.
>
>
> r~


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-04-28 20:57   ` Evgeny Stupachenko
@ 2014-04-28 21:08     ` Richard Henderson
  2014-05-05 16:49       ` Evgeny Stupachenko
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2014-04-28 21:08 UTC (permalink / raw)
  To: Evgeny Stupachenko; +Cc: GCC Patches, Richard Biener, Uros Bizjak

On 04/28/2014 01:43 PM, Evgeny Stupachenko wrote:
> Agree on checks:
> 
>   /* PALIGNR of 2 128-bit registers takes only 1 instruction.
>      Requires SSSE3.  */
>   if (GET_MODE_SIZE (d->vmode) == 16)
>     {
>       if (!TARGET_SSSE3)
>         return false;
>     }
>   /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions:
>      PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.  */
>   else if (GET_MODE_SIZE (d->vmode) == 32)
>     {
>       if (!TARGET_AVX2)
>         return false;
>     }
>   else
>     return false;

Thanks, much better.


r~


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-04-28 21:08     ` Richard Henderson
@ 2014-05-05 16:49       ` Evgeny Stupachenko
  2014-05-19 10:42         ` Evgeny Stupachenko
  2014-05-19 15:32         ` Richard Henderson
  0 siblings, 2 replies; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-05-05 16:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Richard Biener, Uros Bizjak

Is the following patch ok? It passes bootstrap and make check.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 88142a8..91f6f21 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -42807,6 +42807,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
   return true;
 }

+static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
    in a single instruction.  */

@@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_pshufb (d))
     return true;

+  /* Try the AVX2 vpshufb.  */
+  if (expand_vec_perm_vpshufb2_vpermq (d))
+    return true;
+
   /* Try the AVX512F vpermi2 instructions.  */
   rtx vec[64];
   enum machine_mode mode = d->vmode;
@@ -43004,7 +43010,7 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d)
 }

 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
-   the permutation using the SSSE3 palignr instruction.  This succeeds
+   the permutation using the SSSE3/AVX2 palignr instruction.  This succeeds
    when all of the elements in PERM fit within one vector and we merely
    need to shift them down so that a single vector permutation has a
    chance to succeed.  */
@@ -43015,14 +43021,26 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
   unsigned i, nelt = d->nelt;
   unsigned min, max;
   bool in_order, ok;
-  rtx shift, target;
+  rtx shift, shift1, target, tmp;
   struct expand_vec_perm_d dcopy;

-  /* Even with AVX, palignr only operates on 128-bit vectors.  */
-  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+  /* SSSE3 is required to apply PALIGNR on 16-byte operands.  */
+  if (GET_MODE_SIZE (d->vmode) == 16)
+    {
+      if (!TARGET_SSSE3)
+       return false;
+    }
+  /* AVX2 is required to apply PALIGNR on 32-byte operands.  */
+  else if (GET_MODE_SIZE (d->vmode) == 32)
+    {
+      if (!TARGET_AVX2)
+       return false;
+    }
+  /* Other sizes are not supported.  */
+  else
     return false;

-  min = nelt, max = 0;
+  min = 2 * nelt, max = 0;
   for (i = 0; i < nelt; ++i)
     {
       unsigned e = d->perm[i];
@@ -43041,9 +43059,35 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)

   dcopy = *d;
   shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
-  target = gen_reg_rtx (TImode);
-  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
-                                 gen_lowpart (TImode, d->op0), shift));
+  shift1 = GEN_INT ((min - nelt / 2) *
+          GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+
+  if (GET_MODE_SIZE (d->vmode) != 32)
+    {
+      target = gen_reg_rtx (TImode);
+      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
+                                     gen_lowpart (TImode, d->op0), shift));
+    }
+  else
+    {
+      target = gen_reg_rtx (V2TImode);
+      tmp = gen_reg_rtx (V4DImode);
+      emit_insn (gen_avx2_permv2ti (tmp,
+                                   gen_lowpart (V4DImode, d->op0),
+                                   gen_lowpart (V4DImode, d->op1),
+                                   GEN_INT (33)));
+      if (min < nelt / 2)
+        emit_insn (gen_avx2_palignrv2ti (target,
+                                        gen_lowpart (V2TImode, tmp),
+                                        gen_lowpart (V2TImode, d->op0),
+                                        shift));
+      else
+       emit_insn (gen_avx2_palignrv2ti (target,
+                                        gen_lowpart (V2TImode, d->op1),
+                                        gen_lowpart (V2TImode, tmp),
+                                        shift1));
+    }
+

   dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
   dcopy.one_operand_p = true;

On Tue, Apr 29, 2014 at 1:03 AM, Richard Henderson <rth@redhat.com> wrote:
> On 04/28/2014 01:43 PM, Evgeny Stupachenko wrote:
>> Agree on checks:
>>
>>   /* PALIGNR of 2 128-bit registers takes only 1 instruction.
>>      Requires SSSE3.  */
>>   if (GET_MODE_SIZE (d->vmode) == 16)
>>     {
>>       if (!TARGET_SSSE3)
>>         return false;
>>     }
>>   /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions:
>>      PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.  */
>>   else if (GET_MODE_SIZE (d->vmode) == 32)
>>     {
>>       if (!TARGET_AVX2)
>>         return false;
>>     }
>>   else
>>     return false;
>
> Thanks, much better.
>
>
> r~


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-05-05 16:49       ` Evgeny Stupachenko
@ 2014-05-19 10:42         ` Evgeny Stupachenko
  2014-05-19 15:32         ` Richard Henderson
  1 sibling, 0 replies; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-05-19 10:42 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Richard Biener, Uros Bizjak

Ping.

On Mon, May 5, 2014 at 8:49 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Is the following patch ok? It passes bootstrap and make check.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 88142a8..91f6f21 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -42807,6 +42807,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
>    return true;
>  }
>
> +static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);
> +
>  /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
>     in a single instruction.  */
>
> @@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
>    if (expand_vec_perm_pshufb (d))
>      return true;
>
> +  /* Try the AVX2 vpshufb.  */
> +  if (expand_vec_perm_vpshufb2_vpermq (d))
> +    return true;
> +
>    /* Try the AVX512F vpermi2 instructions.  */
>    rtx vec[64];
>    enum machine_mode mode = d->vmode;
> @@ -43004,7 +43010,7 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d)
>  }
>
>  /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
> -   the permutation using the SSSE3 palignr instruction.  This succeeds
> +   the permutation using the SSSE3/AVX2 palignr instruction.  This succeeds
>     when all of the elements in PERM fit within one vector and we merely
>     need to shift them down so that a single vector permutation has a
>     chance to succeed.  */
> @@ -43015,14 +43021,26 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>    unsigned i, nelt = d->nelt;
>    unsigned min, max;
>    bool in_order, ok;
> -  rtx shift, target;
> +  rtx shift, shift1, target, tmp;
>    struct expand_vec_perm_d dcopy;
>
> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
> +  /* SSSE3 is required to apply PALIGNR on 16-byte operands.  */
> +  if (GET_MODE_SIZE (d->vmode) == 16)
> +    {
> +      if (!TARGET_SSSE3)
> +       return false;
> +    }
> +  /* AVX2 is required to apply PALIGNR on 32-byte operands.  */
> +  else if (GET_MODE_SIZE (d->vmode) == 32)
> +    {
> +      if (!TARGET_AVX2)
> +       return false;
> +    }
> +  /* Other sizes are not supported.  */
> +  else
>      return false;
>
> -  min = nelt, max = 0;
> +  min = 2 * nelt, max = 0;
>    for (i = 0; i < nelt; ++i)
>      {
>        unsigned e = d->perm[i];
> @@ -43041,9 +43059,35 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>
>    dcopy = *d;
>    shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
> -  target = gen_reg_rtx (TImode);
> -  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
> -                                 gen_lowpart (TImode, d->op0), shift));
> +  shift1 = GEN_INT ((min - nelt / 2) *
> +          GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
> +
> +  if (GET_MODE_SIZE (d->vmode) != 32)
> +    {
> +      target = gen_reg_rtx (TImode);
> +      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
> +                                     gen_lowpart (TImode, d->op0), shift));
> +    }
> +  else
> +    {
> +      target = gen_reg_rtx (V2TImode);
> +      tmp = gen_reg_rtx (V4DImode);
> +      emit_insn (gen_avx2_permv2ti (tmp,
> +                                   gen_lowpart (V4DImode, d->op0),
> +                                   gen_lowpart (V4DImode, d->op1),
> +                                   GEN_INT (33)));
> +      if (min < nelt / 2)
> +        emit_insn (gen_avx2_palignrv2ti (target,
> +                                        gen_lowpart (V2TImode, tmp),
> +                                        gen_lowpart (V2TImode, d->op0),
> +                                        shift));
> +      else
> +       emit_insn (gen_avx2_palignrv2ti (target,
> +                                        gen_lowpart (V2TImode, d->op1),
> +                                        gen_lowpart (V2TImode, tmp),
> +                                        shift1));
> +    }
> +
>
>    dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
>    dcopy.one_operand_p = true;
>
> On Tue, Apr 29, 2014 at 1:03 AM, Richard Henderson <rth@redhat.com> wrote:
>> On 04/28/2014 01:43 PM, Evgeny Stupachenko wrote:
>>> Agree on checks:
>>>
>>>   /* PALIGNR of 2 128-bit registers takes only 1 instruction.
>>>      Requires SSSE3.  */
>>>   if (GET_MODE_SIZE (d->vmode) == 16)
>>>     {
>>>       if (!TARGET_SSSE3)
>>>         return false;
>>>     }
>>>   /* PALIGNR of 2 256-bit registers on AVX2 costs only 2 instructions:
>>>      PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.  */
>>>   else if (GET_MODE_SIZE (d->vmode) == 32)
>>>     {
>>>       if (!TARGET_AVX2)
>>>         return false;
>>>     }
>>>   else
>>>     return false;
>>
>> Thanks, much better.
>>
>>
>> r~


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-05-05 16:49       ` Evgeny Stupachenko
  2014-05-19 10:42         ` Evgeny Stupachenko
@ 2014-05-19 15:32         ` Richard Henderson
  2014-07-03  9:53           ` Evgeny Stupachenko
  1 sibling, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2014-05-19 15:32 UTC (permalink / raw)
  To: Evgeny Stupachenko; +Cc: GCC Patches, Richard Biener, Uros Bizjak

On 05/05/2014 09:49 AM, Evgeny Stupachenko wrote:
> @@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
>    if (expand_vec_perm_pshufb (d))
>      return true;
> 
> +  /* Try the AVX2 vpshufb.  */
> +  if (expand_vec_perm_vpshufb2_vpermq (d))
> +    return true;

Why is this here?  It doesn't expand to 1 insn, which is
what expand_vec_perm_1 is intended to check.

It's already called from ix86_expand_vec_perm_const_1.
If things need to be shuffled around in that function,
then that's the right place to do so.

It's also clearly unrelated to "palignr", so has no
business at all within this patch.

> -  min = nelt, max = 0;
> +  min = 2 * nelt, max = 0;

This change to min is wrong for this patch.  It probably
belonged in your 2/2 patch.

> +  shift1 = GEN_INT ((min - nelt / 2) *
> +          GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));

Coding convention sez

  shift1 = GEN_INT ((min - nelt / 2)
                    * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));


r~


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-05-19 15:32         ` Richard Henderson
@ 2014-07-03  9:53           ` Evgeny Stupachenko
  2014-07-04 11:22             ` Uros Bizjak
  2014-07-07 14:40             ` Richard Henderson
  0 siblings, 2 replies; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-07-03  9:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Richard Biener, Uros Bizjak

The "expand_vec_perm_palignr" is similar for SSSE3 and AVX2 cases,
  but AVX2 requires more instructions to complete the scheme.

The patch below adds AVX2 support using a six-instruction sequence, leaving SSSE3 at two.
Is it ok?
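
For reference, a sketch of the 32-byte alignment step in intrinsics form
(hypothetical helper with a fixed 4-byte shift; it mirrors what the
expander emits for the min < nelt / 2 case):

#include <immintrin.h>

static __m256i
align_sketch (__m256i op0, __m256i op1)
{
  /* vperm2i128 with control 33 (0x21) builds { op0.hi, op1.lo }.  */
  __m256i tmp = _mm256_permute2x128_si256 (op0, op1, 0x21);
  /* The per-lane vpalignr then yields bytes 4..35 of the 64-byte
     concatenation of op0 and op1, i.e. a whole-register shift.  */
  return _mm256_alignr_epi8 (tmp, op0, 4);
}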

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2cffcef..70fc832 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -43130,23 +43130,38 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d)
   return true;
 }

+static bool
+expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
-   the permutation using the SSSE3 palignr instruction.  This succeeds
+   the permutation using the SSSE3/AVX2 palignr instruction.  This succeeds
    when all of the elements in PERM fit within one vector and we merely
    need to shift them down so that a single vector permutation has a
    chance to succeed.  */

 static bool
-expand_vec_perm_palignr (struct expand_vec_perm_d *d)
+expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
 {
   unsigned i, nelt = d->nelt;
   unsigned min, max;
   bool in_order, ok;
-  rtx shift, target;
+  rtx shift, shift1, target, tmp;
   struct expand_vec_perm_d dcopy;

-  /* Even with AVX, palignr only operates on 128-bit vectors.  */
-  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+  /* SSSE3 is required to apply PALIGNR on 16-byte operands.  */
+  if (GET_MODE_SIZE (d->vmode) == 16)
+    {
+      if (!TARGET_SSSE3)
+       return false;
+    }
+  /* AVX2 is required to apply PALIGNR on 32-byte operands.  */
+  else if (GET_MODE_SIZE (d->vmode) == 32)
+    {
+      if (!TARGET_AVX2)
+       return false;
+    }
+  /* Other sizes are not supported.  */
+  else
     return false;

   min = nelt, max = 0;
@@ -43168,9 +43183,34 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)

   dcopy = *d;
   shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
-  target = gen_reg_rtx (TImode);
-  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
-                                 gen_lowpart (TImode, d->op0), shift));
+  shift1 = GEN_INT ((min - nelt / 2)
+          * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+
+  if (GET_MODE_SIZE (d->vmode) != 32)
+    {
+      target = gen_reg_rtx (TImode);
+      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
+                                     gen_lowpart (TImode, d->op0), shift));
+    }
+  else
+    {
+      target = gen_reg_rtx (V2TImode);
+      tmp = gen_reg_rtx (V4DImode);
+      emit_insn (gen_avx2_permv2ti (tmp,
+                                   gen_lowpart (V4DImode, d->op0),
+                                   gen_lowpart (V4DImode, d->op1),
+                                   GEN_INT (33)));
+      if (min < nelt / 2)
+        emit_insn (gen_avx2_palignrv2ti (target,
+                                        gen_lowpart (V2TImode, tmp),
+                                        gen_lowpart (V2TImode, d->op0),
+                                        shift));
+      else
+       emit_insn (gen_avx2_palignrv2ti (target,
+                                        gen_lowpart (V2TImode, d->op1),
+                                        gen_lowpart (V2TImode, tmp),
+                                        shift1));
+    }

   dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
   dcopy.one_operand_p = true;
@@ -43192,9 +43232,22 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
       return true;
     }

-  ok = expand_vec_perm_1 (&dcopy);
-  gcc_assert (ok);
-
+  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
+     operand permutation.  */
+  if (insn_num == 2)
+    {
+      ok = expand_vec_perm_1 (&dcopy);
+      gcc_assert (ok);
+    }
+  /* For AVX2 we need 2 instructions for the shift: vpalignr and
+     vperm plus 4 instructions for one operand permutation.  */
+  else if (insn_num == 6)
+    {
+      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
+      gcc_assert (ok);
+    }
+  else
+    ok = false;
   return ok;
 }

@@ -44627,7 +44680,7 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_pshuflw_pshufhw (d))
     return true;

-  if (expand_vec_perm_palignr (d))
+  if (expand_vec_perm_palignr (d, 2))
     return true;

   if (expand_vec_perm_interleave2 (d))
@@ -44680,6 +44733,10 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_even_odd (d))
     return true;

+  /* Try sequences of six instructions.  */
+  if (expand_vec_perm_palignr (d, 6))
+    return true;
+
   /* Even longer sequences.  */
   if (expand_vec_perm_vpshufb4_vpermq2 (d))
     return true;

On Mon, May 19, 2014 at 7:32 PM, Richard Henderson <rth@redhat.com> wrote:
> On 05/05/2014 09:49 AM, Evgeny Stupachenko wrote:
>> @@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
>>    if (expand_vec_perm_pshufb (d))
>>      return true;
>>
>> +  /* Try the AVX2 vpshufb.  */
>> +  if (expand_vec_perm_vpshufb2_vpermq (d))
>> +    return true;
>
> Why is this here?  It doesn't expand to 1 insn, which is
> what expand_vec_perm_1 is intended to check.
>
> It's already called from ix86_expand_vec_perm_const_1.
> If things need to be shuffled around in that function,
> then that's the right place to do so.
>
> It's also clearly unrelated to "palignr", so has no
> business at all within this patch.
>
>> -  min = nelt, max = 0;
>> +  min = 2 * nelt, max = 0;
>
> This change to min is wrong for this patch.  It probably
> belonged in your 2/2 patch.
>
>> +  shift1 = GEN_INT ((min - nelt / 2) *
>> +          GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
>
> Coding convention sez
>
>   shift1 = GEN_INT ((min - nelt / 2)
>                     * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
>
>
> r~
>


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-07-03  9:53           ` Evgeny Stupachenko
@ 2014-07-04 11:22             ` Uros Bizjak
  2014-07-04 12:47               ` Evgeny Stupachenko
  2014-07-07 14:40             ` Richard Henderson
  1 sibling, 1 reply; 40+ messages in thread
From: Uros Bizjak @ 2014-07-04 11:22 UTC (permalink / raw)
  To: Evgeny Stupachenko; +Cc: Richard Henderson, GCC Patches, Richard Biener

On Thu, Jul 3, 2014 at 11:53 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> The "expand_vec_perm_palignr" is similar for SSSE3 and AVX2 cases,
>   but AVX2 requires more instructions to complete the scheme.
>
> The patch below adds AVX2 support using a six-instruction sequence, leaving SSSE3 at two.
> Is it ok?

ChangeLog entry is missing.

Uros.


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-07-04 11:22             ` Uros Bizjak
@ 2014-07-04 12:47               ` Evgeny Stupachenko
  0 siblings, 0 replies; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-07-04 12:47 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Richard Henderson, GCC Patches, Richard Biener

2014-07-04  Evgeny Stupachenko  <evstupac@gmail.com>

        * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
        PALIGNR instruction.
        * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr try
        for AVX2.

On Fri, Jul 4, 2014 at 3:22 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Thu, Jul 3, 2014 at 11:53 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> The "expand_vec_perm_palignr" is similar for SSSE3 and AVX2 cases,
>>   but AVX2 requires more instructions to complete the scheme.
>>
>> The patch below adds AVX2 support using a six-instruction sequence, leaving SSSE3 at two.
>> Is it ok?
>
> ChangeLog entry is missing.
>
> Uros.


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-07-03  9:53           ` Evgeny Stupachenko
  2014-07-04 11:22             ` Uros Bizjak
@ 2014-07-07 14:40             ` Richard Henderson
  2014-07-10 15:29               ` Evgeny Stupachenko
  1 sibling, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2014-07-07 14:40 UTC (permalink / raw)
  To: Evgeny Stupachenko; +Cc: GCC Patches, Richard Biener, Uros Bizjak

On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)

insn_num might as well be "bool avx2", since it's only ever set to two values.

> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
> +  /* SSSE3 is required to apply PALIGNR on 16-byte operands.  */
> +  if (GET_MODE_SIZE (d->vmode) == 16)
> +    {
> +      if (!TARGET_SSSE3)
> +       return false;
> +    }
> +  /* AVX2 is required to apply PALIGNR on 32-byte operands.  */
> +  else if (GET_MODE_SIZE (d->vmode) == 32)
> +    {
> +      if (!TARGET_AVX2)
> +       return false;
> +    }
> +  /* Other sizes are not supported.  */
> +  else
>      return false;

And you'd better check it up here because...

> +  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
> +     operand permutation.  */
> +  if (insn_num == 2)
> +    {
> +      ok = expand_vec_perm_1 (&dcopy);
> +      gcc_assert (ok);
> +    }
> +  /* For AVX2 we need 2 instructions for the shift: vpalignr and
> +     vperm plus 4 instructions for one operand permutation.  */
> +  else if (insn_num == 6)
> +    {
> +      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
> +      gcc_assert (ok);
> +    }
> +  else
> +    ok = false;
>    return ok;

... down here you'll simply ICE from the gcc_assert.


r~


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-07-07 14:40             ` Richard Henderson
@ 2014-07-10 15:29               ` Evgeny Stupachenko
  2014-08-14  8:09                 ` Evgeny Stupachenko
  0 siblings, 1 reply; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-07-10 15:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Richard Biener, Uros Bizjak

On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson <rth@redhat.com> wrote:
> On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
>> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
>
> insn_num might as well be "bool avx2", since it's only ever set to two values.

Agree. However:
 - after the alignment, the one-operand permutation could be just a move,
   taking 2 instructions for AVX2 as well;
 - for AVX2 there could be other cases where the scheme takes 4 or 5
   instructions;
 - we can leave it for a potential AVX512 extension.

>
>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>> +  /* SSSE3 is required to apply PALIGNR on 16-byte operands.  */
>> +  if (GET_MODE_SIZE (d->vmode) == 16)
>> +    {
>> +      if (!TARGET_SSSE3)
>> +       return false;
>> +    }
>> +  /* AVX2 is required to apply PALIGNR on 32-byte operands.  */
>> +  else if (GET_MODE_SIZE (d->vmode) == 32)
>> +    {
>> +      if (!TARGET_AVX2)
>> +       return false;
>> +    }
>> +  /* Other sizes are not supported.  */
>> +  else
>>      return false;
>
> And you'd better check it up here because...
>

Correct. The following should resolve the issue:
  /* For AVX2 we need more than 2 instructions when the alignment
     by itself does not produce the desired permutation.  */
  if (TARGET_AVX2 && insn_num <= 2)
    return false;

>> +  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
>> +     operand permutation.  */
>> +  if (insn_num == 2)
>> +    {
>> +      ok = expand_vec_perm_1 (&dcopy);
>> +      gcc_assert (ok);
>> +    }
>> +  /* For AVX2 we need 2 instructions for the shift: vpalignr and
>> +     vperm plus 4 instructions for one operand permutation.  */
>> +  else if (insn_num == 6)
>> +    {
>> +      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
>> +      gcc_assert (ok);
>> +    }
>> +  else
>> +    ok = false;
>>    return ok;
>
> ... down here you'll simply ICE from the gcc_assert.




>
>
> r~


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-07-10 15:29               ` Evgeny Stupachenko
@ 2014-08-14  8:09                 ` Evgeny Stupachenko
  2014-08-14 14:55                   ` H.J. Lu
  0 siblings, 1 reply; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-08-14  8:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Richard Biener, Uros Bizjak

Ping.

On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson <rth@redhat.com> wrote:
>> On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
>>> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>>> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
>>
>> insn_num might as well be "bool avx2", since it's only ever set to two values.
>
> Agree. However:
>  - after the alignment, the one-operand permutation could be just a move,
>    taking 2 instructions for AVX2 as well;
>  - for AVX2 there could be other cases where the scheme takes 4 or 5
>    instructions;
>  - we can leave it for a potential AVX512 extension.
>
>>
>>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>>> +  /* SSSE3 is required to apply PALIGNR on 16-byte operands.  */
>>> +  if (GET_MODE_SIZE (d->vmode) == 16)
>>> +    {
>>> +      if (!TARGET_SSSE3)
>>> +       return false;
>>> +    }
>>> +  /* AVX2 is required to apply PALIGNR on 32-byte operands.  */
>>> +  else if (GET_MODE_SIZE (d->vmode) == 32)
>>> +    {
>>> +      if (!TARGET_AVX2)
>>> +       return false;
>>> +    }
>>> +  /* Other sizes are not supported.  */
>>> +  else
>>>      return false;
>>
>> And you'd better check it up here because...
>>
>
> Correct. The following should resolve the issue:
>   /* For AVX2 we need more than 2 instructions when the alignment
>      by itself does not produce the desired permutation.  */
>   if (TARGET_AVX2 && insn_num <= 2)
>     return false;
>
>>> +  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
>>> +     operand permutation.  */
>>> +  if (insn_num == 2)
>>> +    {
>>> +      ok = expand_vec_perm_1 (&dcopy);
>>> +      gcc_assert (ok);
>>> +    }
>>> +  /* For AVX2 we need 2 instructions for the shift: vpalignr and
>>> +     vperm plus 4 instructions for one operand permutation.  */
>>> +  else if (insn_num == 6)
>>> +    {
>>> +      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
>>> +      gcc_assert (ok);
>>> +    }
>>> +  else
>>> +    ok = false;
>>>    return ok;
>>
>> ... down here you'll simply ICE from the gcc_assert.
>
>
>
>
>>
>>
>> r~


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-08-14  8:09                 ` Evgeny Stupachenko
@ 2014-08-14 14:55                   ` H.J. Lu
  2014-08-26 12:59                     ` Evgeny Stupachenko
  0 siblings, 1 reply; 40+ messages in thread
From: H.J. Lu @ 2014-08-14 14:55 UTC (permalink / raw)
  To: Evgeny Stupachenko
  Cc: Richard Henderson, GCC Patches, Richard Biener, Uros Bizjak

On Thu, Aug 14, 2014 at 1:08 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Ping.
>
> On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson <rth@redhat.com> wrote:
>>> On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
>>>> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>>>> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
>>>
>>> insn_num might as well be "bool avx2", since it's only ever set to two values.
>>
>> Agree. However:
>>  - after the alignment, the one-operand permutation could be just a move,
>>    taking 2 instructions for AVX2 as well;
>>  - for AVX2 there could be other cases where the scheme takes 4 or 5
>>    instructions;
>>  - we can leave it for a potential AVX512 extension.
>>
>>>
>>>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>>>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>>>> +  /* SSSE3 is required to apply PALIGNR on 16-byte operands.  */
>>>> +  if (GET_MODE_SIZE (d->vmode) == 16)
>>>> +    {
>>>> +      if (!TARGET_SSSE3)
>>>> +       return false;
>>>> +    }
>>>> +  /* AVX2 is required to apply PALIGNR on 32-byte operands.  */
>>>> +  else if (GET_MODE_SIZE (d->vmode) == 32)
>>>> +    {
>>>> +      if (!TARGET_AVX2)
>>>> +       return false;
>>>> +    }
>>>> +  /* Other sizes are not supported.  */
>>>> +  else
>>>>      return false;
>>>
>>> And you'd better check it up here because...
>>>
>>
>> Correct. The following should resolve the issue:
>>   /* For AVX2 we need more than 2 instructions when the alignment
>>      by itself does not produce the desired permutation.  */
>>   if (TARGET_AVX2 && insn_num <= 2)
>>     return false;
>>
>>>> +  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
>>>> +     operand permutation.  */
>>>> +  if (insn_num == 2)
>>>> +    {
>>>> +      ok = expand_vec_perm_1 (&dcopy);
>>>> +      gcc_assert (ok);
>>>> +    }
>>>> +  /* For AVX2 we need 2 instructions for the shift: vpalignr and
>>>> +     vperm plus 4 instructions for one operand permutation.  */
>>>> +  else if (insn_num == 6)
>>>> +    {
>>>> +      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
>>>> +      gcc_assert (ok);
>>>> +    }
>>>> +  else
>>>> +    ok = false;
>>>>    return ok;
>>>
>>> ... down here you'll simply ICE from the gcc_assert.
>>

Can you modify your patch to fix

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62128

with a testcase?


-- 
H.J.


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-08-14 14:55                   ` H.J. Lu
@ 2014-08-26 12:59                     ` Evgeny Stupachenko
  2014-08-26 16:29                       ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-08-26 12:59 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Henderson, GCC Patches, Richard Biener, Uros Bizjak

That is covered by a separate part of the patch
(make check and bootstrap passed: 2 new passes for core-avx2).
Is it ok?

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d6155cf..68ee65a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -81,6 +81,7 @@
   ;; For AVX2 support
   UNSPEC_VPERMVAR
   UNSPEC_VPERMTI
+  UNSPEC_VPALIGNRDI
   UNSPEC_GATHER
   UNSPEC_VSIBADDR

@@ -14167,6 +14168,19 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])

+(define_insn "avx2_palignrv4di"
+  [(set (match_operand:V4DI 0 "register_operand" "=x")
+       (unspec:V4DI
+         [(match_operand:V4DI 1 "register_operand" "x")
+          (match_operand:V4DI 2 "nonimmediate_operand" "xm")
+          (match_operand:SI 3 "const_0_to_255_operand" "n")]
+         UNSPEC_VPALIGNRDI))]
+  "TARGET_AVX2"
+  "vpalignr\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "OI")])
+
 (define_insn "avx2_vec_dupv4df"
   [(set (match_operand:V4DF 0 "register_operand" "=x")
        (vec_duplicate:V4DF
@@ -14658,6 +14672,49 @@
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")])

+(define_insn_and_split "avx2_rotate<mode>_perm"
+  [(set (match_operand:V_256 0 "register_operand" "=&x")
+      (vec_select:V_256
+       (match_operand:V_256 1 "register_operand" "x")
+       (match_parallel 2 "palignr_operand"
+         [(match_operand 3 "const_int_operand" "n")])))]
+  "TARGET_AVX2"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  enum machine_mode imode = GET_MODE_INNER (<MODE>mode);
+  int shift = INTVAL (operands[3]) * GET_MODE_SIZE (imode);
+  rtx op0 = gen_rtx_REG (V4DImode, REGNO (operands[0]));
+  rtx op1 = gen_rtx_REG (V4DImode, REGNO (operands[1]));
+
+  if (shift == 0)
+    emit_move_insn (operands[0], operands[1]);
+  else
+    {
+      emit_insn (gen_avx2_permv2ti (op0,
+                                   op1,
+                                   op1,
+                                   GEN_INT (33)));
+      if (shift < 16)
+       emit_insn (gen_avx2_palignrv4di (op0,
+                                        op0,
+                                        op1,
+                                        GEN_INT (shift)));
+      else if (shift > 16)
+       emit_insn (gen_avx2_palignrv4di (op0,
+                                        op1,
+                                        op0,
+                                        GEN_INT (shift - 16)));
+    }
+  DONE;
+}
+  [(set_attr "type" "sseishft")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "vex")])
+
+
 (define_expand "avx_vinsertf128<mode>"
   [(match_operand:V_256 0 "register_operand")
    (match_operand:V_256 1 "register_operand")

The test case covering this is "gcc.target/i386/pr52252-atom.c".
It will pass for "-march=core-avx2" once the patch is committed.
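
For illustration, a rotate-style permutation of the kind the new pattern
matches (a hypothetical example, assuming -mavx2; not the pr52252
testcase itself):

typedef int v8si __attribute__ ((vector_size (32)));

v8si
rotate1 (v8si x)
{
  /* A lane-crossing rotate by one element.  With -mavx2 this
     one-operand vec_select can be split into vperm2i128 + vpalignr
     instead of a longer shuffle sequence.  */
  return __builtin_shuffle (x, (v8si) { 1, 2, 3, 4, 5, 6, 7, 0 });
}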

On Thu, Aug 14, 2014 at 6:55 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Aug 14, 2014 at 1:08 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> Ping.
>>
>> On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson <rth@redhat.com> wrote:
>>>> On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
>>>>> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>>>>> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
>>>>
>>>> insn_num might as well be "bool avx2", since it's only ever set to two values.
>>>
>>> Agree. However:
>>>  - after the alignment, the one-operand permutation could be just a move,
>>>    taking 2 instructions for AVX2 as well;
>>>  - for AVX2 there could be other cases where the scheme takes 4 or 5
>>>    instructions;
>>>  - we can leave it for a potential AVX512 extension.
>>>
>>>>
>>>>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>>>>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>>>>> +  /* SSSE3 is required to apply PALIGNR on 16-byte operands.  */
>>>>> +  if (GET_MODE_SIZE (d->vmode) == 16)
>>>>> +    {
>>>>> +      if (!TARGET_SSSE3)
>>>>> +       return false;
>>>>> +    }
>>>>> +  /* AVX2 is required to apply PALIGNR on 32-byte operands.  */
>>>>> +  else if (GET_MODE_SIZE (d->vmode) == 32)
>>>>> +    {
>>>>> +      if (!TARGET_AVX2)
>>>>> +       return false;
>>>>> +    }
>>>>> +  /* Other sizes are not supported.  */
>>>>> +  else
>>>>>      return false;
>>>>
>>>> And you'd better check it up here because...
>>>>
>>>
>>> Correct. The following should resolve the issue:
>>>   /* For AVX2 we need more than 2 instructions when the alignment
>>>      by itself does not produce the desired permutation.  */
>>>   if (TARGET_AVX2 && insn_num <= 2)
>>>     return false;
>>>
>>>>> +  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
>>>>> +     operand permutation.  */
>>>>> +  if (insn_num == 2)
>>>>> +    {
>>>>> +      ok = expand_vec_perm_1 (&dcopy);
>>>>> +      gcc_assert (ok);
>>>>> +    }
>>>>> +  /* For AVX2 we need 2 instructions for the shift: vpalignr and
>>>>> +     vperm plus 4 instructions for one operand permutation.  */
>>>>> +  else if (insn_num == 6)
>>>>> +    {
>>>>> +      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
>>>>> +      gcc_assert (ok);
>>>>> +    }
>>>>> +  else
>>>>> +    ok = false;
>>>>>    return ok;
>>>>
>>>> ... down here you'll simply ICE from the gcc_assert.
>>>
>
> Can you modify your patch to fix
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62128
>
> with a testcase?
>
>
> --
> H.J.


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-08-26 12:59                     ` Evgeny Stupachenko
@ 2014-08-26 16:29                       ` Richard Henderson
  2014-08-27 15:50                         ` Evgeny Stupachenko
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2014-08-26 16:29 UTC (permalink / raw)
  To: Evgeny Stupachenko, H.J. Lu; +Cc: GCC Patches, Richard Biener, Uros Bizjak

On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
> +(define_insn_and_split "avx2_rotate<mode>_perm"
> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
> +      (vec_select:V_256
> +       (match_operand:V_256 1 "register_operand" "x")
> +       (match_parallel 2 "palignr_operand"
> +         [(match_operand 3 "const_int_operand" "n")])))]
> +  "TARGET_AVX2"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]

Why are you waiting until after reload to expand this?  It's only the
vec_select parallel that determines which direction the palignr should be done.

This seems like something you could do during permutation expansion.


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-08-26 16:29                       ` Richard Henderson
@ 2014-08-27 15:50                         ` Evgeny Stupachenko
  2014-09-08 10:03                           ` Evgeny Stupachenko
  0 siblings, 1 reply; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-08-27 15:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: H.J. Lu, GCC Patches, Richard Biener, Uros Bizjak

The rotate insn appears right after expand.
I've done it similarly to define_insn_and_split "*avx_vperm_broadcast_<mode>".
I don't see any potential losses from splitting it after reload.
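
For reference, a minimal reproducer for that rotate (my own sketch;
pr62128 and the vshuf torture tests exercise the same permutation):

typedef unsigned char V __attribute__ ((vector_size (32)));

/* The { 1, 2, ..., 31, 0 } byte rotate the new pattern matches.  */
V
rot1 (V x)
{
  return __builtin_shuffle (x,
    (V) { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
	  17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
	  31, 0 });
}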

On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>> +      (vec_select:V_256
>> +       (match_operand:V_256 1 "register_operand" "x")
>> +       (match_parallel 2 "palignr_operand"
>> +         [(match_operand 3 "const_int_operand" "n")])))]
>> +  "TARGET_AVX2"
>> +  "#"
>> +  "&& reload_completed"
>> +  [(const_int 0)]
>
> Why are you waiting until after reload to expand this?  It's only the
> vec_select parallel that determines which direction the palignr should be done.
>
> This seems like something you could do during permutation expansion.
>
>
> r~
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-08-27 15:50                         ` Evgeny Stupachenko
@ 2014-09-08 10:03                           ` Evgeny Stupachenko
  2014-09-16 13:15                             ` Evgeny Stupachenko
  0 siblings, 1 reply; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-09-08 10:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: H.J. Lu, GCC Patches, Richard Biener, Uros Bizjak

PING

On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> The rotate insn appeared right after expand.
> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
> I don't see any potential losses on splitting that after reload.
>
> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>>> +      (vec_select:V_256
>>> +       (match_operand:V_256 1 "register_operand" "x")
>>> +       (match_parallel 2 "palignr_operand"
>>> +         [(match_operand 3 "const_int_operand" "n")])))]
>>> +  "TARGET_AVX2"
>>> +  "#"
>>> +  "&& reload_completed"
>>> +  [(const_int 0)]
>>
>> Why are you waiting until after reload to expand this?  It's only the
>> vec_select parallel that determines which direction the palignr should be done.
>>
>> This seems like something you could do during permutation expansion.
>>
>>
>> r~
>>
>>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-09-08 10:03                           ` Evgeny Stupachenko
@ 2014-09-16 13:15                             ` Evgeny Stupachenko
  2014-09-16 14:59                               ` H.J. Lu
  0 siblings, 1 reply; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-09-16 13:15 UTC (permalink / raw)
  To: Richard Henderson, Uros Bizjak; +Cc: H.J. Lu, GCC Patches, Richard Biener

PING 2

On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> PING
>
> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> The rotate insn appeared right after expand.
>> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
>> I don't see any potential losses on splitting that after reload.
>>
>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>>>> +      (vec_select:V_256
>>>> +       (match_operand:V_256 1 "register_operand" "x")
>>>> +       (match_parallel 2 "palignr_operand"
>>>> +         [(match_operand 3 "const_int_operand" "n")])))]
>>>> +  "TARGET_AVX2"
>>>> +  "#"
>>>> +  "&& reload_completed"
>>>> +  [(const_int 0)]
>>>
>>> Why are you waiting until after reload to expand this?  It's only the
>>> vec_select parallel that determines which direction the palignr should be done.
>>>
>>> This seems like something you could do during permutation expansion.
>>>
>>>
>>> r~
>>>
>>>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-09-16 13:15                             ` Evgeny Stupachenko
@ 2014-09-16 14:59                               ` H.J. Lu
  2014-09-17 13:01                                 ` Evgeny Stupachenko
  0 siblings, 1 reply; 40+ messages in thread
From: H.J. Lu @ 2014-09-16 14:59 UTC (permalink / raw)
  To: Evgeny Stupachenko
  Cc: Richard Henderson, Uros Bizjak, GCC Patches, Richard Biener

On Tue, Sep 16, 2014 at 6:15 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> PING 2
>
> On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> PING
>>
>> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> The rotate insn appeared right after expand.
>>> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
>>> I don't see any potential losses on splitting that after reload.
>>>
>>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
>>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>>>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>>>>> +      (vec_select:V_256
>>>>> +       (match_operand:V_256 1 "register_operand" "x")
>>>>> +       (match_parallel 2 "palignr_operand"
>>>>> +         [(match_operand 3 "const_int_operand" "n")])))]
>>>>> +  "TARGET_AVX2"
>>>>> +  "#"
>>>>> +  "&& reload_completed"
>>>>> +  [(const_int 0)]
>>>>
>>>> Why are you waiting until after reload to expand this?  It's only the
>>>> vec_select parallel that determines which direction the palignr should be done.
>>>>
>>>> This seems like something you could do during permutation expansion.
>>>>
>>>>

Assuming your change is triggered today without any additional changes,
you should include some testcases.  As it stands, it doesn't show whether
it does anything useful.

-- 
H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-09-16 14:59                               ` H.J. Lu
@ 2014-09-17 13:01                                 ` Evgeny Stupachenko
  2014-09-17 14:49                                   ` H.J. Lu
  0 siblings, 1 reply; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-09-17 13:01 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Henderson, Uros Bizjak, GCC Patches, Richard Biener

It fixes "gcc.target/i386/pr52252-atom.c" in a core-avx2 make check, and pr62128.

On Tue, Sep 16, 2014 at 6:51 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Sep 16, 2014 at 6:15 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> PING 2
>>
>> On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> PING
>>>
>>> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>>> The rotate insn appeared right after expand.
>>>> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
>>>> I don't see any potential losses on splitting that after reload.
>>>>
>>>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
>>>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>>>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>>>>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>>>>>> +      (vec_select:V_256
>>>>>> +       (match_operand:V_256 1 "register_operand" "x")
>>>>>> +       (match_parallel 2 "palignr_operand"
>>>>>> +         [(match_operand 3 "const_int_operand" "n")])))]
>>>>>> +  "TARGET_AVX2"
>>>>>> +  "#"
>>>>>> +  "&& reload_completed"
>>>>>> +  [(const_int 0)]
>>>>>
>>>>> Why are you waiting until after reload to expand this?  It's only the
>>>>> vec_select parallel that determines which direction the palignr should be done.
>>>>>
>>>>> This seems like something you could do during permutation expansion.
>>>>>
>>>>>
>
> Assuming your change is triggered today without any additional changes
>  you should include some testcases.  For now, it doesn't show if it does
> anything useful.
>
> --
> H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-09-17 13:01                                 ` Evgeny Stupachenko
@ 2014-09-17 14:49                                   ` H.J. Lu
  2014-09-17 17:26                                     ` Evgeny Stupachenko
  0 siblings, 1 reply; 40+ messages in thread
From: H.J. Lu @ 2014-09-17 14:49 UTC (permalink / raw)
  To: Evgeny Stupachenko
  Cc: Richard Henderson, Uros Bizjak, GCC Patches, Richard Biener

On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.
>

I suggest you resubmit the patch as a bug fix for pr62128 with
testcases from  pr62128 as well as gcc.target/i386/pr52252-atom.c.


-- 
H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-09-17 14:49                                   ` H.J. Lu
@ 2014-09-17 17:26                                     ` Evgeny Stupachenko
  2014-10-01 10:16                                       ` Evgeny Stupachenko
  0 siblings, 1 reply; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-09-17 17:26 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Henderson, Uros Bizjak, GCC Patches, Richard Biener

The test in pr62128 is exactly TEST 22 from
gcc.dg/torture/vshuf-v32qi.c. It will check whether the pattern is
correct. Resubmitting the patch looks good, as the current mail thread
is already too complicated.

On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.
>>
>
> I suggest you resubmit the patch as a bug fix for pr62128 with
> testcases from  pr62128 as well as gcc.target/i386/pr52252-atom.c.
>
>
> --
> H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-09-17 17:26                                     ` Evgeny Stupachenko
@ 2014-10-01 10:16                                       ` Evgeny Stupachenko
  2014-10-01 10:28                                         ` Uros Bizjak
  0 siblings, 1 reply; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-10-01 10:16 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Henderson, Uros Bizjak, GCC Patches, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 1075 bytes --]

Getting back to the initial patch, is it ok?
It fixes gcc.target/i386/pr52252-atom.c for AVX2 make check.
X86 bootstrap is also ok.

2014-10-01  Evgeny Stupachenko  <evstupac@gmail.com>

        * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
        PALIGNR instruction.
        * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr try
        for AVX2.

On Wed, Sep 17, 2014 at 9:26 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> The test in pr62128 is exactly TEST 22 from
> gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct
> or not.
> Resubmitting patch looks good as current mail thread is already too complicated.
>
> On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.
>>>
>>
>> I suggest you resubmit the patch as a bug fix for pr62128 with
>> testcases from  pr62128 as well as gcc.target/i386/pr52252-atom.c.
>>
>>
>> --
>> H.J.

[-- Attachment #2: palignr_avx2.patch --]
[-- Type: application/octet-stream, Size: 4332 bytes --]

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6337aa5..818c910 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -43257,23 +43257,38 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d)
   return true;
 }
 
+static bool
+expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
-   the permutation using the SSSE3 palignr instruction.  This succeeds
+   the permutation using the SSSE3/AVX2 palignr instruction.  This succeeds
    when all of the elements in PERM fit within one vector and we merely
    need to shift them down so that a single vector permutation has a
    chance to succeed.  */
 
 static bool
-expand_vec_perm_palignr (struct expand_vec_perm_d *d)
+expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
 {
   unsigned i, nelt = d->nelt;
   unsigned min, max;
   bool in_order, ok;
-  rtx shift, target;
+  rtx shift, shift1, target, tmp;
   struct expand_vec_perm_d dcopy;
 
-  /* Even with AVX, palignr only operates on 128-bit vectors.  */
-  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+  /* SSSE3 is required to apply PALIGNR on 16 bytes operands.  */
+  if (GET_MODE_SIZE (d->vmode) == 16)
+    {
+      if (!TARGET_SSSE3)
+	return false;
+    }
+  /* AVX2 is required to apply PALIGNR on 32 bytes operands.  */
+  else if (GET_MODE_SIZE (d->vmode) == 32)
+    {
+      if (!TARGET_AVX2)
+	return false;
+    }
+  /* Other sizes are not supported.  */
+  else
     return false;
 
   min = nelt, max = 0;
@@ -43295,9 +43310,34 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
 
   dcopy = *d;
   shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
-  target = gen_reg_rtx (TImode);
-  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
-				  gen_lowpart (TImode, d->op0), shift));
+  shift1 = GEN_INT ((min - nelt / 2) *
+	   GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+
+  if (GET_MODE_SIZE (d->vmode) != 32)
+    {
+      target = gen_reg_rtx (TImode);
+      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
+				      gen_lowpart (TImode, d->op0), shift));
+    }
+  else
+    {
+      target = gen_reg_rtx (V2TImode);
+      tmp = gen_reg_rtx (V4DImode);
+      emit_insn (gen_avx2_permv2ti (tmp,
+				    gen_lowpart (V4DImode, d->op0),
+				    gen_lowpart (V4DImode, d->op1),
+				    GEN_INT (33)));
+      if (min < nelt / 2)
+        emit_insn (gen_avx2_palignrv2ti (target,
+					 gen_lowpart (V2TImode, tmp),
+					 gen_lowpart (V2TImode, d->op0),
+					 shift));
+      else
+	emit_insn (gen_avx2_palignrv2ti (target,
+					 gen_lowpart (V2TImode, d->op1),
+					 gen_lowpart (V2TImode, tmp),
+					 shift1));
+    }
 
   dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
   dcopy.one_operand_p = true;
@@ -43319,9 +43359,27 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
       return true;
     }
 
-  ok = expand_vec_perm_1 (&dcopy);
-  gcc_assert (ok);
+  /* For AVX2 we need more than 2 instructions when the alignment
+     by itself does not produce the desired permutation.  */
+  if (TARGET_AVX2 && insn_num <= 2)
+    return false;
 
+  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
+     operand permutation.  */
+  if (insn_num == 2)
+    {
+      ok = expand_vec_perm_1 (&dcopy);
+      gcc_assert (ok);
+    }
+  /* For AVX2 we need 2 instructions for the shift: vpalignr and
+     vperm plus 4 instructions for one operand permutation.  */
+  else if (insn_num == 6)
+    {
+      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
+      gcc_assert (ok);
+    }
+  else
+    ok = false;
   return ok;
 }
 
@@ -44754,7 +44812,7 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_pshuflw_pshufhw (d))
     return true;
 
-  if (expand_vec_perm_palignr (d))
+  if (expand_vec_perm_palignr (d, 2))
     return true;
 
   if (expand_vec_perm_interleave2 (d))
@@ -44807,6 +44865,10 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_even_odd (d))
     return true;
 
+  /* Try sequences of six instructions.  */
+  if (expand_vec_perm_palignr (d, 6))
+    return true;
+
   /* Even longer sequences.  */
   if (expand_vec_perm_vpshufb4_vpermq2 (d))
     return true;

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 10:16                                       ` Evgeny Stupachenko
@ 2014-10-01 10:28                                         ` Uros Bizjak
  2014-10-01 10:35                                           ` Jakub Jelinek
  0 siblings, 1 reply; 40+ messages in thread
From: Uros Bizjak @ 2014-10-01 10:28 UTC (permalink / raw)
  To: Evgeny Stupachenko
  Cc: H.J. Lu, Richard Henderson, GCC Patches, Richard Biener, Jakub Jelinek

On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Getting back to initial patch, is it ok?

IMO, we should start with Jakub's proposed patch [1]

[1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html

Uros.

> It fixes gcc.target/i386/pr52252-atom.c for AVX2 make check.
> X86 bootstrap is also ok.
>
> 2014-10-01  Evgeny Stupachenko  <evstupac@gmail.com>
>
>         * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
>         PALINGR instruction.
>         * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr try
>         for AVX2.
>
> On Wed, Sep 17, 2014 at 9:26 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> The test in pr62128 is exactly TEST 22 from
>> gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct
>> or not.
>> Resubmitting patch looks good as current mail thread is already too complicated.
>>
>> On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>>> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.
>>>>
>>>
>>> I suggest you resubmit the patch as a bug fix for pr62128 with
>>> testcases from  pr62128 as well as gcc.target/i386/pr52252-atom.c.
>>>
>>>
>>> --
>>> H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 10:28                                         ` Uros Bizjak
@ 2014-10-01 10:35                                           ` Jakub Jelinek
  2014-10-01 10:50                                             ` Evgeny Stupachenko
  2014-10-01 11:38                                             ` Jakub Jelinek
  0 siblings, 2 replies; 40+ messages in thread
From: Jakub Jelinek @ 2014-10-01 10:35 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Evgeny Stupachenko, H.J. Lu, Richard Henderson, GCC Patches,
	Richard Biener

On Wed, Oct 01, 2014 at 12:28:51PM +0200, Uros Bizjak wrote:
> On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> > Getting back to initial patch, is it ok?
> 
> IMO, we should start with Jakub's proposed patch [1]
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html

That doesn't compile, will post a new version; got interrupted when
I found that in
GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
one test is miscompiled even with an unpatched compiler; debugging that now.

That said, my patch will not do anything about the
{ 1, 2, 3, ..., 31, 0 } permutation case Mark mentioned;
for that we can't do vpalignr followed by vpshufb or similar,
but need to do some permutation first and then vpalignr on
the result.  So it would need a new routine.  It is still a 2
insn permutation, not 6, and needs a different algorithm, so sharing
the same routine for that is undesirable.
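
Concretely, the two-insn shape for that rotate is a cross-lane
permutation followed by vpalignr; the follow-up patch later in the
thread ends up emitting, for the byte-rotate testcase:

	vpermq	$78, %ymm0, %ymm1
	vpalignr	$1, %ymm0, %ymm1, %ymm0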

	Jakub

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 10:35                                           ` Jakub Jelinek
@ 2014-10-01 10:50                                             ` Evgeny Stupachenko
  2014-10-01 11:38                                             ` Jakub Jelinek
  1 sibling, 0 replies; 40+ messages in thread
From: Evgeny Stupachenko @ 2014-10-01 10:50 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Uros Bizjak, H.J. Lu, Richard Henderson, GCC Patches, Richard Biener

This is not a fix for the case { 1, 2, 3, ..., 31, 0 }. This patch is
an extension of expand_vec_perm_palignr to the AVX2 case.
For the case { 1, 2, 3, ..., 31, 0 } we should use a separate
function/pattern. I like the split approach, as it is similar to the
already handled SSE byte rotate { 1, 2, 3, ..., 15, 0 }
(ssse3_palignr<mode>_perm) and to the AVX2 split
*avx_vperm_broadcast_<mode>.
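
(For the 128-bit case the rotate is a single instruction, because
palignr with both sources equal rotates the register: palignr $1,
%xmm0, %xmm0 yields { 1, 2, ..., 15, 0 }.  On AVX2, vpalignr shifts
within each 128-bit lane separately, which is why the 256-bit split
additionally needs the lane-crossing vperm.)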

On Wed, Oct 1, 2014 at 2:35 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Oct 01, 2014 at 12:28:51PM +0200, Uros Bizjak wrote:
>> On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> > Getting back to initial patch, is it ok?
>>
>> IMO, we should start with Jakub's proposed patch [1]
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html
>
> That doesn't compile, will post a new version; got interrupted when
> I found that in
> GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> one test is miscompiled even with unpatched compiler, debugging that now.
>
> That said, my patch will not do anything about the
> case Mark mentioned { 1, 2, 3, ..., 31, 0 } permutation,
> for that we can't do vpalignr followed by vpshufb or similar,
> but need to do some permutation first and then vpalignr on
> the result.  So it would need a new routine.  It is still a 2
> insn permutation, not 6, and needs different algorithm, so sharing
> the same routine for that is undesirable.
>
>         Jakub

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 10:35                                           ` Jakub Jelinek
  2014-10-01 10:50                                             ` Evgeny Stupachenko
@ 2014-10-01 11:38                                             ` Jakub Jelinek
  2014-10-01 11:46                                               ` Uros Bizjak
  1 sibling, 1 reply; 40+ messages in thread
From: Jakub Jelinek @ 2014-10-01 11:38 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Evgeny Stupachenko, H.J. Lu, Richard Henderson, GCC Patches,
	Richard Biener

On Wed, Oct 01, 2014 at 12:35:14PM +0200, Jakub Jelinek wrote:
> On Wed, Oct 01, 2014 at 12:28:51PM +0200, Uros Bizjak wrote:
> > On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> > > Getting back to initial patch, is it ok?
> > 
> > IMO, we should start with Jakub's proposed patch [1]
> > 
> > [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html
> 
> That doesn't compile, will post a new version; got interrupted when
> I found that in
> GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> one test is miscompiled even with unpatched compiler, debugging that now.

Let's start with the bugfix.  The || doesn't make any sense, and we really
want to fill in 4 bits (0, 1, 4, 5) of the immediate, not just two, anyway.
valid_perm_using_mode_p (V2TImode, d) should already guarantee that
it is possible to permute it as V2TI, so all we care about are the
values of d->perm[0] and d->perm[nelt / 2], but we care not just which
lane it is, but also which operand (src1 or src2).
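
A worked check of the encoding (my own arithmetic, not from the patch):
take V32QImode, so nelt == 32 and nelt / 2 == 16, and suppose both
result halves should come from the high lanes, i.e. d->perm[0] == 16
(op0's high lane) and d->perm[16] == 48 (op1's high lane).  The old
expression

  ((16 & 16) ? 1 : 0) || ((48 & 16) ? 2 : 0)    /* logical or, == 1 */

collapses to 1, selecting op0's high lane for the low half but op0's
low lane for the high half.  The new expression

  (16 / 16) | ((48 / 16) * 16)                  /* 1 | 48 == 0x31 */

sets imm8 bits 1:0 to 1 (op0 high) and bits 5:4 to 3 (op1 high), which
is what vperm2i128 expects.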

Tested with
GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
Ok for trunk/4.9/4.8?

2014-10-01  Jakub Jelinek  <jakub@redhat.com>

	PR target/63428
	* config/i386/i386.c (expand_vec_perm_pshufb): Fix up rperm[0]
	argument to avx2_permv2ti.

	* gcc.dg/torture/vshuf-4.inc: Move test 122 from EXPTESTS
	to test 24 in TESTS.

--- gcc/config/i386/i386.c.jj	2014-10-01 11:22:09.000000000 +0200
+++ gcc/config/i386/i386.c	2014-10-01 13:00:30.835809031 +0200
@@ -42961,8 +42961,8 @@ expand_vec_perm_pshufb (struct expand_ve
 	      op0 = gen_lowpart (V4DImode, d->op0);
 	      op1 = gen_lowpart (V4DImode, d->op1);
 	      rperm[0]
-		= GEN_INT (((d->perm[0] & (nelt / 2)) ? 1 : 0)
-			   || ((d->perm[nelt / 2] & (nelt / 2)) ? 2 : 0));
+		= GEN_INT ((d->perm[0] / (nelt / 2))
+			   | ((d->perm[nelt / 2] / (nelt / 2)) * 16));
 	      emit_insn (gen_avx2_permv2ti (target, op0, op1, rperm[0]));
 	      if (target != d->target)
 		emit_move_insn (d->target, gen_lowpart (d->vmode, target));
--- gcc/testsuite/gcc.dg/torture/vshuf-4.inc.jj	2012-03-20 08:51:25.000000000 +0100
+++ gcc/testsuite/gcc.dg/torture/vshuf-4.inc	2014-10-01 13:23:07.163090945 +0200
@@ -23,7 +23,8 @@ T (19,	3, 2, 1, 0) \
 T (20,	0, 4, 1, 5) \
 T (21,	2, 6, 3, 7) \
 T (22,	1, 2, 3, 0) \
-T (23,	2, 1, 0, 3)
+T (23,	2, 1, 0, 3) \
+T (24,	2, 5, 6, 3)
 #define EXPTESTS \
 T (116,	1, 2, 4, 3) \
 T (117,	7, 3, 3, 0) \
@@ -31,7 +32,6 @@ T (118,	5, 3, 2, 7) \
 T (119,	0, 3, 5, 6) \
 T (120,	0, 0, 1, 5) \
 T (121,	4, 6, 2, 1) \
-T (122,	2, 5, 6, 3) \
 T (123,	4, 6, 3, 2) \
 T (124,	4, 7, 5, 6) \
 T (125,	0, 4, 2, 4) \


	Jakub

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 11:38                                             ` Jakub Jelinek
@ 2014-10-01 11:46                                               ` Uros Bizjak
  2014-10-01 12:17                                                 ` Jakub Jelinek
  0 siblings, 1 reply; 40+ messages in thread
From: Uros Bizjak @ 2014-10-01 11:46 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Evgeny Stupachenko, H.J. Lu, Richard Henderson, GCC Patches,
	Richard Biener

On Wed, Oct 1, 2014 at 1:38 PM, Jakub Jelinek <jakub@redhat.com> wrote:

>> That doesn't compile, will post a new version; got interrupted when
>> I found that in
>> GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
>> one test is miscompiled even with unpatched compiler, debugging that now.
>
> Let's start with the bugfix.  The || doesn't make any sense, and we really
> want to fill in 4 bits (0, 1, 4, 5) of the immediate, not just two, anyway.
> valid_perm_using_mode_p (V2TImode, d) should already guarantee that
> it is possible to permutate it as V2TI, so all we care about are the
> values of d->perm[0] and d->perm[nelt / 2], but we care not just which
> lane it is, but also which operand (src1 or src2).
>
> Tested with
> GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> Ok for trunk/4.9/4.8?
>
> 2014-10-01  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/63428
>         * config/i386/i386.c (expand_vec_perm_pshufb): Fix up rperm[0]
>         argument to avx2_permv2ti.
>
>         * gcc.dg/torture/vshuf-4.inc: Move test 122 from EXPTESTS
>         to test 24 in TESTS.

OK.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 11:46                                               ` Uros Bizjak
@ 2014-10-01 12:17                                                 ` Jakub Jelinek
  2014-10-01 12:25                                                   ` Uros Bizjak
  0 siblings, 1 reply; 40+ messages in thread
From: Jakub Jelinek @ 2014-10-01 12:17 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Evgeny Stupachenko, H.J. Lu, Richard Henderson, GCC Patches,
	Richard Biener

On Wed, Oct 01, 2014 at 01:45:54PM +0200, Uros Bizjak wrote:
> OK.

Thanks.  Second step is a tiny optimization, for the
simplified 122 (now 24) vshuf-v4di.c testcase:
typedef unsigned long long V __attribute__ ((vector_size (32)));
V a, b, c, d;

int
main ()
{
  int i;
  for (i = 0; i < 4; ++i)
    {
      a[i] = i + 2;
      b[i] = 4 + i + 2;
    }
  asm volatile ("" : : : "memory");
  c = __builtin_shuffle (a, b, (V) { 2, 5, 6, 3 });
  d = __builtin_shuffle ((V) { 2, 3, 4, 5 }, (V) { 6, 7, 8, 9 }, (V) { 2, 5, 6, 3 });
  if (__builtin_memcmp (&c, &d, sizeof (c)))
    __builtin_abort ();
  return 0;
}

this patch allows better code to be generated:
-       vmovdqa b(%rip), %ymm0
+       vpermq  $238, a(%rip), %ymm1
        movl    $32, %edx
-       movl    $d, %esi
-       vmovdqa a(%rip), %ymm1
+       vmovdqa b(%rip), %ymm0
+       movl    $d, %esi
        movl    $c, %edi
-       vperm2i128      $17, %ymm0, %ymm1, %ymm1
        vpblendd        $195, %ymm1, %ymm0, %ymm0
        vmovdqa %ymm0, c(%rip)

That is because vperm2i128 $17 unnecessarily uses
two operands when all the data it grabs are from a single one.
So, by canonicalizing the permutation we can emit
vpermq $238 instead.  Perhaps more places might benefit from
extra canonicalize_perm calls (two spots already use that beyond
the single one on the expansion/testing entry point).
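
Decoding the immediates (my own check): vperm2i128 $17 (0x11) selects
src1's high lane for both result halves, so all the data really comes
from one register, while vpermq $238 (0xEE) selects qwords
{ 2, 3, 2, 3 } and produces the same value from a single source
operand.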

Tested again with
GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
on x86_64-linux.  Ok for trunk?

2014-10-01  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/i386.c (expand_vec_perm_vperm2f128): Canonicalize
	dfirst permutation.

--- gcc/config/i386/i386.c.jj	2014-10-01 13:00:30.000000000 +0200
+++ gcc/config/i386/i386.c	2014-10-01 13:59:40.061956852 +0200
@@ -43905,15 +43905,16 @@ expand_vec_perm_vperm2f128 (struct expan
 	    dfirst.perm[i] = (i & (nelt2 - 1))
 			     + ((perm >> (2 * (i >= nelt2))) & 3) * nelt2;
 
+	  canonicalize_perm (&dfirst);
 	  ok = expand_vec_perm_1 (&dfirst);
 	  gcc_assert (ok);
 
 	  /* And dsecond is some single insn shuffle, taking
 	     d->op0 and result of vperm2f128 (if perm < 16) or
 	     d->op1 and result of vperm2f128 (otherwise).  */
-	  dsecond.op1 = dfirst.target;
 	  if (perm >= 16)
-	    dsecond.op0 = dfirst.op1;
+	    dsecond.op0 = dsecond.op1;
+	  dsecond.op1 = dfirst.target;
 
 	  ok = expand_vec_perm_1 (&dsecond);
 	  gcc_assert (ok);


	Jakub

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 12:17                                                 ` Jakub Jelinek
@ 2014-10-01 12:25                                                   ` Uros Bizjak
  2014-10-01 12:56                                                     ` Jakub Jelinek
  0 siblings, 1 reply; 40+ messages in thread
From: Uros Bizjak @ 2014-10-01 12:25 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Evgeny Stupachenko, H.J. Lu, Richard Henderson, GCC Patches,
	Richard Biener

On Wed, Oct 1, 2014 at 2:17 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Oct 01, 2014 at 01:45:54PM +0200, Uros Bizjak wrote:
>> OK.
>
> Thanks.  Second step is a tiny optimization, for the
> simplified 122 (now 24) vshuf-v4di.c testcase:
> typedef unsigned long long V __attribute__ ((vector_size (32)));
> V a, b, c, d;
>
> int
> main ()
> {
>   int i;
>   for (i = 0; i < 4; ++i)
>     {
>       a[i] = i + 2;
>       b[i] = 4 + i + 2;
>     }
>   asm volatile ("" : : : "memory");
>   c = __builtin_shuffle (a, b, (V) { 2, 5, 6, 3 });
>   d = __builtin_shuffle ((V) { 2, 3, 4, 5 }, (V) { 6, 7, 8, 9 }, (V) { 2, 5, 6, 3 });
>   if (__builtin_memcmp (&c, &d, sizeof (c)))
>     __builtin_abort ();
>   return 0;
> }
>
> this patch allows better code to be generated:
> -       vmovdqa b(%rip), %ymm0
> +       vpermq  $238, a(%rip), %ymm1
>         movl    $32, %edx
> -       movl    $d, %esi
> -       vmovdqa a(%rip), %ymm1
> +       vmovdqa b(%rip), %ymm0
> +       movl    $d, %esi
>         movl    $c, %edi
> -       vperm2i128      $17, %ymm0, %ymm1, %ymm1
>         vpblendd        $195, %ymm1, %ymm0, %ymm0
>         vmovdqa %ymm0, c(%rip)
>
> That is because vperm2i128 $17 unnecessarily uses
> two operands when all the data it grabs are from a single one.
> So, by canonicalizing the permutation we can emit
> vpermq $238 instead.  Perhaps more places might benefit from
> extra canonicalize_perm calls (two spots already use that beyond
> the single one on the expansion/testing entry point).
>
> Tested again with
> GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> on x86_64-linux.  Ok for trunk?
>
> 2014-10-01  Jakub Jelinek  <jakub@redhat.com>
>
>         * config/i386/i386.c (expand_vec_perm_vperm2f128): Canonicalize
>         dfirst permutation.

OK.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 12:25                                                   ` Uros Bizjak
@ 2014-10-01 12:56                                                     ` Jakub Jelinek
  2014-10-01 13:10                                                       ` Uros Bizjak
  2014-10-01 19:03                                                       ` Uros Bizjak
  0 siblings, 2 replies; 40+ messages in thread
From: Jakub Jelinek @ 2014-10-01 12:56 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Evgeny Stupachenko, H.J. Lu, Richard Henderson, GCC Patches,
	Richard Biener

On Wed, Oct 01, 2014 at 02:25:01PM +0200, Uros Bizjak wrote:
> OK.

And now the expand_vec_perm_palignr improvement, tested
with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
E.g.
typedef unsigned long long V __attribute__ ((vector_size (32)));
extern void abort (void);
V a, b, c, d;
void test_14 (void)
{
  V mask = { 6, 1, 3, 4 };
  int i;
  c = __builtin_shuffle (a, mask);
  d = __builtin_shuffle (a, b, mask);
}
(distilled from test 15 in vshuf-v4di.c) results in:
-	vmovdqa	a(%rip), %ymm0
-	vpermq	$54, %ymm0, %ymm1
-	vpshufb	.LC1(%rip), %ymm0, %ymm0
-	vmovdqa	%ymm1, c(%rip)
-	vmovdqa	b(%rip), %ymm1
-	vpshufb	.LC0(%rip), %ymm1, %ymm1
-	vpermq	$78, %ymm1, %ymm1
-	vpor	%ymm1, %ymm0, %ymm0
+	vmovdqa	a(%rip), %ymm1
+	vpermq	$54, %ymm1, %ymm0
+	vmovdqa	%ymm0, c(%rip)
+	vmovdqa	b(%rip), %ymm0
+	vpalignr	$8, %ymm1, %ymm0, %ymm0
+	vpermq	$99, %ymm0, %ymm0
 	vmovdqa	%ymm0, d(%rip)
 	vzeroupper
 	ret
change (and two fewer .rodata constants).
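
A worked trace of the new code on that mask (my own arithmetic): for
V4DImode, nelt == 4, and the 32-byte lane folding

  e = (e & ((nelt / 2) - 1)) | ((e & nelt) >> 1);

maps { 6, 1, 3, 4 } to { 2, 1, 1, 2 }, so min == 1, max == 2 and
max - min < nelt / 2.  The shift is then min * 64 == 64 bits (8 bytes),
i.e. the vpalignr $8 above, and the leftover single-operand permutation
becomes the vpermq $99.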

Ok for trunk?

2014-10-01  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/i386.c (expand_vec_perm_palignr): Handle
	256-bit vectors for TARGET_AVX2.

--- gcc/config/i386/i386.c.jj	2014-10-01 14:24:16.483138899 +0200
+++ gcc/config/i386/i386.c	2014-10-01 14:27:53.577222011 +0200
@@ -43297,44 +43297,75 @@ expand_vec_perm_palignr (struct expand_v
   rtx shift, target;
   struct expand_vec_perm_d dcopy;
 
-  /* Even with AVX, palignr only operates on 128-bit vectors.  */
-  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+  /* Even with AVX, palignr only operates on 128-bit vectors,
+     in AVX2 palignr operates on both 128-bit lanes.  */
+  if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+      && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32))
     return false;
 
-  min = nelt, max = 0;
+  min = 2 * nelt, max = 0;
   for (i = 0; i < nelt; ++i)
     {
       unsigned e = d->perm[i];
+      if (GET_MODE_SIZE (d->vmode) == 32)
+	e = (e & ((nelt / 2) - 1)) | ((e & nelt) >> 1);
       if (e < min)
 	min = e;
       if (e > max)
 	max = e;
     }
-  if (min == 0 || max - min >= nelt)
+  if (min == 0
+      || max - min >= (GET_MODE_SIZE (d->vmode) == 32 ? nelt / 2 : nelt))
     return false;
 
   /* Given that we have SSSE3, we know we'll be able to implement the
-     single operand permutation after the palignr with pshufb.  */
-  if (d->testing_p)
+     single operand permutation after the palignr with pshufb for
+     128-bit vectors.  */
+  if (d->testing_p && GET_MODE_SIZE (d->vmode) == 16)
     return true;
 
   dcopy = *d;
-  shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
-  target = gen_reg_rtx (TImode);
-  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
-				  gen_lowpart (TImode, d->op0), shift));
-
-  dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
-  dcopy.one_operand_p = true;
 
   in_order = true;
   for (i = 0; i < nelt; ++i)
     {
-      unsigned e = dcopy.perm[i] - min;
+      unsigned e = dcopy.perm[i];
+      if (GET_MODE_SIZE (d->vmode) == 32
+	  && e >= nelt
+	  && (e & (nelt / 2 - 1)) < min)
+	e = e - min - (nelt / 2);
+      else
+	e = e - min;
       if (e != i)
 	in_order = false;
       dcopy.perm[i] = e;
     }
+  dcopy.one_operand_p = true;
+
+  /* For AVX2, test whether we can permute the result in one instruction.  */
+  if (d->testing_p)
+    {
+      if (in_order)
+	return true;
+      dcopy.op1 = dcopy.op0;
+      return expand_vec_perm_1 (&dcopy);
+    }
+
+  shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+  if (GET_MODE_SIZE (d->vmode) == 16)
+    {
+      target = gen_reg_rtx (TImode);
+      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
+				      gen_lowpart (TImode, d->op0), shift));
+    }
+  else
+    {
+      target = gen_reg_rtx (V2TImode);
+      emit_insn (gen_avx2_palignrv2ti (target, gen_lowpart (V2TImode, d->op1),
+				       gen_lowpart (V2TImode, d->op0), shift));
+    }
+
+  dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
 
   /* Test for the degenerate case where the alignment by itself
      produces the desired permutation.  */
@@ -43345,7 +43376,7 @@ expand_vec_perm_palignr (struct expand_v
     }
 
   ok = expand_vec_perm_1 (&dcopy);
-  gcc_assert (ok);
+  gcc_assert (ok || GET_MODE_SIZE (d->vmode) == 32);
 
   return ok;
 }


	Jakub

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 12:56                                                     ` Jakub Jelinek
@ 2014-10-01 13:10                                                       ` Uros Bizjak
  2014-10-01 14:12                                                         ` Jakub Jelinek
  2014-10-01 19:03                                                       ` Uros Bizjak
  1 sibling, 1 reply; 40+ messages in thread
From: Uros Bizjak @ 2014-10-01 13:10 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Evgeny Stupachenko, H.J. Lu, Richard Henderson, GCC Patches,
	Richard Biener

On Wed, Oct 1, 2014 at 2:56 PM, Jakub Jelinek <jakub@redhat.com> wrote:

> And now the expand_vec_perm_palignr improvement, tested
> with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> E.g.
> typedef unsigned long long V __attribute__ ((vector_size (32)));
> extern void abort (void);
> V a, b, c, d;
> void test_14 (void)
> {
>   V mask = { 6, 1, 3, 4 };
>   int i;
>   c = __builtin_shuffle (a, mask);
>   d = __builtin_shuffle (a, b, mask);
> }
> (distilled from test 15 in vshuf-v4di.c) results in:
> -       vmovdqa a(%rip), %ymm0
> -       vpermq  $54, %ymm0, %ymm1
> -       vpshufb .LC1(%rip), %ymm0, %ymm0
> -       vmovdqa %ymm1, c(%rip)
> -       vmovdqa b(%rip), %ymm1
> -       vpshufb .LC0(%rip), %ymm1, %ymm1
> -       vpermq  $78, %ymm1, %ymm1
> -       vpor    %ymm1, %ymm0, %ymm0
> +       vmovdqa a(%rip), %ymm1
> +       vpermq  $54, %ymm1, %ymm0
> +       vmovdqa %ymm0, c(%rip)
> +       vmovdqa b(%rip), %ymm0
> +       vpalignr        $8, %ymm1, %ymm0, %ymm0
> +       vpermq  $99, %ymm0, %ymm0
>         vmovdqa %ymm0, d(%rip)
>         vzeroupper
>         ret
> change (and two fewer .rodata constants).
>
> Ok for trunk?
>
> 2014-10-01  Jakub Jelinek  <jakub@redhat.com>
>
>         * config/i386/i386.c (expand_vec_perm_palignr): Handle
>         256-bit vectors for TARGET_AVX2.

Please mention PR 62128 and include the testcase from the PR. Also,
please add a version of gcc.target/i386/pr52252-atom.c, compiled with
-mavx2 (perhaps named pr52252-avx2.c)

OK with a small adjustment below.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2014-10-01 14:24:16.483138899 +0200
> +++ gcc/config/i386/i386.c      2014-10-01 14:27:53.577222011 +0200
> @@ -43297,44 +43297,75 @@ expand_vec_perm_palignr (struct expand_v
>    rtx shift, target;
>    struct expand_vec_perm_d dcopy;
>
> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
> +  /* Even with AVX, palignr only operates on 128-bit vectors,
> +     in AVX2 palignr operates on both 128-bit lanes.  */
> +  if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
> +      && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32))

Please simplify the above condition ...

>      return false;
>
> -  min = nelt, max = 0;
> +  min = 2 * nelt, max = 0;
>    for (i = 0; i < nelt; ++i)
>      {
>        unsigned e = d->perm[i];
> +      if (GET_MODE_SIZE (d->vmode) == 32)
> +       e = (e & ((nelt / 2) - 1)) | ((e & nelt) >> 1);
>        if (e < min)
>         min = e;
>        if (e > max)
>         max = e;
>      }
> -  if (min == 0 || max - min >= nelt)
> +  if (min == 0
> +      || max - min >= (GET_MODE_SIZE (d->vmode) == 32 ? nelt / 2 : nelt))
>      return false;
>
>    /* Given that we have SSSE3, we know we'll be able to implement the
> -     single operand permutation after the palignr with pshufb.  */
> -  if (d->testing_p)
> +     single operand permutation after the palignr with pshufb for
> +     128-bit vectors.  */
> +  if (d->testing_p && GET_MODE_SIZE (d->vmode) == 16)
>      return true;
>
>    dcopy = *d;
> -  shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
> -  target = gen_reg_rtx (TImode);
> -  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
> -                                 gen_lowpart (TImode, d->op0), shift));
> -
> -  dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
> -  dcopy.one_operand_p = true;
>
>    in_order = true;
>    for (i = 0; i < nelt; ++i)
>      {
> -      unsigned e = dcopy.perm[i] - min;
> +      unsigned e = dcopy.perm[i];
> +      if (GET_MODE_SIZE (d->vmode) == 32
> +         && e >= nelt
> +         && (e & (nelt / 2 - 1)) < min)
> +       e = e - min - (nelt / 2);
> +      else
> +       e = e - min;
>        if (e != i)
>         in_order = false;
>        dcopy.perm[i] = e;
>      }
> +  dcopy.one_operand_p = true;
> +
> +  /* For AVX2, test whether we can permute the result in one instruction.  */
> +  if (d->testing_p)
> +    {
> +      if (in_order)
> +       return true;
> +      dcopy.op1 = dcopy.op0;
> +      return expand_vec_perm_1 (&dcopy);
> +    }
> +
> +  shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
> +  if (GET_MODE_SIZE (d->vmode) == 16)
> +    {
> +      target = gen_reg_rtx (TImode);
> +      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
> +                                     gen_lowpart (TImode, d->op0), shift));
> +    }
> +  else
> +    {
> +      target = gen_reg_rtx (V2TImode);
> +      emit_insn (gen_avx2_palignrv2ti (target, gen_lowpart (V2TImode, d->op1),
> +                                      gen_lowpart (V2TImode, d->op0), shift));
> +    }
> +
> +  dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
>
>    /* Test for the degenerate case where the alignment by itself
>       produces the desired permutation.  */
> @@ -43345,7 +43376,7 @@ expand_vec_perm_palignr (struct expand_v
>      }
>
>    ok = expand_vec_perm_1 (&dcopy);
> -  gcc_assert (ok);
> +  gcc_assert (ok || GET_MODE_SIZE (d->vmode) == 32);
>
>    return ok;
>  }
>
>
>         Jakub

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 13:10                                                       ` Uros Bizjak
@ 2014-10-01 14:12                                                         ` Jakub Jelinek
  2014-10-01 14:21                                                           ` Uros Bizjak
  2014-10-01 20:44                                                           ` Jakub Jelinek
  0 siblings, 2 replies; 40+ messages in thread
From: Jakub Jelinek @ 2014-10-01 14:12 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Evgeny Stupachenko, H.J. Lu, Richard Henderson, GCC Patches,
	Richard Biener

[-- Attachment #1: Type: text/plain, Size: 2246 bytes --]

On Wed, Oct 01, 2014 at 03:09:59PM +0200, Uros Bizjak wrote:
> > 2014-10-01  Jakub Jelinek  <jakub@redhat.com>
> >
> >         * config/i386/i386.c (expand_vec_perm_palignr): Handle
> >         256-bit vectors for TARGET_AVX2.
> 
> Please mention PR 62128 and include the testcase from the PR. Also,
> please add a version of gcc.target/i386/pr52252-atom.c, compiled with
> -mavx2 (perhaps named pr52252-avx2.c)

This patch doesn't fix PR62128, and it is already tested (even without
GCC_TEST_RUN_EXPENSIVE=1) in the vshuf*.c torture tests (several of them).
If you want coverage not just for the default flags, I'd say we should,
say for -O2 only, just add gcc.target/i386/{avx,avx2,avx512}-vshuf-*.c
tests that would include ../../gcc.dg/torture/vshuf-*.c and be compiled/run
with -mavx/-mavx2/-mavx512 options instead of the default ones.

For PR62128, IMHO the right fix is attached.  Note, this is again covered
in vshuf-*.c tests (test 22 in both vshuf-v32*.c and vshuf-v16*.c).
With that attached patch, pr62128.c (aka test_22 in vshuf-v32qi.c), changes:
-       vpshufb .LC0(%rip), %ymm0, %ymm1
-       vpshufb .LC1(%rip), %ymm0, %ymm0
-       vpermq  $78, %ymm1, %ymm1
-       vpor    %ymm1, %ymm0, %ymm0
+       vpermq  $78, %ymm0, %ymm1
+       vpalignr        $1, %ymm0, %ymm1, %ymm0
        ret
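
Decoding that pair (my own check): vpermq $78 (0x4E) swaps the two
128-bit lanes into %ymm1; vpalignr $1 then, per 128-bit lane,
concatenates the swapped copy above the original and shifts right by
one byte, so lane 0 yields bytes 1..16 of the source and lane 1 yields
bytes 17..31 followed by byte 0, together exactly the
{ 1, 2, ..., 31, 0 } rotate.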

> > --- gcc/config/i386/i386.c.jj   2014-10-01 14:24:16.483138899 +0200
> > +++ gcc/config/i386/i386.c      2014-10-01 14:27:53.577222011 +0200
> > @@ -43297,44 +43297,75 @@ expand_vec_perm_palignr (struct expand_v
> >    rtx shift, target;
> >    struct expand_vec_perm_d dcopy;
> >
> > -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
> > -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
> > +  /* Even with AVX, palignr only operates on 128-bit vectors,
> > +     in AVX2 palignr operates on both 128-bit lanes.  */
> > +  if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
> > +      && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32))
> 
> Please simplify the above condition ...

How?  I don't see how that can be simplified; it can
be transformed into
  if (!((TARGET_SSSE3 && GET_MODE_SIZE (d->vmode) == 16)
	|| (TARGET_AVX2 && GET_MODE_SIZE (d->vmode) == 32)))
but I don't find that simpler.

	Jakub

[-- Attachment #2: V324 --]
[-- Type: text/plain, Size: 3410 bytes --]

2014-10-01  Jakub Jelinek  <jakub@redhat.com>

	PR target/62128
	* config/i386/i386.c (expand_vec_perm_1): Try expand_vec_perm_palignr
	if it expands to a single insn only.
	(expand_vec_perm_palignr): Add SINGLE_INSN_ONLY_P argument.  If true,
	fail unless in_order is true.  Add forward declaration.
	(expand_vec_perm_vperm2f128): Fix up comment about which permutation
	is useful for one_operand_p.
	(ix86_expand_vec_perm_const_1): Adjust expand_vec_perm_palignr caller.

--- gcc/config/i386/i386.c.jj	2014-10-01 15:42:28.000000000 +0200
+++ gcc/config/i386/i386.c	2014-10-01 15:50:06.234079887 +0200
@@ -39636,6 +39636,7 @@ struct expand_vec_perm_d
 static bool canonicalize_perm (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
+static bool expand_vec_perm_palignr (struct expand_vec_perm_d *d, bool);
 
 /* Get a vector mode of the same size as the original but with elements
    twice as wide.  This is only guaranteed to apply to integral vectors.  */
@@ -43225,6 +43226,10 @@ expand_vec_perm_1 (struct expand_vec_per
   if (expand_vec_perm_pshufb (d))
     return true;
 
+  /* Try the AVX2 vpalignr instruction.  */
+  if (expand_vec_perm_palignr (d, true))
+    return true;
+
   /* Try the AVX512F vpermi2 instructions.  */
   rtx vec[64];
   enum machine_mode mode = d->vmode;
@@ -43286,10 +43291,11 @@ expand_vec_perm_pshuflw_pshufhw (struct
    the permutation using the SSSE3 palignr instruction.  This succeeds
    when all of the elements in PERM fit within one vector and we merely
    need to shift them down so that a single vector permutation has a
-   chance to succeed.  */
+   chance to succeed.  If SINGLE_INSN_ONLY_P, succeed if only
+   the vpalignr instruction itself can perform the requested permutation.  */
 
 static bool
-expand_vec_perm_palignr (struct expand_vec_perm_d *d)
+expand_vec_perm_palignr (struct expand_vec_perm_d *d, bool single_insn_only_p)
 {
   unsigned i, nelt = d->nelt;
   unsigned min, max;
@@ -43320,8 +43326,9 @@ expand_vec_perm_palignr (struct expand_v
 
   /* Given that we have SSSE3, we know we'll be able to implement the
      single operand permutation after the palignr with pshufb for
-     128-bit vectors.  */
-  if (d->testing_p && GET_MODE_SIZE (d->vmode) == 16)
+     128-bit vectors.  If SINGLE_INSN_ONLY_P, in_order has to be computed
+     first.  */
+  if (d->testing_p && GET_MODE_SIZE (d->vmode) == 16 && !single_insn_only_p)
     return true;
 
   dcopy = *d;
@@ -43342,6 +43349,9 @@ expand_vec_perm_palignr (struct expand_v
     }
   dcopy.one_operand_p = true;
 
+  if (single_insn_only_p && !in_order)
+    return false;
+
   /* For AVX2, test whether we can permute the result in one instruction.  */
   if (d->testing_p)
     {
@@ -43922,7 +43932,8 @@ expand_vec_perm_vperm2f128 (struct expan
 	  return true;
 	}
 
-      /* For one operand, the only useful vperm2f128 permutation is 0x10.  */
+      /* For one operand, the only useful vperm2f128 permutation is 0x01
+	 aka lanes swap.  */
       if (d->one_operand_p)
 	return false;
     }
@@ -44811,7 +44822,7 @@ ix86_expand_vec_perm_const_1 (struct exp
   if (expand_vec_perm_pshuflw_pshufhw (d))
     return true;
 
-  if (expand_vec_perm_palignr (d))
+  if (expand_vec_perm_palignr (d, false))
     return true;
 
   if (expand_vec_perm_interleave2 (d))

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 14:12                                                         ` Jakub Jelinek
@ 2014-10-01 14:21                                                           ` Uros Bizjak
  2014-10-01 20:44                                                           ` Jakub Jelinek
  1 sibling, 0 replies; 40+ messages in thread
From: Uros Bizjak @ 2014-10-01 14:21 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Evgeny Stupachenko, H.J. Lu, Richard Henderson, GCC Patches,
	Richard Biener

On Wed, Oct 1, 2014 at 4:12 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Oct 01, 2014 at 03:09:59PM +0200, Uros Bizjak wrote:
>> > 2014-10-01  Jakub Jelinek  <jakub@redhat.com>
>> >
>> >         * config/i386/i386.c (expand_vec_perm_palignr): Handle
>> >         256-bit vectors for TARGET_AVX2.
>>
>> Please mention PR 62128 and include the testcase from the PR. Also,
>> please add a version of gcc.target/i386/pr52252-atom.c, compiled with
>> -mavx2 (perhaps named pr52252-avx2.c)
>
> This patch doesn't fix PR62128, and it is already tested (even without
> GCC_RUN_EXPENSIVE_TESTS=1) in the vshuf*.c torture tests (several of them).

Ah, OK then.

> If you want coverage not just for the default flags, I'd say we should
> say for -O2 only just add gcc.target/i386/{avx,avx2,avx512}-vshuf-*.c
> tests that would include ../../gcc.dg/torture/vshuf-*.c and be compiled/run
> with -mavx/-mavx2/-mavx512 options instead of the default ones.

No, the above should be good for now. The failure was triggered by the
target that defaults to -mavx2, and for avx512f we can run this suite
in the way you suggest.

> For PR62128, IMHO the right fix is attached.  Note, this is again covered
> in vshuf-*.c tests (test 22 in both vshuf-v32*.c and vshuf-v16*.c).
> With that attached patch, pr62128.c (aka test_22 in vshuf-v32qi.c), changes:
> -       vpshufb .LC0(%rip), %ymm0, %ymm1
> -       vpshufb .LC1(%rip), %ymm0, %ymm0
> -       vpermq  $78, %ymm1, %ymm1
> -       vpor    %ymm1, %ymm0, %ymm0
> +       vpermq  $78, %ymm0, %ymm1
> +       vpalignr        $1, %ymm0, %ymm1, %ymm0
>         ret
>
>> > --- gcc/config/i386/i386.c.jj   2014-10-01 14:24:16.483138899 +0200
>> > +++ gcc/config/i386/i386.c      2014-10-01 14:27:53.577222011 +0200
>> > @@ -43297,44 +43297,75 @@ expand_vec_perm_palignr (struct expand_v
>> >    rtx shift, target;
>> >    struct expand_vec_perm_d dcopy;
>> >
>> > -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>> > -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>> > +  /* Even with AVX, palignr only operates on 128-bit vectors,
>> > +     in AVX2 palignr operates on both 128-bit lanes.  */
>> > +  if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>> > +      && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32))
>>
>> Please simplify the above condition ...
>
> How?  I don't see how that can be simplified, it can
> be transformed into
>   if (!((TARGET_SSSE3 && GET_MODE_SIZE (d->vmode) == 16)
>         || (TARGET_AVX2 && GET_MODE_SIZE (d->vmode) == 32)))
> but I don't find that simpler.

I was thinking of the above, but you are correct, the change doesn't
bring us anything.

So, the patch is OK as it was.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 12:56                                                     ` Jakub Jelinek
  2014-10-01 13:10                                                       ` Uros Bizjak
@ 2014-10-01 19:03                                                       ` Uros Bizjak
  1 sibling, 0 replies; 40+ messages in thread
From: Uros Bizjak @ 2014-10-01 19:03 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Evgeny Stupachenko, H.J. Lu, Richard Henderson, GCC Patches,
	Richard Biener

On Wed, Oct 1, 2014 at 2:56 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Oct 01, 2014 at 02:25:01PM +0200, Uros Bizjak wrote:
>> OK.
>
> And now the expand_vec_perm_palignr improvement, tested
> with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> E.g.
> typedef unsigned long long V __attribute__ ((vector_size (32)));
> extern void abort (void);
> V a, b, c, d;
> void test_14 (void)
> {
>   V mask = { 6, 1, 3, 4 };
>   int i;
>   c = __builtin_shuffle (a, mask);
>   d = __builtin_shuffle (a, b, mask);
> }
> (distilled from test 15 in vshuf-v4di.c) results in:
> -       vmovdqa a(%rip), %ymm0
> -       vpermq  $54, %ymm0, %ymm1
> -       vpshufb .LC1(%rip), %ymm0, %ymm0
> -       vmovdqa %ymm1, c(%rip)
> -       vmovdqa b(%rip), %ymm1
> -       vpshufb .LC0(%rip), %ymm1, %ymm1
> -       vpermq  $78, %ymm1, %ymm1
> -       vpor    %ymm1, %ymm0, %ymm0
> +       vmovdqa a(%rip), %ymm1
> +       vpermq  $54, %ymm1, %ymm0
> +       vmovdqa %ymm0, c(%rip)
> +       vmovdqa b(%rip), %ymm0
> +       vpalignr        $8, %ymm1, %ymm0, %ymm0
> +       vpermq  $99, %ymm0, %ymm0
>         vmovdqa %ymm0, d(%rip)
>         vzeroupper
>         ret
> change (and two fewer .rodata constants).

On a related note, I would like to point out that
gcc.target/i386/pr61403.c also fails to generate a blend insn with
-mavx2. The new insn sequence includes lots of new vpshufb insns with
memory accesses.

Uros.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 14:12                                                         ` Jakub Jelinek
  2014-10-01 14:21                                                           ` Uros Bizjak
@ 2014-10-01 20:44                                                           ` Jakub Jelinek
  2014-10-02  6:30                                                             ` Uros Bizjak
  1 sibling, 1 reply; 40+ messages in thread
From: Jakub Jelinek @ 2014-10-01 20:44 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Evgeny Stupachenko, H.J. Lu, Richard Henderson, GCC Patches,
	Richard Biener

On Wed, Oct 01, 2014 at 04:12:15PM +0200, Jakub Jelinek wrote:
> For PR62128, IMHO the right fix is attached.  Note, this is again covered
> in vshuf-*.c tests (test 22 in both vshuf-v32*.c and vshuf-v16*.c).
> With that attached patch, pr62128.c (aka test_22 in vshuf-v32qi.c), changes:
> -       vpshufb .LC0(%rip), %ymm0, %ymm1
> -       vpshufb .LC1(%rip), %ymm0, %ymm0
> -       vpermq  $78, %ymm1, %ymm1
> -       vpor    %ymm1, %ymm0, %ymm0
> +       vpermq  $78, %ymm0, %ymm1
> +       vpalignr        $1, %ymm0, %ymm1, %ymm0
>         ret
> 
> 2014-10-01  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/62128
> 	* config/i386/i386.c (expand_vec_perm_1): Try expand_vec_perm_palignr
> 	if it expands to a single insn only.
> 	(expand_vec_perm_palignr): Add SINGLE_INSN_ONLY_P argument.  If true,
> 	fail unless in_order is true.  Add forward declaration.
> 	(expand_vec_perm_vperm2f128): Fix up comment about which permutation
> 	is useful for one_operand_p.
> 	(ix86_expand_vec_perm_const_1): Adjust expand_vec_perm_palignr caller.

Now bootstrapped/regtested on x86_64-linux and i686-linux (and additionally
tested with --target_board=unix/-mavx2), ok for trunk?

> --- gcc/config/i386/i386.c.jj	2014-10-01 15:42:28.000000000 +0200
> +++ gcc/config/i386/i386.c	2014-10-01 15:50:06.234079887 +0200
> @@ -39636,6 +39636,7 @@ struct expand_vec_perm_d
>  static bool canonicalize_perm (struct expand_vec_perm_d *d);
>  static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
>  static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
> +static bool expand_vec_perm_palignr (struct expand_vec_perm_d *d, bool);
>  
>  /* Get a vector mode of the same size as the original but with elements
>     twice as wide.  This is only guaranteed to apply to integral vectors.  */
> @@ -43225,6 +43226,10 @@ expand_vec_perm_1 (struct expand_vec_per
>    if (expand_vec_perm_pshufb (d))
>      return true;
>  
> +  /* Try the AVX2 vpalignr instruction.  */
> +  if (expand_vec_perm_palignr (d, true))
> +    return true;
> +
>    /* Try the AVX512F vpermi2 instructions.  */
>    rtx vec[64];
>    enum machine_mode mode = d->vmode;
> @@ -43286,10 +43291,11 @@ expand_vec_perm_pshuflw_pshufhw (struct
>     the permutation using the SSSE3 palignr instruction.  This succeeds
>     when all of the elements in PERM fit within one vector and we merely
>     need to shift them down so that a single vector permutation has a
> -   chance to succeed.  */
> +   chance to succeed.  If SINGLE_INSN_ONLY_P, succeed only if
> +   the vpalignr instruction itself can perform the requested permutation.  */
>  
>  static bool
> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, bool single_insn_only_p)
>  {
>    unsigned i, nelt = d->nelt;
>    unsigned min, max;
> @@ -43320,8 +43326,9 @@ expand_vec_perm_palignr (struct expand_v
>  
>    /* Given that we have SSSE3, we know we'll be able to implement the
>       single operand permutation after the palignr with pshufb for
> -     128-bit vectors.  */
> -  if (d->testing_p && GET_MODE_SIZE (d->vmode) == 16)
> +     128-bit vectors.  If SINGLE_INSN_ONLY_P, in_order has to be computed
> +     first.  */
> +  if (d->testing_p && GET_MODE_SIZE (d->vmode) == 16 && !single_insn_only_p)
>      return true;
>  
>    dcopy = *d;
> @@ -43342,6 +43349,9 @@ expand_vec_perm_palignr (struct expand_v
>      }
>    dcopy.one_operand_p = true;
>  
> +  if (single_insn_only_p && !in_order)
> +    return false;
> +
>    /* For AVX2, test whether we can permute the result in one instruction.  */
>    if (d->testing_p)
>      {
> @@ -43922,7 +43932,8 @@ expand_vec_perm_vperm2f128 (struct expan
>  	  return true;
>  	}
>  
> -      /* For one operand, the only useful vperm2f128 permutation is 0x10.  */
> +      /* For one operand, the only useful vperm2f128 permutation is 0x01
> +	 aka lanes swap.  */
>        if (d->one_operand_p)
>  	return false;
>      }
> @@ -44811,7 +44822,7 @@ ix86_expand_vec_perm_const_1 (struct exp
>    if (expand_vec_perm_pshuflw_pshufhw (d))
>      return true;
>  
> -  if (expand_vec_perm_palignr (d))
> +  if (expand_vec_perm_palignr (d, false))
>      return true;
>  
>    if (expand_vec_perm_interleave2 (d))


	Jakub
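
For reference, a minimal sketch (again a scalar emulation assuming the
SDM semantics; not GCC source) of the pr62128.c sequence quoted above:
vpermq $78 swaps the two 128-bit lanes, after which a single vpalignr $1
on the swapped and original registers amounts to rotating all 32 bytes
by one, i.e. dst[i] = src[(i + 1) & 31]:

#include <stdlib.h>

int
main (void)
{
  unsigned char src[32], swp[32], dst[32];
  int i, lane;

  for (i = 0; i < 32; i++)
    src[i] = i;

  /* vpermq $78: qword order 2,3,0,1, i.e. a lane swap.  */
  for (i = 0; i < 32; i++)
    swp[i] = src[(i + 16) & 31];

  /* vpalignr $1, src, swp: each result lane is bytes 1..16 of the
     32-byte concatenation swp_lane:src_lane.  */
  for (lane = 0; lane < 2; lane++)
    for (i = 0; i < 16; i++)
      {
        int j = i + 1;
        dst[lane * 16 + i] = j < 16 ? src[lane * 16 + j]
                                    : swp[lane * 16 + j - 16];
      }

  /* The two insns implement the cross-lane byte rotation that the
     old sequence needed two vpshufb, a vpermq and a vpor for.  */
  for (i = 0; i < 32; i++)
    if (dst[i] != src[(i + 1) & 31])
      abort ();
  return 0;
}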


* Re: [PATCH 1/2, x86] Add palignr support for AVX2.
  2014-10-01 20:44                                                           ` Jakub Jelinek
@ 2014-10-02  6:30                                                             ` Uros Bizjak
  0 siblings, 0 replies; 40+ messages in thread
From: Uros Bizjak @ 2014-10-02  6:30 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Evgeny Stupachenko, H.J. Lu, Richard Henderson, GCC Patches,
	Richard Biener

On Wed, Oct 1, 2014 at 10:43 PM, Jakub Jelinek <jakub@redhat.com> wrote:

> > For PR62128, IMHO the right fix is attached.  Note, this is again covered
> > in vshuf-*.c tests (test 22 in both vshuf-v32*.c and vshuf-v16*.c).
> > With that attached patch, pr62128.c (aka test_22 in vshuf-v32qi.c), changes:
> > -       vpshufb .LC0(%rip), %ymm0, %ymm1
> > -       vpshufb .LC1(%rip), %ymm0, %ymm0
> > -       vpermq  $78, %ymm1, %ymm1
> > -       vpor    %ymm1, %ymm0, %ymm0
> > +       vpermq  $78, %ymm0, %ymm1
> > +       vpalignr        $1, %ymm0, %ymm1, %ymm0
> >         ret
> >
> > 2014-10-01  Jakub Jelinek  <jakub@redhat.com>
> >
> >       PR target/62128
> >       * config/i386/i386.c (expand_vec_perm_1): Try expand_vec_perm_palignr
> >       if it expands to a single insn only.
> >       (expand_vec_perm_palignr): Add SINGLE_INSN_ONLY_P argument.  If true,
> >       fail unless in_order is true.  Add forward declaration.
> >       (expand_vec_perm_vperm2f128): Fix up comment about which permutation
> >       is useful for one_operand_p.
> >       (ix86_expand_vec_perm_const_1): Adjust expand_vec_perm_palignr caller.
>
> Now bootstrapped/regtested on x86_64-linux and i686-linux (and additionally
> tested with --target_board=unix/-mavx2), ok for trunk?
>

OK.

Thanks,
Uros.


end of thread

Thread overview: 40+ messages
2014-04-28 16:51 [PATCH 1/2, x86] Add palignr support for AVX2 Evgeny Stupachenko
2014-04-28 17:29 ` H.J. Lu
2014-04-28 20:43   ` Evgeny Stupachenko
2014-04-28 18:15 ` Richard Henderson
2014-04-28 20:57   ` Evgeny Stupachenko
2014-04-28 21:08     ` Richard Henderson
2014-05-05 16:49       ` Evgeny Stupachenko
2014-05-19 10:42         ` Evgeny Stupachenko
2014-05-19 15:32         ` Richard Henderson
2014-07-03  9:53           ` Evgeny Stupachenko
2014-07-04 11:22             ` Uros Bizjak
2014-07-04 12:47               ` Evgeny Stupachenko
2014-07-07 14:40             ` Richard Henderson
2014-07-10 15:29               ` Evgeny Stupachenko
2014-08-14  8:09                 ` Evgeny Stupachenko
2014-08-14 14:55                   ` H.J. Lu
2014-08-26 12:59                     ` Evgeny Stupachenko
2014-08-26 16:29                       ` Richard Henderson
2014-08-27 15:50                         ` Evgeny Stupachenko
2014-09-08 10:03                           ` Evgeny Stupachenko
2014-09-16 13:15                             ` Evgeny Stupachenko
2014-09-16 14:59                               ` H.J. Lu
2014-09-17 13:01                                 ` Evgeny Stupachenko
2014-09-17 14:49                                   ` H.J. Lu
2014-09-17 17:26                                     ` Evgeny Stupachenko
2014-10-01 10:16                                       ` Evgeny Stupachenko
2014-10-01 10:28                                         ` Uros Bizjak
2014-10-01 10:35                                           ` Jakub Jelinek
2014-10-01 10:50                                             ` Evgeny Stupachenko
2014-10-01 11:38                                             ` Jakub Jelinek
2014-10-01 11:46                                               ` Uros Bizjak
2014-10-01 12:17                                                 ` Jakub Jelinek
2014-10-01 12:25                                                   ` Uros Bizjak
2014-10-01 12:56                                                     ` Jakub Jelinek
2014-10-01 13:10                                                       ` Uros Bizjak
2014-10-01 14:12                                                         ` Jakub Jelinek
2014-10-01 14:21                                                           ` Uros Bizjak
2014-10-01 20:44                                                           ` Jakub Jelinek
2014-10-02  6:30                                                             ` Uros Bizjak
2014-10-01 19:03                                                       ` Uros Bizjak
