public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, x86] Fix pblendv expand.
@ 2014-12-08 23:33 Evgeny Stupachenko
  2014-12-09  8:57 ` Uros Bizjak
  0 siblings, 1 reply; 13+ messages in thread
From: Evgeny Stupachenko @ 2014-12-08 23:33 UTC (permalink / raw)
  To: GCC Patches, Uros Bizjak, Richard Henderson

Hi,

The patch fix pblendv expand.
The bug was uncovered when permutation operands are constants.
In this case we init target register for expand_vec_perm_1 with
constant and then rewrite the target with constant for
expand_vec_perm_pblend.

The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
-flto -march=corei7.

Bootstrap and make check passed.

Is it ok?

Evgeny

2014-12-09  Evgeny Stupachenko  <evstupac@gmail.com>

gcc/
        * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
        expand_vec_perm_1 target.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eafc15a..5a914ad 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
     dcopy.op0 = dcopy.op1 = d->op1;
   else
     dcopy.op0 = dcopy.op1 = d->op0;
+  dcopy.target = gen_reg_rtx (vmode);
   dcopy.one_operand_p = true;

   for (i = 0; i < nelt; ++i)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, x86] Fix pblendv expand.
  2014-12-08 23:33 [PATCH, x86] Fix pblendv expand Evgeny Stupachenko
@ 2014-12-09  8:57 ` Uros Bizjak
  2014-12-09  9:06   ` Uros Bizjak
  0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2014-12-09  8:57 UTC (permalink / raw)
  To: Evgeny Stupachenko; +Cc: GCC Patches, Richard Henderson

On Tue, Dec 9, 2014 at 12:33 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:

> The patch fix pblendv expand.
> The bug was uncovered when permutation operands are constants.
> In this case we init target register for expand_vec_perm_1 with
> constant and then rewrite the target with constant for
> expand_vec_perm_pblend.
>
> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
> -flto -march=corei7.
>
> Bootstrap and make check passed.
>
> Is it ok?

Please add a testcase.

Uros.

>
> Evgeny
>
> 2014-12-09  Evgeny Stupachenko  <evstupac@gmail.com>
>
> gcc/
>         * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>         expand_vec_perm_1 target.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index eafc15a..5a914ad 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
>      dcopy.op0 = dcopy.op1 = d->op1;
>    else
>      dcopy.op0 = dcopy.op1 = d->op0;
> +  dcopy.target = gen_reg_rtx (vmode);
>    dcopy.one_operand_p = true;
>
>    for (i = 0; i < nelt; ++i)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, x86] Fix pblendv expand.
  2014-12-09  8:57 ` Uros Bizjak
@ 2014-12-09  9:06   ` Uros Bizjak
  2014-12-09 13:29     ` Evgeny Stupachenko
  0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2014-12-09  9:06 UTC (permalink / raw)
  To: Evgeny Stupachenko; +Cc: GCC Patches, Richard Henderson, Jakub Jelinek

On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak <ubizjak@gmail.com> wrote:

>> The patch fix pblendv expand.
>> The bug was uncovered when permutation operands are constants.
>> In this case we init target register for expand_vec_perm_1 with
>> constant and then rewrite the target with constant for
>> expand_vec_perm_pblend.
>>
>> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
>> -flto -march=corei7.
>>
>> Bootstrap and make check passed.
>>
>> Is it ok?
>
> Please add a testcase.

Also, it surprises me that we enter expand_vec_perm_pblendv with
uninitialized (?) target. Does your patch only papers over a real
problem up the call chain (hard to say without a testcase)?

Uros.

>
>>
>> Evgeny
>>
>> 2014-12-09  Evgeny Stupachenko  <evstupac@gmail.com>
>>
>> gcc/
>>         * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>>         expand_vec_perm_1 target.
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index eafc15a..5a914ad 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
>>      dcopy.op0 = dcopy.op1 = d->op1;
>>    else
>>      dcopy.op0 = dcopy.op1 = d->op0;
>> +  dcopy.target = gen_reg_rtx (vmode);
>>    dcopy.one_operand_p = true;
>>
>>    for (i = 0; i < nelt; ++i)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, x86] Fix pblendv expand.
  2014-12-09  9:06   ` Uros Bizjak
@ 2014-12-09 13:29     ` Evgeny Stupachenko
  2014-12-09 15:59       ` Evgeny Stupachenko
  0 siblings, 1 reply; 13+ messages in thread
From: Evgeny Stupachenko @ 2014-12-09 13:29 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches, Richard Henderson, Jakub Jelinek

The case comes from spec2006 403.gcc (or old GCC itself).

  for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
    {
      vd->e[i].mode = VOIDmode;
      vd->e[i].oldest_regno = i;
      vd->e[i].next_regno = INVALID_REGNUM;
    }

It is vectorized and only then completely peeled.
Only after peeling all stored values become constant.

Currently expand_vec_perm_pblendv works as following:
Let d.target, d.op0, dop1 be permutation parameters.

First we permute an operand (d.op1 or d.op0) and then blend it with
other argument:

d.target = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, d.target)
(if d.op0 equal to d.target this is buggy)

Patch make it more accurate:
tmp = gen_reg_rtx (vmode)
tmp = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, tmp)
(Here d.op0 can be equal to d.target)

Below is rtl dump of buggy case:

(183t.optimized)
...
vect_shuffle3_low_470 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 32, 33, 34,
35 }, { 0, 4, 0, 1 }>;
vect_shuffle3_high_469 = VEC_PERM_EXPR <vect_shuffle3_low_470, {
4294967295, 4294967295, 4294967295, 4294967295 }, { 0, 1, 4, 3 }>;
...
(184r.expand)
...
(insn 205 204 206 (set (reg:V4SI 768)
        (const_vector:V4SI [
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
            ])) ../regrename.c:1171 -1
     (nil))

(insn 206 205 208 (set (reg:V4SI 769)
        (mem/u/c:V4SI (symbol_ref/u:DI ("*.LC28") [flags 0x2]) [3  S16
A128])) ../regrename.c:1171 -1
     (expr_list:REG_EQUAL (const_vector:V4SI [
                (const_int 32 [0x20])
                (const_int 33 [0x21])
                (const_int 34 [0x22])
                (const_int 35 [0x23])
            ])
        (nil)))

(insn 208 206 207 (set (reg:V4SI 770)
        (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768)
                (reg:V4SI 769))
            (parallel [
                    (const_int 0 [0])
                    (const_int 4 [0x4])
                    (const_int 1 [0x1])
                    (const_int 5 [0x5])
                ]))) ../regrename.c:1171 -1
     (nil))

(insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ])
        (vec_select:V4SI (reg:V4SI 770)
            (parallel [
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (const_int 2 [0x2])
                ]))) ../regrename.c:1171 -1
     (nil))

(insn 209 207 210 (set (reg:V4SI 771)
        (const_vector:V4SI [
                (const_int -1 [0xffffffffffffffff])
                (const_int -1 [0xffffffffffffffff])
                (const_int -1 [0xffffffffffffffff])
                (const_int -1 [0xffffffffffffffff])
            ])) ../regrename.c:1171 -1
     (nil))

(insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ])
        (vec_select:V4SI (reg:V4SI 771)
            (parallel [
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (const_int 3 [0x3])
                ]))) ../regrename.c:1171 -1
     (nil))
(insn 211 210 212 (set (reg:V8HI 772)
        (vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
            (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
            (const_int 48 [0x30]))) ../regrename.c:1171 -1
     (nil))
...

On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
>>> The patch fix pblendv expand.
>>> The bug was uncovered when permutation operands are constants.
>>> In this case we init target register for expand_vec_perm_1 with
>>> constant and then rewrite the target with constant for
>>> expand_vec_perm_pblend.
>>>
>>> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
>>> -flto -march=corei7.
>>>
>>> Bootstrap and make check passed.
>>>
>>> Is it ok?
>>
>> Please add a testcase.
>
> Also, it surprises me that we enter expand_vec_perm_pblendv with
> uninitialized (?) target. Does your patch only papers over a real
> problem up the call chain (hard to say without a testcase)?
>
> Uros.
>
>>
>>>
>>> Evgeny
>>>
>>> 2014-12-09  Evgeny Stupachenko  <evstupac@gmail.com>
>>>
>>> gcc/
>>>         * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>>>         expand_vec_perm_1 target.
>>>
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index eafc15a..5a914ad 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
>>>      dcopy.op0 = dcopy.op1 = d->op1;
>>>    else
>>>      dcopy.op0 = dcopy.op1 = d->op0;
>>> +  dcopy.target = gen_reg_rtx (vmode);
>>>    dcopy.one_operand_p = true;
>>>
>>>    for (i = 0; i < nelt; ++i)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, x86] Fix pblendv expand.
  2014-12-09 13:29     ` Evgeny Stupachenko
@ 2014-12-09 15:59       ` Evgeny Stupachenko
  2014-12-09 16:20         ` Richard Henderson
  2014-12-09 16:23         ` Jakub Jelinek
  0 siblings, 2 replies; 13+ messages in thread
From: Evgeny Stupachenko @ 2014-12-09 15:59 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches, Richard Henderson, Jakub Jelinek

I've tried to get smaller reproducer, however currently it is
complicated as several functions in GCC.
However patch is fixing spec2006 benchmark. Shouldn't that be enough
for regression testing?

Thanks,
Evgeny

On Tue, Dec 9, 2014 at 4:29 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> The case comes from spec2006 403.gcc (or old GCC itself).
>
>   for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
>     {
>       vd->e[i].mode = VOIDmode;
>       vd->e[i].oldest_regno = i;
>       vd->e[i].next_regno = INVALID_REGNUM;
>     }
>
> It is vectorized and only then completely peeled.
> Only after peeling all stored values become constant.
>
> Currently expand_vec_perm_pblendv works as following:
> Let d.target, d.op0, dop1 be permutation parameters.
>
> First we permute an operand (d.op1 or d.op0) and then blend it with
> other argument:
>
> d.target = shuffle(d.op1) /* using expand_vec_perm_1 */
> d.target = pblend(d.op0, d.target)
> (if d.op0 equal to d.target this is buggy)
>
> Patch make it more accurate:
> tmp = gen_reg_rtx (vmode)
> tmp = shuffle(d.op1) /* using expand_vec_perm_1 */
> d.target = pblend(d.op0, tmp)
> (Here d.op0 can be equal to d.target)
>
> Below is rtl dump of buggy case:
>
> (183t.optimized)
> ...
> vect_shuffle3_low_470 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 32, 33, 34,
> 35 }, { 0, 4, 0, 1 }>;
> vect_shuffle3_high_469 = VEC_PERM_EXPR <vect_shuffle3_low_470, {
> 4294967295, 4294967295, 4294967295, 4294967295 }, { 0, 1, 4, 3 }>;
> ...
> (184r.expand)
> ...
> (insn 205 204 206 (set (reg:V4SI 768)
>         (const_vector:V4SI [
>                 (const_int 0 [0])
>                 (const_int 0 [0])
>                 (const_int 0 [0])
>                 (const_int 0 [0])
>             ])) ../regrename.c:1171 -1
>      (nil))
>
> (insn 206 205 208 (set (reg:V4SI 769)
>         (mem/u/c:V4SI (symbol_ref/u:DI ("*.LC28") [flags 0x2]) [3  S16
> A128])) ../regrename.c:1171 -1
>      (expr_list:REG_EQUAL (const_vector:V4SI [
>                 (const_int 32 [0x20])
>                 (const_int 33 [0x21])
>                 (const_int 34 [0x22])
>                 (const_int 35 [0x23])
>             ])
>         (nil)))
>
> (insn 208 206 207 (set (reg:V4SI 770)
>         (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768)
>                 (reg:V4SI 769))
>             (parallel [
>                     (const_int 0 [0])
>                     (const_int 4 [0x4])
>                     (const_int 1 [0x1])
>                     (const_int 5 [0x5])
>                 ]))) ../regrename.c:1171 -1
>      (nil))
>
> (insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ])
>         (vec_select:V4SI (reg:V4SI 770)
>             (parallel [
>                     (const_int 0 [0])
>                     (const_int 1 [0x1])
>                     (const_int 0 [0])
>                     (const_int 2 [0x2])
>                 ]))) ../regrename.c:1171 -1
>      (nil))
>
> (insn 209 207 210 (set (reg:V4SI 771)
>         (const_vector:V4SI [
>                 (const_int -1 [0xffffffffffffffff])
>                 (const_int -1 [0xffffffffffffffff])
>                 (const_int -1 [0xffffffffffffffff])
>                 (const_int -1 [0xffffffffffffffff])
>             ])) ../regrename.c:1171 -1
>      (nil))
>
> (insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ])
>         (vec_select:V4SI (reg:V4SI 771)
>             (parallel [
>                     (const_int 0 [0])
>                     (const_int 1 [0x1])
>                     (const_int 0 [0])
>                     (const_int 3 [0x3])
>                 ]))) ../regrename.c:1171 -1
>      (nil))
> (insn 211 210 212 (set (reg:V8HI 772)
>         (vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
>             (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
>             (const_int 48 [0x30]))) ../regrename.c:1171 -1
>      (nil))
> ...
>
> On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>
>>>> The patch fix pblendv expand.
>>>> The bug was uncovered when permutation operands are constants.
>>>> In this case we init target register for expand_vec_perm_1 with
>>>> constant and then rewrite the target with constant for
>>>> expand_vec_perm_pblend.
>>>>
>>>> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
>>>> -flto -march=corei7.
>>>>
>>>> Bootstrap and make check passed.
>>>>
>>>> Is it ok?
>>>
>>> Please add a testcase.
>>
>> Also, it surprises me that we enter expand_vec_perm_pblendv with
>> uninitialized (?) target. Does your patch only papers over a real
>> problem up the call chain (hard to say without a testcase)?
>>
>> Uros.
>>
>>>
>>>>
>>>> Evgeny
>>>>
>>>> 2014-12-09  Evgeny Stupachenko  <evstupac@gmail.com>
>>>>
>>>> gcc/
>>>>         * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>>>>         expand_vec_perm_1 target.
>>>>
>>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>>> index eafc15a..5a914ad 100644
>>>> --- a/gcc/config/i386/i386.c
>>>> +++ b/gcc/config/i386/i386.c
>>>> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
>>>>      dcopy.op0 = dcopy.op1 = d->op1;
>>>>    else
>>>>      dcopy.op0 = dcopy.op1 = d->op0;
>>>> +  dcopy.target = gen_reg_rtx (vmode);
>>>>    dcopy.one_operand_p = true;
>>>>
>>>>    for (i = 0; i < nelt; ++i)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, x86] Fix pblendv expand.
  2014-12-09 15:59       ` Evgeny Stupachenko
@ 2014-12-09 16:20         ` Richard Henderson
  2014-12-09 16:38           ` Evgeny Stupachenko
  2014-12-09 16:23         ` Jakub Jelinek
  1 sibling, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2014-12-09 16:20 UTC (permalink / raw)
  To: Evgeny Stupachenko, Uros Bizjak; +Cc: GCC Patches, Jakub Jelinek

On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote:
> However patch is fixing spec2006 benchmark. Shouldn't that be enough
> for regression testing?
> 

No.  Spec is not free.


r~

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, x86] Fix pblendv expand.
  2014-12-09 15:59       ` Evgeny Stupachenko
  2014-12-09 16:20         ` Richard Henderson
@ 2014-12-09 16:23         ` Jakub Jelinek
  1 sibling, 0 replies; 13+ messages in thread
From: Jakub Jelinek @ 2014-12-09 16:23 UTC (permalink / raw)
  To: Evgeny Stupachenko; +Cc: Uros Bizjak, GCC Patches, Richard Henderson

On Tue, Dec 09, 2014 at 06:59:27PM +0300, Evgeny Stupachenko wrote:
> I've tried to get smaller reproducer, however currently it is
> complicated as several functions in GCC.

Just add a gcc_assert where you are changing the code, testing for
what you want to avoid.
Then just delta reduce it or creduce it.

	Jakub

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, x86] Fix pblendv expand.
  2014-12-09 16:20         ` Richard Henderson
@ 2014-12-09 16:38           ` Evgeny Stupachenko
  2014-12-09 21:33             ` Evgeny Stupachenko
  0 siblings, 1 reply; 13+ messages in thread
From: Evgeny Stupachenko @ 2014-12-09 16:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Uros Bizjak, GCC Patches, Jakub Jelinek

I mean that there are a lot of people tracking spec2006 stability and
therefore the issue should be on track in future.
And that I can create the test case, but it would be as big as several
GCC functions.
Will work on reducing the test case.


On Tue, Dec 9, 2014 at 7:20 PM, Richard Henderson <rth@redhat.com> wrote:
> On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote:
>> However patch is fixing spec2006 benchmark. Shouldn't that be enough
>> for regression testing?
>>
>
> No.  Spec is not free.
>
>
> r~

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, x86] Fix pblendv expand.
  2014-12-09 16:38           ` Evgeny Stupachenko
@ 2014-12-09 21:33             ` Evgeny Stupachenko
  2014-12-09 21:54               ` Jakub Jelinek
  0 siblings, 1 reply; 13+ messages in thread
From: Evgeny Stupachenko @ 2014-12-09 21:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Uros Bizjak, GCC Patches, Jakub Jelinek

I've added the reproducer to the patch.

is it ok?

ChangeLog:

2014-12-10  Evgeny Stupachenko  <evstupac@gmail.com>

gcc/testsuite
        * gcc.target/i386/blend.c: New.

gcc/
        * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
        expand_vec_perm_1 target.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eafc15a..5a914ad 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
     dcopy.op0 = dcopy.op1 = d->op1;
   else
     dcopy.op0 = dcopy.op1 = d->op0;
+  dcopy.target = gen_reg_rtx (vmode);
   dcopy.one_operand_p = true;

   for (i = 0; i < nelt; ++i)
diff --git a/gcc/testsuite/gcc.target/i386/blend.c
b/gcc/testsuite/gcc.target/i386/blend.c
new file mode 100644
index 0000000..d03bdbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/blend.c
@@ -0,0 +1,61 @@
+/* Test correctness of size 3 store groups permutation.  */
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+
+#define N 50
+
+enum num3
+{
+  a, b, c
+};
+
+struct flags
+{
+  enum num3 f;
+  unsigned int c;
+  unsigned int p;
+};
+
+struct flagsN
+{
+  struct flags a[N];
+};
+
+void
+bar (int n, struct flagsN *ff)
+{
+  struct flagsN *fc;
+  for (fc = ff + 1; fc < (ff + n); fc++)
+    {
+      int i;
+      for (i = 0; i < N; ++i)
+       {
+         ff->a[i].f = 0;
+         ff->a[i].c = i;
+         ff->a[i].p = -1;
+       }
+      for (i = 0; i < n; i++)
+       {
+         int j;
+         for (j = 0; j < N - n; ++j)
+           {
+             fc->a[i + j].f = 0;
+             fc->a[i + j].c = j + i;
+             fc->a[i + j].p = -1;
+           }
+       }
+    }
+}
+
+struct flagsN q[2];
+
+int main()
+{
+  int i;
+  long long *rr = (long long *)q[0].a;
+  bar(2, q);
+  for (i = 0; i < N * 2; i += 2)
+    if (rr[i] == -1 && rr[i + 1] == -1)
+      return 1;
+  return 0;
+}



On Tue, Dec 9, 2014 at 7:38 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> I mean that there are a lot of people tracking spec2006 stability and
> therefore the issue should be on track in future.
> And that I can create the test case, but it would be as big as several
> GCC functions.
> Will work on reducing the test case.
>
>
> On Tue, Dec 9, 2014 at 7:20 PM, Richard Henderson <rth@redhat.com> wrote:
>> On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote:
>>> However patch is fixing spec2006 benchmark. Shouldn't that be enough
>>> for regression testing?
>>>
>>
>> No.  Spec is not free.
>>
>>
>> r~

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, x86] Fix pblendv expand.
  2014-12-09 21:33             ` Evgeny Stupachenko
@ 2014-12-09 21:54               ` Jakub Jelinek
  2014-12-09 23:37                 ` Evgeny Stupachenko
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2014-12-09 21:54 UTC (permalink / raw)
  To: Evgeny Stupachenko; +Cc: Richard Henderson, Uros Bizjak, GCC Patches

On Wed, Dec 10, 2014 at 12:33:52AM +0300, Evgeny Stupachenko wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/blend.c
> @@ -0,0 +1,61 @@
> +/* Test correctness of size 3 store groups permutation.  */
> +/* { dg-do run } */
> +/* { dg-options "-O3" } */
> +
> +#define N 50
> +
> +enum num3
> +{
> +  a, b, c
> +};
> +
> +struct flags
> +{
> +  enum num3 f;

Does this really has to be an enum?  Doesn't unsigned int there work the
same?

> +int main()
> +{
> +  int i;
> +  long long *rr = (long long *)q[0].a;
> +  bar(2, q);
> +  for (i = 0; i < N * 2; i += 2)
> +    if (rr[i] == -1 && rr[i + 1] == -1)

This is aliasing violation, can't you avoid that?  I mean, just
check the individual struct flags fields?  And with the aliasing violation
fixed, is there anything i?86 specific left in the testcase (i.e. shouldn't
it go either to gcc.dg/ or gcc.c-torture/execute/ instead?)?

	Jakub

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, x86] Fix pblendv expand.
  2014-12-09 21:54               ` Jakub Jelinek
@ 2014-12-09 23:37                 ` Evgeny Stupachenko
  2014-12-10 11:03                   ` Jakub Jelinek
  0 siblings, 1 reply; 13+ messages in thread
From: Evgeny Stupachenko @ 2014-12-09 23:37 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Henderson, Uros Bizjak, GCC Patches

On Wed, Dec 10, 2014 at 12:54 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Dec 10, 2014 at 12:33:52AM +0300, Evgeny Stupachenko wrote:
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/blend.c
>> @@ -0,0 +1,61 @@
>> +/* Test correctness of size 3 store groups permutation.  */
>> +/* { dg-do run } */
>> +/* { dg-options "-O3" } */
>> +
>> +#define N 50
>> +
>> +enum num3
>> +{
>> +  a, b, c
>> +};
>> +
>> +struct flags
>> +{
>> +  enum num3 f;
>
> Does this really has to be an enum?  Doesn't unsigned int there work the
> same?

Without enum the loop is not vectorized because of cost model (scalar
code is 1 insn faster).
Anyway, even with "-fvect-cost-model=unlimited" the issue is not reproduced.

>
>> +int main()
>> +{
>> +  int i;
>> +  long long *rr = (long long *)q[0].a;
>> +  bar(2, q);
>> +  for (i = 0; i < N * 2; i += 2)
>> +    if (rr[i] == -1 && rr[i + 1] == -1)
>
> This is aliasing violation, can't you avoid that?  I mean, just
> check the individual struct flags fields?  And with the aliasing violation
> fixed, is there anything i?86 specific left in the testcase (i.e. shouldn't
> it go either to gcc.dg/ or gcc.c-torture/execute/ instead?)?

There is nothing i?86 specific. Not sure if it can give real value for
other architectures.
Let's move it to gcc.dg/vect

Updated patch:

2014-12-10  Evgeny Stupachenko  <evstupac@gmail.com>

gcc/testsuite
        * gcc.dg/vect/blend.c: New.

gcc/
        * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
        expand_vec_perm_1 target.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eafc15a..5a914ad 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
     dcopy.op0 = dcopy.op1 = d->op1;
   else
     dcopy.op0 = dcopy.op1 = d->op0;
+  dcopy.target = gen_reg_rtx (vmode);
   dcopy.one_operand_p = true;

   for (i = 0; i < nelt; ++i)
diff --git a/gcc/testsuite/gcc.dg/vect/blend.c
b/gcc/testsuite/gcc.dg/vect/blend.c
new file mode 100644
index 0000000..ebe2902
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/blend.c
@@ -0,0 +1,63 @@
+/* Test correctness of size 3 store groups permutation.  */
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+#define N 50
+
+enum num3
+{
+  a, b, c
+};
+
+struct flags
+{
+  enum num3 f;
+  unsigned int c;
+  unsigned int p;
+};
+
+struct flagsN
+{
+  struct flags a[N];
+};
+
+void
+bar (int n, struct flagsN *ff)
+{
+  struct flagsN *fc;
+  for (fc = ff + 1; fc < (ff + n); fc++)
+    {
+      int i;
+      for (i = 0; i < N; ++i)
+       {
+         ff->a[i].f = 0;
+         ff->a[i].c = i;
+         ff->a[i].p = -1;
+       }
+      for (i = 0; i < n; i++)
+       {
+         int j;
+         for (j = 0; j < N - n; ++j)
+           {
+             fc->a[i + j].f = 0;
+             fc->a[i + j].c = j + i;
+             fc->a[i + j].p = -1;
+           }
+       }
+    }
+}
+
+struct flagsN q[2];
+
+int main()
+{
+  int i;
+  check_vect ();
+  bar(2, q);
+  for (i = 0; i < N; i++)
+    if (q[0].a[i].f != 0 || q[0].a[i].c != i || q[0].a[i].p != -1)
+      return 1;
+  return 0;
+}
+/* { dg-final { cleanup-tree-dump "vect" } } */

>
>         Jakub

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, x86] Fix pblendv expand.
  2014-12-09 23:37                 ` Evgeny Stupachenko
@ 2014-12-10 11:03                   ` Jakub Jelinek
  2014-12-10 16:30                     ` Uros Bizjak
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2014-12-10 11:03 UTC (permalink / raw)
  To: Evgeny Stupachenko; +Cc: Richard Henderson, Uros Bizjak, GCC Patches

On Wed, Dec 10, 2014 at 02:37:13AM +0300, Evgeny Stupachenko wrote:
> 2014-12-10  Evgeny Stupachenko  <evstupac@gmail.com>

I went ahead and filed a PR, so we have something to refer to in the
ChangeLog and name the testcases.

> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
>      dcopy.op0 = dcopy.op1 = d->op1;
>    else
>      dcopy.op0 = dcopy.op1 = d->op0;
> +  dcopy.target = gen_reg_rtx (vmode);
>    dcopy.one_operand_p = true;
> 
>    for (i = 0; i < nelt; ++i)

This is incorrect if d->testing_p is true (can happen e.g. on the testcase
below; generally, when testing_p is true, we should not use gen_reg_rtx
because it can be called from GIMPLE optimizers before init_emit is called.
See PR57896 for details.
If d->testing_p is true, target is never the same as any of the operands,
all 3 are virtual registers, so there is no overlap (and even if there would
be, it should not matter).

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/blend.c
> @@ -0,0 +1,63 @@
> +/* Test correctness of size 3 store groups permutation.  */
> +/* { dg-do run } */

The testcase as is does not fail even with unfixed compiler.  I had to add
some dg-additional-options.

> +  bar(2, q);
> +  for (i = 0; i < N; i++)
> +    if (q[0].a[i].f != 0 || q[0].a[i].c != i || q[0].a[i].p != -1)
> +      return 1;

Furthermore, tests in the testsuite should fail through abort ()
/ __builtin_abort () instead of just returning non-zero exit code.

So, here is complete updated patch I'm going to bootstrap/regtest.

2014-12-10  Jakub Jelinek  <jakub@redhat.com>
	    Evgeny Stupachenko  <evstupac@gmail.com>

	* config/i386/i386.c (expand_vec_perm_pblendv): If not testing_p,
	set dcopy.target to a new pseudo.

	* gcc.dg/vect/pr64252.c: New test.
	* gcc.dg/pr64252.c: New test.
	* gcc.target/i386/avx2-pr64252.c: New test.

--- gcc/config/i386/i386.c.jj	2014-12-10 09:45:15.000000000 +0100
+++ gcc/config/i386/i386.c	2014-12-10 11:38:12.530795610 +0100
@@ -47554,6 +47554,8 @@ expand_vec_perm_pblendv (struct expand_v
     dcopy.op0 = dcopy.op1 = d->op1;
   else
     dcopy.op0 = dcopy.op1 = d->op0;
+  if (!d->testing_p)
+    dcopy.target = gen_reg_rtx (vmode);
   dcopy.one_operand_p = true;
 
   for (i = 0; i < nelt; ++i)
--- gcc/testsuite/gcc.dg/vect/pr64252.c.jj	2014-12-10 11:42:47.669991028 +0100
+++ gcc/testsuite/gcc.dg/vect/pr64252.c	2014-12-10 11:47:43.895818223 +0100
@@ -0,0 +1,66 @@
+/* PR target/64252 */
+/* Test correctness of size 3 store groups permutation.  */
+/* { dg-do run } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#include "tree-vect.h"
+
+#define N 50
+
+enum num3
+{
+  a, b, c
+};
+
+struct flags
+{
+  enum num3 f;
+  unsigned int c;
+  unsigned int p;
+};
+
+struct flagsN
+{
+  struct flags a[N];
+};
+
+void
+bar (int n, struct flagsN *ff)
+{
+  struct flagsN *fc;
+  for (fc = ff + 1; fc < (ff + n); fc++)
+    {
+      int i;
+      for (i = 0; i < N; ++i)
+       {
+         ff->a[i].f = 0;
+         ff->a[i].c = i;
+         ff->a[i].p = -1;
+       }
+      for (i = 0; i < n; i++)
+       {
+         int j;
+         for (j = 0; j < N - n; ++j)
+           {
+             fc->a[i + j].f = 0;
+             fc->a[i + j].c = j + i;
+             fc->a[i + j].p = -1;
+           }
+       }
+    }
+}
+
+struct flagsN q[2];
+
+int main()
+{
+  int i;
+  check_vect ();
+  bar(2, q);
+  for (i = 0; i < N; i++)
+    if (q[0].a[i].f != 0 || q[0].a[i].c != i || q[0].a[i].p != -1)
+      abort ();
+  return 0;
+}
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/pr64252.c.jj	2014-12-10 11:26:05.649467180 +0100
+++ gcc/testsuite/gcc.dg/pr64252.c	2014-12-10 11:25:27.057139759 +0100
@@ -0,0 +1,30 @@
+/* PR target/64252 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+typedef unsigned int V __attribute__((vector_size (32)));
+
+__attribute__((noinline, noclone)) void
+foo (V *a, V *b, V *c, V *d, V *e)
+{
+  V t = __builtin_shuffle (*a, *b, *c);
+  V v = __builtin_shuffle (t, (V) { ~0U, ~0U, ~0U, ~0U, ~0U, ~0U, ~0U, ~0U }, (V) { 0, 1, 8, 3, 4, 5, 9, 7 });
+  v = v + *d;
+  *e = v;
+}
+
+int
+main ()
+{
+  V a, b, c, d, e;
+  int i;
+  a = (V) { 1, 2, 3, 4, 5, 6, 7, 8 };
+  b = (V) { 9, 10, 11, 12, 13, 14, 15, 16 };
+  c = (V) { 1, 3, 5, 7, 9, 11, 13, 15 };
+  d = (V) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  foo (&a, &b, &c, &d, &e);
+  for (i = 0; i < 8; i++)
+    if (e[i] != ((i == 2 || i == 6) ? ~0U : 2 + 2 * i))
+      __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.target/i386/avx2-pr64252.c.jj	2014-12-10 11:27:36.868877426 +0100
+++ gcc/testsuite/gcc.target/i386/avx2-pr64252.c	2014-12-10 11:30:49.500520279 +0100
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx2" } */
+/* { dg-require-effective-target avx2 } */
+
+#include "avx2-check.h"
+
+#define main() do_main ()
+
+#include "../../gcc.dg/pr64252.c"
+
+static void
+avx2_test (void)
+{
+  do_main ();
+}


	Jakub

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, x86] Fix pblendv expand.
  2014-12-10 11:03                   ` Jakub Jelinek
@ 2014-12-10 16:30                     ` Uros Bizjak
  0 siblings, 0 replies; 13+ messages in thread
From: Uros Bizjak @ 2014-12-10 16:30 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Evgeny Stupachenko, Richard Henderson, GCC Patches

On Wed, Dec 10, 2014 at 12:03 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Dec 10, 2014 at 02:37:13AM +0300, Evgeny Stupachenko wrote:
>> 2014-12-10  Evgeny Stupachenko  <evstupac@gmail.com>
>
> I went ahead and filed a PR, so we have something to refer to in the
> ChangeLog and name the testcases.

Thanks!

>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
>>      dcopy.op0 = dcopy.op1 = d->op1;
>>    else
>>      dcopy.op0 = dcopy.op1 = d->op0;
>> +  dcopy.target = gen_reg_rtx (vmode);
>>    dcopy.one_operand_p = true;
>>
>>    for (i = 0; i < nelt; ++i)
>
> This is incorrect if d->testing_p is true (can happen e.g. on the testcase
> below; generally, when testing_p is true, we should not use gen_reg_rtx
> because it can be called from GIMPLE optimizers before init_emit is called.
> See PR57896 for details.

Yes, this was also my concern. We've had some nasty failures occurring
when vReg was generated after reload.

> If d->testing_p is true, target is never the same as any of the operands,
> all 3 are virtual registers, so there is no overlap (and even if there would
> be, it should not matter).
>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/blend.c
>> @@ -0,0 +1,63 @@
>> +/* Test correctness of size 3 store groups permutation.  */
>> +/* { dg-do run } */
>
> The testcase as is does not fail even with unfixed compiler.  I had to add
> some dg-additional-options.
>
>> +  bar(2, q);
>> +  for (i = 0; i < N; i++)
>> +    if (q[0].a[i].f != 0 || q[0].a[i].c != i || q[0].a[i].p != -1)
>> +      return 1;
>
> Furthermore, tests in the testsuite should fail through abort ()
> / __builtin_abort () instead of just returning non-zero exit code.
>
> So, here is complete updated patch I'm going to bootstrap/regtest.
>
> 2014-12-10  Jakub Jelinek  <jakub@redhat.com>
>             Evgeny Stupachenko  <evstupac@gmail.com>
>
>         * config/i386/i386.c (expand_vec_perm_pblendv): If not testing_p,
>         set dcopy.target to a new pseudo.
>
>         * gcc.dg/vect/pr64252.c: New test.
>         * gcc.dg/pr64252.c: New test.
>         * gcc.target/i386/avx2-pr64252.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386.c.jj   2014-12-10 09:45:15.000000000 +0100
> +++ gcc/config/i386/i386.c      2014-12-10 11:38:12.530795610 +0100
> @@ -47554,6 +47554,8 @@ expand_vec_perm_pblendv (struct expand_v
>      dcopy.op0 = dcopy.op1 = d->op1;
>    else
>      dcopy.op0 = dcopy.op1 = d->op0;
> +  if (!d->testing_p)
> +    dcopy.target = gen_reg_rtx (vmode);
>    dcopy.one_operand_p = true;
>
>    for (i = 0; i < nelt; ++i)
> --- gcc/testsuite/gcc.dg/vect/pr64252.c.jj      2014-12-10 11:42:47.669991028 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr64252.c 2014-12-10 11:47:43.895818223 +0100
> @@ -0,0 +1,66 @@
> +/* PR target/64252 */
> +/* Test correctness of size 3 store groups permutation.  */
> +/* { dg-do run } */
> +/* { dg-additional-options "-O3" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +#include "tree-vect.h"
> +
> +#define N 50
> +
> +enum num3
> +{
> +  a, b, c
> +};
> +
> +struct flags
> +{
> +  enum num3 f;
> +  unsigned int c;
> +  unsigned int p;
> +};
> +
> +struct flagsN
> +{
> +  struct flags a[N];
> +};
> +
> +void
> +bar (int n, struct flagsN *ff)
> +{
> +  struct flagsN *fc;
> +  for (fc = ff + 1; fc < (ff + n); fc++)
> +    {
> +      int i;
> +      for (i = 0; i < N; ++i)
> +       {
> +         ff->a[i].f = 0;
> +         ff->a[i].c = i;
> +         ff->a[i].p = -1;
> +       }
> +      for (i = 0; i < n; i++)
> +       {
> +         int j;
> +         for (j = 0; j < N - n; ++j)
> +           {
> +             fc->a[i + j].f = 0;
> +             fc->a[i + j].c = j + i;
> +             fc->a[i + j].p = -1;
> +           }
> +       }
> +    }
> +}
> +
> +struct flagsN q[2];
> +
> +int main()
> +{
> +  int i;
> +  check_vect ();
> +  bar(2, q);
> +  for (i = 0; i < N; i++)
> +    if (q[0].a[i].f != 0 || q[0].a[i].c != i || q[0].a[i].p != -1)
> +      abort ();
> +  return 0;
> +}
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> --- gcc/testsuite/gcc.dg/pr64252.c.jj   2014-12-10 11:26:05.649467180 +0100
> +++ gcc/testsuite/gcc.dg/pr64252.c      2014-12-10 11:25:27.057139759 +0100
> @@ -0,0 +1,30 @@
> +/* PR target/64252 */
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +typedef unsigned int V __attribute__((vector_size (32)));
> +
> +__attribute__((noinline, noclone)) void
> +foo (V *a, V *b, V *c, V *d, V *e)
> +{
> +  V t = __builtin_shuffle (*a, *b, *c);
> +  V v = __builtin_shuffle (t, (V) { ~0U, ~0U, ~0U, ~0U, ~0U, ~0U, ~0U, ~0U }, (V) { 0, 1, 8, 3, 4, 5, 9, 7 });
> +  v = v + *d;
> +  *e = v;
> +}
> +
> +int
> +main ()
> +{
> +  V a, b, c, d, e;
> +  int i;
> +  a = (V) { 1, 2, 3, 4, 5, 6, 7, 8 };
> +  b = (V) { 9, 10, 11, 12, 13, 14, 15, 16 };
> +  c = (V) { 1, 3, 5, 7, 9, 11, 13, 15 };
> +  d = (V) { 0, 0, 0, 0, 0, 0, 0, 0 };
> +  foo (&a, &b, &c, &d, &e);
> +  for (i = 0; i < 8; i++)
> +    if (e[i] != ((i == 2 || i == 6) ? ~0U : 2 + 2 * i))
> +      __builtin_abort ();
> +  return 0;
> +}
> --- gcc/testsuite/gcc.target/i386/avx2-pr64252.c.jj     2014-12-10 11:27:36.868877426 +0100
> +++ gcc/testsuite/gcc.target/i386/avx2-pr64252.c        2014-12-10 11:30:49.500520279 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx2" } */
> +/* { dg-require-effective-target avx2 } */
> +
> +#include "avx2-check.h"
> +
> +#define main() do_main ()
> +
> +#include "../../gcc.dg/pr64252.c"
> +
> +static void
> +avx2_test (void)
> +{
> +  do_main ();
> +}
>
>
>         Jakub

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-12-10 16:30 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-08 23:33 [PATCH, x86] Fix pblendv expand Evgeny Stupachenko
2014-12-09  8:57 ` Uros Bizjak
2014-12-09  9:06   ` Uros Bizjak
2014-12-09 13:29     ` Evgeny Stupachenko
2014-12-09 15:59       ` Evgeny Stupachenko
2014-12-09 16:20         ` Richard Henderson
2014-12-09 16:38           ` Evgeny Stupachenko
2014-12-09 21:33             ` Evgeny Stupachenko
2014-12-09 21:54               ` Jakub Jelinek
2014-12-09 23:37                 ` Evgeny Stupachenko
2014-12-10 11:03                   ` Jakub Jelinek
2014-12-10 16:30                     ` Uros Bizjak
2014-12-09 16:23         ` Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).