[Bug tree-optimization/101097] New: Vectorizer is too eager to use vec

* [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack
@ 2021-06-16 15:13 ubizjak at gmail dot com
  2021-06-17  6:38 ` [Bug tree-optimization/101097] " rguenth at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2021-06-16 15:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

            Bug ID: 101097
           Summary: Vectorizer is too eager to use vec_unpack
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

The following two testcases:

void
foo (unsigned short* p1, unsigned short* p2, int* __restrict p3)
{
    for (int i = 0 ; i != 8; i++)
     p3[i] = p1[i] + p2[i];
     return;
}

void
bar (unsigned short* p1, unsigned short* p2, int* __restrict p3)
{
    for (int i = 0 ; i != 4; i++)
     p3[i] = p1[i] + p2[i];
     return;
}

compile with -O3 -mavx2 to:

foo:
        vmovdqu (%rdi), %xmm1
        vmovdqu (%rsi), %xmm0
        vpmovzxwd       %xmm1, %xmm3
        vpsrldq $8, %xmm1, %xmm1
        vpmovzxwd       %xmm0, %xmm2
        vpsrldq $8, %xmm0, %xmm0
        vpmovzxwd       %xmm1, %xmm1
        vpaddd  %xmm3, %xmm2, %xmm2
        vpmovzxwd       %xmm0, %xmm0
        vmovdqu %xmm2, (%rdx)
        vpaddd  %xmm1, %xmm0, %xmm0
        vmovdqu %xmm0, 16(%rdx)
        ret

bar:
        vpmovzxwd       (%rsi), %xmm1
        vpmovzxwd       (%rdi), %xmm0
        vpaddd  %xmm1, %xmm0, %xmm0
        vmovdqu %xmm0, (%rdx)
        ret

However, with "foo" the vec_unpack* named patterns somehow interfere with the
compilation, preventing the compiler from generating code similar to "bar" but
with %ymm registers.

Disabling vec_unpacku_hi_<mode> and vec_unpacku_lo_<mode> patterns in sse.md
results in the optimal code for foo:

foo:
        vpmovzxwd       (%rsi), %ymm0
        vpmovzxwd       (%rdi), %ymm1
        vpaddd  %ymm1, %ymm0, %ymm0
        vmovdqu %ymm0, (%rdx)
        vzeroupper
        ret
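
For illustration, the code being asked for can be written by hand with GCC's generic vector extensions. This is a sketch, not compiler output: the typedef names v8hu and v8si and the function name foo_mixed are made up here, and __builtin_convertvector needs GCC 9 or later (or Clang). The point is that the input is a 16-byte vector and the result a 32-byte vector with the same number of elements, which is the shape of a single vpmovzxwd into a %ymm register.

```c
#include <string.h>

/* Hypothetical names: 8 x unsigned short = 16 bytes, 8 x int = 32 bytes.  */
typedef unsigned short v8hu __attribute__ ((vector_size (16)));
typedef int v8si __attribute__ ((vector_size (32)));

void
foo_mixed (unsigned short *p1, unsigned short *p2, int *__restrict p3)
{
  v8hu a, b;
  v8si sum;

  /* Unaligned loads of 8 unsigned shorts each.  */
  memcpy (&a, p1, sizeof a);
  memcpy (&b, p2, sizeof b);

  /* Zero-extend each lane: the element count stays 8, the vector size
     doubles from 16 to 32 bytes.  */
  sum = __builtin_convertvector (a, v8si) + __builtin_convertvector (b, v8si);

  /* Unaligned store of 8 ints.  */
  memcpy (p3, &sum, sizeof sum);
}
```

Whether a given compiler turns this into the three-instruction %ymm sequence depends on its version and options; the sketch only shows the mixed-size data flow.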


* [Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
  2021-06-16 15:13 [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack ubizjak at gmail dot com
@ 2021-06-17  6:38 ` rguenth at gcc dot gnu.org
  2021-06-17  7:00 ` crazylht at gmail dot com
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-17  6:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-* i?86-*-*
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-06-17

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, so the difference is that we use loop vect for 'foo' but fail to do that
for 'bar' and BB vect succeeds.  Disabling loop vect but enabling BB vect also
produces optimal code for 'foo' (unrolling happens before):

foo:
.LFB0:
        .cfi_startproc
        vpmovzxwd       (%rsi), %ymm0
        vpmovzxwd       (%rdi), %ymm1
        vpaddd  %ymm1, %ymm0, %ymm0
        vmovdqu %ymm0, (%rdx)
        vzeroupper

the key difference in the vectorizer is that BB vect supports different
vector sizes in the same instance but the loop vectorizer can only use
a single vector size.
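
To make the single-vector-size restriction concrete, the sketch below models it in plain C. With every vector fixed at 16 bytes, eight unsigned shorts fit in one vector but eight ints need two, so the widening has to be split into a "lo" and a "hi" half, which is the role of the vec_unpacku_lo/vec_unpacku_hi patterns named in the report. The struct and function names are hypothetical, and treating the first four elements in memory as the "lo" half is an assumption of this model.

```c
#include <assert.h>

/* Hypothetical 16-byte vector types for this model.  */
struct v8hu_model { unsigned short e[8]; };
struct v4si_model { int e[4]; };

/* Widen elements [first, first + 4) of a 16-byte input vector.  */
static struct v4si_model
unpack_half (struct v8hu_model v, int first)
{
  struct v4si_model r;
  assert (first == 0 || first == 4);
  for (int i = 0; i < 4; i++)
    r.e[i] = v.e[first + i];   /* zero-extension: unsigned short -> int */
  return r;
}

/* One 16-byte vector in, two 16-byte vectors out.  */
void
widen_same_size (struct v8hu_model in, struct v4si_model *lo,
                 struct v4si_model *hi)
{
  *lo = unpack_half (in, 0);
  *hi = unpack_half (in, 4);
}
```

This matches the shape of the assembly in the report: each 16-byte load is followed by two widening steps, one per half.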

There are some related PRs in that context.

void
foo (unsigned short* p1, unsigned short* p2, int* __restrict p3)
{
    for (int i = 0 ; i != 32; i++)
     p3[i] = p1[i] + p2[i];
     return;
}

never gets optimal code because it has too many iterations for complete
unrolling to trigger (--param max-completely-peel-times defaults to 16).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


* [Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
  2021-06-16 15:13 [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack ubizjak at gmail dot com
  2021-06-17  6:38 ` [Bug tree-optimization/101097] " rguenth at gcc dot gnu.org
@ 2021-06-17  7:00 ` crazylht at gmail dot com
  2021-06-17  7:08 ` ubizjak at gmail dot com
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2021-06-17  7:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> Hmm, so the difference is that we use loop vect for 'foo' but fail to do
> that for 'bar' and BB vect succeeds.  Disabling loop vect but enabling BB
> vect also produces optimal code for 'foo' (unrolling happens before):
> 
> foo:
> .LFB0:
>         .cfi_startproc
>         vpmovzxwd       (%rsi), %ymm0
>         vpmovzxwd       (%rdi), %ymm1
>         vpaddd  %ymm1, %ymm0, %ymm0
>         vmovdqu %ymm0, (%rdx)
>         vzeroupper
> 
> the key difference in the vectorizer is that BB vect supports different
> vector sizes in the same instance but the loop vectorizer can only use
> a single vector size.
Is there any plan for extending loop vectorizer to handle different vector
sizes?


* [Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
  2021-06-16 15:13 [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack ubizjak at gmail dot com
  2021-06-17  6:38 ` [Bug tree-optimization/101097] " rguenth at gcc dot gnu.org
  2021-06-17  7:00 ` crazylht at gmail dot com
@ 2021-06-17  7:08 ` ubizjak at gmail dot com
  2021-06-17  7:19 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2021-06-17  7:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 51031
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51031&action=edit
Pack/unpack patterns for 8-byte vectors

FYI, this patch adds pack/unpack patterns for 8-byte vectors. It will fail:

FAIL: gcc.target/i386/pr97249-1.c scan-assembler-times (?n)vpmovzxbw[ \\\\t]+\\\\(.*%xmm[0-9] 2
FAIL: gcc.target/i386/pr97249-1.c scan-assembler-times (?n)vpmovzxwd[ \\\\t]+\\\\(.*%xmm[0-9] 2

due to the mentioned issue.


* [Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
  2021-06-16 15:13 [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack ubizjak at gmail dot com
                   ` (2 preceding siblings ...)
  2021-06-17  7:08 ` ubizjak at gmail dot com
@ 2021-06-17  7:19 ` rguenth at gcc dot gnu.org
  2021-06-17  7:21 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-17  7:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #2)
> (In reply to Richard Biener from comment #1)
> > Hmm, so the difference is that we use loop vect for 'foo' but fail to do
> > that for 'bar' and BB vect succeeds.  Disabling loop vect but enabling BB
> > vect also produces optimal code for 'foo' (unrolling happens before):
> > 
> > foo:
> > .LFB0:
> >         .cfi_startproc
> >         vpmovzxwd       (%rsi), %ymm0
> >         vpmovzxwd       (%rdi), %ymm1
> >         vpaddd  %ymm1, %ymm0, %ymm0
> >         vmovdqu %ymm0, (%rdx)
> >         vzeroupper
> > 
> > the key difference in the vectorizer is that BB vect supports different
> > vector sizes in the same instance but the loop vectorizer can only use
> > a single vector size.
> Is there any plan for extending loop vectorizer to handle different vector
> sizes?

It's not an easy task: we commit to vector types per statement and quite
early (vect_determine_vectorization_factor). The same is in principle true
for BB vect, but there we know the vectorization factor beforehand (it's 1, we
can't unroll a BB) and thus can tweak the vector size instead of failing.

What would need to be done is determine the output vector type in
vectorizable_conversion based on the input vector types.  But then that
would need to be another phase of vectorizable_* calls since the
final vectorization factor would not be set.  The whole thing is related
to vector size iteration where the idea would be to somehow compute for
each stmt a set of input & output vector types that the target supports
and then somehow select sets that we want to send to costing.

As said, a lot of work, something that might be easier once we get rid of the
SLP vs. non-SLP duality.
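
As a rough illustration of why a single vector size forces the unpacks, the arithmetic can be written out. This is a simplified model, not vectorizer code: it assumes one fixed vector size for the whole loop, takes the vectorization factor to be the lane count of the narrowest element type, and derives how many vectors each wider statement then needs. All function names are invented for the sketch.

```c
#include <assert.h>

/* Lanes of a given element width in a vector of vector_bits bits.  */
unsigned
lanes (unsigned vector_bits, unsigned elem_bits)
{
  assert (elem_bits != 0 && vector_bits % elem_bits == 0);
  return vector_bits / elem_bits;
}

/* Simplified model: the vectorization factor is set by the narrowest
   element type used in the loop.  */
unsigned
model_vf (unsigned vector_bits, unsigned narrowest_elem_bits)
{
  return lanes (vector_bits, narrowest_elem_bits);
}

/* Vectors needed per statement that works on elem_bits-wide elements.  */
unsigned
model_copies (unsigned vector_bits, unsigned vf, unsigned elem_bits)
{
  return vf / lanes (vector_bits, elem_bits);
}
```

For 'foo' with 128-bit vectors, model_vf (128, 16) is 8 and model_copies (128, 8, 32) is 2: one vector of shorts feeds two vectors of ints, hence the lo/hi unpack pair.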


* [Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
  2021-06-16 15:13 [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack ubizjak at gmail dot com
                   ` (3 preceding siblings ...)
  2021-06-17  7:19 ` rguenth at gcc dot gnu.org
@ 2021-06-17  7:21 ` rguenth at gcc dot gnu.org
  2021-06-17  7:29 ` crazylht at gmail dot com
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-17  7:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #3)
> Created attachment 51031 [details]
> Pack/unpack patterns for 8-byte vectors
> 
> FYI, this patch adds pack/unpack patterns for 8-byte vectors. It will fail:
> 
> FAIL: gcc.target/i386/pr97249-1.c scan-assembler-times (?n)vpmovzxbw[
> \\\\t]+\\\\(.*%xmm[0-9] 2
> FAIL: gcc.target/i386/pr97249-1.c scan-assembler-times (?n)vpmovzxwd[
> \\\\t]+\\\\(.*%xmm[0-9] 2
> 
> due to the mentioned issue.

You can add #pragma GCC unroll 8/4 to "fix" that and add a comment that this
test tests BB vectorization.  Loop vectorization would have failed even before
the patch (and now succeeds, but with less optimal code).
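
A sketch of what that suggestion could look like, applied to 'foo' from the report (the actual loops in pr97249-1.c are not shown in this thread, so this is not the real test source). The pragma has to sit immediately before the loop; here it requests unrolling of the 8-iteration loop so that BB vectorization sees straight-line code.

```c
/* Testcase from the report, with the unroll request added.  */
void
foo (unsigned short *p1, unsigned short *p2, int *__restrict p3)
{
  /* Intended to test BB vectorization, so unroll the loop first.  */
#pragma GCC unroll 8
  for (int i = 0; i != 8; i++)
    p3[i] = p1[i] + p2[i];
}
```

Whether this produces the %ymm sequence depends on the compiler version and options; compilers that do not know the pragma will typically just ignore it.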


* [Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
  2021-06-16 15:13 [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack ubizjak at gmail dot com
                   ` (4 preceding siblings ...)
  2021-06-17  7:21 ` rguenth at gcc dot gnu.org
@ 2021-06-17  7:29 ` crazylht at gmail dot com
  2021-06-17  7:30 ` crazylht at gmail dot com
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2021-06-17  7:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #5)
> (In reply to Uroš Bizjak from comment #3)
> > Created attachment 51031 [details]
> > Pack/unpack patterns for 8-byte vectors
> > 
> > FYI, this patch adds pack/unpack patterns for 8-byte vectors. It will fail:
> > 
> > FAIL: gcc.target/i386/pr97249-1.c scan-assembler-times (?n)vpmovzxbw[
> > \\\\t]+\\\\(.*%xmm[0-9] 2
> > FAIL: gcc.target/i386/pr97249-1.c scan-assembler-times (?n)vpmovzxwd[
> > \\\\t]+\\\\(.*%xmm[0-9] 2
> > 
> > due to the mentioned issue.
> 
> You can add #pragma GCC unroll 8/4 to "fix" that and add a comment this test
> tests BB vectorization.  Loop vectorization would have failed even before
> the patch (and now succeeds, but with less optimal code).

I just want to classify that the test is used to test another optimization,
which relies on either loop vectorization or SLP. That means it's OK to add the
unroll pragma here.


* [Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
  2021-06-16 15:13 [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack ubizjak at gmail dot com
                   ` (5 preceding siblings ...)
  2021-06-17  7:29 ` crazylht at gmail dot com
@ 2021-06-17  7:30 ` crazylht at gmail dot com
  2021-06-17  7:32 ` ubizjak at gmail dot com
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2021-06-17  7:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---

> I just to want to classify the test is used to test another optimization
> which rely on either loop vectorization or slp. it means it's ok to add
> unroll pragma here.

clarify


* [Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
  2021-06-16 15:13 [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack ubizjak at gmail dot com
                   ` (6 preceding siblings ...)
  2021-06-17  7:30 ` crazylht at gmail dot com
@ 2021-06-17  7:32 ` ubizjak at gmail dot com
  2021-06-17  7:34 ` crazylht at gmail dot com
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2021-06-17  7:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

--- Comment #8 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #6)
> 
> I just to want to classify the test is used to test another optimization
> which rely on either loop vectorization or slp. it means it's ok to add
> unroll pragma here.

The code with new patterns is clearly less optimal, so I've had some second
thoughts...


* [Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
  2021-06-16 15:13 [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack ubizjak at gmail dot com
                   ` (7 preceding siblings ...)
  2021-06-17  7:32 ` ubizjak at gmail dot com
@ 2021-06-17  7:34 ` crazylht at gmail dot com
  2021-06-17  7:42 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2021-06-17  7:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---

> As said - a lot of work, sth that might be easier when we got rid of the
> SLP vs. non-SLP duality.
Understood.

I guess we will encounter more redundant packs and unpacks, now that we
support 4-byte/8-byte vectorization.


* [Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
  2021-06-16 15:13 [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack ubizjak at gmail dot com
                   ` (8 preceding siblings ...)
  2021-06-17  7:34 ` crazylht at gmail dot com
@ 2021-06-17  7:42 ` rguenth at gcc dot gnu.org
  2021-06-22 12:43 ` rsandifo at gcc dot gnu.org
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-17  7:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #8)
> (In reply to Hongtao.liu from comment #6)
> > 
> > I just to want to classify the test is used to test another optimization
> > which rely on either loop vectorization or slp. it means it's ok to add
> > unroll pragma here.
> 
> The code with new patterns is clearly less optimal, so I've had some second
> thoughts...

But clearly the code is better than when not vectorized at all which is what
happens when the iteration count isn't exactly 4, 8, 12 or 16.


* [Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
  2021-06-16 15:13 [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack ubizjak at gmail dot com
                   ` (9 preceding siblings ...)
  2021-06-17  7:42 ` rguenth at gcc dot gnu.org
@ 2021-06-22 12:43 ` rsandifo at gcc dot gnu.org
  2021-07-01  2:46 ` crazylht at gmail dot com
  2021-07-01  2:53 ` crazylht at gmail dot com
  12 siblings, 0 replies; 14+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2021-06-22 12:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #11 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
FWIW, you could try something similar to how aarch64 handles this
for Advanced SIMD, with a combination of:

- TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
- TARGET_VECTORIZE_RELATED_MODE.

We get the optimal code for these tests on aarch64, even when
the loop vectoriser is used.  E.g.:

void bar (short unsigned int * p1, short unsigned int * p2, int * restrict p3)
{
  vector(4) int vect__11.26;
  vector(4) int vect__8.25;
  vector(4) short unsigned int vect__7.24;
  vector(4) int vect__5.21;
  vector(4) short unsigned int vect__4.20;

  <bb 2> [local count: 214748371]:
  vect__4.20_34 = MEM <vector(4) short unsigned int> [(short unsigned int *)p1_15(D)];
  vect__5.21_35 = (vector(4) int) vect__4.20_34;
  vect__7.24_38 = MEM <vector(4) short unsigned int> [(short unsigned int *)p2_16(D)];
  vect__8.25_39 = (vector(4) int) vect__7.24_38;
  vect__11.26_40 = vect__5.21_35 + vect__8.25_39;
  MEM <vector(4) int> [(int *)p3_17(D)] = vect__11.26_40;
  return;
}

which for -O2 -ftree-vectorize is produced by the loop vectorizer
rather than SLP.


* [Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
  2021-06-16 15:13 [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack ubizjak at gmail dot com
                   ` (10 preceding siblings ...)
  2021-06-22 12:43 ` rsandifo at gcc dot gnu.org
@ 2021-07-01  2:46 ` crazylht at gmail dot com
  2021-07-01  2:53 ` crazylht at gmail dot com
  12 siblings, 0 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2021-07-01  2:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

--- Comment #12 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to rsandifo@gcc.gnu.org from comment #11)
> FWIW, you could try something similar to how aarch64 handles this
> for Advanced SIMD, with a combination of:
> 
> - TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
> - TARGET_VECTORIZE_RELATED_MODE.
I added a target hook that returns a vector mode with the same number of
elements for the i386 backend.
It works for this case, but regresses many testcases related to
gather/scatter instructions, because gather/scatter instructions take operands
of the same vector size but not the same number of elements.

  /* AVX2 */
  def_builtin_pure (OPTION_MASK_ISA_AVX2, 0, "__builtin_ia32_gathersiv2df",
                    V2DF_FTYPE_V2DF_PCDOUBLE_V4SI_V2DF_INT,
                    IX86_BUILTIN_GATHERSIV2DF);
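
The tension described here can be shown with a toy model of vector modes. It is not GCC code: struct vmode and both helper functions are hypothetical. One helper picks the related vector by keeping the element count (what helps the widening loop), the other by keeping the total size (what the gather builtin above expects, pairing a V2DF data vector with a V4SI index vector).

```c
#include <assert.h>

/* Hypothetical description of a vector mode.  */
struct vmode { unsigned elem_bits; unsigned nunits; };

/* Total size of the vector in bits.  */
static unsigned
vmode_bits (struct vmode m)
{
  return m.elem_bits * m.nunits;
}

/* Related mode with the same number of elements (size may change).  */
struct vmode
related_same_nunits (struct vmode m, unsigned new_elem_bits)
{
  struct vmode r = { new_elem_bits, m.nunits };
  return r;
}

/* Related mode with the same total size (element count may change).  */
struct vmode
related_same_size (struct vmode m, unsigned new_elem_bits)
{
  struct vmode r;
  assert (new_elem_bits != 0 && vmode_bits (m) % new_elem_bits == 0);
  r.elem_bits = new_elem_bits;
  r.nunits = vmode_bits (m) / new_elem_bits;
  return r;
}
```

Starting from V2DF (2 x 64 bits), related_same_size with 32-bit elements gives 4 x 32, matching V4SI, while related_same_nunits gives 2 x 32, a 64-bit vector rather than the 128-bit index operand in the builtin's signature.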


* [Bug tree-optimization/101097] Vectorizer is too eager to use vec_unpack
  2021-06-16 15:13 [Bug tree-optimization/101097] New: Vectorizer is too eager to use vec_unpack ubizjak at gmail dot com
                   ` (11 preceding siblings ...)
  2021-07-01  2:46 ` crazylht at gmail dot com
@ 2021-07-01  2:53 ` crazylht at gmail dot com
  12 siblings, 0 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2021-07-01  2:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101097

--- Comment #13 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #12)
> (In reply to rsandifo@gcc.gnu.org from comment #11)
> > FWIW, you could try something similar to how aarch64 handles this
> > for Advanced SIMD, with a combination of:
> > 
> > - TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
> > - TARGET_VECTORIZE_RELATED_MODE.
> I added a target_hook to return vector mode with same element number for
> i386 backend.
> It works for this case, but regresses many testcases which are related to
> gather/scatter instructions, because gather/scatter instructions accept same
> vector size but not same element number.
> 
>   /* AVX2 */
>   def_builtin_pure (OPTION_MASK_ISA_AVX2, 0, "__builtin_ia32_gathersiv2df",
> 		    V2DF_FTYPE_V2DF_PCDOUBLE_V4SI_V2DF_INT,
> 		    IX86_BUILTIN_GATHERSIV2DF);

It hits the gcc_assert in tree-vect-stmts.c:vect_build_gather_load_calls

      if (!useless_type_conversion_p (idxtype, TREE_TYPE (op)))
        {
          gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)),
                                TYPE_VECTOR_SUBPARTS (idxtype)));

