[Bug tree-optimization/101668] New: vectorizer doesn't categorize vector construct cost right.

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/101668] New: vectorizer doesn't categorize vector construct cost right.
@ 2021-07-29  1:47 crazylht at gmail dot com
  2021-07-29  6:55 ` [Bug tree-optimization/101668] BB vectorizer doesn't handle lowpart of existing vector rguenth at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: crazylht at gmail dot com @ 2021-07-29  1:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668

            Bug ID: 101668
           Summary: vectorizer doesn't categorize vector construct cost
                    right.
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---

cat test.c

typedef int v16si __attribute__((vector_size (64)));
typedef long long v8di __attribute__((vector_size (64)));

void
bar_s32_s64 (v8di * dst, v16si src)
{
  long long tem[8];
  tem[0] = src[0];
  tem[1] = src[1];
  tem[2] = src[2];
  tem[3] = src[3];
  tem[4] = src[4];
  tem[5] = src[5];
  tem[6] = src[6];
  tem[7] = src[7];
  dst[0] = *(v8di *) tem;
}

gcc -O3 -march=skylake-avx512 fails to vectorize this testcase after my r12-2549
because I increased the vec_construct cost for SKX/CLX. Here's the dump for slp2:

  <bb 2> [local count: 1073741824]:
  _1 = BIT_FIELD_REF <src_18(D), 32, 0>;
  _2 = (long long int) _1;
  _3 = BIT_FIELD_REF <src_18(D), 32, 32>;
  _4 = (long long int) _3;
  _5 = BIT_FIELD_REF <src_18(D), 32, 64>;
  _6 = (long long int) _5;
  _7 = BIT_FIELD_REF <src_18(D), 32, 96>;
  _8 = (long long int) _7;
  _9 = BIT_FIELD_REF <src_18(D), 32, 128>;
  _10 = (long long int) _9;
  _11 = BIT_FIELD_REF <src_18(D), 32, 160>;
  _12 = (long long int) _11;
  _13 = BIT_FIELD_REF <src_18(D), 32, 192>;
  _14 = (long long int) _13;
  _15 = BIT_FIELD_REF <src_18(D), 32, 224>;
  _31 = {_1, _3, _5, _7, _9, _11, _13, _15};
  vect__2.4_32 = (vector(8) long long int) _31;
  _16 = (long long int) _15;
  MEM <vector(8) long long int> [(long long int *)&tem] = vect__2.4_32;
  _17 = MEM[(v8di *)&tem];
  *dst_28(D) = _17;
  tem ={v} {CLOBBER};
  return;

But there is actually no need for a vec_construct from each element; it will be
optimized to

   <bb 2> [local count: 1073741824]:
  _2 = BIT_FIELD_REF <src_18(D), 256, 0>;
  vect__2.4_32 = (vector(8) long long int) _2;
  *dst_28(D) = vect__2.4_32;
  return;

So if slp2 could recognize this optimization and categorize the vec_construct
cost more accurately, we could avoid this regression.
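
To make the intended transformation concrete, here is a minimal C sketch (not
part of the original report) that writes the same computation in the "lowpart
extract plus one vector conversion" form that the optimized GIMPLE corresponds
to. It assumes GCC's vector extensions and __builtin_convertvector; the helper
names widen_elementwise and widen_lowpart and the v8si typedef are
illustrative only. Vectors are passed by pointer just to keep the sketch
independent of the target's vector calling convention.

```c
#include <string.h>

typedef int v16si __attribute__((vector_size (64)));
typedef int v8si __attribute__((vector_size (32)));
typedef long long v8di __attribute__((vector_size (64)));

/* Element-by-element form, equivalent to the testcase above.  */
void
widen_elementwise (v8di *dst, const v16si *src)
{
  long long tem[8];
  for (int i = 0; i < 8; i++)
    tem[i] = (*src)[i];
  memcpy (dst, tem, sizeof tem);
}

/* Lowpart form: take the low 256 bits of SRC as one v8si
   (the BIT_FIELD_REF <src, 256, 0> in the dump) and widen it
   with a single vector conversion.  */
void
widen_lowpart (v8di *dst, const v16si *src)
{
  v8si lo;
  memcpy (&lo, src, sizeof lo);
  *dst = __builtin_convertvector (lo, v8di);
}
```

Both helpers should produce the same eight 64-bit lanes; only the second one
avoids building a vector from individual scalars at the source level.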

* [Bug tree-optimization/101668] BB vectorizer doesn't handle lowpart of existing vector
  2021-07-29  1:47 [Bug tree-optimization/101668] New: vectorizer doesn't categorize vector construct cost right crazylht at gmail dot com
@ 2021-07-29  6:55 ` rguenth at gcc dot gnu.org
  2021-07-29  7:03 ` crazylht at gmail dot com
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-07-29  6:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2021-07-29
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
                 CC|                            |rguenth at gcc dot gnu.org
            Summary|vectorizer doesn't          |BB vectorizer doesn't
                   |categorize vector construct |handle lowpart of existing
                   |cost right.                 |vector
             Blocks|                            |53947

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The basic-block vectorizer is currently limited as to what "existing" vectors
it recognizes.  In this testcase we're accessing only the lowpart of 'src',
something we cannot yet model in vectorizable_slp_permutation.  The specific
case isn't hard to fix; we'd get

  <bb 2> [local count: 1073741824]:
  _31 = VIEW_CONVERT_EXPR<vector(8) int>(src_18(D));
  vect__2.4_33 = [vec_unpack_lo_expr] _31;
  vect__2.4_34 = [vec_unpack_hi_expr] _31;
  MEM <vector(4) long long int> [(long long int *)&tem] = vect__2.4_33;
  MEM <vector(4) long long int> [(long long int *)&tem + 32B] = vect__2.4_34;
  _17 = MEM[(v8di *)&tem];
  *dst_28(D) = _17;
  tem ={v} {CLOBBER};
  return;

so we then fail to elide the temporary, producing

bar_s32_s64:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        vpmovsxdq       %xmm0, %ymm1
        vextracti128    $0x1, %ymm0, %xmm0
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-64, %rsp
        subq    $8, %rsp
        vpmovsxdq       %xmm0, %ymm0
        vmovdqa %ymm1, -56(%rsp)
        vmovdqa %ymm0, -24(%rsp)
        vmovdqa64       -56(%rsp), %zmm2
        vmovdqa64       %zmm2, (%rdi)
        leave
        .cfi_def_cfa 7, 8
        ret

it looks like there's no V8SI->V8DI conversion optab or we choose V4DI
for some other reason as preferred vector mode.
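
As a rough source-level picture of the two-halves code above, the following C
sketch widens eight ints in two V4DI pieces and stores both into a temporary
array. It is only an illustration under the assumption of GCC vector
extensions and __builtin_convertvector; widen_in_halves and the typedefs are
hypothetical names, and the split is done in memory order (lanes 0..3, then
lanes 4..7), which matches the lo/hi halves on x86.

```c
#include <string.h>

typedef int v8si __attribute__((vector_size (32)));
typedef int v4si __attribute__((vector_size (16)));
typedef long long v4di __attribute__((vector_size (32)));

/* Widen eight ints to eight long longs in two V4DI halves, mirroring
   the vec_unpack_lo_expr / vec_unpack_hi_expr pair in the dump.  */
void
widen_in_halves (long long tem[8], const v8si *src)
{
  v4si lo, hi;
  memcpy (&lo, (const char *) src, sizeof lo);
  memcpy (&hi, (const char *) src + sizeof lo, sizeof hi);

  v4di wide_lo = __builtin_convertvector (lo, v4di);
  v4di wide_hi = __builtin_convertvector (hi, v4di);

  /* Two 32-byte stores into the temporary, as in the GIMPLE above.  */
  memcpy (tem, &wide_lo, sizeof wide_lo);
  memcpy (tem + 4, &wide_hi, sizeof wide_hi);
}
```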


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

* [Bug tree-optimization/101668] BB vectorizer doesn't handle lowpart of existing vector
  2021-07-29  1:47 [Bug tree-optimization/101668] New: vectorizer doesn't categorize vector construct cost right crazylht at gmail dot com
  2021-07-29  6:55 ` [Bug tree-optimization/101668] BB vectorizer doesn't handle lowpart of existing vector rguenth at gcc dot gnu.org
@ 2021-07-29  7:03 ` crazylht at gmail dot com
  2022-05-20  9:03 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: crazylht at gmail dot com @ 2021-07-29  7:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668

--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---

> it looks like there's no V8SI->V8DI conversion optab or we choose V4DI
> for some other reason as preferred vector mode.

We do have one; we just need to add -mprefer-vector-width=512, then we'll get

bar_s32_s64:
  vpmovsxdq zmm0, ymm0
  vmovdqa64 ZMMWORD PTR [rdi], zmm0
  ret

* [Bug tree-optimization/101668] BB vectorizer doesn't handle lowpart of existing vector
  2021-07-29  1:47 [Bug tree-optimization/101668] New: vectorizer doesn't categorize vector construct cost right crazylht at gmail dot com
  2021-07-29  6:55 ` [Bug tree-optimization/101668] BB vectorizer doesn't handle lowpart of existing vector rguenth at gcc dot gnu.org
  2021-07-29  7:03 ` crazylht at gmail dot com
@ 2022-05-20  9:03 ` rguenth at gcc dot gnu.org
  2022-05-20  9:13 ` crazylht at gmail dot com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-05-20  9:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Some pending enhancements would allow us to use VEC_PERM_EXPR with different
input modes from output mode and thus make implementation of this easier.
vectorizable_slp_permutation doesn't yet support that though.

For the special case of a contiguous permutation we can also vectorize it
as BIT_FIELD_REF of course.
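
To illustrate what "contiguous permutation" means here, a small C sketch
follows. It is a hypothetical helper, not GCC code: perm_is_contiguous and its
lane-index representation are assumptions made for the example. A permutation
that passes this check selects a run of adjacent lanes from one input, which
is the case that can be expressed as a subvector extract.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if PERM[0..N-1] selects N consecutive lanes
   PERM[0], PERM[0] + 1, ... of a single input vector.  Such a
   permutation is a plain subvector extract starting at lane PERM[0].  */
bool
perm_is_contiguous (const unsigned *perm, size_t n)
{
  if (n == 0)
    return false;
  for (size_t i = 1; i < n; i++)
    if (perm[i] != perm[0] + i)
      return false;
  return true;
}
```

For the testcase in this bug the selected lanes are 0 through 7 of a 16-lane
input, so the check succeeds with a start lane of 0 (the lowpart).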

* [Bug tree-optimization/101668] BB vectorizer doesn't handle lowpart of existing vector
  2021-07-29  1:47 [Bug tree-optimization/101668] New: vectorizer doesn't categorize vector construct cost right crazylht at gmail dot com
                   ` (2 preceding siblings ...)
  2022-05-20  9:03 ` rguenth at gcc dot gnu.org
@ 2022-05-20  9:13 ` crazylht at gmail dot com
  2022-05-20  9:25 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: crazylht at gmail dot com @ 2022-05-20  9:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
Guess we need to extend backend hook to handle different input and output
modes.

* [Bug tree-optimization/101668] BB vectorizer doesn't handle lowpart of existing vector
  2021-07-29  1:47 [Bug tree-optimization/101668] New: vectorizer doesn't categorize vector construct cost right crazylht at gmail dot com
                   ` (3 preceding siblings ...)
  2022-05-20  9:13 ` crazylht at gmail dot com
@ 2022-05-20  9:25 ` rguenth at gcc dot gnu.org
  2022-05-25 13:05 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-05-20  9:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #4)
> Guess we need to extend backend hook to handle different input and output
> modes.

Yes. Alternatively, as said, some special cases could be handled directly.
For example, v16si -> v8si could be handled by
VEC_PERM <lowpart, highpart, {..}> without any extra magic (but IIRC we don't
have a way to query target support for the specific BIT_FIELD_REFs we'd use
to get at the lowpart or highpart, and if those are not available they would
fall back to memory).  Contiguous permutes could also be emitted directly as
BIT_FIELD_REFs (in some cases).

I have a half-way patch that does the preparatory work but leaves
vectorizable_slp_permutation unchanged so we immediately fail there
due to

  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
    {
      if (!vect_maybe_update_slp_op_vectype (child, vectype)
          || !types_compatible_p (SLP_TREE_VECTYPE (child), vectype))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "Unsupported lane permutation\n");
          return false;

the comment above that says

  /* ???  We currently only support all same vector input and output types
     while the SLP IL should really do a concat + select and thus accept
     arbitrary mismatches.  */

so it was designed to handle more; it just wasn't necessary to implement it ...
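
The "concat + select" model mentioned in that source comment can be sketched
in plain C as follows. This is an illustration of the idea only, not the SLP
implementation: concat_select, its parameters, and the use of scalar arrays in
place of vector types are all assumptions. The point is that input and output
lane counts need not match once the permutation is viewed this way.

```c
#include <stddef.h>

/* Model a lane permutation as "concat + select": conceptually
   concatenate the lanes of input A (NA lanes) and input B (NB lanes),
   then pick lane SEL[i] of the concatenation for output lane i.
   Returns 0 on success, -1 if a selector is out of range.  */
int
concat_select (int *out, size_t nout,
               const int *a, size_t na,
               const int *b, size_t nb,
               const size_t *sel)
{
  for (size_t i = 0; i < nout; i++)
    {
      size_t s = sel[i];
      if (s < na)
        out[i] = a[s];
      else if (s - na < nb)
        out[i] = b[s - na];
      else
        return -1;
    }
  return 0;
}
```

With a single 16-lane input and the selector {0, ..., 7} this yields the
lowpart; with two 4-lane inputs and the selector {0, ..., 7} it yields their
concatenation.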

* [Bug tree-optimization/101668] BB vectorizer doesn't handle lowpart of existing vector
  2021-07-29  1:47 [Bug tree-optimization/101668] New: vectorizer doesn't categorize vector construct cost right crazylht at gmail dot com
                   ` (4 preceding siblings ...)
  2022-05-20  9:25 ` rguenth at gcc dot gnu.org
@ 2022-05-25 13:05 ` rguenth at gcc dot gnu.org
  2022-06-02  6:46 ` cvs-commit at gcc dot gnu.org
  2022-06-02  6:47 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-05-25 13:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 53031
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53031&action=edit
patch

This works now - the support for enhanced vec_perm_const is still not complete
on trunk (it claims all is OK ...) so it will ICE for testcases that would
require this.  But lowpart extracts and concats (untested) should work.

I'll extend coverage once the dependences are on trunk.  For the testcase at
hand we now generate

bar_s32_s64:
.LFB0:
        .cfi_startproc
        vpmovsxdq       %ymm0, %zmm0
        vmovdqa64       %zmm0, (%rdi)
        ret

with AVX512.

* [Bug tree-optimization/101668] BB vectorizer doesn't handle lowpart of existing vector
  2021-07-29  1:47 [Bug tree-optimization/101668] New: vectorizer doesn't categorize vector construct cost right crazylht at gmail dot com
                   ` (5 preceding siblings ...)
  2022-05-25 13:05 ` rguenth at gcc dot gnu.org
@ 2022-06-02  6:46 ` cvs-commit at gcc dot gnu.org
  2022-06-02  6:47 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-06-02  6:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:08afab6f8642f58f702010ec196dce3b00955627

commit r13-926-g08afab6f8642f58f702010ec196dce3b00955627
Author: Richard Biener <rguenther@suse.de>
Date:   Tue May 31 09:37:05 2022 +0200

    tree-optimization/101668 - relax SLP of existing vectors

    This relaxes the conditions on SLPing extracts from existing vectors
    leveraging the relaxed VEC_PERM conditions on the input vs output
    vector type compatibility.  It also handles lowpart extracts
    and concats without VEC_PERMs now.

    2022-05-25  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/101668
            * tree-vect-slp.cc (vect_build_slp_tree_1): Allow BIT_FIELD_REFs
            for vector types with compatible lane types.
            (vect_build_slp_tree_2): Deal with this.
            (vect_add_slp_permutation): Adjust.  Emit lowpart/concat
            special cases without VEC_PERM.
            (vectorizable_slp_permutation): Select the operand vector
            type and relax requirements.  Handle identity permutes
            with mismatching operand types.
            * optabs-query.cc (can_vec_perm_const_p): Only allow variable
            permutes for op_mode == mode.

            * gcc.target/i386/pr101668.c: New testcase.
            * gcc.dg/vect/bb-slp-pr101668.c: Likewise.

* [Bug tree-optimization/101668] BB vectorizer doesn't handle lowpart of existing vector
  2021-07-29  1:47 [Bug tree-optimization/101668] New: vectorizer doesn't categorize vector construct cost right crazylht at gmail dot com
                   ` (6 preceding siblings ...)
  2022-06-02  6:46 ` cvs-commit at gcc dot gnu.org
@ 2022-06-02  6:47 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-06-02  6:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed (lowpart of existing vector).  The vectorizer should now also handle
other cases such as extract-even, but the target needs to support them in its
vec_perm_const handling, which now allows different input and output modes.
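
For reference, an "extract even" access pattern of the kind mentioned above
looks like this in C. This is a sketch of the pattern only, assuming GCC
vector extensions; extract_even and the typedefs are illustrative names, and
whether such code is actually vectorized depends on the target's
vec_perm_const support.

```c
typedef int v16si __attribute__((vector_size (64)));
typedef int v8si __attribute__((vector_size (32)));

/* Extract-even: output lane i is input lane 2*i, so a 16-lane input
   produces an 8-lane output (different input and output modes).  */
void
extract_even (v8si *dst, const v16si *src)
{
  v8si r = { 0 };
  for (int i = 0; i < 8; i++)
    r[i] = (*src)[2 * i];
  *dst = r;
}
```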
